[RFC PATCH net-next v2 0/5] netns: allow to identify peer netns

* [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
@ 2014-09-23 13:20 Nicolas Dichtel
  2014-09-23 13:20 ` [RFC PATCH net-next v2 1/5] netns: allocate netns ids Nicolas Dichtel
                   ` (7 more replies)
  0 siblings, 8 replies; 67+ messages in thread
From: Nicolas Dichtel @ 2014-09-23 13:20 UTC (permalink / raw)
  To: netdev, containers, linux-kernel, linux-api
  Cc: davem, ebiederm, stephen, akpm, luto

The goal of this series is to be able to multicast netlink messages with an
attribute that identifies a peer netns.
Userland needs this to interpret some of the information contained in
netlink messages (such as the IFLA_LINK value, but also some other attributes
in the case of x-netns netdevices; see also
http://thread.gmane.org/gmane.linux.network/315933/focus=316064 and
http://thread.gmane.org/gmane.linux.kernel.containers/28301/focus=4239).

Ids are stored in the parent user namespace and are valid only inside
this user namespace. The user can retrieve these ids via a new netlink message,
but only if the peer netns is in the same user namespace.

Patches 1/5 and 2/5 introduce the netlink API mechanism to export these ids to
userland.
Patches 3/5 and 4/5 show an example of how to use these ids in rtnetlink
messages, and patch 5/5 shows that the netlink messages can be symmetric between
a GET and a SET.

iproute2 patches are available; I can send them on request.

Here is a short example session showing how userland can use it:
$ ip netns add foo
$ ip netns del foo
$ ip netns
$ touch /var/run/netns/init_net
$ mount --bind /proc/1/ns/net /var/run/netns/init_net
$ ip netns add foo
$ ip netns
foo (id: 3)
init_net (id: 1)
$ ip netns exec foo ip netns
foo (id: 3)
init_net (id: 1)
$ ip netns exec foo ip link add ipip1 link-netnsid 1 type ipip remote 10.16.0.121 local 10.16.0.249
$ ip netns exec foo ip l ls ipip1
6: ipip1@NONE: <POINTOPOINT,NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default 
    link/ipip 10.16.0.249 peer 10.16.0.121 link-netnsid 1

The parameter link-netnsid shows us where the interface sends and receives
packets (and thus we know where encapsulated addresses are set).

RFCv1 -> RFCv2:
  remove useless ()
  ids are now stored in the user ns. It's possible to get an id for a peer netns
  only if the current netns and the peer netns have the same user ns parent.

 MAINTAINERS                    |   1 +
 include/linux/user_namespace.h |   4 ++
 include/net/ip_tunnels.h       |   1 +
 include/net/net_namespace.h    |  12 +++++
 include/net/rtnetlink.h        |   2 +
 include/uapi/linux/Kbuild      |   1 +
 include/uapi/linux/if_link.h   |   1 +
 include/uapi/linux/netns.h     |  29 ++++++++++
 kernel/user_namespace.c        |   6 +++
 net/core/net_namespace.c       | 119 ++++++++++++++++++++++++++++++++++++++++-
 net/core/rtnetlink.c           |  47 ++++++++++++++--
 net/ipv4/ip_gre.c              |   2 +
 net/ipv4/ip_tunnel.c           |   8 +++
 net/ipv4/ip_vti.c              |   1 +
 net/ipv4/ipip.c                |   1 +
 net/ipv6/sit.c                 |   1 +
 net/netlink/genetlink.c        |   4 ++
 17 files changed, 236 insertions(+), 4 deletions(-)

Comments are welcome.

Regards,
Nicolas


* [RFC PATCH net-next v2 1/5] netns: allocate netns ids
  2014-09-23 13:20 [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns Nicolas Dichtel
@ 2014-09-23 13:20 ` Nicolas Dichtel
  2014-09-23 13:20 ` [RFC PATCH net-next v2 2/5] netns: add genl cmd to get the id of a netns Nicolas Dichtel
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 67+ messages in thread
From: Nicolas Dichtel @ 2014-09-23 13:20 UTC (permalink / raw)
  To: netdev, containers, linux-kernel, linux-api
  Cc: davem, ebiederm, stephen, akpm, luto, Nicolas Dichtel

With this patch, an id is allocated for each netns. The id database is stored in
the user namespace. Getting the id of a peer netns is allowed only if both netns
share the same user ns.
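
As a rough illustration of the rule described above, here is a userspace
sketch that models the per-user-ns id table and the same-user-ns check. It is
not kernel code: the toy_* types and functions are hypothetical stand-ins for
struct user_namespace, struct net, idr_alloc_cyclic() and peernet2id(), and
the fixed-size table is an assumption made only to keep the model short.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_IDS 8	/* arbitrary table size, assumption for this sketch */

/* Hypothetical stand-in for struct user_namespace and its id table. */
struct toy_user_ns {
	const void *slots[TOY_MAX_IDS + 1];	/* index 0 unused: id 0 is invalid */
	int next;				/* next id to try (cyclic allocation) */
};

/* Hypothetical stand-in for struct net. */
struct toy_net {
	struct toy_user_ns *user_ns;
	int netnsid;				/* 0 means "no id allocated" */
};

/* Take the first free id at or after ns->next, wrapping around to 1.
 * Returns the id, or -ENOSPC when the table is full.
 */
static int toy_alloc_cyclic(struct toy_user_ns *ns, const void *ptr)
{
	int i, id;

	assert(ns != NULL && ptr != NULL);
	if (ns->next < 1 || ns->next > TOY_MAX_IDS)
		ns->next = 1;
	for (i = 0; i < TOY_MAX_IDS; i++) {
		id = (ns->next - 1 + i) % TOY_MAX_IDS + 1;
		if (!ns->slots[id]) {
			ns->slots[id] = ptr;
			ns->next = id + 1;
			return id;
		}
	}
	return -ENOSPC;
}

/* Models the id allocation added to setup_net(). */
static void toy_net_setup(struct toy_net *net, struct toy_user_ns *ns)
{
	int id;

	net->user_ns = ns;
	net->netnsid = 0;
	id = toy_alloc_cyclic(ns, net);
	if (id > 0)
		net->netnsid = id;
}

/* Models the idr_remove() added to cleanup_net(). */
static void toy_net_cleanup(struct toy_net *net)
{
	if (net->netnsid)
		net->user_ns->slots[net->netnsid] = NULL;
}

/* Models peernet2id(): the id is visible only within one user ns. */
static int toy_peernet2id(const struct toy_net *net, const struct toy_net *peer)
{
	if (net->user_ns != peer->user_ns)
		return -EPERM;
	return peer->netnsid;
}
```

In this model, a netns that is created, destroyed and created again does not
get its old id back right away, because allocation continues from the last id
handed out. That seems consistent with the cover letter session, where foo
ends up with id 3 after an add/del/add sequence.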

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/linux/user_namespace.h |  4 ++++
 include/net/net_namespace.h    | 11 +++++++++++
 kernel/user_namespace.c        |  6 ++++++
 net/core/net_namespace.c       | 20 +++++++++++++++++++-
 4 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index e95372654f09..9d122b540422 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -5,6 +5,7 @@
 #include <linux/nsproxy.h>
 #include <linux/sched.h>
 #include <linux/err.h>
+#include <linux/idr.h>
 
 #define UID_GID_MAP_MAX_EXTENTS 5
 
@@ -33,6 +34,9 @@ struct user_namespace {
 	struct key		*persistent_keyring_register;
 	struct rw_semaphore	persistent_keyring_register_sem;
 #endif
+#ifdef CONFIG_NET_NS
+	struct idr		netns_ids;
+#endif
 };
 
 extern struct user_namespace init_user_ns;
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 361d26077196..92b5f94e2842 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -59,6 +59,7 @@ struct net {
 	struct list_head	exit_list;	/* Use only net_mutex */
 
 	struct user_namespace   *user_ns;	/* Owning user namespace */
+	int			netnsid;
 
 	unsigned int		proc_inum;
 
@@ -289,6 +290,16 @@ static inline struct net *read_pnet(struct net * const *pnet)
 #define __net_initconst	__initconst
 #endif
 
+static inline int peernet2id(struct net *net, struct net *peer)
+{
+	if (net->user_ns != peer->user_ns)
+		return -EPERM;
+
+	return peer->netnsid;
+}
+
+struct net *get_net_from_netnsid(struct net *net, int id);
+
 struct pernet_operations {
 	struct list_head list;
 	int (*init)(struct net *net);
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index aa312b0dc3ec..30316a2eed49 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -104,6 +104,9 @@ int create_user_ns(struct cred *new)
 #ifdef CONFIG_PERSISTENT_KEYRINGS
 	init_rwsem(&ns->persistent_keyring_register_sem);
 #endif
+#ifdef CONFIG_NET_NS
+	idr_init(&ns->netns_ids);
+#endif
 	return 0;
 }
 
@@ -133,6 +136,9 @@ void free_user_ns(struct user_namespace *ns)
 
 	do {
 		parent = ns->parent;
+#ifdef CONFIG_NET_NS
+		idr_destroy(&ns->netns_ids);
+#endif
 #ifdef CONFIG_PERSISTENT_KEYRINGS
 		key_put(ns->persistent_keyring_register);
 #endif
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 7f155175bba8..f44378de7831 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -144,6 +144,19 @@ static void ops_free_list(const struct pernet_operations *ops,
 	}
 }
 
+struct net *get_net_from_netnsid(struct net *net, int id)
+{
+	struct net *peer;
+
+	rcu_read_lock();
+	peer = idr_find(&net->user_ns->netns_ids, id);
+	if (peer)
+		get_net(peer);
+	rcu_read_unlock();
+
+	return peer;
+}
+
 /*
  * setup_net runs the initializers for the network namespace object.
  */
@@ -151,13 +164,16 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns)
 {
 	/* Must be called with net_mutex held */
 	const struct pernet_operations *ops, *saved_ops;
-	int error = 0;
+	int error = 0, id;
 	LIST_HEAD(net_exit_list);
 
 	atomic_set(&net->count, 1);
 	atomic_set(&net->passive, 1);
 	net->dev_base_seq = 1;
 	net->user_ns = user_ns;
+	id = idr_alloc_cyclic(&user_ns->netns_ids, net, 1, 0, GFP_KERNEL);
+	if (id > 0)
+		net->netnsid = id;
 
 #ifdef NETNS_REFCNT_DEBUG
 	atomic_set(&net->use_count, 0);
@@ -288,6 +304,8 @@ static void cleanup_net(struct work_struct *work)
 	list_for_each_entry(net, &net_kill_list, cleanup_list) {
 		list_del_rcu(&net->list);
 		list_add_tail(&net->exit_list, &net_exit_list);
+		if (net->netnsid)
+			idr_remove(&net->user_ns->netns_ids, net->netnsid);
 	}
 	rtnl_unlock();
 
-- 
2.1.0



* [RFC PATCH net-next v2 2/5] netns: add genl cmd to get the id of a netns
  2014-09-23 13:20 [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns Nicolas Dichtel
  2014-09-23 13:20 ` [RFC PATCH net-next v2 1/5] netns: allocate netns ids Nicolas Dichtel
@ 2014-09-23 13:20 ` Nicolas Dichtel
  2014-09-23 13:20 ` [RFC PATCH net-next v2 3/5] rtnl: add link netns id to interface messages Nicolas Dichtel
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 67+ messages in thread
From: Nicolas Dichtel @ 2014-09-23 13:20 UTC (permalink / raw)
  To: netdev, containers, linux-kernel, linux-api
  Cc: davem, ebiederm, stephen, akpm, luto, Nicolas Dichtel

This patch allows a user to get the id of a peer netns. It will be useful for
userland to be able to associate a netns file descriptor with a netns id.

Note: to be able to get an id, both netns must be in the same user ns.
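
To make the request side more concrete, here is a userspace sketch of how the
single u32 attribute of such a request (NETNSA_FD or NETNSA_PID) could be laid
out as a netlink TLV. The attribute numbers are the ones proposed in this
patch's netns.h. The toy_* names are hypothetical, and struct toy_nlattr is a
local copy of the struct nlattr layout so that the sketch builds without
kernel headers. The netlink and generic netlink headers that precede the
attribute in a real message are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Attribute numbers as proposed in this patch's include/uapi/linux/netns.h. */
enum { NETNSA_NONE, NETNSA_NSINDEX, NETNSA_PID, NETNSA_FD };

/* Local copy of the struct nlattr layout (assumption: 2 x u16 header). */
struct toy_nlattr {
	uint16_t nla_len;	/* header + payload, padding excluded */
	uint16_t nla_type;
};

/* Netlink attributes are padded to a 4-byte boundary. */
#define TOY_NLA_ALIGN(len)	(((size_t)(len) + 3) & ~(size_t)3)
#define TOY_NLA_HDRLEN		TOY_NLA_ALIGN(sizeof(struct toy_nlattr))

/* Append one u32 attribute at buf + off.
 * Returns the offset just past the attribute, or 0 if it does not fit.
 */
static size_t toy_put_u32(unsigned char *buf, size_t cap, size_t off,
			  uint16_t type, uint32_t value)
{
	struct toy_nlattr hdr;
	size_t total = TOY_NLA_HDRLEN + TOY_NLA_ALIGN(sizeof(value));

	assert(buf != NULL);
	if (off > cap || cap - off < total)
		return 0;

	hdr.nla_len = (uint16_t)(TOY_NLA_HDRLEN + sizeof(value));
	hdr.nla_type = type;
	memcpy(buf + off, &hdr, sizeof(hdr));
	memcpy(buf + off + TOY_NLA_HDRLEN, &value, sizeof(value));
	return off + total;
}
```

A caller would typically pass an open file descriptor on a netns file (for
example one under /var/run/netns/) as the NETNSA_FD value, and expect a reply
that carries NETNSA_NSINDEX.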

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 MAINTAINERS                 |  1 +
 include/net/net_namespace.h |  1 +
 include/uapi/linux/Kbuild   |  1 +
 include/uapi/linux/netns.h  | 29 +++++++++++++
 net/core/net_namespace.c    | 99 +++++++++++++++++++++++++++++++++++++++++++++
 net/netlink/genetlink.c     |  4 ++
 6 files changed, 135 insertions(+)
 create mode 100644 include/uapi/linux/netns.h

diff --git a/MAINTAINERS b/MAINTAINERS
index b4e23acc6441..dbf691c68473 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6278,6 +6278,7 @@ F:	include/linux/netdevice.h
 F:	include/uapi/linux/in.h
 F:	include/uapi/linux/net.h
 F:	include/uapi/linux/netdevice.h
+F:	include/uapi/linux/netns.h
 F:	tools/net/
 F:	tools/testing/selftests/net/
 F:	lib/random32.c
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 92b5f94e2842..1b65f5ccacf5 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -299,6 +299,7 @@ static inline int peernet2id(struct net *net, struct net *peer)
 }
 
 struct net *get_net_from_netnsid(struct net *net, int id);
+int netns_genl_register(void);
 
 struct pernet_operations {
 	struct list_head list;
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index fb3f7b675229..840f049c48fa 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -275,6 +275,7 @@ header-y += netfilter_decnet.h
 header-y += netfilter_ipv4.h
 header-y += netfilter_ipv6.h
 header-y += netlink.h
+header-y += netns.h
 header-y += netrom.h
 header-y += nfc.h
 header-y += nfs.h
diff --git a/include/uapi/linux/netns.h b/include/uapi/linux/netns.h
new file mode 100644
index 000000000000..72537e31d7d2
--- /dev/null
+++ b/include/uapi/linux/netns.h
@@ -0,0 +1,29 @@
+#ifndef _UAPI_LINUX_NETNS_H_
+#define _UAPI_LINUX_NETNS_H_
+
+/* Generic netlink messages */
+
+#define NETNS_GENL_NAME			"netns"
+#define NETNS_GENL_VERSION		0x1
+
+/* Commands */
+enum {
+	NETNS_CMD_UNSPEC,
+	NETNS_CMD_GETID,
+	__NETNS_CMD_MAX,
+};
+
+#define NETNS_CMD_MAX		(__NETNS_CMD_MAX - 1)
+
+/* Attributes */
+enum {
+	NETNSA_NONE,
+	NETNSA_NSINDEX,
+	NETNSA_PID,
+	NETNSA_FD,
+	__NETNSA_MAX,
+};
+
+#define NETNSA_MAX		(__NETNSA_MAX - 1)
+
+#endif /* _UAPI_LINUX_NETNS_H_ */
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index f44378de7831..a60f2bbf4302 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -15,6 +15,8 @@
 #include <linux/file.h>
 #include <linux/export.h>
 #include <linux/user_namespace.h>
+#include <linux/netns.h>
+#include <net/genetlink.h>
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 
@@ -417,6 +419,103 @@ static struct pernet_operations __net_initdata net_ns_ops = {
 	.exit = net_ns_net_exit,
 };
 
+static struct genl_family netns_nl_family = {
+	.id		= GENL_ID_GENERATE,
+	.name		= NETNS_GENL_NAME,
+	.version	= NETNS_GENL_VERSION,
+	.hdrsize	= 0,
+	.maxattr	= NETNSA_MAX,
+	.netnsok	= true,
+};
+
+static struct nla_policy netns_nl_policy[NETNSA_MAX + 1] = {
+	[NETNSA_NONE]		= { .type = NLA_UNSPEC, },
+	[NETNSA_NSINDEX]	= { .type = NLA_U32, },
+	[NETNSA_PID]		= { .type = NLA_U32 },
+	[NETNSA_FD]		= { .type = NLA_U32 },
+};
+
+static int netns_nl_get_size(void)
+{
+	return nla_total_size(sizeof(u32)) /* NETNSA_NSINDEX */
+	       ;
+}
+
+static int netns_nl_fill(struct sk_buff *skb, u32 portid, u32 seq, int flags,
+			 int cmd, struct net *net, struct net *peer)
+{
+	void *hdr;
+	int id;
+
+	hdr = genlmsg_put(skb, portid, seq, &netns_nl_family, flags, cmd);
+	if (!hdr)
+		return -EMSGSIZE;
+
+	id = peernet2id(net, peer);
+	if (id < 0)
+		return id;
+	if (nla_put_u32(skb, NETNSA_NSINDEX, id))
+		goto nla_put_failure;
+
+	return genlmsg_end(skb, hdr);
+
+nla_put_failure:
+	genlmsg_cancel(skb, hdr);
+	return -EMSGSIZE;
+}
+
+static int netns_nl_cmd_getid(struct sk_buff *skb, struct genl_info *info)
+{
+	struct net *net = genl_info_net(info);
+	struct net *peer;
+	struct sk_buff *msg;
+	int err = -ENOBUFS;
+
+	if (info->attrs[NETNSA_PID])
+		peer = get_net_ns_by_pid(nla_get_u32(info->attrs[NETNSA_PID]));
+	else if (info->attrs[NETNSA_FD])
+		peer = get_net_ns_by_fd(nla_get_u32(info->attrs[NETNSA_FD]));
+	else
+		return -EINVAL;
+
+	if (IS_ERR(peer))
+		return PTR_ERR(peer);
+
+	msg = genlmsg_new(netns_nl_get_size(), GFP_KERNEL);
+	if (!msg) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	err = netns_nl_fill(msg, info->snd_portid, info->snd_seq,
+			    NLM_F_ACK, NETNS_CMD_GETID, net, peer);
+	if (err < 0)
+		goto err_out;
+
+	err = genlmsg_unicast(net, msg, info->snd_portid);
+	goto out;
+
+err_out:
+	nlmsg_free(msg);
+out:
+	put_net(peer);
+	return err;
+}
+
+static struct genl_ops netns_nl_ops[] = {
+	{
+		.cmd = NETNS_CMD_GETID,
+		.policy = netns_nl_policy,
+		.doit = netns_nl_cmd_getid,
+		.flags = GENL_ADMIN_PERM,
+	},
+};
+
+int netns_genl_register(void)
+{
+	return genl_register_family_with_ops(&netns_nl_family, netns_nl_ops);
+}
+
 static int __init net_ns_init(void)
 {
 	struct net_generic *ng;
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 76393f2f4b22..c6f39e40c9f3 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -1029,6 +1029,10 @@ static int __init genl_init(void)
 	if (err)
 		goto problem;
 
+	err = netns_genl_register();
+	if (err < 0)
+		goto problem;
+
 	return 0;
 
 problem:
-- 
2.1.0



* [RFC PATCH net-next v2 3/5] rtnl: add link netns id to interface messages
  2014-09-23 13:20 [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns Nicolas Dichtel
  2014-09-23 13:20 ` [RFC PATCH net-next v2 1/5] netns: allocate netns ids Nicolas Dichtel
  2014-09-23 13:20 ` [RFC PATCH net-next v2 2/5] netns: add genl cmd to get the id of a netns Nicolas Dichtel
@ 2014-09-23 13:20 ` Nicolas Dichtel
  2014-09-23 13:20 ` [RFC PATCH net-next v2 4/5] iptunnels: advertise link netns via netlink Nicolas Dichtel
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 67+ messages in thread
From: Nicolas Dichtel @ 2014-09-23 13:20 UTC (permalink / raw)
  To: netdev, containers, linux-kernel, linux-api
  Cc: davem, ebiederm, stephen, akpm, luto, Nicolas Dichtel

This patch adds a new attribute (IFLA_LINK_NETNSID) which contains the 'link'
netns id when this netns is different from the netns where the interface
stands (for example for x-netns interfaces like ip tunnels). When there is no id,
because the user ns of the link netns and of the interface netns are not the same,
we put 0 into this attribute (id 0 is not valid) to indicate to userland that the
link netns is different from the interface netns. Hence, userland knows that some
information, like IFLA_LINK, is not interpretable.
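
The three cases a consumer has to distinguish can be sketched as follows. This
is only an illustration of the semantics described above, not iproute2 code:
the toy_* names are hypothetical, and has_attr stands for whatever the netlink
parser reports about the presence of IFLA_LINK_NETNSID in the message.

```c
#include <stdbool.h>
#include <stdint.h>

enum toy_link_netns {
	TOY_LINK_NOT_REPORTED,	/* attribute absent: no different link netns reported */
	TOY_LINK_OTHER_UNKNOWN,	/* id 0: link netns differs, but its id is not visible */
	TOY_LINK_OTHER_KNOWN,	/* id > 0: link netns differs and can be looked up */
};

/* Classify a received interface message according to IFLA_LINK_NETNSID. */
static enum toy_link_netns toy_classify_link_netns(bool has_attr, uint32_t id)
{
	if (!has_attr)
		return TOY_LINK_NOT_REPORTED;
	if (id == 0)
		return TOY_LINK_OTHER_UNKNOWN;
	return TOY_LINK_OTHER_KNOWN;
}
```

In the TOY_LINK_OTHER_UNKNOWN case, values such as IFLA_LINK should not be
resolved against the interfaces of the current netns.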

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/net/rtnetlink.h      |  2 ++
 include/uapi/linux/if_link.h |  1 +
 net/core/rtnetlink.c         | 22 ++++++++++++++++++++++
 3 files changed, 25 insertions(+)

diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index e21b9f9653c0..6c6d5393fc34 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -46,6 +46,7 @@ static inline int rtnl_msg_family(const struct nlmsghdr *nlh)
  *			    to create when creating a new device.
  *	@get_num_rx_queues: Function to determine number of receive queues
  *			    to create when creating a new device.
+ *	@get_link_net: Function to get the i/o netns of the device
  */
 struct rtnl_link_ops {
 	struct list_head	list;
@@ -93,6 +94,7 @@ struct rtnl_link_ops {
 	int			(*fill_slave_info)(struct sk_buff *skb,
 						   const struct net_device *dev,
 						   const struct net_device *slave_dev);
+	struct net		*(*get_link_net)(const struct net_device *dev);
 };
 
 int __rtnl_link_register(struct rtnl_link_ops *ops);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index c80f95f6ee78..21dd2bcb295f 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -145,6 +145,7 @@ enum {
 	IFLA_CARRIER,
 	IFLA_PHYS_PORT_ID,
 	IFLA_CARRIER_CHANGES,
+	IFLA_LINK_NETNSID,
 	__IFLA_MAX
 };
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a6882686ca3a..99ed83c62685 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -862,6 +862,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
 	       + nla_total_size(1) /* IFLA_OPERSTATE */
 	       + nla_total_size(1) /* IFLA_LINKMODE */
 	       + nla_total_size(4) /* IFLA_CARRIER_CHANGES */
+	       + nla_total_size(4) /* IFLA_LINK_NETNSID */
 	       + nla_total_size(ext_filter_mask
 			        & RTEXT_FILTER_VF ? 4 : 0) /* IFLA_NUM_VF */
 	       + rtnl_vfinfo_size(dev, ext_filter_mask) /* IFLA_VFINFO_LIST */
@@ -1134,6 +1135,27 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 			goto nla_put_failure;
 	}
 
+	if (dev->rtnl_link_ops &&
+	    dev->rtnl_link_ops->get_link_net) {
+		struct net *link_net = dev->rtnl_link_ops->get_link_net(dev);
+
+		if (!net_eq(dev_net(dev), link_net)) {
+			int id = peernet2id(dev_net(dev), link_net);
+
+			/* If the link netns is not in the same user ns, put id
+			 * 0 in IFLA_LINK_NETNSID to indicate to userland that
+			 * the link netns is not the current netns, but that it
+			 * doesn't have access to it.
+			 */
+			if (id == -EPERM)
+				id = 0;
+
+			if (id >= 0 &&
+			    nla_put_u32(skb, IFLA_LINK_NETNSID, id))
+				goto nla_put_failure;
+		}
+	}
+
 	if (!(af_spec = nla_nest_start(skb, IFLA_AF_SPEC)))
 		goto nla_put_failure;
 
-- 
2.1.0



* [RFC PATCH net-next v2 4/5] iptunnels: advertise link netns via netlink
  2014-09-23 13:20 [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns Nicolas Dichtel
                   ` (2 preceding siblings ...)
  2014-09-23 13:20 ` [RFC PATCH net-next v2 3/5] rtnl: add link netns id to interface messages Nicolas Dichtel
@ 2014-09-23 13:20 ` Nicolas Dichtel
  2014-09-23 13:20 ` [RFC PATCH net-next v2 5/5] rtnl: allow to create device with IFLA_LINK_NETNSID set Nicolas Dichtel
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 67+ messages in thread
From: Nicolas Dichtel @ 2014-09-23 13:20 UTC (permalink / raw)
  To: netdev, containers, linux-kernel, linux-api
  Cc: davem, ebiederm, stephen, akpm, luto, Nicolas Dichtel

Implement rtnl_link_ops->get_link_net() callback so that IFLA_LINK_NETNSID is
added to rtnetlink messages.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/net/ip_tunnels.h | 1 +
 net/ipv4/ip_gre.c        | 2 ++
 net/ipv4/ip_tunnel.c     | 8 ++++++++
 net/ipv4/ip_vti.c        | 1 +
 net/ipv4/ipip.c          | 1 +
 net/ipv6/sit.c           | 1 +
 6 files changed, 14 insertions(+)

diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 7f538ba6e267..c92a99b5b77e 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -119,6 +119,7 @@ struct ip_tunnel_net {
 int ip_tunnel_init(struct net_device *dev);
 void ip_tunnel_uninit(struct net_device *dev);
 void  ip_tunnel_dellink(struct net_device *dev, struct list_head *head);
+struct net *ip_tunnel_get_link_net(const struct net_device *dev);
 int ip_tunnel_init_net(struct net *net, int ip_tnl_net_id,
 		       struct rtnl_link_ops *ops, char *devname);
 
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 829aff8bf723..05157427d8f0 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -825,6 +825,7 @@ static struct rtnl_link_ops ipgre_link_ops __read_mostly = {
 	.dellink	= ip_tunnel_dellink,
 	.get_size	= ipgre_get_size,
 	.fill_info	= ipgre_fill_info,
+	.get_link_net	= ip_tunnel_get_link_net,
 };
 
 static struct rtnl_link_ops ipgre_tap_ops __read_mostly = {
@@ -839,6 +840,7 @@ static struct rtnl_link_ops ipgre_tap_ops __read_mostly = {
 	.dellink	= ip_tunnel_dellink,
 	.get_size	= ipgre_get_size,
 	.fill_info	= ipgre_fill_info,
+	.get_link_net	= ip_tunnel_get_link_net,
 };
 
 static int __net_init ipgre_tap_init_net(struct net *net)
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index e3a3dc91e49c..da5a2b6fed81 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -954,6 +954,14 @@ void ip_tunnel_dellink(struct net_device *dev, struct list_head *head)
 }
 EXPORT_SYMBOL_GPL(ip_tunnel_dellink);
 
+struct net *ip_tunnel_get_link_net(const struct net_device *dev)
+{
+	struct ip_tunnel *tunnel = netdev_priv(dev);
+
+	return tunnel->net;
+}
+EXPORT_SYMBOL(ip_tunnel_get_link_net);
+
 int ip_tunnel_init_net(struct net *net, int ip_tnl_net_id,
 				  struct rtnl_link_ops *ops, char *devname)
 {
diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index e453cb724a95..93862411669c 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -530,6 +530,7 @@ static struct rtnl_link_ops vti_link_ops __read_mostly = {
 	.changelink	= vti_changelink,
 	.get_size	= vti_get_size,
 	.fill_info	= vti_fill_info,
+	.get_link_net	= ip_tunnel_get_link_net,
 };
 
 static int __init vti_init(void)
diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index bfec31df8b21..fd423e65d6df 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -496,6 +496,7 @@ static struct rtnl_link_ops ipip_link_ops __read_mostly = {
 	.dellink	= ip_tunnel_dellink,
 	.get_size	= ipip_get_size,
 	.fill_info	= ipip_fill_info,
+	.get_link_net	= ip_tunnel_get_link_net,
 };
 
 static struct xfrm_tunnel ipip_handler __read_mostly = {
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index db75809ab843..5c227cc13170 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1761,6 +1761,7 @@ static struct rtnl_link_ops sit_link_ops __read_mostly = {
 	.get_size	= ipip6_get_size,
 	.fill_info	= ipip6_fill_info,
 	.dellink	= ipip6_dellink,
+	.get_link_net	= ip_tunnel_get_link_net,
 };
 
 static struct xfrm_tunnel sit_handler __read_mostly = {
-- 
2.1.0



* [RFC PATCH net-next v2 5/5] rtnl: allow to create device with IFLA_LINK_NETNSID set
  2014-09-23 13:20 [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns Nicolas Dichtel
                   ` (3 preceding siblings ...)
  2014-09-23 13:20 ` [RFC PATCH net-next v2 4/5] iptunnels: advertise link netns via netlink Nicolas Dichtel
@ 2014-09-23 13:20 ` Nicolas Dichtel
  2014-09-23 19:22 ` [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns Cong Wang
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 67+ messages in thread
From: Nicolas Dichtel @ 2014-09-23 13:20 UTC (permalink / raw)
  To: netdev, containers, linux-kernel, linux-api
  Cc: davem, ebiederm, stephen, akpm, luto, Nicolas Dichtel

This patch adds the ability to create a netdevice in a specified netns and
then move it into the final netns. In effect, it provides symmetry between
get and set rtnl messages.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 net/core/rtnetlink.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 99ed83c62685..34b894ae79b4 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1220,6 +1220,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_NUM_RX_QUEUES]	= { .type = NLA_U32 },
 	[IFLA_PHYS_PORT_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_PORT_ID_LEN },
 	[IFLA_CARRIER_CHANGES]	= { .type = NLA_U32 },  /* ignored */
+	[IFLA_LINK_NETNSID]	= { .type = NLA_U32 },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
@@ -1992,7 +1993,7 @@ replay:
 		struct nlattr *slave_attr[m_ops ? m_ops->slave_maxtype + 1 : 0];
 		struct nlattr **data = NULL;
 		struct nlattr **slave_data = NULL;
-		struct net *dest_net;
+		struct net *dest_net, *link_net = NULL;
 
 		if (ops) {
 			if (ops->maxtype && linkinfo[IFLA_INFO_DATA]) {
@@ -2098,7 +2099,18 @@ replay:
 		if (IS_ERR(dest_net))
 			return PTR_ERR(dest_net);
 
-		dev = rtnl_create_link(dest_net, ifname, name_assign_type, ops, tb);
+		if (tb[IFLA_LINK_NETNSID]) {
+			int id = nla_get_u32(tb[IFLA_LINK_NETNSID]);
+
+			link_net = get_net_from_netnsid(dest_net, id);
+			if (link_net == NULL) {
+				err =  -EINVAL;
+				goto out;
+			}
+		}
+
+		dev = rtnl_create_link(link_net ? : dest_net, ifname,
+				       name_assign_type, ops, tb);
 		if (IS_ERR(dev)) {
 			err = PTR_ERR(dev);
 			goto out;
@@ -2126,9 +2138,16 @@ replay:
 			}
 		}
 		err = rtnl_configure_link(dev, ifm);
-		if (err < 0)
+		if (err < 0) {
 			unregister_netdevice(dev);
+			goto out;
+		}
+
+		if (link_net)
+			err = dev_change_net_namespace(dev, net, ifname);
 out:
+		if (link_net)
+			put_net(link_net);
 		put_net(dest_net);
 		return err;
 	}
-- 
2.1.0



* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-09-23 13:20 [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns Nicolas Dichtel
                   ` (4 preceding siblings ...)
  2014-09-23 13:20 ` [RFC PATCH net-next v2 5/5] rtnl: allow to create device with IFLA_LINK_NETNSID set Nicolas Dichtel
@ 2014-09-23 19:22 ` Cong Wang
  2014-09-24  9:23   ` Nicolas Dichtel
  2014-09-23 19:26 ` Andy Lutomirski
  2014-09-26 18:10 ` Eric W. Biederman
  7 siblings, 1 reply; 67+ messages in thread
From: Cong Wang @ 2014-09-23 19:22 UTC (permalink / raw)
  To: Nicolas Dichtel
  Cc: netdev, containers, linux-kernel, linux-api, David Miller,
	Eric W. Biederman, Stephen Hemminger, Andrew Morton,
	Andy Lutomirski

On Tue, Sep 23, 2014 at 6:20 AM, Nicolas Dichtel
<nicolas.dichtel@6wind.com> wrote:
>
> Here is a small screenshot to show how it can be used by userland:
> $ ip netns add foo
> $ ip netns del foo
> $ ip netns
> $ touch /var/run/netns/init_net
> $ mount --bind /proc/1/ns/net /var/run/netns/init_net
> $ ip netns add foo
> $ ip netns
> foo (id: 3)
> init_net (id: 1)
> $ ip netns exec foo ip netns
> foo (id: 3)
> init_net (id: 1)
> $ ip netns exec foo ip link add ipip1 link-netnsid 1 type ipip remote 10.16.0.121 local 10.16.0.249
> $ ip netns exec foo ip l ls ipip1
> 6: ipip1@NONE: <POINTOPOINT,NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default
>     link/ipip 10.16.0.249 peer 10.16.0.121 link-netnsid 1
>
> The parameter link-netnsid shows us where the interface sends and receives
> packets (and thus we know where encapsulated addresses are set).
>

So ipip1 is shown in netns foo but functions in netns init_net? Getting the
id of init_net in foo depends on your mount namespace: /var/run/netns/ may
not be visible inside foo, in which case link-netnsid is meaningless. It is
not your fault; network namespaces already rely heavily on the mount
namespace (sysfs needs to be remounted, otherwise you cannot create a device
with the same name).

On the other hand, what's the problem you are trying to solve? AFAIK, the
ifindex issue is purely in the output; IOW, the device still functions
correctly even though its link ifindex is not correct after moving to another
namespace. If not, it is a bug we need to fix.


* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-09-23 13:20 [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns Nicolas Dichtel
                   ` (5 preceding siblings ...)
  2014-09-23 19:22 ` [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns Cong Wang
@ 2014-09-23 19:26 ` Andy Lutomirski
  2014-09-24  9:31   ` Nicolas Dichtel
  2014-09-26 18:10 ` Eric W. Biederman
  7 siblings, 1 reply; 67+ messages in thread
From: Andy Lutomirski @ 2014-09-23 19:26 UTC (permalink / raw)
  To: Nicolas Dichtel
  Cc: Network Development, Linux Containers, linux-kernel, Linux API,
	David S. Miller, Eric W. Biederman, Stephen Hemminger,
	Andrew Morton

On Tue, Sep 23, 2014 at 6:20 AM, Nicolas Dichtel
<nicolas.dichtel@6wind.com> wrote:
> The goal of this serie is to be able to multicast netlink messages with an
> attribute that identify a peer netns.
> This is needed by the userland to interpret some informations contained in
> netlink messages (like IFLA_LINK value, but also some other attributes in case
> of x-netns netdevice (see also
> http://thread.gmane.org/gmane.linux.network/315933/focus=316064 and
> http://thread.gmane.org/gmane.linux.kernel.containers/28301/focus=4239)).
>
> Ids are stored in the parent user namespace. These ids are valid only inside
> this user namespace. The user can retrieve these ids via a new netlink messages,
> but only if peer netns are in the same user namespace.

What about the parent / ancestors of the owning userns?  Can processes
in those usernses see any form of netns id?

--Andy


* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-09-23 19:22 ` [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns Cong Wang
@ 2014-09-24  9:23   ` Nicolas Dichtel
  2014-09-24 16:01     ` Cong Wang
  0 siblings, 1 reply; 67+ messages in thread
From: Nicolas Dichtel @ 2014-09-24  9:23 UTC (permalink / raw)
  To: Cong Wang
  Cc: netdev, containers, linux-kernel, linux-api, David Miller,
	Eric W. Biederman, Stephen Hemminger, Andrew Morton,
	Andy Lutomirski

On 23/09/2014 21:22, Cong Wang wrote:
> On Tue, Sep 23, 2014 at 6:20 AM, Nicolas Dichtel
> <nicolas.dichtel@6wind.com> wrote:
>>
>> Here is a small screenshot to show how it can be used by userland:
>> $ ip netns add foo
>> $ ip netns del foo
>> $ ip netns
>> $ touch /var/run/netns/init_net
>> $ mount --bind /proc/1/ns/net /var/run/netns/init_net
>> $ ip netns add foo
>> $ ip netns
>> foo (id: 3)
>> init_net (id: 1)
>> $ ip netns exec foo ip netns
>> foo (id: 3)
>> init_net (id: 1)
>> $ ip netns exec foo ip link add ipip1 link-netnsid 1 type ipip remote 10.16.0.121 local 10.16.0.249
>> $ ip netns exec foo ip l ls ipip1
>> 6: ipip1@NONE: <POINTOPOINT,NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default
>>      link/ipip 10.16.0.249 peer 10.16.0.121 link-netnsid 1
>>
>> The parameter link-netnsid shows us where the interface sends and receives
>> packets (and thus we know where encapsulated addresses are set).
>>
>
> So ipip1 is shown in netns foo but functioning in netns init_net? Getting the
> id of init_net in foo depends on your mount namespace, /var/run/netns/ may
> not visible inside foo, in this case, link-netnsid is meaningless. It
> is not your
> fault, network namespace already heavily relies on mount namespace (sysfs
> needs to be remount otherwise you can not create device with the same name.)
>
> On the other hand, what's the problem you are trying to solve? AFAIK,
> the ifindex
> issue is purely in output, IOW, the device still functions correctly
> even through
> its link ifindex is not correct after moving to another namespace. If
> not, it is bug
> we need to fix.
>
The problem is explained here:
http://thread.gmane.org/gmane.linux.network/315933/focus=316064
and here:
http://thread.gmane.org/gmane.linux.kernel.containers/28301/focus=4239


Regards,
Nicolas


* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-09-23 19:26 ` Andy Lutomirski
@ 2014-09-24  9:31   ` Nicolas Dichtel
  2014-09-24 17:05     ` Andy Lutomirski
  0 siblings, 1 reply; 67+ messages in thread
From: Nicolas Dichtel @ 2014-09-24  9:31 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Network Development, Linux Containers, linux-kernel, Linux API,
	David S. Miller, Eric W. Biederman, Stephen Hemminger,
	Andrew Morton

On 23/09/2014 21:26, Andy Lutomirski wrote:
> On Tue, Sep 23, 2014 at 6:20 AM, Nicolas Dichtel
> <nicolas.dichtel@6wind.com> wrote:
>> The goal of this serie is to be able to multicast netlink messages with an
>> attribute that identify a peer netns.
>> This is needed by the userland to interpret some informations contained in
>> netlink messages (like IFLA_LINK value, but also some other attributes in case
>> of x-netns netdevice (see also
>> http://thread.gmane.org/gmane.linux.network/315933/focus=316064 and
>> http://thread.gmane.org/gmane.linux.kernel.containers/28301/focus=4239)).
>>
>> Ids are stored in the parent user namespace. These ids are valid only inside
>> this user namespace. The user can retrieve these ids via a new netlink messages,
>> but only if peer netns are in the same user namespace.
>
> What about the parent / ancestors of the owning userns?  Can processes
> in those usernses see any form of netns id?
With this series, no. I'm not sure whether ancestors really need to be able to
get these ids. What is your opinion?

* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-09-24  9:23   ` Nicolas Dichtel
@ 2014-09-24 16:01     ` Cong Wang
  2014-09-24 16:15       ` Cong Wang
  2014-09-24 16:27       ` Nicolas Dichtel
  0 siblings, 2 replies; 67+ messages in thread
From: Cong Wang @ 2014-09-24 16:01 UTC (permalink / raw)
  To: Nicolas Dichtel
  Cc: netdev, containers, linux-kernel, linux-api, David Miller,
	Eric W. Biederman, Stephen Hemminger, Andrew Morton,
	Andy Lutomirski

On Wed, Sep 24, 2014 at 2:23 AM, Nicolas Dichtel
<nicolas.dichtel@6wind.com> wrote:
> On 23/09/2014 21:22, Cong Wang wrote:
>
>> On Tue, Sep 23, 2014 at 6:20 AM, Nicolas Dichtel
>> <nicolas.dichtel@6wind.com> wrote:
>>>
>>>
>>> Here is a small screenshot to show how it can be used by userland:
>>> $ ip netns add foo
>>> $ ip netns del foo
>>> $ ip netns
>>> $ touch /var/run/netns/init_net
>>> $ mount --bind /proc/1/ns/net /var/run/netns/init_net
>>> $ ip netns add foo
>>> $ ip netns
>>> foo (id: 3)
>>> init_net (id: 1)
>>> $ ip netns exec foo ip netns
>>> foo (id: 3)
>>> init_net (id: 1)
>>> $ ip netns exec foo ip link add ipip1 link-netnsid 1 type ipip remote
>>> 10.16.0.121 local 10.16.0.249
>>> $ ip netns exec foo ip l ls ipip1
>>> 6: ipip1@NONE: <POINTOPOINT,NOARP> mtu 1480 qdisc noop state DOWN mode
>>> DEFAULT group default
>>>      link/ipip 10.16.0.249 peer 10.16.0.121 link-netnsid 1
>>>
>>> The parameter link-netnsid shows us where the interface sends and
>>> receives
>>> packets (and thus we know where encapsulated addresses are set).
>>>
>>
>> So ipip1 is shown in netns foo but functioning in netns init_net? Getting
>> the
>> id of init_net in foo depends on your mount namespace, /var/run/netns/ may
>> not visible inside foo, in this case, link-netnsid is meaningless. It
>> is not your
>> fault, network namespace already heavily relies on mount namespace (sysfs
>> needs to be remount otherwise you can not create device with the same
>> name.)
>>
>> On the other hand, what's the problem you are trying to solve? AFAIK,
>> the ifindex
>> issue is purely in output, IOW, the device still functions correctly
>> even through
>> its link ifindex is not correct after moving to another namespace. If
>> not, it is bug
>> we need to fix.
>>
> The problem is explained here:
> http://thread.gmane.org/gmane.linux.network/315933/focus=316064
> and here:
> http://thread.gmane.org/gmane.linux.kernel.containers/28301/focus=4239
>

Please, summarize the discussion in your changelog, instead of pointing
to a long thread.

And clearly you missed my question above: how do you get netns id
without sharing /var/run/netns/ ?

* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-09-24 16:01     ` Cong Wang
@ 2014-09-24 16:15       ` Cong Wang
  2014-09-24 16:31         ` Nicolas Dichtel
  2014-09-24 16:27       ` Nicolas Dichtel
  1 sibling, 1 reply; 67+ messages in thread
From: Cong Wang @ 2014-09-24 16:15 UTC (permalink / raw)
  To: Nicolas Dichtel
  Cc: netdev, containers, linux-kernel, linux-api, David Miller,
	Eric W. Biederman, Stephen Hemminger, Andrew Morton,
	Andy Lutomirski

On Wed, Sep 24, 2014 at 9:01 AM, Cong Wang <cwang@twopensource.com> wrote:
>
> And clearly you missed my question above: how do you get netns id
> without sharing /var/run/netns/ ?

OK, I found it:

> Ids are stored in the parent user namespace. These ids are valid only inside
> this user namespace. The user can retrieve these ids via a new netlink messages,
> but only if peer netns are in the same user namespace.

So your example is confusing. Perhaps you need some other way to show the IDs
instead of tying them to the 'ip netns' output, which is basically
'ls /var/run/netns/'. We don't want an inner netns to know anything about the
outside, IOW, we don't share /var/run/netns/. I think your IDs are still
available in this case, but aren't you providing a new way for the inner netns
device to escape, which is what we are trying to avoid?

* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-09-24 16:01     ` Cong Wang
  2014-09-24 16:15       ` Cong Wang
@ 2014-09-24 16:27       ` Nicolas Dichtel
  2014-09-24 16:45         ` Cong Wang
  1 sibling, 1 reply; 67+ messages in thread
From: Nicolas Dichtel @ 2014-09-24 16:27 UTC (permalink / raw)
  To: Cong Wang
  Cc: netdev, containers, linux-kernel, linux-api, David Miller,
	Eric W. Biederman, Stephen Hemminger, Andrew Morton,
	Andy Lutomirski

On 24/09/2014 18:01, Cong Wang wrote:
> On Wed, Sep 24, 2014 at 2:23 AM, Nicolas Dichtel
> <nicolas.dichtel@6wind.com> wrote:
>> On 23/09/2014 21:22, Cong Wang wrote:
>>
>>> On Tue, Sep 23, 2014 at 6:20 AM, Nicolas Dichtel
>>> <nicolas.dichtel@6wind.com> wrote:
>>>>
>>>>
>>>> Here is a small screenshot to show how it can be used by userland:
>>>> $ ip netns add foo
>>>> $ ip netns del foo
>>>> $ ip netns
>>>> $ touch /var/run/netns/init_net
>>>> $ mount --bind /proc/1/ns/net /var/run/netns/init_net
>>>> $ ip netns add foo
>>>> $ ip netns
>>>> foo (id: 3)
>>>> init_net (id: 1)
>>>> $ ip netns exec foo ip netns
>>>> foo (id: 3)
>>>> init_net (id: 1)
>>>> $ ip netns exec foo ip link add ipip1 link-netnsid 1 type ipip remote
>>>> 10.16.0.121 local 10.16.0.249
>>>> $ ip netns exec foo ip l ls ipip1
>>>> 6: ipip1@NONE: <POINTOPOINT,NOARP> mtu 1480 qdisc noop state DOWN mode
>>>> DEFAULT group default
>>>>       link/ipip 10.16.0.249 peer 10.16.0.121 link-netnsid 1
>>>>
>>>> The parameter link-netnsid shows us where the interface sends and
>>>> receives
>>>> packets (and thus we know where encapsulated addresses are set).
>>>>
>>>
>>> So ipip1 is shown in netns foo but functioning in netns init_net? Getting
>>> the
>>> id of init_net in foo depends on your mount namespace, /var/run/netns/ may
>>> not visible inside foo, in this case, link-netnsid is meaningless. It
>>> is not your
>>> fault, network namespace already heavily relies on mount namespace (sysfs
>>> needs to be remount otherwise you can not create device with the same
>>> name.)
>>>
>>> On the other hand, what's the problem you are trying to solve? AFAIK,
>>> the ifindex
>>> issue is purely in output, IOW, the device still functions correctly
>>> even through
>>> its link ifindex is not correct after moving to another namespace. If
>>> not, it is bug
>>> we need to fix.
>>>
>> The problem is explained here:
>> http://thread.gmane.org/gmane.linux.network/315933/focus=316064
>> and here:
>> http://thread.gmane.org/gmane.linux.kernel.containers/28301/focus=4239
>>
>
> Please, summarize the discussion in your changelog, instead of pointing
> to a long thread.
The thread is long, but the mail in focus contains the information. Here is a
copy and paste:
What I'm trying to solve is to have full info in the netlink messages sent by
the kernel, and thus be able to identify a peer netns (this is close to what the
audit guys are trying to have). Theoretically, messages sent by the kernel can
be reused as is to recreate the same configuration. This is not the case with
x-netns devices. Here is an example with ip tunnels:

$ ip netns add 1
$ ip link add ipip1 type ipip remote 10.16.0.121 local 10.16.0.249 dev eth0
$ ip -d link ls ipip1
8: ipip1@eth0: <POINTOPOINT,NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default
      link/ipip 10.16.0.249 peer 10.16.0.121 promiscuity 0
      ipip remote 10.16.0.121 local 10.16.0.249 dev eth0 ttl inherit pmtudisc
$ ip link set ipip1 netns 1
$ ip netns exec 1 ip -d link ls ipip1
8: ipip1@tunl0: <POINTOPOINT,NOARP,M-DOWN> mtu 1480 qdisc noop state DOWN mode DEFAULT group default
      link/ipip 10.16.0.249 peer 10.16.0.121 promiscuity 0
      ipip remote 10.16.0.121 local 10.16.0.249 dev tunl0 ttl inherit pmtudisc

Now the information obtained with 'ip link' is wrong and incomplete:
   - the link dev is now tunl0 instead of eth0, because we only got an ifindex
     from the kernel without any netns information.
   - the encapsulation addresses are not part of this netns, but the user doesn't
     know that (again because the netns info is missing). These IPv4 addresses
     may also exist in this netns.
   - it's not possible to create the same netdevice with this info.
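
To illustrate the first point, here is a minimal Python sketch (a toy model, not
kernel or iproute2 code) of why a bare ifindex is ambiguous once a device has
moved to another netns. The per-netns tables, the ifindex values and the
helper resolve_link() are assumptions made for this illustration; only the
device and netns names come from the example above.

```python
# Toy model: each netns has its own ifindex -> name table.
# The ifindex values below are illustrative assumptions.
NETNS_TABLES = {
    "init_net": {2: "eth0", 8: "ipip1"},
    "1": {2: "tunl0"},
}


def resolve_link(viewer_netns, link_ifindex, link_netns=None):
    """Resolve a link ifindex to a device name.

    Without link_netns, the lookup is done in the viewer's own netns,
    which is all userland can do when it only receives an ifindex.
    """
    netns = link_netns if link_netns is not None else viewer_netns
    return NETNS_TABLES[netns].get(link_ifindex, "NONE")


# ipip1 was created in init_net on top of eth0 (ifindex 2 in this model).
before_move = resolve_link("init_net", 2)
# After 'ip link set ipip1 netns 1', the same ifindex is resolved in netns 1.
after_move = resolve_link("1", 2)
# With an identifier for the peer netns, the lookup is unambiguous again.
with_netns = resolve_link("1", 2, link_netns="init_net")

print(before_move, after_move, with_netns)
```

In this model the second lookup silently returns a different device, which
is the kind of wrong output shown by 'ip -d link ls ipip1' above.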

I hope it's clearer now.

>
> And clearly you missed my question above: how do you get netns id
> without sharing /var/run/netns/ ?
>
You can get an id only if you already have a "pointer" to this netns.

* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-09-24 16:15       ` Cong Wang
@ 2014-09-24 16:31         ` Nicolas Dichtel
  2014-09-24 16:48           ` Cong Wang
  0 siblings, 1 reply; 67+ messages in thread
From: Nicolas Dichtel @ 2014-09-24 16:31 UTC (permalink / raw)
  To: Cong Wang
  Cc: netdev, containers, linux-kernel, linux-api, David Miller,
	Eric W. Biederman, Stephen Hemminger, Andrew Morton,
	Andy Lutomirski

On 24/09/2014 18:15, Cong Wang wrote:
> On Wed, Sep 24, 2014 at 9:01 AM, Cong Wang <cwang@twopensource.com> wrote:
>>
>> And clearly you missed my question above: how do you get netns id
>> without sharing /var/run/netns/ ?
>
> OK, I found it:
>
>> Ids are stored in the parent user namespace. These ids are valid only inside
>> this user namespace. The user can retrieve these ids via a new netlink messages,
>> but only if peer netns are in the same user namespace.
>
> So your example is confusing, perhaps you need some other way to show the ID's
> instead of binding to ip netns output which is basically ls
> /var/run/netns/. We don't
> want an inner netns know anything outside, IOW, we don't share /var/run/netns/.
Hmm, I'm not sure I understand you. My use case shares /var/run/netns, because
there is only one user ns and one mount ns.

> I think in this case your ID's are still available, but aren't you
> providing a new way
> for the inner netns device to escape which we are trying to avoid?
That's why the ids depend on the user ns. We allow getting an id for a peer
netns only if the user ns are the same.

* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-09-24 16:27       ` Nicolas Dichtel
@ 2014-09-24 16:45         ` Cong Wang
  2014-09-25  8:53           ` Nicolas Dichtel
  0 siblings, 1 reply; 67+ messages in thread
From: Cong Wang @ 2014-09-24 16:45 UTC (permalink / raw)
  To: Nicolas Dichtel
  Cc: netdev, containers, linux-kernel, linux-api, David Miller,
	Eric W. Biederman, Stephen Hemminger, Andrew Morton,
	Andy Lutomirski

On Wed, Sep 24, 2014 at 9:27 AM, Nicolas Dichtel
<nicolas.dichtel@6wind.com> wrote:
> Now informations got with 'ip link' are wrong and incomplete:
>   - the link dev is now tunl0 instead of eth0, because we only got an
> ifindex
>     from the kernel without any netns informations.

This is not new, macvlan has the same problem. This is why I said
it is mostly a display problem; maybe just mark the ifindex as -1 or
something when it is not in this netns. At least I don't expect the inner
netns to know anything about the outside, and I don't think I am the only one
using netns in this way.

>   - the encapsulation addresses are not part of this netns but the user
> doesn't
>     known that (still because netns info is missing). These IPv4 addresses
> may
>     exist into this netns.

I don't remember your x-netns code, but we have two choices:

1) Look up the route in the netns the device is currently in

If the address is not available in this netns, the lookup will fail. This is
expected, since a tunnel device is not a pure L2 device. Or maybe just fail
early when we move it.

2) Look up the route in the netns where it was created

This is transparent for the upper layer but, as you said, the outer address is
not available in this netns and is therefore hard to display. Just hiding this
information doesn't seem wrong to me.
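
A toy Python sketch of the two choices, reduced to "is the tunnel's local
address configured in the netns we look in?". This is a simplification of a
real route lookup; the LOCAL_ADDRS table and both functions are made up for
illustration, and the netns names and address are borrowed from the example
quoted earlier in the thread.

```python
# Hypothetical per-netns sets of locally configured addresses.
LOCAL_ADDRS = {
    "init_net": {"10.16.0.249"},
    "1": set(),
}


def lookup_current(tunnel):
    """Choice 1: look in the netns the tunnel device currently lives in."""
    return tunnel["local"] in LOCAL_ADDRS[tunnel["netns"]]


def lookup_creation(tunnel):
    """Choice 2: look in the netns where the tunnel was created."""
    return tunnel["local"] in LOCAL_ADDRS[tunnel["created_in"]]


# ipip1 after it has been moved from init_net to netns "1".
tunnel = {"local": "10.16.0.249", "netns": "1", "created_in": "init_net"}

print("choice 1 finds the address:", lookup_current(tunnel))
print("choice 2 finds the address:", lookup_creation(tunnel))
```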

>   - it's not possible to create the same netdevice with these infos.
>

This is expected, because after all you are already in a different netns.

* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-09-24 16:31         ` Nicolas Dichtel
@ 2014-09-24 16:48           ` Cong Wang
  2014-09-25  8:53             ` Nicolas Dichtel
  0 siblings, 1 reply; 67+ messages in thread
From: Cong Wang @ 2014-09-24 16:48 UTC (permalink / raw)
  To: Nicolas Dichtel
  Cc: netdev, containers, linux-kernel, linux-api, David Miller,
	Eric W. Biederman, Stephen Hemminger, Andrew Morton,
	Andy Lutomirski

On Wed, Sep 24, 2014 at 9:31 AM, Nicolas Dichtel
<nicolas.dichtel@6wind.com> wrote:
>> I think in this case your ID's are still available, but aren't you
>> providing a new way
>> for the inner netns device to escape which we are trying to avoid?
>
> It's why the ids depend on user ns. Only if user ns are the same we allow to
> get an id for a peer netns.

Too late, userns is relatively new, relying on it breaks our existing
assumption.

* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-09-24  9:31   ` Nicolas Dichtel
@ 2014-09-24 17:05     ` Andy Lutomirski
  2014-09-25  7:54       ` Nicolas Dichtel
  0 siblings, 1 reply; 67+ messages in thread
From: Andy Lutomirski @ 2014-09-24 17:05 UTC (permalink / raw)
  To: Nicolas Dichtel
  Cc: Network Development, Linux Containers, linux-kernel, Linux API,
	David S. Miller, Eric W. Biederman, Stephen Hemminger,
	Andrew Morton

On Wed, Sep 24, 2014 at 2:31 AM, Nicolas Dichtel
<nicolas.dichtel@6wind.com> wrote:
> On 23/09/2014 21:26, Andy Lutomirski wrote:
>
>> On Tue, Sep 23, 2014 at 6:20 AM, Nicolas Dichtel
>> <nicolas.dichtel@6wind.com> wrote:
>>>
>>> The goal of this serie is to be able to multicast netlink messages with
>>> an
>>> attribute that identify a peer netns.
>>> This is needed by the userland to interpret some informations contained
>>> in
>>> netlink messages (like IFLA_LINK value, but also some other attributes in
>>> case
>>> of x-netns netdevice (see also
>>> http://thread.gmane.org/gmane.linux.network/315933/focus=316064 and
>>> http://thread.gmane.org/gmane.linux.kernel.containers/28301/focus=4239)).
>>>
>>> Ids are stored in the parent user namespace. These ids are valid only
>>> inside
>>> this user namespace. The user can retrieve these ids via a new netlink
>>> messages,
>>> but only if peer netns are in the same user namespace.
>>
>>
>> What about the parent / ancestors of the owning userns?  Can processes
>> in those usernses see any form of netns id?
>
> With this serie no. I'm not sure if ancestors really needs to be able to
> get these ids. What is your opinion?

I might be missing some consideration here, but I would hope that ip
link would work correctly if I have a veth interface shared with a
netns that's in a child userns.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC

* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-09-24 17:05     ` Andy Lutomirski
@ 2014-09-25  7:54       ` Nicolas Dichtel
  0 siblings, 0 replies; 67+ messages in thread
From: Nicolas Dichtel @ 2014-09-25  7:54 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Network Development, Linux Containers, linux-kernel, Linux API,
	David S. Miller, Eric W. Biederman, Stephen Hemminger,
	Andrew Morton

On 24/09/2014 19:05, Andy Lutomirski wrote:
> On Wed, Sep 24, 2014 at 2:31 AM, Nicolas Dichtel
> <nicolas.dichtel@6wind.com> wrote:
>> On 23/09/2014 21:26, Andy Lutomirski wrote:
>>
>>> On Tue, Sep 23, 2014 at 6:20 AM, Nicolas Dichtel
>>> <nicolas.dichtel@6wind.com> wrote:
>>>>
>>>> The goal of this serie is to be able to multicast netlink messages with
>>>> an
>>>> attribute that identify a peer netns.
>>>> This is needed by the userland to interpret some informations contained
>>>> in
>>>> netlink messages (like IFLA_LINK value, but also some other attributes in
>>>> case
>>>> of x-netns netdevice (see also
>>>> http://thread.gmane.org/gmane.linux.network/315933/focus=316064 and
>>>> http://thread.gmane.org/gmane.linux.kernel.containers/28301/focus=4239)).
>>>>
>>>> Ids are stored in the parent user namespace. These ids are valid only
>>>> inside
>>>> this user namespace. The user can retrieve these ids via a new netlink
>>>> messages,
>>>> but only if peer netns are in the same user namespace.
>>>
>>>
>>> What about the parent / ancestors of the owning userns?  Can processes
>>> in those usernses see any form of netns id?
>>
>> With this serie no. I'm not sure if ancestors really needs to be able to
>> get these ids. What is your opinion?
>
> I might be missing some consideration here, but I would hope that ip
> link would work correctly if I have a veth interface shared with a
> netns that's in a child userns.
No, you're right. Will send a v3.

* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-09-24 16:48           ` Cong Wang
@ 2014-09-25  8:53             ` Nicolas Dichtel
  2014-09-26  1:58               ` Cong Wang
  0 siblings, 1 reply; 67+ messages in thread
From: Nicolas Dichtel @ 2014-09-25  8:53 UTC (permalink / raw)
  To: Cong Wang
  Cc: netdev, containers, linux-kernel, linux-api, David Miller,
	Eric W. Biederman, Stephen Hemminger, Andrew Morton,
	Andy Lutomirski

On 24/09/2014 18:48, Cong Wang wrote:
> On Wed, Sep 24, 2014 at 9:31 AM, Nicolas Dichtel
> <nicolas.dichtel@6wind.com> wrote:
>>> I think in this case your ID's are still available, but aren't you
>>> providing a new way
>>> for the inner netns device to escape which we are trying to avoid?
>>
>> It's why the ids depend on user ns. Only if user ns are the same we allow to
>> get an id for a peer netns.
>
> Too late, userns is relatively new, relying on it breaks our existing
> assumption.
>
I don't get your point. netns was added to the kernel after user ns:
acce292c82d4 user namespace: add the framework => 2.6.23
5f256becd868 [NET]: Basic network namespace infrastructure. => 2.6.24

In the kernel, each netns is linked with a user ns.

* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-09-24 16:45         ` Cong Wang
@ 2014-09-25  8:53           ` Nicolas Dichtel
  2014-09-26  2:09             ` Cong Wang
  0 siblings, 1 reply; 67+ messages in thread
From: Nicolas Dichtel @ 2014-09-25  8:53 UTC (permalink / raw)
  To: Cong Wang
  Cc: netdev, containers, linux-kernel, linux-api, David Miller,
	Eric W. Biederman, Stephen Hemminger, Andrew Morton,
	Andy Lutomirski

On 24/09/2014 18:45, Cong Wang wrote:
> On Wed, Sep 24, 2014 at 9:27 AM, Nicolas Dichtel
> <nicolas.dichtel@6wind.com> wrote:
>> Now informations got with 'ip link' are wrong and incomplete:
>>    - the link dev is now tunl0 instead of eth0, because we only got an
>> ifindex
>>      from the kernel without any netns informations.
>
> This is not new, macvlan has the same problem. This is why I said
> it is mostly a display problem, maybe just mark the ifindex as -1 or
> something when it is not in this netns. At least I don't expect the inner
> netns know anything outside, and I don't think I am the only one using
> netns in this way.
I understand your point, but there are several uses of netns. Netns can also be
used to instantiate virtual routers. In this case, administrators or daemons
need to be able to monitor and dump the configuration of all netns (in
particular, to fully identify x-netns interfaces). We started discussing this in
one of the two threads pointed to in my cover letter and came to the conclusion
that checking the user ns is a good way to know whether an id should be
disclosed for a peer netns.
Can you describe your use case?

>
>>    - the encapsulation addresses are not part of this netns but the user
>> doesn't
>>      known that (still because netns info is missing). These IPv4 addresses
>> may
>>      exist into this netns.
>
> I don't remember your x-netns code, but we have two choices:
>
> 1) Lookup the route of the netns which it is in
>
> If the address is not available in this netns, it will fail, this is expected
> since tunnel device is not a pure L2 device. Or maybe just fail
> early when we move it.
>
> 2) Lookup the route of the netns where it was created
>
> Transparent for upper layer, but as you said, the outer address is not
> available in this netns therefore hard to display. Just hiding this information
> doesn't seem wrong to me.
Your assumption here is that all daemons were started before the tunnel was
created. But this is not true: a daemon may be started later. Another case is
when a daemon crashes: we need to be able to restart it, and it should be able
to recover all the information it needs.

>
>
>>    - it's not possible to create the same netdevice with these infos.
>>
>
> This is expected, because after all you are already in a different netns.
>
A different netns only means a different network stack, not a different user ns
or mount ns or PID ns, ...
If you only play with netns, you may want to monitor all activities in all netns
(this is already possible) and be able to link information between netns
(this is what I'm trying to solve).

* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-09-25  8:53             ` Nicolas Dichtel
@ 2014-09-26  1:58               ` Cong Wang
  2014-09-26 13:38                 ` Nicolas Dichtel
  0 siblings, 1 reply; 67+ messages in thread
From: Cong Wang @ 2014-09-26  1:58 UTC (permalink / raw)
  To: Nicolas Dichtel
  Cc: netdev, containers, linux-kernel, linux-api, David Miller,
	Eric W. Biederman, Stephen Hemminger, Andrew Morton,
	Andy Lutomirski

On Thu, Sep 25, 2014 at 1:53 AM, Nicolas Dichtel
<nicolas.dichtel@6wind.com> wrote:
> On 24/09/2014 18:48, Cong Wang wrote:
>
>> On Wed, Sep 24, 2014 at 9:31 AM, Nicolas Dichtel
>> <nicolas.dichtel@6wind.com> wrote:
>>>>
>>>> I think in this case your ID's are still available, but aren't you
>>>> providing a new way
>>>> for the inner netns device to escape which we are trying to avoid?
>>>
>>>
>>> It's why the ids depend on user ns. Only if user ns are the same we allow
>>> to
>>> get an id for a peer netns.
>>
>>
>> Too late, userns is relatively new, relying on it breaks our existing
>> assumption.
>>
> I don't get your point. netns has been added in kernel after user ns:
> acce292c82d4 user namespace: add the framework => 2.6.23
> 5f256becd868 [NET]: Basic network namespace infrastructure. => 2.6.24

Was it complete in 2.6.x? I doubt it...

https://lkml.org/lkml/2014/8/20/826

    As at Linux 3.8, most relevant subsystems supported user namespaces,
    but a number of filesystems did not have the infrastructure
    needed to map user and group IDs between user namespaces.
    Linux 3.9 added the required infrastructure support for many of
    the remaining unsupported filesystems (Plan 9 (9P), Andrew File
    System (AFS), Ceph, CIFS, CODA, NFS, and OCFS2).  Linux 3.11
    added support the last of the unsupported major filesystems, XFS.


>
> In the kernel, each netns is linked with a user ns.

Are you saying that every time we create a netns we get a new userns?
This doesn't make sense to me.

* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-09-25  8:53           ` Nicolas Dichtel
@ 2014-09-26  2:09             ` Cong Wang
  2014-09-26 13:40               ` Nicolas Dichtel
  0 siblings, 1 reply; 67+ messages in thread
From: Cong Wang @ 2014-09-26  2:09 UTC (permalink / raw)
  To: Nicolas Dichtel
  Cc: netdev, containers, linux-kernel, linux-api, David Miller,
	Eric W. Biederman, Stephen Hemminger, Andrew Morton,
	Andy Lutomirski

On Thu, Sep 25, 2014 at 1:53 AM, Nicolas Dichtel
<nicolas.dichtel@6wind.com> wrote:
> On 24/09/2014 18:45, Cong Wang wrote:
>>
>> On Wed, Sep 24, 2014 at 9:27 AM, Nicolas Dichtel
>> <nicolas.dichtel@6wind.com> wrote:
>>>
>>> Now informations got with 'ip link' are wrong and incomplete:
>>>    - the link dev is now tunl0 instead of eth0, because we only got an
>>> ifindex
>>>      from the kernel without any netns informations.
>>
>>
>> This is not new, macvlan has the same problem. This is why I said
>> it is mostly a display problem, maybe just mark the ifindex as -1 or
>> something when it is not in this netns. At least I don't expect the inner
>> netns know anything outside, and I don't think I am the only one using
>> netns in this way.
>
> I understand your point but there is several use of netns. Netns can be used
> also to instantiate virtual routers. In this case, administrators or daemons
> need to be able to monitor and dump the configuration on all netns
> (particularly beeing able to identify fully x-netns interfaces). We start to
> discuss this in one of the two thread pointed in my cover letter and get the
> conclusion that checking user ns is a good way to know if an id should be
> disclosed or not for a peer netns.

Then you are leaking information, this breaks isolation.

> Can you describe your use case?

Yes, it is very simple: isolated networking. Different netns don't see each
other (including anything inside them) and only communicate via veth.


> If you only play with netns, you may want to monitor all activies in all
> netns
> (this is already possible) and beeing able to link information between netns
> (this is what I'm trying to solve).


No, I don't want to monitor anything. Even if I wanted, I would just start one
daemon in each netns instead of one for all.

On the other hand, why not exchange the configuration via veth
between different netns? There are many ways to do so, with TCP, HTTP, etc.
This doesn't have to be solved in the kernel.

* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-09-26  1:58               ` Cong Wang
@ 2014-09-26 13:38                 ` Nicolas Dichtel
  0 siblings, 0 replies; 67+ messages in thread
From: Nicolas Dichtel @ 2014-09-26 13:38 UTC (permalink / raw)
  To: Cong Wang
  Cc: netdev, containers, linux-kernel, linux-api, David Miller,
	Eric W. Biederman, Stephen Hemminger, Andrew Morton,
	Andy Lutomirski

On 26/09/2014 03:58, Cong Wang wrote:
> On Thu, Sep 25, 2014 at 1:53 AM, Nicolas Dichtel
> <nicolas.dichtel@6wind.com> wrote:
>> On 24/09/2014 18:48, Cong Wang wrote:
>>
>>> On Wed, Sep 24, 2014 at 9:31 AM, Nicolas Dichtel
>>> <nicolas.dichtel@6wind.com> wrote:
>>>>>
>>>>> I think in this case your ID's are still available, but aren't you
>>>>> providing a new way
>>>>> for the inner netns device to escape which we are trying to avoid?
>>>>
>>>>
>>>> It's why the ids depend on user ns. Only if user ns are the same we allow
>>>> to
>>>> get an id for a peer netns.
>>>
>>>
>>> Too late, userns is relatively new, relying on it breaks our existing
>>> assumption.
>>>
>> I don't get your point. netns has been added in kernel after user ns:
>> acce292c82d4 user namespace: add the framework => 2.6.23
>> 5f256becd868 [NET]: Basic network namespace infrastructure. => 2.6.24
>
> Was it complete on 2.6.x? I doubt...
>
> https://lkml.org/lkml/2014/8/20/826
>
>     As at Linux 3.8, most relevant subsystems supported  user  names‐
>         paces,  but  a number of filesystems did not have the infrastruc‐
>         ture needed to map user and group IDs  between  user  namespaces.
>         Linux  3.9  added the required infrastructure support for many of
>         the remaining unsupported filesystems (Plan 9 (9P),  Andrew  File
>         System  (AFS),  Ceph,  CIFS,  CODA,  NFS, and OCFS2).  Linux 3.11
>         added support the last of the unsupported major filesystems, XFS.
>
>
>>
>> In the kernel, each netns is linked with a user ns.
>
> Are you saying every time we create a netns we have a new userns?
> This doesn't make sense for me.
>
No. I mean that each netns depends on a userns.
See include/net/net_namespace.h:
struct net {
[snip]
         struct user_namespace   *user_ns;       /* Owning user namespace */
[snip]
}

* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-09-26  2:09             ` Cong Wang
@ 2014-09-26 13:40               ` Nicolas Dichtel
  2014-09-26 19:15                 ` David Ahern
  0 siblings, 1 reply; 67+ messages in thread
From: Nicolas Dichtel @ 2014-09-26 13:40 UTC (permalink / raw)
  To: Cong Wang
  Cc: netdev, containers, linux-kernel, linux-api, David Miller,
	Eric W. Biederman, Stephen Hemminger, Andrew Morton,
	Andy Lutomirski

On 26/09/2014 04:09, Cong Wang wrote:
> On Thu, Sep 25, 2014 at 1:53 AM, Nicolas Dichtel
> <nicolas.dichtel@6wind.com> wrote:
>> On 24/09/2014 18:45, Cong Wang wrote:
>>>
>>> On Wed, Sep 24, 2014 at 9:27 AM, Nicolas Dichtel
>>> <nicolas.dichtel@6wind.com> wrote:
>>>>
>>>> Now informations got with 'ip link' are wrong and incomplete:
>>>>     - the link dev is now tunl0 instead of eth0, because we only got an
>>>> ifindex
>>>>       from the kernel without any netns informations.
>>>
>>>
>>> This is not new, macvlan has the same problem. This is why I said
>>> it is mostly a display problem, maybe just mark the ifindex as -1 or
>>> something when it is not in this netns. At least I don't expect the inner
>>> netns know anything outside, and I don't think I am the only one using
>>> netns in this way.
>>
>> I understand your point but there is several use of netns. Netns can be used
>> also to instantiate virtual routers. In this case, administrators or daemons
>> need to be able to monitor and dump the configuration on all netns
>> (particularly beeing able to identify fully x-netns interfaces). We start to
>> discuss this in one of the two thread pointed in my cover letter and get the
>> conclusion that checking user ns is a good way to know if an id should be
>> disclosed or not for a peer netns.
>
> Then you are leaking information, this breaks isolation.
>
>> Can you describe your use case?
>
> Yes, too simple: isolation networking, different netns's don't see each other
> (including anything inside) and only communicate via veth.
If you are a privileged user and you are able to access a peer netns (move an
interface into this peer netns, move an interface from this peer netns to your
own netns), I don't see any reason not to be able to get information about
this peer netns (you are already a privileged user in both netns).
If you want to isolate this peer netns (I think you call it "inner netns"), you
have to create a new user ns for this netns, so that a privileged user in this
peer netns will not be able to act in your own netns. And with this scenario and
my patches, this privileged user will not be able to get an id. Isolation is
preserved.
How do you preserve it in your scenario?
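
The visibility rule argued for above can be sketched as a toy Python model.
This is not kernel code: NetNs, visible_peer_id and the user namespace labels
are hypothetical names, and the ids 3 and 1 are borrowed from the cover letter
example. It models only the stated rule (an id is disclosed only when both
netns belong to the same user namespace), not what the patches implement.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class NetNs:
    name: str
    user_ns: str  # label of the owning user namespace (hypothetical model)


def visible_peer_id(viewer: NetNs, peer: NetNs, ids: Dict[str, int]) -> Optional[int]:
    """Return the peer's id only if viewer and peer share a user namespace."""
    if viewer.user_ns != peer.user_ns:
        return None  # isolated: no id is disclosed
    return ids.get(peer.name)


ids = {"foo": 3, "init_net": 1}
init_net = NetNs("init_net", user_ns="root_userns")
foo = NetNs("foo", user_ns="root_userns")
inner = NetNs("inner", user_ns="child_userns")

print(visible_peer_id(foo, init_net, ids))    # same user ns: id is visible
print(visible_peer_id(inner, init_net, ids))  # different user ns: None
```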

>
>
>> If you only play with netns, you may want to monitor all activies in all
>> netns
>> (this is already possible) and beeing able to link information between netns
>> (this is what I'm trying to solve).
>
>
> No, I don't want to monitor anything. Even if I wanted, I would just start one
> daemon in each netns instead of one for all.
OK, you don't want it, but some other people (not only me) do! And having one
daemon per netns does not scale: there are scenarios with thousands of netns
which are dynamically created and deleted.

>
> On the other hand, why not exchange the configuration via veth
> between different netns? There are many ways to do so with TCP HTTP etc.
> This doesn't have to be solved in kernel.
>
The standard way to monitor network configuration on Linux is netlink.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-09-23 13:20 [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns Nicolas Dichtel
                   ` (6 preceding siblings ...)
  2014-09-23 19:26 ` Andy Lutomirski
@ 2014-09-26 18:10 ` Eric W. Biederman
  2014-09-26 18:26   ` Andy Lutomirski
  7 siblings, 1 reply; 67+ messages in thread
From: Eric W. Biederman @ 2014-09-26 18:10 UTC (permalink / raw)
  To: Nicolas Dichtel
  Cc: netdev, containers, linux-kernel, linux-api, davem, stephen,
	akpm, luto, Cong Wang

Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:

> The goal of this serie is to be able to multicast netlink messages with an
> attribute that identify a peer netns.
> This is needed by the userland to interpret some informations contained in
> netlink messages (like IFLA_LINK value, but also some other attributes in case
> of x-netns netdevice (see also
> http://thread.gmane.org/gmane.linux.network/315933/focus=316064 and
> http://thread.gmane.org/gmane.linux.kernel.containers/28301/focus=4239)).

I want to say that the problem addressed by patch 3/5 of this series is a
fundamentally valid problem.  We have network objects spanning network
namespaces and it would be very nice to be able to talk about them in
netlink, and file descriptors are too local and arguably too heavyweight
for netlink queries and especially for netlink broadcast messages.

Furthermore the internal concept of peernet2id seems valid.

However what you do not address is a way for CRIU (aka process
migration) to be able to restore these ids after process migration.
Going further, it looks like you are actively breaking process migration
at this time, making this set of patches a no-go.

When adding a new form of namespace id CRIU patches are just about
as necessary as iproute patches.

> Ids are stored in the parent user namespace. These ids are valid only inside
> this user namespace. The user can retrieve these ids via a new netlink messages,
> but only if peer netns are in the same user namespace.

That does not describe what you have actually implemented in the
patches.

I see two ways to go with this.

- A per network namespace table in which you can store ids for ``peer''
  network namespaces.  The table would need to be populated manually by
  the likes of ip netns add.

  That flips the order of assignment and makes this idea solid.

  Unfortunately in the case of a fully referencing mesh of N network
  namespaces such a mesh winds up taking O(N^2) space, which seems
  undesirable.

- Add a netlink attribute that says this network element is in a peer
  network namespace.

  Add a unicast query message that lets you ask if the remote
  end of a tunnel is in a network namespace specified by file
  descriptor.

I personally lean towards the second version as it is fundamentally
simpler, and generally scales better, and the visibility controls are
the existing visibility controls.  The only downside is it requires
a query after receiving a netlink broadcast message for the times that
we care.
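
To make the first option concrete, here is a rough Python model of a
per-namespace id table. It is only a sketch: build_full_mesh_tables and
total_entries are hypothetical helpers, and the automatic numbering stands in
for ids that would really be assigned by hand (for example by ip netns add).
It shows that ids are local to each table and that a full mesh of N
namespaces needs N * (N - 1) entries.

```python
def build_full_mesh_tables(names):
    """Give every namespace its own table mapping each peer to a local id."""
    tables = {}
    for owner in names:
        peers = [n for n in names if n != owner]
        # Ids are only meaningful inside the owner's table, so numbering
        # restarts at 1 for every namespace.
        tables[owner] = {peer: local_id
                         for local_id, peer in enumerate(peers, start=1)}
    return tables


def total_entries(tables):
    """Count the entries stored across all per-namespace tables."""
    return sum(len(table) for table in tables.values())


tables = build_full_mesh_tables(["init_net", "foo", "bar", "baz"])
# The same peer can carry different ids depending on who is looking.
print(tables["foo"]["bar"], tables["baz"]["bar"])
print(total_entries(tables))
```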

Eric

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-09-26 18:10 ` Eric W. Biederman
@ 2014-09-26 18:26   ` Andy Lutomirski
  2014-09-26 18:57     ` Eric W. Biederman
  0 siblings, 1 reply; 67+ messages in thread
From: Andy Lutomirski @ 2014-09-26 18:26 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Nicolas Dichtel, Network Development, Linux Containers,
	linux-kernel, Linux API, David S. Miller, Stephen Hemminger,
	Andrew Morton, Cong Wang

On Fri, Sep 26, 2014 at 11:10 AM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>
>> The goal of this serie is to be able to multicast netlink messages with an
>> attribute that identify a peer netns.
>> This is needed by the userland to interpret some informations contained in
>> netlink messages (like IFLA_LINK value, but also some other attributes in case
>> of x-netns netdevice (see also
>> http://thread.gmane.org/gmane.linux.network/315933/focus=316064 and
>> http://thread.gmane.org/gmane.linux.kernel.containers/28301/focus=4239)).
>
> I want say that the problem addressed by patch 3/5 of this series is a
> fundamentally valid problem.  We have network objects spanning network
> namespaces and it would be very nice to be able to talk about them in
> netlink, and file descriptors are too local and argubably too heavy
> weight for netlink quires and especially for netlink broadcast messages.
>
> Furthermore the concept of ineternal concept of peernet2id seems valid.
>
> However what you do not address is a way for CRIU (aka process
> migration) to be able to restore these ids after process migration.
> Going farther it looks like you are actively breaking process migration
> at this time, making this set of patches a no-go.
>
> When adding a new form of namespace id CRIU patches are just about
> as necessary as iproute patches.
>
>> Ids are stored in the parent user namespace. These ids are valid only inside
>> this user namespace. The user can retrieve these ids via a new netlink messages,
>> but only if peer netns are in the same user namespace.
>
> That does not describe what you have actually implemented in the
> patches.
>
> I see two ways to go with this.
>
> - A per network namespace table to that you can store ids for ``peer''
>   network namespaces.  The table would need to be populated manually by
>   the likes of ip netns add.
>
>   That flips the order of assignment and makes this idea solid.
>
>   Unfortunately in the case of a fully referencing mesh of N network
>   namespaces such a mesh winds up taking O(N^2) space, which seems
>   undesirable.
>
> - Add a netlink attribute that says this network element is in a peer
>   network namespace.
>
>   Add a unicast query message that let's you ask if the remote
>   end of a tunnel is in a network namespace specified by file
>   descriptor.
>
> I personally lean towards the second version as it is fundamentally
> simpler, and generally scales better, and the visibility controls are
> the existing visibility controls.  The only downside is it requires
> a query after receiving a netlink broadcast message for the times that
> we care.

The downside of that approach, and all the similar kcmp stuff, is that
it scales poorly for applications using it.  This is probably not the
end of the world, but it's not ideal.

--Andy

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-09-26 18:26   ` Andy Lutomirski
@ 2014-09-26 18:57     ` Eric W. Biederman
  2014-09-29 12:06       ` Nicolas Dichtel
  0 siblings, 1 reply; 67+ messages in thread
From: Eric W. Biederman @ 2014-09-26 18:57 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Nicolas Dichtel, Network Development, Linux Containers,
	linux-kernel, Linux API, David S. Miller, Stephen Hemminger,
	Andrew Morton, Cong Wang

Andy Lutomirski <luto@amacapital.net> writes:

> On Fri, Sep 26, 2014 at 11:10 AM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>>
>>> The goal of this serie is to be able to multicast netlink messages with an
>>> attribute that identify a peer netns.
>>> This is needed by the userland to interpret some informations contained in
>>> netlink messages (like IFLA_LINK value, but also some other attributes in case
>>> of x-netns netdevice (see also
>>> http://thread.gmane.org/gmane.linux.network/315933/focus=316064 and
>>> http://thread.gmane.org/gmane.linux.kernel.containers/28301/focus=4239)).
>>
>> I want say that the problem addressed by patch 3/5 of this series is a
>> fundamentally valid problem.  We have network objects spanning network
>> namespaces and it would be very nice to be able to talk about them in
>> netlink, and file descriptors are too local and argubably too heavy
>> weight for netlink quires and especially for netlink broadcast messages.
>>
>> Furthermore the concept of ineternal concept of peernet2id seems valid.
>>
>> However what you do not address is a way for CRIU (aka process
>> migration) to be able to restore these ids after process migration.
>> Going farther it looks like you are actively breaking process migration
>> at this time, making this set of patches a no-go.
>>
>> When adding a new form of namespace id CRIU patches are just about
>> as necessary as iproute patches.
>>
>>> Ids are stored in the parent user namespace. These ids are valid only inside
>>> this user namespace. The user can retrieve these ids via a new netlink messages,
>>> but only if peer netns are in the same user namespace.
>>
>> That does not describe what you have actually implemented in the
>> patches.
>>
>> I see two ways to go with this.
>>
>> - A per network namespace table to that you can store ids for ``peer''
>>   network namespaces.  The table would need to be populated manually by
>>   the likes of ip netns add.
>>
>>   That flips the order of assignment and makes this idea solid.
>>
>>   Unfortunately in the case of a fully referencing mesh of N network
>>   namespaces such a mesh winds up taking O(N^2) space, which seems
>>   undesirable.
>>
>> - Add a netlink attribute that says this network element is in a peer
>>   network namespace.
>>
>>   Add a unicast query message that let's you ask if the remote
>>   end of a tunnel is in a network namespace specified by file
>>   descriptor.
>>
>> I personally lean towards the second version as it is fundamentally
>> simpler, and generally scales better, and the visibility controls are
>> the existing visibility controls.  The only downside is it requires
>> a query after receiving a netlink broadcast message for the times that
>> we care.
>
> The downside of that approach, and all the similar kcmp stuff, is that
> it scales poorly for applications using it.  This is probably not the
> end of the world, but it's not ideal.

Agreed, the efficiency is not ideal and there is plenty of room for
optimization.  We could certainly adopt some of kcmp's ordering
infrastructure to make it suck less, or even potentially work out how
to return a file descriptor to the network namespace in question.

The key insight of my second proposal is that we can get out of the
broadcast message business, and only care about the remote namespace for
unicast messages.  Putting the work in an infrequently used slow path
instead of a comparatively common path gives us much more freedom in
the implementation.
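
The split between a cheap broadcast and an on-demand unicast query can be
sketched as follows. This is a toy model under stated assumptions, not a
netlink API: ToyKernel, broadcast, query_link_in and resolve_peer are
hypothetical names, and plain strings stand in for the file descriptors that
would identify a namespace.

```python
class ToyKernel:
    """Stand-in for the kernel side; maps each device to its link netns."""

    def __init__(self, link_netns):
        self._link_netns = dict(link_netns)
        self.queries = 0  # number of unicast queries served

    def broadcast(self, dev, own_netns):
        # The broadcast only says whether the link lives in a peer netns,
        # never which one.
        return {"dev": dev,
                "link_in_peer_netns": self._link_netns[dev] != own_netns}

    def query_link_in(self, dev, candidate_netns):
        # Unicast slow path: "is the remote end in this namespace?"
        self.queries += 1
        return self._link_netns[dev] == candidate_netns


def resolve_peer(kernel, msg, known_netns):
    """Query only when the flag is set and the listener cares."""
    if not msg["link_in_peer_netns"]:
        return None
    for candidate in known_netns:
        if kernel.query_link_in(msg["dev"], candidate):
            return candidate
    return None


kernel = ToyKernel({"ipip1": "init_net", "lo": "foo"})
msg = kernel.broadcast("ipip1", own_netns="foo")
print(resolve_peer(kernel, msg, ["bar", "init_net"]))
print(kernel.queries)
```

The linear scan over candidate namespaces in resolve_peer is where the
scaling concern raised earlier in the thread would show up.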

Eric

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-09-26 13:40               ` Nicolas Dichtel
@ 2014-09-26 19:15                 ` David Ahern
  2014-09-26 19:34                   ` Eric W. Biederman
  0 siblings, 1 reply; 67+ messages in thread
From: David Ahern @ 2014-09-26 19:15 UTC (permalink / raw)
  To: nicolas.dichtel, Cong Wang
  Cc: netdev, containers, linux-kernel, linux-api, David Miller,
	Eric W. Biederman, Stephen Hemminger, Andrew Morton,
	Andy Lutomirski

On 9/26/14, 7:40 AM, Nicolas Dichtel wrote:
>>
>>
>> No, I don't want to monitor anything. Even if I wanted, I would just
>> start one
>> daemon in each netns instead of one for all.
> Ok you don't want, but some other people (not only me) want it! And
> having one
> daemon per netns does not scale: there are scenarii with thousand netns
> which
> are dynamically created and deleted.

An example of the scaling problem using quagga (old but still seems to 
be a relevant data point):

 
https://lists.quagga.net/pipermail/quagga-users/2010-February/011351.html

"2k VRFs that would be 2.6G"

And that does not include the overhead of each namespace -- roughly 
200kB/namespace on one kernel I checked (v3.10). So that's a ballpark of 
3G of memory.
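
The ballpark above can be reproduced with a few lines of Python. The inputs
are the figures quoted in this message (2k VRFs, 2.6G, 200kB/namespace); the
decimal unit conversion and the variable names are assumptions made for
illustration.

```python
NAMESPACES = 2000        # "2k VRFs"
DAEMON_TOTAL_GB = 2.6    # quagga figure quoted from the linked post
PER_NETNS_KB = 200       # rough per-namespace overhead measured on v3.10

# Assumes decimal units: 1 GB = 1,000,000 kB.
netns_overhead_gb = NAMESPACES * PER_NETNS_KB / 1_000_000
total_gb = DAEMON_TOTAL_GB + netns_overhead_gb

print(round(netns_overhead_gb, 2), round(total_gb, 2))
```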

David

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-09-26 19:15                 ` David Ahern
@ 2014-09-26 19:34                   ` Eric W. Biederman
  2014-09-26 19:44                     ` David Ahern
  0 siblings, 1 reply; 67+ messages in thread
From: Eric W. Biederman @ 2014-09-26 19:34 UTC (permalink / raw)
  To: David Ahern
  Cc: nicolas.dichtel, Cong Wang, netdev, containers, linux-kernel,
	linux-api, David Miller, Stephen Hemminger, Andrew Morton,
	Andy Lutomirski

David Ahern <lxhacker68@gmail.com> writes:

> On 9/26/14, 7:40 AM, Nicolas Dichtel wrote:
>>>
>>>
>>> No, I don't want to monitor anything. Even if I wanted, I would just
>>> start one
>>> daemon in each netns instead of one for all.
>> Ok you don't want, but some other people (not only me) want it! And
>> having one
>> daemon per netns does not scale: there are scenarii with thousand netns
>> which
>> are dynamically created and deleted.
>
> An example of the scaling problem using quagga (old but still seems to be a
> relevant data point):
>
>
> https://lists.quagga.net/pipermail/quagga-users/2010-February/011351.html
>
> "2k VRFs that would be 2.6G"
>
> And that does not include the overhead of each namespace -- roughly
> 200kB/namespace on one kernel I checked (v3.10). So that's a ballpark of 3G of
> memory.

Resetting the conversation just a little bit.

When I wrote the "ip netns" support I never expected that all
applications would want to run in a specific network namespace.  All
that is needed is one socket per network namespace.

Furthermore one socket or one process per network namespace is
completely orthogonal to the patches presented.  I do not see
identifying where the far end of a veth pair or similar set of
networking objects lives as anything that even closely resembles a path
to using only a single socket.

So I think this whole subthread is quite silly and grossly off track.

Eric

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-09-26 19:34                   ` Eric W. Biederman
@ 2014-09-26 19:44                     ` David Ahern
  2014-09-26 20:45                       ` Eric W. Biederman
  0 siblings, 1 reply; 67+ messages in thread
From: David Ahern @ 2014-09-26 19:44 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: nicolas.dichtel, Cong Wang, netdev, containers, linux-kernel,
	linux-api, David Miller, Stephen Hemminger, Andrew Morton,
	Andy Lutomirski

On 9/26/14, 1:34 PM, Eric W. Biederman wrote:
> When I wrote the "ip netns" support I never expected that all
> applications would want to run in a specific network namespace.  All
> that is needed is one socket per network namespace.

Sure that is another option. But for a process to create a socket or 
thread in a second namespace it has to run as root -- CAP_SYS_ADMIN is 
needed for setns (or perhaps there is another way to create the socket 
or thread in the namespace).

Second, it still does not address the scalability problem. For example a 
single daemon providing service across 2k namespaces means it needs 2k 
listen sockets. From there a system could have 20, 30 or 50 services 
running. Certainly lighter than a process per namespace, but not even 
close to ideal when talking about something like VRFs.
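
As a quick illustration of the numbers involved, assuming one listen socket
per service per namespace (listen_sockets is a hypothetical helper, not an
existing API):

```python
def listen_sockets(namespaces, services):
    """One listen socket per (service, namespace) pair."""
    return namespaces * services


for services in (20, 30, 50):
    print(services, listen_sockets(2000, services))
```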

David

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-09-26 19:44                     ` David Ahern
@ 2014-09-26 20:45                       ` Eric W. Biederman
  2014-09-26 20:56                         ` David Ahern
  0 siblings, 1 reply; 67+ messages in thread
From: Eric W. Biederman @ 2014-09-26 20:45 UTC (permalink / raw)
  To: David Ahern
  Cc: nicolas.dichtel, Cong Wang, netdev, containers, linux-kernel,
	linux-api, David Miller, Stephen Hemminger, Andrew Morton,
	Andy Lutomirski

David Ahern <lxhacker68@gmail.com> writes:

> On 9/26/14, 1:34 PM, Eric W. Biederman wrote:
>> When I wrote the "ip netns" support I never expected that all
>> applications would want to run in a specific network namespace.  All
>> that is needed is one socket per network namespace.
>
> Sure that is another option. But for a process to create a socket or
> thread in a second namespace it has to run as root -- CAP_SYS_ADMIN is
> needed for setns (or perhaps there is another way to create the socket
> or thread in the namespace).

To do anything other than simply listen on a netlink socket you also
have to be root.  So in most cases that I am aware of this is a
don't care.  Especially for routing daemons.

If it becomes a common pain in writing network namespace aware
applications that you have to be root just to open your listening
socket then that probably would be sufficient justification for the
socketat system call that I have prototyped and then never did
anything with because at the time it was insufficiently interesting.

> Second, it still does not address the scalability problem. For example
> a single daemon providing service across 2k namespaces means it needs
> 2k listen sockets. From there a system could have 20, 30 or 50
> services running. Certainly lighter than a process per namespace, but
> not even close to ideal when talking about something like VRFs.

Ah.  You are talking about a system with 2k namespaces and 20-50
services providing services in all 2k namespaces. Something completely
different than the case of quagga you mentioned earlier.

I expect quagga would need one netlink control socket and one socket
listening to netlink events, and a tcp connection or two to remote bgp
servers in each network namespace.  In that case I don't see anything
except a small constant difference in ways it can be handled.

For your new example of a crazy number of servers running on a box, each
of which has one listening socket in each network namespace, maybe
they will be idle most of the time in most network namespaces and the
overhead will be significant.  Shrug.  Those applications don't appear to
exist so I can't say what would make a good design.

If someone writes them and describes what is going on we can see if the
current set of interfaces is ideal or problematic.  If there are
significantly better interfaces that can be provided in a maintainable
way I imagine the patches would be easily accepted.

But again this has nothing to do with the peer netns work.  So if you have
something practical to contribute please start a new thread.

Eric

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-09-26 20:45                       ` Eric W. Biederman
@ 2014-09-26 20:56                         ` David Ahern
  0 siblings, 0 replies; 67+ messages in thread
From: David Ahern @ 2014-09-26 20:56 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: nicolas.dichtel, Cong Wang, netdev, containers, linux-kernel,
	linux-api, David Miller, Stephen Hemminger, Andrew Morton,
	Andy Lutomirski

On 9/26/14, 2:45 PM, Eric W. Biederman wrote:
> Ah.  You are talking about a system with 2k namespaces and 20-50
> services providing services in all 2k namespaces. Something completely
> different than the case of quagga you mentioned earlier.

Not at all. The earlier quagga example was a starting point on the 
bigger topic -- inefficiencies of namespaces as VRFs. In all of the 
products I have worked on there is always more than 1 service running on 
the system.

> But again this has nothing do with the peer netns work.  So if you have
> something practical to contribute please start a new thread.

Sure, I'll start a new thread.

Thanks,
David

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-09-26 18:57     ` Eric W. Biederman
@ 2014-09-29 12:06       ` Nicolas Dichtel
  2014-09-29 18:43         ` Eric W. Biederman
  0 siblings, 1 reply; 67+ messages in thread
From: Nicolas Dichtel @ 2014-09-29 12:06 UTC (permalink / raw)
  To: Eric W. Biederman, Andy Lutomirski
  Cc: Network Development, Linux Containers, linux-kernel, Linux API,
	David S. Miller, Stephen Hemminger, Andrew Morton, Cong Wang

On 26/09/2014 20:57, Eric W. Biederman wrote:
> Andy Lutomirski <luto@amacapital.net> writes:
>
>> On Fri, Sep 26, 2014 at 11:10 AM, Eric W. Biederman
>> <ebiederm@xmission.com> wrote:
>>> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>>>
>>>> The goal of this serie is to be able to multicast netlink messages with an
>>>> attribute that identify a peer netns.
>>>> This is needed by the userland to interpret some informations contained in
>>>> netlink messages (like IFLA_LINK value, but also some other attributes in case
>>>> of x-netns netdevice (see also
>>>> http://thread.gmane.org/gmane.linux.network/315933/focus=316064 and
>>>> http://thread.gmane.org/gmane.linux.kernel.containers/28301/focus=4239)).
>>>
>>> I want say that the problem addressed by patch 3/5 of this series is a
>>> fundamentally valid problem.  We have network objects spanning network
>>> namespaces and it would be very nice to be able to talk about them in
>>> netlink, and file descriptors are too local and argubably too heavy
>>> weight for netlink quires and especially for netlink broadcast messages.
>>>
>>> Furthermore the concept of ineternal concept of peernet2id seems valid.
>>>
>>> However what you do not address is a way for CRIU (aka process
>>> migration) to be able to restore these ids after process migration.
>>> Going farther it looks like you are actively breaking process migration
>>> at this time, making this set of patches a no-go.
Ok, I will look more deeply into CRIU.

>>>
>>> When adding a new form of namespace id CRIU patches are just about
>>> as necessary as iproute patches.
Noted.

>>>
>>>> Ids are stored in the parent user namespace. These ids are valid only inside
>>>> this user namespace. The user can retrieve these ids via a new netlink messages,
>>>> but only if peer netns are in the same user namespace.
>>>
>>> That does not describe what you have actually implemented in the
>>> patches.
>>>
>>> I see two ways to go with this.
>>>
>>> - A per network namespace table to that you can store ids for ``peer''
>>>    network namespaces.  The table would need to be populated manually by
>>>    the likes of ip netns add.
>>>
>>>    That flips the order of assignment and makes this idea solid.
I have a preference for this solution, because it allows complete
broadcast messages. When you have a lot of network interfaces (> 10k),
avoiding another request to get all the information saves a lot of time.

>>>
>>>    Unfortunately in the case of a fully referencing mesh of N network
>>>    namespaces such a mesh winds up taking O(N^2) space, which seems
>>>    undesirable.
Memory consumption vs performance ;-)
In fact, when you have a lot of netns, you should already have some memory
available (at least N lo interfaces + N interfaces (veth or an x-netns
interface)). I'm not convinced that this is really an obstacle.

>>>
>>> - Add a netlink attribute that says this network element is in a peer
>>>    network namespace.
>>>
>>>    Add a unicast query message that let's you ask if the remote
>>>    end of a tunnel is in a network namespace specified by file
>>>    descriptor.
>>>
>>> I personally lean towards the second version as it is fundamentally
>>> simpler, and generally scales better, and the visibility controls are
>>> the existing visibility controls.  The only downside is it requires
>>> a query after receiving a netlink broadcast message for the times that
>>> we care.
>>
>> The downside of that approach, and all the similar kcmp stuff, is that
>> it scales poorly for applications using it.  This is probably not the
>> end of the world, but it's not ideal.
>
> Agreed, the efficiency is not ideal and there is plenty of room for
> optimization.  We could certainly adopt some of kcmps ordering
> infrastructure to make it suck less, or even potentially work out how
> to return a file descriptor to the network namespace in question.
>
> The key insight of my second proposal is that we can get out of the
> broadcast message business, and only care about the remote namespace for
> unicast messages.  Putting the work in an infrequently used slow path
> instead of a comparitively common path gives us much more freedom in
> the implementation.
I think it's better to have a complete netlink message than a partial one.
There are already a lot of attributes added to each rtnl interface message to
be sure to describe all parameters of these interfaces.
And if the user doesn't care about ids (the user has not set any id with
iproute2), we can just add the same attribute with id 0 (let's say it's a
reserved id) to indicate that the link part of this interface is in another
netns.
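
The reserved-id idea can be sketched like this. It is a toy model, not the
rtnetlink code: link_netns_attr and RESERVED_UNASSIGNED_ID are hypothetical
names, and the choice of 0 as the reserved value follows the suggestion in
the paragraph above.

```python
RESERVED_UNASSIGNED_ID = 0  # "let's say it's a reserved id"


def link_netns_attr(own_netns, link_netns, assigned_ids):
    """Return the id to put in the message, or None for no attribute."""
    if link_netns == own_netns:
        return None  # link is local: no peer netns attribute at all
    # Peer netns: use the id userspace assigned, else the reserved value.
    return assigned_ids.get(link_netns, RESERVED_UNASSIGNED_ID)


assigned = {"init_net": 1}
print(link_netns_attr("foo", "foo", assigned))       # local link
print(link_netns_attr("foo", "init_net", assigned))  # peer with an id
print(link_netns_attr("foo", "bar", assigned))       # peer without an id
```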

The great benefit of your first proposal is that the ids are set by
userspace, which allows great flexibility.

Would you accept a patch that implements this first solution?

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-09-29 12:06       ` Nicolas Dichtel
@ 2014-09-29 18:43         ` Eric W. Biederman
  2014-10-02 13:46           ` Nicolas Dichtel
  0 siblings, 1 reply; 67+ messages in thread
From: Eric W. Biederman @ 2014-09-29 18:43 UTC (permalink / raw)
  To: nicolas.dichtel
  Cc: Andy Lutomirski, Network Development, Linux Containers,
	linux-kernel, Linux API, David S. Miller, Stephen Hemminger,
	Andrew Morton, Cong Wang

Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:

> Le 26/09/2014 20:57, Eric W. Biederman a écrit :
>> Andy Lutomirski <luto@amacapital.net> writes:
>>
>>> On Fri, Sep 26, 2014 at 11:10 AM, Eric W. Biederman
>>> <ebiederm@xmission.com> wrote:
>>>> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>>>>
>>>>> The goal of this serie is to be able to multicast netlink messages with an
>>>>> attribute that identify a peer netns.
>>>>> This is needed by the userland to interpret some informations contained in
>>>>> netlink messages (like IFLA_LINK value, but also some other attributes in case
>>>>> of x-netns netdevice (see also
>>>>> http://thread.gmane.org/gmane.linux.network/315933/focus=316064 and
>>>>> http://thread.gmane.org/gmane.linux.kernel.containers/28301/focus=4239)).
>>>>
>>>> I want say that the problem addressed by patch 3/5 of this series is a
>>>> fundamentally valid problem.  We have network objects spanning network
>>>> namespaces and it would be very nice to be able to talk about them in
>>>> netlink, and file descriptors are too local and argubably too heavy
>>>> weight for netlink quires and especially for netlink broadcast messages.
>>>>
>>>> Furthermore the concept of ineternal concept of peernet2id seems valid.
>>>>
>>>> However what you do not address is a way for CRIU (aka process
>>>> migration) to be able to restore these ids after process migration.
>>>> Going farther it looks like you are actively breaking process migration
>>>> at this time, making this set of patches a no-go.
> Ok, I will look more deeply into CRIU.
>
>>>>
>>>> When adding a new form of namespace id CRIU patches are just about
>>>> as necessary as iproute patches.
> Noted.



>>>> That does not describe what you have actually implemented in the
>>>> patches.
>>>>
>>>> I see two ways to go with this.
>>>>
>>>> - A per network namespace table to that you can store ids for ``peer''
>>>>    network namespaces.  The table would need to be populated manually by
>>>>    the likes of ip netns add.
>>>>
>>>>    That flips the order of assignment and makes this idea solid.
> I have a preference for this solution, because it allows to have a full
> broadcast messages. When you have a lot of network interfaces (> 10k),
> it saves a lot of time to avoid another request to get all informations.

My practical question is how often does it happen that we care?

>>>>    Unfortunately in the case of a fully referencing mesh of N network
>>>>    namespaces such a mesh winds up taking O(N^2) space, which seems
>>>>    undesirable.
> Memory consumption vs performances ;-)
> In fact, when you have a lot of netns, you already should have some memory
> available (at least N lo interfaces + N interfaces (veth or a x-netns
> interface)). I'm not convinced that this is really an obstacle.

I would have to see how it all fits together. O(N^2) grows a lot faster
than N.  So after a point it isn't in the same ballpark of memory
consumption.
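
A back-of-the-envelope comparison, assuming a full mesh where every namespace
holds an id for every other one, and using the 2 * N interfaces mentioned in
the quoted text as the baseline. Counts are table entries and interfaces, not
bytes; both helper names are hypothetical.

```python
def mesh_table_entries(n):
    """Entries needed when each of n namespaces tracks all n - 1 peers."""
    return n * (n - 1)


def baseline_interfaces(n):
    """N lo interfaces plus N veth or x-netns interfaces."""
    return 2 * n


for n in (10, 100, 2000):
    entries = mesh_table_entries(n)
    baseline = baseline_interfaces(n)
    print(n, entries, baseline, entries / baseline)
```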

>> broadcast message business, and only care about the remote namespace for
>> unicast messages.  Putting the work in an infrequently used slow path
>> instead of a comparitively common path gives us much more freedom in
>> the implementation.
> I think it's better to have a full netlink messages, instead a partial one.
> There is already a lot of attributes added for each rtnl interface messages to
> be sure to describe all parameters of these interfaces.
> And if the user don't care about ids (user has not set any id with iproute2),
> we can just add the same attribute with id 0 (let's say it's a reserved id) to
> indicate that the link part of this interface is in another netns.

I imagine an id like that is something we would want ip netns add to
set, and probably set in all existing network namespaces as well.

> The great benefit of your first proposal is that the ids are set by the
> userspace and thus it allows a high flexibility.
>
> Would you accept a patch that implements this first solution?

I would not fundamentally reject it.  I would really like to make
certain we think through how it will be used and what the practical
benefits are.  Depending on how it is used the data structure could
be a killer or it could be a case where we see how to manage it and
simply don't care.

Eric


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-09-29 18:43         ` Eric W. Biederman
@ 2014-10-02 13:46           ` Nicolas Dichtel
  2014-10-02 13:48             ` [RFC PATCH net-next v3 0/4] " Nicolas Dichtel
  2014-10-02 19:20             ` [RFC PATCH net-next v2 0/5] " Eric W. Biederman
  0 siblings, 2 replies; 67+ messages in thread
From: Nicolas Dichtel @ 2014-10-02 13:46 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andy Lutomirski, Network Development, Linux Containers,
	linux-kernel, Linux API, David S. Miller, Stephen Hemminger,
	Andrew Morton, Cong Wang

On 29/09/2014 20:43, Eric W. Biederman wrote:
> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>
>> On 26/09/2014 20:57, Eric W. Biederman wrote:
>>> Andy Lutomirski <luto@amacapital.net> writes:
>>>
>>>> On Fri, Sep 26, 2014 at 11:10 AM, Eric W. Biederman
>>>> <ebiederm@xmission.com> wrote:
>>>>> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>>>>>
>>>>>> The goal of this series is to be able to multicast netlink messages with an
>>>>>> attribute that identifies a peer netns.
>>>>>> This is needed by userland to interpret some information contained in
>>>>>> netlink messages (like IFLA_LINK value, but also some other attributes in case
>>>>>> of x-netns netdevice (see also
>>>>>> http://thread.gmane.org/gmane.linux.network/315933/focus=316064 and
>>>>>> http://thread.gmane.org/gmane.linux.kernel.containers/28301/focus=4239)).
>>>>>
>>>>> I want to say that the problem addressed by patch 3/5 of this series is a
>>>>> fundamentally valid problem.  We have network objects spanning network
>>>>> namespaces and it would be very nice to be able to talk about them in
>>>>> netlink, and file descriptors are too local and arguably too heavy
>>>>> weight for netlink queries and especially for netlink broadcast messages.
>>>>>
>>>>> Furthermore the internal concept of peernet2id seems valid.
>>>>>
>>>>> However what you do not address is a way for CRIU (aka process
>>>>> migration) to be able to restore these ids after process migration.
>>>>> Going farther it looks like you are actively breaking process migration
>>>>> at this time, making this set of patches a no-go.
>> Ok, I will look more deeply into CRIU.
>>
>>>>>
>>>>> When adding a new form of namespace id CRIU patches are just about
>>>>> as necessary as iproute patches.
>> Noted.
>
>
>
>>>>> That does not describe what you have actually implemented in the
>>>>> patches.
>>>>>
>>>>> I see two ways to go with this.
>>>>>
>>>>> - A per network namespace table so that you can store ids for ``peer''
>>>>>     network namespaces.  The table would need to be populated manually by
>>>>>     the likes of ip netns add.
>>>>>
>>>>>     That flips the order of assignment and makes this idea solid.
>> I have a preference for this solution, because it allows full
>> broadcast messages. When you have a lot of network interfaces (> 10k),
>> it saves a lot of time to avoid another request to get all the information.
>
> My practical question is how often does it happen that we care?
In fact, I don't think that scenarios with a lot of netns have a full mesh of
x-netns interfaces. More likely there is one "link" netns with the physical
interface, and every other netns has one interface whose link part is in this
"link" netns. Hence, only one nsid is needed in each netns.
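
To put rough numbers on this argument, here is a small sketch that counts id
table entries for the two layouts discussed in this thread. The function names
are made up for illustration, and the count assumes one table entry per
(netns, peer) pair, which is my reading of the per-netns table proposal.

```c
/* Full mesh: each of the n netns stores an id for each of the other n - 1. */
unsigned long long full_mesh_entries(unsigned long long n)
{
	return n ? n * (n - 1) : 0;
}

/*
 * Hub layout: one "link" netns, and each of the other n - 1 netns stores
 * a single id that points at it.
 */
unsigned long long hub_entries(unsigned long long n)
{
	return n ? n - 1 : 0;
}
```

With 10000 netns this gives 99990000 entries for a full mesh against 9999 for
the hub layout, which is why the O(N^2) worst case should rarely matter in
practice.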

>
>>>>>     Unfortunately in the case of a fully referencing mesh of N network
>>>>>     namespaces such a mesh winds up taking O(N^2) space, which seems
>>>>>     undesirable.
>> Memory consumption vs performances ;-)
>> In fact, when you have a lot of netns, you already should have some memory
>> available (at least N lo interfaces + N interfaces (veth or a x-netns
>> interface)). I'm not convinced that this is really an obstacle.
>
> I would have to see how it all fits together. O(N^2) grows a lot faster
> than N.  So after a point it isn't in the same ballpark of memory
> consumption.
>
>>> broadcast message business, and only care about the remote namespace for
>>> unicast messages.  Putting the work in an infrequently used slow path
>>> instead of a comparatively common path gives us much more freedom in
>>> the implementation.
>> I think it's better to have a full netlink message instead of a partial one.
>> There are already a lot of attributes added to each rtnl interface message to
>> be sure to describe all parameters of these interfaces.
>> And if the user doesn't care about ids (user has not set any id with iproute2),
>> we can just add the same attribute with id 0 (let's say it's a reserved id) to
>> indicate that the link part of this interface is in another netns.
>
> I imagine an id like that is something we would want ip netns add to
> set, and probably set in all existing network namespaces as well.
>
>> The great benefit of your first proposal is that the ids are set by the
>> userspace and thus it allows a high flexibility.
>>
>> Would you accept a patch that implements this first solution?
>
> I would not fundamentally reject it.  I would really like to make
> certain we think through how it will be used and what the practical
> benefits are.  Depending on how it is used the data structure could
> be a killer or it could be a case where we see how to manage it and
> simply don't care.
I will send a v3, so we can talk about it.


Thank you,
Nicolas

^ permalink raw reply	[flat|nested] 67+ messages in thread

* [RFC PATCH net-next v3 0/4] netns: allow to identify peer netns
  2014-10-02 13:46           ` Nicolas Dichtel
@ 2014-10-02 13:48             ` Nicolas Dichtel
  2014-10-02 13:48               ` [RFC PATCH net-next v3 1/4] netns: add genl cmd to add and get peer netns ids Nicolas Dichtel
                                 ` (4 more replies)
  2014-10-02 19:20             ` [RFC PATCH net-next v2 0/5] " Eric W. Biederman
  1 sibling, 5 replies; 67+ messages in thread
From: Nicolas Dichtel @ 2014-10-02 13:48 UTC (permalink / raw)
  To: netdev, containers, linux-kernel, linux-api
  Cc: davem, ebiederm, stephen, akpm, luto, cwang

The goal of this series is to be able to multicast netlink messages with an
attribute that identifies a peer netns.
This is needed by userland to interpret some information contained in
netlink messages (like the IFLA_LINK value, but also some other attributes in
the case of x-netns netdevices (see also
http://thread.gmane.org/gmane.linux.network/315933/focus=316064 and
http://thread.gmane.org/gmane.linux.kernel.containers/28301/focus=4239)).

Ids of peer netns are set by userland via a new genl message. These ids are
stored per netns and are local (i.e. only valid in the netns where they are set).
To avoid allocating an int for each peer netns, I use idr_for_each() to retrieve
the id of a peer netns.
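
The reverse lookup described above (walk the per-netns id table until the peer
is found) can be modelled in plain userspace C. This is only a sketch, not
kernel code: struct id_table, table_for_each(), peer_to_id() and table_set()
are hypothetical stand-ins, and a fixed-size array replaces the real idr. Like
idr_for_each(), the iterator stops on the first non-zero callback return, which
is why id 0 has to be reported through a magic value.

```c
#include <stddef.h>

#define MAX_IDS     16
#define ID_UNKNOWN  (-1)	/* plays the role of NETNSA_NSINDEX_UNKNOWN */

/* Hypothetical stand-in for the per-netns idr: the slot index is the id. */
struct id_table {
	const void *slot[MAX_IDS];
};

/* Mimics idr_for_each(): stop at the first non-zero callback return. */
static int table_for_each(const struct id_table *t,
			  int (*fn)(int id, const void *entry, const void *data),
			  const void *data)
{
	for (int id = 0; id < MAX_IDS; id++) {
		int ret;

		if (!t->slot[id])
			continue;
		ret = fn(id, t->slot[id], data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Returning 0 would not stop the walk, so id 0 is reported as -1. */
static int match_peer(int id, const void *entry, const void *peer)
{
	if (entry == peer)
		return id ? id : -1;
	return 0;
}

/* Returns the id of peer, or ID_UNKNOWN if peer has no id in this table. */
int peer_to_id(const struct id_table *t, const void *peer)
{
	int ret = table_for_each(t, match_peer, peer);

	if (ret == -1)		/* magic value for id 0 */
		return 0;
	if (ret == 0)		/* walk finished without a match */
		return ID_UNKNOWN;
	return ret;
}

/* Assigns id to peer; fails if the id is taken or peer already has one. */
int table_set(struct id_table *t, int id, const void *peer)
{
	if (id < 0 || id >= MAX_IDS || t->slot[id])
		return -1;
	if (peer_to_id(t, peer) != ID_UNKNOWN)
		return -1;
	t->slot[id] = peer;
	return 0;
}
```

The trade-off is the one stated above: no extra int is stored per peer, at the
cost of a linear walk for every id lookup.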

Patch 1/4 introduces the netlink API mechanism to set and get these ids.
Patches 2/4 and 3/4 show an example of how to use these ids in rtnetlink
messages. And patch 4/4 shows that the netlink messages can be symmetric between
a GET and a SET.

iproute2 patches are available; I can send them on request.

Here is a short session to show how it can be used by userland:
$ ip netns add foo
$ ip netns del foo
$ ip netns
$ touch /var/run/netns/init_net
$ mount --bind /proc/1/ns/net /var/run/netns/init_net
$ ip netns add foo
$ ip netns exec foo ip netns set init_net 0
$ ip netns
foo
init_net
$ ip netns exec foo ip netns
foo
init_net (id: 0)
$ ip netns exec foo ip link add ipip1 link-netnsid 0 type ipip remote 10.16.0.121 local 10.16.0.249
$ ip netns exec foo ip l ls ipip1
6: ipip1@NONE: <POINTOPOINT,NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default 
    link/ipip 10.16.0.249 peer 10.16.0.121 link-netnsid 0

The parameter link-netnsid shows us where the interface sends and receives
packets (and thus we know where encapsulated addresses are set).

RFCv2 -> RFCv3:
  ids are now defined by userland (via netlink). Ids are stored in each netns
  (and they are local to this netns).
  add get_link_net support for ip6 tunnels
  netnsid is now a s32 instead of a u32

RFCv1 -> RFCv2:
  remove useless ()
  ids are now stored in the user ns. It's possible to get an id for a peer netns
  only if the current netns and the peer netns have the same user ns parent.

 MAINTAINERS                  |   1 +
 include/net/ip6_tunnel.h     |   1 +
 include/net/ip_tunnels.h     |   1 +
 include/net/net_namespace.h  |   5 ++
 include/net/rtnetlink.h      |   2 +
 include/uapi/linux/Kbuild    |   1 +
 include/uapi/linux/if_link.h |   1 +
 include/uapi/linux/netns.h   |  31 +++++++
 net/core/net_namespace.c     | 195 +++++++++++++++++++++++++++++++++++++++++++
 net/core/rtnetlink.c         |  38 ++++++++-
 net/ipv4/ip_gre.c            |   2 +
 net/ipv4/ip_tunnel.c         |   8 ++
 net/ipv4/ip_vti.c            |   1 +
 net/ipv4/ipip.c              |   1 +
 net/ipv6/ip6_gre.c           |   1 +
 net/ipv6/ip6_tunnel.c        |   9 ++
 net/ipv6/ip6_vti.c           |   1 +
 net/ipv6/sit.c               |   1 +
 net/netlink/genetlink.c      |   4 +
 19 files changed, 301 insertions(+), 3 deletions(-)

Comments are welcome.

Regards,
Nicolas

^ permalink raw reply	[flat|nested] 67+ messages in thread

* [RFC PATCH net-next v3 1/4] netns: add genl cmd to add and get peer netns ids
  2014-10-02 13:48             ` [RFC PATCH net-next v3 0/4] " Nicolas Dichtel
@ 2014-10-02 13:48               ` Nicolas Dichtel
  2014-10-02 19:33                 ` Eric W. Biederman
  2014-10-02 13:48               ` [RFC PATCH net-next v3 2/4] rtnl: add link netns id to interface messages Nicolas Dichtel
                                 ` (3 subsequent siblings)
  4 siblings, 1 reply; 67+ messages in thread
From: Nicolas Dichtel @ 2014-10-02 13:48 UTC (permalink / raw)
  To: netdev, containers, linux-kernel, linux-api
  Cc: davem, ebiederm, stephen, akpm, luto, cwang, Nicolas Dichtel

With this patch, a user can define an id for a peer netns by providing a FD or a
PID. These ids are local to a netns (i.e. valid only within that netns).

This will be useful for netlink messages when an x-netns interface is dumped.
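
For reference, userland can obtain a FD for the netns of a given PID through
/proc, the same path that the cover letter bind-mounts for init_net. The sketch
below is illustrative only: netns_path() and open_netns_fd() are hypothetical
helper names, and building the genl message that carries the resulting FD is
left out.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>

/* Builds "/proc/<pid>/ns/net" in buf; returns 0 on success, -1 if truncated. */
int netns_path(char *buf, size_t len, pid_t pid)
{
	int n = snprintf(buf, len, "/proc/%ld/ns/net", (long)pid);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Returns a FD that refers to the netns of pid, or -1 on error. */
int open_netns_fd(pid_t pid)
{
	char path[64];

	if (netns_path(path, sizeof(path), pid) < 0)
		return -1;
	return open(path, O_RDONLY);
}
```

The FD is only meaningful inside the calling process, so it has to stay open
until the request that references it has been sent.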

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 MAINTAINERS                 |   1 +
 include/net/net_namespace.h |   5 ++
 include/uapi/linux/Kbuild   |   1 +
 include/uapi/linux/netns.h  |  31 +++++++
 net/core/net_namespace.c    | 195 ++++++++++++++++++++++++++++++++++++++++++++
 net/netlink/genetlink.c     |   4 +
 6 files changed, 237 insertions(+)
 create mode 100644 include/uapi/linux/netns.h

diff --git a/MAINTAINERS b/MAINTAINERS
index f8db3c3acc67..8e7f5d668e6a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6278,6 +6278,7 @@ F:	include/linux/netdevice.h
 F:	include/uapi/linux/in.h
 F:	include/uapi/linux/net.h
 F:	include/uapi/linux/netdevice.h
+F:	include/uapi/linux/netns.h
 F:	tools/net/
 F:	tools/testing/selftests/net/
 F:	lib/random32.c
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 361d26077196..d8847d978b59 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -59,6 +59,7 @@ struct net {
 	struct list_head	exit_list;	/* Use only net_mutex */
 
 	struct user_namespace   *user_ns;	/* Owning user namespace */
+	struct idr		netns_ids;
 
 	unsigned int		proc_inum;
 
@@ -289,6 +290,10 @@ static inline struct net *read_pnet(struct net * const *pnet)
 #define __net_initconst	__initconst
 #endif
 
+int peernet2id(struct net *net, struct net *peer);
+struct net *get_net_ns_by_id(struct net *net, int id);
+int netns_genl_register(void);
+
 struct pernet_operations {
 	struct list_head list;
 	int (*init)(struct net *net);
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index 70e150ebc6c9..33a0bbfe4736 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -276,6 +276,7 @@ header-y += netfilter_decnet.h
 header-y += netfilter_ipv4.h
 header-y += netfilter_ipv6.h
 header-y += netlink.h
+header-y += netns.h
 header-y += netrom.h
 header-y += nfc.h
 header-y += nfs.h
diff --git a/include/uapi/linux/netns.h b/include/uapi/linux/netns.h
new file mode 100644
index 000000000000..8ebb08885795
--- /dev/null
+++ b/include/uapi/linux/netns.h
@@ -0,0 +1,31 @@
+#ifndef _UAPI_LINUX_NETNS_H_
+#define _UAPI_LINUX_NETNS_H_
+
+/* Generic netlink messages */
+
+#define NETNS_GENL_NAME			"netns"
+#define NETNS_GENL_VERSION		0x1
+
+/* Commands */
+enum {
+	NETNS_CMD_UNSPEC,
+	NETNS_CMD_NEWID,
+	NETNS_CMD_GETID,
+	__NETNS_CMD_MAX,
+};
+
+#define NETNS_CMD_MAX		(__NETNS_CMD_MAX - 1)
+
+/* Attributes */
+enum {
+	NETNSA_NONE,
+#define NETNSA_NSINDEX_UNKNOWN	-1
+	NETNSA_NSID,
+	NETNSA_PID,
+	NETNSA_FD,
+	__NETNSA_MAX,
+};
+
+#define NETNSA_MAX		(__NETNSA_MAX - 1)
+
+#endif /* _UAPI_LINUX_NETNS_H_ */
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 7f155175bba8..4a5680ed42fb 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -15,6 +15,8 @@
 #include <linux/file.h>
 #include <linux/export.h>
 #include <linux/user_namespace.h>
+#include <linux/netns.h>
+#include <net/genetlink.h>
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 
@@ -144,6 +146,50 @@ static void ops_free_list(const struct pernet_operations *ops,
 	}
 }
 
+/* This function is used by idr_for_each(). If net is equal to peer, the
+ * function returns the id so that idr_for_each() stops. Because we cannot
+ * return the id 0 (idr_for_each() will not stop), we return the magic value
+ * -1 for it.
+ */
+static int net_eq_idr(int id, void *net, void *peer)
+{
+	if (net_eq(net, peer))
+		return id ? : -1;
+	return 0;
+}
+
+/* returns NETNSA_NSINDEX_UNKNOWN if not found */
+int peernet2id(struct net *net, struct net *peer)
+{
+	int id = idr_for_each(&net->netns_ids, net_eq_idr, peer);
+
+	ASSERT_RTNL();
+
+	/* Magic value for id 0. */
+	if (id == -1)
+		return 0;
+	if (id == 0)
+		return NETNSA_NSINDEX_UNKNOWN;
+
+	return id;
+}
+
+struct net *get_net_ns_by_id(struct net *net, int id)
+{
+	struct net *peer;
+
+	if (id < 0)
+		return NULL;
+
+	rcu_read_lock();
+	peer = idr_find(&net->netns_ids, id);
+	if (peer)
+		get_net(peer);
+	rcu_read_unlock();
+
+	return peer;
+}
+
 /*
  * setup_net runs the initializers for the network namespace object.
  */
@@ -158,6 +204,7 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns)
 	atomic_set(&net->passive, 1);
 	net->dev_base_seq = 1;
 	net->user_ns = user_ns;
+	idr_init(&net->netns_ids);
 
 #ifdef NETNS_REFCNT_DEBUG
 	atomic_set(&net->use_count, 0);
@@ -288,6 +335,14 @@ static void cleanup_net(struct work_struct *work)
 	list_for_each_entry(net, &net_kill_list, cleanup_list) {
 		list_del_rcu(&net->list);
 		list_add_tail(&net->exit_list, &net_exit_list);
+		for_each_net(tmp) {
+			int id = peernet2id(tmp, net);
+
+			if (id >= 0)
+				idr_remove(&tmp->netns_ids, id);
+		}
+		idr_destroy(&net->netns_ids);
+
 	}
 	rtnl_unlock();
 
@@ -399,6 +454,146 @@ static struct pernet_operations __net_initdata net_ns_ops = {
 	.exit = net_ns_net_exit,
 };
 
+static struct genl_family netns_genl_family = {
+	.id		= GENL_ID_GENERATE,
+	.name		= NETNS_GENL_NAME,
+	.version	= NETNS_GENL_VERSION,
+	.hdrsize	= 0,
+	.maxattr	= NETNSA_MAX,
+	.netnsok	= true,
+};
+
+static struct nla_policy netns_nl_policy[NETNSA_MAX + 1] = {
+	[NETNSA_NONE]		= { .type = NLA_UNSPEC },
+	[NETNSA_NSID]		= { .type = NLA_S32 },
+	[NETNSA_PID]		= { .type = NLA_U32 },
+	[NETNSA_FD]		= { .type = NLA_U32 },
+};
+
+static int netns_nl_cmd_newid(struct sk_buff *skb, struct genl_info *info)
+{
+	struct net *net = genl_info_net(info);
+	struct net *peer;
+	int nsid, err;
+
+	if (!info->attrs[NETNSA_NSID])
+		return -EINVAL;
+	nsid = nla_get_s32(info->attrs[NETNSA_NSID]);
+	if (nsid < 0)
+		return -EINVAL;
+
+	if (info->attrs[NETNSA_PID])
+		peer = get_net_ns_by_pid(nla_get_u32(info->attrs[NETNSA_PID]));
+	else if (info->attrs[NETNSA_FD])
+		peer = get_net_ns_by_fd(nla_get_u32(info->attrs[NETNSA_FD]));
+	else
+		return -EINVAL;
+	if (IS_ERR(peer))
+		return PTR_ERR(peer);
+
+	rtnl_lock();
+	if (peernet2id(net, peer) >= 0) {
+		err = -EEXIST;
+		goto out;
+	}
+
+	err = idr_alloc(&net->netns_ids, peer, nsid, nsid + 1, GFP_KERNEL);
+	if (err >= 0)
+		err = 0;
+out:
+	rtnl_unlock();
+	put_net(peer);
+	return err;
+}
+
+static int netns_nl_get_size(void)
+{
+	return nla_total_size(sizeof(s32)) /* NETNSA_NSID */
+	       ;
+}
+
+static int netns_nl_fill(struct sk_buff *skb, u32 portid, u32 seq, int flags,
+			 int cmd, struct net *net, struct net *peer)
+{
+	void *hdr;
+	int id;
+
+	hdr = genlmsg_put(skb, portid, seq, &netns_genl_family, flags, cmd);
+	if (!hdr)
+		return -EMSGSIZE;
+
+	rtnl_lock();
+	id = peernet2id(net, peer);
+	rtnl_unlock();
+	if (nla_put_s32(skb, NETNSA_NSID, id))
+		goto nla_put_failure;
+
+	return genlmsg_end(skb, hdr);
+
+nla_put_failure:
+	genlmsg_cancel(skb, hdr);
+	return -EMSGSIZE;
+}
+
+static int netns_nl_cmd_getid(struct sk_buff *skb, struct genl_info *info)
+{
+	struct net *net = genl_info_net(info);
+	struct sk_buff *msg;
+	int err = -ENOBUFS;
+	struct net *peer;
+
+	if (info->attrs[NETNSA_PID])
+		peer = get_net_ns_by_pid(nla_get_u32(info->attrs[NETNSA_PID]));
+	else if (info->attrs[NETNSA_FD])
+		peer = get_net_ns_by_fd(nla_get_u32(info->attrs[NETNSA_FD]));
+	else
+		return -EINVAL;
+
+	if (IS_ERR(peer))
+		return PTR_ERR(peer);
+
+	msg = genlmsg_new(netns_nl_get_size(), GFP_KERNEL);
+	if (!msg) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	err = netns_nl_fill(msg, info->snd_portid, info->snd_seq,
+			    NLM_F_ACK, NETNS_CMD_GETID, net, peer);
+	if (err < 0)
+		goto err_out;
+
+	err = genlmsg_unicast(net, msg, info->snd_portid);
+	goto out;
+
+err_out:
+	nlmsg_free(msg);
+out:
+	put_net(peer);
+	return err;
+}
+
+static struct genl_ops netns_genl_ops[] = {
+	{
+		.cmd = NETNS_CMD_NEWID,
+		.policy = netns_nl_policy,
+		.doit = netns_nl_cmd_newid,
+		.flags = GENL_ADMIN_PERM,
+	},
+	{
+		.cmd = NETNS_CMD_GETID,
+		.policy = netns_nl_policy,
+		.doit = netns_nl_cmd_getid,
+		.flags = GENL_ADMIN_PERM,
+	},
+};
+
+int netns_genl_register(void)
+{
+	return genl_register_family_with_ops(&netns_genl_family,
+					     netns_genl_ops);
+}
+
 static int __init net_ns_init(void)
 {
 	struct net_generic *ng;
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 76393f2f4b22..c6f39e40c9f3 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -1029,6 +1029,10 @@ static int __init genl_init(void)
 	if (err)
 		goto problem;
 
+	err = netns_genl_register();
+	if (err < 0)
+		goto problem;
+
 	return 0;
 
 problem:
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [RFC PATCH net-next v3 2/4] rtnl: add link netns id to interface messages
  2014-10-02 13:48             ` [RFC PATCH net-next v3 0/4] " Nicolas Dichtel
  2014-10-02 13:48               ` [RFC PATCH net-next v3 1/4] netns: add genl cmd to add and get peer netns ids Nicolas Dichtel
@ 2014-10-02 13:48               ` Nicolas Dichtel
  2014-10-02 13:48               ` [RFC PATCH net-next v3 3/4] iptunnels: advertise link netns via netlink Nicolas Dichtel
                                 ` (2 subsequent siblings)
  4 siblings, 0 replies; 67+ messages in thread
From: Nicolas Dichtel @ 2014-10-02 13:48 UTC (permalink / raw)
  To: netdev, containers, linux-kernel, linux-api
  Cc: davem, ebiederm, stephen, akpm, luto, cwang, Nicolas Dichtel

This patch adds a new attribute (IFLA_LINK_NETNSID) which contains the 'link'
netns id when this netns is different from the netns where the interface
stands (for example for x-netns interfaces like ip tunnels). When there is no id,
we put NETNSA_NSINDEX_UNKNOWN into this attribute to indicate to userland that
the link netns is different from the interface netns. Hence, userland knows that
some information like IFLA_LINK is not interpretable.
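
From the userland side, the three cases described above could be told apart
roughly as follows. This is a sketch under assumptions: enum link_scope and
classify_link() are hypothetical names, the attribute is assumed to be already
parsed into a flag plus an s32 value, and NSID_UNKNOWN mirrors
NETNSA_NSINDEX_UNKNOWN (-1) from patch 1/4.

```c
#include <stdint.h>

#define NSID_UNKNOWN (-1)	/* mirrors NETNSA_NSINDEX_UNKNOWN */

enum link_scope {
	LINK_SAME_NETNS,	/* attribute absent: IFLA_LINK is usable as-is */
	LINK_PEER_KNOWN,	/* attribute holds a valid id for the link netns */
	LINK_PEER_UNKNOWN,	/* link is in another netns that has no id here */
};

/* has_attr: non-zero if IFLA_LINK_NETNSID was present in the message. */
enum link_scope classify_link(int has_attr, int32_t nsid)
{
	if (!has_attr)
		return LINK_SAME_NETNS;
	if (nsid < 0)		/* NSID_UNKNOWN; other negatives treated alike */
		return LINK_PEER_UNKNOWN;
	return LINK_PEER_KNOWN;
}
```

Only in the LINK_SAME_NETNS case can IFLA_LINK be resolved as an ifindex in the
netns that received the message.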

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/net/rtnetlink.h      |  2 ++
 include/uapi/linux/if_link.h |  1 +
 net/core/rtnetlink.c         | 13 +++++++++++++
 3 files changed, 16 insertions(+)

diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index e21b9f9653c0..6c6d5393fc34 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -46,6 +46,7 @@ static inline int rtnl_msg_family(const struct nlmsghdr *nlh)
  *			    to create when creating a new device.
  *	@get_num_rx_queues: Function to determine number of receive queues
  *			    to create when creating a new device.
+ *	@get_link_net: Function to get the i/o netns of the device
  */
 struct rtnl_link_ops {
 	struct list_head	list;
@@ -93,6 +94,7 @@ struct rtnl_link_ops {
 	int			(*fill_slave_info)(struct sk_buff *skb,
 						   const struct net_device *dev,
 						   const struct net_device *slave_dev);
+	struct net		*(*get_link_net)(const struct net_device *dev);
 };
 
 int __rtnl_link_register(struct rtnl_link_ops *ops);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 0bdb77e16875..938c0c02ed2e 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -145,6 +145,7 @@ enum {
 	IFLA_CARRIER,
 	IFLA_PHYS_PORT_ID,
 	IFLA_CARRIER_CHANGES,
+	IFLA_LINK_NETNSID,
 	__IFLA_MAX
 };
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a6882686ca3a..1b9329512496 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -862,6 +862,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
 	       + nla_total_size(1) /* IFLA_OPERSTATE */
 	       + nla_total_size(1) /* IFLA_LINKMODE */
 	       + nla_total_size(4) /* IFLA_CARRIER_CHANGES */
+	       + nla_total_size(4) /* IFLA_LINK_NETNSID */
 	       + nla_total_size(ext_filter_mask
 			        & RTEXT_FILTER_VF ? 4 : 0) /* IFLA_NUM_VF */
 	       + rtnl_vfinfo_size(dev, ext_filter_mask) /* IFLA_VFINFO_LIST */
@@ -1134,6 +1135,18 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 			goto nla_put_failure;
 	}
 
+	if (dev->rtnl_link_ops &&
+	    dev->rtnl_link_ops->get_link_net) {
+		struct net *link_net = dev->rtnl_link_ops->get_link_net(dev);
+
+		if (!net_eq(dev_net(dev), link_net)) {
+			int id = peernet2id(dev_net(dev), link_net);
+
+			if (nla_put_s32(skb, IFLA_LINK_NETNSID, id))
+				goto nla_put_failure;
+		}
+	}
+
 	if (!(af_spec = nla_nest_start(skb, IFLA_AF_SPEC)))
 		goto nla_put_failure;
 
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [RFC PATCH net-next v3 3/4] iptunnels: advertise link netns via netlink
  2014-10-02 13:48             ` [RFC PATCH net-next v3 0/4] " Nicolas Dichtel
  2014-10-02 13:48               ` [RFC PATCH net-next v3 1/4] netns: add genl cmd to add and get peer netns ids Nicolas Dichtel
  2014-10-02 13:48               ` [RFC PATCH net-next v3 2/4] rtnl: add link netns id to interface messages Nicolas Dichtel
@ 2014-10-02 13:48               ` Nicolas Dichtel
  2014-10-02 13:48               ` [RFC PATCH net-next v3 4/4] rtnl: allow to create device with IFLA_LINK_NETNSID set Nicolas Dichtel
  2014-10-30 15:25               ` [PATCH net-next v4 0/4] netns: allow to identify peer netns Nicolas Dichtel
  4 siblings, 0 replies; 67+ messages in thread
From: Nicolas Dichtel @ 2014-10-02 13:48 UTC (permalink / raw)
  To: netdev, containers, linux-kernel, linux-api
  Cc: davem, ebiederm, stephen, akpm, luto, cwang, Nicolas Dichtel

Implement rtnl_link_ops->get_link_net() callback so that IFLA_LINK_NETNSID is
added to rtnetlink messages.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/net/ip6_tunnel.h | 1 +
 include/net/ip_tunnels.h | 1 +
 net/ipv4/ip_gre.c        | 2 ++
 net/ipv4/ip_tunnel.c     | 8 ++++++++
 net/ipv4/ip_vti.c        | 1 +
 net/ipv4/ipip.c          | 1 +
 net/ipv6/ip6_gre.c       | 1 +
 net/ipv6/ip6_tunnel.c    | 9 +++++++++
 net/ipv6/ip6_vti.c       | 1 +
 net/ipv6/sit.c           | 1 +
 10 files changed, 26 insertions(+)

diff --git a/include/net/ip6_tunnel.h b/include/net/ip6_tunnel.h
index a5593dab6af7..8648519f4555 100644
--- a/include/net/ip6_tunnel.h
+++ b/include/net/ip6_tunnel.h
@@ -69,6 +69,7 @@ int ip6_tnl_xmit_ctl(struct ip6_tnl *t);
 __u16 ip6_tnl_parse_tlv_enc_lim(struct sk_buff *skb, __u8 *raw);
 __u32 ip6_tnl_get_cap(struct ip6_tnl *t, const struct in6_addr *laddr,
 			     const struct in6_addr *raddr);
+struct net *ip6_tnl_get_link_net(const struct net_device *dev);
 
 static inline void ip6tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 {
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 7f538ba6e267..c92a99b5b77e 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -119,6 +119,7 @@ struct ip_tunnel_net {
 int ip_tunnel_init(struct net_device *dev);
 void ip_tunnel_uninit(struct net_device *dev);
 void  ip_tunnel_dellink(struct net_device *dev, struct list_head *head);
+struct net *ip_tunnel_get_link_net(const struct net_device *dev);
 int ip_tunnel_init_net(struct net *net, int ip_tnl_net_id,
 		       struct rtnl_link_ops *ops, char *devname);
 
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 0485ef18d254..c75974986053 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -827,6 +827,7 @@ static struct rtnl_link_ops ipgre_link_ops __read_mostly = {
 	.dellink	= ip_tunnel_dellink,
 	.get_size	= ipgre_get_size,
 	.fill_info	= ipgre_fill_info,
+	.get_link_net	= ip_tunnel_get_link_net,
 };
 
 static struct rtnl_link_ops ipgre_tap_ops __read_mostly = {
@@ -841,6 +842,7 @@ static struct rtnl_link_ops ipgre_tap_ops __read_mostly = {
 	.dellink	= ip_tunnel_dellink,
 	.get_size	= ipgre_get_size,
 	.fill_info	= ipgre_fill_info,
+	.get_link_net	= ip_tunnel_get_link_net,
 };
 
 static int __net_init ipgre_tap_init_net(struct net *net)
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index b75b47b0a223..a8ab238d0df4 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -954,6 +954,14 @@ void ip_tunnel_dellink(struct net_device *dev, struct list_head *head)
 }
 EXPORT_SYMBOL_GPL(ip_tunnel_dellink);
 
+struct net *ip_tunnel_get_link_net(const struct net_device *dev)
+{
+	struct ip_tunnel *tunnel = netdev_priv(dev);
+
+	return tunnel->net;
+}
+EXPORT_SYMBOL(ip_tunnel_get_link_net);
+
 int ip_tunnel_init_net(struct net *net, int ip_tnl_net_id,
 				  struct rtnl_link_ops *ops, char *devname)
 {
diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index e453cb724a95..93862411669c 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -530,6 +530,7 @@ static struct rtnl_link_ops vti_link_ops __read_mostly = {
 	.changelink	= vti_changelink,
 	.get_size	= vti_get_size,
 	.fill_info	= vti_fill_info,
+	.get_link_net	= ip_tunnel_get_link_net,
 };
 
 static int __init vti_init(void)
diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index ea88ab3102a8..406910d04b1b 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -498,6 +498,7 @@ static struct rtnl_link_ops ipip_link_ops __read_mostly = {
 	.dellink	= ip_tunnel_dellink,
 	.get_size	= ipip_get_size,
 	.fill_info	= ipip_fill_info,
+	.get_link_net	= ip_tunnel_get_link_net,
 };
 
 static struct xfrm_tunnel ipip_handler __read_mostly = {
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 9a0a1aafe727..10981f568250 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -1659,6 +1659,7 @@ static struct rtnl_link_ops ip6gre_link_ops __read_mostly = {
 	.dellink	= ip6gre_dellink,
 	.get_size	= ip6gre_get_size,
 	.fill_info	= ip6gre_fill_info,
+	.get_link_net	= ip6_tnl_get_link_net,
 };
 
 static struct rtnl_link_ops ip6gre_tap_ops __read_mostly = {
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index e01bd0399297..b86d9f4ea5ec 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1699,6 +1699,14 @@ nla_put_failure:
 	return -EMSGSIZE;
 }
 
+struct net *ip6_tnl_get_link_net(const struct net_device *dev)
+{
+	struct ip6_tnl *tunnel = netdev_priv(dev);
+
+	return tunnel->net;
+}
+EXPORT_SYMBOL(ip6_tnl_get_link_net);
+
 static const struct nla_policy ip6_tnl_policy[IFLA_IPTUN_MAX + 1] = {
 	[IFLA_IPTUN_LINK]		= { .type = NLA_U32 },
 	[IFLA_IPTUN_LOCAL]		= { .len = sizeof(struct in6_addr) },
@@ -1722,6 +1730,7 @@ static struct rtnl_link_ops ip6_link_ops __read_mostly = {
 	.dellink	= ip6_tnl_dellink,
 	.get_size	= ip6_tnl_get_size,
 	.fill_info	= ip6_tnl_fill_info,
+	.get_link_net	= ip6_tnl_get_link_net,
 };
 
 static struct xfrm6_tunnel ip4ip6_handler __read_mostly = {
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index 7f52fd9fa7b0..88e8aadcfac1 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -988,6 +988,7 @@ static struct rtnl_link_ops vti6_link_ops __read_mostly = {
 	.changelink	= vti6_changelink,
 	.get_size	= vti6_get_size,
 	.fill_info	= vti6_fill_info,
+	.get_link_net	= ip6_tnl_get_link_net,
 };
 
 static void __net_exit vti6_destroy_tunnels(struct vti6_net *ip6n)
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 0d4e27466f82..02ef387811be 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1765,6 +1765,7 @@ static struct rtnl_link_ops sit_link_ops __read_mostly = {
 	.get_size	= ipip6_get_size,
 	.fill_info	= ipip6_fill_info,
 	.dellink	= ipip6_dellink,
+	.get_link_net	= ip_tunnel_get_link_net,
 };
 
 static struct xfrm_tunnel sit_handler __read_mostly = {
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [RFC PATCH net-next v3 4/4] rtnl: allow to create device with IFLA_LINK_NETNSID set
  2014-10-02 13:48             ` [RFC PATCH net-next v3 0/4] " Nicolas Dichtel
                                 ` (2 preceding siblings ...)
  2014-10-02 13:48               ` [RFC PATCH net-next v3 3/4] iptunnels: advertise link netns via netlink Nicolas Dichtel
@ 2014-10-02 13:48               ` Nicolas Dichtel
  2014-10-30 15:25               ` [PATCH net-next v4 0/4] netns: allow to identify peer netns Nicolas Dichtel
  4 siblings, 0 replies; 67+ messages in thread
From: Nicolas Dichtel @ 2014-10-02 13:48 UTC (permalink / raw)
  To: netdev, containers, linux-kernel, linux-api
  Cc: davem, ebiederm, stephen, akpm, luto, cwang, Nicolas Dichtel

This patch adds the ability to create a netdevice in a specified netns and
then move it into the final netns. In fact, it allows symmetry between
get and set rtnl messages.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 net/core/rtnetlink.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 1b9329512496..57959a85ed2c 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1211,6 +1211,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_NUM_RX_QUEUES]	= { .type = NLA_U32 },
 	[IFLA_PHYS_PORT_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_PORT_ID_LEN },
 	[IFLA_CARRIER_CHANGES]	= { .type = NLA_U32 },  /* ignored */
+	[IFLA_LINK_NETNSID]	= { .type = NLA_S32 },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
@@ -1983,7 +1984,7 @@ replay:
 		struct nlattr *slave_attr[m_ops ? m_ops->slave_maxtype + 1 : 0];
 		struct nlattr **data = NULL;
 		struct nlattr **slave_data = NULL;
-		struct net *dest_net;
+		struct net *dest_net, *link_net = NULL;
 
 		if (ops) {
 			if (ops->maxtype && linkinfo[IFLA_INFO_DATA]) {
@@ -2089,7 +2090,18 @@ replay:
 		if (IS_ERR(dest_net))
 			return PTR_ERR(dest_net);
 
-		dev = rtnl_create_link(dest_net, ifname, name_assign_type, ops, tb);
+		if (tb[IFLA_LINK_NETNSID]) {
+			int id = nla_get_s32(tb[IFLA_LINK_NETNSID]);
+
+			link_net = get_net_ns_by_id(dest_net, id);
+			if (link_net == NULL) {
+				err =  -EINVAL;
+				goto out;
+			}
+		}
+
+		dev = rtnl_create_link(link_net ? : dest_net, ifname,
+				       name_assign_type, ops, tb);
 		if (IS_ERR(dev)) {
 			err = PTR_ERR(dev);
 			goto out;
@@ -2117,9 +2129,16 @@ replay:
 			}
 		}
 		err = rtnl_configure_link(dev, ifm);
-		if (err < 0)
+		if (err < 0) {
 			unregister_netdevice(dev);
+			goto out;
+		}
+
+		if (link_net)
+			err = dev_change_net_namespace(dev, dest_net, ifname);
 out:
+		if (link_net)
+			put_net(link_net);
 		put_net(dest_net);
 		return err;
 	}
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-10-02 13:46           ` Nicolas Dichtel
  2014-10-02 13:48             ` [RFC PATCH net-next v3 0/4] " Nicolas Dichtel
@ 2014-10-02 19:20             ` Eric W. Biederman
  2014-10-02 19:31               ` Andy Lutomirski
  2014-10-03 12:22               ` Nicolas Dichtel
  1 sibling, 2 replies; 67+ messages in thread
From: Eric W. Biederman @ 2014-10-02 19:20 UTC (permalink / raw)
  To: nicolas.dichtel
  Cc: Andy Lutomirski, Network Development, Linux Containers,
	linux-kernel, Linux API, David S. Miller, Stephen Hemminger,
	Andrew Morton, Cong Wang

Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:

> Le 29/09/2014 20:43, Eric W. Biederman a écrit :
>> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>>
>>> Le 26/09/2014 20:57, Eric W. Biederman a écrit :
>>>> Andy Lutomirski <luto@amacapital.net> writes:
>>>>
>>>>> On Fri, Sep 26, 2014 at 11:10 AM, Eric W. Biederman
>>>>> <ebiederm@xmission.com> wrote:
>>>>>> I see two ways to go with this.
>>>>>>
>>>>>> - A per network namespace table to that you can store ids for ``peer''
>>>>>>     network namespaces.  The table would need to be populated manually by
>>>>>>     the likes of ip netns add.
>>>>>>
>>>>>>     That flips the order of assignment and makes this idea solid.
>>> I have a preference for this solution, because it allows to have a full
>>> broadcast messages. When you have a lot of network interfaces (> 10k),
>>> it saves a lot of time to avoid another request to get all informations.
>>
>> My practical question is how often does it happen that we care?
> In fact, I don't think that scenarii with a lot of netns have a full mesh of
> x-netns interfaces. It will be more one "link" netns with the physical
> interface and all other with one interface with the link part in this "link"
> netns. Hence, only one nsid is needing in each netns.

I will buy that a full mesh is unlikely.  

For people doing simulations anything physical has a limited number of
links.

For people wanting all-to-all connectivity, setting up an internal
macvlan (or the equivalent) is likely much simpler and more efficient
than a full mesh.

So the question in my mind is how do we create these identifiers at need
(when we create the cross network namespace links) instead of at network
namespace creation time.  I don't see an answer to that in your patches,
and perhaps it is obvious.

Eric

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-10-02 19:20             ` [RFC PATCH net-next v2 0/5] " Eric W. Biederman
@ 2014-10-02 19:31               ` Andy Lutomirski
  2014-10-02 19:45                 ` Eric W. Biederman
  2014-10-03 12:22               ` Nicolas Dichtel
  1 sibling, 1 reply; 67+ messages in thread
From: Andy Lutomirski @ 2014-10-02 19:31 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Nicolas Dichtel, Network Development, Linux Containers,
	linux-kernel, Linux API, David S. Miller, Stephen Hemminger,
	Andrew Morton, Cong Wang

On Thu, Oct 2, 2014 at 12:20 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>
>> Le 29/09/2014 20:43, Eric W. Biederman a écrit :
>>> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>>>
>>>> Le 26/09/2014 20:57, Eric W. Biederman a écrit :
>>>>> Andy Lutomirski <luto@amacapital.net> writes:
>>>>>
>>>>>> On Fri, Sep 26, 2014 at 11:10 AM, Eric W. Biederman
>>>>>> <ebiederm@xmission.com> wrote:
>>>>>>> I see two ways to go with this.
>>>>>>>
>>>>>>> - A per network namespace table to that you can store ids for ``peer''
>>>>>>>     network namespaces.  The table would need to be populated manually by
>>>>>>>     the likes of ip netns add.
>>>>>>>
>>>>>>>     That flips the order of assignment and makes this idea solid.
>>>> I have a preference for this solution, because it allows to have a full
>>>> broadcast messages. When you have a lot of network interfaces (> 10k),
>>>> it saves a lot of time to avoid another request to get all informations.
>>>
>>> My practical question is how often does it happen that we care?
>> In fact, I don't think that scenarii with a lot of netns have a full mesh of
>> x-netns interfaces. It will be more one "link" netns with the physical
>> interface and all other with one interface with the link part in this "link"
>> netns. Hence, only one nsid is needing in each netns.
>
> I will buy that a full mesh is unlikely.
>
> For people doing simulations anything physical has a limited number of
> links.
>
> For people wanting all to all connectivity setting up an internal
> macvlan (or the equivalent) is likely much simpler and more efficient
> that a full mesh.
>
> So the question in my mind is how do we create these identifiers at need
> (when we create the cross network namespace links) instead of at network
> namespace creation time.  I don't see an answer to that in your patches,
> and perhaps it obvious.
>

I wonder whether part of the problem is that we're thinking about
scoping wrong.  What if we made the hierarchy more explicit?

For example, we could give each netns an admin-assigned identifier
(e.g. a 64-bit number, maybe required to be unique, maybe not)
relative to its containing userns.  Then we could come up with a way
to identify user namespaces (i.e. inode number relative to containing
user ns, if that's well-defined).

From user code's perspective, netnses that are in the requester's
userns or its descendants are identified by a path through a (possibly
zero-length) sequence of userns ids followed by a netns id.  Netnses
outside the requester's userns hierarchy cannot be named at all.

Would this make sense?  It should keep the asymptotic complexity of
everything under control and, for users of very large numbers of
network namespaces with complex routing, it doesn't require a
correspondingly large number of fds. It would have the added benefit
of allowing the same scheme to be used for all the other namespace
types, although it could be a bit odd for pid namespaces, which really
do have their own hierarchy.

--Andy

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH net-next v3 1/4] netns: add genl cmd to add and get peer netns ids
  2014-10-02 13:48               ` [RFC PATCH net-next v3 1/4] netns: add genl cmd to add and get peer netns ids Nicolas Dichtel
@ 2014-10-02 19:33                 ` Eric W. Biederman
  2014-10-03 12:22                   ` Nicolas Dichtel
  0 siblings, 1 reply; 67+ messages in thread
From: Eric W. Biederman @ 2014-10-02 19:33 UTC (permalink / raw)
  To: Nicolas Dichtel
  Cc: netdev, containers, linux-kernel, linux-api, davem, stephen,
	akpm, luto, cwang

Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:

> With this patch, a user can define an id for a peer netns by providing a FD or a
> PID. These ids are local to netns (ie valid only into one netns).
>
> This will be useful for netlink messages when a x-netns interface is
> dumped.

You have a "id -> struct net *" table but you don't have a 
"struct net * -> id" table which looks like it will impact the
performance of peernet2id at scale.

Eric

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-10-02 19:31               ` Andy Lutomirski
@ 2014-10-02 19:45                 ` Eric W. Biederman
  2014-10-02 19:48                   ` Andy Lutomirski
  0 siblings, 1 reply; 67+ messages in thread
From: Eric W. Biederman @ 2014-10-02 19:45 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Nicolas Dichtel, Network Development, Linux Containers,
	linux-kernel, Linux API, David S. Miller, Stephen Hemminger,
	Andrew Morton, Cong Wang

Andy Lutomirski <luto@amacapital.net> writes:

> On Thu, Oct 2, 2014 at 12:20 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>>
>>> Le 29/09/2014 20:43, Eric W. Biederman a écrit :
>>>> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>>>>
>>>>> Le 26/09/2014 20:57, Eric W. Biederman a écrit :
>>>>>> Andy Lutomirski <luto@amacapital.net> writes:
>>>>>>
>>>>>>> On Fri, Sep 26, 2014 at 11:10 AM, Eric W. Biederman
>>>>>>> <ebiederm@xmission.com> wrote:
>>>>>>>> I see two ways to go with this.
>>>>>>>>
>>>>>>>> - A per network namespace table to that you can store ids for ``peer''
>>>>>>>>     network namespaces.  The table would need to be populated manually by
>>>>>>>>     the likes of ip netns add.
>>>>>>>>
>>>>>>>>     That flips the order of assignment and makes this idea solid.
>>>>> I have a preference for this solution, because it allows to have a full
>>>>> broadcast messages. When you have a lot of network interfaces (> 10k),
>>>>> it saves a lot of time to avoid another request to get all informations.
>>>>
>>>> My practical question is how often does it happen that we care?
>>> In fact, I don't think that scenarii with a lot of netns have a full mesh of
>>> x-netns interfaces. It will be more one "link" netns with the physical
>>> interface and all other with one interface with the link part in this "link"
>>> netns. Hence, only one nsid is needing in each netns.
>>
>> I will buy that a full mesh is unlikely.
>>
>> For people doing simulations anything physical has a limited number of
>> links.
>>
>> For people wanting all to all connectivity setting up an internal
>> macvlan (or the equivalent) is likely much simpler and more efficient
>> that a full mesh.
>>
>> So the question in my mind is how do we create these identifiers at need
>> (when we create the cross network namespace links) instead of at network
>> namespace creation time.  I don't see an answer to that in your patches,
>> and perhaps it obvious.
>>
>
> I wonder whether part of the problem is that we're thinking about
> scoping wrong.  What if we made the hierarchy more explicit?
>
> For example, we could give each netns an admin-assigned identifier
> (e.g. a 64-bit number, maybe required to be unique, maybe not)
> relative to its containing userns.  Then we could come up with a way
> to identify user namespaces (i.e. inode number relative to containing
> user ns, if that's well-defined).

If as suggested we only assign ids when a tunnel (or equivalent) is
created between two network namespaces the space cost is a non-issue.
The ids become at worst a constant factor addition to the cost of the
tunnel.

To keep things simple we may want to assign a free id (if one does not
exist) when we connect a tunnel to a network namespace.
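
The "assign a free id if one does not exist" idea can be sketched in plain
userspace C. This is only an illustration under stated assumptions: struct
netns, struct peer_table, MAX_PEERS and both helper functions are hypothetical
names that do not come from the patches, and the fixed-size array stands in
for whatever per-netns storage the kernel would really use.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PEERS 16	/* hypothetical fixed capacity for this sketch */

/* Hypothetical stand-in for the kernel's struct net. */
struct netns {
	const char *name;
};

/* Per-netns table: the array index is the id, NULL marks a free id. */
struct peer_table {
	const struct netns *peer[MAX_PEERS];
};

/* Return the id already bound to peer, or -1 if there is none. */
static int peer_table_lookup(const struct peer_table *t,
			     const struct netns *peer)
{
	int id;

	if (peer == NULL)
		return -1;
	for (id = 0; id < MAX_PEERS; id++)
		if (t->peer[id] == peer)
			return id;
	return -1;
}

/* Return the existing id for peer, otherwise claim the lowest free id.
 * Returns -1 when the table is full.
 */
static int peer_table_get_or_alloc(struct peer_table *t,
				   const struct netns *peer)
{
	int id;

	assert(peer != NULL);
	id = peer_table_lookup(t, peer);
	if (id >= 0)
		return id;
	for (id = 0; id < MAX_PEERS; id++) {
		if (t->peer[id] == NULL) {
			t->peer[id] = peer;
			return id;
		}
	}
	return -1;
}
```

With a zero-initialized table, the first call for a given peer allocates an id
and later calls for the same peer return that same id, so the cost stays tied
to the number of cross-netns links rather than to the number of namespaces.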

> From user code's perspective, netnses that are in the requester's
> userns or its descendents are identified by a path through a (possibly
> zero-length) sequence of userns ids followed by a netns id.  Netnses
> outside the requester's userns hierarchy cannot be named at all.
>
> Would this make sense? 

Nope.  What happens if I migrate 2 of the 4 network namespaces in a user
namespace?  The migration potentially fails.  Application migration does
not require user namespace migration.

Eric

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-10-02 19:45                 ` Eric W. Biederman
@ 2014-10-02 19:48                   ` Andy Lutomirski
  0 siblings, 0 replies; 67+ messages in thread
From: Andy Lutomirski @ 2014-10-02 19:48 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Nicolas Dichtel, Network Development, Linux Containers,
	linux-kernel, Linux API, David S. Miller, Stephen Hemminger,
	Andrew Morton, Cong Wang

On Thu, Oct 2, 2014 at 12:45 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Andy Lutomirski <luto@amacapital.net> writes:
>
>> On Thu, Oct 2, 2014 at 12:20 PM, Eric W. Biederman
>> <ebiederm@xmission.com> wrote:
>>> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>>>
>>>> Le 29/09/2014 20:43, Eric W. Biederman a écrit :
>>>>> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>>>>>
>>>>>> Le 26/09/2014 20:57, Eric W. Biederman a écrit :
>>>>>>> Andy Lutomirski <luto@amacapital.net> writes:
>>>>>>>
>>>>>>>> On Fri, Sep 26, 2014 at 11:10 AM, Eric W. Biederman
>>>>>>>> <ebiederm@xmission.com> wrote:
>>>>>>>>> I see two ways to go with this.
>>>>>>>>>
>>>>>>>>> - A per network namespace table to that you can store ids for ``peer''
>>>>>>>>>     network namespaces.  The table would need to be populated manually by
>>>>>>>>>     the likes of ip netns add.
>>>>>>>>>
>>>>>>>>>     That flips the order of assignment and makes this idea solid.
>>>>>> I have a preference for this solution, because it allows to have a full
>>>>>> broadcast messages. When you have a lot of network interfaces (> 10k),
>>>>>> it saves a lot of time to avoid another request to get all informations.
>>>>>
>>>>> My practical question is how often does it happen that we care?
>>>> In fact, I don't think that scenarii with a lot of netns have a full mesh of
>>>> x-netns interfaces. It will be more one "link" netns with the physical
>>>> interface and all other with one interface with the link part in this "link"
>>>> netns. Hence, only one nsid is needing in each netns.
>>>
>>> I will buy that a full mesh is unlikely.
>>>
>>> For people doing simulations anything physical has a limited number of
>>> links.
>>>
>>> For people wanting all to all connectivity setting up an internal
>>> macvlan (or the equivalent) is likely much simpler and more efficient
>>> that a full mesh.
>>>
>>> So the question in my mind is how do we create these identifiers at need
>>> (when we create the cross network namespace links) instead of at network
>>> namespace creation time.  I don't see an answer to that in your patches,
>>> and perhaps it obvious.
>>>
>>
>> I wonder whether part of the problem is that we're thinking about
>> scoping wrong.  What if we made the hierarchy more explicit?
>>
>> For example, we could give each netns an admin-assigned identifier
>> (e.g. a 64-bit number, maybe required to be unique, maybe not)
>> relative to its containing userns.  Then we could come up with a way
>> to identify user namespaces (i.e. inode number relative to containing
>> user ns, if that's well-defined).
>
> If as suggested we only assign ids when a tunnel (or equivalent) is
> created between two network namespaces the space cost is a non-issue.
> The ids become at worst a constant factor addition to the cost of the
> tunnel.
>
> To keep things simple we may want to assign a free id (if one does not
> exist) when we connect a tunnel to a network namespace.
>
>> From user code's perspective, netnses that are in the requester's
>> userns or its descendents are identified by a path through a (possibly
>> zero-length) sequence of userns ids followed by a netns id.  Netnses
>> outside the requester's userns hierarchy cannot be named at all.
>>
>> Would this make sense?
>
> Nope.  What happens if I migrate 2 of the 4 network namespaces in a user
> namespace?  The migration potentially fails.  Application migration does
> not require user namespace migration.

Hmm.  I guess that, as long as those network namespaces aren't
connected to anything else, migrating like that makes sense and ought
to work.  Fair enough.

--Andy

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns
  2014-10-02 19:20             ` [RFC PATCH net-next v2 0/5] " Eric W. Biederman
  2014-10-02 19:31               ` Andy Lutomirski
@ 2014-10-03 12:22               ` Nicolas Dichtel
  1 sibling, 0 replies; 67+ messages in thread
From: Nicolas Dichtel @ 2014-10-03 12:22 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andy Lutomirski, Network Development, Linux Containers,
	linux-kernel, Linux API, David S. Miller, Stephen Hemminger,
	Andrew Morton, Cong Wang

Le 02/10/2014 21:20, Eric W. Biederman a écrit :
> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>
>> Le 29/09/2014 20:43, Eric W. Biederman a écrit :
>>> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>>>
>>>> Le 26/09/2014 20:57, Eric W. Biederman a écrit :
>>>>> Andy Lutomirski <luto@amacapital.net> writes:
>>>>>
>>>>>> On Fri, Sep 26, 2014 at 11:10 AM, Eric W. Biederman
>>>>>> <ebiederm@xmission.com> wrote:
>>>>>>> I see two ways to go with this.
>>>>>>>
>>>>>>> - A per network namespace table to that you can store ids for ``peer''
>>>>>>>      network namespaces.  The table would need to be populated manually by
>>>>>>>      the likes of ip netns add.
>>>>>>>
>>>>>>>      That flips the order of assignment and makes this idea solid.
>>>> I have a preference for this solution, because it allows to have a full
>>>> broadcast messages. When you have a lot of network interfaces (> 10k),
>>>> it saves a lot of time to avoid another request to get all informations.
>>>
>>> My practical question is how often does it happen that we care?
>> In fact, I don't think that scenarii with a lot of netns have a full mesh of
>> x-netns interfaces. It will be more one "link" netns with the physical
>> interface and all other with one interface with the link part in this "link"
>> netns. Hence, only one nsid is needing in each netns.
>
> I will buy that a full mesh is unlikely.
>
> For people doing simulations anything physical has a limited number of
> links.
>
> For people wanting all to all connectivity setting up an internal
> macvlan (or the equivalent) is likely much simpler and more efficient
> that a full mesh.
>
> So the question in my mind is how do we create these identifiers at need
> (when we create the cross network namespace links) instead of at network
> namespace creation time.  I don't see an answer to that in your patches,
> and perhaps it obvious.
For me, it is the responsibility of the user who creates the netns. He should
know what will be done with this new netns, hence he may or may not define an
id. It's also possible to delegate this to the user who will create the tunnel.
In other words, it's part of the configuration.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH net-next v3 1/4] netns: add genl cmd to add and get peer netns ids
  2014-10-02 19:33                 ` Eric W. Biederman
@ 2014-10-03 12:22                   ` Nicolas Dichtel
  0 siblings, 0 replies; 67+ messages in thread
From: Nicolas Dichtel @ 2014-10-03 12:22 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: netdev, containers, linux-kernel, linux-api, davem, stephen,
	akpm, luto, cwang

Le 02/10/2014 21:33, Eric W. Biederman a écrit :
> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>
>> With this patch, a user can define an id for a peer netns by providing a FD or a
>> PID. These ids are local to netns (ie valid only into one netns).
>>
>> This will be useful for netlink messages when a x-netns interface is
>> dumped.
>
> You have a "id -> struct net *" table but you don't have a
> "struct net * -> id" table which looks like it will impact the
> performance of peernet2id at scale.
It is indirectly stored in 'struct idr'. It can be optimized later, with a
proper algorithm to quickly find this 'struct net *' (hash table? something
else?). A basic algorithm will not be more scalable than the current
idr_for_each().
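
As a rough picture of what such a later optimization could look like, here is
a userspace sketch that keeps a reverse hash table next to the forward table.
Everything in it is an assumption made for illustration: struct nsid_map,
hash_ptr(), nsid_map_get() and nsid_map_set() are hypothetical names, the
forward array stands in for the idr, and removal is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_IDS   64
#define HASH_SIZE 128	/* power of two, larger than MAX_IDS so probing ends */

/* The map must start zero-initialized. */
struct nsid_map {
	const void *by_id[MAX_IDS];	/* forward: id -> peer */
	struct {
		const void *peer;
		int id;
	} by_peer[HASH_SIZE];		/* reverse: peer -> id */
};

/* Fold a pointer into a bucket index. */
static size_t hash_ptr(const void *p)
{
	uintptr_t v = (uintptr_t)p;

	return (size_t)((v >> 4) ^ (v >> 12)) & (HASH_SIZE - 1);
}

/* Reverse lookup with linear probing; returns -1 if peer has no id. */
static int nsid_map_get(const struct nsid_map *m, const void *peer)
{
	size_t slot;

	for (slot = hash_ptr(peer); m->by_peer[slot].peer != NULL;
	     slot = (slot + 1) & (HASH_SIZE - 1))
		if (m->by_peer[slot].peer == peer)
			return m->by_peer[slot].id;
	return -1;
}

/* Bind id to peer in both directions. Returns 0 on success, -1 if the id is
 * out of range or taken, or if peer already has an id.
 */
static int nsid_map_set(struct nsid_map *m, int id, const void *peer)
{
	size_t slot;

	assert(peer != NULL);
	if (id < 0 || id >= MAX_IDS || m->by_id[id] != NULL)
		return -1;
	if (nsid_map_get(m, peer) >= 0)
		return -1;

	m->by_id[id] = peer;
	for (slot = hash_ptr(peer); m->by_peer[slot].peer != NULL;
	     slot = (slot + 1) & (HASH_SIZE - 1))
		;
	m->by_peer[slot].peer = peer;
	m->by_peer[slot].id = id;
	return 0;
}
```

The trade-off is the one discussed in this thread: the reverse table costs
extra memory per assigned id, while the lookup no longer has to walk every
entry.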

^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH net-next v4 0/4] netns: allow to identify peer netns
  2014-10-02 13:48             ` [RFC PATCH net-next v3 0/4] " Nicolas Dichtel
                                 ` (3 preceding siblings ...)
  2014-10-02 13:48               ` [RFC PATCH net-next v3 4/4] rtnl: allow to create device with IFLA_LINK_NETNSID set Nicolas Dichtel
@ 2014-10-30 15:25               ` Nicolas Dichtel
  2014-10-30 15:25                 ` [PATCH net-next v4 1/4] netns: add genl cmd to add and get peer netns ids Nicolas Dichtel
                                   ` (4 more replies)
  4 siblings, 5 replies; 67+ messages in thread
From: Nicolas Dichtel @ 2014-10-30 15:25 UTC (permalink / raw)
  To: netdev, containers, linux-kernel, linux-api
  Cc: davem, ebiederm, stephen, akpm, luto, cwang

The goal of this series is to be able to multicast netlink messages with an
attribute that identifies a peer netns.
This is needed by the userland to interpret some information contained in
netlink messages (like IFLA_LINK value, but also some other attributes in case
of x-netns netdevice (see also
http://thread.gmane.org/gmane.linux.network/315933/focus=316064 and
http://thread.gmane.org/gmane.linux.kernel.containers/28301/focus=4239)).

Ids of peer netns are set by userland via new genl messages. These ids are
stored per netns and are local (i.e. only valid in the netns where they are set).
To avoid allocating an int for each peer netns, I use idr_for_each() to retrieve
the id of a peer netns. Note that it will be possible to add a table (struct net
-> id) later to optimize this lookup if needed.
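
The reverse lookup relies on the fact that idr_for_each() stops as soon as the
callback returns a non-zero value and hands that value back to the caller. The
userspace model below sketches that convention, including the special case for
id 0. It is not kernel code: table_for_each(), match_peer(), peer_to_id(),
TABLE_SIZE and NSID_UNKNOWN are names made up for this illustration, with
NSID_UNKNOWN playing the role of NETNSA_NSINDEX_UNKNOWN.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE   8
#define NSID_UNKNOWN (-1)	/* plays the role of NETNSA_NSINDEX_UNKNOWN */

/* Toy stand-in for idr_for_each(): call fn on every used slot, stop at the
 * first non-zero return value and pass it back; return 0 after a full walk.
 */
static int table_for_each(void *const slots[TABLE_SIZE],
			  int (*fn)(int id, void *entry, void *data),
			  void *data)
{
	int id;

	for (id = 0; id < TABLE_SIZE; id++) {
		if (slots[id] != NULL) {
			int ret = fn(id, slots[id], data);

			if (ret != 0)
				return ret;
		}
	}
	return 0;
}

/* Callback: a match on id 0 cannot be reported as 0, because 0 means
 * "keep iterating", so it is encoded as -1.
 */
static int match_peer(int id, void *entry, void *peer)
{
	if (entry == peer)
		return id ? id : -1;
	return 0;
}

/* Decode the callback result back into an id or NSID_UNKNOWN. */
static int peer_to_id(void *const slots[TABLE_SIZE], void *peer)
{
	int ret = table_for_each(slots, match_peer, peer);

	if (ret == -1)
		return 0;		/* magic value for id 0 */
	if (ret == 0)
		return NSID_UNKNOWN;	/* walked every slot, no match */
	return ret;
}
```

The walk is linear in the number of ids stored in the table, which is the
cost a later (struct net -> id) table would remove.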

Patch 1/4 introduces the netlink API mechanism to set and get these ids.
Patch 2/4 and 3/4 implement an example of how to use these ids in rtnetlink
messages. And patch 4/4 shows that the netlink messages can be symmetric between
a GET and a SET.

iproute2 patches are available, I can send them on demand.

Here is a small screenshot to show how it can be used by userland.

First, setup netns and required ids:
$ ip netns add foo
$ ip netns del foo
$ ip netns
$ touch /var/run/netns/init_net
$ mount --bind /proc/1/ns/net /var/run/netns/init_net
$ ip netns add foo
$ ip netns exec foo ip netns set init_net 0
$ ip netns
foo
init_net
$ ip netns exec foo ip netns
foo
init_net (id: 0)

Now, add and display an ipip tunnel, with its link part in init_net (id 0 in
netns foo) and the netdevice in foo:
$ ip netns exec foo ip link add ipip1 link-netnsid 0 type ipip remote 10.16.0.121 local 10.16.0.249
$ ip netns exec foo ip l ls ipip1
6: ipip1@NONE: <POINTOPOINT,NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default 
    link/ipip 10.16.0.249 peer 10.16.0.121 link-netnsid 0

The parameter link-netnsid shows us where the interface sends and receives
packets (and thus we know where encapsulated addresses are set).

RFCv3 -> v4:
  rebase on net-next
  add copyright text in the new netns.h file

RFCv2 -> RFCv3:
  ids are now defined by userland (via netlink). Ids are stored in each netns
  (and they are local to this netns).
  add get_link_net support for ip6 tunnels
  netnsid is now a s32 instead of a u32

RFCv1 -> RFCv2:
  remove useless ()
  ids are now stored in the user ns. It's possible to get an id for a peer netns
  only if the current netns and the peer netns have the same user ns parent.

 MAINTAINERS                  |   1 +
 include/net/ip6_tunnel.h     |   1 +
 include/net/ip_tunnels.h     |   1 +
 include/net/net_namespace.h  |   5 ++
 include/net/rtnetlink.h      |   2 +
 include/uapi/linux/Kbuild    |   1 +
 include/uapi/linux/if_link.h |   1 +
 include/uapi/linux/netns.h   |  38 +++++++++
 net/core/net_namespace.c     | 195 +++++++++++++++++++++++++++++++++++++++++++
 net/core/rtnetlink.c         |  38 ++++++++-
 net/ipv4/ip_gre.c            |   2 +
 net/ipv4/ip_tunnel.c         |   8 ++
 net/ipv4/ip_vti.c            |   1 +
 net/ipv4/ipip.c              |   1 +
 net/ipv6/ip6_gre.c           |   1 +
 net/ipv6/ip6_tunnel.c        |   9 ++
 net/ipv6/ip6_vti.c           |   1 +
 net/ipv6/sit.c               |   1 +
 net/netlink/genetlink.c      |   4 +
 19 files changed, 308 insertions(+), 3 deletions(-)

Comments are welcome.

Regards,
Nicolas

^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH net-next v4 1/4] netns: add genl cmd to add and get peer netns ids
  2014-10-30 15:25               ` [PATCH net-next v4 0/4] netns: allow to identify peer netns Nicolas Dichtel
@ 2014-10-30 15:25                 ` Nicolas Dichtel
  2014-10-30 18:35                   ` Eric W. Biederman
  2014-10-30 15:25                 ` [PATCH net-next v4 2/4] rtnl: add link netns id to interface messages Nicolas Dichtel
                                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 67+ messages in thread
From: Nicolas Dichtel @ 2014-10-30 15:25 UTC (permalink / raw)
  To: netdev, containers, linux-kernel, linux-api
  Cc: davem, ebiederm, stephen, akpm, luto, cwang, Nicolas Dichtel

With this patch, a user can define an id for a peer netns by providing a FD or a
PID. These ids are local to a netns (i.e. valid only within that netns).

This will be useful for netlink messages when an x-netns interface is dumped.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 MAINTAINERS                 |   1 +
 include/net/net_namespace.h |   5 ++
 include/uapi/linux/Kbuild   |   1 +
 include/uapi/linux/netns.h  |  38 +++++++++
 net/core/net_namespace.c    | 195 ++++++++++++++++++++++++++++++++++++++++++++
 net/netlink/genetlink.c     |   4 +
 6 files changed, 244 insertions(+)
 create mode 100644 include/uapi/linux/netns.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 43898b1a8a2d..de7e6fcbd5c2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6382,6 +6382,7 @@ F:	include/linux/netdevice.h
 F:	include/uapi/linux/in.h
 F:	include/uapi/linux/net.h
 F:	include/uapi/linux/netdevice.h
+F:	include/uapi/linux/netns.h
 F:	tools/net/
 F:	tools/testing/selftests/net/
 F:	lib/random32.c
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index e0d64667a4b3..0f1367a71b81 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -59,6 +59,7 @@ struct net {
 	struct list_head	exit_list;	/* Use only net_mutex */
 
 	struct user_namespace   *user_ns;	/* Owning user namespace */
+	struct idr		netns_ids;
 
 	unsigned int		proc_inum;
 
@@ -289,6 +290,10 @@ static inline struct net *read_pnet(struct net * const *pnet)
 #define __net_initconst	__initconst
 #endif
 
+int peernet2id(struct net *net, struct net *peer);
+struct net *get_net_ns_by_id(struct net *net, int id);
+int netns_genl_register(void);
+
 struct pernet_operations {
 	struct list_head list;
 	int (*init)(struct net *net);
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index 6cad97485bad..d7f49c69585a 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -277,6 +277,7 @@ header-y += netfilter_decnet.h
 header-y += netfilter_ipv4.h
 header-y += netfilter_ipv6.h
 header-y += netlink.h
+header-y += netns.h
 header-y += netrom.h
 header-y += nfc.h
 header-y += nfs.h
diff --git a/include/uapi/linux/netns.h b/include/uapi/linux/netns.h
new file mode 100644
index 000000000000..2edf129377de
--- /dev/null
+++ b/include/uapi/linux/netns.h
@@ -0,0 +1,38 @@
+/* Copyright (c) 2014 6WIND S.A.
+ * Author: Nicolas Dichtel <nicolas.dichtel@6wind.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ */
+#ifndef _UAPI_LINUX_NETNS_H_
+#define _UAPI_LINUX_NETNS_H_
+
+/* Generic netlink messages */
+
+#define NETNS_GENL_NAME			"netns"
+#define NETNS_GENL_VERSION		0x1
+
+/* Commands */
+enum {
+	NETNS_CMD_UNSPEC,
+	NETNS_CMD_NEWID,
+	NETNS_CMD_GETID,
+	__NETNS_CMD_MAX,
+};
+
+#define NETNS_CMD_MAX		(__NETNS_CMD_MAX - 1)
+
+/* Attributes */
+enum {
+	NETNSA_NONE,
+#define NETNSA_NSINDEX_UNKNOWN	-1
+	NETNSA_NSID,
+	NETNSA_PID,
+	NETNSA_FD,
+	__NETNSA_MAX,
+};
+
+#define NETNSA_MAX		(__NETNSA_MAX - 1)
+
+#endif /* _UAPI_LINUX_NETNS_H_ */
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 7f155175bba8..4a5680ed42fb 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -15,6 +15,8 @@
 #include <linux/file.h>
 #include <linux/export.h>
 #include <linux/user_namespace.h>
+#include <linux/netns.h>
+#include <net/genetlink.h>
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 
@@ -144,6 +146,50 @@ static void ops_free_list(const struct pernet_operations *ops,
 	}
 }
 
+/* This function is used by idr_for_each(). If net is equal to peer, the
+ * function returns the id so that idr_for_each() stops. Because we cannot
+ * return the id 0 (idr_for_each() will not stop), we return the magic value
+ * -1 for it.
+ */
+static int net_eq_idr(int id, void *net, void *peer)
+{
+	if (net_eq(net, peer))
+		return id ? : -1;
+	return 0;
+}
+
+/* returns NETNSA_NSINDEX_UNKNOWN if not found */
+int peernet2id(struct net *net, struct net *peer)
+{
+	int id = idr_for_each(&net->netns_ids, net_eq_idr, peer);
+
+	ASSERT_RTNL();
+
+	/* Magic value for id 0. */
+	if (id == -1)
+		return 0;
+	if (id == 0)
+		return NETNSA_NSINDEX_UNKNOWN;
+
+	return id;
+}
+
+struct net *get_net_ns_by_id(struct net *net, int id)
+{
+	struct net *peer;
+
+	if (id < 0)
+		return NULL;
+
+	rcu_read_lock();
+	peer = idr_find(&net->netns_ids, id);
+	if (peer)
+		get_net(peer);
+	rcu_read_unlock();
+
+	return peer;
+}
+
 /*
  * setup_net runs the initializers for the network namespace object.
  */
@@ -158,6 +204,7 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns)
 	atomic_set(&net->passive, 1);
 	net->dev_base_seq = 1;
 	net->user_ns = user_ns;
+	idr_init(&net->netns_ids);
 
 #ifdef NETNS_REFCNT_DEBUG
 	atomic_set(&net->use_count, 0);
@@ -288,6 +335,14 @@ static void cleanup_net(struct work_struct *work)
 	list_for_each_entry(net, &net_kill_list, cleanup_list) {
 		list_del_rcu(&net->list);
 		list_add_tail(&net->exit_list, &net_exit_list);
+		for_each_net(tmp) {
+			int id = peernet2id(tmp, net);
+
+			if (id >= 0)
+				idr_remove(&tmp->netns_ids, id);
+		}
+		idr_destroy(&net->netns_ids);
+
 	}
 	rtnl_unlock();
 
@@ -399,6 +454,146 @@ static struct pernet_operations __net_initdata net_ns_ops = {
 	.exit = net_ns_net_exit,
 };
 
+static struct genl_family netns_genl_family = {
+	.id		= GENL_ID_GENERATE,
+	.name		= NETNS_GENL_NAME,
+	.version	= NETNS_GENL_VERSION,
+	.hdrsize	= 0,
+	.maxattr	= NETNSA_MAX,
+	.netnsok	= true,
+};
+
+static struct nla_policy netns_nl_policy[NETNSA_MAX + 1] = {
+	[NETNSA_NONE]		= { .type = NLA_UNSPEC },
+	[NETNSA_NSID]		= { .type = NLA_S32 },
+	[NETNSA_PID]		= { .type = NLA_U32 },
+	[NETNSA_FD]		= { .type = NLA_U32 },
+};
+
+static int netns_nl_cmd_newid(struct sk_buff *skb, struct genl_info *info)
+{
+	struct net *net = genl_info_net(info);
+	struct net *peer;
+	int nsid, err;
+
+	if (!info->attrs[NETNSA_NSID])
+		return -EINVAL;
+	nsid = nla_get_s32(info->attrs[NETNSA_NSID]);
+	if (nsid < 0)
+		return -EINVAL;
+
+	if (info->attrs[NETNSA_PID])
+		peer = get_net_ns_by_pid(nla_get_u32(info->attrs[NETNSA_PID]));
+	else if (info->attrs[NETNSA_FD])
+		peer = get_net_ns_by_fd(nla_get_u32(info->attrs[NETNSA_FD]));
+	else
+		return -EINVAL;
+	if (IS_ERR(peer))
+		return PTR_ERR(peer);
+
+	rtnl_lock();
+	if (peernet2id(net, peer) >= 0) {
+		err = -EEXIST;
+		goto out;
+	}
+
+	err = idr_alloc(&net->netns_ids, peer, nsid, nsid + 1, GFP_KERNEL);
+	if (err >= 0)
+		err = 0;
+out:
+	rtnl_unlock();
+	put_net(peer);
+	return err;
+}
+
+static int netns_nl_get_size(void)
+{
+	return nla_total_size(sizeof(s32)) /* NETNSA_NSID */
+	       ;
+}
+
+static int netns_nl_fill(struct sk_buff *skb, u32 portid, u32 seq, int flags,
+			 int cmd, struct net *net, struct net *peer)
+{
+	void *hdr;
+	int id;
+
+	hdr = genlmsg_put(skb, portid, seq, &netns_genl_family, flags, cmd);
+	if (!hdr)
+		return -EMSGSIZE;
+
+	rtnl_lock();
+	id = peernet2id(net, peer);
+	rtnl_unlock();
+	if (nla_put_s32(skb, NETNSA_NSID, id))
+		goto nla_put_failure;
+
+	return genlmsg_end(skb, hdr);
+
+nla_put_failure:
+	genlmsg_cancel(skb, hdr);
+	return -EMSGSIZE;
+}
+
+static int netns_nl_cmd_getid(struct sk_buff *skb, struct genl_info *info)
+{
+	struct net *net = genl_info_net(info);
+	struct sk_buff *msg;
+	int err = -ENOBUFS;
+	struct net *peer;
+
+	if (info->attrs[NETNSA_PID])
+		peer = get_net_ns_by_pid(nla_get_u32(info->attrs[NETNSA_PID]));
+	else if (info->attrs[NETNSA_FD])
+		peer = get_net_ns_by_fd(nla_get_u32(info->attrs[NETNSA_FD]));
+	else
+		return -EINVAL;
+
+	if (IS_ERR(peer))
+		return PTR_ERR(peer);
+
+	msg = genlmsg_new(netns_nl_get_size(), GFP_KERNEL);
+	if (!msg) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	err = netns_nl_fill(msg, info->snd_portid, info->snd_seq,
+			    NLM_F_ACK, NETNS_CMD_GETID, net, peer);
+	if (err < 0)
+		goto err_out;
+
+	err = genlmsg_unicast(net, msg, info->snd_portid);
+	goto out;
+
+err_out:
+	nlmsg_free(msg);
+out:
+	put_net(peer);
+	return err;
+}
+
+static struct genl_ops netns_genl_ops[] = {
+	{
+		.cmd = NETNS_CMD_NEWID,
+		.policy = netns_nl_policy,
+		.doit = netns_nl_cmd_newid,
+		.flags = GENL_ADMIN_PERM,
+	},
+	{
+		.cmd = NETNS_CMD_GETID,
+		.policy = netns_nl_policy,
+		.doit = netns_nl_cmd_getid,
+		.flags = GENL_ADMIN_PERM,
+	},
+};
+
+int netns_genl_register(void)
+{
+	return genl_register_family_with_ops(&netns_genl_family,
+					     netns_genl_ops);
+}
+
 static int __init net_ns_init(void)
 {
 	struct net_generic *ng;
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 76393f2f4b22..c6f39e40c9f3 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -1029,6 +1029,10 @@ static int __init genl_init(void)
 	if (err)
 		goto problem;
 
+	err = netns_genl_register();
+	if (err < 0)
+		goto problem;
+
 	return 0;
 
 problem:
-- 
2.1.0



* [PATCH net-next v4 2/4] rtnl: add link netns id to interface messages
  2014-10-30 15:25               ` [PATCH net-next v4 0/4] netns: allow to identify peer netns Nicolas Dichtel
  2014-10-30 15:25                 ` [PATCH net-next v4 1/4] netns: add genl cmd to add and get peer netns ids Nicolas Dichtel
@ 2014-10-30 15:25                 ` Nicolas Dichtel
  2014-10-30 15:25                 ` [PATCH net-next v4 3/4] iptunnels: advertise link netns via netlink Nicolas Dichtel
                                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 67+ messages in thread
From: Nicolas Dichtel @ 2014-10-30 15:25 UTC (permalink / raw)
  To: netdev, containers, linux-kernel, linux-api
  Cc: davem, ebiederm, stephen, akpm, luto, cwang, Nicolas Dichtel

This patch adds a new attribute (IFLA_LINK_NETNSID) which contains the 'link'
netns id when this netns is different from the netns where the interface
stands (for example for x-netns interfaces like ip tunnels). When there is no id,
we put NETNSA_NSINDEX_UNKNOWN into this attribute to indicate to userland that
the link netns is different from the interface netns. Hence, userland knows that
some information, like IFLA_LINK, is not interpretable.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/net/rtnetlink.h      |  2 ++
 include/uapi/linux/if_link.h |  1 +
 net/core/rtnetlink.c         | 13 +++++++++++++
 3 files changed, 16 insertions(+)

diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index e21b9f9653c0..6c6d5393fc34 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -46,6 +46,7 @@ static inline int rtnl_msg_family(const struct nlmsghdr *nlh)
  *			    to create when creating a new device.
  *	@get_num_rx_queues: Function to determine number of receive queues
  *			    to create when creating a new device.
+ *	@get_link_net: Function to get the i/o netns of the device
  */
 struct rtnl_link_ops {
 	struct list_head	list;
@@ -93,6 +94,7 @@ struct rtnl_link_ops {
 	int			(*fill_slave_info)(struct sk_buff *skb,
 						   const struct net_device *dev,
 						   const struct net_device *slave_dev);
+	struct net		*(*get_link_net)(const struct net_device *dev);
 };
 
 int __rtnl_link_register(struct rtnl_link_ops *ops);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 7072d8325016..d2729f63cf01 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -145,6 +145,7 @@ enum {
 	IFLA_CARRIER,
 	IFLA_PHYS_PORT_ID,
 	IFLA_CARRIER_CHANGES,
+	IFLA_LINK_NETNSID,
 	__IFLA_MAX
 };
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a6882686ca3a..1b9329512496 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -862,6 +862,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
 	       + nla_total_size(1) /* IFLA_OPERSTATE */
 	       + nla_total_size(1) /* IFLA_LINKMODE */
 	       + nla_total_size(4) /* IFLA_CARRIER_CHANGES */
+	       + nla_total_size(4) /* IFLA_LINK_NETNSID */
 	       + nla_total_size(ext_filter_mask
 			        & RTEXT_FILTER_VF ? 4 : 0) /* IFLA_NUM_VF */
 	       + rtnl_vfinfo_size(dev, ext_filter_mask) /* IFLA_VFINFO_LIST */
@@ -1134,6 +1135,18 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 			goto nla_put_failure;
 	}
 
+	if (dev->rtnl_link_ops &&
+	    dev->rtnl_link_ops->get_link_net) {
+		struct net *link_net = dev->rtnl_link_ops->get_link_net(dev);
+
+		if (!net_eq(dev_net(dev), link_net)) {
+			int id = peernet2id(dev_net(dev), link_net);
+
+			if (nla_put_s32(skb, IFLA_LINK_NETNSID, id))
+				goto nla_put_failure;
+		}
+	}
+
 	if (!(af_spec = nla_nest_start(skb, IFLA_AF_SPEC)))
 		goto nla_put_failure;
 
-- 
2.1.0
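
The commit message of patch 2/4 distinguishes three cases for
IFLA_LINK_NETNSID: attribute absent, attribute set to NETNSA_NSINDEX_UNKNOWN,
and attribute set to a valid id. A minimal userspace sketch of how a reader of
these messages might classify them follows. The enum, the function names and
NSINDEX_UNKNOWN are hypothetical (the latter only mirrors
NETNSA_NSINDEX_UNKNOWN from patch 1/4), attribute parsing is assumed to happen
elsewhere, and this is not iproute2 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: same value as NETNSA_NSINDEX_UNKNOWN in patch 1/4. */
#define NSINDEX_UNKNOWN (-1)

/* Hypothetical classification, not part of the patch. */
enum link_netns_state {
	LINK_NETNS_SAME,	/* attribute absent: link netns == device netns */
	LINK_NETNS_UNKNOWN,	/* attribute present, but no id is assigned */
	LINK_NETNS_KNOWN,	/* attribute present and carries a valid id */
};

/* nsid points at the s32 payload of IFLA_LINK_NETNSID, or is NULL when the
 * attribute was not in the message.
 */
enum link_netns_state classify_link_netns(const int32_t *nsid)
{
	if (nsid == NULL)
		return LINK_NETNS_SAME;
	if (*nsid < 0)		/* NSINDEX_UNKNOWN; other negatives are not valid ids */
		return LINK_NETNS_UNKNOWN;
	return LINK_NETNS_KNOWN;
}

/* Returns the id; the caller must have checked for LINK_NETNS_KNOWN first. */
int32_t link_netns_id(const int32_t *nsid)
{
	assert(classify_link_netns(nsid) == LINK_NETNS_KNOWN);
	return *nsid;
}

/* IFLA_LINK is an ifindex in the link netns, so it can only be resolved in
 * the device's own netns when both netns are the same.
 */
bool ifla_link_is_local(const int32_t *nsid)
{
	return classify_link_netns(nsid) == LINK_NETNS_SAME;
}
```

With this split, a tool could print IFLA_LINK as an interface name only when
ifla_link_is_local() is true, and fall back to the raw ifindex otherwise.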



* [PATCH net-next v4 3/4] iptunnels: advertise link netns via netlink
  2014-10-30 15:25               ` [PATCH net-next v4 0/4] netns: allow to identify peer netns Nicolas Dichtel
  2014-10-30 15:25                 ` [PATCH net-next v4 1/4] netns: add genl cmd to add and get peer netns ids Nicolas Dichtel
  2014-10-30 15:25                 ` [PATCH net-next v4 2/4] rtnl: add link netns id to interface messages Nicolas Dichtel
@ 2014-10-30 15:25                 ` Nicolas Dichtel
  2014-10-30 15:25                 ` [PATCH net-next v4 4/4] rtnl: allow to create device with IFLA_LINK_NETNSID set Nicolas Dichtel
  2014-10-30 18:41                 ` [PATCH net-next v4 0/4] netns: allow to identify peer netns Eric W. Biederman
  4 siblings, 0 replies; 67+ messages in thread
From: Nicolas Dichtel @ 2014-10-30 15:25 UTC (permalink / raw)
  To: netdev, containers, linux-kernel, linux-api
  Cc: davem, ebiederm, stephen, akpm, luto, cwang, Nicolas Dichtel

Implement the rtnl_link_ops->get_link_net() callback for IP tunnels so that
IFLA_LINK_NETNSID is added to rtnetlink messages.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/net/ip6_tunnel.h | 1 +
 include/net/ip_tunnels.h | 1 +
 net/ipv4/ip_gre.c        | 2 ++
 net/ipv4/ip_tunnel.c     | 8 ++++++++
 net/ipv4/ip_vti.c        | 1 +
 net/ipv4/ipip.c          | 1 +
 net/ipv6/ip6_gre.c       | 1 +
 net/ipv6/ip6_tunnel.c    | 9 +++++++++
 net/ipv6/ip6_vti.c       | 1 +
 net/ipv6/sit.c           | 1 +
 10 files changed, 26 insertions(+)

diff --git a/include/net/ip6_tunnel.h b/include/net/ip6_tunnel.h
index a5593dab6af7..8648519f4555 100644
--- a/include/net/ip6_tunnel.h
+++ b/include/net/ip6_tunnel.h
@@ -69,6 +69,7 @@ int ip6_tnl_xmit_ctl(struct ip6_tnl *t);
 __u16 ip6_tnl_parse_tlv_enc_lim(struct sk_buff *skb, __u8 *raw);
 __u32 ip6_tnl_get_cap(struct ip6_tnl *t, const struct in6_addr *laddr,
 			     const struct in6_addr *raddr);
+struct net *ip6_tnl_get_link_net(const struct net_device *dev);
 
 static inline void ip6tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 {
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 5bc6edeb7143..ce4ff6161fab 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -122,6 +122,7 @@ struct ip_tunnel_net {
 int ip_tunnel_init(struct net_device *dev);
 void ip_tunnel_uninit(struct net_device *dev);
 void  ip_tunnel_dellink(struct net_device *dev, struct list_head *head);
+struct net *ip_tunnel_get_link_net(const struct net_device *dev);
 int ip_tunnel_init_net(struct net *net, int ip_tnl_net_id,
 		       struct rtnl_link_ops *ops, char *devname);
 
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 12055fdbe716..9e2e29a8c989 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -827,6 +827,7 @@ static struct rtnl_link_ops ipgre_link_ops __read_mostly = {
 	.dellink	= ip_tunnel_dellink,
 	.get_size	= ipgre_get_size,
 	.fill_info	= ipgre_fill_info,
+	.get_link_net	= ip_tunnel_get_link_net,
 };
 
 static struct rtnl_link_ops ipgre_tap_ops __read_mostly = {
@@ -841,6 +842,7 @@ static struct rtnl_link_ops ipgre_tap_ops __read_mostly = {
 	.dellink	= ip_tunnel_dellink,
 	.get_size	= ipgre_get_size,
 	.fill_info	= ipgre_fill_info,
+	.get_link_net	= ip_tunnel_get_link_net,
 };
 
 static int __net_init ipgre_tap_init_net(struct net *net)
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 0bb8e141eacc..3e1edd544b27 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -972,6 +972,14 @@ void ip_tunnel_dellink(struct net_device *dev, struct list_head *head)
 }
 EXPORT_SYMBOL_GPL(ip_tunnel_dellink);
 
+struct net *ip_tunnel_get_link_net(const struct net_device *dev)
+{
+	struct ip_tunnel *tunnel = netdev_priv(dev);
+
+	return tunnel->net;
+}
+EXPORT_SYMBOL(ip_tunnel_get_link_net);
+
 int ip_tunnel_init_net(struct net *net, int ip_tnl_net_id,
 				  struct rtnl_link_ops *ops, char *devname)
 {
diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index 3e861011e4a3..f0fab26e4ddc 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -530,6 +530,7 @@ static struct rtnl_link_ops vti_link_ops __read_mostly = {
 	.changelink	= vti_changelink,
 	.get_size	= vti_get_size,
 	.fill_info	= vti_fill_info,
+	.get_link_net	= ip_tunnel_get_link_net,
 };
 
 static int __init vti_init(void)
diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index 37096d64730e..e7a183baba0a 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -498,6 +498,7 @@ static struct rtnl_link_ops ipip_link_ops __read_mostly = {
 	.dellink	= ip_tunnel_dellink,
 	.get_size	= ipip_get_size,
 	.fill_info	= ipip_fill_info,
+	.get_link_net	= ip_tunnel_get_link_net,
 };
 
 static struct xfrm_tunnel ipip_handler __read_mostly = {
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 12c3c8ef3849..5165ac7fde22 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -1661,6 +1661,7 @@ static struct rtnl_link_ops ip6gre_link_ops __read_mostly = {
 	.dellink	= ip6gre_dellink,
 	.get_size	= ip6gre_get_size,
 	.fill_info	= ip6gre_fill_info,
+	.get_link_net	= ip6_tnl_get_link_net,
 };
 
 static struct rtnl_link_ops ip6gre_tap_ops __read_mostly = {
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 9409887fb664..6b2534ea9c54 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1703,6 +1703,14 @@ nla_put_failure:
 	return -EMSGSIZE;
 }
 
+struct net *ip6_tnl_get_link_net(const struct net_device *dev)
+{
+	struct ip6_tnl *tunnel = netdev_priv(dev);
+
+	return tunnel->net;
+}
+EXPORT_SYMBOL(ip6_tnl_get_link_net);
+
 static const struct nla_policy ip6_tnl_policy[IFLA_IPTUN_MAX + 1] = {
 	[IFLA_IPTUN_LINK]		= { .type = NLA_U32 },
 	[IFLA_IPTUN_LOCAL]		= { .len = sizeof(struct in6_addr) },
@@ -1726,6 +1734,7 @@ static struct rtnl_link_ops ip6_link_ops __read_mostly = {
 	.dellink	= ip6_tnl_dellink,
 	.get_size	= ip6_tnl_get_size,
 	.fill_info	= ip6_tnl_fill_info,
+	.get_link_net	= ip6_tnl_get_link_net,
 };
 
 static struct xfrm6_tunnel ip4ip6_handler __read_mostly = {
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index d440bb585524..43966dcc9603 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -992,6 +992,7 @@ static struct rtnl_link_ops vti6_link_ops __read_mostly = {
 	.changelink	= vti6_changelink,
 	.get_size	= vti6_get_size,
 	.fill_info	= vti6_fill_info,
+	.get_link_net	= ip6_tnl_get_link_net,
 };
 
 static void __net_exit vti6_destroy_tunnels(struct vti6_net *ip6n)
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 58e5b4710127..c858d0eb267a 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1765,6 +1765,7 @@ static struct rtnl_link_ops sit_link_ops __read_mostly = {
 	.get_size	= ipip6_get_size,
 	.fill_info	= ipip6_fill_info,
 	.dellink	= ipip6_dellink,
+	.get_link_net	= ip_tunnel_get_link_net,
 };
 
 static struct xfrm_tunnel sit_handler __read_mostly = {
-- 
2.1.0



* [PATCH net-next v4 4/4] rtnl: allow to create device with IFLA_LINK_NETNSID set
  2014-10-30 15:25               ` [PATCH net-next v4 0/4] netns: allow to identify peer netns Nicolas Dichtel
                                   ` (2 preceding siblings ...)
  2014-10-30 15:25                 ` [PATCH net-next v4 3/4] iptunnels: advertise link netns via netlink Nicolas Dichtel
@ 2014-10-30 15:25                 ` Nicolas Dichtel
  2014-10-30 18:41                 ` [PATCH net-next v4 0/4] netns: allow to identify peer netns Eric W. Biederman
  4 siblings, 0 replies; 67+ messages in thread
From: Nicolas Dichtel @ 2014-10-30 15:25 UTC (permalink / raw)
  To: netdev, containers, linux-kernel, linux-api
  Cc: davem, ebiederm, stephen, akpm, luto, cwang, Nicolas Dichtel

This patch adds the ability to create a netdevice in a specified netns and
then move it into the final netns. In effect, it provides symmetry between
get and set rtnl messages.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 net/core/rtnetlink.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 1b9329512496..57959a85ed2c 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1211,6 +1211,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_NUM_RX_QUEUES]	= { .type = NLA_U32 },
 	[IFLA_PHYS_PORT_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_PORT_ID_LEN },
 	[IFLA_CARRIER_CHANGES]	= { .type = NLA_U32 },  /* ignored */
+	[IFLA_LINK_NETNSID]	= { .type = NLA_S32 },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
@@ -1983,7 +1984,7 @@ replay:
 		struct nlattr *slave_attr[m_ops ? m_ops->slave_maxtype + 1 : 0];
 		struct nlattr **data = NULL;
 		struct nlattr **slave_data = NULL;
-		struct net *dest_net;
+		struct net *dest_net, *link_net = NULL;
 
 		if (ops) {
 			if (ops->maxtype && linkinfo[IFLA_INFO_DATA]) {
@@ -2089,7 +2090,18 @@ replay:
 		if (IS_ERR(dest_net))
 			return PTR_ERR(dest_net);
 
-		dev = rtnl_create_link(dest_net, ifname, name_assign_type, ops, tb);
+		if (tb[IFLA_LINK_NETNSID]) {
+			int id = nla_get_s32(tb[IFLA_LINK_NETNSID]);
+
+			link_net = get_net_ns_by_id(dest_net, id);
+			if (link_net == NULL) {
+				err =  -EINVAL;
+				goto out;
+			}
+		}
+
+		dev = rtnl_create_link(link_net ? : dest_net, ifname,
+				       name_assign_type, ops, tb);
 		if (IS_ERR(dev)) {
 			err = PTR_ERR(dev);
 			goto out;
@@ -2117,9 +2129,16 @@ replay:
 			}
 		}
 		err = rtnl_configure_link(dev, ifm);
-		if (err < 0)
+		if (err < 0) {
 			unregister_netdevice(dev);
+			goto out;
+		}
+
+		if (link_net)
+			err = dev_change_net_namespace(dev, dest_net, ifname);
 out:
+		if (link_net)
+			put_net(link_net);
 		put_net(dest_net);
 		return err;
 	}
-- 
2.1.0



* Re: [PATCH net-next v4 1/4] netns: add genl cmd to add and get peer netns ids
  2014-10-30 15:25                 ` [PATCH net-next v4 1/4] netns: add genl cmd to add and get peer netns ids Nicolas Dichtel
@ 2014-10-30 18:35                   ` Eric W. Biederman
  2014-10-31  9:41                     ` Nicolas Dichtel
  0 siblings, 1 reply; 67+ messages in thread
From: Eric W. Biederman @ 2014-10-30 18:35 UTC (permalink / raw)
  To: Nicolas Dichtel
  Cc: netdev, containers, linux-kernel, linux-api, davem, stephen,
	akpm, luto, cwang

Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:

> With this patch, a user can define an id for a peer netns by providing an FD or a
> PID. These ids are local to a netns (i.e. valid only within one netns).

Scratches head.  Do you actually find value in using the pid instead of
a file descriptor?

Doing things by pid was an early attempt to make things work, and has
been a bit clumsy.  If you don't find value in it I would recommend just
supporting getting/setting the network namespace by file descriptor.

Eric

> This will be useful for netlink messages when an x-netns interface is dumped.
>
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> ---
>  MAINTAINERS                 |   1 +
>  include/net/net_namespace.h |   5 ++
>  include/uapi/linux/Kbuild   |   1 +
>  include/uapi/linux/netns.h  |  38 +++++++++
>  net/core/net_namespace.c    | 195 ++++++++++++++++++++++++++++++++++++++++++++
>  net/netlink/genetlink.c     |   4 +
>  6 files changed, 244 insertions(+)
>  create mode 100644 include/uapi/linux/netns.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 43898b1a8a2d..de7e6fcbd5c2 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -6382,6 +6382,7 @@ F:	include/linux/netdevice.h
>  F:	include/uapi/linux/in.h
>  F:	include/uapi/linux/net.h
>  F:	include/uapi/linux/netdevice.h
> +F:	include/uapi/linux/netns.h
>  F:	tools/net/
>  F:	tools/testing/selftests/net/
>  F:	lib/random32.c
> diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
> index e0d64667a4b3..0f1367a71b81 100644
> --- a/include/net/net_namespace.h
> +++ b/include/net/net_namespace.h
> @@ -59,6 +59,7 @@ struct net {
>  	struct list_head	exit_list;	/* Use only net_mutex */
>  
>  	struct user_namespace   *user_ns;	/* Owning user namespace */
> +	struct idr		netns_ids;
>  
>  	unsigned int		proc_inum;
>  
> @@ -289,6 +290,10 @@ static inline struct net *read_pnet(struct net * const *pnet)
>  #define __net_initconst	__initconst
>  #endif
>  
> +int peernet2id(struct net *net, struct net *peer);
> +struct net *get_net_ns_by_id(struct net *net, int id);
> +int netns_genl_register(void);
> +
>  struct pernet_operations {
>  	struct list_head list;
>  	int (*init)(struct net *net);
> diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
> index 6cad97485bad..d7f49c69585a 100644
> --- a/include/uapi/linux/Kbuild
> +++ b/include/uapi/linux/Kbuild
> @@ -277,6 +277,7 @@ header-y += netfilter_decnet.h
>  header-y += netfilter_ipv4.h
>  header-y += netfilter_ipv6.h
>  header-y += netlink.h
> +header-y += netns.h
>  header-y += netrom.h
>  header-y += nfc.h
>  header-y += nfs.h
> diff --git a/include/uapi/linux/netns.h b/include/uapi/linux/netns.h
> new file mode 100644
> index 000000000000..2edf129377de
> --- /dev/null
> +++ b/include/uapi/linux/netns.h
> @@ -0,0 +1,38 @@
> +/* Copyright (c) 2014 6WIND S.A.
> + * Author: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + */
> +#ifndef _UAPI_LINUX_NETNS_H_
> +#define _UAPI_LINUX_NETNS_H_
> +
> +/* Generic netlink messages */
> +
> +#define NETNS_GENL_NAME			"netns"
> +#define NETNS_GENL_VERSION		0x1
> +
> +/* Commands */
> +enum {
> +	NETNS_CMD_UNSPEC,
> +	NETNS_CMD_NEWID,
> +	NETNS_CMD_GETID,
> +	__NETNS_CMD_MAX,
> +};
> +
> +#define NETNS_CMD_MAX		(__NETNS_CMD_MAX - 1)
> +
> +/* Attributes */
> +enum {
> +	NETNSA_NONE,
> +#define NETNSA_NSINDEX_UNKNOWN	-1
> +	NETNSA_NSID,
> +	NETNSA_PID,
> +	NETNSA_FD,
> +	__NETNSA_MAX,
> +};
> +
> +#define NETNSA_MAX		(__NETNSA_MAX - 1)
> +
> +#endif /* _UAPI_LINUX_NETNS_H_ */
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> index 7f155175bba8..4a5680ed42fb 100644
> --- a/net/core/net_namespace.c
> +++ b/net/core/net_namespace.c
> @@ -15,6 +15,8 @@
>  #include <linux/file.h>
>  #include <linux/export.h>
>  #include <linux/user_namespace.h>
> +#include <linux/netns.h>
> +#include <net/genetlink.h>
>  #include <net/net_namespace.h>
>  #include <net/netns/generic.h>
>  
> @@ -144,6 +146,50 @@ static void ops_free_list(const struct pernet_operations *ops,
>  	}
>  }
>  
> +/* This function is used by idr_for_each(). If net is equal to peer, the
> + * function returns the id so that idr_for_each() stops. Because we cannot
> + * returns the id 0 (idr_for_each() will not stop), we return the magic value
> + * -1 for it.
> + */
> +static int net_eq_idr(int id, void *net, void *peer)
> +{
> +	if (net_eq(net, peer))
> +		return id ? : -1;
> +	return 0;
> +}
> +
> +/* returns NETNSA_NSINDEX_UNKNOWN if not found */
> +int peernet2id(struct net *net, struct net *peer)
> +{
> +	int id = idr_for_each(&net->netns_ids, net_eq_idr, peer);
> +
> +	ASSERT_RTNL();
> +
> +	/* Magic value for id 0. */
> +	if (id == -1)
> +		return 0;
> +	if (id == 0)
> +		return NETNSA_NSINDEX_UNKNOWN;
> +
> +	return id;
> +}
> +
> +struct net *get_net_ns_by_id(struct net *net, int id)
> +{
> +	struct net *peer;
> +
> +	if (id < 0)
> +		return NULL;
> +
> +	rcu_read_lock();
> +	peer = idr_find(&net->netns_ids, id);
> +	if (peer)
> +		get_net(peer);
> +	rcu_read_unlock();
> +
> +	return peer;
> +}
> +
>  /*
>   * setup_net runs the initializers for the network namespace object.
>   */
> @@ -158,6 +204,7 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns)
>  	atomic_set(&net->passive, 1);
>  	net->dev_base_seq = 1;
>  	net->user_ns = user_ns;
> +	idr_init(&net->netns_ids);
>  
>  #ifdef NETNS_REFCNT_DEBUG
>  	atomic_set(&net->use_count, 0);
> @@ -288,6 +335,14 @@ static void cleanup_net(struct work_struct *work)
>  	list_for_each_entry(net, &net_kill_list, cleanup_list) {
>  		list_del_rcu(&net->list);
>  		list_add_tail(&net->exit_list, &net_exit_list);
> +		for_each_net(tmp) {
> +			int id = peernet2id(tmp, net);
> +
> +			if (id >= 0)
> +				idr_remove(&tmp->netns_ids, id);
> +		}
> +		idr_destroy(&net->netns_ids);
> +
>  	}
>  	rtnl_unlock();
>  
> @@ -399,6 +454,146 @@ static struct pernet_operations __net_initdata net_ns_ops = {
>  	.exit = net_ns_net_exit,
>  };
>  
> +static struct genl_family netns_genl_family = {
> +	.id		= GENL_ID_GENERATE,
> +	.name		= NETNS_GENL_NAME,
> +	.version	= NETNS_GENL_VERSION,
> +	.hdrsize	= 0,
> +	.maxattr	= NETNSA_MAX,
> +	.netnsok	= true,
> +};
> +
> +static struct nla_policy netns_nl_policy[NETNSA_MAX + 1] = {
> +	[NETNSA_NONE]		= { .type = NLA_UNSPEC },
> +	[NETNSA_NSID]		= { .type = NLA_S32 },
> +	[NETNSA_PID]		= { .type = NLA_U32 },
> +	[NETNSA_FD]		= { .type = NLA_U32 },
> +};
> +
> +static int netns_nl_cmd_newid(struct sk_buff *skb, struct genl_info *info)
> +{
> +	struct net *net = genl_info_net(info);
> +	struct net *peer;
> +	int nsid, err;
> +
> +	if (!info->attrs[NETNSA_NSID])
> +		return -EINVAL;
> +	nsid = nla_get_s32(info->attrs[NETNSA_NSID]);
> +	if (nsid < 0)
> +		return -EINVAL;
> +
> +	if (info->attrs[NETNSA_PID])
> +		peer = get_net_ns_by_pid(nla_get_u32(info->attrs[NETNSA_PID]));
> +	else if (info->attrs[NETNSA_FD])
> +		peer = get_net_ns_by_fd(nla_get_u32(info->attrs[NETNSA_FD]));
> +	else
> +		return -EINVAL;
> +	if (IS_ERR(peer))
> +		return PTR_ERR(peer);
> +
> +	rtnl_lock();
> +	if (peernet2id(net, peer) >= 0) {
> +		err = -EEXIST;
> +		goto out;
> +	}
> +
> +	err = idr_alloc(&net->netns_ids, peer, nsid, nsid + 1, GFP_KERNEL);
> +	if (err >= 0)
> +		err = 0;
> +out:
> +	rtnl_unlock();
> +	put_net(peer);
> +	return err;
> +}
> +
> +static int netns_nl_get_size(void)
> +{
> +	return nla_total_size(sizeof(s32)) /* NETNSA_NSID */
> +	       ;
> +}
> +
> +static int netns_nl_fill(struct sk_buff *skb, u32 portid, u32 seq, int flags,
> +			 int cmd, struct net *net, struct net *peer)
> +{
> +	void *hdr;
> +	int id;
> +
> +	hdr = genlmsg_put(skb, portid, seq, &netns_genl_family, flags, cmd);
> +	if (!hdr)
> +		return -EMSGSIZE;
> +
> +	rtnl_lock();
> +	id = peernet2id(net, peer);
> +	rtnl_unlock();
> +	if (nla_put_s32(skb, NETNSA_NSID, id))
> +		goto nla_put_failure;
> +
> +	return genlmsg_end(skb, hdr);
> +
> +nla_put_failure:
> +	genlmsg_cancel(skb, hdr);
> +	return -EMSGSIZE;
> +}
> +
> +static int netns_nl_cmd_getid(struct sk_buff *skb, struct genl_info *info)
> +{
> +	struct net *net = genl_info_net(info);
> +	struct sk_buff *msg;
> +	int err = -ENOBUFS;
> +	struct net *peer;
> +
> +	if (info->attrs[NETNSA_PID])
> +		peer = get_net_ns_by_pid(nla_get_u32(info->attrs[NETNSA_PID]));
> +	else if (info->attrs[NETNSA_FD])
> +		peer = get_net_ns_by_fd(nla_get_u32(info->attrs[NETNSA_FD]));
> +	else
> +		return -EINVAL;
> +
> +	if (IS_ERR(peer))
> +		return PTR_ERR(peer);
> +
> +	msg = genlmsg_new(netns_nl_get_size(), GFP_KERNEL);
> +	if (!msg) {
> +		err = -ENOMEM;
> +		goto out;
> +	}
> +
> +	err = netns_nl_fill(msg, info->snd_portid, info->snd_seq,
> +			    NLM_F_ACK, NETNS_CMD_GETID, net, peer);
> +	if (err < 0)
> +		goto err_out;
> +
> +	err = genlmsg_unicast(net, msg, info->snd_portid);
> +	goto out;
> +
> +err_out:
> +	nlmsg_free(msg);
> +out:
> +	put_net(peer);
> +	return err;
> +}
> +
> +static struct genl_ops netns_genl_ops[] = {
> +	{
> +		.cmd = NETNS_CMD_NEWID,
> +		.policy = netns_nl_policy,
> +		.doit = netns_nl_cmd_newid,
> +		.flags = GENL_ADMIN_PERM,
> +	},
> +	{
> +		.cmd = NETNS_CMD_GETID,
> +		.policy = netns_nl_policy,
> +		.doit = netns_nl_cmd_getid,
> +		.flags = GENL_ADMIN_PERM,
> +	},
> +};
> +
> +int netns_genl_register(void)
> +{
> +	return genl_register_family_with_ops(&netns_genl_family,
> +					     netns_genl_ops);
> +}
> +
>  static int __init net_ns_init(void)
>  {
>  	struct net_generic *ng;
> diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
> index 76393f2f4b22..c6f39e40c9f3 100644
> --- a/net/netlink/genetlink.c
> +++ b/net/netlink/genetlink.c
> @@ -1029,6 +1029,10 @@ static int __init genl_init(void)
>  	if (err)
>  		goto problem;
>  
> +	err = netns_genl_register();
> +	if (err < 0)
> +		goto problem;
> +
>  	return 0;
>  
>  problem:
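
The quoted net_eq_idr()/peernet2id() pair relies on idr_for_each() stopping at
the first non-zero callback return, which is why a match on id 0 has to be
reported as -1. A minimal userspace sketch of that convention follows. It uses
a plain pointer array and a hypothetical table_for_each() in place of the
kernel IDR, so all names here are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: same value as NETNSA_NSINDEX_UNKNOWN in the patch. */
#define NSINDEX_UNKNOWN (-1)

typedef int (*slot_fn)(int id, void *entry, void *data);

/* Hypothetical stand-in for idr_for_each(): calls fn on every non-NULL slot
 * and stops at the first non-zero return value, which it hands back.
 */
int table_for_each(void **table, int n, slot_fn fn, void *data)
{
	int id, ret;

	assert(table != NULL);
	for (id = 0; id < n; id++) {
		if (table[id] == NULL)
			continue;
		ret = fn(id, table[id], data);
		if (ret != 0)
			return ret;
	}
	return 0;
}

/* A match on id 0 cannot be returned as 0 (the walk would go on), so it is
 * encoded as -1.
 */
int match_entry(int id, void *entry, void *data)
{
	if (entry == data)
		return id ? id : -1;
	return 0;
}

/* Decodes the walk result: -1 means id 0, 0 means "not found". */
int entry_to_id(void **table, int n, void *peer)
{
	int ret = table_for_each(table, n, match_entry, peer);

	if (ret == -1)
		return 0;
	if (ret == 0)
		return NSINDEX_UNKNOWN;
	return ret;
}
```

The lookup is linear in the number of assigned ids, which matches the cover
letter's remark that a reverse table could be added later if this becomes a
bottleneck.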


* Re: [PATCH net-next v4 0/4] netns: allow to identify peer netns
  2014-10-30 15:25               ` [PATCH net-next v4 0/4] netns: allow to identify peer netns Nicolas Dichtel
                                   ` (3 preceding siblings ...)
  2014-10-30 15:25                 ` [PATCH net-next v4 4/4] rtnl: allow to create device with IFLA_LINK_NETNSID set Nicolas Dichtel
@ 2014-10-30 18:41                 ` Eric W. Biederman
  2014-10-31  9:48                   ` Nicolas Dichtel
                                     ` (2 more replies)
  4 siblings, 3 replies; 67+ messages in thread
From: Eric W. Biederman @ 2014-10-30 18:41 UTC (permalink / raw)
  To: Nicolas Dichtel
  Cc: netdev, containers, linux-kernel, linux-api, davem, stephen,
	akpm, luto, cwang

Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:

> The goal of this series is to be able to multicast netlink messages with an
> attribute that identifies a peer netns.
> This is needed by userland to interpret some information contained in
> netlink messages (like IFLA_LINK value, but also some other attributes in case
> of x-netns netdevice (see also
> http://thread.gmane.org/gmane.linux.network/315933/focus=316064 and
> http://thread.gmane.org/gmane.linux.kernel.containers/28301/focus=4239)).
>
> Ids of peer netns are set by userland via new genl messages. These ids are
> stored per netns and are local (i.e. only valid in the netns where they are set).
> To avoid allocating an int for each peer netns, I use idr_for_each() to retrieve
> the id of a peer netns. Note that it will be possible to add a table (struct net
> -> id) later to optimize this lookup if needed.
>
> Patch 1/4 introduces the netlink API mechanism to set and get these ids.
> Patches 2/4 and 3/4 implement an example of how to use these ids in rtnetlink
> messages. And patch 4/4 shows that the netlink messages can be symmetric between
> a GET and a SET.
>
> iproute2 patches are available, I can send them on demand.

A quick reply.  I think this patchset is in the right general direction.
There are some details that seem odd/awkward to me, such as using
genetlink instead of rtnetlink to get and set the ids, and not having
ids if they are not set (that feels like a maintenance/usability challenge).

I would like to give your patches a deep review, but I won't be able to
do that for a couple of weeks.  I am deep in the process of moving,
and will be mostly offline until about Nov 11th.

Eric


> Here is a small screenshot to show how it can be used by userland.
>
> First, setup netns and required ids:
> $ ip netns add foo
> $ ip netns del foo
> $ ip netns
> $ touch /var/run/netns/init_net
> $ mount --bind /proc/1/ns/net /var/run/netns/init_net
> $ ip netns add foo
> $ ip netns exec foo ip netns set init_net 0
> $ ip netns
> foo
> init_net
> $ ip netns exec foo ip netns
> foo
> init_net (id: 0)
>
> Now, add and display an ipip tunnel, with its link part in init_net (id 0 in
> netns foo) and the netdevice in foo:
> $ ip netns exec foo ip link add ipip1 link-netnsid 0 type ipip remote 10.16.0.121 local 10.16.0.249
> $ ip netns exec foo ip l ls ipip1
> 6: ipip1@NONE: <POINTOPOINT,NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default 
>     link/ipip 10.16.0.249 peer 10.16.0.121 link-netnsid 0
>
> The parameter link-netnsid shows us where the interface sends and receives
> packets (and thus we know where encapsulated addresses are set).
>
> RFCv3 -> v4:
>   rebase on net-next
>   add copyright text in the new netns.h file
>
> RFCv2 -> RFCv3:
>   ids are now defined by userland (via netlink). Ids are stored in each netns
>   (and they are local to this netns).
>   add get_link_net support for ip6 tunnels
>   netnsid is now a s32 instead of a u32
>
> RFCv1 -> RFCv2:
>   remove useless ()
>   ids are now stored in the user ns. It's possible to get an id for a peer netns
>   only if the current netns and the peer netns have the same user ns parent.
>
>  MAINTAINERS                  |   1 +
>  include/net/ip6_tunnel.h     |   1 +
>  include/net/ip_tunnels.h     |   1 +
>  include/net/net_namespace.h  |   5 ++
>  include/net/rtnetlink.h      |   2 +
>  include/uapi/linux/Kbuild    |   1 +
>  include/uapi/linux/if_link.h |   1 +
>  include/uapi/linux/netns.h   |  38 +++++++++
>  net/core/net_namespace.c     | 195 +++++++++++++++++++++++++++++++++++++++++++
>  net/core/rtnetlink.c         |  38 ++++++++-
>  net/ipv4/ip_gre.c            |   2 +
>  net/ipv4/ip_tunnel.c         |   8 ++
>  net/ipv4/ip_vti.c            |   1 +
>  net/ipv4/ipip.c              |   1 +
>  net/ipv6/ip6_gre.c           |   1 +
>  net/ipv6/ip6_tunnel.c        |   9 ++
>  net/ipv6/ip6_vti.c           |   1 +
>  net/ipv6/sit.c               |   1 +
>  net/netlink/genetlink.c      |   4 +
>  19 files changed, 308 insertions(+), 3 deletions(-)
>
> Comments are welcome.
>
> Regards,
> Nicolas


* Re: [PATCH net-next v4 1/4] netns: add genl cmd to add and get peer netns ids
  2014-10-30 18:35                   ` Eric W. Biederman
@ 2014-10-31  9:41                     ` Nicolas Dichtel
  0 siblings, 0 replies; 67+ messages in thread
From: Nicolas Dichtel @ 2014-10-31  9:41 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: netdev, containers, linux-kernel, linux-api, davem, stephen,
	akpm, luto, cwang

On 30/10/2014 19:35, Eric W. Biederman wrote:
> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>
>> With this patch, a user can define an id for a peer netns by providing an FD or a
>> PID. These ids are local to a netns (i.e. valid only within one netns).
>
> Scratches head.  Do you actually find value in using the pid instead of
> a file descriptor?
I copied the mechanism from rtnl_link_get_net():
it first checks whether the user provided a PID and, if not, checks for an FD.

>
> Doing things by pid was an early attempt to make things work, and has
> been a bit clutsy.  If you don't find value in it I would recommend just
> supporting getting/setting the network namespace by file descriptor.
Hmm, if I understand correctly, that is what the patch does:

[snip]
>> +static int netns_nl_cmd_newid(struct sk_buff *skb, struct genl_info *info)
>> +{
[snip]
>> +	if (info->attrs[NETNSA_PID])
>> +		peer = get_net_ns_by_pid(nla_get_u32(info->attrs[NETNSA_PID]));
>> +	else if (info->attrs[NETNSA_FD])
>> +		peer = get_net_ns_by_fd(nla_get_u32(info->attrs[NETNSA_FD]));
>> +	else
>> +		return -EINVAL;
>> +	if (IS_ERR(peer))
>> +		return PTR_ERR(peer);

Am I right?


Regards,
Nicolas
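
The lookup order discussed in this message (PID first, then FD, otherwise
-EINVAL) can be sketched in userspace as below. The struct, the enum and
choose_peer_lookup() are hypothetical names used only to illustrate the
precedence; the real handlers read info->attrs[] directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a request: which lookup attributes were supplied. */
struct peer_request {
	bool has_pid;		/* NETNSA_PID present */
	bool has_fd;		/* NETNSA_FD present */
	uint32_t pid;
	uint32_t fd;
};

enum peer_lookup {
	PEER_BY_PID,
	PEER_BY_FD,
	PEER_INVALID,		/* the quoted handlers return -EINVAL here */
};

/* Same precedence as the quoted if/else chain: a PID wins when both
 * attributes are present, and the FD is only used when no PID is given.
 */
enum peer_lookup choose_peer_lookup(const struct peer_request *req)
{
	assert(req != NULL);
	if (req->has_pid)
		return PEER_BY_PID;
	if (req->has_fd)
		return PEER_BY_FD;
	return PEER_INVALID;
}
```

Dropping the PID branch, as suggested in the review, would reduce this to a
single check on the FD attribute.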


* Re: [PATCH net-next v4 0/4] netns: allow to identify peer netns
  2014-10-30 18:41                 ` [PATCH net-next v4 0/4] netns: allow to identify peer netns Eric W. Biederman
@ 2014-10-31  9:48                   ` Nicolas Dichtel
  2014-10-31 19:14                     ` Eric W. Biederman
  2014-11-01 21:08                   ` [PATCH net-next v4 " David Miller
  2014-11-24 13:45                   ` Nicolas Dichtel
  2 siblings, 1 reply; 67+ messages in thread
From: Nicolas Dichtel @ 2014-10-31  9:48 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: netdev, containers, linux-kernel, linux-api, davem, stephen,
	akpm, luto, cwang

On 30/10/2014 19:41, Eric W. Biederman wrote:
> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>
>> The goal of this serie is to be able to multicast netlink messages with an
>> attribute that identify a peer netns.
>> This is needed by the userland to interpret some informations contained in
>> netlink messages (like IFLA_LINK value, but also some other attributes in case
>> of x-netns netdevice (see also
>> http://thread.gmane.org/gmane.linux.network/315933/focus=316064 and
>> http://thread.gmane.org/gmane.linux.kernel.containers/28301/focus=4239)).
>>
>> Ids of peer netns are set by userland via a new genl messages. These ids are
>> stored per netns and are local (ie only valid in the netns where they are set).
>> To avoid allocating an int for each peer netns, I use idr_for_each() to retrieve
>> the id of a peer netns. Note that it will be possible to add a table (struct net
>> -> id) later to optimize this lookup if needed.
>>
>> Patch 1/4 introduces the netlink API mechanism to set and get these ids.
>> Patch 2/4 and 3/4 implements an example of how to use these ids in rtnetlink
>> messages. And patch 4/4 shows that the netlink messages can be symetric between
>> a GET and a SET.
>>
>> iproute2 patches are available, I can send them on demand.
>
> A quick reply.  I think this patchset is in the right general direction.
> There are some oddball details that seem odd/awkward to me such as using
> genetlink instead of rtnetlink to get and set the ids, and not having
> ids if they are not set (that feels like a maintenance/usability challenge).
No problem with using rtnetlink; in fact, I hesitated between the two.

For the second point, I'm not sure I follow you: how can we have an id that
does not break migration without asking the user to set it?
Note that if the user does not provide an id, there is still a magic value
that says "it's a peer netns, but we don't know which one".

>
> I would like to give your patches a deep review, but I won't be able to
> do that for a couple of weeks.  I am deep in the process of moving,
> and will be mostly offline until about the Nov 11th.

No problem, I will wait.
It would be great to get a final version in for 3.19 ;-)


Thank you,
Nicolas

* Re: [PATCH net-next v4 0/4] netns: allow to identify peer netns
  2014-10-31  9:48                   ` Nicolas Dichtel
@ 2014-10-31 19:14                     ` Eric W. Biederman
  2014-11-05 14:23                       ` Nicolas Dichtel
  2015-01-15 14:11                       ` [PATCH net-next v5 " Nicolas Dichtel
  0 siblings, 2 replies; 67+ messages in thread
From: Eric W. Biederman @ 2014-10-31 19:14 UTC (permalink / raw)
  To: nicolas.dichtel
  Cc: netdev, containers, linux-kernel, linux-api, davem, stephen,
	akpm, luto, cwang

Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:

> On 30/10/2014 19:41, Eric W. Biederman wrote:
>> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>>
>>> The goal of this serie is to be able to multicast netlink messages with an
>>> attribute that identify a peer netns.
>>> This is needed by the userland to interpret some informations contained in
>>> netlink messages (like IFLA_LINK value, but also some other attributes in case
>>> of x-netns netdevice (see also
>>> http://thread.gmane.org/gmane.linux.network/315933/focus=316064 and
>>> http://thread.gmane.org/gmane.linux.kernel.containers/28301/focus=4239)).
>>>
>>> Ids of peer netns are set by userland via a new genl messages. These ids are
>>> stored per netns and are local (ie only valid in the netns where they are set).
>>> To avoid allocating an int for each peer netns, I use idr_for_each() to retrieve
>>> the id of a peer netns. Note that it will be possible to add a table (struct net
>>> -> id) later to optimize this lookup if needed.
>>>
>>> Patch 1/4 introduces the netlink API mechanism to set and get these ids.
>>> Patch 2/4 and 3/4 implements an example of how to use these ids in rtnetlink
>>> messages. And patch 4/4 shows that the netlink messages can be symetric between
>>> a GET and a SET.
>>>
>>> iproute2 patches are available, I can send them on demand.
>>
>> A quick reply.  I think this patchset is in the right general direction.
>> There are some oddball details that seem odd/awkward to me such as using
>> genetlink instead of rtnetlink to get and set the ids, and not having
>> ids if they are not set (that feels like a maintenance/usability challenge).
> No problem to use rtnetlink, in fact, I hesitated.
>
> For the second point, I'm not sure to follow you: how to have an id, which will
> not break migration, without asking the user to set it?

We have that situation with ifindex already.  Basically the thought is
to allow an id to be set, but also allow an id to be auto-generated if
we use a namespace without an id being set.

My gut says if we can figure that out we will have an interface with
much more utility.
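
To illustrate the "set an id, or have one auto-generated on first use"
semantics, here is a minimal userspace sketch. It assumes a small fixed-size
table in place of the kernel idr, and peers are represented by non-NULL
pointers. The names nsid_table, nsid_lookup, nsid_alloc and nsid_get_or_alloc
are hypothetical and exist only for this example.

```c
#include <stddef.h>

#define NSID_MAX          64    /* hypothetical table size for this sketch */
#define NSID_NOT_ASSIGNED (-1)

struct nsid_table {
	const void *peer[NSID_MAX]; /* peer[id] == NULL means the id is free */
};

/* Return the id already bound to peer, or NSID_NOT_ASSIGNED. */
static int nsid_lookup(const struct nsid_table *t, const void *peer)
{
	for (int id = 0; id < NSID_MAX; id++)
		if (t->peer[id] == peer)
			return id;
	return NSID_NOT_ASSIGNED;
}

/* Bind peer to reqid. A negative reqid means "pick the lowest free id".
 * Returns the id on success, or -1 if the id is taken or out of range.
 */
static int nsid_alloc(struct nsid_table *t, const void *peer, int reqid)
{
	if (reqid >= NSID_MAX)
		return -1;
	if (reqid >= 0) {
		if (t->peer[reqid])
			return -1;
		t->peer[reqid] = peer;
		return reqid;
	}
	for (int id = 0; id < NSID_MAX; id++) {
		if (!t->peer[id]) {
			t->peer[id] = peer;
			return id;
		}
	}
	return -1;
}

/* Reuse the id if one exists, otherwise auto-generate one. */
static int nsid_get_or_alloc(struct nsid_table *t, const void *peer)
{
	int id = nsid_lookup(t, peer);

	return id >= 0 ? id : nsid_alloc(t, peer, -1);
}
```

A caller that cares about stable ids (for example across migration) would
call nsid_alloc() with an explicit id up front; everything else can rely on
nsid_get_or_alloc() and still always see an id.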

> Note that if the user does not provide an id, you still have a magic value to
> say "it's a peer netns but we don't know which one".

That is certainly an improvement in clarity over where we are today.

>> I would like to give your patches a deep review, but I won't be able to
>> do that for a couple of weeks.  I am deep in the process of moving,
>> and will be mostly offline until about the Nov 11th.
>
> No problem, I will wait.
> I would be great to get a final version for the 3.19 ;-)

Eric

* Re: [PATCH net-next v4 0/4] netns: allow to identify peer netns
  2014-10-30 18:41                 ` [PATCH net-next v4 0/4] netns: allow to identify peer netns Eric W. Biederman
  2014-10-31  9:48                   ` Nicolas Dichtel
@ 2014-11-01 21:08                   ` David Miller
  2014-11-24 13:45                   ` Nicolas Dichtel
  2 siblings, 0 replies; 67+ messages in thread
From: David Miller @ 2014-11-01 21:08 UTC (permalink / raw)
  To: ebiederm
  Cc: nicolas.dichtel, netdev, containers, linux-kernel, linux-api,
	stephen, akpm, luto, cwang

From: ebiederm@xmission.com (Eric W. Biederman)
Date: Thu, 30 Oct 2014 11:41:03 -0700

> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
> 
>> The goal of this serie is to be able to multicast netlink messages with an
>> attribute that identify a peer netns.
>> This is needed by the userland to interpret some informations contained in
>> netlink messages (like IFLA_LINK value, but also some other attributes in case
>> of x-netns netdevice (see also
>> http://thread.gmane.org/gmane.linux.network/315933/focus=316064 and
>> http://thread.gmane.org/gmane.linux.kernel.containers/28301/focus=4239)).
>>
>> Ids of peer netns are set by userland via a new genl messages. These ids are
>> stored per netns and are local (ie only valid in the netns where they are set).
>> To avoid allocating an int for each peer netns, I use idr_for_each() to retrieve
>> the id of a peer netns. Note that it will be possible to add a table (struct net
>> -> id) later to optimize this lookup if needed.
>>
>> Patch 1/4 introduces the netlink API mechanism to set and get these ids.
>> Patch 2/4 and 3/4 implements an example of how to use these ids in rtnetlink
>> messages. And patch 4/4 shows that the netlink messages can be symetric between
>> a GET and a SET.
>>
>> iproute2 patches are available, I can send them on demand.
> 
> A quick reply.  I think this patchset is in the right general direction.
> There are some oddball details that seem odd/awkward to me such as using
> genetlink instead of rtnetlink to get and set the ids, and not having
> ids if they are not set (that feels like a maintenance/usability challenge).
> 
> I would like to give your patches a deep review, but I won't be able to
> do that for a couple of weeks.  I am deep in the process of moving,
> and will be mostly offline until about the Nov 11th.

I'm going to mark this patch set 'deferred' in patchwork until things
move forward.

Thanks.

* Re: [PATCH net-next v4 0/4] netns: allow to identify peer netns
  2014-10-31 19:14                     ` Eric W. Biederman
@ 2014-11-05 14:23                       ` Nicolas Dichtel
  2014-12-04 16:21                         ` Nicolas Dichtel
  2015-01-15 14:11                       ` [PATCH net-next v5 " Nicolas Dichtel
  1 sibling, 1 reply; 67+ messages in thread
From: Nicolas Dichtel @ 2014-11-05 14:23 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: netdev, containers, linux-kernel, linux-api, davem, stephen,
	akpm, luto, cwang

On 31/10/2014 20:14, Eric W. Biederman wrote:
> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>
>> On 30/10/2014 19:41, Eric W. Biederman wrote:
>>> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>>>
>>>> The goal of this serie is to be able to multicast netlink messages with an
>>>> attribute that identify a peer netns.
>>>> This is needed by the userland to interpret some informations contained in
>>>> netlink messages (like IFLA_LINK value, but also some other attributes in case
>>>> of x-netns netdevice (see also
>>>> http://thread.gmane.org/gmane.linux.network/315933/focus=316064 and
>>>> http://thread.gmane.org/gmane.linux.kernel.containers/28301/focus=4239)).
>>>>
>>>> Ids of peer netns are set by userland via a new genl messages. These ids are
>>>> stored per netns and are local (ie only valid in the netns where they are set).
>>>> To avoid allocating an int for each peer netns, I use idr_for_each() to retrieve
>>>> the id of a peer netns. Note that it will be possible to add a table (struct net
>>>> -> id) later to optimize this lookup if needed.
>>>>
>>>> Patch 1/4 introduces the netlink API mechanism to set and get these ids.
>>>> Patch 2/4 and 3/4 implements an example of how to use these ids in rtnetlink
>>>> messages. And patch 4/4 shows that the netlink messages can be symetric between
>>>> a GET and a SET.
>>>>
>>>> iproute2 patches are available, I can send them on demand.
>>>
>>> A quick reply.  I think this patchset is in the right general direction.
>>> There are some oddball details that seem odd/awkward to me such as using
>>> genetlink instead of rtnetlink to get and set the ids, and not having
>>> ids if they are not set (that feels like a maintenance/usability challenge).
>> No problem to use rtnetlink, in fact, I hesitated.
>>
>> For the second point, I'm not sure to follow you: how to have an id, which will
>> not break migration, without asking the user to set it?
>
> We have that situtation with ifindex already.  Basically the thought is
> to allow an id to be set, but also allow an id to be auto-generated if
> we use an namespace without an id being set.
If my understanding is correct, the difference is that we want to hide some
netns.
Do you think we can generate an id for each netns that does not have one, and
rely on the fact that this id has no meaning unless you hold a netns file
descriptor that allows you to get the id of this netns?


Regards,
Nicolas

* Re: [PATCH net-next v4 0/4] netns: allow to identify peer netns
  2014-10-30 18:41                 ` [PATCH net-next v4 0/4] netns: allow to identify peer netns Eric W. Biederman
  2014-10-31  9:48                   ` Nicolas Dichtel
  2014-11-01 21:08                   ` [PATCH net-next v4 " David Miller
@ 2014-11-24 13:45                   ` Nicolas Dichtel
  2 siblings, 0 replies; 67+ messages in thread
From: Nicolas Dichtel @ 2014-11-24 13:45 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: netdev, containers, linux-kernel, linux-api, davem, stephen,
	akpm, luto, cwang

On 30/10/2014 19:41, Eric W. Biederman wrote:
> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>
>> The goal of this serie is to be able to multicast netlink messages with an
>> attribute that identify a peer netns.
>> This is needed by the userland to interpret some informations contained in
>> netlink messages (like IFLA_LINK value, but also some other attributes in case
>> of x-netns netdevice (see also
>> http://thread.gmane.org/gmane.linux.network/315933/focus=316064 and
>> http://thread.gmane.org/gmane.linux.kernel.containers/28301/focus=4239)).
>>
>> Ids of peer netns are set by userland via a new genl messages. These ids are
>> stored per netns and are local (ie only valid in the netns where they are set).
>> To avoid allocating an int for each peer netns, I use idr_for_each() to retrieve
>> the id of a peer netns. Note that it will be possible to add a table (struct net
>> -> id) later to optimize this lookup if needed.
>>
>> Patch 1/4 introduces the netlink API mechanism to set and get these ids.
>> Patch 2/4 and 3/4 implements an example of how to use these ids in rtnetlink
>> messages. And patch 4/4 shows that the netlink messages can be symetric between
>> a GET and a SET.
>>
>> iproute2 patches are available, I can send them on demand.
>
> A quick reply.  I think this patchset is in the right general direction.
> There are some oddball details that seem odd/awkward to me such as using
> genetlink instead of rtnetlink to get and set the ids, and not having
> ids if they are not set (that feels like a maintenance/usability challenge).
>
> I would like to give your patches a deep review, but I won't be able to
> do that for a couple of weeks.  I am deep in the process of moving,
> and will be mostly offline until about the Nov 11th.
Eric, did you have a chance to look at this?


Regards,
Nicolas

* Re: [PATCH net-next v4 0/4] netns: allow to identify peer netns
  2014-11-05 14:23                       ` Nicolas Dichtel
@ 2014-12-04 16:21                         ` Nicolas Dichtel
  0 siblings, 0 replies; 67+ messages in thread
From: Nicolas Dichtel @ 2014-12-04 16:21 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: netdev, containers, linux-kernel, linux-api, davem, stephen,
	akpm, luto, cwang

On 05/11/2014 15:23, Nicolas Dichtel wrote:
> On 31/10/2014 20:14, Eric W. Biederman wrote:
>> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>>
>>> On 30/10/2014 19:41, Eric W. Biederman wrote:
>>>> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>>>>
>>>>> The goal of this serie is to be able to multicast netlink messages with an
>>>>> attribute that identify a peer netns.
>>>>> This is needed by the userland to interpret some informations contained in
>>>>> netlink messages (like IFLA_LINK value, but also some other attributes in case
>>>>> of x-netns netdevice (see also
>>>>> http://thread.gmane.org/gmane.linux.network/315933/focus=316064 and
>>>>> http://thread.gmane.org/gmane.linux.kernel.containers/28301/focus=4239)).
>>>>>
>>>>> Ids of peer netns are set by userland via a new genl messages. These ids are
>>>>> stored per netns and are local (ie only valid in the netns where they are
>>>>> set).
>>>>> To avoid allocating an int for each peer netns, I use idr_for_each() to
>>>>> retrieve
>>>>> the id of a peer netns. Note that it will be possible to add a table
>>>>> (struct net
>>>>> -> id) later to optimize this lookup if needed.
>>>>>
>>>>> Patch 1/4 introduces the netlink API mechanism to set and get these ids.
>>>>> Patch 2/4 and 3/4 implements an example of how to use these ids in rtnetlink
>>>>> messages. And patch 4/4 shows that the netlink messages can be symetric
>>>>> between
>>>>> a GET and a SET.
>>>>>
>>>>> iproute2 patches are available, I can send them on demand.
>>>>
>>>> A quick reply.  I think this patchset is in the right general direction.
>>>> There are some oddball details that seem odd/awkward to me such as using
>>>> genetlink instead of rtnetlink to get and set the ids, and not having
>>>> ids if they are not set (that feels like a maintenance/usability challenge).
>>> No problem to use rtnetlink, in fact, I hesitated.
>>>
>>> For the second point, I'm not sure to follow you: how to have an id, which will
>>> not break migration, without asking the user to set it?
>>
>> We have that situtation with ifindex already.  Basically the thought is
>> to allow an id to be set, but also allow an id to be auto-generated if
>> we use an namespace without an id being set.
> If my understanding is correct, the difference is that we want to hide some
> netns.
> Do you think we can generate an id for each netns that does not have one and
> relying on the fact that this id has no meaning unless you have a netns file
> descriptor that allow you to get the id of this netns?
Any comments, Eric?


Thank you,
Nicolas

* [PATCH net-next v5 0/4] netns: allow to identify peer netns
  2014-10-31 19:14                     ` Eric W. Biederman
  2014-11-05 14:23                       ` Nicolas Dichtel
@ 2015-01-15 14:11                       ` Nicolas Dichtel
  2015-01-15 14:11                         ` [PATCH net-next v5 1/4] netns: add rtnl cmd to add and get peer netns ids Nicolas Dichtel
                                           ` (4 more replies)
  1 sibling, 5 replies; 67+ messages in thread
From: Nicolas Dichtel @ 2015-01-15 14:11 UTC (permalink / raw)
  To: netdev, containers, linux-kernel, linux-api
  Cc: davem, ebiederm, stephen, akpm, luto, cwang

The goal of this series is to be able to multicast netlink messages with an
attribute that identifies a peer netns.
Userland needs this to interpret some of the information contained in netlink
messages (such as the IFLA_LINK value, but also some other attributes in the
case of x-netns netdevices); see also
http://thread.gmane.org/gmane.linux.network/315933/focus=316064 and
http://thread.gmane.org/gmane.linux.kernel.containers/28301/focus=4239.

Ids of peer netns can be set by userland via a new rtnl cmd, RTM_NEWNSID. When
the kernel needs an id for a peer (for example when advertising a new x-netns
interface via netlink) and the user has not assigned one, an id is allocated
automatically.
These ids are stored per netns and are local (i.e. only valid in the netns where
they are set). To avoid allocating an int for each peer netns, I use
idr_for_each() to retrieve the id of a peer netns. Note that it will be possible
to add a table (struct net -> id) later to optimize this lookup if needed.
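
The reverse lookup (peer -> id) through a for-each callback can be modelled in
userspace as follows. This is only a sketch under stated assumptions: a plain
array stands in for the idr, and table_for_each(), match_entry() and
entry_to_id() are hypothetical names. What it shows is why id 0 needs a magic
value: the iterator stops on the first non-zero callback result, so returning
a real id of 0 would look like "keep going".

```c
#include <errno.h>
#include <stddef.h>

#define TABLE_SIZE    8      /* hypothetical size for this sketch */
#define ID_ZERO_MAGIC (-1)   /* stands in for id 0 inside the callback */

typedef int (*visit_fn)(int id, void *entry, void *data);

/* Like idr_for_each(): call fn on every populated slot, stop at the first
 * non-zero return value and hand it back to the caller.
 */
static int table_for_each(void *const slots[], int n, visit_fn fn, void *data)
{
	for (int id = 0; id < n; id++) {
		if (slots[id]) {
			int ret = fn(id, slots[id], data);

			if (ret)
				return ret;
		}
	}
	return 0;
}

/* Callback: report the matching id, encoding id 0 as ID_ZERO_MAGIC. */
static int match_entry(int id, void *entry, void *wanted)
{
	if (entry == wanted)
		return id ? id : ID_ZERO_MAGIC;
	return 0;
}

/* Return the id bound to wanted, or -ENOENT if it has none. */
static int entry_to_id(void *const slots[], int n, void *wanted)
{
	int ret = table_for_each(slots, n, match_entry, wanted);

	if (ret == ID_ZERO_MAGIC)
		return 0;
	if (ret > 0)
		return ret;
	return -ENOENT;
}
```

The lookup is linear in the number of assigned ids, which is the cost that a
later (struct net -> id) table would remove.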

Patch 1/4 introduces the rtnetlink API mechanism to set and get these ids.
Patches 2/4 and 3/4 implement an example of how to use these ids when advertising
information about an x-netns interface.
Patch 4/4 shows that the netlink messages can be symmetric between a GET and
a SET.

iproute2 patches are available, I can send them on demand.

Here is a short session showing how userland can use it.

# Initialization:
$ ip netns add foo
$ ip netns del foo
$ ip netns
$ touch /var/run/netns/init_net
$ mount --bind /proc/1/ns/net /var/run/netns/init_net
$ ip netns add foo
$ ip -n foo netns
foo
init_net
$ ip -n foo netns set init_net 0
$ ip -n foo netns set foo 1

# Only netns seen from foo have an id:
$ ip netns
foo
init_net
$ ip -n foo netns
foo (id: 1)
init_net (id: 0)

# Add a 4in4 x-netns interface with a link-netnsid option and check the dump:
$ ip -n foo link add ipip1 link-netnsid 0 type ipip remote 10.16.0.121 local 10.16.0.249
$ ip -n foo link ls ipip1
6: ipip1@NONE: <POINTOPOINT,NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default 
    link/ipip 10.16.0.249 peer 10.16.0.121 link-netnsid 0
# The parameter link-netnsid shows us where the interface sends and receives
# packets (and thus we know where encapsulated addresses are set).

# Add a 4in4 x-netns interface without a link-netnsid option and check that an
# id is allocated in init_net for foo
$ ip netns
foo
init_net
$ ip -n foo link add ipip2 type ipip remote 10.16.0.121 local 10.16.0.249
$ ip -n foo link set ipip2 netns init_net
$ ip link ls ipip2
7: ipip2@NONE: <POINTOPOINT,NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default 
    link/ipip 10.16.0.249 peer 10.16.0.121 link-netnsid 0
$ ip netns
foo (id: 0)
init_net

v4 -> v5:
  use rtnetlink instead of genetlink
  automatically allocate an id if the user didn't assign one
  rename include/uapi/linux/netns.h to include/uapi/linux/net_namespace.h
  add vxlan in patch #3

RFCv3 -> v4:
  rebase on net-next
  add copyright text in the new netns.h file

RFCv2 -> RFCv3:
  ids are now defined by userland (via netlink). Ids are stored in each netns
  (and they are local to this netns).
  add get_link_net support for ip6 tunnels
  netnsid is now a s32 instead of a u32

RFCv1 -> RFCv2:
  remove useless ()
  ids are now stored in the user ns. It's possible to get an id for a peer netns
  only if the current netns and the peer netns have the same user ns parent.

 MAINTAINERS                        |   1 +
 drivers/net/vxlan.c                |   8 ++
 include/net/ip6_tunnel.h           |   1 +
 include/net/ip_tunnels.h           |   1 +
 include/net/net_namespace.h        |   4 +
 include/net/rtnetlink.h            |   2 +
 include/uapi/linux/Kbuild          |   1 +
 include/uapi/linux/if_link.h       |   1 +
 include/uapi/linux/net_namespace.h |  23 ++++
 include/uapi/linux/rtnetlink.h     |   5 +
 net/core/net_namespace.c           | 210 +++++++++++++++++++++++++++++++++++++
 net/core/rtnetlink.c               |  38 ++++++-
 net/ipv4/ip_gre.c                  |   2 +
 net/ipv4/ip_tunnel.c               |   8 ++
 net/ipv4/ip_vti.c                  |   1 +
 net/ipv4/ipip.c                    |   1 +
 net/ipv6/ip6_gre.c                 |   1 +
 net/ipv6/ip6_tunnel.c              |   9 ++
 net/ipv6/ip6_vti.c                 |   1 +
 net/ipv6/sit.c                     |   1 +
 20 files changed, 316 insertions(+), 3 deletions(-)

Comments are welcome.

Regards,
Nicolas

* [PATCH net-next v5 1/4] netns: add rtnl cmd to add and get peer netns ids
  2015-01-15 14:11                       ` [PATCH net-next v5 " Nicolas Dichtel
@ 2015-01-15 14:11                         ` Nicolas Dichtel
  2015-01-15 14:11                         ` [PATCH net-next v5 2/4] rtnl: add link netns id to interface messages Nicolas Dichtel
                                           ` (3 subsequent siblings)
  4 siblings, 0 replies; 67+ messages in thread
From: Nicolas Dichtel @ 2015-01-15 14:11 UTC (permalink / raw)
  To: netdev, containers, linux-kernel, linux-api
  Cc: davem, ebiederm, stephen, akpm, luto, cwang, Nicolas Dichtel

With this patch, a user can define an id for a peer netns by providing a FD or a
PID. These ids are local to the netns where they are added (i.e. valid only
within this netns).

The main function (i.e. the one exported to other modules), peernet2id(),
returns the id of a peer netns. If no id has been assigned by the user, this
function allocates one.

These ids will be used in netlink messages to point to a peer netns, for example
in the case of an x-netns interface.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 MAINTAINERS                        |   1 +
 include/net/net_namespace.h        |   4 +
 include/uapi/linux/Kbuild          |   1 +
 include/uapi/linux/net_namespace.h |  23 ++++
 include/uapi/linux/rtnetlink.h     |   5 +
 net/core/net_namespace.c           | 210 +++++++++++++++++++++++++++++++++++++
 6 files changed, 244 insertions(+)
 create mode 100644 include/uapi/linux/net_namespace.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 9de900572633..9b91d9f0257e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6578,6 +6578,7 @@ F:	include/linux/netdevice.h
 F:	include/uapi/linux/in.h
 F:	include/uapi/linux/net.h
 F:	include/uapi/linux/netdevice.h
+F:	include/uapi/linux/net_namespace.h
 F:	tools/net/
 F:	tools/testing/selftests/net/
 F:	lib/random32.c
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 2e8756b8c775..36faf4990c4b 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -60,6 +60,7 @@ struct net {
 	struct list_head	exit_list;	/* Use only net_mutex */
 
 	struct user_namespace   *user_ns;	/* Owning user namespace */
+	struct idr		netns_ids;
 
 	struct ns_common	ns;
 
@@ -290,6 +291,9 @@ static inline struct net *read_pnet(struct net * const *pnet)
 #define __net_initconst	__initconst
 #endif
 
+int peernet2id(struct net *net, struct net *peer);
+struct net *get_net_ns_by_id(struct net *net, int id);
+
 struct pernet_operations {
 	struct list_head list;
 	int (*init)(struct net *net);
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index 00b100023c47..14b7b6e44c77 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -283,6 +283,7 @@ header-y += net.h
 header-y += netlink_diag.h
 header-y += netlink.h
 header-y += netrom.h
+header-y += net_namespace.h
 header-y += net_tstamp.h
 header-y += nfc.h
 header-y += nfs2.h
diff --git a/include/uapi/linux/net_namespace.h b/include/uapi/linux/net_namespace.h
new file mode 100644
index 000000000000..778cd2c3ebf4
--- /dev/null
+++ b/include/uapi/linux/net_namespace.h
@@ -0,0 +1,23 @@
+/* Copyright (c) 2015 6WIND S.A.
+ * Author: Nicolas Dichtel <nicolas.dichtel@6wind.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ */
+#ifndef _UAPI_LINUX_NET_NAMESPACE_H_
+#define _UAPI_LINUX_NET_NAMESPACE_H_
+
+/* Attributes of RTM_NEWNSID/RTM_GETNSID messages */
+enum {
+	NETNSA_NONE,
+#define NETNSA_NSID_NOT_ASSIGNED -1
+	NETNSA_NSID,
+	NETNSA_PID,
+	NETNSA_FD,
+	__NETNSA_MAX,
+};
+
+#define NETNSA_MAX		(__NETNSA_MAX - 1)
+
+#endif /* _UAPI_LINUX_NET_NAMESPACE_H_ */
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index a1d18593f41e..5cc5d66bf519 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -132,6 +132,11 @@ enum {
 	RTM_GETMDB = 86,
 #define RTM_GETMDB RTM_GETMDB
 
+	RTM_NEWNSID = 88,
+#define RTM_NEWNSID RTM_NEWNSID
+	RTM_GETNSID = 90,
+#define RTM_GETNSID RTM_GETNSID
+
 	__RTM_MAX,
 #define RTM_MAX		(((__RTM_MAX + 3) & ~3) - 1)
 };
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index ce780c722e48..edf089dd792a 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -15,6 +15,10 @@
 #include <linux/file.h>
 #include <linux/export.h>
 #include <linux/user_namespace.h>
+#include <linux/net_namespace.h>
+#include <linux/rtnetlink.h>
+#include <net/sock.h>
+#include <net/netlink.h>
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 
@@ -144,6 +148,77 @@ static void ops_free_list(const struct pernet_operations *ops,
 	}
 }
 
+static int alloc_netid(struct net *net, struct net *peer, int reqid)
+{
+	int min = 0, max = 0;
+
+	ASSERT_RTNL();
+
+	if (reqid >= 0) {
+		min = reqid;
+		max = reqid + 1;
+	}
+
+	return idr_alloc(&net->netns_ids, peer, min, max, GFP_KERNEL);
+}
+
+/* This function is used by idr_for_each(). If net is equal to peer, the
+ * function returns the id so that idr_for_each() stops. Because we cannot
+ * return the id 0 (idr_for_each() would not stop), we return the magic value
+ * NET_ID_ZERO (-1) for it.
+ */
+#define NET_ID_ZERO -1
+static int net_eq_idr(int id, void *net, void *peer)
+{
+	if (net_eq(net, peer))
+		return id ? : NET_ID_ZERO;
+	return 0;
+}
+
+static int __peernet2id(struct net *net, struct net *peer, bool alloc)
+{
+	int id = idr_for_each(&net->netns_ids, net_eq_idr, peer);
+
+	ASSERT_RTNL();
+
+	/* Magic value for id 0. */
+	if (id == NET_ID_ZERO)
+		return 0;
+	if (id > 0)
+		return id;
+
+	if (alloc)
+		return alloc_netid(net, peer, -1);
+
+	return -ENOENT;
+}
+
+/* This function returns the id of a peer netns. If no id is assigned, one will
+ * be allocated and returned.
+ */
+int peernet2id(struct net *net, struct net *peer)
+{
+	int id = __peernet2id(net, peer, true);
+
+	return id >= 0 ? id : NETNSA_NSID_NOT_ASSIGNED;
+}
+
+struct net *get_net_ns_by_id(struct net *net, int id)
+{
+	struct net *peer;
+
+	if (id < 0)
+		return NULL;
+
+	rcu_read_lock();
+	peer = idr_find(&net->netns_ids, id);
+	if (peer)
+		get_net(peer);
+	rcu_read_unlock();
+
+	return peer;
+}
+
 /*
  * setup_net runs the initializers for the network namespace object.
  */
@@ -158,6 +233,7 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns)
 	atomic_set(&net->passive, 1);
 	net->dev_base_seq = 1;
 	net->user_ns = user_ns;
+	idr_init(&net->netns_ids);
 
 #ifdef NETNS_REFCNT_DEBUG
 	atomic_set(&net->use_count, 0);
@@ -288,6 +364,14 @@ static void cleanup_net(struct work_struct *work)
 	list_for_each_entry(net, &net_kill_list, cleanup_list) {
 		list_del_rcu(&net->list);
 		list_add_tail(&net->exit_list, &net_exit_list);
+		for_each_net(tmp) {
+			int id = __peernet2id(tmp, net, false);
+
+			if (id >= 0)
+				idr_remove(&tmp->netns_ids, id);
+		}
+		idr_destroy(&net->netns_ids);
+
 	}
 	rtnl_unlock();
 
@@ -402,6 +486,129 @@ static struct pernet_operations __net_initdata net_ns_ops = {
 	.exit = net_ns_net_exit,
 };
 
+static struct nla_policy rtnl_net_policy[NETNSA_MAX + 1] = {
+	[NETNSA_NONE]		= { .type = NLA_UNSPEC },
+	[NETNSA_NSID]		= { .type = NLA_S32 },
+	[NETNSA_PID]		= { .type = NLA_U32 },
+	[NETNSA_FD]		= { .type = NLA_U32 },
+};
+
+static int rtnl_net_newid(struct sk_buff *skb, struct nlmsghdr *nlh)
+{
+	struct net *net = sock_net(skb->sk);
+	struct nlattr *tb[NETNSA_MAX + 1];
+	struct net *peer;
+	int nsid, err;
+
+	err = nlmsg_parse(nlh, sizeof(struct rtgenmsg), tb, NETNSA_MAX,
+			  rtnl_net_policy);
+	if (err < 0)
+		return err;
+	if (!tb[NETNSA_NSID])
+		return -EINVAL;
+	nsid = nla_get_s32(tb[NETNSA_NSID]);
+
+	if (tb[NETNSA_PID])
+		peer = get_net_ns_by_pid(nla_get_u32(tb[NETNSA_PID]));
+	else if (tb[NETNSA_FD])
+		peer = get_net_ns_by_fd(nla_get_u32(tb[NETNSA_FD]));
+	else
+		return -EINVAL;
+	if (IS_ERR(peer))
+		return PTR_ERR(peer);
+
+	if (__peernet2id(net, peer, false) >= 0) {
+		err = -EEXIST;
+		goto out;
+	}
+
+	err = alloc_netid(net, peer, nsid);
+	if (err > 0)
+		err = 0;
+out:
+	put_net(peer);
+	return err;
+}
+
+static int rtnl_net_get_size(void)
+{
+	return NLMSG_ALIGN(sizeof(struct rtgenmsg))
+	       + nla_total_size(sizeof(s32)) /* NETNSA_NSID */
+	       ;
+}
+
+static int rtnl_net_fill(struct sk_buff *skb, u32 portid, u32 seq, int flags,
+			 int cmd, struct net *net, struct net *peer)
+{
+	struct nlmsghdr *nlh;
+	struct rtgenmsg *rth;
+	int id;
+
+	ASSERT_RTNL();
+
+	nlh = nlmsg_put(skb, portid, seq, cmd, sizeof(*rth), flags);
+	if (!nlh)
+		return -EMSGSIZE;
+
+	rth = nlmsg_data(nlh);
+	rth->rtgen_family = AF_UNSPEC;
+
+	id = __peernet2id(net, peer, false);
+	if (id < 0)
+		id = NETNSA_NSID_NOT_ASSIGNED;
+	if (nla_put_s32(skb, NETNSA_NSID, id))
+		goto nla_put_failure;
+
+	return nlmsg_end(skb, nlh);
+
+nla_put_failure:
+	nlmsg_cancel(skb, nlh);
+	return -EMSGSIZE;
+}
+
+static int rtnl_net_getid(struct sk_buff *skb, struct nlmsghdr *nlh)
+{
+	struct net *net = sock_net(skb->sk);
+	struct nlattr *tb[NETNSA_MAX + 1];
+	struct sk_buff *msg;
+	int err = -ENOBUFS;
+	struct net *peer;
+
+	err = nlmsg_parse(nlh, sizeof(struct rtgenmsg), tb, NETNSA_MAX,
+			  rtnl_net_policy);
+	if (err < 0)
+		return err;
+	if (tb[NETNSA_PID])
+		peer = get_net_ns_by_pid(nla_get_u32(tb[NETNSA_PID]));
+	else if (tb[NETNSA_FD])
+		peer = get_net_ns_by_fd(nla_get_u32(tb[NETNSA_FD]));
+	else
+		return -EINVAL;
+
+	if (IS_ERR(peer))
+		return PTR_ERR(peer);
+
+	msg = nlmsg_new(rtnl_net_get_size(), GFP_KERNEL);
+	if (!msg) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	err = rtnl_net_fill(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq, 0,
+			    RTM_GETNSID, net, peer);
+	if (err < 0)
+		goto err_out;
+
+	err = rtnl_unicast(msg, net, NETLINK_CB(skb).portid);
+	goto out;
+
+err_out:
+	nlmsg_free(msg);
+out:
+	put_net(peer);
+	return err;
+}
+
 static int __init net_ns_init(void)
 {
 	struct net_generic *ng;
@@ -435,6 +642,9 @@ static int __init net_ns_init(void)
 
 	register_pernet_subsys(&net_ns_ops);
 
+	rtnl_register(PF_UNSPEC, RTM_NEWNSID, rtnl_net_newid, NULL, NULL);
+	rtnl_register(PF_UNSPEC, RTM_GETNSID, rtnl_net_getid, NULL, NULL);
+
 	return 0;
 }
 
-- 
2.2.2
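
The handlers above accept a request that carries either NETNSA_PID or NETNSA_FD and answer with a NETNSA_NSID attribute. As a minimal userspace sketch of the request side, assuming uapi headers that already contain this series (linux/net_namespace.h, RTM_GETNSID), a query by pid could be laid out as follows. The helper name build_getnsid_by_pid() is hypothetical and not part of the patch; sending the buffer over a NETLINK_ROUTE socket is left out.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/net_namespace.h>

/*
 * Hypothetical helper: write an RTM_GETNSID request for the netns of
 * process 'pid' into 'buf' (which must be 4-byte aligned).
 * Returns the message length, or 0 if the buffer is too small.
 */
size_t build_getnsid_by_pid(void *buf, size_t buflen, uint32_t pid,
			    uint32_t seq)
{
	struct nlmsghdr *nlh = buf;
	struct rtgenmsg *gen;
	struct rtattr *rta;
	/* Attributes start after the aligned header + rtgenmsg. */
	size_t hdr = NLMSG_ALIGN(NLMSG_LENGTH(sizeof(*gen)));
	size_t total = hdr + RTA_SPACE(sizeof(pid));

	if (buflen < total)
		return 0;
	memset(buf, 0, total);

	nlh->nlmsg_type = RTM_GETNSID;
	nlh->nlmsg_flags = NLM_F_REQUEST;
	nlh->nlmsg_seq = seq;
	nlh->nlmsg_len = total;

	gen = NLMSG_DATA(nlh);
	gen->rtgen_family = AF_UNSPEC;

	/* One u32 attribute identifying the peer netns by pid. */
	rta = (struct rtattr *)((char *)buf + hdr);
	rta->rta_type = NETNSA_PID;
	rta->rta_len = RTA_LENGTH(sizeof(pid));
	memcpy(RTA_DATA(rta), &pid, sizeof(pid));

	return total;
}
```

The reply would then be expected to carry NETNSA_NSID, which rtnl_net_fill() sets to NETNSA_NSID_NOT_ASSIGNED when the peer has no id yet.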



* [PATCH net-next v5 2/4] rtnl: add link netns id to interface messages
  2015-01-15 14:11                       ` [PATCH net-next v5 " Nicolas Dichtel
  2015-01-15 14:11                         ` [PATCH net-next v5 1/4] netns: add rtnl cmd to add and get peer netns ids Nicolas Dichtel
@ 2015-01-15 14:11                         ` Nicolas Dichtel
  2015-01-15 14:11                         ` [PATCH net-next v5 3/4] tunnels: advertise link netns via netlink Nicolas Dichtel
                                           ` (2 subsequent siblings)
  4 siblings, 0 replies; 67+ messages in thread
From: Nicolas Dichtel @ 2015-01-15 14:11 UTC (permalink / raw)
  To: netdev, containers, linux-kernel, linux-api
  Cc: davem, ebiederm, stephen, akpm, luto, cwang, Nicolas Dichtel

This patch adds a new attribute (IFLA_LINK_NETNSID) which contains the 'link'
netns id when this netns is different from the netns where the interface
resides (for example, for x-netns interfaces like IP tunnels).
With this attribute, it is possible to correctly interpret all advertised
information (like IFLA_LINK, etc.).

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/net/rtnetlink.h      |  2 ++
 include/uapi/linux/if_link.h |  1 +
 net/core/rtnetlink.c         | 13 +++++++++++++
 3 files changed, 16 insertions(+)

diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index e21b9f9653c0..6c6d5393fc34 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -46,6 +46,7 @@ static inline int rtnl_msg_family(const struct nlmsghdr *nlh)
  *			    to create when creating a new device.
  *	@get_num_rx_queues: Function to determine number of receive queues
  *			    to create when creating a new device.
+ *	@get_link_net: Function to get the i/o netns of the device
  */
 struct rtnl_link_ops {
 	struct list_head	list;
@@ -93,6 +94,7 @@ struct rtnl_link_ops {
 	int			(*fill_slave_info)(struct sk_buff *skb,
 						   const struct net_device *dev,
 						   const struct net_device *slave_dev);
+	struct net		*(*get_link_net)(const struct net_device *dev);
 };
 
 int __rtnl_link_register(struct rtnl_link_ops *ops);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 2a8380edbb7e..0deee3eeddbf 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -146,6 +146,7 @@ enum {
 	IFLA_PHYS_PORT_ID,
 	IFLA_CARRIER_CHANGES,
 	IFLA_PHYS_SWITCH_ID,
+	IFLA_LINK_NETNSID,
 	__IFLA_MAX
 };
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 6a6cdade1676..ab78ba9a34e8 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -875,6 +875,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
 	       + nla_total_size(1) /* IFLA_OPERSTATE */
 	       + nla_total_size(1) /* IFLA_LINKMODE */
 	       + nla_total_size(4) /* IFLA_CARRIER_CHANGES */
+	       + nla_total_size(4) /* IFLA_LINK_NETNSID */
 	       + nla_total_size(ext_filter_mask
 			        & RTEXT_FILTER_VF ? 4 : 0) /* IFLA_NUM_VF */
 	       + rtnl_vfinfo_size(dev, ext_filter_mask) /* IFLA_VFINFO_LIST */
@@ -1169,6 +1170,18 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 			goto nla_put_failure;
 	}
 
+	if (dev->rtnl_link_ops &&
+	    dev->rtnl_link_ops->get_link_net) {
+		struct net *link_net = dev->rtnl_link_ops->get_link_net(dev);
+
+		if (!net_eq(dev_net(dev), link_net)) {
+			int id = peernet2id(dev_net(dev), link_net);
+
+			if (nla_put_s32(skb, IFLA_LINK_NETNSID, id))
+				goto nla_put_failure;
+		}
+	}
+
 	if (!(af_spec = nla_nest_start(skb, IFLA_AF_SPEC)))
 		goto nla_put_failure;
 
-- 
2.2.2
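
On the receiving side, userland can look for IFLA_LINK_NETNSID among the attributes of a link message before interpreting IFLA_LINK. The following is a minimal sketch, assuming uapi headers that define IFLA_LINK_NETNSID; the function name find_link_netnsid() is hypothetical and the caller is assumed to pass the attribute area that follows struct ifinfomsg.

```c
#include <stdint.h>
#include <string.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

/*
 * Hypothetical helper: walk 'len' bytes of attributes starting at 'rta'.
 * Returns 1 and stores the id if IFLA_LINK_NETNSID is present, else 0.
 */
int find_link_netnsid(struct rtattr *rta, int len, int32_t *id)
{
	for (; RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) {
		if (rta->rta_type == IFLA_LINK_NETNSID &&
		    (size_t)RTA_PAYLOAD(rta) >= sizeof(*id)) {
			/* The kernel emits this attribute with nla_put_s32(). */
			memcpy(id, RTA_DATA(rta), sizeof(*id));
			return 1;
		}
	}
	return 0;
}
```

When the attribute is absent, the link netns is the netns of the interface itself, since rtnl_fill_ifinfo() only emits it when the two differ.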



* [PATCH net-next v5 3/4] tunnels: advertise link netns via netlink
  2015-01-15 14:11                       ` [PATCH net-next v5 " Nicolas Dichtel
  2015-01-15 14:11                         ` [PATCH net-next v5 1/4] netns: add rtnl cmd to add and get peer netns ids Nicolas Dichtel
  2015-01-15 14:11                         ` [PATCH net-next v5 2/4] rtnl: add link netns id to interface messages Nicolas Dichtel
@ 2015-01-15 14:11                         ` Nicolas Dichtel
  2015-01-15 14:11                         ` [PATCH net-next v5 4/4] rtnl: allow to create device with IFLA_LINK_NETNSID set Nicolas Dichtel
  2015-01-19 19:16                         ` [PATCH net-next v5 0/4] netns: allow to identify peer netns David Miller
  4 siblings, 0 replies; 67+ messages in thread
From: Nicolas Dichtel @ 2015-01-15 14:11 UTC (permalink / raw)
  To: netdev, containers, linux-kernel, linux-api
  Cc: davem, ebiederm, stephen, akpm, luto, cwang, Nicolas Dichtel

Implement rtnl_link_ops->get_link_net() callback so that IFLA_LINK_NETNSID is
added to rtnetlink messages.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 drivers/net/vxlan.c      | 8 ++++++++
 include/net/ip6_tunnel.h | 1 +
 include/net/ip_tunnels.h | 1 +
 net/ipv4/ip_gre.c        | 2 ++
 net/ipv4/ip_tunnel.c     | 8 ++++++++
 net/ipv4/ip_vti.c        | 1 +
 net/ipv4/ipip.c          | 1 +
 net/ipv6/ip6_gre.c       | 1 +
 net/ipv6/ip6_tunnel.c    | 9 +++++++++
 net/ipv6/ip6_vti.c       | 1 +
 net/ipv6/sit.c           | 1 +
 11 files changed, 34 insertions(+)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 6b6b45622a0a..88dbb1edea6e 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2922,6 +2922,13 @@ nla_put_failure:
 	return -EMSGSIZE;
 }
 
+static struct net *vxlan_get_link_net(const struct net_device *dev)
+{
+	struct vxlan_dev *vxlan = netdev_priv(dev);
+
+	return vxlan->net;
+}
+
 static struct rtnl_link_ops vxlan_link_ops __read_mostly = {
 	.kind		= "vxlan",
 	.maxtype	= IFLA_VXLAN_MAX,
@@ -2933,6 +2940,7 @@ static struct rtnl_link_ops vxlan_link_ops __read_mostly = {
 	.dellink	= vxlan_dellink,
 	.get_size	= vxlan_get_size,
 	.fill_info	= vxlan_fill_info,
+	.get_link_net	= vxlan_get_link_net,
 };
 
 static void vxlan_handle_lowerdev_unregister(struct vxlan_net *vn,
diff --git a/include/net/ip6_tunnel.h b/include/net/ip6_tunnel.h
index 9326c41c2d7f..76c091b53dae 100644
--- a/include/net/ip6_tunnel.h
+++ b/include/net/ip6_tunnel.h
@@ -70,6 +70,7 @@ int ip6_tnl_xmit_ctl(struct ip6_tnl *t, const struct in6_addr *laddr,
 __u16 ip6_tnl_parse_tlv_enc_lim(struct sk_buff *skb, __u8 *raw);
 __u32 ip6_tnl_get_cap(struct ip6_tnl *t, const struct in6_addr *laddr,
 			     const struct in6_addr *raddr);
+struct net *ip6_tnl_get_link_net(const struct net_device *dev);
 
 static inline void ip6tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 {
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index ce4db3cc5647..2c47061a6954 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -141,6 +141,7 @@ int ip_tunnel_encap_del_ops(const struct ip_tunnel_encap_ops *op,
 int ip_tunnel_init(struct net_device *dev);
 void ip_tunnel_uninit(struct net_device *dev);
 void  ip_tunnel_dellink(struct net_device *dev, struct list_head *head);
+struct net *ip_tunnel_get_link_net(const struct net_device *dev);
 int ip_tunnel_init_net(struct net *net, int ip_tnl_net_id,
 		       struct rtnl_link_ops *ops, char *devname);
 
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 942576e27df1..6e7727f27393 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -829,6 +829,7 @@ static struct rtnl_link_ops ipgre_link_ops __read_mostly = {
 	.dellink	= ip_tunnel_dellink,
 	.get_size	= ipgre_get_size,
 	.fill_info	= ipgre_fill_info,
+	.get_link_net	= ip_tunnel_get_link_net,
 };
 
 static struct rtnl_link_ops ipgre_tap_ops __read_mostly = {
@@ -843,6 +844,7 @@ static struct rtnl_link_ops ipgre_tap_ops __read_mostly = {
 	.dellink	= ip_tunnel_dellink,
 	.get_size	= ipgre_get_size,
 	.fill_info	= ipgre_fill_info,
+	.get_link_net	= ip_tunnel_get_link_net,
 };
 
 static int __net_init ipgre_tap_init_net(struct net *net)
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index d3e447936720..2cd08280c77b 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -972,6 +972,14 @@ void ip_tunnel_dellink(struct net_device *dev, struct list_head *head)
 }
 EXPORT_SYMBOL_GPL(ip_tunnel_dellink);
 
+struct net *ip_tunnel_get_link_net(const struct net_device *dev)
+{
+	struct ip_tunnel *tunnel = netdev_priv(dev);
+
+	return tunnel->net;
+}
+EXPORT_SYMBOL(ip_tunnel_get_link_net);
+
 int ip_tunnel_init_net(struct net *net, int ip_tnl_net_id,
 				  struct rtnl_link_ops *ops, char *devname)
 {
diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index 1a7e979e80ba..94efe148181c 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -531,6 +531,7 @@ static struct rtnl_link_ops vti_link_ops __read_mostly = {
 	.dellink        = ip_tunnel_dellink,
 	.get_size	= vti_get_size,
 	.fill_info	= vti_fill_info,
+	.get_link_net	= ip_tunnel_get_link_net,
 };
 
 static int __init vti_init(void)
diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index 40403114f00a..b58d6689874c 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -498,6 +498,7 @@ static struct rtnl_link_ops ipip_link_ops __read_mostly = {
 	.dellink	= ip_tunnel_dellink,
 	.get_size	= ipip_get_size,
 	.fill_info	= ipip_fill_info,
+	.get_link_net	= ip_tunnel_get_link_net,
 };
 
 static struct xfrm_tunnel ipip_handler __read_mostly = {
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 13cda4c6313b..9306a5ff9149 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -1662,6 +1662,7 @@ static struct rtnl_link_ops ip6gre_link_ops __read_mostly = {
 	.dellink	= ip6gre_dellink,
 	.get_size	= ip6gre_get_size,
 	.fill_info	= ip6gre_fill_info,
+	.get_link_net	= ip6_tnl_get_link_net,
 };
 
 static struct rtnl_link_ops ip6gre_tap_ops __read_mostly = {
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 92b3da571980..266a264ec212 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1760,6 +1760,14 @@ nla_put_failure:
 	return -EMSGSIZE;
 }
 
+struct net *ip6_tnl_get_link_net(const struct net_device *dev)
+{
+	struct ip6_tnl *tunnel = netdev_priv(dev);
+
+	return tunnel->net;
+}
+EXPORT_SYMBOL(ip6_tnl_get_link_net);
+
 static const struct nla_policy ip6_tnl_policy[IFLA_IPTUN_MAX + 1] = {
 	[IFLA_IPTUN_LINK]		= { .type = NLA_U32 },
 	[IFLA_IPTUN_LOCAL]		= { .len = sizeof(struct in6_addr) },
@@ -1783,6 +1791,7 @@ static struct rtnl_link_ops ip6_link_ops __read_mostly = {
 	.dellink	= ip6_tnl_dellink,
 	.get_size	= ip6_tnl_get_size,
 	.fill_info	= ip6_tnl_fill_info,
+	.get_link_net	= ip6_tnl_get_link_net,
 };
 
 static struct xfrm6_tunnel ip4ip6_handler __read_mostly = {
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index ace10d0b3aac..5fb9e212eca8 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -1016,6 +1016,7 @@ static struct rtnl_link_ops vti6_link_ops __read_mostly = {
 	.changelink	= vti6_changelink,
 	.get_size	= vti6_get_size,
 	.fill_info	= vti6_fill_info,
+	.get_link_net	= ip6_tnl_get_link_net,
 };
 
 static void __net_exit vti6_destroy_tunnels(struct vti6_net *ip6n)
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 213546bd6d5d..3cc197c72b59 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1763,6 +1763,7 @@ static struct rtnl_link_ops sit_link_ops __read_mostly = {
 	.get_size	= ipip6_get_size,
 	.fill_info	= ipip6_fill_info,
 	.dellink	= ipip6_dellink,
+	.get_link_net	= ip_tunnel_get_link_net,
 };
 
 static struct xfrm_tunnel sit_handler __read_mostly = {
-- 
2.2.2
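
The pattern used by this patch and the previous one is an optional per-driver callback that the core checks before advertising anything. A userspace model of that pattern, with hypothetical stand-in types (struct net, struct device, struct link_ops here are simplifications, not the kernel structures), might look like this:

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct net and struct net_device. */
struct net { int unused; };
struct device;

struct link_ops {
	/* Optional: drivers without a separate link netns leave it NULL. */
	struct net *(*get_link_net)(const struct device *dev);
};

struct device {
	struct net *net;		/* netns the device lives in */
	struct net *link_net;		/* netns used for the underlay */
	const struct link_ops *ops;
};

/* What a tunnel driver's callback reduces to: return the stored netns. */
struct net *tunnel_get_link_net(const struct device *dev)
{
	return dev->link_net;
}

const struct link_ops tunnel_ops = {
	.get_link_net = tunnel_get_link_net,
};

/*
 * Mirrors the check added to rtnl_fill_ifinfo(): advertise the link
 * netns only if the driver provides the callback and the two differ.
 * Returns NULL when nothing should be advertised.
 */
const struct net *link_net_to_advertise(const struct device *dev)
{
	const struct net *link_net;

	if (!dev->ops || !dev->ops->get_link_net)
		return NULL;
	link_net = dev->ops->get_link_net(dev);
	return link_net == dev->net ? NULL : link_net;
}
```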



* [PATCH net-next v5 4/4] rtnl: allow to create device with IFLA_LINK_NETNSID set
  2015-01-15 14:11                       ` [PATCH net-next v5 " Nicolas Dichtel
                                           ` (2 preceding siblings ...)
  2015-01-15 14:11                         ` [PATCH net-next v5 3/4] tunnels: advertise link netns via netlink Nicolas Dichtel
@ 2015-01-15 14:11                         ` Nicolas Dichtel
  2015-01-19 19:16                         ` [PATCH net-next v5 0/4] netns: allow to identify peer netns David Miller
  4 siblings, 0 replies; 67+ messages in thread
From: Nicolas Dichtel @ 2015-01-15 14:11 UTC (permalink / raw)
  To: netdev, containers, linux-kernel, linux-api
  Cc: davem, ebiederm, stephen, akpm, luto, cwang, Nicolas Dichtel

This patch adds the ability to create a netdevice in a specified netns and
then move it into the final netns. In effect, it provides symmetry between
get and set rtnl messages.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 net/core/rtnetlink.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index ab78ba9a34e8..b2f6d8285a24 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1247,6 +1247,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_PHYS_PORT_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
 	[IFLA_CARRIER_CHANGES]	= { .type = NLA_U32 },  /* ignored */
 	[IFLA_PHYS_SWITCH_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
+	[IFLA_LINK_NETNSID]	= { .type = NLA_S32 },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
@@ -2020,7 +2021,7 @@ replay:
 		struct nlattr *slave_attr[m_ops ? m_ops->slave_maxtype + 1 : 0];
 		struct nlattr **data = NULL;
 		struct nlattr **slave_data = NULL;
-		struct net *dest_net;
+		struct net *dest_net, *link_net = NULL;
 
 		if (ops) {
 			if (ops->maxtype && linkinfo[IFLA_INFO_DATA]) {
@@ -2126,7 +2127,18 @@ replay:
 		if (IS_ERR(dest_net))
 			return PTR_ERR(dest_net);
 
-		dev = rtnl_create_link(dest_net, ifname, name_assign_type, ops, tb);
+		if (tb[IFLA_LINK_NETNSID]) {
+			int id = nla_get_s32(tb[IFLA_LINK_NETNSID]);
+
+			link_net = get_net_ns_by_id(dest_net, id);
+			if (!link_net) {
+				err =  -EINVAL;
+				goto out;
+			}
+		}
+
+		dev = rtnl_create_link(link_net ? : dest_net, ifname,
+				       name_assign_type, ops, tb);
 		if (IS_ERR(dev)) {
 			err = PTR_ERR(dev);
 			goto out;
@@ -2154,9 +2166,16 @@ replay:
 			}
 		}
 		err = rtnl_configure_link(dev, ifm);
-		if (err < 0)
+		if (err < 0) {
 			unregister_netdevice(dev);
+			goto out;
+		}
+
+		if (link_net)
+			err = dev_change_net_namespace(dev, dest_net, ifname);
 out:
+		if (link_net)
+			put_net(link_net);
 		put_net(dest_net);
 		return err;
 	}
-- 
2.2.2
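
From userland, using this means adding the same IFLA_LINK_NETNSID attribute to an RTM_NEWLINK request that the kernel emits in its link messages. A minimal sketch of appending it to a message under construction, assuming uapi headers that define IFLA_LINK_NETNSID (the helper name put_link_netnsid() is hypothetical):

```c
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

/*
 * Hypothetical helper: append IFLA_LINK_NETNSID to the message at 'nlh',
 * which lives in a buffer of 'maxlen' bytes. Returns 0 on success or -1
 * if the attribute does not fit.
 */
int put_link_netnsid(struct nlmsghdr *nlh, size_t maxlen, int32_t id)
{
	size_t off = NLMSG_ALIGN(nlh->nlmsg_len);
	struct rtattr *rta;

	if (off + RTA_SPACE(sizeof(id)) > maxlen)
		return -1;

	rta = (struct rtattr *)((char *)nlh + off);
	rta->rta_type = IFLA_LINK_NETNSID;
	rta->rta_len = RTA_LENGTH(sizeof(id));
	/* ifla_policy declares this attribute as NLA_S32. */
	memcpy(RTA_DATA(rta), &id, sizeof(id));

	nlh->nlmsg_len = off + RTA_SPACE(sizeof(id));
	return 0;
}
```

The id is resolved relative to the destination netns (get_net_ns_by_id(dest_net, id) in the hunk above), so it must be an id that is valid there.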



* Re: [PATCH net-next v5 0/4] netns: allow to identify peer netns
  2015-01-15 14:11                       ` [PATCH net-next v5 " Nicolas Dichtel
                                           ` (3 preceding siblings ...)
  2015-01-15 14:11                         ` [PATCH net-next v5 4/4] rtnl: allow to create device with IFLA_LINK_NETNSID set Nicolas Dichtel
@ 2015-01-19 19:16                         ` David Miller
  4 siblings, 0 replies; 67+ messages in thread
From: David Miller @ 2015-01-19 19:16 UTC (permalink / raw)
  To: nicolas.dichtel
  Cc: netdev, containers, linux-kernel, linux-api, ebiederm, stephen,
	akpm, luto, cwang

From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Thu, 15 Jan 2015 15:11:14 +0100

> The goal of this series is to be able to multicast netlink messages with an
> attribute that identifies a peer netns.
> This is needed by the userland to interpret some information contained in
> netlink messages (like IFLA_LINK value, but also some other attributes in case
> of x-netns netdevice (see also
> http://thread.gmane.org/gmane.linux.network/315933/focus=316064 and
> http://thread.gmane.org/gmane.linux.kernel.containers/28301/focus=4239)).
> 
> Ids of peer netns can be set by userland via a new rtnl cmd RTM_NEWNSID. When
> the kernel needs an id for a peer (for example when advertising a new x-netns
> interface via netlink), if the user didn't allocate an id, one will be
> automatically allocated.
> These ids are stored per netns and are local (i.e. only valid in the netns where
> they are set). To avoid allocating an int for each peer netns, I use
> idr_for_each() to retrieve the id of a peer netns. Note that it will be possible
> to add a table (struct net -> id) later to optimize this lookup if needed.
> 
> Patch 1/4 introduces the rtnetlink API mechanism to set and get these ids.
> Patches 2/4 and 3/4 implement an example of how to use these ids when advertising
> information about an x-netns interface.
> And patch 4/4 shows that the netlink messages can be symmetric between a GET and
> a SET.
 ...

Series applied, thanks.
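
The id scheme described in the quoted cover letter (ids local to one netns, set by the user or allocated on demand, peer-to-id lookup done by scanning) can be modelled in userspace as below. This is only a sketch: a fixed array stands in for the kernel's idr, and every name in it (nsid_table, peer_to_id, assign_id, peer_to_id_alloc, NSID_NOT_ASSIGNED) is hypothetical.

```c
#include <stddef.h>

#define MAX_PEERS 8
#define NSID_NOT_ASSIGNED (-1)

struct netns { int unused; };	/* hypothetical stand-in for struct net */

/* One table per netns: slot index is the id, value is the peer. */
struct nsid_table {
	const struct netns *peer[MAX_PEERS];
};

/* Linear scan from peer to id, analogous to walking with idr_for_each(). */
int peer_to_id(const struct nsid_table *t, const struct netns *peer)
{
	for (int id = 0; id < MAX_PEERS; id++)
		if (t->peer[id] == peer)
			return id;
	return NSID_NOT_ASSIGNED;
}

/*
 * reqid >= 0 requests a specific id (the RTM_NEWNSID case); reqid < 0
 * takes the lowest free one. Returns the id, or -1 if the peer already
 * has an id or no suitable slot is free.
 */
int assign_id(struct nsid_table *t, const struct netns *peer, int reqid)
{
	if (peer_to_id(t, peer) >= 0)
		return -1;
	if (reqid >= 0) {
		if (reqid >= MAX_PEERS || t->peer[reqid])
			return -1;
		t->peer[reqid] = peer;
		return reqid;
	}
	for (int id = 0; id < MAX_PEERS; id++) {
		if (!t->peer[id]) {
			t->peer[id] = peer;
			return id;
		}
	}
	return -1;
}

/* Lookup used when advertising: allocate automatically if unset. */
int peer_to_id_alloc(struct nsid_table *t, const struct netns *peer)
{
	int id = peer_to_id(t, peer);

	return id >= 0 ? id : assign_id(t, peer, -1);
}
```

The linear scan trades lookup speed for memory, which matches the cover letter's remark that a reverse table (struct net -> id) could be added later if the lookup needs optimizing.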


Thread overview: 67+ messages
2014-09-23 13:20 [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns Nicolas Dichtel
2014-09-23 13:20 ` [RFC PATCH net-next v2 1/5] netns: allocate netns ids Nicolas Dichtel
2014-09-23 13:20 ` [RFC PATCH net-next v2 2/5] netns: add genl cmd to get the id of a netns Nicolas Dichtel
2014-09-23 13:20 ` [RFC PATCH net-next v2 3/5] rtnl: add link netns id to interface messages Nicolas Dichtel
2014-09-23 13:20 ` [RFC PATCH net-next v2 4/5] iptunnels: advertise link netns via netlink Nicolas Dichtel
2014-09-23 13:20 ` [RFC PATCH net-next v2 5/5] rtnl: allow to create device with IFLA_LINK_NETNSID set Nicolas Dichtel
2014-09-23 19:22 ` [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns Cong Wang
2014-09-24  9:23   ` Nicolas Dichtel
2014-09-24 16:01     ` Cong Wang
2014-09-24 16:15       ` Cong Wang
2014-09-24 16:31         ` Nicolas Dichtel
2014-09-24 16:48           ` Cong Wang
2014-09-25  8:53             ` Nicolas Dichtel
2014-09-26  1:58               ` Cong Wang
2014-09-26 13:38                 ` Nicolas Dichtel
2014-09-24 16:27       ` Nicolas Dichtel
2014-09-24 16:45         ` Cong Wang
2014-09-25  8:53           ` Nicolas Dichtel
2014-09-26  2:09             ` Cong Wang
2014-09-26 13:40               ` Nicolas Dichtel
2014-09-26 19:15                 ` David Ahern
2014-09-26 19:34                   ` Eric W. Biederman
2014-09-26 19:44                     ` David Ahern
2014-09-26 20:45                       ` Eric W. Biederman
2014-09-26 20:56                         ` David Ahern
2014-09-23 19:26 ` Andy Lutomirski
2014-09-24  9:31   ` Nicolas Dichtel
2014-09-24 17:05     ` Andy Lutomirski
2014-09-25  7:54       ` Nicolas Dichtel
2014-09-26 18:10 ` Eric W. Biederman
2014-09-26 18:26   ` Andy Lutomirski
2014-09-26 18:57     ` Eric W. Biederman
2014-09-29 12:06       ` Nicolas Dichtel
2014-09-29 18:43         ` Eric W. Biederman
2014-10-02 13:46           ` Nicolas Dichtel
2014-10-02 13:48             ` [RFC PATCH net-next v3 0/4] " Nicolas Dichtel
2014-10-02 13:48               ` [RFC PATCH net-next v3 1/4] netns: add genl cmd to add and get peer netns ids Nicolas Dichtel
2014-10-02 19:33                 ` Eric W. Biederman
2014-10-03 12:22                   ` Nicolas Dichtel
2014-10-02 13:48               ` [RFC PATCH net-next v3 2/4] rtnl: add link netns id to interface messages Nicolas Dichtel
2014-10-02 13:48               ` [RFC PATCH net-next v3 3/4] iptunnels: advertise link netns via netlink Nicolas Dichtel
2014-10-02 13:48               ` [RFC PATCH net-next v3 4/4] rtnl: allow to create device with IFLA_LINK_NETNSID set Nicolas Dichtel
2014-10-30 15:25               ` [PATCH net-next v4 0/4] netns: allow to identify peer netns Nicolas Dichtel
2014-10-30 15:25                 ` [PATCH net-next v4 1/4] netns: add genl cmd to add and get peer netns ids Nicolas Dichtel
2014-10-30 18:35                   ` Eric W. Biederman
2014-10-31  9:41                     ` Nicolas Dichtel
2014-10-30 15:25                 ` [PATCH net-next v4 2/4] rtnl: add link netns id to interface messages Nicolas Dichtel
2014-10-30 15:25                 ` [PATCH net-next v4 3/4] iptunnels: advertise link netns via netlink Nicolas Dichtel
2014-10-30 15:25                 ` [PATCH net-next v4 4/4] rtnl: allow to create device with IFLA_LINK_NETNSID set Nicolas Dichtel
2014-10-30 18:41                 ` [PATCH net-next v4 0/4] netns: allow to identify peer netns Eric W. Biederman
2014-10-31  9:48                   ` Nicolas Dichtel
2014-10-31 19:14                     ` Eric W. Biederman
2014-11-05 14:23                       ` Nicolas Dichtel
2014-12-04 16:21                         ` Nicolas Dichtel
2015-01-15 14:11                       ` [PATCH net-next v5 " Nicolas Dichtel
2015-01-15 14:11                         ` [PATCH net-next v5 1/4] netns: add rtnl cmd to add and get peer netns ids Nicolas Dichtel
2015-01-15 14:11                         ` [PATCH net-next v5 2/4] rtnl: add link netns id to interface messages Nicolas Dichtel
2015-01-15 14:11                         ` [PATCH net-next v5 3/4] tunnels: advertise link netns via netlink Nicolas Dichtel
2015-01-15 14:11                         ` [PATCH net-next v5 4/4] rtnl: allow to create device with IFLA_LINK_NETNSID set Nicolas Dichtel
2015-01-19 19:16                         ` [PATCH net-next v5 0/4] netns: allow to identify peer netns David Miller
2014-11-01 21:08                   ` [PATCH net-next v4 " David Miller
2014-11-24 13:45                   ` Nicolas Dichtel
2014-10-02 19:20             ` [RFC PATCH net-next v2 0/5] " Eric W. Biederman
2014-10-02 19:31               ` Andy Lutomirski
2014-10-02 19:45                 ` Eric W. Biederman
2014-10-02 19:48                   ` Andy Lutomirski
2014-10-03 12:22               ` Nicolas Dichtel
