* [RFC PATCH net-next 0/5] Ease netns management for userland
@ 2012-12-12 17:17 Nicolas Dichtel
  2012-12-12 17:17 ` [RFC PATCH net-next 1/5] netns: allocate an unique id to identify a netns Nicolas Dichtel
                   ` (6 more replies)
  0 siblings, 7 replies; 16+ messages in thread
From: Nicolas Dichtel @ 2012-12-12 17:17 UTC (permalink / raw)
  To: netdev; +Cc: davem, ebiederm, aatteka

The goal of this series is to ease netns management by daemons. Some systems use
netns only to virtualize the network stack and don't want to multiply userland
daemons. These systems may have a lot of netns, up to 2000. We don't want to
launch an instance of each daemon (quagga, strongswan, conntrackd, ...) for
each netns because it would consume a lot of resources. Having one daemon that
manages all netns is more efficient (mainly if there are few objects to manage:
one or two routes per netns, for example).
Hence, one goal of this series is to let a daemon monitor netns activity, so
that it can open or close netlink sockets and allocate or free the structures
needed to manage these netns when they are created or deleted.
To help identify a netns, an index has been added to each netns.

A new setsockopt() option is also added, to help daemons open a socket in the
right netns. For now, a daemon that wants to open a socket in a specified netns
needs to call setns(CLONE_NEWNET) with an fd (not so easy to find), open the
socket and then call setns() again to go back to the initial netns. Having this
kind of setsockopt() will simplify operations. Obviously, this setsockopt()
should be done early enough (is a test on sk_state enough?). The first target is
netlink sockets, but it can be useful for other kinds of sockets, which is why
I add a generic socket option.
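
To make the current workaround concrete, here is a minimal userland sketch of
the setns() sequence described above. It is an illustration only, not part of
this series: the helper name socket_in_netns() and the use of a path such as
/proc/<pid>/ns/net to reach the target netns are assumptions.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fallbacks in case _GNU_SOURCE was not in effect when <sched.h> was read. */
#ifndef CLONE_NEWNET
#define CLONE_NEWNET 0x40000000
#endif
int setns(int fd, int nstype);

/*
 * Open a socket inside the netns referred to by ns_path, then switch the
 * calling thread back to its original netns.
 * Returns the new socket fd, or -1 on error. Hypothetical helper.
 */
int socket_in_netns(const char *ns_path, int domain, int type, int protocol)
{
	int orig_ns, target_ns, sock = -1;

	/* Keep a handle on the current netns so we can return to it. */
	orig_ns = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);
	if (orig_ns < 0)
		return -1;

	target_ns = open(ns_path, O_RDONLY | O_CLOEXEC);
	if (target_ns < 0)
		goto out_orig;

	/* Step 1: enter the target netns (requires privileges). */
	if (setns(target_ns, CLONE_NEWNET) < 0)
		goto out_target;

	/* Step 2: a socket belongs to the netns that is current at creation. */
	sock = socket(domain, type, protocol);

	/* Step 3: go back. If this fails we are stuck in the wrong netns. */
	if (setns(orig_ns, CLONE_NEWNET) < 0 && sock >= 0) {
		close(sock);
		sock = -1;
	}

out_target:
	close(target_ns);
out_orig:
	close(orig_ns);
	return sock;
}
```

A daemon would call this once per managed netns, which means two extra open()
calls and two setns() calls around every socket(), plus an awkward failure mode
if the switch back fails.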

As usual, the patch against iproute2 will be sent once the patches are included
and net-next merged. I can send it on demand.

 arch/alpha/include/asm/socket.h        |   2 +
 arch/avr32/include/uapi/asm/socket.h   |   2 +
 arch/frv/include/uapi/asm/socket.h     |   2 +
 arch/h8300/include/asm/socket.h        |   2 +
 arch/ia64/include/uapi/asm/socket.h    |   2 +
 arch/m32r/include/asm/socket.h         |   2 +
 arch/m68k/include/uapi/asm/socket.h    |   2 +
 arch/mips/include/uapi/asm/socket.h    |   2 +
 arch/mn10300/include/uapi/asm/socket.h |   2 +
 arch/parisc/include/uapi/asm/socket.h  |   2 +
 arch/powerpc/include/uapi/asm/socket.h |   2 +
 arch/s390/include/uapi/asm/socket.h    |   2 +
 arch/sparc/include/uapi/asm/socket.h   |   2 +
 arch/xtensa/include/uapi/asm/socket.h  |   2 +
 include/net/net_namespace.h            |   3 +
 include/uapi/asm-generic/socket.h      |   2 +
 include/uapi/linux/if_link.h           |   1 +
 include/uapi/linux/netns.h             |  31 +++++
 net/core/net_namespace.c               | 223 +++++++++++++++++++++++++++++++++
 net/core/rtnetlink.c                   |   7 +-
 net/core/sock.c                        |  28 +++++
 net/netlink/genetlink.c                |   4 +
 22 files changed, 326 insertions(+), 1 deletion(-)

I do not pretend to be a netns expert, which is why I added RFC to the title ;-)

Comments are welcome.

Regards,
Nicolas


* [RFC PATCH net-next 1/5] netns: allocate an unique id to identify a netns
  2012-12-12 17:17 [RFC PATCH net-next 0/5] Ease netns management for userland Nicolas Dichtel
@ 2012-12-12 17:17 ` Nicolas Dichtel
  2012-12-12 17:17 ` [RFC PATCH net-next 2/5] netns: allow to dump netns with netlink Nicolas Dichtel
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Nicolas Dichtel @ 2012-12-12 17:17 UTC (permalink / raw)
  To: netdev; +Cc: davem, ebiederm, aatteka, Nicolas Dichtel

This patch simply adds a field, nsindex, which will contain a unique index.
The goal is to prepare for monitoring netns activity with rtnetlink and to
ease netns management by userland apps.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/net/net_namespace.h |  1 +
 net/core/net_namespace.c    | 16 ++++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index c5a43f5..5db7a1b 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -55,6 +55,7 @@ struct net {
 	struct list_head	exit_list;	/* Use only net_mutex */
 
 	struct user_namespace   *user_ns;	/* Owning user namespace */
+	int			nsindex;	/* index to identify this ns */
 
 	struct proc_dir_entry 	*proc_net;
 	struct proc_dir_entry 	*proc_net_stat;
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 6456439..f5267e4 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -27,6 +27,7 @@ static DEFINE_MUTEX(net_mutex);
 
 LIST_HEAD(net_namespace_list);
 EXPORT_SYMBOL_GPL(net_namespace_list);
+static DEFINE_IDA(net_namespace_ids);
 
 struct net init_net = {
 	.dev_base_head = LIST_HEAD_INIT(init_net.dev_base_head),
@@ -157,6 +158,15 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns)
 	atomic_set(&net->passive, 1);
 	net->dev_base_seq = 1;
 	net->user_ns = user_ns;
+again:
+	error = ida_get_new_above(&net_namespace_ids, 1, &net->nsindex);
+	if (error < 0) {
+		if (error == -EAGAIN) {
+			ida_pre_get(&net_namespace_ids, GFP_KERNEL);
+			goto again;
+		}
+		return error;
+	}
 
 #ifdef NETNS_REFCNT_DEBUG
 	atomic_set(&net->use_count, 0);
@@ -171,6 +181,7 @@ out:
 	return error;
 
 out_undo:
+	ida_remove(&net_namespace_ids, net->nsindex);
 	/* Walk through the list backwards calling the exit functions
 	 * for the pernet modules whose init functions did not fail.
 	 */
@@ -297,6 +308,11 @@ static void cleanup_net(struct work_struct *work)
 	 */
 	synchronize_rcu();
 
+	list_for_each_entry(net, &net_exit_list, exit_list) {
+		/* Free the index */
+		ida_remove(&net_namespace_ids, net->nsindex);
+	}
+
 	/* Run all of the network namespace exit methods */
 	list_for_each_entry_reverse(ops, &pernet_list, list)
 		ops_exit_list(ops, &net_exit_list);
-- 
1.8.0.1
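
As an illustration of the allocation behaviour this patch relies on, the
following userland sketch mimics what ida_get_new_above(..., 1, ...) and
ida_remove() provide: the lowest free id at or above a starting value, and
reuse of an id once it has been released. All names and the fixed-size table
are assumptions made for the sketch; the kernel IDA is a different, dynamically
sized structure.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_IDS 4096	/* arbitrary capacity for the sketch */

static bool sketch_id_used[SKETCH_MAX_IDS];

/* Return the smallest free id >= start, or -1 if none is available. */
int sketch_id_alloc(int start)
{
	if (start < 0)
		return -1;
	for (int id = start; id < SKETCH_MAX_IDS; id++) {
		if (!sketch_id_used[id]) {
			sketch_id_used[id] = true;
			return id;
		}
	}
	return -1;
}

/* Release an id so that a later allocation may hand it out again. */
void sketch_id_free(int id)
{
	if (id >= 0 && id < SKETCH_MAX_IDS)
		sketch_id_used[id] = false;
}
```

One consequence worth keeping in mind: because released ids are handed out
again, an nsindex identifies a netns only while that netns exists.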


* [RFC PATCH net-next 2/5] netns: allow to dump netns with netlink
  2012-12-12 17:17 [RFC PATCH net-next 0/5] Ease netns management for userland Nicolas Dichtel
  2012-12-12 17:17 ` [RFC PATCH net-next 1/5] netns: allocate an unique id to identify a netns Nicolas Dichtel
@ 2012-12-12 17:17 ` Nicolas Dichtel
  2012-12-12 17:17 ` [RFC PATCH net-next 3/5] dev/netns: allow to get netns from nsindex in rtnl msg Nicolas Dichtel
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Nicolas Dichtel @ 2012-12-12 17:17 UTC (permalink / raw)
  To: netdev; +Cc: davem, ebiederm, aatteka, Nicolas Dichtel

This patch adds basic netlink support for netns. The user can dump all
existing netns and get the associated nsindex.
The user can also get the nsindex associated with a pid or fd.

To initialize the genetlink family for netns, there is a chicken-and-egg
problem: genetlink init is done after init_net is created, hence when init_net
is created, we cannot call genl_register_family_with_ops(). This is why I put
the init part in the genetlink module.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/net/net_namespace.h |   1 +
 include/uapi/linux/netns.h  |  27 ++++++++
 net/core/net_namespace.c    | 157 ++++++++++++++++++++++++++++++++++++++++++++
 net/netlink/genetlink.c     |   4 ++
 4 files changed, 189 insertions(+)
 create mode 100644 include/uapi/linux/netns.h

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 5db7a1b..c373f2e 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -306,6 +306,7 @@ extern int register_pernet_subsys(struct pernet_operations *);
 extern void unregister_pernet_subsys(struct pernet_operations *);
 extern int register_pernet_device(struct pernet_operations *);
 extern void unregister_pernet_device(struct pernet_operations *);
+extern int netns_genl_register(void);
 
 struct ctl_table;
 struct ctl_table_header;
diff --git a/include/uapi/linux/netns.h b/include/uapi/linux/netns.h
new file mode 100644
index 0000000..e1c1da3
--- /dev/null
+++ b/include/uapi/linux/netns.h
@@ -0,0 +1,27 @@
+#ifndef _UAPI_LINUX_NETNS_H_
+#define _UAPI_LINUX_NETNS_H_
+
+/* Generic netlink messages */
+
+#define NETNS_GENL_NAME			"netns"
+#define NETNS_GENL_VERSION		0x1
+
+/* Commands */
+enum {
+	NETNS_CMD_NOOP,
+	NETNS_CMD_GET,
+	__NETNS_CMD_MAX,
+};
+#define NETNS_CMD_MAX		(__NETNS_CMD_MAX - 1)
+
+/* Attributes */
+enum {
+	NETNSA_NONE,
+	NETNSA_NSINDEX,
+	NETNSA_PID,
+	NETNSA_FD,
+	__NETNSA_MAX,
+};
+#define NETNSA_MAX		(__NETNSA_MAX - 1)
+
+#endif /* _UAPI_LINUX_NETNS_H_ */
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index f5267e4..2ae22b0 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -14,6 +14,8 @@
 #include <linux/file.h>
 #include <linux/export.h>
 #include <linux/user_namespace.h>
+#include <linux/netns.h>
+#include <net/genetlink.h>
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 
@@ -397,6 +399,161 @@ struct net *get_net_ns_by_pid(pid_t pid)
 }
 EXPORT_SYMBOL_GPL(get_net_ns_by_pid);
 
+static struct genl_family netns_nl_family = {
+	.id		= GENL_ID_GENERATE,
+	.name		= NETNS_GENL_NAME,
+	.version	= NETNS_GENL_VERSION,
+	.hdrsize	= 0,
+	.maxattr	= NETNSA_MAX,
+	.netnsok	= true,
+};
+
+static struct nla_policy netns_nl_policy[NETNSA_MAX + 1] = {
+	[NETNSA_NONE]		= { .type = NLA_UNSPEC, },
+	[NETNSA_NSINDEX]	= { .type = NLA_U32, },
+	[NETNSA_PID]		= { .type = NLA_U32 },
+	[NETNSA_FD]		= { .type = NLA_U32 },
+};
+
+static int netns_nl_get_size(void)
+{
+	return nla_total_size(sizeof(u32)) /* NETNSA_NSINDEX */
+	       ;
+}
+
+static int netns_nl_cmd_noop(struct sk_buff *skb, struct genl_info *info)
+{
+	struct sk_buff *msg;
+	void *hdr;
+	int ret = -ENOBUFS;
+
+	msg = genlmsg_new(netns_nl_get_size(), GFP_KERNEL);
+	if (!msg) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	hdr = genlmsg_put(msg, info->snd_portid, info->snd_seq,
+			  &netns_nl_family, 0, NETNS_CMD_NOOP);
+	if (!hdr) {
+		ret = -EMSGSIZE;
+		goto err_out;
+	}
+
+	genlmsg_end(msg, hdr);
+
+	return genlmsg_unicast(genl_info_net(info), msg, info->snd_portid);
+
+err_out:
+	nlmsg_free(msg);
+
+out:
+	return ret;
+}
+
+static int netns_nl_fill(struct sk_buff *skb, u32 portid, u32 seq, int flags,
+			 int cmd, struct net *net)
+{
+	void *hdr;
+
+	hdr = genlmsg_put(skb, portid, seq, &netns_nl_family, flags, cmd);
+	if (!hdr)
+		return -EMSGSIZE;
+
+	if (nla_put_u32(skb, NETNSA_NSINDEX, net->nsindex))
+		goto nla_put_failure;
+
+	return genlmsg_end(skb, hdr);
+
+nla_put_failure:
+	genlmsg_cancel(skb, hdr);
+	return -EMSGSIZE;
+}
+
+static int netns_nl_cmd_get(struct sk_buff *skb, struct genl_info *info)
+{
+	struct net *net = genl_info_net(info);
+	struct sk_buff *msg;
+	int err = -ENOBUFS;
+
+	if (info->attrs[NETNSA_PID])
+		net = get_net_ns_by_pid(nla_get_u32(info->attrs[NETNSA_PID]));
+	else if (info->attrs[NETNSA_FD])
+		net = get_net_ns_by_fd(nla_get_u32(info->attrs[NETNSA_FD]));
+	else
+		get_net(net);
+
+	msg = genlmsg_new(netns_nl_get_size(), GFP_KERNEL);
+	if (!msg) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	err = netns_nl_fill(msg, info->snd_portid, info->snd_seq,
+			    NLM_F_ACK, NETNS_CMD_GET, net);
+	if (err < 0)
+		goto err_out;
+
+	err = genlmsg_unicast(genl_info_net(info), msg, info->snd_portid);
+	goto out;
+
+err_out:
+	nlmsg_free(msg);
+
+out:
+	put_net(net);
+	return err;
+}
+
+static int netns_nl_cmd_dump(struct sk_buff *skb, struct netlink_callback *cb)
+{
+	int i = 0, s_i = cb->args[0];
+	struct net *net;
+
+	rtnl_lock();
+	for_each_net(net) {
+		if (i < s_i) {
+			i++;
+			continue;
+		}
+
+		if (netns_nl_fill(skb, NETLINK_CB(cb->skb).portid,
+				  cb->nlh->nlmsg_seq, NLM_F_MULTI,
+				  NETNS_CMD_GET, net) <= 0)
+			goto out;
+
+		i++;
+	}
+
+out:
+	cb->args[0] = i;
+	rtnl_unlock();
+
+	return skb->len;
+}
+
+static struct genl_ops netns_nl_ops[] = {
+	{
+		.cmd = NETNS_CMD_NOOP,
+		.policy = netns_nl_policy,
+		.doit = netns_nl_cmd_noop,
+		.flags = GENL_ADMIN_PERM,
+	},
+	{
+		.cmd = NETNS_CMD_GET,
+		.policy = netns_nl_policy,
+		.doit = netns_nl_cmd_get,
+		.dumpit = netns_nl_cmd_dump,
+		.flags = GENL_ADMIN_PERM,
+	},
+};
+
+int netns_genl_register(void)
+{
+	return genl_register_family_with_ops(&netns_nl_family, netns_nl_ops,
+					     ARRAY_SIZE(netns_nl_ops));
+}
+
 static int __init net_ns_init(void)
 {
 	struct net_generic *ng;
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index f2aabb6..6d25ddb 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -963,6 +963,10 @@ static int __init genl_init(void)
 	if (err < 0)
 		goto problem;
 
+	err = netns_genl_register();
+	if (err < 0)
+		goto problem;
+
 	return 0;
 
 problem:
-- 
1.8.0.1
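
For readers wondering what netns_nl_get_size() amounts to, here is a small
userland sketch of the netlink attribute layout behind
nla_total_size(sizeof(u32)): a 4-byte header (length and type) followed by the
payload, padded to a multiple of 4 bytes. The SKETCH_ names and helpers are
hypothetical stand-ins, not kernel or libc API.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_NLA_ALIGNTO	4
#define SKETCH_NLA_ALIGN(len) \
	(((len) + SKETCH_NLA_ALIGNTO - 1) & ~(SKETCH_NLA_ALIGNTO - 1))
#define SKETCH_NLA_HDRLEN	4	/* u16 length + u16 type */

/* Bytes one attribute occupies in a message, padding included. */
int sketch_nla_total_size(int payload)
{
	return SKETCH_NLA_ALIGN(SKETCH_NLA_HDRLEN + payload);
}

/*
 * Encode a u32 attribute into buf in host byte order.
 * Returns the number of bytes written, or -1 if buf is too small.
 */
int sketch_put_u32(unsigned char *buf, size_t buflen, uint16_t type,
		   uint32_t value)
{
	uint16_t nla_len = SKETCH_NLA_HDRLEN + sizeof(value);
	int total = sketch_nla_total_size((int)sizeof(value));

	if (buflen < (size_t)total)
		return -1;
	memcpy(buf, &nla_len, sizeof(nla_len));		/* header: length */
	memcpy(buf + 2, &type, sizeof(type));		/* header: type */
	memcpy(buf + SKETCH_NLA_HDRLEN, &value, sizeof(value));
	return total;
}
```

With this layout a reply carrying only NETNSA_NSINDEX needs 8 bytes of
attribute space on top of the netlink and genetlink headers.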


* [RFC PATCH net-next 3/5] dev/netns: allow to get netns from nsindex in rtnl msg
  2012-12-12 17:17 [RFC PATCH net-next 0/5] Ease netns management for userland Nicolas Dichtel
  2012-12-12 17:17 ` [RFC PATCH net-next 1/5] netns: allocate an unique id to identify a netns Nicolas Dichtel
  2012-12-12 17:17 ` [RFC PATCH net-next 2/5] netns: allow to dump netns with netlink Nicolas Dichtel
@ 2012-12-12 17:17 ` Nicolas Dichtel
  2012-12-12 17:17 ` [RFC PATCH net-next 4/5] netns: advertise netns activity with netlink Nicolas Dichtel
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Nicolas Dichtel @ 2012-12-12 17:17 UTC (permalink / raw)
  To: netdev; +Cc: davem, ebiederm, aatteka, Nicolas Dichtel

This patch allows moving a netdevice to another netns by giving the nsindex.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/net/net_namespace.h  |  1 +
 include/uapi/linux/if_link.h |  1 +
 net/core/net_namespace.c     | 14 ++++++++++++++
 net/core/rtnetlink.c         |  7 ++++++-
 4 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index c373f2e..68e7a36 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -151,6 +151,7 @@ extern struct list_head net_namespace_list;
 
 extern struct net *get_net_ns_by_pid(pid_t pid);
 extern struct net *get_net_ns_by_fd(int pid);
+extern struct net *get_net_ns_by_nsindex(int nsindex);
 
 #ifdef CONFIG_NET_NS
 extern void __put_net(struct net *net);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 60f3b6b..6720a47 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -142,6 +142,7 @@ enum {
 #define IFLA_PROMISCUITY IFLA_PROMISCUITY
 	IFLA_NUM_TX_QUEUES,
 	IFLA_NUM_RX_QUEUES,
+	IFLA_NET_NS_INDEX,
 	__IFLA_MAX
 };
 
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 2ae22b0..18fc62f 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -399,6 +399,20 @@ struct net *get_net_ns_by_pid(pid_t pid)
 }
 EXPORT_SYMBOL_GPL(get_net_ns_by_pid);
 
+struct net *get_net_ns_by_nsindex(int nsindex)
+{
+	struct net *net;
+
+	ASSERT_RTNL();
+	for_each_net(net)
+		if (net->nsindex == nsindex) {
+			get_net(net);
+			break;
+		}
+	return net;
+}
+EXPORT_SYMBOL_GPL(get_net_ns_by_nsindex);
+
 static struct genl_family netns_nl_family = {
 	.id		= GENL_ID_GENERATE,
 	.name		= NETNS_GENL_NAME,
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 1868625..e22954a 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1115,6 +1115,7 @@ const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_LINKINFO]		= { .type = NLA_NESTED },
 	[IFLA_NET_NS_PID]	= { .type = NLA_U32 },
 	[IFLA_NET_NS_FD]	= { .type = NLA_U32 },
+	[IFLA_NET_NS_INDEX]	= { .type = NLA_U32 },
 	[IFLA_IFALIAS]	        = { .type = NLA_STRING, .len = IFALIASZ-1 },
 	[IFLA_VFINFO_LIST]	= {. type = NLA_NESTED },
 	[IFLA_VF_PORTS]		= { .type = NLA_NESTED },
@@ -1171,6 +1172,8 @@ struct net *rtnl_link_get_net(struct net *src_net, struct nlattr *tb[])
 		net = get_net_ns_by_pid(nla_get_u32(tb[IFLA_NET_NS_PID]));
 	else if (tb[IFLA_NET_NS_FD])
 		net = get_net_ns_by_fd(nla_get_u32(tb[IFLA_NET_NS_FD]));
+	else if (tb[IFLA_NET_NS_INDEX])
+		net = get_net_ns_by_nsindex(nla_get_u32(tb[IFLA_NET_NS_INDEX]));
 	else
 		net = get_net(src_net);
 	return net;
@@ -1310,7 +1313,9 @@ static int do_setlink(struct net_device *dev, struct ifinfomsg *ifm,
 	int send_addr_notify = 0;
 	int err;
 
-	if (tb[IFLA_NET_NS_PID] || tb[IFLA_NET_NS_FD]) {
+	if (tb[IFLA_NET_NS_PID] ||
+	    tb[IFLA_NET_NS_FD] ||
+	    tb[IFLA_NET_NS_INDEX]) {
 		struct net *net = rtnl_link_get_net(dev_net(dev), tb);
 		if (IS_ERR(net)) {
 			err = PTR_ERR(net);
-- 
1.8.0.1
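
One detail of get_net_ns_by_nsindex() that may deserve a second look: it
returns the loop cursor, and with list_for_each_entry()-style iterators such as
for_each_net() the cursor is not NULL when the loop ends without a match. In
addition, do_setlink() tests the result with IS_ERR() while the SO_NETNS code
in patch 5/5 tests it against NULL. The userland sketch below, using
hypothetical sketch_ names and a minimal circular list, shows a lookup with an
explicit "not found" result; it is not a proposed replacement for the kernel
function.

```c
#include <stddef.h>

struct sketch_list {
	struct sketch_list *next, *prev;
};

struct sketch_net {
	int nsindex;
	struct sketch_list list;
};

/* Recover the enclosing sketch_net from a pointer to its list member. */
#define sketch_entry(ptr) \
	((struct sketch_net *)((char *)(ptr) - offsetof(struct sketch_net, list)))

void sketch_list_init(struct sketch_list *head)
{
	head->next = head->prev = head;
}

void sketch_list_add_tail(struct sketch_list *node, struct sketch_list *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Return the entry with the given nsindex, or NULL if there is none. */
struct sketch_net *sketch_net_by_nsindex(struct sketch_list *head, int nsindex)
{
	struct sketch_list *pos;

	for (pos = head->next; pos != head; pos = pos->next) {
		struct sketch_net *net = sketch_entry(pos);

		if (net->nsindex == nsindex)
			return net;	/* the kernel version would get_net() here */
	}
	return NULL;	/* explicit miss; never the list head in disguise */
}
```

Whichever convention is chosen for a miss (NULL or an ERR_PTR value), all
callers would need to test for the same one.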


* [RFC PATCH net-next 4/5] netns: advertise netns activity with netlink
  2012-12-12 17:17 [RFC PATCH net-next 0/5] Ease netns management for userland Nicolas Dichtel
                   ` (2 preceding siblings ...)
  2012-12-12 17:17 ` [RFC PATCH net-next 3/5] dev/netns: allow to get netns from nsindex in rtnl msg Nicolas Dichtel
@ 2012-12-12 17:17 ` Nicolas Dichtel
  2012-12-12 17:17 ` [RFC PATCH net-next 5/5] net/sock: add support of SO_NETNS Nicolas Dichtel
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Nicolas Dichtel @ 2012-12-12 17:17 UTC (permalink / raw)
  To: netdev; +Cc: davem, ebiederm, aatteka, Nicolas Dichtel

The goal of this patch is to send netlink messages when netns are created or
deleted. This is useful for a daemon that wants to manage all netns with only
one running instance.
Note that we cannot send events until the netns_nl_event_mcgrp group is
registered.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/uapi/linux/netns.h |  4 ++++
 net/core/net_namespace.c   | 38 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/netns.h b/include/uapi/linux/netns.h
index e1c1da3..e14d90b 100644
--- a/include/uapi/linux/netns.h
+++ b/include/uapi/linux/netns.h
@@ -6,10 +6,14 @@
 #define NETNS_GENL_NAME			"netns"
 #define NETNS_GENL_VERSION		0x1
 
+#define NETNS_GENL_MCAST_EVENT_NAME	"events"
+
 /* Commands */
 enum {
 	NETNS_CMD_NOOP,
 	NETNS_CMD_GET,
+	NETNS_CMD_NEW,
+	NETNS_CMD_DEL,
 	__NETNS_CMD_MAX,
 };
 #define NETNS_CMD_MAX		(__NETNS_CMD_MAX - 1)
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 18fc62f..da92ecb 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -40,6 +40,8 @@ EXPORT_SYMBOL(init_net);
 
 static unsigned int max_gen_ptrs = INITIAL_NET_GEN_PTRS;
 
+static int netns_nl_event(struct net *net, int cmd);
+
 static struct net_generic *net_alloc_generic(void)
 {
 	struct net_generic *ng;
@@ -179,6 +181,7 @@ again:
 		if (error < 0)
 			goto out_undo;
 	}
+	netns_nl_event(net, NETNS_CMD_NEW);
 out:
 	return error;
 
@@ -311,6 +314,7 @@ static void cleanup_net(struct work_struct *work)
 	synchronize_rcu();
 
 	list_for_each_entry(net, &net_exit_list, exit_list) {
+		netns_nl_event(net, NETNS_CMD_DEL);
 		/* Free the index */
 		ida_remove(&net_namespace_ids, net->nsindex);
 	}
@@ -413,6 +417,10 @@ struct net *get_net_ns_by_nsindex(int nsindex)
 }
 EXPORT_SYMBOL_GPL(get_net_ns_by_nsindex);
 
+static struct genl_multicast_group netns_nl_event_mcgrp = {
+	.name = NETNS_GENL_MCAST_EVENT_NAME,
+};
+
 static struct genl_family netns_nl_family = {
 	.id		= GENL_ID_GENERATE,
 	.name		= NETNS_GENL_NAME,
@@ -562,10 +570,38 @@ static struct genl_ops netns_nl_ops[] = {
 	},
 };
 
+static int netns_nl_event(struct net *net, int cmd)
+{
+	struct sk_buff *msg;
+	int err = -ENOBUFS;
+
+	/* Check that gennl infra is ready */
+	if (!netns_nl_event_mcgrp.id)
+		return -ENOENT;
+
+	msg = genlmsg_new(netns_nl_get_size(), GFP_ATOMIC);
+	if (!msg)
+		return -ENOMEM;
+
+	err = netns_nl_fill(msg, 0, 0, 0, cmd, net);
+	if (err < 0) {
+		nlmsg_free(msg);
+		return err;
+	}
+
+	return genlmsg_multicast(msg, 0, netns_nl_event_mcgrp.id, GFP_ATOMIC);
+}
+
 int netns_genl_register(void)
 {
-	return genl_register_family_with_ops(&netns_nl_family, netns_nl_ops,
+	int err;
+
+	err =  genl_register_family_with_ops(&netns_nl_family, netns_nl_ops,
 					     ARRAY_SIZE(netns_nl_ops));
+	if (err < 0)
+		return err;
+
+	return genl_register_mc_group(&netns_nl_family, &netns_nl_event_mcgrp);
 }
 
 static int __init net_ns_init(void)
-- 
1.8.0.1
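
To show how a single daemon might consume these notifications, here is a
userland sketch of the bookkeeping side only. It assumes the message has
already been received and parsed into a command and an nsindex; the SK_ names,
the fixed-size table and both helpers are hypothetical, and only the command
numbering follows the enum in this patch.

```c
#include <stdbool.h>

/* Command numbers in the order laid out by this patch's netns.h enum. */
enum {
	SK_NETNS_CMD_NOOP,
	SK_NETNS_CMD_GET,
	SK_NETNS_CMD_NEW,
	SK_NETNS_CMD_DEL,
};

#define SK_MAX_NETNS 4096	/* arbitrary bound for the sketch */

static bool sk_managed[SK_MAX_NETNS];

/* Apply one event. Returns 0 if it was handled, -1 if it was ignored. */
int sk_handle_event(int cmd, int nsindex)
{
	if (nsindex < 0 || nsindex >= SK_MAX_NETNS)
		return -1;

	switch (cmd) {
	case SK_NETNS_CMD_NEW:
		/* A real daemon would open its per-netns sockets here. */
		sk_managed[nsindex] = true;
		return 0;
	case SK_NETNS_CMD_DEL:
		/* A real daemon would close sockets and free state here. */
		sk_managed[nsindex] = false;
		return 0;
	default:
		return -1;
	}
}

bool sk_is_managed(int nsindex)
{
	return nsindex >= 0 && nsindex < SK_MAX_NETNS && sk_managed[nsindex];
}
```

Since events are only sent once the multicast group is registered, such a
daemon would presumably also dump the existing netns at startup.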


* [RFC PATCH net-next 5/5] net/sock: add support of SO_NETNS
  2012-12-12 17:17 [RFC PATCH net-next 0/5] Ease netns management for userland Nicolas Dichtel
                   ` (3 preceding siblings ...)
  2012-12-12 17:17 ` [RFC PATCH net-next 4/5] netns: advertise netns activity with netlink Nicolas Dichtel
@ 2012-12-12 17:17 ` Nicolas Dichtel
  2012-12-12 18:39 ` [RFC PATCH net-next 0/5] Ease netns management for userland Nicolas Dichtel
  2012-12-12 19:25 ` Eric W. Biederman
  6 siblings, 0 replies; 16+ messages in thread
From: Nicolas Dichtel @ 2012-12-12 17:17 UTC (permalink / raw)
  To: netdev; +Cc: davem, ebiederm, aatteka, Nicolas Dichtel

This new setsockopt() option allows the user to change the netns of a socket.
It should be done early enough, before any bind(), etc.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 arch/alpha/include/asm/socket.h        |  2 ++
 arch/avr32/include/uapi/asm/socket.h   |  2 ++
 arch/frv/include/uapi/asm/socket.h     |  2 ++
 arch/h8300/include/asm/socket.h        |  2 ++
 arch/ia64/include/uapi/asm/socket.h    |  2 ++
 arch/m32r/include/asm/socket.h         |  2 ++
 arch/m68k/include/uapi/asm/socket.h    |  2 ++
 arch/mips/include/uapi/asm/socket.h    |  2 ++
 arch/mn10300/include/uapi/asm/socket.h |  2 ++
 arch/parisc/include/uapi/asm/socket.h  |  2 ++
 arch/powerpc/include/uapi/asm/socket.h |  2 ++
 arch/s390/include/uapi/asm/socket.h    |  2 ++
 arch/sparc/include/uapi/asm/socket.h   |  2 ++
 arch/xtensa/include/uapi/asm/socket.h  |  2 ++
 include/uapi/asm-generic/socket.h      |  2 ++
 net/core/sock.c                        | 28 ++++++++++++++++++++++++++++
 16 files changed, 58 insertions(+)

diff --git a/arch/alpha/include/asm/socket.h b/arch/alpha/include/asm/socket.h
index 0087d05..13aa509 100644
--- a/arch/alpha/include/asm/socket.h
+++ b/arch/alpha/include/asm/socket.h
@@ -77,6 +77,8 @@
 /* Instruct lower device to use last 4-bytes of skb data as FCS */
 #define SO_NOFCS		43
 
+#define SO_NETNS		44
+
 #ifdef __KERNEL__
 /* O_NONBLOCK clashes with the bits used for socket types.  Therefore we
  * have to define SOCK_NONBLOCK to a different value here.
diff --git a/arch/avr32/include/uapi/asm/socket.h b/arch/avr32/include/uapi/asm/socket.h
index 486df68..39cc927 100644
--- a/arch/avr32/include/uapi/asm/socket.h
+++ b/arch/avr32/include/uapi/asm/socket.h
@@ -70,4 +70,6 @@
 /* Instruct lower device to use last 4-bytes of skb data as FCS */
 #define SO_NOFCS		43
 
+#define SO_NETNS		44
+
 #endif /* __ASM_AVR32_SOCKET_H */
diff --git a/arch/frv/include/uapi/asm/socket.h b/arch/frv/include/uapi/asm/socket.h
index 871f89b..ac7eef6 100644
--- a/arch/frv/include/uapi/asm/socket.h
+++ b/arch/frv/include/uapi/asm/socket.h
@@ -70,5 +70,7 @@
 /* Instruct lower device to use last 4-bytes of skb data as FCS */
 #define SO_NOFCS		43
 
+#define SO_NETNS		44
+
 #endif /* _ASM_SOCKET_H */
 
diff --git a/arch/h8300/include/asm/socket.h b/arch/h8300/include/asm/socket.h
index 90a2e57..4d2a4e8 100644
--- a/arch/h8300/include/asm/socket.h
+++ b/arch/h8300/include/asm/socket.h
@@ -70,4 +70,6 @@
 /* Instruct lower device to use last 4-bytes of skb data as FCS */
 #define SO_NOFCS		43
 
+#define SO_NETNS		44
+
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/ia64/include/uapi/asm/socket.h b/arch/ia64/include/uapi/asm/socket.h
index 23d6759..ed4534b 100644
--- a/arch/ia64/include/uapi/asm/socket.h
+++ b/arch/ia64/include/uapi/asm/socket.h
@@ -79,4 +79,6 @@
 /* Instruct lower device to use last 4-bytes of skb data as FCS */
 #define SO_NOFCS		43
 
+#define SO_NETNS		44
+
 #endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/m32r/include/asm/socket.h b/arch/m32r/include/asm/socket.h
index 5e7088a..37d0eb0 100644
--- a/arch/m32r/include/asm/socket.h
+++ b/arch/m32r/include/asm/socket.h
@@ -70,4 +70,6 @@
 /* Instruct lower device to use last 4-bytes of skb data as FCS */
 #define SO_NOFCS		43
 
+#define SO_NETNS		44
+
 #endif /* _ASM_M32R_SOCKET_H */
diff --git a/arch/m68k/include/uapi/asm/socket.h b/arch/m68k/include/uapi/asm/socket.h
index 285da3b..e79aad8 100644
--- a/arch/m68k/include/uapi/asm/socket.h
+++ b/arch/m68k/include/uapi/asm/socket.h
@@ -70,4 +70,6 @@
 /* Instruct lower device to use last 4-bytes of skb data as FCS */
 #define SO_NOFCS		43
 
+#define SO_NETNS		44
+
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index 17307ab..356f943 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -90,5 +90,7 @@ To add: #define SO_REUSEPORT 0x0200	/* Allow local address and port reuse.  */
 /* Instruct lower device to use last 4-bytes of skb data as FCS */
 #define SO_NOFCS		43
 
+#define SO_NETNS		44
+
 
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/mn10300/include/uapi/asm/socket.h b/arch/mn10300/include/uapi/asm/socket.h
index af5366b..b899cf8 100644
--- a/arch/mn10300/include/uapi/asm/socket.h
+++ b/arch/mn10300/include/uapi/asm/socket.h
@@ -70,4 +70,6 @@
 /* Instruct lower device to use last 4-bytes of skb data as FCS */
 #define SO_NOFCS		43
 
+#define SO_NETNS		44
+
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index d9ff473..8503329 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -69,6 +69,8 @@
 /* Instruct lower device to use last 4-bytes of skb data as FCS */
 #define SO_NOFCS		0x4024
 
+#define SO_NETNS		0x4025
+
 
 /* O_NONBLOCK clashes with the bits used for socket types.  Therefore we
  * have to define SOCK_NONBLOCK to a different value here.
diff --git a/arch/powerpc/include/uapi/asm/socket.h b/arch/powerpc/include/uapi/asm/socket.h
index eb0b186..1a520ff 100644
--- a/arch/powerpc/include/uapi/asm/socket.h
+++ b/arch/powerpc/include/uapi/asm/socket.h
@@ -77,4 +77,6 @@
 /* Instruct lower device to use last 4-bytes of skb data as FCS */
 #define SO_NOFCS		43
 
+#define SO_NETNS		44
+
 #endif	/* _ASM_POWERPC_SOCKET_H */
diff --git a/arch/s390/include/uapi/asm/socket.h b/arch/s390/include/uapi/asm/socket.h
index 436d07c..cbdda59 100644
--- a/arch/s390/include/uapi/asm/socket.h
+++ b/arch/s390/include/uapi/asm/socket.h
@@ -76,4 +76,6 @@
 /* Instruct lower device to use last 4-bytes of skb data as FCS */
 #define SO_NOFCS		43
 
+#define SO_NETNS		44
+
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index c83a937..c1c2853 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -66,6 +66,8 @@
 /* Instruct lower device to use last 4-bytes of skb data as FCS */
 #define SO_NOFCS		0x0027
 
+#define SO_NETNS		0x0028
+
 
 /* Security levels - as per NRL IPv6 - don't actually do anything */
 #define SO_SECURITY_AUTHENTICATION		0x5001
diff --git a/arch/xtensa/include/uapi/asm/socket.h b/arch/xtensa/include/uapi/asm/socket.h
index 38079be..a8f956d 100644
--- a/arch/xtensa/include/uapi/asm/socket.h
+++ b/arch/xtensa/include/uapi/asm/socket.h
@@ -81,4 +81,6 @@
 /* Instruct lower device to use last 4-bytes of skb data as FCS */
 #define SO_NOFCS		43
 
+#define SO_NETNS		44
+
 #endif	/* _XTENSA_SOCKET_H */
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index 2d32d07..08c108c 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -73,4 +73,6 @@
 /* Instruct lower device to use last 4-bytes of skb data as FCS */
 #define SO_NOFCS		43
 
+#define SO_NETNS		44
+
 #endif /* __ASM_GENERIC_SOCKET_H */
diff --git a/net/core/sock.c b/net/core/sock.c
index a692ef4..7ec288f 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -895,6 +895,30 @@ set_rcvbuf:
 		sock_valbool_flag(sk, SOCK_NOFCS, valbool);
 		break;
 
+	case SO_NETNS:
+#ifdef CONFIG_NET_NS
+		if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+			ret = -EPERM;
+		else if (sk->sk_state != TCP_CLOSE)
+			ret = -EBUSY;	/* Too late to change netns */
+		else {
+			struct net *net = get_net_ns_by_nsindex(val);
+
+			if (net) {
+				/* We can not use sk_change_net() because sk
+				 * will not be released with
+				 * sk_release_kernel(). Let do it manually.
+				 */
+				put_net(sock_net(sk));
+				sock_net_set(sk, net);
+			} else
+				ret = -EINVAL;
+		}
+#else
+		ret = -EOPNOTSUPP;
+#endif
+		break;
+
 	default:
 		ret = -ENOPROTOOPT;
 		break;
@@ -1140,6 +1164,10 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 
 		goto lenout;
 
+	case SO_NETNS:
+		v.val = sock_net(sk)->nsindex;
+		break;
+
 	default:
 		return -ENOPROTOOPT;
 	}
-- 
1.8.0.1
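
From userland, the proposed option would be used roughly as sketched below.
This is an assumption-laden illustration: SKETCH_SO_NETNS takes the value 44
that this RFC proposes for most architectures (parisc and sparc differ), it is
not a constant provided by released headers, and on a kernel without this patch
the same number may be rejected or mean something else entirely. The helper
name is hypothetical.

```c
#include <sys/socket.h>

/* Value proposed by this RFC for most architectures; an assumption here. */
#define SKETCH_SO_NETNS 44

/*
 * Ask the kernel to move a freshly created socket into the netns identified
 * by nsindex. Returns 0 on success, -1 on error with errno set.
 */
int sketch_set_socket_netns(int fd, int nsindex)
{
	return setsockopt(fd, SOL_SOCKET, SKETCH_SO_NETNS,
			  &nsindex, sizeof(nsindex));
}
```

The intended order is socket(), then this call, then bind() or connect().
According to the patch, the call fails with EPERM without CAP_NET_ADMIN and
with EBUSY once sk_state is no longer TCP_CLOSE.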


* Re: [RFC PATCH net-next 0/5] Ease netns management for userland
  2012-12-12 17:17 [RFC PATCH net-next 0/5] Ease netns management for userland Nicolas Dichtel
                   ` (4 preceding siblings ...)
  2012-12-12 17:17 ` [RFC PATCH net-next 5/5] net/sock: add support of SO_NETNS Nicolas Dichtel
@ 2012-12-12 18:39 ` Nicolas Dichtel
  2012-12-12 19:25 ` Eric W. Biederman
  6 siblings, 0 replies; 16+ messages in thread
From: Nicolas Dichtel @ 2012-12-12 18:39 UTC (permalink / raw)
  To: netdev; +Cc: davem, ebiederm, aatteka

2012/12/12 Nicolas Dichtel <nicolas.dichtel@6wind.com>:
> The goal of this series is to ease netns management by daemons. Some systems use
> netns only to virtualize the network stack and don't want to multiply userland
> daemons. These systems may have a lot of netns, up to 2000. We don't want to
> launch an instance of each daemon (quagga, strongswan, conntrackd, ...) for
> each netns because it would consume a lot of resources. Having one daemon that
> manages all netns is more efficient (mainly if there are few objects to manage:
> one or two routes per netns, for example).
> Hence, one goal of this series is to let a daemon monitor netns activity, so
> that it can open or close netlink sockets and allocate or free the structures
> needed to manage these netns when they are created or deleted.
> To help identify a netns, an index has been added to each netns.
>
> A new setsockopt() option is also added, to help daemons open a socket in the
> right netns. For now, a daemon that wants to open a socket in a specified netns
> needs to call setns(CLONE_NEWNET) with an fd (not so easy to find), open the
> socket and then call setns() again to go back to the initial netns. Having this
> kind of setsockopt() will simplify operations. Obviously, this setsockopt()
> should be done early enough (is a test on sk_state enough?). The first target is
> netlink sockets, but it can be useful for other kinds of sockets, which is why
> I add a generic socket option.
>
> As usual, the patch against iproute2 will be sent once the patches are included
> and net-next merged. I can send it on demand.
>
>  arch/alpha/include/asm/socket.h        |   2 +
>  arch/avr32/include/uapi/asm/socket.h   |   2 +
>  arch/frv/include/uapi/asm/socket.h     |   2 +
>  arch/h8300/include/asm/socket.h        |   2 +
>  arch/ia64/include/uapi/asm/socket.h    |   2 +
>  arch/m32r/include/asm/socket.h         |   2 +
>  arch/m68k/include/uapi/asm/socket.h    |   2 +
>  arch/mips/include/uapi/asm/socket.h    |   2 +
>  arch/mn10300/include/uapi/asm/socket.h |   2 +
>  arch/parisc/include/uapi/asm/socket.h  |   2 +
>  arch/powerpc/include/uapi/asm/socket.h |   2 +
>  arch/s390/include/uapi/asm/socket.h    |   2 +
>  arch/sparc/include/uapi/asm/socket.h   |   2 +
>  arch/xtensa/include/uapi/asm/socket.h  |   2 +
>  include/net/net_namespace.h            |   3 +
>  include/uapi/asm-generic/socket.h      |   2 +
>  include/uapi/linux/if_link.h           |   1 +
>  include/uapi/linux/netns.h             |  31 +++++
>  net/core/net_namespace.c               | 223 +++++++++++++++++++++++++++++++++
>  net/core/rtnetlink.c                   |   7 +-
>  net/core/sock.c                        |  28 +++++
>  net/netlink/genetlink.c                |   4 +
>  22 files changed, 326 insertions(+), 1 deletion(-)
>
> I do not pretend to be a netns expert, it's why I add RFC in the title ;-)
>
> Comments are welcome.

Sorry for the double send, that was a mistake on my side!

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH net-next 0/5] Ease netns management for userland
  2012-12-12 17:17 [RFC PATCH net-next 0/5] Ease netns management for userland Nicolas Dichtel
                   ` (5 preceding siblings ...)
  2012-12-12 18:39 ` [RFC PATCH net-next 0/5] Ease netns management for userland Nicolas Dichtel
@ 2012-12-12 19:25 ` Eric W. Biederman
  2012-12-12 20:54   ` Nicolas Dichtel
  6 siblings, 1 reply; 16+ messages in thread
From: Eric W. Biederman @ 2012-12-12 19:25 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: netdev, davem, aatteka

Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:

> The goal of this serie is to ease netns management by daemons. Some systems use
> netns only to virtualize network stack and don't want to multiply userland
> daemons.  These system may have a lot of netns, up to 2000. We don't want to
> launch an instance of each daemons (quagga, strongswan, conntrackd, ...) for
> each netns because it will consume a lot of ressources. Having one daemon that
> manage all netns is more efficient (mainly if there are few objects to manage:
> one or two routes per netns for example).
> Hence, one goal of this serie is to allow, for a daemon, to monitor netns
> activities, thus it can open or close netlink sockets, allocating structures
> needed to manage these netns when they are created or deleted.
> To help to identify a netns, an index has been added to each netns.
>
> A new setsockopt() option is also added, to help daemons to open socket in the
> right netns. For now, a daemon that want to open a socket in a specified netns,
> need to call setns(CLONE_NEWNET) with a fd (not so easy to found), open the
> socket and then call again setns() to go back in the initial netns. Having this
> kind of setsockopt() will simplify operations. Obviously, this setsockopt()
> should be done enough early (is test on sk_state enough?). The first target is
> netlink socket but it can be useful for other kind of socket, it's why a add a
> generic socket option.
>
> As usual, the patch against iproute2 will be sent once the patches are included
> and net-next merged. I can send it on demand.

Short answer: you don't need to do any of this.

setns with the namespace files in /proc/<pid>/ns/net gives you more than
enough mechanism to solve this problem.  And iproute2 already supports
all of this.

And your approach creates very serious maintenance problems, to the
point I don't even want to read your patches.  What namespace do your
namespace ids live in?

A socket option to change the namespace of a socket is nasty because
sockets changing which network namespace they are in leads to races which
aren't worth thinking about writing the code to handle.

Longer answer.

You can bind mount the namespace's /proc/<pid>/ns/net file to
give it any name you want.  This puts naming policy under userspace
control, and nests just fine.

You can open a socket in any network namespace you want just
by calling setns before socket.  Wrapping this idiom in a library call
or, if there is sufficient need, in a socketat system call seems
reasonable.
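
As a rough sketch of what such a library call could look like (the name
socket_in_netns and its signature are made up for illustration; it assumes
a single-threaded caller with CAP_SYS_ADMIN and a libc that provides
setns()):

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper, not an existing API: open a socket inside the
 * network namespace named by ns_path (for example a bind-mounted
 * /proc/<pid>/ns/net file), then switch back to the original namespace.
 * Returns the socket fd, or -1 with errno set.
 */
int socket_in_netns(const char *ns_path, int domain, int type, int protocol)
{
	int orig_fd, target_fd, sock = -1, saved_errno;

	/* Remember the current namespace so we can come back to it. */
	orig_fd = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);
	if (orig_fd < 0)
		return -1;

	target_fd = open(ns_path, O_RDONLY | O_CLOEXEC);
	if (target_fd < 0) {
		saved_errno = errno;
		close(orig_fd);
		errno = saved_errno;
		return -1;
	}

	if (setns(target_fd, CLONE_NEWNET) == 0) {
		/* A socket stays in the namespace it was created in. */
		sock = socket(domain, type, protocol);
		saved_errno = errno;

		if (setns(orig_fd, CLONE_NEWNET) != 0) {
			/* Could not switch back: treat the whole call as failed. */
			saved_errno = errno;
			if (sock >= 0)
				close(sock);
			sock = -1;
		}
	} else {
		saved_errno = errno;
	}

	close(target_fd);
	close(orig_fd);
	errno = saved_errno;
	return sock;
}
```

A daemon would call this once per namespace it has been told to manage,
passing the path of that namespace's bind-mounted file.  In a threaded
program the "come back" step would need more care, since
/proc/self/ns/net names the namespace of the thread group leader rather
than that of the calling thread.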

There is the classic question of whether two network namespace files
refer to the same network namespace, and I have code in linux-next and in
my pull request to Linus to give those files a unique inode number.

So please use the facilities already merged into the kernel.

Thank you,
Eric

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH net-next 0/5] Ease netns management for userland
  2012-12-12 19:25 ` Eric W. Biederman
@ 2012-12-12 20:54   ` Nicolas Dichtel
  2012-12-12 21:11     ` Eric W. Biederman
  0 siblings, 1 reply; 16+ messages in thread
From: Nicolas Dichtel @ 2012-12-12 20:54 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: netdev, davem, aatteka

Le 12/12/2012 20:25, Eric W. Biederman a écrit :
> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>
>> The goal of this serie is to ease netns management by daemons. Some systems use
>> netns only to virtualize network stack and don't want to multiply userland
>> daemons.  These system may have a lot of netns, up to 2000. We don't want to
>> launch an instance of each daemons (quagga, strongswan, conntrackd, ...) for
>> each netns because it will consume a lot of ressources. Having one daemon that
>> manage all netns is more efficient (mainly if there are few objects to manage:
>> one or two routes per netns for example).
>> Hence, one goal of this serie is to allow, for a daemon, to monitor netns
>> activities, thus it can open or close netlink sockets, allocating structures
>> needed to manage these netns when they are created or deleted.
>> To help to identify a netns, an index has been added to each netns.
>>
>> A new setsockopt() option is also added, to help daemons to open socket in the
>> right netns. For now, a daemon that want to open a socket in a specified netns,
>> need to call setns(CLONE_NEWNET) with a fd (not so easy to found), open the
>> socket and then call again setns() to go back in the initial netns. Having this
>> kind of setsockopt() will simplify operations. Obviously, this setsockopt()
>> should be done enough early (is test on sk_state enough?). The first target is
>> netlink socket but it can be useful for other kind of socket, it's why a add a
>> generic socket option.
>>
>> As usual, the patch against iproute2 will be sent once the patches are included
>> and net-next merged. I can send it on demand.
>
> Short answer: you don't need to do any of this.
>
> setns with the namespace files in /proc/<pid>/ns/net gives you more than
> enough mechanism to solve this problem.  And iproute2 already supports
> all of this.
>
> And your approach creates very serious maintenance problems, to the
> point I don't even want to read your patches.  What namespace do your
> namespace ids live in?
>
> A socket option to change the namespace of a socket is nasty because
> sockets changing which network namespace they are in leads to races which
> aren't worth thinking about writing the code to handle.
>
> Longer answer.
>
> You can bind mount the namespace's /proc/<pid>/ns/net file to
> give it any name you want.  This puts naming policy under userspace
> control, and nests just fine.
>
> You can open a socket in any network namespace you want just
> by calling setns before socket.  Wrapping this idiom in a library call
> or, if there is sufficient need, in a socketat system call seems
> reasonable.
Yes, I agree that this SO_NETNS may be a bad idea.

>
> There is the classic question of whether two network namespace files
> refer to the same network namespace, and I have code in linux-next and in
> my pull request to Linus to give those files a unique inode number.
Interesting to know.

>
> So please use the facilities already merged into the kernel.
Ok, but how can a daemon get the list of netns? Suppose that we want
quagga to manage all netns: how can it get this list to open the needed
netlink sockets?
For example, iproute2 is only aware of netns created with iproute2; it
will not detect other netns.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH net-next 0/5] Ease netns management for userland
  2012-12-12 20:54   ` Nicolas Dichtel
@ 2012-12-12 21:11     ` Eric W. Biederman
  2012-12-12 21:48       ` Eric W. Biederman
  0 siblings, 1 reply; 16+ messages in thread
From: Eric W. Biederman @ 2012-12-12 21:11 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev, davem, aatteka

Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:

> Le 12/12/2012 20:25, Eric W. Biederman a écrit :

>> Short answer: you don't need to do any of this.
>>
>> setns with the namespace files in /proc/<pid>/ns/net gives you more than
>> enough mechanism to solve this problem.  And iproute2 already supports
>> all of this.
>>
>> And your approach creates very serious maintenance problems, to the
>> point I don't even want to read your patches.  What namespace do your
>> namespace ids live in?
>>
>> A socket option to change the namespace of a socket is nasty because
>> sockets changing which network namespace they are in leads to races which
>> aren't worth thinking about writing the code to handle.
>>
>> Longer answer.
>>
>> You can bind mount the namespace's /proc/<pid>/ns/net file to
>> give it any name you want.  This puts naming policy under userspace
>> control, and nests just fine.
>>
>> You can open a socket in any network namespace you want just
>> by calling setns before socket.  Wrapping this idiom in a library call
>> or, if there is sufficient need, in a socketat system call seems
>> reasonable.
> Yes, I agree that this SO_NETNS may be a bad idea.
>
>>
>> There is the classic question of whether two network namespace files
>> refer to the same network namespace, and I have code in linux-next and in
>> my pull request to Linus to give those files a unique inode number.
> Interesting to know.
>
>>
>> So please use the facilities already merged into the kernel.
> Ok, but how can a daemon get the list of netns? Suppose that we want
> quagga to manage all netns: how can it get this list to open the needed
> netlink sockets?
>
> For example, iproute2 is only aware of netns created with iproute2; it
> will not detect other netns.

iproute2 is only aware of network namespaces created with the convention
that iproute2 uses.

If you want other network namespaces to be visible globally, use the same
or a similar convention.  All iproute2 does is
"mount --bind /proc/<pid>/ns/net /var/run/netns/<name>".  So this
convention is not hard to follow.
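
To illustrate, a daemon that follows this convention can discover the
namespaces it is meant to manage by reading that directory.  A minimal
sketch, where the function name list_named_netns is hypothetical and the
directory is passed in rather than hard-coded:

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Print the name of every entry in dir (assumed to be a directory of
 * bind-mounted namespace files such as /var/run/netns) and return how
 * many were found, or -1 if the directory cannot be opened.
 */
int list_named_netns(const char *dir, FILE *out)
{
	struct dirent *entry;
	int count = 0;
	DIR *d = opendir(dir);

	if (!d)
		return -1;

	while ((entry = readdir(d)) != NULL) {
		/* Skip the directory's own "." and ".." entries. */
		if (!strcmp(entry->d_name, ".") || !strcmp(entry->d_name, ".."))
			continue;
		fprintf(out, "%s\n", entry->d_name);
		count++;
	}

	closedir(d);
	return count;
}
```

Only namespaces that somebody chose to publish in that directory show up,
which is the point: the list is a matter of userspace policy.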

It is very wrong to presume that without context you know the reason for
the existence of any network namespace and that you should or even that
you can manage it.  Think of running your multi-network namespace
managing application in a container.

Eric

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH net-next 0/5] Ease netns management for userland
  2012-12-12 21:11     ` Eric W. Biederman
@ 2012-12-12 21:48       ` Eric W. Biederman
  2012-12-13 17:41         ` Nicolas Dichtel
  0 siblings, 1 reply; 16+ messages in thread
From: Eric W. Biederman @ 2012-12-12 21:48 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev, davem, aatteka

ebiederm@xmission.com (Eric W. Biederman) writes:

> It is very wrong to presume that without context you know the reason for
> the existence of any network namespace and that you should or even that
> you can manage it.  Think of running your multi-network namespace
> managing application in a container.

Good examples of network namespaces you don't want to mess with are
the ones created by vsftpd and Chrome for security purposes, to remove
any possibility of creating new connections to the network.

Eric

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH net-next 0/5] Ease netns management for userland
  2012-12-12 21:48       ` Eric W. Biederman
@ 2012-12-13 17:41         ` Nicolas Dichtel
  2012-12-13 19:08           ` Eric W. Biederman
  0 siblings, 1 reply; 16+ messages in thread
From: Nicolas Dichtel @ 2012-12-13 17:41 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: netdev, davem, aatteka

Le 12/12/2012 22:48, Eric W. Biederman a écrit :
> ebiederm@xmission.com (Eric W. Biederman) writes:
>
>> It is very wrong to presume that without context you know the reason for
>> the existence of any network namespace and that you should or even that
>> you can manage it.  Think of running your multi-network namespace
>> managing application in a container.
>
> Good examples of network namespaces you don't want to mess with are
> the ones created by vsftpd and Chrome for security purposes, to remove
> any possibility of creating new connections to the network.
>
Ok, I get the point.

One last question: from an administration point of view, is it intentional
that there is no way to monitor which netns are currently in use, as can be
done for sockets, files, ...?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH net-next 0/5] Ease netns management for userland
  2012-12-13 17:41         ` Nicolas Dichtel
@ 2012-12-13 19:08           ` Eric W. Biederman
  2012-12-14 16:13             ` Nicolas Dichtel
  0 siblings, 1 reply; 16+ messages in thread
From: Eric W. Biederman @ 2012-12-13 19:08 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev, davem, aatteka

Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:

> Le 12/12/2012 22:48, Eric W. Biederman a écrit :
>> ebiederm@xmission.com (Eric W. Biederman) writes:
>>
>>> It is very wrong to presume that without context you know the reason for
>>> the existence of any network namespace and that you should or even that
>>> you can manage it.  Think of running your multi-network namespace
>>> managing application in a container.
>>
>> Good examples of network namespaces you don't want to mess with are
>> the ones created by vsftpd and Chrome for security purposes, to remove
>> any possibility of creating new connections to the network.
>>
> Ok, I get the point.
>
> One last question: from an administration point of view, is it intentional
> that there is no way to monitor which netns are currently in use, as can be
> done for sockets, files, ...?

No.  The difficulty of monitoring which network namespaces are being used
is an unintended side effect.

My pending changes to /proc/<pid>/ns/net and friends, which allow you to
stat those files and check whether two of them refer to the same network
namespace, should make that monitoring much easier.  It isn't perfect, as
there currently isn't a way to take a socket and say which network
namespace that socket is in.  But the current solution should tell you
what is happening most of the time.
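
A sketch of the comparison, assuming a kernel where the namespace files
report a stable, unique inode number (the helper name same_netns is made
up for illustration):

```c
#include <sys/stat.h>

/*
 * Compare two namespace files such as /proc/<pid>/ns/net.
 * Returns 1 if both paths resolve to the same inode on the same device,
 * 0 if they differ, and -1 if either path cannot be stat'ed.
 */
int same_netns(const char *path_a, const char *path_b)
{
	struct stat a, b;

	if (stat(path_a, &a) != 0 || stat(path_b, &b) != 0)
		return -1;

	return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

A monitoring tool could walk /proc/<pid>/ns/net for the processes it is
allowed to see and group them with this check.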

struct net allocates its own slab type, so /proc/slabinfo on a good day
can tell you how many network namespace structures have been allocated
and are in use.
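
For what it is worth, pulling that number out could look roughly like the
sketch below.  The function names are hypothetical, the cache name
"net_namespace" is an assumption on my part, and the line may be missing
entirely if the allocator has merged the cache with another one.  Reading
/proc/slabinfo normally requires root.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one slabinfo data line of the form
 *   <name> <active_objs> <num_objs> ...
 * Returns 1 and stores active_objs if the line describes the named
 * cache, 0 otherwise.
 */
int slab_line_active_objs(const char *line, const char *cache, long *active)
{
	char name[64];
	long value;

	if (sscanf(line, "%63s %ld", name, &value) != 2)
		return 0;
	if (strcmp(name, cache) != 0)
		return 0;

	*active = value;
	return 1;
}

/*
 * Scan a slabinfo-style file for the cache assumed to back struct net.
 * Returns the active object count, or -1 if the file cannot be read or
 * has no matching line.
 */
long netns_slab_active_objs(const char *slabinfo_path)
{
	char line[512];
	long active = -1;
	FILE *fp = fopen(slabinfo_path, "r");

	if (!fp)
		return -1;

	while (fgets(line, sizeof(line), fp)) {
		if (slab_line_active_objs(line, "net_namespace", &active))
			break;
	}

	fclose(fp);
	return active;
}
```

This only gives a count.  It says nothing about which namespaces exist
or who owns them.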

Eric

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH net-next 0/5] Ease netns management for userland
  2012-12-13 19:08           ` Eric W. Biederman
@ 2012-12-14 16:13             ` Nicolas Dichtel
  2012-12-14 16:50               ` Eric W. Biederman
  0 siblings, 1 reply; 16+ messages in thread
From: Nicolas Dichtel @ 2012-12-14 16:13 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: netdev, davem, aatteka

Le 13/12/2012 20:08, Eric W. Biederman a écrit :
> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>
>> Le 12/12/2012 22:48, Eric W. Biederman a écrit :
>>> ebiederm@xmission.com (Eric W. Biederman) writes:
>>>
>>>> It is very wrong to presume that without context you know the reason for
>>>> the existence of any network namespace and that you should or even that
>>>> you can manage it.  Think of running your multi-network namespace
>>>> managing application in a container.
>>>
>>> Good examples of network namespaces you don't want to mess with are
>>> the ones created by vsftpd and Chrome for security purposes, to remove
>>> any possibility of creating new connections to the network.
>>>
>> Ok, I get the point.
>>
>> One last question: from an administration point of view, is it intentional
>> that there is no way to monitor which netns are currently in use, as can be
>> done for sockets, files, ...?
>
> No.  The difficulty of monitoring which network namespaces are being used
> is an unintended side effect.
Why is netlink a bad idea? Having a way to know all existing netns is a
starting point for monitoring netns, isn't it?

>
> My pending changes to /proc/<pid>/ns/net and friends, which allow you to
> stat those files and check whether two of them refer to the same network
> namespace, should make that monitoring much easier.  It isn't perfect, as
> there currently isn't a way to take a socket and say which network
> namespace that socket is in.  But the current solution should tell you
> what is happening most of the time.
Yes, this will give interesting information.

> struct net allocates its own slab type, so /proc/slabinfo on a good day
> can tell you how many network namespace structures have been allocated
> and are in use.
Ok.

Nicolas

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH net-next 0/5] Ease netns management for userland
  2012-12-14 16:13             ` Nicolas Dichtel
@ 2012-12-14 16:50               ` Eric W. Biederman
  2012-12-19  9:47                 ` Nicolas Dichtel
  0 siblings, 1 reply; 16+ messages in thread
From: Eric W. Biederman @ 2012-12-14 16:50 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev, davem, aatteka

Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:

> Le 13/12/2012 20:08, Eric W. Biederman a écrit :

>> No.  The difficulty of monitoring which network namespaces are being used
>> is an unintended side effect.
> Why is netlink a bad idea? Having a way to know all existing netns is a
> starting point for monitoring netns, isn't it?

In the same way that having a neighbour table that contains all existing
IP address to MAC address mappings is a starting point to monitor all
existing hosts.

All does not scale.

All removes a lot of perfectly valid use cases like checkpoint-restart,
and nesting containers.

All, as distinct from what is already implemented, requires implementing
yet another namespace to put the names of all into.  We have enough
namespaces now, thank you very much.

An unfiltered global list is about as interesting to use as putting
all files in /.  Sure you know which directory you put your file in but
which file is it?

What has already been implemented should be roughly as good for
monitoring as what is available with lsof.

And of course there is the fact that a global list of anything that is
the same from every perspective violates the principle of relativity,
and is in contradiction with the physical reality in which we exist.

So there is no way that having a global all inclusive list of network
namespaces makes the least lick of sense and I really don't want to
think about it.

Eric

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH net-next 0/5] Ease netns management for userland
  2012-12-14 16:50               ` Eric W. Biederman
@ 2012-12-19  9:47                 ` Nicolas Dichtel
  0 siblings, 0 replies; 16+ messages in thread
From: Nicolas Dichtel @ 2012-12-19  9:47 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: netdev, davem, aatteka

Le 14/12/2012 17:50, Eric W. Biederman a écrit :
> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>
>> Le 13/12/2012 20:08, Eric W. Biederman a écrit :
>
>>> No.  The difficulty of monitoring which network namespaces are being used
>>> is an unintended side effect.
>> Why is netlink a bad idea? Having a way to know all existing netns is a
>> starting point for monitoring netns, isn't it?
>
> In the same way that having a neighbour table that contains all existing
> IP address to MAC address mappings is a starting point to monitor all
> existing hosts.
>
> All does not scale.
>
> All removes a lot of perfectly valid use cases like checkpoint-restart,
> and nesting containers.
>
> All, as distinct from what is already implemented, requires implementing
> yet another namespace to put the names of all into.  We have enough
> namespaces now, thank you very much.
>
> An unfiltered global list is about as interesting to use as putting
> all files in /.  Sure you know which directory you put your file in but
> which file is it?
>
> What has already been implemented should be roughly as good for
> monitoring as what is available with lsof.
>
> And of course there is the fact that a global list of anything that is
> the same from every perspective violates the principle of relativity,
> and is in contradiction with the physical reality in which we exist.
>
> So there is no way that having a global all inclusive list of network
> namespaces makes the least lick of sense and I really don't want to
> think about it.

Thank you for your explanations and your patience, this is very useful.


Nicolas

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2012-12-19  9:53 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-12-12 17:17 [RFC PATCH net-next 0/5] Ease netns management for userland Nicolas Dichtel
2012-12-12 17:17 ` [RFC PATCH net-next 1/5] netns: allocate an unique id to identify a netns Nicolas Dichtel
2012-12-12 17:17 ` [RFC PATCH net-next 2/5] netns: allow to dump netns with netlink Nicolas Dichtel
2012-12-12 17:17 ` [RFC PATCH net-next 3/5] dev/netns: allow to get netns from nsindex in rtnl msg Nicolas Dichtel
2012-12-12 17:17 ` [RFC PATCH net-next 4/5] netns: advertise netns activity with netlink Nicolas Dichtel
2012-12-12 17:17 ` [RFC PATCH net-next 5/5] net/sock: add support of SO_NETNS Nicolas Dichtel
2012-12-12 18:39 ` [RFC PATCH net-next 0/5] Ease netns management for userland Nicolas Dichtel
2012-12-12 19:25 ` Eric W. Biederman
2012-12-12 20:54   ` Nicolas Dichtel
2012-12-12 21:11     ` Eric W. Biederman
2012-12-12 21:48       ` Eric W. Biederman
2012-12-13 17:41         ` Nicolas Dichtel
2012-12-13 19:08           ` Eric W. Biederman
2012-12-14 16:13             ` Nicolas Dichtel
2012-12-14 16:50               ` Eric W. Biederman
2012-12-19  9:47                 ` Nicolas Dichtel
