* [PATCH net-next 0/6] netns: ease netlink use with a lot of netns
@ 2015-05-06  9:58 Nicolas Dichtel
  2015-05-06  9:58 ` [PATCH net-next 1/6] netns: returns always an id in __peernet2id() Nicolas Dichtel
                   ` (6 more replies)
  0 siblings, 7 replies; 52+ messages in thread
From: Nicolas Dichtel @ 2015-05-06  9:58 UTC (permalink / raw)
  To: netdev; +Cc: davem, ebiederm


This idea was informally discussed in Ottawa / netdev0.1. The goal is to
ease the use/scalability of netns, from a userland point of view.
Today, users need to open one netlink socket per family and per netns.
Thus, when the number of netns increases (for example 5K or more), the
number of sockets needed to manage them grows a lot.

The goal of this series is to be able to monitor netlink events, for a
specified family, for a set of netns, with only one netlink socket. For
this purpose, a netlink socket option is added: NETLINK_LISTEN_ALL_NSID.
When this option is set on a netlink socket, this socket will receive
netlink notifications from all netns that have a nsid assigned in the
netns where the socket has been opened.
The nsid is sent to userland via ancillary data.
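
For illustration only, a daemon using the option might look roughly like
the sketch below. The helper names enable_listen_all_nsid() and
nsid_from_cmsg() are hypothetical (they are not part of this series or of
iproute2). The fallback defines assume that SOL_NETLINK is 270 and that the
option keeps the value 8 proposed in patch 6/6, and the control message is
assumed to carry a single int, as put_cmsg() does in that patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif
#ifndef NETLINK_LISTEN_ALL_NSID
#define NETLINK_LISTEN_ALL_NSID 8	/* assumption: value proposed in patch 6/6 */
#endif

/* Hypothetical helper: ask for notifications from every netns that has a
 * nsid assigned in the socket's netns. Returns the setsockopt() result.
 */
int enable_listen_all_nsid(int fd)
{
	int one = 1;

	return setsockopt(fd, SOL_NETLINK, NETLINK_LISTEN_ALL_NSID,
			  &one, sizeof(one));
}

/* Hypothetical helper: walk the ancillary data of a received message and
 * copy out the nsid. Returns 0 if a nsid was found, -1 otherwise.
 */
int nsid_from_cmsg(struct msghdr *msg, int *nsid)
{
	struct cmsghdr *cmsg;

	assert(msg && nsid);

	for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
		if (cmsg->cmsg_level == SOL_NETLINK &&
		    cmsg->cmsg_type == NETLINK_LISTEN_ALL_NSID &&
		    cmsg->cmsg_len >= CMSG_LEN(sizeof(int))) {
			memcpy(nsid, CMSG_DATA(cmsg), sizeof(int));
			return 0;
		}
	}
	return -1;
}
```

A caller would set the option once on the socket, pass a control buffer of
at least CMSG_SPACE(sizeof(int)) bytes to recvmsg(), and call
nsid_from_cmsg() on each received message.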

Here is an example with a patched iproute2. vxlan10 is created in the
current netns (netns0, nsid 0) and then moved to another netns (netns1,
nsid 1):

$ ip netns exec netns0 ip monitor all-nsid label
[nsid 0][NSID]nsid 1 (iproute2 netns name: netns1)
[nsid 0][NEIGH]??? lladdr 00:00:00:00:00:00 REACHABLE,PERMANENT
[nsid 0][LINK]5: vxlan10@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default 
    link/ether 92:33:17:e6:e7:1d brd ff:ff:ff:ff:ff:ff
[nsid 0][LINK]Deleted 5: vxlan10@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default 
    link/ether 92:33:17:e6:e7:1d brd ff:ff:ff:ff:ff:ff
[nsid 1][NSID]nsid 0 (iproute2 netns name: netns0)
[nsid 1][LINK]5: vxlan10@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default 
    link/ether 92:33:17:e6:e7:1d brd ff:ff:ff:ff:ff:ff link-netnsid 0
[nsid 1][ADDR]5: vxlan10    inet 192.168.0.249/24 brd 192.168.0.255 scope global vxlan10
       valid_lft forever preferred_lft forever
[nsid 1][ROUTE]local 192.168.0.249 dev vxlan10  table local  proto kernel  scope host  src 192.168.0.249 
[nsid 1][ROUTE]ff00::/8 dev vxlan10  table local  metric 256  pref medium
[nsid 1][ROUTE]2001:123::/64 dev vxlan10  proto kernel  metric 256  pref medium
[nsid 1][LINK]5: vxlan10@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default 
    link/ether 92:33:17:e6:e7:1d brd ff:ff:ff:ff:ff:ff link-netnsid 0
[nsid 1][ROUTE]broadcast 192.168.0.255 dev vxlan10  table local  proto kernel  scope link  src 192.168.0.249 
[nsid 1][ROUTE]192.168.0.0/24 dev vxlan10  proto kernel  scope link  src 192.168.0.249 
[nsid 1][ROUTE]broadcast 192.168.0.0 dev vxlan10  table local  proto kernel  scope link  src 192.168.0.249 
[nsid 1][ROUTE]fe80::/64 dev vxlan10  proto kernel  metric 256  pref medium


 drivers/net/vxlan.c          |   2 +-
 include/linux/netlink.h      |   1 +
 include/net/net_namespace.h  |   2 +
 include/uapi/linux/netlink.h |   1 +
 net/core/net_namespace.c     | 127 +++++++++++++++++++++++++++----------------
 net/core/rtnetlink.c         |   2 +-
 net/netlink/af_netlink.c     |  39 ++++++++++++-
 7 files changed, 124 insertions(+), 50 deletions(-)


Comments are welcome.

Regards,
Nicolas

* [PATCH net-next 1/6] netns: returns always an id in __peernet2id()
  2015-05-06  9:58 [PATCH net-next 0/6] netns: ease netlink use with a lot of netns Nicolas Dichtel
@ 2015-05-06  9:58 ` Nicolas Dichtel
  2015-05-06 11:19   ` Thomas Graf
  2015-05-06  9:58 ` [PATCH net-next 2/6] netns: always provide the id to rtnl_net_fill() Nicolas Dichtel
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 52+ messages in thread
From: Nicolas Dichtel @ 2015-05-06  9:58 UTC (permalink / raw)
  To: netdev; +Cc: davem, ebiederm, Nicolas Dichtel

All callers of this function expect a nsid, not an error.
Thus, return NETNSA_NSID_NOT_ASSIGNED in case of error.
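
As a small illustration of the contract, the mapping applied to the return
value can be written as the helper below. The name nsid_or_not_assigned() is
made up for this sketch, and the fallback define assumes that
NETNSA_NSID_NOT_ASSIGNED is -1, as in the uapi header.

```c
#include <assert.h>

#ifndef NETNSA_NSID_NOT_ASSIGNED
#define NETNSA_NSID_NOT_ASSIGNED (-1)	/* assumption: same value as the uapi header */
#endif

/* Hypothetical helper: any negative value (for example -ENOENT or -ENOSPC)
 * collapses to NETNSA_NSID_NOT_ASSIGNED; valid ids pass through unchanged.
 */
int nsid_or_not_assigned(int ret)
{
	return ret >= 0 ? ret : NETNSA_NSID_NOT_ASSIGNED;
}
```

With this contract, callers can put the result directly into a NETNSA_NSID
attribute without checking for an errno value first.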

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 net/core/net_namespace.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 78fc04ad36fc..294d38742e2a 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -192,10 +192,12 @@ static int __peernet2id(struct net *net, struct net *peer, bool alloc)
 	if (id > 0)
 		return id;
 
-	if (alloc)
-		return alloc_netid(net, peer, -1);
+	if (alloc) {
+		id = alloc_netid(net, peer, -1);
+		return id >= 0 ? id : NETNSA_NSID_NOT_ASSIGNED;
+	}
 
-	return -ENOENT;
+	return NETNSA_NSID_NOT_ASSIGNED;
 }
 
 /* This function returns the id of a peer netns. If no id is assigned, one will
@@ -204,10 +206,8 @@ static int __peernet2id(struct net *net, struct net *peer, bool alloc)
 int peernet2id(struct net *net, struct net *peer)
 {
 	bool alloc = atomic_read(&peer->count) == 0 ? false : true;
-	int id;
 
-	id = __peernet2id(net, peer, alloc);
-	return id >= 0 ? id : NETNSA_NSID_NOT_ASSIGNED;
+	return __peernet2id(net, peer, alloc);
 }
 EXPORT_SYMBOL(peernet2id);
 
@@ -554,13 +554,10 @@ static int rtnl_net_fill(struct sk_buff *skb, u32 portid, u32 seq, int flags,
 	rth = nlmsg_data(nlh);
 	rth->rtgen_family = AF_UNSPEC;
 
-	if (nsid >= 0) {
+	if (nsid >= 0)
 		id = nsid;
-	} else {
+	else
 		id = __peernet2id(net, peer, false);
-		if  (id < 0)
-			id = NETNSA_NSID_NOT_ASSIGNED;
-	}
 	if (nla_put_s32(skb, NETNSA_NSID, id))
 		goto nla_put_failure;
 
-- 
2.2.2

* [PATCH net-next 2/6] netns: always provide the id to rtnl_net_fill()
  2015-05-06  9:58 [PATCH net-next 0/6] netns: ease netlink use with a lot of netns Nicolas Dichtel
  2015-05-06  9:58 ` [PATCH net-next 1/6] netns: returns always an id in __peernet2id() Nicolas Dichtel
@ 2015-05-06  9:58 ` Nicolas Dichtel
  2015-05-06 11:25   ` Thomas Graf
  2015-05-06  9:58 ` [PATCH net-next 3/6] netns: rename peernet2id() to peernet2id_alloc() Nicolas Dichtel
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 52+ messages in thread
From: Nicolas Dichtel @ 2015-05-06  9:58 UTC (permalink / raw)
  To: netdev; +Cc: davem, ebiederm, Nicolas Dichtel

The goal of this commit is to prepare the rework of the locking that
protects nsids.
After this patch, rtnl_net_notifyid() will no longer call __peernet2id(),
i.e. there is no idr_* operation in this function.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 net/core/net_namespace.c | 31 +++++++++++--------------------
 1 file changed, 11 insertions(+), 20 deletions(-)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 294d38742e2a..37c68bb72db3 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -147,8 +147,7 @@ static void ops_free_list(const struct pernet_operations *ops,
 	}
 }
 
-static void rtnl_net_notifyid(struct net *net, struct net *peer, int cmd,
-			      int id);
+static void rtnl_net_notifyid(struct net *net, int cmd, int id);
 static int alloc_netid(struct net *net, struct net *peer, int reqid)
 {
 	int min = 0, max = 0, id;
@@ -162,7 +161,7 @@ static int alloc_netid(struct net *net, struct net *peer, int reqid)
 
 	id = idr_alloc(&net->netns_ids, peer, min, max, GFP_KERNEL);
 	if (id >= 0)
-		rtnl_net_notifyid(net, peer, RTM_NEWNSID, id);
+		rtnl_net_notifyid(net, RTM_NEWNSID, id);
 
 	return id;
 }
@@ -365,7 +364,7 @@ static void cleanup_net(struct work_struct *work)
 			int id = __peernet2id(tmp, net, false);
 
 			if (id >= 0) {
-				rtnl_net_notifyid(tmp, net, RTM_DELNSID, id);
+				rtnl_net_notifyid(tmp, RTM_DELNSID, id);
 				idr_remove(&tmp->netns_ids, id);
 			}
 		}
@@ -538,14 +537,10 @@ static int rtnl_net_get_size(void)
 }
 
 static int rtnl_net_fill(struct sk_buff *skb, u32 portid, u32 seq, int flags,
-			 int cmd, struct net *net, struct net *peer,
-			 int nsid)
+			 int cmd, struct net *net, int nsid)
 {
 	struct nlmsghdr *nlh;
 	struct rtgenmsg *rth;
-	int id;
-
-	ASSERT_RTNL();
 
 	nlh = nlmsg_put(skb, portid, seq, cmd, sizeof(*rth), flags);
 	if (!nlh)
@@ -554,11 +549,7 @@ static int rtnl_net_fill(struct sk_buff *skb, u32 portid, u32 seq, int flags,
 	rth = nlmsg_data(nlh);
 	rth->rtgen_family = AF_UNSPEC;
 
-	if (nsid >= 0)
-		id = nsid;
-	else
-		id = __peernet2id(net, peer, false);
-	if (nla_put_s32(skb, NETNSA_NSID, id))
+	if (nla_put_s32(skb, NETNSA_NSID, nsid))
 		goto nla_put_failure;
 
 	nlmsg_end(skb, nlh);
@@ -575,7 +566,7 @@ static int rtnl_net_getid(struct sk_buff *skb, struct nlmsghdr *nlh)
 	struct nlattr *tb[NETNSA_MAX + 1];
 	struct sk_buff *msg;
 	struct net *peer;
-	int err;
+	int err, id;
 
 	err = nlmsg_parse(nlh, sizeof(struct rtgenmsg), tb, NETNSA_MAX,
 			  rtnl_net_policy);
@@ -597,8 +588,9 @@ static int rtnl_net_getid(struct sk_buff *skb, struct nlmsghdr *nlh)
 		goto out;
 	}
 
+	id = __peernet2id(net, peer, false);
 	err = rtnl_net_fill(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq, 0,
-			    RTM_GETNSID, net, peer, -1);
+			    RTM_GETNSID, net, id);
 	if (err < 0)
 		goto err_out;
 
@@ -630,7 +622,7 @@ static int rtnl_net_dumpid_one(int id, void *peer, void *data)
 
 	ret = rtnl_net_fill(net_cb->skb, NETLINK_CB(net_cb->cb->skb).portid,
 			    net_cb->cb->nlh->nlmsg_seq, NLM_F_MULTI,
-			    RTM_NEWNSID, net_cb->net, peer, id);
+			    RTM_NEWNSID, net_cb->net, id);
 	if (ret < 0)
 		return ret;
 
@@ -658,8 +650,7 @@ static int rtnl_net_dumpid(struct sk_buff *skb, struct netlink_callback *cb)
 	return skb->len;
 }
 
-static void rtnl_net_notifyid(struct net *net, struct net *peer, int cmd,
-			      int id)
+static void rtnl_net_notifyid(struct net *net, int cmd, int id)
 {
 	struct sk_buff *msg;
 	int err = -ENOMEM;
@@ -668,7 +659,7 @@ static void rtnl_net_notifyid(struct net *net, struct net *peer, int cmd,
 	if (!msg)
 		goto out;
 
-	err = rtnl_net_fill(msg, 0, 0, 0, cmd, net, peer, id);
+	err = rtnl_net_fill(msg, 0, 0, 0, cmd, net, id);
 	if (err < 0)
 		goto err_out;
 
-- 
2.2.2

* [PATCH net-next 3/6] netns: rename peernet2id() to peernet2id_alloc()
  2015-05-06  9:58 [PATCH net-next 0/6] netns: ease netlink use with a lot of netns Nicolas Dichtel
  2015-05-06  9:58 ` [PATCH net-next 1/6] netns: returns always an id in __peernet2id() Nicolas Dichtel
  2015-05-06  9:58 ` [PATCH net-next 2/6] netns: always provide the id to rtnl_net_fill() Nicolas Dichtel
@ 2015-05-06  9:58 ` Nicolas Dichtel
  2015-05-06 11:27   ` Thomas Graf
  2015-05-06  9:58 ` [PATCH net-next 4/6] netns: notify new nsid outside __peernet2id() Nicolas Dichtel
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 52+ messages in thread
From: Nicolas Dichtel @ 2015-05-06  9:58 UTC (permalink / raw)
  To: netdev; +Cc: davem, ebiederm, Nicolas Dichtel

In a following commit, a new function will be introduced that only looks up
a nsid (no allocation if the nsid doesn't exist). To avoid confusion, the
existing function is renamed.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 drivers/net/vxlan.c         | 2 +-
 include/net/net_namespace.h | 2 +-
 net/core/net_namespace.c    | 4 ++--
 net/core/rtnetlink.c        | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 3517ab0aa803..48341ae49012 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -336,7 +336,7 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
 
 	if (!net_eq(dev_net(vxlan->dev), vxlan->net) &&
 	    nla_put_s32(skb, NDA_LINK_NETNSID,
-			peernet2id(dev_net(vxlan->dev), vxlan->net)))
+			peernet2id_alloc(dev_net(vxlan->dev), vxlan->net)))
 		goto nla_put_failure;
 
 	if (send_eth && nla_put(skb, NDA_LLADDR, ETH_ALEN, &fdb->eth_addr))
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index f733656404de..6d1e2eae32fb 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -271,7 +271,7 @@ static inline struct net *read_pnet(const possible_net_t *pnet)
 #define __net_initconst	__initconst
 #endif
 
-int peernet2id(struct net *net, struct net *peer);
+int peernet2id_alloc(struct net *net, struct net *peer);
 struct net *get_net_ns_by_id(struct net *net, int id);
 
 struct pernet_operations {
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 37c68bb72db3..9c806ac569f9 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -202,13 +202,13 @@ static int __peernet2id(struct net *net, struct net *peer, bool alloc)
 /* This function returns the id of a peer netns. If no id is assigned, one will
  * be allocated and returned.
  */
-int peernet2id(struct net *net, struct net *peer)
+int peernet2id_alloc(struct net *net, struct net *peer)
 {
 	bool alloc = atomic_read(&peer->count) == 0 ? false : true;
 
 	return __peernet2id(net, peer, alloc);
 }
-EXPORT_SYMBOL(peernet2id);
+EXPORT_SYMBOL(peernet2id_alloc);
 
 struct net *get_net_ns_by_id(struct net *net, int id)
 {
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 666e0928ba40..83e08323fdcd 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1204,7 +1204,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 		struct net *link_net = dev->rtnl_link_ops->get_link_net(dev);
 
 		if (!net_eq(dev_net(dev), link_net)) {
-			int id = peernet2id(dev_net(dev), link_net);
+			int id = peernet2id_alloc(dev_net(dev), link_net);
 
 			if (nla_put_s32(skb, IFLA_LINK_NETNSID, id))
 				goto nla_put_failure;
-- 
2.2.2

* [PATCH net-next 4/6] netns: notify new nsid outside __peernet2id()
  2015-05-06  9:58 [PATCH net-next 0/6] netns: ease netlink use with a lot of netns Nicolas Dichtel
                   ` (2 preceding siblings ...)
  2015-05-06  9:58 ` [PATCH net-next 3/6] netns: rename peernet2id() to peernet2id_alloc() Nicolas Dichtel
@ 2015-05-06  9:58 ` Nicolas Dichtel
  2015-05-06 11:48   ` Thomas Graf
  2015-05-06  9:58 ` [PATCH net-next 5/6] netns: use a spin_lock to protect nsid management Nicolas Dichtel
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 52+ messages in thread
From: Nicolas Dichtel @ 2015-05-06  9:58 UTC (permalink / raw)
  To: netdev; +Cc: davem, ebiederm, Nicolas Dichtel

There is no functional change with this patch. It will ease the refactoring
of the locking system that protects nsids and the support of the netlink
socket option NETLINK_LISTEN_ALL_NSID.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 net/core/net_namespace.c | 37 +++++++++++++++++++++++--------------
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 9c806ac569f9..cc4b84c944be 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -147,10 +147,9 @@ static void ops_free_list(const struct pernet_operations *ops,
 	}
 }
 
-static void rtnl_net_notifyid(struct net *net, int cmd, int id);
 static int alloc_netid(struct net *net, struct net *peer, int reqid)
 {
-	int min = 0, max = 0, id;
+	int min = 0, max = 0;
 
 	ASSERT_RTNL();
 
@@ -159,11 +158,7 @@ static int alloc_netid(struct net *net, struct net *peer, int reqid)
 		max = reqid + 1;
 	}
 
-	id = idr_alloc(&net->netns_ids, peer, min, max, GFP_KERNEL);
-	if (id >= 0)
-		rtnl_net_notifyid(net, RTM_NEWNSID, id);
-
-	return id;
+	return idr_alloc(&net->netns_ids, peer, min, max, GFP_KERNEL);
 }
 
 /* This function is used by idr_for_each(). If net is equal to peer, the
@@ -179,34 +174,43 @@ static int net_eq_idr(int id, void *net, void *peer)
 	return 0;
 }
 
-static int __peernet2id(struct net *net, struct net *peer, bool alloc)
+static int __peernet2id(struct net *net, struct net *peer, bool *alloc)
 {
 	int id = idr_for_each(&net->netns_ids, net_eq_idr, peer);
+	bool alloc_it = *alloc;
 
 	ASSERT_RTNL();
 
+	*alloc = false;
+
 	/* Magic value for id 0. */
 	if (id == NET_ID_ZERO)
 		return 0;
 	if (id > 0)
 		return id;
 
-	if (alloc) {
+	if (alloc_it) {
 		id = alloc_netid(net, peer, -1);
+		*alloc = true;
 		return id >= 0 ? id : NETNSA_NSID_NOT_ASSIGNED;
 	}
 
 	return NETNSA_NSID_NOT_ASSIGNED;
 }
 
+static void rtnl_net_notifyid(struct net *net, int cmd, int id);
 /* This function returns the id of a peer netns. If no id is assigned, one will
  * be allocated and returned.
  */
 int peernet2id_alloc(struct net *net, struct net *peer)
 {
 	bool alloc = atomic_read(&peer->count) == 0 ? false : true;
+	int id;
 
-	return __peernet2id(net, peer, alloc);
+	id = __peernet2id(net, peer, &alloc);
+	if (alloc && id >= 0)
+		rtnl_net_notifyid(net, RTM_NEWNSID, id);
+	return id;
 }
 EXPORT_SYMBOL(peernet2id_alloc);
 
@@ -361,7 +365,8 @@ static void cleanup_net(struct work_struct *work)
 		list_del_rcu(&net->list);
 		list_add_tail(&net->exit_list, &net_exit_list);
 		for_each_net(tmp) {
-			int id = __peernet2id(tmp, net, false);
+			bool no = false;
+			int id = __peernet2id(tmp, net, &no);
 
 			if (id >= 0) {
 				rtnl_net_notifyid(tmp, RTM_DELNSID, id);
@@ -496,6 +501,7 @@ static int rtnl_net_newid(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct nlattr *tb[NETNSA_MAX + 1];
+	bool no = false;
 	struct net *peer;
 	int nsid, err;
 
@@ -516,14 +522,16 @@ static int rtnl_net_newid(struct sk_buff *skb, struct nlmsghdr *nlh)
 	if (IS_ERR(peer))
 		return PTR_ERR(peer);
 
-	if (__peernet2id(net, peer, false) >= 0) {
+	if (__peernet2id(net, peer, &no) >= 0) {
 		err = -EEXIST;
 		goto out;
 	}
 
 	err = alloc_netid(net, peer, nsid);
-	if (err > 0)
+	if (err >= 0) {
+		rtnl_net_notifyid(net, RTM_NEWNSID, err);
 		err = 0;
+	}
 out:
 	put_net(peer);
 	return err;
@@ -566,6 +574,7 @@ static int rtnl_net_getid(struct sk_buff *skb, struct nlmsghdr *nlh)
 	struct nlattr *tb[NETNSA_MAX + 1];
 	struct sk_buff *msg;
 	struct net *peer;
+	bool no = false;
 	int err, id;
 
 	err = nlmsg_parse(nlh, sizeof(struct rtgenmsg), tb, NETNSA_MAX,
@@ -588,7 +597,7 @@ static int rtnl_net_getid(struct sk_buff *skb, struct nlmsghdr *nlh)
 		goto out;
 	}
 
-	id = __peernet2id(net, peer, false);
+	id = __peernet2id(net, peer, &no);
 	err = rtnl_net_fill(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq, 0,
 			    RTM_GETNSID, net, id);
 	if (err < 0)
-- 
2.2.2

* [PATCH net-next 5/6] netns: use a spin_lock to protect nsid management
  2015-05-06  9:58 [PATCH net-next 0/6] netns: ease netlink use with a lot of netns Nicolas Dichtel
                   ` (3 preceding siblings ...)
  2015-05-06  9:58 ` [PATCH net-next 4/6] netns: notify new nsid outside __peernet2id() Nicolas Dichtel
@ 2015-05-06  9:58 ` Nicolas Dichtel
  2015-05-06 12:23   ` Thomas Graf
  2015-05-06  9:58 ` [PATCH net-next 6/6] netlink: allow to listen "all" netns Nicolas Dichtel
  2015-05-07  9:02 ` [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns Nicolas Dichtel
  6 siblings, 1 reply; 52+ messages in thread
From: Nicolas Dichtel @ 2015-05-06  9:58 UTC (permalink / raw)
  To: netdev; +Cc: davem, ebiederm, Nicolas Dichtel

Before this patch, nsids were protected by the rtnl lock. The goal of this
patch is to be able to find a nsid without needing to hold the rtnl lock.

The next patch will introduce a netlink socket option to listen to all
netns that have a nsid assigned in the netns where the socket is opened.
Thus, it's important to call rtnl_net_notifyid() outside the spinlock, to
avoid a recursive lock (nsids are notified via rtnl). This was the main
reason for the previous patch.
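
The ordering described above (look up or allocate under the lock, notify
only after dropping it) can be modelled in userland as follows. This is a
simplified sketch, not the kernel code: a C11 atomic_bool stands in for
nsid_lock, a small array stands in for the idr, and the names
lookup_or_alloc(), notify_new_id() and peer_id_alloc() are made up for the
example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PEERS		8
#define NSID_NOT_ASSIGNED	(-1)

static atomic_bool nsid_lock;		/* stand-in for the kernel spinlock */
static const void *peers[MAX_PEERS];	/* stand-in for net->netns_ids */
static int notified;			/* number of notifications sent */

static void nsid_lock_acquire(void)
{
	while (atomic_exchange(&nsid_lock, true))
		;	/* spin until the previous holder releases the lock */
}

static void nsid_lock_release(void)
{
	atomic_store(&nsid_lock, false);
}

/* Must be called with nsid_lock held. On entry *alloc says whether an id may
 * be allocated; on return it says whether one was actually allocated.
 */
static int lookup_or_alloc(const void *peer, bool *alloc)
{
	bool alloc_it = *alloc;
	int i;

	*alloc = false;
	for (i = 0; i < MAX_PEERS; i++)
		if (peers[i] == peer)
			return i;
	if (!alloc_it)
		return NSID_NOT_ASSIGNED;
	for (i = 0; i < MAX_PEERS; i++) {
		if (!peers[i]) {
			peers[i] = peer;
			*alloc = true;
			return i;
		}
	}
	return NSID_NOT_ASSIGNED;
}

/* The notification path may need the lock itself, so it must run unlocked.
 * The assert is a single-threaded sanity check only.
 */
static void notify_new_id(int id)
{
	(void)id;
	assert(!atomic_load(&nsid_lock));
	notified++;
}

int peer_id_alloc(const void *peer)
{
	bool alloc = true;
	int id;

	assert(peer != NULL);	/* NULL marks a free slot in this model */

	nsid_lock_acquire();
	id = lookup_or_alloc(peer, &alloc);
	nsid_lock_release();
	if (alloc && id >= 0)
		notify_new_id(id);	/* only after the lock is dropped */
	return id;
}
```

The bool passed by pointer plays the same role as in the previous patch: it
tells the caller, once the lock is released, whether a notification is due.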

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 net/core/net_namespace.c | 58 ++++++++++++++++++++++++++++++++++++------------
 1 file changed, 44 insertions(+), 14 deletions(-)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index cc4b84c944be..0b4cb3d63449 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -28,6 +28,7 @@
 static LIST_HEAD(pernet_list);
 static struct list_head *first_device = &pernet_list;
 DEFINE_MUTEX(net_mutex);
+static DEFINE_SPINLOCK(nsid_lock);
 
 LIST_HEAD(net_namespace_list);
 EXPORT_SYMBOL_GPL(net_namespace_list);
@@ -147,18 +148,17 @@ static void ops_free_list(const struct pernet_operations *ops,
 	}
 }
 
+/* should be called with nsid_lock held */
 static int alloc_netid(struct net *net, struct net *peer, int reqid)
 {
 	int min = 0, max = 0;
 
-	ASSERT_RTNL();
-
 	if (reqid >= 0) {
 		min = reqid;
 		max = reqid + 1;
 	}
 
-	return idr_alloc(&net->netns_ids, peer, min, max, GFP_KERNEL);
+	return idr_alloc(&net->netns_ids, peer, min, max, GFP_ATOMIC);
 }
 
 /* This function is used by idr_for_each(). If net is equal to peer, the
@@ -174,13 +174,15 @@ static int net_eq_idr(int id, void *net, void *peer)
 	return 0;
 }
 
+/* Should be called with nsid_lock held. If a new id is assigned, the bool alloc
+ * is set to true, thus the caller knows that the new id must be notified via
+ * rtnl.
+ */
 static int __peernet2id(struct net *net, struct net *peer, bool *alloc)
 {
 	int id = idr_for_each(&net->netns_ids, net_eq_idr, peer);
 	bool alloc_it = *alloc;
 
-	ASSERT_RTNL();
-
 	*alloc = false;
 
 	/* Magic value for id 0. */
@@ -204,27 +206,47 @@ static void rtnl_net_notifyid(struct net *net, int cmd, int id);
  */
 int peernet2id_alloc(struct net *net, struct net *peer)
 {
-	bool alloc = atomic_read(&peer->count) == 0 ? false : true;
+	unsigned long flags;
+	bool alloc;
 	int id;
 
+	spin_lock_irqsave(&nsid_lock, flags);
+	alloc = atomic_read(&peer->count) == 0 ? false : true;
 	id = __peernet2id(net, peer, &alloc);
+	spin_unlock_irqrestore(&nsid_lock, flags);
 	if (alloc && id >= 0)
 		rtnl_net_notifyid(net, RTM_NEWNSID, id);
 	return id;
 }
 EXPORT_SYMBOL(peernet2id_alloc);
 
+/* This function returns, if assigned, the id of a peer netns. */
+static int peernet2id(struct net *net, struct net *peer)
+{
+	unsigned long flags;
+	bool no = false;
+	int id;
+
+	spin_lock_irqsave(&nsid_lock, flags);
+	id = __peernet2id(net, peer, &no);
+	spin_unlock_irqrestore(&nsid_lock, flags);
+	return id;
+}
+
 struct net *get_net_ns_by_id(struct net *net, int id)
 {
+	unsigned long flags;
 	struct net *peer;
 
 	if (id < 0)
 		return NULL;
 
 	rcu_read_lock();
+	spin_lock_irqsave(&nsid_lock, flags);
 	peer = idr_find(&net->netns_ids, id);
 	if (peer)
 		get_net(peer);
+	spin_unlock_irqrestore(&nsid_lock, flags);
 	rcu_read_unlock();
 
 	return peer;
@@ -366,14 +388,19 @@ static void cleanup_net(struct work_struct *work)
 		list_add_tail(&net->exit_list, &net_exit_list);
 		for_each_net(tmp) {
 			bool no = false;
-			int id = __peernet2id(tmp, net, &no);
+			int id;
 
-			if (id >= 0) {
-				rtnl_net_notifyid(tmp, RTM_DELNSID, id);
+			spin_lock_irq(&nsid_lock);
+			id = __peernet2id(tmp, net, &no);
+			if (id >= 0)
 				idr_remove(&tmp->netns_ids, id);
-			}
+			spin_unlock_irq(&nsid_lock);
+			if (id >= 0)
+				rtnl_net_notifyid(tmp, RTM_DELNSID, id);
 		}
+		spin_lock_irq(&nsid_lock);
 		idr_destroy(&net->netns_ids);
+		spin_unlock_irq(&nsid_lock);
 
 	}
 	rtnl_unlock();
@@ -501,6 +528,7 @@ static int rtnl_net_newid(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct nlattr *tb[NETNSA_MAX + 1];
+	unsigned long flags;
 	bool no = false;
 	struct net *peer;
 	int nsid, err;
@@ -522,12 +550,14 @@ static int rtnl_net_newid(struct sk_buff *skb, struct nlmsghdr *nlh)
 	if (IS_ERR(peer))
 		return PTR_ERR(peer);
 
+	spin_lock_irqsave(&nsid_lock, flags);
 	if (__peernet2id(net, peer, &no) >= 0) {
 		err = -EEXIST;
 		goto out;
 	}
 
 	err = alloc_netid(net, peer, nsid);
+	spin_unlock_irqrestore(&nsid_lock, flags);
 	if (err >= 0) {
 		rtnl_net_notifyid(net, RTM_NEWNSID, err);
 		err = 0;
@@ -574,7 +604,6 @@ static int rtnl_net_getid(struct sk_buff *skb, struct nlmsghdr *nlh)
 	struct nlattr *tb[NETNSA_MAX + 1];
 	struct sk_buff *msg;
 	struct net *peer;
-	bool no = false;
 	int err, id;
 
 	err = nlmsg_parse(nlh, sizeof(struct rtgenmsg), tb, NETNSA_MAX,
@@ -597,7 +626,7 @@ static int rtnl_net_getid(struct sk_buff *skb, struct nlmsghdr *nlh)
 		goto out;
 	}
 
-	id = __peernet2id(net, peer, &no);
+	id = peernet2id(net, peer);
 	err = rtnl_net_fill(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq, 0,
 			    RTM_GETNSID, net, id);
 	if (err < 0)
@@ -650,10 +679,11 @@ static int rtnl_net_dumpid(struct sk_buff *skb, struct netlink_callback *cb)
 		.idx = 0,
 		.s_idx = cb->args[0],
 	};
+	unsigned long flags;
 
-	ASSERT_RTNL();
-
+	spin_lock_irqsave(&nsid_lock, flags);
 	idr_for_each(&net->netns_ids, rtnl_net_dumpid_one, &net_cb);
+	spin_unlock_irqrestore(&nsid_lock, flags);
 
 	cb->args[0] = net_cb.idx;
 	return skb->len;
-- 
2.2.2

* [PATCH net-next 6/6] netlink: allow to listen "all" netns
  2015-05-06  9:58 [PATCH net-next 0/6] netns: ease netlink use with a lot of netns Nicolas Dichtel
                   ` (4 preceding siblings ...)
  2015-05-06  9:58 ` [PATCH net-next 5/6] netns: use a spin_lock to protect nsid management Nicolas Dichtel
@ 2015-05-06  9:58 ` Nicolas Dichtel
  2015-05-06 12:10   ` Thomas Graf
  2015-05-07  9:02 ` [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns Nicolas Dichtel
  6 siblings, 1 reply; 52+ messages in thread
From: Nicolas Dichtel @ 2015-05-06  9:58 UTC (permalink / raw)
  To: netdev; +Cc: davem, ebiederm, Nicolas Dichtel

More accurately, listen to all netns that have a nsid assigned in the netns
where the netlink socket is opened.
For this purpose, a netlink socket option is added:
NETLINK_LISTEN_ALL_NSID. When this option is set on a netlink socket, this
socket will receive netlink notifications from all netns that have a nsid
assigned in the netns where the socket has been opened. The nsid is sent
to userland via ancillary data.

With this patch, a daemon needs only one socket to listen to many netns.
This is useful when the number of netns is high.
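
To make the delivery rule easier to follow, the check added to
do_one_broadcast() in the diff below can be summarised by the predicate
sketched here. The function name should_deliver() and its boolean
parameters are invented for this illustration; in the kernel the inputs
come from net_eq(), the NETLINK_LISTEN_ALL flag, peernet_has_id() and
file_ns_capable() respectively.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-socket filter: a broadcast from the socket's
 * own netns is always eligible; one from another netns is eligible only if
 * the socket opted in, the sender has a nsid in the socket's netns, and the
 * socket's opener has CAP_NET_BROADCAST in the sender's user namespace.
 */
bool should_deliver(bool same_netns, bool listen_all,
		    bool peer_has_id, bool capable)
{
	if (same_netns)
		return true;
	if (!listen_all)
		return false;
	if (!peer_has_id)
		return false;
	return capable;
}
```

The remaining checks in do_one_broadcast() (group membership, overrun
handling) still apply after this filter.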

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/linux/netlink.h      |  1 +
 include/net/net_namespace.h  |  2 ++
 include/uapi/linux/netlink.h |  1 +
 net/core/net_namespace.c     | 10 +++++++++-
 net/netlink/af_netlink.c     | 39 +++++++++++++++++++++++++++++++++++++--
 5 files changed, 50 insertions(+), 3 deletions(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index 6835c1279df7..2e34392ddfb7 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -28,6 +28,7 @@ struct netlink_skb_parms {
 	__u32			dst_group;
 	__u32			flags;
 	struct sock		*sk;
+	struct net		*net;
 };
 
 #define NETLINK_CB(skb)		(*(struct netlink_skb_parms*)&((skb)->cb))
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 6d1e2eae32fb..3f850acc844e 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -272,6 +272,8 @@ static inline struct net *read_pnet(const possible_net_t *pnet)
 #endif
 
 int peernet2id_alloc(struct net *net, struct net *peer);
+int peernet2id(struct net *net, struct net *peer);
+bool peernet_has_id(struct net *net, struct net *peer);
 struct net *get_net_ns_by_id(struct net *net, int id);
 
 struct pernet_operations {
diff --git a/include/uapi/linux/netlink.h b/include/uapi/linux/netlink.h
index 1a85940f8ab7..3e34b7d702f8 100644
--- a/include/uapi/linux/netlink.h
+++ b/include/uapi/linux/netlink.h
@@ -108,6 +108,7 @@ struct nlmsgerr {
 #define NETLINK_NO_ENOBUFS	5
 #define NETLINK_RX_RING		6
 #define NETLINK_TX_RING		7
+#define NETLINK_LISTEN_ALL_NSID	8
 
 struct nl_pktinfo {
 	__u32	group;
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 0b4cb3d63449..2af3cc4ecf2d 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -221,7 +221,7 @@ int peernet2id_alloc(struct net *net, struct net *peer)
 EXPORT_SYMBOL(peernet2id_alloc);
 
 /* This function returns, if assigned, the id of a peer netns. */
-static int peernet2id(struct net *net, struct net *peer)
+int peernet2id(struct net *net, struct net *peer)
 {
 	unsigned long flags;
 	bool no = false;
@@ -233,6 +233,14 @@ static int peernet2id(struct net *net, struct net *peer)
 	return id;
 }
 
+/* This function returns true if the peer netns has an id assigned into the
+ * current netns.
+ */
+bool peernet_has_id(struct net *net, struct net *peer)
+{
+	return peernet2id(net, peer) >= 0;
+}
+
 struct net *get_net_ns_by_id(struct net *net, int id)
 {
 	unsigned long flags;
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index ec4adbdcb9b4..bdbde542e952 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -83,6 +83,7 @@ struct listeners {
 #define NETLINK_RECV_PKTINFO	0x2
 #define NETLINK_BROADCAST_SEND_ERROR	0x4
 #define NETLINK_RECV_NO_ENOBUFS	0x8
+#define NETLINK_LISTEN_ALL	0x10
 
 static inline int netlink_is_kernel(struct sock *sk)
 {
@@ -1931,8 +1932,18 @@ static void do_one_broadcast(struct sock *sk,
 	    !test_bit(p->group - 1, nlk->groups))
 		return;
 
-	if (!net_eq(sock_net(sk), p->net))
-		return;
+	if (!net_eq(sock_net(sk), p->net)) {
+		if (!(nlk->flags & NETLINK_LISTEN_ALL))
+			return;
+
+		if (!peernet_has_id(sock_net(sk), p->net))
+			return;
+
+		if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
+				     CAP_NET_BROADCAST))
+			return;
+	}
+	NETLINK_CB(p->skb).net = p->net;
 
 	if (p->failure) {
 		netlink_overrun(sk);
@@ -2201,6 +2212,16 @@ static int netlink_setsockopt(struct socket *sock, int level, int optname,
 		break;
 	}
 #endif /* CONFIG_NETLINK_MMAP */
+	case NETLINK_LISTEN_ALL_NSID:
+		if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_BROADCAST))
+			return -EPERM;
+
+		if (val)
+			nlk->flags |= NETLINK_LISTEN_ALL;
+		else
+			nlk->flags &= ~NETLINK_LISTEN_ALL;
+		err = 0;
+		break;
 	default:
 		err = -ENOPROTOOPT;
 	}
@@ -2267,6 +2288,18 @@ static void netlink_cmsg_recv_pktinfo(struct msghdr *msg, struct sk_buff *skb)
 	put_cmsg(msg, SOL_NETLINK, NETLINK_PKTINFO, sizeof(info), &info);
 }
 
+static void netlink_cmsg_listen_all_nsid(struct sock *sk, struct msghdr *msg,
+					 struct sk_buff *skb)
+{
+	int nsid;
+
+	if (!NETLINK_CB(skb).net)
+		return;
+
+	nsid = peernet2id(sock_net(sk), NETLINK_CB(skb).net);
+	put_cmsg(msg, SOL_NETLINK, NETLINK_LISTEN_ALL_NSID, sizeof(int), &nsid);
+}
+
 static int netlink_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
 {
 	struct sock *sk = sock->sk;
@@ -2420,6 +2453,8 @@ static int netlink_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 
 	if (nlk->flags & NETLINK_RECV_PKTINFO)
 		netlink_cmsg_recv_pktinfo(msg, skb);
+	if (nlk->flags & NETLINK_LISTEN_ALL)
+		netlink_cmsg_listen_all_nsid(sk, msg, skb);
 
 	memset(&scm, 0, sizeof(scm));
 	scm.creds = *NETLINK_CREDS(skb);
-- 
2.2.2

* Re: [PATCH net-next 1/6] netns: returns always an id in __peernet2id()
  2015-05-06  9:58 ` [PATCH net-next 1/6] netns: returns always an id in __peernet2id() Nicolas Dichtel
@ 2015-05-06 11:19   ` Thomas Graf
  0 siblings, 0 replies; 52+ messages in thread
From: Thomas Graf @ 2015-05-06 11:19 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: netdev, davem, ebiederm

On 05/06/15 at 11:58am, Nicolas Dichtel wrote:
> All callers of this function expect a nsid, not an error.
> Thus, return NETNSA_NSID_NOT_ASSIGNED in case of error.
> 
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>

I thought it was a bugfix based on the commit message. It's
a nice cleanup though.

Acked-by: Thomas Graf <tgraf@suug.ch>

* Re: [PATCH net-next 2/6] netns: always provide the id to rtnl_net_fill()
  2015-05-06  9:58 ` [PATCH net-next 2/6] netns: always provide the id to rtnl_net_fill() Nicolas Dichtel
@ 2015-05-06 11:25   ` Thomas Graf
  0 siblings, 0 replies; 52+ messages in thread
From: Thomas Graf @ 2015-05-06 11:25 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: netdev, davem, ebiederm

On 05/06/15 at 11:58am, Nicolas Dichtel wrote:
> The goal of this commit is to prepare the rework of the locking that
> protects nsids.
> After this patch, rtnl_net_notifyid() no longer calls __peernet2id(), i.e.
> there are no idr_* operations in this function.
> 
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>

Acked-by: Thomas Graf <tgraf@suug.ch>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net-next 3/6] netns: rename peernet2id() to peernet2id_alloc()
  2015-05-06  9:58 ` [PATCH net-next 3/6] netns: rename peernet2id() to peernet2id_alloc() Nicolas Dichtel
@ 2015-05-06 11:27   ` Thomas Graf
  0 siblings, 0 replies; 52+ messages in thread
From: Thomas Graf @ 2015-05-06 11:27 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: netdev, davem, ebiederm

On 05/06/15 at 11:58am, Nicolas Dichtel wrote:
> In a following commit, a new function will be introduced that only looks up
> a nsid (no allocation if the nsid doesn't exist). To avoid confusion, the
> existing function is renamed.
> 
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>

Acked-by: Thomas Graf <tgraf@suug.ch>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net-next 4/6] netns: notify new nsid outside __peernet2id()
  2015-05-06  9:58 ` [PATCH net-next 4/6] netns: notify new nsid outside __peernet2id() Nicolas Dichtel
@ 2015-05-06 11:48   ` Thomas Graf
  2015-05-06 13:39     ` Nicolas Dichtel
  0 siblings, 1 reply; 52+ messages in thread
From: Thomas Graf @ 2015-05-06 11:48 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: netdev, davem, ebiederm

On 05/06/15 at 11:58am, Nicolas Dichtel wrote:
> -static int __peernet2id(struct net *net, struct net *peer, bool alloc)
> +static int __peernet2id(struct net *net, struct net *peer, bool *alloc)
>  {
>  	int id = idr_for_each(&net->netns_ids, net_eq_idr, peer);
> +	bool alloc_it = *alloc;
>  
>  	ASSERT_RTNL();
>  
> +	*alloc = false;
> +
>  	/* Magic value for id 0. */
>  	if (id == NET_ID_ZERO)
>  		return 0;
>  	if (id > 0)
>  		return id;
>  
> -	if (alloc) {
> +	if (alloc_it) {
>  		id = alloc_netid(net, peer, -1);
> +		*alloc = true;
>  		return id >= 0 ? id : NETNSA_NSID_NOT_ASSIGNED;
>  	}
>  
>  	return NETNSA_NSID_NOT_ASSIGNED;
>  }

Since you need the allocation behaviour from one call site only it
might be cleaner to add a __peernet2id_alloc() which is used from
the old __peernet2id() so you can call __peernet2id_alloc() from
peernet2id_alloc() and avoid putting these ugly bool alloc = false
in all other call sites.

Otherwise this looks good.
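
The split suggested above can be sketched in userspace as follows. This is
only a minimal model under stated assumptions, not the kernel code: a
fixed-size array stands in for the idr, and struct ns_table,
lookup_or_alloc_id(), lookup_id() and NSID_NOT_ASSIGNED are hypothetical
names (the last one stands in for NETNSA_NSID_NOT_ASSIGNED).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NSID_NOT_ASSIGNED (-1)	/* stand-in for NETNSA_NSID_NOT_ASSIGNED */
#define MAX_IDS 8		/* arbitrary table size for this sketch */

/* Hypothetical replacement for the idr: the slot index is the id. */
struct ns_table {
	const void *peers[MAX_IDS];	/* NULL means the id is free */
};

/* Worker: look up peer and allocate an id only if *alloc is true on entry.
 * On return, *alloc tells the caller whether a new id was assigned.
 */
static int lookup_or_alloc_id(struct ns_table *t, const void *peer, bool *alloc)
{
	bool alloc_it = *alloc;
	int i;

	assert(peer != NULL);
	*alloc = false;

	for (i = 0; i < MAX_IDS; i++)
		if (t->peers[i] == peer)
			return i;

	if (!alloc_it)
		return NSID_NOT_ASSIGNED;

	for (i = 0; i < MAX_IDS; i++) {
		if (!t->peers[i]) {
			t->peers[i] = peer;
			*alloc = true;
			return i;
		}
	}

	return NSID_NOT_ASSIGNED;	/* table full */
}

/* Lookup-only wrapper: the bool lives here, not in every call site. */
static int lookup_id(struct ns_table *t, const void *peer)
{
	bool no = false;

	return lookup_or_alloc_id(t, peer, &no);
}
```

With this shape, only the allocating entry point has to deal with the bool;
every other caller uses the lookup-only wrapper. Unlike the quoted patch,
the sketch sets the bool only when the allocation succeeds.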

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net-next 6/6] netlink: allow to listen "all" netns
  2015-05-06  9:58 ` [PATCH net-next 6/6] netlink: allow to listen "all" netns Nicolas Dichtel
@ 2015-05-06 12:10   ` Thomas Graf
  2015-05-06 13:42     ` Nicolas Dichtel
  0 siblings, 1 reply; 52+ messages in thread
From: Thomas Graf @ 2015-05-06 12:10 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: netdev, davem, ebiederm

On 05/06/15 at 11:58am, Nicolas Dichtel wrote:
> More accurately, listen to all netns that have a nsid assigned in the netns
> where the netlink socket is opened.
> For this purpose, a netlink socket option is added:
> NETLINK_LISTEN_ALL_NSID. When this option is set on a netlink socket, this
> socket will receive netlink notifications from all netns that have a nsid
> assigned in the netns where the socket has been opened. The nsid is sent
> to userland via ancillary data.
> 
> With this patch, a daemon needs only one socket to listen to many netns. This
> is useful when the number of netns is high.
> 
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>

[...]

> +/* This function returns true if the peer netns has an id assigned into the
> + * current netns.
> + */
> +bool peernet_has_id(struct net *net, struct net *peer)
> +{
> +	return peernet2id(net, peer) >= 0;
> +}

Missing export?

> +
>  struct net *get_net_ns_by_id(struct net *net, int id)
>  {
>  	unsigned long flags;
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index ec4adbdcb9b4..bdbde542e952 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -83,6 +83,7 @@ struct listeners {
>  #define NETLINK_RECV_PKTINFO	0x2
>  #define NETLINK_BROADCAST_SEND_ERROR	0x4
>  #define NETLINK_RECV_NO_ENOBUFS	0x8
> +#define NETLINK_LISTEN_ALL	0x10

Maybe name this NETLINK_LISTEN_ALL_NSID just to make it clear?

> +		if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
> +				     CAP_NET_BROADCAST))
> +			return;
> +	}
> +	NETLINK_CB(p->skb).net = p->net;

Does this need a get_net()? The netns could disappear while the skb is
queued, right?

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net-next 5/6] netns: use a spin_lock to protect nsid management
  2015-05-06  9:58 ` [PATCH net-next 5/6] netns: use a spin_lock to protect nsid management Nicolas Dichtel
@ 2015-05-06 12:23   ` Thomas Graf
  2015-05-06 13:40     ` Nicolas Dichtel
  0 siblings, 1 reply; 52+ messages in thread
From: Thomas Graf @ 2015-05-06 12:23 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: netdev, davem, ebiederm

On 05/06/15 at 11:58am, Nicolas Dichtel wrote:
> +/* Should be called with nsid_lock held. If a new id is assigned, the bool alloc
> + * is set to true, thus the caller knows that the new id must be notified via
> + * rtnl.
> + */
>  static int __peernet2id(struct net *net, struct net *peer, bool *alloc)
>  {
>  	int id = idr_for_each(&net->netns_ids, net_eq_idr, peer);
>  	bool alloc_it = *alloc;
>  
> -	ASSERT_RTNL();
> -
>  	*alloc = false;
>  
>  	/* Magic value for id 0. */

If split into __peernet2id() and __peernet2id_alloc() then this could
live with RCU protection I guess so we only take the lock when we
actually allocate.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net-next 4/6] netns: notify new nsid outside __peernet2id()
  2015-05-06 11:48   ` Thomas Graf
@ 2015-05-06 13:39     ` Nicolas Dichtel
  0 siblings, 0 replies; 52+ messages in thread
From: Nicolas Dichtel @ 2015-05-06 13:39 UTC (permalink / raw)
  To: Thomas Graf; +Cc: netdev, davem, ebiederm

On 06/05/2015 13:48, Thomas Graf wrote:
[snip]
>
> Since you need the allocation behaviour from one call site only it
> might be cleaner to add a __peernet2id_alloc() which is used from
> the old __peernet2id() so you can call __peernet2id_alloc() from
> peernet2id_alloc() and avoid putting these ugly bool alloc = false
> in all other call sites.
You're absolutely right.

Side note:

bool false = false;
...
__peernet2id_alloc(net, peer, &false);

This compiles but the behavior is unexpected. In __peernet2id_alloc() the bool
is true. I need to think a bit more to explain that.
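
A possible explanation, assuming the kernel headers define false as an
enumeration constant rather than a preprocessor macro: in C, the scope of an
identifier begins right after its declarator, so in "bool false = false;" the
initializer names the new, still uninitialized local variable and not the
constant 0. The sketch below shows the same scoping rule without reading an
uninitialized value (sizeof does not evaluate its operand); flag and the
function names are hypothetical.

```c
#include <assert.h>

enum { flag = 0 };	/* hypothetical stand-in for an enum-defined `false` */

static int initializer_sees_new_variable(void)
{
	/* `flag` inside the initializer is the new char variable, not the
	 * enum constant, so sizeof(flag) is sizeof(char), i.e. 1.
	 */
	char flag = (char)sizeof(flag);

	return flag;
}

static void check_scope_rule(void)
{
	/* No local declaration here: `flag` is the enum constant (an int). */
	assert(sizeof(flag) == sizeof(int));
	assert(initializer_sees_new_variable() == 1);
}
```

If that is indeed what happens, the pointer passed to __peernet2id_alloc()
refers to an indeterminate value, which would explain why the callee can see
true. Using another variable name avoids the problem.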

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net-next 5/6] netns: use a spin_lock to protect nsid management
  2015-05-06 12:23   ` Thomas Graf
@ 2015-05-06 13:40     ` Nicolas Dichtel
  2015-05-06 14:05       ` Thomas Graf
  0 siblings, 1 reply; 52+ messages in thread
From: Nicolas Dichtel @ 2015-05-06 13:40 UTC (permalink / raw)
  To: Thomas Graf; +Cc: netdev, davem, ebiederm

On 06/05/2015 14:23, Thomas Graf wrote:
> On 05/06/15 at 11:58am, Nicolas Dichtel wrote:
>> +/* Should be called with nsid_lock held. If a new id is assigned, the bool alloc
>> + * is set to true, thus the caller knows that the new id must be notified via
>> + * rtnl.
>> + */
>>   static int __peernet2id(struct net *net, struct net *peer, bool *alloc)
>>   {
>>   	int id = idr_for_each(&net->netns_ids, net_eq_idr, peer);
>>   	bool alloc_it = *alloc;
>>
>> -	ASSERT_RTNL();
>> -
>>   	*alloc = false;
>>
>>   	/* Magic value for id 0. */
>
> If split into __peernet2id() and __peernet2id_alloc() then this could
> live with RCU protection I guess so we only take the lock when we
> actually allocate.
>
The description of idr_for_each says:
"The caller must serialize idr_for_each() vs idr_get_new() and idr_remove()."

So, if I understand correctly, the lock is always needed.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net-next 6/6] netlink: allow to listen "all" netns
  2015-05-06 12:10   ` Thomas Graf
@ 2015-05-06 13:42     ` Nicolas Dichtel
  0 siblings, 0 replies; 52+ messages in thread
From: Nicolas Dichtel @ 2015-05-06 13:42 UTC (permalink / raw)
  To: Thomas Graf; +Cc: netdev, davem, ebiederm

On 06/05/2015 14:10, Thomas Graf wrote:
> On 05/06/15 at 11:58am, Nicolas Dichtel wrote:
[snip]
>> +/* This function returns true if the peer netns has an id assigned into the
>> + * current netns.
>> + */
>> +bool peernet_has_id(struct net *net, struct net *peer)
>> +{
>> +	return peernet2id(net, peer) >= 0;
>> +}
>
> Missing export?
Only used by net/netlink/af_netlink.c, which cannot be compiled as a module.

>
>> +
>>   struct net *get_net_ns_by_id(struct net *net, int id)
>>   {
>>   	unsigned long flags;
>> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
>> index ec4adbdcb9b4..bdbde542e952 100644
>> --- a/net/netlink/af_netlink.c
>> +++ b/net/netlink/af_netlink.c
>> @@ -83,6 +83,7 @@ struct listeners {
>>   #define NETLINK_RECV_PKTINFO	0x2
>>   #define NETLINK_BROADCAST_SEND_ERROR	0x4
>>   #define NETLINK_RECV_NO_ENOBUFS	0x8
>> +#define NETLINK_LISTEN_ALL	0x10
>
> Maybe name this NETLINK_LISTEN_ALL_NSID just to make it clear?
Yes ... but it's also the name of the socket option (see
include/uapi/linux/netlink.h).
I can introduce a patch before this one to rename all these private flags from
NETLINK_FOO to NETLINK_F_FOO so that they will never overlap with netlink
socket options.

>
>> +		if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
>> +				     CAP_NET_BROADCAST))
>> +			return;
>> +	}
>> +	NETLINK_CB(p->skb).net = p->net;
>
> Does this need a get_net()? The netns could disappear while the skb is
> queued, right?
>
You're right.

Thank you for your review.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net-next 5/6] netns: use a spin_lock to protect nsid management
  2015-05-06 13:40     ` Nicolas Dichtel
@ 2015-05-06 14:05       ` Thomas Graf
  0 siblings, 0 replies; 52+ messages in thread
From: Thomas Graf @ 2015-05-06 14:05 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: netdev, davem, ebiederm

On 05/06/15 at 03:40pm, Nicolas Dichtel wrote:
> On 06/05/2015 14:23, Thomas Graf wrote:
> >On 05/06/15 at 11:58am, Nicolas Dichtel wrote:
> >>+/* Should be called with nsid_lock held. If a new id is assigned, the bool alloc
> >>+ * is set to true, thus the caller knows that the new id must be notified via
> >>+ * rtnl.
> >>+ */
> >>  static int __peernet2id(struct net *net, struct net *peer, bool *alloc)
> >>  {
> >>  	int id = idr_for_each(&net->netns_ids, net_eq_idr, peer);
> >>  	bool alloc_it = *alloc;
> >>
> >>-	ASSERT_RTNL();
> >>-
> >>  	*alloc = false;
> >>
> >>  	/* Magic value for id 0. */
> >
> >If split into __peernet2id() and __peernet2id_alloc() then this could
> >live with RCU protection I guess so we only take the lock when we
> >actually allocate.
> >
> The description of idr_for_each says:
> "The caller must serialize idr_for_each() vs idr_get_new() and idr_remove()."
> 
> So, if I understand correctly, the lock is always needed.

Ah, I looked at idr_alloc which says:

 * The user is responsible for exclusively synchronizing all operations
 * which may modify @idr.  However, read-only accesses such as idr_find()
 * or iteration can be performed under RCU read lock provided the user
 * destroys @ptr in RCU-safe way after removal from idr.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns
  2015-05-06  9:58 [PATCH net-next 0/6] netns: ease netlink use with a lot of netns Nicolas Dichtel
                   ` (5 preceding siblings ...)
  2015-05-06  9:58 ` [PATCH net-next 6/6] netlink: allow to listen "all" netns Nicolas Dichtel
@ 2015-05-07  9:02 ` Nicolas Dichtel
  2015-05-07  9:02   ` [PATCH net-next v2 1/7] netns: returns always an id in __peernet2id() Nicolas Dichtel
                     ` (8 more replies)
  6 siblings, 9 replies; 52+ messages in thread
From: Nicolas Dichtel @ 2015-05-07  9:02 UTC (permalink / raw)
  To: netdev; +Cc: tgraf, davem, ebiederm


This idea was informally discussed in Ottawa / netdev0.1. The goal is to
ease the use/scalability of netns, from a userland point of view.
Today, users need to open one netlink socket per family and per netns.
Thus, when the number of netns increases (for example 5K or more), the
number of sockets needed to manage them grows a lot.

The goal of this series is to be able to monitor netlink events, for a
specified family, for a set of netns, with only one netlink socket. For
this purpose, a netlink socket option is added: NETLINK_LISTEN_ALL_NSID.
When this option is set on a netlink socket, this socket will receive
netlink notifications from all netns that have a nsid assigned in the
netns where the socket has been opened.
The nsid is sent to userland via ancillary data.
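
On the userland side, the option and the ancillary data described above could
be handled roughly as follows. This is a minimal sketch, not the iproute2
code: enable_all_nsid(), nsid_from_cmsg() and NSID_UNKNOWN are hypothetical
names, and the fallback values for SOL_NETLINK and NETLINK_LISTEN_ALL_NSID
are assumptions for systems whose headers do not define them.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270			/* assumed fallback value */
#endif
#ifndef NETLINK_LISTEN_ALL_NSID
#define NETLINK_LISTEN_ALL_NSID 8	/* assumed fallback value */
#endif

#define NSID_UNKNOWN (-1)	/* hypothetical: no nsid attached */

/* Ask the kernel to also deliver notifications from peer netns. */
static int enable_all_nsid(int fd)
{
	int on = 1;

	assert(fd >= 0);
	return setsockopt(fd, SOL_NETLINK, NETLINK_LISTEN_ALL_NSID,
			  &on, sizeof(on));
}

/* Walk the control messages filled in by recvmsg() and return the nsid
 * of the netns the notification came from, or NSID_UNKNOWN if none.
 */
static int nsid_from_cmsg(struct msghdr *msg)
{
	struct cmsghdr *cmsg;
	int nsid;

	for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
		if (cmsg->cmsg_level == SOL_NETLINK &&
		    cmsg->cmsg_type == NETLINK_LISTEN_ALL_NSID &&
		    cmsg->cmsg_len >= CMSG_LEN(sizeof(int))) {
			memcpy(&nsid, CMSG_DATA(cmsg), sizeof(nsid));
			return nsid;
		}
	}

	return NSID_UNKNOWN;
}
```

A monitor would call enable_all_nsid() once after binding the socket, pass a
control buffer to recvmsg(), and then call nsid_from_cmsg() on each received
message to label it.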

Here is an example with a patched iproute2. vxlan10 is created in the
current netns (netns0, nsid 0) and then moved to another netns (netns1,
nsid 1):

$ ip netns exec netns0 ip monitor all-nsid label
[nsid 0][NSID]nsid 1 (iproute2 netns name: netns1)
[nsid 0][NEIGH]??? lladdr 00:00:00:00:00:00 REACHABLE,PERMANENT
[nsid 0][LINK]5: vxlan10@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default 
    link/ether 92:33:17:e6:e7:1d brd ff:ff:ff:ff:ff:ff
[nsid 0][LINK]Deleted 5: vxlan10@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default 
    link/ether 92:33:17:e6:e7:1d brd ff:ff:ff:ff:ff:ff
[nsid 1][NSID]nsid 0 (iproute2 netns name: netns0)
[nsid 1][LINK]5: vxlan10@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default 
    link/ether 92:33:17:e6:e7:1d brd ff:ff:ff:ff:ff:ff link-netnsid 0
[nsid 1][ADDR]5: vxlan10    inet 192.168.0.249/24 brd 192.168.0.255 scope global vxlan10
       valid_lft forever preferred_lft forever
[nsid 1][ROUTE]local 192.168.0.249 dev vxlan10  table local  proto kernel  scope host  src 192.168.0.249 
[nsid 1][ROUTE]ff00::/8 dev vxlan10  table local  metric 256  pref medium
[nsid 1][ROUTE]2001:123::/64 dev vxlan10  proto kernel  metric 256  pref medium
[nsid 1][LINK]5: vxlan10@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default 
    link/ether 92:33:17:e6:e7:1d brd ff:ff:ff:ff:ff:ff link-netnsid 0
[nsid 1][ROUTE]broadcast 192.168.0.255 dev vxlan10  table local  proto kernel  scope link  src 192.168.0.249 
[nsid 1][ROUTE]192.168.0.0/24 dev vxlan10  proto kernel  scope link  src 192.168.0.249 
[nsid 1][ROUTE]broadcast 192.168.0.0 dev vxlan10  table local  proto kernel  scope link  src 192.168.0.249 
[nsid 1][ROUTE]fe80::/64 dev vxlan10  proto kernel  metric 256  pref medium


 drivers/net/vxlan.c          |   2 +-
 include/linux/netlink.h      |   2 +
 include/net/net_namespace.h  |   2 +
 include/uapi/linux/netlink.h |   1 +
 net/core/net_namespace.c     | 132 ++++++++++++++++++++++++++++---------------
 net/core/rtnetlink.c         |   2 +-
 net/netlink/af_netlink.c     | 111 +++++++++++++++++++++++++-----------
 7 files changed, 170 insertions(+), 82 deletions(-)


Comments are welcome.

Regards,
Nicolas

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH net-next v2 1/7] netns: returns always an id in __peernet2id()
  2015-05-07  9:02 ` [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns Nicolas Dichtel
@ 2015-05-07  9:02   ` Nicolas Dichtel
  2015-05-07  9:02   ` [PATCH net-next v2 2/7] netns: always provide the id to rtnl_net_fill() Nicolas Dichtel
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 52+ messages in thread
From: Nicolas Dichtel @ 2015-05-07  9:02 UTC (permalink / raw)
  To: netdev; +Cc: tgraf, davem, ebiederm, Nicolas Dichtel

All callers of this function expect a nsid, not an error.
Thus, return NETNSA_NSID_NOT_ASSIGNED in case of error so that callers
don't have to convert the error to NETNSA_NSID_NOT_ASSIGNED.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
---

v2: reword the commit log

 net/core/net_namespace.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 78fc04ad36fc..294d38742e2a 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -192,10 +192,12 @@ static int __peernet2id(struct net *net, struct net *peer, bool alloc)
 	if (id > 0)
 		return id;
 
-	if (alloc)
-		return alloc_netid(net, peer, -1);
+	if (alloc) {
+		id = alloc_netid(net, peer, -1);
+		return id >= 0 ? id : NETNSA_NSID_NOT_ASSIGNED;
+	}
 
-	return -ENOENT;
+	return NETNSA_NSID_NOT_ASSIGNED;
 }
 
 /* This function returns the id of a peer netns. If no id is assigned, one will
@@ -204,10 +206,8 @@ static int __peernet2id(struct net *net, struct net *peer, bool alloc)
 int peernet2id(struct net *net, struct net *peer)
 {
 	bool alloc = atomic_read(&peer->count) == 0 ? false : true;
-	int id;
 
-	id = __peernet2id(net, peer, alloc);
-	return id >= 0 ? id : NETNSA_NSID_NOT_ASSIGNED;
+	return __peernet2id(net, peer, alloc);
 }
 EXPORT_SYMBOL(peernet2id);
 
@@ -554,13 +554,10 @@ static int rtnl_net_fill(struct sk_buff *skb, u32 portid, u32 seq, int flags,
 	rth = nlmsg_data(nlh);
 	rth->rtgen_family = AF_UNSPEC;
 
-	if (nsid >= 0) {
+	if (nsid >= 0)
 		id = nsid;
-	} else {
+	else
 		id = __peernet2id(net, peer, false);
-		if  (id < 0)
-			id = NETNSA_NSID_NOT_ASSIGNED;
-	}
 	if (nla_put_s32(skb, NETNSA_NSID, id))
 		goto nla_put_failure;
 
-- 
2.2.2

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH net-next v2 2/7] netns: always provide the id to rtnl_net_fill()
  2015-05-07  9:02 ` [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns Nicolas Dichtel
  2015-05-07  9:02   ` [PATCH net-next v2 1/7] netns: returns always an id in __peernet2id() Nicolas Dichtel
@ 2015-05-07  9:02   ` Nicolas Dichtel
  2015-05-07  9:02   ` [PATCH net-next v2 3/7] netns: rename peernet2id() to peernet2id_alloc() Nicolas Dichtel
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 52+ messages in thread
From: Nicolas Dichtel @ 2015-05-07  9:02 UTC (permalink / raw)
  To: netdev; +Cc: tgraf, davem, ebiederm, Nicolas Dichtel

The goal of this commit is to prepare the rework of the locking that
protects nsids.
After this patch, rtnl_net_notifyid() no longer calls __peernet2id(), i.e.
there are no idr_* operations in this function.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
---

v2: no change

 net/core/net_namespace.c | 31 +++++++++++--------------------
 1 file changed, 11 insertions(+), 20 deletions(-)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 294d38742e2a..37c68bb72db3 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -147,8 +147,7 @@ static void ops_free_list(const struct pernet_operations *ops,
 	}
 }
 
-static void rtnl_net_notifyid(struct net *net, struct net *peer, int cmd,
-			      int id);
+static void rtnl_net_notifyid(struct net *net, int cmd, int id);
 static int alloc_netid(struct net *net, struct net *peer, int reqid)
 {
 	int min = 0, max = 0, id;
@@ -162,7 +161,7 @@ static int alloc_netid(struct net *net, struct net *peer, int reqid)
 
 	id = idr_alloc(&net->netns_ids, peer, min, max, GFP_KERNEL);
 	if (id >= 0)
-		rtnl_net_notifyid(net, peer, RTM_NEWNSID, id);
+		rtnl_net_notifyid(net, RTM_NEWNSID, id);
 
 	return id;
 }
@@ -365,7 +364,7 @@ static void cleanup_net(struct work_struct *work)
 			int id = __peernet2id(tmp, net, false);
 
 			if (id >= 0) {
-				rtnl_net_notifyid(tmp, net, RTM_DELNSID, id);
+				rtnl_net_notifyid(tmp, RTM_DELNSID, id);
 				idr_remove(&tmp->netns_ids, id);
 			}
 		}
@@ -538,14 +537,10 @@ static int rtnl_net_get_size(void)
 }
 
 static int rtnl_net_fill(struct sk_buff *skb, u32 portid, u32 seq, int flags,
-			 int cmd, struct net *net, struct net *peer,
-			 int nsid)
+			 int cmd, struct net *net, int nsid)
 {
 	struct nlmsghdr *nlh;
 	struct rtgenmsg *rth;
-	int id;
-
-	ASSERT_RTNL();
 
 	nlh = nlmsg_put(skb, portid, seq, cmd, sizeof(*rth), flags);
 	if (!nlh)
@@ -554,11 +549,7 @@ static int rtnl_net_fill(struct sk_buff *skb, u32 portid, u32 seq, int flags,
 	rth = nlmsg_data(nlh);
 	rth->rtgen_family = AF_UNSPEC;
 
-	if (nsid >= 0)
-		id = nsid;
-	else
-		id = __peernet2id(net, peer, false);
-	if (nla_put_s32(skb, NETNSA_NSID, id))
+	if (nla_put_s32(skb, NETNSA_NSID, nsid))
 		goto nla_put_failure;
 
 	nlmsg_end(skb, nlh);
@@ -575,7 +566,7 @@ static int rtnl_net_getid(struct sk_buff *skb, struct nlmsghdr *nlh)
 	struct nlattr *tb[NETNSA_MAX + 1];
 	struct sk_buff *msg;
 	struct net *peer;
-	int err;
+	int err, id;
 
 	err = nlmsg_parse(nlh, sizeof(struct rtgenmsg), tb, NETNSA_MAX,
 			  rtnl_net_policy);
@@ -597,8 +588,9 @@ static int rtnl_net_getid(struct sk_buff *skb, struct nlmsghdr *nlh)
 		goto out;
 	}
 
+	id = __peernet2id(net, peer, false);
 	err = rtnl_net_fill(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq, 0,
-			    RTM_GETNSID, net, peer, -1);
+			    RTM_GETNSID, net, id);
 	if (err < 0)
 		goto err_out;
 
@@ -630,7 +622,7 @@ static int rtnl_net_dumpid_one(int id, void *peer, void *data)
 
 	ret = rtnl_net_fill(net_cb->skb, NETLINK_CB(net_cb->cb->skb).portid,
 			    net_cb->cb->nlh->nlmsg_seq, NLM_F_MULTI,
-			    RTM_NEWNSID, net_cb->net, peer, id);
+			    RTM_NEWNSID, net_cb->net, id);
 	if (ret < 0)
 		return ret;
 
@@ -658,8 +650,7 @@ static int rtnl_net_dumpid(struct sk_buff *skb, struct netlink_callback *cb)
 	return skb->len;
 }
 
-static void rtnl_net_notifyid(struct net *net, struct net *peer, int cmd,
-			      int id)
+static void rtnl_net_notifyid(struct net *net, int cmd, int id)
 {
 	struct sk_buff *msg;
 	int err = -ENOMEM;
@@ -668,7 +659,7 @@ static void rtnl_net_notifyid(struct net *net, struct net *peer, int cmd,
 	if (!msg)
 		goto out;
 
-	err = rtnl_net_fill(msg, 0, 0, 0, cmd, net, peer, id);
+	err = rtnl_net_fill(msg, 0, 0, 0, cmd, net, id);
 	if (err < 0)
 		goto err_out;
 
-- 
2.2.2

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH net-next v2 3/7] netns: rename peernet2id() to peernet2id_alloc()
  2015-05-07  9:02 ` [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns Nicolas Dichtel
  2015-05-07  9:02   ` [PATCH net-next v2 1/7] netns: returns always an id in __peernet2id() Nicolas Dichtel
  2015-05-07  9:02   ` [PATCH net-next v2 2/7] netns: always provide the id to rtnl_net_fill() Nicolas Dichtel
@ 2015-05-07  9:02   ` Nicolas Dichtel
  2015-05-07  9:02   ` [PATCH net-next v2 4/7] netns: notify new nsid outside __peernet2id() Nicolas Dichtel
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 52+ messages in thread
From: Nicolas Dichtel @ 2015-05-07  9:02 UTC (permalink / raw)
  To: netdev; +Cc: tgraf, davem, ebiederm, Nicolas Dichtel

In a following commit, a new function will be introduced that only looks up
a nsid (no allocation if the nsid doesn't exist). To avoid confusion, the
existing function is renamed.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
---

v2: no change

 drivers/net/vxlan.c         | 2 +-
 include/net/net_namespace.h | 2 +-
 net/core/net_namespace.c    | 4 ++--
 net/core/rtnetlink.c        | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 3517ab0aa803..48341ae49012 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -336,7 +336,7 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
 
 	if (!net_eq(dev_net(vxlan->dev), vxlan->net) &&
 	    nla_put_s32(skb, NDA_LINK_NETNSID,
-			peernet2id(dev_net(vxlan->dev), vxlan->net)))
+			peernet2id_alloc(dev_net(vxlan->dev), vxlan->net)))
 		goto nla_put_failure;
 
 	if (send_eth && nla_put(skb, NDA_LLADDR, ETH_ALEN, &fdb->eth_addr))
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index f733656404de..6d1e2eae32fb 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -271,7 +271,7 @@ static inline struct net *read_pnet(const possible_net_t *pnet)
 #define __net_initconst	__initconst
 #endif
 
-int peernet2id(struct net *net, struct net *peer);
+int peernet2id_alloc(struct net *net, struct net *peer);
 struct net *get_net_ns_by_id(struct net *net, int id);
 
 struct pernet_operations {
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 37c68bb72db3..9c806ac569f9 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -202,13 +202,13 @@ static int __peernet2id(struct net *net, struct net *peer, bool alloc)
 /* This function returns the id of a peer netns. If no id is assigned, one will
  * be allocated and returned.
  */
-int peernet2id(struct net *net, struct net *peer)
+int peernet2id_alloc(struct net *net, struct net *peer)
 {
 	bool alloc = atomic_read(&peer->count) == 0 ? false : true;
 
 	return __peernet2id(net, peer, alloc);
 }
-EXPORT_SYMBOL(peernet2id);
+EXPORT_SYMBOL(peernet2id_alloc);
 
 struct net *get_net_ns_by_id(struct net *net, int id)
 {
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 666e0928ba40..83e08323fdcd 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1204,7 +1204,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 		struct net *link_net = dev->rtnl_link_ops->get_link_net(dev);
 
 		if (!net_eq(dev_net(dev), link_net)) {
-			int id = peernet2id(dev_net(dev), link_net);
+			int id = peernet2id_alloc(dev_net(dev), link_net);
 
 			if (nla_put_s32(skb, IFLA_LINK_NETNSID, id))
 				goto nla_put_failure;
-- 
2.2.2

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH net-next v2 4/7] netns: notify new nsid outside __peernet2id()
  2015-05-07  9:02 ` [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns Nicolas Dichtel
                     ` (2 preceding siblings ...)
  2015-05-07  9:02   ` [PATCH net-next v2 3/7] netns: rename peernet2id() to peernet2id_alloc() Nicolas Dichtel
@ 2015-05-07  9:02   ` Nicolas Dichtel
  2015-05-07 11:47     ` Thomas Graf
  2015-05-07  9:02   ` [PATCH net-next v2 5/7] netns: use a spin_lock to protect nsid management Nicolas Dichtel
                     ` (4 subsequent siblings)
  8 siblings, 1 reply; 52+ messages in thread
From: Nicolas Dichtel @ 2015-05-07  9:02 UTC (permalink / raw)
  To: netdev; +Cc: tgraf, davem, ebiederm, Nicolas Dichtel

There is no functional change with this patch. It will ease the refactoring
of the locking system that protects nsids and the support of the netlink
socket option NETLINK_LISTEN_ALL_NSID.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---

v2: add __peernet2id_alloc()

 net/core/net_namespace.c | 41 +++++++++++++++++++++++++++--------------
 1 file changed, 27 insertions(+), 14 deletions(-)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 9c806ac569f9..ee864241f8d6 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -147,10 +147,9 @@ static void ops_free_list(const struct pernet_operations *ops,
 	}
 }
 
-static void rtnl_net_notifyid(struct net *net, int cmd, int id);
 static int alloc_netid(struct net *net, struct net *peer, int reqid)
 {
-	int min = 0, max = 0, id;
+	int min = 0, max = 0;
 
 	ASSERT_RTNL();
 
@@ -159,11 +158,7 @@ static int alloc_netid(struct net *net, struct net *peer, int reqid)
 		max = reqid + 1;
 	}
 
-	id = idr_alloc(&net->netns_ids, peer, min, max, GFP_KERNEL);
-	if (id >= 0)
-		rtnl_net_notifyid(net, RTM_NEWNSID, id);
-
-	return id;
+	return idr_alloc(&net->netns_ids, peer, min, max, GFP_KERNEL);
 }
 
 /* This function is used by idr_for_each(). If net is equal to peer, the
@@ -179,34 +174,50 @@ static int net_eq_idr(int id, void *net, void *peer)
 	return 0;
 }
 
-static int __peernet2id(struct net *net, struct net *peer, bool alloc)
+static int __peernet2id_alloc(struct net *net, struct net *peer, bool *alloc)
 {
 	int id = idr_for_each(&net->netns_ids, net_eq_idr, peer);
+	bool alloc_it = *alloc;
 
 	ASSERT_RTNL();
 
+	*alloc = false;
+
 	/* Magic value for id 0. */
 	if (id == NET_ID_ZERO)
 		return 0;
 	if (id > 0)
 		return id;
 
-	if (alloc) {
+	if (alloc_it) {
 		id = alloc_netid(net, peer, -1);
+		*alloc = true;
 		return id >= 0 ? id : NETNSA_NSID_NOT_ASSIGNED;
 	}
 
 	return NETNSA_NSID_NOT_ASSIGNED;
 }
 
+static int __peernet2id(struct net *net, struct net *peer)
+{
+	bool no = false;
+
+	return __peernet2id_alloc(net, peer, &no);
+}
+
+static void rtnl_net_notifyid(struct net *net, int cmd, int id);
 /* This function returns the id of a peer netns. If no id is assigned, one will
  * be allocated and returned.
  */
 int peernet2id_alloc(struct net *net, struct net *peer)
 {
 	bool alloc = atomic_read(&peer->count) == 0 ? false : true;
+	int id;
 
-	return __peernet2id(net, peer, alloc);
+	id = __peernet2id_alloc(net, peer, &alloc);
+	if (alloc && id >= 0)
+		rtnl_net_notifyid(net, RTM_NEWNSID, id);
+	return id;
 }
 EXPORT_SYMBOL(peernet2id_alloc);
 
@@ -361,7 +372,7 @@ static void cleanup_net(struct work_struct *work)
 		list_del_rcu(&net->list);
 		list_add_tail(&net->exit_list, &net_exit_list);
 		for_each_net(tmp) {
-			int id = __peernet2id(tmp, net, false);
+			int id = __peernet2id(tmp, net);
 
 			if (id >= 0) {
 				rtnl_net_notifyid(tmp, RTM_DELNSID, id);
@@ -516,14 +527,16 @@ static int rtnl_net_newid(struct sk_buff *skb, struct nlmsghdr *nlh)
 	if (IS_ERR(peer))
 		return PTR_ERR(peer);
 
-	if (__peernet2id(net, peer, false) >= 0) {
+	if (__peernet2id(net, peer) >= 0) {
 		err = -EEXIST;
 		goto out;
 	}
 
 	err = alloc_netid(net, peer, nsid);
-	if (err > 0)
+	if (err >= 0) {
+		rtnl_net_notifyid(net, RTM_NEWNSID, err);
 		err = 0;
+	}
 out:
 	put_net(peer);
 	return err;
@@ -588,7 +601,7 @@ static int rtnl_net_getid(struct sk_buff *skb, struct nlmsghdr *nlh)
 		goto out;
 	}
 
-	id = __peernet2id(net, peer, false);
+	id = __peernet2id(net, peer);
 	err = rtnl_net_fill(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq, 0,
 			    RTM_GETNSID, net, id);
 	if (err < 0)
-- 
2.2.2

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH net-next v2 5/7] netns: use a spin_lock to protect nsid management
  2015-05-07  9:02 ` [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns Nicolas Dichtel
                     ` (3 preceding siblings ...)
  2015-05-07  9:02   ` [PATCH net-next v2 4/7] netns: notify new nsid outside __peernet2id() Nicolas Dichtel
@ 2015-05-07  9:02   ` Nicolas Dichtel
  2015-05-07  9:02   ` [PATCH net-next v2 6/7] netlink: rename private flags and states Nicolas Dichtel
                     ` (3 subsequent siblings)
  8 siblings, 0 replies; 52+ messages in thread
From: Nicolas Dichtel @ 2015-05-07  9:02 UTC (permalink / raw)
  To: netdev; +Cc: tgraf, davem, ebiederm, Nicolas Dichtel

Before this patch, nsids were protected by the rtnl lock. The goal of this
patch is to be able to find a nsid without needing to hold the rtnl lock.

A later patch in this series will introduce a netlink socket option to listen
to all netns that have a nsid assigned in the netns where the socket is opened.
Thus, it's important to call rtnl_net_notifyid() outside the spinlock, to
avoid a recursive lock (nsids are notified via rtnl). This was the main
reason for the previous patch.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---

v2: no change

 net/core/net_namespace.c | 57 +++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 44 insertions(+), 13 deletions(-)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index ee864241f8d6..ae5008b097de 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -28,6 +28,7 @@
 static LIST_HEAD(pernet_list);
 static struct list_head *first_device = &pernet_list;
 DEFINE_MUTEX(net_mutex);
+static DEFINE_SPINLOCK(nsid_lock);
 
 LIST_HEAD(net_namespace_list);
 EXPORT_SYMBOL_GPL(net_namespace_list);
@@ -147,18 +148,17 @@ static void ops_free_list(const struct pernet_operations *ops,
 	}
 }
 
+/* should be called with nsid_lock held */
 static int alloc_netid(struct net *net, struct net *peer, int reqid)
 {
 	int min = 0, max = 0;
 
-	ASSERT_RTNL();
-
 	if (reqid >= 0) {
 		min = reqid;
 		max = reqid + 1;
 	}
 
-	return idr_alloc(&net->netns_ids, peer, min, max, GFP_KERNEL);
+	return idr_alloc(&net->netns_ids, peer, min, max, GFP_ATOMIC);
 }
 
 /* This function is used by idr_for_each(). If net is equal to peer, the
@@ -174,13 +174,15 @@ static int net_eq_idr(int id, void *net, void *peer)
 	return 0;
 }
 
+/* Should be called with nsid_lock held. If a new id is assigned, the bool alloc
+ * is set to true, thus the caller knows that the new id must be notified via
+ * rtnl.
+ */
 static int __peernet2id_alloc(struct net *net, struct net *peer, bool *alloc)
 {
 	int id = idr_for_each(&net->netns_ids, net_eq_idr, peer);
 	bool alloc_it = *alloc;
 
-	ASSERT_RTNL();
-
 	*alloc = false;
 
 	/* Magic value for id 0. */
@@ -198,6 +200,7 @@ static int __peernet2id_alloc(struct net *net, struct net *peer, bool *alloc)
 	return NETNSA_NSID_NOT_ASSIGNED;
 }
 
+/* should be called with nsid_lock held */
 static int __peernet2id(struct net *net, struct net *peer)
 {
 	bool no = false;
@@ -211,27 +214,46 @@ static void rtnl_net_notifyid(struct net *net, int cmd, int id);
  */
 int peernet2id_alloc(struct net *net, struct net *peer)
 {
-	bool alloc = atomic_read(&peer->count) == 0 ? false : true;
+	unsigned long flags;
+	bool alloc;
 	int id;
 
+	spin_lock_irqsave(&nsid_lock, flags);
+	alloc = atomic_read(&peer->count) == 0 ? false : true;
 	id = __peernet2id_alloc(net, peer, &alloc);
+	spin_unlock_irqrestore(&nsid_lock, flags);
 	if (alloc && id >= 0)
 		rtnl_net_notifyid(net, RTM_NEWNSID, id);
 	return id;
 }
 EXPORT_SYMBOL(peernet2id_alloc);
 
+/* This function returns, if assigned, the id of a peer netns. */
+static int peernet2id(struct net *net, struct net *peer)
+{
+	unsigned long flags;
+	int id;
+
+	spin_lock_irqsave(&nsid_lock, flags);
+	id = __peernet2id(net, peer);
+	spin_unlock_irqrestore(&nsid_lock, flags);
+	return id;
+}
+
 struct net *get_net_ns_by_id(struct net *net, int id)
 {
+	unsigned long flags;
 	struct net *peer;
 
 	if (id < 0)
 		return NULL;
 
 	rcu_read_lock();
+	spin_lock_irqsave(&nsid_lock, flags);
 	peer = idr_find(&net->netns_ids, id);
 	if (peer)
 		get_net(peer);
+	spin_unlock_irqrestore(&nsid_lock, flags);
 	rcu_read_unlock();
 
 	return peer;
@@ -372,14 +394,19 @@ static void cleanup_net(struct work_struct *work)
 		list_del_rcu(&net->list);
 		list_add_tail(&net->exit_list, &net_exit_list);
 		for_each_net(tmp) {
-			int id = __peernet2id(tmp, net);
+			int id;
 
-			if (id >= 0) {
-				rtnl_net_notifyid(tmp, RTM_DELNSID, id);
+			spin_lock_irq(&nsid_lock);
+			id = __peernet2id(tmp, net);
+			if (id >= 0)
 				idr_remove(&tmp->netns_ids, id);
-			}
+			spin_unlock_irq(&nsid_lock);
+			if (id >= 0)
+				rtnl_net_notifyid(tmp, RTM_DELNSID, id);
 		}
+		spin_lock_irq(&nsid_lock);
 		idr_destroy(&net->netns_ids);
+		spin_unlock_irq(&nsid_lock);
 
 	}
 	rtnl_unlock();
@@ -507,6 +534,7 @@ static int rtnl_net_newid(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct nlattr *tb[NETNSA_MAX + 1];
+	unsigned long flags;
 	struct net *peer;
 	int nsid, err;
 
@@ -527,12 +555,14 @@ static int rtnl_net_newid(struct sk_buff *skb, struct nlmsghdr *nlh)
 	if (IS_ERR(peer))
 		return PTR_ERR(peer);
 
+	spin_lock_irqsave(&nsid_lock, flags);
 	if (__peernet2id(net, peer) >= 0) {
 		err = -EEXIST;
 		goto out;
 	}
 
 	err = alloc_netid(net, peer, nsid);
+	spin_unlock_irqrestore(&nsid_lock, flags);
 	if (err >= 0) {
 		rtnl_net_notifyid(net, RTM_NEWNSID, err);
 		err = 0;
@@ -601,7 +631,7 @@ static int rtnl_net_getid(struct sk_buff *skb, struct nlmsghdr *nlh)
 		goto out;
 	}
 
-	id = __peernet2id(net, peer);
+	id = peernet2id(net, peer);
 	err = rtnl_net_fill(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq, 0,
 			    RTM_GETNSID, net, id);
 	if (err < 0)
@@ -654,10 +684,11 @@ static int rtnl_net_dumpid(struct sk_buff *skb, struct netlink_callback *cb)
 		.idx = 0,
 		.s_idx = cb->args[0],
 	};
+	unsigned long flags;
 
-	ASSERT_RTNL();
-
+	spin_lock_irqsave(&nsid_lock, flags);
 	idr_for_each(&net->netns_ids, rtnl_net_dumpid_one, &net_cb);
+	spin_unlock_irqrestore(&nsid_lock, flags);
 
 	cb->args[0] = net_cb.idx;
 	return skb->len;
-- 
2.2.2


* [PATCH net-next v2 6/7] netlink: rename private flags and states
  2015-05-07  9:02 ` [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns Nicolas Dichtel
                     ` (4 preceding siblings ...)
  2015-05-07  9:02   ` [PATCH net-next v2 5/7] netns: use a spin_lock to protect nsid management Nicolas Dichtel
@ 2015-05-07  9:02   ` Nicolas Dichtel
  2015-05-07 11:49     ` Thomas Graf
  2015-05-07  9:02   ` [PATCH net-next v2 7/7] netlink: allow to listen "all" netns Nicolas Dichtel
                     ` (2 subsequent siblings)
  8 siblings, 1 reply; 52+ messages in thread
From: Nicolas Dichtel @ 2015-05-07  9:02 UTC (permalink / raw)
  To: netdev; +Cc: tgraf, davem, ebiederm, Nicolas Dichtel

These flags and states have the same prefix (NETLINK_) as the netlink socket
options. To avoid confusion and to be able to name a flag like a socket
option, let's use another prefix: NETLINK_[S|F]_.

Note: a comment has been fixed; it referred to a NETLINK_RECV_NO_ENOBUFS
socket option instead of NETLINK_NO_ENOBUFS.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---

v2: new in this series
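
For readers unfamiliar with the two kinds of constants: in the code touched
by this patch, NETLINK_S_CONGESTED is a bit number used with test_bit() and
clear_bit() on nlk->state, while the NETLINK_F_* values are masks OR-ed into
nlk->flags. A minimal userspace sketch of that difference, where struct
fake_nlk and the fake_*() helpers are hypothetical stand-ins and not kernel
API:

```c
#include <stdbool.h>

/* state bits: a bit number, as passed to test_bit()/clear_bit() */
#define NETLINK_S_CONGESTED		0x0

/* flags: masks, combined with |= and &= ~ */
#define NETLINK_F_RECV_PKTINFO		0x2
#define NETLINK_F_RECV_NO_ENOBUFS	0x8

/* Hypothetical stand-in for the two fields of struct netlink_sock. */
struct fake_nlk {
	unsigned long state;
	unsigned int flags;
};

/* Set or clear a flag mask, as the setsockopt handler does. */
void fake_set_flag(struct fake_nlk *nlk, unsigned int flag, bool on)
{
	if (on)
		nlk->flags |= flag;
	else
		nlk->flags &= ~flag;
}

/* Non-atomic userspace approximations of the kernel bit helpers. */
bool fake_test_bit(unsigned int nr, const unsigned long *word)
{
	return (*word >> nr) & 1UL;
}

void fake_set_bit(unsigned int nr, unsigned long *word)
{
	*word |= 1UL << nr;
}
```

With these helpers, setting bit number 0 in state yields the value 1,
whereas a flag value is stored as-is in flags.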

 net/netlink/af_netlink.c | 59 ++++++++++++++++++++++++------------------------
 1 file changed, 30 insertions(+), 29 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index ec4adbdcb9b4..bf7f56d7a9aa 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -76,17 +76,17 @@ struct listeners {
 };
 
 /* state bits */
-#define NETLINK_CONGESTED	0x0
+#define NETLINK_S_CONGESTED		0x0
 
 /* flags */
-#define NETLINK_KERNEL_SOCKET	0x1
-#define NETLINK_RECV_PKTINFO	0x2
-#define NETLINK_BROADCAST_SEND_ERROR	0x4
-#define NETLINK_RECV_NO_ENOBUFS	0x8
+#define NETLINK_F_KERNEL_SOCKET		0x1
+#define NETLINK_F_RECV_PKTINFO		0x2
+#define NETLINK_F_BROADCAST_SEND_ERROR	0x4
+#define NETLINK_F_RECV_NO_ENOBUFS	0x8
 
 static inline int netlink_is_kernel(struct sock *sk)
 {
-	return nlk_sk(sk)->flags & NETLINK_KERNEL_SOCKET;
+	return nlk_sk(sk)->flags & NETLINK_F_KERNEL_SOCKET;
 }
 
 struct netlink_table *nl_table;
@@ -256,8 +256,9 @@ static void netlink_overrun(struct sock *sk)
 {
 	struct netlink_sock *nlk = nlk_sk(sk);
 
-	if (!(nlk->flags & NETLINK_RECV_NO_ENOBUFS)) {
-		if (!test_and_set_bit(NETLINK_CONGESTED, &nlk_sk(sk)->state)) {
+	if (!(nlk->flags & NETLINK_F_RECV_NO_ENOBUFS)) {
+		if (!test_and_set_bit(NETLINK_S_CONGESTED,
+				      &nlk_sk(sk)->state)) {
 			sk->sk_err = ENOBUFS;
 			sk->sk_error_report(sk);
 		}
@@ -270,8 +271,8 @@ static void netlink_rcv_wake(struct sock *sk)
 	struct netlink_sock *nlk = nlk_sk(sk);
 
 	if (skb_queue_empty(&sk->sk_receive_queue))
-		clear_bit(NETLINK_CONGESTED, &nlk->state);
-	if (!test_bit(NETLINK_CONGESTED, &nlk->state))
+		clear_bit(NETLINK_S_CONGESTED, &nlk->state);
+	if (!test_bit(NETLINK_S_CONGESTED, &nlk->state))
 		wake_up_interruptible(&nlk->wait);
 }
 
@@ -1656,7 +1657,7 @@ int netlink_attachskb(struct sock *sk, struct sk_buff *skb,
 	nlk = nlk_sk(sk);
 
 	if ((atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf ||
-	     test_bit(NETLINK_CONGESTED, &nlk->state)) &&
+	     test_bit(NETLINK_S_CONGESTED, &nlk->state)) &&
 	    !netlink_skb_is_mmaped(skb)) {
 		DECLARE_WAITQUEUE(wait, current);
 		if (!*timeo) {
@@ -1671,7 +1672,7 @@ int netlink_attachskb(struct sock *sk, struct sk_buff *skb,
 		add_wait_queue(&nlk->wait, &wait);
 
 		if ((atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf ||
-		     test_bit(NETLINK_CONGESTED, &nlk->state)) &&
+		     test_bit(NETLINK_S_CONGESTED, &nlk->state)) &&
 		    !sock_flag(sk, SOCK_DEAD))
 			*timeo = schedule_timeout(*timeo);
 
@@ -1895,7 +1896,7 @@ static int netlink_broadcast_deliver(struct sock *sk, struct sk_buff *skb)
 	struct netlink_sock *nlk = nlk_sk(sk);
 
 	if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf &&
-	    !test_bit(NETLINK_CONGESTED, &nlk->state)) {
+	    !test_bit(NETLINK_S_CONGESTED, &nlk->state)) {
 		netlink_skb_set_owner_r(skb, sk);
 		__netlink_sendskb(sk, skb);
 		return atomic_read(&sk->sk_rmem_alloc) > (sk->sk_rcvbuf >> 1);
@@ -1956,7 +1957,7 @@ static void do_one_broadcast(struct sock *sk,
 		netlink_overrun(sk);
 		/* Clone failed. Notify ALL listeners. */
 		p->failure = 1;
-		if (nlk->flags & NETLINK_BROADCAST_SEND_ERROR)
+		if (nlk->flags & NETLINK_F_BROADCAST_SEND_ERROR)
 			p->delivery_failure = 1;
 	} else if (p->tx_filter && p->tx_filter(sk, p->skb2, p->tx_data)) {
 		kfree_skb(p->skb2);
@@ -1966,7 +1967,7 @@ static void do_one_broadcast(struct sock *sk,
 		p->skb2 = NULL;
 	} else if ((val = netlink_broadcast_deliver(sk, p->skb2)) < 0) {
 		netlink_overrun(sk);
-		if (nlk->flags & NETLINK_BROADCAST_SEND_ERROR)
+		if (nlk->flags & NETLINK_F_BROADCAST_SEND_ERROR)
 			p->delivery_failure = 1;
 	} else {
 		p->congested |= val;
@@ -2057,7 +2058,7 @@ static int do_one_set_err(struct sock *sk, struct netlink_set_err_data *p)
 	    !test_bit(p->group - 1, nlk->groups))
 		goto out;
 
-	if (p->code == ENOBUFS && nlk->flags & NETLINK_RECV_NO_ENOBUFS) {
+	if (p->code == ENOBUFS && nlk->flags & NETLINK_F_RECV_NO_ENOBUFS) {
 		ret = 1;
 		goto out;
 	}
@@ -2076,7 +2077,7 @@ out:
  * @code: error code, must be negative (as usual in kernelspace)
  *
  * This function returns the number of broadcast listeners that have set the
- * NETLINK_RECV_NO_ENOBUFS socket option.
+ * NETLINK_NO_ENOBUFS socket option.
  */
 int netlink_set_err(struct sock *ssk, u32 portid, u32 group, int code)
 {
@@ -2136,9 +2137,9 @@ static int netlink_setsockopt(struct socket *sock, int level, int optname,
 	switch (optname) {
 	case NETLINK_PKTINFO:
 		if (val)
-			nlk->flags |= NETLINK_RECV_PKTINFO;
+			nlk->flags |= NETLINK_F_RECV_PKTINFO;
 		else
-			nlk->flags &= ~NETLINK_RECV_PKTINFO;
+			nlk->flags &= ~NETLINK_F_RECV_PKTINFO;
 		err = 0;
 		break;
 	case NETLINK_ADD_MEMBERSHIP:
@@ -2167,18 +2168,18 @@ static int netlink_setsockopt(struct socket *sock, int level, int optname,
 	}
 	case NETLINK_BROADCAST_ERROR:
 		if (val)
-			nlk->flags |= NETLINK_BROADCAST_SEND_ERROR;
+			nlk->flags |= NETLINK_F_BROADCAST_SEND_ERROR;
 		else
-			nlk->flags &= ~NETLINK_BROADCAST_SEND_ERROR;
+			nlk->flags &= ~NETLINK_F_BROADCAST_SEND_ERROR;
 		err = 0;
 		break;
 	case NETLINK_NO_ENOBUFS:
 		if (val) {
-			nlk->flags |= NETLINK_RECV_NO_ENOBUFS;
-			clear_bit(NETLINK_CONGESTED, &nlk->state);
+			nlk->flags |= NETLINK_F_RECV_NO_ENOBUFS;
+			clear_bit(NETLINK_S_CONGESTED, &nlk->state);
 			wake_up_interruptible(&nlk->wait);
 		} else {
-			nlk->flags &= ~NETLINK_RECV_NO_ENOBUFS;
+			nlk->flags &= ~NETLINK_F_RECV_NO_ENOBUFS;
 		}
 		err = 0;
 		break;
@@ -2227,7 +2228,7 @@ static int netlink_getsockopt(struct socket *sock, int level, int optname,
 		if (len < sizeof(int))
 			return -EINVAL;
 		len = sizeof(int);
-		val = nlk->flags & NETLINK_RECV_PKTINFO ? 1 : 0;
+		val = nlk->flags & NETLINK_F_RECV_PKTINFO ? 1 : 0;
 		if (put_user(len, optlen) ||
 		    put_user(val, optval))
 			return -EFAULT;
@@ -2237,7 +2238,7 @@ static int netlink_getsockopt(struct socket *sock, int level, int optname,
 		if (len < sizeof(int))
 			return -EINVAL;
 		len = sizeof(int);
-		val = nlk->flags & NETLINK_BROADCAST_SEND_ERROR ? 1 : 0;
+		val = nlk->flags & NETLINK_F_BROADCAST_SEND_ERROR ? 1 : 0;
 		if (put_user(len, optlen) ||
 		    put_user(val, optval))
 			return -EFAULT;
@@ -2247,7 +2248,7 @@ static int netlink_getsockopt(struct socket *sock, int level, int optname,
 		if (len < sizeof(int))
 			return -EINVAL;
 		len = sizeof(int);
-		val = nlk->flags & NETLINK_RECV_NO_ENOBUFS ? 1 : 0;
+		val = nlk->flags & NETLINK_F_RECV_NO_ENOBUFS ? 1 : 0;
 		if (put_user(len, optlen) ||
 		    put_user(val, optval))
 			return -EFAULT;
@@ -2418,7 +2419,7 @@ static int netlink_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 		msg->msg_namelen = sizeof(*addr);
 	}
 
-	if (nlk->flags & NETLINK_RECV_PKTINFO)
+	if (nlk->flags & NETLINK_F_RECV_PKTINFO)
 		netlink_cmsg_recv_pktinfo(msg, skb);
 
 	memset(&scm, 0, sizeof(scm));
@@ -2502,7 +2503,7 @@ __netlink_kernel_create(struct net *net, int unit, struct module *module,
 		goto out_sock_release;
 
 	nlk = nlk_sk(sk);
-	nlk->flags |= NETLINK_KERNEL_SOCKET;
+	nlk->flags |= NETLINK_F_KERNEL_SOCKET;
 
 	netlink_table_grab();
 	if (!nl_table[unit].registered) {
-- 
2.2.2


* [PATCH net-next v2 7/7] netlink: allow to listen "all" netns
  2015-05-07  9:02 ` [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns Nicolas Dichtel
                     ` (5 preceding siblings ...)
  2015-05-07  9:02   ` [PATCH net-next v2 6/7] netlink: rename private flags and states Nicolas Dichtel
@ 2015-05-07  9:02   ` Nicolas Dichtel
  2015-05-07 11:55     ` Thomas Graf
  2015-05-08 12:02   ` [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns Eric W. Biederman
  2015-05-10  2:15   ` David Miller
  8 siblings, 1 reply; 52+ messages in thread
From: Nicolas Dichtel @ 2015-05-07  9:02 UTC (permalink / raw)
  To: netdev; +Cc: tgraf, davem, ebiederm, Nicolas Dichtel

More accurately, listen to all netns that have a nsid assigned in the netns
where the netlink socket is opened.
For this purpose, a netlink socket option is added:
NETLINK_LISTEN_ALL_NSID. When this option is set on a netlink socket, this
socket will receive netlink notifications from all netns that have a nsid
assigned in the netns where the socket has been opened. The nsid is sent
to userland via ancillary data.

With this patch, a daemon needs only one socket to listen to many netns. This
is useful when the number of netns is high.

Because 0 is a valid value for a nsid, the field nsid_is_set indicates
whether the field nsid is valid. skb->cb is initialized to 0 on skb
allocation, so we are sure that we will never send a nsid 0 to userland
by mistake.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---

v2: put the nsid into the skb cb instead of the net pointer
    rename NETLINK_LISTEN_ALL to NETLINK_F_LISTEN_ALL_NSID
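
The delivery rule this patch adds to do_one_broadcast() can be summarized
with the sketch below. It is a simplified userspace model, not the kernel
code: struct fake_listener and should_deliver() are hypothetical names,
has_cap stands in for the file_ns_capable(..., CAP_NET_BROADCAST) check, and
src_nsid stands in for the result of peernet2id().

```c
#include <stdbool.h>

/* Hypothetical model of a listening netlink socket. */
struct fake_listener {
	int netns;		/* netns the socket was opened in */
	bool listen_all_nsid;	/* NETLINK_F_LISTEN_ALL_NSID is set */
	bool has_cap;		/* stands in for the CAP_NET_BROADCAST check */
};

/*
 * src_netns: netns that emitted the notification.
 * src_nsid:  id of src_netns as seen from the listener's netns, < 0 if none.
 */
bool should_deliver(const struct fake_listener *l, int src_netns, int src_nsid)
{
	if (l->netns == src_netns)
		return true;	/* same netns: unchanged behavior */
	if (!l->listen_all_nsid)
		return false;	/* option not set on this socket */
	if (src_nsid < 0)
		return false;	/* peer has no nsid in the listener's netns */
	return l->has_cap;	/* capability check against the peer netns */
}
```

A socket without the option still sees only its own netns; with the option,
a peer netns is heard only if it has a nsid and the capability check passes.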

 include/linux/netlink.h      |  2 ++
 include/net/net_namespace.h  |  2 ++
 include/uapi/linux/netlink.h |  1 +
 net/core/net_namespace.c     | 10 ++++++++-
 net/netlink/af_netlink.c     | 52 +++++++++++++++++++++++++++++++++++++++-----
 5 files changed, 61 insertions(+), 6 deletions(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index 6835c1279df7..9120edb650a0 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -28,6 +28,8 @@ struct netlink_skb_parms {
 	__u32			dst_group;
 	__u32			flags;
 	struct sock		*sk;
+	bool			nsid_is_set;
+	int			nsid;
 };
 
 #define NETLINK_CB(skb)		(*(struct netlink_skb_parms*)&((skb)->cb))
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 6d1e2eae32fb..3f850acc844e 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -272,6 +272,8 @@ static inline struct net *read_pnet(const possible_net_t *pnet)
 #endif
 
 int peernet2id_alloc(struct net *net, struct net *peer);
+int peernet2id(struct net *net, struct net *peer);
+bool peernet_has_id(struct net *net, struct net *peer);
 struct net *get_net_ns_by_id(struct net *net, int id);
 
 struct pernet_operations {
diff --git a/include/uapi/linux/netlink.h b/include/uapi/linux/netlink.h
index 1a85940f8ab7..3e34b7d702f8 100644
--- a/include/uapi/linux/netlink.h
+++ b/include/uapi/linux/netlink.h
@@ -108,6 +108,7 @@ struct nlmsgerr {
 #define NETLINK_NO_ENOBUFS	5
 #define NETLINK_RX_RING		6
 #define NETLINK_TX_RING		7
+#define NETLINK_LISTEN_ALL_NSID	8
 
 struct nl_pktinfo {
 	__u32	group;
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index ae5008b097de..a665bf490c88 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -229,7 +229,7 @@ int peernet2id_alloc(struct net *net, struct net *peer)
 EXPORT_SYMBOL(peernet2id_alloc);
 
 /* This function returns, if assigned, the id of a peer netns. */
-static int peernet2id(struct net *net, struct net *peer)
+int peernet2id(struct net *net, struct net *peer)
 {
 	unsigned long flags;
 	int id;
@@ -240,6 +240,14 @@ static int peernet2id(struct net *net, struct net *peer)
 	return id;
 }
 
+/* This function returns true if the peer netns has an id assigned into the
+ * current netns.
+ */
+bool peernet_has_id(struct net *net, struct net *peer)
+{
+	return peernet2id(net, peer) >= 0;
+}
+
 struct net *get_net_ns_by_id(struct net *net, int id)
 {
 	unsigned long flags;
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index bf7f56d7a9aa..a5fff75accf8 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -83,6 +83,7 @@ struct listeners {
 #define NETLINK_F_RECV_PKTINFO		0x2
 #define NETLINK_F_BROADCAST_SEND_ERROR	0x4
 #define NETLINK_F_RECV_NO_ENOBUFS	0x8
+#define NETLINK_F_LISTEN_ALL_NSID	0x10
 
 static inline int netlink_is_kernel(struct sock *sk)
 {
@@ -1932,8 +1933,17 @@ static void do_one_broadcast(struct sock *sk,
 	    !test_bit(p->group - 1, nlk->groups))
 		return;
 
-	if (!net_eq(sock_net(sk), p->net))
-		return;
+	if (!net_eq(sock_net(sk), p->net)) {
+		if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
+			return;
+
+		if (!peernet_has_id(sock_net(sk), p->net))
+			return;
+
+		if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
+				     CAP_NET_BROADCAST))
+			return;
+	}
 
 	if (p->failure) {
 		netlink_overrun(sk);
@@ -1959,13 +1969,22 @@ static void do_one_broadcast(struct sock *sk,
 		p->failure = 1;
 		if (nlk->flags & NETLINK_F_BROADCAST_SEND_ERROR)
 			p->delivery_failure = 1;
-	} else if (p->tx_filter && p->tx_filter(sk, p->skb2, p->tx_data)) {
+		goto out;
+	}
+	if (p->tx_filter && p->tx_filter(sk, p->skb2, p->tx_data)) {
 		kfree_skb(p->skb2);
 		p->skb2 = NULL;
-	} else if (sk_filter(sk, p->skb2)) {
+		goto out;
+	}
+	if (sk_filter(sk, p->skb2)) {
 		kfree_skb(p->skb2);
 		p->skb2 = NULL;
-	} else if ((val = netlink_broadcast_deliver(sk, p->skb2)) < 0) {
+		goto out;
+	}
+	NETLINK_CB(p->skb2).nsid = peernet2id(sock_net(sk), p->net);
+	NETLINK_CB(p->skb2).nsid_is_set = true;
+	val = netlink_broadcast_deliver(sk, p->skb2);
+	if (val < 0) {
 		netlink_overrun(sk);
 		if (nlk->flags & NETLINK_F_BROADCAST_SEND_ERROR)
 			p->delivery_failure = 1;
@@ -1974,6 +1993,7 @@ static void do_one_broadcast(struct sock *sk,
 		p->delivered = 1;
 		p->skb2 = NULL;
 	}
+out:
 	sock_put(sk);
 }
 
@@ -2202,6 +2222,16 @@ static int netlink_setsockopt(struct socket *sock, int level, int optname,
 		break;
 	}
 #endif /* CONFIG_NETLINK_MMAP */
+	case NETLINK_LISTEN_ALL_NSID:
+		if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_BROADCAST))
+			return -EPERM;
+
+		if (val)
+			nlk->flags |= NETLINK_F_LISTEN_ALL_NSID;
+		else
+			nlk->flags &= ~NETLINK_F_LISTEN_ALL_NSID;
+		err = 0;
+		break;
 	default:
 		err = -ENOPROTOOPT;
 	}
@@ -2268,6 +2298,16 @@ static void netlink_cmsg_recv_pktinfo(struct msghdr *msg, struct sk_buff *skb)
 	put_cmsg(msg, SOL_NETLINK, NETLINK_PKTINFO, sizeof(info), &info);
 }
 
+static void netlink_cmsg_listen_all_nsid(struct sock *sk, struct msghdr *msg,
+					 struct sk_buff *skb)
+{
+	if (!NETLINK_CB(skb).nsid_is_set)
+		return;
+
+	put_cmsg(msg, SOL_NETLINK, NETLINK_LISTEN_ALL_NSID, sizeof(int),
+		 &NETLINK_CB(skb).nsid);
+}
+
 static int netlink_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
 {
 	struct sock *sk = sock->sk;
@@ -2421,6 +2461,8 @@ static int netlink_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 
 	if (nlk->flags & NETLINK_F_RECV_PKTINFO)
 		netlink_cmsg_recv_pktinfo(msg, skb);
+	if (nlk->flags & NETLINK_F_LISTEN_ALL_NSID)
+		netlink_cmsg_listen_all_nsid(sk, msg, skb);
 
 	memset(&scm, 0, sizeof(scm));
 	scm.creds = *NETLINK_CREDS(skb);
-- 
2.2.2


* Re: [PATCH net-next v2 4/7] netns: notify new nsid outside __peernet2id()
  2015-05-07  9:02   ` [PATCH net-next v2 4/7] netns: notify new nsid outside __peernet2id() Nicolas Dichtel
@ 2015-05-07 11:47     ` Thomas Graf
  0 siblings, 0 replies; 52+ messages in thread
From: Thomas Graf @ 2015-05-07 11:47 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: netdev, davem, ebiederm

On 05/07/15 at 11:02am, Nicolas Dichtel wrote:
> There is no functional change with this patch. It will ease the refactoring
> of the locking system that protects nsids and the support of the netlink
> socket option NETLINK_LISTEN_ALL_NSID.
> 
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>

Looks much cleaner this way, thanks.

Acked-by: Thomas Graf <tgraf@suug.ch>


* Re: [PATCH net-next v2 6/7] netlink: rename private flags and states
  2015-05-07  9:02   ` [PATCH net-next v2 6/7] netlink: rename private flags and states Nicolas Dichtel
@ 2015-05-07 11:49     ` Thomas Graf
  0 siblings, 0 replies; 52+ messages in thread
From: Thomas Graf @ 2015-05-07 11:49 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: netdev, davem, ebiederm

On 05/07/15 at 11:02am, Nicolas Dichtel wrote:
> These flags and states have the same prefix (NETLINK_) as the netlink socket
> options. To avoid confusion and to be able to name a flag like a socket
> option, let's use another prefix: NETLINK_[S|F]_.
> 
> Note: a comment has been fixed; it referred to a NETLINK_RECV_NO_ENOBUFS
> socket option instead of NETLINK_NO_ENOBUFS.
> 
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>

Acked-by: Thomas Graf <tgraf@suug.ch>


* Re: [PATCH net-next v2 7/7] netlink: allow to listen "all" netns
  2015-05-07  9:02   ` [PATCH net-next v2 7/7] netlink: allow to listen "all" netns Nicolas Dichtel
@ 2015-05-07 11:55     ` Thomas Graf
  0 siblings, 0 replies; 52+ messages in thread
From: Thomas Graf @ 2015-05-07 11:55 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: netdev, davem, ebiederm

On 05/07/15 at 11:02am, Nicolas Dichtel wrote:
> More accurately, listen to all netns that have a nsid assigned in the netns
> where the netlink socket is opened.
> For this purpose, a netlink socket option is added:
> NETLINK_LISTEN_ALL_NSID. When this option is set on a netlink socket, this
> socket will receive netlink notifications from all netns that have a nsid
> assigned in the netns where the socket has been opened. The nsid is sent
> to userland via ancillary data.
> 
> With this patch, a daemon needs only one socket to listen to many netns. This
> is useful when the number of netns is high.
> 
> Because 0 is a valid value for a nsid, the field nsid_is_set indicates
> whether the field nsid is valid. skb->cb is initialized to 0 on skb
> allocation, so we are sure that we will never send a nsid 0 to userland
> by mistake.
> 
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>

LGTM, nice work

Acked-by: Thomas Graf <tgraf@suug.ch>


* Re: [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns
  2015-05-07  9:02 ` [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns Nicolas Dichtel
                     ` (6 preceding siblings ...)
  2015-05-07  9:02   ` [PATCH net-next v2 7/7] netlink: allow to listen "all" netns Nicolas Dichtel
@ 2015-05-08 12:02   ` Eric W. Biederman
  2015-05-09 21:07     ` Nicolas Dichtel
  2015-05-22 20:50     ` Alexander Holler
  2015-05-10  2:15   ` David Miller
  8 siblings, 2 replies; 52+ messages in thread
From: Eric W. Biederman @ 2015-05-08 12:02 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: netdev, tgraf, davem


So I am dense.  I have read through the patches and I don't see where
you tag packets from other network namespaces with a network namespace
id.

In fact I don't even see an attribute that is appropriate for such
tagging.

Your comment below indicates such tagging is taking place but I am dense
and I am not seeing it.

Eric

Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:

> This idea was informally discussed in Ottawa / netdev0.1. The goal is to
> ease the use/scalability of netns, from a userland point of view.
> Today, users need to open one netlink socket per family and per netns.
> Thus, when the number of netns increases (for example 5K or more), the
> number of sockets needed to manage them grows a lot.
>
> The goal of this series is to be able to monitor netlink events, for a
> specified family, for a set of netns, with only one netlink socket. For
> this purpose, a netlink socket option is added: NETLINK_LISTEN_ALL_NSID.
> When this option is set on a netlink socket, this socket will receive
> netlink notifications from all netns that have a nsid assigned into the
> netns where the socket has been opened.
> The nsid is sent to userland via ancillary data.
>
> Here is an example with a patched iproute2. vxlan10 is created in the
> current netns (netns0, nsid 0) and then moved to another netns (netns1,
> nsid 1):
>
> $ ip netns exec netns0 ip monitor all-nsid label
> [nsid 0][NSID]nsid 1 (iproute2 netns name: netns1)
> [nsid 0][NEIGH]??? lladdr 00:00:00:00:00:00 REACHABLE,PERMANENT
> [nsid 0][LINK]5: vxlan10@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default 
>     link/ether 92:33:17:e6:e7:1d brd ff:ff:ff:ff:ff:ff
> [nsid 0][LINK]Deleted 5: vxlan10@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default 
>     link/ether 92:33:17:e6:e7:1d brd ff:ff:ff:ff:ff:ff
> [nsid 1][NSID]nsid 0 (iproute2 netns name: netns0)
> [nsid 1][LINK]5: vxlan10@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default 
>     link/ether 92:33:17:e6:e7:1d brd ff:ff:ff:ff:ff:ff link-netnsid 0
> [nsid 1][ADDR]5: vxlan10    inet 192.168.0.249/24 brd 192.168.0.255 scope global vxlan10
>        valid_lft forever preferred_lft forever
> [nsid 1][ROUTE]local 192.168.0.249 dev vxlan10  table local  proto kernel  scope host  src 192.168.0.249 
> [nsid 1][ROUTE]ff00::/8 dev vxlan10  table local  metric 256  pref medium
> [nsid 1][ROUTE]2001:123::/64 dev vxlan10  proto kernel  metric 256  pref medium
> [nsid 1][LINK]5: vxlan10@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default 
>     link/ether 92:33:17:e6:e7:1d brd ff:ff:ff:ff:ff:ff link-netnsid 0
> [nsid 1][ROUTE]broadcast 192.168.0.255 dev vxlan10  table local  proto kernel  scope link  src 192.168.0.249 
> [nsid 1][ROUTE]192.168.0.0/24 dev vxlan10  proto kernel  scope link  src 192.168.0.249 
> [nsid 1][ROUTE]broadcast 192.168.0.0 dev vxlan10  table local  proto kernel  scope link  src 192.168.0.249 
> [nsid 1][ROUTE]fe80::/64 dev vxlan10  proto kernel  metric 256  pref medium
>
>
>  drivers/net/vxlan.c          |   2 +-
>  include/linux/netlink.h      |   2 +
>  include/net/net_namespace.h  |   2 +
>  include/uapi/linux/netlink.h |   1 +
>  net/core/net_namespace.c     | 132 ++++++++++++++++++++++++++++---------------
>  net/core/rtnetlink.c         |   2 +-
>  net/netlink/af_netlink.c     | 111 +++++++++++++++++++++++++-----------
>  7 files changed, 170 insertions(+), 82 deletions(-)
>
>
> Comments are welcome.
>
> Regards,
> Nicolas


* Re: [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns
  2015-05-08 12:02   ` [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns Eric W. Biederman
@ 2015-05-09 21:07     ` Nicolas Dichtel
  2015-05-22 20:50     ` Alexander Holler
  1 sibling, 0 replies; 52+ messages in thread
From: Nicolas Dichtel @ 2015-05-09 21:07 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: netdev, tgraf, davem

On 08/05/2015 14:02, Eric W. Biederman wrote:
>
> So I am dense.  I have read through the patches and I don't see where
> you tag packets from other network namespaces with a network namespace
> id.
In patch #7, the netns id is stored in the skb cb (NETLINK_CB) by the function
do_one_broadcast().
The function netlink_cmsg_listen_all_nsid() then sends this information to
userland as cmsg data. This function is called by netlink_recvmsg().
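
On the userland side, reading that cmsg could look roughly like the sketch
below. It assumes the option number from patch #7 (NETLINK_LISTEN_ALL_NSID,
8) and the SOL_NETLINK level that put_cmsg() is called with; the function
names enable_listen_all_nsid() and nsid_from_cmsg() are hypothetical, and the
fallback defines are only there for older headers.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif
#ifndef NETLINK_LISTEN_ALL_NSID
#define NETLINK_LISTEN_ALL_NSID 8	/* value added by patch #7 */
#endif

/* Ask the kernel to also deliver notifications from netns that have a nsid. */
int enable_listen_all_nsid(int fd)
{
	int one = 1;

	return setsockopt(fd, SOL_NETLINK, NETLINK_LISTEN_ALL_NSID,
			  &one, sizeof(one));
}

/*
 * Walk the ancillary data of a message filled in by recvmsg().
 * Returns 1 and stores the nsid if the cmsg is present, 0 otherwise.
 */
int nsid_from_cmsg(struct msghdr *msg, int *nsid)
{
	struct cmsghdr *cmsg;

	for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
		if (cmsg->cmsg_level == SOL_NETLINK &&
		    cmsg->cmsg_type == NETLINK_LISTEN_ALL_NSID &&
		    cmsg->cmsg_len == CMSG_LEN(sizeof(int))) {
			memcpy(nsid, CMSG_DATA(cmsg), sizeof(int));
			return 1;
		}
	}
	return 0;
}
```

The caller is expected to pass a msghdr whose msg_control buffer is at least
CMSG_SPACE(sizeof(int)) bytes, otherwise the cmsg is truncated.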

> In fact I don't even see an attribute that is appropriate for such
> tagging.
The netns id is the same attribute that is used to identify a peer netns
via rtnetlink.


Regards,
Nicolas


* Re: [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns
  2015-05-07  9:02 ` [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns Nicolas Dichtel
                     ` (7 preceding siblings ...)
  2015-05-08 12:02   ` [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns Eric W. Biederman
@ 2015-05-10  2:15   ` David Miller
  2015-05-20 14:19     ` [PATCH iproute2-next 0/6] Allow to monitor 'all-nsid' with ip and ip xfrm Nicolas Dichtel
  8 siblings, 1 reply; 52+ messages in thread
From: David Miller @ 2015-05-10  2:15 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev, tgraf, ebiederm

From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Thu,  7 May 2015 11:02:46 +0200

> This idea was informally discussed in Ottawa / netdev0.1. The goal is to
> ease the use/scalability of netns, from a userland point of view.
> Today, users need to open one netlink socket per family and per netns.
> Thus, when the number of netns increases (for example 5K or more), the
> number of sockets needed to manage them grows a lot.
> 
> The goal of this series is to be able to monitor netlink events, for a
> specified family, for a set of netns, with only one netlink socket. For
> this purpose, a netlink socket option is added: NETLINK_LISTEN_ALL_NSID.
> When this option is set on a netlink socket, this socket will receive
> netlink notifications from all netns that have a nsid assigned into the
> netns where the socket has been opened.
> The nsid is sent to userland via ancillary data.

Series applied, thanks Nicolas.


* [PATCH iproute2-next 0/6] Allow to monitor 'all-nsid' with ip and ip xfrm
  2015-05-10  2:15   ` David Miller
@ 2015-05-20 14:19     ` Nicolas Dichtel
  2015-05-20 14:19       ` [PATCH iproute2-next 1/6] include: update linux/netlink.h Nicolas Dichtel
                         ` (5 more replies)
  0 siblings, 6 replies; 52+ messages in thread
From: Nicolas Dichtel @ 2015-05-20 14:19 UTC (permalink / raw)
  To: shemminger; +Cc: netdev


The goal of this series is to take advantage of the new netlink socket option:
NETLINK_LISTEN_ALL_NSID.

Here is an output example. vxlan10 is created in the current netns (netns0,
nsid 0) and then moved to another netns (netns1, nsid 1):

$ ip netns exec netns0 ip monitor all-nsid label
[nsid 0][NSID]nsid 1 (iproute2 netns name: netns1)
[nsid 0][NEIGH]??? lladdr 00:00:00:00:00:00 REACHABLE,PERMANENT
[nsid 0][LINK]5: vxlan10@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default 
    link/ether 92:33:17:e6:e7:1d brd ff:ff:ff:ff:ff:ff
[nsid 0][LINK]Deleted 5: vxlan10@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default 
    link/ether 92:33:17:e6:e7:1d brd ff:ff:ff:ff:ff:ff
[nsid 1][NSID]nsid 0 (iproute2 netns name: netns0)
[nsid 1][LINK]5: vxlan10@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default 
    link/ether 92:33:17:e6:e7:1d brd ff:ff:ff:ff:ff:ff link-netnsid 0
[nsid 1][ADDR]5: vxlan10    inet 192.168.0.249/24 brd 192.168.0.255 scope global vxlan10
       valid_lft forever preferred_lft forever
[nsid 1][ROUTE]local 192.168.0.249 dev vxlan10  table local  proto kernel  scope host  src 192.168.0.249 
[nsid 1][ROUTE]ff00::/8 dev vxlan10  table local  metric 256  pref medium
[nsid 1][ROUTE]2001:123::/64 dev vxlan10  proto kernel  metric 256  pref medium
[nsid 1][LINK]5: vxlan10@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default 
    link/ether 92:33:17:e6:e7:1d brd ff:ff:ff:ff:ff:ff link-netnsid 0
[nsid 1][ROUTE]broadcast 192.168.0.255 dev vxlan10  table local  proto kernel  scope link  src 192.168.0.249 
[nsid 1][ROUTE]192.168.0.0/24 dev vxlan10  proto kernel  scope link  src 192.168.0.249 
[nsid 1][ROUTE]broadcast 192.168.0.0 dev vxlan10  table local  proto kernel  scope link  src 192.168.0.249 
[nsid 1][ROUTE]fe80::/64 dev vxlan10  proto kernel  metric 256  pref medium
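
The "[nsid N]" prefix seen in the output above can be sketched as a small
formatting helper. This is only an illustration: format_nsid_prefix() is a
hypothetical name, and treating a negative nsid as "current" is an assumption
that follows the prefix logic added to ip monitor later in this series.

```c
#include <stdio.h>

/* Hypothetical helper: write the netns prefix for one message into buf.
 * A negative nsid means that no nsid was received with the message. */
static int format_nsid_prefix(char *buf, size_t len, int nsid)
{
	if (nsid < 0)
		return snprintf(buf, len, "[nsid current]");
	return snprintf(buf, len, "[nsid %d]", nsid);
}
```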


 bridge/monitor.c        |  1 +
 genl/ctrl.c             | 17 +++++++++----
 include/libnetlink.h    | 15 ++++++++++--
 include/linux/netlink.h |  1 +
 ip/ip_common.h          |  1 +
 ip/ipaddress.c          |  8 ++++--
 ip/iplink.c             |  1 +
 ip/ipmonitor.c          | 65 +++++++++++++++++++++++++++----------------------
 ip/ipnetconf.c          | 11 +++++++--
 ip/ipnetns.c            |  1 +
 ip/iproute.c            |  9 ++++---
 ip/rtmon.c              | 12 ++++++---
 ip/xfrm_monitor.c       | 15 +++++++++++-
 lib/libnetlink.c        | 49 ++++++++++++++++++++++++++++++++++---
 man/man3/libnetlink.3   |  7 +++---
 man/man8/ip-monitor.8   | 36 +++++++++++++++++++++++++++
 man/man8/ip-xfrm.8      | 21 +++++++++++++++-
 tc/tc_monitor.c         |  1 +
 18 files changed, 216 insertions(+), 55 deletions(-)

Comments are welcome.

Regards,
Nicolas


* [PATCH iproute2-next 1/6] include: update linux/netlink.h
  2015-05-20 14:19     ` [PATCH iproute2-next 0/6] Allow to monitor 'all-nsid' with ip and ip xfrm Nicolas Dichtel
@ 2015-05-20 14:19       ` Nicolas Dichtel
  2015-05-20 14:19       ` [PATCH iproute2-next 2/6] man: update ip monitor page Nicolas Dichtel
                         ` (4 subsequent siblings)
  5 siblings, 0 replies; 52+ messages in thread
From: Nicolas Dichtel @ 2015-05-20 14:19 UTC (permalink / raw)
  To: shemminger; +Cc: netdev, Nicolas Dichtel

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/linux/netlink.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index e0a09df1a687..0c89ddd73f50 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -108,6 +108,7 @@ struct nlmsgerr {
 #define NETLINK_NO_ENOBUFS	5
 #define NETLINK_RX_RING		6
 #define NETLINK_TX_RING		7
+#define NETLINK_LISTEN_ALL_NSID	8
 
 struct nl_pktinfo {
 	__u32	group;
-- 
2.2.2


* [PATCH iproute2-next 2/6] man: update ip monitor page
  2015-05-20 14:19     ` [PATCH iproute2-next 0/6] Allow to monitor 'all-nsid' with ip and ip xfrm Nicolas Dichtel
  2015-05-20 14:19       ` [PATCH iproute2-next 1/6] include: update linux/netlink.h Nicolas Dichtel
@ 2015-05-20 14:19       ` Nicolas Dichtel
  2015-05-20 14:19       ` [PATCH iproute2-next 3/6] libnetlink: introduce rtnl_listen_filter_t Nicolas Dichtel
                         ` (3 subsequent siblings)
  5 siblings, 0 replies; 52+ messages in thread
From: Nicolas Dichtel @ 2015-05-20 14:19 UTC (permalink / raw)
  To: shemminger; +Cc: netdev, Nicolas Dichtel

Add label option.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 man/man8/ip-monitor.8 | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/man/man8/ip-monitor.8 b/man/man8/ip-monitor.8
index 9663349fe9c7..3123b4014cc2 100644
--- a/man/man8/ip-monitor.8
+++ b/man/man8/ip-monitor.8
@@ -12,6 +12,8 @@ ip-monitor, rtmon \- state monitoring
 .IR OBJECT-LIST " ] ["
 .BI file " FILENAME "
 ] [
+.BI label
+] [
 .BI dev " DEVICE "
 ]
 .sp
@@ -42,6 +44,8 @@ command is the first in the command line and then the object list follows:
 .IR OBJECT-LIST " ] ["
 .BI file " FILENAME "
 ] [
+.BI label
+] [
 .BI dev " DEVICE "
 ]
 
@@ -59,6 +63,19 @@ described in previous sections.
 
 .P
 If the
+.BI label
+option is set, a prefix is displayed before each message to
+show the family of the message. For example:
+.sp
+.in +2
+[NEIGH]10.16.0.112 dev eth0 lladdr 00:04:23:df:2f:d0 REACHABLE
+[LINK]3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN group default
+    link/ether 52:54:00:12:34:57 brd ff:ff:ff:ff:ff:ff
+.in -2
+.sp
+
+.P
+If the
 .BI file
 option is given, the program does not listen on RTNETLINK,
 but opens the given file, and dumps its contents. The file
@@ -97,3 +114,5 @@ option is given, the program prints only events related to this device.
 
 .SH AUTHOR
 Original Manpage by Michail Litvak <mci@owl.openwall.com>
+.br
+Manpage revised by Nicolas Dichtel <nicolas.dichtel@6wind.com>
-- 
2.2.2


* [PATCH iproute2-next 3/6] libnetlink: introduce rtnl_listen_filter_t
  2015-05-20 14:19     ` [PATCH iproute2-next 0/6] Allow to monitor 'all-nsid' with ip and ip xfrm Nicolas Dichtel
  2015-05-20 14:19       ` [PATCH iproute2-next 1/6] include: update linux/netlink.h Nicolas Dichtel
  2015-05-20 14:19       ` [PATCH iproute2-next 2/6] man: update ip monitor page Nicolas Dichtel
@ 2015-05-20 14:19       ` Nicolas Dichtel
  2015-05-20 14:19       ` [PATCH iproute2-next 4/6] ipmonitor: introduce print_headers Nicolas Dichtel
                         ` (2 subsequent siblings)
  5 siblings, 0 replies; 52+ messages in thread
From: Nicolas Dichtel @ 2015-05-20 14:19 UTC (permalink / raw)
  To: shemminger; +Cc: netdev, Nicolas Dichtel

There is no functional change with this commit. It only prepares the next one.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 bridge/monitor.c      |  1 +
 genl/ctrl.c           | 17 ++++++++++++-----
 include/libnetlink.h  | 11 +++++++++--
 ip/ip_common.h        |  1 +
 ip/ipaddress.c        |  8 ++++++--
 ip/iplink.c           |  1 +
 ip/ipmonitor.c        |  3 ++-
 ip/ipnetconf.c        | 11 +++++++++--
 ip/ipnetns.c          |  1 +
 ip/iproute.c          |  9 ++++++---
 ip/rtmon.c            | 12 +++++++++---
 ip/xfrm_monitor.c     |  1 +
 lib/libnetlink.c      |  8 ++++----
 man/man3/libnetlink.3 |  7 ++++---
 tc/tc_monitor.c       |  1 +
 15 files changed, 67 insertions(+), 25 deletions(-)

diff --git a/bridge/monitor.c b/bridge/monitor.c
index 9e1ed48c5141..d8341ec5fbf1 100644
--- a/bridge/monitor.c
+++ b/bridge/monitor.c
@@ -36,6 +36,7 @@ static void usage(void)
 }
 
 static int accept_msg(const struct sockaddr_nl *who,
+		      struct rtnl_ctrl_data *ctrl,
 		      struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = arg;
diff --git a/genl/ctrl.c b/genl/ctrl.c
index 3546129087ec..87d2334a084c 100644
--- a/genl/ctrl.c
+++ b/genl/ctrl.c
@@ -177,8 +177,9 @@ static int print_ctrl_grp(FILE *fp, struct rtattr *arg, __u32 ctrl_ver)
 /*
  * The controller sends one nlmsg per family
 */
-static int print_ctrl(const struct sockaddr_nl *who, struct nlmsghdr *n,
-		      void *arg)
+static int print_ctrl(const struct sockaddr_nl *who,
+		      struct rtnl_ctrl_data *ctrl,
+		      struct nlmsghdr *n, void *arg)
 {
 	struct rtattr *tb[CTRL_ATTR_MAX + 1];
 	struct genlmsghdr *ghdr = NLMSG_DATA(n);
@@ -281,6 +282,12 @@ static int print_ctrl(const struct sockaddr_nl *who, struct nlmsghdr *n,
 	return 0;
 }
 
+static int print_ctrl2(const struct sockaddr_nl *who,
+		      struct nlmsghdr *n, void *arg)
+{
+	return print_ctrl(who, NULL, n, arg);
+}
+
 static int ctrl_list(int cmd, int argc, char **argv)
 {
 	struct rtnl_handle rth;
@@ -339,7 +346,7 @@ static int ctrl_list(int cmd, int argc, char **argv)
 			goto ctrl_done;
 		}
 
-		if (print_ctrl(NULL, nlh, (void *) stdout) < 0) {
+		if (print_ctrl2(NULL, nlh, (void *) stdout) < 0) {
 			fprintf(stderr, "Dump terminated\n");
 			goto ctrl_done;
 		}
@@ -355,7 +362,7 @@ static int ctrl_list(int cmd, int argc, char **argv)
 			goto ctrl_done;
 		}
 
-		rtnl_dump_filter(&rth, print_ctrl, stdout);
+		rtnl_dump_filter(&rth, print_ctrl2, stdout);
 
         }
 
@@ -408,5 +415,5 @@ static int parse_ctrl(struct genl_util *a, int argc, char **argv)
 struct genl_util ctrl_genl_util = {
 	.name = "ctrl",
 	.parse_genlopt = parse_ctrl,
-	.print_genlopt = print_ctrl,
+	.print_genlopt = print_ctrl2,
 };
diff --git a/include/libnetlink.h b/include/libnetlink.h
index 898275b824d4..1b9c9255ce1d 100644
--- a/include/libnetlink.h
+++ b/include/libnetlink.h
@@ -41,9 +41,16 @@ extern int rtnl_dump_request(struct rtnl_handle *rth, int type, void *req,
 			     int len)
 	__attribute__((warn_unused_result));
 
+struct rtnl_ctrl_data {
+};
+
 typedef int (*rtnl_filter_t)(const struct sockaddr_nl *,
 			     struct nlmsghdr *n, void *);
 
+typedef int (*rtnl_listen_filter_t)(const struct sockaddr_nl *,
+				    struct rtnl_ctrl_data *,
+				    struct nlmsghdr *n, void *);
+
 struct rtnl_dump_filter_arg
 {
 	rtnl_filter_t filter;
@@ -118,9 +125,9 @@ static inline const char *rta_getattr_str(const struct rtattr *rta)
 	return (const char *)RTA_DATA(rta);
 }
 
-extern int rtnl_listen(struct rtnl_handle *, rtnl_filter_t handler,
+extern int rtnl_listen(struct rtnl_handle *, rtnl_listen_filter_t handler,
 		       void *jarg);
-extern int rtnl_from_file(FILE *, rtnl_filter_t handler,
+extern int rtnl_from_file(FILE *, rtnl_listen_filter_t handler,
 		       void *jarg);
 
 #define NLMSG_TAIL(nmsg) \
diff --git a/ip/ip_common.h b/ip/ip_common.h
index b082734d9e0c..f120f5b97143 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -33,6 +33,7 @@ extern int print_prefix(const struct sockaddr_nl *who,
 extern int print_rule(const struct sockaddr_nl *who,
 		      struct nlmsghdr *n, void *arg);
 extern int print_netconf(const struct sockaddr_nl *who,
+			 struct rtnl_ctrl_data *ctrl,
 			 struct nlmsghdr *n, void *arg);
 extern void netns_map_init(void);
 extern int print_nsid(const struct sockaddr_nl *who,
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 92afa4904917..f36ccfb0d0fa 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -1111,7 +1111,9 @@ static int save_nlmsg(const struct sockaddr_nl *who, struct nlmsghdr *n,
 	return ret == n->nlmsg_len ? 0 : ret;
 }
 
-static int show_handler(const struct sockaddr_nl *nl, struct nlmsghdr *n, void *arg)
+static int show_handler(const struct sockaddr_nl *nl,
+			struct rtnl_ctrl_data *ctrl,
+			struct nlmsghdr *n, void *arg)
 {
 	struct ifaddrmsg *ifa = NLMSG_DATA(n);
 
@@ -1128,7 +1130,9 @@ static int ipaddr_showdump(void)
 	exit(rtnl_from_file(stdin, &show_handler, NULL));
 }
 
-static int restore_handler(const struct sockaddr_nl *nl, struct nlmsghdr *n, void *arg)
+static int restore_handler(const struct sockaddr_nl *nl,
+			   struct rtnl_ctrl_data *ctrl,
+			   struct nlmsghdr *n, void *arg)
 {
 	int ret;
 
diff --git a/ip/iplink.c b/ip/iplink.c
index bb437b96239a..79a86011eb2c 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -180,6 +180,7 @@ static int get_addr_gen_mode(const char *mode)
 static int have_rtnl_newlink = -1;
 
 static int accept_msg(const struct sockaddr_nl *who,
+		      struct rtnl_ctrl_data *ctrl,
 		      struct nlmsghdr *n, void *arg)
 {
 	struct nlmsgerr *err = (struct nlmsgerr *)NLMSG_DATA(n);
diff --git a/ip/ipmonitor.c b/ip/ipmonitor.c
index 1205ee1c7039..27bbe4410644 100644
--- a/ip/ipmonitor.c
+++ b/ip/ipmonitor.c
@@ -38,6 +38,7 @@ static void usage(void)
 }
 
 static int accept_msg(const struct sockaddr_nl *who,
+		      struct rtnl_ctrl_data *ctrl,
 		      struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = (FILE*)arg;
@@ -122,7 +123,7 @@ static int accept_msg(const struct sockaddr_nl *who,
 	if (n->nlmsg_type == RTM_NEWNETCONF) {
 		if (prefix_banner)
 			fprintf(fp, "[NETCONF]");
-		print_netconf(who, n, arg);
+		print_netconf(who, ctrl, n, arg);
 		return 0;
 	}
 	if (n->nlmsg_type == NLMSG_TSTAMP) {
diff --git a/ip/ipnetconf.c b/ip/ipnetconf.c
index aa31ead06863..eca6eeee834d 100644
--- a/ip/ipnetconf.c
+++ b/ip/ipnetconf.c
@@ -40,7 +40,8 @@ static void usage(void)
 
 #define NETCONF_RTA(r)	((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct netconfmsg))))
 
-int print_netconf(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
+int print_netconf(const struct sockaddr_nl *who, struct rtnl_ctrl_data *ctrl,
+		  struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = (FILE*)arg;
 	struct netconfmsg *ncm = NLMSG_DATA(n);
@@ -123,6 +124,12 @@ int print_netconf(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 	return 0;
 }
 
+static int print_netconf2(const struct sockaddr_nl *who,
+			  struct nlmsghdr *n, void *arg)
+{
+	return print_netconf(who, NULL, n, arg);
+}
+
 void ipnetconf_reset_filter(int ifindex)
 {
 	memset(&filter, 0, sizeof(filter));
@@ -177,7 +184,7 @@ dump:
 			perror("Cannot send dump request");
 			exit(1);
 		}
-		if (rtnl_dump_filter(&rth, print_netconf, stdout) < 0) {
+		if (rtnl_dump_filter(&rth, print_netconf2, stdout) < 0) {
 			fprintf(stderr, "Dump terminated\n");
 			exit(1);
 		}
diff --git a/ip/ipnetns.c b/ip/ipnetns.c
index 438d59bc222e..019f954cc6f2 100644
--- a/ip/ipnetns.c
+++ b/ip/ipnetns.c
@@ -43,6 +43,7 @@ static struct rtnl_handle rtnsh = { .fd = -1 };
 static int have_rtnl_getnsid = -1;
 
 static int ipnetns_accept_msg(const struct sockaddr_nl *who,
+			      struct rtnl_ctrl_data *ctrl,
 			      struct nlmsghdr *n, void *arg)
 {
 	struct nlmsgerr *err = (struct nlmsgerr *)NLMSG_DATA(n);
diff --git a/ip/iproute.c b/ip/iproute.c
index 670a4c64d235..8bca11a8132a 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -1681,8 +1681,9 @@ static int iproute_get(int argc, char **argv)
 	exit(0);
 }
 
-static int restore_handler(const struct sockaddr_nl *nl, struct nlmsghdr *n,
-			   void *arg)
+static int restore_handler(const struct sockaddr_nl *nl,
+			   struct rtnl_ctrl_data *ctrl,
+			   struct nlmsghdr *n, void *arg)
 {
 	int ret;
 
@@ -1724,7 +1725,9 @@ static int iproute_restore(void)
 	exit(rtnl_from_file(stdin, &restore_handler, NULL));
 }
 
-static int show_handler(const struct sockaddr_nl *nl, struct nlmsghdr *n, void *arg)
+static int show_handler(const struct sockaddr_nl *nl,
+			struct rtnl_ctrl_data *ctrl,
+			struct nlmsghdr *n, void *arg)
 {
 	print_route(nl, n, stdout);
 	return 0;
diff --git a/ip/rtmon.c b/ip/rtmon.c
index ff685e530d95..42b24fb5fd38 100644
--- a/ip/rtmon.c
+++ b/ip/rtmon.c
@@ -45,8 +45,8 @@ static void write_stamp(FILE *fp)
 	fwrite((void*)n1, 1, NLMSG_ALIGN(n1->nlmsg_len), fp);
 }
 
-static int dump_msg(const struct sockaddr_nl *who, struct nlmsghdr *n,
-		    void *arg)
+static int dump_msg(const struct sockaddr_nl *who, struct rtnl_ctrl_data *ctrl,
+		    struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = (FILE*)arg;
 	if (!init_phase)
@@ -56,6 +56,12 @@ static int dump_msg(const struct sockaddr_nl *who, struct nlmsghdr *n,
 	return 0;
 }
 
+static int dump_msg2(const struct sockaddr_nl *who,
+		     struct nlmsghdr *n, void *arg)
+{
+	return dump_msg(who, NULL, n, arg);
+}
+
 static void usage(void)
 {
 	fprintf(stderr, "Usage: rtmon file FILE [ all | LISTofOBJECTS]\n");
@@ -163,7 +169,7 @@ main(int argc, char **argv)
 
 	write_stamp(fp);
 
-	if (rtnl_dump_filter(&rth, dump_msg, fp) < 0) {
+	if (rtnl_dump_filter(&rth, dump_msg2, fp) < 0) {
 		fprintf(stderr, "Dump terminated\n");
 		return 1;
 	}
diff --git a/ip/xfrm_monitor.c b/ip/xfrm_monitor.c
index 58c7d7f46b44..2119c51d92ac 100644
--- a/ip/xfrm_monitor.c
+++ b/ip/xfrm_monitor.c
@@ -290,6 +290,7 @@ static int xfrm_mapping_print(const struct sockaddr_nl *who,
 }
 
 static int xfrm_accept_msg(const struct sockaddr_nl *who,
+			   struct rtnl_ctrl_data *ctrl,
 			   struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = (FILE*)arg;
diff --git a/lib/libnetlink.c b/lib/libnetlink.c
index 77e07ef7cf60..01b65cf806c0 100644
--- a/lib/libnetlink.c
+++ b/lib/libnetlink.c
@@ -419,7 +419,7 @@ int rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n, pid_t peer,
 }
 
 int rtnl_listen(struct rtnl_handle *rtnl,
-		rtnl_filter_t handler,
+		rtnl_listen_filter_t handler,
 		void *jarg)
 {
 	int status;
@@ -475,7 +475,7 @@ int rtnl_listen(struct rtnl_handle *rtnl,
 				exit(1);
 			}
 
-			err = handler(&nladdr, h, jarg);
+			err = handler(&nladdr, NULL, h, jarg);
 			if (err < 0)
 				return err;
 
@@ -493,7 +493,7 @@ int rtnl_listen(struct rtnl_handle *rtnl,
 	}
 }
 
-int rtnl_from_file(FILE *rtnl, rtnl_filter_t handler,
+int rtnl_from_file(FILE *rtnl, rtnl_listen_filter_t handler,
 		   void *jarg)
 {
 	int status;
@@ -541,7 +541,7 @@ int rtnl_from_file(FILE *rtnl, rtnl_filter_t handler,
 			return -1;
 		}
 
-		err = handler(&nladdr, h, jarg);
+		err = handler(&nladdr, NULL, h, jarg);
 		if (err < 0)
 			return err;
 	}
diff --git a/man/man3/libnetlink.3 b/man/man3/libnetlink.3
index e999bd68237a..99be9cc9f533 100644
--- a/man/man3/libnetlink.3
+++ b/man/man3/libnetlink.3
@@ -33,7 +33,8 @@ int rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n, pid_t peer,
 	      void *jarg)
 .sp
 int rtnl_listen(struct rtnl_handle *rtnl, 
-	      int (*handler)(struct sockaddr_nl *,struct nlmsghdr *n, void *),
+	      int (*handler)(struct sockaddr_nl *, struct rtnl_ctrl_data *,
+			     struct nlmsghdr *n, void *),
 	      void *jarg)
 .sp
 int rtnl_from_file(FILE *rtnl, 
@@ -108,8 +109,8 @@ rtnl_listen
 Receive netlink data after a request and pass it to 
 .I handler.
 .B handler
-is a callback that gets the message source address, the message itself,
-and the
+is a callback that gets the message source address, ancillary data, the message
+itself, and the
 .B jarg
 cookie as arguments. It will get called for all received messages.
 Only one message bundle is received. If there is a message
diff --git a/tc/tc_monitor.c b/tc/tc_monitor.c
index 0efe0343db0b..cae3616145c8 100644
--- a/tc/tc_monitor.c
+++ b/tc/tc_monitor.c
@@ -36,6 +36,7 @@ static void usage(void)
 
 
 static int accept_tcmsg(const struct sockaddr_nl *who,
+			struct rtnl_ctrl_data *ctrl,
 			struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = (FILE*)arg;
-- 
2.2.2


* [PATCH iproute2-next 4/6] ipmonitor: introduce print_headers
  2015-05-20 14:19     ` [PATCH iproute2-next 0/6] Allow to monitor 'all-nsid' with ip and ip xfrm Nicolas Dichtel
                         ` (2 preceding siblings ...)
  2015-05-20 14:19       ` [PATCH iproute2-next 3/6] libnetlink: introduce rtnl_listen_filter_t Nicolas Dichtel
@ 2015-05-20 14:19       ` Nicolas Dichtel
  2015-05-20 14:20       ` [PATCH iproute2-next 5/6] ipmonitor: allows to monitor in several netns Nicolas Dichtel
  2015-05-20 14:20       ` [PATCH iproute2-next 6/6] xfrmmonitor: " Nicolas Dichtel
  5 siblings, 0 replies; 52+ messages in thread
From: Nicolas Dichtel @ 2015-05-20 14:19 UTC (permalink / raw)
  To: shemminger; +Cc: netdev, Nicolas Dichtel

The goal of this patch is to avoid code duplication.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 ip/ipmonitor.c | 45 +++++++++++++++++++--------------------------
 1 file changed, 19 insertions(+), 26 deletions(-)

diff --git a/ip/ipmonitor.c b/ip/ipmonitor.c
index 27bbe4410644..cae186d86153 100644
--- a/ip/ipmonitor.c
+++ b/ip/ipmonitor.c
@@ -37,6 +37,15 @@ static void usage(void)
 	exit(-1);
 }
 
+static void print_headers(FILE *fp, char *label)
+{
+	if (timestamp)
+		print_timestamp(fp);
+
+	if (prefix_banner)
+		fprintf(fp, "%s", label);
+}
+
 static int accept_msg(const struct sockaddr_nl *who,
 		      struct rtnl_ctrl_data *ctrl,
 		      struct nlmsghdr *n, void *arg)
@@ -55,42 +64,31 @@ static int accept_msg(const struct sockaddr_nl *who,
 		if (r->rtm_flags & RTM_F_CLONED)
 			return 0;
 
-		if (timestamp)
-			print_timestamp(fp);
-
 		if (r->rtm_family == RTNL_FAMILY_IPMR ||
 		    r->rtm_family == RTNL_FAMILY_IP6MR) {
-			if (prefix_banner)
-				fprintf(fp, "[MROUTE]");
+			print_headers(fp, "[MROUTE]");
 			print_mroute(who, n, arg);
 			return 0;
 		} else {
-			if (prefix_banner)
-				fprintf(fp, "[ROUTE]");
+			print_headers(fp, "[ROUTE]");
 			print_route(who, n, arg);
 			return 0;
 		}
 	}
 
-	if (timestamp)
-		print_timestamp(fp);
-
 	if (n->nlmsg_type == RTM_NEWLINK || n->nlmsg_type == RTM_DELLINK) {
 		ll_remember_index(who, n, NULL);
-		if (prefix_banner)
-			fprintf(fp, "[LINK]");
+		print_headers(fp, "[LINK]");
 		print_linkinfo(who, n, arg);
 		return 0;
 	}
 	if (n->nlmsg_type == RTM_NEWADDR || n->nlmsg_type == RTM_DELADDR) {
-		if (prefix_banner)
-			fprintf(fp, "[ADDR]");
+		print_headers(fp, "[ADDR]");
 		print_addrinfo(who, n, arg);
 		return 0;
 	}
 	if (n->nlmsg_type == RTM_NEWADDRLABEL || n->nlmsg_type == RTM_DELADDRLABEL) {
-		if (prefix_banner)
-			fprintf(fp, "[ADDRLABEL]");
+		print_headers(fp, "[ADDRLABEL]");
 		print_addrlabel(who, n, arg);
 		return 0;
 	}
@@ -103,26 +101,22 @@ static int accept_msg(const struct sockaddr_nl *who,
 				return 0;
 		}
 
-		if (prefix_banner)
-			fprintf(fp, "[NEIGH]");
+		print_headers(fp, "[NEIGH]");
 		print_neigh(who, n, arg);
 		return 0;
 	}
 	if (n->nlmsg_type == RTM_NEWPREFIX) {
-		if (prefix_banner)
-			fprintf(fp, "[PREFIX]");
+		print_headers(fp, "[PREFIX]");
 		print_prefix(who, n, arg);
 		return 0;
 	}
 	if (n->nlmsg_type == RTM_NEWRULE || n->nlmsg_type == RTM_DELRULE) {
-		if (prefix_banner)
-			fprintf(fp, "[RULE]");
+		print_headers(fp, "[RULE]");
 		print_rule(who, n, arg);
 		return 0;
 	}
 	if (n->nlmsg_type == RTM_NEWNETCONF) {
-		if (prefix_banner)
-			fprintf(fp, "[NETCONF]");
+		print_headers(fp, "[NETCONF]");
 		print_netconf(who, ctrl, n, arg);
 		return 0;
 	}
@@ -131,8 +125,7 @@ static int accept_msg(const struct sockaddr_nl *who,
 		return 0;
 	}
 	if (n->nlmsg_type == RTM_NEWNSID || n->nlmsg_type == RTM_DELNSID) {
-		if (prefix_banner)
-			fprintf(fp, "[NSID]");
+		print_headers(fp, "[NSID]");
 		print_nsid(who, n, arg);
 		return 0;
 	}
-- 
2.2.2


* [PATCH iproute2-next 5/6] ipmonitor: allows to monitor in several netns
  2015-05-20 14:19     ` [PATCH iproute2-next 0/6] Allow to monitor 'all-nsid' with ip and ip xfrm Nicolas Dichtel
                         ` (3 preceding siblings ...)
  2015-05-20 14:19       ` [PATCH iproute2-next 4/6] ipmonitor: introduce print_headers Nicolas Dichtel
@ 2015-05-20 14:20       ` Nicolas Dichtel
  2015-05-20 14:20       ` [PATCH iproute2-next 6/6] xfrmmonitor: " Nicolas Dichtel
  5 siblings, 0 replies; 52+ messages in thread
From: Nicolas Dichtel @ 2015-05-20 14:20 UTC (permalink / raw)
  To: shemminger; +Cc: netdev, Nicolas Dichtel

With this patch, it's now possible to listen in all netns that have an nsid
assigned into the netns where the socket is opened.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/libnetlink.h  |  4 ++++
 ip/ipmonitor.c        | 39 ++++++++++++++++++++++++++-------------
 lib/libnetlink.c      | 43 ++++++++++++++++++++++++++++++++++++++++++-
 man/man8/ip-monitor.8 | 17 +++++++++++++++++
 4 files changed, 89 insertions(+), 14 deletions(-)

diff --git a/include/libnetlink.h b/include/libnetlink.h
index 1b9c9255ce1d..bd9bde091abe 100644
--- a/include/libnetlink.h
+++ b/include/libnetlink.h
@@ -20,6 +20,8 @@ struct rtnl_handle
 	__u32			dump;
 	int			proto;
 	FILE		       *dump_fp;
+#define RTNL_HANDLE_F_LISTEN_ALL_NSID		0x01
+	int			flags;
 };
 
 extern int rcvbuf;
@@ -42,6 +44,7 @@ extern int rtnl_dump_request(struct rtnl_handle *rth, int type, void *req,
 	__attribute__((warn_unused_result));
 
 struct rtnl_ctrl_data {
+	int	nsid;
 };
 
 typedef int (*rtnl_filter_t)(const struct sockaddr_nl *,
@@ -125,6 +128,7 @@ static inline const char *rta_getattr_str(const struct rtattr *rta)
 	return (const char *)RTA_DATA(rta);
 }
 
+extern int rtnl_listen_all_nsid(struct rtnl_handle *);
 extern int rtnl_listen(struct rtnl_handle *, rtnl_listen_filter_t handler,
 		       void *jarg);
 extern int rtnl_from_file(FILE *, rtnl_listen_filter_t handler,
diff --git a/ip/ipmonitor.c b/ip/ipmonitor.c
index cae186d86153..8bcf8822b398 100644
--- a/ip/ipmonitor.c
+++ b/ip/ipmonitor.c
@@ -26,22 +26,30 @@
 
 static void usage(void) __attribute__((noreturn));
 int prefix_banner;
+int listen_all_nsid;
 
 static void usage(void)
 {
-	fprintf(stderr, "Usage: ip monitor [ all | LISTofOBJECTS ] [ FILE ]"
-			"[ label ] [dev DEVICE]\n");
+	fprintf(stderr, "Usage: ip monitor [ all | LISTofOBJECTS ] [ FILE ] "
+			"[ label ] [all-nsid] [dev DEVICE]\n");
 	fprintf(stderr, "LISTofOBJECTS := link | address | route | mroute | prefix |\n");
 	fprintf(stderr, "                 neigh | netconf | rule | nsid\n");
 	fprintf(stderr, "FILE := file FILENAME\n");
 	exit(-1);
 }
 
-static void print_headers(FILE *fp, char *label)
+static void print_headers(FILE *fp, char *label, struct rtnl_ctrl_data *ctrl)
 {
 	if (timestamp)
 		print_timestamp(fp);
 
+	if (listen_all_nsid) {
+		if (ctrl == NULL || ctrl->nsid < 0)
+			fprintf(fp, "[nsid current]");
+		else
+			fprintf(fp, "[nsid %d]", ctrl->nsid);
+	}
+
 	if (prefix_banner)
 		fprintf(fp, "%s", label);
 }
@@ -66,11 +74,11 @@ static int accept_msg(const struct sockaddr_nl *who,
 
 		if (r->rtm_family == RTNL_FAMILY_IPMR ||
 		    r->rtm_family == RTNL_FAMILY_IP6MR) {
-			print_headers(fp, "[MROUTE]");
+			print_headers(fp, "[MROUTE]", ctrl);
 			print_mroute(who, n, arg);
 			return 0;
 		} else {
-			print_headers(fp, "[ROUTE]");
+			print_headers(fp, "[ROUTE]", ctrl);
 			print_route(who, n, arg);
 			return 0;
 		}
@@ -78,17 +86,17 @@ static int accept_msg(const struct sockaddr_nl *who,
 
 	if (n->nlmsg_type == RTM_NEWLINK || n->nlmsg_type == RTM_DELLINK) {
 		ll_remember_index(who, n, NULL);
-		print_headers(fp, "[LINK]");
+		print_headers(fp, "[LINK]", ctrl);
 		print_linkinfo(who, n, arg);
 		return 0;
 	}
 	if (n->nlmsg_type == RTM_NEWADDR || n->nlmsg_type == RTM_DELADDR) {
-		print_headers(fp, "[ADDR]");
+		print_headers(fp, "[ADDR]", ctrl);
 		print_addrinfo(who, n, arg);
 		return 0;
 	}
 	if (n->nlmsg_type == RTM_NEWADDRLABEL || n->nlmsg_type == RTM_DELADDRLABEL) {
-		print_headers(fp, "[ADDRLABEL]");
+		print_headers(fp, "[ADDRLABEL]", ctrl);
 		print_addrlabel(who, n, arg);
 		return 0;
 	}
@@ -101,22 +109,22 @@ static int accept_msg(const struct sockaddr_nl *who,
 				return 0;
 		}
 
-		print_headers(fp, "[NEIGH]");
+		print_headers(fp, "[NEIGH]", ctrl);
 		print_neigh(who, n, arg);
 		return 0;
 	}
 	if (n->nlmsg_type == RTM_NEWPREFIX) {
-		print_headers(fp, "[PREFIX]");
+		print_headers(fp, "[PREFIX]", ctrl);
 		print_prefix(who, n, arg);
 		return 0;
 	}
 	if (n->nlmsg_type == RTM_NEWRULE || n->nlmsg_type == RTM_DELRULE) {
-		print_headers(fp, "[RULE]");
+		print_headers(fp, "[RULE]", ctrl);
 		print_rule(who, n, arg);
 		return 0;
 	}
 	if (n->nlmsg_type == RTM_NEWNETCONF) {
-		print_headers(fp, "[NETCONF]");
+		print_headers(fp, "[NETCONF]", ctrl);
 		print_netconf(who, ctrl, n, arg);
 		return 0;
 	}
@@ -125,7 +133,7 @@ static int accept_msg(const struct sockaddr_nl *who,
 		return 0;
 	}
 	if (n->nlmsg_type == RTM_NEWNSID || n->nlmsg_type == RTM_DELNSID) {
-		print_headers(fp, "[NSID]");
+		print_headers(fp, "[NSID]", ctrl);
 		print_nsid(who, n, arg);
 		return 0;
 	}
@@ -178,6 +186,8 @@ int do_ipmonitor(int argc, char **argv)
 			file = *argv;
 		} else if (matches(*argv, "label") == 0) {
 			prefix_banner = 1;
+		} else if (matches(*argv, "all-nsid") == 0) {
+			listen_all_nsid = 1;
 		} else if (matches(*argv, "link") == 0) {
 			llink=1;
 			groups = 0;
@@ -284,6 +294,9 @@ int do_ipmonitor(int argc, char **argv)
 
 	if (rtnl_open(&rth, groups) < 0)
 		exit(1);
+	if (listen_all_nsid && rtnl_listen_all_nsid(&rth) < 0)
+		exit(1);
+
 	ll_init_map(&rth);
 	netns_map_init();
 
diff --git a/lib/libnetlink.c b/lib/libnetlink.c
index 01b65cf806c0..424a5b6ffe8f 100644
--- a/lib/libnetlink.c
+++ b/lib/libnetlink.c
@@ -25,6 +25,10 @@
 
 #include "libnetlink.h"
 
+#ifndef SOL_NETLINK
+#define SOL_NETLINK 270
+#endif
+
 int rcvbuf = 1024 * 1024;
 
 void rtnl_close(struct rtnl_handle *rth)
@@ -418,6 +422,19 @@ int rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n, pid_t peer,
 	}
 }
 
+int rtnl_listen_all_nsid(struct rtnl_handle *rth)
+{
+	unsigned int on = 1;
+
+	if (setsockopt(rth->fd, SOL_NETLINK, NETLINK_LISTEN_ALL_NSID, &on,
+		       sizeof(on)) < 0) {
+		perror("NETLINK_LISTEN_ALL_NSID");
+		return -1;
+	}
+	rth->flags |= RTNL_HANDLE_F_LISTEN_ALL_NSID;
+	return 0;
+}
+
 int rtnl_listen(struct rtnl_handle *rtnl,
 		rtnl_listen_filter_t handler,
 		void *jarg)
@@ -433,6 +450,12 @@ int rtnl_listen(struct rtnl_handle *rtnl,
 		.msg_iovlen = 1,
 	};
 	char   buf[16384];
+	char   cmsgbuf[BUFSIZ];
+
+	if (rtnl->flags & RTNL_HANDLE_F_LISTEN_ALL_NSID) {
+		msg.msg_control = &cmsgbuf;
+		msg.msg_controllen = sizeof(cmsgbuf);
+	}
 
 	memset(&nladdr, 0, sizeof(nladdr));
 	nladdr.nl_family = AF_NETLINK;
@@ -441,6 +464,9 @@ int rtnl_listen(struct rtnl_handle *rtnl,
 
 	iov.iov_base = buf;
 	while (1) {
+		struct rtnl_ctrl_data ctrl;
+		struct cmsghdr *cmsg;
+
 		iov.iov_len = sizeof(buf);
 		status = recvmsg(rtnl->fd, &msg, 0);
 
@@ -461,6 +487,21 @@ int rtnl_listen(struct rtnl_handle *rtnl,
 			fprintf(stderr, "Sender address length == %d\n", msg.msg_namelen);
 			exit(1);
 		}
+
+		if (rtnl->flags & RTNL_HANDLE_F_LISTEN_ALL_NSID) {
+			memset(&ctrl, 0, sizeof(ctrl));
+			ctrl.nsid = -1;
+			for (cmsg = CMSG_FIRSTHDR(&msg); cmsg;
+			     cmsg = CMSG_NXTHDR(&msg, cmsg))
+				if (cmsg->cmsg_level == SOL_NETLINK &&
+				    cmsg->cmsg_type == NETLINK_LISTEN_ALL_NSID &&
+				    cmsg->cmsg_len == CMSG_LEN(sizeof(int))) {
+					int *data = (int *)CMSG_DATA(cmsg);
+
+					ctrl.nsid = *data;
+				}
+		}
+
 		for (h = (struct nlmsghdr*)buf; status >= sizeof(*h); ) {
 			int err;
 			int len = h->nlmsg_len;
@@ -475,7 +516,7 @@ int rtnl_listen(struct rtnl_handle *rtnl,
 				exit(1);
 			}
 
-			err = handler(&nladdr, NULL, h, jarg);
+			err = handler(&nladdr, &ctrl, h, jarg);
 			if (err < 0)
 				return err;
 
diff --git a/man/man8/ip-monitor.8 b/man/man8/ip-monitor.8
index 3123b4014cc2..d2bd381a8c32 100644
--- a/man/man8/ip-monitor.8
+++ b/man/man8/ip-monitor.8
@@ -14,6 +14,8 @@ ip-monitor, rtmon \- state monitoring
 ] [
 .BI label
 ] [
+.BI all-nsid
+] [
 .BI dev " DEVICE "
 ]
 .sp
@@ -46,6 +48,8 @@ command is the first in the command line and then the object list follows:
 ] [
 .BI label
 ] [
+.BI all-nsid
+] [
 .BI dev " DEVICE "
 ]
 
@@ -76,6 +80,19 @@ show the family of the message. For example:
 
 .P
 If the
+.BI all-nsid
+option is set, the program listens to all network namespaces that have a
+nsid assigned into the network namespace where the program is running.
+A prefix is displayed to show the network namespace where the message
+originates. Example:
+.sp
+.in +2
+[nsid 0]10.16.0.112 dev eth0 lladdr 00:04:23:df:2f:d0 REACHABLE
+.in -2
+.sp
+
+.P
+If the
 .BI file
 option is given, the program does not listen on RTNETLINK,
 but opens the given file, and dumps its contents. The file
-- 
2.2.2


* [PATCH iproute2-next 6/6] xfrmmonitor: allows to monitor in several netns
  2015-05-20 14:19     ` [PATCH iproute2-next 0/6] Allow to monitor 'all-nsid' with ip and ip xfrm Nicolas Dichtel
                         ` (4 preceding siblings ...)
  2015-05-20 14:20       ` [PATCH iproute2-next 5/6] ipmonitor: allows to monitor in several netns Nicolas Dichtel
@ 2015-05-20 14:20       ` Nicolas Dichtel
  5 siblings, 0 replies; 52+ messages in thread
From: Nicolas Dichtel @ 2015-05-20 14:20 UTC (permalink / raw)
  To: shemminger; +Cc: netdev, Nicolas Dichtel

With this patch, it's now possible to listen in all netns that have an nsid
assigned into the netns where the socket is opened.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 ip/xfrm_monitor.c  | 14 +++++++++++++-
 man/man8/ip-xfrm.8 | 21 ++++++++++++++++++++-
 2 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/ip/xfrm_monitor.c b/ip/xfrm_monitor.c
index 2119c51d92ac..ebccb71c688e 100644
--- a/ip/xfrm_monitor.c
+++ b/ip/xfrm_monitor.c
@@ -35,10 +35,11 @@
 #include "ip_common.h"
 
 static void usage(void) __attribute__((noreturn));
+int listen_all_nsid;
 
 static void usage(void)
 {
-	fprintf(stderr, "Usage: ip xfrm monitor [ all | OBJECTS | help ]\n");
+	fprintf(stderr, "Usage: ip xfrm monitor [all-nsid] [ all | OBJECTS | help ]\n");
 	fprintf(stderr, "OBJECTS := { acquire | expire | SA | aevent | policy | report }\n");
 	exit(-1);
 }
@@ -298,6 +299,13 @@ static int xfrm_accept_msg(const struct sockaddr_nl *who,
 	if (timestamp)
 		print_timestamp(fp);
 
+	if (listen_all_nsid) {
+		if (ctrl == NULL || ctrl->nsid < 0)
+			fprintf(fp, "[nsid current]");
+		else
+			fprintf(fp, "[nsid %d]", ctrl->nsid);
+	}
+
 	switch (n->nlmsg_type) {
 	case XFRM_MSG_NEWSA:
 	case XFRM_MSG_DELSA:
@@ -360,6 +368,8 @@ int do_xfrm_monitor(int argc, char **argv)
 		if (matches(*argv, "file") == 0) {
 			NEXT_ARG();
 			file = *argv;
+		} else if (matches(*argv, "all-nsid") == 0) {
+			listen_all_nsid = 1;
 		} else if (matches(*argv, "acquire") == 0) {
 			lacquire=1;
 			groups = 0;
@@ -412,6 +422,8 @@ int do_xfrm_monitor(int argc, char **argv)
 
 	if (rtnl_open_byproto(&rth, groups, NETLINK_XFRM) < 0)
 		exit(1);
+	if (listen_all_nsid && rtnl_listen_all_nsid(&rth) < 0)
+		exit(1);
 
 	if (rtnl_listen(&rth, xfrm_accept_msg, (void*)stdout) < 0)
 		exit(2);
diff --git a/man/man8/ip-xfrm.8 b/man/man8/ip-xfrm.8
index 29b397f35959..489ab6ed4964 100644
--- a/man/man8/ip-xfrm.8
+++ b/man/man8/ip-xfrm.8
@@ -364,7 +364,11 @@ ip-xfrm \- transform configuration
 .BR required " | " use
 
 .ti -8
-.BR "ip xfrm monitor" " [ " all " |"
+.BR "ip xfrm monitor" " ["
+.BI all-nsid
+] [
+.BI all
+ |
 .IR LISTofXFRM-OBJECTS " ]"
 
 .ti -8
@@ -669,7 +673,22 @@ ip xfrm monitor 	state monitoring for xfrm objects
 .PP
 The xfrm objects to monitor can be optionally specified.
 
+.P
+If the
+.BI all-nsid
+option is set, the program listens to all network namespaces that have a
+nsid assigned into the network namespace where the program is running.
+A prefix is displayed to show the network namespace where the message
+originates. Example:
+.sp
+.in +2
+[nsid 1]Flushed state proto 0
+.in -2
+.sp
+
 .SH AUTHOR
 Manpage revised by David Ward <david.ward@ll.mit.edu>
 .br
 Manpage revised by Christophe Gouault <christophe.gouault@6wind.com>
+.br
+Manpage revised by Nicolas Dichtel <nicolas.dichtel@6wind.com>
-- 
2.2.2

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns
  2015-05-08 12:02   ` [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns Eric W. Biederman
  2015-05-09 21:07     ` Nicolas Dichtel
@ 2015-05-22 20:50     ` Alexander Holler
  2015-05-22 21:04       ` Cong Wang
                         ` (2 more replies)
  1 sibling, 3 replies; 52+ messages in thread
From: Alexander Holler @ 2015-05-22 20:50 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: Eric W. Biederman, netdev, tgraf, davem

Am 08.05.2015 um 14:02 schrieb Eric W. Biederman:
>
> So I am dense.  I have read through the patches and I don't see where
> you tag packets from other network namespaces with a network namespace
> id.

Me too,

I've recently written a little tool called snetmanmon (source is
available at github) to monitor and handle network related events
by using rtnetlink.

Having seen this patch series (thanks!), I've played with it.

I've applied the patch series to v4.1-rc4.

Maybe I'm using or holding it wrong, but I've some comments.

First I think if NETLINK_LISTEN_ALL_NSID is enabled, a dump
of the interfaces through RTM_GETLINK together with NLM_F_DUMP and
NLM_F_REQUEST should return all interfaces of all reachable namespaces.

Next, if NETLINK_LISTEN_ALL_NSID is enabled, I receive RTM_NEWLINK
but without any indication of the namespace. E.g. if I do
	ip netns add netns1
	ip netns exec netns1 brctl addbr br0
the RTM_NEWLINK for br0 (received in the root ns, not netns1) doesn't
have the attribute IFLA_LINK_NETNSID.

Same for the RTM_DELLINK msg if I call
	ip netns exec netns1 brctl delbr br0
afterwards. So both netlink messages are looking like br0 was
created in the root ns.

Another problem seems to be with veth devices. E.g. if I do
	ip link add veth0 type veth peer name veth1
	ip link set veth1 netns netns1
I receive
	RTM_NEWLINK for veth0 (no nsid)
	RTM_NEWLINK for veth1 (no nsid)
	RTM_DELLINK for veth1 (no nsid)
	RTM_NEWLINK for veth1 (with nsid 0)
That looks ok, except the missing RTM_NEWLINK for lo in netns1, which
was created together with the namespace. But if I now request a dump,
I get
	RTM_NEWLINK for veth0 (with nsid 0)
which looks like veth0 is part of nsid 0, and I get nothing for veth1.
Of course, that veth device might be part of nsid 0 too (as veth1),
but its part named veth0 is not part of that namespace. So the
IFLA_LINK_NETNSID attribute received with the RTM_NEWLINK for veth0 through
the dump is misleading.

So it looks like either I missed something, I'm doing something wrong,
or there is still some work to do to make NETLINK_LISTEN_ALL_NSID work
as expected (or as my simple mind would expect it).

Thanks again for the patches, regards,

Alexander Holler

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns
  2015-05-22 20:50     ` Alexander Holler
@ 2015-05-22 21:04       ` Cong Wang
  2015-05-22 21:12         ` Alexander Holler
  2015-05-22 21:19       ` Eric W. Biederman
  2015-05-25  7:45       ` Nicolas Dichtel
  2 siblings, 1 reply; 52+ messages in thread
From: Cong Wang @ 2015-05-22 21:04 UTC (permalink / raw)
  To: Alexander Holler
  Cc: Nicolas Dichtel, Eric W. Biederman, netdev, Thomas Graf, David Miller

On Fri, May 22, 2015 at 1:50 PM, Alexander Holler <holler@ahsoftware.de> wrote:
> Am 08.05.2015 um 14:02 schrieb Eric W. Biederman:
>>
>>
>> So I am dense.  I have read through the patches and I don't see where
>> you tag packets from other network namespaces with a network namespace
>> id.
>
>
> Me too,
>
> I've recently written a little tool called snetmanmon (source is
> available at github) to monitor and handle network related events
> by using rtnetlink.
>
> Having seen this patch series (thanks!), I've played with it.
>
> I've applied the patch series to v4.1-rc4.
>
> Maybe I'm using or holding it wrong, but I've some comments.
>
> First I think if NETLINK_LISTEN_ALL_NSID is enabled, a dump
> of the interfaces through RTM_GETLINK together with NLM_F_DUMP and
> NLM_F_REQUEST should return all interfaces of all reachable namespaces.
>
> Next, if NETLINK_LISTEN_ALL_NSID is enabled, I receive RTM_NEWLINK
> but without any indication of the namespace. E.g. if I do
>         ip netns add netns1
>         ip netns exec netns1 brctl addbr br0
> the RTM_NEWLINK for br0 (received in the root ns, not netns1) doesn't
> have the attribute IFLA_LINK_NETNSID.


Bridge doesn't have an underlying link, so no LINK_NETNSID. LINK_NETNSID
is only added when its underlying link is in a different netns.


>
> Same for the RTM_DELLINK msg if I call
>         ip netns exec netns1 brctl delbr br0
> afterwards. So both netlink messages are looking like br0 was
> created in the root ns.
>
> Another problem seems to be with veth devices. E.g. if I do
>         ip link add veth0 type veth peer name veth1
>         ip link set veth1 netns netns1
> I receive
>         RTM_NEWLINK for veth0 (no nsid)
>         RTM_NEWLINK for veth1 (no nsid)
>         RTM_DELLINK for veth1 (no nsid)
>         RTM_NEWLINK for veth1 (with nsid 0)
> That looks ok, except the missing RTM_NEWLINK for lo in netns1, which
> was created together with the namespace. But if I now request a dump,
> I get
>         RTM_NEWLINK for veth0 (with nsid 0)
> which looks like veth0 is part of nsid 0, and I get nothing for veth1.
> Of course, that vlan device might be part of nsid 0 too (as veth1),
> but its part named veth0 is not part of that namespace. So the
> IFLA_LINK_NETNSID attribute received with the RTM_NEWLINK for veth0 through
> the dump is misleading.

That is because the code tries to do "lazy" allocation for the netnsid:
it defers it until the dump. The veth case is special here given how the
two devices pair. I noticed the same "problem" (it doesn't have to be a bug)
when I reviewed the code, but nobody cared. ;-/

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns
  2015-05-22 21:04       ` Cong Wang
@ 2015-05-22 21:12         ` Alexander Holler
  2015-05-22 21:29           ` Cong Wang
  0 siblings, 1 reply; 52+ messages in thread
From: Alexander Holler @ 2015-05-22 21:12 UTC (permalink / raw)
  To: Cong Wang
  Cc: Nicolas Dichtel, Eric W. Biederman, netdev, Thomas Graf, David Miller

Am 22.05.2015 um 23:04 schrieb Cong Wang:
> On Fri, May 22, 2015 at 1:50 PM, Alexander Holler <holler@ahsoftware.de> wrote:
>> Am 08.05.2015 um 14:02 schrieb Eric W. Biederman:
>>>
>>>
>>> So I am dense.  I have read through the patches and I don't see where
>>> you tag packets from other network namespaces with a network namespace
>>> id.
>>
>>
>> Me too,
>>
>> I've recently written a little tool called snetmanmon (source is
>> available at github) to monitor and handle network related events
>> by using rtnetlink.
>>
>> Having seen this patch series (thanks!), I've played with it.
>>
>> I've applied the patch series to v4.1-rc4.
>>
>> Maybe I'm using or holding it wrong, but I've some comments.
>>
>> First I think if NETLINK_LISTEN_ALL_NSID is enabled, a dump
>> of the interfaces through RTM_GETLINK together with NLM_F_DUMP and
>> NLM_F_REQUEST should return all interfaces of all reachable namespaces.
>>
>> Next, if NETLINK_LISTEN_ALL_NSID is enabled, I receive RTM_NEWLINK
>> but without any indication of the namespace. E.g. if I do
>>          ip netns add netns1
>>          ip netns exec netns1 brctl addbr br0
>> the RTM_NEWLINK for br0 (received in the root ns, not netns1) doesn't
>> have the attribute IFLA_LINK_NETNSID.
>
>
> Bridge doesn't have an underlying link, so no LINK_NETNSID. LINK_NETNSID
> is only added when its underlying link is in a different netns.

I'm using "link" as a synonym for interface. Maybe I've no idea what the
attribute LINK_NETNSID really means, but I've understood it as the one
attribute which indicates the namespace an interface (or link), br0 in
my example, lives in.

Regards,

Alexander Holler

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns
  2015-05-22 20:50     ` Alexander Holler
  2015-05-22 21:04       ` Cong Wang
@ 2015-05-22 21:19       ` Eric W. Biederman
  2015-05-22 21:30         ` Alexander Holler
  2015-05-25  7:45       ` Nicolas Dichtel
  2 siblings, 1 reply; 52+ messages in thread
From: Eric W. Biederman @ 2015-05-22 21:19 UTC (permalink / raw)
  To: Alexander Holler; +Cc: Nicolas Dichtel, netdev, tgraf, davem

Alexander Holler <holler@ahsoftware.de> writes:

> Am 08.05.2015 um 14:02 schrieb Eric W. Biederman:
>>
>> So I am dense.  I have read through the patches and I don't see where
>> you tag packets from other network namespaces with a network namespace
>> id.
>
> Me too,

You need to use recvmsg, and then parse out the NETLINK_LISTEN_ALL_NSID
control message.

It isn't a netlink attribute that is being returned.
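
For illustration, a minimal sketch of that recvmsg path in C, modelled on
the rtnl_listen() hunk earlier in this thread. The helper names
enable_all_nsid() and nsid_from_msghdr() are made up for this example, and
the fallback #defines are only an assumption for builds against older
headers. Returning -1 when no control message is present mirrors how the
iproute2 patch prints "[nsid current]" for a negative nsid.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Fallbacks for older headers (assumption: values as in current uapi). */
#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif
#ifndef NETLINK_LISTEN_ALL_NSID
#define NETLINK_LISTEN_ALL_NSID 8
#endif

/* Ask for notifications from every netns that has an nsid assigned here. */
int enable_all_nsid(int fd)
{
	int on = 1;

	return setsockopt(fd, SOL_NETLINK, NETLINK_LISTEN_ALL_NSID,
			  &on, sizeof(on));
}

/*
 * Walk the ancillary data that recvmsg() filled in and return the nsid,
 * or -1 if the message carries no NETLINK_LISTEN_ALL_NSID control message.
 */
int nsid_from_msghdr(struct msghdr *msg)
{
	struct cmsghdr *cmsg;

	for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
		if (cmsg->cmsg_level == SOL_NETLINK &&
		    cmsg->cmsg_type == NETLINK_LISTEN_ALL_NSID &&
		    cmsg->cmsg_len == CMSG_LEN(sizeof(int))) {
			int nsid;

			/* memcpy avoids assuming CMSG_DATA() alignment */
			memcpy(&nsid, CMSG_DATA(cmsg), sizeof(nsid));
			return nsid;
		}
	}
	return -1;
}
```

The caller has to point msg_control/msg_controllen at a buffer of at least
CMSG_SPACE(sizeof(int)) bytes before calling recvmsg(), otherwise the
control message is dropped.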

Eric

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns
  2015-05-22 21:12         ` Alexander Holler
@ 2015-05-22 21:29           ` Cong Wang
  2015-05-22 21:46             ` Alexander Holler
  0 siblings, 1 reply; 52+ messages in thread
From: Cong Wang @ 2015-05-22 21:29 UTC (permalink / raw)
  To: Alexander Holler
  Cc: Nicolas Dichtel, Eric W. Biederman, netdev, Thomas Graf, David Miller

On Fri, May 22, 2015 at 2:12 PM, Alexander Holler <holler@ahsoftware.de> wrote:
>>
>> Bridge doesn't have an underlying link, so no LINK_NETNSID. LINK_NETNSID
>> is only added when its underlying link is in a different netns.
>
>
> I'm using "link" similiar as interface. Maybe I've no idea what the
> attribute LINK:NETSID really means, but I've understood it as the one
> attribute which indicates the namespace an interface (or link), br0 in my
> example, lives in.
>

It is for an underlying link. For example, the two ends of a veth pair are
each other's link, and a tunnel device has a link it uses to transmit packets.

Bridge and bonding are master devices where "slaves" (or ports for bridge)
can join.

A netns doesn't have a name or id by nature; we assign it a name by
bind-mounting some /proc file. These LINK_NETNSIDs are not absolutely unique
either, only relatively.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns
  2015-05-22 21:19       ` Eric W. Biederman
@ 2015-05-22 21:30         ` Alexander Holler
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander Holler @ 2015-05-22 21:30 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Nicolas Dichtel, netdev, tgraf, davem

Am 22.05.2015 um 23:19 schrieb Eric W. Biederman:
> Alexander Holler <holler@ahsoftware.de> writes:
>
>> Am 08.05.2015 um 14:02 schrieb Eric W. Biederman:
>>>
>>> So I am dense.  I have read through the patches and I don't see where
>>> you tag packets from other network namespaces with a network namespace
>>> id.
>>
>> Me too,
>
> You need to use recvmsg, and then parse out the NETLINK_LISTEN_ALL_NSID
> control message.
>
> It isn't a netlink attribute that is being returned.

Hmm. :(

Maybe I should have read the kernel sources or something else where this 
is mentioned. Looks like I've understood it totally wrong.

Thanks for the explanation.

Regards,

Alexander Holler

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns
  2015-05-22 21:29           ` Cong Wang
@ 2015-05-22 21:46             ` Alexander Holler
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander Holler @ 2015-05-22 21:46 UTC (permalink / raw)
  To: Cong Wang
  Cc: Nicolas Dichtel, Eric W. Biederman, netdev, Thomas Graf, David Miller

Am 22.05.2015 um 23:29 schrieb Cong Wang:
> On Fri, May 22, 2015 at 2:12 PM, Alexander Holler <holler@ahsoftware.de> wrote:
>>>
>>> Bridge doesn't have an underlying link, so no LINK_NETNSID. LINK_NETNSID
>>> is only added when its underlying link is in a different netns.
>>
>>
>> I'm using "link" similiar as interface. Maybe I've no idea what the
>> attribute LINK:NETSID really means, but I've understood it as the one
>> attribute which indicates the namespace an interface (or link), br0 in my
>> example, lives in.
>>
>
> It is for an underlying link for example: a veth pair is a link for each other,
> a tunnel device has a link to transmit packets.
>
> Bridge and bonding are master devices where "slaves" (or ports for bridge)
> can join.
>
> netns doesn't have a name or id by nature, we assign it a name by binding
> mount some /proc file, these LINK_NETNSID's are not absolutely unique either,
> just relatively.
>

Hmm. So making an inventory of all existing interfaces in all namespaces
is either not really possible or ugly. If these IDs are not unique, what
will I get when dumping the NETNSIDs? Sounds like I'd better forget that
approach of using these IDs, which means I would have to travel through
the mounts and use GETLINK (with dump) from inside these
namespaces, while having to identify the namespaces by their mount. That
leads me to the problem of how to find these mounts and ...

Not sure if I want to do that. It looked so easy to support namespaces
but now it looks rather complicated.

Anyway, thanks a lot for the quick and fast responses and explanations.

Regards,

Alexander Holler

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns
  2015-05-22 20:50     ` Alexander Holler
  2015-05-22 21:04       ` Cong Wang
  2015-05-22 21:19       ` Eric W. Biederman
@ 2015-05-25  7:45       ` Nicolas Dichtel
  2015-05-25 10:55         ` Alexander Holler
  2 siblings, 1 reply; 52+ messages in thread
From: Nicolas Dichtel @ 2015-05-25  7:45 UTC (permalink / raw)
  To: Alexander Holler; +Cc: Eric W. Biederman, netdev, tgraf, davem

Le 22/05/2015 22:50, Alexander Holler a écrit :
> Am 08.05.2015 um 14:02 schrieb Eric W. Biederman:
>>
>> So I am dense.  I have read through the patches and I don't see where
>> you tag packets from other network namespaces with a network namespace
>> id.
>
> Me too,
>
> I've recently written a little tool called snetmanmon (source is
> available at github) to monitor and handle network related events
> by using rtnetlink.
>
> Having seen this patch series (thanks!), I've played with it.
>
> I've applied the patch series to v4.1-rc4.
>
> Maybe I'm using or holding it wrong, but I've some comments.
>
> First I think if NETLINK_LISTEN_ALL_NSID is enabled, a dump
> of the interfaces through RTM_GETLINK together with NLM_F_DUMP and
> NLM_F_REQUEST should return all interfaces of all reachable namespaces.
This option is only for 'listening', i.e. spontaneous notifications from the
kernel. It does nothing for requests.

>
> Next, if NETLINK_LISTEN_ALL_NSID is enabled, I receive RTM_NEWLINK
> but without any indication of the namespace. E.g. if I do
>      ip netns add netns1
>      ip netns exec netns1 brctl addbr br0
> the RTM_NEWLINK for br0 (received in the root ns, not netns1) doesn't
> have the attribute IFLA_LINK_NETNSID.
The nsid is sent through a control message (see recvmsg).
Try the iproute2 net-next branch: 'ip monitor all-nsid'. It's an
example of how to use it.

>
> Same for the RTM_DELLINK msg if I call
>      ip netns exec netns1 brctl delbr br0
> afterwards. So both netlink messages are looking like br0 was
> created in the root ns.
>
> Another problem seems to be with veth devices. E.g. if I do
>      ip link add veth0 type veth peer name veth1
>      ip link set veth1 netns netns1
> I receive
>      RTM_NEWLINK for veth0 (no nsid)
>      RTM_NEWLINK for veth1 (no nsid)
>      RTM_DELLINK for veth1 (no nsid)
>      RTM_NEWLINK for veth1 (with nsid 0)
> That looks ok, except the missing RTM_NEWLINK for lo in netns1, which
The nsid for netns1 in the current netns is allocated when veth1 is moved to
netns1. At that time, lo has already existed for a long time, thus the kernel
won't send any notification for it.
Note, you can manually allocate it with 'ip netns set netns1 -1', but you
won't get any notifications for the loopback.

> was created together with the namespace. But if I now request a dump,
> I get
>      RTM_NEWLINK for veth0 (with nsid 0)
> which looks like veth0 is part of nsid 0, and I get nothing for veth1.
The netlink message gives information about veth1. With iproute2:
$ ip netns
netns1 (id: 0)
$ ip -d l ls veth0
9: veth0@if8: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
     link/ether 72:36:c0:f4:35:64 brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 0
     veth addrgenmode eui64

The peer veth is the interface with ifindex 8 (@if8) in netns1 (link-netnsid 0).
To get information about this interface, you need to dump it in netns1.

> Of course, that vlan device might be part of nsid 0 too (as veth1),
> but its part named veth0 is not part of that namespace. So the
> IFLA_LINK_NETNSID attribute received with the RTM_NEWLINK for veth0 through
> the dump is misleading.
Not sure I follow you. veth0 sits in the current netns (let's say init_net)
and veth1 in netns1.
So, when you dump veth0 in init_net, its link-netnsid is set to the id of
netns1 in init_net. And when you dump veth1 in netns1, its link-netnsid is set
to the id of init_net in netns1.
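
To make that concrete, here is a rough C sketch of how a listener could
pull IFLA_LINK_NETNSID out of an RTM_NEWLINK message. The function name
link_netnsid() is hypothetical; it only assumes uapi headers recent enough
to define IFLA_LINK_NETNSID. The value is the nsid of the peer's netns as
seen from the netns that sent the message, not the netns the device itself
lives in.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

/*
 * Return the IFLA_LINK_NETNSID attribute of an RTM_NEWLINK/RTM_DELLINK
 * message, or -1 if the attribute is absent (e.g. for a bridge, or when
 * the underlying link sits in the same netns).
 */
int link_netnsid(struct nlmsghdr *h)
{
	struct ifinfomsg *ifm = NLMSG_DATA(h);
	struct rtattr *rta = IFLA_RTA(ifm);
	int len;

	if (h->nlmsg_len < NLMSG_LENGTH(sizeof(*ifm)))
		return -1;
	len = h->nlmsg_len - NLMSG_LENGTH(sizeof(*ifm));

	/* Walk the top-level attributes that follow struct ifinfomsg. */
	for (; RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) {
		if (rta->rta_type == IFLA_LINK_NETNSID &&
		    RTA_PAYLOAD(rta) >= (int)sizeof(int)) {
			int id;

			memcpy(&id, RTA_DATA(rta), sizeof(id));
			return id;
		}
	}
	return -1;
}
```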

>
> So it looks like either I missed something, I'm doing something wrong,
> or there still is some work todo to make NETLINK_LISTEN_ALL_NSID work
> like expected (or like my simple mind would expect it).
A patch that allows performing requests from a netns foo for a netns bar
is doable, but much more complicated. And I think it requires more
thought. Let's see what will happen ;-)

>
> Thanks again for the patches, regards,
Thank you,


Regards,
Nicolas

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns
  2015-05-25  7:45       ` Nicolas Dichtel
@ 2015-05-25 10:55         ` Alexander Holler
  2015-05-25 13:09           ` Nicolas Dichtel
  0 siblings, 1 reply; 52+ messages in thread
From: Alexander Holler @ 2015-05-25 10:55 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: Eric W. Biederman, netdev, tgraf, davem

Am 25.05.2015 um 09:45 schrieb Nicolas Dichtel:
> Le 22/05/2015 22:50, Alexander Holler a écrit :

>> First I think if NETLINK_LISTEN_ALL_NSID is enabled, a dump
>> of the interfaces through RTM_GETLINK together with NLM_F_DUMP and
>> NLM_F_REQUEST should return all interfaces of all reachable namespaces.
> This option is only for 'listening', ie spontaneous notifications from the
> kernel. It does nothing for request.

The problem is that you need information about the affected interfaces.
E.g. if you receive a NEWADDR or NEWROUTE for some interface (indicated
by the index of the interface) in a(nother) namespace, how do you get
information about that interface, if not by a dump which includes the
interfaces of these namespaces too? Without knowledge about the
interface, these messages are not very usable. ;)

> Not sure to follow you. veth0 sits in the current netns (let's say
> init_net)
> and veth1 in netns1.
> So, when you dump veth0 in init_net, its link-netnsid is set to the id of
> netns1 in init_net. And when you dump veth1 in netns1, it's link-netnsid
> is set
> to the id of init_net in netns1.

I've misunderstood the meaning of IFLA_LINK_NETNSID. I thought it
indicates the namespace an interface lives in, but it indicates the
namespace it is linked to.

I've also thought that the NETNSID is a globally unique identifier of a
namespace, which seems to be wrong too. While I still have not read
through all the sources, the other comments suggest that the NSID
is just an ID which is unique only within one namespace, or in other words,
every namespace has its own set of nsids. I'm not sure if I'm now right
with that assumption, but that's what I now think after the responses to
my mail. ;)

So to conclude, I've now scheduled support for namespaces to a far later 
point. It doesn't seem to be as easy as I've thought after having read 
the introductory mail of your patch series. ;)

Regards,

Alexander Holler

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns
  2015-05-25 10:55         ` Alexander Holler
@ 2015-05-25 13:09           ` Nicolas Dichtel
  2015-05-26 10:53             ` Alexander Holler
  0 siblings, 1 reply; 52+ messages in thread
From: Nicolas Dichtel @ 2015-05-25 13:09 UTC (permalink / raw)
  To: Alexander Holler; +Cc: Eric W. Biederman, netdev, tgraf, davem

Le 25/05/2015 12:55, Alexander Holler a écrit :
> Am 25.05.2015 um 09:45 schrieb Nicolas Dichtel:
>> Le 22/05/2015 22:50, Alexander Holler a écrit :
>
>>> First I think if NETLINK_LISTEN_ALL_NSID is enabled, a dump
>>> of the interfaces through RTM_GETLINK together with NLM_F_DUMP and
>>> NLM_F_REQUEST should return all interfaces of all reachable namespaces.
>> This option is only for 'listening', ie spontaneous notifications from the
>> kernel. It does nothing for request.
>
> The problem is that you need informations about the affected interfaces. E.g. if
> you receive an NEWADDR or NEWROUTE for some interface (indicated by the index of
> the interface) in a(nother) namespace, how do you get informations about that
> interface, if not by a dump which includes the interfaces of these namespaces
> too? Without knowledge about the interface, these messages are not very usable. ;)
Yes, this is the right thing to do.

Usually, a daemon opens a socket to listen for netlink events. Then, it opens
another netlink socket to dump the configuration (interfaces, addresses,
routes, etc.) and fills its internal structures. From that point on, for
most configuration parameters, it no longer needs to do dumps and thus
it can close the second socket. This allows your daemon to have only one socket
to monitor a set of netns.
Look at iproute2 for example: it starts by dumping all interfaces before
executing the specified command.
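
As a rough illustration of the "second socket" half of that pattern, here is
a C sketch of a one-shot RTM_GETLINK dump on a temporary socket. The names
link_dump_req, build_link_dump_req() and dump_links_once() are invented for
this example; a real daemon would parse each RTM_NEWLINK into its own tables
instead of just counting, and would keep its long-lived listening socket
separate from this one.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Request layout for a link dump: netlink header followed by ifinfomsg. */
struct link_dump_req {
	struct nlmsghdr nlh;
	struct ifinfomsg ifm;
};

/* Fill in a dump request for all links in the socket's own netns. */
void build_link_dump_req(struct link_dump_req *req, unsigned int seq)
{
	memset(req, 0, sizeof(*req));
	req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(req->ifm));
	req->nlh.nlmsg_type = RTM_GETLINK;
	req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
	req->nlh.nlmsg_seq = seq;
	req->ifm.ifi_family = AF_UNSPEC;
}

/*
 * Open a temporary socket, dump all links, close the socket again.
 * Returns the number of RTM_NEWLINK messages seen, or -1 on error.
 */
int dump_links_once(void)
{
	struct link_dump_req req;
	_Alignas(struct nlmsghdr) char buf[16384];
	int fd, count = 0, done = 0;

	fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_ROUTE);
	if (fd < 0)
		return -1;

	build_link_dump_req(&req, 1);
	if (send(fd, &req, req.nlh.nlmsg_len, 0) < 0)
		goto fail;

	while (!done) {
		int len = recv(fd, buf, sizeof(buf), 0);
		struct nlmsghdr *h;

		if (len <= 0)
			goto fail;
		for (h = (struct nlmsghdr *)buf; NLMSG_OK(h, len);
		     h = NLMSG_NEXT(h, len)) {
			if (h->nlmsg_type == NLMSG_DONE) {
				done = 1;	/* end of the multipart dump */
				break;
			}
			if (h->nlmsg_type == NLMSG_ERROR)
				goto fail;
			if (h->nlmsg_type == RTM_NEWLINK)
				count++;
		}
	}
	close(fd);
	return count;
fail:
	close(fd);
	return -1;
}
```

Note that such a dump only covers the netns the socket was created in, which
is exactly the limitation discussed in this thread.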

>
>> Not sure to follow you. veth0 sits in the current netns (let's say
>> init_net)
>> and veth1 in netns1.
>> So, when you dump veth0 in init_net, its link-netnsid is set to the id of
>> netns1 in init_net. And when you dump veth1 in netns1, it's link-netnsid
>> is set
>> to the id of init_net in netns1.
>
> I've misunderstood the meaning of IFLA_LINK_NETNSID. I thought it indicates the
> namespace an interface lives in, but it indicates the namespace it is linked too.
Yes.

>
> I've also thought that the NETNSID is a global unique identifier of a namespace,
> which seems to be wrong too. While I still not have read through all the
> sources, the other comments are suggesting that the NSID is just an ID which is
> unique only in one namespace, or in other words, every namespace has its own set
> of nsids. I'm not sure if I'm now right with that assumption, but that's what I
> now think after the responses to my mail. ;)
Right, nsids are local to a netns. This makes it possible to migrate a
container. With a global id, that would not be possible. ifindexes are local
for the exact same purpose.

>
> So to conclude, I've now scheduled support for namespaces to a far later point.
> It doesn't seem to be as easy as I've thought after having read the introductory
> mail of your patch series. ;)
The main goal of the series was to improve scalability ;-)


Regards,
Nicolas

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns
  2015-05-25 13:09           ` Nicolas Dichtel
@ 2015-05-26 10:53             ` Alexander Holler
  2015-05-26 12:10               ` Nicolas Dichtel
  0 siblings, 1 reply; 52+ messages in thread
From: Alexander Holler @ 2015-05-26 10:53 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: Eric W. Biederman, netdev, tgraf, davem

Am 25.05.2015 um 15:09 schrieb Nicolas Dichtel:
> Le 25/05/2015 12:55, Alexander Holler a écrit :
>> Am 25.05.2015 um 09:45 schrieb Nicolas Dichtel:
>>> Le 22/05/2015 22:50, Alexander Holler a écrit :
>>
>>>> First I think if NETLINK_LISTEN_ALL_NSID is enabled, a dump
>>>> of the interfaces through RTM_GETLINK together with NLM_F_DUMP and
>>>> NLM_F_REQUEST should return all interfaces of all reachable namespaces.
>>> This option is only for 'listening', ie spontaneous notifications
>>> from the
>>> kernel. It does nothing for request.
>>
>> The problem is that you need informations about the affected
>> interfaces. E.g. if
>> you receive an NEWADDR or NEWROUTE for some interface (indicated by
>> the index of
>> the interface) in a(nother) namespace, how do you get informations
>> about that
>> interface, if not by a dump which includes the interfaces of these
>> namespaces
>> too? Without knowledge about the interface, these messages are not
>> very usable. ;)
> Yes, this is the right things.
>
> Usually, a daemon opens a socket to listen netlink event. Then, it opens
> another netlink socket to dump the configuration (interfaces, addresses,
> routes, etc.) and fill its internal structures. Starting from that
> point, for
> most of configuration parameters, it doesn't need anymore to do dumps
> and thus
> it can close the second socket. This allows your daemon to have only one
> socket
> to monitor a set a netns.
> Look at iproute for example, it starts by dumping all interfaces before
> executing the specified command.

Hmm, sounds like we're talking in different rooms about the same thing 
in regard to the dump. ;)

I just wanted to explain why I think this series misses the (extended) 
dump which includes all interfaces (those of other namespaces too).

How does one use NETLINK_LISTEN_ALL_NSID without being able to dump all
the interfaces of namespaces your patch series might send messages for?

The only way I currently see is to start the listening part before any
namespace is created. Doing so, it can fill its internal structures
with the RTM_NEWLINK messages (besides that missing one for lo). But how
do you get these RTM_NEWLINK messages for already created namespaces and
their interfaces, if not by a dump?

Regards,

Alexander Holler

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns
  2015-05-26 10:53             ` Alexander Holler
@ 2015-05-26 12:10               ` Nicolas Dichtel
  2015-05-26 14:36                 ` Alexander Holler
  0 siblings, 1 reply; 52+ messages in thread
From: Nicolas Dichtel @ 2015-05-26 12:10 UTC (permalink / raw)
  To: Alexander Holler; +Cc: Eric W. Biederman, netdev, tgraf, davem

Le 26/05/2015 12:53, Alexander Holler a écrit :
> Am 25.05.2015 um 15:09 schrieb Nicolas Dichtel:
[snip]
>
> Hmm, sounds like we're talking in different rooms about the same thing in regard
> to the dump. ;)
>
> I just wanted to explain why I think this series misses the (extended) dump
> which includes all interfaces (those of other namespaces too).
Héhé, I'm fully aware of the limitations. We move step by step, but feel free
to send a patch ;-)
More seriously, I'm thinking about that problem, but I have not started
anything yet and I don't know when I will have time to do it.
If I understand correctly, you are saying that this missing part is a blocker
for using the new socket option. I don't agree with this. Doing a dump in
another netns is easy to do.
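
One common way to do that today, sketched in C under a few assumptions: the
target netns is reachable through a path such as /var/run/netns/netns1 (the
location iproute2 uses), the caller is single-threaded and has the privilege
needed for setns(), and the function name nl_socket_in_netns() is made up
for this example. A netlink socket stays attached to the netns it was
created in, so the caller can switch back right after socket() and still
dump the other netns through the returned fd.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for setns() and CLONE_NEWNET */
#endif
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/*
 * Open a NETLINK_ROUTE socket that lives in the netns referred to by
 * ns_path. Returns the socket fd, or -1 on error. The caller is switched
 * back to its original netns before the function returns.
 */
int nl_socket_in_netns(const char *ns_path)
{
	int self_fd, ns_fd, sock = -1;

	/* Remember where we came from so we can switch back. */
	self_fd = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);
	if (self_fd < 0)
		return -1;

	ns_fd = open(ns_path, O_RDONLY | O_CLOEXEC);
	if (ns_fd < 0) {
		close(self_fd);
		return -1;
	}

	if (setns(ns_fd, CLONE_NEWNET) == 0) {
		sock = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC,
			      NETLINK_ROUTE);
		/* The socket keeps pointing at the target netns. */
		if (setns(self_fd, CLONE_NEWNET) < 0 && sock >= 0) {
			close(sock);
			sock = -1;
		}
	}

	close(ns_fd);
	close(self_fd);
	return sock;
}
```

An RTM_GETLINK dump sent on the returned socket then lists the interfaces of
that netns, which can be matched against the nsid carried by later
notifications.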

>
> How does one use NETLINK_LISTEN_ALL_NSID without beeing able to dump all the
> interfaces of namespaces your patch series might send messages for?
>
> The only way I currently see, is to start the listening part before any
> namespace is created. Doing so, it can fill it's internal structures with the
> RTM_NEWLINK messages (besides that missing one for lo). But how do you get these
> RTM_NEWLINK messages for already created namespaces and their interfaces, if not
> by a dump?
I don't understand why dumping in another netns is a problem.


Regards,
Nicolas

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns
  2015-05-26 12:10               ` Nicolas Dichtel
@ 2015-05-26 14:36                 ` Alexander Holler
  2015-05-29  5:57                   ` Alexander Holler
  0 siblings, 1 reply; 52+ messages in thread
From: Alexander Holler @ 2015-05-26 14:36 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: Eric W. Biederman, netdev, tgraf, davem

Am 26.05.2015 um 14:10 schrieb Nicolas Dichtel:
> Le 26/05/2015 12:53, Alexander Holler a écrit :
>> Am 25.05.2015 um 15:09 schrieb Nicolas Dichtel:
> [snip]
>>
>> Hmm, sounds like we're talking in different rooms about the same thing in
>> regard to the dump. ;)
>>
>> I just wanted to explain why I think this series misses the (extended) dump
>> which includes all interfaces (those of other namespaces too).
> Héhé, I'm fully aware of the limitations, we move step by step but feel free
> to send a patch ;-)

I wasn't very successful in sending kernel patches, so it would just be
a waste of time (for me and anyone else).

> More seriously, I'm thinking about that problem, but I have not started
> anything yet and I don't know when I will have time to do it.
> If I understand correctly, you are saying that this missing part is a blocker
> for using the new socket option. I don't agree with this. Doing a dump in
> another netns is easy to do.
>
>>
>> How does one use NETLINK_LISTEN_ALL_NSID without being able to dump all the
>> interfaces of namespaces your patch series might send messages for?
>>
>> The only way I currently see is to start the listening part before any
>> namespace is created. Doing so, it can fill its internal structures with the
>> RTM_NEWLINK messages (besides that missing one for lo). But how do you get
>> these RTM_NEWLINK messages for already created namespaces and their
>> interfaces, if not by a dump?
> I don't understand why dumping in another netns is a problem.

It isn't. I just wondered how you (or someone else) are using
NETLINK_LISTEN_ALL_NSID, assuming it already serves a purpose. ;)

But your last sentence explained it.

Regards,

Alexander Holler


* Re: [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns
  2015-05-26 14:36                 ` Alexander Holler
@ 2015-05-29  5:57                   ` Alexander Holler
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander Holler @ 2015-05-29  5:57 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: Eric W. Biederman, netdev, tgraf, davem

On 26.05.2015 at 16:36, Alexander Holler wrote:
> On 26.05.2015 at 14:10, Nicolas Dichtel wrote:
>> I don't understand why dumping in another netns is a problem.
>
> It isn't. I just wondered how you (or someone else) is using
> NETLINK_LISTEN_ALL_NSID, assuming it already serves a purpose. ;)

Maybe I should correct this, as I've lied because I got a bit bored. Sorry.

Dumping in another netns itself isn't a problem, but you need
knowledge about the other namespaces. And if you have to go through all
the pain to find and dump the interfaces in all other namespaces by
joining them, then there is not that much advantage in using
NETLINK_LISTEN_ALL_NSID, as you could just as well keep listening in the
other netns after the dump there.

But there's no need to continue the discussion. I've understood that the
patch series was only meant to reduce the number of sockets one would
otherwise have to continuously listen on.
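
For reference, a minimal C sketch of the single-socket listening side
discussed in this thread. It is not taken from the patches. The helper names
enable_all_nsid() and nsid_from_cmsg() are hypothetical, and the fallback
defines are only for older headers and assume the values used by the kernel
UAPI. It assumes the nsid arrives as an int in a control message of level
SOL_NETLINK and type NETLINK_LISTEN_ALL_NSID, which is how the cover letter's
"ancillary data" is read here. The initial per-netns dump is not covered.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Fallbacks for older headers (assumed to match the kernel UAPI values). */
#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif
#ifndef NETLINK_LISTEN_ALL_NSID
#define NETLINK_LISTEN_ALL_NSID 8
#endif

/*
 * Hypothetical helper: ask the kernel to deliver notifications from every
 * netns that has an nsid assigned in the socket's own netns.
 * Returns 0 on success, -1 on error (errno set by setsockopt).
 */
int enable_all_nsid(int fd)
{
	int on = 1;

	return setsockopt(fd, SOL_NETLINK, NETLINK_LISTEN_ALL_NSID,
			  &on, sizeof(on));
}

/*
 * Hypothetical helper: look for the nsid in the ancillary data of a
 * message received with recvmsg().
 * Returns 1 and stores the id in *nsid if found, 0 otherwise.
 */
int nsid_from_cmsg(struct msghdr *msg, int *nsid)
{
	struct cmsghdr *cmsg;

	for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
		if (cmsg->cmsg_level == SOL_NETLINK &&
		    cmsg->cmsg_type == NETLINK_LISTEN_ALL_NSID &&
		    cmsg->cmsg_len == CMSG_LEN(sizeof(int))) {
			/* Copy out instead of casting to avoid alignment issues. */
			memcpy(nsid, CMSG_DATA(cmsg), sizeof(int));
			return 1;
		}
	}
	return 0;
}
```

A listener would call enable_all_nsid() once on its NETLINK_ROUTE socket and
then nsid_from_cmsg() after each recvmsg() to decide which netns a
notification belongs to. A message without the control message would be
treated as coming from the socket's own netns.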

Alexander Holler


end of thread

Thread overview: 52+ messages
2015-05-06  9:58 [PATCH net-next 0/6] netns: ease netlink use with a lot of netns Nicolas Dichtel
2015-05-06  9:58 ` [PATCH net-next 1/6] netns: returns always an id in __peernet2id() Nicolas Dichtel
2015-05-06 11:19   ` Thomas Graf
2015-05-06  9:58 ` [PATCH net-next 2/6] netns: always provide the id to rtnl_net_fill() Nicolas Dichtel
2015-05-06 11:25   ` Thomas Graf
2015-05-06  9:58 ` [PATCH net-next 3/6] netns: rename peernet2id() to peernet2id_alloc() Nicolas Dichtel
2015-05-06 11:27   ` Thomas Graf
2015-05-06  9:58 ` [PATCH net-next 4/6] netns: notify new nsid outside __peernet2id() Nicolas Dichtel
2015-05-06 11:48   ` Thomas Graf
2015-05-06 13:39     ` Nicolas Dichtel
2015-05-06  9:58 ` [PATCH net-next 5/6] netns: use a spin_lock to protect nsid management Nicolas Dichtel
2015-05-06 12:23   ` Thomas Graf
2015-05-06 13:40     ` Nicolas Dichtel
2015-05-06 14:05       ` Thomas Graf
2015-05-06  9:58 ` [PATCH net-next 6/6] netlink: allow to listen "all" netns Nicolas Dichtel
2015-05-06 12:10   ` Thomas Graf
2015-05-06 13:42     ` Nicolas Dichtel
2015-05-07  9:02 ` [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns Nicolas Dichtel
2015-05-07  9:02   ` [PATCH net-next v2 1/7] netns: returns always an id in __peernet2id() Nicolas Dichtel
2015-05-07  9:02   ` [PATCH net-next v2 2/7] netns: always provide the id to rtnl_net_fill() Nicolas Dichtel
2015-05-07  9:02   ` [PATCH net-next v2 3/7] netns: rename peernet2id() to peernet2id_alloc() Nicolas Dichtel
2015-05-07  9:02   ` [PATCH net-next v2 4/7] netns: notify new nsid outside __peernet2id() Nicolas Dichtel
2015-05-07 11:47     ` Thomas Graf
2015-05-07  9:02   ` [PATCH net-next v2 5/7] netns: use a spin_lock to protect nsid management Nicolas Dichtel
2015-05-07  9:02   ` [PATCH net-next v2 6/7] netlink: rename private flags and states Nicolas Dichtel
2015-05-07 11:49     ` Thomas Graf
2015-05-07  9:02   ` [PATCH net-next v2 7/7] netlink: allow to listen "all" netns Nicolas Dichtel
2015-05-07 11:55     ` Thomas Graf
2015-05-08 12:02   ` [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns Eric W. Biederman
2015-05-09 21:07     ` Nicolas Dichtel
2015-05-22 20:50     ` Alexander Holler
2015-05-22 21:04       ` Cong Wang
2015-05-22 21:12         ` Alexander Holler
2015-05-22 21:29           ` Cong Wang
2015-05-22 21:46             ` Alexander Holler
2015-05-22 21:19       ` Eric W. Biederman
2015-05-22 21:30         ` Alexander Holler
2015-05-25  7:45       ` Nicolas Dichtel
2015-05-25 10:55         ` Alexander Holler
2015-05-25 13:09           ` Nicolas Dichtel
2015-05-26 10:53             ` Alexander Holler
2015-05-26 12:10               ` Nicolas Dichtel
2015-05-26 14:36                 ` Alexander Holler
2015-05-29  5:57                   ` Alexander Holler
2015-05-10  2:15   ` David Miller
2015-05-20 14:19     ` [PATCH iproute2-next 0/6] Allow to monitor 'all-nsid' with ip and ip xfrm Nicolas Dichtel
2015-05-20 14:19       ` [PATCH iproute2-next 1/6] include: update linux/netlink.h Nicolas Dichtel
2015-05-20 14:19       ` [PATCH iproute2-next 2/6] man: update ip monitor page Nicolas Dichtel
2015-05-20 14:19       ` [PATCH iproute2-next 3/6] libnetlink: introduce rtnl_listen_filter_t Nicolas Dichtel
2015-05-20 14:19       ` [PATCH iproute2-next 4/6] ipmonitor: introduce print_headers Nicolas Dichtel
2015-05-20 14:20       ` [PATCH iproute2-next 5/6] ipmonitor: allows to monitor in several netns Nicolas Dichtel
2015-05-20 14:20       ` [PATCH iproute2-next 6/6] xfrmmonitor: " Nicolas Dichtel
