All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR
@ 2018-08-28 23:18 Christian Brauner
  2018-08-28 23:18 ` [PATCH net-next 1/5] rtnetlink: add rtnl_get_net_ns_capable() Christian Brauner
                   ` (5 more replies)
  0 siblings, 6 replies; 23+ messages in thread
From: Christian Brauner @ 2018-08-28 23:18 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: davem, kuznet, yoshfuji, pombredanne, kstewart, gregkh, dsahern,
	fw, ktkhai, lucien.xin, jakub.kicinski, jbenc, nicolas.dichtel,
	Christian Brauner

From: Christian Brauner <christian.brauner@ubuntu.com>

Hey,

A while back we introduced and enabled IFLA_IF_NETNSID in
RTM_{DEL,GET,NEW}LINK requests (cf. [1], [2], [3], [4], [5]). This has led
to signficant performance increases since it allows userspace to avoid
taking the hit of a setns(netns_fd, CLONE_NEWNET), then getting the
interfaces from the netns associated with the netns_fd. Especially when a
lot of network namespaces are in use, using setns() becomes increasingly
problematic when performance matters.
Usually, RTML_GETLINK requests are followed by RTM_GETADDR requests (cf.
getifaddrs() style functions and friends). But currently, RTM_GETADDR
requests do not support a similar property like IFLA_IF_NETNSID for
RTM_*LINK requests.
This is problematic since userspace can retrieve interfaces from another
network namespace by sending a IFLA_IF_NETNSID property along but
RTM_GETLINK request but is still forced to use the legacy setns() style of
retrieving interfaces in RTM_GETADDR requests.

The goal of this series is to make it possible to perform RTM_GETADDR
requests on different network namespaces. To this end a new IFA_IF_NETNSID
property for RTM_*ADDR requests is introduced. It can be used to send a
network namespace identifier along in RTM_*ADDR requests.  The network
namespace identifier will be used to retrieve the target network namespace
in which the request is supposed to be fulfilled.  This aligns the behavior
of RTM_*ADDR requests with the behavior of RTM_*LINK requests.

Security:
- The caller must have assigned a valid network namespace identifier for
  the target network namespace.
- The caller must have CAP_NET_ADMIN in the owning user namespace of the
  target network namespace.

Thanks!
Christian

[1]: commit 7973bfd8758d ("rtnetlink: remove check for IFLA_IF_NETNSID")
[2]: commit 5bb8ed075428 ("rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK")
[3]: commit b61ad68a9fe8 ("rtnetlink: enable IFLA_IF_NETNSID for RTM_DELLINK")
[4]: commit c310bfcb6e1b ("rtnetlink: enable IFLA_IF_NETNSID for RTM_SETLINK")
[5]: commit 7c4f63ba8243 ("rtnetlink: enable IFLA_IF_NETNSID in do_setlink()")

Christian Brauner (5):
  rtnetlink: add rtnl_get_net_ns_capable()
  if_addr: add IFA_IF_NETNSID
  ipv4: enable IFA_IF_NETNSID for RTM_GETADDR
  ipv6: enable IFA_IF_NETNSID for RTM_GETADDR
  rtnetlink: move type calculation out of loop

 include/net/rtnetlink.h      |  1 +
 include/uapi/linux/if_addr.h |  1 +
 net/core/rtnetlink.c         | 15 +++++---
 net/ipv4/devinet.c           | 38 +++++++++++++++-----
 net/ipv6/addrconf.c          | 70 ++++++++++++++++++++++++++++--------
 5 files changed, 97 insertions(+), 28 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH net-next 1/5] rtnetlink: add rtnl_get_net_ns_capable()
  2018-08-28 23:18 [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR Christian Brauner
@ 2018-08-28 23:18 ` Christian Brauner
  2018-08-28 23:18 ` [PATCH net-next 2/5] if_addr: add IFA_IF_NETNSID Christian Brauner
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 23+ messages in thread
From: Christian Brauner @ 2018-08-28 23:18 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: davem, kuznet, yoshfuji, pombredanne, kstewart, gregkh, dsahern,
	fw, ktkhai, lucien.xin, jakub.kicinski, jbenc, nicolas.dichtel,
	Christian Brauner

get_target_net() will be used in follow-up patches in ipv{4,6} codepaths to
retrieve network namespaces based on network namespace identifiers. So
remove the static declaration and export in the rtnetlink header. Also,
rename it to rtnl_get_net_ns_capable() to make it obvious what this
function is doing.

Signed-off-by: Christian Brauner <christian@brauner.io>
---
 include/net/rtnetlink.h |  1 +
 net/core/rtnetlink.c    | 13 +++++++++----
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index 0bbaa5488423..cf26e5aacac4 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -165,6 +165,7 @@ int rtnl_configure_link(struct net_device *dev, const struct ifinfomsg *ifm);
 
 int rtnl_nla_parse_ifla(struct nlattr **tb, const struct nlattr *head, int len,
 			struct netlink_ext_ack *exterr);
+struct net *rtnl_get_net_ns_capable(struct sock *sk, int netnsid);
 
 #define MODULE_ALIAS_RTNL_LINK(kind) MODULE_ALIAS("rtnl-link-" kind)
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index e3f743c141b3..c6c6f93cd195 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1795,7 +1795,12 @@ static bool link_dump_filtered(struct net_device *dev,
 	return false;
 }
 
-static struct net *get_target_net(struct sock *sk, int netnsid)
+/**
+ * rtnl_get_net_ns_capable - Get a network namespace based on a netnsid.
+ *
+ * Returns the network namespace on success or a error pointer on failure.
+ */
+struct net *rtnl_get_net_ns_capable(struct sock *sk, int netnsid)
 {
 	struct net *net;
 
@@ -1847,7 +1852,7 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
 			ifla_policy, NULL) >= 0) {
 		if (tb[IFLA_IF_NETNSID]) {
 			netnsid = nla_get_s32(tb[IFLA_IF_NETNSID]);
-			tgt_net = get_target_net(skb->sk, netnsid);
+			tgt_net = rtnl_get_net_ns_capable(skb->sk, netnsid);
 			if (IS_ERR(tgt_net)) {
 				tgt_net = net;
 				netnsid = -1;
@@ -2715,7 +2720,7 @@ static int rtnl_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
 
 	if (tb[IFLA_IF_NETNSID]) {
 		netnsid = nla_get_s32(tb[IFLA_IF_NETNSID]);
-		tgt_net = get_target_net(NETLINK_CB(skb).sk, netnsid);
+		tgt_net = rtnl_get_net_ns_capable(NETLINK_CB(skb).sk, netnsid);
 		if (IS_ERR(tgt_net))
 			return PTR_ERR(tgt_net);
 	}
@@ -3125,7 +3130,7 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 
 	if (tb[IFLA_IF_NETNSID]) {
 		netnsid = nla_get_s32(tb[IFLA_IF_NETNSID]);
-		tgt_net = get_target_net(NETLINK_CB(skb).sk, netnsid);
+		tgt_net = rtnl_get_net_ns_capable(NETLINK_CB(skb).sk, netnsid);
 		if (IS_ERR(tgt_net))
 			return PTR_ERR(tgt_net);
 	}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net-next 2/5] if_addr: add IFA_IF_NETNSID
  2018-08-28 23:18 [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR Christian Brauner
  2018-08-28 23:18 ` [PATCH net-next 1/5] rtnetlink: add rtnl_get_net_ns_capable() Christian Brauner
@ 2018-08-28 23:18 ` Christian Brauner
  2018-08-28 23:18 ` [PATCH net-next 3/5] ipv4: enable IFA_IF_NETNSID for RTM_GETADDR Christian Brauner
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 23+ messages in thread
From: Christian Brauner @ 2018-08-28 23:18 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: davem, kuznet, yoshfuji, pombredanne, kstewart, gregkh, dsahern,
	fw, ktkhai, lucien.xin, jakub.kicinski, jbenc, nicolas.dichtel,
	Christian Brauner

This adds a new IFA_IF_NETNSID property to be used by address families such
as PF_INET and PF_INET6.
The IFA_IF_NETNSID property can be used to send a network namespace
identifier as part of a request. If a IFA_IF_NETNSID property is identified
it will be used to retrieve the target network namespace in which the
request is to be made.

Signed-off-by: Christian Brauner <christian@brauner.io>
---
 include/uapi/linux/if_addr.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/if_addr.h b/include/uapi/linux/if_addr.h
index ebaf5701c9db..0e0cd588cac0 100644
--- a/include/uapi/linux/if_addr.h
+++ b/include/uapi/linux/if_addr.h
@@ -34,6 +34,7 @@ enum {
 	IFA_MULTICAST,
 	IFA_FLAGS,
 	IFA_RT_PRIORITY,  /* u32, priority/metric for prefix route */
+	IFA_IF_NETNSID,
 	__IFA_MAX,
 };
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net-next 3/5] ipv4: enable IFA_IF_NETNSID for RTM_GETADDR
  2018-08-28 23:18 [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR Christian Brauner
  2018-08-28 23:18 ` [PATCH net-next 1/5] rtnetlink: add rtnl_get_net_ns_capable() Christian Brauner
  2018-08-28 23:18 ` [PATCH net-next 2/5] if_addr: add IFA_IF_NETNSID Christian Brauner
@ 2018-08-28 23:18 ` Christian Brauner
  2018-08-28 23:18 ` [PATCH net-next 4/5] ipv6: " Christian Brauner
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 23+ messages in thread
From: Christian Brauner @ 2018-08-28 23:18 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: davem, kuznet, yoshfuji, pombredanne, kstewart, gregkh, dsahern,
	fw, ktkhai, lucien.xin, jakub.kicinski, jbenc, nicolas.dichtel,
	Christian Brauner

- Backwards Compatibility:
  If userspace wants to determine whether ipv4 RTM_GETADDR requests support
  the new IFA_IF_NETNSID property they should verify that the reply after
  sending a request includes the IFA_IF_NETNSID property. If it does not
  userspace should assume that IFA_IF_NETNSID is not supported for ipv4
  RTM_GETADDR requests on this kernel.
- From what I gather from current userspace tools that make use of
  RTM_GETADDR requests some of them pass down struct ifinfomsg when they
  should actually pass down struct ifaddrmsg. To not break existing
  tools that pass down the wrong struct we will do the same as for
  RTM_GETLINK | NLM_F_DUMP requests and not error out when the
  nlmsg_parse() fails.

- Security:
  Callers must have CAP_NET_ADMIN in the owning user namespace of the
  target network namespace.

Signed-off-by: Christian Brauner <christian@brauner.io>
---
 net/ipv4/devinet.c | 38 ++++++++++++++++++++++++++++++--------
 1 file changed, 30 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index d7585ab1a77a..7d980c2279f5 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -100,6 +100,7 @@ static const struct nla_policy ifa_ipv4_policy[IFA_MAX+1] = {
 	[IFA_CACHEINFO]		= { .len = sizeof(struct ifa_cacheinfo) },
 	[IFA_FLAGS]		= { .type = NLA_U32 },
 	[IFA_RT_PRIORITY]	= { .type = NLA_U32 },
+	[IFA_IF_NETNSID]	= { .type = NLA_S32 },
 };
 
 #define IN4_ADDR_HSIZE_SHIFT	8
@@ -1584,7 +1585,8 @@ static int put_cacheinfo(struct sk_buff *skb, unsigned long cstamp,
 }
 
 static int inet_fill_ifaddr(struct sk_buff *skb, struct in_ifaddr *ifa,
-			    u32 portid, u32 seq, int event, unsigned int flags)
+			    u32 portid, u32 seq, int event, unsigned int flags,
+			    int netnsid)
 {
 	struct ifaddrmsg *ifm;
 	struct nlmsghdr  *nlh;
@@ -1601,6 +1603,9 @@ static int inet_fill_ifaddr(struct sk_buff *skb, struct in_ifaddr *ifa,
 	ifm->ifa_scope = ifa->ifa_scope;
 	ifm->ifa_index = ifa->ifa_dev->dev->ifindex;
 
+	if (netnsid >= 0 && nla_put_s32(skb, IFA_IF_NETNSID, netnsid))
+		goto nla_put_failure;
+
 	if (!(ifm->ifa_flags & IFA_F_PERMANENT)) {
 		preferred = ifa->ifa_preferred_lft;
 		valid = ifa->ifa_valid_lft;
@@ -1648,6 +1653,9 @@ static int inet_fill_ifaddr(struct sk_buff *skb, struct in_ifaddr *ifa,
 static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
 {
 	struct net *net = sock_net(skb->sk);
+	struct nlattr *tb[IFA_MAX+1];
+	struct net *tgt_net = net;
+	int netnsid = -1;
 	int h, s_h;
 	int idx, s_idx;
 	int ip_idx, s_ip_idx;
@@ -1660,12 +1668,23 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
 	s_idx = idx = cb->args[1];
 	s_ip_idx = ip_idx = cb->args[2];
 
+	if (nlmsg_parse(cb->nlh, sizeof(struct ifaddrmsg), tb, IFA_MAX,
+			ifa_ipv4_policy, NULL) >= 0) {
+		if (tb[IFA_IF_NETNSID]) {
+			netnsid = nla_get_s32(tb[IFA_IF_NETNSID]);
+
+			tgt_net = rtnl_get_net_ns_capable(skb->sk, netnsid);
+			if (IS_ERR(tgt_net))
+				return PTR_ERR(tgt_net);
+		}
+	}
+
 	for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) {
 		idx = 0;
-		head = &net->dev_index_head[h];
+		head = &tgt_net->dev_index_head[h];
 		rcu_read_lock();
-		cb->seq = atomic_read(&net->ipv4.dev_addr_genid) ^
-			  net->dev_base_seq;
+		cb->seq = atomic_read(&tgt_net->ipv4.dev_addr_genid) ^
+			  tgt_net->dev_base_seq;
 		hlist_for_each_entry_rcu(dev, head, index_hlist) {
 			if (idx < s_idx)
 				goto cont;
@@ -1680,9 +1699,10 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
 				if (ip_idx < s_ip_idx)
 					continue;
 				if (inet_fill_ifaddr(skb, ifa,
-					     NETLINK_CB(cb->skb).portid,
-					     cb->nlh->nlmsg_seq,
-					     RTM_NEWADDR, NLM_F_MULTI) < 0) {
+						     NETLINK_CB(cb->skb).portid,
+						     cb->nlh->nlmsg_seq,
+						     RTM_NEWADDR, NLM_F_MULTI,
+						     netnsid) < 0) {
 					rcu_read_unlock();
 					goto done;
 				}
@@ -1698,6 +1718,8 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
 	cb->args[0] = h;
 	cb->args[1] = idx;
 	cb->args[2] = ip_idx;
+	if (netnsid >= 0)
+		put_net(tgt_net);
 
 	return skb->len;
 }
@@ -1715,7 +1737,7 @@ static void rtmsg_ifa(int event, struct in_ifaddr *ifa, struct nlmsghdr *nlh,
 	if (!skb)
 		goto errout;
 
-	err = inet_fill_ifaddr(skb, ifa, portid, seq, event, 0);
+	err = inet_fill_ifaddr(skb, ifa, portid, seq, event, 0, -1);
 	if (err < 0) {
 		/* -EMSGSIZE implies BUG in inet_nlmsg_size() */
 		WARN_ON(err == -EMSGSIZE);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net-next 4/5] ipv6: enable IFA_IF_NETNSID for RTM_GETADDR
  2018-08-28 23:18 [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR Christian Brauner
                   ` (2 preceding siblings ...)
  2018-08-28 23:18 ` [PATCH net-next 3/5] ipv4: enable IFA_IF_NETNSID for RTM_GETADDR Christian Brauner
@ 2018-08-28 23:18 ` Christian Brauner
  2018-08-30 18:41   ` kbuild test robot
  2018-08-28 23:18 ` [PATCH net-next 5/5] rtnetlink: move type calculation out of loop Christian Brauner
  2018-08-29  8:30 ` [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR Kirill Tkhai
  5 siblings, 1 reply; 23+ messages in thread
From: Christian Brauner @ 2018-08-28 23:18 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: davem, kuznet, yoshfuji, pombredanne, kstewart, gregkh, dsahern,
	fw, ktkhai, lucien.xin, jakub.kicinski, jbenc, nicolas.dichtel,
	Christian Brauner

- Backwards Compatibility:
  If userspace wants to determine whether ipv6 RTM_GETADDR requests support
  the new IFA_IF_NETNSID property they should verify that the reply after
  sending a request includes the IFA_IF_NETNSID property. If it does not
  userspace should assume that IFA_IF_NETNSID is not supported for ipv6
  RTM_GETADDR requests on this kernel.
- From what I gather from current userspace tools that make use of
  RTM_GETADDR requests some of them pass down struct ifinfomsg when they
  should actually pass down struct ifaddrmsg. To not break existing
  tools that pass down the wrong struct we will do the same as for
  RTM_GETLINK | NLM_F_DUMP requests and not error out when the
  nlmsg_parse() fails.

- Security:
  Callers must have CAP_NET_ADMIN in the owning user namespace of the
  target network namespace.

Signed-off-by: Christian Brauner <christian@brauner.io>
---
 net/ipv6/addrconf.c | 70 +++++++++++++++++++++++++++++++++++----------
 1 file changed, 55 insertions(+), 15 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index f66a1cae3366..f48f3479b341 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4493,6 +4493,7 @@ static const struct nla_policy ifa_ipv6_policy[IFA_MAX+1] = {
 	[IFA_CACHEINFO]		= { .len = sizeof(struct ifa_cacheinfo) },
 	[IFA_FLAGS]		= { .len = sizeof(u32) },
 	[IFA_RT_PRIORITY]	= { .len = sizeof(u32) },
+	[IFA_IF_NETNSID]	= { .type = NLA_S32 },
 };
 
 static int
@@ -4796,7 +4797,8 @@ static inline int inet6_ifaddr_msgsize(void)
 }
 
 static int inet6_fill_ifaddr(struct sk_buff *skb, struct inet6_ifaddr *ifa,
-			     u32 portid, u32 seq, int event, unsigned int flags)
+			     u32 portid, u32 seq, int event, unsigned int flags,
+			     int netnsid)
 {
 	struct nlmsghdr  *nlh;
 	u32 preferred, valid;
@@ -4808,6 +4810,9 @@ static int inet6_fill_ifaddr(struct sk_buff *skb, struct inet6_ifaddr *ifa,
 	put_ifaddrmsg(nlh, ifa->prefix_len, ifa->flags, rt_scope(ifa->scope),
 		      ifa->idev->dev->ifindex);
 
+	if (netnsid >= 0 && nla_put_s32(skb, IFA_IF_NETNSID, netnsid))
+		goto error;
+
 	if (!((ifa->flags&IFA_F_PERMANENT) &&
 	      (ifa->prefered_lft == INFINITY_LIFE_TIME))) {
 		preferred = ifa->prefered_lft;
@@ -4857,7 +4862,8 @@ static int inet6_fill_ifaddr(struct sk_buff *skb, struct inet6_ifaddr *ifa,
 }
 
 static int inet6_fill_ifmcaddr(struct sk_buff *skb, struct ifmcaddr6 *ifmca,
-				u32 portid, u32 seq, int event, u16 flags)
+			       u32 portid, u32 seq, int event, u16 flags,
+			       int netnsid)
 {
 	struct nlmsghdr  *nlh;
 	u8 scope = RT_SCOPE_UNIVERSE;
@@ -4870,6 +4876,9 @@ static int inet6_fill_ifmcaddr(struct sk_buff *skb, struct ifmcaddr6 *ifmca,
 	if (!nlh)
 		return -EMSGSIZE;
 
+	if (netnsid >= 0 && nla_put_s32(skb, IFA_IF_NETNSID, netnsid))
+		return -EMSGSIZE;
+
 	put_ifaddrmsg(nlh, 128, IFA_F_PERMANENT, scope, ifindex);
 	if (nla_put_in6_addr(skb, IFA_MULTICAST, &ifmca->mca_addr) < 0 ||
 	    put_cacheinfo(skb, ifmca->mca_cstamp, ifmca->mca_tstamp,
@@ -4883,7 +4892,8 @@ static int inet6_fill_ifmcaddr(struct sk_buff *skb, struct ifmcaddr6 *ifmca,
 }
 
 static int inet6_fill_ifacaddr(struct sk_buff *skb, struct ifacaddr6 *ifaca,
-				u32 portid, u32 seq, int event, unsigned int flags)
+			       u32 portid, u32 seq, int event,
+			       unsigned int flags, int netnsid)
 {
 	struct net_device *dev = fib6_info_nh_dev(ifaca->aca_rt);
 	int ifindex = dev ? dev->ifindex : 1;
@@ -4897,6 +4907,9 @@ static int inet6_fill_ifacaddr(struct sk_buff *skb, struct ifacaddr6 *ifaca,
 	if (!nlh)
 		return -EMSGSIZE;
 
+	if (netnsid >= 0 && nla_put_s32(skb, IFA_IF_NETNSID, netnsid))
+		return -EMSGSIZE;
+
 	put_ifaddrmsg(nlh, 128, IFA_F_PERMANENT, scope, ifindex);
 	if (nla_put_in6_addr(skb, IFA_ANYCAST, &ifaca->aca_addr) < 0 ||
 	    put_cacheinfo(skb, ifaca->aca_cstamp, ifaca->aca_tstamp,
@@ -4918,7 +4931,7 @@ enum addr_type_t {
 /* called with rcu_read_lock() */
 static int in6_dump_addrs(struct inet6_dev *idev, struct sk_buff *skb,
 			  struct netlink_callback *cb, enum addr_type_t type,
-			  int s_ip_idx, int *p_ip_idx)
+			  int s_ip_idx, int *p_ip_idx, int netnsid)
 {
 	struct ifmcaddr6 *ifmca;
 	struct ifacaddr6 *ifaca;
@@ -4938,7 +4951,7 @@ static int in6_dump_addrs(struct inet6_dev *idev, struct sk_buff *skb,
 						NETLINK_CB(cb->skb).portid,
 						cb->nlh->nlmsg_seq,
 						RTM_NEWADDR,
-						NLM_F_MULTI);
+						NLM_F_MULTI, netnsid);
 			if (err < 0)
 				break;
 			nl_dump_check_consistent(cb, nlmsg_hdr(skb));
@@ -4955,7 +4968,7 @@ static int in6_dump_addrs(struct inet6_dev *idev, struct sk_buff *skb,
 						  NETLINK_CB(cb->skb).portid,
 						  cb->nlh->nlmsg_seq,
 						  RTM_GETMULTICAST,
-						  NLM_F_MULTI);
+						  NLM_F_MULTI, netnsid);
 			if (err < 0)
 				break;
 		}
@@ -4970,7 +4983,7 @@ static int in6_dump_addrs(struct inet6_dev *idev, struct sk_buff *skb,
 						  NETLINK_CB(cb->skb).portid,
 						  cb->nlh->nlmsg_seq,
 						  RTM_GETANYCAST,
-						  NLM_F_MULTI);
+						  NLM_F_MULTI, netnsid);
 			if (err < 0)
 				break;
 		}
@@ -4987,6 +5000,9 @@ static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb,
 			   enum addr_type_t type)
 {
 	struct net *net = sock_net(skb->sk);
+	struct nlattr *tb[IFA_MAX+1];
+	struct net *tgt_net = net;
+	int netnsid = -1;
 	int h, s_h;
 	int idx, ip_idx;
 	int s_idx, s_ip_idx;
@@ -4998,11 +5014,22 @@ static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb,
 	s_idx = idx = cb->args[1];
 	s_ip_idx = ip_idx = cb->args[2];
 
+	if (nlmsg_parse(cb->nlh, sizeof(struct ifaddrmsg), tb, IFA_MAX,
+			ifa_ipv6_policy, NULL) >= 0) {
+		if (tb[IFA_IF_NETNSID]) {
+			netnsid = nla_get_s32(tb[IFA_IF_NETNSID]);
+
+			tgt_net = rtnl_get_net_ns_capable(skb->sk, netnsid);
+			if (IS_ERR(tgt_net))
+				return PTR_ERR(tgt_net);
+		}
+	}
+
 	rcu_read_lock();
-	cb->seq = atomic_read(&net->ipv6.dev_addr_genid) ^ net->dev_base_seq;
+	cb->seq = atomic_read(&tgt_net->ipv6.dev_addr_genid) ^ tgt_net->dev_base_seq;
 	for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) {
 		idx = 0;
-		head = &net->dev_index_head[h];
+		head = &tgt_net->dev_index_head[h];
 		hlist_for_each_entry_rcu(dev, head, index_hlist) {
 			if (idx < s_idx)
 				goto cont;
@@ -5014,7 +5041,7 @@ static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb,
 				goto cont;
 
 			if (in6_dump_addrs(idev, skb, cb, type,
-					   s_ip_idx, &ip_idx) < 0)
+					   s_ip_idx, &ip_idx, netnsid) < 0)
 				goto done;
 cont:
 			idx++;
@@ -5025,6 +5052,8 @@ static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb,
 	cb->args[0] = h;
 	cb->args[1] = idx;
 	cb->args[2] = ip_idx;
+	if (netnsid >= 0)
+		put_net(tgt_net);
 
 	return skb->len;
 }
@@ -5055,12 +5084,14 @@ static int inet6_rtm_getaddr(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 			     struct netlink_ext_ack *extack)
 {
 	struct net *net = sock_net(in_skb->sk);
+	struct net *tgt_net = net;
 	struct ifaddrmsg *ifm;
 	struct nlattr *tb[IFA_MAX+1];
 	struct in6_addr *addr = NULL, *peer;
 	struct net_device *dev = NULL;
 	struct inet6_ifaddr *ifa;
 	struct sk_buff *skb;
+	int netnsid = -1;
 	int err;
 
 	err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFA_MAX, ifa_ipv6_policy,
@@ -5068,15 +5099,24 @@ static int inet6_rtm_getaddr(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 	if (err < 0)
 		return err;
 
+	if (tb[IFA_IF_NETNSID]) {
+		netnsid = nla_get_s32(tb[IFA_IF_NETNSID]);
+
+		tgt_net = rtnl_get_net_ns_capable(NETLINK_CB(in_skb).sk,
+						  netnsid);
+		if (IS_ERR(tgt_net))
+			return PTR_ERR(tgt_net);
+	}
+
 	addr = extract_addr(tb[IFA_ADDRESS], tb[IFA_LOCAL], &peer);
 	if (!addr)
 		return -EINVAL;
 
 	ifm = nlmsg_data(nlh);
 	if (ifm->ifa_index)
-		dev = dev_get_by_index(net, ifm->ifa_index);
+		dev = dev_get_by_index(tgt_net, ifm->ifa_index);
 
-	ifa = ipv6_get_ifaddr(net, addr, dev, 1);
+	ifa = ipv6_get_ifaddr(tgt_net, addr, dev, 1);
 	if (!ifa) {
 		err = -EADDRNOTAVAIL;
 		goto errout;
@@ -5089,14 +5129,14 @@ static int inet6_rtm_getaddr(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 	}
 
 	err = inet6_fill_ifaddr(skb, ifa, NETLINK_CB(in_skb).portid,
-				nlh->nlmsg_seq, RTM_NEWADDR, 0);
+				nlh->nlmsg_seq, RTM_NEWADDR, 0, netnsid);
 	if (err < 0) {
 		/* -EMSGSIZE implies BUG in inet6_ifaddr_msgsize() */
 		WARN_ON(err == -EMSGSIZE);
 		kfree_skb(skb);
 		goto errout_ifa;
 	}
-	err = rtnl_unicast(skb, net, NETLINK_CB(in_skb).portid);
+	err = rtnl_unicast(skb, tgt_net, NETLINK_CB(in_skb).portid);
 errout_ifa:
 	in6_ifa_put(ifa);
 errout:
@@ -5115,7 +5155,7 @@ static void inet6_ifa_notify(int event, struct inet6_ifaddr *ifa)
 	if (!skb)
 		goto errout;
 
-	err = inet6_fill_ifaddr(skb, ifa, 0, 0, event, 0);
+	err = inet6_fill_ifaddr(skb, ifa, 0, 0, event, 0, -1);
 	if (err < 0) {
 		/* -EMSGSIZE implies BUG in inet6_ifaddr_msgsize() */
 		WARN_ON(err == -EMSGSIZE);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net-next 5/5] rtnetlink: move type calculation out of loop
  2018-08-28 23:18 [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR Christian Brauner
                   ` (3 preceding siblings ...)
  2018-08-28 23:18 ` [PATCH net-next 4/5] ipv6: " Christian Brauner
@ 2018-08-28 23:18 ` Christian Brauner
  2018-08-29  8:30 ` [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR Kirill Tkhai
  5 siblings, 0 replies; 23+ messages in thread
From: Christian Brauner @ 2018-08-28 23:18 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: davem, kuznet, yoshfuji, pombredanne, kstewart, gregkh, dsahern,
	fw, ktkhai, lucien.xin, jakub.kicinski, jbenc, nicolas.dichtel,
	Christian Brauner

I don't see how the type - which is one of
RTM_{GETADDR,GETROUTE,GETNETCONF} - can change. So do the message type
calculation once before entering the for loop.

Signed-off-by: Christian Brauner <christian@brauner.io>
---
 net/core/rtnetlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index c6c6f93cd195..a644d392918b 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3215,13 +3215,13 @@ static int rtnl_dump_all(struct sk_buff *skb, struct netlink_callback *cb)
 {
 	int idx;
 	int s_idx = cb->family;
+	int type = cb->nlh->nlmsg_type - RTM_BASE;
 
 	if (s_idx == 0)
 		s_idx = 1;
 
 	for (idx = 1; idx <= RTNL_FAMILY_MAX; idx++) {
 		struct rtnl_link **tab;
-		int type = cb->nlh->nlmsg_type-RTM_BASE;
 		struct rtnl_link *link;
 		rtnl_dumpit_func dumpit;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR
  2018-08-28 23:18 [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR Christian Brauner
                   ` (4 preceding siblings ...)
  2018-08-28 23:18 ` [PATCH net-next 5/5] rtnetlink: move type calculation out of loop Christian Brauner
@ 2018-08-29  8:30 ` Kirill Tkhai
  2018-08-29 18:13   ` Christian Brauner
  5 siblings, 1 reply; 23+ messages in thread
From: Kirill Tkhai @ 2018-08-29  8:30 UTC (permalink / raw)
  To: Christian Brauner, netdev, linux-kernel
  Cc: davem, kuznet, yoshfuji, pombredanne, kstewart, gregkh, dsahern,
	fw, lucien.xin, jakub.kicinski, jbenc, nicolas.dichtel,
	Christian Brauner

Hi, Christian,

On 29.08.2018 02:18, Christian Brauner wrote:
> From: Christian Brauner <christian.brauner@ubuntu.com>
> 
> Hey,
> 
> A while back we introduced and enabled IFLA_IF_NETNSID in
> RTM_{DEL,GET,NEW}LINK requests (cf. [1], [2], [3], [4], [5]). This has led
> to signficant performance increases since it allows userspace to avoid
> taking the hit of a setns(netns_fd, CLONE_NEWNET), then getting the
> interfaces from the netns associated with the netns_fd. Especially when a
> lot of network namespaces are in use, using setns() becomes increasingly
> problematic when performance matters.

could you please give a real example, when setns()+socket(AF_NETLINK) cause
problems with the performance? You should do this only once on application
startup, and then you have created netlink sockets in any net namespaces you
need. What is the problem here?

> Usually, RTML_GETLINK requests are followed by RTM_GETADDR requests (cf.
> getifaddrs() style functions and friends). But currently, RTM_GETADDR
> requests do not support a similar property like IFLA_IF_NETNSID for
> RTM_*LINK requests.
> This is problematic since userspace can retrieve interfaces from another
> network namespace by sending a IFLA_IF_NETNSID property along but
> RTM_GETLINK request but is still forced to use the legacy setns() style of
> retrieving interfaces in RTM_GETADDR requests.
> 
> The goal of this series is to make it possible to perform RTM_GETADDR
> requests on different network namespaces. To this end a new IFA_IF_NETNSID
> property for RTM_*ADDR requests is introduced. It can be used to send a
> network namespace identifier along in RTM_*ADDR requests.  The network
> namespace identifier will be used to retrieve the target network namespace
> in which the request is supposed to be fulfilled.  This aligns the behavior
> of RTM_*ADDR requests with the behavior of RTM_*LINK requests.
> 
> Security:
> - The caller must have assigned a valid network namespace identifier for
>   the target network namespace.
> - The caller must have CAP_NET_ADMIN in the owning user namespace of the
>   target network namespace.
> 
> Thanks!
> Christian
> 
> [1]: commit 7973bfd8758d ("rtnetlink: remove check for IFLA_IF_NETNSID")
> [2]: commit 5bb8ed075428 ("rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK")
> [3]: commit b61ad68a9fe8 ("rtnetlink: enable IFLA_IF_NETNSID for RTM_DELLINK")
> [4]: commit c310bfcb6e1b ("rtnetlink: enable IFLA_IF_NETNSID for RTM_SETLINK")
> [5]: commit 7c4f63ba8243 ("rtnetlink: enable IFLA_IF_NETNSID in do_setlink()")
> 
> Christian Brauner (5):
>   rtnetlink: add rtnl_get_net_ns_capable()
>   if_addr: add IFA_IF_NETNSID
>   ipv4: enable IFA_IF_NETNSID for RTM_GETADDR
>   ipv6: enable IFA_IF_NETNSID for RTM_GETADDR
>   rtnetlink: move type calculation out of loop
> 
>  include/net/rtnetlink.h      |  1 +
>  include/uapi/linux/if_addr.h |  1 +
>  net/core/rtnetlink.c         | 15 +++++---
>  net/ipv4/devinet.c           | 38 +++++++++++++++-----
>  net/ipv6/addrconf.c          | 70 ++++++++++++++++++++++++++++--------
>  5 files changed, 97 insertions(+), 28 deletions(-)
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR
  2018-08-29  8:30 ` [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR Kirill Tkhai
@ 2018-08-29 18:13   ` Christian Brauner
  2018-08-30  8:49     ` Kirill Tkhai
  0 siblings, 1 reply; 23+ messages in thread
From: Christian Brauner @ 2018-08-29 18:13 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: netdev, linux-kernel, davem, kuznet, yoshfuji, pombredanne,
	kstewart, gregkh, dsahern, fw, lucien.xin, jakub.kicinski, jbenc,
	nicolas.dichtel

Hi Kirill,

Thanks for the question!

On Wed, Aug 29, 2018 at 11:30:37AM +0300, Kirill Tkhai wrote:
> Hi, Christian,
> 
> On 29.08.2018 02:18, Christian Brauner wrote:
> > From: Christian Brauner <christian@brauner.io>
> > 
> > Hey,
> > 
> > A while back we introduced and enabled IFLA_IF_NETNSID in
> > RTM_{DEL,GET,NEW}LINK requests (cf. [1], [2], [3], [4], [5]). This has led
> > to signficant performance increases since it allows userspace to avoid
> > taking the hit of a setns(netns_fd, CLONE_NEWNET), then getting the
> > interfaces from the netns associated with the netns_fd. Especially when a
> > lot of network namespaces are in use, using setns() becomes increasingly
> > problematic when performance matters.
> 
> could you please give a real example, when setns()+socket(AF_NETLINK) cause
> problems with the performance? You should do this only once on application
> startup, and then you have created netlink sockets in any net namespaces you
> need. What is the problem here?

So we have a daemon (LXD) that is often running thousands of containers.
When users issue a lxc list request against the daemon it returns a list
of all containers including all of the interfaces and addresses for each
container. To retrieve those addresses we currently rely on setns() +
getifaddrs() for each of those containers. That has horrible
performance.
The problem with what you're proposing is that the daemon would need to
cache a socket file descriptor for each container which is something
that we unfortunately cannot do since we can't excessively cache file
descriptors because we can easily hit the open file limit. We also
refrain from caching file descriptors for a long time for security
reasons.

For the case where users just request a list of the interfaces we
can already use RTM_GETLINK + IFLA_IF_NETNS which has way better
performance. But we can't do the same with RTM_GETADDR requests which
was an oversight on my part when I wrote the original patchset for the
RTM_*LINK requests. This just rectifies this and aligns RTM_GETLINK +
RTM_GETADDR.
Based on this patchset I have written a userspace POC that is basically
a netns namespace aware getifaddr() or - as I like to call it -
netns_getifaddr().

> 
> > Usually, RTML_GETLINK requests are followed by RTM_GETADDR requests (cf.
> > getifaddrs() style functions and friends). But currently, RTM_GETADDR
> > requests do not support a similar property like IFLA_IF_NETNSID for
> > RTM_*LINK requests.
> > This is problematic since userspace can retrieve interfaces from another
> > network namespace by sending a IFLA_IF_NETNSID property along but
> > RTM_GETLINK request but is still forced to use the legacy setns() style of
> > retrieving interfaces in RTM_GETADDR requests.
> > 
> > The goal of this series is to make it possible to perform RTM_GETADDR
> > requests on different network namespaces. To this end a new IFA_IF_NETNSID
> > property for RTM_*ADDR requests is introduced. It can be used to send a
> > network namespace identifier along in RTM_*ADDR requests.  The network
> > namespace identifier will be used to retrieve the target network namespace
> > in which the request is supposed to be fulfilled.  This aligns the behavior
> > of RTM_*ADDR requests with the behavior of RTM_*LINK requests.
> > 
> > Security:
> > - The caller must have assigned a valid network namespace identifier for
> >   the target network namespace.
> > - The caller must have CAP_NET_ADMIN in the owning user namespace of the
> >   target network namespace.
> > 
> > Thanks!
> > Christian
> > 
> > [1]: commit 7973bfd8758d ("rtnetlink: remove check for IFLA_IF_NETNSID")
> > [2]: commit 5bb8ed075428 ("rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK")
> > [3]: commit b61ad68a9fe8 ("rtnetlink: enable IFLA_IF_NETNSID for RTM_DELLINK")
> > [4]: commit c310bfcb6e1b ("rtnetlink: enable IFLA_IF_NETNSID for RTM_SETLINK")
> > [5]: commit 7c4f63ba8243 ("rtnetlink: enable IFLA_IF_NETNSID in do_setlink()")
> > 
> > Christian Brauner (5):
> >   rtnetlink: add rtnl_get_net_ns_capable()
> >   if_addr: add IFA_IF_NETNSID
> >   ipv4: enable IFA_IF_NETNSID for RTM_GETADDR
> >   ipv6: enable IFA_IF_NETNSID for RTM_GETADDR
> >   rtnetlink: move type calculation out of loop
> > 
> >  include/net/rtnetlink.h      |  1 +
> >  include/uapi/linux/if_addr.h |  1 +
> >  net/core/rtnetlink.c         | 15 +++++---
> >  net/ipv4/devinet.c           | 38 +++++++++++++++-----
> >  net/ipv6/addrconf.c          | 70 ++++++++++++++++++++++++++++--------
> >  5 files changed, 97 insertions(+), 28 deletions(-)
> > 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR
  2018-08-29 18:13   ` Christian Brauner
@ 2018-08-30  8:49     ` Kirill Tkhai
  2018-08-30 14:45       ` Christian Brauner
  0 siblings, 1 reply; 23+ messages in thread
From: Kirill Tkhai @ 2018-08-30  8:49 UTC (permalink / raw)
  To: Christian Brauner
  Cc: netdev, linux-kernel, davem, kuznet, yoshfuji, pombredanne,
	kstewart, gregkh, dsahern, fw, lucien.xin, jakub.kicinski, jbenc,
	nicolas.dichtel

On 29.08.2018 21:13, Christian Brauner wrote:
> Hi Kirill,
> 
> Thanks for the question!
> 
> On Wed, Aug 29, 2018 at 11:30:37AM +0300, Kirill Tkhai wrote:
>> Hi, Christian,
>>
>> On 29.08.2018 02:18, Christian Brauner wrote:
>>> From: Christian Brauner <christian@brauner.io>
>>>
>>> Hey,
>>>
>>> A while back we introduced and enabled IFLA_IF_NETNSID in
>>> RTM_{DEL,GET,NEW}LINK requests (cf. [1], [2], [3], [4], [5]). This has led
>>> to signficant performance increases since it allows userspace to avoid
>>> taking the hit of a setns(netns_fd, CLONE_NEWNET), then getting the
>>> interfaces from the netns associated with the netns_fd. Especially when a
>>> lot of network namespaces are in use, using setns() becomes increasingly
>>> problematic when performance matters.
>>
>> could you please give a real example, when setns()+socket(AF_NETLINK) cause
>> problems with the performance? You should do this only once on application
>> startup, and then you have created netlink sockets in any net namespaces you
>> need. What is the problem here?
> 
> So we have a daemon (LXD) that is often running thousands of containers.
> When users issue a lxc list request against the daemon it returns a list
> of all containers including all of the interfaces and addresses for each
> container. To retrieve those addresses we currently rely on setns() +
> getifaddrs() for each of those containers. That has horrible
> performance.

Could you please provide some numbers showing that setns()
introduces signify performance decrease in the application?

I worry about all this just because of netlink interface is
already overloaded, while this patch makes it even less modular.
In case of one day we finally reach rtnl unscalability trap,
every common interface like this may be a decisive nail in
a coffin lid of possibility to overwrite everything.

> The problem with what you're proposing is that the daemon would need to
> cache a socket file descriptor for each container which is something
> that we unfortunately cannot do since we can't excessively cache file
> descriptors because we can easily hit the open file limit. We also
> refrain from caching file descriptors for a long time for security
> reasons.
> 
> For the case where users just request a list of the interfaces we
> can already use RTM_GETLINK + IFLA_IF_NETNS which has way better
> performance. But we can't do the same with RTM_GETADDR requests which
> was an oversight on my part when I wrote the original patchset for the
> RTM_*LINK requests. This just rectifies this and aligns RTM_GETLINK +
> RTM_GETADDR.
> Based on this patchset I have written a userspace POC that is basically
> a netns namespace aware getifaddr() or - as I like to call it -
> netns_getifaddr().
> 
>>
>>> Usually, RTML_GETLINK requests are followed by RTM_GETADDR requests (cf.
>>> getifaddrs() style functions and friends). But currently, RTM_GETADDR
>>> requests do not support a similar property like IFLA_IF_NETNSID for
>>> RTM_*LINK requests.
>>> This is problematic since userspace can retrieve interfaces from another
>>> network namespace by sending a IFLA_IF_NETNSID property along but
>>> RTM_GETLINK request but is still forced to use the legacy setns() style of
>>> retrieving interfaces in RTM_GETADDR requests.
>>>
>>> The goal of this series is to make it possible to perform RTM_GETADDR
>>> requests on different network namespaces. To this end a new IFA_IF_NETNSID
>>> property for RTM_*ADDR requests is introduced. It can be used to send a
>>> network namespace identifier along in RTM_*ADDR requests.  The network
>>> namespace identifier will be used to retrieve the target network namespace
>>> in which the request is supposed to be fulfilled.  This aligns the behavior
>>> of RTM_*ADDR requests with the behavior of RTM_*LINK requests.
>>>
>>> Security:
>>> - The caller must have assigned a valid network namespace identifier for
>>>   the target network namespace.
>>> - The caller must have CAP_NET_ADMIN in the owning user namespace of the
>>>   target network namespace.
>>>
>>> Thanks!
>>> Christian
>>>
>>> [1]: commit 7973bfd8758d ("rtnetlink: remove check for IFLA_IF_NETNSID")
>>> [2]: commit 5bb8ed075428 ("rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK")
>>> [3]: commit b61ad68a9fe8 ("rtnetlink: enable IFLA_IF_NETNSID for RTM_DELLINK")
>>> [4]: commit c310bfcb6e1b ("rtnetlink: enable IFLA_IF_NETNSID for RTM_SETLINK")
>>> [5]: commit 7c4f63ba8243 ("rtnetlink: enable IFLA_IF_NETNSID in do_setlink()")
>>>
>>> Christian Brauner (5):
>>>   rtnetlink: add rtnl_get_net_ns_capable()
>>>   if_addr: add IFA_IF_NETNSID
>>>   ipv4: enable IFA_IF_NETNSID for RTM_GETADDR
>>>   ipv6: enable IFA_IF_NETNSID for RTM_GETADDR
>>>   rtnetlink: move type calculation out of loop
>>>
>>>  include/net/rtnetlink.h      |  1 +
>>>  include/uapi/linux/if_addr.h |  1 +
>>>  net/core/rtnetlink.c         | 15 +++++---
>>>  net/ipv4/devinet.c           | 38 +++++++++++++++-----
>>>  net/ipv6/addrconf.c          | 70 ++++++++++++++++++++++++++++--------
>>>  5 files changed, 97 insertions(+), 28 deletions(-)
>>>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR
  2018-08-30  8:49     ` Kirill Tkhai
@ 2018-08-30 14:45       ` Christian Brauner
  2018-08-30 15:49         ` Nicolas Dichtel
  2018-09-01  1:34         ` Christian Brauner
  0 siblings, 2 replies; 23+ messages in thread
From: Christian Brauner @ 2018-08-30 14:45 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: netdev, linux-kernel, davem, kuznet, yoshfuji, pombredanne,
	kstewart, gregkh, dsahern, fw, lucien.xin, jakub.kicinski, jbenc,
	nicolas.dichtel

On Thu, Aug 30, 2018 at 11:49:31AM +0300, Kirill Tkhai wrote:
> On 29.08.2018 21:13, Christian Brauner wrote:
> > Hi Kirill,
> > 
> > Thanks for the question!
> > 
> > On Wed, Aug 29, 2018 at 11:30:37AM +0300, Kirill Tkhai wrote:
> >> Hi, Christian,
> >>
> >> On 29.08.2018 02:18, Christian Brauner wrote:
> >>> From: Christian Brauner <christian@brauner.io>
> >>>
> >>> Hey,
> >>>
> >>> A while back we introduced and enabled IFLA_IF_NETNSID in
> >>> RTM_{DEL,GET,NEW}LINK requests (cf. [1], [2], [3], [4], [5]). This has led
> >>> to signficant performance increases since it allows userspace to avoid
> >>> taking the hit of a setns(netns_fd, CLONE_NEWNET), then getting the
> >>> interfaces from the netns associated with the netns_fd. Especially when a
> >>> lot of network namespaces are in use, using setns() becomes increasingly
> >>> problematic when performance matters.
> >>
> >> could you please give a real example, when setns()+socket(AF_NETLINK) cause
> >> problems with the performance? You should do this only once on application
> >> startup, and then you have created netlink sockets in any net namespaces you
> >> need. What is the problem here?
> > 
> > So we have a daemon (LXD) that is often running thousands of containers.
> > When users issue a lxc list request against the daemon it returns a list
> > of all containers including all of the interfaces and addresses for each
> > container. To retrieve those addresses we currently rely on setns() +
> > getifaddrs() for each of those containers. That has horrible
> > performance.
> 
> Could you please provide some numbers showing that setns()
> introduces signify performance decrease in the application?

Sure, might take a few days++ though since I'm traveling.

> 
> I worry about all this just because of netlink interface is
> already overloaded, while this patch makes it even less modular.

Introducing the IFA_IF_NETNSID property will not make the netlink
interface less modular. It is a clean, RTM_*ADDR-request specific
property using network namespace identifiers which we discussed in prior
patches are the way to go forward.

You can already get interfaces via GETLINK from another network
namespaces than the one you reside in (Which we enabled just a few
months back.) but you can't do the same for GETADDR. Those two are
almost always used together. When you want to get the links you usually
also want to get the addresses associated with it right after.
In a prior discussion we agreed that network namespace identifiers are
the way to go forward but that any other propery, i.e. PIDs and fds
should never be ported into other parts of the codebase and that is
indeed something I agree with.

> In case of one day we finally reach rtnl unscalability trap,
> every common interface like this may be a decisive nail in
> a coffin lid of possibility to overwrite everything.
> 
> > The problem with what you're proposing is that the daemon would need to
> > cache a socket file descriptor for each container which is something
> > that we unfortunately cannot do since we can't excessively cache file
> > descriptors because we can easily hit the open file limit. We also
> > refrain from caching file descriptors for a long time for security
> > reasons.
> > 
> > For the case where users just request a list of the interfaces we
> > can already use RTM_GETLINK + IFLA_IF_NETNS which has way better
> > performance. But we can't do the same with RTM_GETADDR requests which
> > was an oversight on my part when I wrote the original patchset for the
> > RTM_*LINK requests. This just rectifies this and aligns RTM_GETLINK +
> > RTM_GETADDR.
> > Based on this patchset I have written a userspace POC that is basically
> > a netns namespace aware getifaddr() or - as I like to call it -
> > netns_getifaddr().
> > 
> >>
> >>> Usually, RTML_GETLINK requests are followed by RTM_GETADDR requests (cf.
> >>> getifaddrs() style functions and friends). But currently, RTM_GETADDR
> >>> requests do not support a similar property like IFLA_IF_NETNSID for
> >>> RTM_*LINK requests.
> >>> This is problematic since userspace can retrieve interfaces from another
> >>> network namespace by sending a IFLA_IF_NETNSID property along but
> >>> RTM_GETLINK request but is still forced to use the legacy setns() style of
> >>> retrieving interfaces in RTM_GETADDR requests.
> >>>
> >>> The goal of this series is to make it possible to perform RTM_GETADDR
> >>> requests on different network namespaces. To this end a new IFA_IF_NETNSID
> >>> property for RTM_*ADDR requests is introduced. It can be used to send a
> >>> network namespace identifier along in RTM_*ADDR requests.  The network
> >>> namespace identifier will be used to retrieve the target network namespace
> >>> in which the request is supposed to be fulfilled.  This aligns the behavior
> >>> of RTM_*ADDR requests with the behavior of RTM_*LINK requests.
> >>>
> >>> Security:
> >>> - The caller must have assigned a valid network namespace identifier for
> >>>   the target network namespace.
> >>> - The caller must have CAP_NET_ADMIN in the owning user namespace of the
> >>>   target network namespace.
> >>>
> >>> Thanks!
> >>> Christian
> >>>
> >>> [1]: commit 7973bfd8758d ("rtnetlink: remove check for IFLA_IF_NETNSID")
> >>> [2]: commit 5bb8ed075428 ("rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK")
> >>> [3]: commit b61ad68a9fe8 ("rtnetlink: enable IFLA_IF_NETNSID for RTM_DELLINK")
> >>> [4]: commit c310bfcb6e1b ("rtnetlink: enable IFLA_IF_NETNSID for RTM_SETLINK")
> >>> [5]: commit 7c4f63ba8243 ("rtnetlink: enable IFLA_IF_NETNSID in do_setlink()")
> >>>
> >>> Christian Brauner (5):
> >>>   rtnetlink: add rtnl_get_net_ns_capable()
> >>>   if_addr: add IFA_IF_NETNSID
> >>>   ipv4: enable IFA_IF_NETNSID for RTM_GETADDR
> >>>   ipv6: enable IFA_IF_NETNSID for RTM_GETADDR
> >>>   rtnetlink: move type calculation out of loop
> >>>
> >>>  include/net/rtnetlink.h      |  1 +
> >>>  include/uapi/linux/if_addr.h |  1 +
> >>>  net/core/rtnetlink.c         | 15 +++++---
> >>>  net/ipv4/devinet.c           | 38 +++++++++++++++-----
> >>>  net/ipv6/addrconf.c          | 70 ++++++++++++++++++++++++++++--------
> >>>  5 files changed, 97 insertions(+), 28 deletions(-)
> >>>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR
  2018-08-30 14:45       ` Christian Brauner
@ 2018-08-30 15:49         ` Nicolas Dichtel
  2018-09-01  0:58           ` David Miller
  2018-09-01  1:34         ` Christian Brauner
  1 sibling, 1 reply; 23+ messages in thread
From: Nicolas Dichtel @ 2018-08-30 15:49 UTC (permalink / raw)
  To: Christian Brauner, Kirill Tkhai
  Cc: netdev, linux-kernel, davem, kuznet, yoshfuji, pombredanne,
	kstewart, gregkh, dsahern, fw, lucien.xin, jakub.kicinski, jbenc

Le 30/08/2018 à 16:45, Christian Brauner a écrit :
[snip]
> Introducing the IFA_IF_NETNSID property will not make the netlink
> interface less modular. It is a clean, RTM_*ADDR-request specific
> property using network namespace identifiers which we discussed in prior
> patches are the way to go forward.
> 
> You can already get interfaces via GETLINK from another network
> namespaces than the one you reside in (Which we enabled just a few
> months back.) but you can't do the same for GETADDR. Those two are
> almost always used together. When you want to get the links you usually
> also want to get the addresses associated with it right after.
> In a prior discussion we agreed that network namespace identifiers are
> the way to go forward but that any other propery, i.e. PIDs and fds
> should never be ported into other parts of the codebase and that is
> indeed something I agree with.
Yes, I agree with this and I think this series go to the right direction.

Maybe I would choose a more generic name for the attribute, something that can
be used in other netlink families (xfrm, netfilter, ...) also.
What about IFA_TARGET_NSID?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next 4/5] ipv6: enable IFA_IF_NETNSID for RTM_GETADDR
  2018-08-28 23:18 ` [PATCH net-next 4/5] ipv6: " Christian Brauner
@ 2018-08-30 18:41   ` kbuild test robot
  2018-09-03  1:18     ` Christian Brauner
  0 siblings, 1 reply; 23+ messages in thread
From: kbuild test robot @ 2018-08-30 18:41 UTC (permalink / raw)
  To: Christian Brauner
  Cc: kbuild-all, netdev, linux-kernel, davem, kuznet, yoshfuji,
	pombredanne, kstewart, gregkh, dsahern, fw, ktkhai, lucien.xin,
	jakub.kicinski, jbenc, nicolas.dichtel, Christian Brauner

[-- Attachment #1: Type: text/plain, Size: 707 bytes --]

Hi Christian,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Christian-Brauner/rtnetlink-add-IFA_IF_NETNSID-for-RTM_GETADDR/20180830-194411
config: x86_64-randconfig-s1-08302022 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

>> ERROR: "rtnl_get_net_ns_capable" [net/ipv6/ipv6.ko] undefined!

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 23841 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR
  2018-08-30 15:49         ` Nicolas Dichtel
@ 2018-09-01  0:58           ` David Miller
  2018-09-01 18:47             ` Christian Brauner
  0 siblings, 1 reply; 23+ messages in thread
From: David Miller @ 2018-09-01  0:58 UTC (permalink / raw)
  To: nicolas.dichtel
  Cc: christian.brauner, ktkhai, netdev, linux-kernel, kuznet,
	yoshfuji, pombredanne, kstewart, gregkh, dsahern, fw, lucien.xin,
	jakub.kicinski, jbenc

From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Thu, 30 Aug 2018 17:49:12 +0200

> Maybe I would choose a more generic name for the attribute,
> something that can be used in other netlink families (xfrm,
> netfilter, ...) also.  What about IFA_TARGET_NSID?

Agreed, let's at least try to think ahead about potential future
uses :)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR
  2018-08-30 14:45       ` Christian Brauner
  2018-08-30 15:49         ` Nicolas Dichtel
@ 2018-09-01  1:34         ` Christian Brauner
  2018-09-03 13:41           ` Kirill Tkhai
  1 sibling, 1 reply; 23+ messages in thread
From: Christian Brauner @ 2018-09-01  1:34 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: netdev, linux-kernel, davem, kuznet, yoshfuji, pombredanne,
	kstewart, gregkh, dsahern, fw, lucien.xin, jakub.kicinski, jbenc,
	nicolas.dichtel

On Thu, Aug 30, 2018 at 04:45:45PM +0200, Christian Brauner wrote:
> On Thu, Aug 30, 2018 at 11:49:31AM +0300, Kirill Tkhai wrote:
> > On 29.08.2018 21:13, Christian Brauner wrote:
> > > Hi Kirill,
> > > 
> > > Thanks for the question!
> > > 
> > > On Wed, Aug 29, 2018 at 11:30:37AM +0300, Kirill Tkhai wrote:
> > >> Hi, Christian,
> > >>
> > >> On 29.08.2018 02:18, Christian Brauner wrote:
> > >>> From: Christian Brauner <christian@brauner.io>
> > >>>
> > >>> Hey,
> > >>>
> > >>> A while back we introduced and enabled IFLA_IF_NETNSID in
> > >>> RTM_{DEL,GET,NEW}LINK requests (cf. [1], [2], [3], [4], [5]). This has led
> > >>> to signficant performance increases since it allows userspace to avoid
> > >>> taking the hit of a setns(netns_fd, CLONE_NEWNET), then getting the
> > >>> interfaces from the netns associated with the netns_fd. Especially when a
> > >>> lot of network namespaces are in use, using setns() becomes increasingly
> > >>> problematic when performance matters.
> > >>
> > >> could you please give a real example, when setns()+socket(AF_NETLINK) cause
> > >> problems with the performance? You should do this only once on application
> > >> startup, and then you have created netlink sockets in any net namespaces you
> > >> need. What is the problem here?
> > > 
> > > So we have a daemon (LXD) that is often running thousands of containers.
> > > When users issue a lxc list request against the daemon it returns a list
> > > of all containers including all of the interfaces and addresses for each
> > > container. To retrieve those addresses we currently rely on setns() +
> > > getifaddrs() for each of those containers. That has horrible
> > > performance.
> > 
> > Could you please provide some numbers showing that setns()
> > introduces signify performance decrease in the application?
> 
> Sure, might take a few days++ though since I'm traveling.

Hey Kirill,

As promised here's some code [1] that compares performance. I basically
did a setns() to the network namespace and called getifaddrs() and
compared this to the scenario where I use the newly introduced property.
I did this 1 million times and calculated the mean getifaddrs()
retrieval time based on that.
My patch cuts the time in half.

brauner@wittgenstein:~/netns_getifaddrs$ ./getifaddrs_perf 0 1178
Mean time in microseconds (netnsid): 81
Mean time in microseconds (setns): 162

Christian

I'm only appending the main file since the netsns_getifaddrs() code I
used is pretty long:

[1]:

#define _GNU_SOURCE
#define __STDC_FORMAT_MACROS
#include <fcntl.h>
#include <inttypes.h>
#include <linux/types.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#include "netns_getifaddrs.h"
#include "print_getifaddrs.h"

#define ITERATIONS 1000000
#define SEC_TO_MICROSEC(x) (1000000 * (x))

int main(int argc, char *argv[])
{
	int i, ret;
	__s32 netns_id;
	pid_t netns_pid;
	char path[1024];
	intmax_t times[ITERATIONS];
	struct timeval t1, t2;
	intmax_t time_in_mcs;
	int fret = EXIT_FAILURE;
	intmax_t sum = 0;
	int host_netns_fd = -1, netns_fd = -1;

	struct ifaddrs *ifaddrs = NULL;

	if (argc != 3)
		goto on_error;

	netns_id = atoi(argv[1]);
	netns_pid = atoi(argv[2]);
	printf("%d\n", netns_id);
	printf("%d\n", netns_pid);

	for (i = 0; i < ITERATIONS; i++) {
		ret = gettimeofday(&t1, NULL);
		if (ret < 0)
			goto on_error;

		ret = netns_getifaddrs(&ifaddrs, netns_id);
		freeifaddrs(ifaddrs);
		if (ret < 0)
			goto on_error;

		ret = gettimeofday(&t2, NULL);
		if (ret < 0)
			goto on_error;

		time_in_mcs = (SEC_TO_MICROSEC(t2.tv_sec) + t2.tv_usec) -
			      (SEC_TO_MICROSEC(t1.tv_sec) + t1.tv_usec);
		times[i] = time_in_mcs;
	}

	for (i = 0; i < ITERATIONS; i++)
		sum += times[i];

	printf("Mean time in microseconds (netnsid): %ju\n",
	       sum / ITERATIONS);

	ret = snprintf(path, sizeof(path), "/proc/%d/ns/net", netns_pid);
	if (ret < 0 || (size_t)ret >= sizeof(path))
		goto on_error;

	netns_fd = open(path, O_RDONLY | O_CLOEXEC);
	if (netns_fd < 0)
		goto on_error;

	host_netns_fd = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);
	if (host_netns_fd < 0)
		goto on_error;

	memset(times, 0, sizeof(times));
	for (i = 0; i < ITERATIONS; i++) {
		ret = gettimeofday(&t1, NULL);
		if (ret < 0)
			goto on_error;

		ret = setns(netns_fd, CLONE_NEWNET);
		if (ret < 0)
			goto on_error;

		ret = getifaddrs(&ifaddrs);
		freeifaddrs(ifaddrs);
		if (ret < 0)
			goto on_error;

		ret = gettimeofday(&t2, NULL);
		if (ret < 0)
			goto on_error;

		ret = setns(host_netns_fd, CLONE_NEWNET);
		if (ret < 0)
			goto on_error;

		time_in_mcs = (SEC_TO_MICROSEC(t2.tv_sec) + t2.tv_usec) -
			      (SEC_TO_MICROSEC(t1.tv_sec) + t1.tv_usec);
		times[i] = time_in_mcs;
	}

	for (i = 0; i < ITERATIONS; i++)
		sum += times[i];

	printf("Mean time in microseconds (setns): %ju\n",
	       sum / ITERATIONS);

	fret = EXIT_SUCCESS;

on_error:
	if (netns_fd >= 0)
		close(netns_fd);

	if (host_netns_fd >= 0)
		close(host_netns_fd);

	exit(fret);
}

> 
> > 
> > I worry about all this just because of netlink interface is
> > already overloaded, while this patch makes it even less modular.
> 
> Introducing the IFA_IF_NETNSID property will not make the netlink
> interface less modular. It is a clean, RTM_*ADDR-request specific
> property using network namespace identifiers which we discussed in prior
> patches are the way to go forward.
> 
> You can already get interfaces via GETLINK from another network
> namespaces than the one you reside in (Which we enabled just a few
> months back.) but you can't do the same for GETADDR. Those two are
> almost always used together. When you want to get the links you usually
> also want to get the addresses associated with it right after.
> In a prior discussion we agreed that network namespace identifiers are
> the way to go forward but that any other propery, i.e. PIDs and fds
> should never be ported into other parts of the codebase and that is
> indeed something I agree with.
> 
> > In case of one day we finally reach rtnl unscalability trap,
> > every common interface like this may be a decisive nail in
> > a coffin lid of possibility to overwrite everything.
> > 
> > > The problem with what you're proposing is that the daemon would need to
> > > cache a socket file descriptor for each container which is something
> > > that we unfortunately cannot do since we can't excessively cache file
> > > descriptors because we can easily hit the open file limit. We also
> > > refrain from caching file descriptors for a long time for security
> > > reasons.
> > > 
> > > For the case where users just request a list of the interfaces we
> > > can already use RTM_GETLINK + IFLA_IF_NETNS which has way better
> > > performance. But we can't do the same with RTM_GETADDR requests which
> > > was an oversight on my part when I wrote the original patchset for the
> > > RTM_*LINK requests. This just rectifies this and aligns RTM_GETLINK +
> > > RTM_GETADDR.
> > > Based on this patchset I have written a userspace POC that is basically
> > > a netns namespace aware getifaddr() or - as I like to call it -
> > > netns_getifaddr().
> > > 
> > >>
> > >>> Usually, RTML_GETLINK requests are followed by RTM_GETADDR requests (cf.
> > >>> getifaddrs() style functions and friends). But currently, RTM_GETADDR
> > >>> requests do not support a similar property like IFLA_IF_NETNSID for
> > >>> RTM_*LINK requests.
> > >>> This is problematic since userspace can retrieve interfaces from another
> > >>> network namespace by sending a IFLA_IF_NETNSID property along but
> > >>> RTM_GETLINK request but is still forced to use the legacy setns() style of
> > >>> retrieving interfaces in RTM_GETADDR requests.
> > >>>
> > >>> The goal of this series is to make it possible to perform RTM_GETADDR
> > >>> requests on different network namespaces. To this end a new IFA_IF_NETNSID
> > >>> property for RTM_*ADDR requests is introduced. It can be used to send a
> > >>> network namespace identifier along in RTM_*ADDR requests.  The network
> > >>> namespace identifier will be used to retrieve the target network namespace
> > >>> in which the request is supposed to be fulfilled.  This aligns the behavior
> > >>> of RTM_*ADDR requests with the behavior of RTM_*LINK requests.
> > >>>
> > >>> Security:
> > >>> - The caller must have assigned a valid network namespace identifier for
> > >>>   the target network namespace.
> > >>> - The caller must have CAP_NET_ADMIN in the owning user namespace of the
> > >>>   target network namespace.
> > >>>
> > >>> Thanks!
> > >>> Christian
> > >>>
> > >>> [1]: commit 7973bfd8758d ("rtnetlink: remove check for IFLA_IF_NETNSID")
> > >>> [2]: commit 5bb8ed075428 ("rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK")
> > >>> [3]: commit b61ad68a9fe8 ("rtnetlink: enable IFLA_IF_NETNSID for RTM_DELLINK")
> > >>> [4]: commit c310bfcb6e1b ("rtnetlink: enable IFLA_IF_NETNSID for RTM_SETLINK")
> > >>> [5]: commit 7c4f63ba8243 ("rtnetlink: enable IFLA_IF_NETNSID in do_setlink()")
> > >>>
> > >>> Christian Brauner (5):
> > >>>   rtnetlink: add rtnl_get_net_ns_capable()
> > >>>   if_addr: add IFA_IF_NETNSID
> > >>>   ipv4: enable IFA_IF_NETNSID for RTM_GETADDR
> > >>>   ipv6: enable IFA_IF_NETNSID for RTM_GETADDR
> > >>>   rtnetlink: move type calculation out of loop
> > >>>
> > >>>  include/net/rtnetlink.h      |  1 +
> > >>>  include/uapi/linux/if_addr.h |  1 +
> > >>>  net/core/rtnetlink.c         | 15 +++++---
> > >>>  net/ipv4/devinet.c           | 38 +++++++++++++++-----
> > >>>  net/ipv6/addrconf.c          | 70 ++++++++++++++++++++++++++++--------
> > >>>  5 files changed, 97 insertions(+), 28 deletions(-)
> > >>>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR
  2018-09-01  0:58           ` David Miller
@ 2018-09-01 18:47             ` Christian Brauner
  2018-09-02  9:58               ` Jiri Benc
  0 siblings, 1 reply; 23+ messages in thread
From: Christian Brauner @ 2018-09-01 18:47 UTC (permalink / raw)
  To: David Miller
  Cc: nicolas.dichtel, ktkhai, netdev, linux-kernel, kuznet, yoshfuji,
	pombredanne, kstewart, gregkh, dsahern, fw, lucien.xin,
	jakub.kicinski, jbenc

[-- Attachment #1: Type: text/plain, Size: 708 bytes --]

On Fri, Aug 31, 2018 at 05:58:47PM -0700, David Miller wrote:
> From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Date: Thu, 30 Aug 2018 17:49:12 +0200
> 
> > Maybe I would choose a more generic name for the attribute,
> > something that can be used in other netlink families (xfrm,
> > netfilter, ...) also.  What about IFA_TARGET_NSID?
> 
> Agreed, let's at least try to think ahead about potential future
> uses :)

Sorry for the late reply. Yup, sounds good to me.
But maybe IFA_TARGET_NETNSID to indicate that we're talking network
namespaces here? Seems to me that NSID might give the wrong impression.
I'll send v1 soon. I expect tomorrow or sometime next week.

Thanks!
Christian

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR
  2018-09-01 18:47             ` Christian Brauner
@ 2018-09-02  9:58               ` Jiri Benc
  2018-09-03  7:50                 ` Nicolas Dichtel
  0 siblings, 1 reply; 23+ messages in thread
From: Jiri Benc @ 2018-09-02  9:58 UTC (permalink / raw)
  To: Christian Brauner
  Cc: David Miller, nicolas.dichtel, ktkhai, netdev, linux-kernel,
	kuznet, yoshfuji, pombredanne, kstewart, gregkh, dsahern, fw,
	lucien.xin, jakub.kicinski

On Sat, 1 Sep 2018 20:47:05 +0200, Christian Brauner wrote:
> Sorry for the late reply. Yup, sounds good to me.
> But maybe IFA_TARGET_NETNSID to indicate that we're talking network
> namespaces here? Seems to me that NSID might give the wrong impression.
> I'll send v1 soon. I expect tomorrow or sometime next week.

On the other hand, we currently have IFLA_IF_NETNSID for the link
operations. IFA_IF_NETNSID is more consistent with the existing
attribute. It may be confusing to authors of user space programs to
have attribute names doing the same thing constructed differently for
different calls (IFLA_IF_NETNSID and IFA_TARGET_NETNSID).

As for the patch set itself, it makes sense to me, whatever the
final attribute name is.

Thanks,

 Jiri

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next 4/5] ipv6: enable IFA_IF_NETNSID for RTM_GETADDR
  2018-08-30 18:41   ` kbuild test robot
@ 2018-09-03  1:18     ` Christian Brauner
  0 siblings, 0 replies; 23+ messages in thread
From: Christian Brauner @ 2018-09-03  1:18 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, netdev, linux-kernel, davem, kuznet, yoshfuji,
	pombredanne, kstewart, gregkh, dsahern, fw, ktkhai, lucien.xin,
	jakub.kicinski, jbenc, nicolas.dichtel

On Fri, Aug 31, 2018 at 02:41:45AM +0800, kbuild test robot wrote:
> Hi Christian,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on net-next/master]
> 
> url:    https://github.com/0day-ci/linux/commits/Christian-Brauner/rtnetlink-add-IFA_IF_NETNSID-for-RTM_GETADDR/20180830-194411
> config: x86_64-randconfig-s1-08302022 (attached as .config)
> compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
> reproduce:
>         # save the attached .config to linux build tree
>         make ARCH=x86_64 
> 
> All errors (new ones prefixed by >>):
> 
> >> ERROR: "rtnl_get_net_ns_capable" [net/ipv6/ipv6.ko] undefined!

Fwiw, this is triggered when ipv6 is built as a module. The first
version of my patch did not take that into account and left
rtnl_get_net_ns_capable() unexported. The second version will export
that function and fix this issue.

Christian

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR
  2018-09-02  9:58               ` Jiri Benc
@ 2018-09-03  7:50                 ` Nicolas Dichtel
  2018-09-03  9:32                   ` Christian Brauner
  0 siblings, 1 reply; 23+ messages in thread
From: Nicolas Dichtel @ 2018-09-03  7:50 UTC (permalink / raw)
  To: Jiri Benc, Christian Brauner
  Cc: David Miller, ktkhai, netdev, linux-kernel, kuznet, yoshfuji,
	pombredanne, kstewart, gregkh, dsahern, fw, lucien.xin,
	jakub.kicinski

Le 02/09/2018 à 11:58, Jiri Benc a écrit :
> On Sat, 1 Sep 2018 20:47:05 +0200, Christian Brauner wrote:
>> Sorry for the late reply. Yup, sounds good to me.
>> But maybe IFA_TARGET_NETNSID to indicate that we're talking network
>> namespaces here? Seems to me that NSID might give the wrong impression.
>> I'll send v1 soon. I expect tomorrow or sometime next week.
> 
> On the other hand, we currently have IFLA_IF_NETNSID for the link
> operations. IFA_IF_NETNSID is more consistent with the existing
> attribute. It may be confusing to authors of user space programs to
> have attribute names doing the same thing constructed differently for
> different calls (IFLA_IF_NETNSID and IFA_TARGET_NETNSID).
Yep, I also thought to this. It had sense because it was the link family.
What about adding an alias (#define IFLA_TARGET_NETNSID IFLA_IF_NETNSID) in the
uapi to better document this?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR
  2018-09-03  7:50                 ` Nicolas Dichtel
@ 2018-09-03  9:32                   ` Christian Brauner
  0 siblings, 0 replies; 23+ messages in thread
From: Christian Brauner @ 2018-09-03  9:32 UTC (permalink / raw)
  To: Nicolas Dichtel
  Cc: Jiri Benc, David Miller, ktkhai, netdev, linux-kernel, kuznet,
	yoshfuji, pombredanne, kstewart, gregkh, dsahern, fw, lucien.xin,
	jakub.kicinski

On Mon, Sep 03, 2018 at 09:50:00AM +0200, Nicolas Dichtel wrote:
> Le 02/09/2018 à 11:58, Jiri Benc a écrit :
> > On Sat, 1 Sep 2018 20:47:05 +0200, Christian Brauner wrote:
> >> Sorry for the late reply. Yup, sounds good to me.
> >> But maybe IFA_TARGET_NETNSID to indicate that we're talking network
> >> namespaces here? Seems to me that NSID might give the wrong impression.
> >> I'll send v1 soon. I expect tomorrow or sometime next week.
> > 
> > On the other hand, we currently have IFLA_IF_NETNSID for the link
> > operations. IFA_IF_NETNSID is more consistent with the existing
> > attribute. It may be confusing to authors of user space programs to
> > have attribute names doing the same thing constructed differently for
> > different calls (IFLA_IF_NETNSID and IFA_TARGET_NETNSID).
> Yep, I also thought to this. It had sense because it was the link family.
> What about adding an alias (#define IFLA_TARGET_NETNSID IFLA_IF_NETNSID) in the
> uapi to better document this?

Sounds good to me.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR
  2018-09-01  1:34         ` Christian Brauner
@ 2018-09-03 13:41           ` Kirill Tkhai
  2018-09-03 13:50             ` Jiri Benc
  2018-09-03 14:22             ` Christian Brauner
  0 siblings, 2 replies; 23+ messages in thread
From: Kirill Tkhai @ 2018-09-03 13:41 UTC (permalink / raw)
  To: Christian Brauner
  Cc: netdev, linux-kernel, davem, kuznet, yoshfuji, pombredanne,
	kstewart, gregkh, dsahern, fw, lucien.xin, jakub.kicinski, jbenc,
	nicolas.dichtel

On 01.09.2018 04:34, Christian Brauner wrote:
> On Thu, Aug 30, 2018 at 04:45:45PM +0200, Christian Brauner wrote:
>> On Thu, Aug 30, 2018 at 11:49:31AM +0300, Kirill Tkhai wrote:
>>> On 29.08.2018 21:13, Christian Brauner wrote:
>>>> Hi Kirill,
>>>>
>>>> Thanks for the question!
>>>>
>>>> On Wed, Aug 29, 2018 at 11:30:37AM +0300, Kirill Tkhai wrote:
>>>>> Hi, Christian,
>>>>>
>>>>> On 29.08.2018 02:18, Christian Brauner wrote:
>>>>>> From: Christian Brauner <christian@brauner.io>
>>>>>>
>>>>>> Hey,
>>>>>>
>>>>>> A while back we introduced and enabled IFLA_IF_NETNSID in
>>>>>> RTM_{DEL,GET,NEW}LINK requests (cf. [1], [2], [3], [4], [5]). This has led
>>>>>> to signficant performance increases since it allows userspace to avoid
>>>>>> taking the hit of a setns(netns_fd, CLONE_NEWNET), then getting the
>>>>>> interfaces from the netns associated with the netns_fd. Especially when a
>>>>>> lot of network namespaces are in use, using setns() becomes increasingly
>>>>>> problematic when performance matters.
>>>>>
>>>>> could you please give a real example, when setns()+socket(AF_NETLINK) cause
>>>>> problems with the performance? You should do this only once on application
>>>>> startup, and then you have created netlink sockets in any net namespaces you
>>>>> need. What is the problem here?
>>>>
>>>> So we have a daemon (LXD) that is often running thousands of containers.
>>>> When users issue a lxc list request against the daemon it returns a list
>>>> of all containers including all of the interfaces and addresses for each
>>>> container. To retrieve those addresses we currently rely on setns() +
>>>> getifaddrs() for each of those containers. That has horrible
>>>> performance.
>>>
>>> Could you please provide some numbers showing that setns()
>>> introduces signify performance decrease in the application?
>>
>> Sure, might take a few days++ though since I'm traveling.
> 
> Hey Kirill,
> 
> As promised here's some code [1] that compares performance. I basically
> did a setns() to the network namespace and called getifaddrs() and
> compared this to the scenario where I use the newly introduced property.
> I did this 1 million times and calculated the mean getifaddrs()
> retrieval time based on that.
> My patch cuts the time in half.
> 
> brauner@wittgenstein:~/netns_getifaddrs$ ./getifaddrs_perf 0 1178
> Mean time in microseconds (netnsid): 81
> Mean time in microseconds (setns): 162
> 
> Christian
> 
> I'm only appending the main file since the netsns_getifaddrs() code I
> used is pretty long:
> 
> [1]:
> 
> #define _GNU_SOURCE
> #define __STDC_FORMAT_MACROS
> #include <fcntl.h>
> #include <inttypes.h>
> #include <linux/types.h>
> #include <sched.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/stat.h>
> #include <sys/time.h>
> #include <sys/types.h>
> #include <unistd.h>
> 
> #include "netns_getifaddrs.h"
> #include "print_getifaddrs.h"
> 
> #define ITERATIONS 1000000
> #define SEC_TO_MICROSEC(x) (1000000 * (x))
> 
> int main(int argc, char *argv[])
> {
> 	int i, ret;
> 	__s32 netns_id;
> 	pid_t netns_pid;
> 	char path[1024];
> 	intmax_t times[ITERATIONS];
> 	struct timeval t1, t2;
> 	intmax_t time_in_mcs;
> 	int fret = EXIT_FAILURE;
> 	intmax_t sum = 0;
> 	int host_netns_fd = -1, netns_fd = -1;
> 
> 	struct ifaddrs *ifaddrs = NULL;
> 
> 	if (argc != 3)
> 		goto on_error;
> 
> 	netns_id = atoi(argv[1]);
> 	netns_pid = atoi(argv[2]);
> 	printf("%d\n", netns_id);
> 	printf("%d\n", netns_pid);
> 
> 	for (i = 0; i < ITERATIONS; i++) {
> 		ret = gettimeofday(&t1, NULL);
> 		if (ret < 0)
> 			goto on_error;
> 
> 		ret = netns_getifaddrs(&ifaddrs, netns_id);
> 		freeifaddrs(ifaddrs);
> 		if (ret < 0)
> 			goto on_error;
> 
> 		ret = gettimeofday(&t2, NULL);
> 		if (ret < 0)
> 			goto on_error;
> 
> 		time_in_mcs = (SEC_TO_MICROSEC(t2.tv_sec) + t2.tv_usec) -
> 			      (SEC_TO_MICROSEC(t1.tv_sec) + t1.tv_usec);
> 		times[i] = time_in_mcs;
> 	}
> 
> 	for (i = 0; i < ITERATIONS; i++)
> 		sum += times[i];
> 
> 	printf("Mean time in microseconds (netnsid): %ju\n",
> 	       sum / ITERATIONS);
> 
> 	ret = snprintf(path, sizeof(path), "/proc/%d/ns/net", netns_pid);
> 	if (ret < 0 || (size_t)ret >= sizeof(path))
> 		goto on_error;
> 
> 	netns_fd = open(path, O_RDONLY | O_CLOEXEC);
> 	if (netns_fd < 0)
> 		goto on_error;
> 
> 	host_netns_fd = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);
> 	if (host_netns_fd < 0)
> 		goto on_error;
> 
> 	memset(times, 0, sizeof(times));
> 	for (i = 0; i < ITERATIONS; i++) {
> 		ret = gettimeofday(&t1, NULL);
> 		if (ret < 0)
> 			goto on_error;
> 
> 		ret = setns(netns_fd, CLONE_NEWNET);
> 		if (ret < 0)
> 			goto on_error;
> 
> 		ret = getifaddrs(&ifaddrs);
> 		freeifaddrs(ifaddrs);
> 		if (ret < 0)
> 			goto on_error;
> 
> 		ret = gettimeofday(&t2, NULL);
> 		if (ret < 0)
> 			goto on_error;
> 
> 		ret = setns(host_netns_fd, CLONE_NEWNET);
> 		if (ret < 0)
> 			goto on_error;
> 
> 		time_in_mcs = (SEC_TO_MICROSEC(t2.tv_sec) + t2.tv_usec) -
> 			      (SEC_TO_MICROSEC(t1.tv_sec) + t1.tv_usec);
> 		times[i] = time_in_mcs;
> 	}
> 
> 	for (i = 0; i < ITERATIONS; i++)
> 		sum += times[i];
> 
> 	printf("Mean time in microseconds (setns): %ju\n",
> 	       sum / ITERATIONS);
> 
> 	fret = EXIT_SUCCESS;
> 
> on_error:
> 	if (netns_fd >= 0)
> 		close(netns_fd);
> 
> 	if (host_netns_fd >= 0)
> 		close(host_netns_fd);
> 
> 	exit(fret);
> }

But this is a synthetic test, while I asked about real workflow.
Is this real problem for lxd, and there is observed performance
decrease?

I see, there are already nsid use in existing code, but I have to say,
that adding new types of variables as a system call arguments make it
less modular. When you request RTM_GETADDR for a specific nsid, this
obligates the kernel to make everything unchangable during the call,
doesn't it?

We may look at existing code as example, what problems this may cause.
Look at do_setlink(). There are many different types of variables,
and all of them should be dereferenced atomically. So, all the function
is executed under global rtnl. And this causes delays in another config
places, which are sensitive to rtnl. So, adding more dimensions to RTM_GETADDR
may turn it in the same overloaded function as do_setlink() is. And one
day, when we reach the state, when we must rework all of this, we won't
be able to do this. I'm not sure, now is not too late.

I just say about this, because it's possible we should consider another
approach in rtnl communication in general, and stop to overload it.

Kirill


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR
  2018-09-03 13:41           ` Kirill Tkhai
@ 2018-09-03 13:50             ` Jiri Benc
  2018-09-03 14:53               ` Kirill Tkhai
  2018-09-03 14:22             ` Christian Brauner
  1 sibling, 1 reply; 23+ messages in thread
From: Jiri Benc @ 2018-09-03 13:50 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: Christian Brauner, netdev, linux-kernel, davem, kuznet, yoshfuji,
	pombredanne, kstewart, gregkh, dsahern, fw, lucien.xin,
	jakub.kicinski, nicolas.dichtel

On Mon, 3 Sep 2018 16:41:45 +0300, Kirill Tkhai wrote:
> But this is a synthetic test, while I asked about real workflow.
> Is this real problem for lxd, and there is observed performance
> decrease?

It's actually not as much a performance problem but rather the only way
to get the data in some situations. Namely, when you have only netnsid.
This happens e.g. when you want to query a veth peer in another netns.

setns() requires a file descriptor which you don't have. Nor there is
a way to convert netnsid to a fd.

While developing the IFLA_IF_NETNSID patch, I was first thinking about
implementing an API doing the conversion. The problem is there's no
good place to put this into. It can't be done over netlink: netlink is
unreliable and you can't have the kernel open a fd for you and lose it.
There's no ioctl to use. So we'd be left with a procfs/sysfs or a
syscall.

Using netnsid to refer to the target netns seems to be a nice solution -
after all, netnsid is the identifier to use in netlink.

 Jiri

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR
  2018-09-03 13:41           ` Kirill Tkhai
  2018-09-03 13:50             ` Jiri Benc
@ 2018-09-03 14:22             ` Christian Brauner
  1 sibling, 0 replies; 23+ messages in thread
From: Christian Brauner @ 2018-09-03 14:22 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: netdev, linux-kernel, davem, kuznet, yoshfuji, pombredanne,
	kstewart, gregkh, dsahern, fw, lucien.xin, jakub.kicinski, jbenc,
	nicolas.dichtel

On Mon, Sep 03, 2018 at 04:41:45PM +0300, Kirill Tkhai wrote:
> On 01.09.2018 04:34, Christian Brauner wrote:
> > On Thu, Aug 30, 2018 at 04:45:45PM +0200, Christian Brauner wrote:
> >> On Thu, Aug 30, 2018 at 11:49:31AM +0300, Kirill Tkhai wrote:
> >>> On 29.08.2018 21:13, Christian Brauner wrote:
> >>>> Hi Kirill,
> >>>>
> >>>> Thanks for the question!
> >>>>
> >>>> On Wed, Aug 29, 2018 at 11:30:37AM +0300, Kirill Tkhai wrote:
> >>>>> Hi, Christian,
> >>>>>
> >>>>> On 29.08.2018 02:18, Christian Brauner wrote:
> >>>>>> From: Christian Brauner <christian@brauner.io>
> >>>>>>
> >>>>>> Hey,
> >>>>>>
> >>>>>> A while back we introduced and enabled IFLA_IF_NETNSID in
> >>>>>> RTM_{DEL,GET,NEW}LINK requests (cf. [1], [2], [3], [4], [5]). This has led
> >>>>>> to signficant performance increases since it allows userspace to avoid
> >>>>>> taking the hit of a setns(netns_fd, CLONE_NEWNET), then getting the
> >>>>>> interfaces from the netns associated with the netns_fd. Especially when a
> >>>>>> lot of network namespaces are in use, using setns() becomes increasingly
> >>>>>> problematic when performance matters.
> >>>>>
> >>>>> could you please give a real example, when setns()+socket(AF_NETLINK) cause
> >>>>> problems with the performance? You should do this only once on application
> >>>>> startup, and then you have created netlink sockets in any net namespaces you
> >>>>> need. What is the problem here?
> >>>>
> >>>> So we have a daemon (LXD) that is often running thousands of containers.
> >>>> When users issue a lxc list request against the daemon it returns a list
> >>>> of all containers including all of the interfaces and addresses for each
> >>>> container. To retrieve those addresses we currently rely on setns() +
> >>>> getifaddrs() for each of those containers. That has horrible
> >>>> performance.
> >>>
> >>> Could you please provide some numbers showing that setns()
> >>> introduces signify performance decrease in the application?
> >>
> >> Sure, might take a few days++ though since I'm traveling.
> > 
> > Hey Kirill,
> > 
> > As promised here's some code [1] that compares performance. I basically
> > did a setns() to the network namespace and called getifaddrs() and
> > compared this to the scenario where I use the newly introduced property.
> > I did this 1 million times and calculated the mean getifaddrs()
> > retrieval time based on that.
> > My patch cuts the time in half.
> > 
> > brauner@wittgenstein:~/netns_getifaddrs$ ./getifaddrs_perf 0 1178
> > Mean time in microseconds (netnsid): 81
> > Mean time in microseconds (setns): 162
> > 
> > Christian
> > 
> > I'm only appending the main file since the netsns_getifaddrs() code I
> > used is pretty long:
> > 
> > [1]:
> > 
> > #define _GNU_SOURCE
> > #define __STDC_FORMAT_MACROS
> > #include <fcntl.h>
> > #include <inttypes.h>
> > #include <linux/types.h>
> > #include <sched.h>
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <string.h>
> > #include <sys/stat.h>
> > #include <sys/time.h>
> > #include <sys/types.h>
> > #include <unistd.h>
> > 
> > #include "netns_getifaddrs.h"
> > #include "print_getifaddrs.h"
> > 
> > #define ITERATIONS 1000000
> > #define SEC_TO_MICROSEC(x) (1000000 * (x))
> > 
> > int main(int argc, char *argv[])
> > {
> > 	int i, ret;
> > 	__s32 netns_id;
> > 	pid_t netns_pid;
> > 	char path[1024];
> > 	intmax_t times[ITERATIONS];
> > 	struct timeval t1, t2;
> > 	intmax_t time_in_mcs;
> > 	int fret = EXIT_FAILURE;
> > 	intmax_t sum = 0;
> > 	int host_netns_fd = -1, netns_fd = -1;
> > 
> > 	struct ifaddrs *ifaddrs = NULL;
> > 
> > 	if (argc != 3)
> > 		goto on_error;
> > 
> > 	netns_id = atoi(argv[1]);
> > 	netns_pid = atoi(argv[2]);
> > 	printf("%d\n", netns_id);
> > 	printf("%d\n", netns_pid);
> > 
> > 	for (i = 0; i < ITERATIONS; i++) {
> > 		ret = gettimeofday(&t1, NULL);
> > 		if (ret < 0)
> > 			goto on_error;
> > 
> > 		ret = netns_getifaddrs(&ifaddrs, netns_id);
> > 		freeifaddrs(ifaddrs);
> > 		if (ret < 0)
> > 			goto on_error;
> > 
> > 		ret = gettimeofday(&t2, NULL);
> > 		if (ret < 0)
> > 			goto on_error;
> > 
> > 		time_in_mcs = (SEC_TO_MICROSEC(t2.tv_sec) + t2.tv_usec) -
> > 			      (SEC_TO_MICROSEC(t1.tv_sec) + t1.tv_usec);
> > 		times[i] = time_in_mcs;
> > 	}
> > 
> > 	for (i = 0; i < ITERATIONS; i++)
> > 		sum += times[i];
> > 
> > 	printf("Mean time in microseconds (netnsid): %ju\n",
> > 	       sum / ITERATIONS);
> > 
> > 	ret = snprintf(path, sizeof(path), "/proc/%d/ns/net", netns_pid);
> > 	if (ret < 0 || (size_t)ret >= sizeof(path))
> > 		goto on_error;
> > 
> > 	netns_fd = open(path, O_RDONLY | O_CLOEXEC);
> > 	if (netns_fd < 0)
> > 		goto on_error;
> > 
> > 	host_netns_fd = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);
> > 	if (host_netns_fd < 0)
> > 		goto on_error;
> > 
> > 	memset(times, 0, sizeof(times));
> > 	for (i = 0; i < ITERATIONS; i++) {
> > 		ret = gettimeofday(&t1, NULL);
> > 		if (ret < 0)
> > 			goto on_error;
> > 
> > 		ret = setns(netns_fd, CLONE_NEWNET);
> > 		if (ret < 0)
> > 			goto on_error;
> > 
> > 		ret = getifaddrs(&ifaddrs);
> > 		freeifaddrs(ifaddrs);
> > 		if (ret < 0)
> > 			goto on_error;
> > 
> > 		ret = gettimeofday(&t2, NULL);
> > 		if (ret < 0)
> > 			goto on_error;
> > 
> > 		ret = setns(host_netns_fd, CLONE_NEWNET);
> > 		if (ret < 0)
> > 			goto on_error;
> > 
> > 		time_in_mcs = (SEC_TO_MICROSEC(t2.tv_sec) + t2.tv_usec) -
> > 			      (SEC_TO_MICROSEC(t1.tv_sec) + t1.tv_usec);
> > 		times[i] = time_in_mcs;
> > 	}
> > 
> > 	for (i = 0; i < ITERATIONS; i++)
> > 		sum += times[i];
> > 
> > 	printf("Mean time in microseconds (setns): %ju\n",
> > 	       sum / ITERATIONS);
> > 
> > 	fret = EXIT_SUCCESS;
> > 
> > on_error:
> > 	if (netns_fd >= 0)
> > 		close(netns_fd);
> > 
> > 	if (host_netns_fd >= 0)
> > 		close(host_netns_fd);
> > 
> > 	exit(fret);
> > }
> 
> But this is a synthetic test, while I asked about real workflow.
> Is this real problem for lxd, and there is observed performance
> decrease?

As you can see in this mail I explicitly stated that it is a real
performacne issue we see with LXD. You asked for numbers I gave you
numbers by writing a test-program just per your request. The benefit of
this "synthetic" case is that it allows us to clearly see the
performance benefit. Expecting me to hack all of this into LXD just to
get some perf numbers that will show the exact same thing per your
request is - and I hope I'm not being unreasonable here - expecting a
bit much.

> 
> I see, there are already nsid use in existing code, but I have to say,
> that adding new types of variables as a system call arguments make it
> less modular. When you request RTM_GETADDR for a specific nsid, this
> obligates the kernel to make everything unchangable during the call,
> doesn't it?
> 
> We may look at existing code as example, what problems this may cause.
> Look at do_setlink(). There are many different types of variables,
> and all of them should be dereferenced atomically. So, all the function
> is executed under global rtnl. And this causes delays in another config
> places, which are sensitive to rtnl. So, adding more dimensions to RTM_GETADDR
> may turn it in the same overloaded function as do_setlink() is. And one
> day, when we reach the state, when we must rework all of this, we won't
> be able to do this. I'm not sure, now is not too late.
> 
> I just say about this, because it's possible we should consider another
> approach in rtnl communication in general, and stop to overload it.

While I sympathize with your concerns this all seems very vague. There
is a real-world use case that is solved by this patchset.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR
  2018-09-03 13:50             ` Jiri Benc
@ 2018-09-03 14:53               ` Kirill Tkhai
  0 siblings, 0 replies; 23+ messages in thread
From: Kirill Tkhai @ 2018-09-03 14:53 UTC (permalink / raw)
  To: Jiri Benc
  Cc: Christian Brauner, netdev, linux-kernel, davem, kuznet, yoshfuji,
	pombredanne, kstewart, gregkh, dsahern, fw, lucien.xin,
	jakub.kicinski, nicolas.dichtel

On 03.09.2018 16:50, Jiri Benc wrote:
> On Mon, 3 Sep 2018 16:41:45 +0300, Kirill Tkhai wrote:
>> But this is a synthetic test, while I asked about real workflow.
>> Is this real problem for lxd, and there is observed performance
>> decrease?
> 
> It's actually not as much a performance problem but rather the only way
> to get the data in some situations. Namely, when you have only netnsid.
> This happens e.g. when you want to query a veth peer in another netns.
> 
> setns() requires a file descriptor which you don't have. Nor there is
> a way to convert netnsid to a fd.
> 
> While developing the IFLA_IF_NETNSID patch, I was first thinking about
> implementing an API doing the conversion. The problem is there's no
> good place to put this into. It can't be done over netlink: netlink is
> unreliable and you can't have the kernel open a fd for you and lose it.
> There's no ioctl to use. So we'd be left with a procfs/sysfs or a
> syscall.
> 
> Using netnsid to refer to the target netns seems to be a nice solution -
> after all, netnsid is the identifier to use in netlink.

Ok, I have no objections, just mentioning that generic problem.

Thanks,
Kirill

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2018-09-03 14:53 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-28 23:18 [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR Christian Brauner
2018-08-28 23:18 ` [PATCH net-next 1/5] rtnetlink: add rtnl_get_net_ns_capable() Christian Brauner
2018-08-28 23:18 ` [PATCH net-next 2/5] if_addr: add IFA_IF_NETNSID Christian Brauner
2018-08-28 23:18 ` [PATCH net-next 3/5] ipv4: enable IFA_IF_NETNSID for RTM_GETADDR Christian Brauner
2018-08-28 23:18 ` [PATCH net-next 4/5] ipv6: " Christian Brauner
2018-08-30 18:41   ` kbuild test robot
2018-09-03  1:18     ` Christian Brauner
2018-08-28 23:18 ` [PATCH net-next 5/5] rtnetlink: move type calculation out of loop Christian Brauner
2018-08-29  8:30 ` [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR Kirill Tkhai
2018-08-29 18:13   ` Christian Brauner
2018-08-30  8:49     ` Kirill Tkhai
2018-08-30 14:45       ` Christian Brauner
2018-08-30 15:49         ` Nicolas Dichtel
2018-09-01  0:58           ` David Miller
2018-09-01 18:47             ` Christian Brauner
2018-09-02  9:58               ` Jiri Benc
2018-09-03  7:50                 ` Nicolas Dichtel
2018-09-03  9:32                   ` Christian Brauner
2018-09-01  1:34         ` Christian Brauner
2018-09-03 13:41           ` Kirill Tkhai
2018-09-03 13:50             ` Jiri Benc
2018-09-03 14:53               ` Kirill Tkhai
2018-09-03 14:22             ` Christian Brauner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.