[PATCH net-next v2 0/2] net: ipv6: Improve user experience with multipath routes

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH net-next v2 0/2] net: ipv6: Improve user experience with multipath routes
@ 2017-01-20  6:10 David Ahern
  2017-01-20  6:10 ` [PATCH net-next v2 1/2] net: ipv6: Allow shorthand delete of all nexthops in multipath route David Ahern
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: David Ahern @ 2017-01-20  6:10 UTC (permalink / raw)
  To: netdev; +Cc: David Ahern

This series closes a couple of gaps between IPv4 and IPv6 with respect
to multipath routes:

1. IPv4 allows all nexthops of multipath routes to be deleted using just
   the prefix and length; IPv6 only deletes the first nexthop for the
   route if only the prefix and length are given.

2. IPv4 returns multipath routes encoded in the RTA_MULTIPATH attribute.
   IPv6 returns a series of routes with the same prefix and length - one
   for each nexthop. This happens for both dumps and notifications.

IPv6 does accept RTA_MULTIPATH encoded routes, but installs them as a
series of routes.

Patch 1 addresses the first by allowing IPv6 multipath routes to be
deleted using just the prefix and length. Patch 2 addresses the second
allowing IPv6 multipath routes to be returned encoded in the RTA_MULTIPATH.

In both cases, the new behavior requires users to opt in by setting a new
flag, RTM_F_ALL_NEXTHOPS, in the rtm_flags of struct rtmsg which is
expected to be the ancillary header in the netlink request received from
the user. A program must opt in to the new behavior so as to not break
any existing applications.

The opt-in behavior works for both route deletes and dumps (the two
differences noted above), but not for notifications as notifications
do not take user input to specify flags. The only way to have
notifications generate RTA_MULTIPATH encodings is to have a gobal
flag -- e.g., sysctl. I'd prefer not to add a sysctl knob for this
backwards compatibility.

Notifications can always be handled as a follow on patch since any
existing code that actually works with IPv6 multipath routes (e.g.,
quagga does not) has to be able to handle the series of routes
currently returned by the dump request and notifications.

v2
- fixed locking in patch 1 as noted by DaveM
- changed user API for patch 2 to require an rtmsg with RTM_F_ALL_NEXTHOPS
  set in rtm_flags
- revamped explanation of patch 2 and cover letter

David Ahern (2):
  net: ipv6: Allow shorthand delete of all nexthops in multipath route
  net: ipv6: Add option to dump multipath routes via RTA_MULTIPATH
    attribute

 include/net/ip6_fib.h          |   4 +-
 include/net/ip6_route.h        |   1 +
 include/uapi/linux/rtnetlink.h |   4 ++
 net/ipv6/ip6_fib.c             |  28 ++++++++-
 net/ipv6/route.c               | 138 ++++++++++++++++++++++++++++++++++-------
 5 files changed, 152 insertions(+), 23 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH net-next v2 1/2] net: ipv6: Allow shorthand delete of all nexthops in multipath route
  2017-01-20  6:10 [PATCH net-next v2 0/2] net: ipv6: Improve user experience with multipath routes David Ahern
@ 2017-01-20  6:10 ` David Ahern
  2017-01-20  6:10 ` [PATCH net-next v2 2/2] net: ipv6: Add option to dump multipath routes via RTA_MULTIPATH attribute David Ahern
  2017-01-20 19:31 ` [PATCH net-next v2 0/2] net: ipv6: Improve user experience with multipath routes David Ahern
  2 siblings, 0 replies; 11+ messages in thread
From: David Ahern @ 2017-01-20  6:10 UTC (permalink / raw)
  To: netdev; +Cc: David Ahern

IPv4 allows multipath routes to be deleted using just the prefix and
length. For example:
    $ ip ro ls vrf red
    unreachable default metric 8192
    1.1.1.0/24
        nexthop via 10.100.1.254  dev eth1 weight 1
        nexthop via 10.11.200.2  dev eth11.200 weight 1
    10.11.200.0/24 dev eth11.200 proto kernel scope link src 10.11.200.3
    10.100.1.0/24 dev eth1 proto kernel scope link src 10.100.1.3

    $ ip ro del 1.1.1.0/24 vrf red

    $ ip ro ls vrf red
    unreachable default metric 8192
    10.11.200.0/24 dev eth11.200 proto kernel scope link src 10.11.200.3
    10.100.1.0/24 dev eth1 proto kernel scope link src 10.100.1.3

The same notation does not work with IPv6 because of how multipath routes
are implemented for IPv6. For IPv6 only the first nexthop of a multipath
route is deleted if the request contains only a prefix and length. This
leads to unnecessary complexity in userspace dealing with IPv6 multipath
routes.

This patch allows all nexthops to be deleted without specifying each one
in the delete request by passing a new flag, RTM_F_ALL_NEXTHOPS, in
rtm_flags. Internally, this is done by walking the sibling list of the
route matching the specifications given (prefix, length, metric, protocol,
etc).

With this patch (and an updated iproute2 command):
    $  ip -6 ro ls vrf red
    2001:db8::/120 via 2001:db8:1::62 dev eth1 metric 256  pref medium
    2001:db8::/120 via 2001:db8:1::61 dev eth1 metric 256  pref medium
    2001:db8::/120 via 2001:db8:1::60 dev eth1 metric 256  pref medium
    2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
    ...

    $ ip -6 ro del vrf red 1111::1/120
    $ ip -6 ro ls vrf red
    2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
    ...

The flag is added to fib6_config by converting fc_type to a u16 (as
noted fc_type only uses 8 bits). The new u16 hole is a bitmap with
fc_delete_all_nexthop as the first bit.

v2
- fixed locking deleting route and its siblings as noted by DaveM

v2' (patch originally submitted standalone)
- switched examples to rfc 3849 documentation address per request
- changed delete loop to explicitly look at siblings list for
  first route matching specs given (metric, protocol, etc)

Suggested-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 include/net/ip6_fib.h          |  4 +++-
 include/uapi/linux/rtnetlink.h |  1 +
 net/ipv6/route.c               | 32 +++++++++++++++++++++++++++++++-
 3 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index a74e2aa40ef4..11ab99e87c5f 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -37,7 +37,9 @@ struct fib6_config {
 	int		fc_ifindex;
 	u32		fc_flags;
 	u32		fc_protocol;
-	u32		fc_type;	/* only 8 bits are used */
+	u16		fc_type;	/* only 8 bits are used */
+	u16		fc_delete_all_nexthop : 1,
+			__unused : 15;
 
 	struct in6_addr	fc_dst;
 	struct in6_addr	fc_src;
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 8c93ad1ef9ab..7fb206bc42f9 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -276,6 +276,7 @@ enum rt_scope_t {
 #define RTM_F_EQUALIZE		0x400	/* Multipath equalizer: NI	*/
 #define RTM_F_PREFIX		0x800	/* Prefix addresses		*/
 #define RTM_F_LOOKUP_TABLE	0x1000	/* set rtm_table to FIB lookup result */
+#define RTM_F_ALL_NEXTHOPS	0x2000	/* delete all nexthops (IPv6) */
 
 /* Reserved table identifiers */
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 5585c501a540..f207d4d0a782 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2143,6 +2143,33 @@ int ip6_del_rt(struct rt6_info *rt)
 	return __ip6_del_rt(rt, &info);
 }
 
+static int __ip6_del_rt_siblings(struct rt6_info *rt, struct fib6_config *cfg)
+{
+	struct fib6_table *table;
+	int err;
+
+	table = rt->rt6i_table;
+	write_lock_bh(&table->tb6_lock);
+
+	if (rt->rt6i_nsiblings && cfg->fc_delete_all_nexthop) {
+		struct rt6_info *sibling, *next_sibling;
+
+		list_for_each_entry_safe(sibling, next_sibling,
+					 &rt->rt6i_siblings,
+					 rt6i_siblings) {
+			err = fib6_del(sibling, &cfg->fc_nlinfo);
+			if (err)
+				goto out;
+		}
+	}
+
+	err = fib6_del(rt, &cfg->fc_nlinfo);
+out:
+	write_unlock_bh(&table->tb6_lock);
+	ip6_rt_put(rt);
+	return err;
+}
+
 static int ip6_route_del(struct fib6_config *cfg)
 {
 	struct fib6_table *table;
@@ -2179,7 +2206,7 @@ static int ip6_route_del(struct fib6_config *cfg)
 			dst_hold(&rt->dst);
 			read_unlock_bh(&table->tb6_lock);
 
-			return __ip6_del_rt(rt, &cfg->fc_nlinfo);
+			return __ip6_del_rt_siblings(rt, cfg);
 		}
 	}
 	read_unlock_bh(&table->tb6_lock);
@@ -2849,6 +2876,9 @@ static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (rtm->rtm_flags & RTM_F_CLONED)
 		cfg->fc_flags |= RTF_CACHE;
 
+	if (rtm->rtm_flags & RTM_F_ALL_NEXTHOPS)
+		cfg->fc_delete_all_nexthop = 1;
+
 	cfg->fc_nlinfo.portid = NETLINK_CB(skb).portid;
 	cfg->fc_nlinfo.nlh = nlh;
 	cfg->fc_nlinfo.nl_net = sock_net(skb->sk);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next v2 2/2] net: ipv6: Add option to dump multipath routes via RTA_MULTIPATH attribute
  2017-01-20  6:10 [PATCH net-next v2 0/2] net: ipv6: Improve user experience with multipath routes David Ahern
  2017-01-20  6:10 ` [PATCH net-next v2 1/2] net: ipv6: Allow shorthand delete of all nexthops in multipath route David Ahern
@ 2017-01-20  6:10 ` David Ahern
  2017-01-20 19:31 ` [PATCH net-next v2 0/2] net: ipv6: Improve user experience with multipath routes David Ahern
  2 siblings, 0 replies; 11+ messages in thread
From: David Ahern @ 2017-01-20  6:10 UTC (permalink / raw)
  To: netdev; +Cc: David Ahern

IPv6 returns multipath routes as a series of individual routes making
their display and handling by userspace different and more complicated
than IPv4, putting the burden on the user to see that a route is part of
a multipath route and internally creating a multipath route if desired
(e.g., libnl does this as of commit 29b71371e764). This patch addresses
this difference, allowing users to request multipath routes to be
returned using the RTA_MULTIPATH attribute in dump requests.

To maintain backwards compatibility, a user has to opt in to the change
in behavior. Currently, the ancillary header in the netlink request
message is supposed to be an rtmsg, but it could be just an rtgenmsg
since only the family is actually referenced in the code to this point.
But, ip for example sends an ifinfomsg (plus a filter in newer versions)
which is completely wrong, but an established code base at this point.

This patch requires a struct rtmsg for the ancillary header, something it
is able to uniquely detect given the different struct sizes. Within rtmsg,
rtm_flags must have the new (from patch 1) RTM_F_ALL_NEXTHOPS flag set.

The end result is that IPv6 multipath routes can be treated and displayed
in a format similar to IPv4:

    $ ip -6 ro ls vrf red
    2001:db8::/120 metric 1024
            nexthop via 2001:db8:1::62  dev eth1 weight 1
            nexthop via 2001:db8:1::61  dev eth1 weight 1
            nexthop via 2001:db8:1::60  dev eth1 weight 1
            nexthop via 2001:db8:1::59  dev eth1 weight 1
    2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
    ...

v2
- changed user api to opt in to new behavior from attribute appended to
  the request to requiring an rtmsg struct with the RTM_F_ALL_NEXTHOPS
  set

Suggested-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 include/net/ip6_route.h        |   1 +
 include/uapi/linux/rtnetlink.h |   5 +-
 net/ipv6/ip6_fib.c             |  28 ++++++++++-
 net/ipv6/route.c               | 106 +++++++++++++++++++++++++++++++++--------
 4 files changed, 118 insertions(+), 22 deletions(-)

diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 9dc2c182a263..7620974826a5 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -154,6 +154,7 @@ struct rt6_rtnl_dump_arg {
 	struct sk_buff *skb;
 	struct netlink_callback *cb;
 	struct net *net;
+	unsigned int rtm_flags;
 };
 
 int rt6_dump_route(struct rt6_info *rt, void *p_arg);
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 7fb206bc42f9..5a070f24f111 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -276,7 +276,10 @@ enum rt_scope_t {
 #define RTM_F_EQUALIZE		0x400	/* Multipath equalizer: NI	*/
 #define RTM_F_PREFIX		0x800	/* Prefix addresses		*/
 #define RTM_F_LOOKUP_TABLE	0x1000	/* set rtm_table to FIB lookup result */
-#define RTM_F_ALL_NEXTHOPS	0x2000	/* delete all nexthops (IPv6) */
+#define RTM_F_ALL_NEXTHOPS	0x2000	/* IPv6 flag:
+					 * delete: remove all nexthops
+					 * dump: nexthops can use RTA_MULTIPATH
+					 */
 
 /* Reserved table identifiers */
 
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index ef5485204522..ef2bf33e34e8 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -308,16 +308,27 @@ static void __net_init fib6_tables_init(struct net *net)
 
 static int fib6_dump_node(struct fib6_walker *w)
 {
+	struct rt6_rtnl_dump_arg *arg = (struct rt6_rtnl_dump_arg *)w->args;
 	int res;
 	struct rt6_info *rt;
 
 	for (rt = w->leaf; rt; rt = rt->dst.rt6_next) {
-		res = rt6_dump_route(rt, w->args);
+		res = rt6_dump_route(rt, arg);
 		if (res < 0) {
 			/* Frame is full, suspend walking */
 			w->leaf = rt;
 			return 1;
 		}
+
+		/* if multipath routes are dumped in one route with
+		 * the RTA_MULTIPATH attribute, then jump rt to point
+		 * to the last sibling of this route (no need to dump
+		 * the sibling routes again)
+		 */
+		if ((arg->rtm_flags & RTM_F_ALL_NEXTHOPS) && rt->rt6i_nsiblings)
+			rt = list_last_entry(&rt->rt6i_siblings,
+					     struct rt6_info,
+					     rt6i_siblings);
 	}
 	w->leaf = NULL;
 	return 0;
@@ -398,6 +409,7 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 	struct fib6_walker *w;
 	struct fib6_table *tb;
 	struct hlist_head *head;
+	unsigned int flags = 0;
 	int res = 0;
 
 	s_h = cb->args[0];
@@ -422,9 +434,23 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 		cb->args[2] = (long)w;
 	}
 
+	/* sadly, the size of the ancillary header can vary: the correct
+	 * header is struct rtmsg as passed by libnl. iproute2 sends
+	 * ifinfomsg [+ filter]. Technically, passing only rtgenmsg is
+	 * sufficient since the header has not been parsed before. Luckily,
+	 * the struct sizes sufficiently vary to detect the permutations.
+	 * For the dump request to contain flags, require rtmsg header.
+	 */
+	if (nlmsg_len(cb->nlh) == sizeof(struct rtmsg)) {
+		struct rtmsg *rtm = nlmsg_data(cb->nlh);
+
+		flags = rtm->rtm_flags;
+	}
+
 	arg.skb = skb;
 	arg.cb = cb;
 	arg.net = net;
+	arg.rtm_flags = flags;
 	w->args = &arg;
 
 	rcu_read_lock();
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index f207d4d0a782..ee57adf1a765 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3195,11 +3195,65 @@ static inline size_t rt6_nlmsg_size(struct rt6_info *rt)
 	       + lwtunnel_get_encap_size(rt->dst.lwtstate);
 }
 
+static int rt6_nexthop_info(struct sk_buff *skb, struct rt6_info *rt,
+			    unsigned int *flags)
+{
+	if (!netif_carrier_ok(rt->dst.dev)) {
+		*flags |= RTNH_F_LINKDOWN;
+		if (rt->rt6i_idev->cnf.ignore_routes_with_linkdown)
+			*flags |= RTNH_F_DEAD;
+	}
+
+	if (rt->rt6i_flags & RTF_GATEWAY) {
+		if (nla_put_in6_addr(skb, RTA_GATEWAY, &rt->rt6i_gateway) < 0)
+			goto nla_put_failure;
+	}
+
+	if (rt->dst.dev &&
+	    nla_put_u32(skb, RTA_OIF, rt->dst.dev->ifindex))
+		goto nla_put_failure;
+
+	if (rt->dst.lwtstate &&
+	    lwtunnel_fill_encap(skb, rt->dst.lwtstate) < 0)
+		goto nla_put_failure;
+
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
+}
+
+static int rt6_add_nexthop(struct sk_buff *skb, struct rt6_info *rt)
+{
+	struct rtnexthop *rtnh;
+	unsigned int flags = 0;
+
+	rtnh = nla_reserve_nohdr(skb, sizeof(*rtnh));
+	if (!rtnh)
+		goto nla_put_failure;
+
+	rtnh->rtnh_hops = 0;
+	rtnh->rtnh_ifindex = rt->dst.dev ? rt->dst.dev->ifindex : 0;
+
+	if (rt6_nexthop_info(skb, rt, &flags) < 0)
+		goto nla_put_failure;
+
+	rtnh->rtnh_flags = flags;
+
+	/* length of rtnetlink header + attributes */
+	rtnh->rtnh_len = nlmsg_get_pos(skb) - (void *)rtnh;
+
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
+}
+
 static int rt6_fill_node(struct net *net,
 			 struct sk_buff *skb, struct rt6_info *rt,
 			 struct in6_addr *dst, struct in6_addr *src,
 			 int iif, int type, u32 portid, u32 seq,
-			 unsigned int flags)
+			 unsigned int nlm_flags, unsigned int rtm_flags)
 {
 	u32 metrics[RTAX_MAX];
 	struct rtmsg *rtm;
@@ -3207,7 +3261,7 @@ static int rt6_fill_node(struct net *net,
 	long expires;
 	u32 table;
 
-	nlh = nlmsg_put(skb, portid, seq, type, sizeof(*rtm), flags);
+	nlh = nlmsg_put(skb, portid, seq, type, sizeof(*rtm), nlm_flags);
 	if (!nlh)
 		return -EMSGSIZE;
 
@@ -3246,11 +3300,6 @@ static int rt6_fill_node(struct net *net,
 	else
 		rtm->rtm_type = RTN_UNICAST;
 	rtm->rtm_flags = 0;
-	if (!netif_carrier_ok(rt->dst.dev)) {
-		rtm->rtm_flags |= RTNH_F_LINKDOWN;
-		if (rt->rt6i_idev->cnf.ignore_routes_with_linkdown)
-			rtm->rtm_flags |= RTNH_F_DEAD;
-	}
 	rtm->rtm_scope = RT_SCOPE_UNIVERSE;
 	rtm->rtm_protocol = rt->rt6i_protocol;
 	if (rt->rt6i_flags & RTF_DYNAMIC)
@@ -3314,17 +3363,36 @@ static int rt6_fill_node(struct net *net,
 	if (rtnetlink_put_metrics(skb, metrics) < 0)
 		goto nla_put_failure;
 
-	if (rt->rt6i_flags & RTF_GATEWAY) {
-		if (nla_put_in6_addr(skb, RTA_GATEWAY, &rt->rt6i_gateway) < 0)
-			goto nla_put_failure;
-	}
-
-	if (rt->dst.dev &&
-	    nla_put_u32(skb, RTA_OIF, rt->dst.dev->ifindex))
-		goto nla_put_failure;
 	if (nla_put_u32(skb, RTA_PRIORITY, rt->rt6i_metric))
 		goto nla_put_failure;
 
+	/* if user wants nexthops included via the RTA_MULTIPATH
+	 * attribute, then walk the siblings list and add each
+	 * as a nexthop
+	 */
+	if ((rtm_flags & RTM_F_ALL_NEXTHOPS) && rt->rt6i_nsiblings) {
+		struct rt6_info *sibling, *next_sibling;
+		struct nlattr *mp;
+
+		mp = nla_nest_start(skb, RTA_MULTIPATH);
+		if (!mp)
+			goto nla_put_failure;
+
+		if (rt6_add_nexthop(skb, rt) < 0)
+			goto nla_put_failure;
+
+		list_for_each_entry_safe(sibling, next_sibling,
+					 &rt->rt6i_siblings, rt6i_siblings) {
+			if (rt6_add_nexthop(skb, sibling) < 0)
+				goto nla_put_failure;
+		}
+
+		nla_nest_end(skb, mp);
+	} else {
+		if (rt6_nexthop_info(skb, rt, &rtm->rtm_flags) < 0)
+			goto nla_put_failure;
+	}
+
 	expires = (rt->rt6i_flags & RTF_EXPIRES) ? rt->dst.expires - jiffies : 0;
 
 	if (rtnl_put_cacheinfo(skb, &rt->dst, 0, expires, rt->dst.error) < 0)
@@ -3333,8 +3401,6 @@ static int rt6_fill_node(struct net *net,
 	if (nla_put_u8(skb, RTA_PREF, IPV6_EXTRACT_PREF(rt->rt6i_flags)))
 		goto nla_put_failure;
 
-	if (lwtunnel_fill_encap(skb, rt->dst.lwtstate) < 0)
-		goto nla_put_failure;
 
 	nlmsg_end(skb, nlh);
 	return 0;
@@ -3362,7 +3428,7 @@ int rt6_dump_route(struct rt6_info *rt, void *p_arg)
 	return rt6_fill_node(arg->net,
 		     arg->skb, rt, NULL, NULL, 0, RTM_NEWROUTE,
 		     NETLINK_CB(arg->cb->skb).portid, arg->cb->nlh->nlmsg_seq,
-		     NLM_F_MULTI);
+		     NLM_F_MULTI, arg->rtm_flags);
 }
 
 static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh)
@@ -3453,7 +3519,7 @@ static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh)
 
 	err = rt6_fill_node(net, skb, rt, &fl6.daddr, &fl6.saddr, iif,
 			    RTM_NEWROUTE, NETLINK_CB(in_skb).portid,
-			    nlh->nlmsg_seq, 0);
+			    nlh->nlmsg_seq, 0, 0);
 	if (err < 0) {
 		kfree_skb(skb);
 		goto errout;
@@ -3480,7 +3546,7 @@ void inet6_rt_notify(int event, struct rt6_info *rt, struct nl_info *info,
 		goto errout;
 
 	err = rt6_fill_node(net, skb, rt, NULL, NULL, 0,
-				event, info->portid, seq, nlm_flags);
+			    event, info->portid, seq, nlm_flags, 0);
 	if (err < 0) {
 		/* -EMSGSIZE implies BUG in rt6_nlmsg_size() */
 		WARN_ON(err == -EMSGSIZE);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v2 0/2] net: ipv6: Improve user experience with multipath routes
  2017-01-20  6:10 [PATCH net-next v2 0/2] net: ipv6: Improve user experience with multipath routes David Ahern
  2017-01-20  6:10 ` [PATCH net-next v2 1/2] net: ipv6: Allow shorthand delete of all nexthops in multipath route David Ahern
  2017-01-20  6:10 ` [PATCH net-next v2 2/2] net: ipv6: Add option to dump multipath routes via RTA_MULTIPATH attribute David Ahern
@ 2017-01-20 19:31 ` David Ahern
  2017-01-24 16:06   ` David Miller
  2 siblings, 1 reply; 11+ messages in thread
From: David Ahern @ 2017-01-20 19:31 UTC (permalink / raw)
  To: netdev

On 1/19/17 11:10 PM, David Ahern wrote:
> This series closes a couple of gaps between IPv4 and IPv6 with respect
> to multipath routes:
...
> In both cases, the new behavior requires users to opt in by setting a new
> flag, RTM_F_ALL_NEXTHOPS, in the rtm_flags of struct rtmsg which is
> expected to be the ancillary header in the netlink request received from
> the user. A program must opt in to the new behavior so as to not break
> any existing applications.
> 
> The opt-in behavior works for both route deletes and dumps (the two
> differences noted above), but not for notifications as notifications
> do not take user input to specify flags. The only way to have
> notifications generate RTA_MULTIPATH encodings is to have a gobal
> flag -- e.g., sysctl. I'd prefer not to add a sysctl knob for this
> backwards compatibility.

BTW, I am in favor of not requiring a user API for this but just doing it. I can't imagine anyone working with multipath routes not wanting the efficiency of the RTA_MULTIPATH attribute. These patches require an API only because of the rule not to break userspace. If we conclude to just do it without an API, the multipath_add and multipath_del need to be modified to only send a notification once at the end of the actions.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v2 0/2] net: ipv6: Improve user experience with multipath routes
  2017-01-20 19:31 ` [PATCH net-next v2 0/2] net: ipv6: Improve user experience with multipath routes David Ahern
@ 2017-01-24 16:06   ` David Miller
  2017-01-26 14:58     ` Nicolas Dichtel
  0 siblings, 1 reply; 11+ messages in thread
From: David Miller @ 2017-01-24 16:06 UTC (permalink / raw)
  To: dsa; +Cc: netdev

From: David Ahern <dsa@cumulusnetworks.com>
Date: Fri, 20 Jan 2017 12:31:44 -0700

> BTW, I am in favor of not requiring a user API for this but just
> doing it. I can't imagine anyone working with multipath routes not
> wanting the efficiency of the RTA_MULTIPATH attribute. These patches
> require an API only because of the rule not to break userspace. If
> we conclude to just do it without an API, the multipath_add and
> multipath_del need to be modified to only send a notification once
> at the end of the actions.

After some consideration, I agree.

Please repost this series without the new netlink knob.

Thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v2 0/2] net: ipv6: Improve user experience with multipath routes
  2017-01-24 16:06   ` David Miller
@ 2017-01-26 14:58     ` Nicolas Dichtel
  2017-01-26 17:26       ` David Ahern
  0 siblings, 1 reply; 11+ messages in thread
From: Nicolas Dichtel @ 2017-01-26 14:58 UTC (permalink / raw)
  To: David Miller, dsa; +Cc: netdev

Le 24/01/2017 à 17:06, David Miller a écrit :
> From: David Ahern <dsa@cumulusnetworks.com>
> Date: Fri, 20 Jan 2017 12:31:44 -0700
> 
>> BTW, I am in favor of not requiring a user API for this but just
>> doing it. I can't imagine anyone working with multipath routes not
>> wanting the efficiency of the RTA_MULTIPATH attribute. These patches
>> require an API only because of the rule not to break userspace. If
>> we conclude to just do it without an API, the multipath_add and
>> multipath_del need to be modified to only send a notification once
>> at the end of the actions.
> 
> After some consideration, I agree.
Sorry for my late reply but I fear that this will break some userland apps like
quagga.
I agree that it would be better but the API is here since v3.8, ie 4 years.


Regards,
Nicolas

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v2 0/2] net: ipv6: Improve user experience with multipath routes
  2017-01-26 14:58     ` Nicolas Dichtel
@ 2017-01-26 17:26       ` David Ahern
  2017-01-26 18:00         ` David Miller
  0 siblings, 1 reply; 11+ messages in thread
From: David Ahern @ 2017-01-26 17:26 UTC (permalink / raw)
  To: nicolas.dichtel, David Miller; +Cc: netdev

On 1/26/17 7:58 AM, Nicolas Dichtel wrote:
> Le 24/01/2017 à 17:06, David Miller a écrit :
>> From: David Ahern <dsa@cumulusnetworks.com>
>> Date: Fri, 20 Jan 2017 12:31:44 -0700
>>
>>> BTW, I am in favor of not requiring a user API for this but just
>>> doing it. I can't imagine anyone working with multipath routes not
>>> wanting the efficiency of the RTA_MULTIPATH attribute. These patches
>>> require an API only because of the rule not to break userspace. If
>>> we conclude to just do it without an API, the multipath_add and
>>> multipath_del need to be modified to only send a notification once
>>> at the end of the actions.
>>
>> After some consideration, I agree.
> Sorry for my late reply but I fear that this will break some userland apps like
> quagga.

Quagga does not properly handle IPv6 multipath routes received from the kernel. I checked this with debian/jessie version and our version, and Donald reviewed the source. It is broken.

This is a no-op to iproute2 -- it properly handles RTA_MULTIPATH for IPv6, though arguably scripts could exist parsing the existing series of routes. We could offer a switch to dump routes in the old format (opting in to the old behavior vs moving to the new consistent behavior).

libnl based apps work fine as of a patch Roopa got committed 4+ years ago.

> I agree that it would be better but the API is here since v3.8, ie 4 years.

What API are you referring to? You mean 51ebd3181572a?

I think this problem is a mixed bag. Currently, IPv6 multipath is completely different than IPv4 and MPLS. Adding an API to protect backward compatibility only shifts the problem -- it creates different APIs for IPv4 and IPv6 to get the same behavior. The middle ground is a sysctl that acts as a global switch to pick which difference you want with the caveat that it affects all apps in the namespace.

Consistency in behavior -- both API and data format -- is important for good, efficient userspace code. 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v2 0/2] net: ipv6: Improve user experience with multipath routes
  2017-01-26 17:26       ` David Ahern
@ 2017-01-26 18:00         ` David Miller
  2017-01-27 16:29           ` Nicolas Dichtel
  0 siblings, 1 reply; 11+ messages in thread
From: David Miller @ 2017-01-26 18:00 UTC (permalink / raw)
  To: dsa; +Cc: nicolas.dichtel, netdev

From: David Ahern <dsa@cumulusnetworks.com>
Date: Thu, 26 Jan 2017 10:26:51 -0700

> On 1/26/17 7:58 AM, Nicolas Dichtel wrote:
>> Le 24/01/2017 à 17:06, David Miller a écrit :
>>> From: David Ahern <dsa@cumulusnetworks.com>
>>> Date: Fri, 20 Jan 2017 12:31:44 -0700
>>>
>>>> BTW, I am in favor of not requiring a user API for this but just
>>>> doing it. I can't imagine anyone working with multipath routes not
>>>> wanting the efficiency of the RTA_MULTIPATH attribute. These patches
>>>> require an API only because of the rule not to break userspace. If
>>>> we conclude to just do it without an API, the multipath_add and
>>>> multipath_del need to be modified to only send a notification once
>>>> at the end of the actions.
>>>
>>> After some consideration, I agree.
>> Sorry for my late reply but I fear that this will break some userland apps like
>> quagga.
> 
> Quagga does not properly handle IPv6 multipath routes received from
> the kernel. I checked this with debian/jessie version and our
> version, and Donald reviewed the source. It is broken.

If this is true, quagga is asbolutely not an argument for this "breaking"
something.  It doesn't break anything.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v2 0/2] net: ipv6: Improve user experience with multipath routes
  2017-01-26 18:00         ` David Miller
@ 2017-01-27 16:29           ` Nicolas Dichtel
  2017-01-27 16:36             ` David Ahern
  0 siblings, 1 reply; 11+ messages in thread
From: Nicolas Dichtel @ 2017-01-27 16:29 UTC (permalink / raw)
  To: David Miller, dsa; +Cc: netdev

Le 26/01/2017 à 19:00, David Miller a écrit :
> From: David Ahern <dsa@cumulusnetworks.com>
[snip]
>> Quagga does not properly handle IPv6 multipath routes received from
>> the kernel. I checked this with debian/jessie version and our
>> version, and Donald reviewed the source. It is broken.
> 
> If this is true, quagga is asbolutely not an argument for this "breaking"
> something.  It doesn't break anything.
Ok, my tests also shows that quagga is buggy.
Let's change the way to advertise these routes.

It would be great to also use RTA_MULTIPATH when a route is deleted (like in
your patch 1/2).

Note that there is still a difference between ipv4 and ipv6: in ipv4 when a
nexthop is added/updated/removed, the whole route must be deleted and added
again. In IPv6, nexthop can be managed one by one.
It means that in ipv4, the full route is always dumped, which is not the case in
ipv6.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v2 0/2] net: ipv6: Improve user experience with multipath routes
  2017-01-27 16:29           ` Nicolas Dichtel
@ 2017-01-27 16:36             ` David Ahern
  2017-01-27 16:45               ` Nicolas Dichtel
  0 siblings, 1 reply; 11+ messages in thread
From: David Ahern @ 2017-01-27 16:36 UTC (permalink / raw)
  To: nicolas.dichtel, David Miller; +Cc: netdev, roopa

On 1/27/17 9:29 AM, Nicolas Dichtel wrote:
> Le 26/01/2017 à 19:00, David Miller a écrit :
>> From: David Ahern <dsa@cumulusnetworks.com>
> [snip]
>>> Quagga does not properly handle IPv6 multipath routes received from
>>> the kernel. I checked this with debian/jessie version and our
>>> version, and Donald reviewed the source. It is broken.
>>
>> If this is true, quagga is asbolutely not an argument for this "breaking"
>> something.  It doesn't break anything.
> Ok, my tests also shows that quagga is buggy.
> Let's change the way to advertise these routes.
> 
> It would be great to also use RTA_MULTIPATH when a route is deleted (like in
> your patch 1/2).

I have updated notifications to use RTA_MULTIPATH. Working on the multipath add/delete/replace permutations now and what the notification looks like. Add/replace is easy and the notifications use RTA_MULTIPATH. Notifications for the delete path are complicated given that a delete could remove only a subset of nexthops. Given that, we might have to settle for a notification for each nexthop delete.

> 
> Note that there is still a difference between ipv4 and ipv6: in ipv4 when a
> nexthop is added/updated/removed, the whole route must be deleted and added
> again. In IPv6, nexthop can be managed one by one.
> It means that in ipv4, the full route is always dumped, which is not the case in
> ipv6.
> 

Yes. I have been working on how to delete a nexthop within an IPv4 route. It is much more complicated given how the route is stored compared to IPv6.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v2 0/2] net: ipv6: Improve user experience with multipath routes
  2017-01-27 16:36             ` David Ahern
@ 2017-01-27 16:45               ` Nicolas Dichtel
  0 siblings, 0 replies; 11+ messages in thread
From: Nicolas Dichtel @ 2017-01-27 16:45 UTC (permalink / raw)
  To: David Ahern, David Miller; +Cc: netdev, roopa

Le 27/01/2017 à 17:36, David Ahern a écrit :
> On 1/27/17 9:29 AM, Nicolas Dichtel wrote:
>> Le 26/01/2017 à 19:00, David Miller a écrit :
>>> From: David Ahern <dsa@cumulusnetworks.com>
>> [snip]
>>>> Quagga does not properly handle IPv6 multipath routes received from
>>>> the kernel. I checked this with debian/jessie version and our
>>>> version, and Donald reviewed the source. It is broken.
>>>
>>> If this is true, quagga is asbolutely not an argument for this "breaking"
>>> something.  It doesn't break anything.
>> Ok, my tests also shows that quagga is buggy.
>> Let's change the way to advertise these routes.
>>
>> It would be great to also use RTA_MULTIPATH when a route is deleted (like in
>> your patch 1/2).
> 
> I have updated notifications to use RTA_MULTIPATH. Working on the multipath add/delete/replace permutations now and what the notification looks like. Add/replace is easy and the notifications use RTA_MULTIPATH. Notifications for the delete path are complicated given that a delete could remove only a subset of nexthops. Given that, we might have to settle for a notification for each nexthop delete.
Ok, I give it a quick look and I agree it seems not so easy.

> 
>>
>> Note that there is still a difference between ipv4 and ipv6: in ipv4 when a
>> nexthop is added/updated/removed, the whole route must be deleted and added
>> again. In IPv6, nexthop can be managed one by one.
>> It means that in ipv4, the full route is always dumped, which is not the case in
>> ipv6.
>>
> 
> Yes. I have been working on how to delete a nexthop within an IPv4 route. It is much more complicated given how the route is stored compared to IPv6.
> 
Ok.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-01-27 16:45 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-20  6:10 [PATCH net-next v2 0/2] net: ipv6: Improve user experience with multipath routes David Ahern
2017-01-20  6:10 ` [PATCH net-next v2 1/2] net: ipv6: Allow shorthand delete of all nexthops in multipath route David Ahern
2017-01-20  6:10 ` [PATCH net-next v2 2/2] net: ipv6: Add option to dump multipath routes via RTA_MULTIPATH attribute David Ahern
2017-01-20 19:31 ` [PATCH net-next v2 0/2] net: ipv6: Improve user experience with multipath routes David Ahern
2017-01-24 16:06   ` David Miller
2017-01-26 14:58     ` Nicolas Dichtel
2017-01-26 17:26       ` David Ahern
2017-01-26 18:00         ` David Miller
2017-01-27 16:29           ` Nicolas Dichtel
2017-01-27 16:36             ` David Ahern
2017-01-27 16:45               ` Nicolas Dichtel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.