[PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes
@ 2017-01-27 23:20 David Ahern
  2017-01-27 23:20 ` [PATCH net-next v3 1/4] net: ipv6: add NLM_F_APPEND in notifications when applicable David Ahern
                   ` (4 more replies)
  0 siblings, 5 replies; 26+ messages in thread
From: David Ahern @ 2017-01-27 23:20 UTC (permalink / raw)
  To: netdev; +Cc: roopa, nicolas.dichtel, David Ahern

This series closes a couple of gaps between IPv4 and IPv6 with respect
to multipath routes:

1. IPv4 allows all nexthops of multipath routes to be deleted using just
   the prefix and length; IPv6 only deletes the first nexthop for the
   route if only the prefix and length are given.

2. IPv4 returns multipath routes encoded in the RTA_MULTIPATH attribute.
   IPv6 returns a series of routes with the same prefix and length - one
   for each nexthop. This happens for both dumps and notifications.

IPv6 does accept RTA_MULTIPATH encoded routes, but installs them as a
series of routes.

Patch 2 addresses the first item by allowing IPv6 multipath routes to be
deleted using just the prefix and length. Patch 3 addresses the second
allowing IPv6 multipath routes to be returned encoded in the RTA_MULTIPATH.

Patch 1 adds the NLM_F_APPEND flag to notifications when the flag is
present in the request. The lack of this flag was noted testing route
appends and comparing to IPv4.

Patch 4 prints IPv6 addresses in compressed format when showing route
replace errors. This was noticed testing REPLACE failures.

The end result for multipath routes:
1. Route Add
   - one notification with RTA_MULTIPATH attribute

2. Route Replace
   - notification for first route and all siblings that have
     succeeded. This is needed regardless of success of remaining
     nexthops to maintain add/delete consistency should a failure
     happens on the second or following nexthop (ie., need to tell
     userspace that original route has been replaced and then the
     failure logic deletes all routes inserted thus far).
     
3. Route Delete
   - for multipath route only given nexthops are deleted. This path
     is hit when DELETE contains RTA_MULTIPATH. All other route deletes,
     all nexthops are deleted for given prefix and length (and any
     other specs if given)

   - one notification sent per nexthop deleted. This is unavoidable
     since IPv6 alllows a single nexthop to be deleted within a multipath
     route

4. Route Appends
   - IPv6 allows nexthops to be appended to an existing route. In this
     case one notification is sent per nexthop added

Addresses some of the inconsistencies also noted by Roopa at netdev0.1:
https://www.netdev01.org/docs/prabhu-linux_ipv4_ipv6_inconsistencies_talk_slides.pdf

v3
- removed the need for a user API to opt-in to change. Requiring an
  API just shifts the difference from same API with different
  behavior to different API to achieve equivalent behavior

- route notifications changed to use RTA_MULTIPATH for add and replace

- upated commit messages and cover letter

v2
- fixed locking in patch 1 as noted by DaveM
- changed user API for patch 2 to require an rtmsg with RTM_F_ALL_NEXTHOPS
  set in rtm_flags
- revamped explanation of patch 2 and cover letter

David Ahern (4):
  net: ipv6: add NLM_F_APPEND in notifications when applicable
  net: ipv6: Allow shorthand delete of all nexthops in multipath route
  net: ipv6: Add support to dump multipath routes via RTA_MULTIPATH
    attribute
  net: ipv6: Use compressed IPv6 addresses showing route replace error

 include/net/ip6_fib.h |   4 +-
 include/net/netlink.h |   1 +
 net/ipv6/ip6_fib.c    |  19 +++++-
 net/ipv6/route.c      | 163 ++++++++++++++++++++++++++++++++++++++++++++------
 4 files changed, 165 insertions(+), 22 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH net-next v3 1/4] net: ipv6: add NLM_F_APPEND in notifications when applicable
  2017-01-27 23:20 [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes David Ahern
@ 2017-01-27 23:20 ` David Ahern
  2017-01-27 23:20 ` [PATCH net-next v3 2/4] net: ipv6: Allow shorthand delete of all nexthops in multipath route David Ahern
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 26+ messages in thread
From: David Ahern @ 2017-01-27 23:20 UTC (permalink / raw)
  To: netdev; +Cc: roopa, nicolas.dichtel, David Ahern

IPv6 does not set the NLM_F_APPEND flag in notifications to signal that
a NEWROUTE is an append versus a new route or a replaced one. Add the
flag if the request has it.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 net/ipv6/ip6_fib.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index ef5485204522..bcaf247232d7 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -746,6 +746,9 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
 	u16 nlflags = NLM_F_EXCL;
 	int err;
 
+	if (info->nlh && info->nlh->nlmsg_flags & NLM_F_APPEND)
+		nlflags |= NLM_F_APPEND;
+
 	ins = &fn->leaf;
 
 	for (iter = fn->leaf; iter; iter = iter->dst.rt6_next) {
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH net-next v3 2/4] net: ipv6: Allow shorthand delete of all nexthops in multipath route
  2017-01-27 23:20 [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes David Ahern
  2017-01-27 23:20 ` [PATCH net-next v3 1/4] net: ipv6: add NLM_F_APPEND in notifications when applicable David Ahern
@ 2017-01-27 23:20 ` David Ahern
  2017-01-27 23:20 ` [PATCH net-next v3 3/4] net: ipv6: Add support to dump multipath routes via RTA_MULTIPATH attribute David Ahern
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 26+ messages in thread
From: David Ahern @ 2017-01-27 23:20 UTC (permalink / raw)
  To: netdev; +Cc: roopa, nicolas.dichtel, David Ahern

IPv4 allows multipath routes to be deleted using just the prefix and
length. For example:
    $ ip ro ls vrf red
    unreachable default metric 8192
    1.1.1.0/24
        nexthop via 10.100.1.254  dev eth1 weight 1
        nexthop via 10.11.200.2  dev eth11.200 weight 1
    10.11.200.0/24 dev eth11.200 proto kernel scope link src 10.11.200.3
    10.100.1.0/24 dev eth1 proto kernel scope link src 10.100.1.3

    $ ip ro del 1.1.1.0/24 vrf red

    $ ip ro ls vrf red
    unreachable default metric 8192
    10.11.200.0/24 dev eth11.200 proto kernel scope link src 10.11.200.3
    10.100.1.0/24 dev eth1 proto kernel scope link src 10.100.1.3

The same notation does not work with IPv6 because of how multipath routes
are implemented for IPv6. For IPv6 only the first nexthop of a multipath
route is deleted if the request contains only a prefix and length. This
leads to unnecessary complexity in userspace dealing with IPv6 multipath
routes.

This patch allows all nexthops to be deleted without specifying each one
in the delete request. Internally, this is done by walking the sibling
list of the route matching the specifications given (prefix, length,
metric, protocol, etc).

    $  ip -6 ro ls vrf red
    2001:db8::/120 via 2001:db8:1::62 dev eth1 metric 256  pref medium
    2001:db8::/120 via 2001:db8:1::61 dev eth1 metric 256  pref medium
    2001:db8::/120 via 2001:db8:1::60 dev eth1 metric 256  pref medium
    2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
    ...

    $ ip -6 ro del vrf red 1111::1/120
    $ ip -6 ro ls vrf red
    2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
    ...

Because IPv6 allows individual nexthops to be deleted without deleting
the entire route, the mutipath and non-multipath code paths have to be
discriminated so that all nexthops are only deleted for the latter case.
This is done by making the existing fc_type in fib6_config a u16 and then
adding a new u16 field with fc_delete_all_nh as the first bit.

Suggested-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
v3
- removed need for RTM_F_ALL_NEXTHOPS user api

v2
- fixed locking deleting route and its siblings as noted by DaveM

v2' (patch originally submitted standalone)
- switched examples to rfc 3849 documentation address per request
- changed delete loop to explicitly look at siblings list for
  first route matching specs given (metric, protocol, etc)

 include/net/ip6_fib.h |  4 +++-
 net/ipv6/route.c      | 34 ++++++++++++++++++++++++++++++++--
 2 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index a74e2aa40ef4..c979c878df1c 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -37,7 +37,9 @@ struct fib6_config {
 	int		fc_ifindex;
 	u32		fc_flags;
 	u32		fc_protocol;
-	u32		fc_type;	/* only 8 bits are used */
+	u16		fc_type;        /* only 8 bits are used */
+	u16		fc_delete_all_nh : 1,
+			__unused : 15;
 
 	struct in6_addr	fc_dst;
 	struct in6_addr	fc_src;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 5046d2b24004..81e2b2a28806 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2143,6 +2143,34 @@ int ip6_del_rt(struct rt6_info *rt)
 	return __ip6_del_rt(rt, &info);
 }
 
+static int __ip6_del_rt_siblings(struct rt6_info *rt, struct fib6_config *cfg)
+{
+	struct nl_info *info = &cfg->fc_nlinfo;
+	struct fib6_table *table;
+	int err;
+
+	table = rt->rt6i_table;
+	write_lock_bh(&table->tb6_lock);
+
+	if (rt->rt6i_nsiblings && cfg->fc_delete_all_nh) {
+		struct rt6_info *sibling, *next_sibling;
+
+		list_for_each_entry_safe(sibling, next_sibling,
+					 &rt->rt6i_siblings,
+					 rt6i_siblings) {
+			err = fib6_del(sibling, info);
+			if (err)
+				goto out;
+		}
+	}
+
+	err = fib6_del(rt, info);
+out:
+	write_unlock_bh(&table->tb6_lock);
+	ip6_rt_put(rt);
+	return err;
+}
+
 static int ip6_route_del(struct fib6_config *cfg)
 {
 	struct fib6_table *table;
@@ -2179,7 +2207,7 @@ static int ip6_route_del(struct fib6_config *cfg)
 			dst_hold(&rt->dst);
 			read_unlock_bh(&table->tb6_lock);
 
-			return __ip6_del_rt(rt, &cfg->fc_nlinfo);
+			return __ip6_del_rt_siblings(rt, cfg);
 		}
 	}
 	read_unlock_bh(&table->tb6_lock);
@@ -3131,8 +3159,10 @@ static int inet6_rtm_delroute(struct sk_buff *skb, struct nlmsghdr *nlh)
 
 	if (cfg.fc_mp)
 		return ip6_route_multipath_del(&cfg);
-	else
+	else {
+		cfg.fc_delete_all_nh = 1;
 		return ip6_route_del(&cfg);
+	}
 }
 
 static int inet6_rtm_newroute(struct sk_buff *skb, struct nlmsghdr *nlh)
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH net-next v3 3/4] net: ipv6: Add support to dump multipath routes via RTA_MULTIPATH attribute
  2017-01-27 23:20 [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes David Ahern
  2017-01-27 23:20 ` [PATCH net-next v3 1/4] net: ipv6: add NLM_F_APPEND in notifications when applicable David Ahern
  2017-01-27 23:20 ` [PATCH net-next v3 2/4] net: ipv6: Allow shorthand delete of all nexthops in multipath route David Ahern
@ 2017-01-27 23:20 ` David Ahern
  2017-01-27 23:20 ` [PATCH net-next v3 4/4] net: ipv6: Use compressed IPv6 addresses showing route replace error David Ahern
  2017-01-29  1:00 ` [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes Roopa Prabhu
  4 siblings, 0 replies; 26+ messages in thread
From: David Ahern @ 2017-01-27 23:20 UTC (permalink / raw)
  To: netdev; +Cc: roopa, nicolas.dichtel, David Ahern

IPv6 returns multipath routes as a series of individual routes making
their display and handling by userspace different and more complicated
than IPv4, putting the burden on the user to see that a route is part of
a multipath route and internally creating a multipath route if desired
(e.g., libnl does this as of commit 29b71371e764). This patch addresses
this difference, allowing multipath routes to be returned using the
RTA_MULTIPATH attribute.

The end result is that IPv6 multipath routes can be treated and displayed
in a format similar to IPv4:

    $ ip -6 ro ls vrf red
    2001:db8::/120 metric 1024
            nexthop via 2001:db8:1::62  dev eth1 weight 1
            nexthop via 2001:db8:1::61  dev eth1 weight 1
            nexthop via 2001:db8:1::60  dev eth1 weight 1
            nexthop via 2001:db8:1::59  dev eth1 weight 1
    2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
    ...

Suggested-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
v3
- dropped user API to opt-in to change

v2
- changed user api to opt in to new behavior from attribute appended to
  the request to requiring an rtmsg struct with the RTM_F_ALL_NEXTHOPS
  set

 include/net/netlink.h |   1 +
 net/ipv6/ip6_fib.c    |  16 ++++++-
 net/ipv6/route.c      | 127 +++++++++++++++++++++++++++++++++++++++++++-------
 3 files changed, 126 insertions(+), 18 deletions(-)

diff --git a/include/net/netlink.h b/include/net/netlink.h
index d3938f11ae52..b239fcd33d80 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -229,6 +229,7 @@ struct nl_info {
 	struct nlmsghdr		*nlh;
 	struct net		*nl_net;
 	u32			portid;
+	bool			skip_notify;
 };
 
 int netlink_rcv_skb(struct sk_buff *skb,
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index bcaf247232d7..2542794b2c64 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -318,6 +318,16 @@ static int fib6_dump_node(struct fib6_walker *w)
 			w->leaf = rt;
 			return 1;
 		}
+
+		/* if multipath routes are dumped in one route with
+		 * the RTA_MULTIPATH attribute, then jump rt to point
+		 * to the last sibling of this route (no need to dump
+		 * the sibling routes again)
+		 */
+		if (rt->rt6i_nsiblings)
+			rt = list_last_entry(&rt->rt6i_siblings,
+					     struct rt6_info,
+					     rt6i_siblings);
 	}
 	w->leaf = NULL;
 	return 0;
@@ -871,7 +881,8 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
 		*ins = rt;
 		rt->rt6i_node = fn;
 		atomic_inc(&rt->rt6i_ref);
-		inet6_rt_notify(RTM_NEWROUTE, rt, info, nlflags);
+		if (!info->skip_notify)
+			inet6_rt_notify(RTM_NEWROUTE, rt, info, nlflags);
 		info->nl_net->ipv6.rt6_stats->fib_rt_entries++;
 
 		if (!(fn->fn_flags & RTN_RTINFO)) {
@@ -897,7 +908,8 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
 		rt->rt6i_node = fn;
 		rt->dst.rt6_next = iter->dst.rt6_next;
 		atomic_inc(&rt->rt6i_ref);
-		inet6_rt_notify(RTM_NEWROUTE, rt, info, NLM_F_REPLACE);
+		if (!info->skip_notify)
+			inet6_rt_notify(RTM_NEWROUTE, rt, info, NLM_F_REPLACE);
 		if (!(fn->fn_flags & RTN_RTINFO)) {
 			info->nl_net->ipv6.rt6_stats->fib_route_nodes++;
 			fn->fn_flags |= RTN_RTINFO;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 81e2b2a28806..747f333ae006 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3010,19 +3010,25 @@ static int ip6_route_info_append(struct list_head *rt6_nh_list,
 
 static int ip6_route_multipath_add(struct fib6_config *cfg)
 {
+	struct rt6_info *rt, *rt_first = NULL;
 	struct fib6_config r_cfg;
 	struct rtnexthop *rtnh;
-	struct rt6_info *rt;
 	struct rt6_nh *err_nh;
 	struct rt6_nh *nh, *nh_safe;
+	__u16 nlflags;
 	int remaining;
 	int attrlen;
 	int err = 1;
 	int nhn = 0;
+	int append = cfg->fc_nlinfo.nlh->nlmsg_flags & NLM_F_APPEND;
 	int replace = (cfg->fc_nlinfo.nlh &&
 		       (cfg->fc_nlinfo.nlh->nlmsg_flags & NLM_F_REPLACE));
 	LIST_HEAD(rt6_nh_list);
 
+	nlflags = replace ? NLM_F_REPLACE : NLM_F_CREATE;
+	if (append)
+		nlflags |= NLM_F_APPEND;
+
 	remaining = cfg->fc_mp_len;
 	rtnh = (struct rtnexthop *)cfg->fc_mp;
 
@@ -3065,9 +3071,20 @@ static int ip6_route_multipath_add(struct fib6_config *cfg)
 		rtnh = rtnh_next(rtnh, &remaining);
 	}
 
+	/* for route append want to send separate notifications on
+	 * each add; for add and replace send one notification with
+	 * all nexthops
+	 */
+	if (!append)
+		cfg->fc_nlinfo.skip_notify = 1;
+
 	err_nh = NULL;
 	list_for_each_entry(nh, &rt6_nh_list, next) {
 		err = __ip6_ins_rt(nh->rt6_info, &cfg->fc_nlinfo, &nh->mxc);
+		/* save reference to first route for notification */
+		if (!rt_first && !err && !append)
+			rt_first = nh->rt6_info;
+
 		/* nh->rt6_info is used or freed at this point, reset to NULL*/
 		nh->rt6_info = NULL;
 		if (err) {
@@ -3089,9 +3106,21 @@ static int ip6_route_multipath_add(struct fib6_config *cfg)
 		nhn++;
 	}
 
+	if (rt_first)
+		inet6_rt_notify(RTM_NEWROUTE, rt_first, &cfg->fc_nlinfo,
+				nlflags);
+
 	goto cleanup;
 
 add_errout:
+	/* send notification for routes that were added so far so
+	 * that the delete notifications sent by ip6_route_del in
+	 * the next block are coherent
+	 */
+	if (rt_first)
+		inet6_rt_notify(RTM_NEWROUTE, rt_first, &cfg->fc_nlinfo,
+				nlflags);
+
 	/* Delete routes that were already added */
 	list_for_each_entry(nh, &rt6_nh_list, next) {
 		if (err_nh == nh)
@@ -3198,6 +3227,60 @@ static inline size_t rt6_nlmsg_size(struct rt6_info *rt)
 	       + lwtunnel_get_encap_size(rt->dst.lwtstate);
 }
 
+static int rt6_nexthop_info(struct sk_buff *skb, struct rt6_info *rt,
+			    unsigned int *flags)
+{
+	if (!netif_running(rt->dst.dev) || !netif_carrier_ok(rt->dst.dev)) {
+		*flags |= RTNH_F_LINKDOWN;
+		if (rt->rt6i_idev->cnf.ignore_routes_with_linkdown)
+			*flags |= RTNH_F_DEAD;
+	}
+
+	if (rt->rt6i_flags & RTF_GATEWAY) {
+		if (nla_put_in6_addr(skb, RTA_GATEWAY, &rt->rt6i_gateway) < 0)
+			goto nla_put_failure;
+	}
+
+	if (rt->dst.dev &&
+	    nla_put_u32(skb, RTA_OIF, rt->dst.dev->ifindex))
+		goto nla_put_failure;
+
+	if (rt->dst.lwtstate &&
+	    lwtunnel_fill_encap(skb, rt->dst.lwtstate) < 0)
+		goto nla_put_failure;
+
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
+}
+
+static int rt6_add_nexthop(struct sk_buff *skb, struct rt6_info *rt)
+{
+	struct rtnexthop *rtnh;
+	unsigned int flags = 0;
+
+	rtnh = nla_reserve_nohdr(skb, sizeof(*rtnh));
+	if (!rtnh)
+		goto nla_put_failure;
+
+	rtnh->rtnh_hops = 0;
+	rtnh->rtnh_ifindex = rt->dst.dev ? rt->dst.dev->ifindex : 0;
+
+	if (rt6_nexthop_info(skb, rt, &flags) < 0)
+		goto nla_put_failure;
+
+	rtnh->rtnh_flags = flags;
+
+	/* length of rtnetlink header + attributes */
+	rtnh->rtnh_len = nlmsg_get_pos(skb) - (void *)rtnh;
+
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
+}
+
 static int rt6_fill_node(struct net *net,
 			 struct sk_buff *skb, struct rt6_info *rt,
 			 struct in6_addr *dst, struct in6_addr *src,
@@ -3249,11 +3332,6 @@ static int rt6_fill_node(struct net *net,
 	else
 		rtm->rtm_type = RTN_UNICAST;
 	rtm->rtm_flags = 0;
-	if (!netif_running(rt->dst.dev) || !netif_carrier_ok(rt->dst.dev)) {
-		rtm->rtm_flags |= RTNH_F_LINKDOWN;
-		if (rt->rt6i_idev->cnf.ignore_routes_with_linkdown)
-			rtm->rtm_flags |= RTNH_F_DEAD;
-	}
 	rtm->rtm_scope = RT_SCOPE_UNIVERSE;
 	rtm->rtm_protocol = rt->rt6i_protocol;
 	if (rt->rt6i_flags & RTF_DYNAMIC)
@@ -3317,17 +3395,36 @@ static int rt6_fill_node(struct net *net,
 	if (rtnetlink_put_metrics(skb, metrics) < 0)
 		goto nla_put_failure;
 
-	if (rt->rt6i_flags & RTF_GATEWAY) {
-		if (nla_put_in6_addr(skb, RTA_GATEWAY, &rt->rt6i_gateway) < 0)
-			goto nla_put_failure;
-	}
-
-	if (rt->dst.dev &&
-	    nla_put_u32(skb, RTA_OIF, rt->dst.dev->ifindex))
-		goto nla_put_failure;
 	if (nla_put_u32(skb, RTA_PRIORITY, rt->rt6i_metric))
 		goto nla_put_failure;
 
+	/* For multipath routes, walk the siblings list and add
+	 * each as a nexthop within RTA_MULTIPATH. If this route
+	 * is an append, then only send the route that is added.
+	 */
+	if (rt->rt6i_nsiblings && !(flags & NLM_F_APPEND)) {
+		struct rt6_info *sibling, *next_sibling;
+		struct nlattr *mp;
+
+		mp = nla_nest_start(skb, RTA_MULTIPATH);
+		if (!mp)
+			goto nla_put_failure;
+
+		if (rt6_add_nexthop(skb, rt) < 0)
+			goto nla_put_failure;
+
+		list_for_each_entry_safe(sibling, next_sibling,
+					 &rt->rt6i_siblings, rt6i_siblings) {
+			if (rt6_add_nexthop(skb, sibling) < 0)
+				goto nla_put_failure;
+		}
+
+		nla_nest_end(skb, mp);
+	} else {
+		if (rt6_nexthop_info(skb, rt, &rtm->rtm_flags) < 0)
+			goto nla_put_failure;
+	}
+
 	expires = (rt->rt6i_flags & RTF_EXPIRES) ? rt->dst.expires - jiffies : 0;
 
 	if (rtnl_put_cacheinfo(skb, &rt->dst, 0, expires, rt->dst.error) < 0)
@@ -3336,8 +3433,6 @@ static int rt6_fill_node(struct net *net,
 	if (nla_put_u8(skb, RTA_PREF, IPV6_EXTRACT_PREF(rt->rt6i_flags)))
 		goto nla_put_failure;
 
-	if (lwtunnel_fill_encap(skb, rt->dst.lwtstate) < 0)
-		goto nla_put_failure;
 
 	nlmsg_end(skb, nlh);
 	return 0;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH net-next v3 4/4] net: ipv6: Use compressed IPv6 addresses showing route replace error
  2017-01-27 23:20 [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes David Ahern
                   ` (2 preceding siblings ...)
  2017-01-27 23:20 ` [PATCH net-next v3 3/4] net: ipv6: Add support to dump multipath routes via RTA_MULTIPATH attribute David Ahern
@ 2017-01-27 23:20 ` David Ahern
  2017-01-29  1:00 ` [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes Roopa Prabhu
  4 siblings, 0 replies; 26+ messages in thread
From: David Ahern @ 2017-01-27 23:20 UTC (permalink / raw)
  To: netdev; +Cc: roopa, nicolas.dichtel, David Ahern

ip6_print_replace_route_err logs an error if a route replace fails with
IPv6 addresses in the full format. e.g,:

IPv6: IPV6: multipath route replace failed (check consistency of installed routes): 2001:0db8:0200:0000:0000:0000:0000:0000 nexthop 2001:0db8:0001:0000:0000:0000:0000:0016 ifi 0

Change the message to dump the addresses in the compressed format.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 net/ipv6/route.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 747f333ae006..463cc0847a3d 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2969,7 +2969,7 @@ static void ip6_print_replace_route_err(struct list_head *rt6_nh_list)
 	struct rt6_nh *nh;
 
 	list_for_each_entry(nh, rt6_nh_list, next) {
-		pr_warn("IPV6: multipath route replace failed (check consistency of installed routes): %pI6 nexthop %pI6 ifi %d\n",
+		pr_warn("IPV6: multipath route replace failed (check consistency of installed routes): %pI6c nexthop %pI6c ifi %d\n",
 		        &nh->r_cfg.fc_dst, &nh->r_cfg.fc_gateway,
 		        nh->r_cfg.fc_ifindex);
 	}
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes
  2017-01-27 23:20 [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes David Ahern
                   ` (3 preceding siblings ...)
  2017-01-27 23:20 ` [PATCH net-next v3 4/4] net: ipv6: Use compressed IPv6 addresses showing route replace error David Ahern
@ 2017-01-29  1:00 ` Roopa Prabhu
  2017-01-29 18:02   ` David Ahern
  4 siblings, 1 reply; 26+ messages in thread
From: Roopa Prabhu @ 2017-01-29  1:00 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, nicolas.dichtel

On 1/27/17, 3:20 PM, David Ahern wrote:
> This series closes a couple of gaps between IPv4 and IPv6 with respect
> to multipath routes:
>
> 1. IPv4 allows all nexthops of multipath routes to be deleted using just
>    the prefix and length; IPv6 only deletes the first nexthop for the
>    route if only the prefix and length are given.
>
> 2. IPv4 returns multipath routes encoded in the RTA_MULTIPATH attribute.
>    IPv6 returns a series of routes with the same prefix and length - one
>    for each nexthop. This happens for both dumps and notifications.
>
> IPv6 does accept RTA_MULTIPATH encoded routes, but installs them as a
> series of routes.
>
> Patch 2 addresses the first item by allowing IPv6 multipath routes to be
> deleted using just the prefix and length. Patch 3 addresses the second
> allowing IPv6 multipath routes to be returned encoded in the RTA_MULTIPATH.
>
> Patch 1 adds the NLM_F_APPEND flag to notifications when the flag is
> present in the request. The lack of this flag was noted testing route
> appends and comparing to IPv4.
>
> Patch 4 prints IPv6 addresses in compressed format when showing route
> replace errors. This was noticed testing REPLACE failures.
>
> The end result for multipath routes:
> 1. Route Add
>    - one notification with RTA_MULTIPATH attribute
>
> 2. Route Replace
>    - notification for first route and all siblings that have
>      succeeded. This is needed regardless of success of remaining
>      nexthops to maintain add/delete consistency should a failure
>      happens on the second or following nexthop (ie., need to tell
>      userspace that original route has been replaced and then the
>      failure logic deletes all routes inserted thus far).
>      
> 3. Route Delete
>    - for multipath route only given nexthops are deleted. This path
>      is hit when DELETE contains RTA_MULTIPATH. All other route deletes,
>      all nexthops are deleted for given prefix and length (and any
>      other specs if given)
>
>    - one notification sent per nexthop deleted. This is unavoidable
>      since IPv6 alllows a single nexthop to be deleted within a multipath
>      route
>
> 4. Route Appends
>    - IPv6 allows nexthops to be appended to an existing route. In this
>      case one notification is sent per nexthop added

thanks for listing all of these...I think you mentioned this case to me..
but I don't remember now why this notification is
sent per nexthop added. This is an update to an existing multipath route.
so seems like the notification should be a RTM_NEWROUTE with the full RTA_MULTIPATH route
(similar to route add)

Same holds for replace, I know the code might be tricky here...but the route replace
is also an update to an existing multipath route and hence should be a RTM_NEWROUTE
with the full multipath route (RTA_MULTIPATH) that changed (from userspace semantics POV)

I don't have a better solution, but with the above still being different, wondering
if its worth the risk changing the api for just a few notifications.

>
> Addresses some of the inconsistencies also noted by Roopa at netdev0.1:
> https://www.netdev01.org/docs/prabhu-linux_ipv4_ipv6_inconsistencies_talk_slides.pdf
>
> v3
> - removed the need for a user API to opt-in to change. Requiring an
>   API just shifts the difference from same API with different
>   behavior to different API to achieve equivalent behavior
>
> - route notifications changed to use RTA_MULTIPATH for add and replace
>
> - upated commit messages and cover letter
>
> v2
> - fixed locking in patch 1 as noted by DaveM
> - changed user API for patch 2 to require an rtmsg with RTM_F_ALL_NEXTHOPS
>   set in rtm_flags
> - revamped explanation of patch 2 and cover letter
>
> David Ahern (4):
>   net: ipv6: add NLM_F_APPEND in notifications when applicable
>   net: ipv6: Allow shorthand delete of all nexthops in multipath route
>   net: ipv6: Add support to dump multipath routes via RTA_MULTIPATH
>     attribute
>   net: ipv6: Use compressed IPv6 addresses showing route replace error
>
>  include/net/ip6_fib.h |   4 +-
>  include/net/netlink.h |   1 +
>  net/ipv6/ip6_fib.c    |  19 +++++-
>  net/ipv6/route.c      | 163 ++++++++++++++++++++++++++++++++++++++++++++------
>  4 files changed, 165 insertions(+), 22 deletions(-)
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes
  2017-01-29  1:00 ` [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes Roopa Prabhu
@ 2017-01-29 18:02   ` David Ahern
  2017-01-29 23:55     ` Roopa Prabhu
  2017-01-30 11:07     ` Nicolas Dichtel
  0 siblings, 2 replies; 26+ messages in thread
From: David Ahern @ 2017-01-29 18:02 UTC (permalink / raw)
  To: Roopa Prabhu; +Cc: netdev, nicolas.dichtel

On 1/28/17 6:00 PM, Roopa Prabhu wrote:
>> 4. Route Appends
>>    - IPv6 allows nexthops to be appended to an existing route. In this
>>      case one notification is sent per nexthop added
> 
> thanks for listing all of these...I think you mentioned this case to me..
> but I don't remember now why this notification is
> sent per nexthop added. This is an update to an existing multipath route.
> so seems like the notification should be a RTM_NEWROUTE with the full RTA_MULTIPATH route
> (similar to route add)

It could be; it's a question of what should userspace get -- the full route or the change? Append to me suggests the latter - userspace is told what changed. It is simpler kernel code wise to send the full new route. The append changes were done after our conversation. ;-)

> 
> Same holds for replace, I know the code might be tricky here...but the route replace
> is also an update to an existing multipath route and hence should be a RTM_NEWROUTE
> with the full multipath route (RTA_MULTIPATH) that changed (from userspace semantics POV)

It is. The only change on a replace is the encoding of all routes in RTA_MULTIPATH which is the point of this patch set. On successful replace, only 1 notification is sent with the full route.

> 
> I don't have a better solution, but with the above still being different, wondering
> if its worth the risk changing the api for just a few notifications.

Data centers are moving to L3, and multipath is a big part of that. Anyone who looks at ip -6 route enough knows it gets painful mentally pulling the individual routes into a single one. In addition, using RTA_MULTIPATH is more efficient at scale.

Furthermore, I have a follow on patch set that returns the route matched on an ip route get lookup. Returning the full multipath route is an important part to understanding the full context of the route to an address.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes
  2017-01-29 18:02   ` David Ahern
@ 2017-01-29 23:55     ` Roopa Prabhu
  2017-01-30  1:29       ` David Ahern
  2017-01-30 11:08       ` Nicolas Dichtel
  2017-01-30 11:07     ` Nicolas Dichtel
  1 sibling, 2 replies; 26+ messages in thread
From: Roopa Prabhu @ 2017-01-29 23:55 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, nicolas.dichtel

On 1/29/17, 10:02 AM, David Ahern wrote:
> On 1/28/17 6:00 PM, Roopa Prabhu wrote:
>>> 4. Route Appends
>>>    - IPv6 allows nexthops to be appended to an existing route. In this
>>>      case one notification is sent per nexthop added
>> thanks for listing all of these...I think you mentioned this case to me..
>> but I don't remember now why this notification is
>> sent per nexthop added. This is an update to an existing multipath route.
>> so seems like the notification should be a RTM_NEWROUTE with the full RTA_MULTIPATH route
>> (similar to route add)
> It could be; it's a question of what should userspace get -- the full route or the change? Append to me suggests the latter - userspace is told what changed. It is simpler kernel code wise to send the full new route. The append changes were done after our conversation. ;-)
ok, yeah. you listing all the cases here made it more simpler to understand in the context of other notifications :). I would prefer all
RTM_NEWLINK notifications (ie new add or update to an existing route..replace/append), contain the full route via RTA_MULTIPATH.

>
>> Same holds for replace, I know the code might be tricky here...but the route replace
>> is also an update to an existing multipath route and hence should be a RTM_NEWROUTE
>> with the full multipath route (RTA_MULTIPATH) that changed (from userspace semantics POV)
> It is. The only change on a replace is the encoding of all routes in RTA_MULTIPATH which is the point of this patch set. On successful replace, only 1 notification is sent with the full route.
ok, good.
>
>> I don't have a better solution, but with the above still being different, wondering
>> if its worth the risk changing the api for just a few notifications.
> Data centers are moving to L3, and multipath is a big part of that. Anyone who looks at ip -6 route enough knows it gets painful mentally pulling the individual routes into a single one. In addition, using RTA_MULTIPATH is more efficient at scale.
>
> Furthermore, I have a follow on patch set that returns the route matched on an ip route get lookup. Returning the full multipath route is an important part to understanding the full context of the route to an address.

ok, sure.  If we are moving to RTA_MULTIPATH, I just want to make sure all the notifications consistently move to multipath.

so, to summarize and to get on the same page as you,
- all RTM_NEWLINK notifications will contain RTA_MULTIPATH when the route in the kernel is a multipath route:
            - since ipv6 allows add/append of a single nexthop, the notification for the first nexthop may go out without a RTA_MULTIPATH
- RTM_DELETE will also contain RTA_MULTIPATH when the user tries to delete the full route with RTA_MULTIPATH
            - since ipv6 allows deleting of a single nexthop, RTM_DELETE may contain a single nexthop delete ie without RTA_MULTIPATH (this is just to continue
                supporting the single nexthop delete support that ipv6 has)

Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes
  2017-01-29 23:55     ` Roopa Prabhu
@ 2017-01-30  1:29       ` David Ahern
  2017-01-30  2:20         ` Roopa Prabhu
  2017-01-30 11:08       ` Nicolas Dichtel
  1 sibling, 1 reply; 26+ messages in thread
From: David Ahern @ 2017-01-30  1:29 UTC (permalink / raw)
  To: Roopa Prabhu; +Cc: netdev, nicolas.dichtel

On 1/29/17 4:55 PM, Roopa Prabhu wrote:
> On 1/29/17, 10:02 AM, David Ahern wrote:
>> On 1/28/17 6:00 PM, Roopa Prabhu wrote:
>>>> 4. Route Appends
>>>>    - IPv6 allows nexthops to be appended to an existing route. In this
>>>>      case one notification is sent per nexthop added
>>> thanks for listing all of these...I think you mentioned this case to me..
>>> but I don't remember now why this notification is
>>> sent per nexthop added. This is an update to an existing multipath route.
>>> so seems like the notification should be a RTM_NEWROUTE with the full RTA_MULTIPATH route
>>> (similar to route add)
>> It could be; it's a question of what should userspace get -- the full route or the change? Append to me suggests the latter - userspace is told what changed. It is simpler kernel code wise to send the full new route. The append changes were done after our conversation. ;-)
> ok, yeah. you listing all the cases here made it more simpler to understand in the context of other notifications :). I would prefer all
> RTM_NEWLINK notifications (ie new add or update to an existing route..replace/append), contain the full route via RTA_MULTIPATH.

Ok. I will send a v4 in a day or 2 (wait for more comments) with APPEND only generating 1 notification. Below response is based on this change.

> 
>>
>>> Same holds for replace, I know the code might be tricky here...but the route replace
>>> is also an update to an existing multipath route and hence should be a RTM_NEWROUTE
>>> with the full multipath route (RTA_MULTIPATH) that changed (from userspace semantics POV)
>> It is. The only change on a replace is the encoding of all routes in RTA_MULTIPATH which is the point of this patch set. On successful replace, only 1 notification is sent with the full route.
> ok, good.
>>
>>> I don't have a better solution, but with the above still being different, wondering
>>> if its worth the risk changing the api for just a few notifications.
>> Data centers are moving to L3, and multipath is a big part of that. Anyone who looks at ip -6 route enough knows it gets painful mentally pulling the individual routes into a single one. In addition, using RTA_MULTIPATH is more efficient at scale.
>>
>> Furthermore, I have a follow on patch set that returns the route matched on an ip route get lookup. Returning the full multipath route is an important part to understanding the full context of the route to an address.
> 
> ok, sure.  If we are moving to RTA_MULTIPATH, I just want to make sure all the notifications consistently move to multipath.
> 
> so, to summarize and to get on the same page as you,
> - all RTM_NEWLINK notifications will contain RTA_MULTIPATH when the route in the kernel is a multipath route:
>             - since ipv6 allows add/append of a single nexthop, the notification for the first nexthop may go out without a RTA_MULTIPATH

No. I notification for all cases - new, replace, append.

> - RTM_DELETE will also contain RTA_MULTIPATH when the user tries to delete the full route with RTA_MULTIPATH
>             - since ipv6 allows deleting of a single nexthop, RTM_DELETE may contain a single nexthop delete ie without RTA_MULTIPATH (this is just to continue
>                 supporting the single nexthop delete support that ipv6 has)

Let's give an example for each case:

1. Add - full multipath route

This route command:
ip -6 ro add vrf red 2001:db8:200::/120 nexthop via 2001:db8:1::2 nexthop via 2001:db8:2::2

generates this notification (ip -t mon ro):

2001:db8:200::/120 table red metric 1024
        nexthop via 2001:db8:1::2  dev eth1 weight 1
        nexthop via 2001:db8:2::2  dev eth2 weight 1


2. Delete - 1 notification for each hop for all combinations of delete commands


3. Replace -  full multipath route

This route command (with the one from Add case already in place):
ip -6 ro replace vrf red 2001:db8:200::/120 nexthop via 2001:db8:1::16 nexthop via 2001:db8:2::16

generates a single notification:

Replaced 2001:db8:200::/120 table red metric 1024
        nexthop via 2001:db8:1::16  dev eth1 weight 1
        nexthop via 2001:db8:2::16  dev eth2 weight 1


4. Append - 1 route after all hops are appended

This append command (starting with the one installed from the Replace case):
ip -6 ro append vrf red 2001:db8:200::/120 nexthop via 2001:db8:2::20 nexthop via 2001:db8:1::20

generates a single notification:

Append 2001:db8:200::/120 table red metric 1024
        nexthop via 2001:db8:2::20  dev eth2 weight 1
        nexthop via 2001:db8:1::20  dev eth1 weight 1
        nexthop via 2001:db8:1::16  dev eth1 weight 1
        nexthop via 2001:db8:2::16  dev eth2 weight 1

(The Replaced and Append annotations are due to a local iproute2 patch; iproute2 does
not currently distinguish NEWROUTE cases.)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes
  2017-01-30  1:29       ` David Ahern
@ 2017-01-30  2:20         ` Roopa Prabhu
  2017-01-30  2:57           ` David Ahern
  0 siblings, 1 reply; 26+ messages in thread
From: Roopa Prabhu @ 2017-01-30  2:20 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, nicolas.dichtel

On 1/29/17, 5:29 PM, David Ahern wrote:
[snip]
> Let's give an example for each case:
>
> 1. Add - full multipath route
>
> This route command:
> ip -6 ro add vrf red 2001:db8:200::/120 nexthop via 2001:db8:1::2 nexthop via 2001:db8:2::2
>
> generates this notification (ip -t mon ro):
>
> 2001:db8:200::/120 table red metric 1024
>         nexthop via 2001:db8:1::2  dev eth1 weight 1
>         nexthop via 2001:db8:2::2  dev eth2 weight 1
>

ack,
> 2. Delete - 1 notification for each hop for all combinations of delete commands

here I was trying to say, for people deleting the full multipath route, you should send a RTA_MULTIPATH.
For the odd case of ipv6 single nexthop delete, you can continue to send a single nexthop delete (like today ..without RTA_MULTIPATH)
because there is no other way to notify a single nexthop delete with a RTM_DELROUTE.

The reason I say this is because: keeping future in mind, this will make things consistent for majority of people who will
start using RTA_MULTIPATH for both ipv4 and ipv6 for adds and deletes (requests/notifications and dumps) the same way.
single next hop delete will just be around because of fear of breaking existing applications.

>
>
> 3. Replace -  full multipath route
>
> This route command (with the one from Add case already in place):
> ip -6 ro replace vrf red 2001:db8:200::/120 nexthop via 2001:db8:1::16 nexthop via 2001:db8:2::16
>
> generates a single notification:
>
> Replaced 2001:db8:200::/120 table red metric 1024
>         nexthop via 2001:db8:1::16  dev eth1 weight 1
>         nexthop via 2001:db8:2::16  dev eth2 weight 1

ack,
>
> 4. Append - 1 route after all hops are appended
>
> This append command (starting with the one installed from the Replace case):
> ip -6 ro append vrf red 2001:db8:200::/120 nexthop via 2001:db8:2::20 nexthop via 2001:db8:1::20
>
> generates a single notification:
>
> Append 2001:db8:200::/120 table red metric 1024
>         nexthop via 2001:db8:2::20  dev eth2 weight 1
>         nexthop via 2001:db8:1::20  dev eth1 weight 1
>         nexthop via 2001:db8:1::16  dev eth1 weight 1
>         nexthop via 2001:db8:2::16  dev eth2 weight 1
>
> (The Replaced and Append annotations are due to a local iproute2 patch; iproute2 does
> not currently distinguish NEWROUTE cases.)
>

ack,

thanks

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes
  2017-01-30  2:20         ` Roopa Prabhu
@ 2017-01-30  2:57           ` David Ahern
  2017-01-30 11:13             ` Nicolas Dichtel
  2017-01-30 15:49             ` Roopa Prabhu
  0 siblings, 2 replies; 26+ messages in thread
From: David Ahern @ 2017-01-30  2:57 UTC (permalink / raw)
  To: Roopa Prabhu; +Cc: netdev, nicolas.dichtel

On 1/29/17 7:20 PM, Roopa Prabhu wrote:
>> 2. Delete - 1 notification for each hop for all combinations of delete commands
> 
> here I was trying to say, for people deleting the full multipath route, you should send a RTA_MULTIPATH.

The only way to do that is to call inet6_rt_notify() *before* the route delete work has been done in fib6_del_route. Right now it is called at the end - after all sibling associations have been dropped, a time in which it can no longer fill in RTA_MULTIPATH.

fib6_del_route function does not have any failure paths and is called with the write lock held, so moving the notification might be ok. But it also means ignoring all subsequent failures from fib6_del, again which might be ok since the only failure is ENOENT which can not happen if we are walking the sibling list.

> For the odd case of ipv6 single nexthop delete, you can continue to send a single nexthop delete (like today ..without RTA_MULTIPATH)
> because there is no other way to notify a single nexthop delete with a RTM_DELROUTE.
> 
> The reason I say this is because: keeping future in mind, this will make things consistent for majority of people who will
> start using RTA_MULTIPATH for both ipv4 and ipv6 for adds and deletes (requests/notifications and dumps) the same way.
> single next hop delete will just be around because of fear of breaking existing applications.

Single next hop delete will be around because IPv6 allows it -- and because IPv4 needs to support it.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes
  2017-01-29 18:02   ` David Ahern
  2017-01-29 23:55     ` Roopa Prabhu
@ 2017-01-30 11:07     ` Nicolas Dichtel
  2017-01-30 15:23       ` David Ahern
  1 sibling, 1 reply; 26+ messages in thread
From: Nicolas Dichtel @ 2017-01-30 11:07 UTC (permalink / raw)
  To: David Ahern, Roopa Prabhu; +Cc: netdev

Le 29/01/2017 à 19:02, David Ahern a écrit :
[snip]
> Data centers are moving to L3, and multipath is a big part of that. Anyone who looks at ip -6 route enough knows it gets painful mentally pulling the individual routes into a single one.
I agree, but it's only an iproute2 problem. iproute2 could group routes to have
a better output, there is no need to have a kernel patch for this ;-)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes
  2017-01-29 23:55     ` Roopa Prabhu
  2017-01-30  1:29       ` David Ahern
@ 2017-01-30 11:08       ` Nicolas Dichtel
  2017-01-30 15:45         ` Roopa Prabhu
  1 sibling, 1 reply; 26+ messages in thread
From: Nicolas Dichtel @ 2017-01-30 11:08 UTC (permalink / raw)
  To: Roopa Prabhu, David Ahern; +Cc: netdev

Le 30/01/2017 à 00:55, Roopa Prabhu a écrit :
> On 1/29/17, 10:02 AM, David Ahern wrote:
>> On 1/28/17 6:00 PM, Roopa Prabhu wrote:
>>>> 4. Route Appends
>>>>    - IPv6 allows nexthops to be appended to an existing route. In this
>>>>      case one notification is sent per nexthop added
>>> thanks for listing all of these...I think you mentioned this case to me..
>>> but I don't remember now why this notification is
>>> sent per nexthop added. This is an update to an existing multipath route.
>>> so seems like the notification should be a RTM_NEWROUTE with the full RTA_MULTIPATH route
>>> (similar to route add)
>> It could be; it's a question of what should userspace get -- the full route or the change? Append to me suggests the latter - userspace is told what changed. It is simpler kernel code wise to send the full new route. The append changes were done after our conversation. ;-)
> ok, yeah. you listing all the cases here made it more simpler to understand in the context of other notifications :). I would prefer all
> RTM_NEWLINK notifications (ie new add or update to an existing route..replace/append), contain the full route via RTA_MULTIPATH.
I don't agree. With the previous proposal, you know *exactly* what happens with
each notification and this is the primary goal of the notifications. With the
last proposal, where RTA_MULTIPATH is used for replace and append, you have the
new result, but you don't know what has been done.
Usually, notifications are used to notify an event, not the result of an event.
If you want the result, you can use the dump cmd.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes
  2017-01-30  2:57           ` David Ahern
@ 2017-01-30 11:13             ` Nicolas Dichtel
  2017-01-30 15:59               ` David Ahern
  2017-01-30 15:49             ` Roopa Prabhu
  1 sibling, 1 reply; 26+ messages in thread
From: Nicolas Dichtel @ 2017-01-30 11:13 UTC (permalink / raw)
  To: David Ahern, Roopa Prabhu; +Cc: netdev

Le 30/01/2017 à 03:57, David Ahern a écrit :
> On 1/29/17 7:20 PM, Roopa Prabhu wrote:
>>> 2. Delete - 1 notification for each hop for all combinations of delete commands
>>
>> here I was trying to say, for people deleting the full multipath route, you should send a RTA_MULTIPATH.
> 
> The only way to do that is to call inet6_rt_notify() *before* the route delete work has been done in fib6_del_route. Right now it is called at the end - after all sibling associations have been dropped, a time in which it can no longer fill in RTA_MULTIPATH.
> 
> fib6_del_route function does not have any failure paths and is called with the write lock held, so moving the notification might be ok. But it also means ignoring all subsequent failures from fib6_del, again which might be ok since the only failure is ENOENT which can not happen if we are walking the sibling list.
Is it not possible to prepare the RTA_MULTIPATH attribute during the deletion
(when needed information are still available) and to use/copy it later when the
notification is done?

Or at least, having RTA_MUTLTIPATH with one nexthop only, so that the user knows
that this route was part of an ECMP routes?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes
  2017-01-30 11:07     ` Nicolas Dichtel
@ 2017-01-30 15:23       ` David Ahern
  2017-01-30 17:03         ` Nicolas Dichtel
  0 siblings, 1 reply; 26+ messages in thread
From: David Ahern @ 2017-01-30 15:23 UTC (permalink / raw)
  To: nicolas.dichtel, Roopa Prabhu; +Cc: netdev

On 1/30/17 4:07 AM, Nicolas Dichtel wrote:
> Le 29/01/2017 à 19:02, David Ahern a écrit :
> [snip]
>> Data centers are moving to L3, and multipath is a big part of that. Anyone who looks at ip -6 route enough knows it gets painful mentally pulling the individual routes into a single one.
> I agree, but it's only an iproute2 problem. iproute2 could group routes to have
> a better output, there is no need to have a kernel patch for this ;-)
> 

iproute2 is not the only rtnetlink user. The comment above uses ip show as an example. libnl has a workaround for IPv6 to update route objects versus replacing them - unnecessary complexity that does not need to replicated to iproute2, Quagga/FRR or python libraries implementing rtnetlink. Really, RTA_MULTIPATH support in notifications should have been added when multipath support was added to the IPv6. Patch 3 is mostly a refactoring of rt6_fill_node to fill in nexthop information. This could have been done 4+ years ago when RTA_MULTIPATH route adds was added to the stack.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes
  2017-01-30 11:08       ` Nicolas Dichtel
@ 2017-01-30 15:45         ` Roopa Prabhu
  2017-01-30 16:41           ` Nicolas Dichtel
  0 siblings, 1 reply; 26+ messages in thread
From: Roopa Prabhu @ 2017-01-30 15:45 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: David Ahern, netdev

On 1/30/17, 3:08 AM, Nicolas Dichtel wrote:
> Le 30/01/2017 à 00:55, Roopa Prabhu a écrit :
>> On 1/29/17, 10:02 AM, David Ahern wrote:
>>> On 1/28/17 6:00 PM, Roopa Prabhu wrote:
>>>>> 4. Route Appends
>>>>>    - IPv6 allows nexthops to be appended to an existing route. In this
>>>>>      case one notification is sent per nexthop added
>>>> thanks for listing all of these...I think you mentioned this case to me..
>>>> but I don't remember now why this notification is
>>>> sent per nexthop added. This is an update to an existing multipath route.
>>>> so seems like the notification should be a RTM_NEWROUTE with the full RTA_MULTIPATH route
>>>> (similar to route add)
>>> It could be; it's a question of what should userspace get -- the full route or the change? Append to me suggests the latter - userspace is told what changed. It is simpler kernel code wise to send the full new route. The append changes were done after our conversation. ;-)
>> ok, yeah. you listing all the cases here made it more simpler to understand in the context of other notifications :). I would prefer all
>> RTM_NEWLINK notifications (ie new add or update to an existing route..replace/append), contain the full route via RTA_MULTIPATH.
> I don't agree. With the previous proposal, you know *exactly* what happens with
> each notification and this is the primary goal of the notifications. With the
> last proposal, where RTA_MULTIPATH is used for replace and append, you have the
> new result, but you don't know what has been done.
> Usually, notifications are used to notify an event, not the result of an event.
> If you want the result, you can use the dump cmd.

what has been done is conveyed by the APPEND and REPLACE flag in the notification. The user only cares about the updated multipath route...
giving parts of the route has never been very useful... and this is consistent with ipv4.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes
  2017-01-30  2:57           ` David Ahern
  2017-01-30 11:13             ` Nicolas Dichtel
@ 2017-01-30 15:49             ` Roopa Prabhu
  2017-01-30 16:12               ` David Ahern
  1 sibling, 1 reply; 26+ messages in thread
From: Roopa Prabhu @ 2017-01-30 15:49 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, nicolas.dichtel

On 1/29/17, 6:57 PM, David Ahern wrote:
> On 1/29/17 7:20 PM, Roopa Prabhu wrote:
>>> 2. Delete - 1 notification for each hop for all combinations of delete commands
>> here I was trying to say, for people deleting the full multipath route, you should send a RTA_MULTIPATH.
> The only way to do that is to call inet6_rt_notify() *before* the route delete work has been done in fib6_del_route. Right now it is called at the end - after all sibling associations have been dropped, a time in which it can no longer fill in RTA_MULTIPATH.
>
> fib6_del_route function does not have any failure paths and is called with the write lock held, so moving the notification might be ok. But it also means ignoring all subsequent failures from fib6_del, again which might be ok since the only failure is ENOENT which can not happen if we are walking the sibling list.
>
>
>> For the odd case of ipv6 single nexthop delete, you can continue to send a single nexthop delete (like today ..without RTA_MULTIPATH)
>> because there is no other way to notify a single nexthop delete with a RTM_DELROUTE.
>>
>> The reason I say this is because: keeping future in mind, this will make things consistent for majority of people who will
>> start using RTA_MULTIPATH for both ipv4 and ipv6 for adds and deletes (requests/notifications and dumps) the same way.
>> single next hop delete will just be around because of fear of breaking existing applications.
> Single next hop delete will be around because IPv6 allows it -- and because IPv4 needs to support it.
>
understand single next hop delete for ipv6 will be around..and my only point was to leave it around but not optimize for that case.
I don't think we should support single nexthop delete in ipv4 (I have not seen a requirement for that)... ipv4 is good as it is right now.
the additional complexity is not needed.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes
  2017-01-30 11:13             ` Nicolas Dichtel
@ 2017-01-30 15:59               ` David Ahern
  0 siblings, 0 replies; 26+ messages in thread
From: David Ahern @ 2017-01-30 15:59 UTC (permalink / raw)
  To: nicolas.dichtel, Roopa Prabhu; +Cc: netdev

On 1/30/17 4:13 AM, Nicolas Dichtel wrote:
> Le 30/01/2017 à 03:57, David Ahern a écrit :
>> On 1/29/17 7:20 PM, Roopa Prabhu wrote:
>>>> 2. Delete - 1 notification for each hop for all combinations of delete commands
>>>
>>> here I was trying to say, for people deleting the full multipath route, you should send a RTA_MULTIPATH.
>>
>> The only way to do that is to call inet6_rt_notify() *before* the route delete work has been done in fib6_del_route. Right now it is called at the end - after all sibling associations have been dropped, a time in which it can no longer fill in RTA_MULTIPATH.
>>
>> fib6_del_route function does not have any failure paths and is called with the write lock held, so moving the notification might be ok. But it also means ignoring all subsequent failures from fib6_del, again which might be ok since the only failure is ENOENT which can not happen if we are walking the sibling list.
> Is it not possible to prepare the RTA_MULTIPATH attribute during the deletion
> (when needed information are still available) and to use/copy it later when the
> notification is done?

What I outlined above has the least complexity and does not require copies. And it is only proper for the 1 case where a multipath route is deleted by prefix and length -- the ability enabled by Patch 2.

> 
> Or at least, having RTA_MUTLTIPATH with one nexthop only, so that the user knows
> that this route was part of an ECMP routes?
> 

For the ip6_route_multipath_del path, sure the nexthops deleted can be wrapped in RTA_MULTIPATH if the nexthop indeed has siblings, but does it help userspace? If all attributes outside of the nexthops are identical, then userspace would match the route without it.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes
  2017-01-30 15:49             ` Roopa Prabhu
@ 2017-01-30 16:12               ` David Ahern
  2017-01-30 18:45                 ` Roopa Prabhu
  0 siblings, 1 reply; 26+ messages in thread
From: David Ahern @ 2017-01-30 16:12 UTC (permalink / raw)
  To: Roopa Prabhu; +Cc: netdev, nicolas.dichtel

On 1/30/17 8:49 AM, Roopa Prabhu wrote:
>> Single next hop delete will be around because IPv6 allows it -- and because IPv4 needs to support it.
>>
> understand single next hop delete for ipv6 will be around..and my only point was to leave it around but not optimize for that case.
> I don't think we should support single nexthop delete in ipv4 (I have not seen a requirement for that)... ipv4 is good as it is right now.
> the additional complexity is not needed.
> 

IPv4 has a known bug -- delete a virtual interface in a multihop route and the entire route is deleted, including the nexthops for other devices. This does not happen for IPv6.

Simple example of that bug:

ip li add dummy1 type dummy
ip li add dummy2 type dummy
ip addr add dev dummy1 10.11.1.1/28
ip li set dummy1 up
ip addr add dev dummy2 10.11.2.1/28
ip li set dummy2 up
ip ro add 1.1.1.0/24 nexthop via 10.11.1.2 nexthop via 10.11.2.2
ip li del dummy2

--> the entire multipath route has been deleted.

And, fixing this bug enables work to make IPv4 append to be sane -- appending a route should modify an existing route by adding the nexthop, not adding a new route that I believe can never actually be hit.

Both cases mean modifying an IPv4 route -- adding or removing nexthops -- a capability that IPv6 allows so fixing this means closing another difference between the stacks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes
  2017-01-30 15:45         ` Roopa Prabhu
@ 2017-01-30 16:41           ` Nicolas Dichtel
  0 siblings, 0 replies; 26+ messages in thread
From: Nicolas Dichtel @ 2017-01-30 16:41 UTC (permalink / raw)
  To: Roopa Prabhu; +Cc: David Ahern, netdev

Le 30/01/2017 à 16:45, Roopa Prabhu a écrit :
> On 1/30/17, 3:08 AM, Nicolas Dichtel wrote:
>> Le 30/01/2017 à 00:55, Roopa Prabhu a écrit :
>>> On 1/29/17, 10:02 AM, David Ahern wrote:
>>>> On 1/28/17 6:00 PM, Roopa Prabhu wrote:
>>>>>> 4. Route Appends
>>>>>>    - IPv6 allows nexthops to be appended to an existing route. In this
>>>>>>      case one notification is sent per nexthop added
>>>>> thanks for listing all of these...I think you mentioned this case to me..
>>>>> but I don't remember now why this notification is
>>>>> sent per nexthop added. This is an update to an existing multipath route.
>>>>> so seems like the notification should be a RTM_NEWROUTE with the full RTA_MULTIPATH route
>>>>> (similar to route add)
>>>> It could be; it's a question of what should userspace get -- the full route or the change? Append to me suggests the latter - userspace is told what changed. It is simpler kernel code wise to send the full new route. The append changes were done after our conversation. ;-)
>>> ok, yeah. you listing all the cases here made it more simpler to understand in the context of other notifications :). I would prefer all
>>> RTM_NEWLINK notifications (ie new add or update to an existing route..replace/append), contain the full route via RTA_MULTIPATH.
>> I don't agree. With the previous proposal, you know *exactly* what happens with
>> each notification and this is the primary goal of the notifications. With the
>> last proposal, where RTA_MULTIPATH is used for replace and append, you have the
>> new result, but you don't know what has been done.
>> Usually, notifications are used to notify an event, not the result of an event.
>> If you want the result, you can use the dump cmd.
> 
> what has been done is conveyed by the APPEND and REPLACE flag in the notification. The user only cares about the updated multipath route...
APPEND with a full route is bit ambiguous ;-)

> giving parts of the route has never been very useful... and this is consistent with ipv4.
> 
I don't agree, the user cares about what happens. The dump is here to have the
result if you don't have it already (if the user has a cache, like the libnl,
it's easy to have the result). Daemons that listen nl usually start by a dump
and after they apply the change when they get notifications (they already know
the previous state).

And even if I agree that it's better to be consistent with ipv4, saying "it's
like this in ipv4" is not an argument to made the behavior good/better.
The need to fully remove an ecmp route to add a nexthop in ipv4 is really
painful (see David's example about the 'interface removal' case). And this
functional difference also impacts the notifications.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes
  2017-01-30 15:23       ` David Ahern
@ 2017-01-30 17:03         ` Nicolas Dichtel
  0 siblings, 0 replies; 26+ messages in thread
From: Nicolas Dichtel @ 2017-01-30 17:03 UTC (permalink / raw)
  To: David Ahern, Roopa Prabhu; +Cc: netdev

Le 30/01/2017 à 16:23, David Ahern a écrit :
> On 1/30/17 4:07 AM, Nicolas Dichtel wrote:
>> Le 29/01/2017 à 19:02, David Ahern a écrit :
>> [snip]
>>> Data centers are moving to L3, and multipath is a big part of that. Anyone who looks at ip -6 route enough knows it gets painful mentally pulling the individual routes into a single one.
>> I agree, but it's only an iproute2 problem. iproute2 could group routes to have
>> a better output, there is no need to have a kernel patch for this ;-)
>>
> 
> iproute2 is not the only rtnetlink user. The comment above uses ip show as an example. libnl has a workaround for IPv6 to update route objects versus replacing them - unnecessary complexity that does not need to replicated to iproute2, Quagga/FRR or python libraries implementing rtnetlink. Really, RTA_MULTIPATH support in notifications should have been added when multipath support was added to the IPv6. Patch 3 is mostly a refactoring of rt6_fill_node to fill in nexthop information. This could have been done 4+ years ago when RTA_MULTIPATH route adds was added to the stack.
> 
Like I said, I fully agree that RTA_MULTIPATH is better for the dump. For the
notifications, I'm not convinced. I did not this 4 years ago on purpose ;-)

I don't think that ipv4 is the right reference because the implementation is
really different. ipv6 is more flexible and this implies differences in the
notifications.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes
  2017-01-30 16:12               ` David Ahern
@ 2017-01-30 18:45                 ` Roopa Prabhu
  2017-01-30 21:16                   ` Stephen Hemminger
  0 siblings, 1 reply; 26+ messages in thread
From: Roopa Prabhu @ 2017-01-30 18:45 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, nicolas.dichtel

On 1/30/17, 8:12 AM, David Ahern wrote:
> On 1/30/17 8:49 AM, Roopa Prabhu wrote:
>>> Single next hop delete will be around because IPv6 allows it -- and because IPv4 needs to support it.
>>>
>> understand single next hop delete for ipv6 will be around..and my only point was to leave it around but not optimize for that case.
>> I don't think we should support single nexthop delete in ipv4 (I have not seen a requirement for that)... ipv4 is good as it is right now.
>> the additional complexity is not needed.
>>
> IPv4 has a known bug -- delete a virtual interface in a multihop route and the entire route is deleted, including the nexthops for other devices. This does not happen for IPv6.
>
> Simple example of that bug:
>
> ip li add dummy1 type dummy
> ip li add dummy2 type dummy
> ip addr add dev dummy1 10.11.1.1/28
> ip li set dummy1 up
> ip addr add dev dummy2 10.11.2.1/28
> ip li set dummy2 up
> ip ro add 1.1.1.0/24 nexthop via 10.11.1.2 nexthop via 10.11.2.2
> ip li del dummy2
>
> --> the entire multipath route has been deleted.
>
>
> And, fixing this bug enables work to make IPv4 append to be sane -- appending a route should modify an existing route by adding the nexthop, not adding a new route that I believe can never actually be hit.
>
> Both cases mean modifying an IPv4 route -- adding or removing nexthops -- a capability that IPv6 allows so fixing this means closing another difference between the stacks.

good point on the bug you point out. agree the bug needs to be fixed ...but routing daemons react to this behavior pretty well...because they get a link notification and a route notification. I was ok with fixing ipv6 to be similar to ipv4...but I am not in favor of bringing in design choices that ipv6 made into ipv4 :).
In all cases, in my experience with routes, the update of ecmp route as a whole has always been ok (at-least not until now...maybe in the future
for new usecases)

In the case of the bug you point out, can the user be notified of the implicit update of the route in the kernel ...via replace flag or something ?.
regarding append..., ipv4 never really supported appending to an existing route......even in the case of a non-ecmp routes.
append just dictates the order where the route is added IIRC  (i maybe mistaken here..its been long i tried it).

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes
  2017-01-30 18:45                 ` Roopa Prabhu
@ 2017-01-30 21:16                   ` Stephen Hemminger
  2017-01-30 23:53                     ` David Ahern
  0 siblings, 1 reply; 26+ messages in thread
From: Stephen Hemminger @ 2017-01-30 21:16 UTC (permalink / raw)
  To: Roopa Prabhu; +Cc: David Ahern, netdev, nicolas.dichtel

On Mon, 30 Jan 2017 10:45:57 -0800
Roopa Prabhu <roopa@cumulusnetworks.com> wrote:

> On 1/30/17, 8:12 AM, David Ahern wrote:
> > On 1/30/17 8:49 AM, Roopa Prabhu wrote:  
> >>> Single next hop delete will be around because IPv6 allows it -- and because IPv4 needs to support it.
> >>>  
> >> understand single next hop delete for ipv6 will be around..and my only point was to leave it around but not optimize for that case.
> >> I don't think we should support single nexthop delete in ipv4 (I have not seen a requirement for that)... ipv4 is good as it is right now.
> >> the additional complexity is not needed.
> >>  
> > IPv4 has a known bug -- delete a virtual interface in a multihop route and the entire route is deleted, including the nexthops for other devices. This does not happen for IPv6.
> >
> > Simple example of that bug:
> >
> > ip li add dummy1 type dummy
> > ip li add dummy2 type dummy
> > ip addr add dev dummy1 10.11.1.1/28
> > ip li set dummy1 up
> > ip addr add dev dummy2 10.11.2.1/28
> > ip li set dummy2 up
> > ip ro add 1.1.1.0/24 nexthop via 10.11.1.2 nexthop via 10.11.2.2
> > ip li del dummy2
> >  
> > --> the entire multipath route has been deleted.  
> >
> >
> > And, fixing this bug enables work to make IPv4 append to be sane -- appending a route should modify an existing route by adding the nexthop, not adding a new route that I believe can never actually be hit.
> >
> > Both cases mean modifying an IPv4 route -- adding or removing nexthops -- a capability that IPv6 allows so fixing this means closing another difference between the stacks.  
> 
> good point on the bug you point out. agree the bug needs to be fixed ...but routing daemons react to this behavior pretty well...because they get a link notification and a route notification. I was ok with fixing ipv6 to be similar to ipv4...but I am not in favor of bringing in design choices that ipv6 made into ipv4 :).
> In all cases, in my experience with routes, the update of ecmp route as a whole has always been ok (at-least not until now...maybe in the future
> for new usecases)
> 
> In the case of the bug you point out, can the user be notified of the implicit update of the route in the kernel ...via replace flag or something ?.
> regarding append..., ipv4 never really supported appending to an existing route......even in the case of a non-ecmp routes.
> append just dictates the order where the route is added IIRC  (i maybe mistaken here..its been long i tried it).

My fear is that routing daemons already adapt to the funny semantics of multi-path routing in IPv4 vs IPv6
and therefore any change in semantics or flags risks breaking existing user space.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes
  2017-01-30 21:16                   ` Stephen Hemminger
@ 2017-01-30 23:53                     ` David Ahern
  2017-01-30 23:56                       ` Stephen Hemminger
  0 siblings, 1 reply; 26+ messages in thread
From: David Ahern @ 2017-01-30 23:53 UTC (permalink / raw)
  To: Stephen Hemminger, Roopa Prabhu; +Cc: netdev, nicolas.dichtel

On 1/30/17 2:16 PM, Stephen Hemminger wrote:
> My fear is that routing daemons already adapt to the funny semantics of multi-path routing in IPv4 vs IPv6
> and therefore any change in semantics or flags risks breaking existing user space.

That is a possibility, but so far the 2 open source code bases I know of have problems with IPv6 mpath.

As I mentioned quagga does not work with IPv6 multipath as is today. 

I just looked at bird. IPv6 mpath support was added in Sept. 2016. It specifically hard codes not accepting RTA_MULTIPATH for IPv6 which I think is an odd choice and clearly coding to quirks as opposed to rtnetlink design. Having never looked at bird code I was able to get it working in < 1 hour. I will contact the patch author about that limitation. That said, the bird implementation needs work when you look at the add/delete/replace/append permutations, so the current code has its problems as well.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes
  2017-01-30 23:53                     ` David Ahern
@ 2017-01-30 23:56                       ` Stephen Hemminger
  2017-01-31  0:02                         ` David Ahern
  0 siblings, 1 reply; 26+ messages in thread
From: Stephen Hemminger @ 2017-01-30 23:56 UTC (permalink / raw)
  To: David Ahern; +Cc: Roopa Prabhu, netdev, nicolas.dichtel

On Mon, 30 Jan 2017 16:53:44 -0700
David Ahern <dsa@cumulusnetworks.com> wrote:

> On 1/30/17 2:16 PM, Stephen Hemminger wrote:
> > My fear is that routing daemons already adapt to the funny semantics of multi-path routing in IPv4 vs IPv6
> > and therefore any change in semantics or flags risks breaking existing user space.  
> 
> That is a possibility, but so far the 2 open source code bases I know of have problems with IPv6 mpath.

Breaking closed source is not acceptable either.

> 
> As I mentioned quagga does not work with IPv6 multipath as is today. 
> 
> I just looked at bird. IPv6 mpath support was added in Sept. 2016. It specifically hard codes not accepting RTA_MULTIPATH for IPv6 which I think is an odd choice and clearly coding to quirks as opposed to rtnetlink design. Having never looked at bird code I was able to get it working in < 1 hour. I will contact the patch author about that limitation. That said, the bird implementation needs work when you look at the add/delete/replace/append permutations, so the current code has its problems as well.

Also what if quagga was fixed but  had to work with existing enterprise distros?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes
  2017-01-30 23:56                       ` Stephen Hemminger
@ 2017-01-31  0:02                         ` David Ahern
  0 siblings, 0 replies; 26+ messages in thread
From: David Ahern @ 2017-01-31  0:02 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Roopa Prabhu, netdev, nicolas.dichtel

On 1/30/17 4:56 PM, Stephen Hemminger wrote:
>>
>> As I mentioned quagga does not work with IPv6 multipath as is today. 
>>
>> I just looked at bird. IPv6 mpath support was added in Sept. 2016. It specifically hard codes not accepting RTA_MULTIPATH for IPv6 which I think is an odd choice and clearly coding to quirks as opposed to rtnetlink design. Having never looked at bird code I was able to get it working in < 1 hour. I will contact the patch author about that limitation. That said, the bird implementation needs work when you look at the add/delete/replace/append permutations, so the current code has its problems as well.
> 
> Also what if quagga was fixed but  had to work with existing enterprise distros?
> 

I am testing the patches; it will work on both.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2017-01-31  0:04 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-27 23:20 [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes David Ahern
2017-01-27 23:20 ` [PATCH net-next v3 1/4] net: ipv6: add NLM_F_APPEND in notifications when applicable David Ahern
2017-01-27 23:20 ` [PATCH net-next v3 2/4] net: ipv6: Allow shorthand delete of all nexthops in multipath route David Ahern
2017-01-27 23:20 ` [PATCH net-next v3 3/4] net: ipv6: Add support to dump multipath routes via RTA_MULTIPATH attribute David Ahern
2017-01-27 23:20 ` [PATCH net-next v3 4/4] net: ipv6: Use compressed IPv6 addresses showing route replace error David Ahern
2017-01-29  1:00 ` [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes Roopa Prabhu
2017-01-29 18:02   ` David Ahern
2017-01-29 23:55     ` Roopa Prabhu
2017-01-30  1:29       ` David Ahern
2017-01-30  2:20         ` Roopa Prabhu
2017-01-30  2:57           ` David Ahern
2017-01-30 11:13             ` Nicolas Dichtel
2017-01-30 15:59               ` David Ahern
2017-01-30 15:49             ` Roopa Prabhu
2017-01-30 16:12               ` David Ahern
2017-01-30 18:45                 ` Roopa Prabhu
2017-01-30 21:16                   ` Stephen Hemminger
2017-01-30 23:53                     ` David Ahern
2017-01-30 23:56                       ` Stephen Hemminger
2017-01-31  0:02                         ` David Ahern
2017-01-30 11:08       ` Nicolas Dichtel
2017-01-30 15:45         ` Roopa Prabhu
2017-01-30 16:41           ` Nicolas Dichtel
2017-01-30 11:07     ` Nicolas Dichtel
2017-01-30 15:23       ` David Ahern
2017-01-30 17:03         ` Nicolas Dichtel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.