All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 00/18] ipv6: Align nexthop behaviour with IPv4
@ 2018-01-07 10:45 Ido Schimmel
  2018-01-07 10:45 ` [PATCH net-next 01/18] ipv6: Remove redundant route flushing during namespace dismantle Ido Schimmel
                   ` (19 more replies)
  0 siblings, 20 replies; 41+ messages in thread
From: Ido Schimmel @ 2018-01-07 10:45 UTC (permalink / raw)
  To: netdev
  Cc: davem, dsahern, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji,
	mlxsw, Ido Schimmel

This set tries to eliminate some differences between IPv4's and IPv6's
treatment of nexthops. These differences are most likely a side effect
of IPv6's data structures (specifically 'rt6_info') that incorporate
both the route and the nexthop and the late addition of ECMP support in
commit 51ebd3181572 ("ipv6: add support of equal cost multipath
(ECMP)").

IPv4 and IPv6 do not react the same to certain netdev events. For
example, upon carrier change affected IPv4 nexthops are marked using the
RTNH_F_LINKDOWN flag and the nexthop group is rebalanced accordingly.
IPv6 on the other hand, does nothing which forces us to perform a
carrier check during route lookup and dump. This makes it difficult to
introduce features such as non-equal-cost multipath that are built on
top of this set [1].

In addition, when a netdev is put administratively down IPv4 nexthops
are marked using the RTNH_F_DEAD flag, whereas IPv6 simply flushes all
the routes using these nexthops. To be consistent with IPv4, multipath
routes should only be flushed when all nexthops in the group are
considered dead.

The first 12 patches introduce non-functional changes that store the
RTNH_F_DEAD and RTNH_F_LINKDOWN flags in IPv6 routes based on netdev
events, in a similar fashion to IPv4. This allows us to remove the
carrier check performed during route lookup and dump.

The next three patches make sure we only flush a multipath route when
all of its nexthops are dead.

Last three patches add test cases for IPv4/IPv6 FIB. These verify that
both address families react similarly to netdev events.

Finally, this series also serves as a good first step towards David
Ahern's goal of treating nexthops as standalone objects [2], as it makes
the code more in line with IPv4 where the nexthop and the nexthop group
are separate objects from the route itself.

1. https://github.com/idosch/linux/tree/ipv6-nexthops
2. http://vger.kernel.org/netconf2017_files/nexthop-objects.pdf

Changes since RFC (feedback from David Ahern):
* Remove redundant declaration of rt6_ifdown() in patch 4 and adjust
comment referencing it accordingly
* Drop patch to flush multipath routes upon NETDEV_UNREGISTER. Reword
cover letter accordingly
* Use a temporary variable to make code more readable in patch 15

Ido Schimmel (18):
  ipv6: Remove redundant route flushing during namespace dismantle
  ipv6: Mark dead nexthops with appropriate flags
  ipv6: Clear nexthop flags upon netdev up
  ipv6: Prepare to handle multiple netdev events
  ipv6: Set nexthop flags upon carrier change
  ipv6: Set nexthop flags during route creation
  ipv6: Check nexthop flags during route lookup instead of carrier
  ipv6: Check nexthop flags in route dump instead of carrier
  ipv6: Ignore dead routes during lookup
  ipv6: Report dead flag during route dump
  ipv6: Add explicit flush indication to routes
  ipv6: Teach tree walker to skip multipath routes
  ipv6: Export sernum update function
  ipv6: Take table lock outside of sernum update function
  ipv6: Flush multipath routes when all siblings are dead
  selftests: fib_tests: Add test cases for IPv4/IPv6 FIB
  selftests: fib_tests: Add test cases for netdev down
  selftests: fib_tests: Add test cases for netdev carrier change

 include/net/ip6_fib.h                    |   4 +-
 include/net/ip6_route.h                  |   4 +-
 net/ipv6/addrconf.c                      |   9 +-
 net/ipv6/ip6_fib.c                       |  28 +-
 net/ipv6/route.c                         | 185 +++++++++++--
 tools/testing/selftests/net/Makefile     |   1 +
 tools/testing/selftests/net/fib_tests.sh | 429 +++++++++++++++++++++++++++++++
 7 files changed, 618 insertions(+), 42 deletions(-)
 create mode 100755 tools/testing/selftests/net/fib_tests.sh

-- 
2.14.3

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH net-next 01/18] ipv6: Remove redundant route flushing during namespace dismantle
  2018-01-07 10:45 [PATCH net-next 00/18] ipv6: Align nexthop behaviour with IPv4 Ido Schimmel
@ 2018-01-07 10:45 ` Ido Schimmel
  2018-01-07 16:53   ` David Ahern
  2018-01-07 10:45 ` [PATCH net-next 02/18] ipv6: Mark dead nexthops with appropriate flags Ido Schimmel
                   ` (18 subsequent siblings)
  19 siblings, 1 reply; 41+ messages in thread
From: Ido Schimmel @ 2018-01-07 10:45 UTC (permalink / raw)
  To: netdev
  Cc: davem, dsahern, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji,
	mlxsw, Ido Schimmel

By the time fib6_net_exit() is executed all the netdevs in the namespace
have been either unregistered or pushed back to the default namespace.
That is because pernet subsys operations are always ordered before
pernet device operations and therefore invoked after them during
namespace dismantle.

Thus, all the routing tables in the namespace are empty by the time
fib6_net_exit() is invoked and the call to rt6_ifdown() can be removed.

This allows us to simplify the condition in fib6_ifdown() as it's only
ever called with an actual netdev.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 net/ipv6/ip6_fib.c | 1 -
 net/ipv6/route.c   | 8 +++-----
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index a64d559fa513..3bbb89d8187d 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -2103,7 +2103,6 @@ static void fib6_net_exit(struct net *net)
 {
 	unsigned int i;
 
-	rt6_ifdown(net, NULL);
 	del_timer_sync(&net->ipv6.ip6_fib_timer);
 
 	for (i = 0; i < FIB6_TABLE_HASHSZ; i++) {
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 2490280b3394..c557362daa23 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3470,10 +3470,9 @@ static int fib6_ifdown(struct rt6_info *rt, void *arg)
 	const struct arg_dev_net *adn = arg;
 	const struct net_device *dev = adn->dev;
 
-	if ((rt->dst.dev == dev || !dev) &&
+	if (rt->dst.dev == dev &&
 	    rt != adn->net->ipv6.ip6_null_entry &&
-	    (rt->rt6i_nsiblings == 0 ||
-	     (dev && netdev_unregistering(dev)) ||
+	    (rt->rt6i_nsiblings == 0 || netdev_unregistering(dev) ||
 	     !rt->rt6i_idev->cnf.ignore_routes_with_linkdown))
 		return -1;
 
@@ -3488,8 +3487,7 @@ void rt6_ifdown(struct net *net, struct net_device *dev)
 	};
 
 	fib6_clean_all(net, fib6_ifdown, &adn);
-	if (dev)
-		rt6_uncached_list_flush_dev(net, dev);
+	rt6_uncached_list_flush_dev(net, dev);
 }
 
 struct rt6_mtu_change_arg {
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH net-next 02/18] ipv6: Mark dead nexthops with appropriate flags
  2018-01-07 10:45 [PATCH net-next 00/18] ipv6: Align nexthop behaviour with IPv4 Ido Schimmel
  2018-01-07 10:45 ` [PATCH net-next 01/18] ipv6: Remove redundant route flushing during namespace dismantle Ido Schimmel
@ 2018-01-07 10:45 ` Ido Schimmel
  2018-01-07 16:54   ` David Ahern
  2018-01-07 10:45 ` [PATCH net-next 03/18] ipv6: Clear nexthop flags upon netdev up Ido Schimmel
                   ` (17 subsequent siblings)
  19 siblings, 1 reply; 41+ messages in thread
From: Ido Schimmel @ 2018-01-07 10:45 UTC (permalink / raw)
  To: netdev
  Cc: davem, dsahern, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji,
	mlxsw, Ido Schimmel

When a netdev is put administratively down or unregistered all the
nexthops using it as their nexthop device should be marked with the
'dead' and 'linkdown' flags.

Currently, when a route is dumped its nexthop device is tested and the
flags are set accordingly. A similar check is performed during route
lookup.

Instead, we can simply mark the nexthops based on netdev events and
avoid checking the netdev's state during route dump and lookup.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 net/ipv6/route.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index c557362daa23..f5eda0aeab55 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3473,8 +3473,10 @@ static int fib6_ifdown(struct rt6_info *rt, void *arg)
 	if (rt->dst.dev == dev &&
 	    rt != adn->net->ipv6.ip6_null_entry &&
 	    (rt->rt6i_nsiblings == 0 || netdev_unregistering(dev) ||
-	     !rt->rt6i_idev->cnf.ignore_routes_with_linkdown))
+	     !rt->rt6i_idev->cnf.ignore_routes_with_linkdown)) {
+		rt->rt6i_nh_flags |= (RTNH_F_DEAD | RTNH_F_LINKDOWN);
 		return -1;
+	}
 
 	return 0;
 }
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH net-next 03/18] ipv6: Clear nexthop flags upon netdev up
  2018-01-07 10:45 [PATCH net-next 00/18] ipv6: Align nexthop behaviour with IPv4 Ido Schimmel
  2018-01-07 10:45 ` [PATCH net-next 01/18] ipv6: Remove redundant route flushing during namespace dismantle Ido Schimmel
  2018-01-07 10:45 ` [PATCH net-next 02/18] ipv6: Mark dead nexthops with appropriate flags Ido Schimmel
@ 2018-01-07 10:45 ` Ido Schimmel
  2018-01-07 16:54   ` David Ahern
  2018-01-07 10:45 ` [PATCH net-next 04/18] ipv6: Prepare to handle multiple netdev events Ido Schimmel
                   ` (16 subsequent siblings)
  19 siblings, 1 reply; 41+ messages in thread
From: Ido Schimmel @ 2018-01-07 10:45 UTC (permalink / raw)
  To: netdev
  Cc: davem, dsahern, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji,
	mlxsw, Ido Schimmel

Previous patch marked nexthops with the 'dead' and 'linkdown' flags.
Clear these flags when the netdev comes back up.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 include/net/ip6_route.h |  1 +
 net/ipv6/addrconf.c     |  3 +++
 net/ipv6/route.c        | 29 +++++++++++++++++++++++++++++
 3 files changed, 33 insertions(+)

diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 18e442ea93d8..caad39198c2a 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -169,6 +169,7 @@ void rt6_ifdown(struct net *net, struct net_device *dev);
 void rt6_mtu_change(struct net_device *dev, unsigned int mtu);
 void rt6_remove_prefsrc(struct inet6_ifaddr *ifp);
 void rt6_clean_tohost(struct net *net, struct in6_addr *gateway);
+void rt6_sync_up(struct net_device *dev, unsigned int nh_flags);
 
 static inline const struct rt6_info *skb_rt6_info(const struct sk_buff *skb)
 {
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index ed06b1190f05..b6405568ed7b 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3484,6 +3484,9 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
 			if (run_pending)
 				addrconf_dad_run(idev);
 
+			/* Device has an address by now */
+			rt6_sync_up(dev, RTNH_F_DEAD);
+
 			/*
 			 * If the MTU changed during the interface down,
 			 * when the interface up, the changed MTU must be
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index f5eda0aeab55..4796d87e0b93 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3459,6 +3459,35 @@ void rt6_clean_tohost(struct net *net, struct in6_addr *gateway)
 	fib6_clean_all(net, fib6_clean_tohost, gateway);
 }
 
+struct arg_netdev_event {
+	const struct net_device *dev;
+	unsigned int nh_flags;
+};
+
+static int fib6_ifup(struct rt6_info *rt, void *p_arg)
+{
+	const struct arg_netdev_event *arg = p_arg;
+	const struct net *net = dev_net(arg->dev);
+
+	if (rt != net->ipv6.ip6_null_entry && rt->dst.dev == arg->dev)
+		rt->rt6i_nh_flags &= ~arg->nh_flags;
+
+	return 0;
+}
+
+void rt6_sync_up(struct net_device *dev, unsigned int nh_flags)
+{
+	struct arg_netdev_event arg = {
+		.dev = dev,
+		.nh_flags = nh_flags,
+	};
+
+	if (nh_flags & RTNH_F_DEAD && netif_carrier_ok(dev))
+		arg.nh_flags |= RTNH_F_LINKDOWN;
+
+	fib6_clean_all(dev_net(dev), fib6_ifup, &arg);
+}
+
 struct arg_dev_net {
 	struct net_device *dev;
 	struct net *net;
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH net-next 04/18] ipv6: Prepare to handle multiple netdev events
  2018-01-07 10:45 [PATCH net-next 00/18] ipv6: Align nexthop behaviour with IPv4 Ido Schimmel
                   ` (2 preceding siblings ...)
  2018-01-07 10:45 ` [PATCH net-next 03/18] ipv6: Clear nexthop flags upon netdev up Ido Schimmel
@ 2018-01-07 10:45 ` Ido Schimmel
  2018-01-07 16:55   ` David Ahern
  2018-01-07 10:45 ` [PATCH net-next 05/18] ipv6: Set nexthop flags upon carrier change Ido Schimmel
                   ` (15 subsequent siblings)
  19 siblings, 1 reply; 41+ messages in thread
From: Ido Schimmel @ 2018-01-07 10:45 UTC (permalink / raw)
  To: netdev
  Cc: davem, dsahern, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji,
	mlxsw, Ido Schimmel

To make IPv6 more in line with IPv4 we need to be able to respond
differently to different netdev events. For example, when a netdev is
unregistered all the routes using it as their nexthop device should be
flushed, whereas when the netdev's carrier changes only the 'linkdown'
flag should be toggled.

Currently, this is not possible, as the function that traverses the
routing tables is not aware of the triggering event.

Propagate the triggering event down, so that it could be used in later
patches.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 include/net/ip6_route.h |  2 +-
 net/ipv6/addrconf.c     |  4 ++--
 net/ipv6/route.c        | 37 +++++++++++++++++++++----------------
 3 files changed, 24 insertions(+), 19 deletions(-)

diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index caad39198c2a..6a2f80cbdf65 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -165,11 +165,11 @@ struct rt6_rtnl_dump_arg {
 };
 
 int rt6_dump_route(struct rt6_info *rt, void *p_arg);
-void rt6_ifdown(struct net *net, struct net_device *dev);
 void rt6_mtu_change(struct net_device *dev, unsigned int mtu);
 void rt6_remove_prefsrc(struct inet6_ifaddr *ifp);
 void rt6_clean_tohost(struct net *net, struct in6_addr *gateway);
 void rt6_sync_up(struct net_device *dev, unsigned int nh_flags);
+void rt6_disable_ip(struct net_device *dev, unsigned long event);
 
 static inline const struct rt6_info *skb_rt6_info(const struct sk_buff *skb)
 {
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index b6405568ed7b..a13e1ffe87ec 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3580,6 +3580,7 @@ static bool addr_is_local(const struct in6_addr *addr)
 
 static int addrconf_ifdown(struct net_device *dev, int how)
 {
+	unsigned long event = how ? NETDEV_UNREGISTER : NETDEV_DOWN;
 	struct net *net = dev_net(dev);
 	struct inet6_dev *idev;
 	struct inet6_ifaddr *ifa, *tmp;
@@ -3589,8 +3590,7 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 
 	ASSERT_RTNL();
 
-	rt6_ifdown(net, dev);
-	neigh_ifdown(&nd_tbl, dev);
+	rt6_disable_ip(dev, event);
 
 	idev = __in6_dev_get(dev);
 	if (!idev)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 4796d87e0b93..194fe9d9cd85 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2344,7 +2344,7 @@ struct dst_entry *icmp6_dst_alloc(struct net_device *dev,
 	rt->rt6i_idev     = idev;
 	dst_metric_set(&rt->dst, RTAX_HOPLIMIT, 0);
 
-	/* Add this dst into uncached_list so that rt6_ifdown() can
+	/* Add this dst into uncached_list so that rt6_disable_ip() can
 	 * do proper release of the net_device
 	 */
 	rt6_uncached_list_add(rt);
@@ -3461,7 +3461,10 @@ void rt6_clean_tohost(struct net *net, struct in6_addr *gateway)
 
 struct arg_netdev_event {
 	const struct net_device *dev;
-	unsigned int nh_flags;
+	union {
+		unsigned int nh_flags;
+		unsigned long event;
+	};
 };
 
 static int fib6_ifup(struct rt6_info *rt, void *p_arg)
@@ -3488,19 +3491,15 @@ void rt6_sync_up(struct net_device *dev, unsigned int nh_flags)
 	fib6_clean_all(dev_net(dev), fib6_ifup, &arg);
 }
 
-struct arg_dev_net {
-	struct net_device *dev;
-	struct net *net;
-};
-
 /* called with write lock held for table with rt */
-static int fib6_ifdown(struct rt6_info *rt, void *arg)
+static int fib6_ifdown(struct rt6_info *rt, void *p_arg)
 {
-	const struct arg_dev_net *adn = arg;
-	const struct net_device *dev = adn->dev;
+	const struct arg_netdev_event *arg = p_arg;
+	const struct net_device *dev = arg->dev;
+	const struct net *net = dev_net(dev);
 
 	if (rt->dst.dev == dev &&
-	    rt != adn->net->ipv6.ip6_null_entry &&
+	    rt != net->ipv6.ip6_null_entry &&
 	    (rt->rt6i_nsiblings == 0 || netdev_unregistering(dev) ||
 	     !rt->rt6i_idev->cnf.ignore_routes_with_linkdown)) {
 		rt->rt6i_nh_flags |= (RTNH_F_DEAD | RTNH_F_LINKDOWN);
@@ -3510,15 +3509,21 @@ static int fib6_ifdown(struct rt6_info *rt, void *arg)
 	return 0;
 }
 
-void rt6_ifdown(struct net *net, struct net_device *dev)
+static void rt6_sync_down_dev(struct net_device *dev, unsigned long event)
 {
-	struct arg_dev_net adn = {
+	struct arg_netdev_event arg = {
 		.dev = dev,
-		.net = net,
+		.event = event,
 	};
 
-	fib6_clean_all(net, fib6_ifdown, &adn);
-	rt6_uncached_list_flush_dev(net, dev);
+	fib6_clean_all(dev_net(dev), fib6_ifdown, &arg);
+}
+
+void rt6_disable_ip(struct net_device *dev, unsigned long event)
+{
+	rt6_sync_down_dev(dev, event);
+	rt6_uncached_list_flush_dev(dev_net(dev), dev);
+	neigh_ifdown(&nd_tbl, dev);
 }
 
 struct rt6_mtu_change_arg {
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH net-next 05/18] ipv6: Set nexthop flags upon carrier change
  2018-01-07 10:45 [PATCH net-next 00/18] ipv6: Align nexthop behaviour with IPv4 Ido Schimmel
                   ` (3 preceding siblings ...)
  2018-01-07 10:45 ` [PATCH net-next 04/18] ipv6: Prepare to handle multiple netdev events Ido Schimmel
@ 2018-01-07 10:45 ` Ido Schimmel
  2018-01-07 16:56   ` David Ahern
  2018-01-07 10:45 ` [PATCH net-next 06/18] ipv6: Set nexthop flags during route creation Ido Schimmel
                   ` (14 subsequent siblings)
  19 siblings, 1 reply; 41+ messages in thread
From: Ido Schimmel @ 2018-01-07 10:45 UTC (permalink / raw)
  To: netdev
  Cc: davem, dsahern, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji,
	mlxsw, Ido Schimmel

Similar to IPv4, when the carrier of a netdev changes we should toggle
the 'linkdown' flag on all the nexthops using it as their nexthop
device.

This will later allow us to test for the presence of this flag during
route lookup and dump.

Up until commit 4832c30d5458 ("net: ipv6: put host and anycast routes on
device with address") host and anycast routes used the loopback netdev
as their nexthop device and thus were not marked with the 'linkdown'
flag. The patch preserves this behavior and allows one to ping the local
address even when the nexthop device does not have a carrier and the
'ignore_routes_with_linkdown' sysctl is set.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 include/net/ip6_route.h |  1 +
 net/ipv6/addrconf.c     |  2 ++
 net/ipv6/route.c        | 23 +++++++++++++++++------
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 6a2f80cbdf65..34cd3b0c6ded 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -170,6 +170,7 @@ void rt6_remove_prefsrc(struct inet6_ifaddr *ifp);
 void rt6_clean_tohost(struct net *net, struct in6_addr *gateway);
 void rt6_sync_up(struct net_device *dev, unsigned int nh_flags);
 void rt6_disable_ip(struct net_device *dev, unsigned long event);
+void rt6_sync_down_dev(struct net_device *dev, unsigned long event);
 
 static inline const struct rt6_info *skb_rt6_info(const struct sk_buff *skb)
 {
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index a13e1ffe87ec..2435f7ab070b 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3438,6 +3438,7 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
 		} else if (event == NETDEV_CHANGE) {
 			if (!addrconf_link_ready(dev)) {
 				/* device is still not ready. */
+				rt6_sync_down_dev(dev, event);
 				break;
 			}
 
@@ -3449,6 +3450,7 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
 					 * multicast snooping switches
 					 */
 					ipv6_mc_up(idev);
+					rt6_sync_up(dev, RTNH_F_LINKDOWN);
 					break;
 				}
 				idev->if_flags |= IF_READY;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 194fe9d9cd85..2fd36c7dd143 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3498,18 +3498,29 @@ static int fib6_ifdown(struct rt6_info *rt, void *p_arg)
 	const struct net_device *dev = arg->dev;
 	const struct net *net = dev_net(dev);
 
-	if (rt->dst.dev == dev &&
-	    rt != net->ipv6.ip6_null_entry &&
-	    (rt->rt6i_nsiblings == 0 || netdev_unregistering(dev) ||
-	     !rt->rt6i_idev->cnf.ignore_routes_with_linkdown)) {
-		rt->rt6i_nh_flags |= (RTNH_F_DEAD | RTNH_F_LINKDOWN);
+	if (rt->dst.dev != dev || rt == net->ipv6.ip6_null_entry)
+		return 0;
+
+	switch (arg->event) {
+	case NETDEV_UNREGISTER:
 		return -1;
+	case NETDEV_DOWN:
+		if (rt->rt6i_nsiblings == 0 ||
+		    !rt->rt6i_idev->cnf.ignore_routes_with_linkdown)
+			return -1;
+		rt->rt6i_nh_flags |= RTNH_F_DEAD;
+		/* fall through */
+	case NETDEV_CHANGE:
+		if (rt->rt6i_flags & (RTF_LOCAL | RTF_ANYCAST))
+			break;
+		rt->rt6i_nh_flags |= RTNH_F_LINKDOWN;
+		break;
 	}
 
 	return 0;
 }
 
-static void rt6_sync_down_dev(struct net_device *dev, unsigned long event)
+void rt6_sync_down_dev(struct net_device *dev, unsigned long event)
 {
 	struct arg_netdev_event arg = {
 		.dev = dev,
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH net-next 06/18] ipv6: Set nexthop flags during route creation
  2018-01-07 10:45 [PATCH net-next 00/18] ipv6: Align nexthop behaviour with IPv4 Ido Schimmel
                   ` (4 preceding siblings ...)
  2018-01-07 10:45 ` [PATCH net-next 05/18] ipv6: Set nexthop flags upon carrier change Ido Schimmel
@ 2018-01-07 10:45 ` Ido Schimmel
  2018-01-07 16:58   ` David Ahern
  2018-01-07 10:45 ` [PATCH net-next 07/18] ipv6: Check nexthop flags during route lookup instead of carrier Ido Schimmel
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 41+ messages in thread
From: Ido Schimmel @ 2018-01-07 10:45 UTC (permalink / raw)
  To: netdev
  Cc: davem, dsahern, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji,
	mlxsw, Ido Schimmel

It is valid to install routes with a nexthop device that does not have a
carrier, so we need to make sure they're marked accordingly.

As explained in the previous patch, host and anycast routes are never
marked with the 'linkdown' flag.

Note that reject routes are unaffected, as these use the loopback device
which always has a carrier.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 net/ipv6/route.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 2fd36c7dd143..314e3bf41f6f 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2746,6 +2746,9 @@ static struct rt6_info *ip6_route_info_create(struct fib6_config *cfg,
 	rt->rt6i_flags = cfg->fc_flags;
 
 install_route:
+	if (!(rt->rt6i_flags & (RTF_LOCAL | RTF_ANYCAST)) &&
+	    !netif_carrier_ok(dev))
+		rt->rt6i_nh_flags |= RTNH_F_LINKDOWN;
 	rt->dst.dev = dev;
 	rt->rt6i_idev = idev;
 	rt->rt6i_table = table;
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH net-next 07/18] ipv6: Check nexthop flags during route lookup instead of carrier
  2018-01-07 10:45 [PATCH net-next 00/18] ipv6: Align nexthop behaviour with IPv4 Ido Schimmel
                   ` (5 preceding siblings ...)
  2018-01-07 10:45 ` [PATCH net-next 06/18] ipv6: Set nexthop flags during route creation Ido Schimmel
@ 2018-01-07 10:45 ` Ido Schimmel
  2018-01-07 16:59   ` David Ahern
  2018-01-07 10:45 ` [PATCH net-next 08/18] ipv6: Check nexthop flags in route dump " Ido Schimmel
                   ` (12 subsequent siblings)
  19 siblings, 1 reply; 41+ messages in thread
From: Ido Schimmel @ 2018-01-07 10:45 UTC (permalink / raw)
  To: netdev
  Cc: davem, dsahern, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji,
	mlxsw, Ido Schimmel

Now that the RTNH_F_LINKDOWN flag is set in nexthops, we can avoid the
need to dereference the nexthop device and check its carrier and instead
check for the presence of the flag.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 net/ipv6/route.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 314e3bf41f6f..ab0eed43ed97 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -474,7 +474,7 @@ static struct rt6_info *rt6_multipath_select(struct rt6_info *match,
 			if (route_choosen == 0) {
 				struct inet6_dev *idev = sibling->rt6i_idev;
 
-				if (!netif_carrier_ok(sibling->dst.dev) &&
+				if (sibling->rt6i_nh_flags & RTNH_F_LINKDOWN &&
 				    idev->cnf.ignore_routes_with_linkdown)
 					break;
 				if (rt6_score_route(sibling, oif, strict) < 0)
@@ -679,10 +679,9 @@ static struct rt6_info *find_match(struct rt6_info *rt, int oif, int strict,
 	int m;
 	bool match_do_rr = false;
 	struct inet6_dev *idev = rt->rt6i_idev;
-	struct net_device *dev = rt->dst.dev;
 
-	if (dev && !netif_carrier_ok(dev) &&
-	    idev->cnf.ignore_routes_with_linkdown &&
+	if (idev->cnf.ignore_routes_with_linkdown &&
+	    rt->rt6i_nh_flags & RTNH_F_LINKDOWN &&
 	    !(strict & RT6_LOOKUP_F_IGNORE_LINKSTATE))
 		goto out;
 
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH net-next 08/18] ipv6: Check nexthop flags in route dump instead of carrier
  2018-01-07 10:45 [PATCH net-next 00/18] ipv6: Align nexthop behaviour with IPv4 Ido Schimmel
                   ` (6 preceding siblings ...)
  2018-01-07 10:45 ` [PATCH net-next 07/18] ipv6: Check nexthop flags during route lookup instead of carrier Ido Schimmel
@ 2018-01-07 10:45 ` Ido Schimmel
  2018-01-07 17:01   ` David Ahern
  2018-01-07 10:45 ` [PATCH net-next 09/18] ipv6: Ignore dead routes during lookup Ido Schimmel
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 41+ messages in thread
From: Ido Schimmel @ 2018-01-07 10:45 UTC (permalink / raw)
  To: netdev
  Cc: davem, dsahern, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji,
	mlxsw, Ido Schimmel

Similar to previous patch, there is no need to check for the carrier of
the nexthop device when dumping the route and we can instead check for
the presence of the RTNH_F_LINKDOWN flag.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 net/ipv6/route.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index ab0eed43ed97..f980f904d6ea 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -4039,7 +4039,7 @@ static size_t rt6_nlmsg_size(struct rt6_info *rt)
 static int rt6_nexthop_info(struct sk_buff *skb, struct rt6_info *rt,
 			    unsigned int *flags, bool skip_oif)
 {
-	if (!netif_running(rt->dst.dev) || !netif_carrier_ok(rt->dst.dev)) {
+	if (rt->rt6i_nh_flags & RTNH_F_LINKDOWN) {
 		*flags |= RTNH_F_LINKDOWN;
 		if (rt->rt6i_idev->cnf.ignore_routes_with_linkdown)
 			*flags |= RTNH_F_DEAD;
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH net-next 09/18] ipv6: Ignore dead routes during lookup
  2018-01-07 10:45 [PATCH net-next 00/18] ipv6: Align nexthop behaviour with IPv4 Ido Schimmel
                   ` (7 preceding siblings ...)
  2018-01-07 10:45 ` [PATCH net-next 08/18] ipv6: Check nexthop flags in route dump " Ido Schimmel
@ 2018-01-07 10:45 ` Ido Schimmel
  2018-01-07 17:02   ` David Ahern
  2018-01-07 10:45 ` [PATCH net-next 10/18] ipv6: Report dead flag during route dump Ido Schimmel
                   ` (10 subsequent siblings)
  19 siblings, 1 reply; 41+ messages in thread
From: Ido Schimmel @ 2018-01-07 10:45 UTC (permalink / raw)
  To: netdev
  Cc: davem, dsahern, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji,
	mlxsw, Ido Schimmel

Currently, dead routes are only present in the routing tables in case
the 'ignore_routes_with_linkdown' sysctl is set. Otherwise, they are
flushed.

Subsequent patches are going to remove the reliance on this sysctl and
make IPv6 more consistent with IPv4.

Before this is done, we need to make sure dead routes are skipped during
route lookup, so as to not cause packet loss.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 net/ipv6/route.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index f980f904d6ea..c00156805bf0 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -474,6 +474,8 @@ static struct rt6_info *rt6_multipath_select(struct rt6_info *match,
 			if (route_choosen == 0) {
 				struct inet6_dev *idev = sibling->rt6i_idev;
 
+				if (sibling->rt6i_nh_flags & RTNH_F_DEAD)
+					break;
 				if (sibling->rt6i_nh_flags & RTNH_F_LINKDOWN &&
 				    idev->cnf.ignore_routes_with_linkdown)
 					break;
@@ -499,12 +501,15 @@ static inline struct rt6_info *rt6_device_match(struct net *net,
 	struct rt6_info *local = NULL;
 	struct rt6_info *sprt;
 
-	if (!oif && ipv6_addr_any(saddr))
-		goto out;
+	if (!oif && ipv6_addr_any(saddr) && !(rt->rt6i_nh_flags & RTNH_F_DEAD))
+		return rt;
 
 	for (sprt = rt; sprt; sprt = rcu_dereference(sprt->rt6_next)) {
 		struct net_device *dev = sprt->dst.dev;
 
+		if (sprt->rt6i_nh_flags & RTNH_F_DEAD)
+			continue;
+
 		if (oif) {
 			if (dev->ifindex == oif)
 				return sprt;
@@ -533,8 +538,8 @@ static inline struct rt6_info *rt6_device_match(struct net *net,
 		if (flags & RT6_LOOKUP_F_IFACE)
 			return net->ipv6.ip6_null_entry;
 	}
-out:
-	return rt;
+
+	return rt->rt6i_nh_flags & RTNH_F_DEAD ? net->ipv6.ip6_null_entry : rt;
 }
 
 #ifdef CONFIG_IPV6_ROUTER_PREF
@@ -680,6 +685,9 @@ static struct rt6_info *find_match(struct rt6_info *rt, int oif, int strict,
 	bool match_do_rr = false;
 	struct inet6_dev *idev = rt->rt6i_idev;
 
+	if (rt->rt6i_nh_flags & RTNH_F_DEAD)
+		goto out;
+
 	if (idev->cnf.ignore_routes_with_linkdown &&
 	    rt->rt6i_nh_flags & RTNH_F_LINKDOWN &&
 	    !(strict & RT6_LOOKUP_F_IGNORE_LINKSTATE))
@@ -2153,6 +2161,8 @@ static struct rt6_info *__ip6_route_redirect(struct net *net,
 	fn = fib6_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
 restart:
 	for_each_fib6_node_rt_rcu(fn) {
+		if (rt->rt6i_nh_flags & RTNH_F_DEAD)
+			continue;
 		if (rt6_check_expired(rt))
 			continue;
 		if (rt->dst.error)
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH net-next 10/18] ipv6: Report dead flag during route dump
  2018-01-07 10:45 [PATCH net-next 00/18] ipv6: Align nexthop behaviour with IPv4 Ido Schimmel
                   ` (8 preceding siblings ...)
  2018-01-07 10:45 ` [PATCH net-next 09/18] ipv6: Ignore dead routes during lookup Ido Schimmel
@ 2018-01-07 10:45 ` Ido Schimmel
  2018-01-07 17:02   ` David Ahern
  2018-01-07 10:45 ` [PATCH net-next 11/18] ipv6: Add explicit flush indication to routes Ido Schimmel
                   ` (9 subsequent siblings)
  19 siblings, 1 reply; 41+ messages in thread
From: Ido Schimmel @ 2018-01-07 10:45 UTC (permalink / raw)
  To: netdev
  Cc: davem, dsahern, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji,
	mlxsw, Ido Schimmel

Up until now the RTNH_F_DEAD flag was only reported in route dump when
the 'ignore_routes_with_linkdown' sysctl was set. This is expected as
dead routes were flushed otherwise.

The reliance on this sysctl is going to be removed, so we need to report
the flag regardless of the sysctl's value.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 net/ipv6/route.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index c00156805bf0..f62d24948aa2 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -4049,6 +4049,9 @@ static size_t rt6_nlmsg_size(struct rt6_info *rt)
 static int rt6_nexthop_info(struct sk_buff *skb, struct rt6_info *rt,
 			    unsigned int *flags, bool skip_oif)
 {
+	if (rt->rt6i_nh_flags & RTNH_F_DEAD)
+		*flags |= RTNH_F_DEAD;
+
 	if (rt->rt6i_nh_flags & RTNH_F_LINKDOWN) {
 		*flags |= RTNH_F_LINKDOWN;
 		if (rt->rt6i_idev->cnf.ignore_routes_with_linkdown)
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH net-next 11/18] ipv6: Add explicit flush indication to routes
  2018-01-07 10:45 [PATCH net-next 00/18] ipv6: Align nexthop behaviour with IPv4 Ido Schimmel
                   ` (9 preceding siblings ...)
  2018-01-07 10:45 ` [PATCH net-next 10/18] ipv6: Report dead flag during route dump Ido Schimmel
@ 2018-01-07 10:45 ` Ido Schimmel
  2018-01-07 17:03   ` David Ahern
  2018-01-07 10:45 ` [PATCH net-next 12/18] ipv6: Teach tree walker to skip multipath routes Ido Schimmel
                   ` (8 subsequent siblings)
  19 siblings, 1 reply; 41+ messages in thread
From: Ido Schimmel @ 2018-01-07 10:45 UTC (permalink / raw)
  To: netdev
  Cc: davem, dsahern, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji,
	mlxsw, Ido Schimmel

When routes that are a part of a multipath route are evaluated by
fib6_ifdown() in response to NETDEV_DOWN and NETDEV_UNREGISTER events
the state of their sibling routes is not considered.

This will change in subsequent patches in order to align IPv6 with
IPv4's behavior. For example, when the last sibling in a multipath route
becomes dead, the entire multipath route needs to be removed.

To prevent the tree walker from re-evaluating all the sibling routes
each time, we can simply evaluate them once - when the first sibling is
traversed.

If we determine the entire multipath route needs to be removed, then the
'should_flush' bit is set in all the siblings, which will cause the
walker to flush them when it traverses them.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 include/net/ip6_fib.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 44d96a91e745..affea1aa6ae4 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -173,7 +173,8 @@ struct rt6_info {
 	unsigned short			rt6i_nfheader_len;
 	u8				rt6i_protocol;
 	u8				exception_bucket_flushed:1,
-					unused:7;
+					should_flush:1,
+					unused:6;
 };
 
 #define for_each_fib6_node_rt_rcu(fn)					\
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH net-next 12/18] ipv6: Teach tree walker to skip multipath routes
  2018-01-07 10:45 [PATCH net-next 00/18] ipv6: Align nexthop behaviour with IPv4 Ido Schimmel
                   ` (10 preceding siblings ...)
  2018-01-07 10:45 ` [PATCH net-next 11/18] ipv6: Add explicit flush indication to routes Ido Schimmel
@ 2018-01-07 10:45 ` Ido Schimmel
  2018-01-07 17:04   ` David Ahern
  2018-01-07 10:45 ` [PATCH net-next 13/18] ipv6: Export sernum update function Ido Schimmel
                   ` (7 subsequent siblings)
  19 siblings, 1 reply; 41+ messages in thread
From: Ido Schimmel @ 2018-01-07 10:45 UTC (permalink / raw)
  To: netdev
  Cc: davem, dsahern, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji,
	mlxsw, Ido Schimmel

As explained in previous patch, fib6_ifdown() needs to consider the
state of all the sibling routes when a multipath route is traversed.

This is done by evaluating all the siblings when the first sibling in a
multipath route is traversed. If the multipath route does not need to be
flushed (e.g., not all siblings are dead), then we should just skip the
multipath route as our work is done.

Have the tree walker jump to the last sibling when it is determined that
the multipath route needs to be skipped.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 net/ipv6/ip6_fib.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 3bbb89d8187d..5e4b5eef0ddd 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1887,7 +1887,7 @@ static int fib6_clean_node(struct fib6_walker *w)
 
 	for_each_fib6_walker_rt(w) {
 		res = c->func(rt, c->arg);
-		if (res < 0) {
+		if (res == -1) {
 			w->leaf = rt;
 			res = fib6_del(rt, &info);
 			if (res) {
@@ -1900,6 +1900,12 @@ static int fib6_clean_node(struct fib6_walker *w)
 				continue;
 			}
 			return 0;
+		} else if (res == -2) {
+			if (WARN_ON(!rt->rt6i_nsiblings))
+				continue;
+			rt = list_last_entry(&rt->rt6i_siblings,
+					     struct rt6_info, rt6i_siblings);
+			continue;
 		}
 		WARN_ON(res != 0);
 	}
@@ -1911,7 +1917,8 @@ static int fib6_clean_node(struct fib6_walker *w)
  *	Convenient frontend to tree walker.
  *
  *	func is called on each route.
- *		It may return -1 -> delete this route.
+ *		It may return -2 -> skip multipath route.
+ *			      -1 -> delete this route.
  *		              0  -> continue walking
  */
 
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH net-next 13/18] ipv6: Export sernum update function
  2018-01-07 10:45 [PATCH net-next 00/18] ipv6: Align nexthop behaviour with IPv4 Ido Schimmel
                   ` (11 preceding siblings ...)
  2018-01-07 10:45 ` [PATCH net-next 12/18] ipv6: Teach tree walker to skip multipath routes Ido Schimmel
@ 2018-01-07 10:45 ` Ido Schimmel
  2018-01-07 17:04   ` David Ahern
  2018-01-07 10:45 ` [PATCH net-next 14/18] ipv6: Take table lock outside of " Ido Schimmel
                   ` (6 subsequent siblings)
  19 siblings, 1 reply; 41+ messages in thread
From: Ido Schimmel @ 2018-01-07 10:45 UTC (permalink / raw)
  To: netdev
  Cc: davem, dsahern, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji,
	mlxsw, Ido Schimmel

We are going to allow dead routes to stay in the FIB tree (e.g., when
they are part of a multipath route, directly connected route with no
carrier) and revive them when their nexthop device gains carrier or when
it is put administratively up.

This is equivalent to the addition of the route to the FIB tree and we
should therefore take care of updating the sernum of all the parent
nodes of the node where the route is stored. Otherwise, we risk sockets
caching and using sub-optimal dst entries.

Export the function that performs the above, so that it could be invoked
from fib6_ifup() later on.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 include/net/ip6_fib.h |  1 +
 net/ipv6/ip6_fib.c    | 11 ++++++++---
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index affea1aa6ae4..ddf53dd1e948 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -405,6 +405,7 @@ unsigned int fib6_tables_seq_read(struct net *net);
 int fib6_tables_dump(struct net *net, struct notifier_block *nb);
 
 void fib6_update_sernum(struct rt6_info *rt);
+void fib6_update_sernum_upto_root(struct net *net, struct rt6_info *rt);
 
 #ifdef CONFIG_IPV6_MULTIPLE_TABLES
 int fib6_rules_init(void);
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 5e4b5eef0ddd..c1bbe7bf9fdd 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1102,8 +1102,8 @@ void fib6_force_start_gc(struct net *net)
 			  jiffies + net->ipv6.sysctl.ip6_rt_gc_interval);
 }
 
-static void fib6_update_sernum_upto_root(struct rt6_info *rt,
-					 int sernum)
+static void __fib6_update_sernum_upto_root(struct rt6_info *rt,
+					   int sernum)
 {
 	struct fib6_node *fn = rcu_dereference_protected(rt->rt6i_node,
 				lockdep_is_held(&rt->rt6i_table->tb6_lock));
@@ -1117,6 +1117,11 @@ static void fib6_update_sernum_upto_root(struct rt6_info *rt,
 	}
 }
 
+void fib6_update_sernum_upto_root(struct net *net, struct rt6_info *rt)
+{
+	__fib6_update_sernum_upto_root(rt, fib6_new_sernum(net));
+}
+
 /*
  *	Add routing information to the routing tree.
  *	<destination addr>/<source addr>
@@ -1230,7 +1235,7 @@ int fib6_add(struct fib6_node *root, struct rt6_info *rt,
 
 	err = fib6_add_rt2node(fn, rt, info, mxc, extack);
 	if (!err) {
-		fib6_update_sernum_upto_root(rt, sernum);
+		__fib6_update_sernum_upto_root(rt, sernum);
 		fib6_start_gc(info->nl_net, rt);
 	}
 
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH net-next 14/18] ipv6: Take table lock outside of sernum update function
  2018-01-07 10:45 [PATCH net-next 00/18] ipv6: Align nexthop behaviour with IPv4 Ido Schimmel
                   ` (12 preceding siblings ...)
  2018-01-07 10:45 ` [PATCH net-next 13/18] ipv6: Export sernum update function Ido Schimmel
@ 2018-01-07 10:45 ` Ido Schimmel
  2018-01-07 17:05   ` David Ahern
  2018-01-07 10:45 ` [PATCH net-next 15/18] ipv6: Flush multipath routes when all siblings are dead Ido Schimmel
                   ` (5 subsequent siblings)
  19 siblings, 1 reply; 41+ messages in thread
From: Ido Schimmel @ 2018-01-07 10:45 UTC (permalink / raw)
  To: netdev
  Cc: davem, dsahern, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji,
	mlxsw, Ido Schimmel

The next patch is going to allow dead routes to remain in the FIB tree
in certain situations.

When this happens we need to be sure to bump the sernum of the nodes
where these are stored so that potential copies cached in sockets are
invalidated.

The function that performs this update assumes the table lock is not
taken when it is invoked, but that will not be the case when it is
invoked by the tree walker.

Have the function assume the lock is taken and make the single caller
take the lock itself.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 net/ipv6/ip6_fib.c | 5 +----
 net/ipv6/route.c   | 2 ++
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index c1bbe7bf9fdd..edda5ad3b405 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -107,16 +107,13 @@ enum {
 
 void fib6_update_sernum(struct rt6_info *rt)
 {
-	struct fib6_table *table = rt->rt6i_table;
 	struct net *net = dev_net(rt->dst.dev);
 	struct fib6_node *fn;
 
-	spin_lock_bh(&table->tb6_lock);
 	fn = rcu_dereference_protected(rt->rt6i_node,
-			lockdep_is_held(&table->tb6_lock));
+			lockdep_is_held(&rt->rt6i_table->tb6_lock));
 	if (fn)
 		fn->fn_sernum = fib6_new_sernum(net);
-	spin_unlock_bh(&table->tb6_lock);
 }
 
 /*
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index f62d24948aa2..a3bfce71c861 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1353,7 +1353,9 @@ static int rt6_insert_exception(struct rt6_info *nrt,
 
 	/* Update fn->fn_sernum to invalidate all cached dst */
 	if (!err) {
+		spin_lock_bh(&ort->rt6i_table->tb6_lock);
 		fib6_update_sernum(ort);
+		spin_unlock_bh(&ort->rt6i_table->tb6_lock);
 		fib6_force_start_gc(net);
 	}
 
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH net-next 15/18] ipv6: Flush multipath routes when all siblings are dead
  2018-01-07 10:45 [PATCH net-next 00/18] ipv6: Align nexthop behaviour with IPv4 Ido Schimmel
                   ` (13 preceding siblings ...)
  2018-01-07 10:45 ` [PATCH net-next 14/18] ipv6: Take table lock outside of " Ido Schimmel
@ 2018-01-07 10:45 ` Ido Schimmel
  2018-01-07 17:08   ` David Ahern
  2018-01-07 10:45 ` [PATCH net-next 16/18] selftests: fib_tests: Add test cases for IPv4/IPv6 FIB Ido Schimmel
                   ` (4 subsequent siblings)
  19 siblings, 1 reply; 41+ messages in thread
From: Ido Schimmel @ 2018-01-07 10:45 UTC (permalink / raw)
  To: netdev
  Cc: davem, dsahern, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji,
	mlxsw, Ido Schimmel

By default, IPv6 deletes nexthops from a multipath route when the
nexthop device is put administratively down. This differs from IPv4
where the nexthops are kept, but marked with the RTNH_F_DEAD flag. A
multipath route is flushed when all of its nexthops become dead.

Align IPv6 with IPv4 and have it conform to the same guidelines.

In case the multipath route needs to be flushed, its siblings are
flushed one by one. Otherwise, the nexthops are marked with the
appropriate flags and the tree walker is instructed to skip all the
siblings.

As explained in previous patches, care is taken to update the sernum of
the affected tree nodes, so as to prevent the use of wrong dst entries.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 net/ipv6/route.c | 83 ++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 75 insertions(+), 8 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index a3bfce71c861..1054b059747f 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3486,8 +3486,10 @@ static int fib6_ifup(struct rt6_info *rt, void *p_arg)
 	const struct arg_netdev_event *arg = p_arg;
 	const struct net *net = dev_net(arg->dev);
 
-	if (rt != net->ipv6.ip6_null_entry && rt->dst.dev == arg->dev)
+	if (rt != net->ipv6.ip6_null_entry && rt->dst.dev == arg->dev) {
 		rt->rt6i_nh_flags &= ~arg->nh_flags;
+		fib6_update_sernum_upto_root(dev_net(rt->dst.dev), rt);
+	}
 
 	return 0;
 }
@@ -3505,6 +3507,58 @@ void rt6_sync_up(struct net_device *dev, unsigned int nh_flags)
 	fib6_clean_all(dev_net(dev), fib6_ifup, &arg);
 }
 
+static bool rt6_multipath_uses_dev(const struct rt6_info *rt,
+				   const struct net_device *dev)
+{
+	struct rt6_info *iter;
+
+	if (rt->dst.dev == dev)
+		return true;
+	list_for_each_entry(iter, &rt->rt6i_siblings, rt6i_siblings)
+		if (iter->dst.dev == dev)
+			return true;
+
+	return false;
+}
+
+static void rt6_multipath_flush(struct rt6_info *rt)
+{
+	struct rt6_info *iter;
+
+	rt->should_flush = 1;
+	list_for_each_entry(iter, &rt->rt6i_siblings, rt6i_siblings)
+		iter->should_flush = 1;
+}
+
+static unsigned int rt6_multipath_dead_count(const struct rt6_info *rt,
+					     const struct net_device *down_dev)
+{
+	struct rt6_info *iter;
+	unsigned int dead = 0;
+
+	if (rt->dst.dev == down_dev || rt->rt6i_nh_flags & RTNH_F_DEAD)
+		dead++;
+	list_for_each_entry(iter, &rt->rt6i_siblings, rt6i_siblings)
+		if (iter->dst.dev == down_dev ||
+		    iter->rt6i_nh_flags & RTNH_F_DEAD)
+			dead++;
+
+	return dead;
+}
+
+static void rt6_multipath_nh_flags_set(struct rt6_info *rt,
+				       const struct net_device *dev,
+				       unsigned int nh_flags)
+{
+	struct rt6_info *iter;
+
+	if (rt->dst.dev == dev)
+		rt->rt6i_nh_flags |= nh_flags;
+	list_for_each_entry(iter, &rt->rt6i_siblings, rt6i_siblings)
+		if (iter->dst.dev == dev)
+			iter->rt6i_nh_flags |= nh_flags;
+}
+
 /* called with write lock held for table with rt */
 static int fib6_ifdown(struct rt6_info *rt, void *p_arg)
 {
@@ -3512,20 +3566,33 @@ static int fib6_ifdown(struct rt6_info *rt, void *p_arg)
 	const struct net_device *dev = arg->dev;
 	const struct net *net = dev_net(dev);
 
-	if (rt->dst.dev != dev || rt == net->ipv6.ip6_null_entry)
+	if (rt == net->ipv6.ip6_null_entry)
 		return 0;
 
 	switch (arg->event) {
 	case NETDEV_UNREGISTER:
-		return -1;
+		return rt->dst.dev == dev ? -1 : 0;
 	case NETDEV_DOWN:
-		if (rt->rt6i_nsiblings == 0 ||
-		    !rt->rt6i_idev->cnf.ignore_routes_with_linkdown)
+		if (rt->should_flush)
 			return -1;
-		rt->rt6i_nh_flags |= RTNH_F_DEAD;
-		/* fall through */
+		if (!rt->rt6i_nsiblings)
+			return rt->dst.dev == dev ? -1 : 0;
+		if (rt6_multipath_uses_dev(rt, dev)) {
+			unsigned int count;
+
+			count = rt6_multipath_dead_count(rt, dev);
+			if (rt->rt6i_nsiblings + 1 == count) {
+				rt6_multipath_flush(rt);
+				return -1;
+			}
+			rt6_multipath_nh_flags_set(rt, dev, RTNH_F_DEAD |
+						   RTNH_F_LINKDOWN);
+			fib6_update_sernum(rt);
+		}
+		return -2;
 	case NETDEV_CHANGE:
-		if (rt->rt6i_flags & (RTF_LOCAL | RTF_ANYCAST))
+		if (rt->dst.dev != dev ||
+		    rt->rt6i_flags & (RTF_LOCAL | RTF_ANYCAST))
 			break;
 		rt->rt6i_nh_flags |= RTNH_F_LINKDOWN;
 		break;
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH net-next 16/18] selftests: fib_tests: Add test cases for IPv4/IPv6 FIB
  2018-01-07 10:45 [PATCH net-next 00/18] ipv6: Align nexthop behaviour with IPv4 Ido Schimmel
                   ` (14 preceding siblings ...)
  2018-01-07 10:45 ` [PATCH net-next 15/18] ipv6: Flush multipath routes when all siblings are dead Ido Schimmel
@ 2018-01-07 10:45 ` Ido Schimmel
  2018-01-07 17:15   ` David Ahern
  2018-01-07 10:45 ` [PATCH net-next 17/18] selftests: fib_tests: Add test cases for netdev down Ido Schimmel
                   ` (3 subsequent siblings)
  19 siblings, 1 reply; 41+ messages in thread
From: Ido Schimmel @ 2018-01-07 10:45 UTC (permalink / raw)
  To: netdev
  Cc: davem, dsahern, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji,
	mlxsw, Ido Schimmel

Add test cases to check that IPv4 and IPv6 react to a netdev being
unregistered as expected.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 tools/testing/selftests/net/Makefile     |   1 +
 tools/testing/selftests/net/fib_tests.sh | 146 +++++++++++++++++++++++++++++++
 2 files changed, 147 insertions(+)
 create mode 100755 tools/testing/selftests/net/fib_tests.sh

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 500c74db746c..d7c30d366935 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -5,6 +5,7 @@ CFLAGS =  -Wall -Wl,--no-as-needed -O2 -g
 CFLAGS += -I../../../../usr/include/
 
 TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh netdevice.sh rtnetlink.sh
+TEST_PROGS += fib_tests.sh
 TEST_GEN_FILES =  socket
 TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy
 TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh
new file mode 100755
index 000000000000..767d2ab2385d
--- /dev/null
+++ b/tools/testing/selftests/net/fib_tests.sh
@@ -0,0 +1,146 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# This test is for checking IPv4 and IPv6 FIB behavior in response to
+# different events.
+
+ret=0
+
+check_err()
+{
+	if [ $ret -eq 0 ]; then
+		ret=$1
+	fi
+}
+
+check_fail()
+{
+	if [ $1 -eq 0 ]; then
+		ret=1
+	fi
+}
+
+netns_create()
+{
+	local testns=$1
+
+	ip netns add $testns
+	ip netns exec $testns ip link set dev lo up
+}
+
+fib_unreg_unicast_test()
+{
+	ret=0
+
+	netns_create "testns"
+
+	ip netns exec testns ip link add dummy0 type dummy
+	ip netns exec testns ip link set dev dummy0 up
+
+	ip netns exec testns ip address add 198.51.100.1/24 dev dummy0
+	ip netns exec testns ip -6 address add 2001:db8:1::1/64 dev dummy0
+
+	ip netns exec testns ip route get fibmatch 198.51.100.2 &> /dev/null
+	check_err $?
+	ip netns exec testns ip -6 route get fibmatch 2001:db8:1::2 &> /dev/null
+	check_err $?
+
+	ip netns exec testns ip link del dev dummy0
+	check_err $?
+
+	ip netns exec testns ip route get fibmatch 198.51.100.2 &> /dev/null
+	check_fail $?
+	ip netns exec testns ip -6 route get fibmatch 2001:db8:1::2 &> /dev/null
+	check_fail $?
+
+	ip netns del testns
+
+	if [ $ret -ne 0 ]; then
+		echo "FAIL: unicast route test"
+		return 1
+	fi
+	echo "PASS: unicast route test"
+}
+
+fib_unreg_multipath_test()
+{
+	ret=0
+
+	netns_create "testns"
+
+	ip netns exec testns ip link add dummy0 type dummy
+	ip netns exec testns ip link set dev dummy0 up
+
+	ip netns exec testns ip link add dummy1 type dummy
+	ip netns exec testns ip link set dev dummy1 up
+
+	ip netns exec testns ip address add 198.51.100.1/24 dev dummy0
+	ip netns exec testns ip -6 address add 2001:db8:1::1/64 dev dummy0
+
+	ip netns exec testns ip address add 192.0.2.1/24 dev dummy1
+	ip netns exec testns ip -6 address add 2001:db8:2::1/64 dev dummy1
+
+	ip netns exec testns ip route add 203.0.113.0/24 \
+		nexthop via 198.51.100.2 dev dummy0 \
+		nexthop via 192.0.2.2 dev dummy1
+	ip netns exec testns ip -6 route add 2001:db8:3::/64 \
+		nexthop via 2001:db8:1::2 dev dummy0 \
+		nexthop via 2001:db8:2::2 dev dummy1
+
+	ip netns exec testns ip route get fibmatch 203.0.113.1 &> /dev/null
+	check_err $?
+	ip netns exec testns ip -6 route get fibmatch 2001:db8:3::1 &> /dev/null
+	check_err $?
+
+	ip netns exec testns ip link del dev dummy0
+	check_err $?
+
+	ip netns exec testns ip route get fibmatch 203.0.113.1 &> /dev/null
+	check_fail $?
+	ip netns exec testns ip -6 route get fibmatch 2001:db8:3::1 &> /dev/null
+	# In IPv6 we do not flush the entire multipath route.
+	check_err $?
+
+	ip netns exec testns ip link del dev dummy1
+
+	ip netns del testns
+
+	if [ $ret -ne 0 ]; then
+		echo "FAIL: multipath route test"
+		return 1
+	fi
+	echo "PASS: multipath route test"
+}
+
+fib_unreg_test()
+{
+	echo "Running netdev unregister tests"
+
+	fib_unreg_unicast_test
+	fib_unreg_multipath_test
+}
+
+fib_test()
+{
+	fib_unreg_test
+}
+
+if [ "$(id -u)" -ne 0 ];then
+	echo "SKIP: Need root privileges"
+	exit 0
+fi
+
+if [ ! -x "$(command -v ip)" ]; then
+	echo "SKIP: Could not run test without ip tool"
+	exit 0
+fi
+
+ip route help 2>&1 | grep -q fibmatch
+if [ $? -ne 0 ]; then
+	echo "SKIP: iproute2 too old, missing fibmatch"
+	exit 0
+fi
+
+fib_test
+
+exit $ret
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH net-next 17/18] selftests: fib_tests: Add test cases for netdev down
  2018-01-07 10:45 [PATCH net-next 00/18] ipv6: Align nexthop behaviour with IPv4 Ido Schimmel
                   ` (15 preceding siblings ...)
  2018-01-07 10:45 ` [PATCH net-next 16/18] selftests: fib_tests: Add test cases for IPv4/IPv6 FIB Ido Schimmel
@ 2018-01-07 10:45 ` Ido Schimmel
  2018-01-07 17:15   ` David Ahern
  2018-01-07 10:45 ` [PATCH net-next 18/18] selftests: fib_tests: Add test cases for netdev carrier change Ido Schimmel
                   ` (2 subsequent siblings)
  19 siblings, 1 reply; 41+ messages in thread
From: Ido Schimmel @ 2018-01-07 10:45 UTC (permalink / raw)
  To: netdev
  Cc: davem, dsahern, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji,
	mlxsw, Ido Schimmel

Check that IPv4 and IPv6 react the same when a netdev is being put
administratively down.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 tools/testing/selftests/net/fib_tests.sh | 141 +++++++++++++++++++++++++++++++
 1 file changed, 141 insertions(+)

diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh
index 767d2ab2385d..25ba74f8a37e 100755
--- a/tools/testing/selftests/net/fib_tests.sh
+++ b/tools/testing/selftests/net/fib_tests.sh
@@ -120,9 +120,150 @@ fib_unreg_test()
 	fib_unreg_multipath_test
 }
 
+fib_down_unicast_test()
+{
+	ret=0
+
+	netns_create "testns"
+
+	ip netns exec testns ip link add dummy0 type dummy
+	ip netns exec testns ip link set dev dummy0 up
+
+	ip netns exec testns ip address add 198.51.100.1/24 dev dummy0
+	ip netns exec testns ip -6 address add 2001:db8:1::1/64 dev dummy0
+
+	ip netns exec testns ip route get fibmatch 198.51.100.2 &> /dev/null
+	check_err $?
+	ip netns exec testns ip -6 route get fibmatch 2001:db8:1::2 &> /dev/null
+	check_err $?
+
+	ip netns exec testns ip link set dev dummy0 down
+	check_err $?
+
+	ip netns exec testns ip route get fibmatch 198.51.100.2 &> /dev/null
+	check_fail $?
+	ip netns exec testns ip -6 route get fibmatch 2001:db8:1::2 &> /dev/null
+	check_fail $?
+
+	ip netns exec testns ip link del dev dummy0
+
+	ip netns del testns
+
+	if [ $ret -ne 0 ]; then
+		echo "FAIL: unicast route test"
+		return 1
+	fi
+	echo "PASS: unicast route test"
+}
+
+fib_down_multipath_test_do()
+{
+	local down_dev=$1
+	local up_dev=$2
+
+	ip netns exec testns ip route get fibmatch 203.0.113.1 \
+		oif $down_dev &> /dev/null
+	check_fail $?
+	ip netns exec testns ip -6 route get fibmatch 2001:db8:3::1 \
+		oif $down_dev &> /dev/null
+	check_fail $?
+
+	ip netns exec testns ip route get fibmatch 203.0.113.1 \
+		oif $up_dev &> /dev/null
+	check_err $?
+	ip netns exec testns ip -6 route get fibmatch 2001:db8:3::1 \
+		oif $up_dev &> /dev/null
+	check_err $?
+
+	ip netns exec testns ip route get fibmatch 203.0.113.1 | \
+		grep $down_dev | grep -q "dead linkdown"
+	check_err $?
+	ip netns exec testns ip -6 route get fibmatch 2001:db8:3::1 | \
+		grep $down_dev | grep -q "dead linkdown"
+	check_err $?
+
+	ip netns exec testns ip route get fibmatch 203.0.113.1 | \
+		grep $up_dev | grep -q "dead linkdown"
+	check_fail $?
+	ip netns exec testns ip -6 route get fibmatch 2001:db8:3::1 | \
+		grep $up_dev | grep -q "dead linkdown"
+	check_fail $?
+}
+
+fib_down_multipath_test()
+{
+	ret=0
+
+	netns_create "testns"
+
+	ip netns exec testns ip link add dummy0 type dummy
+	ip netns exec testns ip link set dev dummy0 up
+
+	ip netns exec testns ip link add dummy1 type dummy
+	ip netns exec testns ip link set dev dummy1 up
+
+	ip netns exec testns ip address add 198.51.100.1/24 dev dummy0
+	ip netns exec testns ip -6 address add 2001:db8:1::1/64 dev dummy0
+
+	ip netns exec testns ip address add 192.0.2.1/24 dev dummy1
+	ip netns exec testns ip -6 address add 2001:db8:2::1/64 dev dummy1
+
+	ip netns exec testns ip route add 203.0.113.0/24 \
+		nexthop via 198.51.100.2 dev dummy0 \
+		nexthop via 192.0.2.2 dev dummy1
+	ip netns exec testns ip -6 route add 2001:db8:3::/64 \
+		nexthop via 2001:db8:1::2 dev dummy0 \
+		nexthop via 2001:db8:2::2 dev dummy1
+
+	ip netns exec testns ip route get fibmatch 203.0.113.1 &> /dev/null
+	check_err $?
+	ip netns exec testns ip -6 route get fibmatch 2001:db8:3::1 &> /dev/null
+	check_err $?
+
+	ip netns exec testns ip link set dev dummy0 down
+	check_err $?
+
+	fib_down_multipath_test_do "dummy0" "dummy1"
+
+	ip netns exec testns ip link set dev dummy0 up
+	check_err $?
+	ip netns exec testns ip link set dev dummy1 down
+	check_err $?
+
+	fib_down_multipath_test_do "dummy1" "dummy0"
+
+	ip netns exec testns ip link set dev dummy0 down
+	check_err $?
+
+	ip netns exec testns ip route get fibmatch 203.0.113.1 &> /dev/null
+	check_fail $?
+	ip netns exec testns ip -6 route get fibmatch 2001:db8:3::1 &> /dev/null
+	check_fail $?
+
+	ip netns exec testns ip link del dev dummy1
+	ip netns exec testns ip link del dev dummy0
+
+	ip netns del testns
+
+	if [ $ret -ne 0 ]; then
+		echo "FAIL: multipath route test"
+		return 1
+	fi
+	echo "PASS: multipath route test"
+}
+
+fib_down_test()
+{
+	echo "Running netdev down tests"
+
+	fib_down_unicast_test
+	fib_down_multipath_test
+}
+
 fib_test()
 {
 	fib_unreg_test
+	fib_down_test
 }
 
 if [ "$(id -u)" -ne 0 ];then
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH net-next 18/18] selftests: fib_tests: Add test cases for netdev carrier change
  2018-01-07 10:45 [PATCH net-next 00/18] ipv6: Align nexthop behaviour with IPv4 Ido Schimmel
                   ` (16 preceding siblings ...)
  2018-01-07 10:45 ` [PATCH net-next 17/18] selftests: fib_tests: Add test cases for netdev down Ido Schimmel
@ 2018-01-07 10:45 ` Ido Schimmel
  2018-01-07 17:16   ` David Ahern
  2018-01-07 17:20 ` [PATCH net-next 00/18] ipv6: Align nexthop behaviour with IPv4 David Ahern
  2018-01-08  2:30 ` David Miller
  19 siblings, 1 reply; 41+ messages in thread
From: Ido Schimmel @ 2018-01-07 10:45 UTC (permalink / raw)
  To: netdev
  Cc: davem, dsahern, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji,
	mlxsw, Ido Schimmel

Check that IPv4 and IPv6 react the same when the carrier of a netdev is
toggled. Local routes should not be affected by this, whereas unicast
routes should.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 tools/testing/selftests/net/fib_tests.sh | 142 +++++++++++++++++++++++++++++++
 1 file changed, 142 insertions(+)

diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh
index 25ba74f8a37e..a9154eefb2e2 100755
--- a/tools/testing/selftests/net/fib_tests.sh
+++ b/tools/testing/selftests/net/fib_tests.sh
@@ -260,10 +260,152 @@ fib_down_test()
 	fib_down_multipath_test
 }
 
+fib_carrier_local_test()
+{
+	ret=0
+
+	# Local routes should not be affected when carrier changes.
+	netns_create "testns"
+
+	ip netns exec testns ip link add dummy0 type dummy
+	ip netns exec testns ip link set dev dummy0 up
+
+	ip netns exec testns ip link set dev dummy0 carrier on
+
+	ip netns exec testns ip address add 198.51.100.1/24 dev dummy0
+	ip netns exec testns ip -6 address add 2001:db8:1::1/64 dev dummy0
+
+	ip netns exec testns ip route get fibmatch 198.51.100.1 &> /dev/null
+	check_err $?
+	ip netns exec testns ip -6 route get fibmatch 2001:db8:1::1 &> /dev/null
+	check_err $?
+
+	ip netns exec testns ip route get fibmatch 198.51.100.1 | \
+		grep -q "linkdown"
+	check_fail $?
+	ip netns exec testns ip -6 route get fibmatch 2001:db8:1::1 | \
+		grep -q "linkdown"
+	check_fail $?
+
+	ip netns exec testns ip link set dev dummy0 carrier off
+
+	ip netns exec testns ip route get fibmatch 198.51.100.1 &> /dev/null
+	check_err $?
+	ip netns exec testns ip -6 route get fibmatch 2001:db8:1::1 &> /dev/null
+	check_err $?
+
+	ip netns exec testns ip route get fibmatch 198.51.100.1 | \
+		grep -q "linkdown"
+	check_fail $?
+	ip netns exec testns ip -6 route get fibmatch 2001:db8:1::1 | \
+		grep -q "linkdown"
+	check_fail $?
+
+	ip netns exec testns ip address add 192.0.2.1/24 dev dummy0
+	ip netns exec testns ip -6 address add 2001:db8:2::1/64 dev dummy0
+
+	ip netns exec testns ip route get fibmatch 192.0.2.1 &> /dev/null
+	check_err $?
+	ip netns exec testns ip -6 route get fibmatch 2001:db8:2::1 &> /dev/null
+	check_err $?
+
+	ip netns exec testns ip route get fibmatch 192.0.2.1 | \
+		grep -q "linkdown"
+	check_fail $?
+	ip netns exec testns ip -6 route get fibmatch 2001:db8:2::1 | \
+		grep -q "linkdown"
+	check_fail $?
+
+	ip netns exec testns ip link del dev dummy0
+
+	ip netns del testns
+
+	if [ $ret -ne 0 ]; then
+		echo "FAIL: local route carrier test"
+		return 1
+	fi
+	echo "PASS: local route carrier test"
+}
+
+fib_carrier_unicast_test()
+{
+	ret=0
+
+	netns_create "testns"
+
+	ip netns exec testns ip link add dummy0 type dummy
+	ip netns exec testns ip link set dev dummy0 up
+
+	ip netns exec testns ip link set dev dummy0 carrier on
+
+	ip netns exec testns ip address add 198.51.100.1/24 dev dummy0
+	ip netns exec testns ip -6 address add 2001:db8:1::1/64 dev dummy0
+
+	ip netns exec testns ip route get fibmatch 198.51.100.2 &> /dev/null
+	check_err $?
+	ip netns exec testns ip -6 route get fibmatch 2001:db8:1::2 &> /dev/null
+	check_err $?
+
+	ip netns exec testns ip route get fibmatch 198.51.100.2 | \
+		grep -q "linkdown"
+	check_fail $?
+	ip netns exec testns ip -6 route get fibmatch 2001:db8:1::2 | \
+		grep -q "linkdown"
+	check_fail $?
+
+	ip netns exec testns ip link set dev dummy0 carrier off
+
+	ip netns exec testns ip route get fibmatch 198.51.100.2 &> /dev/null
+	check_err $?
+	ip netns exec testns ip -6 route get fibmatch 2001:db8:1::2 &> /dev/null
+	check_err $?
+
+	ip netns exec testns ip route get fibmatch 198.51.100.2 | \
+		grep -q "linkdown"
+	check_err $?
+	ip netns exec testns ip -6 route get fibmatch 2001:db8:1::2 | \
+		grep -q "linkdown"
+	check_err $?
+
+	ip netns exec testns ip address add 192.0.2.1/24 dev dummy0
+	ip netns exec testns ip -6 address add 2001:db8:2::1/64 dev dummy0
+
+	ip netns exec testns ip route get fibmatch 192.0.2.2 &> /dev/null
+	check_err $?
+	ip netns exec testns ip -6 route get fibmatch 2001:db8:2::2 &> /dev/null
+	check_err $?
+
+	ip netns exec testns ip route get fibmatch 192.0.2.2 | \
+		grep -q "linkdown"
+	check_err $?
+	ip netns exec testns ip -6 route get fibmatch 2001:db8:2::2 | \
+		grep -q "linkdown"
+	check_err $?
+
+	ip netns exec testns ip link del dev dummy0
+
+	ip netns del testns
+
+	if [ $ret -ne 0 ]; then
+		echo "FAIL: unicast route carrier test"
+		return 1
+	fi
+	echo "PASS: unicast route carrier test"
+}
+
+fib_carrier_test()
+{
+	echo "Running netdev carrier change tests"
+
+	fib_carrier_local_test
+	fib_carrier_unicast_test
+}
+
 fib_test()
 {
 	fib_unreg_test
 	fib_down_test
+	fib_carrier_test
 }
 
 if [ "$(id -u)" -ne 0 ];then
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH net-next 01/18] ipv6: Remove redundant route flushing during namespace dismantle
  2018-01-07 10:45 ` [PATCH net-next 01/18] ipv6: Remove redundant route flushing during namespace dismantle Ido Schimmel
@ 2018-01-07 16:53   ` David Ahern
  0 siblings, 0 replies; 41+ messages in thread
From: David Ahern @ 2018-01-07 16:53 UTC (permalink / raw)
  To: Ido Schimmel, netdev
  Cc: davem, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji, mlxsw

On 1/7/18 3:45 AM, Ido Schimmel wrote:
> By the time fib6_net_exit() is executed all the netdevs in the namespace
> have been either unregistered or pushed back to the default namespace.
> That is because pernet subsys operations are always ordered before
> pernet device operations and therefore invoked after them during
> namespace dismantle.
> 
> Thus, all the routing tables in the namespace are empty by the time
> fib6_net_exit() is invoked and the call to rt6_ifdown() can be removed.
> 
> This allows us to simplify the condition in fib6_ifdown() as it's only
> ever called with an actual netdev.
> 
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> ---
>  net/ipv6/ip6_fib.c | 1 -
>  net/ipv6/route.c   | 8 +++-----
>  2 files changed, 3 insertions(+), 6 deletions(-)
> 

Acked-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH net-next 02/18] ipv6: Mark dead nexthops with appropriate flags
  2018-01-07 10:45 ` [PATCH net-next 02/18] ipv6: Mark dead nexthops with appropriate flags Ido Schimmel
@ 2018-01-07 16:54   ` David Ahern
  0 siblings, 0 replies; 41+ messages in thread
From: David Ahern @ 2018-01-07 16:54 UTC (permalink / raw)
  To: Ido Schimmel, netdev
  Cc: davem, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji, mlxsw

On 1/7/18 3:45 AM, Ido Schimmel wrote:
> When a netdev is put administratively down or unregistered all the
> nexthops using it as their nexthop device should be marked with the
> 'dead' and 'linkdown' flags.
> 
> Currently, when a route is dumped its nexthop device is tested and the
> flags are set accordingly. A similar check is performed during route
> lookup.
> 
> Instead, we can simply mark the nexthops based on netdev events and
> avoid checking the netdev's state during route dump and lookup.
> 
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> ---
>  net/ipv6/route.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 

Acked-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH net-next 03/18] ipv6: Clear nexthop flags upon netdev up
  2018-01-07 10:45 ` [PATCH net-next 03/18] ipv6: Clear nexthop flags upon netdev up Ido Schimmel
@ 2018-01-07 16:54   ` David Ahern
  0 siblings, 0 replies; 41+ messages in thread
From: David Ahern @ 2018-01-07 16:54 UTC (permalink / raw)
  To: Ido Schimmel, netdev
  Cc: davem, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji, mlxsw

On 1/7/18 3:45 AM, Ido Schimmel wrote:
> Previous patch marked nexthops with the 'dead' and 'linkdown' flags.
> Clear these flags when the netdev comes back up.
> 
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> ---
>  include/net/ip6_route.h |  1 +
>  net/ipv6/addrconf.c     |  3 +++
>  net/ipv6/route.c        | 29 +++++++++++++++++++++++++++++
>  3 files changed, 33 insertions(+)
> 

Acked-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH net-next 04/18] ipv6: Prepare to handle multiple netdev events
  2018-01-07 10:45 ` [PATCH net-next 04/18] ipv6: Prepare to handle multiple netdev events Ido Schimmel
@ 2018-01-07 16:55   ` David Ahern
  0 siblings, 0 replies; 41+ messages in thread
From: David Ahern @ 2018-01-07 16:55 UTC (permalink / raw)
  To: Ido Schimmel, netdev
  Cc: davem, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji, mlxsw

On 1/7/18 3:45 AM, Ido Schimmel wrote:
> To make IPv6 more in line with IPv4 we need to be able to respond
> differently to different netdev events. For example, when a netdev is
> unregistered all the routes using it as their nexthop device should be
> flushed, whereas when the netdev's carrier changes only the 'linkdown'
> flag should be toggled.
> 
> Currently, this is not possible, as the function that traverses the
> routing tables is not aware of the triggering event.
> 
> Propagate the triggering event down, so that it could be used in later
> patches.
> 
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> ---
>  include/net/ip6_route.h |  2 +-
>  net/ipv6/addrconf.c     |  4 ++--
>  net/ipv6/route.c        | 37 +++++++++++++++++++++----------------
>  3 files changed, 24 insertions(+), 19 deletions(-)

Acked-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH net-next 05/18] ipv6: Set nexthop flags upon carrier change
  2018-01-07 10:45 ` [PATCH net-next 05/18] ipv6: Set nexthop flags upon carrier change Ido Schimmel
@ 2018-01-07 16:56   ` David Ahern
  0 siblings, 0 replies; 41+ messages in thread
From: David Ahern @ 2018-01-07 16:56 UTC (permalink / raw)
  To: Ido Schimmel, netdev
  Cc: davem, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji, mlxsw

On 1/7/18 3:45 AM, Ido Schimmel wrote:
> Similar to IPv4, when the carrier of a netdev changes we should toggle
> the 'linkdown' flag on all the nexthops using it as their nexthop
> device.
> 
> This will later allow us to test for the presence of this flag during
> route lookup and dump.
> 
> Up until commit 4832c30d5458 ("net: ipv6: put host and anycast routes on
> device with address") host and anycast routes used the loopback netdev
> as their nexthop device and thus were not marked with the 'linkdown'
> flag. The patch preserves this behavior and allows one to ping the local
> address even when the nexthop device does not have a carrier and the
> 'ignore_routes_with_linkdown' sysctl is set.
> 
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> ---
>  include/net/ip6_route.h |  1 +
>  net/ipv6/addrconf.c     |  2 ++
>  net/ipv6/route.c        | 23 +++++++++++++++++------
>  3 files changed, 20 insertions(+), 6 deletions(-)

Acked-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH net-next 06/18] ipv6: Set nexthop flags during route creation
  2018-01-07 10:45 ` [PATCH net-next 06/18] ipv6: Set nexthop flags during route creation Ido Schimmel
@ 2018-01-07 16:58   ` David Ahern
  0 siblings, 0 replies; 41+ messages in thread
From: David Ahern @ 2018-01-07 16:58 UTC (permalink / raw)
  To: Ido Schimmel, netdev
  Cc: davem, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji, mlxsw

On 1/7/18 3:45 AM, Ido Schimmel wrote:
> It is valid to install routes with a nexthop device that does not have a
> carrier, so we need to make sure they're marked accordingly.
> 
> As explained in the previous patch, host and anycast routes are never
> marked with the 'linkdown' flag.
> 
> Note that reject routes are unaffected, as these use the loopback device
> which always has a carrier.
> 
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> ---
>  net/ipv6/route.c | 3 +++
>  1 file changed, 3 insertions(+)

Acked-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH net-next 07/18] ipv6: Check nexthop flags during route lookup instead of carrier
  2018-01-07 10:45 ` [PATCH net-next 07/18] ipv6: Check nexthop flags during route lookup instead of carrier Ido Schimmel
@ 2018-01-07 16:59   ` David Ahern
  0 siblings, 0 replies; 41+ messages in thread
From: David Ahern @ 2018-01-07 16:59 UTC (permalink / raw)
  To: Ido Schimmel, netdev
  Cc: davem, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji, mlxsw

On 1/7/18 3:45 AM, Ido Schimmel wrote:
> Now that the RTNH_F_LINKDOWN flag is set in nexthops, we can avoid the
> need to dereference the nexthop device and check its carrier and instead
> check for the presence of the flag.
> 
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> ---
>  net/ipv6/route.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)

Acked-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH net-next 08/18] ipv6: Check nexthop flags in route dump instead of carrier
  2018-01-07 10:45 ` [PATCH net-next 08/18] ipv6: Check nexthop flags in route dump " Ido Schimmel
@ 2018-01-07 17:01   ` David Ahern
  0 siblings, 0 replies; 41+ messages in thread
From: David Ahern @ 2018-01-07 17:01 UTC (permalink / raw)
  To: Ido Schimmel, netdev
  Cc: davem, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji, mlxsw

On 1/7/18 3:45 AM, Ido Schimmel wrote:
> Similar to previous patch, there is no need to check for the carrier of
> the nexthop device when dumping the route and we can instead check for
> the presence of the RTNH_F_LINKDOWN flag.
> 
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> ---
>  net/ipv6/route.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 

Acked-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH net-next 09/18] ipv6: Ignore dead routes during lookup
  2018-01-07 10:45 ` [PATCH net-next 09/18] ipv6: Ignore dead routes during lookup Ido Schimmel
@ 2018-01-07 17:02   ` David Ahern
  0 siblings, 0 replies; 41+ messages in thread
From: David Ahern @ 2018-01-07 17:02 UTC (permalink / raw)
  To: Ido Schimmel, netdev
  Cc: davem, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji, mlxsw

On 1/7/18 3:45 AM, Ido Schimmel wrote:
> Currently, dead routes are only present in the routing tables in case
> the 'ignore_routes_with_linkdown' sysctl is set. Otherwise, they are
> flushed.
> 
> Subsequent patches are going to remove the reliance on this sysctl and
> make IPv6 more consistent with IPv4.
> 
> Before this is done, we need to make sure dead routes are skipped during
> route lookup, so as to not cause packet loss.
> 
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> ---
>  net/ipv6/route.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)

Acked-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH net-next 10/18] ipv6: Report dead flag during route dump
  2018-01-07 10:45 ` [PATCH net-next 10/18] ipv6: Report dead flag during route dump Ido Schimmel
@ 2018-01-07 17:02   ` David Ahern
  0 siblings, 0 replies; 41+ messages in thread
From: David Ahern @ 2018-01-07 17:02 UTC (permalink / raw)
  To: Ido Schimmel, netdev
  Cc: davem, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji, mlxsw

On 1/7/18 3:45 AM, Ido Schimmel wrote:
> Up until now the RTNH_F_DEAD flag was only reported in route dump when
> the 'ignore_routes_with_linkdown' sysctl was set. This is expected as
> dead routes were flushed otherwise.
> 
> The reliance on this sysctl is going to be removed, so we need to report
> the flag regardless of the sysctl's value.
> 
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> ---
>  net/ipv6/route.c | 3 +++
>  1 file changed, 3 insertions(+)

Acked-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH net-next 11/18] ipv6: Add explicit flush indication to routes
  2018-01-07 10:45 ` [PATCH net-next 11/18] ipv6: Add explicit flush indication to routes Ido Schimmel
@ 2018-01-07 17:03   ` David Ahern
  0 siblings, 0 replies; 41+ messages in thread
From: David Ahern @ 2018-01-07 17:03 UTC (permalink / raw)
  To: Ido Schimmel, netdev
  Cc: davem, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji, mlxsw

On 1/7/18 3:45 AM, Ido Schimmel wrote:
> When routes that are a part of a multipath route are evaluated by
> fib6_ifdown() in response to NETDEV_DOWN and NETDEV_UNREGISTER events
> the state of their sibling routes is not considered.
> 
> This will change in subsequent patches in order to align IPv6 with
> IPv4's behavior. For example, when the last sibling in a multipath route
> becomes dead, the entire multipath route needs to be removed.
> 
> To prevent the tree walker from re-evaluating all the sibling routes
> each time, we can simply evaluate them once - when the first sibling is
> traversed.
> 
> If we determine the entire multipath route needs to be removed, then the
> 'should_flush' bit is set in all the siblings, which will cause the
> walker to flush them when it traverses them.
> 
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> ---
>  include/net/ip6_fib.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

Acked-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH net-next 12/18] ipv6: Teach tree walker to skip multipath routes
  2018-01-07 10:45 ` [PATCH net-next 12/18] ipv6: Teach tree walker to skip multipath routes Ido Schimmel
@ 2018-01-07 17:04   ` David Ahern
  0 siblings, 0 replies; 41+ messages in thread
From: David Ahern @ 2018-01-07 17:04 UTC (permalink / raw)
  To: Ido Schimmel, netdev
  Cc: davem, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji, mlxsw

On 1/7/18 3:45 AM, Ido Schimmel wrote:
> As explained in previous patch, fib6_ifdown() needs to consider the
> state of all the sibling routes when a multipath route is traversed.
> 
> This is done by evaluating all the siblings when the first sibling in a
> multipath route is traversed. If the multipath route does not need to be
> flushed (e.g., not all siblings are dead), then we should just skip the
> multipath route as our work is done.
> 
> Have the tree walker jump to the last sibling when it is determined that
> the multipath route needs to be skipped.
> 
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> ---
>  net/ipv6/ip6_fib.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 

Acked-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH net-next 13/18] ipv6: Export sernum update function
  2018-01-07 10:45 ` [PATCH net-next 13/18] ipv6: Export sernum update function Ido Schimmel
@ 2018-01-07 17:04   ` David Ahern
  0 siblings, 0 replies; 41+ messages in thread
From: David Ahern @ 2018-01-07 17:04 UTC (permalink / raw)
  To: Ido Schimmel, netdev
  Cc: davem, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji, mlxsw

On 1/7/18 3:45 AM, Ido Schimmel wrote:
> We are going to allow dead routes to stay in the FIB tree (e.g., when
> they are part of a multipath route, directly connected route with no
> carrier) and revive them when their nexthop device gains carrier or when
> it is put administratively up.
> 
> This is equivalent to the addition of the route to the FIB tree and we
> should therefore take care of updating the sernum of all the parent
> nodes of the node where the route is stored. Otherwise, we risk sockets
> caching and using sub-optimal dst entries.
> 
> Export the function that performs the above, so that it could be invoked
> from fib6_ifup() later on.
> 
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> ---
>  include/net/ip6_fib.h |  1 +
>  net/ipv6/ip6_fib.c    | 11 ++++++++---
>  2 files changed, 9 insertions(+), 3 deletions(-)
> 

Acked-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH net-next 14/18] ipv6: Take table lock outside of sernum update function
  2018-01-07 10:45 ` [PATCH net-next 14/18] ipv6: Take table lock outside of " Ido Schimmel
@ 2018-01-07 17:05   ` David Ahern
  0 siblings, 0 replies; 41+ messages in thread
From: David Ahern @ 2018-01-07 17:05 UTC (permalink / raw)
  To: Ido Schimmel, netdev
  Cc: davem, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji, mlxsw

On 1/7/18 3:45 AM, Ido Schimmel wrote:
> The next patch is going to allow dead routes to remain in the FIB tree
> in certain situations.
> 
> When this happens we need to be sure to bump the sernum of the nodes
> where these are stored so that potential copies cached in sockets are
> invalidated.
> 
> The function that performs this update assumes the table lock is not
> taken when it is invoked, but that will not be the case when it is
> invoked by the tree walker.
> 
> Have the function assume the lock is taken and make the single caller
> take the lock itself.
> 
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> ---
>  net/ipv6/ip6_fib.c | 5 +----
>  net/ipv6/route.c   | 2 ++
>  2 files changed, 3 insertions(+), 4 deletions(-)
> 

Acked-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH net-next 15/18] ipv6: Flush multipath routes when all siblings are dead
  2018-01-07 10:45 ` [PATCH net-next 15/18] ipv6: Flush multipath routes when all siblings are dead Ido Schimmel
@ 2018-01-07 17:08   ` David Ahern
  0 siblings, 0 replies; 41+ messages in thread
From: David Ahern @ 2018-01-07 17:08 UTC (permalink / raw)
  To: Ido Schimmel, netdev
  Cc: davem, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji, mlxsw

On 1/7/18 3:45 AM, Ido Schimmel wrote:
> By default, IPv6 deletes nexthops from a multipath route when the
> nexthop device is put administratively down. This differs from IPv4
> where the nexthops are kept, but marked with the RTNH_F_DEAD flag. A
> multipath route is flushed when all of its nexthops become dead.
> 
> Align IPv6 with IPv4 and have it conform to the same guidelines.
> 
> In case the multipath route needs to be flushed, its siblings are
> flushed one by one. Otherwise, the nexthops are marked with the
> appropriate flags and the tree walker is instructed to skip all the
> siblings.
> 
> As explained in previous patches, care is taken to update the sernum of
> the affected tree nodes, so as to prevent the use of wrong dst entries.
> 
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> ---
>  net/ipv6/route.c | 83 ++++++++++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 75 insertions(+), 8 deletions(-)

Acked-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH net-next 16/18] selftests: fib_tests: Add test cases for IPv4/IPv6 FIB
  2018-01-07 10:45 ` [PATCH net-next 16/18] selftests: fib_tests: Add test cases for IPv4/IPv6 FIB Ido Schimmel
@ 2018-01-07 17:15   ` David Ahern
  0 siblings, 0 replies; 41+ messages in thread
From: David Ahern @ 2018-01-07 17:15 UTC (permalink / raw)
  To: Ido Schimmel, netdev
  Cc: davem, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji, mlxsw

On 1/7/18 3:45 AM, Ido Schimmel wrote:
> Add test cases to check that IPv4 and IPv6 react to a netdev being
> unregistered as expected.
> 
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> ---
>  tools/testing/selftests/net/Makefile     |   1 +
>  tools/testing/selftests/net/fib_tests.sh | 146 +++++++++++++++++++++++++++++++
>  2 files changed, 147 insertions(+)
>  create mode 100755 tools/testing/selftests/net/fib_tests.sh

Acked-by: David Ahern <dsahern@gmail.com>

FYI: "ip -netns testns" is more efficient than "ip netns exec testns ip".

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH net-next 17/18] selftests: fib_tests: Add test cases for netdev down
  2018-01-07 10:45 ` [PATCH net-next 17/18] selftests: fib_tests: Add test cases for netdev down Ido Schimmel
@ 2018-01-07 17:15   ` David Ahern
  0 siblings, 0 replies; 41+ messages in thread
From: David Ahern @ 2018-01-07 17:15 UTC (permalink / raw)
  To: Ido Schimmel, netdev
  Cc: davem, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji, mlxsw

On 1/7/18 3:45 AM, Ido Schimmel wrote:
> Check that IPv4 and IPv6 react the same when a netdev is being put
> administratively down.
> 
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> ---
>  tools/testing/selftests/net/fib_tests.sh | 141 +++++++++++++++++++++++++++++++
>  1 file changed, 141 insertions(+)
> 

Acked-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH net-next 18/18] selftests: fib_tests: Add test cases for netdev carrier change
  2018-01-07 10:45 ` [PATCH net-next 18/18] selftests: fib_tests: Add test cases for netdev carrier change Ido Schimmel
@ 2018-01-07 17:16   ` David Ahern
  0 siblings, 0 replies; 41+ messages in thread
From: David Ahern @ 2018-01-07 17:16 UTC (permalink / raw)
  To: Ido Schimmel, netdev
  Cc: davem, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji, mlxsw

On 1/7/18 3:45 AM, Ido Schimmel wrote:
> Check that IPv4 and IPv6 react the same when the carrier of a netdev is
> toggled. Local routes should not be affected by this, whereas unicast
> routes should.
> 
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> ---
>  tools/testing/selftests/net/fib_tests.sh | 142 +++++++++++++++++++++++++++++++
>  1 file changed, 142 insertions(+)

Acked-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH net-next 00/18] ipv6: Align nexthop behaviour with IPv4
  2018-01-07 10:45 [PATCH net-next 00/18] ipv6: Align nexthop behaviour with IPv4 Ido Schimmel
                   ` (17 preceding siblings ...)
  2018-01-07 10:45 ` [PATCH net-next 18/18] selftests: fib_tests: Add test cases for netdev carrier change Ido Schimmel
@ 2018-01-07 17:20 ` David Ahern
  2018-01-09  8:04   ` Ido Schimmel
  2018-01-08  2:30 ` David Miller
  19 siblings, 1 reply; 41+ messages in thread
From: David Ahern @ 2018-01-07 17:20 UTC (permalink / raw)
  To: Ido Schimmel, netdev
  Cc: davem, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji, mlxsw

On 1/7/18 3:45 AM, Ido Schimmel wrote:
> This set tries to eliminate some differences between IPv4's and IPv6's
> treatment of nexthops. These differences are most likely a side effect
> of IPv6's data structures (specifically 'rt6_info') that incorporate
> both the route and the nexthop and the late addition of ECMP support in
> commit 51ebd3181572 ("ipv6: add support of equal cost multipath
> (ECMP)").
> 
> IPv4 and IPv6 do not react the same to certain netdev events. For
> example, upon carrier change affected IPv4 nexthops are marked using the
> RTNH_F_LINKDOWN flag and the nexthop group is rebalanced accordingly.
> IPv6 on the other hand, does nothing which forces us to perform a
> carrier check during route lookup and dump. This makes it difficult to
> introduce features such as non-equal-cost multipath that are built on
> top of this set [1].
> 
> In addition, when a netdev is put administratively down IPv4 nexthops
> are marked using the RTNH_F_DEAD flag, whereas IPv6 simply flushes all
> the routes using these nexthops. To be consistent with IPv4, multipath
> routes should only be flushed when all nexthops in the group are
> considered dead.
> 
> The first 12 patches introduce non-functional changes that store the
> RTNH_F_DEAD and RTNH_F_LINKDOWN flags in IPv6 routes based on netdev
> events, in a similar fashion to IPv4. This allows us to remove the
> carrier check performed during route lookup and dump.
> 
> The next three patches make sure we only flush a multipath route when
> all of its nexthops are dead.
> 
> Last three patches add test cases for IPv4/IPv6 FIB. These verify that
> both address families react similarly to netdev events.
> 
> Finally, this series also serves as a good first step towards David
> Ahern's goal of treating nexthops as standalone objects [2], as it makes
> the code more in line with IPv4 where the nexthop and the nexthop group
> are separate objects from the route itself.
> 
> 1. https://github.com/idosch/linux/tree/ipv6-nexthops
> 2. http://vger.kernel.org/netconf2017_files/nexthop-objects.pdf
> 

Thanks for working on this - and creating the test cases.

One of many follow on changes that would be beneficial is to remove the
idev dereference in the hot path to check the
ignore_routes_with_linkdown sysctl.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH net-next 00/18] ipv6: Align nexthop behaviour with IPv4
  2018-01-07 10:45 [PATCH net-next 00/18] ipv6: Align nexthop behaviour with IPv4 Ido Schimmel
                   ` (18 preceding siblings ...)
  2018-01-07 17:20 ` [PATCH net-next 00/18] ipv6: Align nexthop behaviour with IPv4 David Ahern
@ 2018-01-08  2:30 ` David Miller
  19 siblings, 0 replies; 41+ messages in thread
From: David Miller @ 2018-01-08  2:30 UTC (permalink / raw)
  To: idosch
  Cc: netdev, dsahern, roopa, nicolas.dichtel, weiwan, kafai, yoshfuji, mlxsw

From: Ido Schimmel <idosch@mellanox.com>
Date: Sun,  7 Jan 2018 12:45:00 +0200

> This set tries to eliminate some differences between IPv4's and IPv6's
> treatment of nexthops. These differences are most likely a side effect
> of IPv6's data structures (specifically 'rt6_info') that incorporate
> both the route and the nexthop and the late addition of ECMP support in
> commit 51ebd3181572 ("ipv6: add support of equal cost multipath
> (ECMP)").
 ...
> Finally, this series also serves as a good first step towards David
> Ahern's goal of treating nexthops as standalone objects [2], as it makes
> the code more in line with IPv4 where the nexthop and the nexthop group
> are separate objects from the route itself.
 ...

Looks great, series applied, thanks Ido!

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH net-next 00/18] ipv6: Align nexthop behaviour with IPv4
  2018-01-07 17:20 ` [PATCH net-next 00/18] ipv6: Align nexthop behaviour with IPv4 David Ahern
@ 2018-01-09  8:04   ` Ido Schimmel
  2018-01-10  4:41     ` David Ahern
  0 siblings, 1 reply; 41+ messages in thread
From: Ido Schimmel @ 2018-01-09  8:04 UTC (permalink / raw)
  To: David Ahern
  Cc: Ido Schimmel, netdev, davem, roopa, nicolas.dichtel, weiwan,
	kafai, yoshfuji, mlxsw

On Sun, Jan 07, 2018 at 10:20:11AM -0700, David Ahern wrote:
> One of many follow on changes that would be beneficial is to remove the
> idev dereference in the hot path to check the
> ignore_routes_with_linkdown sysctl.

When a netdev loses its carrier we can set the RTNH_F_DEAD flag for all
the nexthops using it as their nexthop device, in case the sysctl is
set. Similarly, when the sysctl is toggled we can walk the routing
tables and toggle the flag where appropriate.

You have a different idea?

Thanks for reviewing!

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH net-next 00/18] ipv6: Align nexthop behaviour with IPv4
  2018-01-09  8:04   ` Ido Schimmel
@ 2018-01-10  4:41     ` David Ahern
  0 siblings, 0 replies; 41+ messages in thread
From: David Ahern @ 2018-01-10  4:41 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Ido Schimmel, netdev, davem, roopa, nicolas.dichtel, weiwan,
	kafai, yoshfuji, mlxsw

On 1/9/18 1:04 AM, Ido Schimmel wrote:
> On Sun, Jan 07, 2018 at 10:20:11AM -0700, David Ahern wrote:
>> One of many follow on changes that would be beneficial is to remove the
>> idev dereference in the hot path to check the
>> ignore_routes_with_linkdown sysctl.
> 
> When a netdev loses its carrier we can set the RTNH_F_DEAD flag for all
> the nexthops using it as their nexthop device, in case the sysctl is
> set. Similarly, when the sysctl is toggled we can walk the routing
> tables and toggle the flag where appropriate.
> 
> You have a different idea?

that would work.

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2018-01-10  4:41 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-07 10:45 [PATCH net-next 00/18] ipv6: Align nexthop behaviour with IPv4 Ido Schimmel
2018-01-07 10:45 ` [PATCH net-next 01/18] ipv6: Remove redundant route flushing during namespace dismantle Ido Schimmel
2018-01-07 16:53   ` David Ahern
2018-01-07 10:45 ` [PATCH net-next 02/18] ipv6: Mark dead nexthops with appropriate flags Ido Schimmel
2018-01-07 16:54   ` David Ahern
2018-01-07 10:45 ` [PATCH net-next 03/18] ipv6: Clear nexthop flags upon netdev up Ido Schimmel
2018-01-07 16:54   ` David Ahern
2018-01-07 10:45 ` [PATCH net-next 04/18] ipv6: Prepare to handle multiple netdev events Ido Schimmel
2018-01-07 16:55   ` David Ahern
2018-01-07 10:45 ` [PATCH net-next 05/18] ipv6: Set nexthop flags upon carrier change Ido Schimmel
2018-01-07 16:56   ` David Ahern
2018-01-07 10:45 ` [PATCH net-next 06/18] ipv6: Set nexthop flags during route creation Ido Schimmel
2018-01-07 16:58   ` David Ahern
2018-01-07 10:45 ` [PATCH net-next 07/18] ipv6: Check nexthop flags during route lookup instead of carrier Ido Schimmel
2018-01-07 16:59   ` David Ahern
2018-01-07 10:45 ` [PATCH net-next 08/18] ipv6: Check nexthop flags in route dump " Ido Schimmel
2018-01-07 17:01   ` David Ahern
2018-01-07 10:45 ` [PATCH net-next 09/18] ipv6: Ignore dead routes during lookup Ido Schimmel
2018-01-07 17:02   ` David Ahern
2018-01-07 10:45 ` [PATCH net-next 10/18] ipv6: Report dead flag during route dump Ido Schimmel
2018-01-07 17:02   ` David Ahern
2018-01-07 10:45 ` [PATCH net-next 11/18] ipv6: Add explicit flush indication to routes Ido Schimmel
2018-01-07 17:03   ` David Ahern
2018-01-07 10:45 ` [PATCH net-next 12/18] ipv6: Teach tree walker to skip multipath routes Ido Schimmel
2018-01-07 17:04   ` David Ahern
2018-01-07 10:45 ` [PATCH net-next 13/18] ipv6: Export sernum update function Ido Schimmel
2018-01-07 17:04   ` David Ahern
2018-01-07 10:45 ` [PATCH net-next 14/18] ipv6: Take table lock outside of " Ido Schimmel
2018-01-07 17:05   ` David Ahern
2018-01-07 10:45 ` [PATCH net-next 15/18] ipv6: Flush multipath routes when all siblings are dead Ido Schimmel
2018-01-07 17:08   ` David Ahern
2018-01-07 10:45 ` [PATCH net-next 16/18] selftests: fib_tests: Add test cases for IPv4/IPv6 FIB Ido Schimmel
2018-01-07 17:15   ` David Ahern
2018-01-07 10:45 ` [PATCH net-next 17/18] selftests: fib_tests: Add test cases for netdev down Ido Schimmel
2018-01-07 17:15   ` David Ahern
2018-01-07 10:45 ` [PATCH net-next 18/18] selftests: fib_tests: Add test cases for netdev carrier change Ido Schimmel
2018-01-07 17:16   ` David Ahern
2018-01-07 17:20 ` [PATCH net-next 00/18] ipv6: Align nexthop behaviour with IPv4 David Ahern
2018-01-09  8:04   ` Ido Schimmel
2018-01-10  4:41     ` David Ahern
2018-01-08  2:30 ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.