netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net v3 0/2] ipv6: Fix listing and flushing of cached route exceptions
@ 2019-06-08 18:12 Stefano Brivio
  2019-06-08 18:12 ` [PATCH net v3 1/2] ipv6: Dump route exceptions too in rt6_dump_route() Stefano Brivio
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Stefano Brivio @ 2019-06-08 18:12 UTC (permalink / raw)
  To: David Miller
  Cc: Jianlin Shi, Wei Wang, David Ahern, Martin KaFai Lau,
	Eric Dumazet, Matti Vaittinen, netdev

The commands 'ip -6 route list cache' and 'ip -6 route flush cache'
don't work at all after route exceptions have been moved to a separate
hash table in commit 2b760fcf5cfb ("ipv6: hook up exception table to store
dst cache"). Fix that.

v3: Drop check on RTM_F_CLONED and rework logic of return values of
    rt6_dump_route()

v2: Add count of routes handled in partial dumps, and skip them, in patch 1/2.

Stefano Brivio (2):
  ipv6: Dump route exceptions too in rt6_dump_route()
  ip6_fib: Don't discard nodes with valid routing information in
    fib6_locate_1()

 include/net/ip6_fib.h   |  1 +
 include/net/ip6_route.h |  2 +-
 net/ipv6/ip6_fib.c      | 24 ++++++++++-----
 net/ipv6/route.c        | 67 ++++++++++++++++++++++++++++++++++++-----
 4 files changed, 78 insertions(+), 16 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH net v3 1/2] ipv6: Dump route exceptions too in rt6_dump_route()
  2019-06-08 18:12 [PATCH net v3 0/2] ipv6: Fix listing and flushing of cached route exceptions Stefano Brivio
@ 2019-06-08 18:12 ` Stefano Brivio
  2019-06-10 21:31   ` David Ahern
  2019-06-08 18:12 ` [PATCH net v3 2/2] ip6_fib: Don't discard nodes with valid routing information in fib6_locate_1() Stefano Brivio
  2019-06-10 21:38 ` [PATCH net v3 0/2] ipv6: Fix listing and flushing of cached route exceptions David Ahern
  2 siblings, 1 reply; 13+ messages in thread
From: Stefano Brivio @ 2019-06-08 18:12 UTC (permalink / raw)
  To: David Miller
  Cc: Jianlin Shi, Wei Wang, David Ahern, Martin KaFai Lau,
	Eric Dumazet, Matti Vaittinen, netdev

Since commit 2b760fcf5cfb ("ipv6: hook up exception table to store dst
cache"), route exceptions reside in a separate hash table, and won't be
found by walking the FIB, so they won't be dumped to userspace on a
RTM_GETROUTE message.

This causes 'ip -6 route list cache' and 'ip -6 route flush cache' to
have no function anymore:

 # ip -6 route get fc00:3::1
 fc00:3::1 via fc00:1::2 dev veth_A-R1 src fc00:1::1 metric 1024 expires 539sec mtu 1400 pref medium
 # ip -6 route get fc00:4::1
 fc00:4::1 via fc00:2::2 dev veth_A-R2 src fc00:2::1 metric 1024 expires 536sec mtu 1500 pref medium
 # ip -6 route list cache
 # ip -6 route flush cache
 # ip -6 route get fc00:3::1
 fc00:3::1 via fc00:1::2 dev veth_A-R1 src fc00:1::1 metric 1024 expires 520sec mtu 1400 pref medium
 # ip -6 route get fc00:4::1
 fc00:4::1 via fc00:2::2 dev veth_A-R2 src fc00:2::1 metric 1024 expires 519sec mtu 1500 pref medium

because iproute2 lists cached routes using RTM_GETROUTE, and flushes them
by listing all the routes, and deleting them with RTM_DELROUTE one by one.

Look up exceptions in the hash table associated with the current fib6_info
in rt6_dump_route(), and, if present and not expired, add them to the
dump.

We might be unable to dump all the entries for a given node in a single
message, so keep track of how many entries were handled for the current
node in fib6_walker, and skip that amount in case we start from the same
partially dumped node.

Re-allow userspace to get FIB results by passing the RTM_F_CLONED flag as
filter, by reverting commit 08e814c9e8eb ("net/ipv6: Bail early if user
only wants cloned entries"). Note that this is needed only for 'ip -6
route list cache': on a flush command, iproute2 doesn't pass RTM_F_CLONED.
Due to this inconsistency, we can't filter on RTM_F_CLONED, because a
flush command would not set the flag in the dump request and get no routes
to flush.

Also note that iproute2 will actually filter unwanted routes (i.e. print
exceptions iff the 'cache' specifier is given).

To avoid dumping exceptions if not requested, we can, in the future, add
support for NLM_F_MATCH as described by RFC 3549. This would also require
some changes in iproute2: whenever a 'cache' argument is given,
RTM_F_CLONED should be set in the dump request and, when filtering in the
kernel is desired, NLM_F_MATCH should be also passed. We can then signal
filtering with the NLM_F_DUMP_FILTERED whenever a NLM_F_MATCH flag caused
it.

To flush cached routes, a procfs entry could be introduced instead: that's
how it works for IPv4. We already have a rt6_flush_exception() function
ready to be wired to it. However, this would not solve the issue for
listing, and wouldn't fix the issue with current and previous versions of
iproute2.

v3:
  - more descriptive comment about expired exceptions in rt6_dump_route()
  - swap return values of rt6_dump_route() (suggested by Martin Lau)
  - don't zero skip_in_node in case we don't dump anything in a given pass
    (also suggested by Martin Lau)
  - remove check on RTM_F_CLONED altogether: in the current UAPI semantic,
    it's just a flag to indicate the route was cloned, not to filter on
    routes

v2: Add tracking of number of entries to be skipped in current node after
    a partial dump. As we restart from the same node, if not all the
    exceptions for a given node fit in a single message, the dump will
    not terminate, as suggested by Martin Lau. This is a concrete
    possibility, setting up a big number of exceptions for the same route
    actually causes the issue, suggested by David Ahern.

Reported-by: Jianlin Shi <jishi@redhat.com>
Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
This will cause a non-trivial conflict with commit cc5c073a693f
("ipv6: Move exception bucket to fib6_nh") on net-next. I can submit
an equivalent patch against net-next, if it helps.

 include/net/ip6_fib.h   |  1 +
 include/net/ip6_route.h |  2 +-
 net/ipv6/ip6_fib.c      | 21 ++++++++-----
 net/ipv6/route.c        | 67 ++++++++++++++++++++++++++++++++++++-----
 4 files changed, 76 insertions(+), 15 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 855b352b660f..5909a9d8ff67 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -312,6 +312,7 @@ struct fib6_walker {
 	enum fib6_walk_state state;
 	unsigned int skip;
 	unsigned int count;
+	unsigned int skip_in_node;
 	int (*func)(struct fib6_walker *);
 	void *args;
 };
diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 4790beaa86e0..b66c4aac56ab 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -178,7 +178,7 @@ struct rt6_rtnl_dump_arg {
 	struct fib_dump_filter filter;
 };
 
-int rt6_dump_route(struct fib6_info *f6i, void *p_arg);
+int rt6_dump_route(struct fib6_info *f6i, void *p_arg, unsigned int skip);
 void rt6_mtu_change(struct net_device *dev, unsigned int mtu);
 void rt6_remove_prefsrc(struct inet6_ifaddr *ifp);
 void rt6_clean_tohost(struct net *net, struct in6_addr *gateway);
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 9180c8b6f764..73dbad54baea 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -469,12 +469,19 @@ static int fib6_dump_node(struct fib6_walker *w)
 	struct fib6_info *rt;
 
 	for_each_fib6_walker_rt(w) {
-		res = rt6_dump_route(rt, w->args);
-		if (res < 0) {
+		res = rt6_dump_route(rt, w->args, w->skip_in_node);
+		if (res >= 0) {
 			/* Frame is full, suspend walking */
 			w->leaf = rt;
+
+			/* We'll restart from this node, so if some routes were
+			 * already dumped, skip them next time.
+			 */
+			w->skip_in_node += res;
+
 			return 1;
 		}
+		w->skip_in_node = 0;
 
 		/* Multipath routes are dumped in one route with the
 		 * RTA_MULTIPATH attribute. Jump 'rt' to point to the
@@ -526,6 +533,7 @@ static int fib6_dump_table(struct fib6_table *table, struct sk_buff *skb,
 	if (cb->args[4] == 0) {
 		w->count = 0;
 		w->skip = 0;
+		w->skip_in_node = 0;
 
 		spin_lock_bh(&table->tb6_lock);
 		res = fib6_walk(net, w);
@@ -541,6 +549,7 @@ static int fib6_dump_table(struct fib6_table *table, struct sk_buff *skb,
 			w->state = FWS_INIT;
 			w->node = w->root;
 			w->skip = w->count;
+			w->skip_in_node = 0;
 		} else
 			w->skip = 0;
 
@@ -577,13 +586,10 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 	} else if (nlmsg_len(nlh) >= sizeof(struct rtmsg)) {
 		struct rtmsg *rtm = nlmsg_data(nlh);
 
-		arg.filter.flags = rtm->rtm_flags & (RTM_F_PREFIX|RTM_F_CLONED);
+		if (rtm->rtm_flags & RTM_F_PREFIX)
+			arg.filter.flags = RTM_F_PREFIX;
 	}
 
-	/* fib entries are never clones */
-	if (arg.filter.flags & RTM_F_CLONED)
-		goto out;
-
 	w = (void *)cb->args[2];
 	if (!w) {
 		/* New dump:
@@ -2041,6 +2047,7 @@ static void fib6_clean_tree(struct net *net, struct fib6_node *root,
 	c.w.func = fib6_clean_node;
 	c.w.count = 0;
 	c.w.skip = 0;
+	c.w.skip_in_node = 0;
 	c.func = func;
 	c.sernum = sernum;
 	c.arg = arg;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 0f60eb3a2873..45ada000a98e 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -4854,33 +4854,86 @@ static bool fib6_info_uses_dev(const struct fib6_info *f6i,
 	return false;
 }
 
-int rt6_dump_route(struct fib6_info *rt, void *p_arg)
+/* Return -1 if done with node, number of handled routes on partial dump */
+int rt6_dump_route(struct fib6_info *rt, void *p_arg, unsigned int skip)
 {
 	struct rt6_rtnl_dump_arg *arg = (struct rt6_rtnl_dump_arg *) p_arg;
 	struct fib_dump_filter *filter = &arg->filter;
+	struct rt6_exception_bucket *bucket;
 	unsigned int flags = NLM_F_MULTI;
+	struct rt6_exception *rt6_ex;
 	struct net *net = arg->net;
+	int i, count = 0;
 
 	if (rt == net->ipv6.fib6_null_entry)
-		return 0;
+		return -1;
 
 	if ((filter->flags & RTM_F_PREFIX) &&
 	    !(rt->fib6_flags & RTF_PREFIX_RT)) {
 		/* success since this is not a prefix route */
-		return 1;
+		return -1;
 	}
 	if (filter->filter_set) {
 		if ((filter->rt_type && rt->fib6_type != filter->rt_type) ||
 		    (filter->dev && !fib6_info_uses_dev(rt, filter->dev)) ||
 		    (filter->protocol && rt->fib6_protocol != filter->protocol)) {
-			return 1;
+			return -1;
 		}
 		flags |= NLM_F_DUMP_FILTERED;
 	}
 
-	return rt6_fill_node(net, arg->skb, rt, NULL, NULL, NULL, 0,
-			     RTM_NEWROUTE, NETLINK_CB(arg->cb->skb).portid,
-			     arg->cb->nlh->nlmsg_seq, flags);
+	if (skip) {
+		skip--;
+	} else {
+		if (rt6_fill_node(net, arg->skb, rt, NULL, NULL, NULL, 0,
+				  RTM_NEWROUTE, NETLINK_CB(arg->cb->skb).portid,
+				  arg->cb->nlh->nlmsg_seq, flags)) {
+			return 0;
+		}
+
+		count++;
+	}
+
+	bucket = rcu_dereference(rt->rt6i_exception_bucket);
+	if (!bucket)
+		return -1;
+
+	for (i = 0; i < FIB6_EXCEPTION_BUCKET_SIZE; i++) {
+		hlist_for_each_entry(rt6_ex, &bucket->chain, hlist) {
+			if (skip) {
+				skip--;
+				continue;
+			}
+
+			/* Expiration of entries doesn't bump sernum, insertion
+			 * does. Removal is triggered by insertion, so we can
+			 * rely on the fact that if entries change between two
+			 * partial dumps, this node is scanned again completely,
+			 * see rt6_insert_exception() and fib6_dump_table().
+			 *
+			 * Count expired entries we go through as handled
+			 * entries that we'll skip next time, in case of partial
+			 * node dump. Otherwise, if entries expire meanwhile,
+			 * we'll skip the wrong amount.
+			 */
+			if (rt6_check_expired(rt6_ex->rt6i)) {
+				count++;
+				continue;
+			}
+
+			if (rt6_fill_node(net, arg->skb, rt, &rt6_ex->rt6i->dst,
+					  NULL, NULL, 0, RTM_NEWROUTE,
+					  NETLINK_CB(arg->cb->skb).portid,
+					  arg->cb->nlh->nlmsg_seq, flags)) {
+				return count;
+			}
+
+			count++;
+		}
+		bucket++;
+	}
+
+	return -1;
 }
 
 static int inet6_rtm_valid_getroute_req(struct sk_buff *skb,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net v3 2/2] ip6_fib: Don't discard nodes with valid routing information in fib6_locate_1()
  2019-06-08 18:12 [PATCH net v3 0/2] ipv6: Fix listing and flushing of cached route exceptions Stefano Brivio
  2019-06-08 18:12 ` [PATCH net v3 1/2] ipv6: Dump route exceptions too in rt6_dump_route() Stefano Brivio
@ 2019-06-08 18:12 ` Stefano Brivio
  2019-06-10 21:38 ` [PATCH net v3 0/2] ipv6: Fix listing and flushing of cached route exceptions David Ahern
  2 siblings, 0 replies; 13+ messages in thread
From: Stefano Brivio @ 2019-06-08 18:12 UTC (permalink / raw)
  To: David Miller
  Cc: Jianlin Shi, Wei Wang, David Ahern, Martin KaFai Lau,
	Eric Dumazet, Matti Vaittinen, netdev

When we perform an inexact match on FIB nodes via fib6_locate_1(), longer
prefixes will be preferred to shorter ones. However, it might happen that
a node, with higher fn_bit value than some other, has no valid routing
information.

In this case, we'll pick that node, but it will be discarded by the check
on RTN_RTINFO in fib6_locate(), and we might miss nodes with valid routing
information but with lower fn_bit value.

This is apparent when a routing exception is created for a default route:
 # ip -6 route list
 fc00:1::/64 dev veth_A-R1 proto kernel metric 256 pref medium
 fc00:2::/64 dev veth_A-R2 proto kernel metric 256 pref medium
 fc00:4::1 via fc00:2::2 dev veth_A-R2 metric 1024 pref medium
 fe80::/64 dev veth_A-R1 proto kernel metric 256 pref medium
 fe80::/64 dev veth_A-R2 proto kernel metric 256 pref medium
 default via fc00:1::2 dev veth_A-R1 metric 1024 pref medium
 # ip -6 route list cache
 fc00:4::1 via fc00:2::2 dev veth_A-R2 metric 1024 expires 593sec mtu 1500 pref medium
 fc00:3::1 via fc00:1::2 dev veth_A-R1 metric 1024 expires 593sec mtu 1500 pref medium
 # ip -6 route flush cache    # node for default route is discarded
 Failed to send flush request: No such process
 # ip -6 route list cache
 fc00:3::1 via fc00:1::2 dev veth_A-R1 metric 1024 expires 586sec mtu 1500 pref medium

Check right away if the node has a RTN_RTINFO flag, before replacing the
'prev' pointer, that indicates the longest matching prefix found so far.

Fixes: 38fbeeeeccdb ("ipv6: prepare fib6_locate() for exception table")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
v3: No changes

v2: No changes

 net/ipv6/ip6_fib.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 73dbad54baea..1b68d0d5b6f1 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1542,7 +1542,8 @@ static struct fib6_node *fib6_locate_1(struct fib6_node *root,
 		if (plen == fn->fn_bit)
 			return fn;
 
-		prev = fn;
+		if (fn->fn_flags & RTN_RTINFO)
+			prev = fn;
 
 next:
 		/*
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH net v3 1/2] ipv6: Dump route exceptions too in rt6_dump_route()
  2019-06-08 18:12 ` [PATCH net v3 1/2] ipv6: Dump route exceptions too in rt6_dump_route() Stefano Brivio
@ 2019-06-10 21:31   ` David Ahern
  2019-06-10 21:45     ` Stefano Brivio
  0 siblings, 1 reply; 13+ messages in thread
From: David Ahern @ 2019-06-10 21:31 UTC (permalink / raw)
  To: Stefano Brivio, David Miller
  Cc: Jianlin Shi, Wei Wang, Martin KaFai Lau, Eric Dumazet,
	Matti Vaittinen, netdev

On 6/8/19 12:12 PM, Stefano Brivio wrote:
> To avoid dumping exceptions if not requested, we can, in the future, add
> support for NLM_F_MATCH as described by RFC 3549. This would also require
> some changes in iproute2: whenever a 'cache' argument is given,
> RTM_F_CLONED should be set in the dump request and, when filtering in the
> kernel is desired, NLM_F_MATCH should be also passed. We can then signal
> filtering with the NLM_F_DUMP_FILTERED whenever a NLM_F_MATCH flag caused
> it.

NLM_F_MATCH is set today. iproute2 for example uses NLM_F_DUMP for dump
requests and NLM_F_DUMP is defined as:

#define NLM_F_DUMP      (NLM_F_ROOT|NLM_F_MATCH)

further, the kernel already supports kernel side filtering now for
routes. See ip_valid_fib_dump_req.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net v3 0/2] ipv6: Fix listing and flushing of cached route exceptions
  2019-06-08 18:12 [PATCH net v3 0/2] ipv6: Fix listing and flushing of cached route exceptions Stefano Brivio
  2019-06-08 18:12 ` [PATCH net v3 1/2] ipv6: Dump route exceptions too in rt6_dump_route() Stefano Brivio
  2019-06-08 18:12 ` [PATCH net v3 2/2] ip6_fib: Don't discard nodes with valid routing information in fib6_locate_1() Stefano Brivio
@ 2019-06-10 21:38 ` David Ahern
  2019-06-10 21:50   ` Martin Lau
  2019-06-10 21:53   ` Stefano Brivio
  2 siblings, 2 replies; 13+ messages in thread
From: David Ahern @ 2019-06-10 21:38 UTC (permalink / raw)
  To: Stefano Brivio, David Miller
  Cc: Jianlin Shi, Wei Wang, Martin KaFai Lau, Eric Dumazet,
	Matti Vaittinen, netdev

On 6/8/19 12:12 PM, Stefano Brivio wrote:
> The commands 'ip -6 route list cache' and 'ip -6 route flush cache'
> don't work at all after route exceptions have been moved to a separate
> hash table in commit 2b760fcf5cfb ("ipv6: hook up exception table to store
> dst cache"). Fix that.

The breakage is the limited ability to remove exceptions. Yes, you can
delete a v6 exception route if you know it exists. Without the ability
to list them, you have to guess.

The ability to list exceptions was deleted 2 years ago with 4.15. So far
no one has complained that exceptions do not show up in route dumps.
Rather than perturb the system again and worse with different behaviors,
in dot releases of stable trees, I think it would be better to converge
on consistent behavior between v4 and v6. By that I mean without the
CLONED flag, no exceptions are returned (default FIB dump). With the
CLONED flag only exceptions are returned.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net v3 1/2] ipv6: Dump route exceptions too in rt6_dump_route()
  2019-06-10 21:31   ` David Ahern
@ 2019-06-10 21:45     ` Stefano Brivio
  2019-06-10 21:47       ` David Ahern
  0 siblings, 1 reply; 13+ messages in thread
From: Stefano Brivio @ 2019-06-10 21:45 UTC (permalink / raw)
  To: David Ahern
  Cc: David Miller, Jianlin Shi, Wei Wang, Martin KaFai Lau,
	Eric Dumazet, Matti Vaittinen, netdev

On Mon, 10 Jun 2019 15:31:37 -0600
David Ahern <dsahern@gmail.com> wrote:

> On 6/8/19 12:12 PM, Stefano Brivio wrote:
> > To avoid dumping exceptions if not requested, we can, in the future, add
> > support for NLM_F_MATCH as described by RFC 3549. This would also require
> > some changes in iproute2: whenever a 'cache' argument is given,
> > RTM_F_CLONED should be set in the dump request and, when filtering in the
> > kernel is desired, NLM_F_MATCH should be also passed. We can then signal
> > filtering with the NLM_F_DUMP_FILTERED whenever a NLM_F_MATCH flag caused
> > it.  
> 
> NLM_F_MATCH is set today. iproute2 for example uses NLM_F_DUMP for dump
> requests and NLM_F_DUMP is defined as:
> 
> #define NLM_F_DUMP      (NLM_F_ROOT|NLM_F_MATCH)
> 
> further, the kernel already supports kernel side filtering now for
> routes. See ip_valid_fib_dump_req.

Indeed, we don't have to add much: just make this work for IPv4 too,
honour NLM_F_MATCH, and skip filtering (further optimisation) on
NLM_F_DUMP_FILTERED in iproute2 (ip neigh already uses that).

-- 
Stefano

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net v3 1/2] ipv6: Dump route exceptions too in rt6_dump_route()
  2019-06-10 21:45     ` Stefano Brivio
@ 2019-06-10 21:47       ` David Ahern
  2019-06-10 21:55         ` Stefano Brivio
  0 siblings, 1 reply; 13+ messages in thread
From: David Ahern @ 2019-06-10 21:47 UTC (permalink / raw)
  To: Stefano Brivio
  Cc: David Miller, Jianlin Shi, Wei Wang, Martin KaFai Lau,
	Eric Dumazet, Matti Vaittinen, netdev

On 6/10/19 3:45 PM, Stefano Brivio wrote:
> Indeed, we don't have to add much: just make this work for IPv4 too,
> honour NLM_F_MATCH, and skip filtering (further optimisation) on
> NLM_F_DUMP_FILTERED in iproute2 (ip neigh already uses that).

you can't. Not all of iproute2's filter options are handled by the
kernel (and nor should they be).

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net v3 0/2] ipv6: Fix listing and flushing of cached route exceptions
  2019-06-10 21:38 ` [PATCH net v3 0/2] ipv6: Fix listing and flushing of cached route exceptions David Ahern
@ 2019-06-10 21:50   ` Martin Lau
  2019-06-10 21:53   ` Stefano Brivio
  1 sibling, 0 replies; 13+ messages in thread
From: Martin Lau @ 2019-06-10 21:50 UTC (permalink / raw)
  To: David Ahern
  Cc: Stefano Brivio, David Miller, Jianlin Shi, Wei Wang,
	Eric Dumazet, Matti Vaittinen, netdev

On Mon, Jun 10, 2019 at 03:38:06PM -0600, David Ahern wrote:
> The ability to list exceptions was deleted 2 years ago with 4.15. So far
> no one has complained that exceptions do not show up in route dumps.
> Rather than perturb the system again and worse with different behaviors,
> in dot releases of stable trees, I think it would be better to converge
> on consistent behavior between v4 and v6. By that I mean without the
> CLONED flag, no exceptions are returned (default FIB dump). With the
> CLONED flag only exceptions are returned.
+1

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net v3 0/2] ipv6: Fix listing and flushing of cached route exceptions
  2019-06-10 21:38 ` [PATCH net v3 0/2] ipv6: Fix listing and flushing of cached route exceptions David Ahern
  2019-06-10 21:50   ` Martin Lau
@ 2019-06-10 21:53   ` Stefano Brivio
  2019-06-10 22:47     ` Stefano Brivio
  1 sibling, 1 reply; 13+ messages in thread
From: Stefano Brivio @ 2019-06-10 21:53 UTC (permalink / raw)
  To: David Ahern
  Cc: David Miller, Jianlin Shi, Wei Wang, Martin KaFai Lau,
	Eric Dumazet, Matti Vaittinen, netdev

On Mon, 10 Jun 2019 15:38:06 -0600
David Ahern <dsahern@gmail.com> wrote:

> On 6/8/19 12:12 PM, Stefano Brivio wrote:
> > The commands 'ip -6 route list cache' and 'ip -6 route flush cache'
> > don't work at all after route exceptions have been moved to a separate
> > hash table in commit 2b760fcf5cfb ("ipv6: hook up exception table to store
> > dst cache"). Fix that.  
> 
> The breakage is the limited ability to remove exceptions. Yes, you can
> delete a v6 exception route if you know it exists. Without the ability
> to list them, you have to guess.
> 
> The ability to list exceptions was deleted 2 years ago with 4.15. So far
> no one has complained that exceptions do not show up in route dumps.

I am doing it right now...

> Rather than perturb the system again and worse with different behaviors,

Well, I'm just trying to restore the behaviour before 2b760fcf5cfb
it's not "different".

I don't think 2b760fcf5cfb intended to break iproute2 like that.

> in dot releases of stable trees, I think it would be better to converge
> on consistent behavior between v4 and v6. By that I mean without the
> CLONED flag, no exceptions are returned (default FIB dump). With the
> CLONED flag only exceptions are returned.

Again, this needs a change in iproute2, because RTM_F_CLONED is *not*
passed on 'flush'. And sure, let's *also* do that, but not everybody
runs recent versions of iproute2.

-- 
Stefano

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net v3 1/2] ipv6: Dump route exceptions too in rt6_dump_route()
  2019-06-10 21:47       ` David Ahern
@ 2019-06-10 21:55         ` Stefano Brivio
  0 siblings, 0 replies; 13+ messages in thread
From: Stefano Brivio @ 2019-06-10 21:55 UTC (permalink / raw)
  To: David Ahern
  Cc: David Miller, Jianlin Shi, Wei Wang, Martin KaFai Lau,
	Eric Dumazet, Matti Vaittinen, netdev

On Mon, 10 Jun 2019 15:47:16 -0600
David Ahern <dsahern@gmail.com> wrote:

> On 6/10/19 3:45 PM, Stefano Brivio wrote:
> > Indeed, we don't have to add much: just make this work for IPv4 too,
> > honour NLM_F_MATCH, and skip filtering (further optimisation) on
> > NLM_F_DUMP_FILTERED in iproute2 (ip neigh already uses that).  
> 
> you can't. Not all of iproute2's filter options are handled by the
> kernel (and nor should they be).

Right, of course. Discard that last part.

-- 
Stefano

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net v3 0/2] ipv6: Fix listing and flushing of cached route exceptions
  2019-06-10 21:53   ` Stefano Brivio
@ 2019-06-10 22:47     ` Stefano Brivio
  2019-06-11 20:19       ` Martin Lau
  0 siblings, 1 reply; 13+ messages in thread
From: Stefano Brivio @ 2019-06-10 22:47 UTC (permalink / raw)
  To: David Ahern, Martin KaFai Lau
  Cc: David Miller, Jianlin Shi, Wei Wang, Eric Dumazet,
	Matti Vaittinen, netdev

On Mon, 10 Jun 2019 23:53:15 +0200
Stefano Brivio <sbrivio@redhat.com> wrote:

> On Mon, 10 Jun 2019 15:38:06 -0600
> David Ahern <dsahern@gmail.com> wrote:
> 
> > in dot releases of stable trees, I think it would be better to converge
> > on consistent behavior between v4 and v6. By that I mean without the
> > CLONED flag, no exceptions are returned (default FIB dump). With the
> > CLONED flag only exceptions are returned.  
> 
> Again, this needs a change in iproute2, because RTM_F_CLONED is *not*
> passed on 'flush'. And sure, let's *also* do that, but not everybody
> runs recent versions of iproute2.

One thing that sounds a bit more acceptable to me is:

- dump (in IPv4 and IPv6):
  - regular routes only, if !RTM_F_CLONED and NLM_F_MATCH
  - exceptions only, if RTM_F_CLONED and NLM_F_MATCH
  - everything if !NLM_F_MATCH

- fix iproute2 so that RTM_F_CLONED is passed on 'flush cache', or
  don't pass NLM_F_MATCH in that case

this way, the kernel respects the intended semantics of flags, and we
fix a bug in iproute2 (that was always present).

I think it's not ideal, because the kernel unexpectedly changed the
behaviour and we're not guaranteeing that older iproute2 works. The
fact it was broken for two years is probably a partial excuse for this,
though.

What do you think? I'll prepare a v4 for net-next if we all agree.

I'm not entirely sure which trees I should target. I guess this
introduces a feature in the kernel, so net-next, and fixes a bug in
iproute2, so iproute2.git?

-- 
Stefano

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net v3 0/2] ipv6: Fix listing and flushing of cached route exceptions
  2019-06-10 22:47     ` Stefano Brivio
@ 2019-06-11 20:19       ` Martin Lau
  2019-06-11 21:09         ` David Ahern
  0 siblings, 1 reply; 13+ messages in thread
From: Martin Lau @ 2019-06-11 20:19 UTC (permalink / raw)
  To: Stefano Brivio
  Cc: David Ahern, David Miller, Jianlin Shi, Wei Wang, Eric Dumazet,
	Matti Vaittinen, netdev

On Tue, Jun 11, 2019 at 12:47:58AM +0200, Stefano Brivio wrote:
> On Mon, 10 Jun 2019 23:53:15 +0200
> Stefano Brivio <sbrivio@redhat.com> wrote:
> 
> > On Mon, 10 Jun 2019 15:38:06 -0600
> > David Ahern <dsahern@gmail.com> wrote:
> > 
> > > in dot releases of stable trees, I think it would be better to converge
> > > on consistent behavior between v4 and v6. By that I mean without the
> > > CLONED flag, no exceptions are returned (default FIB dump). With the
> > > CLONED flag only exceptions are returned.  
> > 
> > Again, this needs a change in iproute2, because RTM_F_CLONED is *not*
> > passed on 'flush'. And sure, let's *also* do that, but not everybody
> > runs recent versions of iproute2.
> 
> One thing that sounds a bit more acceptable to me is:
> 
> - dump (in IPv4 and IPv6):
>   - regular routes only, if !RTM_F_CLONED and NLM_F_MATCH
>   - exceptions only, if RTM_F_CLONED and NLM_F_MATCH
That seems reasonable since DavidAhern pointed out iproute2 already has
#define NLM_F_DUMP      (NLM_F_ROOT|NLM_F_MATCH)

>   - everything if !NLM_F_MATCH
I am not sure how may the kernel change looks like.  At least I don't
see the current ipv6/route.c or ipv6/ip6_fib.c is handling
nlmsg_flags.  I would defer to DavidAhern for comment.

> 
> - fix iproute2 so that RTM_F_CLONED is passed on 'flush cache', or
I would just pass RTM_F_CLONED with NLM_F_DUMP.

>   don't pass NLM_F_MATCH in that case
> 
> this way, the kernel respects the intended semantics of flags, and we
> fix a bug in iproute2 (that was always present).
> 
> I think it's not ideal, because the kernel unexpectedly changed the
> behaviour and we're not guaranteeing that older iproute2 works. The
> fact it was broken for two years is probably a partial excuse for this,
> though.
> 
> What do you think? I'll prepare a v4 for net-next if we all agree.
> 
> I'm not entirely sure which trees I should target. I guess this
> introduces a feature in the kernel, so net-next, and fixes a bug in
> iproute2, so iproute2.git?
> 
> -- 
> Stefano

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net v3 0/2] ipv6: Fix listing and flushing of cached route exceptions
  2019-06-11 20:19       ` Martin Lau
@ 2019-06-11 21:09         ` David Ahern
  0 siblings, 0 replies; 13+ messages in thread
From: David Ahern @ 2019-06-11 21:09 UTC (permalink / raw)
  To: Martin Lau, Stefano Brivio
  Cc: David Miller, Jianlin Shi, Wei Wang, Eric Dumazet,
	Matti Vaittinen, netdev

On 6/11/19 2:19 PM, Martin Lau wrote:
> On Tue, Jun 11, 2019 at 12:47:58AM +0200, Stefano Brivio wrote:
>> On Mon, 10 Jun 2019 23:53:15 +0200
>> Stefano Brivio <sbrivio@redhat.com> wrote:
>>
>>> On Mon, 10 Jun 2019 15:38:06 -0600
>>> David Ahern <dsahern@gmail.com> wrote:
>>>
>>>> in dot releases of stable trees, I think it would be better to converge
>>>> on consistent behavior between v4 and v6. By that I mean without the
>>>> CLONED flag, no exceptions are returned (default FIB dump). With the
>>>> CLONED flag only exceptions are returned.  
>>>
>>> Again, this needs a change in iproute2, because RTM_F_CLONED is *not*
>>> passed on 'flush'. And sure, let's *also* do that, but not everybody
>>> runs recent versions of iproute2.
>>
>> One thing that sounds a bit more acceptable to me is:
>>
>> - dump (in IPv4 and IPv6):
>>   - regular routes only, if !RTM_F_CLONED and NLM_F_MATCH
>>   - exceptions only, if RTM_F_CLONED and NLM_F_MATCH
> That seems reasonable since DavidAhern pointed out iproute2 already has
> #define NLM_F_DUMP      (NLM_F_ROOT|NLM_F_MATCH)
> 
>>   - everything if !NLM_F_MATCH
> I am not sure how may the kernel change looks like.  At least I don't
> see the current ipv6/route.c or ipv6/ip6_fib.c is handling
> nlmsg_flags.  I would defer to DavidAhern for comment.

We might be battling change histories in 2 different code bases. We
should compare behaviors of kernel and iproute2 for 4.14 (pre-change),
4.15 (change), 4.19 (LTS), 5.0 (strict checking) and 5.2 and then look
at what the proposed kernel change does with the various iproute2 versions.


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-06-11 21:09 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-08 18:12 [PATCH net v3 0/2] ipv6: Fix listing and flushing of cached route exceptions Stefano Brivio
2019-06-08 18:12 ` [PATCH net v3 1/2] ipv6: Dump route exceptions too in rt6_dump_route() Stefano Brivio
2019-06-10 21:31   ` David Ahern
2019-06-10 21:45     ` Stefano Brivio
2019-06-10 21:47       ` David Ahern
2019-06-10 21:55         ` Stefano Brivio
2019-06-08 18:12 ` [PATCH net v3 2/2] ip6_fib: Don't discard nodes with valid routing information in fib6_locate_1() Stefano Brivio
2019-06-10 21:38 ` [PATCH net v3 0/2] ipv6: Fix listing and flushing of cached route exceptions David Ahern
2019-06-10 21:50   ` Martin Lau
2019-06-10 21:53   ` Stefano Brivio
2019-06-10 22:47     ` Stefano Brivio
2019-06-11 20:19       ` Martin Lau
2019-06-11 21:09         ` David Ahern

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).