* [PATCH net] ipv4: Update exception handling for multipath routes via same device
@ 2020-09-15 3:03 David Ahern
2020-09-15 22:46 ` David Miller
2020-10-07 15:20 ` Tobias Brunner
0 siblings, 2 replies; 4+ messages in thread
From: David Ahern @ 2020-09-15 3:03 UTC (permalink / raw)
To: netdev; +Cc: davem, kuba, David Ahern, Kfir Itzhak
Kfir reported that pmtu exceptions are not created properly for
deployments where multipath routes use the same device.
After some digging I see 2 compounding problems:
1. ip_route_output_key_hash_rcu is updating the flowi4_oif *after*
the route lookup. This is the second use case where this has
been a problem (the first is related to use of vti devices with
VRF). I can not find any reason for the oif to be changed after the
lookup; the code goes back to the start of git. It does not seem
logical so remove it.
2. fib_lookups for exceptions do not call fib_select_path to handle
multipath route selection based on the hash.
The end result is that the fib_lookup used to add the exception
always creates it based using the first leg of the route.
An example topology showing the problem:
| host1
+------+
| eth0 | .209
+------+
|
+------+
switch | br0 |
+------+
|
+---------+---------+
| host2 | host3
+------+ +------+
| eth0 | .250 | eth0 | 192.168.252.252
+------+ +------+
+-----+ +-----+
| vti | .2 | vti | 192.168.247.3
+-----+ +-----+
\ /
=================================
tunnels
192.168.247.1/24
for h in host1 host2 host3; do
ip netns add ${h}
ip -netns ${h} link set lo up
ip netns exec ${h} sysctl -wq net.ipv4.ip_forward=1
done
ip netns add switch
ip -netns switch li set lo up
ip -netns switch link add br0 type bridge stp 0
ip -netns switch link set br0 up
for n in 1 2 3; do
ip -netns switch link add eth-sw type veth peer name eth-h${n}
ip -netns switch li set eth-h${n} master br0 up
ip -netns switch li set eth-sw netns host${n} name eth0
done
ip -netns host1 addr add 192.168.252.209/24 dev eth0
ip -netns host1 link set dev eth0 up
ip -netns host1 route add 192.168.247.0/24 \
nexthop via 192.168.252.250 dev eth0 nexthop via 192.168.252.252 dev eth0
ip -netns host2 addr add 192.168.252.250/24 dev eth0
ip -netns host2 link set dev eth0 up
ip -netns host2 addr add 192.168.252.252/24 dev eth0
ip -netns host3 link set dev eth0 up
ip netns add tunnel
ip -netns tunnel li set lo up
ip -netns tunnel li add br0 type bridge
ip -netns tunnel li set br0 up
for n in $(seq 11 20); do
ip -netns tunnel addr add dev br0 192.168.247.${n}/24
done
for n in 2 3
do
ip -netns tunnel link add vti${n} type veth peer name eth${n}
ip -netns tunnel link set eth${n} mtu 1360 master br0 up
ip -netns tunnel link set vti${n} netns host${n} mtu 1360 up
ip -netns host${n} addr add dev vti${n} 192.168.247.${n}/24
done
ip -netns tunnel ro add default nexthop via 192.168.247.2 nexthop via 192.168.247.3
ip netns exec host1 ping -M do -s 1400 -c3 -I 192.168.252.209 192.168.247.11
ip netns exec host1 ping -M do -s 1400 -c3 -I 192.168.252.209 192.168.247.15
ip -netns host1 ro ls cache
Before this patch the cache always shows exceptions against the first
leg in the multipath route; 192.168.252.250 per this example. Since the
hash has an initial random seed, you may need to vary the final octet
more than what is listed. In my tests, using addresses between 11 and 19
usually found 1 that used both legs.
With this patch, the cache will have exceptions for both legs.
Fixes: 4895c771c7f0 ("ipv4: Add FIB nexthop exceptions")
Reported-by: Kfir Itzhak <mastertheknife@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
---
net/ipv4/route.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index e5f210d00851..58642b29a499 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -786,8 +786,10 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow
neigh_event_send(n, NULL);
} else {
if (fib_lookup(net, fl4, &res, 0) == 0) {
- struct fib_nh_common *nhc = FIB_RES_NHC(res);
+ struct fib_nh_common *nhc;
+ fib_select_path(net, &res, fl4, skb);
+ nhc = FIB_RES_NHC(res);
update_or_create_fnhe(nhc, fl4->daddr, new_gw,
0, false,
jiffies + ip_rt_gc_timeout);
@@ -1013,6 +1015,7 @@ out: kfree_skb(skb);
static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
{
struct dst_entry *dst = &rt->dst;
+ struct net *net = dev_net(dst->dev);
u32 old_mtu = ipv4_mtu(dst);
struct fib_result res;
bool lock = false;
@@ -1033,9 +1036,11 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
return;
rcu_read_lock();
- if (fib_lookup(dev_net(dst->dev), fl4, &res, 0) == 0) {
- struct fib_nh_common *nhc = FIB_RES_NHC(res);
+ if (fib_lookup(net, fl4, &res, 0) == 0) {
+ struct fib_nh_common *nhc;
+ fib_select_path(net, &res, fl4, NULL);
+ nhc = FIB_RES_NHC(res);
update_or_create_fnhe(nhc, fl4->daddr, 0, mtu, lock,
jiffies + ip_rt_mtu_expires);
}
@@ -2668,8 +2673,6 @@ struct rtable *ip_route_output_key_hash_rcu(struct net *net, struct flowi4 *fl4,
fib_select_path(net, res, fl4, skb);
dev_out = FIB_RES_DEV(*res);
- fl4->flowi4_oif = dev_out->ifindex;
-
make_route:
rth = __mkroute_output(res, fl4, orig_oif, dev_out, flags);
--
2.24.3 (Apple Git-128)
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH net] ipv4: Update exception handling for multipath routes via same device
2020-09-15 3:03 [PATCH net] ipv4: Update exception handling for multipath routes via same device David Ahern
@ 2020-09-15 22:46 ` David Miller
2020-09-15 23:33 ` David Ahern
2020-10-07 15:20 ` Tobias Brunner
1 sibling, 1 reply; 4+ messages in thread
From: David Miller @ 2020-09-15 22:46 UTC (permalink / raw)
To: dsahern; +Cc: netdev, kuba, mastertheknife
From: David Ahern <dsahern@kernel.org>
Date: Mon, 14 Sep 2020 21:03:54 -0600
> Kfir reported that pmtu exceptions are not created properly for
> deployments where multipath routes use the same device.
>
> After some digging I see 2 compounding problems:
> 1. ip_route_output_key_hash_rcu is updating the flowi4_oif *after*
> the route lookup. This is the second use case where this has
> been a problem (the first is related to use of vti devices with
> VRF). I can not find any reason for the oif to be changed after the
> lookup; the code goes back to the start of git. It does not seem
> logical so remove it.
>
> 2. fib_lookups for exceptions do not call fib_select_path to handle
> multipath route selection based on the hash.
>
> The end result is that the fib_lookup used to add the exception
> always creates it based using the first leg of the route.
...
> Before this patch the cache always shows exceptions against the first
> leg in the multipath route; 192.168.252.250 per this example. Since the
> hash has an initial random seed, you may need to vary the final octet
> more than what is listed. In my tests, using addresses between 11 and 19
> usually found 1 that used both legs.
>
> With this patch, the cache will have exceptions for both legs.
>
> Fixes: 4895c771c7f0 ("ipv4: Add FIB nexthop exceptions")
> Reported-by: Kfir Itzhak <mastertheknife@gmail.com>
> Signed-off-by: David Ahern <dsahern@kernel.org>
Applied and queued up for -stable, thanks David.
The example topology and commands look like a good addition for
selftests perhaps?
Thanks again.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH net] ipv4: Update exception handling for multipath routes via same device
2020-09-15 22:46 ` David Miller
@ 2020-09-15 23:33 ` David Ahern
0 siblings, 0 replies; 4+ messages in thread
From: David Ahern @ 2020-09-15 23:33 UTC (permalink / raw)
To: David Miller, dsahern; +Cc: netdev, kuba, mastertheknife
On 9/15/20 4:46 PM, David Miller wrote:
>
> The example topology and commands look like a good addition for
> selftests perhaps?
>
Definitely. I plan to integrate it into pmtu.sh.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH net] ipv4: Update exception handling for multipath routes via same device
2020-09-15 3:03 [PATCH net] ipv4: Update exception handling for multipath routes via same device David Ahern
2020-09-15 22:46 ` David Miller
@ 2020-10-07 15:20 ` Tobias Brunner
1 sibling, 0 replies; 4+ messages in thread
From: Tobias Brunner @ 2020-10-07 15:20 UTC (permalink / raw)
To: David Ahern, netdev
Cc: davem, kuba, Kfir Itzhak, Steffen Klassert, Paul Wouters
Hi David,
> @@ -2668,8 +2673,6 @@ struct rtable *ip_route_output_key_hash_rcu(struct net *net, struct flowi4 *fl4,
> fib_select_path(net, res, fl4, skb);
>
> dev_out = FIB_RES_DEV(*res);
> - fl4->flowi4_oif = dev_out->ifindex;
> -
>
> make_route:
> rth = __mkroute_output(res, fl4, orig_oif, dev_out, flags);
This breaks some IPsec scenarios with interfaces in IPsec policies, an
example can be found under [1], where host moon configures policies with
eth0 as interface. Without this assignment, however, packets don't
match these policies anymore and are sent unprotected (or won't get
blocked by the drop policy).
Regards,
Tobias
[1] https://www.strongswan.org/testing/testresults/swanctl/manual-prio/
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2020-10-07 15:25 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-15 3:03 [PATCH net] ipv4: Update exception handling for multipath routes via same device David Ahern
2020-09-15 22:46 ` David Miller
2020-09-15 23:33 ` David Ahern
2020-10-07 15:20 ` Tobias Brunner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).