All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/3] l3mdev icmp error route lookup fixes
@ 2020-09-25 20:04 Mathieu Desnoyers
  2020-09-25 20:04 ` [RFC PATCH 1/3] ipv4/icmp: l3mdev: Perform icmp error route lookup on source device routing table (v2) Mathieu Desnoyers
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Mathieu Desnoyers @ 2020-09-25 20:04 UTC (permalink / raw)
  To: David Ahern, David S . Miller, netdev, Michael Jeanson
  Cc: linux-kernel, Mathieu Desnoyers

Hi,

Here is an updated series of fixes for ipv4 and ipv6 which which ensure
the route lookup is performed on the right routing table in VRF
configurations when sending TTL expired icmp errors (useful for
traceroute).

It includes tests for both ipv4 and ipv6.

These fixes address specifically address the code paths involved in
sending TTL expired icmp errors. As detailed in the individual commit
messages, those fixes do not address similar icmp errors related to
network namespaces and unreachable / fragmentation needed messages,
which appear to use different code paths.

The main changes since the last round are updates to the selftests.

Thanks,

Mathieu


Mathieu Desnoyers (2):
  ipv4/icmp: l3mdev: Perform icmp error route lookup on source device
    routing table (v2)
  ipv6/icmp: l3mdev: Perform icmp error route lookup on source device
    routing table (v2)

Michael Jeanson (1):
  selftests: Add VRF route leaking tests

 net/ipv4/icmp.c                               |  23 +-
 net/ipv6/icmp.c                               |   7 +-
 net/ipv6/ip6_output.c                         |   2 -
 tools/testing/selftests/net/Makefile          |   1 +
 .../selftests/net/vrf_route_leaking.sh        | 626 ++++++++++++++++++
 5 files changed, 653 insertions(+), 6 deletions(-)
 create mode 100755 tools/testing/selftests/net/vrf_route_leaking.sh

-- 
2.17.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [RFC PATCH 1/3] ipv4/icmp: l3mdev: Perform icmp error route lookup on source device routing table (v2)
  2020-09-25 20:04 [RFC PATCH 0/3] l3mdev icmp error route lookup fixes Mathieu Desnoyers
@ 2020-09-25 20:04 ` Mathieu Desnoyers
  2020-09-25 20:04 ` [RFC PATCH 2/3] ipv6/icmp: " Mathieu Desnoyers
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Mathieu Desnoyers @ 2020-09-25 20:04 UTC (permalink / raw)
  To: David Ahern, David S . Miller, netdev, Michael Jeanson
  Cc: linux-kernel, Mathieu Desnoyers

As per RFC792, ICMP errors should be sent to the source host.

However, in configurations with Virtual Routing and Forwarding tables,
looking up which routing table to use is currently done by using the
destination net_device.

commit 9d1a6c4ea43e ("net: icmp_route_lookup should use rt dev to
determine L3 domain") changes the interface passed to
l3mdev_master_ifindex() and inet_addr_type_dev_table() from skb_in->dev
to skb_dst(skb_in)->dev. This effectively uses the destination device
rather than the source device for choosing which routing table should be
used to lookup where to send the ICMP error.

Therefore, if the source and destination interfaces are within separate
VRFs, or one in the global routing table and the other in a VRF, looking
up the source host in the destination interface's routing table will
fail if the destination interface's routing table contains no route to
the source host.

One observable effect of this issue is that traceroute does not work in
the following cases:

- Route leaking between global routing table and VRF
- Route leaking between VRFs

Preferably use the source device routing table when sending ICMP error
messages. If no source device is set, fall-back on the destination
device routing table. Else, use the main routing table (index 0).

[ It has been pointed out that a similar issue may exist with ICMP
  errors triggered when forwarding between network namespaces. It would
  be worthwhile to investigate, but is outside of the scope of this
  investigation. ]

[ It has also been pointed out that a similar issue exists with
  unreachable / fragmentation needed messages, which can be triggered by
  changing the MTU of eth1 in r1 to 1400 and running:

  ip netns exec h1 ping -s 1450 -Mdo -c1 172.16.2.2

  Some investigation points to raw_icmp_error() and raw_err() as being
  involved in this last scenario. The focus of this patch is TTL expired
  ICMP messages, which go through icmp_route_lookup.
  Investigation of failure modes related to raw_icmp_error() is beyond
  this investigation's scope. ]

Fixes: 9d1a6c4ea43e ("net: icmp_route_lookup should use rt dev to determine L3 domain")
Link: https://tools.ietf.org/html/rfc792
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: David Ahern <dsahern@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
---
Changes since v1:
- Introduce icmp_get_route_lookup_dev.
---
 net/ipv4/icmp.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index cf36f955bfe6..9ea66d903c41 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -457,6 +457,23 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
 	local_bh_enable();
 }
 
+/*
+ * The device used for looking up which routing table to use for sending an ICMP
+ * error is preferably the source whenever it is set, which should ensure the
+ * icmp error can be sent to the source host, else lookup using the routing
+ * table of the destination device, else use the main routing table (index 0).
+ */
+static struct net_device *icmp_get_route_lookup_dev(struct sk_buff *skb)
+{
+	struct net_device *route_lookup_dev = NULL;
+
+	if (skb->dev)
+		route_lookup_dev = skb->dev;
+	else if (skb_dst(skb))
+		route_lookup_dev = skb_dst(skb)->dev;
+	return route_lookup_dev;
+}
+
 static struct rtable *icmp_route_lookup(struct net *net,
 					struct flowi4 *fl4,
 					struct sk_buff *skb_in,
@@ -465,6 +482,7 @@ static struct rtable *icmp_route_lookup(struct net *net,
 					int type, int code,
 					struct icmp_bxm *param)
 {
+	struct net_device *route_lookup_dev;
 	struct rtable *rt, *rt2;
 	struct flowi4 fl4_dec;
 	int err;
@@ -479,7 +497,8 @@ static struct rtable *icmp_route_lookup(struct net *net,
 	fl4->flowi4_proto = IPPROTO_ICMP;
 	fl4->fl4_icmp_type = type;
 	fl4->fl4_icmp_code = code;
-	fl4->flowi4_oif = l3mdev_master_ifindex(skb_dst(skb_in)->dev);
+	route_lookup_dev = icmp_get_route_lookup_dev(skb_in);
+	fl4->flowi4_oif = l3mdev_master_ifindex(route_lookup_dev);
 
 	security_skb_classify_flow(skb_in, flowi4_to_flowi(fl4));
 	rt = ip_route_output_key_hash(net, fl4, skb_in);
@@ -503,7 +522,7 @@ static struct rtable *icmp_route_lookup(struct net *net,
 	if (err)
 		goto relookup_failed;
 
-	if (inet_addr_type_dev_table(net, skb_dst(skb_in)->dev,
+	if (inet_addr_type_dev_table(net, route_lookup_dev,
 				     fl4_dec.saddr) == RTN_LOCAL) {
 		rt2 = __ip_route_output_key(net, &fl4_dec);
 		if (IS_ERR(rt2))
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [RFC PATCH 2/3] ipv6/icmp: l3mdev: Perform icmp error route lookup on source device routing table (v2)
  2020-09-25 20:04 [RFC PATCH 0/3] l3mdev icmp error route lookup fixes Mathieu Desnoyers
  2020-09-25 20:04 ` [RFC PATCH 1/3] ipv4/icmp: l3mdev: Perform icmp error route lookup on source device routing table (v2) Mathieu Desnoyers
@ 2020-09-25 20:04 ` Mathieu Desnoyers
  2020-09-25 20:04 ` [RFC PATCH 3/3] selftests: Add VRF route leaking tests Mathieu Desnoyers
  2020-10-05 16:30 ` [RFC PATCH 0/3] l3mdev icmp error route lookup fixes David Ahern
  3 siblings, 0 replies; 10+ messages in thread
From: Mathieu Desnoyers @ 2020-09-25 20:04 UTC (permalink / raw)
  To: David Ahern, David S . Miller, netdev, Michael Jeanson
  Cc: linux-kernel, Mathieu Desnoyers

As per RFC4443, the destination address field for ICMPv6 error messages
is copied from the source address field of the invoking packet.

In configurations with Virtual Routing and Forwarding tables, looking up
which routing table to use for sending ICMPv6 error messages is
currently done by using the destination net_device.

If the source and destination interfaces are within separate VRFs, or
one in the global routing table and the other in a VRF, looking up the
source address of the invoking packet in the destination interface's
routing table will fail if the destination interface's routing table
contains no route to the invoking packet's source address.

One observable effect of this issue is that traceroute6 does not work in
the following cases:

- Route leaking between global routing table and VRF
- Route leaking between VRFs

Use the source device routing table when sending ICMPv6 error
messages.

[ In the context of ipv4, it has been pointed out that a similar issue
  may exist with ICMP errors triggered when forwarding between network
  namespaces. It would be worthwhile to investigate whether ipv6 has
  similar issues, but is outside of the scope of this investigation. ]

[ Testing shows that similar issues exist with ipv6 unreachable /
  fragmentation needed messages.  However, investigation of this
  additional failure mode is beyond this investigation's scope. ]

Link: https://tools.ietf.org/html/rfc4443
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: David Ahern <dsahern@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
---
Changes since v1:
- Introduce icmp6_get_route_lookup_dev.
- Use skb->dev for routing table lookup, because it is guaranteed to be
  non-NULL.
---
 net/ipv6/icmp.c       | 7 +++++--
 net/ipv6/ip6_output.c | 2 --
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index a4e4912ad607..91209a2760aa 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -501,8 +501,11 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
 	if (__ipv6_addr_needs_scope_id(addr_type)) {
 		iif = icmp6_iif(skb);
 	} else {
-		dst = skb_dst(skb);
-		iif = l3mdev_master_ifindex(dst ? dst->dev : skb->dev);
+		/*
+		 * The source device is used for looking up which routing table
+		 * to use for sending an ICMP error.
+		 */
+		iif = l3mdev_master_ifindex(skb->dev);
 	}
 
 	/*
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index c78e67d7747f..cd623068de53 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -468,8 +468,6 @@ int ip6_forward(struct sk_buff *skb)
 	 *	check and decrement ttl
 	 */
 	if (hdr->hop_limit <= 1) {
-		/* Force OUTPUT device used as source address */
-		skb->dev = dst->dev;
 		icmpv6_send(skb, ICMPV6_TIME_EXCEED, ICMPV6_EXC_HOPLIMIT, 0);
 		__IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [RFC PATCH 3/3] selftests: Add VRF route leaking tests
  2020-09-25 20:04 [RFC PATCH 0/3] l3mdev icmp error route lookup fixes Mathieu Desnoyers
  2020-09-25 20:04 ` [RFC PATCH 1/3] ipv4/icmp: l3mdev: Perform icmp error route lookup on source device routing table (v2) Mathieu Desnoyers
  2020-09-25 20:04 ` [RFC PATCH 2/3] ipv6/icmp: " Mathieu Desnoyers
@ 2020-09-25 20:04 ` Mathieu Desnoyers
  2020-10-05 16:30 ` [RFC PATCH 0/3] l3mdev icmp error route lookup fixes David Ahern
  3 siblings, 0 replies; 10+ messages in thread
From: Mathieu Desnoyers @ 2020-09-25 20:04 UTC (permalink / raw)
  To: David Ahern, David S . Miller, netdev, Michael Jeanson; +Cc: linux-kernel

From: Michael Jeanson <mjeanson@efficios.com>

The objective of the tests is to check that ICMP errors generated while
crossing between VRFs are properly routed back to the source host.

The first ttl test sends a ping with a ttl of 1 from h1 to h2 and parses the
output of the command to check that a ttl expired error is received.

The second ttl test runs traceroute from h1 to h2 and parses the output to
check for a hop on r1.

The mtu test sends a ping with a payload of 1450 from h1 to h2, through
r1 which has an interface with a mtu of 1400 and parses the output of the
command to check that a fragmentation needed error is received.

[ The IPv6 MTU test still fails with the symmetric routing setup and
  I'm not sure why. ]

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Cc: David Ahern <dsahern@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org

---
Changes since v2:
- Renamed test script to vrf_route_leaking.sh
- Added a default symmetric routing topology
- Documented the asymmetric routing topology
Changes since v1:
- Formating and indentation fixes
- Added '-t' to getopts
- Reworked verbose output of grep'd commands with a new function
- Expanded ip command names
- Added fragmentation tests
---
 tools/testing/selftests/net/Makefile          |   1 +
 .../selftests/net/vrf_route_leaking.sh        | 626 ++++++++++++++++++
 2 files changed, 627 insertions(+)
 create mode 100755 tools/testing/selftests/net/vrf_route_leaking.sh

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 9491bbaa0831..3e7fb1e70c77 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -19,6 +19,7 @@ TEST_PROGS += txtimestamp.sh
 TEST_PROGS += vrf-xfrm-tests.sh
 TEST_PROGS += rxtimestamp.sh
 TEST_PROGS += devlink_port_split.py
+TEST_PROGS += vrf_route_leaking.sh
 TEST_PROGS_EXTENDED := in_netns.sh
 TEST_GEN_FILES =  socket nettest
 TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy reuseport_addr_any
diff --git a/tools/testing/selftests/net/vrf_route_leaking.sh b/tools/testing/selftests/net/vrf_route_leaking.sh
new file mode 100755
index 000000000000..23cf924754a5
--- /dev/null
+++ b/tools/testing/selftests/net/vrf_route_leaking.sh
@@ -0,0 +1,626 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (c) 2019 David Ahern <dsahern@gmail.com>. All rights reserved.
+# Copyright (c) 2020 Michael Jeanson <mjeanson@efficios.com>. All rights reserved.
+#
+# Requires CONFIG_NET_VRF, CONFIG_VETH, CONFIG_BRIDGE and CONFIG_NET_NS.
+#
+#
+# Symmetric routing topology
+#
+#                     blue         red
+# +----+              .253 +----+ .253              +----+
+# | h1 |-------------------| r1 |-------------------| h2 |
+# +----+ .1                +----+                .2 +----+
+#         172.16.1/24                  172.16.2/24
+#    2001:db8:16:1/64                  2001:db8:16:2/64
+#
+#
+# Route from h1 to h2 and back goes through r1, incoming vrf blue has a route
+# to the outgoing vrf red for the n2 network and red has a route back to n1.
+# The red VRF interface has a MTU of 1400.
+#
+# The first test sends a ping with a ttl of 1 from h1 to h2 and parses the
+# output of the command to check that a ttl expired error is received.
+#
+# The second test runs traceroute from h1 to h2 and parses the output to check
+# for a hop on r1.
+#
+# The third test sends a ping with a packet size of 1450 from h1 to h2 and
+# parses the output of the command to check that a fragmentation error is
+# received.
+#
+#
+# Asymmetric routing topology
+#
+# This topology represents a customer setup where the issue with icmp errors
+# and VRF route leaking was initialy reported. The MTU test isn't done here
+# because of the lack of a return route in the red VRF.
+#
+#                     blue         red
+#                     .253 +----+ .253
+#                     +----| r1 |----+
+#                     |    +----+    |
+# +----+              |              |              +----+
+# | h1 |--------------+              +--------------| h2 |
+# +----+ .1           |              |           .2 +----+
+#         172.16.1/24 |    +----+    | 172.16.2/24
+#    2001:db8:16:1/64 +----| r2 |----+ 2001:db8:16:2/64
+#                     .254 +----+ .254
+#
+#
+# Route from h1 to h2 goes through r1, incoming vrf blue has a route to the
+# outgoing vrf red for the n2 network but red doesn't have a route back to n1.
+# Route from h2 to h1 goes through r2.
+#
+# The objective is to check that the incoming vrf routing table is selected
+# to send an ICMP error back to the source when the ttl of a packet reaches 1
+# while it is forwarded between different vrfs.
+
+VERBOSE=0
+PAUSE_ON_FAIL=no
+DEFAULT_TTYPE=sym
+
+H1_N1=172.16.1.0/24
+H1_N1_6=2001:db8:16:1::/64
+
+H1_N1_IP=172.16.1.1
+R1_N1_IP=172.16.1.253
+R2_N1_IP=172.16.1.254
+
+H1_N1_IP6=2001:db8:16:1::1
+R1_N1_IP6=2001:db8:16:1::253
+R2_N1_IP6=2001:db8:16:1::254
+
+H2_N2=172.16.2.0/24
+H2_N2_6=2001:db8:16:2::/64
+
+H2_N2_IP=172.16.2.2
+R1_N2_IP=172.16.2.253
+R2_N2_IP=172.16.2.254
+
+H2_N2_IP6=2001:db8:16:2::2
+R1_N2_IP6=2001:db8:16:2::253
+R2_N2_IP6=2001:db8:16:2::254
+
+################################################################################
+# helpers
+
+log_section()
+{
+	echo
+	echo "###########################################################################"
+	echo "$*"
+	echo "###########################################################################"
+	echo
+}
+
+log_test()
+{
+	local rc=$1
+	local expected=$2
+	local msg="$3"
+
+	if [ "${rc}" -eq "${expected}" ]; then
+		printf "TEST: %-60s  [ OK ]\n" "${msg}"
+		nsuccess=$((nsuccess+1))
+	else
+		ret=1
+		nfail=$((nfail+1))
+		printf "TEST: %-60s  [FAIL]\n" "${msg}"
+		if [ "${PAUSE_ON_FAIL}" = "yes" ]; then
+			echo
+			echo "hit enter to continue, 'q' to quit"
+			read -r a
+			[ "$a" = "q" ] && exit 1
+		fi
+	fi
+}
+
+run_cmd()
+{
+	local cmd="$*"
+	local out
+	local rc
+
+	if [ "$VERBOSE" = "1" ]; then
+		echo "COMMAND: $cmd"
+	fi
+
+	# shellcheck disable=SC2086
+	out=$(eval $cmd 2>&1)
+	rc=$?
+	if [ "$VERBOSE" = "1" ] && [ -n "$out" ]; then
+		echo "$out"
+	fi
+
+	[ "$VERBOSE" = "1" ] && echo
+
+	return $rc
+}
+
+run_cmd_grep()
+{
+	local grep_pattern="$1"
+	shift
+	local cmd="$*"
+	local out
+	local rc
+
+	if [ "$VERBOSE" = "1" ]; then
+		echo "COMMAND: $cmd"
+	fi
+
+	# shellcheck disable=SC2086
+	out=$(eval $cmd 2>&1)
+	if [ "$VERBOSE" = "1" ] && [ -n "$out" ]; then
+		echo "$out"
+	fi
+
+	echo "$out" | grep -q "$grep_pattern"
+	rc=$?
+
+	[ "$VERBOSE" = "1" ] && echo
+
+	return $rc
+}
+
+################################################################################
+# setup and teardown
+
+cleanup()
+{
+	local ns
+
+	for ns in h1 h2 r1 r2; do
+		ip netns del $ns 2>/dev/null
+	done
+}
+
+setup_vrf()
+{
+	local ns=$1
+
+	ip -netns "${ns}" rule del pref 0
+	ip -netns "${ns}" rule add pref 32765 from all lookup local
+	ip -netns "${ns}" -6 rule del pref 0
+	ip -netns "${ns}" -6 rule add pref 32765 from all lookup local
+}
+
+create_vrf()
+{
+	local ns=$1
+	local vrf=$2
+	local table=$3
+
+	ip -netns "${ns}" link add "${vrf}" type vrf table "${table}"
+	ip -netns "${ns}" link set "${vrf}" up
+	ip -netns "${ns}" route add vrf "${vrf}" unreachable default metric 8192
+	ip -netns "${ns}" -6 route add vrf "${vrf}" unreachable default metric 8192
+
+	ip -netns "${ns}" addr add 127.0.0.1/8 dev "${vrf}"
+	ip -netns "${ns}" -6 addr add ::1 dev "${vrf}" nodad
+}
+
+setup_sym()
+{
+	local ns
+
+	# make sure we are starting with a clean slate
+	cleanup
+
+	#
+	# create nodes as namespaces
+	#
+	for ns in h1 h2 r1; do
+		ip netns add $ns
+		ip -netns $ns link set lo up
+
+		case "${ns}" in
+		h[12]) ip netns exec $ns sysctl -q -w net.ipv6.conf.all.forwarding=0
+		       ip netns exec $ns sysctl -q -w net.ipv6.conf.all.keep_addr_on_down=1
+			;;
+		r1)    ip netns exec $ns sysctl -q -w net.ipv4.ip_forward=1
+		       ip netns exec $ns sysctl -q -w net.ipv6.conf.all.forwarding=1
+		esac
+	done
+
+	#
+	# create interconnects
+	#
+	ip -netns h1 link add eth0 type veth peer name r1h1
+	ip -netns h1 link set r1h1 netns r1 name eth0 up
+
+	ip -netns h2 link add eth0 type veth peer name r1h2
+	ip -netns h2 link set r1h2 netns r1 name eth1 up
+
+	#
+	# h1
+	#
+	ip -netns h1 addr add dev eth0 ${H1_N1_IP}/24
+	ip -netns h1 -6 addr add dev eth0 ${H1_N1_IP6}/64 nodad
+	ip -netns h1 link set eth0 up
+
+	# h1 to h2 via r1
+	ip -netns h1    route add ${H2_N2} via ${R1_N1_IP} dev eth0
+	ip -netns h1 -6 route add ${H2_N2_6} via "${R1_N1_IP6}" dev eth0
+
+	#
+	# h2
+	#
+	ip -netns h2 addr add dev eth0 ${H2_N2_IP}/24
+	ip -netns h2 -6 addr add dev eth0 ${H2_N2_IP6}/64 nodad
+	ip -netns h2 link set eth0 up
+
+	# h2 to h1 via r1
+	ip -netns h2 route add default via ${R1_N2_IP} dev eth0
+	ip -netns h2 -6 route add default via ${R1_N2_IP6} dev eth0
+
+	#
+	# r1
+	#
+	setup_vrf r1
+	create_vrf r1 blue 1101
+	create_vrf r1 red 1102
+	ip -netns r1 link set mtu 1400 dev eth1
+	ip -netns r1 link set eth0 vrf blue up
+	ip -netns r1 link set eth1 vrf red up
+	ip -netns r1 addr add dev eth0 ${R1_N1_IP}/24
+	ip -netns r1 -6 addr add dev eth0 ${R1_N1_IP6}/64 nodad
+	ip -netns r1 addr add dev eth1 ${R1_N2_IP}/24
+	ip -netns r1 -6 addr add dev eth1 ${R1_N2_IP6}/64 nodad
+
+	# Route leak from blue to red
+	ip -netns r1 route add vrf blue ${H2_N2} dev red
+	ip -netns r1 -6 route add vrf blue ${H2_N2_6} dev red
+
+	# Route leak from red to blue
+	ip -netns r1 route add vrf red ${H1_N1} dev blue
+	ip -netns r1 -6 route add vrf red ${H1_N1_6} dev blue
+
+
+	# Wait for ip config to settle
+	sleep 2
+}
+
+setup_asym()
+{
+	local ns
+
+	# make sure we are starting with a clean slate
+	cleanup
+
+	#
+	# create nodes as namespaces
+	#
+	for ns in h1 h2 r1 r2; do
+		ip netns add $ns
+		ip -netns $ns link set lo up
+
+		case "${ns}" in
+		h[12]) ip netns exec $ns sysctl -q -w net.ipv6.conf.all.forwarding=0
+		       ip netns exec $ns sysctl -q -w net.ipv6.conf.all.keep_addr_on_down=1
+			;;
+		r[12]) ip netns exec $ns sysctl -q -w net.ipv4.ip_forward=1
+		       ip netns exec $ns sysctl -q -w net.ipv6.conf.all.forwarding=1
+		esac
+	done
+
+	#
+	# create interconnects
+	#
+	ip -netns h1 link add eth0 type veth peer name r1h1
+	ip -netns h1 link set r1h1 netns r1 name eth0 up
+
+	ip -netns h1 link add eth1 type veth peer name r2h1
+	ip -netns h1 link set r2h1 netns r2 name eth0 up
+
+	ip -netns h2 link add eth0 type veth peer name r1h2
+	ip -netns h2 link set r1h2 netns r1 name eth1 up
+
+	ip -netns h2 link add eth1 type veth peer name r2h2
+	ip -netns h2 link set r2h2 netns r2 name eth1 up
+
+	#
+	# h1
+	#
+	ip -netns h1 link add br0 type bridge
+	ip -netns h1 link set br0 up
+	ip -netns h1 addr add dev br0 ${H1_N1_IP}/24
+	ip -netns h1 -6 addr add dev br0 ${H1_N1_IP6}/64 nodad
+	ip -netns h1 link set eth0 master br0 up
+	ip -netns h1 link set eth1 master br0 up
+
+	# h1 to h2 via r1
+	ip -netns h1    route add ${H2_N2} via ${R1_N1_IP} dev br0
+	ip -netns h1 -6 route add ${H2_N2_6} via "${R1_N1_IP6}" dev br0
+
+	#
+	# h2
+	#
+	ip -netns h2 link add br0 type bridge
+	ip -netns h2 link set br0 up
+	ip -netns h2 addr add dev br0 ${H2_N2_IP}/24
+	ip -netns h2 -6 addr add dev br0 ${H2_N2_IP6}/64 nodad
+	ip -netns h2 link set eth0 master br0 up
+	ip -netns h2 link set eth1 master br0 up
+
+	# h2 to h1 via r2
+	ip -netns h2 route add default via ${R2_N2_IP} dev br0
+	ip -netns h2 -6 route add default via ${R2_N2_IP6} dev br0
+
+	#
+	# r1
+	#
+	setup_vrf r1
+	create_vrf r1 blue 1101
+	create_vrf r1 red 1102
+	ip -netns r1 link set mtu 1400 dev eth1
+	ip -netns r1 link set eth0 vrf blue up
+	ip -netns r1 link set eth1 vrf red up
+	ip -netns r1 addr add dev eth0 ${R1_N1_IP}/24
+	ip -netns r1 -6 addr add dev eth0 ${R1_N1_IP6}/64 nodad
+	ip -netns r1 addr add dev eth1 ${R1_N2_IP}/24
+	ip -netns r1 -6 addr add dev eth1 ${R1_N2_IP6}/64 nodad
+
+	# Route leak from blue to red
+	ip -netns r1 route add vrf blue ${H2_N2} dev red
+	ip -netns r1 -6 route add vrf blue ${H2_N2_6} dev red
+
+	# No route leak from red to blue
+
+	#
+	# r2
+	#
+	ip -netns r2 addr add dev eth0 ${R2_N1_IP}/24
+	ip -netns r2 -6 addr add dev eth0 ${R2_N1_IP6}/64 nodad
+	ip -netns r2 addr add dev eth1 ${R2_N2_IP}/24
+	ip -netns r2 -6 addr add dev eth1 ${R2_N2_IP6}/64 nodad
+
+	# Wait for ip config to settle
+	sleep 2
+}
+
+check_connectivity()
+{
+	ip netns exec h1 ping -c1 -w1 ${H2_N2_IP} >/dev/null 2>&1
+	log_test $? 0 "Basic IPv4 connectivity"
+	return $?
+}
+
+check_connectivity6()
+{
+	ip netns exec h1 "${ping6}" -c1 -w1 ${H2_N2_IP6} >/dev/null 2>&1
+	log_test $? 0 "Basic IPv6 connectivity"
+	return $?
+}
+
+check_traceroute()
+{
+	if [ ! -x "$(command -v traceroute)" ]; then
+		echo "SKIP: Could not run IPV4 test without traceroute"
+		return 1
+	fi
+}
+
+check_traceroute6()
+{
+	if [ ! -x "$(command -v traceroute6)" ]; then
+		echo "SKIP: Could not run IPV6 test without traceroute6"
+		return 1
+	fi
+}
+
+ipv4_traceroute()
+{
+	local ttype="$1"
+
+	[ "x$ttype" = "x" ] && ttype="$DEFAULT_TTYPE"
+
+	log_section "IPv4 ($ttype route): VRF ICMP error route lookup traceroute"
+
+	check_traceroute || return
+
+	setup_"$ttype"
+
+	check_connectivity || return
+
+	run_cmd_grep "${R1_N1_IP}" ip netns exec h1 traceroute ${H2_N2_IP}
+	log_test $? 0 "Traceroute reports a hop on r1"
+}
+
+ipv4_traceroute_asym()
+{
+	ipv4_traceroute asym
+}
+
+ipv6_traceroute()
+{
+	local ttype="$1"
+
+	[ "x$ttype" = "x" ] && ttype="$DEFAULT_TTYPE"
+
+	log_section "IPv6 ($ttype route): VRF ICMP error route lookup traceroute"
+
+	check_traceroute6 || return
+
+	setup_"$ttype"
+
+	check_connectivity6 || return
+
+	run_cmd_grep "${R1_N1_IP6}" ip netns exec h1 traceroute6 ${H2_N2_IP6}
+	log_test $? 0 "Traceroute6 reports a hop on r1"
+}
+
+ipv6_traceroute_asym()
+{
+	ipv6_traceroute asym
+}
+
+ipv4_ping_ttl()
+{
+	local ttype="$1"
+
+	[ "x$ttype" = "x" ] && ttype="$DEFAULT_TTYPE"
+
+	log_section "IPv4 ($ttype route): VRF ICMP ttl error route lookup ping"
+
+	setup_"$ttype"
+
+	check_connectivity || return
+
+	run_cmd_grep "Time to live exceeded" ip netns exec h1 ping -t1 -c1 -W2 ${H2_N2_IP}
+	log_test $? 0 "Ping received ICMP ttl exceeded"
+}
+
+ipv4_ping_ttl_asym()
+{
+	ipv4_ping_ttl asym
+}
+
+ipv4_ping_frag()
+{
+	local ttype="$1"
+
+	[ "x$ttype" = "x" ] && ttype="$DEFAULT_TTYPE"
+
+	log_section "IPv4 ($ttype route): VRF ICMP fragmentation error route lookup ping"
+
+	setup_"$ttype"
+
+	check_connectivity || return
+
+	run_cmd_grep "Frag needed" ip netns exec h1 ping -s 1450 -Mdo -c1 -W2 ${H2_N2_IP}
+	log_test $? 0 "Ping received ICMP Frag needed"
+}
+
+ipv4_ping_frag_asym()
+{
+	ipv4_ping_frag asym
+}
+
+ipv6_ping_ttl()
+{
+	local ttype="$1"
+
+	[ "x$ttype" = "x" ] && ttype="$DEFAULT_TTYPE"
+
+	log_section "IPv6 ($ttype route): VRF ICMP ttl error route lookup ping"
+
+	setup_"$ttype"
+
+	check_connectivity6 || return
+
+	run_cmd_grep "Time exceeded: Hop limit" ip netns exec h1 "${ping6}" -t1 -c1 -W2 ${H2_N2_IP6}
+	log_test $? 0 "Ping received ICMP Hop limit"
+}
+
+ipv6_ping_ttl_asym()
+{
+	ipv6_ping_ttl asym
+}
+
+ipv6_ping_frag()
+{
+	local ttype="$1"
+
+	[ "x$ttype" = "x" ] && ttype="$DEFAULT_TTYPE"
+
+	log_section "IPv6 ($ttype route): VRF ICMP fragmentation error route lookup ping"
+
+	setup_"$ttype"
+
+	check_connectivity6 || return
+
+	run_cmd_grep "Packet too big" ip netns exec h1 "${ping6}" -s 1450 -Mdo -c1 -W2 ${H2_N2_IP6}
+	log_test $? 0 "Ping received ICMP Packet too big"
+}
+
+ipv6_ping_frag_asym()
+{
+	ipv6_ping_frag asym
+}
+
+################################################################################
+# usage
+
+usage()
+{
+        cat <<EOF
+usage: ${0##*/} OPTS
+
+	-4          Run IPv4 tests only
+	-6          Run IPv6 tests only
+        -t TEST     Run only TEST
+	-p          Pause on fail
+	-v          verbose mode (show commands and output)
+EOF
+}
+
+################################################################################
+# main
+
+# Some systems don't have a ping6 binary anymore
+command -v ping6 > /dev/null 2>&1 && ping6=$(command -v ping6) || ping6=$(command -v ping)
+
+TESTS_IPV4="ipv4_ping_ttl ipv4_traceroute ipv4_ping_frag ipv4_ping_ttl_asym ipv4_traceroute_asym"
+TESTS_IPV6="ipv6_ping_ttl ipv6_traceroute ipv6_ping_frag ipv6_ping_ttl_asym ipv6_traceroute_asym"
+
+ret=0
+nsuccess=0
+nfail=0
+
+while getopts :46t:pvh o
+do
+	case $o in
+		4) TESTS=ipv4;;
+		6) TESTS=ipv6;;
+		t) TESTS=$OPTARG;;
+		p) PAUSE_ON_FAIL=yes;;
+		v) VERBOSE=1;;
+		h) usage; exit 0;;
+		*) usage; exit 1;;
+	esac
+done
+
+#
+# show user test config
+#
+if [ -z "$TESTS" ]; then
+        TESTS="$TESTS_IPV4 $TESTS_IPV6"
+elif [ "$TESTS" = "ipv4" ]; then
+        TESTS="$TESTS_IPV4"
+elif [ "$TESTS" = "ipv6" ]; then
+        TESTS="$TESTS_IPV6"
+fi
+
+for t in $TESTS
+do
+	case $t in
+	ipv4_ping_ttl|ping)              ipv4_ping_ttl;;&
+	ipv4_ping_ttl_asym|ping)         ipv4_ping_ttl_asym;;&
+	ipv4_traceroute|traceroute)      ipv4_traceroute;;&
+	ipv4_traceroute_asym|traceroute) ipv4_traceroute_asym;;&
+	ipv4_ping_frag|ping)             ipv4_ping_frag;;&
+
+	ipv6_ping_ttl|ping)              ipv6_ping_ttl;;&
+	ipv6_ping_ttl_asym|ping)         ipv6_ping_ttl_asym;;&
+	ipv6_traceroute|traceroute)      ipv6_traceroute;;&
+	ipv6_traceroute_asym|traceroute) ipv6_traceroute_asym;;&
+	ipv6_ping_frag|ping)             ipv6_ping_frag;;&
+
+	# setup namespaces and config, but do not run any tests
+	setup_sym|setup)                 setup_sym; exit 0;;
+	setup_asym)                      setup_asym; exit 0;;
+
+	help)                       echo "Test names: $TESTS"; exit 0;;
+	esac
+done
+
+cleanup
+
+printf "\nTests passed: %3d\n" ${nsuccess}
+printf "Tests failed: %3d\n"   ${nfail}
+
+exit $ret
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 0/3] l3mdev icmp error route lookup fixes
  2020-09-25 20:04 [RFC PATCH 0/3] l3mdev icmp error route lookup fixes Mathieu Desnoyers
                   ` (2 preceding siblings ...)
  2020-09-25 20:04 ` [RFC PATCH 3/3] selftests: Add VRF route leaking tests Mathieu Desnoyers
@ 2020-10-05 16:30 ` David Ahern
  2020-10-11 23:56   ` David Ahern
  3 siblings, 1 reply; 10+ messages in thread
From: David Ahern @ 2020-10-05 16:30 UTC (permalink / raw)
  To: Mathieu Desnoyers, David Ahern, David S . Miller, netdev,
	Michael Jeanson
  Cc: linux-kernel

On 9/25/20 1:04 PM, Mathieu Desnoyers wrote:
> Hi,
> 
> Here is an updated series of fixes for ipv4 and ipv6 which which ensure
> the route lookup is performed on the right routing table in VRF
> configurations when sending TTL expired icmp errors (useful for
> traceroute).
> 
> It includes tests for both ipv4 and ipv6.
> 
> These fixes address specifically address the code paths involved in
> sending TTL expired icmp errors. As detailed in the individual commit
> messages, those fixes do not address similar icmp errors related to
> network namespaces and unreachable / fragmentation needed messages,
> which appear to use different code paths.
> 
> The main changes since the last round are updates to the selftests.
> 

This looks fine to me. I noticed the IPv6 large packet test case is
failing; the fib6 tracepoint is showing the loopback as the iif which is
wrong:

ping6  8488 [004]   502.015817: fib6:fib6_table_lookup: table 255 oif 0
iif 1 proto 58 ::/0 -> 2001:db8:16:1::1/0 tos 0 scope 0 flags 0 ==> dev
lo gw :: err -113

I will dig into it later this week.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 0/3] l3mdev icmp error route lookup fixes
  2020-10-05 16:30 ` [RFC PATCH 0/3] l3mdev icmp error route lookup fixes David Ahern
@ 2020-10-11 23:56   ` David Ahern
  2020-10-12 12:57     ` Mathieu Desnoyers
  0 siblings, 1 reply; 10+ messages in thread
From: David Ahern @ 2020-10-11 23:56 UTC (permalink / raw)
  To: Mathieu Desnoyers, David Ahern, David S . Miller, netdev,
	Michael Jeanson
  Cc: linux-kernel

On 10/5/20 9:30 AM, David Ahern wrote:
> On 9/25/20 1:04 PM, Mathieu Desnoyers wrote:
>> Hi,
>>
>> Here is an updated series of fixes for ipv4 and ipv6 which which ensure
>> the route lookup is performed on the right routing table in VRF
>> configurations when sending TTL expired icmp errors (useful for
>> traceroute).
>>
>> It includes tests for both ipv4 and ipv6.
>>
>> These fixes address specifically address the code paths involved in
>> sending TTL expired icmp errors. As detailed in the individual commit
>> messages, those fixes do not address similar icmp errors related to
>> network namespaces and unreachable / fragmentation needed messages,
>> which appear to use different code paths.
>>
>> The main changes since the last round are updates to the selftests.
>>
> 
> This looks fine to me. I noticed the IPv6 large packet test case is
> failing; the fib6 tracepoint is showing the loopback as the iif which is
> wrong:
> 
> ping6  8488 [004]   502.015817: fib6:fib6_table_lookup: table 255 oif 0
> iif 1 proto 58 ::/0 -> 2001:db8:16:1::1/0 tos 0 scope 0 flags 0 ==> dev
> lo gw :: err -113
> 
> I will dig into it later this week.
> 

I see the problem here -- source address selection is picking ::1. I do
not have a solution to the problem yet, but its resolution is
independent of the change in this set so I think this one is good to go.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 0/3] l3mdev icmp error route lookup fixes
  2020-10-11 23:56   ` David Ahern
@ 2020-10-12 12:57     ` Mathieu Desnoyers
  2020-10-12 13:45       ` David Ahern
  0 siblings, 1 reply; 10+ messages in thread
From: Mathieu Desnoyers @ 2020-10-12 12:57 UTC (permalink / raw)
  To: David Ahern
  Cc: David Ahern, David S. Miller, netdev, Michael Jeanson, linux-kernel

----- On Oct 11, 2020, at 7:56 PM, David Ahern dsahern@gmail.com wrote:

> On 10/5/20 9:30 AM, David Ahern wrote:
>> On 9/25/20 1:04 PM, Mathieu Desnoyers wrote:
>>> Hi,
>>>
>>> Here is an updated series of fixes for ipv4 and ipv6 which which ensure
>>> the route lookup is performed on the right routing table in VRF
>>> configurations when sending TTL expired icmp errors (useful for
>>> traceroute).
>>>
>>> It includes tests for both ipv4 and ipv6.
>>>
>>> These fixes address specifically address the code paths involved in
>>> sending TTL expired icmp errors. As detailed in the individual commit
>>> messages, those fixes do not address similar icmp errors related to
>>> network namespaces and unreachable / fragmentation needed messages,
>>> which appear to use different code paths.
>>>
>>> The main changes since the last round are updates to the selftests.
>>>
>> 
>> This looks fine to me. I noticed the IPv6 large packet test case is
>> failing; the fib6 tracepoint is showing the loopback as the iif which is
>> wrong:
>> 
>> ping6  8488 [004]   502.015817: fib6:fib6_table_lookup: table 255 oif 0
>> iif 1 proto 58 ::/0 -> 2001:db8:16:1::1/0 tos 0 scope 0 flags 0 ==> dev
>> lo gw :: err -113
>> 
>> I will dig into it later this week.
>> 
> 
> I see the problem here -- source address selection is picking ::1. I do
> not have a solution to the problem yet, but its resolution is
> independent of the change in this set so I think this one is good to go.

OK, do you want to pick up the RFC patch series, or should I re-send it
without RFC tag ?

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 0/3] l3mdev icmp error route lookup fixes
  2020-10-12 12:57     ` Mathieu Desnoyers
@ 2020-10-12 13:45       ` David Ahern
  2020-10-12 14:10         ` Mathieu Desnoyers
  0 siblings, 1 reply; 10+ messages in thread
From: David Ahern @ 2020-10-12 13:45 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: David Ahern, David S. Miller, netdev, Michael Jeanson, linux-kernel

On 10/12/20 5:57 AM, Mathieu Desnoyers wrote:
> OK, do you want to pick up the RFC patch series, or should I re-send it
> without RFC tag ?

you need to re-send for Dave or Jakub to pick them up via patchworks

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 0/3] l3mdev icmp error route lookup fixes
  2020-10-12 13:45       ` David Ahern
@ 2020-10-12 14:10         ` Mathieu Desnoyers
  2020-10-12 14:26           ` David Ahern
  0 siblings, 1 reply; 10+ messages in thread
From: Mathieu Desnoyers @ 2020-10-12 14:10 UTC (permalink / raw)
  To: David Ahern
  Cc: David Ahern, David S. Miller, netdev, Michael Jeanson, linux-kernel

----- On Oct 12, 2020, at 9:45 AM, David Ahern dsahern@gmail.com wrote:

> On 10/12/20 5:57 AM, Mathieu Desnoyers wrote:
>> OK, do you want to pick up the RFC patch series, or should I re-send it
>> without RFC tag ?
> 
> you need to re-send for Dave or Jakub to pick them up via patchworks

OK. Can I have your Acked-by or Reviewed-by for all three patches ?

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 0/3] l3mdev icmp error route lookup fixes
  2020-10-12 14:10         ` Mathieu Desnoyers
@ 2020-10-12 14:26           ` David Ahern
  0 siblings, 0 replies; 10+ messages in thread
From: David Ahern @ 2020-10-12 14:26 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: David Ahern, David S. Miller, netdev, Michael Jeanson, linux-kernel

On 10/12/20 7:10 AM, Mathieu Desnoyers wrote:
> ----- On Oct 12, 2020, at 9:45 AM, David Ahern dsahern@gmail.com wrote:
> 
>> On 10/12/20 5:57 AM, Mathieu Desnoyers wrote:
>>> OK, do you want to pick up the RFC patch series, or should I re-send it
>>> without RFC tag ?
>>
>> you need to re-send for Dave or Jakub to pick them up via patchworks
> 
> OK. Can I have your Acked-by or Reviewed-by for all three patches ?
> 


sure.
Reviewed-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-10-12 14:26 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-25 20:04 [RFC PATCH 0/3] l3mdev icmp error route lookup fixes Mathieu Desnoyers
2020-09-25 20:04 ` [RFC PATCH 1/3] ipv4/icmp: l3mdev: Perform icmp error route lookup on source device routing table (v2) Mathieu Desnoyers
2020-09-25 20:04 ` [RFC PATCH 2/3] ipv6/icmp: " Mathieu Desnoyers
2020-09-25 20:04 ` [RFC PATCH 3/3] selftests: Add VRF route leaking tests Mathieu Desnoyers
2020-10-05 16:30 ` [RFC PATCH 0/3] l3mdev icmp error route lookup fixes David Ahern
2020-10-11 23:56   ` David Ahern
2020-10-12 12:57     ` Mathieu Desnoyers
2020-10-12 13:45       ` David Ahern
2020-10-12 14:10         ` Mathieu Desnoyers
2020-10-12 14:26           ` David Ahern

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.