All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net] selftests/fib_tests: ping from dummy0 in fib_rp_filter_test()
@ 2021-11-29 22:52 Peilin Ye
  2021-11-30  0:46 ` Peilin Ye
  2021-11-30  0:49 ` [PATCH net v2] " Peilin Ye
  0 siblings, 2 replies; 11+ messages in thread
From: Peilin Ye @ 2021-11-29 22:52 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Shuah Khan
  Cc: Peilin Ye, Cong Wang, Hangbin Liu, David Ahern, netdev,
	linux-kselftest, linux-kernel

From: Peilin Ye <peilin.ye@bytedance.com>

Currently rp_filter tests in fib_tests.sh:fib_rp_filter_test() are
failing.  ping sockets are bound to dummy1 using the "-I" option
(SO_BINDTODEVICE), but socket lookup is failing when receiving ping
replies, since the routing table thinks they belong to dummy0.

For example, suppose ping is using a SOCK_RAW socket for ICMP messages.
When receiving ping replies, in __raw_v4_lookup(), sk->sk_bound_dev_if
is 3 (dummy1), but dif (skb_rtable(skb)->rt_iif) says 2 (dummy0), so the
raw_sk_bound_dev_eq() check fails.  Similar things happen in
ping_lookup() for SOCK_ICMP sockets.

Fix the tests by binding to dummy0 instead.  Redirect ping requests to
dummy1 before redirecting them again to lo, so that sk->sk_bound_dev_if
agrees with our routing table.

These tests used to pass due to a bug [1] in iputils, where "ping -I"
actually did not bind ICMP message sockets to device.  The bug has been
fixed by iputils commit f455fee41c07 ("ping: also bind the ICMP socket
to the specific device") in 2016, which is why our rp_filter tests
started to fail.  See [2] .

Tested with ping from iputils 20210722-41-gf9fb573:

$ ./fib_tests.sh -t rp_filter

IPv4 rp_filter tests
    TEST: rp_filter passes local packets		[ OK ]
    TEST: rp_filter passes loopback packets		[ OK ]

[1] https://github.com/iputils/iputils/issues/55
[2] https://github.com/iputils/iputils/commit/f455fee41c077d4b700a473b2f5b3487b8febc1d

Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Fixes: adb701d6cfa4 ("selftests: add a test case for rp_filter")
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
---
 tools/testing/selftests/net/fib_tests.sh | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh
index 5abe92d55b69..b8bceae00f8e 100755
--- a/tools/testing/selftests/net/fib_tests.sh
+++ b/tools/testing/selftests/net/fib_tests.sh
@@ -453,15 +453,19 @@ fib_rp_filter_test()
 	$NS_EXEC sysctl -qw net.ipv4.conf.all.accept_local=1
 	$NS_EXEC sysctl -qw net.ipv4.conf.all.route_localnet=1
 
+	$NS_EXEC tc qd add dev dummy0 parent root handle 1: fq_codel
+	$NS_EXEC tc filter add dev dummy0 parent 1: protocol arp basic action mirred egress redirect dev dummy1
+	$NS_EXEC tc filter add dev dummy0 parent 1: protocol ip basic action mirred egress redirect dev dummy1
+
 	$NS_EXEC tc qd add dev dummy1 parent root handle 1: fq_codel
 	$NS_EXEC tc filter add dev dummy1 parent 1: protocol arp basic action mirred egress redirect dev lo
 	$NS_EXEC tc filter add dev dummy1 parent 1: protocol ip basic action mirred egress redirect dev lo
 	set +e
 
-	run_cmd "ip netns exec ns1 ping -I dummy1 -w1 -c1 198.51.100.1"
+	run_cmd "ip netns exec ns1 ping -I dummy0 -w1 -c1 198.51.100.1"
 	log_test $? 0 "rp_filter passes local packets"
 
-	run_cmd "ip netns exec ns1 ping -I dummy1 -w1 -c1 127.0.0.1"
+	run_cmd "ip netns exec ns1 ping -I dummy0 -w1 -c1 127.0.0.1"
 	log_test $? 0 "rp_filter passes loopback packets"
 
 	cleanup
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH net] selftests/fib_tests: ping from dummy0 in fib_rp_filter_test()
  2021-11-29 22:52 [PATCH net] selftests/fib_tests: ping from dummy0 in fib_rp_filter_test() Peilin Ye
@ 2021-11-30  0:46 ` Peilin Ye
  2021-11-30  0:49 ` [PATCH net v2] " Peilin Ye
  1 sibling, 0 replies; 11+ messages in thread
From: Peilin Ye @ 2021-11-30  0:46 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Shuah Khan
  Cc: Peilin Ye, Cong Wang, Hangbin Liu, David Ahern, netdev,
	linux-kselftest, linux-kernel

Hi all,

On Mon, Nov 29, 2021 at 02:52:30PM -0800, Peilin Ye wrote:
> For example, suppose ping is using a SOCK_RAW socket for ICMP messages.
> When receiving ping replies, in __raw_v4_lookup(), sk->sk_bound_dev_if
> is 3 (dummy1), but dif (skb_rtable(skb)->rt_iif) says 2 (dummy0), so the
> raw_sk_bound_dev_eq() check fails.  Similar things happen in
> ping_lookup() for SOCK_ICMP sockets.
		    ^^^^^^^^^
I actually meant "SOCK_DGRAM".  Will fix in v2 soon.  Sorry about it.

Thanks,
Peilin Ye


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH net v2] selftests/fib_tests: ping from dummy0 in fib_rp_filter_test()
  2021-11-29 22:52 [PATCH net] selftests/fib_tests: ping from dummy0 in fib_rp_filter_test() Peilin Ye
  2021-11-30  0:46 ` Peilin Ye
@ 2021-11-30  0:49 ` Peilin Ye
  2021-11-30  1:16   ` David Ahern
  2021-12-01  0:47   ` [PATCH net v3] selftests/fib_tests: Rework fib_rp_filter_test() Peilin Ye
  1 sibling, 2 replies; 11+ messages in thread
From: Peilin Ye @ 2021-11-30  0:49 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Shuah Khan
  Cc: Peilin Ye, Cong Wang, Hangbin Liu, David Ahern, netdev,
	linux-kselftest, linux-kernel

From: Peilin Ye <peilin.ye@bytedance.com>

Currently rp_filter tests in fib_tests.sh:fib_rp_filter_test() are
failing.  ping sockets are bound to dummy1 using the "-I" option
(SO_BINDTODEVICE), but socket lookup is failing when receiving ping
replies, since the routing table thinks they belong to dummy0.

For example, suppose ping is using a SOCK_RAW socket for ICMP messages.
When receiving ping replies, in __raw_v4_lookup(), sk->sk_bound_dev_if
is 3 (dummy1), but dif (skb_rtable(skb)->rt_iif) says 2 (dummy0), so the
raw_sk_bound_dev_eq() check fails.  Similar things happen in
ping_lookup() for SOCK_DGRAM sockets.

Fix the tests by binding to dummy0 instead.  Redirect ping requests to
dummy1 before redirecting them again to lo, so that sk->sk_bound_dev_if
agrees with our routing table.

These tests used to pass due to a bug [1] in iputils, where "ping -I"
actually did not bind ICMP message sockets to device.  The bug has been
fixed by iputils commit f455fee41c07 ("ping: also bind the ICMP socket
to the specific device") in 2016, which is why our rp_filter tests
started to fail.  See [2] .

Tested with ping from iputils 20210722-41-gf9fb573:

$ ./fib_tests.sh -t rp_filter

IPv4 rp_filter tests
    TEST: rp_filter passes local packets		[ OK ]
    TEST: rp_filter passes loopback packets		[ OK ]

[1] https://github.com/iputils/iputils/issues/55
[2] https://github.com/iputils/iputils/commit/f455fee41c077d4b700a473b2f5b3487b8febc1d

Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Fixes: adb701d6cfa4 ("selftests: add a test case for rp_filter")
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
---
Change in v2:
    - s/SOCK_ICMP/SOCK_DGRAM/ in commit message

 tools/testing/selftests/net/fib_tests.sh | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh
index 5abe92d55b69..b8bceae00f8e 100755
--- a/tools/testing/selftests/net/fib_tests.sh
+++ b/tools/testing/selftests/net/fib_tests.sh
@@ -453,15 +453,19 @@ fib_rp_filter_test()
 	$NS_EXEC sysctl -qw net.ipv4.conf.all.accept_local=1
 	$NS_EXEC sysctl -qw net.ipv4.conf.all.route_localnet=1
 
+	$NS_EXEC tc qd add dev dummy0 parent root handle 1: fq_codel
+	$NS_EXEC tc filter add dev dummy0 parent 1: protocol arp basic action mirred egress redirect dev dummy1
+	$NS_EXEC tc filter add dev dummy0 parent 1: protocol ip basic action mirred egress redirect dev dummy1
+
 	$NS_EXEC tc qd add dev dummy1 parent root handle 1: fq_codel
 	$NS_EXEC tc filter add dev dummy1 parent 1: protocol arp basic action mirred egress redirect dev lo
 	$NS_EXEC tc filter add dev dummy1 parent 1: protocol ip basic action mirred egress redirect dev lo
 	set +e
 
-	run_cmd "ip netns exec ns1 ping -I dummy1 -w1 -c1 198.51.100.1"
+	run_cmd "ip netns exec ns1 ping -I dummy0 -w1 -c1 198.51.100.1"
 	log_test $? 0 "rp_filter passes local packets"
 
-	run_cmd "ip netns exec ns1 ping -I dummy1 -w1 -c1 127.0.0.1"
+	run_cmd "ip netns exec ns1 ping -I dummy0 -w1 -c1 127.0.0.1"
 	log_test $? 0 "rp_filter passes loopback packets"
 
 	cleanup
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH net v2] selftests/fib_tests: ping from dummy0 in fib_rp_filter_test()
  2021-11-30  0:49 ` [PATCH net v2] " Peilin Ye
@ 2021-11-30  1:16   ` David Ahern
  2021-11-30  5:13     ` Peilin Ye
  2021-12-01  0:47   ` [PATCH net v3] selftests/fib_tests: Rework fib_rp_filter_test() Peilin Ye
  1 sibling, 1 reply; 11+ messages in thread
From: David Ahern @ 2021-11-30  1:16 UTC (permalink / raw)
  To: Peilin Ye, David S. Miller, Jakub Kicinski, Shuah Khan
  Cc: Peilin Ye, Cong Wang, Hangbin Liu, netdev, linux-kselftest, linux-kernel

On 11/29/21 5:49 PM, Peilin Ye wrote:
> diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh
> index 5abe92d55b69..b8bceae00f8e 100755
> --- a/tools/testing/selftests/net/fib_tests.sh
> +++ b/tools/testing/selftests/net/fib_tests.sh
> @@ -453,15 +453,19 @@ fib_rp_filter_test()
>  	$NS_EXEC sysctl -qw net.ipv4.conf.all.accept_local=1
>  	$NS_EXEC sysctl -qw net.ipv4.conf.all.route_localnet=1
>  
> +	$NS_EXEC tc qd add dev dummy0 parent root handle 1: fq_codel
> +	$NS_EXEC tc filter add dev dummy0 parent 1: protocol arp basic action mirred egress redirect dev dummy1
> +	$NS_EXEC tc filter add dev dummy0 parent 1: protocol ip basic action mirred egress redirect dev dummy1
> +
>  	$NS_EXEC tc qd add dev dummy1 parent root handle 1: fq_codel
>  	$NS_EXEC tc filter add dev dummy1 parent 1: protocol arp basic action mirred egress redirect dev lo
>  	$NS_EXEC tc filter add dev dummy1 parent 1: protocol ip basic action mirred egress redirect dev lo
>  	set +e
>  
> -	run_cmd "ip netns exec ns1 ping -I dummy1 -w1 -c1 198.51.100.1"
> +	run_cmd "ip netns exec ns1 ping -I dummy0 -w1 -c1 198.51.100.1"
>  	log_test $? 0 "rp_filter passes local packets"
>  
> -	run_cmd "ip netns exec ns1 ping -I dummy1 -w1 -c1 127.0.0.1"
> +	run_cmd "ip netns exec ns1 ping -I dummy0 -w1 -c1 127.0.0.1"
>  	log_test $? 0 "rp_filter passes loopback packets"
>  
>  	cleanup
> 

confused by the point of this test if you are going to change dummy1 to
dummy0. dummy0 has 198.51.100.1 assigned to it, so the ping should
always work.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net v2] selftests/fib_tests: ping from dummy0 in fib_rp_filter_test()
  2021-11-30  1:16   ` David Ahern
@ 2021-11-30  5:13     ` Peilin Ye
  0 siblings, 0 replies; 11+ messages in thread
From: Peilin Ye @ 2021-11-30  5:13 UTC (permalink / raw)
  To: David Ahern
  Cc: David S. Miller, Jakub Kicinski, Shuah Khan, Peilin Ye,
	Cong Wang, Hangbin Liu, netdev, linux-kselftest, linux-kernel

Hi David,

On Mon, Nov 29, 2021 at 06:16:48PM -0700, David Ahern wrote:
> On 11/29/21 5:49 PM, Peilin Ye wrote:
> > -	run_cmd "ip netns exec ns1 ping -I dummy1 -w1 -c1 198.51.100.1"
> > +	run_cmd "ip netns exec ns1 ping -I dummy0 -w1 -c1 198.51.100.1"
> >  	log_test $? 0 "rp_filter passes local packets"
> 
> confused by the point of this test if you are going to change dummy1 to
> dummy0. dummy0 has 198.51.100.1 assigned to it, so the ping should
> always work.

Ah...  Thanks for pointing this out, I'll try to figure out a different
way to test it in v3.

Thanks,
Peilin Ye


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH net v3] selftests/fib_tests: Rework fib_rp_filter_test()
  2021-11-30  0:49 ` [PATCH net v2] " Peilin Ye
  2021-11-30  1:16   ` David Ahern
@ 2021-12-01  0:47   ` Peilin Ye
  2021-12-01 18:00     ` David Ahern
                       ` (2 more replies)
  1 sibling, 3 replies; 11+ messages in thread
From: Peilin Ye @ 2021-12-01  0:47 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Shuah Khan
  Cc: Peilin Ye, Cong Wang, Hangbin Liu, David Ahern, netdev,
	linux-kselftest, linux-kernel, Peilin Ye

From: Peilin Ye <peilin.ye@bytedance.com>

Currently rp_filter tests in fib_tests.sh:fib_rp_filter_test() are
failing.  ping sockets are bound to dummy1 using the "-I" option
(SO_BINDTODEVICE), but socket lookup is failing when receiving ping
replies, since the routing table thinks they belong to dummy0.

For example, suppose ping is using a SOCK_RAW socket for ICMP messages.
When receiving ping replies, in __raw_v4_lookup(), sk->sk_bound_dev_if
is 3 (dummy1), but dif (skb_rtable(skb)->rt_iif) says 2 (dummy0), so the
raw_sk_bound_dev_eq() check fails.  Similar things happen in
ping_lookup() for SOCK_DGRAM sockets.

These tests used to pass due to a bug [1] in iputils, where "ping -I"
actually did not bind ICMP message sockets to device.  The bug has been
fixed by iputils commit f455fee41c07 ("ping: also bind the ICMP socket
to the specific device") in 2016, which is why our rp_filter tests
started to fail.  See [2] .

Fixing the tests while keeping everything in one netns turns out to be
nontrivial.  Rework the tests and build the following topology:

 ┌─────────────────────────────┐    ┌─────────────────────────────┐
 │  network namespace 1 (ns1)  │    │  network namespace 2 (ns2)  │
 │                             │    │                             │
 │  ┌────┐     ┌─────┐         │    │  ┌─────┐            ┌────┐  │
 │  │ lo │<───>│veth1│<────────┼────┼─>│veth2│<──────────>│ lo │  │
 │  └────┘     ├─────┴──────┐  │    │  ├─────┴──────┐     └────┘  │
 │             │192.0.2.1/24│  │    │  │192.0.2.1/24│             │
 │             └────────────┘  │    │  └────────────┘             │
 └─────────────────────────────┘    └─────────────────────────────┘

Consider sending an ICMP_ECHO packet A in ns2.  Both source and
destination IP addresses are 192.0.2.1, and we use strict mode rp_filter
in both ns1 and ns2:

  1. A is routed to lo since its destination IP address is one of ns2's
     local addresses (veth2);
  2. A is redirected from lo's egress to veth2's egress using mirred;
  3. A arrives at veth1's ingress in ns1;
  4. A is redirected from veth1's ingress to lo's ingress, again, using
     mirred;
  5. In __fib_validate_source(), fib_info_nh_uses_dev() returns false,
     since A was received on lo, but reverse path lookup says veth1;
  6. However A is not dropped since we have relaxed this check for lo in
     commit 66f8209547cc ("fib: relax source validation check for loopback
     packets");

Making sure A is not dropped here in this corner case is the whole point
of having this test.

  7. As A reaches the ICMP layer, an ICMP_ECHOREPLY packet, B, is
     generated;
  8. Similarly, B is redirected from lo's egress to veth1's egress (in
     ns1), then redirected once again from veth2's ingress to lo's
     ingress (in ns2), using mirred.

Also test "ping 127.0.0.1" from ns2.  It does not trigger the relaxed
check in __fib_validate_source(), but just to make sure the topology
works with loopback addresses.

Tested with ping from iputils 20210722-41-gf9fb573:

$ ./fib_tests.sh -t rp_filter

IPv4 rp_filter tests
    TEST: rp_filter passes local packets		[ OK ]
    TEST: rp_filter passes loopback packets		[ OK ]

[1] https://github.com/iputils/iputils/issues/55
[2] https://github.com/iputils/iputils/commit/f455fee41c077d4b700a473b2f5b3487b8febc1d

Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Fixes: adb701d6cfa4 ("selftests: add a test case for rp_filter")
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
---
Change in v3:
    - "ping -I dummy0 198.51.100.1" always work (David Ahern
      <dsahern@gmail.com>); use a different approach instead

Change in v2:
    - s/SOCK_ICMP/SOCK_DGRAM/ in commit message

 tools/testing/selftests/net/fib_tests.sh | 59 ++++++++++++++++++++----
 1 file changed, 49 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh
index 5abe92d55b69..996af1ae3d3d 100755
--- a/tools/testing/selftests/net/fib_tests.sh
+++ b/tools/testing/selftests/net/fib_tests.sh
@@ -444,24 +444,63 @@ fib_rp_filter_test()
 	setup
 
 	set -e
+	ip netns add ns2
+	ip netns set ns2 auto
+
+	ip -netns ns2 link set dev lo up
+
+	$IP link add name veth1 type veth peer name veth2
+	$IP link set dev veth2 netns ns2
+	$IP address add 192.0.2.1/24 dev veth1
+	ip -netns ns2 address add 192.0.2.1/24 dev veth2
+	$IP link set dev veth1 up
+	ip -netns ns2 link set dev veth2 up
+
 	$IP link set dev lo address 52:54:00:6a:c7:5e
-	$IP link set dummy0 address 52:54:00:6a:c7:5e
-	$IP link add dummy1 type dummy
-	$IP link set dummy1 address 52:54:00:6a:c7:5e
-	$IP link set dev dummy1 up
+	$IP link set dev veth1 address 52:54:00:6a:c7:5e
+	ip -netns ns2 link set dev lo address 52:54:00:6a:c7:5e
+	ip -netns ns2 link set dev veth2 address 52:54:00:6a:c7:5e
+
+	# 1. (ns2) redirect lo's egress to veth2's egress
+	ip netns exec ns2 tc qdisc add dev lo parent root handle 1: fq_codel
+	ip netns exec ns2 tc filter add dev lo parent 1: protocol arp basic \
+		action mirred egress redirect dev veth2
+	ip netns exec ns2 tc filter add dev lo parent 1: protocol ip basic \
+		action mirred egress redirect dev veth2
+
+	# 2. (ns1) redirect veth1's ingress to lo's ingress
+	$NS_EXEC tc qdisc add dev veth1 ingress
+	$NS_EXEC tc filter add dev veth1 ingress protocol arp basic \
+		action mirred ingress redirect dev lo
+	$NS_EXEC tc filter add dev veth1 ingress protocol ip basic \
+		action mirred ingress redirect dev lo
+
+	# 3. (ns1) redirect lo's egress to veth1's egress
+	$NS_EXEC tc qdisc add dev lo parent root handle 1: fq_codel
+	$NS_EXEC tc filter add dev lo parent 1: protocol arp basic \
+		action mirred egress redirect dev veth1
+	$NS_EXEC tc filter add dev lo parent 1: protocol ip basic \
+		action mirred egress redirect dev veth1
+
+	# 4. (ns2) redirect veth2's ingress to lo's ingress
+	ip netns exec ns2 tc qdisc add dev veth2 ingress
+	ip netns exec ns2 tc filter add dev veth2 ingress protocol arp basic \
+		action mirred ingress redirect dev lo
+	ip netns exec ns2 tc filter add dev veth2 ingress protocol ip basic \
+		action mirred ingress redirect dev lo
+
 	$NS_EXEC sysctl -qw net.ipv4.conf.all.rp_filter=1
 	$NS_EXEC sysctl -qw net.ipv4.conf.all.accept_local=1
 	$NS_EXEC sysctl -qw net.ipv4.conf.all.route_localnet=1
-
-	$NS_EXEC tc qd add dev dummy1 parent root handle 1: fq_codel
-	$NS_EXEC tc filter add dev dummy1 parent 1: protocol arp basic action mirred egress redirect dev lo
-	$NS_EXEC tc filter add dev dummy1 parent 1: protocol ip basic action mirred egress redirect dev lo
+	ip netns exec ns2 sysctl -qw net.ipv4.conf.all.rp_filter=1
+	ip netns exec ns2 sysctl -qw net.ipv4.conf.all.accept_local=1
+	ip netns exec ns2 sysctl -qw net.ipv4.conf.all.route_localnet=1
 	set +e
 
-	run_cmd "ip netns exec ns1 ping -I dummy1 -w1 -c1 198.51.100.1"
+	run_cmd "ip netns exec ns2 ping -w1 -c1 192.0.2.1"
 	log_test $? 0 "rp_filter passes local packets"
 
-	run_cmd "ip netns exec ns1 ping -I dummy1 -w1 -c1 127.0.0.1"
+	run_cmd "ip netns exec ns2 ping -w1 -c1 127.0.0.1"
 	log_test $? 0 "rp_filter passes loopback packets"
 
 	cleanup
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH net v3] selftests/fib_tests: Rework fib_rp_filter_test()
  2021-12-01  0:47   ` [PATCH net v3] selftests/fib_tests: Rework fib_rp_filter_test() Peilin Ye
@ 2021-12-01 18:00     ` David Ahern
  2021-12-01 19:35       ` Peilin Ye
  2021-12-02 15:50     ` David Ahern
  2021-12-03  2:10     ` patchwork-bot+netdevbpf
  2 siblings, 1 reply; 11+ messages in thread
From: David Ahern @ 2021-12-01 18:00 UTC (permalink / raw)
  To: Peilin Ye, David S. Miller, Jakub Kicinski, Shuah Khan
  Cc: Peilin Ye, Cong Wang, Hangbin Liu, netdev, linux-kselftest, linux-kernel

On 11/30/21 5:47 PM, Peilin Ye wrote:
> From: Peilin Ye <peilin.ye@bytedance.com>
> 
> Currently rp_filter tests in fib_tests.sh:fib_rp_filter_test() are
> failing.  ping sockets are bound to dummy1 using the "-I" option
> (SO_BINDTODEVICE), but socket lookup is failing when receiving ping
> replies, since the routing table thinks they belong to dummy0.
> 
> For example, suppose ping is using a SOCK_RAW socket for ICMP messages.
> When receiving ping replies, in __raw_v4_lookup(), sk->sk_bound_dev_if
> is 3 (dummy1), but dif (skb_rtable(skb)->rt_iif) says 2 (dummy0), so the
> raw_sk_bound_dev_eq() check fails.  Similar things happen in
> ping_lookup() for SOCK_DGRAM sockets.
> 
> These tests used to pass due to a bug [1] in iputils, where "ping -I"
> actually did not bind ICMP message sockets to device.  The bug has been
> fixed by iputils commit f455fee41c07 ("ping: also bind the ICMP socket
> to the specific device") in 2016, which is why our rp_filter tests
> started to fail.  See [2] .
> 
> Fixing the tests while keeping everything in one netns turns out to be
> nontrivial.  Rework the tests and build the following topology:
> 
>  ┌─────────────────────────────┐    ┌─────────────────────────────┐
>  │  network namespace 1 (ns1)  │    │  network namespace 2 (ns2)  │
>  │                             │    │                             │
>  │  ┌────┐     ┌─────┐         │    │  ┌─────┐            ┌────┐  │
>  │  │ lo │<───>│veth1│<────────┼────┼─>│veth2│<──────────>│ lo │  │
>  │  └────┘     ├─────┴──────┐  │    │  ├─────┴──────┐     └────┘  │
>  │             │192.0.2.1/24│  │    │  │192.0.2.1/24│             │
>  │             └────────────┘  │    │  └────────────┘             │
>  └─────────────────────────────┘    └─────────────────────────────┘
> 

if the intention of the tests is to validate that rp_filter = 1 works as
designed, then I suggest a simpler test. 2 namespaces, 2 veth pairs.
Request goes through one interface, and the response comes in the other
via routing in ns2. ns1 would see the response coming in the 'wrong'
interface and drops it.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net v3] selftests/fib_tests: Rework fib_rp_filter_test()
  2021-12-01 18:00     ` David Ahern
@ 2021-12-01 19:35       ` Peilin Ye
  0 siblings, 0 replies; 11+ messages in thread
From: Peilin Ye @ 2021-12-01 19:35 UTC (permalink / raw)
  To: David Ahern
  Cc: David S. Miller, Jakub Kicinski, Shuah Khan, Peilin Ye,
	Cong Wang, Hangbin Liu, netdev, linux-kselftest, linux-kernel

Hi David,

On Wed, Dec 01, 2021 at 11:00:26AM -0700, David Ahern wrote:
> On 11/30/21 5:47 PM, Peilin Ye wrote:
> >  ┌─────────────────────────────┐    ┌─────────────────────────────┐
> >  │  network namespace 1 (ns1)  │    │  network namespace 2 (ns2)  │
> >  │                             │    │                             │
> >  │  ┌────┐     ┌─────┐         │    │  ┌─────┐            ┌────┐  │
> >  │  │ lo │<───>│veth1│<────────┼────┼─>│veth2│<──────────>│ lo │  │
> >  │  └────┘     ├─────┴──────┐  │    │  ├─────┴──────┐     └────┘  │
> >  │             │192.0.2.1/24│  │    │  │192.0.2.1/24│             │
> >  │             └────────────┘  │    │  └────────────┘             │
> >  └─────────────────────────────┘    └─────────────────────────────┘
> 
> if the intention of the tests is to validate that rp_filter = 1 works as
> designed, then I suggest a simpler test. 2 namespaces, 2 veth pairs.
> Request goes through one interface, and the response comes in the other
> via routing in ns2. ns1 would see the response coming in the 'wrong'
> interface and drops it.

Quite the opposite - the goal is to make sure that commit 66f8209547cc
("fib: relax source validation check for loopback packets") _prevents_
packets from being dropped when rp_filter = 1 in this corner case, as I
mentioned in the commit message.

In order to test this corner case, I need a packet that:

  1. was received on lo;
  2. has a local source IP address (other than lo's 127.0.0.1/8, which
     is 192.0.2.1 in this case);
  3. has no dst attached to it (in this case since it was redirected
     from veth).

See __fib_validate_source():

+       dev_match = dev_match || (res.type == RTN_LOCAL &&
+                                 dev == net->loopback_dev);
					      ^^^^^^^^^^^^
This relaxed check only applies to lo, and I do need to redirect packets
from veth ingress to lo ingress in order to trigger this.

Thanks,
Peilin Ye


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net v3] selftests/fib_tests: Rework fib_rp_filter_test()
  2021-12-01  0:47   ` [PATCH net v3] selftests/fib_tests: Rework fib_rp_filter_test() Peilin Ye
  2021-12-01 18:00     ` David Ahern
@ 2021-12-02 15:50     ` David Ahern
  2021-12-03  2:10     ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 11+ messages in thread
From: David Ahern @ 2021-12-02 15:50 UTC (permalink / raw)
  To: Peilin Ye, David S. Miller, Jakub Kicinski, Shuah Khan
  Cc: Peilin Ye, Cong Wang, Hangbin Liu, David Ahern, netdev,
	linux-kselftest, linux-kernel

On 11/30/21 5:47 PM, Peilin Ye wrote:
> From: Peilin Ye <peilin.ye@bytedance.com>
> 
> Currently rp_filter tests in fib_tests.sh:fib_rp_filter_test() are
> failing.  ping sockets are bound to dummy1 using the "-I" option
> (SO_BINDTODEVICE), but socket lookup is failing when receiving ping
> replies, since the routing table thinks they belong to dummy0.
> 
> For example, suppose ping is using a SOCK_RAW socket for ICMP messages.
> When receiving ping replies, in __raw_v4_lookup(), sk->sk_bound_dev_if
> is 3 (dummy1), but dif (skb_rtable(skb)->rt_iif) says 2 (dummy0), so the
> raw_sk_bound_dev_eq() check fails.  Similar things happen in
> ping_lookup() for SOCK_DGRAM sockets.
> 
> These tests used to pass due to a bug [1] in iputils, where "ping -I"
> actually did not bind ICMP message sockets to device.  The bug has been
> fixed by iputils commit f455fee41c07 ("ping: also bind the ICMP socket
> to the specific device") in 2016, which is why our rp_filter tests
> started to fail.  See [2] .
> 
> Fixing the tests while keeping everything in one netns turns out to be
> nontrivial.  Rework the tests and build the following topology:
> 
>  ┌─────────────────────────────┐    ┌─────────────────────────────┐
>  │  network namespace 1 (ns1)  │    │  network namespace 2 (ns2)  │
>  │                             │    │                             │
>  │  ┌────┐     ┌─────┐         │    │  ┌─────┐            ┌────┐  │
>  │  │ lo │<───>│veth1│<────────┼────┼─>│veth2│<──────────>│ lo │  │
>  │  └────┘     ├─────┴──────┐  │    │  ├─────┴──────┐     └────┘  │
>  │             │192.0.2.1/24│  │    │  │192.0.2.1/24│             │
>  │             └────────────┘  │    │  └────────────┘             │
>  └─────────────────────────────┘    └─────────────────────────────┘
> 
> Consider sending an ICMP_ECHO packet A in ns2.  Both source and
> destination IP addresses are 192.0.2.1, and we use strict mode rp_filter
> in both ns1 and ns2:
> 
>   1. A is routed to lo since its destination IP address is one of ns2's
>      local addresses (veth2);
>   2. A is redirected from lo's egress to veth2's egress using mirred;
>   3. A arrives at veth1's ingress in ns1;
>   4. A is redirected from veth1's ingress to lo's ingress, again, using
>      mirred;
>   5. In __fib_validate_source(), fib_info_nh_uses_dev() returns false,
>      since A was received on lo, but reverse path lookup says veth1;
>   6. However A is not dropped since we have relaxed this check for lo in
>      commit 66f8209547cc ("fib: relax source validation check for loopback
>      packets");
> 
> Making sure A is not dropped here in this corner case is the whole point
> of having this test.
> 
>   7. As A reaches the ICMP layer, an ICMP_ECHOREPLY packet, B, is
>      generated;
>   8. Similarly, B is redirected from lo's egress to veth1's egress (in
>      ns1), then redirected once again from veth2's ingress to lo's
>      ingress (in ns2), using mirred.
> 
> Also test "ping 127.0.0.1" from ns2.  It does not trigger the relaxed
> check in __fib_validate_source(), but just to make sure the topology
> works with loopback addresses.
> 
> Tested with ping from iputils 20210722-41-gf9fb573:
> 
> $ ./fib_tests.sh -t rp_filter
> 
> IPv4 rp_filter tests
>     TEST: rp_filter passes local packets		[ OK ]
>     TEST: rp_filter passes loopback packets		[ OK ]
> 
> [1] https://github.com/iputils/iputils/issues/55
> [2] https://github.com/iputils/iputils/commit/f455fee41c077d4b700a473b2f5b3487b8febc1d
> 
> Reported-by: Hangbin Liu <liuhangbin@gmail.com>
> Fixes: adb701d6cfa4 ("selftests: add a test case for rp_filter")
> Reviewed-by: Cong Wang <cong.wang@bytedance.com>
> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
> ---
> Change in v3:
>     - "ping -I dummy0 198.51.100.1" always work (David Ahern
>       <dsahern@gmail.com>); use a different approach instead
> 
> Change in v2:
>     - s/SOCK_ICMP/SOCK_DGRAM/ in commit message
> 
>  tools/testing/selftests/net/fib_tests.sh | 59 ++++++++++++++++++++----
>  1 file changed, 49 insertions(+), 10 deletions(-)
> 

Acked-by: David Ahern <dsahern@kernel.org>



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net v3] selftests/fib_tests: Rework fib_rp_filter_test()
  2021-12-01  0:47   ` [PATCH net v3] selftests/fib_tests: Rework fib_rp_filter_test() Peilin Ye
  2021-12-01 18:00     ` David Ahern
  2021-12-02 15:50     ` David Ahern
@ 2021-12-03  2:10     ` patchwork-bot+netdevbpf
  2021-12-03 13:41       ` Hangbin Liu
  2 siblings, 1 reply; 11+ messages in thread
From: patchwork-bot+netdevbpf @ 2021-12-03  2:10 UTC (permalink / raw)
  To: Peilin Ye
  Cc: davem, kuba, shuah, peilin.ye, xiyou.wangcong, liuhangbin,
	dsahern, netdev, linux-kselftest, linux-kernel

Hello:

This patch was applied to netdev/net.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 30 Nov 2021 16:47:20 -0800 you wrote:
> From: Peilin Ye <peilin.ye@bytedance.com>
> 
> Currently rp_filter tests in fib_tests.sh:fib_rp_filter_test() are
> failing.  ping sockets are bound to dummy1 using the "-I" option
> (SO_BINDTODEVICE), but socket lookup is failing when receiving ping
> replies, since the routing table thinks they belong to dummy0.
> 
> [...]

Here is the summary with links:
  - [net,v3] selftests/fib_tests: Rework fib_rp_filter_test()
    https://git.kernel.org/netdev/net/c/f6071e5e3961

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net v3] selftests/fib_tests: Rework fib_rp_filter_test()
  2021-12-03  2:10     ` patchwork-bot+netdevbpf
@ 2021-12-03 13:41       ` Hangbin Liu
  0 siblings, 0 replies; 11+ messages in thread
From: Hangbin Liu @ 2021-12-03 13:41 UTC (permalink / raw)
  To: Peilin Ye; +Cc: peilin.ye, xiyou.wangcong, netdev

On Fri, Dec 03, 2021 at 02:10:09AM +0000, patchwork-bot+netdevbpf@kernel.org wrote:
> Hello:
> 
> This patch was applied to netdev/net.git (master)
> by Jakub Kicinski <kuba@kernel.org>:
> 
> On Tue, 30 Nov 2021 16:47:20 -0800 you wrote:
> > From: Peilin Ye <peilin.ye@bytedance.com>
> > 
> > Currently rp_filter tests in fib_tests.sh:fib_rp_filter_test() are
> > failing.  ping sockets are bound to dummy1 using the "-I" option
> > (SO_BINDTODEVICE), but socket lookup is failing when receiving ping
> > replies, since the routing table thinks they belong to dummy0.
> > 
> > [...]
> 
> Here is the summary with links:
>   - [net,v3] selftests/fib_tests: Rework fib_rp_filter_test()
>     https://git.kernel.org/netdev/net/c/f6071e5e3961
> 
> You are awesome, thank you!

Thanks Peilin for your fixup.

Cheers
Hangbin

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-12-03 13:41 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-29 22:52 [PATCH net] selftests/fib_tests: ping from dummy0 in fib_rp_filter_test() Peilin Ye
2021-11-30  0:46 ` Peilin Ye
2021-11-30  0:49 ` [PATCH net v2] " Peilin Ye
2021-11-30  1:16   ` David Ahern
2021-11-30  5:13     ` Peilin Ye
2021-12-01  0:47   ` [PATCH net v3] selftests/fib_tests: Rework fib_rp_filter_test() Peilin Ye
2021-12-01 18:00     ` David Ahern
2021-12-01 19:35       ` Peilin Ye
2021-12-02 15:50     ` David Ahern
2021-12-03  2:10     ` patchwork-bot+netdevbpf
2021-12-03 13:41       ` Hangbin Liu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.