netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net v3 0/2] net: Fix return value of qdisc ingress handling on success
@ 2022-10-02 10:24 Paul Blakey
  2022-10-02 10:24 ` [PATCH net v3 1/2] " Paul Blakey
  2022-10-02 10:24 ` [PATCH net v3 2/2] selftests: add selftest for chaining of tc ingress handling to egress Paul Blakey
  0 siblings, 2 replies; 6+ messages in thread
From: Paul Blakey @ 2022-10-02 10:24 UTC (permalink / raw)
  To: Daniel Borkmann, Paul Blakey, Vlad Buslov, Oz Shlomo, Roi Dayan,
	netdev, Saeed Mahameed
  Cc: Eric Dumazet, David S. Miller, Jakub Kicinski, Paolo Abeni

Fix patch + self-test with the currently broken scenario.

v2->v3:
  Added DROP return to TC_ACT_SHOT case (Cong).

v1->v2:
  Changed blamed commit
  Added self-test

Paul Blakey (2):
  net: Fix return value of qdisc ingress handling on success
  selftests: add selftest for chaining of tc ingress handling to egress

 net/core/dev.c                                |  4 +
 .../net/test_ingress_egress_chaining.sh       | 81 +++++++++++++++++++
 2 files changed, 85 insertions(+)
 create mode 100644 tools/testing/selftests/net/test_ingress_egress_chaining.sh

-- 
2.30.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH net v3 1/2] net: Fix return value of qdisc ingress handling on success
  2022-10-02 10:24 [PATCH net v3 0/2] net: Fix return value of qdisc ingress handling on success Paul Blakey
@ 2022-10-02 10:24 ` Paul Blakey
  2022-10-02 10:24 ` [PATCH net v3 2/2] selftests: add selftest for chaining of tc ingress handling to egress Paul Blakey
  1 sibling, 0 replies; 6+ messages in thread
From: Paul Blakey @ 2022-10-02 10:24 UTC (permalink / raw)
  To: Daniel Borkmann, Paul Blakey, Vlad Buslov, Oz Shlomo, Roi Dayan,
	netdev, Saeed Mahameed
  Cc: Eric Dumazet, David S. Miller, Jakub Kicinski, Paolo Abeni

Currently qdisc ingress handling (sch_handle_ingress()) doesn't
set a return value and it is left to the old return value of
the caller (__netif_receive_skb_core()) which is RX drop, so if
the packet is consumed, caller will stop and return this value
as if the packet was dropped.

This causes a problem in the kernel tcp stack when having a
egress tc rule forwarding to a ingress tc rule.
The tcp stack sending packets on the device having the egress rule
will see the packets as not successfully transmitted (although they
actually were), will not advance it's internal state of sent data,
and packets returning on such tcp stream will be dropped by the tcp
stack with reason ack-of-unsent-data. See reproduction in [0] below.

Fix that by setting the return value to RX success if
the packet was handled successfully.

[0] Reproduction steps:
 $ ip link add veth1 type veth peer name peer1
 $ ip link add veth2 type veth peer name peer2
 $ ifconfig peer1 5.5.5.6/24 up
 $ ip netns add ns0
 $ ip link set dev peer2 netns ns0
 $ ip netns exec ns0 ifconfig peer2 5.5.5.5/24 up
 $ ifconfig veth2 0 up
 $ ifconfig veth1 0 up

 #ingress forwarding veth1 <-> veth2
 $ tc qdisc add dev veth2 ingress
 $ tc qdisc add dev veth1 ingress
 $ tc filter add dev veth2 ingress prio 1 proto all flower \
   action mirred egress redirect dev veth1
 $ tc filter add dev veth1 ingress prio 1 proto all flower \
   action mirred egress redirect dev veth2

 #steal packet from peer1 egress to veth2 ingress, bypassing the veth pipe
 $ tc qdisc add dev peer1 clsact
 $ tc filter add dev peer1 egress prio 20 proto ip flower \
   action mirred ingress redirect dev veth1

 #run iperf and see connection not running
 $ iperf3 -s&
 $ ip netns exec ns0 iperf3 -c 5.5.5.6 -i 1

 #delete egress rule, and run again, now should work
 $ tc filter del dev peer1 egress
 $ ip netns exec ns0 iperf3 -c 5.5.5.6 -i 1

Fixes: f697c3e8b35c ("[NET]: Avoid unnecessary cloning for ingress filtering")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
---
 net/core/dev.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index 56c8b0921c9f..2c14f48d2457 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5136,11 +5136,13 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
 	case TC_ACT_SHOT:
 		mini_qdisc_qstats_cpu_drop(miniq);
 		kfree_skb_reason(skb, SKB_DROP_REASON_TC_INGRESS);
+		*ret = NET_RX_DROP;
 		return NULL;
 	case TC_ACT_STOLEN:
 	case TC_ACT_QUEUED:
 	case TC_ACT_TRAP:
 		consume_skb(skb);
+		*ret = NET_RX_SUCCESS;
 		return NULL;
 	case TC_ACT_REDIRECT:
 		/* skb_mac_header check was done by cls/act_bpf, so
@@ -5153,8 +5155,10 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
 			*another = true;
 			break;
 		}
+		*ret = NET_RX_SUCCESS;
 		return NULL;
 	case TC_ACT_CONSUMED:
+		*ret = NET_RX_SUCCESS;
 		return NULL;
 	default:
 		break;
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH net v3 2/2] selftests: add selftest for chaining of tc ingress handling to egress
  2022-10-02 10:24 [PATCH net v3 0/2] net: Fix return value of qdisc ingress handling on success Paul Blakey
  2022-10-02 10:24 ` [PATCH net v3 1/2] " Paul Blakey
@ 2022-10-02 10:24 ` Paul Blakey
  2022-10-04  9:36   ` Paolo Abeni
  2022-10-04  9:40   ` Paolo Abeni
  1 sibling, 2 replies; 6+ messages in thread
From: Paul Blakey @ 2022-10-02 10:24 UTC (permalink / raw)
  To: Daniel Borkmann, Paul Blakey, Vlad Buslov, Oz Shlomo, Roi Dayan,
	netdev, Saeed Mahameed
  Cc: Eric Dumazet, David S. Miller, Jakub Kicinski, Paolo Abeni

This test runs a simple ingress tc setup between two veth pairs,
then adds a egress->ingress rule to test the chaining of tc ingress
pipeline to tc egress piepline.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
---
 .../net/test_ingress_egress_chaining.sh       | 81 +++++++++++++++++++
 1 file changed, 81 insertions(+)
 create mode 100644 tools/testing/selftests/net/test_ingress_egress_chaining.sh

diff --git a/tools/testing/selftests/net/test_ingress_egress_chaining.sh b/tools/testing/selftests/net/test_ingress_egress_chaining.sh
new file mode 100644
index 000000000000..4775f5657e68
--- /dev/null
+++ b/tools/testing/selftests/net/test_ingress_egress_chaining.sh
@@ -0,0 +1,81 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# This test runs a simple ingress tc setup between two veth pairs,
+# and chains a single egress rule to test ingress chaining to egress.
+#
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
+if [ "$(id -u)" -ne 0 ];then
+	echo "SKIP: Need root privileges"
+	exit $ksft_skip
+fi
+
+if [ ! -x "$(command -v iperf)" ]; then
+	echo "SKIP: Could not run test without iperf tool"
+	exit $ksft_skip
+fi
+
+needed_mods="act_mirred cls_flower sch_ingress"
+for mod in $needed_mods; do
+	modinfo $mod &>/dev/null || { echo "SKIP: Need act_mirred module"; exit $ksft_skip; }
+done
+
+ns="ns$((RANDOM%899+100))"
+veth1="veth1$((RANDOM%899+100))"
+veth2="veth2$((RANDOM%899+100))"
+peer1="peer1$((RANDOM%899+100))"
+peer2="peer2$((RANDOM%899+100))"
+
+function fail() {
+	echo "FAIL: $@" >> /dev/stderr
+	exit 1
+}
+
+function cleanup() {
+	killall -q -9 iperf
+	ip link del $veth1 &> /dev/null
+	ip link del $veth2 &> /dev/null
+	ip netns del $ns &> /dev/null
+}
+trap cleanup EXIT
+
+function config() {
+	echo "Setup veth pairs [$veth1, $peer1], and veth pair [$veth2, $peer2]"
+	ip link add $veth1 type veth peer name $peer1
+	ip link add $veth2 type veth peer name $peer2
+	ifconfig $peer1 5.5.5.6/24 up
+	ip netns add $ns
+	ip link set dev $peer2 netns $ns
+	ip netns exec $ns ifconfig $peer2 5.5.5.5/24 up
+	ifconfig $veth2 0 up
+	ifconfig $veth1 0 up
+
+	echo "Add tc filter ingress->egress forwarding $veth1 <-> $veth2"
+	tc qdisc add dev $veth2 ingress
+	tc qdisc add dev $veth1 ingress
+	tc filter add dev $veth2 ingress prio 1 proto all flower \
+		action mirred egress redirect dev $veth1
+	tc filter add dev $veth1 ingress prio 1 proto all flower \
+		action mirred egress redirect dev $veth2
+
+	echo "Add tc filter egress->ingress forwarding $peer1 -> $veth1, bypassing the veth pipe"
+	tc qdisc add dev $peer1 clsact
+	tc filter add dev $peer1 egress prio 20 proto ip flower \
+		action mirred ingress redirect dev $veth1
+}
+
+function test_run() {
+	echo "Run iperf"
+	iperf -s -D
+	timeout -k 2 10 ip netns exec $ns iperf -c 5.5.5.6 -i 1 -t 2 || fail "iperf failed"
+	echo "Test passed"
+}
+
+config
+test_run
+trap - EXIT
+cleanup
+
+
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net v3 2/2] selftests: add selftest for chaining of tc ingress handling to egress
  2022-10-02 10:24 ` [PATCH net v3 2/2] selftests: add selftest for chaining of tc ingress handling to egress Paul Blakey
@ 2022-10-04  9:36   ` Paolo Abeni
  2022-10-12  8:12     ` Paul Blakey
  2022-10-04  9:40   ` Paolo Abeni
  1 sibling, 1 reply; 6+ messages in thread
From: Paolo Abeni @ 2022-10-04  9:36 UTC (permalink / raw)
  To: Paul Blakey, Daniel Borkmann, Vlad Buslov, Oz Shlomo, Roi Dayan,
	netdev, Saeed Mahameed
  Cc: Eric Dumazet, David S. Miller, Jakub Kicinski

Hello,

On Sun, 2022-10-02 at 13:24 +0300, Paul Blakey wrote:
> This test runs a simple ingress tc setup between two veth pairs,
> then adds a egress->ingress rule to test the chaining of tc ingress
> pipeline to tc egress piepline.
> 
> Signed-off-by: Paul Blakey <paulb@nvidia.com>
> ---
>  .../net/test_ingress_egress_chaining.sh       | 81 +++++++++++++++++++
>  1 file changed, 81 insertions(+)
>  create mode 100644 tools/testing/selftests/net/test_ingress_egress_chaining.sh
> 
> diff --git a/tools/testing/selftests/net/test_ingress_egress_chaining.sh b/tools/testing/selftests/net/test_ingress_egress_chaining.sh
> new file mode 100644
> index 000000000000..4775f5657e68
> --- /dev/null
> +++ b/tools/testing/selftests/net/test_ingress_egress_chaining.sh
> @@ -0,0 +1,81 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +
> +# This test runs a simple ingress tc setup between two veth pairs,
> +# and chains a single egress rule to test ingress chaining to egress.
> +#
> +# Kselftest framework requirement - SKIP code is 4.
> +ksft_skip=4
> +
> +if [ "$(id -u)" -ne 0 ];then
> +	echo "SKIP: Need root privileges"
> +	exit $ksft_skip
> +fi
> +
> +if [ ! -x "$(command -v iperf)" ]; then
> +	echo "SKIP: Could not run test without iperf tool"

You just need to establish a TCP connection towards a given IP, right?

Than you can use the existing self-tests program:

# listener:
./udpgso_bench_rx -t & 

# client:
./udpgso_bench_tx -t -l <transfer time> -4  -D <listener IP>

and avoid dependencies on external tools.

> +	exit $ksft_skip
> +fi
> +
> +needed_mods="act_mirred cls_flower sch_ingress"
> +for mod in $needed_mods; do
> +	modinfo $mod &>/dev/null || { echo "SKIP: Need act_mirred module"; exit $ksft_skip; }
> +done
> +
> +ns="ns$((RANDOM%899+100))"
> +veth1="veth1$((RANDOM%899+100))"
> +veth2="veth2$((RANDOM%899+100))"
> +peer1="peer1$((RANDOM%899+100))"
> +peer2="peer2$((RANDOM%899+100))"
> +
> +function fail() {
> +	echo "FAIL: $@" >> /dev/stderr
> +	exit 1
> +}
> +
> +function cleanup() {
> +	killall -q -9 iperf
> +	ip link del $veth1 &> /dev/null
> +	ip link del $veth2 &> /dev/null
> +	ip netns del $ns &> /dev/null
> +}
> +trap cleanup EXIT
> +
> +function config() {
> +	echo "Setup veth pairs [$veth1, $peer1], and veth pair [$veth2, $peer2]"
> +	ip link add $veth1 type veth peer name $peer1
> +	ip link add $veth2 type veth peer name $peer2
> +	ifconfig $peer1 5.5.5.6/24 up

Please use the modern 'ip addr' syntax. More importantly, it's better
if you move both peers in separate netns, to avoid 'random' self-test
failure due to the specific local routing configuration.

Additionally you could pick addresses from tests blocks (192.0.2.0/24,
198.51.100.0/24, 203.0.113.0/24) or at least from private ranges.

> +	ip netns add $ns
> +	ip link set dev $peer2 netns $ns
> +	ip netns exec $ns ifconfig $peer2 5.5.5.5/24 up
> +	ifconfig $veth2 0 up
> +	ifconfig $veth1 0 up

Please use 'ip link' ...

> +
> +	echo "Add tc filter ingress->egress forwarding $veth1 <-> $veth2"
> +	tc qdisc add dev $veth2 ingress
> +	tc qdisc add dev $veth1 ingress
> +	tc filter add dev $veth2 ingress prio 1 proto all flower \
> +		action mirred egress redirect dev $veth1
> +	tc filter add dev $veth1 ingress prio 1 proto all flower \
> +		action mirred egress redirect dev $veth2
> +
> +	echo "Add tc filter egress->ingress forwarding $peer1 -> $veth1, bypassing the veth pipe"
> +	tc qdisc add dev $peer1 clsact
> +	tc filter add dev $peer1 egress prio 20 proto ip flower \
> +		action mirred ingress redirect dev $veth1
> +}
> +
> +function test_run() {
> +	echo "Run iperf"
> +	iperf -s -D

Depending on the timing, the server can create the listener socket
after that the client tried to connect, causing random failures. 

You should introduce some explicit, small, delay to give the server the
time to start-up, e.g.:

# start server
sleep 0.2
# start client


Thanks!

Paolo


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net v3 2/2] selftests: add selftest for chaining of tc ingress handling to egress
  2022-10-02 10:24 ` [PATCH net v3 2/2] selftests: add selftest for chaining of tc ingress handling to egress Paul Blakey
  2022-10-04  9:36   ` Paolo Abeni
@ 2022-10-04  9:40   ` Paolo Abeni
  1 sibling, 0 replies; 6+ messages in thread
From: Paolo Abeni @ 2022-10-04  9:40 UTC (permalink / raw)
  To: Paul Blakey, Daniel Borkmann, Vlad Buslov, Oz Shlomo, Roi Dayan,
	netdev, Saeed Mahameed
  Cc: Eric Dumazet, David S. Miller, Jakub Kicinski

On Sun, 2022-10-02 at 13:24 +0300, Paul Blakey wrote:
> This test runs a simple ingress tc setup between two veth pairs,
> then adds a egress->ingress rule to test the chaining of tc ingress
> pipeline to tc egress piepline.
> 
> Signed-off-by: Paul Blakey <paulb@nvidia.com>
> ---

whoops... I forgot an important item: you should add the new test to
the net self-tests Makefile:

TEST_PROGS += test_ingress_egress_chaining.sh

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net v3 2/2] selftests: add selftest for chaining of tc ingress handling to egress
  2022-10-04  9:36   ` Paolo Abeni
@ 2022-10-12  8:12     ` Paul Blakey
  0 siblings, 0 replies; 6+ messages in thread
From: Paul Blakey @ 2022-10-12  8:12 UTC (permalink / raw)
  To: Paolo Abeni, Daniel Borkmann, Vlad Buslov, Oz Shlomo, Roi Dayan,
	netdev, Saeed Mahameed
  Cc: Eric Dumazet, David S. Miller, Jakub Kicinski



On 04/10/2022 12:36, Paolo Abeni wrote:
> Hello,
> 
> On Sun, 2022-10-02 at 13:24 +0300, Paul Blakey wrote:
>> This test runs a simple ingress tc setup between two veth pairs,
>> then adds a egress->ingress rule to test the chaining of tc ingress
>> pipeline to tc egress piepline.
>>
>> Signed-off-by: Paul Blakey <paulb@nvidia.com>
>> ---
>>   .../net/test_ingress_egress_chaining.sh       | 81 +++++++++++++++++++
>>   1 file changed, 81 insertions(+)
>>   create mode 100644 tools/testing/selftests/net/test_ingress_egress_chaining.sh
>>
>> diff --git a/tools/testing/selftests/net/test_ingress_egress_chaining.sh b/tools/testing/selftests/net/test_ingress_egress_chaining.sh
>> new file mode 100644
>> index 000000000000..4775f5657e68
>> --- /dev/null
>> +++ b/tools/testing/selftests/net/test_ingress_egress_chaining.sh
>> @@ -0,0 +1,81 @@
>> +#!/bin/bash
>> +# SPDX-License-Identifier: GPL-2.0
>> +
>> +# This test runs a simple ingress tc setup between two veth pairs,
>> +# and chains a single egress rule to test ingress chaining to egress.
>> +#
>> +# Kselftest framework requirement - SKIP code is 4.
>> +ksft_skip=4
>> +
>> +if [ "$(id -u)" -ne 0 ];then
>> +	echo "SKIP: Need root privileges"
>> +	exit $ksft_skip
>> +fi
>> +
>> +if [ ! -x "$(command -v iperf)" ]; then
>> +	echo "SKIP: Could not run test without iperf tool"
> 
> You just need to establish a TCP connection towards a given IP, right?
> 
> Than you can use the existing self-tests program:
> 
> # listener:
> ./udpgso_bench_rx -t &
> 
> # client:
> ./udpgso_bench_tx -t -l <transfer time> -4  -D <listener IP>
> 
> and avoid dependencies on external tools.
> 
>> +	exit $ksft_skip
>> +fi
>> +
>> +needed_mods="act_mirred cls_flower sch_ingress"
>> +for mod in $needed_mods; do
>> +	modinfo $mod &>/dev/null || { echo "SKIP: Need act_mirred module"; exit $ksft_skip; }
>> +done
>> +
>> +ns="ns$((RANDOM%899+100))"
>> +veth1="veth1$((RANDOM%899+100))"
>> +veth2="veth2$((RANDOM%899+100))"
>> +peer1="peer1$((RANDOM%899+100))"
>> +peer2="peer2$((RANDOM%899+100))"
>> +
>> +function fail() {
>> +	echo "FAIL: $@" >> /dev/stderr
>> +	exit 1
>> +}
>> +
>> +function cleanup() {
>> +	killall -q -9 iperf
>> +	ip link del $veth1 &> /dev/null
>> +	ip link del $veth2 &> /dev/null
>> +	ip netns del $ns &> /dev/null
>> +}
>> +trap cleanup EXIT
>> +
>> +function config() {
>> +	echo "Setup veth pairs [$veth1, $peer1], and veth pair [$veth2, $peer2]"
>> +	ip link add $veth1 type veth peer name $peer1
>> +	ip link add $veth2 type veth peer name $peer2
>> +	ifconfig $peer1 5.5.5.6/24 up
> 
> Please use the modern 'ip addr' syntax. More importantly, it's better
> if you move both peers in separate netns, to avoid 'random' self-test
> failure due to the specific local routing configuration.

I have one of the peers outside a namespace because I needed the egress 
tc rule to see both devices.

Besides that I will change as requested, Thanks.

> 
> Additionally you could pick addresses from tests blocks (192.0.2.0/24,
> 198.51.100.0/24, 203.0.113.0/24) or at least from private ranges.
> 
>> +	ip netns add $ns
>> +	ip link set dev $peer2 netns $ns
>> +	ip netns exec $ns ifconfig $peer2 5.5.5.5/24 up
>> +	ifconfig $veth2 0 up
>> +	ifconfig $veth1 0 up
> 
> Please use 'ip link' ...
> 
>> +
>> +	echo "Add tc filter ingress->egress forwarding $veth1 <-> $veth2"
>> +	tc qdisc add dev $veth2 ingress
>> +	tc qdisc add dev $veth1 ingress
>> +	tc filter add dev $veth2 ingress prio 1 proto all flower \
>> +		action mirred egress redirect dev $veth1
>> +	tc filter add dev $veth1 ingress prio 1 proto all flower \
>> +		action mirred egress redirect dev $veth2
>> +
>> +	echo "Add tc filter egress->ingress forwarding $peer1 -> $veth1, bypassing the veth pipe"
>> +	tc qdisc add dev $peer1 clsact
>> +	tc filter add dev $peer1 egress prio 20 proto ip flower \
>> +		action mirred ingress redirect dev $veth1
>> +}
>> +
>> +function test_run() {
>> +	echo "Run iperf"
>> +	iperf -s -D
> 
> Depending on the timing, the server can create the listener socket
> after that the client tried to connect, causing random failures.
> 
> You should introduce some explicit, small, delay to give the server the
> time to start-up, e.g.:
> 
> # start server
> sleep 0.2
> # start client
> 
> 
> Thanks!
> 
> Paolo
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-10-12  8:12 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-02 10:24 [PATCH net v3 0/2] net: Fix return value of qdisc ingress handling on success Paul Blakey
2022-10-02 10:24 ` [PATCH net v3 1/2] " Paul Blakey
2022-10-02 10:24 ` [PATCH net v3 2/2] selftests: add selftest for chaining of tc ingress handling to egress Paul Blakey
2022-10-04  9:36   ` Paolo Abeni
2022-10-12  8:12     ` Paul Blakey
2022-10-04  9:40   ` Paolo Abeni

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).