All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net v2 0/2] net: Fix return value of qdisc ingress handling on success
@ 2022-09-25  8:14 Paul Blakey
  2022-09-25  8:14 ` [PATCH net v2 1/2] " Paul Blakey
  2022-09-25  8:14 ` [PATCH net v2 2/2] selftests: add selftest for chaining of tc ingress handling to egress Paul Blakey
  0 siblings, 2 replies; 8+ messages in thread
From: Paul Blakey @ 2022-09-25  8:14 UTC (permalink / raw)
  To: Daniel Borkmann, Paul Blakey, Vlad Buslov, Oz Shlomo, Roi Dayan,
	netdev, Saeed Mahameed
  Cc: Eric Dumazet, David S. Miller, Jakub Kicinski, Paolo Abeni

Fix patch + self-test with the currently broken scenario.

v1->v2:
  Changed blamed commit
  Added self-test

Paul Blakey (2):
  net: Fix return value of qdisc ingress handling on success
  selftests: add selftest for chaining of tc ingress handling to egress

 net/core/dev.c                                |  3 +
 .../net/test_ingress_egress_chaining.sh       | 81 +++++++++++++++++++
 2 files changed, 84 insertions(+)
 create mode 100644 tools/testing/selftests/net/test_ingress_egress_chaining.sh

-- 
2.30.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH net v2 1/2] net: Fix return value of qdisc ingress handling on success
  2022-09-25  8:14 [PATCH net v2 0/2] net: Fix return value of qdisc ingress handling on success Paul Blakey
@ 2022-09-25  8:14 ` Paul Blakey
  2022-09-25 18:00   ` Cong Wang
  2022-09-25  8:14 ` [PATCH net v2 2/2] selftests: add selftest for chaining of tc ingress handling to egress Paul Blakey
  1 sibling, 1 reply; 8+ messages in thread
From: Paul Blakey @ 2022-09-25  8:14 UTC (permalink / raw)
  To: Daniel Borkmann, Paul Blakey, Vlad Buslov, Oz Shlomo, Roi Dayan,
	netdev, Saeed Mahameed
  Cc: Eric Dumazet, David S. Miller, Jakub Kicinski, Paolo Abeni

Currently qdisc ingress handling (sch_handle_ingress()) doesn't
set a return value and it is left to the old return value of
the caller (__netif_receive_skb_core()) which is RX drop, so if
the packet is consumed, caller will stop and return this value
as if the packet was dropped.

This causes a problem in the kernel tcp stack when having a
egress tc rule forwarding to a ingress tc rule.
The tcp stack sending packets on the device having the egress rule
will see the packets as not successfully transmitted (although they
actually were), will not advance it's internal state of sent data,
and packets returning on such tcp stream will be dropped by the tcp
stack with reason ack-of-unsent-data. See reproduction in [0] below.

Fix that by setting the return value to RX success if
the packet was handled successfully.

[0] Reproduction steps:
 $ ip link add veth1 type veth peer name peer1
 $ ip link add veth2 type veth peer name peer2
 $ ifconfig peer1 5.5.5.6/24 up
 $ ip netns add ns0
 $ ip link set dev peer2 netns ns0
 $ ip netns exec ns0 ifconfig peer2 5.5.5.5/24 up
 $ ifconfig veth2 0 up
 $ ifconfig veth1 0 up

 #ingress forwarding veth1 <-> veth2
 $ tc qdisc add dev veth2 ingress
 $ tc qdisc add dev veth1 ingress
 $ tc filter add dev veth2 ingress prio 1 proto all flower \
   action mirred egress redirect dev veth1
 $ tc filter add dev veth1 ingress prio 1 proto all flower \
   action mirred egress redirect dev veth2

 #steal packet from peer1 egress to veth2 ingress, bypassing the veth pipe
 $ tc qdisc add dev peer1 clsact
 $ tc filter add dev peer1 egress prio 20 proto ip flower \
   action mirred ingress redirect dev veth1

 #run iperf and see connection not running
 $ iperf3 -s&
 $ ip netns exec ns0 iperf3 -c 5.5.5.6 -i 1

 #delete egress rule, and run again, now should work
 $ tc filter del dev peer1 egress
 $ ip netns exec ns0 iperf3 -c 5.5.5.6 -i 1

Fixes: f697c3e8b35c ("[NET]: Avoid unnecessary cloning for ingress filtering")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
---
 net/core/dev.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index 56c8b0921c9f..c58ab657b164 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5141,6 +5141,7 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
 	case TC_ACT_QUEUED:
 	case TC_ACT_TRAP:
 		consume_skb(skb);
+		*ret = NET_RX_SUCCESS;
 		return NULL;
 	case TC_ACT_REDIRECT:
 		/* skb_mac_header check was done by cls/act_bpf, so
@@ -5153,8 +5154,10 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
 			*another = true;
 			break;
 		}
+		*ret = NET_RX_SUCCESS;
 		return NULL;
 	case TC_ACT_CONSUMED:
+		*ret = NET_RX_SUCCESS;
 		return NULL;
 	default:
 		break;
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH net v2 2/2] selftests: add selftest for chaining of tc ingress handling to egress
  2022-09-25  8:14 [PATCH net v2 0/2] net: Fix return value of qdisc ingress handling on success Paul Blakey
  2022-09-25  8:14 ` [PATCH net v2 1/2] " Paul Blakey
@ 2022-09-25  8:14 ` Paul Blakey
  1 sibling, 0 replies; 8+ messages in thread
From: Paul Blakey @ 2022-09-25  8:14 UTC (permalink / raw)
  To: Daniel Borkmann, Paul Blakey, Vlad Buslov, Oz Shlomo, Roi Dayan,
	netdev, Saeed Mahameed
  Cc: Eric Dumazet, David S. Miller, Jakub Kicinski, Paolo Abeni

This test runs a simple ingress tc setup between two veth pairs,
then adds a egress->ingress rule to test the chaining of tc ingress
pipeline to tc egress piepline.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
---
 .../net/test_ingress_egress_chaining.sh       | 81 +++++++++++++++++++
 1 file changed, 81 insertions(+)
 create mode 100644 tools/testing/selftests/net/test_ingress_egress_chaining.sh

diff --git a/tools/testing/selftests/net/test_ingress_egress_chaining.sh b/tools/testing/selftests/net/test_ingress_egress_chaining.sh
new file mode 100644
index 000000000000..4775f5657e68
--- /dev/null
+++ b/tools/testing/selftests/net/test_ingress_egress_chaining.sh
@@ -0,0 +1,81 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# This test runs a simple ingress tc setup between two veth pairs,
+# and chains a single egress rule to test ingress chaining to egress.
+#
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
+if [ "$(id -u)" -ne 0 ];then
+	echo "SKIP: Need root privileges"
+	exit $ksft_skip
+fi
+
+if [ ! -x "$(command -v iperf)" ]; then
+	echo "SKIP: Could not run test without iperf tool"
+	exit $ksft_skip
+fi
+
+needed_mods="act_mirred cls_flower sch_ingress"
+for mod in $needed_mods; do
+	modinfo $mod &>/dev/null || { echo "SKIP: Need act_mirred module"; exit $ksft_skip; }
+done
+
+ns="ns$((RANDOM%899+100))"
+veth1="veth1$((RANDOM%899+100))"
+veth2="veth2$((RANDOM%899+100))"
+peer1="peer1$((RANDOM%899+100))"
+peer2="peer2$((RANDOM%899+100))"
+
+function fail() {
+	echo "FAIL: $@" >> /dev/stderr
+	exit 1
+}
+
+function cleanup() {
+	killall -q -9 iperf
+	ip link del $veth1 &> /dev/null
+	ip link del $veth2 &> /dev/null
+	ip netns del $ns &> /dev/null
+}
+trap cleanup EXIT
+
+function config() {
+	echo "Setup veth pairs [$veth1, $peer1], and veth pair [$veth2, $peer2]"
+	ip link add $veth1 type veth peer name $peer1
+	ip link add $veth2 type veth peer name $peer2
+	ifconfig $peer1 5.5.5.6/24 up
+	ip netns add $ns
+	ip link set dev $peer2 netns $ns
+	ip netns exec $ns ifconfig $peer2 5.5.5.5/24 up
+	ifconfig $veth2 0 up
+	ifconfig $veth1 0 up
+
+	echo "Add tc filter ingress->egress forwarding $veth1 <-> $veth2"
+	tc qdisc add dev $veth2 ingress
+	tc qdisc add dev $veth1 ingress
+	tc filter add dev $veth2 ingress prio 1 proto all flower \
+		action mirred egress redirect dev $veth1
+	tc filter add dev $veth1 ingress prio 1 proto all flower \
+		action mirred egress redirect dev $veth2
+
+	echo "Add tc filter egress->ingress forwarding $peer1 -> $veth1, bypassing the veth pipe"
+	tc qdisc add dev $peer1 clsact
+	tc filter add dev $peer1 egress prio 20 proto ip flower \
+		action mirred ingress redirect dev $veth1
+}
+
+function test_run() {
+	echo "Run iperf"
+	iperf -s -D
+	timeout -k 2 10 ip netns exec $ns iperf -c 5.5.5.6 -i 1 -t 2 || fail "iperf failed"
+	echo "Test passed"
+}
+
+config
+test_run
+trap - EXIT
+cleanup
+
+
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net v2 1/2] net: Fix return value of qdisc ingress handling on success
  2022-09-25  8:14 ` [PATCH net v2 1/2] " Paul Blakey
@ 2022-09-25 18:00   ` Cong Wang
  2022-09-28  7:55     ` Paul Blakey
  0 siblings, 1 reply; 8+ messages in thread
From: Cong Wang @ 2022-09-25 18:00 UTC (permalink / raw)
  To: Paul Blakey
  Cc: Daniel Borkmann, Vlad Buslov, Oz Shlomo, Roi Dayan, netdev,
	Saeed Mahameed, Eric Dumazet, David S. Miller, Jakub Kicinski,
	Paolo Abeni

On Sun, Sep 25, 2022 at 11:14:21AM +0300, Paul Blakey wrote:
> Currently qdisc ingress handling (sch_handle_ingress()) doesn't
> set a return value and it is left to the old return value of
> the caller (__netif_receive_skb_core()) which is RX drop, so if
> the packet is consumed, caller will stop and return this value
> as if the packet was dropped.
> 
> This causes a problem in the kernel tcp stack when having a
> egress tc rule forwarding to a ingress tc rule.
> The tcp stack sending packets on the device having the egress rule
> will see the packets as not successfully transmitted (although they
> actually were), will not advance it's internal state of sent data,
> and packets returning on such tcp stream will be dropped by the tcp
> stack with reason ack-of-unsent-data. See reproduction in [0] below.
> 

Hm, but how is this return value propagated to egress? I checked
tcf_mirred_act() code, but don't see how it is even used there.

318         err = tcf_mirred_forward(want_ingress, skb2);
319         if (err) {
320 out:
321                 tcf_action_inc_overlimit_qstats(&m->common);
322                 if (tcf_mirred_is_act_redirect(m_eaction))
323                         retval = TC_ACT_SHOT;
324         }
325         __this_cpu_dec(mirred_rec_level);
326
327         return retval;


What am I missing?

Also, the offending commit is very old and this configuration is not
uncommon at all, how could we even not notice this for such a long time?

Thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v2 1/2] net: Fix return value of qdisc ingress handling on success
  2022-09-25 18:00   ` Cong Wang
@ 2022-09-28  7:55     ` Paul Blakey
  2022-10-01 20:19       ` Cong Wang
  0 siblings, 1 reply; 8+ messages in thread
From: Paul Blakey @ 2022-09-28  7:55 UTC (permalink / raw)
  To: Cong Wang
  Cc: Daniel Borkmann, Vlad Buslov, Oz Shlomo, Roi Dayan, netdev,
	Saeed Mahameed, Eric Dumazet, David S. Miller, Jakub Kicinski,
	Paolo Abeni



On 25/09/2022 21:00, Cong Wang wrote:
> On Sun, Sep 25, 2022 at 11:14:21AM +0300, Paul Blakey wrote:
>> Currently qdisc ingress handling (sch_handle_ingress()) doesn't
>> set a return value and it is left to the old return value of
>> the caller (__netif_receive_skb_core()) which is RX drop, so if
>> the packet is consumed, caller will stop and return this value
>> as if the packet was dropped.
>>
>> This causes a problem in the kernel tcp stack when having a
>> egress tc rule forwarding to a ingress tc rule.
>> The tcp stack sending packets on the device having the egress rule
>> will see the packets as not successfully transmitted (although they
>> actually were), will not advance it's internal state of sent data,
>> and packets returning on such tcp stream will be dropped by the tcp
>> stack with reason ack-of-unsent-data. See reproduction in [0] below.
>>
> 
> Hm, but how is this return value propagated to egress? I checked
> tcf_mirred_act() code, but don't see how it is even used there.
> 
> 318         err = tcf_mirred_forward(want_ingress, skb2);
> 319         if (err) {
> 320 out:
> 321                 tcf_action_inc_overlimit_qstats(&m->common);
> 322                 if (tcf_mirred_is_act_redirect(m_eaction))
> 323                         retval = TC_ACT_SHOT;
> 324         }
> 325         __this_cpu_dec(mirred_rec_level);
> 326
> 327         return retval;
> 
> 
> What am I missing?

for the ingress acting act_mirred it will return TC_ACT_CONSUMED above
the code you mentioned (since redirect=1, use_reinsert=1. Although 
TC_ACT_STOLEN which is the retval set for this action, will also act the 
same)


It is propagated as such (TX stack starting from tcp):

__tcp_transmit_skb
ip_queue_xmit
__ip_queue_xmit
ip_local_out
dst_output
ip_output
ip_finish_output
ip_finish_output2
neigh_output
neigh_hh_output
dev_queue_xmit
sch_handle_egress
   tcf_classify
   tcf_mirred_act
   tcf_mirred_forward
   netif_receive_skb
     # here we moved to ingress processing
     netif_receive_skb_internal
     __netif_receive_skb
     __netif_receive_skb_core
       sch_handle_ingress
         tcf_classify
         tcf_mirred_act
           tcf_mirred_forward
           tcf_dev_queue_xmit
           dev_queue_xmit
           # sends packet ...
         return TC_ACT_CONSUMED
       return NULL, and leaves *ret untouched
     return NET_RX_DROP
	...


> 
> Also, the offending commit is very old and this configuration is not
> uncommon at all, how could we even not notice this for such a long time?

I blamed the commit that left the ret value unset for the first time, 
but normally netif_receive_skb return value is ignored (as mentioned in 
the comment above it), this time it isn't ignored because it was chained 
to egress processing of the tcp transmit path. And this specific case 
where I chain tc ingress to tc egress came much later (with the commit
I blamed in v1 of this patch).

> 
> Thanks.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v2 1/2] net: Fix return value of qdisc ingress handling on success
  2022-09-28  7:55     ` Paul Blakey
@ 2022-10-01 20:19       ` Cong Wang
  2022-10-01 20:33         ` Cong Wang
  2022-10-02  9:30         ` Paul Blakey
  0 siblings, 2 replies; 8+ messages in thread
From: Cong Wang @ 2022-10-01 20:19 UTC (permalink / raw)
  To: Paul Blakey
  Cc: Daniel Borkmann, Vlad Buslov, Oz Shlomo, Roi Dayan, netdev,
	Saeed Mahameed, Eric Dumazet, David S. Miller, Jakub Kicinski,
	Paolo Abeni

On Wed, Sep 28, 2022 at 10:55:49AM +0300, Paul Blakey wrote:
> 
> 
> On 25/09/2022 21:00, Cong Wang wrote:
> > On Sun, Sep 25, 2022 at 11:14:21AM +0300, Paul Blakey wrote:
> > > Currently qdisc ingress handling (sch_handle_ingress()) doesn't
> > > set a return value and it is left to the old return value of
> > > the caller (__netif_receive_skb_core()) which is RX drop, so if
> > > the packet is consumed, caller will stop and return this value
> > > as if the packet was dropped.
> > > 
> > > This causes a problem in the kernel tcp stack when having a
> > > egress tc rule forwarding to a ingress tc rule.
> > > The tcp stack sending packets on the device having the egress rule
> > > will see the packets as not successfully transmitted (although they
> > > actually were), will not advance it's internal state of sent data,
> > > and packets returning on such tcp stream will be dropped by the tcp
> > > stack with reason ack-of-unsent-data. See reproduction in [0] below.
> > > 
> > 
> > Hm, but how is this return value propagated to egress? I checked
> > tcf_mirred_act() code, but don't see how it is even used there.
> > 
> > 318         err = tcf_mirred_forward(want_ingress, skb2);
> > 319         if (err) {
> > 320 out:
> > 321                 tcf_action_inc_overlimit_qstats(&m->common);
> > 322                 if (tcf_mirred_is_act_redirect(m_eaction))
> > 323                         retval = TC_ACT_SHOT;
> > 324         }
> > 325         __this_cpu_dec(mirred_rec_level);
> > 326
> > 327         return retval;
> > 
> > 
> > What am I missing?
> 
> for the ingress acting act_mirred it will return TC_ACT_CONSUMED above
> the code you mentioned (since redirect=1, use_reinsert=1. Although
> TC_ACT_STOLEN which is the retval set for this action, will also act the
> same)
> 
> 
> It is propagated as such (TX stack starting from tcp):
> 

Sorry for my misunderstanding.

I meant to say those TC_ACT_* return value, not NET_RX_*, but I worried
too much here, as mirred lets user specify the return value.

BTW, it seems you at least miss the drop case, which is NET_RX_DROP for
TC_ACT_SHOT at least? Possibly other code paths in sch_handle_ingress()
too.

Thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v2 1/2] net: Fix return value of qdisc ingress handling on success
  2022-10-01 20:19       ` Cong Wang
@ 2022-10-01 20:33         ` Cong Wang
  2022-10-02  9:30         ` Paul Blakey
  1 sibling, 0 replies; 8+ messages in thread
From: Cong Wang @ 2022-10-01 20:33 UTC (permalink / raw)
  To: Paul Blakey
  Cc: Daniel Borkmann, Vlad Buslov, Oz Shlomo, Roi Dayan, netdev,
	Saeed Mahameed, Eric Dumazet, David S. Miller, Jakub Kicinski,
	Paolo Abeni

On Sat, Oct 01, 2022 at 01:19:50PM -0700, Cong Wang wrote:
> On Wed, Sep 28, 2022 at 10:55:49AM +0300, Paul Blakey wrote:
> > 
> > 
> > On 25/09/2022 21:00, Cong Wang wrote:
> > > On Sun, Sep 25, 2022 at 11:14:21AM +0300, Paul Blakey wrote:
> > > > Currently qdisc ingress handling (sch_handle_ingress()) doesn't
> > > > set a return value and it is left to the old return value of
> > > > the caller (__netif_receive_skb_core()) which is RX drop, so if
> > > > the packet is consumed, caller will stop and return this value
> > > > as if the packet was dropped.
> > > > 
> > > > This causes a problem in the kernel tcp stack when having a
> > > > egress tc rule forwarding to a ingress tc rule.
> > > > The tcp stack sending packets on the device having the egress rule
> > > > will see the packets as not successfully transmitted (although they
> > > > actually were), will not advance it's internal state of sent data,
> > > > and packets returning on such tcp stream will be dropped by the tcp
> > > > stack with reason ack-of-unsent-data. See reproduction in [0] below.
> > > > 
> > > 
> > > Hm, but how is this return value propagated to egress? I checked
> > > tcf_mirred_act() code, but don't see how it is even used there.
> > > 
> > > 318         err = tcf_mirred_forward(want_ingress, skb2);
> > > 319         if (err) {
> > > 320 out:
> > > 321                 tcf_action_inc_overlimit_qstats(&m->common);
> > > 322                 if (tcf_mirred_is_act_redirect(m_eaction))
> > > 323                         retval = TC_ACT_SHOT;
> > > 324         }
> > > 325         __this_cpu_dec(mirred_rec_level);
> > > 326
> > > 327         return retval;
> > > 
> > > 
> > > What am I missing?
> > 
> > for the ingress acting act_mirred it will return TC_ACT_CONSUMED above
> > the code you mentioned (since redirect=1, use_reinsert=1. Although
> > TC_ACT_STOLEN which is the retval set for this action, will also act the
> > same)
> > 
> > 
> > It is propagated as such (TX stack starting from tcp):
> > 
> 
> Sorry for my misunderstanding.
> 
> I meant to say those TC_ACT_* return value, not NET_RX_*, but I worried
> too much here, as mirred lets user specify the return value.
> 
> BTW, it seems you at least miss the drop case, which is NET_RX_DROP for
> TC_ACT_SHOT at least? Possibly other code paths in sch_handle_ingress()
> too.
> 

I mean:

diff --git a/net/core/dev.c b/net/core/dev.c
index fa53830d0683..d1db8210d671 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5109,6 +5109,7 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
        struct mini_Qdisc *miniq = rcu_dereference_bh(skb->dev->miniq_ingress);
        struct tcf_result cl_res;

+       *ret = NET_RX_SUCCESS;
        /* If there's at least one ingress present somewhere (so
         * we get here via enabled static key), remaining devices
         * that are not configured with an ingress qdisc will bail
@@ -5136,6 +5137,7 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
        case TC_ACT_SHOT:
                mini_qdisc_qstats_cpu_drop(miniq);
                kfree_skb_reason(skb, SKB_DROP_REASON_TC_INGRESS);
+               *ret = NET_RX_DROP;
                return NULL;
        case TC_ACT_STOLEN:
        case TC_ACT_QUEUED:
@@ -5160,6 +5162,7 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
                break;
        }
 #endif /* CONFIG_NET_CLS_ACT */
+       *ret = NET_RX_SUCCESS;
        return skb;
 }



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net v2 1/2] net: Fix return value of qdisc ingress handling on success
  2022-10-01 20:19       ` Cong Wang
  2022-10-01 20:33         ` Cong Wang
@ 2022-10-02  9:30         ` Paul Blakey
  1 sibling, 0 replies; 8+ messages in thread
From: Paul Blakey @ 2022-10-02  9:30 UTC (permalink / raw)
  To: Cong Wang
  Cc: Daniel Borkmann, Vlad Buslov, Oz Shlomo, Roi Dayan, netdev,
	Saeed Mahameed, Eric Dumazet, David S. Miller, Jakub Kicinski,
	Paolo Abeni



On 01/10/2022 23:19, Cong Wang wrote:
> On Wed, Sep 28, 2022 at 10:55:49AM +0300, Paul Blakey wrote:
>>
>>
>> On 25/09/2022 21:00, Cong Wang wrote:
>>> On Sun, Sep 25, 2022 at 11:14:21AM +0300, Paul Blakey wrote:
>>>> Currently qdisc ingress handling (sch_handle_ingress()) doesn't
>>>> set a return value and it is left to the old return value of
>>>> the caller (__netif_receive_skb_core()) which is RX drop, so if
>>>> the packet is consumed, caller will stop and return this value
>>>> as if the packet was dropped.
>>>>
>>>> This causes a problem in the kernel tcp stack when having a
>>>> egress tc rule forwarding to a ingress tc rule.
>>>> The tcp stack sending packets on the device having the egress rule
>>>> will see the packets as not successfully transmitted (although they
>>>> actually were), will not advance it's internal state of sent data,
>>>> and packets returning on such tcp stream will be dropped by the tcp
>>>> stack with reason ack-of-unsent-data. See reproduction in [0] below.
>>>>
>>>
>>> Hm, but how is this return value propagated to egress? I checked
>>> tcf_mirred_act() code, but don't see how it is even used there.
>>>
>>> 318         err = tcf_mirred_forward(want_ingress, skb2);
>>> 319         if (err) {
>>> 320 out:
>>> 321                 tcf_action_inc_overlimit_qstats(&m->common);
>>> 322                 if (tcf_mirred_is_act_redirect(m_eaction))
>>> 323                         retval = TC_ACT_SHOT;
>>> 324         }
>>> 325         __this_cpu_dec(mirred_rec_level);
>>> 326
>>> 327         return retval;
>>>
>>>
>>> What am I missing?
>>
>> for the ingress acting act_mirred it will return TC_ACT_CONSUMED above
>> the code you mentioned (since redirect=1, use_reinsert=1. Although
>> TC_ACT_STOLEN which is the retval set for this action, will also act the
>> same)
>>
>>
>> It is propagated as such (TX stack starting from tcp):
>>
> 
> Sorry for my misunderstanding.
> 
> I meant to say those TC_ACT_* return value, not NET_RX_*, but I worried
> too much here, as mirred lets user specify the return value

Yes TC_ACT_* start at the action mirred case, and end in 
tcf_handle_ingress/egresss switch cases, which then should be converted 
to NET_RX and NET_XMIT if done.

> 
> BTW, it seems you at least miss the drop case, which is NET_RX_DROP for
> TC_ACT_SHOT at least? Possibly other code paths in sch_handle_ingress()
> too.

I'll add the SHOT for v3 as the packet was handled in this case, but I 
should only update ret where the packet/skb was handled, which is where 
we also return NULL, as otherwise rx pipeline should continue and will 
update ret once handled (say in running the rx_handler).

For example, if there are not tc filters (tcf_classify returns 
TC_ACT_UNSPEC) I should not update *ret, and it will continue to the rx 
handler, and if there isn't any, it would return the default ret RX_DROP 
value.


> 
> Thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-10-02  9:30 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-25  8:14 [PATCH net v2 0/2] net: Fix return value of qdisc ingress handling on success Paul Blakey
2022-09-25  8:14 ` [PATCH net v2 1/2] " Paul Blakey
2022-09-25 18:00   ` Cong Wang
2022-09-28  7:55     ` Paul Blakey
2022-10-01 20:19       ` Cong Wang
2022-10-01 20:33         ` Cong Wang
2022-10-02  9:30         ` Paul Blakey
2022-09-25  8:14 ` [PATCH net v2 2/2] selftests: add selftest for chaining of tc ingress handling to egress Paul Blakey

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.