netdev.vger.kernel.org archive mirror
* [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
@ 2022-09-23 15:11 Davide Caratti
  2022-09-25 18:08 ` Cong Wang
  2022-11-18 23:07 ` Peilin Ye
  0 siblings, 2 replies; 17+ messages in thread
From: Davide Caratti @ 2022-09-23 15:11 UTC (permalink / raw)
  To: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Paolo Abeni
  Cc: Marcelo Ricardo Leitner, wizhao, netdev

William reports kernel soft-lockups on some OVS topologies when the TC
mirred "egress-to-ingress" action is hit by local TCP traffic. Indeed,
using the mirred action in the egress-to-ingress direction can easily
produce a dmesg splat like:

 ============================================
 WARNING: possible recursive locking detected
 6.0.0-rc4+ #511 Not tainted
 --------------------------------------------
 nc/1037 is trying to acquire lock:
 ffff950687843cb0 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160

 but task is already holding lock:
 ffff950687846cb0 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(slock-AF_INET/1);
   lock(slock-AF_INET/1);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 12 locks held by nc/1037:
  #0: ffff950687843d40 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendmsg+0x19/0x40
  #1: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x5/0x610
  #2: ffffffff9be072e0 (rcu_read_lock_bh){....}-{1:2}, at: ip_finish_output2+0xaa/0xa10
  #3: ffffffff9be072e0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x72/0x11b0
  #4: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb+0x181/0x400
  #5: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x54/0x160
  #6: ffff950687846cb0 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160
  #7: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x5/0x610
  #8: ffffffff9be072e0 (rcu_read_lock_bh){....}-{1:2}, at: ip_finish_output2+0xaa/0xa10
  #9: ffffffff9be072e0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x72/0x11b0
  #10: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb+0x181/0x400
  #11: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x54/0x160

 stack backtrace:
 CPU: 1 PID: 1037 Comm: nc Not tainted 6.0.0-rc4+ #511
 Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7353+9de0a3cc 04/01/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x44/0x5b
  __lock_acquire.cold.76+0x121/0x2a7
  lock_acquire+0xd5/0x310
  _raw_spin_lock_nested+0x39/0x70
  tcp_v4_rcv+0x1023/0x1160
  ip_protocol_deliver_rcu+0x4d/0x280
  ip_local_deliver_finish+0xac/0x160
  ip_local_deliver+0x71/0x220
  ip_rcv+0x5a/0x200
  __netif_receive_skb_one_core+0x89/0xa0
  netif_receive_skb+0x1c1/0x400
  tcf_mirred_act+0x2a5/0x610 [act_mirred]
  tcf_action_exec+0xb3/0x210
  fl_classify+0x1f7/0x240 [cls_flower]
  tcf_classify+0x7b/0x320
  __dev_queue_xmit+0x3a4/0x11b0
  ip_finish_output2+0x3b8/0xa10
  ip_output+0x7f/0x260
  __ip_queue_xmit+0x1ce/0x610
  __tcp_transmit_skb+0xabc/0xc80
  tcp_rcv_state_process+0x669/0x1290
  tcp_v4_do_rcv+0xd7/0x370
  tcp_v4_rcv+0x10bc/0x1160
  ip_protocol_deliver_rcu+0x4d/0x280
  ip_local_deliver_finish+0xac/0x160
  ip_local_deliver+0x71/0x220
  ip_rcv+0x5a/0x200
  __netif_receive_skb_one_core+0x89/0xa0
  netif_receive_skb+0x1c1/0x400
  tcf_mirred_act+0x2a5/0x610 [act_mirred]
  tcf_action_exec+0xb3/0x210
  fl_classify+0x1f7/0x240 [cls_flower]
  tcf_classify+0x7b/0x320
  __dev_queue_xmit+0x3a4/0x11b0
  ip_finish_output2+0x3b8/0xa10
  ip_output+0x7f/0x260
  __ip_queue_xmit+0x1ce/0x610
  __tcp_transmit_skb+0xabc/0xc80
  tcp_write_xmit+0x229/0x12c0
  __tcp_push_pending_frames+0x32/0xf0
  tcp_sendmsg_locked+0x297/0xe10
  tcp_sendmsg+0x27/0x40
  sock_sendmsg+0x58/0x70
  __sys_sendto+0xfd/0x170
  __x64_sys_sendto+0x24/0x30
  do_syscall_64+0x3a/0x90
  entry_SYSCALL_64_after_hwframe+0x63/0xcd
 RIP: 0033:0x7f11a06fd281
 Code: 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 e5 43 2c 00 41 89 ca 8b 00 85 c0 75 1c 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 67 c3 66 0f 1f 44 00 00 41 56 41 89 ce 41 55
 RSP: 002b:00007ffd17958358 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
 RAX: ffffffffffffffda RBX: 0000555c6e671610 RCX: 00007f11a06fd281
 RDX: 0000000000002000 RSI: 0000555c6e73a9f0 RDI: 0000000000000003
 RBP: 0000555c6e6433b0 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000002000
 R13: 0000555c6e671410 R14: 0000555c6e671410 R15: 0000555c6e6433f8
  </TASK>

that is very similar to those observed by William in his setup.
By using netif_rx() for mirred ingress packets, packets are queued in the
backlog, as is done in the receive path of "loopback" and "veth", and
the deadlock is no longer visible. Also add a selftest that can be used
to reproduce the problem / verify the fix.

Fixes: 53592b364001 ("net/sched: act_mirred: Implement ingress actions")
Reported-by: William Zhao <wizhao@redhat.com>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
---
 net/sched/act_mirred.c                        |  2 +-
 .../selftests/net/forwarding/tc_actions.sh    | 29 ++++++++++++++++++-
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index a1d70cf86843..ff965ed2dd9f 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -213,7 +213,7 @@ static int tcf_mirred_forward(bool want_ingress, struct sk_buff *skb)
 	if (!want_ingress)
 		err = tcf_dev_queue_xmit(skb, dev_queue_xmit);
 	else
-		err = netif_receive_skb(skb);
+		err = netif_rx(skb);
 
 	return err;
 }
diff --git a/tools/testing/selftests/net/forwarding/tc_actions.sh b/tools/testing/selftests/net/forwarding/tc_actions.sh
index 1e0a62f638fe..6e1b0cc68f7d 100755
--- a/tools/testing/selftests/net/forwarding/tc_actions.sh
+++ b/tools/testing/selftests/net/forwarding/tc_actions.sh
@@ -3,7 +3,8 @@
 
 ALL_TESTS="gact_drop_and_ok_test mirred_egress_redirect_test \
 	mirred_egress_mirror_test matchall_mirred_egress_mirror_test \
-	gact_trap_test mirred_egress_to_ingress_test"
+	gact_trap_test mirred_egress_to_ingress_test \
+	mirred_egress_to_ingress_tcp_test"
 NUM_NETIFS=4
 source tc_common.sh
 source lib.sh
@@ -198,6 +199,32 @@ mirred_egress_to_ingress_test()
 	log_test "mirred_egress_to_ingress ($tcflags)"
 }
 
+mirred_egress_to_ingress_tcp_test()
+{
+	local tmpfile=$(mktemp) tmpfile1=$(mktemp)
+
+	RET=0
+	dd conv=sparse status=none if=/dev/zero bs=1M count=2 of=$tmpfile
+	tc filter add dev $h1 protocol ip pref 100 handle 100 egress flower \
+		ip_proto tcp src_ip 192.0.2.1 dst_ip 192.0.2.2 \
+			action ct commit nat src addr 192.0.2.2 pipe \
+			action ct clear pipe \
+			action ct commit nat dst addr 192.0.2.1 pipe \
+			action ct clear pipe \
+			action skbedit ptype host pipe \
+			action mirred ingress redirect dev $h1
+
+	ip vrf exec v$h1 nc --recv-only -w10 -l -p 12345 -o $tmpfile1 &
+	local rpid=$!
+	ip vrf exec v$h1 nc -w1 --send-only 192.0.2.2 12345 <$tmpfile
+	wait -n $rpid
+	cmp -s $tmpfile $tmpfile1
+	check_err $? "server output check failed"
+	tc filter del dev $h1 egress protocol ip pref 100 handle 100 flower
+	rm -f $tmpfile $tmpfile1
+	log_test "mirred_egress_to_ingress_tcp"
+}
+
 setup_prepare()
 {
 	h1=${NETIFS[p1]}
-- 
2.37.2



* Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
  2022-09-23 15:11 [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress Davide Caratti
@ 2022-09-25 18:08 ` Cong Wang
  2022-10-04 17:40   ` Davide Caratti
  2022-11-18 23:07 ` Peilin Ye
  1 sibling, 1 reply; 17+ messages in thread
From: Cong Wang @ 2022-09-25 18:08 UTC (permalink / raw)
  To: Davide Caratti
  Cc: Jamal Hadi Salim, Jiri Pirko, Paolo Abeni,
	Marcelo Ricardo Leitner, wizhao, netdev

On Fri, Sep 23, 2022 at 05:11:12PM +0200, Davide Caratti wrote:
> William reports kernel soft-lockups on some OVS topologies when TC mirred
> "egress-to-ingress" action is hit by local TCP traffic. Indeed, using the
> mirred action in egress-to-ingress can easily produce a dmesg splat like:
> 
>  ============================================
>  WARNING: possible recursive locking detected
>  6.0.0-rc4+ #511 Not tainted
>  --------------------------------------------
>  nc/1037 is trying to acquire lock:
>  ffff950687843cb0 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160
> 
>  but task is already holding lock:
>  ffff950687846cb0 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160
> 
>  other info that might help us debug this:
>   Possible unsafe locking scenario:
> 
>         CPU0
>         ----
>    lock(slock-AF_INET/1);
>    lock(slock-AF_INET/1);
> 
>   *** DEADLOCK ***
> 
>   May be due to missing lock nesting notation
> 
>  12 locks held by nc/1037:
>   #0: ffff950687843d40 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendmsg+0x19/0x40
>   #1: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x5/0x610
>   #2: ffffffff9be072e0 (rcu_read_lock_bh){....}-{1:2}, at: ip_finish_output2+0xaa/0xa10
>   #3: ffffffff9be072e0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x72/0x11b0
>   #4: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb+0x181/0x400
>   #5: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x54/0x160
>   #6: ffff950687846cb0 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160
>   #7: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x5/0x610
>   #8: ffffffff9be072e0 (rcu_read_lock_bh){....}-{1:2}, at: ip_finish_output2+0xaa/0xa10
>   #9: ffffffff9be072e0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x72/0x11b0
>   #10: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb+0x181/0x400
>   #11: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x54/0x160
> 
>  stack backtrace:
>  CPU: 1 PID: 1037 Comm: nc Not tainted 6.0.0-rc4+ #511
>  Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7353+9de0a3cc 04/01/2014
>  Call Trace:
>   <TASK>
>   dump_stack_lvl+0x44/0x5b
>   __lock_acquire.cold.76+0x121/0x2a7
>   lock_acquire+0xd5/0x310
>   _raw_spin_lock_nested+0x39/0x70
>   tcp_v4_rcv+0x1023/0x1160
>   ip_protocol_deliver_rcu+0x4d/0x280
>   ip_local_deliver_finish+0xac/0x160
>   ip_local_deliver+0x71/0x220
>   ip_rcv+0x5a/0x200
>   __netif_receive_skb_one_core+0x89/0xa0
>   netif_receive_skb+0x1c1/0x400
>   tcf_mirred_act+0x2a5/0x610 [act_mirred]
>   tcf_action_exec+0xb3/0x210
>   fl_classify+0x1f7/0x240 [cls_flower]
>   tcf_classify+0x7b/0x320
>   __dev_queue_xmit+0x3a4/0x11b0
>   ip_finish_output2+0x3b8/0xa10
>   ip_output+0x7f/0x260
>   __ip_queue_xmit+0x1ce/0x610
>   __tcp_transmit_skb+0xabc/0xc80
>   tcp_rcv_state_process+0x669/0x1290
>   tcp_v4_do_rcv+0xd7/0x370
>   tcp_v4_rcv+0x10bc/0x1160
>   ip_protocol_deliver_rcu+0x4d/0x280
>   ip_local_deliver_finish+0xac/0x160
>   ip_local_deliver+0x71/0x220
>   ip_rcv+0x5a/0x200
>   __netif_receive_skb_one_core+0x89/0xa0
>   netif_receive_skb+0x1c1/0x400
>   tcf_mirred_act+0x2a5/0x610 [act_mirred]
>   tcf_action_exec+0xb3/0x210
>   fl_classify+0x1f7/0x240 [cls_flower]
>   tcf_classify+0x7b/0x320
>   __dev_queue_xmit+0x3a4/0x11b0
>   ip_finish_output2+0x3b8/0xa10
>   ip_output+0x7f/0x260
>   __ip_queue_xmit+0x1ce/0x610
>   __tcp_transmit_skb+0xabc/0xc80
>   tcp_write_xmit+0x229/0x12c0
>   __tcp_push_pending_frames+0x32/0xf0
>   tcp_sendmsg_locked+0x297/0xe10
>   tcp_sendmsg+0x27/0x40
>   sock_sendmsg+0x58/0x70
>   __sys_sendto+0xfd/0x170
>   __x64_sys_sendto+0x24/0x30
>   do_syscall_64+0x3a/0x90
>   entry_SYSCALL_64_after_hwframe+0x63/0xcd
>  RIP: 0033:0x7f11a06fd281
>  Code: 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 e5 43 2c 00 41 89 ca 8b 00 85 c0 75 1c 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 67 c3 66 0f 1f 44 00 00 41 56 41 89 ce 41 55
>  RSP: 002b:00007ffd17958358 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
>  RAX: ffffffffffffffda RBX: 0000555c6e671610 RCX: 00007f11a06fd281
>  RDX: 0000000000002000 RSI: 0000555c6e73a9f0 RDI: 0000000000000003
>  RBP: 0000555c6e6433b0 R08: 0000000000000000 R09: 0000000000000000
>  R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000002000
>  R13: 0000555c6e671410 R14: 0000555c6e671410 R15: 0000555c6e6433f8
>   </TASK>
> 
> that is very similar to those observed by William in his setup.
> By using netif_rx() for mirred ingress packets, packets are queued in the
> backlog, like it's done in the receive path of "loopback" and "veth", and
> the deadlock is not visible anymore. Also add a selftest that can be used
> to reproduce the problem / verify the fix.

Which also means we can no longer know the RX path status, right?
I mean if we have filters on ingress, we can't know whether they
drop this packet or not after this patch? To me, this at least breaks
users' expectations.

BTW, have you thought about solving the above lockdep warning in the TCP
layer?

Thanks.


* Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
  2022-09-25 18:08 ` Cong Wang
@ 2022-10-04 17:40   ` Davide Caratti
  2022-10-16 17:28     ` Cong Wang
  0 siblings, 1 reply; 17+ messages in thread
From: Davide Caratti @ 2022-10-04 17:40 UTC (permalink / raw)
  To: Cong Wang
  Cc: Jamal Hadi Salim, Jiri Pirko, Paolo Abeni,
	Marcelo Ricardo Leitner, wizhao, netdev

hello Cong, thanks for looking at this!

On Sun, Sep 25, 2022 at 11:08:48AM -0700, Cong Wang wrote:
> On Fri, Sep 23, 2022 at 05:11:12PM +0200, Davide Caratti wrote:
> > William reports kernel soft-lockups on some OVS topologies when TC mirred
> > "egress-to-ingress" action is hit by local TCP traffic. Indeed, using the
> > mirred action in egress-to-ingress can easily produce a dmesg splat like:
> > 
> >  ============================================
> >  WARNING: possible recursive locking detected

[...]

> >  6.0.0-rc4+ #511 Not tainted
> >  --------------------------------------------
> >  nc/1037 is trying to acquire lock:
> >  ffff950687843cb0 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160
> > 
> >  but task is already holding lock:
> >  ffff950687846cb0 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160

FTR, this is:

2091         sk_incoming_cpu_update(sk);
2092
2093         bh_lock_sock_nested(sk); <--- the lock reported in the splat
2094         tcp_segs_in(tcp_sk(sk), skb);
2095         ret = 0;
2096         if (!sock_owned_by_user(sk)) {

> BTW, have you thought about solving the above lockdep warning in TCP
> layer?

yes, but that doesn't look like a trivial fix at all - and I doubt it's
worth doing just to make mirred and TCP "friends". Please note:
on current kernels this doesn't just result in a lockdep warning: using
iperf3 on unpatched kernels it's possible to see a real deadlock, like:

 WARNING: possible circular locking dependency detected
 6.0.0-rc4+ #511 Not tainted
 ------------------------------------------------------
 iperf3/1021 is trying to acquire lock:
 ffff976005c5a630 (slock-AF_INET6/1){+...}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160

 but task is already holding lock:
 ffff97607b06e0b0 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160

 which lock already depends on the new lock.


 the existing dependency chain (in reverse order) is:

 -> #1 (slock-AF_INET/1){+.-.}-{2:2}:
        lock_acquire+0xd5/0x310
        _raw_spin_lock_nested+0x39/0x70
        tcp_v4_rcv+0x1023/0x1160
        ip_protocol_deliver_rcu+0x4d/0x280
        ip_local_deliver_finish+0xac/0x160
        ip_local_deliver+0x71/0x220
        ip_rcv+0x5a/0x200
        __netif_receive_skb_one_core+0x89/0xa0
        netif_receive_skb+0x1c1/0x400
        tcf_mirred_act+0x2a5/0x610 [act_mirred]
        tcf_action_exec+0xb3/0x210
        fl_classify+0x1f7/0x240 [cls_flower]
        tcf_classify+0x7b/0x320
        __dev_queue_xmit+0x3a4/0x11b0
        ip_finish_output2+0x3b8/0xa10
        ip_output+0x7f/0x260
        __ip_queue_xmit+0x1ce/0x610
        __tcp_transmit_skb+0xabc/0xc80
        tcp_rcv_established+0x284/0x810
        tcp_v4_do_rcv+0x1f3/0x370
        tcp_v4_rcv+0x10bc/0x1160
        ip_protocol_deliver_rcu+0x4d/0x280
        ip_local_deliver_finish+0xac/0x160
        ip_local_deliver+0x71/0x220
        ip_rcv+0x5a/0x200
        __netif_receive_skb_one_core+0x89/0xa0
        netif_receive_skb+0x1c1/0x400
        tcf_mirred_act+0x2a5/0x610 [act_mirred]
        tcf_action_exec+0xb3/0x210
        fl_classify+0x1f7/0x240 [cls_flower]
        tcf_classify+0x7b/0x320
        __dev_queue_xmit+0x3a4/0x11b0
        ip_finish_output2+0x3b8/0xa10
        ip_output+0x7f/0x260
        __ip_queue_xmit+0x1ce/0x610
        __tcp_transmit_skb+0xabc/0xc80
        tcp_write_xmit+0x229/0x12c0
        __tcp_push_pending_frames+0x32/0xf0
        tcp_sendmsg_locked+0x297/0xe10
        tcp_sendmsg+0x27/0x40
        sock_sendmsg+0x58/0x70
        sock_write_iter+0x9a/0x100
        vfs_write+0x481/0x4f0
        ksys_write+0xc2/0xe0
        do_syscall_64+0x3a/0x90
        entry_SYSCALL_64_after_hwframe+0x63/0xcd

 -> #0 (slock-AF_INET6/1){+...}-{2:2}:
        check_prevs_add+0x185/0xf50
        __lock_acquire+0x11eb/0x1620
        lock_acquire+0xd5/0x310
        _raw_spin_lock_nested+0x39/0x70
        tcp_v4_rcv+0x1023/0x1160
        ip_protocol_deliver_rcu+0x4d/0x280
        ip_local_deliver_finish+0xac/0x160
        ip_local_deliver+0x71/0x220
        ip_rcv+0x5a/0x200
        __netif_receive_skb_one_core+0x89/0xa0
        netif_receive_skb+0x1c1/0x400
        tcf_mirred_act+0x2a5/0x610 [act_mirred]
        tcf_action_exec+0xb3/0x210
        fl_classify+0x1f7/0x240 [cls_flower]
        tcf_classify+0x7b/0x320
        __dev_queue_xmit+0x3a4/0x11b0
        ip_finish_output2+0x3b8/0xa10
        ip_output+0x7f/0x260
        __ip_queue_xmit+0x1ce/0x610
        __tcp_transmit_skb+0xabc/0xc80
        tcp_rcv_established+0x284/0x810
        tcp_v4_do_rcv+0x1f3/0x370
        tcp_v4_rcv+0x10bc/0x1160
        ip_protocol_deliver_rcu+0x4d/0x280
        ip_local_deliver_finish+0xac/0x160
        ip_local_deliver+0x71/0x220
        ip_rcv+0x5a/0x200
        __netif_receive_skb_one_core+0x89/0xa0
        netif_receive_skb+0x1c1/0x400
        tcf_mirred_act+0x2a5/0x610 [act_mirred]
        tcf_action_exec+0xb3/0x210
        fl_classify+0x1f7/0x240 [cls_flower]
        tcf_classify+0x7b/0x320
        __dev_queue_xmit+0x3a4/0x11b0
        ip_finish_output2+0x3b8/0xa10
        ip_output+0x7f/0x260
        __ip_queue_xmit+0x1ce/0x610
        __tcp_transmit_skb+0xabc/0xc80
        tcp_write_xmit+0x229/0x12c0
        __tcp_push_pending_frames+0x32/0xf0
        tcp_sendmsg_locked+0x297/0xe10
        tcp_sendmsg+0x27/0x40
        sock_sendmsg+0x42/0x70
        sock_write_iter+0x9a/0x100
        vfs_write+0x481/0x4f0
        ksys_write+0xc2/0xe0
        do_syscall_64+0x3a/0x90
        entry_SYSCALL_64_after_hwframe+0x63/0xcd

 other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(slock-AF_INET/1);
                                lock(slock-AF_INET6/1);
                                lock(slock-AF_INET/1);
   lock(slock-AF_INET6/1);

  *** DEADLOCK ***

 12 locks held by iperf3/1021:
  #0: ffff976005c5a6c0 (sk_lock-AF_INET6){+.+.}-{0:0}, at: tcp_sendmsg+0x19/0x40
  #1: ffffffffbca07320 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x5/0x610
  #2: ffffffffbca072e0 (rcu_read_lock_bh){....}-{1:2}, at: ip_finish_output2+0xaa/0xa10
  #3: ffffffffbca072e0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x72/0x11b0
  #4: ffffffffbca07320 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb+0x181/0x400
  #5: ffffffffbca07320 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x54/0x160
  #6: ffff97607b06e0b0 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160
  #7: ffffffffbca07320 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x5/0x610
  #8: ffffffffbca072e0 (rcu_read_lock_bh){....}-{1:2}, at: ip_finish_output2+0xaa/0xa10
  #9: ffffffffbca072e0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x72/0x11b0
  #10: ffffffffbca07320 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb+0x181/0x400
  #11: ffffffffbca07320 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x54/0x160

 [...]

 kernel:watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [swapper/1:0]

Moreover, even if we improve TCP locking to avoid lockups for this simple
topology, I suspect that TCP will experience some packet loss: when mirred
detects 4 nested calls of tcf_mirred_act(), the kernel protects against
excessive stack growth and drops the skb (which can also be a full TSO
packet). The protocol can probably recover, but performance will certainly
be suboptimal.
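
To be explicit about the guard I'm referring to: tcf_mirred_act() bumps a
per-CPU nesting counter and frees the skb once a small limit is exceeded.
A minimal userspace sketch of that idea (illustrative names only, not the
kernel's identifiers):

#include <stdio.h>

#define NEST_LIMIT 4

static __thread unsigned int nest_level;

static int redirect_packet(int depth)
{
	int err = 0;

	if (++nest_level > NEST_LIMIT) {
		/* mirred logs a rate-limited notice and drops the skb here */
		fprintf(stderr, "nesting limit hit at depth %d, dropping\n", depth);
		nest_level--;
		return -1;
	}

	/* egress->ingress redirection re-enters the stack, i.e. recurses */
	if (depth < 10)
		err = redirect_packet(depth + 1);

	nest_level--;
	return err;
}

int main(void)
{
	return redirect_packet(1) ? 1 : 0;
}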

> Which also means we can no longer know the RX path status any more,
> right? I mean if we have filters on ingress, we can't know whether they
> drop this packet or not, after this patch? To me, this at least breaks
> users' expectation.

Fair point! Then maybe we don't need to change the whole TC mirred ingress:
since the problem only affects egress to ingress, we can preserve the call
to netif_receive_skb() on ingress->ingress, and just use the backlog in the
egress->ingress direction, which has been broken since the very beginning
and got similar fixes in the past [1]. Something like:

-- >8 --
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -205,12 +205,14 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
        return err;
 }

-static int tcf_mirred_forward(bool want_ingress, struct sk_buff *skb)
+static int tcf_mirred_forward(bool want_ingress, bool at_ingress, struct sk_buff *skb)
 {
        int err;

        if (!want_ingress)
                err = tcf_dev_queue_xmit(skb, dev_queue_xmit);
+       else if (!at_ingress)
+               err = netif_rx(skb);
        else
                err = netif_receive_skb(skb);

@@ -306,7 +308,7 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
                /* let's the caller reinsert the packet, if possible */
                if (use_reinsert) {
                        res->ingress = want_ingress;
-                       err = tcf_mirred_forward(res->ingress, skb);
+                       err = tcf_mirred_forward(res->ingress, at_ingress, skb);
                        if (err)
                                tcf_action_inc_overlimit_qstats(&m->common);
                        __this_cpu_dec(mirred_rec_level);
@@ -314,7 +316,7 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
                }
        }

-       err = tcf_mirred_forward(want_ingress, skb2);
+       err = tcf_mirred_forward(want_ingress, at_ingress, skb2);
        if (err) {
 out:
                tcf_action_inc_overlimit_qstats(&m->common);
-- >8 --

WDYT? Any feedback appreciated, thanks!

[1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=f799ada6bf2397c351220088b9b0980125c77280

-- 
davide



* Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
  2022-10-04 17:40   ` Davide Caratti
@ 2022-10-16 17:28     ` Cong Wang
  0 siblings, 0 replies; 17+ messages in thread
From: Cong Wang @ 2022-10-16 17:28 UTC (permalink / raw)
  To: Davide Caratti
  Cc: Jamal Hadi Salim, Jiri Pirko, Paolo Abeni,
	Marcelo Ricardo Leitner, wizhao, netdev, peilin.ye

On Tue, Oct 04, 2022 at 07:40:27PM +0200, Davide Caratti wrote:
> hello Cong, thanks for looking at this!
> 
> On Sun, Sep 25, 2022 at 11:08:48AM -0700, Cong Wang wrote:
> > On Fri, Sep 23, 2022 at 05:11:12PM +0200, Davide Caratti wrote:
> > > William reports kernel soft-lockups on some OVS topologies when TC mirred
> > > "egress-to-ingress" action is hit by local TCP traffic. Indeed, using the
> > > mirred action in egress-to-ingress can easily produce a dmesg splat like:
> > > 
> > >  ============================================
> > >  WARNING: possible recursive locking detected
> 
> [...]
> 
> > >  6.0.0-rc4+ #511 Not tainted
> > >  --------------------------------------------
> > >  nc/1037 is trying to acquire lock:
> > >  ffff950687843cb0 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160
> > > 
> > >  but task is already holding lock:
> > >  ffff950687846cb0 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160
> 
> FTR, this is:

Yeah, Peilin actually looked deeper into this issue. Let's copy him.

> 
> 2091         sk_incoming_cpu_update(sk);
> 2092
> 2093         bh_lock_sock_nested(sk); <--- the lock reported in the splat
> 2094         tcp_segs_in(tcp_sk(sk), skb);
> 2095         ret = 0;
> 2096         if (!sock_owned_by_user(sk)) {
> 
> > BTW, have you thought about solving the above lockdep warning in TCP
> > layer?
> 
> yes, but that doesn't look like a trivial fix at all - and I doubt it's
> worth doing it just to make mirred and TCP "friends". Please note:
> on current kernel this doesn't just result in a lockdep warning: using
> iperf3 on unpatched kernels it's possible to see a real deadlock, like:

I'd say your test case is rare, because I don't think it is trivial for
a TCP socket to send packets to itself.

 
> > Which also means we can no longer know the RX path status any more,
> > right? I mean if we have filters on ingress, we can't know whether they
> > drop this packet or not, after this patch? To me, this at least breaks
> > users' expectation.
> 
> Fair point! Then maybe we don't need to change the whole TC mirred ingress:
> since the problem only affects egress to ingress, we can preserve the call
> to netif_recive_skb() on ingress->ingress, and just use the backlog in the
> egress->ingress direction _ that has been broken since the very beginning
> and got similar fixes in the past [1]. Something like:

Regardless of ingress->ingress or egress->ingress, this patch breaks
users' expectations. And actually, egress->ingress is more common than
ingress->ingress, in my experience.

Thanks.


* Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
  2022-09-23 15:11 [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress Davide Caratti
  2022-09-25 18:08 ` Cong Wang
@ 2022-11-18 23:07 ` Peilin Ye
  1 sibling, 0 replies; 17+ messages in thread
From: Peilin Ye @ 2022-11-18 23:07 UTC (permalink / raw)
  To: Davide Caratti
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Paolo Abeni,
	Marcelo Ricardo Leitner, wizhao, netdev

Hi all,

On Fri, Sep 23, 2022 at 05:11:12PM +0200, Davide Caratti wrote:
> +mirred_egress_to_ingress_tcp_test()
> +{
> +	local tmpfile=$(mktemp) tmpfile1=$(mktemp)
> +
> +	RET=0
> +	dd conv=sparse status=none if=/dev/zero bs=1M count=2 of=$tmpfile
> +	tc filter add dev $h1 protocol ip pref 100 handle 100 egress flower \
> +		ip_proto tcp src_ip 192.0.2.1 dst_ip 192.0.2.2 \
> +			action ct commit nat src addr 192.0.2.2 pipe \
> +			action ct clear pipe \
> +			action ct commit nat dst addr 192.0.2.1 pipe \
> +			action ct clear pipe \
> +			action skbedit ptype host pipe \
> +			action mirred ingress redirect dev $h1

FWIW, I couldn't reproduce the lockup using this test case (with
forwarding.config.sample), but I got the same lockup in tcp_v4_rcv()
using a different (but probably less realistic) TC filter:

	tc filter add dev $h1 protocol ip pref 100 handle 100 egress flower \
		ip_proto tcp src_ip 192.0.2.1 dst_ip 192.0.2.2 \
			action pedit ex munge ip src set 192.0.2.2 pipe \
			action pedit ex munge ip dst set 192.0.2.1 pipe \
			action skbedit ptype host pipe \
			action mirred ingress redirect dev $h1

Thanks,
Peilin Ye



* Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
  2024-02-15  0:31       ` Jakub Kicinski
@ 2024-02-15 17:55         ` Davide Caratti
  0 siblings, 0 replies; 17+ messages in thread
From: Davide Caratti @ 2024-02-15 17:55 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jamal Hadi Salim, davem, netdev, edumazet, pabeni,
	Marcelo Ricardo Leitner, xiyou.wangcong, jiri, shmulik.ladkani

hi,
On Thu, Feb 15, 2024 at 1:31 AM Jakub Kicinski <kuba@kernel.org> wrote:
>

[...]

> > If I well read, Jakub's patch [1] uses the backlog for egress->ingress
> > regardless of the "nest level": no more calls to netif_receive_skb():
> > It's the same as my initial proposal for fixing the OVS soft-lockup [2],
> > the code is different because now we have tcf_mirred_to_dev().
>
> FWIW feel free to add your Sob or Co-dev+Sob!
> It is very much the same solution as what you posted at some stage.

I'm ok with the current tags! thanks!
-- 
davide



* Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
  2024-02-14 16:10     ` Davide Caratti
@ 2024-02-15  0:31       ` Jakub Kicinski
  2024-02-15 17:55         ` Davide Caratti
  0 siblings, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2024-02-15  0:31 UTC (permalink / raw)
  To: Davide Caratti
  Cc: Jamal Hadi Salim, davem, netdev, edumazet, pabeni,
	Marcelo Ricardo Leitner, xiyou.wangcong, jiri, shmulik.ladkani

On Wed, 14 Feb 2024 17:10:37 +0100 Davide Caratti wrote:
> > So tests pass - but on the list i only see one patch and the other is
> > on lore, not sure how to ACK something that is not on email, but FWIW:
> > Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> > 
> > The second patch avoids the recursion issue (which was the root cause)
> > and the first patch is really undoing ca22da2fbd693  
> 
> If I well read, Jakub's patch [1] uses the backlog for egress->ingress
> regardless of the "nest level": no more calls to netif_receive_skb():
> It's the same as my initial proposal for fixing the OVS soft-lockup [2],
> the code is different because now we have tcf_mirred_to_dev().

FWIW feel free to add your Sob or Co-dev+Sob!
It is very much the same solution as what you posted at some stage.


* Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
  2024-02-14 15:28   ` Jamal Hadi Salim
@ 2024-02-14 16:10     ` Davide Caratti
  2024-02-15  0:31       ` Jakub Kicinski
  0 siblings, 1 reply; 17+ messages in thread
From: Davide Caratti @ 2024-02-14 16:10 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Jakub Kicinski, davem, netdev, edumazet, pabeni,
	Marcelo Ricardo Leitner, xiyou.wangcong, jiri, shmulik.ladkani

On Wed, Feb 14, 2024 at 10:28:27AM -0500, Jamal Hadi Salim wrote:
> On Wed, Feb 14, 2024 at 10:11 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> >
> > On Fri, Feb 9, 2024 at 6:54 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > >
> > > The test Davide added in commit ca22da2fbd69 ("act_mirred: use the backlog
> > > for nested calls to mirred ingress") hangs our testing VMs every 10 or so
> > > runs, with the familiar tcp_v4_rcv -> tcp_v4_rcv deadlock reported by
> > > lockdep.

[...]

> > Doing a quick test of this and other patch i saw..
> 
> 
> So tests pass - but on the list i only see one patch and the other is
> on lore, not sure how to ACK something that is not on email, but FWIW:
> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> 
> The second patch avoids the recursion issue (which was the root cause)
> and the first patch is really undoing ca22da2fbd693

If I read it correctly, Jakub's patch [1] uses the backlog for egress->ingress
regardless of the "nest level": no more calls to netif_receive_skb().
It's the same as my initial proposal for fixing the OVS soft-lockup [2];
the code is different because now we have tcf_mirred_to_dev().

I'm ok with this solution, and maybe we need to re-think whether it's
really a problem to lose the statistics for actions subsequent to a
mirred redirect egress->ingress. IMHO it's not a problem to lose them:
if RPS or RX timestamping is enabled on the RX target device, we would
lose this information anyway (maybe this answers Jiri's question
above in the thread).
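
Just to make the mechanism explicit, here is a toy userspace sketch (all
names are invented, this is not kernel code) of why that information goes
away once the packet is merely queued: the enqueue return code can only say
whether queueing succeeded, not what an ingress filter will do afterwards.

#include <stdio.h>
#include <stdbool.h>

#define BACKLOG_LEN 16

/* stand-in for an ingress filter deciding to drop */
static bool ingress_drops(int pkt)
{
	return pkt % 2 == 0;
}

/* synchronous delivery: the drop decision is visible to the caller */
static int deliver_sync(int pkt)
{
	return ingress_drops(pkt) ? -1 : 0;
}

/* deferred delivery: we only learn whether the enqueue itself succeeded */
static int backlog[BACKLOG_LEN];
static unsigned int head, tail;

static int deliver_deferred(int pkt)
{
	if (tail - head >= BACKLOG_LEN)
		return -1;			/* backlog full */
	backlog[tail++ % BACKLOG_LEN] = pkt;
	return 0;				/* says nothing about later drops */
}

int main(void)
{
	printf("sync:     pkt 2 -> %d (downstream drop visible)\n",
	       deliver_sync(2));
	printf("deferred: pkt 2 -> %d (downstream drop invisible)\n",
	       deliver_deferred(2));
	return 0;
}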

The other patch [3] fixes the reported UAF, but if the Fixes: tag is
correct it will probably need some special care if it propagates to
stable branches.

thanks,
-- 
davide

[1] https://lore.kernel.org/netdev/20240209235413.3717039-1-kuba@kernel.org/
[2] https://lore.kernel.org/netdev/33dc43f587ec1388ba456b4915c75f02a8aae226.1663945716.git.dcaratti@redhat.com/ 
[3] https://lore.kernel.org/netdev/20240214033848.981211-2-kuba@kernel.org/



* Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
  2024-02-14 15:11 ` Jamal Hadi Salim
@ 2024-02-14 15:28   ` Jamal Hadi Salim
  2024-02-14 16:10     ` Davide Caratti
  0 siblings, 1 reply; 17+ messages in thread
From: Jamal Hadi Salim @ 2024-02-14 15:28 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, Marcelo Ricardo Leitner,
	Davide Caratti, xiyou.wangcong, jiri, shmulik.ladkani

On Wed, Feb 14, 2024 at 10:11 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On Fri, Feb 9, 2024 at 6:54 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > The test Davide added in commit ca22da2fbd69 ("act_mirred: use the backlog
> > for nested calls to mirred ingress") hangs our testing VMs every 10 or so
> > runs, with the familiar tcp_v4_rcv -> tcp_v4_rcv deadlock reported by
> > lockdep.
> >
> > In the past there was a concern that the backlog indirection will
> > lead to loss of error reporting / less accurate stats. But the current
> > workaround does not seem to address the issue.
> >
> > Fixes: 53592b364001 ("net/sched: act_mirred: Implement ingress actions")
> > Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> > Suggested-by: Davide Caratti <dcaratti@redhat.com>
> > Link: https://lore.kernel.org/netdev/33dc43f587ec1388ba456b4915c75f02a8aae226.1663945716.git.dcaratti@redhat.com/
> > Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> > ---
> > CC: jhs@mojatatu.com
> > CC: xiyou.wangcong@gmail.com
> > CC: jiri@resnulli.us
> > CC: shmulik.ladkani@gmail.com
> > ---
> >  net/sched/act_mirred.c                             | 14 +++++---------
> >  .../testing/selftests/net/forwarding/tc_actions.sh |  3 ---
> >  2 files changed, 5 insertions(+), 12 deletions(-)
> >
> > diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
> > index 93a96e9d8d90..35c366f043d9 100644
> > --- a/net/sched/act_mirred.c
> > +++ b/net/sched/act_mirred.c
> > @@ -232,18 +232,14 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
> >         return err;
> >  }
> >
> > -static bool is_mirred_nested(void)
> > -{
> > -       return unlikely(__this_cpu_read(mirred_nest_level) > 1);
> > -}
> > -
> > -static int tcf_mirred_forward(bool want_ingress, struct sk_buff *skb)
> > +static int
> > +tcf_mirred_forward(bool at_ingress, bool want_ingress, struct sk_buff *skb)
> >  {
> >         int err;
> >
> >         if (!want_ingress)
> >                 err = tcf_dev_queue_xmit(skb, dev_queue_xmit);
> > -       else if (is_mirred_nested())
> > +       else if (!at_ingress)
> >                 err = netif_rx(skb);
> >         else
> >                 err = netif_receive_skb(skb);
> > @@ -319,9 +315,9 @@ static int tcf_mirred_to_dev(struct sk_buff *skb, struct tcf_mirred *m,
> >
> >                 skb_set_redirected(skb_to_send, skb_to_send->tc_at_ingress);
> >
> > -               err = tcf_mirred_forward(want_ingress, skb_to_send);
> > +               err = tcf_mirred_forward(at_ingress, want_ingress, skb_to_send);
> >         } else {
> > -               err = tcf_mirred_forward(want_ingress, skb_to_send);
> > +               err = tcf_mirred_forward(at_ingress, want_ingress, skb_to_send);
> >         }
> >
> >         if (err) {
> > diff --git a/tools/testing/selftests/net/forwarding/tc_actions.sh b/tools/testing/selftests/net/forwarding/tc_actions.sh
> > index b0f5e55d2d0b..589629636502 100755
> > --- a/tools/testing/selftests/net/forwarding/tc_actions.sh
> > +++ b/tools/testing/selftests/net/forwarding/tc_actions.sh
> > @@ -235,9 +235,6 @@ mirred_egress_to_ingress_tcp_test()
> >         check_err $? "didn't mirred redirect ICMP"
> >         tc_check_packets "dev $h1 ingress" 102 10
> >         check_err $? "didn't drop mirred ICMP"
> > -       local overlimits=$(tc_rule_stats_get ${h1} 101 egress .overlimits)
> > -       test ${overlimits} = 10
> > -       check_err $? "wrong overlimits, expected 10 got ${overlimits}"
> >
> >         tc filter del dev $h1 egress protocol ip pref 100 handle 100 flower
> >         tc filter del dev $h1 egress protocol ip pref 101 handle 101 flower
> > --
> > 2.43.0
> >
>
>
> Doing a quick test of this and other patch i saw..


So tests pass - but on the list I only see one patch and the other is
on lore; not sure how to ACK something that is not on email, but FWIW:
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

The second patch avoids the recursion issue (which was the root cause)
and the first patch is really undoing ca22da2fbd693.
I don't know whether the underlying issue in ca22da2fbd693 is solved (and
don't have time to look into it). Davide?

cheers,
jamal

> cheers,
> jamal


* Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
  2024-02-09 23:54 Jakub Kicinski
  2024-02-12 14:51 ` Jamal Hadi Salim
@ 2024-02-14 15:11 ` Jamal Hadi Salim
  2024-02-14 15:28   ` Jamal Hadi Salim
  1 sibling, 1 reply; 17+ messages in thread
From: Jamal Hadi Salim @ 2024-02-14 15:11 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, Marcelo Ricardo Leitner,
	Davide Caratti, xiyou.wangcong, jiri, shmulik.ladkani

On Fri, Feb 9, 2024 at 6:54 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> The test Davide added in commit ca22da2fbd69 ("act_mirred: use the backlog
> for nested calls to mirred ingress") hangs our testing VMs every 10 or so
> runs, with the familiar tcp_v4_rcv -> tcp_v4_rcv deadlock reported by
> lockdep.
>
> In the past there was a concern that the backlog indirection will
> lead to loss of error reporting / less accurate stats. But the current
> workaround does not seem to address the issue.
>
> Fixes: 53592b364001 ("net/sched: act_mirred: Implement ingress actions")
> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> Suggested-by: Davide Caratti <dcaratti@redhat.com>
> Link: https://lore.kernel.org/netdev/33dc43f587ec1388ba456b4915c75f02a8aae226.1663945716.git.dcaratti@redhat.com/
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: jhs@mojatatu.com
> CC: xiyou.wangcong@gmail.com
> CC: jiri@resnulli.us
> CC: shmulik.ladkani@gmail.com
> ---
>  net/sched/act_mirred.c                             | 14 +++++---------
>  .../testing/selftests/net/forwarding/tc_actions.sh |  3 ---
>  2 files changed, 5 insertions(+), 12 deletions(-)
>
> diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
> index 93a96e9d8d90..35c366f043d9 100644
> --- a/net/sched/act_mirred.c
> +++ b/net/sched/act_mirred.c
> @@ -232,18 +232,14 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
>         return err;
>  }
>
> -static bool is_mirred_nested(void)
> -{
> -       return unlikely(__this_cpu_read(mirred_nest_level) > 1);
> -}
> -
> -static int tcf_mirred_forward(bool want_ingress, struct sk_buff *skb)
> +static int
> +tcf_mirred_forward(bool at_ingress, bool want_ingress, struct sk_buff *skb)
>  {
>         int err;
>
>         if (!want_ingress)
>                 err = tcf_dev_queue_xmit(skb, dev_queue_xmit);
> -       else if (is_mirred_nested())
> +       else if (!at_ingress)
>                 err = netif_rx(skb);
>         else
>                 err = netif_receive_skb(skb);
> @@ -319,9 +315,9 @@ static int tcf_mirred_to_dev(struct sk_buff *skb, struct tcf_mirred *m,
>
>                 skb_set_redirected(skb_to_send, skb_to_send->tc_at_ingress);
>
> -               err = tcf_mirred_forward(want_ingress, skb_to_send);
> +               err = tcf_mirred_forward(at_ingress, want_ingress, skb_to_send);
>         } else {
> -               err = tcf_mirred_forward(want_ingress, skb_to_send);
> +               err = tcf_mirred_forward(at_ingress, want_ingress, skb_to_send);
>         }
>
>         if (err) {
> diff --git a/tools/testing/selftests/net/forwarding/tc_actions.sh b/tools/testing/selftests/net/forwarding/tc_actions.sh
> index b0f5e55d2d0b..589629636502 100755
> --- a/tools/testing/selftests/net/forwarding/tc_actions.sh
> +++ b/tools/testing/selftests/net/forwarding/tc_actions.sh
> @@ -235,9 +235,6 @@ mirred_egress_to_ingress_tcp_test()
>         check_err $? "didn't mirred redirect ICMP"
>         tc_check_packets "dev $h1 ingress" 102 10
>         check_err $? "didn't drop mirred ICMP"
> -       local overlimits=$(tc_rule_stats_get ${h1} 101 egress .overlimits)
> -       test ${overlimits} = 10
> -       check_err $? "wrong overlimits, expected 10 got ${overlimits}"
>
>         tc filter del dev $h1 egress protocol ip pref 100 handle 100 flower
>         tc filter del dev $h1 egress protocol ip pref 101 handle 101 flower
> --
> 2.43.0
>


Doing a quick test of this and the other patch I saw..

cheers,
jamal


* Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
  2024-02-14  0:27       ` Jakub Kicinski
@ 2024-02-14  3:40         ` Jakub Kicinski
  0 siblings, 0 replies; 17+ messages in thread
From: Jakub Kicinski @ 2024-02-14  3:40 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Jamal Hadi Salim, davem, netdev, edumazet,
	Marcelo Ricardo Leitner, Davide Caratti, xiyou.wangcong, jiri,
	shmulik.ladkani, victor

On Tue, 13 Feb 2024 16:27:44 -0800 Jakub Kicinski wrote:
> If I'm looking right the bug seems fairly straightforward but tricky 
> to cleanly fix :( I also haven't dug deep enough in the history to
> be provide a real Fixes tag...

I submitted a slightly modified version officially:
https://lore.kernel.org/r/20240214033848.981211-2-kuba@kernel.org/


* Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
  2024-02-13 11:06     ` Paolo Abeni
@ 2024-02-14  0:27       ` Jakub Kicinski
  2024-02-14  3:40         ` Jakub Kicinski
  0 siblings, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2024-02-14  0:27 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Jamal Hadi Salim, davem, netdev, edumazet,
	Marcelo Ricardo Leitner, Davide Caratti, xiyou.wangcong, jiri,
	shmulik.ladkani, victor

On Tue, 13 Feb 2024 12:06:04 +0100 Paolo Abeni wrote:
> > Something broke.
> > Create a ns. Put one half of veth into the namespace. Create a filter
> > inside the net ns.
> > at_ns$ tc qdisc add dev port0 ingress_block 21 clsact
> > at_ns$ tc filter add block 21 egress protocol ip prio 10 matchall
> > action mirred ingress redirect dev port0
> > 
> > Send a ping from host:
> > at_host@ ping 10.0.0.2 -c 1 -I <vethportonhostside>
> > 
> > And.. hits uaf.... see attached.  
> 
> It looks like:
> 
> netif_receive_skb
> run_tc()
> 	act_mirred	
> 		netif_receive_skb
> 			sch_handle_ingress
> 				act_mirred // nesting limit hit
> 			// free skb
> 		// netif_receive_skb returns NET_RX_DROP
> 	// act_mirred returns TC_ACT_SHOT
> // UaF while de-referencing the (freed) skb
> 
> 
> No idea how to solve it on top of my mind :(

If I'm looking at it right, the bug seems fairly straightforward but tricky
to cleanly fix :( I also haven't dug deep enough into the history to
provide a real Fixes tag...

--->8-------------
net/sched: act_mirred: don't override retval if we already lost the skb

If we're redirecting the skb and haven't called tcf_mirred_forward() yet,
we need to tell the core to drop the skb by setting the retcode to SHOT.
If we have called tcf_mirred_forward(), however, the skb is out of our
hands and returning SHOT will lead to UaF.

Move the overrides up to the error paths which actually need them.
Note that the err variable is only used to store the return code from
tcf_mirred_forward() and we don't have to set it.

Fixes: 16085e48cb48 ("net/sched: act_mirred: Create function tcf_mirred_to_dev and improve readability")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 net/sched/act_mirred.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 93a96e9d8d90..922a018329cd 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -270,7 +270,8 @@ static int tcf_mirred_to_dev(struct sk_buff *skb, struct tcf_mirred *m,
 	if (unlikely(!(dev->flags & IFF_UP)) || !netif_carrier_ok(dev)) {
 		net_notice_ratelimited("tc mirred to Houston: device %s is down\n",
 				       dev->name);
-		err = -ENODEV;
+		if (is_redirect)
+			retval = TC_ACT_SHOT;
 		goto out;
 	}
 
@@ -284,7 +285,8 @@ static int tcf_mirred_to_dev(struct sk_buff *skb, struct tcf_mirred *m,
 	if (!dont_clone) {
 		skb_to_send = skb_clone(skb, GFP_ATOMIC);
 		if (!skb_to_send) {
-			err =  -ENOMEM;
+			if (is_redirect)
+				retval = TC_ACT_SHOT;
 			goto out;
 		}
 	}
@@ -327,8 +329,6 @@ static int tcf_mirred_to_dev(struct sk_buff *skb, struct tcf_mirred *m,
 	if (err) {
 out:
 		tcf_action_inc_overlimit_qstats(&m->common);
-		if (is_redirect)
-			retval = TC_ACT_SHOT;
 	}
 
 	return retval;
-- 
2.43.0



* Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
  2024-02-12 15:11   ` Jamal Hadi Salim
@ 2024-02-13 11:06     ` Paolo Abeni
  2024-02-14  0:27       ` Jakub Kicinski
  0 siblings, 1 reply; 17+ messages in thread
From: Paolo Abeni @ 2024-02-13 11:06 UTC (permalink / raw)
  To: Jamal Hadi Salim, Jakub Kicinski
  Cc: davem, netdev, edumazet, Marcelo Ricardo Leitner, Davide Caratti,
	xiyou.wangcong, jiri, shmulik.ladkani

On Mon, 2024-02-12 at 10:11 -0500, Jamal Hadi Salim wrote:
> On Mon, Feb 12, 2024 at 9:51 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> > 
> > On Fri, Feb 9, 2024 at 6:54 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > 
> > > The test Davide added in commit ca22da2fbd69 ("act_mirred: use the backlog
> > > for nested calls to mirred ingress") hangs our testing VMs every 10 or so
> > > runs, with the familiar tcp_v4_rcv -> tcp_v4_rcv deadlock reported by
> > > lockdep.
> > > 
> > > In the past there was a concern that the backlog indirection will
> > > lead to loss of error reporting / less accurate stats. But the current
> > > workaround does not seem to address the issue.
> > > 
> > 
> > Let us run some basic tests on this first - it's a hairy spot. Also,
> 
> Something broke.
> Create a ns. Put one half of veth into the namespace. Create a filter
> inside the net ns.
> at_ns$ tc qdisc add dev port0 ingress_block 21 clsact
> at_ns$ tc filter add block 21 egress protocol ip prio 10 matchall
> action mirred ingress redirect dev port0
> 
> Send a ping from host:
> at_host@ ping 10.0.0.2 -c 1 -I <vethportonhostside>
> 
> And.. hits uaf.... see attached.

It looks like:

netif_receive_skb
run_tc()
	act_mirred	
		netif_receive_skb
			sch_handle_ingress
				act_mirred // nesting limit hit
			// free skb
		// netif_receive_skb returns NET_RX_DROP
	// act_mirred returns TC_ACT_SHOT
// UaF while de-referencing the (freed) skb


No idea how to solve it off the top of my head :(

Cheers,

Paolo



* Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
  2024-02-12 14:51 ` Jamal Hadi Salim
  2024-02-12 15:02   ` Jakub Kicinski
@ 2024-02-12 15:11   ` Jamal Hadi Salim
  2024-02-13 11:06     ` Paolo Abeni
  1 sibling, 1 reply; 17+ messages in thread
From: Jamal Hadi Salim @ 2024-02-12 15:11 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, Marcelo Ricardo Leitner,
	Davide Caratti, xiyou.wangcong, jiri, shmulik.ladkani

[-- Attachment #1: Type: text/plain, Size: 4490 bytes --]

On Mon, Feb 12, 2024 at 9:51 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On Fri, Feb 9, 2024 at 6:54 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > The test Davide added in commit ca22da2fbd69 ("act_mirred: use the backlog
> > for nested calls to mirred ingress") hangs our testing VMs every 10 or so
> > runs, with the familiar tcp_v4_rcv -> tcp_v4_rcv deadlock reported by
> > lockdep.
> >
> > In the past there was a concern that the backlog indirection will
> > lead to loss of error reporting / less accurate stats. But the current
> > workaround does not seem to address the issue.
> >
>
> Let us run some basic tests on this first - it's a hairy spot. Also,

Something broke.
Create a ns. Put one half of veth into the namespace. Create a filter
inside the net ns.
at_ns$ tc qdisc add dev port0 ingress_block 21 clsact
at_ns$ tc filter add block 21 egress protocol ip prio 10 matchall
action mirred ingress redirect dev port0

Send a ping from host:
at_host@ ping 10.0.0.2 -c 1 -I <vethportonhostside>

And... it hits a UAF, see attached.
Sorry, no time to look at it right now, but one of us
(Victor/Pedro/myself) will look later unless someone else beats us to
it.

cheers,
jamal

> IIRC Cong had some reservations about this in the past. Can't remember
> what they were.


> cheers,
> jamal
>
>
> > Fixes: 53592b364001 ("net/sched: act_mirred: Implement ingress actions")
> > Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> > Suggested-by: Davide Caratti <dcaratti@redhat.com>
> > Link: https://lore.kernel.org/netdev/33dc43f587ec1388ba456b4915c75f02a8aae226.1663945716.git.dcaratti@redhat.com/
> > Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> > ---
> > CC: jhs@mojatatu.com
> > CC: xiyou.wangcong@gmail.com
> > CC: jiri@resnulli.us
> > CC: shmulik.ladkani@gmail.com
> > ---
> >  net/sched/act_mirred.c                             | 14 +++++---------
> >  .../testing/selftests/net/forwarding/tc_actions.sh |  3 ---
> >  2 files changed, 5 insertions(+), 12 deletions(-)
> >
> > diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
> > index 93a96e9d8d90..35c366f043d9 100644
> > --- a/net/sched/act_mirred.c
> > +++ b/net/sched/act_mirred.c
> > @@ -232,18 +232,14 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
> >         return err;
> >  }
> >
> > -static bool is_mirred_nested(void)
> > -{
> > -       return unlikely(__this_cpu_read(mirred_nest_level) > 1);
> > -}
> > -
> > -static int tcf_mirred_forward(bool want_ingress, struct sk_buff *skb)
> > +static int
> > +tcf_mirred_forward(bool at_ingress, bool want_ingress, struct sk_buff *skb)
> >  {
> >         int err;
> >
> >         if (!want_ingress)
> >                 err = tcf_dev_queue_xmit(skb, dev_queue_xmit);
> > -       else if (is_mirred_nested())
> > +       else if (!at_ingress)
> >                 err = netif_rx(skb);
> >         else
> >                 err = netif_receive_skb(skb);
> > @@ -319,9 +315,9 @@ static int tcf_mirred_to_dev(struct sk_buff *skb, struct tcf_mirred *m,
> >
> >                 skb_set_redirected(skb_to_send, skb_to_send->tc_at_ingress);
> >
> > -               err = tcf_mirred_forward(want_ingress, skb_to_send);
> > +               err = tcf_mirred_forward(at_ingress, want_ingress, skb_to_send);
> >         } else {
> > -               err = tcf_mirred_forward(want_ingress, skb_to_send);
> > +               err = tcf_mirred_forward(at_ingress, want_ingress, skb_to_send);
> >         }
> >
> >         if (err) {
> > diff --git a/tools/testing/selftests/net/forwarding/tc_actions.sh b/tools/testing/selftests/net/forwarding/tc_actions.sh
> > index b0f5e55d2d0b..589629636502 100755
> > --- a/tools/testing/selftests/net/forwarding/tc_actions.sh
> > +++ b/tools/testing/selftests/net/forwarding/tc_actions.sh
> > @@ -235,9 +235,6 @@ mirred_egress_to_ingress_tcp_test()
> >         check_err $? "didn't mirred redirect ICMP"
> >         tc_check_packets "dev $h1 ingress" 102 10
> >         check_err $? "didn't drop mirred ICMP"
> > -       local overlimits=$(tc_rule_stats_get ${h1} 101 egress .overlimits)
> > -       test ${overlimits} = 10
> > -       check_err $? "wrong overlimits, expected 10 got ${overlimits}"
> >
> >         tc filter del dev $h1 egress protocol ip pref 100 handle 100 flower
> >         tc filter del dev $h1 egress protocol ip pref 101 handle 101 flower
> > --
> > 2.43.0
> >

[-- Attachment #2: mirred_ingress_bug.txt --]
[-- Type: text/plain, Size: 26164 bytes --]

[  252.615958] Packet exceeded mirred recursion limit on dev port0
[  252.616534] ==================================================================
[  252.616911] BUG: KASAN: slab-use-after-free in tc_run+0x50a/0x5b0
[  252.617241] Read of size 4 at addr ffff88800d453584 by task ping/548

[  252.617475] CPU: 8 PID: 548 Comm: ping Not tainted 6.8.0-rc3-00862-g0f37666d87d2-dirty #147
[  252.617475] Call Trace:
[  252.617475]  <IRQ>
[  252.617475]  dump_stack_lvl+0x64/0xb0
[  252.617475]  print_report+0xcf/0x620
[  252.617475]  ? tc_run+0x50a/0x5b0
[  252.617475]  ? kasan_complete_mode_report_info+0x7c/0x200
[  252.617475]  ? tc_run+0x50a/0x5b0
[  252.617475]  kasan_report+0xc9/0x110
[  252.617475]  ? tc_run+0x50a/0x5b0
[  252.617475]  __asan_report_load4_noabort+0x14/0x20
[  252.617475]  tc_run+0x50a/0x5b0
[  252.617475]  ? __pfx_tc_run+0x10/0x10
[  252.617475]  ? debug_smp_processor_id+0x17/0x20
[  252.617475]  ? rcu_lockdep_current_cpu_online+0x3e/0x150
[  252.617475]  __netif_receive_skb_core.constprop.0+0x12e1/0x3910
[  252.617475]  ? kvm_sched_clock_read+0x11/0x20
[  252.617475]  ? __lock_acquire+0xcf0/0x6760
[  252.617475]  ? __pfx___netif_receive_skb_core.constprop.0+0x10/0x10
[  252.617475]  ? __pfx___lock_acquire+0x10/0x10
[  252.617475]  __netif_receive_skb_one_core+0xb4/0x1c0
[  252.617475]  ? __pfx___netif_receive_skb_one_core+0x10/0x10
[  252.617475]  ? __pfx_lock_acquire+0x10/0x10
[  252.617475]  __netif_receive_skb+0x21/0x1b0
[  252.617475]  netif_receive_skb+0xf9/0x4f0
[  252.617475]  ? __pfx_netif_receive_skb+0x10/0x10
[  252.617475]  tcf_mirred_to_dev+0x7dc/0xdd0
[  252.617475]  tcf_mirred_act+0x3d1/0xe50
[  252.617475]  ? __pfx_tcf_mirred_act+0x10/0x10
[  252.617475]  tcf_action_exec+0x561/0x870
[  252.617475]  mall_classify+0x168/0x240
[  252.617475]  __tcf_classify.constprop.0+0x50c/0x660
[  252.617475]  tcf_classify+0x6e/0xa0
[  252.617475]  ? __pfx_tcf_classify+0x10/0x10
[  252.617475]  ? debug_smp_processor_id+0x17/0x20
[  252.617475]  ? rcu_lockdep_current_cpu_online+0x3e/0x150
[  252.617475]  tc_run+0x31b/0x5b0
[  252.617475]  ? __pfx_tc_run+0x10/0x10
[  252.617475]  ? debug_smp_processor_id+0x17/0x20
[  252.617475]  ? rcu_lockdep_current_cpu_online+0x3e/0x150
[  252.617475]  __netif_receive_skb_core.constprop.0+0x12e1/0x3910
[  252.617475]  ? kvm_sched_clock_read+0x11/0x20
[  252.617475]  ? __lock_acquire+0xcf0/0x6760
[  252.617475]  ? __pfx___netif_receive_skb_core.constprop.0+0x10/0x10
[  252.617475]  ? __pfx___lock_acquire+0x10/0x10
[  252.617475]  __netif_receive_skb_one_core+0xb4/0x1c0
[  252.617475]  ? __pfx___netif_receive_skb_one_core+0x10/0x10
[  252.617475]  ? __pfx_lock_acquire+0x10/0x10
[  252.617475]  __netif_receive_skb+0x21/0x1b0
[  252.617475]  netif_receive_skb+0xf9/0x4f0
[  252.617475]  ? __pfx_netif_receive_skb+0x10/0x10
[  252.617475]  ? __kasan_check_read+0x11/0x20
[  252.617475]  tcf_mirred_to_dev+0x7dc/0xdd0
[  252.617475]  tcf_mirred_act+0x3d1/0xe50
[  252.617475]  ? __kasan_check_read+0x11/0x20
[  252.617475]  ? __pfx_tcf_mirred_act+0x10/0x10
[  252.617475]  ? __kasan_check_read+0x11/0x20
[  252.617475]  ? mark_lock.part.0+0xfa/0x31a0
[  252.617475]  ? __pfx_mark_lock.part.0+0x10/0x10
[  252.617475]  tcf_action_exec+0x561/0x870
[  252.617475]  mall_classify+0x168/0x240
[  252.617475]  ? mark_lock.part.0+0xfa/0x31a0
[  252.617475]  __tcf_classify.constprop.0+0x50c/0x660
[  252.617475]  tcf_classify+0x6e/0xa0
[  252.617475]  ? __pfx_tcf_classify+0x10/0x10
[  252.617475]  ? debug_smp_processor_id+0x17/0x20
[  252.617475]  ? rcu_lockdep_current_cpu_online+0x3e/0x150
[  252.617475]  tc_run+0x31b/0x5b0
[  252.617475]  ? __pfx_tc_run+0x10/0x10
[  252.617475]  ? debug_smp_processor_id+0x17/0x20
[  252.617475]  ? rcu_lockdep_current_cpu_online+0x3e/0x150
[  252.617475]  __netif_receive_skb_core.constprop.0+0x12e1/0x3910
[  252.617475]  ? __kasan_check_read+0x11/0x20
[  252.617475]  ? kvm_sched_clock_read+0x11/0x20
[  252.617475]  ? __lock_acquire+0xcf0/0x6760
[  252.617475]  ? __pfx___netif_receive_skb_core.constprop.0+0x10/0x10
[  252.617475]  ? __pfx___lock_acquire+0x10/0x10
[  252.617475]  ? __pfx_mark_lock.part.0+0x10/0x10
[  252.617475]  ? __kasan_check_read+0x11/0x20
[  252.617475]  __netif_receive_skb_one_core+0xb4/0x1c0
[  252.617475]  ? __pfx___netif_receive_skb_one_core+0x10/0x10
[  252.617475]  ? __pfx_lock_acquire+0x10/0x10
[  252.617475]  ? __kasan_check_read+0x11/0x20
[  252.617475]  ? __lock_acquire+0xd9f/0x6760
[  252.617475]  __netif_receive_skb+0x21/0x1b0
[  252.617475]  netif_receive_skb+0xf9/0x4f0
[  252.617475]  ? __pfx_netif_receive_skb+0x10/0x10
[  252.617475]  ? __pfx___lock_acquire+0x10/0x10
[  252.617475]  tcf_mirred_to_dev+0x7dc/0xdd0
[  252.617475]  tcf_mirred_act+0x3d1/0xe50
[  252.617475]  ? __pfx_tcf_mirred_act+0x10/0x10
[  252.617475]  ? update_stack_state+0x2d4/0x600
[  252.617475]  ? __kasan_check_read+0x11/0x20
[  252.617475]  tcf_action_exec+0x561/0x870
[  252.617475]  mall_classify+0x168/0x240
[  252.617475]  __tcf_classify.constprop.0+0x50c/0x660
[  252.617475]  tcf_classify+0x6e/0xa0
[  252.617475]  ? __pfx_tcf_classify+0x10/0x10
[  252.617475]  ? debug_smp_processor_id+0x17/0x20
[  252.617475]  ? rcu_lockdep_current_cpu_online+0x3e/0x150
[  252.617475]  tc_run+0x31b/0x5b0
[  252.617475]  ? __pfx_tc_run+0x10/0x10
[  252.617475]  ? debug_smp_processor_id+0x17/0x20
[  252.617475]  ? rcu_lockdep_current_cpu_online+0x3e/0x150
[  252.617475]  __netif_receive_skb_core.constprop.0+0x12e1/0x3910
[  252.617475]  ? kvm_sched_clock_read+0x11/0x20
[  252.617475]  ? kvm_sched_clock_read+0x11/0x20
[  252.617475]  ? __lock_acquire+0xcf0/0x6760
[  252.617475]  ? __pfx___netif_receive_skb_core.constprop.0+0x10/0x10
[  252.617475]  ? __pfx___lock_acquire+0x10/0x10
[  252.617475]  __netif_receive_skb_one_core+0xb4/0x1c0
[  252.617475]  ? __pfx___netif_receive_skb_one_core+0x10/0x10
[  252.617475]  ? __pfx_lock_acquire+0x10/0x10
[  252.617475]  ? _raw_spin_unlock_irq+0x28/0x60
[  252.617475]  ? process_backlog+0x416/0x5c0
[  252.617475]  ? __this_cpu_preempt_check+0x13/0x20
[  252.617475]  __netif_receive_skb+0x21/0x1b0
[  252.617475]  process_backlog+0xd7/0x5c0
[  252.617475]  __napi_poll.constprop.0+0xae/0x590
[  252.617475]  ? net_rx_action+0x252/0xb40
[  252.617475]  net_rx_action+0x47c/0xb40
[  252.617475]  ? __pfx_net_rx_action+0x10/0x10
[  252.617475]  ? irq_exit_rcu+0xb4/0x120
[  252.617475]  ? __this_cpu_preempt_check+0x13/0x20
[  252.617475]  __do_softirq+0x1c3/0x853
[  252.617475]  ? __dev_queue_xmit+0x9b3/0x3550
[  252.617475]  do_softirq+0xb5/0xf0
[  252.617475]  </IRQ>
[  252.617475]  <TASK>
[  252.617475]  __local_bh_enable_ip+0x104/0x130
[  252.617475]  ? __dev_queue_xmit+0x9b3/0x3550
[  252.617475]  __dev_queue_xmit+0x9c8/0x3550
[  252.617475]  ? __pfx_mark_lock.part.0+0x10/0x10
[  252.617475]  ? __pfx___dev_queue_xmit+0x10/0x10
[  252.617475]  ? __this_cpu_preempt_check+0x13/0x20
[  252.617475]  ? neigh_resolve_output+0x5bd/0x7b0
[  252.617475]  ? ip_finish_output2+0x6ac/0x1f40
[  252.617475]  ? __this_cpu_preempt_check+0x13/0x20
[  252.617475]  ? eth_header+0x16e/0x1b0
[  252.617475]  neigh_resolve_output+0x431/0x7b0
[  252.617475]  ip_finish_output2+0x6ac/0x1f40
[  252.617475]  ? __kasan_check_read+0x11/0x20
[  252.617475]  ? selinux_ip_postroute+0x357/0x400
[  252.617475]  ? __pfx_selinux_ip_postroute+0x10/0x10
[  252.617475]  ? __get_random_u32_below+0x17/0x70
[  252.617475]  ? __pfx_ip_finish_output2+0x10/0x10
[  252.617475]  ? __this_cpu_preempt_check+0x13/0x20
[  252.617475]  __ip_finish_output+0x6ea/0xed0
[  252.617475]  ? __ip_finish_output+0x6ea/0xed0
[  252.617475]  ip_finish_output+0x2b/0x260
[  252.617475]  ip_output+0x175/0x500
[  252.617475]  ? __pfx_ip_output+0x10/0x10
[  252.617475]  ? __ip_make_skb+0xdb5/0x2060
[  252.617475]  ? __pfx_ip_finish_output+0x10/0x10
[  252.617475]  ip_push_pending_frames+0x249/0x480
[  252.617475]  raw_sendmsg+0xf13/0x2d30
[  252.617475]  ? __kasan_check_read+0x11/0x20
[  252.617475]  ? __lock_acquire+0x1af0/0x6760
[  252.617475]  ? check_preemption_disabled+0x70/0xf0
[  252.617475]  ? __pfx_raw_sendmsg+0x10/0x10
[  252.617475]  ? mark_lock.part.0+0xfa/0x31a0
[  252.617475]  ? kvm_sched_clock_read+0x11/0x20
[  252.617475]  ? __pfx_sock_has_perm+0x10/0x10
[  252.617475]  ? __pfx_lock_release+0x10/0x10
[  252.617475]  ? __might_fault+0xc9/0x180
[  252.617475]  inet_sendmsg+0xd5/0x100
[  252.617475]  ? inet_sendmsg+0xd5/0x100
[  252.617475]  __sys_sendto+0x3be/0x510
[  252.617475]  ? __pfx___sys_sendto+0x10/0x10
[  252.617475]  ? do_user_addr_fault+0x401/0xb70
[  252.617475]  ? __this_cpu_preempt_check+0x13/0x20
[  252.617475]  ? __pfx_lock_release+0x10/0x10
[  252.617475]  ? __pfx___up_read+0x10/0x10
[  252.617475]  ? irqentry_exit_to_user_mode+0x8d/0x1d0
[  252.617475]  ? irqentry_exit+0x6b/0x90
[  252.617475]  ? debug_smp_processor_id+0x17/0x20
[  252.617475]  __x64_sys_sendto+0xe0/0x1a0
[  252.617475]  ? do_syscall_64+0x2a/0x140
[  252.617475]  ? trace_hardirqs_on+0x22/0x120
[  252.617475]  do_syscall_64+0x6d/0x140
[  252.617475]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[  252.617475] RIP: 0033:0x7f1d7a8a4b93
[  252.617475] Code: 8b 15 71 72 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 80 3d 51 fa 0c 00 00 41 89 ca 74 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 55 48 83 ec 30 44 89 4c 24
[  252.617475] RSP: 002b:00007fffe5c7c578 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[  252.617475] RAX: ffffffffffffffda RBX: 000055c8f446f340 RCX: 00007f1d7a8a4b93
[  252.617475] RDX: 0000000000000040 RSI: 000055c8f62889c0 RDI: 0000000000000003
[  252.617475] RBP: 000055c8f62889c0 R08: 000055c8f44715c0 R09: 0000000000000010
[  252.617475] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000040
[  252.617475] R13: 00007fffe5c7dc60 R14: 0000001d00000001 R15: 000055c8f4472680
[  252.617475]  </TASK>

[  252.617475] Allocated by task 548:
[  252.617475]  kasan_save_stack+0x28/0x50
[  252.617475]  kasan_save_track+0x14/0x40
[  252.617475]  kasan_save_alloc_info+0x65/0x80
[  252.617475]  __kasan_slab_alloc+0x8d/0x90
[  252.617475]  kmem_cache_alloc_node+0x115/0x310
[  252.617475]  __alloc_skb+0x20b/0x2c0
[  252.617475]  __ip_append_data+0x27f8/0x3de0
[  252.617475]  ip_append_data+0xd4/0x190
[  252.617475]  raw_sendmsg+0xa91/0x2d30
[  252.617475]  inet_sendmsg+0xd5/0x100
[  252.617475]  __sys_sendto+0x3be/0x510
[  252.617475]  __x64_sys_sendto+0xe0/0x1a0
[  252.617475]  do_syscall_64+0x6d/0x140
[  252.617475]  entry_SYSCALL_64_after_hwframe+0x6e/0x76

[  252.617475] Freed by task 548:
[  252.617475]  kasan_save_stack+0x28/0x50
[  252.617475]  kasan_save_track+0x14/0x40
[  252.617475]  kasan_save_free_info+0x41/0x60
[  252.617475]  poison_slab_object+0x10e/0x190
[  252.617475]  __kasan_slab_free+0x38/0x90
[  252.617475]  kmem_cache_free+0x106/0x3e0
[  252.617475]  kfree_skbmem+0x9e/0x150
[  252.617475]  kfree_skb_reason+0x118/0x370
[  252.617475]  __netif_receive_skb_core.constprop.0+0x1a4e/0x3910
[  252.617475]  __netif_receive_skb_one_core+0xb4/0x1c0
[  252.617475]  __netif_receive_skb+0x21/0x1b0
[  252.617475]  netif_receive_skb+0xf9/0x4f0
[  252.617475]  tcf_mirred_to_dev+0x7dc/0xdd0
[  252.617475]  tcf_mirred_act+0x3d1/0xe50
[  252.617475]  tcf_action_exec+0x561/0x870
[  252.617475]  mall_classify+0x168/0x240
[  252.617475]  __tcf_classify.constprop.0+0x50c/0x660
[  252.617475]  tcf_classify+0x6e/0xa0
[  252.617475]  tc_run+0x31b/0x5b0
[  252.617475]  __netif_receive_skb_core.constprop.0+0x12e1/0x3910
[  252.617475]  __netif_receive_skb_one_core+0xb4/0x1c0
[  252.617475]  __netif_receive_skb+0x21/0x1b0
[  252.617475]  netif_receive_skb+0xf9/0x4f0
[  252.617475]  tcf_mirred_to_dev+0x7dc/0xdd0
[  252.617475]  tcf_mirred_act+0x3d1/0xe50
[  252.617475]  tcf_action_exec+0x561/0x870
[  252.617475]  mall_classify+0x168/0x240
[  252.617475]  __tcf_classify.constprop.0+0x50c/0x660
[  252.617475]  tcf_classify+0x6e/0xa0
[  252.617475]  tc_run+0x31b/0x5b0
[  252.617475]  __netif_receive_skb_core.constprop.0+0x12e1/0x3910
[  252.617475]  __netif_receive_skb_one_core+0xb4/0x1c0
[  252.617475]  __netif_receive_skb+0x21/0x1b0
[  252.617475]  netif_receive_skb+0xf9/0x4f0
[  252.617475]  tcf_mirred_to_dev+0x7dc/0xdd0
[  252.617475]  tcf_mirred_act+0x3d1/0xe50
[  252.617475]  tcf_action_exec+0x561/0x870
[  252.617475]  mall_classify+0x168/0x240
[  252.617475]  __tcf_classify.constprop.0+0x50c/0x660
[  252.617475]  tcf_classify+0x6e/0xa0
[  252.617475]  tc_run+0x31b/0x5b0
[  252.617475]  __netif_receive_skb_core.constprop.0+0x12e1/0x3910
[  252.617475]  __netif_receive_skb_one_core+0xb4/0x1c0
[  252.617475]  __netif_receive_skb+0x21/0x1b0
[  252.617475]  netif_receive_skb+0xf9/0x4f0
[  252.617475]  tcf_mirred_to_dev+0x7dc/0xdd0
[  252.617475]  tcf_mirred_act+0x3d1/0xe50
[  252.617475]  tcf_action_exec+0x561/0x870
[  252.617475]  mall_classify+0x168/0x240
[  252.617475]  __tcf_classify.constprop.0+0x50c/0x660
[  252.617475]  tcf_classify+0x6e/0xa0
[  252.617475]  tc_run+0x31b/0x5b0
[  252.617475]  __netif_receive_skb_core.constprop.0+0x12e1/0x3910
[  252.617475]  __netif_receive_skb_one_core+0xb4/0x1c0
[  252.617475]  __netif_receive_skb+0x21/0x1b0
[  252.617475]  process_backlog+0xd7/0x5c0
[  252.617475]  __napi_poll.constprop.0+0xae/0x590
[  252.617475]  net_rx_action+0x47c/0xb40
[  252.617475]  __do_softirq+0x1c3/0x853

[  252.617475] The buggy address belongs to the object at ffff88800d453540
                which belongs to the cache skbuff_head_cache of size 224
[  252.617475] The buggy address is located 68 bytes inside of
                freed 224-byte region [ffff88800d453540, ffff88800d453620)

[  252.617475] The buggy address belongs to the physical page:
[  252.617475] page:00000000848240a3 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88800d453b80 pfn:0xd452
[  252.617475] head:00000000848240a3 order:1 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[  252.617475] flags: 0x100000000000840(slab|head|node=0|zone=1)
[  252.617475] page_type: 0xffffffff()
[  252.617475] raw: 0100000000000840 ffff88800244e3c0 dead000000000100 dead000000000122
[  252.617475] raw: ffff88800d453b80 0000000080190004 00000001ffffffff 0000000000000000
[  252.617475] page dumped because: kasan: bad access detected

[  252.617475] Memory state around the buggy address:
[  252.617475]  ffff88800d453480: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[  252.617475]  ffff88800d453500: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
[  252.617475] >ffff88800d453580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  252.617475]                    ^
[  252.617475]  ffff88800d453600: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
[  252.617475]  ffff88800d453680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  252.617475] ==================================================================
[  252.684330] Disabling lock debugging due to kernel taint
[  252.684601] ------------[ cut here ]------------
[  252.684839] refcount_t: underflow; use-after-free.
[  252.685141] WARNING: CPU: 8 PID: 548 at lib/refcount.c:28 refcount_warn_saturate+0xd4/0x190
[  252.685567] Modules linked in:
[  252.685728] CPU: 8 PID: 548 Comm: ping Tainted: G    B              6.8.0-rc3-00862-g0f37666d87d2-dirty #147
[  252.686214] RIP: 0010:refcount_warn_saturate+0xd4/0x190
[  252.686501] Code: cc cc cc cc 0f b6 05 26 19 f7 02 3c 01 0f 87 d1 3e 68 01 84 c0 75 e4 48 c7 c7 e0 33 83 97 c6 05 0c 19 f7 02 01 e8 2c ed f0 fe <0f> 0b eb cd 0f b6 05 fa 18 f7 02 3c 01 0f 87 a7 3e 68 01 84 c0 75
[  252.687439] RSP: 0018:ffff888038a08480 EFLAGS: 00010286
[  252.687699] RAX: 0000000000000000 RBX: ffff88801110829c RCX: 0000000000000000
[  252.688057] RDX: 0000000000000103 RSI: 0000000000000004 RDI: 0000000000000001
[  252.688457] RBP: ffff888038a08490 R08: 0000000000000001 R09: ffffed100717d139
[  252.688804] R10: ffff888038be89cb R11: 0000000063666572 R12: 0000000000000003
[  252.689163] R13: 0000000000000340 R14: ffff888011108060 R15: ffff88801110829c
[  252.689545] FS:  00007f1d7a5e2c40(0000) GS:ffff888038a00000(0000) knlGS:0000000000000000
[  252.689927] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  252.690210] CR2: 00007f1d7a85e0d0 CR3: 000000000b184000 CR4: 0000000000750ef0
[  252.690584] PKRU: 55555554
[  252.690731] Call Trace:
[  252.690871]  <IRQ>
[  252.690988]  ? show_regs+0x6a/0x80
[  252.691155]  ? __warn+0xcf/0x2c0
[  252.691327]  ? refcount_warn_saturate+0xd4/0x190
[  252.691585]  ? report_bug+0x341/0x3d0
[  252.691777]  ? handle_bug+0x40/0x80
[  252.691966]  ? exc_invalid_op+0x19/0x50
[  252.692153]  ? asm_exc_invalid_op+0x1b/0x20
[  252.692409]  ? refcount_warn_saturate+0xd4/0x190
[  252.692646]  sock_wfree+0x596/0x770
[  252.692816]  ? __netif_receive_skb_core.constprop.0+0x1a4e/0x3910
[  252.693138]  skb_release_head_state+0x7e/0x1e0
[  252.693388]  kfree_skb_reason+0xd3/0x370
[  252.693576]  __netif_receive_skb_core.constprop.0+0x1a4e/0x3910
[  252.693889]  ? kvm_sched_clock_read+0x11/0x20
[  252.694112]  ? __lock_acquire+0xcf0/0x6760
[  252.694315]  ? __pfx___netif_receive_skb_core.constprop.0+0x10/0x10
[  252.694650]  ? __pfx___lock_acquire+0x10/0x10
[  252.694871]  __netif_receive_skb_one_core+0xb4/0x1c0
[  252.695119]  ? __pfx___netif_receive_skb_one_core+0x10/0x10
[  252.695415]  ? __pfx_lock_acquire+0x10/0x10
[  252.695630]  __netif_receive_skb+0x21/0x1b0
[  252.695833]  netif_receive_skb+0xf9/0x4f0
[  252.696040]  ? __pfx_netif_receive_skb+0x10/0x10
[  252.696274]  tcf_mirred_to_dev+0x7dc/0xdd0
[  252.696495]  tcf_mirred_act+0x3d1/0xe50
[  252.696687]  ? __pfx_tcf_mirred_act+0x10/0x10
[  252.696901]  tcf_action_exec+0x561/0x870
[  252.697108]  mall_classify+0x168/0x240
[  252.697305]  __tcf_classify.constprop.0+0x50c/0x660
[  252.697569]  tcf_classify+0x6e/0xa0
[  252.697758]  ? __pfx_tcf_classify+0x10/0x10
[  252.697963]  ? debug_smp_processor_id+0x17/0x20
[  252.698176]  ? rcu_lockdep_current_cpu_online+0x3e/0x150
[  252.698456]  tc_run+0x31b/0x5b0
[  252.698605]  ? __pfx_tc_run+0x10/0x10
[  252.698799]  ? debug_smp_processor_id+0x17/0x20
[  252.699007]  ? rcu_lockdep_current_cpu_online+0x3e/0x150
[  252.699242]  __netif_receive_skb_core.constprop.0+0x12e1/0x3910
[  252.699535]  ? kvm_sched_clock_read+0x11/0x20
[  252.699754]  ? __lock_acquire+0xcf0/0x6760
[  252.699950]  ? __pfx___netif_receive_skb_core.constprop.0+0x10/0x10
[  252.700239]  ? __pfx___lock_acquire+0x10/0x10
[  252.700478]  __netif_receive_skb_one_core+0xb4/0x1c0
[  252.700721]  ? __pfx___netif_receive_skb_one_core+0x10/0x10
[  252.701006]  ? __pfx_lock_acquire+0x10/0x10
[  252.701193]  __netif_receive_skb+0x21/0x1b0
[  252.701415]  netif_receive_skb+0xf9/0x4f0
[  252.701595]  ? __pfx_netif_receive_skb+0x10/0x10
[  252.701811]  ? __kasan_check_read+0x11/0x20
[  252.702013]  tcf_mirred_to_dev+0x7dc/0xdd0
[  252.702189]  tcf_mirred_act+0x3d1/0xe50
[  252.702370]  ? __kasan_check_read+0x11/0x20
[  252.702579]  ? __pfx_tcf_mirred_act+0x10/0x10
[  252.702782]  ? __kasan_check_read+0x11/0x20
[  252.702978]  ? mark_lock.part.0+0xfa/0x31a0
[  252.703176]  ? __pfx_mark_lock.part.0+0x10/0x10
[  252.703407]  tcf_action_exec+0x561/0x870
[  252.703594]  mall_classify+0x168/0x240
[  252.703789]  ? mark_lock.part.0+0xfa/0x31a0
[  252.704000]  __tcf_classify.constprop.0+0x50c/0x660
[  252.704238]  tcf_classify+0x6e/0xa0
[  252.704437]  ? __pfx_tcf_classify+0x10/0x10
[  252.704641]  ? debug_smp_processor_id+0x17/0x20
[  252.704854]  ? rcu_lockdep_current_cpu_online+0x3e/0x150
[  252.705105]  tc_run+0x31b/0x5b0
[  252.705264]  ? __pfx_tc_run+0x10/0x10
[  252.705469]  ? debug_smp_processor_id+0x17/0x20
[  252.705680]  ? rcu_lockdep_current_cpu_online+0x3e/0x150
[  252.705932]  __netif_receive_skb_core.constprop.0+0x12e1/0x3910
[  252.706192]  ? __kasan_check_read+0x11/0x20
[  252.706398]  ? kvm_sched_clock_read+0x11/0x20
[  252.706631]  ? __lock_acquire+0xcf0/0x6760
[  252.706830]  ? __pfx___netif_receive_skb_core.constprop.0+0x10/0x10
[  252.707137]  ? __pfx___lock_acquire+0x10/0x10
[  252.707373]  ? __pfx_mark_lock.part.0+0x10/0x10
[  252.707604]  ? __kasan_check_read+0x11/0x20
[  252.707793]  __netif_receive_skb_one_core+0xb4/0x1c0
[  252.708024]  ? __pfx___netif_receive_skb_one_core+0x10/0x10
[  252.708278]  ? __pfx_lock_acquire+0x10/0x10
[  252.708515]  ? __kasan_check_read+0x11/0x20
[  252.708721]  ? __lock_acquire+0xd9f/0x6760
[  252.708901]  __netif_receive_skb+0x21/0x1b0
[  252.709082]  netif_receive_skb+0xf9/0x4f0
[  252.709279]  ? __pfx_netif_receive_skb+0x10/0x10
[  252.709524]  ? __pfx___lock_acquire+0x10/0x10
[  252.709764]  tcf_mirred_to_dev+0x7dc/0xdd0
[  252.709958]  tcf_mirred_act+0x3d1/0xe50
[  252.710130]  ? __pfx_tcf_mirred_act+0x10/0x10
[  252.710331]  ? update_stack_state+0x2d4/0x600
[  252.710595]  ? __kasan_check_read+0x11/0x20
[  252.710792]  tcf_action_exec+0x561/0x870
[  252.710973]  mall_classify+0x168/0x240
[  252.711156]  __tcf_classify.constprop.0+0x50c/0x660
[  252.711390]  tcf_classify+0x6e/0xa0
[  252.711566]  ? __pfx_tcf_classify+0x10/0x10
[  252.711762]  ? debug_smp_processor_id+0x17/0x20
[  252.711980]  ? rcu_lockdep_current_cpu_online+0x3e/0x150
[  252.712212]  tc_run+0x31b/0x5b0
[  252.712369]  ? __pfx_tc_run+0x10/0x10
[  252.712541]  ? debug_smp_processor_id+0x17/0x20
[  252.712746]  ? rcu_lockdep_current_cpu_online+0x3e/0x150
[  252.713001]  __netif_receive_skb_core.constprop.0+0x12e1/0x3910
[  252.713272]  ? kvm_sched_clock_read+0x11/0x20
[  252.713518]  ? kvm_sched_clock_read+0x11/0x20
[  252.713739]  ? __lock_acquire+0xcf0/0x6760
[  252.713928]  ? __pfx___netif_receive_skb_core.constprop.0+0x10/0x10
[  252.714228]  ? __pfx___lock_acquire+0x10/0x10
[  252.714460]  __netif_receive_skb_one_core+0xb4/0x1c0
[  252.714720]  ? __pfx___netif_receive_skb_one_core+0x10/0x10
[  252.714958]  ? __pfx_lock_acquire+0x10/0x10
[  252.715164]  ? _raw_spin_unlock_irq+0x28/0x60
[  252.715388]  ? process_backlog+0x416/0x5c0
[  252.715596]  ? __this_cpu_preempt_check+0x13/0x20
[  252.715834]  __netif_receive_skb+0x21/0x1b0
[  252.716024]  process_backlog+0xd7/0x5c0
[  252.716230]  __napi_poll.constprop.0+0xae/0x590
[  252.716469]  ? net_rx_action+0x252/0xb40
[  252.716702]  net_rx_action+0x47c/0xb40
[  252.716884]  ? __pfx_net_rx_action+0x10/0x10
[  252.717083]  ? irq_exit_rcu+0xb4/0x120
[  252.717263]  ? __this_cpu_preempt_check+0x13/0x20
[  252.717516]  __do_softirq+0x1c3/0x853
[  252.717702]  ? __dev_queue_xmit+0x9b3/0x3550
[  252.717900]  do_softirq+0xb5/0xf0
[  252.718062]  </IRQ>
[  252.718166]  <TASK>
[  252.718290]  __local_bh_enable_ip+0x104/0x130
[  252.718516]  ? __dev_queue_xmit+0x9b3/0x3550
[  252.718737]  __dev_queue_xmit+0x9c8/0x3550
[  252.718927]  ? __pfx_mark_lock.part.0+0x10/0x10
[  252.719139]  ? __pfx___dev_queue_xmit+0x10/0x10
[  252.719398]  ? __this_cpu_preempt_check+0x13/0x20
[  252.719632]  ? neigh_resolve_output+0x5bd/0x7b0
[  252.719828]  ? ip_finish_output2+0x6ac/0x1f40
[  252.720034]  ? __this_cpu_preempt_check+0x13/0x20
[  252.720261]  ? eth_header+0x16e/0x1b0
[  252.720487]  neigh_resolve_output+0x431/0x7b0
[  252.720694]  ip_finish_output2+0x6ac/0x1f40
[  252.720906]  ? __kasan_check_read+0x11/0x20
[  252.721105]  ? selinux_ip_postroute+0x357/0x400
[  252.721313]  ? __pfx_selinux_ip_postroute+0x10/0x10
[  252.721558]  ? __get_random_u32_below+0x17/0x70
[  252.721796]  ? __pfx_ip_finish_output2+0x10/0x10
[  252.722011]  ? __this_cpu_preempt_check+0x13/0x20
[  252.722243]  __ip_finish_output+0x6ea/0xed0
[  252.722463]  ? __ip_finish_output+0x6ea/0xed0
[  252.722675]  ip_finish_output+0x2b/0x260
[  252.722859]  ip_output+0x175/0x500
[  252.723040]  ? __pfx_ip_output+0x10/0x10
[  252.723237]  ? __ip_make_skb+0xdb5/0x2060
[  252.723448]  ? __pfx_ip_finish_output+0x10/0x10
[  252.723666]  ip_push_pending_frames+0x249/0x480
[  252.723869]  raw_sendmsg+0xf13/0x2d30
[  252.724044]  ? __kasan_check_read+0x11/0x20
[  252.724240]  ? __lock_acquire+0x1af0/0x6760
[  252.724454]  ? check_preemption_disabled+0x70/0xf0
[  252.724688]  ? __pfx_raw_sendmsg+0x10/0x10
[  252.724867]  ? mark_lock.part.0+0xfa/0x31a0
[  252.725069]  ? kvm_sched_clock_read+0x11/0x20
[  252.725270]  ? __pfx_sock_has_perm+0x10/0x10
[  252.725501]  ? __pfx_lock_release+0x10/0x10
[  252.725698]  ? __might_fault+0xc9/0x180
[  252.725877]  inet_sendmsg+0xd5/0x100
[  252.726042]  ? inet_sendmsg+0xd5/0x100
[  252.726219]  __sys_sendto+0x3be/0x510
[  252.726399]  ? __pfx___sys_sendto+0x10/0x10
[  252.726609]  ? do_user_addr_fault+0x401/0xb70
[  252.726842]  ? __this_cpu_preempt_check+0x13/0x20
[  252.727048]  ? __pfx_lock_release+0x10/0x10
[  252.727252]  ? __pfx___up_read+0x10/0x10
[  252.727457]  ? irqentry_exit_to_user_mode+0x8d/0x1d0
[  252.727701]  ? irqentry_exit+0x6b/0x90
[  252.727872]  ? debug_smp_processor_id+0x17/0x20
[  252.728109]  __x64_sys_sendto+0xe0/0x1a0
[  252.728294]  ? do_syscall_64+0x2a/0x140
[  252.728495]  ? trace_hardirqs_on+0x22/0x120
[  252.728706]  do_syscall_64+0x6d/0x140
[  252.728873]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[  252.729106] RIP: 0033:0x7f1d7a8a4b93
[  252.729279] Code: 8b 15 71 72 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 80 3d 51 fa 0c 00 00 41 89 ca 74 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 55 48 83 ec 30 44 89 4c 24
[  252.730188] RSP: 002b:00007fffe5c7c578 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[  252.730558] RAX: ffffffffffffffda RBX: 000055c8f446f340 RCX: 00007f1d7a8a4b93
[  252.730886] RDX: 0000000000000040 RSI: 000055c8f62889c0 RDI: 0000000000000003
[  252.731222] RBP: 000055c8f62889c0 R08: 000055c8f44715c0 R09: 0000000000000010
[  252.731580] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000040
[  252.731934] R13: 00007fffe5c7dc60 R14: 0000001d00000001 R15: 000055c8f4472680
[  252.732261]  </TASK>
[  252.732393] irq event stamp: 9034
[  252.732572] hardirqs last  enabled at (9034): [<ffffffff972f451a>] irqentry_exit+0x3a/0x90
[  252.732938] hardirqs last disabled at (9033): [<ffffffff972f3030>] sysvec_apic_timer_interrupt+0x10/0xc0
[  252.733390] softirqs last  enabled at (8974): [<ffffffff966d2c63>] __dev_queue_xmit+0x9b3/0x3550
[  252.733810] softirqs last disabled at (8975): [<ffffffff94b922d5>] do_softirq+0xb5/0xf0
[  252.734195] ---[ end trace 0000000000000000 ]---

* Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
  2024-02-12 14:51 ` Jamal Hadi Salim
@ 2024-02-12 15:02   ` Jakub Kicinski
  2024-02-12 15:11   ` Jamal Hadi Salim
  1 sibling, 0 replies; 17+ messages in thread
From: Jakub Kicinski @ 2024-02-12 15:02 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: davem, netdev, edumazet, pabeni, Marcelo Ricardo Leitner,
	Davide Caratti, xiyou.wangcong, jiri, shmulik.ladkani

On Mon, 12 Feb 2024 09:51:20 -0500 Jamal Hadi Salim wrote:
> > The test Davide added in commit ca22da2fbd69 ("act_mirred: use the backlog
> > for nested calls to mirred ingress") hangs our testing VMs every 10 or so
> > runs, with the familiar tcp_v4_rcv -> tcp_v4_rcv deadlock reported by
> > lockdep.
> >
> > In the past there was a concern that the backlog indirection will
> > lead to loss of error reporting / less accurate stats. But the current
> > workaround does not seem to address the issue.
> 
> Let us run some basic tests on this first - it's a hairy spot. 

Thanks!

> Also, IIRC Cong had some reservations about this in the past. Can't
> remember what they were.

He was worried that the overlimits stats would no longer be counted once
we decouple via the backlog (and Davide's selftest actually checks for
that, so I had to modify it).
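
For reference, the counter in question is the mirred action's overlimits
stat. A minimal sketch of reading it back by hand (the interface name and
the jq path are assumptions; the selftest itself goes through the
tc_rule_stats_get helper):

  h1=veth0   # assumption: the test interface, $h1 in tc_actions.sh
  # overlimits of the egress rule with pref 101 on that device
  tc -j -s filter show dev "$h1" egress pref 101 \
      | jq '.[] | select(.options) | .options.actions[0].stats.overlimits'

With the backlog indirection the nested receive is deferred, so its
outcome is no longer reflected in this counter, which is why the check
gets dropped from the selftest.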

I'm not married to the solution in any way, but bugs much less serious
than this get three-letter acronyms; we can't pretend this one doesn't
exist.

* Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
  2024-02-09 23:54 Jakub Kicinski
@ 2024-02-12 14:51 ` Jamal Hadi Salim
  2024-02-12 15:02   ` Jakub Kicinski
  2024-02-12 15:11   ` Jamal Hadi Salim
  2024-02-14 15:11 ` Jamal Hadi Salim
  1 sibling, 2 replies; 17+ messages in thread
From: Jamal Hadi Salim @ 2024-02-12 14:51 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, Marcelo Ricardo Leitner,
	Davide Caratti, xiyou.wangcong, jiri, shmulik.ladkani

On Fri, Feb 9, 2024 at 6:54 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> The test Davide added in commit ca22da2fbd69 ("act_mirred: use the backlog
> for nested calls to mirred ingress") hangs our testing VMs every 10 or so
> runs, with the familiar tcp_v4_rcv -> tcp_v4_rcv deadlock reported by
> lockdep.
>
> In the past there was a concern that the backlog indirection will
> lead to loss of error reporting / less accurate stats. But the current
> workaround does not seem to address the issue.
>

Let us run some basic tests on this first - it's a hairy spot. Also,
IIRC Cong had some reservations about this in the past. Can't remember
what they were.

cheers,
jamal


> Fixes: 53592b364001 ("net/sched: act_mirred: Implement ingress actions")
> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> Suggested-by: Davide Caratti <dcaratti@redhat.com>
> Link: https://lore.kernel.org/netdev/33dc43f587ec1388ba456b4915c75f02a8aae226.1663945716.git.dcaratti@redhat.com/
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: jhs@mojatatu.com
> CC: xiyou.wangcong@gmail.com
> CC: jiri@resnulli.us
> CC: shmulik.ladkani@gmail.com
> ---
>  net/sched/act_mirred.c                             | 14 +++++---------
>  .../testing/selftests/net/forwarding/tc_actions.sh |  3 ---
>  2 files changed, 5 insertions(+), 12 deletions(-)
>
> diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
> index 93a96e9d8d90..35c366f043d9 100644
> --- a/net/sched/act_mirred.c
> +++ b/net/sched/act_mirred.c
> @@ -232,18 +232,14 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
>         return err;
>  }
>
> -static bool is_mirred_nested(void)
> -{
> -       return unlikely(__this_cpu_read(mirred_nest_level) > 1);
> -}
> -
> -static int tcf_mirred_forward(bool want_ingress, struct sk_buff *skb)
> +static int
> +tcf_mirred_forward(bool at_ingress, bool want_ingress, struct sk_buff *skb)
>  {
>         int err;
>
>         if (!want_ingress)
>                 err = tcf_dev_queue_xmit(skb, dev_queue_xmit);
> -       else if (is_mirred_nested())
> +       else if (!at_ingress)
>                 err = netif_rx(skb);
>         else
>                 err = netif_receive_skb(skb);
> @@ -319,9 +315,9 @@ static int tcf_mirred_to_dev(struct sk_buff *skb, struct tcf_mirred *m,
>
>                 skb_set_redirected(skb_to_send, skb_to_send->tc_at_ingress);
>
> -               err = tcf_mirred_forward(want_ingress, skb_to_send);
> +               err = tcf_mirred_forward(at_ingress, want_ingress, skb_to_send);
>         } else {
> -               err = tcf_mirred_forward(want_ingress, skb_to_send);
> +               err = tcf_mirred_forward(at_ingress, want_ingress, skb_to_send);
>         }
>
>         if (err) {
> diff --git a/tools/testing/selftests/net/forwarding/tc_actions.sh b/tools/testing/selftests/net/forwarding/tc_actions.sh
> index b0f5e55d2d0b..589629636502 100755
> --- a/tools/testing/selftests/net/forwarding/tc_actions.sh
> +++ b/tools/testing/selftests/net/forwarding/tc_actions.sh
> @@ -235,9 +235,6 @@ mirred_egress_to_ingress_tcp_test()
>         check_err $? "didn't mirred redirect ICMP"
>         tc_check_packets "dev $h1 ingress" 102 10
>         check_err $? "didn't drop mirred ICMP"
> -       local overlimits=$(tc_rule_stats_get ${h1} 101 egress .overlimits)
> -       test ${overlimits} = 10
> -       check_err $? "wrong overlimits, expected 10 got ${overlimits}"
>
>         tc filter del dev $h1 egress protocol ip pref 100 handle 100 flower
>         tc filter del dev $h1 egress protocol ip pref 101 handle 101 flower
> --
> 2.43.0
>

* [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
@ 2024-02-09 23:54 Jakub Kicinski
  2024-02-12 14:51 ` Jamal Hadi Salim
  2024-02-14 15:11 ` Jamal Hadi Salim
  0 siblings, 2 replies; 17+ messages in thread
From: Jakub Kicinski @ 2024-02-09 23:54 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, Jakub Kicinski,
	Marcelo Ricardo Leitner, Davide Caratti, jhs, xiyou.wangcong,
	jiri, shmulik.ladkani

The test Davide added in commit ca22da2fbd69 ("act_mirred: use the backlog
for nested calls to mirred ingress") hangs our testing VMs every 10 or so
runs, with the familiar tcp_v4_rcv -> tcp_v4_rcv deadlock reported by
lockdep.
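
The shape of the setup is simply a device whose egress traffic is
redirected back to its own ingress while a TCP connection runs over it
locally; a rough sketch (interface names are assumptions, and the selftest
uses flower filters rather than matchall):

  # egress-to-ingress loop on a single device
  ip link add veth0 type veth peer name veth1
  ip link set veth0 up
  ip link set veth1 up
  tc qdisc add dev veth0 clsact
  # send everything leaving veth0 back into veth0's ingress path
  tc filter add dev veth0 egress matchall \
      action mirred ingress redirect dev veth0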

In the past there was a concern that the backlog indirection will
lead to loss of error reporting / less accurate stats. But the current
workaround does not seem to address the issue.

Fixes: 53592b364001 ("net/sched: act_mirred: Implement ingress actions")
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Suggested-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/netdev/33dc43f587ec1388ba456b4915c75f02a8aae226.1663945716.git.dcaratti@redhat.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: jhs@mojatatu.com
CC: xiyou.wangcong@gmail.com
CC: jiri@resnulli.us
CC: shmulik.ladkani@gmail.com
---
 net/sched/act_mirred.c                             | 14 +++++---------
 .../testing/selftests/net/forwarding/tc_actions.sh |  3 ---
 2 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 93a96e9d8d90..35c366f043d9 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -232,18 +232,14 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
 	return err;
 }
 
-static bool is_mirred_nested(void)
-{
-	return unlikely(__this_cpu_read(mirred_nest_level) > 1);
-}
-
-static int tcf_mirred_forward(bool want_ingress, struct sk_buff *skb)
+static int
+tcf_mirred_forward(bool at_ingress, bool want_ingress, struct sk_buff *skb)
 {
 	int err;
 
 	if (!want_ingress)
 		err = tcf_dev_queue_xmit(skb, dev_queue_xmit);
-	else if (is_mirred_nested())
+	else if (!at_ingress)
 		err = netif_rx(skb);
 	else
 		err = netif_receive_skb(skb);
@@ -319,9 +315,9 @@ static int tcf_mirred_to_dev(struct sk_buff *skb, struct tcf_mirred *m,
 
 		skb_set_redirected(skb_to_send, skb_to_send->tc_at_ingress);
 
-		err = tcf_mirred_forward(want_ingress, skb_to_send);
+		err = tcf_mirred_forward(at_ingress, want_ingress, skb_to_send);
 	} else {
-		err = tcf_mirred_forward(want_ingress, skb_to_send);
+		err = tcf_mirred_forward(at_ingress, want_ingress, skb_to_send);
 	}
 
 	if (err) {
diff --git a/tools/testing/selftests/net/forwarding/tc_actions.sh b/tools/testing/selftests/net/forwarding/tc_actions.sh
index b0f5e55d2d0b..589629636502 100755
--- a/tools/testing/selftests/net/forwarding/tc_actions.sh
+++ b/tools/testing/selftests/net/forwarding/tc_actions.sh
@@ -235,9 +235,6 @@ mirred_egress_to_ingress_tcp_test()
 	check_err $? "didn't mirred redirect ICMP"
 	tc_check_packets "dev $h1 ingress" 102 10
 	check_err $? "didn't drop mirred ICMP"
-	local overlimits=$(tc_rule_stats_get ${h1} 101 egress .overlimits)
-	test ${overlimits} = 10
-	check_err $? "wrong overlimits, expected 10 got ${overlimits}"
 
 	tc filter del dev $h1 egress protocol ip pref 100 handle 100 flower
 	tc filter del dev $h1 egress protocol ip pref 101 handle 101 flower
-- 
2.43.0


Thread overview: 17+ messages
2022-09-23 15:11 [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress Davide Caratti
2022-09-25 18:08 ` Cong Wang
2022-10-04 17:40   ` Davide Caratti
2022-10-16 17:28     ` Cong Wang
2022-11-18 23:07 ` Peilin Ye
2024-02-09 23:54 Jakub Kicinski
2024-02-12 14:51 ` Jamal Hadi Salim
2024-02-12 15:02   ` Jakub Kicinski
2024-02-12 15:11   ` Jamal Hadi Salim
2024-02-13 11:06     ` Paolo Abeni
2024-02-14  0:27       ` Jakub Kicinski
2024-02-14  3:40         ` Jakub Kicinski
2024-02-14 15:11 ` Jamal Hadi Salim
2024-02-14 15:28   ` Jamal Hadi Salim
2024-02-14 16:10     ` Davide Caratti
2024-02-15  0:31       ` Jakub Kicinski
2024-02-15 17:55         ` Davide Caratti
