From: Cong Wang <xiyou.wangcong@gmail.com>
To: Davide Caratti <dcaratti@redhat.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>,
	Jiri Pirko <jiri@resnulli.us>, Paolo Abeni <pabeni@redhat.com>,
	Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>,
	wizhao@redhat.com, netdev@vger.kernel.org
Subject: Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
Date: Sun, 25 Sep 2022 11:08:48 -0700
Message-ID: <YzCZMHYmk53mQ+HK@pop-os.localdomain>
In-Reply-To: <33dc43f587ec1388ba456b4915c75f02a8aae226.1663945716.git.dcaratti@redhat.com>

On Fri, Sep 23, 2022 at 05:11:12PM +0200, Davide Caratti wrote:
> William reports kernel soft-lockups on some OVS topologies when the TC
> mirred "egress-to-ingress" action is hit by local TCP traffic. Indeed,
> using the mirred action in egress-to-ingress mode can easily produce a
> dmesg splat like:
> 
>  ============================================
>  WARNING: possible recursive locking detected
>  6.0.0-rc4+ #511 Not tainted
>  --------------------------------------------
>  nc/1037 is trying to acquire lock:
>  ffff950687843cb0 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160
> 
>  but task is already holding lock:
>  ffff950687846cb0 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160
> 
>  other info that might help us debug this:
>   Possible unsafe locking scenario:
> 
>         CPU0
>         ----
>    lock(slock-AF_INET/1);
>    lock(slock-AF_INET/1);
> 
>   *** DEADLOCK ***
> 
>   May be due to missing lock nesting notation
> 
>  12 locks held by nc/1037:
>   #0: ffff950687843d40 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendmsg+0x19/0x40
>   #1: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x5/0x610
>   #2: ffffffff9be072e0 (rcu_read_lock_bh){....}-{1:2}, at: ip_finish_output2+0xaa/0xa10
>   #3: ffffffff9be072e0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x72/0x11b0
>   #4: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb+0x181/0x400
>   #5: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x54/0x160
>   #6: ffff950687846cb0 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160
>   #7: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x5/0x610
>   #8: ffffffff9be072e0 (rcu_read_lock_bh){....}-{1:2}, at: ip_finish_output2+0xaa/0xa10
>   #9: ffffffff9be072e0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x72/0x11b0
>   #10: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb+0x181/0x400
>   #11: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x54/0x160
> 
>  stack backtrace:
>  CPU: 1 PID: 1037 Comm: nc Not tainted 6.0.0-rc4+ #511
>  Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7353+9de0a3cc 04/01/2014
>  Call Trace:
>   <TASK>
>   dump_stack_lvl+0x44/0x5b
>   __lock_acquire.cold.76+0x121/0x2a7
>   lock_acquire+0xd5/0x310
>   _raw_spin_lock_nested+0x39/0x70
>   tcp_v4_rcv+0x1023/0x1160
>   ip_protocol_deliver_rcu+0x4d/0x280
>   ip_local_deliver_finish+0xac/0x160
>   ip_local_deliver+0x71/0x220
>   ip_rcv+0x5a/0x200
>   __netif_receive_skb_one_core+0x89/0xa0
>   netif_receive_skb+0x1c1/0x400
>   tcf_mirred_act+0x2a5/0x610 [act_mirred]
>   tcf_action_exec+0xb3/0x210
>   fl_classify+0x1f7/0x240 [cls_flower]
>   tcf_classify+0x7b/0x320
>   __dev_queue_xmit+0x3a4/0x11b0
>   ip_finish_output2+0x3b8/0xa10
>   ip_output+0x7f/0x260
>   __ip_queue_xmit+0x1ce/0x610
>   __tcp_transmit_skb+0xabc/0xc80
>   tcp_rcv_state_process+0x669/0x1290
>   tcp_v4_do_rcv+0xd7/0x370
>   tcp_v4_rcv+0x10bc/0x1160
>   ip_protocol_deliver_rcu+0x4d/0x280
>   ip_local_deliver_finish+0xac/0x160
>   ip_local_deliver+0x71/0x220
>   ip_rcv+0x5a/0x200
>   __netif_receive_skb_one_core+0x89/0xa0
>   netif_receive_skb+0x1c1/0x400
>   tcf_mirred_act+0x2a5/0x610 [act_mirred]
>   tcf_action_exec+0xb3/0x210
>   fl_classify+0x1f7/0x240 [cls_flower]
>   tcf_classify+0x7b/0x320
>   __dev_queue_xmit+0x3a4/0x11b0
>   ip_finish_output2+0x3b8/0xa10
>   ip_output+0x7f/0x260
>   __ip_queue_xmit+0x1ce/0x610
>   __tcp_transmit_skb+0xabc/0xc80
>   tcp_write_xmit+0x229/0x12c0
>   __tcp_push_pending_frames+0x32/0xf0
>   tcp_sendmsg_locked+0x297/0xe10
>   tcp_sendmsg+0x27/0x40
>   sock_sendmsg+0x58/0x70
>   __sys_sendto+0xfd/0x170
>   __x64_sys_sendto+0x24/0x30
>   do_syscall_64+0x3a/0x90
>   entry_SYSCALL_64_after_hwframe+0x63/0xcd
>  RIP: 0033:0x7f11a06fd281
>  Code: 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 e5 43 2c 00 41 89 ca 8b 00 85 c0 75 1c 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 67 c3 66 0f 1f 44 00 00 41 56 41 89 ce 41 55
>  RSP: 002b:00007ffd17958358 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
>  RAX: ffffffffffffffda RBX: 0000555c6e671610 RCX: 00007f11a06fd281
>  RDX: 0000000000002000 RSI: 0000555c6e73a9f0 RDI: 0000000000000003
>  RBP: 0000555c6e6433b0 R08: 0000000000000000 R09: 0000000000000000
>  R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000002000
>  R13: 0000555c6e671410 R14: 0000555c6e671410 R15: 0000555c6e6433f8
>   </TASK>
> 
> which is very similar to the ones observed by William in his setup.
> By using netif_rx() for mirred ingress packets, they are queued in the
> backlog, as is done in the receive path of "loopback" and "veth", and
> the deadlock is no longer visible. Also add a selftest that can be used
> to reproduce the problem and verify the fix.
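
For context, the crux of the proposed change is essentially a one-line
substitution in act_mirred's ingress delivery path. A minimal sketch of
the idea (the helper name is illustrative, not the exact patch hunk):

  #include <linux/netdevice.h>

  /* Sketch only: deliver mirred ingress packets via the per-CPU
   * backlog instead of re-entering the RX path synchronously.
   */
  static int mirred_deliver(struct sk_buff *skb, bool want_ingress)
  {
          if (!want_ingress)
                  return dev_queue_xmit(skb);     /* egress: unchanged */

          /* Before: netif_receive_skb(skb) ran the whole RX path in the
           * caller's context, so local TCP traffic could try to take
           * the same socket spinlock twice on one call chain (the splat
           * above).
           *
           * After: netif_rx(skb) only enqueues the skb on the softnet
           * backlog; the RX path runs later from softirq context, which
           * breaks the recursion.
           */
          return netif_rx(skb);
  }

This mirrors what "loopback" and "veth" already do on their receive
side, as the commit message notes.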

Which also means we can no longer know the RX path status, right? I
mean, if we have filters on ingress, after this patch we can't know
whether they dropped the packet or not? To me, this at least breaks
users' expectations.
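
To be concrete about the semantics: netif_receive_skb() runs the RX
path synchronously, so its return value reflects what the stack
(including any ingress tc filters) actually did with the packet, while
netif_rx() only reports whether the skb was enqueued on the per-CPU
backlog. A simplified kernel-context sketch (helper names are just for
illustration):

  #include <linux/netdevice.h>

  static int deliver_sync(struct sk_buff *skb)
  {
          /* Runs the RX path in-line: NET_RX_DROP here can mean an
           * ingress filter (or the protocol layer) dropped the packet.
           */
          return netif_receive_skb(skb);
  }

  static int deliver_via_backlog(struct sk_buff *skb)
  {
          /* Only enqueues on the softnet backlog: NET_RX_SUCCESS means
           * "queued", and a later drop by an ingress filter is
           * invisible to the caller.
           */
          return netif_rx(skb);
  }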

BTW, have you thought about solving the above lockdep warning in the
TCP layer?

Thanks.

Thread overview: 17+ messages
2022-09-23 15:11 [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress Davide Caratti
2022-09-25 18:08 ` Cong Wang [this message]
2022-10-04 17:40   ` Davide Caratti
2022-10-16 17:28     ` Cong Wang
2022-11-18 23:07 ` Peilin Ye
2024-02-09 23:54 Jakub Kicinski
2024-02-12 14:51 ` Jamal Hadi Salim
2024-02-12 15:02   ` Jakub Kicinski
2024-02-12 15:11   ` Jamal Hadi Salim
2024-02-13 11:06     ` Paolo Abeni
2024-02-14  0:27       ` Jakub Kicinski
2024-02-14  3:40         ` Jakub Kicinski
2024-02-14 15:11 ` Jamal Hadi Salim
2024-02-14 15:28   ` Jamal Hadi Salim
2024-02-14 16:10     ` Davide Caratti
2024-02-15  0:31       ` Jakub Kicinski
2024-02-15 17:55         ` Davide Caratti
