* [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
@ 2024-02-09 23:54 Jakub Kicinski
2024-02-12 14:51 ` Jamal Hadi Salim
2024-02-14 15:11 ` Jamal Hadi Salim
0 siblings, 2 replies; 17+ messages in thread
From: Jakub Kicinski @ 2024-02-09 23:54 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, Jakub Kicinski,
Marcelo Ricardo Leitner, Davide Caratti, jhs, xiyou.wangcong,
jiri, shmulik.ladkani
The test Davide added in commit ca22da2fbd69 ("act_mirred: use the backlog
for nested calls to mirred ingress") hangs our testing VMs every 10 or so
runs, with the familiar tcp_v4_rcv -> tcp_v4_rcv deadlock reported by
lockdep.
In the past there was a concern that the backlog indirection would
lead to loss of error reporting / less accurate stats. But the current
workaround does not seem to address the issue.
Fixes: 53592b364001 ("net/sched: act_mirred: Implement ingress actions")
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Suggested-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/netdev/33dc43f587ec1388ba456b4915c75f02a8aae226.1663945716.git.dcaratti@redhat.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: jhs@mojatatu.com
CC: xiyou.wangcong@gmail.com
CC: jiri@resnulli.us
CC: shmulik.ladkani@gmail.com
---
net/sched/act_mirred.c | 14 +++++---------
.../testing/selftests/net/forwarding/tc_actions.sh | 3 ---
2 files changed, 5 insertions(+), 12 deletions(-)
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 93a96e9d8d90..35c366f043d9 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -232,18 +232,14 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
return err;
}
-static bool is_mirred_nested(void)
-{
- return unlikely(__this_cpu_read(mirred_nest_level) > 1);
-}
-
-static int tcf_mirred_forward(bool want_ingress, struct sk_buff *skb)
+static int
+tcf_mirred_forward(bool at_ingress, bool want_ingress, struct sk_buff *skb)
{
int err;
if (!want_ingress)
err = tcf_dev_queue_xmit(skb, dev_queue_xmit);
- else if (is_mirred_nested())
+ else if (!at_ingress)
err = netif_rx(skb);
else
err = netif_receive_skb(skb);
@@ -319,9 +315,9 @@ static int tcf_mirred_to_dev(struct sk_buff *skb, struct tcf_mirred *m,
skb_set_redirected(skb_to_send, skb_to_send->tc_at_ingress);
- err = tcf_mirred_forward(want_ingress, skb_to_send);
+ err = tcf_mirred_forward(at_ingress, want_ingress, skb_to_send);
} else {
- err = tcf_mirred_forward(want_ingress, skb_to_send);
+ err = tcf_mirred_forward(at_ingress, want_ingress, skb_to_send);
}
if (err) {
diff --git a/tools/testing/selftests/net/forwarding/tc_actions.sh b/tools/testing/selftests/net/forwarding/tc_actions.sh
index b0f5e55d2d0b..589629636502 100755
--- a/tools/testing/selftests/net/forwarding/tc_actions.sh
+++ b/tools/testing/selftests/net/forwarding/tc_actions.sh
@@ -235,9 +235,6 @@ mirred_egress_to_ingress_tcp_test()
check_err $? "didn't mirred redirect ICMP"
tc_check_packets "dev $h1 ingress" 102 10
check_err $? "didn't drop mirred ICMP"
- local overlimits=$(tc_rule_stats_get ${h1} 101 egress .overlimits)
- test ${overlimits} = 10
- check_err $? "wrong overlimits, expected 10 got ${overlimits}"
tc filter del dev $h1 egress protocol ip pref 100 handle 100 flower
tc filter del dev $h1 egress protocol ip pref 101 handle 101 flower
--
2.43.0
* Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
2024-02-09 23:54 [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress Jakub Kicinski
@ 2024-02-12 14:51 ` Jamal Hadi Salim
2024-02-12 15:02 ` Jakub Kicinski
2024-02-12 15:11 ` Jamal Hadi Salim
2024-02-14 15:11 ` Jamal Hadi Salim
1 sibling, 2 replies; 17+ messages in thread
From: Jamal Hadi Salim @ 2024-02-12 14:51 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, Marcelo Ricardo Leitner,
Davide Caratti, xiyou.wangcong, jiri, shmulik.ladkani
On Fri, Feb 9, 2024 at 6:54 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> The test Davide added in commit ca22da2fbd69 ("act_mirred: use the backlog
> for nested calls to mirred ingress") hangs our testing VMs every 10 or so
> runs, with the familiar tcp_v4_rcv -> tcp_v4_rcv deadlock reported by
> lockdep.
>
> In the past there was a concern that the backlog indirection will
> lead to loss of error reporting / less accurate stats. But the current
> workaround does not seem to address the issue.
>
Let us run some basic tests on this first - it's a hairy spot. Also,
IIRC Cong had some reservations about this in the past. Can't remember
what they were.
cheers,
jamal
* Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
2024-02-12 14:51 ` Jamal Hadi Salim
@ 2024-02-12 15:02 ` Jakub Kicinski
2024-02-12 15:11 ` Jamal Hadi Salim
1 sibling, 0 replies; 17+ messages in thread
From: Jakub Kicinski @ 2024-02-12 15:02 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: davem, netdev, edumazet, pabeni, Marcelo Ricardo Leitner,
Davide Caratti, xiyou.wangcong, jiri, shmulik.ladkani
On Mon, 12 Feb 2024 09:51:20 -0500 Jamal Hadi Salim wrote:
> > The test Davide added in commit ca22da2fbd69 ("act_mirred: use the backlog
> > for nested calls to mirred ingress") hangs our testing VMs every 10 or so
> > runs, with the familiar tcp_v4_rcv -> tcp_v4_rcv deadlock reported by
> > lockdep.
> >
> > In the past there was a concern that the backlog indirection will
> > lead to loss of error reporting / less accurate stats. But the current
> > workaround does not seem to address the issue.
>
> Let us run some basic tests on this first - it's a hairy spot.
Thanks!
> Also, IIRC Cong had some reservations about this in the past. Can't
> remember what they were.
He was worried that the overlimits stats are no longer counted when we
decouple (and Davide's selftest actually checks for that so I had to
modify it).
I'm not married to the solution in any way, but bugs much less serious
get three-letter acronyms; we can't pretend this one doesn't exist.
* Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
2024-02-12 14:51 ` Jamal Hadi Salim
2024-02-12 15:02 ` Jakub Kicinski
@ 2024-02-12 15:11 ` Jamal Hadi Salim
2024-02-13 11:06 ` Paolo Abeni
1 sibling, 1 reply; 17+ messages in thread
From: Jamal Hadi Salim @ 2024-02-12 15:11 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, Marcelo Ricardo Leitner,
Davide Caratti, xiyou.wangcong, jiri, shmulik.ladkani
[-- Attachment #1: Type: text/plain, Size: 4490 bytes --]
On Mon, Feb 12, 2024 at 9:51 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On Fri, Feb 9, 2024 at 6:54 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > The test Davide added in commit ca22da2fbd69 ("act_mirred: use the backlog
> > for nested calls to mirred ingress") hangs our testing VMs every 10 or so
> > runs, with the familiar tcp_v4_rcv -> tcp_v4_rcv deadlock reported by
> > lockdep.
> >
> > In the past there was a concern that the backlog indirection will
> > lead to loss of error reporting / less accurate stats. But the current
> > workaround does not seem to address the issue.
> >
>
> Let us run some basic tests on this first - it's a hairy spot. Also,
Something broke.
Create a netns, put one half of a veth pair into the namespace, and
create a filter inside the netns:
at_ns$ tc qdisc add dev port0 ingress_block 21 clsact
at_ns$ tc filter add block 21 egress protocol ip prio 10 matchall
action mirred ingress redirect dev port0
Send a ping from host:
at_host@ ping 10.0.0.2 -c 1 -I <vethportonhostside>
And... it hits a UAF - see attached.
Sorry, no time to look at it right now but one of us
(Victor/Pedro/myself) will look later unless someone else beats us to
it.
cheers,
jamal
[-- Attachment #2: mirred_ingress_bug.txt --]
[-- Type: text/plain, Size: 26164 bytes --]
[ 252.615958] Packet exceeded mirred recursion limit on dev port0
[ 252.616534] ==================================================================
[ 252.616911] BUG: KASAN: slab-use-after-free in tc_run+0x50a/0x5b0
[ 252.617241] Read of size 4 at addr ffff88800d453584 by task ping/548
[ 252.617475] CPU: 8 PID: 548 Comm: ping Not tainted 6.8.0-rc3-00862-g0f37666d87d2-dirty #147
[ 252.617475] Call Trace:
[ 252.617475] <IRQ>
[ 252.617475] dump_stack_lvl+0x64/0xb0
[ 252.617475] print_report+0xcf/0x620
[ 252.617475] ? tc_run+0x50a/0x5b0
[ 252.617475] ? kasan_complete_mode_report_info+0x7c/0x200
[ 252.617475] ? tc_run+0x50a/0x5b0
[ 252.617475] kasan_report+0xc9/0x110
[ 252.617475] ? tc_run+0x50a/0x5b0
[ 252.617475] __asan_report_load4_noabort+0x14/0x20
[ 252.617475] tc_run+0x50a/0x5b0
[ 252.617475] ? __pfx_tc_run+0x10/0x10
[ 252.617475] ? debug_smp_processor_id+0x17/0x20
[ 252.617475] ? rcu_lockdep_current_cpu_online+0x3e/0x150
[ 252.617475] __netif_receive_skb_core.constprop.0+0x12e1/0x3910
[ 252.617475] ? kvm_sched_clock_read+0x11/0x20
[ 252.617475] ? __lock_acquire+0xcf0/0x6760
[ 252.617475] ? __pfx___netif_receive_skb_core.constprop.0+0x10/0x10
[ 252.617475] ? __pfx___lock_acquire+0x10/0x10
[ 252.617475] __netif_receive_skb_one_core+0xb4/0x1c0
[ 252.617475] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 252.617475] ? __pfx_lock_acquire+0x10/0x10
[ 252.617475] __netif_receive_skb+0x21/0x1b0
[ 252.617475] netif_receive_skb+0xf9/0x4f0
[ 252.617475] ? __pfx_netif_receive_skb+0x10/0x10
[ 252.617475] tcf_mirred_to_dev+0x7dc/0xdd0
[ 252.617475] tcf_mirred_act+0x3d1/0xe50
[ 252.617475] ? __pfx_tcf_mirred_act+0x10/0x10
[ 252.617475] tcf_action_exec+0x561/0x870
[ 252.617475] mall_classify+0x168/0x240
[ 252.617475] __tcf_classify.constprop.0+0x50c/0x660
[ 252.617475] tcf_classify+0x6e/0xa0
[ 252.617475] ? __pfx_tcf_classify+0x10/0x10
[ 252.617475] ? debug_smp_processor_id+0x17/0x20
[ 252.617475] ? rcu_lockdep_current_cpu_online+0x3e/0x150
[ 252.617475] tc_run+0x31b/0x5b0
[ 252.617475] ? __pfx_tc_run+0x10/0x10
[ 252.617475] ? debug_smp_processor_id+0x17/0x20
[ 252.617475] ? rcu_lockdep_current_cpu_online+0x3e/0x150
[ 252.617475] __netif_receive_skb_core.constprop.0+0x12e1/0x3910
[ 252.617475] ? kvm_sched_clock_read+0x11/0x20
[ 252.617475] ? __lock_acquire+0xcf0/0x6760
[ 252.617475] ? __pfx___netif_receive_skb_core.constprop.0+0x10/0x10
[ 252.617475] ? __pfx___lock_acquire+0x10/0x10
[ 252.617475] __netif_receive_skb_one_core+0xb4/0x1c0
[ 252.617475] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 252.617475] ? __pfx_lock_acquire+0x10/0x10
[ 252.617475] __netif_receive_skb+0x21/0x1b0
[ 252.617475] netif_receive_skb+0xf9/0x4f0
[ 252.617475] ? __pfx_netif_receive_skb+0x10/0x10
[ 252.617475] ? __kasan_check_read+0x11/0x20
[ 252.617475] tcf_mirred_to_dev+0x7dc/0xdd0
[ 252.617475] tcf_mirred_act+0x3d1/0xe50
[ 252.617475] ? __kasan_check_read+0x11/0x20
[ 252.617475] ? __pfx_tcf_mirred_act+0x10/0x10
[ 252.617475] ? __kasan_check_read+0x11/0x20
[ 252.617475] ? mark_lock.part.0+0xfa/0x31a0
[ 252.617475] ? __pfx_mark_lock.part.0+0x10/0x10
[ 252.617475] tcf_action_exec+0x561/0x870
[ 252.617475] mall_classify+0x168/0x240
[ 252.617475] ? mark_lock.part.0+0xfa/0x31a0
[ 252.617475] __tcf_classify.constprop.0+0x50c/0x660
[ 252.617475] tcf_classify+0x6e/0xa0
[ 252.617475] ? __pfx_tcf_classify+0x10/0x10
[ 252.617475] ? debug_smp_processor_id+0x17/0x20
[ 252.617475] ? rcu_lockdep_current_cpu_online+0x3e/0x150
[ 252.617475] tc_run+0x31b/0x5b0
[ 252.617475] ? __pfx_tc_run+0x10/0x10
[ 252.617475] ? debug_smp_processor_id+0x17/0x20
[ 252.617475] ? rcu_lockdep_current_cpu_online+0x3e/0x150
[ 252.617475] __netif_receive_skb_core.constprop.0+0x12e1/0x3910
[ 252.617475] ? __kasan_check_read+0x11/0x20
[ 252.617475] ? kvm_sched_clock_read+0x11/0x20
[ 252.617475] ? __lock_acquire+0xcf0/0x6760
[ 252.617475] ? __pfx___netif_receive_skb_core.constprop.0+0x10/0x10
[ 252.617475] ? __pfx___lock_acquire+0x10/0x10
[ 252.617475] ? __pfx_mark_lock.part.0+0x10/0x10
[ 252.617475] ? __kasan_check_read+0x11/0x20
[ 252.617475] __netif_receive_skb_one_core+0xb4/0x1c0
[ 252.617475] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 252.617475] ? __pfx_lock_acquire+0x10/0x10
[ 252.617475] ? __kasan_check_read+0x11/0x20
[ 252.617475] ? __lock_acquire+0xd9f/0x6760
[ 252.617475] __netif_receive_skb+0x21/0x1b0
[ 252.617475] netif_receive_skb+0xf9/0x4f0
[ 252.617475] ? __pfx_netif_receive_skb+0x10/0x10
[ 252.617475] ? __pfx___lock_acquire+0x10/0x10
[ 252.617475] tcf_mirred_to_dev+0x7dc/0xdd0
[ 252.617475] tcf_mirred_act+0x3d1/0xe50
[ 252.617475] ? __pfx_tcf_mirred_act+0x10/0x10
[ 252.617475] ? update_stack_state+0x2d4/0x600
[ 252.617475] ? __kasan_check_read+0x11/0x20
[ 252.617475] tcf_action_exec+0x561/0x870
[ 252.617475] mall_classify+0x168/0x240
[ 252.617475] __tcf_classify.constprop.0+0x50c/0x660
[ 252.617475] tcf_classify+0x6e/0xa0
[ 252.617475] ? __pfx_tcf_classify+0x10/0x10
[ 252.617475] ? debug_smp_processor_id+0x17/0x20
[ 252.617475] ? rcu_lockdep_current_cpu_online+0x3e/0x150
[ 252.617475] tc_run+0x31b/0x5b0
[ 252.617475] ? __pfx_tc_run+0x10/0x10
[ 252.617475] ? debug_smp_processor_id+0x17/0x20
[ 252.617475] ? rcu_lockdep_current_cpu_online+0x3e/0x150
[ 252.617475] __netif_receive_skb_core.constprop.0+0x12e1/0x3910
[ 252.617475] ? kvm_sched_clock_read+0x11/0x20
[ 252.617475] ? kvm_sched_clock_read+0x11/0x20
[ 252.617475] ? __lock_acquire+0xcf0/0x6760
[ 252.617475] ? __pfx___netif_receive_skb_core.constprop.0+0x10/0x10
[ 252.617475] ? __pfx___lock_acquire+0x10/0x10
[ 252.617475] __netif_receive_skb_one_core+0xb4/0x1c0
[ 252.617475] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 252.617475] ? __pfx_lock_acquire+0x10/0x10
[ 252.617475] ? _raw_spin_unlock_irq+0x28/0x60
[ 252.617475] ? process_backlog+0x416/0x5c0
[ 252.617475] ? __this_cpu_preempt_check+0x13/0x20
[ 252.617475] __netif_receive_skb+0x21/0x1b0
[ 252.617475] process_backlog+0xd7/0x5c0
[ 252.617475] __napi_poll.constprop.0+0xae/0x590
[ 252.617475] ? net_rx_action+0x252/0xb40
[ 252.617475] net_rx_action+0x47c/0xb40
[ 252.617475] ? __pfx_net_rx_action+0x10/0x10
[ 252.617475] ? irq_exit_rcu+0xb4/0x120
[ 252.617475] ? __this_cpu_preempt_check+0x13/0x20
[ 252.617475] __do_softirq+0x1c3/0x853
[ 252.617475] ? __dev_queue_xmit+0x9b3/0x3550
[ 252.617475] do_softirq+0xb5/0xf0
[ 252.617475] </IRQ>
[ 252.617475] <TASK>
[ 252.617475] __local_bh_enable_ip+0x104/0x130
[ 252.617475] ? __dev_queue_xmit+0x9b3/0x3550
[ 252.617475] __dev_queue_xmit+0x9c8/0x3550
[ 252.617475] ? __pfx_mark_lock.part.0+0x10/0x10
[ 252.617475] ? __pfx___dev_queue_xmit+0x10/0x10
[ 252.617475] ? __this_cpu_preempt_check+0x13/0x20
[ 252.617475] ? neigh_resolve_output+0x5bd/0x7b0
[ 252.617475] ? ip_finish_output2+0x6ac/0x1f40
[ 252.617475] ? __this_cpu_preempt_check+0x13/0x20
[ 252.617475] ? eth_header+0x16e/0x1b0
[ 252.617475] neigh_resolve_output+0x431/0x7b0
[ 252.617475] ip_finish_output2+0x6ac/0x1f40
[ 252.617475] ? __kasan_check_read+0x11/0x20
[ 252.617475] ? selinux_ip_postroute+0x357/0x400
[ 252.617475] ? __pfx_selinux_ip_postroute+0x10/0x10
[ 252.617475] ? __get_random_u32_below+0x17/0x70
[ 252.617475] ? __pfx_ip_finish_output2+0x10/0x10
[ 252.617475] ? __this_cpu_preempt_check+0x13/0x20
[ 252.617475] __ip_finish_output+0x6ea/0xed0
[ 252.617475] ? __ip_finish_output+0x6ea/0xed0
[ 252.617475] ip_finish_output+0x2b/0x260
[ 252.617475] ip_output+0x175/0x500
[ 252.617475] ? __pfx_ip_output+0x10/0x10
[ 252.617475] ? __ip_make_skb+0xdb5/0x2060
[ 252.617475] ? __pfx_ip_finish_output+0x10/0x10
[ 252.617475] ip_push_pending_frames+0x249/0x480
[ 252.617475] raw_sendmsg+0xf13/0x2d30
[ 252.617475] ? __kasan_check_read+0x11/0x20
[ 252.617475] ? __lock_acquire+0x1af0/0x6760
[ 252.617475] ? check_preemption_disabled+0x70/0xf0
[ 252.617475] ? __pfx_raw_sendmsg+0x10/0x10
[ 252.617475] ? mark_lock.part.0+0xfa/0x31a0
[ 252.617475] ? kvm_sched_clock_read+0x11/0x20
[ 252.617475] ? __pfx_sock_has_perm+0x10/0x10
[ 252.617475] ? __pfx_lock_release+0x10/0x10
[ 252.617475] ? __might_fault+0xc9/0x180
[ 252.617475] inet_sendmsg+0xd5/0x100
[ 252.617475] ? inet_sendmsg+0xd5/0x100
[ 252.617475] __sys_sendto+0x3be/0x510
[ 252.617475] ? __pfx___sys_sendto+0x10/0x10
[ 252.617475] ? do_user_addr_fault+0x401/0xb70
[ 252.617475] ? __this_cpu_preempt_check+0x13/0x20
[ 252.617475] ? __pfx_lock_release+0x10/0x10
[ 252.617475] ? __pfx___up_read+0x10/0x10
[ 252.617475] ? irqentry_exit_to_user_mode+0x8d/0x1d0
[ 252.617475] ? irqentry_exit+0x6b/0x90
[ 252.617475] ? debug_smp_processor_id+0x17/0x20
[ 252.617475] __x64_sys_sendto+0xe0/0x1a0
[ 252.617475] ? do_syscall_64+0x2a/0x140
[ 252.617475] ? trace_hardirqs_on+0x22/0x120
[ 252.617475] do_syscall_64+0x6d/0x140
[ 252.617475] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 252.617475] RIP: 0033:0x7f1d7a8a4b93
[ 252.617475] Code: 8b 15 71 72 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 80 3d 51 fa 0c 00 00 41 89 ca 74 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 55 48 83 ec 30 44 89 4c 24
[ 252.617475] RSP: 002b:00007fffe5c7c578 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[ 252.617475] RAX: ffffffffffffffda RBX: 000055c8f446f340 RCX: 00007f1d7a8a4b93
[ 252.617475] RDX: 0000000000000040 RSI: 000055c8f62889c0 RDI: 0000000000000003
[ 252.617475] RBP: 000055c8f62889c0 R08: 000055c8f44715c0 R09: 0000000000000010
[ 252.617475] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000040
[ 252.617475] R13: 00007fffe5c7dc60 R14: 0000001d00000001 R15: 000055c8f4472680
[ 252.617475] </TASK>
[ 252.617475] Allocated by task 548:
[ 252.617475] kasan_save_stack+0x28/0x50
[ 252.617475] kasan_save_track+0x14/0x40
[ 252.617475] kasan_save_alloc_info+0x65/0x80
[ 252.617475] __kasan_slab_alloc+0x8d/0x90
[ 252.617475] kmem_cache_alloc_node+0x115/0x310
[ 252.617475] __alloc_skb+0x20b/0x2c0
[ 252.617475] __ip_append_data+0x27f8/0x3de0
[ 252.617475] ip_append_data+0xd4/0x190
[ 252.617475] raw_sendmsg+0xa91/0x2d30
[ 252.617475] inet_sendmsg+0xd5/0x100
[ 252.617475] __sys_sendto+0x3be/0x510
[ 252.617475] __x64_sys_sendto+0xe0/0x1a0
[ 252.617475] do_syscall_64+0x6d/0x140
[ 252.617475] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 252.617475] Freed by task 548:
[ 252.617475] kasan_save_stack+0x28/0x50
[ 252.617475] kasan_save_track+0x14/0x40
[ 252.617475] kasan_save_free_info+0x41/0x60
[ 252.617475] poison_slab_object+0x10e/0x190
[ 252.617475] __kasan_slab_free+0x38/0x90
[ 252.617475] kmem_cache_free+0x106/0x3e0
[ 252.617475] kfree_skbmem+0x9e/0x150
[ 252.617475] kfree_skb_reason+0x118/0x370
[ 252.617475] __netif_receive_skb_core.constprop.0+0x1a4e/0x3910
[ 252.617475] __netif_receive_skb_one_core+0xb4/0x1c0
[ 252.617475] __netif_receive_skb+0x21/0x1b0
[ 252.617475] netif_receive_skb+0xf9/0x4f0
[ 252.617475] tcf_mirred_to_dev+0x7dc/0xdd0
[ 252.617475] tcf_mirred_act+0x3d1/0xe50
[ 252.617475] tcf_action_exec+0x561/0x870
[ 252.617475] mall_classify+0x168/0x240
[ 252.617475] __tcf_classify.constprop.0+0x50c/0x660
[ 252.617475] tcf_classify+0x6e/0xa0
[ 252.617475] tc_run+0x31b/0x5b0
[ 252.617475] __netif_receive_skb_core.constprop.0+0x12e1/0x3910
[ 252.617475] __netif_receive_skb_one_core+0xb4/0x1c0
[ 252.617475] __netif_receive_skb+0x21/0x1b0
[ 252.617475] netif_receive_skb+0xf9/0x4f0
[ 252.617475] tcf_mirred_to_dev+0x7dc/0xdd0
[ 252.617475] tcf_mirred_act+0x3d1/0xe50
[ 252.617475] tcf_action_exec+0x561/0x870
[ 252.617475] mall_classify+0x168/0x240
[ 252.617475] __tcf_classify.constprop.0+0x50c/0x660
[ 252.617475] tcf_classify+0x6e/0xa0
[ 252.617475] tc_run+0x31b/0x5b0
[ 252.617475] __netif_receive_skb_core.constprop.0+0x12e1/0x3910
[ 252.617475] __netif_receive_skb_one_core+0xb4/0x1c0
[ 252.617475] __netif_receive_skb+0x21/0x1b0
[ 252.617475] netif_receive_skb+0xf9/0x4f0
[ 252.617475] tcf_mirred_to_dev+0x7dc/0xdd0
[ 252.617475] tcf_mirred_act+0x3d1/0xe50
[ 252.617475] tcf_action_exec+0x561/0x870
[ 252.617475] mall_classify+0x168/0x240
[ 252.617475] __tcf_classify.constprop.0+0x50c/0x660
[ 252.617475] tcf_classify+0x6e/0xa0
[ 252.617475] tc_run+0x31b/0x5b0
[ 252.617475] __netif_receive_skb_core.constprop.0+0x12e1/0x3910
[ 252.617475] __netif_receive_skb_one_core+0xb4/0x1c0
[ 252.617475] __netif_receive_skb+0x21/0x1b0
[ 252.617475] netif_receive_skb+0xf9/0x4f0
[ 252.617475] tcf_mirred_to_dev+0x7dc/0xdd0
[ 252.617475] tcf_mirred_act+0x3d1/0xe50
[ 252.617475] tcf_action_exec+0x561/0x870
[ 252.617475] mall_classify+0x168/0x240
[ 252.617475] __tcf_classify.constprop.0+0x50c/0x660
[ 252.617475] tcf_classify+0x6e/0xa0
[ 252.617475] tc_run+0x31b/0x5b0
[ 252.617475] __netif_receive_skb_core.constprop.0+0x12e1/0x3910
[ 252.617475] __netif_receive_skb_one_core+0xb4/0x1c0
[ 252.617475] __netif_receive_skb+0x21/0x1b0
[ 252.617475] process_backlog+0xd7/0x5c0
[ 252.617475] __napi_poll.constprop.0+0xae/0x590
[ 252.617475] net_rx_action+0x47c/0xb40
[ 252.617475] __do_softirq+0x1c3/0x853
[ 252.617475] The buggy address belongs to the object at ffff88800d453540
which belongs to the cache skbuff_head_cache of size 224
[ 252.617475] The buggy address is located 68 bytes inside of
freed 224-byte region [ffff88800d453540, ffff88800d453620)
[ 252.617475] The buggy address belongs to the physical page:
[ 252.617475] page:00000000848240a3 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88800d453b80 pfn:0xd452
[ 252.617475] head:00000000848240a3 order:1 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 252.617475] flags: 0x100000000000840(slab|head|node=0|zone=1)
[ 252.617475] page_type: 0xffffffff()
[ 252.617475] raw: 0100000000000840 ffff88800244e3c0 dead000000000100 dead000000000122
[ 252.617475] raw: ffff88800d453b80 0000000080190004 00000001ffffffff 0000000000000000
[ 252.617475] page dumped because: kasan: bad access detected
[ 252.617475] Memory state around the buggy address:
[ 252.617475] ffff88800d453480: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[ 252.617475] ffff88800d453500: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
[ 252.617475] >ffff88800d453580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 252.617475] ^
[ 252.617475] ffff88800d453600: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
[ 252.617475] ffff88800d453680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 252.617475] ==================================================================
[ 252.684330] Disabling lock debugging due to kernel taint
[ 252.684601] ------------[ cut here ]------------
[ 252.684839] refcount_t: underflow; use-after-free.
[ 252.685141] WARNING: CPU: 8 PID: 548 at lib/refcount.c:28 refcount_warn_saturate+0xd4/0x190
[ 252.685567] Modules linked in:
[ 252.685728] CPU: 8 PID: 548 Comm: ping Tainted: G B 6.8.0-rc3-00862-g0f37666d87d2-dirty #147
[ 252.686214] RIP: 0010:refcount_warn_saturate+0xd4/0x190
[ 252.686501] Code: cc cc cc cc 0f b6 05 26 19 f7 02 3c 01 0f 87 d1 3e 68 01 84 c0 75 e4 48 c7 c7 e0 33 83 97 c6 05 0c 19 f7 02 01 e8 2c ed f0 fe <0f> 0b eb cd 0f b6 05 fa 18 f7 02 3c 01 0f 87 a7 3e 68 01 84 c0 75
[ 252.687439] RSP: 0018:ffff888038a08480 EFLAGS: 00010286
[ 252.687699] RAX: 0000000000000000 RBX: ffff88801110829c RCX: 0000000000000000
[ 252.688057] RDX: 0000000000000103 RSI: 0000000000000004 RDI: 0000000000000001
[ 252.688457] RBP: ffff888038a08490 R08: 0000000000000001 R09: ffffed100717d139
[ 252.688804] R10: ffff888038be89cb R11: 0000000063666572 R12: 0000000000000003
[ 252.689163] R13: 0000000000000340 R14: ffff888011108060 R15: ffff88801110829c
[ 252.689545] FS: 00007f1d7a5e2c40(0000) GS:ffff888038a00000(0000) knlGS:0000000000000000
[ 252.689927] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 252.690210] CR2: 00007f1d7a85e0d0 CR3: 000000000b184000 CR4: 0000000000750ef0
[ 252.690584] PKRU: 55555554
[ 252.690731] Call Trace:
[ 252.690871] <IRQ>
[ 252.690988] ? show_regs+0x6a/0x80
[ 252.691155] ? __warn+0xcf/0x2c0
[ 252.691327] ? refcount_warn_saturate+0xd4/0x190
[ 252.691585] ? report_bug+0x341/0x3d0
[ 252.691777] ? handle_bug+0x40/0x80
[ 252.691966] ? exc_invalid_op+0x19/0x50
[ 252.692153] ? asm_exc_invalid_op+0x1b/0x20
[ 252.692409] ? refcount_warn_saturate+0xd4/0x190
[ 252.692646] sock_wfree+0x596/0x770
[ 252.692816] ? __netif_receive_skb_core.constprop.0+0x1a4e/0x3910
[ 252.693138] skb_release_head_state+0x7e/0x1e0
[ 252.693388] kfree_skb_reason+0xd3/0x370
[ 252.693576] __netif_receive_skb_core.constprop.0+0x1a4e/0x3910
[ 252.693889] ? kvm_sched_clock_read+0x11/0x20
[ 252.694112] ? __lock_acquire+0xcf0/0x6760
[ 252.694315] ? __pfx___netif_receive_skb_core.constprop.0+0x10/0x10
[ 252.694650] ? __pfx___lock_acquire+0x10/0x10
[ 252.694871] __netif_receive_skb_one_core+0xb4/0x1c0
[ 252.695119] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 252.695415] ? __pfx_lock_acquire+0x10/0x10
[ 252.695630] __netif_receive_skb+0x21/0x1b0
[ 252.695833] netif_receive_skb+0xf9/0x4f0
[ 252.696040] ? __pfx_netif_receive_skb+0x10/0x10
[ 252.696274] tcf_mirred_to_dev+0x7dc/0xdd0
[ 252.696495] tcf_mirred_act+0x3d1/0xe50
[ 252.696687] ? __pfx_tcf_mirred_act+0x10/0x10
[ 252.696901] tcf_action_exec+0x561/0x870
[ 252.697108] mall_classify+0x168/0x240
[ 252.697305] __tcf_classify.constprop.0+0x50c/0x660
[ 252.697569] tcf_classify+0x6e/0xa0
[ 252.697758] ? __pfx_tcf_classify+0x10/0x10
[ 252.697963] ? debug_smp_processor_id+0x17/0x20
[ 252.698176] ? rcu_lockdep_current_cpu_online+0x3e/0x150
[ 252.698456] tc_run+0x31b/0x5b0
[ 252.698605] ? __pfx_tc_run+0x10/0x10
[ 252.698799] ? debug_smp_processor_id+0x17/0x20
[ 252.699007] ? rcu_lockdep_current_cpu_online+0x3e/0x150
[ 252.699242] __netif_receive_skb_core.constprop.0+0x12e1/0x3910
[ 252.699535] ? kvm_sched_clock_read+0x11/0x20
[ 252.699754] ? __lock_acquire+0xcf0/0x6760
[ 252.699950] ? __pfx___netif_receive_skb_core.constprop.0+0x10/0x10
[ 252.700239] ? __pfx___lock_acquire+0x10/0x10
[ 252.700478] __netif_receive_skb_one_core+0xb4/0x1c0
[ 252.700721] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 252.701006] ? __pfx_lock_acquire+0x10/0x10
[ 252.701193] __netif_receive_skb+0x21/0x1b0
[ 252.701415] netif_receive_skb+0xf9/0x4f0
[ 252.701595] ? __pfx_netif_receive_skb+0x10/0x10
[ 252.701811] ? __kasan_check_read+0x11/0x20
[ 252.702013] tcf_mirred_to_dev+0x7dc/0xdd0
[ 252.702189] tcf_mirred_act+0x3d1/0xe50
[ 252.702370] ? __kasan_check_read+0x11/0x20
[ 252.702579] ? __pfx_tcf_mirred_act+0x10/0x10
[ 252.702782] ? __kasan_check_read+0x11/0x20
[ 252.702978] ? mark_lock.part.0+0xfa/0x31a0
[ 252.703176] ? __pfx_mark_lock.part.0+0x10/0x10
[ 252.703407] tcf_action_exec+0x561/0x870
[ 252.703594] mall_classify+0x168/0x240
[ 252.703789] ? mark_lock.part.0+0xfa/0x31a0
[ 252.704000] __tcf_classify.constprop.0+0x50c/0x660
[ 252.704238] tcf_classify+0x6e/0xa0
[ 252.704437] ? __pfx_tcf_classify+0x10/0x10
[ 252.704641] ? debug_smp_processor_id+0x17/0x20
[ 252.704854] ? rcu_lockdep_current_cpu_online+0x3e/0x150
[ 252.705105] tc_run+0x31b/0x5b0
[ 252.705264] ? __pfx_tc_run+0x10/0x10
[ 252.705469] ? debug_smp_processor_id+0x17/0x20
[ 252.705680] ? rcu_lockdep_current_cpu_online+0x3e/0x150
[ 252.705932] __netif_receive_skb_core.constprop.0+0x12e1/0x3910
[ 252.706192] ? __kasan_check_read+0x11/0x20
[ 252.706398] ? kvm_sched_clock_read+0x11/0x20
[ 252.706631] ? __lock_acquire+0xcf0/0x6760
[ 252.706830] ? __pfx___netif_receive_skb_core.constprop.0+0x10/0x10
[ 252.707137] ? __pfx___lock_acquire+0x10/0x10
[ 252.707373] ? __pfx_mark_lock.part.0+0x10/0x10
[ 252.707604] ? __kasan_check_read+0x11/0x20
[ 252.707793] __netif_receive_skb_one_core+0xb4/0x1c0
[ 252.708024] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 252.708278] ? __pfx_lock_acquire+0x10/0x10
[ 252.708515] ? __kasan_check_read+0x11/0x20
[ 252.708721] ? __lock_acquire+0xd9f/0x6760
[ 252.708901] __netif_receive_skb+0x21/0x1b0
[ 252.709082] netif_receive_skb+0xf9/0x4f0
[ 252.709279] ? __pfx_netif_receive_skb+0x10/0x10
[ 252.709524] ? __pfx___lock_acquire+0x10/0x10
[ 252.709764] tcf_mirred_to_dev+0x7dc/0xdd0
[ 252.709958] tcf_mirred_act+0x3d1/0xe50
[ 252.710130] ? __pfx_tcf_mirred_act+0x10/0x10
[ 252.710331] ? update_stack_state+0x2d4/0x600
[ 252.710595] ? __kasan_check_read+0x11/0x20
[ 252.710792] tcf_action_exec+0x561/0x870
[ 252.710973] mall_classify+0x168/0x240
[ 252.711156] __tcf_classify.constprop.0+0x50c/0x660
[ 252.711390] tcf_classify+0x6e/0xa0
[ 252.711566] ? __pfx_tcf_classify+0x10/0x10
[ 252.711762] ? debug_smp_processor_id+0x17/0x20
[ 252.711980] ? rcu_lockdep_current_cpu_online+0x3e/0x150
[ 252.712212] tc_run+0x31b/0x5b0
[ 252.712369] ? __pfx_tc_run+0x10/0x10
[ 252.712541] ? debug_smp_processor_id+0x17/0x20
[ 252.712746] ? rcu_lockdep_current_cpu_online+0x3e/0x150
[ 252.713001] __netif_receive_skb_core.constprop.0+0x12e1/0x3910
[ 252.713272] ? kvm_sched_clock_read+0x11/0x20
[ 252.713518] ? kvm_sched_clock_read+0x11/0x20
[ 252.713739] ? __lock_acquire+0xcf0/0x6760
[ 252.713928] ? __pfx___netif_receive_skb_core.constprop.0+0x10/0x10
[ 252.714228] ? __pfx___lock_acquire+0x10/0x10
[ 252.714460] __netif_receive_skb_one_core+0xb4/0x1c0
[ 252.714720] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 252.714958] ? __pfx_lock_acquire+0x10/0x10
[ 252.715164] ? _raw_spin_unlock_irq+0x28/0x60
[ 252.715388] ? process_backlog+0x416/0x5c0
[ 252.715596] ? __this_cpu_preempt_check+0x13/0x20
[ 252.715834] __netif_receive_skb+0x21/0x1b0
[ 252.716024] process_backlog+0xd7/0x5c0
[ 252.716230] __napi_poll.constprop.0+0xae/0x590
[ 252.716469] ? net_rx_action+0x252/0xb40
[ 252.716702] net_rx_action+0x47c/0xb40
[ 252.716884] ? __pfx_net_rx_action+0x10/0x10
[ 252.717083] ? irq_exit_rcu+0xb4/0x120
[ 252.717263] ? __this_cpu_preempt_check+0x13/0x20
[ 252.717516] __do_softirq+0x1c3/0x853
[ 252.717702] ? __dev_queue_xmit+0x9b3/0x3550
[ 252.717900] do_softirq+0xb5/0xf0
[ 252.718062] </IRQ>
[ 252.718166] <TASK>
[ 252.718290] __local_bh_enable_ip+0x104/0x130
[ 252.718516] ? __dev_queue_xmit+0x9b3/0x3550
[ 252.718737] __dev_queue_xmit+0x9c8/0x3550
[ 252.718927] ? __pfx_mark_lock.part.0+0x10/0x10
[ 252.719139] ? __pfx___dev_queue_xmit+0x10/0x10
[ 252.719398] ? __this_cpu_preempt_check+0x13/0x20
[ 252.719632] ? neigh_resolve_output+0x5bd/0x7b0
[ 252.719828] ? ip_finish_output2+0x6ac/0x1f40
[ 252.720034] ? __this_cpu_preempt_check+0x13/0x20
[ 252.720261] ? eth_header+0x16e/0x1b0
[ 252.720487] neigh_resolve_output+0x431/0x7b0
[ 252.720694] ip_finish_output2+0x6ac/0x1f40
[ 252.720906] ? __kasan_check_read+0x11/0x20
[ 252.721105] ? selinux_ip_postroute+0x357/0x400
[ 252.721313] ? __pfx_selinux_ip_postroute+0x10/0x10
[ 252.721558] ? __get_random_u32_below+0x17/0x70
[ 252.721796] ? __pfx_ip_finish_output2+0x10/0x10
[ 252.722011] ? __this_cpu_preempt_check+0x13/0x20
[ 252.722243] __ip_finish_output+0x6ea/0xed0
[ 252.722463] ? __ip_finish_output+0x6ea/0xed0
[ 252.722675] ip_finish_output+0x2b/0x260
[ 252.722859] ip_output+0x175/0x500
[ 252.723040] ? __pfx_ip_output+0x10/0x10
[ 252.723237] ? __ip_make_skb+0xdb5/0x2060
[ 252.723448] ? __pfx_ip_finish_output+0x10/0x10
[ 252.723666] ip_push_pending_frames+0x249/0x480
[ 252.723869] raw_sendmsg+0xf13/0x2d30
[ 252.724044] ? __kasan_check_read+0x11/0x20
[ 252.724240] ? __lock_acquire+0x1af0/0x6760
[ 252.724454] ? check_preemption_disabled+0x70/0xf0
[ 252.724688] ? __pfx_raw_sendmsg+0x10/0x10
[ 252.724867] ? mark_lock.part.0+0xfa/0x31a0
[ 252.725069] ? kvm_sched_clock_read+0x11/0x20
[ 252.725270] ? __pfx_sock_has_perm+0x10/0x10
[ 252.725501] ? __pfx_lock_release+0x10/0x10
[ 252.725698] ? __might_fault+0xc9/0x180
[ 252.725877] inet_sendmsg+0xd5/0x100
[ 252.726042] ? inet_sendmsg+0xd5/0x100
[ 252.726219] __sys_sendto+0x3be/0x510
[ 252.726399] ? __pfx___sys_sendto+0x10/0x10
[ 252.726609] ? do_user_addr_fault+0x401/0xb70
[ 252.726842] ? __this_cpu_preempt_check+0x13/0x20
[ 252.727048] ? __pfx_lock_release+0x10/0x10
[ 252.727252] ? __pfx___up_read+0x10/0x10
[ 252.727457] ? irqentry_exit_to_user_mode+0x8d/0x1d0
[ 252.727701] ? irqentry_exit+0x6b/0x90
[ 252.727872] ? debug_smp_processor_id+0x17/0x20
[ 252.728109] __x64_sys_sendto+0xe0/0x1a0
[ 252.728294] ? do_syscall_64+0x2a/0x140
[ 252.728495] ? trace_hardirqs_on+0x22/0x120
[ 252.728706] do_syscall_64+0x6d/0x140
[ 252.728873] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 252.729106] RIP: 0033:0x7f1d7a8a4b93
[ 252.729279] Code: 8b 15 71 72 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 80 3d 51 fa 0c 00 00 41 89 ca 74 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 55 48 83 ec 30 44 89 4c 24
[ 252.730188] RSP: 002b:00007fffe5c7c578 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[ 252.730558] RAX: ffffffffffffffda RBX: 000055c8f446f340 RCX: 00007f1d7a8a4b93
[ 252.730886] RDX: 0000000000000040 RSI: 000055c8f62889c0 RDI: 0000000000000003
[ 252.731222] RBP: 000055c8f62889c0 R08: 000055c8f44715c0 R09: 0000000000000010
[ 252.731580] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000040
[ 252.731934] R13: 00007fffe5c7dc60 R14: 0000001d00000001 R15: 000055c8f4472680
[ 252.732261] </TASK>
[ 252.732393] irq event stamp: 9034
[ 252.732572] hardirqs last enabled at (9034): [<ffffffff972f451a>] irqentry_exit+0x3a/0x90
[ 252.732938] hardirqs last disabled at (9033): [<ffffffff972f3030>] sysvec_apic_timer_interrupt+0x10/0xc0
[ 252.733390] softirqs last enabled at (8974): [<ffffffff966d2c63>] __dev_queue_xmit+0x9b3/0x3550
[ 252.733810] softirqs last disabled at (8975): [<ffffffff94b922d5>] do_softirq+0xb5/0xf0
[ 252.734195] ---[ end trace 0000000000000000 ]---
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
2024-02-12 15:11 ` Jamal Hadi Salim
@ 2024-02-13 11:06 ` Paolo Abeni
2024-02-14 0:27 ` Jakub Kicinski
0 siblings, 1 reply; 17+ messages in thread
From: Paolo Abeni @ 2024-02-13 11:06 UTC (permalink / raw)
To: Jamal Hadi Salim, Jakub Kicinski
Cc: davem, netdev, edumazet, Marcelo Ricardo Leitner, Davide Caratti,
xiyou.wangcong, jiri, shmulik.ladkani
On Mon, 2024-02-12 at 10:11 -0500, Jamal Hadi Salim wrote:
> On Mon, Feb 12, 2024 at 9:51 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> >
> > On Fri, Feb 9, 2024 at 6:54 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > >
> > > The test Davide added in commit ca22da2fbd69 ("act_mirred: use the backlog
> > > for nested calls to mirred ingress") hangs our testing VMs every 10 or so
> > > runs, with the familiar tcp_v4_rcv -> tcp_v4_rcv deadlock reported by
> > > lockdep.
> > >
> > > In the past there was a concern that the backlog indirection will
> > > lead to loss of error reporting / less accurate stats. But the current
> > > workaround does not seem to address the issue.
> > >
> >
> > Let us run some basic tests on this first - it's a hairy spot. Also,
>
> Something broke.
> Create a ns. Put one half of veth into the namespace. Create a filter
> inside the net ns.
> at_ns$ tc qdisc add dev port0 ingress_block 21 clsact
> at_ns$ tc filter add block 21 egress protocol ip prio 10 matchall
> action mirred ingress redirect dev port0
>
> Send a ping from host:
> at_host@ ping 10.0.0.2 -c 1 -I <vethportonhostside>
>
> And... hits a UaF... see attached.
It looks like:
netif_receive_skb
run_tc()
act_mirred
netif_receive_skb
sch_handle_ingress
act_mirred // nesting limit hit
// free skb
// netif_receive_skb returns NET_RX_DROP
// act_mirred returns TC_ACT_SHOT
// UaF while de-referencing the (freed) skb
No idea off the top of my head how to solve it :(
Cheers,
Paolo
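The hazard Paolo describes is an ownership rule: once the skb has been handed to the receive path, mirred no longer owns it, so translating the forward error back into TC_ACT_SHOT makes the core touch a freed buffer. A minimal userspace sketch of the two patterns (all names here are illustrative stand-ins, not the real kernel API):

```c
#include <assert.h>
#include <stdbool.h>

#define TC_ACT_SHOT   2
#define TC_ACT_STOLEN 4

struct fake_skb {
	bool freed;
};

/* Stand-in for netif_receive_skb(): consumes the skb even on error. */
static int fake_receive(struct fake_skb *skb, bool nesting_limit_hit)
{
	skb->freed = true;			/* skb is gone either way */
	return nesting_limit_hit ? -1 : 0;	/* -1 ~ NET_RX_DROP */
}

/* Buggy pattern: translate the forward error into SHOT after the skb
 * was already consumed; the core would then free/deref it again. */
static int mirred_buggy(struct fake_skb *skb, bool nested)
{
	return fake_receive(skb, nested) ? TC_ACT_SHOT : TC_ACT_STOLEN;
}

/* Fixed pattern: once forwarded, always report STOLEN; only error
 * paths that still own the skb may return SHOT. */
static int mirred_fixed(struct fake_skb *skb, bool nested)
{
	(void)fake_receive(skb, nested);	/* error only feeds stats */
	return TC_ACT_STOLEN;
}
```

With the nesting limit hit, the buggy variant returns SHOT for an skb it no longer owns, which is the UaF window in the call chain above; the fixed variant reports STOLEN unconditionally once the skb has been forwarded.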
* Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
2024-02-13 11:06 ` Paolo Abeni
@ 2024-02-14 0:27 ` Jakub Kicinski
2024-02-14 3:40 ` Jakub Kicinski
0 siblings, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2024-02-14 0:27 UTC (permalink / raw)
To: Paolo Abeni
Cc: Jamal Hadi Salim, davem, netdev, edumazet,
Marcelo Ricardo Leitner, Davide Caratti, xiyou.wangcong, jiri,
shmulik.ladkani, victor
On Tue, 13 Feb 2024 12:06:04 +0100 Paolo Abeni wrote:
> > Something broke.
> > Create a ns. Put one half of veth into the namespace. Create a filter
> > inside the net ns.
> > at_ns$ tc qdisc add dev port0 ingress_block 21 clsact
> > at_ns$ tc filter add block 21 egress protocol ip prio 10 matchall
> > action mirred ingress redirect dev port0
> >
> > Send a ping from host:
> > at_host@ ping 10.0.0.2 -c 1 -I <vethportonhostside>
> >
> > And... hits a UaF... see attached.
>
> It looks like:
>
> netif_receive_skb
> run_tc()
> act_mirred
> netif_receive_skb
> sch_handle_ingress
> act_mirred // nesting limit hit
> // free skb
> // netif_receive_skb returns NET_RX_DROP
> // act_mirred returns TC_ACT_SHOT
> // UaF while de-referencing the (freed) skb
>
>
> No idea off the top of my head how to solve it :(
If I'm reading it right, the bug seems fairly straightforward but tricky
to fix cleanly :( I also haven't dug deep enough into the history to
provide a real Fixes tag...
--->8-------------
net/sched: act_mirred: don't override retval if we already lost the skb
If we're redirecting the skb and haven't called tcf_mirred_forward()
yet, we need to tell the core to drop the skb by setting the retcode
to SHOT. If we have called tcf_mirred_forward(), however, the skb
is out of our hands and returning SHOT will lead to a UaF.
Move the overrides up to the error paths which actually need them.
Note that the err variable is only used to store return code from
tcf_mirred_forward() and we don't have to set it.
Fixes: 16085e48cb48 ("net/sched: act_mirred: Create function tcf_mirred_to_dev and improve readability")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
net/sched/act_mirred.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 93a96e9d8d90..922a018329cd 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -270,7 +270,8 @@ static int tcf_mirred_to_dev(struct sk_buff *skb, struct tcf_mirred *m,
if (unlikely(!(dev->flags & IFF_UP)) || !netif_carrier_ok(dev)) {
net_notice_ratelimited("tc mirred to Houston: device %s is down\n",
dev->name);
- err = -ENODEV;
+ if (is_redirect)
+ retval = TC_ACT_SHOT;
goto out;
}
@@ -284,7 +285,8 @@ static int tcf_mirred_to_dev(struct sk_buff *skb, struct tcf_mirred *m,
if (!dont_clone) {
skb_to_send = skb_clone(skb, GFP_ATOMIC);
if (!skb_to_send) {
- err = -ENOMEM;
+ if (is_redirect)
+ retval = TC_ACT_SHOT;
goto out;
}
}
@@ -327,8 +329,6 @@ static int tcf_mirred_to_dev(struct sk_buff *skb, struct tcf_mirred *m,
if (err) {
out:
tcf_action_inc_overlimit_qstats(&m->common);
- if (is_redirect)
- retval = TC_ACT_SHOT;
}
return retval;
--
2.43.0
* Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
2024-02-14 0:27 ` Jakub Kicinski
@ 2024-02-14 3:40 ` Jakub Kicinski
0 siblings, 0 replies; 17+ messages in thread
From: Jakub Kicinski @ 2024-02-14 3:40 UTC (permalink / raw)
To: Paolo Abeni
Cc: Jamal Hadi Salim, davem, netdev, edumazet,
Marcelo Ricardo Leitner, Davide Caratti, xiyou.wangcong, jiri,
shmulik.ladkani, victor
On Tue, 13 Feb 2024 16:27:44 -0800 Jakub Kicinski wrote:
> If I'm reading it right, the bug seems fairly straightforward but tricky
> to fix cleanly :( I also haven't dug deep enough into the history to
> provide a real Fixes tag...
I submitted a slightly modified version officially:
https://lore.kernel.org/r/20240214033848.981211-2-kuba@kernel.org/
* Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
2024-02-09 23:54 [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress Jakub Kicinski
2024-02-12 14:51 ` Jamal Hadi Salim
@ 2024-02-14 15:11 ` Jamal Hadi Salim
2024-02-14 15:28 ` Jamal Hadi Salim
1 sibling, 1 reply; 17+ messages in thread
From: Jamal Hadi Salim @ 2024-02-14 15:11 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, Marcelo Ricardo Leitner,
Davide Caratti, xiyou.wangcong, jiri, shmulik.ladkani
On Fri, Feb 9, 2024 at 6:54 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> The test Davide added in commit ca22da2fbd69 ("act_mirred: use the backlog
> for nested calls to mirred ingress") hangs our testing VMs every 10 or so
> runs, with the familiar tcp_v4_rcv -> tcp_v4_rcv deadlock reported by
> lockdep.
>
> In the past there was a concern that the backlog indirection will
> lead to loss of error reporting / less accurate stats. But the current
> workaround does not seem to address the issue.
>
> Fixes: 53592b364001 ("net/sched: act_mirred: Implement ingress actions")
> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> Suggested-by: Davide Caratti <dcaratti@redhat.com>
> Link: https://lore.kernel.org/netdev/33dc43f587ec1388ba456b4915c75f02a8aae226.1663945716.git.dcaratti@redhat.com/
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: jhs@mojatatu.com
> CC: xiyou.wangcong@gmail.com
> CC: jiri@resnulli.us
> CC: shmulik.ladkani@gmail.com
> ---
> net/sched/act_mirred.c | 14 +++++---------
> .../testing/selftests/net/forwarding/tc_actions.sh | 3 ---
> 2 files changed, 5 insertions(+), 12 deletions(-)
>
> diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
> index 93a96e9d8d90..35c366f043d9 100644
> --- a/net/sched/act_mirred.c
> +++ b/net/sched/act_mirred.c
> @@ -232,18 +232,14 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
> return err;
> }
>
> -static bool is_mirred_nested(void)
> -{
> - return unlikely(__this_cpu_read(mirred_nest_level) > 1);
> -}
> -
> -static int tcf_mirred_forward(bool want_ingress, struct sk_buff *skb)
> +static int
> +tcf_mirred_forward(bool at_ingress, bool want_ingress, struct sk_buff *skb)
> {
> int err;
>
> if (!want_ingress)
> err = tcf_dev_queue_xmit(skb, dev_queue_xmit);
> - else if (is_mirred_nested())
> + else if (!at_ingress)
> err = netif_rx(skb);
> else
> err = netif_receive_skb(skb);
> @@ -319,9 +315,9 @@ static int tcf_mirred_to_dev(struct sk_buff *skb, struct tcf_mirred *m,
>
> skb_set_redirected(skb_to_send, skb_to_send->tc_at_ingress);
>
> - err = tcf_mirred_forward(want_ingress, skb_to_send);
> + err = tcf_mirred_forward(at_ingress, want_ingress, skb_to_send);
> } else {
> - err = tcf_mirred_forward(want_ingress, skb_to_send);
> + err = tcf_mirred_forward(at_ingress, want_ingress, skb_to_send);
> }
>
> if (err) {
> diff --git a/tools/testing/selftests/net/forwarding/tc_actions.sh b/tools/testing/selftests/net/forwarding/tc_actions.sh
> index b0f5e55d2d0b..589629636502 100755
> --- a/tools/testing/selftests/net/forwarding/tc_actions.sh
> +++ b/tools/testing/selftests/net/forwarding/tc_actions.sh
> @@ -235,9 +235,6 @@ mirred_egress_to_ingress_tcp_test()
> check_err $? "didn't mirred redirect ICMP"
> tc_check_packets "dev $h1 ingress" 102 10
> check_err $? "didn't drop mirred ICMP"
> - local overlimits=$(tc_rule_stats_get ${h1} 101 egress .overlimits)
> - test ${overlimits} = 10
> - check_err $? "wrong overlimits, expected 10 got ${overlimits}"
>
> tc filter del dev $h1 egress protocol ip pref 100 handle 100 flower
> tc filter del dev $h1 egress protocol ip pref 101 handle 101 flower
> --
> 2.43.0
>
Doing a quick test of this and the other patch I saw...
cheers,
jamal
* Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
2024-02-14 15:11 ` Jamal Hadi Salim
@ 2024-02-14 15:28 ` Jamal Hadi Salim
2024-02-14 16:10 ` Davide Caratti
0 siblings, 1 reply; 17+ messages in thread
From: Jamal Hadi Salim @ 2024-02-14 15:28 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, Marcelo Ricardo Leitner,
Davide Caratti, xiyou.wangcong, jiri, shmulik.ladkani
On Wed, Feb 14, 2024 at 10:11 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On Fri, Feb 9, 2024 at 6:54 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > The test Davide added in commit ca22da2fbd69 ("act_mirred: use the backlog
> > for nested calls to mirred ingress") hangs our testing VMs every 10 or so
> > runs, with the familiar tcp_v4_rcv -> tcp_v4_rcv deadlock reported by
> > lockdep.
> >
> > In the past there was a concern that the backlog indirection will
> > lead to loss of error reporting / less accurate stats. But the current
> > workaround does not seem to address the issue.
> >
> > Fixes: 53592b364001 ("net/sched: act_mirred: Implement ingress actions")
> > Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> > Suggested-by: Davide Caratti <dcaratti@redhat.com>
> > Link: https://lore.kernel.org/netdev/33dc43f587ec1388ba456b4915c75f02a8aae226.1663945716.git.dcaratti@redhat.com/
> > Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> > ---
> > CC: jhs@mojatatu.com
> > CC: xiyou.wangcong@gmail.com
> > CC: jiri@resnulli.us
> > CC: shmulik.ladkani@gmail.com
> > ---
> > net/sched/act_mirred.c | 14 +++++---------
> > .../testing/selftests/net/forwarding/tc_actions.sh | 3 ---
> > 2 files changed, 5 insertions(+), 12 deletions(-)
> >
> > diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
> > index 93a96e9d8d90..35c366f043d9 100644
> > --- a/net/sched/act_mirred.c
> > +++ b/net/sched/act_mirred.c
> > @@ -232,18 +232,14 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
> > return err;
> > }
> >
> > -static bool is_mirred_nested(void)
> > -{
> > - return unlikely(__this_cpu_read(mirred_nest_level) > 1);
> > -}
> > -
> > -static int tcf_mirred_forward(bool want_ingress, struct sk_buff *skb)
> > +static int
> > +tcf_mirred_forward(bool at_ingress, bool want_ingress, struct sk_buff *skb)
> > {
> > int err;
> >
> > if (!want_ingress)
> > err = tcf_dev_queue_xmit(skb, dev_queue_xmit);
> > - else if (is_mirred_nested())
> > + else if (!at_ingress)
> > err = netif_rx(skb);
> > else
> > err = netif_receive_skb(skb);
> > @@ -319,9 +315,9 @@ static int tcf_mirred_to_dev(struct sk_buff *skb, struct tcf_mirred *m,
> >
> > skb_set_redirected(skb_to_send, skb_to_send->tc_at_ingress);
> >
> > - err = tcf_mirred_forward(want_ingress, skb_to_send);
> > + err = tcf_mirred_forward(at_ingress, want_ingress, skb_to_send);
> > } else {
> > - err = tcf_mirred_forward(want_ingress, skb_to_send);
> > + err = tcf_mirred_forward(at_ingress, want_ingress, skb_to_send);
> > }
> >
> > if (err) {
> > diff --git a/tools/testing/selftests/net/forwarding/tc_actions.sh b/tools/testing/selftests/net/forwarding/tc_actions.sh
> > index b0f5e55d2d0b..589629636502 100755
> > --- a/tools/testing/selftests/net/forwarding/tc_actions.sh
> > +++ b/tools/testing/selftests/net/forwarding/tc_actions.sh
> > @@ -235,9 +235,6 @@ mirred_egress_to_ingress_tcp_test()
> > check_err $? "didn't mirred redirect ICMP"
> > tc_check_packets "dev $h1 ingress" 102 10
> > check_err $? "didn't drop mirred ICMP"
> > - local overlimits=$(tc_rule_stats_get ${h1} 101 egress .overlimits)
> > - test ${overlimits} = 10
> > - check_err $? "wrong overlimits, expected 10 got ${overlimits}"
> >
> > tc filter del dev $h1 egress protocol ip pref 100 handle 100 flower
> > tc filter del dev $h1 egress protocol ip pref 101 handle 101 flower
> > --
> > 2.43.0
> >
>
>
> Doing a quick test of this and the other patch I saw...
So tests pass - but on the list I only see one patch and the other is
on lore; not sure how to ACK something that is not on email, but FWIW:
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
The second patch avoids the recursion issue (which was the root cause)
and the first patch is really undoing ca22da2fbd693.
I don't know if the underlying issue in ca22da2fbd693 is solved (and don't
have time to look into it). Davide?
cheers,
jamal
> cheers,
> jamal
* Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
2024-02-14 15:28 ` Jamal Hadi Salim
@ 2024-02-14 16:10 ` Davide Caratti
2024-02-15 0:31 ` Jakub Kicinski
0 siblings, 1 reply; 17+ messages in thread
From: Davide Caratti @ 2024-02-14 16:10 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: Jakub Kicinski, davem, netdev, edumazet, pabeni,
Marcelo Ricardo Leitner, xiyou.wangcong, jiri, shmulik.ladkani
On Wed, Feb 14, 2024 at 10:28:27AM -0500, Jamal Hadi Salim wrote:
> On Wed, Feb 14, 2024 at 10:11 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> >
> > On Fri, Feb 9, 2024 at 6:54 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > >
> > > The test Davide added in commit ca22da2fbd69 ("act_mirred: use the backlog
> > > for nested calls to mirred ingress") hangs our testing VMs every 10 or so
> > > runs, with the familiar tcp_v4_rcv -> tcp_v4_rcv deadlock reported by
> > > lockdep.
[...]
> > Doing a quick test of this and the other patch I saw...
>
>
> So tests pass - but on the list I only see one patch and the other is
> on lore; not sure how to ACK something that is not on email, but FWIW:
> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
>
> The second patch avoids the recursion issue (which was the root cause)
> and the first patch is really undoing ca22da2fbd693
If I read it correctly, Jakub's patch [1] uses the backlog for egress->ingress
regardless of the "nest level": no more calls to netif_receive_skb().
It's the same as my initial proposal for fixing the OVS soft-lockup [2];
the code is different because now we have tcf_mirred_to_dev().
I'm OK with this solution, and maybe we need to re-think whether it's
really a problem to lose the statistics for actions subsequent to a
mirred egress->ingress redirect. IMHO it's not a problem to lose them:
if RPS or RX timestamping is enabled on the RX target device, we would
lose this information anyway (maybe this answers Jiri's question
above in the thread).
The other patch [3] fixes the reported UAF, but if the Fixes: tag is
correct it will probably need some special care if it propagates to
stable branches.
thanks,
--
davide
[1] https://lore.kernel.org/netdev/20240209235413.3717039-1-kuba@kernel.org/
[2] https://lore.kernel.org/netdev/33dc43f587ec1388ba456b4915c75f02a8aae226.1663945716.git.dcaratti@redhat.com/
[3] https://lore.kernel.org/netdev/20240214033848.981211-2-kuba@kernel.org/
* Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
2024-02-14 16:10 ` Davide Caratti
@ 2024-02-15 0:31 ` Jakub Kicinski
2024-02-15 17:55 ` Davide Caratti
0 siblings, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2024-02-15 0:31 UTC (permalink / raw)
To: Davide Caratti
Cc: Jamal Hadi Salim, davem, netdev, edumazet, pabeni,
Marcelo Ricardo Leitner, xiyou.wangcong, jiri, shmulik.ladkani
On Wed, 14 Feb 2024 17:10:37 +0100 Davide Caratti wrote:
> > So tests pass - but on the list I only see one patch and the other is
> > on lore; not sure how to ACK something that is not on email, but FWIW:
> > Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> >
> > The second patch avoids the recursion issue (which was the root cause)
> > and the first patch is really undoing ca22da2fbd693
>
> If I read it correctly, Jakub's patch [1] uses the backlog for egress->ingress
> regardless of the "nest level": no more calls to netif_receive_skb().
> It's the same as my initial proposal for fixing the OVS soft-lockup [2];
> the code is different because now we have tcf_mirred_to_dev().
FWIW feel free to add your Sob or Co-dev+Sob!
It is very much the same solution as what you posted at some stage.
* Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
2024-02-15 0:31 ` Jakub Kicinski
@ 2024-02-15 17:55 ` Davide Caratti
0 siblings, 0 replies; 17+ messages in thread
From: Davide Caratti @ 2024-02-15 17:55 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Jamal Hadi Salim, davem, netdev, edumazet, pabeni,
Marcelo Ricardo Leitner, xiyou.wangcong, jiri, shmulik.ladkani
hi,
On Thu, Feb 15, 2024 at 1:31 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
[...]
> > If I read it correctly, Jakub's patch [1] uses the backlog for egress->ingress
> > regardless of the "nest level": no more calls to netif_receive_skb().
> > It's the same as my initial proposal for fixing the OVS soft-lockup [2],
> > the code is different because now we have tcf_mirred_to_dev().
>
> FWIW feel free to add your Sob or Co-dev+Sob!
> It is very much the same solution as what you posted at some stage.
I'm ok with the current tags! thanks!
--
davide
* [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
@ 2022-09-23 15:11 Davide Caratti
2022-09-25 18:08 ` Cong Wang
2022-11-18 23:07 ` Peilin Ye
0 siblings, 2 replies; 17+ messages in thread
From: Davide Caratti @ 2022-09-23 15:11 UTC (permalink / raw)
To: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Paolo Abeni
Cc: Marcelo Ricardo Leitner, wizhao, netdev
William reports kernel soft-lockups on some OVS topologies when TC mirred
"egress-to-ingress" action is hit by local TCP traffic. Indeed, using the
mirred action in egress-to-ingress can easily produce a dmesg splat like:
============================================
WARNING: possible recursive locking detected
6.0.0-rc4+ #511 Not tainted
--------------------------------------------
nc/1037 is trying to acquire lock:
ffff950687843cb0 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160
but task is already holding lock:
ffff950687846cb0 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(slock-AF_INET/1);
lock(slock-AF_INET/1);
*** DEADLOCK ***
May be due to missing lock nesting notation
12 locks held by nc/1037:
#0: ffff950687843d40 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendmsg+0x19/0x40
#1: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x5/0x610
#2: ffffffff9be072e0 (rcu_read_lock_bh){....}-{1:2}, at: ip_finish_output2+0xaa/0xa10
#3: ffffffff9be072e0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x72/0x11b0
#4: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb+0x181/0x400
#5: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x54/0x160
#6: ffff950687846cb0 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160
#7: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x5/0x610
#8: ffffffff9be072e0 (rcu_read_lock_bh){....}-{1:2}, at: ip_finish_output2+0xaa/0xa10
#9: ffffffff9be072e0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x72/0x11b0
#10: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb+0x181/0x400
#11: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x54/0x160
stack backtrace:
CPU: 1 PID: 1037 Comm: nc Not tainted 6.0.0-rc4+ #511
Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7353+9de0a3cc 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x44/0x5b
__lock_acquire.cold.76+0x121/0x2a7
lock_acquire+0xd5/0x310
_raw_spin_lock_nested+0x39/0x70
tcp_v4_rcv+0x1023/0x1160
ip_protocol_deliver_rcu+0x4d/0x280
ip_local_deliver_finish+0xac/0x160
ip_local_deliver+0x71/0x220
ip_rcv+0x5a/0x200
__netif_receive_skb_one_core+0x89/0xa0
netif_receive_skb+0x1c1/0x400
tcf_mirred_act+0x2a5/0x610 [act_mirred]
tcf_action_exec+0xb3/0x210
fl_classify+0x1f7/0x240 [cls_flower]
tcf_classify+0x7b/0x320
__dev_queue_xmit+0x3a4/0x11b0
ip_finish_output2+0x3b8/0xa10
ip_output+0x7f/0x260
__ip_queue_xmit+0x1ce/0x610
__tcp_transmit_skb+0xabc/0xc80
tcp_rcv_state_process+0x669/0x1290
tcp_v4_do_rcv+0xd7/0x370
tcp_v4_rcv+0x10bc/0x1160
ip_protocol_deliver_rcu+0x4d/0x280
ip_local_deliver_finish+0xac/0x160
ip_local_deliver+0x71/0x220
ip_rcv+0x5a/0x200
__netif_receive_skb_one_core+0x89/0xa0
netif_receive_skb+0x1c1/0x400
tcf_mirred_act+0x2a5/0x610 [act_mirred]
tcf_action_exec+0xb3/0x210
fl_classify+0x1f7/0x240 [cls_flower]
tcf_classify+0x7b/0x320
__dev_queue_xmit+0x3a4/0x11b0
ip_finish_output2+0x3b8/0xa10
ip_output+0x7f/0x260
__ip_queue_xmit+0x1ce/0x610
__tcp_transmit_skb+0xabc/0xc80
tcp_write_xmit+0x229/0x12c0
__tcp_push_pending_frames+0x32/0xf0
tcp_sendmsg_locked+0x297/0xe10
tcp_sendmsg+0x27/0x40
sock_sendmsg+0x58/0x70
__sys_sendto+0xfd/0x170
__x64_sys_sendto+0x24/0x30
do_syscall_64+0x3a/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f11a06fd281
Code: 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 e5 43 2c 00 41 89 ca 8b 00 85 c0 75 1c 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 67 c3 66 0f 1f 44 00 00 41 56 41 89 ce 41 55
RSP: 002b:00007ffd17958358 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000555c6e671610 RCX: 00007f11a06fd281
RDX: 0000000000002000 RSI: 0000555c6e73a9f0 RDI: 0000000000000003
RBP: 0000555c6e6433b0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000002000
R13: 0000555c6e671410 R14: 0000555c6e671410 R15: 0000555c6e6433f8
</TASK>
that is very similar to those observed by William in his setup.
By using netif_rx() for mirred ingress packets, packets are queued in the
backlog, like it's done in the receive path of "loopback" and "veth", and
the deadlock is not visible anymore. Also add a selftest that can be used
to reproduce the problem / verify the fix.
Fixes: 53592b364001 ("net/sched: act_mirred: Implement ingress actions")
Reported-by: William Zhao <wizhao@redhat.com>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
---
net/sched/act_mirred.c | 2 +-
.../selftests/net/forwarding/tc_actions.sh | 29 ++++++++++++++++++-
2 files changed, 29 insertions(+), 2 deletions(-)
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index a1d70cf86843..ff965ed2dd9f 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -213,7 +213,7 @@ static int tcf_mirred_forward(bool want_ingress, struct sk_buff *skb)
if (!want_ingress)
err = tcf_dev_queue_xmit(skb, dev_queue_xmit);
else
- err = netif_receive_skb(skb);
+ err = netif_rx(skb);
return err;
}
diff --git a/tools/testing/selftests/net/forwarding/tc_actions.sh b/tools/testing/selftests/net/forwarding/tc_actions.sh
index 1e0a62f638fe..6e1b0cc68f7d 100755
--- a/tools/testing/selftests/net/forwarding/tc_actions.sh
+++ b/tools/testing/selftests/net/forwarding/tc_actions.sh
@@ -3,7 +3,8 @@
ALL_TESTS="gact_drop_and_ok_test mirred_egress_redirect_test \
mirred_egress_mirror_test matchall_mirred_egress_mirror_test \
- gact_trap_test mirred_egress_to_ingress_test"
+ gact_trap_test mirred_egress_to_ingress_test \
+ mirred_egress_to_ingress_tcp_test"
NUM_NETIFS=4
source tc_common.sh
source lib.sh
@@ -198,6 +199,32 @@ mirred_egress_to_ingress_test()
log_test "mirred_egress_to_ingress ($tcflags)"
}
+mirred_egress_to_ingress_tcp_test()
+{
+ local tmpfile=$(mktemp) tmpfile1=$(mktemp)
+
+ RET=0
+ dd conv=sparse status=none if=/dev/zero bs=1M count=2 of=$tmpfile
+ tc filter add dev $h1 protocol ip pref 100 handle 100 egress flower \
+ ip_proto tcp src_ip 192.0.2.1 dst_ip 192.0.2.2 \
+ action ct commit nat src addr 192.0.2.2 pipe \
+ action ct clear pipe \
+ action ct commit nat dst addr 192.0.2.1 pipe \
+ action ct clear pipe \
+ action skbedit ptype host pipe \
+ action mirred ingress redirect dev $h1
+
+ ip vrf exec v$h1 nc --recv-only -w10 -l -p 12345 -o $tmpfile1 &
+ local rpid=$!
+ ip vrf exec v$h1 nc -w1 --send-only 192.0.2.2 12345 <$tmpfile
+ wait -n $rpid
+ cmp -s $tmpfile $tmpfile1
+ check_err $? "server output check failed"
+ tc filter del dev $h1 egress protocol ip pref 100 handle 100 flower
+ rm -f $tmpfile $tmpfile1
+ log_test "mirred_egress_to_ingress_tcp"
+}
+
setup_prepare()
{
h1=${NETIFS[p1]}
--
2.37.2
* Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
2022-09-23 15:11 Davide Caratti
@ 2022-09-25 18:08 ` Cong Wang
2022-10-04 17:40 ` Davide Caratti
2022-11-18 23:07 ` Peilin Ye
1 sibling, 1 reply; 17+ messages in thread
From: Cong Wang @ 2022-09-25 18:08 UTC (permalink / raw)
To: Davide Caratti
Cc: Jamal Hadi Salim, Jiri Pirko, Paolo Abeni,
Marcelo Ricardo Leitner, wizhao, netdev
On Fri, Sep 23, 2022 at 05:11:12PM +0200, Davide Caratti wrote:
> William reports kernel soft-lockups on some OVS topologies when TC mirred
> "egress-to-ingress" action is hit by local TCP traffic. Indeed, using the
> mirred action in egress-to-ingress can easily produce a dmesg splat like:
>
> ============================================
> WARNING: possible recursive locking detected
> 6.0.0-rc4+ #511 Not tainted
> --------------------------------------------
> nc/1037 is trying to acquire lock:
> ffff950687843cb0 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160
>
> but task is already holding lock:
> ffff950687846cb0 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(slock-AF_INET/1);
> lock(slock-AF_INET/1);
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation
>
> 12 locks held by nc/1037:
> #0: ffff950687843d40 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendmsg+0x19/0x40
> #1: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x5/0x610
> #2: ffffffff9be072e0 (rcu_read_lock_bh){....}-{1:2}, at: ip_finish_output2+0xaa/0xa10
> #3: ffffffff9be072e0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x72/0x11b0
> #4: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb+0x181/0x400
> #5: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x54/0x160
> #6: ffff950687846cb0 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160
> #7: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x5/0x610
> #8: ffffffff9be072e0 (rcu_read_lock_bh){....}-{1:2}, at: ip_finish_output2+0xaa/0xa10
> #9: ffffffff9be072e0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x72/0x11b0
> #10: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb+0x181/0x400
> #11: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x54/0x160
>
> stack backtrace:
> CPU: 1 PID: 1037 Comm: nc Not tainted 6.0.0-rc4+ #511
> Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7353+9de0a3cc 04/01/2014
> Call Trace:
> <TASK>
> dump_stack_lvl+0x44/0x5b
> __lock_acquire.cold.76+0x121/0x2a7
> lock_acquire+0xd5/0x310
> _raw_spin_lock_nested+0x39/0x70
> tcp_v4_rcv+0x1023/0x1160
> ip_protocol_deliver_rcu+0x4d/0x280
> ip_local_deliver_finish+0xac/0x160
> ip_local_deliver+0x71/0x220
> ip_rcv+0x5a/0x200
> __netif_receive_skb_one_core+0x89/0xa0
> netif_receive_skb+0x1c1/0x400
> tcf_mirred_act+0x2a5/0x610 [act_mirred]
> tcf_action_exec+0xb3/0x210
> fl_classify+0x1f7/0x240 [cls_flower]
> tcf_classify+0x7b/0x320
> __dev_queue_xmit+0x3a4/0x11b0
> ip_finish_output2+0x3b8/0xa10
> ip_output+0x7f/0x260
> __ip_queue_xmit+0x1ce/0x610
> __tcp_transmit_skb+0xabc/0xc80
> tcp_rcv_state_process+0x669/0x1290
> tcp_v4_do_rcv+0xd7/0x370
> tcp_v4_rcv+0x10bc/0x1160
> ip_protocol_deliver_rcu+0x4d/0x280
> ip_local_deliver_finish+0xac/0x160
> ip_local_deliver+0x71/0x220
> ip_rcv+0x5a/0x200
> __netif_receive_skb_one_core+0x89/0xa0
> netif_receive_skb+0x1c1/0x400
> tcf_mirred_act+0x2a5/0x610 [act_mirred]
> tcf_action_exec+0xb3/0x210
> fl_classify+0x1f7/0x240 [cls_flower]
> tcf_classify+0x7b/0x320
> __dev_queue_xmit+0x3a4/0x11b0
> ip_finish_output2+0x3b8/0xa10
> ip_output+0x7f/0x260
> __ip_queue_xmit+0x1ce/0x610
> __tcp_transmit_skb+0xabc/0xc80
> tcp_write_xmit+0x229/0x12c0
> __tcp_push_pending_frames+0x32/0xf0
> tcp_sendmsg_locked+0x297/0xe10
> tcp_sendmsg+0x27/0x40
> sock_sendmsg+0x58/0x70
> __sys_sendto+0xfd/0x170
> __x64_sys_sendto+0x24/0x30
> do_syscall_64+0x3a/0x90
> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> RIP: 0033:0x7f11a06fd281
> Code: 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 e5 43 2c 00 41 89 ca 8b 00 85 c0 75 1c 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 67 c3 66 0f 1f 44 00 00 41 56 41 89 ce 41 55
> RSP: 002b:00007ffd17958358 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
> RAX: ffffffffffffffda RBX: 0000555c6e671610 RCX: 00007f11a06fd281
> RDX: 0000000000002000 RSI: 0000555c6e73a9f0 RDI: 0000000000000003
> RBP: 0000555c6e6433b0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000002000
> R13: 0000555c6e671410 R14: 0000555c6e671410 R15: 0000555c6e6433f8
> </TASK>
>
> that is very similar to those observed by William in his setup.
> By using netif_rx() for mirred ingress packets, packets are queued in the
> backlog, like it's done in the receive path of "loopback" and "veth", and
> the deadlock is not visible anymore. Also add a selftest that can be used
> to reproduce the problem / verify the fix.
This also means we can no longer know the RX path status, right? I mean,
if we have filters on ingress, we can't know whether they drop this
packet or not after this patch? To me, this at least breaks users'
expectations.
BTW, have you thought about solving the above lockdep warning in TCP
layer?
Thanks.
* Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
2022-09-25 18:08 ` Cong Wang
@ 2022-10-04 17:40 ` Davide Caratti
2022-10-16 17:28 ` Cong Wang
0 siblings, 1 reply; 17+ messages in thread
From: Davide Caratti @ 2022-10-04 17:40 UTC (permalink / raw)
To: Cong Wang
Cc: Jamal Hadi Salim, Jiri Pirko, Paolo Abeni,
Marcelo Ricardo Leitner, wizhao, netdev
hello Cong, thanks for looking at this!
On Sun, Sep 25, 2022 at 11:08:48AM -0700, Cong Wang wrote:
> On Fri, Sep 23, 2022 at 05:11:12PM +0200, Davide Caratti wrote:
> > William reports kernel soft-lockups on some OVS topologies when TC mirred
> > "egress-to-ingress" action is hit by local TCP traffic. Indeed, using the
> > mirred action in egress-to-ingress can easily produce a dmesg splat like:
> >
> > ============================================
> > WARNING: possible recursive locking detected
[...]
> > 6.0.0-rc4+ #511 Not tainted
> > --------------------------------------------
> > nc/1037 is trying to acquire lock:
> > ffff950687843cb0 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160
> >
> > but task is already holding lock:
> > ffff950687846cb0 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160
FTR, this is:
2091 sk_incoming_cpu_update(sk);
2092
2093 bh_lock_sock_nested(sk); <--- the lock reported in the splat
2094 tcp_segs_in(tcp_sk(sk), skb);
2095 ret = 0;
2096 if (!sock_owned_by_user(sk)) {
> BTW, have you thought about solving the above lockdep warning in TCP
> layer?
yes, but that doesn't look like a trivial fix at all - and I doubt it's
worth doing just to make mirred and TCP "friends". Please note: on
current kernels this doesn't just result in a lockdep warning: using
iperf3 on unpatched kernels it's possible to see a real deadlock, like:
WARNING: possible circular locking dependency detected
6.0.0-rc4+ #511 Not tainted
------------------------------------------------------
iperf3/1021 is trying to acquire lock:
ffff976005c5a630 (slock-AF_INET6/1){+...}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160
but task is already holding lock:
ffff97607b06e0b0 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (slock-AF_INET/1){+.-.}-{2:2}:
lock_acquire+0xd5/0x310
_raw_spin_lock_nested+0x39/0x70
tcp_v4_rcv+0x1023/0x1160
ip_protocol_deliver_rcu+0x4d/0x280
ip_local_deliver_finish+0xac/0x160
ip_local_deliver+0x71/0x220
ip_rcv+0x5a/0x200
__netif_receive_skb_one_core+0x89/0xa0
netif_receive_skb+0x1c1/0x400
tcf_mirred_act+0x2a5/0x610 [act_mirred]
tcf_action_exec+0xb3/0x210
fl_classify+0x1f7/0x240 [cls_flower]
tcf_classify+0x7b/0x320
__dev_queue_xmit+0x3a4/0x11b0
ip_finish_output2+0x3b8/0xa10
ip_output+0x7f/0x260
__ip_queue_xmit+0x1ce/0x610
__tcp_transmit_skb+0xabc/0xc80
tcp_rcv_established+0x284/0x810
tcp_v4_do_rcv+0x1f3/0x370
tcp_v4_rcv+0x10bc/0x1160
ip_protocol_deliver_rcu+0x4d/0x280
ip_local_deliver_finish+0xac/0x160
ip_local_deliver+0x71/0x220
ip_rcv+0x5a/0x200
__netif_receive_skb_one_core+0x89/0xa0
netif_receive_skb+0x1c1/0x400
tcf_mirred_act+0x2a5/0x610 [act_mirred]
tcf_action_exec+0xb3/0x210
fl_classify+0x1f7/0x240 [cls_flower]
tcf_classify+0x7b/0x320
__dev_queue_xmit+0x3a4/0x11b0
ip_finish_output2+0x3b8/0xa10
ip_output+0x7f/0x260
__ip_queue_xmit+0x1ce/0x610
__tcp_transmit_skb+0xabc/0xc80
tcp_write_xmit+0x229/0x12c0
__tcp_push_pending_frames+0x32/0xf0
tcp_sendmsg_locked+0x297/0xe10
tcp_sendmsg+0x27/0x40
sock_sendmsg+0x58/0x70
sock_write_iter+0x9a/0x100
vfs_write+0x481/0x4f0
ksys_write+0xc2/0xe0
do_syscall_64+0x3a/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
-> #0 (slock-AF_INET6/1){+...}-{2:2}:
check_prevs_add+0x185/0xf50
__lock_acquire+0x11eb/0x1620
lock_acquire+0xd5/0x310
_raw_spin_lock_nested+0x39/0x70
tcp_v4_rcv+0x1023/0x1160
ip_protocol_deliver_rcu+0x4d/0x280
ip_local_deliver_finish+0xac/0x160
ip_local_deliver+0x71/0x220
ip_rcv+0x5a/0x200
__netif_receive_skb_one_core+0x89/0xa0
netif_receive_skb+0x1c1/0x400
tcf_mirred_act+0x2a5/0x610 [act_mirred]
tcf_action_exec+0xb3/0x210
fl_classify+0x1f7/0x240 [cls_flower]
tcf_classify+0x7b/0x320
__dev_queue_xmit+0x3a4/0x11b0
ip_finish_output2+0x3b8/0xa10
ip_output+0x7f/0x260
__ip_queue_xmit+0x1ce/0x610
__tcp_transmit_skb+0xabc/0xc80
tcp_rcv_established+0x284/0x810
tcp_v4_do_rcv+0x1f3/0x370
tcp_v4_rcv+0x10bc/0x1160
ip_protocol_deliver_rcu+0x4d/0x280
ip_local_deliver_finish+0xac/0x160
ip_local_deliver+0x71/0x220
ip_rcv+0x5a/0x200
__netif_receive_skb_one_core+0x89/0xa0
netif_receive_skb+0x1c1/0x400
tcf_mirred_act+0x2a5/0x610 [act_mirred]
tcf_action_exec+0xb3/0x210
fl_classify+0x1f7/0x240 [cls_flower]
tcf_classify+0x7b/0x320
__dev_queue_xmit+0x3a4/0x11b0
ip_finish_output2+0x3b8/0xa10
ip_output+0x7f/0x260
__ip_queue_xmit+0x1ce/0x610
__tcp_transmit_skb+0xabc/0xc80
tcp_write_xmit+0x229/0x12c0
__tcp_push_pending_frames+0x32/0xf0
tcp_sendmsg_locked+0x297/0xe10
tcp_sendmsg+0x27/0x40
sock_sendmsg+0x42/0x70
sock_write_iter+0x9a/0x100
vfs_write+0x481/0x4f0
ksys_write+0xc2/0xe0
do_syscall_64+0x3a/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(slock-AF_INET/1);
lock(slock-AF_INET6/1);
lock(slock-AF_INET/1);
lock(slock-AF_INET6/1);
*** DEADLOCK ***
12 locks held by iperf3/1021:
#0: ffff976005c5a6c0 (sk_lock-AF_INET6){+.+.}-{0:0}, at: tcp_sendmsg+0x19/0x40
#1: ffffffffbca07320 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x5/0x610
#2: ffffffffbca072e0 (rcu_read_lock_bh){....}-{1:2}, at: ip_finish_output2+0xaa/0xa10
#3: ffffffffbca072e0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x72/0x11b0
#4: ffffffffbca07320 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb+0x181/0x400
#5: ffffffffbca07320 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x54/0x160
#6: ffff97607b06e0b0 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160
#7: ffffffffbca07320 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x5/0x610
#8: ffffffffbca072e0 (rcu_read_lock_bh){....}-{1:2}, at: ip_finish_output2+0xaa/0xa10
#9: ffffffffbca072e0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x72/0x11b0
#10: ffffffffbca07320 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb+0x181/0x400
#11: ffffffffbca07320 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x54/0x160
[...]
kernel:watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [swapper/1:0]
Moreover, even if we improve TCP locking to avoid lockups for this
simple topology, I suspect that TCP will experience some packet losses:
when mirred detects 4 nested calls of tcf_mirred_act(), the kernel will
protect against excessive stack growth and drop the skb (which can also
be a full TSO packet). The protocol can probably recover, but the
performance will certainly be suboptimal.
> Which also means we can no longer know the RX path status any more,
> right? I mean if we have filters on ingress, we can't know whether they
> drop this packet or not, after this patch? To me, this at least breaks
> users' expectation.
Fair point! Then maybe we don't need to change the whole of TC mirred
ingress: since the problem only affects egress to ingress, we can
preserve the call to netif_receive_skb() on ingress->ingress, and just
use the backlog in the egress->ingress direction - that has been broken
since the very beginning and got similar fixes in the past [1].
Something like:
-- >8 --
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -205,12 +205,14 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
return err;
}
-static int tcf_mirred_forward(bool want_ingress, struct sk_buff *skb)
+static int tcf_mirred_forward(bool want_ingress, bool at_ingress, struct sk_buff *skb)
{
int err;
if (!want_ingress)
err = tcf_dev_queue_xmit(skb, dev_queue_xmit);
+ else if (!at_ingress)
+ err = netif_rx(skb);
else
err = netif_receive_skb(skb);
@@ -306,7 +308,7 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
/* let's the caller reinsert the packet, if possible */
if (use_reinsert) {
res->ingress = want_ingress;
- err = tcf_mirred_forward(res->ingress, skb);
+ err = tcf_mirred_forward(res->ingress, at_ingress, skb);
if (err)
tcf_action_inc_overlimit_qstats(&m->common);
__this_cpu_dec(mirred_rec_level);
@@ -314,7 +316,7 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
}
}
- err = tcf_mirred_forward(want_ingress, skb2);
+ err = tcf_mirred_forward(want_ingress, at_ingress, skb2);
if (err) {
out:
tcf_action_inc_overlimit_qstats(&m->common);
-- >8 --
WDYT? Any feedback appreciated, thanks!
[1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=f799ada6bf2397c351220088b9b0980125c77280
--
davide
* Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
2022-10-04 17:40 ` Davide Caratti
@ 2022-10-16 17:28 ` Cong Wang
0 siblings, 0 replies; 17+ messages in thread
From: Cong Wang @ 2022-10-16 17:28 UTC (permalink / raw)
To: Davide Caratti
Cc: Jamal Hadi Salim, Jiri Pirko, Paolo Abeni,
Marcelo Ricardo Leitner, wizhao, netdev, peilin.ye
On Tue, Oct 04, 2022 at 07:40:27PM +0200, Davide Caratti wrote:
> hello Cong, thanks for looking at this!
>
> On Sun, Sep 25, 2022 at 11:08:48AM -0700, Cong Wang wrote:
> > On Fri, Sep 23, 2022 at 05:11:12PM +0200, Davide Caratti wrote:
> > > William reports kernel soft-lockups on some OVS topologies when TC mirred
> > > "egress-to-ingress" action is hit by local TCP traffic. Indeed, using the
> > > mirred action in egress-to-ingress can easily produce a dmesg splat like:
> > >
> > > ============================================
> > > WARNING: possible recursive locking detected
>
> [...]
>
> > > 6.0.0-rc4+ #511 Not tainted
> > > --------------------------------------------
> > > nc/1037 is trying to acquire lock:
> > > ffff950687843cb0 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160
> > >
> > > but task is already holding lock:
> > > ffff950687846cb0 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160
>
> FTR, this is:
Yeah, Peilin actually looked deeper into this issue. Let's copy him.
>
> 2091 sk_incoming_cpu_update(sk);
> 2092
> 2093 bh_lock_sock_nested(sk); <--- the lock reported in the splat
> 2094 tcp_segs_in(tcp_sk(sk), skb);
> 2095 ret = 0;
> 2096 if (!sock_owned_by_user(sk)) {
>
> > BTW, have you thought about solving the above lockdep warning in TCP
> > layer?
>
> yes, but that doesn't look like a trivial fix at all - and I doubt it's
> worth doing it just to make mirred and TCP "friends". Please note:
> on current kernel this doesn't just result in a lockdep warning: using
> iperf3 on unpatched kernels it's possible to see a real deadlock, like:
I'd say your test case is rare, because I don't think it is trivial for
a TCP socket to send packets to itself.
> This also means we can no longer know the RX path status, right? I
> mean, if we have filters on ingress, we can't know whether they drop
> this packet or not after this patch? To me, this at least breaks
> users' expectations.
>
> Fair point! Then maybe we don't need to change the whole TC mirred ingress:
> since the problem only affects egress to ingress, we can preserve the call
> to netif_receive_skb() on ingress->ingress, and just use the backlog in the
> egress->ingress direction - that has been broken since the very beginning
> and got similar fixes in the past [1]. Something like:
Regardless of ingress->ingress or egress->ingress, this patch breaks
users' expectations. And actually, egress->ingress is more common than
ingress->ingress, in my experience.
Thanks.
* Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
2022-09-23 15:11 Davide Caratti
2022-09-25 18:08 ` Cong Wang
@ 2022-11-18 23:07 ` Peilin Ye
1 sibling, 0 replies; 17+ messages in thread
From: Peilin Ye @ 2022-11-18 23:07 UTC (permalink / raw)
To: Davide Caratti
Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Paolo Abeni,
Marcelo Ricardo Leitner, wizhao, netdev
Hi all,
On Fri, Sep 23, 2022 at 05:11:12PM +0200, Davide Caratti wrote:
> +mirred_egress_to_ingress_tcp_test()
> +{
> + local tmpfile=$(mktemp) tmpfile1=$(mktemp)
> +
> + RET=0
> + dd conv=sparse status=none if=/dev/zero bs=1M count=2 of=$tmpfile
> + tc filter add dev $h1 protocol ip pref 100 handle 100 egress flower \
> + ip_proto tcp src_ip 192.0.2.1 dst_ip 192.0.2.2 \
> + action ct commit nat src addr 192.0.2.2 pipe \
> + action ct clear pipe \
> + action ct commit nat dst addr 192.0.2.1 pipe \
> + action ct clear pipe \
> + action skbedit ptype host pipe \
> + action mirred ingress redirect dev $h1
FWIW, I couldn't reproduce the lockup using this test case (with
forwarding.config.sample), but I got the same lockup in tcp_v4_rcv()
using a different (but probably less realistic) TC filter:
tc filter add dev $h1 protocol ip pref 100 handle 100 egress flower \
ip_proto tcp src_ip 192.0.2.1 dst_ip 192.0.2.2 \
action pedit ex munge ip src set 192.0.2.2 pipe \
action pedit ex munge ip dst set 192.0.2.1 pipe \
action skbedit ptype host pipe \
action mirred ingress redirect dev $h1
Thanks,
Peilin Ye
2024-02-09 23:54 [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress Jakub Kicinski
2024-02-12 14:51 ` Jamal Hadi Salim
2024-02-12 15:02 ` Jakub Kicinski
2024-02-12 15:11 ` Jamal Hadi Salim
2024-02-13 11:06 ` Paolo Abeni
2024-02-14 0:27 ` Jakub Kicinski
2024-02-14 3:40 ` Jakub Kicinski
2024-02-14 15:11 ` Jamal Hadi Salim
2024-02-14 15:28 ` Jamal Hadi Salim
2024-02-14 16:10 ` Davide Caratti
2024-02-15 0:31 ` Jakub Kicinski
2024-02-15 17:55 ` Davide Caratti
-- strict thread matches above, loose matches on Subject: below --
2022-09-23 15:11 Davide Caratti
2022-09-25 18:08 ` Cong Wang
2022-10-04 17:40 ` Davide Caratti
2022-10-16 17:28 ` Cong Wang
2022-11-18 23:07 ` Peilin Ye