netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC net-next 1/2] net/sched: act_mirred: better wording on protection against excessive stack growth
@ 2022-12-20 18:25 Davide Caratti
  2022-12-20 18:25 ` [RFC net-next 2/2] act_mirred: use the backlog for nested calls to mirred ingress Davide Caratti
  0 siblings, 1 reply; 8+ messages in thread
From: Davide Caratti @ 2022-12-20 18:25 UTC (permalink / raw)
  To: netdev
  Cc: jhs, jiri, marcelo.leitner, pabeni, wizhao, xiyou.wangcong, lucien.xin

with commit e2ca070f89ec ("net: sched: protect against stack overflow in
TC act_mirred"), act_mirred protected itself against excessive stack growth
using per_cpu counter of nested calls to tcf_mirred_act(), and capping it
to MIRRED_RECURSION_LIMIT. However, such protection does not detect
recursion/loops in case the packet is enqueued to the backlog (for example,
when the mirred target device has RPS or skb timestamping enabled). Change
the wording from "recursion" to "nesting" to make it more clear to readers.

CC: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
---
 net/sched/act_mirred.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 7284bcea7b0b..c8abb5136491 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -29,8 +29,8 @@
 static LIST_HEAD(mirred_list);
 static DEFINE_SPINLOCK(mirred_list_lock);
 
-#define MIRRED_RECURSION_LIMIT    4
-static DEFINE_PER_CPU(unsigned int, mirred_rec_level);
+#define MIRRED_NEST_LIMIT    4
+static DEFINE_PER_CPU(unsigned int, mirred_nest_level);
 
 static bool tcf_mirred_is_act_redirect(int action)
 {
@@ -226,7 +226,7 @@ TC_INDIRECT_SCOPE int tcf_mirred_act(struct sk_buff *skb,
 	struct sk_buff *skb2 = skb;
 	bool m_mac_header_xmit;
 	struct net_device *dev;
-	unsigned int rec_level;
+	unsigned int nest_level;
 	int retval, err = 0;
 	bool use_reinsert;
 	bool want_ingress;
@@ -237,11 +237,11 @@ TC_INDIRECT_SCOPE int tcf_mirred_act(struct sk_buff *skb,
 	int mac_len;
 	bool at_nh;
 
-	rec_level = __this_cpu_inc_return(mirred_rec_level);
-	if (unlikely(rec_level > MIRRED_RECURSION_LIMIT)) {
+	nest_level = __this_cpu_inc_return(mirred_nest_level);
+	if (unlikely(nest_level > MIRRED_NEST_LIMIT)) {
 		net_warn_ratelimited("Packet exceeded mirred recursion limit on dev %s\n",
 				     netdev_name(skb->dev));
-		__this_cpu_dec(mirred_rec_level);
+		__this_cpu_dec(mirred_nest_level);
 		return TC_ACT_SHOT;
 	}
 
@@ -310,7 +310,7 @@ TC_INDIRECT_SCOPE int tcf_mirred_act(struct sk_buff *skb,
 			err = tcf_mirred_forward(want_ingress, skb);
 			if (err)
 				tcf_action_inc_overlimit_qstats(&m->common);
-			__this_cpu_dec(mirred_rec_level);
+			__this_cpu_dec(mirred_nest_level);
 			return TC_ACT_CONSUMED;
 		}
 	}
@@ -322,7 +322,7 @@ TC_INDIRECT_SCOPE int tcf_mirred_act(struct sk_buff *skb,
 		if (tcf_mirred_is_act_redirect(m_eaction))
 			retval = TC_ACT_SHOT;
 	}
-	__this_cpu_dec(mirred_rec_level);
+	__this_cpu_dec(mirred_nest_level);
 
 	return retval;
 }
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [RFC net-next 2/2] act_mirred: use the backlog for nested calls to mirred ingress
  2022-12-20 18:25 [RFC net-next 1/2] net/sched: act_mirred: better wording on protection against excessive stack growth Davide Caratti
@ 2022-12-20 18:25 ` Davide Caratti
  2022-12-26 16:03   ` Jamal Hadi Salim
  0 siblings, 1 reply; 8+ messages in thread
From: Davide Caratti @ 2022-12-20 18:25 UTC (permalink / raw)
  To: netdev
  Cc: jhs, jiri, marcelo.leitner, pabeni, wizhao, xiyou.wangcong, lucien.xin

William reports kernel soft-lockups on some OVS topologies when TC mirred
egress->ingress action is hit by local TCP traffic [1].
The same can also be reproduced with SCTP (thanks Xin for verifying), when
client and server reach themselves through mirred egress to ingress, and
one of the two peers sends a "heartbeat" packet (from within a timer).

Enqueueing to backlog proved to fix this soft lockup; however, as Cong
noticed [2], we should preserve - when possible - the current mirred
behavior that counts as "overlimits" any eventual packet drop subsequent to
the mirred forwarding action [3]. A compromise solution might use the
backlog only when tcf_mirred_act() has a nest level greater than one:
change tcf_mirred_forward() accordingly.

Also, add a kselftest that can reproduce the lockup and verifies TC mirred
ability to account for further packet drops after TC mirred egress->ingress
(when the nest level is 1).

 [1] https://lore.kernel.org/netdev/33dc43f587ec1388ba456b4915c75f02a8aae226.1663945716.git.dcaratti@redhat.com/
 [2] https://lore.kernel.org/netdev/Y0w%2FWWY60gqrtGLp@pop-os.localdomain/
 [3] such behavior is not guaranteed: for example, if RPS or skb RX
     timestamping is enabled on the mirred target device, the kernel
     can defer receiving the skb and return NET_RX_SUCCESS inside
     tcf_mirred_forward().

Reported-by: William Zhao <wizhao@redhat.com>
CC: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
---
 net/sched/act_mirred.c                        |  7 +++
 .../selftests/net/forwarding/tc_actions.sh    | 49 ++++++++++++++++++-
 2 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index c8abb5136491..8037ec9b1d31 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -206,12 +206,19 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
 	return err;
 }
 
+static bool is_mirred_nested(void)
+{
+	return unlikely(__this_cpu_read(mirred_nest_level) > 1);
+}
+
 static int tcf_mirred_forward(bool want_ingress, struct sk_buff *skb)
 {
 	int err;
 
 	if (!want_ingress)
 		err = tcf_dev_queue_xmit(skb, dev_queue_xmit);
+	else if (is_mirred_nested())
+		err = netif_rx(skb);
 	else
 		err = netif_receive_skb(skb);
 
diff --git a/tools/testing/selftests/net/forwarding/tc_actions.sh b/tools/testing/selftests/net/forwarding/tc_actions.sh
index 1e0a62f638fe..919c0dd9fe4b 100755
--- a/tools/testing/selftests/net/forwarding/tc_actions.sh
+++ b/tools/testing/selftests/net/forwarding/tc_actions.sh
@@ -3,7 +3,8 @@
 
 ALL_TESTS="gact_drop_and_ok_test mirred_egress_redirect_test \
 	mirred_egress_mirror_test matchall_mirred_egress_mirror_test \
-	gact_trap_test mirred_egress_to_ingress_test"
+	gact_trap_test mirred_egress_to_ingress_test \
+	mirred_egress_to_ingress_tcp_test"
 NUM_NETIFS=4
 source tc_common.sh
 source lib.sh
@@ -198,6 +199,52 @@ mirred_egress_to_ingress_test()
 	log_test "mirred_egress_to_ingress ($tcflags)"
 }
 
+mirred_egress_to_ingress_tcp_test()
+{
+	local tmpfile=$(mktemp) tmpfile1=$(mktemp)
+
+	RET=0
+	dd conv=sparse status=none if=/dev/zero bs=1M count=2 of=$tmpfile
+	tc filter add dev $h1 protocol ip pref 100 handle 100 egress flower \
+		$tcflags ip_proto tcp src_ip 192.0.2.1 dst_ip 192.0.2.2 \
+			action ct commit nat src addr 192.0.2.2 pipe \
+			action ct clear pipe \
+			action ct commit nat dst addr 192.0.2.1 pipe \
+			action ct clear pipe \
+			action skbedit ptype host pipe \
+			action mirred ingress redirect dev $h1
+	tc filter add dev $h1 protocol ip pref 101 handle 101 egress flower \
+		$tcflags ip_proto icmp \
+			action mirred ingress redirect dev $h1
+	tc filter add dev $h1 protocol ip pref 102 handle 102 ingress flower \
+		ip_proto icmp \
+			action drop
+
+	ip vrf exec v$h1 nc --recv-only -w10 -l -p 12345 -o $tmpfile1  &
+	local rpid=$!
+	ip vrf exec v$h1 nc -w1 --send-only 192.0.2.2 12345 <$tmpfile
+	wait -n $rpid
+	cmp -s $tmpfile $tmpfile1
+	check_err $? "server output check failed"
+
+	$MZ $h1 -c 10 -p 64 -a $h1mac -b $h1mac -A 192.0.2.1 -B 192.0.2.1 \
+		-t icmp "ping,id=42,seq=5" -q
+	tc_check_packets "dev $h1 egress" 101 10
+	check_err $? "didn't mirred redirect ICMP"
+	tc_check_packets "dev $h1 ingress" 102 10
+	check_err $? "didn't drop mirred ICMP"
+	local overlimits=$(tc_rule_stats_get ${h1} 101 egress .overlimits)
+	test ${overlimits} = 10
+	check_err $? "wrong overlimits, expected 10 got ${overlimits}"
+
+	tc filter del dev $h1 egress protocol ip pref 100 handle 100 flower
+	tc filter del dev $h1 egress protocol ip pref 101 handle 101 flower
+	tc filter del dev $h1 ingress protocol ip pref 102 handle 102 flower
+
+	rm -f $tmpfile $tmpfile1
+	log_test "mirred_egress_to_ingress_tcp ($tcflags)"
+}
+
 setup_prepare()
 {
 	h1=${NETIFS[p1]}
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [RFC net-next 2/2] act_mirred: use the backlog for nested calls to mirred ingress
  2022-12-20 18:25 ` [RFC net-next 2/2] act_mirred: use the backlog for nested calls to mirred ingress Davide Caratti
@ 2022-12-26 16:03   ` Jamal Hadi Salim
  2023-01-05 17:08     ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 8+ messages in thread
From: Jamal Hadi Salim @ 2022-12-26 16:03 UTC (permalink / raw)
  To: Davide Caratti
  Cc: netdev, jiri, marcelo.leitner, pabeni, wizhao, xiyou.wangcong,
	lucien.xin

Davide,
So this would fix the false positive you are seeing i believe;
however, will it fix a real loop?
In particular I am not sure if the next packet grabbed from the
backlog will end up in the
same CPU.

cheers,
jamal

On Tue, Dec 20, 2022 at 1:25 PM Davide Caratti <dcaratti@redhat.com> wrote:
>
> William reports kernel soft-lockups on some OVS topologies when TC mirred
> egress->ingress action is hit by local TCP traffic [1].
> The same can also be reproduced with SCTP (thanks Xin for verifying), when
> client and server reach themselves through mirred egress to ingress, and
> one of the two peers sends a "heartbeat" packet (from within a timer).
>
> Enqueueing to backlog proved to fix this soft lockup; however, as Cong
> noticed [2], we should preserve - when possible - the current mirred
> behavior that counts as "overlimits" any eventual packet drop subsequent to
> the mirred forwarding action [3]. A compromise solution might use the
> backlog only when tcf_mirred_act() has a nest level greater than one:
> change tcf_mirred_forward() accordingly.
>
> Also, add a kselftest that can reproduce the lockup and verifies TC mirred
> ability to account for further packet drops after TC mirred egress->ingress
> (when the nest level is 1).
>
>  [1] https://lore.kernel.org/netdev/33dc43f587ec1388ba456b4915c75f02a8aae226.1663945716.git.dcaratti@redhat.com/
>  [2] https://lore.kernel.org/netdev/Y0w%2FWWY60gqrtGLp@pop-os.localdomain/
>  [3] such behavior is not guaranteed: for example, if RPS or skb RX
>      timestamping is enabled on the mirred target device, the kernel
>      can defer receiving the skb and return NET_RX_SUCCESS inside
>      tcf_mirred_forward().
>
> Reported-by: William Zhao <wizhao@redhat.com>
> CC: Xin Long <lucien.xin@gmail.com>
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>
> ---
>  net/sched/act_mirred.c                        |  7 +++
>  .../selftests/net/forwarding/tc_actions.sh    | 49 ++++++++++++++++++-
>  2 files changed, 55 insertions(+), 1 deletion(-)
>
> diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
> index c8abb5136491..8037ec9b1d31 100644
> --- a/net/sched/act_mirred.c
> +++ b/net/sched/act_mirred.c
> @@ -206,12 +206,19 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
>         return err;
>  }
>
> +static bool is_mirred_nested(void)
> +{
> +       return unlikely(__this_cpu_read(mirred_nest_level) > 1);
> +}
> +
>  static int tcf_mirred_forward(bool want_ingress, struct sk_buff *skb)
>  {
>         int err;
>
>         if (!want_ingress)
>                 err = tcf_dev_queue_xmit(skb, dev_queue_xmit);
> +       else if (is_mirred_nested())
> +               err = netif_rx(skb);
>         else
>                 err = netif_receive_skb(skb);
>
> diff --git a/tools/testing/selftests/net/forwarding/tc_actions.sh b/tools/testing/selftests/net/forwarding/tc_actions.sh
> index 1e0a62f638fe..919c0dd9fe4b 100755
> --- a/tools/testing/selftests/net/forwarding/tc_actions.sh
> +++ b/tools/testing/selftests/net/forwarding/tc_actions.sh
> @@ -3,7 +3,8 @@
>
>  ALL_TESTS="gact_drop_and_ok_test mirred_egress_redirect_test \
>         mirred_egress_mirror_test matchall_mirred_egress_mirror_test \
> -       gact_trap_test mirred_egress_to_ingress_test"
> +       gact_trap_test mirred_egress_to_ingress_test \
> +       mirred_egress_to_ingress_tcp_test"
>  NUM_NETIFS=4
>  source tc_common.sh
>  source lib.sh
> @@ -198,6 +199,52 @@ mirred_egress_to_ingress_test()
>         log_test "mirred_egress_to_ingress ($tcflags)"
>  }
>
> +mirred_egress_to_ingress_tcp_test()
> +{
> +       local tmpfile=$(mktemp) tmpfile1=$(mktemp)
> +
> +       RET=0
> +       dd conv=sparse status=none if=/dev/zero bs=1M count=2 of=$tmpfile
> +       tc filter add dev $h1 protocol ip pref 100 handle 100 egress flower \
> +               $tcflags ip_proto tcp src_ip 192.0.2.1 dst_ip 192.0.2.2 \
> +                       action ct commit nat src addr 192.0.2.2 pipe \
> +                       action ct clear pipe \
> +                       action ct commit nat dst addr 192.0.2.1 pipe \
> +                       action ct clear pipe \
> +                       action skbedit ptype host pipe \
> +                       action mirred ingress redirect dev $h1
> +       tc filter add dev $h1 protocol ip pref 101 handle 101 egress flower \
> +               $tcflags ip_proto icmp \
> +                       action mirred ingress redirect dev $h1
> +       tc filter add dev $h1 protocol ip pref 102 handle 102 ingress flower \
> +               ip_proto icmp \
> +                       action drop
> +
> +       ip vrf exec v$h1 nc --recv-only -w10 -l -p 12345 -o $tmpfile1  &
> +       local rpid=$!
> +       ip vrf exec v$h1 nc -w1 --send-only 192.0.2.2 12345 <$tmpfile
> +       wait -n $rpid
> +       cmp -s $tmpfile $tmpfile1
> +       check_err $? "server output check failed"
> +
> +       $MZ $h1 -c 10 -p 64 -a $h1mac -b $h1mac -A 192.0.2.1 -B 192.0.2.1 \
> +               -t icmp "ping,id=42,seq=5" -q
> +       tc_check_packets "dev $h1 egress" 101 10
> +       check_err $? "didn't mirred redirect ICMP"
> +       tc_check_packets "dev $h1 ingress" 102 10
> +       check_err $? "didn't drop mirred ICMP"
> +       local overlimits=$(tc_rule_stats_get ${h1} 101 egress .overlimits)
> +       test ${overlimits} = 10
> +       check_err $? "wrong overlimits, expected 10 got ${overlimits}"
> +
> +       tc filter del dev $h1 egress protocol ip pref 100 handle 100 flower
> +       tc filter del dev $h1 egress protocol ip pref 101 handle 101 flower
> +       tc filter del dev $h1 ingress protocol ip pref 102 handle 102 flower
> +
> +       rm -f $tmpfile $tmpfile1
> +       log_test "mirred_egress_to_ingress_tcp ($tcflags)"
> +}
> +
>  setup_prepare()
>  {
>         h1=${NETIFS[p1]}
> --
> 2.38.1
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC net-next 2/2] act_mirred: use the backlog for nested calls to mirred ingress
  2022-12-26 16:03   ` Jamal Hadi Salim
@ 2023-01-05 17:08     ` Marcelo Ricardo Leitner
  2023-01-09 15:59       ` Jamal Hadi Salim
  0 siblings, 1 reply; 8+ messages in thread
From: Marcelo Ricardo Leitner @ 2023-01-05 17:08 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Davide Caratti, netdev, jiri, pabeni, wizhao, xiyou.wangcong, lucien.xin

Hi Jamal,

On Mon, Dec 26, 2022 at 11:03:34AM -0500, Jamal Hadi Salim wrote:
> Davide,

He's on PTO, so please let me try to answer it here.

> So this would fix the false positive you are seeing i believe;
> however, will it fix a real loop?
> In particular I am not sure if the next packet grabbed from the
> backlog will end up in the
> same CPU.

This series is not changing, functionally speaking, any about that.
As explained on the first patch, this "loop protection" already
doesn't work when RFS or skb timestamps are used. The idea on that
protection was different from the beginning already - to actually
protect from stack growth, and not packet loop. So the first patch
here amends the wording on it to something closer to reality.

Makes sense?

Cheers,
Marcelo

> 
> cheers,
> jamal
> 
> On Tue, Dec 20, 2022 at 1:25 PM Davide Caratti <dcaratti@redhat.com> wrote:
> >
> > William reports kernel soft-lockups on some OVS topologies when TC mirred
> > egress->ingress action is hit by local TCP traffic [1].
> > The same can also be reproduced with SCTP (thanks Xin for verifying), when
> > client and server reach themselves through mirred egress to ingress, and
> > one of the two peers sends a "heartbeat" packet (from within a timer).
> >
> > Enqueueing to backlog proved to fix this soft lockup; however, as Cong
> > noticed [2], we should preserve - when possible - the current mirred
> > behavior that counts as "overlimits" any eventual packet drop subsequent to
> > the mirred forwarding action [3]. A compromise solution might use the
> > backlog only when tcf_mirred_act() has a nest level greater than one:
> > change tcf_mirred_forward() accordingly.
> >
> > Also, add a kselftest that can reproduce the lockup and verifies TC mirred
> > ability to account for further packet drops after TC mirred egress->ingress
> > (when the nest level is 1).
> >
> >  [1] https://lore.kernel.org/netdev/33dc43f587ec1388ba456b4915c75f02a8aae226.1663945716.git.dcaratti@redhat.com/
> >  [2] https://lore.kernel.org/netdev/Y0w%2FWWY60gqrtGLp@pop-os.localdomain/
> >  [3] such behavior is not guaranteed: for example, if RPS or skb RX
> >      timestamping is enabled on the mirred target device, the kernel
> >      can defer receiving the skb and return NET_RX_SUCCESS inside
> >      tcf_mirred_forward().
> >
> > Reported-by: William Zhao <wizhao@redhat.com>
> > CC: Xin Long <lucien.xin@gmail.com>
> > Signed-off-by: Davide Caratti <dcaratti@redhat.com>
> > ---
> >  net/sched/act_mirred.c                        |  7 +++
> >  .../selftests/net/forwarding/tc_actions.sh    | 49 ++++++++++++++++++-
> >  2 files changed, 55 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
> > index c8abb5136491..8037ec9b1d31 100644
> > --- a/net/sched/act_mirred.c
> > +++ b/net/sched/act_mirred.c
> > @@ -206,12 +206,19 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
> >         return err;
> >  }
> >
> > +static bool is_mirred_nested(void)
> > +{
> > +       return unlikely(__this_cpu_read(mirred_nest_level) > 1);
> > +}
> > +
> >  static int tcf_mirred_forward(bool want_ingress, struct sk_buff *skb)
> >  {
> >         int err;
> >
> >         if (!want_ingress)
> >                 err = tcf_dev_queue_xmit(skb, dev_queue_xmit);
> > +       else if (is_mirred_nested())
> > +               err = netif_rx(skb);
> >         else
> >                 err = netif_receive_skb(skb);
> >
> > diff --git a/tools/testing/selftests/net/forwarding/tc_actions.sh b/tools/testing/selftests/net/forwarding/tc_actions.sh
> > index 1e0a62f638fe..919c0dd9fe4b 100755
> > --- a/tools/testing/selftests/net/forwarding/tc_actions.sh
> > +++ b/tools/testing/selftests/net/forwarding/tc_actions.sh
> > @@ -3,7 +3,8 @@
> >
> >  ALL_TESTS="gact_drop_and_ok_test mirred_egress_redirect_test \
> >         mirred_egress_mirror_test matchall_mirred_egress_mirror_test \
> > -       gact_trap_test mirred_egress_to_ingress_test"
> > +       gact_trap_test mirred_egress_to_ingress_test \
> > +       mirred_egress_to_ingress_tcp_test"
> >  NUM_NETIFS=4
> >  source tc_common.sh
> >  source lib.sh
> > @@ -198,6 +199,52 @@ mirred_egress_to_ingress_test()
> >         log_test "mirred_egress_to_ingress ($tcflags)"
> >  }
> >
> > +mirred_egress_to_ingress_tcp_test()
> > +{
> > +       local tmpfile=$(mktemp) tmpfile1=$(mktemp)
> > +
> > +       RET=0
> > +       dd conv=sparse status=none if=/dev/zero bs=1M count=2 of=$tmpfile
> > +       tc filter add dev $h1 protocol ip pref 100 handle 100 egress flower \
> > +               $tcflags ip_proto tcp src_ip 192.0.2.1 dst_ip 192.0.2.2 \
> > +                       action ct commit nat src addr 192.0.2.2 pipe \
> > +                       action ct clear pipe \
> > +                       action ct commit nat dst addr 192.0.2.1 pipe \
> > +                       action ct clear pipe \
> > +                       action skbedit ptype host pipe \
> > +                       action mirred ingress redirect dev $h1
> > +       tc filter add dev $h1 protocol ip pref 101 handle 101 egress flower \
> > +               $tcflags ip_proto icmp \
> > +                       action mirred ingress redirect dev $h1
> > +       tc filter add dev $h1 protocol ip pref 102 handle 102 ingress flower \
> > +               ip_proto icmp \
> > +                       action drop
> > +
> > +       ip vrf exec v$h1 nc --recv-only -w10 -l -p 12345 -o $tmpfile1  &
> > +       local rpid=$!
> > +       ip vrf exec v$h1 nc -w1 --send-only 192.0.2.2 12345 <$tmpfile
> > +       wait -n $rpid
> > +       cmp -s $tmpfile $tmpfile1
> > +       check_err $? "server output check failed"
> > +
> > +       $MZ $h1 -c 10 -p 64 -a $h1mac -b $h1mac -A 192.0.2.1 -B 192.0.2.1 \
> > +               -t icmp "ping,id=42,seq=5" -q
> > +       tc_check_packets "dev $h1 egress" 101 10
> > +       check_err $? "didn't mirred redirect ICMP"
> > +       tc_check_packets "dev $h1 ingress" 102 10
> > +       check_err $? "didn't drop mirred ICMP"
> > +       local overlimits=$(tc_rule_stats_get ${h1} 101 egress .overlimits)
> > +       test ${overlimits} = 10
> > +       check_err $? "wrong overlimits, expected 10 got ${overlimits}"
> > +
> > +       tc filter del dev $h1 egress protocol ip pref 100 handle 100 flower
> > +       tc filter del dev $h1 egress protocol ip pref 101 handle 101 flower
> > +       tc filter del dev $h1 ingress protocol ip pref 102 handle 102 flower
> > +
> > +       rm -f $tmpfile $tmpfile1
> > +       log_test "mirred_egress_to_ingress_tcp ($tcflags)"
> > +}
> > +
> >  setup_prepare()
> >  {
> >         h1=${NETIFS[p1]}
> > --
> > 2.38.1
> >
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC net-next 2/2] act_mirred: use the backlog for nested calls to mirred ingress
  2023-01-05 17:08     ` Marcelo Ricardo Leitner
@ 2023-01-09 15:59       ` Jamal Hadi Salim
  2023-01-09 18:39         ` Jakub Kicinski
  0 siblings, 1 reply; 8+ messages in thread
From: Jamal Hadi Salim @ 2023-01-09 15:59 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Davide Caratti, netdev, jiri, pabeni, wizhao, xiyou.wangcong, lucien.xin

Sorry, I thought it was addressing the issue we discussed last time.
Lets discuss in our monthly meeting.

cheers,
jamal

On Thu, Jan 5, 2023 at 12:08 PM Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
>
> Hi Jamal,
>
> On Mon, Dec 26, 2022 at 11:03:34AM -0500, Jamal Hadi Salim wrote:
> > Davide,
>
> He's on PTO, so please let me try to answer it here.
>
> > So this would fix the false positive you are seeing i believe;
> > however, will it fix a real loop?
> > In particular I am not sure if the next packet grabbed from the
> > backlog will end up in the
> > same CPU.
>
> This series is not changing, functionally speaking, any about that.
> As explained on the first patch, this "loop protection" already
> doesn't work when RFS or skb timestamps are used. The idea on that
> protection was different from the beginning already - to actually
> protect from stack growth, and not packet loop. So the first patch
> here amends the wording on it to something closer to reality.
>
> Makes sense?
>
> Cheers,
> Marcelo
>
> >
> > cheers,
> > jamal
> >
> > On Tue, Dec 20, 2022 at 1:25 PM Davide Caratti <dcaratti@redhat.com> wrote:
> > >
> > > William reports kernel soft-lockups on some OVS topologies when TC mirred
> > > egress->ingress action is hit by local TCP traffic [1].
> > > The same can also be reproduced with SCTP (thanks Xin for verifying), when
> > > client and server reach themselves through mirred egress to ingress, and
> > > one of the two peers sends a "heartbeat" packet (from within a timer).
> > >
> > > Enqueueing to backlog proved to fix this soft lockup; however, as Cong
> > > noticed [2], we should preserve - when possible - the current mirred
> > > behavior that counts as "overlimits" any eventual packet drop subsequent to
> > > the mirred forwarding action [3]. A compromise solution might use the
> > > backlog only when tcf_mirred_act() has a nest level greater than one:
> > > change tcf_mirred_forward() accordingly.
> > >
> > > Also, add a kselftest that can reproduce the lockup and verifies TC mirred
> > > ability to account for further packet drops after TC mirred egress->ingress
> > > (when the nest level is 1).
> > >
> > >  [1] https://lore.kernel.org/netdev/33dc43f587ec1388ba456b4915c75f02a8aae226.1663945716.git.dcaratti@redhat.com/
> > >  [2] https://lore.kernel.org/netdev/Y0w%2FWWY60gqrtGLp@pop-os.localdomain/
> > >  [3] such behavior is not guaranteed: for example, if RPS or skb RX
> > >      timestamping is enabled on the mirred target device, the kernel
> > >      can defer receiving the skb and return NET_RX_SUCCESS inside
> > >      tcf_mirred_forward().
> > >
> > > Reported-by: William Zhao <wizhao@redhat.com>
> > > CC: Xin Long <lucien.xin@gmail.com>
> > > Signed-off-by: Davide Caratti <dcaratti@redhat.com>
> > > ---
> > >  net/sched/act_mirred.c                        |  7 +++
> > >  .../selftests/net/forwarding/tc_actions.sh    | 49 ++++++++++++++++++-
> > >  2 files changed, 55 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
> > > index c8abb5136491..8037ec9b1d31 100644
> > > --- a/net/sched/act_mirred.c
> > > +++ b/net/sched/act_mirred.c
> > > @@ -206,12 +206,19 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
> > >         return err;
> > >  }
> > >
> > > +static bool is_mirred_nested(void)
> > > +{
> > > +       return unlikely(__this_cpu_read(mirred_nest_level) > 1);
> > > +}
> > > +
> > >  static int tcf_mirred_forward(bool want_ingress, struct sk_buff *skb)
> > >  {
> > >         int err;
> > >
> > >         if (!want_ingress)
> > >                 err = tcf_dev_queue_xmit(skb, dev_queue_xmit);
> > > +       else if (is_mirred_nested())
> > > +               err = netif_rx(skb);
> > >         else
> > >                 err = netif_receive_skb(skb);
> > >
> > > diff --git a/tools/testing/selftests/net/forwarding/tc_actions.sh b/tools/testing/selftests/net/forwarding/tc_actions.sh
> > > index 1e0a62f638fe..919c0dd9fe4b 100755
> > > --- a/tools/testing/selftests/net/forwarding/tc_actions.sh
> > > +++ b/tools/testing/selftests/net/forwarding/tc_actions.sh
> > > @@ -3,7 +3,8 @@
> > >
> > >  ALL_TESTS="gact_drop_and_ok_test mirred_egress_redirect_test \
> > >         mirred_egress_mirror_test matchall_mirred_egress_mirror_test \
> > > -       gact_trap_test mirred_egress_to_ingress_test"
> > > +       gact_trap_test mirred_egress_to_ingress_test \
> > > +       mirred_egress_to_ingress_tcp_test"
> > >  NUM_NETIFS=4
> > >  source tc_common.sh
> > >  source lib.sh
> > > @@ -198,6 +199,52 @@ mirred_egress_to_ingress_test()
> > >         log_test "mirred_egress_to_ingress ($tcflags)"
> > >  }
> > >
> > > +mirred_egress_to_ingress_tcp_test()
> > > +{
> > > +       local tmpfile=$(mktemp) tmpfile1=$(mktemp)
> > > +
> > > +       RET=0
> > > +       dd conv=sparse status=none if=/dev/zero bs=1M count=2 of=$tmpfile
> > > +       tc filter add dev $h1 protocol ip pref 100 handle 100 egress flower \
> > > +               $tcflags ip_proto tcp src_ip 192.0.2.1 dst_ip 192.0.2.2 \
> > > +                       action ct commit nat src addr 192.0.2.2 pipe \
> > > +                       action ct clear pipe \
> > > +                       action ct commit nat dst addr 192.0.2.1 pipe \
> > > +                       action ct clear pipe \
> > > +                       action skbedit ptype host pipe \
> > > +                       action mirred ingress redirect dev $h1
> > > +       tc filter add dev $h1 protocol ip pref 101 handle 101 egress flower \
> > > +               $tcflags ip_proto icmp \
> > > +                       action mirred ingress redirect dev $h1
> > > +       tc filter add dev $h1 protocol ip pref 102 handle 102 ingress flower \
> > > +               ip_proto icmp \
> > > +                       action drop
> > > +
> > > +       ip vrf exec v$h1 nc --recv-only -w10 -l -p 12345 -o $tmpfile1  &
> > > +       local rpid=$!
> > > +       ip vrf exec v$h1 nc -w1 --send-only 192.0.2.2 12345 <$tmpfile
> > > +       wait -n $rpid
> > > +       cmp -s $tmpfile $tmpfile1
> > > +       check_err $? "server output check failed"
> > > +
> > > +       $MZ $h1 -c 10 -p 64 -a $h1mac -b $h1mac -A 192.0.2.1 -B 192.0.2.1 \
> > > +               -t icmp "ping,id=42,seq=5" -q
> > > +       tc_check_packets "dev $h1 egress" 101 10
> > > +       check_err $? "didn't mirred redirect ICMP"
> > > +       tc_check_packets "dev $h1 ingress" 102 10
> > > +       check_err $? "didn't drop mirred ICMP"
> > > +       local overlimits=$(tc_rule_stats_get ${h1} 101 egress .overlimits)
> > > +       test ${overlimits} = 10
> > > +       check_err $? "wrong overlimits, expected 10 got ${overlimits}"
> > > +
> > > +       tc filter del dev $h1 egress protocol ip pref 100 handle 100 flower
> > > +       tc filter del dev $h1 egress protocol ip pref 101 handle 101 flower
> > > +       tc filter del dev $h1 ingress protocol ip pref 102 handle 102 flower
> > > +
> > > +       rm -f $tmpfile $tmpfile1
> > > +       log_test "mirred_egress_to_ingress_tcp ($tcflags)"
> > > +}
> > > +
> > >  setup_prepare()
> > >  {
> > >         h1=${NETIFS[p1]}
> > > --
> > > 2.38.1
> > >
> >

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC net-next 2/2] act_mirred: use the backlog for nested calls to mirred ingress
  2023-01-09 15:59       ` Jamal Hadi Salim
@ 2023-01-09 18:39         ` Jakub Kicinski
  2023-01-09 20:46           ` Jamal Hadi Salim
  0 siblings, 1 reply; 8+ messages in thread
From: Jakub Kicinski @ 2023-01-09 18:39 UTC (permalink / raw)
  To: Jamal Hadi Salim, xiyou.wangcong
  Cc: Marcelo Ricardo Leitner, Davide Caratti, netdev, jiri, pabeni,
	wizhao, lucien.xin

On Mon, 9 Jan 2023 10:59:49 -0500 Jamal Hadi Salim wrote:
> Sorry, I thought it was addressing the issue we discussed last time.
> Lets discuss in our monthly meeting.

When is your meeting? One could consider this patch as a fix and it
looks kinda "obviously correct", so the need for delays and discussions
is a bit lost on me.

Cong, WDYT?

Reminder: please don't top post and trim your replies.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC net-next 2/2] act_mirred: use the backlog for nested calls to mirred ingress
  2023-01-09 18:39         ` Jakub Kicinski
@ 2023-01-09 20:46           ` Jamal Hadi Salim
  2023-01-11 14:26             ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 8+ messages in thread
From: Jamal Hadi Salim @ 2023-01-09 20:46 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: xiyou.wangcong, Marcelo Ricardo Leitner, Davide Caratti, netdev,
	jiri, pabeni, wizhao, lucien.xin

On Mon, Jan 9, 2023 at 1:39 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 9 Jan 2023 10:59:49 -0500 Jamal Hadi Salim wrote:
> > Sorry, I thought it was addressing the issue we discussed last time.
> > Lets discuss in our monthly meeting.
>
> When is your meeting? One could consider this patch as a fix and it
> looks kinda "obviously correct", so the need for delays and discussions
> is a bit lost on me.

The original issue was discussed in our meetup and the monthly meetup was
today - where we agreed this patch solves the outstanding issue.

No idea what is going on with the stoopid mail client; i gave up on thunderbird
but possibly the OP used html and this thing is acting in kind. I will double
check next time...


cheers,
jamal

> Cong, WDYT?
>
> Reminder: please don't top post and trim your replies.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC net-next 2/2] act_mirred: use the backlog for nested calls to mirred ingress
  2023-01-09 20:46           ` Jamal Hadi Salim
@ 2023-01-11 14:26             ` Marcelo Ricardo Leitner
  0 siblings, 0 replies; 8+ messages in thread
From: Marcelo Ricardo Leitner @ 2023-01-11 14:26 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Jakub Kicinski, xiyou.wangcong, Davide Caratti, netdev, jiri,
	pabeni, wizhao, lucien.xin

On Mon, Jan 09, 2023 at 03:46:10PM -0500, Jamal Hadi Salim wrote:
> On Mon, Jan 9, 2023 at 1:39 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Mon, 9 Jan 2023 10:59:49 -0500 Jamal Hadi Salim wrote:
> > > Sorry, I thought it was addressing the issue we discussed last time.
> > > Lets discuss in our monthly meeting.
> >
> > When is your meeting? One could consider this patch as a fix and it
> > looks kinda "obviously correct", so the need for delays and discussions
> > is a bit lost on me.

Long story short, this fix is 3 months old by now. Cong has been very
hesitant to it.

> 
> The original issue was discussed in our meetup and the monthly meetup was
> today - where we agreed this patch solves the outstanding issue.

Unfortunatelly Cong didn't attend the mtg, though. Not sure if he is
on PTO or what. We do believe, though, that the patch is now
addressing all his concerns.

  Marcelo

> 
> No idea what is going on with the stoopid mail client; i gave up on thunderbird
> but possibly the OP used html and this thing is acting in kind. I will double
> check next time...
> 
> 
> cheers,
> jamal
> 
> > Cong, WDYT?
> >
> > Reminder: please don't top post and trim your replies.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-01-11 14:29 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-20 18:25 [RFC net-next 1/2] net/sched: act_mirred: better wording on protection against excessive stack growth Davide Caratti
2022-12-20 18:25 ` [RFC net-next 2/2] act_mirred: use the backlog for nested calls to mirred ingress Davide Caratti
2022-12-26 16:03   ` Jamal Hadi Salim
2023-01-05 17:08     ` Marcelo Ricardo Leitner
2023-01-09 15:59       ` Jamal Hadi Salim
2023-01-09 18:39         ` Jakub Kicinski
2023-01-09 20:46           ` Jamal Hadi Salim
2023-01-11 14:26             ` Marcelo Ricardo Leitner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).