All of lore.kernel.org
 help / color / mirror / Atom feed
* [net v5 0/3] fix bpf_redirect to ifb netdev
@ 2021-12-08 14:54 xiangxia.m.yue
  2021-12-08 14:54 ` [net v5 1/3] net: core: set skb useful vars in __bpf_tx_skb xiangxia.m.yue
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: xiangxia.m.yue @ 2021-12-08 14:54 UTC (permalink / raw)
  To: netdev
  Cc: Tonghao Zhang, David S. Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Eric Dumazet, Antoine Tenart, Alexander Lobakin,
	Wei Wang, Arnd Bergmann

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

This patchset try to fix bpf_redirect to ifb netdev.
Prevent packets loopback and perfromance drop, add check
in sch egress.

Tonghao Zhang (3):
  net: core: set skb useful vars in __bpf_tx_skb
  net: sched: add check tc_skip_classify in sch egress
  selftests: bpf: add bpf_redirect to ifb

 net/core/dev.c                                |  3 +
 net/core/filter.c                             | 12 ++-
 tools/testing/selftests/bpf/Makefile          |  1 +
 .../bpf/progs/test_bpf_redirect_ifb.c         | 13 ++++
 .../selftests/bpf/test_bpf_redirect_ifb.sh    | 73 +++++++++++++++++++
 5 files changed, 101 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_bpf_redirect_ifb.c
 create mode 100755 tools/testing/selftests/bpf/test_bpf_redirect_ifb.sh

-- 
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Antoine Tenart <atenart@kernel.org>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Wei Wang <weiwan@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
--
2.27.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [net v5 1/3] net: core: set skb useful vars in __bpf_tx_skb
  2021-12-08 14:54 [net v5 0/3] fix bpf_redirect to ifb netdev xiangxia.m.yue
@ 2021-12-08 14:54 ` xiangxia.m.yue
  2021-12-08 14:54 ` [net v5 2/3] net: sched: add check tc_skip_classify in sch egress xiangxia.m.yue
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 19+ messages in thread
From: xiangxia.m.yue @ 2021-12-08 14:54 UTC (permalink / raw)
  To: netdev
  Cc: Tonghao Zhang, David S. Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Eric Dumazet, Antoine Tenart, Alexander Lobakin,
	Wei Wang, Arnd Bergmann

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

We may use bpf_redirect to redirect the packets to other
netdevice (e.g. ifb) in ingress or egress path.

The target netdevice may check the *skb_iif, *redirected
and *from_ingress. For example, if skb_iif or redirected
is 0, ifb will drop the packets.

bpf_redirect may be invoked in ingress or egress path, so
we set the *skb_iif unconditionally.

Fixes: a70b506efe89 ("bpf: enforce recursion limit on redirects")
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Antoine Tenart <atenart@kernel.org>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Wei Wang <weiwan@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 net/core/filter.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 8271624a19aa..bcfdce9e99f4 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2107,9 +2107,19 @@ static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb)
 		return -ENETDOWN;
 	}
 
-	skb->dev = dev;
+	/* The target netdevice (e.g. ifb) may use the:
+	 * - skb_iif, bpf_redirect invoked in ingress or egress path.
+	 * - redirected
+	 * - from_ingress
+	 */
+	skb->skb_iif = skb->dev->ifindex;
+#ifdef CONFIG_NET_CLS_ACT
+	skb_set_redirected(skb, skb->tc_at_ingress);
+#else
 	skb->tstamp = 0;
+#endif
 
+	skb->dev = dev;
 	dev_xmit_recursion_inc();
 	ret = dev_queue_xmit(skb);
 	dev_xmit_recursion_dec();
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [net v5 2/3] net: sched: add check tc_skip_classify in sch egress
  2021-12-08 14:54 [net v5 0/3] fix bpf_redirect to ifb netdev xiangxia.m.yue
  2021-12-08 14:54 ` [net v5 1/3] net: core: set skb useful vars in __bpf_tx_skb xiangxia.m.yue
@ 2021-12-08 14:54 ` xiangxia.m.yue
  2021-12-10 16:43   ` John Fastabend
  2021-12-08 14:54 ` [net v5 3/3] selftests: bpf: add bpf_redirect to ifb xiangxia.m.yue
  2021-12-08 15:41 ` [net v5 0/3] fix bpf_redirect to ifb netdev Alexander Lobakin
  3 siblings, 1 reply; 19+ messages in thread
From: xiangxia.m.yue @ 2021-12-08 14:54 UTC (permalink / raw)
  To: netdev
  Cc: Tonghao Zhang, David S. Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Eric Dumazet, Antoine Tenart, Alexander Lobakin,
	Wei Wang, Arnd Bergmann

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

Try to resolve the issues as below:
* We look up and then check tc_skip_classify flag in net
  sched layer, even though skb don't want to be classified.
  That case may consume a lot of cpu cycles. This patch
  is useful when there are a lot of filters with different
  prio. There is ~5 prio in in production, ~1% improvement.

  Rules as below:
  $ for id in $(seq 1 5); do
  $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
  $ done

* bpf_redirect may be invoked in egress path. If we don't
  check the flags and then return immediately, the packets
  will loopback.

  $ tc filter add dev eth0 egress bpf direct-action obj ifb.o sec ifb

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Antoine Tenart <atenart@kernel.org>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Wei Wang <weiwan@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 net/core/dev.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index a64297a4cc89..81ad415b78f9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3823,6 +3823,9 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 	if (!miniq)
 		return skb;
 
+	if (skb_skip_tc_classify(skb))
+		return skb;
+
 	/* qdisc_skb_cb(skb)->pkt_len was already set by the caller. */
 	qdisc_skb_cb(skb)->mru = 0;
 	qdisc_skb_cb(skb)->post_ct = false;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [net v5 3/3] selftests: bpf: add bpf_redirect to ifb
  2021-12-08 14:54 [net v5 0/3] fix bpf_redirect to ifb netdev xiangxia.m.yue
  2021-12-08 14:54 ` [net v5 1/3] net: core: set skb useful vars in __bpf_tx_skb xiangxia.m.yue
  2021-12-08 14:54 ` [net v5 2/3] net: sched: add check tc_skip_classify in sch egress xiangxia.m.yue
@ 2021-12-08 14:54 ` xiangxia.m.yue
  2021-12-08 15:41 ` [net v5 0/3] fix bpf_redirect to ifb netdev Alexander Lobakin
  3 siblings, 0 replies; 19+ messages in thread
From: xiangxia.m.yue @ 2021-12-08 14:54 UTC (permalink / raw)
  To: netdev
  Cc: Tonghao Zhang, David S. Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Eric Dumazet, Antoine Tenart, Alexander Lobakin,
	Wei Wang, Arnd Bergmann

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

ifb netdev is used for queueing incoming traffic for shaping.
we may run bpf progs in tc cls hook(ingress or egress), to
redirect the packets to ifb.

This patch adds this test, for bpf.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Antoine Tenart <atenart@kernel.org>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Wei Wang <weiwan@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 tools/testing/selftests/bpf/Makefile          |  1 +
 .../bpf/progs/test_bpf_redirect_ifb.c         | 13 ++++
 .../selftests/bpf/test_bpf_redirect_ifb.sh    | 73 +++++++++++++++++++
 3 files changed, 87 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/test_bpf_redirect_ifb.c
 create mode 100755 tools/testing/selftests/bpf/test_bpf_redirect_ifb.sh

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 5d42db2e129a..6ec8b97af0ea 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -65,6 +65,7 @@ TEST_PROGS := test_kmod.sh \
 	test_xdp_vlan_mode_native.sh \
 	test_lwt_ip_encap.sh \
 	test_tcp_check_syncookie.sh \
+	test_bpf_redirect_ifb.sh \
 	test_tc_tunnel.sh \
 	test_tc_edt.sh \
 	test_xdping.sh \
diff --git a/tools/testing/selftests/bpf/progs/test_bpf_redirect_ifb.c b/tools/testing/selftests/bpf/progs/test_bpf_redirect_ifb.c
new file mode 100644
index 000000000000..8b960cd8786b
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_bpf_redirect_ifb.c
@@ -0,0 +1,13 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 DiDi Global */
+
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+SEC("redirect_ifb")
+int redirect(struct __sk_buff *skb)
+{
+	return bpf_redirect(skb->ifindex + 1 /* ifbX */, 0);
+}
+
+char __license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_bpf_redirect_ifb.sh b/tools/testing/selftests/bpf/test_bpf_redirect_ifb.sh
new file mode 100755
index 000000000000..0933439696ab
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_bpf_redirect_ifb.sh
@@ -0,0 +1,73 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+
+# Topology:
+# ---------
+#      n1 namespace    |     n2 namespace
+#                      |
+#      -----------     |     ----------------
+#      |  veth0  | --------- |  veth1, ifb1 |
+#      -----------   peer    ----------------
+#
+
+readonly prefix="ns-$$-"
+readonly ns1="${prefix}1"
+readonly ns2="${prefix}2"
+readonly ns1_addr=192.168.1.1
+readonly ns2_addr=192.168.1.2
+
+setup() {
+	echo "Load ifb module"
+	if ! /sbin/modprobe -q -n ifb; then
+		echo "test_bpf_redirect ifb: module ifb is not found [SKIP]"
+		exit 4
+	fi
+
+	modprobe -q ifb numifbs=0
+
+	ip netns add "${ns1}"
+	ip netns add "${ns2}"
+
+	ip link add dev veth0 mtu 1500 netns "${ns1}" type veth \
+	      peer name veth1 mtu 1500 netns "${ns2}"
+	# ifb1 created after veth1
+	ip link add dev ifb1 mtu 1500 netns "${ns2}" type ifb
+
+	ip -netns "${ns1}" link set veth0 up
+	ip -netns "${ns2}" link set veth1 up
+	ip -netns "${ns2}" link set ifb1 up
+	ip -netns "${ns1}" -4 addr add "${ns1_addr}/24" dev veth0
+	ip -netns "${ns2}" -4 addr add "${ns2_addr}/24" dev veth1
+
+	ip netns exec "${ns2}" tc qdisc add dev veth1 clsact
+}
+
+cleanup() {
+	ip netns del "${ns2}" &>/dev/null
+	ip netns del "${ns1}" &>/dev/null
+	modprobe -r ifb
+}
+
+trap cleanup EXIT
+
+setup
+
+ip netns exec "${ns2}" tc filter add dev veth1 \
+	ingress bpf direct-action obj test_bpf_redirect_ifb.o sec redirect_ifb
+ip netns exec "${ns1}" ping -W 2 -c 2 -i 0.2 -q "${ns2_addr}" &>/dev/null
+if [ $? -ne 0 ]; then
+	echo "bpf redirect to ifb on ingress path [FAILED]"
+	exit 1
+fi
+
+ip netns exec "${ns2}" tc filter del dev veth1 ingress
+ip netns exec "${ns2}" tc filter add dev veth1 \
+	egress bpf direct-action obj test_bpf_redirect_ifb.o sec redirect_ifb
+ip netns exec "${ns1}" ping -W 2 -c 2 -i 0.2 -q "${ns2_addr}" &>/dev/null
+if [ $? -ne 0 ]; then
+	echo "bpf redirect to ifb on egress path [FAILED]"
+	exit 1
+fi
+
+echo OK
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [net v5 0/3] fix bpf_redirect to ifb netdev
  2021-12-08 14:54 [net v5 0/3] fix bpf_redirect to ifb netdev xiangxia.m.yue
                   ` (2 preceding siblings ...)
  2021-12-08 14:54 ` [net v5 3/3] selftests: bpf: add bpf_redirect to ifb xiangxia.m.yue
@ 2021-12-08 15:41 ` Alexander Lobakin
  2021-12-08 15:53   ` Tonghao Zhang
  3 siblings, 1 reply; 19+ messages in thread
From: Alexander Lobakin @ 2021-12-08 15:41 UTC (permalink / raw)
  To: Tonghao Zhang
  Cc: Alexander Lobakin, David S. Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Eric Dumazet, Antoine Tenart, Wei Wang, Arnd Bergmann,
	netdev, linux-kernel

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Date: Wed,  8 Dec 2021 22:54:56 +0800

> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> 
> This patchset try to fix bpf_redirect to ifb netdev.
> Prevent packets loopback and perfromance drop, add check
> in sch egress.

Please provide a changelog in the cover letter. With the links to
your previous versions ideally.
Otherwise it becomes difficult to understand what are the changes
between them.

>
> Tonghao Zhang (3):
>   net: core: set skb useful vars in __bpf_tx_skb
>   net: sched: add check tc_skip_classify in sch egress
>   selftests: bpf: add bpf_redirect to ifb
> 
>  net/core/dev.c                                |  3 +
>  net/core/filter.c                             | 12 ++-
>  tools/testing/selftests/bpf/Makefile          |  1 +
>  .../bpf/progs/test_bpf_redirect_ifb.c         | 13 ++++
>  .../selftests/bpf/test_bpf_redirect_ifb.sh    | 73 +++++++++++++++++++
>  5 files changed, 101 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/bpf/progs/test_bpf_redirect_ifb.c
>  create mode 100755 tools/testing/selftests/bpf/test_bpf_redirect_ifb.sh
> 
> -- 
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Andrii Nakryiko <andrii@kernel.org>
> Cc: Martin KaFai Lau <kafai@fb.com>
> Cc: Song Liu <songliubraving@fb.com>
> Cc: Yonghong Song <yhs@fb.com>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: KP Singh <kpsingh@kernel.org>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Antoine Tenart <atenart@kernel.org>
> Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> Cc: Wei Wang <weiwan@google.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> --
> 2.27.0

Al

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [net v5 0/3] fix bpf_redirect to ifb netdev
  2021-12-08 15:41 ` [net v5 0/3] fix bpf_redirect to ifb netdev Alexander Lobakin
@ 2021-12-08 15:53   ` Tonghao Zhang
  0 siblings, 0 replies; 19+ messages in thread
From: Tonghao Zhang @ 2021-12-08 15:53 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: David S. Miller, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Eric Dumazet,
	Antoine Tenart, Wei Wang, Arnd Bergmann,
	Linux Kernel Network Developers, LKML

On Wed, Dec 8, 2021 at 11:42 PM Alexander Lobakin
<alexandr.lobakin@intel.com> wrote:
>
> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> Date: Wed,  8 Dec 2021 22:54:56 +0800
>
> > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >
> > This patchset try to fix bpf_redirect to ifb netdev.
> > Prevent packets loopback and perfromance drop, add check
> > in sch egress.
>
> Please provide a changelog in the cover letter. With the links to
> your previous versions ideally.
> Otherwise it becomes difficult to understand what are the changes
> between them.
Hi Alexander
This version of patchset, 2/3 only updates the commit message. because the
example in the commit message is not a usual case.
There are no  comments, so I sent them again.
I will provide a changelog in the next version or resend this version
again.  Thanks.

> >
> > Tonghao Zhang (3):
> >   net: core: set skb useful vars in __bpf_tx_skb
> >   net: sched: add check tc_skip_classify in sch egress
> >   selftests: bpf: add bpf_redirect to ifb
> >
> >  net/core/dev.c                                |  3 +
> >  net/core/filter.c                             | 12 ++-
> >  tools/testing/selftests/bpf/Makefile          |  1 +
> >  .../bpf/progs/test_bpf_redirect_ifb.c         | 13 ++++
> >  .../selftests/bpf/test_bpf_redirect_ifb.sh    | 73 +++++++++++++++++++
> >  5 files changed, 101 insertions(+), 1 deletion(-)
> >  create mode 100644 tools/testing/selftests/bpf/progs/test_bpf_redirect_ifb.c
> >  create mode 100755 tools/testing/selftests/bpf/test_bpf_redirect_ifb.sh
> >
> > --
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Cc: Alexei Starovoitov <ast@kernel.org>
> > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > Cc: Andrii Nakryiko <andrii@kernel.org>
> > Cc: Martin KaFai Lau <kafai@fb.com>
> > Cc: Song Liu <songliubraving@fb.com>
> > Cc: Yonghong Song <yhs@fb.com>
> > Cc: John Fastabend <john.fastabend@gmail.com>
> > Cc: KP Singh <kpsingh@kernel.org>
> > Cc: Eric Dumazet <edumazet@google.com>
> > Cc: Antoine Tenart <atenart@kernel.org>
> > Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> > Cc: Wei Wang <weiwan@google.com>
> > Cc: Arnd Bergmann <arnd@arndb.de>
> > --
> > 2.27.0
>
> Al



-- 
Best regards, Tonghao

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [net v5 2/3] net: sched: add check tc_skip_classify in sch egress
  2021-12-08 14:54 ` [net v5 2/3] net: sched: add check tc_skip_classify in sch egress xiangxia.m.yue
@ 2021-12-10 16:43   ` John Fastabend
  2021-12-10 16:52     ` John Fastabend
  2021-12-10 17:37     ` Tonghao Zhang
  0 siblings, 2 replies; 19+ messages in thread
From: John Fastabend @ 2021-12-10 16:43 UTC (permalink / raw)
  To: xiangxia.m.yue, netdev
  Cc: Tonghao Zhang, David S. Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Eric Dumazet, Antoine Tenart, Alexander Lobakin,
	Wei Wang, Arnd Bergmann

xiangxia.m.yue@ wrote:
> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> 
> Try to resolve the issues as below:
> * We look up and then check tc_skip_classify flag in net
>   sched layer, even though skb don't want to be classified.
>   That case may consume a lot of cpu cycles. This patch
>   is useful when there are a lot of filters with different
>   prio. There is ~5 prio in in production, ~1% improvement.
> 
>   Rules as below:
>   $ for id in $(seq 1 5); do
>   $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
>   $ done
> 
> * bpf_redirect may be invoked in egress path. If we don't
>   check the flags and then return immediately, the packets
>   will loopback.

This would be the naive case right? Meaning the BPF program is
doing a redirect without any logic or is buggy?

Can you map out how this happens for me, I'm not fully sure I
understand the exact concern. Is it possible for BPF programs
that used to see packets no longer see the packet as expected?

Is this the path you are talking about?

 rx ethx  ->
   execute BPF program on ethx with bpf_redirect(ifb0) ->
     __skb_dequeue @ifb tc_skip_classify = 1 ->
       dev_queue_xmit() -> 
          sch_handle_egress() ->
            execute BPF program again

I can't see why you want to skip that second tc BPF program,
or for that matter any tc filter there. In general how do you
know that is the correct/expected behavior? Before the above
change it would have been called, what if its doing useful
work.

Also its not clear how your ifb setup is built or used. That
might help understand your use case. I would just remove the
IFB altogether and the above discussion is mute.

Thanks,
John

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [net v5 2/3] net: sched: add check tc_skip_classify in sch egress
  2021-12-10 16:43   ` John Fastabend
@ 2021-12-10 16:52     ` John Fastabend
  2021-12-10 17:43       ` Tonghao Zhang
  2021-12-10 17:37     ` Tonghao Zhang
  1 sibling, 1 reply; 19+ messages in thread
From: John Fastabend @ 2021-12-10 16:52 UTC (permalink / raw)
  To: John Fastabend, xiangxia.m.yue, netdev
  Cc: Tonghao Zhang, David S. Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Eric Dumazet, Antoine Tenart, Alexander Lobakin,
	Wei Wang, Arnd Bergmann

John Fastabend wrote:
> xiangxia.m.yue@ wrote:
> > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > 
> > Try to resolve the issues as below:
> > * We look up and then check tc_skip_classify flag in net
> >   sched layer, even though skb don't want to be classified.
> >   That case may consume a lot of cpu cycles. This patch
> >   is useful when there are a lot of filters with different
> >   prio. There is ~5 prio in in production, ~1% improvement.
> > 
> >   Rules as below:
> >   $ for id in $(seq 1 5); do
> >   $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
> >   $ done
> > 
> > * bpf_redirect may be invoked in egress path. If we don't
> >   check the flags and then return immediately, the packets
> >   will loopback.
> 
> This would be the naive case right? Meaning the BPF program is
> doing a redirect without any logic or is buggy?
> 
> Can you map out how this happens for me, I'm not fully sure I
> understand the exact concern. Is it possible for BPF programs
> that used to see packets no longer see the packet as expected?
> 
> Is this the path you are talking about?
> 
>  rx ethx  ->
>    execute BPF program on ethx with bpf_redirect(ifb0) ->
>      __skb_dequeue @ifb tc_skip_classify = 1 ->
>        dev_queue_xmit() -> 
>           sch_handle_egress() ->
>             execute BPF program again
> 
> I can't see why you want to skip that second tc BPF program,
> or for that matter any tc filter there. In general how do you
> know that is the correct/expected behavior? Before the above
> change it would have been called, what if its doing useful
> work.
> 
> Also its not clear how your ifb setup is built or used. That
> might help understand your use case. I would just remove the
> IFB altogether and the above discussion is mute.
> 
> Thanks,
> John

After a bit further thought (and coffee) I think this will
break some programs that exist today. Consider the case
where I pop a header off and resubmit to the same device
intentionally to reprocess the pkt without the header. I've
used this pattern in BPF a few times.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [net v5 2/3] net: sched: add check tc_skip_classify in sch egress
  2021-12-10 16:43   ` John Fastabend
  2021-12-10 16:52     ` John Fastabend
@ 2021-12-10 17:37     ` Tonghao Zhang
  2021-12-10 17:46       ` Tonghao Zhang
  1 sibling, 1 reply; 19+ messages in thread
From: Tonghao Zhang @ 2021-12-10 17:37 UTC (permalink / raw)
  To: John Fastabend
  Cc: Linux Kernel Network Developers, David S. Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Eric Dumazet, Antoine Tenart, Alexander Lobakin, Wei Wang,
	Arnd Bergmann

On Sat, Dec 11, 2021 at 12:43 AM John Fastabend
<john.fastabend@gmail.com> wrote:
>
> xiangxia.m.yue@ wrote:
> > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >
> > Try to resolve the issues as below:
> > * We look up and then check tc_skip_classify flag in net
> >   sched layer, even though skb don't want to be classified.
> >   That case may consume a lot of cpu cycles. This patch
> >   is useful when there are a lot of filters with different
> >   prio. There is ~5 prio in in production, ~1% improvement.
> >
> >   Rules as below:
> >   $ for id in $(seq 1 5); do
> >   $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
> >   $ done
> >
> > * bpf_redirect may be invoked in egress path. If we don't
> >   check the flags and then return immediately, the packets
> >   will loopback.
>
> This would be the naive case right? Meaning the BPF program is
> doing a redirect without any logic or is buggy?
>
> Can you map out how this happens for me, I'm not fully sure I
> understand the exact concern. Is it possible for BPF programs
> that used to see packets no longer see the packet as expected?
>
> Is this the path you are talking about?
Hi John
Tx ethx -> __dev_queue_xmit -> sch_handle_egress
->  execute BPF program on ethx with bpf_redirect(ifb0) ->
-> ifb_xmit -> ifb_ri_tasklet -> dev_queue_xmit -> __dev_queue_xmit
the packets loopbacks, that means bpf_redirect doesn't work with ifb
netdev, right ?
so in sch_handle_egress, I add the check skb_skip_tc_classify().

>  rx ethx  ->
>    execute BPF program on ethx with bpf_redirect(ifb0) ->
>      __skb_dequeue @ifb tc_skip_classify = 1 ->
>        dev_queue_xmit() ->
>           sch_handle_egress() ->
>             execute BPF program again
>
> I can't see why you want to skip that second tc BPF program,
> or for that matter any tc filter there. In general how do you
> know that is the correct/expected behavior? Before the above
> change it would have been called, what if its doing useful
> work.
bpf_redirect works fine on ingress with ifb
__netif_receive_skb_core -> sch_handle_ingress -> bpf_redirect (ifb0)
-> ifb_xmit -> netif_receive_skb -> __netif_receive_skb_core
but
__netif_receive_skb_core --> skb_skip_tc_classify(so the packets will
execute the BPF progam again)

> Also its not clear how your ifb setup is built or used. That
> might help understand your use case. I would just remove the
> IFB altogether and the above discussion is mute.
tc filter add dev veth1 egress bpf direct-action obj
test_bpf_redirect_ifb.o sec redirect_ifb

the test_bpf_redirect_ifb  bpf progam:
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 DiDi Global */
+
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+SEC("redirect_ifb")
+int redirect(struct __sk_buff *skb)
+{
+       return bpf_redirect(skb->ifindex + 1 /* ifbX */, 0);
+}
+
+char __license[] SEC("license") = "GPL";

The 3/3 is selftest:
https://patchwork.kernel.org/project/netdevbpf/patch/20211208145459.9590-4-xiangxia.m.yue@gmail.com/

> Thanks,
> John



-- 
Best regards, Tonghao

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [net v5 2/3] net: sched: add check tc_skip_classify in sch egress
  2021-12-10 16:52     ` John Fastabend
@ 2021-12-10 17:43       ` Tonghao Zhang
  0 siblings, 0 replies; 19+ messages in thread
From: Tonghao Zhang @ 2021-12-10 17:43 UTC (permalink / raw)
  To: John Fastabend
  Cc: Linux Kernel Network Developers, David S. Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Eric Dumazet, Antoine Tenart, Alexander Lobakin, Wei Wang,
	Arnd Bergmann

On Sat, Dec 11, 2021 at 12:52 AM John Fastabend
<john.fastabend@gmail.com> wrote:
>
> John Fastabend wrote:
> > xiangxia.m.yue@ wrote:
> > > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > >
> > > Try to resolve the issues as below:
> > > * We look up and then check tc_skip_classify flag in net
> > >   sched layer, even though skb don't want to be classified.
> > >   That case may consume a lot of cpu cycles. This patch
> > >   is useful when there are a lot of filters with different
> > >   prio. There is ~5 prio in in production, ~1% improvement.
> > >
> > >   Rules as below:
> > >   $ for id in $(seq 1 5); do
> > >   $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
> > >   $ done
> > >
> > > * bpf_redirect may be invoked in egress path. If we don't
> > >   check the flags and then return immediately, the packets
> > >   will loopback.
> >
> > This would be the naive case right? Meaning the BPF program is
> > doing a redirect without any logic or is buggy?
> >
> > Can you map out how this happens for me, I'm not fully sure I
> > understand the exact concern. Is it possible for BPF programs
> > that used to see packets no longer see the packet as expected?
> >
> > Is this the path you are talking about?
> >
> >  rx ethx  ->
> >    execute BPF program on ethx with bpf_redirect(ifb0) ->
> >      __skb_dequeue @ifb tc_skip_classify = 1 ->
> >        dev_queue_xmit() ->
> >           sch_handle_egress() ->
> >             execute BPF program again
> >
> > I can't see why you want to skip that second tc BPF program,
> > or for that matter any tc filter there. In general how do you
> > know that is the correct/expected behavior? Before the above
> > change it would have been called, what if its doing useful
> > work.
> >
> > Also its not clear how your ifb setup is built or used. That
> > might help understand your use case. I would just remove the
> > IFB altogether and the above discussion is mute.
> >
> > Thanks,
> > John
>
> After a bit further thought (and coffee) I think this will
> break some programs that exist today. Consider the case
> where I pop a header off and resubmit to the same device
> intentionally to reprocess the pkt without the header. I've
> used this pattern in BPF a few times.
No, ifb netdev sets the skb->tc_skip_classify = 1, that means we
should not process the skb again, no matter on egress or ingress.
if the bpf programs don't set the skb->tc_skip_classify = 1, then we
can process them again.

-- 
Best regards, Tonghao

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [net v5 2/3] net: sched: add check tc_skip_classify in sch egress
  2021-12-10 17:37     ` Tonghao Zhang
@ 2021-12-10 17:46       ` Tonghao Zhang
  2021-12-10 19:54         ` Tonghao Zhang
  0 siblings, 1 reply; 19+ messages in thread
From: Tonghao Zhang @ 2021-12-10 17:46 UTC (permalink / raw)
  To: John Fastabend
  Cc: Linux Kernel Network Developers, David S. Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Eric Dumazet, Antoine Tenart, Alexander Lobakin, Wei Wang,
	Arnd Bergmann

On Sat, Dec 11, 2021 at 1:37 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>
> On Sat, Dec 11, 2021 at 12:43 AM John Fastabend
> <john.fastabend@gmail.com> wrote:
> >
> > xiangxia.m.yue@ wrote:
> > > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > >
> > > Try to resolve the issues as below:
> > > * We look up and then check tc_skip_classify flag in net
> > >   sched layer, even though skb don't want to be classified.
> > >   That case may consume a lot of cpu cycles. This patch
> > >   is useful when there are a lot of filters with different
> > >   prio. There is ~5 prio in in production, ~1% improvement.
> > >
> > >   Rules as below:
> > >   $ for id in $(seq 1 5); do
> > >   $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
> > >   $ done
> > >
> > > * bpf_redirect may be invoked in egress path. If we don't
> > >   check the flags and then return immediately, the packets
> > >   will loopback.
> >
> > This would be the naive case right? Meaning the BPF program is
> > doing a redirect without any logic or is buggy?
> >
> > Can you map out how this happens for me, I'm not fully sure I
> > understand the exact concern. Is it possible for BPF programs
> > that used to see packets no longer see the packet as expected?
> >
> > Is this the path you are talking about?
> Hi John
> Tx ethx -> __dev_queue_xmit -> sch_handle_egress
> ->  execute BPF program on ethx with bpf_redirect(ifb0) ->
> -> ifb_xmit -> ifb_ri_tasklet -> dev_queue_xmit -> __dev_queue_xmit
> the packets loopbacks, that means bpf_redirect doesn't work with ifb
> netdev, right ?
> so in sch_handle_egress, I add the check skb_skip_tc_classify().
>
> >  rx ethx  ->
> >    execute BPF program on ethx with bpf_redirect(ifb0) ->
> >      __skb_dequeue @ifb tc_skip_classify = 1 ->
> >        dev_queue_xmit() ->
> >           sch_handle_egress() ->
> >             execute BPF program again
> >
> > I can't see why you want to skip that second tc BPF program,
> > or for that matter any tc filter there. In general how do you
> > know that is the correct/expected behavior? Before the above
> > change it would have been called, what if its doing useful
> > work.
> bpf_redirect works fine on ingress with ifb
> __netif_receive_skb_core -> sch_handle_ingress -> bpf_redirect (ifb0)
> -> ifb_xmit -> netif_receive_skb -> __netif_receive_skb_core
> but
> __netif_receive_skb_core --> skb_skip_tc_classify(so the packets will
> execute the BPF progam again)
so the packets will NOT execute the BPF progam again)

> > Also its not clear how your ifb setup is built or used. That
> > might help understand your use case. I would just remove the
> > IFB altogether and the above discussion is mute.
> tc filter add dev veth1 egress bpf direct-action obj
> test_bpf_redirect_ifb.o sec redirect_ifb
>
> the test_bpf_redirect_ifb  bpf progam:
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2021 DiDi Global */
> +
> +#include <linux/bpf.h>
> +#include <bpf/bpf_helpers.h>
> +
> +SEC("redirect_ifb")
> +int redirect(struct __sk_buff *skb)
> +{
> +       return bpf_redirect(skb->ifindex + 1 /* ifbX */, 0);
> +}
> +
> +char __license[] SEC("license") = "GPL";
>
> The 3/3 is selftest:
> https://patchwork.kernel.org/project/netdevbpf/patch/20211208145459.9590-4-xiangxia.m.yue@gmail.com/
>
> > Thanks,
> > John
>
>
>
> --
> Best regards, Tonghao



-- 
Best regards, Tonghao

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [net v5 2/3] net: sched: add check tc_skip_classify in sch egress
  2021-12-10 17:46       ` Tonghao Zhang
@ 2021-12-10 19:54         ` Tonghao Zhang
  2021-12-10 20:11           ` Daniel Borkmann
  0 siblings, 1 reply; 19+ messages in thread
From: Tonghao Zhang @ 2021-12-10 19:54 UTC (permalink / raw)
  To: John Fastabend
  Cc: Linux Kernel Network Developers, David S. Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Eric Dumazet, Antoine Tenart, Alexander Lobakin, Wei Wang,
	Arnd Bergmann

On Sat, Dec 11, 2021 at 1:46 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>
> On Sat, Dec 11, 2021 at 1:37 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> >
> > On Sat, Dec 11, 2021 at 12:43 AM John Fastabend
> > <john.fastabend@gmail.com> wrote:
> > >
> > > xiangxia.m.yue@ wrote:
> > > > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > > >
> > > > Try to resolve the issues as below:
> > > > * We look up and then check tc_skip_classify flag in net
> > > >   sched layer, even though skb don't want to be classified.
> > > >   That case may consume a lot of cpu cycles. This patch
> > > >   is useful when there are a lot of filters with different
> > > >   prio. There is ~5 prio in in production, ~1% improvement.
> > > >
> > > >   Rules as below:
> > > >   $ for id in $(seq 1 5); do
> > > >   $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
> > > >   $ done
> > > >
> > > > * bpf_redirect may be invoked in egress path. If we don't
> > > >   check the flags and then return immediately, the packets
> > > >   will loopback.
> > >
> > > This would be the naive case right? Meaning the BPF program is
> > > doing a redirect without any logic or is buggy?
> > >
> > > Can you map out how this happens for me, I'm not fully sure I
> > > understand the exact concern. Is it possible for BPF programs
> > > that used to see packets no longer see the packet as expected?
> > >
> > > Is this the path you are talking about?
> > Hi John
> > Tx ethx -> __dev_queue_xmit -> sch_handle_egress
> > ->  execute BPF program on ethx with bpf_redirect(ifb0) ->
> > -> ifb_xmit -> ifb_ri_tasklet -> dev_queue_xmit -> __dev_queue_xmit
> > the packets loopbacks, that means bpf_redirect doesn't work with ifb
> > netdev, right ?
> > so in sch_handle_egress, I add the check skb_skip_tc_classify().
> >
> > >  rx ethx  ->
> > >    execute BPF program on ethx with bpf_redirect(ifb0) ->
> > >      __skb_dequeue @ifb tc_skip_classify = 1 ->
> > >        dev_queue_xmit() ->
> > >           sch_handle_egress() ->
> > >             execute BPF program again
> > >
> > > I can't see why you want to skip that second tc BPF program,
> > > or for that matter any tc filter there. In general how do you
> > > know that is the correct/expected behavior? Before the above
> > > change it would have been called, what if its doing useful
> > > work.
> > bpf_redirect works fine on ingress with ifb
> > __netif_receive_skb_core -> sch_handle_ingress -> bpf_redirect (ifb0)
> > -> ifb_xmit -> netif_receive_skb -> __netif_receive_skb_core
> > but
> > __netif_receive_skb_core --> skb_skip_tc_classify(so the packets will
> > execute the BPF progam again)
> so the packets will NOT execute the BPF progam again)
>
> > > Also its not clear how your ifb setup is built or used. That
> > > might help understand your use case. I would just remove the
> > > IFB altogether and the above discussion is mute.
> > tc filter add dev veth1 egress bpf direct-action obj
> > test_bpf_redirect_ifb.o sec redirect_ifb
> >
> > the test_bpf_redirect_ifb  bpf progam:
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Copyright (c) 2021 DiDi Global */
> > +
> > +#include <linux/bpf.h>
> > +#include <bpf/bpf_helpers.h>
> > +
> > +SEC("redirect_ifb")
> > +int redirect(struct __sk_buff *skb)
> > +{
> > +       return bpf_redirect(skb->ifindex + 1 /* ifbX */, 0);
> > +}
> > +
> > +char __license[] SEC("license") = "GPL";
> >
> > The 3/3 is selftest:
> > https://patchwork.kernel.org/project/netdevbpf/patch/20211208145459.9590-4-xiangxia.m.yue@gmail.com/
> >
> > > Thanks,
> > > John
Hi John
Did I answer your question? I hope I explained things clearly.
> >
> >
> > --
> > Best regards, Tonghao
>
>
>
> --
> Best regards, Tonghao



-- 
Best regards, Tonghao

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [net v5 2/3] net: sched: add check tc_skip_classify in sch egress
  2021-12-10 19:54         ` Tonghao Zhang
@ 2021-12-10 20:11           ` Daniel Borkmann
  2021-12-11  0:37             ` Tonghao Zhang
  2021-12-12  9:40             ` Tonghao Zhang
  0 siblings, 2 replies; 19+ messages in thread
From: Daniel Borkmann @ 2021-12-10 20:11 UTC (permalink / raw)
  To: Tonghao Zhang, John Fastabend
  Cc: Linux Kernel Network Developers, David S. Miller, Jakub Kicinski,
	Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, Eric Dumazet, Antoine Tenart,
	Alexander Lobakin, Wei Wang, Arnd Bergmann

On 12/10/21 8:54 PM, Tonghao Zhang wrote:
> On Sat, Dec 11, 2021 at 1:46 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>> On Sat, Dec 11, 2021 at 1:37 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>>> On Sat, Dec 11, 2021 at 12:43 AM John Fastabend
>>> <john.fastabend@gmail.com> wrote:
>>>> xiangxia.m.yue@ wrote:
>>>>> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>>>>>
>>>>> Try to resolve the issues as below:
>>>>> * We look up and then check tc_skip_classify flag in net
>>>>>    sched layer, even though skb don't want to be classified.
>>>>>    That case may consume a lot of cpu cycles. This patch
>>>>>    is useful when there are a lot of filters with different
>>>>>    prio. There is ~5 prio in in production, ~1% improvement.
>>>>>
>>>>>    Rules as below:
>>>>>    $ for id in $(seq 1 5); do
>>>>>    $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
>>>>>    $ done
>>>>>
>>>>> * bpf_redirect may be invoked in egress path. If we don't
>>>>>    check the flags and then return immediately, the packets
>>>>>    will loopback.
>>>>
>>>> This would be the naive case right? Meaning the BPF program is
>>>> doing a redirect without any logic or is buggy?
>>>>
>>>> Can you map out how this happens for me, I'm not fully sure I
>>>> understand the exact concern. Is it possible for BPF programs
>>>> that used to see packets no longer see the packet as expected?
>>>>
>>>> Is this the path you are talking about?
>>> Hi John
>>> Tx ethx -> __dev_queue_xmit -> sch_handle_egress
>>> ->  execute BPF program on ethx with bpf_redirect(ifb0) ->
>>> -> ifb_xmit -> ifb_ri_tasklet -> dev_queue_xmit -> __dev_queue_xmit
>>> the packets loopbacks, that means bpf_redirect doesn't work with ifb
>>> netdev, right ?
>>> so in sch_handle_egress, I add the check skb_skip_tc_classify().

But why would you do that? Usage like this is just broken by design..
If you need to loop anything back to RX, just use bpf_redirect() with
BPF_F_INGRESS? What is the concrete/actual rationale for ifb here?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [net v5 2/3] net: sched: add check tc_skip_classify in sch egress
  2021-12-10 20:11           ` Daniel Borkmann
@ 2021-12-11  0:37             ` Tonghao Zhang
  2021-12-16 12:37               ` Daniel Borkmann
  2021-12-12  9:40             ` Tonghao Zhang
  1 sibling, 1 reply; 19+ messages in thread
From: Tonghao Zhang @ 2021-12-11  0:37 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: John Fastabend, Linux Kernel Network Developers, David S. Miller,
	Jakub Kicinski, Alexei Starovoitov, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Eric Dumazet, Antoine Tenart, Alexander Lobakin, Wei Wang,
	Arnd Bergmann

On Sat, Dec 11, 2021 at 4:11 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 12/10/21 8:54 PM, Tonghao Zhang wrote:
> > On Sat, Dec 11, 2021 at 1:46 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> >> On Sat, Dec 11, 2021 at 1:37 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> >>> On Sat, Dec 11, 2021 at 12:43 AM John Fastabend
> >>> <john.fastabend@gmail.com> wrote:
> >>>> xiangxia.m.yue@ wrote:
> >>>>> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >>>>>
> >>>>> Try to resolve the issues as below:
> >>>>> * We look up and then check tc_skip_classify flag in net
> >>>>>    sched layer, even though skb don't want to be classified.
> >>>>>    That case may consume a lot of cpu cycles. This patch
> >>>>>    is useful when there are a lot of filters with different
> >>>>>    prio. There is ~5 prio in in production, ~1% improvement.
> >>>>>
> >>>>>    Rules as below:
> >>>>>    $ for id in $(seq 1 5); do
> >>>>>    $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
> >>>>>    $ done
> >>>>>
> >>>>> * bpf_redirect may be invoked in egress path. If we don't
> >>>>>    check the flags and then return immediately, the packets
> >>>>>    will loopback.
> >>>>
> >>>> This would be the naive case right? Meaning the BPF program is
> >>>> doing a redirect without any logic or is buggy?
> >>>>
> >>>> Can you map out how this happens for me, I'm not fully sure I
> >>>> understand the exact concern. Is it possible for BPF programs
> >>>> that used to see packets no longer see the packet as expected?
> >>>>
> >>>> Is this the path you are talking about?
> >>> Hi John
> >>> Tx ethx -> __dev_queue_xmit -> sch_handle_egress
> >>> ->  execute BPF program on ethx with bpf_redirect(ifb0) ->
> >>> -> ifb_xmit -> ifb_ri_tasklet -> dev_queue_xmit -> __dev_queue_xmit
> >>> the packets loopbacks, that means bpf_redirect doesn't work with ifb
> >>> netdev, right ?
> >>> so in sch_handle_egress, I add the check skb_skip_tc_classify().
>
> But why would you do that? Usage like this is just broken by design..
As I understand, we can redirect packets to a target device either at
ingress or at *egress

The commit ID: 3896d655f4d491c67d669a15f275a39f713410f8
Allow eBPF programs attached to classifier/actions to call
bpf_clone_redirect(skb, ifindex, flags) helper which will mirror or
redirect the packet by dynamic ifindex selection from within the
program to a target device either at ingress or at egress. Can be used
for various scenarios, for example, to load balance skbs into veths,
split parts of the traffic to local taps, etc.

https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=3896d655f4d491c67d669a15f275a39f713410f8
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=27b29f63058d26c6c1742f1993338280d5a41dc6

But at egress the bpf_redirect doesn't work with ifb.
> If you need to loop anything back to RX, just use bpf_redirect() with
Not use it to loop packets back. the flags of bpf_redirect is 0. for example:

tc filter add dev veth1 \
egress bpf direct-action obj test_bpf_redirect_ifb.o sec redirect_ifb
https://patchwork.kernel.org/project/netdevbpf/patch/20211208145459.9590-4-xiangxia.m.yue@gmail.com/
> BPF_F_INGRESS? What is the concrete/actual rationale for ifb here?
We load balance the packets to different ifb netdevices at egress. On
ifb, we install filters, rate limit police,




-- 
Best regards, Tonghao

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [net v5 2/3] net: sched: add check tc_skip_classify in sch egress
  2021-12-10 20:11           ` Daniel Borkmann
  2021-12-11  0:37             ` Tonghao Zhang
@ 2021-12-12  9:40             ` Tonghao Zhang
  2021-12-14  2:27               ` Tonghao Zhang
  1 sibling, 1 reply; 19+ messages in thread
From: Tonghao Zhang @ 2021-12-12  9:40 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: John Fastabend, Linux Kernel Network Developers, David S. Miller,
	Jakub Kicinski, Alexei Starovoitov, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Eric Dumazet, Antoine Tenart, Alexander Lobakin, Wei Wang,
	Arnd Bergmann

On Sat, Dec 11, 2021 at 4:11 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 12/10/21 8:54 PM, Tonghao Zhang wrote:
> > On Sat, Dec 11, 2021 at 1:46 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> >> On Sat, Dec 11, 2021 at 1:37 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> >>> On Sat, Dec 11, 2021 at 12:43 AM John Fastabend
> >>> <john.fastabend@gmail.com> wrote:
> >>>> xiangxia.m.yue@ wrote:
> >>>>> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >>>>>
> >>>>> Try to resolve the issues as below:
> >>>>> * We look up and then check tc_skip_classify flag in net
> >>>>>    sched layer, even though skb don't want to be classified.
> >>>>>    That case may consume a lot of cpu cycles. This patch
> >>>>>    is useful when there are a lot of filters with different
> >>>>>    prio. There is ~5 prio in in production, ~1% improvement.
> >>>>>
> >>>>>    Rules as below:
> >>>>>    $ for id in $(seq 1 5); do
> >>>>>    $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
> >>>>>    $ done
> >>>>>
> >>>>> * bpf_redirect may be invoked in egress path. If we don't
> >>>>>    check the flags and then return immediately, the packets
> >>>>>    will loopback.
> >>>>
> >>>> This would be the naive case right? Meaning the BPF program is
> >>>> doing a redirect without any logic or is buggy?
> >>>>
> >>>> Can you map out how this happens for me, I'm not fully sure I
> >>>> understand the exact concern. Is it possible for BPF programs
> >>>> that used to see packets no longer see the packet as expected?
> >>>>
> >>>> Is this the path you are talking about?
> >>> Hi John
> >>> Tx ethx -> __dev_queue_xmit -> sch_handle_egress
> >>> ->  execute BPF program on ethx with bpf_redirect(ifb0) ->
> >>> -> ifb_xmit -> ifb_ri_tasklet -> dev_queue_xmit -> __dev_queue_xmit
> >>> the packets loopbacks, that means bpf_redirect doesn't work with ifb
> >>> netdev, right ?
> >>> so in sch_handle_egress, I add the check skb_skip_tc_classify().
>
> But why would you do that? Usage like this is just broken by design..
> If you need to loop anything back to RX, just use bpf_redirect() with
> BPF_F_INGRESS? What is the concrete/actual rationale for ifb here?
Hi
note that: ifb_ri_tasklet can send out the packets or receive skb
ifb_ri_tasklet
                if (!skb->from_ingress) {
                        dev_queue_xmit(skb); // bpf_redirect to ifb
and ifb invoked the dev_queue_xmit in our case.
                } else {
                        skb_pull_rcsum(skb, skb->mac_len);
                        netif_receive_skb(skb);
                }
-- 
Best regards, Tonghao

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [net v5 2/3] net: sched: add check tc_skip_classify in sch egress
  2021-12-12  9:40             ` Tonghao Zhang
@ 2021-12-14  2:27               ` Tonghao Zhang
  0 siblings, 0 replies; 19+ messages in thread
From: Tonghao Zhang @ 2021-12-14  2:27 UTC (permalink / raw)
  To: Daniel Borkmann, John Fastabend
  Cc: Linux Kernel Network Developers, David S. Miller, Jakub Kicinski,
	Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, Eric Dumazet, Antoine Tenart,
	Alexander Lobakin, Wei Wang, Arnd Bergmann

On Sun, Dec 12, 2021 at 5:40 PM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>
> On Sat, Dec 11, 2021 at 4:11 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >
> > On 12/10/21 8:54 PM, Tonghao Zhang wrote:
> > > On Sat, Dec 11, 2021 at 1:46 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> > >> On Sat, Dec 11, 2021 at 1:37 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> > >>> On Sat, Dec 11, 2021 at 12:43 AM John Fastabend
> > >>> <john.fastabend@gmail.com> wrote:
> > >>>> xiangxia.m.yue@ wrote:
> > >>>>> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > >>>>>
> > >>>>> Try to resolve the issues as below:
> > >>>>> * We look up and then check tc_skip_classify flag in net
> > >>>>>    sched layer, even though skb don't want to be classified.
> > >>>>>    That case may consume a lot of cpu cycles. This patch
> > >>>>>    is useful when there are a lot of filters with different
> > >>>>>    prio. There is ~5 prio in in production, ~1% improvement.
> > >>>>>
> > >>>>>    Rules as below:
> > >>>>>    $ for id in $(seq 1 5); do
> > >>>>>    $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
> > >>>>>    $ done
> > >>>>>
> > >>>>> * bpf_redirect may be invoked in egress path. If we don't
> > >>>>>    check the flags and then return immediately, the packets
> > >>>>>    will loopback.
> > >>>>
> > >>>> This would be the naive case right? Meaning the BPF program is
> > >>>> doing a redirect without any logic or is buggy?
> > >>>>
> > >>>> Can you map out how this happens for me, I'm not fully sure I
> > >>>> understand the exact concern. Is it possible for BPF programs
> > >>>> that used to see packets no longer see the packet as expected?
> > >>>>
> > >>>> Is this the path you are talking about?
> > >>> Hi John
> > >>> Tx ethx -> __dev_queue_xmit -> sch_handle_egress
> > >>> ->  execute BPF program on ethx with bpf_redirect(ifb0) ->
> > >>> -> ifb_xmit -> ifb_ri_tasklet -> dev_queue_xmit -> __dev_queue_xmit
> > >>> the packets loopbacks, that means bpf_redirect doesn't work with ifb
> > >>> netdev, right ?
> > >>> so in sch_handle_egress, I add the check skb_skip_tc_classify().
> >
> > But why would you do that? Usage like this is just broken by design..
> > If you need to loop anything back to RX, just use bpf_redirect() with
> > BPF_F_INGRESS? What is the concrete/actual rationale for ifb here?
> Hi
> note that: ifb_ri_tasklet can send out the packets or receive skb
> ifb_ri_tasklet
>                 if (!skb->from_ingress) {
>                         dev_queue_xmit(skb); // bpf_redirect to ifb
> and ifb invoked the dev_queue_xmit in our case.
>                 } else {
>                         skb_pull_rcsum(skb, skb->mac_len);
>                         netif_receive_skb(skb);
>                 }
Hi
In this thread, I try to explain this patch, and answer questions.
What should I do next? v1-v4 is "Changes Requested" in patchwork, but
v5 is "Rejected"
Should I add more commit message in this patch, and send v6 ?
1/3, 3/3 patch still need to be reviewed.

> --
> Best regards, Tonghao



-- 
Best regards, Tonghao

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [net v5 2/3] net: sched: add check tc_skip_classify in sch egress
  2021-12-11  0:37             ` Tonghao Zhang
@ 2021-12-16 12:37               ` Daniel Borkmann
  2021-12-17  3:21                 ` Tonghao Zhang
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel Borkmann @ 2021-12-16 12:37 UTC (permalink / raw)
  To: Tonghao Zhang
  Cc: John Fastabend, Linux Kernel Network Developers, David S. Miller,
	Jakub Kicinski, Alexei Starovoitov, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Eric Dumazet, Antoine Tenart, Alexander Lobakin, Wei Wang,
	Arnd Bergmann

On 12/11/21 1:37 AM, Tonghao Zhang wrote:
> On Sat, Dec 11, 2021 at 4:11 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 12/10/21 8:54 PM, Tonghao Zhang wrote:
>>> On Sat, Dec 11, 2021 at 1:46 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>>>> On Sat, Dec 11, 2021 at 1:37 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>>>>> On Sat, Dec 11, 2021 at 12:43 AM John Fastabend
>>>>> <john.fastabend@gmail.com> wrote:
>>>>>> xiangxia.m.yue@ wrote:
[...]
>>>>> Hi John
>>>>> Tx ethx -> __dev_queue_xmit -> sch_handle_egress
>>>>> ->  execute BPF program on ethx with bpf_redirect(ifb0) ->
>>>>> -> ifb_xmit -> ifb_ri_tasklet -> dev_queue_xmit -> __dev_queue_xmit
>>>>> the packets loopbacks, that means bpf_redirect doesn't work with ifb
>>>>> netdev, right ?
>>>>> so in sch_handle_egress, I add the check skb_skip_tc_classify().
>>
>> But why would you do that? Usage like this is just broken by design..
> As I understand, we can redirect packets to a target device either at
> ingress or at *egress
> 
> The commit ID: 3896d655f4d491c67d669a15f275a39f713410f8
> Allow eBPF programs attached to classifier/actions to call
> bpf_clone_redirect(skb, ifindex, flags) helper which will mirror or
> redirect the packet by dynamic ifindex selection from within the
> program to a target device either at ingress or at egress. Can be used
> for various scenarios, for example, to load balance skbs into veths,
> split parts of the traffic to local taps, etc.
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=3896d655f4d491c67d669a15f275a39f713410f8
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=27b29f63058d26c6c1742f1993338280d5a41dc6
> 
> But at egress the bpf_redirect doesn't work with ifb.
>> If you need to loop anything back to RX, just use bpf_redirect() with
> Not use it to loop packets back. the flags of bpf_redirect is 0. for example:
> 
> tc filter add dev veth1 \
> egress bpf direct-action obj test_bpf_redirect_ifb.o sec redirect_ifb
> https://patchwork.kernel.org/project/netdevbpf/patch/20211208145459.9590-4-xiangxia.m.yue@gmail.com/
>> BPF_F_INGRESS? What is the concrete/actual rationale for ifb here?
> We load balance the packets to different ifb netdevices at egress. On
> ifb, we install filters, rate limit police,

I guess this part here is what I don't quite follow. Could you walk me through
the packet flow in this case? So you go from bpf@tc-egress@phys-dev to do the
redirect to bpf@tc-egress@ifb, and then again to bpf@tc-egress@phys-dev (same
dev or different one I presume)? Why not doing the load-balancing, applying the
policy, and doing the rate-limiting (e.g. EDT with sch_fq) directly at the initial
bpf@tc-egress@phys-dev location given bpf is perfectly capable to do all of it
there w/o the extra detour & overhead through ifb? The issue I see here is adding
extra overhead to support such a narrow case that nobody else is using and that
can be achieved already with existing infra as I understood it; the justification
right now to add the extra checks to the critical fast path is very thin..

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [net v5 2/3] net: sched: add check tc_skip_classify in sch egress
  2021-12-16 12:37               ` Daniel Borkmann
@ 2021-12-17  3:21                 ` Tonghao Zhang
  2022-01-10  1:34                   ` Tonghao Zhang
  0 siblings, 1 reply; 19+ messages in thread
From: Tonghao Zhang @ 2021-12-17  3:21 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: John Fastabend, Linux Kernel Network Developers, David S. Miller,
	Jakub Kicinski, Alexei Starovoitov, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Eric Dumazet, Antoine Tenart, Alexander Lobakin, Wei Wang,
	Arnd Bergmann

On Thu, Dec 16, 2021 at 8:37 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 12/11/21 1:37 AM, Tonghao Zhang wrote:
> > On Sat, Dec 11, 2021 at 4:11 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >> On 12/10/21 8:54 PM, Tonghao Zhang wrote:
> >>> On Sat, Dec 11, 2021 at 1:46 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> >>>> On Sat, Dec 11, 2021 at 1:37 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> >>>>> On Sat, Dec 11, 2021 at 12:43 AM John Fastabend
> >>>>> <john.fastabend@gmail.com> wrote:
> >>>>>> xiangxia.m.yue@ wrote:
> [...]
> >>>>> Hi John
> >>>>> Tx ethx -> __dev_queue_xmit -> sch_handle_egress
> >>>>> ->  execute BPF program on ethx with bpf_redirect(ifb0) ->
> >>>>> -> ifb_xmit -> ifb_ri_tasklet -> dev_queue_xmit -> __dev_queue_xmit
> >>>>> the packets loopbacks, that means bpf_redirect doesn't work with ifb
> >>>>> netdev, right ?
> >>>>> so in sch_handle_egress, I add the check skb_skip_tc_classify().
> >>
> >> But why would you do that? Usage like this is just broken by design..
> > As I understand, we can redirect packets to a target device either at
> > ingress or at *egress
> >
> > The commit ID: 3896d655f4d491c67d669a15f275a39f713410f8
> > Allow eBPF programs attached to classifier/actions to call
> > bpf_clone_redirect(skb, ifindex, flags) helper which will mirror or
> > redirect the packet by dynamic ifindex selection from within the
> > program to a target device either at ingress or at egress. Can be used
> > for various scenarios, for example, to load balance skbs into veths,
> > split parts of the traffic to local taps, etc.
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=3896d655f4d491c67d669a15f275a39f713410f8
> > https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=27b29f63058d26c6c1742f1993338280d5a41dc6
> >
> > But at egress the bpf_redirect doesn't work with ifb.
> >> If you need to loop anything back to RX, just use bpf_redirect() with
> > Not use it to loop packets back. the flags of bpf_redirect is 0. for example:
> >
> > tc filter add dev veth1 \
> > egress bpf direct-action obj test_bpf_redirect_ifb.o sec redirect_ifb
> > https://patchwork.kernel.org/project/netdevbpf/patch/20211208145459.9590-4-xiangxia.m.yue@gmail.com/
> >> BPF_F_INGRESS? What is the concrete/actual rationale for ifb here?
> > We load balance the packets to different ifb netdevices at egress. On
> > ifb, we install filters, rate limit police,
Hi Daniel, I try to answer your question. thanks
> I guess this part here is what I don't quite follow. Could you walk me through
> the packet flow in this case? So you go from bpf@tc-egress@phys-dev to do the
> redirect to bpf@tc-egress@ifb, and then again to bpf@tc-egress@phys-dev (same
ifb doesn't install bpf prog, ifb will redirect packets to netdevice
which packets from.
ifb_ri_tasklet
1. skb->dev = dev_get_by_index_rcu(dev_net(txp->dev), skb->skb_iif);
// skb_iif is phys-dev
2. dev_queue_xmit

> dev or different one I presume)? Why not doing the load-balancing, applying the
> policy, and doing the rate-limiting (e.g. EDT with sch_fq) directly at the initial
> bpf@tc-egress@phys-dev location given bpf is perfectly capable to do all of it
> there w/o the extra detour & overhead through ifb? The issue I see here is adding
This is not easy to explain. We have two solution for our production,
one is this, other one is my
other patchset:
https://patchwork.kernel.org/project/netdevbpf/list/?series=593409&archive=both&state=*

Why I need those solutions, please see my patch 1/2 commit message.
https://patchwork.kernel.org/project/netdevbpf/patch/20211210023626.20905-2-xiangxia.m.yue@gmail.com/

The different of two solution is that this one use ifb to do the
rate-limiting/load-blance for applications which
key is high throughput. Then for  applications which key is  the low
latency of data access, can use the
mq or fifo qdisc in the phys-dev. This is very useful in k8s
environment. Anyway my other patchset is better
for this one sulotion. but this one solution is easy to used because
we can use the tc cmd(not bpf) to install filters.

This patch try to fix the bug in bpf.
> extra overhead to support such a narrow case that nobody else is using and that
I don't agree that, I think it's very useful in k8s. If the pods of
k8s are sharing the network resource, this is really a narrow case.
But part of pods want low latency, and other part of pods want high
throughput, this solution is selectable.
> can be achieved already with existing infra as I understood it; the justification
> right now to add the extra checks to the critical fast path is very thin..
note that:
For tc cmd, there is the check "skb_skip_tc_classify" in
tcf_action_exec, This patch only want to move check to
sch_handle_egress to avoid the
unessential tc filter lookup.
For bpf_redirect, we don't run tcf_action_exec, the packets loopback
if there is no check. This patch try to fix bug(bpf), and improve
performance(tc)

I use the netperf/iperf to measure performance, It's not easy to
figure out performance drop.
> Thanks,
> Daniel



-- 
Best regards, Tonghao

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [net v5 2/3] net: sched: add check tc_skip_classify in sch egress
  2021-12-17  3:21                 ` Tonghao Zhang
@ 2022-01-10  1:34                   ` Tonghao Zhang
  0 siblings, 0 replies; 19+ messages in thread
From: Tonghao Zhang @ 2022-01-10  1:34 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: John Fastabend, Linux Kernel Network Developers, David S. Miller,
	Jakub Kicinski, Alexei Starovoitov, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Eric Dumazet, Antoine Tenart, Alexander Lobakin, Wei Wang,
	Arnd Bergmann

 t

On Fri, Dec 17, 2021 at 11:21 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>
> On Thu, Dec 16, 2021 at 8:37 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >
> > On 12/11/21 1:37 AM, Tonghao Zhang wrote:
> > > On Sat, Dec 11, 2021 at 4:11 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
> > >> On 12/10/21 8:54 PM, Tonghao Zhang wrote:
> > >>> On Sat, Dec 11, 2021 at 1:46 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> > >>>> On Sat, Dec 11, 2021 at 1:37 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> > >>>>> On Sat, Dec 11, 2021 at 12:43 AM John Fastabend
> > >>>>> <john.fastabend@gmail.com> wrote:
> > >>>>>> xiangxia.m.yue@ wrote:
> > [...]
> > >>>>> Hi John
> > >>>>> Tx ethx -> __dev_queue_xmit -> sch_handle_egress
> > >>>>> ->  execute BPF program on ethx with bpf_redirect(ifb0) ->
> > >>>>> -> ifb_xmit -> ifb_ri_tasklet -> dev_queue_xmit -> __dev_queue_xmit
> > >>>>> the packets loopbacks, that means bpf_redirect doesn't work with ifb
> > >>>>> netdev, right ?
> > >>>>> so in sch_handle_egress, I add the check skb_skip_tc_classify().
> > >>
> > >> But why would you do that? Usage like this is just broken by design..
> > > As I understand, we can redirect packets to a target device either at
> > > ingress or at *egress
> > >
> > > The commit ID: 3896d655f4d491c67d669a15f275a39f713410f8
> > > Allow eBPF programs attached to classifier/actions to call
> > > bpf_clone_redirect(skb, ifindex, flags) helper which will mirror or
> > > redirect the packet by dynamic ifindex selection from within the
> > > program to a target device either at ingress or at egress. Can be used
> > > for various scenarios, for example, to load balance skbs into veths,
> > > split parts of the traffic to local taps, etc.
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=3896d655f4d491c67d669a15f275a39f713410f8
> > > https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=27b29f63058d26c6c1742f1993338280d5a41dc6
> > >
> > > But at egress the bpf_redirect doesn't work with ifb.
> > >> If you need to loop anything back to RX, just use bpf_redirect() with
> > > Not use it to loop packets back. the flags of bpf_redirect is 0. for example:
> > >
> > > tc filter add dev veth1 \
> > > egress bpf direct-action obj test_bpf_redirect_ifb.o sec redirect_ifb
> > > https://patchwork.kernel.org/project/netdevbpf/patch/20211208145459.9590-4-xiangxia.m.yue@gmail.com/
> > >> BPF_F_INGRESS? What is the concrete/actual rationale for ifb here?
> > > We load balance the packets to different ifb netdevices at egress. On
> > > ifb, we install filters, rate limit police,
> Hi Daniel, I try to answer your question. thanks
> > I guess this part here is what I don't quite follow. Could you walk me through
> > the packet flow in this case? So you go from bpf@tc-egress@phys-dev to do the
> > redirect to bpf@tc-egress@ifb, and then again to bpf@tc-egress@phys-dev (same
> ifb doesn't install bpf prog, ifb will redirect packets to netdevice
> which packets from.
> ifb_ri_tasklet
> 1. skb->dev = dev_get_by_index_rcu(dev_net(txp->dev), skb->skb_iif);
> // skb_iif is phys-dev
> 2. dev_queue_xmit
>
> > dev or different one I presume)? Why not doing the load-balancing, applying the
> > policy, and doing the rate-limiting (e.g. EDT with sch_fq) directly at the initial
> > bpf@tc-egress@phys-dev location given bpf is perfectly capable to do all of it
> > there w/o the extra detour & overhead through ifb? The issue I see here is adding
> This is not easy to explain. We have two solution for our production,
> one is this, other one is my
> other patchset:
> https://patchwork.kernel.org/project/netdevbpf/list/?series=593409&archive=both&state=*
>
> Why I need those solutions, please see my patch 1/2 commit message.
> https://patchwork.kernel.org/project/netdevbpf/patch/20211210023626.20905-2-xiangxia.m.yue@gmail.com/
>
> The different of two solution is that this one use ifb to do the
> rate-limiting/load-blance for applications which
> key is high throughput. Then for  applications which key is  the low
> latency of data access, can use the
> mq or fifo qdisc in the phys-dev. This is very useful in k8s
> environment. Anyway my other patchset is better
> for this one sulotion. but this one solution is easy to used because
> we can use the tc cmd(not bpf) to install filters.
>
> This patch try to fix the bug in bpf.
> > extra overhead to support such a narrow case that nobody else is using and that
> I don't agree that, I think it's very useful in k8s. If the pods of
> k8s are sharing the network resource, this is really a narrow case.
> But part of pods want low latency, and other part of pods want high
> throughput, this solution is selectable.
> > can be achieved already with existing infra as I understood it; the justification
> > right now to add the extra checks to the critical fast path is very thin..
> note that:
> For tc cmd, there is the check "skb_skip_tc_classify" in
> tcf_action_exec, This patch only want to move check to
> sch_handle_egress to avoid the
> unessential tc filter lookup.
> For bpf_redirect, we don't run tcf_action_exec, the packets loopback
> if there is no check. This patch try to fix bug(bpf), and improve
> performance(tc)
>
> I use the netperf/iperf to measure performance, It's not easy to
> figure out performance drop.
Hi Daniel
any process ? Did I answer your questions?
 if you think we should not add(only move skb_skip_tc_classify from
tcf_action_exec to sch_handle_egress) this check in fastpath. I'am OK
to drop this patch.
but we can wait for other users to report this issue in the future. If
so, we can fix it with this patch. Can you help me to review other
patch 1/3 and 3/3.
> > Thanks,
> > Daniel
>
>
>
> --
> Best regards, Tonghao



-- 
Best regards, Tonghao

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2022-01-10  1:35 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-08 14:54 [net v5 0/3] fix bpf_redirect to ifb netdev xiangxia.m.yue
2021-12-08 14:54 ` [net v5 1/3] net: core: set skb useful vars in __bpf_tx_skb xiangxia.m.yue
2021-12-08 14:54 ` [net v5 2/3] net: sched: add check tc_skip_classify in sch egress xiangxia.m.yue
2021-12-10 16:43   ` John Fastabend
2021-12-10 16:52     ` John Fastabend
2021-12-10 17:43       ` Tonghao Zhang
2021-12-10 17:37     ` Tonghao Zhang
2021-12-10 17:46       ` Tonghao Zhang
2021-12-10 19:54         ` Tonghao Zhang
2021-12-10 20:11           ` Daniel Borkmann
2021-12-11  0:37             ` Tonghao Zhang
2021-12-16 12:37               ` Daniel Borkmann
2021-12-17  3:21                 ` Tonghao Zhang
2022-01-10  1:34                   ` Tonghao Zhang
2021-12-12  9:40             ` Tonghao Zhang
2021-12-14  2:27               ` Tonghao Zhang
2021-12-08 14:54 ` [net v5 3/3] selftests: bpf: add bpf_redirect to ifb xiangxia.m.yue
2021-12-08 15:41 ` [net v5 0/3] fix bpf_redirect to ifb netdev Alexander Lobakin
2021-12-08 15:53   ` Tonghao Zhang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.