* [net v3 1/3] net: core: set skb useful vars in __bpf_tx_skb
From: xiangxia.m.yue @ 2021-11-29  4:55 UTC
  To: netdev
  Cc: Tonghao Zhang, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S. Miller, Jakub Kicinski

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

bpf_redirect may be used to redirect packets to another netdevice
(e.g. ifb) on the ingress or egress path.

The target netdevice may check skb->skb_iif, skb->redirected and
skb->from_ingress. For example, ifb drops the packet if skb_iif or
redirected is 0.

Because bpf_redirect may be invoked on either the ingress or the
egress path, set skb_iif unconditionally. When CONFIG_NET_CLS_ACT is
enabled, also mark the skb as redirected with skb_set_redirected().
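The effect of this change on ifb can be modelled in user space. This is a hypothetical sketch, not kernel code: `struct fake_skb`, `ifb_would_drop()` and the two `tx_skb_*()` helpers are illustrative stand-ins that track only the fields named above. It assumes the CONFIG_NET_CLS_ACT branch of the patch and models skb_set_redirected() as setting `redirected` and copying `tc_at_ingress` into `from_ingress`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few sk_buff fields discussed above. */
struct fake_skb {
	int skb_iif;        /* ifindex the packet came in on (0 if unset) */
	int dev_ifindex;    /* ifindex of skb->dev */
	bool redirected;
	bool from_ingress;
	bool tc_at_ingress; /* true if the tc hook runs on ingress */
};

/* Models the ifb check described above: drop if skb_iif or redirected is 0. */
static bool ifb_would_drop(const struct fake_skb *skb)
{
	return !skb->redirected || skb->skb_iif == 0;
}

/* Models __bpf_tx_skb before the patch: of the fields tracked here,
 * only skb->dev changes. */
static void tx_skb_before(struct fake_skb *skb, int target_ifindex)
{
	skb->dev_ifindex = target_ifindex;
}

/* Models __bpf_tx_skb after the patch (CONFIG_NET_CLS_ACT case). */
static void tx_skb_after(struct fake_skb *skb, int target_ifindex)
{
	assert(target_ifindex > 0);
	skb->skb_iif = skb->dev_ifindex;         /* set unconditionally */
	skb->redirected = true;                  /* skb_set_redirected() */
	skb->from_ingress = skb->tc_at_ingress;  /* ... with tc_at_ingress */
	skb->dev_ifindex = target_ifindex;       /* skb->dev = dev */
}
```

In this model, a locally generated packet redirected on egress (skb_iif 0, redirected unset) is dropped by the ifb check with the old behaviour, while after the patch both fields are populated and the packet passes.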

Fixes: a70b506efe89 ("bpf: enforce recursion limit on redirects")
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 net/core/filter.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 8271624a19aa..225dc8743863 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2107,9 +2107,19 @@ static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb)
 		return -ENETDOWN;
 	}
 
-	skb->dev = dev;
+	/* The target netdevice (e.g. ifb) may use the:
+	 * - skb_iif, bpf_redirect may be invoked in ingress or egress path.
+	 * - redirected
+	 * - from_ingress
+	 */
+	skb->skb_iif = skb->dev->ifindex;
+#ifdef CONFIG_NET_CLS_ACT
+	skb_set_redirected(skb, skb->tc_at_ingress);
+#else
 	skb->tstamp = 0;
+#endif
 
+	skb->dev = dev;
 	dev_xmit_recursion_inc();
 	ret = dev_queue_xmit(skb);
 	dev_xmit_recursion_dec();
-- 
2.27.0


* [net v3 2/3] net: sched: add check tc_skip_classify in sch egress
From: xiangxia.m.yue @ 2021-11-29  4:55 UTC
  To: netdev
  Cc: Tonghao Zhang, Willem de Bruijn, Cong Wang, Jakub Kicinski,
	David S. Miller, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Eric Dumazet, Antoine Tenart,
	Alexander Lobakin, Wei Wang, Björn Töpel,
	Arnd Bergmann

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

This patch tries to resolve the following issues:
* The net sched layer looks up the filters first and checks the
  tc_skip_classify flag only afterwards, even when the skb does not
  want to be classified. With many filters installed, this may
  consume a lot of CPU cycles.

  Install the rules as follows:
  $ for id in $(seq 1 100); do
  $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
  $ done

  netperf:
  $ taskset -c 1 netperf -t TCP_RR -H ip -- -r 32,32
  $ taskset -c 1 netperf -t TCP_STREAM -H ip -- -m 32

  Before: 10662.33 tps, 108.95 Mbit/s
  After:  12434.48 tps, 145.89 Mbit/s
  That is a 16.6% improvement for TCP_RR and 33.9% for TCP_STREAM.

* bpf_redirect may be invoked on the egress path. If we do not check
  the flag and return immediately, the packets loop: ifb re-transmits
  them on the original device, where the egress bpf filter redirects
  them to ifb again.

  $ tc filter add dev eth0 egress bpf direct-action obj \
	  test_bpf_redirect_ifb.o sec redirect_ifb
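The loop described in the second point can be illustrated with a small user-space model. All names are hypothetical: `egress_hook()` stands in for sch_handle_egress() with one bpf direct-action filter that always redirects to ifb, and the re-transmit step stands in for ifb (assumed here) setting tc_skip_classify before sending the packet back out on the original device. Only the test-and-clear behaviour of `skip_tc_classify()` is meant to match the kernel's skb_skip_tc_classify().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical packet: only the flag that matters for this model. */
struct pkt {
	bool tc_skip_classify;
};

/* Same test-and-clear semantics as skb_skip_tc_classify(). */
static bool skip_tc_classify(struct pkt *p)
{
	if (p->tc_skip_classify) {
		p->tc_skip_classify = false;
		return true;
	}
	return false;
}

/* Stand-in for the egress hook with one bpf filter that always redirects
 * to ifb. Returns true if the packet is redirected to ifb, false if it
 * continues to the device. */
static bool egress_hook(struct pkt *p, bool with_check)
{
	if (with_check && skip_tc_classify(p))
		return false;	/* the check added by this patch */
	return true;		/* filter runs and redirects to ifb */
}

/* Sends one packet and counts the trips through ifb. The ifb step marks
 * the packet with tc_skip_classify and re-transmits it on the original
 * device. max_hops only exists so the model terminates. */
static int send_packet(bool with_check, int max_hops)
{
	struct pkt p = { .tc_skip_classify = false };
	int hops = 0;

	assert(max_hops > 0);
	while (egress_hook(&p, with_check)) {
		if (++hops >= max_hops)
			break;
		p.tc_skip_classify = true;	/* ifb marks and re-xmits */
	}
	return hops;
}
```

With the check, send_packet(true, 10) makes one trip through ifb and then leaves the device; without it, send_packet(false, 10) only stops at the artificial cap of 10.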

Cc: Willem de Bruijn <willemb@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Antoine Tenart <atenart@kernel.org>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Wei Wang <weiwan@google.com>
Cc: "Björn Töpel" <bjorn@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
v2: https://patchwork.kernel.org/project/netdevbpf/patch/20211103143208.41282-1-xiangxia.m.yue@gmail.com/
Willem de Bruijn and Daniel Borkmann commented on v2, but I think we
should still fix this, because bpf_redirect may also loop the packets.
Further comments are welcome.
---
 net/core/dev.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index 823917de0d2b..4ceb927b1577 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3823,6 +3823,9 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 	if (!miniq)
 		return skb;
 
+	if (skb_skip_tc_classify(skb))
+		return skb;
+
 	/* qdisc_skb_cb(skb)->pkt_len was already set by the caller. */
 	qdisc_skb_cb(skb)->mru = 0;
 	qdisc_skb_cb(skb)->post_ct = false;
-- 
2.27.0


* [net v3 3/3] selftests: bpf: add bpf_redirect to ifb
From: xiangxia.m.yue @ 2021-11-29  4:55 UTC
  To: netdev
  Cc: Tonghao Zhang, Willem de Bruijn, Cong Wang, Jakub Kicinski,
	David S. Miller, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Eric Dumazet, Antoine Tenart,
	Alexander Lobakin, Wei Wang, Björn Töpel,
	Arnd Bergmann

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

The ifb netdev is used to queue incoming traffic for shaping. BPF
programs running in the tc classifier hook (ingress or egress) may
redirect packets to ifb.

This patch adds a bpf selftest for that case.

Cc: Willem de Bruijn <willemb@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Antoine Tenart <atenart@kernel.org>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Wei Wang <weiwan@google.com>
Cc: "Björn Töpel" <bjorn@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 tools/testing/selftests/bpf/Makefile          |  1 +
 .../bpf/progs/test_bpf_redirect_ifb.c         | 10 +++
 .../selftests/bpf/test_bpf_redirect_ifb.sh    | 73 +++++++++++++++++++
 3 files changed, 84 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/test_bpf_redirect_ifb.c
 create mode 100755 tools/testing/selftests/bpf/test_bpf_redirect_ifb.sh

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 5d42db2e129a..6ec8b97af0ea 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -65,6 +65,7 @@ TEST_PROGS := test_kmod.sh \
 	test_xdp_vlan_mode_native.sh \
 	test_lwt_ip_encap.sh \
 	test_tcp_check_syncookie.sh \
+	test_bpf_redirect_ifb.sh \
 	test_tc_tunnel.sh \
 	test_tc_edt.sh \
 	test_xdping.sh \
diff --git a/tools/testing/selftests/bpf/progs/test_bpf_redirect_ifb.c b/tools/testing/selftests/bpf/progs/test_bpf_redirect_ifb.c
new file mode 100644
index 000000000000..d3205ad5e35a
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_bpf_redirect_ifb.c
@@ -0,0 +1,10 @@
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+SEC("redirect_ifb")
+int redirect(struct __sk_buff *skb)
+{
+	return bpf_redirect(skb->ifindex + 1 /* ifbX */, 0);
+}
+
+char __license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_bpf_redirect_ifb.sh b/tools/testing/selftests/bpf/test_bpf_redirect_ifb.sh
new file mode 100755
index 000000000000..0933439696ab
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_bpf_redirect_ifb.sh
@@ -0,0 +1,73 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+
+# Topology:
+# ---------
+#      n1 namespace    |     n2 namespace
+#                      |
+#      -----------     |     ----------------
+#      |  veth0  | --------- |  veth1, ifb1 |
+#      -----------   peer    ----------------
+#
+
+readonly prefix="ns-$$-"
+readonly ns1="${prefix}1"
+readonly ns2="${prefix}2"
+readonly ns1_addr=192.168.1.1
+readonly ns2_addr=192.168.1.2
+
+setup() {
+	echo "Load ifb module"
+	if ! /sbin/modprobe -q -n ifb; then
+		echo "test_bpf_redirect ifb: module ifb is not found [SKIP]"
+		exit 4
+	fi
+
+	modprobe -q ifb numifbs=0
+
+	ip netns add "${ns1}"
+	ip netns add "${ns2}"
+
+	ip link add dev veth0 mtu 1500 netns "${ns1}" type veth \
+	      peer name veth1 mtu 1500 netns "${ns2}"
+	# ifb1 created after veth1
+	ip link add dev ifb1 mtu 1500 netns "${ns2}" type ifb
+
+	ip -netns "${ns1}" link set veth0 up
+	ip -netns "${ns2}" link set veth1 up
+	ip -netns "${ns2}" link set ifb1 up
+	ip -netns "${ns1}" -4 addr add "${ns1_addr}/24" dev veth0
+	ip -netns "${ns2}" -4 addr add "${ns2_addr}/24" dev veth1
+
+	ip netns exec "${ns2}" tc qdisc add dev veth1 clsact
+}
+
+cleanup() {
+	ip netns del "${ns2}" &>/dev/null
+	ip netns del "${ns1}" &>/dev/null
+	modprobe -r ifb
+}
+
+trap cleanup EXIT
+
+setup
+
+ip netns exec "${ns2}" tc filter add dev veth1 \
+	ingress bpf direct-action obj test_bpf_redirect_ifb.o sec redirect_ifb
+ip netns exec "${ns1}" ping -W 2 -c 2 -i 0.2 -q "${ns2_addr}" &>/dev/null
+if [ $? -ne 0 ]; then
+	echo "bpf redirect to ifb on ingress path [FAILED]"
+	exit 1
+fi
+
+ip netns exec "${ns2}" tc filter del dev veth1 ingress
+ip netns exec "${ns2}" tc filter add dev veth1 \
+	egress bpf direct-action obj test_bpf_redirect_ifb.o sec redirect_ifb
+ip netns exec "${ns1}" ping -W 2 -c 2 -i 0.2 -q "${ns2_addr}" &>/dev/null
+if [ $? -ne 0 ]; then
+	echo "bpf redirect to ifb on egress path [FAILED]"
+	exit 1
+fi
+
+echo OK
-- 
2.27.0


* Re: [net v3 2/3] net: sched: add check tc_skip_classify in sch egress
From: Eric Dumazet @ 2021-11-29 17:44 UTC
  To: xiangxia.m.yue
  Cc: netdev, Willem de Bruijn, Cong Wang, Jakub Kicinski,
	David S. Miller, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Antoine Tenart, Alexander Lobakin,
	Wei Wang, Björn Töpel, Arnd Bergmann

On Sun, Nov 28, 2021 at 8:55 PM <xiangxia.m.yue@gmail.com> wrote:
>
> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>
> Try to resolve the issues as below:
> * We look up and then check tc_skip_classify flag in net
>   sched layer, even though skb don't want to be classified.
>   That case may consume a lot of cpu cycles.
>
>   Install the rules as below:
>   $ for id in $(seq 1 100); do
>   $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
>   $ done
>
>   netperf:
>   $ taskset -c 1 netperf -t TCP_RR -H ip -- -r 32,32
>   $ taskset -c 1 netperf -t TCP_STREAM -H ip -- -m 32
>
>   Before: 10662.33 tps, 108.95 Mbit/s
>   After:  12434.48 tps, 145.89 Mbit/s
>   For TCP_RR, there are 16.6% improvement, TCP_STREAM 33.9%.

These numbers mean nothing, really.

I think you should put 10,000 filters instead of 100 so that the
numbers look even better ?

As a matter of fact, you add yet another check in fast  path.

For some reason I have not received the cover letter and patch 1/3.

* Re: [net v3 2/3] net: sched: add check tc_skip_classify in sch egress
From: Tonghao Zhang @ 2021-11-30  1:24 UTC
  To: Eric Dumazet
  Cc: Linux Kernel Network Developers, Willem de Bruijn, Cong Wang,
	Jakub Kicinski, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Antoine Tenart,
	Alexander Lobakin, Wei Wang, Björn Töpel,
	Arnd Bergmann

On Tue, Nov 30, 2021 at 1:44 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Sun, Nov 28, 2021 at 8:55 PM <xiangxia.m.yue@gmail.com> wrote:
> >
> > [...]
> >
> >   Before: 10662.33 tps, 108.95 Mbit/s
> >   After:  12434.48 tps, 145.89 Mbit/s
> >   For TCP_RR, there are 16.6% improvement, TCP_STREAM 33.9%.
>
> These numbers mean nothing, really.
>
> I think you should put 10,000 filters instead of 100 so that the
> numbers look even better ?
These are 100 filters with different prios. I will install 10,000
filters and test again. Thanks.
> As a matter of fact, you add yet another check in fast  path.
>
> For some reason I have not received the cover letter and patch 1/3.
Patch 1/3: https://patchwork.kernel.org/project/netdevbpf/patch/20211129045503.20217-1-xiangxia.m.yue@gmail.com/


-- 
Best regards, Tonghao

* Re: [net v3 2/3] net: sched: add check tc_skip_classify in sch egress
From: Tonghao Zhang @ 2021-12-01 10:58 UTC
  To: Eric Dumazet
  Cc: Linux Kernel Network Developers, Willem de Bruijn, Cong Wang,
	Jakub Kicinski, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Antoine Tenart,
	Alexander Lobakin, Wei Wang, Björn Töpel,
	Arnd Bergmann

On Tue, Nov 30, 2021 at 9:24 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>
> On Tue, Nov 30, 2021 at 1:44 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Sun, Nov 28, 2021 at 8:55 PM <xiangxia.m.yue@gmail.com> wrote:
> > >
> > > [...]
> > >
> > >   Before: 10662.33 tps, 108.95 Mbit/s
> > >   After:  12434.48 tps, 145.89 Mbit/s
> > >   For TCP_RR, there are 16.6% improvement, TCP_STREAM 33.9%.
> >
> > These numbers mean nothing, really.
> >
> > I think you should put 10,000 filters instead of 100 so that the
> > numbers look even better ?
> This 100 filters with different prio, I will install 10,000 filters
> and test again. Thanks.
Hi Eric,

I installed 10,000 filters with different prios, for example:
  $ tc filter add dev enp5s0f0 egress protocol ip prio 10000 flower \
        skip_hw src_ip 4.4.39.16 action mirred egress redirect dev ifb0

Test commands:
  $ taskset -c 1 netperf -t TCP_RR -L 4.4.39.16 -H 4.4.200.200 -- -r 32,32
  $ taskset -c 1 netperf -t TCP_STREAM -L 4.4.39.16 -H 4.4.200.200 -- -m 32

                 TCP_RR       TCP_STREAM
  Without patch: 152.04 tps   0.58 Mbit/s
  With patch:    303.07 tps   1.51 Mbit/s
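The relative improvement implied by these numbers is (with - without) / without. The helper below is only a worked-arithmetic sketch; its name is hypothetical.

```c
#include <assert.h>

/* Relative improvement in percent: (after - before) / before * 100. */
static double improvement_pct(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}
```

Applied to the figures above, this gives roughly 99.3% for TCP_RR (152.04 to 303.07 tps) and 160.3% for TCP_STREAM (0.58 to 1.51 Mbit/s) with 10,000 filters, compared with the 16.6% and 33.9% reported in the commit message for 100 filters.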

> > As a matter of fact, you add yet another check in fast  path.
> >
> > For some reason I have not received the cover letter and patch 1/3.
> 1/3 patch, https://patchwork.kernel.org/project/netdevbpf/patch/20211129045503.20217-1-xiangxia.m.yue@gmail.com/

-- 
Best regards, Tonghao
