* [net v3 1/3] net: core: set skb useful vars in __bpf_tx_skb
@ 2021-11-29  4:55 xiangxia.m.yue
  2021-11-29  4:55 ` [net v3 2/3] net: sched: add check tc_skip_classify in sch egress xiangxia.m.yue
  2021-11-29  4:55 ` [net v3 3/3] selftests: bpf: add bpf_redirect to ifb xiangxia.m.yue
  0 siblings, 2 replies; 6+ messages in thread
From: xiangxia.m.yue @ 2021-11-29  4:55 UTC
  To: netdev
  Cc: Tonghao Zhang, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S. Miller, Jakub Kicinski

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

We may use bpf_redirect to redirect packets to another netdevice
(e.g. ifb) in the ingress and egress paths. The target netdevice may
check *skb_iif, *redirected and *from_ingress; for example, if skb_iif
or redirected is 0, ifb will drop the packets.

bpf_redirect may be invoked in the ingress or egress path, so we set
*skb_iif unconditionally.

Fixes: a70b506efe89 ("bpf: enforce recursion limit on redirects")
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 net/core/filter.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 8271624a19aa..225dc8743863 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2107,9 +2107,19 @@ static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb)
 		return -ENETDOWN;
 	}
 
-	skb->dev = dev;
+	/* The target netdevice (e.g. ifb) may use the:
+	 * - skb_iif, bpf_redirect may be invoked in ingress or egress path.
+	 * - redirected
+	 * - from_ingress
+	 */
+	skb->skb_iif = skb->dev->ifindex;
+#ifdef CONFIG_NET_CLS_ACT
+	skb_set_redirected(skb, skb->tc_at_ingress);
+#else
 	skb->tstamp = 0;
+#endif
 
+	skb->dev = dev;
 	dev_xmit_recursion_inc();
 	ret = dev_queue_xmit(skb);
 	dev_xmit_recursion_dec();
-- 
2.27.0
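Editorial note: the ordering in the hunk above (record skb_iif from the old
device, mark the skb as redirected, and only then switch skb->dev) can be
modelled in plain userspace C. This is a minimal sketch, not kernel code:
the model_* types and functions are hypothetical stand-ins, the drop
condition is taken from the commit message ("if skb_iif or redirected is 0,
ifb will drop the packets"), and the assumption that skb_set_redirected()
sets redirected and records from_ingress is mine, not confirmed by this
thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for net_device and sk_buff. */
struct model_dev {
	int ifindex;
};

struct model_skb {
	struct model_dev *dev;
	int skb_iif;
	bool redirected;
	bool from_ingress;
	bool tc_at_ingress;
};

/*
 * Models the assignments the patch adds to __bpf_tx_skb() when
 * CONFIG_NET_CLS_ACT is enabled. skb_iif must be read from the old
 * device, so the switch to the target device comes last.
 */
static void model_bpf_tx_prepare(struct model_skb *skb, struct model_dev *target)
{
	assert(skb != NULL && skb->dev != NULL && target != NULL);

	skb->skb_iif = skb->dev->ifindex;       /* originating device */
	skb->redirected = true;                 /* assumed effect of skb_set_redirected() */
	skb->from_ingress = skb->tc_at_ingress; /* assumed effect of skb_set_redirected() */
	skb->dev = target;                      /* switch devices last */
}

/* Drop condition as described in the commit message for ifb. */
static bool model_ifb_would_drop(const struct model_skb *skb)
{
	return !skb->redirected || skb->skb_iif == 0;
}
```

In this model, an skb that has not passed through model_bpf_tx_prepare()
is dropped by model_ifb_would_drop(), while one that has is accepted.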
* [net v3 2/3] net: sched: add check tc_skip_classify in sch egress
  2021-11-29  4:55 [net v3 1/3] net: core: set skb useful vars in __bpf_tx_skb xiangxia.m.yue
@ 2021-11-29  4:55 ` xiangxia.m.yue
  2021-11-29 17:44   ` Eric Dumazet
  2021-11-29  4:55 ` [net v3 3/3] selftests: bpf: add bpf_redirect to ifb xiangxia.m.yue
  1 sibling, 1 reply; 6+ messages in thread
From: xiangxia.m.yue @ 2021-11-29  4:55 UTC
  To: netdev
  Cc: Tonghao Zhang, Willem de Bruijn, Cong Wang, Jakub Kicinski,
	David S. Miller, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Eric Dumazet, Antoine Tenart,
	Alexander Lobakin, Wei Wang, Björn Töpel, Arnd Bergmann

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

Try to resolve the issues below:
* We look up and then check the tc_skip_classify flag in the net
  sched layer, even though the skb doesn't want to be classified.
  That case may consume a lot of CPU cycles.

  Install the rules as below:
  $ for id in $(seq 1 100); do
  $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
  $ done

  netperf:
  $ taskset -c 1 netperf -t TCP_RR -H ip -- -r 32,32
  $ taskset -c 1 netperf -t TCP_STREAM -H ip -- -m 32

  Before: 10662.33 tps, 108.95 Mbit/s
  After:  12434.48 tps, 145.89 Mbit/s

  For TCP_RR, there is a 16.6% improvement; for TCP_STREAM, 33.9%.

* bpf_redirect may be invoked in the egress path. If we don't check
  the flag and return immediately, the packets will loop back.

  $ tc filter add dev eth0 egress bpf direct-action obj \
	  test_tc_redirect_ifb.o sec redirect_ifb

Cc: Willem de Bruijn <willemb@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Antoine Tenart <atenart@kernel.org>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Wei Wang <weiwan@google.com>
Cc: "Björn Töpel" <bjorn@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
v2: https://patchwork.kernel.org/project/netdevbpf/patch/20211103143208.41282-1-xiangxia.m.yue@gmail.com/

Willem de Bruijn and Daniel Borkmann commented on this patch, but I
think we should still fix this, because bpf_redirect may also loop
back the packets. I hope there will be more comments.
---
 net/core/dev.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index 823917de0d2b..4ceb927b1577 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3823,6 +3823,9 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 	if (!miniq)
 		return skb;
 
+	if (skb_skip_tc_classify(skb))
+		return skb;
+
 	/* qdisc_skb_cb(skb)->pkt_len was already set by the caller. */
 	qdisc_skb_cb(skb)->mru = 0;
 	qdisc_skb_cb(skb)->post_ct = false;
-- 
2.27.0
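Editorial note: the effect of the three added lines in sch_handle_egress()
can be illustrated with a small userspace model. This is a sketch under
stated assumptions, not the kernel implementation: the model_* names are
hypothetical, the test-and-clear behaviour is what I understand
skb_skip_tc_classify() to do, and the loop simply counts filters under the
assumption that none of them matches and each one costs a lookup.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the one sk_buff bit that matters here. */
struct model_skb {
	bool tc_skip_classify;
};

/*
 * Test-and-clear helper, assumed to mirror skb_skip_tc_classify():
 * the flag is consumed so the skb is classified normally next time.
 */
static bool model_skip_tc_classify(struct model_skb *skb)
{
	if (skb->tc_skip_classify) {
		skb->tc_skip_classify = false;
		return true;
	}
	return false;
}

/*
 * Model of the patched egress hook. Returns how many filters were
 * visited; with the early return, a flagged skb visits none.
 */
static int model_handle_egress(struct model_skb *skb, int nr_filters)
{
	int visited = 0;

	assert(nr_filters >= 0);

	if (model_skip_tc_classify(skb))
		return 0;

	for (int i = 0; i < nr_filters; i++)
		visited++; /* stand-in for one non-matching filter lookup */

	return visited;
}
```

With 100 installed filters, a flagged skb costs zero lookups in this model
and an unflagged one costs 100, which is the per-packet work the commit
message is trying to avoid. The model says nothing about the cost of the
extra branch for traffic that never sets the flag.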
* Re: [net v3 2/3] net: sched: add check tc_skip_classify in sch egress
  2021-11-29  4:55 ` [net v3 2/3] net: sched: add check tc_skip_classify in sch egress xiangxia.m.yue
@ 2021-11-29 17:44   ` Eric Dumazet
  2021-11-30  1:24     ` Tonghao Zhang
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2021-11-29 17:44 UTC
  To: xiangxia.m.yue
  Cc: netdev, Willem de Bruijn, Cong Wang, Jakub Kicinski,
	David S. Miller, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Antoine Tenart, Alexander Lobakin,
	Wei Wang, Björn Töpel, Arnd Bergmann

On Sun, Nov 28, 2021 at 8:55 PM <xiangxia.m.yue@gmail.com> wrote:
>
> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>
> Try to resolve the issues below:
> * We look up and then check the tc_skip_classify flag in the net
>   sched layer, even though the skb doesn't want to be classified.
>   That case may consume a lot of CPU cycles.
>
>   Install the rules as below:
>   $ for id in $(seq 1 100); do
>   $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
>   $ done
>
>   netperf:
>   $ taskset -c 1 netperf -t TCP_RR -H ip -- -r 32,32
>   $ taskset -c 1 netperf -t TCP_STREAM -H ip -- -m 32
>
>   Before: 10662.33 tps, 108.95 Mbit/s
>   After:  12434.48 tps, 145.89 Mbit/s
>
>   For TCP_RR, there is a 16.6% improvement; for TCP_STREAM, 33.9%.

These numbers mean nothing, really.

I think you should put 10,000 filters instead of 100 so that the
numbers look even better?

As a matter of fact, you add yet another check in the fast path.

For some reason I have not received the cover letter and patch 1/3.
* Re: [net v3 2/3] net: sched: add check tc_skip_classify in sch egress
  2021-11-29 17:44   ` Eric Dumazet
@ 2021-11-30  1:24     ` Tonghao Zhang
  2021-12-01 10:58       ` Tonghao Zhang
  0 siblings, 1 reply; 6+ messages in thread
From: Tonghao Zhang @ 2021-11-30  1:24 UTC
  To: Eric Dumazet
  Cc: Linux Kernel Network Developers, Willem de Bruijn, Cong Wang,
	Jakub Kicinski, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Antoine Tenart,
	Alexander Lobakin, Wei Wang, Björn Töpel, Arnd Bergmann

On Tue, Nov 30, 2021 at 1:44 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Sun, Nov 28, 2021 at 8:55 PM <xiangxia.m.yue@gmail.com> wrote:
> >
> > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >
> > Try to resolve the issues below:
> > * We look up and then check the tc_skip_classify flag in the net
> >   sched layer, even though the skb doesn't want to be classified.
> >   That case may consume a lot of CPU cycles.
> >
> >   Install the rules as below:
> >   $ for id in $(seq 1 100); do
> >   $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
> >   $ done
> >
> >   netperf:
> >   $ taskset -c 1 netperf -t TCP_RR -H ip -- -r 32,32
> >   $ taskset -c 1 netperf -t TCP_STREAM -H ip -- -m 32
> >
> >   Before: 10662.33 tps, 108.95 Mbit/s
> >   After:  12434.48 tps, 145.89 Mbit/s
> >
> >   For TCP_RR, there is a 16.6% improvement; for TCP_STREAM, 33.9%.
>
> These numbers mean nothing, really.
>
> I think you should put 10,000 filters instead of 100 so that the
> numbers look even better?
These 100 filters have different prios. I will install 10,000 filters
and test again. Thanks.
> As a matter of fact, you add yet another check in the fast path.
>
> For some reason I have not received the cover letter and patch 1/3.
Patch 1/3:
https://patchwork.kernel.org/project/netdevbpf/patch/20211129045503.20217-1-xiangxia.m.yue@gmail.com/

-- 
Best regards, Tonghao
* Re: [net v3 2/3] net: sched: add check tc_skip_classify in sch egress
  2021-11-30  1:24     ` Tonghao Zhang
@ 2021-12-01 10:58       ` Tonghao Zhang
  0 siblings, 0 replies; 6+ messages in thread
From: Tonghao Zhang @ 2021-12-01 10:58 UTC
  To: Eric Dumazet
  Cc: Linux Kernel Network Developers, Willem de Bruijn, Cong Wang,
	Jakub Kicinski, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Antoine Tenart,
	Alexander Lobakin, Wei Wang, Björn Töpel, Arnd Bergmann

On Tue, Nov 30, 2021 at 9:24 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>
> On Tue, Nov 30, 2021 at 1:44 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Sun, Nov 28, 2021 at 8:55 PM <xiangxia.m.yue@gmail.com> wrote:
> > >
> > > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > >
> > > Try to resolve the issues below:
> > > * We look up and then check the tc_skip_classify flag in the net
> > >   sched layer, even though the skb doesn't want to be classified.
> > >   That case may consume a lot of CPU cycles.
> > >
> > >   Install the rules as below:
> > >   $ for id in $(seq 1 100); do
> > >   $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
> > >   $ done
> > >
> > >   netperf:
> > >   $ taskset -c 1 netperf -t TCP_RR -H ip -- -r 32,32
> > >   $ taskset -c 1 netperf -t TCP_STREAM -H ip -- -m 32
> > >
> > >   Before: 10662.33 tps, 108.95 Mbit/s
> > >   After:  12434.48 tps, 145.89 Mbit/s
> > >
> > >   For TCP_RR, there is a 16.6% improvement; for TCP_STREAM, 33.9%.
> >
> > These numbers mean nothing, really.
> >
> > I think you should put 10,000 filters instead of 100 so that the
> > numbers look even better?
> These 100 filters have different prios. I will install 10,000 filters
> and test again. Thanks.
Hi Eric,

I installed 10,000 filters with different prios, for example:
tc filter add dev enp5s0f0 egress protocol ip prio 10000 flower skip_hw src_ip 4.4.39.16 action mirred egress redirect dev ifb0

Test commands:
taskset -c 1 netperf -t TCP_RR -L 4.4.39.16 -H 4.4.200.200 -- -r 32,32
taskset -c 1 netperf -t TCP_STREAM -L 4.4.39.16 -H 4.4.200.200 -- -m 32

Without patch: 152.04 tps, 0.58 10^6bits/sec
With patch:    303.07 tps, 1.51 10^6bits/sec

> > As a matter of fact, you add yet another check in the fast path.
> >
> > For some reason I have not received the cover letter and patch 1/3.
> Patch 1/3:
> https://patchwork.kernel.org/project/netdevbpf/patch/20211129045503.20217-1-xiangxia.m.yue@gmail.com/
>
> --
> Best regards, Tonghao

-- 
Best regards, Tonghao
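Editorial note: the relative gains behind the two sets of measurements in
this thread can be checked with a few lines of C. The helper name pct_gain
is hypothetical; the inputs are the figures quoted in the messages above,
and nothing here says whether the benchmark setup itself is representative.

```c
#include <assert.h>

/*
 * Relative improvement of 'after' over 'before', in percent.
 * 'before' must be positive for the ratio to be meaningful.
 */
static double pct_gain(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}
```

Applied to the quoted numbers: with 100 filters, pct_gain(10662.33,
12434.48) is about 16.6 (TCP_RR) and pct_gain(108.95, 145.89) is about
33.9 (TCP_STREAM), matching the commit message. With 10,000 filters,
pct_gain(152.04, 303.07) is about 99.3 and pct_gain(0.58, 1.51) is about
160.3, so the relative gain grows with the number of filters that would
otherwise be walked.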
* [net v3 3/3] selftests: bpf: add bpf_redirect to ifb
  2021-11-29  4:55 [net v3 1/3] net: core: set skb useful vars in __bpf_tx_skb xiangxia.m.yue
  2021-11-29  4:55 ` [net v3 2/3] net: sched: add check tc_skip_classify in sch egress xiangxia.m.yue
@ 2021-11-29  4:55 ` xiangxia.m.yue
  1 sibling, 0 replies; 6+ messages in thread
From: xiangxia.m.yue @ 2021-11-29  4:55 UTC
  To: netdev
  Cc: Tonghao Zhang, Willem de Bruijn, Cong Wang, Jakub Kicinski,
	David S. Miller, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Eric Dumazet, Antoine Tenart,
	Alexander Lobakin, Wei Wang, Björn Töpel, Arnd Bergmann

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

The ifb netdev is used for queueing incoming traffic for shaping.
We may run bpf progs in the tc cls hook (ingress or egress) to
redirect packets to ifb. This patch adds a bpf selftest for that case.

Cc: Willem de Bruijn <willemb@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Antoine Tenart <atenart@kernel.org>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Wei Wang <weiwan@google.com>
Cc: "Björn Töpel" <bjorn@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 tools/testing/selftests/bpf/Makefile          |  1 +
 .../bpf/progs/test_bpf_redirect_ifb.c         | 10 +++
 .../selftests/bpf/test_bpf_redirect_ifb.sh    | 73 +++++++++++++++++++
 3 files changed, 84 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/test_bpf_redirect_ifb.c
 create mode 100755 tools/testing/selftests/bpf/test_bpf_redirect_ifb.sh

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 5d42db2e129a..6ec8b97af0ea 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -65,6 +65,7 @@ TEST_PROGS := test_kmod.sh \
 	test_xdp_vlan_mode_native.sh \
 	test_lwt_ip_encap.sh \
 	test_tcp_check_syncookie.sh \
+	test_bpf_redirect_ifb.sh \
 	test_tc_tunnel.sh \
 	test_tc_edt.sh \
 	test_xdping.sh \
diff --git a/tools/testing/selftests/bpf/progs/test_bpf_redirect_ifb.c b/tools/testing/selftests/bpf/progs/test_bpf_redirect_ifb.c
new file mode 100644
index 000000000000..d3205ad5e35a
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_bpf_redirect_ifb.c
@@ -0,0 +1,10 @@
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+SEC("redirect_ifb")
+int redirect(struct __sk_buff *skb)
+{
+	return bpf_redirect(skb->ifindex + 1 /* ifbX */, 0);
+}
+
+char __license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_bpf_redirect_ifb.sh b/tools/testing/selftests/bpf/test_bpf_redirect_ifb.sh
new file mode 100755
index 000000000000..0933439696ab
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_bpf_redirect_ifb.sh
@@ -0,0 +1,73 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+
+# Topology:
+# ---------
+#      n1 namespace    |     n2 namespace
+#                      |
+#      -----------     |     ----------------
+#      |  veth0  | --------- |  veth1, ifb1 |
+#      -----------   peer    ----------------
+#
+
+readonly prefix="ns-$$-"
+readonly ns1="${prefix}1"
+readonly ns2="${prefix}2"
+readonly ns1_addr=192.168.1.1
+readonly ns2_addr=192.168.1.2
+
+setup() {
+	echo "Load ifb module"
+	if ! /sbin/modprobe -q -n ifb; then
+		echo "test_bpf_redirect ifb: module ifb is not found [SKIP]"
+		exit 4
+	fi
+
+	modprobe -q ifb numifbs=0
+
+	ip netns add "${ns1}"
+	ip netns add "${ns2}"
+
+	ip link add dev veth0 mtu 1500 netns "${ns1}" type veth \
+	      peer name veth1 mtu 1500 netns "${ns2}"
+	# ifb1 created after veth1
+	ip link add dev ifb1 mtu 1500 netns "${ns2}" type ifb
+
+	ip -netns "${ns1}" link set veth0 up
+	ip -netns "${ns2}" link set veth1 up
+	ip -netns "${ns2}" link set ifb1 up
+	ip -netns "${ns1}" -4 addr add "${ns1_addr}/24" dev veth0
+	ip -netns "${ns2}" -4 addr add "${ns2_addr}/24" dev veth1
+
+	ip netns exec "${ns2}" tc qdisc add dev veth1 clsact
+}
+
+cleanup() {
+	ip netns del "${ns2}" &>/dev/null
+	ip netns del "${ns1}" &>/dev/null
+	modprobe -r ifb
+}
+
+trap cleanup EXIT
+
+setup
+
+ip netns exec "${ns2}" tc filter add dev veth1 \
+	ingress bpf direct-action obj test_bpf_redirect_ifb.o sec redirect_ifb
+ip netns exec "${ns1}" ping -W 2 -c 2 -i 0.2 -q "${ns2_addr}" &>/dev/null
+if [ $? -ne 0 ]; then
+	echo "bpf redirect to ifb on ingress path [FAILED]"
+	exit 1
+fi
+
+ip netns exec "${ns2}" tc filter del dev veth1 ingress
+ip netns exec "${ns2}" tc filter add dev veth1 \
+	egress bpf direct-action obj test_bpf_redirect_ifb.o sec redirect_ifb
+ip netns exec "${ns1}" ping -W 2 -c 2 -i 0.2 -q "${ns2_addr}" &>/dev/null
+if [ $? -ne 0 ]; then
+	echo "bpf redirect to ifb on egress path [FAILED]"
+	exit 1
+fi
+
+echo OK
-- 
2.27.0