From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: "Maxim Mikityanskiy" <maximmi@nvidia.com>,
"Björn Töpel" <bjorn.topel@gmail.com>,
ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org,
bpf@vger.kernel.org, bjorn.topel@intel.com,
magnus.karlsson@intel.com, kuba@kernel.org,
jonathan.lemon@gmail.com, davem@davemloft.net,
john.fastabend@gmail.com, ciara.loftus@intel.com,
weqaar.a.janjua@intel.com
Subject: Re: [PATCH bpf-next v2 0/8] Introduce bpf_redirect_xsk() helper
Date: Wed, 20 Jan 2021 17:19:31 +0100 [thread overview]
Message-ID: <20210120161931.GA32916@ranger.igk.intel.com> (raw)
In-Reply-To: <20210120165708.243f83cb@carbon>
On Wed, Jan 20, 2021 at 04:57:08PM +0100, Jesper Dangaard Brouer wrote:
> On Wed, 20 Jan 2021 15:15:22 +0200
> Maxim Mikityanskiy <maximmi@nvidia.com> wrote:
>
> > On 2021-01-19 17:50, Björn Töpel wrote:
> > > This series extends bind() for XDP sockets, so that the bound socket
> > > is added to the netdev_rx_queue _rx array in the netdevice. We call
> > > this registering the socket. To redirect packets to the registered
> > > socket, a new BPF helper is used: bpf_redirect_xsk().
> > >
> > > For shared XDP sockets, only the first bound socket is
> > > registered. Users that need a more complex setup have to use an
> > > XSKMAP and bpf_redirect_map().
> > >
> > > Now, why would one use bpf_redirect_xsk() over the regular
> > > bpf_redirect_map() helper?
> > >
> > > * Better performance!
> > > * Convenience; most users use one socket per queue. This scenario is
> > > what registered sockets support. There is no need to create an
> > > XSKMAP. This can also reduce complexity in containerized setups,
> > > where users might want to use XDP sockets without CAP_SYS_ADMIN
> > > capabilities.
>
> I'm buying into the convenience and reduced complexity, and XDP sockets
> without CAP_SYS_ADMIN in containers.
>
> People might be surprised that I'm actually NOT buying into the better
> performance argument here. At these speeds we are basically comparing
> how close we are to zero (and have to use a nanosec time scale for our
> comparisons); more below.
>
>
> > > The first patch restructures xdp_do_redirect() a bit, to make it
> > > easier to add the new helper. This restructure also gives us a slight
> > > performance benefit. The following three patches extend bind() and
> > > add the new helper. After that, two libbpf patches select the XDP
> > > program based on which kernel is running. Finally, selftests for the new
> > > functionality are added.
> > >
> > > Note that the libbpf "auto-selection" is based on kernel version, so
> > > it is hard coded to the "-next" version (5.12). If you would like to
> > > try this out, you will need to change the libbpf patch locally!
> > >
> > > Thanks to Maciej and Magnus for the internal review/comments!
> > >
> > > Performance (rxdrop, zero-copy)
> > >
> > > Baseline
> > > Two cores: 21.3 Mpps
> > > One core: 24.5 Mpps
> >
> > Two cores is slower? It used to be faster all the time, didn't it?
> >
> > > Patched
> > > Two cores, bpf_redirect_map: 21.7 Mpps + 2%
> > > One core, bpf_redirect_map: 24.9 Mpps + 2%
> > >
> > > Two cores, bpf_redirect_xsk: 24.0 Mpps +13%
> >
> > Nice, impressive improvement!
>
> I do appreciate your work and performance optimizations at this level,
> because when we are using so few CPU cycles per packet, it is really
> hard to find new ways to reduce cycles further.
>
> Thank you for saying +13% instead of saying +2.7 Mpps.
> It *is* impressive to basically reduce cycles by 13%.
>
> 21.3 Mpps = 46.94 nanosec per packet
> 24.0 Mpps = 41.66 nanosec per packet
>
> 21.3 Mpps -> 24.0 Mpps = 5.28 nanosec saved
>
> On my 3.60GHz testlab machine that gives me 19 cycles.
>
>
> > > One core, bpf_redirect_xsk: 25.5 Mpps + 4%
> > >
>
> 24.5 Mpps -> 25.5 Mpps = 1.6 nanosec saved
>
> At this point, saving 1.6 ns is around the cost of a function call (1.3 ns).
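The back-of-the-envelope conversions above can be checked with a few lines (a quick sketch; the 3.60 GHz clock is Jesper's stated testlab frequency):

```python
def ns_per_packet(mpps: float) -> float:
    """Convert a packet rate in Mpps to a per-packet budget in nanoseconds."""
    return 1000.0 / mpps

# Two cores: baseline vs. bpf_redirect_xsk
saved_two_cores = ns_per_packet(21.3) - ns_per_packet(24.0)  # ~5.28 ns
cycles_saved = saved_two_cores * 3.6                         # ~19 cycles at 3.60 GHz

# One core: baseline vs. bpf_redirect_xsk
saved_one_core = ns_per_packet(24.5) - ns_per_packet(25.5)   # ~1.6 ns

print(f"two cores: {saved_two_cores:.2f} ns saved (~{cycles_saved:.0f} cycles), "
      f"one core: {saved_one_core:.1f} ns saved")
```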
>
>
> We still need these optimizations in the kernel, but end-users in
> userspace are very quickly going to waste the 19 cycles we found.
> I still support/believe that the OS needs to have as little overhead as
> possible, but for me 42 nanosec overhead is close to zero overhead. For
> comparison, I just ran a udp_sink[3] test, and it "cost" 625 ns for
> delivery of UDP packets into socket (almost 15 times slower).
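The "almost 15 times" figure checks out against the numbers above (a quick sketch; 24.0 Mpps is the patched two-core bpf_redirect_xsk rate, i.e. the ~42 ns per-packet cost):

```python
xdp_ns = 1000.0 / 24.0   # ~41.7 ns per packet for AF_XDP rxdrop
udp_sink_ns = 625.0      # measured cost of delivering a UDP packet into a socket
ratio = udp_sink_ns / xdp_ns
print(f"UDP socket delivery is ~{ratio:.1f}x slower")  # ~15.0x
```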
>
> I guess my point is that with XDP we have already achieved and exceeded
> (my original) performance goals; making it even faster is just showing off ;-P
Even though I'll let Bjorn elaborate on this, we're talking here about
AF_XDP, which is a bit of a different pair of shoes to me in terms of
performance. AFAIK we still have a gap when compared to DPDK's numbers. So
I'm really not sure why better performance bothers you? :)
Let's rather be harsher on changes that actually decrease the
performance, not the other way around. And I suppose you were the one who
always brought up the 'death by a 1000 paper cuts' of XDP. So yeah,
I'm a bit confused by your statement.
> --
> Best regards,
> Jesper Dangaard Brouer
> MSc.CS, Principal Kernel Engineer at Red Hat
> LinkedIn: http://www.linkedin.com/in/brouer
>
> [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time_bench_sample.c
> [2] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time_bench_memset.c
> [3] https://github.com/netoptimizer/network-testing/blob/master/src/udp_sink.c
>
Thread overview: 45+ messages
2021-01-19 15:50 [PATCH bpf-next v2 0/8] Introduce bpf_redirect_xsk() helper Björn Töpel
2021-01-19 15:50 ` [PATCH bpf-next v2 1/8] xdp: restructure redirect actions Björn Töpel
2021-01-20 12:44 ` Toke Høiland-Jørgensen
2021-01-20 13:40 ` Björn Töpel
2021-01-20 14:52 ` Toke Høiland-Jørgensen
2021-01-20 15:49 ` Björn Töpel
2021-01-20 16:30 ` Toke Høiland-Jørgensen
2021-01-20 17:26 ` Björn Töpel
2021-01-19 15:50 ` [PATCH bpf-next v2 2/8] xsk: remove explicit_free parameter from __xsk_rcv() Björn Töpel
2021-01-19 15:50 ` [PATCH bpf-next v2 3/8] xsk: fold xp_assign_dev and __xp_assign_dev Björn Töpel
2021-01-19 15:50 ` [PATCH bpf-next v2 4/8] xsk: register XDP sockets at bind(), and add new AF_XDP BPF helper Björn Töpel
2021-01-20 8:25 ` kernel test robot
2021-01-20 8:41 ` Björn Töpel
2021-01-20 8:50 ` kernel test robot
2021-01-20 12:50 ` Toke Høiland-Jørgensen
2021-01-20 13:25 ` Björn Töpel
2021-01-20 14:54 ` Toke Høiland-Jørgensen
2021-01-20 15:18 ` Björn Töpel
2021-01-20 17:29 ` Toke Høiland-Jørgensen
2021-01-20 18:22 ` Björn Töpel
2021-01-20 20:26 ` Toke Høiland-Jørgensen
2021-01-20 21:15 ` Alexei Starovoitov
2021-01-21 8:18 ` Björn Töpel
2021-01-19 15:50 ` [PATCH bpf-next v2 5/8] libbpf, xsk: select AF_XDP BPF program based on kernel version Björn Töpel
2021-01-20 12:52 ` Toke Høiland-Jørgensen
2021-01-20 13:25 ` Björn Töpel
2021-01-20 14:49 ` Björn Töpel
2021-01-20 15:11 ` Toke Høiland-Jørgensen
2021-01-20 15:27 ` Björn Töpel
2021-01-20 17:30 ` Toke Høiland-Jørgensen
2021-01-20 18:25 ` Alexei Starovoitov
2021-01-20 18:30 ` Björn Töpel
2021-01-20 14:56 ` Toke Høiland-Jørgensen
2021-01-19 15:50 ` [PATCH bpf-next v2 6/8] libbpf, xsk: select bpf_redirect_xsk(), if supported Björn Töpel
2021-01-19 15:50 ` [PATCH bpf-next v2 7/8] selftest/bpf: add XDP socket tests for bpf_redirect_{xsk, map}() Björn Töpel
2021-01-21 7:39 ` Andrii Nakryiko
2021-01-21 12:31 ` Björn Töpel
2021-01-19 15:50 ` [PATCH bpf-next v2 8/8] selftest/bpf: remove a lot of ifobject casting in xdpxceiver Björn Töpel
2021-01-20 13:15 ` [PATCH bpf-next v2 0/8] Introduce bpf_redirect_xsk() helper Maxim Mikityanskiy
2021-01-20 13:27 ` Björn Töpel
2021-01-20 15:57 ` Jesper Dangaard Brouer
2021-01-20 16:19 ` Maciej Fijalkowski [this message]
2021-01-21 17:01 ` Jesper Dangaard Brouer
2021-01-22 8:59 ` Magnus Karlsson
2021-01-22 9:45 ` Maciej Fijalkowski