From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: "Jonathan Lemon" <jonathan.lemon@gmail.com>,
"Magnus Karlsson" <magnus.karlsson@intel.com>,
"Björn Töpel" <bjorn.topel@intel.com>,
ast@kernel.org, "Daniel Borkmann" <daniel@iogearbox.net>,
"Network Development" <netdev@vger.kernel.org>,
"Jakub Kicinski" <jakub.kicinski@netronome.com>,
"Björn Töpel" <bjorn.topel@gmail.com>,
"Zhang, Qi Z" <qi.z.zhang@intel.com>,
xiaolong.ye@intel.com, brouer@redhat.com,
"xdp-newbies@vger.kernel.org" <xdp-newbies@vger.kernel.org>
Subject: Re: [PATCH bpf-next v4 0/2] libbpf: adding AF_XDP support
Date: Wed, 13 Feb 2019 12:55:30 +0100 [thread overview]
Message-ID: <20190213125530.4a7fb8bc@carbon> (raw)
In-Reply-To: <CAJ8uoz19UjmEHTc28Qd_9KdY9D-ojXSBRTbmffRhUTX49mnWvg@mail.gmail.com>
On Wed, 13 Feb 2019 12:32:47 +0100
Magnus Karlsson <magnus.karlsson@gmail.com> wrote:
> On Mon, Feb 11, 2019 at 9:44 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
> >
> > On 8 Feb 2019, at 5:05, Magnus Karlsson wrote:
> >
> > > This patch proposes to add AF_XDP support to libbpf. The main reason
> > > for this is to facilitate writing applications that use AF_XDP by
> > > offering higher-level APIs that hide many of the details of the AF_XDP
> > > uapi. This is in the same vein as libbpf facilitates XDP adoption by
> > > offering easy-to-use higher level interfaces of XDP
> > > functionality. Hopefully this will facilitate adoption of AF_XDP, make
> > > applications using it simpler and smaller, and finally also make it
> > > possible for applications to benefit from optimizations in the AF_XDP
> > > user space access code. Previously, people just copied and pasted the
> > > code from the sample application into their application, which is not
> > > desirable.
> >
> > I like the idea of encapsulating the boilerplate logic in a library.
> >
> > I do think there is an important missing piece though - there should be
> > some code which queries the netdev for how many queues are attached, and
> > create the appropriate number of umem/AF_XDP sockets.
> >
> > I ran into this issue when testing the current AF_XDP code - on my test
> > boxes, the mlx5 card has 55 channels (aka queues), so when the test program
> > binds only to channel 0, nothing works as expected, since not all traffic
> > is being intercepted. While obvious in hindsight, this took a while to
> > track down.
>
> Yes, agreed. You are not the first one to stumble upon this problem
> :-). Let me think a little bit on how to solve this in a good way. We
> need this to be simple and intuitive, as you say.
I see people hitting this with AF_XDP all the time... I had some
backup-slides[2] in our FOSDEM presentation[1] that describe the issue,
give the performance reason why and propose a workaround.
[1] https://github.com/xdp-project/xdp-project/tree/master/conference/FOSDEM2019
[2] https://github.com/xdp-project/xdp-project/blob/master/conference/FOSDEM2019/xdp_building_block.org#backup-slides
Alternative work-around
* Create as many AF_XDP sockets as RXQs
* Have userspace poll()/select on all sockets
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
* Backup Slides :export:
** Slide: Where does AF_XDP performance come from? :export:
/Lock-free [[https://lwn.net/Articles/169961/][channel]] directly from driver RX-queue into AF_XDP socket/
- Single-Producer/Single-Consumer (SPSC) descriptor ring queues
- *Single*-/Producer/ (SP) via bind to specific RX-*/queue id/*
* NAPI-softirq assures only 1-CPU process 1-RX-queue id (per sched)
- *Single*-/Consumer/ (SC) via 1-Application
- *Bounded* buffer pool (UMEM) allocated by userspace (register with kernel)
* Descriptor(s) in ring(s) point into UMEM
* /No memory allocation/, but return frames to UMEM in timely manner
- [[http://www.lemis.com/grog/Documentation/vj/lca06vj.pdf][Transport signature]] Van Jacobson talked about
* Replaced by XDP/eBPF program choosing to XDP_REDIRECT
** Slide: Details: Actually *four* SPSC ring queues :export:
AF_XDP /socket/: Has /two rings/: *RX* and *TX*
- Descriptor(s) in ring points into UMEM
/UMEM/ consists of a number of equally sized chunks
- Has /two rings/: *FILL* ring and *COMPLETION* ring
- FILL ring: application gives kernel area to RX fill
- COMPLETION ring: kernel tells app TX is done for area (can be reused)
** Slide: Gotcha by RX-queue id binding :export:
AF_XDP bound to */single RX-queue id/* (for SPSC performance reasons)
- NIC by default spreads flows with RSS-hashing over RX-queues
* Traffic likely not hitting queue you expect
- You *MUST* configure NIC *HW filters* to /steer to RX-queue id/
* Out of scope for XDP setup
* Use ethtool or TC HW offloading for filter setup
- *Alternative* work-around
* /Create as many AF_XDP sockets as RXQs/
* Have userspace poll()/select on all sockets
next prev parent reply other threads:[~2019-02-13 11:55 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-02-08 13:05 [PATCH bpf-next v4 0/2] libbpf: adding AF_XDP support Magnus Karlsson
2019-02-08 13:05 ` [PATCH bpf-next v4 1/2] libbpf: add support for using AF_XDP sockets Magnus Karlsson
2019-02-15 16:37 ` Daniel Borkmann
2019-02-18 8:59 ` Magnus Karlsson
2019-02-18 11:21 ` Maciej Fijalkowski
2019-02-08 13:05 ` [PATCH bpf-next v4 2/2] samples/bpf: convert xdpsock to use libbpf for AF_XDP access Magnus Karlsson
2019-02-11 6:33 ` [PATCH bpf-next v4 0/2] libbpf: adding AF_XDP support Jean-Mickael Guerin
2019-02-11 7:52 ` Magnus Karlsson
2019-02-11 19:48 ` Jonathan Lemon
2019-02-13 11:32 ` Magnus Karlsson
2019-02-13 11:55 ` Jesper Dangaard Brouer [this message]
2019-02-15 16:20 ` Daniel Borkmann
2019-02-18 8:20 ` Magnus Karlsson
2019-02-18 9:38 ` Daniel Borkmann
2019-02-18 10:09 ` Magnus Karlsson
2019-02-13 20:49 ` Jonathan Lemon
2019-02-14 8:25 ` Magnus Karlsson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190213125530.4a7fb8bc@carbon \
--to=brouer@redhat.com \
--cc=ast@kernel.org \
--cc=bjorn.topel@gmail.com \
--cc=bjorn.topel@intel.com \
--cc=daniel@iogearbox.net \
--cc=jakub.kicinski@netronome.com \
--cc=jonathan.lemon@gmail.com \
--cc=magnus.karlsson@gmail.com \
--cc=magnus.karlsson@intel.com \
--cc=netdev@vger.kernel.org \
--cc=qi.z.zhang@intel.com \
--cc=xdp-newbies@vger.kernel.org \
--cc=xiaolong.ye@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).