From: John Fastabend <john.fastabend@gmail.com>
To: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>,
	John Fastabend <john.fastabend@gmail.com>
Cc: Lorenzo Bianconi <lorenzo@kernel.org>,
	bpf@vger.kernel.org, netdev@vger.kernel.org, davem@davemloft.net,
	kuba@kernel.org, ast@kernel.org, daniel@iogearbox.net,
	shayagr@amazon.com, sameehj@amazon.com, dsahern@kernel.org,
	brouer@redhat.com, echaudro@redhat.com
Subject: Re: [PATCH v4 bpf-next 00/13] mvneta: introduce XDP multi-buffer support
Date: Fri, 02 Oct 2020 11:06:12 -0700
Message-ID: <5f776c14d69b3_a6402087e@john-XPS-13-9370.notmuch>
In-Reply-To: <20201002160623.GA40027@lore-desk>

Lorenzo Bianconi wrote:
> > Lorenzo Bianconi wrote:
> > > This series introduces XDP multi-buffer support. The mvneta driver is
> > > the first to support these new "non-linear" xdp_{buff,frame}. Reviewers,
> > > please focus on how these new types of xdp_{buff,frame} packets
> > > traverse the different layers and on the layout design. It is on purpose
> > > that the BPF-helpers are kept simple, as we don't want to expose the
> > > internal layout, so that it can still be changed later.
> > > 
> > > For now, to keep the design simple and to maintain performance, the XDP
> > > BPF-prog (still) only has access to the first buffer. It is left for
> > > later (another patchset) to add payload access across multiple buffers.
> > > This patchset should still allow for these future extensions. The goal
> > > is to lift the MTU restriction that comes with XDP, but maintain the
> > > same performance as before.
> > > 
> > > The main idea for the new multi-buffer layout is to reuse the same
> > > layout used for non-linear SKBs. This relies on the "skb_shared_info"
> > > struct at the end of the first buffer to link together subsequent
> > > buffers. Keeping the layout compatible with SKBs is also done to ease
> > > and speed up creating an SKB from an xdp_{buff,frame}. Converting an
> > > xdp_frame to an SKB and delivering it to the network stack is shown in
> > > the cpumap code (patch 13/13).
> > 
> > Using the end of the buffer for the skb_shared_info struct is going to
> > become driver API, so unwinding it if it proves to be a performance issue
> > is going to be ugly. So, same question as before: for the use case where
> > we receive a packet and do XDP_TX with it, how do we avoid the cache-miss
> > overhead? This is not just a hypothetical use case; the Facebook
> > load balancer does this, as does Cilium, and allowing it with
> > multi-buffer packets >1500B would be useful.
> > 
> > Can we write the skb_shared_info lazily? It should only be needed once
> > we know the packet is going up the stack to some place that needs the
> > info, which we could learn from the return code of the XDP program.
> 
> Hi John,

Hi, I'll try to join the two threads (this one and the one on the helpers)
here so we don't get too fragmented.

> 
> I agree, I think for the XDP_TX use-case it is not strictly necessary to fill
> the skb_shared_info. The driver can just keep this info on the stack and use it
> when inserting the packet back into the DMA ring.
> For mvneta I implemented it this way to keep the code aligned with the
> ndo_xdp_xmit path, since it is a low-end device. I guess we are not introducing
> any API constraint for XDP_TX. A high-end device can implement multi-buff for
> XDP_TX in a different way in order to avoid the cache miss.

Agreed, it would be an implementation detail for XDP_TX, except that the two
helpers added in this series currently require it to be there.
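
For what it's worth, the lazy-fill idea above could look roughly like the
sketch below. Everything prefixed drv_* and the drv_rx_frag struct are made-up
names just for illustration (not from this series or from mvneta); only
bpf_prog_run_xdp() and the XDP return codes are real:

 struct drv_rx_frag {               /* hypothetical on-stack frag descriptor */
     struct page *page;
     unsigned int len;
     unsigned int offset;
 };

 /* hypothetical driver helpers, defined elsewhere in the driver */
 static void drv_xmit_mbuf(struct xdp_buff *xdp,
                           struct drv_rx_frag *frags, int nr_frags);
 static void drv_fill_shinfo(struct xdp_buff *xdp,
                             struct drv_rx_frag *frags, int nr_frags);

 static u32 drv_run_xdp(struct bpf_prog *prog, struct xdp_buff *xdp,
                        struct drv_rx_frag *frags, int nr_frags)
 {
     u32 act = bpf_prog_run_xdp(prog, xdp);

     switch (act) {
     case XDP_TX:
         /* frag info stays on the stack; the tail of the first buffer
          * (and its cache line) is never touched
          */
         drv_xmit_mbuf(xdp, frags, nr_frags);
         break;
     case XDP_PASS:
     case XDP_REDIRECT:
         /* only now pay for initializing skb_shared_info */
         drv_fill_shinfo(xdp, frags, nr_frags);
         break;
     default:
         break;
     }
     return act;
 }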

> 
> We need to fill the skb_shared_info only when we want to pass the frame to the
> network stack (build_skb() can directly reuse skb_shared_info->frags[]) or for
> the XDP_REDIRECT use-case.

It might be good to think about the XDP_REDIRECT case as well then. If the
frags list fits in the metadata/xdp_frame, would we expect better
performance?

Looking at skb_shared_info{}, it is a rather large structure with many
fields that look unnecessary for the XDP_REDIRECT case and are only needed
when passing to the stack. Fundamentally, a frag just needs:

 struct bio_vec {
     struct page *bv_page;     // 8B
     unsigned int bv_len;      // 4B
     unsigned int bv_offset;   // 4B
 } // 16B

With header split + data we only need a single frag, so we could use just
16B. And in the worst case, jumbo frame + header split, it seems 3 entries
would be enough, giving 48B (header plus three 4k pages). Could we just stick
this in the metadata and make it read-only? Then programs that care can read
it and get all the info they need without helpers. I would expect performance
to be better in the XDP_TX and XDP_REDIRECT cases. And copying a worst-case
extra 48B when passing to the stack is, I guess, not measurable given all the
work needed in that path.
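
To make that concrete, the program-side view could be something like the
sketch below. The xdp_mb_md layout is purely hypothetical (nothing in this
series defines it); the idea is that the driver would prepend it to the
metadata area so a program can read frag info without any helper calls:

 #include <linux/bpf.h>
 #include <bpf/bpf_helpers.h>

 /* hypothetical, driver-provided read-only layout in the metadata area */
 struct xdp_mb_md {
     __u32 nr_frags;
     __u32 total_len;
     struct { __u32 len; __u32 offset; } frags[3];
 };

 SEC("xdp")
 int xdp_mb_frags(struct xdp_md *ctx)
 {
     void *data_meta = (void *)(long)ctx->data_meta;
     void *data = (void *)(long)ctx->data;
     struct xdp_mb_md *md = data_meta;
     __u32 bytes = 0;
     int i;

     if (data_meta + sizeof(*md) > data)
         return XDP_PASS;            /* no multi-buffer metadata present */

     for (i = 0; i < 3 && i < md->nr_frags; i++)
         bytes += md->frags[i].len;

     /* e.g. drop frames whose fragments exceed three 4k pages */
     return bytes > 3 * 4096 ? XDP_DROP : XDP_PASS;
 }

The kernel side would still need the page pointers somewhere (like bio_vec
above), but the program-visible part could be just len/offset.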

> 
> > 
> > > 
> > > A multi-buffer bit (mb) has been introduced in the xdp_{buff,frame}
> > > structure to notify the bpf/network layer whether this is an xdp
> > > multi-buffer frame (mb = 1) or not (mb = 0).
> > > The mb bit will be set by an xdp multi-buffer capable driver only for
> > > non-linear frames, maintaining the capability to receive linear frames
> > > without any extra cost, since the skb_shared_info structure at the end
> > > of the first buffer will be initialized only if mb is set.
> > 
> > Thanks, the above is clearer.
> > 
> > > 
> > > In order to provide userspace with some metadata about the non-linear
> > > xdp_{buff,frame}, we introduced 2 bpf helpers:
> > > - bpf_xdp_get_frags_count:
> > >   get the number of fragments for a given xdp multi-buffer.
> > > - bpf_xdp_get_frags_total_size:
> > >   get the total size of the fragments for a given xdp multi-buffer.
> > 
> > What's the use case for these? Do you have an example where knowing
> > the frags count is going to be something a BPF program will use?
> > Having the total size seems interesting, but perhaps we should push that
> > into the metadata so it's pulled into the cache if users are going to
> > be reading it on every packet or something.
> 
> At the moment we do not have any use-case for these helpers (not considering
> the sample in the series :)). We introduced them to provide some basic metadata
> about the non-linear xdp_frame.
> IIRC we decided to introduce some helpers instead of adding this info to
> xdp_frame in order to save space on it (for xdp it is essential that xdp_frame
> fits in a single cache line).

Sure, how about in the metadata then? (In the other thread I was suggesting
putting the total length in the metadata.) We could even allow programs to
overwrite it if they wanted, as long as it's not used by the stack for anything
other than packet-length visibility. Of course, users would then need to be a
bit careful not to overwrite it and then read it back expecting the length to
still be correct. I think from a user's perspective, though, that would be
expected.
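
Just to spell out that caveat: with a writable length field at a hypothetical
fixed spot in the metadata (again, nothing in this series defines such a
layout), the foot-gun would look something like:

 #include <linux/bpf.h>
 #include <bpf/bpf_helpers.h>

 struct xdp_len_md {                 /* hypothetical metadata layout */
     __u32 total_len;                /* linear part + all fragments */
 };

 SEC("xdp")
 int xdp_mb_len(struct xdp_md *ctx)
 {
     void *data_meta = (void *)(long)ctx->data_meta;
     void *data = (void *)(long)ctx->data;
     struct xdp_len_md *md = data_meta;

     if (data_meta + sizeof(*md) > data)
         return XDP_PASS;

     /* allowed if the field is writable, but any later read of total_len
      * no longer reflects the real packet length
      */
     md->total_len = 0;
     return XDP_PASS;
 }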

> 
> Regards,
> Lorenzo
> 


Thread overview: 31+ messages
2020-10-02 14:41 [PATCH v4 bpf-next 00/13] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
2020-10-02 14:41 ` [PATCH v4 bpf-next 01/13] xdp: introduce mb in xdp_buff/xdp_frame Lorenzo Bianconi
2020-10-02 14:42 ` [PATCH v4 bpf-next 02/13] xdp: initialize xdp_buff mb bit to 0 in all XDP drivers Lorenzo Bianconi
2020-10-02 14:42 ` [PATCH v4 bpf-next 03/13] net: mvneta: update mb bit before passing the xdp buffer to eBPF layer Lorenzo Bianconi
2020-10-02 14:42 ` [PATCH v4 bpf-next 04/13] xdp: add multi-buff support to xdp_return_{buff/frame} Lorenzo Bianconi
2020-10-02 14:42 ` [PATCH v4 bpf-next 05/13] net: mvneta: add multi buffer support to XDP_TX Lorenzo Bianconi
2020-10-02 14:42 ` [PATCH v4 bpf-next 06/13] bpf: introduce bpf_xdp_get_frags_{count, total_size} helpers Lorenzo Bianconi
2020-10-02 15:36   ` John Fastabend
2020-10-02 16:25     ` Lorenzo Bianconi
2020-10-02 14:42 ` [PATCH v4 bpf-next 07/13] samples/bpf: add bpf program that uses xdp mb helpers Lorenzo Bianconi
2020-10-02 14:42 ` [PATCH v4 bpf-next 08/13] bpf: move user_size out of bpf_test_init Lorenzo Bianconi
2020-10-02 14:42 ` [PATCH v4 bpf-next 09/13] bpf: introduce multibuff support to bpf_prog_test_run_xdp() Lorenzo Bianconi
2020-10-08  8:06   ` Shay Agroskin
2020-10-08 10:46     ` Jesper Dangaard Brouer
2020-10-02 14:42 ` [PATCH v4 bpf-next 10/13] bpf: test_run: add skb_shared_info pointer in bpf_test_finish signature Lorenzo Bianconi
2020-10-02 14:42 ` [PATCH v4 bpf-next 11/13] bpf: add xdp multi-buffer selftest Lorenzo Bianconi
2020-10-02 14:42 ` [PATCH v4 bpf-next 12/13] net: mvneta: enable jumbo frames for XDP Lorenzo Bianconi
2020-10-02 14:42 ` [PATCH v4 bpf-next 13/13] bpf: cpumap: introduce xdp multi-buff support Lorenzo Bianconi
2020-10-02 15:25 ` [PATCH v4 bpf-next 00/13] mvneta: introduce XDP multi-buffer support John Fastabend
2020-10-02 16:06   ` Lorenzo Bianconi
2020-10-02 18:06     ` John Fastabend [this message]
2020-10-05  9:52       ` Jesper Dangaard Brouer
2020-10-05 21:22         ` John Fastabend
2020-10-05 22:24           ` Lorenzo Bianconi
2020-10-06  4:29             ` John Fastabend
2020-10-06  7:30               ` Jesper Dangaard Brouer
2020-10-06 15:28                 ` Lorenzo Bianconi
2020-10-08 14:38                   ` John Fastabend
2020-10-02 19:53   ` Daniel Borkmann
2020-10-05 15:50     ` Tirthendu Sarkar
2020-10-06 12:39     ` Jubran, Samih
