netdev.vger.kernel.org archive mirror
* XDP multi-buffer design discussion
       [not found]           ` <ec2fd7f6da44410fbaeb021cf984f2f6@EX13D11EUC003.ant.amazon.com>
@ 2019-12-16 14:07             ` Jesper Dangaard Brouer
  2019-12-17  4:15               ` Luigi Rizzo
  0 siblings, 1 reply; 11+ messages in thread
From: Jesper Dangaard Brouer @ 2019-12-16 14:07 UTC (permalink / raw)
  To: Jubran, Samih
  Cc: Machulsky, Zorik, Daniel Borkmann, David Miller, Tzalik, Guy,
	Ilias Apalodimas, Toke Høiland-Jørgensen, Kiyanovski,
	Arthur, brouer, Alexei Starovoitov, netdev, David Ahern


See answers inlined below (please get an email client that supports
inline replies... to interact with this community)

On Sun, 15 Dec 2019 13:57:12 +0000
"Jubran, Samih" <sameehj@amazon.com> wrote:

> I am currently working on writing a design document for the XDP multi
> buff and I am using your proposal as a base. 

[Base-doc]: https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org

I would really appreciate it if you could write your design document in a
text-based format, so that it can be included in the XDP-project
git-repo.  If you don't want to update the existing document
xdp-multi-buffer01-design.org, I suggest that you instead create
xdp-multi-buffer02-design.org to lay out your design proposal.

That said, don't write a huge document... instead interact with
netdev@vger.kernel.org as early as possible, and then update the design
doc with the input you get.  Let's start it now... Cc'ing netdev, as this
discussion should also be public.


> I have few questions in mind that weren't addressed in your draft and
> it would be great if you share your thoughts on them.
> 
> * Why should we provide the fragments to the bpf program if the
> program doesn't access them? If validating the length is what
> matters, we can provide only the full length info to the user with no
> issues.

My Proposal#1 (in [base-doc]) is that XDP only gets access to the
first buffer.  People are welcome to challenge this choice.

There are several sub-questions and challenges hidden inside this
choice.

As you hint, the total length... spawns some questions we should answer:

 (1) is it relevant for the BPF program to know this? Explain the use-case.

 (2) if so, how does the BPF prog access this info (without slowing down
     the baseline)?

 (3) if so, this implies the driver needs to collect the frags before
     calling the bpf_prog (which influences the driver RX-loop design).


> * In case we do need the fragments, should they be modifiable
> (Without helpers) by the xdp developer? 

It is okay to think about how we can give access to fragments in the
future, but IMHO we should avoid going too far down that path... If we
just make sure we can extend it later, that should be enough.


> * What about data_end? I believe it should point to the end of the
> first buffer, correct?

Yes, because this is part of BPF-verifier checks.
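
To illustrate why: the verifier only accepts direct packet reads that are
first bounds-checked against data_end, roughly like the minimal sketch
below (not code from this thread; include paths follow current libbpf
conventions):

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int data_end_bounds_example(struct xdp_md *ctx)
{
	void *data     = (void *)(long)ctx->data;
	void *data_end = (void *)(long)ctx->data_end;
	struct ethhdr *eth = data;

	/* The verifier rejects the read of eth->h_proto below unless this
	 * comparison against data_end comes first, which is why data_end
	 * must keep pointing at the end of the buffer the program may
	 * actually touch (the first buffer, in the multi-buffer case). */
	if ((void *)(eth + 1) > data_end)
		return XDP_DROP;

	if (eth->h_proto == bpf_htons(ETH_P_ARP))
		return XDP_DROP;	/* arbitrary example verdict */

	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";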


> * Should the kernel indicate to the driver somehow that it supports
> multi buf? I suppose this shouldn't be an issue unless somehow the
> patches were back patched to old kernels.
> 

The other way around.  The driver needs to indicate to the kernel that
it supports/has enabled XDP multi-buffer.  This feature "indication"
interface is unfortunately not available today...

The reason this is needed: the BPF-helper bpf_xdp_adjust_tail() is
allowed to modify xdp_buff->data_end (as also described in [base-doc]).
Even though this only shrinks the frame, it seems very wrong to
change/shrink only the first fragment.

IMHO the BPF-loader (or XDP-attach) should simply reject programs using
bpf_xdp_adjust_tail() on a driver that has enabled XDP multi-buffer.
Something similar basically happens today when trying to attach XDP on a
NIC with a large MTU (one that requires two or more pages): the driver
rejects the attach.
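
For reference, a minimal sketch of that existing rejection pattern in a
driver's XDP setup path; the driver name, the MTU constant and the error
code below are made up for illustration, not taken from any particular
driver:

#include <linux/netdevice.h>
#include <linux/bpf.h>

/* Hypothetical: largest MTU whose frames still fit in a single buffer. */
#define MYDRV_XDP_MAX_MTU	3520

static int mydrv_xdp_setup(struct net_device *dev, struct bpf_prog *prog)
{
	if (prog && dev->mtu > MYDRV_XDP_MAX_MTU) {
		netdev_warn(dev, "MTU %u too large for XDP (max %u)\n",
			    dev->mtu, MYDRV_XDP_MAX_MTU);
		return -EOPNOTSUPP;	/* frames would need more than one buffer */
	}

	/* ... otherwise swap in the new program under the usual locking ... */
	return 0;
}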

--Jesper



> > -----Original Message-----
> > From: Jesper Dangaard Brouer <brouer@redhat.com>
> > Sent: Wednesday, December 4, 2019 4:55 PM
> > To: Machulsky, Zorik <zorik@amazon.com>
> > Cc: Daniel Borkmann <borkmann@iogearbox.net>; David Miller
> > <davem@davemloft.net>; Jubran, Samih <sameehj@amazon.com>; Tzalik,
> > Guy <gtzalik@amazon.com>; brouer@redhat.com; Ilias Apalodimas
> > <ilias.apalodimas@linaro.org>; Toke Høiland-Jørgensen <toke@redhat.com>
> > Subject: Re: XDP_TX in ENA
> > 
> > On Mon, 2 Dec 2019 08:17:08 +0000
> > "Machulsky, Zorik" <zorik@amazon.com> wrote:
> >   
> > > Hi Jesper,
> > >
> > > Just wanted to inform you that Samih (cc-ed) started working on
> > > multi-buffer packets support. I hope it will be OK to reach out to
> > > this forum in case there will be questions during this work.  
> > 
> > Great to hear that you are continuing the work.
> > 
> > I did notice the patchset ("Introduce XDP to ena") from Sameeh, but net-
> > next is currently closed.  I will appreciate if you can Cc both me
> > (brouer@redhat.com) and Ilias Apalodimas <ilias.apalodimas@linaro.org>.
> > 
> > Ilias have signed up for doing driver XDP reviews.
> > 
> > --Jesper
> > 
> >   
> > > On 8/22/19, 11:47 PM, "Jesper Dangaard Brouer" <brouer@redhat.com>  
> > wrote:  
> > >
> > >     Hi Zorik,
> > >
> > >     How do you plan to handle multi-buffer packets (a.k.a jumbo-frames, and  
> > >     more)?
> > >
> > >     Most drivers, when XDP gets loaded, just limit the MTU and disable TSO
> > >     (notice GRO in software is still done). Or reject XDP loading if
> > >     MTU > 3520 or TSO is enabled.
> > >
> > >     You seemed to want XDP multi-buffer support.  For this to happen
> > >     someone needs to work on this.  I've written up a design proposal
> > >     here[1], but I don't have time to work on this... Can you allocate
> > >     resources to work on this?
> > >
> > >     [1] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org
> > >
> > >     --Jesper
> > >     (top-post as your email client seems to be challenged ;-))


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer



* Re: XDP multi-buffer design discussion
  2019-12-16 14:07             ` XDP multi-buffer design discussion Jesper Dangaard Brouer
@ 2019-12-17  4:15               ` Luigi Rizzo
  2019-12-17  8:46                 ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 11+ messages in thread
From: Luigi Rizzo @ 2019-12-17  4:15 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Jubran, Samih, Machulsky, Zorik, Daniel Borkmann, David Miller,
	Tzalik, Guy, Ilias Apalodimas, Toke Høiland-Jørgensen,
	Kiyanovski, Arthur, Alexei Starovoitov, netdev, David Ahern

On Mon, Dec 16, 2019 at 6:07 AM Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
>
>
> See answers inlined below (please get an email client that support
> inline replies... to interact with this community)
>
> On Sun, 15 Dec 2019 13:57:12 +0000
> "Jubran, Samih" <sameehj@amazon.com> wrote:
...
> > * Why should we provide the fragments to the bpf program if the
> > program doesn't access them? If validating the length is what
> > matters, we can provide only the full length info to the user with no
> > issues.
>
> My Proposal#1 (in [base-doc]) is that XDP only get access to the
> first-buffer.  People are welcome to challenge this choice.
>
> There are a several sub-questions and challenges hidden inside this
> choice.
>
> As you hint, the total length... spawns some questions we should answer:
>
>  (1) is it relevant to the BPF program to know this, explain the use-case.
>
>  (2) if so, how does BPF prog access info (without slowdown baseline)

For some use cases, the bpf program could deduce the total length by
looking at the L3 header. It won't work for the XDP_TX response though.
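
As an illustration of that idea (a sketch, not code from this mail): for
IPv4 the total length can be read from the IP header even when only the
first buffer is visible; include paths follow current libbpf conventions:

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int total_len_from_l3(struct xdp_md *ctx)
{
	void *data     = (void *)(long)ctx->data;
	void *data_end = (void *)(long)ctx->data_end;
	struct ethhdr *eth = data;
	struct iphdr *iph  = data + sizeof(*eth);

	if ((void *)(iph + 1) > data_end)
		return XDP_PASS;
	if (eth->h_proto != bpf_htons(ETH_P_IP))
		return XDP_PASS;

	/* tot_len covers the whole IP packet, i.e. all buffers of this
	 * frame, even though only the first buffer is accessible here. */
	__u32 total_len = bpf_ntohs(iph->tot_len) + sizeof(*eth);

	if (total_len < sizeof(*eth) + sizeof(*iph))
		return XDP_DROP;	/* malformed length field */

	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";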

cheers
luigi


* Re: XDP multi-buffer design discussion
  2019-12-17  4:15               ` Luigi Rizzo
@ 2019-12-17  8:46                 ` Jesper Dangaard Brouer
  2019-12-17  9:00                   ` Toke Høiland-Jørgensen
  2019-12-17 22:30                   ` Luigi Rizzo
  0 siblings, 2 replies; 11+ messages in thread
From: Jesper Dangaard Brouer @ 2019-12-17  8:46 UTC (permalink / raw)
  To: Luigi Rizzo
  Cc: Jubran, Samih, Machulsky, Zorik, Daniel Borkmann, David Miller,
	Tzalik, Guy, Ilias Apalodimas, Toke Høiland-Jørgensen,
	Kiyanovski, Arthur, Alexei Starovoitov, netdev, David Ahern,
	brouer

On Mon, 16 Dec 2019 20:15:12 -0800
Luigi Rizzo <rizzo@iet.unipi.it> wrote:

> On Mon, Dec 16, 2019 at 6:07 AM Jesper Dangaard Brouer
> <brouer@redhat.com> wrote:
> >
> >
> > See answers inlined below (please get an email client that support
> > inline replies... to interact with this community)
> >
> > On Sun, 15 Dec 2019 13:57:12 +0000
> > "Jubran, Samih" <sameehj@amazon.com> wrote:  
> ...
> > > * Why should we provide the fragments to the bpf program if the
> > > program doesn't access them? If validating the length is what
> > > matters, we can provide only the full length info to the user with no
> > > issues.  
> >
> > My Proposal#1 (in [base-doc]) is that XDP only get access to the
> > first-buffer.  People are welcome to challenge this choice.
> >
> > There are a several sub-questions and challenges hidden inside this
> > choice.
> >
> > As you hint, the total length... spawns some questions we should answer:
> >
> >  (1) is it relevant to the BPF program to know this, explain the use-case.
> >
> >  (2) if so, how does BPF prog access info (without slowdown baseline)  
> 
> For some use cases, the bpf program could deduct the total length
> looking at the L3 header. 

Yes, that is actually a good insight.  I guess the BPF-program could also
use this to detect that it doesn't have access to the full linear
packet(?)

> It won't work for XDP_TX response though.

The XDP_TX case also needs to be discussed/handled. IMHO we need to
support XDP_TX for multi-buffer frames.  XDP_TX *can* be driver specific,
but most drivers choose to convert the xdp_buff to an xdp_frame, which
makes it possible to use/share part of the XDP_REDIRECT code from
ndo_xdp_xmit.

We also need to handle XDP_REDIRECT, which becomes challenging, as the
ndo_xdp_xmit functions of *all* drivers need to be updated (or we
introduce a flag to handle this incrementally).
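
For context, a rough sketch of the xdp_buff -> xdp_frame pattern referred
to above.  convert_to_xdp_frame(), ndo_xdp_xmit() and XDP_XMIT_FLUSH are
the existing kernel interfaces; the mydrv_* function is a made-up
stand-in (real drivers usually share internal TX-ring helpers instead of
literally calling their own ndo):

#include <linux/netdevice.h>
#include <net/xdp.h>

static void mydrv_xdp_tx(struct net_device *dev, struct xdp_buff *xdp)
{
	struct xdp_frame *xdpf = convert_to_xdp_frame(xdp);

	if (unlikely(!xdpf))
		return;		/* conversion failed, treat as drop */

	/* Same transmit interface that XDP_REDIRECT uses; multi-buffer
	 * support means xdp_frame and this path must learn about the
	 * extra segments as well. */
	if (dev->netdev_ops->ndo_xdp_xmit(dev, 1, &xdpf, XDP_XMIT_FLUSH) != 1)
		xdp_return_frame(xdpf);
}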


Sameeh, I know you have read the section[1] on "Storage space for
multi-buffer references/segments", and that you updated the doc in the
git-tree.  So you should understand that I want to keep this compatible
with how the SKB stores segments, which will make XDP_PASS a lot
easier/faster.
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

[1] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org#storage-space-for-multi-buffer-referencessegments



* Re: XDP multi-buffer design discussion
  2019-12-17  8:46                 ` Jesper Dangaard Brouer
@ 2019-12-17  9:00                   ` Toke Høiland-Jørgensen
  2019-12-17 15:44                     ` Jubran, Samih
  2019-12-17 22:30                   ` Luigi Rizzo
  1 sibling, 1 reply; 11+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-12-17  9:00 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Luigi Rizzo
  Cc: Jubran, Samih, Machulsky, Zorik, Daniel Borkmann, David Miller,
	Tzalik, Guy, Ilias Apalodimas, Kiyanovski, Arthur,
	Alexei Starovoitov, netdev, David Ahern, brouer

Jesper Dangaard Brouer <brouer@redhat.com> writes:

> On Mon, 16 Dec 2019 20:15:12 -0800
> Luigi Rizzo <rizzo@iet.unipi.it> wrote:
>
>> On Mon, Dec 16, 2019 at 6:07 AM Jesper Dangaard Brouer
>> <brouer@redhat.com> wrote:
>> >
>> >
>> > See answers inlined below (please get an email client that support
>> > inline replies... to interact with this community)
>> >
>> > On Sun, 15 Dec 2019 13:57:12 +0000
>> > "Jubran, Samih" <sameehj@amazon.com> wrote:  
>> ...
>> > > * Why should we provide the fragments to the bpf program if the
>> > > program doesn't access them? If validating the length is what
>> > > matters, we can provide only the full length info to the user with no
>> > > issues.  
>> >
>> > My Proposal#1 (in [base-doc]) is that XDP only get access to the
>> > first-buffer.  People are welcome to challenge this choice.
>> >
>> > There are a several sub-questions and challenges hidden inside this
>> > choice.
>> >
>> > As you hint, the total length... spawns some questions we should answer:
>> >
>> >  (1) is it relevant to the BPF program to know this, explain the use-case.
>> >
>> >  (2) if so, how does BPF prog access info (without slowdown baseline)  
>> 
>> For some use cases, the bpf program could deduct the total length
>> looking at the L3 header. 
>
> Yes, that actually good insight.  I guess the BPF-program could also
> use this to detect that it doesn't have access to the full-lineary
> packet this way(?)
>
>> It won't work for XDP_TX response though.
>
> The XDP_TX case also need to be discussed/handled. IMHO need to support
> XDP_TX for multi-buffer frames.  XDP_TX *can* be driver specific, but
> most drivers choose to convert xdp_buff to xdp_frame, which makes it
> possible to use/share part of the XDP_REDIRECT code from ndo_xdp_xmit.
>
> We also need to handle XDP_REDIRECT, which becomes challenging, as the
> ndo_xdp_xmit functions of *all* drivers need to be updated (or
> introduce a flag to handle this incrementally).

If we want to handle TX and REDIRECT (which I agree we do!), doesn't
that imply that we have to structure the drivers so the XDP program
isn't executed until we have the full packet (i.e., on the last
segment)?

-Toke



* RE: XDP multi-buffer design discussion
  2019-12-17  9:00                   ` Toke Høiland-Jørgensen
@ 2019-12-17 15:44                     ` Jubran, Samih
  0 siblings, 0 replies; 11+ messages in thread
From: Jubran, Samih @ 2019-12-17 15:44 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Jesper Dangaard Brouer, Luigi Rizzo
  Cc: Machulsky, Zorik, Daniel Borkmann, David Miller, Tzalik, Guy,
	Ilias Apalodimas, Kiyanovski, Arthur, Alexei Starovoitov, netdev,
	David Ahern, brouer



> -----Original Message-----
> From: Toke Høiland-Jørgensen <toke@redhat.com>
> Sent: Tuesday, December 17, 2019 11:01 AM
> To: Jesper Dangaard Brouer <brouer@redhat.com>; Luigi Rizzo
> <rizzo@iet.unipi.it>
> Cc: Jubran, Samih <sameehj@amazon.com>; Machulsky, Zorik
> <zorik@amazon.com>; Daniel Borkmann <borkmann@iogearbox.net>; David
> Miller <davem@davemloft.net>; Tzalik, Guy <gtzalik@amazon.com>; Ilias
> Apalodimas <ilias.apalodimas@linaro.org>; Kiyanovski, Arthur
> <akiyano@amazon.com>; Alexei Starovoitov <ast@kernel.org>;
> netdev@vger.kernel.org; David Ahern <dsahern@gmail.com>;
> brouer@redhat.com
> Subject: Re: XDP multi-buffer design discussion
> 
> Jesper Dangaard Brouer <brouer@redhat.com> writes:
> 
> > On Mon, 16 Dec 2019 20:15:12 -0800
> > Luigi Rizzo <rizzo@iet.unipi.it> wrote:
> >
> >> On Mon, Dec 16, 2019 at 6:07 AM Jesper Dangaard Brouer
> >> <brouer@redhat.com> wrote:
> >> >
> >> >
> >> > See answers inlined below (please get an email client that support
> >> > inline replies... to interact with this community)
> >> >
> >> > On Sun, 15 Dec 2019 13:57:12 +0000
> >> > "Jubran, Samih" <sameehj@amazon.com> wrote:
> >> ...
> >> > > * Why should we provide the fragments to the bpf program if the
> >> > > program doesn't access them? If validating the length is what
> >> > > matters, we can provide only the full length info to the user
> >> > > with no issues.
> >> >
> >> > My Proposal#1 (in [base-doc]) is that XDP only get access to the
> >> > first-buffer.  People are welcome to challenge this choice.
> >> >
> >> > There are a several sub-questions and challenges hidden inside this
> >> > choice.
> >> >
> >> > As you hint, the total length... spawns some questions we should
> answer:
> >> >
> >> >  (1) is it relevant to the BPF program to know this, explain the use-case.
> >> >
> >> >  (2) if so, how does BPF prog access info (without slowdown
> >> > baseline)
> >>
> >> For some use cases, the bpf program could deduct the total length
> >> looking at the L3 header.
> >
> > Yes, that actually good insight.  I guess the BPF-program could also
> > use this to detect that it doesn't have access to the full-lineary
> > packet this way(?)
> >
> >> It won't work for XDP_TX response though.
> >
> > The XDP_TX case also need to be discussed/handled. IMHO need to
> > support XDP_TX for multi-buffer frames.  XDP_TX *can* be driver
> > specific, but most drivers choose to convert xdp_buff to xdp_frame,
> > which makes it possible to use/share part of the XDP_REDIRECT code from
> ndo_xdp_xmit.
> >
> > We also need to handle XDP_REDIRECT, which becomes challenging, as the
> > ndo_xdp_xmit functions of *all* drivers need to be updated (or
> > introduce a flag to handle this incrementally).
> 
> If we want to handle TX and REDIRECT (which I agree we do!), doesn't that
> imply that we have to structure the drivers so the XDP program isn't
> executed until we have the full packet (i.e., on the last segment)?
> 
> -Toke

Hi All,

Thank you for participating in this discussion; everything mentioned above is good insight.
I'd like to sum it up with the following key points. Please tell me if I'm missing something, and share your opinions.

* The rationale for supplying the frags is the following:
** XDP_PASS optimization: in case the verdict is XDP_PASS, creating the skb can save some work
** XDP_TX: when the verdict is XDP_TX, we need all the frags ready in order to send the packet
** The rx-loop of the driver must be modified anyway, so why not supply information that has already been
   deduced; this should also be great for the future, in case XDP becomes able to access the frags

* Since the XDP program won't be accessing the frags, we don't need to modify the xdp_convert_ctx_access() function, correct?

* We do need to add a way for the driver to indicate to the kernel that it supports multi-buff. This is needed so we can reject programs that use bpf_xdp_adjust_tail().

- Sameeh



* Re: XDP multi-buffer design discussion
  2019-12-17  8:46                 ` Jesper Dangaard Brouer
  2019-12-17  9:00                   ` Toke Høiland-Jørgensen
@ 2019-12-17 22:30                   ` Luigi Rizzo
  2019-12-18 16:03                     ` Jubran, Samih
  1 sibling, 1 reply; 11+ messages in thread
From: Luigi Rizzo @ 2019-12-17 22:30 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Jubran, Samih, Machulsky, Zorik, Daniel Borkmann, David Miller,
	Tzalik, Guy, Ilias Apalodimas, Toke Høiland-Jørgensen,
	Kiyanovski, Arthur, Alexei Starovoitov, netdev, David Ahern

On Tue, Dec 17, 2019 at 12:46 AM Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
>
> On Mon, 16 Dec 2019 20:15:12 -0800
> Luigi Rizzo <rizzo@iet.unipi.it> wrote:
>...
> > For some use cases, the bpf program could deduct the total length
> > looking at the L3 header.
>
> Yes, that actually good insight.  I guess the BPF-program could also
> use this to detect that it doesn't have access to the full-lineary
> packet this way(?)
>
> > It won't work for XDP_TX response though.
>
> The XDP_TX case also need to be discussed/handled. IMHO need to support
> XDP_TX for multi-buffer frames.  XDP_TX *can* be driver specific, but
> most drivers choose to convert xdp_buff to xdp_frame, which makes it
> possible to use/share part of the XDP_REDIRECT code from ndo_xdp_xmit.
>
> We also need to handle XDP_REDIRECT, which becomes challenging, as the
> ndo_xdp_xmit functions of *all* drivers need to be updated (or
> introduce a flag to handle this incrementally).

Here is a possible course of action (please let me know if you find loose ends)

1. extend struct xdp_buff with a total length and an sk_buff * (NULL by default);
2. add a netdev callback to construct the skb for the current packet.
   This code obviously already exists in all drivers, it just needs to be
   exposed as a function callable by a bpf helper (next bullet);
3. add a new helper 'bpf_create_skb' that, when invoked, calls the
   previously mentioned netdev callback to construct an skb for the
   current packet, and sets the pointer in the xdp_buff, if not there
   already.  A bpf program that needs to access segments beyond the first
   one can call bpf_create_skb() if needed, and then use existing helpers
   (skb_load_bytes, skb_store_bytes, etc.) to access the skb.

   My rationale is that if we need to access multiple segments, we are
   already in expensive territory and it makes little sense to define a
   multi-segment format that would essentially be an skb.

4. implement a mechanism to let the driver know whether the currently
   loaded bpf program understands the new format.
   There are multiple ways to do that; a trivial one would be to check,
   during load, that the program calls some known helper, e.g.
   bpf_understands_fragments(), which is then jit-ed to something
   inexpensive.

   Note that today, a netdev that cannot guarantee single-segment packets
   would not be able to enable xdp.  Hence, without loss of functionality,
   such a netdev can refuse to load a program without
   bpf_understands_fragments().  (A rough sketch of the structures and
   helpers this proposal implies follows below.)
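
A rough spelling-out of items 1-4, purely illustrative; none of these
fields, callbacks or helpers exist in today's kernel:

#include <linux/types.h>
#include <linux/skbuff.h>
#include <linux/netdevice.h>
#include <linux/bpf.h>

/* Item 1: proposed additions to struct xdp_buff (existing fields elided). */
struct xdp_buff {
	/* ... existing fields: data, data_end, data_meta, ... */
	u32 total_len;			/* length across all segments */
	struct sk_buff *skb;		/* NULL until bpf_create_skb() runs */
};

/* Item 2: proposed netdev callback that builds an skb for the frame
 * currently being processed in the RX loop. */
struct sk_buff *(*ndo_xdp_build_skb)(struct net_device *dev,
				     struct xdp_buff *xdp);

/* Items 3 and 4: proposed helpers as seen from the program side. */
long bpf_create_skb(struct xdp_md *ctx);	/* 0 on success */
long bpf_understands_fragments(void);		/* marker, jit-ed away */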

With all the above, the generic xdp handler would do the following:

  if (skb_is_nonlinear(skb) && !bpf_understands_fragments())
          <linearize skb>;
  <construct xdp_buff with first segment and skb>  // skb is unused by old-style programs
  <call bpf program>

The native driver for a device that cannot guarantee a single segment
would just refuse to load a program that does not understand them (same
as today), so the code would be:

  <construct xdp_buff with first segment and empty skb>
  <call bpf program>

On return, we might find that an skb has been built by the xdp program,
and it can be immediately used for XDP_PASS (or dropped in case of
XDP_DROP).  For XDP_TX and XDP_REDIRECT, something similar: if the packet
is a single segment and there is no skb, use the existing accelerated
path.  If there are multiple segments, construct the skb if it does not
exist already, and pass it to the standard tx path.

cheers
luigi


* RE: XDP multi-buffer design discussion
  2019-12-17 22:30                   ` Luigi Rizzo
@ 2019-12-18 16:03                     ` Jubran, Samih
  2019-12-19 10:44                       ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 11+ messages in thread
From: Jubran, Samih @ 2019-12-18 16:03 UTC (permalink / raw)
  To: Luigi Rizzo, Jesper Dangaard Brouer
  Cc: Machulsky, Zorik, Daniel Borkmann, David Miller, Tzalik, Guy,
	Ilias Apalodimas, Toke Høiland-Jørgensen, Kiyanovski,
	Arthur, Alexei Starovoitov, netdev, David Ahern



> -----Original Message-----
> From: Luigi Rizzo <rizzo@iet.unipi.it>
> Sent: Wednesday, December 18, 2019 12:30 AM
> To: Jesper Dangaard Brouer <brouer@redhat.com>
> Cc: Jubran, Samih <sameehj@amazon.com>; Machulsky, Zorik
> <zorik@amazon.com>; Daniel Borkmann <borkmann@iogearbox.net>; David
> Miller <davem@davemloft.net>; Tzalik, Guy <gtzalik@amazon.com>; Ilias
> Apalodimas <ilias.apalodimas@linaro.org>; Toke Høiland-Jørgensen
> <toke@redhat.com>; Kiyanovski, Arthur <akiyano@amazon.com>; Alexei
> Starovoitov <ast@kernel.org>; netdev@vger.kernel.org; David Ahern
> <dsahern@gmail.com>
> Subject: Re: XDP multi-buffer design discussion
> 
> On Tue, Dec 17, 2019 at 12:46 AM Jesper Dangaard Brouer
> <brouer@redhat.com> wrote:
> >
> > On Mon, 16 Dec 2019 20:15:12 -0800
> > Luigi Rizzo <rizzo@iet.unipi.it> wrote:
> >...
> > > For some use cases, the bpf program could deduct the total length
> > > looking at the L3 header.
> >
> > Yes, that actually good insight.  I guess the BPF-program could also
> > use this to detect that it doesn't have access to the full-lineary
> > packet this way(?)
> >
> > > It won't work for XDP_TX response though.
> >
> > The XDP_TX case also need to be discussed/handled. IMHO need to
> > support XDP_TX for multi-buffer frames.  XDP_TX *can* be driver
> > specific, but most drivers choose to convert xdp_buff to xdp_frame,
> > which makes it possible to use/share part of the XDP_REDIRECT code from
> ndo_xdp_xmit.
> >
> > We also need to handle XDP_REDIRECT, which becomes challenging, as the
> > ndo_xdp_xmit functions of *all* drivers need to be updated (or
> > introduce a flag to handle this incrementally).
> 
> Here is a possible course of action (please let me know if you find loose ends)
> 
> 1. extend struct xdp_buff with a total length and sk_buff * (NULL by default);
> 2. add a netdev callback to construct the skb for the current packet.
>   This code obviously already in all drivers, just needs to be exposed as
> function
>   callable by a bpf helper (next bullet); 3. add a new helper 'bpf_create_skb'
> that when invoked calls the previously
>   mentioned netdev callback to  constructs an skb for the current packet,
>   and sets the pointer in the xdp_buff, if not there already.
>   A bpf program that needs to access segments beyond the first one can
>   call bpf_create_skb() if needed, and then use existing helpers
>   skb_load_bytes, skb_store_bytes, etc) to access the skb.
> 
>   My rationale is that if we need to access multiple segments, we are already
>   in an expensive territory and it makes little sense to define a multi segment
>   format that would essentially be an skb.
> 
> 4. implement a mechanism to let so the driver know whether the currently
>   loaded bpf program understands the new format.
>   There are multiple ways to do that, a trivial one would be to check, during
> load,
>   that the program calls some known helper eg
> bpf_understands_fragments()
>   which is then jit-ed to somethijng inexpensive
> 
>   Note that today, a netdev that cannot guarantee single segment packets
> would not
>   be able to enable xdp. Hence, without loss of functionality, such netdev can
> refuse to
>   load a program without bpf_undersdands_fragments().
> 
> With all the above, the generic xdp handler would do the following:
>  if (!skb_is_linear() && !bpf_understands_fragments()) {
>     < linearize skb>;
>  }
>   <construct xdp_buff with first segment and skb> // skb is unused by old
> style programs
>   <call bpf program>
> 
> The native driver for a device that cannot guarantee a single segment would
> just refuse to load a program that does not understand them (same as
> today), so the code would be:
> 
> <construct xdp_buff with first segment and empty skb>  <call bpf program>
> 
> On return, we might find that an skb has been built by the xdp program, and
> can be immediately used for XDP_PASS (or dropped in case of XDP_DROP)
> For XDP_TX and XDP_REDIRECT, something similar: if the packet is a single
> segment and there is no skb, use the existing accelerated path. If there are
> multiple segments, construct the skb if not existing already, and pass it to the
> standard tx path.
> 
> cheers
> luigi

I have gone over your suggestion, and it looks good to me! I couldn't find any loose ends. One thing to note is that the driver now needs
to save the context of the currently processed packet for each queue so that it can support the netdev callback for creating the skb.
This sounds a bit messy, but I think it should work.

I'd love to hear more thoughts on this.
Jesper, Toke, what do you guys think?


* Re: XDP multi-buffer design discussion
  2019-12-18 16:03                     ` Jubran, Samih
@ 2019-12-19 10:44                       ` Jesper Dangaard Brouer
  2019-12-19 17:29                         ` Luigi Rizzo
  0 siblings, 1 reply; 11+ messages in thread
From: Jesper Dangaard Brouer @ 2019-12-19 10:44 UTC (permalink / raw)
  To: Jubran, Samih
  Cc: Luigi Rizzo, Machulsky, Zorik, Daniel Borkmann, David Miller,
	Tzalik, Guy, Ilias Apalodimas, Toke Høiland-Jørgensen,
	Kiyanovski, Arthur, Alexei Starovoitov, netdev, David Ahern,
	brouer

On Wed, 18 Dec 2019 16:03:54 +0000
"Jubran, Samih" <sameehj@amazon.com> wrote:

> > -----Original Message-----
> > From: Luigi Rizzo <rizzo@iet.unipi.it>
> > Sent: Wednesday, December 18, 2019 12:30 AM
> > To: Jesper Dangaard Brouer <brouer@redhat.com>
> > Cc: Jubran, Samih <sameehj@amazon.com>; Machulsky, Zorik
> > <zorik@amazon.com>; Daniel Borkmann <borkmann@iogearbox.net>; David
> > Miller <davem@davemloft.net>; Tzalik, Guy <gtzalik@amazon.com>; Ilias
> > Apalodimas <ilias.apalodimas@linaro.org>; Toke Høiland-Jørgensen
> > <toke@redhat.com>; Kiyanovski, Arthur <akiyano@amazon.com>; Alexei
> > Starovoitov <ast@kernel.org>; netdev@vger.kernel.org; David Ahern
> > <dsahern@gmail.com>
> > Subject: Re: XDP multi-buffer design discussion
> > 
> > On Tue, Dec 17, 2019 at 12:46 AM Jesper Dangaard Brouer
> > <brouer@redhat.com> wrote:  
> > >
> > > On Mon, 16 Dec 2019 20:15:12 -0800
> > > Luigi Rizzo <rizzo@iet.unipi.it> wrote:
> > >...  
> > > > For some use cases, the bpf program could deduct the total length
> > > > looking at the L3 header.  
> > >
> > > Yes, that actually good insight.  I guess the BPF-program could also
> > > use this to detect that it doesn't have access to the full-lineary
> > > packet this way(?)
> > >  
> > > > It won't work for XDP_TX response though.  
> > >
> > > The XDP_TX case also need to be discussed/handled. IMHO need to
> > > support XDP_TX for multi-buffer frames.  XDP_TX *can* be driver
> > > specific, but most drivers choose to convert xdp_buff to
> > > xdp_frame, which makes it possible to use/share part of the
> > > XDP_REDIRECT code from ndo_xdp_xmit.  
> > >
> > > We also need to handle XDP_REDIRECT, which becomes challenging, as the
> > > ndo_xdp_xmit functions of *all* drivers need to be updated (or
> > > introduce a flag to handle this incrementally).  
> > 
> > Here is a possible course of action (please let me know if you find
> > loose ends)
> > 
> > 1. extend struct xdp_buff with a total length and sk_buff * (NULL by default);
> >

I don't like extending xdp_buff with an skb pointer. (Remember xdp_buff
is only "allocated" on the stack).

Perhaps extend xdp_buff with a total-length, *BUT* this can also be solved
in other ways.  Extend xdp_buff with a flag indicating that this xdp_buff
has multiple buffers (segments).  Then we know that the area where we
store the segments is valid, and we can also store the total-length there.
(We need this total-length later, after the xdp_buff is freed.)

To make life easier for BPF-developers, we could export/create a
BPF-helper bpf_xdp_total_len(ctx) that hides whether segments are there
or not.
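
A hypothetical sketch of that flag + helper idea, kernel side; the flag,
the ext accessor and the helper below are all invented to illustrate the
idea, none of them exist:

#include <linux/filter.h>
#include <net/xdp.h>

#define XDP_FLAGS_MULTI_BUFF	BIT(0)	/* proposed flag */

struct xdp_buff_ext {			/* stand-in for the proposed storage area */
	u32 flags;			/* XDP_FLAGS_MULTI_BUFF when frags follow */
	u32 total_len;			/* valid only when the flag is set */
};

/* Proposed BPF-helper: hides from the program whether segments exist. */
BPF_CALL_1(bpf_xdp_total_len, struct xdp_buff *, xdp)
{
	struct xdp_buff_ext *ext = xdp_buff_get_ext(xdp);  /* hypothetical accessor */

	if (ext->flags & XDP_FLAGS_MULTI_BUFF)
		return ext->total_len;

	return xdp->data_end - xdp->data;	/* single-buffer case */
}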


> > 2. add a netdev callback to construct the skb for the current
> >    packet. This code obviously already in all drivers, just needs to
> >    be exposed as function callable by a bpf helper (next bullet);
> >

Today we already have functions that create an SKB from an xdp_frame.

First the xdp_buff is converted to an xdp_frame, whose memory area
lives in the top (32 bytes) of the packet-page.
(See the function convert_to_xdp_frame() in include/net/xdp.h.)

Today two functions create an SKB from this xdp_frame, and they should
likely be consolidated: (1) cpu_map_build_skb and (2) veth_xdp_rcv_one
(dive into veth_build_skb).  (Hint: both use a variant of build_skb.)

The challenge for you, Samih, is the placement of skb_shared_info
within the packet-page memory area.  These two xdp_frame->SKB
functions place skb_shared_info after the xdp_frame->len packet length
(aligned).  IMHO the placement of skb_shared_info should NOT vary this
much. Instead it should be at xdp->data_hard_end - 320 bytes (the size of
skb_shared_info).  The first challenge is fixing this...
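
A sketch of that fixed placement; note that data_hard_end is not a field
of today's xdp_buff, it stands here for "data_hard_start plus the full
frame size", and a real version would probably also want SKB_DATA_ALIGN():

#include <linux/skbuff.h>

/* Hypothetical: place skb_shared_info at a fixed spot at the end of the
 * packet-page area, independent of the packet length, so drivers and the
 * xdp_frame->SKB paths all agree on where the frags live. */
static inline struct skb_shared_info *xdp_shared_info(void *data_hard_end)
{
	return (struct skb_shared_info *)
		((unsigned char *)data_hard_end - sizeof(struct skb_shared_info));
}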


> > 3. add a new helper 'bpf_create_skb' that when invoked calls the
> >    previously mentioned netdev callback to  constructs an skb for
> >    the current packet, and sets the pointer in the xdp_buff, if not
> >    there already. A bpf program that needs to access segments beyond
> >    the first one can call bpf_create_skb() if needed, and then use
> >    existing helpers skb_load_bytes, skb_store_bytes, etc) to access
> >    the skb.
> >

I respectfully disagree with this approach.

One reason is that we want to support creating SKBs on a remote CPU,
like we do today with CPUMAP redirect.  The proposed helper
'bpf_create_skb' implies that this happens in BPF-context, during the
NAPI loop.


> >   My rationale is that if we need to access multiple segments, we
> >   are already in an expensive territory and it makes little sense to
> >   define a multi segment format that would essentially be an skb.
> >

I also disagree. Multi-segment frames also have some high-speed
use-cases.  Especially HW header-split at the IP+TCP boundary is
interesting.


> > 
> > 4. implement a mechanism to let so the driver know whether the
> >    currently loaded bpf program understands the new format.  There
> >    are multiple ways to do that, a trivial one would be to check,
> >    during load, that the program calls some known helper eg
> >    bpf_understands_fragments() which is then jit-ed to somethijng
> >    inexpensive
> > 
> >   Note that today, a netdev that cannot guarantee single segment
> > packets would not be able to enable xdp. Hence, without loss of
> > functionality, such netdev can refuse to load a program without
> > bpf_undersdands_fragments().
> > 
> >
> > With all the above, the generic xdp handler would do the following:
> >  if (!skb_is_linear() && !bpf_understands_fragments()) {
> >     < linearize skb>;
> >  }
> >   <construct xdp_buff with first segment and skb> // skb is unused by old style programs
> >   <call bpf program>
> > 
> > The native driver for a device that cannot guarantee a single
> > segment would just refuse to load a program that does not
> > understand them (same as today), so the code would be:
> >
> > <construct xdp_buff with first segment and empty skb>  <call bpf program>
> >
> > On return, we might find that an skb has been built by the xdp
> > program, and can be immediately used for XDP_PASS (or dropped in
> > case of XDP_DROP)

I also disagree here.  The SKB should first be created AFTER we know if
this will be an XDP_PASS action.


> > For XDP_TX and XDP_REDIRECT, something similar:
> > if the packet is a single segment and there is no skb, use the
> > existing accelerated path. If there are multiple segments,
> > construct the skb if not existing already, and pass it to the
> > standard tx path.
> > 
> > cheers
> > luigi  
> 
> I have went over your suggestion, it looks good to me! I couldn't
> find any loose ends. One thing to note is that the driver now needs
> to save the context of the currently processed packet in for each
> queue so that it can support the netdev callback for creating the skb.
>
> This sounds a bit messy, but I think it should work.
> 
> I'd love to hear more thoughts on this,
> Jesper, Toke what do you guys think?

As you can see from my comments, I (respectfully) disagree with this
approach.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer



* Re: XDP multi-buffer design discussion
  2019-12-19 10:44                       ` Jesper Dangaard Brouer
@ 2019-12-19 17:29                         ` Luigi Rizzo
  2020-01-19  7:34                           ` Jubran, Samih
  0 siblings, 1 reply; 11+ messages in thread
From: Luigi Rizzo @ 2019-12-19 17:29 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Jubran, Samih, Machulsky, Zorik, Daniel Borkmann, David Miller,
	Tzalik, Guy, Ilias Apalodimas, Toke Høiland-Jørgensen,
	Kiyanovski, Arthur, Alexei Starovoitov, netdev, David Ahern

On Thu, Dec 19, 2019 at 2:44 AM Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
>
> On Wed, 18 Dec 2019 16:03:54 +0000
> "Jubran, Samih" <sameehj@amazon.com> wrote:
>
> > > -----Original Message-----
> > > From: Luigi Rizzo <rizzo@iet.unipi.it>
> > >...
> > > Here is a possible course of action (please let me know if you find
> > > loose ends)
> > >
> > > 1. extend struct xdp_buff with a total length and sk_buff * (NULL by default);

Top comment: thank you for the feedback; I see you disagree with my
proposal :) so let me ask some questions to understand its shortcomings,
identify use cases, and perhaps address subproblems individually.

> I don't like extending xdp_buff with an skb pointer. (Remember xdp_buff
> is only "allocated" on the stack).

Nit: what are your specific concerns with adding a field or two
(pointer + total length) to the xdp_buff (and to the struct visible to
the bpf program)?

> Perhaps xdp_buff with a total-length, *BUT* this can also be solved in
> other ways.  Extend xdp_buff with a flag indicating that this xdp_buff
> have multiple-buffers (segments).  Then we know that the area we store
> the segments is valid, and we can also store the total-length there.
> (We need this total-length later after xdp_buff is freed)

No matter how we extend the structure, we must address problem #1:
 #1: old bpf programs may not be aware of the new format and
     hence may process packets incorrectly.
I don't think there is a way out of this other than having a mechanism
for the bpf programs to indicate which ABI(s) they understand.
Since there are several ways to address it (via BTF, etc.) at load time
and with minimal code changes in the kernel (and none in the loader
programs), I think we can discuss this separately if needed, and proceed
with the assumption that using a slightly different xdp_buff with
additional info (or bpf helpers to retrieve it) is a solved problem.

...
> > > 2. add a netdev callback to construct the skb for the current
> > >    packet. This code obviously already in all drivers, just needs to
> > >    be exposed as function callable by a bpf helper (next bullet);
> > >
>
> Today we already have functions that create an SKB from an xdp_frame.
> ...

I probably should have been more clear on this point.

I introduced the skb to address problem #2:
  #2: how do we access a multi-segment packet from within the bpf program
      (in other words, how do we represent a multi-segment packet)?

and suggested the helper to address problem #3:
  #3: how do we avoid constructing such a representation if it is not needed?

I don't care whether this format is an skb (more details below), but the
xdp_frame that you mention seems to have only information on the first
segment, so I am not sure how it can help with multi-segment frames.

To elaborate: for #2 we should definitely find some optimized solution
for the common cases. The current xdp_buff is optimized for one segment
(alas, it only supports that), and so drivers disable header split when
doing xdp to comply. Header split with hdr+body is possibly very common
too (especially if we don't artificially disable it), so we could/should
redefine the xdp_{buff|frame} with static room for two segments
(header + rest). This can be populated unconditionally at relatively low
cost, for both the caller and the bpf program.
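
Purely as an illustration of the "static room for two segments" idea
(nothing below exists in the kernel; the names are invented):

#include <linux/types.h>

struct xdp_seg {
	void *data;
	void *data_end;
};

struct xdp_buff_2seg {
	/* ... existing xdp_buff fields: data, data_end, data_meta, ... */
	struct xdp_seg seg1;	/* optional second segment (payload after header split) */
	u32 nr_extra_segs;	/* 0 in the common single-buffer case */
	u32 total_len;		/* length across all segments */
};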

For a larger number of segments, though, there is going to be an O(N)
space/time cost, with potentially large N (say when a NIC does RSC before
calling the driver, resulting in many segments), and non-negligible
constants (each segment may require a dma_sync, and perhaps a decision to
recycle or replenish the buffer). These are all things that we don't want
to do upfront because it could be wasted work; hence my suggestion for a
netdev method and matching bpf helper to create whatever multi-segment
representation only when the bpf program asks for it.

Again, this would be problem #3, which relates to the next bullet:

> > > 3. add a new helper 'bpf_create_skb' that when invoked calls the
> > >    previously mentioned netdev callback to  constructs an skb for
> > >    the current packet, and sets the pointer in the xdp_buff, if not
> > >    there already. A bpf program that needs to access segments beyond
> > >    the first one can call bpf_create_skb() if needed, and then use
> > >    existing helpers skb_load_bytes, skb_store_bytes, etc) to access
> > >    the skb.
> > >
>
> I respectfully disagree with this approach.
>
> One reason is that we want to support creating SKBs on a remote CPU.
> Like we do today with CPUMAP redirect.  The proposed helper 'bpf_create_skb'
> implies that this happens in BPF-context, during the NAPI loop.

Restricting the discussion to the non-optimized case (>2 segments):
there is an O(N) phase to grab the segments that has to happen in the
napi loop. Likewise, storage for the list/array of segments must be
procured in the napi loop (it could be in the page containing the packet,
but we have no guarantee that there is space), and then copied on the
remote CPU into a suitable format for xmit (which again does not have to
be an skb, but see more details in response to the last item).

> > >   My rationale is that if we need to access multiple segments, we
> > >   are already in an expensive territory and it makes little sense to
> > >   define a multi segment format that would essentially be an skb.
> > >
>
> I also disagree. Multi-egment frames also have some high speed
> use-cases.  Especially HW header-split at IP+TCP boundry is
> interesting.

Acknowledged and agreed. I have specifically added the 2-segment
case in the discussion above.

> > > The native driver for a device that cannot guarantee a single
> > > segment would just refuse to load a program that does not
> > > understand them (same as today), so the code would be:
> > >
> > > <construct xdp_buff with first segment and empty skb>  <call bpf program>
> > >
> > > On return, we might find that an skb has been built by the xdp
> > > program, and can be immediately used for XDP_PASS (or dropped in
> > > case of XDP_DROP)
>
> I also disagree here.  SKB should first be created AFTER we know if
> this will be a XDP_PASS action.

Note that I said "we might" - the decision to build the "full_descriptor"
(skb or other format) is made by the bpf program itself. As a consequence,
it won't build a full_descriptor before making a decision, UNLESS it needs to
see the whole packet to make a decision, in which case there is no better
solution anyway.

The only thing where we may look for optimizations is what format this
full_descriptor should have.  Depending on the outcome (rough sketches of
the TX-side formats follow below):

XDP_DROP:     don't care, it is going to be dropped
XDP_PASS:     an skb seems convenient, since that is what the network
              stack expects
XDP_TX:       if the buffers are mapped dma_bidir, just an array of
              dma_addr + len would work (no dma_map or page refcounts
              needed)
XDP_REDIRECT: an array of page fragments (since we need to dma-map them
              for the destination and play with refcounts)
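
To make the TX-side formats concrete, two hypothetical per-segment
descriptors (invented for illustration, not existing kernel structures):

#include <linux/types.h>
#include <linux/dma-mapping.h>
#include <linux/mm_types.h>

struct xdp_tx_seg {		/* XDP_TX on bidirectionally mapped buffers */
	dma_addr_t addr;
	u32 len;
};

struct xdp_redirect_seg {	/* XDP_REDIRECT: needs page refs + dma-map later */
	struct page *page;
	u32 offset;
	u32 len;
};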

cheers
luigi


* RE: XDP multi-buffer design discussion
  2019-12-19 17:29                         ` Luigi Rizzo
@ 2020-01-19  7:34                           ` Jubran, Samih
  2020-01-22 18:50                             ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 11+ messages in thread
From: Jubran, Samih @ 2020-01-19  7:34 UTC (permalink / raw)
  To: Luigi Rizzo, Jesper Dangaard Brouer
  Cc: Machulsky, Zorik, Daniel Borkmann, David Miller, Tzalik, Guy,
	Ilias Apalodimas, Toke Høiland-Jørgensen, Kiyanovski,
	Arthur, Alexei Starovoitov, netdev, David Ahern

> -----Original Message-----
> From: Luigi Rizzo <rizzo@iet.unipi.it>
> Sent: Thursday, December 19, 2019 7:29 PM
> To: Jesper Dangaard Brouer <brouer@redhat.com>
> Cc: Jubran, Samih <sameehj@amazon.com>; Machulsky, Zorik
> <zorik@amazon.com>; Daniel Borkmann <borkmann@iogearbox.net>; David
> Miller <davem@davemloft.net>; Tzalik, Guy <gtzalik@amazon.com>; Ilias
> Apalodimas <ilias.apalodimas@linaro.org>; Toke Høiland-Jørgensen
> <toke@redhat.com>; Kiyanovski, Arthur <akiyano@amazon.com>; Alexei
> Starovoitov <ast@kernel.org>; netdev@vger.kernel.org; David Ahern
> <dsahern@gmail.com>
> Subject: Re: XDP multi-buffer design discussion
> 
> On Thu, Dec 19, 2019 at 2:44 AM Jesper Dangaard Brouer
> <brouer@redhat.com> wrote:
> >
> > On Wed, 18 Dec 2019 16:03:54 +0000
> > "Jubran, Samih" <sameehj@amazon.com> wrote:
> >
> > > > -----Original Message-----
> > > > From: Luigi Rizzo <rizzo@iet.unipi.it> ...
> > > > Here is a possible course of action (please let me know if you
> > > >find  loose ends)
> > > >
> > > > 1. extend struct xdp_buff with a total length and sk_buff * (NULL
> > > > by default);
> 
> Top comment: thank you for the feedback, I see you disagree with my
> proposal :) so let me ask some questions to understand its shortcoming,
> identify use cases, and perhaps address subproblems individually.
> 
> > I don't like extending xdp_buff with an skb pointer. (Remember
> > xdp_buff is only "allocated" on the stack).
> 
> Nit, what are your specific concerns in adding a field or two (pointer
> + total length)
> to the xdp_buff (and to the struct visible to the bpf program) ?
> 
> > Perhaps xdp_buff with a total-length, *BUT* this can also be solved in
> > other ways.  Extend xdp_buff with a flag indicating that this xdp_buff
> > have multiple-buffers (segments).  Then we know that the area we store
> > the segments is valid, and we can also store the total-length there.
> > (We need this total-length later after xdp_buff is freed)
> 
> No matter how we extend the structure we must address problem #1
>  #1: old bpf programs may not be aware of the new format
>        hence may process packets incorrectly.
> I don't think there is a way out of this other than have a mechanism for the
> bpf programs to indicate which ABI(s) they understand.
> Since there are several ways to address it (via BTF, etc.) at load time and with
> minimal code changes in the kernel (and none in the loader programs), I
> think we can discuss this separately if needed, and proceed with the
> assumption that using a slightly different xdp_buff with additional info (or
> bpf helpers to retrieve them) is a solved problem.
> 
> ...
> > > > 2. add a netdev callback to construct the skb for the current
> > > >    packet. This code obviously already in all drivers, just needs to
> > > >    be exposed as function callable by a bpf helper (next bullet);
> > > >
> >
> > Today we already have functions that create an SKB from an xdp_frame.
> > ...
> 
> I probably should have been more clear on this point.
> 
> I introduced the skb to address problem #2:
>   #2:  how do we access a multi-segment packet from within the bpf program
>       (in other words, how do we represent a multi-segment packet)
> 
> and suggested the helper to address problem #3
>   #3: how do avoid constructing such a representation if not needed ?
> 
> I don't care whether this format is an skb (more details below), but the
> xdp_frame that you mention seems to have only information on the first
> segment so I am not sure how it can help with multi-segment frames.
> 
> To elaborate: for #2 we should definitely find some optimized solution for
> the common cases. The current xdp_buff is optimized for one segment (alas,
> it only supports that), and so drivers disable header split when doing xdp to
> comply.
> Header split with hdr+body is possibly very common too (especially if we
> don't artificially disable it), so we could/should redefine the
> xdp_{buff|frame} with static room for two segments (header + rest). This
> can be populated unconditionally at relatively low cost, for both the caller
> and the bpf program.
> 
> For larger number of segments, though, there is going to be an O(N)
> space/time cost, with potentially large N (say when a NIC does RSC before
> calling the driver, resulting in many segments), and non-negligible constants
> (each segment may require a dma_sync, and perhaps a decision to recycle or
> replenish the buffer).
> These are all things that we don't want to do upfront because it could be
> wasted work, hence my suggestion for a netdev method and matching bpf
> helper to create whatever multi-segment representation only when the bpf
> program asks for it.
> 
> Again, this would be problem #3, which relates to the next bullet:
> 
> > > > 3. add a new helper 'bpf_create_skb' that when invoked calls the
> > > >    previously mentioned netdev callback to  constructs an skb for
> > > >    the current packet, and sets the pointer in the xdp_buff, if not
> > > >    there already. A bpf program that needs to access segments beyond
> > > >    the first one can call bpf_create_skb() if needed, and then use
> > > >    existing helpers skb_load_bytes, skb_store_bytes, etc) to access
> > > >    the skb.
> > > >
> >
> > I respectfully disagree with this approach.
> >
> > One reason is that we want to support creating SKBs on a remote CPU.
> > Like we do today with CPUMAP redirect.  The proposed helper
> 'bpf_create_skb'
> > implies that this happens in BPF-context, during the NAPI loop.
> 
> restricting the discussion to the non-optimized case (>2 segments):
> there is an O(N) phase to grab the segments that has to happen in the napi
> loop. Likewise, storage for the list/array of segments must be procured in the
> napi loop (it could be in the page containing the packet, but we have no
> guarantee that there is space), and then copied on the remote CPU into a
> suitable format for xmit (which again does not have to be an skb, but see
> more details in response to the last item).
> 
> > > >   My rationale is that if we need to access multiple segments, we
> > > >   are already in an expensive territory and it makes little sense to
> > > >   define a multi segment format that would essentially be an skb.
> > > >
> >
> > I also disagree. Multi-egment frames also have some high speed
> > use-cases.  Especially HW header-split at IP+TCP boundry is
> > interesting.
> 
> Acknowledged and agreed. I have specifically added the 2-segments case in
> the the discussion above.
> 
> > > > The native driver for a device that cannot guarantee a single
> > > > segment would just refuse to load a program that does not
> > > > understand them (same as today), so the code would be:
> > > >
> > > > <construct xdp_buff with first segment and empty skb>  <call bpf
> > > > program>
> > > >
> > > > On return, we might find that an skb has been built by the xdp
> > > > program, and can be immediately used for XDP_PASS (or dropped in
> > > > case of XDP_DROP)
> >
> > I also disagree here.  SKB should first be created AFTER we know if
> > this will be a XDP_PASS action.
> 
> Note that I said "we might" - the decision to build the "full_descriptor"
> (skb or other format) is made by the bpf program itself. As a consequence, it
> won't build a full_descriptor before making a decision, UNLESS it needs to
> see the whole packet to make a decision, in which case there is no better
> solution anyways.
> 
> The only thing where we may look for optimizations is what format this
> full_descriptor should have. Depending on the outcome:
> XDP_DROP: don't care, it is going to be dropped
> XDP_PASS: skb seems convenient since the network stack expects that
> XDP_TX if the buffers are mapped dma_bidir, just an array of dma_addr + len
>    would work (no dma_map or page refcounts needed) XDP_REDIRECT array
> of page fragments (since we need to dma-map them
>   for the destination and play with refcounts)
> 
> cheers
> Luigi

Hi all,

We have two proposed design solutions.  One (Jesper’s) seems easier to implement and gives the average XDP developer the framework to deal with multiple buffers.  The other (Luigi’s) seems more complete but raises a few questions:
1.	The netdev callback might be too intrusive for the net drivers, and requires the driver to somehow save the context of the currently processed packet
2.	The solution might be overkill for the average XDP developer.  Does the average XDP developer really need full access to the packet?

Since Jesper's design is easier to implement, and leaves a way to extend it toward Luigi's design in the future, I'm going to implement it and share it with you.


* Re: XDP multi-buffer design discussion
  2020-01-19  7:34                           ` Jubran, Samih
@ 2020-01-22 18:50                             ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 11+ messages in thread
From: Jesper Dangaard Brouer @ 2020-01-22 18:50 UTC (permalink / raw)
  To: Jubran, Samih
  Cc: Luigi Rizzo, Machulsky, Zorik, Daniel Borkmann, David Miller,
	Tzalik, Guy, Ilias Apalodimas, Toke Høiland-Jørgensen,
	Kiyanovski, Arthur, Alexei Starovoitov, netdev, David Ahern,
	brouer

On Sun, 19 Jan 2020 07:34:00 +0000 "Jubran, Samih" <sameehj@amazon.com> wrote:

[...]
> 
> We have two proposed design solutions.  One (Jesper’s) seems easier
> to implement and gives the average XDP developer the framework to
> deal with multiple buffers.  The other (Luigi’s) seems more complete
> but raises a few questions:
>
> 1.	The netdev's callback might be too intrusive to the net
> drivers and requires the driver to somehow save context of the
> current processed packet
>
> 2.	The solution might be an overkill to the average XDP
> developer.  Does the average XDP developer really need full access to
> the packet?
> 
> Since Jesper's design is easier to implement as well as leaves a way
> for future extension to Luigi's design, I'm going to implement and
> share it with you.

Thanks for letting us know you are still working on this.

_WHEN_ you hit issues, please feel free to reach out to me. (P.S. I'll be
in Brno at DevConf.cz for the next couple of days, but will be back
Tuesday, and back on IRC in the Freenode channel #xdp, nick: netoptimizer.)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer



end of thread

Thread overview: 11+ messages
     [not found] <BA8DC06A-C508-402D-8A18-B715FBA674A2@amazon.com>
     [not found] ` <b28504a3-4a55-d302-91fe-63915e4568d3@iogearbox.net>
     [not found]   ` <5FA3F980-29E6-4B91-8150-9F28C0E09C45@amazon.com>
     [not found]     ` <20190823084704.075aeebd@carbon>
     [not found]       ` <67C7F66A-A3F7-408F-9C9E-C53982BCCD40@amazon.com>
     [not found]         ` <20191204155509.6b517f75@carbon>
     [not found]           ` <ec2fd7f6da44410fbaeb021cf984f2f6@EX13D11EUC003.ant.amazon.com>
2019-12-16 14:07             ` XDP multi-buffer design discussion Jesper Dangaard Brouer
2019-12-17  4:15               ` Luigi Rizzo
2019-12-17  8:46                 ` Jesper Dangaard Brouer
2019-12-17  9:00                   ` Toke Høiland-Jørgensen
2019-12-17 15:44                     ` Jubran, Samih
2019-12-17 22:30                   ` Luigi Rizzo
2019-12-18 16:03                     ` Jubran, Samih
2019-12-19 10:44                       ` Jesper Dangaard Brouer
2019-12-19 17:29                         ` Luigi Rizzo
2020-01-19  7:34                           ` Jubran, Samih
2020-01-22 18:50                             ` Jesper Dangaard Brouer
