* Re: [PATCH 00/11] XDP unaligned chunk placement support
2019-06-28 16:19 ` [PATCH 00/11] XDP unaligned chunk placement support Laatz, Kevin
@ 2019-06-28 16:51 ` Björn Töpel
2019-06-28 20:08 ` Jakub Kicinski
2019-06-28 20:25 ` Jakub Kicinski
2019-06-28 20:29 ` Jonathan Lemon
2 siblings, 1 reply; 10+ messages in thread
From: Björn Töpel @ 2019-06-28 16:51 UTC (permalink / raw)
To: Laatz, Kevin, Jakub Kicinski
Cc: Jonathan Lemon, netdev, ast, daniel, magnus.karlsson, bpf,
intel-wired-lan, bruce.richardson, ciara.loftus
On 2019-06-28 18:19, Laatz, Kevin wrote:
> On 27/06/2019 22:25, Jakub Kicinski wrote:
>> On Thu, 27 Jun 2019 12:14:50 +0100, Laatz, Kevin wrote:
>>> On the application side (xdpsock), we don't have to worry about the user
>>> defined headroom, since it is 0, so we only need to account for the
>>> XDP_PACKET_HEADROOM when computing the original address (in the default
>>> scenario).
>> That assumes specific layout for the data inside the buffer. Some NICs
>> will prepend information like timestamp to the packet, meaning the
>> packet would start at offset XDP_PACKET_HEADROOM + metadata len..
>
> Yes, if NICs prepend extra data to the packet, that would be a problem
> for using this feature in isolation. However, if we also add in support
> for in-order RX and TX rings, that would no longer be an issue. Even
> for NICs which do prepend data, this patchset should not break anything
> that is currently working.
(Late on the ball. I'm in vacation mode.)
In your example Jakub, how would this look in XDP? Wouldn't the
timestamp be part of the metadata (xdp_md.data_meta)? Isn't
data-data_meta (if valid) <= XDP_PACKET_HEADROOM? That was my assumption.
There was some discussion on having the metadata length in struct
xdp_desc before AF_XDP was merged, but the conclusion was that this was
*not* needed, because AF_XDP and the XDP program have an implicit
contract. If you're running AF_XDP, you also have an XDP program running,
and you can determine the metadata length (and also get back the
original buffer).
So, today in AF_XDP, if XDP metadata is added, the userland application
can look it up before the xdp_desc.addr (just like regular XDP), and how
the XDP/AF_XDP application determines the length/layout of the metadata
is out-of-band/not specified.
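As a rough userland-side sketch of that implicit contract (struct
my_meta and its single field are hypothetical; the real layout is
whatever the XDP program and the AF_XDP application agree on
out-of-band):

```c
#include <stdint.h>

/* Mirrors struct xdp_desc from <linux/if_xdp.h>, redefined here so the
 * sketch is self-contained. */
struct xdp_desc {
	uint64_t addr;
	uint32_t len;
	uint32_t options;
};

/* Hypothetical metadata the XDP program could have written via
 * bpf_xdp_adjust_meta(); the field is invented for illustration. */
struct my_meta {
	uint32_t rx_hash;
};

/* The packet starts at umem_base + desc->addr; per the implicit
 * contract, the metadata sits immediately before that address. */
static inline struct my_meta *get_meta(void *umem_base,
				       const struct xdp_desc *desc)
{
	return (struct my_meta *)((char *)umem_base + desc->addr
				  - sizeof(struct my_meta));
}
```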
This is a bit messy/handwavy TBH, so maybe adding the length to the
descriptor *is* a good idea (extending the options part of the
xdp_desc)? Less clean though. OTOH the layout of the metadata still
needs to be determined.
Björn
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 00/11] XDP unaligned chunk placement support
2019-06-28 16:51 ` Björn Töpel
@ 2019-06-28 20:08 ` Jakub Kicinski
0 siblings, 0 replies; 10+ messages in thread
From: Jakub Kicinski @ 2019-06-28 20:08 UTC (permalink / raw)
To: Björn Töpel
Cc: Laatz, Kevin, Jonathan Lemon, netdev, ast, daniel,
magnus.karlsson, bpf, intel-wired-lan, bruce.richardson,
ciara.loftus
On Fri, 28 Jun 2019 18:51:37 +0200, Björn Töpel wrote:
> In your example Jakub, how would this look in XDP? Wouldn't the
> timestamp be part of the metadata (xdp_md.data_meta)? Isn't
> data-data_meta (if valid) <= XDP_PACKET_HEADROOM? That was my assumption.
The driver parses the metadata and copies it outside of the prepend
before XDP runs. Then XDP runs unaware of the prepend contents.
That's the current situation.
XDP_PACKET_HEADROOM is before the entire frame. Like this:
buffer start
/ DMA addr given to the device
/ /
v v
| XDP_HEADROOM | meta data | packet data |
Length of meta data comes in the standard fixed size descriptor.
The metadata prepend is in TV form ("TLV with no length field", length's
implied by type).
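A rough sketch of such a TV walk (the type values and their implied
lengths are invented for illustration; a real device's prepend defines
its own table):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type values; the payload length is implied by the type,
 * so a parser needs this table up front. */
enum { META_END = 0, META_TSTAMP = 1, META_HASH = 2 };

static const size_t meta_len[] = {
	[META_END]    = 0,
	[META_TSTAMP] = 8,	/* 64-bit timestamp payload */
	[META_HASH]   = 4,	/* 32-bit RSS hash payload */
};

/* Walk the prepend and return the total metadata length consumed.
 * An unknown type is fatal for a TV format: with no length field,
 * the parser cannot skip what it does not recognize. */
static size_t parse_meta(const uint8_t *p, size_t max)
{
	size_t off = 0;

	while (off < max && p[off] != META_END) {
		uint8_t type = p[off++];

		if (type >= sizeof(meta_len) / sizeof(meta_len[0]))
			break;
		off += meta_len[type];
	}
	return off;
}
```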
> There were some discussion on having meta data length in the struct
> xdp_desc, before AF_XDP was merged, but the conclusion was that this was
> *not* needed, because AF_XDP and the XDP program had an implicit
> contract. If you're running AF_XDP, you also have an XDP program running
> and you can determine the meta data length (and also getting back the
> original buffer).
>
> So, today in AF_XDP if XDP metadata is added, the userland application
> can look it up before the xdp_desc.addr (just like regular XDP), and how
> the XDP/AF_XDP application determines length/layout of the metadata is
> out-of-band/not specified.
>
> This is a bit messy/handwavy TBH, so maybe adding the length to the
> descriptor *is* a good idea (extending the options part of the
> xdp_desc)? Less clean though. OTOH the layout of the meta data still
> needs to be determined.
Right, the device prepend is not exposed as metadata to XDP.
* Re: [PATCH 00/11] XDP unaligned chunk placement support
2019-06-28 16:19 ` [PATCH 00/11] XDP unaligned chunk placement support Laatz, Kevin
2019-06-28 16:51 ` Björn Töpel
@ 2019-06-28 20:25 ` Jakub Kicinski
2019-06-28 20:29 ` Jonathan Lemon
2 siblings, 0 replies; 10+ messages in thread
From: Jakub Kicinski @ 2019-06-28 20:25 UTC (permalink / raw)
To: Laatz, Kevin
Cc: Jonathan Lemon, netdev, ast, daniel, bjorn.topel,
magnus.karlsson, bpf, intel-wired-lan, bruce.richardson,
ciara.loftus
On Fri, 28 Jun 2019 17:19:09 +0100, Laatz, Kevin wrote:
> On 27/06/2019 22:25, Jakub Kicinski wrote:
> > On Thu, 27 Jun 2019 12:14:50 +0100, Laatz, Kevin wrote:
> >> On the application side (xdpsock), we don't have to worry about the user
> >> defined headroom, since it is 0, so we only need to account for the
> >> XDP_PACKET_HEADROOM when computing the original address (in the default
> >> scenario).
> > That assumes specific layout for the data inside the buffer. Some NICs
> > will prepend information like timestamp to the packet, meaning the
> > packet would start at offset XDP_PACKET_HEADROOM + metadata len..
>
> Yes, if NICs prepend extra data to the packet that would be a problem for
> using this feature in isolation. However, if we also add in support for
> in-order RX and TX rings, that would no longer be an issue.
Can you shed more light on in-order rings? Do you mean that RX frames
come in the order buffers were placed in the fill queue? That wouldn't
make practical sense, no? Even if the application does no reordering,
there are also XDP_DROP and XDP_TX. Please explain :)
> However, even for NICs which do prepend data, this patchset should
> not break anything that is currently working.
My understanding from the beginnings of AF_XDP was that we were
searching for a format flexible enough to support most if not all NICs.
Creating an ABI which will preclude vendors from supporting DPDK via
AF_XDP would seriously undermine the neutrality aspect.
> > I think that's very limiting. What is the challenge in providing
> > aligned addresses, exactly?
> The challenges are two-fold:
> 1) it prevents using arbitrary buffer sizes, which will be an issue
> supporting e.g. jumbo frames in future.
Presumably support for jumbos would require a multi-buffer setup, and
therefore extensions to the ring format. Should we perhaps look into
implementing unaligned chunks by extending ring format as well?
> 2) higher level user-space frameworks which may want to use AF_XDP, such
> as DPDK, do not currently support having buffers with 'fixed' alignment.
> The reason that DPDK uses arbitrary placement is that:
> - it would stop things working on certain NICs which need the
> actual writable space specified in units of 1k - therefore we need 2k +
> metadata space.
> - we place padding between buffers to avoid constantly hitting
> the same memory channels when accessing memory.
> - it allows the application to choose the actual buffer size it
> wants to use.
> We make use of the above to allow us to speed up processing
> significantly and also reduce the packet buffer memory size.
>
> Not having arbitrary buffer alignment also means an AF_XDP driver
> for DPDK cannot be a drop-in replacement for existing drivers in those
> frameworks. Even with a new capability to allow an arbitrary buffer
> alignment, existing apps will need to be modified to use that new
> capability.
* Re: [PATCH 00/11] XDP unaligned chunk placement support
2019-06-28 16:19 ` [PATCH 00/11] XDP unaligned chunk placement support Laatz, Kevin
2019-06-28 16:51 ` Björn Töpel
2019-06-28 20:25 ` Jakub Kicinski
@ 2019-06-28 20:29 ` Jonathan Lemon
2019-07-01 14:58 ` Laatz, Kevin
[not found] ` <07e404eb-f712-b15a-4884-315aff3f7c7d@intel.com>
2 siblings, 2 replies; 10+ messages in thread
From: Jonathan Lemon @ 2019-06-28 20:29 UTC (permalink / raw)
To: Laatz, Kevin
Cc: Jakub Kicinski, netdev, ast, daniel, bjorn.topel,
magnus.karlsson, bpf, intel-wired-lan, bruce.richardson,
ciara.loftus
On 28 Jun 2019, at 9:19, Laatz, Kevin wrote:
> On 27/06/2019 22:25, Jakub Kicinski wrote:
>> On Thu, 27 Jun 2019 12:14:50 +0100, Laatz, Kevin wrote:
>>> On the application side (xdpsock), we don't have to worry about the
>>> user defined headroom, since it is 0, so we only need to account for
>>> the XDP_PACKET_HEADROOM when computing the original address (in the
>>> default scenario).
>> That assumes specific layout for the data inside the buffer. Some NICs
>> will prepend information like timestamp to the packet, meaning the
>> packet would start at offset XDP_PACKET_HEADROOM + metadata len..
>
> Yes, if NICs prepend extra data to the packet, that would be a problem
> for using this feature in isolation. However, if we also add in support
> for in-order RX and TX rings, that would no longer be an issue. Even
> for NICs which do prepend data, this patchset should not break anything
> that is currently working.
I read this as "the correct buffer address is recovered from the shadow
ring". I'm not sure I'm comfortable with that, and I'm also not sold on
in-order completion for the RX/TX rings.
>> I think that's very limiting. What is the challenge in providing
>> aligned addresses, exactly?
> The challenges are two-fold:
> 1) it prevents using arbitrary buffer sizes, which will be an issue
> supporting e.g. jumbo frames in future.
> 2) higher level user-space frameworks which may want to use AF_XDP,
> such as DPDK, do not currently support having buffers with 'fixed'
> alignment.
> The reason that DPDK uses arbitrary placement is that:
> - it would stop things working on certain NICs which
> need the actual writable space specified in units of 1k - therefore we
> need 2k + metadata space.
> - we place padding between buffers to avoid constantly
> hitting the same memory channels when accessing memory.
> - it allows the application to choose the actual buffer
> size it wants to use.
> We make use of the above to allow us to speed up processing
> significantly and also reduce the packet buffer memory size.
>
> Not having arbitrary buffer alignment also means an AF_XDP
> driver for DPDK cannot be a drop-in replacement for existing drivers
> in those frameworks. Even with a new capability to allow an arbitrary
> buffer alignment, existing apps will need to be modified to use that
> new capability.
Since all buffers in the umem are the same chunk size, the original
buffer address can be recalculated with some multiply/shift math.
However, this is more expensive than just a mask operation.
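A minimal sketch of the two recovery paths (function names are
illustrative): for power-of-two chunks the base is a single mask, while
an arbitrary chunk size needs a modulo, which compilers lower to
multiply/shift when the chunk size is a compile-time constant.

```c
#include <stdint.h>

/* Recover the chunk base address when the chunk size is a power of
 * two: one AND instruction. */
static uint64_t orig_addr_pow2(uint64_t addr, uint64_t chunk_size)
{
	return addr & ~(chunk_size - 1);
}

/* Same recovery for an arbitrary chunk size: needs a divide/modulo,
 * which is the extra cost referred to above. */
static uint64_t orig_addr_any(uint64_t addr, uint64_t chunk_size)
{
	return addr - (addr % chunk_size);
}
```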
--
Jonathan
* Re: [PATCH 00/11] XDP unaligned chunk placement support
2019-06-28 20:29 ` Jonathan Lemon
@ 2019-07-01 14:58 ` Laatz, Kevin
[not found] ` <07e404eb-f712-b15a-4884-315aff3f7c7d@intel.com>
1 sibling, 0 replies; 10+ messages in thread
From: Laatz, Kevin @ 2019-07-01 14:58 UTC (permalink / raw)
To: Jonathan Lemon
Cc: Jakub Kicinski, netdev, ast, daniel, bjorn.topel,
magnus.karlsson, bpf, intel-wired-lan, bruce.richardson,
ciara.loftus
On 28/06/2019 21:29, Jonathan Lemon wrote:
> On 28 Jun 2019, at 9:19, Laatz, Kevin wrote:
>> On 27/06/2019 22:25, Jakub Kicinski wrote:
>>> I think that's very limiting. What is the challenge in providing
>>> aligned addresses, exactly?
>> The challenges are two-fold:
>> 1) it prevents using arbitrary buffer sizes, which will be an issue
>> supporting e.g. jumbo frames in future.
>> 2) higher level user-space frameworks which may want to use AF_XDP,
>> such as DPDK, do not currently support having buffers with 'fixed'
>> alignment.
>> The reason that DPDK uses arbitrary placement is that:
>> - it would stop things working on certain NICs which need the
>> actual writable space specified in units of 1k - therefore we need 2k
>> + metadata space.
>> - we place padding between buffers to avoid constantly
>> hitting the same memory channels when accessing memory.
>> - it allows the application to choose the actual buffer size
>> it wants to use.
>> We make use of the above to allow us to speed up processing
>> significantly and also reduce the packet buffer memory size.
>>
>> Not having arbitrary buffer alignment also means an AF_XDP driver
>> for DPDK cannot be a drop-in replacement for existing drivers in
>> those frameworks. Even with a new capability to allow an arbitrary
>> buffer alignment, existing apps will need to be modified to use that
>> new capability.
>
> Since all buffers in the umem are the same chunk size, the original
> buffer address can be recalculated with some multiply/shift math.
> However, this is more expensive than just a mask operation.
Yes, we can do this.
Another option we have is to add a socket option for querying the
metadata length from the driver (assuming it doesn't vary per packet).
We can use that information to get back the original address using
subtraction.
Alternatively, we can change the Rx descriptor format to include the
metadata length. We could do this in a couple of ways, for example,
rather than returning the address at the start of the packet, instead
return the buffer address that was passed in, and adding another 16-bit
field to specify the start-of-packet offset within that buffer. If
using 16 bits of descriptor space is not desirable, an alternative
could be to limit umem sizes to e.g. 2^48 bytes (256 terabytes should
be enough, right :-) ) and use the remaining 16 bits of the address as
a packet offset. Other variations on these approaches are obviously
possible too.
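A rough sketch of that last variant (the helper names and the 48/16
split follow the suggestion above; none of this is an existing API):

```c
#include <stdint.h>

#define XSK_ADDR_BITS 48
#define XSK_ADDR_MASK ((1ULL << XSK_ADDR_BITS) - 1)

/* Kernel side: return the original buffer address with the
 * start-of-packet offset encoded in the upper 16 bits. */
static uint64_t xsk_addr_encode(uint64_t base, uint16_t pkt_off)
{
	return (base & XSK_ADDR_MASK) | ((uint64_t)pkt_off << XSK_ADDR_BITS);
}

/* Userland side: the buffer base, directly reusable for the fill ring. */
static uint64_t xsk_addr_base(uint64_t addr)
{
	return addr & XSK_ADDR_MASK;
}

/* Userland side: where the packet starts within that buffer. */
static uint16_t xsk_addr_off(uint64_t addr)
{
	return (uint16_t)(addr >> XSK_ADDR_BITS);
}
```

One property of this encoding is that the base survives the whole
Rx/Tx cycle untouched; only the offset field changes.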
[parent not found: <07e404eb-f712-b15a-4884-315aff3f7c7d@intel.com>]
* Re: [PATCH 00/11] XDP unaligned chunk placement support
[not found] ` <07e404eb-f712-b15a-4884-315aff3f7c7d@intel.com>
@ 2019-07-01 21:20 ` Jakub Kicinski
2019-07-02 9:27 ` Richardson, Bruce
0 siblings, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2019-07-01 21:20 UTC (permalink / raw)
To: Laatz, Kevin
Cc: Jonathan Lemon, netdev, ast, daniel, bjorn.topel,
magnus.karlsson, bpf, intel-wired-lan, bruce.richardson,
ciara.loftus
On Mon, 1 Jul 2019 15:44:29 +0100, Laatz, Kevin wrote:
> On 28/06/2019 21:29, Jonathan Lemon wrote:
> > On 28 Jun 2019, at 9:19, Laatz, Kevin wrote:
> >> On 27/06/2019 22:25, Jakub Kicinski wrote:
> >>> I think that's very limiting. What is the challenge in providing
> >>> aligned addresses, exactly?
> >> The challenges are two-fold:
> >> 1) it prevents using arbitrary buffer sizes, which will be an issue
> >> supporting e.g. jumbo frames in future.
> >> 2) higher level user-space frameworks which may want to use AF_XDP,
> >> such as DPDK, do not currently support having buffers with 'fixed'
> >> alignment.
> >> The reason that DPDK uses arbitrary placement is that:
> >> - it would stop things working on certain NICs which need the
> >> actual writable space specified in units of 1k - therefore we need 2k
> >> + metadata space.
> >> - we place padding between buffers to avoid constantly
> >> hitting the same memory channels when accessing memory.
> >> - it allows the application to choose the actual buffer size
> >> it wants to use.
> >> We make use of the above to allow us to speed up processing
> >> significantly and also reduce the packet buffer memory size.
> >>
> >> Not having arbitrary buffer alignment also means an AF_XDP driver
> >> for DPDK cannot be a drop-in replacement for existing drivers in
> >> those frameworks. Even with a new capability to allow an arbitrary
> >> buffer alignment, existing apps will need to be modified to use that
> >> new capability.
> >
> > Since all buffers in the umem are the same chunk size, the original
> > buffer
> > address can be recalculated with some multiply/shift math. However,
> > this is
> > more expensive than just a mask operation.
>
> Yes, we can do this.
That'd be best, can DPDK reasonably guarantee the slicing is uniform?
E.g. it's not disparate buffer pools with different bases?
> Another option we have is to add a socket option for querying the
> metadata length from the driver (assuming it doesn't vary per packet).
> We can use that information to get back to the original address using
> subtraction.
Unfortunately the metadata depends on the packet and how much info
the device was able to extract. So it's variable length.
> Alternatively, we can change the Rx descriptor format to include the
> metadata length. We could do this in a couple of ways, for example,
> rather than returning the address as the start of the packet, instead
> return the buffer address that was passed in, and adding another 16-bit
> field to specify the start of packet offset with that buffer. If using
> another 16-bits of the descriptor space is not desirable, an alternative
> could be to limit umem sizes to e.g. 2^48 bytes (256 terabytes should be
> enough, right :-) ) and use the remaining 16 bits of the address as a
> packet offset. Other variations on these approaches are obviously
> possible too.
Seems reasonable to me..
* RE: [PATCH 00/11] XDP unaligned chunk placement support
2019-07-01 21:20 ` Jakub Kicinski
@ 2019-07-02 9:27 ` Richardson, Bruce
2019-07-02 16:33 ` Jonathan Lemon
0 siblings, 1 reply; 10+ messages in thread
From: Richardson, Bruce @ 2019-07-02 9:27 UTC (permalink / raw)
To: Jakub Kicinski, Laatz, Kevin
Cc: Jonathan Lemon, netdev, ast, daniel, Topel, Bjorn, Karlsson,
Magnus, bpf, intel-wired-lan, Loftus, Ciara
> -----Original Message-----
> From: Jakub Kicinski [mailto:jakub.kicinski@netronome.com]
> Sent: Monday, July 1, 2019 10:20 PM
> To: Laatz, Kevin <kevin.laatz@intel.com>
> Cc: Jonathan Lemon <jonathan.lemon@gmail.com>; netdev@vger.kernel.org;
> ast@kernel.org; daniel@iogearbox.net; Topel, Bjorn
> <bjorn.topel@intel.com>; Karlsson, Magnus <magnus.karlsson@intel.com>;
> bpf@vger.kernel.org; intel-wired-lan@lists.osuosl.org; Richardson, Bruce
> <bruce.richardson@intel.com>; Loftus, Ciara <ciara.loftus@intel.com>
> Subject: Re: [PATCH 00/11] XDP unaligned chunk placement support
>
> On Mon, 1 Jul 2019 15:44:29 +0100, Laatz, Kevin wrote:
> > On 28/06/2019 21:29, Jonathan Lemon wrote:
> > > On 28 Jun 2019, at 9:19, Laatz, Kevin wrote:
> > >> On 27/06/2019 22:25, Jakub Kicinski wrote:
> > >>> I think that's very limiting. What is the challenge in providing
> > >>> aligned addresses, exactly?
> > >> The challenges are two-fold:
> > >> 1) it prevents using arbitrary buffer sizes, which will be an issue
> > >> supporting e.g. jumbo frames in future.
> > >> 2) higher level user-space frameworks which may want to use AF_XDP,
> > >> such as DPDK, do not currently support having buffers with 'fixed'
> > >> alignment.
> > >> The reason that DPDK uses arbitrary placement is that:
> > >> - it would stop things working on certain NICs which need
> > >> the actual writable space specified in units of 1k - therefore we
> > >> need 2k
> > >> + metadata space.
> > >> - we place padding between buffers to avoid constantly
> > >> hitting the same memory channels when accessing memory.
> > >> - it allows the application to choose the actual buffer
> > >> size it wants to use.
> > >> We make use of the above to allow us to speed up processing
> > >> significantly and also reduce the packet buffer memory size.
> > >>
> > >> Not having arbitrary buffer alignment also means an AF_XDP
> > >> driver for DPDK cannot be a drop-in replacement for existing
> > >> drivers in those frameworks. Even with a new capability to allow an
> > >> arbitrary buffer alignment, existing apps will need to be modified
> > >> to use that new capability.
> > >
> > > Since all buffers in the umem are the same chunk size, the original
> > > buffer address can be recalculated with some multiply/shift math.
> > > However, this is more expensive than just a mask operation.
> >
> > Yes, we can do this.
>
> That'd be best, can DPDK reasonably guarantee the slicing is uniform?
> E.g. it's not disparate buffer pools with different bases?
It's generally uniform, but handling the crossing of (huge)page boundaries
complicates things a bit. Therefore I think the final option below
is best as it avoids any such problems.
>
> > Another option we have is to add a socket option for querying the
> > metadata length from the driver (assuming it doesn't vary per packet).
> > We can use that information to get back to the original address using
> > subtraction.
>
> Unfortunately the metadata depends on the packet and how much info the
> device was able to extract. So it's variable length.
>
> > Alternatively, we can change the Rx descriptor format to include the
> > metadata length. We could do this in a couple of ways, for example,
> > rather than returning the address as the start of the packet, instead
> > return the buffer address that was passed in, and adding another
> > 16-bit field to specify the start of packet offset with that buffer.
> > If using another 16-bits of the descriptor space is not desirable, an
> > alternative could be to limit umem sizes to e.g. 2^48 bytes (256
> > terabytes should be enough, right :-) ) and use the remaining 16 bits
> > of the address as a packet offset. Other variations on these
> > approaches are obviously possible too.
>
> Seems reasonable to me..
I think this is probably the best solution, and also has the advantage that
a buffer retains its base address the full way through the cycle of Rx and Tx.
* Re: [PATCH 00/11] XDP unaligned chunk placement support
2019-07-02 9:27 ` Richardson, Bruce
@ 2019-07-02 16:33 ` Jonathan Lemon
0 siblings, 0 replies; 10+ messages in thread
From: Jonathan Lemon @ 2019-07-02 16:33 UTC (permalink / raw)
To: Richardson, Bruce
Cc: Jakub Kicinski, Laatz, Kevin, netdev, ast, daniel, Topel, Bjorn,
Karlsson, Magnus, bpf, intel-wired-lan, Loftus, Ciara
On 2 Jul 2019, at 2:27, Richardson, Bruce wrote:
>> -----Original Message-----
>> From: Jakub Kicinski [mailto:jakub.kicinski@netronome.com]
>> Sent: Monday, July 1, 2019 10:20 PM
>> To: Laatz, Kevin <kevin.laatz@intel.com>
>> Cc: Jonathan Lemon <jonathan.lemon@gmail.com>;
>> netdev@vger.kernel.org;
>> ast@kernel.org; daniel@iogearbox.net; Topel, Bjorn
>> <bjorn.topel@intel.com>; Karlsson, Magnus
>> <magnus.karlsson@intel.com>;
>> bpf@vger.kernel.org; intel-wired-lan@lists.osuosl.org; Richardson,
>> Bruce
>> <bruce.richardson@intel.com>; Loftus, Ciara <ciara.loftus@intel.com>
>> Subject: Re: [PATCH 00/11] XDP unaligned chunk placement support
>>
>> On Mon, 1 Jul 2019 15:44:29 +0100, Laatz, Kevin wrote:
>>> On 28/06/2019 21:29, Jonathan Lemon wrote:
>>>> On 28 Jun 2019, at 9:19, Laatz, Kevin wrote:
>>>>> On 27/06/2019 22:25, Jakub Kicinski wrote:
>>>>>> I think that's very limiting. What is the challenge in providing
>>>>>> aligned addresses, exactly?
>>>>> The challenges are two-fold:
>>>>> 1) it prevents using arbitrary buffer sizes, which will be an issue
>>>>> supporting e.g. jumbo frames in future.
>>>>> 2) higher level user-space frameworks which may want to use AF_XDP,
>>>>> such as DPDK, do not currently support having buffers with 'fixed'
>>>>> alignment.
>>>>> The reason that DPDK uses arbitrary placement is that:
>>>>> - it would stop things working on certain NICs which need the
>>>>> actual writable space specified in units of 1k - therefore we need
>>>>> 2k + metadata space.
>>>>> - we place padding between buffers to avoid constantly hitting
>>>>> the same memory channels when accessing memory.
>>>>> - it allows the application to choose the actual buffer size it
>>>>> wants to use.
>>>>> We make use of the above to allow us to speed up processing
>>>>> significantly and also reduce the packet buffer memory size.
>>>>>
>>>>> Not having arbitrary buffer alignment also means an AF_XDP
>>>>> driver for DPDK cannot be a drop-in replacement for existing
>>>>> drivers in those frameworks. Even with a new capability to allow
>>>>> an arbitrary buffer alignment, existing apps will need to be
>>>>> modified to use that new capability.
>>>>
>>>> Since all buffers in the umem are the same chunk size, the original
>>>> buffer address can be recalculated with some multiply/shift math.
>>>> However, this is more expensive than just a mask operation.
>>>
>>> Yes, we can do this.
>>
>> That'd be best, can DPDK reasonably guarantee the slicing is uniform?
>> E.g. it's not disparate buffer pools with different bases?
>
> It's generally uniform, but handling the crossing of (huge)page
> boundaries complicates things a bit. Therefore I think the final
> option below is best as it avoids any such problems.
>
>>
>>> Another option we have is to add a socket option for querying the
>>> metadata length from the driver (assuming it doesn't vary per
>>> packet). We can use that information to get back to the original
>>> address using subtraction.
>>
>> Unfortunately the metadata depends on the packet and how much info
>> the device was able to extract. So it's variable length.
>>
>>> Alternatively, we can change the Rx descriptor format to include the
>>> metadata length. We could do this in a couple of ways, for example,
>>> rather than returning the address as the start of the packet, instead
>>> return the buffer address that was passed in, and adding another
>>> 16-bit field to specify the start of packet offset with that buffer.
>>> If using another 16-bits of the descriptor space is not desirable, an
>>> alternative could be to limit umem sizes to e.g. 2^48 bytes (256
>>> terabytes should be enough, right :-) ) and use the remaining 16 bits
>>> of the address as a packet offset. Other variations on these
>>> approaches are obviously possible too.
>>
>> Seems reasonable to me..
>
> I think this is probably the best solution, and also has the advantage
> that a buffer retains its base address the full way through the cycle
> of Rx and Tx.
I like this as well - it also has the advantage that drivers can keep
performing adjustments on the handle, which ends up just modifying the
offset.
--
Jonathan