xdp-newbies.vger.kernel.org archive mirror
* XDP and AF_XDP performance comparison
From: Federico Parola @ 2022-09-22  8:21 UTC
  To: xdp-newbies

Dear all,
I would like to share with this community a draft I recently wrote [1] 
on the performance comparison of XDP and AF_XDP packet processing.
In the paper we found some interesting and unexpected results
(especially related to the impact of the amount of memory addressed on
the performance of the two technologies) and tried to envision a
combined use of the two technologies, in particular to tackle the poor
performance of re-injecting packets from user space into the kernel to
leverage the TCP/IP stack.
Any comments or suggestions from this community, as well as any kind of
joint work/collaboration, would be greatly appreciated.

Best regards,
Federico Parola

[1] 
https://fedeparola.github.io/publication/xdp-af_xdp-cmp/Comparing_User_Space_and_In_Kernel_Packet_Processing_for_Edge_Data_Centers.pdf


* Re: XDP and AF_XDP performance comparison
From: Toke Høiland-Jørgensen @ 2022-09-22 18:38 UTC
  To: Federico Parola, xdp-newbies

Federico Parola <federico.parola@polito.it> writes:

> Dear all,
> I would like to share with this community a draft I recently wrote [1] 
> on the performance comparison of XDP and AF_XDP packet processing.
> In the paper we found some interesting and unexpected results
> (especially related to the impact of the amount of memory addressed on
> the performance of the two technologies) and tried to envision a
> combined use of the two technologies, in particular to tackle the poor
> performance of re-injecting packets from user space into the kernel to
> leverage the TCP/IP stack.
> Any comments or suggestions from this community, as well as any kind of
> joint work/collaboration, would be greatly appreciated.

Hi Federico

Thank you for the link! All in all I thought it was a nicely done
performance comparison.

One thing that might be interesting would be to do the same comparison
on a different driver. Many of the performance effects you're
uncovering in this paper boil down to details of how the driver data
path is implemented. For instance, it's an Intel-specific thing that
there's a whole separate path for zero-copy AF_XDP. Any plans to
replicate the study using, say, an mlx5-based NIC?
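
To illustrate, in i40e the NAPI poll loop dispatches to an entirely
separate receive routine when a zero-copy AF_XDP socket is bound to a
ring. Roughly (paraphrased from i40e_txrx.c, not verbatim; the exact
code differs between kernel versions):

  /* With an XSK buffer pool bound to the ring, RX processing takes a
   * completely different code path than the regular XDP one.
   */
  i40e_for_each_ring(ring, q_vector->rx) {
          int cleaned = ring->xsk_pool ?
                        i40e_clean_rx_irq_zc(ring, budget_per_ring) :
                        i40e_clean_rx_irq(ring, budget_per_ring);
          /* ... budget accounting elided ... */
  }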

Also, a couple of comments on details:

- The performance delta you show in Figure 9, where AF_XDP is faster at
  hair-pin forwarding than XDP, was a bit puzzling; the two applications
  should basically be doing the same thing. It seems to be because the
  i40e driver converts the xdp_buff struct to an xdp_frame before
  transmitting it out the interface again (sketched below):

  https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/intel/i40e/i40e_txrx.c#L2280

- It's interesting that userspace seems to handle scattered memory
  accesses over a large range better than kernel-space. It would be
  interesting to know why; you mention you're leaving this to future
  studies, any plans of following up and trying to figure this out? :)
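
The sketch promised under the first point: the XDP_TX path in i40e
looks roughly like this (paraphrased from i40e_txrx.c at the link
above, not verbatim; on older kernels the conversion helper is called
convert_to_xdp_frame() instead):

  /* Every XDP_TX packet is converted from an xdp_buff to an xdp_frame
   * before being queued on the XDP TX ring.
   */
  static int i40e_xmit_xdp_tx_ring(struct xdp_buff *xdp,
                                   struct i40e_ring *xdp_ring)
  {
          struct xdp_frame *xdpf = xdp_convert_buff_to_frame(xdp);

          if (unlikely(!xdpf))
                  return I40E_XDP_CONSUMED;

          return i40e_xmit_xdp_ring(xdpf, xdp_ring);
  }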


Finally, since you seem to have your tests packaged up nicely, do you
think it would be possible to take (some of) them and turn them into a
kind of "performance CI" test suite, that can be run automatically, or
semi-automatically to catch future performance regressions in the XDP
stack? Such a test suite would be pretty great to have so we can avoid
the "death by a thousand paper cuts" type of gradual performance
degradation as we add new features...

-Toke



* Re: XDP and AF_XDP performance comparison
From: Federico Parola @ 2022-09-23 13:11 UTC
  To: Toke Høiland-Jørgensen, xdp-newbies

On 22/09/22 20:38, Toke Høiland-Jørgensen wrote:
> Hi Federico
> 
> Thank you for the link! All in all I thought it was a nicely done
> performance comparison.

Dear Toke,
thank you very much for your observations and your interest in my work.

> One thing that might be interesting would be to do the same comparison
> on a different driver. Many of the performance effects you're
> uncovering in this paper boil down to details of how the driver data
> path is implemented. For instance, it's an Intel-specific thing that
> there's a whole separate path for zero-copy AF_XDP. Any plans to
> replicate the study using, say, an mlx5-based NIC?

The impact of the driver on the results was clear from the beginning;
however, I wasn't aware of mlx5 using the same path for XDP and
zero-copy AF_XDP; I thought different paths were the norm (my bad for
not checking). This could radically change the results for NVIDIA NICs.
I performed similar (but less extensive) tests on an X540 NIC running
the ixgbe driver, and the results show the same relative behavior
between XDP and AF_XDP, even though the performance gaps are smaller.
Another factor that impacts the results is the kernel version: again,
the same relation between XDP and AF_XDP, but different gaps. In
particular, I experienced significant performance drops (for both XDP
and AF_XDP) moving from kernel 5.15 to 5.16, and another from 5.18 to
5.19 (the latter much more substantial). Unfortunately I don't have any
mlx5 NICs at my disposal in my lab at the moment. If you are aware of
any way I could experiment on an NVIDIA NIC (I know there are some open
testbeds), that would be very interesting.

> Also, a couple of comments on details:
> 
> - The performance delta you show in Figure 9, where AF_XDP is faster at
>    hair-pin forwarding than XDP, was a bit puzzling; the two applications
>    should basically be doing the same thing. It seems to be because the
>    i40e driver converts the xdp_buff struct to an xdp_frame before
>    transmitting it out the interface again (sketched below):
> 
>    https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/intel/i40e/i40e_txrx.c#L2280

Regarding XDP_TX performance with AF_XDP sockets enabled (XDP-sk in
the draft), this is definitely the case, since the conversion from
xdp_buff to xdp_frame requires copying the whole packet into a new
memory page:
https://elixir.bootlin.com/linux/latest/source/net/core/xdp.c#L559

For pure XDP (no AF_XDP sockets enabled), on the other hand, the
conversion only requires copying some fields. However, given the very
limited size of the packet processing function (macswap), even those
copies might have a significant impact. This would also explain why the
gap between XDP and AF_XDP shrinks so much when we move from macswap
(+29%) to the load balancer (+14%). That said, it seems to me that the
conversion is common to all drivers, not specific to Intel, so I wonder
whether it can be avoided (maybe by relying only on the xdp_frame?).
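
For reference, the conversion looks roughly like this (paraphrased
from xdp_convert_buff_to_frame() in include/net/xdp.h on recent
kernels, not verbatim):

  /* With an XSK buffer pool (zero-copy AF_XDP) the whole packet is
   * copied into newly allocated memory; otherwise only the metadata
   * fields are written into the headroom of the existing buffer.
   */
  static inline struct xdp_frame *
  xdp_convert_buff_to_frame(struct xdp_buff *xdp)
  {
          struct xdp_frame *xdp_frame;

          if (xdp->rxq->mem.type == MEM_TYPE_XSK_BUFF_POOL)
                  return xdp_convert_zc_to_xdp_frame(xdp); /* full copy */

          /* Store frame metadata in the packet headroom, no data copy */
          xdp_frame = xdp->data_hard_start;
          if (unlikely(xdp_update_frame_from_buff(xdp, xdp_frame) < 0))
                  return NULL;

          xdp_frame->mem = xdp->rxq->mem;

          return xdp_frame;
  }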

> - It's interesting that userspace seems to handle scattered memory
>    accesses over a large range better than kernel-space. It would be
>    interesting to know why; you mention you're leaving this to future
>    studies, any plans of following up and trying to figure this out? :)

This is definitely the most curious result. Given my limited (but
improving) knowledge of XDP and AF_XDP internals, so far I have only
observed the behavior. I think the key next step would be to map the
additional LLC store operation that XDP performs for every packet (even
when dropping it) to specific code in the driver/XDP subsystem. This
store basically gives XDP-based I/O almost double the LLC occupancy of
AF_XDP-based I/O (checking whether this is Intel-specific or also
applies to NVIDIA NICs would help narrow down the possibilities). Any
guidance on how to further inspect the problem would be really
appreciated.
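
So far my only idea for attributing those stores would be sampling
them with perf, along these lines (assuming the generic LLC-stores
cache event is exposed by the CPU's PMU):

  # Sample last-level-cache store events system-wide with call graphs
  # while the forwarding test runs, then attribute them to kernel
  # symbols.
  perf record -e LLC-stores -a -g -- sleep 10
  perf report --sort symbol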

> Finally, since you seem to have your tests packaged up nicely, do you
> think it would be possible to take (some of) them and turn them into a
> kind of "performance CI" test suite, that can be run automatically, or
> semi-automatically to catch future performance regressions in the XDP
> stack? Such a test suite would be pretty great to have so we can avoid
> the "death by a thousand paper cuts" type of gradual performance
> degradation as we add new features...

I would be very happy if my work could benefit the community. Please
let me know if you have any ideas or guidelines on how my testing suite
could be integrated into the XDP ecosystem; I guess the xdp-tools repo
would be the ideal target?

Best regards,
Federico


* Re: XDP and AF_XDP performance comparison
From: Toke Høiland-Jørgensen @ 2022-12-16 15:11 UTC
  To: Federico Parola, xdp-newbies

Hi Federico

Sorry for not replying to this sooner, I just fished it out of my pile
of unprocessed email while I was winding things down for the holidays...

>> Finally, since you seem to have your tests packaged up nicely, do you
>> think it would be possible to take (some of) them and turn them into a
>> kind of "performance CI" test suite, that can be run automatically, or
>> semi-automatically to catch future performance regressions in the XDP
>> stack? Such a test suite would be pretty great to have so we can avoid
>> the "death by a thousand paper cuts" type of gradual performance
>> degradation as we add new features...
>
> I would be very happy if my work could benefit the community. Please
> let me know if you have any ideas or guidelines on how my testing suite
> could be integrated into the XDP ecosystem; I guess the xdp-tools repo
> would be the ideal target?

I'd be happy to consider your tool for inclusion into xdp-tools, or into
a separate repository under the xdp-project github organisation if that
fits better. I think that will mostly depend on how much surrounding
setup code is needed to run the experiments: the utilities in xdp-tools
are mostly fairly lightweight standalone binaries, so if something more
is needed a separate repository may be better.

As for how to structure the test utility, that's hard to say off-hand.
But one important consideration is making it easy to run (so as few
assumptions on the environment as possible), and easy to automate (so
that we ideally can use it as an automated CI-style test for performance
regressions).
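
To make the automation angle concrete, the rough shape I have in mind
is something like the sketch below. This is entirely hypothetical;
run_fwd_test.sh and baseline.txt are made-up names, not existing tools:

  # Hypothetical CI check: run one forwarding benchmark, compare the
  # measured packet rate against a stored baseline, fail on a >5%
  # regression.
  pps=$(./run_fwd_test.sh --duration 30)   # prints packets per second
  baseline=$(cat baseline.txt)
  awk -v p="$pps" -v b="$baseline" 'BEGIN { exit !(p >= 0.95 * b) }' ||
      { echo "regression: $pps pps < 95% of baseline $baseline"; exit 1; }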

Hope that helps! :)

-Toke



