bpf.vger.kernel.org archive mirror
From: Magnus Karlsson <magnus.karlsson@gmail.com>
To: magnus.karlsson@intel.com, bjorn@kernel.org, ast@kernel.org,
	daniel@iogearbox.net, netdev@vger.kernel.org,
	maciej.fijalkowski@intel.com, ciara.loftus@intel.com
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>,
	jonathan.lemon@gmail.com, bpf@vger.kernel.org,
	anthony.l.nguyen@intel.com
Subject: [PATCH bpf-next 00/13] xsk: i40e: ice: introduce batching for Rx buffer allocation
Date: Wed, 22 Sep 2021 09:56:00 +0200
Message-ID: <20210922075613.12186-1-magnus.karlsson@gmail.com>

This patch set introduces a batched interface for Rx buffer allocation
in the AF_XDP buffer pool. Instead of calling xsk_buff_alloc(*pool) once
per buffer, drivers can now call xsk_buff_alloc_batch(*pool,
**xdp_buff_array, max). Rather than returning a pointer to a single
xdp_buff, it returns the number of xdp_buffs it managed to allocate,
which is at most max. Pointers to the allocated xdp_buffs are stored in
the xdp_buff_array supplied in the call. This array can be a SW ring
that already exists in the driver or a new structure that the driver
allocates for this purpose.

u32 xsk_buff_alloc_batch(struct xsk_buff_pool *pool,
                         struct xdp_buff **xdp,
                         u32 max);
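
As a rough illustration of how a driver might refill its SW ring with
this kind of interface, here is a userspace sketch. Everything in it is
a hypothetical stand-in and not the kernel code: struct mock_buff,
struct mock_pool, mock_alloc_batch() and refill_ring() are names made
up for this example. Only the contract is taken from the description
above: store up to max pointers in the array and return how many were
allocated.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 8

/* Hypothetical stand-ins for struct xdp_buff and the buffer pool. */
struct mock_buff {
	void *data;
	void *data_meta;
	void *data_end;
};

struct mock_pool {
	struct mock_buff bufs[POOL_SIZE];
	uint32_t next;	/* index of the next free buffer */
};

/*
 * Mimics the batched contract: store up to max buffer pointers in
 * xdp[] and return the number actually allocated (possibly 0).
 */
static uint32_t mock_alloc_batch(struct mock_pool *pool,
				 struct mock_buff **xdp, uint32_t max)
{
	uint32_t n = 0;

	while (n < max && pool->next < POOL_SIZE)
		xdp[n++] = &pool->bufs[pool->next++];
	return n;
}

/*
 * Driver-side refill of a circular SW ring. A batch call fills a
 * contiguous run of slots, so a refill that wraps around the end of
 * the ring is split into two calls.
 */
static uint32_t refill_ring(struct mock_pool *pool, struct mock_buff **ring,
			    uint32_t ring_size, uint32_t next_to_use,
			    uint32_t count)
{
	uint32_t first = ring_size - next_to_use;
	uint32_t got;

	if (first > count)
		first = count;

	got = mock_alloc_batch(pool, &ring[next_to_use], first);
	/* Only continue at slot 0 if the first run was fully satisfied. */
	if (got == first && count > first)
		got += mock_alloc_batch(pool, &ring[0], count - first);
	return got;
}
```

The caller has to cope with a short count: the return value can be
anything from 0 to the requested number, so the driver should only
advance its ring index by what was actually allocated.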

When using this interface, the driver should also use the new helper
below to set the relevant fields in struct xdp_buff. Unlike
xsk_buff_alloc(), xsk_buff_alloc_batch() does not fill in the data and
data_meta fields for you, so it is no longer sufficient for the driver
to just set data_end (effectively the size). This was done for
performance reasons, as explained in detail in the commit message.

void xsk_buff_set_size(struct xdp_buff *xdp, u32 size);
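
To make the difference concrete, here is a userspace sketch of what
such a size-setting helper plausibly has to do. It is an illustration
under assumptions, not the code from the patch: struct mock_xdp_buff,
mock_set_size() and MOCK_HEADROOM are hypothetical, and the sketch
assumes that the packet data starts at a fixed headroom past
data_hard_start.

```c
#include <stdint.h>

#define MOCK_HEADROOM 256u	/* assumed fixed headroom, illustration only */

/* Hypothetical stand-in for the pointer fields of struct xdp_buff. */
struct mock_xdp_buff {
	uint8_t *data_hard_start;
	uint8_t *data;
	uint8_t *data_meta;
	uint8_t *data_end;
};

/*
 * With the batched allocator, data and data_meta are not filled in at
 * allocation time, so the helper sets all three pointers once the
 * packet size is known.
 */
static void mock_set_size(struct mock_xdp_buff *xdp, uint32_t size)
{
	xdp->data = xdp->data_hard_start + MOCK_HEADROOM;
	xdp->data_meta = xdp->data;	/* no metadata in front of the packet */
	xdp->data_end = xdp->data + size;
}
```

The point of the sketch is that all three pointers are written in one
place on the Rx path, rather than two at allocation time and one at
receive time.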

Patch 6 also optimizes the buffer allocation in the aligned case. In
this case, we can skip the reinitialization of most fields in the
xdp_buff_xsk struct at allocation time. As the number of elements in
the heads array is equal to the number of possible buffers in the
umem, we can initialize them once and for all at bind time and then
just point to the correct one in the xdp_buff_array that is returned
to the driver, so there is no need for a stack of free head entries. In
the unaligned case, the buffers can reside anywhere in the umem, so this
optimization is not possible: we still have to fill in the right
information in the xdp_buff every single time one is allocated.
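
The aligned-case idea can be sketched in userspace as follows. This is
only a sketch under stated assumptions: struct mock_head, struct
mock_umem_pool, pool_bind() and alloc_aligned() are hypothetical names,
the chunk size and count are arbitrary, and finding the head by
dividing the address by the chunk size is my assumption about how
"point to the correct one" could be done, not a quote from patch 6.

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK_SIZE 2048u	/* arbitrary values for the example */
#define NUM_CHUNKS 4u

/* Hypothetical per-buffer head, one per chunk in the umem. */
struct mock_head {
	uint64_t orig_addr;
	uint8_t *data_hard_start;
};

struct mock_umem_pool {
	uint8_t *umem;
	struct mock_head heads[NUM_CHUNKS];
};

/* Bind time: initialize every head exactly once. */
static void pool_bind(struct mock_umem_pool *p, uint8_t *umem)
{
	p->umem = umem;
	for (uint32_t i = 0; i < NUM_CHUNKS; i++) {
		p->heads[i].orig_addr = (uint64_t)i * CHUNK_SIZE;
		p->heads[i].data_hard_start = umem + (size_t)i * CHUNK_SIZE;
	}
}

/*
 * Allocation time, aligned mode: an address taken from the fill ring
 * maps directly to its preinitialized head. Nothing is rewritten and
 * no free list is consulted. Returns NULL for an out-of-range address.
 */
static struct mock_head *alloc_aligned(struct mock_umem_pool *p,
				       uint64_t addr)
{
	uint64_t idx = addr / CHUNK_SIZE;

	return idx < NUM_CHUNKS ? &p->heads[idx] : NULL;
}
```

In unaligned mode no such fixed mapping from address to head exists,
which is why the per-allocation initialization has to stay.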

I have updated i40e and ice to use this new batched interface.

These are the throughput results on my 2.1 GHz Cascade Lake system:

Aligned mode:
ice: +11% / -9 cycles/pkt
i40e: +12% / -9 cycles/pkt

Unaligned mode:
ice: +1.5% / -1 cycle/pkt
i40e: +1% / -1 cycle/pkt

For the aligned case, batching provides around 40% of the performance
improvement and the aligned optimization the remaining 60% or so. Given
this split, I would have expected a ~4% boost for unaligned mode
(roughly 40% of the aligned gain), but I only get around 1% and do not
know why. Note that memory consumption in aligned mode is also reduced
by this patch set.

Structure of the patch set:

Patch 1: Remove an unused entry from xdp_buff_xsk.
Patch 2: Introduce the batched buffer allocation API and implementation.
Patches 3-4: Use the batched allocation interface for ice.
Patch 5: Use the batched allocation interface for i40e.
Patch 6: Optimize the buffer allocation for the aligned case.
Patches 7-10: Fix some issues with the tests that were found while
              implementing the two new tests below.
Patches 11-13: Implement two new tests: single packet and headroom validation.

Thanks: Magnus

Magnus Karlsson (13):
  xsk: get rid of unused entry in struct xdp_buff_xsk
  xsk: batched buffer allocation for the pool
  ice: use xdp_buf instead of rx_buf for xsk zero-copy
  ice: use the xsk batched rx allocation interface
  i40e: use the xsk batched rx allocation interface
  xsk: optimize for aligned case
  selftests: xsk: fix missing initialization
  selftests: xsk: put the same buffer only once in the fill ring
  selftests: xsk: fix socket creation retry
  selftests: xsk: introduce pacing of traffic
  selftests: xsk: add single packet test
  selftests: xsk: change interleaving of packets in unaligned mode
  selftests: xsk: add frame_headroom test

 drivers/net/ethernet/intel/i40e/i40e_xsk.c |  52 ++++----
 drivers/net/ethernet/intel/ice/ice_txrx.h  |  16 +--
 drivers/net/ethernet/intel/ice/ice_xsk.c   |  92 +++++++-------
 include/net/xdp_sock_drv.h                 |  22 ++++
 include/net/xsk_buff_pool.h                |  48 +++++++-
 net/xdp/xsk.c                              |  15 ---
 net/xdp/xsk_buff_pool.c                    | 131 +++++++++++++++++---
 net/xdp/xsk_queue.h                        |  12 +-
 tools/testing/selftests/bpf/xdpxceiver.c   | 133 ++++++++++++++++-----
 tools/testing/selftests/bpf/xdpxceiver.h   |  11 +-
 10 files changed, 376 insertions(+), 156 deletions(-)


base-commit: 17b52c226a9a170f1611f69d12a71be05748aefd
--
2.29.0

Thread overview: 18+ messages
2021-09-22  7:56 Magnus Karlsson [this message]
2021-09-22  7:56 ` [PATCH bpf-next 01/13] xsk: get rid of unused entry in struct xdp_buff_xsk Magnus Karlsson
2021-09-22  7:56 ` [PATCH bpf-next 02/13] xsk: batched buffer allocation for the pool Magnus Karlsson
2021-09-22  7:56 ` [PATCH bpf-next 03/13] ice: use xdp_buf instead of rx_buf for xsk zero-copy Magnus Karlsson
2021-09-22  7:56 ` [PATCH bpf-next 04/13] ice: use the xsk batched rx allocation interface Magnus Karlsson
2021-09-22  7:56 ` [PATCH bpf-next 05/13] i40e: " Magnus Karlsson
2021-09-22  7:56 ` [PATCH bpf-next 06/13] xsk: optimize for aligned case Magnus Karlsson
2021-09-28 23:15   ` Nathan Chancellor
2021-09-29  5:52     ` Magnus Karlsson
2021-09-29 15:33   ` kernel test robot
2021-09-22  7:56 ` [PATCH bpf-next 07/13] selftests: xsk: fix missing initialization Magnus Karlsson
2021-09-22  7:56 ` [PATCH bpf-next 08/13] selftests: xsk: put the same buffer only once in the fill ring Magnus Karlsson
2021-09-22  7:56 ` [PATCH bpf-next 09/13] selftests: xsk: fix socket creation retry Magnus Karlsson
2021-09-22  7:56 ` [PATCH bpf-next 10/13] selftests: xsk: introduce pacing of traffic Magnus Karlsson
2021-09-22  7:56 ` [PATCH bpf-next 11/13] selftests: xsk: add single packet test Magnus Karlsson
2021-09-22  7:56 ` [PATCH bpf-next 12/13] selftests: xsk: change interleaving of packets in unaligned mode Magnus Karlsson
2021-09-22  7:56 ` [PATCH bpf-next 13/13] selftests: xsk: add frame_headroom test Magnus Karlsson
2021-09-27 22:30 ` [PATCH bpf-next 00/13] xsk: i40e: ice: introduce batching for Rx buffer allocation patchwork-bot+netdevbpf
