* [PATCH bpf-next 00/14] xsk: support shared umems between devices and queues
@ 2020-07-02 12:18 Magnus Karlsson
  2020-07-02 12:19 ` [PATCH bpf-next 01/14] xsk: i40e: ice: ixgbe: mlx5: pass buffer pool to driver instead of umem Magnus Karlsson
                   ` (15 more replies)
  0 siblings, 16 replies; 25+ messages in thread
From: Magnus Karlsson @ 2020-07-02 12:18 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev,
	jonathan.lemon, maximmi
  Cc: bpf, jeffrey.t.kirsher, maciej.fijalkowski,
	maciejromanfijalkowski, cristian.dumitrescu

This patch set adds support for sharing a umem between AF_XDP sockets
bound to different queue ids on the same device or even to different
devices. It has already been possible to do this by registering the
umem multiple times, but that wastes a lot of memory. Just imagine
having 10 threads, each with 10 sockets open, all sharing a single
umem: you would have to register the umem 100 times, consuming large
quantities of memory.

Instead, we extend the existing XDP_SHARED_UMEM flag to also work when
sharing a umem between different queue ids as well as between devices.
If you would like to share a umem between two sockets, create the
first one as you normally would. For the second socket, do not
register the umem again with the XDP_UMEM_REG setsockopt. Instead,
attach one new fill ring and one new completion ring to this second
socket and then use the XDP_SHARED_UMEM bind flag, supplying the file
descriptor of the first socket in the sxdp_shared_umem_fd field to
signify that it is the umem of the first socket you would like to
share.
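
As a rough sketch (not code lifted from the samples in this set), the
second socket could be set up as below with the raw socket API, where
first_fd is the already bound first socket and ifindex2/queue2 are
placeholders for the device and queue id the second socket should be
bound to:

#include <sys/socket.h>
#include <linux/if_xdp.h>

/* Sketch only: share the umem registered on first_fd. Error handling
 * omitted; the ring size is arbitrary.
 */
static int bind_shared(int first_fd, int ifindex2, __u32 queue2)
{
	struct sockaddr_xdp sxdp = {};
	int ring_sz = 2048;
	int fd = socket(AF_XDP, SOCK_RAW, 0);

	/* No XDP_UMEM_REG on this socket; the umem comes from first_fd. */
	setsockopt(fd, SOL_XDP, XDP_RX_RING, &ring_sz, sizeof(ring_sz));
	setsockopt(fd, SOL_XDP, XDP_TX_RING, &ring_sz, sizeof(ring_sz));
	/* One new fill/completion ring pair for this device and queue id. */
	setsockopt(fd, SOL_XDP, XDP_UMEM_FILL_RING, &ring_sz, sizeof(ring_sz));
	setsockopt(fd, SOL_XDP, XDP_UMEM_COMPLETION_RING, &ring_sz,
		   sizeof(ring_sz));

	sxdp.sxdp_family = AF_XDP;
	sxdp.sxdp_ifindex = ifindex2;	/* may differ from the first socket */
	sxdp.sxdp_queue_id = queue2;	/* may differ from the first socket */
	sxdp.sxdp_flags = XDP_SHARED_UMEM;
	sxdp.sxdp_shared_umem_fd = first_fd;

	return bind(fd, (struct sockaddr *)&sxdp, sizeof(sxdp)) ? -1 : fd;
}

If the second socket is bound to the same device and queue id as the
first one, the new fill and completion rings are not needed, as
explained next.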

One important thing to note in this example is that there needs to be
one fill ring and one completion ring per unique device and queue id
bound to. This is so that the single-producer and single-consumer
semantics of the rings can be upheld. To recap: if you bind multiple
sockets to the same device and queue id (already supported without
this patch set), you only need one pair of fill and completion rings.
If you bind multiple sockets to multiple different queues or devices,
you need one fill and completion ring pair per unique (device,
queue_id) tuple.

The implementation is based around extending the buffer pool in the
core xsk code. This is a structure that exists on a per unique device
and queue id basis. So a number of entities that used to live in the
umem, but have to be unique per device and queue id now that the umem
can be shared, are moved from the umem to the buffer pool. Information
about DMA mappings is also moved out of the buffer pool, but as these
mappings are per device and independent of the queue id, they now hang
off the netdev. In summary, after this patch set there is one xdp_sock
struct per socket created. This points to an xsk_buff_pool, of which
there is one per unique device and queue id. The buffer pool points to
a DMA mapping structure, of which there is one per device that a umem
has been bound to. And finally, the buffer pool also points to an
xdp_umem struct, of which there is only one per umem registration.

Before:

XSK -> UMEM -> POOL

Now:

XSK -> POOL -> DMA
            \
             > UMEM
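
Purely as an illustration of that object graph (these are not the
actual struct definitions; the struct and field names below, e.g.
xsk_dma_map_info, are invented for the figure and the real layouts
are in the later patches):

/* Illustration only -- simplified, invented names. */
struct xdp_umem {		/* one per XDP_UMEM_REG */
	/* umem area, size, chunk layout, ... */
};

struct xsk_dma_map_info {	/* one per (device, umem); hangs off the netdev */
	/* DMA addresses of the umem pages for this device */
};

struct xsk_buff_pool {		/* one per unique (device, queue_id) */
	struct xsk_queue *fq;		/* fill ring */
	struct xsk_queue *cq;		/* completion ring */
	struct net_device *netdev;
	u16 queue_id;
	struct xsk_dma_map_info *dma;	/* shared between queues of a device */
	struct xdp_umem *umem;		/* shared by all pools of a registration */
};

struct xdp_sock {		/* one per socket */
	struct xsk_buff_pool *pool;
	/* Rx and Tx rings, ... */
};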

Patches 1-8 only rearrange internal structures to support the buffer
pool carrying this new information, while patch 9 improves performance
now that the internal structures have been rearranged quite a bit.
Finally, patches 10-14 introduce the new functionality together with
libbpf support, samples, and documentation.

Libbpf has also been extended to support sharing of umems between
sockets bound to different devices and queue ids by introducing a new
function called xsk_socket__create_shared(). The difference between
this and the existing xsk_socket__create() is that the former takes
references to a fill ring and a completion ring, as one such pair
needs to be created for every new device and queue id the umem is
bound to. This new function needs to be used for the second and
following sockets that bind to the same umem. The first one can be
created by either function, as it will also have called
xsk_umem__create().
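
For reference, here is a minimal usage sketch of the libbpf side
(error handling kept to a minimum, buf/buf_size stand for an already
allocated and aligned umem area, the ring structs are locals only for
brevity, and the xsk_socket__create_shared() prototype is the one I
read from the libbpf patch in this series):

#include <bpf/xsk.h>

/* Sketch only. */
static int setup_two_sockets(void *buf, __u64 buf_size)
{
	struct xsk_ring_prod fill1, fill2, tx1, tx2;
	struct xsk_ring_cons comp1, comp2, rx1, rx2;
	struct xsk_socket *xsk1, *xsk2;
	struct xsk_umem *umem;
	int err;

	/* First socket: registers the umem and uses the first
	 * fill/completion ring pair.
	 */
	err = xsk_umem__create(&umem, buf, buf_size, &fill1, &comp1, NULL);
	if (err)
		return err;
	err = xsk_socket__create(&xsk1, "eth0", 0, umem, &rx1, &tx1, NULL);
	if (err)
		return err;

	/* Second socket on another netdev and/or queue id: shares the
	 * umem but brings its own fill and completion rings.
	 */
	return xsk_socket__create_shared(&xsk2, "eth1", 0, umem, &rx2, &tx2,
					 &fill2, &comp2, NULL);
}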

There is also a new sample xsk_fwd that demonstrates this new
interface and capability.

Note to Maxim at Mellanox: I do not have an mlx5 card, so I have not
been able to test the changes to your driver. It compiles, but that is
all I can say, so it would be great if you could test it. Also, I
changed the names of many functions and variables from umem to pool,
as a buffer pool is passed down to the driver in this patch set
instead of the umem. I did not change the names of the files umem.c
and umem.h. Please go through the changes and adjust things to your
liking.

Performance for the non-shared umem case is unchanged for the xdpsock
sample application with this patch set. For workloads that share a
umem, this patch set can give rise to added performance benefits due
to the decrease in memory usage.

This patch set has been applied against commit 91f77560e473 ("Merge branch 'test_progs-improvements'")

Structure of the patch set:

Patch 1: Pass the buffer pool to the driver instead of the umem. This
         because the driver needs one buffer pool per napi context
         when we later introduce sharing of the umem between queue ids
         and devices.
Patch 2: Rename the xsk driver interfaces so that they have better
         names after the move to the buffer pool
Patch 3: There is one buffer pool per device and queue, while there is
         only one umem per registration. The buffer pool needs to be
         created and destroyed independently of the umem.
Patch 4: Move fill and completion rings to the buffer pool as there will
         be one set of these per device and queue
Patch 5: Move queue_id, dev and need_wakeup to the buffer pool as
         well, as these are now per buffer pool since the umem can be
         shared between devices and queues
Patch 6: Move xsk_tx_list and its lock to buffer pool
Patch 7: Move the creation/deletion of addrs from buffer pool to umem
Patch 8: Enable sharing of DMA mappings when multiple queues of the
         same device are bound
Patch 9: Rearrange internal structs for better performance as these
         have been substantially scrambled by the previous patches
Patch 10: Add shared umem support between queue ids
Patch 11: Add shared umem support between devices
Patch 12: Add support for this in libbpf
Patch 13: Add a new sample that demonstrates this new feature by
          forwarding packets between different netdevs and queues
Patch 14: Add documentation

Thanks: Magnus

Cristian Dumitrescu (1):
  samples/bpf: add new sample xsk_fwd.c

Magnus Karlsson (13):
  xsk: i40e: ice: ixgbe: mlx5: pass buffer pool to driver instead of
    umem
  xsk: i40e: ice: ixgbe: mlx5: rename xsk zero-copy driver interfaces
  xsk: create and free context independently from umem
  xsk: move fill and completion rings to buffer pool
  xsk: move queue_id, dev and need_wakeup to context
  xsk: move xsk_tx_list and its lock to buffer pool
  xsk: move addrs from buffer pool to umem
  xsk: net: enable sharing of dma mappings
  xsk: rearrange internal structs for better performance
  xsk: add shared umem support between queue ids
  xsk: add shared umem support between devices
  libbpf: support shared umems between queues and devices
  xsk: documentation for XDP_SHARED_UMEM between queues and netdevs

 Documentation/networking/af_xdp.rst                |   68 +-
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c     |    2 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c        |   29 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c        |   10 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.h        |    2 +-
 drivers/net/ethernet/intel/i40e/i40e_xsk.c         |   79 +-
 drivers/net/ethernet/intel/i40e/i40e_xsk.h         |    4 +-
 drivers/net/ethernet/intel/ice/ice.h               |   18 +-
 drivers/net/ethernet/intel/ice/ice_base.c          |   16 +-
 drivers/net/ethernet/intel/ice/ice_lib.c           |    2 +-
 drivers/net/ethernet/intel/ice/ice_main.c          |   10 +-
 drivers/net/ethernet/intel/ice/ice_txrx.c          |    8 +-
 drivers/net/ethernet/intel/ice/ice_txrx.h          |    2 +-
 drivers/net/ethernet/intel/ice/ice_xsk.c           |  142 +--
 drivers/net/ethernet/intel/ice/ice_xsk.h           |    7 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe.h           |    2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c      |   34 +-
 .../net/ethernet/intel/ixgbe/ixgbe_txrx_common.h   |    7 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c       |   61 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |   19 +-
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c   |    5 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/rx.h    |   10 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/setup.c |   12 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/setup.h |    2 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/tx.c    |   12 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/tx.h    |    6 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/umem.c  |  108 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/umem.h  |   14 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   46 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    |   16 +-
 include/linux/netdevice.h                          |   13 +-
 include/net/xdp_sock.h                             |   28 +-
 include/net/xdp_sock_drv.h                         |  115 ++-
 include/net/xsk_buff_pool.h                        |   47 +-
 net/core/dev.c                                     |    3 +
 net/ethtool/channels.c                             |    2 +-
 net/ethtool/ioctl.c                                |    2 +-
 net/xdp/xdp_umem.c                                 |  221 +---
 net/xdp/xdp_umem.h                                 |    6 -
 net/xdp/xsk.c                                      |  213 ++--
 net/xdp/xsk.h                                      |    3 +
 net/xdp/xsk_buff_pool.c                            |  314 +++++-
 net/xdp/xsk_diag.c                                 |   14 +-
 net/xdp/xsk_queue.h                                |   12 +-
 samples/bpf/Makefile                               |    3 +
 samples/bpf/xsk_fwd.c                              | 1075 ++++++++++++++++++++
 tools/lib/bpf/libbpf.map                           |    1 +
 tools/lib/bpf/xsk.c                                |  376 ++++---
 tools/lib/bpf/xsk.h                                |    9 +
 49 files changed, 2327 insertions(+), 883 deletions(-)
 create mode 100644 samples/bpf/xsk_fwd.c

--
2.7.4


* [PATCH bpf-next 01/14] xsk: i40e: ice: ixgbe: mlx5: pass buffer pool to driver instead of umem
  2020-07-02 12:18 [PATCH bpf-next 00/14] xsk: support shared umems between devices and queues Magnus Karlsson
@ 2020-07-02 12:19 ` Magnus Karlsson
  2020-07-08 15:00   ` Maxim Mikityanskiy
  2020-07-02 12:19 ` [PATCH bpf-next 02/14] xsk: i40e: ice: ixgbe: mlx5: rename xsk zero-copy driver interfaces Magnus Karlsson
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 25+ messages in thread
From: Magnus Karlsson @ 2020-07-02 12:19 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev,
	jonathan.lemon, maximmi
  Cc: bpf, jeffrey.t.kirsher, maciej.fijalkowski,
	maciejromanfijalkowski, cristian.dumitrescu

Replace the explicit umem reference passed to the driver in
AF_XDP zero-copy mode with the buffer pool. This is in preparation
for extending the functionality of the zero-copy mode so that
umems can be shared between queues on the same netdev and also
between netdevs. In this commit, only a umem reference has been
added to the buffer pool struct, but later commits will add other
entities to it: entities that differ between queue ids and netdevs
even though the umem is shared between them.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c     |   2 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c        |  29 +++--
 drivers/net/ethernet/intel/i40e/i40e_txrx.c        |  10 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.h        |   2 +-
 drivers/net/ethernet/intel/i40e/i40e_xsk.c         |  81 ++++++------
 drivers/net/ethernet/intel/i40e/i40e_xsk.h         |   4 +-
 drivers/net/ethernet/intel/ice/ice.h               |  18 +--
 drivers/net/ethernet/intel/ice/ice_base.c          |  16 +--
 drivers/net/ethernet/intel/ice/ice_lib.c           |   2 +-
 drivers/net/ethernet/intel/ice/ice_main.c          |  10 +-
 drivers/net/ethernet/intel/ice/ice_txrx.c          |   8 +-
 drivers/net/ethernet/intel/ice/ice_txrx.h          |   2 +-
 drivers/net/ethernet/intel/ice/ice_xsk.c           | 142 ++++++++++-----------
 drivers/net/ethernet/intel/ice/ice_xsk.h           |   7 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe.h           |   2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c      |  34 ++---
 .../net/ethernet/intel/ixgbe/ixgbe_txrx_common.h   |   7 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c       |  61 ++++-----
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |  19 +--
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c   |   5 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/rx.h    |  10 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/setup.c |  12 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/setup.h |   2 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/tx.c    |  12 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/tx.h    |   6 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/umem.c  | 108 ++++++++--------
 .../net/ethernet/mellanox/mlx5/core/en/xsk/umem.h  |  14 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  46 +++----
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    |  16 +--
 include/linux/netdevice.h                          |  10 +-
 include/net/xdp_sock_drv.h                         |   7 +-
 include/net/xsk_buff_pool.h                        |   4 +-
 net/ethtool/channels.c                             |   2 +-
 net/ethtool/ioctl.c                                |   2 +-
 net/xdp/xdp_umem.c                                 |  45 +++----
 net/xdp/xsk_buff_pool.c                            |   5 +-
 36 files changed, 389 insertions(+), 373 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index aa8026b..422b54f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -1967,7 +1967,7 @@ static int i40e_set_ringparam(struct net_device *netdev,
 	    (new_rx_count == vsi->rx_rings[0]->count))
 		return 0;
 
-	/* If there is a AF_XDP UMEM attached to any of Rx rings,
+	/* If there is a AF_XDP page pool attached to any of Rx rings,
 	 * disallow changing the number of descriptors -- regardless
 	 * if the netdev is running or not.
 	 */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 5d807c8..3df725e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -3103,12 +3103,12 @@ static void i40e_config_xps_tx_ring(struct i40e_ring *ring)
 }
 
 /**
- * i40e_xsk_umem - Retrieve the AF_XDP ZC if XDP and ZC is enabled
+ * i40e_xsk_pool - Retrieve the AF_XDP buffer pool if XDP and ZC is enabled
  * @ring: The Tx or Rx ring
  *
- * Returns the UMEM or NULL.
+ * Returns the AF_XDP buffer pool or NULL.
  **/
-static struct xdp_umem *i40e_xsk_umem(struct i40e_ring *ring)
+static struct xsk_buff_pool *i40e_xsk_pool(struct i40e_ring *ring)
 {
 	bool xdp_on = i40e_enabled_xdp_vsi(ring->vsi);
 	int qid = ring->queue_index;
@@ -3119,7 +3119,7 @@ static struct xdp_umem *i40e_xsk_umem(struct i40e_ring *ring)
 	if (!xdp_on || !test_bit(qid, ring->vsi->af_xdp_zc_qps))
 		return NULL;
 
-	return xdp_get_umem_from_qid(ring->vsi->netdev, qid);
+	return xdp_get_xsk_pool_from_qid(ring->vsi->netdev, qid);
 }
 
 /**
@@ -3138,7 +3138,7 @@ static int i40e_configure_tx_ring(struct i40e_ring *ring)
 	u32 qtx_ctl = 0;
 
 	if (ring_is_xdp(ring))
-		ring->xsk_umem = i40e_xsk_umem(ring);
+		ring->xsk_pool = i40e_xsk_pool(ring);
 
 	/* some ATR related tx ring init */
 	if (vsi->back->flags & I40E_FLAG_FD_ATR_ENABLED) {
@@ -3261,12 +3261,13 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring)
 		xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq);
 
 	kfree(ring->rx_bi);
-	ring->xsk_umem = i40e_xsk_umem(ring);
-	if (ring->xsk_umem) {
+	ring->xsk_pool = i40e_xsk_pool(ring);
+	if (ring->xsk_pool) {
 		ret = i40e_alloc_rx_bi_zc(ring);
 		if (ret)
 			return ret;
-		ring->rx_buf_len = xsk_umem_get_rx_frame_size(ring->xsk_umem);
+		ring->rx_buf_len =
+		  xsk_umem_get_rx_frame_size(ring->xsk_pool->umem);
 		/* For AF_XDP ZC, we disallow packets to span on
 		 * multiple buffers, thus letting us skip that
 		 * handling in the fast-path.
@@ -3349,8 +3350,8 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring)
 	ring->tail = hw->hw_addr + I40E_QRX_TAIL(pf_q);
 	writel(0, ring->tail);
 
-	if (ring->xsk_umem) {
-		xsk_buff_set_rxq_info(ring->xsk_umem, &ring->xdp_rxq);
+	if (ring->xsk_pool) {
+		xsk_buff_set_rxq_info(ring->xsk_pool->umem, &ring->xdp_rxq);
 		ok = i40e_alloc_rx_buffers_zc(ring, I40E_DESC_UNUSED(ring));
 	} else {
 		ok = !i40e_alloc_rx_buffers(ring, I40E_DESC_UNUSED(ring));
@@ -3361,7 +3362,7 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring)
 		 */
 		dev_info(&vsi->back->pdev->dev,
 			 "Failed to allocate some buffers on %sRx ring %d (pf_q %d)\n",
-			 ring->xsk_umem ? "UMEM enabled " : "",
+			 ring->xsk_pool ? "AF_XDP ZC enabled " : "",
 			 ring->queue_index, pf_q);
 	}
 
@@ -12553,7 +12554,7 @@ static int i40e_xdp_setup(struct i40e_vsi *vsi,
 	 */
 	if (need_reset && prog)
 		for (i = 0; i < vsi->num_queue_pairs; i++)
-			if (vsi->xdp_rings[i]->xsk_umem)
+			if (vsi->xdp_rings[i]->xsk_pool)
 				(void)i40e_xsk_wakeup(vsi->netdev, i,
 						      XDP_WAKEUP_RX);
 
@@ -12835,8 +12836,8 @@ static int i40e_xdp(struct net_device *dev,
 	case XDP_QUERY_PROG:
 		xdp->prog_id = vsi->xdp_prog ? vsi->xdp_prog->aux->id : 0;
 		return 0;
-	case XDP_SETUP_XSK_UMEM:
-		return i40e_xsk_umem_setup(vsi, xdp->xsk.umem,
+	case XDP_SETUP_XSK_POOL:
+		return i40e_xsk_pool_setup(vsi, xdp->xsk.pool,
 					   xdp->xsk.queue_id);
 	default:
 		return -EINVAL;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index f9555c8..a50592b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -636,7 +636,7 @@ void i40e_clean_tx_ring(struct i40e_ring *tx_ring)
 	unsigned long bi_size;
 	u16 i;
 
-	if (ring_is_xdp(tx_ring) && tx_ring->xsk_umem) {
+	if (ring_is_xdp(tx_ring) && tx_ring->xsk_pool) {
 		i40e_xsk_clean_tx_ring(tx_ring);
 	} else {
 		/* ring already cleared, nothing to do */
@@ -1335,7 +1335,7 @@ void i40e_clean_rx_ring(struct i40e_ring *rx_ring)
 		rx_ring->skb = NULL;
 	}
 
-	if (rx_ring->xsk_umem) {
+	if (rx_ring->xsk_pool) {
 		i40e_xsk_clean_rx_ring(rx_ring);
 		goto skip_free;
 	}
@@ -1369,7 +1369,7 @@ void i40e_clean_rx_ring(struct i40e_ring *rx_ring)
 	}
 
 skip_free:
-	if (rx_ring->xsk_umem)
+	if (rx_ring->xsk_pool)
 		i40e_clear_rx_bi_zc(rx_ring);
 	else
 		i40e_clear_rx_bi(rx_ring);
@@ -2579,7 +2579,7 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
 	 * budget and be more aggressive about cleaning up the Tx descriptors.
 	 */
 	i40e_for_each_ring(ring, q_vector->tx) {
-		bool wd = ring->xsk_umem ?
+		bool wd = ring->xsk_pool ?
 			  i40e_clean_xdp_tx_irq(vsi, ring, budget) :
 			  i40e_clean_tx_irq(vsi, ring, budget);
 
@@ -2601,7 +2601,7 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
 	budget_per_ring = max(budget/q_vector->num_ringpairs, 1);
 
 	i40e_for_each_ring(ring, q_vector->rx) {
-		int cleaned = ring->xsk_umem ?
+		int cleaned = ring->xsk_pool ?
 			      i40e_clean_rx_irq_zc(ring, budget_per_ring) :
 			      i40e_clean_rx_irq(ring, budget_per_ring);
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index 5c25597..88d43ed 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -411,7 +411,7 @@ struct i40e_ring {
 
 	struct i40e_channel *ch;
 	struct xdp_rxq_info xdp_rxq;
-	struct xdp_umem *xsk_umem;
+	struct xsk_buff_pool *xsk_pool;
 } ____cacheline_internodealigned_in_smp;
 
 static inline bool ring_uses_build_skb(struct i40e_ring *ring)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 7276580..d7ebdf6 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -29,14 +29,16 @@ static struct xdp_buff **i40e_rx_bi(struct i40e_ring *rx_ring, u32 idx)
 }
 
 /**
- * i40e_xsk_umem_enable - Enable/associate a UMEM to a certain ring/qid
+ * i40e_xsk_pool_enable - Enable/associate an AF_XDP buffer pool to a
+ * certain ring/qid
  * @vsi: Current VSI
- * @umem: UMEM
- * @qid: Rx ring to associate UMEM to
+ * @pool: buffer pool
+ * @qid: Rx ring to associate buffer pool with
  *
  * Returns 0 on success, <0 on failure
  **/
-static int i40e_xsk_umem_enable(struct i40e_vsi *vsi, struct xdp_umem *umem,
+static int i40e_xsk_pool_enable(struct i40e_vsi *vsi,
+				struct xsk_buff_pool *pool,
 				u16 qid)
 {
 	struct net_device *netdev = vsi->netdev;
@@ -53,7 +55,8 @@ static int i40e_xsk_umem_enable(struct i40e_vsi *vsi, struct xdp_umem *umem,
 	    qid >= netdev->real_num_tx_queues)
 		return -EINVAL;
 
-	err = xsk_buff_dma_map(umem, &vsi->back->pdev->dev, I40E_RX_DMA_ATTR);
+	err = xsk_buff_dma_map(pool->umem, &vsi->back->pdev->dev,
+			       I40E_RX_DMA_ATTR);
 	if (err)
 		return err;
 
@@ -80,21 +83,22 @@ static int i40e_xsk_umem_enable(struct i40e_vsi *vsi, struct xdp_umem *umem,
 }
 
 /**
- * i40e_xsk_umem_disable - Disassociate a UMEM from a certain ring/qid
+ * i40e_xsk_pool_disable - Disassociate an AF_XDP buffer pool from a
+ * certain ring/qid
  * @vsi: Current VSI
- * @qid: Rx ring to associate UMEM to
+ * @qid: Rx ring to associate buffer pool with
  *
  * Returns 0 on success, <0 on failure
  **/
-static int i40e_xsk_umem_disable(struct i40e_vsi *vsi, u16 qid)
+static int i40e_xsk_pool_disable(struct i40e_vsi *vsi, u16 qid)
 {
 	struct net_device *netdev = vsi->netdev;
-	struct xdp_umem *umem;
+	struct xsk_buff_pool *pool;
 	bool if_running;
 	int err;
 
-	umem = xdp_get_umem_from_qid(netdev, qid);
-	if (!umem)
+	pool = xdp_get_xsk_pool_from_qid(netdev, qid);
+	if (!pool)
 		return -EINVAL;
 
 	if_running = netif_running(vsi->netdev) && i40e_enabled_xdp_vsi(vsi);
@@ -106,7 +110,7 @@ static int i40e_xsk_umem_disable(struct i40e_vsi *vsi, u16 qid)
 	}
 
 	clear_bit(qid, vsi->af_xdp_zc_qps);
-	xsk_buff_dma_unmap(umem, I40E_RX_DMA_ATTR);
+	xsk_buff_dma_unmap(pool->umem, I40E_RX_DMA_ATTR);
 
 	if (if_running) {
 		err = i40e_queue_pair_enable(vsi, qid);
@@ -118,20 +122,21 @@ static int i40e_xsk_umem_disable(struct i40e_vsi *vsi, u16 qid)
 }
 
 /**
- * i40e_xsk_umem_setup - Enable/disassociate a UMEM to/from a ring/qid
+ * i40e_xsk_pool_setup - Enable/disassociate an AF_XDP buffer pool to/from
+ * a ring/qid
  * @vsi: Current VSI
- * @umem: UMEM to enable/associate to a ring, or NULL to disable
- * @qid: Rx ring to (dis)associate UMEM (from)to
+ * @pool: Buffer pool to enable/associate to a ring, or NULL to disable
+ * @qid: Rx ring to (dis)associate buffer pool (from)to
  *
- * This function enables or disables a UMEM to a certain ring.
+ * This function enables or disables a buffer pool to a certain ring.
  *
  * Returns 0 on success, <0 on failure
  **/
-int i40e_xsk_umem_setup(struct i40e_vsi *vsi, struct xdp_umem *umem,
+int i40e_xsk_pool_setup(struct i40e_vsi *vsi, struct xsk_buff_pool *pool,
 			u16 qid)
 {
-	return umem ? i40e_xsk_umem_enable(vsi, umem, qid) :
-		i40e_xsk_umem_disable(vsi, qid);
+	return pool ? i40e_xsk_pool_enable(vsi, pool, qid) :
+		i40e_xsk_pool_disable(vsi, qid);
 }
 
 /**
@@ -191,7 +196,7 @@ bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 count)
 	rx_desc = I40E_RX_DESC(rx_ring, ntu);
 	bi = i40e_rx_bi(rx_ring, ntu);
 	do {
-		xdp = xsk_buff_alloc(rx_ring->xsk_umem);
+		xdp = xsk_buff_alloc(rx_ring->xsk_pool->umem);
 		if (!xdp) {
 			ok = false;
 			goto no_buffers;
@@ -358,11 +363,11 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
 	i40e_finalize_xdp_rx(rx_ring, xdp_xmit);
 	i40e_update_rx_stats(rx_ring, total_rx_bytes, total_rx_packets);
 
-	if (xsk_umem_uses_need_wakeup(rx_ring->xsk_umem)) {
+	if (xsk_umem_uses_need_wakeup(rx_ring->xsk_pool->umem)) {
 		if (failure || rx_ring->next_to_clean == rx_ring->next_to_use)
-			xsk_set_rx_need_wakeup(rx_ring->xsk_umem);
+			xsk_set_rx_need_wakeup(rx_ring->xsk_pool->umem);
 		else
-			xsk_clear_rx_need_wakeup(rx_ring->xsk_umem);
+			xsk_clear_rx_need_wakeup(rx_ring->xsk_pool->umem);
 
 		return (int)total_rx_packets;
 	}
@@ -391,11 +396,12 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
 			break;
 		}
 
-		if (!xsk_umem_consume_tx(xdp_ring->xsk_umem, &desc))
+		if (!xsk_umem_consume_tx(xdp_ring->xsk_pool->umem, &desc))
 			break;
 
-		dma = xsk_buff_raw_get_dma(xdp_ring->xsk_umem, desc.addr);
-		xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_umem, dma,
+		dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool->umem,
+					   desc.addr);
+		xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool->umem, dma,
 						 desc.len);
 
 		tx_bi = &xdp_ring->tx_bi[xdp_ring->next_to_use];
@@ -419,7 +425,7 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
 						 I40E_TXD_QW1_CMD_SHIFT);
 		i40e_xdp_ring_update_tail(xdp_ring);
 
-		xsk_umem_consume_tx_done(xdp_ring->xsk_umem);
+		xsk_umem_consume_tx_done(xdp_ring->xsk_pool->umem);
 	}
 
 	return !!budget && work_done;
@@ -452,7 +458,7 @@ bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi,
 {
 	unsigned int ntc, total_bytes = 0, budget = vsi->work_limit;
 	u32 i, completed_frames, frames_ready, xsk_frames = 0;
-	struct xdp_umem *umem = tx_ring->xsk_umem;
+	struct xsk_buff_pool *bp = tx_ring->xsk_pool;
 	u32 head_idx = i40e_get_head(tx_ring);
 	bool work_done = true, xmit_done;
 	struct i40e_tx_buffer *tx_bi;
@@ -492,14 +498,14 @@ bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi,
 		tx_ring->next_to_clean -= tx_ring->count;
 
 	if (xsk_frames)
-		xsk_umem_complete_tx(umem, xsk_frames);
+		xsk_umem_complete_tx(bp->umem, xsk_frames);
 
 	i40e_arm_wb(tx_ring, vsi, budget);
 	i40e_update_tx_stats(tx_ring, completed_frames, total_bytes);
 
 out_xmit:
-	if (xsk_umem_uses_need_wakeup(tx_ring->xsk_umem))
-		xsk_set_tx_need_wakeup(tx_ring->xsk_umem);
+	if (xsk_umem_uses_need_wakeup(tx_ring->xsk_pool->umem))
+		xsk_set_tx_need_wakeup(tx_ring->xsk_pool->umem);
 
 	xmit_done = i40e_xmit_zc(tx_ring, budget);
 
@@ -533,7 +539,7 @@ int i40e_xsk_wakeup(struct net_device *dev, u32 queue_id, u32 flags)
 	if (queue_id >= vsi->num_queue_pairs)
 		return -ENXIO;
 
-	if (!vsi->xdp_rings[queue_id]->xsk_umem)
+	if (!vsi->xdp_rings[queue_id]->xsk_pool)
 		return -ENXIO;
 
 	ring = vsi->xdp_rings[queue_id];
@@ -572,7 +578,7 @@ void i40e_xsk_clean_rx_ring(struct i40e_ring *rx_ring)
 void i40e_xsk_clean_tx_ring(struct i40e_ring *tx_ring)
 {
 	u16 ntc = tx_ring->next_to_clean, ntu = tx_ring->next_to_use;
-	struct xdp_umem *umem = tx_ring->xsk_umem;
+	struct xsk_buff_pool *bp = tx_ring->xsk_pool;
 	struct i40e_tx_buffer *tx_bi;
 	u32 xsk_frames = 0;
 
@@ -592,14 +598,15 @@ void i40e_xsk_clean_tx_ring(struct i40e_ring *tx_ring)
 	}
 
 	if (xsk_frames)
-		xsk_umem_complete_tx(umem, xsk_frames);
+		xsk_umem_complete_tx(bp->umem, xsk_frames);
 }
 
 /**
- * i40e_xsk_any_rx_ring_enabled - Checks if Rx rings have AF_XDP UMEM attached
+ * i40e_xsk_any_rx_ring_enabled - Checks if Rx rings have an AF_XDP
+ * buffer pool attached
  * @vsi: vsi
  *
- * Returns true if any of the Rx rings has an AF_XDP UMEM attached
+ * Returns true if any of the Rx rings has an AF_XDP buffer pool attached
  **/
 bool i40e_xsk_any_rx_ring_enabled(struct i40e_vsi *vsi)
 {
@@ -607,7 +614,7 @@ bool i40e_xsk_any_rx_ring_enabled(struct i40e_vsi *vsi)
 	int i;
 
 	for (i = 0; i < vsi->num_queue_pairs; i++) {
-		if (xdp_get_umem_from_qid(netdev, i))
+		if (xdp_get_xsk_pool_from_qid(netdev, i))
 			return true;
 	}
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.h b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
index ea919a7d..a5ad927 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
@@ -5,12 +5,12 @@
 #define _I40E_XSK_H_
 
 struct i40e_vsi;
-struct xdp_umem;
+struct xsk_buff_pool;
 struct zero_copy_allocator;
 
 int i40e_queue_pair_disable(struct i40e_vsi *vsi, int queue_pair);
 int i40e_queue_pair_enable(struct i40e_vsi *vsi, int queue_pair);
-int i40e_xsk_umem_setup(struct i40e_vsi *vsi, struct xdp_umem *umem,
+int i40e_xsk_pool_setup(struct i40e_vsi *vsi, struct xsk_buff_pool *pool,
 			u16 qid);
 bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 cleaned_count);
 int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget);
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 5792ee6..9eff7e8 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -318,9 +318,9 @@ struct ice_vsi {
 	struct ice_ring **xdp_rings;	 /* XDP ring array */
 	u16 num_xdp_txq;		 /* Used XDP queues */
 	u8 xdp_mapping_mode;		 /* ICE_MAP_MODE_[CONTIG|SCATTER] */
-	struct xdp_umem **xsk_umems;
-	u16 num_xsk_umems_used;
-	u16 num_xsk_umems;
+	struct xsk_buff_pool **xsk_pools;
+	u16 num_xsk_pools_used;
+	u16 num_xsk_pools;
 } ____cacheline_internodealigned_in_smp;
 
 /* struct that defines an interrupt vector */
@@ -489,25 +489,25 @@ static inline void ice_set_ring_xdp(struct ice_ring *ring)
 }
 
 /**
- * ice_xsk_umem - get XDP UMEM bound to a ring
+ * ice_xsk_pool - get XSK buffer pool bound to a ring
  * @ring - ring to use
  *
- * Returns a pointer to xdp_umem structure if there is an UMEM present,
+ * Returns a pointer to xdp_umem structure if there is a buffer pool present,
  * NULL otherwise.
  */
-static inline struct xdp_umem *ice_xsk_umem(struct ice_ring *ring)
+static inline struct xsk_buff_pool *ice_xsk_pool(struct ice_ring *ring)
 {
-	struct xdp_umem **umems = ring->vsi->xsk_umems;
+	struct xsk_buff_pool **pools = ring->vsi->xsk_pools;
 	u16 qid = ring->q_index;
 
 	if (ice_ring_is_xdp(ring))
 		qid -= ring->vsi->num_xdp_txq;
 
-	if (qid >= ring->vsi->num_xsk_umems || !umems || !umems[qid] ||
+	if (qid >= ring->vsi->num_xsk_pools || !pools || !pools[qid] ||
 	    !ice_is_xdp_ena_vsi(ring->vsi))
 		return NULL;
 
-	return umems[qid];
+	return pools[qid];
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
index d620d26..94dbf89 100644
--- a/drivers/net/ethernet/intel/ice/ice_base.c
+++ b/drivers/net/ethernet/intel/ice/ice_base.c
@@ -308,12 +308,12 @@ int ice_setup_rx_ctx(struct ice_ring *ring)
 			xdp_rxq_info_reg(&ring->xdp_rxq, ring->netdev,
 					 ring->q_index);
 
-		ring->xsk_umem = ice_xsk_umem(ring);
-		if (ring->xsk_umem) {
+		ring->xsk_pool = ice_xsk_pool(ring);
+		if (ring->xsk_pool) {
 			xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq);
 
 			ring->rx_buf_len =
-				xsk_umem_get_rx_frame_size(ring->xsk_umem);
+				xsk_umem_get_rx_frame_size(ring->xsk_pool->umem);
 			/* For AF_XDP ZC, we disallow packets to span on
 			 * multiple buffers, thus letting us skip that
 			 * handling in the fast-path.
@@ -324,7 +324,7 @@ int ice_setup_rx_ctx(struct ice_ring *ring)
 							 NULL);
 			if (err)
 				return err;
-			xsk_buff_set_rxq_info(ring->xsk_umem, &ring->xdp_rxq);
+			xsk_buff_set_rxq_info(ring->xsk_pool->umem, &ring->xdp_rxq);
 
 			dev_info(dev, "Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring %d\n",
 				 ring->q_index);
@@ -417,9 +417,9 @@ int ice_setup_rx_ctx(struct ice_ring *ring)
 	ring->tail = hw->hw_addr + QRX_TAIL(pf_q);
 	writel(0, ring->tail);
 
-	if (ring->xsk_umem) {
-		if (!xsk_buff_can_alloc(ring->xsk_umem, num_bufs)) {
-			dev_warn(dev, "UMEM does not provide enough addresses to fill %d buffers on Rx ring %d\n",
+	if (ring->xsk_pool) {
+		if (!xsk_buff_can_alloc(ring->xsk_pool->umem, num_bufs)) {
+			dev_warn(dev, "XSK buffer pool does not provide enough addresses to fill %d buffers on Rx ring %d\n",
 				 num_bufs, ring->q_index);
 			dev_warn(dev, "Change Rx ring/fill queue size to avoid performance issues\n");
 
@@ -428,7 +428,7 @@ int ice_setup_rx_ctx(struct ice_ring *ring)
 
 		err = ice_alloc_rx_bufs_zc(ring, num_bufs);
 		if (err)
-			dev_info(dev, "Failed to allocate some buffers on UMEM enabled Rx ring %d (pf_q %d)\n",
+			dev_info(dev, "Failed to allocate some buffers on XSK buffer pool enabled Rx ring %d (pf_q %d)\n",
 				 ring->q_index, pf_q);
 		return 0;
 	}
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 28b46cc..e87e25a 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -1713,7 +1713,7 @@ int ice_vsi_cfg_xdp_txqs(struct ice_vsi *vsi)
 		return ret;
 
 	for (i = 0; i < vsi->num_xdp_txq; i++)
-		vsi->xdp_rings[i]->xsk_umem = ice_xsk_umem(vsi->xdp_rings[i]);
+		vsi->xdp_rings[i]->xsk_pool = ice_xsk_pool(vsi->xdp_rings[i]);
 
 	return ret;
 }
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 082825e..b354abaf 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -1706,7 +1706,7 @@ static int ice_xdp_alloc_setup_rings(struct ice_vsi *vsi)
 		if (ice_setup_tx_ring(xdp_ring))
 			goto free_xdp_rings;
 		ice_set_ring_xdp(xdp_ring);
-		xdp_ring->xsk_umem = ice_xsk_umem(xdp_ring);
+		xdp_ring->xsk_pool = ice_xsk_pool(xdp_ring);
 	}
 
 	return 0;
@@ -1950,13 +1950,13 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct bpf_prog *prog,
 	if (if_running)
 		ret = ice_up(vsi);
 
-	if (!ret && prog && vsi->xsk_umems) {
+	if (!ret && prog && vsi->xsk_pools) {
 		int i;
 
 		ice_for_each_rxq(vsi, i) {
 			struct ice_ring *rx_ring = vsi->rx_rings[i];
 
-			if (rx_ring->xsk_umem)
+			if (rx_ring->xsk_pool)
 				napi_schedule(&rx_ring->q_vector->napi);
 		}
 	}
@@ -1985,8 +1985,8 @@ static int ice_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 	case XDP_QUERY_PROG:
 		xdp->prog_id = vsi->xdp_prog ? vsi->xdp_prog->aux->id : 0;
 		return 0;
-	case XDP_SETUP_XSK_UMEM:
-		return ice_xsk_umem_setup(vsi, xdp->xsk.umem,
+	case XDP_SETUP_XSK_POOL:
+		return ice_xsk_pool_setup(vsi, xdp->xsk.pool,
 					  xdp->xsk.queue_id);
 	default:
 		return -EINVAL;
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index abdb137c..241c1ea 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -145,7 +145,7 @@ void ice_clean_tx_ring(struct ice_ring *tx_ring)
 {
 	u16 i;
 
-	if (ice_ring_is_xdp(tx_ring) && tx_ring->xsk_umem) {
+	if (ice_ring_is_xdp(tx_ring) && tx_ring->xsk_pool) {
 		ice_xsk_clean_xdp_ring(tx_ring);
 		goto tx_skip_free;
 	}
@@ -375,7 +375,7 @@ void ice_clean_rx_ring(struct ice_ring *rx_ring)
 	if (!rx_ring->rx_buf)
 		return;
 
-	if (rx_ring->xsk_umem) {
+	if (rx_ring->xsk_pool) {
 		ice_xsk_clean_rx_ring(rx_ring);
 		goto rx_skip_free;
 	}
@@ -1619,7 +1619,7 @@ int ice_napi_poll(struct napi_struct *napi, int budget)
 	 * budget and be more aggressive about cleaning up the Tx descriptors.
 	 */
 	ice_for_each_ring(ring, q_vector->tx) {
-		bool wd = ring->xsk_umem ?
+		bool wd = ring->xsk_pool ?
 			  ice_clean_tx_irq_zc(ring, budget) :
 			  ice_clean_tx_irq(ring, budget);
 
@@ -1649,7 +1649,7 @@ int ice_napi_poll(struct napi_struct *napi, int budget)
 		 * comparison in the irq context instead of many inside the
 		 * ice_clean_rx_irq function and makes the codebase cleaner.
 		 */
-		cleaned = ring->xsk_umem ?
+		cleaned = ring->xsk_pool ?
 			  ice_clean_rx_irq_zc(ring, budget_per_ring) :
 			  ice_clean_rx_irq(ring, budget_per_ring);
 		work_done += cleaned;
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h
index e70c461..3b37360 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
@@ -295,7 +295,7 @@ struct ice_ring {
 
 	struct rcu_head rcu;		/* to avoid race on free */
 	struct bpf_prog *xdp_prog;
-	struct xdp_umem *xsk_umem;
+	struct xsk_buff_pool *xsk_pool;
 	/* CL3 - 3rd cacheline starts here */
 	struct xdp_rxq_info xdp_rxq;
 	/* CLX - the below items are only accessed infrequently and should be
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index b6f928c..f0ce669 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -234,7 +234,7 @@ static int ice_qp_ena(struct ice_vsi *vsi, u16 q_idx)
 		if (err)
 			goto free_buf;
 		ice_set_ring_xdp(xdp_ring);
-		xdp_ring->xsk_umem = ice_xsk_umem(xdp_ring);
+		xdp_ring->xsk_pool = ice_xsk_pool(xdp_ring);
 	}
 
 	err = ice_setup_rx_ctx(rx_ring);
@@ -258,21 +258,21 @@ static int ice_qp_ena(struct ice_vsi *vsi, u16 q_idx)
 }
 
 /**
- * ice_xsk_alloc_umems - allocate a UMEM region for an XDP socket
- * @vsi: VSI to allocate the UMEM on
+ * ice_xsk_alloc_pools - allocate a buffer pool for an XDP socket
+ * @vsi: VSI to allocate the buffer pool on
  *
  * Returns 0 on success, negative on error
  */
-static int ice_xsk_alloc_umems(struct ice_vsi *vsi)
+static int ice_xsk_alloc_pools(struct ice_vsi *vsi)
 {
-	if (vsi->xsk_umems)
+	if (vsi->xsk_pools)
 		return 0;
 
-	vsi->xsk_umems = kcalloc(vsi->num_xsk_umems, sizeof(*vsi->xsk_umems),
+	vsi->xsk_pools = kcalloc(vsi->num_xsk_pools, sizeof(*vsi->xsk_pools),
 				 GFP_KERNEL);
 
-	if (!vsi->xsk_umems) {
-		vsi->num_xsk_umems = 0;
+	if (!vsi->xsk_pools) {
+		vsi->num_xsk_pools = 0;
 		return -ENOMEM;
 	}
 
@@ -280,74 +280,74 @@ static int ice_xsk_alloc_umems(struct ice_vsi *vsi)
 }
 
 /**
- * ice_xsk_remove_umem - Remove an UMEM for a certain ring/qid
+ * ice_xsk_remove_pool - Remove a buffer pool for a certain ring/qid
  * @vsi: VSI from which the VSI will be removed
- * @qid: Ring/qid associated with the UMEM
+ * @qid: Ring/qid associated with the buffer pool
  */
-static void ice_xsk_remove_umem(struct ice_vsi *vsi, u16 qid)
+static void ice_xsk_remove_pool(struct ice_vsi *vsi, u16 qid)
 {
-	vsi->xsk_umems[qid] = NULL;
-	vsi->num_xsk_umems_used--;
+	vsi->xsk_pools[qid] = NULL;
+	vsi->num_xsk_pools_used--;
 
-	if (vsi->num_xsk_umems_used == 0) {
-		kfree(vsi->xsk_umems);
-		vsi->xsk_umems = NULL;
-		vsi->num_xsk_umems = 0;
+	if (vsi->num_xsk_pools_used == 0) {
+		kfree(vsi->xsk_pools);
+		vsi->xsk_pools = NULL;
+		vsi->num_xsk_pools = 0;
 	}
 }
 
 
 /**
- * ice_xsk_umem_disable - disable a UMEM region
+ * ice_xsk_pool_disable - disable a buffer pool region
  * @vsi: Current VSI
  * @qid: queue ID
  *
  * Returns 0 on success, negative on failure
  */
-static int ice_xsk_umem_disable(struct ice_vsi *vsi, u16 qid)
+static int ice_xsk_pool_disable(struct ice_vsi *vsi, u16 qid)
 {
-	if (!vsi->xsk_umems || qid >= vsi->num_xsk_umems ||
-	    !vsi->xsk_umems[qid])
+	if (!vsi->xsk_pools || qid >= vsi->num_xsk_pools ||
+	    !vsi->xsk_pools[qid])
 		return -EINVAL;
 
-	xsk_buff_dma_unmap(vsi->xsk_umems[qid], ICE_RX_DMA_ATTR);
-	ice_xsk_remove_umem(vsi, qid);
+	xsk_buff_dma_unmap(vsi->xsk_pools[qid]->umem, ICE_RX_DMA_ATTR);
+	ice_xsk_remove_pool(vsi, qid);
 
 	return 0;
 }
 
 /**
- * ice_xsk_umem_enable - enable a UMEM region
+ * ice_xsk_pool_enable - enable a buffer pool region
  * @vsi: Current VSI
- * @umem: pointer to a requested UMEM region
+ * @pool: pointer to a requested buffer pool region
  * @qid: queue ID
  *
  * Returns 0 on success, negative on failure
  */
 static int
-ice_xsk_umem_enable(struct ice_vsi *vsi, struct xdp_umem *umem, u16 qid)
+ice_xsk_pool_enable(struct ice_vsi *vsi, struct xsk_buff_pool *pool, u16 qid)
 {
 	int err;
 
 	if (vsi->type != ICE_VSI_PF)
 		return -EINVAL;
 
-	if (!vsi->num_xsk_umems)
-		vsi->num_xsk_umems = min_t(u16, vsi->num_rxq, vsi->num_txq);
-	if (qid >= vsi->num_xsk_umems)
+	if (!vsi->num_xsk_pools)
+		vsi->num_xsk_pools = min_t(u16, vsi->num_rxq, vsi->num_txq);
+	if (qid >= vsi->num_xsk_pools)
 		return -EINVAL;
 
-	err = ice_xsk_alloc_umems(vsi);
+	err = ice_xsk_alloc_pools(vsi);
 	if (err)
 		return err;
 
-	if (vsi->xsk_umems && vsi->xsk_umems[qid])
+	if (vsi->xsk_pools && vsi->xsk_pools[qid])
 		return -EBUSY;
 
-	vsi->xsk_umems[qid] = umem;
-	vsi->num_xsk_umems_used++;
+	vsi->xsk_pools[qid] = pool;
+	vsi->num_xsk_pools_used++;
 
-	err = xsk_buff_dma_map(vsi->xsk_umems[qid], ice_pf_to_dev(vsi->back),
+	err = xsk_buff_dma_map(vsi->xsk_pools[qid]->umem, ice_pf_to_dev(vsi->back),
 			       ICE_RX_DMA_ATTR);
 	if (err)
 		return err;
@@ -356,17 +356,17 @@ ice_xsk_umem_enable(struct ice_vsi *vsi, struct xdp_umem *umem, u16 qid)
 }
 
 /**
- * ice_xsk_umem_setup - enable/disable a UMEM region depending on its state
+ * ice_xsk_pool_setup - enable/disable a buffer pool region depending on its state
  * @vsi: Current VSI
- * @umem: UMEM to enable/associate to a ring, NULL to disable
+ * @pool: buffer pool to enable/associate to a ring, NULL to disable
  * @qid: queue ID
  *
  * Returns 0 on success, negative on failure
  */
-int ice_xsk_umem_setup(struct ice_vsi *vsi, struct xdp_umem *umem, u16 qid)
+int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool, u16 qid)
 {
-	bool if_running, umem_present = !!umem;
-	int ret = 0, umem_failure = 0;
+	bool if_running, pool_present = !!pool;
+	int ret = 0, pool_failure = 0;
 
 	if_running = netif_running(vsi->netdev) && ice_is_xdp_ena_vsi(vsi);
 
@@ -374,26 +374,26 @@ int ice_xsk_umem_setup(struct ice_vsi *vsi, struct xdp_umem *umem, u16 qid)
 		ret = ice_qp_dis(vsi, qid);
 		if (ret) {
 			netdev_err(vsi->netdev, "ice_qp_dis error = %d\n", ret);
-			goto xsk_umem_if_up;
+			goto xsk_pool_if_up;
 		}
 	}
 
-	umem_failure = umem_present ? ice_xsk_umem_enable(vsi, umem, qid) :
-				      ice_xsk_umem_disable(vsi, qid);
+	pool_failure = pool_present ? ice_xsk_pool_enable(vsi, pool, qid) :
+				      ice_xsk_pool_disable(vsi, qid);
 
-xsk_umem_if_up:
+xsk_pool_if_up:
 	if (if_running) {
 		ret = ice_qp_ena(vsi, qid);
-		if (!ret && umem_present)
+		if (!ret && pool_present)
 			napi_schedule(&vsi->xdp_rings[qid]->q_vector->napi);
 		else if (ret)
 			netdev_err(vsi->netdev, "ice_qp_ena error = %d\n", ret);
 	}
 
-	if (umem_failure) {
-		netdev_err(vsi->netdev, "Could not %sable UMEM, error = %d\n",
-			   umem_present ? "en" : "dis", umem_failure);
-		return umem_failure;
+	if (pool_failure) {
+		netdev_err(vsi->netdev, "Could not %sable buffer pool, error = %d\n",
+			   pool_present ? "en" : "dis", pool_failure);
+		return pool_failure;
 	}
 
 	return ret;
@@ -424,7 +424,7 @@ bool ice_alloc_rx_bufs_zc(struct ice_ring *rx_ring, u16 count)
 	rx_buf = &rx_ring->rx_buf[ntu];
 
 	do {
-		rx_buf->xdp = xsk_buff_alloc(rx_ring->xsk_umem);
+		rx_buf->xdp = xsk_buff_alloc(rx_ring->xsk_pool->umem);
 		if (!rx_buf->xdp) {
 			ret = true;
 			break;
@@ -645,11 +645,11 @@ int ice_clean_rx_irq_zc(struct ice_ring *rx_ring, int budget)
 	ice_finalize_xdp_rx(rx_ring, xdp_xmit);
 	ice_update_rx_ring_stats(rx_ring, total_rx_packets, total_rx_bytes);
 
-	if (xsk_umem_uses_need_wakeup(rx_ring->xsk_umem)) {
+	if (xsk_umem_uses_need_wakeup(rx_ring->xsk_pool->umem)) {
 		if (failure || rx_ring->next_to_clean == rx_ring->next_to_use)
-			xsk_set_rx_need_wakeup(rx_ring->xsk_umem);
+			xsk_set_rx_need_wakeup(rx_ring->xsk_pool->umem);
 		else
-			xsk_clear_rx_need_wakeup(rx_ring->xsk_umem);
+			xsk_clear_rx_need_wakeup(rx_ring->xsk_pool->umem);
 
 		return (int)total_rx_packets;
 	}
@@ -682,11 +682,11 @@ static bool ice_xmit_zc(struct ice_ring *xdp_ring, int budget)
 
 		tx_buf = &xdp_ring->tx_buf[xdp_ring->next_to_use];
 
-		if (!xsk_umem_consume_tx(xdp_ring->xsk_umem, &desc))
+		if (!xsk_umem_consume_tx(xdp_ring->xsk_pool->umem, &desc))
 			break;
 
-		dma = xsk_buff_raw_get_dma(xdp_ring->xsk_umem, desc.addr);
-		xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_umem, dma,
+		dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool->umem, desc.addr);
+		xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool->umem, dma,
 						 desc.len);
 
 		tx_buf->bytecount = desc.len;
@@ -703,9 +703,9 @@ static bool ice_xmit_zc(struct ice_ring *xdp_ring, int budget)
 
 	if (tx_desc) {
 		ice_xdp_ring_update_tail(xdp_ring);
-		xsk_umem_consume_tx_done(xdp_ring->xsk_umem);
-		if (xsk_umem_uses_need_wakeup(xdp_ring->xsk_umem))
-			xsk_clear_tx_need_wakeup(xdp_ring->xsk_umem);
+		xsk_umem_consume_tx_done(xdp_ring->xsk_pool->umem);
+		if (xsk_umem_uses_need_wakeup(xdp_ring->xsk_pool->umem))
+			xsk_clear_tx_need_wakeup(xdp_ring->xsk_pool->umem);
 	}
 
 	return budget > 0 && work_done;
@@ -779,13 +779,13 @@ bool ice_clean_tx_irq_zc(struct ice_ring *xdp_ring, int budget)
 	xdp_ring->next_to_clean = ntc;
 
 	if (xsk_frames)
-		xsk_umem_complete_tx(xdp_ring->xsk_umem, xsk_frames);
+		xsk_umem_complete_tx(xdp_ring->xsk_pool->umem, xsk_frames);
 
-	if (xsk_umem_uses_need_wakeup(xdp_ring->xsk_umem)) {
+	if (xsk_umem_uses_need_wakeup(xdp_ring->xsk_pool->umem)) {
 		if (xdp_ring->next_to_clean == xdp_ring->next_to_use)
-			xsk_set_tx_need_wakeup(xdp_ring->xsk_umem);
+			xsk_set_tx_need_wakeup(xdp_ring->xsk_pool->umem);
 		else
-			xsk_clear_tx_need_wakeup(xdp_ring->xsk_umem);
+			xsk_clear_tx_need_wakeup(xdp_ring->xsk_pool->umem);
 	}
 
 	ice_update_tx_ring_stats(xdp_ring, total_packets, total_bytes);
@@ -820,7 +820,7 @@ ice_xsk_wakeup(struct net_device *netdev, u32 queue_id,
 	if (queue_id >= vsi->num_txq)
 		return -ENXIO;
 
-	if (!vsi->xdp_rings[queue_id]->xsk_umem)
+	if (!vsi->xdp_rings[queue_id]->xsk_pool)
 		return -ENXIO;
 
 	ring = vsi->xdp_rings[queue_id];
@@ -839,20 +839,20 @@ ice_xsk_wakeup(struct net_device *netdev, u32 queue_id,
 }
 
 /**
- * ice_xsk_any_rx_ring_ena - Checks if Rx rings have AF_XDP UMEM attached
+ * ice_xsk_any_rx_ring_ena - Checks if Rx rings have AF_XDP buff pool attached
  * @vsi: VSI to be checked
  *
- * Returns true if any of the Rx rings has an AF_XDP UMEM attached
+ * Returns true if any of the Rx rings has an AF_XDP buff pool attached
  */
 bool ice_xsk_any_rx_ring_ena(struct ice_vsi *vsi)
 {
 	int i;
 
-	if (!vsi->xsk_umems)
+	if (!vsi->xsk_pools)
 		return false;
 
-	for (i = 0; i < vsi->num_xsk_umems; i++) {
-		if (vsi->xsk_umems[i])
+	for (i = 0; i < vsi->num_xsk_pools; i++) {
+		if (vsi->xsk_pools[i])
 			return true;
 	}
 
@@ -860,7 +860,7 @@ bool ice_xsk_any_rx_ring_ena(struct ice_vsi *vsi)
 }
 
 /**
- * ice_xsk_clean_rx_ring - clean UMEM queues connected to a given Rx ring
+ * ice_xsk_clean_rx_ring - clean buffer pool queues connected to a given Rx ring
  * @rx_ring: ring to be cleaned
  */
 void ice_xsk_clean_rx_ring(struct ice_ring *rx_ring)
@@ -878,7 +878,7 @@ void ice_xsk_clean_rx_ring(struct ice_ring *rx_ring)
 }
 
 /**
- * ice_xsk_clean_xdp_ring - Clean the XDP Tx ring and its UMEM queues
+ * ice_xsk_clean_xdp_ring - Clean the XDP Tx ring and its buffer pool queues
  * @xdp_ring: XDP_Tx ring
  */
 void ice_xsk_clean_xdp_ring(struct ice_ring *xdp_ring)
@@ -902,5 +902,5 @@ void ice_xsk_clean_xdp_ring(struct ice_ring *xdp_ring)
 	}
 
 	if (xsk_frames)
-		xsk_umem_complete_tx(xdp_ring->xsk_umem, xsk_frames);
+		xsk_umem_complete_tx(xdp_ring->xsk_pool->umem, xsk_frames);
 }
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.h b/drivers/net/ethernet/intel/ice/ice_xsk.h
index fc1a06b..fad7836 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.h
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.h
@@ -9,7 +9,8 @@
 struct ice_vsi;
 
 #ifdef CONFIG_XDP_SOCKETS
-int ice_xsk_umem_setup(struct ice_vsi *vsi, struct xdp_umem *umem, u16 qid);
+int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool,
+		       u16 qid);
 int ice_clean_rx_irq_zc(struct ice_ring *rx_ring, int budget);
 bool ice_clean_tx_irq_zc(struct ice_ring *xdp_ring, int budget);
 int ice_xsk_wakeup(struct net_device *netdev, u32 queue_id, u32 flags);
@@ -19,8 +20,8 @@ void ice_xsk_clean_rx_ring(struct ice_ring *rx_ring);
 void ice_xsk_clean_xdp_ring(struct ice_ring *xdp_ring);
 #else
 static inline int
-ice_xsk_umem_setup(struct ice_vsi __always_unused *vsi,
-		   struct xdp_umem __always_unused *umem,
+ice_xsk_pool_setup(struct ice_vsi __always_unused *vsi,
+		   struct xsk_buff_pool __always_unused *pool,
 		   u16 __always_unused qid)
 {
 	return -EOPNOTSUPP;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 5ddfc83..bd0f65e 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -350,7 +350,7 @@ struct ixgbe_ring {
 		struct ixgbe_rx_queue_stats rx_stats;
 	};
 	struct xdp_rxq_info xdp_rxq;
-	struct xdp_umem *xsk_umem;
+	struct xsk_buff_pool *xsk_pool;
 	u16 ring_idx;		/* {rx,tx,xdp}_ring back reference idx */
 	u16 rx_buf_len;
 } ____cacheline_internodealigned_in_smp;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index f162b8b..3217000 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3158,7 +3158,7 @@ int ixgbe_poll(struct napi_struct *napi, int budget)
 #endif
 
 	ixgbe_for_each_ring(ring, q_vector->tx) {
-		bool wd = ring->xsk_umem ?
+		bool wd = ring->xsk_pool ?
 			  ixgbe_clean_xdp_tx_irq(q_vector, ring, budget) :
 			  ixgbe_clean_tx_irq(q_vector, ring, budget);
 
@@ -3178,7 +3178,7 @@ int ixgbe_poll(struct napi_struct *napi, int budget)
 		per_ring_budget = budget;
 
 	ixgbe_for_each_ring(ring, q_vector->rx) {
-		int cleaned = ring->xsk_umem ?
+		int cleaned = ring->xsk_pool ?
 			      ixgbe_clean_rx_irq_zc(q_vector, ring,
 						    per_ring_budget) :
 			      ixgbe_clean_rx_irq(q_vector, ring,
@@ -3473,9 +3473,9 @@ void ixgbe_configure_tx_ring(struct ixgbe_adapter *adapter,
 	u32 txdctl = IXGBE_TXDCTL_ENABLE;
 	u8 reg_idx = ring->reg_idx;
 
-	ring->xsk_umem = NULL;
+	ring->xsk_pool = NULL;
 	if (ring_is_xdp(ring))
-		ring->xsk_umem = ixgbe_xsk_umem(adapter, ring);
+		ring->xsk_pool = ixgbe_xsk_pool(adapter, ring);
 
 	/* disable queue to avoid issues while updating state */
 	IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(reg_idx), 0);
@@ -3715,8 +3715,8 @@ static void ixgbe_configure_srrctl(struct ixgbe_adapter *adapter,
 	srrctl = IXGBE_RX_HDR_SIZE << IXGBE_SRRCTL_BSIZEHDRSIZE_SHIFT;
 
 	/* configure the packet buffer length */
-	if (rx_ring->xsk_umem) {
-		u32 xsk_buf_len = xsk_umem_get_rx_frame_size(rx_ring->xsk_umem);
+	if (rx_ring->xsk_pool) {
+		u32 xsk_buf_len = xsk_umem_get_rx_frame_size(rx_ring->xsk_pool->umem);
 
 		/* If the MAC support setting RXDCTL.RLPML, the
 		 * SRRCTL[n].BSIZEPKT is set to PAGE_SIZE and
@@ -4061,12 +4061,12 @@ void ixgbe_configure_rx_ring(struct ixgbe_adapter *adapter,
 	u8 reg_idx = ring->reg_idx;
 
 	xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq);
-	ring->xsk_umem = ixgbe_xsk_umem(adapter, ring);
-	if (ring->xsk_umem) {
+	ring->xsk_pool = ixgbe_xsk_pool(adapter, ring);
+	if (ring->xsk_pool) {
 		WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
 						   MEM_TYPE_XSK_BUFF_POOL,
 						   NULL));
-		xsk_buff_set_rxq_info(ring->xsk_umem, &ring->xdp_rxq);
+		xsk_buff_set_rxq_info(ring->xsk_pool->umem, &ring->xdp_rxq);
 	} else {
 		WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
 						   MEM_TYPE_PAGE_SHARED, NULL));
@@ -4121,8 +4121,8 @@ void ixgbe_configure_rx_ring(struct ixgbe_adapter *adapter,
 #endif
 	}
 
-	if (ring->xsk_umem && hw->mac.type != ixgbe_mac_82599EB) {
-		u32 xsk_buf_len = xsk_umem_get_rx_frame_size(ring->xsk_umem);
+	if (ring->xsk_pool && hw->mac.type != ixgbe_mac_82599EB) {
+		u32 xsk_buf_len = xsk_umem_get_rx_frame_size(ring->xsk_pool->umem);
 
 		rxdctl &= ~(IXGBE_RXDCTL_RLPMLMASK |
 			    IXGBE_RXDCTL_RLPML_EN);
@@ -4144,7 +4144,7 @@ void ixgbe_configure_rx_ring(struct ixgbe_adapter *adapter,
 	IXGBE_WRITE_REG(hw, IXGBE_RXDCTL(reg_idx), rxdctl);
 
 	ixgbe_rx_desc_queue_enable(adapter, ring);
-	if (ring->xsk_umem)
+	if (ring->xsk_pool)
 		ixgbe_alloc_rx_buffers_zc(ring, ixgbe_desc_unused(ring));
 	else
 		ixgbe_alloc_rx_buffers(ring, ixgbe_desc_unused(ring));
@@ -5277,7 +5277,7 @@ static void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring)
 	u16 i = rx_ring->next_to_clean;
 	struct ixgbe_rx_buffer *rx_buffer = &rx_ring->rx_buffer_info[i];
 
-	if (rx_ring->xsk_umem) {
+	if (rx_ring->xsk_pool) {
 		ixgbe_xsk_clean_rx_ring(rx_ring);
 		goto skip_free;
 	}
@@ -5965,7 +5965,7 @@ static void ixgbe_clean_tx_ring(struct ixgbe_ring *tx_ring)
 	u16 i = tx_ring->next_to_clean;
 	struct ixgbe_tx_buffer *tx_buffer = &tx_ring->tx_buffer_info[i];
 
-	if (tx_ring->xsk_umem) {
+	if (tx_ring->xsk_pool) {
 		ixgbe_xsk_clean_tx_ring(tx_ring);
 		goto out;
 	}
@@ -10290,7 +10290,7 @@ static int ixgbe_xdp_setup(struct net_device *dev, struct bpf_prog *prog)
 	 */
 	if (need_reset && prog)
 		for (i = 0; i < adapter->num_rx_queues; i++)
-			if (adapter->xdp_ring[i]->xsk_umem)
+			if (adapter->xdp_ring[i]->xsk_pool)
 				(void)ixgbe_xsk_wakeup(adapter->netdev, i,
 						       XDP_WAKEUP_RX);
 
@@ -10308,8 +10308,8 @@ static int ixgbe_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 		xdp->prog_id = adapter->xdp_prog ?
 			adapter->xdp_prog->aux->id : 0;
 		return 0;
-	case XDP_SETUP_XSK_UMEM:
-		return ixgbe_xsk_umem_setup(adapter, xdp->xsk.umem,
+	case XDP_SETUP_XSK_POOL:
+		return ixgbe_xsk_pool_setup(adapter, xdp->xsk.pool,
 					    xdp->xsk.queue_id);
 
 	default:
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
index 7887ae4..2aeec78 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
@@ -28,9 +28,10 @@ void ixgbe_irq_rearm_queues(struct ixgbe_adapter *adapter, u64 qmask);
 void ixgbe_txrx_ring_disable(struct ixgbe_adapter *adapter, int ring);
 void ixgbe_txrx_ring_enable(struct ixgbe_adapter *adapter, int ring);
 
-struct xdp_umem *ixgbe_xsk_umem(struct ixgbe_adapter *adapter,
-				struct ixgbe_ring *ring);
-int ixgbe_xsk_umem_setup(struct ixgbe_adapter *adapter, struct xdp_umem *umem,
+struct xsk_buff_pool *ixgbe_xsk_pool(struct ixgbe_adapter *adapter,
+				     struct ixgbe_ring *ring);
+int ixgbe_xsk_pool_setup(struct ixgbe_adapter *adapter,
+			 struct xsk_buff_pool *pool,
 			 u16 qid);
 
 void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index be9d2a8..9f503d6 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -8,8 +8,8 @@
 #include "ixgbe.h"
 #include "ixgbe_txrx_common.h"
 
-struct xdp_umem *ixgbe_xsk_umem(struct ixgbe_adapter *adapter,
-				struct ixgbe_ring *ring)
+struct xsk_buff_pool *ixgbe_xsk_pool(struct ixgbe_adapter *adapter,
+				     struct ixgbe_ring *ring)
 {
 	bool xdp_on = READ_ONCE(adapter->xdp_prog);
 	int qid = ring->ring_idx;
@@ -17,11 +17,11 @@ struct xdp_umem *ixgbe_xsk_umem(struct ixgbe_adapter *adapter,
 	if (!xdp_on || !test_bit(qid, adapter->af_xdp_zc_qps))
 		return NULL;
 
-	return xdp_get_umem_from_qid(adapter->netdev, qid);
+	return xdp_get_xsk_pool_from_qid(adapter->netdev, qid);
 }
 
-static int ixgbe_xsk_umem_enable(struct ixgbe_adapter *adapter,
-				 struct xdp_umem *umem,
+static int ixgbe_xsk_pool_enable(struct ixgbe_adapter *adapter,
+				 struct xsk_buff_pool *pool,
 				 u16 qid)
 {
 	struct net_device *netdev = adapter->netdev;
@@ -35,7 +35,7 @@ static int ixgbe_xsk_umem_enable(struct ixgbe_adapter *adapter,
 	    qid >= netdev->real_num_tx_queues)
 		return -EINVAL;
 
-	err = xsk_buff_dma_map(umem, &adapter->pdev->dev, IXGBE_RX_DMA_ATTR);
+	err = xsk_buff_dma_map(pool->umem, &adapter->pdev->dev, IXGBE_RX_DMA_ATTR);
 	if (err)
 		return err;
 
@@ -59,13 +59,13 @@ static int ixgbe_xsk_umem_enable(struct ixgbe_adapter *adapter,
 	return 0;
 }
 
-static int ixgbe_xsk_umem_disable(struct ixgbe_adapter *adapter, u16 qid)
+static int ixgbe_xsk_pool_disable(struct ixgbe_adapter *adapter, u16 qid)
 {
-	struct xdp_umem *umem;
+	struct xsk_buff_pool *pool;
 	bool if_running;
 
-	umem = xdp_get_umem_from_qid(adapter->netdev, qid);
-	if (!umem)
+	pool = xdp_get_xsk_pool_from_qid(adapter->netdev, qid);
+	if (!pool)
 		return -EINVAL;
 
 	if_running = netif_running(adapter->netdev) &&
@@ -75,7 +75,7 @@ static int ixgbe_xsk_umem_disable(struct ixgbe_adapter *adapter, u16 qid)
 		ixgbe_txrx_ring_disable(adapter, qid);
 
 	clear_bit(qid, adapter->af_xdp_zc_qps);
-	xsk_buff_dma_unmap(umem, IXGBE_RX_DMA_ATTR);
+	xsk_buff_dma_unmap(pool->umem, IXGBE_RX_DMA_ATTR);
 
 	if (if_running)
 		ixgbe_txrx_ring_enable(adapter, qid);
@@ -83,11 +83,12 @@ static int ixgbe_xsk_umem_disable(struct ixgbe_adapter *adapter, u16 qid)
 	return 0;
 }
 
-int ixgbe_xsk_umem_setup(struct ixgbe_adapter *adapter, struct xdp_umem *umem,
+int ixgbe_xsk_pool_setup(struct ixgbe_adapter *adapter,
+			 struct xsk_buff_pool *pool,
 			 u16 qid)
 {
-	return umem ? ixgbe_xsk_umem_enable(adapter, umem, qid) :
-		ixgbe_xsk_umem_disable(adapter, qid);
+	return pool ? ixgbe_xsk_pool_enable(adapter, pool, qid) :
+		ixgbe_xsk_pool_disable(adapter, qid);
 }
 
 static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
@@ -149,7 +150,7 @@ bool ixgbe_alloc_rx_buffers_zc(struct ixgbe_ring *rx_ring, u16 count)
 	i -= rx_ring->count;
 
 	do {
-		bi->xdp = xsk_buff_alloc(rx_ring->xsk_umem);
+		bi->xdp = xsk_buff_alloc(rx_ring->xsk_pool->umem);
 		if (!bi->xdp) {
 			ok = false;
 			break;
@@ -344,11 +345,11 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
 	q_vector->rx.total_packets += total_rx_packets;
 	q_vector->rx.total_bytes += total_rx_bytes;
 
-	if (xsk_umem_uses_need_wakeup(rx_ring->xsk_umem)) {
+	if (xsk_umem_uses_need_wakeup(rx_ring->xsk_pool->umem)) {
 		if (failure || rx_ring->next_to_clean == rx_ring->next_to_use)
-			xsk_set_rx_need_wakeup(rx_ring->xsk_umem);
+			xsk_set_rx_need_wakeup(rx_ring->xsk_pool->umem);
 		else
-			xsk_clear_rx_need_wakeup(rx_ring->xsk_umem);
+			xsk_clear_rx_need_wakeup(rx_ring->xsk_pool->umem);
 
 		return (int)total_rx_packets;
 	}
@@ -373,6 +374,7 @@ void ixgbe_xsk_clean_rx_ring(struct ixgbe_ring *rx_ring)
 
 static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
 {
+	struct xsk_buff_pool *pool = xdp_ring->xsk_pool;
 	union ixgbe_adv_tx_desc *tx_desc = NULL;
 	struct ixgbe_tx_buffer *tx_bi;
 	bool work_done = true;
@@ -387,12 +389,11 @@ static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
 			break;
 		}
 
-		if (!xsk_umem_consume_tx(xdp_ring->xsk_umem, &desc))
+		if (!xsk_umem_consume_tx(pool->umem, &desc))
 			break;
 
-		dma = xsk_buff_raw_get_dma(xdp_ring->xsk_umem, desc.addr);
-		xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_umem, dma,
-						 desc.len);
+		dma = xsk_buff_raw_get_dma(pool->umem, desc.addr);
+		xsk_buff_raw_dma_sync_for_device(pool->umem, dma, desc.len);
 
 		tx_bi = &xdp_ring->tx_buffer_info[xdp_ring->next_to_use];
 		tx_bi->bytecount = desc.len;
@@ -418,7 +419,7 @@ static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
 
 	if (tx_desc) {
 		ixgbe_xdp_ring_update_tail(xdp_ring);
-		xsk_umem_consume_tx_done(xdp_ring->xsk_umem);
+		xsk_umem_consume_tx_done(pool->umem);
 	}
 
 	return !!budget && work_done;
@@ -439,7 +440,7 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
 {
 	u16 ntc = tx_ring->next_to_clean, ntu = tx_ring->next_to_use;
 	unsigned int total_packets = 0, total_bytes = 0;
-	struct xdp_umem *umem = tx_ring->xsk_umem;
+	struct xsk_buff_pool *pool = tx_ring->xsk_pool;
 	union ixgbe_adv_tx_desc *tx_desc;
 	struct ixgbe_tx_buffer *tx_bi;
 	u32 xsk_frames = 0;
@@ -484,10 +485,10 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
 	q_vector->tx.total_packets += total_packets;
 
 	if (xsk_frames)
-		xsk_umem_complete_tx(umem, xsk_frames);
+		xsk_umem_complete_tx(pool->umem, xsk_frames);
 
-	if (xsk_umem_uses_need_wakeup(tx_ring->xsk_umem))
-		xsk_set_tx_need_wakeup(tx_ring->xsk_umem);
+	if (xsk_umem_uses_need_wakeup(pool->umem))
+		xsk_set_tx_need_wakeup(pool->umem);
 
 	return ixgbe_xmit_zc(tx_ring, q_vector->tx.work_limit);
 }
@@ -511,7 +512,7 @@ int ixgbe_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
 	if (test_bit(__IXGBE_TX_DISABLED, &ring->state))
 		return -ENETDOWN;
 
-	if (!ring->xsk_umem)
+	if (!ring->xsk_pool)
 		return -ENXIO;
 
 	if (!napi_if_scheduled_mark_missed(&ring->q_vector->napi)) {
@@ -526,7 +527,7 @@ int ixgbe_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
 void ixgbe_xsk_clean_tx_ring(struct ixgbe_ring *tx_ring)
 {
 	u16 ntc = tx_ring->next_to_clean, ntu = tx_ring->next_to_use;
-	struct xdp_umem *umem = tx_ring->xsk_umem;
+	struct xsk_buff_pool *pool = tx_ring->xsk_pool;
 	struct ixgbe_tx_buffer *tx_bi;
 	u32 xsk_frames = 0;
 
@@ -546,5 +547,5 @@ void ixgbe_xsk_clean_tx_ring(struct ixgbe_ring *tx_ring)
 	}
 
 	if (xsk_frames)
-		xsk_umem_complete_tx(umem, xsk_frames);
+		xsk_umem_complete_tx(pool->umem, xsk_frames);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 842db20..516dfd3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -448,7 +448,7 @@ struct mlx5e_xdpsq {
 	struct mlx5e_cq            cq;
 
 	/* read only */
-	struct xdp_umem           *umem;
+	struct xsk_buff_pool      *pool;
 	struct mlx5_wq_cyc         wq;
 	struct mlx5e_xdpsq_stats  *stats;
 	mlx5e_fp_xmit_xdp_frame_check xmit_xdp_frame_check;
@@ -610,7 +610,7 @@ struct mlx5e_rq {
 	struct page_pool      *page_pool;
 
 	/* AF_XDP zero-copy */
-	struct xdp_umem       *umem;
+	struct xsk_buff_pool  *xsk_pool;
 
 	struct work_struct     recover_work;
 
@@ -731,12 +731,13 @@ struct mlx5e_hv_vhca_stats_agent {
 #endif
 
 struct mlx5e_xsk {
-	/* UMEMs are stored separately from channels, because we don't want to
-	 * lose them when channels are recreated. The kernel also stores UMEMs,
-	 * but it doesn't distinguish between zero-copy and non-zero-copy UMEMs,
-	 * so rely on our mechanism.
+	/* XSK buffer pools are stored separately from channels,
+	 * because we don't want to lose them when channels are
+	 * recreated. The kernel also stores buffer pools, but it doesn't
+	 * distinguish between zero-copy and non-zero-copy UMEMs, so
+	 * rely on our mechanism.
 	 */
-	struct xdp_umem **umems;
+	struct xsk_buff_pool **pools;
 	u16 refcnt;
 	bool ever_used;
 };
@@ -948,7 +949,7 @@ struct mlx5e_xsk_param;
 struct mlx5e_rq_param;
 int mlx5e_open_rq(struct mlx5e_channel *c, struct mlx5e_params *params,
 		  struct mlx5e_rq_param *param, struct mlx5e_xsk_param *xsk,
-		  struct xdp_umem *umem, struct mlx5e_rq *rq);
+		  struct xsk_buff_pool *pool, struct mlx5e_rq *rq);
 int mlx5e_wait_for_min_rx_wqes(struct mlx5e_rq *rq, int wait_time);
 void mlx5e_deactivate_rq(struct mlx5e_rq *rq);
 void mlx5e_close_rq(struct mlx5e_rq *rq);
@@ -958,7 +959,7 @@ int mlx5e_open_icosq(struct mlx5e_channel *c, struct mlx5e_params *params,
 		     struct mlx5e_sq_param *param, struct mlx5e_icosq *sq);
 void mlx5e_close_icosq(struct mlx5e_icosq *sq);
 int mlx5e_open_xdpsq(struct mlx5e_channel *c, struct mlx5e_params *params,
-		     struct mlx5e_sq_param *param, struct xdp_umem *umem,
+		     struct mlx5e_sq_param *param, struct xsk_buff_pool *pool,
 		     struct mlx5e_xdpsq *sq, bool is_redirect);
 void mlx5e_close_xdpsq(struct mlx5e_xdpsq *sq);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index c9d308e..0a5a873 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -446,7 +446,7 @@ bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq *cq)
 	} while ((++i < MLX5E_TX_CQ_POLL_BUDGET) && (cqe = mlx5_cqwq_get_cqe(&cq->wq)));
 
 	if (xsk_frames)
-		xsk_umem_complete_tx(sq->umem, xsk_frames);
+		xsk_umem_complete_tx(sq->pool->umem, xsk_frames);
 
 	sq->stats->cqes += i;
 
@@ -476,7 +476,7 @@ void mlx5e_free_xdpsq_descs(struct mlx5e_xdpsq *sq)
 	}
 
 	if (xsk_frames)
-		xsk_umem_complete_tx(sq->umem, xsk_frames);
+		xsk_umem_complete_tx(sq->pool->umem, xsk_frames);
 }
 
 int mlx5e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
@@ -561,4 +561,3 @@ void mlx5e_set_xmit_fp(struct mlx5e_xdpsq *sq, bool is_mpw)
 	sq->xmit_xdp_frame = is_mpw ?
 		mlx5e_xmit_xdp_frame_mpwqe : mlx5e_xmit_xdp_frame;
 }
-
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
index d147b2f..3dd056a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
@@ -19,10 +19,10 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq,
 					      struct mlx5e_wqe_frag_info *wi,
 					      u32 cqe_bcnt);
 
-static inline int mlx5e_xsk_page_alloc_umem(struct mlx5e_rq *rq,
+static inline int mlx5e_xsk_page_alloc_pool(struct mlx5e_rq *rq,
 					    struct mlx5e_dma_info *dma_info)
 {
-	dma_info->xsk = xsk_buff_alloc(rq->umem);
+	dma_info->xsk = xsk_buff_alloc(rq->xsk_pool->umem);
 	if (!dma_info->xsk)
 		return -ENOMEM;
 
@@ -38,13 +38,13 @@ static inline int mlx5e_xsk_page_alloc_umem(struct mlx5e_rq *rq,
 
 static inline bool mlx5e_xsk_update_rx_wakeup(struct mlx5e_rq *rq, bool alloc_err)
 {
-	if (!xsk_umem_uses_need_wakeup(rq->umem))
+	if (!xsk_umem_uses_need_wakeup(rq->xsk_pool->umem))
 		return alloc_err;
 
 	if (unlikely(alloc_err))
-		xsk_set_rx_need_wakeup(rq->umem);
+		xsk_set_rx_need_wakeup(rq->xsk_pool->umem);
 	else
-		xsk_clear_rx_need_wakeup(rq->umem);
+		xsk_clear_rx_need_wakeup(rq->xsk_pool->umem);
 
 	return false;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
index 2c80205..f32a381 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
@@ -62,7 +62,7 @@ static void mlx5e_build_xsk_cparam(struct mlx5e_priv *priv,
 }
 
 int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params,
-		   struct mlx5e_xsk_param *xsk, struct xdp_umem *umem,
+		   struct mlx5e_xsk_param *xsk, struct xsk_buff_pool *pool,
 		   struct mlx5e_channel *c)
 {
 	struct mlx5e_channel_param *cparam;
@@ -82,7 +82,7 @@ int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params,
 	if (unlikely(err))
 		goto err_free_cparam;
 
-	err = mlx5e_open_rq(c, params, &cparam->rq, xsk, umem, &c->xskrq);
+	err = mlx5e_open_rq(c, params, &cparam->rq, xsk, pool, &c->xskrq);
 	if (unlikely(err))
 		goto err_close_rx_cq;
 
@@ -90,13 +90,13 @@ int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params,
 	if (unlikely(err))
 		goto err_close_rq;
 
-	/* Create a separate SQ, so that when the UMEM is disabled, we could
+	/* Create a separate SQ, so that when the buff pool is disabled, we could
 	 * close this SQ safely and stop receiving CQEs. In other case, e.g., if
-	 * the XDPSQ was used instead, we might run into trouble when the UMEM
+	 * the XDPSQ was used instead, we might run into trouble when the buff pool
 	 * is disabled and then reenabled, but the SQ continues receiving CQEs
-	 * from the old UMEM.
+	 * from the old buff pool.
 	 */
-	err = mlx5e_open_xdpsq(c, params, &cparam->xdp_sq, umem, &c->xsksq, true);
+	err = mlx5e_open_xdpsq(c, params, &cparam->xdp_sq, pool, &c->xsksq, true);
 	if (unlikely(err))
 		goto err_close_tx_cq;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.h
index 0dd11b8..ca20f1f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.h
@@ -12,7 +12,7 @@ bool mlx5e_validate_xsk_param(struct mlx5e_params *params,
 			      struct mlx5e_xsk_param *xsk,
 			      struct mlx5_core_dev *mdev);
 int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params,
-		   struct mlx5e_xsk_param *xsk, struct xdp_umem *umem,
+		   struct mlx5e_xsk_param *xsk, struct xsk_buff_pool *pool,
 		   struct mlx5e_channel *c);
 void mlx5e_close_xsk(struct mlx5e_channel *c);
 void mlx5e_activate_xsk(struct mlx5e_channel *c);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
index 83dce9c..abe4639 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
@@ -66,7 +66,7 @@ static void mlx5e_xsk_tx_post_err(struct mlx5e_xdpsq *sq,
 
 bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget)
 {
-	struct xdp_umem *umem = sq->umem;
+	struct xsk_buff_pool *pool = sq->pool;
 	struct mlx5e_xdp_info xdpi;
 	struct mlx5e_xdp_xmit_data xdptxd;
 	bool work_done = true;
@@ -83,7 +83,7 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget)
 			break;
 		}
 
-		if (!xsk_umem_consume_tx(umem, &desc)) {
+		if (!xsk_umem_consume_tx(pool->umem, &desc)) {
 			/* TX will get stuck until something wakes it up by
 			 * triggering NAPI. Currently it's expected that the
 			 * application calls sendto() if there are consumed, but
@@ -92,11 +92,11 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget)
 			break;
 		}
 
-		xdptxd.dma_addr = xsk_buff_raw_get_dma(umem, desc.addr);
-		xdptxd.data = xsk_buff_raw_get_data(umem, desc.addr);
+		xdptxd.dma_addr = xsk_buff_raw_get_dma(pool->umem, desc.addr);
+		xdptxd.data = xsk_buff_raw_get_data(pool->umem, desc.addr);
 		xdptxd.len = desc.len;
 
-		xsk_buff_raw_dma_sync_for_device(umem, xdptxd.dma_addr, xdptxd.len);
+		xsk_buff_raw_dma_sync_for_device(pool->umem, xdptxd.dma_addr, xdptxd.len);
 
 		if (unlikely(!sq->xmit_xdp_frame(sq, &xdptxd, &xdpi, check_result))) {
 			if (sq->mpwqe.wqe)
@@ -113,7 +113,7 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget)
 			mlx5e_xdp_mpwqe_complete(sq);
 		mlx5e_xmit_xdp_doorbell(sq);
 
-		xsk_umem_consume_tx_done(umem);
+		xsk_umem_consume_tx_done(pool->umem);
 	}
 
 	return !(budget && work_done);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h
index 39fa0a7..610a084 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h
@@ -15,13 +15,13 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget);
 
 static inline void mlx5e_xsk_update_tx_wakeup(struct mlx5e_xdpsq *sq)
 {
-	if (!xsk_umem_uses_need_wakeup(sq->umem))
+	if (!xsk_umem_uses_need_wakeup(sq->pool->umem))
 		return;
 
 	if (sq->pc != sq->cc)
-		xsk_clear_tx_need_wakeup(sq->umem);
+		xsk_clear_tx_need_wakeup(sq->pool->umem);
 	else
-		xsk_set_tx_need_wakeup(sq->umem);
+		xsk_set_tx_need_wakeup(sq->pool->umem);
 }
 
 #endif /* __MLX5_EN_XSK_TX_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.c
index 7b17fcd..947abf1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.c
@@ -6,26 +6,26 @@
 #include "setup.h"
 #include "en/params.h"
 
-static int mlx5e_xsk_map_umem(struct mlx5e_priv *priv,
-			      struct xdp_umem *umem)
+static int mlx5e_xsk_map_pool(struct mlx5e_priv *priv,
+			      struct xsk_buff_pool *pool)
 {
 	struct device *dev = priv->mdev->device;
 
-	return xsk_buff_dma_map(umem, dev, 0);
+	return xsk_buff_dma_map(pool->umem, dev, 0);
 }
 
-static void mlx5e_xsk_unmap_umem(struct mlx5e_priv *priv,
-				 struct xdp_umem *umem)
+static void mlx5e_xsk_unmap_pool(struct mlx5e_priv *priv,
+				 struct xsk_buff_pool *pool)
 {
-	return xsk_buff_dma_unmap(umem, 0);
+	return xsk_buff_dma_unmap(pool->umem, 0);
 }
 
-static int mlx5e_xsk_get_umems(struct mlx5e_xsk *xsk)
+static int mlx5e_xsk_get_pools(struct mlx5e_xsk *xsk)
 {
-	if (!xsk->umems) {
-		xsk->umems = kcalloc(MLX5E_MAX_NUM_CHANNELS,
-				     sizeof(*xsk->umems), GFP_KERNEL);
-		if (unlikely(!xsk->umems))
+	if (!xsk->pools) {
+		xsk->pools = kcalloc(MLX5E_MAX_NUM_CHANNELS,
+				     sizeof(*xsk->pools), GFP_KERNEL);
+		if (unlikely(!xsk->pools))
 			return -ENOMEM;
 	}
 
@@ -35,68 +35,68 @@ static int mlx5e_xsk_get_umems(struct mlx5e_xsk *xsk)
 	return 0;
 }
 
-static void mlx5e_xsk_put_umems(struct mlx5e_xsk *xsk)
+static void mlx5e_xsk_put_pools(struct mlx5e_xsk *xsk)
 {
 	if (!--xsk->refcnt) {
-		kfree(xsk->umems);
-		xsk->umems = NULL;
+		kfree(xsk->pools);
+		xsk->pools = NULL;
 	}
 }
 
-static int mlx5e_xsk_add_umem(struct mlx5e_xsk *xsk, struct xdp_umem *umem, u16 ix)
+static int mlx5e_xsk_add_pool(struct mlx5e_xsk *xsk, struct xsk_buff_pool *pool, u16 ix)
 {
 	int err;
 
-	err = mlx5e_xsk_get_umems(xsk);
+	err = mlx5e_xsk_get_pools(xsk);
 	if (unlikely(err))
 		return err;
 
-	xsk->umems[ix] = umem;
+	xsk->pools[ix] = pool;
 	return 0;
 }
 
-static void mlx5e_xsk_remove_umem(struct mlx5e_xsk *xsk, u16 ix)
+static void mlx5e_xsk_remove_pool(struct mlx5e_xsk *xsk, u16 ix)
 {
-	xsk->umems[ix] = NULL;
+	xsk->pools[ix] = NULL;
 
-	mlx5e_xsk_put_umems(xsk);
+	mlx5e_xsk_put_pools(xsk);
 }
 
-static bool mlx5e_xsk_is_umem_sane(struct xdp_umem *umem)
+static bool mlx5e_xsk_is_pool_sane(struct xsk_buff_pool *pool)
 {
-	return xsk_umem_get_headroom(umem) <= 0xffff &&
-		xsk_umem_get_chunk_size(umem) <= 0xffff;
+	return xsk_umem_get_headroom(pool->umem) <= 0xffff &&
+		xsk_umem_get_chunk_size(pool->umem) <= 0xffff;
 }
 
-void mlx5e_build_xsk_param(struct xdp_umem *umem, struct mlx5e_xsk_param *xsk)
+void mlx5e_build_xsk_param(struct xsk_buff_pool *pool, struct mlx5e_xsk_param *xsk)
 {
-	xsk->headroom = xsk_umem_get_headroom(umem);
-	xsk->chunk_size = xsk_umem_get_chunk_size(umem);
+	xsk->headroom = xsk_umem_get_headroom(pool->umem);
+	xsk->chunk_size = xsk_umem_get_chunk_size(pool->umem);
 }
 
 static int mlx5e_xsk_enable_locked(struct mlx5e_priv *priv,
-				   struct xdp_umem *umem, u16 ix)
+				   struct xsk_buff_pool *pool, u16 ix)
 {
 	struct mlx5e_params *params = &priv->channels.params;
 	struct mlx5e_xsk_param xsk;
 	struct mlx5e_channel *c;
 	int err;
 
-	if (unlikely(mlx5e_xsk_get_umem(&priv->channels.params, &priv->xsk, ix)))
+	if (unlikely(mlx5e_xsk_get_pool(&priv->channels.params, &priv->xsk, ix)))
 		return -EBUSY;
 
-	if (unlikely(!mlx5e_xsk_is_umem_sane(umem)))
+	if (unlikely(!mlx5e_xsk_is_pool_sane(pool)))
 		return -EINVAL;
 
-	err = mlx5e_xsk_map_umem(priv, umem);
+	err = mlx5e_xsk_map_pool(priv, pool);
 	if (unlikely(err))
 		return err;
 
-	err = mlx5e_xsk_add_umem(&priv->xsk, umem, ix);
+	err = mlx5e_xsk_add_pool(&priv->xsk, pool, ix);
 	if (unlikely(err))
-		goto err_unmap_umem;
+		goto err_unmap_pool;
 
-	mlx5e_build_xsk_param(umem, &xsk);
+	mlx5e_build_xsk_param(pool, &xsk);
 
 	if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) {
 		/* XSK objects will be created on open. */
@@ -112,9 +112,9 @@ static int mlx5e_xsk_enable_locked(struct mlx5e_priv *priv,
 
 	c = priv->channels.c[ix];
 
-	err = mlx5e_open_xsk(priv, params, &xsk, umem, c);
+	err = mlx5e_open_xsk(priv, params, &xsk, pool, c);
 	if (unlikely(err))
-		goto err_remove_umem;
+		goto err_remove_pool;
 
 	mlx5e_activate_xsk(c);
 
@@ -132,11 +132,11 @@ static int mlx5e_xsk_enable_locked(struct mlx5e_priv *priv,
 	mlx5e_deactivate_xsk(c);
 	mlx5e_close_xsk(c);
 
-err_remove_umem:
-	mlx5e_xsk_remove_umem(&priv->xsk, ix);
+err_remove_pool:
+	mlx5e_xsk_remove_pool(&priv->xsk, ix);
 
-err_unmap_umem:
-	mlx5e_xsk_unmap_umem(priv, umem);
+err_unmap_pool:
+	mlx5e_xsk_unmap_pool(priv, pool);
 
 	return err;
 
@@ -146,7 +146,7 @@ static int mlx5e_xsk_enable_locked(struct mlx5e_priv *priv,
 	 */
 	if (!mlx5e_validate_xsk_param(params, &xsk, priv->mdev)) {
 		err = -EINVAL;
-		goto err_remove_umem;
+		goto err_remove_pool;
 	}
 
 	return 0;
@@ -154,45 +154,45 @@ static int mlx5e_xsk_enable_locked(struct mlx5e_priv *priv,
 
 static int mlx5e_xsk_disable_locked(struct mlx5e_priv *priv, u16 ix)
 {
-	struct xdp_umem *umem = mlx5e_xsk_get_umem(&priv->channels.params,
+	struct xsk_buff_pool *pool = mlx5e_xsk_get_pool(&priv->channels.params,
 						   &priv->xsk, ix);
 	struct mlx5e_channel *c;
 
-	if (unlikely(!umem))
+	if (unlikely(!pool))
 		return -EINVAL;
 
 	if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
-		goto remove_umem;
+		goto remove_pool;
 
 	/* XSK RQ and SQ are only created if XDP program is set. */
 	if (!priv->channels.params.xdp_prog)
-		goto remove_umem;
+		goto remove_pool;
 
 	c = priv->channels.c[ix];
 	mlx5e_xsk_redirect_rqt_to_drop(priv, ix);
 	mlx5e_deactivate_xsk(c);
 	mlx5e_close_xsk(c);
 
-remove_umem:
-	mlx5e_xsk_remove_umem(&priv->xsk, ix);
-	mlx5e_xsk_unmap_umem(priv, umem);
+remove_pool:
+	mlx5e_xsk_remove_pool(&priv->xsk, ix);
+	mlx5e_xsk_unmap_pool(priv, pool);
 
 	return 0;
 }
 
-static int mlx5e_xsk_enable_umem(struct mlx5e_priv *priv, struct xdp_umem *umem,
+static int mlx5e_xsk_enable_pool(struct mlx5e_priv *priv, struct xsk_buff_pool *pool,
 				 u16 ix)
 {
 	int err;
 
 	mutex_lock(&priv->state_lock);
-	err = mlx5e_xsk_enable_locked(priv, umem, ix);
+	err = mlx5e_xsk_enable_locked(priv, pool, ix);
 	mutex_unlock(&priv->state_lock);
 
 	return err;
 }
 
-static int mlx5e_xsk_disable_umem(struct mlx5e_priv *priv, u16 ix)
+static int mlx5e_xsk_disable_pool(struct mlx5e_priv *priv, u16 ix)
 {
 	int err;
 
@@ -203,7 +203,7 @@ static int mlx5e_xsk_disable_umem(struct mlx5e_priv *priv, u16 ix)
 	return err;
 }
 
-int mlx5e_xsk_setup_umem(struct net_device *dev, struct xdp_umem *umem, u16 qid)
+int mlx5e_xsk_setup_pool(struct net_device *dev, struct xsk_buff_pool *pool, u16 qid)
 {
 	struct mlx5e_priv *priv = netdev_priv(dev);
 	struct mlx5e_params *params = &priv->channels.params;
@@ -212,8 +212,8 @@ int mlx5e_xsk_setup_umem(struct net_device *dev, struct xdp_umem *umem, u16 qid)
 	if (unlikely(!mlx5e_qid_get_ch_if_in_group(params, qid, MLX5E_RQ_GROUP_XSK, &ix)))
 		return -EINVAL;
 
-	return umem ? mlx5e_xsk_enable_umem(priv, umem, ix) :
-		      mlx5e_xsk_disable_umem(priv, ix);
+	return pool ? mlx5e_xsk_enable_pool(priv, pool, ix) :
+		      mlx5e_xsk_disable_pool(priv, ix);
 }
 
 u16 mlx5e_xsk_first_unused_channel(struct mlx5e_params *params, struct mlx5e_xsk *xsk)
@@ -221,7 +221,7 @@ u16 mlx5e_xsk_first_unused_channel(struct mlx5e_params *params, struct mlx5e_xsk
 	u16 res = xsk->refcnt ? params->num_channels : 0;
 
 	while (res) {
-		if (mlx5e_xsk_get_umem(params, xsk, res - 1))
+		if (mlx5e_xsk_get_pool(params, xsk, res - 1))
 			break;
 		--res;
 	}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.h
index 25b4cbe..629db33 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.h
@@ -6,25 +6,25 @@
 
 #include "en.h"
 
-static inline struct xdp_umem *mlx5e_xsk_get_umem(struct mlx5e_params *params,
-						  struct mlx5e_xsk *xsk, u16 ix)
+static inline struct xsk_buff_pool *mlx5e_xsk_get_pool(struct mlx5e_params *params,
+						       struct mlx5e_xsk *xsk, u16 ix)
 {
-	if (!xsk || !xsk->umems)
+	if (!xsk || !xsk->pools)
 		return NULL;
 
 	if (unlikely(ix >= params->num_channels))
 		return NULL;
 
-	return xsk->umems[ix];
+	return xsk->pools[ix];
 }
 
 struct mlx5e_xsk_param;
-void mlx5e_build_xsk_param(struct xdp_umem *umem, struct mlx5e_xsk_param *xsk);
+void mlx5e_build_xsk_param(struct xsk_buff_pool *pool, struct mlx5e_xsk_param *xsk);
 
 /* .ndo_bpf callback. */
-int mlx5e_xsk_setup_umem(struct net_device *dev, struct xdp_umem *umem, u16 qid);
+int mlx5e_xsk_setup_pool(struct net_device *dev, struct xsk_buff_pool *pool, u16 qid);
 
-int mlx5e_xsk_resize_reuseq(struct xdp_umem *umem, u32 nentries);
+int mlx5e_xsk_resize_reuseq(struct xsk_buff_pool *pool, u32 nentries);
 
 u16 mlx5e_xsk_first_unused_channel(struct mlx5e_params *params, struct mlx5e_xsk *xsk);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index a836a02..2b4a3e3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -365,7 +365,7 @@ static void mlx5e_rq_err_cqe_work(struct work_struct *recover_work)
 static int mlx5e_alloc_rq(struct mlx5e_channel *c,
 			  struct mlx5e_params *params,
 			  struct mlx5e_xsk_param *xsk,
-			  struct xdp_umem *umem,
+			  struct xsk_buff_pool *pool,
 			  struct mlx5e_rq_param *rqp,
 			  struct mlx5e_rq *rq)
 {
@@ -391,9 +391,9 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
 	rq->mdev    = mdev;
 	rq->hw_mtu  = MLX5E_SW2HW_MTU(params, params->sw_mtu);
 	rq->xdpsq   = &c->rq_xdpsq;
-	rq->umem    = umem;
+	rq->xsk_pool = pool;
 
-	if (rq->umem)
+	if (rq->xsk_pool)
 		rq->stats = &c->priv->channel_stats[c->ix].xskrq;
 	else
 		rq->stats = &c->priv->channel_stats[c->ix].rq;
@@ -518,7 +518,7 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
 	if (xsk) {
 		err = xdp_rxq_info_reg_mem_model(&rq->xdp_rxq,
 						 MEM_TYPE_XSK_BUFF_POOL, NULL);
-		xsk_buff_set_rxq_info(rq->umem, &rq->xdp_rxq);
+		xsk_buff_set_rxq_info(rq->xsk_pool->umem, &rq->xdp_rxq);
 	} else {
 		/* Create a page_pool and register it with rxq */
 		pp_params.order     = 0;
@@ -857,11 +857,11 @@ void mlx5e_free_rx_descs(struct mlx5e_rq *rq)
 
 int mlx5e_open_rq(struct mlx5e_channel *c, struct mlx5e_params *params,
 		  struct mlx5e_rq_param *param, struct mlx5e_xsk_param *xsk,
-		  struct xdp_umem *umem, struct mlx5e_rq *rq)
+		  struct xsk_buff_pool *pool, struct mlx5e_rq *rq)
 {
 	int err;
 
-	err = mlx5e_alloc_rq(c, params, xsk, umem, param, rq);
+	err = mlx5e_alloc_rq(c, params, xsk, pool, param, rq);
 	if (err)
 		return err;
 
@@ -963,7 +963,7 @@ static int mlx5e_alloc_xdpsq_db(struct mlx5e_xdpsq *sq, int numa)
 
 static int mlx5e_alloc_xdpsq(struct mlx5e_channel *c,
 			     struct mlx5e_params *params,
-			     struct xdp_umem *umem,
+			     struct xsk_buff_pool *pool,
 			     struct mlx5e_sq_param *param,
 			     struct mlx5e_xdpsq *sq,
 			     bool is_redirect)
@@ -979,9 +979,9 @@ static int mlx5e_alloc_xdpsq(struct mlx5e_channel *c,
 	sq->uar_map   = mdev->mlx5e_res.bfreg.map;
 	sq->min_inline_mode = params->tx_min_inline_mode;
 	sq->hw_mtu    = MLX5E_SW2HW_MTU(params, params->sw_mtu);
-	sq->umem      = umem;
+	sq->pool      = pool;
 
-	sq->stats = sq->umem ?
+	sq->stats = sq->pool ?
 		&c->priv->channel_stats[c->ix].xsksq :
 		is_redirect ?
 			&c->priv->channel_stats[c->ix].xdpsq :
@@ -1445,13 +1445,13 @@ void mlx5e_close_icosq(struct mlx5e_icosq *sq)
 }
 
 int mlx5e_open_xdpsq(struct mlx5e_channel *c, struct mlx5e_params *params,
-		     struct mlx5e_sq_param *param, struct xdp_umem *umem,
+		     struct mlx5e_sq_param *param, struct xsk_buff_pool *pool,
 		     struct mlx5e_xdpsq *sq, bool is_redirect)
 {
 	struct mlx5e_create_sq_param csp = {};
 	int err;
 
-	err = mlx5e_alloc_xdpsq(c, params, umem, param, sq, is_redirect);
+	err = mlx5e_alloc_xdpsq(c, params, pool, param, sq, is_redirect);
 	if (err)
 		return err;
 
@@ -1927,7 +1927,7 @@ static u8 mlx5e_enumerate_lag_port(struct mlx5_core_dev *mdev, int ix)
 static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
 			      struct mlx5e_params *params,
 			      struct mlx5e_channel_param *cparam,
-			      struct xdp_umem *umem,
+			      struct xsk_buff_pool *pool,
 			      struct mlx5e_channel **cp)
 {
 	int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));
@@ -1966,9 +1966,9 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
 	if (unlikely(err))
 		goto err_napi_del;
 
-	if (umem) {
-		mlx5e_build_xsk_param(umem, &xsk);
-		err = mlx5e_open_xsk(priv, params, &xsk, umem, c);
+	if (pool) {
+		mlx5e_build_xsk_param(pool, &xsk);
+		err = mlx5e_open_xsk(priv, params, &xsk, pool, c);
 		if (unlikely(err))
 			goto err_close_queues;
 	}
@@ -2316,12 +2316,12 @@ int mlx5e_open_channels(struct mlx5e_priv *priv,
 
 	mlx5e_build_channel_param(priv, &chs->params, cparam);
 	for (i = 0; i < chs->num; i++) {
-		struct xdp_umem *umem = NULL;
+		struct xsk_buff_pool *pool = NULL;
 
 		if (chs->params.xdp_prog)
-			umem = mlx5e_xsk_get_umem(&chs->params, chs->params.xsk, i);
+			pool = mlx5e_xsk_get_pool(&chs->params, chs->params.xsk, i);
 
-		err = mlx5e_open_channel(priv, i, &chs->params, cparam, umem, &chs->c[i]);
+		err = mlx5e_open_channel(priv, i, &chs->params, cparam, pool, &chs->c[i]);
 		if (err)
 			goto err_close_channels;
 	}
@@ -3882,13 +3882,13 @@ static bool mlx5e_xsk_validate_mtu(struct net_device *netdev,
 	u16 ix;
 
 	for (ix = 0; ix < chs->params.num_channels; ix++) {
-		struct xdp_umem *umem = mlx5e_xsk_get_umem(&chs->params, chs->params.xsk, ix);
+		struct xsk_buff_pool *pool = mlx5e_xsk_get_pool(&chs->params, chs->params.xsk, ix);
 		struct mlx5e_xsk_param xsk;
 
-		if (!umem)
+		if (!pool)
 			continue;
 
-		mlx5e_build_xsk_param(umem, &xsk);
+		mlx5e_build_xsk_param(pool, &xsk);
 
 		if (!mlx5e_validate_xsk_param(new_params, &xsk, mdev)) {
 			u32 hr = mlx5e_get_linear_rq_headroom(new_params, &xsk);
@@ -4518,8 +4518,8 @@ static int mlx5e_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 	case XDP_QUERY_PROG:
 		xdp->prog_id = mlx5e_xdp_query(dev);
 		return 0;
-	case XDP_SETUP_XSK_UMEM:
-		return mlx5e_xsk_setup_umem(dev, xdp->xsk.umem,
+	case XDP_SETUP_XSK_POOL:
+		return mlx5e_xsk_setup_pool(dev, xdp->xsk.pool,
 					    xdp->xsk.queue_id);
 	default:
 		return -EINVAL;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index dbb1c63..1dcf77d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -264,8 +264,8 @@ static inline int mlx5e_page_alloc_pool(struct mlx5e_rq *rq,
 static inline int mlx5e_page_alloc(struct mlx5e_rq *rq,
 				   struct mlx5e_dma_info *dma_info)
 {
-	if (rq->umem)
-		return mlx5e_xsk_page_alloc_umem(rq, dma_info);
+	if (rq->xsk_pool)
+		return mlx5e_xsk_page_alloc_pool(rq, dma_info);
 	else
 		return mlx5e_page_alloc_pool(rq, dma_info);
 }
@@ -296,7 +296,7 @@ static inline void mlx5e_page_release(struct mlx5e_rq *rq,
 				      struct mlx5e_dma_info *dma_info,
 				      bool recycle)
 {
-	if (rq->umem)
+	if (rq->xsk_pool)
 		/* The `recycle` parameter is ignored, and the page is always
 		 * put into the Reuse Ring, because there is no way to return
 		 * the page to the userspace when the interface goes down.
@@ -383,14 +383,14 @@ static int mlx5e_alloc_rx_wqes(struct mlx5e_rq *rq, u16 ix, u8 wqe_bulk)
 	int err;
 	int i;
 
-	if (rq->umem) {
+	if (rq->xsk_pool) {
 		int pages_desired = wqe_bulk << rq->wqe.info.log_num_frags;
 
 		/* Check in advance that we have enough frames, instead of
 		 * allocating one-by-one, failing and moving frames to the
 		 * Reuse Ring.
 		 */
-		if (unlikely(!xsk_buff_can_alloc(rq->umem, pages_desired)))
+		if (unlikely(!xsk_buff_can_alloc(rq->xsk_pool->umem, pages_desired)))
 			return -ENOMEM;
 	}
 
@@ -488,8 +488,8 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 	/* Check in advance that we have enough frames, instead of allocating
 	 * one-by-one, failing and moving frames to the Reuse Ring.
 	 */
-	if (rq->umem &&
-	    unlikely(!xsk_buff_can_alloc(rq->umem, MLX5_MPWRQ_PAGES_PER_WQE))) {
+	if (rq->xsk_pool &&
+	    unlikely(!xsk_buff_can_alloc(rq->xsk_pool->umem, MLX5_MPWRQ_PAGES_PER_WQE))) {
 		err = -ENOMEM;
 		goto err;
 	}
@@ -700,7 +700,7 @@ bool mlx5e_post_rx_mpwqes(struct mlx5e_rq *rq)
 	 * the driver when it refills the Fill Ring.
 	 * 2. Otherwise, busy poll by rescheduling the NAPI poll.
 	 */
-	if (unlikely(alloc_err == -ENOMEM && rq->umem))
+	if (unlikely(alloc_err == -ENOMEM && rq->xsk_pool))
 		return true;
 
 	return false;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 6fc613e..e5acc3b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -616,7 +616,7 @@ struct netdev_queue {
 	/* Subordinate device that the queue has been assigned to */
 	struct net_device	*sb_dev;
 #ifdef CONFIG_XDP_SOCKETS
-	struct xdp_umem         *umem;
+	struct xsk_buff_pool    *pool;
 #endif
 /*
  * write-mostly part
@@ -749,7 +749,7 @@ struct netdev_rx_queue {
 	struct net_device		*dev;
 	struct xdp_rxq_info		xdp_rxq;
 #ifdef CONFIG_XDP_SOCKETS
-	struct xdp_umem                 *umem;
+	struct xsk_buff_pool            *pool;
 #endif
 } ____cacheline_aligned_in_smp;
 
@@ -879,7 +879,7 @@ enum bpf_netdev_command {
 	/* BPF program for offload callbacks, invoked at program load time. */
 	BPF_OFFLOAD_MAP_ALLOC,
 	BPF_OFFLOAD_MAP_FREE,
-	XDP_SETUP_XSK_UMEM,
+	XDP_SETUP_XSK_POOL,
 };
 
 struct bpf_prog_offload_ops;
@@ -906,9 +906,9 @@ struct netdev_bpf {
 		struct {
 			struct bpf_offloaded_map *offmap;
 		};
-		/* XDP_SETUP_XSK_UMEM */
+		/* XDP_SETUP_XSK_POOL */
 		struct {
-			struct xdp_umem *umem;
+			struct xsk_buff_pool *pool;
 			u16 queue_id;
 		} xsk;
 	};
diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
index ccf848f..5dc8d3c 100644
--- a/include/net/xdp_sock_drv.h
+++ b/include/net/xdp_sock_drv.h
@@ -14,7 +14,8 @@
 void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries);
 bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc);
 void xsk_umem_consume_tx_done(struct xdp_umem *umem);
-struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev, u16 queue_id);
+struct xsk_buff_pool *xdp_get_xsk_pool_from_qid(struct net_device *dev,
+						u16 queue_id);
 void xsk_set_rx_need_wakeup(struct xdp_umem *umem);
 void xsk_set_tx_need_wakeup(struct xdp_umem *umem);
 void xsk_clear_rx_need_wakeup(struct xdp_umem *umem);
@@ -125,8 +126,8 @@ static inline void xsk_umem_consume_tx_done(struct xdp_umem *umem)
 {
 }
 
-static inline struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
-						     u16 queue_id)
+static inline struct xsk_buff_pool *
+xdp_get_xsk_pool_from_qid(struct net_device *dev, u16 queue_id)
 {
 	return NULL;
 }
diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index a4ff226..a6dec9c 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -13,6 +13,7 @@ struct xsk_buff_pool;
 struct xdp_rxq_info;
 struct xsk_queue;
 struct xdp_desc;
+struct xdp_umem;
 struct device;
 struct page;
 
@@ -42,13 +43,14 @@ struct xsk_buff_pool {
 	u32 frame_len;
 	bool cheap_dma;
 	bool unaligned;
+	struct xdp_umem *umem;
 	void *addrs;
 	struct device *dev;
 	struct xdp_buff_xsk *free_heads[];
 };
 
 /* AF_XDP core. */
-struct xsk_buff_pool *xp_create(struct page **pages, u32 nr_pages, u32 chunks,
+struct xsk_buff_pool *xp_create(struct xdp_umem *umem, u32 chunks,
 				u32 chunk_size, u32 headroom, u64 size,
 				bool unaligned);
 void xp_set_fq(struct xsk_buff_pool *pool, struct xsk_queue *fq);
diff --git a/net/ethtool/channels.c b/net/ethtool/channels.c
index 9ef54cd..78d990b 100644
--- a/net/ethtool/channels.c
+++ b/net/ethtool/channels.c
@@ -223,7 +223,7 @@ int ethnl_set_channels(struct sk_buff *skb, struct genl_info *info)
 	from_channel = channels.combined_count +
 		       min(channels.rx_count, channels.tx_count);
 	for (i = from_channel; i < old_total; i++)
-		if (xdp_get_umem_from_qid(dev, i)) {
+		if (xdp_get_xsk_pool_from_qid(dev, i)) {
 			GENL_SET_ERR_MSG(info, "requested channel counts are too low for existing zerocopy AF_XDP sockets");
 			return -EINVAL;
 		}
diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
index b5df90c..91de16d 100644
--- a/net/ethtool/ioctl.c
+++ b/net/ethtool/ioctl.c
@@ -1702,7 +1702,7 @@ static noinline_for_stack int ethtool_set_channels(struct net_device *dev,
 		min(channels.rx_count, channels.tx_count);
 	to_channel = curr.combined_count + max(curr.rx_count, curr.tx_count);
 	for (i = from_channel; i < to_channel; i++)
-		if (xdp_get_umem_from_qid(dev, i))
+		if (xdp_get_xsk_pool_from_qid(dev, i))
 			return -EINVAL;
 
 	ret = dev->ethtool_ops->set_channels(dev, &channels);
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index e97db37..0b5f3b0 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -51,8 +51,9 @@ void xdp_del_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs)
  * not know if the device has more tx queues than rx, or the opposite.
  * This might also change during run time.
  */
-static int xdp_reg_umem_at_qid(struct net_device *dev, struct xdp_umem *umem,
-			       u16 queue_id)
+static int xdp_reg_xsk_pool_at_qid(struct net_device *dev,
+				   struct xsk_buff_pool *pool,
+				   u16 queue_id)
 {
 	if (queue_id >= max_t(unsigned int,
 			      dev->real_num_rx_queues,
@@ -60,31 +61,31 @@ static int xdp_reg_umem_at_qid(struct net_device *dev, struct xdp_umem *umem,
 		return -EINVAL;
 
 	if (queue_id < dev->real_num_rx_queues)
-		dev->_rx[queue_id].umem = umem;
+		dev->_rx[queue_id].pool = pool;
 	if (queue_id < dev->real_num_tx_queues)
-		dev->_tx[queue_id].umem = umem;
+		dev->_tx[queue_id].pool = pool;
 
 	return 0;
 }
 
-struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
-				       u16 queue_id)
+struct xsk_buff_pool *xdp_get_xsk_pool_from_qid(struct net_device *dev,
+						u16 queue_id)
 {
 	if (queue_id < dev->real_num_rx_queues)
-		return dev->_rx[queue_id].umem;
+		return dev->_rx[queue_id].pool;
 	if (queue_id < dev->real_num_tx_queues)
-		return dev->_tx[queue_id].umem;
+		return dev->_tx[queue_id].pool;
 
 	return NULL;
 }
-EXPORT_SYMBOL(xdp_get_umem_from_qid);
+EXPORT_SYMBOL(xdp_get_xsk_pool_from_qid);
 
-static void xdp_clear_umem_at_qid(struct net_device *dev, u16 queue_id)
+static void xdp_clear_xsk_pool_at_qid(struct net_device *dev, u16 queue_id)
 {
 	if (queue_id < dev->real_num_rx_queues)
-		dev->_rx[queue_id].umem = NULL;
+		dev->_rx[queue_id].pool = NULL;
 	if (queue_id < dev->real_num_tx_queues)
-		dev->_tx[queue_id].umem = NULL;
+		dev->_tx[queue_id].pool = NULL;
 }
 
 int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
@@ -102,10 +103,10 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
 	if (force_zc && force_copy)
 		return -EINVAL;
 
-	if (xdp_get_umem_from_qid(dev, queue_id))
+	if (xdp_get_xsk_pool_from_qid(dev, queue_id))
 		return -EBUSY;
 
-	err = xdp_reg_umem_at_qid(dev, umem, queue_id);
+	err = xdp_reg_xsk_pool_at_qid(dev, umem->pool, queue_id);
 	if (err)
 		return err;
 
@@ -132,8 +133,8 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
 		goto err_unreg_umem;
 	}
 
-	bpf.command = XDP_SETUP_XSK_UMEM;
-	bpf.xsk.umem = umem;
+	bpf.command = XDP_SETUP_XSK_POOL;
+	bpf.xsk.pool = umem->pool;
 	bpf.xsk.queue_id = queue_id;
 
 	err = dev->netdev_ops->ndo_bpf(dev, &bpf);
@@ -147,7 +148,7 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
 	if (!force_zc)
 		err = 0; /* fallback to copy mode */
 	if (err)
-		xdp_clear_umem_at_qid(dev, queue_id);
+		xdp_clear_xsk_pool_at_qid(dev, queue_id);
 	return err;
 }
 
@@ -162,8 +163,8 @@ void xdp_umem_clear_dev(struct xdp_umem *umem)
 		return;
 
 	if (umem->zc) {
-		bpf.command = XDP_SETUP_XSK_UMEM;
-		bpf.xsk.umem = NULL;
+		bpf.command = XDP_SETUP_XSK_POOL;
+		bpf.xsk.pool = NULL;
 		bpf.xsk.queue_id = umem->queue_id;
 
 		err = umem->dev->netdev_ops->ndo_bpf(umem->dev, &bpf);
@@ -172,7 +173,7 @@ void xdp_umem_clear_dev(struct xdp_umem *umem)
 			WARN(1, "failed to disable umem!\n");
 	}
 
-	xdp_clear_umem_at_qid(umem->dev, umem->queue_id);
+	xdp_clear_xsk_pool_at_qid(umem->dev, umem->queue_id);
 
 	dev_put(umem->dev);
 	umem->dev = NULL;
@@ -373,8 +374,8 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 	if (err)
 		goto out_account;
 
-	umem->pool = xp_create(umem->pgs, umem->npgs, chunks, chunk_size,
-			       headroom, size, unaligned_chunks);
+	umem->pool = xp_create(umem, chunks, chunk_size, headroom, size,
+			       unaligned_chunks);
 	if (!umem->pool) {
 		err = -ENOMEM;
 		goto out_pin;
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 540ed75..c57f0bb 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -32,7 +32,7 @@ void xp_destroy(struct xsk_buff_pool *pool)
 	kvfree(pool);
 }
 
-struct xsk_buff_pool *xp_create(struct page **pages, u32 nr_pages, u32 chunks,
+struct xsk_buff_pool *xp_create(struct xdp_umem *umem, u32 chunks,
 				u32 chunk_size, u32 headroom, u64 size,
 				bool unaligned)
 {
@@ -58,6 +58,7 @@ struct xsk_buff_pool *xp_create(struct page **pages, u32 nr_pages, u32 chunks,
 	pool->cheap_dma = true;
 	pool->unaligned = unaligned;
 	pool->frame_len = chunk_size - headroom - XDP_PACKET_HEADROOM;
+	pool->umem = umem;
 	INIT_LIST_HEAD(&pool->free_list);
 
 	for (i = 0; i < pool->free_heads_cnt; i++) {
@@ -67,7 +68,7 @@ struct xsk_buff_pool *xp_create(struct page **pages, u32 nr_pages, u32 chunks,
 		pool->free_heads[i] = xskb;
 	}
 
-	err = xp_addr_map(pool, pages, nr_pages);
+	err = xp_addr_map(pool, umem->pgs, umem->npgs);
 	if (!err)
 		return pool;
 
-- 
2.7.4



* [PATCH bpf-next 02/14] xsk: i40e: ice: ixgbe: mlx5: rename xsk zero-copy driver interfaces
  2020-07-02 12:18 [PATCH bpf-next 00/14] xsk: support shared umems between devices and queues Magnus Karlsson
  2020-07-02 12:19 ` [PATCH bpf-next 01/14] xsk: i40e: ice: ixgbe: mlx5: pass buffer pool to driver instead of umem Magnus Karlsson
@ 2020-07-02 12:19 ` Magnus Karlsson
  2020-07-02 12:19 ` [PATCH bpf-next 03/14] xsk: create and free context independently from umem Magnus Karlsson
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Magnus Karlsson @ 2020-07-02 12:19 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev,
	jonathan.lemon, maximmi
  Cc: bpf, jeffrey.t.kirsher, maciej.fijalkowski,
	maciejromanfijalkowski, cristian.dumitrescu

Rename the AF_XDP zero-copy driver interface functions to better
reflect what they do after the replacement of umems with buffer
pools in the previous commit. Mostly this amounts to replacing the
umem part of the function names with xsk_buff and having them take
a buffer pool pointer instead of a umem. The various ring functions
have also been renamed in the process so that they follow the same
naming convention as the internal functions in xsk_queue.h. This
makes it clearer what they do and keeps the naming consistent.
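
To make the effect concrete, here is a minimal sketch (not part of
the patch) of a driver's zero-copy TX completion path written
against the renamed interface; "struct my_ring" and
"example_clean_tx_zc" are made-up stand-ins for a driver's own ring
structure and helper:

  /* Sketch only: my_ring stands in for a driver ring that carries an
   * xsk_pool pointer, as the Intel and mlx5 rings do in this series.
   */
  static void example_clean_tx_zc(struct my_ring *tx_ring, u32 xsk_frames)
  {
  	struct xsk_buff_pool *pool = tx_ring->xsk_pool;

  	if (xsk_frames)
  		/* was xsk_umem_complete_tx(pool->umem, xsk_frames) */
  		xsk_tx_completed(pool, xsk_frames);

  	/* was xsk_umem_uses_need_wakeup(pool->umem) */
  	if (xsk_uses_need_wakeup(pool))
  		/* was xsk_set_tx_need_wakeup(pool->umem) */
  		xsk_set_tx_need_wakeup(pool);
  }

The TX descriptor helpers follow the same pattern: for example,
xsk_umem_consume_tx() becomes xsk_tx_peek_desc() and
xsk_umem_consume_tx_done() becomes xsk_tx_release(), both taking
the buffer pool directly.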

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c        |   6 +-
 drivers/net/ethernet/intel/i40e/i40e_xsk.c         |  34 +++---
 drivers/net/ethernet/intel/ice/ice_base.c          |   6 +-
 drivers/net/ethernet/intel/ice/ice_xsk.c           |  34 +++---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c      |   6 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c       |  32 +++---
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c   |   4 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/rx.h    |   8 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/tx.c    |  10 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/tx.h    |   6 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/umem.c  |  12 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    |   4 +-
 include/net/xdp_sock.h                             |   1 +
 include/net/xdp_sock_drv.h                         | 114 +++++++++++----------
 net/ethtool/channels.c                             |   2 +-
 net/ethtool/ioctl.c                                |   2 +-
 net/xdp/xdp_umem.c                                 |  24 ++---
 net/xdp/xsk.c                                      |  45 ++++----
 19 files changed, 182 insertions(+), 170 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 3df725e..73dded7 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -3119,7 +3119,7 @@ static struct xsk_buff_pool *i40e_xsk_pool(struct i40e_ring *ring)
 	if (!xdp_on || !test_bit(qid, ring->vsi->af_xdp_zc_qps))
 		return NULL;
 
-	return xdp_get_xsk_pool_from_qid(ring->vsi->netdev, qid);
+	return xsk_get_pool_from_qid(ring->vsi->netdev, qid);
 }
 
 /**
@@ -3267,7 +3267,7 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring)
 		if (ret)
 			return ret;
 		ring->rx_buf_len =
-		  xsk_umem_get_rx_frame_size(ring->xsk_pool->umem);
+		  xsk_pool_get_rx_frame_size(ring->xsk_pool);
 		/* For AF_XDP ZC, we disallow packets to span on
 		 * multiple buffers, thus letting us skip that
 		 * handling in the fast-path.
@@ -3351,7 +3351,7 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring)
 	writel(0, ring->tail);
 
 	if (ring->xsk_pool) {
-		xsk_buff_set_rxq_info(ring->xsk_pool->umem, &ring->xdp_rxq);
+		xsk_pool_set_rxq_info(ring->xsk_pool, &ring->xdp_rxq);
 		ok = i40e_alloc_rx_buffers_zc(ring, I40E_DESC_UNUSED(ring));
 	} else {
 		ok = !i40e_alloc_rx_buffers(ring, I40E_DESC_UNUSED(ring));
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index d7ebdf6..ebaf0bd 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -55,8 +55,7 @@ static int i40e_xsk_pool_enable(struct i40e_vsi *vsi,
 	    qid >= netdev->real_num_tx_queues)
 		return -EINVAL;
 
-	err = xsk_buff_dma_map(pool->umem, &vsi->back->pdev->dev,
-			       I40E_RX_DMA_ATTR);
+	err = xsk_pool_dma_map(pool, &vsi->back->pdev->dev, I40E_RX_DMA_ATTR);
 	if (err)
 		return err;
 
@@ -97,7 +96,7 @@ static int i40e_xsk_pool_disable(struct i40e_vsi *vsi, u16 qid)
 	bool if_running;
 	int err;
 
-	pool = xdp_get_xsk_pool_from_qid(netdev, qid);
+	pool = xsk_get_pool_from_qid(netdev, qid);
 	if (!pool)
 		return -EINVAL;
 
@@ -110,7 +109,7 @@ static int i40e_xsk_pool_disable(struct i40e_vsi *vsi, u16 qid)
 	}
 
 	clear_bit(qid, vsi->af_xdp_zc_qps);
-	xsk_buff_dma_unmap(pool->umem, I40E_RX_DMA_ATTR);
+	xsk_pool_dma_unmap(pool, I40E_RX_DMA_ATTR);
 
 	if (if_running) {
 		err = i40e_queue_pair_enable(vsi, qid);
@@ -196,7 +195,7 @@ bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 count)
 	rx_desc = I40E_RX_DESC(rx_ring, ntu);
 	bi = i40e_rx_bi(rx_ring, ntu);
 	do {
-		xdp = xsk_buff_alloc(rx_ring->xsk_pool->umem);
+		xdp = xsk_buff_alloc(rx_ring->xsk_pool);
 		if (!xdp) {
 			ok = false;
 			goto no_buffers;
@@ -363,11 +362,11 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
 	i40e_finalize_xdp_rx(rx_ring, xdp_xmit);
 	i40e_update_rx_stats(rx_ring, total_rx_bytes, total_rx_packets);
 
-	if (xsk_umem_uses_need_wakeup(rx_ring->xsk_pool->umem)) {
+	if (xsk_uses_need_wakeup(rx_ring->xsk_pool)) {
 		if (failure || rx_ring->next_to_clean == rx_ring->next_to_use)
-			xsk_set_rx_need_wakeup(rx_ring->xsk_pool->umem);
+			xsk_set_rx_need_wakeup(rx_ring->xsk_pool);
 		else
-			xsk_clear_rx_need_wakeup(rx_ring->xsk_pool->umem);
+			xsk_clear_rx_need_wakeup(rx_ring->xsk_pool);
 
 		return (int)total_rx_packets;
 	}
@@ -396,12 +395,11 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
 			break;
 		}
 
-		if (!xsk_umem_consume_tx(xdp_ring->xsk_pool->umem, &desc))
+		if (!xsk_tx_peek_desc(xdp_ring->xsk_pool, &desc))
 			break;
 
-		dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool->umem,
-					   desc.addr);
-		xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool->umem, dma,
+		dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool, desc.addr);
+		xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma,
 						 desc.len);
 
 		tx_bi = &xdp_ring->tx_bi[xdp_ring->next_to_use];
@@ -425,7 +423,7 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
 						 I40E_TXD_QW1_CMD_SHIFT);
 		i40e_xdp_ring_update_tail(xdp_ring);
 
-		xsk_umem_consume_tx_done(xdp_ring->xsk_pool->umem);
+		xsk_tx_release(xdp_ring->xsk_pool);
 	}
 
 	return !!budget && work_done;
@@ -498,14 +496,14 @@ bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi,
 		tx_ring->next_to_clean -= tx_ring->count;
 
 	if (xsk_frames)
-		xsk_umem_complete_tx(bp->umem, xsk_frames);
+		xsk_tx_completed(bp, xsk_frames);
 
 	i40e_arm_wb(tx_ring, vsi, budget);
 	i40e_update_tx_stats(tx_ring, completed_frames, total_bytes);
 
 out_xmit:
-	if (xsk_umem_uses_need_wakeup(tx_ring->xsk_pool->umem))
-		xsk_set_tx_need_wakeup(tx_ring->xsk_pool->umem);
+	if (xsk_uses_need_wakeup(tx_ring->xsk_pool))
+		xsk_set_tx_need_wakeup(tx_ring->xsk_pool);
 
 	xmit_done = i40e_xmit_zc(tx_ring, budget);
 
@@ -598,7 +596,7 @@ void i40e_xsk_clean_tx_ring(struct i40e_ring *tx_ring)
 	}
 
 	if (xsk_frames)
-		xsk_umem_complete_tx(bp->umem, xsk_frames);
+		xsk_tx_completed(bp, xsk_frames);
 }
 
 /**
@@ -614,7 +612,7 @@ bool i40e_xsk_any_rx_ring_enabled(struct i40e_vsi *vsi)
 	int i;
 
 	for (i = 0; i < vsi->num_queue_pairs; i++) {
-		if (xdp_get_xsk_pool_from_qid(netdev, i))
+		if (xsk_get_pool_from_qid(netdev, i))
 			return true;
 	}
 
diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
index 94dbf89..16fbc79 100644
--- a/drivers/net/ethernet/intel/ice/ice_base.c
+++ b/drivers/net/ethernet/intel/ice/ice_base.c
@@ -313,7 +313,7 @@ int ice_setup_rx_ctx(struct ice_ring *ring)
 			xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq);
 
 			ring->rx_buf_len =
-				xsk_umem_get_rx_frame_size(ring->xsk_pool->umem);
+				xsk_pool_get_rx_frame_size(ring->xsk_pool);
 			/* For AF_XDP ZC, we disallow packets to span on
 			 * multiple buffers, thus letting us skip that
 			 * handling in the fast-path.
@@ -324,7 +324,7 @@ int ice_setup_rx_ctx(struct ice_ring *ring)
 							 NULL);
 			if (err)
 				return err;
-			xsk_buff_set_rxq_info(ring->xsk_pool->umem, &ring->xdp_rxq);
+			xsk_pool_set_rxq_info(ring->xsk_pool, &ring->xdp_rxq);
 
 			dev_info(dev, "Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring %d\n",
 				 ring->q_index);
@@ -418,7 +418,7 @@ int ice_setup_rx_ctx(struct ice_ring *ring)
 	writel(0, ring->tail);
 
 	if (ring->xsk_pool) {
-		if (!xsk_buff_can_alloc(ring->xsk_pool->umem, num_bufs)) {
+		if (!xsk_buff_can_alloc(ring->xsk_pool, num_bufs)) {
 			dev_warn(dev, "XSK buffer pool does not provide enough addresses to fill %d buffers on Rx ring %d\n",
 				 num_bufs, ring->q_index);
 			dev_warn(dev, "Change Rx ring/fill queue size to avoid performance issues\n");
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index f0ce669..6430df2 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -310,7 +310,7 @@ static int ice_xsk_pool_disable(struct ice_vsi *vsi, u16 qid)
 	    !vsi->xsk_pools[qid])
 		return -EINVAL;
 
-	xsk_buff_dma_unmap(vsi->xsk_pools[qid]->umem, ICE_RX_DMA_ATTR);
+	xsk_pool_dma_unmap(vsi->xsk_pools[qid], ICE_RX_DMA_ATTR);
 	ice_xsk_remove_pool(vsi, qid);
 
 	return 0;
@@ -347,7 +347,7 @@ ice_xsk_pool_enable(struct ice_vsi *vsi, struct xsk_buff_pool *pool, u16 qid)
 	vsi->xsk_pools[qid] = pool;
 	vsi->num_xsk_pools_used++;
 
-	err = xsk_buff_dma_map(vsi->xsk_pools[qid]->umem, ice_pf_to_dev(vsi->back),
+	err = xsk_pool_dma_map(vsi->xsk_pools[qid], ice_pf_to_dev(vsi->back),
 			       ICE_RX_DMA_ATTR);
 	if (err)
 		return err;
@@ -424,7 +424,7 @@ bool ice_alloc_rx_bufs_zc(struct ice_ring *rx_ring, u16 count)
 	rx_buf = &rx_ring->rx_buf[ntu];
 
 	do {
-		rx_buf->xdp = xsk_buff_alloc(rx_ring->xsk_pool->umem);
+		rx_buf->xdp = xsk_buff_alloc(rx_ring->xsk_pool);
 		if (!rx_buf->xdp) {
 			ret = true;
 			break;
@@ -645,11 +645,11 @@ int ice_clean_rx_irq_zc(struct ice_ring *rx_ring, int budget)
 	ice_finalize_xdp_rx(rx_ring, xdp_xmit);
 	ice_update_rx_ring_stats(rx_ring, total_rx_packets, total_rx_bytes);
 
-	if (xsk_umem_uses_need_wakeup(rx_ring->xsk_pool->umem)) {
+	if (xsk_uses_need_wakeup(rx_ring->xsk_pool)) {
 		if (failure || rx_ring->next_to_clean == rx_ring->next_to_use)
-			xsk_set_rx_need_wakeup(rx_ring->xsk_pool->umem);
+			xsk_set_rx_need_wakeup(rx_ring->xsk_pool);
 		else
-			xsk_clear_rx_need_wakeup(rx_ring->xsk_pool->umem);
+			xsk_clear_rx_need_wakeup(rx_ring->xsk_pool);
 
 		return (int)total_rx_packets;
 	}
@@ -682,11 +682,11 @@ static bool ice_xmit_zc(struct ice_ring *xdp_ring, int budget)
 
 		tx_buf = &xdp_ring->tx_buf[xdp_ring->next_to_use];
 
-		if (!xsk_umem_consume_tx(xdp_ring->xsk_pool->umem, &desc))
+		if (!xsk_tx_peek_desc(xdp_ring->xsk_pool, &desc))
 			break;
 
-		dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool->umem, desc.addr);
-		xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool->umem, dma,
+		dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool, desc.addr);
+		xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma,
 						 desc.len);
 
 		tx_buf->bytecount = desc.len;
@@ -703,9 +703,9 @@ static bool ice_xmit_zc(struct ice_ring *xdp_ring, int budget)
 
 	if (tx_desc) {
 		ice_xdp_ring_update_tail(xdp_ring);
-		xsk_umem_consume_tx_done(xdp_ring->xsk_pool->umem);
-		if (xsk_umem_uses_need_wakeup(xdp_ring->xsk_pool->umem))
-			xsk_clear_tx_need_wakeup(xdp_ring->xsk_pool->umem);
+		xsk_tx_release(xdp_ring->xsk_pool);
+		if (xsk_uses_need_wakeup(xdp_ring->xsk_pool))
+			xsk_clear_tx_need_wakeup(xdp_ring->xsk_pool);
 	}
 
 	return budget > 0 && work_done;
@@ -779,13 +779,13 @@ bool ice_clean_tx_irq_zc(struct ice_ring *xdp_ring, int budget)
 	xdp_ring->next_to_clean = ntc;
 
 	if (xsk_frames)
-		xsk_umem_complete_tx(xdp_ring->xsk_pool->umem, xsk_frames);
+		xsk_tx_completed(xdp_ring->xsk_pool, xsk_frames);
 
-	if (xsk_umem_uses_need_wakeup(xdp_ring->xsk_pool->umem)) {
+	if (xsk_uses_need_wakeup(xdp_ring->xsk_pool)) {
 		if (xdp_ring->next_to_clean == xdp_ring->next_to_use)
-			xsk_set_tx_need_wakeup(xdp_ring->xsk_pool->umem);
+			xsk_set_tx_need_wakeup(xdp_ring->xsk_pool);
 		else
-			xsk_clear_tx_need_wakeup(xdp_ring->xsk_pool->umem);
+			xsk_clear_tx_need_wakeup(xdp_ring->xsk_pool);
 	}
 
 	ice_update_tx_ring_stats(xdp_ring, total_packets, total_bytes);
@@ -902,5 +902,5 @@ void ice_xsk_clean_xdp_ring(struct ice_ring *xdp_ring)
 	}
 
 	if (xsk_frames)
-		xsk_umem_complete_tx(xdp_ring->xsk_pool->umem, xsk_frames);
+		xsk_tx_completed(xdp_ring->xsk_pool, xsk_frames);
 }
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 3217000..5d1c786 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3716,7 +3716,7 @@ static void ixgbe_configure_srrctl(struct ixgbe_adapter *adapter,
 
 	/* configure the packet buffer length */
 	if (rx_ring->xsk_pool) {
-		u32 xsk_buf_len = xsk_umem_get_rx_frame_size(rx_ring->xsk_pool->umem);
+		u32 xsk_buf_len = xsk_pool_get_rx_frame_size(rx_ring->xsk_pool);
 
 		/* If the MAC support setting RXDCTL.RLPML, the
 		 * SRRCTL[n].BSIZEPKT is set to PAGE_SIZE and
@@ -4066,7 +4066,7 @@ void ixgbe_configure_rx_ring(struct ixgbe_adapter *adapter,
 		WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
 						   MEM_TYPE_XSK_BUFF_POOL,
 						   NULL));
-		xsk_buff_set_rxq_info(ring->xsk_pool->umem, &ring->xdp_rxq);
+		xsk_pool_set_rxq_info(ring->xsk_pool, &ring->xdp_rxq);
 	} else {
 		WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
 						   MEM_TYPE_PAGE_SHARED, NULL));
@@ -4122,7 +4122,7 @@ void ixgbe_configure_rx_ring(struct ixgbe_adapter *adapter,
 	}
 
 	if (ring->xsk_pool && hw->mac.type != ixgbe_mac_82599EB) {
-		u32 xsk_buf_len = xsk_umem_get_rx_frame_size(ring->xsk_pool->umem);
+		u32 xsk_buf_len = xsk_pool_get_rx_frame_size(ring->xsk_pool);
 
 		rxdctl &= ~(IXGBE_RXDCTL_RLPMLMASK |
 			    IXGBE_RXDCTL_RLPML_EN);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index 9f503d6..f07cd41 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -17,7 +17,7 @@ struct xsk_buff_pool *ixgbe_xsk_pool(struct ixgbe_adapter *adapter,
 	if (!xdp_on || !test_bit(qid, adapter->af_xdp_zc_qps))
 		return NULL;
 
-	return xdp_get_xsk_pool_from_qid(adapter->netdev, qid);
+	return xsk_get_pool_from_qid(adapter->netdev, qid);
 }
 
 static int ixgbe_xsk_pool_enable(struct ixgbe_adapter *adapter,
@@ -35,7 +35,7 @@ static int ixgbe_xsk_pool_enable(struct ixgbe_adapter *adapter,
 	    qid >= netdev->real_num_tx_queues)
 		return -EINVAL;
 
-	err = xsk_buff_dma_map(pool->umem, &adapter->pdev->dev, IXGBE_RX_DMA_ATTR);
+	err = xsk_pool_dma_map(pool, &adapter->pdev->dev, IXGBE_RX_DMA_ATTR);
 	if (err)
 		return err;
 
@@ -64,7 +64,7 @@ static int ixgbe_xsk_pool_disable(struct ixgbe_adapter *adapter, u16 qid)
 	struct xsk_buff_pool *pool;
 	bool if_running;
 
-	pool = xdp_get_xsk_pool_from_qid(adapter->netdev, qid);
+	pool = xsk_get_pool_from_qid(adapter->netdev, qid);
 	if (!pool)
 		return -EINVAL;
 
@@ -75,7 +75,7 @@ static int ixgbe_xsk_pool_disable(struct ixgbe_adapter *adapter, u16 qid)
 		ixgbe_txrx_ring_disable(adapter, qid);
 
 	clear_bit(qid, adapter->af_xdp_zc_qps);
-	xsk_buff_dma_unmap(pool->umem, IXGBE_RX_DMA_ATTR);
+	xsk_pool_dma_unmap(pool, IXGBE_RX_DMA_ATTR);
 
 	if (if_running)
 		ixgbe_txrx_ring_enable(adapter, qid);
@@ -150,7 +150,7 @@ bool ixgbe_alloc_rx_buffers_zc(struct ixgbe_ring *rx_ring, u16 count)
 	i -= rx_ring->count;
 
 	do {
-		bi->xdp = xsk_buff_alloc(rx_ring->xsk_pool->umem);
+		bi->xdp = xsk_buff_alloc(rx_ring->xsk_pool);
 		if (!bi->xdp) {
 			ok = false;
 			break;
@@ -345,11 +345,11 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
 	q_vector->rx.total_packets += total_rx_packets;
 	q_vector->rx.total_bytes += total_rx_bytes;
 
-	if (xsk_umem_uses_need_wakeup(rx_ring->xsk_pool->umem)) {
+	if (xsk_uses_need_wakeup(rx_ring->xsk_pool)) {
 		if (failure || rx_ring->next_to_clean == rx_ring->next_to_use)
-			xsk_set_rx_need_wakeup(rx_ring->xsk_pool->umem);
+			xsk_set_rx_need_wakeup(rx_ring->xsk_pool);
 		else
-			xsk_clear_rx_need_wakeup(rx_ring->xsk_pool->umem);
+			xsk_clear_rx_need_wakeup(rx_ring->xsk_pool);
 
 		return (int)total_rx_packets;
 	}
@@ -389,11 +389,11 @@ static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
 			break;
 		}
 
-		if (!xsk_umem_consume_tx(pool->umem, &desc))
+		if (!xsk_tx_peek_desc(pool, &desc))
 			break;
 
-		dma = xsk_buff_raw_get_dma(pool->umem, desc.addr);
-		xsk_buff_raw_dma_sync_for_device(pool->umem, dma, desc.len);
+		dma = xsk_buff_raw_get_dma(pool, desc.addr);
+		xsk_buff_raw_dma_sync_for_device(pool, dma, desc.len);
 
 		tx_bi = &xdp_ring->tx_buffer_info[xdp_ring->next_to_use];
 		tx_bi->bytecount = desc.len;
@@ -419,7 +419,7 @@ static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
 
 	if (tx_desc) {
 		ixgbe_xdp_ring_update_tail(xdp_ring);
-		xsk_umem_consume_tx_done(pool->umem);
+		xsk_tx_release(pool);
 	}
 
 	return !!budget && work_done;
@@ -485,10 +485,10 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
 	q_vector->tx.total_packets += total_packets;
 
 	if (xsk_frames)
-		xsk_umem_complete_tx(pool->umem, xsk_frames);
+		xsk_tx_completed(pool, xsk_frames);
 
-	if (xsk_umem_uses_need_wakeup(pool->umem))
-		xsk_set_tx_need_wakeup(pool->umem);
+	if (xsk_uses_need_wakeup(pool))
+		xsk_set_tx_need_wakeup(pool);
 
 	return ixgbe_xmit_zc(tx_ring, q_vector->tx.work_limit);
 }
@@ -547,5 +547,5 @@ void ixgbe_xsk_clean_tx_ring(struct ixgbe_ring *tx_ring)
 	}
 
 	if (xsk_frames)
-		xsk_umem_complete_tx(pool->umem, xsk_frames);
+		xsk_tx_completed(pool, xsk_frames);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index 0a5a873..d6c7596 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -446,7 +446,7 @@ bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq *cq)
 	} while ((++i < MLX5E_TX_CQ_POLL_BUDGET) && (cqe = mlx5_cqwq_get_cqe(&cq->wq)));
 
 	if (xsk_frames)
-		xsk_umem_complete_tx(sq->pool->umem, xsk_frames);
+		xsk_tx_completed(sq->pool, xsk_frames);
 
 	sq->stats->cqes += i;
 
@@ -476,7 +476,7 @@ void mlx5e_free_xdpsq_descs(struct mlx5e_xdpsq *sq)
 	}
 
 	if (xsk_frames)
-		xsk_umem_complete_tx(sq->pool->umem, xsk_frames);
+		xsk_tx_completed(sq->pool, xsk_frames);
 }
 
 int mlx5e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
index 3dd056a..7f88ccf 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
@@ -22,7 +22,7 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq,
 static inline int mlx5e_xsk_page_alloc_pool(struct mlx5e_rq *rq,
 					    struct mlx5e_dma_info *dma_info)
 {
-	dma_info->xsk = xsk_buff_alloc(rq->xsk_pool->umem);
+	dma_info->xsk = xsk_buff_alloc(rq->xsk_pool);
 	if (!dma_info->xsk)
 		return -ENOMEM;
 
@@ -38,13 +38,13 @@ static inline int mlx5e_xsk_page_alloc_pool(struct mlx5e_rq *rq,
 
 static inline bool mlx5e_xsk_update_rx_wakeup(struct mlx5e_rq *rq, bool alloc_err)
 {
-	if (!xsk_umem_uses_need_wakeup(rq->xsk_pool->umem))
+	if (!xsk_uses_need_wakeup(rq->xsk_pool))
 		return alloc_err;
 
 	if (unlikely(alloc_err))
-		xsk_set_rx_need_wakeup(rq->xsk_pool->umem);
+		xsk_set_rx_need_wakeup(rq->xsk_pool);
 	else
-		xsk_clear_rx_need_wakeup(rq->xsk_pool->umem);
+		xsk_clear_rx_need_wakeup(rq->xsk_pool);
 
 	return false;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
index abe4639..debcc70 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
@@ -83,7 +83,7 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget)
 			break;
 		}
 
-		if (!xsk_umem_consume_tx(pool->umem, &desc)) {
+		if (!xsk_tx_peek_desc(pool, &desc)) {
 			/* TX will get stuck until something wakes it up by
 			 * triggering NAPI. Currently it's expected that the
 			 * application calls sendto() if there are consumed, but
@@ -92,11 +92,11 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget)
 			break;
 		}
 
-		xdptxd.dma_addr = xsk_buff_raw_get_dma(pool->umem, desc.addr);
-		xdptxd.data = xsk_buff_raw_get_data(pool->umem, desc.addr);
+		xdptxd.dma_addr = xsk_buff_raw_get_dma(pool, desc.addr);
+		xdptxd.data = xsk_buff_raw_get_data(pool, desc.addr);
 		xdptxd.len = desc.len;
 
-		xsk_buff_raw_dma_sync_for_device(pool->umem, xdptxd.dma_addr, xdptxd.len);
+		xsk_buff_raw_dma_sync_for_device(pool, xdptxd.dma_addr, xdptxd.len);
 
 		if (unlikely(!sq->xmit_xdp_frame(sq, &xdptxd, &xdpi, check_result))) {
 			if (sq->mpwqe.wqe)
@@ -113,7 +113,7 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget)
 			mlx5e_xdp_mpwqe_complete(sq);
 		mlx5e_xmit_xdp_doorbell(sq);
 
-		xsk_umem_consume_tx_done(pool->umem);
+		xsk_tx_release(pool);
 	}
 
 	return !(budget && work_done);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h
index 610a084..5821e88 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h
@@ -15,13 +15,13 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget);
 
 static inline void mlx5e_xsk_update_tx_wakeup(struct mlx5e_xdpsq *sq)
 {
-	if (!xsk_umem_uses_need_wakeup(sq->pool->umem))
+	if (!xsk_uses_need_wakeup(sq->pool))
 		return;
 
 	if (sq->pc != sq->cc)
-		xsk_clear_tx_need_wakeup(sq->pool->umem);
+		xsk_clear_tx_need_wakeup(sq->pool);
 	else
-		xsk_set_tx_need_wakeup(sq->pool->umem);
+		xsk_set_tx_need_wakeup(sq->pool);
 }
 
 #endif /* __MLX5_EN_XSK_TX_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.c
index 947abf1..cb70870 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.c
@@ -11,13 +11,13 @@ static int mlx5e_xsk_map_pool(struct mlx5e_priv *priv,
 {
 	struct device *dev = priv->mdev->device;
 
-	return xsk_buff_dma_map(pool->umem, dev, 0);
+	return xsk_pool_dma_map(pool, dev, 0);
 }
 
 static void mlx5e_xsk_unmap_pool(struct mlx5e_priv *priv,
 				 struct xsk_buff_pool *pool)
 {
-	return xsk_buff_dma_unmap(pool->umem, 0);
+	return xsk_pool_dma_unmap(pool, 0);
 }
 
 static int mlx5e_xsk_get_pools(struct mlx5e_xsk *xsk)
@@ -64,14 +64,14 @@ static void mlx5e_xsk_remove_pool(struct mlx5e_xsk *xsk, u16 ix)
 
 static bool mlx5e_xsk_is_pool_sane(struct xsk_buff_pool *pool)
 {
-	return xsk_umem_get_headroom(pool->umem) <= 0xffff &&
-		xsk_umem_get_chunk_size(pool->umem) <= 0xffff;
+	return xsk_pool_get_headroom(pool) <= 0xffff &&
+		xsk_pool_get_chunk_size(pool) <= 0xffff;
 }
 
 void mlx5e_build_xsk_param(struct xsk_buff_pool *pool, struct mlx5e_xsk_param *xsk)
 {
-	xsk->headroom = xsk_umem_get_headroom(pool->umem);
-	xsk->chunk_size = xsk_umem_get_chunk_size(pool->umem);
+	xsk->headroom = xsk_pool_get_headroom(pool);
+	xsk->chunk_size = xsk_pool_get_chunk_size(pool);
 }
 
 static int mlx5e_xsk_enable_locked(struct mlx5e_priv *priv,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 2b4a3e3..695b993 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -518,7 +518,7 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
 	if (xsk) {
 		err = xdp_rxq_info_reg_mem_model(&rq->xdp_rxq,
 						 MEM_TYPE_XSK_BUFF_POOL, NULL);
-		xsk_buff_set_rxq_info(rq->xsk_pool->umem, &rq->xdp_rxq);
+		xsk_pool_set_rxq_info(rq->xsk_pool, &rq->xdp_rxq);
 	} else {
 		/* Create a page_pool and register it with rxq */
 		pp_params.order     = 0;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 1dcf77d..030f6d7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -390,7 +390,7 @@ static int mlx5e_alloc_rx_wqes(struct mlx5e_rq *rq, u16 ix, u8 wqe_bulk)
 		 * allocating one-by-one, failing and moving frames to the
 		 * Reuse Ring.
 		 */
-		if (unlikely(!xsk_buff_can_alloc(rq->xsk_pool->umem, pages_desired)))
+		if (unlikely(!xsk_buff_can_alloc(rq->xsk_pool, pages_desired)))
 			return -ENOMEM;
 	}
 
@@ -489,7 +489,7 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 	 * one-by-one, failing and moving frames to the Reuse Ring.
 	 */
 	if (rq->xsk_pool &&
-	    unlikely(!xsk_buff_can_alloc(rq->xsk_pool->umem, MLX5_MPWRQ_PAGES_PER_WQE))) {
+	    unlikely(!xsk_buff_can_alloc(rq->xsk_pool, MLX5_MPWRQ_PAGES_PER_WQE))) {
 		err = -ENOMEM;
 		goto err;
 	}
diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index 96bfc5f..6eb9628 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -52,6 +52,7 @@ struct xdp_sock {
 	struct net_device *dev;
 	struct xdp_umem *umem;
 	struct list_head flush_node;
+	struct xsk_buff_pool *pool;
 	u16 queue_id;
 	bool zc;
 	enum {
diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
index 5dc8d3c..a7c7d2e 100644
--- a/include/net/xdp_sock_drv.h
+++ b/include/net/xdp_sock_drv.h
@@ -11,48 +11,50 @@
 
 #ifdef CONFIG_XDP_SOCKETS
 
-void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries);
-bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc);
-void xsk_umem_consume_tx_done(struct xdp_umem *umem);
-struct xsk_buff_pool *xdp_get_xsk_pool_from_qid(struct net_device *dev,
-						u16 queue_id);
-void xsk_set_rx_need_wakeup(struct xdp_umem *umem);
-void xsk_set_tx_need_wakeup(struct xdp_umem *umem);
-void xsk_clear_rx_need_wakeup(struct xdp_umem *umem);
-void xsk_clear_tx_need_wakeup(struct xdp_umem *umem);
-bool xsk_umem_uses_need_wakeup(struct xdp_umem *umem);
+void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries);
+bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc);
+void xsk_tx_release(struct xsk_buff_pool *pool);
+struct xsk_buff_pool *xsk_get_pool_from_qid(struct net_device *dev,
+					    u16 queue_id);
+void xsk_set_rx_need_wakeup(struct xsk_buff_pool *pool);
+void xsk_set_tx_need_wakeup(struct xsk_buff_pool *pool);
+void xsk_clear_rx_need_wakeup(struct xsk_buff_pool *pool);
+void xsk_clear_tx_need_wakeup(struct xsk_buff_pool *pool);
+bool xsk_uses_need_wakeup(struct xsk_buff_pool *pool);
 
-static inline u32 xsk_umem_get_headroom(struct xdp_umem *umem)
+static inline u32 xsk_pool_get_headroom(struct xsk_buff_pool *pool)
 {
-	return XDP_PACKET_HEADROOM + umem->headroom;
+	return XDP_PACKET_HEADROOM + pool->headroom;
 }
 
-static inline u32 xsk_umem_get_chunk_size(struct xdp_umem *umem)
+static inline u32 xsk_pool_get_chunk_size(struct xsk_buff_pool *pool)
 {
-	return umem->chunk_size;
+	return pool->chunk_size;
 }
 
-static inline u32 xsk_umem_get_rx_frame_size(struct xdp_umem *umem)
+static inline u32 xsk_pool_get_rx_frame_size(struct xsk_buff_pool *pool)
 {
-	return xsk_umem_get_chunk_size(umem) - xsk_umem_get_headroom(umem);
+	return xsk_pool_get_chunk_size(pool) - xsk_pool_get_headroom(pool);
 }
 
-static inline void xsk_buff_set_rxq_info(struct xdp_umem *umem,
+static inline void xsk_pool_set_rxq_info(struct xsk_buff_pool *pool,
 					 struct xdp_rxq_info *rxq)
 {
-	xp_set_rxq_info(umem->pool, rxq);
+	xp_set_rxq_info(pool, rxq);
 }
 
-static inline void xsk_buff_dma_unmap(struct xdp_umem *umem,
+static inline void xsk_pool_dma_unmap(struct xsk_buff_pool *pool,
 				      unsigned long attrs)
 {
-	xp_dma_unmap(umem->pool, attrs);
+	xp_dma_unmap(pool, attrs);
 }
 
-static inline int xsk_buff_dma_map(struct xdp_umem *umem, struct device *dev,
-				   unsigned long attrs)
+static inline int xsk_pool_dma_map(struct xsk_buff_pool *pool,
+				   struct device *dev, unsigned long attrs)
 {
-	return xp_dma_map(umem->pool, dev, attrs, umem->pgs, umem->npgs);
+	struct xdp_umem *umem = pool->umem;
+
+	return xp_dma_map(pool, dev, attrs, umem->pgs, umem->npgs);
 }
 
 static inline dma_addr_t xsk_buff_xdp_get_dma(struct xdp_buff *xdp)
@@ -69,14 +71,14 @@ static inline dma_addr_t xsk_buff_xdp_get_frame_dma(struct xdp_buff *xdp)
 	return xp_get_frame_dma(xskb);
 }
 
-static inline struct xdp_buff *xsk_buff_alloc(struct xdp_umem *umem)
+static inline struct xdp_buff *xsk_buff_alloc(struct xsk_buff_pool *pool)
 {
-	return xp_alloc(umem->pool);
+	return xp_alloc(pool);
 }
 
-static inline bool xsk_buff_can_alloc(struct xdp_umem *umem, u32 count)
+static inline bool xsk_buff_can_alloc(struct xsk_buff_pool *pool, u32 count)
 {
-	return xp_can_alloc(umem->pool, count);
+	return xp_can_alloc(pool, count);
 }
 
 static inline void xsk_buff_free(struct xdp_buff *xdp)
@@ -86,14 +88,15 @@ static inline void xsk_buff_free(struct xdp_buff *xdp)
 	xp_free(xskb);
 }
 
-static inline dma_addr_t xsk_buff_raw_get_dma(struct xdp_umem *umem, u64 addr)
+static inline dma_addr_t xsk_buff_raw_get_dma(struct xsk_buff_pool *pool,
+					      u64 addr)
 {
-	return xp_raw_get_dma(umem->pool, addr);
+	return xp_raw_get_dma(pool, addr);
 }
 
-static inline void *xsk_buff_raw_get_data(struct xdp_umem *umem, u64 addr)
+static inline void *xsk_buff_raw_get_data(struct xsk_buff_pool *pool, u64 addr)
 {
-	return xp_raw_get_data(umem->pool, addr);
+	return xp_raw_get_data(pool, addr);
 }
 
 static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp)
@@ -103,83 +106,83 @@ static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp)
 	xp_dma_sync_for_cpu(xskb);
 }
 
-static inline void xsk_buff_raw_dma_sync_for_device(struct xdp_umem *umem,
+static inline void xsk_buff_raw_dma_sync_for_device(struct xsk_buff_pool *pool,
 						    dma_addr_t dma,
 						    size_t size)
 {
-	xp_dma_sync_for_device(umem->pool, dma, size);
+	xp_dma_sync_for_device(pool, dma, size);
 }
 
 #else
 
-static inline void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries)
+static inline void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries)
 {
 }
 
-static inline bool xsk_umem_consume_tx(struct xdp_umem *umem,
-				       struct xdp_desc *desc)
+static inline bool xsk_tx_peek_desc(struct xsk_buff_pool *pool,
+				    struct xdp_desc *desc)
 {
 	return false;
 }
 
-static inline void xsk_umem_consume_tx_done(struct xdp_umem *umem)
+static inline void xsk_tx_release(struct xsk_buff_pool *pool)
 {
 }
 
 static inline struct xsk_buff_pool *
-xdp_get_xsk_pool_from_qid(struct net_device *dev, u16 queue_id)
+xsk_get_pool_from_qid(struct net_device *dev, u16 queue_id)
 {
 	return NULL;
 }
 
-static inline void xsk_set_rx_need_wakeup(struct xdp_umem *umem)
+static inline void xsk_set_rx_need_wakeup(struct xsk_buff_pool *pool)
 {
 }
 
-static inline void xsk_set_tx_need_wakeup(struct xdp_umem *umem)
+static inline void xsk_set_tx_need_wakeup(struct xsk_buff_pool *pool)
 {
 }
 
-static inline void xsk_clear_rx_need_wakeup(struct xdp_umem *umem)
+static inline void xsk_clear_rx_need_wakeup(struct xsk_buff_pool *pool)
 {
 }
 
-static inline void xsk_clear_tx_need_wakeup(struct xdp_umem *umem)
+static inline void xsk_clear_tx_need_wakeup(struct xsk_buff_pool *pool)
 {
 }
 
-static inline bool xsk_umem_uses_need_wakeup(struct xdp_umem *umem)
+static inline bool xsk_uses_need_wakeup(struct xsk_buff_pool *pool)
 {
 	return false;
 }
 
-static inline u32 xsk_umem_get_headroom(struct xdp_umem *umem)
+static inline u32 xsk_pool_get_headroom(struct xsk_buff_pool *pool)
 {
 	return 0;
 }
 
-static inline u32 xsk_umem_get_chunk_size(struct xdp_umem *umem)
+static inline u32 xsk_pool_get_chunk_size(struct xsk_buff_pool *pool)
 {
 	return 0;
 }
 
-static inline u32 xsk_umem_get_rx_frame_size(struct xdp_umem *umem)
+static inline u32 xsk_pool_get_rx_frame_size(struct xsk_buff_pool *pool)
 {
 	return 0;
 }
 
-static inline void xsk_buff_set_rxq_info(struct xdp_umem *umem,
+static inline void xsk_pool_set_rxq_info(struct xsk_buff_pool *pool,
 					 struct xdp_rxq_info *rxq)
 {
 }
 
-static inline void xsk_buff_dma_unmap(struct xdp_umem *umem,
+static inline void xsk_pool_dma_unmap(struct xsk_buff_pool *pool,
 				      unsigned long attrs)
 {
 }
 
-static inline int xsk_buff_dma_map(struct xdp_umem *umem, struct device *dev,
-				   unsigned long attrs)
+static inline int xsk_pool_dma_map(struct xsk_buff_pool *pool,
+				   struct device *dev, unsigned long attrs)
 {
 	return 0;
 }
@@ -194,12 +197,12 @@ static inline dma_addr_t xsk_buff_xdp_get_frame_dma(struct xdp_buff *xdp)
 	return 0;
 }
 
-static inline struct xdp_buff *xsk_buff_alloc(struct xdp_umem *umem)
+static inline struct xdp_buff *xsk_buff_alloc(struct xsk_buff_pool *pool)
 {
 	return NULL;
 }
 
-static inline bool xsk_buff_can_alloc(struct xdp_umem *umem, u32 count)
+static inline bool xsk_buff_can_alloc(struct xsk_buff_pool *pool, u32 count)
 {
 	return false;
 }
@@ -208,12 +211,13 @@ static inline void xsk_buff_free(struct xdp_buff *xdp)
 {
 }
 
-static inline dma_addr_t xsk_buff_raw_get_dma(struct xdp_umem *umem, u64 addr)
+static inline dma_addr_t xsk_buff_raw_get_dma(struct xsk_buff_pool *pool,
+					      u64 addr)
 {
 	return 0;
 }
 
-static inline void *xsk_buff_raw_get_data(struct xdp_umem *umem, u64 addr)
+static inline void *xsk_buff_raw_get_data(struct xsk_buff_pool *pool, u64 addr)
 {
 	return NULL;
 }
@@ -222,7 +226,7 @@ static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp)
 {
 }
 
-static inline void xsk_buff_raw_dma_sync_for_device(struct xdp_umem *umem,
+static inline void xsk_buff_raw_dma_sync_for_device(struct xsk_buff_pool *pool,
 						    dma_addr_t dma,
 						    size_t size)
 {
diff --git a/net/ethtool/channels.c b/net/ethtool/channels.c
index 78d990b..9ecda09 100644
--- a/net/ethtool/channels.c
+++ b/net/ethtool/channels.c
@@ -223,7 +223,7 @@ int ethnl_set_channels(struct sk_buff *skb, struct genl_info *info)
 	from_channel = channels.combined_count +
 		       min(channels.rx_count, channels.tx_count);
 	for (i = from_channel; i < old_total; i++)
-		if (xdp_get_xsk_pool_from_qid(dev, i)) {
+		if (xsk_get_pool_from_qid(dev, i)) {
 			GENL_SET_ERR_MSG(info, "requested channel counts are too low for existing zerocopy AF_XDP sockets");
 			return -EINVAL;
 		}
diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
index 91de16d..2d94306 100644
--- a/net/ethtool/ioctl.c
+++ b/net/ethtool/ioctl.c
@@ -1702,7 +1702,7 @@ static noinline_for_stack int ethtool_set_channels(struct net_device *dev,
 		min(channels.rx_count, channels.tx_count);
 	to_channel = curr.combined_count + max(curr.rx_count, curr.tx_count);
 	for (i = from_channel; i < to_channel; i++)
-		if (xdp_get_xsk_pool_from_qid(dev, i))
+		if (xsk_get_pool_from_qid(dev, i))
 			return -EINVAL;
 
 	ret = dev->ethtool_ops->set_channels(dev, &channels);
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 0b5f3b0..adde4d5 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -51,9 +51,9 @@ void xdp_del_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs)
  * not know if the device has more tx queues than rx, or the opposite.
  * This might also change during run time.
  */
-static int xdp_reg_xsk_pool_at_qid(struct net_device *dev,
-				   struct xsk_buff_pool *pool,
-				   u16 queue_id)
+static int xsk_reg_pool_at_qid(struct net_device *dev,
+			       struct xsk_buff_pool *pool,
+			       u16 queue_id)
 {
 	if (queue_id >= max_t(unsigned int,
 			      dev->real_num_rx_queues,
@@ -68,8 +68,8 @@ static int xdp_reg_xsk_pool_at_qid(struct net_device *dev,
 	return 0;
 }
 
-struct xsk_buff_pool *xdp_get_xsk_pool_from_qid(struct net_device *dev,
-						u16 queue_id)
+struct xsk_buff_pool *xsk_get_pool_from_qid(struct net_device *dev,
+					    u16 queue_id)
 {
 	if (queue_id < dev->real_num_rx_queues)
 		return dev->_rx[queue_id].pool;
@@ -78,9 +78,9 @@ struct xsk_buff_pool *xdp_get_xsk_pool_from_qid(struct net_device *dev,
 
 	return NULL;
 }
-EXPORT_SYMBOL(xdp_get_xsk_pool_from_qid);
+EXPORT_SYMBOL(xsk_get_pool_from_qid);
 
-static void xdp_clear_xsk_pool_at_qid(struct net_device *dev, u16 queue_id)
+static void xsk_clear_pool_at_qid(struct net_device *dev, u16 queue_id)
 {
 	if (queue_id < dev->real_num_rx_queues)
 		dev->_rx[queue_id].pool = NULL;
@@ -103,10 +103,10 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
 	if (force_zc && force_copy)
 		return -EINVAL;
 
-	if (xdp_get_xsk_pool_from_qid(dev, queue_id))
+	if (xsk_get_pool_from_qid(dev, queue_id))
 		return -EBUSY;
 
-	err = xdp_reg_xsk_pool_at_qid(dev, umem->pool, queue_id);
+	err = xsk_reg_pool_at_qid(dev, umem->pool, queue_id);
 	if (err)
 		return err;
 
@@ -119,7 +119,7 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
 		 * Also for supporting drivers that do not implement this
 		 * feature. They will always have to call sendto().
 		 */
-		xsk_set_tx_need_wakeup(umem);
+		xsk_set_tx_need_wakeup(umem->pool);
 	}
 
 	dev_hold(dev);
@@ -148,7 +148,7 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
 	if (!force_zc)
 		err = 0; /* fallback to copy mode */
 	if (err)
-		xdp_clear_xsk_pool_at_qid(dev, queue_id);
+		xsk_clear_pool_at_qid(dev, queue_id);
 	return err;
 }
 
@@ -173,7 +173,7 @@ void xdp_umem_clear_dev(struct xdp_umem *umem)
 			WARN(1, "failed to disable umem!\n");
 	}
 
-	xdp_clear_xsk_pool_at_qid(umem->dev, umem->queue_id);
+	xsk_clear_pool_at_qid(umem->dev, umem->queue_id);
 
 	dev_put(umem->dev);
 	umem->dev = NULL;
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 3700266..7551f5b 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -39,8 +39,10 @@ bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs)
 		READ_ONCE(xs->umem->fq);
 }
 
-void xsk_set_rx_need_wakeup(struct xdp_umem *umem)
+void xsk_set_rx_need_wakeup(struct xsk_buff_pool *pool)
 {
+	struct xdp_umem *umem = pool->umem;
+
 	if (umem->need_wakeup & XDP_WAKEUP_RX)
 		return;
 
@@ -49,8 +51,9 @@ void xsk_set_rx_need_wakeup(struct xdp_umem *umem)
 }
 EXPORT_SYMBOL(xsk_set_rx_need_wakeup);
 
-void xsk_set_tx_need_wakeup(struct xdp_umem *umem)
+void xsk_set_tx_need_wakeup(struct xsk_buff_pool *pool)
 {
+	struct xdp_umem *umem = pool->umem;
 	struct xdp_sock *xs;
 
 	if (umem->need_wakeup & XDP_WAKEUP_TX)
@@ -66,8 +69,10 @@ void xsk_set_tx_need_wakeup(struct xdp_umem *umem)
 }
 EXPORT_SYMBOL(xsk_set_tx_need_wakeup);
 
-void xsk_clear_rx_need_wakeup(struct xdp_umem *umem)
+void xsk_clear_rx_need_wakeup(struct xsk_buff_pool *pool)
 {
+	struct xdp_umem *umem = pool->umem;
+
 	if (!(umem->need_wakeup & XDP_WAKEUP_RX))
 		return;
 
@@ -76,8 +81,9 @@ void xsk_clear_rx_need_wakeup(struct xdp_umem *umem)
 }
 EXPORT_SYMBOL(xsk_clear_rx_need_wakeup);
 
-void xsk_clear_tx_need_wakeup(struct xdp_umem *umem)
+void xsk_clear_tx_need_wakeup(struct xsk_buff_pool *pool)
 {
+	struct xdp_umem *umem = pool->umem;
 	struct xdp_sock *xs;
 
 	if (!(umem->need_wakeup & XDP_WAKEUP_TX))
@@ -93,11 +99,11 @@ void xsk_clear_tx_need_wakeup(struct xdp_umem *umem)
 }
 EXPORT_SYMBOL(xsk_clear_tx_need_wakeup);
 
-bool xsk_umem_uses_need_wakeup(struct xdp_umem *umem)
+bool xsk_uses_need_wakeup(struct xsk_buff_pool *pool)
 {
-	return umem->flags & XDP_UMEM_USES_NEED_WAKEUP;
+	return pool->umem->flags & XDP_UMEM_USES_NEED_WAKEUP;
 }
-EXPORT_SYMBOL(xsk_umem_uses_need_wakeup);
+EXPORT_SYMBOL(xsk_uses_need_wakeup);
 
 void xp_release(struct xdp_buff_xsk *xskb)
 {
@@ -155,12 +161,12 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len,
 	struct xdp_buff *xsk_xdp;
 	int err;
 
-	if (len > xsk_umem_get_rx_frame_size(xs->umem)) {
+	if (len > xsk_pool_get_rx_frame_size(xs->pool)) {
 		xs->rx_dropped++;
 		return -ENOSPC;
 	}
 
-	xsk_xdp = xsk_buff_alloc(xs->umem);
+	xsk_xdp = xsk_buff_alloc(xs->pool);
 	if (!xsk_xdp) {
 		xs->rx_dropped++;
 		return -ENOSPC;
@@ -249,27 +255,28 @@ void __xsk_map_flush(void)
 	}
 }
 
-void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries)
+void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries)
 {
-	xskq_prod_submit_n(umem->cq, nb_entries);
+	xskq_prod_submit_n(pool->umem->cq, nb_entries);
 }
-EXPORT_SYMBOL(xsk_umem_complete_tx);
+EXPORT_SYMBOL(xsk_tx_completed);
 
-void xsk_umem_consume_tx_done(struct xdp_umem *umem)
+void xsk_tx_release(struct xsk_buff_pool *pool)
 {
 	struct xdp_sock *xs;
 
 	rcu_read_lock();
-	list_for_each_entry_rcu(xs, &umem->xsk_tx_list, list) {
+	list_for_each_entry_rcu(xs, &pool->umem->xsk_tx_list, list) {
 		__xskq_cons_release(xs->tx);
 		xs->sk.sk_write_space(&xs->sk);
 	}
 	rcu_read_unlock();
 }
-EXPORT_SYMBOL(xsk_umem_consume_tx_done);
+EXPORT_SYMBOL(xsk_tx_release);
 
-bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc)
+bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc)
 {
+	struct xdp_umem *umem = pool->umem;
 	struct xdp_sock *xs;
 
 	rcu_read_lock();
@@ -294,7 +301,7 @@ bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc)
 	rcu_read_unlock();
 	return false;
 }
-EXPORT_SYMBOL(xsk_umem_consume_tx);
+EXPORT_SYMBOL(xsk_tx_peek_desc);
 
 static int xsk_wakeup(struct xdp_sock *xs, u8 flags)
 {
@@ -357,7 +364,7 @@ static int xsk_generic_xmit(struct sock *sk)
 
 		skb_put(skb, len);
 		addr = desc.addr;
-		buffer = xsk_buff_raw_get_data(xs->umem, addr);
+		buffer = xsk_buff_raw_get_data(xs->pool, addr);
 		err = skb_store_bits(skb, 0, buffer, len);
 		/* This is the backpressure mechanism for the Tx path.
 		 * Reserve space in the completion queue and only proceed
@@ -758,6 +765,8 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname,
 			return PTR_ERR(umem);
 		}
 
+		xs->pool = umem->pool;
+
 		/* Make sure umem is ready before it can be seen by others */
 		smp_wmb();
 		WRITE_ONCE(xs->umem, umem);
-- 
2.7.4



* [PATCH bpf-next 03/14] xsk: create and free context independently from umem
  2020-07-02 12:18 [PATCH bpf-next 00/14] xsk: support shared umems between devices and queues Magnus Karlsson
  2020-07-02 12:19 ` [PATCH bpf-next 01/14] xsk: i40e: ice: ixgbe: mlx5: pass buffer pool to driver instead of umem Magnus Karlsson
  2020-07-02 12:19 ` [PATCH bpf-next 02/14] xsk: i40e: ice: ixgbe: mlx5: rename xsk zero-copy driver interfaces Magnus Karlsson
@ 2020-07-02 12:19 ` Magnus Karlsson
  2020-07-08 15:00   ` Maxim Mikityanskiy
  2020-07-02 12:19 ` [PATCH bpf-next 04/14] xsk: move fill and completion rings to buffer pool Magnus Karlsson
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 25+ messages in thread
From: Magnus Karlsson @ 2020-07-02 12:19 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev,
	jonathan.lemon, maximmi
  Cc: bpf, jeffrey.t.kirsher, maciej.fijalkowski,
	maciejromanfijalkowski, cristian.dumitrescu

Create and free the buffer pool independently from the umem. Move
these operations that are performed on the buffer pool from the
umem create and destroy functions to new create and destroy
functions just for the buffer pool. This so that in later commits
we can instantiate multiple buffer pools per umem when sharing a
umem between HW queues and/or devices. We also eradicate the
back pointer from the umem to the buffer pool as this will not
work when we introduce the possibility to have multiple buffer
pools per umem.

It might seem a bit odd that we create an empty buffer pool first
and then recreate it at its proper size when we bind to a device
and umem. But in later commits the buffer pool will be used to
carry information before it has been assigned to a umem and before
its size has been decided.
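
For reference, a condensed sketch of the call flow that results from
this patch (error handling and locking elided; all functions named
here appear in the diff below):

  /* socket creation: allocate an empty, unsized buffer pool */
  xs->pool = xp_create();

  /* bind with this socket's own umem: size the pool from the umem,
   * then register it with the device and queue id
   */
  xdp_umem_assign_dev(xs->umem, dev, qid);
  new_pool = xp_assign_umem(xs->pool, xs->umem);
  err = xp_assign_dev(new_pool, xs, dev, qid, flags);
  xs->pool = new_pool;

  /* bind with XDP_SHARED_UMEM: adopt the other socket's pool and
   * destroy the empty one allocated at socket creation
   */
  xp_get_pool(umem_xs->pool);
  curr_pool = xs->pool;
  xs->pool = umem_xs->pool;
  xp_destroy(curr_pool);

  /* socket destruction: the last reference triggers the deferred
   * teardown: xp_clear_dev(), xdp_put_umem() and xp_destroy()
   */
  xp_put_pool(xs->pool);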

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 include/net/xdp_sock.h      |   3 +-
 include/net/xsk_buff_pool.h |  14 +++-
 net/xdp/xdp_umem.c          | 164 ++++----------------------------------------
 net/xdp/xdp_umem.h          |   4 +-
 net/xdp/xsk.c               |  83 +++++++++++++++++++---
 net/xdp/xsk.h               |   3 +
 net/xdp/xsk_buff_pool.c     | 154 +++++++++++++++++++++++++++++++++++++----
 net/xdp/xsk_queue.h         |  12 ++--
 8 files changed, 250 insertions(+), 187 deletions(-)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index 6eb9628..b9bb118 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -20,13 +20,12 @@ struct xdp_buff;
 struct xdp_umem {
 	struct xsk_queue *fq;
 	struct xsk_queue *cq;
-	struct xsk_buff_pool *pool;
 	u64 size;
 	u32 headroom;
 	u32 chunk_size;
+	u32 chunks;
 	struct user_struct *user;
 	refcount_t users;
-	struct work_struct work;
 	struct page **pgs;
 	u32 npgs;
 	u16 queue_id;
diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index a6dec9c..cda8ced 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -14,6 +14,7 @@ struct xdp_rxq_info;
 struct xsk_queue;
 struct xdp_desc;
 struct xdp_umem;
+struct xdp_sock;
 struct device;
 struct page;
 
@@ -46,16 +47,23 @@ struct xsk_buff_pool {
 	struct xdp_umem *umem;
 	void *addrs;
 	struct device *dev;
+	refcount_t users;
+	struct work_struct work;
 	struct xdp_buff_xsk *free_heads[];
 };
 
 /* AF_XDP core. */
-struct xsk_buff_pool *xp_create(struct xdp_umem *umem, u32 chunks,
-				u32 chunk_size, u32 headroom, u64 size,
-				bool unaligned);
+struct xsk_buff_pool *xp_create(void);
+struct xsk_buff_pool *xp_assign_umem(struct xsk_buff_pool *pool,
+				     struct xdp_umem *umem);
+int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs,
+		  struct net_device *dev, u16 queue_id, u16 flags);
 void xp_set_fq(struct xsk_buff_pool *pool, struct xsk_queue *fq);
 void xp_destroy(struct xsk_buff_pool *pool);
 void xp_release(struct xdp_buff_xsk *xskb);
+void xp_get_pool(struct xsk_buff_pool *pool);
+void xp_put_pool(struct xsk_buff_pool *pool);
+void xp_clear_dev(struct xsk_buff_pool *pool);
 
 /* AF_XDP, and XDP core. */
 void xp_free(struct xdp_buff_xsk *xskb);
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index adde4d5..f290345 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -47,160 +47,41 @@ void xdp_del_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs)
 	spin_unlock_irqrestore(&umem->xsk_tx_list_lock, flags);
 }
 
-/* The umem is stored both in the _rx struct and the _tx struct as we do
- * not know if the device has more tx queues than rx, or the opposite.
- * This might also change during run time.
- */
-static int xsk_reg_pool_at_qid(struct net_device *dev,
-			       struct xsk_buff_pool *pool,
-			       u16 queue_id)
-{
-	if (queue_id >= max_t(unsigned int,
-			      dev->real_num_rx_queues,
-			      dev->real_num_tx_queues))
-		return -EINVAL;
-
-	if (queue_id < dev->real_num_rx_queues)
-		dev->_rx[queue_id].pool = pool;
-	if (queue_id < dev->real_num_tx_queues)
-		dev->_tx[queue_id].pool = pool;
-
-	return 0;
-}
-
-struct xsk_buff_pool *xsk_get_pool_from_qid(struct net_device *dev,
-					    u16 queue_id)
+static void xdp_umem_unpin_pages(struct xdp_umem *umem)
 {
-	if (queue_id < dev->real_num_rx_queues)
-		return dev->_rx[queue_id].pool;
-	if (queue_id < dev->real_num_tx_queues)
-		return dev->_tx[queue_id].pool;
+	unpin_user_pages_dirty_lock(umem->pgs, umem->npgs, true);
 
-	return NULL;
+	kfree(umem->pgs);
+	umem->pgs = NULL;
 }
-EXPORT_SYMBOL(xsk_get_pool_from_qid);
 
-static void xsk_clear_pool_at_qid(struct net_device *dev, u16 queue_id)
+static void xdp_umem_unaccount_pages(struct xdp_umem *umem)
 {
-	if (queue_id < dev->real_num_rx_queues)
-		dev->_rx[queue_id].pool = NULL;
-	if (queue_id < dev->real_num_tx_queues)
-		dev->_tx[queue_id].pool = NULL;
+	if (umem->user) {
+		atomic_long_sub(umem->npgs, &umem->user->locked_vm);
+		free_uid(umem->user);
+	}
 }
 
-int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
-			u16 queue_id, u16 flags)
+void xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
+			 u16 queue_id)
 {
-	bool force_zc, force_copy;
-	struct netdev_bpf bpf;
-	int err = 0;
-
-	ASSERT_RTNL();
-
-	force_zc = flags & XDP_ZEROCOPY;
-	force_copy = flags & XDP_COPY;
-
-	if (force_zc && force_copy)
-		return -EINVAL;
-
-	if (xsk_get_pool_from_qid(dev, queue_id))
-		return -EBUSY;
-
-	err = xsk_reg_pool_at_qid(dev, umem->pool, queue_id);
-	if (err)
-		return err;
-
 	umem->dev = dev;
 	umem->queue_id = queue_id;
 
-	if (flags & XDP_USE_NEED_WAKEUP) {
-		umem->flags |= XDP_UMEM_USES_NEED_WAKEUP;
-		/* Tx needs to be explicitly woken up the first time.
-		 * Also for supporting drivers that do not implement this
-		 * feature. They will always have to call sendto().
-		 */
-		xsk_set_tx_need_wakeup(umem->pool);
-	}
-
 	dev_hold(dev);
-
-	if (force_copy)
-		/* For copy-mode, we are done. */
-		return 0;
-
-	if (!dev->netdev_ops->ndo_bpf || !dev->netdev_ops->ndo_xsk_wakeup) {
-		err = -EOPNOTSUPP;
-		goto err_unreg_umem;
-	}
-
-	bpf.command = XDP_SETUP_XSK_POOL;
-	bpf.xsk.pool = umem->pool;
-	bpf.xsk.queue_id = queue_id;
-
-	err = dev->netdev_ops->ndo_bpf(dev, &bpf);
-	if (err)
-		goto err_unreg_umem;
-
-	umem->zc = true;
-	return 0;
-
-err_unreg_umem:
-	if (!force_zc)
-		err = 0; /* fallback to copy mode */
-	if (err)
-		xsk_clear_pool_at_qid(dev, queue_id);
-	return err;
 }
 
 void xdp_umem_clear_dev(struct xdp_umem *umem)
 {
-	struct netdev_bpf bpf;
-	int err;
-
-	ASSERT_RTNL();
-
-	if (!umem->dev)
-		return;
-
-	if (umem->zc) {
-		bpf.command = XDP_SETUP_XSK_POOL;
-		bpf.xsk.pool = NULL;
-		bpf.xsk.queue_id = umem->queue_id;
-
-		err = umem->dev->netdev_ops->ndo_bpf(umem->dev, &bpf);
-
-		if (err)
-			WARN(1, "failed to disable umem!\n");
-	}
-
-	xsk_clear_pool_at_qid(umem->dev, umem->queue_id);
-
 	dev_put(umem->dev);
 	umem->dev = NULL;
 	umem->zc = false;
 }
 
-static void xdp_umem_unpin_pages(struct xdp_umem *umem)
-{
-	unpin_user_pages_dirty_lock(umem->pgs, umem->npgs, true);
-
-	kfree(umem->pgs);
-	umem->pgs = NULL;
-}
-
-static void xdp_umem_unaccount_pages(struct xdp_umem *umem)
-{
-	if (umem->user) {
-		atomic_long_sub(umem->npgs, &umem->user->locked_vm);
-		free_uid(umem->user);
-	}
-}
-
 static void xdp_umem_release(struct xdp_umem *umem)
 {
-	rtnl_lock();
 	xdp_umem_clear_dev(umem);
-	rtnl_unlock();
 
 	ida_simple_remove(&umem_ida, umem->id);
 
@@ -214,20 +95,12 @@ static void xdp_umem_release(struct xdp_umem *umem)
 		umem->cq = NULL;
 	}
 
-	xp_destroy(umem->pool);
 	xdp_umem_unpin_pages(umem);
 
 	xdp_umem_unaccount_pages(umem);
 	kfree(umem);
 }
 
-static void xdp_umem_release_deferred(struct work_struct *work)
-{
-	struct xdp_umem *umem = container_of(work, struct xdp_umem, work);
-
-	xdp_umem_release(umem);
-}
-
 void xdp_get_umem(struct xdp_umem *umem)
 {
 	refcount_inc(&umem->users);
@@ -238,10 +111,8 @@ void xdp_put_umem(struct xdp_umem *umem)
 	if (!umem)
 		return;
 
-	if (refcount_dec_and_test(&umem->users)) {
-		INIT_WORK(&umem->work, xdp_umem_release_deferred);
-		schedule_work(&umem->work);
-	}
+	if (refcount_dec_and_test(&umem->users))
+		xdp_umem_release(umem);
 }
 
 static int xdp_umem_pin_pages(struct xdp_umem *umem, unsigned long address)
@@ -357,6 +228,7 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 	umem->size = size;
 	umem->headroom = headroom;
 	umem->chunk_size = chunk_size;
+	umem->chunks = chunks;
 	umem->npgs = (u32)npgs;
 	umem->pgs = NULL;
 	umem->user = NULL;
@@ -374,16 +246,8 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 	if (err)
 		goto out_account;
 
-	umem->pool = xp_create(umem, chunks, chunk_size, headroom, size,
-			       unaligned_chunks);
-	if (!umem->pool) {
-		err = -ENOMEM;
-		goto out_pin;
-	}
 	return 0;
 
-out_pin:
-	xdp_umem_unpin_pages(umem);
 out_account:
 	xdp_umem_unaccount_pages(umem);
 	return err;
diff --git a/net/xdp/xdp_umem.h b/net/xdp/xdp_umem.h
index 32067fe..93e96be 100644
--- a/net/xdp/xdp_umem.h
+++ b/net/xdp/xdp_umem.h
@@ -8,8 +8,8 @@
 
 #include <net/xdp_sock_drv.h>
 
-int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
-			u16 queue_id, u16 flags);
+void xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
+			 u16 queue_id);
 void xdp_umem_clear_dev(struct xdp_umem *umem);
 bool xdp_umem_validate_queues(struct xdp_umem *umem);
 void xdp_get_umem(struct xdp_umem *umem);
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 7551f5b..b12a832 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -105,6 +105,46 @@ bool xsk_uses_need_wakeup(struct xsk_buff_pool *pool)
 }
 EXPORT_SYMBOL(xsk_uses_need_wakeup);
 
+struct xsk_buff_pool *xsk_get_pool_from_qid(struct net_device *dev,
+					    u16 queue_id)
+{
+	if (queue_id < dev->real_num_rx_queues)
+		return dev->_rx[queue_id].pool;
+	if (queue_id < dev->real_num_tx_queues)
+		return dev->_tx[queue_id].pool;
+
+	return NULL;
+}
+EXPORT_SYMBOL(xsk_get_pool_from_qid);
+
+void xsk_clear_pool_at_qid(struct net_device *dev, u16 queue_id)
+{
+	if (queue_id < dev->real_num_rx_queues)
+		dev->_rx[queue_id].pool = NULL;
+	if (queue_id < dev->real_num_tx_queues)
+		dev->_tx[queue_id].pool = NULL;
+}
+
+/* The buffer pool is stored both in the _rx struct and the _tx struct as we do
+ * not know if the device has more tx queues than rx, or the opposite.
+ * This might also change during run time.
+ */
+int xsk_reg_pool_at_qid(struct net_device *dev, struct xsk_buff_pool *pool,
+			u16 queue_id)
+{
+	if (queue_id >= max_t(unsigned int,
+			      dev->real_num_rx_queues,
+			      dev->real_num_tx_queues))
+		return -EINVAL;
+
+	if (queue_id < dev->real_num_rx_queues)
+		dev->_rx[queue_id].pool = pool;
+	if (queue_id < dev->real_num_tx_queues)
+		dev->_tx[queue_id].pool = pool;
+
+	return 0;
+}
+
 void xp_release(struct xdp_buff_xsk *xskb)
 {
 	xskb->pool->free_heads[xskb->pool->free_heads_cnt++] = xskb;
@@ -281,7 +321,7 @@ bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc)
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(xs, &umem->xsk_tx_list, list) {
-		if (!xskq_cons_peek_desc(xs->tx, desc, umem))
+		if (!xskq_cons_peek_desc(xs->tx, desc, pool))
 			continue;
 
 		/* This is the backpressure mechanism for the Tx path.
@@ -347,7 +387,7 @@ static int xsk_generic_xmit(struct sock *sk)
 	if (xs->queue_id >= xs->dev->real_num_tx_queues)
 		goto out;
 
-	while (xskq_cons_peek_desc(xs->tx, &desc, xs->umem)) {
+	while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) {
 		char *buffer;
 		u64 addr;
 		u32 len;
@@ -629,6 +669,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 	qid = sxdp->sxdp_queue_id;
 
 	if (flags & XDP_SHARED_UMEM) {
+		struct xsk_buff_pool *curr_pool;
 		struct xdp_sock *umem_xs;
 		struct socket *sock;
 
@@ -663,6 +704,11 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 			goto out_unlock;
 		}
 
+		/* Share the buffer pool with the other socket. */
+		xp_get_pool(umem_xs->pool);
+		curr_pool = xs->pool;
+		xs->pool = umem_xs->pool;
+		xp_destroy(curr_pool);
 		xdp_get_umem(umem_xs->umem);
 		WRITE_ONCE(xs->umem, umem_xs->umem);
 		sockfd_put(sock);
@@ -670,10 +716,24 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 		err = -EINVAL;
 		goto out_unlock;
 	} else {
+		struct xsk_buff_pool *new_pool;
+
 		/* This xsk has its own umem. */
-		err = xdp_umem_assign_dev(xs->umem, dev, qid, flags);
-		if (err)
+		xdp_umem_assign_dev(xs->umem, dev, qid);
+		new_pool = xp_assign_umem(xs->pool, xs->umem);
+		if (!new_pool) {
+			err = -ENOMEM;
+			xdp_umem_clear_dev(xs->umem);
+			goto out_unlock;
+		}
+
+		err = xp_assign_dev(new_pool, xs, dev, qid, flags);
+		if (err) {
+			xp_destroy(new_pool);
+			xdp_umem_clear_dev(xs->umem);
 			goto out_unlock;
+		}
+		xs->pool = new_pool;
 	}
 
 	xs->dev = dev;
@@ -765,8 +825,6 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname,
 			return PTR_ERR(umem);
 		}
 
-		xs->pool = umem->pool;
-
 		/* Make sure umem is ready before it can be seen by others */
 		smp_wmb();
 		WRITE_ONCE(xs->umem, umem);
@@ -796,7 +854,7 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname,
 			&xs->umem->cq;
 		err = xsk_init_queue(entries, q, true);
 		if (optname == XDP_UMEM_FILL_RING)
-			xp_set_fq(xs->umem->pool, *q);
+			xp_set_fq(xs->pool, *q);
 		mutex_unlock(&xs->mutex);
 		return err;
 	}
@@ -1002,7 +1060,8 @@ static int xsk_notifier(struct notifier_block *this,
 
 				xsk_unbind_dev(xs);
 
-				/* Clear device references in umem. */
+				/* Clear device references. */
+				xp_clear_dev(xs->pool);
 				xdp_umem_clear_dev(xs->umem);
 			}
 			mutex_unlock(&xs->mutex);
@@ -1047,7 +1106,7 @@ static void xsk_destruct(struct sock *sk)
 	if (!sock_flag(sk, SOCK_DEAD))
 		return;
 
-	xdp_put_umem(xs->umem);
+	xp_put_pool(xs->pool);
 
 	sk_refcnt_debug_dec(sk);
 }
@@ -1055,8 +1114,8 @@ static void xsk_destruct(struct sock *sk)
 static int xsk_create(struct net *net, struct socket *sock, int protocol,
 		      int kern)
 {
-	struct sock *sk;
 	struct xdp_sock *xs;
+	struct sock *sk;
 
 	if (!ns_capable(net->user_ns, CAP_NET_RAW))
 		return -EPERM;
@@ -1092,6 +1151,10 @@ static int xsk_create(struct net *net, struct socket *sock, int protocol,
 	INIT_LIST_HEAD(&xs->map_list);
 	spin_lock_init(&xs->map_list_lock);
 
+	xs->pool = xp_create();
+	if (!xs->pool)
+		return -ENOMEM;
+
 	mutex_lock(&net->xdp.lock);
 	sk_add_node_rcu(sk, &net->xdp.list);
 	mutex_unlock(&net->xdp.lock);
diff --git a/net/xdp/xsk.h b/net/xdp/xsk.h
index 455ddd4..a00e3e2 100644
--- a/net/xdp/xsk.h
+++ b/net/xdp/xsk.h
@@ -51,5 +51,8 @@ void xsk_map_try_sock_delete(struct xsk_map *map, struct xdp_sock *xs,
 			     struct xdp_sock **map_entry);
 int xsk_map_inc(struct xsk_map *map);
 void xsk_map_put(struct xsk_map *map);
+void xsk_clear_pool_at_qid(struct net_device *dev, u16 queue_id);
+int xsk_reg_pool_at_qid(struct net_device *dev, struct xsk_buff_pool *pool,
+			u16 queue_id);
 
 #endif /* XSK_H_ */
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index c57f0bb..da93b36 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -2,11 +2,14 @@
 
 #include <net/xsk_buff_pool.h>
 #include <net/xdp_sock.h>
+#include <net/xdp_sock_drv.h>
 #include <linux/dma-direct.h>
 #include <linux/dma-noncoherent.h>
 #include <linux/swiotlb.h>
 
 #include "xsk_queue.h"
+#include "xdp_umem.h"
+#include "xsk.h"
 
 static void xp_addr_unmap(struct xsk_buff_pool *pool)
 {
@@ -32,39 +35,48 @@ void xp_destroy(struct xsk_buff_pool *pool)
 	kvfree(pool);
 }
 
-struct xsk_buff_pool *xp_create(struct xdp_umem *umem, u32 chunks,
-				u32 chunk_size, u32 headroom, u64 size,
-				bool unaligned)
+struct xsk_buff_pool *xp_create(void)
+{
+	return kvzalloc(sizeof(struct xsk_buff_pool), GFP_KERNEL);
+}
+
+struct xsk_buff_pool *xp_assign_umem(struct xsk_buff_pool *pool_old,
+				     struct xdp_umem *umem)
 {
 	struct xsk_buff_pool *pool;
 	struct xdp_buff_xsk *xskb;
 	int err;
 	u32 i;
 
-	pool = kvzalloc(struct_size(pool, free_heads, chunks), GFP_KERNEL);
+	pool = kvzalloc(struct_size(pool, free_heads, umem->chunks),
+			GFP_KERNEL);
 	if (!pool)
 		goto out;
 
-	pool->heads = kvcalloc(chunks, sizeof(*pool->heads), GFP_KERNEL);
+	memcpy(pool, pool_old, sizeof(*pool_old));
+
+	pool->heads = kvcalloc(umem->chunks, sizeof(*pool->heads), GFP_KERNEL);
 	if (!pool->heads)
 		goto out;
 
-	pool->chunk_mask = ~((u64)chunk_size - 1);
-	pool->addrs_cnt = size;
-	pool->heads_cnt = chunks;
-	pool->free_heads_cnt = chunks;
-	pool->headroom = headroom;
-	pool->chunk_size = chunk_size;
+	pool->chunk_mask = ~((u64)umem->chunk_size - 1);
+	pool->addrs_cnt = umem->size;
+	pool->heads_cnt = umem->chunks;
+	pool->free_heads_cnt = umem->chunks;
+	pool->headroom = umem->headroom;
+	pool->chunk_size = umem->chunk_size;
 	pool->cheap_dma = true;
-	pool->unaligned = unaligned;
-	pool->frame_len = chunk_size - headroom - XDP_PACKET_HEADROOM;
+	pool->unaligned = umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG;
+	pool->frame_len = umem->chunk_size - umem->headroom -
+		XDP_PACKET_HEADROOM;
 	pool->umem = umem;
 	INIT_LIST_HEAD(&pool->free_list);
+	refcount_set(&pool->users, 1);
 
 	for (i = 0; i < pool->free_heads_cnt; i++) {
 		xskb = &pool->heads[i];
 		xskb->pool = pool;
-		xskb->xdp.frame_sz = chunk_size - headroom;
+		xskb->xdp.frame_sz = umem->chunk_size - umem->headroom;
 		pool->free_heads[i] = xskb;
 	}
 
@@ -91,6 +103,120 @@ void xp_set_rxq_info(struct xsk_buff_pool *pool, struct xdp_rxq_info *rxq)
 }
 EXPORT_SYMBOL(xp_set_rxq_info);
 
+int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs,
+		  struct net_device *dev, u16 queue_id, u16 flags)
+{
+	struct xdp_umem *umem = pool->umem;
+	bool force_zc, force_copy;
+	struct netdev_bpf bpf;
+	int err = 0;
+
+	ASSERT_RTNL();
+
+	force_zc = flags & XDP_ZEROCOPY;
+	force_copy = flags & XDP_COPY;
+
+	if (force_zc && force_copy)
+		return -EINVAL;
+
+	if (xsk_get_pool_from_qid(dev, queue_id))
+		return -EBUSY;
+
+	err = xsk_reg_pool_at_qid(dev, pool, queue_id);
+	if (err)
+		return err;
+
+	if ((flags & XDP_USE_NEED_WAKEUP) && xs->tx) {
+		umem->flags |= XDP_UMEM_USES_NEED_WAKEUP;
+		/* Tx needs to be explicitly woken up the first time.
+		 * Also for supporting drivers that do not implement this
+		 * feature. They will always have to call sendto().
+		 */
+		xs->tx->ring->flags |= XDP_RING_NEED_WAKEUP;
+	}
+
+	if (force_copy)
+		/* For copy-mode, we are done. */
+		return 0;
+
+	if (!dev->netdev_ops->ndo_bpf || !dev->netdev_ops->ndo_xsk_wakeup) {
+		err = -EOPNOTSUPP;
+		goto err_unreg_pool;
+	}
+
+	bpf.command = XDP_SETUP_XSK_POOL;
+	bpf.xsk.pool = pool;
+	bpf.xsk.queue_id = queue_id;
+
+	err = dev->netdev_ops->ndo_bpf(dev, &bpf);
+	if (err)
+		goto err_unreg_pool;
+
+	umem->zc = true;
+	return 0;
+
+err_unreg_pool:
+	if (!force_zc)
+		err = 0; /* fallback to copy mode */
+	if (err)
+		xsk_clear_pool_at_qid(dev, queue_id);
+	return err;
+}
+
+void xp_clear_dev(struct xsk_buff_pool *pool)
+{
+	struct xdp_umem *umem = pool->umem;
+	struct netdev_bpf bpf;
+	int err;
+
+	ASSERT_RTNL();
+
+	if (!umem->dev)
+		return;
+
+	if (umem->zc) {
+		bpf.command = XDP_SETUP_XSK_POOL;
+		bpf.xsk.pool = NULL;
+		bpf.xsk.queue_id = umem->queue_id;
+
+		err = umem->dev->netdev_ops->ndo_bpf(umem->dev, &bpf);
+
+		if (err)
+			WARN(1, "failed to disable umem!\n");
+	}
+
+	xsk_clear_pool_at_qid(umem->dev, umem->queue_id);
+}
+
+static void xp_release_deferred(struct work_struct *work)
+{
+	struct xsk_buff_pool *pool = container_of(work, struct xsk_buff_pool,
+						  work);
+
+	rtnl_lock();
+	xp_clear_dev(pool);
+	rtnl_unlock();
+
+	xdp_put_umem(pool->umem);
+	xp_destroy(pool);
+}
+
+void xp_get_pool(struct xsk_buff_pool *pool)
+{
+	refcount_inc(&pool->users);
+}
+
+void xp_put_pool(struct xsk_buff_pool *pool)
+{
+	if (!pool)
+		return;
+
+	if (refcount_dec_and_test(&pool->users)) {
+		INIT_WORK(&pool->work, xp_release_deferred);
+		schedule_work(&pool->work);
+	}
+}
+
 void xp_dma_unmap(struct xsk_buff_pool *pool, unsigned long attrs)
 {
 	dma_addr_t *dma;
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 5b5d24d..75f1853 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -165,9 +165,9 @@ static inline bool xp_validate_desc(struct xsk_buff_pool *pool,
 
 static inline bool xskq_cons_is_valid_desc(struct xsk_queue *q,
 					   struct xdp_desc *d,
-					   struct xdp_umem *umem)
+					   struct xsk_buff_pool *pool)
 {
-	if (!xp_validate_desc(umem->pool, d)) {
+	if (!xp_validate_desc(pool, d)) {
 		q->invalid_descs++;
 		return false;
 	}
@@ -176,14 +176,14 @@ static inline bool xskq_cons_is_valid_desc(struct xsk_queue *q,
 
 static inline bool xskq_cons_read_desc(struct xsk_queue *q,
 				       struct xdp_desc *desc,
-				       struct xdp_umem *umem)
+				       struct xsk_buff_pool *pool)
 {
 	while (q->cached_cons != q->cached_prod) {
 		struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
 		u32 idx = q->cached_cons & q->ring_mask;
 
 		*desc = ring->desc[idx];
-		if (xskq_cons_is_valid_desc(q, desc, umem))
+		if (xskq_cons_is_valid_desc(q, desc, pool))
 			return true;
 
 		q->cached_cons++;
@@ -235,11 +235,11 @@ static inline bool xskq_cons_peek_addr_unchecked(struct xsk_queue *q, u64 *addr)
 
 static inline bool xskq_cons_peek_desc(struct xsk_queue *q,
 				       struct xdp_desc *desc,
-				       struct xdp_umem *umem)
+				       struct xsk_buff_pool *pool)
 {
 	if (q->cached_prod == q->cached_cons)
 		xskq_cons_get_entries(q);
-	return xskq_cons_read_desc(q, desc, umem);
+	return xskq_cons_read_desc(q, desc, pool);
 }
 
 static inline void xskq_cons_release(struct xsk_queue *q)
-- 
2.7.4



* [PATCH bpf-next 04/14] xsk: move fill and completion rings to buffer pool
  2020-07-02 12:18 [PATCH bpf-next 00/14] xsk: support shared umems between devices and queues Magnus Karlsson
                   ` (2 preceding siblings ...)
  2020-07-02 12:19 ` [PATCH bpf-next 03/14] xsk: create and free context independently from umem Magnus Karlsson
@ 2020-07-02 12:19 ` Magnus Karlsson
  2020-07-02 12:19 ` [PATCH bpf-next 05/14] xsk: move queue_id, dev and need_wakeup to context Magnus Karlsson
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Magnus Karlsson @ 2020-07-02 12:19 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev,
	jonathan.lemon, maximmi
  Cc: bpf, jeffrey.t.kirsher, maciej.fijalkowski,
	maciejromanfijalkowski, cristian.dumitrescu

Move the fill and completion rings from the umem to the buffer
pool. This so that we can, in a later commit, share the umem
between multiple HW queue ids. In that case, we need one fill and
completion ring per queue id. As the buffer pool is per queue id
and napi id, it is a natural place for these rings, and one umem
structure can be shared between these buffer pools.
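
In code terms, both the data path and the setsockopt/mmap paths now
reach the rings through the pool instead of the umem; a minimal
before/after sketch of the pattern applied throughout the diff below:

  /* before: the rings are reached through the shared umem */
  xskq_prod_submit_n(pool->umem->cq, nb_entries);
  __xskq_cons_release(xs->umem->fq);

  /* after: each per-(dev, queue_id) pool carries its own fq and cq */
  xskq_prod_submit_n(pool->cq, nb_entries);
  __xskq_cons_release(xs->pool->fq);

  /* the XDP_UMEM_FILL_RING and XDP_UMEM_COMPLETION_RING setsockopts
   * likewise initialize xs->pool->fq and xs->pool->cq directly
   */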

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 include/net/xdp_sock.h      |  2 --
 include/net/xsk_buff_pool.h |  3 ++-
 net/xdp/xdp_umem.c          | 15 ---------------
 net/xdp/xsk.c               | 40 ++++++++++++++++++++--------------------
 net/xdp/xsk_buff_pool.c     | 20 +++++++++++++++-----
 net/xdp/xsk_diag.c          | 10 ++++++----
 6 files changed, 43 insertions(+), 47 deletions(-)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index b9bb118..2dd3fd9 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -18,8 +18,6 @@ struct xsk_queue;
 struct xdp_buff;
 
 struct xdp_umem {
-	struct xsk_queue *fq;
-	struct xsk_queue *cq;
 	u64 size;
 	u32 headroom;
 	u32 chunk_size;
diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index cda8ced..f811e25 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -30,6 +30,7 @@ struct xdp_buff_xsk {
 
 struct xsk_buff_pool {
 	struct xsk_queue *fq;
+	struct xsk_queue *cq;
 	struct list_head free_list;
 	dma_addr_t *dma_pages;
 	struct xdp_buff_xsk *heads;
@@ -58,12 +59,12 @@ struct xsk_buff_pool *xp_assign_umem(struct xsk_buff_pool *pool,
 				     struct xdp_umem *umem);
 int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs,
 		  struct net_device *dev, u16 queue_id, u16 flags);
-void xp_set_fq(struct xsk_buff_pool *pool, struct xsk_queue *fq);
 void xp_destroy(struct xsk_buff_pool *pool);
 void xp_release(struct xdp_buff_xsk *xskb);
 void xp_get_pool(struct xsk_buff_pool *pool);
 void xp_put_pool(struct xsk_buff_pool *pool);
 void xp_clear_dev(struct xsk_buff_pool *pool);
+bool xp_validate_queues(struct xsk_buff_pool *pool);
 
 /* AF_XDP, and XDP core. */
 void xp_free(struct xdp_buff_xsk *xskb);
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index f290345..7d86a63 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -85,16 +85,6 @@ static void xdp_umem_release(struct xdp_umem *umem)
 
 	ida_simple_remove(&umem_ida, umem->id);
 
-	if (umem->fq) {
-		xskq_destroy(umem->fq);
-		umem->fq = NULL;
-	}
-
-	if (umem->cq) {
-		xskq_destroy(umem->cq);
-		umem->cq = NULL;
-	}
-
 	xdp_umem_unpin_pages(umem);
 
 	xdp_umem_unaccount_pages(umem);
@@ -278,8 +268,3 @@ struct xdp_umem *xdp_umem_create(struct xdp_umem_reg *mr)
 
 	return umem;
 }
-
-bool xdp_umem_validate_queues(struct xdp_umem *umem)
-{
-	return umem->fq && umem->cq;
-}
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index b12a832..92f05b0 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -36,7 +36,7 @@ static DEFINE_PER_CPU(struct list_head, xskmap_flush_list);
 bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs)
 {
 	return READ_ONCE(xs->rx) &&  READ_ONCE(xs->umem) &&
-		READ_ONCE(xs->umem->fq);
+		READ_ONCE(xs->pool->fq);
 }
 
 void xsk_set_rx_need_wakeup(struct xsk_buff_pool *pool)
@@ -46,7 +46,7 @@ void xsk_set_rx_need_wakeup(struct xsk_buff_pool *pool)
 	if (umem->need_wakeup & XDP_WAKEUP_RX)
 		return;
 
-	umem->fq->ring->flags |= XDP_RING_NEED_WAKEUP;
+	pool->fq->ring->flags |= XDP_RING_NEED_WAKEUP;
 	umem->need_wakeup |= XDP_WAKEUP_RX;
 }
 EXPORT_SYMBOL(xsk_set_rx_need_wakeup);
@@ -76,7 +76,7 @@ void xsk_clear_rx_need_wakeup(struct xsk_buff_pool *pool)
 	if (!(umem->need_wakeup & XDP_WAKEUP_RX))
 		return;
 
-	umem->fq->ring->flags &= ~XDP_RING_NEED_WAKEUP;
+	pool->fq->ring->flags &= ~XDP_RING_NEED_WAKEUP;
 	umem->need_wakeup &= ~XDP_WAKEUP_RX;
 }
 EXPORT_SYMBOL(xsk_clear_rx_need_wakeup);
@@ -254,7 +254,7 @@ static int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp,
 static void xsk_flush(struct xdp_sock *xs)
 {
 	xskq_prod_submit(xs->rx);
-	__xskq_cons_release(xs->umem->fq);
+	__xskq_cons_release(xs->pool->fq);
 	sock_def_readable(&xs->sk);
 }
 
@@ -297,7 +297,7 @@ void __xsk_map_flush(void)
 
 void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries)
 {
-	xskq_prod_submit_n(pool->umem->cq, nb_entries);
+	xskq_prod_submit_n(pool->cq, nb_entries);
 }
 EXPORT_SYMBOL(xsk_tx_completed);
 
@@ -329,7 +329,7 @@ bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc)
 		 * if there is space in it. This avoids having to implement
 		 * any buffering in the Tx path.
 		 */
-		if (xskq_prod_reserve_addr(umem->cq, desc->addr))
+		if (xskq_prod_reserve_addr(pool->cq, desc->addr))
 			goto out;
 
 		xskq_cons_release(xs->tx);
@@ -367,7 +367,7 @@ static void xsk_destruct_skb(struct sk_buff *skb)
 	unsigned long flags;
 
 	spin_lock_irqsave(&xs->tx_completion_lock, flags);
-	xskq_prod_submit_addr(xs->umem->cq, addr);
+	xskq_prod_submit_addr(xs->pool->cq, addr);
 	spin_unlock_irqrestore(&xs->tx_completion_lock, flags);
 
 	sock_wfree(skb);
@@ -411,7 +411,7 @@ static int xsk_generic_xmit(struct sock *sk)
 		 * if there is space in it. This avoids having to implement
 		 * any buffering in the Tx path.
 		 */
-		if (unlikely(err) || xskq_prod_reserve(xs->umem->cq)) {
+		if (unlikely(err) || xskq_prod_reserve(xs->pool->cq)) {
 			kfree_skb(skb);
 			goto out;
 		}
@@ -686,6 +686,12 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 			goto out_unlock;
 		}
 
+		if (xs->pool->fq || xs->pool->cq) {
+			/* Do not allow setting your own fq or cq. */
+			err = -EINVAL;
+			goto out_unlock;
+		}
+
 		sock = xsk_lookup_xsk_from_fd(sxdp->sxdp_shared_umem_fd);
 		if (IS_ERR(sock)) {
 			err = PTR_ERR(sock);
@@ -712,7 +718,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 		xdp_get_umem(umem_xs->umem);
 		WRITE_ONCE(xs->umem, umem_xs->umem);
 		sockfd_put(sock);
-	} else if (!xs->umem || !xdp_umem_validate_queues(xs->umem)) {
+	} else if (!xs->umem || !xp_validate_queues(xs->pool)) {
 		err = -EINVAL;
 		goto out_unlock;
 	} else {
@@ -850,11 +856,9 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname,
 			return -EINVAL;
 		}
 
-		q = (optname == XDP_UMEM_FILL_RING) ? &xs->umem->fq :
-			&xs->umem->cq;
+		q = (optname == XDP_UMEM_FILL_RING) ? &xs->pool->fq :
+			&xs->pool->cq;
 		err = xsk_init_queue(entries, q, true);
-		if (optname == XDP_UMEM_FILL_RING)
-			xp_set_fq(xs->pool, *q);
 		mutex_unlock(&xs->mutex);
 		return err;
 	}
@@ -1000,8 +1004,8 @@ static int xsk_mmap(struct file *file, struct socket *sock,
 	loff_t offset = (loff_t)vma->vm_pgoff << PAGE_SHIFT;
 	unsigned long size = vma->vm_end - vma->vm_start;
 	struct xdp_sock *xs = xdp_sk(sock->sk);
+	struct xsk_buff_pool *pool = xs->pool;
 	struct xsk_queue *q = NULL;
-	struct xdp_umem *umem;
 	unsigned long pfn;
 	struct page *qpg;
 
@@ -1013,16 +1017,12 @@ static int xsk_mmap(struct file *file, struct socket *sock,
 	} else if (offset == XDP_PGOFF_TX_RING) {
 		q = READ_ONCE(xs->tx);
 	} else {
-		umem = READ_ONCE(xs->umem);
-		if (!umem)
-			return -EINVAL;
-
 		/* Matches the smp_wmb() in XDP_UMEM_REG */
 		smp_rmb();
 		if (offset == XDP_UMEM_PGOFF_FILL_RING)
-			q = READ_ONCE(umem->fq);
+			q = READ_ONCE(pool->fq);
 		else if (offset == XDP_UMEM_PGOFF_COMPLETION_RING)
-			q = READ_ONCE(umem->cq);
+			q = READ_ONCE(pool->cq);
 	}
 
 	if (!q)
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index da93b36..6a6e0d5 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -89,11 +89,6 @@ struct xsk_buff_pool *xp_assign_umem(struct xsk_buff_pool *pool_old,
 	return NULL;
 }
 
-void xp_set_fq(struct xsk_buff_pool *pool, struct xsk_queue *fq)
-{
-	pool->fq = fq;
-}
-
 void xp_set_rxq_info(struct xsk_buff_pool *pool, struct xdp_rxq_info *rxq)
 {
 	u32 i;
@@ -197,6 +192,16 @@ static void xp_release_deferred(struct work_struct *work)
 	xp_clear_dev(pool);
 	rtnl_unlock();
 
+	if (pool->fq) {
+		xskq_destroy(pool->fq);
+		pool->fq = NULL;
+	}
+
+	if (pool->cq) {
+		xskq_destroy(pool->cq);
+		pool->cq = NULL;
+	}
+
 	xdp_put_umem(pool->umem);
 	xp_destroy(pool);
 }
@@ -217,6 +222,11 @@ void xp_put_pool(struct xsk_buff_pool *pool)
 	}
 }
 
+bool xp_validate_queues(struct xsk_buff_pool *pool)
+{
+	return pool->fq && pool->cq;
+}
+
 void xp_dma_unmap(struct xsk_buff_pool *pool, unsigned long attrs)
 {
 	dma_addr_t *dma;
diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c
index 0163b26..1936423 100644
--- a/net/xdp/xsk_diag.c
+++ b/net/xdp/xsk_diag.c
@@ -46,6 +46,7 @@ static int xsk_diag_put_rings_cfg(const struct xdp_sock *xs,
 
 static int xsk_diag_put_umem(const struct xdp_sock *xs, struct sk_buff *nlskb)
 {
+	struct xsk_buff_pool *pool = xs->pool;
 	struct xdp_umem *umem = xs->umem;
 	struct xdp_diag_umem du = {};
 	int err;
@@ -67,10 +68,11 @@ static int xsk_diag_put_umem(const struct xdp_sock *xs, struct sk_buff *nlskb)
 
 	err = nla_put(nlskb, XDP_DIAG_UMEM, sizeof(du), &du);
 
-	if (!err && umem->fq)
-		err = xsk_diag_put_ring(umem->fq, XDP_DIAG_UMEM_FILL_RING, nlskb);
-	if (!err && umem->cq) {
-		err = xsk_diag_put_ring(umem->cq, XDP_DIAG_UMEM_COMPLETION_RING,
+	if (!err && pool->fq)
+		err = xsk_diag_put_ring(pool->fq,
+					XDP_DIAG_UMEM_FILL_RING, nlskb);
+	if (!err && pool->cq) {
+		err = xsk_diag_put_ring(pool->cq, XDP_DIAG_UMEM_COMPLETION_RING,
 					nlskb);
 	}
 	return err;
-- 
2.7.4


* [PATCH bpf-next 05/14] xsk: move queue_id, dev and need_wakeup to context
  2020-07-02 12:18 [PATCH bpf-next 00/14] xsk: support shared umems between devices and queues Magnus Karlsson
                   ` (3 preceding siblings ...)
  2020-07-02 12:19 ` [PATCH bpf-next 04/14] xsk: move fill and completion rings to buffer pool Magnus Karlsson
@ 2020-07-02 12:19 ` Magnus Karlsson
  2020-07-02 12:19 ` [PATCH bpf-next 06/14] xsk: move xsk_tx_list and its lock to buffer pool Magnus Karlsson
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Magnus Karlsson @ 2020-07-02 12:19 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev,
	jonathan.lemon, maximmi
  Cc: bpf, jeffrey.t.kirsher, maciej.fijalkowski,
	maciejromanfijalkowski, cristian.dumitrescu

Move queue_id, dev, and need_wakeup from the umem to the buffer
pool. This is so that a later commit can share the umem between
multiple HW queues. There is one buffer pool per device and queue
id, so these variables belong to the buffer pool, not the umem.
need_wakeup is also set at a per-napi level, so there is usually
one per device and queue id; move it to the buffer pool too.
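
Below is a rough sketch of how a zero-copy driver Rx path drives this
per-pool state after the change. The example_* struct and helpers are
made up purely for illustration; only the xsk_*_need_wakeup() helpers,
which now take the buffer pool, are real (assumed to be declared in
<net/xdp_sock_drv.h> as in the rest of this set).

#include <net/xdp_sock_drv.h>

/* Hypothetical driver ring; only the xsk_pool member matters here. */
struct example_rx_ring {
        struct xsk_buff_pool *xsk_pool;
};

/* Assumed helpers standing in for real driver code. */
int example_process_rx(struct example_rx_ring *rx, int budget);
bool example_fill_ring_ran_dry(struct example_rx_ring *rx);

static int example_clean_rx_irq_zc(struct example_rx_ring *rx, int budget)
{
        struct xsk_buff_pool *pool = rx->xsk_pool;
        int done = example_process_rx(rx, budget);

        if (xsk_uses_need_wakeup(pool)) {
                /* Only ask user space to kick the fill ring when we
                 * failed to get buffers from it.
                 */
                if (example_fill_ring_ran_dry(rx))
                        xsk_set_rx_need_wakeup(pool);
                else
                        xsk_clear_rx_need_wakeup(pool);
        }

        return done;
}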

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 include/net/xdp_sock.h      |  3 ---
 include/net/xsk_buff_pool.h |  4 ++++
 net/xdp/xdp_umem.c          | 19 +------------------
 net/xdp/xdp_umem.h          |  4 ----
 net/xdp/xsk.c               | 40 +++++++++++++++-------------------------
 net/xdp/xsk_buff_pool.c     | 37 +++++++++++++++++++++----------------
 net/xdp/xsk_diag.c          |  4 ++--
 7 files changed, 43 insertions(+), 68 deletions(-)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index 2dd3fd9..e12d814 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -26,11 +26,8 @@ struct xdp_umem {
 	refcount_t users;
 	struct page **pgs;
 	u32 npgs;
-	u16 queue_id;
-	u8 need_wakeup;
 	u8 flags;
 	int id;
-	struct net_device *dev;
 	bool zc;
 	spinlock_t xsk_tx_list_lock;
 	struct list_head xsk_tx_list;
diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index f811e25..cd929a8 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -43,11 +43,15 @@ struct xsk_buff_pool {
 	u32 headroom;
 	u32 chunk_size;
 	u32 frame_len;
+	u16 queue_id;
+	u8 cached_need_wakeup;
+	bool uses_need_wakeup;
 	bool cheap_dma;
 	bool unaligned;
 	struct xdp_umem *umem;
 	void *addrs;
 	struct device *dev;
+	struct net_device *netdev;
 	refcount_t users;
 	struct work_struct work;
 	struct xdp_buff_xsk *free_heads[];
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 7d86a63..b1699d0 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -63,26 +63,9 @@ static void xdp_umem_unaccount_pages(struct xdp_umem *umem)
 	}
 }
 
-void xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
-			 u16 queue_id)
-{
-	umem->dev = dev;
-	umem->queue_id = queue_id;
-
-	dev_hold(dev);
-}
-
-void xdp_umem_clear_dev(struct xdp_umem *umem)
-{
-	dev_put(umem->dev);
-	umem->dev = NULL;
-	umem->zc = false;
-}
-
 static void xdp_umem_release(struct xdp_umem *umem)
 {
-	xdp_umem_clear_dev(umem);
-
+	umem->zc = false;
 	ida_simple_remove(&umem_ida, umem->id);
 
 	xdp_umem_unpin_pages(umem);
diff --git a/net/xdp/xdp_umem.h b/net/xdp/xdp_umem.h
index 93e96be..67bf3f3 100644
--- a/net/xdp/xdp_umem.h
+++ b/net/xdp/xdp_umem.h
@@ -8,10 +8,6 @@
 
 #include <net/xdp_sock_drv.h>
 
-void xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
-			 u16 queue_id);
-void xdp_umem_clear_dev(struct xdp_umem *umem);
-bool xdp_umem_validate_queues(struct xdp_umem *umem);
 void xdp_get_umem(struct xdp_umem *umem);
 void xdp_put_umem(struct xdp_umem *umem);
 void xdp_add_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs);
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 92f05b0..b02ed96 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -41,67 +41,61 @@ bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs)
 
 void xsk_set_rx_need_wakeup(struct xsk_buff_pool *pool)
 {
-	struct xdp_umem *umem = pool->umem;
-
-	if (umem->need_wakeup & XDP_WAKEUP_RX)
+	if (pool->cached_need_wakeup & XDP_WAKEUP_RX)
 		return;
 
 	pool->fq->ring->flags |= XDP_RING_NEED_WAKEUP;
-	umem->need_wakeup |= XDP_WAKEUP_RX;
+	pool->cached_need_wakeup |= XDP_WAKEUP_RX;
 }
 EXPORT_SYMBOL(xsk_set_rx_need_wakeup);
 
 void xsk_set_tx_need_wakeup(struct xsk_buff_pool *pool)
 {
-	struct xdp_umem *umem = pool->umem;
 	struct xdp_sock *xs;
 
-	if (umem->need_wakeup & XDP_WAKEUP_TX)
+	if (pool->cached_need_wakeup & XDP_WAKEUP_TX)
 		return;
 
 	rcu_read_lock();
-	list_for_each_entry_rcu(xs, &umem->xsk_tx_list, list) {
+	list_for_each_entry_rcu(xs, &xs->umem->xsk_tx_list, list) {
 		xs->tx->ring->flags |= XDP_RING_NEED_WAKEUP;
 	}
 	rcu_read_unlock();
 
-	umem->need_wakeup |= XDP_WAKEUP_TX;
+	pool->cached_need_wakeup |= XDP_WAKEUP_TX;
 }
 EXPORT_SYMBOL(xsk_set_tx_need_wakeup);
 
 void xsk_clear_rx_need_wakeup(struct xsk_buff_pool *pool)
 {
-	struct xdp_umem *umem = pool->umem;
-
-	if (!(umem->need_wakeup & XDP_WAKEUP_RX))
+	if (!(pool->cached_need_wakeup & XDP_WAKEUP_RX))
 		return;
 
 	pool->fq->ring->flags &= ~XDP_RING_NEED_WAKEUP;
-	umem->need_wakeup &= ~XDP_WAKEUP_RX;
+	pool->cached_need_wakeup &= ~XDP_WAKEUP_RX;
 }
 EXPORT_SYMBOL(xsk_clear_rx_need_wakeup);
 
 void xsk_clear_tx_need_wakeup(struct xsk_buff_pool *pool)
 {
-	struct xdp_umem *umem = pool->umem;
 	struct xdp_sock *xs;
 
-	if (!(umem->need_wakeup & XDP_WAKEUP_TX))
+	if (!(pool->cached_need_wakeup & XDP_WAKEUP_TX))
 		return;
 
 	rcu_read_lock();
-	list_for_each_entry_rcu(xs, &umem->xsk_tx_list, list) {
+	list_for_each_entry_rcu(xs, &xs->umem->xsk_tx_list, list) {
 		xs->tx->ring->flags &= ~XDP_RING_NEED_WAKEUP;
 	}
 	rcu_read_unlock();
 
-	umem->need_wakeup &= ~XDP_WAKEUP_TX;
+	pool->cached_need_wakeup &= ~XDP_WAKEUP_TX;
 }
 EXPORT_SYMBOL(xsk_clear_tx_need_wakeup);
 
 bool xsk_uses_need_wakeup(struct xsk_buff_pool *pool)
 {
-	return pool->umem->flags & XDP_UMEM_USES_NEED_WAKEUP;
+	return pool->uses_need_wakeup;
 }
 EXPORT_SYMBOL(xsk_uses_need_wakeup);
 
@@ -474,16 +468,16 @@ static __poll_t xsk_poll(struct file *file, struct socket *sock,
 	__poll_t mask = datagram_poll(file, sock, wait);
 	struct sock *sk = sock->sk;
 	struct xdp_sock *xs = xdp_sk(sk);
-	struct xdp_umem *umem;
+	struct xsk_buff_pool *pool;
 
 	if (unlikely(!xsk_is_bound(xs)))
 		return mask;
 
-	umem = xs->umem;
+	pool = xs->pool;
 
-	if (umem->need_wakeup) {
+	if (pool->cached_need_wakeup) {
 		if (xs->zc)
-			xsk_wakeup(xs, umem->need_wakeup);
+			xsk_wakeup(xs, pool->cached_need_wakeup);
 		else
 			/* Poll needs to drive Tx also in copy mode */
 			__xsk_sendmsg(sk);
@@ -725,18 +719,15 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 		struct xsk_buff_pool *new_pool;
 
 		/* This xsk has its own umem. */
-		xdp_umem_assign_dev(xs->umem, dev, qid);
 		new_pool = xp_assign_umem(xs->pool, xs->umem);
 		if (!new_pool) {
 			err = -ENOMEM;
-			xdp_umem_clear_dev(xs->umem);
 			goto out_unlock;
 		}
 
 		err = xp_assign_dev(new_pool, xs, dev, qid, flags);
 		if (err) {
 			xp_destroy(new_pool);
-			xdp_umem_clear_dev(xs->umem);
 			goto out_unlock;
 		}
 		xs->pool = new_pool;
@@ -1062,7 +1053,6 @@ static int xsk_notifier(struct notifier_block *this,
 
 				/* Clear device references. */
 				xp_clear_dev(xs->pool);
-				xdp_umem_clear_dev(xs->umem);
 			}
 			mutex_unlock(&xs->mutex);
 		}
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 6a6e0d5..e0a49fc 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -99,9 +99,8 @@ void xp_set_rxq_info(struct xsk_buff_pool *pool, struct xdp_rxq_info *rxq)
 EXPORT_SYMBOL(xp_set_rxq_info);
 
 int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs,
-		  struct net_device *dev, u16 queue_id, u16 flags)
+		  struct net_device *netdev, u16 queue_id, u16 flags)
 {
-	struct xdp_umem *umem = pool->umem;
 	bool force_zc, force_copy;
 	struct netdev_bpf bpf;
 	int err = 0;
@@ -114,15 +113,15 @@ int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs,
 	if (force_zc && force_copy)
 		return -EINVAL;
 
-	if (xsk_get_pool_from_qid(dev, queue_id))
+	if (xsk_get_pool_from_qid(netdev, queue_id))
 		return -EBUSY;
 
-	err = xsk_reg_pool_at_qid(dev, pool, queue_id);
+	err = xsk_reg_pool_at_qid(netdev, pool, queue_id);
 	if (err)
 		return err;
 
 	if ((flags & XDP_USE_NEED_WAKEUP) && xs->tx) {
-		umem->flags |= XDP_UMEM_USES_NEED_WAKEUP;
+		pool->uses_need_wakeup = true;
 		/* Tx needs to be explicitly woken up the first time.
 		 * Also for supporting drivers that do not implement this
 		 * feature. They will always have to call sendto().
@@ -130,11 +129,14 @@ int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs,
 		xs->tx->ring->flags |= XDP_RING_NEED_WAKEUP;
 	}
 
+	dev_hold(netdev);
+
 	if (force_copy)
 		/* For copy-mode, we are done. */
 		return 0;
 
-	if (!dev->netdev_ops->ndo_bpf || !dev->netdev_ops->ndo_xsk_wakeup) {
+	if (!netdev->netdev_ops->ndo_bpf ||
+	    !netdev->netdev_ops->ndo_xsk_wakeup) {
 		err = -EOPNOTSUPP;
 		goto err_unreg_pool;
 	}
@@ -143,44 +145,47 @@ int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs,
 	bpf.xsk.pool = pool;
 	bpf.xsk.queue_id = queue_id;
 
-	err = dev->netdev_ops->ndo_bpf(dev, &bpf);
+	err = netdev->netdev_ops->ndo_bpf(netdev, &bpf);
 	if (err)
 		goto err_unreg_pool;
 
-	umem->zc = true;
+	pool->netdev = netdev;
+	pool->queue_id = queue_id;
+	pool->umem->zc = true;
 	return 0;
 
 err_unreg_pool:
 	if (!force_zc)
 		err = 0; /* fallback to copy mode */
 	if (err)
-		xsk_clear_pool_at_qid(dev, queue_id);
+		xsk_clear_pool_at_qid(netdev, queue_id);
 	return err;
 }
 
 void xp_clear_dev(struct xsk_buff_pool *pool)
 {
-	struct xdp_umem *umem = pool->umem;
 	struct netdev_bpf bpf;
 	int err;
 
 	ASSERT_RTNL();
 
-	if (!umem->dev)
+	if (!pool->netdev)
 		return;
 
-	if (umem->zc) {
+	if (pool->umem->zc) {
 		bpf.command = XDP_SETUP_XSK_POOL;
 		bpf.xsk.pool = NULL;
-		bpf.xsk.queue_id = umem->queue_id;
+		bpf.xsk.queue_id = pool->queue_id;
 
-		err = umem->dev->netdev_ops->ndo_bpf(umem->dev, &bpf);
+		err = pool->netdev->netdev_ops->ndo_bpf(pool->netdev, &bpf);
 
 		if (err)
-			WARN(1, "failed to disable umem!\n");
+			WARN(1, "Failed to disable zero-copy!\n");
 	}
 
-	xsk_clear_pool_at_qid(umem->dev, umem->queue_id);
+	xsk_clear_pool_at_qid(pool->netdev, pool->queue_id);
+	dev_put(pool->netdev);
+	pool->netdev = NULL;
 }
 
 static void xp_release_deferred(struct work_struct *work)
diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c
index 1936423..c974295 100644
--- a/net/xdp/xsk_diag.c
+++ b/net/xdp/xsk_diag.c
@@ -59,8 +59,8 @@ static int xsk_diag_put_umem(const struct xdp_sock *xs, struct sk_buff *nlskb)
 	du.num_pages = umem->npgs;
 	du.chunk_size = umem->chunk_size;
 	du.headroom = umem->headroom;
-	du.ifindex = umem->dev ? umem->dev->ifindex : 0;
-	du.queue_id = umem->queue_id;
+	du.ifindex = pool->netdev ? pool->netdev->ifindex : 0;
+	du.queue_id = pool->queue_id;
 	du.flags = 0;
 	if (umem->zc)
 		du.flags |= XDP_DU_F_ZEROCOPY;
-- 
2.7.4


* [PATCH bpf-next 06/14] xsk: move xsk_tx_list and its lock to buffer pool
  2020-07-02 12:18 [PATCH bpf-next 00/14] xsk: support shared umems between devices and queues Magnus Karlsson
                   ` (4 preceding siblings ...)
  2020-07-02 12:19 ` [PATCH bpf-next 05/14] xsk: move queue_id, dev and need_wakeup to context Magnus Karlsson
@ 2020-07-02 12:19 ` Magnus Karlsson
  2020-07-02 12:19 ` [PATCH bpf-next 07/14] xsk: move addrs from buffer pool to umem Magnus Karlsson
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Magnus Karlsson @ 2020-07-02 12:19 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev,
	jonathan.lemon, maximmi
  Cc: bpf, jeffrey.t.kirsher, maciej.fijalkowski,
	maciejromanfijalkowski, cristian.dumitrescu

Move the xsk_tx_list and the xsk_tx_list_lock from the umem to
the buffer pool. This is so that a later commit can share the umem
between multiple HW queues. There is one xsk_tx_list per device
and queue id, so it belongs in the buffer pool.
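
For reference, a condensed sketch of a zero-copy Tx path that consumes
this list is shown below. The example_* names are invented;
xsk_tx_peek_desc() and xsk_tx_release() are the driver-facing helpers
that now walk pool->xsk_tx_list under rcu_read_lock(), and
xsk_buff_raw_get_dma() is assumed to take the pool as converted
earlier in this set.

#include <net/xdp_sock_drv.h>

struct example_tx_ring {
        struct xsk_buff_pool *xsk_pool;         /* assumed driver field */
};

/* Assumed helpers standing in for real descriptor handling. */
void example_post_tx_desc(struct example_tx_ring *ring, dma_addr_t dma,
                          u32 len);
void example_bump_tail(struct example_tx_ring *ring);

static bool example_xmit_zc(struct example_tx_ring *ring, unsigned int budget)
{
        struct xsk_buff_pool *pool = ring->xsk_pool;
        struct xdp_desc desc;
        unsigned int sent = 0;

        /* Pulls descriptors from any socket on pool->xsk_tx_list. */
        while (sent < budget && xsk_tx_peek_desc(pool, &desc)) {
                dma_addr_t dma = xsk_buff_raw_get_dma(pool, desc.addr);

                example_post_tx_desc(ring, dma, desc.len);
                sent++;
        }

        if (sent) {
                example_bump_tail(ring);
                /* Wakes every socket on the per-pool Tx list. */
                xsk_tx_release(pool);
        }

        return sent < budget;
}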

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 include/net/xdp_sock.h      |  4 +---
 include/net/xsk_buff_pool.h |  5 +++++
 net/xdp/xdp_umem.c          | 26 --------------------------
 net/xdp/xdp_umem.h          |  2 --
 net/xdp/xsk.c               | 13 ++++++-------
 net/xdp/xsk_buff_pool.c     | 26 ++++++++++++++++++++++++++
 6 files changed, 38 insertions(+), 38 deletions(-)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index e12d814..471719d 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -29,8 +29,6 @@ struct xdp_umem {
 	u8 flags;
 	int id;
 	bool zc;
-	spinlock_t xsk_tx_list_lock;
-	struct list_head xsk_tx_list;
 };
 
 struct xsk_map {
@@ -57,7 +55,7 @@ struct xdp_sock {
 	/* Protects multiple processes in the control path */
 	struct mutex mutex;
 	struct xsk_queue *tx ____cacheline_aligned_in_smp;
-	struct list_head list;
+	struct list_head tx_list;
 	/* Mutual exclusion of NAPI TX thread and sendmsg error paths
 	 * in the SKB destructor callback.
 	 */
diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index cd929a8..6158a47 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -52,6 +52,9 @@ struct xsk_buff_pool {
 	void *addrs;
 	struct device *dev;
 	struct net_device *netdev;
+	struct list_head xsk_tx_list;
+	/* Protects modifications to the xsk_tx_list */
+	spinlock_t xsk_tx_list_lock;
 	refcount_t users;
 	struct work_struct work;
 	struct xdp_buff_xsk *free_heads[];
@@ -69,6 +72,8 @@ void xp_get_pool(struct xsk_buff_pool *pool);
 void xp_put_pool(struct xsk_buff_pool *pool);
 void xp_clear_dev(struct xsk_buff_pool *pool);
 bool xp_validate_queues(struct xsk_buff_pool *pool);
+void xp_add_xsk(struct xsk_buff_pool *pool, struct xdp_sock *xs);
+void xp_del_xsk(struct xsk_buff_pool *pool, struct xdp_sock *xs);
 
 /* AF_XDP, and XDP core. */
 void xp_free(struct xdp_buff_xsk *xskb);
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index b1699d0..a871c75 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -23,30 +23,6 @@
 
 static DEFINE_IDA(umem_ida);
 
-void xdp_add_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs)
-{
-	unsigned long flags;
-
-	if (!xs->tx)
-		return;
-
-	spin_lock_irqsave(&umem->xsk_tx_list_lock, flags);
-	list_add_rcu(&xs->list, &umem->xsk_tx_list);
-	spin_unlock_irqrestore(&umem->xsk_tx_list_lock, flags);
-}
-
-void xdp_del_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs)
-{
-	unsigned long flags;
-
-	if (!xs->tx)
-		return;
-
-	spin_lock_irqsave(&umem->xsk_tx_list_lock, flags);
-	list_del_rcu(&xs->list);
-	spin_unlock_irqrestore(&umem->xsk_tx_list_lock, flags);
-}
-
 static void xdp_umem_unpin_pages(struct xdp_umem *umem)
 {
 	unpin_user_pages_dirty_lock(umem->pgs, umem->npgs, true);
@@ -206,8 +182,6 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 	umem->pgs = NULL;
 	umem->user = NULL;
 	umem->flags = mr->flags;
-	INIT_LIST_HEAD(&umem->xsk_tx_list);
-	spin_lock_init(&umem->xsk_tx_list_lock);
 
 	refcount_set(&umem->users, 1);
 
diff --git a/net/xdp/xdp_umem.h b/net/xdp/xdp_umem.h
index 67bf3f3..181fdda 100644
--- a/net/xdp/xdp_umem.h
+++ b/net/xdp/xdp_umem.h
@@ -10,8 +10,6 @@
 
 void xdp_get_umem(struct xdp_umem *umem);
 void xdp_put_umem(struct xdp_umem *umem);
-void xdp_add_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs);
-void xdp_del_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs);
 struct xdp_umem *xdp_umem_create(struct xdp_umem_reg *mr);
 
 #endif /* XDP_UMEM_H_ */
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index b02ed96..4d0028c 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -57,7 +57,7 @@ void xsk_set_tx_need_wakeup(struct xsk_buff_pool *pool)
 		return;
 
 	rcu_read_lock();
-	list_for_each_entry_rcu(xs, &xs->umem->xsk_tx_list, list) {
+	list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) {
 		xs->tx->ring->flags |= XDP_RING_NEED_WAKEUP;
 	}
 	rcu_read_unlock();
@@ -84,7 +84,7 @@ void xsk_clear_tx_need_wakeup(struct xsk_buff_pool *pool)
 		return;
 
 	rcu_read_lock();
-	list_for_each_entry_rcu(xs, &xs->umem->xsk_tx_list, list) {
+	list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) {
 		xs->tx->ring->flags &= ~XDP_RING_NEED_WAKEUP;
 	}
 	rcu_read_unlock();
@@ -300,7 +300,7 @@ void xsk_tx_release(struct xsk_buff_pool *pool)
 	struct xdp_sock *xs;
 
 	rcu_read_lock();
-	list_for_each_entry_rcu(xs, &pool->umem->xsk_tx_list, list) {
+	list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) {
 		__xskq_cons_release(xs->tx);
 		xs->sk.sk_write_space(&xs->sk);
 	}
@@ -310,11 +310,10 @@ EXPORT_SYMBOL(xsk_tx_release);
 
 bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc)
 {
-	struct xdp_umem *umem = pool->umem;
 	struct xdp_sock *xs;
 
 	rcu_read_lock();
-	list_for_each_entry_rcu(xs, &umem->xsk_tx_list, list) {
+	list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) {
 		if (!xskq_cons_peek_desc(xs->tx, desc, pool))
 			continue;
 
@@ -518,7 +517,7 @@ static void xsk_unbind_dev(struct xdp_sock *xs)
 	WRITE_ONCE(xs->state, XSK_UNBOUND);
 
 	/* Wait for driver to stop using the xdp socket. */
-	xdp_del_sk_umem(xs->umem, xs);
+	xp_del_xsk(xs->pool, xs);
 	xs->dev = NULL;
 	synchronize_net();
 	dev_put(dev);
@@ -736,7 +735,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 	xs->dev = dev;
 	xs->zc = xs->umem->zc;
 	xs->queue_id = qid;
-	xdp_add_sk_umem(xs->umem, xs);
+	xp_add_xsk(xs->pool, xs);
 
 out_unlock:
 	if (err) {
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index e0a49fc..31dd337 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -11,6 +11,30 @@
 #include "xdp_umem.h"
 #include "xsk.h"
 
+void xp_add_xsk(struct xsk_buff_pool *pool, struct xdp_sock *xs)
+{
+	unsigned long flags;
+
+	if (!xs->tx)
+		return;
+
+	spin_lock_irqsave(&pool->xsk_tx_list_lock, flags);
+	list_add_rcu(&xs->tx_list, &pool->xsk_tx_list);
+	spin_unlock_irqrestore(&pool->xsk_tx_list_lock, flags);
+}
+
+void xp_del_xsk(struct xsk_buff_pool *pool, struct xdp_sock *xs)
+{
+	unsigned long flags;
+
+	if (!xs->tx)
+		return;
+
+	spin_lock_irqsave(&pool->xsk_tx_list_lock, flags);
+	list_del_rcu(&xs->tx_list);
+	spin_unlock_irqrestore(&pool->xsk_tx_list_lock, flags);
+}
+
 static void xp_addr_unmap(struct xsk_buff_pool *pool)
 {
 	vunmap(pool->addrs);
@@ -71,6 +95,8 @@ struct xsk_buff_pool *xp_assign_umem(struct xsk_buff_pool *pool_old,
 		XDP_PACKET_HEADROOM;
 	pool->umem = umem;
 	INIT_LIST_HEAD(&pool->free_list);
+	INIT_LIST_HEAD(&pool->xsk_tx_list);
+	spin_lock_init(&pool->xsk_tx_list_lock);
 	refcount_set(&pool->users, 1);
 
 	for (i = 0; i < pool->free_heads_cnt; i++) {
-- 
2.7.4


* [PATCH bpf-next 07/14] xsk: move addrs from buffer pool to umem
  2020-07-02 12:18 [PATCH bpf-next 00/14] xsk: support shared umems between devices and queues Magnus Karlsson
                   ` (5 preceding siblings ...)
  2020-07-02 12:19 ` [PATCH bpf-next 06/14] xsk: move xsk_tx_list and its lock to buffer pool Magnus Karlsson
@ 2020-07-02 12:19 ` Magnus Karlsson
  2020-07-02 12:19 ` [PATCH bpf-next 08/14] xsk: net: enable sharing of dma mappings Magnus Karlsson
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Magnus Karlsson @ 2020-07-02 12:19 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev,
	jonathan.lemon, maximmi
  Cc: bpf, jeffrey.t.kirsher, maciej.fijalkowski,
	maciejromanfijalkowski, cristian.dumitrescu

Move the virtual address mapping (addrs) to the umem, as this mapping
is the same for all buffer pools sharing the same umem. A copy of the
addrs pointer is kept in the buffer pool for performance reasons.
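
The copy is kept because the data path translates a ring address into
a kernel virtual address for every packet, and that translation should
not have to chase the umem pointer. In simplified form (aligned chunks
only, not the exact kernel code):

/* Simplified sketch: pool->addrs is the same vmap()ed area as
 * umem->addrs, just cached one dereference closer to the data path.
 */
static inline void *example_get_data(struct xsk_buff_pool *pool, u64 addr)
{
        return pool->addrs + addr;
}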

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 include/net/xdp_sock.h  |  1 +
 net/xdp/xdp_umem.c      | 22 ++++++++++++++++++++++
 net/xdp/xsk_buff_pool.c | 21 ++-------------------
 3 files changed, 25 insertions(+), 19 deletions(-)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index 471719d..d2fddf2 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -18,6 +18,7 @@ struct xsk_queue;
 struct xdp_buff;
 
 struct xdp_umem {
+	void *addrs;
 	u64 size;
 	u32 headroom;
 	u32 chunk_size;
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index a871c75..372998d 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -39,11 +39,27 @@ static void xdp_umem_unaccount_pages(struct xdp_umem *umem)
 	}
 }
 
+static void xdp_umem_addr_unmap(struct xdp_umem *umem)
+{
+	vunmap(umem->addrs);
+	umem->addrs = NULL;
+}
+
+static int xdp_umem_addr_map(struct xdp_umem *umem, struct page **pages,
+			     u32 nr_pages)
+{
+	umem->addrs = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
+	if (!umem->addrs)
+		return -ENOMEM;
+	return 0;
+}
+
 static void xdp_umem_release(struct xdp_umem *umem)
 {
 	umem->zc = false;
 	ida_simple_remove(&umem_ida, umem->id);
 
+	xdp_umem_addr_unmap(umem);
 	xdp_umem_unpin_pages(umem);
 
 	xdp_umem_unaccount_pages(umem);
@@ -193,8 +209,14 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 	if (err)
 		goto out_account;
 
+	err = xdp_umem_addr_map(umem, umem->pgs, umem->npgs);
+	if (err)
+		goto out_unpin;
+
 	return 0;
 
+out_unpin:
+	xdp_umem_unpin_pages(umem);
 out_account:
 	xdp_umem_unaccount_pages(umem);
 	return err;
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 31dd337..ae27664 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -35,26 +35,11 @@ void xp_del_xsk(struct xsk_buff_pool *pool, struct xdp_sock *xs)
 	spin_unlock_irqrestore(&pool->xsk_tx_list_lock, flags);
 }
 
-static void xp_addr_unmap(struct xsk_buff_pool *pool)
-{
-	vunmap(pool->addrs);
-}
-
-static int xp_addr_map(struct xsk_buff_pool *pool,
-		       struct page **pages, u32 nr_pages)
-{
-	pool->addrs = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
-	if (!pool->addrs)
-		return -ENOMEM;
-	return 0;
-}
-
 void xp_destroy(struct xsk_buff_pool *pool)
 {
 	if (!pool)
 		return;
 
-	xp_addr_unmap(pool);
 	kvfree(pool->heads);
 	kvfree(pool);
 }
@@ -69,7 +54,6 @@ struct xsk_buff_pool *xp_assign_umem(struct xsk_buff_pool *pool_old,
 {
 	struct xsk_buff_pool *pool;
 	struct xdp_buff_xsk *xskb;
-	int err;
 	u32 i;
 
 	pool = kvzalloc(struct_size(pool, free_heads, umem->chunks),
@@ -94,6 +78,7 @@ struct xsk_buff_pool *xp_assign_umem(struct xsk_buff_pool *pool_old,
 	pool->frame_len = umem->chunk_size - umem->headroom -
 		XDP_PACKET_HEADROOM;
 	pool->umem = umem;
+	pool->addrs = umem->addrs;
 	INIT_LIST_HEAD(&pool->free_list);
 	INIT_LIST_HEAD(&pool->xsk_tx_list);
 	spin_lock_init(&pool->xsk_tx_list_lock);
@@ -106,9 +91,7 @@ struct xsk_buff_pool *xp_assign_umem(struct xsk_buff_pool *pool_old,
 		pool->free_heads[i] = xskb;
 	}
 
-	err = xp_addr_map(pool, umem->pgs, umem->npgs);
-	if (!err)
-		return pool;
+	return pool;
 
 out:
 	xp_destroy(pool);
-- 
2.7.4


* [PATCH bpf-next 08/14] xsk: net: enable sharing of dma mappings
  2020-07-02 12:18 [PATCH bpf-next 00/14] xsk: support shared umems between devices and queues Magnus Karlsson
                   ` (6 preceding siblings ...)
  2020-07-02 12:19 ` [PATCH bpf-next 07/14] xsk: move addrs from buffer pool to umem Magnus Karlsson
@ 2020-07-02 12:19 ` Magnus Karlsson
  2020-07-02 12:19 ` [PATCH bpf-next 09/14] xsk: rearrange internal structs for better performance Magnus Karlsson
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Magnus Karlsson @ 2020-07-02 12:19 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev,
	jonathan.lemon, maximmi
  Cc: bpf, jeffrey.t.kirsher, maciej.fijalkowski,
	maciejromanfijalkowski, cristian.dumitrescu

Enable the sharing of DMA mappings by moving them out of the umem
structure. Instead, each DMA-mapped umem region is put on a list in
the netdev structure. If DMA has already been mapped for this umem
and device, it is not mapped again; the existing DMA mappings are
reused.
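
Condensed, the policy is one DMA mapping per (netdev, umem) pair,
reference counted by the buffer pools that use it. A rough sketch of
the mapping entry point (error handling and the page-by-page mapping
are omitted; the full version is in the diff below):

static int example_xp_dma_map(struct xsk_buff_pool *pool, struct device *dev,
                              unsigned long attrs, struct page **pages,
                              u32 nr_pages)
{
        struct xsk_dma_map *dma_map;

        dma_map = xp_find_dma_map(pool);        /* walks netdev->xsk_dma_list */
        if (dma_map) {
                /* This umem is already mapped for this netdev: reuse it. */
                pool->dma_pages = dma_map->dma_pages;
                refcount_inc(&dma_map->users);
                return 0;
        }

        dma_map = xp_create_dma_map(pool);      /* adds itself to the list */
        if (!dma_map)
                return -ENOMEM;

        /* ... dma_map_page_attrs() each page and record the addresses ... */
        return 0;
}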

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 include/linux/netdevice.h   |   3 ++
 include/net/xsk_buff_pool.h |   7 +++
 net/core/dev.c              |   3 ++
 net/xdp/xsk_buff_pool.c     | 112 ++++++++++++++++++++++++++++++++++++--------
 4 files changed, 106 insertions(+), 19 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e5acc3b..fd794aa 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2006,6 +2006,9 @@ struct net_device {
 	unsigned int		real_num_rx_queues;
 
 	struct bpf_prog __rcu	*xdp_prog;
+#ifdef CONFIG_XDP_SOCKETS
+	struct list_head        xsk_dma_list;
+#endif
 	unsigned long		gro_flush_timeout;
 	int			napi_defer_hard_irqs;
 	rx_handler_func_t __rcu	*rx_handler;
diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index 6158a47..197cca8 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -28,6 +28,13 @@ struct xdp_buff_xsk {
 	struct list_head free_list_node;
 };
 
+struct xsk_dma_map {
+	dma_addr_t *dma_pages;
+	struct xdp_umem *umem;
+	refcount_t users;
+	struct list_head list; /* Protected by the RTNL_LOCK */
+};
+
 struct xsk_buff_pool {
 	struct xsk_queue *fq;
 	struct xsk_queue *cq;
diff --git a/net/core/dev.c b/net/core/dev.c
index 6bc2388..fe8a72f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9959,6 +9959,9 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	INIT_LIST_HEAD(&dev->ptype_all);
 	INIT_LIST_HEAD(&dev->ptype_specific);
 	INIT_LIST_HEAD(&dev->net_notifier_list);
+#ifdef CONFIG_XDP_SOCKETS
+	INIT_LIST_HEAD(&dev->xsk_dma_list);
+#endif
 #ifdef CONFIG_NET_SCHED
 	hash_init(dev->qdisc_hash);
 #endif
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index ae27664..3c58d76 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -107,6 +107,25 @@ void xp_set_rxq_info(struct xsk_buff_pool *pool, struct xdp_rxq_info *rxq)
 }
 EXPORT_SYMBOL(xp_set_rxq_info);
 
+static void xp_disable_drv_zc(struct xsk_buff_pool *pool)
+{
+	struct netdev_bpf bpf;
+	int err;
+
+	ASSERT_RTNL();
+
+	if (pool->umem->zc) {
+		bpf.command = XDP_SETUP_XSK_POOL;
+		bpf.xsk.pool = NULL;
+		bpf.xsk.queue_id = pool->queue_id;
+
+		err = pool->netdev->netdev_ops->ndo_bpf(pool->netdev, &bpf);
+
+		if (err)
+			WARN(1, "Failed to disable zero-copy!\n");
+	}
+}
+
 int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs,
 		  struct net_device *netdev, u16 queue_id, u16 flags)
 {
@@ -125,6 +144,8 @@ int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs,
 	if (xsk_get_pool_from_qid(netdev, queue_id))
 		return -EBUSY;
 
+	pool->netdev = netdev;
+	pool->queue_id = queue_id;
 	err = xsk_reg_pool_at_qid(netdev, pool, queue_id);
 	if (err)
 		return err;
@@ -158,11 +179,15 @@ int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs,
 	if (err)
 		goto err_unreg_pool;
 
-	pool->netdev = netdev;
-	pool->queue_id = queue_id;
+	if (!pool->dma_pages) {
+		WARN(1, "Driver did not DMA map zero-copy buffers");
+		goto err_unreg_xsk;
+	}
 	pool->umem->zc = true;
 	return 0;
 
+err_unreg_xsk:
+	xp_disable_drv_zc(pool);
 err_unreg_pool:
 	if (!force_zc)
 		err = 0; /* fallback to copy mode */
@@ -173,25 +198,10 @@ int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs,
 
 void xp_clear_dev(struct xsk_buff_pool *pool)
 {
-	struct netdev_bpf bpf;
-	int err;
-
-	ASSERT_RTNL();
-
 	if (!pool->netdev)
 		return;
 
-	if (pool->umem->zc) {
-		bpf.command = XDP_SETUP_XSK_POOL;
-		bpf.xsk.pool = NULL;
-		bpf.xsk.queue_id = pool->queue_id;
-
-		err = pool->netdev->netdev_ops->ndo_bpf(pool->netdev, &bpf);
-
-		if (err)
-			WARN(1, "Failed to disable zero-copy!\n");
-	}
-
+	xp_disable_drv_zc(pool);
 	xsk_clear_pool_at_qid(pool->netdev, pool->queue_id);
 	dev_put(pool->netdev);
 	pool->netdev = NULL;
@@ -241,14 +251,61 @@ bool xp_validate_queues(struct xsk_buff_pool *pool)
 	return pool->fq && pool->cq;
 }
 
+static struct xsk_dma_map *xp_find_dma_map(struct xsk_buff_pool *pool)
+{
+	struct xsk_dma_map *dma_map;
+
+	list_for_each_entry(dma_map, &pool->netdev->xsk_dma_list, list) {
+		if (dma_map->umem == pool->umem)
+			return dma_map;
+	}
+
+	return NULL;
+}
+
+static void xp_destroy_dma_map(struct xsk_dma_map *dma_map)
+{
+	list_del(&dma_map->list);
+	kfree(dma_map);
+}
+
+static void xp_put_dma_map(struct xsk_dma_map *dma_map)
+{
+	if (!refcount_dec_and_test(&dma_map->users))
+		return;
+
+	xp_destroy_dma_map(dma_map);
+}
+
+static struct xsk_dma_map *xp_create_dma_map(struct xsk_buff_pool *pool)
+{
+	struct xsk_dma_map *dma_map;
+
+	dma_map = kzalloc(sizeof(*dma_map), GFP_KERNEL);
+	if (!dma_map)
+		return NULL;
+
+	dma_map->umem = pool->umem;
+	refcount_set(&dma_map->users, 1);
+	list_add(&dma_map->list, &pool->netdev->xsk_dma_list);
+	return dma_map;
+}
+
 void xp_dma_unmap(struct xsk_buff_pool *pool, unsigned long attrs)
 {
+	struct xsk_dma_map *dma_map;
 	dma_addr_t *dma;
 	u32 i;
 
 	if (pool->dma_pages_cnt == 0)
 		return;
 
+	dma_map = xp_find_dma_map(pool);
+	if (!dma_map) {
+		WARN(1, "Could not find dma_map for device");
+		return;
+	}
+
 	for (i = 0; i < pool->dma_pages_cnt; i++) {
 		dma = &pool->dma_pages[i];
 		if (*dma) {
@@ -258,6 +315,7 @@ void xp_dma_unmap(struct xsk_buff_pool *pool, unsigned long attrs)
 		}
 	}
 
+	xp_put_dma_map(dma_map);
 	kvfree(pool->dma_pages);
 	pool->dma_pages_cnt = 0;
 	pool->dev = NULL;
@@ -321,14 +379,29 @@ static bool xp_check_cheap_dma(struct xsk_buff_pool *pool)
 int xp_dma_map(struct xsk_buff_pool *pool, struct device *dev,
 	       unsigned long attrs, struct page **pages, u32 nr_pages)
 {
+	struct xsk_dma_map *dma_map;
 	dma_addr_t dma;
 	u32 i;
 
+	dma_map = xp_find_dma_map(pool);
+	if (dma_map) {
+		pool->dma_pages = dma_map->dma_pages;
+		refcount_inc(&dma_map->users);
+		return 0;
+	}
+
+	dma_map = xp_create_dma_map(pool);
+	if (!dma_map)
+		return -ENOMEM;
+
 	pool->dma_pages = kvcalloc(nr_pages, sizeof(*pool->dma_pages),
 				   GFP_KERNEL);
-	if (!pool->dma_pages)
+	if (!pool->dma_pages) {
+		xp_destroy_dma_map(dma_map);
 		return -ENOMEM;
+	}
 
+	dma_map->dma_pages = pool->dma_pages;
 	pool->dev = dev;
 	pool->dma_pages_cnt = nr_pages;
 
@@ -337,6 +410,7 @@ int xp_dma_map(struct xsk_buff_pool *pool, struct device *dev,
 					 DMA_BIDIRECTIONAL, attrs);
 		if (dma_mapping_error(dev, dma)) {
 			xp_dma_unmap(pool, attrs);
+			xp_destroy_dma_map(dma_map);
 			return -ENOMEM;
 		}
 		pool->dma_pages[i] = dma;
-- 
2.7.4


* [PATCH bpf-next 09/14] xsk: rearrange internal structs for better performance
  2020-07-02 12:18 [PATCH bpf-next 00/14] xsk: support shared umems between devices and queues Magnus Karlsson
                   ` (7 preceding siblings ...)
  2020-07-02 12:19 ` [PATCH bpf-next 08/14] xsk: net: enable sharing of dma mappings Magnus Karlsson
@ 2020-07-02 12:19 ` Magnus Karlsson
  2020-07-02 12:19 ` [PATCH bpf-next 10/14] xsk: add shared umem support between queue ids Magnus Karlsson
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Magnus Karlsson @ 2020-07-02 12:19 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev,
	jonathan.lemon, maximmi
  Cc: bpf, jeffrey.t.kirsher, maciej.fijalkowski,
	maciejromanfijalkowski, cristian.dumitrescu

Rearrange the xdp_sock, xdp_umem and xsk_buff_pool structures so
that they get smaller and align better with the cache lines. In the
previous commits of this patch set, these structs were reordered
with a focus on functionality and simplicity, not performance. This
patch improves throughput by around 3%.
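
The split follows the usual pattern: cold control-path members first,
then the first data-path member annotated so that the hot part starts
on its own cache line. A toy example of the annotation (not the actual
kernel struct):

struct example_pool {
        /* Control path only. */
        struct mutex            cfg_mutex;
        u16                     queue_id;

        /* Data path, starts on a new cache line to avoid false sharing
         * with control-path writes.
         */
        struct xsk_queue        *fq ____cacheline_aligned_in_smp;
        struct xsk_queue        *cq;
        void                    *addrs;
};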

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 include/net/xdp_sock.h      | 14 +++++++-------
 include/net/xsk_buff_pool.h | 27 +++++++++++++++------------
 2 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index d2fddf2..6c14d48 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -23,13 +23,13 @@ struct xdp_umem {
 	u32 headroom;
 	u32 chunk_size;
 	u32 chunks;
+	u32 npgs;
 	struct user_struct *user;
 	refcount_t users;
-	struct page **pgs;
-	u32 npgs;
 	u8 flags;
-	int id;
 	bool zc;
+	struct page **pgs;
+	int id;
 };
 
 struct xsk_map {
@@ -41,7 +41,7 @@ struct xsk_map {
 struct xdp_sock {
 	/* struct sock must be the first member of struct xdp_sock */
 	struct sock sk;
-	struct xsk_queue *rx;
+	struct xsk_queue *rx ____cacheline_aligned_in_smp;
 	struct net_device *dev;
 	struct xdp_umem *umem;
 	struct list_head flush_node;
@@ -53,8 +53,7 @@ struct xdp_sock {
 		XSK_BOUND,
 		XSK_UNBOUND,
 	} state;
-	/* Protects multiple processes in the control path */
-	struct mutex mutex;
+	u64 rx_dropped;
 	struct xsk_queue *tx ____cacheline_aligned_in_smp;
 	struct list_head tx_list;
 	/* Mutual exclusion of NAPI TX thread and sendmsg error paths
@@ -63,10 +62,11 @@ struct xdp_sock {
 	spinlock_t tx_completion_lock;
 	/* Protects generic receive. */
 	spinlock_t rx_lock;
-	u64 rx_dropped;
 	struct list_head map_list;
 	/* Protects map_list */
 	spinlock_t map_list_lock;
+	/* Protects multiple processes in the control path */
+	struct mutex mutex;
 };
 
 #ifdef CONFIG_XDP_SOCKETS
diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index 197cca8..7513a17 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -36,34 +36,37 @@ struct xsk_dma_map {
 };
 
 struct xsk_buff_pool {
-	struct xsk_queue *fq;
-	struct xsk_queue *cq;
+	/* Members only used in the control path first. */
+	struct device *dev;
+	struct net_device *netdev;
+	struct list_head xsk_tx_list;
+	/* Protects modifications to the xsk_tx_list */
+	spinlock_t xsk_tx_list_lock;
+	refcount_t users;
+	struct xdp_umem *umem;
+	struct work_struct work;
 	struct list_head free_list;
+	u32 heads_cnt;
+	u16 queue_id;
+
+	/* Data path members as close to free_heads at the end as possible. */
+	struct xsk_queue *fq ____cacheline_aligned_in_smp;
+	struct xsk_queue *cq;
 	dma_addr_t *dma_pages;
 	struct xdp_buff_xsk *heads;
 	u64 chunk_mask;
 	u64 addrs_cnt;
 	u32 free_list_cnt;
 	u32 dma_pages_cnt;
-	u32 heads_cnt;
 	u32 free_heads_cnt;
 	u32 headroom;
 	u32 chunk_size;
 	u32 frame_len;
-	u16 queue_id;
 	u8 cached_need_wakeup;
 	bool uses_need_wakeup;
 	bool cheap_dma;
 	bool unaligned;
-	struct xdp_umem *umem;
 	void *addrs;
-	struct device *dev;
-	struct net_device *netdev;
-	struct list_head xsk_tx_list;
-	/* Protects modifications to the xsk_tx_list */
-	spinlock_t xsk_tx_list_lock;
-	refcount_t users;
-	struct work_struct work;
 	struct xdp_buff_xsk *free_heads[];
 };
 
-- 
2.7.4


* [PATCH bpf-next 10/14] xsk: add shared umem support between queue ids
  2020-07-02 12:18 [PATCH bpf-next 00/14] xsk: support shared umems between devices and queues Magnus Karlsson
                   ` (8 preceding siblings ...)
  2020-07-02 12:19 ` [PATCH bpf-next 09/14] xsk: rearrange internal structs for better performance Magnus Karlsson
@ 2020-07-02 12:19 ` Magnus Karlsson
  2020-07-02 12:19 ` [PATCH bpf-next 11/14] xsk: add shared umem support between devices Magnus Karlsson
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Magnus Karlsson @ 2020-07-02 12:19 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev,
	jonathan.lemon, maximmi
  Cc: bpf, jeffrey.t.kirsher, maciej.fijalkowski,
	maciejromanfijalkowski, cristian.dumitrescu

Add support for sharing a umem between queue ids on the same
device. This mode can be invoked with the XDP_SHARED_UMEM bind
flag. Previously, sharing was only supported within the same queue
id and device, and one set of fill and completion rings was shared.
Note that when sharing a umem between queue ids, you need to create
a fill ring and a completion ring and tie them to the socket before
you bind with the XDP_SHARED_UMEM flag. This is so that the
single-producer single-consumer semantics of the rings can be
upheld.
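
A rough user-space sketch of that order of operations, using the raw
socket API rather than libbpf, is shown below. Error handling is
omitted, ring sizes are placeholders, and the Rx/Tx ring setup and
mmap() calls are elided.

#include <linux/if_xdp.h>
#include <net/if.h>
#include <sys/socket.h>

#ifndef AF_XDP
#define AF_XDP 44
#endif
#ifndef SOL_XDP
#define SOL_XDP 283
#endif

int bind_shared_on_other_queue(int first_xsk_fd, const char *ifname,
                               __u32 queue_id)
{
        struct sockaddr_xdp sxdp = {};
        int ring_size = 2048;   /* placeholder */
        int fd;

        fd = socket(AF_XDP, SOCK_RAW, 0);

        /* New fill and completion rings for this (dev, queue id) pair,
         * created before the bind as described above.
         */
        setsockopt(fd, SOL_XDP, XDP_UMEM_FILL_RING, &ring_size,
                   sizeof(ring_size));
        setsockopt(fd, SOL_XDP, XDP_UMEM_COMPLETION_RING, &ring_size,
                   sizeof(ring_size));

        /* Rx and/or Tx rings and their mmap()s are set up here as usual. */

        sxdp.sxdp_family = AF_XDP;
        sxdp.sxdp_ifindex = if_nametoindex(ifname);
        sxdp.sxdp_queue_id = queue_id;
        sxdp.sxdp_flags = XDP_SHARED_UMEM;
        sxdp.sxdp_shared_umem_fd = first_xsk_fd;

        return bind(fd, (struct sockaddr *)&sxdp, sizeof(sxdp));
}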

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 include/net/xsk_buff_pool.h |  3 +++
 net/xdp/xsk.c               | 51 +++++++++++++++++++++++++++++----------------
 net/xdp/xsk_buff_pool.c     | 27 ++++++++++++++++++++++--
 3 files changed, 61 insertions(+), 20 deletions(-)

diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index 7513a17..844901c 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -76,6 +76,9 @@ struct xsk_buff_pool *xp_assign_umem(struct xsk_buff_pool *pool,
 				     struct xdp_umem *umem);
 int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs,
 		  struct net_device *dev, u16 queue_id, u16 flags);
+int xp_assign_dev_shared(struct xsk_buff_pool *pool, struct xdp_sock *xs,
+			 struct xdp_umem *umem, struct net_device *dev,
+			 u16 queue_id);
 void xp_destroy(struct xsk_buff_pool *pool);
 void xp_release(struct xdp_buff_xsk *xskb);
 void xp_get_pool(struct xsk_buff_pool *pool);
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 4d0028c..1abc222 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -627,6 +627,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 	struct sockaddr_xdp *sxdp = (struct sockaddr_xdp *)addr;
 	struct sock *sk = sock->sk;
 	struct xdp_sock *xs = xdp_sk(sk);
+	struct xsk_buff_pool *new_pool;
 	struct net_device *dev;
 	u32 flags, qid;
 	int err = 0;
@@ -679,12 +680,6 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 			goto out_unlock;
 		}
 
-		if (xs->pool->fq || xs->pool->cq) {
-			/* Do not allow setting your own fq or cq. */
-			err = -EINVAL;
-			goto out_unlock;
-		}
-
 		sock = xsk_lookup_xsk_from_fd(sxdp->sxdp_shared_umem_fd);
 		if (IS_ERR(sock)) {
 			err = PTR_ERR(sock);
@@ -697,17 +692,43 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 			sockfd_put(sock);
 			goto out_unlock;
 		}
-		if (umem_xs->dev != dev || umem_xs->queue_id != qid) {
+		if (umem_xs->dev != dev) {
 			err = -EINVAL;
 			sockfd_put(sock);
 			goto out_unlock;
 		}
 
-		/* Share the buffer pool with the other socket. */
-		xp_get_pool(umem_xs->pool);
-		curr_pool = xs->pool;
-		xs->pool = umem_xs->pool;
-		xp_destroy(curr_pool);
+		if (umem_xs->queue_id != qid) {
+			/* Share the umem with another socket on another qid */
+			new_pool = xp_assign_umem(xs->pool, umem_xs->umem);
+			if (!new_pool) {
+				sockfd_put(sock);
+				goto out_unlock;
+			}
+
+			err = xp_assign_dev_shared(new_pool, xs, umem_xs->umem,
+						   dev, qid);
+			if (err) {
+				xp_destroy(new_pool);
+				sockfd_put(sock);
+				goto out_unlock;
+			}
+			xs->pool = new_pool;
+		} else {
+			/* Share the buffer pool with the other socket. */
+			if (xs->pool->fq || xs->pool->cq) {
+				/* Do not allow setting your own fq or cq. */
+				err = -EINVAL;
+				sockfd_put(sock);
+				goto out_unlock;
+			}
+
+			xp_get_pool(umem_xs->pool);
+			curr_pool = xs->pool;
+			xs->pool = umem_xs->pool;
+			xp_destroy(curr_pool);
+		}
+
 		xdp_get_umem(umem_xs->umem);
 		WRITE_ONCE(xs->umem, umem_xs->umem);
 		sockfd_put(sock);
@@ -715,8 +736,6 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 		err = -EINVAL;
 		goto out_unlock;
 	} else {
-		struct xsk_buff_pool *new_pool;
-
 		/* This xsk has its own umem. */
 		new_pool = xp_assign_umem(xs->pool, xs->umem);
 		if (!new_pool) {
@@ -841,10 +860,6 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname,
 			mutex_unlock(&xs->mutex);
 			return -EBUSY;
 		}
-		if (!xs->umem) {
-			mutex_unlock(&xs->mutex);
-			return -EINVAL;
-		}
 
 		q = (optname == XDP_UMEM_FILL_RING) ? &xs->pool->fq :
 			&xs->pool->cq;
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 3c58d76..7987c17 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -126,8 +126,8 @@ static void xp_disable_drv_zc(struct xsk_buff_pool *pool)
 	}
 }
 
-int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs,
-		  struct net_device *netdev, u16 queue_id, u16 flags)
+static int __xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs,
+			   struct net_device *netdev, u16 queue_id, u16 flags)
 {
 	bool force_zc, force_copy;
 	struct netdev_bpf bpf;
@@ -196,6 +196,29 @@ int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs,
 	return err;
 }
 
+int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs,
+		  struct net_device *dev, u16 queue_id, u16 flags)
+{
+	return __xp_assign_dev(pool, xs, dev, queue_id, flags);
+}
+
+int xp_assign_dev_shared(struct xsk_buff_pool *pool, struct xdp_sock *xs,
+			 struct xdp_umem *umem, struct net_device *dev,
+			 u16 queue_id)
+{
+	u16 flags;
+
+	/* One fill and completion ring required for each queue id. */
+	if (!pool->fq || !pool->cq)
+		return -EINVAL;
+
+	flags = umem->zc ? XDP_ZEROCOPY : XDP_COPY;
+	if (pool->uses_need_wakeup)
+		flags |= XDP_USE_NEED_WAKEUP;
+
+	return __xp_assign_dev(pool, xs, dev, queue_id, flags);
+}
+
 void xp_clear_dev(struct xsk_buff_pool *pool)
 {
 	if (!pool->netdev)
-- 
2.7.4


* [PATCH bpf-next 11/14] xsk: add shared umem support between devices
  2020-07-02 12:18 [PATCH bpf-next 00/14] xsk: support shared umems between devices and queues Magnus Karlsson
                   ` (9 preceding siblings ...)
  2020-07-02 12:19 ` [PATCH bpf-next 10/14] xsk: add shared umem support between queue ids Magnus Karlsson
@ 2020-07-02 12:19 ` Magnus Karlsson
  2020-07-09 15:20   ` Maxim Mikityanskiy
  2020-07-02 12:19 ` [PATCH bpf-next 12/14] libbpf: support shared umems between queues and devices Magnus Karlsson
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 25+ messages in thread
From: Magnus Karlsson @ 2020-07-02 12:19 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev,
	jonathan.lemon, maximmi
  Cc: bpf, jeffrey.t.kirsher, maciej.fijalkowski,
	maciejromanfijalkowski, cristian.dumitrescu

Add support for sharing a umem between different devices. This
mode can be invoked with the XDP_SHARED_UMEM bind flag. Previously,
sharing was only supported within the same device. Note that when
sharing a umem between devices, just as when sharing between queue
ids, you need to create a fill ring and a completion ring and tie
them to the socket (with two setsockopts, one for each ring) before
you bind with the XDP_SHARED_UMEM flag. This is so that the
single-producer single-consumer semantics of the rings can be
upheld.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 net/xdp/xsk.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 1abc222..b240221 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -692,14 +692,11 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 			sockfd_put(sock);
 			goto out_unlock;
 		}
-		if (umem_xs->dev != dev) {
-			err = -EINVAL;
-			sockfd_put(sock);
-			goto out_unlock;
-		}
 
-		if (umem_xs->queue_id != qid) {
-			/* Share the umem with another socket on another qid */
+		if (umem_xs->queue_id != qid || umem_xs->dev != dev) {
+			/* Share the umem with another socket on another qid
+			 * and/or device.
+			 */
 			new_pool = xp_assign_umem(xs->pool, umem_xs->umem);
 			if (!new_pool) {
 				sockfd_put(sock);
-- 
2.7.4


* [PATCH bpf-next 12/14] libbpf: support shared umems between queues and devices
  2020-07-02 12:18 [PATCH bpf-next 00/14] xsk: support shared umems between devices and queues Magnus Karlsson
                   ` (10 preceding siblings ...)
  2020-07-02 12:19 ` [PATCH bpf-next 11/14] xsk: add shared umem support between devices Magnus Karlsson
@ 2020-07-02 12:19 ` Magnus Karlsson
  2020-07-02 12:19 ` [PATCH bpf-next 13/14] samples/bpf: add new sample xsk_fwd.c Magnus Karlsson
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Magnus Karlsson @ 2020-07-02 12:19 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev,
	jonathan.lemon, maximmi
  Cc: bpf, jeffrey.t.kirsher, maciej.fijalkowski,
	maciejromanfijalkowski, cristian.dumitrescu

Add support for shared umems between hardware queues and devices to
the AF_XDP part of libbpf. This is so that zero-copy can be achieved
in applications that want to send and receive packets between HW
queues on one device or between different devices/netdevs.

To create sockets that share a umem between hardware queues and
devices, a new function called xsk_socket__create_shared() has been
added. It takes the same arguments as xsk_socket__create() plus
references to a fill ring and a completion ring. So for every socket
that shares a umem, you need one more set of fill and completion
rings. This is in order to maintain the single-producer
single-consumer semantics of the rings.

You can create all the sockets via the new xsk_socket__create_shared()
call, or create the first one with xsk_socket__create() and the rest
with xsk_socket__create_shared(). Both methods work.
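
A hedged usage sketch is shown below: one process opens two sockets
that share one umem, the second bound to another queue id. The
argument order of xsk_socket__create_shared() is assumed from the
description above (the same arguments as xsk_socket__create() with
the extra fill and completion ring pointers before the config), and
the interface name, queue ids and sizes are placeholders.

#include <bpf/xsk.h>

static int open_shared_sockets(void *umem_area, __u64 umem_size)
{
        /* Kept static so the rings outlive this function; a real
         * application would embed them in its own state.
         */
        static struct xsk_ring_prod fq0, fq1, tx0, tx1;
        static struct xsk_ring_cons cq0, cq1, rx0, rx1;
        struct xsk_socket *xsk0, *xsk1;
        struct xsk_umem *umem;
        int err;

        /* First socket: registering the umem sets up the first fill and
         * completion rings (fq0/cq0).
         */
        err = xsk_umem__create(&umem, umem_area, umem_size, &fq0, &cq0, NULL);
        if (err)
                return err;

        err = xsk_socket__create(&xsk0, "eth0", 0, umem, &rx0, &tx0, NULL);
        if (err)
                return err;

        /* Second socket: shares the umem but is bound to queue id 1, so
         * it gets its own fill and completion rings (fq1/cq1).
         */
        return xsk_socket__create_shared(&xsk1, "eth0", 1, umem, &rx1, &tx1,
                                         &fq1, &cq1, NULL);
}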

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 tools/lib/bpf/libbpf.map |   1 +
 tools/lib/bpf/xsk.c      | 376 ++++++++++++++++++++++++++++++-----------------
 tools/lib/bpf/xsk.h      |   9 ++
 3 files changed, 254 insertions(+), 132 deletions(-)

diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 6544d2c..eb8065b 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -288,4 +288,5 @@ LIBBPF_0.1.0 {
 		bpf_map__value_size;
 		bpf_program__autoload;
 		bpf_program__set_autoload;
+		xsk_socket__create_shared;
 } LIBBPF_0.0.9;
diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index f7f4efb..86ad4f7 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -20,6 +20,7 @@
 #include <linux/if_ether.h>
 #include <linux/if_packet.h>
 #include <linux/if_xdp.h>
+#include <linux/list.h>
 #include <linux/sockios.h>
 #include <net/if.h>
 #include <sys/ioctl.h>
@@ -48,26 +49,35 @@
 #endif
 
 struct xsk_umem {
-	struct xsk_ring_prod *fill;
-	struct xsk_ring_cons *comp;
+	struct xsk_ring_prod *fill_save;
+	struct xsk_ring_cons *comp_save;
 	char *umem_area;
 	struct xsk_umem_config config;
 	int fd;
 	int refcount;
+	struct list_head ctx_list;
+};
+
+struct xsk_ctx {
+	struct xsk_ring_prod *fill;
+	struct xsk_ring_cons *comp;
+	__u32 queue_id;
+	struct xsk_umem *umem;
+	int refcount;
+	int ifindex;
+	struct list_head list;
+	int prog_fd;
+	int xsks_map_fd;
+	char ifname[IFNAMSIZ];
 };
 
 struct xsk_socket {
 	struct xsk_ring_cons *rx;
 	struct xsk_ring_prod *tx;
 	__u64 outstanding_tx;
-	struct xsk_umem *umem;
+	struct xsk_ctx *ctx;
 	struct xsk_socket_config config;
 	int fd;
-	int ifindex;
-	int prog_fd;
-	int xsks_map_fd;
-	__u32 queue_id;
-	char ifname[IFNAMSIZ];
 };
 
 struct xsk_nl_info {
@@ -203,15 +213,73 @@ static int xsk_get_mmap_offsets(int fd, struct xdp_mmap_offsets *off)
 	return -EINVAL;
 }
 
+static int xsk_create_umem_rings(struct xsk_umem *umem, int fd,
+				 struct xsk_ring_prod *fill,
+				 struct xsk_ring_cons *comp)
+{
+	struct xdp_mmap_offsets off;
+	void *map;
+	int err;
+
+	err = setsockopt(fd, SOL_XDP, XDP_UMEM_FILL_RING,
+			 &umem->config.fill_size,
+			 sizeof(umem->config.fill_size));
+	if (err)
+		return -errno;
+
+	err = setsockopt(fd, SOL_XDP, XDP_UMEM_COMPLETION_RING,
+			 &umem->config.comp_size,
+			 sizeof(umem->config.comp_size));
+	if (err)
+		return -errno;
+
+	err = xsk_get_mmap_offsets(fd, &off);
+	if (err)
+		return -errno;
+
+	map = mmap(NULL, off.fr.desc + umem->config.fill_size * sizeof(__u64),
+		   PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd,
+		   XDP_UMEM_PGOFF_FILL_RING);
+	if (map == MAP_FAILED)
+		return -errno;
+
+	fill->mask = umem->config.fill_size - 1;
+	fill->size = umem->config.fill_size;
+	fill->producer = map + off.fr.producer;
+	fill->consumer = map + off.fr.consumer;
+	fill->flags = map + off.fr.flags;
+	fill->ring = map + off.fr.desc;
+	fill->cached_cons = umem->config.fill_size;
+
+	map = mmap(NULL, off.cr.desc + umem->config.comp_size * sizeof(__u64),
+		   PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd,
+		   XDP_UMEM_PGOFF_COMPLETION_RING);
+	if (map == MAP_FAILED) {
+		err = -errno;
+		goto out_mmap;
+	}
+
+	comp->mask = umem->config.comp_size - 1;
+	comp->size = umem->config.comp_size;
+	comp->producer = map + off.cr.producer;
+	comp->consumer = map + off.cr.consumer;
+	comp->flags = map + off.cr.flags;
+	comp->ring = map + off.cr.desc;
+
+	return 0;
+
+out_mmap:
+	munmap(map, off.fr.desc + umem->config.fill_size * sizeof(__u64));
+	return err;
+}
+
 int xsk_umem__create_v0_0_4(struct xsk_umem **umem_ptr, void *umem_area,
 			    __u64 size, struct xsk_ring_prod *fill,
 			    struct xsk_ring_cons *comp,
 			    const struct xsk_umem_config *usr_config)
 {
-	struct xdp_mmap_offsets off;
 	struct xdp_umem_reg mr;
 	struct xsk_umem *umem;
-	void *map;
 	int err;
 
 	if (!umem_area || !umem_ptr || !fill || !comp)
@@ -230,6 +298,7 @@ int xsk_umem__create_v0_0_4(struct xsk_umem **umem_ptr, void *umem_area,
 	}
 
 	umem->umem_area = umem_area;
+	INIT_LIST_HEAD(&umem->ctx_list);
 	xsk_set_umem_config(&umem->config, usr_config);
 
 	memset(&mr, 0, sizeof(mr));
@@ -244,71 +313,16 @@ int xsk_umem__create_v0_0_4(struct xsk_umem **umem_ptr, void *umem_area,
 		err = -errno;
 		goto out_socket;
 	}
-	err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_FILL_RING,
-			 &umem->config.fill_size,
-			 sizeof(umem->config.fill_size));
-	if (err) {
-		err = -errno;
-		goto out_socket;
-	}
-	err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_COMPLETION_RING,
-			 &umem->config.comp_size,
-			 sizeof(umem->config.comp_size));
-	if (err) {
-		err = -errno;
-		goto out_socket;
-	}
 
-	err = xsk_get_mmap_offsets(umem->fd, &off);
-	if (err) {
-		err = -errno;
-		goto out_socket;
-	}
-
-	map = mmap(NULL, off.fr.desc + umem->config.fill_size * sizeof(__u64),
-		   PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, umem->fd,
-		   XDP_UMEM_PGOFF_FILL_RING);
-	if (map == MAP_FAILED) {
-		err = -errno;
+	err = xsk_create_umem_rings(umem, umem->fd, fill, comp);
+	if (err)
 		goto out_socket;
-	}
-
-	umem->fill = fill;
-	fill->mask = umem->config.fill_size - 1;
-	fill->size = umem->config.fill_size;
-	fill->producer = map + off.fr.producer;
-	fill->consumer = map + off.fr.consumer;
-	fill->flags = map + off.fr.flags;
-	fill->ring = map + off.fr.desc;
-	fill->cached_prod = *fill->producer;
-	/* cached_cons is "size" bigger than the real consumer pointer
-	 * See xsk_prod_nb_free
-	 */
-	fill->cached_cons = *fill->consumer + umem->config.fill_size;
-
-	map = mmap(NULL, off.cr.desc + umem->config.comp_size * sizeof(__u64),
-		   PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, umem->fd,
-		   XDP_UMEM_PGOFF_COMPLETION_RING);
-	if (map == MAP_FAILED) {
-		err = -errno;
-		goto out_mmap;
-	}
-
-	umem->comp = comp;
-	comp->mask = umem->config.comp_size - 1;
-	comp->size = umem->config.comp_size;
-	comp->producer = map + off.cr.producer;
-	comp->consumer = map + off.cr.consumer;
-	comp->flags = map + off.cr.flags;
-	comp->ring = map + off.cr.desc;
-	comp->cached_prod = *comp->producer;
-	comp->cached_cons = *comp->consumer;
 
+	umem->fill_save = fill;
+	umem->comp_save = comp;
 	*umem_ptr = umem;
 	return 0;
 
-out_mmap:
-	munmap(map, off.fr.desc + umem->config.fill_size * sizeof(__u64));
 out_socket:
 	close(umem->fd);
 out_umem_alloc:
@@ -342,6 +356,7 @@ DEFAULT_VERSION(xsk_umem__create_v0_0_4, xsk_umem__create, LIBBPF_0.0.4)
 static int xsk_load_xdp_prog(struct xsk_socket *xsk)
 {
 	static const int log_buf_size = 16 * 1024;
+	struct xsk_ctx *ctx = xsk->ctx;
 	char log_buf[log_buf_size];
 	int err, prog_fd;
 
@@ -369,7 +384,7 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk)
 		/* *(u32 *)(r10 - 4) = r2 */
 		BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -4),
 		/* r1 = xskmap[] */
-		BPF_LD_MAP_FD(BPF_REG_1, xsk->xsks_map_fd),
+		BPF_LD_MAP_FD(BPF_REG_1, ctx->xsks_map_fd),
 		/* r3 = XDP_PASS */
 		BPF_MOV64_IMM(BPF_REG_3, 2),
 		/* call bpf_redirect_map */
@@ -381,7 +396,7 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk)
 		/* r2 += -4 */
 		BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
 		/* r1 = xskmap[] */
-		BPF_LD_MAP_FD(BPF_REG_1, xsk->xsks_map_fd),
+		BPF_LD_MAP_FD(BPF_REG_1, ctx->xsks_map_fd),
 		/* call bpf_map_lookup_elem */
 		BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
 		/* r1 = r0 */
@@ -393,7 +408,7 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk)
 		/* r2 = *(u32 *)(r10 - 4) */
 		BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_10, -4),
 		/* r1 = xskmap[] */
-		BPF_LD_MAP_FD(BPF_REG_1, xsk->xsks_map_fd),
+		BPF_LD_MAP_FD(BPF_REG_1, ctx->xsks_map_fd),
 		/* r3 = 0 */
 		BPF_MOV64_IMM(BPF_REG_3, 0),
 		/* call bpf_redirect_map */
@@ -411,19 +426,21 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk)
 		return prog_fd;
 	}
 
-	err = bpf_set_link_xdp_fd(xsk->ifindex, prog_fd, xsk->config.xdp_flags);
+	err = bpf_set_link_xdp_fd(xsk->ctx->ifindex, prog_fd,
+				  xsk->config.xdp_flags);
 	if (err) {
 		close(prog_fd);
 		return err;
 	}
 
-	xsk->prog_fd = prog_fd;
+	ctx->prog_fd = prog_fd;
 	return 0;
 }
 
 static int xsk_get_max_queues(struct xsk_socket *xsk)
 {
 	struct ethtool_channels channels = { .cmd = ETHTOOL_GCHANNELS };
+	struct xsk_ctx *ctx = xsk->ctx;
 	struct ifreq ifr = {};
 	int fd, err, ret;
 
@@ -432,7 +449,7 @@ static int xsk_get_max_queues(struct xsk_socket *xsk)
 		return -errno;
 
 	ifr.ifr_data = (void *)&channels;
-	memcpy(ifr.ifr_name, xsk->ifname, IFNAMSIZ - 1);
+	memcpy(ifr.ifr_name, ctx->ifname, IFNAMSIZ - 1);
 	ifr.ifr_name[IFNAMSIZ - 1] = '\0';
 	err = ioctl(fd, SIOCETHTOOL, &ifr);
 	if (err && errno != EOPNOTSUPP) {
@@ -460,6 +477,7 @@ static int xsk_get_max_queues(struct xsk_socket *xsk)
 
 static int xsk_create_bpf_maps(struct xsk_socket *xsk)
 {
+	struct xsk_ctx *ctx = xsk->ctx;
 	int max_queues;
 	int fd;
 
@@ -472,15 +490,17 @@ static int xsk_create_bpf_maps(struct xsk_socket *xsk)
 	if (fd < 0)
 		return fd;
 
-	xsk->xsks_map_fd = fd;
+	ctx->xsks_map_fd = fd;
 
 	return 0;
 }
 
 static void xsk_delete_bpf_maps(struct xsk_socket *xsk)
 {
-	bpf_map_delete_elem(xsk->xsks_map_fd, &xsk->queue_id);
-	close(xsk->xsks_map_fd);
+	struct xsk_ctx *ctx = xsk->ctx;
+
+	bpf_map_delete_elem(ctx->xsks_map_fd, &ctx->queue_id);
+	close(ctx->xsks_map_fd);
 }
 
 static int xsk_lookup_bpf_maps(struct xsk_socket *xsk)
@@ -488,10 +508,11 @@ static int xsk_lookup_bpf_maps(struct xsk_socket *xsk)
 	__u32 i, *map_ids, num_maps, prog_len = sizeof(struct bpf_prog_info);
 	__u32 map_len = sizeof(struct bpf_map_info);
 	struct bpf_prog_info prog_info = {};
+	struct xsk_ctx *ctx = xsk->ctx;
 	struct bpf_map_info map_info;
 	int fd, err;
 
-	err = bpf_obj_get_info_by_fd(xsk->prog_fd, &prog_info, &prog_len);
+	err = bpf_obj_get_info_by_fd(ctx->prog_fd, &prog_info, &prog_len);
 	if (err)
 		return err;
 
@@ -505,11 +526,11 @@ static int xsk_lookup_bpf_maps(struct xsk_socket *xsk)
 	prog_info.nr_map_ids = num_maps;
 	prog_info.map_ids = (__u64)(unsigned long)map_ids;
 
-	err = bpf_obj_get_info_by_fd(xsk->prog_fd, &prog_info, &prog_len);
+	err = bpf_obj_get_info_by_fd(ctx->prog_fd, &prog_info, &prog_len);
 	if (err)
 		goto out_map_ids;
 
-	xsk->xsks_map_fd = -1;
+	ctx->xsks_map_fd = -1;
 
 	for (i = 0; i < prog_info.nr_map_ids; i++) {
 		fd = bpf_map_get_fd_by_id(map_ids[i]);
@@ -523,7 +544,7 @@ static int xsk_lookup_bpf_maps(struct xsk_socket *xsk)
 		}
 
 		if (!strcmp(map_info.name, "xsks_map")) {
-			xsk->xsks_map_fd = fd;
+			ctx->xsks_map_fd = fd;
 			continue;
 		}
 
@@ -531,7 +552,7 @@ static int xsk_lookup_bpf_maps(struct xsk_socket *xsk)
 	}
 
 	err = 0;
-	if (xsk->xsks_map_fd == -1)
+	if (ctx->xsks_map_fd == -1)
 		err = -ENOENT;
 
 out_map_ids:
@@ -541,16 +562,19 @@ static int xsk_lookup_bpf_maps(struct xsk_socket *xsk)
 
 static int xsk_set_bpf_maps(struct xsk_socket *xsk)
 {
-	return bpf_map_update_elem(xsk->xsks_map_fd, &xsk->queue_id,
+	struct xsk_ctx *ctx = xsk->ctx;
+
+	return bpf_map_update_elem(ctx->xsks_map_fd, &ctx->queue_id,
 				   &xsk->fd, 0);
 }
 
 static int xsk_setup_xdp_prog(struct xsk_socket *xsk)
 {
+	struct xsk_ctx *ctx = xsk->ctx;
 	__u32 prog_id = 0;
 	int err;
 
-	err = bpf_get_link_xdp_id(xsk->ifindex, &prog_id,
+	err = bpf_get_link_xdp_id(ctx->ifindex, &prog_id,
 				  xsk->config.xdp_flags);
 	if (err)
 		return err;
@@ -566,12 +590,12 @@ static int xsk_setup_xdp_prog(struct xsk_socket *xsk)
 			return err;
 		}
 	} else {
-		xsk->prog_fd = bpf_prog_get_fd_by_id(prog_id);
-		if (xsk->prog_fd < 0)
+		ctx->prog_fd = bpf_prog_get_fd_by_id(prog_id);
+		if (ctx->prog_fd < 0)
 			return -errno;
 		err = xsk_lookup_bpf_maps(xsk);
 		if (err) {
-			close(xsk->prog_fd);
+			close(ctx->prog_fd);
 			return err;
 		}
 	}
@@ -580,25 +604,110 @@ static int xsk_setup_xdp_prog(struct xsk_socket *xsk)
 		err = xsk_set_bpf_maps(xsk);
 	if (err) {
 		xsk_delete_bpf_maps(xsk);
-		close(xsk->prog_fd);
+		close(ctx->prog_fd);
 		return err;
 	}
 
 	return 0;
 }
 
-int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
-		       __u32 queue_id, struct xsk_umem *umem,
-		       struct xsk_ring_cons *rx, struct xsk_ring_prod *tx,
-		       const struct xsk_socket_config *usr_config)
+static struct xsk_ctx *xsk_get_ctx(struct xsk_umem *umem, int ifindex,
+				   __u32 queue_id)
+{
+	struct xsk_ctx *ctx;
+
+	if (list_empty(&umem->ctx_list))
+		return NULL;
+
+	list_for_each_entry(ctx, &umem->ctx_list, list) {
+		if (ctx->ifindex == ifindex && ctx->queue_id == queue_id) {
+			ctx->refcount++;
+			return ctx;
+		}
+	}
+
+	return NULL;
+}
+
+static void xsk_put_ctx(struct xsk_ctx *ctx)
+{
+	struct xsk_umem *umem = ctx->umem;
+	struct xdp_mmap_offsets off;
+	int err;
+
+	if (--ctx->refcount == 0) {
+		err = xsk_get_mmap_offsets(umem->fd, &off);
+		if (!err) {
+			munmap(ctx->fill->ring - off.fr.desc,
+			       off.fr.desc + umem->config.fill_size *
+			       sizeof(__u64));
+			munmap(ctx->comp->ring - off.cr.desc,
+			       off.cr.desc + umem->config.comp_size *
+			       sizeof(__u64));
+		}
+
+		list_del(&ctx->list);
+		free(ctx);
+	}
+}
+
+static struct xsk_ctx *xsk_create_ctx(struct xsk_socket *xsk,
+				      struct xsk_umem *umem, int ifindex,
+				      const char *ifname, __u32 queue_id,
+				      struct xsk_ring_prod *fill,
+				      struct xsk_ring_cons *comp)
+{
+	struct xsk_ctx *ctx;
+	int err;
+
+	ctx = calloc(1, sizeof(*ctx));
+	if (!ctx)
+		return NULL;
+
+	if (!umem->fill_save) {
+		err = xsk_create_umem_rings(umem, xsk->fd, fill, comp);
+		if (err) {
+			free(ctx);
+			return NULL;
+		}
+	} else if (umem->fill_save != fill || umem->comp_save != comp) {
+		/* Copy over rings to new structs. */
+		memcpy(fill, umem->fill_save, sizeof(*fill));
+		memcpy(comp, umem->comp_save, sizeof(*comp));
+	}
+
+	ctx->ifindex = ifindex;
+	ctx->refcount = 1;
+	ctx->umem = umem;
+	ctx->queue_id = queue_id;
+	memcpy(ctx->ifname, ifname, IFNAMSIZ - 1);
+	ctx->ifname[IFNAMSIZ - 1] = '\0';
+
+	umem->fill_save = NULL;
+	umem->comp_save = NULL;
+	ctx->fill = fill;
+	ctx->comp = comp;
+	list_add(&ctx->list, &umem->ctx_list);
+	return ctx;
+}
+
+int xsk_socket__create_shared(struct xsk_socket **xsk_ptr,
+			      const char *ifname,
+			      __u32 queue_id, struct xsk_umem *umem,
+			      struct xsk_ring_cons *rx,
+			      struct xsk_ring_prod *tx,
+			      struct xsk_ring_prod *fill,
+			      struct xsk_ring_cons *comp,
+			      const struct xsk_socket_config *usr_config)
 {
 	void *rx_map = NULL, *tx_map = NULL;
 	struct sockaddr_xdp sxdp = {};
 	struct xdp_mmap_offsets off;
 	struct xsk_socket *xsk;
-	int err;
+	struct xsk_ctx *ctx;
+	int err, ifindex;
 
-	if (!umem || !xsk_ptr || !(rx || tx))
+	if (!umem || !xsk_ptr || !(rx || tx) || !fill || !comp)
 		return -EFAULT;
 
 	xsk = calloc(1, sizeof(*xsk));
@@ -609,10 +718,10 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
 	if (err)
 		goto out_xsk_alloc;
 
-	if (umem->refcount &&
-	    !(xsk->config.libbpf_flags & XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD)) {
-		pr_warn("Error: shared umems not supported by libbpf supplied XDP program.\n");
-		err = -EBUSY;
+	xsk->outstanding_tx = 0;
+	ifindex = if_nametoindex(ifname);
+	if (!ifindex) {
+		err = -errno;
 		goto out_xsk_alloc;
 	}
 
@@ -626,16 +735,16 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
 		xsk->fd = umem->fd;
 	}
 
-	xsk->outstanding_tx = 0;
-	xsk->queue_id = queue_id;
-	xsk->umem = umem;
-	xsk->ifindex = if_nametoindex(ifname);
-	if (!xsk->ifindex) {
-		err = -errno;
-		goto out_socket;
+	ctx = xsk_get_ctx(umem, ifindex, queue_id);
+	if (!ctx) {
+		ctx = xsk_create_ctx(xsk, umem, ifindex, ifname, queue_id,
+				     fill, comp);
+		if (!ctx) {
+			err = -ENOMEM;
+			goto out_socket;
+		}
 	}
-	memcpy(xsk->ifname, ifname, IFNAMSIZ - 1);
-	xsk->ifname[IFNAMSIZ - 1] = '\0';
+	xsk->ctx = ctx;
 
 	if (rx) {
 		err = setsockopt(xsk->fd, SOL_XDP, XDP_RX_RING,
@@ -643,7 +752,7 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
 				 sizeof(xsk->config.rx_size));
 		if (err) {
 			err = -errno;
-			goto out_socket;
+			goto out_put_ctx;
 		}
 	}
 	if (tx) {
@@ -652,14 +761,14 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
 				 sizeof(xsk->config.tx_size));
 		if (err) {
 			err = -errno;
-			goto out_socket;
+			goto out_put_ctx;
 		}
 	}
 
 	err = xsk_get_mmap_offsets(xsk->fd, &off);
 	if (err) {
 		err = -errno;
-		goto out_socket;
+		goto out_put_ctx;
 	}
 
 	if (rx) {
@@ -669,7 +778,7 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
 			      xsk->fd, XDP_PGOFF_RX_RING);
 		if (rx_map == MAP_FAILED) {
 			err = -errno;
-			goto out_socket;
+			goto out_put_ctx;
 		}
 
 		rx->mask = xsk->config.rx_size - 1;
@@ -708,10 +817,10 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
 	xsk->tx = tx;
 
 	sxdp.sxdp_family = PF_XDP;
-	sxdp.sxdp_ifindex = xsk->ifindex;
-	sxdp.sxdp_queue_id = xsk->queue_id;
+	sxdp.sxdp_ifindex = ctx->ifindex;
+	sxdp.sxdp_queue_id = ctx->queue_id;
 	if (umem->refcount > 1) {
-		sxdp.sxdp_flags = XDP_SHARED_UMEM;
+		sxdp.sxdp_flags |= XDP_SHARED_UMEM;
 		sxdp.sxdp_shared_umem_fd = umem->fd;
 	} else {
 		sxdp.sxdp_flags = xsk->config.bind_flags;
@@ -723,7 +832,7 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
 		goto out_mmap_tx;
 	}
 
-	xsk->prog_fd = -1;
+	ctx->prog_fd = -1;
 
 	if (!(xsk->config.libbpf_flags & XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD)) {
 		err = xsk_setup_xdp_prog(xsk);
@@ -742,6 +851,8 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
 	if (rx)
 		munmap(rx_map, off.rx.desc +
 		       xsk->config.rx_size * sizeof(struct xdp_desc));
+out_put_ctx:
+	xsk_put_ctx(ctx);
 out_socket:
 	if (--umem->refcount)
 		close(xsk->fd);
@@ -750,25 +861,24 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
 	return err;
 }
 
-int xsk_umem__delete(struct xsk_umem *umem)
+int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
+		       __u32 queue_id, struct xsk_umem *umem,
+		       struct xsk_ring_cons *rx, struct xsk_ring_prod *tx,
+		       const struct xsk_socket_config *usr_config)
 {
-	struct xdp_mmap_offsets off;
-	int err;
+	return xsk_socket__create_shared(xsk_ptr, ifname, queue_id, umem,
+					 rx, tx, umem->fill_save,
+					 umem->comp_save, usr_config);
+}
 
+int xsk_umem__delete(struct xsk_umem *umem)
+{
 	if (!umem)
 		return 0;
 
 	if (umem->refcount)
 		return -EBUSY;
 
-	err = xsk_get_mmap_offsets(umem->fd, &off);
-	if (!err) {
-		munmap(umem->fill->ring - off.fr.desc,
-		       off.fr.desc + umem->config.fill_size * sizeof(__u64));
-		munmap(umem->comp->ring - off.cr.desc,
-		       off.cr.desc + umem->config.comp_size * sizeof(__u64));
-	}
-
 	close(umem->fd);
 	free(umem);
 
@@ -778,15 +888,16 @@ int xsk_umem__delete(struct xsk_umem *umem)
 void xsk_socket__delete(struct xsk_socket *xsk)
 {
 	size_t desc_sz = sizeof(struct xdp_desc);
+	struct xsk_ctx *ctx = xsk->ctx;
 	struct xdp_mmap_offsets off;
 	int err;
 
 	if (!xsk)
 		return;
 
-	if (xsk->prog_fd != -1) {
+	if (ctx->prog_fd != -1) {
 		xsk_delete_bpf_maps(xsk);
-		close(xsk->prog_fd);
+		close(ctx->prog_fd);
 	}
 
 	err = xsk_get_mmap_offsets(xsk->fd, &off);
@@ -799,14 +910,15 @@ void xsk_socket__delete(struct xsk_socket *xsk)
 			munmap(xsk->tx->ring - off.tx.desc,
 			       off.tx.desc + xsk->config.tx_size * desc_sz);
 		}
-
 	}
 
-	xsk->umem->refcount--;
+	xsk_put_ctx(ctx);
+
+	ctx->umem->refcount--;
 	/* Do not close an fd that also has an associated umem connected
 	 * to it.
 	 */
-	if (xsk->fd != xsk->umem->fd)
+	if (xsk->fd != ctx->umem->fd)
 		close(xsk->fd);
 	free(xsk);
 }
diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
index 584f682..1069c46 100644
--- a/tools/lib/bpf/xsk.h
+++ b/tools/lib/bpf/xsk.h
@@ -234,6 +234,15 @@ LIBBPF_API int xsk_socket__create(struct xsk_socket **xsk,
 				  struct xsk_ring_cons *rx,
 				  struct xsk_ring_prod *tx,
 				  const struct xsk_socket_config *config);
+LIBBPF_API int
+xsk_socket__create_shared(struct xsk_socket **xsk_ptr,
+			  const char *ifname,
+			  __u32 queue_id, struct xsk_umem *umem,
+			  struct xsk_ring_cons *rx,
+			  struct xsk_ring_prod *tx,
+			  struct xsk_ring_prod *fill,
+			  struct xsk_ring_cons *comp,
+			  const struct xsk_socket_config *config);
 
 /* Returns 0 for success and -EBUSY if the umem is still in use. */
 LIBBPF_API int xsk_umem__delete(struct xsk_umem *umem);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH bpf-next 13/14] samples/bpf: add new sample xsk_fwd.c
  2020-07-02 12:18 [PATCH bpf-next 00/14] xsk: support shared umems between devices and queues Magnus Karlsson
                   ` (11 preceding siblings ...)
  2020-07-02 12:19 ` [PATCH bpf-next 12/14] libbpf: support shared umems between queues and devices Magnus Karlsson
@ 2020-07-02 12:19 ` Magnus Karlsson
  2020-07-02 12:19 ` [PATCH bpf-next 14/14] xsk: documentation for XDP_SHARED_UMEM between queues and netdevs Magnus Karlsson
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Magnus Karlsson @ 2020-07-02 12:19 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev,
	jonathan.lemon, maximmi
  Cc: Cristian Dumitrescu, bpf, jeffrey.t.kirsher, maciej.fijalkowski,
	maciejromanfijalkowski

From: Cristian Dumitrescu <cristian.dumitrescu@intel.com>

This sample code illustrates the packet forwarding between multiple
AF_XDP sockets in multi-threading environment. All the threads and
sockets are sharing a common buffer pool, with each socket having
its own private buffer cache. The sockets are created with the
xsk_socket__create_shared() function, which allows multiple AF_XDP
sockets to share the same UMEM object.

Example 1: Single thread handling two sockets. Packets received
from socket A (on top of interface IFA, queue QA) are forwarded
to socket B (on top of interface IFB, queue QB) and vice-versa.
The thread is affinitized to CPU core C:

./xsk_fwd -i IFA -q QA -i IFB -q QB -c C

Example 2: Two threads, each handling two sockets. Packets from
socket A are sent to socket B (by thread X), packets
from socket B are sent to socket A (by thread X); packets from
socket C are sent to socket D (by thread Y), packets from socket
D are sent to socket C (by thread Y). The two threads are bound
to CPU cores CX and CY:

./xsk_fwd -i IFA -q QA -i IFB -q QB -i IFC -q QC -i IFD -q QD
-c CX -c CY
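
At its core, each port opens its AF_XDP socket through the new
shared-UMEM API. A simplified excerpt of port_init() from the code
below (error handling and FILL ring population omitted):

	/* Each socket gets private RX/TX rings and private FILL/COMPLETION
	 * rings, while the UMEM of the buffer pool is shared by all sockets.
	 */
	status = xsk_socket__create_shared(&p->xsk,
					   params->iface,
					   params->iface_queue,
					   params->bp->umem,
					   &p->rxq, &p->txq,
					   &p->umem_fq, &p->umem_cq,
					   &params->xsk_cfg);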

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
---
 samples/bpf/Makefile  |    3 +
 samples/bpf/xsk_fwd.c | 1075 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 1078 insertions(+)
 create mode 100644 samples/bpf/xsk_fwd.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 8403e47..92a3cfb 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -48,6 +48,7 @@ tprogs-y += syscall_tp
 tprogs-y += cpustat
 tprogs-y += xdp_adjust_tail
 tprogs-y += xdpsock
+tprogs-y += xsk_fwd
 tprogs-y += xdp_fwd
 tprogs-y += task_fd_query
 tprogs-y += xdp_sample_pkts
@@ -104,6 +105,7 @@ syscall_tp-objs := bpf_load.o syscall_tp_user.o
 cpustat-objs := bpf_load.o cpustat_user.o
 xdp_adjust_tail-objs := xdp_adjust_tail_user.o
 xdpsock-objs := xdpsock_user.o
+xsk_fwd-objs := xsk_fwd.o
 xdp_fwd-objs := xdp_fwd_user.o
 task_fd_query-objs := bpf_load.o task_fd_query_user.o $(TRACE_HELPERS)
 xdp_sample_pkts-objs := xdp_sample_pkts_user.o $(TRACE_HELPERS)
@@ -203,6 +205,7 @@ TPROGLDLIBS_trace_output	+= -lrt
 TPROGLDLIBS_map_perf_test	+= -lrt
 TPROGLDLIBS_test_overhead	+= -lrt
 TPROGLDLIBS_xdpsock		+= -pthread
+TPROGLDLIBS_xsk_fwd		+= -pthread
 
 # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
 #  make M=samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
diff --git a/samples/bpf/xsk_fwd.c b/samples/bpf/xsk_fwd.c
new file mode 100644
index 0000000..3af105e
--- /dev/null
+++ b/samples/bpf/xsk_fwd.c
@@ -0,0 +1,1075 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2020 Intel Corporation. */
+
+#define _GNU_SOURCE
+#include <poll.h>
+#include <pthread.h>
+#include <signal.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <sys/resource.h>
+#include <sys/socket.h>
+#include <sys/types.h>
+#include <time.h>
+#include <unistd.h>
+#include <getopt.h>
+#include <netinet/ether.h>
+
+#include <linux/bpf.h>
+#include <linux/if_link.h>
+#include <linux/if_xdp.h>
+
+#include <bpf/libbpf.h>
+#include <bpf/xsk.h>
+#include <bpf/bpf.h>
+
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+
+typedef __u64 u64;
+typedef __u32 u32;
+typedef __u16 u16;
+typedef __u8  u8;
+
+/* This program illustrates the packet forwarding between multiple AF_XDP
+ * sockets in a multi-threaded environment. All threads are sharing a common
+ * buffer pool, with each socket having its own private buffer cache.
+ *
+ * Example 1: Single thread handling two sockets. The packets received by socket
+ * A (interface IFA, queue QA) are forwarded to socket B (interface IFB, queue
+ * QB), while the packets received by socket B are forwarded to socket A. The
+ * thread is running on CPU core X:
+ *
+ *         ./xsk_fwd -i IFA -q QA -i IFB -q QB -c X
+ *
+ * Example 2: Two threads, each handling two sockets. The thread running on CPU
+ * core X forwards all the packets received by socket A to socket B, and all the
+ * packets received by socket B to socket A. The thread running on CPU core Y is
+ * performing the same packet forwarding between sockets C and D:
+ *
+ *         ./xsk_fwd -i IFA -q QA -i IFB -q QB -i IFC -q QC -i IFD -q QD
+ *         -c CX -c CY
+ */
+
+/*
+ * Buffer pool and buffer cache
+ *
+ * For packet forwarding, the packet buffers are typically allocated from the
+ * pool for packet reception and freed back to the pool for further reuse once
+ * the packet transmission is completed.
+ *
+ * The buffer pool is shared between multiple threads. In order to minimize the
+ * access latency to the shared buffer pool, each thread creates one (or
+ * several) buffer caches, which, unlike the buffer pool, are private to the
+ * thread that creates them and therefore cannot be shared with other threads.
+ * The access to the shared pool is only needed either (A) when the cache gets
+ * empty due to repeated buffer allocations and it needs to be replenished from
+ * the pool, or (B) when the cache gets full due to repeated buffer frees and
+ * it needs to be flushed back to the pool.
+ *
+ * In a packet forwarding system, a packet received on any input port can
+ * potentially be transmitted on any output port, depending on the forwarding
+ * configuration. For this to work with zero-copy of the packet buffers on
+ * AF_XDP sockets, it is required that the buffer pool memory fits into the
+ * UMEM area shared by all the sockets.
+ */
+
+struct bpool_params {
+	u32 n_buffers;
+	u32 buffer_size;
+	int mmap_flags;
+
+	u32 n_users_max;
+	u32 n_buffers_per_slab;
+};
+
+/* This buffer pool implementation organizes the buffers into equally sized
+ * slabs of *n_buffers_per_slab*. Initially, there are *n_slabs* slabs in the
+ * pool that are completely filled with buffer pointers (full slabs).
+ *
+ * Each buffer cache has a slab for buffer allocation and a slab for buffer
+ * free, with both of these slabs initially empty. When the cache's allocation
+ * slab goes empty, it is swapped with one of the available full slabs from the
+ * pool, if any is available. When the cache's free slab goes full, it is
+ * swapped for one of the empty slabs from the pool, which is guaranteed to
+ * succeed.
+ *
+ * Partially filled slabs never get traded between the cache and the pool
+ * (except when the cache itself is destroyed), which enables fast operation
+ * through pointer swapping.
+ */
+struct bpool {
+	struct bpool_params params;
+	pthread_mutex_t lock;
+	void *addr;
+
+	u64 **slabs;
+	u64 **slabs_reserved;
+	u64 *buffers;
+	u64 *buffers_reserved;
+
+	u64 n_slabs;
+	u64 n_slabs_reserved;
+	u64 n_buffers;
+
+	u64 n_slabs_available;
+	u64 n_slabs_reserved_available;
+
+	struct xsk_umem_config umem_cfg;
+	struct xsk_ring_prod umem_fq;
+	struct xsk_ring_cons umem_cq;
+	struct xsk_umem *umem;
+};
+
+static struct bpool *
+bpool_init(struct bpool_params *params,
+	   struct xsk_umem_config *umem_cfg)
+{
+	struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
+	u64 n_slabs, n_slabs_reserved, n_buffers, n_buffers_reserved;
+	u64 slabs_size, slabs_reserved_size;
+	u64 buffers_size, buffers_reserved_size;
+	u64 total_size, i;
+	struct bpool *bp;
+	u8 *p;
+	int status;
+
+	/* mmap prep. */
+	if (setrlimit(RLIMIT_MEMLOCK, &r))
+		return NULL;
+
+	/* bpool internals dimensioning. */
+	n_slabs = (params->n_buffers + params->n_buffers_per_slab - 1) /
+		params->n_buffers_per_slab;
+	n_slabs_reserved = params->n_users_max * 2;
+	n_buffers = n_slabs * params->n_buffers_per_slab;
+	n_buffers_reserved = n_slabs_reserved * params->n_buffers_per_slab;
+
+	slabs_size = n_slabs * sizeof(u64 *);
+	slabs_reserved_size = n_slabs_reserved * sizeof(u64 *);
+	buffers_size = n_buffers * sizeof(u64);
+	buffers_reserved_size = n_buffers_reserved * sizeof(u64);
+
+	total_size = sizeof(struct bpool) +
+		slabs_size + slabs_reserved_size +
+		buffers_size + buffers_reserved_size;
+
+	/* bpool memory allocation. */
+	p = calloc(total_size, sizeof(u8));
+	if (!p)
+		return NULL;
+
+	/* bpool memory initialization. */
+	bp = (struct bpool *)p;
+	memcpy(&bp->params, params, sizeof(*params));
+	bp->params.n_buffers = n_buffers;
+
+	bp->slabs = (u64 **)&p[sizeof(struct bpool)];
+	bp->slabs_reserved = (u64 **)&p[sizeof(struct bpool) +
+		slabs_size];
+	bp->buffers = (u64 *)&p[sizeof(struct bpool) +
+		slabs_size + slabs_reserved_size];
+	bp->buffers_reserved = (u64 *)&p[sizeof(struct bpool) +
+		slabs_size + slabs_reserved_size + buffers_size];
+
+	bp->n_slabs = n_slabs;
+	bp->n_slabs_reserved = n_slabs_reserved;
+	bp->n_buffers = n_buffers;
+
+	for (i = 0; i < n_slabs; i++)
+		bp->slabs[i] = &bp->buffers[i * params->n_buffers_per_slab];
+	bp->n_slabs_available = n_slabs;
+
+	for (i = 0; i < n_slabs_reserved; i++)
+		bp->slabs_reserved[i] = &bp->buffers_reserved[i *
+			params->n_buffers_per_slab];
+	bp->n_slabs_reserved_available = n_slabs_reserved;
+
+	for (i = 0; i < n_buffers; i++)
+		bp->buffers[i] = i * params->buffer_size;
+
+	/* lock. */
+	status = pthread_mutex_init(&bp->lock, NULL);
+	if (status) {
+		free(p);
+		return NULL;
+	}
+
+	/* mmap. */
+	bp->addr = mmap(NULL,
+			n_buffers * params->buffer_size,
+			PROT_READ | PROT_WRITE,
+			MAP_PRIVATE | MAP_ANONYMOUS | params->mmap_flags,
+			-1,
+			0);
+	if (bp->addr == MAP_FAILED) {
+		pthread_mutex_destroy(&bp->lock);
+		free(p);
+		return NULL;
+	}
+
+	/* umem. */
+	status = xsk_umem__create(&bp->umem,
+				  bp->addr,
+				  bp->params.n_buffers * bp->params.buffer_size,
+				  &bp->umem_fq,
+				  &bp->umem_cq,
+				  umem_cfg);
+	if (status) {
+		munmap(bp->addr, bp->params.n_buffers * bp->params.buffer_size);
+		pthread_mutex_destroy(&bp->lock);
+		free(p);
+		return NULL;
+	}
+	memcpy(&bp->umem_cfg, umem_cfg, sizeof(*umem_cfg));
+
+	return bp;
+}
+
+static void
+bpool_free(struct bpool *bp)
+{
+	if (!bp)
+		return;
+
+	xsk_umem__delete(bp->umem);
+	munmap(bp->addr, bp->params.n_buffers * bp->params.buffer_size);
+	pthread_mutex_destroy(&bp->lock);
+	free(bp);
+}
+
+struct bcache {
+	struct bpool *bp;
+
+	u64 *slab_cons;
+	u64 *slab_prod;
+
+	u64 n_buffers_cons;
+	u64 n_buffers_prod;
+};
+
+static u32
+bcache_slab_size(struct bcache *bc)
+{
+	struct bpool *bp = bc->bp;
+
+	return bp->params.n_buffers_per_slab;
+}
+
+static struct bcache *
+bcache_init(struct bpool *bp)
+{
+	struct bcache *bc;
+
+	bc = calloc(1, sizeof(struct bcache));
+	if (!bc)
+		return NULL;
+
+	bc->bp = bp;
+	bc->n_buffers_cons = 0;
+	bc->n_buffers_prod = 0;
+
+	pthread_mutex_lock(&bp->lock);
+	if (bp->n_slabs_reserved_available == 0) {
+		pthread_mutex_unlock(&bp->lock);
+		free(bc);
+		return NULL;
+	}
+
+	bc->slab_cons = bp->slabs_reserved[bp->n_slabs_reserved_available - 1];
+	bc->slab_prod = bp->slabs_reserved[bp->n_slabs_reserved_available - 2];
+	bp->n_slabs_reserved_available -= 2;
+	pthread_mutex_unlock(&bp->lock);
+
+	return bc;
+}
+
+static void
+bcache_free(struct bcache *bc)
+{
+	struct bpool *bp;
+
+	if (!bc)
+		return;
+
+	/* In order to keep this example simple, the case of freeing any
+	 * existing buffers from the cache back to the pool is ignored.
+	 */
+
+	bp = bc->bp;
+	pthread_mutex_lock(&bp->lock);
+	bp->slabs_reserved[bp->n_slabs_reserved_available] = bc->slab_prod;
+	bp->slabs_reserved[bp->n_slabs_reserved_available + 1] = bc->slab_cons;
+	bp->n_slabs_reserved_available += 2;
+	pthread_mutex_unlock(&bp->lock);
+
+	free(bc);
+}
+
+/* To work correctly, the implementation requires that the *n_buffers* input
+ * argument is never greater than the buffer pool's *n_buffers_per_slab*. This
+ * is typically the case, with one exception taking place when large number of
+ * buffers are allocated at init time (e.g. for the UMEM fill queue setup).
+ */
+static inline u32
+bcache_cons_check(struct bcache *bc, u32 n_buffers)
+{
+	struct bpool *bp = bc->bp;
+	u64 n_buffers_per_slab = bp->params.n_buffers_per_slab;
+	u64 n_buffers_cons = bc->n_buffers_cons;
+	u64 n_slabs_available;
+	u64 *slab_full;
+
+	/*
+	 * Consumer slab is not empty: Use what's available locally. Do not
+	 * look for more buffers from the pool when the ask can only be
+	 * partially satisfied.
+	 */
+	if (n_buffers_cons)
+		return (n_buffers_cons < n_buffers) ?
+			n_buffers_cons :
+			n_buffers;
+
+	/*
+	 * Consumer slab is empty: look to trade the current consumer slab
+	 * (empty) for a full slab from the pool, if any is available.
+	 */
+	pthread_mutex_lock(&bp->lock);
+	n_slabs_available = bp->n_slabs_available;
+	if (!n_slabs_available) {
+		pthread_mutex_unlock(&bp->lock);
+		return 0;
+	}
+
+	n_slabs_available--;
+	slab_full = bp->slabs[n_slabs_available];
+	bp->slabs[n_slabs_available] = bc->slab_cons;
+	bp->n_slabs_available = n_slabs_available;
+	pthread_mutex_unlock(&bp->lock);
+
+	bc->slab_cons = slab_full;
+	bc->n_buffers_cons = n_buffers_per_slab;
+	return n_buffers;
+}
+
+static inline u64
+bcache_cons(struct bcache *bc)
+{
+	u64 n_buffers_cons = bc->n_buffers_cons - 1;
+	u64 buffer;
+
+	buffer = bc->slab_cons[n_buffers_cons];
+	bc->n_buffers_cons = n_buffers_cons;
+	return buffer;
+}
+
+static inline void
+bcache_prod(struct bcache *bc, u64 buffer)
+{
+	struct bpool *bp = bc->bp;
+	u64 n_buffers_per_slab = bp->params.n_buffers_per_slab;
+	u64 n_buffers_prod = bc->n_buffers_prod;
+	u64 n_slabs_available;
+	u64 *slab_empty;
+
+	/*
+	 * Producer slab is not yet full: store the current buffer to it.
+	 */
+	if (n_buffers_prod < n_buffers_per_slab) {
+		bc->slab_prod[n_buffers_prod] = buffer;
+		bc->n_buffers_prod = n_buffers_prod + 1;
+		return;
+	}
+
+	/*
+	 * Producer slab is full: trade the cache's current producer slab
+	 * (full) for an empty slab from the pool, then store the current
+	 * buffer to the new producer slab. As one full slab exists in the
+	 * cache, it is guaranteed that there is at least one empty slab
+	 * available in the pool.
+	 */
+	pthread_mutex_lock(&bp->lock);
+	n_slabs_available = bp->n_slabs_available;
+	slab_empty = bp->slabs[n_slabs_available];
+	bp->slabs[n_slabs_available] = bc->slab_prod;
+	bp->n_slabs_available = n_slabs_available + 1;
+	pthread_mutex_unlock(&bp->lock);
+
+	slab_empty[0] = buffer;
+	bc->slab_prod = slab_empty;
+	bc->n_buffers_prod = 1;
+}
+
+/*
+ * Port
+ *
+ * Each of the forwarding ports sits on top of an AF_XDP socket. In order for
+ * packet forwarding to happen with no packet buffer copy, all the sockets need
+ * to share the same UMEM area, which is used as the buffer pool memory.
+ */
+#ifndef MAX_BURST_RX
+#define MAX_BURST_RX 64
+#endif
+
+#ifndef MAX_BURST_TX
+#define MAX_BURST_TX 64
+#endif
+
+struct burst_rx {
+	u64 addr[MAX_BURST_RX];
+	u32 len[MAX_BURST_RX];
+};
+
+struct burst_tx {
+	u64 addr[MAX_BURST_TX];
+	u32 len[MAX_BURST_TX];
+	u32 n_pkts;
+};
+
+struct port_params {
+	struct xsk_socket_config xsk_cfg;
+	struct bpool *bp;
+	const char *iface;
+	u32 iface_queue;
+};
+
+struct port {
+	struct port_params params;
+
+	struct bcache *bc;
+
+	struct xsk_ring_cons rxq;
+	struct xsk_ring_prod txq;
+	struct xsk_ring_prod umem_fq;
+	struct xsk_ring_cons umem_cq;
+	struct xsk_socket *xsk;
+	int umem_fq_initialized;
+
+	u64 n_pkts_rx;
+	u64 n_pkts_tx;
+};
+
+static void
+port_free(struct port *p)
+{
+	if (!p)
+		return;
+
+	/* To keep this example simple, the code to free the buffers from the
+	 * socket's receive and transmit queues, as well as from the UMEM fill
+	 * and completion queues, is not included.
+	 */
+
+	if (p->xsk)
+		xsk_socket__delete(p->xsk);
+
+	bcache_free(p->bc);
+
+	free(p);
+}
+
+static struct port *
+port_init(struct port_params *params)
+{
+	struct port *p;
+	u32 umem_fq_size, pos = 0;
+	int status, i;
+
+	/* Memory allocation and initialization. */
+	p = calloc(sizeof(struct port), 1);
+	if (!p)
+		return NULL;
+
+	memcpy(&p->params, params, sizeof(p->params));
+	umem_fq_size = params->bp->umem_cfg.fill_size;
+
+	/* bcache. */
+	p->bc = bcache_init(params->bp);
+	if (!p->bc ||
+	    (bcache_slab_size(p->bc) < umem_fq_size) ||
+	    (bcache_cons_check(p->bc, umem_fq_size) < umem_fq_size)) {
+		port_free(p);
+		return NULL;
+	}
+
+	/* xsk socket. */
+	status = xsk_socket__create_shared(&p->xsk,
+					   params->iface,
+					   params->iface_queue,
+					   params->bp->umem,
+					   &p->rxq,
+					   &p->txq,
+					   &p->umem_fq,
+					   &p->umem_cq,
+					   &params->xsk_cfg);
+	if (status) {
+		port_free(p);
+		return NULL;
+	}
+
+	/* umem fq. */
+	xsk_ring_prod__reserve(&p->umem_fq, umem_fq_size, &pos);
+
+	for (i = 0; i < umem_fq_size; i++)
+		*xsk_ring_prod__fill_addr(&p->umem_fq, pos + i) =
+			bcache_cons(p->bc);
+
+	xsk_ring_prod__submit(&p->umem_fq, umem_fq_size);
+	p->umem_fq_initialized = 1;
+
+	return p;
+}
+
+static inline u32
+port_rx_burst(struct port *p, struct burst_rx *b)
+{
+	u32 n_pkts, pos, i;
+
+	/* Free buffers for FQ replenish. */
+	n_pkts = ARRAY_SIZE(b->addr);
+
+	n_pkts = bcache_cons_check(p->bc, n_pkts);
+	if (!n_pkts)
+		return 0;
+
+	/* RXQ. */
+	n_pkts = xsk_ring_cons__peek(&p->rxq, n_pkts, &pos);
+	if (!n_pkts) {
+		if (xsk_ring_prod__needs_wakeup(&p->umem_fq)) {
+			struct pollfd pollfd = {
+				.fd = xsk_socket__fd(p->xsk),
+				.events = POLLIN,
+			};
+
+			poll(&pollfd, 1, 0);
+		}
+		return 0;
+	}
+
+	for (i = 0; i < n_pkts; i++) {
+		b->addr[i] = xsk_ring_cons__rx_desc(&p->rxq, pos + i)->addr;
+		b->len[i] = xsk_ring_cons__rx_desc(&p->rxq, pos + i)->len;
+	}
+
+	xsk_ring_cons__release(&p->rxq, n_pkts);
+	p->n_pkts_rx += n_pkts;
+
+	/* UMEM FQ. */
+	for ( ; ; ) {
+		int status;
+
+		status = xsk_ring_prod__reserve(&p->umem_fq, n_pkts, &pos);
+		if (status == n_pkts)
+			break;
+
+		if (xsk_ring_prod__needs_wakeup(&p->umem_fq)) {
+			struct pollfd pollfd = {
+				.fd = xsk_socket__fd(p->xsk),
+				.events = POLLIN,
+			};
+
+			poll(&pollfd, 1, 0);
+		}
+	}
+
+	for (i = 0; i < n_pkts; i++)
+		*xsk_ring_prod__fill_addr(&p->umem_fq, pos + i) =
+			bcache_cons(p->bc);
+
+	xsk_ring_prod__submit(&p->umem_fq, n_pkts);
+
+	return n_pkts;
+}
+
+static inline void
+port_tx_burst(struct port *p, struct burst_tx *b)
+{
+	u32 n_pkts, pos, i;
+	int status;
+
+	/* UMEM CQ. */
+	n_pkts = p->params.bp->umem_cfg.comp_size;
+
+	n_pkts = xsk_ring_cons__peek(&p->umem_cq, n_pkts, &pos);
+
+	for (i = 0; i < n_pkts; i++) {
+		u64 addr = *xsk_ring_cons__comp_addr(&p->umem_cq, pos + i);
+
+		bcache_prod(p->bc, addr);
+	}
+
+	xsk_ring_cons__release(&p->umem_cq, n_pkts);
+
+	/* TXQ. */
+	n_pkts = b->n_pkts;
+
+	for ( ; ; ) {
+		status = xsk_ring_prod__reserve(&p->txq, n_pkts, &pos);
+		if (status == n_pkts)
+			break;
+
+		if (xsk_ring_prod__needs_wakeup(&p->txq))
+			sendto(xsk_socket__fd(p->xsk), NULL, 0, MSG_DONTWAIT,
+			       NULL, 0);
+	}
+
+	for (i = 0; i < n_pkts; i++) {
+		xsk_ring_prod__tx_desc(&p->txq, pos + i)->addr = b->addr[i];
+		xsk_ring_prod__tx_desc(&p->txq, pos + i)->len = b->len[i];
+	}
+
+	xsk_ring_prod__submit(&p->txq, n_pkts);
+	if (xsk_ring_prod__needs_wakeup(&p->txq))
+		sendto(xsk_socket__fd(p->xsk), NULL, 0, MSG_DONTWAIT, NULL, 0);
+	p->n_pkts_tx += n_pkts;
+}
+
+/*
+ * Thread
+ *
+ * Packet forwarding threads.
+ */
+#ifndef MAX_PORTS_PER_THREAD
+#define MAX_PORTS_PER_THREAD 16
+#endif
+
+struct thread_data {
+	struct port *ports_rx[MAX_PORTS_PER_THREAD];
+	struct port *ports_tx[MAX_PORTS_PER_THREAD];
+	u32 n_ports_rx;
+	struct burst_rx burst_rx;
+	struct burst_tx burst_tx[MAX_PORTS_PER_THREAD];
+	u32 cpu_core_id;
+	int quit;
+};
+
+static void swap_mac_addresses(void *data)
+{
+	struct ether_header *eth = (struct ether_header *)data;
+	struct ether_addr *src_addr = (struct ether_addr *)&eth->ether_shost;
+	struct ether_addr *dst_addr = (struct ether_addr *)&eth->ether_dhost;
+	struct ether_addr tmp;
+
+	tmp = *src_addr;
+	*src_addr = *dst_addr;
+	*dst_addr = tmp;
+}
+
+static void *
+thread_func(void *arg)
+{
+	struct thread_data *t = arg;
+	cpu_set_t cpu_cores;
+	u32 i;
+
+	CPU_ZERO(&cpu_cores);
+	CPU_SET(t->cpu_core_id, &cpu_cores);
+	pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &cpu_cores);
+
+	for (i = 0; !t->quit; i = (i + 1) & (t->n_ports_rx - 1)) {
+		struct port *port_rx = t->ports_rx[i];
+		struct port *port_tx = t->ports_tx[i];
+		struct burst_rx *brx = &t->burst_rx;
+		struct burst_tx *btx = &t->burst_tx[i];
+		u32 n_pkts, j;
+
+		/* RX. */
+		n_pkts = port_rx_burst(port_rx, brx);
+		if (!n_pkts)
+			continue;
+
+		/* Process & TX. */
+		for (j = 0; j < n_pkts; j++) {
+			u64 addr = xsk_umem__add_offset_to_addr(brx->addr[j]);
+			u8 *pkt = xsk_umem__get_data(port_rx->params.bp->addr,
+						     addr);
+
+			swap_mac_addresses(pkt);
+
+			btx->addr[btx->n_pkts] = brx->addr[j];
+			btx->len[btx->n_pkts] = brx->len[j];
+			btx->n_pkts++;
+
+			if (btx->n_pkts == MAX_BURST_TX) {
+				port_tx_burst(port_tx, btx);
+				btx->n_pkts = 0;
+			}
+		}
+	}
+
+	return NULL;
+}
+
+/*
+ * Process
+ */
+static const struct bpool_params bpool_params_default = {
+	.n_buffers = 64 * 1024,
+	.buffer_size = XSK_UMEM__DEFAULT_FRAME_SIZE,
+	.mmap_flags = 0,
+
+	.n_users_max = 16,
+	.n_buffers_per_slab = XSK_RING_PROD__DEFAULT_NUM_DESCS,
+};
+
+static const struct xsk_umem_config umem_cfg_default = {
+	.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
+	.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
+	.frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE,
+	.frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM,
+	.flags = 0,
+};
+
+static const struct port_params port_params_default = {
+	.xsk_cfg = {
+		.rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
+		.tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
+		.libbpf_flags = 0,
+		.xdp_flags = 0,
+		.bind_flags = 0,
+	},
+
+	.bp = NULL,
+	.iface = NULL,
+	.iface_queue = 0,
+};
+
+#ifndef MAX_PORTS
+#define MAX_PORTS 64
+#endif
+
+#ifndef MAX_THREADS
+#define MAX_THREADS 64
+#endif
+
+static struct bpool_params bpool_params;
+static struct xsk_umem_config umem_cfg;
+static struct bpool *bp;
+
+static struct port_params port_params[MAX_PORTS];
+static struct port *ports[MAX_PORTS];
+static u64 n_pkts_rx[MAX_PORTS];
+static u64 n_pkts_tx[MAX_PORTS];
+static int n_ports;
+
+static pthread_t threads[MAX_THREADS];
+static struct thread_data thread_data[MAX_THREADS];
+static int n_threads;
+
+static void
+print_usage(char *prog_name)
+{
+	const char *usage =
+		"Usage:\n"
+		"\t%s [ -b SIZE ] -c CORE -i INTERFACE [ -q QUEUE ]\n"
+		"\n"
+		"-c CORE        CPU core to run a packet forwarding thread\n"
+		"               on. May be invoked multiple times.\n"
+		"\n"
+		"-b SIZE        Number of buffers in the buffer pool shared\n"
+		"               by all the forwarding threads. Default: %u.\n"
+		"\n"
+		"-i INTERFACE   Network interface. Each (INTERFACE, QUEUE)\n"
+		"               pair specifies one forwarding port. May be\n"
+		"               invoked multiple times.\n"
+		"\n"
+		"-q QUEUE       Network interface queue for RX and TX. Each\n"
+		"               (INTERFACE, QUEUE) pair specified one\n"
+		"               forwarding port. Default: %u. May be invoked\n"
+		"               multiple times.\n"
+		"\n";
+	printf(usage,
+	       prog_name,
+	       bpool_params_default.n_buffers,
+	       port_params_default.iface_queue);
+}
+
+static int
+parse_args(int argc, char **argv)
+{
+	struct option lgopts[] = {
+		{ NULL,  0, 0, 0 }
+	};
+	int opt, option_index;
+
+	/* Parse the input arguments. */
+	for ( ; ;) {
+		opt = getopt_long(argc, argv, "b:c:i:q:", lgopts, &option_index);
+		if (opt == EOF)
+			break;
+
+		switch (opt) {
+		case 'b':
+			bpool_params.n_buffers = atoi(optarg);
+			break;
+
+		case 'c':
+			if (n_threads == MAX_THREADS) {
+				printf("Max number of threads (%d) reached.\n",
+				       MAX_THREADS);
+				return -1;
+			}
+
+			thread_data[n_threads].cpu_core_id = atoi(optarg);
+			n_threads++;
+			break;
+
+		case 'i':
+			if (n_ports == MAX_PORTS) {
+				printf("Max number of ports (%d) reached.\n",
+				       MAX_PORTS);
+				return -1;
+			}
+
+			port_params[n_ports].iface = optarg;
+			port_params[n_ports].iface_queue = 0;
+			n_ports++;
+			break;
+
+		case 'q':
+			if (n_ports == 0) {
+				printf("No port specified for queue.\n");
+				return -1;
+			}
+			port_params[n_ports - 1].iface_queue = atoi(optarg);
+			break;
+
+		default:
+			printf("Illegal argument.\n");
+			return -1;
+		}
+	}
+
+	optind = 1; /* reset getopt lib */
+
+	/* Check the input arguments. */
+	if (!n_ports) {
+		printf("No ports specified.\n");
+		return -1;
+	}
+
+	if (!n_threads) {
+		printf("No threads specified.\n");
+		return -1;
+	}
+
+	if (n_ports % n_threads) {
+		printf("Ports cannot be evenly distributed to threads.\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+static void
+print_port(u32 port_id)
+{
+	struct port *port = ports[port_id];
+
+	printf("Port %u: interface = %s, queue = %u\n",
+	       port_id, port->params.iface, port->params.iface_queue);
+}
+
+static void
+print_thread(u32 thread_id)
+{
+	struct thread_data *t = &thread_data[thread_id];
+	u32 i;
+
+	printf("Thread %u (CPU core %u): ",
+	       thread_id, t->cpu_core_id);
+
+	for (i = 0; i < t->n_ports_rx; i++) {
+		struct port *port_rx = t->ports_rx[i];
+		struct port *port_tx = t->ports_tx[i];
+
+		printf("(%s, %u) -> (%s, %u), ",
+		       port_rx->params.iface,
+		       port_rx->params.iface_queue,
+		       port_tx->params.iface,
+		       port_tx->params.iface_queue);
+	}
+
+	printf("\n");
+}
+
+static void
+print_port_stats_separator(void)
+{
+	printf("+-%4s-+-%12s-+-%13s-+-%12s-+-%13s-+\n",
+	       "----",
+	       "------------",
+	       "-------------",
+	       "------------",
+	       "-------------");
+}
+
+static void
+print_port_stats_header(void)
+{
+	print_port_stats_separator();
+	printf("| %4s | %12s | %13s | %12s | %13s |\n",
+	       "Port",
+	       "RX packets",
+	       "RX rate (pps)",
+	       "TX packets",
+	       "TX_rate (pps)");
+	print_port_stats_separator();
+}
+
+static void
+print_port_stats_trailer(void)
+{
+	print_port_stats_separator();
+	printf("\n");
+}
+
+static void
+print_port_stats(int port_id, u64 ns_diff)
+{
+	struct port *p = ports[port_id];
+	double rx_pps, tx_pps;
+
+	rx_pps = (p->n_pkts_rx - n_pkts_rx[port_id]) * 1000000000. / ns_diff;
+	tx_pps = (p->n_pkts_tx - n_pkts_tx[port_id]) * 1000000000. / ns_diff;
+
+	printf("| %4d | %12llu | %13.0f | %12llu | %13.0f |\n",
+	       port_id,
+	       p->n_pkts_rx,
+	       rx_pps,
+	       p->n_pkts_tx,
+	       tx_pps);
+
+	n_pkts_rx[port_id] = p->n_pkts_rx;
+	n_pkts_tx[port_id] = p->n_pkts_tx;
+}
+
+static void
+print_port_stats_all(u64 ns_diff)
+{
+	int i;
+
+	print_port_stats_header();
+	for (i = 0; i < n_ports; i++)
+		print_port_stats(i, ns_diff);
+	print_port_stats_trailer();
+}
+
+static int quit;
+
+static void
+signal_handler(int sig)
+{
+	quit = 1;
+}
+
+int main(int argc, char **argv)
+{
+	struct timespec time;
+	u64 ns0;
+	int i;
+
+	/* Parse args. */
+	memcpy(&bpool_params, &bpool_params_default,
+	       sizeof(struct bpool_params));
+	memcpy(&umem_cfg, &umem_cfg_default,
+	       sizeof(struct xsk_umem_config));
+	for (i = 0; i < MAX_PORTS; i++)
+		memcpy(&port_params[i], &port_params_default,
+		       sizeof(struct port_params));
+
+	if (parse_args(argc, argv)) {
+		print_usage(argv[0]);
+		return -1;
+	}
+
+	/* Buffer pool initialization. */
+	bp = bpool_init(&bpool_params, &umem_cfg);
+	if (!bp) {
+		printf("Buffer pool initialization failed.\n");
+		return -1;
+	}
+	printf("Buffer pool created successfully.\n");
+
+	/* Ports initialization. */
+	for (i = 0; i < MAX_PORTS; i++)
+		port_params[i].bp = bp;
+
+	for (i = 0; i < n_ports; i++) {
+		ports[i] = port_init(&port_params[i]);
+		if (!ports[i]) {
+			printf("Port %d initialization failed.\n", i);
+			return -1;
+		}
+		print_port(i);
+	}
+	printf("All ports created successfully.\n");
+
+	/* Threads. */
+	for (i = 0; i < n_threads; i++) {
+		struct thread_data *t = &thread_data[i];
+		u32 n_ports_per_thread = n_ports / n_threads, j;
+
+		for (j = 0; j < n_ports_per_thread; j++) {
+			t->ports_rx[j] = ports[i * n_ports_per_thread + j];
+			t->ports_tx[j] = ports[i * n_ports_per_thread +
+				(j + 1) % n_ports_per_thread];
+		}
+
+		t->n_ports_rx = n_ports_per_thread;
+
+		print_thread(i);
+	}
+
+	for (i = 0; i < n_threads; i++) {
+		int status;
+
+		status = pthread_create(&threads[i],
+					NULL,
+					thread_func,
+					&thread_data[i]);
+		if (status) {
+			printf("Thread %d creation failed.\n", i);
+			return -1;
+		}
+	}
+	printf("All threads created successfully.\n");
+
+	/* Print statistics. */
+	signal(SIGINT, signal_handler);
+	signal(SIGTERM, signal_handler);
+	signal(SIGABRT, signal_handler);
+
+	clock_gettime(CLOCK_MONOTONIC, &time);
+	ns0 = time.tv_sec * 1000000000UL + time.tv_nsec;
+	for ( ; !quit; ) {
+		u64 ns1, ns_diff;
+
+		sleep(1);
+		clock_gettime(CLOCK_MONOTONIC, &time);
+		ns1 = time.tv_sec * 1000000000UL + time.tv_nsec;
+		ns_diff = ns1 - ns0;
+		ns0 = ns1;
+
+		print_port_stats_all(ns_diff);
+	}
+
+	/* Threads completion. */
+	printf("Quit.\n");
+	for (i = 0; i < n_threads; i++)
+		thread_data[i].quit = 1;
+
+	for (i = 0; i < n_threads; i++)
+		pthread_join(threads[i], NULL);
+
+	/* Ports free. */
+	for (i = 0; i < n_ports; i++)
+		port_free(ports[i]);
+
+	/* Buffer pool free. */
+	bpool_free(bp);
+
+	return 0;
+}
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH bpf-next 14/14] xsk: documentation for XDP_SHARED_UMEM between queues and netdevs
  2020-07-02 12:18 [PATCH bpf-next 00/14] xsk: support shared umems between devices and queues Magnus Karlsson
                   ` (12 preceding siblings ...)
  2020-07-02 12:19 ` [PATCH bpf-next 13/14] samples/bpf: add new sample xsk_fwd.c Magnus Karlsson
@ 2020-07-02 12:19 ` Magnus Karlsson
  2020-07-06 18:39 ` [PATCH bpf-next 00/14] xsk: support shared umems between devices and queues Daniel Borkmann
  2020-07-08 15:00 ` Maxim Mikityanskiy
  15 siblings, 0 replies; 25+ messages in thread
From: Magnus Karlsson @ 2020-07-02 12:19 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev,
	jonathan.lemon, maximmi
  Cc: bpf, jeffrey.t.kirsher, maciej.fijalkowski,
	maciejromanfijalkowski, cristian.dumitrescu

Add documentation for the XDP_SHARED_UMEM feature when a UMEM is
shared between different queues and/or netdevs.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 Documentation/networking/af_xdp.rst | 68 +++++++++++++++++++++++++++++++------
 1 file changed, 58 insertions(+), 10 deletions(-)

diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst
index 5bc55a4..2ccc564 100644
--- a/Documentation/networking/af_xdp.rst
+++ b/Documentation/networking/af_xdp.rst
@@ -258,14 +258,21 @@ socket into zero-copy mode or fail.
 XDP_SHARED_UMEM bind flag
 -------------------------
 
-This flag enables you to bind multiple sockets to the same UMEM, but
-only if they share the same queue id. In this mode, each socket has
-their own RX and TX rings, but the UMEM (tied to the fist socket
-created) only has a single FILL ring and a single COMPLETION
-ring. To use this mode, create the first socket and bind it in the normal
-way. Create a second socket and create an RX and a TX ring, or at
-least one of them, but no FILL or COMPLETION rings as the ones from
-the first socket will be used. In the bind call, set he
+This flag enables you to bind multiple sockets to the same UMEM. It
+works on the same queue id, between queue ids and between
+netdevs/devices. In this mode, each socket has their own RX and TX
+rings as usual, but you are going to have one or more FILL and
+COMPLETION ring pairs. You have to create one of these pairs per
+unique netdev and queue id tuple that you bind to.
+
+Let us start with the case where we would like to share a UMEM between
+sockets bound to the same netdev and queue id. The UMEM (tied to the
+first socket created) will only have a single FILL ring and a single
+COMPLETION ring as there is only one unique netdev,queue_id tuple that
+we have bound to. To use this mode, create the first socket and bind
+it in the normal way. Create a second socket and create an RX and a TX
+ring, or at least one of them, but no FILL or COMPLETION rings as the
+ones from the first socket will be used. In the bind call, set the
 XDP_SHARED_UMEM option and provide the initial socket's fd in the
 sxdp_shared_umem_fd field. You can attach an arbitrary number of extra
 sockets this way.
@@ -305,11 +312,41 @@ concurrently. There are no synchronization primitives in the
 libbpf code that protects multiple users at this point in time.
 
 Libbpf uses this mode if you create more than one socket tied to the
-same umem. However, note that you need to supply the
+same UMEM. However, note that you need to supply the
 XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD libbpf_flag with the
 xsk_socket__create calls and load your own XDP program as there is no
 built in one in libbpf that will route the traffic for you.
 
+The second case is when you share a UMEM between sockets that are
+bound to different queue ids and/or netdevs. In this case you have to
+create one FILL ring and one COMPLETION ring for each unique
+netdev,queue_id pair. Let us say you want to create two sockets bound
+to two different queue ids on the same netdev. Create the first socket
+and bind it in the normal way. Create a second socket and create an RX
+and a TX ring, or at least one of them, and then one FILL and
+COMPLETION ring for this socket. Then in the bind call, set the
+XDP_SHARED_UMEM option and provide the initial socket's fd in the
+sxdp_shared_umem_fd field as you registered the UMEM on that
+socket. These two sockets will now share one and the same UMEM.
+
+There is no need to supply an XDP program like the one in the previous
+case where sockets were bound to the same queue id and
+device. Instead, use the NIC's packet steering capabilities to steer
+the packets to the right queue. In the previous example, there is only
+one queue shared among sockets, so the NIC cannot do this steering. It
+can only steer between queues.
+
+In libbpf, you need to use the xsk_socket__create_shared() API as it
+takes a reference to a FILL ring and a COMPLETION ring that will be
+created for you and bound to the shared UMEM. You can use this
+function for all the sockets you create, or you can use it for the
+second and following ones and use xsk_socket__create() for the first
+one. Both methods yield the same result.
+
+Note that a UMEM can be shared between sockets on the same queue id
+and device, as well as between queues on the same device and between
+devices at the same time.
+
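As a minimal sketch of the two-queue case described above (the
interface name, queue ids and the surrounding umem/ring/config
variables are only placeholders; error handling omitted):

	struct xsk_ring_prod fq2;
	struct xsk_ring_cons cq2;
	struct xsk_socket *xsk1, *xsk2;

	/* First socket, queue 0: reuses the FILL and COMPLETION rings that
	 * were registered together with the UMEM.
	 */
	xsk_socket__create(&xsk1, "eth0", 0, umem, &rx1, &tx1, &cfg);

	/* Second socket, queue 1: a new netdev,queue_id tuple, so it brings
	 * its own FILL and COMPLETION rings but shares the same UMEM.
	 */
	xsk_socket__create_shared(&xsk2, "eth0", 1, umem, &rx2, &tx2,
				  &fq2, &cq2, &cfg);
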
 XDP_USE_NEED_WAKEUP bind flag
 -----------------------------
 
@@ -364,7 +401,7 @@ resources by only setting up one of them. Both the FILL ring and the
 COMPLETION ring are mandatory as you need to have a UMEM tied to your
 socket. But if the XDP_SHARED_UMEM flag is used, any socket after the
 first one does not have a UMEM and should in that case not have any
-FILL or COMPLETION rings created as the ones from the shared umem will
+FILL or COMPLETION rings created as the ones from the shared UMEM will
 be used. Note, that the rings are single-producer single-consumer, so
 do not try to access them from multiple processes at the same
 time. See the XDP_SHARED_UMEM section.
@@ -567,6 +604,17 @@ A: The short answer is no, that is not supported at the moment. The
    switch, or other distribution mechanism, in your NIC to direct
    traffic to the correct queue id and socket.
 
+Q: My packets are sometimes corrupted. What is wrong?
+
+A: Care has to be taken not to feed the same buffer in the UMEM into
+   more than one ring at the same time. If you, for example, feed the
+   same buffer into the FILL ring and the TX ring at the same time, the
+   NIC might receive data into the buffer at the same time it is
+   sending it. This will cause some packets to become corrupted. The
+   same goes for feeding the same buffer into the FILL rings
+   belonging to different queue ids or netdevs bound with the
+   XDP_SHARED_UMEM flag.
+
 Credits
 =======
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next 00/14] xsk: support shared umems between devices and queues
  2020-07-02 12:18 [PATCH bpf-next 00/14] xsk: support shared umems between devices and queues Magnus Karlsson
                   ` (13 preceding siblings ...)
  2020-07-02 12:19 ` [PATCH bpf-next 14/14] xsk: documentation for XDP_SHARED_UMEM between queues and netdevs Magnus Karlsson
@ 2020-07-06 18:39 ` Daniel Borkmann
  2020-07-07 10:37   ` Maxim Mikityanskiy
  2020-07-08 15:00 ` Maxim Mikityanskiy
  15 siblings, 1 reply; 25+ messages in thread
From: Daniel Borkmann @ 2020-07-06 18:39 UTC (permalink / raw)
  To: Magnus Karlsson, bjorn.topel, ast, netdev, jonathan.lemon, maximmi
  Cc: bpf, jeffrey.t.kirsher, maciej.fijalkowski,
	maciejromanfijalkowski, cristian.dumitrescu

On 7/2/20 2:18 PM, Magnus Karlsson wrote:
> This patch set adds support to share a umem between AF_XDP sockets
> bound to different queue ids on the same device or even between
> devices. It has already been possible to do this by registering the
> umem multiple times, but this wastes a lot of memory. Just imagine
> having 10 threads each having 10 sockets open sharing a single
> umem. This means that you would have to register the umem 100 times
> consuming large quantities of memory.

[...]

> Note to Maxim at Mellanox. I do not have a mlx5 card, so I have not
> been able to test the changes to your driver. It compiles, but that is
> all I can say, so it would be great if you could test it. Also, I did
> change the name of many functions and variables from umem to pool as a
> buffer pool is passed down to the driver in this patch set instead of
> the umem. I did not change the name of the files umem.c and
> umem.h. Please go through the changes and change things to your
> liking.

Bjorn / Maxim, this is waiting on review (& mlx5 testing) from you, ptal.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next 00/14] xsk: support shared umems between devices and queues
  2020-07-06 18:39 ` [PATCH bpf-next 00/14] xsk: support shared umems between devices and queues Daniel Borkmann
@ 2020-07-07 10:37   ` Maxim Mikityanskiy
  0 siblings, 0 replies; 25+ messages in thread
From: Maxim Mikityanskiy @ 2020-07-07 10:37 UTC (permalink / raw)
  To: Daniel Borkmann, Magnus Karlsson, bjorn.topel, ast, netdev,
	jonathan.lemon
  Cc: bpf, jeffrey.t.kirsher, maciej.fijalkowski,
	maciejromanfijalkowski, cristian.dumitrescu

On 2020-07-06 21:39, Daniel Borkmann wrote:
> On 7/2/20 2:18 PM, Magnus Karlsson wrote:
>> This patch set adds support to share a umem between AF_XDP sockets
>> bound to different queue ids on the same device or even between
>> devices. It has already been possible to do this by registering the
>> umem multiple times, but this wastes a lot of memory. Just imagine
>> having 10 threads each having 10 sockets open sharing a single
>> umem. This means that you would have to register the umem 100 times
>> consuming large quantities of memory.

Sounds like this series has some great stuff!

> [...]
> 
>> Note to Maxim at Mellanox. I do not have a mlx5 card, so I have not
>> been able to test the changes to your driver. It compiles, but that is
>> all I can say, so it would be great if you could test it. Also, I did
>> change the name of many functions and variables from umem to pool as a
>> buffer pool is passed down to the driver in this patch set instead of
>> the umem. I did not change the name of the files umem.c and
>> umem.h. Please go through the changes and change things to your
>> liking.
> 
> Bjorn / Maxim, this is waiting on review (& mlx5 testing) from you, ptal.

Sure, I'll take a look and do the mlx5 testing (I only noticed this 
series yesterday).

> Thanks,
> Daniel


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next 03/14] xsk: create and free context independently from umem
  2020-07-02 12:19 ` [PATCH bpf-next 03/14] xsk: create and free context independently from umem Magnus Karlsson
@ 2020-07-08 15:00   ` Maxim Mikityanskiy
  2020-07-09  6:47     ` Magnus Karlsson
  0 siblings, 1 reply; 25+ messages in thread
From: Maxim Mikityanskiy @ 2020-07-08 15:00 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: bjorn.topel, ast, daniel, netdev, jonathan.lemon, bpf,
	jeffrey.t.kirsher, maciej.fijalkowski, maciejromanfijalkowski,
	cristian.dumitrescu

On 2020-07-02 15:19, Magnus Karlsson wrote:
> Create and free the buffer pool independently from the umem. Move
> these operations that are performed on the buffer pool from the
> umem create and destroy functions to new create and destroy
> functions just for the buffer pool. This so that in later commits
> we can instantiate multiple buffer pools per umem when sharing a
> umem between HW queues and/or devices. We also erradicate the
> back pointer from the umem to the buffer pool as this will not
> work when we introduce the possibility to have multiple buffer
> pools per umem.
> 
> It might seem a bit odd that we create an empty buffer pool first
> and then recreate it with its right size when we bind to a device
> and umem. But the page pool will in later commits be used to
> carry information before it has been assigned to a umem and its
> size decided.

What kind of information? I'm looking at the final code: on socket 
creation you just fill the pool with zeros, then we may have setsockopt 
for FQ and CQ, then xsk_bind replaces the pool with the real one. So the 
only information carried from the old pool to the new one is FQ and CQ, 
or did I miss anything?

I don't quite like this design, it's kind of a hack to support the 
initialization order that we have, but it complicates things: when you 
copy the old pool into the new one, it's not clear which fields we care 
about, and which are ignored/overwritten.

Regarding FQ and CQ, for shared UMEM, they won't be filled, so there is 
no point in the temporary pool in this case (unless it also stores 
something that I missed).

I suggest to add a pointer to some kind of a configuration struct to xs. 
All things configured with setsockopt go to that struct. xsk_bind will 
call a function to validate the config struct, and if it's OK, it will 
create the pool (once), fill the fields and free the config struct. 
Config struct can be a union with the pool to save space in xs. Probably 
we will also be able to drop a few fields from xs (such as umem?). How 
do you feel about this idea?
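
Something along these lines, just to illustrate the idea (a very
rough sketch; struct and field names are arbitrary):

	struct xsk_pre_bind_cfg {
		struct xsk_queue *fq;
		struct xsk_queue *cq;
		/* ...whatever else is set up via setsockopt before bind... */
	};

	struct xdp_sock {
		/* ... */
		union {
			struct xsk_pre_bind_cfg *cfg;	/* before bind */
			struct xsk_buff_pool *pool;	/* after bind */
		};
		/* ... */
	};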

> 
> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> ---
>   include/net/xdp_sock.h      |   3 +-
>   include/net/xsk_buff_pool.h |  14 +++-
>   net/xdp/xdp_umem.c          | 164 ++++----------------------------------------
>   net/xdp/xdp_umem.h          |   4 +-
>   net/xdp/xsk.c               |  83 +++++++++++++++++++---
>   net/xdp/xsk.h               |   3 +
>   net/xdp/xsk_buff_pool.c     | 154 +++++++++++++++++++++++++++++++++++++----
>   net/xdp/xsk_queue.h         |  12 ++--
>   8 files changed, 250 insertions(+), 187 deletions(-)
> 
> diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
> index 6eb9628..b9bb118 100644
> --- a/include/net/xdp_sock.h
> +++ b/include/net/xdp_sock.h
> @@ -20,13 +20,12 @@ struct xdp_buff;
>   struct xdp_umem {
>   	struct xsk_queue *fq;
>   	struct xsk_queue *cq;
> -	struct xsk_buff_pool *pool;
>   	u64 size;
>   	u32 headroom;
>   	u32 chunk_size;
> +	u32 chunks;
>   	struct user_struct *user;
>   	refcount_t users;
> -	struct work_struct work;
>   	struct page **pgs;
>   	u32 npgs;
>   	u16 queue_id;
> diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
> index a6dec9c..cda8ced 100644
> --- a/include/net/xsk_buff_pool.h
> +++ b/include/net/xsk_buff_pool.h
> @@ -14,6 +14,7 @@ struct xdp_rxq_info;
>   struct xsk_queue;
>   struct xdp_desc;
>   struct xdp_umem;
> +struct xdp_sock;
>   struct device;
>   struct page;
>   
> @@ -46,16 +47,23 @@ struct xsk_buff_pool {
>   	struct xdp_umem *umem;
>   	void *addrs;
>   	struct device *dev;
> +	refcount_t users;
> +	struct work_struct work;
>   	struct xdp_buff_xsk *free_heads[];
>   };
>   
>   /* AF_XDP core. */
> -struct xsk_buff_pool *xp_create(struct xdp_umem *umem, u32 chunks,
> -				u32 chunk_size, u32 headroom, u64 size,
> -				bool unaligned);
> +struct xsk_buff_pool *xp_create(void);
> +struct xsk_buff_pool *xp_assign_umem(struct xsk_buff_pool *pool,
> +				     struct xdp_umem *umem);
> +int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs,
> +		  struct net_device *dev, u16 queue_id, u16 flags);
>   void xp_set_fq(struct xsk_buff_pool *pool, struct xsk_queue *fq);
>   void xp_destroy(struct xsk_buff_pool *pool);
>   void xp_release(struct xdp_buff_xsk *xskb);
> +void xp_get_pool(struct xsk_buff_pool *pool);
> +void xp_put_pool(struct xsk_buff_pool *pool);
> +void xp_clear_dev(struct xsk_buff_pool *pool);
>   
>   /* AF_XDP, and XDP core. */
>   void xp_free(struct xdp_buff_xsk *xskb);
> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index adde4d5..f290345 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c
> @@ -47,160 +47,41 @@ void xdp_del_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs)
>   	spin_unlock_irqrestore(&umem->xsk_tx_list_lock, flags);
>   }
>   
> -/* The umem is stored both in the _rx struct and the _tx struct as we do
> - * not know if the device has more tx queues than rx, or the opposite.
> - * This might also change during run time.
> - */
> -static int xsk_reg_pool_at_qid(struct net_device *dev,
> -			       struct xsk_buff_pool *pool,
> -			       u16 queue_id)
> -{
> -	if (queue_id >= max_t(unsigned int,
> -			      dev->real_num_rx_queues,
> -			      dev->real_num_tx_queues))
> -		return -EINVAL;
> -
> -	if (queue_id < dev->real_num_rx_queues)
> -		dev->_rx[queue_id].pool = pool;
> -	if (queue_id < dev->real_num_tx_queues)
> -		dev->_tx[queue_id].pool = pool;
> -
> -	return 0;
> -}
> -
> -struct xsk_buff_pool *xsk_get_pool_from_qid(struct net_device *dev,
> -					    u16 queue_id)
> +static void xdp_umem_unpin_pages(struct xdp_umem *umem)
>   {
> -	if (queue_id < dev->real_num_rx_queues)
> -		return dev->_rx[queue_id].pool;
> -	if (queue_id < dev->real_num_tx_queues)
> -		return dev->_tx[queue_id].pool;
> +	unpin_user_pages_dirty_lock(umem->pgs, umem->npgs, true);
>   
> -	return NULL;
> +	kfree(umem->pgs);
> +	umem->pgs = NULL;
>   }
> -EXPORT_SYMBOL(xsk_get_pool_from_qid);
>   
> -static void xsk_clear_pool_at_qid(struct net_device *dev, u16 queue_id)
> +static void xdp_umem_unaccount_pages(struct xdp_umem *umem)
>   {
> -	if (queue_id < dev->real_num_rx_queues)
> -		dev->_rx[queue_id].pool = NULL;
> -	if (queue_id < dev->real_num_tx_queues)
> -		dev->_tx[queue_id].pool = NULL;
> +	if (umem->user) {
> +		atomic_long_sub(umem->npgs, &umem->user->locked_vm);
> +		free_uid(umem->user);
> +	}
>   }
>   
> -int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
> -			u16 queue_id, u16 flags)
> +void xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
> +			 u16 queue_id)
>   {
> -	bool force_zc, force_copy;
> -	struct netdev_bpf bpf;
> -	int err = 0;
> -
> -	ASSERT_RTNL();
> -
> -	force_zc = flags & XDP_ZEROCOPY;
> -	force_copy = flags & XDP_COPY;
> -
> -	if (force_zc && force_copy)
> -		return -EINVAL;
> -
> -	if (xsk_get_pool_from_qid(dev, queue_id))
> -		return -EBUSY;
> -
> -	err = xsk_reg_pool_at_qid(dev, umem->pool, queue_id);
> -	if (err)
> -		return err;
> -
>   	umem->dev = dev;
>   	umem->queue_id = queue_id;
>   
> -	if (flags & XDP_USE_NEED_WAKEUP) {
> -		umem->flags |= XDP_UMEM_USES_NEED_WAKEUP;
> -		/* Tx needs to be explicitly woken up the first time.
> -		 * Also for supporting drivers that do not implement this
> -		 * feature. They will always have to call sendto().
> -		 */
> -		xsk_set_tx_need_wakeup(umem->pool);
> -	}
> -
>   	dev_hold(dev);
> -
> -	if (force_copy)
> -		/* For copy-mode, we are done. */
> -		return 0;
> -
> -	if (!dev->netdev_ops->ndo_bpf || !dev->netdev_ops->ndo_xsk_wakeup) {
> -		err = -EOPNOTSUPP;
> -		goto err_unreg_umem;
> -	}
> -
> -	bpf.command = XDP_SETUP_XSK_POOL;
> -	bpf.xsk.pool = umem->pool;
> -	bpf.xsk.queue_id = queue_id;
> -
> -	err = dev->netdev_ops->ndo_bpf(dev, &bpf);
> -	if (err)
> -		goto err_unreg_umem;
> -
> -	umem->zc = true;
> -	return 0;
> -
> -err_unreg_umem:
> -	if (!force_zc)
> -		err = 0; /* fallback to copy mode */
> -	if (err)
> -		xsk_clear_pool_at_qid(dev, queue_id);
> -	return err;
>   }
>   
>   void xdp_umem_clear_dev(struct xdp_umem *umem)
>   {
> -	struct netdev_bpf bpf;
> -	int err;
> -
> -	ASSERT_RTNL();
> -
> -	if (!umem->dev)
> -		return;
> -
> -	if (umem->zc) {
> -		bpf.command = XDP_SETUP_XSK_POOL;
> -		bpf.xsk.pool = NULL;
> -		bpf.xsk.queue_id = umem->queue_id;
> -
> -		err = umem->dev->netdev_ops->ndo_bpf(umem->dev, &bpf);
> -
> -		if (err)
> -			WARN(1, "failed to disable umem!\n");
> -	}
> -
> -	xsk_clear_pool_at_qid(umem->dev, umem->queue_id);
> -
>   	dev_put(umem->dev);
>   	umem->dev = NULL;
>   	umem->zc = false;
>   }
>   
> -static void xdp_umem_unpin_pages(struct xdp_umem *umem)
> -{
> -	unpin_user_pages_dirty_lock(umem->pgs, umem->npgs, true);
> -
> -	kfree(umem->pgs);
> -	umem->pgs = NULL;
> -}
> -
> -static void xdp_umem_unaccount_pages(struct xdp_umem *umem)
> -{
> -	if (umem->user) {
> -		atomic_long_sub(umem->npgs, &umem->user->locked_vm);
> -		free_uid(umem->user);
> -	}
> -}
> -
>   static void xdp_umem_release(struct xdp_umem *umem)
>   {
> -	rtnl_lock();
>   	xdp_umem_clear_dev(umem);
> -	rtnl_unlock();
>   
>   	ida_simple_remove(&umem_ida, umem->id);
>   
> @@ -214,20 +95,12 @@ static void xdp_umem_release(struct xdp_umem *umem)
>   		umem->cq = NULL;
>   	}
>   
> -	xp_destroy(umem->pool);
>   	xdp_umem_unpin_pages(umem);
>   
>   	xdp_umem_unaccount_pages(umem);
>   	kfree(umem);
>   }
>   
> -static void xdp_umem_release_deferred(struct work_struct *work)
> -{
> -	struct xdp_umem *umem = container_of(work, struct xdp_umem, work);
> -
> -	xdp_umem_release(umem);
> -}
> -
>   void xdp_get_umem(struct xdp_umem *umem)
>   {
>   	refcount_inc(&umem->users);
> @@ -238,10 +111,8 @@ void xdp_put_umem(struct xdp_umem *umem)
>   	if (!umem)
>   		return;
>   
> -	if (refcount_dec_and_test(&umem->users)) {
> -		INIT_WORK(&umem->work, xdp_umem_release_deferred);
> -		schedule_work(&umem->work);
> -	}
> +	if (refcount_dec_and_test(&umem->users))
> +		xdp_umem_release(umem);
>   }
>   
>   static int xdp_umem_pin_pages(struct xdp_umem *umem, unsigned long address)
> @@ -357,6 +228,7 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>   	umem->size = size;
>   	umem->headroom = headroom;
>   	umem->chunk_size = chunk_size;
> +	umem->chunks = chunks;
>   	umem->npgs = (u32)npgs;
>   	umem->pgs = NULL;
>   	umem->user = NULL;
> @@ -374,16 +246,8 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>   	if (err)
>   		goto out_account;
>   
> -	umem->pool = xp_create(umem, chunks, chunk_size, headroom, size,
> -			       unaligned_chunks);
> -	if (!umem->pool) {
> -		err = -ENOMEM;
> -		goto out_pin;
> -	}
>   	return 0;
>   
> -out_pin:
> -	xdp_umem_unpin_pages(umem);
>   out_account:
>   	xdp_umem_unaccount_pages(umem);
>   	return err;
> diff --git a/net/xdp/xdp_umem.h b/net/xdp/xdp_umem.h
> index 32067fe..93e96be 100644
> --- a/net/xdp/xdp_umem.h
> +++ b/net/xdp/xdp_umem.h
> @@ -8,8 +8,8 @@
>   
>   #include <net/xdp_sock_drv.h>
>   
> -int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
> -			u16 queue_id, u16 flags);
> +void xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
> +			 u16 queue_id);
>   void xdp_umem_clear_dev(struct xdp_umem *umem);
>   bool xdp_umem_validate_queues(struct xdp_umem *umem);
>   void xdp_get_umem(struct xdp_umem *umem);
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 7551f5b..b12a832 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -105,6 +105,46 @@ bool xsk_uses_need_wakeup(struct xsk_buff_pool *pool)
>   }
>   EXPORT_SYMBOL(xsk_uses_need_wakeup);
>   
> +struct xsk_buff_pool *xsk_get_pool_from_qid(struct net_device *dev,
> +					    u16 queue_id)
> +{
> +	if (queue_id < dev->real_num_rx_queues)
> +		return dev->_rx[queue_id].pool;
> +	if (queue_id < dev->real_num_tx_queues)
> +		return dev->_tx[queue_id].pool;
> +
> +	return NULL;
> +}
> +EXPORT_SYMBOL(xsk_get_pool_from_qid);
> +
> +void xsk_clear_pool_at_qid(struct net_device *dev, u16 queue_id)
> +{
> +	if (queue_id < dev->real_num_rx_queues)
> +		dev->_rx[queue_id].pool = NULL;
> +	if (queue_id < dev->real_num_tx_queues)
> +		dev->_tx[queue_id].pool = NULL;
> +}
> +
> +/* The buffer pool is stored both in the _rx struct and the _tx struct as we do
> + * not know if the device has more tx queues than rx, or the opposite.
> + * This might also change during run time.
> + */
> +int xsk_reg_pool_at_qid(struct net_device *dev, struct xsk_buff_pool *pool,
> +			u16 queue_id)
> +{
> +	if (queue_id >= max_t(unsigned int,
> +			      dev->real_num_rx_queues,
> +			      dev->real_num_tx_queues))
> +		return -EINVAL;
> +
> +	if (queue_id < dev->real_num_rx_queues)
> +		dev->_rx[queue_id].pool = pool;
> +	if (queue_id < dev->real_num_tx_queues)
> +		dev->_tx[queue_id].pool = pool;
> +
> +	return 0;
> +}
> +
>   void xp_release(struct xdp_buff_xsk *xskb)
>   {
>   	xskb->pool->free_heads[xskb->pool->free_heads_cnt++] = xskb;
> @@ -281,7 +321,7 @@ bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc)
>   
>   	rcu_read_lock();
>   	list_for_each_entry_rcu(xs, &umem->xsk_tx_list, list) {
> -		if (!xskq_cons_peek_desc(xs->tx, desc, umem))
> +		if (!xskq_cons_peek_desc(xs->tx, desc, pool))
>   			continue;
>   
>   		/* This is the backpressure mechanism for the Tx path.
> @@ -347,7 +387,7 @@ static int xsk_generic_xmit(struct sock *sk)
>   	if (xs->queue_id >= xs->dev->real_num_tx_queues)
>   		goto out;
>   
> -	while (xskq_cons_peek_desc(xs->tx, &desc, xs->umem)) {
> +	while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) {
>   		char *buffer;
>   		u64 addr;
>   		u32 len;
> @@ -629,6 +669,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
>   	qid = sxdp->sxdp_queue_id;
>   
>   	if (flags & XDP_SHARED_UMEM) {
> +		struct xsk_buff_pool *curr_pool;
>   		struct xdp_sock *umem_xs;
>   		struct socket *sock;
>   
> @@ -663,6 +704,11 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
>   			goto out_unlock;
>   		}
>   
> +		/* Share the buffer pool with the other socket. */
> +		xp_get_pool(umem_xs->pool);
> +		curr_pool = xs->pool;
> +		xs->pool = umem_xs->pool;
> +		xp_destroy(curr_pool);
>   		xdp_get_umem(umem_xs->umem);
>   		WRITE_ONCE(xs->umem, umem_xs->umem);
>   		sockfd_put(sock);
> @@ -670,10 +716,24 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
>   		err = -EINVAL;
>   		goto out_unlock;
>   	} else {
> +		struct xsk_buff_pool *new_pool;
> +
>   		/* This xsk has its own umem. */
> -		err = xdp_umem_assign_dev(xs->umem, dev, qid, flags);
> -		if (err)
> +		xdp_umem_assign_dev(xs->umem, dev, qid);
> +		new_pool = xp_assign_umem(xs->pool, xs->umem);

It looks like the old pool (xs->pool) is never freed.
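
If you keep the current approach, the success path probably needs 
something like the below (untested sketch, mirroring what you already 
do in the XDP_SHARED_UMEM branch above), so that the empty pool 
allocated in xsk_create() does not leak:

		err = xp_assign_dev(new_pool, xs, dev, qid, flags);
		if (err) {
			xp_destroy(new_pool);
			xdp_umem_clear_dev(xs->umem);
			goto out_unlock;
		}
		/* Free the empty placeholder pool from xsk_create() */
		xp_destroy(xs->pool);
		xs->pool = new_pool;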

> +		if (!new_pool) {
> +			err = -ENOMEM;
> +			xdp_umem_clear_dev(xs->umem);
> +			goto out_unlock;
> +		}
> +
> +		err = xp_assign_dev(new_pool, xs, dev, qid, flags);
> +		if (err) {
> +			xp_destroy(new_pool);
> +			xdp_umem_clear_dev(xs->umem);
>   			goto out_unlock;
> +		}
> +		xs->pool = new_pool;
>   	}
>   
>   	xs->dev = dev;
> @@ -765,8 +825,6 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname,
>   			return PTR_ERR(umem);
>   		}
>   
> -		xs->pool = umem->pool;
> -
>   		/* Make sure umem is ready before it can be seen by others */
>   		smp_wmb();
>   		WRITE_ONCE(xs->umem, umem);
> @@ -796,7 +854,7 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname,
>   			&xs->umem->cq;
>   		err = xsk_init_queue(entries, q, true);
>   		if (optname == XDP_UMEM_FILL_RING)
> -			xp_set_fq(xs->umem->pool, *q);
> +			xp_set_fq(xs->pool, *q);
>   		mutex_unlock(&xs->mutex);
>   		return err;
>   	}
> @@ -1002,7 +1060,8 @@ static int xsk_notifier(struct notifier_block *this,
>   
>   				xsk_unbind_dev(xs);
>   
> -				/* Clear device references in umem. */
> +				/* Clear device references. */
> +				xp_clear_dev(xs->pool);
>   				xdp_umem_clear_dev(xs->umem);
>   			}
>   			mutex_unlock(&xs->mutex);
> @@ -1047,7 +1106,7 @@ static void xsk_destruct(struct sock *sk)
>   	if (!sock_flag(sk, SOCK_DEAD))
>   		return;
>   
> -	xdp_put_umem(xs->umem);
> +	xp_put_pool(xs->pool);
>   
>   	sk_refcnt_debug_dec(sk);
>   }
> @@ -1055,8 +1114,8 @@ static void xsk_destruct(struct sock *sk)
>   static int xsk_create(struct net *net, struct socket *sock, int protocol,
>   		      int kern)
>   {
> -	struct sock *sk;
>   	struct xdp_sock *xs;
> +	struct sock *sk;
>   
>   	if (!ns_capable(net->user_ns, CAP_NET_RAW))
>   		return -EPERM;
> @@ -1092,6 +1151,10 @@ static int xsk_create(struct net *net, struct socket *sock, int protocol,
>   	INIT_LIST_HEAD(&xs->map_list);
>   	spin_lock_init(&xs->map_list_lock);
>   
> +	xs->pool = xp_create();
> +	if (!xs->pool)
> +		return -ENOMEM;
> +
>   	mutex_lock(&net->xdp.lock);
>   	sk_add_node_rcu(sk, &net->xdp.list);
>   	mutex_unlock(&net->xdp.lock);
> diff --git a/net/xdp/xsk.h b/net/xdp/xsk.h
> index 455ddd4..a00e3e2 100644
> --- a/net/xdp/xsk.h
> +++ b/net/xdp/xsk.h
> @@ -51,5 +51,8 @@ void xsk_map_try_sock_delete(struct xsk_map *map, struct xdp_sock *xs,
>   			     struct xdp_sock **map_entry);
>   int xsk_map_inc(struct xsk_map *map);
>   void xsk_map_put(struct xsk_map *map);
> +void xsk_clear_pool_at_qid(struct net_device *dev, u16 queue_id);
> +int xsk_reg_pool_at_qid(struct net_device *dev, struct xsk_buff_pool *pool,
> +			u16 queue_id);
>   
>   #endif /* XSK_H_ */
> diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
> index c57f0bb..da93b36 100644
> --- a/net/xdp/xsk_buff_pool.c
> +++ b/net/xdp/xsk_buff_pool.c
> @@ -2,11 +2,14 @@
>   
>   #include <net/xsk_buff_pool.h>
>   #include <net/xdp_sock.h>
> +#include <net/xdp_sock_drv.h>
>   #include <linux/dma-direct.h>
>   #include <linux/dma-noncoherent.h>
>   #include <linux/swiotlb.h>
>   
>   #include "xsk_queue.h"
> +#include "xdp_umem.h"
> +#include "xsk.h"
>   
>   static void xp_addr_unmap(struct xsk_buff_pool *pool)
>   {
> @@ -32,39 +35,48 @@ void xp_destroy(struct xsk_buff_pool *pool)
>   	kvfree(pool);
>   }
>   
> -struct xsk_buff_pool *xp_create(struct xdp_umem *umem, u32 chunks,
> -				u32 chunk_size, u32 headroom, u64 size,
> -				bool unaligned)
> +struct xsk_buff_pool *xp_create(void)
> +{
> +	return kvzalloc(sizeof(struct xsk_buff_pool), GFP_KERNEL);
> +}
> +
> +struct xsk_buff_pool *xp_assign_umem(struct xsk_buff_pool *pool_old,
> +				     struct xdp_umem *umem)
>   {
>   	struct xsk_buff_pool *pool;
>   	struct xdp_buff_xsk *xskb;
>   	int err;
>   	u32 i;
>   
> -	pool = kvzalloc(struct_size(pool, free_heads, chunks), GFP_KERNEL);
> +	pool = kvzalloc(struct_size(pool, free_heads, umem->chunks),
> +			GFP_KERNEL);
>   	if (!pool)
>   		goto out;
>   
> -	pool->heads = kvcalloc(chunks, sizeof(*pool->heads), GFP_KERNEL);
> +	memcpy(pool, pool_old, sizeof(*pool_old));
> +
> +	pool->heads = kvcalloc(umem->chunks, sizeof(*pool->heads), GFP_KERNEL);
>   	if (!pool->heads)
>   		goto out;
>   
> -	pool->chunk_mask = ~((u64)chunk_size - 1);
> -	pool->addrs_cnt = size;
> -	pool->heads_cnt = chunks;
> -	pool->free_heads_cnt = chunks;
> -	pool->headroom = headroom;
> -	pool->chunk_size = chunk_size;
> +	pool->chunk_mask = ~((u64)umem->chunk_size - 1);
> +	pool->addrs_cnt = umem->size;
> +	pool->heads_cnt = umem->chunks;
> +	pool->free_heads_cnt = umem->chunks;
> +	pool->headroom = umem->headroom;
> +	pool->chunk_size = umem->chunk_size;
>   	pool->cheap_dma = true;
> -	pool->unaligned = unaligned;
> -	pool->frame_len = chunk_size - headroom - XDP_PACKET_HEADROOM;
> +	pool->unaligned = umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG;
> +	pool->frame_len = umem->chunk_size - umem->headroom -
> +		XDP_PACKET_HEADROOM;
>   	pool->umem = umem;
>   	INIT_LIST_HEAD(&pool->free_list);
> +	refcount_set(&pool->users, 1);
>   
>   	for (i = 0; i < pool->free_heads_cnt; i++) {
>   		xskb = &pool->heads[i];
>   		xskb->pool = pool;
> -		xskb->xdp.frame_sz = chunk_size - headroom;
> +		xskb->xdp.frame_sz = umem->chunk_size - umem->headroom;
>   		pool->free_heads[i] = xskb;
>   	}
>   
> @@ -91,6 +103,120 @@ void xp_set_rxq_info(struct xsk_buff_pool *pool, struct xdp_rxq_info *rxq)
>   }
>   EXPORT_SYMBOL(xp_set_rxq_info);
>   
> +int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs,
> +		  struct net_device *dev, u16 queue_id, u16 flags)
> +{
> +	struct xdp_umem *umem = pool->umem;
> +	bool force_zc, force_copy;
> +	struct netdev_bpf bpf;
> +	int err = 0;
> +
> +	ASSERT_RTNL();
> +
> +	force_zc = flags & XDP_ZEROCOPY;
> +	force_copy = flags & XDP_COPY;
> +
> +	if (force_zc && force_copy)
> +		return -EINVAL;
> +
> +	if (xsk_get_pool_from_qid(dev, queue_id))
> +		return -EBUSY;
> +
> +	err = xsk_reg_pool_at_qid(dev, pool, queue_id);
> +	if (err)
> +		return err;
> +
> +	if ((flags & XDP_USE_NEED_WAKEUP) && xs->tx) {
> +		umem->flags |= XDP_UMEM_USES_NEED_WAKEUP;
> +		/* Tx needs to be explicitly woken up the first time.
> +		 * Also for supporting drivers that do not implement this
> +		 * feature. They will always have to call sendto().
> +		 */
> +		xs->tx->ring->flags |= XDP_RING_NEED_WAKEUP;
> +	}
> +
> +	if (force_copy)
> +		/* For copy-mode, we are done. */
> +		return 0;
> +
> +	if (!dev->netdev_ops->ndo_bpf || !dev->netdev_ops->ndo_xsk_wakeup) {
> +		err = -EOPNOTSUPP;
> +		goto err_unreg_pool;
> +	}
> +
> +	bpf.command = XDP_SETUP_XSK_POOL;
> +	bpf.xsk.pool = pool;
> +	bpf.xsk.queue_id = queue_id;
> +
> +	err = dev->netdev_ops->ndo_bpf(dev, &bpf);
> +	if (err)
> +		goto err_unreg_pool;
> +
> +	umem->zc = true;
> +	return 0;
> +
> +err_unreg_pool:
> +	if (!force_zc)
> +		err = 0; /* fallback to copy mode */
> +	if (err)
> +		xsk_clear_pool_at_qid(dev, queue_id);
> +	return err;
> +}
> +
> +void xp_clear_dev(struct xsk_buff_pool *pool)
> +{
> +	struct xdp_umem *umem = pool->umem;
> +	struct netdev_bpf bpf;
> +	int err;
> +
> +	ASSERT_RTNL();
> +
> +	if (!umem->dev)
> +		return;
> +
> +	if (umem->zc) {
> +		bpf.command = XDP_SETUP_XSK_POOL;
> +		bpf.xsk.pool = NULL;
> +		bpf.xsk.queue_id = umem->queue_id;
> +
> +		err = umem->dev->netdev_ops->ndo_bpf(umem->dev, &bpf);
> +
> +		if (err)
> +			WARN(1, "failed to disable umem!\n");
> +	}
> +
> +	xsk_clear_pool_at_qid(umem->dev, umem->queue_id);
> +}
> +
> +static void xp_release_deferred(struct work_struct *work)
> +{
> +	struct xsk_buff_pool *pool = container_of(work, struct xsk_buff_pool,
> +						  work);
> +
> +	rtnl_lock();
> +	xp_clear_dev(pool);
> +	rtnl_unlock();
> +
> +	xdp_put_umem(pool->umem);
> +	xp_destroy(pool);
> +}
> +
> +void xp_get_pool(struct xsk_buff_pool *pool)
> +{
> +	refcount_inc(&pool->users);
> +}
> +
> +void xp_put_pool(struct xsk_buff_pool *pool)
> +{
> +	if (!pool)
> +		return;
> +
> +	if (refcount_dec_and_test(&pool->users)) {
> +		INIT_WORK(&pool->work, xp_release_deferred);
> +		schedule_work(&pool->work);
> +	}
> +}
> +
>   void xp_dma_unmap(struct xsk_buff_pool *pool, unsigned long attrs)
>   {
>   	dma_addr_t *dma;
> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> index 5b5d24d..75f1853 100644
> --- a/net/xdp/xsk_queue.h
> +++ b/net/xdp/xsk_queue.h
> @@ -165,9 +165,9 @@ static inline bool xp_validate_desc(struct xsk_buff_pool *pool,
>   
>   static inline bool xskq_cons_is_valid_desc(struct xsk_queue *q,
>   					   struct xdp_desc *d,
> -					   struct xdp_umem *umem)
> +					   struct xsk_buff_pool *pool)
>   {
> -	if (!xp_validate_desc(umem->pool, d)) {
> +	if (!xp_validate_desc(pool, d)) {
>   		q->invalid_descs++;
>   		return false;
>   	}
> @@ -176,14 +176,14 @@ static inline bool xskq_cons_is_valid_desc(struct xsk_queue *q,
>   
>   static inline bool xskq_cons_read_desc(struct xsk_queue *q,
>   				       struct xdp_desc *desc,
> -				       struct xdp_umem *umem)
> +				       struct xsk_buff_pool *pool)
>   {
>   	while (q->cached_cons != q->cached_prod) {
>   		struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
>   		u32 idx = q->cached_cons & q->ring_mask;
>   
>   		*desc = ring->desc[idx];
> -		if (xskq_cons_is_valid_desc(q, desc, umem))
> +		if (xskq_cons_is_valid_desc(q, desc, pool))
>   			return true;
>   
>   		q->cached_cons++;
> @@ -235,11 +235,11 @@ static inline bool xskq_cons_peek_addr_unchecked(struct xsk_queue *q, u64 *addr)
>   
>   static inline bool xskq_cons_peek_desc(struct xsk_queue *q,
>   				       struct xdp_desc *desc,
> -				       struct xdp_umem *umem)
> +				       struct xsk_buff_pool *pool)
>   {
>   	if (q->cached_prod == q->cached_cons)
>   		xskq_cons_get_entries(q);
> -	return xskq_cons_read_desc(q, desc, umem);
> +	return xskq_cons_read_desc(q, desc, pool);
>   }
>   
>   static inline void xskq_cons_release(struct xsk_queue *q)
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next 00/14] xsk: support shared umems between devices and queues
  2020-07-02 12:18 [PATCH bpf-next 00/14] xsk: support shared umems between devices and queues Magnus Karlsson
                   ` (14 preceding siblings ...)
  2020-07-06 18:39 ` [PATCH bpf-next 00/14] xsk: support shared umems between devices and queues Daniel Borkmann
@ 2020-07-08 15:00 ` Maxim Mikityanskiy
  2020-07-09  6:54   ` Magnus Karlsson
  15 siblings, 1 reply; 25+ messages in thread
From: Maxim Mikityanskiy @ 2020-07-08 15:00 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: bjorn.topel, ast, daniel, netdev, jonathan.lemon, bpf,
	jeffrey.t.kirsher, maciej.fijalkowski, maciejromanfijalkowski,
	cristian.dumitrescu

On 2020-07-02 15:18, Magnus Karlsson wrote:
> This patch set adds support to share a umem between AF_XDP sockets
> bound to different queue ids on the same device or even between
> devices. It has already been possible to do this by registering the
> umem multiple times, but this wastes a lot of memory. Just imagine
> having 10 threads each having 10 sockets open sharing a single
> umem. This means that you would have to register the umem 100 times
> consuming large quantities of memory.

Just to clarify: the main memory savings come from not having to store 
the array of pages in struct xdp_umem multiple times, right?

I guess there is one more drawback of sharing a UMEM the old way 
(registering it multiple times): it would DMA-map the same pages 
multiple times.
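
To put rough numbers on it (my own assumptions, not taken from the 
cover letter): with a 4 GB UMEM, 4 KB pages and 64-bit pointers, 
umem->pgs alone is roughly

    4 GB / 4 KB = ~1M pages, 1M * 8 bytes = ~8 MB per registration,

so registering such a UMEM 100 times would consume ~800 MB just for the 
page arrays, versus ~8 MB when it is shared, and the same pages would 
also be pinned and DMA-mapped once per registration instead of just 
once.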

> Instead, we extend the existing XDP_SHARED_UMEM flag to also work when
> sharing a umem between different queue ids as well as devices. If you
> would like to share umem between two sockets, just create the first
> one as would do normally. For the second socket you would not register
> the same umem using the XDP_UMEM_REG setsockopt. Instead attach one
> new fill ring and one new completion ring to this second socket and
> then use the XDP_SHARED_UMEM bind flag supplying the file descriptor of
> the first socket in the sxdp_shared_umem_fd field to signify that it
> is the umem of the first socket you would like to share.
> 
> One important thing to note in this example, is that there needs to be
> one fill ring and one completion ring per unique device and queue id
> bound to. This so that the single-producer and single-consumer semantics
> of the rings can be upheld. To recap, if you bind multiple sockets to
> the same device and queue id (already supported without this patch
> set), you only need one pair of fill and completion rings. If you bind
> multiple sockets to multiple different queues or devices, you need one
> fill and completion ring pair per unique device,queue_id tuple.
> 
> The implementation is based around extending the buffer pool in the
> core xsk code. This is a structure that exists on a per unique device
> and queue id basis. So, a number of entities that can now be shared
> are moved from the umem to the buffer pool. Information about DMA
> mappings are also moved from the buffer pool, but as these are per
> device independent of the queue id, they are now hanging off the
> netdev.

Basically, you want to map a (netdev, UMEM) pair to DMA info. The 
current implementation of xp_find_dma_map stores a list of UMEMs in the 
netdev and walks that list to find the corresponding DMA info. It would 
be more efficient to do it the other way around, i.e. to store the list 
of netdevs inside the UMEM, because you normally have fewer netdevs in 
the system than sockets, so there would be fewer list items to 
traverse. Of course, this has no effect on the data path, but it will 
improve the time it takes to open a socket (i.e. the connection rate).
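
Roughly something like this (a sketch only; xsk_dma_map, dma_list and 
pool->netdev are hypothetical names that just illustrate the direction 
of the lookup):

/* One DMA mapping per (umem, netdev) pair, anchored in the umem. */
struct xsk_dma_map {
	struct list_head list;		/* entry in umem->dma_list */
	struct net_device *netdev;
	dma_addr_t *dma_pages;
	refcount_t users;
};

static struct xsk_dma_map *xp_find_dma_map(struct xsk_buff_pool *pool)
{
	struct xsk_dma_map *dma_map;

	/* A umem is typically bound to only a handful of netdevs,
	 * so this list stays short.
	 */
	list_for_each_entry(dma_map, &pool->umem->dma_list, list) {
		if (dma_map->netdev == pool->netdev)
			return dma_map;
	}
	return NULL;
}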

> In summary after this patch set, there is one xdp_sock struct
> per socket created. This points to an xsk_buff_pool for which there is
> one per unique device and queue id. The buffer pool points to a DMA
> mapping structure for which there is one per device that a umem has
> been bound to. And finally, the buffer pool also points to a xdp_umem
> struct, for which there is only one per umem registration.
> 
> Before:
> 
> XSK -> UMEM -> POOL
> 
> Now:
> 
> XSK -> POOL -> DMA
>              \
> 	     > UMEM
> 
> Patches 1-8 only rearrange internal structures to support the buffer
> pool carrying this new information, while patch 9 improves performance
> as we now have rearrange the internal structures quite a bit. Finally,
> patches 10-14 introduce the new functionality together with libbpf
> support, samples, and documentation.
> 
> Libbpf has also been extended to support sharing of umems between
> sockets bound to different devices and queue ids by introducing a new
> function called xsk_socket__create_shared(). The difference between
> this and the existing xsk_socket__create() is that the former takes a
> reference to a fill ring and a completion ring as these need to be
> created. This new function needs to be used for the second and
> following sockets that binds to the same umem. The first one can be
> created by either function as it will also have called
> xsk_umem__create().
> 
> There is also a new sample xsk_fwd that demonstrates this new
> interface and capability.
> 
> Note to Maxim at Mellanox. I do not have a mlx5 card, so I have not
> been able to test the changes to your driver. It compiles, but that is
> all I can say, so it would be great if you could test it. Also, I did
> change the name of many functions and variables from umem to pool as a
> buffer pool is passed down to the driver in this patch set instead of
> the umem. I did not change the name of the files umem.c and
> umem.h. Please go through the changes and change things to your
> liking.

I looked through the mlx5 patches; the changes are minor and, most 
importantly, the functionality is not broken (tested with xdpsock). I 
would still like to make some cosmetic amendments, so I'll send you an 
updated patch.

> Performance for the non-shared umem case is unchanged for the xdpsock
> sample application with this patch set.

I also tested it on mlx5 (ConnectX-5 Ex), and the performance hasn't 
been hurt.

> For workloads that share a
> umem, this patch set can give rise to added performance benefits due
> to the decrease in memory usage.
> 
> This patch has been applied against commit 91f77560e473 ("Merge branch 'test_progs-improvements'")
> 
> Structure of the patch set:
> 
> Patch 1: Pass the buffer pool to the driver instead of the umem. This
>           because the driver needs one buffer pool per napi context
>           when we later introduce sharing of the umem between queue ids
>           and devices.
> Patch 2: Rename the xsk driver interface so they have better names
>           after the move to the buffer pool
> Patch 3: There is one buffer pool per device and queue, while there is
>           only one umem per registration. The buffer pool needs to be
>           created and destroyed independently of the umem.
> Patch 4: Move fill and completion rings to the buffer pool as there will
>           be one set of these per device and queue
> Patch 5: Move queue_id, dev and need_wakeup to buffer pool again as these
>           will now be per buffer pool as the umem can be shared between
>           devices and queues
> Patch 6: Move xsk_tx_list and its lock to buffer pool
> Patch 7: Move the creation/deletion of addrs from buffer pool to umem
> Patch 8: Enable sharing of DMA mappings when multiple queues of the
>           same device are bound
> Patch 9: Rearrange internal structs for better performance as these
>           have been substantially scrambled by the previous patches
> Patch 10: Add shared umem support between queue ids
> Patch 11: Add shared umem support between devices
> Patch 12: Add support for this in libbpf
> Patch 13: Add a new sample that demonstrates this new feature by
>            forwarding packets between different netdevs and queues
> Patch 14: Add documentation
> 
> Thanks: Magnus
> 
> Cristian Dumitrescu (1):
>    samples/bpf: add new sample xsk_fwd.c
> 
> Magnus Karlsson (13):
>    xsk: i40e: ice: ixgbe: mlx5: pass buffer pool to driver instead of
>      umem
>    xsk: i40e: ice: ixgbe: mlx5: rename xsk zero-copy driver interfaces
>    xsk: create and free context independently from umem
>    xsk: move fill and completion rings to buffer pool
>    xsk: move queue_id, dev and need_wakeup to context
>    xsk: move xsk_tx_list and its lock to buffer pool
>    xsk: move addrs from buffer pool to umem
>    xsk: net: enable sharing of dma mappings
>    xsk: rearrange internal structs for better performance
>    xsk: add shared umem support between queue ids
>    xsk: add shared umem support between devices
>    libbpf: support shared umems between queues and devices
>    xsk: documentation for XDP_SHARED_UMEM between queues and netdevs
> 
>   Documentation/networking/af_xdp.rst                |   68 +-
>   drivers/net/ethernet/intel/i40e/i40e_ethtool.c     |    2 +-
>   drivers/net/ethernet/intel/i40e/i40e_main.c        |   29 +-
>   drivers/net/ethernet/intel/i40e/i40e_txrx.c        |   10 +-
>   drivers/net/ethernet/intel/i40e/i40e_txrx.h        |    2 +-
>   drivers/net/ethernet/intel/i40e/i40e_xsk.c         |   79 +-
>   drivers/net/ethernet/intel/i40e/i40e_xsk.h         |    4 +-
>   drivers/net/ethernet/intel/ice/ice.h               |   18 +-
>   drivers/net/ethernet/intel/ice/ice_base.c          |   16 +-
>   drivers/net/ethernet/intel/ice/ice_lib.c           |    2 +-
>   drivers/net/ethernet/intel/ice/ice_main.c          |   10 +-
>   drivers/net/ethernet/intel/ice/ice_txrx.c          |    8 +-
>   drivers/net/ethernet/intel/ice/ice_txrx.h          |    2 +-
>   drivers/net/ethernet/intel/ice/ice_xsk.c           |  142 +--
>   drivers/net/ethernet/intel/ice/ice_xsk.h           |    7 +-
>   drivers/net/ethernet/intel/ixgbe/ixgbe.h           |    2 +-
>   drivers/net/ethernet/intel/ixgbe/ixgbe_main.c      |   34 +-
>   .../net/ethernet/intel/ixgbe/ixgbe_txrx_common.h   |    7 +-
>   drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c       |   61 +-
>   drivers/net/ethernet/mellanox/mlx5/core/en.h       |   19 +-
>   drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c   |    5 +-
>   .../net/ethernet/mellanox/mlx5/core/en/xsk/rx.h    |   10 +-
>   .../net/ethernet/mellanox/mlx5/core/en/xsk/setup.c |   12 +-
>   .../net/ethernet/mellanox/mlx5/core/en/xsk/setup.h |    2 +-
>   .../net/ethernet/mellanox/mlx5/core/en/xsk/tx.c    |   12 +-
>   .../net/ethernet/mellanox/mlx5/core/en/xsk/tx.h    |    6 +-
>   .../net/ethernet/mellanox/mlx5/core/en/xsk/umem.c  |  108 +-
>   .../net/ethernet/mellanox/mlx5/core/en/xsk/umem.h  |   14 +-
>   drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   46 +-
>   drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    |   16 +-
>   include/linux/netdevice.h                          |   13 +-
>   include/net/xdp_sock.h                             |   28 +-
>   include/net/xdp_sock_drv.h                         |  115 ++-
>   include/net/xsk_buff_pool.h                        |   47 +-
>   net/core/dev.c                                     |    3 +
>   net/ethtool/channels.c                             |    2 +-
>   net/ethtool/ioctl.c                                |    2 +-
>   net/xdp/xdp_umem.c                                 |  221 +---
>   net/xdp/xdp_umem.h                                 |    6 -
>   net/xdp/xsk.c                                      |  213 ++--
>   net/xdp/xsk.h                                      |    3 +
>   net/xdp/xsk_buff_pool.c                            |  314 +++++-
>   net/xdp/xsk_diag.c                                 |   14 +-
>   net/xdp/xsk_queue.h                                |   12 +-
>   samples/bpf/Makefile                               |    3 +
>   samples/bpf/xsk_fwd.c                              | 1075 ++++++++++++++++++++
>   tools/lib/bpf/libbpf.map                           |    1 +
>   tools/lib/bpf/xsk.c                                |  376 ++++---
>   tools/lib/bpf/xsk.h                                |    9 +
>   49 files changed, 2327 insertions(+), 883 deletions(-)
>   create mode 100644 samples/bpf/xsk_fwd.c
> 
> --
> 2.7.4
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next 01/14] xsk: i40e: ice: ixgbe: mlx5: pass buffer pool to driver instead of umem
  2020-07-02 12:19 ` [PATCH bpf-next 01/14] xsk: i40e: ice: ixgbe: mlx5: pass buffer pool to driver instead of umem Magnus Karlsson
@ 2020-07-08 15:00   ` Maxim Mikityanskiy
  0 siblings, 0 replies; 25+ messages in thread
From: Maxim Mikityanskiy @ 2020-07-08 15:00 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: bjorn.topel, ast, daniel, netdev, jonathan.lemon, bpf,
	jeffrey.t.kirsher, maciej.fijalkowski, maciejromanfijalkowski,
	cristian.dumitrescu

On 2020-07-02 15:19, Magnus Karlsson wrote:
> Replace the explicit umem reference passed to the driver in
> AF_XDP zero-copy mode with the buffer pool instead. This in
> preparation for extending the functionality of the zero-copy mode
> so that umems can be shared between queues on the same netdev and
> also between netdevs. In this commit, only an umem reference has
> been added to the buffer pool struct. But later commits will add
> other entities to it. These are going to be entities that are
> different between different queue ids and netdevs even though the
> umem is shared between them.
> 
> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> ---
>   drivers/net/ethernet/intel/i40e/i40e_ethtool.c     |   2 +-
>   drivers/net/ethernet/intel/i40e/i40e_main.c        |  29 +++--
>   drivers/net/ethernet/intel/i40e/i40e_txrx.c        |  10 +-
>   drivers/net/ethernet/intel/i40e/i40e_txrx.h        |   2 +-
>   drivers/net/ethernet/intel/i40e/i40e_xsk.c         |  81 ++++++------
>   drivers/net/ethernet/intel/i40e/i40e_xsk.h         |   4 +-
>   drivers/net/ethernet/intel/ice/ice.h               |  18 +--
>   drivers/net/ethernet/intel/ice/ice_base.c          |  16 +--
>   drivers/net/ethernet/intel/ice/ice_lib.c           |   2 +-
>   drivers/net/ethernet/intel/ice/ice_main.c          |  10 +-
>   drivers/net/ethernet/intel/ice/ice_txrx.c          |   8 +-
>   drivers/net/ethernet/intel/ice/ice_txrx.h          |   2 +-
>   drivers/net/ethernet/intel/ice/ice_xsk.c           | 142 ++++++++++-----------
>   drivers/net/ethernet/intel/ice/ice_xsk.h           |   7 +-
>   drivers/net/ethernet/intel/ixgbe/ixgbe.h           |   2 +-
>   drivers/net/ethernet/intel/ixgbe/ixgbe_main.c      |  34 ++---
>   .../net/ethernet/intel/ixgbe/ixgbe_txrx_common.h   |   7 +-
>   drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c       |  61 ++++-----
>   drivers/net/ethernet/mellanox/mlx5/core/en.h       |  19 +--
>   drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c   |   5 +-
>   .../net/ethernet/mellanox/mlx5/core/en/xsk/rx.h    |  10 +-
>   .../net/ethernet/mellanox/mlx5/core/en/xsk/setup.c |  12 +-
>   .../net/ethernet/mellanox/mlx5/core/en/xsk/setup.h |   2 +-
>   .../net/ethernet/mellanox/mlx5/core/en/xsk/tx.c    |  12 +-
>   .../net/ethernet/mellanox/mlx5/core/en/xsk/tx.h    |   6 +-
>   .../net/ethernet/mellanox/mlx5/core/en/xsk/umem.c  | 108 ++++++++--------
>   .../net/ethernet/mellanox/mlx5/core/en/xsk/umem.h  |  14 +-
>   drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  46 +++----
>   drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    |  16 +--
>   include/linux/netdevice.h                          |  10 +-
>   include/net/xdp_sock_drv.h                         |   7 +-
>   include/net/xsk_buff_pool.h                        |   4 +-
>   net/ethtool/channels.c                             |   2 +-
>   net/ethtool/ioctl.c                                |   2 +-
>   net/xdp/xdp_umem.c                                 |  45 +++----
>   net/xdp/xsk_buff_pool.c                            |   5 +-
>   36 files changed, 389 insertions(+), 373 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> index aa8026b..422b54f 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> @@ -1967,7 +1967,7 @@ static int i40e_set_ringparam(struct net_device *netdev,
>   	    (new_rx_count == vsi->rx_rings[0]->count))
>   		return 0;
>   
> -	/* If there is a AF_XDP UMEM attached to any of Rx rings,
> +	/* If there is a AF_XDP page pool attached to any of Rx rings,
>   	 * disallow changing the number of descriptors -- regardless
>   	 * if the netdev is running or not.
>   	 */
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index 5d807c8..3df725e 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -3103,12 +3103,12 @@ static void i40e_config_xps_tx_ring(struct i40e_ring *ring)
>   }
>   
>   /**
> - * i40e_xsk_umem - Retrieve the AF_XDP ZC if XDP and ZC is enabled
> + * i40e_xsk_pool - Retrieve the AF_XDP buffer pool if XDP and ZC is enabled
>    * @ring: The Tx or Rx ring
>    *
> - * Returns the UMEM or NULL.
> + * Returns the AF_XDP buffer pool or NULL.
>    **/
> -static struct xdp_umem *i40e_xsk_umem(struct i40e_ring *ring)
> +static struct xsk_buff_pool *i40e_xsk_pool(struct i40e_ring *ring)
>   {
>   	bool xdp_on = i40e_enabled_xdp_vsi(ring->vsi);
>   	int qid = ring->queue_index;
> @@ -3119,7 +3119,7 @@ static struct xdp_umem *i40e_xsk_umem(struct i40e_ring *ring)
>   	if (!xdp_on || !test_bit(qid, ring->vsi->af_xdp_zc_qps))
>   		return NULL;
>   
> -	return xdp_get_umem_from_qid(ring->vsi->netdev, qid);
> +	return xdp_get_xsk_pool_from_qid(ring->vsi->netdev, qid);
>   }
>   
>   /**
> @@ -3138,7 +3138,7 @@ static int i40e_configure_tx_ring(struct i40e_ring *ring)
>   	u32 qtx_ctl = 0;
>   
>   	if (ring_is_xdp(ring))
> -		ring->xsk_umem = i40e_xsk_umem(ring);
> +		ring->xsk_pool = i40e_xsk_pool(ring);
>   
>   	/* some ATR related tx ring init */
>   	if (vsi->back->flags & I40E_FLAG_FD_ATR_ENABLED) {
> @@ -3261,12 +3261,13 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring)
>   		xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq);
>   
>   	kfree(ring->rx_bi);
> -	ring->xsk_umem = i40e_xsk_umem(ring);
> -	if (ring->xsk_umem) {
> +	ring->xsk_pool = i40e_xsk_pool(ring);
> +	if (ring->xsk_pool) {
>   		ret = i40e_alloc_rx_bi_zc(ring);
>   		if (ret)
>   			return ret;
> -		ring->rx_buf_len = xsk_umem_get_rx_frame_size(ring->xsk_umem);
> +		ring->rx_buf_len =
> +		  xsk_umem_get_rx_frame_size(ring->xsk_pool->umem);
>   		/* For AF_XDP ZC, we disallow packets to span on
>   		 * multiple buffers, thus letting us skip that
>   		 * handling in the fast-path.
> @@ -3349,8 +3350,8 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring)
>   	ring->tail = hw->hw_addr + I40E_QRX_TAIL(pf_q);
>   	writel(0, ring->tail);
>   
> -	if (ring->xsk_umem) {
> -		xsk_buff_set_rxq_info(ring->xsk_umem, &ring->xdp_rxq);
> +	if (ring->xsk_pool) {
> +		xsk_buff_set_rxq_info(ring->xsk_pool->umem, &ring->xdp_rxq);
>   		ok = i40e_alloc_rx_buffers_zc(ring, I40E_DESC_UNUSED(ring));
>   	} else {
>   		ok = !i40e_alloc_rx_buffers(ring, I40E_DESC_UNUSED(ring));
> @@ -3361,7 +3362,7 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring)
>   		 */
>   		dev_info(&vsi->back->pdev->dev,
>   			 "Failed to allocate some buffers on %sRx ring %d (pf_q %d)\n",
> -			 ring->xsk_umem ? "UMEM enabled " : "",
> +			 ring->xsk_pool ? "AF_XDP ZC enabled " : "",
>   			 ring->queue_index, pf_q);
>   	}
>   
> @@ -12553,7 +12554,7 @@ static int i40e_xdp_setup(struct i40e_vsi *vsi,
>   	 */
>   	if (need_reset && prog)
>   		for (i = 0; i < vsi->num_queue_pairs; i++)
> -			if (vsi->xdp_rings[i]->xsk_umem)
> +			if (vsi->xdp_rings[i]->xsk_pool)
>   				(void)i40e_xsk_wakeup(vsi->netdev, i,
>   						      XDP_WAKEUP_RX);
>   
> @@ -12835,8 +12836,8 @@ static int i40e_xdp(struct net_device *dev,
>   	case XDP_QUERY_PROG:
>   		xdp->prog_id = vsi->xdp_prog ? vsi->xdp_prog->aux->id : 0;
>   		return 0;
> -	case XDP_SETUP_XSK_UMEM:
> -		return i40e_xsk_umem_setup(vsi, xdp->xsk.umem,
> +	case XDP_SETUP_XSK_POOL:
> +		return i40e_xsk_pool_setup(vsi, xdp->xsk.pool,
>   					   xdp->xsk.queue_id);
>   	default:
>   		return -EINVAL;
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index f9555c8..a50592b 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -636,7 +636,7 @@ void i40e_clean_tx_ring(struct i40e_ring *tx_ring)
>   	unsigned long bi_size;
>   	u16 i;
>   
> -	if (ring_is_xdp(tx_ring) && tx_ring->xsk_umem) {
> +	if (ring_is_xdp(tx_ring) && tx_ring->xsk_pool) {
>   		i40e_xsk_clean_tx_ring(tx_ring);
>   	} else {
>   		/* ring already cleared, nothing to do */
> @@ -1335,7 +1335,7 @@ void i40e_clean_rx_ring(struct i40e_ring *rx_ring)
>   		rx_ring->skb = NULL;
>   	}
>   
> -	if (rx_ring->xsk_umem) {
> +	if (rx_ring->xsk_pool) {
>   		i40e_xsk_clean_rx_ring(rx_ring);
>   		goto skip_free;
>   	}
> @@ -1369,7 +1369,7 @@ void i40e_clean_rx_ring(struct i40e_ring *rx_ring)
>   	}
>   
>   skip_free:
> -	if (rx_ring->xsk_umem)
> +	if (rx_ring->xsk_pool)
>   		i40e_clear_rx_bi_zc(rx_ring);
>   	else
>   		i40e_clear_rx_bi(rx_ring);
> @@ -2579,7 +2579,7 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
>   	 * budget and be more aggressive about cleaning up the Tx descriptors.
>   	 */
>   	i40e_for_each_ring(ring, q_vector->tx) {
> -		bool wd = ring->xsk_umem ?
> +		bool wd = ring->xsk_pool ?
>   			  i40e_clean_xdp_tx_irq(vsi, ring, budget) :
>   			  i40e_clean_tx_irq(vsi, ring, budget);
>   
> @@ -2601,7 +2601,7 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
>   	budget_per_ring = max(budget/q_vector->num_ringpairs, 1);
>   
>   	i40e_for_each_ring(ring, q_vector->rx) {
> -		int cleaned = ring->xsk_umem ?
> +		int cleaned = ring->xsk_pool ?
>   			      i40e_clean_rx_irq_zc(ring, budget_per_ring) :
>   			      i40e_clean_rx_irq(ring, budget_per_ring);
>   
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
> index 5c25597..88d43ed 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
> @@ -411,7 +411,7 @@ struct i40e_ring {
>   
>   	struct i40e_channel *ch;
>   	struct xdp_rxq_info xdp_rxq;
> -	struct xdp_umem *xsk_umem;
> +	struct xsk_buff_pool *xsk_pool;
>   } ____cacheline_internodealigned_in_smp;
>   
>   static inline bool ring_uses_build_skb(struct i40e_ring *ring)
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> index 7276580..d7ebdf6 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> @@ -29,14 +29,16 @@ static struct xdp_buff **i40e_rx_bi(struct i40e_ring *rx_ring, u32 idx)
>   }
>   
>   /**
> - * i40e_xsk_umem_enable - Enable/associate a UMEM to a certain ring/qid
> + * i40e_xsk_pool_enable - Enable/associate an AF_XDP buffer pool to a
> + * certain ring/qid
>    * @vsi: Current VSI
> - * @umem: UMEM
> - * @qid: Rx ring to associate UMEM to
> + * @pool: buffer pool
> + * @qid: Rx ring to associate buffer pool with
>    *
>    * Returns 0 on success, <0 on failure
>    **/
> -static int i40e_xsk_umem_enable(struct i40e_vsi *vsi, struct xdp_umem *umem,
> +static int i40e_xsk_pool_enable(struct i40e_vsi *vsi,
> +				struct xsk_buff_pool *pool,
>   				u16 qid)
>   {
>   	struct net_device *netdev = vsi->netdev;
> @@ -53,7 +55,8 @@ static int i40e_xsk_umem_enable(struct i40e_vsi *vsi, struct xdp_umem *umem,
>   	    qid >= netdev->real_num_tx_queues)
>   		return -EINVAL;
>   
> -	err = xsk_buff_dma_map(umem, &vsi->back->pdev->dev, I40E_RX_DMA_ATTR);
> +	err = xsk_buff_dma_map(pool->umem, &vsi->back->pdev->dev,
> +			       I40E_RX_DMA_ATTR);
>   	if (err)
>   		return err;
>   
> @@ -80,21 +83,22 @@ static int i40e_xsk_umem_enable(struct i40e_vsi *vsi, struct xdp_umem *umem,
>   }
>   
>   /**
> - * i40e_xsk_umem_disable - Disassociate a UMEM from a certain ring/qid
> + * i40e_xsk_pool_disable - Disassociate an AF_XDP buffer pool from a
> + * certain ring/qid
>    * @vsi: Current VSI
> - * @qid: Rx ring to associate UMEM to
> + * @qid: Rx ring to associate buffer pool with
>    *
>    * Returns 0 on success, <0 on failure
>    **/
> -static int i40e_xsk_umem_disable(struct i40e_vsi *vsi, u16 qid)
> +static int i40e_xsk_pool_disable(struct i40e_vsi *vsi, u16 qid)
>   {
>   	struct net_device *netdev = vsi->netdev;
> -	struct xdp_umem *umem;
> +	struct xsk_buff_pool *pool;
>   	bool if_running;
>   	int err;
>   
> -	umem = xdp_get_umem_from_qid(netdev, qid);
> -	if (!umem)
> +	pool = xdp_get_xsk_pool_from_qid(netdev, qid);
> +	if (!pool)
>   		return -EINVAL;
>   
>   	if_running = netif_running(vsi->netdev) && i40e_enabled_xdp_vsi(vsi);
> @@ -106,7 +110,7 @@ static int i40e_xsk_umem_disable(struct i40e_vsi *vsi, u16 qid)
>   	}
>   
>   	clear_bit(qid, vsi->af_xdp_zc_qps);
> -	xsk_buff_dma_unmap(umem, I40E_RX_DMA_ATTR);
> +	xsk_buff_dma_unmap(pool->umem, I40E_RX_DMA_ATTR);
>   
>   	if (if_running) {
>   		err = i40e_queue_pair_enable(vsi, qid);
> @@ -118,20 +122,21 @@ static int i40e_xsk_umem_disable(struct i40e_vsi *vsi, u16 qid)
>   }
>   
>   /**
> - * i40e_xsk_umem_setup - Enable/disassociate a UMEM to/from a ring/qid
> + * i40e_xsk_pool_setup - Enable/disassociate an AF_XDP buffer pool to/from
> + * a ring/qid
>    * @vsi: Current VSI
> - * @umem: UMEM to enable/associate to a ring, or NULL to disable
> - * @qid: Rx ring to (dis)associate UMEM (from)to
> + * @pool: Buffer pool to enable/associate to a ring, or NULL to disable
> + * @qid: Rx ring to (dis)associate buffer pool (from)to
>    *
> - * This function enables or disables a UMEM to a certain ring.
> + * This function enables or disables a buffer pool to a certain ring.
>    *
>    * Returns 0 on success, <0 on failure
>    **/
> -int i40e_xsk_umem_setup(struct i40e_vsi *vsi, struct xdp_umem *umem,
> +int i40e_xsk_pool_setup(struct i40e_vsi *vsi, struct xsk_buff_pool *pool,
>   			u16 qid)
>   {
> -	return umem ? i40e_xsk_umem_enable(vsi, umem, qid) :
> -		i40e_xsk_umem_disable(vsi, qid);
> +	return pool ? i40e_xsk_pool_enable(vsi, pool, qid) :
> +		i40e_xsk_pool_disable(vsi, qid);
>   }
>   
>   /**
> @@ -191,7 +196,7 @@ bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 count)
>   	rx_desc = I40E_RX_DESC(rx_ring, ntu);
>   	bi = i40e_rx_bi(rx_ring, ntu);
>   	do {
> -		xdp = xsk_buff_alloc(rx_ring->xsk_umem);
> +		xdp = xsk_buff_alloc(rx_ring->xsk_pool->umem);
>   		if (!xdp) {
>   			ok = false;
>   			goto no_buffers;
> @@ -358,11 +363,11 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
>   	i40e_finalize_xdp_rx(rx_ring, xdp_xmit);
>   	i40e_update_rx_stats(rx_ring, total_rx_bytes, total_rx_packets);
>   
> -	if (xsk_umem_uses_need_wakeup(rx_ring->xsk_umem)) {
> +	if (xsk_umem_uses_need_wakeup(rx_ring->xsk_pool->umem)) {
>   		if (failure || rx_ring->next_to_clean == rx_ring->next_to_use)
> -			xsk_set_rx_need_wakeup(rx_ring->xsk_umem);
> +			xsk_set_rx_need_wakeup(rx_ring->xsk_pool->umem);
>   		else
> -			xsk_clear_rx_need_wakeup(rx_ring->xsk_umem);
> +			xsk_clear_rx_need_wakeup(rx_ring->xsk_pool->umem);
>   
>   		return (int)total_rx_packets;
>   	}
> @@ -391,11 +396,12 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
>   			break;
>   		}
>   
> -		if (!xsk_umem_consume_tx(xdp_ring->xsk_umem, &desc))
> +		if (!xsk_umem_consume_tx(xdp_ring->xsk_pool->umem, &desc))
>   			break;
>   
> -		dma = xsk_buff_raw_get_dma(xdp_ring->xsk_umem, desc.addr);
> -		xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_umem, dma,
> +		dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool->umem,
> +					   desc.addr);
> +		xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool->umem, dma,
>   						 desc.len);
>   
>   		tx_bi = &xdp_ring->tx_bi[xdp_ring->next_to_use];
> @@ -419,7 +425,7 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
>   						 I40E_TXD_QW1_CMD_SHIFT);
>   		i40e_xdp_ring_update_tail(xdp_ring);
>   
> -		xsk_umem_consume_tx_done(xdp_ring->xsk_umem);
> +		xsk_umem_consume_tx_done(xdp_ring->xsk_pool->umem);
>   	}
>   
>   	return !!budget && work_done;
> @@ -452,7 +458,7 @@ bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi,
>   {
>   	unsigned int ntc, total_bytes = 0, budget = vsi->work_limit;
>   	u32 i, completed_frames, frames_ready, xsk_frames = 0;
> -	struct xdp_umem *umem = tx_ring->xsk_umem;
> +	struct xsk_buff_pool *bp = tx_ring->xsk_pool;
>   	u32 head_idx = i40e_get_head(tx_ring);
>   	bool work_done = true, xmit_done;
>   	struct i40e_tx_buffer *tx_bi;
> @@ -492,14 +498,14 @@ bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi,
>   		tx_ring->next_to_clean -= tx_ring->count;
>   
>   	if (xsk_frames)
> -		xsk_umem_complete_tx(umem, xsk_frames);
> +		xsk_umem_complete_tx(bp->umem, xsk_frames);
>   
>   	i40e_arm_wb(tx_ring, vsi, budget);
>   	i40e_update_tx_stats(tx_ring, completed_frames, total_bytes);
>   
>   out_xmit:
> -	if (xsk_umem_uses_need_wakeup(tx_ring->xsk_umem))
> -		xsk_set_tx_need_wakeup(tx_ring->xsk_umem);
> +	if (xsk_umem_uses_need_wakeup(tx_ring->xsk_pool->umem))
> +		xsk_set_tx_need_wakeup(tx_ring->xsk_pool->umem);
>   
>   	xmit_done = i40e_xmit_zc(tx_ring, budget);
>   
> @@ -533,7 +539,7 @@ int i40e_xsk_wakeup(struct net_device *dev, u32 queue_id, u32 flags)
>   	if (queue_id >= vsi->num_queue_pairs)
>   		return -ENXIO;
>   
> -	if (!vsi->xdp_rings[queue_id]->xsk_umem)
> +	if (!vsi->xdp_rings[queue_id]->xsk_pool)
>   		return -ENXIO;
>   
>   	ring = vsi->xdp_rings[queue_id];
> @@ -572,7 +578,7 @@ void i40e_xsk_clean_rx_ring(struct i40e_ring *rx_ring)
>   void i40e_xsk_clean_tx_ring(struct i40e_ring *tx_ring)
>   {
>   	u16 ntc = tx_ring->next_to_clean, ntu = tx_ring->next_to_use;
> -	struct xdp_umem *umem = tx_ring->xsk_umem;
> +	struct xsk_buff_pool *bp = tx_ring->xsk_pool;
>   	struct i40e_tx_buffer *tx_bi;
>   	u32 xsk_frames = 0;
>   
> @@ -592,14 +598,15 @@ void i40e_xsk_clean_tx_ring(struct i40e_ring *tx_ring)
>   	}
>   
>   	if (xsk_frames)
> -		xsk_umem_complete_tx(umem, xsk_frames);
> +		xsk_umem_complete_tx(bp->umem, xsk_frames);
>   }
>   
>   /**
> - * i40e_xsk_any_rx_ring_enabled - Checks if Rx rings have AF_XDP UMEM attached
> + * i40e_xsk_any_rx_ring_enabled - Checks if Rx rings have an AF_XDP
> + * buffer pool attached
>    * @vsi: vsi
>    *
> - * Returns true if any of the Rx rings has an AF_XDP UMEM attached
> + * Returns true if any of the Rx rings has an AF_XDP buffer pool attached
>    **/
>   bool i40e_xsk_any_rx_ring_enabled(struct i40e_vsi *vsi)
>   {
> @@ -607,7 +614,7 @@ bool i40e_xsk_any_rx_ring_enabled(struct i40e_vsi *vsi)
>   	int i;
>   
>   	for (i = 0; i < vsi->num_queue_pairs; i++) {
> -		if (xdp_get_umem_from_qid(netdev, i))
> +		if (xdp_get_xsk_pool_from_qid(netdev, i))
>   			return true;
>   	}
>   
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.h b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
> index ea919a7d..a5ad927 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
> @@ -5,12 +5,12 @@
>   #define _I40E_XSK_H_
>   
>   struct i40e_vsi;
> -struct xdp_umem;
> +struct xsk_buff_pool;
>   struct zero_copy_allocator;
>   
>   int i40e_queue_pair_disable(struct i40e_vsi *vsi, int queue_pair);
>   int i40e_queue_pair_enable(struct i40e_vsi *vsi, int queue_pair);
> -int i40e_xsk_umem_setup(struct i40e_vsi *vsi, struct xdp_umem *umem,
> +int i40e_xsk_pool_setup(struct i40e_vsi *vsi, struct xsk_buff_pool *pool,
>   			u16 qid);
>   bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 cleaned_count);
>   int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget);
> diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
> index 5792ee6..9eff7e8 100644
> --- a/drivers/net/ethernet/intel/ice/ice.h
> +++ b/drivers/net/ethernet/intel/ice/ice.h
> @@ -318,9 +318,9 @@ struct ice_vsi {
>   	struct ice_ring **xdp_rings;	 /* XDP ring array */
>   	u16 num_xdp_txq;		 /* Used XDP queues */
>   	u8 xdp_mapping_mode;		 /* ICE_MAP_MODE_[CONTIG|SCATTER] */
> -	struct xdp_umem **xsk_umems;
> -	u16 num_xsk_umems_used;
> -	u16 num_xsk_umems;
> +	struct xsk_buff_pool **xsk_pools;
> +	u16 num_xsk_pools_used;
> +	u16 num_xsk_pools;
>   } ____cacheline_internodealigned_in_smp;
>   
>   /* struct that defines an interrupt vector */
> @@ -489,25 +489,25 @@ static inline void ice_set_ring_xdp(struct ice_ring *ring)
>   }
>   
>   /**
> - * ice_xsk_umem - get XDP UMEM bound to a ring
> + * ice_xsk_pool - get XSK buffer pool bound to a ring
>    * @ring - ring to use
>    *
> - * Returns a pointer to xdp_umem structure if there is an UMEM present,
> + * Returns a pointer to xdp_umem structure if there is a buffer pool present,
>    * NULL otherwise.
>    */
> -static inline struct xdp_umem *ice_xsk_umem(struct ice_ring *ring)
> +static inline struct xsk_buff_pool *ice_xsk_pool(struct ice_ring *ring)
>   {
> -	struct xdp_umem **umems = ring->vsi->xsk_umems;
> +	struct xsk_buff_pool **pools = ring->vsi->xsk_pools;
>   	u16 qid = ring->q_index;
>   
>   	if (ice_ring_is_xdp(ring))
>   		qid -= ring->vsi->num_xdp_txq;
>   
> -	if (qid >= ring->vsi->num_xsk_umems || !umems || !umems[qid] ||
> +	if (qid >= ring->vsi->num_xsk_pools || !pools || !pools[qid] ||
>   	    !ice_is_xdp_ena_vsi(ring->vsi))
>   		return NULL;
>   
> -	return umems[qid];
> +	return pools[qid];
>   }
>   
>   /**
> diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
> index d620d26..94dbf89 100644
> --- a/drivers/net/ethernet/intel/ice/ice_base.c
> +++ b/drivers/net/ethernet/intel/ice/ice_base.c
> @@ -308,12 +308,12 @@ int ice_setup_rx_ctx(struct ice_ring *ring)
>   			xdp_rxq_info_reg(&ring->xdp_rxq, ring->netdev,
>   					 ring->q_index);
>   
> -		ring->xsk_umem = ice_xsk_umem(ring);
> -		if (ring->xsk_umem) {
> +		ring->xsk_pool = ice_xsk_pool(ring);
> +		if (ring->xsk_pool) {
>   			xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq);
>   
>   			ring->rx_buf_len =
> -				xsk_umem_get_rx_frame_size(ring->xsk_umem);
> +				xsk_umem_get_rx_frame_size(ring->xsk_pool->umem);
>   			/* For AF_XDP ZC, we disallow packets to span on
>   			 * multiple buffers, thus letting us skip that
>   			 * handling in the fast-path.
> @@ -324,7 +324,7 @@ int ice_setup_rx_ctx(struct ice_ring *ring)
>   							 NULL);
>   			if (err)
>   				return err;
> -			xsk_buff_set_rxq_info(ring->xsk_umem, &ring->xdp_rxq);
> +			xsk_buff_set_rxq_info(ring->xsk_pool->umem, &ring->xdp_rxq);
>   
>   			dev_info(dev, "Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring %d\n",
>   				 ring->q_index);
> @@ -417,9 +417,9 @@ int ice_setup_rx_ctx(struct ice_ring *ring)
>   	ring->tail = hw->hw_addr + QRX_TAIL(pf_q);
>   	writel(0, ring->tail);
>   
> -	if (ring->xsk_umem) {
> -		if (!xsk_buff_can_alloc(ring->xsk_umem, num_bufs)) {
> -			dev_warn(dev, "UMEM does not provide enough addresses to fill %d buffers on Rx ring %d\n",
> +	if (ring->xsk_pool) {
> +		if (!xsk_buff_can_alloc(ring->xsk_pool->umem, num_bufs)) {
> +			dev_warn(dev, "XSK buffer pool does not provide enough addresses to fill %d buffers on Rx ring %d\n",
>   				 num_bufs, ring->q_index);
>   			dev_warn(dev, "Change Rx ring/fill queue size to avoid performance issues\n");
>   
> @@ -428,7 +428,7 @@ int ice_setup_rx_ctx(struct ice_ring *ring)
>   
>   		err = ice_alloc_rx_bufs_zc(ring, num_bufs);
>   		if (err)
> -			dev_info(dev, "Failed to allocate some buffers on UMEM enabled Rx ring %d (pf_q %d)\n",
> +			dev_info(dev, "Failed to allocate some buffers on XSK buffer pool enabled Rx ring %d (pf_q %d)\n",
>   				 ring->q_index, pf_q);
>   		return 0;
>   	}
> diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
> index 28b46cc..e87e25a 100644
> --- a/drivers/net/ethernet/intel/ice/ice_lib.c
> +++ b/drivers/net/ethernet/intel/ice/ice_lib.c
> @@ -1713,7 +1713,7 @@ int ice_vsi_cfg_xdp_txqs(struct ice_vsi *vsi)
>   		return ret;
>   
>   	for (i = 0; i < vsi->num_xdp_txq; i++)
> -		vsi->xdp_rings[i]->xsk_umem = ice_xsk_umem(vsi->xdp_rings[i]);
> +		vsi->xdp_rings[i]->xsk_pool = ice_xsk_pool(vsi->xdp_rings[i]);
>   
>   	return ret;
>   }
> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> index 082825e..b354abaf 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> @@ -1706,7 +1706,7 @@ static int ice_xdp_alloc_setup_rings(struct ice_vsi *vsi)
>   		if (ice_setup_tx_ring(xdp_ring))
>   			goto free_xdp_rings;
>   		ice_set_ring_xdp(xdp_ring);
> -		xdp_ring->xsk_umem = ice_xsk_umem(xdp_ring);
> +		xdp_ring->xsk_pool = ice_xsk_pool(xdp_ring);
>   	}
>   
>   	return 0;
> @@ -1950,13 +1950,13 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct bpf_prog *prog,
>   	if (if_running)
>   		ret = ice_up(vsi);
>   
> -	if (!ret && prog && vsi->xsk_umems) {
> +	if (!ret && prog && vsi->xsk_pools) {
>   		int i;
>   
>   		ice_for_each_rxq(vsi, i) {
>   			struct ice_ring *rx_ring = vsi->rx_rings[i];
>   
> -			if (rx_ring->xsk_umem)
> +			if (rx_ring->xsk_pool)
>   				napi_schedule(&rx_ring->q_vector->napi);
>   		}
>   	}
> @@ -1985,8 +1985,8 @@ static int ice_xdp(struct net_device *dev, struct netdev_bpf *xdp)
>   	case XDP_QUERY_PROG:
>   		xdp->prog_id = vsi->xdp_prog ? vsi->xdp_prog->aux->id : 0;
>   		return 0;
> -	case XDP_SETUP_XSK_UMEM:
> -		return ice_xsk_umem_setup(vsi, xdp->xsk.umem,
> +	case XDP_SETUP_XSK_POOL:
> +		return ice_xsk_pool_setup(vsi, xdp->xsk.pool,
>   					  xdp->xsk.queue_id);
>   	default:
>   		return -EINVAL;
> diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
> index abdb137c..241c1ea 100644
> --- a/drivers/net/ethernet/intel/ice/ice_txrx.c
> +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
> @@ -145,7 +145,7 @@ void ice_clean_tx_ring(struct ice_ring *tx_ring)
>   {
>   	u16 i;
>   
> -	if (ice_ring_is_xdp(tx_ring) && tx_ring->xsk_umem) {
> +	if (ice_ring_is_xdp(tx_ring) && tx_ring->xsk_pool) {
>   		ice_xsk_clean_xdp_ring(tx_ring);
>   		goto tx_skip_free;
>   	}
> @@ -375,7 +375,7 @@ void ice_clean_rx_ring(struct ice_ring *rx_ring)
>   	if (!rx_ring->rx_buf)
>   		return;
>   
> -	if (rx_ring->xsk_umem) {
> +	if (rx_ring->xsk_pool) {
>   		ice_xsk_clean_rx_ring(rx_ring);
>   		goto rx_skip_free;
>   	}
> @@ -1619,7 +1619,7 @@ int ice_napi_poll(struct napi_struct *napi, int budget)
>   	 * budget and be more aggressive about cleaning up the Tx descriptors.
>   	 */
>   	ice_for_each_ring(ring, q_vector->tx) {
> -		bool wd = ring->xsk_umem ?
> +		bool wd = ring->xsk_pool ?
>   			  ice_clean_tx_irq_zc(ring, budget) :
>   			  ice_clean_tx_irq(ring, budget);
>   
> @@ -1649,7 +1649,7 @@ int ice_napi_poll(struct napi_struct *napi, int budget)
>   		 * comparison in the irq context instead of many inside the
>   		 * ice_clean_rx_irq function and makes the codebase cleaner.
>   		 */
> -		cleaned = ring->xsk_umem ?
> +		cleaned = ring->xsk_pool ?
>   			  ice_clean_rx_irq_zc(ring, budget_per_ring) :
>   			  ice_clean_rx_irq(ring, budget_per_ring);
>   		work_done += cleaned;
> diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h
> index e70c461..3b37360 100644
> --- a/drivers/net/ethernet/intel/ice/ice_txrx.h
> +++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
> @@ -295,7 +295,7 @@ struct ice_ring {
>   
>   	struct rcu_head rcu;		/* to avoid race on free */
>   	struct bpf_prog *xdp_prog;
> -	struct xdp_umem *xsk_umem;
> +	struct xsk_buff_pool *xsk_pool;
>   	/* CL3 - 3rd cacheline starts here */
>   	struct xdp_rxq_info xdp_rxq;
>   	/* CLX - the below items are only accessed infrequently and should be
> diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
> index b6f928c..f0ce669 100644
> --- a/drivers/net/ethernet/intel/ice/ice_xsk.c
> +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
> @@ -234,7 +234,7 @@ static int ice_qp_ena(struct ice_vsi *vsi, u16 q_idx)
>   		if (err)
>   			goto free_buf;
>   		ice_set_ring_xdp(xdp_ring);
> -		xdp_ring->xsk_umem = ice_xsk_umem(xdp_ring);
> +		xdp_ring->xsk_pool = ice_xsk_pool(xdp_ring);
>   	}
>   
>   	err = ice_setup_rx_ctx(rx_ring);
> @@ -258,21 +258,21 @@ static int ice_qp_ena(struct ice_vsi *vsi, u16 q_idx)
>   }
>   
>   /**
> - * ice_xsk_alloc_umems - allocate a UMEM region for an XDP socket
> - * @vsi: VSI to allocate the UMEM on
> + * ice_xsk_alloc_pools - allocate a buffer pool for an XDP socket
> + * @vsi: VSI to allocate the buffer pool on
>    *
>    * Returns 0 on success, negative on error
>    */
> -static int ice_xsk_alloc_umems(struct ice_vsi *vsi)
> +static int ice_xsk_alloc_pools(struct ice_vsi *vsi)
>   {
> -	if (vsi->xsk_umems)
> +	if (vsi->xsk_pools)
>   		return 0;
>   
> -	vsi->xsk_umems = kcalloc(vsi->num_xsk_umems, sizeof(*vsi->xsk_umems),
> +	vsi->xsk_pools = kcalloc(vsi->num_xsk_pools, sizeof(*vsi->xsk_pools),
>   				 GFP_KERNEL);
>   
> -	if (!vsi->xsk_umems) {
> -		vsi->num_xsk_umems = 0;
> +	if (!vsi->xsk_pools) {
> +		vsi->num_xsk_pools = 0;
>   		return -ENOMEM;
>   	}
>   
> @@ -280,74 +280,74 @@ static int ice_xsk_alloc_umems(struct ice_vsi *vsi)
>   }
>   
>   /**
> - * ice_xsk_remove_umem - Remove an UMEM for a certain ring/qid
> + * ice_xsk_remove_pool - Remove a buffer pool for a certain ring/qid
>    * @vsi: VSI from which the VSI will be removed
> - * @qid: Ring/qid associated with the UMEM
> + * @qid: Ring/qid associated with the buffer pool
>    */
> -static void ice_xsk_remove_umem(struct ice_vsi *vsi, u16 qid)
> +static void ice_xsk_remove_pool(struct ice_vsi *vsi, u16 qid)
>   {
> -	vsi->xsk_umems[qid] = NULL;
> -	vsi->num_xsk_umems_used--;
> +	vsi->xsk_pools[qid] = NULL;
> +	vsi->num_xsk_pools_used--;
>   
> -	if (vsi->num_xsk_umems_used == 0) {
> -		kfree(vsi->xsk_umems);
> -		vsi->xsk_umems = NULL;
> -		vsi->num_xsk_umems = 0;
> +	if (vsi->num_xsk_pools_used == 0) {
> +		kfree(vsi->xsk_pools);
> +		vsi->xsk_pools = NULL;
> +		vsi->num_xsk_pools = 0;
>   	}
>   }
>   
>   
>   /**
> - * ice_xsk_umem_disable - disable a UMEM region
> + * ice_xsk_pool_disable - disable a buffer pool region
>    * @vsi: Current VSI
>    * @qid: queue ID
>    *
>    * Returns 0 on success, negative on failure
>    */
> -static int ice_xsk_umem_disable(struct ice_vsi *vsi, u16 qid)
> +static int ice_xsk_pool_disable(struct ice_vsi *vsi, u16 qid)
>   {
> -	if (!vsi->xsk_umems || qid >= vsi->num_xsk_umems ||
> -	    !vsi->xsk_umems[qid])
> +	if (!vsi->xsk_pools || qid >= vsi->num_xsk_pools ||
> +	    !vsi->xsk_pools[qid])
>   		return -EINVAL;
>   
> -	xsk_buff_dma_unmap(vsi->xsk_umems[qid], ICE_RX_DMA_ATTR);
> -	ice_xsk_remove_umem(vsi, qid);
> +	xsk_buff_dma_unmap(vsi->xsk_pools[qid]->umem, ICE_RX_DMA_ATTR);
> +	ice_xsk_remove_pool(vsi, qid);
>   
>   	return 0;
>   }
>   
>   /**
> - * ice_xsk_umem_enable - enable a UMEM region
> + * ice_xsk_pool_enable - enable a buffer pool region
>    * @vsi: Current VSI
> - * @umem: pointer to a requested UMEM region
> + * @pool: pointer to a requested buffer pool region
>    * @qid: queue ID
>    *
>    * Returns 0 on success, negative on failure
>    */
>   static int
> -ice_xsk_umem_enable(struct ice_vsi *vsi, struct xdp_umem *umem, u16 qid)
> +ice_xsk_pool_enable(struct ice_vsi *vsi, struct xsk_buff_pool *pool, u16 qid)
>   {
>   	int err;
>   
>   	if (vsi->type != ICE_VSI_PF)
>   		return -EINVAL;
>   
> -	if (!vsi->num_xsk_umems)
> -		vsi->num_xsk_umems = min_t(u16, vsi->num_rxq, vsi->num_txq);
> -	if (qid >= vsi->num_xsk_umems)
> +	if (!vsi->num_xsk_pools)
> +		vsi->num_xsk_pools = min_t(u16, vsi->num_rxq, vsi->num_txq);
> +	if (qid >= vsi->num_xsk_pools)
>   		return -EINVAL;
>   
> -	err = ice_xsk_alloc_umems(vsi);
> +	err = ice_xsk_alloc_pools(vsi);
>   	if (err)
>   		return err;
>   
> -	if (vsi->xsk_umems && vsi->xsk_umems[qid])
> +	if (vsi->xsk_pools && vsi->xsk_pools[qid])
>   		return -EBUSY;
>   
> -	vsi->xsk_umems[qid] = umem;
> -	vsi->num_xsk_umems_used++;
> +	vsi->xsk_pools[qid] = pool;
> +	vsi->num_xsk_pools_used++;
>   
> -	err = xsk_buff_dma_map(vsi->xsk_umems[qid], ice_pf_to_dev(vsi->back),
> +	err = xsk_buff_dma_map(vsi->xsk_pools[qid]->umem, ice_pf_to_dev(vsi->back),
>   			       ICE_RX_DMA_ATTR);
>   	if (err)
>   		return err;
> @@ -356,17 +356,17 @@ ice_xsk_umem_enable(struct ice_vsi *vsi, struct xdp_umem *umem, u16 qid)
>   }
>   
>   /**
> - * ice_xsk_umem_setup - enable/disable a UMEM region depending on its state
> + * ice_xsk_pool_setup - enable/disable a buffer pool region depending on its state
>    * @vsi: Current VSI
> - * @umem: UMEM to enable/associate to a ring, NULL to disable
> + * @pool: buffer pool to enable/associate to a ring, NULL to disable
>    * @qid: queue ID
>    *
>    * Returns 0 on success, negative on failure
>    */
> -int ice_xsk_umem_setup(struct ice_vsi *vsi, struct xdp_umem *umem, u16 qid)
> +int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool, u16 qid)
>   {
> -	bool if_running, umem_present = !!umem;
> -	int ret = 0, umem_failure = 0;
> +	bool if_running, pool_present = !!pool;
> +	int ret = 0, pool_failure = 0;
>   
>   	if_running = netif_running(vsi->netdev) && ice_is_xdp_ena_vsi(vsi);
>   
> @@ -374,26 +374,26 @@ int ice_xsk_umem_setup(struct ice_vsi *vsi, struct xdp_umem *umem, u16 qid)
>   		ret = ice_qp_dis(vsi, qid);
>   		if (ret) {
>   			netdev_err(vsi->netdev, "ice_qp_dis error = %d\n", ret);
> -			goto xsk_umem_if_up;
> +			goto xsk_pool_if_up;
>   		}
>   	}
>   
> -	umem_failure = umem_present ? ice_xsk_umem_enable(vsi, umem, qid) :
> -				      ice_xsk_umem_disable(vsi, qid);
> +	pool_failure = pool_present ? ice_xsk_pool_enable(vsi, pool, qid) :
> +				      ice_xsk_pool_disable(vsi, qid);
>   
> -xsk_umem_if_up:
> +xsk_pool_if_up:
>   	if (if_running) {
>   		ret = ice_qp_ena(vsi, qid);
> -		if (!ret && umem_present)
> +		if (!ret && pool_present)
>   			napi_schedule(&vsi->xdp_rings[qid]->q_vector->napi);
>   		else if (ret)
>   			netdev_err(vsi->netdev, "ice_qp_ena error = %d\n", ret);
>   	}
>   
> -	if (umem_failure) {
> -		netdev_err(vsi->netdev, "Could not %sable UMEM, error = %d\n",
> -			   umem_present ? "en" : "dis", umem_failure);
> -		return umem_failure;
> +	if (pool_failure) {
> +		netdev_err(vsi->netdev, "Could not %sable buffer pool, error = %d\n",
> +			   pool_present ? "en" : "dis", pool_failure);
> +		return pool_failure;
>   	}
>   
>   	return ret;
> @@ -424,7 +424,7 @@ bool ice_alloc_rx_bufs_zc(struct ice_ring *rx_ring, u16 count)
>   	rx_buf = &rx_ring->rx_buf[ntu];
>   
>   	do {
> -		rx_buf->xdp = xsk_buff_alloc(rx_ring->xsk_umem);
> +		rx_buf->xdp = xsk_buff_alloc(rx_ring->xsk_pool->umem);
>   		if (!rx_buf->xdp) {
>   			ret = true;
>   			break;
> @@ -645,11 +645,11 @@ int ice_clean_rx_irq_zc(struct ice_ring *rx_ring, int budget)
>   	ice_finalize_xdp_rx(rx_ring, xdp_xmit);
>   	ice_update_rx_ring_stats(rx_ring, total_rx_packets, total_rx_bytes);
>   
> -	if (xsk_umem_uses_need_wakeup(rx_ring->xsk_umem)) {
> +	if (xsk_umem_uses_need_wakeup(rx_ring->xsk_pool->umem)) {
>   		if (failure || rx_ring->next_to_clean == rx_ring->next_to_use)
> -			xsk_set_rx_need_wakeup(rx_ring->xsk_umem);
> +			xsk_set_rx_need_wakeup(rx_ring->xsk_pool->umem);
>   		else
> -			xsk_clear_rx_need_wakeup(rx_ring->xsk_umem);
> +			xsk_clear_rx_need_wakeup(rx_ring->xsk_pool->umem);
>   
>   		return (int)total_rx_packets;
>   	}
> @@ -682,11 +682,11 @@ static bool ice_xmit_zc(struct ice_ring *xdp_ring, int budget)
>   
>   		tx_buf = &xdp_ring->tx_buf[xdp_ring->next_to_use];
>   
> -		if (!xsk_umem_consume_tx(xdp_ring->xsk_umem, &desc))
> +		if (!xsk_umem_consume_tx(xdp_ring->xsk_pool->umem, &desc))
>   			break;
>   
> -		dma = xsk_buff_raw_get_dma(xdp_ring->xsk_umem, desc.addr);
> -		xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_umem, dma,
> +		dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool->umem, desc.addr);
> +		xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool->umem, dma,
>   						 desc.len);
>   
>   		tx_buf->bytecount = desc.len;
> @@ -703,9 +703,9 @@ static bool ice_xmit_zc(struct ice_ring *xdp_ring, int budget)
>   
>   	if (tx_desc) {
>   		ice_xdp_ring_update_tail(xdp_ring);
> -		xsk_umem_consume_tx_done(xdp_ring->xsk_umem);
> -		if (xsk_umem_uses_need_wakeup(xdp_ring->xsk_umem))
> -			xsk_clear_tx_need_wakeup(xdp_ring->xsk_umem);
> +		xsk_umem_consume_tx_done(xdp_ring->xsk_pool->umem);
> +		if (xsk_umem_uses_need_wakeup(xdp_ring->xsk_pool->umem))
> +			xsk_clear_tx_need_wakeup(xdp_ring->xsk_pool->umem);
>   	}
>   
>   	return budget > 0 && work_done;
> @@ -779,13 +779,13 @@ bool ice_clean_tx_irq_zc(struct ice_ring *xdp_ring, int budget)
>   	xdp_ring->next_to_clean = ntc;
>   
>   	if (xsk_frames)
> -		xsk_umem_complete_tx(xdp_ring->xsk_umem, xsk_frames);
> +		xsk_umem_complete_tx(xdp_ring->xsk_pool->umem, xsk_frames);
>   
> -	if (xsk_umem_uses_need_wakeup(xdp_ring->xsk_umem)) {
> +	if (xsk_umem_uses_need_wakeup(xdp_ring->xsk_pool->umem)) {
>   		if (xdp_ring->next_to_clean == xdp_ring->next_to_use)
> -			xsk_set_tx_need_wakeup(xdp_ring->xsk_umem);
> +			xsk_set_tx_need_wakeup(xdp_ring->xsk_pool->umem);
>   		else
> -			xsk_clear_tx_need_wakeup(xdp_ring->xsk_umem);
> +			xsk_clear_tx_need_wakeup(xdp_ring->xsk_pool->umem);
>   	}
>   
>   	ice_update_tx_ring_stats(xdp_ring, total_packets, total_bytes);
> @@ -820,7 +820,7 @@ ice_xsk_wakeup(struct net_device *netdev, u32 queue_id,
>   	if (queue_id >= vsi->num_txq)
>   		return -ENXIO;
>   
> -	if (!vsi->xdp_rings[queue_id]->xsk_umem)
> +	if (!vsi->xdp_rings[queue_id]->xsk_pool)
>   		return -ENXIO;
>   
>   	ring = vsi->xdp_rings[queue_id];
> @@ -839,20 +839,20 @@ ice_xsk_wakeup(struct net_device *netdev, u32 queue_id,
>   }
>   
>   /**
> - * ice_xsk_any_rx_ring_ena - Checks if Rx rings have AF_XDP UMEM attached
> + * ice_xsk_any_rx_ring_ena - Checks if Rx rings have an AF_XDP buff pool attached
>    * @vsi: VSI to be checked
>    *
> - * Returns true if any of the Rx rings has an AF_XDP UMEM attached
> + * Returns true if any of the Rx rings has an AF_XDP buff pool attached
>    */
>   bool ice_xsk_any_rx_ring_ena(struct ice_vsi *vsi)
>   {
>   	int i;
>   
> -	if (!vsi->xsk_umems)
> +	if (!vsi->xsk_pools)
>   		return false;
>   
> -	for (i = 0; i < vsi->num_xsk_umems; i++) {
> -		if (vsi->xsk_umems[i])
> +	for (i = 0; i < vsi->num_xsk_pools; i++) {
> +		if (vsi->xsk_pools[i])
>   			return true;
>   	}
>   
> @@ -860,7 +860,7 @@ bool ice_xsk_any_rx_ring_ena(struct ice_vsi *vsi)
>   }
>   
>   /**
> - * ice_xsk_clean_rx_ring - clean UMEM queues connected to a given Rx ring
> + * ice_xsk_clean_rx_ring - clean buffer pool queues connected to a given Rx ring
>    * @rx_ring: ring to be cleaned
>    */
>   void ice_xsk_clean_rx_ring(struct ice_ring *rx_ring)
> @@ -878,7 +878,7 @@ void ice_xsk_clean_rx_ring(struct ice_ring *rx_ring)
>   }
>   
>   /**
> - * ice_xsk_clean_xdp_ring - Clean the XDP Tx ring and its UMEM queues
> + * ice_xsk_clean_xdp_ring - Clean the XDP Tx ring and its buffer pool queues
>    * @xdp_ring: XDP_Tx ring
>    */
>   void ice_xsk_clean_xdp_ring(struct ice_ring *xdp_ring)
> @@ -902,5 +902,5 @@ void ice_xsk_clean_xdp_ring(struct ice_ring *xdp_ring)
>   	}
>   
>   	if (xsk_frames)
> -		xsk_umem_complete_tx(xdp_ring->xsk_umem, xsk_frames);
> +		xsk_umem_complete_tx(xdp_ring->xsk_pool->umem, xsk_frames);
>   }
> diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.h b/drivers/net/ethernet/intel/ice/ice_xsk.h
> index fc1a06b..fad7836 100644
> --- a/drivers/net/ethernet/intel/ice/ice_xsk.h
> +++ b/drivers/net/ethernet/intel/ice/ice_xsk.h
> @@ -9,7 +9,8 @@
>   struct ice_vsi;
>   
>   #ifdef CONFIG_XDP_SOCKETS
> -int ice_xsk_umem_setup(struct ice_vsi *vsi, struct xdp_umem *umem, u16 qid);
> +int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool,
> +		       u16 qid);
>   int ice_clean_rx_irq_zc(struct ice_ring *rx_ring, int budget);
>   bool ice_clean_tx_irq_zc(struct ice_ring *xdp_ring, int budget);
>   int ice_xsk_wakeup(struct net_device *netdev, u32 queue_id, u32 flags);
> @@ -19,8 +20,8 @@ void ice_xsk_clean_rx_ring(struct ice_ring *rx_ring);
>   void ice_xsk_clean_xdp_ring(struct ice_ring *xdp_ring);
>   #else
>   static inline int
> -ice_xsk_umem_setup(struct ice_vsi __always_unused *vsi,
> -		   struct xdp_umem __always_unused *umem,
> +ice_xsk_pool_setup(struct ice_vsi __always_unused *vsi,
> +		   struct xsk_buff_pool __always_unused *pool,
>   		   u16 __always_unused qid)
>   {
>   	return -EOPNOTSUPP;
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> index 5ddfc83..bd0f65e 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> @@ -350,7 +350,7 @@ struct ixgbe_ring {
>   		struct ixgbe_rx_queue_stats rx_stats;
>   	};
>   	struct xdp_rxq_info xdp_rxq;
> -	struct xdp_umem *xsk_umem;
> +	struct xsk_buff_pool *xsk_pool;
>   	u16 ring_idx;		/* {rx,tx,xdp}_ring back reference idx */
>   	u16 rx_buf_len;
>   } ____cacheline_internodealigned_in_smp;
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index f162b8b..3217000 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -3158,7 +3158,7 @@ int ixgbe_poll(struct napi_struct *napi, int budget)
>   #endif
>   
>   	ixgbe_for_each_ring(ring, q_vector->tx) {
> -		bool wd = ring->xsk_umem ?
> +		bool wd = ring->xsk_pool ?
>   			  ixgbe_clean_xdp_tx_irq(q_vector, ring, budget) :
>   			  ixgbe_clean_tx_irq(q_vector, ring, budget);
>   
> @@ -3178,7 +3178,7 @@ int ixgbe_poll(struct napi_struct *napi, int budget)
>   		per_ring_budget = budget;
>   
>   	ixgbe_for_each_ring(ring, q_vector->rx) {
> -		int cleaned = ring->xsk_umem ?
> +		int cleaned = ring->xsk_pool ?
>   			      ixgbe_clean_rx_irq_zc(q_vector, ring,
>   						    per_ring_budget) :
>   			      ixgbe_clean_rx_irq(q_vector, ring,
> @@ -3473,9 +3473,9 @@ void ixgbe_configure_tx_ring(struct ixgbe_adapter *adapter,
>   	u32 txdctl = IXGBE_TXDCTL_ENABLE;
>   	u8 reg_idx = ring->reg_idx;
>   
> -	ring->xsk_umem = NULL;
> +	ring->xsk_pool = NULL;
>   	if (ring_is_xdp(ring))
> -		ring->xsk_umem = ixgbe_xsk_umem(adapter, ring);
> +		ring->xsk_pool = ixgbe_xsk_pool(adapter, ring);
>   
>   	/* disable queue to avoid issues while updating state */
>   	IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(reg_idx), 0);
> @@ -3715,8 +3715,8 @@ static void ixgbe_configure_srrctl(struct ixgbe_adapter *adapter,
>   	srrctl = IXGBE_RX_HDR_SIZE << IXGBE_SRRCTL_BSIZEHDRSIZE_SHIFT;
>   
>   	/* configure the packet buffer length */
> -	if (rx_ring->xsk_umem) {
> -		u32 xsk_buf_len = xsk_umem_get_rx_frame_size(rx_ring->xsk_umem);
> +	if (rx_ring->xsk_pool) {
> +		u32 xsk_buf_len = xsk_umem_get_rx_frame_size(rx_ring->xsk_pool->umem);
>   
>   		/* If the MAC support setting RXDCTL.RLPML, the
>   		 * SRRCTL[n].BSIZEPKT is set to PAGE_SIZE and
> @@ -4061,12 +4061,12 @@ void ixgbe_configure_rx_ring(struct ixgbe_adapter *adapter,
>   	u8 reg_idx = ring->reg_idx;
>   
>   	xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq);
> -	ring->xsk_umem = ixgbe_xsk_umem(adapter, ring);
> -	if (ring->xsk_umem) {
> +	ring->xsk_pool = ixgbe_xsk_pool(adapter, ring);
> +	if (ring->xsk_pool) {
>   		WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
>   						   MEM_TYPE_XSK_BUFF_POOL,
>   						   NULL));
> -		xsk_buff_set_rxq_info(ring->xsk_umem, &ring->xdp_rxq);
> +		xsk_buff_set_rxq_info(ring->xsk_pool->umem, &ring->xdp_rxq);
>   	} else {
>   		WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
>   						   MEM_TYPE_PAGE_SHARED, NULL));
> @@ -4121,8 +4121,8 @@ void ixgbe_configure_rx_ring(struct ixgbe_adapter *adapter,
>   #endif
>   	}
>   
> -	if (ring->xsk_umem && hw->mac.type != ixgbe_mac_82599EB) {
> -		u32 xsk_buf_len = xsk_umem_get_rx_frame_size(ring->xsk_umem);
> +	if (ring->xsk_pool && hw->mac.type != ixgbe_mac_82599EB) {
> +		u32 xsk_buf_len = xsk_umem_get_rx_frame_size(ring->xsk_pool->umem);
>   
>   		rxdctl &= ~(IXGBE_RXDCTL_RLPMLMASK |
>   			    IXGBE_RXDCTL_RLPML_EN);
> @@ -4144,7 +4144,7 @@ void ixgbe_configure_rx_ring(struct ixgbe_adapter *adapter,
>   	IXGBE_WRITE_REG(hw, IXGBE_RXDCTL(reg_idx), rxdctl);
>   
>   	ixgbe_rx_desc_queue_enable(adapter, ring);
> -	if (ring->xsk_umem)
> +	if (ring->xsk_pool)
>   		ixgbe_alloc_rx_buffers_zc(ring, ixgbe_desc_unused(ring));
>   	else
>   		ixgbe_alloc_rx_buffers(ring, ixgbe_desc_unused(ring));
> @@ -5277,7 +5277,7 @@ static void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring)
>   	u16 i = rx_ring->next_to_clean;
>   	struct ixgbe_rx_buffer *rx_buffer = &rx_ring->rx_buffer_info[i];
>   
> -	if (rx_ring->xsk_umem) {
> +	if (rx_ring->xsk_pool) {
>   		ixgbe_xsk_clean_rx_ring(rx_ring);
>   		goto skip_free;
>   	}
> @@ -5965,7 +5965,7 @@ static void ixgbe_clean_tx_ring(struct ixgbe_ring *tx_ring)
>   	u16 i = tx_ring->next_to_clean;
>   	struct ixgbe_tx_buffer *tx_buffer = &tx_ring->tx_buffer_info[i];
>   
> -	if (tx_ring->xsk_umem) {
> +	if (tx_ring->xsk_pool) {
>   		ixgbe_xsk_clean_tx_ring(tx_ring);
>   		goto out;
>   	}
> @@ -10290,7 +10290,7 @@ static int ixgbe_xdp_setup(struct net_device *dev, struct bpf_prog *prog)
>   	 */
>   	if (need_reset && prog)
>   		for (i = 0; i < adapter->num_rx_queues; i++)
> -			if (adapter->xdp_ring[i]->xsk_umem)
> +			if (adapter->xdp_ring[i]->xsk_pool)
>   				(void)ixgbe_xsk_wakeup(adapter->netdev, i,
>   						       XDP_WAKEUP_RX);
>   
> @@ -10308,8 +10308,8 @@ static int ixgbe_xdp(struct net_device *dev, struct netdev_bpf *xdp)
>   		xdp->prog_id = adapter->xdp_prog ?
>   			adapter->xdp_prog->aux->id : 0;
>   		return 0;
> -	case XDP_SETUP_XSK_UMEM:
> -		return ixgbe_xsk_umem_setup(adapter, xdp->xsk.umem,
> +	case XDP_SETUP_XSK_POOL:
> +		return ixgbe_xsk_pool_setup(adapter, xdp->xsk.pool,
>   					    xdp->xsk.queue_id);
>   
>   	default:
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
> index 7887ae4..2aeec78 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
> @@ -28,9 +28,10 @@ void ixgbe_irq_rearm_queues(struct ixgbe_adapter *adapter, u64 qmask);
>   void ixgbe_txrx_ring_disable(struct ixgbe_adapter *adapter, int ring);
>   void ixgbe_txrx_ring_enable(struct ixgbe_adapter *adapter, int ring);
>   
> -struct xdp_umem *ixgbe_xsk_umem(struct ixgbe_adapter *adapter,
> -				struct ixgbe_ring *ring);
> -int ixgbe_xsk_umem_setup(struct ixgbe_adapter *adapter, struct xdp_umem *umem,
> +struct xsk_buff_pool *ixgbe_xsk_pool(struct ixgbe_adapter *adapter,
> +				     struct ixgbe_ring *ring);
> +int ixgbe_xsk_pool_setup(struct ixgbe_adapter *adapter,
> +			 struct xsk_buff_pool *pool,
>   			 u16 qid);
>   
>   void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle);
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> index be9d2a8..9f503d6 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> @@ -8,8 +8,8 @@
>   #include "ixgbe.h"
>   #include "ixgbe_txrx_common.h"
>   
> -struct xdp_umem *ixgbe_xsk_umem(struct ixgbe_adapter *adapter,
> -				struct ixgbe_ring *ring)
> +struct xsk_buff_pool *ixgbe_xsk_pool(struct ixgbe_adapter *adapter,
> +				     struct ixgbe_ring *ring)
>   {
>   	bool xdp_on = READ_ONCE(adapter->xdp_prog);
>   	int qid = ring->ring_idx;
> @@ -17,11 +17,11 @@ struct xdp_umem *ixgbe_xsk_umem(struct ixgbe_adapter *adapter,
>   	if (!xdp_on || !test_bit(qid, adapter->af_xdp_zc_qps))
>   		return NULL;
>   
> -	return xdp_get_umem_from_qid(adapter->netdev, qid);
> +	return xdp_get_xsk_pool_from_qid(adapter->netdev, qid);
>   }
>   
> -static int ixgbe_xsk_umem_enable(struct ixgbe_adapter *adapter,
> -				 struct xdp_umem *umem,
> +static int ixgbe_xsk_pool_enable(struct ixgbe_adapter *adapter,
> +				 struct xsk_buff_pool *pool,
>   				 u16 qid)
>   {
>   	struct net_device *netdev = adapter->netdev;
> @@ -35,7 +35,7 @@ static int ixgbe_xsk_umem_enable(struct ixgbe_adapter *adapter,
>   	    qid >= netdev->real_num_tx_queues)
>   		return -EINVAL;
>   
> -	err = xsk_buff_dma_map(umem, &adapter->pdev->dev, IXGBE_RX_DMA_ATTR);
> +	err = xsk_buff_dma_map(pool->umem, &adapter->pdev->dev, IXGBE_RX_DMA_ATTR);
>   	if (err)
>   		return err;
>   
> @@ -59,13 +59,13 @@ static int ixgbe_xsk_umem_enable(struct ixgbe_adapter *adapter,
>   	return 0;
>   }
>   
> -static int ixgbe_xsk_umem_disable(struct ixgbe_adapter *adapter, u16 qid)
> +static int ixgbe_xsk_pool_disable(struct ixgbe_adapter *adapter, u16 qid)
>   {
> -	struct xdp_umem *umem;
> +	struct xsk_buff_pool *pool;
>   	bool if_running;
>   
> -	umem = xdp_get_umem_from_qid(adapter->netdev, qid);
> -	if (!umem)
> +	pool = xdp_get_xsk_pool_from_qid(adapter->netdev, qid);
> +	if (!pool)
>   		return -EINVAL;
>   
>   	if_running = netif_running(adapter->netdev) &&
> @@ -75,7 +75,7 @@ static int ixgbe_xsk_umem_disable(struct ixgbe_adapter *adapter, u16 qid)
>   		ixgbe_txrx_ring_disable(adapter, qid);
>   
>   	clear_bit(qid, adapter->af_xdp_zc_qps);
> -	xsk_buff_dma_unmap(umem, IXGBE_RX_DMA_ATTR);
> +	xsk_buff_dma_unmap(pool->umem, IXGBE_RX_DMA_ATTR);
>   
>   	if (if_running)
>   		ixgbe_txrx_ring_enable(adapter, qid);
> @@ -83,11 +83,12 @@ static int ixgbe_xsk_umem_disable(struct ixgbe_adapter *adapter, u16 qid)
>   	return 0;
>   }
>   
> -int ixgbe_xsk_umem_setup(struct ixgbe_adapter *adapter, struct xdp_umem *umem,
> +int ixgbe_xsk_pool_setup(struct ixgbe_adapter *adapter,
> +			 struct xsk_buff_pool *pool,
>   			 u16 qid)
>   {
> -	return umem ? ixgbe_xsk_umem_enable(adapter, umem, qid) :
> -		ixgbe_xsk_umem_disable(adapter, qid);
> +	return pool ? ixgbe_xsk_pool_enable(adapter, pool, qid) :
> +		ixgbe_xsk_pool_disable(adapter, qid);
>   }
>   
>   static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
> @@ -149,7 +150,7 @@ bool ixgbe_alloc_rx_buffers_zc(struct ixgbe_ring *rx_ring, u16 count)
>   	i -= rx_ring->count;
>   
>   	do {
> -		bi->xdp = xsk_buff_alloc(rx_ring->xsk_umem);
> +		bi->xdp = xsk_buff_alloc(rx_ring->xsk_pool->umem);
>   		if (!bi->xdp) {
>   			ok = false;
>   			break;
> @@ -344,11 +345,11 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
>   	q_vector->rx.total_packets += total_rx_packets;
>   	q_vector->rx.total_bytes += total_rx_bytes;
>   
> -	if (xsk_umem_uses_need_wakeup(rx_ring->xsk_umem)) {
> +	if (xsk_umem_uses_need_wakeup(rx_ring->xsk_pool->umem)) {
>   		if (failure || rx_ring->next_to_clean == rx_ring->next_to_use)
> -			xsk_set_rx_need_wakeup(rx_ring->xsk_umem);
> +			xsk_set_rx_need_wakeup(rx_ring->xsk_pool->umem);
>   		else
> -			xsk_clear_rx_need_wakeup(rx_ring->xsk_umem);
> +			xsk_clear_rx_need_wakeup(rx_ring->xsk_pool->umem);
>   
>   		return (int)total_rx_packets;
>   	}
> @@ -373,6 +374,7 @@ void ixgbe_xsk_clean_rx_ring(struct ixgbe_ring *rx_ring)
>   
>   static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
>   {
> +	struct xsk_buff_pool *pool = xdp_ring->xsk_pool;
>   	union ixgbe_adv_tx_desc *tx_desc = NULL;
>   	struct ixgbe_tx_buffer *tx_bi;
>   	bool work_done = true;
> @@ -387,12 +389,11 @@ static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
>   			break;
>   		}
>   
> -		if (!xsk_umem_consume_tx(xdp_ring->xsk_umem, &desc))
> +		if (!xsk_umem_consume_tx(pool->umem, &desc))
>   			break;
>   
> -		dma = xsk_buff_raw_get_dma(xdp_ring->xsk_umem, desc.addr);
> -		xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_umem, dma,
> -						 desc.len);
> +		dma = xsk_buff_raw_get_dma(pool->umem, desc.addr);
> +		xsk_buff_raw_dma_sync_for_device(pool->umem, dma, desc.len);
>   
>   		tx_bi = &xdp_ring->tx_buffer_info[xdp_ring->next_to_use];
>   		tx_bi->bytecount = desc.len;
> @@ -418,7 +419,7 @@ static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
>   
>   	if (tx_desc) {
>   		ixgbe_xdp_ring_update_tail(xdp_ring);
> -		xsk_umem_consume_tx_done(xdp_ring->xsk_umem);
> +		xsk_umem_consume_tx_done(pool->umem);
>   	}
>   
>   	return !!budget && work_done;
> @@ -439,7 +440,7 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
>   {
>   	u16 ntc = tx_ring->next_to_clean, ntu = tx_ring->next_to_use;
>   	unsigned int total_packets = 0, total_bytes = 0;
> -	struct xdp_umem *umem = tx_ring->xsk_umem;
> +	struct xsk_buff_pool *pool = tx_ring->xsk_pool;
>   	union ixgbe_adv_tx_desc *tx_desc;
>   	struct ixgbe_tx_buffer *tx_bi;
>   	u32 xsk_frames = 0;
> @@ -484,10 +485,10 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
>   	q_vector->tx.total_packets += total_packets;
>   
>   	if (xsk_frames)
> -		xsk_umem_complete_tx(umem, xsk_frames);
> +		xsk_umem_complete_tx(pool->umem, xsk_frames);
>   
> -	if (xsk_umem_uses_need_wakeup(tx_ring->xsk_umem))
> -		xsk_set_tx_need_wakeup(tx_ring->xsk_umem);
> +	if (xsk_umem_uses_need_wakeup(pool->umem))
> +		xsk_set_tx_need_wakeup(pool->umem);
>   
>   	return ixgbe_xmit_zc(tx_ring, q_vector->tx.work_limit);
>   }
> @@ -511,7 +512,7 @@ int ixgbe_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
>   	if (test_bit(__IXGBE_TX_DISABLED, &ring->state))
>   		return -ENETDOWN;
>   
> -	if (!ring->xsk_umem)
> +	if (!ring->xsk_pool)
>   		return -ENXIO;
>   
>   	if (!napi_if_scheduled_mark_missed(&ring->q_vector->napi)) {
> @@ -526,7 +527,7 @@ int ixgbe_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
>   void ixgbe_xsk_clean_tx_ring(struct ixgbe_ring *tx_ring)
>   {
>   	u16 ntc = tx_ring->next_to_clean, ntu = tx_ring->next_to_use;
> -	struct xdp_umem *umem = tx_ring->xsk_umem;
> +	struct xsk_buff_pool *pool = tx_ring->xsk_pool;
>   	struct ixgbe_tx_buffer *tx_bi;
>   	u32 xsk_frames = 0;
>   
> @@ -546,5 +547,5 @@ void ixgbe_xsk_clean_tx_ring(struct ixgbe_ring *tx_ring)
>   	}
>   
>   	if (xsk_frames)
> -		xsk_umem_complete_tx(umem, xsk_frames);
> +		xsk_umem_complete_tx(pool->umem, xsk_frames);
>   }
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
> index 842db20..516dfd3 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
> @@ -448,7 +448,7 @@ struct mlx5e_xdpsq {
>   	struct mlx5e_cq            cq;
>   
>   	/* read only */
> -	struct xdp_umem           *umem;
> +	struct xsk_buff_pool      *pool;
>   	struct mlx5_wq_cyc         wq;
>   	struct mlx5e_xdpsq_stats  *stats;
>   	mlx5e_fp_xmit_xdp_frame_check xmit_xdp_frame_check;
> @@ -610,7 +610,7 @@ struct mlx5e_rq {
>   	struct page_pool      *page_pool;
>   
>   	/* AF_XDP zero-copy */
> -	struct xdp_umem       *umem;
> +	struct xsk_buff_pool  *xsk_pool;
>   
>   	struct work_struct     recover_work;
>   
> @@ -731,12 +731,13 @@ struct mlx5e_hv_vhca_stats_agent {
>   #endif
>   
>   struct mlx5e_xsk {
> -	/* UMEMs are stored separately from channels, because we don't want to
> -	 * lose them when channels are recreated. The kernel also stores UMEMs,
> -	 * but it doesn't distinguish between zero-copy and non-zero-copy UMEMs,
> -	 * so rely on our mechanism.
> +	/* XSK buffer pools are stored separately from channels,
> +	 * because we don't want to lose them when channels are
> +	 * recreated. The kernel also stores buffer pools, but it doesn't
> +	 * distinguish between zero-copy and non-zero-copy UMEMs, so
> +	 * rely on our mechanism.
>   	 */
> -	struct xdp_umem **umems;
> +	struct xsk_buff_pool **pools;
>   	u16 refcnt;
>   	bool ever_used;
>   };
> @@ -948,7 +949,7 @@ struct mlx5e_xsk_param;
>   struct mlx5e_rq_param;
>   int mlx5e_open_rq(struct mlx5e_channel *c, struct mlx5e_params *params,
>   		  struct mlx5e_rq_param *param, struct mlx5e_xsk_param *xsk,
> -		  struct xdp_umem *umem, struct mlx5e_rq *rq);
> +		  struct xsk_buff_pool *pool, struct mlx5e_rq *rq);
>   int mlx5e_wait_for_min_rx_wqes(struct mlx5e_rq *rq, int wait_time);
>   void mlx5e_deactivate_rq(struct mlx5e_rq *rq);
>   void mlx5e_close_rq(struct mlx5e_rq *rq);
> @@ -958,7 +959,7 @@ int mlx5e_open_icosq(struct mlx5e_channel *c, struct mlx5e_params *params,
>   		     struct mlx5e_sq_param *param, struct mlx5e_icosq *sq);
>   void mlx5e_close_icosq(struct mlx5e_icosq *sq);
>   int mlx5e_open_xdpsq(struct mlx5e_channel *c, struct mlx5e_params *params,
> -		     struct mlx5e_sq_param *param, struct xdp_umem *umem,
> +		     struct mlx5e_sq_param *param, struct xsk_buff_pool *pool,
>   		     struct mlx5e_xdpsq *sq, bool is_redirect);
>   void mlx5e_close_xdpsq(struct mlx5e_xdpsq *sq);
>   
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> index c9d308e..0a5a873 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> @@ -446,7 +446,7 @@ bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq *cq)
>   	} while ((++i < MLX5E_TX_CQ_POLL_BUDGET) && (cqe = mlx5_cqwq_get_cqe(&cq->wq)));
>   
>   	if (xsk_frames)
> -		xsk_umem_complete_tx(sq->umem, xsk_frames);
> +		xsk_umem_complete_tx(sq->pool->umem, xsk_frames);
>   
>   	sq->stats->cqes += i;
>   
> @@ -476,7 +476,7 @@ void mlx5e_free_xdpsq_descs(struct mlx5e_xdpsq *sq)
>   	}
>   
>   	if (xsk_frames)
> -		xsk_umem_complete_tx(sq->umem, xsk_frames);
> +		xsk_umem_complete_tx(sq->pool->umem, xsk_frames);
>   }
>   
>   int mlx5e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
> @@ -561,4 +561,3 @@ void mlx5e_set_xmit_fp(struct mlx5e_xdpsq *sq, bool is_mpw)
>   	sq->xmit_xdp_frame = is_mpw ?
>   		mlx5e_xmit_xdp_frame_mpwqe : mlx5e_xmit_xdp_frame;
>   }
> -
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
> index d147b2f..3dd056a 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
> @@ -19,10 +19,10 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq,
>   					      struct mlx5e_wqe_frag_info *wi,
>   					      u32 cqe_bcnt);
>   
> -static inline int mlx5e_xsk_page_alloc_umem(struct mlx5e_rq *rq,
> +static inline int mlx5e_xsk_page_alloc_pool(struct mlx5e_rq *rq,
>   					    struct mlx5e_dma_info *dma_info)
>   {
> -	dma_info->xsk = xsk_buff_alloc(rq->umem);
> +	dma_info->xsk = xsk_buff_alloc(rq->xsk_pool->umem);
>   	if (!dma_info->xsk)
>   		return -ENOMEM;
>   
> @@ -38,13 +38,13 @@ static inline int mlx5e_xsk_page_alloc_umem(struct mlx5e_rq *rq,
>   
>   static inline bool mlx5e_xsk_update_rx_wakeup(struct mlx5e_rq *rq, bool alloc_err)
>   {
> -	if (!xsk_umem_uses_need_wakeup(rq->umem))
> +	if (!xsk_umem_uses_need_wakeup(rq->xsk_pool->umem))
>   		return alloc_err;
>   
>   	if (unlikely(alloc_err))
> -		xsk_set_rx_need_wakeup(rq->umem);
> +		xsk_set_rx_need_wakeup(rq->xsk_pool->umem);
>   	else
> -		xsk_clear_rx_need_wakeup(rq->umem);
> +		xsk_clear_rx_need_wakeup(rq->xsk_pool->umem);
>   
>   	return false;
>   }
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
> index 2c80205..f32a381 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
> @@ -62,7 +62,7 @@ static void mlx5e_build_xsk_cparam(struct mlx5e_priv *priv,
>   }
>   
>   int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params,
> -		   struct mlx5e_xsk_param *xsk, struct xdp_umem *umem,
> +		   struct mlx5e_xsk_param *xsk, struct xsk_buff_pool *pool,
>   		   struct mlx5e_channel *c)
>   {
>   	struct mlx5e_channel_param *cparam;
> @@ -82,7 +82,7 @@ int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params,
>   	if (unlikely(err))
>   		goto err_free_cparam;
>   
> -	err = mlx5e_open_rq(c, params, &cparam->rq, xsk, umem, &c->xskrq);
> +	err = mlx5e_open_rq(c, params, &cparam->rq, xsk, pool, &c->xskrq);
>   	if (unlikely(err))
>   		goto err_close_rx_cq;
>   
> @@ -90,13 +90,13 @@ int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params,
>   	if (unlikely(err))
>   		goto err_close_rq;
>   
> -	/* Create a separate SQ, so that when the UMEM is disabled, we could
> +	/* Create a separate SQ, so that when the buff pool is disabled, we could
>   	 * close this SQ safely and stop receiving CQEs. In other case, e.g., if
> -	 * the XDPSQ was used instead, we might run into trouble when the UMEM
> +	 * the XDPSQ was used instead, we might run into trouble when the buff pool
>   	 * is disabled and then reenabled, but the SQ continues receiving CQEs
> -	 * from the old UMEM.
> +	 * from the old buff pool.
>   	 */
> -	err = mlx5e_open_xdpsq(c, params, &cparam->xdp_sq, umem, &c->xsksq, true);
> +	err = mlx5e_open_xdpsq(c, params, &cparam->xdp_sq, pool, &c->xsksq, true);
>   	if (unlikely(err))
>   		goto err_close_tx_cq;
>   
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.h
> index 0dd11b8..ca20f1f 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.h
> @@ -12,7 +12,7 @@ bool mlx5e_validate_xsk_param(struct mlx5e_params *params,
>   			      struct mlx5e_xsk_param *xsk,
>   			      struct mlx5_core_dev *mdev);
>   int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params,
> -		   struct mlx5e_xsk_param *xsk, struct xdp_umem *umem,
> +		   struct mlx5e_xsk_param *xsk, struct xsk_buff_pool *pool,
>   		   struct mlx5e_channel *c);
>   void mlx5e_close_xsk(struct mlx5e_channel *c);
>   void mlx5e_activate_xsk(struct mlx5e_channel *c);
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
> index 83dce9c..abe4639 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
> @@ -66,7 +66,7 @@ static void mlx5e_xsk_tx_post_err(struct mlx5e_xdpsq *sq,
>   
>   bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget)
>   {
> -	struct xdp_umem *umem = sq->umem;
> +	struct xsk_buff_pool *pool = sq->pool;
>   	struct mlx5e_xdp_info xdpi;
>   	struct mlx5e_xdp_xmit_data xdptxd;
>   	bool work_done = true;
> @@ -83,7 +83,7 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget)
>   			break;
>   		}
>   
> -		if (!xsk_umem_consume_tx(umem, &desc)) {
> +		if (!xsk_umem_consume_tx(pool->umem, &desc)) {
>   			/* TX will get stuck until something wakes it up by
>   			 * triggering NAPI. Currently it's expected that the
>   			 * application calls sendto() if there are consumed, but
> @@ -92,11 +92,11 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget)
>   			break;
>   		}
>   
> -		xdptxd.dma_addr = xsk_buff_raw_get_dma(umem, desc.addr);
> -		xdptxd.data = xsk_buff_raw_get_data(umem, desc.addr);
> +		xdptxd.dma_addr = xsk_buff_raw_get_dma(pool->umem, desc.addr);
> +		xdptxd.data = xsk_buff_raw_get_data(pool->umem, desc.addr);
>   		xdptxd.len = desc.len;
>   
> -		xsk_buff_raw_dma_sync_for_device(umem, xdptxd.dma_addr, xdptxd.len);
> +		xsk_buff_raw_dma_sync_for_device(pool->umem, xdptxd.dma_addr, xdptxd.len);
>   
>   		if (unlikely(!sq->xmit_xdp_frame(sq, &xdptxd, &xdpi, check_result))) {
>   			if (sq->mpwqe.wqe)
> @@ -113,7 +113,7 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget)
>   			mlx5e_xdp_mpwqe_complete(sq);
>   		mlx5e_xmit_xdp_doorbell(sq);
>   
> -		xsk_umem_consume_tx_done(umem);
> +		xsk_umem_consume_tx_done(pool->umem);
>   	}
>   
>   	return !(budget && work_done);
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h
> index 39fa0a7..610a084 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h
> @@ -15,13 +15,13 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget);
>   
>   static inline void mlx5e_xsk_update_tx_wakeup(struct mlx5e_xdpsq *sq)
>   {
> -	if (!xsk_umem_uses_need_wakeup(sq->umem))
> +	if (!xsk_umem_uses_need_wakeup(sq->pool->umem))
>   		return;
>   
>   	if (sq->pc != sq->cc)
> -		xsk_clear_tx_need_wakeup(sq->umem);
> +		xsk_clear_tx_need_wakeup(sq->pool->umem);
>   	else
> -		xsk_set_tx_need_wakeup(sq->umem);
> +		xsk_set_tx_need_wakeup(sq->pool->umem);
>   }
>   
>   #endif /* __MLX5_EN_XSK_TX_H__ */
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.c
> index 7b17fcd..947abf1 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.c
> @@ -6,26 +6,26 @@
>   #include "setup.h"
>   #include "en/params.h"
>   
> -static int mlx5e_xsk_map_umem(struct mlx5e_priv *priv,
> -			      struct xdp_umem *umem)
> +static int mlx5e_xsk_map_pool(struct mlx5e_priv *priv,
> +			      struct xsk_buff_pool *pool)
>   {
>   	struct device *dev = priv->mdev->device;
>   
> -	return xsk_buff_dma_map(umem, dev, 0);
> +	return xsk_buff_dma_map(pool->umem, dev, 0);
>   }
>   
> -static void mlx5e_xsk_unmap_umem(struct mlx5e_priv *priv,
> -				 struct xdp_umem *umem)
> +static void mlx5e_xsk_unmap_pool(struct mlx5e_priv *priv,
> +				 struct xsk_buff_pool *pool)
>   {
> -	return xsk_buff_dma_unmap(umem, 0);
> +	return xsk_buff_dma_unmap(pool->umem, 0);
>   }
>   
> -static int mlx5e_xsk_get_umems(struct mlx5e_xsk *xsk)
> +static int mlx5e_xsk_get_pools(struct mlx5e_xsk *xsk)
>   {
> -	if (!xsk->umems) {
> -		xsk->umems = kcalloc(MLX5E_MAX_NUM_CHANNELS,
> -				     sizeof(*xsk->umems), GFP_KERNEL);
> -		if (unlikely(!xsk->umems))
> +	if (!xsk->pools) {
> +		xsk->pools = kcalloc(MLX5E_MAX_NUM_CHANNELS,
> +				     sizeof(*xsk->pools), GFP_KERNEL);
> +		if (unlikely(!xsk->pools))
>   			return -ENOMEM;
>   	}
>   
> @@ -35,68 +35,68 @@ static int mlx5e_xsk_get_umems(struct mlx5e_xsk *xsk)
>   	return 0;
>   }
>   
> -static void mlx5e_xsk_put_umems(struct mlx5e_xsk *xsk)
> +static void mlx5e_xsk_put_pools(struct mlx5e_xsk *xsk)
>   {
>   	if (!--xsk->refcnt) {
> -		kfree(xsk->umems);
> -		xsk->umems = NULL;
> +		kfree(xsk->pools);
> +		xsk->pools = NULL;
>   	}
>   }
>   
> -static int mlx5e_xsk_add_umem(struct mlx5e_xsk *xsk, struct xdp_umem *umem, u16 ix)
> +static int mlx5e_xsk_add_pool(struct mlx5e_xsk *xsk, struct xsk_buff_pool *pool, u16 ix)
>   {
>   	int err;
>   
> -	err = mlx5e_xsk_get_umems(xsk);
> +	err = mlx5e_xsk_get_pools(xsk);
>   	if (unlikely(err))
>   		return err;
>   
> -	xsk->umems[ix] = umem;
> +	xsk->pools[ix] = pool;
>   	return 0;
>   }
>   
> -static void mlx5e_xsk_remove_umem(struct mlx5e_xsk *xsk, u16 ix)
> +static void mlx5e_xsk_remove_pool(struct mlx5e_xsk *xsk, u16 ix)
>   {
> -	xsk->umems[ix] = NULL;
> +	xsk->pools[ix] = NULL;
>   
> -	mlx5e_xsk_put_umems(xsk);
> +	mlx5e_xsk_put_pools(xsk);
>   }
>   
> -static bool mlx5e_xsk_is_umem_sane(struct xdp_umem *umem)
> +static bool mlx5e_xsk_is_pool_sane(struct xsk_buff_pool *pool)
>   {
> -	return xsk_umem_get_headroom(umem) <= 0xffff &&
> -		xsk_umem_get_chunk_size(umem) <= 0xffff;
> +	return xsk_umem_get_headroom(pool->umem) <= 0xffff &&
> +		xsk_umem_get_chunk_size(pool->umem) <= 0xffff;
>   }
>   
> -void mlx5e_build_xsk_param(struct xdp_umem *umem, struct mlx5e_xsk_param *xsk)
> +void mlx5e_build_xsk_param(struct xsk_buff_pool *pool, struct mlx5e_xsk_param *xsk)
>   {
> -	xsk->headroom = xsk_umem_get_headroom(umem);
> -	xsk->chunk_size = xsk_umem_get_chunk_size(umem);
> +	xsk->headroom = xsk_umem_get_headroom(pool->umem);
> +	xsk->chunk_size = xsk_umem_get_chunk_size(pool->umem);
>   }
>   
>   static int mlx5e_xsk_enable_locked(struct mlx5e_priv *priv,
> -				   struct xdp_umem *umem, u16 ix)
> +				   struct xsk_buff_pool *pool, u16 ix)
>   {
>   	struct mlx5e_params *params = &priv->channels.params;
>   	struct mlx5e_xsk_param xsk;
>   	struct mlx5e_channel *c;
>   	int err;
>   
> -	if (unlikely(mlx5e_xsk_get_umem(&priv->channels.params, &priv->xsk, ix)))
> +	if (unlikely(mlx5e_xsk_get_pool(&priv->channels.params, &priv->xsk, ix)))
>   		return -EBUSY;
>   
> -	if (unlikely(!mlx5e_xsk_is_umem_sane(umem)))
> +	if (unlikely(!mlx5e_xsk_is_pool_sane(pool)))
>   		return -EINVAL;
>   
> -	err = mlx5e_xsk_map_umem(priv, umem);
> +	err = mlx5e_xsk_map_pool(priv, pool);
>   	if (unlikely(err))
>   		return err;
>   
> -	err = mlx5e_xsk_add_umem(&priv->xsk, umem, ix);
> +	err = mlx5e_xsk_add_pool(&priv->xsk, pool, ix);
>   	if (unlikely(err))
> -		goto err_unmap_umem;
> +		goto err_unmap_pool;
>   
> -	mlx5e_build_xsk_param(umem, &xsk);
> +	mlx5e_build_xsk_param(pool, &xsk);
>   
>   	if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) {
>   		/* XSK objects will be created on open. */
> @@ -112,9 +112,9 @@ static int mlx5e_xsk_enable_locked(struct mlx5e_priv *priv,
>   
>   	c = priv->channels.c[ix];
>   
> -	err = mlx5e_open_xsk(priv, params, &xsk, umem, c);
> +	err = mlx5e_open_xsk(priv, params, &xsk, pool, c);
>   	if (unlikely(err))
> -		goto err_remove_umem;
> +		goto err_remove_pool;
>   
>   	mlx5e_activate_xsk(c);
>   
> @@ -132,11 +132,11 @@ static int mlx5e_xsk_enable_locked(struct mlx5e_priv *priv,
>   	mlx5e_deactivate_xsk(c);
>   	mlx5e_close_xsk(c);
>   
> -err_remove_umem:
> -	mlx5e_xsk_remove_umem(&priv->xsk, ix);
> +err_remove_pool:
> +	mlx5e_xsk_remove_pool(&priv->xsk, ix);
>   
> -err_unmap_umem:
> -	mlx5e_xsk_unmap_umem(priv, umem);
> +err_unmap_pool:
> +	mlx5e_xsk_unmap_pool(priv, pool);
>   
>   	return err;
>   
> @@ -146,7 +146,7 @@ static int mlx5e_xsk_enable_locked(struct mlx5e_priv *priv,
>   	 */
>   	if (!mlx5e_validate_xsk_param(params, &xsk, priv->mdev)) {
>   		err = -EINVAL;
> -		goto err_remove_umem;
> +		goto err_remove_pool;
>   	}
>   
>   	return 0;
> @@ -154,45 +154,45 @@ static int mlx5e_xsk_enable_locked(struct mlx5e_priv *priv,
>   
>   static int mlx5e_xsk_disable_locked(struct mlx5e_priv *priv, u16 ix)
>   {
> -	struct xdp_umem *umem = mlx5e_xsk_get_umem(&priv->channels.params,
> +	struct xsk_buff_pool *pool = mlx5e_xsk_get_pool(&priv->channels.params,
>   						   &priv->xsk, ix);
>   	struct mlx5e_channel *c;
>   
> -	if (unlikely(!umem))
> +	if (unlikely(!pool))
>   		return -EINVAL;
>   
>   	if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
> -		goto remove_umem;
> +		goto remove_pool;
>   
>   	/* XSK RQ and SQ are only created if XDP program is set. */
>   	if (!priv->channels.params.xdp_prog)
> -		goto remove_umem;
> +		goto remove_pool;
>   
>   	c = priv->channels.c[ix];
>   	mlx5e_xsk_redirect_rqt_to_drop(priv, ix);
>   	mlx5e_deactivate_xsk(c);
>   	mlx5e_close_xsk(c);
>   
> -remove_umem:
> -	mlx5e_xsk_remove_umem(&priv->xsk, ix);
> -	mlx5e_xsk_unmap_umem(priv, umem);
> +remove_pool:
> +	mlx5e_xsk_remove_pool(&priv->xsk, ix);
> +	mlx5e_xsk_unmap_pool(priv, pool);
>   
>   	return 0;
>   }
>   
> -static int mlx5e_xsk_enable_umem(struct mlx5e_priv *priv, struct xdp_umem *umem,
> +static int mlx5e_xsk_enable_pool(struct mlx5e_priv *priv, struct xsk_buff_pool *pool,
>   				 u16 ix)
>   {
>   	int err;
>   
>   	mutex_lock(&priv->state_lock);
> -	err = mlx5e_xsk_enable_locked(priv, umem, ix);
> +	err = mlx5e_xsk_enable_locked(priv, pool, ix);
>   	mutex_unlock(&priv->state_lock);
>   
>   	return err;
>   }
>   
> -static int mlx5e_xsk_disable_umem(struct mlx5e_priv *priv, u16 ix)
> +static int mlx5e_xsk_disable_pool(struct mlx5e_priv *priv, u16 ix)
>   {
>   	int err;
>   
> @@ -203,7 +203,7 @@ static int mlx5e_xsk_disable_umem(struct mlx5e_priv *priv, u16 ix)
>   	return err;
>   }
>   
> -int mlx5e_xsk_setup_umem(struct net_device *dev, struct xdp_umem *umem, u16 qid)
> +int mlx5e_xsk_setup_pool(struct net_device *dev, struct xsk_buff_pool *pool, u16 qid)
>   {
>   	struct mlx5e_priv *priv = netdev_priv(dev);
>   	struct mlx5e_params *params = &priv->channels.params;
> @@ -212,8 +212,8 @@ int mlx5e_xsk_setup_umem(struct net_device *dev, struct xdp_umem *umem, u16 qid)
>   	if (unlikely(!mlx5e_qid_get_ch_if_in_group(params, qid, MLX5E_RQ_GROUP_XSK, &ix)))
>   		return -EINVAL;
>   
> -	return umem ? mlx5e_xsk_enable_umem(priv, umem, ix) :
> -		      mlx5e_xsk_disable_umem(priv, ix);
> +	return pool ? mlx5e_xsk_enable_pool(priv, pool, ix) :
> +		      mlx5e_xsk_disable_pool(priv, ix);
>   }
>   
>   u16 mlx5e_xsk_first_unused_channel(struct mlx5e_params *params, struct mlx5e_xsk *xsk)
> @@ -221,7 +221,7 @@ u16 mlx5e_xsk_first_unused_channel(struct mlx5e_params *params, struct mlx5e_xsk
>   	u16 res = xsk->refcnt ? params->num_channels : 0;
>   
>   	while (res) {
> -		if (mlx5e_xsk_get_umem(params, xsk, res - 1))
> +		if (mlx5e_xsk_get_pool(params, xsk, res - 1))
>   			break;
>   		--res;
>   	}
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.h
> index 25b4cbe..629db33 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.h
> @@ -6,25 +6,25 @@
>   
>   #include "en.h"
>   
> -static inline struct xdp_umem *mlx5e_xsk_get_umem(struct mlx5e_params *params,
> -						  struct mlx5e_xsk *xsk, u16 ix)
> +static inline struct xsk_buff_pool *mlx5e_xsk_get_pool(struct mlx5e_params *params,
> +						       struct mlx5e_xsk *xsk, u16 ix)
>   {
> -	if (!xsk || !xsk->umems)
> +	if (!xsk || !xsk->pools)
>   		return NULL;
>   
>   	if (unlikely(ix >= params->num_channels))
>   		return NULL;
>   
> -	return xsk->umems[ix];
> +	return xsk->pools[ix];
>   }
>   
>   struct mlx5e_xsk_param;
> -void mlx5e_build_xsk_param(struct xdp_umem *umem, struct mlx5e_xsk_param *xsk);
> +void mlx5e_build_xsk_param(struct xsk_buff_pool *pool, struct mlx5e_xsk_param *xsk);
>   
>   /* .ndo_bpf callback. */
> -int mlx5e_xsk_setup_umem(struct net_device *dev, struct xdp_umem *umem, u16 qid);
> +int mlx5e_xsk_setup_pool(struct net_device *dev, struct xsk_buff_pool *pool, u16 qid);
>   
> -int mlx5e_xsk_resize_reuseq(struct xdp_umem *umem, u32 nentries);
> +int mlx5e_xsk_resize_reuseq(struct xsk_buff_pool *pool, u32 nentries);
>   
>   u16 mlx5e_xsk_first_unused_channel(struct mlx5e_params *params, struct mlx5e_xsk *xsk);
>   
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index a836a02..2b4a3e3 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -365,7 +365,7 @@ static void mlx5e_rq_err_cqe_work(struct work_struct *recover_work)
>   static int mlx5e_alloc_rq(struct mlx5e_channel *c,
>   			  struct mlx5e_params *params,
>   			  struct mlx5e_xsk_param *xsk,
> -			  struct xdp_umem *umem,
> +			  struct xsk_buff_pool *pool,
>   			  struct mlx5e_rq_param *rqp,
>   			  struct mlx5e_rq *rq)
>   {
> @@ -391,9 +391,9 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
>   	rq->mdev    = mdev;
>   	rq->hw_mtu  = MLX5E_SW2HW_MTU(params, params->sw_mtu);
>   	rq->xdpsq   = &c->rq_xdpsq;
> -	rq->umem    = umem;
> +	rq->xsk_pool = pool;
>   
> -	if (rq->umem)
> +	if (rq->xsk_pool)
>   		rq->stats = &c->priv->channel_stats[c->ix].xskrq;
>   	else
>   		rq->stats = &c->priv->channel_stats[c->ix].rq;
> @@ -518,7 +518,7 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
>   	if (xsk) {
>   		err = xdp_rxq_info_reg_mem_model(&rq->xdp_rxq,
>   						 MEM_TYPE_XSK_BUFF_POOL, NULL);
> -		xsk_buff_set_rxq_info(rq->umem, &rq->xdp_rxq);
> +		xsk_buff_set_rxq_info(rq->xsk_pool->umem, &rq->xdp_rxq);
>   	} else {
>   		/* Create a page_pool and register it with rxq */
>   		pp_params.order     = 0;
> @@ -857,11 +857,11 @@ void mlx5e_free_rx_descs(struct mlx5e_rq *rq)
>   
>   int mlx5e_open_rq(struct mlx5e_channel *c, struct mlx5e_params *params,
>   		  struct mlx5e_rq_param *param, struct mlx5e_xsk_param *xsk,
> -		  struct xdp_umem *umem, struct mlx5e_rq *rq)
> +		  struct xsk_buff_pool *pool, struct mlx5e_rq *rq)
>   {
>   	int err;
>   
> -	err = mlx5e_alloc_rq(c, params, xsk, umem, param, rq);
> +	err = mlx5e_alloc_rq(c, params, xsk, pool, param, rq);
>   	if (err)
>   		return err;
>   
> @@ -963,7 +963,7 @@ static int mlx5e_alloc_xdpsq_db(struct mlx5e_xdpsq *sq, int numa)
>   
>   static int mlx5e_alloc_xdpsq(struct mlx5e_channel *c,
>   			     struct mlx5e_params *params,
> -			     struct xdp_umem *umem,
> +			     struct xsk_buff_pool *pool,
>   			     struct mlx5e_sq_param *param,
>   			     struct mlx5e_xdpsq *sq,
>   			     bool is_redirect)
> @@ -979,9 +979,9 @@ static int mlx5e_alloc_xdpsq(struct mlx5e_channel *c,
>   	sq->uar_map   = mdev->mlx5e_res.bfreg.map;
>   	sq->min_inline_mode = params->tx_min_inline_mode;
>   	sq->hw_mtu    = MLX5E_SW2HW_MTU(params, params->sw_mtu);
> -	sq->umem      = umem;
> +	sq->pool      = pool;
>   
> -	sq->stats = sq->umem ?
> +	sq->stats = sq->pool ?
>   		&c->priv->channel_stats[c->ix].xsksq :
>   		is_redirect ?
>   			&c->priv->channel_stats[c->ix].xdpsq :
> @@ -1445,13 +1445,13 @@ void mlx5e_close_icosq(struct mlx5e_icosq *sq)
>   }
>   
>   int mlx5e_open_xdpsq(struct mlx5e_channel *c, struct mlx5e_params *params,
> -		     struct mlx5e_sq_param *param, struct xdp_umem *umem,
> +		     struct mlx5e_sq_param *param, struct xsk_buff_pool *pool,
>   		     struct mlx5e_xdpsq *sq, bool is_redirect)
>   {
>   	struct mlx5e_create_sq_param csp = {};
>   	int err;
>   
> -	err = mlx5e_alloc_xdpsq(c, params, umem, param, sq, is_redirect);
> +	err = mlx5e_alloc_xdpsq(c, params, pool, param, sq, is_redirect);
>   	if (err)
>   		return err;
>   
> @@ -1927,7 +1927,7 @@ static u8 mlx5e_enumerate_lag_port(struct mlx5_core_dev *mdev, int ix)
>   static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
>   			      struct mlx5e_params *params,
>   			      struct mlx5e_channel_param *cparam,
> -			      struct xdp_umem *umem,
> +			      struct xsk_buff_pool *pool,
>   			      struct mlx5e_channel **cp)
>   {
>   	int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));
> @@ -1966,9 +1966,9 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
>   	if (unlikely(err))
>   		goto err_napi_del;
>   
> -	if (umem) {
> -		mlx5e_build_xsk_param(umem, &xsk);
> -		err = mlx5e_open_xsk(priv, params, &xsk, umem, c);
> +	if (pool) {
> +		mlx5e_build_xsk_param(pool, &xsk);
> +		err = mlx5e_open_xsk(priv, params, &xsk, pool, c);
>   		if (unlikely(err))
>   			goto err_close_queues;
>   	}
> @@ -2316,12 +2316,12 @@ int mlx5e_open_channels(struct mlx5e_priv *priv,
>   
>   	mlx5e_build_channel_param(priv, &chs->params, cparam);
>   	for (i = 0; i < chs->num; i++) {
> -		struct xdp_umem *umem = NULL;
> +		struct xsk_buff_pool *pool = NULL;
>   
>   		if (chs->params.xdp_prog)
> -			umem = mlx5e_xsk_get_umem(&chs->params, chs->params.xsk, i);
> +			pool = mlx5e_xsk_get_pool(&chs->params, chs->params.xsk, i);
>   
> -		err = mlx5e_open_channel(priv, i, &chs->params, cparam, umem, &chs->c[i]);
> +		err = mlx5e_open_channel(priv, i, &chs->params, cparam, pool, &chs->c[i]);
>   		if (err)
>   			goto err_close_channels;
>   	}
> @@ -3882,13 +3882,13 @@ static bool mlx5e_xsk_validate_mtu(struct net_device *netdev,
>   	u16 ix;
>   
>   	for (ix = 0; ix < chs->params.num_channels; ix++) {
> -		struct xdp_umem *umem = mlx5e_xsk_get_umem(&chs->params, chs->params.xsk, ix);
> +		struct xsk_buff_pool *pool = mlx5e_xsk_get_pool(&chs->params, chs->params.xsk, ix);
>   		struct mlx5e_xsk_param xsk;
>   
> -		if (!umem)
> +		if (!pool)
>   			continue;
>   
> -		mlx5e_build_xsk_param(umem, &xsk);
> +		mlx5e_build_xsk_param(pool, &xsk);
>   
>   		if (!mlx5e_validate_xsk_param(new_params, &xsk, mdev)) {
>   			u32 hr = mlx5e_get_linear_rq_headroom(new_params, &xsk);
> @@ -4518,8 +4518,8 @@ static int mlx5e_xdp(struct net_device *dev, struct netdev_bpf *xdp)
>   	case XDP_QUERY_PROG:
>   		xdp->prog_id = mlx5e_xdp_query(dev);
>   		return 0;
> -	case XDP_SETUP_XSK_UMEM:
> -		return mlx5e_xsk_setup_umem(dev, xdp->xsk.umem,
> +	case XDP_SETUP_XSK_POOL:
> +		return mlx5e_xsk_setup_pool(dev, xdp->xsk.pool,
>   					    xdp->xsk.queue_id);
>   	default:
>   		return -EINVAL;
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> index dbb1c63..1dcf77d 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> @@ -264,8 +264,8 @@ static inline int mlx5e_page_alloc_pool(struct mlx5e_rq *rq,
>   static inline int mlx5e_page_alloc(struct mlx5e_rq *rq,
>   				   struct mlx5e_dma_info *dma_info)
>   {
> -	if (rq->umem)
> -		return mlx5e_xsk_page_alloc_umem(rq, dma_info);
> +	if (rq->xsk_pool)
> +		return mlx5e_xsk_page_alloc_pool(rq, dma_info);
>   	else
>   		return mlx5e_page_alloc_pool(rq, dma_info);
>   }
> @@ -296,7 +296,7 @@ static inline void mlx5e_page_release(struct mlx5e_rq *rq,
>   				      struct mlx5e_dma_info *dma_info,
>   				      bool recycle)
>   {
> -	if (rq->umem)
> +	if (rq->xsk_pool)
>   		/* The `recycle` parameter is ignored, and the page is always
>   		 * put into the Reuse Ring, because there is no way to return
>   		 * the page to the userspace when the interface goes down.
> @@ -383,14 +383,14 @@ static int mlx5e_alloc_rx_wqes(struct mlx5e_rq *rq, u16 ix, u8 wqe_bulk)
>   	int err;
>   	int i;
>   
> -	if (rq->umem) {
> +	if (rq->xsk_pool) {
>   		int pages_desired = wqe_bulk << rq->wqe.info.log_num_frags;
>   
>   		/* Check in advance that we have enough frames, instead of
>   		 * allocating one-by-one, failing and moving frames to the
>   		 * Reuse Ring.
>   		 */
> -		if (unlikely(!xsk_buff_can_alloc(rq->umem, pages_desired)))
> +		if (unlikely(!xsk_buff_can_alloc(rq->xsk_pool->umem, pages_desired)))
>   			return -ENOMEM;
>   	}
>   
> @@ -488,8 +488,8 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
>   	/* Check in advance that we have enough frames, instead of allocating
>   	 * one-by-one, failing and moving frames to the Reuse Ring.
>   	 */
> -	if (rq->umem &&
> -	    unlikely(!xsk_buff_can_alloc(rq->umem, MLX5_MPWRQ_PAGES_PER_WQE))) {
> +	if (rq->xsk_pool &&
> +	    unlikely(!xsk_buff_can_alloc(rq->xsk_pool->umem, MLX5_MPWRQ_PAGES_PER_WQE))) {
>   		err = -ENOMEM;
>   		goto err;
>   	}
> @@ -700,7 +700,7 @@ bool mlx5e_post_rx_mpwqes(struct mlx5e_rq *rq)
>   	 * the driver when it refills the Fill Ring.
>   	 * 2. Otherwise, busy poll by rescheduling the NAPI poll.
>   	 */
> -	if (unlikely(alloc_err == -ENOMEM && rq->umem))
> +	if (unlikely(alloc_err == -ENOMEM && rq->xsk_pool))
>   		return true;
>   
>   	return false;
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 6fc613e..e5acc3b 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -616,7 +616,7 @@ struct netdev_queue {
>   	/* Subordinate device that the queue has been assigned to */
>   	struct net_device	*sb_dev;
>   #ifdef CONFIG_XDP_SOCKETS
> -	struct xdp_umem         *umem;
> +	struct xsk_buff_pool    *pool;

Nit: IMO, it's better to prefix the field name with xsk_ in such places.
"Pool" is too generic; who knows what kind of pool we'll have in a
netdev queue in the future.
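
For illustration, something along these lines (naming sketch only, not
part of the patch):

#ifdef CONFIG_XDP_SOCKETS
	struct xsk_buff_pool	*xsk_pool;	/* instead of the generic "pool" */
#endif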

>   #endif
>   /*
>    * write-mostly part
> @@ -749,7 +749,7 @@ struct netdev_rx_queue {
>   	struct net_device		*dev;
>   	struct xdp_rxq_info		xdp_rxq;
>   #ifdef CONFIG_XDP_SOCKETS
> -	struct xdp_umem                 *umem;
> +	struct xsk_buff_pool            *pool;
>   #endif
>   } ____cacheline_aligned_in_smp;
>   
> @@ -879,7 +879,7 @@ enum bpf_netdev_command {
>   	/* BPF program for offload callbacks, invoked at program load time. */
>   	BPF_OFFLOAD_MAP_ALLOC,
>   	BPF_OFFLOAD_MAP_FREE,
> -	XDP_SETUP_XSK_UMEM,
> +	XDP_SETUP_XSK_POOL,
>   };
>   
>   struct bpf_prog_offload_ops;
> @@ -906,9 +906,9 @@ struct netdev_bpf {
>   		struct {
>   			struct bpf_offloaded_map *offmap;
>   		};
> -		/* XDP_SETUP_XSK_UMEM */
> +		/* XDP_SETUP_XSK_POOL */
>   		struct {
> -			struct xdp_umem *umem;
> +			struct xsk_buff_pool *pool;
>   			u16 queue_id;
>   		} xsk;
>   	};
> diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
> index ccf848f..5dc8d3c 100644
> --- a/include/net/xdp_sock_drv.h
> +++ b/include/net/xdp_sock_drv.h
> @@ -14,7 +14,8 @@
>   void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries);
>   bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc);
>   void xsk_umem_consume_tx_done(struct xdp_umem *umem);
> -struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev, u16 queue_id);
> +struct xsk_buff_pool *xdp_get_xsk_pool_from_qid(struct net_device *dev,
> +						u16 queue_id);
>   void xsk_set_rx_need_wakeup(struct xdp_umem *umem);
>   void xsk_set_tx_need_wakeup(struct xdp_umem *umem);
>   void xsk_clear_rx_need_wakeup(struct xdp_umem *umem);
> @@ -125,8 +126,8 @@ static inline void xsk_umem_consume_tx_done(struct xdp_umem *umem)
>   {
>   }
>   
> -static inline struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
> -						     u16 queue_id)
> +static inline struct xsk_buff_pool *
> +xdp_get_xsk_pool_from_qid(struct net_device *dev, u16 queue_id)
>   {
>   	return NULL;
>   }
> diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
> index a4ff226..a6dec9c 100644
> --- a/include/net/xsk_buff_pool.h
> +++ b/include/net/xsk_buff_pool.h
> @@ -13,6 +13,7 @@ struct xsk_buff_pool;
>   struct xdp_rxq_info;
>   struct xsk_queue;
>   struct xdp_desc;
> +struct xdp_umem;
>   struct device;
>   struct page;
>   
> @@ -42,13 +43,14 @@ struct xsk_buff_pool {
>   	u32 frame_len;
>   	bool cheap_dma;
>   	bool unaligned;
> +	struct xdp_umem *umem;
>   	void *addrs;
>   	struct device *dev;
>   	struct xdp_buff_xsk *free_heads[];
>   };
>   
>   /* AF_XDP core. */
> -struct xsk_buff_pool *xp_create(struct page **pages, u32 nr_pages, u32 chunks,
> +struct xsk_buff_pool *xp_create(struct xdp_umem *umem, u32 chunks,
>   				u32 chunk_size, u32 headroom, u64 size,
>   				bool unaligned);
>   void xp_set_fq(struct xsk_buff_pool *pool, struct xsk_queue *fq);
> diff --git a/net/ethtool/channels.c b/net/ethtool/channels.c
> index 9ef54cd..78d990b 100644
> --- a/net/ethtool/channels.c
> +++ b/net/ethtool/channels.c
> @@ -223,7 +223,7 @@ int ethnl_set_channels(struct sk_buff *skb, struct genl_info *info)
>   	from_channel = channels.combined_count +
>   		       min(channels.rx_count, channels.tx_count);
>   	for (i = from_channel; i < old_total; i++)
> -		if (xdp_get_umem_from_qid(dev, i)) {
> +		if (xdp_get_xsk_pool_from_qid(dev, i)) {
>   			GENL_SET_ERR_MSG(info, "requested channel counts are too low for existing zerocopy AF_XDP sockets");
>   			return -EINVAL;
>   		}
> diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
> index b5df90c..91de16d 100644
> --- a/net/ethtool/ioctl.c
> +++ b/net/ethtool/ioctl.c
> @@ -1702,7 +1702,7 @@ static noinline_for_stack int ethtool_set_channels(struct net_device *dev,
>   		min(channels.rx_count, channels.tx_count);
>   	to_channel = curr.combined_count + max(curr.rx_count, curr.tx_count);
>   	for (i = from_channel; i < to_channel; i++)
> -		if (xdp_get_umem_from_qid(dev, i))
> +		if (xdp_get_xsk_pool_from_qid(dev, i))
>   			return -EINVAL;
>   
>   	ret = dev->ethtool_ops->set_channels(dev, &channels);
> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index e97db37..0b5f3b0 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c
> @@ -51,8 +51,9 @@ void xdp_del_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs)
>    * not know if the device has more tx queues than rx, or the opposite.
>    * This might also change during run time.
>    */
> -static int xdp_reg_umem_at_qid(struct net_device *dev, struct xdp_umem *umem,
> -			       u16 queue_id)
> +static int xdp_reg_xsk_pool_at_qid(struct net_device *dev,
> +				   struct xsk_buff_pool *pool,
> +				   u16 queue_id)
>   {
>   	if (queue_id >= max_t(unsigned int,
>   			      dev->real_num_rx_queues,
> @@ -60,31 +61,31 @@ static int xdp_reg_umem_at_qid(struct net_device *dev, struct xdp_umem *umem,
>   		return -EINVAL;
>   
>   	if (queue_id < dev->real_num_rx_queues)
> -		dev->_rx[queue_id].umem = umem;
> +		dev->_rx[queue_id].pool = pool;
>   	if (queue_id < dev->real_num_tx_queues)
> -		dev->_tx[queue_id].umem = umem;
> +		dev->_tx[queue_id].pool = pool;
>   
>   	return 0;
>   }
>   
> -struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
> -				       u16 queue_id)
> +struct xsk_buff_pool *xdp_get_xsk_pool_from_qid(struct net_device *dev,
> +						u16 queue_id)
>   {
>   	if (queue_id < dev->real_num_rx_queues)
> -		return dev->_rx[queue_id].umem;
> +		return dev->_rx[queue_id].pool;
>   	if (queue_id < dev->real_num_tx_queues)
> -		return dev->_tx[queue_id].umem;
> +		return dev->_tx[queue_id].pool;
>   
>   	return NULL;
>   }
> -EXPORT_SYMBOL(xdp_get_umem_from_qid);
> +EXPORT_SYMBOL(xdp_get_xsk_pool_from_qid);
>   
> -static void xdp_clear_umem_at_qid(struct net_device *dev, u16 queue_id)
> +static void xdp_clear_xsk_pool_at_qid(struct net_device *dev, u16 queue_id)
>   {
>   	if (queue_id < dev->real_num_rx_queues)
> -		dev->_rx[queue_id].umem = NULL;
> +		dev->_rx[queue_id].pool = NULL;
>   	if (queue_id < dev->real_num_tx_queues)
> -		dev->_tx[queue_id].umem = NULL;
> +		dev->_tx[queue_id].pool = NULL;
>   }
>   
>   int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
> @@ -102,10 +103,10 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
>   	if (force_zc && force_copy)
>   		return -EINVAL;
>   
> -	if (xdp_get_umem_from_qid(dev, queue_id))
> +	if (xdp_get_xsk_pool_from_qid(dev, queue_id))
>   		return -EBUSY;
>   
> -	err = xdp_reg_umem_at_qid(dev, umem, queue_id);
> +	err = xdp_reg_xsk_pool_at_qid(dev, umem->pool, queue_id);
>   	if (err)
>   		return err;
>   
> @@ -132,8 +133,8 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
>   		goto err_unreg_umem;
>   	}
>   
> -	bpf.command = XDP_SETUP_XSK_UMEM;
> -	bpf.xsk.umem = umem;
> +	bpf.command = XDP_SETUP_XSK_POOL;
> +	bpf.xsk.pool = umem->pool;
>   	bpf.xsk.queue_id = queue_id;
>   
>   	err = dev->netdev_ops->ndo_bpf(dev, &bpf);
> @@ -147,7 +148,7 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
>   	if (!force_zc)
>   		err = 0; /* fallback to copy mode */
>   	if (err)
> -		xdp_clear_umem_at_qid(dev, queue_id);
> +		xdp_clear_xsk_pool_at_qid(dev, queue_id);
>   	return err;
>   }
>   
> @@ -162,8 +163,8 @@ void xdp_umem_clear_dev(struct xdp_umem *umem)
>   		return;
>   
>   	if (umem->zc) {
> -		bpf.command = XDP_SETUP_XSK_UMEM;
> -		bpf.xsk.umem = NULL;
> +		bpf.command = XDP_SETUP_XSK_POOL;
> +		bpf.xsk.pool = NULL;
>   		bpf.xsk.queue_id = umem->queue_id;
>   
>   		err = umem->dev->netdev_ops->ndo_bpf(umem->dev, &bpf);
> @@ -172,7 +173,7 @@ void xdp_umem_clear_dev(struct xdp_umem *umem)
>   			WARN(1, "failed to disable umem!\n");
>   	}
>   
> -	xdp_clear_umem_at_qid(umem->dev, umem->queue_id);
> +	xdp_clear_xsk_pool_at_qid(umem->dev, umem->queue_id);
>   
>   	dev_put(umem->dev);
>   	umem->dev = NULL;
> @@ -373,8 +374,8 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>   	if (err)
>   		goto out_account;
>   
> -	umem->pool = xp_create(umem->pgs, umem->npgs, chunks, chunk_size,
> -			       headroom, size, unaligned_chunks);
> +	umem->pool = xp_create(umem, chunks, chunk_size, headroom, size,
> +			       unaligned_chunks);
>   	if (!umem->pool) {
>   		err = -ENOMEM;
>   		goto out_pin;
> diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
> index 540ed75..c57f0bb 100644
> --- a/net/xdp/xsk_buff_pool.c
> +++ b/net/xdp/xsk_buff_pool.c
> @@ -32,7 +32,7 @@ void xp_destroy(struct xsk_buff_pool *pool)
>   	kvfree(pool);
>   }
>   
> -struct xsk_buff_pool *xp_create(struct page **pages, u32 nr_pages, u32 chunks,
> +struct xsk_buff_pool *xp_create(struct xdp_umem *umem, u32 chunks,
>   				u32 chunk_size, u32 headroom, u64 size,
>   				bool unaligned)
>   {
> @@ -58,6 +58,7 @@ struct xsk_buff_pool *xp_create(struct page **pages, u32 nr_pages, u32 chunks,
>   	pool->cheap_dma = true;
>   	pool->unaligned = unaligned;
>   	pool->frame_len = chunk_size - headroom - XDP_PACKET_HEADROOM;
> +	pool->umem = umem;
>   	INIT_LIST_HEAD(&pool->free_list);
>   
>   	for (i = 0; i < pool->free_heads_cnt; i++) {
> @@ -67,7 +68,7 @@ struct xsk_buff_pool *xp_create(struct page **pages, u32 nr_pages, u32 chunks,
>   		pool->free_heads[i] = xskb;
>   	}
>   
> -	err = xp_addr_map(pool, pages, nr_pages);
> +	err = xp_addr_map(pool, umem->pgs, umem->npgs);
>   	if (!err)
>   		return pool;
>   
> 



* Re: [PATCH bpf-next 03/14] xsk: create and free context independently from umem
  2020-07-08 15:00   ` Maxim Mikityanskiy
@ 2020-07-09  6:47     ` Magnus Karlsson
  0 siblings, 0 replies; 25+ messages in thread
From: Magnus Karlsson @ 2020-07-09  6:47 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: Magnus Karlsson, Björn Töpel, Alexei Starovoitov,
	Daniel Borkmann, Network Development, Jonathan Lemon, bpf,
	jeffrey.t.kirsher, Fijalkowski, Maciej, Maciej Fijalkowski,
	cristian.dumitrescu

On Wed, Jul 8, 2020 at 5:01 PM Maxim Mikityanskiy <maximmi@mellanox.com> wrote:
>
> On 2020-07-02 15:19, Magnus Karlsson wrote:
> > Create and free the buffer pool independently from the umem. Move
> > these operations that are performed on the buffer pool from the
> > umem create and destroy functions to new create and destroy
> > functions just for the buffer pool. This so that in later commits
> > we can instantiate multiple buffer pools per umem when sharing a
> > umem between HW queues and/or devices. We also eradicate the
> > back pointer from the umem to the buffer pool as this will not
> > work when we introduce the possibility to have multiple buffer
> > pools per umem.
> >
> > It might seem a bit odd that we create an empty buffer pool first
> > and then recreate it with its right size when we bind to a device
> > and umem. But the buffer pool will in later commits be used to
> > carry information before it has been assigned to a umem and its
> > size decided.
>
> What kind of information? I'm looking at the final code: on socket
> creation you just fill the pool with zeros, then we may have setsockopt
> for FQ and CQ, then xsk_bind replaces the pool with the real one. So the
> only information carried from the old pool to the new one is FQ and CQ,
> or did I miss anything?
>
> I don't quite like this design, it's kind of a hack to support the
> initialization order that we have, but it complicates things: when you
> copy the old pool into the new one, it's not clear which fields we care
> about, and which are ignored/overwritten.
>
> Regarding FQ and CQ, for shared UMEM, they won't be filled, so there is
> no point in the temporary pool in this case (unless it also stores
> something that I missed).
>
> I suggest to add a pointer to some kind of a configuration struct to xs.
> All things configured with setsockopt go to that struct. xsk_bind will
> call a function to validate the config struct, and if it's OK, it will
> create the pool (once), fill the fields and free the config struct.
> Config struct can be a union with the pool to save space in xs. Probably
> we will also be able to drop a few fields from xs (such as umem?). How
> do you feel about this idea?

It is used to store the FQ and CQ information that comes into play once
we enable sharing in patches 10 and 11. But I agree, the current
solution is not good. I will rework this along the lines of your
suggestion.
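
Maybe something along these lines (a rough, not compile-tested sketch;
the struct and field names below are just placeholders):

/* Everything configured via setsockopt before bind. */
struct xsk_pool_config {
	struct xsk_queue *fq;	/* XDP_UMEM_FILL_RING */
	struct xsk_queue *cq;	/* XDP_UMEM_COMPLETION_RING */
};

struct xdp_sock {
	/* ... existing fields ... */
	union {
		struct xsk_pool_config *config;	/* valid before bind */
		struct xsk_buff_pool *pool;	/* valid after bind */
	};
};

xsk_bind() would then be the single place that validates the config and
creates the real pool.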

> >
> > Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> > ---
> >   include/net/xdp_sock.h      |   3 +-
> >   include/net/xsk_buff_pool.h |  14 +++-
> >   net/xdp/xdp_umem.c          | 164 ++++----------------------------------------
> >   net/xdp/xdp_umem.h          |   4 +-
> >   net/xdp/xsk.c               |  83 +++++++++++++++++++---
> >   net/xdp/xsk.h               |   3 +
> >   net/xdp/xsk_buff_pool.c     | 154 +++++++++++++++++++++++++++++++++++++----
> >   net/xdp/xsk_queue.h         |  12 ++--
> >   8 files changed, 250 insertions(+), 187 deletions(-)
> >
> > diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
> > index 6eb9628..b9bb118 100644
> > --- a/include/net/xdp_sock.h
> > +++ b/include/net/xdp_sock.h
> > @@ -20,13 +20,12 @@ struct xdp_buff;
> >   struct xdp_umem {
> >       struct xsk_queue *fq;
> >       struct xsk_queue *cq;
> > -     struct xsk_buff_pool *pool;
> >       u64 size;
> >       u32 headroom;
> >       u32 chunk_size;
> > +     u32 chunks;
> >       struct user_struct *user;
> >       refcount_t users;
> > -     struct work_struct work;
> >       struct page **pgs;
> >       u32 npgs;
> >       u16 queue_id;
> > diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
> > index a6dec9c..cda8ced 100644
> > --- a/include/net/xsk_buff_pool.h
> > +++ b/include/net/xsk_buff_pool.h
> > @@ -14,6 +14,7 @@ struct xdp_rxq_info;
> >   struct xsk_queue;
> >   struct xdp_desc;
> >   struct xdp_umem;
> > +struct xdp_sock;
> >   struct device;
> >   struct page;
> >
> > @@ -46,16 +47,23 @@ struct xsk_buff_pool {
> >       struct xdp_umem *umem;
> >       void *addrs;
> >       struct device *dev;
> > +     refcount_t users;
> > +     struct work_struct work;
> >       struct xdp_buff_xsk *free_heads[];
> >   };
> >
> >   /* AF_XDP core. */
> > -struct xsk_buff_pool *xp_create(struct xdp_umem *umem, u32 chunks,
> > -                             u32 chunk_size, u32 headroom, u64 size,
> > -                             bool unaligned);
> > +struct xsk_buff_pool *xp_create(void);
> > +struct xsk_buff_pool *xp_assign_umem(struct xsk_buff_pool *pool,
> > +                                  struct xdp_umem *umem);
> > +int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs,
> > +               struct net_device *dev, u16 queue_id, u16 flags);
> >   void xp_set_fq(struct xsk_buff_pool *pool, struct xsk_queue *fq);
> >   void xp_destroy(struct xsk_buff_pool *pool);
> >   void xp_release(struct xdp_buff_xsk *xskb);
> > +void xp_get_pool(struct xsk_buff_pool *pool);
> > +void xp_put_pool(struct xsk_buff_pool *pool);
> > +void xp_clear_dev(struct xsk_buff_pool *pool);
> >
> >   /* AF_XDP, and XDP core. */
> >   void xp_free(struct xdp_buff_xsk *xskb);
> > diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> > index adde4d5..f290345 100644
> > --- a/net/xdp/xdp_umem.c
> > +++ b/net/xdp/xdp_umem.c
> > @@ -47,160 +47,41 @@ void xdp_del_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs)
> >       spin_unlock_irqrestore(&umem->xsk_tx_list_lock, flags);
> >   }
> >
> > -/* The umem is stored both in the _rx struct and the _tx struct as we do
> > - * not know if the device has more tx queues than rx, or the opposite.
> > - * This might also change during run time.
> > - */
> > -static int xsk_reg_pool_at_qid(struct net_device *dev,
> > -                            struct xsk_buff_pool *pool,
> > -                            u16 queue_id)
> > -{
> > -     if (queue_id >= max_t(unsigned int,
> > -                           dev->real_num_rx_queues,
> > -                           dev->real_num_tx_queues))
> > -             return -EINVAL;
> > -
> > -     if (queue_id < dev->real_num_rx_queues)
> > -             dev->_rx[queue_id].pool = pool;
> > -     if (queue_id < dev->real_num_tx_queues)
> > -             dev->_tx[queue_id].pool = pool;
> > -
> > -     return 0;
> > -}
> > -
> > -struct xsk_buff_pool *xsk_get_pool_from_qid(struct net_device *dev,
> > -                                         u16 queue_id)
> > +static void xdp_umem_unpin_pages(struct xdp_umem *umem)
> >   {
> > -     if (queue_id < dev->real_num_rx_queues)
> > -             return dev->_rx[queue_id].pool;
> > -     if (queue_id < dev->real_num_tx_queues)
> > -             return dev->_tx[queue_id].pool;
> > +     unpin_user_pages_dirty_lock(umem->pgs, umem->npgs, true);
> >
> > -     return NULL;
> > +     kfree(umem->pgs);
> > +     umem->pgs = NULL;
> >   }
> > -EXPORT_SYMBOL(xsk_get_pool_from_qid);
> >
> > -static void xsk_clear_pool_at_qid(struct net_device *dev, u16 queue_id)
> > +static void xdp_umem_unaccount_pages(struct xdp_umem *umem)
> >   {
> > -     if (queue_id < dev->real_num_rx_queues)
> > -             dev->_rx[queue_id].pool = NULL;
> > -     if (queue_id < dev->real_num_tx_queues)
> > -             dev->_tx[queue_id].pool = NULL;
> > +     if (umem->user) {
> > +             atomic_long_sub(umem->npgs, &umem->user->locked_vm);
> > +             free_uid(umem->user);
> > +     }
> >   }
> >
> > -int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
> > -                     u16 queue_id, u16 flags)
> > +void xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
> > +                      u16 queue_id)
> >   {
> > -     bool force_zc, force_copy;
> > -     struct netdev_bpf bpf;
> > -     int err = 0;
> > -
> > -     ASSERT_RTNL();
> > -
> > -     force_zc = flags & XDP_ZEROCOPY;
> > -     force_copy = flags & XDP_COPY;
> > -
> > -     if (force_zc && force_copy)
> > -             return -EINVAL;
> > -
> > -     if (xsk_get_pool_from_qid(dev, queue_id))
> > -             return -EBUSY;
> > -
> > -     err = xsk_reg_pool_at_qid(dev, umem->pool, queue_id);
> > -     if (err)
> > -             return err;
> > -
> >       umem->dev = dev;
> >       umem->queue_id = queue_id;
> >
> > -     if (flags & XDP_USE_NEED_WAKEUP) {
> > -             umem->flags |= XDP_UMEM_USES_NEED_WAKEUP;
> > -             /* Tx needs to be explicitly woken up the first time.
> > -              * Also for supporting drivers that do not implement this
> > -              * feature. They will always have to call sendto().
> > -              */
> > -             xsk_set_tx_need_wakeup(umem->pool);
> > -     }
> > -
> >       dev_hold(dev);
> > -
> > -     if (force_copy)
> > -             /* For copy-mode, we are done. */
> > -             return 0;
> > -
> > -     if (!dev->netdev_ops->ndo_bpf || !dev->netdev_ops->ndo_xsk_wakeup) {
> > -             err = -EOPNOTSUPP;
> > -             goto err_unreg_umem;
> > -     }
> > -
> > -     bpf.command = XDP_SETUP_XSK_POOL;
> > -     bpf.xsk.pool = umem->pool;
> > -     bpf.xsk.queue_id = queue_id;
> > -
> > -     err = dev->netdev_ops->ndo_bpf(dev, &bpf);
> > -     if (err)
> > -             goto err_unreg_umem;
> > -
> > -     umem->zc = true;
> > -     return 0;
> > -
> > -err_unreg_umem:
> > -     if (!force_zc)
> > -             err = 0; /* fallback to copy mode */
> > -     if (err)
> > -             xsk_clear_pool_at_qid(dev, queue_id);
> > -     return err;
> >   }
> >
> >   void xdp_umem_clear_dev(struct xdp_umem *umem)
> >   {
> > -     struct netdev_bpf bpf;
> > -     int err;
> > -
> > -     ASSERT_RTNL();
> > -
> > -     if (!umem->dev)
> > -             return;
> > -
> > -     if (umem->zc) {
> > -             bpf.command = XDP_SETUP_XSK_POOL;
> > -             bpf.xsk.pool = NULL;
> > -             bpf.xsk.queue_id = umem->queue_id;
> > -
> > -             err = umem->dev->netdev_ops->ndo_bpf(umem->dev, &bpf);
> > -
> > -             if (err)
> > -                     WARN(1, "failed to disable umem!\n");
> > -     }
> > -
> > -     xsk_clear_pool_at_qid(umem->dev, umem->queue_id);
> > -
> >       dev_put(umem->dev);
> >       umem->dev = NULL;
> >       umem->zc = false;
> >   }
> >
> > -static void xdp_umem_unpin_pages(struct xdp_umem *umem)
> > -{
> > -     unpin_user_pages_dirty_lock(umem->pgs, umem->npgs, true);
> > -
> > -     kfree(umem->pgs);
> > -     umem->pgs = NULL;
> > -}
> > -
> > -static void xdp_umem_unaccount_pages(struct xdp_umem *umem)
> > -{
> > -     if (umem->user) {
> > -             atomic_long_sub(umem->npgs, &umem->user->locked_vm);
> > -             free_uid(umem->user);
> > -     }
> > -}
> > -
> >   static void xdp_umem_release(struct xdp_umem *umem)
> >   {
> > -     rtnl_lock();
> >       xdp_umem_clear_dev(umem);
> > -     rtnl_unlock();
> >
> >       ida_simple_remove(&umem_ida, umem->id);
> >
> > @@ -214,20 +95,12 @@ static void xdp_umem_release(struct xdp_umem *umem)
> >               umem->cq = NULL;
> >       }
> >
> > -     xp_destroy(umem->pool);
> >       xdp_umem_unpin_pages(umem);
> >
> >       xdp_umem_unaccount_pages(umem);
> >       kfree(umem);
> >   }
> >
> > -static void xdp_umem_release_deferred(struct work_struct *work)
> > -{
> > -     struct xdp_umem *umem = container_of(work, struct xdp_umem, work);
> > -
> > -     xdp_umem_release(umem);
> > -}
> > -
> >   void xdp_get_umem(struct xdp_umem *umem)
> >   {
> >       refcount_inc(&umem->users);
> > @@ -238,10 +111,8 @@ void xdp_put_umem(struct xdp_umem *umem)
> >       if (!umem)
> >               return;
> >
> > -     if (refcount_dec_and_test(&umem->users)) {
> > -             INIT_WORK(&umem->work, xdp_umem_release_deferred);
> > -             schedule_work(&umem->work);
> > -     }
> > +     if (refcount_dec_and_test(&umem->users))
> > +             xdp_umem_release(umem);
> >   }
> >
> >   static int xdp_umem_pin_pages(struct xdp_umem *umem, unsigned long address)
> > @@ -357,6 +228,7 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
> >       umem->size = size;
> >       umem->headroom = headroom;
> >       umem->chunk_size = chunk_size;
> > +     umem->chunks = chunks;
> >       umem->npgs = (u32)npgs;
> >       umem->pgs = NULL;
> >       umem->user = NULL;
> > @@ -374,16 +246,8 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
> >       if (err)
> >               goto out_account;
> >
> > -     umem->pool = xp_create(umem, chunks, chunk_size, headroom, size,
> > -                            unaligned_chunks);
> > -     if (!umem->pool) {
> > -             err = -ENOMEM;
> > -             goto out_pin;
> > -     }
> >       return 0;
> >
> > -out_pin:
> > -     xdp_umem_unpin_pages(umem);
> >   out_account:
> >       xdp_umem_unaccount_pages(umem);
> >       return err;
> > diff --git a/net/xdp/xdp_umem.h b/net/xdp/xdp_umem.h
> > index 32067fe..93e96be 100644
> > --- a/net/xdp/xdp_umem.h
> > +++ b/net/xdp/xdp_umem.h
> > @@ -8,8 +8,8 @@
> >
> >   #include <net/xdp_sock_drv.h>
> >
> > -int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
> > -                     u16 queue_id, u16 flags);
> > +void xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
> > +                      u16 queue_id);
> >   void xdp_umem_clear_dev(struct xdp_umem *umem);
> >   bool xdp_umem_validate_queues(struct xdp_umem *umem);
> >   void xdp_get_umem(struct xdp_umem *umem);
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index 7551f5b..b12a832 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -105,6 +105,46 @@ bool xsk_uses_need_wakeup(struct xsk_buff_pool *pool)
> >   }
> >   EXPORT_SYMBOL(xsk_uses_need_wakeup);
> >
> > +struct xsk_buff_pool *xsk_get_pool_from_qid(struct net_device *dev,
> > +                                         u16 queue_id)
> > +{
> > +     if (queue_id < dev->real_num_rx_queues)
> > +             return dev->_rx[queue_id].pool;
> > +     if (queue_id < dev->real_num_tx_queues)
> > +             return dev->_tx[queue_id].pool;
> > +
> > +     return NULL;
> > +}
> > +EXPORT_SYMBOL(xsk_get_pool_from_qid);
> > +
> > +void xsk_clear_pool_at_qid(struct net_device *dev, u16 queue_id)
> > +{
> > +     if (queue_id < dev->real_num_rx_queues)
> > +             dev->_rx[queue_id].pool = NULL;
> > +     if (queue_id < dev->real_num_tx_queues)
> > +             dev->_tx[queue_id].pool = NULL;
> > +}
> > +
> > +/* The buffer pool is stored both in the _rx struct and the _tx struct as we do
> > + * not know if the device has more tx queues than rx, or the opposite.
> > + * This might also change during run time.
> > + */
> > +int xsk_reg_pool_at_qid(struct net_device *dev, struct xsk_buff_pool *pool,
> > +                     u16 queue_id)
> > +{
> > +     if (queue_id >= max_t(unsigned int,
> > +                           dev->real_num_rx_queues,
> > +                           dev->real_num_tx_queues))
> > +             return -EINVAL;
> > +
> > +     if (queue_id < dev->real_num_rx_queues)
> > +             dev->_rx[queue_id].pool = pool;
> > +     if (queue_id < dev->real_num_tx_queues)
> > +             dev->_tx[queue_id].pool = pool;
> > +
> > +     return 0;
> > +}
> > +
> >   void xp_release(struct xdp_buff_xsk *xskb)
> >   {
> >       xskb->pool->free_heads[xskb->pool->free_heads_cnt++] = xskb;
> > @@ -281,7 +321,7 @@ bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc)
> >
> >       rcu_read_lock();
> >       list_for_each_entry_rcu(xs, &umem->xsk_tx_list, list) {
> > -             if (!xskq_cons_peek_desc(xs->tx, desc, umem))
> > +             if (!xskq_cons_peek_desc(xs->tx, desc, pool))
> >                       continue;
> >
> >               /* This is the backpressure mechanism for the Tx path.
> > @@ -347,7 +387,7 @@ static int xsk_generic_xmit(struct sock *sk)
> >       if (xs->queue_id >= xs->dev->real_num_tx_queues)
> >               goto out;
> >
> > -     while (xskq_cons_peek_desc(xs->tx, &desc, xs->umem)) {
> > +     while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) {
> >               char *buffer;
> >               u64 addr;
> >               u32 len;
> > @@ -629,6 +669,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
> >       qid = sxdp->sxdp_queue_id;
> >
> >       if (flags & XDP_SHARED_UMEM) {
> > +             struct xsk_buff_pool *curr_pool;
> >               struct xdp_sock *umem_xs;
> >               struct socket *sock;
> >
> > @@ -663,6 +704,11 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
> >                       goto out_unlock;
> >               }
> >
> > +             /* Share the buffer pool with the other socket. */
> > +             xp_get_pool(umem_xs->pool);
> > +             curr_pool = xs->pool;
> > +             xs->pool = umem_xs->pool;
> > +             xp_destroy(curr_pool);
> >               xdp_get_umem(umem_xs->umem);
> >               WRITE_ONCE(xs->umem, umem_xs->umem);
> >               sockfd_put(sock);
> > @@ -670,10 +716,24 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
> >               err = -EINVAL;
> >               goto out_unlock;
> >       } else {
> > +             struct xsk_buff_pool *new_pool;
> > +
> >               /* This xsk has its own umem. */
> > -             err = xdp_umem_assign_dev(xs->umem, dev, qid, flags);
> > -             if (err)
> > +             xdp_umem_assign_dev(xs->umem, dev, qid);
> > +             new_pool = xp_assign_umem(xs->pool, xs->umem);
>
> It looks like the old pool (xs->pool) is never freed.

This will go away in the new design.

Thanks: Magnus

> > +             if (!new_pool) {
> > +                     err = -ENOMEM;
> > +                     xdp_umem_clear_dev(xs->umem);
> > +                     goto out_unlock;
> > +             }
> > +
> > +             err = xp_assign_dev(new_pool, xs, dev, qid, flags);
> > +             if (err) {
> > +                     xp_destroy(new_pool);
> > +                     xdp_umem_clear_dev(xs->umem);
> >                       goto out_unlock;
> > +             }
> > +             xs->pool = new_pool;
> >       }
> >
> >       xs->dev = dev;
> > @@ -765,8 +825,6 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname,
> >                       return PTR_ERR(umem);
> >               }
> >
> > -             xs->pool = umem->pool;
> > -
> >               /* Make sure umem is ready before it can be seen by others */
> >               smp_wmb();
> >               WRITE_ONCE(xs->umem, umem);
> > @@ -796,7 +854,7 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname,
> >                       &xs->umem->cq;
> >               err = xsk_init_queue(entries, q, true);
> >               if (optname == XDP_UMEM_FILL_RING)
> > -                     xp_set_fq(xs->umem->pool, *q);
> > +                     xp_set_fq(xs->pool, *q);
> >               mutex_unlock(&xs->mutex);
> >               return err;
> >       }
> > @@ -1002,7 +1060,8 @@ static int xsk_notifier(struct notifier_block *this,
> >
> >                               xsk_unbind_dev(xs);
> >
> > -                             /* Clear device references in umem. */
> > +                             /* Clear device references. */
> > +                             xp_clear_dev(xs->pool);
> >                               xdp_umem_clear_dev(xs->umem);
> >                       }
> >                       mutex_unlock(&xs->mutex);
> > @@ -1047,7 +1106,7 @@ static void xsk_destruct(struct sock *sk)
> >       if (!sock_flag(sk, SOCK_DEAD))
> >               return;
> >
> > -     xdp_put_umem(xs->umem);
> > +     xp_put_pool(xs->pool);
> >
> >       sk_refcnt_debug_dec(sk);
> >   }
> > @@ -1055,8 +1114,8 @@ static void xsk_destruct(struct sock *sk)
> >   static int xsk_create(struct net *net, struct socket *sock, int protocol,
> >                     int kern)
> >   {
> > -     struct sock *sk;
> >       struct xdp_sock *xs;
> > +     struct sock *sk;
> >
> >       if (!ns_capable(net->user_ns, CAP_NET_RAW))
> >               return -EPERM;
> > @@ -1092,6 +1151,10 @@ static int xsk_create(struct net *net, struct socket *sock, int protocol,
> >       INIT_LIST_HEAD(&xs->map_list);
> >       spin_lock_init(&xs->map_list_lock);
> >
> > +     xs->pool = xp_create();
> > +     if (!xs->pool)
> > +             return -ENOMEM;
> > +
> >       mutex_lock(&net->xdp.lock);
> >       sk_add_node_rcu(sk, &net->xdp.list);
> >       mutex_unlock(&net->xdp.lock);
> > diff --git a/net/xdp/xsk.h b/net/xdp/xsk.h
> > index 455ddd4..a00e3e2 100644
> > --- a/net/xdp/xsk.h
> > +++ b/net/xdp/xsk.h
> > @@ -51,5 +51,8 @@ void xsk_map_try_sock_delete(struct xsk_map *map, struct xdp_sock *xs,
> >                            struct xdp_sock **map_entry);
> >   int xsk_map_inc(struct xsk_map *map);
> >   void xsk_map_put(struct xsk_map *map);
> > +void xsk_clear_pool_at_qid(struct net_device *dev, u16 queue_id);
> > +int xsk_reg_pool_at_qid(struct net_device *dev, struct xsk_buff_pool *pool,
> > +                     u16 queue_id);
> >
> >   #endif /* XSK_H_ */
> > diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
> > index c57f0bb..da93b36 100644
> > --- a/net/xdp/xsk_buff_pool.c
> > +++ b/net/xdp/xsk_buff_pool.c
> > @@ -2,11 +2,14 @@
> >
> >   #include <net/xsk_buff_pool.h>
> >   #include <net/xdp_sock.h>
> > +#include <net/xdp_sock_drv.h>
> >   #include <linux/dma-direct.h>
> >   #include <linux/dma-noncoherent.h>
> >   #include <linux/swiotlb.h>
> >
> >   #include "xsk_queue.h"
> > +#include "xdp_umem.h"
> > +#include "xsk.h"
> >
> >   static void xp_addr_unmap(struct xsk_buff_pool *pool)
> >   {
> > @@ -32,39 +35,48 @@ void xp_destroy(struct xsk_buff_pool *pool)
> >       kvfree(pool);
> >   }
> >
> > -struct xsk_buff_pool *xp_create(struct xdp_umem *umem, u32 chunks,
> > -                             u32 chunk_size, u32 headroom, u64 size,
> > -                             bool unaligned)
> > +struct xsk_buff_pool *xp_create(void)
> > +{
> > +     return kvzalloc(sizeof(struct xsk_buff_pool), GFP_KERNEL);
> > +}
> > +
> > +struct xsk_buff_pool *xp_assign_umem(struct xsk_buff_pool *pool_old,
> > +                                  struct xdp_umem *umem)
> >   {
> >       struct xsk_buff_pool *pool;
> >       struct xdp_buff_xsk *xskb;
> >       int err;
> >       u32 i;
> >
> > -     pool = kvzalloc(struct_size(pool, free_heads, chunks), GFP_KERNEL);
> > +     pool = kvzalloc(struct_size(pool, free_heads, umem->chunks),
> > +                     GFP_KERNEL);
> >       if (!pool)
> >               goto out;
> >
> > -     pool->heads = kvcalloc(chunks, sizeof(*pool->heads), GFP_KERNEL);
> > +     memcpy(pool, pool_old, sizeof(*pool_old));
> > +
> > +     pool->heads = kvcalloc(umem->chunks, sizeof(*pool->heads), GFP_KERNEL);
> >       if (!pool->heads)
> >               goto out;
> >
> > -     pool->chunk_mask = ~((u64)chunk_size - 1);
> > -     pool->addrs_cnt = size;
> > -     pool->heads_cnt = chunks;
> > -     pool->free_heads_cnt = chunks;
> > -     pool->headroom = headroom;
> > -     pool->chunk_size = chunk_size;
> > +     pool->chunk_mask = ~((u64)umem->chunk_size - 1);
> > +     pool->addrs_cnt = umem->size;
> > +     pool->heads_cnt = umem->chunks;
> > +     pool->free_heads_cnt = umem->chunks;
> > +     pool->headroom = umem->headroom;
> > +     pool->chunk_size = umem->chunk_size;
> >       pool->cheap_dma = true;
> > -     pool->unaligned = unaligned;
> > -     pool->frame_len = chunk_size - headroom - XDP_PACKET_HEADROOM;
> > +     pool->unaligned = umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG;
> > +     pool->frame_len = umem->chunk_size - umem->headroom -
> > +             XDP_PACKET_HEADROOM;
> >       pool->umem = umem;
> >       INIT_LIST_HEAD(&pool->free_list);
> > +     refcount_set(&pool->users, 1);
> >
> >       for (i = 0; i < pool->free_heads_cnt; i++) {
> >               xskb = &pool->heads[i];
> >               xskb->pool = pool;
> > -             xskb->xdp.frame_sz = chunk_size - headroom;
> > +             xskb->xdp.frame_sz = umem->chunk_size - umem->headroom;
> >               pool->free_heads[i] = xskb;
> >       }
> >
> > @@ -91,6 +103,120 @@ void xp_set_rxq_info(struct xsk_buff_pool *pool, struct xdp_rxq_info *rxq)
> >   }
> >   EXPORT_SYMBOL(xp_set_rxq_info);
> >
> > +int xp_assign_dev(struct xsk_buff_pool *pool, struct xdp_sock *xs,
> > +               struct net_device *dev, u16 queue_id, u16 flags)
> > +{
> > +     struct xdp_umem *umem = pool->umem;
> > +     bool force_zc, force_copy;
> > +     struct netdev_bpf bpf;
> > +     int err = 0;
> > +
> > +     ASSERT_RTNL();
> > +
> > +     force_zc = flags & XDP_ZEROCOPY;
> > +     force_copy = flags & XDP_COPY;
> > +
> > +     if (force_zc && force_copy)
> > +             return -EINVAL;
> > +
> > +     if (xsk_get_pool_from_qid(dev, queue_id))
> > +             return -EBUSY;
> > +
> > +     err = xsk_reg_pool_at_qid(dev, pool, queue_id);
> > +     if (err)
> > +             return err;
> > +
> > +     if ((flags & XDP_USE_NEED_WAKEUP) && xs->tx) {
> > +             umem->flags |= XDP_UMEM_USES_NEED_WAKEUP;
> > +             /* Tx needs to be explicitly woken up the first time.
> > +              * Also for supporting drivers that do not implement this
> > +              * feature. They will always have to call sendto().
> > +              */
> > +             xs->tx->ring->flags |= XDP_RING_NEED_WAKEUP;
> > +     }
> > +
> > +     if (force_copy)
> > +             /* For copy-mode, we are done. */
> > +             return 0;
> > +
> > +     if (!dev->netdev_ops->ndo_bpf || !dev->netdev_ops->ndo_xsk_wakeup) {
> > +             err = -EOPNOTSUPP;
> > +             goto err_unreg_pool;
> > +     }
> > +
> > +     bpf.command = XDP_SETUP_XSK_POOL;
> > +     bpf.xsk.pool = pool;
> > +     bpf.xsk.queue_id = queue_id;
> > +
> > +     err = dev->netdev_ops->ndo_bpf(dev, &bpf);
> > +     if (err)
> > +             goto err_unreg_pool;
> > +
> > +     umem->zc = true;
> > +     return 0;
> > +
> > +err_unreg_pool:
> > +     if (!force_zc)
> > +             err = 0; /* fallback to copy mode */
> > +     if (err)
> > +             xsk_clear_pool_at_qid(dev, queue_id);
> > +     return err;
> > +}
> > +
> > +void xp_clear_dev(struct xsk_buff_pool *pool)
> > +{
> > +     struct xdp_umem *umem = pool->umem;
> > +     struct netdev_bpf bpf;
> > +     int err;
> > +
> > +     ASSERT_RTNL();
> > +
> > +     if (!umem->dev)
> > +             return;
> > +
> > +     if (umem->zc) {
> > +             bpf.command = XDP_SETUP_XSK_POOL;
> > +             bpf.xsk.pool = NULL;
> > +             bpf.xsk.queue_id = umem->queue_id;
> > +
> > +             err = umem->dev->netdev_ops->ndo_bpf(umem->dev, &bpf);
> > +
> > +             if (err)
> > +                     WARN(1, "failed to disable umem!\n");
> > +     }
> > +
> > +     xsk_clear_pool_at_qid(umem->dev, umem->queue_id);
> > +}
> > +
> > +static void xp_release_deferred(struct work_struct *work)
> > +{
> > +     struct xsk_buff_pool *pool = container_of(work, struct xsk_buff_pool,
> > +                                               work);
> > +
> > +     rtnl_lock();
> > +     xp_clear_dev(pool);
> > +     rtnl_unlock();
> > +
> > +     xdp_put_umem(pool->umem);
> > +     xp_destroy(pool);
> > +}
> > +
> > +void xp_get_pool(struct xsk_buff_pool *pool)
> > +{
> > +     refcount_inc(&pool->users);
> > +}
> > +
> > +void xp_put_pool(struct xsk_buff_pool *pool)
> > +{
> > +     if (!pool)
> > +             return;
> > +
> > +     if (refcount_dec_and_test(&pool->users)) {
> > +             INIT_WORK(&pool->work, xp_release_deferred);
> > +             schedule_work(&pool->work);
> > +     }
> > +}
> > +
> >   void xp_dma_unmap(struct xsk_buff_pool *pool, unsigned long attrs)
> >   {
> >       dma_addr_t *dma;
> > diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> > index 5b5d24d..75f1853 100644
> > --- a/net/xdp/xsk_queue.h
> > +++ b/net/xdp/xsk_queue.h
> > @@ -165,9 +165,9 @@ static inline bool xp_validate_desc(struct xsk_buff_pool *pool,
> >
> >   static inline bool xskq_cons_is_valid_desc(struct xsk_queue *q,
> >                                          struct xdp_desc *d,
> > -                                        struct xdp_umem *umem)
> > +                                        struct xsk_buff_pool *pool)
> >   {
> > -     if (!xp_validate_desc(umem->pool, d)) {
> > +     if (!xp_validate_desc(pool, d)) {
> >               q->invalid_descs++;
> >               return false;
> >       }
> > @@ -176,14 +176,14 @@ static inline bool xskq_cons_is_valid_desc(struct xsk_queue *q,
> >
> >   static inline bool xskq_cons_read_desc(struct xsk_queue *q,
> >                                      struct xdp_desc *desc,
> > -                                    struct xdp_umem *umem)
> > +                                    struct xsk_buff_pool *pool)
> >   {
> >       while (q->cached_cons != q->cached_prod) {
> >               struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
> >               u32 idx = q->cached_cons & q->ring_mask;
> >
> >               *desc = ring->desc[idx];
> > -             if (xskq_cons_is_valid_desc(q, desc, umem))
> > +             if (xskq_cons_is_valid_desc(q, desc, pool))
> >                       return true;
> >
> >               q->cached_cons++;
> > @@ -235,11 +235,11 @@ static inline bool xskq_cons_peek_addr_unchecked(struct xsk_queue *q, u64 *addr)
> >
> >   static inline bool xskq_cons_peek_desc(struct xsk_queue *q,
> >                                      struct xdp_desc *desc,
> > -                                    struct xdp_umem *umem)
> > +                                    struct xsk_buff_pool *pool)
> >   {
> >       if (q->cached_prod == q->cached_cons)
> >               xskq_cons_get_entries(q);
> > -     return xskq_cons_read_desc(q, desc, umem);
> > +     return xskq_cons_read_desc(q, desc, pool);
> >   }
> >
> >   static inline void xskq_cons_release(struct xsk_queue *q)
> >
>


* Re: [PATCH bpf-next 00/14] xsk: support shared umems between devices and queues
  2020-07-08 15:00 ` Maxim Mikityanskiy
@ 2020-07-09  6:54   ` Magnus Karlsson
  2020-07-09 14:56     ` [PATCH 1/2] xsk: i40e: ice: ixgbe: mlx5: pass buffer pool to driver instead of umem Maxim Mikityanskiy
  0 siblings, 1 reply; 25+ messages in thread
From: Magnus Karlsson @ 2020-07-09  6:54 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: Magnus Karlsson, Björn Töpel, Alexei Starovoitov,
	Daniel Borkmann, Network Development, Jonathan Lemon, bpf,
	jeffrey.t.kirsher, Fijalkowski, Maciej, Maciej Fijalkowski,
	cristian.dumitrescu

On Wed, Jul 8, 2020 at 5:02 PM Maxim Mikityanskiy <maximmi@mellanox.com> wrote:
>
> On 2020-07-02 15:18, Magnus Karlsson wrote:
> > This patch set adds support to share a umem between AF_XDP sockets
> > bound to different queue ids on the same device or even between
> > devices. It has already been possible to do this by registering the
> > umem multiple times, but this wastes a lot of memory. Just imagine
> > having 10 threads each having 10 sockets open sharing a single
> > umem. This means that you would have to register the umem 100 times
> > consuming large quantities of memory.
>
> Just to clarify: the main memory savings are achieved, because we don't
> need to store an array of pages in struct xdp_umem multiple times, right?
>
> I guess there is one more drawback of sharing a UMEM the old way
> (register it multiple times): it would map (DMA) the same pages multiple
> times.

Both are correct. The main saving is from only having to DMA map the
umem once per device, i.e. your second comment.

> > Instead, we extend the existing XDP_SHARED_UMEM flag to also work when
> > sharing a umem between different queue ids as well as devices. If you
> > would like to share umem between two sockets, just create the first
> > one as would do normally. For the second socket you would not register
> > the same umem using the XDP_UMEM_REG setsockopt. Instead attach one
> > new fill ring and one new completion ring to this second socket and
> > then use the XDP_SHARED_UMEM bind flag supplying the file descriptor of
> > the first socket in the sxdp_shared_umem_fd field to signify that it
> > is the umem of the first socket you would like to share.
> >
> > One important thing to note in this example, is that there needs to be
> > one fill ring and one completion ring per unique device and queue id
> > bound to. This so that the single-producer and single-consumer semantics
> > of the rings can be upheld. To recap, if you bind multiple sockets to
> > the same device and queue id (already supported without this patch
> > set), you only need one pair of fill and completion rings. If you bind
> > multiple sockets to multiple different queues or devices, you need one
> > fill and completion ring pair per unique device,queue_id tuple.
> >
> > The implementation is based around extending the buffer pool in the
> > core xsk code. This is a structure that exists on a per unique device
> > and queue id basis. So, a number of entities that can now be shared
> > are moved from the umem to the buffer pool. Information about DMA
> > mappings are also moved from the buffer pool, but as these are per
> > device independent of the queue id, they are now hanging off the
> > netdev.
>
> Basically, you want to map a pair of (netdev, UMEM) to DMA info. The
> current implementation of xp_find_dma_map stores a list of UMEMs in the
> netdev and goes over that list to find the corresponding DMA info. It
> would be more effective to do it vice-versa, i.e. to store the list of
> netdevs inside of a UMEM, because you normally have fewer netdevs in the
> system than sockets, and you'll have fewer list items to traverse. Of
> course, it has no effect on the data path, but it will improve the time
> to open a socket (i.e. connection rate).

Good idea. Will fix.
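
Something like this, roughly (sketch only, not compile-tested; it
assumes the pool gets a netdev pointer and the umem carries the list,
and the names are not final):

/* One DMA mapping per (umem, netdev) pair, hanging off the umem. */
struct xsk_dma_map {
	struct list_head list;		/* node in umem->xsk_dma_list */
	struct net_device *dev;
	dma_addr_t *dma_pages;
	refcount_t users;
};

static struct xsk_dma_map *xp_find_dma_map(struct xsk_buff_pool *pool)
{
	struct xsk_dma_map *dma_map;

	list_for_each_entry(dma_map, &pool->umem->xsk_dma_list, list) {
		if (dma_map->dev == pool->netdev)
			return dma_map;
	}

	return NULL;
}

The list walked at bind time then has one entry per netdev the umem is
bound to, which should normally be much shorter.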

> > In summary after this patch set, there is one xdp_sock struct
> > per socket created. This points to an xsk_buff_pool for which there is
> > one per unique device and queue id. The buffer pool points to a DMA
> > mapping structure for which there is one per device that a umem has
> > been bound to. And finally, the buffer pool also points to a xdp_umem
> > struct, for which there is only one per umem registration.
> >
> > Before:
> >
> > XSK -> UMEM -> POOL
> >
> > Now:
> >
> > XSK -> POOL -> DMA
> >              \
> >            > UMEM
> >
> > Patches 1-8 only rearrange internal structures to support the buffer
> > pool carrying this new information, while patch 9 improves performance
> > as we now have rearrange the internal structures quite a bit. Finally,
> > patches 10-14 introduce the new functionality together with libbpf
> > support, samples, and documentation.
> >
> > Libbpf has also been extended to support sharing of umems between
> > sockets bound to different devices and queue ids by introducing a new
> > function called xsk_socket__create_shared(). The difference between
> > this and the existing xsk_socket__create() is that the former takes a
> > reference to a fill ring and a completion ring as these need to be
> > created. This new function needs to be used for the second and
> > following sockets that binds to the same umem. The first one can be
> > created by either function as it will also have called
> > xsk_umem__create().
> >
> > There is also a new sample xsk_fwd that demonstrates this new
> > interface and capability.
> >
> > Note to Maxim at Mellanox. I do not have a mlx5 card, so I have not
> > been able to test the changes to your driver. It compiles, but that is
> > all I can say, so it would be great if you could test it. Also, I did
> > change the name of many functions and variables from umem to pool as a
> > buffer pool is passed down to the driver in this patch set instead of
> > the umem. I did not change the name of the files umem.c and
> > umem.h. Please go through the changes and change things to your
> > liking.
>
> I looked through the mlx5 patches, and I see the changes are minor, and
> most importantly, the functionality is not broken (tested with xdpsock).
> I would still like to make some cosmetic amendments - I'll send you an
> updated patch.

Appreciated. Thanks.

> > Performance for the non-shared umem case is unchanged for the xdpsock
> > sample application with this patch set.
>
> I also tested it on mlx5 (ConnectX-5 Ex), and the performance hasn't
> been hurt.

Good to hear. I might include another patch in the v2 that improves
performance by 3% for the l2fwd sample app on my system. It is in
common code, so it should benefit everyone. It is, however, dependent
on a new DMA interface patch set from Christof making it from bpf to
bpf-next. If it makes it over in time, I will include it. Otherwise,
it will be submitted later.

/Magnus

> > For workloads that share a
> > umem, this patch set can give rise to added performance benefits due
> > to the decrease in memory usage.
> >
> > This patch has been applied against commit 91f77560e473 ("Merge branch 'test_progs-improvements'")
> >
> > Structure of the patch set:
> >
> > Patch 1: Pass the buffer pool to the driver instead of the umem. This
> >           because the driver needs one buffer pool per napi context
> >           when we later introduce sharing of the umem between queue ids
> >           and devices.
> > Patch 2: Rename the xsk driver interface so they have better names
> >           after the move to the buffer pool
> > Patch 3: There is one buffer pool per device and queue, while there is
> >           only one umem per registration. The buffer pool needs to be
> >           created and destroyed independently of the umem.
> > Patch 4: Move fill and completion rings to the buffer pool as there will
> >           be one set of these per device and queue
> > Patch 5: Move queue_id, dev and need_wakeup to buffer pool again as these
> >           will now be per buffer pool as the umem can be shared between
> >           devices and queues
> > Patch 6: Move xsk_tx_list and its lock to buffer pool
> > Patch 7: Move the creation/deletion of addrs from buffer pool to umem
> > Patch 8: Enable sharing of DMA mappings when multiple queues of the
> >           same device are bound
> > Patch 9: Rearrange internal structs for better performance as these
> >           have been substantially scrambled by the previous patches
> > Patch 10: Add shared umem support between queue ids
> > Patch 11: Add shared umem support between devices
> > Patch 12: Add support for this in libbpf
> > Patch 13: Add a new sample that demonstrates this new feature by
> >            forwarding packets between different netdevs and queues
> > Patch 14: Add documentation
> >
> > Thanks: Magnus
> >
> > Cristian Dumitrescu (1):
> >    samples/bpf: add new sample xsk_fwd.c
> >
> > Magnus Karlsson (13):
> >    xsk: i40e: ice: ixgbe: mlx5: pass buffer pool to driver instead of
> >      umem
> >    xsk: i40e: ice: ixgbe: mlx5: rename xsk zero-copy driver interfaces
> >    xsk: create and free context independently from umem
> >    xsk: move fill and completion rings to buffer pool
> >    xsk: move queue_id, dev and need_wakeup to context
> >    xsk: move xsk_tx_list and its lock to buffer pool
> >    xsk: move addrs from buffer pool to umem
> >    xsk: net: enable sharing of dma mappings
> >    xsk: rearrange internal structs for better performance
> >    xsk: add shared umem support between queue ids
> >    xsk: add shared umem support between devices
> >    libbpf: support shared umems between queues and devices
> >    xsk: documentation for XDP_SHARED_UMEM between queues and netdevs
> >
> >   Documentation/networking/af_xdp.rst                |   68 +-
> >   drivers/net/ethernet/intel/i40e/i40e_ethtool.c     |    2 +-
> >   drivers/net/ethernet/intel/i40e/i40e_main.c        |   29 +-
> >   drivers/net/ethernet/intel/i40e/i40e_txrx.c        |   10 +-
> >   drivers/net/ethernet/intel/i40e/i40e_txrx.h        |    2 +-
> >   drivers/net/ethernet/intel/i40e/i40e_xsk.c         |   79 +-
> >   drivers/net/ethernet/intel/i40e/i40e_xsk.h         |    4 +-
> >   drivers/net/ethernet/intel/ice/ice.h               |   18 +-
> >   drivers/net/ethernet/intel/ice/ice_base.c          |   16 +-
> >   drivers/net/ethernet/intel/ice/ice_lib.c           |    2 +-
> >   drivers/net/ethernet/intel/ice/ice_main.c          |   10 +-
> >   drivers/net/ethernet/intel/ice/ice_txrx.c          |    8 +-
> >   drivers/net/ethernet/intel/ice/ice_txrx.h          |    2 +-
> >   drivers/net/ethernet/intel/ice/ice_xsk.c           |  142 +--
> >   drivers/net/ethernet/intel/ice/ice_xsk.h           |    7 +-
> >   drivers/net/ethernet/intel/ixgbe/ixgbe.h           |    2 +-
> >   drivers/net/ethernet/intel/ixgbe/ixgbe_main.c      |   34 +-
> >   .../net/ethernet/intel/ixgbe/ixgbe_txrx_common.h   |    7 +-
> >   drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c       |   61 +-
> >   drivers/net/ethernet/mellanox/mlx5/core/en.h       |   19 +-
> >   drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c   |    5 +-
> >   .../net/ethernet/mellanox/mlx5/core/en/xsk/rx.h    |   10 +-
> >   .../net/ethernet/mellanox/mlx5/core/en/xsk/setup.c |   12 +-
> >   .../net/ethernet/mellanox/mlx5/core/en/xsk/setup.h |    2 +-
> >   .../net/ethernet/mellanox/mlx5/core/en/xsk/tx.c    |   12 +-
> >   .../net/ethernet/mellanox/mlx5/core/en/xsk/tx.h    |    6 +-
> >   .../net/ethernet/mellanox/mlx5/core/en/xsk/umem.c  |  108 +-
> >   .../net/ethernet/mellanox/mlx5/core/en/xsk/umem.h  |   14 +-
> >   drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   46 +-
> >   drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    |   16 +-
> >   include/linux/netdevice.h                          |   13 +-
> >   include/net/xdp_sock.h                             |   28 +-
> >   include/net/xdp_sock_drv.h                         |  115 ++-
> >   include/net/xsk_buff_pool.h                        |   47 +-
> >   net/core/dev.c                                     |    3 +
> >   net/ethtool/channels.c                             |    2 +-
> >   net/ethtool/ioctl.c                                |    2 +-
> >   net/xdp/xdp_umem.c                                 |  221 +---
> >   net/xdp/xdp_umem.h                                 |    6 -
> >   net/xdp/xsk.c                                      |  213 ++--
> >   net/xdp/xsk.h                                      |    3 +
> >   net/xdp/xsk_buff_pool.c                            |  314 +++++-
> >   net/xdp/xsk_diag.c                                 |   14 +-
> >   net/xdp/xsk_queue.h                                |   12 +-
> >   samples/bpf/Makefile                               |    3 +
> >   samples/bpf/xsk_fwd.c                              | 1075 ++++++++++++++++++++
> >   tools/lib/bpf/libbpf.map                           |    1 +
> >   tools/lib/bpf/xsk.c                                |  376 ++++---
> >   tools/lib/bpf/xsk.h                                |    9 +
> >   49 files changed, 2327 insertions(+), 883 deletions(-)
> >   create mode 100644 samples/bpf/xsk_fwd.c
> >
> > --
> > 2.7.4
> >
>


* [PATCH 1/2] xsk: i40e: ice: ixgbe: mlx5: pass buffer pool to driver instead of umem
  2020-07-09  6:54   ` Magnus Karlsson
@ 2020-07-09 14:56     ` Maxim Mikityanskiy
  2020-07-09 14:56       ` [PATCH 2/2] xsk: i40e: ice: ixgbe: mlx5: rename xsk zero-copy driver interfaces Maxim Mikityanskiy
  0 siblings, 1 reply; 25+ messages in thread
From: Maxim Mikityanskiy @ 2020-07-09 14:56 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: Björn Töpel, Alexei Starovoitov, Daniel Borkmann,
	Network Development, Jonathan Lemon, bpf, jeffrey.t.kirsher,
	Fijalkowski Maciej, Maciej Fijalkowski, cristian.dumitrescu,
	Maxim Mikityanskiy

From: Magnus Karlsson <magnus.karlsson@intel.com>

Replace the explicit umem reference passed to the driver in
AF_XDP zero-copy mode with the buffer pool. This is in
preparation for extending the functionality of the zero-copy mode
so that umems can be shared between queues on the same netdev and
also between netdevs. In this commit, only an umem reference has
been added to the buffer pool struct. Later commits will add
other entities to it: entities that differ between queue ids and
netdevs even though the umem is shared between them.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
---
Magnus, here are my minor changes to the mlx5 part (only mlx5 is
included in these two patches). I renamed pool to xsk_pool when used
outside of en/xsk and renamed umem.{c,h} to pool.{c,h}.

 .../net/ethernet/mellanox/mlx5/core/Makefile  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  19 +--
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |   5 +-
 .../mlx5/core/en/xsk/{umem.c => pool.c}       | 110 +++++++++---------
 .../ethernet/mellanox/mlx5/core/en/xsk/pool.h |  27 +++++
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.h   |  10 +-
 .../mellanox/mlx5/core/en/xsk/setup.c         |  12 +-
 .../mellanox/mlx5/core/en/xsk/setup.h         |   2 +-
 .../ethernet/mellanox/mlx5/core/en/xsk/tx.c   |  14 +--
 .../ethernet/mellanox/mlx5/core/en/xsk/tx.h   |   6 +-
 .../ethernet/mellanox/mlx5/core/en/xsk/umem.h |  29 -----
 .../ethernet/mellanox/mlx5/core/en_ethtool.c  |   2 +-
 .../mellanox/mlx5/core/en_fs_ethtool.c        |   2 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  49 ++++----
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   |  16 +--
 15 files changed, 152 insertions(+), 153 deletions(-)
 rename drivers/net/ethernet/mellanox/mlx5/core/en/xsk/{umem.c => pool.c} (51%)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.h
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 1e7c7f10db6e..e4a02885a603 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -24,7 +24,7 @@ mlx5_core-y :=	main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o \
 		en_tx.o en_rx.o en_dim.o en_txrx.o en/xdp.o en_stats.o \
 		en_selftest.o en/port.o en/monitor_stats.o en/health.o \
-		en/reporter_tx.o en/reporter_rx.o en/params.o en/xsk/umem.o \
+		en/reporter_tx.o en/reporter_rx.o en/params.o en/xsk/pool.o \
 		en/xsk/setup.o en/xsk/rx.o en/xsk/tx.o en/devlink.o
 
 #
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index c44669102626..d95296e28b97 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -441,7 +441,7 @@ struct mlx5e_xdpsq {
 	struct mlx5e_cq            cq;
 
 	/* read only */
-	struct xdp_umem           *umem;
+	struct xsk_buff_pool      *xsk_pool;
 	struct mlx5_wq_cyc         wq;
 	struct mlx5e_xdpsq_stats  *stats;
 	mlx5e_fp_xmit_xdp_frame_check xmit_xdp_frame_check;
@@ -603,7 +603,7 @@ struct mlx5e_rq {
 	struct page_pool      *page_pool;
 
 	/* AF_XDP zero-copy */
-	struct xdp_umem       *umem;
+	struct xsk_buff_pool  *xsk_pool;
 
 	struct work_struct     recover_work;
 
@@ -726,12 +726,13 @@ struct mlx5e_hv_vhca_stats_agent {
 #endif
 
 struct mlx5e_xsk {
-	/* UMEMs are stored separately from channels, because we don't want to
-	 * lose them when channels are recreated. The kernel also stores UMEMs,
-	 * but it doesn't distinguish between zero-copy and non-zero-copy UMEMs,
-	 * so rely on our mechanism.
+	/* XSK buffer pools are stored separately from channels,
+	 * because we don't want to lose them when channels are
+	 * recreated. The kernel also stores buffer pool, but it doesn't
+	 * distinguish between zero-copy and non-zero-copy UMEMs, so
+	 * rely on our mechanism.
 	 */
-	struct xdp_umem **umems;
+	struct xsk_buff_pool **pools;
 	u16 refcnt;
 	bool ever_used;
 };
@@ -923,7 +924,7 @@ struct mlx5e_xsk_param;
 struct mlx5e_rq_param;
 int mlx5e_open_rq(struct mlx5e_channel *c, struct mlx5e_params *params,
 		  struct mlx5e_rq_param *param, struct mlx5e_xsk_param *xsk,
-		  struct xdp_umem *umem, struct mlx5e_rq *rq);
+		  struct xsk_buff_pool *xsk_pool, struct mlx5e_rq *rq);
 int mlx5e_wait_for_min_rx_wqes(struct mlx5e_rq *rq, int wait_time);
 void mlx5e_deactivate_rq(struct mlx5e_rq *rq);
 void mlx5e_close_rq(struct mlx5e_rq *rq);
@@ -933,7 +934,7 @@ int mlx5e_open_icosq(struct mlx5e_channel *c, struct mlx5e_params *params,
 		     struct mlx5e_sq_param *param, struct mlx5e_icosq *sq);
 void mlx5e_close_icosq(struct mlx5e_icosq *sq);
 int mlx5e_open_xdpsq(struct mlx5e_channel *c, struct mlx5e_params *params,
-		     struct mlx5e_sq_param *param, struct xdp_umem *umem,
+		     struct mlx5e_sq_param *param, struct xsk_buff_pool *xsk_pool,
 		     struct mlx5e_xdpsq *sq, bool is_redirect);
 void mlx5e_close_xdpsq(struct mlx5e_xdpsq *sq);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index c9d308e91965..c2e06f5a092f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -446,7 +446,7 @@ bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq *cq)
 	} while ((++i < MLX5E_TX_CQ_POLL_BUDGET) && (cqe = mlx5_cqwq_get_cqe(&cq->wq)));
 
 	if (xsk_frames)
-		xsk_umem_complete_tx(sq->umem, xsk_frames);
+		xsk_umem_complete_tx(sq->xsk_pool->umem, xsk_frames);
 
 	sq->stats->cqes += i;
 
@@ -476,7 +476,7 @@ void mlx5e_free_xdpsq_descs(struct mlx5e_xdpsq *sq)
 	}
 
 	if (xsk_frames)
-		xsk_umem_complete_tx(sq->umem, xsk_frames);
+		xsk_umem_complete_tx(sq->xsk_pool->umem, xsk_frames);
 }
 
 int mlx5e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
@@ -561,4 +561,3 @@ void mlx5e_set_xmit_fp(struct mlx5e_xdpsq *sq, bool is_mpw)
 	sq->xmit_xdp_frame = is_mpw ?
 		mlx5e_xmit_xdp_frame_mpwqe : mlx5e_xmit_xdp_frame;
 }
-
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c
similarity index 51%
rename from drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.c
rename to drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c
index 331ca2b0f8a4..8ccd9203ee25 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c
@@ -1,31 +1,31 @@
 // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
-/* Copyright (c) 2019 Mellanox Technologies. */
+/* Copyright (c) 2019-2020, Mellanox Technologies inc. All rights reserved. */
 
 #include <net/xdp_sock_drv.h>
-#include "umem.h"
+#include "pool.h"
 #include "setup.h"
 #include "en/params.h"
 
-static int mlx5e_xsk_map_umem(struct mlx5e_priv *priv,
-			      struct xdp_umem *umem)
+static int mlx5e_xsk_map_pool(struct mlx5e_priv *priv,
+			      struct xsk_buff_pool *pool)
 {
 	struct device *dev = priv->mdev->device;
 
-	return xsk_buff_dma_map(umem, dev, 0);
+	return xsk_buff_dma_map(pool->umem, dev, 0);
 }
 
-static void mlx5e_xsk_unmap_umem(struct mlx5e_priv *priv,
-				 struct xdp_umem *umem)
+static void mlx5e_xsk_unmap_pool(struct mlx5e_priv *priv,
+				 struct xsk_buff_pool *pool)
 {
-	return xsk_buff_dma_unmap(umem, 0);
+	return xsk_buff_dma_unmap(pool->umem, 0);
 }
 
-static int mlx5e_xsk_get_umems(struct mlx5e_xsk *xsk)
+static int mlx5e_xsk_get_pools(struct mlx5e_xsk *xsk)
 {
-	if (!xsk->umems) {
-		xsk->umems = kcalloc(MLX5E_MAX_NUM_CHANNELS,
-				     sizeof(*xsk->umems), GFP_KERNEL);
-		if (unlikely(!xsk->umems))
+	if (!xsk->pools) {
+		xsk->pools = kcalloc(MLX5E_MAX_NUM_CHANNELS,
+				     sizeof(*xsk->pools), GFP_KERNEL);
+		if (unlikely(!xsk->pools))
 			return -ENOMEM;
 	}
 
@@ -35,68 +35,68 @@ static int mlx5e_xsk_get_umems(struct mlx5e_xsk *xsk)
 	return 0;
 }
 
-static void mlx5e_xsk_put_umems(struct mlx5e_xsk *xsk)
+static void mlx5e_xsk_put_pools(struct mlx5e_xsk *xsk)
 {
 	if (!--xsk->refcnt) {
-		kfree(xsk->umems);
-		xsk->umems = NULL;
+		kfree(xsk->pools);
+		xsk->pools = NULL;
 	}
 }
 
-static int mlx5e_xsk_add_umem(struct mlx5e_xsk *xsk, struct xdp_umem *umem, u16 ix)
+static int mlx5e_xsk_add_pool(struct mlx5e_xsk *xsk, struct xsk_buff_pool *pool, u16 ix)
 {
 	int err;
 
-	err = mlx5e_xsk_get_umems(xsk);
+	err = mlx5e_xsk_get_pools(xsk);
 	if (unlikely(err))
 		return err;
 
-	xsk->umems[ix] = umem;
+	xsk->pools[ix] = pool;
 	return 0;
 }
 
-static void mlx5e_xsk_remove_umem(struct mlx5e_xsk *xsk, u16 ix)
+static void mlx5e_xsk_remove_pool(struct mlx5e_xsk *xsk, u16 ix)
 {
-	xsk->umems[ix] = NULL;
+	xsk->pools[ix] = NULL;
 
-	mlx5e_xsk_put_umems(xsk);
+	mlx5e_xsk_put_pools(xsk);
 }
 
-static bool mlx5e_xsk_is_umem_sane(struct xdp_umem *umem)
+static bool mlx5e_xsk_is_pool_sane(struct xsk_buff_pool *pool)
 {
-	return xsk_umem_get_headroom(umem) <= 0xffff &&
-		xsk_umem_get_chunk_size(umem) <= 0xffff;
+	return xsk_umem_get_headroom(pool->umem) <= 0xffff &&
+		xsk_umem_get_chunk_size(pool->umem) <= 0xffff;
 }
 
-void mlx5e_build_xsk_param(struct xdp_umem *umem, struct mlx5e_xsk_param *xsk)
+void mlx5e_build_xsk_param(struct xsk_buff_pool *pool, struct mlx5e_xsk_param *xsk)
 {
-	xsk->headroom = xsk_umem_get_headroom(umem);
-	xsk->chunk_size = xsk_umem_get_chunk_size(umem);
+	xsk->headroom = xsk_umem_get_headroom(pool->umem);
+	xsk->chunk_size = xsk_umem_get_chunk_size(pool->umem);
 }
 
 static int mlx5e_xsk_enable_locked(struct mlx5e_priv *priv,
-				   struct xdp_umem *umem, u16 ix)
+				   struct xsk_buff_pool *pool, u16 ix)
 {
 	struct mlx5e_params *params = &priv->channels.params;
 	struct mlx5e_xsk_param xsk;
 	struct mlx5e_channel *c;
 	int err;
 
-	if (unlikely(mlx5e_xsk_get_umem(&priv->channels.params, &priv->xsk, ix)))
+	if (unlikely(mlx5e_xsk_get_pool(&priv->channels.params, &priv->xsk, ix)))
 		return -EBUSY;
 
-	if (unlikely(!mlx5e_xsk_is_umem_sane(umem)))
+	if (unlikely(!mlx5e_xsk_is_pool_sane(pool)))
 		return -EINVAL;
 
-	err = mlx5e_xsk_map_umem(priv, umem);
+	err = mlx5e_xsk_map_pool(priv, pool);
 	if (unlikely(err))
 		return err;
 
-	err = mlx5e_xsk_add_umem(&priv->xsk, umem, ix);
+	err = mlx5e_xsk_add_pool(&priv->xsk, pool, ix);
 	if (unlikely(err))
-		goto err_unmap_umem;
+		goto err_unmap_pool;
 
-	mlx5e_build_xsk_param(umem, &xsk);
+	mlx5e_build_xsk_param(pool, &xsk);
 
 	if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) {
 		/* XSK objects will be created on open. */
@@ -112,9 +112,9 @@ static int mlx5e_xsk_enable_locked(struct mlx5e_priv *priv,
 
 	c = priv->channels.c[ix];
 
-	err = mlx5e_open_xsk(priv, params, &xsk, umem, c);
+	err = mlx5e_open_xsk(priv, params, &xsk, pool, c);
 	if (unlikely(err))
-		goto err_remove_umem;
+		goto err_remove_pool;
 
 	mlx5e_activate_xsk(c);
 
@@ -132,11 +132,11 @@ static int mlx5e_xsk_enable_locked(struct mlx5e_priv *priv,
 	mlx5e_deactivate_xsk(c);
 	mlx5e_close_xsk(c);
 
-err_remove_umem:
-	mlx5e_xsk_remove_umem(&priv->xsk, ix);
+err_remove_pool:
+	mlx5e_xsk_remove_pool(&priv->xsk, ix);
 
-err_unmap_umem:
-	mlx5e_xsk_unmap_umem(priv, umem);
+err_unmap_pool:
+	mlx5e_xsk_unmap_pool(priv, pool);
 
 	return err;
 
@@ -146,7 +146,7 @@ static int mlx5e_xsk_enable_locked(struct mlx5e_priv *priv,
 	 */
 	if (!mlx5e_validate_xsk_param(params, &xsk, priv->mdev)) {
 		err = -EINVAL;
-		goto err_remove_umem;
+		goto err_remove_pool;
 	}
 
 	return 0;
@@ -154,45 +154,45 @@ static int mlx5e_xsk_enable_locked(struct mlx5e_priv *priv,
 
 static int mlx5e_xsk_disable_locked(struct mlx5e_priv *priv, u16 ix)
 {
-	struct xdp_umem *umem = mlx5e_xsk_get_umem(&priv->channels.params,
+	struct xsk_buff_pool *pool = mlx5e_xsk_get_pool(&priv->channels.params,
 						   &priv->xsk, ix);
 	struct mlx5e_channel *c;
 
-	if (unlikely(!umem))
+	if (unlikely(!pool))
 		return -EINVAL;
 
 	if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
-		goto remove_umem;
+		goto remove_pool;
 
 	/* XSK RQ and SQ are only created if XDP program is set. */
 	if (!priv->channels.params.xdp_prog)
-		goto remove_umem;
+		goto remove_pool;
 
 	c = priv->channels.c[ix];
 	mlx5e_xsk_redirect_rqt_to_drop(priv, ix);
 	mlx5e_deactivate_xsk(c);
 	mlx5e_close_xsk(c);
 
-remove_umem:
-	mlx5e_xsk_remove_umem(&priv->xsk, ix);
-	mlx5e_xsk_unmap_umem(priv, umem);
+remove_pool:
+	mlx5e_xsk_remove_pool(&priv->xsk, ix);
+	mlx5e_xsk_unmap_pool(priv, pool);
 
 	return 0;
 }
 
-static int mlx5e_xsk_enable_umem(struct mlx5e_priv *priv, struct xdp_umem *umem,
+static int mlx5e_xsk_enable_pool(struct mlx5e_priv *priv, struct xsk_buff_pool *pool,
 				 u16 ix)
 {
 	int err;
 
 	mutex_lock(&priv->state_lock);
-	err = mlx5e_xsk_enable_locked(priv, umem, ix);
+	err = mlx5e_xsk_enable_locked(priv, pool, ix);
 	mutex_unlock(&priv->state_lock);
 
 	return err;
 }
 
-static int mlx5e_xsk_disable_umem(struct mlx5e_priv *priv, u16 ix)
+static int mlx5e_xsk_disable_pool(struct mlx5e_priv *priv, u16 ix)
 {
 	int err;
 
@@ -203,7 +203,7 @@ static int mlx5e_xsk_disable_umem(struct mlx5e_priv *priv, u16 ix)
 	return err;
 }
 
-int mlx5e_xsk_setup_umem(struct net_device *dev, struct xdp_umem *umem, u16 qid)
+int mlx5e_xsk_setup_pool(struct net_device *dev, struct xsk_buff_pool *pool, u16 qid)
 {
 	struct mlx5e_priv *priv = netdev_priv(dev);
 	struct mlx5e_params *params = &priv->channels.params;
@@ -212,6 +212,6 @@ int mlx5e_xsk_setup_umem(struct net_device *dev, struct xdp_umem *umem, u16 qid)
 	if (unlikely(!mlx5e_qid_get_ch_if_in_group(params, qid, MLX5E_RQ_GROUP_XSK, &ix)))
 		return -EINVAL;
 
-	return umem ? mlx5e_xsk_enable_umem(priv, umem, ix) :
-		      mlx5e_xsk_disable_umem(priv, ix);
+	return pool ? mlx5e_xsk_enable_pool(priv, pool, ix) :
+		      mlx5e_xsk_disable_pool(priv, ix);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.h
new file mode 100644
index 000000000000..dca0010a0866
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2019-2020, Mellanox Technologies inc. All rights reserved. */
+
+#ifndef __MLX5_EN_XSK_POOL_H__
+#define __MLX5_EN_XSK_POOL_H__
+
+#include "en.h"
+
+static inline struct xsk_buff_pool *mlx5e_xsk_get_pool(struct mlx5e_params *params,
+						       struct mlx5e_xsk *xsk, u16 ix)
+{
+	if (!xsk || !xsk->pools)
+		return NULL;
+
+	if (unlikely(ix >= params->num_channels))
+		return NULL;
+
+	return xsk->pools[ix];
+}
+
+struct mlx5e_xsk_param;
+void mlx5e_build_xsk_param(struct xsk_buff_pool *pool, struct mlx5e_xsk_param *xsk);
+
+/* .ndo_bpf callback. */
+int mlx5e_xsk_setup_pool(struct net_device *dev, struct xsk_buff_pool *pool, u16 qid);
+
+#endif /* __MLX5_EN_XSK_POOL_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
index d147b2f13b54..3dd056a11bae 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
@@ -19,10 +19,10 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq,
 					      struct mlx5e_wqe_frag_info *wi,
 					      u32 cqe_bcnt);
 
-static inline int mlx5e_xsk_page_alloc_umem(struct mlx5e_rq *rq,
+static inline int mlx5e_xsk_page_alloc_pool(struct mlx5e_rq *rq,
 					    struct mlx5e_dma_info *dma_info)
 {
-	dma_info->xsk = xsk_buff_alloc(rq->umem);
+	dma_info->xsk = xsk_buff_alloc(rq->xsk_pool->umem);
 	if (!dma_info->xsk)
 		return -ENOMEM;
 
@@ -38,13 +38,13 @@ static inline int mlx5e_xsk_page_alloc_umem(struct mlx5e_rq *rq,
 
 static inline bool mlx5e_xsk_update_rx_wakeup(struct mlx5e_rq *rq, bool alloc_err)
 {
-	if (!xsk_umem_uses_need_wakeup(rq->umem))
+	if (!xsk_umem_uses_need_wakeup(rq->xsk_pool->umem))
 		return alloc_err;
 
 	if (unlikely(alloc_err))
-		xsk_set_rx_need_wakeup(rq->umem);
+		xsk_set_rx_need_wakeup(rq->xsk_pool->umem);
 	else
-		xsk_clear_rx_need_wakeup(rq->umem);
+		xsk_clear_rx_need_wakeup(rq->xsk_pool->umem);
 
 	return false;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
index cc46414773b5..a0b9dff33be9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
@@ -44,7 +44,7 @@ static void mlx5e_build_xsk_cparam(struct mlx5e_priv *priv,
 }
 
 int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params,
-		   struct mlx5e_xsk_param *xsk, struct xdp_umem *umem,
+		   struct mlx5e_xsk_param *xsk, struct xsk_buff_pool *pool,
 		   struct mlx5e_channel *c)
 {
 	struct mlx5e_channel_param *cparam;
@@ -63,7 +63,7 @@ int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params,
 	if (unlikely(err))
 		goto err_free_cparam;
 
-	err = mlx5e_open_rq(c, params, &cparam->rq, xsk, umem, &c->xskrq);
+	err = mlx5e_open_rq(c, params, &cparam->rq, xsk, pool, &c->xskrq);
 	if (unlikely(err))
 		goto err_close_rx_cq;
 
@@ -71,13 +71,13 @@ int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params,
 	if (unlikely(err))
 		goto err_close_rq;
 
-	/* Create a separate SQ, so that when the UMEM is disabled, we could
+	/* Create a separate SQ, so that when the buff pool is disabled, we could
 	 * close this SQ safely and stop receiving CQEs. In other case, e.g., if
-	 * the XDPSQ was used instead, we might run into trouble when the UMEM
+	 * the XDPSQ was used instead, we might run into trouble when the buff pool
 	 * is disabled and then reenabled, but the SQ continues receiving CQEs
-	 * from the old UMEM.
+	 * from the old buff pool.
 	 */
-	err = mlx5e_open_xdpsq(c, params, &cparam->xdp_sq, umem, &c->xsksq, true);
+	err = mlx5e_open_xdpsq(c, params, &cparam->xdp_sq, pool, &c->xsksq, true);
 	if (unlikely(err))
 		goto err_close_tx_cq;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.h
index 0dd11b81c046..ca20f1ff5e39 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.h
@@ -12,7 +12,7 @@ bool mlx5e_validate_xsk_param(struct mlx5e_params *params,
 			      struct mlx5e_xsk_param *xsk,
 			      struct mlx5_core_dev *mdev);
 int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params,
-		   struct mlx5e_xsk_param *xsk, struct xdp_umem *umem,
+		   struct mlx5e_xsk_param *xsk, struct xsk_buff_pool *pool,
 		   struct mlx5e_channel *c);
 void mlx5e_close_xsk(struct mlx5e_channel *c);
 void mlx5e_activate_xsk(struct mlx5e_channel *c);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
index e0b3c61af93e..e46ca8620ea9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
@@ -2,7 +2,7 @@
 /* Copyright (c) 2019 Mellanox Technologies. */
 
 #include "tx.h"
-#include "umem.h"
+#include "pool.h"
 #include "en/xdp.h"
 #include "en/params.h"
 #include <net/xdp_sock_drv.h>
@@ -66,7 +66,7 @@ static void mlx5e_xsk_tx_post_err(struct mlx5e_xdpsq *sq,
 
 bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget)
 {
-	struct xdp_umem *umem = sq->umem;
+	struct xsk_buff_pool *pool = sq->xsk_pool;
 	struct mlx5e_xdp_info xdpi;
 	struct mlx5e_xdp_xmit_data xdptxd;
 	bool work_done = true;
@@ -83,7 +83,7 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget)
 			break;
 		}
 
-		if (!xsk_umem_consume_tx(umem, &desc)) {
+		if (!xsk_umem_consume_tx(pool->umem, &desc)) {
 			/* TX will get stuck until something wakes it up by
 			 * triggering NAPI. Currently it's expected that the
 			 * application calls sendto() if there are consumed, but
@@ -92,11 +92,11 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget)
 			break;
 		}
 
-		xdptxd.dma_addr = xsk_buff_raw_get_dma(umem, desc.addr);
-		xdptxd.data = xsk_buff_raw_get_data(umem, desc.addr);
+		xdptxd.dma_addr = xsk_buff_raw_get_dma(pool->umem, desc.addr);
+		xdptxd.data = xsk_buff_raw_get_data(pool->umem, desc.addr);
 		xdptxd.len = desc.len;
 
-		xsk_buff_raw_dma_sync_for_device(umem, xdptxd.dma_addr, xdptxd.len);
+		xsk_buff_raw_dma_sync_for_device(pool->umem, xdptxd.dma_addr, xdptxd.len);
 
 		if (unlikely(!sq->xmit_xdp_frame(sq, &xdptxd, &xdpi, check_result))) {
 			if (sq->mpwqe.wqe)
@@ -113,7 +113,7 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget)
 			mlx5e_xdp_mpwqe_complete(sq);
 		mlx5e_xmit_xdp_doorbell(sq);
 
-		xsk_umem_consume_tx_done(umem);
+		xsk_umem_consume_tx_done(pool->umem);
 	}
 
 	return !(budget && work_done);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h
index 39fa0a705856..ddb61d5bc2db 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h
@@ -15,13 +15,13 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget);
 
 static inline void mlx5e_xsk_update_tx_wakeup(struct mlx5e_xdpsq *sq)
 {
-	if (!xsk_umem_uses_need_wakeup(sq->umem))
+	if (!xsk_umem_uses_need_wakeup(sq->xsk_pool->umem))
 		return;
 
 	if (sq->pc != sq->cc)
-		xsk_clear_tx_need_wakeup(sq->umem);
+		xsk_clear_tx_need_wakeup(sq->xsk_pool->umem);
 	else
-		xsk_set_tx_need_wakeup(sq->umem);
+		xsk_set_tx_need_wakeup(sq->xsk_pool->umem);
 }
 
 #endif /* __MLX5_EN_XSK_TX_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.h
deleted file mode 100644
index bada94973586..000000000000
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.h
+++ /dev/null
@@ -1,29 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
-/* Copyright (c) 2019 Mellanox Technologies. */
-
-#ifndef __MLX5_EN_XSK_UMEM_H__
-#define __MLX5_EN_XSK_UMEM_H__
-
-#include "en.h"
-
-static inline struct xdp_umem *mlx5e_xsk_get_umem(struct mlx5e_params *params,
-						  struct mlx5e_xsk *xsk, u16 ix)
-{
-	if (!xsk || !xsk->umems)
-		return NULL;
-
-	if (unlikely(ix >= params->num_channels))
-		return NULL;
-
-	return xsk->umems[ix];
-}
-
-struct mlx5e_xsk_param;
-void mlx5e_build_xsk_param(struct xdp_umem *umem, struct mlx5e_xsk_param *xsk);
-
-/* .ndo_bpf callback. */
-int mlx5e_xsk_setup_umem(struct net_device *dev, struct xdp_umem *umem, u16 qid);
-
-int mlx5e_xsk_resize_reuseq(struct xdp_umem *umem, u32 nentries);
-
-#endif /* __MLX5_EN_XSK_UMEM_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index ec5658bbe3c5..73321cd86246 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -32,7 +32,7 @@
 
 #include "en.h"
 #include "en/port.h"
-#include "en/xsk/umem.h"
+#include "en/xsk/pool.h"
 #include "lib/clock.h"
 
 void mlx5e_ethtool_get_drvinfo(struct mlx5e_priv *priv,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
index 83c9b2bbc4af..b416a8ee2eed 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
@@ -33,7 +33,7 @@
 #include <linux/mlx5/fs.h>
 #include "en.h"
 #include "en/params.h"
-#include "en/xsk/umem.h"
+#include "en/xsk/pool.h"
 
 struct mlx5e_ethtool_rule {
 	struct list_head             list;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index b04c8572adea..42e80165ca6c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -58,7 +58,7 @@
 #include "en/monitor_stats.h"
 #include "en/health.h"
 #include "en/params.h"
-#include "en/xsk/umem.h"
+#include "en/xsk/pool.h"
 #include "en/xsk/setup.h"
 #include "en/xsk/rx.h"
 #include "en/xsk/tx.h"
@@ -365,7 +365,7 @@ static void mlx5e_rq_err_cqe_work(struct work_struct *recover_work)
 static int mlx5e_alloc_rq(struct mlx5e_channel *c,
 			  struct mlx5e_params *params,
 			  struct mlx5e_xsk_param *xsk,
-			  struct xdp_umem *umem,
+			  struct xsk_buff_pool *xsk_pool,
 			  struct mlx5e_rq_param *rqp,
 			  struct mlx5e_rq *rq)
 {
@@ -391,9 +391,9 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
 	rq->mdev    = mdev;
 	rq->hw_mtu  = MLX5E_SW2HW_MTU(params, params->sw_mtu);
 	rq->xdpsq   = &c->rq_xdpsq;
-	rq->umem    = umem;
+	rq->xsk_pool = xsk_pool;
 
-	if (rq->umem)
+	if (rq->xsk_pool)
 		rq->stats = &c->priv->channel_stats[c->ix].xskrq;
 	else
 		rq->stats = &c->priv->channel_stats[c->ix].rq;
@@ -518,7 +518,7 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
 	if (xsk) {
 		err = xdp_rxq_info_reg_mem_model(&rq->xdp_rxq,
 						 MEM_TYPE_XSK_BUFF_POOL, NULL);
-		xsk_buff_set_rxq_info(rq->umem, &rq->xdp_rxq);
+		xsk_buff_set_rxq_info(rq->xsk_pool->umem, &rq->xdp_rxq);
 	} else {
 		/* Create a page_pool and register it with rxq */
 		pp_params.order     = 0;
@@ -857,11 +857,11 @@ void mlx5e_free_rx_descs(struct mlx5e_rq *rq)
 
 int mlx5e_open_rq(struct mlx5e_channel *c, struct mlx5e_params *params,
 		  struct mlx5e_rq_param *param, struct mlx5e_xsk_param *xsk,
-		  struct xdp_umem *umem, struct mlx5e_rq *rq)
+		  struct xsk_buff_pool *xsk_pool, struct mlx5e_rq *rq)
 {
 	int err;
 
-	err = mlx5e_alloc_rq(c, params, xsk, umem, param, rq);
+	err = mlx5e_alloc_rq(c, params, xsk, xsk_pool, param, rq);
 	if (err)
 		return err;
 
@@ -966,7 +966,7 @@ static int mlx5e_alloc_xdpsq_db(struct mlx5e_xdpsq *sq, int numa)
 
 static int mlx5e_alloc_xdpsq(struct mlx5e_channel *c,
 			     struct mlx5e_params *params,
-			     struct xdp_umem *umem,
+			     struct xsk_buff_pool *xsk_pool,
 			     struct mlx5e_sq_param *param,
 			     struct mlx5e_xdpsq *sq,
 			     bool is_redirect)
@@ -982,9 +982,9 @@ static int mlx5e_alloc_xdpsq(struct mlx5e_channel *c,
 	sq->uar_map   = mdev->mlx5e_res.bfreg.map;
 	sq->min_inline_mode = params->tx_min_inline_mode;
 	sq->hw_mtu    = MLX5E_SW2HW_MTU(params, params->sw_mtu);
-	sq->umem      = umem;
+	sq->xsk_pool  = xsk_pool;
 
-	sq->stats = sq->umem ?
+	sq->stats = sq->xsk_pool ?
 		&c->priv->channel_stats[c->ix].xsksq :
 		is_redirect ?
 			&c->priv->channel_stats[c->ix].xdpsq :
@@ -1449,13 +1449,13 @@ void mlx5e_close_icosq(struct mlx5e_icosq *sq)
 }
 
 int mlx5e_open_xdpsq(struct mlx5e_channel *c, struct mlx5e_params *params,
-		     struct mlx5e_sq_param *param, struct xdp_umem *umem,
+		     struct mlx5e_sq_param *param, struct xsk_buff_pool *xsk_pool,
 		     struct mlx5e_xdpsq *sq, bool is_redirect)
 {
 	struct mlx5e_create_sq_param csp = {};
 	int err;
 
-	err = mlx5e_alloc_xdpsq(c, params, umem, param, sq, is_redirect);
+	err = mlx5e_alloc_xdpsq(c, params, xsk_pool, param, sq, is_redirect);
 	if (err)
 		return err;
 
@@ -1948,7 +1948,7 @@ static u8 mlx5e_enumerate_lag_port(struct mlx5_core_dev *mdev, int ix)
 static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
 			      struct mlx5e_params *params,
 			      struct mlx5e_channel_param *cparam,
-			      struct xdp_umem *umem,
+			      struct xsk_buff_pool *xsk_pool,
 			      struct mlx5e_channel **cp)
 {
 	int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));
@@ -1987,9 +1987,9 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
 	if (unlikely(err))
 		goto err_napi_del;
 
-	if (umem) {
-		mlx5e_build_xsk_param(umem, &xsk);
-		err = mlx5e_open_xsk(priv, params, &xsk, umem, c);
+	if (xsk_pool) {
+		mlx5e_build_xsk_param(xsk_pool, &xsk);
+		err = mlx5e_open_xsk(priv, params, &xsk, xsk_pool, c);
 		if (unlikely(err))
 			goto err_close_queues;
 	}
@@ -2350,12 +2350,12 @@ int mlx5e_open_channels(struct mlx5e_priv *priv,
 
 	mlx5e_build_channel_param(priv, &chs->params, cparam);
 	for (i = 0; i < chs->num; i++) {
-		struct xdp_umem *umem = NULL;
+		struct xsk_buff_pool *xsk_pool = NULL;
 
 		if (chs->params.xdp_prog)
-			umem = mlx5e_xsk_get_umem(&chs->params, chs->params.xsk, i);
+			xsk_pool = mlx5e_xsk_get_pool(&chs->params, chs->params.xsk, i);
 
-		err = mlx5e_open_channel(priv, i, &chs->params, cparam, umem, &chs->c[i]);
+		err = mlx5e_open_channel(priv, i, &chs->params, cparam, xsk_pool, &chs->c[i]);
 		if (err)
 			goto err_close_channels;
 	}
@@ -3917,13 +3917,14 @@ static bool mlx5e_xsk_validate_mtu(struct net_device *netdev,
 	u16 ix;
 
 	for (ix = 0; ix < chs->params.num_channels; ix++) {
-		struct xdp_umem *umem = mlx5e_xsk_get_umem(&chs->params, chs->params.xsk, ix);
+		struct xsk_buff_pool *xsk_pool =
+			mlx5e_xsk_get_pool(&chs->params, chs->params.xsk, ix);
 		struct mlx5e_xsk_param xsk;
 
-		if (!umem)
+		if (!xsk_pool)
 			continue;
 
-		mlx5e_build_xsk_param(umem, &xsk);
+		mlx5e_build_xsk_param(xsk_pool, &xsk);
 
 		if (!mlx5e_validate_xsk_param(new_params, &xsk, mdev)) {
 			u32 hr = mlx5e_get_linear_rq_headroom(new_params, &xsk);
@@ -4543,8 +4544,8 @@ static int mlx5e_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 	case XDP_QUERY_PROG:
 		xdp->prog_id = mlx5e_xdp_query(dev);
 		return 0;
-	case XDP_SETUP_XSK_UMEM:
-		return mlx5e_xsk_setup_umem(dev, xdp->xsk.umem,
+	case XDP_SETUP_XSK_POOL:
+		return mlx5e_xsk_setup_pool(dev, xdp->xsk.pool,
 					    xdp->xsk.queue_id);
 	default:
 		return -EINVAL;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 350f9c54e508..4f9a1d6e54fd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -264,8 +264,8 @@ static inline int mlx5e_page_alloc_pool(struct mlx5e_rq *rq,
 static inline int mlx5e_page_alloc(struct mlx5e_rq *rq,
 				   struct mlx5e_dma_info *dma_info)
 {
-	if (rq->umem)
-		return mlx5e_xsk_page_alloc_umem(rq, dma_info);
+	if (rq->xsk_pool)
+		return mlx5e_xsk_page_alloc_pool(rq, dma_info);
 	else
 		return mlx5e_page_alloc_pool(rq, dma_info);
 }
@@ -296,7 +296,7 @@ static inline void mlx5e_page_release(struct mlx5e_rq *rq,
 				      struct mlx5e_dma_info *dma_info,
 				      bool recycle)
 {
-	if (rq->umem)
+	if (rq->xsk_pool)
 		/* The `recycle` parameter is ignored, and the page is always
 		 * put into the Reuse Ring, because there is no way to return
 		 * the page to the userspace when the interface goes down.
@@ -383,14 +383,14 @@ static int mlx5e_alloc_rx_wqes(struct mlx5e_rq *rq, u16 ix, u8 wqe_bulk)
 	int err;
 	int i;
 
-	if (rq->umem) {
+	if (rq->xsk_pool) {
 		int pages_desired = wqe_bulk << rq->wqe.info.log_num_frags;
 
 		/* Check in advance that we have enough frames, instead of
 		 * allocating one-by-one, failing and moving frames to the
 		 * Reuse Ring.
 		 */
-		if (unlikely(!xsk_buff_can_alloc(rq->umem, pages_desired)))
+		if (unlikely(!xsk_buff_can_alloc(rq->xsk_pool->umem, pages_desired)))
 			return -ENOMEM;
 	}
 
@@ -488,8 +488,8 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 	/* Check in advance that we have enough frames, instead of allocating
 	 * one-by-one, failing and moving frames to the Reuse Ring.
 	 */
-	if (rq->umem &&
-	    unlikely(!xsk_buff_can_alloc(rq->umem, MLX5_MPWRQ_PAGES_PER_WQE))) {
+	if (rq->xsk_pool &&
+	    unlikely(!xsk_buff_can_alloc(rq->xsk_pool->umem, MLX5_MPWRQ_PAGES_PER_WQE))) {
 		err = -ENOMEM;
 		goto err;
 	}
@@ -737,7 +737,7 @@ bool mlx5e_post_rx_mpwqes(struct mlx5e_rq *rq)
 	 * the driver when it refills the Fill Ring.
 	 * 2. Otherwise, busy poll by rescheduling the NAPI poll.
 	 */
-	if (unlikely(alloc_err == -ENOMEM && rq->umem))
+	if (unlikely(alloc_err == -ENOMEM && rq->xsk_pool))
 		return true;
 
 	return false;
-- 
2.20.1


* [PATCH 2/2] xsk: i40e: ice: ixgbe: mlx5: rename xsk zero-copy driver interfaces
  2020-07-09 14:56     ` [PATCH 1/2] xsk: i40e: ice: ixgbe: mlx5: pass buffer pool to driver instead of umem Maxim Mikityanskiy
@ 2020-07-09 14:56       ` Maxim Mikityanskiy
  0 siblings, 0 replies; 25+ messages in thread
From: Maxim Mikityanskiy @ 2020-07-09 14:56 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: Björn Töpel, Alexei Starovoitov, Daniel Borkmann,
	Network Development, Jonathan Lemon, bpf, jeffrey.t.kirsher,
	Fijalkowski Maciej, Maciej Fijalkowski, cristian.dumitrescu,
	Maxim Mikityanskiy

From: Magnus Karlsson <magnus.karlsson@intel.com>

Rename the AF_XDP zero-copy driver interface functions to better
reflect what they do after the replacement of umems with buffer
pools in the previous commit. Mostly this means replacing umem in
the function names with xsk_buff and having them take a buffer
pool pointer instead of a umem. The various ring functions have
also been renamed in the process so that they follow the same
naming convention as the internal functions in xsk_queue.h, which
makes it clearer what they do and keeps the naming consistent.
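
As a quick orientation for the rename, a simplified sketch of a
zero-copy TX loop in the new naming scheme is shown below. Only the
xsk_* function names are taken from this patch; the surrounding
my_sq structure and the hardware-posting step are hypothetical:

struct my_sq {				/* hypothetical driver queue */
	struct xsk_buff_pool *xsk_pool;
};

static bool my_zc_tx(struct my_sq *sq, unsigned int budget)
{
	struct xsk_buff_pool *pool = sq->xsk_pool;
	struct xdp_desc desc;
	unsigned int done = 0;
	dma_addr_t dma;

	while (budget-- && xsk_tx_peek_desc(pool, &desc)) {
		/* was xsk_umem_consume_tx(umem, &desc) */
		dma = xsk_buff_raw_get_dma(pool, desc.addr);
		xsk_buff_raw_dma_sync_for_device(pool, dma, desc.len);
		/* ... post the descriptor to hardware ... */
		done++;
	}

	if (done)
		xsk_tx_release(pool);	/* was xsk_umem_consume_tx_done() */

	/* and on TX completion:
	 * xsk_tx_completed(pool, n);	   was xsk_umem_complete_tx()
	 */
	return done > 0;
}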

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c     |  4 ++--
 .../net/ethernet/mellanox/mlx5/core/en/xsk/pool.c    | 12 ++++++------
 drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h  |  8 ++++----
 drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c  | 10 +++++-----
 drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h  |  6 +++---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c    |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c      |  4 ++--
 7 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index c2e06f5a092f..4385052d8c5c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -446,7 +446,7 @@ bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq *cq)
 	} while ((++i < MLX5E_TX_CQ_POLL_BUDGET) && (cqe = mlx5_cqwq_get_cqe(&cq->wq)));
 
 	if (xsk_frames)
-		xsk_umem_complete_tx(sq->xsk_pool->umem, xsk_frames);
+		xsk_tx_completed(sq->xsk_pool, xsk_frames);
 
 	sq->stats->cqes += i;
 
@@ -476,7 +476,7 @@ void mlx5e_free_xdpsq_descs(struct mlx5e_xdpsq *sq)
 	}
 
 	if (xsk_frames)
-		xsk_umem_complete_tx(sq->xsk_pool->umem, xsk_frames);
+		xsk_tx_completed(sq->xsk_pool, xsk_frames);
 }
 
 int mlx5e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c
index 8ccd9203ee25..3503e7711178 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c
@@ -11,13 +11,13 @@ static int mlx5e_xsk_map_pool(struct mlx5e_priv *priv,
 {
 	struct device *dev = priv->mdev->device;
 
-	return xsk_buff_dma_map(pool->umem, dev, 0);
+	return xsk_pool_dma_map(pool, dev, 0);
 }
 
 static void mlx5e_xsk_unmap_pool(struct mlx5e_priv *priv,
 				 struct xsk_buff_pool *pool)
 {
-	return xsk_buff_dma_unmap(pool->umem, 0);
+	return xsk_pool_dma_unmap(pool, 0);
 }
 
 static int mlx5e_xsk_get_pools(struct mlx5e_xsk *xsk)
@@ -64,14 +64,14 @@ static void mlx5e_xsk_remove_pool(struct mlx5e_xsk *xsk, u16 ix)
 
 static bool mlx5e_xsk_is_pool_sane(struct xsk_buff_pool *pool)
 {
-	return xsk_umem_get_headroom(pool->umem) <= 0xffff &&
-		xsk_umem_get_chunk_size(pool->umem) <= 0xffff;
+	return xsk_pool_get_headroom(pool) <= 0xffff &&
+		xsk_pool_get_chunk_size(pool) <= 0xffff;
 }
 
 void mlx5e_build_xsk_param(struct xsk_buff_pool *pool, struct mlx5e_xsk_param *xsk)
 {
-	xsk->headroom = xsk_umem_get_headroom(pool->umem);
-	xsk->chunk_size = xsk_umem_get_chunk_size(pool->umem);
+	xsk->headroom = xsk_pool_get_headroom(pool);
+	xsk->chunk_size = xsk_pool_get_chunk_size(pool);
 }
 
 static int mlx5e_xsk_enable_locked(struct mlx5e_priv *priv,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
index 3dd056a11bae..7f88ccf67fdd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
@@ -22,7 +22,7 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq,
 static inline int mlx5e_xsk_page_alloc_pool(struct mlx5e_rq *rq,
 					    struct mlx5e_dma_info *dma_info)
 {
-	dma_info->xsk = xsk_buff_alloc(rq->xsk_pool->umem);
+	dma_info->xsk = xsk_buff_alloc(rq->xsk_pool);
 	if (!dma_info->xsk)
 		return -ENOMEM;
 
@@ -38,13 +38,13 @@ static inline int mlx5e_xsk_page_alloc_pool(struct mlx5e_rq *rq,
 
 static inline bool mlx5e_xsk_update_rx_wakeup(struct mlx5e_rq *rq, bool alloc_err)
 {
-	if (!xsk_umem_uses_need_wakeup(rq->xsk_pool->umem))
+	if (!xsk_uses_need_wakeup(rq->xsk_pool))
 		return alloc_err;
 
 	if (unlikely(alloc_err))
-		xsk_set_rx_need_wakeup(rq->xsk_pool->umem);
+		xsk_set_rx_need_wakeup(rq->xsk_pool);
 	else
-		xsk_clear_rx_need_wakeup(rq->xsk_pool->umem);
+		xsk_clear_rx_need_wakeup(rq->xsk_pool);
 
 	return false;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
index e46ca8620ea9..5d8b5fe2161c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
@@ -83,7 +83,7 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget)
 			break;
 		}
 
-		if (!xsk_umem_consume_tx(pool->umem, &desc)) {
+		if (!xsk_tx_peek_desc(pool, &desc)) {
 			/* TX will get stuck until something wakes it up by
 			 * triggering NAPI. Currently it's expected that the
 			 * application calls sendto() if there are consumed, but
@@ -92,11 +92,11 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget)
 			break;
 		}
 
-		xdptxd.dma_addr = xsk_buff_raw_get_dma(pool->umem, desc.addr);
-		xdptxd.data = xsk_buff_raw_get_data(pool->umem, desc.addr);
+		xdptxd.dma_addr = xsk_buff_raw_get_dma(pool, desc.addr);
+		xdptxd.data = xsk_buff_raw_get_data(pool, desc.addr);
 		xdptxd.len = desc.len;
 
-		xsk_buff_raw_dma_sync_for_device(pool->umem, xdptxd.dma_addr, xdptxd.len);
+		xsk_buff_raw_dma_sync_for_device(pool, xdptxd.dma_addr, xdptxd.len);
 
 		if (unlikely(!sq->xmit_xdp_frame(sq, &xdptxd, &xdpi, check_result))) {
 			if (sq->mpwqe.wqe)
@@ -113,7 +113,7 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget)
 			mlx5e_xdp_mpwqe_complete(sq);
 		mlx5e_xmit_xdp_doorbell(sq);
 
-		xsk_umem_consume_tx_done(pool->umem);
+		xsk_tx_release(pool);
 	}
 
 	return !(budget && work_done);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h
index ddb61d5bc2db..a05085035f23 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h
@@ -15,13 +15,13 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget);
 
 static inline void mlx5e_xsk_update_tx_wakeup(struct mlx5e_xdpsq *sq)
 {
-	if (!xsk_umem_uses_need_wakeup(sq->xsk_pool->umem))
+	if (!xsk_uses_need_wakeup(sq->xsk_pool))
 		return;
 
 	if (sq->pc != sq->cc)
-		xsk_clear_tx_need_wakeup(sq->xsk_pool->umem);
+		xsk_clear_tx_need_wakeup(sq->xsk_pool);
 	else
-		xsk_set_tx_need_wakeup(sq->xsk_pool->umem);
+		xsk_set_tx_need_wakeup(sq->xsk_pool);
 }
 
 #endif /* __MLX5_EN_XSK_TX_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 42e80165ca6c..5c1c15bab6ca 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -518,7 +518,7 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
 	if (xsk) {
 		err = xdp_rxq_info_reg_mem_model(&rq->xdp_rxq,
 						 MEM_TYPE_XSK_BUFF_POOL, NULL);
-		xsk_buff_set_rxq_info(rq->xsk_pool->umem, &rq->xdp_rxq);
+		xsk_pool_set_rxq_info(rq->xsk_pool, &rq->xdp_rxq);
 	} else {
 		/* Create a page_pool and register it with rxq */
 		pp_params.order     = 0;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 4f9a1d6e54fd..e05422190d29 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -390,7 +390,7 @@ static int mlx5e_alloc_rx_wqes(struct mlx5e_rq *rq, u16 ix, u8 wqe_bulk)
 		 * allocating one-by-one, failing and moving frames to the
 		 * Reuse Ring.
 		 */
-		if (unlikely(!xsk_buff_can_alloc(rq->xsk_pool->umem, pages_desired)))
+		if (unlikely(!xsk_buff_can_alloc(rq->xsk_pool, pages_desired)))
 			return -ENOMEM;
 	}
 
@@ -489,7 +489,7 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 	 * one-by-one, failing and moving frames to the Reuse Ring.
 	 */
 	if (rq->xsk_pool &&
-	    unlikely(!xsk_buff_can_alloc(rq->xsk_pool->umem, MLX5_MPWRQ_PAGES_PER_WQE))) {
+	    unlikely(!xsk_buff_can_alloc(rq->xsk_pool, MLX5_MPWRQ_PAGES_PER_WQE))) {
 		err = -ENOMEM;
 		goto err;
 	}
-- 
2.20.1


* Re: [PATCH bpf-next 11/14] xsk: add shared umem support between devices
  2020-07-02 12:19 ` [PATCH bpf-next 11/14] xsk: add shared umem support between devices Magnus Karlsson
@ 2020-07-09 15:20   ` Maxim Mikityanskiy
  0 siblings, 0 replies; 25+ messages in thread
From: Maxim Mikityanskiy @ 2020-07-09 15:20 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: bjorn.topel, ast, daniel, netdev, jonathan.lemon, bpf,
	jeffrey.t.kirsher, maciej.fijalkowski, maciejromanfijalkowski,
	cristian.dumitrescu

On 2020-07-02 15:19, Magnus Karlsson wrote:
> Add support to share a umem between different devices. This mode
> can be invoked with the XDP_SHARED_UMEM bind flag. Previously,
> sharing was only supported within the same device. Note that when
> sharing a umem between devices, just as in the case of sharing a
> umem between queue ids, you need to create a fill ring and a
> completion ring and tie them to the socket (with two setsockopts,
> one for each ring) before you do the bind with the
> XDP_SHARED_UMEM flag. This is so that the single-producer
> single-consumer semantics of the rings can be upheld.

I also wonder what performance numbers you see when doing forwarding 
with xsk_fwd between two queues of the same netdev and between two 
netdevs. Could you share (compared to some baseline like xdpsock -l)?
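
For readers following along, here is a minimal userspace sketch of
the bind sequence described in the quoted commit message: a second
socket gets its own fill and completion rings and is then bound with
XDP_SHARED_UMEM against the first socket's fd. Ring size, error
handling and socket creation are omitted, and sock1 is assumed to
have registered the umem already:

#include <linux/if_xdp.h>
#include <sys/socket.h>

static int bind_shared(int sock1, int sock2, int ifindex, __u32 queue_id)
{
	int ring_size = 2048;
	struct sockaddr_xdp sxdp = {};

	/* One new fill ring and one new completion ring, tied to the
	 * second socket, as required per unique (dev, queue_id). */
	setsockopt(sock2, SOL_XDP, XDP_UMEM_FILL_RING,
		   &ring_size, sizeof(ring_size));
	setsockopt(sock2, SOL_XDP, XDP_UMEM_COMPLETION_RING,
		   &ring_size, sizeof(ring_size));

	sxdp.sxdp_family = AF_XDP;
	sxdp.sxdp_ifindex = ifindex;	/* may differ from sock1's device */
	sxdp.sxdp_queue_id = queue_id;	/* may differ from sock1's queue */
	sxdp.sxdp_flags = XDP_SHARED_UMEM;
	sxdp.sxdp_shared_umem_fd = sock1;

	return bind(sock2, (struct sockaddr *)&sxdp, sizeof(sxdp));
}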

> 
> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> ---
>   net/xdp/xsk.c | 11 ++++-------
>   1 file changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 1abc222..b240221 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -692,14 +692,11 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
>   			sockfd_put(sock);
>   			goto out_unlock;
>   		}
> -		if (umem_xs->dev != dev) {
> -			err = -EINVAL;
> -			sockfd_put(sock);
> -			goto out_unlock;
> -		}
>   
> -		if (umem_xs->queue_id != qid) {
> -			/* Share the umem with another socket on another qid */
> +		if (umem_xs->queue_id != qid || umem_xs->dev != dev) {
> +			/* Share the umem with another socket on another qid
> +			 * and/or device.
> +			 */
>   			new_pool = xp_assign_umem(xs->pool, umem_xs->umem);
>   			if (!new_pool) {
>   				sockfd_put(sock);
> 


Thread overview: 25+ messages
2020-07-02 12:18 [PATCH bpf-next 00/14] xsk: support shared umems between devices and queues Magnus Karlsson
2020-07-02 12:19 ` [PATCH bpf-next 01/14] xsk: i40e: ice: ixgbe: mlx5: pass buffer pool to driver instead of umem Magnus Karlsson
2020-07-08 15:00   ` Maxim Mikityanskiy
2020-07-02 12:19 ` [PATCH bpf-next 02/14] xsk: i40e: ice: ixgbe: mlx5: rename xsk zero-copy driver interfaces Magnus Karlsson
2020-07-02 12:19 ` [PATCH bpf-next 03/14] xsk: create and free context independently from umem Magnus Karlsson
2020-07-08 15:00   ` Maxim Mikityanskiy
2020-07-09  6:47     ` Magnus Karlsson
2020-07-02 12:19 ` [PATCH bpf-next 04/14] xsk: move fill and completion rings to buffer pool Magnus Karlsson
2020-07-02 12:19 ` [PATCH bpf-next 05/14] xsk: move queue_id, dev and need_wakeup to context Magnus Karlsson
2020-07-02 12:19 ` [PATCH bpf-next 06/14] xsk: move xsk_tx_list and its lock to buffer pool Magnus Karlsson
2020-07-02 12:19 ` [PATCH bpf-next 07/14] xsk: move addrs from buffer pool to umem Magnus Karlsson
2020-07-02 12:19 ` [PATCH bpf-next 08/14] xsk: net: enable sharing of dma mappings Magnus Karlsson
2020-07-02 12:19 ` [PATCH bpf-next 09/14] xsk: rearrange internal structs for better performance Magnus Karlsson
2020-07-02 12:19 ` [PATCH bpf-next 10/14] xsk: add shared umem support between queue ids Magnus Karlsson
2020-07-02 12:19 ` [PATCH bpf-next 11/14] xsk: add shared umem support between devices Magnus Karlsson
2020-07-09 15:20   ` Maxim Mikityanskiy
2020-07-02 12:19 ` [PATCH bpf-next 12/14] libbpf: support shared umems between queues and devices Magnus Karlsson
2020-07-02 12:19 ` [PATCH bpf-next 13/14] samples/bpf: add new sample xsk_fwd.c Magnus Karlsson
2020-07-02 12:19 ` [PATCH bpf-next 14/14] xsk: documentation for XDP_SHARED_UMEM between queues and netdevs Magnus Karlsson
2020-07-06 18:39 ` [PATCH bpf-next 00/14] xsk: support shared umems between devices and queues Daniel Borkmann
2020-07-07 10:37   ` Maxim Mikityanskiy
2020-07-08 15:00 ` Maxim Mikityanskiy
2020-07-09  6:54   ` Magnus Karlsson
2020-07-09 14:56     ` [PATCH 1/2] xsk: i40e: ice: ixgbe: mlx5: pass buffer pool to driver instead of umem Maxim Mikityanskiy
2020-07-09 14:56       ` [PATCH 2/2] xsk: i40e: ice: ixgbe: mlx5: rename xsk zero-copy driver interfaces Maxim Mikityanskiy
