* [PATCH net-next 00/16] mlx5 xsk updates part3 2022-09-30
From: Saeed Mahameed @ 2022-09-30 16:28 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan

From: Saeed Mahameed <saeedm@nvidia.com>

The gist of this 4-part series is in this patchset's last patch.

This series contains performance optimizations. XSK starts using the
batching allocator, and the XSK data path gets separated from the regular
RX path, allowing branches that are not relevant to non-XSK use cases to
be dropped.
Some minor optimizations for indirect calls and need_wakeup are also
included.

Other than that, this series adds a few features to the mlx5e
implementation of XSK:

1. XDP metadata support on XSK RQs.

2. RSS contexts support for XSK RQs.

3. Some other optimizations.

4. Last but not least, a change to the queuing scheme, so that XSK RQs no
longer use higher indices, but instead replace the regular RQs.

Maxim Says:
==========

In the initial implementation of XSK in mlx5e, XSK RQs coexisted with
regular RQs in the same channel. The main idea was to allow RSS to work
the same way for regular traffic, without needing to reconfigure RSS to
exclude the XSK queues.

However, this scheme didn't prove to be beneficial, mainly because of
incompatibility with other vendors. Some tools don't properly support
using higher indices for XSK queues, and some get confused by the doubled
number of RQs exposed in sysfs. Some use cases are purely XSK, and
allocating the same number of unused regular RQs is a waste of
resources.

This commit changes the queuing scheme to the standard one, where XSK
RQs replace regular RQs on the channels where XSK sockets are open. Two
RQs still exist in the channel to allow failsafe disable of XSK, but
only one is exposed at a time. The next commit will achieve the desired
memory savings by flushing the buffers when the regular RQ is unused.

As a result of this transition:

1. It's possible to use RSS contexts over XSK RQs.

2. It's possible to dedicate all queues to XSK.

3. When XSK RQs coexist with regular RQs, the admin should make sure no
unwanted traffic goes into XSK RQs by either excluding them from RSS or
setting up the XDP program to return XDP_PASS for non-XSK traffic (see
the sketch below).

4. When using a mixed fleet of mlx5e devices and other netdevs, the same
configuration can be applied. If the application supports the fallback
to copy mode on unsupported drivers, it will work too.

==========
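
To illustrate point 3 above: a minimal XDP program for the mixed case
could look roughly like the sketch below. This is a hypothetical, generic
example, not part of this series; the map name, its size, and the trivial
"redirect everything" policy are placeholders, and real programs would
parse headers and redirect only the flows destined for the XSK sockets.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_XSKMAP);
	__uint(max_entries, 64);
	__type(key, __u32);
	__type(value, __u32);
} xsks_map SEC(".maps");

SEC("xdp")
int xsk_filter(struct xdp_md *ctx)
{
	/* Redirect to the XSK socket bound to this queue if one exists;
	 * XDP_PASS is the fallback action, so non-XSK traffic keeps going
	 * to the regular kernel stack.
	 */
	return bpf_redirect_map(&xsks_map, ctx->rx_queue_index, XDP_PASS);
}

char _license[] SEC("license") = "GPL";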

Part 4 will include some final XSK optimizations and minor improvements.

part 1: https://lore.kernel.org/netdev/20220927203611.244301-1-saeed@kernel.org/
part 2: https://lore.kernel.org/netdev/20220929072156.93299-1-saeed@kernel.org/

Maxim Mikityanskiy (16):
  net/mlx5e: xsk: Use mlx5e_trigger_napi_icosq for XSK wakeup
  net/mlx5e: xsk: Drop the check for XSK state in mlx5e_xsk_wakeup
  net/mlx5e: Introduce wqe_index_mask for legacy RQ
  net/mlx5e: Make the wqe_index_mask calculation more exact
  net/mlx5e: Use partial batches in legacy RQ
  net/mlx5e: xsk: Use partial batches in legacy RQ with XSK
  net/mlx5e: Remove the outer loop when allocating legacy RQ WQEs
  net/mlx5e: xsk: Split out WQE allocation for legacy XSK RQ
  net/mlx5e: xsk: Use xsk_buff_alloc_batch on legacy RQ
  net/mlx5e: xsk: Use xsk_buff_alloc_batch on striding RQ
  net/mlx5e: Use non-XSK page allocator in SHAMPO
  net/mlx5e: Call mlx5e_page_release_dynamic directly where possible
  net/mlx5e: Optimize RQ page deallocation
  net/mlx5e: xsk: Support XDP metadata on XSK RQs
  net/mlx5e: Introduce the mlx5e_flush_rq function
  net/mlx5e: xsk: Use queue indices starting from 0 for XSK queues

 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  10 +-
 .../ethernet/mellanox/mlx5/core/en/channels.c |  29 ++-
 .../ethernet/mellanox/mlx5/core/en/channels.h |   3 +-
 .../ethernet/mellanox/mlx5/core/en/params.c   |  44 +++-
 .../ethernet/mellanox/mlx5/core/en/params.h   |  32 ---
 .../mellanox/mlx5/core/en/reporter_rx.c       |  23 +-
 .../ethernet/mellanox/mlx5/core/en/rx_res.c   | 118 ++--------
 .../ethernet/mellanox/mlx5/core/en/rx_res.h   |   9 +-
 .../net/ethernet/mellanox/mlx5/core/en/txrx.h |   7 +
 .../ethernet/mellanox/mlx5/core/en/xsk/pool.c |  17 +-
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.c   | 176 ++++++++++++++-
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.h   |   3 +
 .../mellanox/mlx5/core/en/xsk/setup.c         |   4 +-
 .../ethernet/mellanox/mlx5/core/en/xsk/tx.c   |  12 +-
 .../mellanox/mlx5/core/en_fs_ethtool.c        |  13 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  52 +++--
 .../net/ethernet/mellanox/mlx5/core/en_rep.c  |   3 -
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 211 +++++++-----------
 .../ethernet/mellanox/mlx5/core/ipoib/ipoib.c |   1 -
 .../mellanox/mlx5/core/ipoib/ipoib_vlan.c     |   1 -
 drivers/net/ethernet/mellanox/mlx5/core/wq.h  |   2 +-
 21 files changed, 385 insertions(+), 385 deletions(-)

-- 
2.37.3


* [PATCH net-next 01/16] net/mlx5e: xsk: Use mlx5e_trigger_napi_icosq for XSK wakeup
From: Saeed Mahameed @ 2022-09-30 16:28 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Maxim Mikityanskiy

From: Maxim Mikityanskiy <maximmi@nvidia.com>

mlx5e_xsk_wakeup triggers an IRQ by posting a NOP to async_icosq, taking
a spinlock to protect against concurrent access. There is already a
function that does the same: mlx5e_trigger_napi_icosq. Use this function
in mlx5e_xsk_wakeup.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
index 4902ef74fedf..1203d7d5f9bd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
@@ -36,9 +36,7 @@ int mlx5e_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
 		if (test_and_set_bit(MLX5E_SQ_STATE_PENDING_XSK_TX, &c->async_icosq.state))
 			return 0;
 
-		spin_lock_bh(&c->async_icosq_lock);
-		mlx5e_trigger_irq(&c->async_icosq);
-		spin_unlock_bh(&c->async_icosq_lock);
+		mlx5e_trigger_napi_icosq(c);
 	}
 
 	return 0;
-- 
2.37.3


* [PATCH net-next 02/16] net/mlx5e: xsk: Drop the check for XSK state in mlx5e_xsk_wakeup
From: Saeed Mahameed @ 2022-09-30 16:28 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Maxim Mikityanskiy

From: Maxim Mikityanskiy <maximmi@nvidia.com>

The MLX5E_CHANNEL_STATE_XSK flag checked in mlx5e_xsk_wakeup indicates
that XSK queues are open, but not necessarily activated. This check is
not very useful, because:

0. Both XSK setup and netdev state transitions take the same state_lock
mutex, so they can't happen at the same time.

1. If the netdev is up, xsk_is_bound can return true only when
MLX5E_CHANNEL_STATE_XSK is set on the corresponding channel.
mlx5e_xsk_wakeup is only called when xsk_is_bound is true.

2. If the XSK socket is bound, and the netdev is going up or down,
mlx5e_xsk_wakeup can take one of two branches, depending on the return
value of napi_if_scheduled_mark_missed:

2.1. True means one of two things: either NAPI was enabled at this
point, which means MLX5E_CHANNEL_STATE_XSK was also set; or NAPI was
disabled, and nothing really happened.

2.2. False means that NAPI was enabled by this point, which also implies
MLX5E_CHANNEL_STATE_XSK was set. Additionally, mlx5e_xsk_wakeup contains
a subsequent check for MLX5E_SQ_STATE_ENABLED on async_icosq, and this
flag implies MLX5E_CHANNEL_STATE_XSK too on XSK channels.

As checking this flag doesn't cut off any flows, remove the check.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c    | 3 ---
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
index 5129b9bf534f..d7dfc7d2c058 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
@@ -154,7 +154,7 @@ int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params,
 void mlx5e_close_xsk(struct mlx5e_channel *c)
 {
 	clear_bit(MLX5E_CHANNEL_STATE_XSK, c->state);
-	synchronize_net(); /* Sync with the XSK wakeup and with NAPI. */
+	synchronize_net(); /* Sync with NAPI. */
 
 	mlx5e_close_rq(&c->xskrq);
 	mlx5e_close_cq(&c->xskrq.cq);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
index 1203d7d5f9bd..c856fc3f197e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
@@ -22,9 +22,6 @@ int mlx5e_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
 
 	c = priv->channels.c[ix];
 
-	if (unlikely(!test_bit(MLX5E_CHANNEL_STATE_XSK, c->state)))
-		return -EINVAL;
-
 	if (!napi_if_scheduled_mark_missed(&c->napi)) {
 		/* To avoid WQE overrun, don't post a NOP if async_icosq is not
 		 * active and not polled by NAPI. Return 0, because the upcoming
-- 
2.37.3


* [PATCH net-next 03/16] net/mlx5e: Introduce wqe_index_mask for legacy RQ
From: Saeed Mahameed @ 2022-09-30 16:28 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Maxim Mikityanskiy

From: Maxim Mikityanskiy <maximmi@nvidia.com>

When fragments of different WQEs share the same page, mlx5e_post_rx_wqes
must wait until the old WQE stops using the page; only then can the new
WQE allocate a new page. Essentially, it means that if WQE index i is
still in use, the allocation must stop before `i % bulk`, where bulk is
the number of WQEs that may share the same page.

As bulk is always a power of two, `i % bulk = i & (bulk - 1)`, and the
new wqe_index_mask field will be equal to `bulk - 1`.

At the same time, wqe_bulk remains for optimization purposes and stores
`max(bulk, 8)`, which allows skipping the allocation until we have at
least 8 WQEs free.
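
As a quick self-contained illustration of that relation (a userspace
sketch with made-up numbers, not driver code):

#include <assert.h>

int main(void)
{
	unsigned int bulk = 4;			/* e.g. PAGE_SIZE / frag_stride */
	unsigned int wqe_index_mask = bulk - 1;	/* the new field */
	unsigned int i;

	/* For any power-of-two bulk, i % bulk == i & (bulk - 1), so the hot
	 * path can use a cheap AND instead of a modulo.
	 */
	for (i = 0; i < 64; i++)
		assert((i % bulk) == (i & wqe_index_mask));

	return 0;
}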

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  1 +
 .../ethernet/mellanox/mlx5/core/en/params.c   | 25 ++++++++++++++++---
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 95a232fb2127..8e174a7f7c25 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -660,6 +660,7 @@ struct mlx5e_rq_frags_info {
 	u8 num_frags;
 	u8 log_num_frags;
 	u8 wqe_bulk;
+	u8 wqe_index_mask;
 };
 
 struct mlx5e_dma_info {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
index 68bc66cbd8a5..49306a68b3b5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
@@ -586,7 +586,14 @@ static int mlx5e_build_rq_frags_info(struct mlx5_core_dev *mdev,
 		info->arr[0].frag_size = byte_count;
 		info->arr[0].frag_stride = frag_stride;
 		info->num_frags = 1;
-		info->wqe_bulk = PAGE_SIZE / frag_stride;
+
+		/* N WQEs share the same page, N = PAGE_SIZE / frag_stride. The
+		 * first WQE in the page is responsible for allocation of this
+		 * page, this WQE's index is k*N. If WQEs [k*N+1; k*N+N-1] are
+		 * still not completed, the allocation must stop before k*N.
+		 */
+		info->wqe_index_mask = (PAGE_SIZE / frag_stride) - 1;
+
 		goto out;
 	}
 
@@ -635,11 +642,21 @@ static int mlx5e_build_rq_frags_info(struct mlx5_core_dev *mdev,
 		i++;
 	}
 	info->num_frags = i;
-	/* number of different wqes sharing a page */
-	info->wqe_bulk = 1 + (info->num_frags % 2);
+
+	/* The last fragment of WQE with index 2*N may share the page with the
+	 * first fragment of WQE with index 2*N+1 in certain cases. If WQE 2*N+1
+	 * is not completed yet, WQE 2*N must not be allocated, as it's
+	 * responsible for allocating a new page.
+	 */
+	info->wqe_index_mask = info->num_frags % 2;
 
 out:
-	info->wqe_bulk = max_t(u8, info->wqe_bulk, 8);
+	/* Bulking optimization to skip allocation until at least 8 WQEs can be
+	 * allocated in a row. At the same time, never start allocation when
+	 * the page is still used by older WQEs.
+	 */
+	info->wqe_bulk = max_t(u8, info->wqe_index_mask + 1, 8);
+
 	info->log_num_frags = order_base_2(info->num_frags);
 
 	return 0;
-- 
2.37.3


* [PATCH net-next 04/16] net/mlx5e: Make the wqe_index_mask calculation more exact
From: Saeed Mahameed @ 2022-09-30 16:28 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Maxim Mikityanskiy

From: Maxim Mikityanskiy <maximmi@nvidia.com>

The old calculation of wqe_index_mask may give false positives, i.e.
request bulking of pairs of WQEs when it's not strictly needed; for
example, when the first fragment size is equal to PAGE_SIZE, bulking is
not needed, even if the number of fragments is odd.

Make the calculation more exact to cut false positives.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../ethernet/mellanox/mlx5/core/en/params.c   | 21 ++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
index 49306a68b3b5..ac4d70bb21e8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
@@ -648,7 +648,26 @@ static int mlx5e_build_rq_frags_info(struct mlx5_core_dev *mdev,
 	 * is not completed yet, WQE 2*N must not be allocated, as it's
 	 * responsible for allocating a new page.
 	 */
-	info->wqe_index_mask = info->num_frags % 2;
+	if (frag_size_max == PAGE_SIZE) {
+		/* No WQE can start in the middle of a page. */
+		info->wqe_index_mask = 0;
+	} else {
+		/* PAGE_SIZEs starting from 8192 don't use 2K-sized fragments,
+		 * because there would be more than MLX5E_MAX_RX_FRAGS of them.
+		 */
+		WARN_ON(PAGE_SIZE != 2 * DEFAULT_FRAG_SIZE);
+
+		/* Odd number of fragments allows to pack the last fragment of
+		 * the previous WQE and the first fragment of the next WQE into
+		 * the same page.
+		 * As long as DEFAULT_FRAG_SIZE is 2048, and MLX5E_MAX_RX_FRAGS
+		 * is 4, the last fragment can be bigger than the rest only if
+		 * it's the fourth one, so WQEs consisting of 3 fragments will
+		 * always share a page.
+		 * When a page is shared, WQE bulk size is 2, otherwise just 1.
+		 */
+		info->wqe_index_mask = info->num_frags % 2;
+	}
 
 out:
 	/* Bulking optimization to skip allocation until at least 8 WQEs can be
-- 
2.37.3


* [PATCH net-next 05/16] net/mlx5e: Use partial batches in legacy RQ
From: Saeed Mahameed @ 2022-09-30 16:28 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Maxim Mikityanskiy

From: Maxim Mikityanskiy <maximmi@nvidia.com>

Legacy RQ allocates WQEs in batches. If the batch allocation fails, the
pages of the allocated part are released. This commit changes this
behavior to allow using the pages that have already been allocated.

After this change, we need to be careful about indexing rq->wqe.frags[].
The WQ size is a power of two that is divisible by wqe_bulk (8), and the
old code used whole bulks, which allowed using indices [8*K; 8*K+7]
without overflowing. Now that the bulks may be partial, the range can
start at any location (not only at 8*K), so we need to wrap the indices
around to avoid out-of-bounds array access.
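
As a small standalone illustration of the wrap-around (made-up WQ size,
not driver code; mlx5_wq_cyc_ctr2ix() does the equivalent masking on the
real power-of-two ring):

#include <stdio.h>

int main(void)
{
	unsigned int sz = 16;	/* power-of-two WQ size (made up) */
	unsigned int head = 14;	/* a partial bulk may start anywhere */
	unsigned int bulk = 5;
	unsigned int i;

	/* Prints 14 15 0 1 2: indices wrap instead of running past the end
	 * of rq->wqe.frags[].
	 */
	for (i = 0; i < bulk; i++)
		printf("%u\n", (head + i) & (sz - 1));

	return 0;
}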

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 39 ++++++++++---------
 1 file changed, 21 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 72d74de3ee99..ffca217b7d7e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -427,7 +427,6 @@ static void mlx5e_dealloc_rx_wqe(struct mlx5e_rq *rq, u16 ix)
 static int mlx5e_alloc_rx_wqes(struct mlx5e_rq *rq, u16 ix, u8 wqe_bulk)
 {
 	struct mlx5_wq_cyc *wq = &rq->wqe.wq;
-	int err;
 	int i;
 
 	if (rq->xsk_pool) {
@@ -442,20 +441,16 @@ static int mlx5e_alloc_rx_wqes(struct mlx5e_rq *rq, u16 ix, u8 wqe_bulk)
 	}
 
 	for (i = 0; i < wqe_bulk; i++) {
-		struct mlx5e_rx_wqe_cyc *wqe = mlx5_wq_cyc_get_wqe(wq, ix + i);
+		int j = mlx5_wq_cyc_ctr2ix(wq, ix + i);
+		struct mlx5e_rx_wqe_cyc *wqe;
 
-		err = mlx5e_alloc_rx_wqe(rq, wqe, ix + i);
-		if (unlikely(err))
-			goto free_wqes;
-	}
+		wqe = mlx5_wq_cyc_get_wqe(wq, j);
 
-	return 0;
-
-free_wqes:
-	while (--i >= 0)
-		mlx5e_dealloc_rx_wqe(rq, ix + i);
+		if (unlikely(mlx5e_alloc_rx_wqe(rq, wqe, j)))
+			break;
+	}
 
-	return err;
+	return i;
 }
 
 static inline void
@@ -821,8 +816,8 @@ static void mlx5e_dealloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 INDIRECT_CALLABLE_SCOPE bool mlx5e_post_rx_wqes(struct mlx5e_rq *rq)
 {
 	struct mlx5_wq_cyc *wq = &rq->wqe.wq;
+	bool busy = false;
 	u8 wqe_bulk;
-	int err;
 
 	if (unlikely(!test_bit(MLX5E_RQ_STATE_ENABLED, &rq->state)))
 		return false;
@@ -837,14 +832,22 @@ INDIRECT_CALLABLE_SCOPE bool mlx5e_post_rx_wqes(struct mlx5e_rq *rq)
 
 	do {
 		u16 head = mlx5_wq_cyc_get_head(wq);
+		int count;
+		u8 bulk;
 
-		err = mlx5e_alloc_rx_wqes(rq, head, wqe_bulk);
-		if (unlikely(err)) {
+		/* Don't allow any newly allocated WQEs to share the same page
+		 * with old WQEs that aren't completed yet. Stop earlier.
+		 */
+		bulk = wqe_bulk - ((head + wqe_bulk) & rq->wqe.info.wqe_index_mask);
+
+		count = mlx5e_alloc_rx_wqes(rq, head, bulk);
+		if (likely(count > 0))
+			mlx5_wq_cyc_push_n(wq, count);
+		if (unlikely(count != bulk)) {
 			rq->stats->buff_alloc_err++;
+			busy = true;
 			break;
 		}
-
-		mlx5_wq_cyc_push_n(wq, wqe_bulk);
 	} while (mlx5_wq_cyc_missing(wq) >= wqe_bulk);
 
 	/* ensure wqes are visible to device before updating doorbell record */
@@ -852,7 +855,7 @@ INDIRECT_CALLABLE_SCOPE bool mlx5e_post_rx_wqes(struct mlx5e_rq *rq)
 
 	mlx5_wq_cyc_update_db_record(wq);
 
-	return !!err;
+	return busy;
 }
 
 void mlx5e_free_icosq_descs(struct mlx5e_icosq *sq)
-- 
2.37.3


* [PATCH net-next 06/16] net/mlx5e: xsk: Use partial batches in legacy RQ with XSK
From: Saeed Mahameed @ 2022-09-30 16:28 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Maxim Mikityanskiy

From: Maxim Mikityanskiy <maximmi@nvidia.com>

The previous commit allowed allocating WQE batches in legacy RQ
partially; however, XSK still checks whether there are enough frames in
the fill ring. Remove this check to allow allocating batches partially
with XSK as well.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index ffca217b7d7e..80f2b5960782 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -429,17 +429,6 @@ static int mlx5e_alloc_rx_wqes(struct mlx5e_rq *rq, u16 ix, u8 wqe_bulk)
 	struct mlx5_wq_cyc *wq = &rq->wqe.wq;
 	int i;
 
-	if (rq->xsk_pool) {
-		int pages_desired = wqe_bulk << rq->wqe.info.log_num_frags;
-
-		/* Check in advance that we have enough frames, instead of
-		 * allocating one-by-one, failing and moving frames to the
-		 * Reuse Ring.
-		 */
-		if (unlikely(!xsk_buff_can_alloc(rq->xsk_pool, pages_desired)))
-			return -ENOMEM;
-	}
-
 	for (i = 0; i < wqe_bulk; i++) {
 		int j = mlx5_wq_cyc_ctr2ix(wq, ix + i);
 		struct mlx5e_rx_wqe_cyc *wqe;
@@ -841,8 +830,7 @@ INDIRECT_CALLABLE_SCOPE bool mlx5e_post_rx_wqes(struct mlx5e_rq *rq)
 		bulk = wqe_bulk - ((head + wqe_bulk) & rq->wqe.info.wqe_index_mask);
 
 		count = mlx5e_alloc_rx_wqes(rq, head, bulk);
-		if (likely(count > 0))
-			mlx5_wq_cyc_push_n(wq, count);
+		mlx5_wq_cyc_push_n(wq, count);
 		if (unlikely(count != bulk)) {
 			rq->stats->buff_alloc_err++;
 			busy = true;
-- 
2.37.3


* [PATCH net-next 07/16] net/mlx5e: Remove the outer loop when allocating legacy RQ WQEs
From: Saeed Mahameed @ 2022-09-30 16:28 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Maxim Mikityanskiy

From: Maxim Mikityanskiy <maximmi@nvidia.com>

Legacy RQ WQEs are allocated in a loop in small batches (8 WQEs). As
partial batches are allowed, there is no point in having a loop within a
loop, so the outer loop is removed, and the batch size is increased up to
the total number of WQEs to allocate, while still being no smaller than 8.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 37 ++++++++-----------
 drivers/net/ethernet/mellanox/mlx5/core/wq.h  |  2 +-
 2 files changed, 17 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 80f2b5960782..d620c1ed9b80 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -424,7 +424,7 @@ static void mlx5e_dealloc_rx_wqe(struct mlx5e_rq *rq, u16 ix)
 	mlx5e_free_rx_wqe(rq, wi, false);
 }
 
-static int mlx5e_alloc_rx_wqes(struct mlx5e_rq *rq, u16 ix, u8 wqe_bulk)
+static int mlx5e_alloc_rx_wqes(struct mlx5e_rq *rq, u16 ix, int wqe_bulk)
 {
 	struct mlx5_wq_cyc *wq = &rq->wqe.wq;
 	int i;
@@ -805,38 +805,33 @@ static void mlx5e_dealloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 INDIRECT_CALLABLE_SCOPE bool mlx5e_post_rx_wqes(struct mlx5e_rq *rq)
 {
 	struct mlx5_wq_cyc *wq = &rq->wqe.wq;
+	int wqe_bulk, count;
 	bool busy = false;
-	u8 wqe_bulk;
+	u16 head;
 
 	if (unlikely(!test_bit(MLX5E_RQ_STATE_ENABLED, &rq->state)))
 		return false;
 
-	wqe_bulk = rq->wqe.info.wqe_bulk;
-
-	if (mlx5_wq_cyc_missing(wq) < wqe_bulk)
+	if (mlx5_wq_cyc_missing(wq) < rq->wqe.info.wqe_bulk)
 		return false;
 
 	if (rq->page_pool)
 		page_pool_nid_changed(rq->page_pool, numa_mem_id());
 
-	do {
-		u16 head = mlx5_wq_cyc_get_head(wq);
-		int count;
-		u8 bulk;
+	wqe_bulk = mlx5_wq_cyc_missing(wq);
+	head = mlx5_wq_cyc_get_head(wq);
 
-		/* Don't allow any newly allocated WQEs to share the same page
-		 * with old WQEs that aren't completed yet. Stop earlier.
-		 */
-		bulk = wqe_bulk - ((head + wqe_bulk) & rq->wqe.info.wqe_index_mask);
+	/* Don't allow any newly allocated WQEs to share the same page with old
+	 * WQEs that aren't completed yet. Stop earlier.
+	 */
+	wqe_bulk -= (head + wqe_bulk) & rq->wqe.info.wqe_index_mask;
 
-		count = mlx5e_alloc_rx_wqes(rq, head, bulk);
-		mlx5_wq_cyc_push_n(wq, count);
-		if (unlikely(count != bulk)) {
-			rq->stats->buff_alloc_err++;
-			busy = true;
-			break;
-		}
-	} while (mlx5_wq_cyc_missing(wq) >= wqe_bulk);
+	count = mlx5e_alloc_rx_wqes(rq, head, wqe_bulk);
+	mlx5_wq_cyc_push_n(wq, count);
+	if (unlikely(count != wqe_bulk)) {
+		rq->stats->buff_alloc_err++;
+		busy = true;
+	}
 
 	/* ensure wqes are visible to device before updating doorbell record */
 	dma_wmb();
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/wq.h b/drivers/net/ethernet/mellanox/mlx5/core/wq.h
index e5c4dcd1425e..4d629e5ddbc7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/wq.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/wq.h
@@ -123,7 +123,7 @@ static inline void mlx5_wq_cyc_push(struct mlx5_wq_cyc *wq)
 	wq->cur_sz++;
 }
 
-static inline void mlx5_wq_cyc_push_n(struct mlx5_wq_cyc *wq, u8 n)
+static inline void mlx5_wq_cyc_push_n(struct mlx5_wq_cyc *wq, u16 n)
 {
 	wq->wqe_ctr += n;
 	wq->cur_sz += n;
-- 
2.37.3


* [PATCH net-next 08/16] net/mlx5e: xsk: Split out WQE allocation for legacy XSK RQ
From: Saeed Mahameed @ 2022-09-30 16:28 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Maxim Mikityanskiy

From: Maxim Mikityanskiy <maximmi@nvidia.com>

Allocation of XSK frames on legacy RQ may be made more efficient with a
specialized routine that relies on certain assumptions: there is only one
fragment, and allocation units (XSK frames) are not shared among multiple
packets. It reduces the number of branches both in the XSK code and in
the regular RQ code, because with this approach there is only a single
check of whether it's an XSK or a regular RQ.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.c   | 26 +++++++++++++++++++
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.h   |  1 +
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 11 +++++---
 3 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
index 4441d35943d1..a850141789a0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
@@ -8,6 +8,32 @@
 
 /* RX data path */
 
+int mlx5e_xsk_alloc_rx_wqes(struct mlx5e_rq *rq, u16 ix, int wqe_bulk)
+{
+	struct mlx5_wq_cyc *wq = &rq->wqe.wq;
+	int i;
+
+	for (i = 0; i < wqe_bulk; i++) {
+		int j = mlx5_wq_cyc_ctr2ix(wq, ix + i);
+		struct mlx5e_wqe_frag_info *frag;
+		struct mlx5e_rx_wqe_cyc *wqe;
+		dma_addr_t addr;
+
+		wqe = mlx5_wq_cyc_get_wqe(wq, j);
+		/* Assumes log_num_frags == 0. */
+		frag = &rq->wqe.frags[j];
+
+		frag->au->xsk = xsk_buff_alloc(rq->xsk_pool);
+		if (unlikely(!frag->au->xsk))
+			return i;
+
+		addr = xsk_buff_xdp_get_frame_dma(frag->au->xsk);
+		wqe->data[0].addr = cpu_to_be64(addr + rq->buff.headroom);
+	}
+
+	return wqe_bulk;
+}
+
 static struct sk_buff *mlx5e_xsk_construct_skb(struct mlx5e_rq *rq, void *data,
 					       u32 cqe_bcnt)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
index e702cb790476..acabcee623f9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
@@ -9,6 +9,7 @@
 
 /* RX data path */
 
+int mlx5e_xsk_alloc_rx_wqes(struct mlx5e_rq *rq, u16 ix, int wqe_bulk);
 struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
 						    struct mlx5e_mpw_info *wi,
 						    u16 cqe_bcnt,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index d620c1ed9b80..6321eb3fff31 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -359,7 +359,7 @@ static inline int mlx5e_get_rx_frag(struct mlx5e_rq *rq,
 		 * offset) should just use the new one without replenishing again
 		 * by themselves.
 		 */
-		err = mlx5e_page_alloc(rq, frag->au);
+		err = mlx5e_page_alloc_pool(rq, frag->au);
 
 	return err;
 }
@@ -393,8 +393,7 @@ static int mlx5e_alloc_rx_wqe(struct mlx5e_rq *rq, struct mlx5e_rx_wqe_cyc *wqe,
 			goto free_frags;
 
 		headroom = i == 0 ? rq->buff.headroom : 0;
-		addr = rq->xsk_pool ? xsk_buff_xdp_get_frame_dma(frag->au->xsk) :
-				      page_pool_get_dma_addr(frag->au->page);
+		addr = page_pool_get_dma_addr(frag->au->page);
 		wqe->data[i].addr = cpu_to_be64(addr + frag->offset + headroom);
 	}
 
@@ -826,7 +825,11 @@ INDIRECT_CALLABLE_SCOPE bool mlx5e_post_rx_wqes(struct mlx5e_rq *rq)
 	 */
 	wqe_bulk -= (head + wqe_bulk) & rq->wqe.info.wqe_index_mask;
 
-	count = mlx5e_alloc_rx_wqes(rq, head, wqe_bulk);
+	if (!rq->xsk_pool)
+		count = mlx5e_alloc_rx_wqes(rq, head, wqe_bulk);
+	else
+		count = mlx5e_xsk_alloc_rx_wqes(rq, head, wqe_bulk);
+
 	mlx5_wq_cyc_push_n(wq, count);
 	if (unlikely(count != wqe_bulk)) {
 		rq->stats->buff_alloc_err++;
-- 
2.37.3


* [PATCH net-next 09/16] net/mlx5e: xsk: Use xsk_buff_alloc_batch on legacy RQ
From: Saeed Mahameed @ 2022-09-30 16:28 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Maxim Mikityanskiy

From: Maxim Mikityanskiy <maximmi@nvidia.com>

XSK provides a function to allocate frames in batches for more efficient
processing. This commit starts using this function on legacy RQ, adding
a special case for XSK. The new branch basically replaces the one that
was removed from the same place a few commits earlier.

A check is made that DMA sync is not needed, because the batching
allocator falls back to returning one frame when DMA sync is needed, and
this is best handled by the loop in the standard case.
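
For readers unfamiliar with the batch API, the calling pattern it enables
looks roughly like the following hedged sketch (generic code against the
in-kernel XSK buffer API, not the mlx5e implementation; the function name
is made up, and the driver-specific wiring is in the hunks below):

#include <net/xdp_sock_drv.h>

/* Fill up to 'n' RX slots from an XSK pool. xsk_buff_alloc_batch()
 * returns how many frames it could actually provide (possibly fewer,
 * e.g. when the fill ring runs dry or when dma_need_sync forces the
 * single-frame fallback), so a smaller return value simply means a
 * partial batch.
 */
static u32 fill_rx_slots(struct xsk_buff_pool *pool,
			 struct xdp_buff **slots, u32 n)
{
	u32 done = xsk_buff_alloc_batch(pool, slots, n);
	u32 i;

	for (i = 0; i < done; i++) {
		dma_addr_t addr = xsk_buff_xdp_get_frame_dma(slots[i]);

		/* Program 'addr' into the i-th RX descriptor here. */
		(void)addr;
	}

	return done;
}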

Performance improvement is up to 8% in the aligned mode and up to 9% in
the unaligned mode.

Aligned mode, 2048-byte frames: 12.8 Mpps -> 13.5 Mpps
Aligned mode, 4096-byte frames: 11.5 Mpps -> 12.4 Mpps
Unaligned mode, 2048-byte frames: 12.2 Mpps -> 13.4 Mpps
Unaligned mode, 3072-byte frames: 11.6 Mpps -> 12.5 Mpps
Unaligned mode, 4096-byte frames: 11.2 Mpps -> 12.2 Mpps

CPU: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.c   | 40 +++++++++++++++++++
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.h   |  1 +
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  7 ++++
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   |  7 ++++
 4 files changed, 55 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
index a850141789a0..812a370f6aea 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
@@ -8,6 +8,46 @@
 
 /* RX data path */
 
+int mlx5e_xsk_alloc_rx_wqes_batched(struct mlx5e_rq *rq, u16 ix, int wqe_bulk)
+{
+	struct mlx5_wq_cyc *wq = &rq->wqe.wq;
+	struct xdp_buff **buffs;
+	u32 contig, alloc;
+	int i;
+
+	/* mlx5e_init_frags_partition creates a 1:1 mapping between
+	 * rq->wqe.frags and rq->wqe.alloc_units, which allows us to
+	 * allocate XDP buffers straight into alloc_units.
+	 */
+	BUILD_BUG_ON(sizeof(rq->wqe.alloc_units[0]) !=
+		     sizeof(rq->wqe.alloc_units[0].xsk));
+	buffs = (struct xdp_buff **)rq->wqe.alloc_units;
+	contig = mlx5_wq_cyc_get_size(wq) - ix;
+	if (wqe_bulk <= contig) {
+		alloc = xsk_buff_alloc_batch(rq->xsk_pool, buffs + ix, wqe_bulk);
+	} else {
+		alloc = xsk_buff_alloc_batch(rq->xsk_pool, buffs + ix, contig);
+		if (likely(alloc == contig))
+			alloc += xsk_buff_alloc_batch(rq->xsk_pool, buffs, wqe_bulk - contig);
+	}
+
+	for (i = 0; i < alloc; i++) {
+		int j = mlx5_wq_cyc_ctr2ix(wq, ix + i);
+		struct mlx5e_wqe_frag_info *frag;
+		struct mlx5e_rx_wqe_cyc *wqe;
+		dma_addr_t addr;
+
+		wqe = mlx5_wq_cyc_get_wqe(wq, j);
+		/* Assumes log_num_frags == 0. */
+		frag = &rq->wqe.frags[j];
+
+		addr = xsk_buff_xdp_get_frame_dma(frag->au->xsk);
+		wqe->data[0].addr = cpu_to_be64(addr + rq->buff.headroom);
+	}
+
+	return alloc;
+}
+
 int mlx5e_xsk_alloc_rx_wqes(struct mlx5e_rq *rq, u16 ix, int wqe_bulk)
 {
 	struct mlx5_wq_cyc *wq = &rq->wqe.wq;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
index acabcee623f9..7898a78237b8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
@@ -9,6 +9,7 @@
 
 /* RX data path */
 
+int mlx5e_xsk_alloc_rx_wqes_batched(struct mlx5e_rq *rq, u16 ix, int wqe_bulk);
 int mlx5e_xsk_alloc_rx_wqes(struct mlx5e_rq *rq, u16 ix, int wqe_bulk);
 struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
 						    struct mlx5e_mpw_info *wi,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 2719247b18db..6a0adda03463 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -433,6 +433,13 @@ static void mlx5e_init_frags_partition(struct mlx5e_rq *rq)
 	struct mlx5e_wqe_frag_info *prev = NULL;
 	int i;
 
+	if (rq->xsk_pool) {
+		/* Assumptions used by XSK batched allocator. */
+		WARN_ON(rq->wqe.info.num_frags != 1);
+		WARN_ON(rq->wqe.info.log_num_frags != 0);
+		WARN_ON(rq->wqe.info.arr[0].frag_stride != PAGE_SIZE);
+	}
+
 	next_frag.au = &rq->wqe.alloc_units[0];
 
 	for (i = 0; i < mlx5_wq_cyc_get_size(&rq->wqe.wq); i++) {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 6321eb3fff31..5f411c29157f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -827,7 +827,14 @@ INDIRECT_CALLABLE_SCOPE bool mlx5e_post_rx_wqes(struct mlx5e_rq *rq)
 
 	if (!rq->xsk_pool)
 		count = mlx5e_alloc_rx_wqes(rq, head, wqe_bulk);
+	else if (likely(!rq->xsk_pool->dma_need_sync))
+		count = mlx5e_xsk_alloc_rx_wqes_batched(rq, head, wqe_bulk);
 	else
+		/* If dma_need_sync is true, it's more efficient to call
+		 * xsk_buff_alloc in a loop, rather than xsk_buff_alloc_batch,
+		 * because the latter does the same check and returns only one
+		 * frame.
+		 */
 		count = mlx5e_xsk_alloc_rx_wqes(rq, head, wqe_bulk);
 
 	mlx5_wq_cyc_push_n(wq, count);
-- 
2.37.3


* [PATCH net-next 10/16] net/mlx5e: xsk: Use xsk_buff_alloc_batch on striding RQ
From: Saeed Mahameed @ 2022-09-30 16:28 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Maxim Mikityanskiy

From: Maxim Mikityanskiy <maximmi@nvidia.com>

XSK provides a function to allocate frames in batches for more efficient
processing. This commit starts using this function on striding RQ and
creates an optimized flow for XSK. A side effect is an opportunity to
optimize the regular RX flow by dropping branching for XSK cases.

Performance improvement is up to 6.4% in the aligned mode and up to 7.5%
in the unaligned mode.

Aligned mode, 2048-byte frames: 12.9 Mpps -> 13.8 Mpps
Aligned mode, 4096-byte frames: 11.8 Mpps -> 12.5 Mpps
Unaligned mode, 2048-byte frames: 11.9 Mpps -> 12.8 Mpps
Unaligned mode, 3072-byte frames: 11.4 Mpps -> 12.1 Mpps
Unaligned mode, 4096-byte frames: 11.0 Mpps -> 11.2 Mpps

CPU: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en/txrx.h |  7 ++
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.c   | 88 ++++++++++++++++++-
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.h   |  1 +
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 58 +++---------
 4 files changed, 106 insertions(+), 48 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
index f4f306bb8e6d..4456ad5cedf1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
@@ -452,4 +452,11 @@ static inline bool mlx5e_icosq_can_post_wqe(struct mlx5e_icosq *sq, u16 wqe_size
 
 	return mlx5e_wqc_has_room_for(&sq->wq, sq->cc, sq->pc, room);
 }
+
+static inline struct mlx5e_mpw_info *mlx5e_get_mpw_info(struct mlx5e_rq *rq, int i)
+{
+	size_t isz = struct_size(rq->mpwqe.info, alloc_units, rq->mpwqe.pages_per_wqe);
+
+	return (struct mlx5e_mpw_info *)((char *)rq->mpwqe.info + array_size(i, isz));
+}
 #endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
index 812a370f6aea..7bd49f0b1271 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
@@ -8,6 +8,90 @@
 
 /* RX data path */
 
+int mlx5e_xsk_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
+{
+	struct mlx5e_mpw_info *wi = mlx5e_get_mpw_info(rq, ix);
+	struct mlx5e_icosq *icosq = rq->icosq;
+	struct mlx5_wq_cyc *wq = &icosq->wq;
+	struct mlx5e_umr_wqe *umr_wqe;
+	int batch, i;
+	u32 offset; /* 17-bit value with MTT. */
+	u16 pi;
+
+	if (unlikely(!xsk_buff_can_alloc(rq->xsk_pool, rq->mpwqe.pages_per_wqe)))
+		goto err;
+
+	BUILD_BUG_ON(sizeof(wi->alloc_units[0]) != sizeof(wi->alloc_units[0].xsk));
+	batch = xsk_buff_alloc_batch(rq->xsk_pool, (struct xdp_buff **)wi->alloc_units,
+				     rq->mpwqe.pages_per_wqe);
+
+	/* If batch < pages_per_wqe, either:
+	 * 1. Some (or all) descriptors were invalid.
+	 * 2. dma_need_sync is true, and it fell back to allocating one frame.
+	 * In either case, try to continue allocating frames one by one, until
+	 * the first error, which will mean there are no more valid descriptors.
+	 */
+	for (; batch < rq->mpwqe.pages_per_wqe; batch++) {
+		wi->alloc_units[batch].xsk = xsk_buff_alloc(rq->xsk_pool);
+		if (unlikely(!wi->alloc_units[batch].xsk))
+			goto err_reuse_batch;
+	}
+
+	pi = mlx5e_icosq_get_next_pi(icosq, rq->mpwqe.umr_wqebbs);
+	umr_wqe = mlx5_wq_cyc_get_wqe(wq, pi);
+	memcpy(umr_wqe, &rq->mpwqe.umr_wqe, sizeof(struct mlx5e_umr_wqe));
+
+	if (unlikely(rq->mpwqe.unaligned)) {
+		for (i = 0; i < batch; i++) {
+			dma_addr_t addr = xsk_buff_xdp_get_frame_dma(wi->alloc_units[i].xsk);
+
+			umr_wqe->inline_ksms[i] = (struct mlx5_ksm) {
+				.key = rq->mkey_be,
+				.va = cpu_to_be64(addr),
+			};
+		}
+	} else {
+		for (i = 0; i < batch; i++) {
+			dma_addr_t addr = xsk_buff_xdp_get_frame_dma(wi->alloc_units[i].xsk);
+
+			umr_wqe->inline_mtts[i] = (struct mlx5_mtt) {
+				.ptag = cpu_to_be64(addr | MLX5_EN_WR),
+			};
+		}
+	}
+
+	bitmap_zero(wi->xdp_xmit_bitmap, rq->mpwqe.pages_per_wqe);
+	wi->consumed_strides = 0;
+
+	umr_wqe->ctrl.opmod_idx_opcode =
+		cpu_to_be32((icosq->pc << MLX5_WQE_CTRL_WQE_INDEX_SHIFT) | MLX5_OPCODE_UMR);
+
+	offset = ix * rq->mpwqe.mtts_per_wqe;
+	if (likely(!rq->mpwqe.unaligned))
+		offset = MLX5_ALIGNED_MTTS_OCTW(offset);
+	umr_wqe->uctrl.xlt_offset = cpu_to_be16(offset);
+
+	icosq->db.wqe_info[pi] = (struct mlx5e_icosq_wqe_info) {
+		.wqe_type = MLX5E_ICOSQ_WQE_UMR_RX,
+		.num_wqebbs = rq->mpwqe.umr_wqebbs,
+		.umr.rq = rq,
+	};
+
+	icosq->pc += rq->mpwqe.umr_wqebbs;
+
+	icosq->doorbell_cseg = &umr_wqe->ctrl;
+
+	return 0;
+
+err_reuse_batch:
+	while (--batch >= 0)
+		xsk_buff_free(wi->alloc_units[batch].xsk);
+
+err:
+	rq->stats->buff_alloc_err++;
+	return -ENOMEM;
+}
+
 int mlx5e_xsk_alloc_rx_wqes_batched(struct mlx5e_rq *rq, u16 ix, int wqe_bulk)
 {
 	struct mlx5_wq_cyc *wq = &rq->wqe.wq;
@@ -112,7 +196,7 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
 	 */
 	WARN_ON_ONCE(head_offset);
 
-	xdp->data_end = xdp->data + cqe_bcnt;
+	xsk_buff_set_size(xdp, cqe_bcnt);
 	xdp_set_data_meta_invalid(xdp);
 	xsk_buff_dma_sync_for_cpu(xdp, rq->xsk_pool);
 	net_prefetch(xdp->data);
@@ -159,7 +243,7 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq,
 	 */
 	WARN_ON_ONCE(wi->offset);
 
-	xdp->data_end = xdp->data + cqe_bcnt;
+	xsk_buff_set_size(xdp, cqe_bcnt);
 	xdp_set_data_meta_invalid(xdp);
 	xsk_buff_dma_sync_for_cpu(xdp, rq->xsk_pool);
 	net_prefetch(xdp->data);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
index 7898a78237b8..84a496a8d72f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
@@ -9,6 +9,7 @@
 
 /* RX data path */
 
+int mlx5e_xsk_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix);
 int mlx5e_xsk_alloc_rx_wqes_batched(struct mlx5e_rq *rq, u16 ix, int wqe_bulk);
 int mlx5e_xsk_alloc_rx_wqes(struct mlx5e_rq *rq, u16 ix, int wqe_bulk);
 struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 5f411c29157f..329702e185a9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -75,13 +75,6 @@ const struct mlx5e_rx_handlers mlx5e_rx_handlers_nic = {
 	.handle_rx_cqe_mpwqe_shampo = mlx5e_handle_rx_cqe_mpwrq_shampo,
 };
 
-static struct mlx5e_mpw_info *mlx5e_get_mpw_info(struct mlx5e_rq *rq, int i)
-{
-	size_t isz = struct_size(rq->mpwqe.info, alloc_units, rq->mpwqe.pages_per_wqe);
-
-	return (struct mlx5e_mpw_info *)((char *)rq->mpwqe.info + array_size(i, isz));
-}
-
 static inline bool mlx5e_rx_hw_stamp(struct hwtstamp_config *config)
 {
 	return config->rx_filter == HWTSTAMP_FILTER_ALL;
@@ -668,15 +661,6 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 	int err;
 	int i;
 
-	/* Check in advance that we have enough frames, instead of allocating
-	 * one-by-one, failing and moving frames to the Reuse Ring.
-	 */
-	if (rq->xsk_pool &&
-	    unlikely(!xsk_buff_can_alloc(rq->xsk_pool, rq->mpwqe.pages_per_wqe))) {
-		err = -ENOMEM;
-		goto err;
-	}
-
 	if (test_bit(MLX5E_RQ_STATE_SHAMPO, &rq->state)) {
 		err = mlx5e_alloc_rx_hd_mpwqe(rq);
 		if (unlikely(err))
@@ -687,33 +671,16 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 	umr_wqe = mlx5_wq_cyc_get_wqe(wq, pi);
 	memcpy(umr_wqe, &rq->mpwqe.umr_wqe, sizeof(struct mlx5e_umr_wqe));
 
-	if (unlikely(rq->mpwqe.unaligned)) {
-		for (i = 0; i < rq->mpwqe.pages_per_wqe; i++, au++) {
-			dma_addr_t addr;
-
-			err = mlx5e_page_alloc(rq, au);
-			if (unlikely(err))
-				goto err_unmap;
-			/* Unaligned means XSK. */
-			addr = xsk_buff_xdp_get_frame_dma(au->xsk);
-			umr_wqe->inline_ksms[i] = (struct mlx5_ksm) {
-				.key = rq->mkey_be,
-				.va = cpu_to_be64(addr),
-			};
-		}
-	} else {
-		for (i = 0; i < rq->mpwqe.pages_per_wqe; i++, au++) {
-			dma_addr_t addr;
+	for (i = 0; i < rq->mpwqe.pages_per_wqe; i++, au++) {
+		dma_addr_t addr;
 
-			err = mlx5e_page_alloc(rq, au);
-			if (unlikely(err))
-				goto err_unmap;
-			addr = rq->xsk_pool ? xsk_buff_xdp_get_frame_dma(au->xsk) :
-					      page_pool_get_dma_addr(au->page);
-			umr_wqe->inline_mtts[i] = (struct mlx5_mtt) {
-				.ptag = cpu_to_be64(addr | MLX5_EN_WR),
-			};
-		}
+		err = mlx5e_page_alloc_pool(rq, au);
+		if (unlikely(err))
+			goto err_unmap;
+		addr = page_pool_get_dma_addr(au->page);
+		umr_wqe->inline_mtts[i] = (struct mlx5_mtt) {
+			.ptag = cpu_to_be64(addr | MLX5_EN_WR),
+		};
 	}
 
 	bitmap_zero(wi->xdp_xmit_bitmap, rq->mpwqe.pages_per_wqe);
@@ -723,9 +690,7 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 		cpu_to_be32((sq->pc << MLX5_WQE_CTRL_WQE_INDEX_SHIFT) |
 			    MLX5_OPCODE_UMR);
 
-	offset = ix * rq->mpwqe.mtts_per_wqe;
-	if (!rq->mpwqe.unaligned)
-		offset = MLX5_ALIGNED_MTTS_OCTW(offset);
+	offset = MLX5_ALIGNED_MTTS_OCTW(ix * rq->mpwqe.mtts_per_wqe);
 	umr_wqe->uctrl.xlt_offset = cpu_to_be16(offset);
 
 	sq->db.wqe_info[pi] = (struct mlx5e_icosq_wqe_info) {
@@ -1016,7 +981,8 @@ INDIRECT_CALLABLE_SCOPE bool mlx5e_post_rx_mpwqes(struct mlx5e_rq *rq)
 	head = rq->mpwqe.actual_wq_head;
 	i = missing;
 	do {
-		alloc_err = mlx5e_alloc_rx_mpwqe(rq, head);
+		alloc_err = rq->xsk_pool ? mlx5e_xsk_alloc_rx_mpwqe(rq, head) :
+					   mlx5e_alloc_rx_mpwqe(rq, head);
 
 		if (unlikely(alloc_err))
 			break;
-- 
2.37.3


* [PATCH net-next 11/16] net/mlx5e: Use non-XSK page allocator in SHAMPO
From: Saeed Mahameed @ 2022-09-30 16:28 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Maxim Mikityanskiy

From: Maxim Mikityanskiy <maximmi@nvidia.com>

The SHAMPO flow is not compatible with XSK, so it can call the page pool
allocator directly to save a branch.

mlx5e_page_alloc is removed, as it's no longer used in any flow.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 329702e185a9..9d0a5c66c6a9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -293,16 +293,6 @@ static inline int mlx5e_page_alloc_pool(struct mlx5e_rq *rq, union mlx5e_alloc_u
 	return 0;
 }
 
-static inline int mlx5e_page_alloc(struct mlx5e_rq *rq, union mlx5e_alloc_unit *au)
-{
-	if (rq->xsk_pool) {
-		au->xsk = xsk_buff_alloc(rq->xsk_pool);
-		return likely(au->xsk) ? 0 : -ENOMEM;
-	} else {
-		return mlx5e_page_alloc_pool(rq, au);
-	}
-}
-
 void mlx5e_page_dma_unmap(struct mlx5e_rq *rq, struct page *page)
 {
 	dma_addr_t dma_addr = page_pool_get_dma_addr(page);
@@ -562,7 +552,7 @@ static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq,
 		if (!(header_offset & (PAGE_SIZE - 1))) {
 			union mlx5e_alloc_unit au;
 
-			err = mlx5e_page_alloc(rq, &au);
+			err = mlx5e_page_alloc_pool(rq, &au);
 			if (unlikely(err))
 				goto err_unmap;
 			page = dma_info->page = au.page;
-- 
2.37.3


* [PATCH net-next 12/16] net/mlx5e: Call mlx5e_page_release_dynamic directly where possible
From: Saeed Mahameed @ 2022-09-30 16:28 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Maxim Mikityanskiy

From: Maxim Mikityanskiy <maximmi@nvidia.com>

mlx5e_page_release calls the appropriate deallocator depending on
whether it's an XSK RQ or a regular one. Some flows that call this
function are not compatible with XSK, so they can call the non-XSK
deallocator directly to save a branch.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 20 ++++---------------
 1 file changed, 4 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 9d0a5c66c6a9..d0db6a66cb46 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -588,12 +588,8 @@ static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq,
 	while (--i >= 0) {
 		dma_info = &shampo->info[--index];
 		if (!(i & (MLX5E_SHAMPO_WQ_HEADER_PER_PAGE - 1))) {
-			union mlx5e_alloc_unit au = {
-				.page = dma_info->page,
-			};
-
 			dma_info->addr = ALIGN_DOWN(dma_info->addr, PAGE_SIZE);
-			mlx5e_page_release(rq, &au, true);
+			mlx5e_page_release_dynamic(rq, dma_info->page, true);
 		}
 	}
 	rq->stats->buff_alloc_err++;
@@ -698,7 +694,7 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 err_unmap:
 	while (--i >= 0) {
 		au--;
-		mlx5e_page_release(rq, au, true);
+		mlx5e_page_release_dynamic(rq, au->page, true);
 	}
 
 err:
@@ -731,12 +727,8 @@ void mlx5e_shampo_dealloc_hd(struct mlx5e_rq *rq, u16 len, u16 start, bool close
 		hd_info = &shampo->info[index];
 		hd_info->addr = ALIGN_DOWN(hd_info->addr, PAGE_SIZE);
 		if (hd_info->page != deleted_page) {
-			union mlx5e_alloc_unit au = {
-				.page = hd_info->page,
-			};
-
 			deleted_page = hd_info->page;
-			mlx5e_page_release(rq, &au, false);
+			mlx5e_page_release_dynamic(rq, hd_info->page, false);
 		}
 	}
 
@@ -2061,12 +2053,8 @@ mlx5e_free_rx_shampo_hd_entry(struct mlx5e_rq *rq, u16 header_index)
 	u64 addr = shampo->info[header_index].addr;
 
 	if (((header_index + 1) & (MLX5E_SHAMPO_WQ_HEADER_PER_PAGE - 1)) == 0) {
-		union mlx5e_alloc_unit au = {
-			.page = shampo->info[header_index].page,
-		};
-
 		shampo->info[header_index].addr = ALIGN_DOWN(addr, PAGE_SIZE);
-		mlx5e_page_release(rq, &au, true);
+		mlx5e_page_release_dynamic(rq, shampo->info[header_index].page, true);
 	}
 	bitmap_clear(shampo->bitmap, header_index, 1);
 }
-- 
2.37.3


* [PATCH net-next 13/16] net/mlx5e: Optimize RQ page deallocation
From: Saeed Mahameed @ 2022-09-30 16:29 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Maxim Mikityanskiy

From: Maxim Mikityanskiy <maximmi@nvidia.com>

mlx5e_free_rx_mpwqe loops over all pages of an MPWQE, calling
mlx5e_page_release for the ones that are not scheduled for XDP_TX or
XDP_REDIRECT; and mlx5e_page_release checks whether it's an XSK RQ or a
regular one for each page/XSK frame. This check can be moved outside the
loop to reduce the number of branches.

mlx5e_free_rx_wqe loops over all fragments, calling mlx5e_page_release
for the ones that are last in a page; and mlx5e_page_release checks
whether it's an XSK RQ or a regular one for each fragment. Since XSK
doesn't support multiple fragments, this flow can be optimized for both
the XSK and regular cases:

1. Make an early check for XSK and call its deallocator directly, saving
three branches (the loop condition, the frag->last_in_page check and the
selection of the deallocator).

2. Call the regular deallocator directly in the non-XSK case, saving a
branch per fragment, except the first one.

After the changes, mlx5e_page_release is removed, as there are no
callers left.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.c   |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 41 +++++++++++--------
 2 files changed, 24 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
index 7bd49f0b1271..661d2d5748f4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
@@ -253,7 +253,7 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq,
 		return NULL; /* page/packet was consumed by XDP */
 
 	/* XDP_PASS: copy the data from the UMEM to a new SKB. The frame reuse
-	 * will be handled by mlx5e_put_rx_frag.
+	 * will be handled by mlx5e_free_rx_wqe.
 	 * On SKB allocation failure, NULL is returned.
 	 */
 	return mlx5e_xsk_construct_skb(rq, xdp->data, xdp->data_end - xdp->data);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index d0db6a66cb46..36eda4c958a0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -317,20 +317,6 @@ void mlx5e_page_release_dynamic(struct mlx5e_rq *rq, struct page *page, bool rec
 	}
 }
 
-static inline void mlx5e_page_release(struct mlx5e_rq *rq,
-				      union mlx5e_alloc_unit *au,
-				      bool recycle)
-{
-	if (rq->xsk_pool)
-		/* The `recycle` parameter is ignored, and the page is always
-		 * put into the Reuse Ring, because there is no way to return
-		 * the page to the userspace when the interface goes down.
-		 */
-		xsk_buff_free(au->xsk);
-	else
-		mlx5e_page_release_dynamic(rq, au->page, recycle);
-}
-
 static inline int mlx5e_get_rx_frag(struct mlx5e_rq *rq,
 				    struct mlx5e_wqe_frag_info *frag)
 {
@@ -352,7 +338,7 @@ static inline void mlx5e_put_rx_frag(struct mlx5e_rq *rq,
 				     bool recycle)
 {
 	if (frag->last_in_page)
-		mlx5e_page_release(rq, frag->au, recycle);
+		mlx5e_page_release_dynamic(rq, frag->au->page, recycle);
 }
 
 static inline struct mlx5e_wqe_frag_info *get_frag(struct mlx5e_rq *rq, u16 ix)
@@ -395,6 +381,15 @@ static inline void mlx5e_free_rx_wqe(struct mlx5e_rq *rq,
 {
 	int i;
 
+	if (rq->xsk_pool) {
+		/* The `recycle` parameter is ignored, and the page is always
+		 * put into the Reuse Ring, because there is no way to return
+		 * the page to the userspace when the interface goes down.
+		 */
+		xsk_buff_free(wi->au->xsk);
+		return;
+	}
+
 	for (i = 0; i < rq->wqe.info.num_frags; i++, wi++)
 		mlx5e_put_rx_frag(rq, wi, recycle);
 }
@@ -463,9 +458,19 @@ mlx5e_free_rx_mpwqe(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi, bool recycle
 
 	no_xdp_xmit = bitmap_empty(wi->xdp_xmit_bitmap, rq->mpwqe.pages_per_wqe);
 
-	for (i = 0; i < rq->mpwqe.pages_per_wqe; i++)
-		if (no_xdp_xmit || !test_bit(i, wi->xdp_xmit_bitmap))
-			mlx5e_page_release(rq, &alloc_units[i], recycle);
+	if (rq->xsk_pool) {
+		/* The `recycle` parameter is ignored, and the page is always
+		 * put into the Reuse Ring, because there is no way to return
+		 * the page to the userspace when the interface goes down.
+		 */
+		for (i = 0; i < rq->mpwqe.pages_per_wqe; i++)
+			if (no_xdp_xmit || !test_bit(i, wi->xdp_xmit_bitmap))
+				xsk_buff_free(alloc_units[i].xsk);
+	} else {
+		for (i = 0; i < rq->mpwqe.pages_per_wqe; i++)
+			if (no_xdp_xmit || !test_bit(i, wi->xdp_xmit_bitmap))
+				mlx5e_page_release_dynamic(rq, alloc_units[i].page, recycle);
+	}
 }
 
 static void mlx5e_post_rx_mpwqe(struct mlx5e_rq *rq, u8 n)
-- 
2.37.3



* [PATCH net-next 14/16] net/mlx5e: xsk: Support XDP metadata on XSK RQs
  2022-09-30 16:28 [PATCH net-next 00/16] mlx5 xsk updates part3 2022-09-30 Saeed Mahameed
                   ` (12 preceding siblings ...)
  2022-09-30 16:29 ` [PATCH net-next 13/16] net/mlx5e: Optimize RQ page deallocation Saeed Mahameed
@ 2022-09-30 16:29 ` Saeed Mahameed
  2022-09-30 16:29 ` [PATCH net-next 15/16] net/mlx5e: Introduce the mlx5e_flush_rq function Saeed Mahameed
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2022-09-30 16:29 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Maxim Mikityanskiy

From: Maxim Mikityanskiy <maximmi@nvidia.com>

Add support for XDP metadata on XSK RQs to enable cross-program
communication. The driver no longer calls xdp_set_data_meta_invalid; on
XDP_PASS, the metadata prepared by the XDP program is now copied to the
newly allocated SKB along with the packet data.

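For illustration only (not part of this patch): with metadata no longer
invalidated, a value that an XDP program places in front of the frame
survives XDP_PASS on an XSK RQ and can be read further up the stack (for
example by a TC BPF program) from the SKB metadata area. A minimal
sketch of the producing side, assuming a hypothetical 4-byte flow mark:

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  SEC("xdp")
  int xdp_mark(struct xdp_md *ctx)
  {
          __u32 *mark;

          /* Reserve 4 bytes of metadata in front of the packet data. */
          if (bpf_xdp_adjust_meta(ctx, -(int)sizeof(*mark)))
                  return XDP_PASS;

          mark = (void *)(long)ctx->data_meta;
          if ((void *)(mark + 1) > (void *)(long)ctx->data)
                  return XDP_PASS;

          *mark = 42; /* hypothetical flow mark, consumed after XDP_PASS */
          return XDP_PASS;
  }

  char _license[] SEC("license") = "GPL";

With this patch, mlx5e copies xdp->data_meta together with the payload
and calls skb_metadata_set(), so the mark above stays attached to the
SKB.
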
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.c   | 20 +++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
index 661d2d5748f4..aebc1d5a9004 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
@@ -158,18 +158,24 @@ int mlx5e_xsk_alloc_rx_wqes(struct mlx5e_rq *rq, u16 ix, int wqe_bulk)
 	return wqe_bulk;
 }
 
-static struct sk_buff *mlx5e_xsk_construct_skb(struct mlx5e_rq *rq, void *data,
-					       u32 cqe_bcnt)
+static struct sk_buff *mlx5e_xsk_construct_skb(struct mlx5e_rq *rq, struct xdp_buff *xdp)
 {
+	u32 totallen = xdp->data_end - xdp->data_meta;
+	u32 metalen = xdp->data - xdp->data_meta;
 	struct sk_buff *skb;
 
-	skb = napi_alloc_skb(rq->cq.napi, cqe_bcnt);
+	skb = napi_alloc_skb(rq->cq.napi, totallen);
 	if (unlikely(!skb)) {
 		rq->stats->buff_alloc_err++;
 		return NULL;
 	}
 
-	skb_put_data(skb, data, cqe_bcnt);
+	skb_put_data(skb, xdp->data_meta, totallen);
+
+	if (metalen) {
+		skb_metadata_set(skb, metalen);
+		__skb_pull(skb, metalen);
+	}
 
 	return skb;
 }
@@ -197,7 +203,6 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
 	WARN_ON_ONCE(head_offset);
 
 	xsk_buff_set_size(xdp, cqe_bcnt);
-	xdp_set_data_meta_invalid(xdp);
 	xsk_buff_dma_sync_for_cpu(xdp, rq->xsk_pool);
 	net_prefetch(xdp->data);
 
@@ -226,7 +231,7 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
 	/* XDP_PASS: copy the data from the UMEM to a new SKB and reuse the
 	 * frame. On SKB allocation failure, NULL is returned.
 	 */
-	return mlx5e_xsk_construct_skb(rq, xdp->data, xdp->data_end - xdp->data);
+	return mlx5e_xsk_construct_skb(rq, xdp);
 }
 
 struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq,
@@ -244,7 +249,6 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq,
 	WARN_ON_ONCE(wi->offset);
 
 	xsk_buff_set_size(xdp, cqe_bcnt);
-	xdp_set_data_meta_invalid(xdp);
 	xsk_buff_dma_sync_for_cpu(xdp, rq->xsk_pool);
 	net_prefetch(xdp->data);
 
@@ -256,5 +260,5 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq,
 	 * will be handled by mlx5e_free_rx_wqe.
 	 * On SKB allocation failure, NULL is returned.
 	 */
-	return mlx5e_xsk_construct_skb(rq, xdp->data, xdp->data_end - xdp->data);
+	return mlx5e_xsk_construct_skb(rq, xdp);
 }
-- 
2.37.3



* [PATCH net-next 15/16] net/mlx5e: Introduce the mlx5e_flush_rq function
  2022-09-30 16:28 [PATCH net-next 00/16] mlx5 xsk updates part3 2022-09-30 Saeed Mahameed
                   ` (13 preceding siblings ...)
  2022-09-30 16:29 ` [PATCH net-next 14/16] net/mlx5e: xsk: Support XDP metadata on XSK RQs Saeed Mahameed
@ 2022-09-30 16:29 ` Saeed Mahameed
  2022-09-30 16:29 ` [PATCH net-next 16/16] net/mlx5e: xsk: Use queue indices starting from 0 for XSK queues Saeed Mahameed
  2022-10-01 20:40 ` [PATCH net-next 00/16] mlx5 xsk updates part3 2022-09-30 patchwork-bot+netdevbpf
  16 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2022-09-30 16:29 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Maxim Mikityanskiy

From: Maxim Mikityanskiy <maximmi@nvidia.com>

Add a function to flush an RQ: clean up descriptors, release pages and
reset the RQ. This procedure is used by the recovery flow, and it will
also be used in a following commit to free some memory when switching a
channel to XSK mode.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  2 +-
 .../mellanox/mlx5/core/en/reporter_rx.c       | 23 +--------------
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 28 ++++++++++++++++++-
 3 files changed, 29 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 8e174a7f7c25..238307390400 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -1097,7 +1097,7 @@ void mlx5e_activate_priv_channels(struct mlx5e_priv *priv);
 void mlx5e_deactivate_priv_channels(struct mlx5e_priv *priv);
 int mlx5e_ptp_rx_manage_fs_ctx(struct mlx5e_priv *priv, void *ctx);
 
-int mlx5e_modify_rq_state(struct mlx5e_rq *rq, int curr_state, int next_state);
+int mlx5e_flush_rq(struct mlx5e_rq *rq, int curr_state);
 void mlx5e_activate_rq(struct mlx5e_rq *rq);
 void mlx5e_deactivate_rq(struct mlx5e_rq *rq);
 void mlx5e_activate_icosq(struct mlx5e_icosq *icosq);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
index 2b946ae1d97f..5f6f95ad6888 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
@@ -134,34 +134,13 @@ static int mlx5e_rx_reporter_err_icosq_cqe_recover(void *ctx)
 	return err;
 }
 
-static int mlx5e_rq_to_ready(struct mlx5e_rq *rq, int curr_state)
-{
-	struct net_device *dev = rq->netdev;
-	int err;
-
-	err = mlx5e_modify_rq_state(rq, curr_state, MLX5_RQC_STATE_RST);
-	if (err) {
-		netdev_err(dev, "Failed to move rq 0x%x to reset\n", rq->rqn);
-		return err;
-	}
-	err = mlx5e_modify_rq_state(rq, MLX5_RQC_STATE_RST, MLX5_RQC_STATE_RDY);
-	if (err) {
-		netdev_err(dev, "Failed to move rq 0x%x to ready\n", rq->rqn);
-		return err;
-	}
-
-	return 0;
-}
-
 static int mlx5e_rx_reporter_err_rq_cqe_recover(void *ctx)
 {
 	struct mlx5e_rq *rq = ctx;
 	int err;
 
 	mlx5e_deactivate_rq(rq);
-	mlx5e_free_rx_descs(rq);
-
-	err = mlx5e_rq_to_ready(rq, MLX5_RQC_STATE_ERR);
+	err = mlx5e_flush_rq(rq, MLX5_RQC_STATE_ERR);
 	clear_bit(MLX5E_RQ_STATE_RECOVERING, &rq->state);
 	if (err)
 		return err;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 6a0adda03463..129a0d678cce 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -899,7 +899,7 @@ int mlx5e_create_rq(struct mlx5e_rq *rq, struct mlx5e_rq_param *param)
 	return err;
 }
 
-int mlx5e_modify_rq_state(struct mlx5e_rq *rq, int curr_state, int next_state)
+static int mlx5e_modify_rq_state(struct mlx5e_rq *rq, int curr_state, int next_state)
 {
 	struct mlx5_core_dev *mdev = rq->mdev;
 
@@ -928,6 +928,32 @@ int mlx5e_modify_rq_state(struct mlx5e_rq *rq, int curr_state, int next_state)
 	return err;
 }
 
+static int mlx5e_rq_to_ready(struct mlx5e_rq *rq, int curr_state)
+{
+	struct net_device *dev = rq->netdev;
+	int err;
+
+	err = mlx5e_modify_rq_state(rq, curr_state, MLX5_RQC_STATE_RST);
+	if (err) {
+		netdev_err(dev, "Failed to move rq 0x%x to reset\n", rq->rqn);
+		return err;
+	}
+	err = mlx5e_modify_rq_state(rq, MLX5_RQC_STATE_RST, MLX5_RQC_STATE_RDY);
+	if (err) {
+		netdev_err(dev, "Failed to move rq 0x%x to ready\n", rq->rqn);
+		return err;
+	}
+
+	return 0;
+}
+
+int mlx5e_flush_rq(struct mlx5e_rq *rq, int curr_state)
+{
+	mlx5e_free_rx_descs(rq);
+
+	return mlx5e_rq_to_ready(rq, curr_state);
+}
+
 static int mlx5e_modify_rq_scatter_fcs(struct mlx5e_rq *rq, bool enable)
 {
 	struct mlx5_core_dev *mdev = rq->mdev;
-- 
2.37.3



* [PATCH net-next 16/16] net/mlx5e: xsk: Use queue indices starting from 0 for XSK queues
  2022-09-30 16:28 [PATCH net-next 00/16] mlx5 xsk updates part3 2022-09-30 Saeed Mahameed
                   ` (14 preceding siblings ...)
  2022-09-30 16:29 ` [PATCH net-next 15/16] net/mlx5e: Introduce the mlx5e_flush_rq function Saeed Mahameed
@ 2022-09-30 16:29 ` Saeed Mahameed
  2022-10-01 20:40 ` [PATCH net-next 00/16] mlx5 xsk updates part3 2022-09-30 patchwork-bot+netdevbpf
  16 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2022-09-30 16:29 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Maxim Mikityanskiy

From: Maxim Mikityanskiy <maximmi@nvidia.com>

In the initial implementation of XSK in mlx5e, XSK RQs coexisted with
regular RQs in the same channel. The main idea was to allow RSS to work
the same for regular traffic, without the need to reconfigure RSS to
exclude XSK queues.

However, this scheme didn't prove to be beneficial, mainly because of
incompatibility with the schemes used by other vendors. Some tools don't
properly support using higher indices for XSK queues, and some get
confused by the double amount of RQs exposed in sysfs. Some use cases
are purely XSK,
and allocating the same amount of unused regular RQs is a waste of
resources.

This commit changes the queuing scheme to the standard one, where XSK
RQs replace regular RQs on the channels where XSK sockets are open. Two
RQs still exist in the channel to allow failsafe disable of XSK, but
only one is exposed at a time. The next commit will achieve the desired
memory save by flushing the buffers when the regular RQ is unused.

As a result of this transition:

1. It's possible to use RSS contexts over XSK RQs.

2. It's possible to dedicate all queues to XSK.

3. When XSK RQs coexist with regular RQs, the admin should make sure no
unwanted traffic goes into XSK RQs by either excluding them from RSS or
setting up the XDP program to return XDP_PASS for non-XSK traffic (a
minimal sketch follows this list).

4. When using a mixed fleet of mlx5e devices and other netdevs, the same
configuration can be applied. If the application supports the fallback
to copy mode on unsupported drivers, it will work too.
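
As an illustration of point 3 above (not part of this patch), a minimal
XDP program can redirect only the intended flow to the AF_XDP socket
bound to the receiving queue and return XDP_PASS for everything else.
The UDP port and the map name below are hypothetical, and IP options are
ignored for brevity:

  #include <linux/bpf.h>
  #include <linux/if_ether.h>
  #include <linux/in.h>
  #include <linux/ip.h>
  #include <linux/udp.h>
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_endian.h>

  struct {
          __uint(type, BPF_MAP_TYPE_XSKMAP);
          __uint(max_entries, 64);
          __type(key, __u32);
          __type(value, __u32);
  } xsks_map SEC(".maps");

  SEC("xdp")
  int steer_xsk(struct xdp_md *ctx)
  {
          void *data = (void *)(long)ctx->data;
          void *data_end = (void *)(long)ctx->data_end;
          struct ethhdr *eth = data;
          struct iphdr *iph;
          struct udphdr *udph;

          if ((void *)(eth + 1) > data_end || eth->h_proto != bpf_htons(ETH_P_IP))
                  return XDP_PASS;
          iph = (void *)(eth + 1);
          if ((void *)(iph + 1) > data_end || iph->protocol != IPPROTO_UDP)
                  return XDP_PASS;
          udph = (void *)(iph + 1);
          if ((void *)(udph + 1) > data_end || udph->dest != bpf_htons(4242))
                  return XDP_PASS;

          /* XSK flow: redirect to the socket bound to this queue, or pass if none. */
          return bpf_redirect_map(&xsks_map, ctx->rx_queue_index, XDP_PASS);
  }

  char _license[] SEC("license") = "GPL";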

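For point 4 (also not part of this patch), an application that prefers
zero-copy would typically retry the bind in copy mode when the driver
rejects zero-copy, so the same configuration keeps working across a
mixed fleet. A sketch using the libxdp/libbpf socket API; the helper
name is the application's own:

  #include <linux/if_xdp.h>
  #include <xdp/xsk.h> /* or <bpf/xsk.h> with older libbpf */

  static int create_xsk_with_fallback(struct xsk_socket **xsk, const char *ifname,
                                      __u32 queue_id, struct xsk_umem *umem,
                                      struct xsk_ring_cons *rx, struct xsk_ring_prod *tx)
  {
          struct xsk_socket_config cfg = {
                  .rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
                  .tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
                  .bind_flags = XDP_USE_NEED_WAKEUP | XDP_ZEROCOPY,
          };
          int err;

          /* Prefer zero-copy; drivers without zero-copy support reject the bind. */
          err = xsk_socket__create(xsk, ifname, queue_id, umem, rx, tx, &cfg);
          if (!err)
                  return 0;

          /* Fall back to copy mode, which works on any netdev. */
          cfg.bind_flags = XDP_USE_NEED_WAKEUP | XDP_COPY;
          return xsk_socket__create(xsk, ifname, queue_id, umem, rx, tx, &cfg);
  }
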
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |   7 --
 .../ethernet/mellanox/mlx5/core/en/channels.c |  29 +++--
 .../ethernet/mellanox/mlx5/core/en/channels.h |   3 +-
 .../ethernet/mellanox/mlx5/core/en/params.h   |  32 -----
 .../ethernet/mellanox/mlx5/core/en/rx_res.c   | 118 +++---------------
 .../ethernet/mellanox/mlx5/core/en/rx_res.h   |   9 +-
 .../ethernet/mellanox/mlx5/core/en/xsk/pool.c |  17 +--
 .../mellanox/mlx5/core/en/xsk/setup.c         |   2 +-
 .../ethernet/mellanox/mlx5/core/en/xsk/tx.c   |   5 +-
 .../mellanox/mlx5/core/en_fs_ethtool.c        |  13 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  17 +--
 .../net/ethernet/mellanox/mlx5/core/en_rep.c  |   3 -
 .../ethernet/mellanox/mlx5/core/ipoib/ipoib.c |   1 -
 .../mellanox/mlx5/core/ipoib/ipoib_vlan.c     |   1 -
 14 files changed, 52 insertions(+), 205 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 238307390400..6bc6472b98f2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -181,12 +181,6 @@ do {                                                            \
 #define mlx5e_state_dereference(priv, p) \
 	rcu_dereference_protected((p), lockdep_is_held(&(priv)->state_lock))
 
-enum mlx5e_rq_group {
-	MLX5E_RQ_GROUP_REGULAR,
-	MLX5E_RQ_GROUP_XSK,
-#define MLX5E_NUM_RQ_GROUPS(g) (1 + MLX5E_RQ_GROUP_##g)
-};
-
 static inline u8 mlx5e_get_num_lag_ports(struct mlx5_core_dev *mdev)
 {
 	if (mlx5_lag_is_lacp_owner(mdev))
@@ -1005,7 +999,6 @@ struct mlx5e_profile {
 	mlx5e_stats_grp_t *stats_grps;
 	const struct mlx5e_rx_handlers *rx_handlers;
 	int	max_tc;
-	u8	rq_groups;
 	u32     features;
 };
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/channels.c b/drivers/net/ethernet/mellanox/mlx5/core/en/channels.c
index e7c14c0de0a7..48581ea3adcb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/channels.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/channels.c
@@ -10,28 +10,33 @@ unsigned int mlx5e_channels_get_num(struct mlx5e_channels *chs)
 	return chs->num;
 }
 
-void mlx5e_channels_get_regular_rqn(struct mlx5e_channels *chs, unsigned int ix, u32 *rqn)
+static struct mlx5e_channel *mlx5e_channels_get(struct mlx5e_channels *chs, unsigned int ix)
 {
-	struct mlx5e_channel *c;
+	WARN_ON_ONCE(ix >= mlx5e_channels_get_num(chs));
+	return chs->c[ix];
+}
 
-	WARN_ON(ix >= mlx5e_channels_get_num(chs));
-	c = chs->c[ix];
+bool mlx5e_channels_is_xsk(struct mlx5e_channels *chs, unsigned int ix)
+{
+	struct mlx5e_channel *c = mlx5e_channels_get(chs, ix);
 
-	*rqn = c->rq.rqn;
+	return test_bit(MLX5E_CHANNEL_STATE_XSK, c->state);
 }
 
-bool mlx5e_channels_get_xsk_rqn(struct mlx5e_channels *chs, unsigned int ix, u32 *rqn)
+void mlx5e_channels_get_regular_rqn(struct mlx5e_channels *chs, unsigned int ix, u32 *rqn)
 {
-	struct mlx5e_channel *c;
+	struct mlx5e_channel *c = mlx5e_channels_get(chs, ix);
 
-	WARN_ON(ix >= mlx5e_channels_get_num(chs));
-	c = chs->c[ix];
+	*rqn = c->rq.rqn;
+}
 
-	if (!test_bit(MLX5E_CHANNEL_STATE_XSK, c->state))
-		return false;
+void mlx5e_channels_get_xsk_rqn(struct mlx5e_channels *chs, unsigned int ix, u32 *rqn)
+{
+	struct mlx5e_channel *c = mlx5e_channels_get(chs, ix);
+
+	WARN_ON_ONCE(!test_bit(MLX5E_CHANNEL_STATE_XSK, c->state));
 
 	*rqn = c->xskrq.rqn;
-	return true;
 }
 
 bool mlx5e_channels_get_ptp_rqn(struct mlx5e_channels *chs, u32 *rqn)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/channels.h b/drivers/net/ethernet/mellanox/mlx5/core/en/channels.h
index ca00cbc827cb..637ca90daaa8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/channels.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/channels.h
@@ -9,8 +9,9 @@
 struct mlx5e_channels;
 
 unsigned int mlx5e_channels_get_num(struct mlx5e_channels *chs);
+bool mlx5e_channels_is_xsk(struct mlx5e_channels *chs, unsigned int ix);
 void mlx5e_channels_get_regular_rqn(struct mlx5e_channels *chs, unsigned int ix, u32 *rqn);
-bool mlx5e_channels_get_xsk_rqn(struct mlx5e_channels *chs, unsigned int ix, u32 *rqn);
+void mlx5e_channels_get_xsk_rqn(struct mlx5e_channels *chs, unsigned int ix, u32 *rqn);
 bool mlx5e_channels_get_ptp_rqn(struct mlx5e_channels *chs, u32 *rqn);
 
 #endif /* __MLX5_EN_CHANNELS_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.h b/drivers/net/ethernet/mellanox/mlx5/core/en/params.h
index cb862c478376..a3952afdcbe4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.h
@@ -53,38 +53,6 @@ struct mlx5e_create_sq_param {
 	u8                          min_inline_mode;
 };
 
-static inline bool mlx5e_qid_get_ch_if_in_group(struct mlx5e_params *params,
-						u16 qid,
-						enum mlx5e_rq_group group,
-						u16 *ix)
-{
-	int nch = params->num_channels;
-	int ch = qid - nch * group;
-
-	if (ch < 0 || ch >= nch)
-		return false;
-
-	*ix = ch;
-	return true;
-}
-
-static inline void mlx5e_qid_get_ch_and_group(struct mlx5e_params *params,
-					      u16 qid,
-					      u16 *ix,
-					      enum mlx5e_rq_group *group)
-{
-	u16 nch = params->num_channels;
-
-	*ix = qid % nch;
-	*group = qid / nch;
-}
-
-static inline bool mlx5e_qid_validate(const struct mlx5e_profile *profile,
-				      struct mlx5e_params *params, u64 qid)
-{
-	return qid < params->num_channels * profile->rq_groups;
-}
-
 /* Striding RQ dynamic parameters */
 
 u8 mlx5e_mpwrq_page_shift(struct mlx5_core_dev *mdev, struct mlx5e_xsk_param *xsk);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c
index 3436ecfcbc2f..e1095bc36543 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c
@@ -24,8 +24,6 @@ struct mlx5e_rx_res {
 	struct {
 		struct mlx5e_rqt direct_rqt;
 		struct mlx5e_tir direct_tir;
-		struct mlx5e_rqt xsk_rqt;
-		struct mlx5e_tir xsk_tir;
 	} *channels;
 
 	struct {
@@ -320,48 +318,8 @@ static int mlx5e_rx_res_channels_init(struct mlx5e_rx_res *res)
 		mlx5e_tir_builder_clear(builder);
 	}
 
-	if (!(res->features & MLX5E_RX_RES_FEATURE_XSK))
-		goto out;
-
-	for (ix = 0; ix < res->max_nch; ix++) {
-		err = mlx5e_rqt_init_direct(&res->channels[ix].xsk_rqt,
-					    res->mdev, false, res->drop_rqn);
-		if (err) {
-			mlx5_core_warn(res->mdev, "Failed to create an XSK RQT: err = %d, ix = %u\n",
-				       err, ix);
-			goto err_destroy_xsk_rqts;
-		}
-	}
-
-	for (ix = 0; ix < res->max_nch; ix++) {
-		mlx5e_tir_builder_build_rqt(builder, res->mdev->mlx5e_res.hw_objs.td.tdn,
-					    mlx5e_rqt_get_rqtn(&res->channels[ix].xsk_rqt),
-					    inner_ft_support);
-		mlx5e_tir_builder_build_packet_merge(builder, &res->pkt_merge_param);
-		mlx5e_tir_builder_build_direct(builder);
-
-		err = mlx5e_tir_init(&res->channels[ix].xsk_tir, builder, res->mdev, true);
-		if (err) {
-			mlx5_core_warn(res->mdev, "Failed to create an XSK TIR: err = %d, ix = %u\n",
-				       err, ix);
-			goto err_destroy_xsk_tirs;
-		}
-
-		mlx5e_tir_builder_clear(builder);
-	}
-
 	goto out;
 
-err_destroy_xsk_tirs:
-	while (--ix >= 0)
-		mlx5e_tir_destroy(&res->channels[ix].xsk_tir);
-
-	ix = res->max_nch;
-err_destroy_xsk_rqts:
-	while (--ix >= 0)
-		mlx5e_rqt_destroy(&res->channels[ix].xsk_rqt);
-
-	ix = res->max_nch;
 err_destroy_direct_tirs:
 	while (--ix >= 0)
 		mlx5e_tir_destroy(&res->channels[ix].direct_tir);
@@ -420,12 +378,6 @@ static void mlx5e_rx_res_channels_destroy(struct mlx5e_rx_res *res)
 	for (ix = 0; ix < res->max_nch; ix++) {
 		mlx5e_tir_destroy(&res->channels[ix].direct_tir);
 		mlx5e_rqt_destroy(&res->channels[ix].direct_rqt);
-
-		if (!(res->features & MLX5E_RX_RES_FEATURE_XSK))
-			continue;
-
-		mlx5e_tir_destroy(&res->channels[ix].xsk_tir);
-		mlx5e_rqt_destroy(&res->channels[ix].xsk_rqt);
 	}
 
 	kvfree(res->channels);
@@ -491,13 +443,6 @@ u32 mlx5e_rx_res_get_tirn_direct(struct mlx5e_rx_res *res, unsigned int ix)
 	return mlx5e_tir_get_tirn(&res->channels[ix].direct_tir);
 }
 
-u32 mlx5e_rx_res_get_tirn_xsk(struct mlx5e_rx_res *res, unsigned int ix)
-{
-	WARN_ON(!(res->features & MLX5E_RX_RES_FEATURE_XSK));
-
-	return mlx5e_tir_get_tirn(&res->channels[ix].xsk_tir);
-}
-
 u32 mlx5e_rx_res_get_tirn_rss(struct mlx5e_rx_res *res, enum mlx5_traffic_types tt)
 {
 	struct mlx5e_rss *rss = res->rss[0];
@@ -527,26 +472,14 @@ static void mlx5e_rx_res_channel_activate_direct(struct mlx5e_rx_res *res,
 						 struct mlx5e_channels *chs,
 						 unsigned int ix)
 {
-	u32 rqn;
+	u32 rqn = res->rss_rqns[ix];
 	int err;
 
-	mlx5e_channels_get_regular_rqn(chs, ix, &rqn);
 	err = mlx5e_rqt_redirect_direct(&res->channels[ix].direct_rqt, rqn);
 	if (err)
 		mlx5_core_warn(res->mdev, "Failed to redirect direct RQT %#x to RQ %#x (channel %u): err = %d\n",
 			       mlx5e_rqt_get_rqtn(&res->channels[ix].direct_rqt),
 			       rqn, ix, err);
-
-	if (!(res->features & MLX5E_RX_RES_FEATURE_XSK))
-		return;
-
-	if (!mlx5e_channels_get_xsk_rqn(chs, ix, &rqn))
-		rqn = res->drop_rqn;
-	err = mlx5e_rqt_redirect_direct(&res->channels[ix].xsk_rqt, rqn);
-	if (err)
-		mlx5_core_warn(res->mdev, "Failed to redirect XSK RQT %#x to RQ %#x (channel %u): err = %d\n",
-			       mlx5e_rqt_get_rqtn(&res->channels[ix].xsk_rqt),
-			       rqn, ix, err);
 }
 
 static void mlx5e_rx_res_channel_deactivate_direct(struct mlx5e_rx_res *res,
@@ -559,15 +492,6 @@ static void mlx5e_rx_res_channel_deactivate_direct(struct mlx5e_rx_res *res,
 		mlx5_core_warn(res->mdev, "Failed to redirect direct RQT %#x to drop RQ %#x (channel %u): err = %d\n",
 			       mlx5e_rqt_get_rqtn(&res->channels[ix].direct_rqt),
 			       res->drop_rqn, ix, err);
-
-	if (!(res->features & MLX5E_RX_RES_FEATURE_XSK))
-		return;
-
-	err = mlx5e_rqt_redirect_direct(&res->channels[ix].xsk_rqt, res->drop_rqn);
-	if (err)
-		mlx5_core_warn(res->mdev, "Failed to redirect XSK RQT %#x to drop RQ %#x (channel %u): err = %d\n",
-			       mlx5e_rqt_get_rqtn(&res->channels[ix].xsk_rqt),
-			       res->drop_rqn, ix, err);
 }
 
 void mlx5e_rx_res_channels_activate(struct mlx5e_rx_res *res, struct mlx5e_channels *chs)
@@ -577,8 +501,12 @@ void mlx5e_rx_res_channels_activate(struct mlx5e_rx_res *res, struct mlx5e_chann
 
 	nch = mlx5e_channels_get_num(chs);
 
-	for (ix = 0; ix < chs->num; ix++)
-		mlx5e_channels_get_regular_rqn(chs, ix, &res->rss_rqns[ix]);
+	for (ix = 0; ix < chs->num; ix++) {
+		if (mlx5e_channels_is_xsk(chs, ix))
+			mlx5e_channels_get_xsk_rqn(chs, ix, &res->rss_rqns[ix]);
+		else
+			mlx5e_channels_get_regular_rqn(chs, ix, &res->rss_rqns[ix]);
+	}
 	res->rss_nch = chs->num;
 
 	mlx5e_rx_res_rss_enable(res);
@@ -621,33 +549,17 @@ void mlx5e_rx_res_channels_deactivate(struct mlx5e_rx_res *res)
 	}
 }
 
-int mlx5e_rx_res_xsk_activate(struct mlx5e_rx_res *res, struct mlx5e_channels *chs,
-			      unsigned int ix)
+void mlx5e_rx_res_xsk_update(struct mlx5e_rx_res *res, struct mlx5e_channels *chs,
+			     unsigned int ix, bool xsk)
 {
-	u32 rqn;
-	int err;
-
-	if (!mlx5e_channels_get_xsk_rqn(chs, ix, &rqn))
-		return -EINVAL;
-
-	err = mlx5e_rqt_redirect_direct(&res->channels[ix].xsk_rqt, rqn);
-	if (err)
-		mlx5_core_warn(res->mdev, "Failed to redirect XSK RQT %#x to XSK RQ %#x (channel %u): err = %d\n",
-			       mlx5e_rqt_get_rqtn(&res->channels[ix].xsk_rqt),
-			       rqn, ix, err);
-	return err;
-}
+	if (xsk)
+		mlx5e_channels_get_xsk_rqn(chs, ix, &res->rss_rqns[ix]);
+	else
+		mlx5e_channels_get_regular_rqn(chs, ix, &res->rss_rqns[ix]);
 
-int mlx5e_rx_res_xsk_deactivate(struct mlx5e_rx_res *res, unsigned int ix)
-{
-	int err;
+	mlx5e_rx_res_rss_enable(res);
 
-	err = mlx5e_rqt_redirect_direct(&res->channels[ix].xsk_rqt, res->drop_rqn);
-	if (err)
-		mlx5_core_warn(res->mdev, "Failed to redirect XSK RQT %#x to drop RQ %#x (channel %u): err = %d\n",
-			       mlx5e_rqt_get_rqtn(&res->channels[ix].xsk_rqt),
-			       res->drop_rqn, ix, err);
-	return err;
+	mlx5e_rx_res_channel_activate_direct(res, chs, ix);
 }
 
 int mlx5e_rx_res_packet_merge_set_param(struct mlx5e_rx_res *res,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.h b/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.h
index b39b20a720e0..5d5f64fab60f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.h
@@ -17,8 +17,7 @@ struct mlx5e_rss_params_hash;
 
 enum mlx5e_rx_res_features {
 	MLX5E_RX_RES_FEATURE_INNER_FT = BIT(0),
-	MLX5E_RX_RES_FEATURE_XSK = BIT(1),
-	MLX5E_RX_RES_FEATURE_PTP = BIT(2),
+	MLX5E_RX_RES_FEATURE_PTP = BIT(1),
 };
 
 /* Setup */
@@ -32,7 +31,6 @@ void mlx5e_rx_res_free(struct mlx5e_rx_res *res);
 
 /* TIRN getters for flow steering */
 u32 mlx5e_rx_res_get_tirn_direct(struct mlx5e_rx_res *res, unsigned int ix);
-u32 mlx5e_rx_res_get_tirn_xsk(struct mlx5e_rx_res *res, unsigned int ix);
 u32 mlx5e_rx_res_get_tirn_rss(struct mlx5e_rx_res *res, enum mlx5_traffic_types tt);
 u32 mlx5e_rx_res_get_tirn_rss_inner(struct mlx5e_rx_res *res, enum mlx5_traffic_types tt);
 u32 mlx5e_rx_res_get_tirn_ptp(struct mlx5e_rx_res *res);
@@ -40,9 +38,8 @@ u32 mlx5e_rx_res_get_tirn_ptp(struct mlx5e_rx_res *res);
 /* Activate/deactivate API */
 void mlx5e_rx_res_channels_activate(struct mlx5e_rx_res *res, struct mlx5e_channels *chs);
 void mlx5e_rx_res_channels_deactivate(struct mlx5e_rx_res *res);
-int mlx5e_rx_res_xsk_activate(struct mlx5e_rx_res *res, struct mlx5e_channels *chs,
-			      unsigned int ix);
-int mlx5e_rx_res_xsk_deactivate(struct mlx5e_rx_res *res, unsigned int ix);
+void mlx5e_rx_res_xsk_update(struct mlx5e_rx_res *res, struct mlx5e_channels *chs,
+			     unsigned int ix, bool xsk);
 
 /* Configuration API */
 void mlx5e_rx_res_rss_set_indir_uniform(struct mlx5e_rx_res *res, unsigned int nch);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c
index 6058b1e72c6c..9804ef15a4d6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c
@@ -124,16 +124,10 @@ static int mlx5e_xsk_enable_locked(struct mlx5e_priv *priv,
 	 * any Fill Ring entries at the setup stage.
 	 */
 
-	err = mlx5e_rx_res_xsk_activate(priv->rx_res, &priv->channels, ix);
-	if (unlikely(err))
-		goto err_deactivate;
+	mlx5e_rx_res_xsk_update(priv->rx_res, &priv->channels, ix, true);
 
 	return 0;
 
-err_deactivate:
-	mlx5e_deactivate_xsk(c);
-	mlx5e_close_xsk(c);
-
 err_remove_pool:
 	mlx5e_xsk_remove_pool(&priv->xsk, ix);
 
@@ -171,7 +165,7 @@ static int mlx5e_xsk_disable_locked(struct mlx5e_priv *priv, u16 ix)
 		goto remove_pool;
 
 	c = priv->channels.c[ix];
-	mlx5e_rx_res_xsk_deactivate(priv->rx_res, ix);
+	mlx5e_rx_res_xsk_update(priv->rx_res, &priv->channels, ix, false);
 	mlx5e_deactivate_xsk(c);
 	mlx5e_close_xsk(c);
 
@@ -209,11 +203,10 @@ int mlx5e_xsk_setup_pool(struct net_device *dev, struct xsk_buff_pool *pool, u16
 {
 	struct mlx5e_priv *priv = netdev_priv(dev);
 	struct mlx5e_params *params = &priv->channels.params;
-	u16 ix;
 
-	if (unlikely(!mlx5e_qid_get_ch_if_in_group(params, qid, MLX5E_RQ_GROUP_XSK, &ix)))
+	if (unlikely(qid >= params->num_channels))
 		return -EINVAL;
 
-	return pool ? mlx5e_xsk_enable_pool(priv, pool, ix) :
-		      mlx5e_xsk_disable_pool(priv, ix);
+	return pool ? mlx5e_xsk_enable_pool(priv, pool, qid) :
+		      mlx5e_xsk_disable_pool(priv, qid);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
index d7dfc7d2c058..ff03c43833bb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
@@ -66,7 +66,7 @@ static int mlx5e_init_xsk_rq(struct mlx5e_channel *c,
 	rq->xsk_pool     = pool;
 	rq->stats        = &c->priv->channel_stats[c->ix]->xskrq;
 	rq->ptp_cyc2time = mlx5_rq_ts_translator(mdev);
-	rq_xdp_ix        = c->ix + params->num_channels * MLX5E_RQ_GROUP_XSK;
+	rq_xdp_ix        = c->ix;
 	err = mlx5e_rq_set_handlers(rq, params, xsk);
 	if (err)
 		return err;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
index c856fc3f197e..367a9505ca4f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
@@ -12,15 +12,14 @@ int mlx5e_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
 	struct mlx5e_priv *priv = netdev_priv(dev);
 	struct mlx5e_params *params = &priv->channels.params;
 	struct mlx5e_channel *c;
-	u16 ix;
 
 	if (unlikely(!mlx5e_xdp_is_active(priv)))
 		return -ENETDOWN;
 
-	if (unlikely(!mlx5e_qid_get_ch_if_in_group(params, qid, MLX5E_RQ_GROUP_XSK, &ix)))
+	if (unlikely(qid >= params->num_channels))
 		return -EINVAL;
 
-	c = priv->channels.c[ix];
+	c = priv->channels.c[qid];
 
 	if (!napi_if_scheduled_mark_missed(&c->napi)) {
 		/* To avoid WQE overrun, don't post a NOP if async_icosq is not
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
index 2a67798cd446..aac32e505c14 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
@@ -451,15 +451,7 @@ static int flow_get_tirn(struct mlx5e_priv *priv,
 		eth_rule->rss = rss;
 		mlx5e_rss_refcnt_inc(eth_rule->rss);
 	} else {
-		struct mlx5e_params *params = &priv->channels.params;
-		enum mlx5e_rq_group group;
-		u16 ix;
-
-		mlx5e_qid_get_ch_and_group(params, fs->ring_cookie, &ix, &group);
-
-		*tirn = group == MLX5E_RQ_GROUP_XSK ?
-			mlx5e_rx_res_get_tirn_xsk(priv->rx_res, ix) :
-			mlx5e_rx_res_get_tirn_direct(priv->rx_res, ix);
+		*tirn = mlx5e_rx_res_get_tirn_direct(priv->rx_res, fs->ring_cookie);
 	}
 
 	return 0;
@@ -682,8 +674,7 @@ static int validate_flow(struct mlx5e_priv *priv,
 		return -ENOSPC;
 
 	if (fs->ring_cookie != RX_CLS_FLOW_DISC)
-		if (!mlx5e_qid_validate(priv->profile, &priv->channels.params,
-					fs->ring_cookie))
+		if (fs->ring_cookie >= priv->channels.params.num_channels)
 			return -EINVAL;
 
 	switch (flow_type_mask(fs->flow_type)) {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 129a0d678cce..21fe43406d88 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2690,7 +2690,7 @@ static int mlx5e_update_netdev_queues(struct mlx5e_priv *priv)
 	struct netdev_tc_txq old_tc_to_txq[TC_MAX_QUEUE], *tc_to_txq;
 	struct net_device *netdev = priv->netdev;
 	int old_num_txqs, old_ntc;
-	int num_rxqs, nch, ntc;
+	int nch, ntc;
 	int err;
 	int i;
 
@@ -2701,7 +2701,6 @@ static int mlx5e_update_netdev_queues(struct mlx5e_priv *priv)
 
 	nch = priv->channels.params.num_channels;
 	ntc = priv->channels.params.mqprio.num_tc;
-	num_rxqs = nch * priv->profile->rq_groups;
 	tc_to_txq = priv->channels.params.mqprio.tc_to_txq;
 
 	err = mlx5e_netdev_set_tcs(netdev, nch, ntc, tc_to_txq);
@@ -2710,7 +2709,7 @@ static int mlx5e_update_netdev_queues(struct mlx5e_priv *priv)
 	err = mlx5e_update_tx_netdev_queues(priv);
 	if (err)
 		goto err_tcs;
-	err = netif_set_real_num_rx_queues(netdev, num_rxqs);
+	err = netif_set_real_num_rx_queues(netdev, nch);
 	if (err) {
 		netdev_warn(netdev, "netif_set_real_num_rx_queues failed, %d\n", err);
 		goto err_txqs;
@@ -5199,7 +5198,7 @@ static int mlx5e_init_nic_rx(struct mlx5e_priv *priv)
 		goto err_destroy_q_counters;
 	}
 
-	features = MLX5E_RX_RES_FEATURE_XSK | MLX5E_RX_RES_FEATURE_PTP;
+	features = MLX5E_RX_RES_FEATURE_PTP;
 	if (priv->channels.params.tunneled_offload_en)
 		features |= MLX5E_RX_RES_FEATURE_INNER_FT;
 	err = mlx5e_rx_res_init(priv->rx_res, priv->mdev, features,
@@ -5390,7 +5389,6 @@ static const struct mlx5e_profile mlx5e_nic_profile = {
 	.update_carrier	   = mlx5e_update_carrier,
 	.rx_handlers       = &mlx5e_rx_handlers_nic,
 	.max_tc		   = MLX5E_MAX_NUM_TC,
-	.rq_groups	   = MLX5E_NUM_RQ_GROUPS(XSK),
 	.stats_grps	   = mlx5e_nic_stats_grps,
 	.stats_grps_num	   = mlx5e_nic_stats_grps_num,
 	.features          = BIT(MLX5E_PROFILE_FEATURE_PTP_RX) |
@@ -5423,8 +5421,7 @@ mlx5e_calc_max_nch(struct mlx5_core_dev *mdev, struct net_device *netdev,
 	max_nch = mlx5e_profile_max_num_channels(mdev, profile);
 
 	/* netdev rx queues */
-	tmp = netdev->num_rx_queues / max_t(u8, profile->rq_groups, 1);
-	max_nch = min_t(unsigned int, max_nch, tmp);
+	max_nch = min_t(unsigned int, max_nch, netdev->num_rx_queues);
 
 	/* netdev tx queues */
 	tmp = netdev->num_tx_queues;
@@ -5568,11 +5565,7 @@ static unsigned int mlx5e_get_max_num_txqs(struct mlx5_core_dev *mdev,
 static unsigned int mlx5e_get_max_num_rxqs(struct mlx5_core_dev *mdev,
 					   const struct mlx5e_profile *profile)
 {
-	unsigned int nch;
-
-	nch = mlx5e_profile_max_num_channels(mdev, profile);
-
-	return nch * profile->rq_groups;
+	return mlx5e_profile_max_num_channels(mdev, profile);
 }
 
 struct net_device *
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 83b2febe8a7b..794cd8dfe9c9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -1224,7 +1224,6 @@ static const struct mlx5e_profile mlx5e_rep_profile = {
 	.update_stats           = mlx5e_stats_update_ndo_stats,
 	.rx_handlers            = &mlx5e_rx_handlers_rep,
 	.max_tc			= 1,
-	.rq_groups		= MLX5E_NUM_RQ_GROUPS(REGULAR),
 	.stats_grps		= mlx5e_rep_stats_grps,
 	.stats_grps_num		= mlx5e_rep_stats_grps_num,
 	.max_nch_limit		= mlx5e_rep_max_nch_limit,
@@ -1244,8 +1243,6 @@ static const struct mlx5e_profile mlx5e_uplink_rep_profile = {
 	.update_carrier	        = mlx5e_update_carrier,
 	.rx_handlers            = &mlx5e_rx_handlers_rep,
 	.max_tc			= MLX5E_MAX_NUM_TC,
-	/* XSK is needed so we can replace profile with NIC netdev */
-	.rq_groups		= MLX5E_NUM_RQ_GROUPS(XSK),
 	.stats_grps		= mlx5e_ul_rep_stats_grps,
 	.stats_grps_num		= mlx5e_ul_rep_stats_grps_num,
 };
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
index 35f797cfd21e..4e3a75496dd9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
@@ -463,7 +463,6 @@ static const struct mlx5e_profile mlx5i_nic_profile = {
 	.update_carrier    = NULL, /* no HW update in IB link */
 	.rx_handlers       = &mlx5i_rx_handlers,
 	.max_tc		   = MLX5I_MAX_NUM_TC,
-	.rq_groups	   = MLX5E_NUM_RQ_GROUPS(REGULAR),
 	.stats_grps        = mlx5i_stats_grps,
 	.stats_grps_num    = mlx5i_stats_grps_num,
 };
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib_vlan.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib_vlan.c
index 0b86e78dbc0e..0227a521d301 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib_vlan.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib_vlan.c
@@ -349,7 +349,6 @@ static const struct mlx5e_profile mlx5i_pkey_nic_profile = {
 	.update_stats	   = NULL,
 	.rx_handlers       = &mlx5i_rx_handlers,
 	.max_tc		   = MLX5I_MAX_NUM_TC,
-	.rq_groups	   = MLX5E_NUM_RQ_GROUPS(REGULAR),
 };
 
 const struct mlx5e_profile *mlx5i_pkey_get_profile(void)
-- 
2.37.3



* Re: [PATCH net-next 00/16] mlx5 xsk updates part3 2022-09-30
  2022-09-30 16:28 [PATCH net-next 00/16] mlx5 xsk updates part3 2022-09-30 Saeed Mahameed
                   ` (15 preceding siblings ...)
  2022-09-30 16:29 ` [PATCH net-next 16/16] net/mlx5e: xsk: Use queue indices starting from 0 for XSK queues Saeed Mahameed
@ 2022-10-01 20:40 ` patchwork-bot+netdevbpf
  16 siblings, 0 replies; 18+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-10-01 20:40 UTC (permalink / raw)
  To: Saeed Mahameed; +Cc: davem, kuba, pabeni, edumazet, saeedm, netdev, tariqt

Hello:

This series was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 30 Sep 2022 09:28:47 -0700 you wrote:
> From: Saeed Mahameed <saeedm@nvidia.com>
> 
> The gist of this 4 part series is in this patchset's last patch
> 
> This series contains performance optimizations. XSK starts using the
> batching allocator, and XSK data path gets separated from the regular
> RX, allowing to drop some branches not relevant for non-XSK use cases.
> Some minor optimizations for indirect calls and need_wakeup are also
> included.
> 
> [...]

Here is the summary with links:
  - [net-next,01/16] net/mlx5e: xsk: Use mlx5e_trigger_napi_icosq for XSK wakeup
    https://git.kernel.org/netdev/net-next/c/d54d7194ba48
  - [net-next,02/16] net/mlx5e: xsk: Drop the check for XSK state in mlx5e_xsk_wakeup
    https://git.kernel.org/netdev/net-next/c/8cbcafcee191
  - [net-next,03/16] net/mlx5e: Introduce wqe_index_mask for legacy RQ
    https://git.kernel.org/netdev/net-next/c/a064c609849b
  - [net-next,04/16] net/mlx5e: Make the wqe_index_mask calculation more exact
    https://git.kernel.org/netdev/net-next/c/5758c3145b88
  - [net-next,05/16] net/mlx5e: Use partial batches in legacy RQ
    https://git.kernel.org/netdev/net-next/c/42847fed5552
  - [net-next,06/16] net/mlx5e: xsk: Use partial batches in legacy RQ with XSK
    https://git.kernel.org/netdev/net-next/c/3f5fe0b2e606
  - [net-next,07/16] net/mlx5e: Remove the outer loop when allocating legacy RQ WQEs
    https://git.kernel.org/netdev/net-next/c/0b4822323745
  - [net-next,08/16] net/mlx5e: xsk: Split out WQE allocation for legacy XSK RQ
    https://git.kernel.org/netdev/net-next/c/a2e5ba242c33
  - [net-next,09/16] net/mlx5e: xsk: Use xsk_buff_alloc_batch on legacy RQ
    https://git.kernel.org/netdev/net-next/c/259bbc64367a
  - [net-next,10/16] net/mlx5e: xsk: Use xsk_buff_alloc_batch on striding RQ
    https://git.kernel.org/netdev/net-next/c/cf544517c469
  - [net-next,11/16] net/mlx5e: Use non-XSK page allocator in SHAMPO
    https://git.kernel.org/netdev/net-next/c/132857d9124c
  - [net-next,12/16] net/mlx5e: Call mlx5e_page_release_dynamic directly where possible
    https://git.kernel.org/netdev/net-next/c/96d37d861a09
  - [net-next,13/16] net/mlx5e: Optimize RQ page deallocation
    https://git.kernel.org/netdev/net-next/c/ddb7afeee28b
  - [net-next,14/16] net/mlx5e: xsk: Support XDP metadata on XSK RQs
    https://git.kernel.org/netdev/net-next/c/a752b2edb5c1
  - [net-next,15/16] net/mlx5e: Introduce the mlx5e_flush_rq function
    https://git.kernel.org/netdev/net-next/c/d9ba64deb2f1
  - [net-next,16/16] net/mlx5e: xsk: Use queue indices starting from 0 for XSK queues
    https://git.kernel.org/netdev/net-next/c/3db4c85cde7a

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




