* [pull request][net-next 00/15] mlx5: Drop internal page cache implementation
@ 2023-03-28 20:56 Saeed Mahameed
  2023-03-28 20:56 ` [net-next 01/15] net/mlx5e: RX, Remove mlx5e_alloc_unit argument in page allocation Saeed Mahameed
                   ` (15 more replies)
  0 siblings, 16 replies; 18+ messages in thread
From: Saeed Mahameed @ 2023-03-28 20:56 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Jesper Dangaard Brouer,
	Matthew Wilcox, Toke Høiland-Jørgensen,
	Ilias Apalodimas

From: Saeed Mahameed <saeedm@nvidia.com>

Hi Dave, Hi Jakub,

This series from Dragos removes the mlx5 internal page cache
implementation and converts mlx5 RX buffers to rely entirely on the
standard page pool.

For more information please see tag log below.

Please pull and let me know if there is any problem.

Thanks,
Saeed.


The following changes since commit 86e2eca4ddedc07d639c44c990e1c220cac3741e:

  net: ethernet: ti: am65-cpsw: enable p0 host port rx_vlan_remap (2023-03-28 15:29:50 +0200)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-updates-2023-03-28

for you to fetch changes up to 3905f8d64ccc2c640d8c1179f4452f2bf8f1df56:

  net/mlx5e: RX, Remove unnecessary recycle parameter and page_cache stats (2023-03-28 13:43:59 -0700)

----------------------------------------------------------------
mlx5-updates-2023-03-28

Dragos Tatulea says:
====================

net/mlx5e: RX, Drop page_cache and fully use page_pool

For page allocation on the rx path, the mlx5e driver has been using an
internal page cache in tandem with the page pool. The internal page
cache uses a queue for page recycling, which suffers from head-of-queue
blocking.

This patch series drops the internal page_cache altogether and uses the
page_pool to implement everything that was done by the page_cache
before:
* Let the page_pool handle dma mapping and unmapping.
* Use fragmented pages with fragment counter instead of tracking via
  page ref.
* Enable skb recycling.
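
For readers less familiar with the generic API, here is a rough sketch of
the page_pool facilities the driver relies on after this conversion: DMA
handling by the pool, fragment counting on pages, and skb recycling. It is
purely illustrative (not code from the patches); sizes, flags and names are
placeholders.

#include <linux/dma-mapping.h>
#include <linux/numa.h>
#include <linux/skbuff.h>
#include <net/page_pool.h>

static struct page_pool *example_create_pool(struct device *dev)
{
	struct page_pool_params pp_params = {
		.flags		= PP_FLAG_DMA_MAP |	/* pool maps/unmaps pages */
				  PP_FLAG_DMA_SYNC_DEV |/* pool syncs for device */
				  PP_FLAG_PAGE_FRAG,	/* sub-page fragment accounting */
		.order		= 0,
		.pool_size	= 1024,			/* placeholder ring size */
		.nid		= NUMA_NO_NODE,
		.dev		= dev,
		.dma_dir	= DMA_FROM_DEVICE,
		.max_len	= PAGE_SIZE,
		.offset		= 0,
	};

	return page_pool_create(&pp_params);	/* ERR_PTR() on failure */
}

static void example_rx_buffer_cycle(struct page_pool *pool, struct sk_buff *skb)
{
	struct page *page;
	dma_addr_t addr;

	/* Pages arrive already DMA mapped; no dma_map_page() in the driver. */
	page = page_pool_dev_alloc_pages(pool);
	if (unlikely(!page))
		return;
	addr = page_pool_get_dma_addr(page);
	(void)addr;	/* would be written into the RX descriptor */

	/* Fragment counting instead of raw page refcounts: announce two
	 * users of the page; whoever drops the last fragment returns the
	 * page to the pool.
	 */
	page_pool_fragment_page(page, 2);
	if (page_pool_defrag_page(page, 1) == 0)	/* first user done */
		page_pool_put_defragged_page(pool, page, -1, true);
	if (page_pool_defrag_page(page, 1) == 0)	/* last user: page recycles */
		page_pool_put_defragged_page(pool, page, -1, true);

	/* skb recycling: pool pages attached to this skb go back to the
	 * pool when the skb is freed.
	 */
	skb_mark_for_recycle(skb);
}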

The patch series has the following effects on the rx path:

* Improved performance in cases where page recycling was low due to
  head-of-queue blocking in the internal page_cache. The test for this
  was running a single iperf TCP stream to an rx queue bound to the
  same CPU as the application.

  |-------------+--------+--------+------+---------|
  | rq type     | before | after  | unit |   diff  |
  |-------------+--------+--------+------+---------|
  | striding rq |  30.1  |  31.4  | Gbps |  4.14 % |
  | legacy rq   |  30.2  |  33.0  | Gbps |  8.48 % |
  |-------------+--------+--------+------+---------|

* Small XDP performance degradation. The test was an XDP drop program
  running on a single rx queue with small incoming packets:

  |-------------+----------+----------+------+---------|
  | rq type     | before   | after    | unit |   diff  |
  |-------------+----------+----------+------+---------|
  | striding rq | 19725449 | 18544617 | pps  | -6.37 % |
  | legacy rq   | 19879931 | 18631841 | pps  | -6.70 % |
  |-------------+----------+----------+------+---------|

  This will be handled in a different patch series by adding support for
  multi-packet per page.

* For other cases the performance is roughly the same.

The above numbers were obtained on the following system:
  24 core Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz
  32 GB RAM
  ConnectX-7 single port

The breakdown of the patch series is as follows:
* Preparations for introducing the mlx5e_frag_page struct.
* Delete the mlx5e_page_cache struct.
* Enable dma mapping from page_pool.
* Enable skb recycling and fragment counting.
* Do deferred release of pages (just before alloc) to ensure better
  page_pool cache utilization.

====================

----------------------------------------------------------------
Dragos Tatulea (15):
      net/mlx5e: RX, Remove mlx5e_alloc_unit argument in page allocation
      net/mlx5e: RX, Remove alloc unit layout constraint for legacy rq
      net/mlx5e: RX, Remove alloc unit layout constraint for striding rq
      net/mlx5e: RX, Store SHAMPO header pages in array
      net/mlx5e: RX, Remove internal page_cache
      net/mlx5e: RX, Enable dma map and sync from page_pool allocator
      net/mlx5e: RX, Enable skb page recycling through the page_pool
      net/mlx5e: RX, Rename xdp_xmit_bitmap to a more generic name
      net/mlx5e: RX, Defer page release in striding rq for better recycling
      net/mlx5e: RX, Change wqe last_in_page field from bool to bit flags
      net/mlx5e: RX, Defer page release in legacy rq for better recycling
      net/mlx5e: RX, Split off release path for xsk buffers for legacy rq
      net/mlx5e: RX, Increase WQE bulk size for legacy rq
      net/mlx5e: RX, Break the wqe bulk refill in smaller chunks
      net/mlx5e: RX, Remove unnecessary recycle parameter and page_cache stats

 .../ethernet/mellanox/mlx5/counters.rst            |  26 --
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |  51 ++-
 .../net/ethernet/mellanox/mlx5/core/en/params.c    |  53 ++-
 .../ethernet/mellanox/mlx5/core/en/reporter_rx.c   |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h  |   6 +-
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c   |  10 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/rx.c    |  54 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 167 +++++---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    | 452 +++++++++++----------
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.c |  20 -
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h |  10 -
 11 files changed, 464 insertions(+), 389 deletions(-)

* [net-next 01/15] net/mlx5e: RX, Remove mlx5e_alloc_unit argument in page allocation
  2023-03-28 20:56 [pull request][net-next 00/15] mlx5: Drop internal page cache implementation Saeed Mahameed
@ 2023-03-28 20:56 ` Saeed Mahameed
  2023-03-30  5:20   ` patchwork-bot+netdevbpf
  2023-03-28 20:56 ` [net-next 02/15] net/mlx5e: RX, Remove alloc unit layout constraint for legacy rq Saeed Mahameed
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 18+ messages in thread
From: Saeed Mahameed @ 2023-03-28 20:56 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Jesper Dangaard Brouer,
	Matthew Wilcox, Toke Høiland-Jørgensen,
	Ilias Apalodimas, Dragos Tatulea

From: Dragos Tatulea <dtatulea@nvidia.com>

Change internal page cache and page pool api to use a struct page **
instead of a mlx5e_alloc_unit *.

This is the first change in a series which is meant to remove the
mlx5e_alloc_unit altogether.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 35 ++++++++++---------
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 3f7b63d6616b..36300118b6e4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -292,7 +292,7 @@ static inline bool mlx5e_rx_cache_put(struct mlx5e_rq *rq, struct page *page)
 	return true;
 }
 
-static inline bool mlx5e_rx_cache_get(struct mlx5e_rq *rq, union mlx5e_alloc_unit *au)
+static inline bool mlx5e_rx_cache_get(struct mlx5e_rq *rq, struct page **pagep)
 {
 	struct mlx5e_page_cache *cache = &rq->page_cache;
 	struct mlx5e_rq_stats *stats = rq->stats;
@@ -308,35 +308,35 @@ static inline bool mlx5e_rx_cache_get(struct mlx5e_rq *rq, union mlx5e_alloc_uni
 		return false;
 	}
 
-	au->page = cache->page_cache[cache->head];
+	*pagep = cache->page_cache[cache->head];
 	cache->head = (cache->head + 1) & (MLX5E_CACHE_SIZE - 1);
 	stats->cache_reuse++;
 
-	addr = page_pool_get_dma_addr(au->page);
+	addr = page_pool_get_dma_addr(*pagep);
 	/* Non-XSK always uses PAGE_SIZE. */
 	dma_sync_single_for_device(rq->pdev, addr, PAGE_SIZE, rq->buff.map_dir);
 	return true;
 }
 
-static inline int mlx5e_page_alloc_pool(struct mlx5e_rq *rq, union mlx5e_alloc_unit *au)
+static inline int mlx5e_page_alloc_pool(struct mlx5e_rq *rq, struct page **pagep)
 {
 	dma_addr_t addr;
 
-	if (mlx5e_rx_cache_get(rq, au))
+	if (mlx5e_rx_cache_get(rq, pagep))
 		return 0;
 
-	au->page = page_pool_dev_alloc_pages(rq->page_pool);
-	if (unlikely(!au->page))
+	*pagep = page_pool_dev_alloc_pages(rq->page_pool);
+	if (unlikely(!*pagep))
 		return -ENOMEM;
 
 	/* Non-XSK always uses PAGE_SIZE. */
-	addr = dma_map_page(rq->pdev, au->page, 0, PAGE_SIZE, rq->buff.map_dir);
+	addr = dma_map_page(rq->pdev, *pagep, 0, PAGE_SIZE, rq->buff.map_dir);
 	if (unlikely(dma_mapping_error(rq->pdev, addr))) {
-		page_pool_recycle_direct(rq->page_pool, au->page);
-		au->page = NULL;
+		page_pool_recycle_direct(rq->page_pool, *pagep);
+		*pagep = NULL;
 		return -ENOMEM;
 	}
-	page_pool_set_dma_addr(au->page, addr);
+	page_pool_set_dma_addr(*pagep, addr);
 
 	return 0;
 }
@@ -376,7 +376,7 @@ static inline int mlx5e_get_rx_frag(struct mlx5e_rq *rq,
 		 * offset) should just use the new one without replenishing again
 		 * by themselves.
 		 */
-		err = mlx5e_page_alloc_pool(rq, frag->au);
+		err = mlx5e_page_alloc_pool(rq, &frag->au->page);
 
 	return err;
 }
@@ -605,13 +605,14 @@ static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq,
 		header_offset = (index & (MLX5E_SHAMPO_WQ_HEADER_PER_PAGE - 1)) <<
 			MLX5E_SHAMPO_LOG_MAX_HEADER_ENTRY_SIZE;
 		if (!(header_offset & (PAGE_SIZE - 1))) {
-			union mlx5e_alloc_unit au;
 
-			err = mlx5e_page_alloc_pool(rq, &au);
+			err = mlx5e_page_alloc_pool(rq, &page);
 			if (unlikely(err))
 				goto err_unmap;
-			page = dma_info->page = au.page;
-			addr = dma_info->addr = page_pool_get_dma_addr(au.page);
+
+			addr = page_pool_get_dma_addr(page);
+			dma_info->addr = addr;
+			dma_info->page = page;
 		} else {
 			dma_info->addr = addr + header_offset;
 			dma_info->page = page;
@@ -715,7 +716,7 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 	for (i = 0; i < rq->mpwqe.pages_per_wqe; i++, au++) {
 		dma_addr_t addr;
 
-		err = mlx5e_page_alloc_pool(rq, au);
+		err = mlx5e_page_alloc_pool(rq, &au->page);
 		if (unlikely(err))
 			goto err_unmap;
 		addr = page_pool_get_dma_addr(au->page);
-- 
2.39.2


* [net-next 02/15] net/mlx5e: RX, Remove alloc unit layout constraint for legacy rq
  2023-03-28 20:56 [pull request][net-next 00/15] mlx5: Drop internal page cache implementation Saeed Mahameed
  2023-03-28 20:56 ` [net-next 01/15] net/mlx5e: RX, Remove mlx5e_alloc_unit argument in page allocation Saeed Mahameed
@ 2023-03-28 20:56 ` Saeed Mahameed
  2023-03-28 20:56 ` [net-next 03/15] net/mlx5e: RX, Remove alloc unit layout constraint for striding rq Saeed Mahameed
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2023-03-28 20:56 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Jesper Dangaard Brouer,
	Matthew Wilcox, Toke Høiland-Jørgensen,
	Ilias Apalodimas, Dragos Tatulea

From: Dragos Tatulea <dtatulea@nvidia.com>

The mlx5e_alloc_unit union is conveniently used to store arrays of
pointers to struct page or struct xdp_buff (for xsk). The union is
currently expected to have the size of a pointer for xsk batch
allocations to work. This is convenient for the current state of the
code but makes it impossible to add a structure of a different size
to the alloc unit.

A further patch in the series will add the mlx5e_frag_page struct for
which the described size constraint will no longer hold.

This change removes the usage of mlx5e_alloc_unit union for legacy rq:

- A union of arrays is introduced (mlx5e_alloc_units) to replace the
  array of unions to allow structures of different sizes.

- Each fragment has a pointer to a unit in the mlx5e_alloc_units array.
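
A simplified before/after of the layout change follows (the exact
definitions are in the diff below; DECLARE_FLEX_ARRAY() is the kernel
helper from <linux/stddef.h>):

/* Before: an array of unions. Every element must be pointer-sized so the
 * xsk batch allocator can treat the array as a struct xdp_buff *[].
 */
union mlx5e_alloc_unit {
	struct page *page;
	struct xdp_buff *xsk;
};

/* After: a union of arrays. Each member is a contiguous array of its own
 * element type, so a member with larger elements (such as the
 * mlx5e_frag_page struct added later in the series) can be introduced
 * without breaking the xsk layout.
 */
union mlx5e_alloc_units {
	DECLARE_FLEX_ARRAY(struct page *, pages);
	DECLARE_FLEX_ARRAY(struct xdp_buff *, xsk_buffs);
};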

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  | 12 ++-
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.c   | 19 ++---
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 82 ++++++++++++-------
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 36 ++++----
 4 files changed, 87 insertions(+), 62 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 02237e630d13..32036f23d962 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -606,11 +606,19 @@ struct mlx5e_icosq {
 } ____cacheline_aligned_in_smp;
 
 struct mlx5e_wqe_frag_info {
-	union mlx5e_alloc_unit *au;
+	union {
+		struct page **pagep;
+		struct xdp_buff **xskp;
+	};
 	u32 offset;
 	bool last_in_page;
 };
 
+union mlx5e_alloc_units {
+	DECLARE_FLEX_ARRAY(struct page *, pages);
+	DECLARE_FLEX_ARRAY(struct xdp_buff *, xsk_buffs);
+};
+
 struct mlx5e_mpw_info {
 	u16 consumed_strides;
 	DECLARE_BITMAP(xdp_xmit_bitmap, MLX5_MPWRQ_MAX_PAGES_PER_WQE);
@@ -702,7 +710,7 @@ struct mlx5e_rq {
 		struct {
 			struct mlx5_wq_cyc          wq;
 			struct mlx5e_wqe_frag_info *frags;
-			union mlx5e_alloc_unit     *alloc_units;
+			union mlx5e_alloc_units    *alloc_units;
 			struct mlx5e_rq_frags_info  info;
 			mlx5e_fp_skb_from_cqe       skb_from_cqe;
 		} wqe;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
index fab787600459..8a5ae80e6142 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
@@ -163,13 +163,10 @@ int mlx5e_xsk_alloc_rx_wqes_batched(struct mlx5e_rq *rq, u16 ix, int wqe_bulk)
 	u32 contig, alloc;
 	int i;
 
-	/* mlx5e_init_frags_partition creates a 1:1 mapping between
-	 * rq->wqe.frags and rq->wqe.alloc_units, which allows us to
-	 * allocate XDP buffers straight into alloc_units.
+	/* Each rq->wqe.frags->xskp is 1:1 mapped to an element inside the
+	 * rq->wqe.alloc_units->xsk_buffs array allocated here.
 	 */
-	BUILD_BUG_ON(sizeof(rq->wqe.alloc_units[0]) !=
-		     sizeof(rq->wqe.alloc_units[0].xsk));
-	buffs = (struct xdp_buff **)rq->wqe.alloc_units;
+	buffs = rq->wqe.alloc_units->xsk_buffs;
 	contig = mlx5_wq_cyc_get_size(wq) - ix;
 	if (wqe_bulk <= contig) {
 		alloc = xsk_buff_alloc_batch(rq->xsk_pool, buffs + ix, wqe_bulk);
@@ -189,7 +186,7 @@ int mlx5e_xsk_alloc_rx_wqes_batched(struct mlx5e_rq *rq, u16 ix, int wqe_bulk)
 		/* Assumes log_num_frags == 0. */
 		frag = &rq->wqe.frags[j];
 
-		addr = xsk_buff_xdp_get_frame_dma(frag->au->xsk);
+		addr = xsk_buff_xdp_get_frame_dma(*frag->xskp);
 		wqe->data[0].addr = cpu_to_be64(addr + rq->buff.headroom);
 	}
 
@@ -211,11 +208,11 @@ int mlx5e_xsk_alloc_rx_wqes(struct mlx5e_rq *rq, u16 ix, int wqe_bulk)
 		/* Assumes log_num_frags == 0. */
 		frag = &rq->wqe.frags[j];
 
-		frag->au->xsk = xsk_buff_alloc(rq->xsk_pool);
-		if (unlikely(!frag->au->xsk))
+		*frag->xskp = xsk_buff_alloc(rq->xsk_pool);
+		if (unlikely(!*frag->xskp))
 			return i;
 
-		addr = xsk_buff_xdp_get_frame_dma(frag->au->xsk);
+		addr = xsk_buff_xdp_get_frame_dma(*frag->xskp);
 		wqe->data[0].addr = cpu_to_be64(addr + rq->buff.headroom);
 	}
 
@@ -306,7 +303,7 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq,
 					      struct mlx5_cqe64 *cqe,
 					      u32 cqe_bcnt)
 {
-	struct mlx5e_xdp_buff *mxbuf = xsk_buff_to_mxbuf(wi->au->xsk);
+	struct mlx5e_xdp_buff *mxbuf = xsk_buff_to_mxbuf(*wi->xskp);
 	struct bpf_prog *prog;
 
 	/* wi->offset is not used in this function, because xdp->data and the
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 90eac1aa5f6b..917b98d1da2d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -499,15 +499,9 @@ static void mlx5e_init_frags_partition(struct mlx5e_rq *rq)
 	struct mlx5e_wqe_frag_info *prev = NULL;
 	int i;
 
-	if (rq->xsk_pool) {
-		/* Assumptions used by XSK batched allocator. */
-		WARN_ON(rq->wqe.info.num_frags != 1);
-		WARN_ON(rq->wqe.info.log_num_frags != 0);
-		WARN_ON(rq->wqe.info.arr[0].frag_stride != PAGE_SIZE);
-	}
-
-	next_frag.au = &rq->wqe.alloc_units[0];
+	WARN_ON(rq->xsk_pool);
 
+	next_frag.pagep = &rq->wqe.alloc_units->pages[0];
 	for (i = 0; i < mlx5_wq_cyc_get_size(&rq->wqe.wq); i++) {
 		struct mlx5e_rq_frag_info *frag_info = &rq->wqe.info.arr[0];
 		struct mlx5e_wqe_frag_info *frag =
@@ -516,7 +510,8 @@ static void mlx5e_init_frags_partition(struct mlx5e_rq *rq)
 
 		for (f = 0; f < rq->wqe.info.num_frags; f++, frag++) {
 			if (next_frag.offset + frag_info[f].frag_stride > PAGE_SIZE) {
-				next_frag.au++;
+				/* Pages are assigned at runtime. */
+				next_frag.pagep++;
 				next_frag.offset = 0;
 				if (prev)
 					prev->last_in_page = true;
@@ -533,22 +528,59 @@ static void mlx5e_init_frags_partition(struct mlx5e_rq *rq)
 		prev->last_in_page = true;
 }
 
-static int mlx5e_init_au_list(struct mlx5e_rq *rq, int wq_sz, int node)
+static void mlx5e_init_xsk_buffs(struct mlx5e_rq *rq)
+{
+	int i;
+
+	/* Assumptions used by XSK batched allocator. */
+	WARN_ON(rq->wqe.info.num_frags != 1);
+	WARN_ON(rq->wqe.info.log_num_frags != 0);
+	WARN_ON(rq->wqe.info.arr[0].frag_stride != PAGE_SIZE);
+
+	/* Considering the above assumptions a fragment maps to a single
+	 * xsk_buff.
+	 */
+	for (i = 0; i < mlx5_wq_cyc_get_size(&rq->wqe.wq); i++)
+		rq->wqe.frags[i].xskp = &rq->wqe.alloc_units->xsk_buffs[i];
+}
+
+static int mlx5e_init_wqe_alloc_info(struct mlx5e_rq *rq, int node)
 {
+	int wq_sz = mlx5_wq_cyc_get_size(&rq->wqe.wq);
 	int len = wq_sz << rq->wqe.info.log_num_frags;
+	struct mlx5e_wqe_frag_info *frags;
+	union mlx5e_alloc_units *aus;
+	int aus_sz;
+
+	if (rq->xsk_pool)
+		aus_sz = sizeof(*aus->xsk_buffs);
+	else
+		aus_sz = sizeof(*aus->pages);
+
+	aus = kvzalloc_node(array_size(len, aus_sz), GFP_KERNEL, node);
+	if (!aus)
+		return -ENOMEM;
 
-	rq->wqe.alloc_units = kvzalloc_node(array_size(len, sizeof(*rq->wqe.alloc_units)),
-					    GFP_KERNEL, node);
-	if (!rq->wqe.alloc_units)
+	frags = kvzalloc_node(array_size(len, sizeof(*frags)), GFP_KERNEL, node);
+	if (!frags) {
+		kvfree(aus);
 		return -ENOMEM;
+	}
+
+	rq->wqe.alloc_units = aus;
+	rq->wqe.frags = frags;
 
-	mlx5e_init_frags_partition(rq);
+	if (rq->xsk_pool)
+		mlx5e_init_xsk_buffs(rq);
+	else
+		mlx5e_init_frags_partition(rq);
 
 	return 0;
 }
 
-static void mlx5e_free_au_list(struct mlx5e_rq *rq)
+static void mlx5e_free_wqe_alloc_info(struct mlx5e_rq *rq)
 {
+	kvfree(rq->wqe.frags);
 	kvfree(rq->wqe.alloc_units);
 }
 
@@ -778,18 +810,9 @@ static int mlx5e_alloc_rq(struct mlx5e_params *params,
 		rq->wqe.info = rqp->frags_info;
 		rq->buff.frame0_sz = rq->wqe.info.arr[0].frag_stride;
 
-		rq->wqe.frags =
-			kvzalloc_node(array_size(sizeof(*rq->wqe.frags),
-					(wq_sz << rq->wqe.info.log_num_frags)),
-				      GFP_KERNEL, node);
-		if (!rq->wqe.frags) {
-			err = -ENOMEM;
-			goto err_rq_wq_destroy;
-		}
-
-		err = mlx5e_init_au_list(rq, wq_sz, node);
+		err = mlx5e_init_wqe_alloc_info(rq, node);
 		if (err)
-			goto err_rq_frags;
+			goto err_rq_wq_destroy;
 	}
 
 	if (xsk) {
@@ -888,9 +911,7 @@ static int mlx5e_alloc_rq(struct mlx5e_params *params,
 		mlx5e_free_mpwqe_rq_drop_page(rq);
 		break;
 	default: /* MLX5_WQ_TYPE_CYCLIC */
-		mlx5e_free_au_list(rq);
-err_rq_frags:
-		kvfree(rq->wqe.frags);
+		mlx5e_free_wqe_alloc_info(rq);
 	}
 err_rq_wq_destroy:
 	mlx5_wq_destroy(&rq->wq_ctrl);
@@ -921,8 +942,7 @@ static void mlx5e_free_rq(struct mlx5e_rq *rq)
 		mlx5e_rq_free_shampo(rq);
 		break;
 	default: /* MLX5_WQ_TYPE_CYCLIC */
-		kvfree(rq->wqe.frags);
-		mlx5e_free_au_list(rq);
+		mlx5e_free_wqe_alloc_info(rq);
 	}
 
 	for (i = rq->page_cache.head; i != rq->page_cache.tail;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 36300118b6e4..74e7e00cf494 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -371,12 +371,12 @@ static inline int mlx5e_get_rx_frag(struct mlx5e_rq *rq,
 	int err = 0;
 
 	if (!frag->offset)
-		/* On first frag (offset == 0), replenish page (alloc_unit actually).
-		 * Other frags that point to the same alloc_unit (with a different
+		/* On first frag (offset == 0), replenish page.
+		 * Other frags that point to the same page (with a different
 		 * offset) should just use the new one without replenishing again
 		 * by themselves.
 		 */
-		err = mlx5e_page_alloc_pool(rq, &frag->au->page);
+		err = mlx5e_page_alloc_pool(rq, frag->pagep);
 
 	return err;
 }
@@ -386,7 +386,7 @@ static inline void mlx5e_put_rx_frag(struct mlx5e_rq *rq,
 				     bool recycle)
 {
 	if (frag->last_in_page)
-		mlx5e_page_release_dynamic(rq, frag->au->page, recycle);
+		mlx5e_page_release_dynamic(rq, *frag->pagep, recycle);
 }
 
 static inline struct mlx5e_wqe_frag_info *get_frag(struct mlx5e_rq *rq, u16 ix)
@@ -410,7 +410,7 @@ static int mlx5e_alloc_rx_wqe(struct mlx5e_rq *rq, struct mlx5e_rx_wqe_cyc *wqe,
 			goto free_frags;
 
 		headroom = i == 0 ? rq->buff.headroom : 0;
-		addr = page_pool_get_dma_addr(frag->au->page);
+		addr = page_pool_get_dma_addr(*frag->pagep);
 		wqe->data[i].addr = cpu_to_be64(addr + frag->offset + headroom);
 	}
 
@@ -434,7 +434,7 @@ static inline void mlx5e_free_rx_wqe(struct mlx5e_rq *rq,
 		 * put into the Reuse Ring, because there is no way to return
 		 * the page to the userspace when the interface goes down.
 		 */
-		xsk_buff_free(wi->au->xsk);
+		xsk_buff_free(*wi->xskp);
 		return;
 	}
 
@@ -1587,8 +1587,8 @@ static struct sk_buff *
 mlx5e_skb_from_cqe_linear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi,
 			  struct mlx5_cqe64 *cqe, u32 cqe_bcnt)
 {
-	union mlx5e_alloc_unit *au = wi->au;
 	u16 rx_headroom = rq->buff.headroom;
+	struct page *page = *wi->pagep;
 	struct bpf_prog *prog;
 	struct sk_buff *skb;
 	u32 metasize = 0;
@@ -1596,11 +1596,11 @@ mlx5e_skb_from_cqe_linear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi,
 	dma_addr_t addr;
 	u32 frag_size;
 
-	va             = page_address(au->page) + wi->offset;
+	va             = page_address(page) + wi->offset;
 	data           = va + rx_headroom;
 	frag_size      = MLX5_SKB_FRAG_SZ(rx_headroom + cqe_bcnt);
 
-	addr = page_pool_get_dma_addr(au->page);
+	addr = page_pool_get_dma_addr(page);
 	dma_sync_single_range_for_cpu(rq->pdev, addr, wi->offset,
 				      frag_size, rq->buff.map_dir);
 	net_prefetch(data);
@@ -1624,7 +1624,7 @@ mlx5e_skb_from_cqe_linear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi,
 		return NULL;
 
 	/* queue up for recycling/reuse */
-	page_ref_inc(au->page);
+	page_ref_inc(page);
 
 	return skb;
 }
@@ -1635,8 +1635,8 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
 {
 	struct mlx5e_rq_frag_info *frag_info = &rq->wqe.info.arr[0];
 	struct mlx5e_wqe_frag_info *head_wi = wi;
-	union mlx5e_alloc_unit *au = wi->au;
 	u16 rx_headroom = rq->buff.headroom;
+	struct page *page = *wi->pagep;
 	struct skb_shared_info *sinfo;
 	struct mlx5e_xdp_buff mxbuf;
 	u32 frag_consumed_bytes;
@@ -1646,10 +1646,10 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
 	u32 truesize;
 	void *va;
 
-	va = page_address(au->page) + wi->offset;
+	va = page_address(page) + wi->offset;
 	frag_consumed_bytes = min_t(u32, frag_info->frag_size, cqe_bcnt);
 
-	addr = page_pool_get_dma_addr(au->page);
+	addr = page_pool_get_dma_addr(page);
 	dma_sync_single_range_for_cpu(rq->pdev, addr, wi->offset,
 				      rq->buff.frame0_sz, rq->buff.map_dir);
 	net_prefetchw(va); /* xdp_frame data area */
@@ -1666,11 +1666,11 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
 	while (cqe_bcnt) {
 		skb_frag_t *frag;
 
-		au = wi->au;
+		page = *wi->pagep;
 
 		frag_consumed_bytes = min_t(u32, frag_info->frag_size, cqe_bcnt);
 
-		addr = page_pool_get_dma_addr(au->page);
+		addr = page_pool_get_dma_addr(page);
 		dma_sync_single_for_cpu(rq->pdev, addr + wi->offset,
 					frag_consumed_bytes, rq->buff.map_dir);
 
@@ -1684,11 +1684,11 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
 		}
 
 		frag = &sinfo->frags[sinfo->nr_frags++];
-		__skb_frag_set_page(frag, au->page);
+		__skb_frag_set_page(frag, page);
 		skb_frag_off_set(frag, wi->offset);
 		skb_frag_size_set(frag, frag_consumed_bytes);
 
-		if (page_is_pfmemalloc(au->page))
+		if (page_is_pfmemalloc(page))
 			xdp_buff_set_frag_pfmemalloc(&mxbuf.xdp);
 
 		sinfo->xdp_frags_size += frag_consumed_bytes;
@@ -1717,7 +1717,7 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
 	if (unlikely(!skb))
 		return NULL;
 
-	page_ref_inc(head_wi->au->page);
+	page_ref_inc(*head_wi->pagep);
 
 	if (xdp_buff_has_frags(&mxbuf.xdp)) {
 		int i;
-- 
2.39.2


* [net-next 03/15] net/mlx5e: RX, Remove alloc unit layout constraint for striding rq
  2023-03-28 20:56 [pull request][net-next 00/15] mlx5: Drop internal page cache implementation Saeed Mahameed
  2023-03-28 20:56 ` [net-next 01/15] net/mlx5e: RX, Remove mlx5e_alloc_unit argument in page allocation Saeed Mahameed
  2023-03-28 20:56 ` [net-next 02/15] net/mlx5e: RX, Remove alloc unit layout constraint for legacy rq Saeed Mahameed
@ 2023-03-28 20:56 ` Saeed Mahameed
  2023-03-28 20:56 ` [net-next 04/15] net/mlx5e: RX, Store SHAMPO header pages in array Saeed Mahameed
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2023-03-28 20:56 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Jesper Dangaard Brouer,
	Matthew Wilcox, Toke Høiland-Jørgensen,
	Ilias Apalodimas, Dragos Tatulea

From: Dragos Tatulea <dtatulea@nvidia.com>

This change removes the usage of the mlx5e_alloc_unit union for
striding rq. The change is more straightforward than for legacy rq, as
the alloc units union is already in place.

This patch only moves things around: the array of unions becomes a
union of arrays. This has the effect that each mlx5e_mpw_info will
allocate the largest possible size of the array member. For xsk this
means that the array of xdp_buff pointers for the wqe will still be
contiguous, but there will be some extra unused bytes at the end of the
array.

A further patch in the series will add the mlx5e_frag_page struct, for
which the described size constraint will no longer hold.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  7 +--
 .../net/ethernet/mellanox/mlx5/core/en/txrx.h |  2 +-
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.c   | 29 ++++-----
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  3 +-
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 61 ++++++++++---------
 5 files changed, 51 insertions(+), 51 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 32036f23d962..ad4ad14853bf 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -475,11 +475,6 @@ struct mlx5e_txqsq {
 	cqe_ts_to_ns               ptp_cyc2time;
 } ____cacheline_aligned_in_smp;
 
-union mlx5e_alloc_unit {
-	struct page *page;
-	struct xdp_buff *xsk;
-};
-
 /* XDP packets can be transmitted in different ways. On completion, we need to
  * distinguish between them to clean up things in a proper way.
  */
@@ -622,7 +617,7 @@ union mlx5e_alloc_units {
 struct mlx5e_mpw_info {
 	u16 consumed_strides;
 	DECLARE_BITMAP(xdp_xmit_bitmap, MLX5_MPWRQ_MAX_PAGES_PER_WQE);
-	union mlx5e_alloc_unit alloc_units[];
+	union mlx5e_alloc_units alloc_units;
 };
 
 #define MLX5E_MAX_RX_FRAGS 4
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
index 816ea83e6413..dab00a2c2eb7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
@@ -489,7 +489,7 @@ static inline bool mlx5e_icosq_can_post_wqe(struct mlx5e_icosq *sq, u16 wqe_size
 
 static inline struct mlx5e_mpw_info *mlx5e_get_mpw_info(struct mlx5e_rq *rq, int i)
 {
-	size_t isz = struct_size(rq->mpwqe.info, alloc_units, rq->mpwqe.pages_per_wqe);
+	size_t isz = struct_size(rq->mpwqe.info, alloc_units.pages, rq->mpwqe.pages_per_wqe);
 
 	return (struct mlx5e_mpw_info *)((char *)rq->mpwqe.info + array_size(i, isz));
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
index 8a5ae80e6142..b2c1af07c317 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
@@ -22,6 +22,7 @@ int mlx5e_xsk_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 	struct mlx5e_icosq *icosq = rq->icosq;
 	struct mlx5_wq_cyc *wq = &icosq->wq;
 	struct mlx5e_umr_wqe *umr_wqe;
+	struct xdp_buff **xsk_buffs;
 	int batch, i;
 	u32 offset; /* 17-bit value with MTT. */
 	u16 pi;
@@ -29,9 +30,9 @@ int mlx5e_xsk_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 	if (unlikely(!xsk_buff_can_alloc(rq->xsk_pool, rq->mpwqe.pages_per_wqe)))
 		goto err;
 
-	BUILD_BUG_ON(sizeof(wi->alloc_units[0]) != sizeof(wi->alloc_units[0].xsk));
 	XSK_CHECK_PRIV_TYPE(struct mlx5e_xdp_buff);
-	batch = xsk_buff_alloc_batch(rq->xsk_pool, (struct xdp_buff **)wi->alloc_units,
+	xsk_buffs = (struct xdp_buff **)wi->alloc_units.xsk_buffs;
+	batch = xsk_buff_alloc_batch(rq->xsk_pool, xsk_buffs,
 				     rq->mpwqe.pages_per_wqe);
 
 	/* If batch < pages_per_wqe, either:
@@ -41,8 +42,8 @@ int mlx5e_xsk_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 	 * the first error, which will mean there are no more valid descriptors.
 	 */
 	for (; batch < rq->mpwqe.pages_per_wqe; batch++) {
-		wi->alloc_units[batch].xsk = xsk_buff_alloc(rq->xsk_pool);
-		if (unlikely(!wi->alloc_units[batch].xsk))
+		xsk_buffs[batch] = xsk_buff_alloc(rq->xsk_pool);
+		if (unlikely(!xsk_buffs[batch]))
 			goto err_reuse_batch;
 	}
 
@@ -52,8 +53,8 @@ int mlx5e_xsk_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 
 	if (likely(rq->mpwqe.umr_mode == MLX5E_MPWRQ_UMR_MODE_ALIGNED)) {
 		for (i = 0; i < batch; i++) {
-			struct mlx5e_xdp_buff *mxbuf = xsk_buff_to_mxbuf(wi->alloc_units[i].xsk);
-			dma_addr_t addr = xsk_buff_xdp_get_frame_dma(wi->alloc_units[i].xsk);
+			struct mlx5e_xdp_buff *mxbuf = xsk_buff_to_mxbuf(xsk_buffs[i]);
+			dma_addr_t addr = xsk_buff_xdp_get_frame_dma(xsk_buffs[i]);
 
 			umr_wqe->inline_mtts[i] = (struct mlx5_mtt) {
 				.ptag = cpu_to_be64(addr | MLX5_EN_WR),
@@ -62,8 +63,8 @@ int mlx5e_xsk_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 		}
 	} else if (unlikely(rq->mpwqe.umr_mode == MLX5E_MPWRQ_UMR_MODE_UNALIGNED)) {
 		for (i = 0; i < batch; i++) {
-			struct mlx5e_xdp_buff *mxbuf = xsk_buff_to_mxbuf(wi->alloc_units[i].xsk);
-			dma_addr_t addr = xsk_buff_xdp_get_frame_dma(wi->alloc_units[i].xsk);
+			struct mlx5e_xdp_buff *mxbuf = xsk_buff_to_mxbuf(xsk_buffs[i]);
+			dma_addr_t addr = xsk_buff_xdp_get_frame_dma(xsk_buffs[i]);
 
 			umr_wqe->inline_ksms[i] = (struct mlx5_ksm) {
 				.key = rq->mkey_be,
@@ -75,8 +76,8 @@ int mlx5e_xsk_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 		u32 mapping_size = 1 << (rq->mpwqe.page_shift - 2);
 
 		for (i = 0; i < batch; i++) {
-			struct mlx5e_xdp_buff *mxbuf = xsk_buff_to_mxbuf(wi->alloc_units[i].xsk);
-			dma_addr_t addr = xsk_buff_xdp_get_frame_dma(wi->alloc_units[i].xsk);
+			struct mlx5e_xdp_buff *mxbuf = xsk_buff_to_mxbuf(xsk_buffs[i]);
+			dma_addr_t addr = xsk_buff_xdp_get_frame_dma(xsk_buffs[i]);
 
 			umr_wqe->inline_ksms[i << 2] = (struct mlx5_ksm) {
 				.key = rq->mkey_be,
@@ -102,8 +103,8 @@ int mlx5e_xsk_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 		__be32 frame_size = cpu_to_be32(rq->xsk_pool->chunk_size);
 
 		for (i = 0; i < batch; i++) {
-			struct mlx5e_xdp_buff *mxbuf = xsk_buff_to_mxbuf(wi->alloc_units[i].xsk);
-			dma_addr_t addr = xsk_buff_xdp_get_frame_dma(wi->alloc_units[i].xsk);
+			struct mlx5e_xdp_buff *mxbuf = xsk_buff_to_mxbuf(xsk_buffs[i]);
+			dma_addr_t addr = xsk_buff_xdp_get_frame_dma(xsk_buffs[i]);
 
 			umr_wqe->inline_klms[i << 1] = (struct mlx5_klm) {
 				.key = rq->mkey_be,
@@ -149,7 +150,7 @@ int mlx5e_xsk_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 
 err_reuse_batch:
 	while (--batch >= 0)
-		xsk_buff_free(wi->alloc_units[batch].xsk);
+		xsk_buff_free(xsk_buffs[batch]);
 
 err:
 	rq->stats->buff_alloc_err++;
@@ -248,7 +249,7 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
 						    u32 head_offset,
 						    u32 page_idx)
 {
-	struct mlx5e_xdp_buff *mxbuf = xsk_buff_to_mxbuf(wi->alloc_units[page_idx].xsk);
+	struct mlx5e_xdp_buff *mxbuf = xsk_buff_to_mxbuf(wi->alloc_units.xsk_buffs[page_idx]);
 	struct bpf_prog *prog;
 
 	/* Check packet size. Note LRO doesn't use linear SKB */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 917b98d1da2d..0ed3f67f7dfc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -286,7 +286,8 @@ static int mlx5e_rq_alloc_mpwqe_info(struct mlx5e_rq *rq, int node)
 	int wq_sz = mlx5_wq_ll_get_size(&rq->mpwqe.wq);
 	size_t alloc_size;
 
-	alloc_size = array_size(wq_sz, struct_size(rq->mpwqe.info, alloc_units,
+	alloc_size = array_size(wq_sz, struct_size(rq->mpwqe.info,
+						   alloc_units.pages,
 						   rq->mpwqe.pages_per_wqe));
 
 	rq->mpwqe.info = kvzalloc_node(alloc_size, GFP_KERNEL, node);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 74e7e00cf494..d02f2f2af4ec 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -469,16 +469,16 @@ static int mlx5e_alloc_rx_wqes(struct mlx5e_rq *rq, u16 ix, int wqe_bulk)
 
 static inline void
 mlx5e_add_skb_frag(struct mlx5e_rq *rq, struct sk_buff *skb,
-		   union mlx5e_alloc_unit *au, u32 frag_offset, u32 len,
+		   struct page *page, u32 frag_offset, u32 len,
 		   unsigned int truesize)
 {
-	dma_addr_t addr = page_pool_get_dma_addr(au->page);
+	dma_addr_t addr = page_pool_get_dma_addr(page);
 
 	dma_sync_single_for_cpu(rq->pdev, addr + frag_offset, len,
 				rq->buff.map_dir);
-	page_ref_inc(au->page);
+	page_ref_inc(page);
 	skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
-			au->page, frag_offset, len, truesize);
+			page, frag_offset, len, truesize);
 }
 
 static inline void
@@ -498,7 +498,6 @@ mlx5e_copy_skb_header(struct mlx5e_rq *rq, struct sk_buff *skb,
 static void
 mlx5e_free_rx_mpwqe(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi, bool recycle)
 {
-	union mlx5e_alloc_unit *alloc_units = wi->alloc_units;
 	bool no_xdp_xmit;
 	int i;
 
@@ -509,17 +508,21 @@ mlx5e_free_rx_mpwqe(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi, bool recycle
 	no_xdp_xmit = bitmap_empty(wi->xdp_xmit_bitmap, rq->mpwqe.pages_per_wqe);
 
 	if (rq->xsk_pool) {
+		struct xdp_buff **xsk_buffs = wi->alloc_units.xsk_buffs;
+
 		/* The `recycle` parameter is ignored, and the page is always
 		 * put into the Reuse Ring, because there is no way to return
 		 * the page to the userspace when the interface goes down.
 		 */
 		for (i = 0; i < rq->mpwqe.pages_per_wqe; i++)
 			if (no_xdp_xmit || !test_bit(i, wi->xdp_xmit_bitmap))
-				xsk_buff_free(alloc_units[i].xsk);
+				xsk_buff_free(xsk_buffs[i]);
 	} else {
+		struct page **pages = wi->alloc_units.pages;
+
 		for (i = 0; i < rq->mpwqe.pages_per_wqe; i++)
 			if (no_xdp_xmit || !test_bit(i, wi->xdp_xmit_bitmap))
-				mlx5e_page_release_dynamic(rq, alloc_units[i].page, recycle);
+				mlx5e_page_release_dynamic(rq, pages[i], recycle);
 	}
 }
 
@@ -694,7 +697,7 @@ static int mlx5e_alloc_rx_hd_mpwqe(struct mlx5e_rq *rq)
 static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 {
 	struct mlx5e_mpw_info *wi = mlx5e_get_mpw_info(rq, ix);
-	union mlx5e_alloc_unit *au = &wi->alloc_units[0];
+	struct page **pagep = &wi->alloc_units.pages[0];
 	struct mlx5e_icosq *sq = rq->icosq;
 	struct mlx5_wq_cyc *wq = &sq->wq;
 	struct mlx5e_umr_wqe *umr_wqe;
@@ -713,13 +716,13 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 	umr_wqe = mlx5_wq_cyc_get_wqe(wq, pi);
 	memcpy(umr_wqe, &rq->mpwqe.umr_wqe, sizeof(struct mlx5e_umr_wqe));
 
-	for (i = 0; i < rq->mpwqe.pages_per_wqe; i++, au++) {
+	for (i = 0; i < rq->mpwqe.pages_per_wqe; i++, pagep++) {
 		dma_addr_t addr;
 
-		err = mlx5e_page_alloc_pool(rq, &au->page);
+		err = mlx5e_page_alloc_pool(rq, pagep);
 		if (unlikely(err))
 			goto err_unmap;
-		addr = page_pool_get_dma_addr(au->page);
+		addr = page_pool_get_dma_addr(*pagep);
 		umr_wqe->inline_mtts[i] = (struct mlx5_mtt) {
 			.ptag = cpu_to_be64(addr | MLX5_EN_WR),
 		};
@@ -760,8 +763,8 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 
 err_unmap:
 	while (--i >= 0) {
-		au--;
-		mlx5e_page_release_dynamic(rq, au->page, true);
+		pagep--;
+		mlx5e_page_release_dynamic(rq, *pagep, true);
 	}
 
 err:
@@ -1914,7 +1917,7 @@ const struct mlx5e_rx_handlers mlx5e_rx_handlers_rep = {
 
 static void
 mlx5e_fill_skb_data(struct sk_buff *skb, struct mlx5e_rq *rq,
-		    union mlx5e_alloc_unit *au, u32 data_bcnt, u32 data_offset)
+		    struct page **pagep, u32 data_bcnt, u32 data_offset)
 {
 	net_prefetchw(skb->data);
 
@@ -1928,12 +1931,12 @@ mlx5e_fill_skb_data(struct sk_buff *skb, struct mlx5e_rq *rq,
 		else
 			truesize = ALIGN(pg_consumed_bytes, BIT(rq->mpwqe.log_stride_sz));
 
-		mlx5e_add_skb_frag(rq, skb, au, data_offset,
+		mlx5e_add_skb_frag(rq, skb, *pagep, data_offset,
 				   pg_consumed_bytes, truesize);
 
 		data_bcnt -= pg_consumed_bytes;
 		data_offset = 0;
-		au++;
+		pagep++;
 	}
 }
 
@@ -1942,11 +1945,11 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 				   struct mlx5_cqe64 *cqe, u16 cqe_bcnt, u32 head_offset,
 				   u32 page_idx)
 {
-	union mlx5e_alloc_unit *au = &wi->alloc_units[page_idx];
+	struct page **pagep = &wi->alloc_units.pages[page_idx];
 	u16 headlen = min_t(u16, MLX5E_RX_MAX_HEAD, cqe_bcnt);
 	u32 frag_offset    = head_offset + headlen;
 	u32 byte_cnt       = cqe_bcnt - headlen;
-	union mlx5e_alloc_unit *head_au = au;
+	struct page *head_page = *pagep;
 	struct sk_buff *skb;
 	dma_addr_t addr;
 
@@ -1961,14 +1964,14 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 
 	/* Non-linear mode, hence non-XSK, which always uses PAGE_SIZE. */
 	if (unlikely(frag_offset >= PAGE_SIZE)) {
-		au++;
+		pagep++;
 		frag_offset -= PAGE_SIZE;
 	}
 
-	mlx5e_fill_skb_data(skb, rq, au, byte_cnt, frag_offset);
+	mlx5e_fill_skb_data(skb, rq, pagep, byte_cnt, frag_offset);
 	/* copy header */
-	addr = page_pool_get_dma_addr(head_au->page);
-	mlx5e_copy_skb_header(rq, skb, head_au->page, addr,
+	addr = page_pool_get_dma_addr(head_page);
+	mlx5e_copy_skb_header(rq, skb, head_page, addr,
 			      head_offset, head_offset, headlen);
 	/* skb linear part was allocated with headlen and aligned to long */
 	skb->tail += headlen;
@@ -1982,7 +1985,7 @@ mlx5e_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
 				struct mlx5_cqe64 *cqe, u16 cqe_bcnt, u32 head_offset,
 				u32 page_idx)
 {
-	union mlx5e_alloc_unit *au = &wi->alloc_units[page_idx];
+	struct page *page = wi->alloc_units.pages[page_idx];
 	u16 rx_headroom = rq->buff.headroom;
 	struct bpf_prog *prog;
 	struct sk_buff *skb;
@@ -1997,11 +2000,11 @@ mlx5e_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
 		return NULL;
 	}
 
-	va             = page_address(au->page) + head_offset;
+	va             = page_address(page) + head_offset;
 	data           = va + rx_headroom;
 	frag_size      = MLX5_SKB_FRAG_SZ(rx_headroom + cqe_bcnt);
 
-	addr = page_pool_get_dma_addr(au->page);
+	addr = page_pool_get_dma_addr(page);
 	dma_sync_single_range_for_cpu(rq->pdev, addr, head_offset,
 				      frag_size, rq->buff.map_dir);
 	net_prefetch(data);
@@ -2028,7 +2031,7 @@ mlx5e_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
 		return NULL;
 
 	/* queue up for recycling/reuse */
-	page_ref_inc(au->page);
+	page_ref_inc(page);
 
 	return skb;
 }
@@ -2146,7 +2149,6 @@ static void mlx5e_handle_rx_cqe_mpwrq_shampo(struct mlx5e_rq *rq, struct mlx5_cq
 	bool match		= cqe->shampo.match;
 	struct mlx5e_rq_stats *stats = rq->stats;
 	struct mlx5e_rx_wqe_ll *wqe;
-	union mlx5e_alloc_unit *au;
 	struct mlx5e_mpw_info *wi;
 	struct mlx5_wq_ll *wq;
 
@@ -2196,8 +2198,9 @@ static void mlx5e_handle_rx_cqe_mpwrq_shampo(struct mlx5e_rq *rq, struct mlx5_cq
 	}
 
 	if (likely(head_size)) {
-		au = &wi->alloc_units[page_idx];
-		mlx5e_fill_skb_data(*skb, rq, au, data_bcnt, data_offset);
+		struct page **pagep = &wi->alloc_units.pages[page_idx];
+
+		mlx5e_fill_skb_data(*skb, rq, pagep, data_bcnt, data_offset);
 	}
 
 	mlx5e_shampo_complete_rx_cqe(rq, cqe, cqe_bcnt, *skb);
-- 
2.39.2


* [net-next 04/15] net/mlx5e: RX, Store SHAMPO header pages in array
  2023-03-28 20:56 [pull request][net-next 00/15] mlx5: Drop internal page cache implementation Saeed Mahameed
                   ` (2 preceding siblings ...)
  2023-03-28 20:56 ` [net-next 03/15] net/mlx5e: RX, Remove alloc unit layout constraint for striding rq Saeed Mahameed
@ 2023-03-28 20:56 ` Saeed Mahameed
  2023-03-28 20:56 ` [net-next 05/15] net/mlx5e: RX, Remove internal page_cache Saeed Mahameed
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2023-03-28 20:56 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Jesper Dangaard Brouer,
	Matthew Wilcox, Toke Høiland-Jørgensen,
	Ilias Apalodimas, Dragos Tatulea

From: Dragos Tatulea <dtatulea@nvidia.com>

Save allocated SHAMPO header pages to an array; the mlx5e_dma_info
page pointer now points into this array.

This change is a preparation for introducing the mlx5e_frag_page
structure in a downstream patch. There is no new functionality
introduced.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  8 +++-
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 21 ++++++----
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 42 +++++++++++--------
 3 files changed, 45 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index ad4ad14853bf..b38fbacbb4d1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -670,13 +670,17 @@ struct mlx5e_rq_frags_info {
 
 struct mlx5e_dma_info {
 	dma_addr_t addr;
-	struct page *page;
+	union {
+		struct page **pagep;
+		struct page *page;
+	};
 };
 
 struct mlx5e_shampo_hd {
 	u32 mkey;
 	struct mlx5e_dma_info *info;
-	struct page *last_page;
+	struct page **pages;
+	u16 curr_page_index;
 	u16 hd_per_wq;
 	u16 hd_per_wqe;
 	unsigned long *bitmap;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 0ed3f67f7dfc..77f81d74ff30 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -262,23 +262,30 @@ static int mlx5e_rq_shampo_hd_info_alloc(struct mlx5e_rq *rq, int node)
 
 	shampo->bitmap = bitmap_zalloc_node(shampo->hd_per_wq, GFP_KERNEL,
 					    node);
-	if (!shampo->bitmap)
-		return -ENOMEM;
-
 	shampo->info = kvzalloc_node(array_size(shampo->hd_per_wq,
 						sizeof(*shampo->info)),
 				     GFP_KERNEL, node);
-	if (!shampo->info) {
-		kvfree(shampo->bitmap);
-		return -ENOMEM;
-	}
+	shampo->pages = kvzalloc_node(array_size(shampo->hd_per_wq,
+						 sizeof(*shampo->pages)),
+				     GFP_KERNEL, node);
+	if (!shampo->bitmap || !shampo->info || !shampo->pages)
+		goto err_nomem;
+
 	return 0;
+
+err_nomem:
+	kvfree(shampo->info);
+	kvfree(shampo->bitmap);
+	kvfree(shampo->pages);
+
+	return -ENOMEM;
 }
 
 static void mlx5e_rq_shampo_hd_info_free(struct mlx5e_rq *rq)
 {
 	kvfree(rq->mpwqe.shampo->bitmap);
 	kvfree(rq->mpwqe.shampo->info);
+	kvfree(rq->mpwqe.shampo->pages);
 }
 
 static int mlx5e_rq_alloc_mpwqe_info(struct mlx5e_rq *rq, int node)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index d02f2f2af4ec..7057db954f6f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -586,10 +586,11 @@ static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq,
 	struct mlx5e_shampo_hd *shampo = rq->mpwqe.shampo;
 	u16 entries, pi, header_offset, err, wqe_bbs, new_entries;
 	u32 lkey = rq->mdev->mlx5e_res.hw_objs.mkey;
-	struct page *page = shampo->last_page;
+	u16 page_index = shampo->curr_page_index;
 	u64 addr = shampo->last_addr;
 	struct mlx5e_dma_info *dma_info;
 	struct mlx5e_umr_wqe *umr_wqe;
+	struct page **pagep;
 	int headroom, i;
 
 	headroom = rq->buff.headroom;
@@ -600,6 +601,8 @@ static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq,
 	umr_wqe = mlx5_wq_cyc_get_wqe(&sq->wq, pi);
 	build_klm_umr(sq, umr_wqe, shampo->key, index, entries, wqe_bbs);
 
+	pagep = &shampo->pages[page_index];
+
 	for (i = 0; i < entries; i++, index++) {
 		dma_info = &shampo->info[index];
 		if (i >= klm_entries || (index < shampo->pi && shampo->pi - index <
@@ -608,17 +611,20 @@ static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq,
 		header_offset = (index & (MLX5E_SHAMPO_WQ_HEADER_PER_PAGE - 1)) <<
 			MLX5E_SHAMPO_LOG_MAX_HEADER_ENTRY_SIZE;
 		if (!(header_offset & (PAGE_SIZE - 1))) {
+			page_index = (page_index + 1) & (shampo->hd_per_wq - 1);
+			pagep = &shampo->pages[page_index];
 
-			err = mlx5e_page_alloc_pool(rq, &page);
+			err = mlx5e_page_alloc_pool(rq, pagep);
 			if (unlikely(err))
 				goto err_unmap;
 
-			addr = page_pool_get_dma_addr(page);
+			addr = page_pool_get_dma_addr(*pagep);
+
 			dma_info->addr = addr;
-			dma_info->page = page;
+			dma_info->pagep = pagep;
 		} else {
 			dma_info->addr = addr + header_offset;
-			dma_info->page = page;
+			dma_info->pagep = pagep;
 		}
 
 update_klm:
@@ -636,7 +642,7 @@ static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq,
 	};
 
 	shampo->pi = (shampo->pi + new_entries) & (shampo->hd_per_wq - 1);
-	shampo->last_page = page;
+	shampo->curr_page_index = page_index;
 	shampo->last_addr = addr;
 	sq->pc += wqe_bbs;
 	sq->doorbell_cseg = &umr_wqe->ctrl;
@@ -648,7 +654,7 @@ static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq,
 		dma_info = &shampo->info[--index];
 		if (!(i & (MLX5E_SHAMPO_WQ_HEADER_PER_PAGE - 1))) {
 			dma_info->addr = ALIGN_DOWN(dma_info->addr, PAGE_SIZE);
-			mlx5e_page_release_dynamic(rq, dma_info->page, true);
+			mlx5e_page_release_dynamic(rq, *dma_info->pagep, true);
 		}
 	}
 	rq->stats->buff_alloc_err++;
@@ -783,7 +789,7 @@ void mlx5e_shampo_dealloc_hd(struct mlx5e_rq *rq, u16 len, u16 start, bool close
 {
 	struct mlx5e_shampo_hd *shampo = rq->mpwqe.shampo;
 	int hd_per_wq = shampo->hd_per_wq;
-	struct page *deleted_page = NULL;
+	struct page **deleted_page = NULL;
 	struct mlx5e_dma_info *hd_info;
 	int i, index = start;
 
@@ -796,9 +802,9 @@ void mlx5e_shampo_dealloc_hd(struct mlx5e_rq *rq, u16 len, u16 start, bool close
 
 		hd_info = &shampo->info[index];
 		hd_info->addr = ALIGN_DOWN(hd_info->addr, PAGE_SIZE);
-		if (hd_info->page != deleted_page) {
-			deleted_page = hd_info->page;
-			mlx5e_page_release_dynamic(rq, hd_info->page, false);
+		if (hd_info->pagep != deleted_page) {
+			deleted_page = hd_info->pagep;
+			mlx5e_page_release_dynamic(rq, *hd_info->pagep, false);
 		}
 	}
 
@@ -1137,7 +1143,7 @@ static void *mlx5e_shampo_get_packet_hd(struct mlx5e_rq *rq, u16 header_index)
 	struct mlx5e_dma_info *last_head = &rq->mpwqe.shampo->info[header_index];
 	u16 head_offset = (last_head->addr & (PAGE_SIZE - 1)) + rq->buff.headroom;
 
-	return page_address(last_head->page) + head_offset;
+	return page_address(*last_head->pagep) + head_offset;
 }
 
 static void mlx5e_shampo_update_ipv4_udp_hdr(struct mlx5e_rq *rq, struct iphdr *ipv4)
@@ -2048,7 +2054,7 @@ mlx5e_skb_from_cqe_shampo(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
 	void *hdr, *data;
 	u32 frag_size;
 
-	hdr		= page_address(head->page) + head_offset;
+	hdr		= page_address(*head->pagep) + head_offset;
 	data		= hdr + rx_headroom;
 	frag_size	= MLX5_SKB_FRAG_SZ(rx_headroom + head_size);
 
@@ -2063,7 +2069,7 @@ mlx5e_skb_from_cqe_shampo(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
 			return NULL;
 
 		/* queue up for recycling/reuse */
-		page_ref_inc(head->page);
+		page_ref_inc(*head->pagep);
 
 	} else {
 		/* allocate SKB and copy header for large header */
@@ -2076,7 +2082,7 @@ mlx5e_skb_from_cqe_shampo(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
 		}
 
 		prefetchw(skb->data);
-		mlx5e_copy_skb_header(rq, skb, head->page, head->addr,
+		mlx5e_copy_skb_header(rq, skb, *head->pagep, head->addr,
 				      head_offset + rx_headroom,
 				      rx_headroom, head_size);
 		/* skb linear part was allocated with headlen and aligned to long */
@@ -2127,8 +2133,10 @@ mlx5e_free_rx_shampo_hd_entry(struct mlx5e_rq *rq, u16 header_index)
 	u64 addr = shampo->info[header_index].addr;
 
 	if (((header_index + 1) & (MLX5E_SHAMPO_WQ_HEADER_PER_PAGE - 1)) == 0) {
-		shampo->info[header_index].addr = ALIGN_DOWN(addr, PAGE_SIZE);
-		mlx5e_page_release_dynamic(rq, shampo->info[header_index].page, true);
+		struct mlx5e_dma_info *dma_info = &shampo->info[header_index];
+
+		dma_info->addr = ALIGN_DOWN(addr, PAGE_SIZE);
+		mlx5e_page_release_dynamic(rq, *dma_info->pagep, true);
 	}
 	bitmap_clear(shampo->bitmap, header_index, 1);
 }
-- 
2.39.2


* [net-next 05/15] net/mlx5e: RX, Remove internal page_cache
  2023-03-28 20:56 [pull request][net-next 00/15] mlx5: Drop internal page cache implementation Saeed Mahameed
                   ` (3 preceding siblings ...)
  2023-03-28 20:56 ` [net-next 04/15] net/mlx5e: RX, Store SHAMPO header pages in array Saeed Mahameed
@ 2023-03-28 20:56 ` Saeed Mahameed
  2023-03-28 20:56 ` [net-next 06/15] net/mlx5e: RX, Enable dma map and sync from page_pool allocator Saeed Mahameed
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2023-03-28 20:56 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Jesper Dangaard Brouer,
	Matthew Wilcox, Toke Høiland-Jørgensen,
	Ilias Apalodimas, Dragos Tatulea

From: Dragos Tatulea <dtatulea@nvidia.com>

This patch removes the internal rx page_cache and uses the generic
page_pool api only. It used to be that the page_pool couldn't handle all
the mlx5 driver use cases, but with the introduction of skb recycling and
page fragmentation in the page_pool, a full switch can now be made. Some
benefits of this transition:
* Better page recycling in the cases when the page_cache was suffering
  from head of queue blocking. The page_pool doesn't have this issue.
* DMA mapping/unmapping can be managed by the page_pool.
* mlx5e_rq size reduced by more than 50% due to the page_cache array
  being deleted.

This patch only removes the page_cache. Downstream patches will enable
the required page_pool features and will add further fine-tuning.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  6 ---
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 13 -----
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 53 -------------------
 3 files changed, 72 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index b38fbacbb4d1..2684e7af5a7a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -628,11 +628,6 @@ struct mlx5e_mpw_info {
 #define MLX5E_CACHE_UNIT (MLX5_MPWRQ_MAX_PAGES_PER_WQE > NAPI_POLL_WEIGHT ? \
 			  MLX5_MPWRQ_MAX_PAGES_PER_WQE : NAPI_POLL_WEIGHT)
 #define MLX5E_CACHE_SIZE	(4 * roundup_pow_of_two(MLX5E_CACHE_UNIT))
-struct mlx5e_page_cache {
-	u32 head;
-	u32 tail;
-	struct page *page_cache[MLX5E_CACHE_SIZE];
-};
 
 struct mlx5e_rq;
 typedef void (*mlx5e_fp_handle_rx_cqe)(struct mlx5e_rq*, struct mlx5_cqe64*);
@@ -745,7 +740,6 @@ struct mlx5e_rq {
 	struct mlx5e_rq_stats *stats;
 	struct mlx5e_cq        cq;
 	struct mlx5e_cq_decomp cqd;
-	struct mlx5e_page_cache page_cache;
 	struct hwtstamp_config *tstamp;
 	struct mlx5_clock      *clock;
 	struct mlx5e_icosq    *icosq;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 77f81d74ff30..b0322a20b71b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -900,9 +900,6 @@ static int mlx5e_alloc_rq(struct mlx5e_params *params,
 		rq->dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
 	}
 
-	rq->page_cache.head = 0;
-	rq->page_cache.tail = 0;
-
 	return 0;
 
 err_destroy_page_pool:
@@ -933,7 +930,6 @@ static int mlx5e_alloc_rq(struct mlx5e_params *params,
 static void mlx5e_free_rq(struct mlx5e_rq *rq)
 {
 	struct bpf_prog *old_prog;
-	int i;
 
 	if (xdp_rxq_info_is_reg(&rq->xdp_rxq)) {
 		old_prog = rcu_dereference_protected(rq->xdp_prog,
@@ -953,15 +949,6 @@ static void mlx5e_free_rq(struct mlx5e_rq *rq)
 		mlx5e_free_wqe_alloc_info(rq);
 	}
 
-	for (i = rq->page_cache.head; i != rq->page_cache.tail;
-	     i = (i + 1) & (MLX5E_CACHE_SIZE - 1)) {
-		/* With AF_XDP, page_cache is not used, so this loop is not
-		 * entered, and it's safe to call mlx5e_page_release_dynamic
-		 * directly.
-		 */
-		mlx5e_page_release_dynamic(rq, rq->page_cache.page_cache[i], false);
-	}
-
 	xdp_rxq_info_unreg(&rq->xdp_rxq);
 	page_pool_destroy(rq->page_pool);
 	mlx5_wq_destroy(&rq->wq_ctrl);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 7057db954f6f..192f12a7d9a9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -271,60 +271,10 @@ static inline u32 mlx5e_decompress_cqes_start(struct mlx5e_rq *rq,
 	return mlx5e_decompress_cqes_cont(rq, wq, 1, budget_rem);
 }
 
-static inline bool mlx5e_rx_cache_put(struct mlx5e_rq *rq, struct page *page)
-{
-	struct mlx5e_page_cache *cache = &rq->page_cache;
-	u32 tail_next = (cache->tail + 1) & (MLX5E_CACHE_SIZE - 1);
-	struct mlx5e_rq_stats *stats = rq->stats;
-
-	if (tail_next == cache->head) {
-		stats->cache_full++;
-		return false;
-	}
-
-	if (!dev_page_is_reusable(page)) {
-		stats->cache_waive++;
-		return false;
-	}
-
-	cache->page_cache[cache->tail] = page;
-	cache->tail = tail_next;
-	return true;
-}
-
-static inline bool mlx5e_rx_cache_get(struct mlx5e_rq *rq, struct page **pagep)
-{
-	struct mlx5e_page_cache *cache = &rq->page_cache;
-	struct mlx5e_rq_stats *stats = rq->stats;
-	dma_addr_t addr;
-
-	if (unlikely(cache->head == cache->tail)) {
-		stats->cache_empty++;
-		return false;
-	}
-
-	if (page_ref_count(cache->page_cache[cache->head]) != 1) {
-		stats->cache_busy++;
-		return false;
-	}
-
-	*pagep = cache->page_cache[cache->head];
-	cache->head = (cache->head + 1) & (MLX5E_CACHE_SIZE - 1);
-	stats->cache_reuse++;
-
-	addr = page_pool_get_dma_addr(*pagep);
-	/* Non-XSK always uses PAGE_SIZE. */
-	dma_sync_single_for_device(rq->pdev, addr, PAGE_SIZE, rq->buff.map_dir);
-	return true;
-}
-
 static inline int mlx5e_page_alloc_pool(struct mlx5e_rq *rq, struct page **pagep)
 {
 	dma_addr_t addr;
 
-	if (mlx5e_rx_cache_get(rq, pagep))
-		return 0;
-
 	*pagep = page_pool_dev_alloc_pages(rq->page_pool);
 	if (unlikely(!*pagep))
 		return -ENOMEM;
@@ -353,9 +303,6 @@ void mlx5e_page_dma_unmap(struct mlx5e_rq *rq, struct page *page)
 void mlx5e_page_release_dynamic(struct mlx5e_rq *rq, struct page *page, bool recycle)
 {
 	if (likely(recycle)) {
-		if (mlx5e_rx_cache_put(rq, page))
-			return;
-
 		mlx5e_page_dma_unmap(rq, page);
 		page_pool_recycle_direct(rq->page_pool, page);
 	} else {
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [net-next 06/15] net/mlx5e: RX, Enable dma map and sync from page_pool allocator
  2023-03-28 20:56 [pull request][net-next 00/15] mlx5: Drop internal page cache implementation Saeed Mahameed
                   ` (4 preceding siblings ...)
  2023-03-28 20:56 ` [net-next 05/15] net/mlx5e: RX, Remove internal page_cache Saeed Mahameed
@ 2023-03-28 20:56 ` Saeed Mahameed
  2023-03-28 20:56 ` [net-next 07/15] net/mlx5e: RX, Enable skb page recycling through the page_pool Saeed Mahameed
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2023-03-28 20:56 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Jesper Dangaard Brouer,
	Matthew Wilcox, Toke Høiland-Jørgensen,
	Ilias Apalodimas, Dragos Tatulea

From: Dragos Tatulea <dtatulea@nvidia.com>

Remove the driver's DMA mapping and unmapping of pages and let the
page_pool API do it.
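
For reference, a rough sketch (not part of this patch) of the contract
the driver now relies on: with PP_FLAG_DMA_MAP and PP_FLAG_DMA_SYNC_DEV
set, pages come out of the pool already mapped and synced, and the
driver only reads back the DMA address. The pool_size/node/pdev names
below are placeholders for the rq fields used in the hunks that follow.

  struct page_pool_params pp_params = { 0 };
  struct page_pool *pool;
  struct page *page;
  dma_addr_t addr;

  pp_params.order     = 0;
  pp_params.flags     = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
  pp_params.pool_size = pool_size;
  pp_params.nid       = node;
  pp_params.dev       = pdev;              /* device used for DMA mapping */
  pp_params.dma_dir   = DMA_FROM_DEVICE;
  pp_params.max_len   = PAGE_SIZE;         /* length synced for the device */

  pool = page_pool_create(&pp_params);
  if (IS_ERR(pool))
          return PTR_ERR(pool);

  page = page_pool_dev_alloc_pages(pool);  /* already DMA-mapped by the pool */
  addr = page_pool_get_dma_addr(page);     /* no dma_map_page() in the driver */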

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en/txrx.h |  1 -
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  2 --
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  6 +++--
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 22 -------------------
 4 files changed, 4 insertions(+), 27 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
index dab00a2c2eb7..04419f56ac85 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
@@ -65,7 +65,6 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget);
 int mlx5e_poll_ico_cq(struct mlx5e_cq *cq);
 
 /* RX */
-void mlx5e_page_dma_unmap(struct mlx5e_rq *rq, struct page *page);
 void mlx5e_page_release_dynamic(struct mlx5e_rq *rq, struct page *page, bool recycle);
 INDIRECT_CALLABLE_DECLARE(bool mlx5e_post_rx_wqes(struct mlx5e_rq *rq));
 INDIRECT_CALLABLE_DECLARE(bool mlx5e_post_rx_mpwqes(struct mlx5e_rq *rq));
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index c5dae48b7932..5e6ef602c748 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -209,8 +209,6 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq,
 			goto xdp_abort;
 		__set_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags);
 		__set_bit(MLX5E_RQ_FLAG_XDP_REDIRECT, rq->flags);
-		if (xdp->rxq->mem.type != MEM_TYPE_XSK_BUFF_POOL)
-			mlx5e_page_dma_unmap(rq, virt_to_page(xdp->data));
 		rq->stats->xdp_redirect++;
 		return true;
 	default:
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index b0322a20b71b..2a73680021c2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -733,7 +733,6 @@ static int mlx5e_alloc_rq(struct mlx5e_params *params,
 			  struct mlx5e_rq_param *rqp,
 			  int node, struct mlx5e_rq *rq)
 {
-	struct page_pool_params pp_params = { 0 };
 	struct mlx5_core_dev *mdev = rq->mdev;
 	void *rqc = rqp->rqc;
 	void *rqc_wq = MLX5_ADDR_OF(rqc, rqc, wq);
@@ -829,12 +828,15 @@ static int mlx5e_alloc_rq(struct mlx5e_params *params,
 		xsk_pool_set_rxq_info(rq->xsk_pool, &rq->xdp_rxq);
 	} else {
 		/* Create a page_pool and register it with rxq */
+		struct page_pool_params pp_params = { 0 };
+
 		pp_params.order     = 0;
-		pp_params.flags     = 0; /* No-internal DMA mapping in page_pool */
+		pp_params.flags     = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
 		pp_params.pool_size = pool_size;
 		pp_params.nid       = node;
 		pp_params.dev       = rq->pdev;
 		pp_params.dma_dir   = rq->buff.map_dir;
+		pp_params.max_len   = PAGE_SIZE;
 
 		/* page_pool can be used even when there is no rq->xdp_prog,
 		 * given page_pool does not handle DMA mapping there is no
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 192f12a7d9a9..01c789b89cb9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -273,40 +273,18 @@ static inline u32 mlx5e_decompress_cqes_start(struct mlx5e_rq *rq,
 
 static inline int mlx5e_page_alloc_pool(struct mlx5e_rq *rq, struct page **pagep)
 {
-	dma_addr_t addr;
-
 	*pagep = page_pool_dev_alloc_pages(rq->page_pool);
 	if (unlikely(!*pagep))
 		return -ENOMEM;
 
-	/* Non-XSK always uses PAGE_SIZE. */
-	addr = dma_map_page(rq->pdev, *pagep, 0, PAGE_SIZE, rq->buff.map_dir);
-	if (unlikely(dma_mapping_error(rq->pdev, addr))) {
-		page_pool_recycle_direct(rq->page_pool, *pagep);
-		*pagep = NULL;
-		return -ENOMEM;
-	}
-	page_pool_set_dma_addr(*pagep, addr);
-
 	return 0;
 }
 
-void mlx5e_page_dma_unmap(struct mlx5e_rq *rq, struct page *page)
-{
-	dma_addr_t dma_addr = page_pool_get_dma_addr(page);
-
-	dma_unmap_page_attrs(rq->pdev, dma_addr, PAGE_SIZE, rq->buff.map_dir,
-			     DMA_ATTR_SKIP_CPU_SYNC);
-	page_pool_set_dma_addr(page, 0);
-}
-
 void mlx5e_page_release_dynamic(struct mlx5e_rq *rq, struct page *page, bool recycle)
 {
 	if (likely(recycle)) {
-		mlx5e_page_dma_unmap(rq, page);
 		page_pool_recycle_direct(rq->page_pool, page);
 	} else {
-		mlx5e_page_dma_unmap(rq, page);
 		page_pool_release_page(rq->page_pool, page);
 		put_page(page);
 	}
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [net-next 07/15] net/mlx5e: RX, Enable skb page recycling through the page_pool
  2023-03-28 20:56 [pull request][net-next 00/15] mlx5: Drop internal page cache implementation Saeed Mahameed
                   ` (5 preceding siblings ...)
  2023-03-28 20:56 ` [net-next 06/15] net/mlx5e: RX, Enable dma map and sync from page_pool allocator Saeed Mahameed
@ 2023-03-28 20:56 ` Saeed Mahameed
  2023-03-28 20:56 ` [net-next 08/15] net/mlx5e: RX, Rename xdp_xmit_bitmap to a more generic name Saeed Mahameed
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2023-03-28 20:56 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Jesper Dangaard Brouer,
	Matthew Wilcox, Toke Høiland-Jørgensen,
	Ilias Apalodimas, Dragos Tatulea

From: Dragos Tatulea <dtatulea@nvidia.com>

Start using the page_pool skb recycling api to recycle all pages back to
the page pool and stop using atomic page reference counting.

The mlx5e driver used to manage in-flight pages using page refcounting:
for each fragment, two atomic write operations happened (one when
building the skb and one on skb release).

The page_pool api introduced a method to track page fragments more
optimally:
* The page's pp_fragment_count is set to a large bias on page alloc
  (1 x atomic write operation).
* The driver tracks the actual page fragments in a non-atomic variable.
* When the skb is recycled, pp_fragment_count is decremented
  (atomic write operation).
* When page is released in the driver, the unused number of fragments
  (relative to the bias) is deducted from pp_fragment_count (atomic
  write operation).
* The final page defragmentation is only an atomic read.

So in total there are `number of fragments + 1` atomic write ops, as
opposed to the previous `2 * frags` atomic write ops.

Pages are wrapped in an mlx5e_frag_page structure which also holds the
number of fragments. This makes it easy to count the fragments in the
driver.
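
The scheme above, condensed into a sketch that mirrors the
mlx5e_page_alloc_fragmented()/mlx5e_page_release_fragmented() helpers
added below (the BIAS_MAX name and locals are illustrative):

  #define BIAS_MAX (PAGE_SIZE / 64)

  /* alloc: one atomic write to install the bias */
  page = page_pool_dev_alloc_pages(pool);
  page_pool_fragment_page(page, BIAS_MAX);
  frag_page->page  = page;
  frag_page->frags = 0;                    /* driver-private, non-atomic */

  /* per fragment handed to the stack: non-atomic increment */
  frag_page->frags++;

  /* release: return the unused part of the bias in one atomic op;
   * when the count drops to zero, nobody holds the page any more.
   */
  drain = BIAS_MAX - frag_page->frags;
  if (page_pool_defrag_page(page, drain) == 0)
          page_pool_put_defragged_page(pool, page, -1, recycle);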

This change brings performance improvements for the cases where the old
rx page_cache had low recycling rates due to head of queue blocking. For
an iperf3 TCP test with a single stream, on a single core (iperf and the
receive queue running on the same core), the following improvements can
be observed:

* Striding rq:
  - before (net-next baseline): bitrate = 30.1 Gbits/sec
  - after                     : bitrate = 31.4 Gbits/sec (diff: 4.14 %)

* Legacy rq:
  - before (net-next baseline): bitrate = 30.2 Gbits/sec
  - after                     : bitrate = 33.0 Gbits/sec (diff: 8.48 %)

This patch introduces two temporary performance degradations:

1) TCP streams that had a good recycling rate with the old page_cache
   see a degradation for both striding and linear rq. This is due to
   very low page pool cache recycling: the pages are released during skb
   recycling, which returns them to the page pool ring for safety.
   The following patches in this series will tackle this problem by
   deferring the page release in the driver to increase the
   chance of having pages recycled to the cache.

2) XDP performance is now lower (by 4-5 %) due to the higher number of
   atomic operations used for fragment management. But this opens the
   door for supporting multiple packets per page in XDP, which will
   bring a big gain.

Otherwise, performance is similar to baseline.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  12 +-
 .../net/ethernet/mellanox/mlx5/core/en/txrx.h |   3 +-
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |   3 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  11 +-
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 180 ++++++++++--------
 5 files changed, 121 insertions(+), 88 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 2684e7af5a7a..0862556105d5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -600,9 +600,14 @@ struct mlx5e_icosq {
 	struct work_struct         recover_work;
 } ____cacheline_aligned_in_smp;
 
+struct mlx5e_frag_page {
+	struct page *page;
+	u16 frags;
+};
+
 struct mlx5e_wqe_frag_info {
 	union {
-		struct page **pagep;
+		struct mlx5e_frag_page *frag_page;
 		struct xdp_buff **xskp;
 	};
 	u32 offset;
@@ -610,6 +615,7 @@ struct mlx5e_wqe_frag_info {
 };
 
 union mlx5e_alloc_units {
+	DECLARE_FLEX_ARRAY(struct mlx5e_frag_page, frag_pages);
 	DECLARE_FLEX_ARRAY(struct page *, pages);
 	DECLARE_FLEX_ARRAY(struct xdp_buff *, xsk_buffs);
 };
@@ -666,7 +672,7 @@ struct mlx5e_rq_frags_info {
 struct mlx5e_dma_info {
 	dma_addr_t addr;
 	union {
-		struct page **pagep;
+		struct mlx5e_frag_page *frag_page;
 		struct page *page;
 	};
 };
@@ -674,7 +680,7 @@ struct mlx5e_dma_info {
 struct mlx5e_shampo_hd {
 	u32 mkey;
 	struct mlx5e_dma_info *info;
-	struct page **pages;
+	struct mlx5e_frag_page *pages;
 	u16 curr_page_index;
 	u16 hd_per_wq;
 	u16 hd_per_wqe;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
index 04419f56ac85..cd7779a9d046 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
@@ -65,7 +65,6 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget);
 int mlx5e_poll_ico_cq(struct mlx5e_cq *cq);
 
 /* RX */
-void mlx5e_page_release_dynamic(struct mlx5e_rq *rq, struct page *page, bool recycle);
 INDIRECT_CALLABLE_DECLARE(bool mlx5e_post_rx_wqes(struct mlx5e_rq *rq));
 INDIRECT_CALLABLE_DECLARE(bool mlx5e_post_rx_mpwqes(struct mlx5e_rq *rq));
 int mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget);
@@ -488,7 +487,7 @@ static inline bool mlx5e_icosq_can_post_wqe(struct mlx5e_icosq *sq, u16 wqe_size
 
 static inline struct mlx5e_mpw_info *mlx5e_get_mpw_info(struct mlx5e_rq *rq, int i)
 {
-	size_t isz = struct_size(rq->mpwqe.info, alloc_units.pages, rq->mpwqe.pages_per_wqe);
+	size_t isz = struct_size(rq->mpwqe.info, alloc_units.frag_pages, rq->mpwqe.pages_per_wqe);
 
 	return (struct mlx5e_mpw_info *)((char *)rq->mpwqe.info + array_size(i, isz));
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index 5e6ef602c748..ca6ac9772d22 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -523,7 +523,8 @@ static void mlx5e_free_xdpsq_desc(struct mlx5e_xdpsq *sq,
 			break;
 		case MLX5E_XDP_XMIT_MODE_PAGE:
 			/* XDP_TX from the regular RQ */
-			mlx5e_page_release_dynamic(xdpi.page.rq, xdpi.page.page, recycle);
+			page_pool_put_defragged_page(xdpi.page.rq->page_pool,
+						     xdpi.page.page, -1, recycle);
 			break;
 		case MLX5E_XDP_XMIT_MODE_XSK:
 			/* AF_XDP send */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 2a73680021c2..eca9a11454e5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -294,7 +294,7 @@ static int mlx5e_rq_alloc_mpwqe_info(struct mlx5e_rq *rq, int node)
 	size_t alloc_size;
 
 	alloc_size = array_size(wq_sz, struct_size(rq->mpwqe.info,
-						   alloc_units.pages,
+						   alloc_units.frag_pages,
 						   rq->mpwqe.pages_per_wqe));
 
 	rq->mpwqe.info = kvzalloc_node(alloc_size, GFP_KERNEL, node);
@@ -509,7 +509,8 @@ static void mlx5e_init_frags_partition(struct mlx5e_rq *rq)
 
 	WARN_ON(rq->xsk_pool);
 
-	next_frag.pagep = &rq->wqe.alloc_units->pages[0];
+	next_frag.frag_page = &rq->wqe.alloc_units->frag_pages[0];
+
 	for (i = 0; i < mlx5_wq_cyc_get_size(&rq->wqe.wq); i++) {
 		struct mlx5e_rq_frag_info *frag_info = &rq->wqe.info.arr[0];
 		struct mlx5e_wqe_frag_info *frag =
@@ -519,7 +520,7 @@ static void mlx5e_init_frags_partition(struct mlx5e_rq *rq)
 		for (f = 0; f < rq->wqe.info.num_frags; f++, frag++) {
 			if (next_frag.offset + frag_info[f].frag_stride > PAGE_SIZE) {
 				/* Pages are assigned at runtime. */
-				next_frag.pagep++;
+				next_frag.frag_page++;
 				next_frag.offset = 0;
 				if (prev)
 					prev->last_in_page = true;
@@ -563,7 +564,7 @@ static int mlx5e_init_wqe_alloc_info(struct mlx5e_rq *rq, int node)
 	if (rq->xsk_pool)
 		aus_sz = sizeof(*aus->xsk_buffs);
 	else
-		aus_sz = sizeof(*aus->pages);
+		aus_sz = sizeof(*aus->frag_pages);
 
 	aus = kvzalloc_node(array_size(len, aus_sz), GFP_KERNEL, node);
 	if (!aus)
@@ -831,7 +832,7 @@ static int mlx5e_alloc_rq(struct mlx5e_params *params,
 		struct page_pool_params pp_params = { 0 };
 
 		pp_params.order     = 0;
-		pp_params.flags     = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
+		pp_params.flags     = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV | PP_FLAG_PAGE_FRAG;
 		pp_params.pool_size = pool_size;
 		pp_params.nid       = node;
 		pp_params.dev       = rq->pdev;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 01c789b89cb9..7724e30ec133 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -271,23 +271,36 @@ static inline u32 mlx5e_decompress_cqes_start(struct mlx5e_rq *rq,
 	return mlx5e_decompress_cqes_cont(rq, wq, 1, budget_rem);
 }
 
-static inline int mlx5e_page_alloc_pool(struct mlx5e_rq *rq, struct page **pagep)
+#define MLX5E_PAGECNT_BIAS_MAX (PAGE_SIZE / 64)
+
+static int mlx5e_page_alloc_fragmented(struct mlx5e_rq *rq,
+				       struct mlx5e_frag_page *frag_page)
 {
-	*pagep = page_pool_dev_alloc_pages(rq->page_pool);
-	if (unlikely(!*pagep))
+	struct page *page;
+
+	page = page_pool_dev_alloc_pages(rq->page_pool);
+	if (unlikely(!page))
 		return -ENOMEM;
 
+	page_pool_fragment_page(page, MLX5E_PAGECNT_BIAS_MAX);
+
+	*frag_page = (struct mlx5e_frag_page) {
+		.page	= page,
+		.frags	= 0,
+	};
+
 	return 0;
 }
 
-void mlx5e_page_release_dynamic(struct mlx5e_rq *rq, struct page *page, bool recycle)
+static void mlx5e_page_release_fragmented(struct mlx5e_rq *rq,
+					  struct mlx5e_frag_page *frag_page,
+					  bool recycle)
 {
-	if (likely(recycle)) {
-		page_pool_recycle_direct(rq->page_pool, page);
-	} else {
-		page_pool_release_page(rq->page_pool, page);
-		put_page(page);
-	}
+	u16 drain_count = MLX5E_PAGECNT_BIAS_MAX - frag_page->frags;
+	struct page *page = frag_page->page;
+
+	if (page_pool_defrag_page(page, drain_count) == 0)
+		page_pool_put_defragged_page(rq->page_pool, page, -1, recycle);
 }
 
 static inline int mlx5e_get_rx_frag(struct mlx5e_rq *rq,
@@ -301,7 +314,7 @@ static inline int mlx5e_get_rx_frag(struct mlx5e_rq *rq,
 		 * offset) should just use the new one without replenishing again
 		 * by themselves.
 		 */
-		err = mlx5e_page_alloc_pool(rq, frag->pagep);
+		err = mlx5e_page_alloc_fragmented(rq, frag->frag_page);
 
 	return err;
 }
@@ -311,7 +324,7 @@ static inline void mlx5e_put_rx_frag(struct mlx5e_rq *rq,
 				     bool recycle)
 {
 	if (frag->last_in_page)
-		mlx5e_page_release_dynamic(rq, *frag->pagep, recycle);
+		mlx5e_page_release_fragmented(rq, frag->frag_page, recycle);
 }
 
 static inline struct mlx5e_wqe_frag_info *get_frag(struct mlx5e_rq *rq, u16 ix)
@@ -335,7 +348,7 @@ static int mlx5e_alloc_rx_wqe(struct mlx5e_rq *rq, struct mlx5e_rx_wqe_cyc *wqe,
 			goto free_frags;
 
 		headroom = i == 0 ? rq->buff.headroom : 0;
-		addr = page_pool_get_dma_addr(*frag->pagep);
+		addr = page_pool_get_dma_addr(frag->frag_page->page);
 		wqe->data[i].addr = cpu_to_be64(addr + frag->offset + headroom);
 	}
 
@@ -401,7 +414,6 @@ mlx5e_add_skb_frag(struct mlx5e_rq *rq, struct sk_buff *skb,
 
 	dma_sync_single_for_cpu(rq->pdev, addr + frag_offset, len,
 				rq->buff.map_dir);
-	page_ref_inc(page);
 	skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
 			page, frag_offset, len, truesize);
 }
@@ -443,11 +455,14 @@ mlx5e_free_rx_mpwqe(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi, bool recycle
 			if (no_xdp_xmit || !test_bit(i, wi->xdp_xmit_bitmap))
 				xsk_buff_free(xsk_buffs[i]);
 	} else {
-		struct page **pages = wi->alloc_units.pages;
+		for (i = 0; i < rq->mpwqe.pages_per_wqe; i++) {
+			if (no_xdp_xmit || !test_bit(i, wi->xdp_xmit_bitmap)) {
+				struct mlx5e_frag_page *frag_page;
 
-		for (i = 0; i < rq->mpwqe.pages_per_wqe; i++)
-			if (no_xdp_xmit || !test_bit(i, wi->xdp_xmit_bitmap))
-				mlx5e_page_release_dynamic(rq, pages[i], recycle);
+				frag_page = &wi->alloc_units.frag_pages[i];
+				mlx5e_page_release_fragmented(rq, frag_page, recycle);
+			}
+		}
 	}
 }
 
@@ -512,10 +527,10 @@ static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq,
 	u16 entries, pi, header_offset, err, wqe_bbs, new_entries;
 	u32 lkey = rq->mdev->mlx5e_res.hw_objs.mkey;
 	u16 page_index = shampo->curr_page_index;
+	struct mlx5e_frag_page *frag_page;
 	u64 addr = shampo->last_addr;
 	struct mlx5e_dma_info *dma_info;
 	struct mlx5e_umr_wqe *umr_wqe;
-	struct page **pagep;
 	int headroom, i;
 
 	headroom = rq->buff.headroom;
@@ -526,7 +541,7 @@ static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq,
 	umr_wqe = mlx5_wq_cyc_get_wqe(&sq->wq, pi);
 	build_klm_umr(sq, umr_wqe, shampo->key, index, entries, wqe_bbs);
 
-	pagep = &shampo->pages[page_index];
+	frag_page = &shampo->pages[page_index];
 
 	for (i = 0; i < entries; i++, index++) {
 		dma_info = &shampo->info[index];
@@ -537,19 +552,19 @@ static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq,
 			MLX5E_SHAMPO_LOG_MAX_HEADER_ENTRY_SIZE;
 		if (!(header_offset & (PAGE_SIZE - 1))) {
 			page_index = (page_index + 1) & (shampo->hd_per_wq - 1);
-			pagep = &shampo->pages[page_index];
+			frag_page = &shampo->pages[page_index];
 
-			err = mlx5e_page_alloc_pool(rq, pagep);
+			err = mlx5e_page_alloc_fragmented(rq, frag_page);
 			if (unlikely(err))
 				goto err_unmap;
 
-			addr = page_pool_get_dma_addr(*pagep);
+			addr = page_pool_get_dma_addr(frag_page->page);
 
 			dma_info->addr = addr;
-			dma_info->pagep = pagep;
+			dma_info->frag_page = frag_page;
 		} else {
 			dma_info->addr = addr + header_offset;
-			dma_info->pagep = pagep;
+			dma_info->frag_page = frag_page;
 		}
 
 update_klm:
@@ -579,7 +594,7 @@ static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq,
 		dma_info = &shampo->info[--index];
 		if (!(i & (MLX5E_SHAMPO_WQ_HEADER_PER_PAGE - 1))) {
 			dma_info->addr = ALIGN_DOWN(dma_info->addr, PAGE_SIZE);
-			mlx5e_page_release_dynamic(rq, *dma_info->pagep, true);
+			mlx5e_page_release_fragmented(rq, dma_info->frag_page, true);
 		}
 	}
 	rq->stats->buff_alloc_err++;
@@ -628,8 +643,8 @@ static int mlx5e_alloc_rx_hd_mpwqe(struct mlx5e_rq *rq)
 static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 {
 	struct mlx5e_mpw_info *wi = mlx5e_get_mpw_info(rq, ix);
-	struct page **pagep = &wi->alloc_units.pages[0];
 	struct mlx5e_icosq *sq = rq->icosq;
+	struct mlx5e_frag_page *frag_page;
 	struct mlx5_wq_cyc *wq = &sq->wq;
 	struct mlx5e_umr_wqe *umr_wqe;
 	u32 offset; /* 17-bit value with MTT. */
@@ -647,13 +662,15 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 	umr_wqe = mlx5_wq_cyc_get_wqe(wq, pi);
 	memcpy(umr_wqe, &rq->mpwqe.umr_wqe, sizeof(struct mlx5e_umr_wqe));
 
-	for (i = 0; i < rq->mpwqe.pages_per_wqe; i++, pagep++) {
+	frag_page = &wi->alloc_units.frag_pages[0];
+
+	for (i = 0; i < rq->mpwqe.pages_per_wqe; i++, frag_page++) {
 		dma_addr_t addr;
 
-		err = mlx5e_page_alloc_pool(rq, pagep);
+		err = mlx5e_page_alloc_fragmented(rq, frag_page);
 		if (unlikely(err))
 			goto err_unmap;
-		addr = page_pool_get_dma_addr(*pagep);
+		addr = page_pool_get_dma_addr(frag_page->page);
 		umr_wqe->inline_mtts[i] = (struct mlx5_mtt) {
 			.ptag = cpu_to_be64(addr | MLX5_EN_WR),
 		};
@@ -694,8 +711,8 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 
 err_unmap:
 	while (--i >= 0) {
-		pagep--;
-		mlx5e_page_release_dynamic(rq, *pagep, true);
+		frag_page--;
+		mlx5e_page_release_fragmented(rq, frag_page, true);
 	}
 
 err:
@@ -713,8 +730,8 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 void mlx5e_shampo_dealloc_hd(struct mlx5e_rq *rq, u16 len, u16 start, bool close)
 {
 	struct mlx5e_shampo_hd *shampo = rq->mpwqe.shampo;
+	struct mlx5e_frag_page *deleted_page = NULL;
 	int hd_per_wq = shampo->hd_per_wq;
-	struct page **deleted_page = NULL;
 	struct mlx5e_dma_info *hd_info;
 	int i, index = start;
 
@@ -727,10 +744,12 @@ void mlx5e_shampo_dealloc_hd(struct mlx5e_rq *rq, u16 len, u16 start, bool close
 
 		hd_info = &shampo->info[index];
 		hd_info->addr = ALIGN_DOWN(hd_info->addr, PAGE_SIZE);
-		if (hd_info->pagep != deleted_page) {
-			deleted_page = hd_info->pagep;
-			mlx5e_page_release_dynamic(rq, *hd_info->pagep, false);
+		if (hd_info->frag_page && hd_info->frag_page != deleted_page) {
+			deleted_page = hd_info->frag_page;
+			mlx5e_page_release_fragmented(rq, hd_info->frag_page, false);
 		}
+
+		hd_info->frag_page = NULL;
 	}
 
 	if (start + len > hd_per_wq) {
@@ -1068,7 +1087,7 @@ static void *mlx5e_shampo_get_packet_hd(struct mlx5e_rq *rq, u16 header_index)
 	struct mlx5e_dma_info *last_head = &rq->mpwqe.shampo->info[header_index];
 	u16 head_offset = (last_head->addr & (PAGE_SIZE - 1)) + rq->buff.headroom;
 
-	return page_address(*last_head->pagep) + head_offset;
+	return page_address(last_head->frag_page->page) + head_offset;
 }
 
 static void mlx5e_shampo_update_ipv4_udp_hdr(struct mlx5e_rq *rq, struct iphdr *ipv4)
@@ -1521,8 +1540,8 @@ static struct sk_buff *
 mlx5e_skb_from_cqe_linear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi,
 			  struct mlx5_cqe64 *cqe, u32 cqe_bcnt)
 {
+	struct mlx5e_frag_page *frag_page = wi->frag_page;
 	u16 rx_headroom = rq->buff.headroom;
-	struct page *page = *wi->pagep;
 	struct bpf_prog *prog;
 	struct sk_buff *skb;
 	u32 metasize = 0;
@@ -1530,11 +1549,11 @@ mlx5e_skb_from_cqe_linear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi,
 	dma_addr_t addr;
 	u32 frag_size;
 
-	va             = page_address(page) + wi->offset;
+	va             = page_address(frag_page->page) + wi->offset;
 	data           = va + rx_headroom;
 	frag_size      = MLX5_SKB_FRAG_SZ(rx_headroom + cqe_bcnt);
 
-	addr = page_pool_get_dma_addr(page);
+	addr = page_pool_get_dma_addr(frag_page->page);
 	dma_sync_single_range_for_cpu(rq->pdev, addr, wi->offset,
 				      frag_size, rq->buff.map_dir);
 	net_prefetch(data);
@@ -1558,7 +1577,8 @@ mlx5e_skb_from_cqe_linear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi,
 		return NULL;
 
 	/* queue up for recycling/reuse */
-	page_ref_inc(page);
+	skb_mark_for_recycle(skb);
+	frag_page->frags++;
 
 	return skb;
 }
@@ -1570,7 +1590,7 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
 	struct mlx5e_rq_frag_info *frag_info = &rq->wqe.info.arr[0];
 	struct mlx5e_wqe_frag_info *head_wi = wi;
 	u16 rx_headroom = rq->buff.headroom;
-	struct page *page = *wi->pagep;
+	struct mlx5e_frag_page *frag_page;
 	struct skb_shared_info *sinfo;
 	struct mlx5e_xdp_buff mxbuf;
 	u32 frag_consumed_bytes;
@@ -1580,10 +1600,12 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
 	u32 truesize;
 	void *va;
 
-	va = page_address(page) + wi->offset;
+	frag_page = wi->frag_page;
+
+	va = page_address(frag_page->page) + wi->offset;
 	frag_consumed_bytes = min_t(u32, frag_info->frag_size, cqe_bcnt);
 
-	addr = page_pool_get_dma_addr(page);
+	addr = page_pool_get_dma_addr(frag_page->page);
 	dma_sync_single_range_for_cpu(rq->pdev, addr, wi->offset,
 				      rq->buff.frame0_sz, rq->buff.map_dir);
 	net_prefetchw(va); /* xdp_frame data area */
@@ -1600,11 +1622,11 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
 	while (cqe_bcnt) {
 		skb_frag_t *frag;
 
-		page = *wi->pagep;
+		frag_page = wi->frag_page;
 
 		frag_consumed_bytes = min_t(u32, frag_info->frag_size, cqe_bcnt);
 
-		addr = page_pool_get_dma_addr(page);
+		addr = page_pool_get_dma_addr(frag_page->page);
 		dma_sync_single_for_cpu(rq->pdev, addr + wi->offset,
 					frag_consumed_bytes, rq->buff.map_dir);
 
@@ -1618,11 +1640,12 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
 		}
 
 		frag = &sinfo->frags[sinfo->nr_frags++];
-		__skb_frag_set_page(frag, page);
+
+		__skb_frag_set_page(frag, frag_page->page);
 		skb_frag_off_set(frag, wi->offset);
 		skb_frag_size_set(frag, frag_consumed_bytes);
 
-		if (page_is_pfmemalloc(page))
+		if (page_is_pfmemalloc(frag_page->page))
 			xdp_buff_set_frag_pfmemalloc(&mxbuf.xdp);
 
 		sinfo->xdp_frags_size += frag_consumed_bytes;
@@ -1651,21 +1674,17 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
 	if (unlikely(!skb))
 		return NULL;
 
-	page_ref_inc(*head_wi->pagep);
+	skb_mark_for_recycle(skb);
+	head_wi->frag_page->frags++;
 
 	if (xdp_buff_has_frags(&mxbuf.xdp)) {
-		int i;
-
 		/* sinfo->nr_frags is reset by build_skb, calculate again. */
 		xdp_update_skb_shared_info(skb, wi - head_wi - 1,
 					   sinfo->xdp_frags_size, truesize,
 					   xdp_buff_is_frag_pfmemalloc(&mxbuf.xdp));
 
-		for (i = 0; i < sinfo->nr_frags; i++) {
-			skb_frag_t *frag = &sinfo->frags[i];
-
-			page_ref_inc(skb_frag_page(frag));
-		}
+		for (struct mlx5e_wqe_frag_info *pwi = head_wi + 1; pwi < wi; pwi++)
+			pwi->frag_page->frags++;
 	}
 
 	return skb;
@@ -1848,7 +1867,8 @@ const struct mlx5e_rx_handlers mlx5e_rx_handlers_rep = {
 
 static void
 mlx5e_fill_skb_data(struct sk_buff *skb, struct mlx5e_rq *rq,
-		    struct page **pagep, u32 data_bcnt, u32 data_offset)
+		    struct mlx5e_frag_page *frag_page,
+		    u32 data_bcnt, u32 data_offset)
 {
 	net_prefetchw(skb->data);
 
@@ -1862,12 +1882,13 @@ mlx5e_fill_skb_data(struct sk_buff *skb, struct mlx5e_rq *rq,
 		else
 			truesize = ALIGN(pg_consumed_bytes, BIT(rq->mpwqe.log_stride_sz));
 
-		mlx5e_add_skb_frag(rq, skb, *pagep, data_offset,
+		frag_page->frags++;
+		mlx5e_add_skb_frag(rq, skb, frag_page->page, data_offset,
 				   pg_consumed_bytes, truesize);
 
 		data_bcnt -= pg_consumed_bytes;
 		data_offset = 0;
-		pagep++;
+		frag_page++;
 	}
 }
 
@@ -1876,11 +1897,11 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 				   struct mlx5_cqe64 *cqe, u16 cqe_bcnt, u32 head_offset,
 				   u32 page_idx)
 {
-	struct page **pagep = &wi->alloc_units.pages[page_idx];
+	struct mlx5e_frag_page *frag_page = &wi->alloc_units.frag_pages[page_idx];
 	u16 headlen = min_t(u16, MLX5E_RX_MAX_HEAD, cqe_bcnt);
+	struct mlx5e_frag_page *head_page = frag_page;
 	u32 frag_offset    = head_offset + headlen;
 	u32 byte_cnt       = cqe_bcnt - headlen;
-	struct page *head_page = *pagep;
 	struct sk_buff *skb;
 	dma_addr_t addr;
 
@@ -1895,14 +1916,15 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 
 	/* Non-linear mode, hence non-XSK, which always uses PAGE_SIZE. */
 	if (unlikely(frag_offset >= PAGE_SIZE)) {
-		pagep++;
+		frag_page++;
 		frag_offset -= PAGE_SIZE;
 	}
 
-	mlx5e_fill_skb_data(skb, rq, pagep, byte_cnt, frag_offset);
+	skb_mark_for_recycle(skb);
+	mlx5e_fill_skb_data(skb, rq, frag_page, byte_cnt, frag_offset);
 	/* copy header */
-	addr = page_pool_get_dma_addr(head_page);
-	mlx5e_copy_skb_header(rq, skb, head_page, addr,
+	addr = page_pool_get_dma_addr(head_page->page);
+	mlx5e_copy_skb_header(rq, skb, head_page->page, addr,
 			      head_offset, head_offset, headlen);
 	/* skb linear part was allocated with headlen and aligned to long */
 	skb->tail += headlen;
@@ -1916,7 +1938,7 @@ mlx5e_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
 				struct mlx5_cqe64 *cqe, u16 cqe_bcnt, u32 head_offset,
 				u32 page_idx)
 {
-	struct page *page = wi->alloc_units.pages[page_idx];
+	struct mlx5e_frag_page *frag_page = &wi->alloc_units.frag_pages[page_idx];
 	u16 rx_headroom = rq->buff.headroom;
 	struct bpf_prog *prog;
 	struct sk_buff *skb;
@@ -1931,11 +1953,11 @@ mlx5e_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
 		return NULL;
 	}
 
-	va             = page_address(page) + head_offset;
+	va             = page_address(frag_page->page) + head_offset;
 	data           = va + rx_headroom;
 	frag_size      = MLX5_SKB_FRAG_SZ(rx_headroom + cqe_bcnt);
 
-	addr = page_pool_get_dma_addr(page);
+	addr = page_pool_get_dma_addr(frag_page->page);
 	dma_sync_single_range_for_cpu(rq->pdev, addr, head_offset,
 				      frag_size, rq->buff.map_dir);
 	net_prefetch(data);
@@ -1962,7 +1984,8 @@ mlx5e_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
 		return NULL;
 
 	/* queue up for recycling/reuse */
-	page_ref_inc(page);
+	skb_mark_for_recycle(skb);
+	frag_page->frags++;
 
 	return skb;
 }
@@ -1979,7 +2002,7 @@ mlx5e_skb_from_cqe_shampo(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
 	void *hdr, *data;
 	u32 frag_size;
 
-	hdr		= page_address(*head->pagep) + head_offset;
+	hdr		= page_address(head->frag_page->page) + head_offset;
 	data		= hdr + rx_headroom;
 	frag_size	= MLX5_SKB_FRAG_SZ(rx_headroom + head_size);
 
@@ -1993,9 +2016,7 @@ mlx5e_skb_from_cqe_shampo(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
 		if (unlikely(!skb))
 			return NULL;
 
-		/* queue up for recycling/reuse */
-		page_ref_inc(*head->pagep);
-
+		head->frag_page->frags++;
 	} else {
 		/* allocate SKB and copy header for large header */
 		rq->stats->gro_large_hds++;
@@ -2007,13 +2028,17 @@ mlx5e_skb_from_cqe_shampo(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
 		}
 
 		prefetchw(skb->data);
-		mlx5e_copy_skb_header(rq, skb, *head->pagep, head->addr,
+		mlx5e_copy_skb_header(rq, skb, head->frag_page->page, head->addr,
 				      head_offset + rx_headroom,
 				      rx_headroom, head_size);
 		/* skb linear part was allocated with headlen and aligned to long */
 		skb->tail += head_size;
 		skb->len  += head_size;
 	}
+
+	/* queue up for recycling/reuse */
+	skb_mark_for_recycle(skb);
+
 	return skb;
 }
 
@@ -2061,7 +2086,7 @@ mlx5e_free_rx_shampo_hd_entry(struct mlx5e_rq *rq, u16 header_index)
 		struct mlx5e_dma_info *dma_info = &shampo->info[header_index];
 
 		dma_info->addr = ALIGN_DOWN(addr, PAGE_SIZE);
-		mlx5e_page_release_dynamic(rq, *dma_info->pagep, true);
+		mlx5e_page_release_fragmented(rq, dma_info->frag_page, true);
 	}
 	bitmap_clear(shampo->bitmap, header_index, 1);
 }
@@ -2131,9 +2156,10 @@ static void mlx5e_handle_rx_cqe_mpwrq_shampo(struct mlx5e_rq *rq, struct mlx5_cq
 	}
 
 	if (likely(head_size)) {
-		struct page **pagep = &wi->alloc_units.pages[page_idx];
+		struct mlx5e_frag_page *frag_page;
 
-		mlx5e_fill_skb_data(*skb, rq, pagep, data_bcnt, data_offset);
+		frag_page = &wi->alloc_units.frag_pages[page_idx];
+		mlx5e_fill_skb_data(*skb, rq, frag_page, data_bcnt, data_offset);
 	}
 
 	mlx5e_shampo_complete_rx_cqe(rq, cqe, cqe_bcnt, *skb);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [net-next 08/15] net/mlx5e: RX, Rename xdp_xmit_bitmap to a more generic name
  2023-03-28 20:56 [pull request][net-next 00/15] mlx5: Drop internal page cache implementation Saeed Mahameed
                   ` (6 preceding siblings ...)
  2023-03-28 20:56 ` [net-next 07/15] net/mlx5e: RX, Enable skb page recycling through the page_pool Saeed Mahameed
@ 2023-03-28 20:56 ` Saeed Mahameed
  2023-03-28 20:56 ` [net-next 09/15] net/mlx5e: RX, Defer page release in striding rq for better recycling Saeed Mahameed
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2023-03-28 20:56 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Jesper Dangaard Brouer,
	Matthew Wilcox, Toke Høiland-Jørgensen,
	Ilias Apalodimas, Dragos Tatulea

From: Dragos Tatulea <dtatulea@nvidia.com>

The xdp_xmit_bitmap currently serves only one purpose: to avoid
releasing pages that are still in use due to XDP TX.

A following patch will use this bitmap in a slightly different context
but for the same purpose. So rename the bitmap to a more generic name
that reflects the purpose, not the context.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h        |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c |  4 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c     | 12 ++++++------
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 0862556105d5..566ddf7a7aa9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -622,7 +622,7 @@ union mlx5e_alloc_units {
 
 struct mlx5e_mpw_info {
 	u16 consumed_strides;
-	DECLARE_BITMAP(xdp_xmit_bitmap, MLX5_MPWRQ_MAX_PAGES_PER_WQE);
+	DECLARE_BITMAP(skip_release_bitmap, MLX5_MPWRQ_MAX_PAGES_PER_WQE);
 	union mlx5e_alloc_units alloc_units;
 };
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
index b2c1af07c317..3cde264cbe4e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
@@ -120,7 +120,7 @@ int mlx5e_xsk_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 		}
 	}
 
-	bitmap_zero(wi->xdp_xmit_bitmap, rq->mpwqe.pages_per_wqe);
+	bitmap_zero(wi->skip_release_bitmap, rq->mpwqe.pages_per_wqe);
 	wi->consumed_strides = 0;
 
 	umr_wqe->ctrl.opmod_idx_opcode =
@@ -289,7 +289,7 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
 	prog = rcu_dereference(rq->xdp_prog);
 	if (likely(prog && mlx5e_xdp_handle(rq, prog, mxbuf))) {
 		if (likely(__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)))
-			__set_bit(page_idx, wi->xdp_xmit_bitmap); /* non-atomic */
+			__set_bit(page_idx, wi->skip_release_bitmap); /* non-atomic */
 		return NULL; /* page/packet was consumed by XDP */
 	}
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 7724e30ec133..eab8cba33ce4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -439,10 +439,10 @@ mlx5e_free_rx_mpwqe(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi, bool recycle
 	int i;
 
 	/* A common case for AF_XDP. */
-	if (bitmap_full(wi->xdp_xmit_bitmap, rq->mpwqe.pages_per_wqe))
+	if (bitmap_full(wi->skip_release_bitmap, rq->mpwqe.pages_per_wqe))
 		return;
 
-	no_xdp_xmit = bitmap_empty(wi->xdp_xmit_bitmap, rq->mpwqe.pages_per_wqe);
+	no_xdp_xmit = bitmap_empty(wi->skip_release_bitmap, rq->mpwqe.pages_per_wqe);
 
 	if (rq->xsk_pool) {
 		struct xdp_buff **xsk_buffs = wi->alloc_units.xsk_buffs;
@@ -452,11 +452,11 @@ mlx5e_free_rx_mpwqe(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi, bool recycle
 		 * the page to the userspace when the interface goes down.
 		 */
 		for (i = 0; i < rq->mpwqe.pages_per_wqe; i++)
-			if (no_xdp_xmit || !test_bit(i, wi->xdp_xmit_bitmap))
+			if (no_xdp_xmit || !test_bit(i, wi->skip_release_bitmap))
 				xsk_buff_free(xsk_buffs[i]);
 	} else {
 		for (i = 0; i < rq->mpwqe.pages_per_wqe; i++) {
-			if (no_xdp_xmit || !test_bit(i, wi->xdp_xmit_bitmap)) {
+			if (no_xdp_xmit || !test_bit(i, wi->skip_release_bitmap)) {
 				struct mlx5e_frag_page *frag_page;
 
 				frag_page = &wi->alloc_units.frag_pages[i];
@@ -687,7 +687,7 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 		       sizeof(*umr_wqe->inline_mtts) * pad);
 	}
 
-	bitmap_zero(wi->xdp_xmit_bitmap, rq->mpwqe.pages_per_wqe);
+	bitmap_zero(wi->skip_release_bitmap, rq->mpwqe.pages_per_wqe);
 	wi->consumed_strides = 0;
 
 	umr_wqe->ctrl.opmod_idx_opcode =
@@ -1970,7 +1970,7 @@ mlx5e_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
 		mlx5e_fill_mxbuf(rq, cqe, va, rx_headroom, cqe_bcnt, &mxbuf);
 		if (mlx5e_xdp_handle(rq, prog, &mxbuf)) {
 			if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags))
-				__set_bit(page_idx, wi->xdp_xmit_bitmap); /* non-atomic */
+				__set_bit(page_idx, wi->skip_release_bitmap); /* non-atomic */
 			return NULL; /* page/packet was consumed by XDP */
 		}
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [net-next 09/15] net/mlx5e: RX, Defer page release in striding rq for better recycling
  2023-03-28 20:56 [pull request][net-next 00/15] mlx5: Drop internal page cache implementation Saeed Mahameed
                   ` (7 preceding siblings ...)
  2023-03-28 20:56 ` [net-next 08/15] net/mlx5e: RX, Rename xdp_xmit_bitmap to a more generic name Saeed Mahameed
@ 2023-03-28 20:56 ` Saeed Mahameed
  2023-03-28 20:56 ` [net-next 10/15] net/mlx5e: RX, Change wqe last_in_page field from bool to bit flags Saeed Mahameed
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2023-03-28 20:56 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Jesper Dangaard Brouer,
	Matthew Wilcox, Toke Høiland-Jørgensen,
	Ilias Apalodimas, Dragos Tatulea

From: Dragos Tatulea <dtatulea@nvidia.com>

Currently, for striding RQ, fragmented pages from the page pool can
get released in two ways:

1) In the mlx5e driver when trimming off the unused fragments AND the
   associated skb fragments have been released. This path allows
   recycling of pages to the page pool cache (allow_direct == true).

2) On the skb release path (last fragment release), which
   will always release pages to the page pool ring
   (allow_direct == false).

Whichever path releases the last fragment decides where the page
gets released: the cache or the ring. So we obviously want to
maximize releases through path 1).

This patch does that by deferring the release of page fragments
right before requesting new ones from the page pool. Extra care
needs to be taken for the corner cases:

* On first call, make sure that release is not called. The
  skip_release_bitmap is used for this purpose.

* On rq shutdown, make sure that all wqes that were not
  in the linked list are released.

For a single-ring, single-core, default-MTU (1500) TCP stream
test, the share of pages recycled directly to the cache
(rx_pp_recycle_cached) increases from 31 % to 98 %:

+----------------------------------------------+
| Page Pool stats (/sec)  |  Before |   After  |
+-------------------------+---------+----------+
|rx_pp_alloc_fast         | 2137754 |  2261033 |
|rx_pp_alloc_slow         |      47 |        9 |
|rx_pp_alloc_empty        |      47 |        9 |
|rx_pp_alloc_refill       |   23230 |      819 |
|rx_pp_alloc_waive        |       0 |        0 |
|rx_pp_recycle_cached     |  672182 |  2209015 |
|rx_pp_recycle_cache_full |    1789 |        0 |
|rx_pp_recycle_ring       | 1485848 |    52259 |
|rx_pp_recycle_ring_full  |    3003 |      584 |
+----------------------------------------------+

With this patch, the performance in striding rq for the above test is
back to baseline.
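
Condensed from the en_rx.c hunk below, the refill loop now looks
roughly like this (illustrative only, error handling trimmed):

  /* mlx5e_post_rx_mpwqes(): release the old buffers right before
   * reusing the slot, from NAPI context, so the page_pool can recycle
   * into its direct cache (recycle/allow_direct == true).
   */
  do {
          struct mlx5e_mpw_info *wi = mlx5e_get_mpw_info(rq, head);

          mlx5e_free_rx_mpwqe(rq, wi, true); /* deferred, cache-friendly */

          alloc_err = rq->xsk_pool ? mlx5e_xsk_alloc_rx_mpwqe(rq, head) :
                                     mlx5e_alloc_rx_mpwqe(rq, head);
          if (unlikely(alloc_err))
                  break;

          head = mlx5_wq_ll_get_wqe_next_ix(&rq->mpwqe.wq, head);
  } while (--i);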

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../mellanox/mlx5/core/en/reporter_rx.c       |  4 ++--
 .../net/ethernet/mellanox/mlx5/core/en/txrx.h |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 21 +++++++++++++++----
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   |  8 ++++---
 4 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
index b621f735cdc3..a047a2a4ddac 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
@@ -121,9 +121,9 @@ static int mlx5e_rx_reporter_err_icosq_cqe_recover(void *ctx)
 
 	mlx5e_reset_icosq_cc_pc(icosq);
 
-	mlx5e_free_rx_in_progress_descs(rq);
+	mlx5e_free_rx_missing_descs(rq);
 	if (xskrq)
-		mlx5e_free_rx_in_progress_descs(xskrq);
+		mlx5e_free_rx_missing_descs(xskrq);
 
 	clear_bit(MLX5E_SQ_STATE_RECOVERING, &icosq->state);
 	mlx5e_activate_icosq(icosq);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
index cd7779a9d046..651be7aaf7d5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
@@ -69,7 +69,7 @@ INDIRECT_CALLABLE_DECLARE(bool mlx5e_post_rx_wqes(struct mlx5e_rq *rq));
 INDIRECT_CALLABLE_DECLARE(bool mlx5e_post_rx_mpwqes(struct mlx5e_rq *rq));
 int mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget);
 void mlx5e_free_rx_descs(struct mlx5e_rq *rq);
-void mlx5e_free_rx_in_progress_descs(struct mlx5e_rq *rq);
+void mlx5e_free_rx_missing_descs(struct mlx5e_rq *rq);
 
 static inline bool mlx5e_rx_hw_stamp(struct hwtstamp_config *config)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index eca9a11454e5..53eef689f225 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -301,6 +301,15 @@ static int mlx5e_rq_alloc_mpwqe_info(struct mlx5e_rq *rq, int node)
 	if (!rq->mpwqe.info)
 		return -ENOMEM;
 
+	/* For deferred page release (release right before alloc), make sure
+	 * that on first round release is not called.
+	 */
+	for (int i = 0; i < wq_sz; i++) {
+		struct mlx5e_mpw_info *wi = mlx5e_get_mpw_info(rq, i);
+
+		bitmap_fill(wi->skip_release_bitmap, rq->mpwqe.pages_per_wqe);
+	}
+
 	mlx5e_build_umr_wqe(rq, rq->icosq, &rq->mpwqe.umr_wqe);
 
 	return 0;
@@ -1112,7 +1121,7 @@ int mlx5e_wait_for_min_rx_wqes(struct mlx5e_rq *rq, int wait_time)
 	return -ETIMEDOUT;
 }
 
-void mlx5e_free_rx_in_progress_descs(struct mlx5e_rq *rq)
+void mlx5e_free_rx_missing_descs(struct mlx5e_rq *rq)
 {
 	struct mlx5_wq_ll *wq;
 	u16 head;
@@ -1124,8 +1133,12 @@ void mlx5e_free_rx_in_progress_descs(struct mlx5e_rq *rq)
 	wq = &rq->mpwqe.wq;
 	head = wq->head;
 
-	/* Outstanding UMR WQEs (in progress) start at wq->head */
-	for (i = 0; i < rq->mpwqe.umr_in_progress; i++) {
+	/* Release WQEs that are in missing state: they have been
+	 * popped from the list after completion but were not freed
+	 * due to deferred release.
+	 * Also free the linked-list reserved entry, hence the "+ 1".
+	 */
+	for (i = 0; i < mlx5_wq_ll_missing(wq) + 1; i++) {
 		rq->dealloc_wqe(rq, head);
 		head = mlx5_wq_ll_get_wqe_next_ix(wq, head);
 	}
@@ -1152,7 +1165,7 @@ void mlx5e_free_rx_descs(struct mlx5e_rq *rq)
 	if (rq->wq_type == MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ) {
 		struct mlx5_wq_ll *wq = &rq->mpwqe.wq;
 
-		mlx5e_free_rx_in_progress_descs(rq);
+		mlx5e_free_rx_missing_descs(rq);
 
 		while (!mlx5_wq_ll_is_empty(wq)) {
 			struct mlx5e_rx_wqe_ll *wqe;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index eab8cba33ce4..73bc373bf27d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -983,6 +983,11 @@ INDIRECT_CALLABLE_SCOPE bool mlx5e_post_rx_mpwqes(struct mlx5e_rq *rq)
 	head = rq->mpwqe.actual_wq_head;
 	i = missing;
 	do {
+		struct mlx5e_mpw_info *wi = mlx5e_get_mpw_info(rq, head);
+
+		/* Deferred free for better page pool cache usage. */
+		mlx5e_free_rx_mpwqe(rq, wi, true);
+
 		alloc_err = rq->xsk_pool ? mlx5e_xsk_alloc_rx_mpwqe(rq, head) :
 					   mlx5e_alloc_rx_mpwqe(rq, head);
 
@@ -1855,7 +1860,6 @@ static void mlx5e_handle_rx_cqe_mpwrq_rep(struct mlx5e_rq *rq, struct mlx5_cqe64
 
 	wq  = &rq->mpwqe.wq;
 	wqe = mlx5_wq_ll_get_wqe(wq, wqe_id);
-	mlx5e_free_rx_mpwqe(rq, wi, true);
 	mlx5_wq_ll_pop(wq, cqe->wqe_id, &wqe->next.next_wqe_index);
 }
 
@@ -2173,7 +2177,6 @@ static void mlx5e_handle_rx_cqe_mpwrq_shampo(struct mlx5e_rq *rq, struct mlx5_cq
 
 	wq  = &rq->mpwqe.wq;
 	wqe = mlx5_wq_ll_get_wqe(wq, wqe_id);
-	mlx5e_free_rx_mpwqe(rq, wi, true);
 	mlx5_wq_ll_pop(wq, cqe->wqe_id, &wqe->next.next_wqe_index);
 }
 
@@ -2233,7 +2236,6 @@ static void mlx5e_handle_rx_cqe_mpwrq(struct mlx5e_rq *rq, struct mlx5_cqe64 *cq
 
 	wq  = &rq->mpwqe.wq;
 	wqe = mlx5_wq_ll_get_wqe(wq, wqe_id);
-	mlx5e_free_rx_mpwqe(rq, wi, true);
 	mlx5_wq_ll_pop(wq, cqe->wqe_id, &wqe->next.next_wqe_index);
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [net-next 10/15] net/mlx5e: RX, Change wqe last_in_page field from bool to bit flags
  2023-03-28 20:56 [pull request][net-next 00/15] mlx5: Drop internal page cache implementation Saeed Mahameed
                   ` (8 preceding siblings ...)
  2023-03-28 20:56 ` [net-next 09/15] net/mlx5e: RX, Defer page release in striding rq for better recycling Saeed Mahameed
@ 2023-03-28 20:56 ` Saeed Mahameed
  2023-03-28 20:56 ` [net-next 11/15] net/mlx5e: RX, Defer page release in legacy rq for better recycling Saeed Mahameed
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2023-03-28 20:56 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Jesper Dangaard Brouer,
	Matthew Wilcox, Toke Høiland-Jørgensen,
	Ilias Apalodimas, Dragos Tatulea

From: Dragos Tatulea <dtatulea@nvidia.com>

Change the bool flag to a bit flags field, as a downstream patch in
the series will use it to also signal skipping a fragment release.
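
In sketch form (matching the hunks below), the bool becomes one bit of
a small flags field, leaving room for the flag added later in the
series:

  enum mlx5e_wqe_frag_flag {
          MLX5E_WQE_FRAG_LAST_IN_PAGE,
          /* more flags follow in later patches */
  };

  /* before:  prev->last_in_page = true;
   * after:
   */
  prev->flags |= BIT(MLX5E_WQE_FRAG_LAST_IN_PAGE);

  /* before:  if (frag->last_in_page)
   * after:
   */
  if (frag->flags & BIT(MLX5E_WQE_FRAG_LAST_IN_PAGE))
          mlx5e_page_release_fragmented(rq, frag->frag_page, recycle);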

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h      | 6 +++++-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 4 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   | 2 +-
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 566ddf7a7aa9..9ef4b7163e5a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -605,13 +605,17 @@ struct mlx5e_frag_page {
 	u16 frags;
 };
 
+enum mlx5e_wqe_frag_flag {
+	MLX5E_WQE_FRAG_LAST_IN_PAGE,
+};
+
 struct mlx5e_wqe_frag_info {
 	union {
 		struct mlx5e_frag_page *frag_page;
 		struct xdp_buff **xskp;
 	};
 	u32 offset;
-	bool last_in_page;
+	u8 flags;
 };
 
 union mlx5e_alloc_units {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 53eef689f225..bb1cbf008876 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -532,7 +532,7 @@ static void mlx5e_init_frags_partition(struct mlx5e_rq *rq)
 				next_frag.frag_page++;
 				next_frag.offset = 0;
 				if (prev)
-					prev->last_in_page = true;
+					prev->flags |= BIT(MLX5E_WQE_FRAG_LAST_IN_PAGE);
 			}
 			*frag = next_frag;
 
@@ -543,7 +543,7 @@ static void mlx5e_init_frags_partition(struct mlx5e_rq *rq)
 	}
 
 	if (prev)
-		prev->last_in_page = true;
+		prev->flags |= BIT(MLX5E_WQE_FRAG_LAST_IN_PAGE);
 }
 
 static void mlx5e_init_xsk_buffs(struct mlx5e_rq *rq)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 73bc373bf27d..f98212596c1e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -323,7 +323,7 @@ static inline void mlx5e_put_rx_frag(struct mlx5e_rq *rq,
 				     struct mlx5e_wqe_frag_info *frag,
 				     bool recycle)
 {
-	if (frag->last_in_page)
+	if (frag->flags & BIT(MLX5E_WQE_FRAG_LAST_IN_PAGE))
 		mlx5e_page_release_fragmented(rq, frag->frag_page, recycle);
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [net-next 11/15] net/mlx5e: RX, Defer page release in legacy rq for better recycling
  2023-03-28 20:56 [pull request][net-next 00/15] mlx5: Drop internal page cache implementation Saeed Mahameed
                   ` (9 preceding siblings ...)
  2023-03-28 20:56 ` [net-next 10/15] net/mlx5e: RX, Change wqe last_in_page field from bool to bit flags Saeed Mahameed
@ 2023-03-28 20:56 ` Saeed Mahameed
  2023-03-28 20:56 ` [net-next 12/15] net/mlx5e: RX, Split off release path for xsk buffers for legacy rq Saeed Mahameed
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2023-03-28 20:56 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Jesper Dangaard Brouer,
	Matthew Wilcox, Toke Høiland-Jørgensen,
	Ilias Apalodimas, Dragos Tatulea

From: Dragos Tatulea <dtatulea@nvidia.com>

Currently, fragmented pages from the page pool can be released
in two ways:

1) In the mlx5e driver when trimming off the unused fragments AND the
   associated skb fragments have been released. This path allows
   recycling of pages to the page pool cache (allow_direct == true).

2) On the skb release path (last fragment release), which
   will always release pages to the page pool ring
   (allow_direct == false).

Whichever path releases the last fragment decides where the page
gets released: the cache or the ring. So we obviously want to
maximize releases through path 1).

This patch does that by deferring the release of page fragments
right before requesting new ones from the page pool. A flag is
added to make sure that there is no release before the first
allocation and that XDP_TX fragments are not released prematurely.

This is a preparation patch that doesn't unlock the performance
improvements yet. A follow-up patch will do that.
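
A condensed view of the new flag's lifecycle, as implemented in the
hunks below (illustrative only):

  /* init and rq flush: nothing has been allocated yet */
  frag->flags = BIT(MLX5E_WQE_FRAG_SKIP_RELEASE);

  /* successful alloc: the slot owns a page, deferred release is fine */
  frag->flags &= ~BIT(MLX5E_WQE_FRAG_SKIP_RELEASE);

  /* XDP_TX: the page is returned on XDP completion, not on this path */
  wi->flags |= BIT(MLX5E_WQE_FRAG_SKIP_RELEASE);

  /* deferred release, right before the slot is refilled */
  if (!(frag->flags & BIT(MLX5E_WQE_FRAG_SKIP_RELEASE)) &&
       (frag->flags & BIT(MLX5E_WQE_FRAG_LAST_IN_PAGE)))
          mlx5e_page_release_fragmented(rq, frag->frag_page, recycle);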

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  1 +
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.c   |  2 +
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 20 +++++-
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 66 ++++++++++++-------
 4 files changed, 66 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 9ef4b7163e5a..613a7daf9595 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -607,6 +607,7 @@ struct mlx5e_frag_page {
 
 enum mlx5e_wqe_frag_flag {
 	MLX5E_WQE_FRAG_LAST_IN_PAGE,
+	MLX5E_WQE_FRAG_SKIP_RELEASE,
 };
 
 struct mlx5e_wqe_frag_info {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
index 3cde264cbe4e..d97e6df66f45 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
@@ -189,6 +189,7 @@ int mlx5e_xsk_alloc_rx_wqes_batched(struct mlx5e_rq *rq, u16 ix, int wqe_bulk)
 
 		addr = xsk_buff_xdp_get_frame_dma(*frag->xskp);
 		wqe->data[0].addr = cpu_to_be64(addr + rq->buff.headroom);
+		frag->flags &= ~BIT(MLX5E_WQE_FRAG_SKIP_RELEASE);
 	}
 
 	return alloc;
@@ -215,6 +216,7 @@ int mlx5e_xsk_alloc_rx_wqes(struct mlx5e_rq *rq, u16 ix, int wqe_bulk)
 
 		addr = xsk_buff_xdp_get_frame_dma(*frag->xskp);
 		wqe->data[0].addr = cpu_to_be64(addr + rq->buff.headroom);
+		frag->flags &= ~BIT(MLX5E_WQE_FRAG_SKIP_RELEASE);
 	}
 
 	return wqe_bulk;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index bb1cbf008876..3cf7d2b59037 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -520,6 +520,9 @@ static void mlx5e_init_frags_partition(struct mlx5e_rq *rq)
 
 	next_frag.frag_page = &rq->wqe.alloc_units->frag_pages[0];
 
+	/* Skip first release due to deferred release. */
+	next_frag.flags = BIT(MLX5E_WQE_FRAG_SKIP_RELEASE);
+
 	for (i = 0; i < mlx5_wq_cyc_get_size(&rq->wqe.wq); i++) {
 		struct mlx5e_rq_frag_info *frag_info = &rq->wqe.info.arr[0];
 		struct mlx5e_wqe_frag_info *frag =
@@ -558,8 +561,14 @@ static void mlx5e_init_xsk_buffs(struct mlx5e_rq *rq)
 	/* Considering the above assumptions a fragment maps to a single
 	 * xsk_buff.
 	 */
-	for (i = 0; i < mlx5_wq_cyc_get_size(&rq->wqe.wq); i++)
+	for (i = 0; i < mlx5_wq_cyc_get_size(&rq->wqe.wq); i++) {
 		rq->wqe.frags[i].xskp = &rq->wqe.alloc_units->xsk_buffs[i];
+
+		/* Skip first release due to deferred release as WQES are
+		 * not allocated yet.
+		 */
+		rq->wqe.frags[i].flags |= BIT(MLX5E_WQE_FRAG_SKIP_RELEASE);
+	}
 }
 
 static int mlx5e_init_wqe_alloc_info(struct mlx5e_rq *rq, int node)
@@ -1183,12 +1192,21 @@ void mlx5e_free_rx_descs(struct mlx5e_rq *rq)
 						0, true);
 	} else {
 		struct mlx5_wq_cyc *wq = &rq->wqe.wq;
+		u16 missing = mlx5_wq_cyc_missing(wq);
+		u16 head = mlx5_wq_cyc_get_head(wq);
 
 		while (!mlx5_wq_cyc_is_empty(wq)) {
 			wqe_ix = mlx5_wq_cyc_get_tail(wq);
 			rq->dealloc_wqe(rq, wqe_ix);
 			mlx5_wq_cyc_pop(wq);
 		}
+		/* Missing slots might also contain unreleased pages due to
+		 * deferred release.
+		 */
+		while (missing--) {
+			wqe_ix = mlx5_wq_cyc_ctr2ix(wq, head++);
+			rq->dealloc_wqe(rq, wqe_ix);
+		}
 	}
 
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index f98212596c1e..0675ffa5f8df 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -319,11 +319,21 @@ static inline int mlx5e_get_rx_frag(struct mlx5e_rq *rq,
 	return err;
 }
 
+static bool mlx5e_frag_can_release(struct mlx5e_wqe_frag_info *frag)
+{
+#define CAN_RELEASE_MASK \
+	(BIT(MLX5E_WQE_FRAG_LAST_IN_PAGE) | BIT(MLX5E_WQE_FRAG_SKIP_RELEASE))
+
+#define CAN_RELEASE_VALUE BIT(MLX5E_WQE_FRAG_LAST_IN_PAGE)
+
+	return (frag->flags & CAN_RELEASE_MASK) == CAN_RELEASE_VALUE;
+}
+
 static inline void mlx5e_put_rx_frag(struct mlx5e_rq *rq,
 				     struct mlx5e_wqe_frag_info *frag,
 				     bool recycle)
 {
-	if (frag->flags & BIT(MLX5E_WQE_FRAG_LAST_IN_PAGE))
+	if (mlx5e_frag_can_release(frag))
 		mlx5e_page_release_fragmented(rq, frag->frag_page, recycle);
 }
 
@@ -347,6 +357,8 @@ static int mlx5e_alloc_rx_wqe(struct mlx5e_rq *rq, struct mlx5e_rx_wqe_cyc *wqe,
 		if (unlikely(err))
 			goto free_frags;
 
+		frag->flags &= ~BIT(MLX5E_WQE_FRAG_SKIP_RELEASE);
+
 		headroom = i == 0 ? rq->buff.headroom : 0;
 		addr = page_pool_get_dma_addr(frag->frag_page->page);
 		wqe->data[i].addr = cpu_to_be64(addr + frag->offset + headroom);
@@ -367,7 +379,7 @@ static inline void mlx5e_free_rx_wqe(struct mlx5e_rq *rq,
 {
 	int i;
 
-	if (rq->xsk_pool) {
+	if (rq->xsk_pool && !(wi->flags & BIT(MLX5E_WQE_FRAG_SKIP_RELEASE))) {
 		/* The `recycle` parameter is ignored, and the page is always
 		 * put into the Reuse Ring, because there is no way to return
 		 * the page to the userspace when the interface goes down.
@@ -387,6 +399,20 @@ static void mlx5e_dealloc_rx_wqe(struct mlx5e_rq *rq, u16 ix)
 	mlx5e_free_rx_wqe(rq, wi, false);
 }
 
+static void mlx5e_free_rx_wqes(struct mlx5e_rq *rq, u16 ix, int wqe_bulk)
+{
+	struct mlx5_wq_cyc *wq = &rq->wqe.wq;
+	int i;
+
+	for (i = 0; i < wqe_bulk; i++) {
+		int j = mlx5_wq_cyc_ctr2ix(wq, ix + i);
+		struct mlx5e_wqe_frag_info *wi;
+
+		wi = get_frag(rq, j);
+		mlx5e_free_rx_wqe(rq, wi, true);
+	}
+}
+
 static int mlx5e_alloc_rx_wqes(struct mlx5e_rq *rq, u16 ix, int wqe_bulk)
 {
 	struct mlx5_wq_cyc *wq = &rq->wqe.wq;
@@ -792,6 +818,8 @@ INDIRECT_CALLABLE_SCOPE bool mlx5e_post_rx_wqes(struct mlx5e_rq *rq)
 	 */
 	wqe_bulk -= (head + wqe_bulk) & rq->wqe.info.wqe_index_mask;
 
+	mlx5e_free_rx_wqes(rq, head, wqe_bulk);
+
 	if (!rq->xsk_pool)
 		count = mlx5e_alloc_rx_wqes(rq, head, wqe_bulk);
 	else if (likely(!rq->xsk_pool->dma_need_sync))
@@ -1727,7 +1755,7 @@ static void mlx5e_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 
 	if (unlikely(MLX5E_RX_ERR_CQE(cqe))) {
 		mlx5e_handle_rx_err_cqe(rq, cqe);
-		goto free_wqe;
+		goto wq_cyc_pop;
 	}
 
 	skb = INDIRECT_CALL_3(rq->wqe.skb_from_cqe,
@@ -1741,9 +1769,9 @@ static void mlx5e_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 			/* do not return page to cache,
 			 * it will be returned on XDP_TX completion.
 			 */
-			goto wq_cyc_pop;
+			wi->flags |= BIT(MLX5E_WQE_FRAG_SKIP_RELEASE);
 		}
-		goto free_wqe;
+		goto wq_cyc_pop;
 	}
 
 	mlx5e_complete_rx_cqe(rq, cqe, cqe_bcnt, skb);
@@ -1751,13 +1779,11 @@ static void mlx5e_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 	if (mlx5e_cqe_regb_chain(cqe))
 		if (!mlx5e_tc_update_skb_nic(cqe, skb)) {
 			dev_kfree_skb_any(skb);
-			goto free_wqe;
+			goto wq_cyc_pop;
 		}
 
 	napi_gro_receive(rq->cq.napi, skb);
 
-free_wqe:
-	mlx5e_free_rx_wqe(rq, wi, true);
 wq_cyc_pop:
 	mlx5_wq_cyc_pop(wq);
 }
@@ -1781,7 +1807,7 @@ static void mlx5e_handle_rx_cqe_rep(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 
 	if (unlikely(MLX5E_RX_ERR_CQE(cqe))) {
 		mlx5e_handle_rx_err_cqe(rq, cqe);
-		goto free_wqe;
+		goto wq_cyc_pop;
 	}
 
 	skb = INDIRECT_CALL_2(rq->wqe.skb_from_cqe,
@@ -1794,9 +1820,9 @@ static void mlx5e_handle_rx_cqe_rep(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 			/* do not return page to cache,
 			 * it will be returned on XDP_TX completion.
 			 */
-			goto wq_cyc_pop;
+			wi->flags |= BIT(MLX5E_WQE_FRAG_SKIP_RELEASE);
 		}
-		goto free_wqe;
+		goto wq_cyc_pop;
 	}
 
 	mlx5e_complete_rx_cqe(rq, cqe, cqe_bcnt, skb);
@@ -1806,8 +1832,6 @@ static void mlx5e_handle_rx_cqe_rep(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 
 	mlx5e_rep_tc_receive(cqe, rq, skb);
 
-free_wqe:
-	mlx5e_free_rx_wqe(rq, wi, true);
 wq_cyc_pop:
 	mlx5_wq_cyc_pop(wq);
 }
@@ -2454,7 +2478,7 @@ static void mlx5i_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 
 	if (unlikely(MLX5E_RX_ERR_CQE(cqe))) {
 		rq->stats->wqe_err++;
-		goto wq_free_wqe;
+		goto wq_cyc_pop;
 	}
 
 	skb = INDIRECT_CALL_2(rq->wqe.skb_from_cqe,
@@ -2462,17 +2486,16 @@ static void mlx5i_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 			      mlx5e_skb_from_cqe_nonlinear,
 			      rq, wi, cqe, cqe_bcnt);
 	if (!skb)
-		goto wq_free_wqe;
+		goto wq_cyc_pop;
 
 	mlx5i_complete_rx_cqe(rq, cqe, cqe_bcnt, skb);
 	if (unlikely(!skb->dev)) {
 		dev_kfree_skb_any(skb);
-		goto wq_free_wqe;
+		goto wq_cyc_pop;
 	}
 	napi_gro_receive(rq->cq.napi, skb);
 
-wq_free_wqe:
-	mlx5e_free_rx_wqe(rq, wi, true);
+wq_cyc_pop:
 	mlx5_wq_cyc_pop(wq);
 }
 
@@ -2547,12 +2570,12 @@ static void mlx5e_trap_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe
 
 	if (unlikely(MLX5E_RX_ERR_CQE(cqe))) {
 		rq->stats->wqe_err++;
-		goto free_wqe;
+		goto wq_cyc_pop;
 	}
 
 	skb = mlx5e_skb_from_cqe_nonlinear(rq, wi, cqe, cqe_bcnt);
 	if (!skb)
-		goto free_wqe;
+		goto wq_cyc_pop;
 
 	mlx5e_complete_rx_cqe(rq, cqe, cqe_bcnt, skb);
 	skb_push(skb, ETH_HLEN);
@@ -2561,8 +2584,7 @@ static void mlx5e_trap_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe
 				 rq->netdev->devlink_port);
 	dev_kfree_skb_any(skb);
 
-free_wqe:
-	mlx5e_free_rx_wqe(rq, wi, false);
+wq_cyc_pop:
 	mlx5_wq_cyc_pop(wq);
 }
 
-- 
2.39.2
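
A note on the release gate added above: mlx5e_frag_can_release() reduces to a
two-bit check, i.e. a fragment may be released only when it is the last one in
its page and is not flagged for skip-release. A minimal standalone sketch of
that check (bit positions and the main() driver are hypothetical; only the
mask logic mirrors the patch):

#include <stdbool.h>
#include <stdio.h>

#define BIT(n) (1U << (n))

/* Hypothetical bit positions, for illustration only. */
enum {
        WQE_FRAG_LAST_IN_PAGE = 0,
        WQE_FRAG_SKIP_RELEASE = 1,
};

static bool frag_can_release(unsigned int flags)
{
        const unsigned int mask  = BIT(WQE_FRAG_LAST_IN_PAGE) |
                                   BIT(WQE_FRAG_SKIP_RELEASE);
        const unsigned int value = BIT(WQE_FRAG_LAST_IN_PAGE);

        /* Release only if LAST_IN_PAGE is set and SKIP_RELEASE is clear. */
        return (flags & mask) == value;
}

int main(void)
{
        printf("%d\n", frag_can_release(BIT(WQE_FRAG_LAST_IN_PAGE)));   /* 1 */
        printf("%d\n", frag_can_release(BIT(WQE_FRAG_LAST_IN_PAGE) |
                                        BIT(WQE_FRAG_SKIP_RELEASE)));   /* 0 */
        printf("%d\n", frag_can_release(0));                            /* 0 */
        return 0;
}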



* [net-next 12/15] net/mlx5e: RX, Split off release path for xsk buffers for legacy rq
  2023-03-28 20:56 [pull request][net-next 00/15] mlx5: Drop internal page cache implementation Saeed Mahameed
                   ` (10 preceding siblings ...)
  2023-03-28 20:56 ` [net-next 11/15] net/mlx5e: RX, Defer page release in legacy rq for better recycling Saeed Mahameed
@ 2023-03-28 20:56 ` Saeed Mahameed
  2023-03-28 20:56 ` [net-next 13/15] net/mlx5e: RX, Increase WQE bulk size " Saeed Mahameed
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2023-03-28 20:56 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Jesper Dangaard Brouer,
	Matthew Wilcox, Toke Høiland-Jørgensen,
	Ilias Apalodimas, Dragos Tatulea

From: Dragos Tatulea <dtatulea@nvidia.com>

Don't mix xsk buffer releases with page releases anymore. This split is
needed for handling deferred page release.

Add a new bulk free function for xsk buffers from wqe frags.
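
Both the existing page bulk free and the new xsk bulk free walk the legacy
rq's cyclic work queue, so slot indices wrap at the ring size. A minimal
sketch of that iteration pattern, assuming a power-of-two ring (the names and
sizes here are illustrative, not the driver's API):

#include <stdio.h>

#define RING_SIZE 8u    /* assumed power-of-two ring size */

/* Wrap a running counter into [0, RING_SIZE) with a size-minus-one mask. */
static unsigned int ctr2ix(unsigned int ctr)
{
        return ctr & (RING_SIZE - 1);
}

int main(void)
{
        unsigned int head = 6, wqe_bulk = 4, i;

        /* A bulk that crosses the end of the ring visits slots 6, 7, 0, 1. */
        for (i = 0; i < wqe_bulk; i++)
                printf("slot %u\n", ctr2ix(head + i));
        return 0;
}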

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 50 +++++++++++++------
 1 file changed, 35 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 0675ffa5f8df..9c5270eb9dc6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -379,24 +379,42 @@ static inline void mlx5e_free_rx_wqe(struct mlx5e_rq *rq,
 {
 	int i;
 
-	if (rq->xsk_pool && !(wi->flags & BIT(MLX5E_WQE_FRAG_SKIP_RELEASE))) {
-		/* The `recycle` parameter is ignored, and the page is always
-		 * put into the Reuse Ring, because there is no way to return
-		 * the page to the userspace when the interface goes down.
-		 */
-		xsk_buff_free(*wi->xskp);
-		return;
-	}
-
 	for (i = 0; i < rq->wqe.info.num_frags; i++, wi++)
 		mlx5e_put_rx_frag(rq, wi, recycle);
 }
 
+static void mlx5e_xsk_free_rx_wqe(struct mlx5e_wqe_frag_info *wi)
+{
+	if (!(wi->flags & BIT(MLX5E_WQE_FRAG_SKIP_RELEASE)))
+		xsk_buff_free(*wi->xskp);
+}
+
 static void mlx5e_dealloc_rx_wqe(struct mlx5e_rq *rq, u16 ix)
 {
 	struct mlx5e_wqe_frag_info *wi = get_frag(rq, ix);
 
-	mlx5e_free_rx_wqe(rq, wi, false);
+	if (rq->xsk_pool)
+		mlx5e_xsk_free_rx_wqe(wi);
+	else
+		mlx5e_free_rx_wqe(rq, wi, false);
+}
+
+static void mlx5e_xsk_free_rx_wqes(struct mlx5e_rq *rq, u16 ix, int wqe_bulk)
+{
+	struct mlx5_wq_cyc *wq = &rq->wqe.wq;
+	int i;
+
+	for (i = 0; i < wqe_bulk; i++) {
+		int j = mlx5_wq_cyc_ctr2ix(wq, ix + i);
+		struct mlx5e_wqe_frag_info *wi;
+
+		wi = get_frag(rq, j);
+		/* The page is always put into the Reuse Ring, because there
+		 * is no way to return the page to the userspace when the
+		 * interface goes down.
+		 */
+		mlx5e_xsk_free_rx_wqe(wi);
+	}
 }
 
 static void mlx5e_free_rx_wqes(struct mlx5e_rq *rq, u16 ix, int wqe_bulk)
@@ -818,19 +836,21 @@ INDIRECT_CALLABLE_SCOPE bool mlx5e_post_rx_wqes(struct mlx5e_rq *rq)
 	 */
 	wqe_bulk -= (head + wqe_bulk) & rq->wqe.info.wqe_index_mask;
 
-	mlx5e_free_rx_wqes(rq, head, wqe_bulk);
-
-	if (!rq->xsk_pool)
+	if (!rq->xsk_pool) {
+		mlx5e_free_rx_wqes(rq, head, wqe_bulk);
 		count = mlx5e_alloc_rx_wqes(rq, head, wqe_bulk);
-	else if (likely(!rq->xsk_pool->dma_need_sync))
+	} else if (likely(!rq->xsk_pool->dma_need_sync)) {
+		mlx5e_xsk_free_rx_wqes(rq, head, wqe_bulk);
 		count = mlx5e_xsk_alloc_rx_wqes_batched(rq, head, wqe_bulk);
-	else
+	} else {
+		mlx5e_xsk_free_rx_wqes(rq, head, wqe_bulk);
 		/* If dma_need_sync is true, it's more efficient to call
 		 * xsk_buff_alloc in a loop, rather than xsk_buff_alloc_batch,
 		 * because the latter does the same check and returns only one
 		 * frame.
 		 */
 		count = mlx5e_xsk_alloc_rx_wqes(rq, head, wqe_bulk);
+	}
 
 	mlx5_wq_cyc_push_n(wq, count);
 	if (unlikely(count != wqe_bulk)) {
-- 
2.39.2



* [net-next 13/15] net/mlx5e: RX, Increase WQE bulk size for legacy rq
  2023-03-28 20:56 [pull request][net-next 00/15] mlx5: Drop internal page cache implementation Saeed Mahameed
                   ` (11 preceding siblings ...)
  2023-03-28 20:56 ` [net-next 12/15] net/mlx5e: RX, Split off release path for xsk buffers for legacy rq Saeed Mahameed
@ 2023-03-28 20:56 ` Saeed Mahameed
  2023-03-28 20:56 ` [net-next 14/15] net/mlx5e: RX, Break the wqe bulk refill in smaller chunks Saeed Mahameed
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2023-03-28 20:56 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Jesper Dangaard Brouer,
	Matthew Wilcox, Toke Høiland-Jørgensen,
	Ilias Apalodimas, Dragos Tatulea

From: Dragos Tatulea <dtatulea@nvidia.com>

Deferred page release was added to the legacy rq, but its desired effect
(the driver releasing the last fragment into the page pool cache) is not yet
visible because the WQE bulks are too small.

This patch increases the WQE bulk size to span 512 KB and clips it to one
quarter of the rx queue size.
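
As a worked example of the sizing rule under assumed parameters (a 4096-entry
legacy rq with a single 2048-byte fragment stride per WQE and no XDP program;
the PAGE_SIZE alignment step for multi-page WQEs is omitted), the following
sketch mirrors the computation added below:

#include <stdio.h>

#define DIV_ROUND_UP(n, d)      (((n) + (d) - 1) / (d))
#define MIN(a, b)               ((a) < (b) ? (a) : (b))
#define MAX(a, b)               ((a) > (b) ? (a) : (b))

int main(void)
{
        /* Hypothetical legacy-rq parameters, not taken from the patch. */
        unsigned int rq_size = 4096;            /* WQEs in the rx queue */
        unsigned int sum_frag_strides = 2048;   /* bytes consumed per WQE */
        unsigned int wqe_index_mask = 1;        /* two WQEs share a page at this stride */
        unsigned int max_bulk_bytes = 512 * 1024;       /* 256 KB with an XDP program */

        unsigned int bulk_bound_bytes = (rq_size / 4) * sum_frag_strides;       /* 2 MB */
        unsigned int wqe_bulk_bytes = MIN(max_bulk_bytes, bulk_bound_bytes);    /* 512 KB */
        unsigned int wqe_bulk = MAX(wqe_index_mask + 1,
                                    DIV_ROUND_UP(wqe_bulk_bytes, sum_frag_strides));

        printf("wqe_bulk = %u\n", wqe_bulk);    /* 256 */
        return 0;
}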

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  2 +-
 .../ethernet/mellanox/mlx5/core/en/params.c   | 47 +++++++++++++++++--
 2 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 613a7daf9595..a087c433366b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -670,7 +670,7 @@ struct mlx5e_rq_frags_info {
 	struct mlx5e_rq_frag_info arr[MLX5E_MAX_RX_FRAGS];
 	u8 num_frags;
 	u8 log_num_frags;
-	u8 wqe_bulk;
+	u16 wqe_bulk;
 	u8 wqe_index_mask;
 };
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
index 561da78d3b5c..40218d77ef34 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
@@ -667,6 +667,43 @@ static int mlx5e_max_nonlinear_mtu(int first_frag_size, int frag_size, bool xdp)
 	return first_frag_size + (MLX5E_MAX_RX_FRAGS - 2) * frag_size + PAGE_SIZE;
 }
 
+static void mlx5e_rx_compute_wqe_bulk_params(struct mlx5e_params *params,
+					     struct mlx5e_rq_frags_info *info)
+{
+	u16 bulk_bound_rq_size = (1 << params->log_rq_mtu_frames) / 4;
+	u32 bulk_bound_rq_size_in_bytes;
+	u32 sum_frag_strides = 0;
+	u32 wqe_bulk_in_bytes;
+	u32 wqe_bulk;
+	int i;
+
+	for (i = 0; i < info->num_frags; i++)
+		sum_frag_strides += info->arr[i].frag_stride;
+
+	/* For MTUs larger than PAGE_SIZE, align to PAGE_SIZE to reflect
+	 * amount of consumed pages per wqe in bytes.
+	 */
+	if (sum_frag_strides > PAGE_SIZE)
+		sum_frag_strides = ALIGN(sum_frag_strides, PAGE_SIZE);
+
+	bulk_bound_rq_size_in_bytes = bulk_bound_rq_size * sum_frag_strides;
+
+#define MAX_WQE_BULK_BYTES(xdp) ((xdp ? 256 : 512) * 1024)
+
+	/* A WQE bulk should not exceed min(512KB, 1/4 of rq size). For XDP
+	 * keep bulk size smaller to avoid filling the page_pool cache on
+	 * every bulk refill.
+	 */
+	wqe_bulk_in_bytes = min_t(u32, MAX_WQE_BULK_BYTES(params->xdp_prog),
+				  bulk_bound_rq_size_in_bytes);
+	wqe_bulk = DIV_ROUND_UP(wqe_bulk_in_bytes, sum_frag_strides);
+
+	/* Make sure that allocations don't start when the page is still used
+	 * by older WQEs.
+	 */
+	info->wqe_bulk = max_t(u16, info->wqe_index_mask + 1, wqe_bulk);
+}
+
 #define DEFAULT_FRAG_SIZE (2048)
 
 static int mlx5e_build_rq_frags_info(struct mlx5_core_dev *mdev,
@@ -774,11 +811,13 @@ static int mlx5e_build_rq_frags_info(struct mlx5_core_dev *mdev,
 	}
 
 out:
-	/* Bulking optimization to skip allocation until at least 8 WQEs can be
-	 * allocated in a row. At the same time, never start allocation when
-	 * the page is still used by older WQEs.
+	/* Bulking optimization to skip allocation until a large enough number
+	 * of WQEs can be allocated in a row. Bulking also influences how well
+	 * deferred page release works.
 	 */
-	info->wqe_bulk = max_t(u8, info->wqe_index_mask + 1, 8);
+	mlx5e_rx_compute_wqe_bulk_params(params, info);
+
+	mlx5_core_dbg(mdev, "%s: wqe_bulk = %u\n", __func__, info->wqe_bulk);
 
 	info->log_num_frags = order_base_2(info->num_frags);
 
-- 
2.39.2



* [net-next 14/15] net/mlx5e: RX, Break the wqe bulk refill in smaller chunks
  2023-03-28 20:56 [pull request][net-next 00/15] mlx5: Drop internal page cache implementation Saeed Mahameed
                   ` (12 preceding siblings ...)
  2023-03-28 20:56 ` [net-next 13/15] net/mlx5e: RX, Increase WQE bulk size " Saeed Mahameed
@ 2023-03-28 20:56 ` Saeed Mahameed
  2023-03-28 20:56 ` [net-next 15/15] net/mlx5e: RX, Remove unnecessary recycle parameter and page_cache stats Saeed Mahameed
  2023-03-29 15:12 ` [pull request][net-next 00/15] mlx5: Drop internal page cache implementation Jesper Dangaard Brouer
  15 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2023-03-28 20:56 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Jesper Dangaard Brouer,
	Matthew Wilcox, Toke Høiland-Jørgensen,
	Ilias Apalodimas, Dragos Tatulea

From: Dragos Tatulea <dtatulea@nvidia.com>

To avoid overflowing the page pool's cache, don't release the whole bulk at
once, since it is usually larger than the cache refill size. Instead, group
release+alloc into cache-refill-sized units that release pages to the cache
and then allocate from it.

A refill_unit variable is added as an iteration unit over the wqe_bulk when
doing release+alloc.

For a single-ring, single-core, default-MTU (1500) TCP stream test, the
share of pages allocated directly from the cache (rx_pp_recycle_cached)
increases from 0% to 52%:

+-------------------------+---------+---------+
| Page Pool stats (/sec)  |  Before |   After |
+-------------------------+---------+---------+
|rx_pp_alloc_fast         | 2145422 | 2193802 |
|rx_pp_alloc_slow         |       2 |       0 |
|rx_pp_alloc_empty        |       2 |       0 |
|rx_pp_alloc_refill       |   34059 |   16634 |
|rx_pp_alloc_waive        |       0 |       0 |
|rx_pp_recycle_cached     |       0 | 1145818 |
|rx_pp_recycle_cache_full |       0 |       0 |
|rx_pp_recycle_ring       | 2179361 | 1064616 |
|rx_pp_recycle_ring_full  |     121 |       0 |
+-------------------------+---------+---------+

With this patch, the performance for legacy rq for the above test is
back to baseline.
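
A sketch of how a bulk gets chunked under assumed parameters (512 KB max bulk
with no XDP, 4 KB pages, a page_pool cache refill of 64 pages and a wqe_bulk
of 256 WQEs; the numbers are illustrative, not taken from the measurement
above):

#include <stdio.h>

#define DIV_ROUND_UP(n, d)      (((n) + (d) - 1) / (d))
#define MIN(a, b)               ((a) < (b) ? (a) : (b))

int main(void)
{
        /* Assumed values, not taken from this series. */
        unsigned int max_bulk_bytes = 512 * 1024;
        unsigned int page_size = 4096;
        unsigned int pp_cache_refill = 64;      /* page_pool cache refill, in pages */
        unsigned int wqe_bulk = 256;

        unsigned int split_factor = DIV_ROUND_UP(max_bulk_bytes,
                                                 pp_cache_refill * page_size);  /* 2 */
        unsigned int refill_unit = DIV_ROUND_UP(wqe_bulk, split_factor);        /* 128 */
        unsigned int remaining = wqe_bulk;

        /* Release and allocate in refill_unit-sized chunks instead of one big bulk. */
        while (remaining) {
                unsigned int chunk = MIN(refill_unit, remaining);

                printf("free %u WQEs, then alloc %u WQEs\n", chunk, chunk);
                remaining -= chunk;
        }
        return 0;
}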

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  1 +
 .../ethernet/mellanox/mlx5/core/en/params.c   |  8 +++++-
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 28 +++++++++++++++++--
 3 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index a087c433366b..ba615b74bb8e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -671,6 +671,7 @@ struct mlx5e_rq_frags_info {
 	u8 num_frags;
 	u8 log_num_frags;
 	u16 wqe_bulk;
+	u16 refill_unit;
 	u8 wqe_index_mask;
 };
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
index 40218d77ef34..31f3c6e51d9e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
@@ -674,6 +674,7 @@ static void mlx5e_rx_compute_wqe_bulk_params(struct mlx5e_params *params,
 	u32 bulk_bound_rq_size_in_bytes;
 	u32 sum_frag_strides = 0;
 	u32 wqe_bulk_in_bytes;
+	u16 split_factor;
 	u32 wqe_bulk;
 	int i;
 
@@ -702,6 +703,10 @@ static void mlx5e_rx_compute_wqe_bulk_params(struct mlx5e_params *params,
 	 * by older WQEs.
 	 */
 	info->wqe_bulk = max_t(u16, info->wqe_index_mask + 1, wqe_bulk);
+
+	split_factor = DIV_ROUND_UP(MAX_WQE_BULK_BYTES(params->xdp_prog),
+				    PP_ALLOC_CACHE_REFILL * PAGE_SIZE);
+	info->refill_unit = DIV_ROUND_UP(info->wqe_bulk, split_factor);
 }
 
 #define DEFAULT_FRAG_SIZE (2048)
@@ -817,7 +822,8 @@ static int mlx5e_build_rq_frags_info(struct mlx5_core_dev *mdev,
 	 */
 	mlx5e_rx_compute_wqe_bulk_params(params, info);
 
-	mlx5_core_dbg(mdev, "%s: wqe_bulk = %u\n", __func__, info->wqe_bulk);
+	mlx5_core_dbg(mdev, "%s: wqe_bulk = %u, wqe_bulk_refill_unit = %u\n",
+		      __func__, info->wqe_bulk, info->refill_unit);
 
 	info->log_num_frags = order_base_2(info->num_frags);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 9c5270eb9dc6..df5dbef9e5ec 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -449,6 +449,31 @@ static int mlx5e_alloc_rx_wqes(struct mlx5e_rq *rq, u16 ix, int wqe_bulk)
 	return i;
 }
 
+static int mlx5e_refill_rx_wqes(struct mlx5e_rq *rq, u16 ix, int wqe_bulk)
+{
+	int remaining = wqe_bulk;
+	int i = 0;
+
+	/* The WQE bulk is split into smaller bulks that are sized
+	 * according to the page pool cache refill size to avoid overflowing
+	 * the page pool cache due to too many page releases at once.
+	 */
+	do {
+		int refill = min_t(u16, rq->wqe.info.refill_unit, remaining);
+		int alloc_count;
+
+		mlx5e_free_rx_wqes(rq, ix + i, refill);
+		alloc_count = mlx5e_alloc_rx_wqes(rq, ix + i, refill);
+		i += alloc_count;
+		if (unlikely(alloc_count != refill))
+			break;
+
+		remaining -= refill;
+	} while (remaining);
+
+	return i;
+}
+
 static inline void
 mlx5e_add_skb_frag(struct mlx5e_rq *rq, struct sk_buff *skb,
 		   struct page *page, u32 frag_offset, u32 len,
@@ -837,8 +862,7 @@ INDIRECT_CALLABLE_SCOPE bool mlx5e_post_rx_wqes(struct mlx5e_rq *rq)
 	wqe_bulk -= (head + wqe_bulk) & rq->wqe.info.wqe_index_mask;
 
 	if (!rq->xsk_pool) {
-		mlx5e_free_rx_wqes(rq, head, wqe_bulk);
-		count = mlx5e_alloc_rx_wqes(rq, head, wqe_bulk);
+		count = mlx5e_refill_rx_wqes(rq, head, wqe_bulk);
 	} else if (likely(!rq->xsk_pool->dma_need_sync)) {
 		mlx5e_xsk_free_rx_wqes(rq, head, wqe_bulk);
 		count = mlx5e_xsk_alloc_rx_wqes_batched(rq, head, wqe_bulk);
-- 
2.39.2



* [net-next 15/15] net/mlx5e: RX, Remove unnecessary recycle parameter and page_cache stats
  2023-03-28 20:56 [pull request][net-next 00/15] mlx5: Drop internal page cache implementation Saeed Mahameed
                   ` (13 preceding siblings ...)
  2023-03-28 20:56 ` [net-next 14/15] net/mlx5e: RX, Break the wqe bulk refill in smaller chunks Saeed Mahameed
@ 2023-03-28 20:56 ` Saeed Mahameed
  2023-03-29 15:12 ` [pull request][net-next 00/15] mlx5: Drop internal page cache implementation Jesper Dangaard Brouer
  15 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2023-03-28 20:56 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Jesper Dangaard Brouer,
	Matthew Wilcox, Toke Høiland-Jørgensen,
	Ilias Apalodimas, Dragos Tatulea

From: Dragos Tatulea <dtatulea@nvidia.com>

The recycle parameter used during page release is no longer necessary: the
page pool can detect, without any outside hint, when a page cannot be
recycled to its cache or ring.

The page pool will also take care of cleaning up after itself once all the
in-flight pages have been released, so there is no need to explicitly release
pages to the system.

Remove the internal page_cache stats as the mlx5e_page_cache
struct no longer exists.

Delete the documentation entries along with the stats.
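
The reason no outside hint is needed is the fragment counter: the driver
charges each RX fragment against a large bias when the page is allocated and
drains the unused remainder on release, so the page returns to the pool only
once every outstanding fragment (including those still held by skbs) has been
dropped. A simplified model of that accounting, with made-up names and a tiny
bias for readability (this is not the page_pool API):

#include <stdbool.h>
#include <stdio.h>

#define PAGECNT_BIAS_MAX 8      /* tiny bias for illustration only */

static int pp_frag_count;       /* page_pool-side count, seeded with the bias */

/* Models the defrag step: returns true once no fragment references the page. */
static bool defrag(int n)
{
        pp_frag_count -= n;
        return pp_frag_count == 0;
}

int main(void)
{
        int driver_frags = 3;   /* fragments handed out from this page as skb frags */

        pp_frag_count = PAGECNT_BIAS_MAX;

        /* Driver release path: drain the unused part of the bias. */
        printf("after driver drain: recyclable=%d\n",
               defrag(PAGECNT_BIAS_MAX - driver_frags));        /* 0, skbs still hold 3 */

        /* Each skb completion later drops one fragment. */
        printf("after skb completions: recyclable=%d\n",
               defrag(driver_frags));                           /* 1, page can be recycled */
        return 0;
}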

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../ethernet/mellanox/mlx5/counters.rst       | 26 ----------
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  7 ++-
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 47 +++++++++----------
 .../ethernet/mellanox/mlx5/core/en_stats.c    | 20 --------
 .../ethernet/mellanox/mlx5/core/en_stats.h    | 10 ----
 5 files changed, 25 insertions(+), 85 deletions(-)

diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
index 4cd8e869762b..6b2d1fe74ecf 100644
--- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
+++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
@@ -346,32 +346,6 @@ the software port.
      - The number of receive packets with CQE compression on ring i [#accel]_.
      - Acceleration
 
-   * - `rx[i]_cache_reuse`
-     - The number of events of successful reuse of a page from a driver's
-       internal page cache.
-     - Acceleration
-
-   * - `rx[i]_cache_full`
-     - The number of events of full internal page cache where driver can't put a
-       page back to the cache for recycling (page will be freed).
-     - Acceleration
-
-   * - `rx[i]_cache_empty`
-     - The number of events where cache was empty - no page to give. Driver
-       shall allocate new page.
-     - Acceleration
-
-   * - `rx[i]_cache_busy`
-     - The number of events where cache head was busy and cannot be recycled.
-       Driver allocated new page.
-     - Acceleration
-
-   * - `rx[i]_cache_waive`
-     - The number of cache evacuation. This can occur due to page move to
-       another NUMA node or page was pfmemalloc-ed and should be freed as soon
-       as possible.
-     - Acceleration
-
    * - `rx[i]_arfs_err`
      - Number of flow rules that failed to be added to the flow table.
      - Error
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index ca6ac9772d22..15d15d0e5ef9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -505,7 +505,6 @@ mlx5e_xmit_xdp_frame(struct mlx5e_xdpsq *sq, struct mlx5e_xmit_data *xdptxd,
 static void mlx5e_free_xdpsq_desc(struct mlx5e_xdpsq *sq,
 				  struct mlx5e_xdp_wqe_info *wi,
 				  u32 *xsk_frames,
-				  bool recycle,
 				  struct xdp_frame_bulk *bq)
 {
 	struct mlx5e_xdp_info_fifo *xdpi_fifo = &sq->db.xdpi_fifo;
@@ -524,7 +523,7 @@ static void mlx5e_free_xdpsq_desc(struct mlx5e_xdpsq *sq,
 		case MLX5E_XDP_XMIT_MODE_PAGE:
 			/* XDP_TX from the regular RQ */
 			page_pool_put_defragged_page(xdpi.page.rq->page_pool,
-						     xdpi.page.page, -1, recycle);
+						     xdpi.page.page, -1, true);
 			break;
 		case MLX5E_XDP_XMIT_MODE_XSK:
 			/* AF_XDP send */
@@ -578,7 +577,7 @@ bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq *cq)
 
 			sqcc += wi->num_wqebbs;
 
-			mlx5e_free_xdpsq_desc(sq, wi, &xsk_frames, true, &bq);
+			mlx5e_free_xdpsq_desc(sq, wi, &xsk_frames, &bq);
 		} while (!last_wqe);
 
 		if (unlikely(get_cqe_opcode(cqe) != MLX5_CQE_REQ)) {
@@ -625,7 +624,7 @@ void mlx5e_free_xdpsq_descs(struct mlx5e_xdpsq *sq)
 
 		sq->cc += wi->num_wqebbs;
 
-		mlx5e_free_xdpsq_desc(sq, wi, &xsk_frames, false, &bq);
+		mlx5e_free_xdpsq_desc(sq, wi, &xsk_frames, &bq);
 	}
 
 	xdp_flush_frame_bulk(&bq);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index df5dbef9e5ec..1049805571c6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -293,14 +293,13 @@ static int mlx5e_page_alloc_fragmented(struct mlx5e_rq *rq,
 }
 
 static void mlx5e_page_release_fragmented(struct mlx5e_rq *rq,
-					  struct mlx5e_frag_page *frag_page,
-					  bool recycle)
+					  struct mlx5e_frag_page *frag_page)
 {
 	u16 drain_count = MLX5E_PAGECNT_BIAS_MAX - frag_page->frags;
 	struct page *page = frag_page->page;
 
 	if (page_pool_defrag_page(page, drain_count) == 0)
-		page_pool_put_defragged_page(rq->page_pool, page, -1, recycle);
+		page_pool_put_defragged_page(rq->page_pool, page, -1, true);
 }
 
 static inline int mlx5e_get_rx_frag(struct mlx5e_rq *rq,
@@ -330,11 +329,10 @@ static bool mlx5e_frag_can_release(struct mlx5e_wqe_frag_info *frag)
 }
 
 static inline void mlx5e_put_rx_frag(struct mlx5e_rq *rq,
-				     struct mlx5e_wqe_frag_info *frag,
-				     bool recycle)
+				     struct mlx5e_wqe_frag_info *frag)
 {
 	if (mlx5e_frag_can_release(frag))
-		mlx5e_page_release_fragmented(rq, frag->frag_page, recycle);
+		mlx5e_page_release_fragmented(rq, frag->frag_page);
 }
 
 static inline struct mlx5e_wqe_frag_info *get_frag(struct mlx5e_rq *rq, u16 ix)
@@ -368,19 +366,18 @@ static int mlx5e_alloc_rx_wqe(struct mlx5e_rq *rq, struct mlx5e_rx_wqe_cyc *wqe,
 
 free_frags:
 	while (--i >= 0)
-		mlx5e_put_rx_frag(rq, --frag, true);
+		mlx5e_put_rx_frag(rq, --frag);
 
 	return err;
 }
 
 static inline void mlx5e_free_rx_wqe(struct mlx5e_rq *rq,
-				     struct mlx5e_wqe_frag_info *wi,
-				     bool recycle)
+				     struct mlx5e_wqe_frag_info *wi)
 {
 	int i;
 
 	for (i = 0; i < rq->wqe.info.num_frags; i++, wi++)
-		mlx5e_put_rx_frag(rq, wi, recycle);
+		mlx5e_put_rx_frag(rq, wi);
 }
 
 static void mlx5e_xsk_free_rx_wqe(struct mlx5e_wqe_frag_info *wi)
@@ -396,7 +393,7 @@ static void mlx5e_dealloc_rx_wqe(struct mlx5e_rq *rq, u16 ix)
 	if (rq->xsk_pool)
 		mlx5e_xsk_free_rx_wqe(wi);
 	else
-		mlx5e_free_rx_wqe(rq, wi, false);
+		mlx5e_free_rx_wqe(rq, wi);
 }
 
 static void mlx5e_xsk_free_rx_wqes(struct mlx5e_rq *rq, u16 ix, int wqe_bulk)
@@ -427,7 +424,7 @@ static void mlx5e_free_rx_wqes(struct mlx5e_rq *rq, u16 ix, int wqe_bulk)
 		struct mlx5e_wqe_frag_info *wi;
 
 		wi = get_frag(rq, j);
-		mlx5e_free_rx_wqe(rq, wi, true);
+		mlx5e_free_rx_wqe(rq, wi);
 	}
 }
 
@@ -502,7 +499,7 @@ mlx5e_copy_skb_header(struct mlx5e_rq *rq, struct sk_buff *skb,
 }
 
 static void
-mlx5e_free_rx_mpwqe(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi, bool recycle)
+mlx5e_free_rx_mpwqe(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi)
 {
 	bool no_xdp_xmit;
 	int i;
@@ -516,9 +513,9 @@ mlx5e_free_rx_mpwqe(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi, bool recycle
 	if (rq->xsk_pool) {
 		struct xdp_buff **xsk_buffs = wi->alloc_units.xsk_buffs;
 
-		/* The `recycle` parameter is ignored, and the page is always
-		 * put into the Reuse Ring, because there is no way to return
-		 * the page to the userspace when the interface goes down.
+		/* The page is always put into the Reuse Ring, because there
+		 * is no way to return the page to userspace when the interface
+		 * goes down.
 		 */
 		for (i = 0; i < rq->mpwqe.pages_per_wqe; i++)
 			if (no_xdp_xmit || !test_bit(i, wi->skip_release_bitmap))
@@ -529,7 +526,7 @@ mlx5e_free_rx_mpwqe(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi, bool recycle
 				struct mlx5e_frag_page *frag_page;
 
 				frag_page = &wi->alloc_units.frag_pages[i];
-				mlx5e_page_release_fragmented(rq, frag_page, recycle);
+				mlx5e_page_release_fragmented(rq, frag_page);
 			}
 		}
 	}
@@ -663,7 +660,7 @@ static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq,
 		dma_info = &shampo->info[--index];
 		if (!(i & (MLX5E_SHAMPO_WQ_HEADER_PER_PAGE - 1))) {
 			dma_info->addr = ALIGN_DOWN(dma_info->addr, PAGE_SIZE);
-			mlx5e_page_release_fragmented(rq, dma_info->frag_page, true);
+			mlx5e_page_release_fragmented(rq, dma_info->frag_page);
 		}
 	}
 	rq->stats->buff_alloc_err++;
@@ -781,7 +778,7 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 err_unmap:
 	while (--i >= 0) {
 		frag_page--;
-		mlx5e_page_release_fragmented(rq, frag_page, true);
+		mlx5e_page_release_fragmented(rq, frag_page);
 	}
 
 err:
@@ -815,7 +812,7 @@ void mlx5e_shampo_dealloc_hd(struct mlx5e_rq *rq, u16 len, u16 start, bool close
 		hd_info->addr = ALIGN_DOWN(hd_info->addr, PAGE_SIZE);
 		if (hd_info->frag_page && hd_info->frag_page != deleted_page) {
 			deleted_page = hd_info->frag_page;
-			mlx5e_page_release_fragmented(rq, hd_info->frag_page, false);
+			mlx5e_page_release_fragmented(rq, hd_info->frag_page);
 		}
 
 		hd_info->frag_page = NULL;
@@ -833,8 +830,8 @@ void mlx5e_shampo_dealloc_hd(struct mlx5e_rq *rq, u16 len, u16 start, bool close
 static void mlx5e_dealloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 {
 	struct mlx5e_mpw_info *wi = mlx5e_get_mpw_info(rq, ix);
-	/* Don't recycle, this function is called on rq/netdev close */
-	mlx5e_free_rx_mpwqe(rq, wi, false);
+	/* This function is called on rq/netdev close. */
+	mlx5e_free_rx_mpwqe(rq, wi);
 }
 
 INDIRECT_CALLABLE_SCOPE bool mlx5e_post_rx_wqes(struct mlx5e_rq *rq)
@@ -1058,7 +1055,7 @@ INDIRECT_CALLABLE_SCOPE bool mlx5e_post_rx_mpwqes(struct mlx5e_rq *rq)
 		struct mlx5e_mpw_info *wi = mlx5e_get_mpw_info(rq, head);
 
 		/* Deferred free for better page pool cache usage. */
-		mlx5e_free_rx_mpwqe(rq, wi, true);
+		mlx5e_free_rx_mpwqe(rq, wi);
 
 		alloc_err = rq->xsk_pool ? mlx5e_xsk_alloc_rx_mpwqe(rq, head) :
 					   mlx5e_alloc_rx_mpwqe(rq, head);
@@ -1739,7 +1736,7 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
 			int i;
 
 			for (i = wi - head_wi; i < rq->wqe.info.num_frags; i++)
-				mlx5e_put_rx_frag(rq, &head_wi[i], true);
+				mlx5e_put_rx_frag(rq, &head_wi[i]);
 		}
 		return NULL; /* page/packet was consumed by XDP */
 	}
@@ -2158,7 +2155,7 @@ mlx5e_free_rx_shampo_hd_entry(struct mlx5e_rq *rq, u16 header_index)
 		struct mlx5e_dma_info *dma_info = &shampo->info[header_index];
 
 		dma_info->addr = ALIGN_DOWN(addr, PAGE_SIZE);
-		mlx5e_page_release_fragmented(rq, dma_info->frag_page, true);
+		mlx5e_page_release_fragmented(rq, dma_info->frag_page);
 	}
 	bitmap_clear(shampo->bitmap, header_index, 1);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index 4478223c1720..f1d9596905c6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -179,11 +179,6 @@ static const struct counter_desc sw_stats_desc[] = {
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_buff_alloc_err) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_cqe_compress_blks) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_cqe_compress_pkts) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_cache_reuse) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_cache_full) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_cache_empty) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_cache_busy) },
-	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_cache_waive) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_congst_umr) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_arfs_err) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_recover) },
@@ -358,11 +353,6 @@ static void mlx5e_stats_grp_sw_update_stats_rq_stats(struct mlx5e_sw_stats *s,
 	s->rx_buff_alloc_err          += rq_stats->buff_alloc_err;
 	s->rx_cqe_compress_blks       += rq_stats->cqe_compress_blks;
 	s->rx_cqe_compress_pkts       += rq_stats->cqe_compress_pkts;
-	s->rx_cache_reuse             += rq_stats->cache_reuse;
-	s->rx_cache_full              += rq_stats->cache_full;
-	s->rx_cache_empty             += rq_stats->cache_empty;
-	s->rx_cache_busy              += rq_stats->cache_busy;
-	s->rx_cache_waive             += rq_stats->cache_waive;
 	s->rx_congst_umr              += rq_stats->congst_umr;
 	s->rx_arfs_err                += rq_stats->arfs_err;
 	s->rx_recover                 += rq_stats->recover;
@@ -1978,11 +1968,6 @@ static const struct counter_desc rq_stats_desc[] = {
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, buff_alloc_err) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, cqe_compress_blks) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, cqe_compress_pkts) },
-	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, cache_reuse) },
-	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, cache_full) },
-	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, cache_empty) },
-	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, cache_busy) },
-	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, cache_waive) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, congst_umr) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, arfs_err) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, recover) },
@@ -2163,11 +2148,6 @@ static const struct counter_desc ptp_rq_stats_desc[] = {
 	{ MLX5E_DECLARE_PTP_RQ_STAT(struct mlx5e_rq_stats, buff_alloc_err) },
 	{ MLX5E_DECLARE_PTP_RQ_STAT(struct mlx5e_rq_stats, cqe_compress_blks) },
 	{ MLX5E_DECLARE_PTP_RQ_STAT(struct mlx5e_rq_stats, cqe_compress_pkts) },
-	{ MLX5E_DECLARE_PTP_RQ_STAT(struct mlx5e_rq_stats, cache_reuse) },
-	{ MLX5E_DECLARE_PTP_RQ_STAT(struct mlx5e_rq_stats, cache_full) },
-	{ MLX5E_DECLARE_PTP_RQ_STAT(struct mlx5e_rq_stats, cache_empty) },
-	{ MLX5E_DECLARE_PTP_RQ_STAT(struct mlx5e_rq_stats, cache_busy) },
-	{ MLX5E_DECLARE_PTP_RQ_STAT(struct mlx5e_rq_stats, cache_waive) },
 	{ MLX5E_DECLARE_PTP_RQ_STAT(struct mlx5e_rq_stats, congst_umr) },
 	{ MLX5E_DECLARE_PTP_RQ_STAT(struct mlx5e_rq_stats, arfs_err) },
 	{ MLX5E_DECLARE_PTP_RQ_STAT(struct mlx5e_rq_stats, recover) },
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
index b77100b60b50..1ff8a06027dc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
@@ -193,11 +193,6 @@ struct mlx5e_sw_stats {
 	u64 rx_buff_alloc_err;
 	u64 rx_cqe_compress_blks;
 	u64 rx_cqe_compress_pkts;
-	u64 rx_cache_reuse;
-	u64 rx_cache_full;
-	u64 rx_cache_empty;
-	u64 rx_cache_busy;
-	u64 rx_cache_waive;
 	u64 rx_congst_umr;
 	u64 rx_arfs_err;
 	u64 rx_recover;
@@ -362,11 +357,6 @@ struct mlx5e_rq_stats {
 	u64 buff_alloc_err;
 	u64 cqe_compress_blks;
 	u64 cqe_compress_pkts;
-	u64 cache_reuse;
-	u64 cache_full;
-	u64 cache_empty;
-	u64 cache_busy;
-	u64 cache_waive;
 	u64 congst_umr;
 	u64 arfs_err;
 	u64 recover;
-- 
2.39.2



* Re: [pull request][net-next 00/15] mlx5: Drop internal page cache implementation
  2023-03-28 20:56 [pull request][net-next 00/15] mlx5: Drop internal page cache implementation Saeed Mahameed
                   ` (14 preceding siblings ...)
  2023-03-28 20:56 ` [net-next 15/15] net/mlx5e: RX, Remove unnecessary recycle parameter and page_cache stats Saeed Mahameed
@ 2023-03-29 15:12 ` Jesper Dangaard Brouer
  15 siblings, 0 replies; 18+ messages in thread
From: Jesper Dangaard Brouer @ 2023-03-29 15:12 UTC (permalink / raw)
  To: Saeed Mahameed, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet
  Cc: brouer, Saeed Mahameed, netdev, Tariq Toukan, Matthew Wilcox,
	Toke Høiland-Jørgensen, Ilias Apalodimas,
	Dragos Tatulea


On 28/03/2023 22.56, Saeed Mahameed wrote:
> From: Saeed Mahameed <saeedm@nvidia.com>
> 
> Hi Dave, Hi Jakub,
> 
> This series from Dragos provides the patches that remove the mlx5
> internal page cache implementation and convert mlx5 RX buffers to
> completely rely on the standard page pool.

It is awesome to finally see this happen! :-)))

--Jesper



* Re: [net-next 01/15] net/mlx5e: RX, Remove mlx5e_alloc_unit argument in page allocation
  2023-03-28 20:56 ` [net-next 01/15] net/mlx5e: RX, Remove mlx5e_alloc_unit argument in page allocation Saeed Mahameed
@ 2023-03-30  5:20   ` patchwork-bot+netdevbpf
  0 siblings, 0 replies; 18+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-03-30  5:20 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: davem, kuba, pabeni, edumazet, saeedm, netdev, tariqt, brouer,
	willy, toke, ilias.apalodimas, dtatulea

Hello:

This series was applied to netdev/net-next.git (main)
by Saeed Mahameed <saeedm@nvidia.com>:

On Tue, 28 Mar 2023 13:56:09 -0700 you wrote:
> From: Dragos Tatulea <dtatulea@nvidia.com>
> 
> Change internal page cache and page pool api to use a struct page **
> instead of a mlx5e_alloc_unit *.
> 
> This is the first change in a series which is meant to remove the
> mlx5e_alloc_unit altogether.
> 
> [...]

Here is the summary with links:
  - [net-next,01/15] net/mlx5e: RX, Remove mlx5e_alloc_unit argument in page allocation
    https://git.kernel.org/netdev/net-next/c/09df03701779
  - [net-next,02/15] net/mlx5e: RX, Remove alloc unit layout constraint for legacy rq
    https://git.kernel.org/netdev/net-next/c/8fb1814f58f6
  - [net-next,03/15] net/mlx5e: RX, Remove alloc unit layout constraint for striding rq
    https://git.kernel.org/netdev/net-next/c/d39092caaedf
  - [net-next,04/15] net/mlx5e: RX, Store SHAMPO header pages in array
    https://git.kernel.org/netdev/net-next/c/ca6ef9f03194
  - [net-next,05/15] net/mlx5e: RX, Remove internal page_cache
    https://git.kernel.org/netdev/net-next/c/08c9b61b071c
  - [net-next,06/15] net/mlx5e: RX, Enable dma map and sync from page_pool allocator
    https://git.kernel.org/netdev/net-next/c/4a5c5e25008f
  - [net-next,07/15] net/mlx5e: RX, Enable skb page recycling through the page_pool
    https://git.kernel.org/netdev/net-next/c/6f5742846053
  - [net-next,08/15] net/mlx5e: RX, Rename xdp_xmit_bitmap to a more generic name
    https://git.kernel.org/netdev/net-next/c/38a36efccd90
  - [net-next,09/15] net/mlx5e: RX, Defer page release in striding rq for better recycling
    https://git.kernel.org/netdev/net-next/c/4c2a13236807
  - [net-next,10/15] net/mlx5e: RX, Change wqe last_in_page field from bool to bit flags
    https://git.kernel.org/netdev/net-next/c/625dff29df39
  - [net-next,11/15] net/mlx5e: RX, Defer page release in legacy rq for better recycling
    https://git.kernel.org/netdev/net-next/c/3f93f82988bc
  - [net-next,12/15] net/mlx5e: RX, Split off release path for xsk buffers for legacy rq
    https://git.kernel.org/netdev/net-next/c/76238d0fbd21
  - [net-next,13/15] net/mlx5e: RX, Increase WQE bulk size for legacy rq
    https://git.kernel.org/netdev/net-next/c/4ba2b4988c98
  - [net-next,14/15] net/mlx5e: RX, Break the wqe bulk refill in smaller chunks
    https://git.kernel.org/netdev/net-next/c/cd640b050368
  - [net-next,15/15] net/mlx5e: RX, Remove unnecessary recycle parameter and page_cache stats
    https://git.kernel.org/netdev/net-next/c/3905f8d64ccc

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




