* [pull request][net-next 00/15] mlx5 updates 2023-12-20
@ 2023-12-21  0:57 Saeed Mahameed
  2023-12-21  0:57 ` [net-next 01/15] net/mlx5e: Use the correct lag ports number when creating TISes Saeed Mahameed
                   ` (15 more replies)
  0 siblings, 16 replies; 45+ messages in thread
From: Saeed Mahameed @ 2023-12-21  0:57 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan

From: Saeed Mahameed <saeedm@nvidia.com>

This series adds netdev support for Socket Direct and for the embedded
management PF.

For more information please see tag log below.

Please pull and let me know if there is any problem.

Happy holidays.

Thanks,
Saeed.


The following changes since commit bee9705c679d0df8ee099e3c5312ac76f447848a:

  Merge branch 'net-sched-tc-drop-reason' (2023-12-20 11:50:13 +0000)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-updates-2023-12-20

for you to fetch changes up to 22c4640698a1d47606b5a4264a584e8046641784:

  net/mlx5: Implement management PF Ethernet profile (2023-12-20 16:54:27 -0800)

----------------------------------------------------------------
mlx5-updates-2023-12-20

mlx5 Socket Direct support and management PF profile.

Tariq Says:
===========
Support Socket-Direct multi-dev netdev

This series adds support for combining multiple devices (PFs) of the
same port under one netdev instance. Steering traffic through the
device that belongs to the local NUMA socket avoids cross-NUMA traffic,
so applications sharing the same netdev from different NUMA nodes still
feel a sense of proximity to the device and achieve improved
performance.

We achieve this by grouping PFs together, and creating the netdev only
once all group members are probed. Symmetrically, we destroy the netdev
once any of the PFs is removed.

The channels are distributed across all devices; with a proper
configuration, an application running on a given CPU uses the device
on its local (close) NUMA node.
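
A minimal sketch of this mapping, modeled on the
mlx5_sd_ch_ix_get_dev_ix()/mlx5_sd_ch_ix_get_vec_ix() helpers
introduced in lib/sd.c later in the series (the helper names below are
simplified for illustration; the real code also covers the non-SD case,
where host_buses is 1):

	/* Round-robin: which mdev serves channel ch_ix */
	static int ch_ix_to_dev_ix(int ch_ix, int host_buses)
	{
		return ch_ix % host_buses;
	}

	/* Per-device vector/queue index of channel ch_ix within that mdev */
	static int ch_ix_to_vec_ix(int ch_ix, int host_buses)
	{
		return ch_ix / host_buses;
	}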

We pick one device to be the primary (leader), and it fills a special
role.  The other devices (secondaries) are disconnected from the network
at the chip level (set to silent mode). All RX/TX traffic is steered
through the primary to/from the secondaries.

Currently, we limit the support to PFs only, and up to two devices
(sockets).

===========

Armen Says:
===========
Management PF support and module integration

This patch adds support for the Management Physical Function (MGMT PF)
to the mlx5 driver. It updates the mlx5 interface header with the
necessary MGMT PF definitions and adds a new management PF netdev
profile, which allows the host side to communicate with the embedded
Linux on BlueField devices.

===========

----------------------------------------------------------------
Armen Ratner (1):
      net/mlx5: Implement management PF Ethernet profile

Saeed Mahameed (1):
      net/mlx5e: Use the correct lag ports number when creating TISes

Tariq Toukan (13):
      net/mlx5: Fix query of sd_group field
      net/mlx5: SD, Introduce SD lib
      net/mlx5: SD, Implement basic query and instantiation
      net/mlx5: SD, Implement devcom communication and primary election
      net/mlx5: SD, Implement steering for primary and secondaries
      net/mlx5: SD, Add informative prints in kernel log
      net/mlx5e: Create single netdev per SD group
      net/mlx5e: Create EN core HW resources for all secondary devices
      net/mlx5e: Let channels be SD-aware
      net/mlx5e: Support cross-vhca RSS
      net/mlx5e: Support per-mdev queue counter
      net/mlx5e: Block TLS device offload on combined SD netdev
      net/mlx5: Enable SD feature

 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/dev.c      |   3 +
 drivers/net/ethernet/mellanox/mlx5/core/ecpf.c     |   6 +
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |  15 +-
 .../net/ethernet/mellanox/mlx5/core/en/channels.c  |  10 +-
 .../net/ethernet/mellanox/mlx5/core/en/channels.h  |   6 +-
 .../net/ethernet/mellanox/mlx5/core/en/mgmt_pf.c   | 268 ++++++++++++
 .../ethernet/mellanox/mlx5/core/en/monitor_stats.c |  48 +-
 .../net/ethernet/mellanox/mlx5/core/en/params.c    |   9 +-
 .../net/ethernet/mellanox/mlx5/core/en/params.h    |   3 -
 drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c   |  12 +-
 drivers/net/ethernet/mellanox/mlx5/core/en/qos.c   |   8 +-
 .../ethernet/mellanox/mlx5/core/en/reporter_rx.c   |   4 +-
 .../ethernet/mellanox/mlx5/core/en/reporter_tx.c   |   3 +-
 drivers/net/ethernet/mellanox/mlx5/core/en/rqt.c   | 123 +++++-
 drivers/net/ethernet/mellanox/mlx5/core/en/rqt.h   |   9 +-
 drivers/net/ethernet/mellanox/mlx5/core/en/rss.c   |  17 +-
 drivers/net/ethernet/mellanox/mlx5/core/en/rss.h   |   4 +-
 .../net/ethernet/mellanox/mlx5/core/en/rx_res.c    |  62 ++-
 .../net/ethernet/mellanox/mlx5/core/en/rx_res.h    |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/en/trap.c  |  11 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/pool.c  |   6 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/setup.c |   8 +-
 .../ethernet/mellanox/mlx5/core/en_accel/ktls.c    |   2 +-
 .../ethernet/mellanox/mlx5/core/en_accel/ktls.h    |   4 +-
 .../ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c |   6 +-
 .../net/ethernet/mellanox/mlx5/core/en_common.c    |  21 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 200 +++++++--
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.c |  39 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |   2 +-
 .../net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c  |   2 +-
 .../net/ethernet/mellanox/mlx5/core/lib/devcom.h   |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h |  12 +
 drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c   | 487 +++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/lib/sd.h   |  38 ++
 drivers/net/ethernet/mellanox/mlx5/core/vport.c    |  21 +
 include/linux/mlx5/driver.h                        |  10 +
 include/linux/mlx5/mlx5_ifc.h                      |  24 +-
 include/linux/mlx5/vport.h                         |   1 +
 40 files changed, 1320 insertions(+), 192 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/mgmt_pf.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/sd.h

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [net-next 01/15] net/mlx5e: Use the correct lag ports number when creating TISes
  2023-12-21  0:57 [pull request][net-next 00/15] mlx5 updates 2023-12-20 Saeed Mahameed
@ 2023-12-21  0:57 ` Saeed Mahameed
  2023-12-29 22:40   ` patchwork-bot+netdevbpf
  2023-12-21  0:57 ` [net-next 02/15] net/mlx5: Fix query of sd_group field Saeed Mahameed
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 45+ messages in thread
From: Saeed Mahameed @ 2023-12-21  0:57 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan

From: Saeed Mahameed <saeedm@nvidia.com>

The cited commit moved the code of mlx5e_create_tises() and changed the
loop to create TISes over the MLX5_MAX_PORTS constant, instead of the
actual number of lag ports supported by the device, which can cause FW
errors on devices with fewer than MLX5_MAX_PORTS ports.

Change that back to mlx5e_get_num_lag_ports(mdev).

Also, IPoIB interfaces create their own TISes and don't use the eth
TISes; pass a flag to indicate that.

Fixes: b25bd37c859f ("net/mlx5: Move TISes from priv to mdev HW resources")
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  2 +-
 .../ethernet/mellanox/mlx5/core/en_common.c   | 21 ++++++++++++-------
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  2 +-
 .../ethernet/mellanox/mlx5/core/ipoib/ipoib.c |  2 +-
 include/linux/mlx5/driver.h                   |  1 +
 5 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 0bfe1ca8a364..55c6ace0acd5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -1124,7 +1124,7 @@ static inline bool mlx5_tx_swp_supported(struct mlx5_core_dev *mdev)
 extern const struct ethtool_ops mlx5e_ethtool_ops;
 
 int mlx5e_create_mkey(struct mlx5_core_dev *mdev, u32 pdn, u32 *mkey);
-int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev);
+int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev, bool create_tises);
 void mlx5e_destroy_mdev_resources(struct mlx5_core_dev *mdev);
 int mlx5e_refresh_tirs(struct mlx5e_priv *priv, bool enable_uc_lb,
 		       bool enable_mc_lb);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
index 67f546683e85..6ed3a32b7e22 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
@@ -95,7 +95,7 @@ static void mlx5e_destroy_tises(struct mlx5_core_dev *mdev, u32 tisn[MLX5_MAX_PO
 {
 	int tc, i;
 
-	for (i = 0; i < MLX5_MAX_PORTS; i++)
+	for (i = 0; i < mlx5e_get_num_lag_ports(mdev); i++)
 		for (tc = 0; tc < MLX5_MAX_NUM_TC; tc++)
 			mlx5e_destroy_tis(mdev, tisn[i][tc]);
 }
@@ -110,7 +110,7 @@ static int mlx5e_create_tises(struct mlx5_core_dev *mdev, u32 tisn[MLX5_MAX_PORT
 	int tc, i;
 	int err;
 
-	for (i = 0; i < MLX5_MAX_PORTS; i++) {
+	for (i = 0; i < mlx5e_get_num_lag_ports(mdev); i++) {
 		for (tc = 0; tc < MLX5_MAX_NUM_TC; tc++) {
 			u32 in[MLX5_ST_SZ_DW(create_tis_in)] = {};
 			void *tisc;
@@ -140,7 +140,7 @@ static int mlx5e_create_tises(struct mlx5_core_dev *mdev, u32 tisn[MLX5_MAX_PORT
 	return err;
 }
 
-int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev)
+int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev, bool create_tises)
 {
 	struct mlx5e_hw_objs *res = &mdev->mlx5e_res.hw_objs;
 	int err;
@@ -169,11 +169,15 @@ int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev)
 		goto err_destroy_mkey;
 	}
 
-	err = mlx5e_create_tises(mdev, res->tisn);
-	if (err) {
-		mlx5_core_err(mdev, "alloc tises failed, %d\n", err);
-		goto err_destroy_bfreg;
+	if (create_tises) {
+		err = mlx5e_create_tises(mdev, res->tisn);
+		if (err) {
+			mlx5_core_err(mdev, "alloc tises failed, %d\n", err);
+			goto err_destroy_bfreg;
+		}
+		res->tisn_valid = true;
 	}
+
 	INIT_LIST_HEAD(&res->td.tirs_list);
 	mutex_init(&res->td.list_lock);
 
@@ -203,7 +207,8 @@ void mlx5e_destroy_mdev_resources(struct mlx5_core_dev *mdev)
 
 	mlx5_crypto_dek_cleanup(mdev->mlx5e_res.dek_priv);
 	mdev->mlx5e_res.dek_priv = NULL;
-	mlx5e_destroy_tises(mdev, res->tisn);
+	if (res->tisn_valid)
+		mlx5e_destroy_tises(mdev, res->tisn);
 	mlx5_free_bfreg(mdev, &res->bfreg);
 	mlx5_core_destroy_mkey(mdev, res->mkey);
 	mlx5_core_dealloc_transport_domain(mdev, res->td.tdn);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index b5f1c4ca38ba..c8e8f512803e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -5992,7 +5992,7 @@ static int mlx5e_resume(struct auxiliary_device *adev)
 	if (netif_device_present(netdev))
 		return 0;
 
-	err = mlx5e_create_mdev_resources(mdev);
+	err = mlx5e_create_mdev_resources(mdev, true);
 	if (err)
 		return err;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
index 58845121954c..d77be1b4dd9c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
@@ -783,7 +783,7 @@ static int mlx5_rdma_setup_rn(struct ib_device *ibdev, u32 port_num,
 		}
 
 		/* This should only be called once per mdev */
-		err = mlx5e_create_mdev_resources(mdev);
+		err = mlx5e_create_mdev_resources(mdev, false);
 		if (err)
 			goto destroy_ht;
 	}
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 7ee5b79ff3d6..aafb36c9e5d9 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -681,6 +681,7 @@ struct mlx5e_resources {
 		struct mlx5_sq_bfreg       bfreg;
 #define MLX5_MAX_NUM_TC 8
 		u32                        tisn[MLX5_MAX_PORTS][MLX5_MAX_NUM_TC];
+		bool			   tisn_valid;
 	} hw_objs;
 	struct net_device *uplink_netdev;
 	struct mutex uplink_netdev_lock;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [net-next 02/15] net/mlx5: Fix query of sd_group field
  2023-12-21  0:57 [pull request][net-next 00/15] mlx5 updates 2023-12-20 Saeed Mahameed
  2023-12-21  0:57 ` [net-next 01/15] net/mlx5e: Use the correct lag ports number when creating TISes Saeed Mahameed
@ 2023-12-21  0:57 ` Saeed Mahameed
  2023-12-21  0:57 ` [net-next 03/15] net/mlx5: SD, Introduce SD lib Saeed Mahameed
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 45+ messages in thread
From: Saeed Mahameed @ 2023-12-21  0:57 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan

From: Tariq Toukan <tariqt@nvidia.com>

The sd_group field moved in the HW spec from the MPIR register
to the vport context.
Align the query accordingly.

Fixes: f5e956329960 ("net/mlx5: Expose Management PCIe Index Register (MPIR)")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/vport.c   | 21 +++++++++++++++++++
 include/linux/mlx5/mlx5_ifc.h                 | 10 ++++++---
 include/linux/mlx5/vport.h                    |  1 +
 3 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
index 5a31fb47ffa5..c95a84b7db3a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
@@ -440,6 +440,27 @@ int mlx5_query_nic_vport_system_image_guid(struct mlx5_core_dev *mdev,
 }
 EXPORT_SYMBOL_GPL(mlx5_query_nic_vport_system_image_guid);
 
+int mlx5_query_nic_vport_sd_group(struct mlx5_core_dev *mdev, u8 *sd_group)
+{
+	int outlen = MLX5_ST_SZ_BYTES(query_nic_vport_context_out);
+	u32 *out;
+	int err;
+
+	out = kvzalloc(outlen, GFP_KERNEL);
+	if (!out)
+		return -ENOMEM;
+
+	err = mlx5_query_nic_vport_context(mdev, 0, out);
+	if (err)
+		goto out;
+
+	*sd_group = MLX5_GET(query_nic_vport_context_out, out,
+			     nic_vport_context.sd_group);
+out:
+	kvfree(out);
+	return err;
+}
+
 int mlx5_query_nic_vport_node_guid(struct mlx5_core_dev *mdev, u64 *node_guid)
 {
 	u32 *out;
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index fee20fc010c2..bf2d51952e48 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -4030,8 +4030,13 @@ struct mlx5_ifc_nic_vport_context_bits {
 	u8	   affiliation_criteria[0x4];
 	u8	   affiliated_vhca_id[0x10];
 
-	u8	   reserved_at_60[0xd0];
+	u8	   reserved_at_60[0xa0];
 
+	u8	   reserved_at_100[0x1];
+	u8         sd_group[0x3];
+	u8	   reserved_at_104[0x1c];
+
+	u8	   reserved_at_120[0x10];
 	u8         mtu[0x10];
 
 	u8         system_image_guid[0x40];
@@ -10116,8 +10121,7 @@ struct mlx5_ifc_mpir_reg_bits {
 	u8         reserved_at_20[0x20];
 
 	u8         local_port[0x8];
-	u8         reserved_at_28[0x15];
-	u8         sd_group[0x3];
+	u8         reserved_at_28[0x18];
 
 	u8         reserved_at_60[0x20];
 };
diff --git a/include/linux/mlx5/vport.h b/include/linux/mlx5/vport.h
index fbb9bf447889..c36cc6d82926 100644
--- a/include/linux/mlx5/vport.h
+++ b/include/linux/mlx5/vport.h
@@ -72,6 +72,7 @@ int mlx5_query_nic_vport_mtu(struct mlx5_core_dev *mdev, u16 *mtu);
 int mlx5_modify_nic_vport_mtu(struct mlx5_core_dev *mdev, u16 mtu);
 int mlx5_query_nic_vport_system_image_guid(struct mlx5_core_dev *mdev,
 					   u64 *system_image_guid);
+int mlx5_query_nic_vport_sd_group(struct mlx5_core_dev *mdev, u8 *sd_group);
 int mlx5_query_nic_vport_node_guid(struct mlx5_core_dev *mdev, u64 *node_guid);
 int mlx5_modify_nic_vport_node_guid(struct mlx5_core_dev *mdev,
 				    u16 vport, u64 node_guid);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [net-next 03/15] net/mlx5: SD, Introduce SD lib
  2023-12-21  0:57 [pull request][net-next 00/15] mlx5 updates 2023-12-20 Saeed Mahameed
  2023-12-21  0:57 ` [net-next 01/15] net/mlx5e: Use the correct lag ports number when creating TISes Saeed Mahameed
  2023-12-21  0:57 ` [net-next 02/15] net/mlx5: Fix query of sd_group field Saeed Mahameed
@ 2023-12-21  0:57 ` Saeed Mahameed
  2023-12-21  0:57 ` [net-next 04/15] net/mlx5: SD, Implement basic query and instantiation Saeed Mahameed
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 45+ messages in thread
From: Saeed Mahameed @ 2023-12-21  0:57 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman

From: Tariq Toukan <tariqt@nvidia.com>

Add the Socket-Direct API with an empty/minimal implementation.
We fill in the implementation gradually in downstream patches.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/Makefile  |  2 +-
 .../ethernet/mellanox/mlx5/core/lib/mlx5.h    | 11 ++++
 .../net/ethernet/mellanox/mlx5/core/lib/sd.c  | 60 +++++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/lib/sd.h  | 38 ++++++++++++
 4 files changed, 110 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/sd.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index c44870b175f9..76dc5a9b9648 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -29,7 +29,7 @@ mlx5_core-$(CONFIG_MLX5_CORE_EN) += en/rqt.o en/tir.o en/rss.o en/rx_res.o \
 		en/reporter_tx.o en/reporter_rx.o en/params.o en/xsk/pool.o \
 		en/xsk/setup.o en/xsk/rx.o en/xsk/tx.o en/devlink.o en/ptp.o \
 		en/qos.o en/htb.o en/trap.o en/fs_tt_redirect.o en/selq.o \
-		lib/crypto.o
+		lib/crypto.o lib/sd.o
 
 #
 # Netdev extra
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h
index 2b5826a785c4..0810b92b48d0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h
@@ -54,4 +54,15 @@ static inline struct net_device *mlx5_uplink_netdev_get(struct mlx5_core_dev *md
 {
 	return mdev->mlx5e_res.uplink_netdev;
 }
+
+struct mlx5_sd;
+
+static inline struct mlx5_sd *mlx5_get_sd(struct mlx5_core_dev *dev)
+{
+	return NULL;
+}
+
+static inline void mlx5_set_sd(struct mlx5_core_dev *dev, struct mlx5_sd *sd)
+{
+}
 #endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
new file mode 100644
index 000000000000..ea37238c4519
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
+
+#include "lib/sd.h"
+#include "mlx5_core.h"
+
+#define sd_info(__dev, format, ...) \
+	dev_info((__dev)->device, "Socket-Direct: " format, ##__VA_ARGS__)
+#define sd_warn(__dev, format, ...) \
+	dev_warn((__dev)->device, "Socket-Direct: " format, ##__VA_ARGS__)
+
+struct mlx5_sd {
+};
+
+static int mlx5_sd_get_host_buses(struct mlx5_core_dev *dev)
+{
+	return 1;
+}
+
+struct mlx5_core_dev *
+mlx5_sd_primary_get_peer(struct mlx5_core_dev *primary, int idx)
+{
+	if (idx == 0)
+		return primary;
+
+	return NULL;
+}
+
+int mlx5_sd_ch_ix_get_dev_ix(struct mlx5_core_dev *dev, int ch_ix)
+{
+	return ch_ix % mlx5_sd_get_host_buses(dev);
+}
+
+int mlx5_sd_ch_ix_get_vec_ix(struct mlx5_core_dev *dev, int ch_ix)
+{
+	return ch_ix / mlx5_sd_get_host_buses(dev);
+}
+
+struct mlx5_core_dev *mlx5_sd_ch_ix_get_dev(struct mlx5_core_dev *primary, int ch_ix)
+{
+	int mdev_idx = mlx5_sd_ch_ix_get_dev_ix(primary, ch_ix);
+
+	return mlx5_sd_primary_get_peer(primary, mdev_idx);
+}
+
+int mlx5_sd_init(struct mlx5_core_dev *dev)
+{
+	return 0;
+}
+
+void mlx5_sd_cleanup(struct mlx5_core_dev *dev)
+{
+}
+
+struct auxiliary_device *mlx5_sd_get_adev(struct mlx5_core_dev *dev,
+					  struct auxiliary_device *adev,
+					  int idx)
+{
+	return adev;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.h
new file mode 100644
index 000000000000..137efaf9aabc
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.h
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
+
+#ifndef __MLX5_LIB_SD_H__
+#define __MLX5_LIB_SD_H__
+
+#define MLX5_SD_MAX_GROUP_SZ 2
+
+struct mlx5_sd;
+
+struct mlx5_core_dev *mlx5_sd_primary_get_peer(struct mlx5_core_dev *primary, int idx);
+int mlx5_sd_ch_ix_get_dev_ix(struct mlx5_core_dev *dev, int ch_ix);
+int mlx5_sd_ch_ix_get_vec_ix(struct mlx5_core_dev *dev, int ch_ix);
+struct mlx5_core_dev *mlx5_sd_ch_ix_get_dev(struct mlx5_core_dev *primary, int ch_ix);
+struct auxiliary_device *mlx5_sd_get_adev(struct mlx5_core_dev *dev,
+					  struct auxiliary_device *adev,
+					  int idx);
+
+int mlx5_sd_init(struct mlx5_core_dev *dev);
+void mlx5_sd_cleanup(struct mlx5_core_dev *dev);
+
+#define mlx5_sd_for_each_dev_from_to(i, primary, ix_from, to, pos)	\
+	for (i = ix_from;							\
+	     (pos = mlx5_sd_primary_get_peer(primary, i)) && pos != (to); i++)
+
+#define mlx5_sd_for_each_dev(i, primary, pos)				\
+	mlx5_sd_for_each_dev_from_to(i, primary, 0, NULL, pos)
+
+#define mlx5_sd_for_each_dev_to(i, primary, to, pos)			\
+	mlx5_sd_for_each_dev_from_to(i, primary, 0, to, pos)
+
+#define mlx5_sd_for_each_secondary(i, primary, pos)			\
+	mlx5_sd_for_each_dev_from_to(i, primary, 1, NULL, pos)
+
+#define mlx5_sd_for_each_secondary_to(i, primary, to, pos)		\
+	mlx5_sd_for_each_dev_from_to(i, primary, 1, to, pos)
+
+#endif /* __MLX5_LIB_SD_H__ */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [net-next 04/15] net/mlx5: SD, Implement basic query and instantiation
  2023-12-21  0:57 [pull request][net-next 00/15] mlx5 updates 2023-12-20 Saeed Mahameed
                   ` (2 preceding siblings ...)
  2023-12-21  0:57 ` [net-next 03/15] net/mlx5: SD, Introduce SD lib Saeed Mahameed
@ 2023-12-21  0:57 ` Saeed Mahameed
  2024-01-05 12:15   ` Jiri Pirko
  2023-12-21  0:57 ` [net-next 05/15] net/mlx5: SD, Implement devcom communication and primary election Saeed Mahameed
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 45+ messages in thread
From: Saeed Mahameed @ 2023-12-21  0:57 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan

From: Tariq Toukan <tariqt@nvidia.com>

Add implementation for querying the MPIR register for Socket-Direct
attributes, and instantiating an SD struct accordingly.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/lib/sd.c  | 107 +++++++++++++++++-
 1 file changed, 106 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
index ea37238c4519..9d8b1bb0c0a6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
@@ -3,6 +3,8 @@
 
 #include "lib/sd.h"
 #include "mlx5_core.h"
+#include "lib/mlx5.h"
+#include <linux/mlx5/vport.h>
 
 #define sd_info(__dev, format, ...) \
 	dev_info((__dev)->device, "Socket-Direct: " format, ##__VA_ARGS__)
@@ -10,11 +12,18 @@
 	dev_warn((__dev)->device, "Socket-Direct: " format, ##__VA_ARGS__)
 
 struct mlx5_sd {
+	u32 group_id;
+	u8 host_buses;
 };
 
 static int mlx5_sd_get_host_buses(struct mlx5_core_dev *dev)
 {
-	return 1;
+	struct mlx5_sd *sd = mlx5_get_sd(dev);
+
+	if (!sd)
+		return 1;
+
+	return sd->host_buses;
 }
 
 struct mlx5_core_dev *
@@ -43,13 +52,109 @@ struct mlx5_core_dev *mlx5_sd_ch_ix_get_dev(struct mlx5_core_dev *primary, int c
 	return mlx5_sd_primary_get_peer(primary, mdev_idx);
 }
 
+static bool mlx5_sd_is_supported(struct mlx5_core_dev *dev, u8 host_buses)
+{
+	/* Feature is currently implemented for PFs only */
+	if (!mlx5_core_is_pf(dev))
+		return false;
+
+	/* Honor the SW implementation limit */
+	if (host_buses > MLX5_SD_MAX_GROUP_SZ)
+		return false;
+
+	return true;
+}
+
+static int mlx5_query_sd(struct mlx5_core_dev *dev, bool *sdm,
+			 u8 *host_buses, u8 *sd_group)
+{
+	u32 out[MLX5_ST_SZ_DW(mpir_reg)];
+	int err;
+
+	err = mlx5_query_mpir_reg(dev, out);
+	if (err)
+		return err;
+
+	err = mlx5_query_nic_vport_sd_group(dev, sd_group);
+	if (err)
+		return err;
+
+	*sdm = MLX5_GET(mpir_reg, out, sdm);
+	*host_buses = MLX5_GET(mpir_reg, out, host_buses);
+
+	return 0;
+}
+
+static u32 mlx5_sd_group_id(struct mlx5_core_dev *dev, u8 sd_group)
+{
+	return (u32)((MLX5_CAP_GEN(dev, native_port_num) << 8) | sd_group);
+}
+
+static int sd_init(struct mlx5_core_dev *dev)
+{
+	u8 host_buses, sd_group;
+	struct mlx5_sd *sd;
+	u32 group_id;
+	bool sdm;
+	int err;
+
+	err = mlx5_query_sd(dev, &sdm, &host_buses, &sd_group);
+	if (err)
+		return err;
+
+	if (!sdm)
+		return 0;
+
+	if (!sd_group)
+		return 0;
+
+	group_id = mlx5_sd_group_id(dev, sd_group);
+
+	if (!mlx5_sd_is_supported(dev, host_buses)) {
+		sd_warn(dev, "can't support requested netdev combining for group id 0x%x, skipping\n",
+			group_id);
+		return 0;
+	}
+
+	sd = kzalloc(sizeof(*sd), GFP_KERNEL);
+	if (!sd)
+		return -ENOMEM;
+
+	sd->host_buses = host_buses;
+	sd->group_id = group_id;
+
+	mlx5_set_sd(dev, sd);
+
+	return 0;
+}
+
+static void sd_cleanup(struct mlx5_core_dev *dev)
+{
+	struct mlx5_sd *sd = mlx5_get_sd(dev);
+
+	mlx5_set_sd(dev, NULL);
+	kfree(sd);
+}
+
 int mlx5_sd_init(struct mlx5_core_dev *dev)
 {
+	int err;
+
+	err = sd_init(dev);
+	if (err)
+		return err;
+
 	return 0;
 }
 
 void mlx5_sd_cleanup(struct mlx5_core_dev *dev)
 {
+	struct mlx5_sd *sd = mlx5_get_sd(dev);
+
+	if (!sd)
+		return;
+
+	sd_cleanup(dev);
 }
 
 struct auxiliary_device *mlx5_sd_get_adev(struct mlx5_core_dev *dev,
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [net-next 05/15] net/mlx5: SD, Implement devcom communication and primary election
  2023-12-21  0:57 [pull request][net-next 00/15] mlx5 updates 2023-12-20 Saeed Mahameed
                   ` (3 preceding siblings ...)
  2023-12-21  0:57 ` [net-next 04/15] net/mlx5: SD, Implement basic query and instantiation Saeed Mahameed
@ 2023-12-21  0:57 ` Saeed Mahameed
  2023-12-21  0:57 ` [net-next 06/15] net/mlx5: SD, Implement steering for primary and secondaries Saeed Mahameed
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 45+ messages in thread
From: Saeed Mahameed @ 2023-12-21  0:57 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan

From: Tariq Toukan <tariqt@nvidia.com>

Use devcom to communicate between the different devices. Add a new
devcom component type for this.

Each device registers itself to the devcom component <SD, group ID>.
Once all devices of a component are registered, the component becomes
ready, and a primary device is elected.

In principle, any of the devices can act as the primary; they are all
capable, and a random election would have worked. However, we aim to
achieve predictability and consistency, hence each group always chooses
the same device, the one with the lowest PCI bus number, as primary.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../ethernet/mellanox/mlx5/core/lib/devcom.h  |   1 +
 .../net/ethernet/mellanox/mlx5/core/lib/sd.c  | 122 +++++++++++++++++-
 2 files changed, 121 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h
index ec32b686f586..d58032dd0df7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h
@@ -10,6 +10,7 @@ enum mlx5_devcom_component {
 	MLX5_DEVCOM_ESW_OFFLOADS,
 	MLX5_DEVCOM_MPV,
 	MLX5_DEVCOM_HCA_PORTS,
+	MLX5_DEVCOM_SD_GROUP,
 	MLX5_DEVCOM_NUM_COMPONENTS,
 };
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
index 9d8b1bb0c0a6..19e674dd1af7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
@@ -14,6 +14,16 @@
 struct mlx5_sd {
 	u32 group_id;
 	u8 host_buses;
+	struct mlx5_devcom_comp_dev *devcom;
+	bool primary;
+	union {
+		struct { /* primary */
+			struct mlx5_core_dev *secondaries[MLX5_SD_MAX_GROUP_SZ - 1];
+		};
+		struct { /* secondary */
+			struct mlx5_core_dev *primary_dev;
+		};
+	};
 };
 
 static int mlx5_sd_get_host_buses(struct mlx5_core_dev *dev)
@@ -26,13 +36,29 @@ static int mlx5_sd_get_host_buses(struct mlx5_core_dev *dev)
 	return sd->host_buses;
 }
 
+static struct mlx5_core_dev *mlx5_sd_get_primary(struct mlx5_core_dev *dev)
+{
+	struct mlx5_sd *sd = mlx5_get_sd(dev);
+
+	if (!sd)
+		return dev;
+
+	return sd->primary ? dev : sd->primary_dev;
+}
+
 struct mlx5_core_dev *
 mlx5_sd_primary_get_peer(struct mlx5_core_dev *primary, int idx)
 {
+	struct mlx5_sd *sd;
+
 	if (idx == 0)
 		return primary;
 
-	return NULL;
+	if (idx >= mlx5_sd_get_host_buses(primary))
+		return NULL;
+
+	sd = mlx5_get_sd(primary);
+	return sd->secondaries[idx - 1];
 }
 
 int mlx5_sd_ch_ix_get_dev_ix(struct mlx5_core_dev *dev, int ch_ix)
@@ -136,15 +162,93 @@ static void sd_cleanup(struct mlx5_core_dev *dev)
 	kfree(sd);
 }
 
+static int sd_register(struct mlx5_core_dev *dev)
+{
+	struct mlx5_devcom_comp_dev *devcom, *pos;
+	struct mlx5_core_dev *peer, *primary;
+	struct mlx5_sd *sd, *primary_sd;
+	int err, i;
+
+	sd = mlx5_get_sd(dev);
+	devcom = mlx5_devcom_register_component(dev->priv.devc, MLX5_DEVCOM_SD_GROUP,
+						sd->group_id, NULL, dev);
+	if (!devcom)
+		return -ENOMEM;
+
+	sd->devcom = devcom;
+
+	if (mlx5_devcom_comp_get_size(devcom) != sd->host_buses)
+		return 0;
+
+	mlx5_devcom_comp_lock(devcom);
+	mlx5_devcom_comp_set_ready(devcom, true);
+	mlx5_devcom_comp_unlock(devcom);
+
+	if (!mlx5_devcom_for_each_peer_begin(devcom)) {
+		err = -ENODEV;
+		goto err_devcom_unreg;
+	}
+
+	primary = dev;
+	mlx5_devcom_for_each_peer_entry(devcom, peer, pos)
+		if (peer->pdev->bus->number < primary->pdev->bus->number)
+			primary = peer;
+
+	primary_sd = mlx5_get_sd(primary);
+	primary_sd->primary = true;
+	i = 0;
+	/* loop the secondaries */
+	mlx5_devcom_for_each_peer_entry(primary_sd->devcom, peer, pos) {
+		struct mlx5_sd *peer_sd = mlx5_get_sd(peer);
+
+		primary_sd->secondaries[i++] = peer;
+		peer_sd->primary = false;
+		peer_sd->primary_dev = primary;
+	}
+
+	mlx5_devcom_for_each_peer_end(devcom);
+	return 0;
+
+err_devcom_unreg:
+	mlx5_devcom_comp_lock(sd->devcom);
+	mlx5_devcom_comp_set_ready(sd->devcom, false);
+	mlx5_devcom_comp_unlock(sd->devcom);
+	mlx5_devcom_unregister_component(sd->devcom);
+	return err;
+}
+
+static void sd_unregister(struct mlx5_core_dev *dev)
+{
+	struct mlx5_sd *sd = mlx5_get_sd(dev);
+
+	mlx5_devcom_comp_lock(sd->devcom);
+	mlx5_devcom_comp_set_ready(sd->devcom, false);
+	mlx5_devcom_comp_unlock(sd->devcom);
+	mlx5_devcom_unregister_component(sd->devcom);
+}
+
 int mlx5_sd_init(struct mlx5_core_dev *dev)
 {
+	struct mlx5_sd *sd = mlx5_get_sd(dev);
 	int err;
 
 	err = sd_init(dev);
 	if (err)
 		return err;
 
+	sd = mlx5_get_sd(dev);
+	if (!sd)
+		return 0;
+
+	err = sd_register(dev);
+	if (err)
+		goto err_sd_cleanup;
+
 	return 0;
+
+err_sd_cleanup:
+	sd_cleanup(dev);
+	return err;
 }
 
 void mlx5_sd_cleanup(struct mlx5_core_dev *dev)
@@ -154,6 +258,7 @@ void mlx5_sd_cleanup(struct mlx5_core_dev *dev)
 	if (!sd)
 		return;
 
+	sd_unregister(dev);
 	sd_cleanup(dev);
 }
 
@@ -161,5 +266,18 @@ struct auxiliary_device *mlx5_sd_get_adev(struct mlx5_core_dev *dev,
 					  struct auxiliary_device *adev,
 					  int idx)
 {
-	return adev;
+	struct mlx5_sd *sd = mlx5_get_sd(dev);
+	struct mlx5_core_dev *primary;
+
+	if (!sd)
+		return adev;
+
+	if (!mlx5_devcom_comp_is_ready(sd->devcom))
+		return NULL;
+
+	primary = mlx5_sd_get_primary(dev);
+	if (dev == primary)
+		return adev;
+
+	return &primary->priv.adev[idx]->adev;
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [net-next 06/15] net/mlx5: SD, Implement steering for primary and secondaries
  2023-12-21  0:57 [pull request][net-next 00/15] mlx5 updates 2023-12-20 Saeed Mahameed
                   ` (4 preceding siblings ...)
  2023-12-21  0:57 ` [net-next 05/15] net/mlx5: SD, Implement devcom communication and primary election Saeed Mahameed
@ 2023-12-21  0:57 ` Saeed Mahameed
  2023-12-21  0:57 ` [net-next 07/15] net/mlx5: SD, Add informative prints in kernel log Saeed Mahameed
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 45+ messages in thread
From: Saeed Mahameed @ 2023-12-21  0:57 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan

From: Tariq Toukan <tariqt@nvidia.com>

Implement the needed SD steering adjustments for the primary and
secondaries.

While the multiple SD devices are used to avoid cross-NUMA memory
access, at the chip level all traffic goes only through the primary
device. The secondaries are forced into silent mode, to guarantee they
are not involved in any unexpected ingress/egress traffic.

In RX, secondary devices will not have steering objects. Traffic will be
steered from the primary device to the RQs of a secondary device using
advanced cross-vhca RX steering capabilities.

In TX, the primary creates a new TX flow table, which is aliased by the
secondaries.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/lib/sd.c  | 185 +++++++++++++++++-
 1 file changed, 184 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
index 19e674dd1af7..3309f21d892e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
@@ -4,6 +4,7 @@
 #include "lib/sd.h"
 #include "mlx5_core.h"
 #include "lib/mlx5.h"
+#include "fs_cmd.h"
 #include <linux/mlx5/vport.h>
 
 #define sd_info(__dev, format, ...) \
@@ -19,9 +20,11 @@ struct mlx5_sd {
 	union {
 		struct { /* primary */
 			struct mlx5_core_dev *secondaries[MLX5_SD_MAX_GROUP_SZ - 1];
+			struct mlx5_flow_table *tx_ft;
 		};
 		struct { /* secondary */
 			struct mlx5_core_dev *primary_dev;
+			u32 alias_obj_id;
 		};
 	};
 };
@@ -78,6 +81,21 @@ struct mlx5_core_dev *mlx5_sd_ch_ix_get_dev(struct mlx5_core_dev *primary, int c
 	return mlx5_sd_primary_get_peer(primary, mdev_idx);
 }
 
+static bool ft_create_alias_supported(struct mlx5_core_dev *dev)
+{
+	u64 obj_allowed = MLX5_CAP_GEN_2_64(dev, allowed_object_for_other_vhca_access);
+	u32 obj_supp = MLX5_CAP_GEN_2(dev, cross_vhca_object_to_object_supported);
+
+	if (!(obj_supp &
+	    MLX5_CROSS_VHCA_OBJ_TO_OBJ_SUPPORTED_LOCAL_FLOW_TABLE_ROOT_TO_REMOTE_FLOW_TABLE))
+		return false;
+
+	if (!(obj_allowed & MLX5_ALLOWED_OBJ_FOR_OTHER_VHCA_ACCESS_FLOW_TABLE))
+		return false;
+
+	return true;
+}
+
 static bool mlx5_sd_is_supported(struct mlx5_core_dev *dev, u8 host_buses)
 {
 	/* Feature is currently implemented for PFs only */
@@ -88,6 +106,24 @@ static bool mlx5_sd_is_supported(struct mlx5_core_dev *dev, u8 host_buses)
 	if (host_buses > MLX5_SD_MAX_GROUP_SZ)
 		return false;
 
+	/* Disconnect secondaries from the network */
+	if (!MLX5_CAP_GEN(dev, eswitch_manager))
+		return false;
+	if (!MLX5_CAP_GEN(dev, silent_mode))
+		return false;
+
+	/* RX steering from primary to secondaries */
+	if (!MLX5_CAP_GEN(dev, cross_vhca_rqt))
+		return false;
+	if (host_buses > MLX5_CAP_GEN_2(dev, max_rqt_vhca_id))
+		return false;
+
+	/* TX steering from secondaries to primary */
+	if (!ft_create_alias_supported(dev))
+		return false;
+	if (!MLX5_CAP_FLOWTABLE_NIC_TX(dev, reset_root_to_default))
+		return false;
+
 	return true;
 }
 
@@ -227,10 +263,122 @@ static void sd_unregister(struct mlx5_core_dev *dev)
 	mlx5_devcom_unregister_component(sd->devcom);
 }
 
+static int sd_cmd_set_primary(struct mlx5_core_dev *primary, u8 *alias_key)
+{
+	struct mlx5_cmd_allow_other_vhca_access_attr allow_attr = {};
+	struct mlx5_sd *sd = mlx5_get_sd(primary);
+	struct mlx5_flow_table_attr ft_attr = {};
+	struct mlx5_flow_namespace *nic_ns;
+	struct mlx5_flow_table *ft;
+	int err;
+
+	nic_ns = mlx5_get_flow_namespace(primary, MLX5_FLOW_NAMESPACE_EGRESS);
+	if (!nic_ns)
+		return -EOPNOTSUPP;
+
+	ft = mlx5_create_flow_table(nic_ns, &ft_attr);
+	if (IS_ERR(ft)) {
+		err = PTR_ERR(ft);
+		return err;
+	}
+	sd->tx_ft = ft;
+	memcpy(allow_attr.access_key, alias_key, ACCESS_KEY_LEN);
+	allow_attr.obj_type = MLX5_GENERAL_OBJECT_TYPES_FLOW_TABLE_ALIAS;
+	allow_attr.obj_id = (ft->type << FT_ID_FT_TYPE_OFFSET) | ft->id;
+
+	err = mlx5_cmd_allow_other_vhca_access(primary, &allow_attr);
+	if (err) {
+		mlx5_core_err(primary, "Failed to allow other vhca access err=%d\n",
+			      err);
+		mlx5_destroy_flow_table(ft);
+		return err;
+	}
+
+	return 0;
+}
+
+static void sd_cmd_unset_primary(struct mlx5_core_dev *primary)
+{
+	struct mlx5_sd *sd = mlx5_get_sd(primary);
+
+	mlx5_destroy_flow_table(sd->tx_ft);
+}
+
+static int sd_secondary_create_alias_ft(struct mlx5_core_dev *secondary,
+					struct mlx5_core_dev *primary,
+					struct mlx5_flow_table *ft,
+					u32 *obj_id, u8 *alias_key)
+{
+	u32 aliased_object_id = (ft->type << FT_ID_FT_TYPE_OFFSET) | ft->id;
+	u16 vhca_id_to_be_accessed = MLX5_CAP_GEN(primary, vhca_id);
+	struct mlx5_cmd_alias_obj_create_attr alias_attr = {};
+	int ret;
+
+	memcpy(alias_attr.access_key, alias_key, ACCESS_KEY_LEN);
+	alias_attr.obj_id = aliased_object_id;
+	alias_attr.obj_type = MLX5_GENERAL_OBJECT_TYPES_FLOW_TABLE_ALIAS;
+	alias_attr.vhca_id = vhca_id_to_be_accessed;
+	ret = mlx5_cmd_alias_obj_create(secondary, &alias_attr, obj_id);
+	if (ret) {
+		mlx5_core_err(secondary, "Failed to create alias object err=%d\n",
+			      ret);
+		return ret;
+	}
+
+	return 0;
+}
+
+static void sd_secondary_destroy_alias_ft(struct mlx5_core_dev *secondary)
+{
+	struct mlx5_sd *sd = mlx5_get_sd(secondary);
+
+	mlx5_cmd_alias_obj_destroy(secondary, sd->alias_obj_id,
+				   MLX5_GENERAL_OBJECT_TYPES_FLOW_TABLE_ALIAS);
+}
+
+static int sd_cmd_set_secondary(struct mlx5_core_dev *secondary,
+				struct mlx5_core_dev *primary,
+				u8 *alias_key)
+{
+	struct mlx5_sd *primary_sd = mlx5_get_sd(primary);
+	struct mlx5_sd *sd = mlx5_get_sd(secondary);
+	int err;
+
+	err = mlx5_fs_cmd_set_l2table_entry_silent(secondary, 1);
+	if (err)
+		return err;
+
+	err = sd_secondary_create_alias_ft(secondary, primary, primary_sd->tx_ft,
+					   &sd->alias_obj_id, alias_key);
+	if (err)
+		goto err_unset_silent;
+
+	err = mlx5_fs_cmd_set_tx_flow_table_root(secondary, sd->alias_obj_id, false);
+	if (err)
+		goto err_destroy_alias_ft;
+
+	return 0;
+
+err_destroy_alias_ft:
+	sd_secondary_destroy_alias_ft(secondary);
+err_unset_silent:
+	mlx5_fs_cmd_set_l2table_entry_silent(secondary, 0);
+	return err;
+}
+
+static void sd_cmd_unset_secondary(struct mlx5_core_dev *secondary)
+{
+	mlx5_fs_cmd_set_tx_flow_table_root(secondary, 0, true);
+	sd_secondary_destroy_alias_ft(secondary);
+	mlx5_fs_cmd_set_l2table_entry_silent(secondary, 0);
+}
+
 int mlx5_sd_init(struct mlx5_core_dev *dev)
 {
+	struct mlx5_core_dev *primary, *pos, *to;
 	struct mlx5_sd *sd = mlx5_get_sd(dev);
-	int err;
+	u8 alias_key[ACCESS_KEY_LEN];
+	int err, i;
 
 	err = sd_init(dev);
 	if (err)
@@ -244,8 +392,33 @@ int mlx5_sd_init(struct mlx5_core_dev *dev)
 	if (err)
 		goto err_sd_cleanup;
 
+	if (!mlx5_devcom_comp_is_ready(sd->devcom))
+		return 0;
+
+	primary = mlx5_sd_get_primary(dev);
+
+	for (i = 0; i < ACCESS_KEY_LEN; i++)
+		alias_key[i] = get_random_u8();
+
+	err = sd_cmd_set_primary(primary, alias_key);
+	if (err)
+		goto err_sd_unregister;
+
+	mlx5_sd_for_each_secondary(i, primary, pos) {
+		err = sd_cmd_set_secondary(pos, primary, alias_key);
+		if (err)
+			goto err_unset_secondaries;
+	}
+
 	return 0;
 
+err_unset_secondaries:
+	to = pos;
+	mlx5_sd_for_each_secondary_to(i, primary, to, pos)
+		sd_cmd_unset_secondary(pos);
+	sd_cmd_unset_primary(primary);
+err_sd_unregister:
+	sd_unregister(dev);
 err_sd_cleanup:
 	sd_cleanup(dev);
 	return err;
@@ -254,10 +427,20 @@ int mlx5_sd_init(struct mlx5_core_dev *dev)
 void mlx5_sd_cleanup(struct mlx5_core_dev *dev)
 {
 	struct mlx5_sd *sd = mlx5_get_sd(dev);
+	struct mlx5_core_dev *primary, *pos;
+	int i;
 
 	if (!sd)
 		return;
 
+	if (!mlx5_devcom_comp_is_ready(sd->devcom))
+		goto out;
+
+	primary = mlx5_sd_get_primary(dev);
+	mlx5_sd_for_each_secondary(i, primary, pos)
+		sd_cmd_unset_secondary(pos);
+	sd_cmd_unset_primary(primary);
+out:
 	sd_unregister(dev);
 	sd_cleanup(dev);
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [net-next 07/15] net/mlx5: SD, Add informative prints in kernel log
  2023-12-21  0:57 [pull request][net-next 00/15] mlx5 updates 2023-12-20 Saeed Mahameed
                   ` (5 preceding siblings ...)
  2023-12-21  0:57 ` [net-next 06/15] net/mlx5: SD, Implement steering for primary and secondaries Saeed Mahameed
@ 2023-12-21  0:57 ` Saeed Mahameed
  2024-01-05 12:12   ` Jiri Pirko
  2023-12-21  0:57 ` [net-next 08/15] net/mlx5e: Create single netdev per SD group Saeed Mahameed
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 45+ messages in thread
From: Saeed Mahameed @ 2023-12-21  0:57 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan

From: Tariq Toukan <tariqt@nvidia.com>

Print to the kernel log when an SD group moves in or out of the ready state.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/lib/sd.c  | 21 +++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
index 3309f21d892e..f68942277c62 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
@@ -373,6 +373,21 @@ static void sd_cmd_unset_secondary(struct mlx5_core_dev *secondary)
 	mlx5_fs_cmd_set_l2table_entry_silent(secondary, 0);
 }
 
+static void sd_print_group(struct mlx5_core_dev *primary)
+{
+	struct mlx5_sd *sd = mlx5_get_sd(primary);
+	struct mlx5_core_dev *pos;
+	int i;
+
+	sd_info(primary, "group id %#x, primary %s, vhca %u\n",
+		sd->group_id, pci_name(primary->pdev),
+		MLX5_CAP_GEN(primary, vhca_id));
+	mlx5_sd_for_each_secondary(i, primary, pos)
+		sd_info(primary, "group id %#x, secondary#%d %s, vhca %u\n",
+			sd->group_id, i - 1, pci_name(pos->pdev),
+			MLX5_CAP_GEN(pos, vhca_id));
+}
+
 int mlx5_sd_init(struct mlx5_core_dev *dev)
 {
 	struct mlx5_core_dev *primary, *pos, *to;
@@ -410,6 +425,10 @@ int mlx5_sd_init(struct mlx5_core_dev *dev)
 			goto err_unset_secondaries;
 	}
 
+	sd_info(primary, "group id %#x, size %d, combined\n",
+		sd->group_id, mlx5_devcom_comp_get_size(sd->devcom));
+	sd_print_group(primary);
+
 	return 0;
 
 err_unset_secondaries:
@@ -440,6 +459,8 @@ void mlx5_sd_cleanup(struct mlx5_core_dev *dev)
 	mlx5_sd_for_each_secondary(i, primary, pos)
 		sd_cmd_unset_secondary(pos);
 	sd_cmd_unset_primary(primary);
+
+	sd_info(primary, "group id %#x, uncombined\n", sd->group_id);
 out:
 	sd_unregister(dev);
 	sd_cleanup(dev);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [net-next 08/15] net/mlx5e: Create single netdev per SD group
  2023-12-21  0:57 [pull request][net-next 00/15] mlx5 updates 2023-12-20 Saeed Mahameed
                   ` (6 preceding siblings ...)
  2023-12-21  0:57 ` [net-next 07/15] net/mlx5: SD, Add informative prints in kernel log Saeed Mahameed
@ 2023-12-21  0:57 ` Saeed Mahameed
  2024-01-08 13:36   ` Aishwarya TCV
  2023-12-21  0:57 ` [net-next 09/15] net/mlx5e: Create EN core HW resources for all secondary devices Saeed Mahameed
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 45+ messages in thread
From: Saeed Mahameed @ 2023-12-21  0:57 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman

From: Tariq Toukan <tariqt@nvidia.com>

Integrate the SD library calls into the auxiliary_driver ops in
preparation for creating a single netdev for the multiple devices
belonging to the same SD group.

SD is still disabled at this stage. It is enabled by a downstream patch
when all needed parts are implemented.

The netdev is created only when the SD group, with all its participants,
is ready. It is later destroyed if any of the participating devices is
removed.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 69 +++++++++++++++++--
 1 file changed, 62 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index c8e8f512803e..2c47c9076aa6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -70,6 +70,7 @@
 #include "qos.h"
 #include "en/trap.h"
 #include "lib/devcom.h"
+#include "lib/sd.h"
 
 bool mlx5e_check_fragmented_striding_rq_cap(struct mlx5_core_dev *mdev, u8 page_shift,
 					    enum mlx5e_mpwrq_umr_mode umr_mode)
@@ -5980,7 +5981,7 @@ void mlx5e_destroy_netdev(struct mlx5e_priv *priv)
 	free_netdev(netdev);
 }
 
-static int mlx5e_resume(struct auxiliary_device *adev)
+static int _mlx5e_resume(struct auxiliary_device *adev)
 {
 	struct mlx5_adev *edev = container_of(adev, struct mlx5_adev, adev);
 	struct mlx5e_dev *mlx5e_dev = auxiliary_get_drvdata(adev);
@@ -6005,6 +6006,23 @@ static int mlx5e_resume(struct auxiliary_device *adev)
 	return 0;
 }
 
+static int mlx5e_resume(struct auxiliary_device *adev)
+{
+	struct mlx5_adev *edev = container_of(adev, struct mlx5_adev, adev);
+	struct mlx5_core_dev *mdev = edev->mdev;
+	struct auxiliary_device *actual_adev;
+	int err;
+
+	err = mlx5_sd_init(mdev);
+	if (err)
+		return err;
+
+	actual_adev = mlx5_sd_get_adev(mdev, adev, edev->idx);
+	if (actual_adev)
+		return _mlx5e_resume(actual_adev);
+	return 0;
+}
+
 static int _mlx5e_suspend(struct auxiliary_device *adev)
 {
 	struct mlx5e_dev *mlx5e_dev = auxiliary_get_drvdata(adev);
@@ -6025,7 +6043,17 @@ static int _mlx5e_suspend(struct auxiliary_device *adev)
 
 static int mlx5e_suspend(struct auxiliary_device *adev, pm_message_t state)
 {
-	return _mlx5e_suspend(adev);
+	struct mlx5_adev *edev = container_of(adev, struct mlx5_adev, adev);
+	struct mlx5_core_dev *mdev = edev->mdev;
+	struct auxiliary_device *actual_adev;
+	int err = 0;
+
+	actual_adev = mlx5_sd_get_adev(mdev, adev, edev->idx);
+	if (actual_adev)
+		err = _mlx5e_suspend(actual_adev);
+
+	mlx5_sd_cleanup(mdev);
+	return err;
 }
 
 static int _mlx5e_probe(struct auxiliary_device *adev)
@@ -6071,9 +6099,9 @@ static int _mlx5e_probe(struct auxiliary_device *adev)
 		goto err_destroy_netdev;
 	}
 
-	err = mlx5e_resume(adev);
+	err = _mlx5e_resume(adev);
 	if (err) {
-		mlx5_core_err(mdev, "mlx5e_resume failed, %d\n", err);
+		mlx5_core_err(mdev, "_mlx5e_resume failed, %d\n", err);
 		goto err_profile_cleanup;
 	}
 
@@ -6104,15 +6132,29 @@ static int _mlx5e_probe(struct auxiliary_device *adev)
 static int mlx5e_probe(struct auxiliary_device *adev,
 		       const struct auxiliary_device_id *id)
 {
-	return _mlx5e_probe(adev);
+	struct mlx5_adev *edev = container_of(adev, struct mlx5_adev, adev);
+	struct mlx5_core_dev *mdev = edev->mdev;
+	struct auxiliary_device *actual_adev;
+	int err;
+
+	err = mlx5_sd_init(mdev);
+	if (err)
+		return err;
+
+	actual_adev = mlx5_sd_get_adev(mdev, adev, edev->idx);
+	if (actual_adev)
+		return _mlx5e_probe(actual_adev);
+	return 0;
 }
 
-static void mlx5e_remove(struct auxiliary_device *adev)
+static void _mlx5e_remove(struct auxiliary_device *adev)
 {
+	struct mlx5_adev *edev = container_of(adev, struct mlx5_adev, adev);
 	struct mlx5e_dev *mlx5e_dev = auxiliary_get_drvdata(adev);
 	struct mlx5e_priv *priv = mlx5e_dev->priv;
+	struct mlx5_core_dev *mdev = edev->mdev;
 
-	mlx5_core_uplink_netdev_set(priv->mdev, NULL);
+	mlx5_core_uplink_netdev_set(mdev, NULL);
 	mlx5e_dcbnl_delete_app(priv);
 	unregister_netdev(priv->netdev);
 	_mlx5e_suspend(adev);
@@ -6122,6 +6164,19 @@ static void mlx5e_remove(struct auxiliary_device *adev)
 	mlx5e_destroy_devlink(mlx5e_dev);
 }
 
+static void mlx5e_remove(struct auxiliary_device *adev)
+{
+	struct mlx5_adev *edev = container_of(adev, struct mlx5_adev, adev);
+	struct mlx5_core_dev *mdev = edev->mdev;
+	struct auxiliary_device *actual_adev;
+
+	actual_adev = mlx5_sd_get_adev(mdev, adev, edev->idx);
+	if (actual_adev)
+		_mlx5e_remove(actual_adev);
+
+	mlx5_sd_cleanup(mdev);
+}
+
 static const struct auxiliary_device_id mlx5e_id_table[] = {
 	{ .name = MLX5_ADEV_NAME ".eth", },
 	{},
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [net-next 09/15] net/mlx5e: Create EN core HW resources for all secondary devices
  2023-12-21  0:57 [pull request][net-next 00/15] mlx5 updates 2023-12-20 Saeed Mahameed
                   ` (7 preceding siblings ...)
  2023-12-21  0:57 ` [net-next 08/15] net/mlx5e: Create single netdev per SD group Saeed Mahameed
@ 2023-12-21  0:57 ` Saeed Mahameed
  2023-12-21  0:57 ` [net-next 10/15] net/mlx5e: Let channels be SD-aware Saeed Mahameed
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 45+ messages in thread
From: Saeed Mahameed @ 2023-12-21  0:57 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman

From: Tariq Toukan <tariqt@nvidia.com>

Traffic queues will be created on all devices, including the
secondaries. Create the needed core layer resources for them as well.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  1 +
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 32 +++++++++++++------
 2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 55c6ace0acd5..6c143088e247 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -60,6 +60,7 @@
 #include "lib/clock.h"
 #include "en/rx_res.h"
 #include "en/selq.h"
+#include "lib/sd.h"
 
 extern const struct net_device_ops mlx5e_netdev_ops;
 struct page_pool;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 2c47c9076aa6..90a02fd3357a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -5988,22 +5988,29 @@ static int _mlx5e_resume(struct auxiliary_device *adev)
 	struct mlx5e_priv *priv = mlx5e_dev->priv;
 	struct net_device *netdev = priv->netdev;
 	struct mlx5_core_dev *mdev = edev->mdev;
-	int err;
+	struct mlx5_core_dev *pos, *to;
+	int err, i;
 
 	if (netif_device_present(netdev))
 		return 0;
 
-	err = mlx5e_create_mdev_resources(mdev, true);
-	if (err)
-		return err;
+	mlx5_sd_for_each_dev(i, mdev, pos) {
+		err = mlx5e_create_mdev_resources(pos, true);
+		if (err)
+			goto err_destroy_mdev_res;
+	}
 
 	err = mlx5e_attach_netdev(priv);
-	if (err) {
-		mlx5e_destroy_mdev_resources(mdev);
-		return err;
-	}
+	if (err)
+		goto err_destroy_mdev_res;
 
 	return 0;
+
+err_destroy_mdev_res:
+	to = pos;
+	mlx5_sd_for_each_dev_to(i, mdev, to, pos)
+		mlx5e_destroy_mdev_resources(pos);
+	return err;
 }
 
 static int mlx5e_resume(struct auxiliary_device *adev)
@@ -6029,15 +6036,20 @@ static int _mlx5e_suspend(struct auxiliary_device *adev)
 	struct mlx5e_priv *priv = mlx5e_dev->priv;
 	struct net_device *netdev = priv->netdev;
 	struct mlx5_core_dev *mdev = priv->mdev;
+	struct mlx5_core_dev *pos;
+	int i;
 
 	if (!netif_device_present(netdev)) {
 		if (test_bit(MLX5E_STATE_DESTROYING, &priv->state))
-			mlx5e_destroy_mdev_resources(mdev);
+			mlx5_sd_for_each_dev(i, mdev, pos)
+				mlx5e_destroy_mdev_resources(pos);
 		return -ENODEV;
 	}
 
 	mlx5e_detach_netdev(priv);
-	mlx5e_destroy_mdev_resources(mdev);
+	mlx5_sd_for_each_dev(i, mdev, pos)
+		mlx5e_destroy_mdev_resources(pos);
+
 	return 0;
 }
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [net-next 10/15] net/mlx5e: Let channels be SD-aware
  2023-12-21  0:57 [pull request][net-next 00/15] mlx5 updates 2023-12-20 Saeed Mahameed
                   ` (8 preceding siblings ...)
  2023-12-21  0:57 ` [net-next 09/15] net/mlx5e: Create EN core HW resources for all secondary devices Saeed Mahameed
@ 2023-12-21  0:57 ` Saeed Mahameed
  2024-01-04 22:50   ` Jakub Kicinski
  2023-12-21  0:57 ` [net-next 11/15] net/mlx5e: Support cross-vhca RSS Saeed Mahameed
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 45+ messages in thread
From: Saeed Mahameed @ 2023-12-21  0:57 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman

From: Tariq Toukan <tariqt@nvidia.com>

Distribute the channels between the different SD devices to achieve
local NUMA node performance on multiple NUMA nodes.

Each channel works against one specific mdev, creating all datapath
queues against it.

We distribute channels to mdevs in a round-robin policy.

Example for 2 mdevs and 6 channels:
+-------+---------+
| ch ix | mdev ix |
+-------+---------+
|   0   |    0    |
|   1   |    1    |
|   2   |    0    |
|   3   |    1    |
|   4   |    0    |
|   5   |    1    |
+-------+---------+

This round-robin distribution policy is preferred over another suggested
intuitive distribution, in which we first distribute one half of the
channels to mdev #0 and then the second half to mdev #1.

We prefer round-robin for a reason: it is less influenced by changes in
the number of channels. The mapping between channel index and mdev is
fixed, no matter how many channels the user configures. As the channel
stats persist across channel closure, changing the mapping every single
time would make the accumulated stats less representative of the
channel's history.

Per-channel objects should stop using the primary mdev (priv->mdev)
directly, and instead move to using their own channel's mdev.
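
A sketch of the intended per-channel usage, based on the helpers from
lib/sd.h (the corresponding en_main.c hunk appears further down in this
patch; the snippet below is illustrative only):

	/* e.g. in mlx5e_open_channel(): resolve this channel's own mdev */
	struct mlx5_core_dev *mdev = mlx5_sd_ch_ix_get_dev(priv->mdev, ix);

	c->mdev   = mdev;				/* datapath queues are created on it */
	c->vec_ix = mlx5_sd_ch_ix_get_vec_ix(mdev, ix);	/* index within this mdev's vectors */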

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  1 +
 .../ethernet/mellanox/mlx5/core/en/params.c   |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en/qos.c  |  8 ++---
 .../mellanox/mlx5/core/en/reporter_rx.c       |  4 +--
 .../mellanox/mlx5/core/en/reporter_tx.c       |  3 +-
 .../ethernet/mellanox/mlx5/core/en/xsk/pool.c |  6 ++--
 .../mellanox/mlx5/core/en_accel/ktls_rx.c     |  6 ++--
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 32 ++++++++++++-------
 8 files changed, 35 insertions(+), 27 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 6c143088e247..f6e78c465c7a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -792,6 +792,7 @@ struct mlx5e_channel {
 	struct hwtstamp_config    *tstamp;
 	DECLARE_BITMAP(state, MLX5E_CHANNEL_NUM_STATES);
 	int                        ix;
+	int                        vec_ix;
 	int                        cpu;
 	/* Sync between icosq recovery and XSK enable/disable. */
 	struct mutex               icosq_recovery_lock;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
index 284253b79266..18f0cedc8610 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
@@ -674,7 +674,7 @@ void mlx5e_build_create_cq_param(struct mlx5e_create_cq_param *ccp, struct mlx5e
 		.napi = &c->napi,
 		.ch_stats = c->stats,
 		.node = cpu_to_node(c->cpu),
-		.ix = c->ix,
+		.ix = c->vec_ix,
 	};
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c
index 34adf8c3f81a..e87e26f2c669 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c
@@ -122,8 +122,8 @@ int mlx5e_open_qos_sq(struct mlx5e_priv *priv, struct mlx5e_channels *chs,
 
 	memset(&param_sq, 0, sizeof(param_sq));
 	memset(&param_cq, 0, sizeof(param_cq));
-	mlx5e_build_sq_param(priv->mdev, params, &param_sq);
-	mlx5e_build_tx_cq_param(priv->mdev, params, &param_cq);
+	mlx5e_build_sq_param(c->mdev, params, &param_sq);
+	mlx5e_build_tx_cq_param(c->mdev, params, &param_cq);
 	err = mlx5e_open_cq(c->mdev, params->tx_cq_moderation, &param_cq, &ccp, &sq->cq);
 	if (err)
 		goto err_free_sq;
@@ -176,7 +176,7 @@ int mlx5e_activate_qos_sq(void *data, u16 node_qid, u32 hw_id)
 	 */
 	smp_wmb();
 
-	qos_dbg(priv->mdev, "Activate QoS SQ qid %u\n", node_qid);
+	qos_dbg(sq->mdev, "Activate QoS SQ qid %u\n", node_qid);
 	mlx5e_activate_txqsq(sq);
 
 	return 0;
@@ -190,7 +190,7 @@ void mlx5e_deactivate_qos_sq(struct mlx5e_priv *priv, u16 qid)
 	if (!sq) /* Handle the case when the SQ failed to open. */
 		return;
 
-	qos_dbg(priv->mdev, "Deactivate QoS SQ qid %u\n", qid);
+	qos_dbg(sq->mdev, "Deactivate QoS SQ qid %u\n", qid);
 	mlx5e_deactivate_txqsq(sq);
 
 	priv->txq2sq[mlx5e_qid_from_qos(&priv->channels, qid)] = NULL;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
index 4358798d6ce1..25d751eba99b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
@@ -294,8 +294,8 @@ static void mlx5e_rx_reporter_diagnose_generic_rq(struct mlx5e_rq *rq,
 
 	params = &priv->channels.params;
 	rq_sz = mlx5e_rqwq_get_size(rq);
-	real_time =  mlx5_is_real_time_rq(priv->mdev);
-	rq_stride = BIT(mlx5e_mpwqe_get_log_stride_size(priv->mdev, params, NULL));
+	real_time =  mlx5_is_real_time_rq(rq->mdev);
+	rq_stride = BIT(mlx5e_mpwqe_get_log_stride_size(rq->mdev, params, NULL));
 
 	mlx5e_health_fmsg_named_obj_nest_start(fmsg, "RQ");
 	devlink_fmsg_u8_pair_put(fmsg, "type", params->rq_wq_type);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index 6b44ddce14e9..0ab9db319530 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -219,7 +219,6 @@ mlx5e_tx_reporter_build_diagnose_output_sq_common(struct devlink_fmsg *fmsg,
 						  struct mlx5e_txqsq *sq, int tc)
 {
 	bool stopped = netif_xmit_stopped(sq->txq);
-	struct mlx5e_priv *priv = sq->priv;
 	u8 state;
 	int err;
 
@@ -227,7 +226,7 @@ mlx5e_tx_reporter_build_diagnose_output_sq_common(struct devlink_fmsg *fmsg,
 	devlink_fmsg_u32_pair_put(fmsg, "txq ix", sq->txq_ix);
 	devlink_fmsg_u32_pair_put(fmsg, "sqn", sq->sqn);
 
-	err = mlx5_core_query_sq_state(priv->mdev, sq->sqn, &state);
+	err = mlx5_core_query_sq_state(sq->mdev, sq->sqn, &state);
 	if (!err)
 		devlink_fmsg_u8_pair_put(fmsg, "HW state", state);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c
index ebada0c5af3c..db776e515b6a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c
@@ -6,10 +6,10 @@
 #include "setup.h"
 #include "en/params.h"
 
-static int mlx5e_xsk_map_pool(struct mlx5e_priv *priv,
+static int mlx5e_xsk_map_pool(struct mlx5_core_dev *mdev,
 			      struct xsk_buff_pool *pool)
 {
-	struct device *dev = mlx5_core_dma_dev(priv->mdev);
+	struct device *dev = mlx5_core_dma_dev(mdev);
 
 	return xsk_pool_dma_map(pool, dev, DMA_ATTR_SKIP_CPU_SYNC);
 }
@@ -89,7 +89,7 @@ static int mlx5e_xsk_enable_locked(struct mlx5e_priv *priv,
 	if (unlikely(!mlx5e_xsk_is_pool_sane(pool)))
 		return -EINVAL;
 
-	err = mlx5e_xsk_map_pool(priv, pool);
+	err = mlx5e_xsk_map_pool(mlx5_sd_ch_ix_get_dev(priv->mdev, ix), pool);
 	if (unlikely(err))
 		return err;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c
index 9b597cb24598..65ccb33edafb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c
@@ -267,7 +267,7 @@ resync_post_get_progress_params(struct mlx5e_icosq *sq,
 		goto err_out;
 	}
 
-	pdev = mlx5_core_dma_dev(sq->channel->priv->mdev);
+	pdev = mlx5_core_dma_dev(sq->channel->mdev);
 	buf->dma_addr = dma_map_single(pdev, &buf->progress,
 				       PROGRESS_PARAMS_PADDED_SIZE, DMA_FROM_DEVICE);
 	if (unlikely(dma_mapping_error(pdev, buf->dma_addr))) {
@@ -425,14 +425,12 @@ void mlx5e_ktls_handle_get_psv_completion(struct mlx5e_icosq_wqe_info *wi,
 {
 	struct mlx5e_ktls_rx_resync_buf *buf = wi->tls_get_params.buf;
 	struct mlx5e_ktls_offload_context_rx *priv_rx;
-	struct mlx5e_ktls_rx_resync_ctx *resync;
 	u8 tracker_state, auth_state, *ctx;
 	struct device *dev;
 	u32 hw_seq;
 
 	priv_rx = buf->priv_rx;
-	resync = &priv_rx->resync;
-	dev = mlx5_core_dma_dev(resync->priv->mdev);
+	dev = mlx5_core_dma_dev(sq->channel->mdev);
 	if (unlikely(test_bit(MLX5E_PRIV_RX_FLAG_DELETING, priv_rx->flags)))
 		goto out;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 90a02fd3357a..8dac57282f1c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2527,14 +2527,20 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
 			      struct xsk_buff_pool *xsk_pool,
 			      struct mlx5e_channel **cp)
 {
-	int cpu = mlx5_comp_vector_get_cpu(priv->mdev, ix);
 	struct net_device *netdev = priv->netdev;
+	struct mlx5_core_dev *mdev;
 	struct mlx5e_xsk_param xsk;
 	struct mlx5e_channel *c;
 	unsigned int irq;
+	int vec_ix;
+	int cpu;
 	int err;
 
-	err = mlx5_comp_irqn_get(priv->mdev, ix, &irq);
+	mdev = mlx5_sd_ch_ix_get_dev(priv->mdev, ix);
+	vec_ix = mlx5_sd_ch_ix_get_vec_ix(mdev, ix);
+	cpu = mlx5_comp_vector_get_cpu(mdev, vec_ix);
+
+	err = mlx5_comp_irqn_get(mdev, vec_ix, &irq);
 	if (err)
 		return err;
 
@@ -2547,18 +2553,19 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
 		return -ENOMEM;
 
 	c->priv     = priv;
-	c->mdev     = priv->mdev;
+	c->mdev     = mdev;
 	c->tstamp   = &priv->tstamp;
 	c->ix       = ix;
+	c->vec_ix   = vec_ix;
 	c->cpu      = cpu;
-	c->pdev     = mlx5_core_dma_dev(priv->mdev);
+	c->pdev     = mlx5_core_dma_dev(mdev);
 	c->netdev   = priv->netdev;
-	c->mkey_be  = cpu_to_be32(priv->mdev->mlx5e_res.hw_objs.mkey);
+	c->mkey_be  = cpu_to_be32(mdev->mlx5e_res.hw_objs.mkey);
 	c->num_tc   = mlx5e_get_dcb_num_tc(params);
 	c->xdp      = !!params->xdp_prog;
 	c->stats    = &priv->channel_stats[ix]->ch;
 	c->aff_mask = irq_get_effective_affinity_mask(irq);
-	c->lag_port = mlx5e_enumerate_lag_port(priv->mdev, ix);
+	c->lag_port = mlx5e_enumerate_lag_port(mdev, ix);
 
 	netif_napi_add(netdev, &c->napi, mlx5e_napi_poll);
 
@@ -2936,15 +2943,18 @@ static MLX5E_DEFINE_PREACTIVATE_WRAPPER_CTX(mlx5e_update_netdev_queues);
 static void mlx5e_set_default_xps_cpumasks(struct mlx5e_priv *priv,
 					   struct mlx5e_params *params)
 {
-	struct mlx5_core_dev *mdev = priv->mdev;
-	int num_comp_vectors, ix, irq;
-
-	num_comp_vectors = mlx5_comp_vectors_max(mdev);
+	int ix;
 
 	for (ix = 0; ix < params->num_channels; ix++) {
+		int num_comp_vectors, irq, vec_ix;
+		struct mlx5_core_dev *mdev;
+
+		mdev = mlx5_sd_ch_ix_get_dev(priv->mdev, ix);
+		num_comp_vectors = mlx5_comp_vectors_max(mdev);
 		cpumask_clear(priv->scratchpad.cpumask);
+		vec_ix = mlx5_sd_ch_ix_get_vec_ix(mdev, ix);
 
-		for (irq = ix; irq < num_comp_vectors; irq += params->num_channels) {
+		for (irq = vec_ix; irq < num_comp_vectors; irq += params->num_channels) {
 			int cpu = mlx5_comp_vector_get_cpu(mdev, irq);
 
 			cpumask_set_cpu(cpu, priv->scratchpad.cpumask);
-- 
2.43.0



* [net-next 11/15] net/mlx5e: Support cross-vhca RSS
  2023-12-21  0:57 [pull request][net-next 00/15] mlx5 updates 2023-12-20 Saeed Mahameed
                   ` (9 preceding siblings ...)
  2023-12-21  0:57 ` [net-next 10/15] net/mlx5e: Let channels be SD-aware Saeed Mahameed
@ 2023-12-21  0:57 ` Saeed Mahameed
  2023-12-21  0:57 ` [net-next 12/15] net/mlx5e: Support per-mdev queue counter Saeed Mahameed
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 45+ messages in thread
From: Saeed Mahameed @ 2023-12-21  0:57 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman

From: Tariq Toukan <tariqt@nvidia.com>

Implement driver support for the HW feature that allows RX steering of
one device to target another device's RQs.

In SD multi-mdev netdev mode, we set the secondaries into silent mode,
disconnecting them from the network. This feature is then used to steer
traffic from the primary to the secondaries.
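
A rough sketch of the idea (illustrative only; the struct and function
names below are not from the patch): when the RQT serves a multi-vhca
group, each entry pairs the RQ number with the vhca_id of the device
that owns it, so the primary's steering can land traffic on a
secondary's RQ.

/* Illustrative sketch: build (rqn, vhca_id) pairs for the channels
 * of an SD group.
 */
struct example_rqt_entry {
	u32 rqn;	/* RQ number on the owning device */
	u32 vhca_id;	/* vhca of the device that owns the RQ */
};

static void example_fill_rqt_entries(struct example_rqt_entry *ent,
				     struct mlx5e_channel **chs, int nch)
{
	int i;

	for (i = 0; i < nch; i++) {
		ent[i].rqn = chs[i]->rq.rqn;
		ent[i].vhca_id = MLX5_CAP_GEN(chs[i]->mdev, vhca_id);
	}
}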

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../ethernet/mellanox/mlx5/core/en/channels.c |  10 +-
 .../ethernet/mellanox/mlx5/core/en/channels.h |   6 +-
 .../net/ethernet/mellanox/mlx5/core/en/rqt.c  | 123 ++++++++++++++----
 .../net/ethernet/mellanox/mlx5/core/en/rqt.h  |   9 +-
 .../net/ethernet/mellanox/mlx5/core/en/rss.c  |  17 +--
 .../net/ethernet/mellanox/mlx5/core/en/rss.h  |   4 +-
 .../ethernet/mellanox/mlx5/core/en/rx_res.c   |  62 ++++++---
 .../ethernet/mellanox/mlx5/core/en/rx_res.h   |   1 +
 .../net/ethernet/mellanox/mlx5/core/en_main.c |   2 +
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   |   2 +-
 10 files changed, 179 insertions(+), 57 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/channels.c b/drivers/net/ethernet/mellanox/mlx5/core/en/channels.c
index 48581ea3adcb..874a1016623c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/channels.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/channels.c
@@ -23,20 +23,26 @@ bool mlx5e_channels_is_xsk(struct mlx5e_channels *chs, unsigned int ix)
 	return test_bit(MLX5E_CHANNEL_STATE_XSK, c->state);
 }
 
-void mlx5e_channels_get_regular_rqn(struct mlx5e_channels *chs, unsigned int ix, u32 *rqn)
+void mlx5e_channels_get_regular_rqn(struct mlx5e_channels *chs, unsigned int ix, u32 *rqn,
+				    u32 *vhca_id)
 {
 	struct mlx5e_channel *c = mlx5e_channels_get(chs, ix);
 
 	*rqn = c->rq.rqn;
+	if (vhca_id)
+		*vhca_id = MLX5_CAP_GEN(c->mdev, vhca_id);
 }
 
-void mlx5e_channels_get_xsk_rqn(struct mlx5e_channels *chs, unsigned int ix, u32 *rqn)
+void mlx5e_channels_get_xsk_rqn(struct mlx5e_channels *chs, unsigned int ix, u32 *rqn,
+				u32 *vhca_id)
 {
 	struct mlx5e_channel *c = mlx5e_channels_get(chs, ix);
 
 	WARN_ON_ONCE(!test_bit(MLX5E_CHANNEL_STATE_XSK, c->state));
 
 	*rqn = c->xskrq.rqn;
+	if (vhca_id)
+		*vhca_id = MLX5_CAP_GEN(c->mdev, vhca_id);
 }
 
 bool mlx5e_channels_get_ptp_rqn(struct mlx5e_channels *chs, u32 *rqn)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/channels.h b/drivers/net/ethernet/mellanox/mlx5/core/en/channels.h
index 637ca90daaa8..6715aa9383b9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/channels.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/channels.h
@@ -10,8 +10,10 @@ struct mlx5e_channels;
 
 unsigned int mlx5e_channels_get_num(struct mlx5e_channels *chs);
 bool mlx5e_channels_is_xsk(struct mlx5e_channels *chs, unsigned int ix);
-void mlx5e_channels_get_regular_rqn(struct mlx5e_channels *chs, unsigned int ix, u32 *rqn);
-void mlx5e_channels_get_xsk_rqn(struct mlx5e_channels *chs, unsigned int ix, u32 *rqn);
+void mlx5e_channels_get_regular_rqn(struct mlx5e_channels *chs, unsigned int ix, u32 *rqn,
+				    u32 *vhca_id);
+void mlx5e_channels_get_xsk_rqn(struct mlx5e_channels *chs, unsigned int ix, u32 *rqn,
+				u32 *vhca_id);
 bool mlx5e_channels_get_ptp_rqn(struct mlx5e_channels *chs, u32 *rqn);
 
 #endif /* __MLX5_EN_CHANNELS_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rqt.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rqt.c
index 7b8ff7a71003..bcafb4bf9415 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rqt.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rqt.c
@@ -4,6 +4,33 @@
 #include "rqt.h"
 #include <linux/mlx5/transobj.h>
 
+static bool verify_num_vhca_ids(struct mlx5_core_dev *mdev, u32 *vhca_ids,
+				unsigned int size)
+{
+	unsigned int max_num_vhca_id = MLX5_CAP_GEN_2(mdev, max_rqt_vhca_id);
+	int i;
+
+	/* Verify that all vhca_ids are in range [0, max_num_vhca_ids - 1] */
+	for (i = 0; i < size; i++)
+		if (vhca_ids[i] >= max_num_vhca_id)
+			return false;
+	return true;
+}
+
+static bool rqt_verify_vhca_ids(struct mlx5_core_dev *mdev, u32 *vhca_ids,
+				unsigned int size)
+{
+	if (!vhca_ids)
+		return true;
+
+	if (!MLX5_CAP_GEN(mdev, cross_vhca_rqt))
+		return false;
+	if (!verify_num_vhca_ids(mdev, vhca_ids, size))
+		return false;
+
+	return true;
+}
+
 void mlx5e_rss_params_indir_init_uniform(struct mlx5e_rss_params_indir *indir,
 					 unsigned int num_channels)
 {
@@ -13,19 +40,38 @@ void mlx5e_rss_params_indir_init_uniform(struct mlx5e_rss_params_indir *indir,
 		indir->table[i] = i % num_channels;
 }
 
+static void fill_rqn_list(void *rqtc, u32 *rqns, u32 *vhca_ids, unsigned int size)
+{
+	unsigned int i;
+
+	if (vhca_ids) {
+		MLX5_SET(rqtc, rqtc, rq_vhca_id_format, 1);
+		for (i = 0; i < size; i++) {
+			MLX5_SET(rqtc, rqtc, rq_vhca[i].rq_num, rqns[i]);
+			MLX5_SET(rqtc, rqtc, rq_vhca[i].rq_vhca_id, vhca_ids[i]);
+		}
+	} else {
+		for (i = 0; i < size; i++)
+			MLX5_SET(rqtc, rqtc, rq_num[i], rqns[i]);
+	}
+}
 static int mlx5e_rqt_init(struct mlx5e_rqt *rqt, struct mlx5_core_dev *mdev,
-			  u16 max_size, u32 *init_rqns, u16 init_size)
+			  u16 max_size, u32 *init_rqns, u32 *init_vhca_ids, u16 init_size)
 {
+	int entry_sz;
 	void *rqtc;
 	int inlen;
 	int err;
 	u32 *in;
-	int i;
+
+	if (!rqt_verify_vhca_ids(mdev, init_vhca_ids, init_size))
+		return -EOPNOTSUPP;
 
 	rqt->mdev = mdev;
 	rqt->size = max_size;
 
-	inlen = MLX5_ST_SZ_BYTES(create_rqt_in) + sizeof(u32) * init_size;
+	entry_sz = init_vhca_ids ? MLX5_ST_SZ_BYTES(rq_vhca) : MLX5_ST_SZ_BYTES(rq_num);
+	inlen = MLX5_ST_SZ_BYTES(create_rqt_in) + entry_sz * init_size;
 	in = kvzalloc(inlen, GFP_KERNEL);
 	if (!in)
 		return -ENOMEM;
@@ -33,10 +79,9 @@ static int mlx5e_rqt_init(struct mlx5e_rqt *rqt, struct mlx5_core_dev *mdev,
 	rqtc = MLX5_ADDR_OF(create_rqt_in, in, rqt_context);
 
 	MLX5_SET(rqtc, rqtc, rqt_max_size, rqt->size);
-
 	MLX5_SET(rqtc, rqtc, rqt_actual_size, init_size);
-	for (i = 0; i < init_size; i++)
-		MLX5_SET(rqtc, rqtc, rq_num[i], init_rqns[i]);
+
+	fill_rqn_list(rqtc, init_rqns, init_vhca_ids, init_size);
 
 	err = mlx5_core_create_rqt(rqt->mdev, in, inlen, &rqt->rqtn);
 
@@ -49,7 +94,7 @@ int mlx5e_rqt_init_direct(struct mlx5e_rqt *rqt, struct mlx5_core_dev *mdev,
 {
 	u16 max_size = indir_enabled ? indir_table_size : 1;
 
-	return mlx5e_rqt_init(rqt, mdev, max_size, &init_rqn, 1);
+	return mlx5e_rqt_init(rqt, mdev, max_size, &init_rqn, NULL, 1);
 }
 
 static int mlx5e_bits_invert(unsigned long a, int size)
@@ -63,7 +108,8 @@ static int mlx5e_bits_invert(unsigned long a, int size)
 	return inv;
 }
 
-static int mlx5e_calc_indir_rqns(u32 *rss_rqns, u32 *rqns, unsigned int num_rqns,
+static int mlx5e_calc_indir_rqns(u32 *rss_rqns, u32 *rqns, u32 *rss_vhca_ids, u32 *vhca_ids,
+				 unsigned int num_rqns,
 				 u8 hfunc, struct mlx5e_rss_params_indir *indir)
 {
 	unsigned int i;
@@ -82,30 +128,42 @@ static int mlx5e_calc_indir_rqns(u32 *rss_rqns, u32 *rqns, unsigned int num_rqns
 			 */
 			return -EINVAL;
 		rss_rqns[i] = rqns[ix];
+		if (vhca_ids)
+			rss_vhca_ids[i] = vhca_ids[ix];
 	}
 
 	return 0;
 }
 
 int mlx5e_rqt_init_indir(struct mlx5e_rqt *rqt, struct mlx5_core_dev *mdev,
-			 u32 *rqns, unsigned int num_rqns,
+			 u32 *rqns, u32 *vhca_ids, unsigned int num_rqns,
 			 u8 hfunc, struct mlx5e_rss_params_indir *indir)
 {
-	u32 *rss_rqns;
+	u32 *rss_rqns, *rss_vhca_ids = NULL;
 	int err;
 
 	rss_rqns = kvmalloc_array(indir->actual_table_size, sizeof(*rss_rqns), GFP_KERNEL);
 	if (!rss_rqns)
 		return -ENOMEM;
 
-	err = mlx5e_calc_indir_rqns(rss_rqns, rqns, num_rqns, hfunc, indir);
+	if (vhca_ids) {
+		rss_vhca_ids = kvmalloc_array(indir->actual_table_size, sizeof(*rss_vhca_ids),
+					      GFP_KERNEL);
+		if (!rss_vhca_ids) {
+			kvfree(rss_rqns);
+			return -ENOMEM;
+		}
+	}
+
+	err = mlx5e_calc_indir_rqns(rss_rqns, rqns, rss_vhca_ids, vhca_ids, num_rqns, hfunc, indir);
 	if (err)
 		goto out;
 
-	err = mlx5e_rqt_init(rqt, mdev, indir->max_table_size, rss_rqns,
+	err = mlx5e_rqt_init(rqt, mdev, indir->max_table_size, rss_rqns, rss_vhca_ids,
 			     indir->actual_table_size);
 
 out:
+	kvfree(rss_vhca_ids);
 	kvfree(rss_rqns);
 	return err;
 }
@@ -126,15 +184,20 @@ void mlx5e_rqt_destroy(struct mlx5e_rqt *rqt)
 	mlx5_core_destroy_rqt(rqt->mdev, rqt->rqtn);
 }
 
-static int mlx5e_rqt_redirect(struct mlx5e_rqt *rqt, u32 *rqns, unsigned int size)
+static int mlx5e_rqt_redirect(struct mlx5e_rqt *rqt, u32 *rqns, u32 *vhca_ids,
+			      unsigned int size)
 {
-	unsigned int i;
+	int entry_sz;
 	void *rqtc;
 	int inlen;
 	u32 *in;
 	int err;
 
-	inlen = MLX5_ST_SZ_BYTES(modify_rqt_in) + sizeof(u32) * size;
+	if (!rqt_verify_vhca_ids(rqt->mdev, vhca_ids, size))
+		return -EINVAL;
+
+	entry_sz = vhca_ids ? MLX5_ST_SZ_BYTES(rq_vhca) : MLX5_ST_SZ_BYTES(rq_num);
+	inlen = MLX5_ST_SZ_BYTES(modify_rqt_in) + entry_sz * size;
 	in = kvzalloc(inlen, GFP_KERNEL);
 	if (!in)
 		return -ENOMEM;
@@ -143,8 +206,8 @@ static int mlx5e_rqt_redirect(struct mlx5e_rqt *rqt, u32 *rqns, unsigned int siz
 
 	MLX5_SET(modify_rqt_in, in, bitmask.rqn_list, 1);
 	MLX5_SET(rqtc, rqtc, rqt_actual_size, size);
-	for (i = 0; i < size; i++)
-		MLX5_SET(rqtc, rqtc, rq_num[i], rqns[i]);
+
+	fill_rqn_list(rqtc, rqns, vhca_ids, size);
 
 	err = mlx5_core_modify_rqt(rqt->mdev, rqt->rqtn, in, inlen);
 
@@ -152,17 +215,21 @@ static int mlx5e_rqt_redirect(struct mlx5e_rqt *rqt, u32 *rqns, unsigned int siz
 	return err;
 }
 
-int mlx5e_rqt_redirect_direct(struct mlx5e_rqt *rqt, u32 rqn)
+int mlx5e_rqt_redirect_direct(struct mlx5e_rqt *rqt, u32 rqn, u32 *vhca_id)
 {
-	return mlx5e_rqt_redirect(rqt, &rqn, 1);
+	return mlx5e_rqt_redirect(rqt, &rqn, vhca_id, 1);
 }
 
-int mlx5e_rqt_redirect_indir(struct mlx5e_rqt *rqt, u32 *rqns, unsigned int num_rqns,
+int mlx5e_rqt_redirect_indir(struct mlx5e_rqt *rqt, u32 *rqns, u32 *vhca_ids,
+			     unsigned int num_rqns,
 			     u8 hfunc, struct mlx5e_rss_params_indir *indir)
 {
-	u32 *rss_rqns;
+	u32 *rss_rqns, *rss_vhca_ids = NULL;
 	int err;
 
+	if (!rqt_verify_vhca_ids(rqt->mdev, vhca_ids, num_rqns))
+		return -EINVAL;
+
 	if (WARN_ON(rqt->size != indir->max_table_size))
 		return -EINVAL;
 
@@ -170,13 +237,23 @@ int mlx5e_rqt_redirect_indir(struct mlx5e_rqt *rqt, u32 *rqns, unsigned int num_
 	if (!rss_rqns)
 		return -ENOMEM;
 
-	err = mlx5e_calc_indir_rqns(rss_rqns, rqns, num_rqns, hfunc, indir);
+	if (vhca_ids) {
+		rss_vhca_ids = kvmalloc_array(indir->actual_table_size, sizeof(*rss_vhca_ids),
+					      GFP_KERNEL);
+		if (!rss_vhca_ids) {
+			kvfree(rss_rqns);
+			return -ENOMEM;
+		}
+	}
+
+	err = mlx5e_calc_indir_rqns(rss_rqns, rqns, rss_vhca_ids, vhca_ids, num_rqns, hfunc, indir);
 	if (err)
 		goto out;
 
-	err = mlx5e_rqt_redirect(rqt, rss_rqns, indir->actual_table_size);
+	err = mlx5e_rqt_redirect(rqt, rss_rqns, rss_vhca_ids, indir->actual_table_size);
 
 out:
+	kvfree(rss_vhca_ids);
 	kvfree(rss_rqns);
 	return err;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rqt.h b/drivers/net/ethernet/mellanox/mlx5/core/en/rqt.h
index 77fba3ebd18d..e0bc30308c77 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rqt.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rqt.h
@@ -20,7 +20,7 @@ void mlx5e_rss_params_indir_init_uniform(struct mlx5e_rss_params_indir *indir,
 					 unsigned int num_channels);
 
 struct mlx5e_rqt {
-	struct mlx5_core_dev *mdev;
+	struct mlx5_core_dev *mdev; /* primary */
 	u32 rqtn;
 	u16 size;
 };
@@ -28,7 +28,7 @@ struct mlx5e_rqt {
 int mlx5e_rqt_init_direct(struct mlx5e_rqt *rqt, struct mlx5_core_dev *mdev,
 			  bool indir_enabled, u32 init_rqn, u32 indir_table_size);
 int mlx5e_rqt_init_indir(struct mlx5e_rqt *rqt, struct mlx5_core_dev *mdev,
-			 u32 *rqns, unsigned int num_rqns,
+			 u32 *rqns, u32 *vhca_ids, unsigned int num_rqns,
 			 u8 hfunc, struct mlx5e_rss_params_indir *indir);
 void mlx5e_rqt_destroy(struct mlx5e_rqt *rqt);
 
@@ -38,8 +38,9 @@ static inline u32 mlx5e_rqt_get_rqtn(struct mlx5e_rqt *rqt)
 }
 
 u32 mlx5e_rqt_size(struct mlx5_core_dev *mdev, unsigned int num_channels);
-int mlx5e_rqt_redirect_direct(struct mlx5e_rqt *rqt, u32 rqn);
-int mlx5e_rqt_redirect_indir(struct mlx5e_rqt *rqt, u32 *rqns, unsigned int num_rqns,
+int mlx5e_rqt_redirect_direct(struct mlx5e_rqt *rqt, u32 rqn, u32 *vhca_id);
+int mlx5e_rqt_redirect_indir(struct mlx5e_rqt *rqt, u32 *rqns, u32 *vhca_ids,
+			     unsigned int num_rqns,
 			     u8 hfunc, struct mlx5e_rss_params_indir *indir);
 
 #endif /* __MLX5_EN_RQT_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rss.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rss.c
index c1545a2e8d6d..5f742f896600 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rss.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rss.c
@@ -74,7 +74,7 @@ struct mlx5e_rss {
 	struct mlx5e_tir *tir[MLX5E_NUM_INDIR_TIRS];
 	struct mlx5e_tir *inner_tir[MLX5E_NUM_INDIR_TIRS];
 	struct mlx5e_rqt rqt;
-	struct mlx5_core_dev *mdev;
+	struct mlx5_core_dev *mdev; /* primary */
 	u32 drop_rqn;
 	bool inner_ft_support;
 	bool enabled;
@@ -473,21 +473,22 @@ int mlx5e_rss_obtain_tirn(struct mlx5e_rss *rss,
 	return 0;
 }
 
-static int mlx5e_rss_apply(struct mlx5e_rss *rss, u32 *rqns, unsigned int num_rqns)
+static int mlx5e_rss_apply(struct mlx5e_rss *rss, u32 *rqns, u32 *vhca_ids, unsigned int num_rqns)
 {
 	int err;
 
-	err = mlx5e_rqt_redirect_indir(&rss->rqt, rqns, num_rqns, rss->hash.hfunc, &rss->indir);
+	err = mlx5e_rqt_redirect_indir(&rss->rqt, rqns, vhca_ids, num_rqns, rss->hash.hfunc,
+				       &rss->indir);
 	if (err)
 		mlx5e_rss_warn(rss->mdev, "Failed to redirect RQT %#x to channels: err = %d\n",
 			       mlx5e_rqt_get_rqtn(&rss->rqt), err);
 	return err;
 }
 
-void mlx5e_rss_enable(struct mlx5e_rss *rss, u32 *rqns, unsigned int num_rqns)
+void mlx5e_rss_enable(struct mlx5e_rss *rss, u32 *rqns, u32 *vhca_ids, unsigned int num_rqns)
 {
 	rss->enabled = true;
-	mlx5e_rss_apply(rss, rqns, num_rqns);
+	mlx5e_rss_apply(rss, rqns, vhca_ids, num_rqns);
 }
 
 void mlx5e_rss_disable(struct mlx5e_rss *rss)
@@ -495,7 +496,7 @@ void mlx5e_rss_disable(struct mlx5e_rss *rss)
 	int err;
 
 	rss->enabled = false;
-	err = mlx5e_rqt_redirect_direct(&rss->rqt, rss->drop_rqn);
+	err = mlx5e_rqt_redirect_direct(&rss->rqt, rss->drop_rqn, NULL);
 	if (err)
 		mlx5e_rss_warn(rss->mdev, "Failed to redirect RQT %#x to drop RQ %#x: err = %d\n",
 			       mlx5e_rqt_get_rqtn(&rss->rqt), rss->drop_rqn, err);
@@ -568,7 +569,7 @@ int mlx5e_rss_get_rxfh(struct mlx5e_rss *rss, u32 *indir, u8 *key, u8 *hfunc)
 
 int mlx5e_rss_set_rxfh(struct mlx5e_rss *rss, const u32 *indir,
 		       const u8 *key, const u8 *hfunc,
-		       u32 *rqns, unsigned int num_rqns)
+		       u32 *rqns, u32 *vhca_ids, unsigned int num_rqns)
 {
 	bool changed_indir = false;
 	bool changed_hash = false;
@@ -608,7 +609,7 @@ int mlx5e_rss_set_rxfh(struct mlx5e_rss *rss, const u32 *indir,
 	}
 
 	if (changed_indir && rss->enabled) {
-		err = mlx5e_rss_apply(rss, rqns, num_rqns);
+		err = mlx5e_rss_apply(rss, rqns, vhca_ids, num_rqns);
 		if (err) {
 			mlx5e_rss_copy(rss, old_rss);
 			goto out;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rss.h b/drivers/net/ethernet/mellanox/mlx5/core/en/rss.h
index d1d0bc350e92..d0df98963c8d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rss.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rss.h
@@ -39,7 +39,7 @@ int mlx5e_rss_obtain_tirn(struct mlx5e_rss *rss,
 			  const struct mlx5e_packet_merge_param *init_pkt_merge_param,
 			  bool inner, u32 *tirn);
 
-void mlx5e_rss_enable(struct mlx5e_rss *rss, u32 *rqns, unsigned int num_rqns);
+void mlx5e_rss_enable(struct mlx5e_rss *rss, u32 *rqns, u32 *vhca_ids, unsigned int num_rqns);
 void mlx5e_rss_disable(struct mlx5e_rss *rss);
 
 int mlx5e_rss_packet_merge_set_param(struct mlx5e_rss *rss,
@@ -47,7 +47,7 @@ int mlx5e_rss_packet_merge_set_param(struct mlx5e_rss *rss,
 int mlx5e_rss_get_rxfh(struct mlx5e_rss *rss, u32 *indir, u8 *key, u8 *hfunc);
 int mlx5e_rss_set_rxfh(struct mlx5e_rss *rss, const u32 *indir,
 		       const u8 *key, const u8 *hfunc,
-		       u32 *rqns, unsigned int num_rqns);
+		       u32 *rqns, u32 *vhca_ids, unsigned int num_rqns);
 struct mlx5e_rss_params_hash mlx5e_rss_get_hash(struct mlx5e_rss *rss);
 u8 mlx5e_rss_get_hash_fields(struct mlx5e_rss *rss, enum mlx5_traffic_types tt);
 int mlx5e_rss_set_hash_fields(struct mlx5e_rss *rss, enum mlx5_traffic_types tt,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c
index b23e224e3763..a86eade9a9e0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c
@@ -8,7 +8,7 @@
 #define MLX5E_MAX_NUM_RSS 16
 
 struct mlx5e_rx_res {
-	struct mlx5_core_dev *mdev;
+	struct mlx5_core_dev *mdev; /* primary */
 	enum mlx5e_rx_res_features features;
 	unsigned int max_nch;
 	u32 drop_rqn;
@@ -19,6 +19,7 @@ struct mlx5e_rx_res {
 	struct mlx5e_rss *rss[MLX5E_MAX_NUM_RSS];
 	bool rss_active;
 	u32 *rss_rqns;
+	u32 *rss_vhca_ids;
 	unsigned int rss_nch;
 
 	struct {
@@ -34,6 +35,13 @@ struct mlx5e_rx_res {
 
 /* API for rx_res_rss_* */
 
+static u32 *get_vhca_ids(struct mlx5e_rx_res *res, int offset)
+{
+	bool multi_vhca = res->features & MLX5E_RX_RES_FEATURE_MULTI_VHCA;
+
+	return multi_vhca ? res->rss_vhca_ids + offset : NULL;
+}
+
 void mlx5e_rx_res_rss_update_num_channels(struct mlx5e_rx_res *res, u32 nch)
 {
 	int i;
@@ -85,8 +93,11 @@ int mlx5e_rx_res_rss_init(struct mlx5e_rx_res *res, u32 *rss_idx, unsigned int i
 		return PTR_ERR(rss);
 
 	mlx5e_rss_set_indir_uniform(rss, init_nch);
-	if (res->rss_active)
-		mlx5e_rss_enable(rss, res->rss_rqns, res->rss_nch);
+	if (res->rss_active) {
+		u32 *vhca_ids = get_vhca_ids(res, 0);
+
+		mlx5e_rss_enable(rss, res->rss_rqns, vhca_ids, res->rss_nch);
+	}
 
 	res->rss[i] = rss;
 	*rss_idx = i;
@@ -153,10 +164,12 @@ static void mlx5e_rx_res_rss_enable(struct mlx5e_rx_res *res)
 
 	for (i = 0; i < MLX5E_MAX_NUM_RSS; i++) {
 		struct mlx5e_rss *rss = res->rss[i];
+		u32 *vhca_ids;
 
 		if (!rss)
 			continue;
-		mlx5e_rss_enable(rss, res->rss_rqns, res->rss_nch);
+		vhca_ids = get_vhca_ids(res, 0);
+		mlx5e_rss_enable(rss, res->rss_rqns, vhca_ids, res->rss_nch);
 	}
 }
 
@@ -200,6 +213,7 @@ int mlx5e_rx_res_rss_get_rxfh(struct mlx5e_rx_res *res, u32 rss_idx,
 int mlx5e_rx_res_rss_set_rxfh(struct mlx5e_rx_res *res, u32 rss_idx,
 			      const u32 *indir, const u8 *key, const u8 *hfunc)
 {
+	u32 *vhca_ids = get_vhca_ids(res, 0);
 	struct mlx5e_rss *rss;
 
 	if (rss_idx >= MLX5E_MAX_NUM_RSS)
@@ -209,7 +223,8 @@ int mlx5e_rx_res_rss_set_rxfh(struct mlx5e_rx_res *res, u32 rss_idx,
 	if (!rss)
 		return -ENOENT;
 
-	return mlx5e_rss_set_rxfh(rss, indir, key, hfunc, res->rss_rqns, res->rss_nch);
+	return mlx5e_rss_set_rxfh(rss, indir, key, hfunc, res->rss_rqns, vhca_ids,
+				  res->rss_nch);
 }
 
 int mlx5e_rx_res_rss_get_hash_fields(struct mlx5e_rx_res *res, u32 rss_idx,
@@ -280,11 +295,13 @@ struct mlx5e_rss *mlx5e_rx_res_rss_get(struct mlx5e_rx_res *res, u32 rss_idx)
 
 static void mlx5e_rx_res_free(struct mlx5e_rx_res *res)
 {
+	kvfree(res->rss_vhca_ids);
 	kvfree(res->rss_rqns);
 	kvfree(res);
 }
 
-static struct mlx5e_rx_res *mlx5e_rx_res_alloc(struct mlx5_core_dev *mdev, unsigned int max_nch)
+static struct mlx5e_rx_res *mlx5e_rx_res_alloc(struct mlx5_core_dev *mdev, unsigned int max_nch,
+					       bool multi_vhca)
 {
 	struct mlx5e_rx_res *rx_res;
 
@@ -298,6 +315,15 @@ static struct mlx5e_rx_res *mlx5e_rx_res_alloc(struct mlx5_core_dev *mdev, unsig
 		return NULL;
 	}
 
+	if (multi_vhca) {
+		rx_res->rss_vhca_ids = kvcalloc(max_nch, sizeof(*rx_res->rss_vhca_ids), GFP_KERNEL);
+		if (!rx_res->rss_vhca_ids) {
+			kvfree(rx_res->rss_rqns);
+			kvfree(rx_res);
+			return NULL;
+		}
+	}
+
 	return rx_res;
 }
 
@@ -424,10 +450,11 @@ mlx5e_rx_res_create(struct mlx5_core_dev *mdev, enum mlx5e_rx_res_features featu
 		    const struct mlx5e_packet_merge_param *init_pkt_merge_param,
 		    unsigned int init_nch)
 {
+	bool multi_vhca = features & MLX5E_RX_RES_FEATURE_MULTI_VHCA;
 	struct mlx5e_rx_res *res;
 	int err;
 
-	res = mlx5e_rx_res_alloc(mdev, max_nch);
+	res = mlx5e_rx_res_alloc(mdev, max_nch, multi_vhca);
 	if (!res)
 		return ERR_PTR(-ENOMEM);
 
@@ -504,10 +531,11 @@ static void mlx5e_rx_res_channel_activate_direct(struct mlx5e_rx_res *res,
 						 struct mlx5e_channels *chs,
 						 unsigned int ix)
 {
+	u32 *vhca_id = get_vhca_ids(res, ix);
 	u32 rqn = res->rss_rqns[ix];
 	int err;
 
-	err = mlx5e_rqt_redirect_direct(&res->channels[ix].direct_rqt, rqn);
+	err = mlx5e_rqt_redirect_direct(&res->channels[ix].direct_rqt, rqn, vhca_id);
 	if (err)
 		mlx5_core_warn(res->mdev, "Failed to redirect direct RQT %#x to RQ %#x (channel %u): err = %d\n",
 			       mlx5e_rqt_get_rqtn(&res->channels[ix].direct_rqt),
@@ -519,7 +547,7 @@ static void mlx5e_rx_res_channel_deactivate_direct(struct mlx5e_rx_res *res,
 {
 	int err;
 
-	err = mlx5e_rqt_redirect_direct(&res->channels[ix].direct_rqt, res->drop_rqn);
+	err = mlx5e_rqt_redirect_direct(&res->channels[ix].direct_rqt, res->drop_rqn, NULL);
 	if (err)
 		mlx5_core_warn(res->mdev, "Failed to redirect direct RQT %#x to drop RQ %#x (channel %u): err = %d\n",
 			       mlx5e_rqt_get_rqtn(&res->channels[ix].direct_rqt),
@@ -534,10 +562,12 @@ void mlx5e_rx_res_channels_activate(struct mlx5e_rx_res *res, struct mlx5e_chann
 	nch = mlx5e_channels_get_num(chs);
 
 	for (ix = 0; ix < chs->num; ix++) {
+		u32 *vhca_id = get_vhca_ids(res, ix);
+
 		if (mlx5e_channels_is_xsk(chs, ix))
-			mlx5e_channels_get_xsk_rqn(chs, ix, &res->rss_rqns[ix]);
+			mlx5e_channels_get_xsk_rqn(chs, ix, &res->rss_rqns[ix], vhca_id);
 		else
-			mlx5e_channels_get_regular_rqn(chs, ix, &res->rss_rqns[ix]);
+			mlx5e_channels_get_regular_rqn(chs, ix, &res->rss_rqns[ix], vhca_id);
 	}
 	res->rss_nch = chs->num;
 
@@ -554,7 +584,7 @@ void mlx5e_rx_res_channels_activate(struct mlx5e_rx_res *res, struct mlx5e_chann
 		if (!mlx5e_channels_get_ptp_rqn(chs, &rqn))
 			rqn = res->drop_rqn;
 
-		err = mlx5e_rqt_redirect_direct(&res->ptp.rqt, rqn);
+		err = mlx5e_rqt_redirect_direct(&res->ptp.rqt, rqn, NULL);
 		if (err)
 			mlx5_core_warn(res->mdev, "Failed to redirect direct RQT %#x to RQ %#x (PTP): err = %d\n",
 				       mlx5e_rqt_get_rqtn(&res->ptp.rqt),
@@ -573,7 +603,7 @@ void mlx5e_rx_res_channels_deactivate(struct mlx5e_rx_res *res)
 		mlx5e_rx_res_channel_deactivate_direct(res, ix);
 
 	if (res->features & MLX5E_RX_RES_FEATURE_PTP) {
-		err = mlx5e_rqt_redirect_direct(&res->ptp.rqt, res->drop_rqn);
+		err = mlx5e_rqt_redirect_direct(&res->ptp.rqt, res->drop_rqn, NULL);
 		if (err)
 			mlx5_core_warn(res->mdev, "Failed to redirect direct RQT %#x to drop RQ %#x (PTP): err = %d\n",
 				       mlx5e_rqt_get_rqtn(&res->ptp.rqt),
@@ -584,10 +614,12 @@ void mlx5e_rx_res_channels_deactivate(struct mlx5e_rx_res *res)
 void mlx5e_rx_res_xsk_update(struct mlx5e_rx_res *res, struct mlx5e_channels *chs,
 			     unsigned int ix, bool xsk)
 {
+	u32 *vhca_id = get_vhca_ids(res, ix);
+
 	if (xsk)
-		mlx5e_channels_get_xsk_rqn(chs, ix, &res->rss_rqns[ix]);
+		mlx5e_channels_get_xsk_rqn(chs, ix, &res->rss_rqns[ix], vhca_id);
 	else
-		mlx5e_channels_get_regular_rqn(chs, ix, &res->rss_rqns[ix]);
+		mlx5e_channels_get_regular_rqn(chs, ix, &res->rss_rqns[ix], vhca_id);
 
 	mlx5e_rx_res_rss_enable(res);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.h b/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.h
index 82aaba8a82b3..7b1a9f0f1874 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.h
@@ -18,6 +18,7 @@ struct mlx5e_rss_params_hash;
 enum mlx5e_rx_res_features {
 	MLX5E_RX_RES_FEATURE_INNER_FT = BIT(0),
 	MLX5E_RX_RES_FEATURE_PTP = BIT(1),
+	MLX5E_RX_RES_FEATURE_MULTI_VHCA = BIT(2),
 };
 
 /* Setup */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 8dac57282f1c..d707d45ca074 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -5382,6 +5382,8 @@ static int mlx5e_init_nic_rx(struct mlx5e_priv *priv)
 	features = MLX5E_RX_RES_FEATURE_PTP;
 	if (mlx5_tunnel_inner_ft_supported(mdev))
 		features |= MLX5E_RX_RES_FEATURE_INNER_FT;
+	if (mlx5_get_sd(priv->mdev))
+		features |= MLX5E_RX_RES_FEATURE_MULTI_VHCA;
 
 	priv->rx_res = mlx5e_rx_res_create(priv->mdev, features, priv->max_nch, priv->drop_rq.rqn,
 					   &priv->channels.params.packet_merge,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 85cdba226eac..6590443d6f2e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -766,7 +766,7 @@ static int mlx5e_hairpin_create_indirect_rqt(struct mlx5e_hairpin *hp)
 		return err;
 
 	mlx5e_rss_params_indir_init_uniform(&indir, hp->num_channels);
-	err = mlx5e_rqt_init_indir(&hp->indir_rqt, mdev, hp->pair->rqn, hp->num_channels,
+	err = mlx5e_rqt_init_indir(&hp->indir_rqt, mdev, hp->pair->rqn, NULL, hp->num_channels,
 				   mlx5e_rx_res_get_current_hash(priv->rx_res).hfunc,
 				   &indir);
 
-- 
2.43.0



* [net-next 12/15] net/mlx5e: Support per-mdev queue counter
  2023-12-21  0:57 [pull request][net-next 00/15] mlx5 updates 2023-12-20 Saeed Mahameed
                   ` (10 preceding siblings ...)
  2023-12-21  0:57 ` [net-next 11/15] net/mlx5e: Support cross-vhca RSS Saeed Mahameed
@ 2023-12-21  0:57 ` Saeed Mahameed
  2023-12-21  0:57 ` [net-next 13/15] net/mlx5e: Block TLS device offload on combined SD netdev Saeed Mahameed
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 45+ messages in thread
From: Saeed Mahameed @ 2023-12-21  0:57 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan

From: Tariq Toukan <tariqt@nvidia.com>

Each queue counter object counts, in hardware, certain events for the
RQs attached to it, such as packet drops due to a missing receive WQE
(rx_out_of_buffer).

Each RQ can be attached to a queue counter only within the same vhca. To
still cover all RQs with these counters, we create multiple instances,
one per vhca.

The value shown to the user is now the sum over all instances.
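
As a rough illustration (not code from the patch; the example_* names
are made up): each channel picks the counter instance of its own vhca,
and the reported value is a plain sum over the per-vhca instances.

/* Illustrative sketch only. */
static u16 example_rq_q_counter(struct mlx5e_priv *priv,
				struct mlx5e_channel *c)
{
	return priv->q_counter[c->sd_ix];	/* instance of the channel's vhca */
}

static u32 example_sum_out_of_buffer(const u32 *per_vhca, int num_vhcas)
{
	u32 sum = 0;
	int i;

	for (i = 0; i < num_vhcas; i++)
		sum += per_vhca[i];	/* user sees the total across vhcas */
	return sum;
}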

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  7 +--
 .../mellanox/mlx5/core/en/monitor_stats.c     | 48 +++++++++++++------
 .../ethernet/mellanox/mlx5/core/en/params.c   |  7 +--
 .../ethernet/mellanox/mlx5/core/en/params.h   |  3 --
 .../net/ethernet/mellanox/mlx5/core/en/ptp.c  | 12 +++--
 .../net/ethernet/mellanox/mlx5/core/en/trap.c | 11 +++--
 .../mellanox/mlx5/core/en/xsk/setup.c         |  8 ++--
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 43 ++++++++++-------
 .../ethernet/mellanox/mlx5/core/en_stats.c    | 39 ++++++++++-----
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   |  2 +-
 10 files changed, 111 insertions(+), 69 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index f6e78c465c7a..84db05fb9389 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -793,6 +793,7 @@ struct mlx5e_channel {
 	DECLARE_BITMAP(state, MLX5E_CHANNEL_NUM_STATES);
 	int                        ix;
 	int                        vec_ix;
+	int                        sd_ix;
 	int                        cpu;
 	/* Sync between icosq recovery and XSK enable/disable. */
 	struct mutex               icosq_recovery_lock;
@@ -916,7 +917,7 @@ struct mlx5e_priv {
 	bool                       tx_ptp_opened;
 	bool                       rx_ptp_opened;
 	struct hwtstamp_config     tstamp;
-	u16                        q_counter;
+	u16                        q_counter[MLX5_SD_MAX_GROUP_SZ];
 	u16                        drop_rq_q_counter;
 	struct notifier_block      events_nb;
 	struct notifier_block      blocking_events_nb;
@@ -1031,12 +1032,12 @@ struct mlx5e_xsk_param;
 
 struct mlx5e_rq_param;
 int mlx5e_open_rq(struct mlx5e_params *params, struct mlx5e_rq_param *param,
-		  struct mlx5e_xsk_param *xsk, int node,
+		  struct mlx5e_xsk_param *xsk, int node, u16 q_counter,
 		  struct mlx5e_rq *rq);
 #define MLX5E_RQ_WQES_TIMEOUT 20000 /* msecs */
 int mlx5e_wait_for_min_rx_wqes(struct mlx5e_rq *rq, int wait_time);
 void mlx5e_close_rq(struct mlx5e_rq *rq);
-int mlx5e_create_rq(struct mlx5e_rq *rq, struct mlx5e_rq_param *param);
+int mlx5e_create_rq(struct mlx5e_rq *rq, struct mlx5e_rq_param *param, u16 q_counter);
 void mlx5e_destroy_rq(struct mlx5e_rq *rq);
 
 struct mlx5e_sq_param;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/monitor_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en/monitor_stats.c
index 40c8df111754..e2d8d2754be0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/monitor_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/monitor_stats.c
@@ -20,10 +20,8 @@
 #define NUM_REQ_PPCNT_COUNTER_S1 MLX5_CMD_SET_MONITOR_NUM_PPCNT_COUNTER_SET1
 #define NUM_REQ_Q_COUNTERS_S1    MLX5_CMD_SET_MONITOR_NUM_Q_COUNTERS_SET1
 
-int mlx5e_monitor_counter_supported(struct mlx5e_priv *priv)
+static int mlx5e_monitor_counter_cap(struct mlx5_core_dev *mdev)
 {
-	struct mlx5_core_dev *mdev = priv->mdev;
-
 	if (!MLX5_CAP_GEN(mdev, max_num_of_monitor_counters))
 		return false;
 	if (MLX5_CAP_PCAM_REG(mdev, ppcnt) &&
@@ -36,24 +34,38 @@ int mlx5e_monitor_counter_supported(struct mlx5e_priv *priv)
 	return true;
 }
 
-static void mlx5e_monitor_counter_arm(struct mlx5e_priv *priv)
+int mlx5e_monitor_counter_supported(struct mlx5e_priv *priv)
+{
+	struct mlx5_core_dev *pos;
+	int i;
+
+	mlx5_sd_for_each_dev(i, priv->mdev, pos)
+		if (!mlx5e_monitor_counter_cap(pos))
+			return false;
+	return true;
+}
+
+static void mlx5e_monitor_counter_arm(struct mlx5_core_dev *mdev)
 {
 	u32 in[MLX5_ST_SZ_DW(arm_monitor_counter_in)] = {};
 
 	MLX5_SET(arm_monitor_counter_in, in, opcode,
 		 MLX5_CMD_OP_ARM_MONITOR_COUNTER);
-	mlx5_cmd_exec_in(priv->mdev, arm_monitor_counter, in);
+	mlx5_cmd_exec_in(mdev, arm_monitor_counter, in);
 }
 
 static void mlx5e_monitor_counters_work(struct work_struct *work)
 {
 	struct mlx5e_priv *priv = container_of(work, struct mlx5e_priv,
 					       monitor_counters_work);
+	struct mlx5_core_dev *pos;
+	int i;
 
 	mutex_lock(&priv->state_lock);
 	mlx5e_stats_update_ndo_stats(priv);
 	mutex_unlock(&priv->state_lock);
-	mlx5e_monitor_counter_arm(priv);
+	mlx5_sd_for_each_dev(i, priv->mdev, pos)
+		mlx5e_monitor_counter_arm(pos);
 }
 
 static int mlx5e_monitor_event_handler(struct notifier_block *nb,
@@ -97,15 +109,13 @@ static int fill_monitor_counter_q_counter_set1(int cnt, int q_counter, u32 *in)
 }
 
 /* check if mlx5e_monitor_counter_supported before calling this function*/
-static void mlx5e_set_monitor_counter(struct mlx5e_priv *priv)
+static void mlx5e_set_monitor_counter(struct mlx5_core_dev *mdev, int q_counter)
 {
-	struct mlx5_core_dev *mdev = priv->mdev;
 	int max_num_of_counters = MLX5_CAP_GEN(mdev, max_num_of_monitor_counters);
 	int num_q_counters      = MLX5_CAP_GEN(mdev, num_q_monitor_counters);
 	int num_ppcnt_counters  = !MLX5_CAP_PCAM_REG(mdev, ppcnt) ? 0 :
 				  MLX5_CAP_GEN(mdev, num_ppcnt_monitor_counters);
 	u32 in[MLX5_ST_SZ_DW(set_monitor_counter_in)] = {};
-	int q_counter = priv->q_counter;
 	int cnt	= 0;
 
 	if (num_ppcnt_counters  >=  NUM_REQ_PPCNT_COUNTER_S1 &&
@@ -127,13 +137,17 @@ static void mlx5e_set_monitor_counter(struct mlx5e_priv *priv)
 /* check if mlx5e_monitor_counter_supported before calling this function*/
 void mlx5e_monitor_counter_init(struct mlx5e_priv *priv)
 {
+	struct mlx5_core_dev *pos;
+	int i;
+
 	INIT_WORK(&priv->monitor_counters_work, mlx5e_monitor_counters_work);
 	MLX5_NB_INIT(&priv->monitor_counters_nb, mlx5e_monitor_event_handler,
 		     MONITOR_COUNTER);
-	mlx5_eq_notifier_register(priv->mdev, &priv->monitor_counters_nb);
-
-	mlx5e_set_monitor_counter(priv);
-	mlx5e_monitor_counter_arm(priv);
+	mlx5_sd_for_each_dev(i, priv->mdev, pos) {
+		mlx5_eq_notifier_register(pos, &priv->monitor_counters_nb);
+		mlx5e_set_monitor_counter(pos, priv->q_counter[i]);
+		mlx5e_monitor_counter_arm(pos);
+	}
 	queue_work(priv->wq, &priv->update_stats_work);
 }
 
@@ -141,11 +155,15 @@ void mlx5e_monitor_counter_init(struct mlx5e_priv *priv)
 void mlx5e_monitor_counter_cleanup(struct mlx5e_priv *priv)
 {
 	u32 in[MLX5_ST_SZ_DW(set_monitor_counter_in)] = {};
+	struct mlx5_core_dev *pos;
+	int i;
 
 	MLX5_SET(set_monitor_counter_in, in, opcode,
 		 MLX5_CMD_OP_SET_MONITOR_COUNTER);
 
-	mlx5_cmd_exec_in(priv->mdev, set_monitor_counter, in);
-	mlx5_eq_notifier_unregister(priv->mdev, &priv->monitor_counters_nb);
+	mlx5_sd_for_each_dev(i, priv->mdev, pos) {
+		mlx5_cmd_exec_in(pos, set_monitor_counter, in);
+		mlx5_eq_notifier_unregister(pos, &priv->monitor_counters_nb);
+	}
 	cancel_work_sync(&priv->monitor_counters_work);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
index 18f0cedc8610..fb10bb166fbb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
@@ -945,7 +945,6 @@ static u8 rq_end_pad_mode(struct mlx5_core_dev *mdev, struct mlx5e_params *param
 int mlx5e_build_rq_param(struct mlx5_core_dev *mdev,
 			 struct mlx5e_params *params,
 			 struct mlx5e_xsk_param *xsk,
-			 u16 q_counter,
 			 struct mlx5e_rq_param *param)
 {
 	void *rqc = param->rqc;
@@ -1007,7 +1006,6 @@ int mlx5e_build_rq_param(struct mlx5_core_dev *mdev,
 	MLX5_SET(wq, wq, log_wq_stride,
 		 mlx5e_get_rqwq_log_stride(params->rq_wq_type, ndsegs));
 	MLX5_SET(wq, wq, pd,               mdev->mlx5e_res.hw_objs.pdn);
-	MLX5_SET(rqc, rqc, counter_set_id, q_counter);
 	MLX5_SET(rqc, rqc, vsd,            params->vlan_strip_disable);
 	MLX5_SET(rqc, rqc, scatter_fcs,    params->scatter_fcs_en);
 
@@ -1018,7 +1016,6 @@ int mlx5e_build_rq_param(struct mlx5_core_dev *mdev,
 }
 
 void mlx5e_build_drop_rq_param(struct mlx5_core_dev *mdev,
-			       u16 q_counter,
 			       struct mlx5e_rq_param *param)
 {
 	void *rqc = param->rqc;
@@ -1027,7 +1024,6 @@ void mlx5e_build_drop_rq_param(struct mlx5_core_dev *mdev,
 	MLX5_SET(wq, wq, wq_type, MLX5_WQ_TYPE_CYCLIC);
 	MLX5_SET(wq, wq, log_wq_stride,
 		 mlx5e_get_rqwq_log_stride(MLX5_WQ_TYPE_CYCLIC, 1));
-	MLX5_SET(rqc, rqc, counter_set_id, q_counter);
 
 	param->wq.buf_numa_node = dev_to_node(mlx5_core_dma_dev(mdev));
 }
@@ -1292,13 +1288,12 @@ void mlx5e_build_xdpsq_param(struct mlx5_core_dev *mdev,
 
 int mlx5e_build_channel_param(struct mlx5_core_dev *mdev,
 			      struct mlx5e_params *params,
-			      u16 q_counter,
 			      struct mlx5e_channel_param *cparam)
 {
 	u8 icosq_log_wq_sz, async_icosq_log_wq_sz;
 	int err;
 
-	err = mlx5e_build_rq_param(mdev, params, NULL, q_counter, &cparam->rq);
+	err = mlx5e_build_rq_param(mdev, params, NULL, &cparam->rq);
 	if (err)
 		return err;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.h b/drivers/net/ethernet/mellanox/mlx5/core/en/params.h
index 6800949dafbc..9a781f18b57f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.h
@@ -130,10 +130,8 @@ void mlx5e_build_create_cq_param(struct mlx5e_create_cq_param *ccp, struct mlx5e
 int mlx5e_build_rq_param(struct mlx5_core_dev *mdev,
 			 struct mlx5e_params *params,
 			 struct mlx5e_xsk_param *xsk,
-			 u16 q_counter,
 			 struct mlx5e_rq_param *param);
 void mlx5e_build_drop_rq_param(struct mlx5_core_dev *mdev,
-			       u16 q_counter,
 			       struct mlx5e_rq_param *param);
 void mlx5e_build_sq_param_common(struct mlx5_core_dev *mdev,
 				 struct mlx5e_sq_param *param);
@@ -149,7 +147,6 @@ void mlx5e_build_xdpsq_param(struct mlx5_core_dev *mdev,
 			     struct mlx5e_sq_param *param);
 int mlx5e_build_channel_param(struct mlx5_core_dev *mdev,
 			      struct mlx5e_params *params,
-			      u16 q_counter,
 			      struct mlx5e_channel_param *cparam);
 
 u16 mlx5e_calc_sq_stop_room(struct mlx5_core_dev *mdev, struct mlx5e_params *params);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
index c206cc0a8483..cafb41895f94 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
@@ -646,7 +646,6 @@ static void mlx5e_ptp_build_sq_param(struct mlx5_core_dev *mdev,
 
 static void mlx5e_ptp_build_rq_param(struct mlx5_core_dev *mdev,
 				     struct net_device *netdev,
-				     u16 q_counter,
 				     struct mlx5e_ptp_params *ptp_params)
 {
 	struct mlx5e_rq_param *rq_params = &ptp_params->rq_param;
@@ -655,7 +654,7 @@ static void mlx5e_ptp_build_rq_param(struct mlx5_core_dev *mdev,
 	params->rq_wq_type = MLX5_WQ_TYPE_CYCLIC;
 	mlx5e_init_rq_type_params(mdev, params);
 	params->sw_mtu = netdev->max_mtu;
-	mlx5e_build_rq_param(mdev, params, NULL, q_counter, rq_params);
+	mlx5e_build_rq_param(mdev, params, NULL, rq_params);
 }
 
 static void mlx5e_ptp_build_params(struct mlx5e_ptp *c,
@@ -681,7 +680,7 @@ static void mlx5e_ptp_build_params(struct mlx5e_ptp *c,
 	/* RQ */
 	if (test_bit(MLX5E_PTP_STATE_RX, c->state)) {
 		params->vlan_strip_disable = orig->vlan_strip_disable;
-		mlx5e_ptp_build_rq_param(c->mdev, c->netdev, c->priv->q_counter, cparams);
+		mlx5e_ptp_build_rq_param(c->mdev, c->netdev, cparams);
 	}
 }
 
@@ -714,13 +713,16 @@ static int mlx5e_ptp_open_rq(struct mlx5e_ptp *c, struct mlx5e_params *params,
 			     struct mlx5e_rq_param *rq_param)
 {
 	int node = dev_to_node(c->mdev->device);
-	int err;
+	int err, sd_ix;
+	u16 q_counter;
 
 	err = mlx5e_init_ptp_rq(c, params, &c->rq);
 	if (err)
 		return err;
 
-	return mlx5e_open_rq(params, rq_param, NULL, node, &c->rq);
+	sd_ix = mlx5_sd_ch_ix_get_dev_ix(c->mdev, MLX5E_PTP_CHANNEL_IX);
+	q_counter = c->priv->q_counter[sd_ix];
+	return mlx5e_open_rq(params, rq_param, NULL, node, q_counter, &c->rq);
 }
 
 static int mlx5e_ptp_open_queues(struct mlx5e_ptp *c,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/trap.c b/drivers/net/ethernet/mellanox/mlx5/core/en/trap.c
index ac458a8d10e0..53ca16cb9c41 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/trap.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/trap.c
@@ -63,10 +63,12 @@ static int mlx5e_open_trap_rq(struct mlx5e_priv *priv, struct mlx5e_trap *t)
 	struct mlx5e_create_cq_param ccp = {};
 	struct dim_cq_moder trap_moder = {};
 	struct mlx5e_rq *rq = &t->rq;
+	u16 q_counter;
 	int node;
 	int err;
 
 	node = dev_to_node(mdev->device);
+	q_counter = priv->q_counter[0];
 
 	ccp.netdev   = priv->netdev;
 	ccp.wq       = priv->wq;
@@ -79,7 +81,7 @@ static int mlx5e_open_trap_rq(struct mlx5e_priv *priv, struct mlx5e_trap *t)
 		return err;
 
 	mlx5e_init_trap_rq(t, &t->params, rq);
-	err = mlx5e_open_rq(&t->params, rq_param, NULL, node, rq);
+	err = mlx5e_open_rq(&t->params, rq_param, NULL, node, q_counter, rq);
 	if (err)
 		goto err_destroy_cq;
 
@@ -116,15 +118,14 @@ static int mlx5e_create_trap_direct_rq_tir(struct mlx5_core_dev *mdev, struct ml
 }
 
 static void mlx5e_build_trap_params(struct mlx5_core_dev *mdev,
-				    int max_mtu, u16 q_counter,
-				    struct mlx5e_trap *t)
+				    int max_mtu, struct mlx5e_trap *t)
 {
 	struct mlx5e_params *params = &t->params;
 
 	params->rq_wq_type = MLX5_WQ_TYPE_CYCLIC;
 	mlx5e_init_rq_type_params(mdev, params);
 	params->sw_mtu = max_mtu;
-	mlx5e_build_rq_param(mdev, params, NULL, q_counter, &t->rq_param);
+	mlx5e_build_rq_param(mdev, params, NULL, &t->rq_param);
 }
 
 static struct mlx5e_trap *mlx5e_open_trap(struct mlx5e_priv *priv)
@@ -138,7 +139,7 @@ static struct mlx5e_trap *mlx5e_open_trap(struct mlx5e_priv *priv)
 	if (!t)
 		return ERR_PTR(-ENOMEM);
 
-	mlx5e_build_trap_params(priv->mdev, netdev->max_mtu, priv->q_counter, t);
+	mlx5e_build_trap_params(priv->mdev, netdev->max_mtu, t);
 
 	t->priv     = priv;
 	t->mdev     = priv->mdev;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
index 82e6abbc1734..06592b9f0424 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
@@ -49,10 +49,9 @@ bool mlx5e_validate_xsk_param(struct mlx5e_params *params,
 static void mlx5e_build_xsk_cparam(struct mlx5_core_dev *mdev,
 				   struct mlx5e_params *params,
 				   struct mlx5e_xsk_param *xsk,
-				   u16 q_counter,
 				   struct mlx5e_channel_param *cparam)
 {
-	mlx5e_build_rq_param(mdev, params, xsk, q_counter, &cparam->rq);
+	mlx5e_build_rq_param(mdev, params, xsk, &cparam->rq);
 	mlx5e_build_xdpsq_param(mdev, params, xsk, &cparam->xdp_sq);
 }
 
@@ -93,6 +92,7 @@ static int mlx5e_open_xsk_rq(struct mlx5e_channel *c, struct mlx5e_params *param
 			     struct mlx5e_rq_param *rq_params, struct xsk_buff_pool *pool,
 			     struct mlx5e_xsk_param *xsk)
 {
+	u16 q_counter = c->priv->q_counter[c->sd_ix];
 	struct mlx5e_rq *xskrq = &c->xskrq;
 	int err;
 
@@ -100,7 +100,7 @@ static int mlx5e_open_xsk_rq(struct mlx5e_channel *c, struct mlx5e_params *param
 	if (err)
 		return err;
 
-	err = mlx5e_open_rq(params, rq_params, xsk, cpu_to_node(c->cpu), xskrq);
+	err = mlx5e_open_rq(params, rq_params, xsk, cpu_to_node(c->cpu), q_counter, xskrq);
 	if (err)
 		return err;
 
@@ -125,7 +125,7 @@ int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params,
 	if (!cparam)
 		return -ENOMEM;
 
-	mlx5e_build_xsk_cparam(priv->mdev, params, xsk, priv->q_counter, cparam);
+	mlx5e_build_xsk_cparam(priv->mdev, params, xsk, cparam);
 
 	err = mlx5e_open_cq(c->mdev, params->rx_cq_moderation, &cparam->rq.cqp, &ccp,
 			    &c->xskrq.cq);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index d707d45ca074..b8f08d64f66b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1025,7 +1025,7 @@ static void mlx5e_free_rq(struct mlx5e_rq *rq)
 	mlx5_wq_destroy(&rq->wq_ctrl);
 }
 
-int mlx5e_create_rq(struct mlx5e_rq *rq, struct mlx5e_rq_param *param)
+int mlx5e_create_rq(struct mlx5e_rq *rq, struct mlx5e_rq_param *param, u16 q_counter)
 {
 	struct mlx5_core_dev *mdev = rq->mdev;
 	u8 ts_format;
@@ -1052,6 +1052,7 @@ int mlx5e_create_rq(struct mlx5e_rq *rq, struct mlx5e_rq_param *param)
 	MLX5_SET(rqc,  rqc, cqn,		rq->cq.mcq.cqn);
 	MLX5_SET(rqc,  rqc, state,		MLX5_RQC_STATE_RST);
 	MLX5_SET(rqc,  rqc, ts_format,		ts_format);
+	MLX5_SET(rqc,  rqc, counter_set_id,     q_counter);
 	MLX5_SET(wq,   wq,  log_wq_pg_sz,	rq->wq_ctrl.buf.page_shift -
 						MLX5_ADAPTER_PAGE_SHIFT);
 	MLX5_SET64(wq, wq,  dbr_addr,		rq->wq_ctrl.db.dma);
@@ -1275,7 +1276,7 @@ void mlx5e_free_rx_descs(struct mlx5e_rq *rq)
 }
 
 int mlx5e_open_rq(struct mlx5e_params *params, struct mlx5e_rq_param *param,
-		  struct mlx5e_xsk_param *xsk, int node,
+		  struct mlx5e_xsk_param *xsk, int node, u16 q_counter,
 		  struct mlx5e_rq *rq)
 {
 	struct mlx5_core_dev *mdev = rq->mdev;
@@ -1288,7 +1289,7 @@ int mlx5e_open_rq(struct mlx5e_params *params, struct mlx5e_rq_param *param,
 	if (err)
 		return err;
 
-	err = mlx5e_create_rq(rq, param);
+	err = mlx5e_create_rq(rq, param, q_counter);
 	if (err)
 		goto err_free_rq;
 
@@ -2334,13 +2335,14 @@ static int mlx5e_set_tx_maxrate(struct net_device *dev, int index, u32 rate)
 static int mlx5e_open_rxq_rq(struct mlx5e_channel *c, struct mlx5e_params *params,
 			     struct mlx5e_rq_param *rq_params)
 {
+	u16 q_counter = c->priv->q_counter[c->sd_ix];
 	int err;
 
 	err = mlx5e_init_rxq_rq(c, params, rq_params->xdp_frag_size, &c->rq);
 	if (err)
 		return err;
 
-	return mlx5e_open_rq(params, rq_params, NULL, cpu_to_node(c->cpu), &c->rq);
+	return mlx5e_open_rq(params, rq_params, NULL, cpu_to_node(c->cpu), q_counter, &c->rq);
 }
 
 static int mlx5e_open_queues(struct mlx5e_channel *c,
@@ -2557,6 +2559,7 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
 	c->tstamp   = &priv->tstamp;
 	c->ix       = ix;
 	c->vec_ix   = vec_ix;
+	c->sd_ix    = mlx5_sd_ch_ix_get_dev_ix(mdev, ix);
 	c->cpu      = cpu;
 	c->pdev     = mlx5_core_dma_dev(mdev);
 	c->netdev   = priv->netdev;
@@ -2655,7 +2658,7 @@ int mlx5e_open_channels(struct mlx5e_priv *priv,
 	if (!chs->c || !cparam)
 		goto err_free;
 
-	err = mlx5e_build_channel_param(priv->mdev, &chs->params, priv->q_counter, cparam);
+	err = mlx5e_build_channel_param(priv->mdev, &chs->params, cparam);
 	if (err)
 		goto err_free;
 
@@ -3346,7 +3349,7 @@ int mlx5e_open_drop_rq(struct mlx5e_priv *priv,
 	struct mlx5e_cq *cq = &drop_rq->cq;
 	int err;
 
-	mlx5e_build_drop_rq_param(mdev, priv->drop_rq_q_counter, &rq_param);
+	mlx5e_build_drop_rq_param(mdev, &rq_param);
 
 	err = mlx5e_alloc_drop_cq(priv, cq, &cq_param);
 	if (err)
@@ -3360,7 +3363,7 @@ int mlx5e_open_drop_rq(struct mlx5e_priv *priv,
 	if (err)
 		goto err_destroy_cq;
 
-	err = mlx5e_create_rq(drop_rq, &rq_param);
+	err = mlx5e_create_rq(drop_rq, &rq_param, priv->drop_rq_q_counter);
 	if (err)
 		goto err_free_rq;
 
@@ -5275,13 +5278,17 @@ void mlx5e_create_q_counters(struct mlx5e_priv *priv)
 	u32 out[MLX5_ST_SZ_DW(alloc_q_counter_out)] = {};
 	u32 in[MLX5_ST_SZ_DW(alloc_q_counter_in)] = {};
 	struct mlx5_core_dev *mdev = priv->mdev;
-	int err;
+	struct mlx5_core_dev *pos;
+	int err, i;
 
 	MLX5_SET(alloc_q_counter_in, in, opcode, MLX5_CMD_OP_ALLOC_Q_COUNTER);
-	err = mlx5_cmd_exec_inout(mdev, alloc_q_counter, in, out);
-	if (!err)
-		priv->q_counter =
-			MLX5_GET(alloc_q_counter_out, out, counter_set_id);
+
+	mlx5_sd_for_each_dev(i, mdev, pos) {
+		err = mlx5_cmd_exec_inout(pos, alloc_q_counter, in, out);
+		if (!err)
+			priv->q_counter[i] =
+				MLX5_GET(alloc_q_counter_out, out, counter_set_id);
+	}
 
 	err = mlx5_cmd_exec_inout(mdev, alloc_q_counter, in, out);
 	if (!err)
@@ -5292,13 +5299,17 @@ void mlx5e_create_q_counters(struct mlx5e_priv *priv)
 void mlx5e_destroy_q_counters(struct mlx5e_priv *priv)
 {
 	u32 in[MLX5_ST_SZ_DW(dealloc_q_counter_in)] = {};
+	struct mlx5_core_dev *pos;
+	int i;
 
 	MLX5_SET(dealloc_q_counter_in, in, opcode,
 		 MLX5_CMD_OP_DEALLOC_Q_COUNTER);
-	if (priv->q_counter) {
-		MLX5_SET(dealloc_q_counter_in, in, counter_set_id,
-			 priv->q_counter);
-		mlx5_cmd_exec_in(priv->mdev, dealloc_q_counter, in);
+	mlx5_sd_for_each_dev(i, priv->mdev, pos) {
+		if (priv->q_counter[i]) {
+			MLX5_SET(dealloc_q_counter_in, in, counter_set_id,
+				 priv->q_counter[i]);
+			mlx5_cmd_exec_in(pos, dealloc_q_counter, in);
+		}
 	}
 
 	if (priv->drop_rq_q_counter) {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index 4b96ad657145..f3d0898bdbc6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -561,11 +561,23 @@ static const struct counter_desc drop_rq_stats_desc[] = {
 #define NUM_Q_COUNTERS			ARRAY_SIZE(q_stats_desc)
 #define NUM_DROP_RQ_COUNTERS		ARRAY_SIZE(drop_rq_stats_desc)
 
+static bool q_counter_any(struct mlx5e_priv *priv)
+{
+	struct mlx5_core_dev *pos;
+	int i;
+
+	mlx5_sd_for_each_dev(i, priv->mdev, pos)
+		if (priv->q_counter[i++])
+			return true;
+
+	return false;
+}
+
 static MLX5E_DECLARE_STATS_GRP_OP_NUM_STATS(qcnt)
 {
 	int num_stats = 0;
 
-	if (priv->q_counter)
+	if (q_counter_any(priv))
 		num_stats += NUM_Q_COUNTERS;
 
 	if (priv->drop_rq_q_counter)
@@ -578,7 +590,7 @@ static MLX5E_DECLARE_STATS_GRP_OP_FILL_STRS(qcnt)
 {
 	int i;
 
-	for (i = 0; i < NUM_Q_COUNTERS && priv->q_counter; i++)
+	for (i = 0; i < NUM_Q_COUNTERS && q_counter_any(priv); i++)
 		strcpy(data + (idx++) * ETH_GSTRING_LEN,
 		       q_stats_desc[i].format);
 
@@ -593,7 +605,7 @@ static MLX5E_DECLARE_STATS_GRP_OP_FILL_STATS(qcnt)
 {
 	int i;
 
-	for (i = 0; i < NUM_Q_COUNTERS && priv->q_counter; i++)
+	for (i = 0; i < NUM_Q_COUNTERS && q_counter_any(priv); i++)
 		data[idx++] = MLX5E_READ_CTR32_CPU(&priv->stats.qcnt,
 						   q_stats_desc, i);
 	for (i = 0; i < NUM_DROP_RQ_COUNTERS && priv->drop_rq_q_counter; i++)
@@ -607,18 +619,23 @@ static MLX5E_DECLARE_STATS_GRP_OP_UPDATE_STATS(qcnt)
 	struct mlx5e_qcounter_stats *qcnt = &priv->stats.qcnt;
 	u32 out[MLX5_ST_SZ_DW(query_q_counter_out)] = {};
 	u32 in[MLX5_ST_SZ_DW(query_q_counter_in)] = {};
-	int ret;
+	struct mlx5_core_dev *pos;
+	u32 rx_out_of_buffer = 0;
+	int ret, i;
 
 	MLX5_SET(query_q_counter_in, in, opcode, MLX5_CMD_OP_QUERY_Q_COUNTER);
 
-	if (priv->q_counter) {
-		MLX5_SET(query_q_counter_in, in, counter_set_id,
-			 priv->q_counter);
-		ret = mlx5_cmd_exec_inout(priv->mdev, query_q_counter, in, out);
-		if (!ret)
-			qcnt->rx_out_of_buffer = MLX5_GET(query_q_counter_out,
-							  out, out_of_buffer);
+	mlx5_sd_for_each_dev(i, priv->mdev, pos) {
+		if (priv->q_counter[i]) {
+			MLX5_SET(query_q_counter_in, in, counter_set_id,
+				 priv->q_counter[i]);
+			ret = mlx5_cmd_exec_inout(pos, query_q_counter, in, out);
+			if (!ret)
+				rx_out_of_buffer += MLX5_GET(query_q_counter_out,
+							     out, out_of_buffer);
+		}
 	}
+	qcnt->rx_out_of_buffer = rx_out_of_buffer;
 
 	if (priv->drop_rq_q_counter) {
 		MLX5_SET(query_q_counter_in, in, counter_set_id,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 6590443d6f2e..ebcf40fb671d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -1169,7 +1169,7 @@ static int mlx5e_hairpin_flow_add(struct mlx5e_priv *priv,
 			MLX5_CAP_GEN(priv->mdev, log_min_hairpin_wq_data_sz),
 			MLX5_CAP_GEN(priv->mdev, log_max_hairpin_wq_data_sz));
 
-	params.q_counter = priv->q_counter;
+	params.q_counter = priv->q_counter[0];
 	err = devl_param_driverinit_value_get(
 		devlink, MLX5_DEVLINK_PARAM_ID_HAIRPIN_NUM_QUEUES, &val);
 	if (err) {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [net-next 13/15] net/mlx5e: Block TLS device offload on combined SD netdev
  2023-12-21  0:57 [pull request][net-next 00/15] mlx5 updates 2023-12-20 Saeed Mahameed
                   ` (11 preceding siblings ...)
  2023-12-21  0:57 ` [net-next 12/15] net/mlx5e: Support per-mdev queue counter Saeed Mahameed
@ 2023-12-21  0:57 ` Saeed Mahameed
  2023-12-21  0:57 ` [net-next 14/15] net/mlx5: Enable SD feature Saeed Mahameed
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 45+ messages in thread
From: Saeed Mahameed @ 2023-12-21  0:57 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman

From: Tariq Toukan <tariqt@nvidia.com>

1) Each TX TLS device-offloaded context has its own TIS object. Extra work
is needed to get it working in an SD environment, where a stream can move
between different SQs (belonging to different mdevs).

2) Each RX TLS device-offloaded context needs a DEK object from the DEK
pool. Extra work is needed to get it working in an SD environment, as the
DEK pool currently wrongly depends on the TX cap and exists on the primary
device only.

Disallow this combination for now.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.c | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.h | 4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.c
index 984fa04bd331..e3e57c849436 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.c
@@ -96,7 +96,7 @@ bool mlx5e_is_ktls_rx(struct mlx5_core_dev *mdev)
 {
 	u8 max_sq_wqebbs = mlx5e_get_max_sq_wqebbs(mdev);
 
-	if (is_kdump_kernel() || !MLX5_CAP_GEN(mdev, tls_rx))
+	if (is_kdump_kernel() || !MLX5_CAP_GEN(mdev, tls_rx) || mlx5_get_sd(mdev))
 		return false;
 
 	/* Check the possibility to post the required ICOSQ WQEs. */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.h
index f11075e67658..adc6d8ea0960 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.h
@@ -11,6 +11,7 @@
 
 #ifdef CONFIG_MLX5_EN_TLS
 #include "lib/crypto.h"
+#include "lib/mlx5.h"
 
 struct mlx5_crypto_dek *mlx5_ktls_create_key(struct mlx5_crypto_dek_pool *dek_pool,
 					     struct tls_crypto_info *crypto_info);
@@ -61,7 +62,8 @@ void mlx5e_ktls_rx_resync_destroy_resp_list(struct mlx5e_ktls_resync_resp *resp_
 
 static inline bool mlx5e_is_ktls_tx(struct mlx5_core_dev *mdev)
 {
-	return !is_kdump_kernel() && MLX5_CAP_GEN(mdev, tls_tx);
+	return !is_kdump_kernel() && MLX5_CAP_GEN(mdev, tls_tx) &&
+		!mlx5_get_sd(mdev);
 }
 
 bool mlx5e_is_ktls_rx(struct mlx5_core_dev *mdev);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [net-next 14/15] net/mlx5: Enable SD feature
  2023-12-21  0:57 [pull request][net-next 00/15] mlx5 updates 2023-12-20 Saeed Mahameed
                   ` (12 preceding siblings ...)
  2023-12-21  0:57 ` [net-next 13/15] net/mlx5e: Block TLS device offload on combined SD netdev Saeed Mahameed
@ 2023-12-21  0:57 ` Saeed Mahameed
  2023-12-21  0:57 ` [net-next 15/15] net/mlx5: Implement management PF Ethernet profile Saeed Mahameed
  2024-01-04 22:47 ` [pull request][net-next 00/15] mlx5 updates 2023-12-20 Jakub Kicinski
  15 siblings, 0 replies; 45+ messages in thread
From: Saeed Mahameed @ 2023-12-21  0:57 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman

From: Tariq Toukan <tariqt@nvidia.com>

Have an actual mlx5_sd instance in the core device, and fix the getter
accordingly. This lets the SD logic take effect; the feature becomes
supported only from this point on.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h | 3 ++-
 include/linux/mlx5/driver.h                        | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h
index 0810b92b48d0..37d5f445598c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h
@@ -59,10 +59,11 @@ struct mlx5_sd;
 
 static inline struct mlx5_sd *mlx5_get_sd(struct mlx5_core_dev *dev)
 {
-	return NULL;
+	return dev->sd;
 }
 
 static inline void mlx5_set_sd(struct mlx5_core_dev *dev, struct mlx5_sd *sd)
 {
+	dev->sd = sd;
 }
 #endif
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index aafb36c9e5d9..cd286b681970 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -822,6 +822,7 @@ struct mlx5_core_dev {
 	struct blocking_notifier_head macsec_nh;
 #endif
 	u64 num_ipsec_offloads;
+	struct mlx5_sd          *sd;
 };
 
 struct mlx5_db {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [net-next 15/15] net/mlx5: Implement management PF Ethernet profile
  2023-12-21  0:57 [pull request][net-next 00/15] mlx5 updates 2023-12-20 Saeed Mahameed
                   ` (13 preceding siblings ...)
  2023-12-21  0:57 ` [net-next 14/15] net/mlx5: Enable SD feature Saeed Mahameed
@ 2023-12-21  0:57 ` Saeed Mahameed
  2023-12-21  2:45   ` Nelson, Shannon
  2024-01-04 22:47 ` [pull request][net-next 00/15] mlx5 updates 2023-12-20 Jakub Kicinski
  15 siblings, 1 reply; 45+ messages in thread
From: Saeed Mahameed @ 2023-12-21  0:57 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Armen Ratner, Daniel Jurgens

From: Armen Ratner <armeng@nvidia.com>

Add management PF modules, which introduce support for the structures
needed to create the resources for the MGMT PF to work.
Also, add the necessary calls and functions to establish this
functionality.

Signed-off-by: Armen Ratner <armeng@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Daniel Jurgens <danielj@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/dev.c |   3 +
 .../net/ethernet/mellanox/mlx5/core/ecpf.c    |   6 +
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |   4 +
 .../ethernet/mellanox/mlx5/core/en/mgmt_pf.c  | 268 ++++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  24 +-
 .../net/ethernet/mellanox/mlx5/core/eswitch.c |   2 +-
 include/linux/mlx5/driver.h                   |   8 +
 include/linux/mlx5/mlx5_ifc.h                 |  14 +-
 9 files changed, 323 insertions(+), 8 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/mgmt_pf.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 76dc5a9b9648..f36232dead1a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -29,7 +29,7 @@ mlx5_core-$(CONFIG_MLX5_CORE_EN) += en/rqt.o en/tir.o en/rss.o en/rx_res.o \
 		en/reporter_tx.o en/reporter_rx.o en/params.o en/xsk/pool.o \
 		en/xsk/setup.o en/xsk/rx.o en/xsk/tx.o en/devlink.o en/ptp.o \
 		en/qos.o en/htb.o en/trap.o en/fs_tt_redirect.o en/selq.o \
-		lib/crypto.o lib/sd.o
+		en/mgmt_pf.o lib/crypto.o lib/sd.o
 
 #
 # Netdev extra
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/dev.c b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
index cf0477f53dc4..aa1b471e13fa 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/dev.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
@@ -190,6 +190,9 @@ bool mlx5_rdma_supported(struct mlx5_core_dev *dev)
 	if (is_mp_supported(dev))
 		return false;
 
+	if (mlx5_core_is_mgmt_pf(dev))
+		return false;
+
 	return true;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ecpf.c b/drivers/net/ethernet/mellanox/mlx5/core/ecpf.c
index d000236ddbac..aa397e3ebe6d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ecpf.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ecpf.c
@@ -75,6 +75,9 @@ int mlx5_ec_init(struct mlx5_core_dev *dev)
 	if (!mlx5_core_is_ecpf(dev))
 		return 0;
 
+	if (mlx5_core_is_mgmt_pf(dev))
+		return 0;
+
 	return mlx5_host_pf_init(dev);
 }
 
@@ -85,6 +88,9 @@ void mlx5_ec_cleanup(struct mlx5_core_dev *dev)
 	if (!mlx5_core_is_ecpf(dev))
 		return;
 
+	if (mlx5_core_is_mgmt_pf(dev))
+		return;
+
 	mlx5_host_pf_cleanup(dev);
 
 	err = mlx5_wait_for_pages(dev, &dev->priv.page_counters[MLX5_HOST_PF]);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 84db05fb9389..922b63c25154 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -63,6 +63,7 @@
 #include "lib/sd.h"
 
 extern const struct net_device_ops mlx5e_netdev_ops;
+extern const struct net_device_ops mlx5e_mgmt_netdev_ops;
 struct page_pool;
 
 #define MLX5E_METADATA_ETHER_TYPE (0x8CE4)
@@ -1125,6 +1126,7 @@ static inline bool mlx5_tx_swp_supported(struct mlx5_core_dev *mdev)
 }
 
 extern const struct ethtool_ops mlx5e_ethtool_ops;
+extern const struct mlx5e_profile mlx5e_mgmt_pf_nic_profile;
 
 int mlx5e_create_mkey(struct mlx5_core_dev *mdev, u32 pdn, u32 *mkey);
 int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev, bool create_tises);
@@ -1230,6 +1232,8 @@ netdev_features_t mlx5e_features_check(struct sk_buff *skb,
 				       struct net_device *netdev,
 				       netdev_features_t features);
 int mlx5e_set_features(struct net_device *netdev, netdev_features_t features);
+void mlx5e_nic_set_rx_mode(struct mlx5e_priv *priv);
+
 #ifdef CONFIG_MLX5_ESWITCH
 int mlx5e_set_vf_mac(struct net_device *dev, int vf, u8 *mac);
 int mlx5e_set_vf_rate(struct net_device *dev, int vf, int min_tx_rate, int max_tx_rate);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/mgmt_pf.c b/drivers/net/ethernet/mellanox/mlx5/core/en/mgmt_pf.c
new file mode 100644
index 000000000000..77b5805895b9
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/mgmt_pf.c
@@ -0,0 +1,268 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+// Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+
+#include <linux/kernel.h>
+#include "en/params.h"
+#include "en/health.h"
+#include "lib/eq.h"
+#include "en/dcbnl.h"
+#include "en_accel/ipsec.h"
+#include "en_accel/en_accel.h"
+#include "en/trap.h"
+#include "en/monitor_stats.h"
+#include "en/hv_vhca_stats.h"
+#include "en_rep.h"
+#include "en.h"
+
+static int mgmt_pf_async_event(struct notifier_block *nb, unsigned long event, void *data)
+{
+	struct mlx5e_priv *priv = container_of(nb, struct mlx5e_priv, events_nb);
+	struct mlx5_eqe   *eqe = data;
+
+	if (event != MLX5_EVENT_TYPE_PORT_CHANGE)
+		return NOTIFY_DONE;
+
+	switch (eqe->sub_type) {
+	case MLX5_PORT_CHANGE_SUBTYPE_DOWN:
+	case MLX5_PORT_CHANGE_SUBTYPE_ACTIVE:
+		queue_work(priv->wq, &priv->update_carrier_work);
+		break;
+	default:
+		return NOTIFY_DONE;
+	}
+
+	return NOTIFY_OK;
+}
+
+static void mlx5e_mgmt_pf_enable_async_events(struct mlx5e_priv *priv)
+{
+	priv->events_nb.notifier_call = mgmt_pf_async_event;
+	mlx5_notifier_register(priv->mdev, &priv->events_nb);
+}
+
+static void mlx5e_disable_mgmt_pf_async_events(struct mlx5e_priv *priv)
+{
+	mlx5_notifier_unregister(priv->mdev, &priv->events_nb);
+}
+
+static void mlx5e_modify_mgmt_pf_admin_state(struct mlx5_core_dev *mdev,
+					     enum mlx5_port_status state)
+{
+	struct mlx5_eswitch *esw = mdev->priv.eswitch;
+	int vport_admin_state;
+
+	mlx5_set_port_admin_status(mdev, state);
+
+	if (state == MLX5_PORT_UP)
+		vport_admin_state = MLX5_VPORT_ADMIN_STATE_AUTO;
+	else
+		vport_admin_state = MLX5_VPORT_ADMIN_STATE_DOWN;
+
+	mlx5_eswitch_set_vport_state(esw, MLX5_VPORT_UPLINK, vport_admin_state);
+}
+
+static void mlx5e_build_mgmt_pf_nic_params(struct mlx5e_priv *priv, u16 mtu)
+{
+	struct mlx5e_params *params = &priv->channels.params;
+	struct mlx5_core_dev *mdev = priv->mdev;
+	u8 rx_cq_period_mode;
+
+	params->sw_mtu = mtu;
+	params->hard_mtu = MLX5E_ETH_HARD_MTU;
+	params->num_channels = 1;
+
+	/* SQ */
+	params->log_sq_size = is_kdump_kernel() ?
+		MLX5E_PARAMS_MINIMUM_LOG_SQ_SIZE :
+		MLX5E_PARAMS_DEFAULT_LOG_SQ_SIZE;
+	MLX5E_SET_PFLAG(params, MLX5E_PFLAG_SKB_TX_MPWQE, mlx5e_tx_mpwqe_supported(mdev));
+
+	MLX5E_SET_PFLAG(params, MLX5E_PFLAG_RX_NO_CSUM_COMPLETE, false);
+
+	/* RQ */
+	mlx5e_build_rq_params(mdev, params);
+
+	/* CQ moderation params */
+	rx_cq_period_mode = MLX5_CAP_GEN(mdev, cq_period_start_from_cqe) ?
+			MLX5_CQ_PERIOD_MODE_START_FROM_CQE :
+			MLX5_CQ_PERIOD_MODE_START_FROM_EQE;
+	params->rx_dim_enabled = MLX5_CAP_GEN(mdev, cq_moderation);
+	params->tx_dim_enabled = MLX5_CAP_GEN(mdev, cq_moderation);
+	mlx5e_set_rx_cq_mode_params(params, rx_cq_period_mode);
+	mlx5e_set_tx_cq_mode_params(params, MLX5_CQ_PERIOD_MODE_START_FROM_EQE);
+
+	/* TX inline */
+	mlx5_query_min_inline(mdev, &params->tx_min_inline_mode);
+}
+
+static int mlx5e_mgmt_pf_init(struct mlx5_core_dev *mdev,
+			      struct net_device *netdev)
+{
+	struct mlx5e_priv *priv = netdev_priv(netdev);
+	struct mlx5e_flow_steering *fs;
+	int err;
+
+	mlx5e_build_mgmt_pf_nic_params(priv, netdev->mtu);
+
+	mlx5e_timestamp_init(priv);
+
+	fs = mlx5e_fs_init(priv->profile, mdev,
+			   !test_bit(MLX5E_STATE_DESTROYING, &priv->state),
+			   priv->dfs_root);
+	if (!fs) {
+		err = -ENOMEM;
+		mlx5_core_err(mdev, "FS initialization failed, %d\n", err);
+		return err;
+	}
+	priv->fs = fs;
+
+	mlx5e_health_create_reporters(priv);
+
+	return 0;
+}
+
+static void mlx5e_mgmt_pf_cleanup(struct mlx5e_priv *priv)
+{
+	mlx5e_health_destroy_reporters(priv);
+	mlx5e_fs_cleanup(priv->fs);
+	priv->fs = NULL;
+}
+
+static int mlx5e_mgmt_pf_init_rx(struct mlx5e_priv *priv)
+{
+	struct mlx5_core_dev *mdev = priv->mdev;
+	int err;
+
+	priv->rx_res = mlx5e_rx_res_create(mdev, 0, priv->max_nch, priv->drop_rq.rqn,
+					   &priv->channels.params.packet_merge,
+					   priv->channels.params.num_channels);
+	if (!priv->rx_res)
+		return -ENOMEM;
+
+	mlx5e_create_q_counters(priv);
+
+	err = mlx5e_open_drop_rq(priv, &priv->drop_rq);
+	if (err) {
+		mlx5_core_err(mdev, "open drop rq failed, %d\n", err);
+		goto err_destroy_q_counters;
+	}
+
+	err = mlx5e_create_flow_steering(priv->fs, priv->rx_res, priv->profile,
+					 priv->netdev);
+	if (err) {
+		mlx5_core_warn(mdev, "create flow steering failed, %d\n", err);
+		goto err_destroy_rx_res;
+	}
+
+	return 0;
+
+err_destroy_rx_res:
+	mlx5e_rx_res_destroy(priv->rx_res);
+	priv->rx_res = NULL;
+	mlx5e_close_drop_rq(&priv->drop_rq);
+err_destroy_q_counters:
+	mlx5e_destroy_q_counters(priv);
+	return err;
+}
+
+static void mlx5e_mgmt_pf_cleanup_rx(struct mlx5e_priv *priv)
+{
+	mlx5e_destroy_flow_steering(priv->fs, !!(priv->netdev->hw_features & NETIF_F_NTUPLE),
+				    priv->profile);
+	mlx5e_rx_res_destroy(priv->rx_res);
+	priv->rx_res = NULL;
+	mlx5e_close_drop_rq(&priv->drop_rq);
+	mlx5e_destroy_q_counters(priv);
+}
+
+static int mlx5e_mgmt_pf_init_tx(struct mlx5e_priv *priv)
+{
+	return 0;
+}
+
+static void mlx5e_mgmt_pf_cleanup_tx(struct mlx5e_priv *priv)
+{
+}
+
+static void mlx5e_mgmt_pf_enable(struct mlx5e_priv *priv)
+{
+	struct net_device *netdev = priv->netdev;
+	struct mlx5_core_dev *mdev = priv->mdev;
+
+	mlx5e_fs_init_l2_addr(priv->fs, netdev);
+
+	/* Marking the link as currently not needed by the Driver */
+	if (!netif_running(netdev))
+		mlx5e_modify_mgmt_pf_admin_state(mdev, MLX5_PORT_DOWN);
+
+	mlx5e_set_netdev_mtu_boundaries(priv);
+	mlx5e_set_dev_port_mtu(priv);
+
+	mlx5e_mgmt_pf_enable_async_events(priv);
+	if (mlx5e_monitor_counter_supported(priv))
+		mlx5e_monitor_counter_init(priv);
+
+	mlx5e_hv_vhca_stats_create(priv);
+	if (netdev->reg_state != NETREG_REGISTERED)
+		return;
+	mlx5e_dcbnl_init_app(priv);
+
+	mlx5e_nic_set_rx_mode(priv);
+
+	rtnl_lock();
+	if (netif_running(netdev))
+		mlx5e_open(netdev);
+	udp_tunnel_nic_reset_ntf(priv->netdev);
+	netif_device_attach(netdev);
+	rtnl_unlock();
+}
+
+static void mlx5e_mgmt_pf_disable(struct mlx5e_priv *priv)
+{
+	if (priv->netdev->reg_state == NETREG_REGISTERED)
+		mlx5e_dcbnl_delete_app(priv);
+
+	rtnl_lock();
+	if (netif_running(priv->netdev))
+		mlx5e_close(priv->netdev);
+	netif_device_detach(priv->netdev);
+	rtnl_unlock();
+
+	mlx5e_nic_set_rx_mode(priv);
+
+	mlx5e_hv_vhca_stats_destroy(priv);
+	if (mlx5e_monitor_counter_supported(priv))
+		mlx5e_monitor_counter_cleanup(priv);
+
+	mlx5e_disable_mgmt_pf_async_events(priv);
+	mlx5e_ipsec_cleanup(priv);
+}
+
+static int mlx5e_mgmt_pf_update_rx(struct mlx5e_priv *priv)
+{
+	return mlx5e_refresh_tirs(priv, false, false);
+}
+
+static int mlx5e_mgmt_pf_max_nch_limit(struct mlx5_core_dev *mdev)
+{
+	return 1;
+}
+
+const struct mlx5e_profile mlx5e_mgmt_pf_nic_profile = {
+	.init		   = mlx5e_mgmt_pf_init,
+	.cleanup	   = mlx5e_mgmt_pf_cleanup,
+	.init_rx	   = mlx5e_mgmt_pf_init_rx,
+	.cleanup_rx	   = mlx5e_mgmt_pf_cleanup_rx,
+	.init_tx	   = mlx5e_mgmt_pf_init_tx,
+	.cleanup_tx	   = mlx5e_mgmt_pf_cleanup_tx,
+	.enable		   = mlx5e_mgmt_pf_enable,
+	.disable	   = mlx5e_mgmt_pf_disable,
+	.update_rx	   = mlx5e_mgmt_pf_update_rx,
+	.update_stats	   = mlx5e_stats_update_ndo_stats,
+	.update_carrier	   = mlx5e_update_carrier,
+	.rx_handlers       = &mlx5e_rx_handlers_nic,
+	.max_tc		   = 1,
+	.max_nch_limit	   = mlx5e_mgmt_pf_max_nch_limit,
+	.stats_grps	   = mlx5e_nic_stats_grps,
+	.stats_grps_num	   = mlx5e_nic_stats_grps_num
+};
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index b8f08d64f66b..40626b6108fb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3799,7 +3799,7 @@ mlx5e_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats)
 	stats->tx_errors = stats->tx_aborted_errors + stats->tx_carrier_errors;
 }
 
-static void mlx5e_nic_set_rx_mode(struct mlx5e_priv *priv)
+void mlx5e_nic_set_rx_mode(struct mlx5e_priv *priv)
 {
 	if (mlx5e_is_uplink_rep(priv))
 		return; /* no rx mode for uplink rep */
@@ -5004,6 +5004,15 @@ const struct net_device_ops mlx5e_netdev_ops = {
 #endif
 };
 
+const struct net_device_ops mlx5e_mgmt_netdev_ops = {
+	.ndo_open		= mlx5e_open,
+	.ndo_stop		= mlx5e_close,
+	.ndo_start_xmit		= mlx5e_xmit,
+	.ndo_get_stats64	= mlx5e_get_stats,
+	.ndo_change_mtu		= mlx5e_change_nic_mtu,
+	.ndo_set_rx_mode	= mlx5e_set_rx_mode,
+};
+
 static u32 mlx5e_choose_lro_timeout(struct mlx5_core_dev *mdev, u32 wanted_timeout)
 {
 	int i;
@@ -5143,7 +5152,11 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
 
 	SET_NETDEV_DEV(netdev, mdev->device);
 
-	netdev->netdev_ops = &mlx5e_netdev_ops;
+	if (mlx5_core_is_mgmt_pf(mdev))
+		netdev->netdev_ops = &mlx5e_mgmt_netdev_ops;
+	else
+		netdev->netdev_ops = &mlx5e_netdev_ops;
+
 	netdev->xdp_metadata_ops = &mlx5e_xdp_metadata_ops;
 	netdev->xsk_tx_metadata_ops = &mlx5e_xsk_tx_metadata_ops;
 
@@ -6094,13 +6107,18 @@ static int mlx5e_suspend(struct auxiliary_device *adev, pm_message_t state)
 static int _mlx5e_probe(struct auxiliary_device *adev)
 {
 	struct mlx5_adev *edev = container_of(adev, struct mlx5_adev, adev);
-	const struct mlx5e_profile *profile = &mlx5e_nic_profile;
 	struct mlx5_core_dev *mdev = edev->mdev;
+	const struct mlx5e_profile *profile;
 	struct mlx5e_dev *mlx5e_dev;
 	struct net_device *netdev;
 	struct mlx5e_priv *priv;
 	int err;
 
+	if (mlx5_core_is_mgmt_pf(mdev))
+		profile = &mlx5e_mgmt_pf_nic_profile;
+	else
+		profile = &mlx5e_nic_profile;
+
 	mlx5e_dev = mlx5e_create_devlink(&adev->dev, mdev);
 	if (IS_ERR(mlx5e_dev))
 		return PTR_ERR(mlx5e_dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 3047d7015c52..3bf419d06d53 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1665,7 +1665,7 @@ int mlx5_esw_sf_max_hpf_functions(struct mlx5_core_dev *dev, u16 *max_sfs, u16 *
 	void *hca_caps;
 	int err;
 
-	if (!mlx5_core_is_ecpf(dev)) {
+	if (!mlx5_core_is_ecpf(dev) || mlx5_core_is_mgmt_pf(dev)) {
 		*max_sfs = 0;
 		return 0;
 	}
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index cd286b681970..2bba88c67f58 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1224,6 +1224,14 @@ static inline bool mlx5_core_is_ecpf(const struct mlx5_core_dev *dev)
 	return dev->caps.embedded_cpu;
 }
 
+static inline bool mlx5_core_is_mgmt_pf(const struct mlx5_core_dev *dev)
+{
+	if (!MLX5_CAP_GEN_2(dev, local_mng_port_valid))
+		return false;
+
+	return MLX5_CAP_GEN_2(dev, local_mng_port);
+}
+
 static inline bool
 mlx5_core_is_ecpf_esw_manager(const struct mlx5_core_dev *dev)
 {
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index bf2d51952e48..586569209254 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1954,8 +1954,10 @@ enum {
 struct mlx5_ifc_cmd_hca_cap_2_bits {
 	u8	   reserved_at_0[0x80];
 
-	u8         migratable[0x1];
-	u8         reserved_at_81[0x1f];
+	u8	   migratable[0x1];
+	u8	   reserved_at_81[0x19];
+	u8	   local_mng_port[0x1];
+	u8	   reserved_at_9b[0x5];
 
 	u8	   max_reformat_insert_size[0x8];
 	u8	   max_reformat_insert_offset[0x8];
@@ -1973,7 +1975,13 @@ struct mlx5_ifc_cmd_hca_cap_2_bits {
 
 	u8	   allowed_object_for_other_vhca_access[0x40];
 
-	u8	   reserved_at_140[0x60];
+	u8	   reserved_at_140[0x20];
+
+	u8	   reserved_at_160[0xa];
+	u8	   local_mng_port_valid[0x1];
+	u8	   reserved_at_16b[0x15];
+
+	u8	   reserved_at_180[0x20];
 
 	u8	   flow_table_type_2_type[0x8];
 	u8	   reserved_at_1a8[0x3];
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [net-next 15/15] net/mlx5: Implement management PF Ethernet profile
  2023-12-21  0:57 ` [net-next 15/15] net/mlx5: Implement management PF Ethernet profile Saeed Mahameed
@ 2023-12-21  2:45   ` Nelson, Shannon
  2023-12-21 22:25     ` Saeed Mahameed
  0 siblings, 1 reply; 45+ messages in thread
From: Nelson, Shannon @ 2023-12-21  2:45 UTC (permalink / raw)
  To: Saeed Mahameed, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Armen Ratner, Daniel Jurgens

On 12/20/2023 4:57 PM, Saeed Mahameed wrote:
> 
> From: Armen Ratner <armeng@nvidia.com>
> 
> Add management PF modules, which introduce support for the structures
> needed to create the resources for the MGMT PF to work.
> Also, add the necessary calls and functions to establish this
> functionality.

Hmmm.... this reminds me of a previous discussion:
https://lore.kernel.org/netdev/20200305140322.2dc86db0@kicinski-fedora-PC1C0HJN/

sln


> [...]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [net-next 15/15] net/mlx5: Implement management PF Ethernet profile
  2023-12-21  2:45   ` Nelson, Shannon
@ 2023-12-21 22:25     ` Saeed Mahameed
  2024-01-04 22:44       ` Jakub Kicinski
  0 siblings, 1 reply; 45+ messages in thread
From: Saeed Mahameed @ 2023-12-21 22:25 UTC (permalink / raw)
  To: Nelson, Shannon
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Saeed Mahameed, netdev, Tariq Toukan, Armen Ratner,
	Daniel Jurgens

On 20 Dec 18:45, Nelson, Shannon wrote:
>On 12/20/2023 4:57 PM, Saeed Mahameed wrote:
>>
>>From: Armen Ratner <armeng@nvidia.com>
>>
>>Add management PF modules, which introduce support for the structures
>>needed to create the resources for the MGMT PF to work.
>>Also, add the necessary calls and functions to establish this
>>functionality.
>
>Hmmm.... this reminds me of a previous discussion:
>https://lore.kernel.org/netdev/20200305140322.2dc86db0@kicinski-fedora-PC1C0HJN/
>

Maybe we should have made it clear here as well: this management PF just
exposes a netdev on the embedded ARM that will be used to communicate
with the device's onboard BMC via NC-SI, so it is meant to be used
only by standard tools.

Thanks,
Saeed.


>sln
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [net-next 01/15] net/mlx5e: Use the correct lag ports number when creating TISes
  2023-12-21  0:57 ` [net-next 01/15] net/mlx5e: Use the correct lag ports number when creating TISes Saeed Mahameed
@ 2023-12-29 22:40   ` patchwork-bot+netdevbpf
  0 siblings, 0 replies; 45+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-12-29 22:40 UTC (permalink / raw)
  To: Saeed Mahameed; +Cc: davem, kuba, pabeni, edumazet, saeedm, netdev, tariqt

Hello:

This series was applied to netdev/net-next.git (main)
by Saeed Mahameed <saeedm@nvidia.com>:

On Wed, 20 Dec 2023 16:57:07 -0800 you wrote:
> From: Saeed Mahameed <saeedm@nvidia.com>
> 
> The cited commit moved the code of mlx5e_create_tises() and changed the
> loop to create TISes over MLX5_MAX_PORTS constant value, instead of
> getting the correct lag ports supported by the device, which can cause
> FW errors on devices with less than MLX5_MAX_PORTS ports.
> 
> [...]

Here is the summary with links:
  - [net-next,01/15] net/mlx5e: Use the correct lag ports number when creating TISes
    https://git.kernel.org/netdev/net-next/c/a7e7b40c4bc1
  - [net-next,02/15] net/mlx5: Fix query of sd_group field
    https://git.kernel.org/netdev/net-next/c/e04984a37398
  - [net-next,03/15] net/mlx5: SD, Introduce SD lib
    https://git.kernel.org/netdev/net-next/c/4a04a31f4932
  - [net-next,04/15] net/mlx5: SD, Implement basic query and instantiation
    https://git.kernel.org/netdev/net-next/c/63b9ce944c0e
  - [net-next,05/15] net/mlx5: SD, Implement devcom communication and primary election
    https://git.kernel.org/netdev/net-next/c/a45af9a96740
  - [net-next,06/15] net/mlx5: SD, Implement steering for primary and secondaries
    https://git.kernel.org/netdev/net-next/c/605fcce33b2d
  - [net-next,07/15] net/mlx5: SD, Add informative prints in kernel log
    https://git.kernel.org/netdev/net-next/c/c82d36032511
  - [net-next,08/15] net/mlx5e: Create single netdev per SD group
    https://git.kernel.org/netdev/net-next/c/e2578b4f983c
  - [net-next,09/15] net/mlx5e: Create EN core HW resources for all secondary devices
    https://git.kernel.org/netdev/net-next/c/c4fb94aa822d
  - [net-next,10/15] net/mlx5e: Let channels be SD-aware
    https://git.kernel.org/netdev/net-next/c/e4f9686bdee7
  - [net-next,11/15] net/mlx5e: Support cross-vhca RSS
    https://git.kernel.org/netdev/net-next/c/c73a3ab8fa6e
  - [net-next,12/15] net/mlx5e: Support per-mdev queue counter
    https://git.kernel.org/netdev/net-next/c/d72baceb9253
  - [net-next,13/15] net/mlx5e: Block TLS device offload on combined SD netdev
    https://git.kernel.org/netdev/net-next/c/83a59ce0057b
  - [net-next,14/15] net/mlx5: Enable SD feature
    https://git.kernel.org/netdev/net-next/c/c88c49ac9c18
  - [net-next,15/15] net/mlx5: Implement management PF Ethernet profile
    https://git.kernel.org/netdev/net-next/c/22c4640698a1

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [net-next 15/15] net/mlx5: Implement management PF Ethernet profile
  2023-12-21 22:25     ` Saeed Mahameed
@ 2024-01-04 22:44       ` Jakub Kicinski
  2024-01-08 23:22         ` Saeed Mahameed
  0 siblings, 1 reply; 45+ messages in thread
From: Jakub Kicinski @ 2024-01-04 22:44 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Nelson, Shannon, David S. Miller, Paolo Abeni, Eric Dumazet,
	Saeed Mahameed, netdev, Tariq Toukan, Armen Ratner,
	Daniel Jurgens

On Thu, 21 Dec 2023 14:25:33 -0800 Saeed Mahameed wrote:
> Maybe we should have made it clear here as well: this management PF just
> exposes a netdev on the embedded ARM that will be used to communicate
> with the device's onboard BMC via NC-SI, so it is meant to be used
> only by standard tools.

How's that different to any other BMC via NC-SI setup?
NC-SI is supposed to steal packets which were directed to the wire.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [pull request][net-next 00/15] mlx5 updates 2023-12-20
  2023-12-21  0:57 [pull request][net-next 00/15] mlx5 updates 2023-12-20 Saeed Mahameed
                   ` (14 preceding siblings ...)
  2023-12-21  0:57 ` [net-next 15/15] net/mlx5: Implement management PF Ethernet profile Saeed Mahameed
@ 2024-01-04 22:47 ` Jakub Kicinski
  2024-01-08  1:19   ` Jakub Kicinski
  15 siblings, 1 reply; 45+ messages in thread
From: Jakub Kicinski @ 2024-01-04 22:47 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Saeed Mahameed,
	netdev, Tariq Toukan

On Wed, 20 Dec 2023 16:57:06 -0800 Saeed Mahameed wrote:
> Support Socket-Direct multi-dev netdev

There's no documentation for any of it?

$ git grep -i 'socket.direct' -- Documentation/
$

it's a feature many people have talked about forever.
I'm pretty sure there are at least 2 vendors who have
HW support to do the same thing. Without docs everyone
will implement it slightly differently :(

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [net-next 10/15] net/mlx5e: Let channels be SD-aware
  2023-12-21  0:57 ` [net-next 10/15] net/mlx5e: Let channels be SD-aware Saeed Mahameed
@ 2024-01-04 22:50   ` Jakub Kicinski
  2024-01-08 12:30     ` Gal Pressman
  0 siblings, 1 reply; 45+ messages in thread
From: Jakub Kicinski @ 2024-01-04 22:50 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Saeed Mahameed,
	netdev, Tariq Toukan, Gal Pressman

On Wed, 20 Dec 2023 16:57:16 -0800 Saeed Mahameed wrote:
> Example for 2 mdevs and 6 channels:
> +-------+---------+
> | ch ix | mdev ix |
> +-------+---------+
> |   0   |    0    |
> |   1   |    1    |
> |   2   |    0    |
> |   3   |    1    |
> |   4   |    0    |
> |   5   |    1    |
> +-------+---------+

Meaning Rx queue 0 goes to PF 0, Rx queue 1 goes to PF 1, etc.?
Is the user then expected to magic pixie dust the XPS or some such
to get to the right queue?

How is this going to get represented in the recently merged Netlink
queue API?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [net-next 07/15] net/mlx5: SD, Add informative prints in kernel log
  2023-12-21  0:57 ` [net-next 07/15] net/mlx5: SD, Add informative prints in kernel log Saeed Mahameed
@ 2024-01-05 12:12   ` Jiri Pirko
  2024-01-25  7:42     ` Tariq Toukan
  0 siblings, 1 reply; 45+ messages in thread
From: Jiri Pirko @ 2024-01-05 12:12 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Saeed Mahameed, netdev, Tariq Toukan

Thu, Dec 21, 2023 at 01:57:13AM CET, saeed@kernel.org wrote:
>From: Tariq Toukan <tariqt@nvidia.com>
>
>Print to kernel log when an SD group moves from/to ready state.
>
>Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
>Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
>---
> .../net/ethernet/mellanox/mlx5/core/lib/sd.c  | 21 +++++++++++++++++++
> 1 file changed, 21 insertions(+)
>
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
>index 3309f21d892e..f68942277c62 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
>@@ -373,6 +373,21 @@ static void sd_cmd_unset_secondary(struct mlx5_core_dev *secondary)
> 	mlx5_fs_cmd_set_l2table_entry_silent(secondary, 0);
> }
> 
>+static void sd_print_group(struct mlx5_core_dev *primary)
>+{
>+	struct mlx5_sd *sd = mlx5_get_sd(primary);
>+	struct mlx5_core_dev *pos;
>+	int i;
>+
>+	sd_info(primary, "group id %#x, primary %s, vhca %u\n",
>+		sd->group_id, pci_name(primary->pdev),
>+		MLX5_CAP_GEN(primary, vhca_id));
>+	mlx5_sd_for_each_secondary(i, primary, pos)
>+		sd_info(primary, "group id %#x, secondary#%d %s, vhca %u\n",
>+			sd->group_id, i - 1, pci_name(pos->pdev),
>+			MLX5_CAP_GEN(pos, vhca_id));
>+}
>+
> int mlx5_sd_init(struct mlx5_core_dev *dev)
> {
> 	struct mlx5_core_dev *primary, *pos, *to;
>@@ -410,6 +425,10 @@ int mlx5_sd_init(struct mlx5_core_dev *dev)
> 			goto err_unset_secondaries;
> 	}
> 
>+	sd_info(primary, "group id %#x, size %d, combined\n",
>+		sd->group_id, mlx5_devcom_comp_get_size(sd->devcom));

Can't you rather expose this over sysfs or debugfs? I mean, a dmesg print
does not seem like a good idea.

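Something along these lines should be enough, e.g. a read-only debugfs
file under the device's debugfs root. Rough sketch only, to illustrate the
idea; the file name and hook point are made up, not taken from this series:

/* Rough illustration: dump the SD group through debugfs instead of dmesg.
 * Needs <linux/debugfs.h> and <linux/seq_file.h>; names are hypothetical.
 */
static int sd_group_show(struct seq_file *s, void *unused)
{
	struct mlx5_core_dev *primary = s->private;
	struct mlx5_sd *sd = mlx5_get_sd(primary);
	struct mlx5_core_dev *pos;
	int i;

	seq_printf(s, "group id: %#x\n", sd->group_id);
	seq_printf(s, "primary: %s vhca %u\n", pci_name(primary->pdev),
		   MLX5_CAP_GEN(primary, vhca_id));
	mlx5_sd_for_each_secondary(i, primary, pos)
		seq_printf(s, "secondary#%d: %s vhca %u\n", i - 1,
			   pci_name(pos->pdev), MLX5_CAP_GEN(pos, vhca_id));
	return 0;
}
DEFINE_SHOW_ATTRIBUTE(sd_group);

/* e.g. called once the group is combined:
 * debugfs_create_file("sd_group", 0400,
 *		       mlx5_debugfs_get_dev_root(primary),
 *		       primary, &sd_group_fops);
 */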

>+	sd_print_group(primary);
>+
> 	return 0;
> 
> err_unset_secondaries:
>@@ -440,6 +459,8 @@ void mlx5_sd_cleanup(struct mlx5_core_dev *dev)
> 	mlx5_sd_for_each_secondary(i, primary, pos)
> 		sd_cmd_unset_secondary(pos);
> 	sd_cmd_unset_primary(primary);
>+
>+	sd_info(primary, "group id %#x, uncombined\n", sd->group_id);
> out:
> 	sd_unregister(dev);
> 	sd_cleanup(dev);
>-- 
>2.43.0
>
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [net-next 04/15] net/mlx5: SD, Implement basic query and instantiation
  2023-12-21  0:57 ` [net-next 04/15] net/mlx5: SD, Implement basic query and instantiation Saeed Mahameed
@ 2024-01-05 12:15   ` Jiri Pirko
  2024-01-25  7:34     ` Tariq Toukan
  0 siblings, 1 reply; 45+ messages in thread
From: Jiri Pirko @ 2024-01-05 12:15 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Saeed Mahameed, netdev, Tariq Toukan

Thu, Dec 21, 2023 at 01:57:10AM CET, saeed@kernel.org wrote:
>From: Tariq Toukan <tariqt@nvidia.com>

[...]

>+static int sd_init(struct mlx5_core_dev *dev)

Could you maintain the "mlx5_" prefix here and in the rest of the patches?


>+{

[...]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [pull request][net-next 00/15] mlx5 updates 2023-12-20
  2024-01-04 22:47 ` [pull request][net-next 00/15] mlx5 updates 2023-12-20 Jakub Kicinski
@ 2024-01-08  1:19   ` Jakub Kicinski
  2024-01-08 23:14     ` Saeed Mahameed
  0 siblings, 1 reply; 45+ messages in thread
From: Jakub Kicinski @ 2024-01-08  1:19 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Saeed Mahameed,
	netdev, Tariq Toukan

On Thu, 4 Jan 2024 14:47:21 -0800 Jakub Kicinski wrote:
> On Wed, 20 Dec 2023 16:57:06 -0800 Saeed Mahameed wrote:
> > Support Socket-Direct multi-dev netdev  
> 
> There's no documentation for any of it?
> 
> $ git grep -i 'socket.direct' -- Documentation/
> $
> 
> it's a feature many people have talked about forever.
> I'm pretty sure there are at least 2 vendors who have
> HW support to do the same thing. Without docs everyone
> will implement it slightly differently :(

No replies so far, and v6.8 merge window has just begun,
so let me drop this from -next for now.


* Re: [net-next 10/15] net/mlx5e: Let channels be SD-aware
  2024-01-04 22:50   ` Jakub Kicinski
@ 2024-01-08 12:30     ` Gal Pressman
  2024-01-09  3:08       ` Jakub Kicinski
  0 siblings, 1 reply; 45+ messages in thread
From: Gal Pressman @ 2024-01-08 12:30 UTC (permalink / raw)
  To: Jakub Kicinski, Saeed Mahameed
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Saeed Mahameed,
	netdev, Tariq Toukan

On 05/01/2024 0:50, Jakub Kicinski wrote:
> On Wed, 20 Dec 2023 16:57:16 -0800 Saeed Mahameed wrote:
>> Example for 2 mdevs and 6 channels:
>> +-------+---------+
>> | ch ix | mdev ix |
>> +-------+---------+
>> |   0   |    0    |
>> |   1   |    1    |
>> |   2   |    0    |
>> |   3   |    1    |
>> |   4   |    0    |
>> |   5   |    1    |
>> +-------+---------+
> 
> Meaning Rx queue 0 goes to PF 0, Rx queue 1 goes to PF 1, etc.?

Correct.

> Is the user then expected to magic pixie dust the XPS or some such
> to get to the right queue?

I'm confused, how are RX queues related to XPS?
XPS shouldn't be affected, we just make sure that whatever queue XPS
chose will go out through the "right" PF.

So for example, XPS will choose a queue according to the CPU, and the
driver will make sure that packets transmitted from this SQ are going
out through the PF closer to that NUMA.
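For illustration, the alternating assignment in the table above boils down
to a simple modulo mapping; this is a minimal sketch with made-up names
(sd_ch_to_mdev_ix etc.), not the driver's actual code:

/* Channel i is backed by mdev (i % num_mdevs): ch 0 -> mdev 0,
 * ch 1 -> mdev 1, ch 2 -> mdev 0, and so on; the queue index local
 * to that mdev is i / num_mdevs. */
static int sd_ch_to_mdev_ix(int ch_ix, int num_mdevs)
{
	return ch_ix % num_mdevs;
}

static int sd_ch_to_local_ix(int ch_ix, int num_mdevs)
{
	return ch_ix / num_mdevs;
}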

> 
> How is this going to get represented in the recently merged Netlink
> queue API?

Can you share a link please?

All the logic is internal to the driver, so I expect it to be fine, but
I'd like to double check.


* Re: [net-next 08/15] net/mlx5e: Create single netdev per SD group
  2023-12-21  0:57 ` [net-next 08/15] net/mlx5e: Create single netdev per SD group Saeed Mahameed
@ 2024-01-08 13:36   ` Aishwarya TCV
  2024-01-08 13:50     ` Gal Pressman
  0 siblings, 1 reply; 45+ messages in thread
From: Aishwarya TCV @ 2024-01-08 13:36 UTC (permalink / raw)
  To: Saeed Mahameed, Tariq Toukan
  Cc: Saeed Mahameed, netdev, Gal Pressman, David S. Miller,
	Jakub Kicinski, Paolo Abeni, Eric Dumazet, Mark Brown,
	Suzuki K Poulose



On 21/12/2023 00:57, Saeed Mahameed wrote:
> From: Tariq Toukan <tariqt@nvidia.com>
> 
> Integrate the SD library calls into the auxiliary_driver ops in
> preparation for creating a single netdev for the multiple devices
> belonging to the same SD group.
> 
> SD is still disabled at this stage. It is enabled by a downstream patch
> when all needed parts are implemented.
> 
> The netdev is created only when the SD group, with all its participants,
> are ready. It is later destroyed if any of the participating devices
> drops.
> 
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> Reviewed-by: Gal Pressman <gal@nvidia.com>
> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
> ---

Hi Tariq,


Currently, when booting the kernel against next-master (next-20240108)
with Arm64 on Marvell Thunder X2 (TX2), the kernel is failing to probe
the network card, which is resulting in boot failures for our CI (with
rootfs over NFS). I can send the full logs if required. Most other
boards seem fine.

A bisect (full log below) identified this patch as introducing the
failure. Bisected it on the tag "mlx5-updates-2023-12-20" at repo
"https://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git/".

This works fine on Linux 6.7-rc5


Sample back trace from failure:
------
<3>[   67.915121] mlx5_core 0000:0b:00.1: mlx5_cmd_out_err:808:(pid
1585): ACCESS_REG(0x805) op_mod(0x1) failed, status bad parameter(0x3),
syndrome (0x6c4d48), err(-22)
<3>[   67.915121] mlx5_core 0000:0b:00.1: mlx5_cmd_out_err:808:(pid
1585): ACCESS_REG(0x805) op_mod(0x1) failed, status bad parameter(0x3),
syndrome (0x6c4d48), err(-22)
<4>[   67.945022] mlx5_core.eth: probe of mlx5_core.eth.1 failed with
error -22
<4>[   67.945022] mlx5_core.eth: probe of mlx5_core.eth.1 failed with
error -22
------


Here is the lspci o/p for the card:
------
0b:00.0 Ethernet controller: Mellanox Technologies MT27710 Family
[ConnectX-4 Lx]
    Subsystem: Hewlett Packard Enterprise MT27710 Family [ConnectX-4 Lx]
    Flags: bus master, fast devsel, latency 0, IRQ 30, NUMA node 0,
IOMMU group 0
    Memory at 10000000000 (64-bit, prefetchable) [size=32M]
    Expansion ROM at 43000000 [disabled] [size=1M]
    Capabilities: [60] Express Endpoint, MSI 00
    Capabilities: [48] Vital Product Data
    Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
    Capabilities: [c0] Vendor Specific Information: Len=18 <?>
    Capabilities: [40] Power Management version 3
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
    Capabilities: [180] Single Root I/O Virtualization (SR-IOV)
    Capabilities: [1c0] Secondary PCI Express
    Capabilities: [230] Access Control Services
------


Bisect log:
------
git bisect start
# good: [a39b6ac3781d46ba18193c9dbb2110f31e9bffe9] Linux 6.7-rc5
git bisect good a39b6ac3781d46ba18193c9dbb2110f31e9bffe9
# bad: [22c4640698a1d47606b5a4264a584e8046641784] net/mlx5: Implement
management PF Ethernet profile
git bisect bad 22c4640698a1d47606b5a4264a584e8046641784
# good: [f12f551b5b966ec58bfba9daa15f3cb99a92c1f9] bnxt_en: Prevent TX
timeout with a very small TX ring
git bisect good f12f551b5b966ec58bfba9daa15f3cb99a92c1f9
# good: [509afc7452707e62fb7c4bb257f111617332ffad] Merge branch
'tools-net-ynl-add-sub-message-support-to-ynl'
git bisect good 509afc7452707e62fb7c4bb257f111617332ffad
# good: [0ee28c9ae042e77100fae2cd82a54750668aafce] Merge tag
'wireless-next-2023-12-18' of
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
git bisect good 0ee28c9ae042e77100fae2cd82a54750668aafce
# good: [29c302a2e265a356434b005155990a9e766db75d] libbpf: further
decouple feature checking logic from bpf_object
git bisect good 29c302a2e265a356434b005155990a9e766db75d
# good: [852486b35f344887786d63250946dd921a05d7e8] x86/cfi,bpf: Fix
bpf_exception_cb() signature
git bisect good 852486b35f344887786d63250946dd921a05d7e8
# good: [e37a11fca41864c9f652ff81296b82e6f65a4242] bridge: add MDB state
mask uAPI attribute
git bisect good e37a11fca41864c9f652ff81296b82e6f65a4242
# good: [bee9705c679d0df8ee099e3c5312ac76f447848a] Merge branch
'net-sched-tc-drop-reason'
git bisect good bee9705c679d0df8ee099e3c5312ac76f447848a
# good: [c82d360325112ccc512fc11a3b68cdcdf04a1478] net/mlx5: SD, Add
informative prints in kernel log
git bisect good c82d360325112ccc512fc11a3b68cdcdf04a1478
# bad: [c73a3ab8fa6e93a783bd563938d7cf00d62d5d34] net/mlx5e: Support
cross-vhca RSS
git bisect bad c73a3ab8fa6e93a783bd563938d7cf00d62d5d34
# bad: [c4fb94aa822d6c9d05fc3c5aee35c7e339061dc1] net/mlx5e: Create EN
core HW resources for all secondary devices
git bisect bad c4fb94aa822d6c9d05fc3c5aee35c7e339061dc1
# bad: [e2578b4f983cfcd47837bbe3bcdbf5920e50b2ad] net/mlx5e: Create
single netdev per SD group
git bisect bad e2578b4f983cfcd47837bbe3bcdbf5920e50b2ad
# first bad commit: [e2578b4f983cfcd47837bbe3bcdbf5920e50b2ad]
net/mlx5e: Create single netdev per SD group
------

Thanks,
Aishwarya


* Re: [net-next 08/15] net/mlx5e: Create single netdev per SD group
  2024-01-08 13:36   ` Aishwarya TCV
@ 2024-01-08 13:50     ` Gal Pressman
  2024-01-08 15:54       ` Mark Brown
  0 siblings, 1 reply; 45+ messages in thread
From: Gal Pressman @ 2024-01-08 13:50 UTC (permalink / raw)
  To: Aishwarya TCV, Saeed Mahameed, Tariq Toukan
  Cc: Saeed Mahameed, netdev, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Eric Dumazet, Mark Brown, Suzuki K Poulose

On 08/01/2024 15:36, Aishwarya TCV wrote:
> 
> 
> On 21/12/2023 00:57, Saeed Mahameed wrote:
>> From: Tariq Toukan <tariqt@nvidia.com>
>>
>> Integrate the SD library calls into the auxiliary_driver ops in
>> preparation for creating a single netdev for the multiple devices
>> belonging to the same SD group.
>>
>> SD is still disabled at this stage. It is enabled by a downstream patch
>> when all needed parts are implemented.
>>
>> The netdev is created only when the SD group, with all its participants,
>> are ready. It is later destroyed if any of the participating devices
>> drops.
>>
>> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
>> Reviewed-by: Gal Pressman <gal@nvidia.com>
>> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
>> ---
> 
> Hi Tariq,
> 
> 
> Currently when booting the kernel against next-master(next-20240108)
> with Arm64 on Marvell Thunder X2 (TX2), the kernel is failing to probe
> the network card which is resulting in boot failures for our CI (with
> rootfs over NFS). I can send the full logs if required. Most other
> boards seem fine.
> 
> A bisect (full log below) identified this patch as introducing the
> failure. Bisected it on the tag "mlx5-updates-2023-12-20" at repo
> "https://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git/".
> 
> This works fine on Linux 6.7-rc5

Thanks Aishwarya!

We just stumbled upon this internally as well; I assume you are using a
(very) old firmware version?
If it's the same issue we should have a fix coming soon.


* Re: [net-next 08/15] net/mlx5e: Create single netdev per SD group
  2024-01-08 13:50     ` Gal Pressman
@ 2024-01-08 15:54       ` Mark Brown
  2024-01-08 16:00         ` Gal Pressman
  0 siblings, 1 reply; 45+ messages in thread
From: Mark Brown @ 2024-01-08 15:54 UTC (permalink / raw)
  To: Gal Pressman
  Cc: Aishwarya TCV, Saeed Mahameed, Tariq Toukan, Saeed Mahameed,
	netdev, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, Suzuki K Poulose


On Mon, Jan 08, 2024 at 03:50:09PM +0200, Gal Pressman wrote:

> We just stumbled upon this internally as well, I assume you are using a
> (very) old firmware version?
> If it's the same issue we should have a fix coming soon.

The firmware version announced on boot is 14.21.1000 - the rootfs the
tests are using is based on Debian Bullseye, the firmware will be
coming from either there or the UEFI image on the system.



* Re: [net-next 08/15] net/mlx5e: Create single netdev per SD group
  2024-01-08 15:54       ` Mark Brown
@ 2024-01-08 16:00         ` Gal Pressman
  0 siblings, 0 replies; 45+ messages in thread
From: Gal Pressman @ 2024-01-08 16:00 UTC (permalink / raw)
  To: Mark Brown
  Cc: Aishwarya TCV, Saeed Mahameed, Tariq Toukan, Saeed Mahameed,
	netdev, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, Suzuki K Poulose

On 08/01/2024 17:54, Mark Brown wrote:
> On Mon, Jan 08, 2024 at 03:50:09PM +0200, Gal Pressman wrote:
> 
>> We just stumbled upon this internally as well, I assume you are using a
>> (very) old firmware version?
>> If it's the same issue we should have a fix coming soon.
> 
> The firmware version announced on boot is 14.21.1000 - the rootfs the
> tests are using is based on Debian Bullseye, the firmware will be
> coming from either there or the UEFI image on the system.

Makes sense; you are using a fw version from 2017 :(.
Anyway, we should have a fix soon.


* Re: [pull request][net-next 00/15] mlx5 updates 2023-12-20
  2024-01-08  1:19   ` Jakub Kicinski
@ 2024-01-08 23:14     ` Saeed Mahameed
  0 siblings, 0 replies; 45+ messages in thread
From: Saeed Mahameed @ 2024-01-08 23:14 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Saeed Mahameed,
	netdev, Tariq Toukan

On 07 Jan 17:19, Jakub Kicinski wrote:
>On Thu, 4 Jan 2024 14:47:21 -0800 Jakub Kicinski wrote:
>> On Wed, 20 Dec 2023 16:57:06 -0800 Saeed Mahameed wrote:
>> > Support Socket-Direct multi-dev netdev
>>
>> There's no documentation for any of it?
>>
>> $ git grep -i 'socket.direct' -- Documentation/
>> $
>>
>> it's a feature many people have talked about forever.
>> I'm pretty sure there are at least 2 vendors who have
>> HW support to do the same thing. Without docs everyone
>> will implement it slightly differently :(
>
>No replies so far, and v6.8 merge window has just begun,
>so let me drop this from -next for now.
>

But why revert? What was wrong with the code or the current design?
The current comments aren't that critical, and I am sure you understand
that people are on holiday vacation.

We will provide the docs, but IMHO, docs could have easily been a
follow-up.

What's the point of the upstream process if a surprise
revert can be done at any point by a maintainer? This is not the first
instance; this has happened before with the first management PF iteration.
At least that time you asked for a revert and we agreed, but this revert
came as a complete surprise.

Can we not do these reverts in such a stealthy way? It makes the whole
acceptance criteria unreliable. Many teams rely on things getting accepted
so they can plan their next steps. We have an upstream-first open source
policy at nVidia networking, and predictability is very important to us;
uncertainty, especially when things are already accepted, is something
that is very hard for us to work with.

Thanks,
Saeed.



* Re: [net-next 15/15] net/mlx5: Implement management PF Ethernet profile
  2024-01-04 22:44       ` Jakub Kicinski
@ 2024-01-08 23:22         ` Saeed Mahameed
  2024-01-09  2:58           ` Jakub Kicinski
  0 siblings, 1 reply; 45+ messages in thread
From: Saeed Mahameed @ 2024-01-08 23:22 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Nelson, Shannon, David S. Miller, Paolo Abeni, Eric Dumazet,
	Saeed Mahameed, netdev, Tariq Toukan, Armen Ratner,
	Daniel Jurgens

On 04 Jan 14:44, Jakub Kicinski wrote:
>On Thu, 21 Dec 2023 14:25:33 -0800 Saeed Mahameed wrote:
>> Maybe we should have made it clear here as well, this management PF just
>> exposes a netdev on the embedded ARM that will be used to communicate
>> with the device onboard BMC via NC-SI, so it meant to be used
>> only by standard tools.
>
>How's that different to any other BMC via NC-SI setup?
>NC-SI is supposed to steal packets which were directed to the wire.
>

This is an embedded core switchdev setup: there is no PF representor, only
uplink and VF/SF representors, and the term management PF is only FW
terminology. Since uplink traffic is controlled by the admin and the uplink
interface represents what goes in/out the wire, the current FW architecture
demands that BMC/NCSI traffic goes through a separate PF that is not the
uplink, since the uplink rules are managed purely by the eswitch admin.



* Re: [net-next 15/15] net/mlx5: Implement management PF Ethernet profile
  2024-01-08 23:22         ` Saeed Mahameed
@ 2024-01-09  2:58           ` Jakub Kicinski
  2024-01-17  7:37             ` Saeed Mahameed
  0 siblings, 1 reply; 45+ messages in thread
From: Jakub Kicinski @ 2024-01-09  2:58 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Nelson, Shannon, David S. Miller, Paolo Abeni, Eric Dumazet,
	Saeed Mahameed, netdev, Tariq Toukan, Armen Ratner,
	Daniel Jurgens

On Mon, 8 Jan 2024 15:22:12 -0800 Saeed Mahameed wrote:
> This is embedded core switchdev setup, there is no PF representor, only
> uplink and VF/SF representors, the term management PF is only FW
> terminology, since uplink traffic is controlled by the admin, and uplink
> interface represents what goes in/out the wire, the current FW architecture
> demands that BMC/NCSI traffic goes through a separate PF that is not the
> uplink since the uplink rules are managed purely by the eswitch admin.

"Normal way" to talk to the BMC is to send the traffic to the uplink
and let the NC-SI filter "steal" the frames. There's no need for a host
PF (which I think is what you're referring to when you say there's
no PF representor).

Can you rephrase / draw a diagram? Perhaps I'm missing something.
When the host is managing the eswitch for mlx5 AFAIU NC-SI frame
stealing works fine.. so I'm missing what's different with the EC.


* Re: [net-next 10/15] net/mlx5e: Let channels be SD-aware
  2024-01-08 12:30     ` Gal Pressman
@ 2024-01-09  3:08       ` Jakub Kicinski
  2024-01-09 14:15         ` Gal Pressman
  0 siblings, 1 reply; 45+ messages in thread
From: Jakub Kicinski @ 2024-01-09  3:08 UTC (permalink / raw)
  To: Gal Pressman
  Cc: Saeed Mahameed, David S. Miller, Paolo Abeni, Eric Dumazet,
	Saeed Mahameed, netdev, Tariq Toukan

On Mon, 8 Jan 2024 14:30:54 +0200 Gal Pressman wrote:
> On 05/01/2024 0:50, Jakub Kicinski wrote:
> > On Wed, 20 Dec 2023 16:57:16 -0800 Saeed Mahameed wrote:  
> >> Example for 2 mdevs and 6 channels:
> >> +-------+---------+
> >> | ch ix | mdev ix |
> >> +-------+---------+
> >> |   0   |    0    |
> >> |   1   |    1    |
> >> |   2   |    0    |
> >> |   3   |    1    |
> >> |   4   |    0    |
> >> |   5   |    1    |
> >> +-------+---------+  
> > 
> > Meaning Rx queue 0 goes to PF 0, Rx queue 1 goes to PF 1, etc.?  
> 
> Correct.
> 
> > Is the user then expected to magic pixie dust the XPS or some such
> > to get to the right queue?  
> 
> I'm confused, how are RX queues related to XPS?

Separate sentence, perhaps I should be more verbose..

> XPS shouldn't be affected, we just make sure that whatever queue XPS
> chose will go out through the "right" PF.

But you said "correct" to queue 0 going to PF 0 and queue 1 to PF 1.
The queue IDs in my question refer to the queue mapping from the stack's
perspective. If the user wants to send everything to queue 0, will it use
both PFs?

> So for example, XPS will choose a queue according to the CPU, and the
> driver will make sure that packets transmitted from this SQ are going
> out through the PF closer to that NUMA.

Sounds like queue 0 is duplicated in both PFs, then?

> > How is this going to get represented in the recently merged Netlink
> > queue API?  
> 
> Can you share a link please?

commit a90d56049acc45802f67cd7d4c058ac45b1bc26f
 
> All the logic is internal to the driver, so I expect it to be fine, but
> I'd like to double check.

Herm, "internal to the driver" is a bit of a landmine. It will be fine
for iperf testing but real users will want to configure the NIC.


* Re: [net-next 10/15] net/mlx5e: Let channels be SD-aware
  2024-01-09  3:08       ` Jakub Kicinski
@ 2024-01-09 14:15         ` Gal Pressman
  2024-01-09 16:00           ` Jakub Kicinski
  0 siblings, 1 reply; 45+ messages in thread
From: Gal Pressman @ 2024-01-09 14:15 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Saeed Mahameed, David S. Miller, Paolo Abeni, Eric Dumazet,
	Saeed Mahameed, netdev, Tariq Toukan

On 09/01/2024 5:08, Jakub Kicinski wrote:
> On Mon, 8 Jan 2024 14:30:54 +0200 Gal Pressman wrote:
>> On 05/01/2024 0:50, Jakub Kicinski wrote:
>>> On Wed, 20 Dec 2023 16:57:16 -0800 Saeed Mahameed wrote:  
>>>> Example for 2 mdevs and 6 channels:
>>>> +-------+---------+
>>>> | ch ix | mdev ix |
>>>> +-------+---------+
>>>> |   0   |    0    |
>>>> |   1   |    1    |
>>>> |   2   |    0    |
>>>> |   3   |    1    |
>>>> |   4   |    0    |
>>>> |   5   |    1    |
>>>> +-------+---------+  
>>>
>>> Meaning Rx queue 0 goes to PF 0, Rx queue 1 goes to PF 1, etc.?  
>>
>> Correct.
>>
>>> Is the user then expected to magic pixie dust the XPS or some such
>>> to get to the right queue?  
>>
>> I'm confused, how are RX queues related to XPS?
> 
> Separate sentence, perhaps I should be more verbose..

Sorry, yes, your understanding is correct.
If a packet is received on RQ 0 then it is from PF 0, RQ 1 came from PF
1, etc. Though this is all from the same wire/port.

You can enable arfs for example, which will make sure that packets that
are destined to a certain CPU will be received by the PF that is closer
to it.

>> XPS shouldn't be affected, we just make sure that whatever queue XPS
>> chose will go out through the "right" PF.
> 
> But you said "correct" to queue 0 going to PF 0 and queue 1 to PF 1.
> The queue IDs in my question refer to the queue mapping form the stacks
> perspective. If user wants to send everything to queue 0 will it use
> both PFs?

If all traffic is transmitted through queue 0, it will go out from PF 0
(the PF that is closer to CPU 0 numa).

>> So for example, XPS will choose a queue according to the CPU, and the
>> driver will make sure that packets transmitted from this SQ are going
>> out through the PF closer to that NUMA.
> 
> Sounds like queue 0 is duplicated in both PFs, then?

Depends on how you look at it, each PF has X queues, the netdev has 2X
queues.

>>> How is this going to get represented in the recently merged Netlink
>>> queue API?  
>>
>> Can you share a link please?
> 
> commit a90d56049acc45802f67cd7d4c058ac45b1bc26f

Thanks, will take a look.

>> All the logic is internal to the driver, so I expect it to be fine, but
>> I'd like to double check.
> 
> Herm, "internal to the driver" is a bit of a landmine. It will be fine
> for iperf testing but real users will want to configure the NIC.

What kind of configuration are you thinking of?


* Re: [net-next 10/15] net/mlx5e: Let channels be SD-aware
  2024-01-09 14:15         ` Gal Pressman
@ 2024-01-09 16:00           ` Jakub Kicinski
  2024-01-10 14:09             ` Gal Pressman
  0 siblings, 1 reply; 45+ messages in thread
From: Jakub Kicinski @ 2024-01-09 16:00 UTC (permalink / raw)
  To: Gal Pressman
  Cc: Saeed Mahameed, David S. Miller, Paolo Abeni, Eric Dumazet,
	Saeed Mahameed, netdev, Tariq Toukan

On Tue, 9 Jan 2024 16:15:50 +0200 Gal Pressman wrote:
> >> I'm confused, how are RX queues related to XPS?  
> > 
> > Separate sentence, perhaps I should be more verbose..  
> 
> Sorry, yes, your understanding is correct.
> If a packet is received on RQ 0 then it is from PF 0, RQ 1 came from PF
> 1, etc. Though this is all from the same wire/port.
> 
> You can enable arfs for example, which will make sure that packets that
> are destined to a certain CPU will be received by the PF that is closer
> to it.

Got it.

> >> XPS shouldn't be affected, we just make sure that whatever queue XPS
> >> chose will go out through the "right" PF.  
> > 
> > But you said "correct" to queue 0 going to PF 0 and queue 1 to PF 1.
> > The queue IDs in my question refer to the queue mapping form the stacks
> > perspective. If user wants to send everything to queue 0 will it use
> > both PFs?  
> 
> If all traffic is transmitted through queue 0, it will go out from PF 0
> (the PF that is closer to CPU 0 numa).

Okay, but earlier you said: "whatever queue XPS chose will go out
through the "right" PF." - which I read as PF will be chosen based
on CPU locality regardless of XPS logic.

If queue 0 => PF 0, then user has to set up XPS to make CPUs from NUMA
node which has PF 0 use even number queues, and PF 1 to use odd number
queues. Correct?

> >> So for example, XPS will choose a queue according to the CPU, and the
> >> driver will make sure that packets transmitted from this SQ are going
> >> out through the PF closer to that NUMA.  
> > 
> > Sounds like queue 0 is duplicated in both PFs, then?  
> 
> Depends on how you look at it, each PF has X queues, the netdev has 2X
> queues.

I'm asking how it looks from the user perspective, to be clear.
From the above I gather that the answer is no - queue 0 maps directly
to PF 0 / queue 0; nothing on PF 1 will ever see traffic of queue 0.

> >> Can you share a link please?  
> > 
> > commit a90d56049acc45802f67cd7d4c058ac45b1bc26f  
> 
> Thanks, will take a look.
> 
> >> All the logic is internal to the driver, so I expect it to be fine, but
> >> I'd like to double check.
> > 
> > Herm, "internal to the driver" is a bit of a landmine. It will be fine
> > for iperf testing but real users will want to configure the NIC.
> 
> What kind of configuration are you thinking of?

Well, I was hoping you'd do the legwork and show how user configuration
logic has to be augmented for all relevant stack features to work with
multi-PF devices. I can list the APIs that come to mind while writing
this email, but that won't be exhaustive :(


* Re: [net-next 10/15] net/mlx5e: Let channels be SD-aware
  2024-01-09 16:00           ` Jakub Kicinski
@ 2024-01-10 14:09             ` Gal Pressman
  2024-01-25  8:01               ` Tariq Toukan
  0 siblings, 1 reply; 45+ messages in thread
From: Gal Pressman @ 2024-01-10 14:09 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Saeed Mahameed, David S. Miller, Paolo Abeni, Eric Dumazet,
	Saeed Mahameed, netdev, Tariq Toukan

On 09/01/2024 18:00, Jakub Kicinski wrote:
> On Tue, 9 Jan 2024 16:15:50 +0200 Gal Pressman wrote:
>>>> I'm confused, how are RX queues related to XPS?  
>>>
>>> Separate sentence, perhaps I should be more verbose..  
>>
>> Sorry, yes, your understanding is correct.
>> If a packet is received on RQ 0 then it is from PF 0, RQ 1 came from PF
>> 1, etc. Though this is all from the same wire/port.
>>
>> You can enable arfs for example, which will make sure that packets that
>> are destined to a certain CPU will be received by the PF that is closer
>> to it.
> 
> Got it.
> 
>>>> XPS shouldn't be affected, we just make sure that whatever queue XPS
>>>> chose will go out through the "right" PF.  
>>>
>>> But you said "correct" to queue 0 going to PF 0 and queue 1 to PF 1.
>>> The queue IDs in my question refer to the queue mapping form the stacks
>>> perspective. If user wants to send everything to queue 0 will it use
>>> both PFs?  
>>
>> If all traffic is transmitted through queue 0, it will go out from PF 0
>> (the PF that is closer to CPU 0 numa).
> 
> Okay, but earlier you said: "whatever queue XPS chose will go out
> through the "right" PF." - which I read as PF will be chosen based
> on CPU locality regardless of XPS logic.
> 
> If queue 0 => PF 0, then user has to set up XPS to make CPUs from NUMA
> node which has PF 0 use even number queues, and PF 1 to use odd number
> queues. Correct?

I think it is based on the default xps configuration, but I don't want
to get the details wrong; I'm checking with Tariq and will reply (he's OOO).

>>>> So for example, XPS will choose a queue according to the CPU, and the
>>>> driver will make sure that packets transmitted from this SQ are going
>>>> out through the PF closer to that NUMA.  
>>>
>>> Sounds like queue 0 is duplicated in both PFs, then?  
>>
>> Depends on how you look at it, each PF has X queues, the netdev has 2X
>> queues.
> 
> I'm asking how it looks from the user perspective, to be clear.

From the user's perspective there is a single netdev; the PF separation
is internal to the driver and transparent to the user.
The user configures the number of queues, and the driver splits them
between the PFs.

Same for other features, the user configures the netdev like any other
netdev, it is up to the driver to make sure that the netdev model is
working.

> From above I gather than the answer is no - queue 0 maps directly 
> to PF 0 / queue 0, nothing on PF 1 will ever see traffic of queue 0.

Right, traffic received on RQ 0 is traffic that was processed by PF 0.
RQ 1 is in fact (PF 1, RQ 0).

>>>> Can you share a link please?  
>>>
>>> commit a90d56049acc45802f67cd7d4c058ac45b1bc26f  
>>
>> Thanks, will take a look.
>>
>>>> All the logic is internal to the driver, so I expect it to be fine, but
>>>> I'd like to double check.
>>>
>>> Herm, "internal to the driver" is a bit of a landmine. It will be fine
>>> for iperf testing but real users will want to configure the NIC.
>>
>> What kind of configuration are you thinking of?
> 
> Well, I was hoping you'd do the legwork and show how user configuration
> logic has to be augmented for all relevant stack features to work with
> multi-PF devices. I can list the APIs that come to mind while writing
> this email, but that won't be exhaustive :(

We have been working on this feature for a long time; we did think of
the different configurations and potential issues, and backed that up
with our testing.

TLS for example is explicitly blocked in this series for such netdevices
as we identified it as problematic.
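
As an aside, a minimal sketch of what such blocking can look like at the
netdev feature level; mlx5e_is_sd_netdev() is a made-up predicate here,
and this is not the actual patch:

/* Do not advertise device TLS offload when the netdev spans multiple PFs. */
static netdev_features_t sd_fix_features(struct net_device *netdev,
					 netdev_features_t features)
{
	if (mlx5e_is_sd_netdev(netdev))
		features &= ~(NETIF_F_HW_TLS_TX | NETIF_F_HW_TLS_RX);
	return features;
}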

There is always potential that we missed things, that's why I was
genuinely curious to hear if you had anything specific in mind.


* Re: [net-next 15/15] net/mlx5: Implement management PF Ethernet profile
  2024-01-09  2:58           ` Jakub Kicinski
@ 2024-01-17  7:37             ` Saeed Mahameed
  2024-01-18  2:04               ` Jakub Kicinski
  0 siblings, 1 reply; 45+ messages in thread
From: Saeed Mahameed @ 2024-01-17  7:37 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Nelson, Shannon, David S. Miller, Paolo Abeni, Eric Dumazet,
	Saeed Mahameed, netdev, Tariq Toukan, Armen Ratner,
	Daniel Jurgens

On 08 Jan 18:58, Jakub Kicinski wrote:
>On Mon, 8 Jan 2024 15:22:12 -0800 Saeed Mahameed wrote:
>> This is embedded core switchdev setup, there is no PF representor, only
>> uplink and VF/SF representors, the term management PF is only FW
>> terminology, since uplink traffic is controlled by the admin, and uplink
>> interface represents what goes in/out the wire, the current FW architecture
>> demands that BMC/NCSI traffic goes through a separate PF that is not the
>> uplink since the uplink rules are managed purely by the eswitch admin.
>
>"Normal way" to talk to the BMC is to send the traffic to the uplink
>and let the NC-SI filter "steal" the frames. There's not need for host
>PF (which I think is what you're referring to when you say there's
>no PF representor).
>
>Can you rephrase / draw a diagram? Perhaps I'm missing something.
>When the host is managing the eswitch for mlx5 AFAIU NC-SI frame
>stealing works fine.. so I'm missing what's different with the EC.

AFAIK it is not implemented via "stealing" from the esw; the esw is
completely managed by the driver and FW has no access to it. The management
PF completely bypasses the eswitch to talk to the BMC in the ConnectX arch.


    ┌─────────────┐            ┌─────────────┐
    │             │            │             │
    │             │            │            ┌┼────────────┐
    │     ┌───────┼────────────┼────────────┼│ mgmt PF    │
    │  BMC│       │ NC-SI      │   ConnectX └┼────────────┘
    │     │       │◄──────────►│             │
    │     │       │            │     NIC     │
    │     │       │            │            ┌┼────────────┐
    │     │       │            │      ┌─────┼│ PF         │
    │     │       │            │      │     └┼────────────┘
    │     │       │            │      │      │
    └─────▼───────┘            └──────▼──────┘
          │phy                        │ phy
          │                           │
          ▼                           ▼
      Management                     Network
        Network



* Re: [net-next 15/15] net/mlx5: Implement management PF Ethernet profile
  2024-01-17  7:37             ` Saeed Mahameed
@ 2024-01-18  2:04               ` Jakub Kicinski
  0 siblings, 0 replies; 45+ messages in thread
From: Jakub Kicinski @ 2024-01-18  2:04 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Nelson, Shannon, David S. Miller, Paolo Abeni, Eric Dumazet,
	Saeed Mahameed, netdev, Tariq Toukan, Armen Ratner,
	Daniel Jurgens

On Tue, 16 Jan 2024 23:37:28 -0800 Saeed Mahameed wrote:
> On 08 Jan 18:58, Jakub Kicinski wrote:
> >On Mon, 8 Jan 2024 15:22:12 -0800 Saeed Mahameed wrote:  
> >> This is embedded core switchdev setup, there is no PF representor, only
> >> uplink and VF/SF representors, the term management PF is only FW
> >> terminology, since uplink traffic is controlled by the admin, and uplink
> >> interface represents what goes in/out the wire, the current FW architecture
> >> demands that BMC/NCSI traffic goes through a separate PF that is not the
> >> uplink since the uplink rules are managed purely by the eswitch admin.  
> >
> >"Normal way" to talk to the BMC is to send the traffic to the uplink
> >and let the NC-SI filter "steal" the frames. There's not need for host
> >PF (which I think is what you're referring to when you say there's
> >no PF representor).
> >
> >Can you rephrase / draw a diagram? Perhaps I'm missing something.
> >When the host is managing the eswitch for mlx5 AFAIU NC-SI frame
> >stealing works fine.. so I'm missing what's different with the EC.  
> 
> AFAIK it is not implemented via "stealing" from esw, esw is completely
> managed by driver, FW has no access to it, the management PF completely
> bypasses eswitch to talk to BMC in ConnectX arch.
> 
> 
>     ┌─────────────┐            ┌─────────────┐
>     │             │            │             │
>     │             │            │            ┌┼────────────┐
>     │     ┌───────┼────────────┼────────────┼│ mgmt PF    │
>     │  BMC│       │ NC-SI      │   ConnectX └┼────────────┘
>     │     │       │◄──────────►│             │
>     │     │       │      ^     │     NIC     │
>     │     │       │      |     │            ┌┼────────────┐
>     │     │       │      |     │      ┌─────┼│ PF         │
>     │     │       │      |     │      │     └┼────────────┘
>     │     │       │      |     │      │      │
>     └─────▼───────┘      |     └──────▼──────┘
>           │phy           /            │ phy
>           │             /             │
>           ▼            /              ▼
>       Management      /              Network
>         Network      /
                      /
                     /
What are the two lines here?

Are there really two connections / a separate MAC that's
not the NC-SI one?

Or is the BMC configured to bridge / forward between NC-SI
and the port?

Or the pass-thru packets are somehow encapsulated over the NC-SI MAC?


* Re: [net-next 04/15] net/mlx5: SD, Implement basic query and instantiation
  2024-01-05 12:15   ` Jiri Pirko
@ 2024-01-25  7:34     ` Tariq Toukan
  2024-01-29  9:21       ` Jiri Pirko
  0 siblings, 1 reply; 45+ messages in thread
From: Tariq Toukan @ 2024-01-25  7:34 UTC (permalink / raw)
  To: Jiri Pirko, Saeed Mahameed
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Saeed Mahameed, netdev, Tariq Toukan



On 05/01/2024 14:15, Jiri Pirko wrote:
> Thu, Dec 21, 2023 at 01:57:10AM CET, saeed@kernel.org wrote:
>> From: Tariq Toukan <tariqt@nvidia.com>
> 
> [...]
> 
>> +static int sd_init(struct mlx5_core_dev *dev)
> 
> Could you maintain "mlx5_" prefix here and in the rest of the patches?
> 
> 

Hi Jiri,

We do not necessarily maintain this prefix for non-exposed static functions.

>> +{
> 
> [...]


* Re: [net-next 07/15] net/mlx5: SD, Add informative prints in kernel log
  2024-01-05 12:12   ` Jiri Pirko
@ 2024-01-25  7:42     ` Tariq Toukan
  2024-01-29  9:20       ` Jiri Pirko
  0 siblings, 1 reply; 45+ messages in thread
From: Tariq Toukan @ 2024-01-25  7:42 UTC (permalink / raw)
  To: Jiri Pirko, Saeed Mahameed
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Saeed Mahameed, netdev, Tariq Toukan



On 05/01/2024 14:12, Jiri Pirko wrote:
> Thu, Dec 21, 2023 at 01:57:13AM CET, saeed@kernel.org wrote:
>> From: Tariq Toukan <tariqt@nvidia.com>
>>
>> Print to kernel log when an SD group moves from/to ready state.
>>
>> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
>> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
>> ---
>> .../net/ethernet/mellanox/mlx5/core/lib/sd.c  | 21 +++++++++++++++++++
>> 1 file changed, 21 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
>> index 3309f21d892e..f68942277c62 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
>> @@ -373,6 +373,21 @@ static void sd_cmd_unset_secondary(struct mlx5_core_dev *secondary)
>> 	mlx5_fs_cmd_set_l2table_entry_silent(secondary, 0);
>> }
>>
>> +static void sd_print_group(struct mlx5_core_dev *primary)
>> +{
>> +	struct mlx5_sd *sd = mlx5_get_sd(primary);
>> +	struct mlx5_core_dev *pos;
>> +	int i;
>> +
>> +	sd_info(primary, "group id %#x, primary %s, vhca %u\n",
>> +		sd->group_id, pci_name(primary->pdev),
>> +		MLX5_CAP_GEN(primary, vhca_id));
>> +	mlx5_sd_for_each_secondary(i, primary, pos)
>> +		sd_info(primary, "group id %#x, secondary#%d %s, vhca %u\n",
>> +			sd->group_id, i - 1, pci_name(pos->pdev),
>> +			MLX5_CAP_GEN(pos, vhca_id));
>> +}
>> +
>> int mlx5_sd_init(struct mlx5_core_dev *dev)
>> {
>> 	struct mlx5_core_dev *primary, *pos, *to;
>> @@ -410,6 +425,10 @@ int mlx5_sd_init(struct mlx5_core_dev *dev)
>> 			goto err_unset_secondaries;
>> 	}
>>
>> +	sd_info(primary, "group id %#x, size %d, combined\n",
>> +		sd->group_id, mlx5_devcom_comp_get_size(sd->devcom));
> 
> Can't you rather expose this over sysfs or debugfs? I mean, dmesg print
> does not seem like a good idea.
> 
> 

I think that the events of netdev combine/uncombine are important enough 
to be logged in the kernel dmesg.
I can implement a debugfs as an addition, not replacing the print.
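
To make the suggestion concrete, here is a rough sketch of what a read-only
debugfs node for the group could look like. It only reuses the helpers
already visible in the patch above; sd_dbg_root is a hypothetical parent
dentry, and this is illustrative only, not the actual implementation:

#include <linux/debugfs.h>
#include <linux/seq_file.h>

/* Dump the SD group layout: group id, primary, then each secondary. */
static int sd_group_show(struct seq_file *s, void *unused)
{
	struct mlx5_core_dev *primary = s->private;
	struct mlx5_sd *sd = mlx5_get_sd(primary);
	struct mlx5_core_dev *pos;
	int i;

	seq_printf(s, "group_id: %#x\n", sd->group_id);
	seq_printf(s, "primary: %s vhca %u\n",
		   pci_name(primary->pdev), MLX5_CAP_GEN(primary, vhca_id));
	mlx5_sd_for_each_secondary(i, primary, pos)
		seq_printf(s, "secondary%d: %s vhca %u\n", i - 1,
			   pci_name(pos->pdev), MLX5_CAP_GEN(pos, vhca_id));
	return 0;
}
DEFINE_SHOW_ATTRIBUTE(sd_group);

/* Hypothetical hook, e.g. called once the group is combined. */
static void sd_debugfs_init(struct mlx5_core_dev *primary,
			    struct dentry *sd_dbg_root)
{
	debugfs_create_file("sd_group", 0400, sd_dbg_root, primary,
			    &sd_group_fops);
}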

>> +	sd_print_group(primary);
>> +
>> 	return 0;
>>
>> err_unset_secondaries:
>> @@ -440,6 +459,8 @@ void mlx5_sd_cleanup(struct mlx5_core_dev *dev)
>> 	mlx5_sd_for_each_secondary(i, primary, pos)
>> 		sd_cmd_unset_secondary(pos);
>> 	sd_cmd_unset_primary(primary);
>> +
>> +	sd_info(primary, "group id %#x, uncombined\n", sd->group_id);
>> out:
>> 	sd_unregister(dev);
>> 	sd_cleanup(dev);
>> -- 
>> 2.43.0
>>
>>


* Re: [net-next 10/15] net/mlx5e: Let channels be SD-aware
  2024-01-10 14:09             ` Gal Pressman
@ 2024-01-25  8:01               ` Tariq Toukan
  2024-01-26  2:40                 ` Jakub Kicinski
  0 siblings, 1 reply; 45+ messages in thread
From: Tariq Toukan @ 2024-01-25  8:01 UTC (permalink / raw)
  To: Gal Pressman, Jakub Kicinski
  Cc: Saeed Mahameed, David S. Miller, Paolo Abeni, Eric Dumazet,
	Saeed Mahameed, netdev, Tariq Toukan



On 10/01/2024 16:09, Gal Pressman wrote:
> On 09/01/2024 18:00, Jakub Kicinski wrote:
>> On Tue, 9 Jan 2024 16:15:50 +0200 Gal Pressman wrote:
>>>>> I'm confused, how are RX queues related to XPS?
>>>>
>>>> Separate sentence, perhaps I should be more verbose..
>>>
>>> Sorry, yes, your understanding is correct.
>>> If a packet is received on RQ 0 then it is from PF 0, RQ 1 came from PF
>>> 1, etc. Though this is all from the same wire/port.
>>>
>>> You can enable arfs for example, which will make sure that packets that
>>> are destined to a certain CPU will be received by the PF that is closer
>>> to it.
>>
>> Got it.
>>
>>>>> XPS shouldn't be affected, we just make sure that whatever queue XPS
>>>>> chose will go out through the "right" PF.
>>>>
>>>> But you said "correct" to queue 0 going to PF 0 and queue 1 to PF 1.
>>>> The queue IDs in my question refer to the queue mapping form the stacks
>>>> perspective. If user wants to send everything to queue 0 will it use
>>>> both PFs?
>>>
>>> If all traffic is transmitted through queue 0, it will go out from PF 0
>>> (the PF that is closer to CPU 0 numa).
>>

Hi,
I'm back from a long vacation. Catching up on emails...

>> Okay, but earlier you said: "whatever queue XPS chose will go out
>> through the "right" PF." - which I read as PF will be chosen based
>> on CPU locality regardless of XPS logic.
>>
>> If queue 0 => PF 0, then user has to set up XPS to make CPUs from NUMA
>> node which has PF 0 use even number queues, and PF 1 to use odd number
>> queues. Correct?

Exactly. That's the desired configuration.
Our driver has the logic to set it in default.

Here's the default XPS on my setup:

NUMA:
   NUMA node(s):          2
   NUMA node0 CPU(s):     0-11
   NUMA node1 CPU(s):     12-23

PF0 on node0, PF1 on node1.

/sys/class/net/eth2/queues/tx-0/xps_cpus:000001
/sys/class/net/eth2/queues/tx-1/xps_cpus:001000
/sys/class/net/eth2/queues/tx-2/xps_cpus:000002
/sys/class/net/eth2/queues/tx-3/xps_cpus:002000
/sys/class/net/eth2/queues/tx-4/xps_cpus:000004
/sys/class/net/eth2/queues/tx-5/xps_cpus:004000
/sys/class/net/eth2/queues/tx-6/xps_cpus:000008
/sys/class/net/eth2/queues/tx-7/xps_cpus:008000
/sys/class/net/eth2/queues/tx-8/xps_cpus:000010
/sys/class/net/eth2/queues/tx-9/xps_cpus:010000
/sys/class/net/eth2/queues/tx-10/xps_cpus:000020
/sys/class/net/eth2/queues/tx-11/xps_cpus:020000
/sys/class/net/eth2/queues/tx-12/xps_cpus:000040
/sys/class/net/eth2/queues/tx-13/xps_cpus:040000
/sys/class/net/eth2/queues/tx-14/xps_cpus:000080
/sys/class/net/eth2/queues/tx-15/xps_cpus:080000
/sys/class/net/eth2/queues/tx-16/xps_cpus:000100
/sys/class/net/eth2/queues/tx-17/xps_cpus:100000
/sys/class/net/eth2/queues/tx-18/xps_cpus:000200
/sys/class/net/eth2/queues/tx-19/xps_cpus:200000
/sys/class/net/eth2/queues/tx-20/xps_cpus:000400
/sys/class/net/eth2/queues/tx-21/xps_cpus:400000
/sys/class/net/eth2/queues/tx-22/xps_cpus:000800
/sys/class/net/eth2/queues/tx-23/xps_cpus:800000
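
For readers wondering how such defaults can be programmed from the driver,
here is a minimal sketch. mdev_of_channel() is a made-up stand-in for the
driver's channel-to-mdev lookup; the rest uses standard kernel APIs:

#include <linux/netdevice.h>
#include <linux/topology.h>

/* Give each txq the cpumask of the NUMA node of the mdev backing its
 * channel, so the default XPS steers a CPU's traffic to the local PF. */
static void sd_set_default_xps(struct net_device *netdev, int num_txqs)
{
	int i;

	for (i = 0; i < num_txqs; i++) {
		struct mlx5_core_dev *mdev = mdev_of_channel(netdev, i);
		int node = dev_to_node(&mdev->pdev->dev);

		netif_set_xps_queue(netdev, cpumask_of_node(node), i);
	}
}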

> 
> I think it is based on the default xps configuration, but I don't want
> to get the details wrong, checking with Tariq and will reply (he's OOO).
> 


* Re: [net-next 10/15] net/mlx5e: Let channels be SD-aware
  2024-01-25  8:01               ` Tariq Toukan
@ 2024-01-26  2:40                 ` Jakub Kicinski
  0 siblings, 0 replies; 45+ messages in thread
From: Jakub Kicinski @ 2024-01-26  2:40 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: Gal Pressman, Saeed Mahameed, David S. Miller, Paolo Abeni,
	Eric Dumazet, Saeed Mahameed, netdev, Tariq Toukan

On Thu, 25 Jan 2024 10:01:05 +0200 Tariq Toukan wrote:
> Exactly. That's the desired configuration.
> Our driver has the logic to set it in default.
> 
> Here's the default XPS on my setup:
> 
> NUMA:
>    NUMA node(s):          2
>    NUMA node0 CPU(s):     0-11
>    NUMA node1 CPU(s):     12-23
> 
> PF0 on node0, PF1 on node1.

Okay, good that you took care of the defaults, but having a queue per
CPU thread is quite inefficient. Most sensible users will reconfigure
your NICs and remap IRQs and XPS. Which is fine, but we need to give
them the necessary info to do this right - documentation and preferably
the PCIe dev mapping in the new netlink queue API.


* Re: [net-next 07/15] net/mlx5: SD, Add informative prints in kernel log
  2024-01-25  7:42     ` Tariq Toukan
@ 2024-01-29  9:20       ` Jiri Pirko
  0 siblings, 0 replies; 45+ messages in thread
From: Jiri Pirko @ 2024-01-29  9:20 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: Saeed Mahameed, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, Saeed Mahameed, netdev, Tariq Toukan

Thu, Jan 25, 2024 at 08:42:41AM CET, ttoukan.linux@gmail.com wrote:
>
>
>On 05/01/2024 14:12, Jiri Pirko wrote:
>> Thu, Dec 21, 2023 at 01:57:13AM CET, saeed@kernel.org wrote:
>> > From: Tariq Toukan <tariqt@nvidia.com>
>> > 
>> > Print to kernel log when an SD group moves from/to ready state.
>> > 
>> > Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
>> > Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
>> > ---
>> > .../net/ethernet/mellanox/mlx5/core/lib/sd.c  | 21 +++++++++++++++++++
>> > 1 file changed, 21 insertions(+)
>> > 
>> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
>> > index 3309f21d892e..f68942277c62 100644
>> > --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
>> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
>> > @@ -373,6 +373,21 @@ static void sd_cmd_unset_secondary(struct mlx5_core_dev *secondary)
>> > 	mlx5_fs_cmd_set_l2table_entry_silent(secondary, 0);
>> > }
>> > 
>> > +static void sd_print_group(struct mlx5_core_dev *primary)
>> > +{
>> > +	struct mlx5_sd *sd = mlx5_get_sd(primary);
>> > +	struct mlx5_core_dev *pos;
>> > +	int i;
>> > +
>> > +	sd_info(primary, "group id %#x, primary %s, vhca %u\n",
>> > +		sd->group_id, pci_name(primary->pdev),
>> > +		MLX5_CAP_GEN(primary, vhca_id));
>> > +	mlx5_sd_for_each_secondary(i, primary, pos)
>> > +		sd_info(primary, "group id %#x, secondary#%d %s, vhca %u\n",
>> > +			sd->group_id, i - 1, pci_name(pos->pdev),
>> > +			MLX5_CAP_GEN(pos, vhca_id));
>> > +}
>> > +
>> > int mlx5_sd_init(struct mlx5_core_dev *dev)
>> > {
>> > 	struct mlx5_core_dev *primary, *pos, *to;
>> > @@ -410,6 +425,10 @@ int mlx5_sd_init(struct mlx5_core_dev *dev)
>> > 			goto err_unset_secondaries;
>> > 	}
>> > 
>> > +	sd_info(primary, "group id %#x, size %d, combined\n",
>> > +		sd->group_id, mlx5_devcom_comp_get_size(sd->devcom));
>> 
>> Can't you rather expose this over sysfs or debugfs? I mean, dmesg print
>> does not seem like a good idea.
>> 
>> 
>
>I think that the events of netdev combine/uncombine are important enough to
>be logged in the kernel dmesg.

Why? I believe that the best amount of dmesg log is exactly 0. You
should find proper interfaces, definitely for new features. Why do you
keep asking the user to look for random messages in dmesg? It does not
make any sense :/



>I can implement a debugfs as an addition, not replacing the print.
>
>> > +	sd_print_group(primary);
>> > +
>> > 	return 0;
>> > 
>> > err_unset_secondaries:
>> > @@ -440,6 +459,8 @@ void mlx5_sd_cleanup(struct mlx5_core_dev *dev)
>> > 	mlx5_sd_for_each_secondary(i, primary, pos)
>> > 		sd_cmd_unset_secondary(pos);
>> > 	sd_cmd_unset_primary(primary);
>> > +
>> > +	sd_info(primary, "group id %#x, uncombined\n", sd->group_id);
>> > out:
>> > 	sd_unregister(dev);
>> > 	sd_cleanup(dev);
>> > -- 
>> > 2.43.0
>> > 
>> > 


* Re: [net-next 04/15] net/mlx5: SD, Implement basic query and instantiation
  2024-01-25  7:34     ` Tariq Toukan
@ 2024-01-29  9:21       ` Jiri Pirko
  0 siblings, 0 replies; 45+ messages in thread
From: Jiri Pirko @ 2024-01-29  9:21 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: Saeed Mahameed, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, Saeed Mahameed, netdev, Tariq Toukan

Thu, Jan 25, 2024 at 08:34:25AM CET, ttoukan.linux@gmail.com wrote:
>
>
>On 05/01/2024 14:15, Jiri Pirko wrote:
>> Thu, Dec 21, 2023 at 01:57:10AM CET, saeed@kernel.org wrote:
>> > From: Tariq Toukan <tariqt@nvidia.com>
>> 
>> [...]
>> 
>> > +static int sd_init(struct mlx5_core_dev *dev)
>> 
>> Could you maintain "mlx5_" prefix here and in the rest of the patches?
>> 
>> 
>
>Hi Jiri,
>
>We do not necessarily maintain this prefix for non-exposed static functions.

Yet, it is very common all over the mlx5 driver. It is much more common
than no prefix. Why is this an exception?


>
>> > +{
>> 
>> [...]



Thread overview: 45+ messages
2023-12-21  0:57 [pull request][net-next 00/15] mlx5 updates 2023-12-20 Saeed Mahameed
2023-12-21  0:57 ` [net-next 01/15] net/mlx5e: Use the correct lag ports number when creating TISes Saeed Mahameed
2023-12-29 22:40   ` patchwork-bot+netdevbpf
2023-12-21  0:57 ` [net-next 02/15] net/mlx5: Fix query of sd_group field Saeed Mahameed
2023-12-21  0:57 ` [net-next 03/15] net/mlx5: SD, Introduce SD lib Saeed Mahameed
2023-12-21  0:57 ` [net-next 04/15] net/mlx5: SD, Implement basic query and instantiation Saeed Mahameed
2024-01-05 12:15   ` Jiri Pirko
2024-01-25  7:34     ` Tariq Toukan
2024-01-29  9:21       ` Jiri Pirko
2023-12-21  0:57 ` [net-next 05/15] net/mlx5: SD, Implement devcom communication and primary election Saeed Mahameed
2023-12-21  0:57 ` [net-next 06/15] net/mlx5: SD, Implement steering for primary and secondaries Saeed Mahameed
2023-12-21  0:57 ` [net-next 07/15] net/mlx5: SD, Add informative prints in kernel log Saeed Mahameed
2024-01-05 12:12   ` Jiri Pirko
2024-01-25  7:42     ` Tariq Toukan
2024-01-29  9:20       ` Jiri Pirko
2023-12-21  0:57 ` [net-next 08/15] net/mlx5e: Create single netdev per SD group Saeed Mahameed
2024-01-08 13:36   ` Aishwarya TCV
2024-01-08 13:50     ` Gal Pressman
2024-01-08 15:54       ` Mark Brown
2024-01-08 16:00         ` Gal Pressman
2023-12-21  0:57 ` [net-next 09/15] net/mlx5e: Create EN core HW resources for all secondary devices Saeed Mahameed
2023-12-21  0:57 ` [net-next 10/15] net/mlx5e: Let channels be SD-aware Saeed Mahameed
2024-01-04 22:50   ` Jakub Kicinski
2024-01-08 12:30     ` Gal Pressman
2024-01-09  3:08       ` Jakub Kicinski
2024-01-09 14:15         ` Gal Pressman
2024-01-09 16:00           ` Jakub Kicinski
2024-01-10 14:09             ` Gal Pressman
2024-01-25  8:01               ` Tariq Toukan
2024-01-26  2:40                 ` Jakub Kicinski
2023-12-21  0:57 ` [net-next 11/15] net/mlx5e: Support cross-vhca RSS Saeed Mahameed
2023-12-21  0:57 ` [net-next 12/15] net/mlx5e: Support per-mdev queue counter Saeed Mahameed
2023-12-21  0:57 ` [net-next 13/15] net/mlx5e: Block TLS device offload on combined SD netdev Saeed Mahameed
2023-12-21  0:57 ` [net-next 14/15] net/mlx5: Enable SD feature Saeed Mahameed
2023-12-21  0:57 ` [net-next 15/15] net/mlx5: Implement management PF Ethernet profile Saeed Mahameed
2023-12-21  2:45   ` Nelson, Shannon
2023-12-21 22:25     ` Saeed Mahameed
2024-01-04 22:44       ` Jakub Kicinski
2024-01-08 23:22         ` Saeed Mahameed
2024-01-09  2:58           ` Jakub Kicinski
2024-01-17  7:37             ` Saeed Mahameed
2024-01-18  2:04               ` Jakub Kicinski
2024-01-04 22:47 ` [pull request][net-next 00/15] mlx5 updates 2023-12-20 Jakub Kicinski
2024-01-08  1:19   ` Jakub Kicinski
2024-01-08 23:14     ` Saeed Mahameed
