netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [pull request][net-next 00/15] mlx5 updates 2023-06-09
@ 2023-06-10  1:42 Saeed Mahameed
  2023-06-10  1:42 ` [net-next 01/15] net/mlx5: Simplify unload all rep code Saeed Mahameed
                   ` (14 more replies)
  0 siblings, 15 replies; 41+ messages in thread
From: Saeed Mahameed @ 2023-06-10  1:42 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan

From: Saeed Mahameed <saeedm@nvidia.com>

This pull adds the following to mlx5 features:
1) Embedded CPU Virtual Functions
2) Lightweight local SFs

For more information please see tag log below.

Please pull and let me know if there is any problem.

Thanks,
Saeed.


The following changes since commit ded5c1a16ec69bb815f2b7d9ea4028913ebffca4:

  Merge branch 'tools-ynl-gen-code-gen-improvements-before-ethtool' (2023-06-09 14:40:33 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-updates-2023-06-09

for you to fetch changes up to 978015f7ef9240acfb078f4c1c0d79459b42f951:

  net/mlx5e: Remove a useless function call (2023-06-09 18:40:53 -0700)

----------------------------------------------------------------
mlx5-updates-2023-06-09

1) Embedded CPU Virtual Functions
2) Lightweight local SFs

Daniel Jurgens says:
====================
Embedded CPU Virtual Functions

This series enables the creation of virtual functions on Bluefield (the
embedded CPU platform). Embedded CPU virtual functions (EC VFs). EC VF
creation, deletion and management interfaces are the same as those for
virtual functions in a server with a Connect-X NIC.

When using EC VFs on the ARM the creation of virtual functions on the
host system is still supported. Host VFs eswitch vports occupy a range
of 1..max_vfs, the EC VF vport range is max_vfs+1..max_ec_vfs.

Every function (PF, ECPF, VF, EC VF, and subfunction) has a function ID
associated with it. Prior to this series the function ID and the eswitch
vport were the same. That is no longer the case, the EC VF function ID
range is 1..max_ec_vfs. When querying or setting the capabilities of an
EC VF function an new bit must be set in the query/set HCA cap
structure.

This is a high level overview of the changes made:
	- Allocate vports for EC VFs if they are enabled.
	- Create representors and devlink ports for the EC VF vports.
	- When querying/setting HCA caps by vport break the assumption
	  that function ID is the same a vport number and adjust
	  accordingly.
	- Create a new type of page, so that when SRIOV on the ARM is
	  disabled, but remains enabled on the host, the driver can
	  wait for the correct pages.
	- Update SRIOV code to support EC VF creation/deletion.

===================

Lightweight local SFs:

Last 3 patches form Shay Drory:

SFs are heavy weight and by default they come with the full package of
ConnectX features. Usually users want specialized SFs for one specific
purpose and using devlink users will almost always override the set of
advertises features of an SF and reload it.

Shay Drory says:
================
In order to avoid the wasted time and resources on the reload, local SFs
will probe without any auxiliary sub-device, so that the SFs can be
configured prior to its full probe.

The defaults of the enable_* devlink params of these SFs are set to
false.

Usage example:
Create SF:
$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
$ devlink port function set pci/0000:08:00.0/32768 \
               hw_addr 00:00:00:00:00:11 state active

Enable ETH auxiliary device:
$ devlink dev param set auxiliary/mlx5_core.sf.1 \
              name enable_eth value true cmode driverinit

Now, in order to fully probe the SF, use devlink reload:
$ devlink dev reload auxiliary/mlx5_core.sf.1

At this point the user have SF devlink instance with auxiliary device
for the Ethernet functionality only.

================

----------------------------------------------------------------
Christophe JAILLET (1):
      net/mlx5e: Remove a useless function call

Daniel Jurgens (11):
      net/mlx5: Simplify unload all rep code
      net/mlx5: mlx5_ifc updates for embedded CPU SRIOV
      net/mlx5: Enable devlink port for embedded cpu VF vports
      net/mlx5: Update vport caps query/set for EC VFs
      net/mlx5: Add management of EC VF vports
      net/mlx5: Add/remove peer miss rules for EC VFs
      net/mlx5: Add new page type for EC VF pages
      net/mlx5: Use correct vport when restoring GUIDs
      net/mlx5: Query correct caps for min msix vectors
      net/mlx5: Update SRIOV enable/disable to handle EC/VFs
      net/mlx5: Set max number of embedded CPU VFs

Shay Drory (3):
      net/mlx5: Split function_setup() to enable and open functions
      net/mlx5: Move esw multiport devlink param to eswitch code
      net/mlx5: Light probe local SFs

 .../ethernet/mellanox/mlx5/switchdev.rst           |  20 ++
 drivers/net/ethernet/mellanox/mlx5/core/debugfs.c  |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/dev.c      |  16 ++
 drivers/net/ethernet/mellanox/mlx5/core/devlink.c  |  54 ++----
 .../ethernet/mellanox/mlx5/core/en/tc/post_act.c   |   4 +-
 .../ethernet/mellanox/mlx5/core/esw/devlink_port.c |   8 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  | 174 +++++++++++++++--
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  13 ++
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 102 +++++-----
 drivers/net/ethernet/mellanox/mlx5/core/health.c   |  24 ++-
 drivers/net/ethernet/mellanox/mlx5/core/main.c     | 209 +++++++++++++++++----
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h    |  35 +++-
 .../net/ethernet/mellanox/mlx5/core/pagealloc.c    |  11 +-
 drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c  |  16 +-
 .../ethernet/mellanox/mlx5/core/sf/dev/driver.c    |  15 +-
 drivers/net/ethernet/mellanox/mlx5/core/sriov.c    |  46 ++++-
 drivers/net/ethernet/mellanox/mlx5/core/vport.c    |  19 +-
 include/linux/mlx5/driver.h                        |   7 +
 include/linux/mlx5/mlx5_ifc.h                      |  11 +-
 include/linux/mlx5/vport.h                         |   2 +-
 20 files changed, 610 insertions(+), 177 deletions(-)

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [net-next 01/15] net/mlx5: Simplify unload all rep code
  2023-06-10  1:42 [pull request][net-next 00/15] mlx5 updates 2023-06-09 Saeed Mahameed
@ 2023-06-10  1:42 ` Saeed Mahameed
  2023-06-12 11:00   ` patchwork-bot+netdevbpf
  2023-06-10  1:42 ` [net-next 02/15] net/mlx5: mlx5_ifc updates for embedded CPU SRIOV Saeed Mahameed
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 41+ messages in thread
From: Saeed Mahameed @ 2023-06-10  1:42 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Daniel Jurgens, William Tu,
	Parav Pandit

From: Daniel Jurgens <danielj@nvidia.com>

Instead of using type specific iterators which are only used in one place
just traverse the xarray. It will provide suitable ordering based on the
vport numbers. This will also eliminate the need for changes here when
new types are added.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../mellanox/mlx5/core/eswitch_offloads.c     | 48 +------------------
 1 file changed, 1 insertion(+), 47 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index eafb098db6b0..625982454575 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -55,13 +55,6 @@
 #define mlx5_esw_for_each_rep(esw, i, rep) \
 	xa_for_each(&((esw)->offloads.vport_reps), i, rep)
 
-#define mlx5_esw_for_each_sf_rep(esw, i, rep) \
-	xa_for_each_marked(&((esw)->offloads.vport_reps), i, rep, MLX5_ESW_VPT_SF)
-
-#define mlx5_esw_for_each_vf_rep(esw, index, rep)	\
-	mlx5_esw_for_each_entry_marked(&((esw)->offloads.vport_reps), index, \
-				       rep, (esw)->esw_funcs.num_vfs, MLX5_ESW_VPT_VF)
-
 /* There are two match-all miss flows, one for unicast dst mac and
  * one for multicast.
  */
@@ -2191,18 +2184,6 @@ static int esw_offloads_start(struct mlx5_eswitch *esw,
 	return 0;
 }
 
-static void mlx5_esw_offloads_rep_mark_set(struct mlx5_eswitch *esw,
-					   struct mlx5_eswitch_rep *rep,
-					   xa_mark_t mark)
-{
-	bool mark_set;
-
-	/* Copy the mark from vport to its rep */
-	mark_set = xa_get_mark(&esw->vports, rep->vport, mark);
-	if (mark_set)
-		xa_set_mark(&esw->offloads.vport_reps, rep->vport, mark);
-}
-
 static int mlx5_esw_offloads_rep_init(struct mlx5_eswitch *esw, const struct mlx5_vport *vport)
 {
 	struct mlx5_eswitch_rep *rep;
@@ -2222,9 +2203,6 @@ static int mlx5_esw_offloads_rep_init(struct mlx5_eswitch *esw, const struct mlx
 	if (err)
 		goto insert_err;
 
-	mlx5_esw_offloads_rep_mark_set(esw, rep, MLX5_ESW_VPT_HOST_FN);
-	mlx5_esw_offloads_rep_mark_set(esw, rep, MLX5_ESW_VPT_VF);
-	mlx5_esw_offloads_rep_mark_set(esw, rep, MLX5_ESW_VPT_SF);
 	return 0;
 
 insert_err:
@@ -2365,37 +2343,13 @@ static void __esw_offloads_unload_rep(struct mlx5_eswitch *esw,
 		esw->offloads.rep_ops[rep_type]->unload(rep);
 }
 
-static void __unload_reps_sf_vport(struct mlx5_eswitch *esw, u8 rep_type)
-{
-	struct mlx5_eswitch_rep *rep;
-	unsigned long i;
-
-	mlx5_esw_for_each_sf_rep(esw, i, rep)
-		__esw_offloads_unload_rep(esw, rep, rep_type);
-}
-
 static void __unload_reps_all_vport(struct mlx5_eswitch *esw, u8 rep_type)
 {
 	struct mlx5_eswitch_rep *rep;
 	unsigned long i;
 
-	__unload_reps_sf_vport(esw, rep_type);
-
-	mlx5_esw_for_each_vf_rep(esw, i, rep)
-		__esw_offloads_unload_rep(esw, rep, rep_type);
-
-	if (mlx5_ecpf_vport_exists(esw->dev)) {
-		rep = mlx5_eswitch_get_rep(esw, MLX5_VPORT_ECPF);
-		__esw_offloads_unload_rep(esw, rep, rep_type);
-	}
-
-	if (mlx5_core_is_ecpf_esw_manager(esw->dev)) {
-		rep = mlx5_eswitch_get_rep(esw, MLX5_VPORT_PF);
+	mlx5_esw_for_each_rep(esw, i, rep)
 		__esw_offloads_unload_rep(esw, rep, rep_type);
-	}
-
-	rep = mlx5_eswitch_get_rep(esw, MLX5_VPORT_UPLINK);
-	__esw_offloads_unload_rep(esw, rep, rep_type);
 }
 
 int mlx5_esw_offloads_rep_load(struct mlx5_eswitch *esw, u16 vport_num)
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [net-next 02/15] net/mlx5: mlx5_ifc updates for embedded CPU SRIOV
  2023-06-10  1:42 [pull request][net-next 00/15] mlx5 updates 2023-06-09 Saeed Mahameed
  2023-06-10  1:42 ` [net-next 01/15] net/mlx5: Simplify unload all rep code Saeed Mahameed
@ 2023-06-10  1:42 ` Saeed Mahameed
  2023-06-10  1:42 ` [net-next 03/15] net/mlx5: Enable devlink port for embedded cpu VF vports Saeed Mahameed
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 41+ messages in thread
From: Saeed Mahameed @ 2023-06-10  1:42 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Daniel Jurgens,
	Bodong Wang, William Tu

From: Daniel Jurgens <danielj@nvidia.com>

Add ec_vf_vport_base to HCA Capabilities 2. This indicates the base vport
of embedded CPU virtual functions that are connected to the eswitch.

Add ec_vf_function to query/set_hca_caps. If set this indicates
accessing a virtual function on the embedded CPU by function ID. This
should only be used with other_function set to 1.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 include/linux/mlx5/mlx5_ifc.h | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index af3a92ad2e6b..1f4f62cb9f34 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1992,7 +1992,10 @@ struct mlx5_ifc_cmd_hca_cap_2_bits {
 	u8	   ts_cqe_metadata_size2wqe_counter[0x5];
 	u8	   reserved_at_250[0x10];
 
-	u8	   reserved_at_260[0x5a0];
+	u8	   reserved_at_260[0x120];
+	u8	   reserved_at_380[0x10];
+	u8	   ec_vf_vport_base[0x10];
+	u8	   reserved_at_3a0[0x460];
 };
 
 enum mlx5_ifc_flow_destination_type {
@@ -4805,7 +4808,8 @@ struct mlx5_ifc_set_hca_cap_in_bits {
 	u8         op_mod[0x10];
 
 	u8         other_function[0x1];
-	u8         reserved_at_41[0xf];
+	u8         ec_vf_function[0x1];
+	u8         reserved_at_42[0xe];
 	u8         function_id[0x10];
 
 	u8         reserved_at_60[0x20];
@@ -5956,7 +5960,8 @@ struct mlx5_ifc_query_hca_cap_in_bits {
 	u8         op_mod[0x10];
 
 	u8         other_function[0x1];
-	u8         reserved_at_41[0xf];
+	u8         ec_vf_function[0x1];
+	u8         reserved_at_42[0xe];
 	u8         function_id[0x10];
 
 	u8         reserved_at_60[0x20];
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [net-next 03/15] net/mlx5: Enable devlink port for embedded cpu VF vports
  2023-06-10  1:42 [pull request][net-next 00/15] mlx5 updates 2023-06-09 Saeed Mahameed
  2023-06-10  1:42 ` [net-next 01/15] net/mlx5: Simplify unload all rep code Saeed Mahameed
  2023-06-10  1:42 ` [net-next 02/15] net/mlx5: mlx5_ifc updates for embedded CPU SRIOV Saeed Mahameed
@ 2023-06-10  1:42 ` Saeed Mahameed
  2023-06-10  1:42 ` [net-next 04/15] net/mlx5: Update vport caps query/set for EC VFs Saeed Mahameed
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 41+ messages in thread
From: Saeed Mahameed @ 2023-06-10  1:42 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Daniel Jurgens, William Tu

From: Daniel Jurgens <danielj@nvidia.com>

Enable creation of a devlink port for EC VF vports.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../mellanox/mlx5/core/esw/devlink_port.c     |  8 +++++++-
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   | 20 +++++++++++++++++++
 include/linux/mlx5/driver.h                   |  6 ++++++
 3 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
index f370f67d9e33..af779c700278 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
@@ -18,7 +18,8 @@ static bool mlx5_esw_devlink_port_supported(struct mlx5_eswitch *esw, u16 vport_
 {
 	return vport_num == MLX5_VPORT_UPLINK ||
 	       (mlx5_core_is_ecpf(esw->dev) && vport_num == MLX5_VPORT_PF) ||
-	       mlx5_eswitch_is_vf_vport(esw, vport_num);
+	       mlx5_eswitch_is_vf_vport(esw, vport_num) ||
+	       mlx5_core_is_ec_vf_vport(esw->dev, vport_num);
 }
 
 static struct devlink_port *mlx5_esw_dl_port_alloc(struct mlx5_eswitch *esw, u16 vport_num)
@@ -56,6 +57,11 @@ static struct devlink_port *mlx5_esw_dl_port_alloc(struct mlx5_eswitch *esw, u16
 		dl_port->attrs.switch_id.id_len = ppid.id_len;
 		devlink_port_attrs_pci_vf_set(dl_port, controller_num, pfnum,
 					      vport_num - 1, external);
+	}  else if (mlx5_core_is_ec_vf_vport(esw->dev, vport_num)) {
+		memcpy(dl_port->attrs.switch_id.id, ppid.id, ppid.id_len);
+		dl_port->attrs.switch_id.id_len = ppid.id_len;
+		devlink_port_attrs_pci_vf_set(dl_port, controller_num, pfnum,
+					      vport_num - 1, false);
 	}
 	return dl_port;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 1d879374acaa..0e7b5c6e4020 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -343,4 +343,24 @@ bool mlx5_rdma_supported(struct mlx5_core_dev *dev);
 bool mlx5_vnet_supported(struct mlx5_core_dev *dev);
 bool mlx5_same_hw_devs(struct mlx5_core_dev *dev, struct mlx5_core_dev *peer_dev);
 
+static inline u16 mlx5_core_ec_vf_vport_base(const struct mlx5_core_dev *dev)
+{
+	return MLX5_CAP_GEN_2(dev, ec_vf_vport_base);
+}
+
+static inline u16 mlx5_core_ec_sriov_enabled(const struct mlx5_core_dev *dev)
+{
+	return mlx5_core_is_ecpf(dev) && mlx5_core_ec_vf_vport_base(dev);
+}
+
+static inline bool mlx5_core_is_ec_vf_vport(const struct mlx5_core_dev *dev, u16 vport_num)
+{
+	int base_vport = mlx5_core_ec_vf_vport_base(dev);
+	int max_vport = base_vport + mlx5_core_max_ec_vfs(dev);
+
+	if (!mlx5_core_ec_sriov_enabled(dev))
+		return false;
+
+	return (vport_num >= base_vport && vport_num < max_vport);
+}
 #endif /* __MLX5_CORE_H__ */
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 9a744c48eec2..252b6a6965b8 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -474,6 +474,7 @@ struct mlx5_core_sriov {
 	struct mlx5_vf_context	*vfs_ctx;
 	int			num_vfs;
 	u16			max_vfs;
+	u16			max_ec_vfs;
 };
 
 struct mlx5_fc_pool {
@@ -1244,6 +1245,11 @@ static inline u16 mlx5_core_max_vfs(const struct mlx5_core_dev *dev)
 	return dev->priv.sriov.max_vfs;
 }
 
+static inline u16 mlx5_core_max_ec_vfs(const struct mlx5_core_dev *dev)
+{
+	return dev->priv.sriov.max_ec_vfs;
+}
+
 static inline int mlx5_get_gid_table_len(u16 param)
 {
 	if (param > 4) {
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [net-next 04/15] net/mlx5: Update vport caps query/set for EC VFs
  2023-06-10  1:42 [pull request][net-next 00/15] mlx5 updates 2023-06-09 Saeed Mahameed
                   ` (2 preceding siblings ...)
  2023-06-10  1:42 ` [net-next 03/15] net/mlx5: Enable devlink port for embedded cpu VF vports Saeed Mahameed
@ 2023-06-10  1:42 ` Saeed Mahameed
  2023-06-10  1:42 ` [net-next 05/15] net/mlx5: Add management of EC VF vports Saeed Mahameed
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 41+ messages in thread
From: Saeed Mahameed @ 2023-06-10  1:42 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Daniel Jurgens, William Tu

From: Daniel Jurgens <danielj@nvidia.com>

These functions are for query/set by vport, there was an underlying
assumption that vport was equal to function ID. That's not the case for
EC VF functions. Set the ec_vf_function bit accordingly.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |  6 +++---
 .../net/ethernet/mellanox/mlx5/core/vport.c   | 19 +++++++++++++++----
 include/linux/mlx5/vport.h                    |  2 +-
 3 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 0e7b5c6e4020..7ca0c7a547aa 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -325,10 +325,10 @@ void mlx5_unload_one_devl_locked(struct mlx5_core_dev *dev, bool suspend);
 int mlx5_load_one(struct mlx5_core_dev *dev, bool recovery);
 int mlx5_load_one_devl_locked(struct mlx5_core_dev *dev, bool recovery);
 
-int mlx5_vport_set_other_func_cap(struct mlx5_core_dev *dev, const void *hca_cap, u16 function_id,
+int mlx5_vport_set_other_func_cap(struct mlx5_core_dev *dev, const void *hca_cap, u16 vport,
 				  u16 opmod);
-#define mlx5_vport_get_other_func_general_cap(dev, fid, out)		\
-	mlx5_vport_get_other_func_cap(dev, fid, out, MLX5_CAP_GENERAL)
+#define mlx5_vport_get_other_func_general_cap(dev, vport, out)		\
+	mlx5_vport_get_other_func_cap(dev, vport, out, MLX5_CAP_GENERAL)
 
 void mlx5_events_work_enqueue(struct mlx5_core_dev *dev, struct work_struct *work);
 static inline u32 mlx5_sriov_get_vf_total_msix(struct pci_dev *pdev)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
index bc66b078a8a1..6d3984dd5b21 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
@@ -1161,23 +1161,32 @@ u64 mlx5_query_nic_system_image_guid(struct mlx5_core_dev *mdev)
 }
 EXPORT_SYMBOL_GPL(mlx5_query_nic_system_image_guid);
 
-int mlx5_vport_get_other_func_cap(struct mlx5_core_dev *dev, u16 function_id, void *out,
+static int mlx5_vport_to_func_id(const struct mlx5_core_dev *dev, u16 vport, bool ec_vf_func)
+{
+	return ec_vf_func ? vport - mlx5_core_ec_vf_vport_base(dev)
+			  : vport;
+}
+
+int mlx5_vport_get_other_func_cap(struct mlx5_core_dev *dev, u16 vport, void *out,
 				  u16 opmod)
 {
+	bool ec_vf_func = mlx5_core_is_ec_vf_vport(dev, vport);
 	u8 in[MLX5_ST_SZ_BYTES(query_hca_cap_in)] = {};
 
 	opmod = (opmod << 1) | (HCA_CAP_OPMOD_GET_MAX & 0x01);
 	MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP);
 	MLX5_SET(query_hca_cap_in, in, op_mod, opmod);
-	MLX5_SET(query_hca_cap_in, in, function_id, function_id);
+	MLX5_SET(query_hca_cap_in, in, function_id, mlx5_vport_to_func_id(dev, vport, ec_vf_func));
 	MLX5_SET(query_hca_cap_in, in, other_function, true);
+	MLX5_SET(query_hca_cap_in, in, ec_vf_function, ec_vf_func);
 	return mlx5_cmd_exec_inout(dev, query_hca_cap, in, out);
 }
 EXPORT_SYMBOL_GPL(mlx5_vport_get_other_func_cap);
 
 int mlx5_vport_set_other_func_cap(struct mlx5_core_dev *dev, const void *hca_cap,
-				  u16 function_id, u16 opmod)
+				  u16 vport, u16 opmod)
 {
+	bool ec_vf_func = mlx5_core_is_ec_vf_vport(dev, vport);
 	int set_sz = MLX5_ST_SZ_BYTES(set_hca_cap_in);
 	void *set_hca_cap;
 	void *set_ctx;
@@ -1191,8 +1200,10 @@ int mlx5_vport_set_other_func_cap(struct mlx5_core_dev *dev, const void *hca_cap
 	MLX5_SET(set_hca_cap_in, set_ctx, op_mod, opmod << 1);
 	set_hca_cap = MLX5_ADDR_OF(set_hca_cap_in, set_ctx, capability);
 	memcpy(set_hca_cap, hca_cap, MLX5_ST_SZ_BYTES(cmd_hca_cap));
-	MLX5_SET(set_hca_cap_in, set_ctx, function_id, function_id);
+	MLX5_SET(set_hca_cap_in, set_ctx, function_id,
+		 mlx5_vport_to_func_id(dev, vport, ec_vf_func));
 	MLX5_SET(set_hca_cap_in, set_ctx, other_function, true);
+	MLX5_SET(set_hca_cap_in, set_ctx, ec_vf_function, ec_vf_func);
 	ret = mlx5_cmd_exec_in(dev, set_hca_cap, set_ctx);
 
 	kfree(set_ctx);
diff --git a/include/linux/mlx5/vport.h b/include/linux/mlx5/vport.h
index 7f31432f44c2..fbb9bf447889 100644
--- a/include/linux/mlx5/vport.h
+++ b/include/linux/mlx5/vport.h
@@ -132,6 +132,6 @@ int mlx5_nic_vport_affiliate_multiport(struct mlx5_core_dev *master_mdev,
 int mlx5_nic_vport_unaffiliate_multiport(struct mlx5_core_dev *port_mdev);
 
 u64 mlx5_query_nic_system_image_guid(struct mlx5_core_dev *mdev);
-int mlx5_vport_get_other_func_cap(struct mlx5_core_dev *dev, u16 function_id, void *out,
+int mlx5_vport_get_other_func_cap(struct mlx5_core_dev *dev, u16 vport, void *out,
 				  u16 opmod);
 #endif /* __MLX5_VPORT_H__ */
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [net-next 05/15] net/mlx5: Add management of EC VF vports
  2023-06-10  1:42 [pull request][net-next 00/15] mlx5 updates 2023-06-09 Saeed Mahameed
                   ` (3 preceding siblings ...)
  2023-06-10  1:42 ` [net-next 04/15] net/mlx5: Update vport caps query/set for EC VFs Saeed Mahameed
@ 2023-06-10  1:42 ` Saeed Mahameed
  2023-06-10  1:42 ` [net-next 06/15] net/mlx5: Add/remove peer miss rules for EC VFs Saeed Mahameed
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 41+ messages in thread
From: Saeed Mahameed @ 2023-06-10  1:42 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Daniel Jurgens, William Tu

From: Daniel Jurgens <danielj@nvidia.com>

Add init, load, unload, and cleanup of the EC VF vports. This includes
changes in how eswitch SRIOV is managed. Previous on an embedded CPU
platform the number of VFs provided when enabling the eswitch was always
0, host VFs vports are handled in the eswitch functions change event
handler. Now track the number of EC VFs as well, so they can be handled
properly in the enable/disable flows.

There are only 3 marks available for use in xarrays, all 3 were already
in use for this use case. EC VF vports are in a known range so we can
access them by index instead of marks.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/eswitch.c | 125 +++++++++++++++---
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |  13 ++
 .../mellanox/mlx5/core/eswitch_offloads.c     |  22 +++
 3 files changed, 143 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index ecd8864d5d11..b33d852aae34 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1051,6 +1051,18 @@ static void mlx5_eswitch_clear_vf_vports_info(struct mlx5_eswitch *esw)
 	}
 }
 
+static void mlx5_eswitch_clear_ec_vf_vports_info(struct mlx5_eswitch *esw)
+{
+	struct mlx5_vport *vport;
+	unsigned long i;
+
+	mlx5_esw_for_each_ec_vf_vport(esw, i, vport, esw->esw_funcs.num_ec_vfs) {
+		memset(&vport->qos, 0, sizeof(vport->qos));
+		memset(&vport->info, 0, sizeof(vport->info));
+		vport->info.link_state = MLX5_VPORT_ADMIN_STATE_AUTO;
+	}
+}
+
 /* Public E-Switch API */
 int mlx5_eswitch_load_vport(struct mlx5_eswitch *esw, u16 vport_num,
 			    enum mlx5_eswitch_vport_event enabled_events)
@@ -1090,6 +1102,19 @@ void mlx5_eswitch_unload_vf_vports(struct mlx5_eswitch *esw, u16 num_vfs)
 	}
 }
 
+static void mlx5_eswitch_unload_ec_vf_vports(struct mlx5_eswitch *esw,
+					     u16 num_ec_vfs)
+{
+	struct mlx5_vport *vport;
+	unsigned long i;
+
+	mlx5_esw_for_each_ec_vf_vport(esw, i, vport, num_ec_vfs) {
+		if (!vport->enabled)
+			continue;
+		mlx5_eswitch_unload_vport(esw, vport->vport);
+	}
+}
+
 int mlx5_eswitch_load_vf_vports(struct mlx5_eswitch *esw, u16 num_vfs,
 				enum mlx5_eswitch_vport_event enabled_events)
 {
@@ -1110,6 +1135,26 @@ int mlx5_eswitch_load_vf_vports(struct mlx5_eswitch *esw, u16 num_vfs,
 	return err;
 }
 
+static int mlx5_eswitch_load_ec_vf_vports(struct mlx5_eswitch *esw, u16 num_ec_vfs,
+					  enum mlx5_eswitch_vport_event enabled_events)
+{
+	struct mlx5_vport *vport;
+	unsigned long i;
+	int err;
+
+	mlx5_esw_for_each_ec_vf_vport(esw, i, vport, num_ec_vfs) {
+		err = mlx5_eswitch_load_vport(esw, vport->vport, enabled_events);
+		if (err)
+			goto vf_err;
+	}
+
+	return 0;
+
+vf_err:
+	mlx5_eswitch_unload_ec_vf_vports(esw, num_ec_vfs);
+	return err;
+}
+
 static int host_pf_enable_hca(struct mlx5_core_dev *dev)
 {
 	if (!mlx5_core_is_ecpf(dev))
@@ -1154,6 +1199,12 @@ mlx5_eswitch_enable_pf_vf_vports(struct mlx5_eswitch *esw,
 		ret = mlx5_eswitch_load_vport(esw, MLX5_VPORT_ECPF, enabled_events);
 		if (ret)
 			goto ecpf_err;
+		if (mlx5_core_ec_sriov_enabled(esw->dev)) {
+			ret = mlx5_eswitch_load_ec_vf_vports(esw, esw->esw_funcs.num_ec_vfs,
+							     enabled_events);
+			if (ret)
+				goto ec_vf_err;
+		}
 	}
 
 	/* Enable VF vports */
@@ -1164,6 +1215,9 @@ mlx5_eswitch_enable_pf_vf_vports(struct mlx5_eswitch *esw,
 	return 0;
 
 vf_err:
+	if (mlx5_core_ec_sriov_enabled(esw->dev))
+		mlx5_eswitch_unload_ec_vf_vports(esw, esw->esw_funcs.num_ec_vfs);
+ec_vf_err:
 	if (mlx5_ecpf_vport_exists(esw->dev))
 		mlx5_eswitch_unload_vport(esw, MLX5_VPORT_ECPF);
 ecpf_err:
@@ -1180,8 +1234,11 @@ void mlx5_eswitch_disable_pf_vf_vports(struct mlx5_eswitch *esw)
 {
 	mlx5_eswitch_unload_vf_vports(esw, esw->esw_funcs.num_vfs);
 
-	if (mlx5_ecpf_vport_exists(esw->dev))
+	if (mlx5_ecpf_vport_exists(esw->dev)) {
+		if (mlx5_core_ec_sriov_enabled(esw->dev))
+			mlx5_eswitch_unload_ec_vf_vports(esw, esw->esw_funcs.num_vfs);
 		mlx5_eswitch_unload_vport(esw, MLX5_VPORT_ECPF);
+	}
 
 	host_pf_disable_hca(esw->dev);
 	mlx5_eswitch_unload_vport(esw, MLX5_VPORT_PF);
@@ -1225,6 +1282,9 @@ mlx5_eswitch_update_num_of_vfs(struct mlx5_eswitch *esw, int num_vfs)
 
 	esw->esw_funcs.num_vfs = MLX5_GET(query_esw_functions_out, out,
 					  host_params_context.host_num_of_vfs);
+	if (mlx5_core_ec_sriov_enabled(esw->dev))
+		esw->esw_funcs.num_ec_vfs = num_vfs;
+
 	kvfree(out);
 }
 
@@ -1332,9 +1392,9 @@ int mlx5_eswitch_enable_locked(struct mlx5_eswitch *esw, int num_vfs)
 
 	mlx5_eswitch_event_handlers_register(esw);
 
-	esw_info(esw->dev, "Enable: mode(%s), nvfs(%d), active vports(%d)\n",
+	esw_info(esw->dev, "Enable: mode(%s), nvfs(%d), necvfs(%d), active vports(%d)\n",
 		 esw->mode == MLX5_ESWITCH_LEGACY ? "LEGACY" : "OFFLOADS",
-		 esw->esw_funcs.num_vfs, esw->enabled_vports);
+		 esw->esw_funcs.num_vfs, esw->esw_funcs.num_ec_vfs, esw->enabled_vports);
 
 	mlx5_esw_mode_change_notify(esw, esw->mode);
 
@@ -1356,7 +1416,7 @@ int mlx5_eswitch_enable_locked(struct mlx5_eswitch *esw, int num_vfs)
 int mlx5_eswitch_enable(struct mlx5_eswitch *esw, int num_vfs)
 {
 	bool toggle_lag;
-	int ret;
+	int ret = 0;
 
 	if (!mlx5_esw_allowed(esw))
 		return 0;
@@ -1376,10 +1436,21 @@ int mlx5_eswitch_enable(struct mlx5_eswitch *esw, int num_vfs)
 
 		vport_events = (esw->mode == MLX5_ESWITCH_LEGACY) ?
 					MLX5_LEGACY_SRIOV_VPORT_EVENTS : MLX5_VPORT_UC_ADDR_CHANGE;
-		ret = mlx5_eswitch_load_vf_vports(esw, num_vfs, vport_events);
-		if (!ret)
-			esw->esw_funcs.num_vfs = num_vfs;
+		/* If this is the ECPF the number of host VFs is managed via the
+		 * eswitch function change event handler, and any num_vfs provided
+		 * here are intended to be EC VFs.
+		 */
+		if (!mlx5_core_is_ecpf(esw->dev)) {
+			ret = mlx5_eswitch_load_vf_vports(esw, num_vfs, vport_events);
+			if (!ret)
+				esw->esw_funcs.num_vfs = num_vfs;
+		} else if (mlx5_core_ec_sriov_enabled(esw->dev)) {
+			ret = mlx5_eswitch_load_ec_vf_vports(esw, num_vfs, vport_events);
+			if (!ret)
+				esw->esw_funcs.num_ec_vfs = num_vfs;
+		}
 	}
+
 	up_write(&esw->mode_lock);
 
 	if (toggle_lag)
@@ -1399,16 +1470,22 @@ void mlx5_eswitch_disable_sriov(struct mlx5_eswitch *esw, bool clear_vf)
 	/* If driver is unloaded, this function is called twice by remove_one()
 	 * and mlx5_unload(). Prevent the second call.
 	 */
-	if (!esw->esw_funcs.num_vfs && !clear_vf)
+	if (!esw->esw_funcs.num_vfs && !esw->esw_funcs.num_ec_vfs && !clear_vf)
 		goto unlock;
 
-	esw_info(esw->dev, "Unload vfs: mode(%s), nvfs(%d), active vports(%d)\n",
+	esw_info(esw->dev, "Unload vfs: mode(%s), nvfs(%d), necvfs(%d), active vports(%d)\n",
 		 esw->mode == MLX5_ESWITCH_LEGACY ? "LEGACY" : "OFFLOADS",
-		 esw->esw_funcs.num_vfs, esw->enabled_vports);
-
-	mlx5_eswitch_unload_vf_vports(esw, esw->esw_funcs.num_vfs);
-	if (clear_vf)
-		mlx5_eswitch_clear_vf_vports_info(esw);
+		 esw->esw_funcs.num_vfs, esw->esw_funcs.num_ec_vfs, esw->enabled_vports);
+
+	if (!mlx5_core_is_ecpf(esw->dev)) {
+		mlx5_eswitch_unload_vf_vports(esw, esw->esw_funcs.num_vfs);
+		if (clear_vf)
+			mlx5_eswitch_clear_vf_vports_info(esw);
+	} else if (mlx5_core_ec_sriov_enabled(esw->dev)) {
+		mlx5_eswitch_unload_ec_vf_vports(esw, esw->esw_funcs.num_ec_vfs);
+		if (clear_vf)
+			mlx5_eswitch_clear_ec_vf_vports_info(esw);
+	}
 
 	if (esw->mode == MLX5_ESWITCH_OFFLOADS) {
 		struct devlink *devlink = priv_to_devlink(esw->dev);
@@ -1419,7 +1496,10 @@ void mlx5_eswitch_disable_sriov(struct mlx5_eswitch *esw, bool clear_vf)
 	if (esw->mode == MLX5_ESWITCH_LEGACY)
 		mlx5_eswitch_disable_locked(esw);
 
-	esw->esw_funcs.num_vfs = 0;
+	if (!mlx5_core_is_ecpf(esw->dev))
+		esw->esw_funcs.num_vfs = 0;
+	else
+		esw->esw_funcs.num_ec_vfs = 0;
 
 unlock:
 	up_write(&esw->mode_lock);
@@ -1439,9 +1519,9 @@ void mlx5_eswitch_disable_locked(struct mlx5_eswitch *esw)
 
 	mlx5_eswitch_event_handlers_unregister(esw);
 
-	esw_info(esw->dev, "Disable: mode(%s), nvfs(%d), active vports(%d)\n",
+	esw_info(esw->dev, "Disable: mode(%s), nvfs(%d), necvfs(%d), active vports(%d)\n",
 		 esw->mode == MLX5_ESWITCH_LEGACY ? "LEGACY" : "OFFLOADS",
-		 esw->esw_funcs.num_vfs, esw->enabled_vports);
+		 esw->esw_funcs.num_vfs, esw->esw_funcs.num_ec_vfs, esw->enabled_vports);
 
 	if (esw->fdb_table.flags & MLX5_ESW_FDB_CREATED) {
 		esw->fdb_table.flags &= ~MLX5_ESW_FDB_CREATED;
@@ -1601,6 +1681,17 @@ static int mlx5_esw_vports_init(struct mlx5_eswitch *esw)
 		idx++;
 	}
 
+	if (mlx5_core_ec_sriov_enabled(esw->dev)) {
+		int ec_vf_base_num = mlx5_core_ec_vf_vport_base(dev);
+
+		for (i = 0; i < mlx5_core_max_ec_vfs(esw->dev); i++) {
+			err = mlx5_esw_vport_alloc(esw, idx, ec_vf_base_num + i);
+			if (err)
+				goto err;
+			idx++;
+		}
+	}
+
 	if (mlx5_ecpf_vport_exists(dev) ||
 	    mlx5_core_is_ecpf_esw_manager(dev)) {
 		err = mlx5_esw_vport_alloc(esw, idx, MLX5_VPORT_ECPF);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index d3608f198e0a..266b60fefe25 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -289,6 +289,7 @@ struct mlx5_host_work {
 struct mlx5_esw_functions {
 	struct mlx5_nb		nb;
 	u16			num_vfs;
+	u16			num_ec_vfs;
 };
 
 enum {
@@ -654,6 +655,18 @@ void mlx5e_tc_clean_fdb_peer_flows(struct mlx5_eswitch *esw);
 #define mlx5_esw_for_each_host_func_vport(esw, index, vport, last)	\
 	mlx5_esw_for_each_vport_marked(esw, index, vport, last, MLX5_ESW_VPT_HOST_FN)
 
+/* This macro should only be used if EC SRIOV is enabled.
+ *
+ * Because there were no more marks available on the xarray this uses a
+ * for_each_range approach. The range is only valid when EC SRIOV is enabled
+ */
+#define mlx5_esw_for_each_ec_vf_vport(esw, index, vport, last)		\
+	xa_for_each_range(&((esw)->vports),				\
+			  index,					\
+			  vport,					\
+			  MLX5_CAP_GEN_2((esw->dev), ec_vf_vport_base),	\
+			  (last) - 1)
+
 struct mlx5_eswitch *mlx5_devlink_eswitch_get(struct devlink *devlink);
 struct mlx5_vport *__must_check
 mlx5_eswitch_get_vport(struct mlx5_eswitch *esw, u16 vport_num);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 625982454575..68798aed792f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -3287,6 +3287,9 @@ int esw_offloads_enable(struct mlx5_eswitch *esw)
 	/* Representor will control the vport link state */
 	mlx5_esw_for_each_vf_vport(esw, i, vport, esw->esw_funcs.num_vfs)
 		vport->info.link_state = MLX5_VPORT_ADMIN_STATE_DOWN;
+	if (mlx5_core_ec_sriov_enabled(esw->dev))
+		mlx5_esw_for_each_ec_vf_vport(esw, i, vport, esw->esw_funcs.num_ec_vfs)
+			vport->info.link_state = MLX5_VPORT_ADMIN_STATE_DOWN;
 
 	/* Uplink vport rep must load first. */
 	err = esw_offloads_load_rep(esw, MLX5_VPORT_UPLINK);
@@ -3524,8 +3527,27 @@ static int mlx5_esw_vports_inline_set(struct mlx5_eswitch *esw, u8 mlx5_mode,
 			goto revert_inline_mode;
 		}
 	}
+	if (mlx5_core_ec_sriov_enabled(esw->dev)) {
+		mlx5_esw_for_each_ec_vf_vport(esw, i, vport, esw->esw_funcs.num_ec_vfs) {
+			err = mlx5_modify_nic_vport_min_inline(dev, vport->vport, mlx5_mode);
+			if (err) {
+				err_vport_num = vport->vport;
+				NL_SET_ERR_MSG_MOD(extack,
+						   "Failed to set min inline on vport");
+				goto revert_ec_vf_inline_mode;
+			}
+		}
+	}
 	return 0;
 
+revert_ec_vf_inline_mode:
+	mlx5_esw_for_each_ec_vf_vport(esw, i, vport, esw->esw_funcs.num_ec_vfs) {
+		if (vport->vport == err_vport_num)
+			break;
+		mlx5_modify_nic_vport_min_inline(dev,
+						 vport->vport,
+						 esw->offloads.inline_mode);
+	}
 revert_inline_mode:
 	mlx5_esw_for_each_host_func_vport(esw, i, vport, esw->esw_funcs.num_vfs) {
 		if (vport->vport == err_vport_num)
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [net-next 06/15] net/mlx5: Add/remove peer miss rules for EC VFs
  2023-06-10  1:42 [pull request][net-next 00/15] mlx5 updates 2023-06-09 Saeed Mahameed
                   ` (4 preceding siblings ...)
  2023-06-10  1:42 ` [net-next 05/15] net/mlx5: Add management of EC VF vports Saeed Mahameed
@ 2023-06-10  1:42 ` Saeed Mahameed
  2023-06-10  1:42 ` [net-next 07/15] net/mlx5: Add new page type for EC VF pages Saeed Mahameed
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 41+ messages in thread
From: Saeed Mahameed @ 2023-06-10  1:42 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Daniel Jurgens, William Tu,
	Roi Dayan

From: Daniel Jurgens <danielj@nvidia.com>

Add and remove the peer miss rules for EC VFs. It's possible that there
are different amounts of total VFs per function so only create rules for
the minimum number of max VFs.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../mellanox/mlx5/core/eswitch_offloads.c     | 32 +++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 68798aed792f..fdf482f6fb34 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -1125,11 +1125,32 @@ static int esw_add_fdb_peer_miss_rules(struct mlx5_eswitch *esw,
 		flows[vport->index] = flow;
 	}
 
+	if (mlx5_core_ec_sriov_enabled(esw->dev)) {
+		mlx5_esw_for_each_ec_vf_vport(esw, i, vport, mlx5_core_max_ec_vfs(esw->dev)) {
+			if (i >= mlx5_core_max_ec_vfs(peer_dev))
+				break;
+			esw_set_peer_miss_rule_source_port(esw, peer_dev->priv.eswitch,
+							   spec, vport->vport);
+			flow = mlx5_add_flow_rules(esw->fdb_table.offloads.slow_fdb,
+						   spec, &flow_act, &dest, 1);
+			if (IS_ERR(flow)) {
+				err = PTR_ERR(flow);
+				goto add_ec_vf_flow_err;
+			}
+			flows[vport->index] = flow;
+		}
+	}
 	esw->fdb_table.offloads.peer_miss_rules[mlx5_get_dev_index(peer_dev)] = flows;
 
 	kvfree(spec);
 	return 0;
 
+add_ec_vf_flow_err:
+	mlx5_esw_for_each_ec_vf_vport(esw, i, vport, mlx5_core_max_ec_vfs(esw->dev)) {
+		if (!flows[vport->index])
+			continue;
+		mlx5_del_flow_rules(flows[vport->index]);
+	}
 add_vf_flow_err:
 	mlx5_esw_for_each_vf_vport(esw, i, vport, mlx5_core_max_vfs(esw->dev)) {
 		if (!flows[vport->index])
@@ -1162,6 +1183,17 @@ static void esw_del_fdb_peer_miss_rules(struct mlx5_eswitch *esw,
 
 	flows = esw->fdb_table.offloads.peer_miss_rules[mlx5_get_dev_index(peer_dev)];
 
+	if (mlx5_core_ec_sriov_enabled(esw->dev)) {
+		mlx5_esw_for_each_ec_vf_vport(esw, i, vport, mlx5_core_max_ec_vfs(esw->dev)) {
+			/* The flow for a particular vport could be NULL if the other ECPF
+			 * has fewer or no VFs enabled
+			 */
+			if (!flows[vport->index])
+				continue;
+			mlx5_del_flow_rules(flows[vport->index]);
+		}
+	}
+
 	mlx5_esw_for_each_vf_vport(esw, i, vport, mlx5_core_max_vfs(esw->dev))
 		mlx5_del_flow_rules(flows[vport->index]);
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [net-next 07/15] net/mlx5: Add new page type for EC VF pages
  2023-06-10  1:42 [pull request][net-next 00/15] mlx5 updates 2023-06-09 Saeed Mahameed
                   ` (5 preceding siblings ...)
  2023-06-10  1:42 ` [net-next 06/15] net/mlx5: Add/remove peer miss rules for EC VFs Saeed Mahameed
@ 2023-06-10  1:42 ` Saeed Mahameed
  2023-06-10  1:42 ` [net-next 08/15] net/mlx5: Use correct vport when restoring GUIDs Saeed Mahameed
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 41+ messages in thread
From: Saeed Mahameed @ 2023-06-10  1:42 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Daniel Jurgens, William Tu

From: Daniel Jurgens <danielj@nvidia.com>

When the embedded cpu supports SRIOV it can be enabled and disabled
independently from the host SRIOV. Track the pages separately so we can
properly wait for returned VF pages.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/debugfs.c   |  1 +
 drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c | 11 ++++++++++-
 include/linux/mlx5/driver.h                         |  1 +
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c b/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c
index bb95b40d25eb..fc13b41cc9b2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c
@@ -246,6 +246,7 @@ void mlx5_pages_debugfs_init(struct mlx5_core_dev *dev)
 
 	debugfs_create_u32("fw_pages_total", 0400, pages, &dev->priv.fw_pages);
 	debugfs_create_u32("fw_pages_vfs", 0400, pages, &dev->priv.page_counters[MLX5_VF]);
+	debugfs_create_u32("fw_pages_ec_vfs", 0400, pages, &dev->priv.page_counters[MLX5_EC_VF]);
 	debugfs_create_u32("fw_pages_sfs", 0400, pages, &dev->priv.page_counters[MLX5_SF]);
 	debugfs_create_u32("fw_pages_host_pf", 0400, pages, &dev->priv.page_counters[MLX5_HOST_PF]);
 	debugfs_create_u32("fw_pages_alloc_failed", 0400, pages, &dev->priv.fw_pages_alloc_failed);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
index 95dc67fb3001..dcf58efac159 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
@@ -79,7 +79,13 @@ static u16 func_id_to_type(struct mlx5_core_dev *dev, u16 func_id, bool ec_funct
 	if (!func_id)
 		return mlx5_core_is_ecpf(dev) && !ec_function ? MLX5_HOST_PF : MLX5_PF;
 
-	return func_id <= mlx5_core_max_vfs(dev) ?  MLX5_VF : MLX5_SF;
+	if (func_id <= max(mlx5_core_max_vfs(dev), mlx5_core_max_ec_vfs(dev))) {
+		if (ec_function)
+			return MLX5_EC_VF;
+		else
+			return MLX5_VF;
+	}
+	return MLX5_SF;
 }
 
 static u32 mlx5_get_ec_function(u32 function)
@@ -730,6 +736,9 @@ int mlx5_reclaim_startup_pages(struct mlx5_core_dev *dev)
 	WARN(dev->priv.page_counters[MLX5_HOST_PF],
 	     "External host PF FW pages counter is %d after reclaiming all pages\n",
 	     dev->priv.page_counters[MLX5_HOST_PF]);
+	WARN(dev->priv.page_counters[MLX5_EC_VF],
+	     "EC VFs FW pages counter is %d after reclaiming all pages\n",
+	     dev->priv.page_counters[MLX5_EC_VF]);
 
 	return 0;
 }
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 252b6a6965b8..18a608a1f567 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -581,6 +581,7 @@ enum mlx5_func_type {
 	MLX5_VF,
 	MLX5_SF,
 	MLX5_HOST_PF,
+	MLX5_EC_VF,
 	MLX5_FUNC_TYPE_NUM,
 };
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [net-next 08/15] net/mlx5: Use correct vport when restoring GUIDs
  2023-06-10  1:42 [pull request][net-next 00/15] mlx5 updates 2023-06-09 Saeed Mahameed
                   ` (6 preceding siblings ...)
  2023-06-10  1:42 ` [net-next 07/15] net/mlx5: Add new page type for EC VF pages Saeed Mahameed
@ 2023-06-10  1:42 ` Saeed Mahameed
  2023-06-10  1:42 ` [net-next 09/15] net/mlx5: Query correct caps for min msix vectors Saeed Mahameed
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 41+ messages in thread
From: Saeed Mahameed @ 2023-06-10  1:42 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Daniel Jurgens, William Tu

From: Daniel Jurgens <danielj@nvidia.com>

Prior to enabling EC VF functionality the vport number and function ID
were always the same. That's not the case now. Use the correct vport
number to modify the HCA vport context.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/sriov.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
index f07d00929162..c2463a1d7035 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
@@ -37,7 +37,7 @@
 #include "mlx5_irq.h"
 #include "eswitch.h"
 
-static int sriov_restore_guids(struct mlx5_core_dev *dev, int vf)
+static int sriov_restore_guids(struct mlx5_core_dev *dev, int vf, u16 func_id)
 {
 	struct mlx5_core_sriov *sriov = &dev->priv.sriov;
 	struct mlx5_hca_vport_context *in;
@@ -59,7 +59,7 @@ static int sriov_restore_guids(struct mlx5_core_dev *dev, int vf)
 			!!(in->node_guid) * MLX5_HCA_VPORT_SEL_NODE_GUID |
 			!!(in->policy) * MLX5_HCA_VPORT_SEL_STATE_POLICY;
 
-		err = mlx5_core_modify_hca_vport_context(dev, 1, 1, vf + 1, in);
+		err = mlx5_core_modify_hca_vport_context(dev, 1, 1, func_id, in);
 		if (err)
 			mlx5_core_warn(dev, "modify vport context failed, unable to restore VF %d settings\n", vf);
 
@@ -73,6 +73,7 @@ static int mlx5_device_enable_sriov(struct mlx5_core_dev *dev, int num_vfs)
 {
 	struct mlx5_core_sriov *sriov = &dev->priv.sriov;
 	int err, vf, num_msix_count;
+	int vport_num;
 
 	err = mlx5_eswitch_enable(dev->priv.eswitch, num_vfs);
 	if (err) {
@@ -104,7 +105,10 @@ static int mlx5_device_enable_sriov(struct mlx5_core_dev *dev, int num_vfs)
 
 		sriov->vfs_ctx[vf].enabled = 1;
 		if (MLX5_CAP_GEN(dev, port_type) == MLX5_CAP_PORT_TYPE_IB) {
-			err = sriov_restore_guids(dev, vf);
+			vport_num = mlx5_core_ec_sriov_enabled(dev) ?
+					mlx5_core_ec_vf_vport_base(dev) + vf
+					: vf + 1;
+			err = sriov_restore_guids(dev, vf, vport_num);
 			if (err) {
 				mlx5_core_warn(dev,
 					       "failed to restore VF %d settings, err %d\n",
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [net-next 09/15] net/mlx5: Query correct caps for min msix vectors
  2023-06-10  1:42 [pull request][net-next 00/15] mlx5 updates 2023-06-09 Saeed Mahameed
                   ` (7 preceding siblings ...)
  2023-06-10  1:42 ` [net-next 08/15] net/mlx5: Use correct vport when restoring GUIDs Saeed Mahameed
@ 2023-06-10  1:42 ` Saeed Mahameed
  2023-06-10  1:42 ` [net-next 10/15] net/mlx5: Update SRIOV enable/disable to handle EC/VFs Saeed Mahameed
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 41+ messages in thread
From: Saeed Mahameed @ 2023-06-10  1:42 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Daniel Jurgens, William Tu

From: Daniel Jurgens <danielj@nvidia.com>

The VFs on the host and the embedded CPU platform share function
numbers. Set the ec_vf_function field to query the caps for the correct
function.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/pci_irq.c    | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
index 843da89a9035..b2dbae763ca6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
@@ -41,6 +41,15 @@ struct mlx5_irq_table {
 	struct mlx5_irq_pool *sf_comp_pool;
 };
 
+static int mlx5_core_func_to_vport(const struct mlx5_core_dev *dev,
+				   int func,
+				   bool ec_vf_func)
+{
+	if (!ec_vf_func)
+		return func;
+	return mlx5_core_ec_vf_vport_base(dev) + func - 1;
+}
+
 /**
  * mlx5_get_default_msix_vec_count - Get the default number of MSI-X vectors
  *                                   to be ssigned to each VF.
@@ -79,6 +88,8 @@ int mlx5_set_msix_vec_count(struct mlx5_core_dev *dev, int function_id,
 	int set_sz = MLX5_ST_SZ_BYTES(set_hca_cap_in);
 	void *hca_cap = NULL, *query_cap = NULL, *cap;
 	int num_vf_msix, min_msix, max_msix;
+	bool ec_vf_function;
+	int vport;
 	int ret;
 
 	num_vf_msix = MLX5_CAP_GEN_MAX(dev, num_total_dynamic_vf_msix);
@@ -104,7 +115,9 @@ int mlx5_set_msix_vec_count(struct mlx5_core_dev *dev, int function_id,
 		goto out;
 	}
 
-	ret = mlx5_vport_get_other_func_general_cap(dev, function_id, query_cap);
+	ec_vf_function = mlx5_core_ec_sriov_enabled(dev);
+	vport = mlx5_core_func_to_vport(dev, function_id, ec_vf_function);
+	ret = mlx5_vport_get_other_func_general_cap(dev, vport, query_cap);
 	if (ret)
 		goto out;
 
@@ -115,6 +128,7 @@ int mlx5_set_msix_vec_count(struct mlx5_core_dev *dev, int function_id,
 
 	MLX5_SET(set_hca_cap_in, hca_cap, opcode, MLX5_CMD_OP_SET_HCA_CAP);
 	MLX5_SET(set_hca_cap_in, hca_cap, other_function, 1);
+	MLX5_SET(set_hca_cap_in, hca_cap, ec_vf_function, ec_vf_function);
 	MLX5_SET(set_hca_cap_in, hca_cap, function_id, function_id);
 
 	MLX5_SET(set_hca_cap_in, hca_cap, op_mod,
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [net-next 10/15] net/mlx5: Update SRIOV enable/disable to handle EC/VFs
  2023-06-10  1:42 [pull request][net-next 00/15] mlx5 updates 2023-06-09 Saeed Mahameed
                   ` (8 preceding siblings ...)
  2023-06-10  1:42 ` [net-next 09/15] net/mlx5: Query correct caps for min msix vectors Saeed Mahameed
@ 2023-06-10  1:42 ` Saeed Mahameed
  2023-06-10  1:42 ` [net-next 11/15] net/mlx5: Set max number of embedded CPU VFs Saeed Mahameed
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 41+ messages in thread
From: Saeed Mahameed @ 2023-06-10  1:42 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Daniel Jurgens, William Tu

From: Daniel Jurgens <danielj@nvidia.com>

Previously on the embedded CPU platform SRIOV was never enabled/disabled
via mlx5_core_sriov_configure. Host VF updates are provided by an event
handler. Now in the disable flow it must be known if this is a disable
due to driver unload or SRIOV detach, or if the user updated the number
of VFs. If due to change in the number of VFs only wait for the pages of
ECVFs.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/main.c    |  2 +-
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |  2 +-
 .../net/ethernet/mellanox/mlx5/core/sriov.c   | 35 +++++++++++++++----
 3 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index d6ee016deae1..fed8b48a5b20 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1809,7 +1809,7 @@ static void remove_one(struct pci_dev *pdev)
 	mlx5_drain_fw_reset(dev);
 	mlx5_drain_health_wq(dev);
 	devlink_unregister(devlink);
-	mlx5_sriov_disable(pdev);
+	mlx5_sriov_disable(pdev, false);
 	mlx5_thermal_uninit(dev);
 	mlx5_crdump_disable(dev);
 	mlx5_uninit_one(dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 7ca0c7a547aa..7a5f04082058 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -195,7 +195,7 @@ void mlx5_sriov_cleanup(struct mlx5_core_dev *dev);
 int mlx5_sriov_attach(struct mlx5_core_dev *dev);
 void mlx5_sriov_detach(struct mlx5_core_dev *dev);
 int mlx5_core_sriov_configure(struct pci_dev *dev, int num_vfs);
-void mlx5_sriov_disable(struct pci_dev *pdev);
+void mlx5_sriov_disable(struct pci_dev *pdev, bool num_vf_change);
 int mlx5_core_sriov_set_msix_vec_count(struct pci_dev *vf, int msix_vec_count);
 int mlx5_core_enable_hca(struct mlx5_core_dev *dev, u16 func_id);
 int mlx5_core_disable_hca(struct mlx5_core_dev *dev, u16 func_id);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
index c2463a1d7035..b73583b0a0fe 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
@@ -123,9 +123,11 @@ static int mlx5_device_enable_sriov(struct mlx5_core_dev *dev, int num_vfs)
 }
 
 static void
-mlx5_device_disable_sriov(struct mlx5_core_dev *dev, int num_vfs, bool clear_vf)
+mlx5_device_disable_sriov(struct mlx5_core_dev *dev, int num_vfs, bool clear_vf, bool num_vf_change)
 {
 	struct mlx5_core_sriov *sriov = &dev->priv.sriov;
+	bool wait_for_ec_vf_pages = true;
+	bool wait_for_vf_pages = true;
 	int err;
 	int vf;
 
@@ -147,11 +149,30 @@ mlx5_device_disable_sriov(struct mlx5_core_dev *dev, int num_vfs, bool clear_vf)
 
 	mlx5_eswitch_disable_sriov(dev->priv.eswitch, clear_vf);
 
+	/* There are a number of scenarios when SRIOV is being disabled:
+	 *     1. VFs or ECVFs had been created, and now set back to 0 (num_vf_change == true).
+	 *		- If EC SRIOV is enabled then this flow is happening on the
+	 *		  embedded platform, wait for only EC VF pages.
+	 *		- If EC SRIOV is not enabled this flow is happening on non-embedded
+	 *		  platform, wait for the VF pages.
+	 *
+	 *     2. The driver is being unloaded. In this case wait for all pages.
+	 */
+	if (num_vf_change) {
+		if (mlx5_core_ec_sriov_enabled(dev))
+			wait_for_vf_pages = false;
+		else
+			wait_for_ec_vf_pages = false;
+	}
+
+	if (wait_for_ec_vf_pages && mlx5_wait_for_pages(dev, &dev->priv.page_counters[MLX5_EC_VF]))
+		mlx5_core_warn(dev, "timeout reclaiming EC VFs pages\n");
+
 	/* For ECPFs, skip waiting for host VF pages until ECPF is destroyed */
 	if (mlx5_core_is_ecpf(dev))
 		return;
 
-	if (mlx5_wait_for_pages(dev, &dev->priv.page_counters[MLX5_VF]))
+	if (wait_for_vf_pages && mlx5_wait_for_pages(dev, &dev->priv.page_counters[MLX5_VF]))
 		mlx5_core_warn(dev, "timeout reclaiming VFs pages\n");
 }
 
@@ -172,12 +193,12 @@ static int mlx5_sriov_enable(struct pci_dev *pdev, int num_vfs)
 	err = pci_enable_sriov(pdev, num_vfs);
 	if (err) {
 		mlx5_core_warn(dev, "pci_enable_sriov failed : %d\n", err);
-		mlx5_device_disable_sriov(dev, num_vfs, true);
+		mlx5_device_disable_sriov(dev, num_vfs, true, true);
 	}
 	return err;
 }
 
-void mlx5_sriov_disable(struct pci_dev *pdev)
+void mlx5_sriov_disable(struct pci_dev *pdev, bool num_vf_change)
 {
 	struct mlx5_core_dev *dev  = pci_get_drvdata(pdev);
 	struct devlink *devlink = priv_to_devlink(dev);
@@ -185,7 +206,7 @@ void mlx5_sriov_disable(struct pci_dev *pdev)
 
 	pci_disable_sriov(pdev);
 	devl_lock(devlink);
-	mlx5_device_disable_sriov(dev, num_vfs, true);
+	mlx5_device_disable_sriov(dev, num_vfs, true, num_vf_change);
 	devl_unlock(devlink);
 }
 
@@ -200,7 +221,7 @@ int mlx5_core_sriov_configure(struct pci_dev *pdev, int num_vfs)
 	if (num_vfs)
 		err = mlx5_sriov_enable(pdev, num_vfs);
 	else
-		mlx5_sriov_disable(pdev);
+		mlx5_sriov_disable(pdev, true);
 
 	if (!err)
 		sriov->num_vfs = num_vfs;
@@ -245,7 +266,7 @@ void mlx5_sriov_detach(struct mlx5_core_dev *dev)
 	if (!mlx5_core_is_pf(dev))
 		return;
 
-	mlx5_device_disable_sriov(dev, pci_num_vf(dev->pdev), false);
+	mlx5_device_disable_sriov(dev, pci_num_vf(dev->pdev), false, false);
 }
 
 static u16 mlx5_get_max_vfs(struct mlx5_core_dev *dev)
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [net-next 11/15] net/mlx5: Set max number of embedded CPU VFs
  2023-06-10  1:42 [pull request][net-next 00/15] mlx5 updates 2023-06-09 Saeed Mahameed
                   ` (9 preceding siblings ...)
  2023-06-10  1:42 ` [net-next 10/15] net/mlx5: Update SRIOV enable/disable to handle EC/VFs Saeed Mahameed
@ 2023-06-10  1:42 ` Saeed Mahameed
  2023-06-10  1:42 ` [net-next 12/15] net/mlx5: Split function_setup() to enable and open functions Saeed Mahameed
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 41+ messages in thread
From: Saeed Mahameed @ 2023-06-10  1:42 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Daniel Jurgens

From: Daniel Jurgens <danielj@nvidia.com>

Set the maximum number of embedded cpu VF functions available.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/sriov.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
index b73583b0a0fe..4e42a3b9b8ee 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
@@ -305,6 +305,7 @@ int mlx5_sriov_init(struct mlx5_core_dev *dev)
 	total_vfs = pci_sriov_get_totalvfs(pdev);
 	sriov->max_vfs = mlx5_get_max_vfs(dev);
 	sriov->num_vfs = pci_num_vf(pdev);
+	sriov->max_ec_vfs = mlx5_core_ec_sriov_enabled(dev) ? pci_sriov_get_totalvfs(dev->pdev) : 0;
 	sriov->vfs_ctx = kcalloc(total_vfs, sizeof(*sriov->vfs_ctx), GFP_KERNEL);
 	if (!sriov->vfs_ctx)
 		return -ENOMEM;
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [net-next 12/15] net/mlx5: Split function_setup() to enable and open functions
  2023-06-10  1:42 [pull request][net-next 00/15] mlx5 updates 2023-06-09 Saeed Mahameed
                   ` (10 preceding siblings ...)
  2023-06-10  1:42 ` [net-next 11/15] net/mlx5: Set max number of embedded CPU VFs Saeed Mahameed
@ 2023-06-10  1:42 ` Saeed Mahameed
  2023-06-10  1:42 ` [net-next 13/15] net/mlx5: Move esw multiport devlink param to eswitch code Saeed Mahameed
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 41+ messages in thread
From: Saeed Mahameed @ 2023-06-10  1:42 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

From: Shay Drory <shayd@nvidia.com>

mlx5_cmd_init_hca() is taking ~0.2 seconds. In case of a user who
desire to disable some of the SF aux devices, and with large scale-1K
SFs for example, this user will waste more than 3 minutes on
mlx5_cmd_init_hca() which isn't needed at this stage.

Downstream patch will change SFs which are probe over the E-switch,
local SFs, to be probed without any aux dev. In order to support this,
split function_setup() to avoid executing mlx5_cmd_init_hca().

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/main.c    | 83 +++++++++++++------
 1 file changed, 58 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index fed8b48a5b20..0faae77d84e6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1118,7 +1118,7 @@ static void mlx5_cleanup_once(struct mlx5_core_dev *dev)
 	mlx5_devcom_unregister_device(dev->priv.devcom);
 }
 
-static int mlx5_function_setup(struct mlx5_core_dev *dev, bool boot, u64 timeout)
+static int mlx5_function_enable(struct mlx5_core_dev *dev, bool boot, u64 timeout)
 {
 	int err;
 
@@ -1183,28 +1183,56 @@ static int mlx5_function_setup(struct mlx5_core_dev *dev, bool boot, u64 timeout
 		goto reclaim_boot_pages;
 	}
 
+	return 0;
+
+reclaim_boot_pages:
+	mlx5_reclaim_startup_pages(dev);
+err_disable_hca:
+	mlx5_core_disable_hca(dev, 0);
+stop_health_poll:
+	mlx5_stop_health_poll(dev, boot);
+err_cmd_cleanup:
+	mlx5_cmd_set_state(dev, MLX5_CMDIF_STATE_DOWN);
+	mlx5_cmd_cleanup(dev);
+
+	return err;
+}
+
+static void mlx5_function_disable(struct mlx5_core_dev *dev, bool boot)
+{
+	mlx5_reclaim_startup_pages(dev);
+	mlx5_core_disable_hca(dev, 0);
+	mlx5_stop_health_poll(dev, boot);
+	mlx5_cmd_set_state(dev, MLX5_CMDIF_STATE_DOWN);
+	mlx5_cmd_cleanup(dev);
+}
+
+static int mlx5_function_open(struct mlx5_core_dev *dev)
+{
+	int err;
+
 	err = set_hca_ctrl(dev);
 	if (err) {
 		mlx5_core_err(dev, "set_hca_ctrl failed\n");
-		goto reclaim_boot_pages;
+		return err;
 	}
 
 	err = set_hca_cap(dev);
 	if (err) {
 		mlx5_core_err(dev, "set_hca_cap failed\n");
-		goto reclaim_boot_pages;
+		return err;
 	}
 
 	err = mlx5_satisfy_startup_pages(dev, 0);
 	if (err) {
 		mlx5_core_err(dev, "failed to allocate init pages\n");
-		goto reclaim_boot_pages;
+		return err;
 	}
 
 	err = mlx5_cmd_init_hca(dev, sw_owner_id);
 	if (err) {
 		mlx5_core_err(dev, "init hca failed\n");
-		goto reclaim_boot_pages;
+		return err;
 	}
 
 	mlx5_set_driver_version(dev);
@@ -1212,26 +1240,13 @@ static int mlx5_function_setup(struct mlx5_core_dev *dev, bool boot, u64 timeout
 	err = mlx5_query_hca_caps(dev);
 	if (err) {
 		mlx5_core_err(dev, "query hca failed\n");
-		goto reclaim_boot_pages;
+		return err;
 	}
 	mlx5_start_health_fw_log_up(dev);
-
 	return 0;
-
-reclaim_boot_pages:
-	mlx5_reclaim_startup_pages(dev);
-err_disable_hca:
-	mlx5_core_disable_hca(dev, 0);
-stop_health_poll:
-	mlx5_stop_health_poll(dev, boot);
-err_cmd_cleanup:
-	mlx5_cmd_set_state(dev, MLX5_CMDIF_STATE_DOWN);
-	mlx5_cmd_cleanup(dev);
-
-	return err;
 }
 
-static int mlx5_function_teardown(struct mlx5_core_dev *dev, bool boot)
+static int mlx5_function_close(struct mlx5_core_dev *dev)
 {
 	int err;
 
@@ -1240,15 +1255,33 @@ static int mlx5_function_teardown(struct mlx5_core_dev *dev, bool boot)
 		mlx5_core_err(dev, "tear_down_hca failed, skip cleanup\n");
 		return err;
 	}
-	mlx5_reclaim_startup_pages(dev);
-	mlx5_core_disable_hca(dev, 0);
-	mlx5_stop_health_poll(dev, boot);
-	mlx5_cmd_set_state(dev, MLX5_CMDIF_STATE_DOWN);
-	mlx5_cmd_cleanup(dev);
 
 	return 0;
 }
 
+static int mlx5_function_setup(struct mlx5_core_dev *dev, bool boot, u64 timeout)
+{
+	int err;
+
+	err = mlx5_function_enable(dev, boot, timeout);
+	if (err)
+		return err;
+
+	err = mlx5_function_open(dev);
+	if (err)
+		mlx5_function_disable(dev, boot);
+	return err;
+}
+
+static int mlx5_function_teardown(struct mlx5_core_dev *dev, bool boot)
+{
+	int err = mlx5_function_close(dev);
+
+	if (!err)
+		mlx5_function_disable(dev, boot);
+	return err;
+}
+
 static int mlx5_load(struct mlx5_core_dev *dev)
 {
 	int err;
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [net-next 13/15] net/mlx5: Move esw multiport devlink param to eswitch code
  2023-06-10  1:42 [pull request][net-next 00/15] mlx5 updates 2023-06-09 Saeed Mahameed
                   ` (11 preceding siblings ...)
  2023-06-10  1:42 ` [net-next 12/15] net/mlx5: Split function_setup() to enable and open functions Saeed Mahameed
@ 2023-06-10  1:42 ` Saeed Mahameed
  2023-06-10  1:42 ` [net-next 14/15] net/mlx5: Light probe local SFs Saeed Mahameed
  2023-06-10  1:42 ` [net-next 15/15] net/mlx5e: Remove a useless function call Saeed Mahameed
  14 siblings, 0 replies; 41+ messages in thread
From: Saeed Mahameed @ 2023-06-10  1:42 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

From: Shay Drory <shayd@nvidia.com>

Move the param registration and handling code into the eswitch
code as they are related to each other. No point in having the
devlink param registration done in separate file.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/devlink.c | 34 -------------
 .../net/ethernet/mellanox/mlx5/core/eswitch.c | 49 ++++++++++++++++++-
 2 files changed, 47 insertions(+), 36 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index 63635cc44479..27197acdb4d8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -7,7 +7,6 @@
 #include "fw_reset.h"
 #include "fs_core.h"
 #include "eswitch.h"
-#include "lag/lag.h"
 #include "esw/qos.h"
 #include "sf/dev/dev.h"
 #include "sf/sf.h"
@@ -427,33 +426,6 @@ static int mlx5_devlink_large_group_num_validate(struct devlink *devlink, u32 id
 
 	return 0;
 }
-
-static int mlx5_devlink_esw_multiport_set(struct devlink *devlink, u32 id,
-					  struct devlink_param_gset_ctx *ctx)
-{
-	struct mlx5_core_dev *dev = devlink_priv(devlink);
-
-	if (!MLX5_ESWITCH_MANAGER(dev))
-		return -EOPNOTSUPP;
-
-	if (ctx->val.vbool)
-		return mlx5_lag_mpesw_enable(dev);
-
-	mlx5_lag_mpesw_disable(dev);
-	return 0;
-}
-
-static int mlx5_devlink_esw_multiport_get(struct devlink *devlink, u32 id,
-					  struct devlink_param_gset_ctx *ctx)
-{
-	struct mlx5_core_dev *dev = devlink_priv(devlink);
-
-	if (!MLX5_ESWITCH_MANAGER(dev))
-		return -EOPNOTSUPP;
-
-	ctx->val.vbool = mlx5_lag_is_mpesw(dev);
-	return 0;
-}
 #endif
 
 static int mlx5_devlink_eq_depth_validate(struct devlink *devlink, u32 id,
@@ -527,12 +499,6 @@ static const struct devlink_param mlx5_devlink_params[] = {
 			     BIT(DEVLINK_PARAM_CMODE_DRIVERINIT),
 			     NULL, NULL,
 			     mlx5_devlink_large_group_num_validate),
-	DEVLINK_PARAM_DRIVER(MLX5_DEVLINK_PARAM_ID_ESW_MULTIPORT,
-			     "esw_multiport", DEVLINK_PARAM_TYPE_BOOL,
-			     BIT(DEVLINK_PARAM_CMODE_RUNTIME),
-			     mlx5_devlink_esw_multiport_get,
-			     mlx5_devlink_esw_multiport_set,
-			     NULL),
 #endif
 	DEVLINK_PARAM_GENERIC(IO_EQ_SIZE, BIT(DEVLINK_PARAM_CMODE_DRIVERINIT),
 			      NULL, NULL, mlx5_devlink_eq_depth_validate),
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index b33d852aae34..2af9c4646bc7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -41,6 +41,7 @@
 #include "esw/qos.h"
 #include "mlx5_core.h"
 #include "lib/eq.h"
+#include "lag/lag.h"
 #include "eswitch.h"
 #include "fs_core.h"
 #include "devlink.h"
@@ -1709,6 +1710,38 @@ static int mlx5_esw_vports_init(struct mlx5_eswitch *esw)
 	return err;
 }
 
+static int mlx5_devlink_esw_multiport_set(struct devlink *devlink, u32 id,
+					  struct devlink_param_gset_ctx *ctx)
+{
+	struct mlx5_core_dev *dev = devlink_priv(devlink);
+
+	if (!MLX5_ESWITCH_MANAGER(dev))
+		return -EOPNOTSUPP;
+
+	if (ctx->val.vbool)
+		return mlx5_lag_mpesw_enable(dev);
+
+	mlx5_lag_mpesw_disable(dev);
+	return 0;
+}
+
+static int mlx5_devlink_esw_multiport_get(struct devlink *devlink, u32 id,
+					  struct devlink_param_gset_ctx *ctx)
+{
+	struct mlx5_core_dev *dev = devlink_priv(devlink);
+
+	ctx->val.vbool = mlx5_lag_is_mpesw(dev);
+	return 0;
+}
+
+static const struct devlink_param mlx5_eswitch_params[] = {
+	DEVLINK_PARAM_DRIVER(MLX5_DEVLINK_PARAM_ID_ESW_MULTIPORT,
+			     "esw_multiport", DEVLINK_PARAM_TYPE_BOOL,
+			     BIT(DEVLINK_PARAM_CMODE_RUNTIME),
+			     mlx5_devlink_esw_multiport_get,
+			     mlx5_devlink_esw_multiport_set, NULL),
+};
+
 int mlx5_eswitch_init(struct mlx5_core_dev *dev)
 {
 	struct mlx5_eswitch *esw;
@@ -1717,9 +1750,16 @@ int mlx5_eswitch_init(struct mlx5_core_dev *dev)
 	if (!MLX5_VPORT_MANAGER(dev) && !MLX5_ESWITCH_MANAGER(dev))
 		return 0;
 
+	err = devl_params_register(priv_to_devlink(dev), mlx5_eswitch_params,
+				   ARRAY_SIZE(mlx5_eswitch_params));
+	if (err)
+		return err;
+
 	esw = kzalloc(sizeof(*esw), GFP_KERNEL);
-	if (!esw)
-		return -ENOMEM;
+	if (!esw) {
+		err = -ENOMEM;
+		goto unregister_param;
+	}
 
 	esw->dev = dev;
 	esw->manager_vport = mlx5_eswitch_manager_vport(dev);
@@ -1779,6 +1819,9 @@ int mlx5_eswitch_init(struct mlx5_core_dev *dev)
 	if (esw->work_queue)
 		destroy_workqueue(esw->work_queue);
 	kfree(esw);
+unregister_param:
+	devl_params_unregister(priv_to_devlink(dev), mlx5_eswitch_params,
+			       ARRAY_SIZE(mlx5_eswitch_params));
 	return err;
 }
 
@@ -1802,6 +1845,8 @@ void mlx5_eswitch_cleanup(struct mlx5_eswitch *esw)
 	esw_offloads_cleanup(esw);
 	mlx5_esw_vports_cleanup(esw);
 	kfree(esw);
+	devl_params_unregister(priv_to_devlink(esw->dev), mlx5_eswitch_params,
+			       ARRAY_SIZE(mlx5_eswitch_params));
 }
 
 /* Vport Administration */
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [net-next 14/15] net/mlx5: Light probe local SFs
  2023-06-10  1:42 [pull request][net-next 00/15] mlx5 updates 2023-06-09 Saeed Mahameed
                   ` (12 preceding siblings ...)
  2023-06-10  1:42 ` [net-next 13/15] net/mlx5: Move esw multiport devlink param to eswitch code Saeed Mahameed
@ 2023-06-10  1:42 ` Saeed Mahameed
  2023-06-10  7:01   ` Jakub Kicinski
  2023-06-10  1:42 ` [net-next 15/15] net/mlx5e: Remove a useless function call Saeed Mahameed
  14 siblings, 1 reply; 41+ messages in thread
From: Saeed Mahameed @ 2023-06-10  1:42 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

From: Shay Drory <shayd@nvidia.com>

In case user wants to configure the SFs, for example: to use only vdpa
functionality, he needs to fully probe a SF, configure what he wants,
and afterward reload the SF.

In order to save the time of the reload, local SFs will probe without
any auxiliary sub-device, so that the SFs can be configured prior to
its full probe.

The defaults of the enable_* devlink params of these SFs are set to
false.

Usage example:
Create SF:
$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
$ devlink port function set pci/0000:08:00.0/32768 \
               hw_addr 00:00:00:00:00:11 state active

Enable ETH auxiliary device:
$ devlink dev param set auxiliary/mlx5_core.sf.1 \
              name enable_eth value true cmode driverinit

Now, in order to fully probe the SF, use devlink reload:
$ devlink dev reload auxiliary/mlx5_core.sf.1

At this point the user have SF devlink instance with auxiliary device
for the Ethernet functionality only.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../ethernet/mellanox/mlx5/switchdev.rst      |  20 +++
 drivers/net/ethernet/mellanox/mlx5/core/dev.c |  16 +++
 .../net/ethernet/mellanox/mlx5/core/devlink.c |  20 ++-
 .../net/ethernet/mellanox/mlx5/core/health.c  |  24 ++--
 .../net/ethernet/mellanox/mlx5/core/main.c    | 124 ++++++++++++++++--
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |   7 +
 .../mellanox/mlx5/core/sf/dev/driver.c        |  15 ++-
 7 files changed, 203 insertions(+), 23 deletions(-)

diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/switchdev.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/switchdev.rst
index 01deedb71597..db62187eebce 100644
--- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/switchdev.rst
+++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/switchdev.rst
@@ -45,6 +45,26 @@ Following bridge VLAN functions are supported by mlx5:
 Subfunction
 ===========
 
+Subfunction which are spawned over the E-switch are created only with devlink
+device, and by default all the SF auxiliary devices are disabled.
+This will allow user to configure the SF before the SF have been fully probed,
+which will save time.
+
+Usage example:
+Create SF:
+$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
+$ devlink port function set pci/0000:08:00.0/32768 \
+               hw_addr 00:00:00:00:00:11 state active
+
+Enable ETH auxiliary device:
+$ devlink dev param set auxiliary/mlx5_core.sf.1 \
+              name enable_eth value true cmode driverinit
+
+Now, in order to fully probe the SF, use devlink reload:
+$ devlink dev reload auxiliary/mlx5_core.sf.1
+
+mlx5 supports ETH,rdma and vdpa (vnet) auxiliary devices devlink params (see :ref:`Documentation/networking/devlink/devlink-params.rst`)
+
 mlx5 supports subfunction management using devlink port (see :ref:`Documentation/networking/devlink/devlink-port.rst <devlink_port>`) interface.
 
 A subfunction has its own function capabilities and its own resources. This
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/dev.c b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
index 1b33533b15de..617ac7e5d75c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/dev.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
@@ -323,6 +323,18 @@ static void del_adev(struct auxiliary_device *adev)
 	auxiliary_device_uninit(adev);
 }
 
+void mlx5_dev_set_lightweight(struct mlx5_core_dev *dev)
+{
+	mutex_lock(&mlx5_intf_mutex);
+	dev->priv.flags |= MLX5_PRIV_FLAGS_DISABLE_ALL_ADEV;
+	mutex_unlock(&mlx5_intf_mutex);
+}
+
+bool mlx5_dev_is_lightweight(struct mlx5_core_dev *dev)
+{
+	return dev->priv.flags & MLX5_PRIV_FLAGS_DISABLE_ALL_ADEV;
+}
+
 int mlx5_attach_device(struct mlx5_core_dev *dev)
 {
 	struct mlx5_priv *priv = &dev->priv;
@@ -457,6 +469,10 @@ static int add_drivers(struct mlx5_core_dev *dev)
 		if (priv->adev[i])
 			continue;
 
+		if (mlx5_adev_devices[i].is_enabled &&
+		    !(mlx5_adev_devices[i].is_enabled(dev)))
+			continue;
+
 		if (mlx5_adev_devices[i].is_supported)
 			is_supported = mlx5_adev_devices[i].is_supported(dev);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index 27197acdb4d8..3d82ec890666 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -141,6 +141,13 @@ static int mlx5_devlink_reload_down(struct devlink *devlink, bool netns_change,
 	bool sf_dev_allocated;
 	int ret = 0;
 
+	if (mlx5_dev_is_lightweight(dev)) {
+		if (action != DEVLINK_RELOAD_ACTION_DRIVER_REINIT)
+			return -EOPNOTSUPP;
+		mlx5_unload_one_light(dev);
+		return 0;
+	}
+
 	sf_dev_allocated = mlx5_sf_dev_allocated(dev);
 	if (sf_dev_allocated) {
 		/* Reload results in deleting SF device which further results in
@@ -193,6 +200,10 @@ static int mlx5_devlink_reload_up(struct devlink *devlink, enum devlink_reload_a
 	*actions_performed = BIT(action);
 	switch (action) {
 	case DEVLINK_RELOAD_ACTION_DRIVER_REINIT:
+		if (mlx5_dev_is_lightweight(dev)) {
+			mlx5_fw_reporters_create(dev);
+			return mlx5_init_one_devl_locked(dev);
+		}
 		ret = mlx5_load_one_devl_locked(dev, false);
 		break;
 	case DEVLINK_RELOAD_ACTION_FW_ACTIVATE:
@@ -511,7 +522,7 @@ static void mlx5_devlink_set_params_init_values(struct devlink *devlink)
 	struct mlx5_core_dev *dev = devlink_priv(devlink);
 	union devlink_param_value value;
 
-	value.vbool = MLX5_CAP_GEN(dev, roce);
+	value.vbool = MLX5_CAP_GEN(dev, roce) && !mlx5_dev_is_lightweight(dev);
 	devl_param_driverinit_value_set(devlink,
 					DEVLINK_PARAM_GENERIC_ID_ENABLE_ROCE,
 					value);
@@ -561,7 +572,7 @@ static int mlx5_devlink_eth_params_register(struct devlink *devlink)
 	if (err)
 		return err;
 
-	value.vbool = true;
+	value.vbool = !mlx5_dev_is_lightweight(dev);
 	devl_param_driverinit_value_set(devlink,
 					DEVLINK_PARAM_GENERIC_ID_ENABLE_ETH,
 					value);
@@ -601,6 +612,7 @@ static const struct devlink_param mlx5_devlink_rdma_params[] = {
 
 static int mlx5_devlink_rdma_params_register(struct devlink *devlink)
 {
+	struct mlx5_core_dev *dev = devlink_priv(devlink);
 	union devlink_param_value value;
 	int err;
 
@@ -612,7 +624,7 @@ static int mlx5_devlink_rdma_params_register(struct devlink *devlink)
 	if (err)
 		return err;
 
-	value.vbool = true;
+	value.vbool = !mlx5_dev_is_lightweight(dev);
 	devl_param_driverinit_value_set(devlink,
 					DEVLINK_PARAM_GENERIC_ID_ENABLE_RDMA,
 					value);
@@ -647,7 +659,7 @@ static int mlx5_devlink_vnet_params_register(struct devlink *devlink)
 	if (err)
 		return err;
 
-	value.vbool = true;
+	value.vbool = !mlx5_dev_is_lightweight(dev);
 	devl_param_driverinit_value_set(devlink,
 					DEVLINK_PARAM_GENERIC_ID_ENABLE_VNET,
 					value);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index 871c32dda66e..210100a4064a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -719,7 +719,7 @@ static const struct devlink_health_reporter_ops mlx5_fw_fatal_reporter_ops = {
 #define MLX5_FW_REPORTER_VF_GRACEFUL_PERIOD 30000
 #define MLX5_FW_REPORTER_DEFAULT_GRACEFUL_PERIOD MLX5_FW_REPORTER_VF_GRACEFUL_PERIOD
 
-static void mlx5_fw_reporters_create(struct mlx5_core_dev *dev)
+void mlx5_fw_reporters_create(struct mlx5_core_dev *dev)
 {
 	struct mlx5_core_health *health = &dev->priv.health;
 	struct devlink *devlink = priv_to_devlink(dev);
@@ -735,17 +735,17 @@ static void mlx5_fw_reporters_create(struct mlx5_core_dev *dev)
 	}
 
 	health->fw_reporter =
-		devlink_health_reporter_create(devlink, &mlx5_fw_reporter_ops,
-					       0, dev);
+		devl_health_reporter_create(devlink, &mlx5_fw_reporter_ops,
+					    0, dev);
 	if (IS_ERR(health->fw_reporter))
 		mlx5_core_warn(dev, "Failed to create fw reporter, err = %ld\n",
 			       PTR_ERR(health->fw_reporter));
 
 	health->fw_fatal_reporter =
-		devlink_health_reporter_create(devlink,
-					       &mlx5_fw_fatal_reporter_ops,
-					       grace_period,
-					       dev);
+		devl_health_reporter_create(devlink,
+					    &mlx5_fw_fatal_reporter_ops,
+					    grace_period,
+					    dev);
 	if (IS_ERR(health->fw_fatal_reporter))
 		mlx5_core_warn(dev, "Failed to create fw fatal reporter, err = %ld\n",
 			       PTR_ERR(health->fw_fatal_reporter));
@@ -777,7 +777,8 @@ void mlx5_trigger_health_work(struct mlx5_core_dev *dev)
 {
 	struct mlx5_core_health *health = &dev->priv.health;
 
-	queue_work(health->wq, &health->fatal_report_work);
+	if (!mlx5_dev_is_lightweight(dev))
+		queue_work(health->wq, &health->fatal_report_work);
 }
 
 #define MLX5_MSEC_PER_HOUR (MSEC_PER_SEC * 60 * 60)
@@ -905,10 +906,15 @@ void mlx5_health_cleanup(struct mlx5_core_dev *dev)
 
 int mlx5_health_init(struct mlx5_core_dev *dev)
 {
+	struct devlink *devlink = priv_to_devlink(dev);
 	struct mlx5_core_health *health;
 	char *name;
 
-	mlx5_fw_reporters_create(dev);
+	if (!mlx5_dev_is_lightweight(dev)) {
+		devl_lock(devlink);
+		mlx5_fw_reporters_create(dev);
+		devl_unlock(devlink);
+	}
 	mlx5_reporter_vnic_create(dev);
 
 	health = &dev->priv.health;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 0faae77d84e6..6fa314f8e5ee 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1424,12 +1424,11 @@ static void mlx5_unload(struct mlx5_core_dev *dev)
 	mlx5_put_uars_page(dev, dev->priv.uar);
 }
 
-int mlx5_init_one(struct mlx5_core_dev *dev)
+int mlx5_init_one_devl_locked(struct mlx5_core_dev *dev)
 {
-	struct devlink *devlink = priv_to_devlink(dev);
+	bool light_probe = mlx5_dev_is_lightweight(dev);
 	int err = 0;
 
-	devl_lock(devlink);
 	mutex_lock(&dev->intf_state_mutex);
 	dev->state = MLX5_DEVICE_STATE_UP;
 
@@ -1443,9 +1442,14 @@ int mlx5_init_one(struct mlx5_core_dev *dev)
 		goto function_teardown;
 	}
 
-	err = mlx5_devlink_params_register(priv_to_devlink(dev));
-	if (err)
-		goto err_devlink_params_reg;
+	/* In case of light_probe, mlx5_devlink is already registered.
+	 * Hence, don't register devlink again.
+	 */
+	if (!light_probe) {
+		err = mlx5_devlink_params_register(priv_to_devlink(dev));
+		if (err)
+			goto err_devlink_params_reg;
+	}
 
 	err = mlx5_load(dev);
 	if (err)
@@ -1458,14 +1462,14 @@ int mlx5_init_one(struct mlx5_core_dev *dev)
 		goto err_register;
 
 	mutex_unlock(&dev->intf_state_mutex);
-	devl_unlock(devlink);
 	return 0;
 
 err_register:
 	clear_bit(MLX5_INTERFACE_STATE_UP, &dev->intf_state);
 	mlx5_unload(dev);
 err_load:
-	mlx5_devlink_params_unregister(priv_to_devlink(dev));
+	if (!light_probe)
+		mlx5_devlink_params_unregister(priv_to_devlink(dev));
 err_devlink_params_reg:
 	mlx5_cleanup_once(dev);
 function_teardown:
@@ -1473,6 +1477,16 @@ int mlx5_init_one(struct mlx5_core_dev *dev)
 err_function:
 	dev->state = MLX5_DEVICE_STATE_INTERNAL_ERROR;
 	mutex_unlock(&dev->intf_state_mutex);
+	return err;
+}
+
+int mlx5_init_one(struct mlx5_core_dev *dev)
+{
+	struct devlink *devlink = priv_to_devlink(dev);
+	int err;
+
+	devl_lock(devlink);
+	err = mlx5_init_one_devl_locked(dev);
 	devl_unlock(devlink);
 	return err;
 }
@@ -1590,6 +1604,100 @@ void mlx5_unload_one(struct mlx5_core_dev *dev, bool suspend)
 	devl_unlock(devlink);
 }
 
+/* In case of light probe, we don't need a full query of hca_caps, but only the bellow caps.
+ * A full query of hca_caps will be done when the device will reload.
+ */
+static int mlx5_query_hca_caps_light(struct mlx5_core_dev *dev)
+{
+	int err;
+
+	err = mlx5_core_get_caps(dev, MLX5_CAP_GENERAL);
+	if (err)
+		return err;
+
+	if (MLX5_CAP_GEN(dev, eth_net_offloads)) {
+		err = mlx5_core_get_caps(dev, MLX5_CAP_ETHERNET_OFFLOADS);
+		if (err)
+			return err;
+	}
+
+	if (MLX5_CAP_GEN(dev, nic_flow_table) ||
+	    MLX5_CAP_GEN(dev, ipoib_enhanced_offloads)) {
+		err = mlx5_core_get_caps(dev, MLX5_CAP_FLOW_TABLE);
+		if (err)
+			return err;
+	}
+
+	if (MLX5_CAP_GEN_64(dev, general_obj_types) &
+		MLX5_GENERAL_OBJ_TYPES_CAP_VIRTIO_NET_Q) {
+		err = mlx5_core_get_caps(dev, MLX5_CAP_VDPA_EMULATION);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+int mlx5_init_one_light(struct mlx5_core_dev *dev)
+{
+	struct devlink *devlink = priv_to_devlink(dev);
+	int err;
+
+	dev->state = MLX5_DEVICE_STATE_UP;
+	err = mlx5_function_enable(dev, true, mlx5_tout_ms(dev, FW_PRE_INIT_TIMEOUT));
+	if (err) {
+		mlx5_core_warn(dev, "mlx5_function_enable err=%d\n", err);
+		goto out;
+	}
+
+	err = mlx5_query_hca_caps_light(dev);
+	if (err) {
+		mlx5_core_warn(dev, "mlx5_query_hca_caps_light err=%d\n", err);
+		goto query_hca_caps_err;
+	}
+
+	devl_lock(devlink);
+	err = mlx5_devlink_params_register(priv_to_devlink(dev));
+	devl_unlock(devlink);
+	if (err) {
+		mlx5_core_warn(dev, "mlx5_devlink_param_reg err = %d\n", err);
+		goto query_hca_caps_err;
+	}
+
+	return 0;
+
+query_hca_caps_err:
+	mlx5_function_disable(dev, true);
+out:
+	dev->state = MLX5_DEVICE_STATE_INTERNAL_ERROR;
+	return err;
+}
+
+void mlx5_uninit_one_light(struct mlx5_core_dev *dev)
+{
+	struct devlink *devlink = priv_to_devlink(dev);
+
+	devl_lock(devlink);
+	mlx5_devlink_params_unregister(priv_to_devlink(dev));
+	devl_unlock(devlink);
+	if (dev->state != MLX5_DEVICE_STATE_UP)
+		return;
+	mlx5_function_disable(dev, true);
+}
+
+/* xxx_light() function are used in order to configure the device without full
+ * init (light init). e.g.: There isn't a point in reload a device to light state.
+ * Hence, mlx5_load_one_light() isn't needed.
+ */
+
+void mlx5_unload_one_light(struct mlx5_core_dev *dev)
+{
+	if (dev->state != MLX5_DEVICE_STATE_UP)
+		return;
+	mlx5_function_disable(dev, false);
+	dev->state = MLX5_DEVICE_STATE_INTERNAL_ERROR;
+}
+
 static const int types[] = {
 	MLX5_CAP_GENERAL,
 	MLX5_CAP_GENERAL_2,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 7a5f04082058..464c6885a01c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -240,11 +240,14 @@ int mlx5_attach_device(struct mlx5_core_dev *dev);
 void mlx5_detach_device(struct mlx5_core_dev *dev, bool suspend);
 int mlx5_register_device(struct mlx5_core_dev *dev);
 void mlx5_unregister_device(struct mlx5_core_dev *dev);
+void mlx5_dev_set_lightweight(struct mlx5_core_dev *dev);
+bool mlx5_dev_is_lightweight(struct mlx5_core_dev *dev);
 struct mlx5_core_dev *mlx5_get_next_phys_dev_lag(struct mlx5_core_dev *dev);
 void mlx5_dev_list_lock(void);
 void mlx5_dev_list_unlock(void);
 int mlx5_dev_list_trylock(void);
 
+void mlx5_fw_reporters_create(struct mlx5_core_dev *dev);
 int mlx5_query_mtpps(struct mlx5_core_dev *dev, u32 *mtpps, u32 mtpps_size);
 int mlx5_set_mtpps(struct mlx5_core_dev *mdev, u32 *mtpps, u32 mtpps_size);
 int mlx5_query_mtppse(struct mlx5_core_dev *mdev, u8 pin, u8 *arm, u8 *mode);
@@ -319,11 +322,15 @@ static inline bool mlx5_core_is_sf(const struct mlx5_core_dev *dev)
 int mlx5_mdev_init(struct mlx5_core_dev *dev, int profile_idx);
 void mlx5_mdev_uninit(struct mlx5_core_dev *dev);
 int mlx5_init_one(struct mlx5_core_dev *dev);
+int mlx5_init_one_devl_locked(struct mlx5_core_dev *dev);
 void mlx5_uninit_one(struct mlx5_core_dev *dev);
 void mlx5_unload_one(struct mlx5_core_dev *dev, bool suspend);
 void mlx5_unload_one_devl_locked(struct mlx5_core_dev *dev, bool suspend);
 int mlx5_load_one(struct mlx5_core_dev *dev, bool recovery);
 int mlx5_load_one_devl_locked(struct mlx5_core_dev *dev, bool recovery);
+int mlx5_init_one_light(struct mlx5_core_dev *dev);
+void mlx5_uninit_one_light(struct mlx5_core_dev *dev);
+void mlx5_unload_one_light(struct mlx5_core_dev *dev);
 
 int mlx5_vport_set_other_func_cap(struct mlx5_core_dev *dev, const void *hca_cap, u16 vport,
 				  u16 opmod);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
index 0692363cf80e..8fe82f1191bb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
@@ -3,6 +3,7 @@
 
 #include <linux/mlx5/driver.h>
 #include <linux/mlx5/device.h>
+#include <linux/mlx5/eswitch.h>
 #include "mlx5_core.h"
 #include "dev.h"
 #include "devlink.h"
@@ -28,6 +29,10 @@ static int mlx5_sf_dev_probe(struct auxiliary_device *adev, const struct auxilia
 	mdev->priv.adev_idx = adev->id;
 	sf_dev->mdev = mdev;
 
+	/* Only local SFs do light probe */
+	if (MLX5_ESWITCH_MANAGER(sf_dev->parent_mdev))
+		mlx5_dev_set_lightweight(mdev);
+
 	err = mlx5_mdev_init(mdev, MLX5_SF_PROF);
 	if (err) {
 		mlx5_core_warn(mdev, "mlx5_mdev_init on err=%d\n", err);
@@ -41,7 +46,10 @@ static int mlx5_sf_dev_probe(struct auxiliary_device *adev, const struct auxilia
 		goto remap_err;
 	}
 
-	err = mlx5_init_one(mdev);
+	if (MLX5_ESWITCH_MANAGER(sf_dev->parent_mdev))
+		err = mlx5_init_one_light(mdev);
+	else
+		err = mlx5_init_one(mdev);
 	if (err) {
 		mlx5_core_warn(mdev, "mlx5_init_one err=%d\n", err);
 		goto init_one_err;
@@ -65,7 +73,10 @@ static void mlx5_sf_dev_remove(struct auxiliary_device *adev)
 
 	mlx5_drain_health_wq(sf_dev->mdev);
 	devlink_unregister(devlink);
-	mlx5_uninit_one(sf_dev->mdev);
+	if (mlx5_dev_is_lightweight(sf_dev->mdev))
+		mlx5_uninit_one_light(sf_dev->mdev);
+	else
+		mlx5_uninit_one(sf_dev->mdev);
 	iounmap(sf_dev->mdev->iseg);
 	mlx5_mdev_uninit(sf_dev->mdev);
 	mlx5_devlink_free(devlink);
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [net-next 15/15] net/mlx5e: Remove a useless function call
  2023-06-10  1:42 [pull request][net-next 00/15] mlx5 updates 2023-06-09 Saeed Mahameed
                   ` (13 preceding siblings ...)
  2023-06-10  1:42 ` [net-next 14/15] net/mlx5: Light probe local SFs Saeed Mahameed
@ 2023-06-10  1:42 ` Saeed Mahameed
  14 siblings, 0 replies; 41+ messages in thread
From: Saeed Mahameed @ 2023-06-10  1:42 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Christophe JAILLET, Simon Horman

From: Christophe JAILLET <christophe.jaillet@wanadoo.fr>

'handle' is known to be NULL here. There is no need to kfree() it.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/tc/post_act.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/post_act.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/post_act.c
index 0290e0dea539..4e923a2874ae 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/post_act.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/post_act.c
@@ -112,10 +112,8 @@ mlx5e_tc_post_act_add(struct mlx5e_post_act *post_act, struct mlx5_flow_attr *po
 	int err;
 
 	handle = kzalloc(sizeof(*handle), GFP_KERNEL);
-	if (!handle) {
-		kfree(handle);
+	if (!handle)
 		return ERR_PTR(-ENOMEM);
-	}
 
 	post_attr->chain = 0;
 	post_attr->prio = 0;
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [net-next 14/15] net/mlx5: Light probe local SFs
  2023-06-10  1:42 ` [net-next 14/15] net/mlx5: Light probe local SFs Saeed Mahameed
@ 2023-06-10  7:01   ` Jakub Kicinski
  2023-06-11  4:15     ` Saeed Mahameed
  0 siblings, 1 reply; 41+ messages in thread
From: Jakub Kicinski @ 2023-06-10  7:01 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Saeed Mahameed,
	netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

On Fri,  9 Jun 2023 18:42:53 -0700 Saeed Mahameed wrote:
> In case user wants to configure the SFs, for example: to use only vdpa
> functionality, he needs to fully probe a SF, configure what he wants,
> and afterward reload the SF.
> 
> In order to save the time of the reload, local SFs will probe without
> any auxiliary sub-device, so that the SFs can be configured prior to
> its full probe.

I feel like we talked about this at least twice already, and I keep
saying that the features should be specified when the device is
spawned. Am I misremembering?

Will this patch not surprise existing users? You're changing the
defaults. Does "local" mean on the IPU? Also "lightweight" feels
uncomfortably close to marketing language.

> The defaults of the enable_* devlink params of these SFs are set to
> false.
> 
> Usage example:

Is this a real example? Because we have..

> Create SF:
> $ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11

sfnum 11 here

> $ devlink port function set pci/0000:08:00.0/32768 \

then port is 32768

>                hw_addr 00:00:00:00:00:11 state active
> 
> Enable ETH auxiliary device:
> $ devlink dev param set auxiliary/mlx5_core.sf.1 \

and instance is sf.1 

>               name enable_eth value true cmode driverinit
> 
> Now, in order to fully probe the SF, use devlink reload:
> $ devlink dev reload auxiliary/mlx5_core.sf.1
> 
> At this point the user have SF devlink instance with auxiliary device
> for the Ethernet functionality only.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next 14/15] net/mlx5: Light probe local SFs
  2023-06-10  7:01   ` Jakub Kicinski
@ 2023-06-11  4:15     ` Saeed Mahameed
  2023-06-11  5:10       ` Samudrala, Sridhar
  2023-06-12 17:51       ` Jakub Kicinski
  0 siblings, 2 replies; 41+ messages in thread
From: Saeed Mahameed @ 2023-06-11  4:15 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Saeed Mahameed, David S. Miller, Paolo Abeni, Eric Dumazet,
	netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

On 10 Jun 00:01, Jakub Kicinski wrote:
>On Fri,  9 Jun 2023 18:42:53 -0700 Saeed Mahameed wrote:
>> In case user wants to configure the SFs, for example: to use only vdpa
>> functionality, he needs to fully probe a SF, configure what he wants,
>> and afterward reload the SF.
>>
>> In order to save the time of the reload, local SFs will probe without
>> any auxiliary sub-device, so that the SFs can be configured prior to
>> its full probe.
>
>I feel like we talked about this at least twice already, and I keep
>saying that the features should be specified when the device is
>spawned. Am I misremembering?
>

I think we did talk about this, but after internal research we prefer to
avoid adding additional knobs, unless you insist :) .. 
I think we already did a research and we feel that all of our users are
going to re-configure the SF anyway, so why not make all SFs start with
"blank state" ?

>Will this patch not surprise existing users? You're changing the

I think we already checked, the feature is still not widely known.
Let me double check.

>defaults. Does "local" mean on the IPU? Also "lightweight" feels
>uncomfortably close to marketing language.
>

That wasn't out intention, poor choice of words, will reword to "blank SF"

>> The defaults of the enable_* devlink params of these SFs are set to
>> false.
>>
>> Usage example:
>
>Is this a real example? Because we have..
>
>> Create SF:
>> $ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
>
>sfnum 11 here
>

This an arbitrary user index.

>> $ devlink port function set pci/0000:08:00.0/32768 \
>
>then port is 32768
>

This is the actual HW port index, our SFs indexing start with an offset.

>>                hw_addr 00:00:00:00:00:11 state active
>>
>> Enable ETH auxiliary device:
>> $ devlink dev param set auxiliary/mlx5_core.sf.1 \
>
>and instance is sf.1
>

This was the first SF aux dev to be created on the system. :/

It's a mess ha...
  
Maybe we need to set the SF aux device index the same as the user index.
But the HW/port index will always be different, otherwise we will need a map
inside the driver.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next 14/15] net/mlx5: Light probe local SFs
  2023-06-11  4:15     ` Saeed Mahameed
@ 2023-06-11  5:10       ` Samudrala, Sridhar
  2023-06-13 23:41         ` Saeed Mahameed
  2023-06-12 17:51       ` Jakub Kicinski
  1 sibling, 1 reply; 41+ messages in thread
From: Samudrala, Sridhar @ 2023-06-11  5:10 UTC (permalink / raw)
  To: Saeed Mahameed, Jakub Kicinski
  Cc: Saeed Mahameed, David S. Miller, Paolo Abeni, Eric Dumazet,
	netdev, Tariq Toukan, Shay Drory, Moshe Shemesh



On 6/10/2023 9:15 PM, Saeed Mahameed wrote:
> On 10 Jun 00:01, Jakub Kicinski wrote:
>> On Fri,  9 Jun 2023 18:42:53 -0700 Saeed Mahameed wrote:
>>> In case user wants to configure the SFs, for example: to use only vdpa
>>> functionality, he needs to fully probe a SF, configure what he wants,
>>> and afterward reload the SF.
>>>
>>> In order to save the time of the reload, local SFs will probe without
>>> any auxiliary sub-device, so that the SFs can be configured prior to
>>> its full probe.
>>
>> I feel like we talked about this at least twice already, and I keep
>> saying that the features should be specified when the device is
>> spawned. Am I misremembering?
>>
> 
> I think we did talk about this, but after internal research we prefer to
> avoid adding additional knobs, unless you insist :) .. I think we 
> already did a research and we feel that all of our users are
> going to re-configure the SF anyway, so why not make all SFs start with
> "blank state" ?

Shouldn't this be a devlink port param to enable/disable a specific 
feature on the SF before it is activated rather than making it a dev 
param on the SF aux device and requiring a devlink reload?

> 
>> Will this patch not surprise existing users? You're changing the
> 
> I think we already checked, the feature is still not widely known.
> Let me double check.
> 
>> defaults. Does "local" mean on the IPU? Also "lightweight" feels
>> uncomfortably close to marketing language.
>>
> 
> That wasn't out intention, poor choice of words, will reword to "blank SF"
> 
>>> The defaults of the enable_* devlink params of these SFs are set to
>>> false.
>>>
>>> Usage example:
>>
>> Is this a real example? Because we have..
>>
>>> Create SF:
>>> $ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
>>
>> sfnum 11 here
>>
> 
> This an arbitrary user index.
> 
>>> $ devlink port function set pci/0000:08:00.0/32768 \
>>
>> then port is 32768
>>
> 
> This is the actual HW port index, our SFs indexing start with an offset.
> 
>>>                hw_addr 00:00:00:00:00:11 state active
>>>
>>> Enable ETH auxiliary device:
>>> $ devlink dev param set auxiliary/mlx5_core.sf.1 \
>>
>> and instance is sf.1
>>
> 
> This was the first SF aux dev to be created on the system. :/
> 
> It's a mess ha...
> 
> Maybe we need to set the SF aux device index the same as the user index.
> But the HW/port index will always be different, otherwise we will need a 
> map
> inside the driver.

Yes. Associating sfnum passed by user when creating a SF with the aux 
device would make it easier for orchestration tools to identify the aux 
device corresponding to a SF.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next 01/15] net/mlx5: Simplify unload all rep code
  2023-06-10  1:42 ` [net-next 01/15] net/mlx5: Simplify unload all rep code Saeed Mahameed
@ 2023-06-12 11:00   ` patchwork-bot+netdevbpf
  0 siblings, 0 replies; 41+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-06-12 11:00 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: davem, kuba, pabeni, edumazet, saeedm, netdev, tariqt, danielj,
	witu, parav

Hello:

This series was applied to netdev/net-next.git (main)
by Saeed Mahameed <saeedm@nvidia.com>:

On Fri,  9 Jun 2023 18:42:40 -0700 you wrote:
> From: Daniel Jurgens <danielj@nvidia.com>
> 
> Instead of using type specific iterators which are only used in one place
> just traverse the xarray. It will provide suitable ordering based on the
> vport numbers. This will also eliminate the need for changes here when
> new types are added.
> 
> [...]

Here is the summary with links:
  - [net-next,01/15] net/mlx5: Simplify unload all rep code
    https://git.kernel.org/netdev/net-next/c/18a92b054254
  - [net-next,02/15] net/mlx5: mlx5_ifc updates for embedded CPU SRIOV
    https://git.kernel.org/netdev/net-next/c/93b36d0f2892
  - [net-next,03/15] net/mlx5: Enable devlink port for embedded cpu VF vports
    https://git.kernel.org/netdev/net-next/c/dc13180824b7
  - [net-next,04/15] net/mlx5: Update vport caps query/set for EC VFs
    https://git.kernel.org/netdev/net-next/c/9ac0b128248e
  - [net-next,05/15] net/mlx5: Add management of EC VF vports
    https://git.kernel.org/netdev/net-next/c/a7719b29a821
  - [net-next,06/15] net/mlx5: Add/remove peer miss rules for EC VFs
    https://git.kernel.org/netdev/net-next/c/fa3c73eee641
  - [net-next,07/15] net/mlx5: Add new page type for EC VF pages
    https://git.kernel.org/netdev/net-next/c/395ccd6eb49a
  - [net-next,08/15] net/mlx5: Use correct vport when restoring GUIDs
    https://git.kernel.org/netdev/net-next/c/2ee3db806e85
  - [net-next,09/15] net/mlx5: Query correct caps for min msix vectors
    https://git.kernel.org/netdev/net-next/c/42a84a430931
  - [net-next,10/15] net/mlx5: Update SRIOV enable/disable to handle EC/VFs
    https://git.kernel.org/netdev/net-next/c/6d98f314bfca
  - [net-next,11/15] net/mlx5: Set max number of embedded CPU VFs
    https://git.kernel.org/netdev/net-next/c/7057fe561988
  - [net-next,12/15] net/mlx5: Split function_setup() to enable and open functions
    https://git.kernel.org/netdev/net-next/c/2059cf51f318
  - [net-next,13/15] net/mlx5: Move esw multiport devlink param to eswitch code
    https://git.kernel.org/netdev/net-next/c/3f90840305e2
  - [net-next,14/15] net/mlx5: Light probe local SFs
    https://git.kernel.org/netdev/net-next/c/e71383fb9cd1
  - [net-next,15/15] net/mlx5e: Remove a useless function call
    https://git.kernel.org/netdev/net-next/c/978015f7ef92

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next 14/15] net/mlx5: Light probe local SFs
  2023-06-11  4:15     ` Saeed Mahameed
  2023-06-11  5:10       ` Samudrala, Sridhar
@ 2023-06-12 17:51       ` Jakub Kicinski
  2023-06-13 23:32         ` Saeed Mahameed
  1 sibling, 1 reply; 41+ messages in thread
From: Jakub Kicinski @ 2023-06-12 17:51 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Saeed Mahameed, David S. Miller, Paolo Abeni, Eric Dumazet,
	netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

On Sat, 10 Jun 2023 21:15:57 -0700 Saeed Mahameed wrote:
> On 10 Jun 00:01, Jakub Kicinski wrote:
> >On Fri,  9 Jun 2023 18:42:53 -0700 Saeed Mahameed wrote:  
> >> In case user wants to configure the SFs, for example: to use only vdpa
> >> functionality, he needs to fully probe a SF, configure what he wants,
> >> and afterward reload the SF.
> >>
> >> In order to save the time of the reload, local SFs will probe without
> >> any auxiliary sub-device, so that the SFs can be configured prior to
> >> its full probe.  
> >
> >I feel like we talked about this at least twice already, and I keep
> >saying that the features should be specified when the device is
> >spawned. Am I misremembering?
> 
> I think we did talk about this, but after internal research we prefer to
> avoid adding additional knobs, unless you insist :) .. 
> I think we already did a research and we feel that all of our users are
> going to re-configure the SF anyway, so why not make all SFs start with
> "blank state" ?

In the container world, at least, I would have thought that the
management daemon gets a full spec of the container its starting
upfront. So going thru this spawn / config / futz / reset cycle
is pure boilerplate pain.

What use cases are you considering? More VM-oriented?

> >> The defaults of the enable_* devlink params of these SFs are set to
> >> false.
> >>
> >> Usage example:  
> >
> >Is this a real example? Because we have..
> >  
> >> Create SF:
> >> $ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11  
> >
> >sfnum 11 here
> 
> This an arbitrary user index.
> 
> >> $ devlink port function set pci/0000:08:00.0/32768 \  
> >
> >then port is 32768
> 
> This is the actual HW port index, our SFs indexing start with an offset.
> 
> >>                hw_addr 00:00:00:00:00:11 state active
> >>
> >> Enable ETH auxiliary device:
> >> $ devlink dev param set auxiliary/mlx5_core.sf.1 \  
> >
> >and instance is sf.1
> 
> This was the first SF aux dev to be created on the system. :/
> 
> It's a mess ha...
>   
> Maybe we need to set the SF aux device index the same as the user index.
> But the HW/port index will always be different, otherwise we will need a map
> inside the driver.

It'd be best to synchronously return to the user what the ID of the
allocated entity is. It should be possible with some core changes to
rig up devlink to return the sfnum and port ID. But IDK about the new
devlink instance :(

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next 14/15] net/mlx5: Light probe local SFs
  2023-06-12 17:51       ` Jakub Kicinski
@ 2023-06-13 23:32         ` Saeed Mahameed
  2023-06-14  2:05           ` Jakub Kicinski
  0 siblings, 1 reply; 41+ messages in thread
From: Saeed Mahameed @ 2023-06-13 23:32 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Saeed Mahameed, David S. Miller, Paolo Abeni, Eric Dumazet,
	netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

On 12 Jun 10:51, Jakub Kicinski wrote:
>On Sat, 10 Jun 2023 21:15:57 -0700 Saeed Mahameed wrote:
>> On 10 Jun 00:01, Jakub Kicinski wrote:
>> >On Fri,  9 Jun 2023 18:42:53 -0700 Saeed Mahameed wrote:
>> >> In case user wants to configure the SFs, for example: to use only vdpa
>> >> functionality, he needs to fully probe a SF, configure what he wants,
>> >> and afterward reload the SF.
>> >>
>> >> In order to save the time of the reload, local SFs will probe without
>> >> any auxiliary sub-device, so that the SFs can be configured prior to
>> >> its full probe.
>> >
>> >I feel like we talked about this at least twice already, and I keep
>> >saying that the features should be specified when the device is
>> >spawned. Am I misremembering?
>>
>> I think we did talk about this, but after internal research we prefer to
>> avoid adding additional knobs, unless you insist :) ..
>> I think we already did a research and we feel that all of our users are
>> going to re-configure the SF anyway, so why not make all SFs start with
>> "blank state" ?
>
>In the container world, at least, I would have thought that the
>management daemon gets a full spec of the container its starting
>upfront. So going thru this spawn / config / futz / reset cycle
>is pure boilerplate pain.
>

That's the point of the series. create / config / spawn.

personally I like that the SF object is created blank, with dev handles
(devlink/aux) to configure it, and spawn it when ready.
I don't see a point of having an extra "blank state" devlink param.

>What use cases are you considering? More VM-oriented?
>

Mostly container oriented, and selecting the ULP stacks, e.g RDMA, VDPA,
virtio, netdev, etc .. 


>> >> The defaults of the enable_* devlink params of these SFs are set to
>> >> false.
>> >>
>> >> Usage example:
>> >
>> >Is this a real example? Because we have..
>> >
>> >> Create SF:
>> >> $ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
>> >
>> >sfnum 11 here
>>
>> This an arbitrary user index.
>>
>> >> $ devlink port function set pci/0000:08:00.0/32768 \
>> >
>> >then port is 32768
>>
>> This is the actual HW port index, our SFs indexing start with an offset.
>>
>> >>                hw_addr 00:00:00:00:00:11 state active
>> >>
>> >> Enable ETH auxiliary device:
>> >> $ devlink dev param set auxiliary/mlx5_core.sf.1 \
>> >
>> >and instance is sf.1
>>
>> This was the first SF aux dev to be created on the system. :/
>>
>> It's a mess ha...
>>
>> Maybe we need to set the SF aux device index the same as the user index.
>> But the HW/port index will always be different, otherwise we will need a map
>> inside the driver.
>
>It'd be best to synchronously return to the user what the ID of the
>allocated entity is. It should be possible with some core changes to
>rig up devlink to return the sfnum and port ID. But IDK about the new
>devlink instance :(

I think that's possible, let me ask the team to take a shot at this.. 

I am not sure I understand what you mean by "new devlink instance".

SF creation will result in spawning two devlink handles, the SF function port of
on the eswitch and the SF device devlink instance..



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next 14/15] net/mlx5: Light probe local SFs
  2023-06-11  5:10       ` Samudrala, Sridhar
@ 2023-06-13 23:41         ` Saeed Mahameed
  0 siblings, 0 replies; 41+ messages in thread
From: Saeed Mahameed @ 2023-06-13 23:41 UTC (permalink / raw)
  To: Samudrala, Sridhar
  Cc: Saeed Mahameed, Jakub Kicinski, David S. Miller, Paolo Abeni,
	Eric Dumazet, netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

On 10 Jun 22:10, Samudrala, Sridhar wrote:
>
>
>On 6/10/2023 9:15 PM, Saeed Mahameed wrote:
>>On 10 Jun 00:01, Jakub Kicinski wrote:
>>>On Fri,  9 Jun 2023 18:42:53 -0700 Saeed Mahameed wrote:
>>>>In case user wants to configure the SFs, for example: to use only vdpa
>>>>functionality, he needs to fully probe a SF, configure what he wants,
>>>>and afterward reload the SF.
>>>>
>>>>In order to save the time of the reload, local SFs will probe without
>>>>any auxiliary sub-device, so that the SFs can be configured prior to
>>>>its full probe.
>>>
>>>I feel like we talked about this at least twice already, and I keep
>>>saying that the features should be specified when the device is
>>>spawned. Am I misremembering?
>>>
>>
>>I think we did talk about this, but after internal research we prefer to
>>avoid adding additional knobs, unless you insist :) .. I think we 
>>already did a research and we feel that all of our users are
>>going to re-configure the SF anyway, so why not make all SFs start with
>>"blank state" ?
>
>Shouldn't this be a devlink port param to enable/disable a specific 
>feature on the SF before it is activated rather than making it a dev 
>param on the SF aux device and requiring a devlink reload?
>

Specific virtual HCA/SF/VF features are not directly related to what aux
devices to expose, this is a separate feature that we are currently
working on.

We are going to add devlink port params to have a granular control of what
features the SF/VF will expose, e.g ipsec, tls, etc ..  

These would affect the internal characteristic of the aux ULPs (netdev,
rdma,vdpa) .. 

But in this series, we improve the orchestration process of SF and what aux
devs would spawn with it by default.. we already have an API to
enable/disable what aux devs an SF would have, here we just improve the
sequence of creating the SF.

>>
>>>Will this patch not surprise existing users? You're changing the
>>
>>I think we already checked, the feature is still not widely known.
>>Let me double check.
>>
>>>defaults. Does "local" mean on the IPU? Also "lightweight" feels
>>>uncomfortably close to marketing language.
>>>
>>
>>That wasn't out intention, poor choice of words, will reword to "blank SF"
>>
>>>>The defaults of the enable_* devlink params of these SFs are set to
>>>>false.
>>>>
>>>>Usage example:
>>>
>>>Is this a real example? Because we have..
>>>
>>>>Create SF:
>>>>$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
>>>
>>>sfnum 11 here
>>>
>>
>>This an arbitrary user index.
>>
>>>>$ devlink port function set pci/0000:08:00.0/32768 \
>>>
>>>then port is 32768
>>>
>>
>>This is the actual HW port index, our SFs indexing start with an offset.
>>
>>>>               hw_addr 00:00:00:00:00:11 state active
>>>>
>>>>Enable ETH auxiliary device:
>>>>$ devlink dev param set auxiliary/mlx5_core.sf.1 \
>>>
>>>and instance is sf.1
>>>
>>
>>This was the first SF aux dev to be created on the system. :/
>>
>>It's a mess ha...
>>
>>Maybe we need to set the SF aux device index the same as the user index.
>>But the HW/port index will always be different, otherwise we will 
>>need a map
>>inside the driver.
>
>Yes. Associating sfnum passed by user when creating a SF with the aux 
>device would make it easier for orchestration tools to identify the 
>aux device corresponding to a SF.
>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next 14/15] net/mlx5: Light probe local SFs
  2023-06-13 23:32         ` Saeed Mahameed
@ 2023-06-14  2:05           ` Jakub Kicinski
  2023-06-15 10:51             ` Jiri Pirko
  0 siblings, 1 reply; 41+ messages in thread
From: Jakub Kicinski @ 2023-06-14  2:05 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Saeed Mahameed, David S. Miller, Paolo Abeni, Eric Dumazet,
	netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

On Tue, 13 Jun 2023 16:32:07 -0700 Saeed Mahameed wrote:
> On 12 Jun 10:51, Jakub Kicinski wrote:
> >On Sat, 10 Jun 2023 21:15:57 -0700 Saeed Mahameed wrote:  
> >> I think we did talk about this, but after internal research we prefer to
> >> avoid adding additional knobs, unless you insist :) ..
> >> I think we already did a research and we feel that all of our users are
> >> going to re-configure the SF anyway, so why not make all SFs start with
> >> "blank state" ?  
> >
> >In the container world, at least, I would have thought that the
> >management daemon gets a full spec of the container its starting
> >upfront. So going thru this spawn / config / futz / reset cycle
> >is pure boilerplate pain.
> 
> That's the point of the series. create / config / spawn.
> 
> personally I like that the SF object is created blank, with dev handles
> (devlink/aux) to configure it, and spawn it when ready.

I think we had this discussion before, wasn't the initial proposal for SF
along those lines? And we're slowly trending back towards ports in
uninitialized state. It's okay, too late now.

> I don't see a point of having an extra "blank state" devlink param.

Yeah, the param would be worse of both worlds. 
We'll need to ensure consistency in other vendors, tho.

> >What use cases are you considering? More VM-oriented?
> 
> Mostly container oriented, and selecting the ULP stacks, e.g RDMA, VDPA,
> virtio, netdev, etc .. 

Odd, okay.

> >> This was the first SF aux dev to be created on the system. :/
> >>
> >> It's a mess ha...
> >>
> >> Maybe we need to set the SF aux device index the same as the user index.
> >> But the HW/port index will always be different, otherwise we will need a map
> >> inside the driver.  
> >
> >It'd be best to synchronously return to the user what the ID of the
> >allocated entity is. It should be possible with some core changes to
> >rig up devlink to return the sfnum and port ID. But IDK about the new
> >devlink instance :(  
> 
> I think that's possible, let me ask the team to take a shot at this.. 
> 
> I am not sure I understand what you mean by "new devlink instance".
> 
> SF creation will result in spawning two devlink handles, the SF function port of
> on the eswitch and the SF device devlink instance..

Yes, I mean "SF device devlink instance" by "new devlink instance".

In theory this should all be doable with netlink. NLM_F_ECHO should
loop all notifications back to the requester. The tricky part is
catching the notifications, I'm guessing, because in theory the devlink
instance spawning may be async for locking reasons? Hopefully not,
then it's easy..

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next 14/15] net/mlx5: Light probe local SFs
  2023-06-14  2:05           ` Jakub Kicinski
@ 2023-06-15 10:51             ` Jiri Pirko
  2023-06-15 16:37               ` Jakub Kicinski
  0 siblings, 1 reply; 41+ messages in thread
From: Jiri Pirko @ 2023-06-15 10:51 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Saeed Mahameed, Saeed Mahameed, David S. Miller, Paolo Abeni,
	Eric Dumazet, netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

Wed, Jun 14, 2023 at 04:05:52AM CEST, kuba@kernel.org wrote:
>On Tue, 13 Jun 2023 16:32:07 -0700 Saeed Mahameed wrote:
>> On 12 Jun 10:51, Jakub Kicinski wrote:
>> >On Sat, 10 Jun 2023 21:15:57 -0700 Saeed Mahameed wrote:  
>> >> I think we did talk about this, but after internal research we prefer to
>> >> avoid adding additional knobs, unless you insist :) ..
>> >> I think we already did a research and we feel that all of our users are
>> >> going to re-configure the SF anyway, so why not make all SFs start with
>> >> "blank state" ?  
>> >
>> >In the container world, at least, I would have thought that the
>> >management daemon gets a full spec of the container its starting
>> >upfront. So going thru this spawn / config / futz / reset cycle
>> >is pure boilerplate pain.
>> 
>> That's the point of the series. create / config / spawn.
>> 
>> personally I like that the SF object is created blank, with dev handles
>> (devlink/aux) to configure it, and spawn it when ready.
>
>I think we had this discussion before, wasn't the initial proposal for SF
>along those lines? And we're slowly trending back towards ports in
>uninitialized state. It's okay, too late now.
>
>> I don't see a point of having an extra "blank state" devlink param.
>
>Yeah, the param would be worse of both worlds. 
>We'll need to ensure consistency in other vendors, tho.
>
>> >What use cases are you considering? More VM-oriented?
>> 
>> Mostly container oriented, and selecting the ULP stacks, e.g RDMA, VDPA,
>> virtio, netdev, etc .. 
>
>Odd, okay.
>
>> >> This was the first SF aux dev to be created on the system. :/
>> >>
>> >> It's a mess ha...
>> >>
>> >> Maybe we need to set the SF aux device index the same as the user index.
>> >> But the HW/port index will always be different, otherwise we will need a map
>> >> inside the driver.  
>> >
>> >It'd be best to synchronously return to the user what the ID of the
>> >allocated entity is. It should be possible with some core changes to
>> >rig up devlink to return the sfnum and port ID. But IDK about the new
>> >devlink instance :(  
>> 
>> I think that's possible, let me ask the team to take a shot at this.. 
>> 
>> I am not sure I understand what you mean by "new devlink instance".
>> 
>> SF creation will result in spawning two devlink handles, the SF function port of
>> on the eswitch and the SF device devlink instance..
>
>Yes, I mean "SF device devlink instance" by "new devlink instance".
>
>In theory this should all be doable with netlink. NLM_F_ECHO should
>loop all notifications back to the requester. The tricky part is
>catching the notifications, I'm guessing, because in theory the devlink
>instance spawning may be async for locking reasons? Hopefully not,
>then it's easy..

The problem is scalability. SFs could be activated in parallel, but the
cmd that is doing that holds devlink instance lock. That serializes it.
So we need to either:
1) change the devlink locking to be able to execute some of the cmds in
   parallel and leave the activation sync
2) change the activation to be async and work with notifications

I like 2) better, as the 1) maze we just got out of recently :)
WDYT?

>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next 14/15] net/mlx5: Light probe local SFs
  2023-06-15 10:51             ` Jiri Pirko
@ 2023-06-15 16:37               ` Jakub Kicinski
  2023-06-15 17:37                 ` Jiri Pirko
  0 siblings, 1 reply; 41+ messages in thread
From: Jakub Kicinski @ 2023-06-15 16:37 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Saeed Mahameed, Saeed Mahameed, David S. Miller, Paolo Abeni,
	Eric Dumazet, netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

On Thu, 15 Jun 2023 12:51:09 +0200 Jiri Pirko wrote:
> The problem is scalability. SFs could be activated in parallel, but the
> cmd that is doing that holds devlink instance lock. That serializes it.
> So we need to either:
> 1) change the devlink locking to be able to execute some of the cmds in
>    parallel and leave the activation sync
> 2) change the activation to be async and work with notifications
> 
> I like 2) better, as the 1) maze we just got out of recently :)
> WDYT?

I guess we don't need to wait for the full activation. Is the port
creation also async, then, or just the SF devlink instance creation?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next 14/15] net/mlx5: Light probe local SFs
  2023-06-15 16:37               ` Jakub Kicinski
@ 2023-06-15 17:37                 ` Jiri Pirko
  2023-06-15 19:33                   ` Jakub Kicinski
  0 siblings, 1 reply; 41+ messages in thread
From: Jiri Pirko @ 2023-06-15 17:37 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Saeed Mahameed, Saeed Mahameed, David S. Miller, Paolo Abeni,
	Eric Dumazet, netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

Thu, Jun 15, 2023 at 06:37:01PM CEST, kuba@kernel.org wrote:
>On Thu, 15 Jun 2023 12:51:09 +0200 Jiri Pirko wrote:
>> The problem is scalability. SFs could be activated in parallel, but the
>> cmd that is doing that holds devlink instance lock. That serializes it.
>> So we need to either:
>> 1) change the devlink locking to be able to execute some of the cmds in
>>    parallel and leave the activation sync
>> 2) change the activation to be async and work with notifications
>> 
>> I like 2) better, as the 1) maze we just got out of recently :)
>> WDYT?
>
>I guess we don't need to wait for the full activation. Is the port
>creation also async, then, or just the SF devlink instance creation?

I'm not sure I follow :/
The activation is when the SF auxiliary device is created. The driver then
probes the SF auxiliary device and instantiates everything, SF devlink,
SF netdev, etc.

We need wait/notification for 2 reasons
1) to get the auxiliary device name for the activated
   SF. It is needed for convenience of the orchestration tools.
2) to get the result of the activation (success/fail)
   It is also needed for convenience of the orchestration tools.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next 14/15] net/mlx5: Light probe local SFs
  2023-06-15 17:37                 ` Jiri Pirko
@ 2023-06-15 19:33                   ` Jakub Kicinski
  2023-06-21 13:14                     ` Jiri Pirko
  0 siblings, 1 reply; 41+ messages in thread
From: Jakub Kicinski @ 2023-06-15 19:33 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Saeed Mahameed, Saeed Mahameed, David S. Miller, Paolo Abeni,
	Eric Dumazet, netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

On Thu, 15 Jun 2023 19:37:23 +0200 Jiri Pirko wrote:
> Thu, Jun 15, 2023 at 06:37:01PM CEST, kuba@kernel.org wrote:
> >On Thu, 15 Jun 2023 12:51:09 +0200 Jiri Pirko wrote:  
> >> The problem is scalability. SFs could be activated in parallel, but the
> >> cmd that is doing that holds devlink instance lock. That serializes it.
> >> So we need to either:
> >> 1) change the devlink locking to be able to execute some of the cmds in
> >>    parallel and leave the activation sync
> >> 2) change the activation to be async and work with notifications
> >> 
> >> I like 2) better, as the 1) maze we just got out of recently :)
> >> WDYT?  
> >
> >I guess we don't need to wait for the full activation. Is the port
> >creation also async, then, or just the SF devlink instance creation?  
> 
> I'm not sure I follow :/
> The activation is when the SF auxiliary device is created. The driver then
> probes the SF auxiliary device and instantiates everything, SF devlink,
> SF netdev, etc.

Sorry, maybe let's look at an example:

$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11

needs to print / return the handle of the created port.

$ devlink port function set pci/0000:08:00.0/32768 \
               hw_addr 00:00:00:00:00:11 state active

needs to print / return the handle of the devlink instance

> We need wait/notification for 2 reasons
> 1) to get the auxiliary device name for the activated
>    SF. It is needed for convenience of the orchestration tools.
> 2) to get the result of the activation (success/fail)
>    It is also needed for convenience of the orchestration tools.

Are you saying the activation already waits for the devlink instance to
be spawned? If so that's great, all we need to do is for the:

$ devlink port function set pci/0000:08:00.0/32768 \
               hw_addr 00:00:00:00:00:11 state active

to either return sufficient info for the orchestration to know what the
resulting SF / SF devlink instance is. Most likely indirectly by adding
that info to the port so that the PORT_NEW notification carries it.

Did I confuse things even more?

As a reminder what sparked this convo is that user specifies "sfnum 11"
in the example, and the sf device gets called "sf.1".

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next 14/15] net/mlx5: Light probe local SFs
  2023-06-15 19:33                   ` Jakub Kicinski
@ 2023-06-21 13:14                     ` Jiri Pirko
  2023-06-21 18:23                       ` Jakub Kicinski
  2023-06-22  6:38                       ` Jiri Pirko
  0 siblings, 2 replies; 41+ messages in thread
From: Jiri Pirko @ 2023-06-21 13:14 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Saeed Mahameed, Saeed Mahameed, David S. Miller, Paolo Abeni,
	Eric Dumazet, netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

Thu, Jun 15, 2023 at 09:33:25PM CEST, kuba@kernel.org wrote:
>On Thu, 15 Jun 2023 19:37:23 +0200 Jiri Pirko wrote:
>> Thu, Jun 15, 2023 at 06:37:01PM CEST, kuba@kernel.org wrote:
>> >On Thu, 15 Jun 2023 12:51:09 +0200 Jiri Pirko wrote:  
>> >> The problem is scalability. SFs could be activated in parallel, but the
>> >> cmd that is doing that holds devlink instance lock. That serializes it.
>> >> So we need to either:
>> >> 1) change the devlink locking to be able to execute some of the cmds in
>> >>    parallel and leave the activation sync
>> >> 2) change the activation to be async and work with notifications
>> >> 
>> >> I like 2) better, as the 1) maze we just got out of recently :)
>> >> WDYT?  
>> >
>> >I guess we don't need to wait for the full activation. Is the port
>> >creation also async, then, or just the SF devlink instance creation?  
>> 
>> I'm not sure I follow :/
>> The activation is when the SF auxiliary device is created. The driver then
>> probes the SF auxiliary device and instantiates everything, SF devlink,
>> SF netdev, etc.
>
>Sorry, maybe let's look at an example:
>
>$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
>
>needs to print / return the handle of the created port.

Yep, that is what happens now.


>
>$ devlink port function set pci/0000:08:00.0/32768 \
>               hw_addr 00:00:00:00:00:11 state active
>
>needs to print / return the handle of the devlink instance

Right, that makes sense. I will look into adding that. It's a bit
tricky, as the instance might actually appear on a different host, then,
there is nothing to return here.


>
>> We need wait/notification for 2 reasons
>> 1) to get the auxiliary device name for the activated
>>    SF. It is needed for convenience of the orchestration tools.
>> 2) to get the result of the activation (success/fail)
>>    It is also needed for convenience of the orchestration tools.
>
>Are you saying the activation already waits for the devlink instance to
>be spawned? If so that's great, all we need to do is for the:

Yep.


>
>$ devlink port function set pci/0000:08:00.0/32768 \
>               hw_addr 00:00:00:00:00:11 state active
>
>to either return sufficient info for the orchestration to know what the
>resulting SF / SF devlink instance is. Most likely indirectly by adding
>that info to the port so that the PORT_NEW notification carries it.

Yeah, we need to add the same info to the GET cmd.


>
>Did I confuse things even more?

No, no confusion. But, the problem with this is that devlink port function set
active blocks until the activation is done holding the devlink instance
lock. That prevents other ports from being activated in parallel. From
driver/FW/HW perspective, we can do that.

So the question is, how to allow this parallelism?



>
>As a reminder what sparked this convo is that user specifies "sfnum 11"
>in the example, and the sf device gets called "sf.1".

Yeah, will look into that as well.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next 14/15] net/mlx5: Light probe local SFs
  2023-06-21 13:14                     ` Jiri Pirko
@ 2023-06-21 18:23                       ` Jakub Kicinski
  2023-06-22  6:42                         ` Jiri Pirko
  2023-06-22  6:38                       ` Jiri Pirko
  1 sibling, 1 reply; 41+ messages in thread
From: Jakub Kicinski @ 2023-06-21 18:23 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Saeed Mahameed, Saeed Mahameed, David S. Miller, Paolo Abeni,
	Eric Dumazet, netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

On Wed, 21 Jun 2023 15:14:35 +0200 Jiri Pirko wrote:
>> Did I confuse things even more?  
> 
> No, no confusion. But, the problem with this is that devlink port function set
> active blocks until the activation is done holding the devlink instance
> lock. That prevents other ports from being activated in parallel. From
> driver/FW/HW perspective, we can do that.
> 
> So the question is, how to allow this parallelism?

You seem to be concerned about parallelism, maybe you can share some
details / data / calculations? I don't think that we need to hold 
the instance lock just to get the notification but I'd strongly prefer
not to complicate things until problem actually exists.

The recent problems in the rtnl-lock-less flower implementation made me
very cautious about complicating the stack because someone's FW is slow.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next 14/15] net/mlx5: Light probe local SFs
  2023-06-21 13:14                     ` Jiri Pirko
  2023-06-21 18:23                       ` Jakub Kicinski
@ 2023-06-22  6:38                       ` Jiri Pirko
  2023-06-22 16:35                         ` Jakub Kicinski
  1 sibling, 1 reply; 41+ messages in thread
From: Jiri Pirko @ 2023-06-22  6:38 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Saeed Mahameed, Saeed Mahameed, David S. Miller, Paolo Abeni,
	Eric Dumazet, netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

Wed, Jun 21, 2023 at 03:14:35PM CEST, jiri@resnulli.us wrote:
>Thu, Jun 15, 2023 at 09:33:25PM CEST, kuba@kernel.org wrote:
>>On Thu, 15 Jun 2023 19:37:23 +0200 Jiri Pirko wrote:
>>> Thu, Jun 15, 2023 at 06:37:01PM CEST, kuba@kernel.org wrote:
>>> >On Thu, 15 Jun 2023 12:51:09 +0200 Jiri Pirko wrote:  

[...]

>>As a reminder what sparked this convo is that user specifies "sfnum 11"
>>in the example, and the sf device gets called "sf.1".
>
>Yeah, will look into that as well.

I checked, the misalignment between sfnum and auxdev index.
The problem is that the index space of sfnum is per-devlink instance,
however the index space of auxdev is per module name.
So if you have one devlink instance managing eswitch, in theory we can
map sfnum to auxdev id 1:1. But if you plug-in another physical nic,
second devlink instance managing eswitch appears, then we have an
overlap. I don't see any way out of this, do you?

But, I believe if we add a proper reference between devlink sf port and
the actual sf devlink instace, that would be enough.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next 14/15] net/mlx5: Light probe local SFs
  2023-06-21 18:23                       ` Jakub Kicinski
@ 2023-06-22  6:42                         ` Jiri Pirko
  0 siblings, 0 replies; 41+ messages in thread
From: Jiri Pirko @ 2023-06-22  6:42 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Saeed Mahameed, Saeed Mahameed, David S. Miller, Paolo Abeni,
	Eric Dumazet, netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

Wed, Jun 21, 2023 at 08:23:53PM CEST, kuba@kernel.org wrote:
>On Wed, 21 Jun 2023 15:14:35 +0200 Jiri Pirko wrote:
>>> Did I confuse things even more?  
>> 
>> No, no confusion. But, the problem with this is that devlink port function set
>> active blocks until the activation is done holding the devlink instance
>> lock. That prevents other ports from being activated in parallel. From
>> driver/FW/HW perspective, we can do that.
>> 
>> So the question is, how to allow this parallelism?
>
>You seem to be concerned about parallelism, maybe you can share some
>details / data / calculations? I don't think that we need to hold 
>the instance lock just to get the notification but I'd strongly prefer
>not to complicate things until problem actually exists.
>
>The recent problems in the rtnl-lock-less flower implementation made me
>very cautious about complicating the stack because someone's FW is slow.

I agree 100%. That is my stand as well. Lock less devlink commands
are a simple no-go from my perspective. We got out of that mess only
recently.

I just checked mlx5 code, the SF activation does not block. It just
triggers activation of SF in FW and returns. Does not wait for
completion.

Let me figure out the devlink_port<->sf_devlink linkage and I believe
that would be enough for all needed usecases.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next 14/15] net/mlx5: Light probe local SFs
  2023-06-22  6:38                       ` Jiri Pirko
@ 2023-06-22 16:35                         ` Jakub Kicinski
  2023-06-23  9:27                           ` Jiri Pirko
  0 siblings, 1 reply; 41+ messages in thread
From: Jakub Kicinski @ 2023-06-22 16:35 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Saeed Mahameed, Saeed Mahameed, David S. Miller, Paolo Abeni,
	Eric Dumazet, netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

On Thu, 22 Jun 2023 08:38:05 +0200 Jiri Pirko wrote:
> I checked, the misalignment between sfnum and auxdev index.
> The problem is that the index space of sfnum is per-devlink instance,
> however the index space of auxdev is per module name.
> So if you have one devlink instance managing eswitch, in theory we can
> map sfnum to auxdev id 1:1. But if you plug-in another physical nic,
> second devlink instance managing eswitch appears, then we have an
> overlap. I don't see any way out of this, do you?
> 
> But, I believe if we add a proper reference between devlink sf port and
> the actual sf devlink instace, that would be enough.

SG. For the IPU case when spawning from within the IPU can we still
figure out what the auxdev id is going to be? If not maybe we should
add some form of UUID on both the port and the sf devlink instance?
Maybe there's already some UUID in the vfio world we can co-opt?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next 14/15] net/mlx5: Light probe local SFs
  2023-06-22 16:35                         ` Jakub Kicinski
@ 2023-06-23  9:27                           ` Jiri Pirko
  2023-06-23 15:21                             ` Jakub Kicinski
  0 siblings, 1 reply; 41+ messages in thread
From: Jiri Pirko @ 2023-06-23  9:27 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Saeed Mahameed, Saeed Mahameed, David S. Miller, Paolo Abeni,
	Eric Dumazet, netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

Thu, Jun 22, 2023 at 06:35:23PM CEST, kuba@kernel.org wrote:
>On Thu, 22 Jun 2023 08:38:05 +0200 Jiri Pirko wrote:
>> I checked, the misalignment between sfnum and auxdev index.
>> The problem is that the index space of sfnum is per-devlink instance,
>> however the index space of auxdev is per module name.
>> So if you have one devlink instance managing eswitch, in theory we can
>> map sfnum to auxdev id 1:1. But if you plug-in another physical nic,
>> second devlink instance managing eswitch appears, then we have an
>> overlap. I don't see any way out of this, do you?
>> 
>> But, I believe if we add a proper reference between devlink sf port and
>> the actual sf devlink instace, that would be enough.
>
>SG. For the IPU case when spawning from within the IPU can we still
>figure out what the auxdev id is going to be? If not maybe we should

Yeah, the driver is assigning the auxdev id. I'm now trying to figure
out how to pass that to devlink code/user nicely. The devlink instance
for the SF does not exist yet, but we know what the handle is going to
be. Perhaps some sort of place holder instance would do. IDK.


>add some form of UUID on both the port and the sf devlink instance?

What about the MAC?

$ sudo devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 102
pci/0000:08:00.0/32769: type eth netdev eth8 flavour pcisf controller 0 pfnum 0 sfnum 102 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached
$ sudo devlink port function set pci/0000:08:00.0/32769 hw_addr AA:22:33:44:55:66
$ sudo devlink port function set pci/0000:08:00.0/32769 state active
$ ip link show eth9
15: eth9: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether aa:22:33:44:55:66 brd ff:ff:ff:ff:ff:ff

There are 2 issues with this:
1) If the hw_addr stays zeroed for activation, random MAC is generated
2) On the SF side, the MAC is only seen for netdev. That is problematic
   for SFs without netdev. However, I don't see why we cannot add
   devlink port attr to expose hw_addr on the SF.


>Maybe there's already some UUID in the vfio world we can co-opt?

Let me check that out.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next 14/15] net/mlx5: Light probe local SFs
  2023-06-23  9:27                           ` Jiri Pirko
@ 2023-06-23 15:21                             ` Jakub Kicinski
  2023-06-24  9:33                               ` Jiri Pirko
  0 siblings, 1 reply; 41+ messages in thread
From: Jakub Kicinski @ 2023-06-23 15:21 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Saeed Mahameed, Saeed Mahameed, David S. Miller, Paolo Abeni,
	Eric Dumazet, netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

On Fri, 23 Jun 2023 11:27:10 +0200 Jiri Pirko wrote:
> Thu, Jun 22, 2023 at 06:35:23PM CEST, kuba@kernel.org wrote:
> >SG. For the IPU case when spawning from within the IPU can we still
> >figure out what the auxdev id is going to be? If not maybe we should  
> 
> Yeah, the driver is assigning the auxdev id. I'm now trying to figure
> out how to pass that to devlink code/user nicely. The devlink instance
> for the SF does not exist yet, but we know what the handle is going to
> be. Perhaps some sort of place holder instance would do. IDK.
> 
> >add some form of UUID on both the port and the sf devlink instance?  
> 
> What about the MAC?
> 
> $ sudo devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 102
> pci/0000:08:00.0/32769: type eth netdev eth8 flavour pcisf controller 0 pfnum 0 sfnum 102 splittable false
>   function:
>     hw_addr 00:00:00:00:00:00 state inactive opstate detached
> $ sudo devlink port function set pci/0000:08:00.0/32769 hw_addr AA:22:33:44:55:66
> $ sudo devlink port function set pci/0000:08:00.0/32769 state active
> $ ip link show eth9
> 15: eth9: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
>     link/ether aa:22:33:44:55:66 brd ff:ff:ff:ff:ff:ff
> 
> There are 2 issues with this:
> 1) If the hw_addr stays zeroed for activation, random MAC is generated
> 2) On the SF side, the MAC is only seen for netdev. That is problematic
>    for SFs without netdev. However, I don't see why we cannot add
>    devlink port attr to expose hw_addr on the SF.

Yeah, the fact that mac add of zero has special meaning could be
problematic. Other vendors may get "inspired" by the legacy SR-IOV
semantics of MAC addr of zero == user can decide, or whatnot.
The value on the port is the admin-set MAC, not the current MAC
of the SF after all, right? Probably best to find another way...

> >Maybe there's already some UUID in the vfio world we can co-opt?  
> 
> Let me check that out.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next 14/15] net/mlx5: Light probe local SFs
  2023-06-23 15:21                             ` Jakub Kicinski
@ 2023-06-24  9:33                               ` Jiri Pirko
  2023-06-24 20:47                                 ` Jakub Kicinski
  0 siblings, 1 reply; 41+ messages in thread
From: Jiri Pirko @ 2023-06-24  9:33 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Saeed Mahameed, Saeed Mahameed, David S. Miller, Paolo Abeni,
	Eric Dumazet, netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

Fri, Jun 23, 2023 at 05:21:08PM CEST, kuba@kernel.org wrote:
>On Fri, 23 Jun 2023 11:27:10 +0200 Jiri Pirko wrote:
>> Thu, Jun 22, 2023 at 06:35:23PM CEST, kuba@kernel.org wrote:
>> >SG. For the IPU case when spawning from within the IPU can we still
>> >figure out what the auxdev id is going to be? If not maybe we should  
>> 
>> Yeah, the driver is assigning the auxdev id. I'm now trying to figure
>> out how to pass that to devlink code/user nicely. The devlink instance
>> for the SF does not exist yet, but we know what the handle is going to
>> be. Perhaps some sort of place holder instance would do. IDK.
>> 
>> >add some form of UUID on both the port and the sf devlink instance?  
>> 
>> What about the MAC?
>> 
>> $ sudo devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 102
>> pci/0000:08:00.0/32769: type eth netdev eth8 flavour pcisf controller 0 pfnum 0 sfnum 102 splittable false
>>   function:
>>     hw_addr 00:00:00:00:00:00 state inactive opstate detached
>> $ sudo devlink port function set pci/0000:08:00.0/32769 hw_addr AA:22:33:44:55:66
>> $ sudo devlink port function set pci/0000:08:00.0/32769 state active
>> $ ip link show eth9
>> 15: eth9: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
>>     link/ether aa:22:33:44:55:66 brd ff:ff:ff:ff:ff:ff
>> 
>> There are 2 issues with this:
>> 1) If the hw_addr stays zeroed for activation, random MAC is generated
>> 2) On the SF side, the MAC is only seen for netdev. That is problematic
>>    for SFs without netdev. However, I don't see why we cannot add
>>    devlink port attr to expose hw_addr on the SF.
>
>Yeah, the fact that mac add of zero has special meaning could be

True. But I believe we can sanitize this in devlink core, not allowing
to activate zeroed mac. Instead of radom generating of a mac on the SF
side, we random generate it on the eswitch port function side.

I think this can work.


>problematic. Other vendors may get "inspired" by the legacy SR-IOV
>semantics of MAC addr of zero == user can decide, or whatnot.
>The value on the port is the admin-set MAC, not the current MAC

True, that is why I suggested to expose hw_addr attr on devlink port of
the SF instance, where this would be fixed no matter what the user does
on the actual netdevice.


>of the SF after all, right? Probably best to find another way...


Well, yeah. The mac/hw_addr is quite convenient. It's there and
I believe that any device could work with that. Having some kind of
"extra cookie" would require to implement that in FW, which makes things
more complicated.


>
>> >Maybe there's already some UUID in the vfio world we can co-opt?  
>> 
>> Let me check that out.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next 14/15] net/mlx5: Light probe local SFs
  2023-06-24  9:33                               ` Jiri Pirko
@ 2023-06-24 20:47                                 ` Jakub Kicinski
  2023-06-27 10:12                                   ` Jiri Pirko
  0 siblings, 1 reply; 41+ messages in thread
From: Jakub Kicinski @ 2023-06-24 20:47 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Saeed Mahameed, Saeed Mahameed, David S. Miller, Paolo Abeni,
	Eric Dumazet, netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

On Sat, 24 Jun 2023 11:33:20 +0200 Jiri Pirko wrote:
> >of the SF after all, right? Probably best to find another way...  
> 
> Well, yeah. The mac/hw_addr is quite convenient. It's there and
> I believe that any device could work with that. Having some kind of
> "extra cookie" would require to implement that in FW, which makes things
> more complicated.

"Let's piggyback on something else in the uAPI because I don't want 
to extend FW" is not an argument which can be taken seriously.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next 14/15] net/mlx5: Light probe local SFs
  2023-06-24 20:47                                 ` Jakub Kicinski
@ 2023-06-27 10:12                                   ` Jiri Pirko
  2023-06-27 15:24                                     ` Jakub Kicinski
  0 siblings, 1 reply; 41+ messages in thread
From: Jiri Pirko @ 2023-06-27 10:12 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Saeed Mahameed, Saeed Mahameed, David S. Miller, Paolo Abeni,
	Eric Dumazet, netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

Sat, Jun 24, 2023 at 10:47:03PM CEST, kuba@kernel.org wrote:
>On Sat, 24 Jun 2023 11:33:20 +0200 Jiri Pirko wrote:
>> >of the SF after all, right? Probably best to find another way...  
>> 
>> Well, yeah. The mac/hw_addr is quite convenient. It's there and
>> I believe that any device could work with that. Having some kind of
>> "extra cookie" would require to implement that in FW, which makes things
>> more complicated.
>
>"Let's piggyback on something else in the uAPI because I don't want 
>to extend FW" is not an argument which can be taken seriously.

It's not about what I want or not. It's about what can realistically
happen (not talking about mlx5 specifically).

But what is a difference between using hw_addr and some other uuid.
cmdline wise:

option 1 - JUST HW_ADDR:

$ sudo devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 103
pci/0000:08:00.0/32770: type eth netdev eth10 flavour pcisf controller 0 pfnum 0 sfnum 103 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ sudo devlink port function set pci/0000:08:00.0/32770 hw_addr AA:22:33:44:55:66

$ sudo devlink port show pci/0000:08:00.0/32770
pci/0000:08:00.0/32770: type eth netdev eth10 flavour pcisf controller 0 pfnum 0 sfnum 103 splittable false
  function:
    hw_addr aa:22:33:44:55:66 state inactive opstate detached

$ sudo devlink port function set pci/0000:08:00.0/32770 state active
pci/0000:08:00.0/32770: type eth netdev eth10 flavour pcisf controller 0 pfnum 0 sfnum 103 splittable false
  function:
    hw_addr aa:22:33:44:55:66 state active opstate detached devlink_handle auxiliary/mlx5_core.eth.5
# This is new, currently activation does not give feedback,
# devlink_handle attr is new here

$ sudo devlink port show pci/0000:08:00.0/32770
pci/0000:08:00.0/32770: type eth netdev eth10 flavour pcisf controller 0 pfnum 0 sfnum 103 splittable false
  function:
    hw_addr aa:22:33:44:55:66 state active opstate attached devlink_handle auxiliary/mlx5_core.eth.5
# devlink_handle attr is new here

$ sudo devlink dev show auxiliary/mlx5_core.eth.5
auxiliary/mlx5_core.eth.5: hw_addr aa:22:33:44:55:66
# hw_addr attr is new here

hw_addr value is passed through FW here. This value is currently used
for netdevice mac. So it is just a matter of exposing for auxdev
devlink/devlink_port (not 100% sure which)


option 2 - ADDED UUID:

$ sudo devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 103
pci/0000:08:00.0/32770: type eth netdev eth10 flavour pcisf controller 0 pfnum 0 sfnum 103 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ sudo devlink port function set pci/0000:08:00.0/32770 hw_addr AA:22:33:44:55:66 uuid SOMETHINGREALLYUNIQUE

$ sudo devlink port show pci/0000:08:00.0/32770
pci/0000:08:00.0/32770: type eth netdev eth10 flavour pcisf controller 0 pfnum 0 sfnum 103 splittable false
  function:
    hw_addr aa:22:33:44:55:66 uuid SOMETHINGREALLYUNIQUE state inactive opstate detached
# uuid attr is new here

$ sudo devlink port function set pci/0000:08:00.0/32770 state active
pci/0000:08:00.0/32770: type eth netdev eth10 flavour pcisf controller 0 pfnum 0 sfnum 103 splittable false
  function:
    hw_addr aa:22:33:44:55:66 uuid SOMETHINGREALLYUNIQUE state active opstate detached devlink_handle auxiliary/mlx5_core.eth.5
# This is new, currently activation does not give feedback,
# devlink_handle attr is new here, uuid attr is new here

$ sudo devlink port show pci/0000:08:00.0/32770
pci/0000:08:00.0/32770: type eth netdev eth10 flavour pcisf controller 0 pfnum 0 sfnum 103 splittable false
  function:
    hw_addr aa:22:33:44:55:66 uuid SOMETHINGREALLYUNIQUE state active opstate attached devlink_handle auxiliary/mlx5_core.eth.5
# devlink_handle attr is new here, uuid attr is new here

$ sudo devlink dev show auxiliary/mlx5_core.eth.5
auxiliary/mlx5_core.eth.5: uuid SOMETHINGREALLYUNIQUE
# uuid attr is new here

uuid value is passed through FW here.


The uuid kind of values is nothing new in netdev. We have:
IFLA_PHYS_PORT_ID
IFLA_PHYS_SWITCH_ID
In most of the cases (if not all), these are in drivers filled-up with
MAC. So I'm quite positive that the drivers would implement the auxdev
uuid as a MAC as well.

Do you like option 2 better than option 1? Is there option 3?

Thanks!


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next 14/15] net/mlx5: Light probe local SFs
  2023-06-27 10:12                                   ` Jiri Pirko
@ 2023-06-27 15:24                                     ` Jakub Kicinski
  2023-06-27 17:16                                       ` Jiri Pirko
  0 siblings, 1 reply; 41+ messages in thread
From: Jakub Kicinski @ 2023-06-27 15:24 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Saeed Mahameed, Saeed Mahameed, David S. Miller, Paolo Abeni,
	Eric Dumazet, netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

On Tue, 27 Jun 2023 12:12:10 +0200 Jiri Pirko wrote:
> $ sudo devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 103
> pci/0000:08:00.0/32770: type eth netdev eth10 flavour pcisf controller 0 pfnum 0 sfnum 103 splittable false
>   function:
>     hw_addr 00:00:00:00:00:00 state inactive opstate detached
> 
> $ sudo devlink port function set pci/0000:08:00.0/32770 hw_addr AA:22:33:44:55:66 uuid SOMETHINGREALLYUNIQUE

Why does the user have to set the uuid? I was expecting it'd pop up 
on the port automatically, generated by the kernel, as a read-only
attribute.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next 14/15] net/mlx5: Light probe local SFs
  2023-06-27 15:24                                     ` Jakub Kicinski
@ 2023-06-27 17:16                                       ` Jiri Pirko
  2023-06-27 17:35                                         ` Jakub Kicinski
  0 siblings, 1 reply; 41+ messages in thread
From: Jiri Pirko @ 2023-06-27 17:16 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Saeed Mahameed, Saeed Mahameed, David S. Miller, Paolo Abeni,
	Eric Dumazet, netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

Tue, Jun 27, 2023 at 05:24:29PM CEST, kuba@kernel.org wrote:
>On Tue, 27 Jun 2023 12:12:10 +0200 Jiri Pirko wrote:
>> $ sudo devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 103
>> pci/0000:08:00.0/32770: type eth netdev eth10 flavour pcisf controller 0 pfnum 0 sfnum 103 splittable false
>>   function:
>>     hw_addr 00:00:00:00:00:00 state inactive opstate detached
>> 
>> $ sudo devlink port function set pci/0000:08:00.0/32770 hw_addr AA:22:33:44:55:66 uuid SOMETHINGREALLYUNIQUE
>
>Why does the user have to set the uuid? I was expecting it'd pop up 
>on the port automatically, generated by the kernel, as a read-only
>attribute.

Hmm, how that could be generated by kernel if it should be really
unique? Consider an example scenario where you have 2 DPUS (smartnic
with CPUs) plugged into a single host.

1) On DPU 1 you do:
   $ sudo devlink port add pci/0000:08:00.0 flavour pcisf pfnum 1 sfnum 103
   pci/0000:08:00.0/32770: type eth netdev eth10 flavour pcisf controller 1 pfnum 0 sfnum 103 splittable false
       function:
       hw_addr 00:00:00:00:00:00 state inactive opstate detached uuid XXX

2) On DPU 2 you do:
   $ sudo devlink port add pci/0000:08:00.0 flavour pcisf pfnum 1 sfnum 103
   pci/0000:08:00.0/32770: type eth netdev eth10 flavour pcisf controller 1 pfnum 0 sfnum 103 splittable false
       function:
       hw_addr 00:00:00:00:00:00 state inactive opstate detached uuid XXX

There is no way to sync between kernel running in the DPUs.
Both SFs in this example would be externaly created on the host. The
host will see 2 devices with the same uuid XXX, collision.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next 14/15] net/mlx5: Light probe local SFs
  2023-06-27 17:16                                       ` Jiri Pirko
@ 2023-06-27 17:35                                         ` Jakub Kicinski
  0 siblings, 0 replies; 41+ messages in thread
From: Jakub Kicinski @ 2023-06-27 17:35 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Saeed Mahameed, Saeed Mahameed, David S. Miller, Paolo Abeni,
	Eric Dumazet, netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

On Tue, 27 Jun 2023 19:16:42 +0200 Jiri Pirko wrote:
> Hmm, how that could be generated by kernel if it should be really
> unique? Consider an example scenario where you have 2 DPUS (smartnic
> with CPUs) plugged into a single host.
> 
> 1) On DPU 1 you do:
>    $ sudo devlink port add pci/0000:08:00.0 flavour pcisf pfnum 1 sfnum 103
>    pci/0000:08:00.0/32770: type eth netdev eth10 flavour pcisf controller 1 pfnum 0 sfnum 103 splittable false
>        function:
>        hw_addr 00:00:00:00:00:00 state inactive opstate detached uuid XXX
> 
> 2) On DPU 2 you do:
>    $ sudo devlink port add pci/0000:08:00.0 flavour pcisf pfnum 1 sfnum 103
>    pci/0000:08:00.0/32770: type eth netdev eth10 flavour pcisf controller 1 pfnum 0 sfnum 103 splittable false
>        function:
>        hw_addr 00:00:00:00:00:00 state inactive opstate detached uuid XXX
> 
> There is no way to sync between kernel running in the DPUs.
> Both SFs in this example would be externaly created on the host. The
> host will see 2 devices with the same uuid XXX, collision.

https://en.wikipedia.org/wiki/Universally_unique_identifier

tl;dr the whole point of UUIDs is that collisions do not happen

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2023-06-27 17:35 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-10  1:42 [pull request][net-next 00/15] mlx5 updates 2023-06-09 Saeed Mahameed
2023-06-10  1:42 ` [net-next 01/15] net/mlx5: Simplify unload all rep code Saeed Mahameed
2023-06-12 11:00   ` patchwork-bot+netdevbpf
2023-06-10  1:42 ` [net-next 02/15] net/mlx5: mlx5_ifc updates for embedded CPU SRIOV Saeed Mahameed
2023-06-10  1:42 ` [net-next 03/15] net/mlx5: Enable devlink port for embedded cpu VF vports Saeed Mahameed
2023-06-10  1:42 ` [net-next 04/15] net/mlx5: Update vport caps query/set for EC VFs Saeed Mahameed
2023-06-10  1:42 ` [net-next 05/15] net/mlx5: Add management of EC VF vports Saeed Mahameed
2023-06-10  1:42 ` [net-next 06/15] net/mlx5: Add/remove peer miss rules for EC VFs Saeed Mahameed
2023-06-10  1:42 ` [net-next 07/15] net/mlx5: Add new page type for EC VF pages Saeed Mahameed
2023-06-10  1:42 ` [net-next 08/15] net/mlx5: Use correct vport when restoring GUIDs Saeed Mahameed
2023-06-10  1:42 ` [net-next 09/15] net/mlx5: Query correct caps for min msix vectors Saeed Mahameed
2023-06-10  1:42 ` [net-next 10/15] net/mlx5: Update SRIOV enable/disable to handle EC/VFs Saeed Mahameed
2023-06-10  1:42 ` [net-next 11/15] net/mlx5: Set max number of embedded CPU VFs Saeed Mahameed
2023-06-10  1:42 ` [net-next 12/15] net/mlx5: Split function_setup() to enable and open functions Saeed Mahameed
2023-06-10  1:42 ` [net-next 13/15] net/mlx5: Move esw multiport devlink param to eswitch code Saeed Mahameed
2023-06-10  1:42 ` [net-next 14/15] net/mlx5: Light probe local SFs Saeed Mahameed
2023-06-10  7:01   ` Jakub Kicinski
2023-06-11  4:15     ` Saeed Mahameed
2023-06-11  5:10       ` Samudrala, Sridhar
2023-06-13 23:41         ` Saeed Mahameed
2023-06-12 17:51       ` Jakub Kicinski
2023-06-13 23:32         ` Saeed Mahameed
2023-06-14  2:05           ` Jakub Kicinski
2023-06-15 10:51             ` Jiri Pirko
2023-06-15 16:37               ` Jakub Kicinski
2023-06-15 17:37                 ` Jiri Pirko
2023-06-15 19:33                   ` Jakub Kicinski
2023-06-21 13:14                     ` Jiri Pirko
2023-06-21 18:23                       ` Jakub Kicinski
2023-06-22  6:42                         ` Jiri Pirko
2023-06-22  6:38                       ` Jiri Pirko
2023-06-22 16:35                         ` Jakub Kicinski
2023-06-23  9:27                           ` Jiri Pirko
2023-06-23 15:21                             ` Jakub Kicinski
2023-06-24  9:33                               ` Jiri Pirko
2023-06-24 20:47                                 ` Jakub Kicinski
2023-06-27 10:12                                   ` Jiri Pirko
2023-06-27 15:24                                     ` Jakub Kicinski
2023-06-27 17:16                                       ` Jiri Pirko
2023-06-27 17:35                                         ` Jakub Kicinski
2023-06-10  1:42 ` [net-next 15/15] net/mlx5e: Remove a useless function call Saeed Mahameed

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).