All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/4] net/mlx5: Introduce devlink param to disable SF aux dev probe
@ 2022-02-08 17:14 Moshe Shemesh
  2022-02-08 17:14 ` [PATCH net-next 1/4] net/mlx5: Split function_setup() to enable and open functions Moshe Shemesh
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Moshe Shemesh @ 2022-02-08 17:14 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: Jiri Pirko, Saeed Mahameed, netdev, linux-kernel, Moshe Shemesh

Currently SF device has all the aux devices enabled by default. Once
loaded, user who desire to disable some of them need to perform devlink
reload. This operation helps to reclaim memory that was not supposed
to be used, but the lost time in disabling and enabling again cannot be
recovered by this approach[1].
Therefore, introduce a new devlink generic parameter for PCI PF which
controls the creation of SFs. This parameter sets a flag in order to
disable all auxiliary devices of the SF. i.e.: All children auxiliary
devices of SF for RDMA, eth and vdpa-net are disabled by default and
hence no device initialization is done at probe stage.

$ devlink dev param set pci/0000:08:00.0 name enable_sfs_aux_devs \
              value false cmode runtime

Create SF:
$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
$ devlink port function set pci/0000:08:00.0/32768 \
               hw_addr 00:00:00:00:00:11 state active

Now depending on the use case, the user can enable specific auxiliary
device(s). For example:

$ devlink dev param set auxiliary/mlx5_core.sf.1 \
              name enable_vnet value true cmde driverinit

Afterwards, user needs to reload the SF in order for the SF to come up
with the specific configuration:

$ devlink dev reload auxiliary/mlx5_core.sf.1

[1]
mlx5 devlink reload is taking about 2 seconds, which means that with
256 SFs we are speaking about ~8.5 minutes.

Shay Drory (4):
  net/mlx5: Split function_setup() to enable and open functions
  net/mlx5: Delete redundant default assignment of runtime devlink
    params
  devlink: Add new "enable_sfs_aux_devs" generic device param
  net/mlx5: Support enable_sfs_aux_devs devlink param

 .../networking/devlink/devlink-params.rst     |   5 +
 drivers/net/ethernet/mellanox/mlx5/core/dev.c |  16 ++
 .../net/ethernet/mellanox/mlx5/core/devlink.c |  51 ++---
 .../net/ethernet/mellanox/mlx5/core/eswitch.c |   3 +
 .../net/ethernet/mellanox/mlx5/core/health.c  |   5 +-
 .../net/ethernet/mellanox/mlx5/core/main.c    | 183 +++++++++++++++---
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |   6 +
 .../mellanox/mlx5/core/sf/dev/driver.c        |  13 +-
 .../ethernet/mellanox/mlx5/core/sf/devlink.c  |  40 ++++
 .../ethernet/mellanox/mlx5/core/sf/hw_table.c |   7 +
 .../net/ethernet/mellanox/mlx5/core/sf/priv.h |   2 +
 include/linux/mlx5/driver.h                   |   1 +
 include/net/devlink.h                         |   4 +
 net/core/devlink.c                            |   5 +
 14 files changed, 284 insertions(+), 57 deletions(-)

-- 
2.26.3


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH net-next 1/4] net/mlx5: Split function_setup() to enable and open functions
  2022-02-08 17:14 [PATCH net-next 0/4] net/mlx5: Introduce devlink param to disable SF aux dev probe Moshe Shemesh
@ 2022-02-08 17:14 ` Moshe Shemesh
  2022-02-08 17:14 ` [PATCH net-next 2/4] net/mlx5: Delete redundant default assignment of runtime devlink params Moshe Shemesh
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Moshe Shemesh @ 2022-02-08 17:14 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: Jiri Pirko, Saeed Mahameed, netdev, linux-kernel, Shay Drory

From: Shay Drory <shayd@nvidia.com>

mlx5_cmd_init_hca() is taking ~0.2 seconds. In case of a user who
desire to disable some of the SF aux devices, and with large scale-1K
SFs for example, this user will waste more than 3 minutes on
mlx5_cmd_init_hca() which isn't needed at this stage.

Therefore, split function_setup() in order to avoid executing
mlx5_cmd_init_hca() in case user disables SFs aux dev via the devlink
param which will be introduced in downstream patch.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/main.c    | 81 +++++++++++++------
 1 file changed, 56 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 2c774f367199..73d2cec01ead 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1006,7 +1006,7 @@ static void mlx5_cleanup_once(struct mlx5_core_dev *dev)
 	mlx5_devcom_unregister_device(dev->priv.devcom);
 }
 
-static int mlx5_function_setup(struct mlx5_core_dev *dev, bool boot)
+static int mlx5_function_enable(struct mlx5_core_dev *dev)
 {
 	int err;
 
@@ -1070,28 +1070,53 @@ static int mlx5_function_setup(struct mlx5_core_dev *dev, bool boot)
 		goto reclaim_boot_pages;
 	}
 
+	return 0;
+
+reclaim_boot_pages:
+	mlx5_reclaim_startup_pages(dev);
+err_disable_hca:
+	mlx5_core_disable_hca(dev, 0);
+err_cmd_cleanup:
+	mlx5_cmd_set_state(dev, MLX5_CMDIF_STATE_DOWN);
+	mlx5_cmd_cleanup(dev);
+
+	return err;
+}
+
+static void mlx5_function_disable(struct mlx5_core_dev *dev)
+{
+	mlx5_reclaim_startup_pages(dev);
+	mlx5_core_disable_hca(dev, 0);
+	mlx5_cmd_set_state(dev, MLX5_CMDIF_STATE_DOWN);
+	mlx5_cmd_cleanup(dev);
+}
+
+static int mlx5_function_open(struct mlx5_core_dev *dev, bool boot)
+{
+	int err;
+
 	err = set_hca_ctrl(dev);
 	if (err) {
 		mlx5_core_err(dev, "set_hca_ctrl failed\n");
-		goto reclaim_boot_pages;
+		return err;
 	}
 
 	err = set_hca_cap(dev);
 	if (err) {
 		mlx5_core_err(dev, "set_hca_cap failed\n");
-		goto reclaim_boot_pages;
+		return err;
 	}
 
 	err = mlx5_satisfy_startup_pages(dev, 0);
 	if (err) {
 		mlx5_core_err(dev, "failed to allocate init pages\n");
-		goto reclaim_boot_pages;
+		return err;
 	}
 
 	err = mlx5_cmd_init_hca(dev, sw_owner_id);
 	if (err) {
 		mlx5_core_err(dev, "init hca failed\n");
-		goto reclaim_boot_pages;
+		return err;
 	}
 
 	mlx5_set_driver_version(dev);
@@ -1099,25 +1124,15 @@ static int mlx5_function_setup(struct mlx5_core_dev *dev, bool boot)
 	err = mlx5_query_hca_caps(dev);
 	if (err) {
 		mlx5_core_err(dev, "query hca failed\n");
-		goto reclaim_boot_pages;
+		return err;
 	}
 
 	mlx5_start_health_poll(dev);
 
 	return 0;
-
-reclaim_boot_pages:
-	mlx5_reclaim_startup_pages(dev);
-err_disable_hca:
-	mlx5_core_disable_hca(dev, 0);
-err_cmd_cleanup:
-	mlx5_cmd_set_state(dev, MLX5_CMDIF_STATE_DOWN);
-	mlx5_cmd_cleanup(dev);
-
-	return err;
 }
 
-static int mlx5_function_teardown(struct mlx5_core_dev *dev, bool boot)
+static int mlx5_function_close(struct mlx5_core_dev *dev, bool boot)
 {
 	int err;
 
@@ -1127,14 +1142,30 @@ static int mlx5_function_teardown(struct mlx5_core_dev *dev, bool boot)
 		mlx5_core_err(dev, "tear_down_hca failed, skip cleanup\n");
 		return err;
 	}
-	mlx5_reclaim_startup_pages(dev);
-	mlx5_core_disable_hca(dev, 0);
-	mlx5_cmd_set_state(dev, MLX5_CMDIF_STATE_DOWN);
-	mlx5_cmd_cleanup(dev);
 
 	return 0;
 }
 
+static int mlx5_function_setup(struct mlx5_core_dev *dev, bool boot)
+{
+	int err;
+
+	err = mlx5_function_enable(dev);
+	if (err)
+		return err;
+
+	err = mlx5_function_open(dev, boot);
+	if (err)
+		mlx5_function_disable(dev);
+	return err;
+}
+
+static void mlx5_function_teardown(struct mlx5_core_dev *dev, bool boot)
+{
+	if (!mlx5_function_close(dev, boot))
+		mlx5_function_disable(dev);
+}
+
 static int mlx5_load(struct mlx5_core_dev *dev)
 {
 	int err;
@@ -1290,7 +1321,7 @@ int mlx5_init_one(struct mlx5_core_dev *dev)
 
 	err = mlx5_function_setup(dev, true);
 	if (err)
-		goto err_function;
+		goto err_function_setup;
 
 	err = mlx5_init_once(dev);
 	if (err) {
@@ -1324,7 +1355,7 @@ int mlx5_init_one(struct mlx5_core_dev *dev)
 	mlx5_cleanup_once(dev);
 function_teardown:
 	mlx5_function_teardown(dev, true);
-err_function:
+err_function_setup:
 	dev->state = MLX5_DEVICE_STATE_INTERNAL_ERROR;
 	mutex_unlock(&dev->intf_state_mutex);
 	return err;
@@ -1366,7 +1397,7 @@ int mlx5_load_one(struct mlx5_core_dev *dev)
 
 	err = mlx5_function_setup(dev, false);
 	if (err)
-		goto err_function;
+		goto err_function_setup;
 
 	err = mlx5_load(dev);
 	if (err)
@@ -1386,7 +1417,7 @@ int mlx5_load_one(struct mlx5_core_dev *dev)
 	mlx5_unload(dev);
 err_load:
 	mlx5_function_teardown(dev, false);
-err_function:
+err_function_setup:
 	dev->state = MLX5_DEVICE_STATE_INTERNAL_ERROR;
 out:
 	mutex_unlock(&dev->intf_state_mutex);
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next 2/4] net/mlx5: Delete redundant default assignment of runtime devlink params
  2022-02-08 17:14 [PATCH net-next 0/4] net/mlx5: Introduce devlink param to disable SF aux dev probe Moshe Shemesh
  2022-02-08 17:14 ` [PATCH net-next 1/4] net/mlx5: Split function_setup() to enable and open functions Moshe Shemesh
@ 2022-02-08 17:14 ` Moshe Shemesh
  2022-02-08 17:14 ` [PATCH net-next 3/4] devlink: Add new "enable_sfs_aux_devs" generic device param Moshe Shemesh
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Moshe Shemesh @ 2022-02-08 17:14 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: Jiri Pirko, Saeed Mahameed, netdev, linux-kernel, Shay Drory

From: Shay Drory <shayd@nvidia.com>

Runtime devlink params always read their values from the get() callbacks.
Also, it is an error to set driverinit_value for params which don't
support driverinit cmode. Delete such assignments.

In addition, move the set of default matching mode inside eswitch code.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/devlink.c | 20 -------------------
 .../net/ethernet/mellanox/mlx5/core/eswitch.c |  3 +++
 2 files changed, 3 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index d1093bb2d436..e832a3f4c18a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -588,14 +588,6 @@ static void mlx5_devlink_set_params_init_values(struct devlink *devlink)
 	struct mlx5_core_dev *dev = devlink_priv(devlink);
 	union devlink_param_value value;
 
-	if (dev->priv.steering->mode == MLX5_FLOW_STEERING_MODE_DMFS)
-		strcpy(value.vstr, "dmfs");
-	else
-		strcpy(value.vstr, "smfs");
-	devlink_param_driverinit_value_set(devlink,
-					   MLX5_DEVLINK_PARAM_ID_FLOW_STEERING_MODE,
-					   value);
-
 	value.vbool = MLX5_CAP_GEN(dev, roce);
 	devlink_param_driverinit_value_set(devlink,
 					   DEVLINK_PARAM_GENERIC_ID_ENABLE_ROCE,
@@ -606,18 +598,6 @@ static void mlx5_devlink_set_params_init_values(struct devlink *devlink)
 	devlink_param_driverinit_value_set(devlink,
 					   MLX5_DEVLINK_PARAM_ID_ESW_LARGE_GROUP_NUM,
 					   value);
-
-	if (MLX5_ESWITCH_MANAGER(dev)) {
-		if (mlx5_esw_vport_match_metadata_supported(dev->priv.eswitch)) {
-			dev->priv.eswitch->flags |= MLX5_ESWITCH_VPORT_MATCH_METADATA;
-			value.vbool = true;
-		} else {
-			value.vbool = false;
-		}
-		devlink_param_driverinit_value_set(devlink,
-						   MLX5_DEVLINK_PARAM_ID_ESW_PORT_METADATA,
-						   value);
-	}
 #endif
 
 	value.vu32 = MLX5_COMP_EQ_SIZE;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 458ec0bca1b8..25f2d2717aaa 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1582,6 +1582,9 @@ int mlx5_eswitch_init(struct mlx5_core_dev *dev)
 		esw->offloads.encap = DEVLINK_ESWITCH_ENCAP_MODE_BASIC;
 	else
 		esw->offloads.encap = DEVLINK_ESWITCH_ENCAP_MODE_NONE;
+	if (MLX5_ESWITCH_MANAGER(dev) &&
+	    mlx5_esw_vport_match_metadata_supported(esw))
+		esw->flags |= MLX5_ESWITCH_VPORT_MATCH_METADATA;
 
 	dev->priv.eswitch = esw;
 	BLOCKING_INIT_NOTIFIER_HEAD(&esw->n_head);
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next 3/4] devlink: Add new "enable_sfs_aux_devs" generic device param
  2022-02-08 17:14 [PATCH net-next 0/4] net/mlx5: Introduce devlink param to disable SF aux dev probe Moshe Shemesh
  2022-02-08 17:14 ` [PATCH net-next 1/4] net/mlx5: Split function_setup() to enable and open functions Moshe Shemesh
  2022-02-08 17:14 ` [PATCH net-next 2/4] net/mlx5: Delete redundant default assignment of runtime devlink params Moshe Shemesh
@ 2022-02-08 17:14 ` Moshe Shemesh
  2022-02-08 17:14 ` [PATCH net-next 4/4] net/mlx5: Support enable_sfs_aux_devs devlink param Moshe Shemesh
  2022-02-09  5:23 ` [PATCH net-next 0/4] net/mlx5: Introduce devlink param to disable SF aux dev probe Jakub Kicinski
  4 siblings, 0 replies; 13+ messages in thread
From: Moshe Shemesh @ 2022-02-08 17:14 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: Jiri Pirko, Saeed Mahameed, netdev, linux-kernel, Shay Drory

From: Shay Drory <shayd@nvidia.com>

Add new device generic parameter to enable/disable creation of
Sub-Functions auxiliary devices of a certain Physical-Function.

User who wants to use specific SFs auxiliary devices, can disable
the creation of all SF auxiliary devices upon SF creation.
After the SF is created, the user can enable only the requested
auxiliary devices.

for example:

$ devlink dev param set pci/0000:08:00.0 \
              name enable_sfs_aux_devs value false cmode driverinit

Create SF:
$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
$ devlink port function set pci/0000:08:00.0/32768 \
               hw_addr 00:00:00:00:00:11 state active

Enable ETH auxiliary device:
$ devlink dev param set auxiliary/mlx5_core.sf.1 \
              name enable_eth value true cmode driverinit

$ devlink dev reload auxiliary/mlx5_core.sf.1

At this point the user have SF devlink instance with auxiliary device
for the Ethernet functionality only.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
---
 Documentation/networking/devlink/devlink-params.rst | 5 +++++
 include/net/devlink.h                               | 4 ++++
 net/core/devlink.c                                  | 5 +++++
 3 files changed, 14 insertions(+)

diff --git a/Documentation/networking/devlink/devlink-params.rst b/Documentation/networking/devlink/devlink-params.rst
index 4e01dc32bc08..aa0edb915f88 100644
--- a/Documentation/networking/devlink/devlink-params.rst
+++ b/Documentation/networking/devlink/devlink-params.rst
@@ -137,3 +137,8 @@ own name.
    * - ``event_eq_size``
      - u32
      - Control the size of asynchronous control events EQ.
+   * - ``enable_sfs_aux_devs``
+     - Boolean
+     - When enabled, the device driver will instantiate all auxiliary devices of
+       SFs of the devlink device. When clear, the SFs of the devlink device will
+       not instantiate any auxiliary devices.
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 8d5349d2fb68..8013252790bf 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -461,6 +461,7 @@ enum devlink_param_generic_id {
 	DEVLINK_PARAM_GENERIC_ID_ENABLE_IWARP,
 	DEVLINK_PARAM_GENERIC_ID_IO_EQ_SIZE,
 	DEVLINK_PARAM_GENERIC_ID_EVENT_EQ_SIZE,
+	DEVLINK_PARAM_GENERIC_ID_ENABLE_SFS_AUX_DEVS,
 
 	/* add new param generic ids above here*/
 	__DEVLINK_PARAM_GENERIC_ID_MAX,
@@ -519,6 +520,9 @@ enum devlink_param_generic_id {
 #define DEVLINK_PARAM_GENERIC_EVENT_EQ_SIZE_NAME "event_eq_size"
 #define DEVLINK_PARAM_GENERIC_EVENT_EQ_SIZE_TYPE DEVLINK_PARAM_TYPE_U32
 
+#define DEVLINK_PARAM_GENERIC_ENABLE_SFS_AUX_DEVS_NAME "enable_sfs_aux_devs"
+#define DEVLINK_PARAM_GENERIC_ENABLE_SFS_AUX_DEVS_TYPE DEVLINK_PARAM_TYPE_BOOL
+
 #define DEVLINK_PARAM_GENERIC(_id, _cmodes, _get, _set, _validate)	\
 {									\
 	.id = DEVLINK_PARAM_GENERIC_ID_##_id,				\
diff --git a/net/core/devlink.c b/net/core/devlink.c
index fcd9f6d85cf1..b5368024ac18 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4477,6 +4477,11 @@ static const struct devlink_param devlink_param_generic[] = {
 		.name = DEVLINK_PARAM_GENERIC_EVENT_EQ_SIZE_NAME,
 		.type = DEVLINK_PARAM_GENERIC_EVENT_EQ_SIZE_TYPE,
 	},
+	{
+		.id = DEVLINK_PARAM_GENERIC_ID_ENABLE_SFS_AUX_DEVS,
+		.name = DEVLINK_PARAM_GENERIC_ENABLE_SFS_AUX_DEVS_NAME,
+		.type = DEVLINK_PARAM_GENERIC_ENABLE_SFS_AUX_DEVS_TYPE,
+	},
 };
 
 static int devlink_param_generic_verify(const struct devlink_param *param)
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next 4/4] net/mlx5: Support enable_sfs_aux_devs devlink param
  2022-02-08 17:14 [PATCH net-next 0/4] net/mlx5: Introduce devlink param to disable SF aux dev probe Moshe Shemesh
                   ` (2 preceding siblings ...)
  2022-02-08 17:14 ` [PATCH net-next 3/4] devlink: Add new "enable_sfs_aux_devs" generic device param Moshe Shemesh
@ 2022-02-08 17:14 ` Moshe Shemesh
  2022-02-09  5:23 ` [PATCH net-next 0/4] net/mlx5: Introduce devlink param to disable SF aux dev probe Jakub Kicinski
  4 siblings, 0 replies; 13+ messages in thread
From: Moshe Shemesh @ 2022-02-08 17:14 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: Jiri Pirko, Saeed Mahameed, netdev, linux-kernel, Shay Drory

From: Shay Drory <shayd@nvidia.com>

In case user wants to configure the SFs, for example: to use only vdpa
functionality, he needs to fully probe a SF, configure what he wants,
and afterward reload the SF.
In order to save the time of the reload, exposing a devlink runtime
param on the PF, that when set to off, the SFs won't create any auxiliary
sub-device, so that the SF can be configured prior to its full probe.

Usage example:
$ devlink dev param set pci/0000:00:0b.0 name enable_sfs_aux_devs \
	  value false cmode runtime

Create SF:
$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
$ devlink port function set pci/0000:08:00.0/32768 \
               hw_addr 00:00:00:00:00:11 state active

Enable ETH auxiliary device:
$ devlink dev param set auxiliary/mlx5_core.sf.1 \
              name enable_eth value true cmode driverinit

Now, in order to fully probe the SF, use devlink reload:
$ devlink dev reload auxiliary/mlx5_core.sf.1

At this point the user have SF devlink instance with auxiliary device
for the Ethernet functionality only.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/dev.c |  16 +++
 .../net/ethernet/mellanox/mlx5/core/devlink.c |  31 +++++-
 .../net/ethernet/mellanox/mlx5/core/health.c  |   5 +-
 .../net/ethernet/mellanox/mlx5/core/main.c    | 102 +++++++++++++++++-
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |   6 ++
 .../mellanox/mlx5/core/sf/dev/driver.c        |  13 ++-
 .../ethernet/mellanox/mlx5/core/sf/devlink.c  |  40 +++++++
 .../ethernet/mellanox/mlx5/core/sf/hw_table.c |   7 ++
 .../net/ethernet/mellanox/mlx5/core/sf/priv.h |   2 +
 include/linux/mlx5/driver.h                   |   1 +
 10 files changed, 211 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/dev.c b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
index ba6dad97e308..89ead37b98ad 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/dev.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
@@ -333,6 +333,18 @@ static void del_adev(struct auxiliary_device *adev)
 	auxiliary_device_uninit(adev);
 }
 
+void mlx5_dev_mark_unregistered(struct mlx5_core_dev *dev)
+{
+	mutex_lock(&mlx5_intf_mutex);
+	dev->priv.flags |= MLX5_PRIV_FLAGS_DISABLE_ALL_ADEV;
+	mutex_unlock(&mlx5_intf_mutex);
+}
+
+bool mlx5_dev_is_unregistered(struct mlx5_core_dev *dev)
+{
+	return dev->priv.flags & MLX5_PRIV_FLAGS_DISABLE_ALL_ADEV;
+}
+
 int mlx5_attach_device(struct mlx5_core_dev *dev)
 {
 	struct mlx5_priv *priv = &dev->priv;
@@ -473,6 +485,10 @@ static int add_drivers(struct mlx5_core_dev *dev)
 		if (!is_supported)
 			continue;
 
+		if (mlx5_adev_devices[i].is_enabled &&
+		    !(mlx5_adev_devices[i].is_enabled(dev)))
+			continue;
+
 		priv->adev[i] = add_adev(dev, i);
 		if (IS_ERR(priv->adev[i])) {
 			mlx5_core_warn(dev, "Device[%d] (%s) failed to load\n",
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index e832a3f4c18a..1b769b5da577 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -139,6 +139,13 @@ static int mlx5_devlink_reload_down(struct devlink *devlink, bool netns_change,
 	struct pci_dev *pdev = dev->pdev;
 	bool sf_dev_allocated;
 
+	if (mlx5_dev_is_unregistered(dev)) {
+		if (action != DEVLINK_RELOAD_ACTION_DRIVER_REINIT)
+			return -EOPNOTSUPP;
+		mlx5_unload_one_light(dev);
+		return 0;
+	}
+
 	sf_dev_allocated = mlx5_sf_dev_allocated(dev);
 	if (sf_dev_allocated) {
 		/* Reload results in deleting SF device which further results in
@@ -182,6 +189,10 @@ static int mlx5_devlink_reload_up(struct devlink *devlink, enum devlink_reload_a
 	*actions_performed = BIT(action);
 	switch (action) {
 	case DEVLINK_RELOAD_ACTION_DRIVER_REINIT:
+		if (mlx5_dev_is_unregistered(dev)) {
+			mlx5_fw_reporters_create(dev);
+			return mlx5_init_one(dev);
+		}
 		return mlx5_load_one(dev);
 	case DEVLINK_RELOAD_ACTION_FW_ACTIVATE:
 		if (limit == DEVLINK_RELOAD_LIMIT_NO_RESET)
@@ -258,6 +269,9 @@ static int mlx5_devlink_trap_action_set(struct devlink *devlink,
 	struct mlx5_devlink_trap *dl_trap;
 	int err = 0;
 
+	if (mlx5_dev_is_unregistered(dev))
+		return -EOPNOTSUPP;
+
 	if (is_mdev_switchdev_mode(dev)) {
 		NL_SET_ERR_MSG_MOD(extack, "Devlink traps can't be set in switchdev mode");
 		return -EOPNOTSUPP;
@@ -440,6 +454,9 @@ static int mlx5_devlink_fs_mode_get(struct devlink *devlink, u32 id,
 {
 	struct mlx5_core_dev *dev = devlink_priv(devlink);
 
+	if (mlx5_dev_is_unregistered(dev))
+		return -EOPNOTSUPP;
+
 	if (dev->priv.steering->mode == MLX5_FLOW_STEERING_MODE_SMFS)
 		strcpy(ctx->val.vstr, "smfs");
 	else
@@ -499,6 +516,9 @@ static int mlx5_devlink_esw_port_metadata_get(struct devlink *devlink, u32 id,
 {
 	struct mlx5_core_dev *dev = devlink_priv(devlink);
 
+	if (mlx5_dev_is_unregistered(dev))
+		return -EOPNOTSUPP;
+
 	if (!MLX5_ESWITCH_MANAGER(dev))
 		return -EOPNOTSUPP;
 
@@ -542,6 +562,9 @@ static int mlx5_devlink_enable_remote_dev_reset_get(struct devlink *devlink, u32
 {
 	struct mlx5_core_dev *dev = devlink_priv(devlink);
 
+	if (mlx5_dev_is_unregistered(dev))
+		return -EOPNOTSUPP;
+
 	ctx->val.vbool = mlx5_fw_reset_enable_remote_dev_reset_get(dev);
 	return 0;
 }
@@ -588,7 +611,7 @@ static void mlx5_devlink_set_params_init_values(struct devlink *devlink)
 	struct mlx5_core_dev *dev = devlink_priv(devlink);
 	union devlink_param_value value;
 
-	value.vbool = MLX5_CAP_GEN(dev, roce);
+	value.vbool = MLX5_CAP_GEN(dev, roce) && !mlx5_dev_is_unregistered(dev);
 	devlink_param_driverinit_value_set(devlink,
 					   DEVLINK_PARAM_GENERIC_ID_ENABLE_ROCE,
 					   value);
@@ -628,7 +651,7 @@ static int mlx5_devlink_eth_param_register(struct devlink *devlink)
 	if (err)
 		return err;
 
-	value.vbool = true;
+	value.vbool = !mlx5_dev_is_unregistered(dev);
 	devlink_param_driverinit_value_set(devlink,
 					   DEVLINK_PARAM_GENERIC_ID_ENABLE_ETH,
 					   value);
@@ -673,7 +696,7 @@ static int mlx5_devlink_rdma_param_register(struct devlink *devlink)
 	if (err)
 		return err;
 
-	value.vbool = true;
+	value.vbool = !mlx5_dev_is_unregistered(devlink_priv(devlink));
 	devlink_param_driverinit_value_set(devlink,
 					   DEVLINK_PARAM_GENERIC_ID_ENABLE_RDMA,
 					   value);
@@ -705,7 +728,7 @@ static int mlx5_devlink_vnet_param_register(struct devlink *devlink)
 	if (err)
 		return err;
 
-	value.vbool = true;
+	value.vbool = !mlx5_dev_is_unregistered(dev);
 	devlink_param_driverinit_value_set(devlink,
 					   DEVLINK_PARAM_GENERIC_ID_ENABLE_VNET,
 					   value);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index 737df402c927..1b40ada0e9c2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -700,7 +700,7 @@ static const struct devlink_health_reporter_ops mlx5_fw_fatal_reporter_ops = {
 };
 
 #define MLX5_REPORTER_FW_GRACEFUL_PERIOD 1200000
-static void mlx5_fw_reporters_create(struct mlx5_core_dev *dev)
+void mlx5_fw_reporters_create(struct mlx5_core_dev *dev)
 {
 	struct mlx5_core_health *health = &dev->priv.health;
 	struct devlink *devlink = priv_to_devlink(dev);
@@ -893,7 +893,8 @@ int mlx5_health_init(struct mlx5_core_dev *dev)
 	struct mlx5_core_health *health;
 	char *name;
 
-	mlx5_fw_reporters_create(dev);
+	if (!mlx5_dev_is_unregistered(dev))
+		mlx5_fw_reporters_create(dev);
 
 	health = &dev->priv.health;
 	name = kmalloc(64, GFP_KERNEL);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 73d2cec01ead..b5e40ad0b5fb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1314,6 +1314,7 @@ static void mlx5_unload(struct mlx5_core_dev *dev)
 
 int mlx5_init_one(struct mlx5_core_dev *dev)
 {
+	bool light_probe = mlx5_dev_is_unregistered(dev);
 	int err = 0;
 
 	mutex_lock(&dev->intf_state_mutex);
@@ -1335,9 +1336,14 @@ int mlx5_init_one(struct mlx5_core_dev *dev)
 
 	set_bit(MLX5_INTERFACE_STATE_UP, &dev->intf_state);
 
-	err = mlx5_devlink_register(priv_to_devlink(dev));
-	if (err)
-		goto err_devlink_reg;
+	/* In case of light_probe, mlx5_devlink is already registered.
+	 * Hence, don't register devlink again.
+	 */
+	if (!light_probe) {
+		err = mlx5_devlink_register(priv_to_devlink(dev));
+		if (err)
+			goto err_devlink_reg;
+	}
 
 	err = mlx5_register_device(dev);
 	if (err)
@@ -1347,7 +1353,8 @@ int mlx5_init_one(struct mlx5_core_dev *dev)
 	return 0;
 
 err_register:
-	mlx5_devlink_unregister(priv_to_devlink(dev));
+	if (!light_probe)
+		mlx5_devlink_unregister(priv_to_devlink(dev));
 err_devlink_reg:
 	clear_bit(MLX5_INTERFACE_STATE_UP, &dev->intf_state);
 	mlx5_unload(dev);
@@ -1443,6 +1450,93 @@ void mlx5_unload_one(struct mlx5_core_dev *dev)
 	mutex_unlock(&dev->intf_state_mutex);
 }
 
+/* In case of light probe, we don't need a full query of hca_caps, but only the bellow caps.
+ * A full query of hca_caps will be done when the device will reload.
+ */
+static int mlx5_query_hca_caps_light(struct mlx5_core_dev *dev)
+{
+	int err;
+
+	err = mlx5_core_get_caps(dev, MLX5_CAP_GENERAL);
+	if (err)
+		return err;
+
+	if (MLX5_CAP_GEN(dev, eth_net_offloads)) {
+		err = mlx5_core_get_caps(dev, MLX5_CAP_ETHERNET_OFFLOADS);
+		if (err)
+			return err;
+	}
+
+	if (MLX5_CAP_GEN(dev, nic_flow_table) ||
+	    MLX5_CAP_GEN(dev, ipoib_enhanced_offloads)) {
+		err = mlx5_core_get_caps(dev, MLX5_CAP_FLOW_TABLE);
+		if (err)
+			return err;
+	}
+
+	if (MLX5_CAP_GEN_64(dev, general_obj_types) &
+		MLX5_GENERAL_OBJ_TYPES_CAP_VIRTIO_NET_Q) {
+		err = mlx5_core_get_caps(dev, MLX5_CAP_VDPA_EMULATION);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+int mlx5_init_one_light(struct mlx5_core_dev *dev)
+{
+	int err;
+
+	dev->state = MLX5_DEVICE_STATE_UP;
+	err = mlx5_function_enable(dev);
+	if (err) {
+		mlx5_core_warn(dev, "mlx5_function_enable err=%d\n", err);
+		goto out;
+	}
+
+	err = mlx5_query_hca_caps_light(dev);
+	if (err) {
+		mlx5_core_warn(dev, "mlx5_query_hca_caps_light err=%d\n", err);
+		goto query_hca_caps_err;
+	}
+
+	err = mlx5_devlink_register(priv_to_devlink(dev));
+	if (err) {
+		mlx5_core_warn(dev, "mlx5_devlink_reg err = %d\n", err);
+		goto query_hca_caps_err;
+	}
+
+	return 0;
+
+query_hca_caps_err:
+	mlx5_function_disable(dev);
+out:
+	dev->state = MLX5_DEVICE_STATE_INTERNAL_ERROR;
+	return err;
+}
+
+void mlx5_uninit_one_light(struct mlx5_core_dev *dev)
+{
+	mlx5_devlink_unregister(priv_to_devlink(dev));
+	if (dev->state != MLX5_DEVICE_STATE_UP)
+		return;
+	mlx5_function_disable(dev);
+}
+
+/* xxx_ligth() function are used in order to configure the device without full
+ * init (light init). e.g.: There isn't a point in reload a device to light state.
+ * Hence, mlx5_load_one_light() isn't needed.
+ */
+
+void mlx5_unload_one_light(struct mlx5_core_dev *dev)
+{
+	if (dev->state != MLX5_DEVICE_STATE_UP)
+		return;
+	mlx5_function_disable(dev);
+	dev->state = MLX5_DEVICE_STATE_INTERNAL_ERROR;
+}
+
 static const int types[] = {
 	MLX5_CAP_GENERAL,
 	MLX5_CAP_GENERAL_2,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 6f8baa0f2a73..e2d827474552 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -209,11 +209,14 @@ int mlx5_attach_device(struct mlx5_core_dev *dev);
 void mlx5_detach_device(struct mlx5_core_dev *dev);
 int mlx5_register_device(struct mlx5_core_dev *dev);
 void mlx5_unregister_device(struct mlx5_core_dev *dev);
+void mlx5_dev_mark_unregistered(struct mlx5_core_dev *dev);
+bool mlx5_dev_is_unregistered(struct mlx5_core_dev *dev);
 struct mlx5_core_dev *mlx5_get_next_phys_dev(struct mlx5_core_dev *dev);
 void mlx5_dev_list_lock(void);
 void mlx5_dev_list_unlock(void);
 int mlx5_dev_list_trylock(void);
 
+void mlx5_fw_reporters_create(struct mlx5_core_dev *dev);
 int mlx5_query_mtpps(struct mlx5_core_dev *dev, u32 *mtpps, u32 mtpps_size);
 int mlx5_set_mtpps(struct mlx5_core_dev *mdev, u32 *mtpps, u32 mtpps_size);
 int mlx5_query_mtppse(struct mlx5_core_dev *mdev, u8 pin, u8 *arm, u8 *mode);
@@ -291,6 +294,9 @@ int mlx5_init_one(struct mlx5_core_dev *dev);
 void mlx5_uninit_one(struct mlx5_core_dev *dev);
 void mlx5_unload_one(struct mlx5_core_dev *dev);
 int mlx5_load_one(struct mlx5_core_dev *dev);
+int mlx5_init_one_light(struct mlx5_core_dev *dev);
+void mlx5_uninit_one_light(struct mlx5_core_dev *dev);
+void mlx5_unload_one_light(struct mlx5_core_dev *dev);
 
 int mlx5_vport_get_other_func_cap(struct mlx5_core_dev *dev, u16 function_id, void *out);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
index 7b4783ce213e..8d6e0e44e614 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
@@ -28,6 +28,9 @@ static int mlx5_sf_dev_probe(struct auxiliary_device *adev, const struct auxilia
 	mdev->priv.adev_idx = adev->id;
 	sf_dev->mdev = mdev;
 
+	if (sf_dev->parent_mdev->priv.sfs_light_probe)
+		mlx5_dev_mark_unregistered(mdev);
+
 	err = mlx5_mdev_init(mdev, MLX5_DEFAULT_PROF);
 	if (err) {
 		mlx5_core_warn(mdev, "mlx5_mdev_init on err=%d\n", err);
@@ -41,7 +44,10 @@ static int mlx5_sf_dev_probe(struct auxiliary_device *adev, const struct auxilia
 		goto remap_err;
 	}
 
-	err = mlx5_init_one(mdev);
+	if (sf_dev->parent_mdev->priv.sfs_light_probe)
+		err = mlx5_init_one_light(mdev);
+	else
+		err = mlx5_init_one(mdev);
 	if (err) {
 		mlx5_core_warn(mdev, "mlx5_init_one err=%d\n", err);
 		goto init_one_err;
@@ -64,7 +70,10 @@ static void mlx5_sf_dev_remove(struct auxiliary_device *adev)
 	struct devlink *devlink = priv_to_devlink(sf_dev->mdev);
 
 	devlink_unregister(devlink);
-	mlx5_uninit_one(sf_dev->mdev);
+	if (mlx5_dev_is_unregistered(sf_dev->mdev))
+		mlx5_uninit_one_light(sf_dev->mdev);
+	else
+		mlx5_uninit_one(sf_dev->mdev);
 	iounmap(sf_dev->mdev->iseg);
 	mlx5_mdev_uninit(sf_dev->mdev);
 	mlx5_devlink_free(devlink);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c
index 3be659cd91f1..e3f6066ab7af 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c
@@ -10,6 +10,7 @@
 #include "ecpf.h"
 #define CREATE_TRACE_POINTS
 #include "diag/sf_tracepoint.h"
+#include "devlink.h"
 
 struct mlx5_sf {
 	struct devlink_port dl_port;
@@ -456,6 +457,44 @@ static int mlx5_sf_vhca_event(struct notifier_block *nb, unsigned long opcode, v
 	return 0;
 }
 
+static int mlx5_devlink_sfs_light_probe_get(struct devlink *devlink, u32 id,
+					    struct devlink_param_gset_ctx *ctx)
+{
+	struct mlx5_core_dev *dev = devlink_priv(devlink);
+
+	ctx->val.vbool = !dev->priv.sfs_light_probe;
+	return 0;
+}
+
+static int mlx5_devlink_sfs_light_probe_set(struct devlink *devlink, u32 id,
+					    struct devlink_param_gset_ctx *ctx)
+{
+	struct mlx5_core_dev *dev = devlink_priv(devlink);
+
+	dev->priv.sfs_light_probe = !ctx->val.vbool;
+	return 0;
+}
+
+static const struct devlink_param sfs_light_probe_param =
+	DEVLINK_PARAM_GENERIC(ENABLE_SFS_AUX_DEVS, BIT(DEVLINK_PARAM_CMODE_RUNTIME),
+			      mlx5_devlink_sfs_light_probe_get,
+			      mlx5_devlink_sfs_light_probe_set, NULL);
+
+int mlx5_devlink_sfs_light_probe_param_register(struct devlink *devlink)
+{
+	struct mlx5_core_dev *dev = devlink_priv(devlink);
+
+	if (!mlx5_core_is_pf(dev) || !mlx5_sf_max_functions(dev))
+		return 0;
+
+	return devlink_param_register(devlink, &sfs_light_probe_param);
+}
+
+void mlx5_devlink_sfs_light_probe_param_unregister(struct devlink *devlink)
+{
+	devlink_param_unregister(devlink, &sfs_light_probe_param);
+}
+
 static void mlx5_sf_table_enable(struct mlx5_sf_table *table)
 {
 	init_completion(&table->disable_complete);
@@ -533,6 +572,7 @@ int mlx5_sf_table_init(struct mlx5_core_dev *dev)
 	table->dev = dev;
 	xa_init(&table->port_indices);
 	dev->priv.sf_table = table;
+	dev->priv.sfs_light_probe = false;
 	refcount_set(&table->refcount, 0);
 	table->esw_nb.notifier_call = mlx5_sf_esw_event;
 	err = mlx5_esw_event_notifier_register(dev->priv.eswitch, &table->esw_nb);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c
index 17aa348989cb..664704598b76 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c
@@ -283,9 +283,15 @@ int mlx5_sf_hw_table_init(struct mlx5_core_dev *dev)
 	if (err)
 		goto ext_err;
 
+	err = mlx5_devlink_sfs_light_probe_param_register(priv_to_devlink(dev));
+	if (err)
+		goto sfs_reg_err;
+
 	mlx5_core_dbg(dev, "SF HW table: max sfs = %d, ext sfs = %d\n", max_fn, max_ext_fn);
 	return 0;
 
+sfs_reg_err:
+	mlx5_sf_hw_table_hwc_cleanup(&table->hwc[MLX5_SF_HWC_EXTERNAL]);
 ext_err:
 	mlx5_sf_hw_table_hwc_cleanup(&table->hwc[MLX5_SF_HWC_LOCAL]);
 table_err:
@@ -301,6 +307,7 @@ void mlx5_sf_hw_table_cleanup(struct mlx5_core_dev *dev)
 	if (!table)
 		return;
 
+	mlx5_devlink_sfs_light_probe_param_unregister(priv_to_devlink(dev));
 	mutex_destroy(&table->table_lock);
 	mlx5_sf_hw_table_hwc_cleanup(&table->hwc[MLX5_SF_HWC_EXTERNAL]);
 	mlx5_sf_hw_table_hwc_cleanup(&table->hwc[MLX5_SF_HWC_LOCAL]);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/priv.h b/drivers/net/ethernet/mellanox/mlx5/core/sf/priv.h
index 7114f3fc335f..c228edc8e43c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/priv.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/priv.h
@@ -19,4 +19,6 @@ void mlx5_sf_hw_table_sf_free(struct mlx5_core_dev *dev, u32 controller, u16 id)
 void mlx5_sf_hw_table_sf_deferred_free(struct mlx5_core_dev *dev, u32 controller, u16 id);
 bool mlx5_sf_hw_table_supported(const struct mlx5_core_dev *dev);
 
+int mlx5_devlink_sfs_light_probe_param_register(struct devlink *devlink);
+void mlx5_devlink_sfs_light_probe_param_unregister(struct devlink *devlink);
 #endif
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 78655d8d13a7..fd07175896ff 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -604,6 +604,7 @@ struct mlx5_priv {
 	struct mlx5_vhca_state_notifier *vhca_state_notifier;
 	struct mlx5_sf_dev_table *sf_dev_table;
 	struct mlx5_core_dev *parent_mdev;
+	u8 sfs_light_probe:1;
 #endif
 #ifdef CONFIG_MLX5_SF_MANAGER
 	struct mlx5_sf_hw_table *sf_hw_table;
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 0/4] net/mlx5: Introduce devlink param to disable SF aux dev probe
  2022-02-08 17:14 [PATCH net-next 0/4] net/mlx5: Introduce devlink param to disable SF aux dev probe Moshe Shemesh
                   ` (3 preceding siblings ...)
  2022-02-08 17:14 ` [PATCH net-next 4/4] net/mlx5: Support enable_sfs_aux_devs devlink param Moshe Shemesh
@ 2022-02-09  5:23 ` Jakub Kicinski
  2022-02-09  7:39   ` Moshe Shemesh
  2022-02-09  9:57   ` Jiri Pirko
  4 siblings, 2 replies; 13+ messages in thread
From: Jakub Kicinski @ 2022-02-09  5:23 UTC (permalink / raw)
  To: Moshe Shemesh
  Cc: David S. Miller, Jiri Pirko, Saeed Mahameed, netdev, linux-kernel

On Tue, 8 Feb 2022 19:14:02 +0200 Moshe Shemesh wrote:
> $ devlink dev param set pci/0000:08:00.0 name enable_sfs_aux_devs \
>               value false cmode runtime
> 
> Create SF:
> $ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
> $ devlink port function set pci/0000:08:00.0/32768 \
>                hw_addr 00:00:00:00:00:11 state active
> 
> Now depending on the use case, the user can enable specific auxiliary
> device(s). For example:
> 
> $ devlink dev param set auxiliary/mlx5_core.sf.1 \
>               name enable_vnet value true cmde driverinit
> 
> Afterwards, user needs to reload the SF in order for the SF to come up
> with the specific configuration:
> 
> $ devlink dev reload auxiliary/mlx5_core.sf.1

If the user just wants vnet why not add an API which tells the driver
which functionality the user wants when the "port" is "spawned"?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 0/4] net/mlx5: Introduce devlink param to disable SF aux dev probe
  2022-02-09  5:23 ` [PATCH net-next 0/4] net/mlx5: Introduce devlink param to disable SF aux dev probe Jakub Kicinski
@ 2022-02-09  7:39   ` Moshe Shemesh
  2022-02-09  9:57   ` Jiri Pirko
  1 sibling, 0 replies; 13+ messages in thread
From: Moshe Shemesh @ 2022-02-09  7:39 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S. Miller, Jiri Pirko, Saeed Mahameed, netdev, linux-kernel


On 2/9/2022 7:23 AM, Jakub Kicinski wrote:
> On Tue, 8 Feb 2022 19:14:02 +0200 Moshe Shemesh wrote:
>> $ devlink dev param set pci/0000:08:00.0 name enable_sfs_aux_devs \
>>                value false cmode runtime
>>
>> Create SF:
>> $ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
>> $ devlink port function set pci/0000:08:00.0/32768 \
>>                 hw_addr 00:00:00:00:00:11 state active
>>
>> Now depending on the use case, the user can enable specific auxiliary
>> device(s). For example:
>>
>> $ devlink dev param set auxiliary/mlx5_core.sf.1 \
>>                name enable_vnet value true cmde driverinit
>>
>> Afterwards, user needs to reload the SF in order for the SF to come up
>> with the specific configuration:
>>
>> $ devlink dev reload auxiliary/mlx5_core.sf.1
> If the user just wants vnet why not add an API which tells the driver
> which functionality the user wants when the "port" is "spawned"?


Well we don't have the SFs at that stage, how can we tell which SF will 
use vnet and which SF will use eth ?


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 0/4] net/mlx5: Introduce devlink param to disable SF aux dev probe
  2022-02-09  5:23 ` [PATCH net-next 0/4] net/mlx5: Introduce devlink param to disable SF aux dev probe Jakub Kicinski
  2022-02-09  7:39   ` Moshe Shemesh
@ 2022-02-09  9:57   ` Jiri Pirko
  2022-02-10  1:25     ` Jakub Kicinski
  1 sibling, 1 reply; 13+ messages in thread
From: Jiri Pirko @ 2022-02-09  9:57 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Moshe Shemesh, David S. Miller, Jiri Pirko, Saeed Mahameed,
	netdev, linux-kernel

Wed, Feb 09, 2022 at 06:23:41AM CET, kuba@kernel.org wrote:
>On Tue, 8 Feb 2022 19:14:02 +0200 Moshe Shemesh wrote:
>> $ devlink dev param set pci/0000:08:00.0 name enable_sfs_aux_devs \
>>               value false cmode runtime
>> 
>> Create SF:
>> $ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
>> $ devlink port function set pci/0000:08:00.0/32768 \
>>                hw_addr 00:00:00:00:00:11 state active
>> 
>> Now depending on the use case, the user can enable specific auxiliary
>> device(s). For example:
>> 
>> $ devlink dev param set auxiliary/mlx5_core.sf.1 \
>>               name enable_vnet value true cmde driverinit
>> 
>> Afterwards, user needs to reload the SF in order for the SF to come up
>> with the specific configuration:
>> 
>> $ devlink dev reload auxiliary/mlx5_core.sf.1
>
>If the user just wants vnet why not add an API which tells the driver
>which functionality the user wants when the "port" is "spawned"?

It's a different user. One works with the eswitch and creates the port
function. The other one takes the created instance and works with it.
Note that it may be on a different host.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 0/4] net/mlx5: Introduce devlink param to disable SF aux dev probe
  2022-02-09  9:57   ` Jiri Pirko
@ 2022-02-10  1:25     ` Jakub Kicinski
  2022-02-10  7:02       ` Jiri Pirko
  0 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2022-02-10  1:25 UTC (permalink / raw)
  To: Jiri Pirko, Moshe Shemesh
  Cc: David S. Miller, Jiri Pirko, Saeed Mahameed, netdev, linux-kernel

On Wed, 9 Feb 2022 09:39:54 +0200 Moshe Shemesh wrote:
> Well we don't have the SFs at that stage, how can we tell which SF will 
> use vnet and which SF will use eth ?

On Wed, 9 Feb 2022 10:57:21 +0100 Jiri Pirko wrote: 
> It's a different user. One works with the eswitch and creates the port
> function. The other one takes the created instance and works with it.
> Note that it may be on a different host.

It is a little confusing, so I may well be misunderstanding but the
cover letter says:

$ devlink dev param set pci/0000:08:00.0 name enable_sfs_aux_devs \
              value false cmode runtime

$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11

So both of these run on the same side, no?

What I meant is make the former part of the latter:

$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11 noprobe


Maybe worth clarifying - pci/0000:08:00.0 is the eswitch side and
auxiliary/mlx5_core.sf.1 is the... "customer" side, correct? 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 0/4] net/mlx5: Introduce devlink param to disable SF aux dev probe
  2022-02-10  1:25     ` Jakub Kicinski
@ 2022-02-10  7:02       ` Jiri Pirko
  2022-02-10 10:28         ` Moshe Shemesh
  0 siblings, 1 reply; 13+ messages in thread
From: Jiri Pirko @ 2022-02-10  7:02 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Moshe Shemesh, David S. Miller, Jiri Pirko, Saeed Mahameed,
	netdev, linux-kernel

Thu, Feb 10, 2022 at 02:25:25AM CET, kuba@kernel.org wrote:
>On Wed, 9 Feb 2022 09:39:54 +0200 Moshe Shemesh wrote:
>> Well we don't have the SFs at that stage, how can we tell which SF will 
>> use vnet and which SF will use eth ?
>
>On Wed, 9 Feb 2022 10:57:21 +0100 Jiri Pirko wrote: 
>> It's a different user. One works with the eswitch and creates the port
>> function. The other one takes the created instance and works with it.
>> Note that it may be on a different host.
>
>It is a little confusing, so I may well be misunderstanding but the
>cover letter says:
>
>$ devlink dev param set pci/0000:08:00.0 name enable_sfs_aux_devs \
>              value false cmode runtime
>
>$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
>
>So both of these run on the same side, no?
>
>What I meant is make the former part of the latter:
>
>$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11 noprobe

I see. So it would not be "global policy" but per-instance option during
creation. That makes sense. I wonder if the HW is capable of such flow,
Moshe, Saeed?


>
>
>Maybe worth clarifying - pci/0000:08:00.0 is the eswitch side and
>auxiliary/mlx5_core.sf.1 is the... "customer" side, correct? 

Yep.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 0/4] net/mlx5: Introduce devlink param to disable SF aux dev probe
  2022-02-10  7:02       ` Jiri Pirko
@ 2022-02-10 10:28         ` Moshe Shemesh
  2022-02-10 19:09           ` Parav Pandit
  0 siblings, 1 reply; 13+ messages in thread
From: Moshe Shemesh @ 2022-02-10 10:28 UTC (permalink / raw)
  To: Jiri Pirko, Jakub Kicinski
  Cc: David S. Miller, Jiri Pirko, Saeed Mahameed, netdev, linux-kernel


On 2/10/2022 9:02 AM, Jiri Pirko wrote:
> Thu, Feb 10, 2022 at 02:25:25AM CET, kuba@kernel.org wrote:
>> On Wed, 9 Feb 2022 09:39:54 +0200 Moshe Shemesh wrote:
>>> Well we don't have the SFs at that stage, how can we tell which SF will
>>> use vnet and which SF will use eth ?
>> On Wed, 9 Feb 2022 10:57:21 +0100 Jiri Pirko wrote:
>>> It's a different user. One works with the eswitch and creates the port
>>> function. The other one takes the created instance and works with it.
>>> Note that it may be on a different host.
>> It is a little confusing, so I may well be misunderstanding but the
>> cover letter says:
>>
>> $ devlink dev param set pci/0000:08:00.0 name enable_sfs_aux_devs \
>>               value false cmode runtime
>>
>> $ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
>>
>> So both of these run on the same side, no?
Yes.
>> What I meant is make the former part of the latter:
>>
>> $ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11 noprobe
> I see. So it would not be "global policy" but per-instance option during
> creation. That makes sense. I wonder if the HW is capable of such flow,
> Moshe, Saeed?


LGTM. Thanks.

>
>>
>> Maybe worth clarifying - pci/0000:08:00.0 is the eswitch side and
>> auxiliary/mlx5_core.sf.1 is the... "customer" side, correct?
> Yep.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH net-next 0/4] net/mlx5: Introduce devlink param to disable SF aux dev probe
  2022-02-10 10:28         ` Moshe Shemesh
@ 2022-02-10 19:09           ` Parav Pandit
  2022-02-11  8:46             ` Moshe Shemesh
  0 siblings, 1 reply; 13+ messages in thread
From: Parav Pandit @ 2022-02-10 19:09 UTC (permalink / raw)
  To: Moshe Shemesh, Jiri Pirko, Jakub Kicinski
  Cc: David S. Miller, Jiri Pirko, Saeed Mahameed, netdev, linux-kernel


> From: Moshe Shemesh <moshe@nvidia.com>
> Sent: Thursday, February 10, 2022 3:58 PM
> 
> On 2/10/2022 9:02 AM, Jiri Pirko wrote:
> > Thu, Feb 10, 2022 at 02:25:25AM CET, kuba@kernel.org wrote:
> >> On Wed, 9 Feb 2022 09:39:54 +0200 Moshe Shemesh wrote:
> >>> Well we don't have the SFs at that stage, how can we tell which SF
> >>> will use vnet and which SF will use eth ?
> >> On Wed, 9 Feb 2022 10:57:21 +0100 Jiri Pirko wrote:
> >>> It's a different user. One works with the eswitch and creates the
> >>> port function. The other one takes the created instance and works with it.
> >>> Note that it may be on a different host.
> >> It is a little confusing, so I may well be misunderstanding but the
> >> cover letter says:
> >>
> >> $ devlink dev param set pci/0000:08:00.0 name enable_sfs_aux_devs \
> >>               value false cmode runtime
> >>
> >> $ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
> >>
> >> So both of these run on the same side, no?
> Yes.
In this cover letter example it is on same side.
But as Jiri explained, both can be on different host.

> >> What I meant is make the former part of the latter:
> >>
> >> $ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
> >> noprobe
> > I see. So it would not be "global policy" but per-instance option
> > during creation. That makes sense. I wonder if the HW is capable of
> > such flow, Moshe, Saeed?
At present the device isn't capable of propagating this hint.
Moreover, the probe option is for the auxiliary devices of the SF (net, vdpa, rdma).
We still need to probe the SF's main auxiliary device so that a devlink instance of the SF is present to control the SF parameters [1] to compose it.

The one very good advantage I see of the per SF suggestion of Jakub is, the ability to compose most properties of a SF at one place on eswitch side.

However, even with per SF approach on eswitch side, the hurdle was in assigning the cpu affinity of the SF, which is something preferable to do on the host, where the actual workload is running.
So cpu affinity assignment per SF on host side requires devlink reload.
With that consideration it is better to control rest of the other parameters [1] too on customer side auxiliary/mlx5_core.sf.1 side.

[1] https://www.kernel.org/doc/html/latest/networking/devlink/devlink-params.html

> 
> LGTM. Thanks.
> 
> >
> >>
> >> Maybe worth clarifying - pci/0000:08:00.0 is the eswitch side and
> >> auxiliary/mlx5_core.sf.1 is the... "customer" side, correct?
> > Yep.

It is important to describe both use cases in the cover letter where customer side and eswitch side can be in same/different host with example.

Moshe,
Can you please revise the cover letter?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 0/4] net/mlx5: Introduce devlink param to disable SF aux dev probe
  2022-02-10 19:09           ` Parav Pandit
@ 2022-02-11  8:46             ` Moshe Shemesh
  0 siblings, 0 replies; 13+ messages in thread
From: Moshe Shemesh @ 2022-02-11  8:46 UTC (permalink / raw)
  To: Parav Pandit, Jiri Pirko, Jakub Kicinski
  Cc: David S. Miller, Jiri Pirko, Saeed Mahameed, netdev, linux-kernel


On 2/10/2022 9:09 PM, Parav Pandit wrote:
>> From: Moshe Shemesh <moshe@nvidia.com>
>> Sent: Thursday, February 10, 2022 3:58 PM
>>
>> On 2/10/2022 9:02 AM, Jiri Pirko wrote:
>>> Thu, Feb 10, 2022 at 02:25:25AM CET, kuba@kernel.org wrote:
>>>> On Wed, 9 Feb 2022 09:39:54 +0200 Moshe Shemesh wrote:
>>>>> Well we don't have the SFs at that stage, how can we tell which SF
>>>>> will use vnet and which SF will use eth ?
>>>> On Wed, 9 Feb 2022 10:57:21 +0100 Jiri Pirko wrote:
>>>>> It's a different user. One works with the eswitch and creates the
>>>>> port function. The other one takes the created instance and works with it.
>>>>> Note that it may be on a different host.
>>>> It is a little confusing, so I may well be misunderstanding but the
>>>> cover letter says:
>>>>
>>>> $ devlink dev param set pci/0000:08:00.0 name enable_sfs_aux_devs \
>>>>                value false cmode runtime
>>>>
>>>> $ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
>>>>
>>>> So both of these run on the same side, no?
>> Yes.
> In this cover letter example it is on same side.
> But as Jiri explained, both can be on different host.
>
>>>> What I meant is make the former part of the latter:
>>>>
>>>> $ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
>>>> noprobe
>>> I see. So it would not be "global policy" but per-instance option
>>> during creation. That makes sense. I wonder if the HW is capable of
>>> such flow, Moshe, Saeed?
> At present the device isn't capable of propagating this hint.
> Moreover, the probe option is for the auxiliary devices of the SF (net, vdpa, rdma).
> We still need to probe the SF's main auxiliary device so that a devlink instance of the SF is present to control the SF parameters [1] to compose it.
>
> The one very good advantage I see of the per SF suggestion of Jakub is, the ability to compose most properties of a SF at one place on eswitch side.
>
> However, even with per SF approach on eswitch side, the hurdle was in assigning the cpu affinity of the SF, which is something preferable to do on the host, where the actual workload is running.
> So cpu affinity assignment per SF on host side requires devlink reload.
> With that consideration it is better to control rest of the other parameters [1] too on customer side auxiliary/mlx5_core.sf.1 side.
>
> [1] https://www.kernel.org/doc/html/latest/networking/devlink/devlink-params.html
>
>> LGTM. Thanks.
>>
>>>> Maybe worth clarifying - pci/0000:08:00.0 is the eswitch side and
>>>> auxiliary/mlx5_core.sf.1 is the... "customer" side, correct?
>>> Yep.
> It is important to describe both use cases in the cover letter where customer side and eswitch side can be in same/different host with example.
>
> Moshe,
> Can you please revise the cover letter?


Yes, I will send revised version.


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-02-11  8:47 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-08 17:14 [PATCH net-next 0/4] net/mlx5: Introduce devlink param to disable SF aux dev probe Moshe Shemesh
2022-02-08 17:14 ` [PATCH net-next 1/4] net/mlx5: Split function_setup() to enable and open functions Moshe Shemesh
2022-02-08 17:14 ` [PATCH net-next 2/4] net/mlx5: Delete redundant default assignment of runtime devlink params Moshe Shemesh
2022-02-08 17:14 ` [PATCH net-next 3/4] devlink: Add new "enable_sfs_aux_devs" generic device param Moshe Shemesh
2022-02-08 17:14 ` [PATCH net-next 4/4] net/mlx5: Support enable_sfs_aux_devs devlink param Moshe Shemesh
2022-02-09  5:23 ` [PATCH net-next 0/4] net/mlx5: Introduce devlink param to disable SF aux dev probe Jakub Kicinski
2022-02-09  7:39   ` Moshe Shemesh
2022-02-09  9:57   ` Jiri Pirko
2022-02-10  1:25     ` Jakub Kicinski
2022-02-10  7:02       ` Jiri Pirko
2022-02-10 10:28         ` Moshe Shemesh
2022-02-10 19:09           ` Parav Pandit
2022-02-11  8:46             ` Moshe Shemesh

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.