linux-kernel.vger.kernel.org archive mirror
* [PATCH net-next 0/4] net/mlx5: Memory optimizations
@ 2021-11-30 15:07 Shay Drory
  2021-11-30 15:07 ` [PATCH net-next 1/4] net/mlx5: Let user configure io_eq_size resource Shay Drory
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Shay Drory @ 2021-11-30 15:07 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: jiri, saeedm, netdev, linux-kernel, Shay Drory

This series provides knobs which will enable users to
minimize memory consumption of mlx5 Functions (PF/VF/SF).
mlx5 exposes two new generic devlink resources for EQ size
configuration and uses the generic devlink param max_macs.

Patches summary:
 - Patch-1 Provides the I/O EQ size resource, which enables saving
   up to 128KB.
 - Patch-2 Provides the event EQ size resource, which enables saving up to
   512KB.
 - Patch-3 Clarifies the max_macs param.
 - Patch-4 Provides the max_macs param, which enables saving up to 70KB.

In total, this series can save up to 700KB per Function.

Shay Drory (4):
  net/mlx5: Let user configure io_eq_size resource
  net/mlx5: Let user configure event_eq_size resource
  devlink: Clarifies max_macs generic devlink param
  net/mlx5: Let user configure max_macs generic param

 .../networking/devlink/devlink-params.rst     |  6 +-
 .../networking/devlink/devlink-resource.rst   |  4 +
 Documentation/networking/devlink/mlx5.rst     |  4 +
 .../net/ethernet/mellanox/mlx5/core/Makefile  |  2 +-
 .../net/ethernet/mellanox/mlx5/core/devlink.c | 67 ++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/devlink.h | 12 +++
 .../ethernet/mellanox/mlx5/core/devlink_res.c | 79 +++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/eq.c  |  5 +-
 .../net/ethernet/mellanox/mlx5/core/main.c    | 21 +++++
 include/linux/mlx5/driver.h                   |  4 -
 include/linux/mlx5/eq.h                       |  1 -
 include/linux/mlx5/mlx5_ifc.h                 |  2 +-
 include/net/devlink.h                         |  2 +
 13 files changed, 198 insertions(+), 11 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink_res.c

-- 
2.21.3


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH net-next 1/4] net/mlx5: Let user configure io_eq_size resource
  2021-11-30 15:07 [PATCH net-next 0/4] net/mlx5: Memory optimizations Shay Drory
@ 2021-11-30 15:07 ` Shay Drory
  2021-11-30 15:07 ` [PATCH net-next 2/4] net/mlx5: Let user configure event_eq_size resource Shay Drory
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Shay Drory @ 2021-11-30 15:07 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: jiri, saeedm, netdev, linux-kernel, Shay Drory, Moshe Shemesh

Currently, each I/O EQ takes 128KB of memory. This size is not
needed in all use cases, and becomes critical at large scale.
Hence, allow the user to configure the size of I/O EQs.

For example, to reduce I/O EQ size to 64, execute:
$ devlink resource set pci/0000:00:0b.0 path /io_eq_size/ size 64
$ devlink dev reload pci/0000:00:0b.0

In addition, add it as a "Generic Resource" in order for different
drivers to be aligned by the same resource name when exposing to
user space.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
---
 .../networking/devlink/devlink-resource.rst   |  2 +
 .../net/ethernet/mellanox/mlx5/core/Makefile  |  2 +-
 .../net/ethernet/mellanox/mlx5/core/devlink.h | 11 ++++
 .../ethernet/mellanox/mlx5/core/devlink_res.c | 55 +++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/eq.c  |  3 +-
 .../net/ethernet/mellanox/mlx5/core/main.c    |  3 +
 include/linux/mlx5/driver.h                   |  4 --
 include/net/devlink.h                         |  1 +
 8 files changed, 75 insertions(+), 6 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink_res.c

diff --git a/Documentation/networking/devlink/devlink-resource.rst b/Documentation/networking/devlink/devlink-resource.rst
index 3d5ae51e65a2..d5df5e65d057 100644
--- a/Documentation/networking/devlink/devlink-resource.rst
+++ b/Documentation/networking/devlink/devlink-resource.rst
@@ -36,6 +36,8 @@ device drivers and their description must be added to the following table:
      - Description
    * - ``physical_ports``
      - A limited capacity of physical ports that the switch ASIC can support
+   * - ``io_eq_size``
+     - Control the size of I/O completion EQs
 
 example usage
 -------------
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index e63bb9ceb9c0..19656ea025c7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -16,7 +16,7 @@ mlx5_core-y :=	main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
 		transobj.o vport.o sriov.o fs_cmd.o fs_core.o pci_irq.o \
 		fs_counters.o fs_ft_pool.o rl.o lag/lag.o dev.o events.o wq.o lib/gid.o \
 		lib/devcom.o lib/pci_vsc.o lib/dm.o lib/fs_ttc.o diag/fs_tracepoint.o \
-		diag/fw_tracer.o diag/crdump.o devlink.o diag/rsc_dump.o \
+		diag/fw_tracer.o diag/crdump.o devlink.o devlink_res.o diag/rsc_dump.o \
 		fw_reset.o qos.o lib/tout.o
 
 #
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
index 30bf4882779b..4192f23b1446 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
@@ -6,6 +6,13 @@
 
 #include <net/devlink.h>
 
+enum mlx5_devlink_resource_id {
+	MLX5_DL_RES_COMP_EQ = 1,
+
+	__MLX5_ID_RES_MAX,
+	MLX5_ID_RES_MAX = __MLX5_ID_RES_MAX - 1,
+};
+
 enum mlx5_devlink_param_id {
 	MLX5_DEVLINK_PARAM_ID_BASE = DEVLINK_PARAM_GENERIC_ID_MAX,
 	MLX5_DEVLINK_PARAM_ID_FLOW_STEERING_MODE,
@@ -31,6 +38,10 @@ int mlx5_devlink_trap_get_num_active(struct mlx5_core_dev *dev);
 int mlx5_devlink_traps_get_action(struct mlx5_core_dev *dev, int trap_id,
 				  enum devlink_trap_action *action);
 
+void mlx5_devlink_res_register(struct mlx5_core_dev *dev);
+void mlx5_devlink_res_unregister(struct mlx5_core_dev *dev);
+size_t mlx5_devlink_res_size(struct mlx5_core_dev *dev, enum mlx5_devlink_resource_id id);
+
 struct devlink *mlx5_devlink_alloc(struct device *dev);
 void mlx5_devlink_free(struct devlink *devlink);
 int mlx5_devlink_register(struct devlink *devlink);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink_res.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink_res.c
new file mode 100644
index 000000000000..2b7a956b7779
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink_res.c
@@ -0,0 +1,55 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. */
+
+#include "devlink.h"
+#include "mlx5_core.h"
+
+enum {
+	MLX5_EQ_MIN_SIZE = 64,
+	MLX5_EQ_MAX_SIZE = 4096,
+	MLX5_COMP_EQ_SIZE = 1024,
+};
+
+static int comp_eq_res_register(struct mlx5_core_dev *dev)
+{
+	struct devlink_resource_size_params comp_eq_size;
+	struct devlink *devlink = priv_to_devlink(dev);
+
+	devlink_resource_size_params_init(&comp_eq_size, MLX5_EQ_MIN_SIZE,
+					  MLX5_EQ_MAX_SIZE, 1, DEVLINK_RESOURCE_UNIT_ENTRY);
+	return devlink_resource_register(devlink, DEVLINK_RESOURCE_GENERIC_NAME_IO_EQ,
+					 MLX5_COMP_EQ_SIZE, MLX5_DL_RES_COMP_EQ,
+					 DEVLINK_RESOURCE_ID_PARENT_TOP, &comp_eq_size);
+}
+
+void mlx5_devlink_res_register(struct mlx5_core_dev *dev)
+{
+	int err;
+
+	err = comp_eq_res_register(dev);
+	if (err)
+		mlx5_core_err(dev, "Failed to register resources, err = %d\n", err);
+}
+
+void mlx5_devlink_res_unregister(struct mlx5_core_dev *dev)
+{
+	devlink_resources_unregister(priv_to_devlink(dev), NULL);
+}
+
+static const size_t default_vals[MLX5_ID_RES_MAX + 1] = {
+	[MLX5_DL_RES_COMP_EQ] = MLX5_COMP_EQ_SIZE,
+};
+
+size_t mlx5_devlink_res_size(struct mlx5_core_dev *dev, enum mlx5_devlink_resource_id id)
+{
+	struct devlink *devlink = priv_to_devlink(dev);
+	u64 size;
+	int err;
+
+	err = devlink_resource_size_get(devlink, id, &size);
+	if (!err)
+		return size;
+	mlx5_core_err(dev, "Failed to get param. using default. err = %d, id = %u\n",
+		      err, id);
+	return default_vals[id];
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 792e0d6aa861..4dda6e2a4cbc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -19,6 +19,7 @@
 #include "lib/clock.h"
 #include "diag/fw_tracer.h"
 #include "mlx5_irq.h"
+#include "devlink.h"
 
 enum {
 	MLX5_EQE_OWNER_INIT_VAL	= 0x1,
@@ -807,7 +808,7 @@ static int create_comp_eqs(struct mlx5_core_dev *dev)
 
 	INIT_LIST_HEAD(&table->comp_eqs_list);
 	ncomp_eqs = table->num_comp_eqs;
-	nent = MLX5_COMP_EQ_SIZE;
+	nent = mlx5_devlink_res_size(dev, MLX5_DL_RES_COMP_EQ);
 	for (i = 0; i < ncomp_eqs; i++) {
 		struct mlx5_eq_param param = {};
 		int vecidx = i;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index a92a92a52346..f55a89bd3736 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -922,6 +922,8 @@ static int mlx5_init_once(struct mlx5_core_dev *dev)
 	dev->hv_vhca = mlx5_hv_vhca_create(dev);
 	dev->rsc_dump = mlx5_rsc_dump_create(dev);
 
+	mlx5_devlink_res_register(dev);
+
 	return 0;
 
 err_sf_table_cleanup:
@@ -957,6 +959,7 @@ static int mlx5_init_once(struct mlx5_core_dev *dev)
 
 static void mlx5_cleanup_once(struct mlx5_core_dev *dev)
 {
+	mlx5_devlink_res_unregister(dev);
 	mlx5_rsc_dump_destroy(dev);
 	mlx5_hv_vhca_destroy(dev->hv_vhca);
 	mlx5_fw_tracer_destroy(dev->tracer);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index a623ec635947..d07359e98fd4 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -781,10 +781,6 @@ struct mlx5_db {
 	int			index;
 };
 
-enum {
-	MLX5_COMP_EQ_SIZE = 1024,
-};
-
 enum {
 	MLX5_PTYS_IB = 1 << 0,
 	MLX5_PTYS_EN = 1 << 2,
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 043fcec8b0aa..ecc55ee526fa 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -364,6 +364,7 @@ typedef u64 devlink_resource_occ_get_t(void *priv);
 #define DEVLINK_RESOURCE_ID_PARENT_TOP 0
 
 #define DEVLINK_RESOURCE_GENERIC_NAME_PORTS "physical_ports"
+#define DEVLINK_RESOURCE_GENERIC_NAME_IO_EQ "io_eq_size"
 
 #define __DEVLINK_PARAM_MAX_STRING_VALUE 32
 enum devlink_param_type {
-- 
2.21.3



* [PATCH net-next 2/4] net/mlx5: Let user configure event_eq_size resource
  2021-11-30 15:07 [PATCH net-next 0/4] net/mlx5: Memory optimizations Shay Drory
  2021-11-30 15:07 ` [PATCH net-next 1/4] net/mlx5: Let user configure io_eq_size resource Shay Drory
@ 2021-11-30 15:07 ` Shay Drory
  2021-11-30 15:07 ` [PATCH net-next 3/4] devlink: Clarifies max_macs generic devlink param Shay Drory
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Shay Drory @ 2021-11-30 15:07 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: jiri, saeedm, netdev, linux-kernel, Shay Drory, Moshe Shemesh

The event EQ is an EQ which receives notifications of almost all the
events generated by the NIC.
Currently, each event EQ takes 512KB of memory. This size is not
needed in most use cases, and becomes critical at large scale. Hence,
allow the user to configure the size of the event EQ.

For example, to reduce the event EQ size to 64, execute:
$ devlink resource set pci/0000:00:0b.0 path /event_eq_size/ size 64
$ devlink dev reload pci/0000:00:0b.0

In addition, add it as a "Generic Resource" in order for different
drivers to be aligned by the same resource name when exposing to
user space.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
---
 .../networking/devlink/devlink-resource.rst   |  2 ++
 .../net/ethernet/mellanox/mlx5/core/devlink.h |  1 +
 .../ethernet/mellanox/mlx5/core/devlink_res.c | 26 ++++++++++++++++++-
 drivers/net/ethernet/mellanox/mlx5/core/eq.c  |  2 +-
 include/linux/mlx5/eq.h                       |  1 -
 include/net/devlink.h                         |  1 +
 6 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/Documentation/networking/devlink/devlink-resource.rst b/Documentation/networking/devlink/devlink-resource.rst
index d5df5e65d057..7c66ae6df2e6 100644
--- a/Documentation/networking/devlink/devlink-resource.rst
+++ b/Documentation/networking/devlink/devlink-resource.rst
@@ -38,6 +38,8 @@ device drivers and their description must be added to the following table:
      - A limited capacity of physical ports that the switch ASIC can support
    * - ``io_eq_size``
      - Control the size of I/O completion EQs
+   * - ``event_eq_size``
+     - Control the size of the asynchronous control events EQ
 
 example usage
 -------------
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
index 4192f23b1446..674415fd0b3a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
@@ -8,6 +8,7 @@
 
 enum mlx5_devlink_resource_id {
 	MLX5_DL_RES_COMP_EQ = 1,
+	MLX5_DL_RES_ASYNC_EQ,
 
 	__MLX5_ID_RES_MAX,
 	MLX5_ID_RES_MAX = __MLX5_ID_RES_MAX - 1,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink_res.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink_res.c
index 2b7a956b7779..8cbe08577c05 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink_res.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink_res.c
@@ -7,6 +7,7 @@
 enum {
 	MLX5_EQ_MIN_SIZE = 64,
 	MLX5_EQ_MAX_SIZE = 4096,
+	MLX5_NUM_ASYNC_EQE = 4096,
 	MLX5_COMP_EQ_SIZE = 1024,
 };
 
@@ -22,13 +23,35 @@ static int comp_eq_res_register(struct mlx5_core_dev *dev)
 					 DEVLINK_RESOURCE_ID_PARENT_TOP, &comp_eq_size);
 }
 
+static int async_eq_res_register(struct mlx5_core_dev *dev)
+{
+	struct devlink_resource_size_params async_eq_size;
+	struct devlink *devlink = priv_to_devlink(dev);
+
+	devlink_resource_size_params_init(&async_eq_size, MLX5_EQ_MIN_SIZE,
+					  MLX5_EQ_MAX_SIZE, 1, DEVLINK_RESOURCE_UNIT_ENTRY);
+	return devlink_resource_register(devlink, DEVLINK_RESOURCE_GENERIC_NAME_EVENT_EQ,
+					 MLX5_NUM_ASYNC_EQE, MLX5_DL_RES_ASYNC_EQ,
+					 DEVLINK_RESOURCE_ID_PARENT_TOP,
+					 &async_eq_size);
+}
+
 void mlx5_devlink_res_register(struct mlx5_core_dev *dev)
 {
 	int err;
 
 	err = comp_eq_res_register(dev);
 	if (err)
-		mlx5_core_err(dev, "Failed to register resources, err = %d\n", err);
+		goto err_msg;
+
+	err = async_eq_res_register(dev);
+	if (err)
+		goto err;
+	return;
+err:
+	devlink_resources_unregister(priv_to_devlink(dev), NULL);
+err_msg:
+	mlx5_core_err(dev, "Failed to register resources, err = %d\n", err);
 }
 
 void mlx5_devlink_res_unregister(struct mlx5_core_dev *dev)
@@ -38,6 +61,7 @@ void mlx5_devlink_res_unregister(struct mlx5_core_dev *dev)
 
 static const size_t default_vals[MLX5_ID_RES_MAX + 1] = {
 	[MLX5_DL_RES_COMP_EQ] = MLX5_COMP_EQ_SIZE,
+	[MLX5_DL_RES_ASYNC_EQ] = MLX5_NUM_ASYNC_EQE,
 };
 
 size_t mlx5_devlink_res_size(struct mlx5_core_dev *dev, enum mlx5_devlink_resource_id id)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 4dda6e2a4cbc..31e69067284b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -647,7 +647,7 @@ static int create_async_eqs(struct mlx5_core_dev *dev)
 
 	param = (struct mlx5_eq_param) {
 		.irq_index = MLX5_IRQ_EQ_CTRL,
-		.nent = MLX5_NUM_ASYNC_EQE,
+		.nent = mlx5_devlink_res_size(dev, MLX5_DL_RES_ASYNC_EQ),
 	};
 
 	gather_async_events_mask(dev, param.mask);
diff --git a/include/linux/mlx5/eq.h b/include/linux/mlx5/eq.h
index ea3ff5a8ced3..11161e427608 100644
--- a/include/linux/mlx5/eq.h
+++ b/include/linux/mlx5/eq.h
@@ -5,7 +5,6 @@
 #define MLX5_CORE_EQ_H
 
 #define MLX5_NUM_CMD_EQE   (32)
-#define MLX5_NUM_ASYNC_EQE (0x1000)
 #define MLX5_NUM_SPARE_EQE (0x80)
 
 struct mlx5_eq;
diff --git a/include/net/devlink.h b/include/net/devlink.h
index ecc55ee526fa..43b6fdd9ffa5 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -365,6 +365,7 @@ typedef u64 devlink_resource_occ_get_t(void *priv);
 
 #define DEVLINK_RESOURCE_GENERIC_NAME_PORTS "physical_ports"
 #define DEVLINK_RESOURCE_GENERIC_NAME_IO_EQ "io_eq_size"
+#define DEVLINK_RESOURCE_GENERIC_NAME_EVENT_EQ "event_eq_size"
 
 #define __DEVLINK_PARAM_MAX_STRING_VALUE 32
 enum devlink_param_type {
-- 
2.21.3



* [PATCH net-next 3/4] devlink: Clarifies max_macs generic devlink param
  2021-11-30 15:07 [PATCH net-next 0/4] net/mlx5: Memory optimizations Shay Drory
  2021-11-30 15:07 ` [PATCH net-next 1/4] net/mlx5: Let user configure io_eq_size resource Shay Drory
  2021-11-30 15:07 ` [PATCH net-next 2/4] net/mlx5: Let user configure event_eq_size resource Shay Drory
@ 2021-11-30 15:07 ` Shay Drory
  2021-11-30 15:07 ` [PATCH net-next 4/4] net/mlx5: Let user configure max_macs generic param Shay Drory
  2021-11-30 19:39 ` [PATCH net-next 0/4] net/mlx5: Memory optimizations Jakub Kicinski
  4 siblings, 0 replies; 11+ messages in thread
From: Shay Drory @ 2021-11-30 15:07 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: jiri, saeedm, netdev, linux-kernel, Shay Drory, Moshe Shemesh

The documentation of the generic max_macs param isn't clear.
Replace it with more descriptive documentation.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
---
 Documentation/networking/devlink/devlink-params.rst | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/devlink/devlink-params.rst b/Documentation/networking/devlink/devlink-params.rst
index b7dfe693a332..c2542dcf63c0 100644
--- a/Documentation/networking/devlink/devlink-params.rst
+++ b/Documentation/networking/devlink/devlink-params.rst
@@ -118,8 +118,10 @@ own name.
        errors.
    * - ``max_macs``
      - u32
-     - Specifies the maximum number of MAC addresses per ethernet port of
-       this device.
+     - Typically the MACs of macvlan and vlan net devices are also
+       programmed in their parent netdevice's function RX filter. This
+       parameter limits the maximum number of unicast MAC address filters
+       to receive traffic from, per ethernet port of this device.
    * - ``region_snapshot_enable``
      - Boolean
      - Enable capture of ``devlink-region`` snapshots.
-- 
2.21.3



* [PATCH net-next 4/4] net/mlx5: Let user configure max_macs generic param
  2021-11-30 15:07 [PATCH net-next 0/4] net/mlx5: Memory optimizations Shay Drory
                   ` (2 preceding siblings ...)
  2021-11-30 15:07 ` [PATCH net-next 3/4] devlink: Clarifies max_macs generic devlink param Shay Drory
@ 2021-11-30 15:07 ` Shay Drory
  2021-11-30 19:39 ` [PATCH net-next 0/4] net/mlx5: Memory optimizations Jakub Kicinski
  4 siblings, 0 replies; 11+ messages in thread
From: Shay Drory @ 2021-11-30 15:07 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: jiri, saeedm, netdev, linux-kernel, Shay Drory, Moshe Shemesh,
	Parav Pandit

Currently, max_macs takes 70KB of memory per function. This size is
not needed in all use cases, and becomes critical at large scale.
Hence, allow the user to configure the number of max_macs.

For example, to reduce the number of max_macs to 1, execute:
$ devlink dev param set pci/0000:00:0b.0 name max_macs value 1 \
              cmode driverinit
$ devlink dev reload pci/0000:00:0b.0
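The checks performed by the patch's mlx5_devlink_max_uc_list_validate() can
be sketched in a standalone form as below. This is an illustrative model, not
the driver's code: `cap_log_max` stands in for the device capability
MLX5_CAP_GEN_MAX(dev, log_max_current_uc_list), and the helper names are
made up for the example.

```c
#include <stdint.h>

/* Illustrative stand-ins for the kernel's is_power_of_2()/ilog2(). */
static int is_pow2_u32(uint32_t v)
{
	return v && !(v & (v - 1));
}

static int ilog2_u32(uint32_t v)
{
	int l = -1;

	while (v) {
		v >>= 1;
		l++;
	}
	return l;
}

/*
 * Sketch of the max_macs validation: the value must be a nonzero power
 * of two whose log2 does not exceed the device capability.
 * Returns 0 when valid, -1 otherwise.
 */
static int validate_max_macs(uint32_t val, int cap_log_max)
{
	if (val == 0)
		return -1;	/* "must be greater than 0" */
	if (!is_pow2_u32(val))
		return -1;	/* "only power of 2 values are supported" */
	if (ilog2_u32(val) > cap_log_max)
		return -1;	/* "out of the supported range" */
	return 0;
}
```

With cap_log_max = 7, for example, powers of two from 1 through 128 pass,
while 0, 3 and 256 are rejected, mirroring the extack messages in the patch.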

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
---
 Documentation/networking/devlink/mlx5.rst     |  4 ++
 .../net/ethernet/mellanox/mlx5/core/devlink.c | 67 +++++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/main.c    | 18 +++++
 include/linux/mlx5/mlx5_ifc.h                 |  2 +-
 4 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/devlink/mlx5.rst b/Documentation/networking/devlink/mlx5.rst
index 4e4b97f7971a..c44043bcae72 100644
--- a/Documentation/networking/devlink/mlx5.rst
+++ b/Documentation/networking/devlink/mlx5.rst
@@ -14,8 +14,12 @@ Parameters
 
    * - Name
      - Mode
+     - Validation
    * - ``enable_roce``
      - driverinit
+   * - ``max_macs``
+     - driverinit
+     - The range is between 1 and 2^31. Only power of 2 values are supported.
 
 The ``mlx5`` driver also implements the following driver-specific
 parameters.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index 1c98652b244a..7383b727f49e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -752,6 +752,66 @@ static void mlx5_devlink_auxdev_params_unregister(struct devlink *devlink)
 	mlx5_devlink_eth_param_unregister(devlink);
 }
 
+static int mlx5_devlink_max_uc_list_validate(struct devlink *devlink, u32 id,
+					     union devlink_param_value val,
+					     struct netlink_ext_ack *extack)
+{
+	struct mlx5_core_dev *dev = devlink_priv(devlink);
+
+	if (val.vu32 == 0) {
+		NL_SET_ERR_MSG_MOD(extack, "max_macs value must be greater than 0");
+		return -EINVAL;
+	}
+
+	if (!is_power_of_2(val.vu32)) {
+		NL_SET_ERR_MSG_MOD(extack, "Only power of 2 values are supported for max_macs");
+		return -EINVAL;
+	}
+
+	if (ilog2(val.vu32) >
+	    MLX5_CAP_GEN_MAX(dev, log_max_current_uc_list)) {
+		NL_SET_ERR_MSG_MOD(extack, "max_macs value is out of the supported range");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static const struct devlink_param max_uc_list_param =
+	DEVLINK_PARAM_GENERIC(MAX_MACS, BIT(DEVLINK_PARAM_CMODE_DRIVERINIT),
+			      NULL, NULL, mlx5_devlink_max_uc_list_validate);
+
+static int mlx5_devlink_max_uc_list_param_register(struct devlink *devlink)
+{
+	struct mlx5_core_dev *dev = devlink_priv(devlink);
+	union devlink_param_value value;
+	int err;
+
+	if (!MLX5_CAP_GEN_MAX(dev, log_max_current_uc_list_wr_supported))
+		return 0;
+
+	err = devlink_param_register(devlink, &max_uc_list_param);
+	if (err)
+		return err;
+
+	value.vu32 = 1 << MLX5_CAP_GEN(dev, log_max_current_uc_list);
+	devlink_param_driverinit_value_set(devlink,
+					   DEVLINK_PARAM_GENERIC_ID_MAX_MACS,
+					   value);
+	return 0;
+}
+
+static void
+mlx5_devlink_max_uc_list_param_unregister(struct devlink *devlink)
+{
+	struct mlx5_core_dev *dev = devlink_priv(devlink);
+
+	if (!MLX5_CAP_GEN(dev, log_max_current_uc_list_wr_supported))
+		return;
+
+	devlink_param_unregister(devlink, &max_uc_list_param);
+}
+
 #define MLX5_TRAP_DROP(_id, _group_id)					\
 	DEVLINK_TRAP_GENERIC(DROP, DROP, _id,				\
 			     DEVLINK_TRAP_GROUP_GENERIC_ID_##_group_id, \
@@ -815,11 +875,17 @@ int mlx5_devlink_register(struct devlink *devlink)
 	if (err)
 		goto traps_reg_err;
 
+	err = mlx5_devlink_max_uc_list_param_register(devlink);
+	if (err)
+		goto uc_list_reg_err;
+
 	if (!mlx5_core_is_mp_slave(dev))
 		devlink_set_features(devlink, DEVLINK_F_RELOAD);
 
 	return 0;
 
+uc_list_reg_err:
+	mlx5_devlink_traps_unregister(devlink);
 traps_reg_err:
 	mlx5_devlink_auxdev_params_unregister(devlink);
 auxdev_reg_err:
@@ -830,6 +896,7 @@ int mlx5_devlink_register(struct devlink *devlink)
 
 void mlx5_devlink_unregister(struct devlink *devlink)
 {
+	mlx5_devlink_max_uc_list_param_unregister(devlink);
 	mlx5_devlink_traps_unregister(devlink);
 	mlx5_devlink_auxdev_params_unregister(devlink);
 	devlink_params_unregister(devlink, mlx5_devlink_params,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index f55a89bd3736..a6819575854f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -484,10 +484,23 @@ static int handle_hca_cap_odp(struct mlx5_core_dev *dev, void *set_ctx)
 	return set_caps(dev, set_ctx, MLX5_SET_HCA_CAP_OP_MOD_ODP);
 }
 
+static int max_uc_list_get_devlink_param(struct mlx5_core_dev *dev)
+{
+	struct devlink *devlink = priv_to_devlink(dev);
+	union devlink_param_value val;
+	int err;
+
+	err = devlink_param_driverinit_value_get(devlink,
+						 DEVLINK_PARAM_GENERIC_ID_MAX_MACS,
+						 &val);
+	return err ? err : val.vu32;
+}
+
 static int handle_hca_cap(struct mlx5_core_dev *dev, void *set_ctx)
 {
 	struct mlx5_profile *prof = &dev->profile;
 	void *set_hca_cap;
+	int max_uc_list;
 	int err;
 
 	err = mlx5_core_get_caps(dev, MLX5_CAP_GENERAL);
@@ -561,6 +574,11 @@ static int handle_hca_cap(struct mlx5_core_dev *dev, void *set_ctx)
 	if (MLX5_CAP_GEN(dev, roce_rw_supported))
 		MLX5_SET(cmd_hca_cap, set_hca_cap, roce, mlx5_is_roce_init_enabled(dev));
 
+	max_uc_list = max_uc_list_get_devlink_param(dev);
+	if (max_uc_list > 0)
+		MLX5_SET(cmd_hca_cap, set_hca_cap, log_max_current_uc_list,
+			 ilog2(max_uc_list));
+
 	return set_caps(dev, set_ctx, MLX5_SET_HCA_CAP_OP_MOD_GENERAL_DEVICE);
 }
 
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 3636df90899a..d3899fc33fd7 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1621,7 +1621,7 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 
 	u8         ext_stride_num_range[0x1];
 	u8         roce_rw_supported[0x1];
-	u8         reserved_at_3a2[0x1];
+	u8         log_max_current_uc_list_wr_supported[0x1];
 	u8         log_max_stride_sz_rq[0x5];
 	u8         reserved_at_3a8[0x3];
 	u8         log_min_stride_sz_rq[0x5];
-- 
2.21.3



* Re: [PATCH net-next 0/4] net/mlx5: Memory optimizations
  2021-11-30 15:07 [PATCH net-next 0/4] net/mlx5: Memory optimizations Shay Drory
                   ` (3 preceding siblings ...)
  2021-11-30 15:07 ` [PATCH net-next 4/4] net/mlx5: Let user configure max_macs generic param Shay Drory
@ 2021-11-30 19:39 ` Jakub Kicinski
  2021-12-01  8:22   ` Shay Drory
  4 siblings, 1 reply; 11+ messages in thread
From: Jakub Kicinski @ 2021-11-30 19:39 UTC (permalink / raw)
  To: Shay Drory; +Cc: David S . Miller, jiri, saeedm, netdev, linux-kernel

On Tue, 30 Nov 2021 17:07:02 +0200 Shay Drory wrote:
>  - Patch-1 Provides I/O EQ size resource which enables to save
>    up to 128KB.
>  - Patch-2 Provides event EQ size resource which enables to save up to
>    512KB.

Why is something allocated in host memory a device resource? 🤔

Did you analyze if others may need this?


* Re: [PATCH net-next 0/4] net/mlx5: Memory optimizations
  2021-11-30 19:39 ` [PATCH net-next 0/4] net/mlx5: Memory optimizations Jakub Kicinski
@ 2021-12-01  8:22   ` Shay Drory
  2021-12-02 17:31     ` Jakub Kicinski
  0 siblings, 1 reply; 11+ messages in thread
From: Shay Drory @ 2021-12-01  8:22 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: David S . Miller, jiri, saeedm, netdev, linux-kernel


On 11/30/2021 21:39, Jakub Kicinski wrote:
> On Tue, 30 Nov 2021 17:07:02 +0200 Shay Drory wrote:
>>   - Patch-1 Provides I/O EQ size resource which enables to save
>>     up to 128KB.
>>   - Patch-2 Provides event EQ size resource which enables to save up to
>>     512KB.
> Why is something allocated in host memory a device resource? 🤔

The EQ resides in host memory. It is read-only for the host driver and written by the device.
When an interrupt is generated, an EQ entry is placed by the device and read by the driver.
The entry indicates which event occurred, such as a CQE, an async event and more.
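The mechanism described above can be sketched as follows. This is a
deliberately simplified model: the real mlx5 EQE layout and ownership
convention differ, and the names here are illustrative only. Software
consumes entries from a circular ring only when the device has written
them, which it signals via an ownership bit whose expected value flips on
each pass over the ring:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, stripped-down EQ entry; a real EQE also carries an
 * event sub-type and event-specific data. */
struct eqe {
	uint8_t type;	/* which event occurred (completion, async, ...) */
	uint8_t owner;	/* ownership bit, toggled on each pass over the ring */
};

/*
 * Poll the ring: an entry belongs to software when its owner bit matches
 * the parity of the current pass over the ring (simplified convention).
 * nent must be a power of two; ci is the consumer index, advanced as
 * entries are consumed. Returns the number of entries handled.
 */
static int eq_poll(struct eqe *eq, size_t nent, uint32_t *ci)
{
	int handled = 0;

	for (;;) {
		struct eqe *e = &eq[*ci & (nent - 1)];
		uint8_t sw_owner = (*ci / nent) & 1;

		if (e->owner != sw_owner)
			break;	/* device has not written this entry yet */
		/* dispatch on e->type here */
		(*ci)++;
		handled++;
	}
	return handled;
}
```

Sizing the ring (nent) trades host memory against how many events the
device can post before software catches up, which is what the new devlink
resources expose.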

> Did you analyze if others may need this?

So far there has been no feedback from other vendors.
The resources are implemented in a generic way, so other vendors can
implement them if they would like to.



* Re: [PATCH net-next 0/4] net/mlx5: Memory optimizations
  2021-12-01  8:22   ` Shay Drory
@ 2021-12-02 17:31     ` Jakub Kicinski
  2021-12-02 18:55       ` Saeed Mahameed
  0 siblings, 1 reply; 11+ messages in thread
From: Jakub Kicinski @ 2021-12-02 17:31 UTC (permalink / raw)
  To: Shay Drory; +Cc: David S . Miller, jiri, saeedm, netdev, linux-kernel

On Wed, 1 Dec 2021 10:22:17 +0200 Shay Drory wrote:
> On 11/30/2021 21:39, Jakub Kicinski wrote:
> > On Tue, 30 Nov 2021 17:07:02 +0200 Shay Drory wrote:  
> >>   - Patch-1 Provides I/O EQ size resource which enables to save
> >>     up to 128KB.
> >>   - Patch-2 Provides event EQ size resource which enables to save up to
> >>     512KB.  
> > Why is something allocated in host memory a device resource? 🤔  
> 
> EQ resides in the host memory. It is RO for host driver, RW by device.
> When interrupt is generated EQ entry is placed by device and read by driver.
> It indicates about what event occurred such as CQE, async and more.

I understand that. My point was the resource which is being consumed
here is _host_ memory. Is there precedent for configuring host memory
consumption via devlink resource?

I'd even question whether this belongs in devlink in the first place.
It is not global device config in any way. If devlink represents the
entire device it's rather strange to have a case where main instance
limits a size of some resource by VFs and other endpoints can still
choose whatever they want.

> > Did you analyze if others may need this?  
> 
> So far no feedback by other vendors.
> The resources are implemented in generic way, if other vendors would
> like to implement them.

Well, I was hoping you'd look around, but maybe that's too much to ask
of a vendor.


* Re: [PATCH net-next 0/4] net/mlx5: Memory optimizations
  2021-12-02 17:31     ` Jakub Kicinski
@ 2021-12-02 18:55       ` Saeed Mahameed
  2021-12-03  1:28         ` Jakub Kicinski
  0 siblings, 1 reply; 11+ messages in thread
From: Saeed Mahameed @ 2021-12-02 18:55 UTC (permalink / raw)
  To: Shay Drory, kuba; +Cc: Jiri Pirko, davem, netdev, linux-kernel

On Thu, 2021-12-02 at 09:31 -0800, Jakub Kicinski wrote:
> On Wed, 1 Dec 2021 10:22:17 +0200 Shay Drory wrote:
> > On 11/30/2021 21:39, Jakub Kicinski wrote:
> > > On Tue, 30 Nov 2021 17:07:02 +0200 Shay Drory wrote:  
> > > >   - Patch-1 Provides I/O EQ size resource which enables to save
> > > >     up to 128KB.
> > > >   - Patch-2 Provides event EQ size resource which enables to
> > > > save up to
> > > >     512KB.  
> > > Why is something allocated in host memory a device resource? 🤔  
> > 
> > EQ resides in the host memory. It is RO for host driver, RW by
> > device.
> > When interrupt is generated EQ entry is placed by device and read
> > by driver.
> > It indicates about what event occurred such as CQE, async and more.
> 
> I understand that. My point was the resource which is being consumed
> here is _host_ memory. Is there precedent for configuring host memory
> consumption via devlink resource?
> 

It's a device resource size nonetheless; the devlink resource API makes
total sense.

> I'd even question whether this belongs in devlink in the first place.
> It is not global device config in any way. If devlink represents the
> entire device it's rather strange to have a case where main instance
> limits a size of some resource by VFs and other endpoints can still
> choose whatever they want.
> 

This resource is per function instance; we have a devlink instance per
function. E.g. in the VM, there is a VF devlink instance the VM user
can use to control its own VF resources. In the PF/hypervisor, the only
devlink representation of the VF will be the devlink port function
(used for other purposes).

for example:

A tenant can fine-tune a resource size tailored to their needs via the
VF's own devlink instance.

An admin can only control or restrict the max size of a resource for
a given port function (the devlink instance that represents the VF in
the hypervisor). (Note: this patchset is not about that.)


> > > Did you analyze if others may need this?  
> > 
> > So far there has been no feedback from other vendors. The
> > resources are implemented in a generic way, so other vendors can
> > implement them if they would like.
> 
> Well, I was hoping you'd look around, but maybe that's too much to
> ask
> of a vendor.

We looked; an EQ is a common object among many other drivers. And
DEVLINK_PARAM_GENERIC_ID_MAX_MACS is already a devlink generic param,
and I am sure other vendors have limited MACs per VF :) .. so this
applies to all vendors even if they don't advertise it.
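
As a sketch (the PCI address and value below are illustrative),
max_macs would be set like any other driverinit generic param:

  $ devlink dev param set pci/0000:06:00.0 name max_macs value 16 \
        cmode driverinit
  $ devlink dev reload pci/0000:06:00.0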

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next 0/4] net/mlx5: Memory optimizations
  2021-12-02 18:55       ` Saeed Mahameed
@ 2021-12-03  1:28         ` Jakub Kicinski
  2021-12-06  8:18           ` Jiri Pirko
  0 siblings, 1 reply; 11+ messages in thread
From: Jakub Kicinski @ 2021-12-03  1:28 UTC (permalink / raw)
  To: Saeed Mahameed; +Cc: Shay Drory, Jiri Pirko, davem, netdev, linux-kernel

On Thu, 2 Dec 2021 18:55:37 +0000 Saeed Mahameed wrote:
> On Thu, 2021-12-02 at 09:31 -0800, Jakub Kicinski wrote:
> > On Wed, 1 Dec 2021 10:22:17 +0200 Shay Drory wrote:  
> > > The EQ resides in host memory. It is read-only for the host
> > > driver and read-write for the device. When an interrupt is
> > > generated, the device places an EQ entry, which the driver then
> > > reads. The entry indicates which event occurred, such as a CQE,
> > > an async event, and more.
> > 
> > I understand that. My point was the resource which is being consumed
> > here is _host_ memory. Is there precedent for configuring host memory
> > consumption via devlink resource?
> 
> it's a device resource size nonetheless; the devlink resource API
> makes total sense.

I disagree. Devlink resources were originally written to partition
finite device resources. You're just sizing a queue here.
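
For comparison, host-memory data rings are already sized through
ethtool rather than devlink (the interface name here is
illustrative):

  $ ethtool -G eth0 rx 512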

> > I'd even question whether this belongs in devlink in the first place.
> > It is not global device config in any way. If devlink represents the
> > entire device it's rather strange to have a case where main instance
> > limits a size of some resource by VFs and other endpoints can still
> > choose whatever they want.
> 
> This resource is per function instance; we have a devlink instance
> per function. E.g. in the VM, there is a VF devlink instance that
> the VM user can use to control its own VF resources. In the
> PF/hypervisor, the only devlink representation of the VF is the
> devlink port function (used for other purposes).
> 
> for example:
> 
> A tenant can fine-tune a resource size tailored to their needs via the
> VF's own devlink instance.

Yeah, because it's a device resource. Tenant can consume their host
DRAM in any way they find suitable.

> An admin can only control or restrict the max size of a resource
> for a given port function (the devlink instance that represents the
> VF in the hypervisor). (Note: this patchset is not about that.)
> 
> > > So far there has been no feedback from other vendors. The
> > > resources are implemented in a generic way, so other vendors
> > > can implement them if they would like.
> > 
> > Well, I was hoping you'd look around, but maybe that's too much to
> > ask of a vendor.  
> 
> We looked; an EQ is a common object among many other drivers. And
> DEVLINK_PARAM_GENERIC_ID_MAX_MACS is already a devlink generic
> param, and I am sure other vendors have limited MACs per VF :) ..
> so this applies to all vendors even if they don't advertise it.

Yeah, if you're not willing to model the Event Queue as a queue,
using params seems like a better idea than abusing resources.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next 0/4] net/mlx5: Memory optimizations
  2021-12-03  1:28         ` Jakub Kicinski
@ 2021-12-06  8:18           ` Jiri Pirko
  0 siblings, 0 replies; 11+ messages in thread
From: Jiri Pirko @ 2021-12-06  8:18 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Saeed Mahameed, Shay Drory, Jiri Pirko, davem, netdev, linux-kernel

Fri, Dec 03, 2021 at 02:28:03AM CET, kuba@kernel.org wrote:
>On Thu, 2 Dec 2021 18:55:37 +0000 Saeed Mahameed wrote:
>> On Thu, 2021-12-02 at 09:31 -0800, Jakub Kicinski wrote:
>> > On Wed, 1 Dec 2021 10:22:17 +0200 Shay Drory wrote:  
>> > > The EQ resides in host memory. It is read-only for the host
>> > > driver and read-write for the device. When an interrupt is
>> > > generated, the device places an EQ entry, which the driver
>> > > then reads. The entry indicates which event occurred, such as
>> > > a CQE, an async event, and more.
>> > 
>> > I understand that. My point was the resource which is being consumed
>> > here is _host_ memory. Is there precedent for configuring host memory
>> > consumption via devlink resource?
>> 
>> it's a device resource size nonetheless; the devlink resource API
>> makes total sense.
>
>I disagree. Devlink resources were originally written to partition
>finite device resources. You're just sizing a queue here.
>
>> > I'd even question whether this belongs in devlink in the first place.
>> > It is not global device config in any way. If devlink represents the
>> > entire device it's rather strange to have a case where main instance
>> > limits a size of some resource by VFs and other endpoints can still
>> > choose whatever they want.
>> 
>> This resource is per function instance; we have a devlink instance
>> per function. E.g. in the VM, there is a VF devlink instance that
>> the VM user can use to control its own VF resources. In the
>> PF/hypervisor, the only devlink representation of the VF is the
>> devlink port function (used for other purposes).
>> 
>> for example:
>> 
>> A tenant can fine-tune a resource size tailored to their needs via the
>> VF's own devlink instance.
>
>Yeah, because it's a device resource. Tenant can consume their host
>DRAM in any way they find suitable.
>
>> An admin can only control or restrict the max size of a resource
>> for a given port function (the devlink instance that represents
>> the VF in the hypervisor). (Note: this patchset is not about
>> that.)
>> 
>> > > So far there has been no feedback from other vendors. The
>> > > resources are implemented in a generic way, so other vendors
>> > > can implement them if they would like.
>> > 
>> > Well, I was hoping you'd look around, but maybe that's too much to
>> > ask of a vendor.  
>> 
>> We looked; an EQ is a common object among many other drivers. And
>> DEVLINK_PARAM_GENERIC_ID_MAX_MACS is already a devlink generic
>> param, and I am sure other vendors have limited MACs per VF :) ..
>> so this applies to all vendors even if they don't advertise it.
>
>Yeah, if you're not willing to model the Event Queue as a queue,
>using params seems like a better idea than abusing resources.

I think you are right. On second thought, params look like a better fit.
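
With params, the hypothetical equivalent would be one driverinit
param per EQ size (the param name and value below are only a sketch,
not an agreed interface):

  $ devlink dev param set pci/0000:06:00.0 name event_eq_size \
        value 1024 cmode driverinit
  $ devlink dev reload pci/0000:06:00.0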


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-12-06  8:18 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-30 15:07 [PATCH net-next 0/4] net/mlx5: Memory optimizations Shay Drory
2021-11-30 15:07 ` [PATCH net-next 1/4] net/mlx5: Let user configure io_eq_size resource Shay Drory
2021-11-30 15:07 ` [PATCH net-next 2/4] net/mlx5: Let user configure event_eq_size resource Shay Drory
2021-11-30 15:07 ` [PATCH net-next 3/4] devlink: Clarifies max_macs generic devlink param Shay Drory
2021-11-30 15:07 ` [PATCH net-next 4/4] net/mlx5: Let user configure max_macs generic param Shay Drory
2021-11-30 19:39 ` [PATCH net-next 0/4] net/mlx5: Memory optimizations Jakub Kicinski
2021-12-01  8:22   ` Shay Drory
2021-12-02 17:31     ` Jakub Kicinski
2021-12-02 18:55       ` Saeed Mahameed
2021-12-03  1:28         ` Jakub Kicinski
2021-12-06  8:18           ` Jiri Pirko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).