* [pull request][net-next 00/15] mlx5 updates 2023-01-10
@ 2023-01-11  5:30 Saeed Mahameed
  2023-01-11  5:30 ` [net-next 01/15] net/mlx5: Expose shared buffer registers bits and structs Saeed Mahameed
                   ` (14 more replies)
  0 siblings, 15 replies; 29+ messages in thread
From: Saeed Mahameed @ 2023-01-11  5:30 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan

From: Saeed Mahameed <saeedm@nvidia.com>

This series provides updates to the mlx5 driver.
For more information, please see the tag log below.

Please pull and let me know if there is any problem.

Thanks,
Saeed.


The following changes since commit a6f536063b69102adf3588fbc0bb4f08d6c8cb82:

  qed: fix a typo in comment (2023-01-10 18:13:22 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-updates-2023-01-10

for you to fetch changes up to 96c31b5b2caecae2eebb1ed0fba5dc082b2fb740:

  net/mlx5e: Use kzalloc() in mlx5e_accel_fs_tcp_create() (2023-01-10 21:24:43 -0800)

----------------------------------------------------------------
mlx5-updates-2023-01-10

1) From Gal: Add debugfs entries for the netdev NIC driver
   - kTLS, flow steering and hairpin info
   - useful for debugging and performance analysis
   - e.g. hairpin queue attributes, dump of the kTLS TX pool size, etc.

2) From Maher: Update shared buffer configuration on PFC commands
   2.1) For every change to a buffer's headroom, recalculate the size of the
       shared buffer to be equal to "total_buffer_size" - "new_headroom_size".
       The new shared buffer size is split in a ratio of 3:1 between the
       lossy and lossless pools, respectively.

   2.2) For each port buffer change, count the number of lossless buffers.
       If there is only one lossless buffer, set its lossless pool
       usage threshold to infinite. Otherwise, if there is more than
       one lossless buffer, set a usage threshold for each lossless buffer.

   While at it, add more verbosity to debug prints when handling user
   commands, to assist in future debugging.

3) From Tariq: Throttle high rate FW commands

4) From Shay: Properly initialize management PF

5) Various cleanup patches

----------------------------------------------------------------
Gal Pressman (4):
      net/mlx5e: Add Ethernet driver debugfs
      net/mlx5e: Add hairpin params structure
      net/mlx5e: Add flow steering debugfs directory
      net/mlx5e: Add hairpin debugfs files

Gustavo A. R. Silva (1):
      net/mlx5e: Replace zero-length array with flexible-array member

Kees Cook (1):
      net/mlx5e: Replace 0-length array with flexible array

Maher Sanalla (3):
      net/mlx5: Expose shared buffer registers bits and structs
      net/mlx5e: Add API to query/modify SBPR and SBCM registers
      net/mlx5e: Update shared buffer along with device buffer changes

Shay Drory (1):
      net/mlx5: Enable management PF initialization

Tariq Toukan (3):
      net/mlx5e: kTLS, Add debugfs
      net/mlx5: Introduce and use opcode getter in command interface
      net/mlx5: Prevent high-rate FW commands from populating all slots

YueHaibing (1):
      net/mlx5e: Use kzalloc() in mlx5e_accel_fs_tcp_create()

zhang songyi (1):
      net/mlx5: remove redundant ret variable

 drivers/net/ethernet/mellanox/mlx5/core/cmd.c      | 118 ++++++-----
 drivers/net/ethernet/mellanox/mlx5/core/dev.c      |   6 +
 drivers/net/ethernet/mellanox/mlx5/core/ecpf.c     |   8 +
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |   3 +-
 drivers/net/ethernet/mellanox/mlx5/core/en/fs.h    |   5 +-
 drivers/net/ethernet/mellanox/mlx5/core/en/port.c  |  72 +++++++
 drivers/net/ethernet/mellanox/mlx5/core/en/port.h  |   6 +
 .../ethernet/mellanox/mlx5/core/en/port_buffer.c   | 222 ++++++++++++++++++++-
 .../ethernet/mellanox/mlx5/core/en/port_buffer.h   |   1 +
 .../net/ethernet/mellanox/mlx5/core/en/tc/meter.c  |   2 +-
 .../ethernet/mellanox/mlx5/core/en_accel/fs_tcp.c  |   6 +-
 .../ethernet/mellanox/mlx5/core/en_accel/ktls.c    |  22 ++
 .../ethernet/mellanox/mlx5/core/en_accel/ktls.h    |   8 +
 .../ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c |  22 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c    |  22 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   9 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |   9 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    | 169 ++++++++++++++--
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |   2 +-
 .../net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c  |   3 +-
 .../ethernet/mellanox/mlx5/core/steering/dr_send.c |   5 +-
 include/linux/mlx5/driver.h                        |   8 +
 include/linux/mlx5/mlx5_ifc.h                      |  61 ++++++
 23 files changed, 706 insertions(+), 83 deletions(-)


* [net-next 01/15] net/mlx5: Expose shared buffer registers bits and structs
  2023-01-11  5:30 [pull request][net-next 00/15] mlx5 updates 2023-01-10 Saeed Mahameed
@ 2023-01-11  5:30 ` Saeed Mahameed
  2023-01-11  9:10   ` patchwork-bot+netdevbpf
  2023-01-11  5:30 ` [net-next 02/15] net/mlx5e: Add API to query/modify SBPR and SBCM registers Saeed Mahameed
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 29+ messages in thread
From: Saeed Mahameed @ 2023-01-11  5:30 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Maher Sanalla, Moshe Shemesh

From: Maher Sanalla <msanalla@nvidia.com>

Add the shared receive buffer management and configuration registers:
1. SBPR - Shared Buffer Pools Register
2. SBCM - Shared Buffer Class Management Register

Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 include/linux/mlx5/driver.h   |  2 ++
 include/linux/mlx5/mlx5_ifc.h | 61 +++++++++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+)
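
A note on reading the layout in this patch: in mlx5_ifc.h each `u8 name[N]` member stands for an N-bit field, so a member's byte offset inside the struct equals its bit offset inside the register, and each `reserved_at_<hex>` name records that offset. The user-space sketch below mirrors the SBPR layout to check this arithmetic. The struct copy and the `SBPR_BIT_OFF`, `SBPR_BIT_SZ` and `SBPR_SZ_DW` helpers are illustrative names assumed for this example, not the driver's own macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space copy of the SBPR layout: one array element per register bit. */
struct sbpr_reg_bits {
	uint8_t desc[0x1];
	uint8_t snap[0x1];
	uint8_t reserved_at_2[0x4];
	uint8_t dir[0x2];
	uint8_t reserved_at_8[0x14];
	uint8_t pool[0x4];

	uint8_t infi_size[0x1];
	uint8_t reserved_at_21[0x7];
	uint8_t size[0x18];

	uint8_t reserved_at_40[0x1c];
	uint8_t mode[0x4];

	uint8_t reserved_at_60[0x8];
	uint8_t buff_occupancy[0x18];

	uint8_t clr[0x1];
	uint8_t reserved_at_81[0x7];
	uint8_t max_buff_occupancy[0x18];

	uint8_t reserved_at_a0[0x8];
	uint8_t ext_buff_occupancy[0x18];
};

/* Hypothetical helpers: bit offset, bit width, register size in dwords. */
#define SBPR_BIT_OFF(fld) offsetof(struct sbpr_reg_bits, fld)
#define SBPR_BIT_SZ(fld)  sizeof(((struct sbpr_reg_bits *)0)->fld)
#define SBPR_SZ_DW        (sizeof(struct sbpr_reg_bits) / 32)

/* Returns 1 when every reserved_at_<hex> name matches its actual offset. */
static int sbpr_layout_is_consistent(void)
{
	return SBPR_BIT_OFF(reserved_at_2) == 0x2 &&
	       SBPR_BIT_OFF(reserved_at_8) == 0x8 &&
	       SBPR_BIT_OFF(reserved_at_21) == 0x21 &&
	       SBPR_BIT_OFF(reserved_at_40) == 0x40 &&
	       SBPR_BIT_OFF(reserved_at_60) == 0x60 &&
	       SBPR_BIT_OFF(reserved_at_81) == 0x81 &&
	       SBPR_BIT_OFF(reserved_at_a0) == 0xa0;
}
```

Under this convention the SBPR register is 0xc0 bits long, i.e. six 32-bit dwords, and the `size` field is 24 bits wide starting at bit 0x28.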

diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index d476255c9a3f..0c4f6acf59ca 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -100,6 +100,8 @@ enum {
 };
 
 enum {
+	MLX5_REG_SBPR            = 0xb001,
+	MLX5_REG_SBCM            = 0xb002,
 	MLX5_REG_QPTS            = 0x4002,
 	MLX5_REG_QETCR		 = 0x4005,
 	MLX5_REG_QTCT		 = 0x400a,
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index a9ee7bc59c90..a84bdeeed2c6 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -11000,6 +11000,67 @@ struct mlx5_ifc_pbmc_reg_bits {
 	u8         reserved_at_2e0[0x80];
 };
 
+struct mlx5_ifc_sbpr_reg_bits {
+	u8         desc[0x1];
+	u8         snap[0x1];
+	u8         reserved_at_2[0x4];
+	u8         dir[0x2];
+	u8         reserved_at_8[0x14];
+	u8         pool[0x4];
+
+	u8         infi_size[0x1];
+	u8         reserved_at_21[0x7];
+	u8         size[0x18];
+
+	u8         reserved_at_40[0x1c];
+	u8         mode[0x4];
+
+	u8         reserved_at_60[0x8];
+	u8         buff_occupancy[0x18];
+
+	u8         clr[0x1];
+	u8         reserved_at_81[0x7];
+	u8         max_buff_occupancy[0x18];
+
+	u8         reserved_at_a0[0x8];
+	u8         ext_buff_occupancy[0x18];
+};
+
+struct mlx5_ifc_sbcm_reg_bits {
+	u8         desc[0x1];
+	u8         snap[0x1];
+	u8         reserved_at_2[0x6];
+	u8         local_port[0x8];
+	u8         pnat[0x2];
+	u8         pg_buff[0x6];
+	u8         reserved_at_18[0x6];
+	u8         dir[0x2];
+
+	u8         reserved_at_20[0x1f];
+	u8         exc[0x1];
+
+	u8         reserved_at_40[0x40];
+
+	u8         reserved_at_80[0x8];
+	u8         buff_occupancy[0x18];
+
+	u8         clr[0x1];
+	u8         reserved_at_a1[0x7];
+	u8         max_buff_occupancy[0x18];
+
+	u8         reserved_at_c0[0x8];
+	u8         min_buff[0x18];
+
+	u8         infi_max[0x1];
+	u8         reserved_at_e1[0x7];
+	u8         max_buff[0x18];
+
+	u8         reserved_at_100[0x20];
+
+	u8         reserved_at_120[0x1c];
+	u8         pool[0x4];
+};
+
 struct mlx5_ifc_qtct_reg_bits {
 	u8         reserved_at_0[0x8];
 	u8         port_number[0x8];
-- 
2.39.0



* [net-next 02/15] net/mlx5e: Add API to query/modify SBPR and SBCM registers
  2023-01-11  5:30 [pull request][net-next 00/15] mlx5 updates 2023-01-10 Saeed Mahameed
  2023-01-11  5:30 ` [net-next 01/15] net/mlx5: Expose shared buffer registers bits and structs Saeed Mahameed
@ 2023-01-11  5:30 ` Saeed Mahameed
  2023-01-11  5:30 ` [net-next 03/15] net/mlx5e: Update shared buffer along with device buffer changes Saeed Mahameed
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Saeed Mahameed @ 2023-01-11  5:30 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Maher Sanalla, Moshe Shemesh

From: Maher Sanalla <msanalla@nvidia.com>

To allow users to configure shared receive buffer parameters through
dcbnl callbacks, expose an API to query and modify the SBPR and SBCM
registers. The API will be used in the next patch.

Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en/port.c | 72 +++++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/en/port.h |  6 ++
 2 files changed, 78 insertions(+)
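
The SBCM setter in this patch queries the register first and copies `exc` and `min_buff` into the write, because the input buffer starts zeroed and any field not copied in would be written as zero. The sketch below shows that read-modify-write pattern in user space. Everything named `fake_*` is hypothetical: a plain struct stands in for the bit-packed register, an array stands in for the device, and the count of 8 buffers is an assumption of this example.

```c
#include <assert.h>
#include <stdint.h>

/* Plain-field stand-in for the SBCM register; not the real bit layout. */
struct fake_sbcm {
	uint8_t  exc;
	uint32_t min_buff;
	uint8_t  infi_max;
	uint32_t max_buff;
	uint8_t  pool;
};

/* Stand-in for the device: one register instance per port buffer. */
#define FAKE_NUM_BUFFERS 8
static struct fake_sbcm fake_dev[FAKE_NUM_BUFFERS];

static int fake_query_sbcm(uint8_t pg_buff_idx, struct fake_sbcm *out)
{
	if (pg_buff_idx >= FAKE_NUM_BUFFERS)
		return -1;
	*out = fake_dev[pg_buff_idx];
	return 0;
}

static int fake_write_sbcm(uint8_t pg_buff_idx, const struct fake_sbcm *in)
{
	if (pg_buff_idx >= FAKE_NUM_BUFFERS)
		return -1;
	fake_dev[pg_buff_idx] = *in;
	return 0;
}

/*
 * Read-modify-write: fields the caller does not own (exc, min_buff)
 * are read back and rewritten unchanged; the rest come from the caller.
 */
static int fake_set_sbcm(uint8_t pg_buff_idx, uint8_t infi_max,
			 uint32_t max_buff, uint8_t pool)
{
	struct fake_sbcm cur;
	struct fake_sbcm in = {0};
	int err;

	err = fake_query_sbcm(pg_buff_idx, &cur);
	if (err)
		return err;

	in.exc = cur.exc;
	in.min_buff = cur.min_buff;
	in.infi_max = infi_max;
	in.max_buff = max_buff;
	in.pool = pool;

	return fake_write_sbcm(pg_buff_idx, &in);
}
```

Without the initial query, a caller that only wanted to change `max_buff` would also clear `exc` and `min_buff` as a side effect.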

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port.c b/drivers/net/ethernet/mellanox/mlx5/core/en/port.c
index 89510cac46c2..505ba41195b9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port.c
@@ -287,6 +287,78 @@ int mlx5e_port_set_pbmc(struct mlx5_core_dev *mdev, void *in)
 	return err;
 }
 
+int mlx5e_port_query_sbpr(struct mlx5_core_dev *mdev, u32 desc, u8 dir,
+			  u8 pool_idx, void *out, int size_out)
+{
+	u32 in[MLX5_ST_SZ_DW(sbpr_reg)] = {};
+
+	MLX5_SET(sbpr_reg, in, desc, desc);
+	MLX5_SET(sbpr_reg, in, dir, dir);
+	MLX5_SET(sbpr_reg, in, pool, pool_idx);
+
+	return mlx5_core_access_reg(mdev, in, sizeof(in), out, size_out, MLX5_REG_SBPR, 0, 0);
+}
+
+int mlx5e_port_set_sbpr(struct mlx5_core_dev *mdev, u32 desc, u8 dir,
+			u8 pool_idx, u32 infi_size, u32 size)
+{
+	u32 out[MLX5_ST_SZ_DW(sbpr_reg)] = {};
+	u32 in[MLX5_ST_SZ_DW(sbpr_reg)] = {};
+
+	MLX5_SET(sbpr_reg, in, desc, desc);
+	MLX5_SET(sbpr_reg, in, dir, dir);
+	MLX5_SET(sbpr_reg, in, pool, pool_idx);
+	MLX5_SET(sbpr_reg, in, infi_size, infi_size);
+	MLX5_SET(sbpr_reg, in, size, size);
+	MLX5_SET(sbpr_reg, in, mode, 1);
+
+	return mlx5_core_access_reg(mdev, in, sizeof(in), out, sizeof(out), MLX5_REG_SBPR, 0, 1);
+}
+
+static int mlx5e_port_query_sbcm(struct mlx5_core_dev *mdev, u32 desc,
+				 u8 pg_buff_idx, u8 dir, void *out,
+				 int size_out)
+{
+	u32 in[MLX5_ST_SZ_DW(sbcm_reg)] = {};
+
+	MLX5_SET(sbcm_reg, in, desc, desc);
+	MLX5_SET(sbcm_reg, in, local_port, 1);
+	MLX5_SET(sbcm_reg, in, pg_buff, pg_buff_idx);
+	MLX5_SET(sbcm_reg, in, dir, dir);
+
+	return mlx5_core_access_reg(mdev, in, sizeof(in), out, size_out, MLX5_REG_SBCM, 0, 0);
+}
+
+int mlx5e_port_set_sbcm(struct mlx5_core_dev *mdev, u32 desc, u8 pg_buff_idx,
+			u8 dir, u8 infi_size, u32 max_buff, u8 pool_idx)
+{
+	u32 out[MLX5_ST_SZ_DW(sbcm_reg)] = {};
+	u32 in[MLX5_ST_SZ_DW(sbcm_reg)] = {};
+	u32 min_buff;
+	int err;
+	u8 exc;
+
+	err = mlx5e_port_query_sbcm(mdev, desc, pg_buff_idx, dir, out,
+				    sizeof(out));
+	if (err)
+		return err;
+
+	exc = MLX5_GET(sbcm_reg, out, exc);
+	min_buff = MLX5_GET(sbcm_reg, out, min_buff);
+
+	MLX5_SET(sbcm_reg, in, desc, desc);
+	MLX5_SET(sbcm_reg, in, local_port, 1);
+	MLX5_SET(sbcm_reg, in, pg_buff, pg_buff_idx);
+	MLX5_SET(sbcm_reg, in, dir, dir);
+	MLX5_SET(sbcm_reg, in, exc, exc);
+	MLX5_SET(sbcm_reg, in, min_buff, min_buff);
+	MLX5_SET(sbcm_reg, in, infi_max, infi_size);
+	MLX5_SET(sbcm_reg, in, max_buff, max_buff);
+	MLX5_SET(sbcm_reg, in, pool, pool_idx);
+
+	return mlx5_core_access_reg(mdev, in, sizeof(in), out, sizeof(out), MLX5_REG_SBCM, 0, 1);
+}
+
 /* buffer[i]: buffer that priority i mapped to */
 int mlx5e_port_query_priority2buffer(struct mlx5_core_dev *mdev, u8 *buffer)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port.h b/drivers/net/ethernet/mellanox/mlx5/core/en/port.h
index 7a7defe60792..3f474e370828 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/port.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port.h
@@ -57,6 +57,12 @@ u32 mlx5e_port_speed2linkmodes(struct mlx5_core_dev *mdev, u32 speed,
 bool mlx5e_ptys_ext_supported(struct mlx5_core_dev *mdev);
 int mlx5e_port_query_pbmc(struct mlx5_core_dev *mdev, void *out);
 int mlx5e_port_set_pbmc(struct mlx5_core_dev *mdev, void *in);
+int mlx5e_port_query_sbpr(struct mlx5_core_dev *mdev, u32 desc, u8 dir,
+			  u8 pool_idx, void *out, int size_out);
+int mlx5e_port_set_sbpr(struct mlx5_core_dev *mdev, u32 desc, u8 dir,
+			u8 pool_idx, u32 infi_size, u32 size);
+int mlx5e_port_set_sbcm(struct mlx5_core_dev *mdev, u32 desc, u8 pg_buff_idx,
+			u8 dir, u8 infi_size, u32 max_buff, u8 pool_idx);
 int mlx5e_port_query_priority2buffer(struct mlx5_core_dev *mdev, u8 *buffer);
 int mlx5e_port_set_priority2buffer(struct mlx5_core_dev *mdev, u8 *buffer);
 
-- 
2.39.0



* [net-next 03/15] net/mlx5e: Update shared buffer along with device buffer changes
  2023-01-11  5:30 [pull request][net-next 00/15] mlx5 updates 2023-01-10 Saeed Mahameed
  2023-01-11  5:30 ` [net-next 01/15] net/mlx5: Expose shared buffer registers bits and structs Saeed Mahameed
  2023-01-11  5:30 ` [net-next 02/15] net/mlx5e: Add API to query/modify SBPR and SBCM registers Saeed Mahameed
@ 2023-01-11  5:30 ` Saeed Mahameed
  2023-01-11  5:30 ` [net-next 04/15] net/mlx5e: Add Ethernet driver debugfs Saeed Mahameed
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Saeed Mahameed @ 2023-01-11  5:30 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Maher Sanalla, Moshe Shemesh

From: Maher Sanalla <msanalla@nvidia.com>

Currently, the user can modify the device's receive buffer size, modify
the mapping of QoS priority groups to buffers, and change a buffer's
state to lossy or lossless via the PFC command.

However, the shared receive buffer pools are realigned in response to
such commands only while the shared buffer is under FW ownership.
When a user changes the mapping of priority groups or the buffer size,
the shared buffer is moved to SW ownership.

Therefore, for devices that support a shared buffer, handle the shared
buffer alignment in accordance with the user's requested configuration.

Specifically, the following is performed:
1. For every change to a buffer's headroom, recalculate the size of the
   shared buffer to be equal to "total_buffer_size" - "new_headroom_size".
   The new shared buffer size is split in a ratio of 3:1 between the
   lossy and lossless pools, respectively.

2. For each port buffer change, count the number of lossless buffers.
   If there is only one lossless buffer, set its lossless pool
   usage threshold to infinite. Otherwise, if there is more than
   one lossless buffer, set a usage threshold for each lossless buffer.

While at it, add more verbosity to debug prints when handling user
commands, to assist in future debugging.

Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../mellanox/mlx5/core/en/port_buffer.c       | 222 +++++++++++++++++-
 .../mellanox/mlx5/core/en/port_buffer.h       |   1 +
 2 files changed, 219 insertions(+), 4 deletions(-)
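
The two rules in the commit message above can be sketched in user space as follows. This is a minimal illustration, not the driver code: all `sb_*` names are hypothetical, sizes are assumed to be in buffer cells, and the explicit `new_headroom > total` check is an addition of this sketch that the patch itself does not contain.

```c
#include <assert.h>
#include <stdint.h>

/* Result of the 3:1 split, in buffer cells. */
struct sb_split {
	uint32_t lossy;
	uint32_t lossless;
};

/*
 * Rule 1: shared = total - new_headroom, then split 3:1.
 * Returns 0 on success, -1 if fewer than 4 cells would remain.
 */
static int sb_compute_split(uint32_t cur_headroom, uint32_t lossy_pool,
			    uint32_t lossless_pool, uint32_t new_headroom,
			    struct sb_split *out)
{
	uint32_t total = cur_headroom + lossy_pool + lossless_pool;
	uint32_t shared;

	if (new_headroom > total)	/* guard added by this sketch */
		return -1;

	shared = total - new_headroom;
	if (shared < 4)
		return -1;

	out->lossy = (shared / 4) * 3;
	out->lossless = shared / 4;
	return 0;
}

struct sb_buffer {
	uint32_t size;
	int lossy;
};

enum sb_choice {
	SB_DEFAULT,		/* buffer size is 0 */
	SB_LOSSY,		/* lossy buffer */
	SB_LOSSLESS_THRESHOLD,	/* more than one lossless buffer */
	SB_LOSSLESS_INFINITE,	/* exactly one lossless buffer */
};

/* Rule 2, step 1: count buffers that are in use and lossless. */
static unsigned int sb_count_lossless(const struct sb_buffer *b, int n)
{
	unsigned int count = 0;
	int i;

	for (i = 0; i < n; i++)
		if (b[i].size && !b[i].lossy)
			count++;
	return count;
}

/* Rule 2, step 2: pick the pool configuration for one buffer. */
static enum sb_choice sb_choose(const struct sb_buffer *b,
				unsigned int lossless_count)
{
	if (b->size == 0)
		return SB_DEFAULT;
	if (b->lossy)
		return SB_LOSSY;
	if (lossless_count > 1)
		return SB_LOSSLESS_THRESHOLD;
	return SB_LOSSLESS_INFINITE;
}
```

For example, with a current headroom of 100 cells, pool sizes of 300 and 100 cells, and a new headroom of 200 cells, the total is 500, the shared buffer becomes 300, and the split is 225 cells for the lossy pool and 75 for the lossless pool. Because of integer division, up to 3 cells can be left unassigned when the shared size is not a multiple of 4.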

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c
index c9d5d8d93994..57f4b1b50421 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c
@@ -73,6 +73,7 @@ int mlx5e_port_query_buffer(struct mlx5e_priv *priv,
 			  port_buffer->buffer[i].lossy);
 	}
 
+	port_buffer->headroom_size = total_used;
 	port_buffer->port_buffer_size =
 		MLX5_GET(pbmc_reg, out, port_buffer_size) * port_buff_cell_sz;
 	port_buffer->spare_buffer_size =
@@ -86,16 +87,204 @@ int mlx5e_port_query_buffer(struct mlx5e_priv *priv,
 	return err;
 }
 
+struct mlx5e_buffer_pool {
+	u32 infi_size;
+	u32 size;
+	u32 buff_occupancy;
+};
+
+static int mlx5e_port_query_pool(struct mlx5_core_dev *mdev,
+				 struct mlx5e_buffer_pool *buffer_pool,
+				 u32 desc, u8 dir, u8 pool_idx)
+{
+	u32 out[MLX5_ST_SZ_DW(sbpr_reg)] = {};
+	int err;
+
+	err = mlx5e_port_query_sbpr(mdev, desc, dir, pool_idx, out,
+				    sizeof(out));
+	if (err)
+		return err;
+
+	buffer_pool->size = MLX5_GET(sbpr_reg, out, size);
+	buffer_pool->infi_size = MLX5_GET(sbpr_reg, out, infi_size);
+	buffer_pool->buff_occupancy = MLX5_GET(sbpr_reg, out, buff_occupancy);
+
+	return err;
+}
+
+enum {
+	MLX5_INGRESS_DIR = 0,
+	MLX5_EGRESS_DIR = 1,
+};
+
+enum {
+	MLX5_LOSSY_POOL = 0,
+	MLX5_LOSSLESS_POOL = 1,
+};
+
+/* No limit on usage of shared buffer pool (max_buff=0) */
+#define MLX5_SB_POOL_NO_THRESHOLD  0
+/* Shared buffer pool usage threshold when calculated
+ * dynamically in alpha units. alpha=13 is equivalent to
+ * HW_alpha of [(1/128) * 2 ^ (alpha-1)] = 32, where HW_alpha
+ * equates to the following portion of the shared buffer pool:
+ * [32 / (1 + n * 32)], where *n* is the number of buffers
+ * that are using the shared buffer pool.
+ */
+#define MLX5_SB_POOL_THRESHOLD 13
+
+/* Shared buffer class management parameters */
+struct mlx5_sbcm_params {
+	u8 pool_idx;
+	u8 max_buff;
+	u8 infi_size;
+};
+
+static const struct mlx5_sbcm_params sbcm_default = {
+	.pool_idx = MLX5_LOSSY_POOL,
+	.max_buff = MLX5_SB_POOL_NO_THRESHOLD,
+	.infi_size = 0,
+};
+
+static const struct mlx5_sbcm_params sbcm_lossy = {
+	.pool_idx = MLX5_LOSSY_POOL,
+	.max_buff = MLX5_SB_POOL_NO_THRESHOLD,
+	.infi_size = 1,
+};
+
+static const struct mlx5_sbcm_params sbcm_lossless = {
+	.pool_idx = MLX5_LOSSLESS_POOL,
+	.max_buff = MLX5_SB_POOL_THRESHOLD,
+	.infi_size = 0,
+};
+
+static const struct mlx5_sbcm_params sbcm_lossless_no_threshold = {
+	.pool_idx = MLX5_LOSSLESS_POOL,
+	.max_buff = MLX5_SB_POOL_NO_THRESHOLD,
+	.infi_size = 1,
+};
+
+/**
+ * select_sbcm_params() - selects the shared buffer pool configuration
+ *
+ * @buffer: <input> port buffer to retrieve params of
+ * @lossless_buff_count: <input> number of lossless buffers in total
+ *
+ * The selection is based on the following rules:
+ * 1. If buffer size is 0, no shared buffer pool is used.
+ * 2. If buffer is lossy, use lossy shared buffer pool.
+ * 3. If there are more than 1 lossless buffers, use lossless shared buffer pool
+ *    with threshold.
+ * 4. If there is only 1 lossless buffer, use lossless shared buffer pool
+ *    without threshold.
+ *
+ * @return const struct mlx5_sbcm_params* selected values
+ */
+static const struct mlx5_sbcm_params *
+select_sbcm_params(struct mlx5e_bufferx_reg *buffer, u8 lossless_buff_count)
+{
+	if (buffer->size == 0)
+		return &sbcm_default;
+
+	if (buffer->lossy)
+		return &sbcm_lossy;
+
+	if (lossless_buff_count > 1)
+		return &sbcm_lossless;
+
+	return &sbcm_lossless_no_threshold;
+}
+
+static int port_update_pool_cfg(struct mlx5_core_dev *mdev,
+				struct mlx5e_port_buffer *port_buffer)
+{
+	const struct mlx5_sbcm_params *p;
+	u8 lossless_buff_count = 0;
+	int err;
+	int i;
+
+	if (!MLX5_CAP_GEN(mdev, sbcam_reg))
+		return 0;
+
+	for (i = 0; i < MLX5E_MAX_BUFFER; i++)
+		lossless_buff_count += ((port_buffer->buffer[i].size) &&
+				       (!(port_buffer->buffer[i].lossy)));
+
+	for (i = 0; i < MLX5E_MAX_BUFFER; i++) {
+		p = select_sbcm_params(&port_buffer->buffer[i], lossless_buff_count);
+		err = mlx5e_port_set_sbcm(mdev, 0, i,
+					  MLX5_INGRESS_DIR,
+					  p->infi_size,
+					  p->max_buff,
+					  p->pool_idx);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static int port_update_shared_buffer(struct mlx5_core_dev *mdev,
+				     u32 current_headroom_size,
+				     u32 new_headroom_size)
+{
+	struct mlx5e_buffer_pool lossless_ipool;
+	struct mlx5e_buffer_pool lossy_epool;
+	u32 lossless_ipool_size;
+	u32 shared_buffer_size;
+	u32 total_buffer_size;
+	u32 lossy_epool_size;
+	int err;
+
+	if (!MLX5_CAP_GEN(mdev, sbcam_reg))
+		return 0;
+
+	err = mlx5e_port_query_pool(mdev, &lossy_epool, 0, MLX5_EGRESS_DIR,
+				    MLX5_LOSSY_POOL);
+	if (err)
+		return err;
+
+	err = mlx5e_port_query_pool(mdev, &lossless_ipool, 0, MLX5_INGRESS_DIR,
+				    MLX5_LOSSLESS_POOL);
+	if (err)
+		return err;
+
+	total_buffer_size = current_headroom_size + lossy_epool.size +
+			    lossless_ipool.size;
+	shared_buffer_size = total_buffer_size - new_headroom_size;
+
+	if (shared_buffer_size < 4) {
+		pr_err("Requested port buffer is too large, not enough space left for shared buffer\n");
+		return -EINVAL;
+	}
+
+	/* Total shared buffer size is split in a ratio of 3:1 between
+	 * lossy and lossless pools respectively.
+	 */
+	lossy_epool_size = (shared_buffer_size / 4) * 3;
+	lossless_ipool_size = shared_buffer_size / 4;
+
+	mlx5e_port_set_sbpr(mdev, 0, MLX5_EGRESS_DIR, MLX5_LOSSY_POOL, 0,
+			    lossy_epool_size);
+	mlx5e_port_set_sbpr(mdev, 0, MLX5_INGRESS_DIR, MLX5_LOSSLESS_POOL, 0,
+			    lossless_ipool_size);
+	return 0;
+}
+
 static int port_set_buffer(struct mlx5e_priv *priv,
 			   struct mlx5e_port_buffer *port_buffer)
 {
 	u16 port_buff_cell_sz = priv->dcbx.port_buff_cell_sz;
 	struct mlx5_core_dev *mdev = priv->mdev;
 	int sz = MLX5_ST_SZ_BYTES(pbmc_reg);
+	u32 new_headroom_size = 0;
+	u32 current_headroom_size;
 	void *in;
 	int err;
 	int i;
 
+	current_headroom_size = port_buffer->headroom_size;
+
 	in = kzalloc(sz, GFP_KERNEL);
 	if (!in)
 		return -ENOMEM;
@@ -110,6 +299,7 @@ static int port_set_buffer(struct mlx5e_priv *priv,
 		u64 xoff = port_buffer->buffer[i].xoff;
 		u64 xon = port_buffer->buffer[i].xon;
 
+		new_headroom_size += size;
 		do_div(size, port_buff_cell_sz);
 		do_div(xoff, port_buff_cell_sz);
 		do_div(xon, port_buff_cell_sz);
@@ -119,6 +309,17 @@ static int port_set_buffer(struct mlx5e_priv *priv,
 		MLX5_SET(bufferx_reg, buffer, xon_threshold, xon);
 	}
 
+	new_headroom_size /= port_buff_cell_sz;
+	current_headroom_size /= port_buff_cell_sz;
+	err = port_update_shared_buffer(priv->mdev, current_headroom_size,
+					new_headroom_size);
+	if (err)
+		goto out;
+
+	err = port_update_pool_cfg(priv->mdev, port_buffer);
+	if (err)
+		goto out;
+
 	err = mlx5e_port_set_pbmc(mdev, in);
 out:
 	kfree(in);
@@ -174,6 +375,7 @@ static int update_xoff_threshold(struct mlx5e_port_buffer *port_buffer,
 
 /**
  *	update_buffer_lossy	- Update buffer configuration based on pfc
+ *	@mdev: port function core device
  *	@max_mtu: netdev's max_mtu
  *	@pfc_en: <input> current pfc configuration
  *	@buffer: <input> current prio to buffer mapping
@@ -192,7 +394,8 @@ static int update_xoff_threshold(struct mlx5e_port_buffer *port_buffer,
  *	@return: 0 if no error,
  *	sets change to true if buffer configuration was modified.
  */
-static int update_buffer_lossy(unsigned int max_mtu,
+static int update_buffer_lossy(struct mlx5_core_dev *mdev,
+			       unsigned int max_mtu,
 			       u8 pfc_en, u8 *buffer, u32 xoff, u16 port_buff_cell_sz,
 			       struct mlx5e_port_buffer *port_buffer,
 			       bool *change)
@@ -229,6 +432,10 @@ static int update_buffer_lossy(unsigned int max_mtu,
 	}
 
 	if (changed) {
+		err = port_update_pool_cfg(mdev, port_buffer);
+		if (err)
+			return err;
+
 		err = update_xoff_threshold(port_buffer, xoff, max_mtu, port_buff_cell_sz);
 		if (err)
 			return err;
@@ -293,23 +500,30 @@ int mlx5e_port_manual_buffer_config(struct mlx5e_priv *priv,
 	}
 
 	if (change & MLX5E_PORT_BUFFER_PFC) {
+		mlx5e_dbg(HW, priv, "%s: requested PFC per priority bitmask: 0x%x\n",
+			  __func__, pfc->pfc_en);
 		err = mlx5e_port_query_priority2buffer(priv->mdev, buffer);
 		if (err)
 			return err;
 
-		err = update_buffer_lossy(max_mtu, pfc->pfc_en, buffer, xoff, port_buff_cell_sz,
-					  &port_buffer, &update_buffer);
+		err = update_buffer_lossy(priv->mdev, max_mtu, pfc->pfc_en, buffer, xoff,
+					  port_buff_cell_sz, &port_buffer,
+					  &update_buffer);
 		if (err)
 			return err;
 	}
 
 	if (change & MLX5E_PORT_BUFFER_PRIO2BUFFER) {
 		update_prio2buffer = true;
+		for (i = 0; i < MLX5E_MAX_BUFFER; i++)
+			mlx5e_dbg(HW, priv, "%s: requested to map prio[%d] to buffer %d\n",
+				  __func__, i, prio2buffer[i]);
+
 		err = fill_pfc_en(priv->mdev, &curr_pfc_en);
 		if (err)
 			return err;
 
-		err = update_buffer_lossy(max_mtu, curr_pfc_en, prio2buffer, xoff,
+		err = update_buffer_lossy(priv->mdev, max_mtu, curr_pfc_en, prio2buffer, xoff,
 					  port_buff_cell_sz, &port_buffer, &update_buffer);
 		if (err)
 			return err;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.h b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.h
index 80af7a5ac604..a6ef118de758 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.h
@@ -60,6 +60,7 @@ struct mlx5e_bufferx_reg {
 struct mlx5e_port_buffer {
 	u32                       port_buffer_size;
 	u32                       spare_buffer_size;
+	u32                       headroom_size;
 	struct mlx5e_bufferx_reg  buffer[MLX5E_MAX_BUFFER];
 };
 
-- 
2.39.0



* [net-next 04/15] net/mlx5e: Add Ethernet driver debugfs
  2023-01-11  5:30 [pull request][net-next 00/15] mlx5 updates 2023-01-10 Saeed Mahameed
                   ` (2 preceding siblings ...)
  2023-01-11  5:30 ` [net-next 03/15] net/mlx5e: Update shared buffer along with device buffer changes Saeed Mahameed
@ 2023-01-11  5:30 ` Saeed Mahameed
  2023-01-11  5:30 ` [net-next 05/15] net/mlx5e: kTLS, Add debugfs Saeed Mahameed
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Saeed Mahameed @ 2023-01-11  5:30 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman

From: Gal Pressman <gal@nvidia.com>

Similar to the mlx5_core debugfs, lay the groundwork for mlx5e debugfs
files under /sys/kernel/debug/mlx5/<pci>/nic/..

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h      | 1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 2d77fb8a8a01..7cbd71f0b8ae 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -968,6 +968,7 @@ struct mlx5e_priv {
 	struct mlx5e_scratchpad    scratchpad;
 	struct mlx5e_htb          *htb;
 	struct mlx5e_mqprio_rl    *mqprio_rl;
+	struct dentry             *dfs_root;
 };
 
 struct mlx5e_rx_handlers {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index cff5f2e29e1e..16c8bbad5b33 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -35,6 +35,7 @@
 #include <net/vxlan.h>
 #include <net/geneve.h>
 #include <linux/bpf.h>
+#include <linux/debugfs.h>
 #include <linux/if_bridge.h>
 #include <linux/filter.h>
 #include <net/page_pool.h>
@@ -5931,6 +5932,9 @@ static int mlx5e_probe(struct auxiliary_device *adev,
 	priv->profile = profile;
 	priv->ppriv = NULL;
 
+	priv->dfs_root = debugfs_create_dir("nic",
+					    mlx5_debugfs_get_dev_root(priv->mdev));
+
 	err = mlx5e_devlink_port_register(priv);
 	if (err) {
 		mlx5_core_err(mdev, "mlx5e_devlink_port_register failed, %d\n", err);
@@ -5968,6 +5972,7 @@ static int mlx5e_probe(struct auxiliary_device *adev,
 err_devlink_cleanup:
 	mlx5e_devlink_port_unregister(priv);
 err_destroy_netdev:
+	debugfs_remove_recursive(priv->dfs_root);
 	mlx5e_destroy_netdev(priv);
 	return err;
 }
@@ -5982,6 +5987,7 @@ static void mlx5e_remove(struct auxiliary_device *adev)
 	mlx5e_suspend(adev, state);
 	priv->profile->cleanup(priv);
 	mlx5e_devlink_port_unregister(priv);
+	debugfs_remove_recursive(priv->dfs_root);
 	mlx5e_destroy_netdev(priv);
 }
 
-- 
2.39.0



* [net-next 05/15] net/mlx5e: kTLS, Add debugfs
  2023-01-11  5:30 [pull request][net-next 00/15] mlx5 updates 2023-01-10 Saeed Mahameed
                   ` (3 preceding siblings ...)
  2023-01-11  5:30 ` [net-next 04/15] net/mlx5e: Add Ethernet driver debugfs Saeed Mahameed
@ 2023-01-11  5:30 ` Saeed Mahameed
  2023-01-11 18:32   ` Jakub Kicinski
  2023-01-11  5:30 ` [net-next 06/15] net/mlx5e: Add hairpin params structure Saeed Mahameed
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 29+ messages in thread
From: Saeed Mahameed @ 2023-01-11  5:30 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman

From: Tariq Toukan <tariqt@nvidia.com>

Add TLS debugfs to improve observability by exposing the size of the TLS
TX pool.

To observe the size of the TX pool:
$ cat /sys/kernel/debug/mlx5/<pci>/nic/tls/tx/pool_size

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Co-developed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../mellanox/mlx5/core/en_accel/ktls.c        | 22 +++++++++++++++++++
 .../mellanox/mlx5/core/en_accel/ktls.h        |  8 +++++++
 .../mellanox/mlx5/core/en_accel/ktls_tx.c     | 22 +++++++++++++++++++
 3 files changed, 52 insertions(+)
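
For tooling that wants to sample the value rather than `cat` it, a small user-space reader might look like the sketch below. It assumes the file holds a single decimal number followed by a newline. The helper names `parse_pool_size` and `read_pool_size` are hypothetical, and the caller supplies the full path, since the `<pci>` component depends on the device.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse "<decimal>[\n]" into *out. Returns 0 on success, -1 otherwise. */
static int parse_pool_size(const char *text, unsigned long long *out)
{
	unsigned long long v;
	char *end;

	/* Reject empty input, signs and leading whitespace up front. */
	if (text[0] < '0' || text[0] > '9')
		return -1;

	errno = 0;
	v = strtoull(text, &end, 10);
	if (errno)
		return -1;
	if (*end != '\0' && *end != '\n')
		return -1;

	*out = v;
	return 0;
}

/* Read the first line of the debugfs file at 'path' and parse it. */
static int read_pool_size(const char *path, unsigned long long *out)
{
	char buf[32];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_pool_size(buf, out);
}
```

The file is created with mode 0400, so a reader like this needs to run as root, and debugfs must be mounted.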

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.c
index da2184c94203..eb5b09f81dec 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
 // Copyright (c) 2019 Mellanox Technologies.
 
+#include <linux/debugfs.h>
 #include "en.h"
 #include "lib/mlx5.h"
 #include "en_accel/ktls.h"
@@ -177,6 +178,15 @@ void mlx5e_ktls_cleanup_rx(struct mlx5e_priv *priv)
 	destroy_workqueue(priv->tls->rx_wq);
 }
 
+static void mlx5e_tls_debugfs_init(struct mlx5e_tls *tls,
+				   struct dentry *dfs_root)
+{
+	if (IS_ERR_OR_NULL(dfs_root))
+		return;
+
+	tls->debugfs.dfs = debugfs_create_dir("tls", dfs_root);
+}
+
 int mlx5e_ktls_init(struct mlx5e_priv *priv)
 {
 	struct mlx5e_tls *tls;
@@ -189,11 +199,23 @@ int mlx5e_ktls_init(struct mlx5e_priv *priv)
 		return -ENOMEM;
 
 	priv->tls = tls;
+	priv->tls->mdev = priv->mdev;
+
+	mlx5e_tls_debugfs_init(tls, priv->dfs_root);
+
 	return 0;
 }
 
 void mlx5e_ktls_cleanup(struct mlx5e_priv *priv)
 {
+	struct mlx5e_tls *tls = priv->tls;
+
+	if (!mlx5e_is_ktls_device(priv->mdev))
+		return;
+
+	debugfs_remove_recursive(tls->debugfs.dfs);
+	tls->debugfs.dfs = NULL;
+
 	kfree(priv->tls);
 	priv->tls = NULL;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.h
index 1c35045e41fb..fccf995ee16d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.h
@@ -4,6 +4,7 @@
 #ifndef __MLX5E_KTLS_H__
 #define __MLX5E_KTLS_H__
 
+#include <linux/debugfs.h>
 #include <linux/tls.h>
 #include <net/tls.h>
 #include "en.h"
@@ -72,10 +73,17 @@ struct mlx5e_tls_sw_stats {
 	atomic64_t rx_tls_del;
 };
 
+struct mlx5e_tls_debugfs {
+	struct dentry *dfs;
+	struct dentry *dfs_tx;
+};
+
 struct mlx5e_tls {
+	struct mlx5_core_dev *mdev;
 	struct mlx5e_tls_sw_stats sw_stats;
 	struct workqueue_struct *rx_wq;
 	struct mlx5e_tls_tx_pool *tx_pool;
+	struct mlx5e_tls_debugfs debugfs;
 };
 
 int mlx5e_ktls_init(struct mlx5e_priv *priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
index 78072bf93f3f..6db27062b765 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
 // Copyright (c) 2019 Mellanox Technologies.
 
+#include <linux/debugfs.h>
 #include "en_accel/ktls.h"
 #include "en_accel/ktls_txrx.h"
 #include "en_accel/ktls_utils.h"
@@ -886,8 +887,24 @@ bool mlx5e_ktls_handle_tx_skb(struct net_device *netdev, struct mlx5e_txqsq *sq,
 	return false;
 }
 
+static void mlx5e_tls_tx_debugfs_init(struct mlx5e_tls *tls,
+				      struct dentry *dfs_root)
+{
+	if (IS_ERR_OR_NULL(dfs_root))
+		return;
+
+	tls->debugfs.dfs_tx = debugfs_create_dir("tx", dfs_root);
+	if (!tls->debugfs.dfs_tx)
+		return;
+
+	debugfs_create_size_t("pool_size", 0400, tls->debugfs.dfs_tx,
+			      &tls->tx_pool->size);
+}
+
 int mlx5e_ktls_init_tx(struct mlx5e_priv *priv)
 {
+	struct mlx5e_tls *tls = priv->tls;
+
 	if (!mlx5e_is_ktls_tx(priv->mdev))
 		return 0;
 
@@ -895,6 +912,8 @@ int mlx5e_ktls_init_tx(struct mlx5e_priv *priv)
 	if (!priv->tls->tx_pool)
 		return -ENOMEM;
 
+	mlx5e_tls_tx_debugfs_init(tls, tls->debugfs.dfs);
+
 	return 0;
 }
 
@@ -903,6 +922,9 @@ void mlx5e_ktls_cleanup_tx(struct mlx5e_priv *priv)
 	if (!mlx5e_is_ktls_tx(priv->mdev))
 		return;
 
+	debugfs_remove_recursive(priv->tls->debugfs.dfs_tx);
+	priv->tls->debugfs.dfs_tx = NULL;
+
 	mlx5e_tls_tx_pool_cleanup(priv->tls->tx_pool);
 	priv->tls->tx_pool = NULL;
 }
-- 
2.39.0



* [net-next 06/15] net/mlx5e: Add hairpin params structure
  2023-01-11  5:30 [pull request][net-next 00/15] mlx5 updates 2023-01-10 Saeed Mahameed
                   ` (4 preceding siblings ...)
  2023-01-11  5:30 ` [net-next 05/15] net/mlx5e: kTLS, Add debugfs Saeed Mahameed
@ 2023-01-11  5:30 ` Saeed Mahameed
  2023-01-11  5:30 ` [net-next 07/15] net/mlx5e: Add flow steering debugfs directory Saeed Mahameed
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Saeed Mahameed @ 2023-01-11  5:30 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman

From: Gal Pressman <gal@nvidia.com>

In preparation for downstream work to expose hairpin queue parameters,
introduce a hairpin parameters struct as part of the tc structure.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   | 52 +++++++++++++------
 1 file changed, 37 insertions(+), 15 deletions(-)
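
For readers following the arithmetic, here is a minimal userspace sketch of
the defaults that mlx5e_hairpin_params_init() computes in the diff below. It
is not driver code: the struct and function names are hypothetical, the link
speed is assumed to be in Mb/s (as the 50000 constant next to the "50Gbs"
comment suggests), and the stride and capability values are passed in as
plain parameters instead of being read from the device.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct mlx5e_hairpin_params (mdev omitted). */
struct hairpin_params_sketch {
	uint32_t num_queues;
	uint32_t queue_size;
};

static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }
static uint32_t max_u32(uint32_t a, uint32_t b) { return a > b ? a : b; }

/*
 * link_speed_mbps:     maximum port link speed, assumed to be in Mb/s
 * log_stride_sz:       stands in for MLX5_MPWRQ_MIN_LOG_STRIDE_SZ(mdev)
 * log_max_num_packets: stands in for the log_max_hairpin_num_packets cap
 */
static struct hairpin_params_sketch
hairpin_params_sketch_init(uint32_t link_speed_mbps, uint32_t log_stride_sz,
			   uint32_t log_max_num_packets)
{
	struct hairpin_params_sketch p;

	/* Guard the unsigned subtraction below against wrapping. */
	assert(log_stride_sz <= 16);

	/* One hairpin pair per 50 Gb/s share of the link, at least one. */
	p.num_queues = max_u32(link_speed_mbps, 50000) / 50000;

	/* Packets per queue: 2^min(16 - stride, device maximum). */
	p.queue_size = 1u << min_u32(16 - log_stride_sz, log_max_num_packets);

	return p;
}
```

With a 100 Gb/s link, a log stride of 6 and a device maximum of 17 (all
illustrative values), this gives 2 queues of 1024 packets each.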

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 9af2aa2922f5..800442eaf9b4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -71,6 +71,12 @@
 #define MLX5E_TC_TABLE_NUM_GROUPS 4
 #define MLX5E_TC_TABLE_MAX_GROUP_SIZE BIT(18)
 
+struct mlx5e_hairpin_params {
+	struct mlx5_core_dev *mdev;
+	u32 num_queues;
+	u32 queue_size;
+};
+
 struct mlx5e_tc_table {
 	/* Protects the dynamic assignment of the t parameter
 	 * which is the nic tc root table.
@@ -93,6 +99,7 @@ struct mlx5e_tc_table {
 
 	struct mlx5_tc_ct_priv         *ct;
 	struct mapping_ctx             *mapping;
+	struct mlx5e_hairpin_params    hairpin_params;
 };
 
 struct mlx5e_tc_attr_to_reg_mapping mlx5e_tc_attr_to_reg_mappings[] = {
@@ -1016,6 +1023,26 @@ static int mlx5e_hairpin_get_prio(struct mlx5e_priv *priv,
 	return 0;
 }
 
+static void
+mlx5e_hairpin_params_init(struct mlx5e_hairpin_params *hairpin_params,
+			  struct mlx5_core_dev *mdev)
+{
+	u64 link_speed64;
+	u32 link_speed;
+
+	hairpin_params->mdev = mdev;
+	/* set hairpin pair per each 50Gbs share of the link */
+	mlx5e_port_max_linkspeed(mdev, &link_speed);
+	link_speed = max_t(u32, link_speed, 50000);
+	link_speed64 = link_speed;
+	do_div(link_speed64, 50000);
+	hairpin_params->num_queues = link_speed64;
+
+	hairpin_params->queue_size =
+		BIT(min_t(u32, 16 - MLX5_MPWRQ_MIN_LOG_STRIDE_SZ(mdev),
+			  MLX5_CAP_GEN(mdev, log_max_hairpin_num_packets)));
+}
+
 static int mlx5e_hairpin_flow_add(struct mlx5e_priv *priv,
 				  struct mlx5e_tc_flow *flow,
 				  struct mlx5e_tc_flow_parse_attr *parse_attr,
@@ -1027,8 +1054,6 @@ static int mlx5e_hairpin_flow_add(struct mlx5e_priv *priv,
 	struct mlx5_core_dev *peer_mdev;
 	struct mlx5e_hairpin_entry *hpe;
 	struct mlx5e_hairpin *hp;
-	u64 link_speed64;
-	u32 link_speed;
 	u8 match_prio;
 	u16 peer_id;
 	int err;
@@ -1081,21 +1106,16 @@ static int mlx5e_hairpin_flow_add(struct mlx5e_priv *priv,
 		 hash_hairpin_info(peer_id, match_prio));
 	mutex_unlock(&tc->hairpin_tbl_lock);
 
-	params.log_data_size = clamp_t(u8, 16,
-				       MLX5_CAP_GEN(priv->mdev, log_min_hairpin_wq_data_sz),
-				       MLX5_CAP_GEN(priv->mdev, log_max_hairpin_wq_data_sz));
-	params.log_num_packets = params.log_data_size -
-				 MLX5_MPWRQ_MIN_LOG_STRIDE_SZ(priv->mdev);
-	params.log_num_packets = min_t(u8, params.log_num_packets,
-				       MLX5_CAP_GEN(priv->mdev, log_max_hairpin_num_packets));
+	params.log_num_packets = ilog2(tc->hairpin_params.queue_size);
+	params.log_data_size =
+		clamp_t(u32,
+			params.log_num_packets +
+				MLX5_MPWRQ_MIN_LOG_STRIDE_SZ(priv->mdev),
+			MLX5_CAP_GEN(priv->mdev, log_min_hairpin_wq_data_sz),
+			MLX5_CAP_GEN(priv->mdev, log_max_hairpin_wq_data_sz));
 
 	params.q_counter = priv->q_counter;
-	/* set hairpin pair per each 50Gbs share of the link */
-	mlx5e_port_max_linkspeed(priv->mdev, &link_speed);
-	link_speed = max_t(u32, link_speed, 50000);
-	link_speed64 = link_speed;
-	do_div(link_speed64, 50000);
-	params.num_channels = link_speed64;
+	params.num_channels = tc->hairpin_params.num_queues;
 
 	hp = mlx5e_hairpin_create(priv, &params, peer_ifindex);
 	hpe->hp = hp;
@@ -5217,6 +5237,8 @@ int mlx5e_tc_nic_init(struct mlx5e_priv *priv)
 	tc->ct = mlx5_tc_ct_init(priv, tc->chains, &tc->mod_hdr,
 				 MLX5_FLOW_NAMESPACE_KERNEL, tc->post_act);
 
+	mlx5e_hairpin_params_init(&tc->hairpin_params, dev);
+
 	tc->netdevice_nb.notifier_call = mlx5e_tc_netdev_event;
 	err = register_netdevice_notifier_dev_net(priv->netdev,
 						  &tc->netdevice_nb,
-- 
2.39.0



* [net-next 07/15] net/mlx5e: Add flow steering debugfs directory
  2023-01-11  5:30 [pull request][net-next 00/15] mlx5 updates 2023-01-10 Saeed Mahameed
                   ` (5 preceding siblings ...)
  2023-01-11  5:30 ` [net-next 06/15] net/mlx5e: Add hairpin params structure Saeed Mahameed
@ 2023-01-11  5:30 ` Saeed Mahameed
  2023-01-11  5:30 ` [net-next 08/15] net/mlx5e: Add hairpin debugfs files Saeed Mahameed
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Saeed Mahameed @ 2023-01-11  5:30 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman

From: Gal Pressman <gal@nvidia.com>

Add a debugfs directory for flow steering related information.
The directory is currently empty, and will hold the 'tc' subdirectory in
a downstream patch.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en/fs.h   |  5 ++++-
 .../net/ethernet/mellanox/mlx5/core/en_fs.c   | 22 ++++++++++++++++++-
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  3 ++-
 .../net/ethernet/mellanox/mlx5/core/en_rep.c  |  9 +++++---
 .../ethernet/mellanox/mlx5/core/ipoib/ipoib.c |  3 ++-
 5 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h b/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h
index 379c6dc9a3be..5233d4daca41 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h
@@ -145,7 +145,8 @@ void mlx5e_destroy_flow_steering(struct mlx5e_flow_steering *fs, bool ntuple,
 
 struct mlx5e_flow_steering *mlx5e_fs_init(const struct mlx5e_profile *profile,
 					  struct mlx5_core_dev *mdev,
-					  bool state_destroy);
+					  bool state_destroy,
+					  struct dentry *dfs_root);
 void mlx5e_fs_cleanup(struct mlx5e_flow_steering *fs);
 struct mlx5e_vlan_table *mlx5e_fs_get_vlan(struct mlx5e_flow_steering *fs);
 void mlx5e_fs_set_tc(struct mlx5e_flow_steering *fs, struct mlx5e_tc_table *tc);
@@ -189,6 +190,8 @@ int mlx5e_fs_vlan_rx_kill_vid(struct mlx5e_flow_steering *fs,
 			      __be16 proto, u16 vid);
 void mlx5e_fs_init_l2_addr(struct mlx5e_flow_steering *fs, struct net_device *netdev);
 
+struct dentry *mlx5e_fs_get_debugfs_root(struct mlx5e_flow_steering *fs);
+
 #define fs_err(fs, fmt, ...) \
 	mlx5_core_err(mlx5e_fs_get_mdev(fs), fmt, ##__VA_ARGS__)
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
index 1892ccb889b3..7298fe782e9e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
@@ -30,6 +30,7 @@
  * SOFTWARE.
  */
 
+#include <linux/debugfs.h>
 #include <linux/list.h>
 #include <linux/ip.h>
 #include <linux/ipv6.h>
@@ -67,6 +68,7 @@ struct mlx5e_flow_steering {
 	struct mlx5e_fs_udp            *udp;
 	struct mlx5e_fs_any            *any;
 	struct mlx5e_ptp_fs            *ptp_fs;
+	struct dentry                  *dfs_root;
 };
 
 static int mlx5e_add_l2_flow_rule(struct mlx5e_flow_steering *fs,
@@ -104,6 +106,11 @@ static inline int mlx5e_hash_l2(const u8 *addr)
 	return addr[5];
 }
 
+struct dentry *mlx5e_fs_get_debugfs_root(struct mlx5e_flow_steering *fs)
+{
+	return fs->dfs_root;
+}
+
 static void mlx5e_add_l2_to_hash(struct hlist_head *hash, const u8 *addr)
 {
 	struct mlx5e_l2_hash_node *hn;
@@ -1429,9 +1436,19 @@ static int mlx5e_fs_ethtool_alloc(struct mlx5e_flow_steering *fs)
 static void mlx5e_fs_ethtool_free(struct mlx5e_flow_steering *fs) { }
 #endif
 
+static void mlx5e_fs_debugfs_init(struct mlx5e_flow_steering *fs,
+				  struct dentry *dfs_root)
+{
+	if (IS_ERR_OR_NULL(dfs_root))
+		return;
+
+	fs->dfs_root = debugfs_create_dir("fs", dfs_root);
+}
+
 struct mlx5e_flow_steering *mlx5e_fs_init(const struct mlx5e_profile *profile,
 					  struct mlx5_core_dev *mdev,
-					  bool state_destroy)
+					  bool state_destroy,
+					  struct dentry *dfs_root)
 {
 	struct mlx5e_flow_steering *fs;
 	int err;
@@ -1458,6 +1475,8 @@ struct mlx5e_flow_steering *mlx5e_fs_init(const struct mlx5e_profile *profile,
 	if (err)
 		goto err_free_tc;
 
+	mlx5e_fs_debugfs_init(fs, dfs_root);
+
 	return fs;
 err_free_tc:
 	mlx5e_fs_tc_free(fs);
@@ -1471,6 +1490,7 @@ struct mlx5e_flow_steering *mlx5e_fs_init(const struct mlx5e_profile *profile,
 
 void mlx5e_fs_cleanup(struct mlx5e_flow_steering *fs)
 {
+	debugfs_remove_recursive(fs->dfs_root);
 	mlx5e_fs_ethtool_free(fs);
 	mlx5e_fs_tc_free(fs);
 	mlx5e_fs_vlan_free(fs);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 16c8bbad5b33..cef8df9cd42b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -5231,7 +5231,8 @@ static int mlx5e_nic_init(struct mlx5_core_dev *mdev,
 	mlx5e_timestamp_init(priv);
 
 	fs = mlx5e_fs_init(priv->profile, mdev,
-			   !test_bit(MLX5E_STATE_DESTROYING, &priv->state));
+			   !test_bit(MLX5E_STATE_DESTROYING, &priv->state),
+			   priv->dfs_root);
 	if (!fs) {
 		err = -ENOMEM;
 		mlx5_core_err(mdev, "FS initialization failed, %d\n", err);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 75b9e1528fd2..eecaf46c55de 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -788,8 +788,10 @@ static int mlx5e_init_rep(struct mlx5_core_dev *mdev,
 {
 	struct mlx5e_priv *priv = netdev_priv(netdev);
 
-	priv->fs = mlx5e_fs_init(priv->profile, mdev,
-				 !test_bit(MLX5E_STATE_DESTROYING, &priv->state));
+	priv->fs =
+		mlx5e_fs_init(priv->profile, mdev,
+			      !test_bit(MLX5E_STATE_DESTROYING, &priv->state),
+			      priv->dfs_root);
 	if (!priv->fs) {
 		netdev_err(priv->netdev, "FS allocation failed\n");
 		return -ENOMEM;
@@ -807,7 +809,8 @@ static int mlx5e_init_ul_rep(struct mlx5_core_dev *mdev,
 	struct mlx5e_priv *priv = netdev_priv(netdev);
 
 	priv->fs = mlx5e_fs_init(priv->profile, mdev,
-				 !test_bit(MLX5E_STATE_DESTROYING, &priv->state));
+				 !test_bit(MLX5E_STATE_DESTROYING, &priv->state),
+				 priv->dfs_root);
 	if (!priv->fs) {
 		netdev_err(priv->netdev, "FS allocation failed\n");
 		return -ENOMEM;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
index 2c73c8445e63..dd4b255c416b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
@@ -374,7 +374,8 @@ static int mlx5i_init_rx(struct mlx5e_priv *priv)
 	int err;
 
 	priv->fs = mlx5e_fs_init(priv->profile, mdev,
-				 !test_bit(MLX5E_STATE_DESTROYING, &priv->state));
+				 !test_bit(MLX5E_STATE_DESTROYING, &priv->state),
+				 priv->dfs_root);
 	if (!priv->fs) {
 		netdev_err(priv->netdev, "FS allocation failed\n");
 		return -ENOMEM;
-- 
2.39.0



* [net-next 08/15] net/mlx5e: Add hairpin debugfs files
  2023-01-11  5:30 [pull request][net-next 00/15] mlx5 updates 2023-01-10 Saeed Mahameed
                   ` (6 preceding siblings ...)
  2023-01-11  5:30 ` [net-next 07/15] net/mlx5e: Add flow steering debugfs directory Saeed Mahameed
@ 2023-01-11  5:30 ` Saeed Mahameed
  2023-01-11 18:34   ` Jakub Kicinski
  2023-01-11  5:30 ` [net-next 09/15] net/mlx5: Enable management PF initialization Saeed Mahameed
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 29+ messages in thread
From: Saeed Mahameed @ 2023-01-11  5:30 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman

From: Gal Pressman <gal@nvidia.com>

We refer to a TC NIC rule that involves forwarding as "hairpin".
Hairpin queues are an mlx5 hardware-specific implementation of hardware
forwarding for such packets.

For debug purposes, introduce debugfs files which:
* Expose the number of active hairpins
* Dump the hairpin table
* Allow control over the number and size of the hairpin queues instead
  of relying on hard-coded values.

This gives us visibility into the feature in order to improve it
for next generation hardware.

Add debugfs files:
  fs/tc/hairpin_num_active
  fs/tc/hairpin_num_queues
  fs/tc/hairpin_queue_size
  fs/tc/hairpin_table_dump

Note that new values only take effect the next time queues are created;
existing queues are not affected.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   | 117 ++++++++++++++++++
 1 file changed, 117 insertions(+)
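
The validation applied on a write to hairpin_queue_size can be sketched in
userspace as follows. This is only an illustration of the logic in
debugfs_hairpin_queue_size_set() from the diff below: the function names are
hypothetical, the device capability is passed in as a plain parameter, and
the explicit rejection of zero is an assumption of this sketch, not something
the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Smallest power of two >= v, for v >= 1 (userspace stand-in for the
 * kernel's roundup_pow_of_two()).
 */
static uint64_t roundup_pow_of_two_sketch(uint64_t v)
{
	uint64_t p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/*
 * Mirrors the setter: reject values above 2^log_max_num_packets, otherwise
 * store the value rounded up to a power of two.
 * Assumption of this sketch only: zero is rejected as well.
 */
static int hairpin_queue_size_set_sketch(uint32_t *queue_size, uint64_t val,
					 unsigned int log_max_num_packets)
{
	uint64_t max;

	assert(queue_size);
	assert(log_max_num_packets < 32);
	max = 1ull << log_max_num_packets;

	if (val == 0 || val > max)
		return -EINVAL;

	/* max is itself a power of two, so the result never exceeds it. */
	*queue_size = (uint32_t)roundup_pow_of_two_sketch(val);
	return 0;
}
```

For example, with a device maximum of 2^10 packets, writing 1000 stores 1024,
while writing 1025 is rejected and leaves the stored value untouched.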

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 800442eaf9b4..99a7edb88661 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -100,6 +100,7 @@ struct mlx5e_tc_table {
 	struct mlx5_tc_ct_priv         *ct;
 	struct mapping_ctx             *mapping;
 	struct mlx5e_hairpin_params    hairpin_params;
+	struct dentry                  *dfs_root;
 };
 
 struct mlx5e_tc_attr_to_reg_mapping mlx5e_tc_attr_to_reg_mappings[] = {
@@ -1023,6 +1024,118 @@ static int mlx5e_hairpin_get_prio(struct mlx5e_priv *priv,
 	return 0;
 }
 
+static int debugfs_hairpin_queues_set(void *data, u64 val)
+{
+	struct mlx5e_hairpin_params *hp = data;
+
+	if (!val) {
+		mlx5_core_err(hp->mdev,
+			      "Number of hairpin queues must be > 0\n");
+		return -EINVAL;
+	}
+
+	hp->num_queues = val;
+
+	return 0;
+}
+
+static int debugfs_hairpin_queues_get(void *data, u64 *val)
+{
+	struct mlx5e_hairpin_params *hp = data;
+
+	*val = hp->num_queues;
+
+	return 0;
+}
+DEFINE_DEBUGFS_ATTRIBUTE(fops_hairpin_queues, debugfs_hairpin_queues_get,
+			 debugfs_hairpin_queues_set, "%llu\n");
+
+static int debugfs_hairpin_queue_size_set(void *data, u64 val)
+{
+	struct mlx5e_hairpin_params *hp = data;
+
+	if (val > BIT(MLX5_CAP_GEN(hp->mdev, log_max_hairpin_num_packets))) {
+		mlx5_core_err(hp->mdev,
+			      "Invalid hairpin queue size, must be <= %lu\n",
+			      BIT(MLX5_CAP_GEN(hp->mdev,
+					       log_max_hairpin_num_packets)));
+		return -EINVAL;
+	}
+
+	hp->queue_size = roundup_pow_of_two(val);
+
+	return 0;
+}
+
+static int debugfs_hairpin_queue_size_get(void *data, u64 *val)
+{
+	struct mlx5e_hairpin_params *hp = data;
+
+	*val = hp->queue_size;
+
+	return 0;
+}
+DEFINE_DEBUGFS_ATTRIBUTE(fops_hairpin_queue_size,
+			 debugfs_hairpin_queue_size_get,
+			 debugfs_hairpin_queue_size_set, "%llu\n");
+
+static int debugfs_hairpin_num_active_get(void *data, u64 *val)
+{
+	struct mlx5e_tc_table *tc = data;
+	struct mlx5e_hairpin_entry *hpe;
+	u32 cnt = 0;
+	u32 bkt;
+
+	mutex_lock(&tc->hairpin_tbl_lock);
+	hash_for_each(tc->hairpin_tbl, bkt, hpe, hairpin_hlist)
+		cnt++;
+	mutex_unlock(&tc->hairpin_tbl_lock);
+
+	*val = cnt;
+
+	return 0;
+}
+DEFINE_DEBUGFS_ATTRIBUTE(fops_hairpin_num_active,
+			 debugfs_hairpin_num_active_get, NULL, "%llu\n");
+
+static int debugfs_hairpin_table_dump_show(struct seq_file *file, void *priv)
+
+{
+	struct mlx5e_tc_table *tc = file->private;
+	struct mlx5e_hairpin_entry *hpe;
+	u32 bkt;
+
+	mutex_lock(&tc->hairpin_tbl_lock);
+	hash_for_each(tc->hairpin_tbl, bkt, hpe, hairpin_hlist)
+		seq_printf(file, "Hairpin peer_vhca_id %u prio %u refcnt %u\n",
+			   hpe->peer_vhca_id, hpe->prio,
+			   refcount_read(&hpe->refcnt));
+	mutex_unlock(&tc->hairpin_tbl_lock);
+
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(debugfs_hairpin_table_dump);
+
+static void mlx5e_tc_debugfs_init(struct mlx5e_tc_table *tc,
+				  struct dentry *dfs_root)
+{
+	if (IS_ERR_OR_NULL(dfs_root))
+		return;
+
+	tc->dfs_root = debugfs_create_dir("tc", dfs_root);
+	if (!tc->dfs_root)
+		return;
+
+	debugfs_create_file("hairpin_num_queues", 0644, tc->dfs_root,
+			    &tc->hairpin_params, &fops_hairpin_queues);
+	debugfs_create_file("hairpin_queue_size", 0644, tc->dfs_root,
+			    &tc->hairpin_params, &fops_hairpin_queue_size);
+	debugfs_create_file("hairpin_num_active", 0444, tc->dfs_root, tc,
+			    &fops_hairpin_num_active);
+	debugfs_create_file("hairpin_table_dump", 0444, tc->dfs_root, tc,
+			    &debugfs_hairpin_table_dump_fops);
+}
+
 static void
 mlx5e_hairpin_params_init(struct mlx5e_hairpin_params *hairpin_params,
 			  struct mlx5_core_dev *mdev)
@@ -5249,6 +5362,8 @@ int mlx5e_tc_nic_init(struct mlx5e_priv *priv)
 		goto err_reg;
 	}
 
+	mlx5e_tc_debugfs_init(tc, mlx5e_fs_get_debugfs_root(priv->fs));
+
 	return 0;
 
 err_reg:
@@ -5277,6 +5392,8 @@ void mlx5e_tc_nic_cleanup(struct mlx5e_priv *priv)
 {
 	struct mlx5e_tc_table *tc = mlx5e_fs_get_tc(priv->fs);
 
+	debugfs_remove_recursive(tc->dfs_root);
+
 	if (tc->netdevice_nb.notifier_call)
 		unregister_netdevice_notifier_dev_net(priv->netdev,
 						      &tc->netdevice_nb,
-- 
2.39.0



* [net-next 09/15] net/mlx5: Enable management PF initialization
  2023-01-11  5:30 [pull request][net-next 00/15] mlx5 updates 2023-01-10 Saeed Mahameed
                   ` (7 preceding siblings ...)
  2023-01-11  5:30 ` [net-next 08/15] net/mlx5e: Add hairpin debugfs files Saeed Mahameed
@ 2023-01-11  5:30 ` Saeed Mahameed
  2023-01-11  5:30 ` [net-next 10/15] net/mlx5: Introduce and use opcode getter in command interface Saeed Mahameed
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Saeed Mahameed @ 2023-01-11  5:30 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Shay Drory,
	Eran Ben Elisha, Moshe Shemesh

From: Shay Drory <shayd@nvidia.com>

Enable initialization of the DPU Management PF, a new loopback PF
designed for communication with the BMC.
For now, the Management PF neither supports nor requires most upper layer
protocols, so avoid them.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/dev.c     | 6 ++++++
 drivers/net/ethernet/mellanox/mlx5/core/ecpf.c    | 8 ++++++++
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 2 +-
 include/linux/mlx5/driver.h                       | 5 +++++
 4 files changed, 20 insertions(+), 1 deletion(-)
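
The gating added by this patch can be modelled in userspace as below. This is
a simplified sketch, not driver code: the struct and the *_sketch helpers are
hypothetical, the capability fields are plain struct members instead of
MLX5_CAP_GEN() reads, and the real mlx5_eth_supported() performs further
checks that are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flattened view of the capabilities the patch looks at. */
struct dev_caps_sketch {
	uint8_t num_ports;
	uint8_t native_port_num;
	bool embedded_cpu;	/* ECPF */
	bool port_type_eth;
};

/* Same predicate as mlx5_core_is_management_pf() in the diff. */
static bool is_management_pf_sketch(const struct dev_caps_sketch *c)
{
	assert(c);
	return c->num_ports == 1 && !c->native_port_num;
}

/* Ethernet is skipped on the management PF (other checks omitted). */
static bool eth_supported_sketch(const struct dev_caps_sketch *c)
{
	if (is_management_pf_sketch(c))
		return false;
	return c->port_type_eth;
}

/* Host PF init only applies to an ECPF that is not the management PF,
 * since the management PF has no peer PF.
 */
static bool needs_host_pf_init_sketch(const struct dev_caps_sketch *c)
{
	return c->embedded_cpu && !is_management_pf_sketch(c);
}
```

A device reporting one port and a native_port_num of zero is treated as the
management PF, so both helpers return false for it even when it is an ECPF
with an Ethernet port type.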

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/dev.c b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
index 0571e40c6ee5..5b6b0b126e52 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/dev.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
@@ -59,6 +59,9 @@ bool mlx5_eth_supported(struct mlx5_core_dev *dev)
 	if (!IS_ENABLED(CONFIG_MLX5_CORE_EN))
 		return false;
 
+	if (mlx5_core_is_management_pf(dev))
+		return false;
+
 	if (MLX5_CAP_GEN(dev, port_type) != MLX5_CAP_PORT_TYPE_ETH)
 		return false;
 
@@ -198,6 +201,9 @@ bool mlx5_rdma_supported(struct mlx5_core_dev *dev)
 	if (!IS_ENABLED(CONFIG_MLX5_INFINIBAND))
 		return false;
 
+	if (mlx5_core_is_management_pf(dev))
+		return false;
+
 	if (dev->priv.flags & MLX5_PRIV_FLAGS_DISABLE_IB_ADEV)
 		return false;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ecpf.c b/drivers/net/ethernet/mellanox/mlx5/core/ecpf.c
index 464eb3a18450..b70e36025d92 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ecpf.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ecpf.c
@@ -75,6 +75,10 @@ int mlx5_ec_init(struct mlx5_core_dev *dev)
 	if (!mlx5_core_is_ecpf(dev))
 		return 0;
 
+	/* Management PF don't have a peer PF */
+	if (mlx5_core_is_management_pf(dev))
+		return 0;
+
 	return mlx5_host_pf_init(dev);
 }
 
@@ -85,6 +89,10 @@ void mlx5_ec_cleanup(struct mlx5_core_dev *dev)
 	if (!mlx5_core_is_ecpf(dev))
 		return;
 
+	/* Management PF don't have a peer PF */
+	if (mlx5_core_is_management_pf(dev))
+		return;
+
 	mlx5_host_pf_cleanup(dev);
 
 	err = mlx5_wait_for_pages(dev, &dev->priv.host_pf_pages);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 0dfd5742c6fe..bbb6dab3b21f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1488,7 +1488,7 @@ int mlx5_esw_sf_max_hpf_functions(struct mlx5_core_dev *dev, u16 *max_sfs, u16 *
 	void *hca_caps;
 	int err;
 
-	if (!mlx5_core_is_ecpf(dev)) {
+	if (!mlx5_core_is_ecpf(dev) || mlx5_core_is_management_pf(dev)) {
 		*max_sfs = 0;
 		return 0;
 	}
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 0c4f6acf59ca..50a5780367fa 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1202,6 +1202,11 @@ static inline bool mlx5_core_is_vf(const struct mlx5_core_dev *dev)
 	return dev->coredev_type == MLX5_COREDEV_VF;
 }
 
+static inline bool mlx5_core_is_management_pf(const struct mlx5_core_dev *dev)
+{
+	return MLX5_CAP_GEN(dev, num_ports) == 1 && !MLX5_CAP_GEN(dev, native_port_num);
+}
+
 static inline bool mlx5_core_is_ecpf(const struct mlx5_core_dev *dev)
 {
 	return dev->caps.embedded_cpu;
-- 
2.39.0



* [net-next 10/15] net/mlx5: Introduce and use opcode getter in command interface
  2023-01-11  5:30 [pull request][net-next 00/15] mlx5 updates 2023-01-10 Saeed Mahameed
                   ` (8 preceding siblings ...)
  2023-01-11  5:30 ` [net-next 09/15] net/mlx5: Enable management PF initialization Saeed Mahameed
@ 2023-01-11  5:30 ` Saeed Mahameed
  2023-01-11  5:30 ` [net-next 11/15] net/mlx5: Prevent high-rate FW commands from populating all slots Saeed Mahameed
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Saeed Mahameed @ 2023-01-11  5:30 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Moshe Shemesh

From: Tariq Toukan <tariqt@nvidia.com>

Introduce an opcode getter in the FW command interface, and use it.
Initialize the entry's opcode field early in cmd_alloc_ent() and use it
when possible.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 88 +++++++++----------
 1 file changed, 42 insertions(+), 46 deletions(-)
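
As a rough illustration of what the getter extracts: per the
mlx5_ifc_mbox_in_bits layout in the diff below, the opcode occupies the first
0x10 bits of the input mailbox. The userspace sketch below assumes the
mailbox is big-endian on the wire and uses hypothetical function names. It
shows that reading the field directly is equivalent to the removed
"be32_to_cpu(lay->in[0]) >> 16" expression.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read the 16-bit opcode field from the start of a big-endian mailbox. */
static uint16_t in_to_opcode_sketch(const uint8_t *in, size_t len)
{
	assert(in && len >= 4);
	return (uint16_t)((in[0] << 8) | in[1]);
}

/* Same value via the first 32-bit word, as the removed line computed it. */
static uint16_t opcode_from_dword0_sketch(const uint8_t *in, size_t len)
{
	uint32_t dw0;

	assert(in && len >= 4);
	dw0 = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
	      ((uint32_t)in[2] << 8) | (uint32_t)in[3];
	return (uint16_t)(dw0 >> 16);
}
```

Since the opcode is fully determined by the input buffer, it can be cached in
the work entry at allocation time and reused by every later log message, which
is what the ent->op assignment in cmd_alloc_ent() does.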

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index d3ca745d107d..541eecfdd598 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -47,6 +47,25 @@
 #define CREATE_TRACE_POINTS
 #include "diag/cmd_tracepoint.h"
 
+struct mlx5_ifc_mbox_out_bits {
+	u8         status[0x8];
+	u8         reserved_at_8[0x18];
+
+	u8         syndrome[0x20];
+
+	u8         reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_mbox_in_bits {
+	u8         opcode[0x10];
+	u8         uid[0x10];
+
+	u8         reserved_at_20[0x10];
+	u8         op_mod[0x10];
+
+	u8         reserved_at_40[0x40];
+};
+
 enum {
 	CMD_IF_REV = 5,
 };
@@ -70,6 +89,11 @@ enum {
 	MLX5_CMD_DELIVERY_STAT_CMD_DESCR_ERR		= 0x10,
 };
 
+static u16 in_to_opcode(void *in)
+{
+	return MLX5_GET(mbox_in, in, opcode);
+}
+
 static struct mlx5_cmd_work_ent *
 cmd_alloc_ent(struct mlx5_cmd *cmd, struct mlx5_cmd_msg *in,
 	      struct mlx5_cmd_msg *out, void *uout, int uout_size,
@@ -91,6 +115,7 @@ cmd_alloc_ent(struct mlx5_cmd *cmd, struct mlx5_cmd_msg *in,
 	ent->context	= context;
 	ent->cmd	= cmd;
 	ent->page_queue = page_queue;
+	ent->op         = in_to_opcode(in->first.data);
 	refcount_set(&ent->refcnt, 1);
 
 	return ent;
@@ -752,25 +777,6 @@ static int cmd_status_to_err(u8 status)
 	}
 }
 
-struct mlx5_ifc_mbox_out_bits {
-	u8         status[0x8];
-	u8         reserved_at_8[0x18];
-
-	u8         syndrome[0x20];
-
-	u8         reserved_at_40[0x40];
-};
-
-struct mlx5_ifc_mbox_in_bits {
-	u8         opcode[0x10];
-	u8         uid[0x10];
-
-	u8         reserved_at_20[0x10];
-	u8         op_mod[0x10];
-
-	u8         reserved_at_40[0x40];
-};
-
 void mlx5_cmd_out_err(struct mlx5_core_dev *dev, u16 opcode, u16 op_mod, void *out)
 {
 	u32 syndrome = MLX5_GET(mbox_out, out, syndrome);
@@ -788,7 +794,7 @@ static void cmd_status_print(struct mlx5_core_dev *dev, void *in, void *out)
 	u16 opcode, op_mod;
 	u16 uid;
 
-	opcode = MLX5_GET(mbox_in, in, opcode);
+	opcode = in_to_opcode(in);
 	op_mod = MLX5_GET(mbox_in, in, op_mod);
 	uid    = MLX5_GET(mbox_in, in, uid);
 
@@ -800,7 +806,7 @@ int mlx5_cmd_check(struct mlx5_core_dev *dev, int err, void *in, void *out)
 {
 	/* aborted due to PCI error or via reset flow mlx5_cmd_trigger_completions() */
 	if (err == -ENXIO) {
-		u16 opcode = MLX5_GET(mbox_in, in, opcode);
+		u16 opcode = in_to_opcode(in);
 		u32 syndrome;
 		u8 status;
 
@@ -829,9 +835,9 @@ static void dump_command(struct mlx5_core_dev *dev,
 			 struct mlx5_cmd_work_ent *ent, int input)
 {
 	struct mlx5_cmd_msg *msg = input ? ent->in : ent->out;
-	u16 op = MLX5_GET(mbox_in, ent->lay->in, opcode);
 	struct mlx5_cmd_mailbox *next = msg->next;
 	int n = mlx5_calc_cmd_blocks(msg);
+	u16 op = ent->op;
 	int data_only;
 	u32 offset = 0;
 	int dump_len;
@@ -883,11 +889,6 @@ static void dump_command(struct mlx5_core_dev *dev,
 	mlx5_core_dbg(dev, "cmd[%d]: end dump\n", ent->idx);
 }
 
-static u16 msg_to_opcode(struct mlx5_cmd_msg *in)
-{
-	return MLX5_GET(mbox_in, in->first.data, opcode);
-}
-
 static void mlx5_cmd_comp_handler(struct mlx5_core_dev *dev, u64 vec, bool forced);
 
 static void cb_timeout_handler(struct work_struct *work)
@@ -905,13 +906,13 @@ static void cb_timeout_handler(struct work_struct *work)
 	/* Maybe got handled by eq recover ? */
 	if (!test_bit(MLX5_CMD_ENT_STATE_PENDING_COMP, &ent->state)) {
 		mlx5_core_warn(dev, "cmd[%d]: %s(0x%x) Async, recovered after timeout\n", ent->idx,
-			       mlx5_command_str(msg_to_opcode(ent->in)), msg_to_opcode(ent->in));
+			       mlx5_command_str(ent->op), ent->op);
 		goto out; /* phew, already handled */
 	}
 
 	ent->ret = -ETIMEDOUT;
 	mlx5_core_warn(dev, "cmd[%d]: %s(0x%x) Async, timeout. Will cause a leak of a command resource\n",
-		       ent->idx, mlx5_command_str(msg_to_opcode(ent->in)), msg_to_opcode(ent->in));
+		       ent->idx, mlx5_command_str(ent->op), ent->op);
 	mlx5_cmd_comp_handler(dev, 1ULL << ent->idx, true);
 
 out:
@@ -985,7 +986,6 @@ static void cmd_work_handler(struct work_struct *work)
 	ent->lay = lay;
 	memset(lay, 0, sizeof(*lay));
 	memcpy(lay->in, ent->in->first.data, sizeof(lay->in));
-	ent->op = be32_to_cpu(lay->in[0]) >> 16;
 	if (ent->in->next)
 		lay->in_ptr = cpu_to_be64(ent->in->next->dma);
 	lay->inlen = cpu_to_be32(ent->in->len);
@@ -1098,12 +1098,12 @@ static void wait_func_handle_exec_timeout(struct mlx5_core_dev *dev,
 	 */
 	if (wait_for_completion_timeout(&ent->done, timeout)) {
 		mlx5_core_warn(dev, "cmd[%d]: %s(0x%x) recovered after timeout\n", ent->idx,
-			       mlx5_command_str(msg_to_opcode(ent->in)), msg_to_opcode(ent->in));
+			       mlx5_command_str(ent->op), ent->op);
 		return;
 	}
 
 	mlx5_core_warn(dev, "cmd[%d]: %s(0x%x) No done completion\n", ent->idx,
-		       mlx5_command_str(msg_to_opcode(ent->in)), msg_to_opcode(ent->in));
+		       mlx5_command_str(ent->op), ent->op);
 
 	ent->ret = -ETIMEDOUT;
 	mlx5_cmd_comp_handler(dev, 1ULL << ent->idx, true);
@@ -1130,12 +1130,10 @@ static int wait_func(struct mlx5_core_dev *dev, struct mlx5_cmd_work_ent *ent)
 
 	if (err == -ETIMEDOUT) {
 		mlx5_core_warn(dev, "%s(0x%x) timeout. Will cause a leak of a command resource\n",
-			       mlx5_command_str(msg_to_opcode(ent->in)),
-			       msg_to_opcode(ent->in));
+			       mlx5_command_str(ent->op), ent->op);
 	} else if (err == -ECANCELED) {
 		mlx5_core_warn(dev, "%s(0x%x) canceled on out of queue timeout.\n",
-			       mlx5_command_str(msg_to_opcode(ent->in)),
-			       msg_to_opcode(ent->in));
+			       mlx5_command_str(ent->op), ent->op);
 	}
 	mlx5_core_dbg(dev, "err %d, delivery status %s(%d)\n",
 		      err, deliv_status_to_str(ent->status), ent->status);
@@ -1169,7 +1167,6 @@ static int mlx5_cmd_invoke(struct mlx5_core_dev *dev, struct mlx5_cmd_msg *in,
 	u8 status = 0;
 	int err = 0;
 	s64 ds;
-	u16 op;
 
 	if (callback && page_queue)
 		return -EINVAL;
@@ -1209,9 +1206,8 @@ static int mlx5_cmd_invoke(struct mlx5_core_dev *dev, struct mlx5_cmd_msg *in,
 		goto out_free;
 
 	ds = ent->ts2 - ent->ts1;
-	op = MLX5_GET(mbox_in, in->first.data, opcode);
-	if (op < MLX5_CMD_OP_MAX) {
-		stats = &cmd->stats[op];
+	if (ent->op < MLX5_CMD_OP_MAX) {
+		stats = &cmd->stats[ent->op];
 		spin_lock_irq(&stats->lock);
 		stats->sum += ds;
 		++stats->n;
@@ -1219,7 +1215,7 @@ static int mlx5_cmd_invoke(struct mlx5_core_dev *dev, struct mlx5_cmd_msg *in,
 	}
 	mlx5_core_dbg_mask(dev, 1 << MLX5_CMD_TIME,
 			   "fw exec time for %s is %lld nsec\n",
-			   mlx5_command_str(op), ds);
+			   mlx5_command_str(ent->op), ds);
 
 out_free:
 	status = ent->status;
@@ -1816,7 +1812,7 @@ static struct mlx5_cmd_msg *alloc_msg(struct mlx5_core_dev *dev, int in_size,
 
 static int is_manage_pages(void *in)
 {
-	return MLX5_GET(mbox_in, in, opcode) == MLX5_CMD_OP_MANAGE_PAGES;
+	return in_to_opcode(in) == MLX5_CMD_OP_MANAGE_PAGES;
 }
 
 /*  Notes:
@@ -1827,8 +1823,8 @@ static int cmd_exec(struct mlx5_core_dev *dev, void *in, int in_size, void *out,
 		    int out_size, mlx5_cmd_cbk_t callback, void *context,
 		    bool force_polling)
 {
-	u16 opcode = MLX5_GET(mbox_in, in, opcode);
 	struct mlx5_cmd_msg *inb, *outb;
+	u16 opcode = in_to_opcode(in);
 	int pages_queue;
 	gfp_t gfp;
 	u8 token;
@@ -1950,8 +1946,8 @@ static int cmd_status_err(struct mlx5_core_dev *dev, int err, u16 opcode, u16 op
 int mlx5_cmd_do(struct mlx5_core_dev *dev, void *in, int in_size, void *out, int out_size)
 {
 	int err = cmd_exec(dev, in, in_size, out, out_size, NULL, NULL, false);
-	u16 opcode = MLX5_GET(mbox_in, in, opcode);
 	u16 op_mod = MLX5_GET(mbox_in, in, op_mod);
+	u16 opcode = in_to_opcode(in);
 
 	return cmd_status_err(dev, err, opcode, op_mod, out);
 }
@@ -1996,8 +1992,8 @@ int mlx5_cmd_exec_polling(struct mlx5_core_dev *dev, void *in, int in_size,
 			  void *out, int out_size)
 {
 	int err = cmd_exec(dev, in, in_size, out, out_size, NULL, NULL, true);
-	u16 opcode = MLX5_GET(mbox_in, in, opcode);
 	u16 op_mod = MLX5_GET(mbox_in, in, op_mod);
+	u16 opcode = in_to_opcode(in);
 
 	err = cmd_status_err(dev, err, opcode, op_mod, out);
 	return mlx5_cmd_check(dev, err, in, out);
@@ -2049,7 +2045,7 @@ int mlx5_cmd_exec_cb(struct mlx5_async_ctx *ctx, void *in, int in_size,
 
 	work->ctx = ctx;
 	work->user_callback = callback;
-	work->opcode = MLX5_GET(mbox_in, in, opcode);
+	work->opcode = in_to_opcode(in);
 	work->op_mod = MLX5_GET(mbox_in, in, op_mod);
 	work->out = out;
 	if (WARN_ON(!atomic_inc_not_zero(&ctx->num_inflight)))
-- 
2.39.0
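
For readers outside the driver, the opcode extraction that this patch
centralizes can be sketched in plain userspace C. It assumes, as the
removed "be32_to_cpu(lay->in[0]) >> 16" line suggests, that the opcode
occupies the upper 16 bits of the first big-endian dword of the command
inbox. The helper names read_be32() and inbox_to_opcode() are hypothetical
and are not part of the mlx5 API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: assemble a big-endian 32-bit value from raw bytes. */
uint32_t read_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical stand-in for an opcode getter. Assumption: the opcode is
 * stored in the upper 16 bits of the first dword of the inbox.
 */
uint16_t inbox_to_opcode(const uint8_t *inbox, size_t len)
{
	assert(len >= 4);
	return (uint16_t)(read_be32(inbox) >> 16);
}
```

Reading the value once and caching it, as the patch does with ent->op,
avoids repeating this decode in every warning and statistics path.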



* [net-next 11/15] net/mlx5: Prevent high-rate FW commands from populating all slots
  2023-01-11  5:30 [pull request][net-next 00/15] mlx5 updates 2023-01-10 Saeed Mahameed
                   ` (9 preceding siblings ...)
  2023-01-11  5:30 ` [net-next 10/15] net/mlx5: Introduce and use opcode getter in command interface Saeed Mahameed
@ 2023-01-11  5:30 ` Saeed Mahameed
  2023-01-11  5:30 ` [net-next 12/15] net/mlx5e: Replace zero-length array with flexible-array member Saeed Mahameed
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Saeed Mahameed @ 2023-01-11  5:30 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Moshe Shemesh

From: Tariq Toukan <tariqt@nvidia.com>

Certain connection-based device-offload protocols (like TLS) use
per-connection HW objects to track the state, maintain the context, and
perform the offload properly. Some of these objects are created,
modified, and destroyed via FW commands. Under a high connection rate,
FW commands of this type might continuously populate all slots of the FW
command interface and throttle it, starving other critical control
FW commands.

Limit these throttle commands to at most half of the FW command
interface slots. The maximal FW command rate is not hurt: the same high
rate is still reached with this limitation applied.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 30 ++++++++++++++++++-
 include/linux/mlx5/driver.h                   |  1 +
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 541eecfdd598..24da9c5e63e3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -94,6 +94,21 @@ static u16 in_to_opcode(void *in)
 	return MLX5_GET(mbox_in, in, opcode);
 }
 
+/* Returns true for opcodes that might be triggered very frequently and throttle
+ * the command interface. Limit their command slots usage.
+ */
+static bool mlx5_cmd_is_throttle_opcode(u16 op)
+{
+	switch (op) {
+	case MLX5_CMD_OP_CREATE_GENERAL_OBJECT:
+	case MLX5_CMD_OP_DESTROY_GENERAL_OBJECT:
+	case MLX5_CMD_OP_MODIFY_GENERAL_OBJECT:
+	case MLX5_CMD_OP_QUERY_GENERAL_OBJECT:
+		return true;
+	}
+	return false;
+}
+
 static struct mlx5_cmd_work_ent *
 cmd_alloc_ent(struct mlx5_cmd *cmd, struct mlx5_cmd_msg *in,
 	      struct mlx5_cmd_msg *out, void *uout, int uout_size,
@@ -1825,6 +1840,7 @@ static int cmd_exec(struct mlx5_core_dev *dev, void *in, int in_size, void *out,
 {
 	struct mlx5_cmd_msg *inb, *outb;
 	u16 opcode = in_to_opcode(in);
+	bool throttle_op;
 	int pages_queue;
 	gfp_t gfp;
 	u8 token;
@@ -1833,13 +1849,21 @@ static int cmd_exec(struct mlx5_core_dev *dev, void *in, int in_size, void *out,
 	if (mlx5_cmd_is_down(dev) || !opcode_allowed(&dev->cmd, opcode))
 		return -ENXIO;
 
+	throttle_op = mlx5_cmd_is_throttle_opcode(opcode);
+	if (throttle_op) {
+		/* atomic context may not sleep */
+		if (callback)
+			return -EINVAL;
+		down(&dev->cmd.throttle_sem);
+	}
+
 	pages_queue = is_manage_pages(in);
 	gfp = callback ? GFP_ATOMIC : GFP_KERNEL;
 
 	inb = alloc_msg(dev, in_size, gfp);
 	if (IS_ERR(inb)) {
 		err = PTR_ERR(inb);
-		return err;
+		goto out_up;
 	}
 
 	token = alloc_token(&dev->cmd);
@@ -1873,6 +1897,9 @@ static int cmd_exec(struct mlx5_core_dev *dev, void *in, int in_size, void *out,
 	mlx5_free_cmd_msg(dev, outb);
 out_in:
 	free_msg(dev, inb);
+out_up:
+	if (throttle_op)
+		up(&dev->cmd.throttle_sem);
 	return err;
 }
 
@@ -2222,6 +2249,7 @@ int mlx5_cmd_init(struct mlx5_core_dev *dev)
 
 	sema_init(&cmd->sem, cmd->max_reg_cmds);
 	sema_init(&cmd->pages_sem, 1);
+	sema_init(&cmd->throttle_sem, DIV_ROUND_UP(cmd->max_reg_cmds, 2));
 
 	cmd_h = (u32)((u64)(cmd->dma) >> 32);
 	cmd_l = (u32)(cmd->dma);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 50a5780367fa..7c393da396b1 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -310,6 +310,7 @@ struct mlx5_cmd {
 	struct workqueue_struct *wq;
 	struct semaphore sem;
 	struct semaphore pages_sem;
+	struct semaphore throttle_sem;
 	int	mode;
 	u16     allowed_opcode;
 	struct mlx5_cmd_work_ent *ent_arr[MLX5_MAX_COMMANDS];
-- 
2.39.0
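
The slot accounting in the patch above can be illustrated with a small
userspace model. This is only a sketch: the patch itself uses a kernel
semaphore initialized to DIV_ROUND_UP(cmd->max_reg_cmds, 2) and brackets
throttled opcodes with down()/up(), whereas the struct and function names
below are hypothetical, single-threaded, and never sleep.

```c
#include <assert.h>
#include <stdbool.h>

/* Same rounding as the kernel's DIV_ROUND_UP(), redefined for userspace. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical single-threaded model of the throttle semaphore. */
struct throttle_model {
	int free_slots;
};

/* Reserve half of the command slots (rounded up) for throttled opcodes. */
void throttle_init(struct throttle_model *t, int max_reg_cmds)
{
	assert(max_reg_cmds > 0);
	t->free_slots = DIV_ROUND_UP(max_reg_cmds, 2);
}

/* Non-blocking analogue of down(): returns false where the real
 * semaphore would put the caller to sleep.
 */
bool throttle_try_acquire(struct throttle_model *t)
{
	if (t->free_slots == 0)
		return false;
	t->free_slots--;
	return true;
}

/* Analogue of up(): return the slot once the command has completed. */
void throttle_release(struct throttle_model *t)
{
	t->free_slots++;
}
```

With 31 regular command slots, for example, the model admits 16 throttled
commands at once and leaves the remaining slots to other commands. Because
the real down() may sleep, the patch rejects throttled opcodes issued with
a callback, since that path may run in atomic context.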



* [net-next 12/15] net/mlx5e: Replace zero-length array with flexible-array member
  2023-01-11  5:30 [pull request][net-next 00/15] mlx5 updates 2023-01-10 Saeed Mahameed
                   ` (10 preceding siblings ...)
  2023-01-11  5:30 ` [net-next 11/15] net/mlx5: Prevent high-rate FW commands from populating all slots Saeed Mahameed
@ 2023-01-11  5:30 ` Saeed Mahameed
  2023-01-11  5:30 ` [net-next 13/15] net/mlx5e: Replace 0-length array with flexible array Saeed Mahameed
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Saeed Mahameed @ 2023-01-11  5:30 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Gustavo A. R. Silva

From: "Gustavo A. R. Silva" <gustavoars@kernel.org>

Zero-length arrays are deprecated[1] and we are moving towards
adopting C99 flexible-array members instead. So, replace zero-length
array declaration in struct mlx5e_flow_meter_aso_obj with flex-array
member.

This helps with the ongoing efforts to tighten the FORTIFY_SOURCE
routines on memcpy() and helps us make progress towards globally
enabling -fstrict-flex-arrays=3 [2].

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays [1]
Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html [2]
Link: https://github.com/KSPP/linux/issues/78
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/tc/meter.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/meter.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/meter.c
index 78af8a3175bf..7758a425bfa8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/meter.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/meter.c
@@ -28,7 +28,7 @@ struct mlx5e_flow_meter_aso_obj {
 	int base_id;
 	int total_meters;
 
-	unsigned long meters_map[0]; /* must be at the end of this struct */
+	unsigned long meters_map[]; /* must be at the end of this struct */
 };
 
 struct mlx5e_flow_meters {
-- 
2.39.0
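
As a userspace illustration of the C99 flexible-array member adopted
above, the sketch below allocates a header plus a trailing bitmap in one
block. The struct is only loosely modeled on the one in the patch;
struct meter_obj, meter_obj_alloc() and BITS_PER_ULONG are hypothetical
names, and calloc() stands in for the kernel allocator.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical struct ending in a C99 flexible-array member. */
struct meter_obj {
	int base_id;
	int total_meters;
	unsigned long meters_map[]; /* must be the last member */
};

/* Allocate the header plus enough zeroed bitmap words to hold
 * total_meters bits. Returns NULL on allocation failure.
 */
struct meter_obj *meter_obj_alloc(int base_id, int total_meters)
{
	struct meter_obj *obj;
	size_t words;

	assert(total_meters > 0);
	words = ((size_t)total_meters + BITS_PER_ULONG - 1) / BITS_PER_ULONG;
	obj = calloc(1, sizeof(*obj) + words * sizeof(obj->meters_map[0]));
	if (!obj)
		return NULL;
	obj->base_id = base_id;
	obj->total_meters = total_meters;
	return obj;
}
```

Unlike the [0] form, the [] form tells the compiler that the array size
is unknown rather than zero, which is what lets bounds checking be
tightened. In kernel code the size computation is usually written with
struct_size() instead of open-coded arithmetic.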



* [net-next 13/15] net/mlx5e: Replace 0-length array with flexible array
  2023-01-11  5:30 [pull request][net-next 00/15] mlx5 updates 2023-01-10 Saeed Mahameed
                   ` (11 preceding siblings ...)
  2023-01-11  5:30 ` [net-next 12/15] net/mlx5e: Replace zero-length array with flexible-array member Saeed Mahameed
@ 2023-01-11  5:30 ` Saeed Mahameed
  2023-01-11  5:30 ` [net-next 14/15] net/mlx5: remove redundant ret variable Saeed Mahameed
  2023-01-11  5:30 ` [net-next 15/15] net/mlx5e: Use kzalloc() in mlx5e_accel_fs_tcp_create() Saeed Mahameed
  14 siblings, 0 replies; 29+ messages in thread
From: Saeed Mahameed @ 2023-01-11  5:30 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Kees Cook, Leon Romanovsky,
	Gustavo A. R. Silva, linux-rdma, Jiri Pirko

From: Kees Cook <keescook@chromium.org>

Zero-length arrays are deprecated[1]. Replace struct mlx5e_rx_wqe_cyc's
"data" 0-length array with a flexible array. Detected with GCC 13,
using -fstrict-flex-arrays=3:

drivers/net/ethernet/mellanox/mlx5/core/en_main.c: In function 'mlx5e_alloc_rq':
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:827:42: warning: array subscript f is outside array bounds of 'struct mlx5_wqe_data_seg[0]' [-Warray-bounds=]
  827 |                                 wqe->data[f].byte_count = 0;
      |                                 ~~~~~~~~~^~~
In file included from drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h:11,
                 from drivers/net/ethernet/mellanox/mlx5/core/eswitch.h:48,
                 from drivers/net/ethernet/mellanox/mlx5/core/en_main.c:42:
drivers/net/ethernet/mellanox/mlx5/core/en.h:250:39: note: while referencing 'data'
  250 |         struct mlx5_wqe_data_seg      data[0];
      |                                       ^~~~

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays

Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: netdev@vger.kernel.org
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 7cbd71f0b8ae..82573ac722d1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -247,7 +247,7 @@ struct mlx5e_rx_wqe_ll {
 };
 
 struct mlx5e_rx_wqe_cyc {
-	struct mlx5_wqe_data_seg      data[0];
+	DECLARE_FLEX_ARRAY(struct mlx5_wqe_data_seg, data);
 };
 
 struct mlx5e_umr_wqe {
-- 
2.39.0



* [net-next 14/15] net/mlx5: remove redundant ret variable
  2023-01-11  5:30 [pull request][net-next 00/15] mlx5 updates 2023-01-10 Saeed Mahameed
                   ` (12 preceding siblings ...)
  2023-01-11  5:30 ` [net-next 13/15] net/mlx5e: Replace 0-length array with flexible array Saeed Mahameed
@ 2023-01-11  5:30 ` Saeed Mahameed
  2023-01-11  5:30 ` [net-next 15/15] net/mlx5e: Use kzalloc() in mlx5e_accel_fs_tcp_create() Saeed Mahameed
  14 siblings, 0 replies; 29+ messages in thread
From: Saeed Mahameed @ 2023-01-11  5:30 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, zhang songyi, Roi Dayan,
	Jiri Pirko

From: zhang songyi <zhang.songyi@zte.com.cn>

In mlx5dr_send_postsend_action(), return the value of
dr_postsend_icm_data() directly instead of storing it in a redundant
variable first.

Signed-off-by: zhang songyi <zhang.songyi@zte.com.cn>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c
index a4476cb4c3b3..fd2d31cdbcf9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c
@@ -724,7 +724,6 @@ int mlx5dr_send_postsend_action(struct mlx5dr_domain *dmn,
 				struct mlx5dr_action *action)
 {
 	struct postsend_info send_info = {};
-	int ret;
 
 	send_info.write.addr = (uintptr_t)action->rewrite->data;
 	send_info.write.length = action->rewrite->num_of_actions *
@@ -734,9 +733,7 @@ int mlx5dr_send_postsend_action(struct mlx5dr_domain *dmn,
 		mlx5dr_icm_pool_get_chunk_mr_addr(action->rewrite->chunk);
 	send_info.rkey = mlx5dr_icm_pool_get_chunk_rkey(action->rewrite->chunk);
 
-	ret = dr_postsend_icm_data(dmn, &send_info);
-
-	return ret;
+	return dr_postsend_icm_data(dmn, &send_info);
 }
 
 static int dr_modify_qp_rst2init(struct mlx5_core_dev *mdev,
-- 
2.39.0



* [net-next 15/15] net/mlx5e: Use kzalloc() in mlx5e_accel_fs_tcp_create()
  2023-01-11  5:30 [pull request][net-next 00/15] mlx5 updates 2023-01-10 Saeed Mahameed
                   ` (13 preceding siblings ...)
  2023-01-11  5:30 ` [net-next 14/15] net/mlx5: remove redundant ret variable Saeed Mahameed
@ 2023-01-11  5:30 ` Saeed Mahameed
  14 siblings, 0 replies; 29+ messages in thread
From: Saeed Mahameed @ 2023-01-11  5:30 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, YueHaibing

From: YueHaibing <yuehaibing@huawei.com>

'accel_tcp' is currently allocated with kvzalloc(), but it is only a
small chunk. Use kzalloc() directly instead of kvzalloc().

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/fs_tcp.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/fs_tcp.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/fs_tcp.c
index d7c020f72401..88a5aed9d678 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/fs_tcp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/fs_tcp.c
@@ -365,7 +365,7 @@ void mlx5e_accel_fs_tcp_destroy(struct mlx5e_flow_steering *fs)
 	for (i = 0; i < ACCEL_FS_TCP_NUM_TYPES; i++)
 		accel_fs_tcp_destroy_table(fs, i);
 
-	kvfree(accel_tcp);
+	kfree(accel_tcp);
 	mlx5e_fs_set_accel_tcp(fs, NULL);
 }
 
@@ -377,7 +377,7 @@ int mlx5e_accel_fs_tcp_create(struct mlx5e_flow_steering *fs)
 	if (!MLX5_CAP_FLOWTABLE_NIC_RX(mlx5e_fs_get_mdev(fs), ft_field_support.outer_ip_version))
 		return -EOPNOTSUPP;
 
-	accel_tcp = kvzalloc(sizeof(*accel_tcp), GFP_KERNEL);
+	accel_tcp = kzalloc(sizeof(*accel_tcp), GFP_KERNEL);
 	if (!accel_tcp)
 		return -ENOMEM;
 	mlx5e_fs_set_accel_tcp(fs, accel_tcp);
@@ -397,7 +397,7 @@ int mlx5e_accel_fs_tcp_create(struct mlx5e_flow_steering *fs)
 err_destroy_tables:
 	while (--i >= 0)
 		accel_fs_tcp_destroy_table(fs, i);
-	kvfree(accel_tcp);
+	kfree(accel_tcp);
 	mlx5e_fs_set_accel_tcp(fs, NULL);
 	return err;
 }
-- 
2.39.0



* Re: [net-next 01/15] net/mlx5: Expose shared buffer registers bits and structs
  2023-01-11  5:30 ` [net-next 01/15] net/mlx5: Expose shared buffer registers bits and structs Saeed Mahameed
@ 2023-01-11  9:10   ` patchwork-bot+netdevbpf
  0 siblings, 0 replies; 29+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-01-11  9:10 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: davem, kuba, pabeni, edumazet, saeedm, netdev, tariqt, msanalla, moshe

Hello:

This series was applied to netdev/net-next.git (master)
by Saeed Mahameed <saeedm@nvidia.com>:

On Tue, 10 Jan 2023 21:30:31 -0800 you wrote:
> From: Maher Sanalla <msanalla@nvidia.com>
> 
> Add the shared receive buffer management and configuration registers:
> 1. SBPR - Shared Buffer Pools Register
> 2. SBCM - Shared Buffer Class Management Register
> 
> Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
> Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
> 
> [...]

Here is the summary with links:
  - [net-next,01/15] net/mlx5: Expose shared buffer registers bits and structs
    https://git.kernel.org/netdev/net-next/c/8d231dbc3b10
  - [net-next,02/15] net/mlx5e: Add API to query/modify SBPR and SBCM registers
    https://git.kernel.org/netdev/net-next/c/11f0996d5c60
  - [net-next,03/15] net/mlx5e: Update shared buffer along with device buffer changes
    https://git.kernel.org/netdev/net-next/c/a440030d8946
  - [net-next,04/15] net/mlx5e: Add Ethernet driver debugfs
    https://git.kernel.org/netdev/net-next/c/288eca60cc31
  - [net-next,05/15] net/mlx5e: kTLS, Add debugfs
    https://git.kernel.org/netdev/net-next/c/0fedee1ae9ef
  - [net-next,06/15] net/mlx5e: Add hairpin params structure
    https://git.kernel.org/netdev/net-next/c/1a8034720f38
  - [net-next,07/15] net/mlx5e: Add flow steering debugfs directory
    https://git.kernel.org/netdev/net-next/c/3a3da78dd258
  - [net-next,08/15] net/mlx5e: Add hairpin debugfs files
    https://git.kernel.org/netdev/net-next/c/0e414518d6d8
  - [net-next,09/15] net/mlx5: Enable management PF initialization
    https://git.kernel.org/netdev/net-next/c/fe998a3c77b9
  - [net-next,10/15] net/mlx5: Introduce and use opcode getter in command interface
    https://git.kernel.org/netdev/net-next/c/7cb5eb937231
  - [net-next,11/15] net/mlx5: Prevent high-rate FW commands from populating all slots
    https://git.kernel.org/netdev/net-next/c/63fbae0a74c3
  - [net-next,12/15] net/mlx5e: Replace zero-length array with flexible-array member
    https://git.kernel.org/netdev/net-next/c/7193b436b56e
  - [net-next,13/15] net/mlx5e: Replace 0-length array with flexible array
    https://git.kernel.org/netdev/net-next/c/7bd1099c7ede
  - [net-next,14/15] net/mlx5: remove redundant ret variable
    https://git.kernel.org/netdev/net-next/c/4238654ce166
  - [net-next,15/15] net/mlx5e: Use kzalloc() in mlx5e_accel_fs_tcp_create()
    https://git.kernel.org/netdev/net-next/c/96c31b5b2cae

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [net-next 05/15] net/mlx5e: kTLS, Add debugfs
  2023-01-11  5:30 ` [net-next 05/15] net/mlx5e: kTLS, Add debugfs Saeed Mahameed
@ 2023-01-11 18:32   ` Jakub Kicinski
  2023-01-11 20:20     ` Saeed Mahameed
  0 siblings, 1 reply; 29+ messages in thread
From: Jakub Kicinski @ 2023-01-11 18:32 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Saeed Mahameed,
	netdev, Tariq Toukan, Gal Pressman

On Tue, 10 Jan 2023 21:30:35 -0800 Saeed Mahameed wrote:
> Add TLS debugfs to improve observability by exposing the size of the tls
> TX pool.

What is the TLS TX pool?


* Re: [net-next 08/15] net/mlx5e: Add hairpin debugfs files
  2023-01-11  5:30 ` [net-next 08/15] net/mlx5e: Add hairpin debugfs files Saeed Mahameed
@ 2023-01-11 18:34   ` Jakub Kicinski
  2023-01-11 20:46     ` Saeed Mahameed
  0 siblings, 1 reply; 29+ messages in thread
From: Jakub Kicinski @ 2023-01-11 18:34 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Saeed Mahameed,
	netdev, Tariq Toukan, Gal Pressman

On Tue, 10 Jan 2023 21:30:38 -0800 Saeed Mahameed wrote:
> +	debugfs_create_file("hairpin_num_queues", 0644, tc->dfs_root,
> +			    &tc->hairpin_params, &fops_hairpin_queues);
> +	debugfs_create_file("hairpin_queue_size", 0644, tc->dfs_root,
> +			    &tc->hairpin_params, &fops_hairpin_queue_size);

debugfs should be read-only, please LMK if I'm missing something,
otherwise this series is getting reverted


* Re: [net-next 05/15] net/mlx5e: kTLS, Add debugfs
  2023-01-11 18:32   ` Jakub Kicinski
@ 2023-01-11 20:20     ` Saeed Mahameed
  2023-01-11 20:52       ` Jakub Kicinski
  0 siblings, 1 reply; 29+ messages in thread
From: Saeed Mahameed @ 2023-01-11 20:20 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Saeed Mahameed,
	netdev, Tariq Toukan, Gal Pressman

On 11 Jan 10:32, Jakub Kicinski wrote:
>On Tue, 10 Jan 2023 21:30:35 -0800 Saeed Mahameed wrote:
>> Add TLS debugfs to improve observability by exposing the size of the tls
>> TX pool.
>
>What is the TLS TX pool?

https://lore.kernel.org/netdev/20220727094346.10540-1-tariqt@nvidia.com/

We recycle HW crypto objects used for tls between old and new connections.



* Re: [net-next 08/15] net/mlx5e: Add hairpin debugfs files
  2023-01-11 18:34   ` Jakub Kicinski
@ 2023-01-11 20:46     ` Saeed Mahameed
  2023-01-11 21:03       ` Jakub Kicinski
  0 siblings, 1 reply; 29+ messages in thread
From: Saeed Mahameed @ 2023-01-11 20:46 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Saeed Mahameed,
	netdev, Tariq Toukan, Gal Pressman

On 11 Jan 10:34, Jakub Kicinski wrote:
>On Tue, 10 Jan 2023 21:30:38 -0800 Saeed Mahameed wrote:
>> +	debugfs_create_file("hairpin_num_queues", 0644, tc->dfs_root,
>> +			    &tc->hairpin_params, &fops_hairpin_queues);
>> +	debugfs_create_file("hairpin_queue_size", 0644, tc->dfs_root,
>> +			    &tc->hairpin_params, &fops_hairpin_queue_size);
>
>debugfs should be read-only, please LMK if I'm missing something,
>otherwise this series is getting reverted

I remember asking you about this and you said it's ok to use write for
debug features, this is needed for debugging performance bottlenecks.
hairpin + steering performance behaves differently between different
hardware versions and under different NIC/E-Switch configs, so it's really
important to have some control on some of these attributes when debugging.

Our dilemma was whether to use devlink vendor params or a debug interface.
We are pretty sure that our NIC hairpin implementation is unique, as it
uses software constructs (RQs/SQs) managed internally by firmware to
abstract a TC redirect action, so the only place for this is either
devlink vendor params or debugfs. We chose debugfs since we want to keep
this for debug purposes on production systems.

We also considered extending TC, but since this is unique to the CX
architecture of the current chips, we didn't want to pollute TC.

Also devlink resource wasn't a good match since these resources don't
exist until a TC redirect action is offloaded.

Please let me know what you think and whether this is acceptable by you.



* Re: [net-next 05/15] net/mlx5e: kTLS, Add debugfs
  2023-01-11 20:20     ` Saeed Mahameed
@ 2023-01-11 20:52       ` Jakub Kicinski
  0 siblings, 0 replies; 29+ messages in thread
From: Jakub Kicinski @ 2023-01-11 20:52 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Saeed Mahameed,
	netdev, Tariq Toukan, Gal Pressman

On Wed, 11 Jan 2023 12:20:24 -0800 Saeed Mahameed wrote:
> On 11 Jan 10:32, Jakub Kicinski wrote:
> >On Tue, 10 Jan 2023 21:30:35 -0800 Saeed Mahameed wrote:  
> >> Add TLS debugfs to improve observability by exposing the size of the tls
> >> TX pool.  
> >
> >What is the TLS TX pool?  
> 
> https://lore.kernel.org/netdev/20220727094346.10540-1-tariqt@nvidia.com/
> 
> We recycle HW crypto objects used for tls between old and new connections.

I see, thanks!


* Re: [net-next 08/15] net/mlx5e: Add hairpin debugfs files
  2023-01-11 20:46     ` Saeed Mahameed
@ 2023-01-11 21:03       ` Jakub Kicinski
  2023-01-11 23:01         ` Saeed Mahameed
  0 siblings, 1 reply; 29+ messages in thread
From: Jakub Kicinski @ 2023-01-11 21:03 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Saeed Mahameed,
	netdev, Tariq Toukan, Gal Pressman

On Wed, 11 Jan 2023 12:46:08 -0800 Saeed Mahameed wrote:
> On 11 Jan 10:34, Jakub Kicinski wrote:
> >On Tue, 10 Jan 2023 21:30:38 -0800 Saeed Mahameed wrote:  
> >> +	debugfs_create_file("hairpin_num_queues", 0644, tc->dfs_root,
> >> +			    &tc->hairpin_params, &fops_hairpin_queues);
> >> +	debugfs_create_file("hairpin_queue_size", 0644, tc->dfs_root,
> >> +			    &tc->hairpin_params, &fops_hairpin_queue_size);  
> >
> >debugfs should be read-only, please LMK if I'm missing something,
> >otherwise this series is getting reverted  
> 
> I remember asking you about this and you said it's ok to use write for
> debug features, this is needed for debugging performance bottlenecks.

FWIW I don't think this fits into the debug exemption. What I meant by
debug was stuff like write to configure what traces or debug features
of the chip are enabled. This falls into configuration, even if it's
not expected to be tweaked by users.

> hairpin + steering performance behaves differently between different
> hardware versions and under different NIC/E-Switch configs, so it's really
> important to have some control on some of these attributes when debugging.

Can you expand on the use of this params when debugging? AFAICT these
configure the RQ/SQ pairs (count and size) so really the only
"debugging" you can do here is change the config and see if it fixes
performance...

> Our dilemma was either to use devlink vendor params or a debug interface, 
> since we are pretty sure that our NIC hairpin implementation
> is unique as it uses software constructs (RQs/SQs) managed internally
> by Firmware for abstraction of a TC redirect action, thus the only place
> for this is either devlink vendor params or debugfs, we chose debugfs since
> we want to keep this for debug purposes on production systems.
> 
> we also considered extending TC but again since this is unique to CX
> architecture of the current chips, we didn't want to pollute TC.
> 
> Also devlink resource wasn't a good match since these resources don't
> exist until a TC redirect action is offloaded.
> 
> Please let me know what you think and whether this is acceptable by you.

I don't know of any other devices which need the hairpin setup 
so I won't push for a common API. But we *do* need to list these
tunables somewhere because my ability to grep them out of mlx5 when
another vendor comes with the same problem will be very limited.
Which is one of the reasons why devlink params have to be documented.
Plus IIRC you already have the EQ configuration via params.


* Re: [net-next 08/15] net/mlx5e: Add hairpin debugfs files
  2023-01-11 21:03       ` Jakub Kicinski
@ 2023-01-11 23:01         ` Saeed Mahameed
  2023-01-12  3:46           ` Jakub Kicinski
  0 siblings, 1 reply; 29+ messages in thread
From: Saeed Mahameed @ 2023-01-11 23:01 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Saeed Mahameed,
	netdev, Tariq Toukan, Gal Pressman

On 11 Jan 13:03, Jakub Kicinski wrote:
>On Wed, 11 Jan 2023 12:46:08 -0800 Saeed Mahameed wrote:
>> On 11 Jan 10:34, Jakub Kicinski wrote:
>> >On Tue, 10 Jan 2023 21:30:38 -0800 Saeed Mahameed wrote:
>> >> +	debugfs_create_file("hairpin_num_queues", 0644, tc->dfs_root,
>> >> +			    &tc->hairpin_params, &fops_hairpin_queues);
>> >> +	debugfs_create_file("hairpin_queue_size", 0644, tc->dfs_root,
>> >> +			    &tc->hairpin_params, &fops_hairpin_queue_size);
>> >
>> >debugfs should be read-only, please LMK if I'm missing something,
>> >otherwise this series is getting reverted
>>
>> I remember asking you about this and you said it's ok to use write for
>> debug features, this is needed for debugging performance bottlenecks.
>
>FWIW I don't think this fits into the debug exemption. What I meant by
>debug was stuff like write to configure what traces or debug features
>of the chip are enabled. This falls into configuration, even if it's
>not expected to be tweaked by users.
>

I see.

>> hairpin + steering performance behaves differently between different
>> hardware versions and under different NIC/E-Switch configs, so it's really
>> important to have some control on some of these attributes when debugging.
>
>Can you expand on the use of this params when debugging? AFAICT these
>configure the RQ/SQ pairs (count and size) so really the only
>"debugging" you can do here is change the config and see if it fixes
>performance...

It's more about understanding the performance effects and characteristics
when combined with other steering configs, depending on the HW and current
topology. I don't have exact examples, but the debugging usually ends up
with optimizing other places (steering, firmware, the application at the
other end, etc.).

Sorry, I don't have many details here; maybe Gal can chime in.
But what I am sure of is that changing the hairpin RQ/SQ configs comes
with a risk.

>
>> Our dilemma was either to use devlink vendor params or a debug interface,
>> since we are pretty sure that our NIC hairpin implementation
>> is unique as it uses software constructs (RQs/SQs) managed internally
>> by Firmware for abstraction of a TC redirect action, thus the only place
>> for this is either devlink vendor params or debugfs, we chose debugfs since
>> we want to keep this for debug purposes on production systems.
>>
>> we also considered extending TC but again since this is unique to CX
>> architecture of the current chips, we didn't want to pollute TC.
>>
>> Also devlink resource wasn't a good match since these resources don't
>> exist until a TC redirect action is offloaded.
>>
>> Please let me know what you think and whether this is acceptable by you.
>
>I don't know of any other devices which need the hairpin setup
>so I won't push for a common API. But we *do* need to list these
>tunables somewhere because my ability to grep them out of mlx5 when
>another vendor comes with the same problem will be very limited.
>Which is one of the reasons why devlink params have to be documented.

Then let's create https://docs.kernel.org/networking/vendor_specific.rst
and record everything vendor specific in there, including devlink and
ethtool private flags.
Once we find a common behavior, does it mean this should move to be standard?

>Plus IIRC you already have the EQ configuration via params.

EQ is considered a standard parameter in devlink.

We currently have 2 vendor-specific params, and they are related to
the steering pipeline/engines only.
Hairpin buffer/queue sizes are only a CX limitation and an
implementation detail.

You can clearly see a pattern here: usually the steering pipeline
requires vendor-specific knobs :/

Will you be ok if we moved the hairpin config to a devlink driver-specific
param, given that we will create vendor_specific.rst for easy tracking
and grepping?

Thanks,
Saeed.


* Re: [net-next 08/15] net/mlx5e: Add hairpin debugfs files
  2023-01-11 23:01         ` Saeed Mahameed
@ 2023-01-12  3:46           ` Jakub Kicinski
  2023-01-12  9:17             ` Gal Pressman
  0 siblings, 1 reply; 29+ messages in thread
From: Jakub Kicinski @ 2023-01-12  3:46 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Saeed Mahameed,
	netdev, Tariq Toukan, Gal Pressman

On Wed, 11 Jan 2023 15:01:31 -0800 Saeed Mahameed wrote:
> Sorry, I don't have many details here; maybe Gal can chime in.
> But what I am sure of is that changing the hairpin RQ/SQ configs comes
> with a risk.

Would be great if someone could chime in..

> >Plus IIRC you already have the EQ configuration via params.
> 
> EQ is considered a standard parameter in devlink.
> 
> We currently have 2 vendor-specific params, and they are related to
> the steering pipeline/engines only.
> Hairpin buffer/queue sizes are only a CX limitation and an
> implementation detail.
> 
> You can clearly see a pattern here: usually the steering pipeline
> requires vendor-specific knobs :/
> 
> Will you be ok if we moved the hairpin config to a devlink driver-specific
> param, given that we will create vendor_specific.rst for easy tracking
> and grepping?

Alright, that sounds okay. But vendor_specific.rst needs to only cover
debugfs and ethtool private flags, right? Devlink params are already
documented in per-driver devlink docs, and splitting params into two
places would be odd.


* Re: [net-next 08/15] net/mlx5e: Add hairpin debugfs files
  2023-01-12  3:46           ` Jakub Kicinski
@ 2023-01-12  9:17             ` Gal Pressman
  2023-01-12 22:20               ` Jakub Kicinski
  0 siblings, 1 reply; 29+ messages in thread
From: Gal Pressman @ 2023-01-12  9:17 UTC (permalink / raw)
  To: Jakub Kicinski, Saeed Mahameed
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Saeed Mahameed,
	netdev, Tariq Toukan

On 12/01/2023 5:46, Jakub Kicinski wrote:
> On Wed, 11 Jan 2023 15:01:31 -0800 Saeed Mahameed wrote:
>> Sorry, I don't have many details here; maybe Gal can chime in.
>> But what I am sure of is that changing the hairpin RQ/SQ configs comes
>> with a risk.
> 
> Would be great if someone could chime in..

Hey,
As Saeed said, we discussed different APIs for this, debugfs seemed like
the best fit as we don't want users to change the queue parameters for
production purposes. Debugfs makes it clear that these params aren't for
your ordinary use, and allows us to be more flexible over time if needed
(we don't necessarily have to keep these files there forever, if our
hardware implementation changes for example).

A devlink param would work, but the message conveyed is a bit different
:). It makes it seem like this is a knob we want people to play with
(and one we have to support forever).


* Re: [net-next 08/15] net/mlx5e: Add hairpin debugfs files
  2023-01-12  9:17             ` Gal Pressman
@ 2023-01-12 22:20               ` Jakub Kicinski
  2023-01-15 10:04                 ` Gal Pressman
  0 siblings, 1 reply; 29+ messages in thread
From: Jakub Kicinski @ 2023-01-12 22:20 UTC (permalink / raw)
  To: Gal Pressman
  Cc: Saeed Mahameed, David S. Miller, Paolo Abeni, Eric Dumazet,
	Saeed Mahameed, netdev, Tariq Toukan

On Thu, 12 Jan 2023 11:17:07 +0200 Gal Pressman wrote:
> As Saeed said, we discussed different APIs for this, debugfs seemed like
> the best fit as we don't want users to change the queue parameters for
> production purposes. Debugfs makes it clear that these params aren't for
> your ordinary use, and allows us to be more flexible over time if needed
> (we don't necessarily have to keep these files there forever, if our
> hardware implementation changes for example).

You cut off the original question in your reply, it was:

  Can you expand on the use of these params when debugging?

IOW why do you need to change this configuration during debug?


* Re: [net-next 08/15] net/mlx5e: Add hairpin debugfs files
  2023-01-12 22:20               ` Jakub Kicinski
@ 2023-01-15 10:04                 ` Gal Pressman
  2023-01-17 18:48                   ` Jakub Kicinski
  0 siblings, 1 reply; 29+ messages in thread
From: Gal Pressman @ 2023-01-15 10:04 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Saeed Mahameed, David S. Miller, Paolo Abeni, Eric Dumazet,
	Saeed Mahameed, netdev, Tariq Toukan

On 13/01/2023 0:20, Jakub Kicinski wrote:
> On Thu, 12 Jan 2023 11:17:07 +0200 Gal Pressman wrote:
>> As Saeed said, we discussed different APIs for this, debugfs seemed like
>> the best fit as we don't want users to change the queue parameters for
>> production purposes. Debugfs makes it clear that these params aren't for
>> your ordinary use, and allows us to be more flexible over time if needed
>> (we don't necessarily have to keep these files there forever, if our
>> hardware implementation changes for example).
> 
> You cut off the original question in your reply, it was:
> 
>   Can you expand on the use of these params when debugging?
> 
> IOW why do you need to change this configuration during debug?

The hairpin queues are different than other queues in the driver as they
are controlled by the device (refill, completion handling, etc.).
Hardware configuration can make a difference in performance when working
with hairpin, things that wouldn't necessarily affect regular queues the
driver uses. The debugging process is also more difficult as the driver
has little control/visibility over these.

At the end of the day, the debug process *is* going to be playing with
the queue size/number; this allows us to potentially find a number that
releases the bottleneck and see how it affects other stages in the pipe.
Since these cases are unlikely to happen, and changing these
parameters can affect the device in other ways, we don't want people to
just increase them when they encounter performance issues, especially
not in production environments.

Does that make sense?


* Re: [net-next 08/15] net/mlx5e: Add hairpin debugfs files
  2023-01-15 10:04                 ` Gal Pressman
@ 2023-01-17 18:48                   ` Jakub Kicinski
  0 siblings, 0 replies; 29+ messages in thread
From: Jakub Kicinski @ 2023-01-17 18:48 UTC (permalink / raw)
  To: Gal Pressman
  Cc: Saeed Mahameed, David S. Miller, Paolo Abeni, Eric Dumazet,
	Saeed Mahameed, netdev, Tariq Toukan

On Sun, 15 Jan 2023 12:04:58 +0200 Gal Pressman wrote:
> The hairpin queues are different than other queues in the driver as they
> are controlled by the device (refill, completion handling, etc.).
> Hardware configuration can make a difference in performance when working
> with hairpin, things that wouldn't necessarily affect regular queues the
> driver uses. The debugging process is also more difficult as the driver
> has little control/visibility over these.
> 
> At the end of the day, the debug process *is* going to be playing with
> the queue size/number; this allows us to potentially find a number that
> releases the bottleneck and see how it affects other stages in the pipe.
> Since these cases are unlikely to happen, and changing these
> parameters can affect the device in other ways, we don't want people to
> just increase them when they encounter performance issues, especially
> not in production environments.
> 
> Does that make sense?

Okay, I think my guess that "debug" here means "wobble it to see if 
the device can go faster" was indeed correct. Long term maybe we should
find a better word for that than "debug".


Thread overview: 29+ messages
2023-01-11  5:30 [pull request][net-next 00/15] mlx5 updates 2023-01-10 Saeed Mahameed
2023-01-11  5:30 ` [net-next 01/15] net/mlx5: Expose shared buffer registers bits and structs Saeed Mahameed
2023-01-11  9:10   ` patchwork-bot+netdevbpf
2023-01-11  5:30 ` [net-next 02/15] net/mlx5e: Add API to query/modify SBPR and SBCM registers Saeed Mahameed
2023-01-11  5:30 ` [net-next 03/15] net/mlx5e: Update shared buffer along with device buffer changes Saeed Mahameed
2023-01-11  5:30 ` [net-next 04/15] net/mlx5e: Add Ethernet driver debugfs Saeed Mahameed
2023-01-11  5:30 ` [net-next 05/15] net/mlx5e: kTLS, Add debugfs Saeed Mahameed
2023-01-11 18:32   ` Jakub Kicinski
2023-01-11 20:20     ` Saeed Mahameed
2023-01-11 20:52       ` Jakub Kicinski
2023-01-11  5:30 ` [net-next 06/15] net/mlx5e: Add hairpin params structure Saeed Mahameed
2023-01-11  5:30 ` [net-next 07/15] net/mlx5e: Add flow steering debugfs directory Saeed Mahameed
2023-01-11  5:30 ` [net-next 08/15] net/mlx5e: Add hairpin debugfs files Saeed Mahameed
2023-01-11 18:34   ` Jakub Kicinski
2023-01-11 20:46     ` Saeed Mahameed
2023-01-11 21:03       ` Jakub Kicinski
2023-01-11 23:01         ` Saeed Mahameed
2023-01-12  3:46           ` Jakub Kicinski
2023-01-12  9:17             ` Gal Pressman
2023-01-12 22:20               ` Jakub Kicinski
2023-01-15 10:04                 ` Gal Pressman
2023-01-17 18:48                   ` Jakub Kicinski
2023-01-11  5:30 ` [net-next 09/15] net/mlx5: Enable management PF initialization Saeed Mahameed
2023-01-11  5:30 ` [net-next 10/15] net/mlx5: Introduce and use opcode getter in command interface Saeed Mahameed
2023-01-11  5:30 ` [net-next 11/15] net/mlx5: Prevent high-rate FW commands from populating all slots Saeed Mahameed
2023-01-11  5:30 ` [net-next 12/15] net/mlx5e: Replace zero-length array with flexible-array member Saeed Mahameed
2023-01-11  5:30 ` [net-next 13/15] net/mlx5e: Replace 0-length array with flexible array Saeed Mahameed
2023-01-11  5:30 ` [net-next 14/15] net/mlx5: remove redundant ret variable Saeed Mahameed
2023-01-11  5:30 ` [net-next 15/15] net/mlx5e: Use kzalloc() in mlx5e_accel_fs_tcp_create() Saeed Mahameed
