* [pull request][net-next 00/15] mlx5 updates 2022-05-09
From: Saeed Mahameed @ 2022-05-10  5:57 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni; +Cc: netdev, Saeed Mahameed

This series adds support for 4-port HCA LAG mode in the mlx5 driver.
For more information please see tag log below.

Please pull and let me know if there is any problem.

Thanks,
Saeed.


The following changes since commit 9eab75d45ddc9d29640fd17199880d39241eeb35:

  Merge branch 'nfp-support-corigine-pcie-vendor-id' (2022-05-09 18:20:42 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-updates-2022-05-09

for you to fetch changes up to 7f46a0b7327ae261f9981888708dbca22c283900:

  net/mlx5: Lag, add debugfs to query hardware lag state (2022-05-09 22:54:04 -0700)

----------------------------------------------------------------
mlx5-updates-2022-05-09

1) Gavin Li adds an exit route from waiting for FW init on device boot and
   increases the FW init timeout on the health recovery flow

2) Support LAG mode on 4-port HCAs

Mark Bloch Says:
================

This series adds support for 4-port HCAs to the mlx5 driver.
Starting with ConnectX-7, HCAs with 4 ports are possible.

As most parts of the driver aren't affected by such a configuration,
most of the driver code is unchanged.

Specifically, the only affected areas are:
- Lag
- Devcom
- Merged E-Switch
- Single FDB E-Switch

Lag was chosen to be converted first: a hardware LAG is created when all
4 ports are added to the same bond device.

Devcom, merged E-Switch and single FDB E-Switch are marked as supporting
only 2-port HCAs; future patches will add support for 4-port HCAs.

In order to activate the hardware lag, a user can run:

ip link add bond0 type bond
ip link set bond0 type bond miimon 100 mode 2
ip link set eth2 master bond0
ip link set eth3 master bond0
ip link set eth4 master bond0
ip link set eth5 master bond0

Where eth2, eth3, eth4 and eth5 are the PFs of the same HCA. (Bonding
mode 2 is balance-xor, i.e. a hash-based transmit policy.)

================

----------------------------------------------------------------
Gavin Li (2):
      net/mlx5: Add exit route when waiting for FW
      net/mlx5: Increase FW pre-init timeout for health recovery

Mark Bloch (13):
      net/mlx5: Lag, expose number of lag ports
      net/mlx5: devcom only supports 2 ports
      net/mlx5: Lag, move E-Switch prerequisite check into lag code
      net/mlx5: Lag, use lag lock
      net/mlx5: Lag, filter non compatible devices
      net/mlx5: Lag, store number of ports inside lag object
      net/mlx5: Lag, support single FDB only on 2 ports
      net/mlx5: Lag, use hash when in roce lag on 4 ports
      net/mlx5: Lag, use actual number of lag ports
      net/mlx5: Support devices with more than 2 ports
      net/mlx5: Lag, refactor dmesg print
      net/mlx5: Lag, use buckets in hash mode
      net/mlx5: Lag, add debugfs to query hardware lag state

 drivers/infiniband/hw/mlx5/gsi.c                   |   2 +-
 drivers/infiniband/hw/mlx5/main.c                  |   1 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h               |   1 +
 drivers/infiniband/hw/mlx5/qp.c                    |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/dev.c      |  49 +-
 drivers/net/ethernet/mellanox/mlx5/core/devlink.c  |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |  25 -
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |   8 -
 drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c |   2 +-
 .../net/ethernet/mellanox/mlx5/core/lag/debugfs.c  | 173 +++++++
 drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c  | 537 ++++++++++++++-------
 drivers/net/ethernet/mellanox/mlx5/core/lag/lag.h  |  16 +-
 .../net/ethernet/mellanox/mlx5/core/lag/port_sel.c | 129 +++--
 .../net/ethernet/mellanox/mlx5/core/lag/port_sel.h |  15 +-
 .../net/ethernet/mellanox/mlx5/core/lib/devcom.c   |  16 +-
 .../net/ethernet/mellanox/mlx5/core/lib/devcom.h   |   2 +
 drivers/net/ethernet/mellanox/mlx5/core/lib/tout.c |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/lib/tout.h |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/main.c     |  28 +-
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h    |   3 +-
 include/linux/mlx5/driver.h                        |   5 +-
 22 files changed, 720 insertions(+), 302 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lag/debugfs.c


* [net-next 01/15] net/mlx5: Add exit route when waiting for FW
From: Saeed Mahameed @ 2022-05-10  5:57 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Gavin Li, Moshe Shemesh, Saeed Mahameed

From: Gavin Li <gavinl@nvidia.com>

Currently, removing a device needs to take the driver interface lock
before doing any cleanup. If the driver is waiting in a loop for FW
init, there is no way to cancel the wait; instead, the device cleanup
waits for the loop to conclude and release the lock.

To allow immediate response to remove device commands, check the TEARDOWN
flag while waiting for FW init, and exit the loop if it has been set.
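
For illustration, the resulting wait loop looks roughly like this (a
condensed sketch of the hunk below, with the poll step abstracted; not
a verbatim copy):

/* Sketch: the FW pre-init wait is now cancellable. */
while (ioread32be(&dev->iseg->initializing) >> 31) {
	if (time_after(jiffies, end) ||
	    test_and_clear_bit(MLX5_BREAK_FW_WAIT, &dev->intf_state)) {
		/* Timed out, or remove_one()/shutdown() asked us to stop. */
		err = -EBUSY;
		break;
	}
	msleep(poll_interval_ms);	/* assumed poll step, not shown in this hunk */
}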

Signed-off-by: Gavin Li <gavinl@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 5 ++++-
 include/linux/mlx5/driver.h                    | 1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 35e48ef04845..f28a3526aafa 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -189,7 +189,8 @@ static int wait_fw_init(struct mlx5_core_dev *dev, u32 max_wait_mili,
 		fw_initializing = ioread32be(&dev->iseg->initializing);
 		if (!(fw_initializing >> 31))
 			break;
-		if (time_after(jiffies, end)) {
+		if (time_after(jiffies, end) ||
+		    test_and_clear_bit(MLX5_BREAK_FW_WAIT, &dev->intf_state)) {
 			err = -EBUSY;
 			break;
 		}
@@ -1602,6 +1603,7 @@ static void remove_one(struct pci_dev *pdev)
 	struct mlx5_core_dev *dev  = pci_get_drvdata(pdev);
 	struct devlink *devlink = priv_to_devlink(dev);
 
+	set_bit(MLX5_BREAK_FW_WAIT, &dev->intf_state);
 	devlink_unregister(devlink);
 	mlx5_sriov_disable(pdev);
 	mlx5_crdump_disable(dev);
@@ -1785,6 +1787,7 @@ static void shutdown(struct pci_dev *pdev)
 	int err;
 
 	mlx5_core_info(dev, "Shutdown was called\n");
+	set_bit(MLX5_BREAK_FW_WAIT, &dev->intf_state);
 	err = mlx5_try_fast_unload(dev);
 	if (err)
 		mlx5_unload_one(dev);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index ff47d49d8be4..f327d0544038 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -632,6 +632,7 @@ enum mlx5_device_state {
 
 enum mlx5_interface_state {
 	MLX5_INTERFACE_STATE_UP = BIT(0),
+	MLX5_BREAK_FW_WAIT = BIT(1),
 };
 
 enum mlx5_pci_status {
-- 
2.35.1



* [net-next 02/15] net/mlx5: Increase FW pre-init timeout for health recovery
From: Saeed Mahameed @ 2022-05-10  5:57 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Gavin Li, Moshe Shemesh, Shay Drory, Saeed Mahameed

From: Gavin Li <gavinl@nvidia.com>

Currently, health recovery reloads the driver to recover it from fatal
errors. During the driver's load process, it waits for FW to set the
pre-init bit for up to 120 seconds; beyond this threshold, it aborts
the load process. In some cases, such as a FW upgrade on the DPU, this
timeout period is insufficient, and the user has no way to recover the
host device.

To solve this issue, introduce a new FW pre-init timeout for health
recovery, which is set to 2 hours.

The timeout for devlink reload and probe will use the original one because
they are user triggered flows, and therefore should not have a
significantly long timeout, during which the user command would hang.
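
In outline, the timeout is now chosen by the caller and passed down
(assembled from the hunks below; a sketch, not a verbatim copy):

/* Sketch: mlx5_load_one() picks the FW pre-init timeout per flow. */
if (recovery)	/* health recovery: allow up to 2 hours (e.g. DPU FW upgrade) */
	timeout = mlx5_tout_ms(dev, FW_PRE_INIT_ON_RECOVERY_TIMEOUT);
else		/* probe/devlink reload: keep the original 120 seconds */
	timeout = mlx5_tout_ms(dev, FW_PRE_INIT_TIMEOUT);
err = mlx5_function_setup(dev, timeout);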

Signed-off-by: Gavin Li <gavinl@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/devlink.c |  4 ++--
 .../ethernet/mellanox/mlx5/core/fw_reset.c    |  2 +-
 .../ethernet/mellanox/mlx5/core/lib/tout.c    |  1 +
 .../ethernet/mellanox/mlx5/core/lib/tout.h    |  1 +
 .../net/ethernet/mellanox/mlx5/core/main.c    | 23 +++++++++++--------
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |  2 +-
 6 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index e8789e6d7e7b..f85166e587f2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -178,13 +178,13 @@ static int mlx5_devlink_reload_up(struct devlink *devlink, enum devlink_reload_a
 	*actions_performed = BIT(action);
 	switch (action) {
 	case DEVLINK_RELOAD_ACTION_DRIVER_REINIT:
-		return mlx5_load_one(dev);
+		return mlx5_load_one(dev, false);
 	case DEVLINK_RELOAD_ACTION_FW_ACTIVATE:
 		if (limit == DEVLINK_RELOAD_LIMIT_NO_RESET)
 			break;
 		/* On fw_activate action, also driver is reloaded and reinit performed */
 		*actions_performed |= BIT(DEVLINK_RELOAD_ACTION_DRIVER_REINIT);
-		return mlx5_load_one(dev);
+		return mlx5_load_one(dev, false);
 	default:
 		/* Unsupported action should not get to this function */
 		WARN_ON(1);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
index ca1aba845dd6..84df0d56a2b6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
@@ -148,7 +148,7 @@ static void mlx5_fw_reset_complete_reload(struct mlx5_core_dev *dev)
 	if (test_bit(MLX5_FW_RESET_FLAGS_PENDING_COMP, &fw_reset->reset_flags)) {
 		complete(&fw_reset->done);
 	} else {
-		mlx5_load_one(dev);
+		mlx5_load_one(dev, false);
 		devlink_remote_reload_actions_performed(priv_to_devlink(dev), 0,
 							BIT(DEVLINK_RELOAD_ACTION_DRIVER_REINIT) |
 							BIT(DEVLINK_RELOAD_ACTION_FW_ACTIVATE));
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/tout.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/tout.c
index c1df0d3595d8..d758848d34d0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/tout.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/tout.c
@@ -10,6 +10,7 @@ struct mlx5_timeouts {
 
 static const u32 tout_def_sw_val[MAX_TIMEOUT_TYPES] = {
 	[MLX5_TO_FW_PRE_INIT_TIMEOUT_MS] = 120000,
+	[MLX5_TO_FW_PRE_INIT_ON_RECOVERY_TIMEOUT_MS] = 7200000,
 	[MLX5_TO_FW_PRE_INIT_WARN_MESSAGE_INTERVAL_MS] = 20000,
 	[MLX5_TO_FW_PRE_INIT_WAIT_MS] = 2,
 	[MLX5_TO_FW_INIT_MS] = 2000,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/tout.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/tout.h
index 1c42ead782fa..257c03eeab36 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/tout.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/tout.h
@@ -7,6 +7,7 @@
 enum mlx5_timeouts_types {
 	/* pre init timeouts (not read from FW) */
 	MLX5_TO_FW_PRE_INIT_TIMEOUT_MS,
+	MLX5_TO_FW_PRE_INIT_ON_RECOVERY_TIMEOUT_MS,
 	MLX5_TO_FW_PRE_INIT_WARN_MESSAGE_INTERVAL_MS,
 	MLX5_TO_FW_PRE_INIT_WAIT_MS,
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index f28a3526aafa..84f75aa25214 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1003,7 +1003,7 @@ static void mlx5_cleanup_once(struct mlx5_core_dev *dev)
 	mlx5_devcom_unregister_device(dev->priv.devcom);
 }
 
-static int mlx5_function_setup(struct mlx5_core_dev *dev, bool boot)
+static int mlx5_function_setup(struct mlx5_core_dev *dev, u64 timeout)
 {
 	int err;
 
@@ -1018,11 +1018,11 @@ static int mlx5_function_setup(struct mlx5_core_dev *dev, bool boot)
 
 	/* wait for firmware to accept initialization segments configurations
 	 */
-	err = wait_fw_init(dev, mlx5_tout_ms(dev, FW_PRE_INIT_TIMEOUT),
+	err = wait_fw_init(dev, timeout,
 			   mlx5_tout_ms(dev, FW_PRE_INIT_WARN_MESSAGE_INTERVAL));
 	if (err) {
 		mlx5_core_err(dev, "Firmware over %llu MS in pre-initializing state, aborting\n",
-			      mlx5_tout_ms(dev, FW_PRE_INIT_TIMEOUT));
+			      timeout);
 		return err;
 	}
 
@@ -1272,7 +1272,7 @@ int mlx5_init_one(struct mlx5_core_dev *dev)
 	mutex_lock(&dev->intf_state_mutex);
 	dev->state = MLX5_DEVICE_STATE_UP;
 
-	err = mlx5_function_setup(dev, true);
+	err = mlx5_function_setup(dev, mlx5_tout_ms(dev, FW_PRE_INIT_TIMEOUT));
 	if (err)
 		goto err_function;
 
@@ -1336,9 +1336,10 @@ void mlx5_uninit_one(struct mlx5_core_dev *dev)
 	mutex_unlock(&dev->intf_state_mutex);
 }
 
-int mlx5_load_one(struct mlx5_core_dev *dev)
+int mlx5_load_one(struct mlx5_core_dev *dev, bool recovery)
 {
 	int err = 0;
+	u64 timeout;
 
 	mutex_lock(&dev->intf_state_mutex);
 	if (test_bit(MLX5_INTERFACE_STATE_UP, &dev->intf_state)) {
@@ -1348,7 +1349,11 @@ int mlx5_load_one(struct mlx5_core_dev *dev)
 	/* remove any previous indication of internal error */
 	dev->state = MLX5_DEVICE_STATE_UP;
 
-	err = mlx5_function_setup(dev, false);
+	if (recovery)
+		timeout = mlx5_tout_ms(dev, FW_PRE_INIT_ON_RECOVERY_TIMEOUT);
+	else
+		timeout = mlx5_tout_ms(dev, FW_PRE_INIT_TIMEOUT);
+	err = mlx5_function_setup(dev, timeout);
 	if (err)
 		goto err_function;
 
@@ -1719,7 +1724,7 @@ static void mlx5_pci_resume(struct pci_dev *pdev)
 
 	mlx5_pci_trace(dev, "Enter, loading driver..\n");
 
-	err = mlx5_load_one(dev);
+	err = mlx5_load_one(dev, false);
 
 	mlx5_pci_trace(dev, "Done, err = %d, device %s\n", err,
 		       !err ? "recovered" : "Failed");
@@ -1807,7 +1812,7 @@ static int mlx5_resume(struct pci_dev *pdev)
 {
 	struct mlx5_core_dev *dev = pci_get_drvdata(pdev);
 
-	return mlx5_load_one(dev);
+	return mlx5_load_one(dev, false);
 }
 
 static const struct pci_device_id mlx5_core_pci_table[] = {
@@ -1852,7 +1857,7 @@ int mlx5_recover_device(struct mlx5_core_dev *dev)
 			return -EIO;
 	}
 
-	return mlx5_load_one(dev);
+	return mlx5_load_one(dev, true);
 }
 
 static struct pci_driver mlx5_core_driver = {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index a9b2d6ead542..9026be1d6223 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -290,7 +290,7 @@ void mlx5_mdev_uninit(struct mlx5_core_dev *dev);
 int mlx5_init_one(struct mlx5_core_dev *dev);
 void mlx5_uninit_one(struct mlx5_core_dev *dev);
 void mlx5_unload_one(struct mlx5_core_dev *dev);
-int mlx5_load_one(struct mlx5_core_dev *dev);
+int mlx5_load_one(struct mlx5_core_dev *dev, bool recovery);
 
 int mlx5_vport_get_other_func_cap(struct mlx5_core_dev *dev, u16 function_id, void *out);
 
-- 
2.35.1



* [net-next 03/15] net/mlx5: Lag, expose number of lag ports
From: Saeed Mahameed @ 2022-05-10  5:57 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Mark Bloch, Maor Gottlieb, Saeed Mahameed

From: Mark Bloch <mbloch@nvidia.com>

Downstream patches will add support for hardware lag with
more than 2 ports. Add a way for users to query the number of lag ports.
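
A consumer then looks roughly like the mlx5_ib hunks below (sketch
only):

/* Sketch: an upper driver caches the lag width once at lag init ... */
dev->lag_ports = mlx5_lag_get_num_ports(mdev);	/* returns MLX5_MAX_PORTS for now */

/* ... and uses it instead of the MLX5_MAX_PORTS constant. */
if (dev->lag_active)
	num_qps = dev->lag_ports;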

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/infiniband/hw/mlx5/gsi.c                  | 2 +-
 drivers/infiniband/hw/mlx5/main.c                 | 1 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h              | 1 +
 drivers/infiniband/hw/mlx5/qp.c                   | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c | 6 ++++++
 include/linux/mlx5/driver.h                       | 1 +
 6 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/gsi.c b/drivers/infiniband/hw/mlx5/gsi.c
index 3ad8f637c589..b804f2dd5628 100644
--- a/drivers/infiniband/hw/mlx5/gsi.c
+++ b/drivers/infiniband/hw/mlx5/gsi.c
@@ -100,7 +100,7 @@ int mlx5_ib_create_gsi(struct ib_pd *pd, struct mlx5_ib_qp *mqp,
 				 port_type) == MLX5_CAP_PORT_TYPE_IB)
 			num_qps = pd->device->attrs.max_pkeys;
 		else if (dev->lag_active)
-			num_qps = MLX5_MAX_PORTS;
+			num_qps = dev->lag_ports;
 	}
 
 	gsi = &mqp->gsi;
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 61aa196d6484..61a3b767262f 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2991,6 +2991,7 @@ static int mlx5_eth_lag_init(struct mlx5_ib_dev *dev)
 	}
 
 	dev->flow_db->lag_demux_ft = ft;
+	dev->lag_ports = mlx5_lag_get_num_ports(mdev);
 	dev->lag_active = true;
 	return 0;
 
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 4f04bb55c4c6..8b3c83c0b70a 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -1131,6 +1131,7 @@ struct mlx5_ib_dev {
 	struct xarray sig_mrs;
 	struct mlx5_port_caps port_caps[MLX5_MAX_PORTS];
 	u16 pkey_table_len;
+	u8 lag_ports;
 };
 
 static inline struct mlx5_ib_cq *to_mibcq(struct mlx5_core_cq *mcq)
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 3f467557d34e..fb8669c02546 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -3907,7 +3907,7 @@ static unsigned int get_tx_affinity_rr(struct mlx5_ib_dev *dev,
 		tx_port_affinity = &dev->port[port_num].roce.tx_port_affinity;
 
 	return (unsigned int)atomic_add_return(1, tx_port_affinity) %
-		MLX5_MAX_PORTS + 1;
+		(dev->lag_active ? dev->lag_ports : MLX5_CAP_GEN(dev->mdev, num_lag_ports)) + 1;
 }
 
 static bool qp_supports_affinity(struct mlx5_ib_qp *qp)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
index 6cad3b72c133..fe34cce77d07 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
@@ -1185,6 +1185,12 @@ u8 mlx5_lag_get_slave_port(struct mlx5_core_dev *dev,
 }
 EXPORT_SYMBOL(mlx5_lag_get_slave_port);
 
+u8 mlx5_lag_get_num_ports(struct mlx5_core_dev *dev)
+{
+	return MLX5_MAX_PORTS;
+}
+EXPORT_SYMBOL(mlx5_lag_get_num_ports);
+
 struct mlx5_core_dev *mlx5_lag_get_peer_mdev(struct mlx5_core_dev *dev)
 {
 	struct mlx5_core_dev *peer_dev = NULL;
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index f327d0544038..62ea1120de9c 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1142,6 +1142,7 @@ int mlx5_lag_query_cong_counters(struct mlx5_core_dev *dev,
 				 int num_counters,
 				 size_t *offsets);
 struct mlx5_core_dev *mlx5_lag_get_peer_mdev(struct mlx5_core_dev *dev);
+u8 mlx5_lag_get_num_ports(struct mlx5_core_dev *dev);
 struct mlx5_uars_page *mlx5_get_uars_page(struct mlx5_core_dev *mdev);
 void mlx5_put_uars_page(struct mlx5_core_dev *mdev, struct mlx5_uars_page *up);
 int mlx5_dm_sw_icm_alloc(struct mlx5_core_dev *dev, enum mlx5_sw_icm_type type,
-- 
2.35.1



* [net-next 04/15] net/mlx5: devcom only supports 2 ports
From: Saeed Mahameed @ 2022-05-10  5:57 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Mark Bloch, Maor Gottlieb, Saeed Mahameed

From: Mark Bloch <mbloch@nvidia.com>

The devcom API is intended to be used between 2 devices only. Add this
implied assumption to the code and check when it's not true.
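
The assumption becomes explicit roughly as follows (condensed from the
hunks below):

#define MLX5_DEVCOM_PORTS_SUPPORTED 2

/* Sketch: refuse to register a device devcom can't pair. */
if (MLX5_CAP_GEN(dev, num_lag_ports) != MLX5_DEVCOM_PORTS_SUPPORTED)
	return NULL;	/* devcom only handles exactly 2 ports today */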

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/lib/devcom.c | 16 +++++++++-------
 .../net/ethernet/mellanox/mlx5/core/lib/devcom.h |  2 ++
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c
index bced2efe9bef..adefde3ea941 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c
@@ -14,7 +14,7 @@ static LIST_HEAD(devcom_list);
 struct mlx5_devcom_component {
 	struct {
 		void *data;
-	} device[MLX5_MAX_PORTS];
+	} device[MLX5_DEVCOM_PORTS_SUPPORTED];
 
 	mlx5_devcom_event_handler_t handler;
 	struct rw_semaphore sem;
@@ -25,7 +25,7 @@ struct mlx5_devcom_list {
 	struct list_head list;
 
 	struct mlx5_devcom_component components[MLX5_DEVCOM_NUM_COMPONENTS];
-	struct mlx5_core_dev *devs[MLX5_MAX_PORTS];
+	struct mlx5_core_dev *devs[MLX5_DEVCOM_PORTS_SUPPORTED];
 };
 
 struct mlx5_devcom {
@@ -74,13 +74,15 @@ struct mlx5_devcom *mlx5_devcom_register_device(struct mlx5_core_dev *dev)
 
 	if (!mlx5_core_is_pf(dev))
 		return NULL;
+	if (MLX5_CAP_GEN(dev, num_lag_ports) != MLX5_DEVCOM_PORTS_SUPPORTED)
+		return NULL;
 
 	sguid0 = mlx5_query_nic_system_image_guid(dev);
 	list_for_each_entry(iter, &devcom_list, list) {
 		struct mlx5_core_dev *tmp_dev = NULL;
 
 		idx = -1;
-		for (i = 0; i < MLX5_MAX_PORTS; i++) {
+		for (i = 0; i < MLX5_DEVCOM_PORTS_SUPPORTED; i++) {
 			if (iter->devs[i])
 				tmp_dev = iter->devs[i];
 			else
@@ -134,11 +136,11 @@ void mlx5_devcom_unregister_device(struct mlx5_devcom *devcom)
 
 	kfree(devcom);
 
-	for (i = 0; i < MLX5_MAX_PORTS; i++)
+	for (i = 0; i < MLX5_DEVCOM_PORTS_SUPPORTED; i++)
 		if (priv->devs[i])
 			break;
 
-	if (i != MLX5_MAX_PORTS)
+	if (i != MLX5_DEVCOM_PORTS_SUPPORTED)
 		return;
 
 	list_del(&priv->list);
@@ -191,7 +193,7 @@ int mlx5_devcom_send_event(struct mlx5_devcom *devcom,
 
 	comp = &devcom->priv->components[id];
 	down_write(&comp->sem);
-	for (i = 0; i < MLX5_MAX_PORTS; i++)
+	for (i = 0; i < MLX5_DEVCOM_PORTS_SUPPORTED; i++)
 		if (i != devcom->idx && comp->device[i].data) {
 			err = comp->handler(event, comp->device[i].data,
 					    event_data);
@@ -239,7 +241,7 @@ void *mlx5_devcom_get_peer_data(struct mlx5_devcom *devcom,
 		return NULL;
 	}
 
-	for (i = 0; i < MLX5_MAX_PORTS; i++)
+	for (i = 0; i < MLX5_DEVCOM_PORTS_SUPPORTED; i++)
 		if (i != devcom->idx)
 			break;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h
index 939d5bf1581b..94313c18bb64 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h
@@ -6,6 +6,8 @@
 
 #include <linux/mlx5/driver.h>
 
+#define MLX5_DEVCOM_PORTS_SUPPORTED 2
+
 enum mlx5_devcom_components {
 	MLX5_DEVCOM_ESW_OFFLOADS,
 
-- 
2.35.1



* [net-next 05/15] net/mlx5: Lag, move E-Switch prerequisite check into lag code
From: Saeed Mahameed @ 2022-05-10  5:57 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Mark Bloch, Maor Gottlieb, Saeed Mahameed

From: Mark Bloch <mbloch@nvidia.com>

There is no need to expose an E-Switch function for something that can
be checked with an already present API inside the lag code.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 11 -----------
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h |  3 ---
 drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c | 11 +++++++++--
 3 files changed, 9 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 25f2d2717aaa..8ef22893e5e6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1893,17 +1893,6 @@ mlx5_eswitch_get_encap_mode(const struct mlx5_core_dev *dev)
 }
 EXPORT_SYMBOL(mlx5_eswitch_get_encap_mode);
 
-bool mlx5_esw_lag_prereq(struct mlx5_core_dev *dev0, struct mlx5_core_dev *dev1)
-{
-	if ((dev0->priv.eswitch->mode == MLX5_ESWITCH_NONE &&
-	     dev1->priv.eswitch->mode == MLX5_ESWITCH_NONE) ||
-	    (dev0->priv.eswitch->mode == MLX5_ESWITCH_OFFLOADS &&
-	     dev1->priv.eswitch->mode == MLX5_ESWITCH_OFFLOADS))
-		return true;
-
-	return false;
-}
-
 bool mlx5_esw_multipath_prereq(struct mlx5_core_dev *dev0,
 			       struct mlx5_core_dev *dev1)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index bac5160837c5..a5ae5df4d6f1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -518,8 +518,6 @@ static inline bool mlx5_eswitch_vlan_actions_supported(struct mlx5_core_dev *dev
 		MLX5_CAP_ESW_FLOWTABLE_FDB(dev, push_vlan_2);
 }
 
-bool mlx5_esw_lag_prereq(struct mlx5_core_dev *dev0,
-			 struct mlx5_core_dev *dev1);
 bool mlx5_esw_multipath_prereq(struct mlx5_core_dev *dev0,
 			       struct mlx5_core_dev *dev1);
 
@@ -724,7 +722,6 @@ static inline int  mlx5_eswitch_init(struct mlx5_core_dev *dev) { return 0; }
 static inline void mlx5_eswitch_cleanup(struct mlx5_eswitch *esw) {}
 static inline int mlx5_eswitch_enable(struct mlx5_eswitch *esw, int num_vfs) { return 0; }
 static inline void mlx5_eswitch_disable(struct mlx5_eswitch *esw, bool clear_vf) {}
-static inline bool mlx5_esw_lag_prereq(struct mlx5_core_dev *dev0, struct mlx5_core_dev *dev1) { return true; }
 static inline bool mlx5_eswitch_is_funcs_handler(struct mlx5_core_dev *dev) { return false; }
 static inline
 int mlx5_eswitch_set_vport_state(struct mlx5_eswitch *esw, u16 vport, int link_state) { return 0; }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
index fe34cce77d07..1de843d2f248 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
@@ -457,12 +457,19 @@ static int mlx5_deactivate_lag(struct mlx5_lag *ldev)
 
 static bool mlx5_lag_check_prereq(struct mlx5_lag *ldev)
 {
+#ifdef CONFIG_MLX5_ESWITCH
+	u8 mode;
+#endif
+
 	if (!ldev->pf[MLX5_LAG_P1].dev || !ldev->pf[MLX5_LAG_P2].dev)
 		return false;
 
 #ifdef CONFIG_MLX5_ESWITCH
-	return mlx5_esw_lag_prereq(ldev->pf[MLX5_LAG_P1].dev,
-				   ldev->pf[MLX5_LAG_P2].dev);
+	mode = mlx5_eswitch_mode(ldev->pf[MLX5_LAG_P1].dev);
+
+	return (mode == MLX5_ESWITCH_NONE || mode == MLX5_ESWITCH_OFFLOADS) &&
+		(mlx5_eswitch_mode(ldev->pf[MLX5_LAG_P1].dev) ==
+		 mlx5_eswitch_mode(ldev->pf[MLX5_LAG_P2].dev));
 #else
 	return (!mlx5_sriov_is_enabled(ldev->pf[MLX5_LAG_P1].dev) &&
 		!mlx5_sriov_is_enabled(ldev->pf[MLX5_LAG_P2].dev));
-- 
2.35.1



* [net-next 06/15] net/mlx5: Lag, use lag lock
From: Saeed Mahameed @ 2022-05-10  5:57 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Mark Bloch, Maor Gottlieb, Saeed Mahameed

From: Mark Bloch <mbloch@nvidia.com>

Use a lag-specific lock instead of depending on external locks to
synchronise lag creation/destruction.

With this, taking E-Switch mode lock is no longer needed for syncing
lag logic.

Clean up any dead code that is left over and don't export functions
that aren't used outside the E-Switch core code.
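
The resulting locking pattern in the bond worker, in outline (condensed
from the mlx5_do_bond_work() hunk below):

/* Sketch: lag state is guarded by ldev->lock, not the esw mode locks. */
if (!mlx5_dev_list_trylock()) {		/* outer device-list lock first */
	mlx5_queue_bond_work(ldev, HZ);	/* retry later */
	return;
}
mutex_lock(&ldev->lock);		/* then the lag-specific lock */
if (ldev->mode_changes_in_progress) {
	mutex_unlock(&ldev->lock);
	mlx5_dev_list_unlock();
	mlx5_queue_bond_work(ldev, HZ);
	return;
}
mlx5_do_bond(ldev);
mutex_unlock(&ldev->lock);
mlx5_dev_list_unlock();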

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/eswitch.c | 14 ----
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |  5 --
 .../net/ethernet/mellanox/mlx5/core/lag/lag.c | 79 ++++++++-----------
 .../net/ethernet/mellanox/mlx5/core/lag/lag.h |  2 +
 4 files changed, 35 insertions(+), 65 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 8ef22893e5e6..719ef26d23c0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1569,9 +1569,7 @@ int mlx5_eswitch_init(struct mlx5_core_dev *dev)
 	ida_init(&esw->offloads.vport_metadata_ida);
 	xa_init_flags(&esw->offloads.vhca_map, XA_FLAGS_ALLOC);
 	mutex_init(&esw->state_lock);
-	lockdep_register_key(&esw->mode_lock_key);
 	init_rwsem(&esw->mode_lock);
-	lockdep_set_class(&esw->mode_lock, &esw->mode_lock_key);
 	refcount_set(&esw->qos.refcnt, 0);
 
 	esw->enabled_vports = 0;
@@ -1615,7 +1613,6 @@ void mlx5_eswitch_cleanup(struct mlx5_eswitch *esw)
 	esw->dev->priv.eswitch = NULL;
 	destroy_workqueue(esw->work_queue);
 	WARN_ON(refcount_read(&esw->qos.refcnt));
-	lockdep_unregister_key(&esw->mode_lock_key);
 	mutex_destroy(&esw->state_lock);
 	WARN_ON(!xa_empty(&esw->offloads.vhca_map));
 	xa_destroy(&esw->offloads.vhca_map);
@@ -2003,17 +2000,6 @@ void mlx5_esw_unlock(struct mlx5_eswitch *esw)
 	up_write(&esw->mode_lock);
 }
 
-/**
- * mlx5_esw_lock() - Take write lock on esw mode lock
- * @esw: eswitch device.
- */
-void mlx5_esw_lock(struct mlx5_eswitch *esw)
-{
-	if (!mlx5_esw_allowed(esw))
-		return;
-	down_write(&esw->mode_lock);
-}
-
 /**
  * mlx5_eswitch_get_total_vports - Get total vports of the eswitch
  *
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index a5ae5df4d6f1..2754a732914d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -331,7 +331,6 @@ struct mlx5_eswitch {
 		u32             large_group_num;
 	}  params;
 	struct blocking_notifier_head n_head;
-	struct lock_class_key mode_lock_key;
 };
 
 void esw_offloads_disable(struct mlx5_eswitch *esw);
@@ -704,7 +703,6 @@ void mlx5_esw_get(struct mlx5_core_dev *dev);
 void mlx5_esw_put(struct mlx5_core_dev *dev);
 int mlx5_esw_try_lock(struct mlx5_eswitch *esw);
 void mlx5_esw_unlock(struct mlx5_eswitch *esw);
-void mlx5_esw_lock(struct mlx5_eswitch *esw);
 
 void esw_vport_change_handle_locked(struct mlx5_vport *vport);
 
@@ -730,9 +728,6 @@ static inline const u32 *mlx5_esw_query_functions(struct mlx5_core_dev *dev)
 	return ERR_PTR(-EOPNOTSUPP);
 }
 
-static inline void mlx5_esw_unlock(struct mlx5_eswitch *esw) { return; }
-static inline void mlx5_esw_lock(struct mlx5_eswitch *esw) { return; }
-
 static inline struct mlx5_flow_handle *
 esw_add_restore_rule(struct mlx5_eswitch *esw, u32 tag)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
index 1de843d2f248..fc32f3e05191 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
@@ -121,6 +121,7 @@ static void mlx5_ldev_free(struct kref *ref)
 	mlx5_lag_mp_cleanup(ldev);
 	cancel_delayed_work_sync(&ldev->bond_work);
 	destroy_workqueue(ldev->wq);
+	mutex_destroy(&ldev->lock);
 	kfree(ldev);
 }
 
@@ -150,6 +151,7 @@ static struct mlx5_lag *mlx5_lag_dev_alloc(struct mlx5_core_dev *dev)
 	}
 
 	kref_init(&ldev->ref);
+	mutex_init(&ldev->lock);
 	INIT_DELAYED_WORK(&ldev->bond_work, mlx5_do_bond_work);
 
 	ldev->nb.notifier_call = mlx5_lag_netdev_event;
@@ -643,31 +645,11 @@ static void mlx5_queue_bond_work(struct mlx5_lag *ldev, unsigned long delay)
 	queue_delayed_work(ldev->wq, &ldev->bond_work, delay);
 }
 
-static void mlx5_lag_lock_eswitches(struct mlx5_core_dev *dev0,
-				    struct mlx5_core_dev *dev1)
-{
-	if (dev0)
-		mlx5_esw_lock(dev0->priv.eswitch);
-	if (dev1)
-		mlx5_esw_lock(dev1->priv.eswitch);
-}
-
-static void mlx5_lag_unlock_eswitches(struct mlx5_core_dev *dev0,
-				      struct mlx5_core_dev *dev1)
-{
-	if (dev1)
-		mlx5_esw_unlock(dev1->priv.eswitch);
-	if (dev0)
-		mlx5_esw_unlock(dev0->priv.eswitch);
-}
-
 static void mlx5_do_bond_work(struct work_struct *work)
 {
 	struct delayed_work *delayed_work = to_delayed_work(work);
 	struct mlx5_lag *ldev = container_of(delayed_work, struct mlx5_lag,
 					     bond_work);
-	struct mlx5_core_dev *dev0 = ldev->pf[MLX5_LAG_P1].dev;
-	struct mlx5_core_dev *dev1 = ldev->pf[MLX5_LAG_P2].dev;
 	int status;
 
 	status = mlx5_dev_list_trylock();
@@ -676,15 +658,16 @@ static void mlx5_do_bond_work(struct work_struct *work)
 		return;
 	}
 
+	mutex_lock(&ldev->lock);
 	if (ldev->mode_changes_in_progress) {
+		mutex_unlock(&ldev->lock);
 		mlx5_dev_list_unlock();
 		mlx5_queue_bond_work(ldev, HZ);
 		return;
 	}
 
-	mlx5_lag_lock_eswitches(dev0, dev1);
 	mlx5_do_bond(ldev);
-	mlx5_lag_unlock_eswitches(dev0, dev1);
+	mutex_unlock(&ldev->lock);
 	mlx5_dev_list_unlock();
 }
 
@@ -908,7 +891,6 @@ static void mlx5_ldev_add_mdev(struct mlx5_lag *ldev,
 	dev->priv.lag = ldev;
 }
 
-/* Must be called with intf_mutex held */
 static void mlx5_ldev_remove_mdev(struct mlx5_lag *ldev,
 				  struct mlx5_core_dev *dev)
 {
@@ -946,13 +928,18 @@ static int __mlx5_lag_dev_add_mdev(struct mlx5_core_dev *dev)
 			mlx5_core_err(dev, "Failed to alloc lag dev\n");
 			return 0;
 		}
-	} else {
-		if (ldev->mode_changes_in_progress)
-			return -EAGAIN;
-		mlx5_ldev_get(ldev);
+		mlx5_ldev_add_mdev(ldev, dev);
+		return 0;
 	}
 
+	mutex_lock(&ldev->lock);
+	if (ldev->mode_changes_in_progress) {
+		mutex_unlock(&ldev->lock);
+		return -EAGAIN;
+	}
+	mlx5_ldev_get(ldev);
 	mlx5_ldev_add_mdev(ldev, dev);
+	mutex_unlock(&ldev->lock);
 
 	return 0;
 }
@@ -966,14 +953,14 @@ void mlx5_lag_remove_mdev(struct mlx5_core_dev *dev)
 		return;
 
 recheck:
-	mlx5_dev_list_lock();
+	mutex_lock(&ldev->lock);
 	if (ldev->mode_changes_in_progress) {
-		mlx5_dev_list_unlock();
+		mutex_unlock(&ldev->lock);
 		msleep(100);
 		goto recheck;
 	}
 	mlx5_ldev_remove_mdev(ldev, dev);
-	mlx5_dev_list_unlock();
+	mutex_unlock(&ldev->lock);
 	mlx5_ldev_put(ldev);
 }
 
@@ -984,32 +971,35 @@ void mlx5_lag_add_mdev(struct mlx5_core_dev *dev)
 recheck:
 	mlx5_dev_list_lock();
 	err = __mlx5_lag_dev_add_mdev(dev);
+	mlx5_dev_list_unlock();
+
 	if (err) {
-		mlx5_dev_list_unlock();
 		msleep(100);
 		goto recheck;
 	}
-	mlx5_dev_list_unlock();
 }
 
-/* Must be called with intf_mutex held */
 void mlx5_lag_remove_netdev(struct mlx5_core_dev *dev,
 			    struct net_device *netdev)
 {
 	struct mlx5_lag *ldev;
+	bool lag_is_active;
 
 	ldev = mlx5_lag_dev(dev);
 	if (!ldev)
 		return;
 
+	mutex_lock(&ldev->lock);
 	mlx5_ldev_remove_netdev(ldev, netdev);
 	ldev->flags &= ~MLX5_LAG_FLAG_READY;
 
-	if (__mlx5_lag_is_active(ldev))
+	lag_is_active = __mlx5_lag_is_active(ldev);
+	mutex_unlock(&ldev->lock);
+
+	if (lag_is_active)
 		mlx5_queue_bond_work(ldev, 0);
 }
 
-/* Must be called with intf_mutex held */
 void mlx5_lag_add_netdev(struct mlx5_core_dev *dev,
 			 struct net_device *netdev)
 {
@@ -1020,6 +1010,7 @@ void mlx5_lag_add_netdev(struct mlx5_core_dev *dev,
 	if (!ldev)
 		return;
 
+	mutex_lock(&ldev->lock);
 	mlx5_ldev_add_netdev(ldev, dev, netdev);
 
 	for (i = 0; i < MLX5_MAX_PORTS; i++)
@@ -1028,6 +1019,7 @@ void mlx5_lag_add_netdev(struct mlx5_core_dev *dev,
 
 	if (i >= MLX5_MAX_PORTS)
 		ldev->flags |= MLX5_LAG_FLAG_READY;
+	mutex_unlock(&ldev->lock);
 	mlx5_queue_bond_work(ldev, 0);
 }
 
@@ -1104,8 +1096,6 @@ EXPORT_SYMBOL(mlx5_lag_is_shared_fdb);
 
 void mlx5_lag_disable_change(struct mlx5_core_dev *dev)
 {
-	struct mlx5_core_dev *dev0;
-	struct mlx5_core_dev *dev1;
 	struct mlx5_lag *ldev;
 
 	ldev = mlx5_lag_dev(dev);
@@ -1113,16 +1103,13 @@ void mlx5_lag_disable_change(struct mlx5_core_dev *dev)
 		return;
 
 	mlx5_dev_list_lock();
-
-	dev0 = ldev->pf[MLX5_LAG_P1].dev;
-	dev1 = ldev->pf[MLX5_LAG_P2].dev;
+	mutex_lock(&ldev->lock);
 
 	ldev->mode_changes_in_progress++;
-	if (__mlx5_lag_is_active(ldev)) {
-		mlx5_lag_lock_eswitches(dev0, dev1);
+	if (__mlx5_lag_is_active(ldev))
 		mlx5_disable_lag(ldev);
-		mlx5_lag_unlock_eswitches(dev0, dev1);
-	}
+
+	mutex_unlock(&ldev->lock);
 	mlx5_dev_list_unlock();
 }
 
@@ -1134,9 +1121,9 @@ void mlx5_lag_enable_change(struct mlx5_core_dev *dev)
 	if (!ldev)
 		return;
 
-	mlx5_dev_list_lock();
+	mutex_lock(&ldev->lock);
 	ldev->mode_changes_in_progress--;
-	mlx5_dev_list_unlock();
+	mutex_unlock(&ldev->lock);
 	mlx5_queue_bond_work(ldev, 0);
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.h b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.h
index cbf9a9003e55..03a7ea07ce96 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.h
@@ -56,6 +56,8 @@ struct mlx5_lag {
 	struct notifier_block     nb;
 	struct lag_mp             lag_mp;
 	struct mlx5_lag_port_sel  port_sel;
+	/* Protect lag fields/state changes */
+	struct mutex		  lock;
 };
 
 static inline struct mlx5_lag *
-- 
2.35.1



* [net-next 07/15] net/mlx5: Lag, filter non compatible devices
From: Saeed Mahameed @ 2022-05-10  5:57 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Mark Bloch, Maor Gottlieb, Saeed Mahameed

From: Mark Bloch <mbloch@nvidia.com>

When searching for a peer lag device, we can filter based on that
device's capabilities.

A downstream patch will be less strict when filtering compatible
devices: it will remove the limitation requiring exactly MLX5_MAX_PORTS
and change it to a range.
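
In outline, the lag-specific match callback filters candidates like so
(from the dev.c hunk below):

/* Sketch: only PFs that can actually form this lag count as peers. */
static int next_phys_dev_lag(struct device *dev, const void *data)
{
	struct mlx5_adev *madev = container_of(dev, struct mlx5_adev, adev.dev);
	struct mlx5_core_dev *mdev = madev->mdev;

	if (!MLX5_CAP_GEN(mdev, vport_group_manager) ||
	    !MLX5_CAP_GEN(mdev, lag_master) ||
	    MLX5_CAP_GEN(mdev, num_lag_ports) != MLX5_MAX_PORTS)
		return 0;	/* not lag-capable/compatible */

	return _next_phys_dev(mdev, data);	/* same-PCI-id, not-self checks */
}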

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/dev.c | 48 +++++++++++++++----
 .../net/ethernet/mellanox/mlx5/core/lag/lag.c | 12 ++---
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |  1 +
 3 files changed, 47 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/dev.c b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
index ba6dad97e308..3e750b827a19 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/dev.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
@@ -555,12 +555,9 @@ static u32 mlx5_gen_pci_id(const struct mlx5_core_dev *dev)
 		     PCI_SLOT(dev->pdev->devfn));
 }
 
-static int next_phys_dev(struct device *dev, const void *data)
+static int _next_phys_dev(struct mlx5_core_dev *mdev,
+			  const struct mlx5_core_dev *curr)
 {
-	struct mlx5_adev *madev = container_of(dev, struct mlx5_adev, adev.dev);
-	struct mlx5_core_dev *mdev = madev->mdev;
-	const struct mlx5_core_dev *curr = data;
-
 	if (!mlx5_core_is_pf(mdev))
 		return 0;
 
@@ -574,8 +571,29 @@ static int next_phys_dev(struct device *dev, const void *data)
 	return 1;
 }
 
-/* Must be called with intf_mutex held */
-struct mlx5_core_dev *mlx5_get_next_phys_dev(struct mlx5_core_dev *dev)
+static int next_phys_dev(struct device *dev, const void *data)
+{
+	struct mlx5_adev *madev = container_of(dev, struct mlx5_adev, adev.dev);
+	struct mlx5_core_dev *mdev = madev->mdev;
+
+	return _next_phys_dev(mdev, data);
+}
+
+static int next_phys_dev_lag(struct device *dev, const void *data)
+{
+	struct mlx5_adev *madev = container_of(dev, struct mlx5_adev, adev.dev);
+	struct mlx5_core_dev *mdev = madev->mdev;
+
+	if (!MLX5_CAP_GEN(mdev, vport_group_manager) ||
+	    !MLX5_CAP_GEN(mdev, lag_master) ||
+	    MLX5_CAP_GEN(mdev, num_lag_ports) != MLX5_MAX_PORTS)
+		return 0;
+
+	return _next_phys_dev(mdev, data);
+}
+
+static struct mlx5_core_dev *mlx5_get_next_dev(struct mlx5_core_dev *dev,
+					       int (*match)(struct device *dev, const void *data))
 {
 	struct auxiliary_device *adev;
 	struct mlx5_adev *madev;
@@ -583,7 +601,7 @@ struct mlx5_core_dev *mlx5_get_next_phys_dev(struct mlx5_core_dev *dev)
 	if (!mlx5_core_is_pf(dev))
 		return NULL;
 
-	adev = auxiliary_find_device(NULL, dev, &next_phys_dev);
+	adev = auxiliary_find_device(NULL, dev, match);
 	if (!adev)
 		return NULL;
 
@@ -592,6 +610,20 @@ struct mlx5_core_dev *mlx5_get_next_phys_dev(struct mlx5_core_dev *dev)
 	return madev->mdev;
 }
 
+/* Must be called with intf_mutex held */
+struct mlx5_core_dev *mlx5_get_next_phys_dev(struct mlx5_core_dev *dev)
+{
+	lockdep_assert_held(&mlx5_intf_mutex);
+	return mlx5_get_next_dev(dev, &next_phys_dev);
+}
+
+/* Must be called with intf_mutex held */
+struct mlx5_core_dev *mlx5_get_next_phys_dev_lag(struct mlx5_core_dev *dev)
+{
+	lockdep_assert_held(&mlx5_intf_mutex);
+	return mlx5_get_next_dev(dev, &next_phys_dev_lag);
+}
+
 void mlx5_dev_list_lock(void)
 {
 	mutex_lock(&mlx5_intf_mutex);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
index fc32f3e05191..360cb1c4221e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
@@ -913,12 +913,7 @@ static int __mlx5_lag_dev_add_mdev(struct mlx5_core_dev *dev)
 	struct mlx5_lag *ldev = NULL;
 	struct mlx5_core_dev *tmp_dev;
 
-	if (!MLX5_CAP_GEN(dev, vport_group_manager) ||
-	    !MLX5_CAP_GEN(dev, lag_master) ||
-	    MLX5_CAP_GEN(dev, num_lag_ports) != MLX5_MAX_PORTS)
-		return 0;
-
-	tmp_dev = mlx5_get_next_phys_dev(dev);
+	tmp_dev = mlx5_get_next_phys_dev_lag(dev);
 	if (tmp_dev)
 		ldev = tmp_dev->priv.lag;
 
@@ -968,6 +963,11 @@ void mlx5_lag_add_mdev(struct mlx5_core_dev *dev)
 {
 	int err;
 
+	if (!MLX5_CAP_GEN(dev, vport_group_manager) ||
+	    !MLX5_CAP_GEN(dev, lag_master) ||
+	    MLX5_CAP_GEN(dev, num_lag_ports) != MLX5_MAX_PORTS)
+		return;
+
 recheck:
 	mlx5_dev_list_lock();
 	err = __mlx5_lag_dev_add_mdev(dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 9026be1d6223..484cb1e4fc7f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -210,6 +210,7 @@ void mlx5_detach_device(struct mlx5_core_dev *dev);
 int mlx5_register_device(struct mlx5_core_dev *dev);
 void mlx5_unregister_device(struct mlx5_core_dev *dev);
 struct mlx5_core_dev *mlx5_get_next_phys_dev(struct mlx5_core_dev *dev);
+struct mlx5_core_dev *mlx5_get_next_phys_dev_lag(struct mlx5_core_dev *dev);
 void mlx5_dev_list_lock(void);
 void mlx5_dev_list_unlock(void);
 int mlx5_dev_list_trylock(void);
-- 
2.35.1



* [net-next 08/15] net/mlx5: Lag, store number of ports inside lag object
From: Saeed Mahameed @ 2022-05-10  5:57 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Mark Bloch, Maor Gottlieb, Saeed Mahameed

From: Mark Bloch <mbloch@nvidia.com>

Store the number of lag ports inside the lag object. The lag object is
a single shared object managing the lag state of multiple mlx5 devices
on the same physical HCA.

Downstream patches will allow hardware lag to be created over devices with
more than 2 ports.
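
The port count is read once from the HCA capability at lag allocation
(hunk below):

/* Sketch: lag width now comes from the device, not a constant. */
ldev->ports = MLX5_CAP_GEN(dev, num_lag_ports);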

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c | 1 +
 drivers/net/ethernet/mellanox/mlx5/core/lag/lag.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
index 360cb1c4221e..deac240e6d78 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
@@ -164,6 +164,7 @@ static struct mlx5_lag *mlx5_lag_dev_alloc(struct mlx5_core_dev *dev)
 	if (err)
 		mlx5_core_err(dev, "Failed to init multipath lag err=%d\n",
 			      err);
+	ldev->ports = MLX5_CAP_GEN(dev, num_lag_ports);
 
 	return ldev;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.h b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.h
index 03a7ea07ce96..1c8fb3fada0c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.h
@@ -45,6 +45,7 @@ struct lag_tracker {
  */
 struct mlx5_lag {
 	u8                        flags;
+	u8			  ports;
 	int			  mode_changes_in_progress;
 	bool			  shared_fdb;
 	u8                        v2p_map[MLX5_MAX_PORTS];
-- 
2.35.1



* [net-next 09/15] net/mlx5: Lag, support single FDB only on 2 ports
From: Saeed Mahameed @ 2022-05-10  5:57 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Mark Bloch, Maor Gottlieb, Saeed Mahameed

From: Mark Bloch <mbloch@nvidia.com>

The E-Switch currently doesn't support more than 2 E-Switch managers
being aggregated under a single hardware lag. Add specific checks to
disallow creating a lag when the code doesn't support it.
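
The new check, condensed from the hunk below:

#define MLX5_LAG_OFFLOADS_SUPPORTED_PORTS 2

/* Sketch: in offloads (switchdev) mode, only a 2-port lag is allowed. */
if (mode == MLX5_ESWITCH_OFFLOADS &&
    ldev->ports != MLX5_LAG_OFFLOADS_SUPPORTED_PORTS)
	return false;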

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
index deac240e6d78..4678b50b7e18 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
@@ -458,6 +458,7 @@ static int mlx5_deactivate_lag(struct mlx5_lag *ldev)
 	return 0;
 }
 
+#define MLX5_LAG_OFFLOADS_SUPPORTED_PORTS 2
 static bool mlx5_lag_check_prereq(struct mlx5_lag *ldev)
 {
 #ifdef CONFIG_MLX5_ESWITCH
@@ -470,6 +471,9 @@ static bool mlx5_lag_check_prereq(struct mlx5_lag *ldev)
 #ifdef CONFIG_MLX5_ESWITCH
 	mode = mlx5_eswitch_mode(ldev->pf[MLX5_LAG_P1].dev);
 
+	if (mode == MLX5_ESWITCH_OFFLOADS && ldev->ports != MLX5_LAG_OFFLOADS_SUPPORTED_PORTS)
+		return false;
+
 	return (mode == MLX5_ESWITCH_NONE || mode == MLX5_ESWITCH_OFFLOADS) &&
 		(mlx5_eswitch_mode(ldev->pf[MLX5_LAG_P1].dev) ==
 		 mlx5_eswitch_mode(ldev->pf[MLX5_LAG_P2].dev));
-- 
2.35.1



* [net-next 10/15] net/mlx5: Lag, use hash when in roce lag on 4 ports
From: Saeed Mahameed @ 2022-05-10  5:57 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Mark Bloch, Maor Gottlieb, Saeed Mahameed

From: Mark Bloch <mbloch@nvidia.com>

Downstream patches will add support for lag over 4 ports.
In that mode we will only use hash as the uplink selection method.
Using hash instead of queue affinity (the method used before this
patch) offers key advantages:

- It aligns the port selection method with the method used by the bond
  device.

- It gives better packet distribution, where a single queue can
  transmit from multiple ports (with queue affinity, a queue is bound
  to a single port regardless of the packet being sent).

- In case of failover, traffic is split between multiple ports and not
  a single one as in queue affinity.

Going forward, it was decided that queue affinity will be deprecated,
as hash provides a better user experience; this means that on 4-port
HCAs hash will always be used.

Future work will add hash support for 2-port HCAs as well.
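
Conceptually (this is not driver code; the actual selection is done by
the device's port selection flow table), hash mode picks the egress
port per flow rather than per queue. A toy illustration, with
hash_flow() standing in for the hardware hash:

/* Toy model: flows are spread over all active ports, so a single send
 * queue is no longer pinned to one egress port.
 */
egress_port = active_ports[hash_flow(src, dst, l4_ports) % num_active_ports];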

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/lag/lag.c | 45 +++++++++++++++----
 1 file changed, 36 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
index 4678b50b7e18..4f6867eba5fb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
@@ -310,17 +310,41 @@ void mlx5_modify_lag(struct mlx5_lag *ldev,
 		mlx5_lag_drop_rule_setup(ldev, tracker);
 }
 
-static void mlx5_lag_set_port_sel_mode(struct mlx5_lag *ldev,
-				       struct lag_tracker *tracker, u8 *flags)
+#define MLX5_LAG_ROCE_HASH_PORTS_SUPPORTED 4
+static int mlx5_lag_set_port_sel_mode_roce(struct mlx5_lag *ldev,
+					   struct lag_tracker *tracker, u8 *flags)
 {
-	bool roce_lag = !!(*flags & MLX5_LAG_FLAG_ROCE);
 	struct lag_func *dev0 = &ldev->pf[MLX5_LAG_P1];
 
-	if (roce_lag ||
-	    !MLX5_CAP_PORT_SELECTION(dev0->dev, port_select_flow_table) ||
-	    tracker->tx_type != NETDEV_LAG_TX_TYPE_HASH)
-		return;
-	*flags |= MLX5_LAG_FLAG_HASH_BASED;
+	if (ldev->ports == MLX5_LAG_ROCE_HASH_PORTS_SUPPORTED) {
+		/* Four ports are support only in hash mode */
+		if (!MLX5_CAP_PORT_SELECTION(dev0->dev, port_select_flow_table))
+			return -EINVAL;
+		*flags |= MLX5_LAG_FLAG_HASH_BASED;
+	}
+
+	return 0;
+}
+
+static int mlx5_lag_set_port_sel_mode_offloads(struct mlx5_lag *ldev,
+					       struct lag_tracker *tracker, u8 *flags)
+{
+	struct lag_func *dev0 = &ldev->pf[MLX5_LAG_P1];
+
+	if (MLX5_CAP_PORT_SELECTION(dev0->dev, port_select_flow_table) &&
+	    tracker->tx_type == NETDEV_LAG_TX_TYPE_HASH)
+		*flags |= MLX5_LAG_FLAG_HASH_BASED;
+	return 0;
+}
+
+static int mlx5_lag_set_port_sel_mode(struct mlx5_lag *ldev,
+				      struct lag_tracker *tracker, u8 *flags)
+{
+	bool roce_lag = !!(*flags & MLX5_LAG_FLAG_ROCE);
+
+	if (roce_lag)
+		return mlx5_lag_set_port_sel_mode_roce(ldev, tracker, flags);
+	return mlx5_lag_set_port_sel_mode_offloads(ldev, tracker, flags);
 }
 
 static char *get_str_port_sel_mode(u8 flags)
@@ -382,7 +406,10 @@ int mlx5_activate_lag(struct mlx5_lag *ldev,
 
 	mlx5_infer_tx_affinity_mapping(tracker, &ldev->v2p_map[MLX5_LAG_P1],
 				       &ldev->v2p_map[MLX5_LAG_P2]);
-	mlx5_lag_set_port_sel_mode(ldev, tracker, &flags);
+	err = mlx5_lag_set_port_sel_mode(ldev, tracker, &flags);
+	if (err)
+		return err;
+
 	if (flags & MLX5_LAG_FLAG_HASH_BASED) {
 		err = mlx5_lag_port_sel_create(ldev, tracker->hash_type,
 					       ldev->v2p_map[MLX5_LAG_P1],
-- 
2.35.1



* [net-next 11/15] net/mlx5: Lag, use actual number of lag ports
From: Saeed Mahameed @ 2022-05-10  5:57 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Mark Bloch, Maor Gottlieb, Saeed Mahameed

From: Mark Bloch <mbloch@nvidia.com>

Refactor the entire lag code to use ldev->ports instead of hard-coded
defines (like MLX5_MAX_PORTS) for its operations.
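
One effect visible in the mlx5_infer_tx_affinity_mapping() hunk below:
the tx affinity map is now built over ldev->ports, and each disabled
port is remapped to a randomly chosen active one. A worked example,
assuming a 4-port lag where the port at index 1 has lost link:

/* Sketch: native mapping is ports[i] = i + 1 (1-based egress ports);
 * disabled slots are then pointed at a random enabled port.
 *   before: ports[] = { 1, 2, 3, 4 }
 *   after:  ports[] = { 1, 3, 3, 4 }   (slot 1 remapped; 3 picked at random)
 */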

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/lag/lag.c | 295 +++++++++++-------
 .../mellanox/mlx5/core/lag/port_sel.c         |  60 ++--
 .../mellanox/mlx5/core/lag/port_sel.h         |  10 +-
 3 files changed, 216 insertions(+), 149 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
index 4f6867eba5fb..f2659b0f8cc5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
@@ -53,8 +53,7 @@ enum {
  */
 static DEFINE_SPINLOCK(lag_lock);
 
-static int mlx5_cmd_create_lag(struct mlx5_core_dev *dev, u8 remap_port1,
-			       u8 remap_port2, bool shared_fdb, u8 flags)
+static int mlx5_cmd_create_lag(struct mlx5_core_dev *dev, u8 *ports, bool shared_fdb, u8 flags)
 {
 	u32 in[MLX5_ST_SZ_DW(create_lag_in)] = {};
 	void *lag_ctx = MLX5_ADDR_OF(create_lag_in, in, ctx);
@@ -63,8 +62,8 @@ static int mlx5_cmd_create_lag(struct mlx5_core_dev *dev, u8 remap_port1,
 
 	MLX5_SET(lagc, lag_ctx, fdb_selection_mode, shared_fdb);
 	if (!(flags & MLX5_LAG_FLAG_HASH_BASED)) {
-		MLX5_SET(lagc, lag_ctx, tx_remap_affinity_1, remap_port1);
-		MLX5_SET(lagc, lag_ctx, tx_remap_affinity_2, remap_port2);
+		MLX5_SET(lagc, lag_ctx, tx_remap_affinity_1, ports[0]);
+		MLX5_SET(lagc, lag_ctx, tx_remap_affinity_2, ports[1]);
 	} else {
 		MLX5_SET(lagc, lag_ctx, port_select_mode,
 			 MLX5_LAG_PORT_SELECT_MODE_PORT_SELECT_FT);
@@ -73,8 +72,8 @@ static int mlx5_cmd_create_lag(struct mlx5_core_dev *dev, u8 remap_port1,
 	return mlx5_cmd_exec_in(dev, create_lag, in);
 }
 
-static int mlx5_cmd_modify_lag(struct mlx5_core_dev *dev, u8 remap_port1,
-			       u8 remap_port2)
+static int mlx5_cmd_modify_lag(struct mlx5_core_dev *dev, u8 num_ports,
+			       u8 *ports)
 {
 	u32 in[MLX5_ST_SZ_DW(modify_lag_in)] = {};
 	void *lag_ctx = MLX5_ADDR_OF(modify_lag_in, in, ctx);
@@ -82,8 +81,8 @@ static int mlx5_cmd_modify_lag(struct mlx5_core_dev *dev, u8 remap_port1,
 	MLX5_SET(modify_lag_in, in, opcode, MLX5_CMD_OP_MODIFY_LAG);
 	MLX5_SET(modify_lag_in, in, field_select, 0x1);
 
-	MLX5_SET(lagc, lag_ctx, tx_remap_affinity_1, remap_port1);
-	MLX5_SET(lagc, lag_ctx, tx_remap_affinity_2, remap_port2);
+	MLX5_SET(lagc, lag_ctx, tx_remap_affinity_1, ports[0]);
+	MLX5_SET(lagc, lag_ctx, tx_remap_affinity_2, ports[1]);
 
 	return mlx5_cmd_exec_in(dev, modify_lag, in);
 }
@@ -174,7 +173,7 @@ int mlx5_lag_dev_get_netdev_idx(struct mlx5_lag *ldev,
 {
 	int i;
 
-	for (i = 0; i < MLX5_MAX_PORTS; i++)
+	for (i = 0; i < ldev->ports; i++)
 		if (ldev->pf[i].netdev == ndev)
 			return i;
 
@@ -191,39 +190,69 @@ static bool __mlx5_lag_is_sriov(struct mlx5_lag *ldev)
 	return !!(ldev->flags & MLX5_LAG_FLAG_SRIOV);
 }
 
+static void mlx5_infer_tx_disabled(struct lag_tracker *tracker, u8 num_ports,
+				   u8 *ports, int *num_disabled)
+{
+	int i;
+
+	*num_disabled = 0;
+	for (i = 0; i < num_ports; i++) {
+		if (!tracker->netdev_state[i].tx_enabled ||
+		    !tracker->netdev_state[i].link_up)
+			ports[(*num_disabled)++] = i;
+	}
+}
+
 static void mlx5_infer_tx_affinity_mapping(struct lag_tracker *tracker,
-					   u8 *port1, u8 *port2)
+					   u8 num_ports, u8 *ports)
 {
-	bool p1en;
-	bool p2en;
+	int disabled[MLX5_MAX_PORTS] = {};
+	int enabled[MLX5_MAX_PORTS] = {};
+	int disabled_ports_num = 0;
+	int enabled_ports_num = 0;
+	u32 rand;
+	int i;
 
-	p1en = tracker->netdev_state[MLX5_LAG_P1].tx_enabled &&
-	       tracker->netdev_state[MLX5_LAG_P1].link_up;
+	for (i = 0; i < num_ports; i++) {
+		if (tracker->netdev_state[i].tx_enabled &&
+		    tracker->netdev_state[i].link_up)
+			enabled[enabled_ports_num++] = i;
+		else
+			disabled[disabled_ports_num++] = i;
+	}
 
-	p2en = tracker->netdev_state[MLX5_LAG_P2].tx_enabled &&
-	       tracker->netdev_state[MLX5_LAG_P2].link_up;
+	/* Use native mapping by default */
+	for (i = 0; i < num_ports; i++)
+		ports[i] = MLX5_LAG_EGRESS_PORT_1 + i;
 
-	*port1 = MLX5_LAG_EGRESS_PORT_1;
-	*port2 = MLX5_LAG_EGRESS_PORT_2;
-	if ((!p1en && !p2en) || (p1en && p2en))
+	/* If all ports are disabled/enabled keep native mapping */
+	if (enabled_ports_num == num_ports ||
+	    disabled_ports_num == num_ports)
 		return;
 
-	if (p1en)
-		*port2 = MLX5_LAG_EGRESS_PORT_1;
-	else
-		*port1 = MLX5_LAG_EGRESS_PORT_2;
+	/* Go over the disabled ports and for each assign a random active port */
+	for (i = 0; i < disabled_ports_num; i++) {
+		get_random_bytes(&rand, 4);
+
+		ports[disabled[i]] = enabled[rand % enabled_ports_num] + 1;
+	}
 }
 
 static bool mlx5_lag_has_drop_rule(struct mlx5_lag *ldev)
 {
-	return ldev->pf[MLX5_LAG_P1].has_drop || ldev->pf[MLX5_LAG_P2].has_drop;
+	int i;
+
+	for (i = 0; i < ldev->ports; i++)
+		if (ldev->pf[i].has_drop)
+			return true;
+	return false;
 }
 
 static void mlx5_lag_drop_rule_cleanup(struct mlx5_lag *ldev)
 {
 	int i;
 
-	for (i = 0; i < MLX5_MAX_PORTS; i++) {
+	for (i = 0; i < ldev->ports; i++) {
 		if (!ldev->pf[i].has_drop)
 			continue;
 
@@ -236,12 +265,12 @@ static void mlx5_lag_drop_rule_cleanup(struct mlx5_lag *ldev)
 static void mlx5_lag_drop_rule_setup(struct mlx5_lag *ldev,
 				     struct lag_tracker *tracker)
 {
-	struct mlx5_core_dev *dev0 = ldev->pf[MLX5_LAG_P1].dev;
-	struct mlx5_core_dev *dev1 = ldev->pf[MLX5_LAG_P2].dev;
-	struct mlx5_core_dev *inactive;
-	u8 v2p_port1, v2p_port2;
-	int inactive_idx;
+	u8 disabled_ports[MLX5_MAX_PORTS] = {};
+	struct mlx5_core_dev *dev;
+	int disabled_index;
+	int num_disabled;
 	int err;
+	int i;
 
 	/* First delete the current drop rule so there won't be any dropped
 	 * packets
@@ -251,58 +280,60 @@ static void mlx5_lag_drop_rule_setup(struct mlx5_lag *ldev,
 	if (!ldev->tracker.has_inactive)
 		return;
 
-	mlx5_infer_tx_affinity_mapping(tracker, &v2p_port1, &v2p_port2);
+	mlx5_infer_tx_disabled(tracker, ldev->ports, disabled_ports, &num_disabled);
 
-	if (v2p_port1 == MLX5_LAG_EGRESS_PORT_1) {
-		inactive = dev1;
-		inactive_idx = MLX5_LAG_P2;
-	} else {
-		inactive = dev0;
-		inactive_idx = MLX5_LAG_P1;
+	for (i = 0; i < num_disabled; i++) {
+		disabled_index = disabled_ports[i];
+		dev = ldev->pf[disabled_index].dev;
+		err = mlx5_esw_acl_ingress_vport_drop_rule_create(dev->priv.eswitch,
+								  MLX5_VPORT_UPLINK);
+		if (!err)
+			ldev->pf[disabled_index].has_drop = true;
+		else
+			mlx5_core_err(dev,
+				      "Failed to create lag drop rule, error: %d", err);
 	}
-
-	err = mlx5_esw_acl_ingress_vport_drop_rule_create(inactive->priv.eswitch,
-							  MLX5_VPORT_UPLINK);
-	if (!err)
-		ldev->pf[inactive_idx].has_drop = true;
-	else
-		mlx5_core_err(inactive,
-			      "Failed to create lag drop rule, error: %d", err);
 }
 
-static int _mlx5_modify_lag(struct mlx5_lag *ldev, u8 v2p_port1, u8 v2p_port2)
+static int _mlx5_modify_lag(struct mlx5_lag *ldev, u8 *ports)
 {
 	struct mlx5_core_dev *dev0 = ldev->pf[MLX5_LAG_P1].dev;
 
 	if (ldev->flags & MLX5_LAG_FLAG_HASH_BASED)
-		return mlx5_lag_port_sel_modify(ldev, v2p_port1, v2p_port2);
-	return mlx5_cmd_modify_lag(dev0, v2p_port1, v2p_port2);
+		return mlx5_lag_port_sel_modify(ldev, ports);
+	return mlx5_cmd_modify_lag(dev0, ldev->ports, ports);
 }
 
 void mlx5_modify_lag(struct mlx5_lag *ldev,
 		     struct lag_tracker *tracker)
 {
 	struct mlx5_core_dev *dev0 = ldev->pf[MLX5_LAG_P1].dev;
-	u8 v2p_port1, v2p_port2;
+	u8 ports[MLX5_MAX_PORTS] = {};
 	int err;
+	int i;
+	int j;
 
-	mlx5_infer_tx_affinity_mapping(tracker, &v2p_port1,
-				       &v2p_port2);
+	mlx5_infer_tx_affinity_mapping(tracker, ldev->ports, ports);
 
-	if (v2p_port1 != ldev->v2p_map[MLX5_LAG_P1] ||
-	    v2p_port2 != ldev->v2p_map[MLX5_LAG_P2]) {
-		err = _mlx5_modify_lag(ldev, v2p_port1, v2p_port2);
+	for (i = 0; i < ldev->ports; i++) {
+		if (ports[i] == ldev->v2p_map[i])
+			continue;
+		err = _mlx5_modify_lag(ldev, ports);
 		if (err) {
 			mlx5_core_err(dev0,
 				      "Failed to modify LAG (%d)\n",
 				      err);
 			return;
 		}
-		ldev->v2p_map[MLX5_LAG_P1] = v2p_port1;
-		ldev->v2p_map[MLX5_LAG_P2] = v2p_port2;
-		mlx5_core_info(dev0, "modify lag map port 1:%d port 2:%d",
-			       ldev->v2p_map[MLX5_LAG_P1],
-			       ldev->v2p_map[MLX5_LAG_P2]);
+		memcpy(ldev->v2p_map, ports, sizeof(ports[0]) *
+		       ldev->ports);
+
+		mlx5_core_info(dev0, "modify lag map\n");
+		for (j = 0; j < ldev->ports; j++)
+			mlx5_core_info(dev0, "\tmap port %d:%d\n",
+				       j + 1,
+				       ldev->v2p_map[j]);
+		break;
 	}
 
 	if (tracker->tx_type == NETDEV_LAG_TX_TYPE_ACTIVEBACKUP &&
@@ -362,13 +393,15 @@ static int mlx5_create_lag(struct mlx5_lag *ldev,
 	struct mlx5_core_dev *dev1 = ldev->pf[MLX5_LAG_P2].dev;
 	u32 in[MLX5_ST_SZ_DW(destroy_lag_in)] = {};
 	int err;
+	int i;
 
-	mlx5_core_info(dev0, "lag map port 1:%d port 2:%d shared_fdb:%d mode:%s",
-		       ldev->v2p_map[MLX5_LAG_P1], ldev->v2p_map[MLX5_LAG_P2],
+	mlx5_core_info(dev0, "lag map:\n");
+	for (i = 0; i < ldev->ports; i++)
+		mlx5_core_info(dev0, "\tport %d:%d\n", i + 1, ldev->v2p_map[i]);
+	mlx5_core_info(dev0, "shared_fdb:%d mode:%s\n",
 		       shared_fdb, get_str_port_sel_mode(flags));
 
-	err = mlx5_cmd_create_lag(dev0, ldev->v2p_map[MLX5_LAG_P1],
-				  ldev->v2p_map[MLX5_LAG_P2], shared_fdb, flags);
+	err = mlx5_cmd_create_lag(dev0, ldev->v2p_map, shared_fdb, flags);
 	if (err) {
 		mlx5_core_err(dev0,
 			      "Failed to create LAG (%d)\n",
@@ -404,16 +437,14 @@ int mlx5_activate_lag(struct mlx5_lag *ldev,
 	struct mlx5_core_dev *dev0 = ldev->pf[MLX5_LAG_P1].dev;
 	int err;
 
-	mlx5_infer_tx_affinity_mapping(tracker, &ldev->v2p_map[MLX5_LAG_P1],
-				       &ldev->v2p_map[MLX5_LAG_P2]);
+	mlx5_infer_tx_affinity_mapping(tracker, ldev->ports, ldev->v2p_map);
 	err = mlx5_lag_set_port_sel_mode(ldev, tracker, &flags);
 	if (err)
 		return err;
 
 	if (flags & MLX5_LAG_FLAG_HASH_BASED) {
 		err = mlx5_lag_port_sel_create(ldev, tracker->hash_type,
-					       ldev->v2p_map[MLX5_LAG_P1],
-					       ldev->v2p_map[MLX5_LAG_P2]);
+					       ldev->v2p_map);
 		if (err) {
 			mlx5_core_err(dev0,
 				      "Failed to create LAG port selection(%d)\n",
@@ -491,30 +522,37 @@ static bool mlx5_lag_check_prereq(struct mlx5_lag *ldev)
 #ifdef CONFIG_MLX5_ESWITCH
 	u8 mode;
 #endif
+	int i;
 
-	if (!ldev->pf[MLX5_LAG_P1].dev || !ldev->pf[MLX5_LAG_P2].dev)
-		return false;
+	for (i = 0; i < ldev->ports; i++)
+		if (!ldev->pf[i].dev)
+			return false;
 
 #ifdef CONFIG_MLX5_ESWITCH
 	mode = mlx5_eswitch_mode(ldev->pf[MLX5_LAG_P1].dev);
 
-	if (mode == MLX5_ESWITCH_OFFLOADS && ldev->ports != MLX5_LAG_OFFLOADS_SUPPORTED_PORTS)
+	if (mode != MLX5_ESWITCH_NONE && mode != MLX5_ESWITCH_OFFLOADS)
 		return false;
 
-	return (mode == MLX5_ESWITCH_NONE || mode == MLX5_ESWITCH_OFFLOADS) &&
-		(mlx5_eswitch_mode(ldev->pf[MLX5_LAG_P1].dev) ==
-		 mlx5_eswitch_mode(ldev->pf[MLX5_LAG_P2].dev));
+	for (i = 0; i < ldev->ports; i++)
+		if (mlx5_eswitch_mode(ldev->pf[i].dev) != mode)
+			return false;
+
+	if (mode == MLX5_ESWITCH_OFFLOADS && ldev->ports != MLX5_LAG_OFFLOADS_SUPPORTED_PORTS)
+		return false;
 #else
-	return (!mlx5_sriov_is_enabled(ldev->pf[MLX5_LAG_P1].dev) &&
-		!mlx5_sriov_is_enabled(ldev->pf[MLX5_LAG_P2].dev));
+	for (i = 0; i < ldev->ports; i++)
+		if (mlx5_sriov_is_enabled(ldev->pf[i].dev))
+			return false;
 #endif
+	return true;
 }
 
 static void mlx5_lag_add_devices(struct mlx5_lag *ldev)
 {
 	int i;
 
-	for (i = 0; i < MLX5_MAX_PORTS; i++) {
+	for (i = 0; i < ldev->ports; i++) {
 		if (!ldev->pf[i].dev)
 			continue;
 
@@ -531,7 +569,7 @@ static void mlx5_lag_remove_devices(struct mlx5_lag *ldev)
 {
 	int i;
 
-	for (i = 0; i < MLX5_MAX_PORTS; i++) {
+	for (i = 0; i < ldev->ports; i++) {
 		if (!ldev->pf[i].dev)
 			continue;
 
@@ -551,6 +589,7 @@ static void mlx5_disable_lag(struct mlx5_lag *ldev)
 	bool shared_fdb = ldev->shared_fdb;
 	bool roce_lag;
 	int err;
+	int i;
 
 	roce_lag = __mlx5_lag_is_roce(ldev);
 
@@ -561,7 +600,8 @@ static void mlx5_disable_lag(struct mlx5_lag *ldev)
 			dev0->priv.flags |= MLX5_PRIV_FLAGS_DISABLE_IB_ADEV;
 			mlx5_rescan_drivers_locked(dev0);
 		}
-		mlx5_nic_vport_disable_roce(dev1);
+		for (i = 1; i < ldev->ports; i++)
+			mlx5_nic_vport_disable_roce(ldev->pf[i].dev);
 	}
 
 	err = mlx5_deactivate_lag(ldev);
@@ -598,6 +638,23 @@ static bool mlx5_shared_fdb_supported(struct mlx5_lag *ldev)
 	return false;
 }
 
+static bool mlx5_lag_is_roce_lag(struct mlx5_lag *ldev)
+{
+	bool roce_lag = true;
+	int i;
+
+	for (i = 0; i < ldev->ports; i++)
+		roce_lag = roce_lag && !mlx5_sriov_is_enabled(ldev->pf[i].dev);
+
+#ifdef CONFIG_MLX5_ESWITCH
+	for (i = 0; i < ldev->ports; i++)
+		roce_lag = roce_lag &&
+			ldev->pf[i].dev->priv.eswitch->mode == MLX5_ESWITCH_NONE;
+#endif
+
+	return roce_lag;
+}
+
 static void mlx5_do_bond(struct mlx5_lag *ldev)
 {
 	struct mlx5_core_dev *dev0 = ldev->pf[MLX5_LAG_P1].dev;
@@ -605,6 +662,7 @@ static void mlx5_do_bond(struct mlx5_lag *ldev)
 	struct lag_tracker tracker;
 	bool do_bond, roce_lag;
 	int err;
+	int i;
 
 	if (!mlx5_lag_is_ready(ldev)) {
 		do_bond = false;
@@ -621,14 +679,7 @@ static void mlx5_do_bond(struct mlx5_lag *ldev)
 	if (do_bond && !__mlx5_lag_is_active(ldev)) {
 		bool shared_fdb = mlx5_shared_fdb_supported(ldev);
 
-		roce_lag = !mlx5_sriov_is_enabled(dev0) &&
-			   !mlx5_sriov_is_enabled(dev1);
-
-#ifdef CONFIG_MLX5_ESWITCH
-		roce_lag = roce_lag &&
-			   dev0->priv.eswitch->mode == MLX5_ESWITCH_NONE &&
-			   dev1->priv.eswitch->mode == MLX5_ESWITCH_NONE;
-#endif
+		roce_lag = mlx5_lag_is_roce_lag(ldev);
 
 		if (shared_fdb || roce_lag)
 			mlx5_lag_remove_devices(ldev);
@@ -645,7 +696,8 @@ static void mlx5_do_bond(struct mlx5_lag *ldev)
 		} else if (roce_lag) {
 			dev0->priv.flags &= ~MLX5_PRIV_FLAGS_DISABLE_IB_ADEV;
 			mlx5_rescan_drivers_locked(dev0);
-			mlx5_nic_vport_enable_roce(dev1);
+			for (i = 1; i < ldev->ports; i++)
+				mlx5_nic_vport_enable_roce(ldev->pf[i].dev);
 		} else if (shared_fdb) {
 			dev0->priv.flags &= ~MLX5_PRIV_FLAGS_DISABLE_IB_ADEV;
 			mlx5_rescan_drivers_locked(dev0);
@@ -713,7 +765,7 @@ static int mlx5_handle_changeupper_event(struct mlx5_lag *ldev,
 	bool is_bonded, is_in_lag, mode_supported;
 	bool has_inactive = 0;
 	struct slave *slave;
-	int bond_status = 0;
+	u8 bond_status = 0;
 	int num_slaves = 0;
 	int changed = 0;
 	int idx;
@@ -744,7 +796,7 @@ static int mlx5_handle_changeupper_event(struct mlx5_lag *ldev,
 	rcu_read_unlock();
 
 	/* None of this lagdev's netdevs are slaves of this master. */
-	if (!(bond_status & 0x3))
+	if (!(bond_status & GENMASK(ldev->ports - 1, 0)))
 		return 0;
 
 	if (lag_upper_info) {
@@ -757,7 +809,8 @@ static int mlx5_handle_changeupper_event(struct mlx5_lag *ldev,
 	 * A device is considered bonded if both its physical ports are slaves
 	 * of the same lag master, and only them.
 	 */
-	is_in_lag = num_slaves == MLX5_MAX_PORTS && bond_status == 0x3;
+	is_in_lag = num_slaves == ldev->ports &&
+		bond_status == GENMASK(ldev->ports - 1, 0);
 
 	/* Lag mode must be activebackup or hash. */
 	mode_supported = tracker->tx_type == NETDEV_LAG_TX_TYPE_ACTIVEBACKUP ||
@@ -886,7 +939,7 @@ static void mlx5_ldev_add_netdev(struct mlx5_lag *ldev,
 {
 	unsigned int fn = mlx5_get_dev_index(dev);
 
-	if (fn >= MLX5_MAX_PORTS)
+	if (fn >= ldev->ports)
 		return;
 
 	spin_lock(&lag_lock);
@@ -902,7 +955,7 @@ static void mlx5_ldev_remove_netdev(struct mlx5_lag *ldev,
 	int i;
 
 	spin_lock(&lag_lock);
-	for (i = 0; i < MLX5_MAX_PORTS; i++) {
+	for (i = 0; i < ldev->ports; i++) {
 		if (ldev->pf[i].netdev == netdev) {
 			ldev->pf[i].netdev = NULL;
 			break;
@@ -916,7 +969,7 @@ static void mlx5_ldev_add_mdev(struct mlx5_lag *ldev,
 {
 	unsigned int fn = mlx5_get_dev_index(dev);
 
-	if (fn >= MLX5_MAX_PORTS)
+	if (fn >= ldev->ports)
 		return;
 
 	ldev->pf[fn].dev = dev;
@@ -928,11 +981,11 @@ static void mlx5_ldev_remove_mdev(struct mlx5_lag *ldev,
 {
 	int i;
 
-	for (i = 0; i < MLX5_MAX_PORTS; i++)
+	for (i = 0; i < ldev->ports; i++)
 		if (ldev->pf[i].dev == dev)
 			break;
 
-	if (i == MLX5_MAX_PORTS)
+	if (i == ldev->ports)
 		return;
 
 	ldev->pf[i].dev = NULL;
@@ -1045,11 +1098,11 @@ void mlx5_lag_add_netdev(struct mlx5_core_dev *dev,
 	mutex_lock(&ldev->lock);
 	mlx5_ldev_add_netdev(ldev, dev, netdev);
 
-	for (i = 0; i < MLX5_MAX_PORTS; i++)
+	for (i = 0; i < ldev->ports; i++)
 		if (!ldev->pf[i].dev)
 			break;
 
-	if (i >= MLX5_MAX_PORTS)
+	if (i >= ldev->ports)
 		ldev->flags |= MLX5_LAG_FLAG_READY;
 	mutex_unlock(&ldev->lock);
 	mlx5_queue_bond_work(ldev, 0);
@@ -1163,6 +1216,7 @@ struct net_device *mlx5_lag_get_roce_netdev(struct mlx5_core_dev *dev)
 {
 	struct net_device *ndev = NULL;
 	struct mlx5_lag *ldev;
+	int i;
 
 	spin_lock(&lag_lock);
 	ldev = mlx5_lag_dev(dev);
@@ -1171,9 +1225,11 @@ struct net_device *mlx5_lag_get_roce_netdev(struct mlx5_core_dev *dev)
 		goto unlock;
 
 	if (ldev->tracker.tx_type == NETDEV_LAG_TX_TYPE_ACTIVEBACKUP) {
-		ndev = ldev->tracker.netdev_state[MLX5_LAG_P1].tx_enabled ?
-		       ldev->pf[MLX5_LAG_P1].netdev :
-		       ldev->pf[MLX5_LAG_P2].netdev;
+		for (i = 0; i < ldev->ports; i++)
+			if (ldev->tracker.netdev_state[i].tx_enabled)
+				ndev = ldev->pf[i].netdev;
+		if (!ndev)
+			ndev = ldev->pf[ldev->ports - 1].netdev;
 	} else {
 		ndev = ldev->pf[MLX5_LAG_P1].netdev;
 	}
@@ -1192,16 +1248,19 @@ u8 mlx5_lag_get_slave_port(struct mlx5_core_dev *dev,
 {
 	struct mlx5_lag *ldev;
 	u8 port = 0;
+	int i;
 
 	spin_lock(&lag_lock);
 	ldev = mlx5_lag_dev(dev);
 	if (!(ldev && __mlx5_lag_is_roce(ldev)))
 		goto unlock;
 
-	if (ldev->pf[MLX5_LAG_P1].netdev == slave)
-		port = MLX5_LAG_P1;
-	else
-		port = MLX5_LAG_P2;
+	for (i = 0; i < ldev->ports; i++) {
+		if (ldev->pf[i].netdev == slave) {
+			port = i;
+			break;
+		}
+	}
 
 	port = ldev->v2p_map[port];
 
@@ -1213,7 +1272,13 @@ EXPORT_SYMBOL(mlx5_lag_get_slave_port);
 
 u8 mlx5_lag_get_num_ports(struct mlx5_core_dev *dev)
 {
-	return MLX5_MAX_PORTS;
+	struct mlx5_lag *ldev;
+
+	ldev = mlx5_lag_dev(dev);
+	if (!ldev)
+		return 0;
+
+	return ldev->ports;
 }
 EXPORT_SYMBOL(mlx5_lag_get_num_ports);
 
@@ -1243,7 +1308,7 @@ int mlx5_lag_query_cong_counters(struct mlx5_core_dev *dev,
 				 size_t *offsets)
 {
 	int outlen = MLX5_ST_SZ_BYTES(query_cong_statistics_out);
-	struct mlx5_core_dev *mdev[MLX5_MAX_PORTS];
+	struct mlx5_core_dev **mdev;
 	struct mlx5_lag *ldev;
 	int num_ports;
 	int ret, i, j;
@@ -1253,14 +1318,20 @@ int mlx5_lag_query_cong_counters(struct mlx5_core_dev *dev,
 	if (!out)
 		return -ENOMEM;
 
+	mdev = kvzalloc(sizeof(mdev[0]) * MLX5_MAX_PORTS, GFP_KERNEL);
+	if (!mdev) {
+		ret = -ENOMEM;
+		goto free_out;
+	}
+
 	memset(values, 0, sizeof(*values) * num_counters);
 
 	spin_lock(&lag_lock);
 	ldev = mlx5_lag_dev(dev);
 	if (ldev && __mlx5_lag_is_active(ldev)) {
-		num_ports = MLX5_MAX_PORTS;
-		mdev[MLX5_LAG_P1] = ldev->pf[MLX5_LAG_P1].dev;
-		mdev[MLX5_LAG_P2] = ldev->pf[MLX5_LAG_P2].dev;
+		num_ports = ldev->ports;
+		for (i = 0; i < ldev->ports; i++)
+			mdev[i] = ldev->pf[i].dev;
 	} else {
 		num_ports = 1;
 		mdev[MLX5_LAG_P1] = dev;
@@ -1275,13 +1346,15 @@ int mlx5_lag_query_cong_counters(struct mlx5_core_dev *dev,
 		ret = mlx5_cmd_exec_inout(mdev[i], query_cong_statistics, in,
 					  out);
 		if (ret)
-			goto free;
+			goto free_mdev;
 
 		for (j = 0; j < num_counters; ++j)
 			values[j] += be64_to_cpup((__be64 *)(out + offsets[j]));
 	}
 
-free:
+free_mdev:
+	kvfree(mdev);
+free_out:
 	kvfree(out);
 	return ret;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.c
index 5be322528279..478b4ef723f8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.c
@@ -12,7 +12,8 @@ enum {
 
 static struct mlx5_flow_group *
 mlx5_create_hash_flow_group(struct mlx5_flow_table *ft,
-			    struct mlx5_flow_definer *definer)
+			    struct mlx5_flow_definer *definer,
+			    u8 ports)
 {
 	int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
 	struct mlx5_flow_group *fg;
@@ -25,7 +26,7 @@ mlx5_create_hash_flow_group(struct mlx5_flow_table *ft,
 	MLX5_SET(create_flow_group_in, in, match_definer_id,
 		 mlx5_get_match_definer_id(definer));
 	MLX5_SET(create_flow_group_in, in, start_flow_index, 0);
-	MLX5_SET(create_flow_group_in, in, end_flow_index, MLX5_MAX_PORTS - 1);
+	MLX5_SET(create_flow_group_in, in, end_flow_index, ports - 1);
 	MLX5_SET(create_flow_group_in, in, group_type,
 		 MLX5_CREATE_FLOW_GROUP_IN_GROUP_TYPE_HASH_SPLIT);
 
@@ -36,7 +37,7 @@ mlx5_create_hash_flow_group(struct mlx5_flow_table *ft,
 
 static int mlx5_lag_create_port_sel_table(struct mlx5_lag *ldev,
 					  struct mlx5_lag_definer *lag_definer,
-					  u8 port1, u8 port2)
+					  u8 *ports)
 {
 	struct mlx5_core_dev *dev = ldev->pf[MLX5_LAG_P1].dev;
 	struct mlx5_flow_table_attr ft_attr = {};
@@ -45,7 +46,7 @@ static int mlx5_lag_create_port_sel_table(struct mlx5_lag *ldev,
 	struct mlx5_flow_namespace *ns;
 	int err, i;
 
-	ft_attr.max_fte = MLX5_MAX_PORTS;
+	ft_attr.max_fte = ldev->ports;
 	ft_attr.level = MLX5_LAG_FT_LEVEL_DEFINER;
 
 	ns = mlx5_get_flow_namespace(dev, MLX5_FLOW_NAMESPACE_PORT_SEL);
@@ -61,7 +62,8 @@ static int mlx5_lag_create_port_sel_table(struct mlx5_lag *ldev,
 	}
 
 	lag_definer->fg = mlx5_create_hash_flow_group(lag_definer->ft,
-						      lag_definer->definer);
+						      lag_definer->definer,
+						      ldev->ports);
 	if (IS_ERR(lag_definer->fg)) {
 		err = PTR_ERR(lag_definer->fg);
 		goto destroy_ft;
@@ -70,8 +72,8 @@ static int mlx5_lag_create_port_sel_table(struct mlx5_lag *ldev,
 	dest.type = MLX5_FLOW_DESTINATION_TYPE_UPLINK;
 	dest.vport.flags |= MLX5_FLOW_DEST_VPORT_VHCA_ID;
 	flow_act.flags |= FLOW_ACT_NO_APPEND;
-	for (i = 0; i < MLX5_MAX_PORTS; i++) {
-		u8 affinity = i == 0 ? port1 : port2;
+	for (i = 0; i < ldev->ports; i++) {
+		u8 affinity = ports[i];
 
 		dest.vport.vhca_id = MLX5_CAP_GEN(ldev->pf[affinity - 1].dev,
 						  vhca_id);
@@ -279,8 +281,7 @@ static int mlx5_lag_set_definer(u32 *match_definer_mask,
 
 static struct mlx5_lag_definer *
 mlx5_lag_create_definer(struct mlx5_lag *ldev, enum netdev_lag_hash hash,
-			enum mlx5_traffic_types tt, bool tunnel, u8 port1,
-			u8 port2)
+			enum mlx5_traffic_types tt, bool tunnel, u8 *ports)
 {
 	struct mlx5_core_dev *dev = ldev->pf[MLX5_LAG_P1].dev;
 	struct mlx5_lag_definer *lag_definer;
@@ -308,7 +309,7 @@ mlx5_lag_create_definer(struct mlx5_lag *ldev, enum netdev_lag_hash hash,
 		goto free_mask;
 	}
 
-	err = mlx5_lag_create_port_sel_table(ldev, lag_definer, port1, port2);
+	err = mlx5_lag_create_port_sel_table(ldev, lag_definer, ports);
 	if (err)
 		goto destroy_match_definer;
 
@@ -331,7 +332,7 @@ static void mlx5_lag_destroy_definer(struct mlx5_lag *ldev,
 	struct mlx5_core_dev *dev = ldev->pf[MLX5_LAG_P1].dev;
 	int i;
 
-	for (i = 0; i < MLX5_MAX_PORTS; i++)
+	for (i = 0; i < ldev->ports; i++)
 		mlx5_del_flow_rules(lag_definer->rules[i]);
 	mlx5_destroy_flow_group(lag_definer->fg);
 	mlx5_destroy_flow_table(lag_definer->ft);
@@ -356,7 +357,7 @@ static void mlx5_lag_destroy_definers(struct mlx5_lag *ldev)
 
 static int mlx5_lag_create_definers(struct mlx5_lag *ldev,
 				    enum netdev_lag_hash hash_type,
-				    u8 port1, u8 port2)
+				    u8 *ports)
 {
 	struct mlx5_lag_port_sel *port_sel = &ldev->port_sel;
 	struct mlx5_lag_definer *lag_definer;
@@ -364,7 +365,7 @@ static int mlx5_lag_create_definers(struct mlx5_lag *ldev,
 
 	for_each_set_bit(tt, port_sel->tt_map, MLX5_NUM_TT) {
 		lag_definer = mlx5_lag_create_definer(ldev, hash_type, tt,
-						      false, port1, port2);
+						      false, ports);
 		if (IS_ERR(lag_definer)) {
 			err = PTR_ERR(lag_definer);
 			goto destroy_definers;
@@ -376,7 +377,7 @@ static int mlx5_lag_create_definers(struct mlx5_lag *ldev,
 
 		lag_definer =
 			mlx5_lag_create_definer(ldev, hash_type, tt,
-						true, port1, port2);
+						true, ports);
 		if (IS_ERR(lag_definer)) {
 			err = PTR_ERR(lag_definer);
 			goto destroy_definers;
@@ -513,13 +514,13 @@ static int mlx5_lag_create_inner_ttc_table(struct mlx5_lag *ldev)
 }
 
 int mlx5_lag_port_sel_create(struct mlx5_lag *ldev,
-			     enum netdev_lag_hash hash_type, u8 port1, u8 port2)
+			     enum netdev_lag_hash hash_type, u8 *ports)
 {
 	struct mlx5_lag_port_sel *port_sel = &ldev->port_sel;
 	int err;
 
 	set_tt_map(port_sel, hash_type);
-	err = mlx5_lag_create_definers(ldev, hash_type, port1, port2);
+	err = mlx5_lag_create_definers(ldev, hash_type, ports);
 	if (err)
 		return err;
 
@@ -546,12 +547,13 @@ int mlx5_lag_port_sel_create(struct mlx5_lag *ldev,
 static int
 mlx5_lag_modify_definers_destinations(struct mlx5_lag *ldev,
 				      struct mlx5_lag_definer **definers,
-				      u8 port1, u8 port2)
+				      u8 *ports)
 {
 	struct mlx5_lag_port_sel *port_sel = &ldev->port_sel;
 	struct mlx5_flow_destination dest = {};
 	int err;
 	int tt;
+	int i;
 
 	dest.type = MLX5_FLOW_DESTINATION_TYPE_UPLINK;
 	dest.vport.flags |= MLX5_FLOW_DEST_VPORT_VHCA_ID;
@@ -559,19 +561,13 @@ mlx5_lag_modify_definers_destinations(struct mlx5_lag *ldev,
 	for_each_set_bit(tt, port_sel->tt_map, MLX5_NUM_TT) {
 		struct mlx5_flow_handle **rules = definers[tt]->rules;
 
-		if (ldev->v2p_map[MLX5_LAG_P1] != port1) {
-			dest.vport.vhca_id =
-				MLX5_CAP_GEN(ldev->pf[port1 - 1].dev, vhca_id);
-			err = mlx5_modify_rule_destination(rules[MLX5_LAG_P1],
-							   &dest, NULL);
-			if (err)
-				return err;
-		}
-
-		if (ldev->v2p_map[MLX5_LAG_P2] != port2) {
+		for (i = 0; i < ldev->ports; i++) {
+			if (ldev->v2p_map[i] == ports[i])
+				continue;
 			dest.vport.vhca_id =
-				MLX5_CAP_GEN(ldev->pf[port2 - 1].dev, vhca_id);
-			err = mlx5_modify_rule_destination(rules[MLX5_LAG_P2],
+				MLX5_CAP_GEN(ldev->pf[ports[i] - 1].dev,
+					     vhca_id);
+			err = mlx5_modify_rule_destination(rules[i],
 							   &dest, NULL);
 			if (err)
 				return err;
@@ -581,14 +577,14 @@ mlx5_lag_modify_definers_destinations(struct mlx5_lag *ldev,
 	return 0;
 }
 
-int mlx5_lag_port_sel_modify(struct mlx5_lag *ldev, u8 port1, u8 port2)
+int mlx5_lag_port_sel_modify(struct mlx5_lag *ldev, u8 *ports)
 {
 	struct mlx5_lag_port_sel *port_sel = &ldev->port_sel;
 	int err;
 
 	err = mlx5_lag_modify_definers_destinations(ldev,
 						    port_sel->outer.definers,
-						    port1, port2);
+						    ports);
 	if (err)
 		return err;
 
@@ -597,7 +593,7 @@ int mlx5_lag_port_sel_modify(struct mlx5_lag *ldev, u8 port1, u8 port2)
 
 	return mlx5_lag_modify_definers_destinations(ldev,
 						     port_sel->inner.definers,
-						     port1, port2);
+						     ports);
 }
 
 void mlx5_lag_port_sel_destroy(struct mlx5_lag *ldev)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.h b/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.h
index 6d15b28a42fc..79852ac41dbc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.h
@@ -27,22 +27,20 @@ struct mlx5_lag_port_sel {
 
 #ifdef CONFIG_MLX5_ESWITCH
 
-int mlx5_lag_port_sel_modify(struct mlx5_lag *ldev, u8 port1, u8 port2);
+int mlx5_lag_port_sel_modify(struct mlx5_lag *ldev, u8 *ports);
 void mlx5_lag_port_sel_destroy(struct mlx5_lag *ldev);
 int mlx5_lag_port_sel_create(struct mlx5_lag *ldev,
-			     enum netdev_lag_hash hash_type, u8 port1,
-			     u8 port2);
+			     enum netdev_lag_hash hash_type, u8 *ports);
 
 #else /* CONFIG_MLX5_ESWITCH */
 static inline int mlx5_lag_port_sel_create(struct mlx5_lag *ldev,
 					   enum netdev_lag_hash hash_type,
-					   u8 port1, u8 port2)
+					   u8 *ports)
 {
 	return 0;
 }
 
-static inline int mlx5_lag_port_sel_modify(struct mlx5_lag *ldev, u8 port1,
-					   u8 port2)
+static inline int mlx5_lag_port_sel_modify(struct mlx5_lag *ldev, u8 *ports)
 {
 	return 0;
 }
-- 
2.35.1



* [net-next 12/15] net/mlx5: Support devices with more than 2 ports
  2022-05-10  5:57 [pull request][net-next 00/15] mlx5 updates 2022-05-09 Saeed Mahameed
                   ` (10 preceding siblings ...)
  2022-05-10  5:57 ` [net-next 11/15] net/mlx5: Lag, use actual number of lag ports Saeed Mahameed
@ 2022-05-10  5:57 ` Saeed Mahameed
  2022-05-10  5:57 ` [net-next 13/15] net/mlx5: Lag, refactor dmesg print Saeed Mahameed
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: Saeed Mahameed @ 2022-05-10  5:57 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Mark Bloch, Maor Gottlieb, Saeed Mahameed

From: Mark Bloch <mbloch@nvidia.com>

Increase the define MLX5_MAX_PORTS to 4 as the driver is ready
to support NICs with 4 ports.
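
The capability check in the hunks below correspondingly moves from
strict equality to a range test; a minimal sketch of the accepted range
(standalone C, with the capability read reduced to a plain integer
parameter):

#include <stdbool.h>
#include <stdio.h>

#define MLX5_MAX_PORTS 4

/* A device may join a lag when num_lag_ports is in (1, MLX5_MAX_PORTS];
 * equality with the maximum is no longer required.
 */
static bool lag_ports_supported(unsigned int num_lag_ports)
{
	return num_lag_ports > 1 && num_lag_ports <= MLX5_MAX_PORTS;
}

int main(void)
{
	unsigned int n;

	for (n = 1; n <= 5; n++)
		printf("num_lag_ports=%u -> %s\n", n,
		       lag_ports_supported(n) ? "lag capable" : "skipped");
	return 0;
}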

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/dev.c     | 3 ++-
 drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c | 3 ++-
 include/linux/mlx5/driver.h                       | 2 +-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/dev.c b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
index 3e750b827a19..11f7c03ae81b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/dev.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
@@ -586,7 +586,8 @@ static int next_phys_dev_lag(struct device *dev, const void *data)
 
 	if (!MLX5_CAP_GEN(mdev, vport_group_manager) ||
 	    !MLX5_CAP_GEN(mdev, lag_master) ||
-	    MLX5_CAP_GEN(mdev, num_lag_ports) != MLX5_MAX_PORTS)
+	    (MLX5_CAP_GEN(mdev, num_lag_ports) > MLX5_MAX_PORTS ||
+	     MLX5_CAP_GEN(mdev, num_lag_ports) <= 1))
 		return 0;
 
 	return _next_phys_dev(mdev, data);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
index f2659b0f8cc5..716e073c80d4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
@@ -1050,7 +1050,8 @@ void mlx5_lag_add_mdev(struct mlx5_core_dev *dev)
 
 	if (!MLX5_CAP_GEN(dev, vport_group_manager) ||
 	    !MLX5_CAP_GEN(dev, lag_master) ||
-	    MLX5_CAP_GEN(dev, num_lag_ports) != MLX5_MAX_PORTS)
+	    (MLX5_CAP_GEN(dev, num_lag_ports) > MLX5_MAX_PORTS ||
+	     MLX5_CAP_GEN(dev, num_lag_ports) <= 1))
 		return;
 
 recheck:
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 62ea1120de9c..fdb9d07a05a4 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -84,7 +84,7 @@ enum mlx5_sqp_t {
 };
 
 enum {
-	MLX5_MAX_PORTS	= 2,
+	MLX5_MAX_PORTS	= 4,
 };
 
 enum {
-- 
2.35.1



* [net-next 13/15] net/mlx5: Lag, refactor dmesg print
  2022-05-10  5:57 [pull request][net-next 00/15] mlx5 updates 2022-05-09 Saeed Mahameed
                   ` (11 preceding siblings ...)
  2022-05-10  5:57 ` [net-next 12/15] net/mlx5: Support devices with more than 2 ports Saeed Mahameed
@ 2022-05-10  5:57 ` Saeed Mahameed
  2022-05-10  5:57 ` [net-next 14/15] net/mlx5: Lag, use buckets in hash mode Saeed Mahameed
  2022-05-10  5:57 ` [net-next 15/15] net/mlx5: Lag, add debugfs to query hardware lag state Saeed Mahameed
  14 siblings, 0 replies; 17+ messages in thread
From: Saeed Mahameed @ 2022-05-10  5:57 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Mark Bloch, Maor Gottlieb, Saeed Mahameed

From: Mark Bloch <mbloch@nvidia.com>

Combine dmesg lag prints into a single function.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/lag/lag.c | 22 ++++++++++---------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
index 716e073c80d4..90056a3ca89d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
@@ -107,6 +107,16 @@ int mlx5_cmd_destroy_vport_lag(struct mlx5_core_dev *dev)
 }
 EXPORT_SYMBOL(mlx5_cmd_destroy_vport_lag);
 
+static void mlx5_lag_print_mapping(struct mlx5_core_dev *dev,
+				   struct mlx5_lag *ldev)
+{
+	int i;
+
+	mlx5_core_info(dev, "lag map:\n");
+	for (i = 0; i < ldev->ports; i++)
+		mlx5_core_info(dev, "\tport %d:%d\n", i + 1, ldev->v2p_map[i]);
+}
+
 static int mlx5_lag_netdev_event(struct notifier_block *this,
 				 unsigned long event, void *ptr);
 static void mlx5_do_bond_work(struct work_struct *work);
@@ -311,7 +321,6 @@ void mlx5_modify_lag(struct mlx5_lag *ldev,
 	u8 ports[MLX5_MAX_PORTS] = {};
 	int err;
 	int i;
-	int j;
 
 	mlx5_infer_tx_affinity_mapping(tracker, ldev->ports, ports);
 
@@ -328,11 +337,7 @@ void mlx5_modify_lag(struct mlx5_lag *ldev,
 		memcpy(ldev->v2p_map, ports, sizeof(ports[0]) *
 		       ldev->ports);
 
-		mlx5_core_info(dev0, "modify lag map\n");
-		for (j = 0; j < ldev->ports; j++)
-			mlx5_core_info(dev0, "\tmap port %d:%d\n",
-				       j + 1,
-				       ldev->v2p_map[j]);
+		mlx5_lag_print_mapping(dev0, ldev);
 		break;
 	}
 
@@ -393,11 +398,8 @@ static int mlx5_create_lag(struct mlx5_lag *ldev,
 	struct mlx5_core_dev *dev1 = ldev->pf[MLX5_LAG_P2].dev;
 	u32 in[MLX5_ST_SZ_DW(destroy_lag_in)] = {};
 	int err;
-	int i;
 
-	mlx5_core_info(dev0, "lag map:\n");
-	for (i = 0; i < ldev->ports; i++)
-		mlx5_core_info(dev0, "\tport %d:%d\n", i + 1, ldev->v2p_map[i]);
+	mlx5_lag_print_mapping(dev0, ldev);
 	mlx5_core_info(dev0, "shared_fdb:%d mode:%s\n",
 		       shared_fdb, get_str_port_sel_mode(flags));
 
-- 
2.35.1



* [net-next 14/15] net/mlx5: Lag, use buckets in hash mode
  2022-05-10  5:57 [pull request][net-next 00/15] mlx5 updates 2022-05-09 Saeed Mahameed
                   ` (12 preceding siblings ...)
  2022-05-10  5:57 ` [net-next 13/15] net/mlx5: Lag, refactor dmesg print Saeed Mahameed
@ 2022-05-10  5:57 ` Saeed Mahameed
  2022-05-10  5:57 ` [net-next 15/15] net/mlx5: Lag, add debugfs to query hardware lag state Saeed Mahameed
  14 siblings, 0 replies; 17+ messages in thread
From: Saeed Mahameed @ 2022-05-10  5:57 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Mark Bloch, Maor Gottlieb, Saeed Mahameed

From: Mark Bloch <mbloch@nvidia.com>

When hardware lag is active and the NIC has more than 2 ports, traffic
needs to be distributed between the remaining active ports when one
port goes down.

For a better spread in such cases, instead of using a 1-to-1 mapping
with only 4 slots in the hash, use many slots.

Each port will have many slots that point to it. When a port goes down,
go over all the slots that pointed to that port and spread them between
the remaining active ports. Once the port comes back, restore the
default mapping.

We will have number_of_ports * MLX5_LAG_MAX_HASH_BUCKETS slots.
Each group of MLX5_LAG_MAX_HASH_BUCKETS slots belongs to a different
port. The native mapping is such that:

port 1: The first MLX5_LAG_MAX_HASH_BUCKETS slots are: [1, 1, .., 1],
which means if a packet is hashed into one of these slots it will hit
the wire via port 1.

port 2: The second MLX5_LAG_MAX_HASH_BUCKETS slots are: [2, 2, .., 2],
which means if a packet is hashed into one of these slots it will hit
the wire via port 2.

The same mapping applies to the rest of the ports.

On a failover, let's say port 2 goes down (ports 1, 3 and 4 are still
up). The new mapping for port 2 will be:

port 2: The second MLX5_LAG_MAX_HASH_BUCKETS slots are:
[1, 3, 1, 4, .., 4], which means the mapping was changed from the
native mapping to one that consists of only the active ports.

With this, if a port goes down the traffic will be split randomly
between the active ports, as illustrated by the sketch below.
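
A standalone sketch of the slot arithmetic (userspace C; for clarity
the failover assignment is shown as a deterministic round-robin over
the active ports rather than the random pick the driver uses):

#include <stdio.h>

#define PORTS   4
#define BUCKETS 4	/* stands in for MLX5_LAG_MAX_HASH_BUCKETS */

int main(void)
{
	unsigned char map[PORTS * BUCKETS];
	int active[] = { 1, 3, 4 };	/* port 2 went down */
	int i, j;

	/* Native mapping: every bucket of port i points back at port i. */
	for (i = 0; i < PORTS; i++)
		for (j = 0; j < BUCKETS; j++)
			map[i * BUCKETS + j] = i + 1;

	/* Failover: respread port 2's buckets over the active ports. */
	for (j = 0; j < BUCKETS; j++)
		map[1 * BUCKETS + j] = active[j % 3];

	for (i = 0; i < PORTS * BUCKETS; i++)
		printf("slot %2d -> port %d\n", i, map[i]);
	return 0;
}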

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/lag/lag.c | 154 +++++++++++++-----
 .../net/ethernet/mellanox/mlx5/core/lag/lag.h |   4 +-
 .../mellanox/mlx5/core/lag/port_sel.c         |  95 +++++++----
 .../mellanox/mlx5/core/lag/port_sel.h         |   5 +-
 4 files changed, 182 insertions(+), 76 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
index 90056a3ca89d..8a74c409b501 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
@@ -107,14 +107,73 @@ int mlx5_cmd_destroy_vport_lag(struct mlx5_core_dev *dev)
 }
 EXPORT_SYMBOL(mlx5_cmd_destroy_vport_lag);
 
+static void mlx5_infer_tx_disabled(struct lag_tracker *tracker, u8 num_ports,
+				   u8 *ports, int *num_disabled)
+{
+	int i;
+
+	*num_disabled = 0;
+	for (i = 0; i < num_ports; i++) {
+		if (!tracker->netdev_state[i].tx_enabled ||
+		    !tracker->netdev_state[i].link_up)
+			ports[(*num_disabled)++] = i;
+	}
+}
+
+static void mlx5_infer_tx_enabled(struct lag_tracker *tracker, u8 num_ports,
+				  u8 *ports, int *num_enabled)
+{
+	int i;
+
+	*num_enabled = 0;
+	for (i = 0; i < num_ports; i++) {
+		if (tracker->netdev_state[i].tx_enabled &&
+		    tracker->netdev_state[i].link_up)
+			ports[(*num_enabled)++] = i;
+	}
+
+	if (*num_enabled == 0)
+		mlx5_infer_tx_disabled(tracker, num_ports, ports, num_enabled);
+}
+
 static void mlx5_lag_print_mapping(struct mlx5_core_dev *dev,
-				   struct mlx5_lag *ldev)
+				   struct mlx5_lag *ldev,
+				   struct lag_tracker *tracker,
+				   u8 flags)
 {
+	char buf[MLX5_MAX_PORTS * 10 + 1] = {};
+	u8 enabled_ports[MLX5_MAX_PORTS] = {};
+	int written = 0;
+	int num_enabled;
+	int idx;
+	int err;
 	int i;
+	int j;
 
-	mlx5_core_info(dev, "lag map:\n");
-	for (i = 0; i < ldev->ports; i++)
-		mlx5_core_info(dev, "\tport %d:%d\n", i + 1, ldev->v2p_map[i]);
+	if (flags & MLX5_LAG_FLAG_HASH_BASED) {
+		mlx5_infer_tx_enabled(tracker, ldev->ports, enabled_ports,
+				      &num_enabled);
+		for (i = 0; i < num_enabled; i++) {
+			err = scnprintf(buf + written, 4, "%d, ", enabled_ports[i] + 1);
+			if (err != 3)
+				return;
+			written += err;
+		}
+		buf[written - 2] = 0;
+		mlx5_core_info(dev, "lag map active ports: %s\n", buf);
+	} else {
+		for (i = 0; i < ldev->ports; i++) {
+			for (j = 0; j < ldev->buckets; j++) {
+				idx = i * ldev->buckets + j;
+				err = scnprintf(buf + written, 10,
+						" port %d:%d", i + 1, ldev->v2p_map[idx]);
+				if (err != 9)
+					return;
+				written += err;
+			}
+		}
+		mlx5_core_info(dev, "lag map:%s\n", buf);
+	}
 }
 
 static int mlx5_lag_netdev_event(struct notifier_block *this,
@@ -174,6 +233,7 @@ static struct mlx5_lag *mlx5_lag_dev_alloc(struct mlx5_core_dev *dev)
 		mlx5_core_err(dev, "Failed to init multipath lag err=%d\n",
 			      err);
 	ldev->ports = MLX5_CAP_GEN(dev, num_lag_ports);
+	ldev->buckets = 1;
 
 	return ldev;
 }
@@ -200,28 +260,25 @@ static bool __mlx5_lag_is_sriov(struct mlx5_lag *ldev)
 	return !!(ldev->flags & MLX5_LAG_FLAG_SRIOV);
 }
 
-static void mlx5_infer_tx_disabled(struct lag_tracker *tracker, u8 num_ports,
-				   u8 *ports, int *num_disabled)
-{
-	int i;
-
-	*num_disabled = 0;
-	for (i = 0; i < num_ports; i++) {
-		if (!tracker->netdev_state[i].tx_enabled ||
-		    !tracker->netdev_state[i].link_up)
-			ports[(*num_disabled)++] = i;
-	}
-}
-
+/* Create a mapping between steering slots and active ports.
+ * As we have ldev->buckets slots per port, first assume the native
+ * mapping should be used.
+ * If there are ports that are disabled, fill the relevant slots
+ * with a mapping that points to active ports.
+ */
 static void mlx5_infer_tx_affinity_mapping(struct lag_tracker *tracker,
-					   u8 num_ports, u8 *ports)
+					   u8 num_ports,
+					   u8 buckets,
+					   u8 *ports)
 {
 	int disabled[MLX5_MAX_PORTS] = {};
 	int enabled[MLX5_MAX_PORTS] = {};
 	int disabled_ports_num = 0;
 	int enabled_ports_num = 0;
+	int idx;
 	u32 rand;
 	int i;
+	int j;
 
 	for (i = 0; i < num_ports; i++) {
 		if (tracker->netdev_state[i].tx_enabled &&
@@ -231,9 +288,14 @@ static void mlx5_infer_tx_affinity_mapping(struct lag_tracker *tracker,
 			disabled[disabled_ports_num++] = i;
 	}
 
-	/* Use native mapping by default */
+	/* Use native mapping by default where each port's buckets
+	 * point to the native port: 1 1 1 .. 1 2 2 2 ... 2 3 3 3 ... 3 etc
+	 */
 	for (i = 0; i < num_ports; i++)
-		ports[i] = MLX5_LAG_EGRESS_PORT_1 + i;
+		for (j = 0; j < buckets; j++) {
+			idx = i * buckets + j;
+			ports[idx] = MLX5_LAG_EGRESS_PORT_1 + i;
+		}
 
 	/* If all ports are disabled/enabled keep native mapping */
 	if (enabled_ports_num == num_ports ||
@@ -242,9 +304,10 @@ static void mlx5_infer_tx_affinity_mapping(struct lag_tracker *tracker,
 
 	/* Go over the disabled ports and for each assign a random active port */
 	for (i = 0; i < disabled_ports_num; i++) {
-		get_random_bytes(&rand, 4);
-
-		ports[disabled[i]] = enabled[rand % enabled_ports_num] + 1;
+		for (j = 0; j < buckets; j++) {
+			get_random_bytes(&rand, 4);
+			ports[disabled[i] * buckets + j] = enabled[rand % enabled_ports_num] + 1;
+		}
 	}
 }
 
@@ -317,28 +380,33 @@ static int _mlx5_modify_lag(struct mlx5_lag *ldev, u8 *ports)
 void mlx5_modify_lag(struct mlx5_lag *ldev,
 		     struct lag_tracker *tracker)
 {
+	u8 ports[MLX5_MAX_PORTS * MLX5_LAG_MAX_HASH_BUCKETS] = {};
 	struct mlx5_core_dev *dev0 = ldev->pf[MLX5_LAG_P1].dev;
-	u8 ports[MLX5_MAX_PORTS] = {};
+	int idx;
 	int err;
 	int i;
+	int j;
 
-	mlx5_infer_tx_affinity_mapping(tracker, ldev->ports, ports);
+	mlx5_infer_tx_affinity_mapping(tracker, ldev->ports, ldev->buckets, ports);
 
 	for (i = 0; i < ldev->ports; i++) {
-		if (ports[i] == ldev->v2p_map[i])
-			continue;
-		err = _mlx5_modify_lag(ldev, ports);
-		if (err) {
-			mlx5_core_err(dev0,
-				      "Failed to modify LAG (%d)\n",
-				      err);
-			return;
-		}
-		memcpy(ldev->v2p_map, ports, sizeof(ports[0]) *
-		       ldev->ports);
+		for (j = 0; j < ldev->buckets; j++) {
+			idx = i * ldev->buckets + j;
+			if (ports[idx] == ldev->v2p_map[idx])
+				continue;
+			err = _mlx5_modify_lag(ldev, ports);
+			if (err) {
+				mlx5_core_err(dev0,
+					      "Failed to modify LAG (%d)\n",
+					      err);
+				return;
+			}
+			memcpy(ldev->v2p_map, ports, sizeof(ports));
 
-		mlx5_lag_print_mapping(dev0, ldev);
-		break;
+			mlx5_lag_print_mapping(dev0, ldev, tracker,
+					       ldev->flags);
+			break;
+		}
 	}
 
 	if (tracker->tx_type == NETDEV_LAG_TX_TYPE_ACTIVEBACKUP &&
@@ -357,6 +425,8 @@ static int mlx5_lag_set_port_sel_mode_roce(struct mlx5_lag *ldev,
 		if (!MLX5_CAP_PORT_SELECTION(dev0->dev, port_select_flow_table))
 			return -EINVAL;
 		*flags |= MLX5_LAG_FLAG_HASH_BASED;
+		if (ldev->ports > 2)
+			ldev->buckets = MLX5_LAG_MAX_HASH_BUCKETS;
 	}
 
 	return 0;
@@ -370,6 +440,7 @@ static int mlx5_lag_set_port_sel_mode_offloads(struct mlx5_lag *ldev,
 	if (MLX5_CAP_PORT_SELECTION(dev0->dev, port_select_flow_table) &&
 	    tracker->tx_type == NETDEV_LAG_TX_TYPE_HASH)
 		*flags |= MLX5_LAG_FLAG_HASH_BASED;
+
 	return 0;
 }
 
@@ -399,7 +470,7 @@ static int mlx5_create_lag(struct mlx5_lag *ldev,
 	u32 in[MLX5_ST_SZ_DW(destroy_lag_in)] = {};
 	int err;
 
-	mlx5_lag_print_mapping(dev0, ldev);
+	mlx5_lag_print_mapping(dev0, ldev, tracker, flags);
 	mlx5_core_info(dev0, "shared_fdb:%d mode:%s\n",
 		       shared_fdb, get_str_port_sel_mode(flags));
 
@@ -439,11 +510,12 @@ int mlx5_activate_lag(struct mlx5_lag *ldev,
 	struct mlx5_core_dev *dev0 = ldev->pf[MLX5_LAG_P1].dev;
 	int err;
 
-	mlx5_infer_tx_affinity_mapping(tracker, ldev->ports, ldev->v2p_map);
 	err = mlx5_lag_set_port_sel_mode(ldev, tracker, &flags);
 	if (err)
 		return err;
 
+	mlx5_infer_tx_affinity_mapping(tracker, ldev->ports, ldev->buckets, ldev->v2p_map);
+
 	if (flags & MLX5_LAG_FLAG_HASH_BASED) {
 		err = mlx5_lag_port_sel_create(ldev, tracker->hash_type,
 					       ldev->v2p_map);
@@ -1265,7 +1337,7 @@ u8 mlx5_lag_get_slave_port(struct mlx5_core_dev *dev,
 		}
 	}
 
-	port = ldev->v2p_map[port];
+	port = ldev->v2p_map[port * ldev->buckets];
 
 unlock:
 	spin_unlock(&lag_lock);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.h b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.h
index 1c8fb3fada0c..0c90d0ed03be 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.h
@@ -4,6 +4,7 @@
 #ifndef __MLX5_LAG_H__
 #define __MLX5_LAG_H__
 
+#define MLX5_LAG_MAX_HASH_BUCKETS 16
 #include "mlx5_core.h"
 #include "mp.h"
 #include "port_sel.h"
@@ -46,9 +47,10 @@ struct lag_tracker {
 struct mlx5_lag {
 	u8                        flags;
 	u8			  ports;
+	u8			  buckets;
 	int			  mode_changes_in_progress;
 	bool			  shared_fdb;
-	u8                        v2p_map[MLX5_MAX_PORTS];
+	u8			  v2p_map[MLX5_MAX_PORTS * MLX5_LAG_MAX_HASH_BUCKETS];
 	struct kref               ref;
 	struct lag_func           pf[MLX5_MAX_PORTS];
 	struct lag_tracker        tracker;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.c
index 478b4ef723f8..d3a3fe4ce670 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.c
@@ -13,7 +13,7 @@ enum {
 static struct mlx5_flow_group *
 mlx5_create_hash_flow_group(struct mlx5_flow_table *ft,
 			    struct mlx5_flow_definer *definer,
-			    u8 ports)
+			    u8 rules)
 {
 	int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
 	struct mlx5_flow_group *fg;
@@ -26,7 +26,7 @@ mlx5_create_hash_flow_group(struct mlx5_flow_table *ft,
 	MLX5_SET(create_flow_group_in, in, match_definer_id,
 		 mlx5_get_match_definer_id(definer));
 	MLX5_SET(create_flow_group_in, in, start_flow_index, 0);
-	MLX5_SET(create_flow_group_in, in, end_flow_index, ports - 1);
+	MLX5_SET(create_flow_group_in, in, end_flow_index, rules - 1);
 	MLX5_SET(create_flow_group_in, in, group_type,
 		 MLX5_CREATE_FLOW_GROUP_IN_GROUP_TYPE_HASH_SPLIT);
 
@@ -45,8 +45,10 @@ static int mlx5_lag_create_port_sel_table(struct mlx5_lag *ldev,
 	MLX5_DECLARE_FLOW_ACT(flow_act);
 	struct mlx5_flow_namespace *ns;
 	int err, i;
+	int idx;
+	int j;
 
-	ft_attr.max_fte = ldev->ports;
+	ft_attr.max_fte = ldev->ports * ldev->buckets;
 	ft_attr.level = MLX5_LAG_FT_LEVEL_DEFINER;
 
 	ns = mlx5_get_flow_namespace(dev, MLX5_FLOW_NAMESPACE_PORT_SEL);
@@ -63,7 +65,7 @@ static int mlx5_lag_create_port_sel_table(struct mlx5_lag *ldev,
 
 	lag_definer->fg = mlx5_create_hash_flow_group(lag_definer->ft,
 						      lag_definer->definer,
-						      ldev->ports);
+						      ft_attr.max_fte);
 	if (IS_ERR(lag_definer->fg)) {
 		err = PTR_ERR(lag_definer->fg);
 		goto destroy_ft;
@@ -73,18 +75,24 @@ static int mlx5_lag_create_port_sel_table(struct mlx5_lag *ldev,
 	dest.vport.flags |= MLX5_FLOW_DEST_VPORT_VHCA_ID;
 	flow_act.flags |= FLOW_ACT_NO_APPEND;
 	for (i = 0; i < ldev->ports; i++) {
-		u8 affinity = ports[i];
-
-		dest.vport.vhca_id = MLX5_CAP_GEN(ldev->pf[affinity - 1].dev,
-						  vhca_id);
-		lag_definer->rules[i] = mlx5_add_flow_rules(lag_definer->ft,
-							    NULL, &flow_act,
-							    &dest, 1);
-		if (IS_ERR(lag_definer->rules[i])) {
-			err = PTR_ERR(lag_definer->rules[i]);
-			while (i--)
-				mlx5_del_flow_rules(lag_definer->rules[i]);
-			goto destroy_fg;
+		for (j = 0; j < ldev->buckets; j++) {
+			u8 affinity;
+
+			idx = i * ldev->buckets + j;
+			affinity = ports[idx];
+
+			dest.vport.vhca_id = MLX5_CAP_GEN(ldev->pf[affinity - 1].dev,
+							  vhca_id);
+			lag_definer->rules[idx] = mlx5_add_flow_rules(lag_definer->ft,
+								      NULL, &flow_act,
+								      &dest, 1);
+			if (IS_ERR(lag_definer->rules[idx])) {
+				err = PTR_ERR(lag_definer->rules[idx]);
+				while (idx--)
+					mlx5_del_flow_rules(lag_definer->rules[idx]);
+				goto destroy_fg;
+			}
 		}
 	}
 
@@ -330,10 +338,16 @@ static void mlx5_lag_destroy_definer(struct mlx5_lag *ldev,
 				     struct mlx5_lag_definer *lag_definer)
 {
 	struct mlx5_core_dev *dev = ldev->pf[MLX5_LAG_P1].dev;
+	int idx;
 	int i;
+	int j;
 
-	for (i = 0; i < ldev->ports; i++)
-		mlx5_del_flow_rules(lag_definer->rules[i]);
+	for (i = 0; i < ldev->ports; i++) {
+		for (j = 0; j < ldev->buckets; j++) {
+			idx = i * ldev->buckets + j;
+			mlx5_del_flow_rules(lag_definer->rules[idx]);
+		}
+	}
 	mlx5_destroy_flow_group(lag_definer->fg);
 	mlx5_destroy_flow_table(lag_definer->ft);
 	mlx5_destroy_match_definer(dev, lag_definer->definer);
@@ -544,31 +558,28 @@ int mlx5_lag_port_sel_create(struct mlx5_lag *ldev,
 	return err;
 }
 
-static int
-mlx5_lag_modify_definers_destinations(struct mlx5_lag *ldev,
-				      struct mlx5_lag_definer **definers,
-				      u8 *ports)
+static int __mlx5_lag_modify_definers_destinations(struct mlx5_lag *ldev,
+						   struct mlx5_lag_definer *def,
+						   u8 *ports)
 {
-	struct mlx5_lag_port_sel *port_sel = &ldev->port_sel;
 	struct mlx5_flow_destination dest = {};
+	int idx;
 	int err;
-	int tt;
 	int i;
+	int j;
 
 	dest.type = MLX5_FLOW_DESTINATION_TYPE_UPLINK;
 	dest.vport.flags |= MLX5_FLOW_DEST_VPORT_VHCA_ID;
 
-	for_each_set_bit(tt, port_sel->tt_map, MLX5_NUM_TT) {
-		struct mlx5_flow_handle **rules = definers[tt]->rules;
-
-		for (i = 0; i < ldev->ports; i++) {
+	for (i = 0; i < ldev->ports; i++) {
+		for (j = 0; j < ldev->buckets; j++) {
+			idx = i * ldev->buckets + j;
+			if (ldev->v2p_map[idx] == ports[idx])
 				continue;
-			dest.vport.vhca_id =
-				MLX5_CAP_GEN(ldev->pf[ports[i] - 1].dev,
-					     vhca_id);
-			err = mlx5_modify_rule_destination(rules[i],
-							   &dest, NULL);
+
+			dest.vport.vhca_id = MLX5_CAP_GEN(ldev->pf[ports[idx] - 1].dev,
+							  vhca_id);
+			err = mlx5_modify_rule_destination(def->rules[idx], &dest, NULL);
 			if (err)
 				return err;
 		}
@@ -577,6 +588,24 @@ mlx5_lag_modify_definers_destinations(struct mlx5_lag *ldev,
 	return 0;
 }
 
+static int
+mlx5_lag_modify_definers_destinations(struct mlx5_lag *ldev,
+				      struct mlx5_lag_definer **definers,
+				      u8 *ports)
+{
+	struct mlx5_lag_port_sel *port_sel = &ldev->port_sel;
+	int err;
+	int tt;
+
+	for_each_set_bit(tt, port_sel->tt_map, MLX5_NUM_TT) {
+		err = __mlx5_lag_modify_definers_destinations(ldev, definers[tt], ports);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 int mlx5_lag_port_sel_modify(struct mlx5_lag *ldev, u8 *ports)
 {
 	struct mlx5_lag_port_sel *port_sel = &ldev->port_sel;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.h b/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.h
index 79852ac41dbc..5ec3af2a3ecd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.h
@@ -10,7 +10,10 @@ struct mlx5_lag_definer {
 	struct mlx5_flow_definer *definer;
 	struct mlx5_flow_table *ft;
 	struct mlx5_flow_group *fg;
-	struct mlx5_flow_handle *rules[MLX5_MAX_PORTS];
+	/* Each port has ldev->buckets number of rules and they are arranged in
+	 * [port * buckets .. port * buckets + buckets) locations
+	 */
+	struct mlx5_flow_handle *rules[MLX5_MAX_PORTS * MLX5_LAG_MAX_HASH_BUCKETS];
 };
 
 struct mlx5_lag_ttc {
-- 
2.35.1



* [net-next 15/15] net/mlx5: Lag, add debugfs to query hardware lag state
  2022-05-10  5:57 [pull request][net-next 00/15] mlx5 updates 2022-05-09 Saeed Mahameed
                   ` (13 preceding siblings ...)
  2022-05-10  5:57 ` [net-next 14/15] net/mlx5: Lag, use buckets in hash mode Saeed Mahameed
@ 2022-05-10  5:57 ` Saeed Mahameed
  14 siblings, 0 replies; 17+ messages in thread
From: Saeed Mahameed @ 2022-05-10  5:57 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Mark Bloch, Saeed Mahameed

From: Mark Bloch <mbloch@nvidia.com>

Lag state has become very complicated with many modes, flags, types and
port selection methods, and future work will add additional features.

Add a debugfs to query the current lag state. A new directory named "lag"
will be created under the mlx5 debugfs directory. As the driver has a
debugfs directory per PCI function, the location will be: <debugfs>/mlx5/<BDF>/lag

For example:
/sys/kernel/debug/mlx5/0000:08:00.0/lag

The following files are exposed:

- state: Returns "active" or "disabled". If "active" it means hardware
         lag is active.

- members: Returns the BDFs of all the members of the lag object.

- type: Returns the type of the lag currently configured. Valid only
	if hardware lag is active.
	* "roce" - Members are bare metal PFs.
	* "switchdev" - Members are in switchdev mode.
	* "multipath" - ECMP offloads.

- port_sel_mode: Returns the egress port selection method, valid
		 only if hardware lag is active.
		 * "queue_affinity" - Egress port is selected by
		   the QP/SQ affinity.
		 * "hash" - Egress port is selected by hash done on
		   each packet. Controlled by: xmit_hash_policy of the
		   bond device.
- flags: Returns flags that are specific per lag @type. Valid only if
	 hardware lag is active.
	 * "shared_fdb" - "on" or "off", if "on" single FDB is used.

- mapping: Returns the mapping which is used to select the egress port.
	   Valid only if hardware lag is active.
	   If @port_sel_mode is "hash" returns the active egress ports.
	   The hash result will select only active ports.
	   If @port_sel_mode is "queue_affinity" returns the mapping
	   between the configured port affinity of the QP/SQ and the
	   actual egress port. For example:
	   * 1:1 - Mapping means if the configured affinity is port 1
	           traffic will egress via port 1.
	   * 1:2 - Mapping means if the configured affinity is port 1
		   traffic will egress via port 2. This can happen
		   if port 1 is down or in active/backup mode and port 1
		   is backup.
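
The files are plain read-only debugfs attributes; a minimal userspace
reader might look like this (a sketch; the BDF in the path follows the
example above and must be replaced with the actual PCI address):

#include <stdio.h>

int main(void)
{
	const char *path = "/sys/kernel/debug/mlx5/0000:08:00.0/lag/state";
	char buf[32];
	FILE *f = fopen(path, "r");

	if (!f)
		return 1;	/* debugfs not mounted or no such device */
	if (fgets(buf, sizeof(buf), f))
		printf("hardware lag state: %s", buf);	/* "active" or "disabled" */
	fclose(f);
	return 0;
}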

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   2 +-
 .../ethernet/mellanox/mlx5/core/lag/debugfs.c | 173 ++++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/lag/lag.c |  11 +-
 .../net/ethernet/mellanox/mlx5/core/lag/lag.h |   9 +
 include/linux/mlx5/driver.h                   |   1 +
 5 files changed, 192 insertions(+), 4 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lag/debugfs.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 81620c25c77e..7895ed7cc285 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -14,7 +14,7 @@ obj-$(CONFIG_MLX5_CORE) += mlx5_core.o
 mlx5_core-y :=	main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
 		health.o mcg.o cq.o alloc.o port.o mr.o pd.o \
 		transobj.o vport.o sriov.o fs_cmd.o fs_core.o pci_irq.o \
-		fs_counters.o fs_ft_pool.o rl.o lag/lag.o dev.o events.o wq.o lib/gid.o \
+		fs_counters.o fs_ft_pool.o rl.o lag/debugfs.o lag/lag.o dev.o events.o wq.o lib/gid.o \
 		lib/devcom.o lib/pci_vsc.o lib/dm.o lib/fs_ttc.o diag/fs_tracepoint.o \
 		diag/fw_tracer.o diag/crdump.o devlink.o diag/rsc_dump.o \
 		fw_reset.o qos.o lib/tout.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/debugfs.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/debugfs.c
new file mode 100644
index 000000000000..443daf6e3d4b
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/debugfs.c
@@ -0,0 +1,173 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
+
+#include "lag.h"
+
+static char *get_str_mode_type(struct mlx5_lag *ldev)
+{
+	if (ldev->flags & MLX5_LAG_FLAG_ROCE)
+		return "roce";
+	if (ldev->flags & MLX5_LAG_FLAG_SRIOV)
+		return "switchdev";
+	if (ldev->flags & MLX5_LAG_FLAG_MULTIPATH)
+		return "multipath";
+
+	return NULL;
+}
+
+static int type_show(struct seq_file *file, void *priv)
+{
+	struct mlx5_core_dev *dev = file->private;
+	struct mlx5_lag *ldev;
+	char *mode = NULL;
+
+	ldev = dev->priv.lag;
+	mutex_lock(&ldev->lock);
+	if (__mlx5_lag_is_active(ldev))
+		mode = get_str_mode_type(ldev);
+	mutex_unlock(&ldev->lock);
+	if (!mode)
+		return -EINVAL;
+	seq_printf(file, "%s\n", mode);
+
+	return 0;
+}
+
+static int port_sel_mode_show(struct seq_file *file, void *priv)
+{
+	struct mlx5_core_dev *dev = file->private;
+	struct mlx5_lag *ldev;
+	int ret = 0;
+	char *mode;
+
+	ldev = dev->priv.lag;
+	mutex_lock(&ldev->lock);
+	if (__mlx5_lag_is_active(ldev))
+		mode = get_str_port_sel_mode(ldev->flags);
+	else
+		ret = -EINVAL;
+	mutex_unlock(&ldev->lock);
+	if (ret || !mode)
+		return ret;
+
+	seq_printf(file, "%s\n", mode);
+	return 0;
+}
+
+static int state_show(struct seq_file *file, void *priv)
+{
+	struct mlx5_core_dev *dev = file->private;
+	struct mlx5_lag *ldev;
+	bool active;
+
+	ldev = dev->priv.lag;
+	mutex_lock(&ldev->lock);
+	active = __mlx5_lag_is_active(ldev);
+	mutex_unlock(&ldev->lock);
+	seq_printf(file, "%s\n", active ? "active" : "disabled");
+	return 0;
+}
+
+static int flags_show(struct seq_file *file, void *priv)
+{
+	struct mlx5_core_dev *dev = file->private;
+	struct mlx5_lag *ldev;
+	bool shared_fdb;
+	bool lag_active;
+
+	ldev = dev->priv.lag;
+	mutex_lock(&ldev->lock);
+	lag_active = __mlx5_lag_is_active(ldev);
+	if (lag_active)
+		shared_fdb = ldev->shared_fdb;
+
+	mutex_unlock(&ldev->lock);
+	if (!lag_active)
+		return -EINVAL;
+
+	seq_printf(file, "%s:%s\n", "shared_fdb", shared_fdb ? "on" : "off");
+	return 0;
+}
+
+static int mapping_show(struct seq_file *file, void *priv)
+{
+	struct mlx5_core_dev *dev = file->private;
+	u8 ports[MLX5_MAX_PORTS] = {};
+	struct mlx5_lag *ldev;
+	bool hash = false;
+	bool lag_active;
+	int num_ports;
+	int i;
+
+	ldev = dev->priv.lag;
+	mutex_lock(&ldev->lock);
+	lag_active = __mlx5_lag_is_active(ldev);
+	if (lag_active) {
+		if (ldev->flags & MLX5_LAG_FLAG_HASH_BASED) {
+			mlx5_infer_tx_enabled(&ldev->tracker, ldev->ports, ports,
+					      &num_ports);
+			hash = true;
+		} else {
+			for (i = 0; i < ldev->ports; i++)
+				ports[i] = ldev->v2p_map[i];
+			num_ports = ldev->ports;
+		}
+	}
+	mutex_unlock(&ldev->lock);
+	if (!lag_active)
+		return -EINVAL;
+
+	for (i = 0; i < num_ports; i++) {
+		if (hash)
+			seq_printf(file, "%d\n", ports[i] + 1);
+		else
+			seq_printf(file, "%d:%d\n", i + 1, ports[i]);
+	}
+
+	return 0;
+}
+
+static int members_show(struct seq_file *file, void *priv)
+{
+	struct mlx5_core_dev *dev = file->private;
+	struct mlx5_lag *ldev;
+	int i;
+
+	ldev = dev->priv.lag;
+	mutex_lock(&ldev->lock);
+	for (i = 0; i < ldev->ports; i++) {
+		if (!ldev->pf[i].dev)
+			continue;
+		seq_printf(file, "%s\n", dev_name(ldev->pf[i].dev->device));
+	}
+	mutex_unlock(&ldev->lock);
+
+	return 0;
+}
+
+DEFINE_SHOW_ATTRIBUTE(type);
+DEFINE_SHOW_ATTRIBUTE(port_sel_mode);
+DEFINE_SHOW_ATTRIBUTE(state);
+DEFINE_SHOW_ATTRIBUTE(flags);
+DEFINE_SHOW_ATTRIBUTE(mapping);
+DEFINE_SHOW_ATTRIBUTE(members);
+
+void mlx5_ldev_add_debugfs(struct mlx5_core_dev *dev)
+{
+	struct dentry *dbg;
+
+	dbg = debugfs_create_dir("lag", mlx5_debugfs_get_dev_root(dev));
+	dev->priv.dbg.lag_debugfs = dbg;
+
+	debugfs_create_file("type", 0444, dbg, dev, &type_fops);
+	debugfs_create_file("port_sel_mode", 0444, dbg, dev, &port_sel_mode_fops);
+	debugfs_create_file("state", 0444, dbg, dev, &state_fops);
+	debugfs_create_file("flags", 0444, dbg, dev, &flags_fops);
+	debugfs_create_file("mapping", 0444, dbg, dev, &mapping_fops);
+	debugfs_create_file("members", 0444, dbg, dev, &members_fops);
+}
+
+void mlx5_ldev_remove_debugfs(struct dentry *dbg)
+{
+	debugfs_remove_recursive(dbg);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
index 8a74c409b501..b6dd9043061f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
@@ -120,8 +120,8 @@ static void mlx5_infer_tx_disabled(struct lag_tracker *tracker, u8 num_ports,
 	}
 }
 
-static void mlx5_infer_tx_enabled(struct lag_tracker *tracker, u8 num_ports,
-				  u8 *ports, int *num_enabled)
+void mlx5_infer_tx_enabled(struct lag_tracker *tracker, u8 num_ports,
+			   u8 *ports, int *num_enabled)
 {
 	int i;
 
@@ -454,7 +454,7 @@ static int mlx5_lag_set_port_sel_mode(struct mlx5_lag *ldev,
 	return mlx5_lag_set_port_sel_mode_offloads(ldev, tracker, flags);
 }
 
-static char *get_str_port_sel_mode(u8 flags)
+char *get_str_port_sel_mode(u8 flags)
 {
 	if (flags &  MLX5_LAG_FLAG_HASH_BASED)
 		return "hash";
@@ -1106,6 +1106,10 @@ void mlx5_lag_remove_mdev(struct mlx5_core_dev *dev)
 	if (!ldev)
 		return;
 
+	/* mdev is being removed; remove its debugfs entries as early
+	 * as possible.
+	 */
+	mlx5_ldev_remove_debugfs(dev->priv.dbg.lag_debugfs);
 recheck:
 	mutex_lock(&ldev->lock);
 	if (ldev->mode_changes_in_progress) {
@@ -1137,6 +1141,7 @@ void mlx5_lag_add_mdev(struct mlx5_core_dev *dev)
 		msleep(100);
 		goto recheck;
 	}
+	mlx5_ldev_add_debugfs(dev);
 }
 
 void mlx5_lag_remove_netdev(struct mlx5_core_dev *dev,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.h b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.h
index 0c90d0ed03be..46683b84ff84 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.h
@@ -4,6 +4,8 @@
 #ifndef __MLX5_LAG_H__
 #define __MLX5_LAG_H__
 
+#include <linux/debugfs.h>
+
 #define MLX5_LAG_MAX_HASH_BUCKETS 16
 #include "mlx5_core.h"
 #include "mp.h"
@@ -90,4 +92,11 @@ int mlx5_activate_lag(struct mlx5_lag *ldev,
 int mlx5_lag_dev_get_netdev_idx(struct mlx5_lag *ldev,
 				struct net_device *ndev);
 
+char *get_str_port_sel_mode(u8 flags);
+void mlx5_infer_tx_enabled(struct lag_tracker *tracker, u8 num_ports,
+			   u8 *ports, int *num_enabled);
+
+void mlx5_ldev_add_debugfs(struct mlx5_core_dev *dev);
+void mlx5_ldev_remove_debugfs(struct dentry *dbg);
+
 #endif /* __MLX5_LAG_H__ */
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index fdb9d07a05a4..d6bac3976913 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -558,6 +558,7 @@ struct mlx5_debugfs_entries {
 	struct dentry *cq_debugfs;
 	struct dentry *cmdif_debugfs;
 	struct dentry *pages_debugfs;
+	struct dentry *lag_debugfs;
 };
 
 struct mlx5_ft_pool;
-- 
2.35.1
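
For a sense of what the new files report: on a hypothetical 2-port RoCE
lag in queue-affinity mode, the show handlers above would produce output
along these lines (sample values only, derived from the seq_printf()
formats; per get_str_port_sel_mode(), port_sel_mode reads "hash" when
MLX5_LAG_FLAG_HASH_BASED is set and "queue_affinity" otherwise):

    type:           roce
    state:          active
    port_sel_mode:  queue_affinity
    flags:          shared_fdb:off
    mapping:        1:1
                    2:2

In hash mode, mapping instead lists the enabled ports one per line
(1-based), as gathered by mlx5_infer_tx_enabled().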


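A minimal userspace sketch for dumping the whole directory follows. It
assumes the usual mlx5 debugfs layout of /sys/kernel/debug/mlx5/<PCI-BDF>/;
the address 0000:08:00.0 is a placeholder, and per the show handlers
above, reads fail with EINVAL while no hardware lag is active:

    #include <stdio.h>

    int main(void)
    {
            static const char *files[] = {
                    "type", "port_sel_mode", "state",
                    "flags", "mapping", "members",
            };
            char path[128], line[128];
            unsigned int i;

            for (i = 0; i < sizeof(files) / sizeof(files[0]); i++) {
                    FILE *f;

                    /* Placeholder PCI address; substitute your device. */
                    snprintf(path, sizeof(path),
                             "/sys/kernel/debug/mlx5/0000:08:00.0/lag/%s",
                             files[i]);
                    f = fopen(path, "r");
                    if (!f) {
                            perror(path);
                            continue;
                    }
                    printf("%s:\n", files[i]);
                    while (fgets(line, sizeof(line), f))
                            printf("\t%s", line);
                    fclose(f);
            }
            return 0;
    }
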

* Re: [net-next 01/15] net/mlx5: Add exit route when waiting for FW
  2022-05-10  5:57 ` [net-next 01/15] net/mlx5: Add exit route when waiting for FW Saeed Mahameed
@ 2022-05-11 11:30   ` patchwork-bot+netdevbpf
  0 siblings, 0 replies; 17+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-05-11 11:30 UTC (permalink / raw)
  To: Saeed Mahameed; +Cc: davem, kuba, pabeni, netdev, gavinl, moshe

Hello:

This series was applied to netdev/net-next.git (master)
by Saeed Mahameed <saeedm@nvidia.com>:

On Mon,  9 May 2022 22:57:29 -0700 you wrote:
> From: Gavin Li <gavinl@nvidia.com>
> 
> Currently, removing a device needs to get the driver interface lock before
> doing any cleanup. If the driver is waiting in a loop for FW init, there
> is no way to cancel the wait; instead, the device cleanup waits for the
> loop to conclude and release the lock.
> 
> [...]

Here is the summary with links:
  - [net-next,01/15] net/mlx5: Add exit route when waiting for FW
    https://git.kernel.org/netdev/net-next/c/8324a02c342a
  - [net-next,02/15] net/mlx5: Increase FW pre-init timeout for health recovery
    https://git.kernel.org/netdev/net-next/c/37ca95e62ee2
  - [net-next,03/15] net/mlx5: Lag, expose number of lag ports
    https://git.kernel.org/netdev/net-next/c/34a30d7635a8
  - [net-next,04/15] net/mlx5: devcom only supports 2 ports
    https://git.kernel.org/netdev/net-next/c/8a6e75e5f57e
  - [net-next,05/15] net/mlx5: Lag, move E-Switch prerequisite check into lag code
    https://git.kernel.org/netdev/net-next/c/4202ea95a6b6
  - [net-next,06/15] net/mlx5: Lag, use lag lock
    https://git.kernel.org/netdev/net-next/c/ec2fa47d7b98
  - [net-next,07/15] net/mlx5: Lag, filter non compatible devices
    https://git.kernel.org/netdev/net-next/c/bc4c2f2e0179
  - [net-next,08/15] net/mlx5: Lag, store number of ports inside lag object
    https://git.kernel.org/netdev/net-next/c/e9d5bb51c592
  - [net-next,09/15] net/mlx5: Lag, support single FDB only on 2 ports
    https://git.kernel.org/netdev/net-next/c/e2c45931ff12
  - [net-next,10/15] net/mlx5: Lag, use hash when in roce lag on 4 ports
    https://git.kernel.org/netdev/net-next/c/cdf611d17094
  - [net-next,11/15] net/mlx5: Lag, use actual number of lag ports
    https://git.kernel.org/netdev/net-next/c/7e978e7714d6
  - [net-next,12/15] net/mlx5: Support devices with more than 2 ports
    https://git.kernel.org/netdev/net-next/c/4cd14d44b11d
  - [net-next,13/15] net/mlx5: Lag, refactor dmesg print
    https://git.kernel.org/netdev/net-next/c/24b3599effe2
  - [net-next,14/15] net/mlx5: Lag, use buckets in hash mode
    https://git.kernel.org/netdev/net-next/c/352899f384d4
  - [net-next,15/15] net/mlx5: Lag, add debugfs to query hardware lag state
    https://git.kernel.org/netdev/net-next/c/7f46a0b7327a

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




