* [pull request][net-next 00/15] mlx5 updates 2022-07-06
@ 2022-07-06 23:24 Saeed Mahameed
From: Saeed Mahameed @ 2022-07-06 23:24 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev

From: Saeed Mahameed <saeedm@nvidia.com>

This series provides two updates to mlx5:
1) Fix devlink lock in mlx5 devlink eswitch callbacks
2) Pool HW objects to increase kTLS connection rates

For more information please see the tag log below.

Please pull and let me know if there is any problem.

Thanks,
Saeed.


The following changes since commit cd355d0bc60df51266d228c0f69570cdcfa1e6ba:

  Merge branch 'hinic-dev_get_stats-fixes' (2022-07-06 13:09:28 +0100)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-updates-2022-07-06

for you to fetch changes up to 7b8e904d60fc01ae8edf46dc142f58f72f3d3780:

  net/mlx5e: kTLS, Dynamically re-size TX recycling pool (2022-07-06 16:20:09 -0700)

----------------------------------------------------------------
mlx5-updates-2022-07-06

Moshe Shemesh Says:
===================
1) Fix devlink lock in mlx5 devlink eswitch callbacks

Commit 14e426bf1a4d ("devlink: hold the instance lock during
eswitch_mode callbacks") made devlink take the instance lock for all
devlink eswitch callbacks and added a temporary workaround. This
patchset removes the workaround, replaces devlink API functions with
their devl_ counterparts in the flows called from the mlx5 eswitch
callbacks, and takes the devlink instance lock on the other driver
paths that lead to these functions.
While moving to the devl_ API, the patchset also removes the devlink
API functions for which mlx5 was the last user, so they are no longer
used by any driver.

The patchset also removes the DEVLINK_NL_FLAG_NO_LOCK flag from the
port_new/port_del callbacks, which are called only from the mlx5
driver and are already locked by this patchset as parallel paths to
the eswitch callbacks using devl_ API functions.

This patchset will be followed by another one that removes the
DEVLINK_NL_FLAG_NO_LOCK flag from the devlink reload and devlink
health callbacks. At that point all devlink callbacks will be locked,
paving the way to removing the devlink mutex.
===================
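
As background for the conversion: the devlink_ API functions take the
devlink instance lock internally, while their devl_ counterparts
expect the caller to already hold it. A minimal sketch of the pattern
the patchset converts the driver flows to (illustrative only, not
verbatim driver code):

	/* Before: the API function locks internally. */
	devlink_rate_nodes_destroy(devlink);

	/* After: the caller holds the instance lock across the flow. */
	devl_lock(devlink);
	devl_rate_nodes_destroy(devlink);
	devl_unlock(devlink);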

Tariq Toukan Says:
==================
2) Use TLS TX pool to improve connection rate

To offload encryption operations, the device maintains state and keeps
track of every kTLS device-offloaded connection.  Two HW objects are used
per TX context of a kTLS offloaded connection:
a. Transport interface send (TIS) object, to reach the HW context.
b. Data Encryption Key (DEK) to perform the crypto operations.

These two objects are created and destroyed per TLS TX context, via FW
commands.  In total, 4 FW commands are issued per TLS TX context, which
seriously limits the connection rate.

In this series, we avoid the creation and destruction of TIS objects
by recycling them. Upon recycling a TIS, the HW still needs to be
notified of the re-mapping between the TIS and a context. This is
done by posting WQEs via an SQ, a significantly faster interface than
the FW command one.

A pool is used for the recycling. The pool dynamically adapts to the
load and connection rate, growing and shrinking accordingly.

Saving the TIS FW commands per context increases connection rate by ~42%,
from 11.6K to 16.5K connections per sec.

The connection rate is still limited by a FW bottleneck, due to the
remaining per-context FW commands (DEK create/destroy). This will
soon be addressed in a followup series. By combining the two series,
the FW bottleneck is removed, and a significantly higher kTLS TX
device-offloaded connection rate (about 100K connections per sec) is
reached.
==================
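
Conceptually, the recycling replaces the TIS create/destroy FW
commands with a pool lookup plus a remap WQE. A rough sketch of the
scheme, using hypothetical helper names (the real implementation is
in the mlx5e patches below):

	/* Hypothetical illustration of TIS recycling; not mlx5e code. */
	static int tls_tx_tis_get(struct tls_tx_pool *pool, u32 *tisn)
	{
		if (pool_pop(pool, tisn))
			/* Fast path: remap the recycled TIS to the new
			 * context by posting a WQE on an SQ.
			 */
			return post_remap_wqe(*tisn);
		/* Slow path: no recycled TIS, create one via FW command. */
		return create_tis_fw_cmd(tisn);
	}

	static void tls_tx_tis_put(struct tls_tx_pool *pool, u32 tisn)
	{
		if (!pool_push(pool, tisn))
			/* Pool at capacity: destroy via FW command. */
			destroy_tis_fw_cmd(tisn);
	}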

----------------------------------------------------------------
Moshe Shemesh (9):
      net/mlx5: Remove devl_unlock from mlx5_eswtich_mode_callback_enter
      net/mlx5: Use devl_ API for rate nodes destroy
      devlink: Remove unused function devlink_rate_nodes_destroy
      net/mlx5: Use devl_ API in mlx5_esw_offloads_devlink_port_register
      net/mlx5: Use devl_ API in mlx5_esw_devlink_sf_port_register
      devlink: Remove unused functions devlink_rate_leaf_create/destroy
      net/mlx5: Use devl_ API in mlx5e_devlink_port_register
      net/mlx5: Remove devl_unlock from mlx5_devlink_eswitch_mode_set
      devlink: Hold the instance lock in port_new / port_del callbacks

Tariq Toukan (6):
      net/tls: Perform immediate device ctx cleanup when possible
      net/tls: Multi-threaded calls to TX tls_dev_del
      net/mlx5e: kTLS, Introduce TLS-specific create TIS
      net/mlx5e: kTLS, Take stats out of OOO handler
      net/mlx5e: kTLS, Recycle objects of device-offloaded TLS TX connections
      net/mlx5e: kTLS, Dynamically re-size TX recycling pool

 drivers/net/ethernet/mellanox/mlx5/core/dev.c      |  29 +-
 .../net/ethernet/mellanox/mlx5/core/en/devlink.c   |  16 +-
 .../mellanox/mlx5/core/en_accel/en_accel.h         |  10 +
 .../ethernet/mellanox/mlx5/core/en_accel/ktls.h    |  14 +
 .../mellanox/mlx5/core/en_accel/ktls_stats.c       |   2 +
 .../ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c | 513 ++++++++++++++++++---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   9 +
 .../ethernet/mellanox/mlx5/core/esw/devlink_port.c |  20 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |  18 +-
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c |  57 +--
 include/linux/mlx5/driver.h                        |   4 +
 include/net/devlink.h                              |   3 -
 include/net/tls.h                                  |   6 +
 net/core/devlink.c                                 |  66 +--
 net/tls/tls_device.c                               |  56 +--
 15 files changed, 603 insertions(+), 220 deletions(-)


* [net-next 01/15] net/mlx5: Remove devl_unlock from mlx5_eswtich_mode_callback_enter
From: Saeed Mahameed @ 2022-07-06 23:24 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Moshe Shemesh, Leon Romanovsky

From: Moshe Shemesh <moshe@nvidia.com>

The function mlx5_eswtich_mode_callback_enter() was added as a
temporary workaround once the devlink instance lock was added to the
devlink eswitch callbacks. However, code review and testing show that
none of these callbacks, apart from eswitch_mode_set, takes the
devlink instance lock in any flow, so dropping the devlink instance
lock while entering these functions is not needed.

Remove devl_unlock from mlx5_eswtich_mode_callback_enter() and
devl_lock from mlx5_eswtich_mode_callback_exit(). Also remove the
functions mlx5_eswtich_mode_callback_enter()/exit() themselves, as
they are not needed anymore. The eswitch_mode_set callback will be
treated separately in the following patches.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../mellanox/mlx5/core/eswitch_offloads.c     | 43 +++++--------------
 1 file changed, 11 insertions(+), 32 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index e224ec7005a6..3bd843e6d66a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -3342,27 +3342,6 @@ static int esw_inline_mode_to_devlink(u8 mlx5_mode, u8 *mode)
 	return 0;
 }
 
-/* FIXME: devl_unlock() followed by devl_lock() inside driver callback
- * is never correct and prone to races. It's a transitional workaround,
- * never repeat this pattern.
- *
- * This code MUST be fixed before removing devlink_mutex as it is safe
- * to do only because of that mutex.
- */
-static void mlx5_eswtich_mode_callback_enter(struct devlink *devlink,
-					     struct mlx5_eswitch *esw)
-{
-	devl_unlock(devlink);
-	down_write(&esw->mode_lock);
-}
-
-static void mlx5_eswtich_mode_callback_exit(struct devlink *devlink,
-					    struct mlx5_eswitch *esw)
-{
-	up_write(&esw->mode_lock);
-	devl_lock(devlink);
-}
-
 int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
 				  struct netlink_ext_ack *extack)
 {
@@ -3431,9 +3410,9 @@ int mlx5_devlink_eswitch_mode_get(struct devlink *devlink, u16 *mode)
 	if (IS_ERR(esw))
 		return PTR_ERR(esw);
 
-	mlx5_eswtich_mode_callback_enter(devlink, esw);
+	down_write(&esw->mode_lock);
 	err = esw_mode_to_devlink(esw->mode, mode);
-	mlx5_eswtich_mode_callback_exit(devlink, esw);
+	up_write(&esw->mode_lock);
 	return err;
 }
 
@@ -3480,7 +3459,7 @@ int mlx5_devlink_eswitch_inline_mode_set(struct devlink *devlink, u8 mode,
 	if (IS_ERR(esw))
 		return PTR_ERR(esw);
 
-	mlx5_eswtich_mode_callback_enter(devlink, esw);
+	down_write(&esw->mode_lock);
 
 	switch (MLX5_CAP_ETH(dev, wqe_inline_mode)) {
 	case MLX5_CAP_INLINE_MODE_NOT_REQUIRED:
@@ -3514,11 +3493,11 @@ int mlx5_devlink_eswitch_inline_mode_set(struct devlink *devlink, u8 mode,
 		goto out;
 
 	esw->offloads.inline_mode = mlx5_mode;
-	mlx5_eswtich_mode_callback_exit(devlink, esw);
+	up_write(&esw->mode_lock);
 	return 0;
 
 out:
-	mlx5_eswtich_mode_callback_exit(devlink, esw);
+	up_write(&esw->mode_lock);
 	return err;
 }
 
@@ -3531,9 +3510,9 @@ int mlx5_devlink_eswitch_inline_mode_get(struct devlink *devlink, u8 *mode)
 	if (IS_ERR(esw))
 		return PTR_ERR(esw);
 
-	mlx5_eswtich_mode_callback_enter(devlink, esw);
+	down_write(&esw->mode_lock);
 	err = esw_inline_mode_to_devlink(esw->offloads.inline_mode, mode);
-	mlx5_eswtich_mode_callback_exit(devlink, esw);
+	up_write(&esw->mode_lock);
 	return err;
 }
 
@@ -3549,7 +3528,7 @@ int mlx5_devlink_eswitch_encap_mode_set(struct devlink *devlink,
 	if (IS_ERR(esw))
 		return PTR_ERR(esw);
 
-	mlx5_eswtich_mode_callback_enter(devlink, esw);
+	down_write(&esw->mode_lock);
 
 	if (encap != DEVLINK_ESWITCH_ENCAP_MODE_NONE &&
 	    (!MLX5_CAP_ESW_FLOWTABLE_FDB(dev, reformat) ||
@@ -3592,7 +3571,7 @@ int mlx5_devlink_eswitch_encap_mode_set(struct devlink *devlink,
 	}
 
 unlock:
-	mlx5_eswtich_mode_callback_exit(devlink, esw);
+	up_write(&esw->mode_lock);
 	return err;
 }
 
@@ -3605,9 +3584,9 @@ int mlx5_devlink_eswitch_encap_mode_get(struct devlink *devlink,
 	if (IS_ERR(esw))
 		return PTR_ERR(esw);
 
-	mlx5_eswtich_mode_callback_enter(devlink, esw);
+	down_write(&esw->mode_lock);
 	*encap = esw->offloads.encap;
-	mlx5_eswtich_mode_callback_exit(devlink, esw);
+	up_write(&esw->mode_lock);
 	return 0;
 }
 
-- 
2.36.1



* [net-next 02/15] net/mlx5: Use devl_ API for rate nodes destroy
From: Saeed Mahameed @ 2022-07-06 23:24 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Moshe Shemesh, Leon Romanovsky

From: Moshe Shemesh <moshe@nvidia.com>

Use devl_rate_nodes_destroy() instead of devlink_rate_nodes_destroy().
Take the devlink instance lock on the driver paths to this function,
so it is held while calling the devl_ API function.

This will be used by the downstream patch to invoke
mlx5_devlink_eswitch_mode_set() with devlink lock held.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  | 14 ++++++++++++--
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c |  2 ++
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index b938632f89ff..571114e4878f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1330,9 +1330,13 @@ int mlx5_eswitch_enable(struct mlx5_eswitch *esw, int num_vfs)
 /* When disabling sriov, free driver level resources. */
 void mlx5_eswitch_disable_sriov(struct mlx5_eswitch *esw, bool clear_vf)
 {
+	struct devlink *devlink;
+
 	if (!mlx5_esw_allowed(esw))
 		return;
 
+	devlink = priv_to_devlink(esw->dev);
+	devl_lock(devlink);
 	down_write(&esw->mode_lock);
 	/* If driver is unloaded, this function is called twice by remove_one()
 	 * and mlx5_unload(). Prevent the second call.
@@ -1354,13 +1358,14 @@ void mlx5_eswitch_disable_sriov(struct mlx5_eswitch *esw, bool clear_vf)
 		struct devlink *devlink = priv_to_devlink(esw->dev);
 
 		esw_offloads_del_send_to_vport_meta_rules(esw);
-		devlink_rate_nodes_destroy(devlink);
+		devl_rate_nodes_destroy(devlink);
 	}
 
 	esw->esw_funcs.num_vfs = 0;
 
 unlock:
 	up_write(&esw->mode_lock);
+	devl_unlock(devlink);
 }
 
 /* Free resources for corresponding eswitch mode. It is called by devlink
@@ -1389,18 +1394,23 @@ void mlx5_eswitch_disable_locked(struct mlx5_eswitch *esw)
 	mlx5_esw_acls_ns_cleanup(esw);
 
 	if (esw->mode == MLX5_ESWITCH_OFFLOADS)
-		devlink_rate_nodes_destroy(devlink);
+		devl_rate_nodes_destroy(devlink);
 }
 
 void mlx5_eswitch_disable(struct mlx5_eswitch *esw)
 {
+	struct devlink *devlink;
+
 	if (!mlx5_esw_allowed(esw))
 		return;
 
 	mlx5_lag_disable_change(esw->dev);
+	devlink = priv_to_devlink(esw->dev);
+	devl_lock(devlink);
 	down_write(&esw->mode_lock);
 	mlx5_eswitch_disable_locked(esw);
 	up_write(&esw->mode_lock);
+	devl_unlock(devlink);
 	mlx5_lag_enable_change(esw->dev);
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 3bd843e6d66a..f1640e4cb719 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -3377,7 +3377,9 @@ int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
 	if (cur_mlx5_mode == mlx5_mode)
 		goto unlock;
 
+	devl_lock(devlink);
 	mlx5_eswitch_disable_locked(esw);
+	devl_unlock(devlink);
 	if (mode == DEVLINK_ESWITCH_MODE_SWITCHDEV) {
 		if (mlx5_devlink_trap_get_num_active(esw->dev)) {
 			NL_SET_ERR_MSG_MOD(extack,
-- 
2.36.1



* [net-next 03/15] devlink: Remove unused function devlink_rate_nodes_destroy
From: Saeed Mahameed @ 2022-07-06 23:24 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Moshe Shemesh, Leon Romanovsky

From: Moshe Shemesh <moshe@nvidia.com>

The previous patch removed the last usage of the function
devlink_rate_nodes_destroy(). Thus, remove this function from the
devlink API.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 include/net/devlink.h |  1 -
 net/core/devlink.c    | 18 ------------------
 2 files changed, 19 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 2a2a2a0c93f7..0e163cc87d45 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -1571,7 +1571,6 @@ void devlink_port_attrs_pci_sf_set(struct devlink_port *devlink_port,
 				   bool external);
 int devlink_rate_leaf_create(struct devlink_port *port, void *priv);
 void devlink_rate_leaf_destroy(struct devlink_port *devlink_port);
-void devlink_rate_nodes_destroy(struct devlink *devlink);
 void devlink_port_linecard_set(struct devlink_port *devlink_port,
 			       struct devlink_linecard *linecard);
 struct devlink_linecard *
diff --git a/net/core/devlink.c b/net/core/devlink.c
index db61f3a341cb..1588e2246234 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -10095,24 +10095,6 @@ void devl_rate_nodes_destroy(struct devlink *devlink)
 }
 EXPORT_SYMBOL_GPL(devl_rate_nodes_destroy);
 
-/**
- * devlink_rate_nodes_destroy - destroy all devlink rate nodes on device
- *
- * @devlink: devlink instance
- *
- * Unset parent for all rate objects and destroy all rate nodes
- * on specified device.
- *
- * Context: Takes and release devlink->lock <mutex>.
- */
-void devlink_rate_nodes_destroy(struct devlink *devlink)
-{
-	mutex_lock(&devlink->lock);
-	devl_rate_nodes_destroy(devlink);
-	mutex_unlock(&devlink->lock);
-}
-EXPORT_SYMBOL_GPL(devlink_rate_nodes_destroy);
-
 /**
  *	devlink_port_linecard_set - Link port with a linecard
  *
-- 
2.36.1



* [net-next 04/15] net/mlx5: Use devl_ API in mlx5_esw_offloads_devlink_port_register
From: Saeed Mahameed @ 2022-07-06 23:24 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Moshe Shemesh, Leon Romanovsky

From: Moshe Shemesh <moshe@nvidia.com>

The function mlx5_esw_offloads_devlink_port_register() calls
devlink_port_register() and devlink_rate_leaf_create(). Use the devl_
API to call devl_port_register() and devl_rate_leaf_create() instead,
and take the devlink instance lock on the driver paths to this
function.

Similarly, use the devl_ API to call devl_port_unregister() and
devl_rate_leaf_destroy() in
mlx5_esw_offloads_devlink_port_unregister(), and ensure the devlink
instance lock is held on the paths to this function too.

This will be used by the downstream patch to invoke
mlx5_devlink_eswitch_mode_set() with devlink lock held.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/esw/devlink_port.c | 10 +++++-----
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c      |  4 ++++
 .../net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 10 ++++++++++
 3 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
index 7f9b96d9537e..a8f7618831f5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
@@ -87,11 +87,11 @@ int mlx5_esw_offloads_devlink_port_register(struct mlx5_eswitch *esw, u16 vport_
 
 	devlink = priv_to_devlink(dev);
 	dl_port_index = mlx5_esw_vport_to_devlink_port_index(dev, vport_num);
-	err = devlink_port_register(devlink, dl_port, dl_port_index);
+	err = devl_port_register(devlink, dl_port, dl_port_index);
 	if (err)
 		goto reg_err;
 
-	err = devlink_rate_leaf_create(dl_port, vport);
+	err = devl_rate_leaf_create(dl_port, vport);
 	if (err)
 		goto rate_err;
 
@@ -99,7 +99,7 @@ int mlx5_esw_offloads_devlink_port_register(struct mlx5_eswitch *esw, u16 vport_
 	return 0;
 
 rate_err:
-	devlink_port_unregister(dl_port);
+	devl_port_unregister(dl_port);
 reg_err:
 	mlx5_esw_dl_port_free(dl_port);
 	return err;
@@ -118,10 +118,10 @@ void mlx5_esw_offloads_devlink_port_unregister(struct mlx5_eswitch *esw, u16 vpo
 
 	if (vport->dl_port->devlink_rate) {
 		mlx5_esw_qos_vport_update_group(esw, vport, NULL, NULL);
-		devlink_rate_leaf_destroy(vport->dl_port);
+		devl_rate_leaf_destroy(vport->dl_port);
 	}
 
-	devlink_port_unregister(vport->dl_port);
+	devl_port_unregister(vport->dl_port);
 	mlx5_esw_dl_port_free(vport->dl_port);
 	vport->dl_port = NULL;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 571114e4878f..b95f75431882 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1296,6 +1296,7 @@ int mlx5_eswitch_enable_locked(struct mlx5_eswitch *esw, int num_vfs)
  */
 int mlx5_eswitch_enable(struct mlx5_eswitch *esw, int num_vfs)
 {
+	struct devlink *devlink;
 	bool toggle_lag;
 	int ret;
 
@@ -1307,6 +1308,8 @@ int mlx5_eswitch_enable(struct mlx5_eswitch *esw, int num_vfs)
 	if (toggle_lag)
 		mlx5_lag_disable_change(esw->dev);
 
+	devlink = priv_to_devlink(esw->dev);
+	devl_lock(devlink);
 	down_write(&esw->mode_lock);
 	if (!mlx5_esw_is_fdb_created(esw)) {
 		ret = mlx5_eswitch_enable_locked(esw, num_vfs);
@@ -1320,6 +1323,7 @@ int mlx5_eswitch_enable(struct mlx5_eswitch *esw, int num_vfs)
 			esw->esw_funcs.num_vfs = num_vfs;
 	}
 	up_write(&esw->mode_lock);
+	devl_unlock(devlink);
 
 	if (toggle_lag)
 		mlx5_lag_enable_change(esw->dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index f1640e4cb719..1bfbc88f513f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -2177,8 +2177,10 @@ static int esw_create_restore_table(struct mlx5_eswitch *esw)
 static int esw_offloads_start(struct mlx5_eswitch *esw,
 			      struct netlink_ext_ack *extack)
 {
+	struct devlink *devlink = priv_to_devlink(esw->dev);
 	int err, err1;
 
+	devl_lock(devlink);
 	esw->mode = MLX5_ESWITCH_OFFLOADS;
 	err = mlx5_eswitch_enable_locked(esw, esw->dev->priv.sriov.num_vfs);
 	if (err) {
@@ -2200,6 +2202,7 @@ static int esw_offloads_start(struct mlx5_eswitch *esw,
 					   "Inline mode is different between vports");
 		}
 	}
+	devl_unlock(devlink);
 	return err;
 }
 
@@ -3064,6 +3067,7 @@ static void esw_offloads_steering_cleanup(struct mlx5_eswitch *esw)
 static void
 esw_vfs_changed_event_handler(struct mlx5_eswitch *esw, const u32 *out)
 {
+	struct devlink *devlink;
 	bool host_pf_disabled;
 	u16 new_num_vfs;
 
@@ -3075,6 +3079,8 @@ esw_vfs_changed_event_handler(struct mlx5_eswitch *esw, const u32 *out)
 	if (new_num_vfs == esw->esw_funcs.num_vfs || host_pf_disabled)
 		return;
 
+	devlink = priv_to_devlink(esw->dev);
+	devl_lock(devlink);
 	/* Number of VFs can only change from "0 to x" or "x to 0". */
 	if (esw->esw_funcs.num_vfs > 0) {
 		mlx5_eswitch_unload_vf_vports(esw, esw->esw_funcs.num_vfs);
@@ -3087,6 +3093,7 @@ esw_vfs_changed_event_handler(struct mlx5_eswitch *esw, const u32 *out)
 			return;
 	}
 	esw->esw_funcs.num_vfs = new_num_vfs;
+	devl_unlock(devlink);
 }
 
 static void esw_functions_changed_event_handler(struct work_struct *work)
@@ -3236,8 +3243,10 @@ int esw_offloads_enable(struct mlx5_eswitch *esw)
 static int esw_offloads_stop(struct mlx5_eswitch *esw,
 			     struct netlink_ext_ack *extack)
 {
+	struct devlink *devlink = priv_to_devlink(esw->dev);
 	int err, err1;
 
+	devl_lock(devlink);
 	esw->mode = MLX5_ESWITCH_LEGACY;
 	err = mlx5_eswitch_enable_locked(esw, MLX5_ESWITCH_IGNORE_NUM_VFS);
 	if (err) {
@@ -3249,6 +3258,7 @@ static int esw_offloads_stop(struct mlx5_eswitch *esw,
 					   "Failed setting eswitch back to offloads");
 		}
 	}
+	devl_unlock(devlink);
 
 	return err;
 }
-- 
2.36.1



* [net-next 05/15] net/mlx5: Use devl_ API in mlx5_esw_devlink_sf_port_register
From: Saeed Mahameed @ 2022-07-06 23:24 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Moshe Shemesh, Leon Romanovsky

From: Moshe Shemesh <moshe@nvidia.com>

The function mlx5_esw_devlink_sf_port_register() calls
devlink_port_register() and devlink_rate_leaf_create(). Use the devl_
API to call devl_port_register() and devl_rate_leaf_create() instead,
and take the devlink instance lock on the driver paths to this
function.

Similarly, use the devl_ API to call devl_port_unregister() and
devl_rate_leaf_destroy() in mlx5_esw_devlink_sf_port_unregister(),
and ensure the devlink instance lock is held on all the paths to this
function too.

This will be used by the downstream patch to invoke
mlx5_devlink_eswitch_mode_set() with devlink lock held.

Note that this patch takes the devlink lock in
mlx5_devlink_sf_port_new/del(), which are the devlink callbacks for
port_new/del(). These locks will be removed once devlink itself locks
these callbacks.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/esw/devlink_port.c | 10 +++++-----
 drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c   |  4 ++++
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
index a8f7618831f5..9bc7be95db54 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
@@ -156,11 +156,11 @@ int mlx5_esw_devlink_sf_port_register(struct mlx5_eswitch *esw, struct devlink_p
 	devlink_port_attrs_pci_sf_set(dl_port, controller, pfnum, sfnum, !!controller);
 	devlink = priv_to_devlink(dev);
 	dl_port_index = mlx5_esw_vport_to_devlink_port_index(dev, vport_num);
-	err = devlink_port_register(devlink, dl_port, dl_port_index);
+	err = devl_port_register(devlink, dl_port, dl_port_index);
 	if (err)
 		return err;
 
-	err = devlink_rate_leaf_create(dl_port, vport);
+	err = devl_rate_leaf_create(dl_port, vport);
 	if (err)
 		goto rate_err;
 
@@ -168,7 +168,7 @@ int mlx5_esw_devlink_sf_port_register(struct mlx5_eswitch *esw, struct devlink_p
 	return 0;
 
 rate_err:
-	devlink_port_unregister(dl_port);
+	devl_port_unregister(dl_port);
 	return err;
 }
 
@@ -182,9 +182,9 @@ void mlx5_esw_devlink_sf_port_unregister(struct mlx5_eswitch *esw, u16 vport_num
 
 	if (vport->dl_port->devlink_rate) {
 		mlx5_esw_qos_vport_update_group(esw, vport, NULL, NULL);
-		devlink_rate_leaf_destroy(vport->dl_port);
+		devl_rate_leaf_destroy(vport->dl_port);
 	}
 
-	devlink_port_unregister(vport->dl_port);
+	devl_port_unregister(vport->dl_port);
 	vport->dl_port = NULL;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c
index 7d955a4d9f14..2068c22ff367 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c
@@ -355,7 +355,9 @@ int mlx5_devlink_sf_port_new(struct devlink *devlink,
 				   "Port add is only supported in eswitch switchdev mode or SF ports are disabled.");
 		return -EOPNOTSUPP;
 	}
+	devl_lock(devlink);
 	err = mlx5_sf_add(dev, table, new_attr, extack, new_port_index);
+	devl_unlock(devlink);
 	mlx5_sf_table_put(table);
 	return err;
 }
@@ -400,7 +402,9 @@ int mlx5_devlink_sf_port_del(struct devlink *devlink, unsigned int port_index,
 		goto sf_err;
 	}
 
+	devl_lock(devlink);
 	mlx5_esw_offloads_sf_vport_disable(esw, sf->hw_fn_id);
+	devl_unlock(devlink);
 	mlx5_sf_id_erase(table, sf);
 
 	mutex_lock(&table->sf_state_lock);
-- 
2.36.1



* [net-next 06/15] devlink: Remove unused functions devlink_rate_leaf_create/destroy
From: Saeed Mahameed @ 2022-07-06 23:24 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Moshe Shemesh, Leon Romanovsky

From: Moshe Shemesh <moshe@nvidia.com>

The previous patch removed the last usage of the functions
devlink_rate_leaf_create() and devlink_rate_leaf_destroy(). Thus,
remove these functions from the devlink API.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 include/net/devlink.h |  2 --
 net/core/devlink.c    | 42 +++++++-----------------------------------
 2 files changed, 7 insertions(+), 37 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 0e163cc87d45..5150deb67fab 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -1569,8 +1569,6 @@ void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 contro
 void devlink_port_attrs_pci_sf_set(struct devlink_port *devlink_port,
 				   u32 controller, u16 pf, u32 sf,
 				   bool external);
-int devlink_rate_leaf_create(struct devlink_port *port, void *priv);
-void devlink_rate_leaf_destroy(struct devlink_port *devlink_port);
 void devlink_port_linecard_set(struct devlink_port *devlink_port,
 			       struct devlink_linecard *linecard);
 struct devlink_linecard *
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 1588e2246234..970e5c2a52bd 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -10006,20 +10006,13 @@ int devl_rate_leaf_create(struct devlink_port *devlink_port, void *priv)
 }
 EXPORT_SYMBOL_GPL(devl_rate_leaf_create);
 
-int
-devlink_rate_leaf_create(struct devlink_port *devlink_port, void *priv)
-{
-	struct devlink *devlink = devlink_port->devlink;
-	int ret;
-
-	mutex_lock(&devlink->lock);
-	ret = devl_rate_leaf_create(devlink_port, priv);
-	mutex_unlock(&devlink->lock);
-
-	return ret;
-}
-EXPORT_SYMBOL_GPL(devlink_rate_leaf_create);
-
+/**
+ * devl_rate_leaf_destroy - destroy devlink rate leaf
+ *
+ * @devlink_port: devlink port linked to the rate object
+ *
+ * Destroy the devlink rate object of type leaf on provided @devlink_port.
+ */
 void devl_rate_leaf_destroy(struct devlink_port *devlink_port)
 {
 	struct devlink_rate *devlink_rate = devlink_port->devlink_rate;
@@ -10037,27 +10030,6 @@ void devl_rate_leaf_destroy(struct devlink_port *devlink_port)
 }
 EXPORT_SYMBOL_GPL(devl_rate_leaf_destroy);
 
-/**
- * devlink_rate_leaf_destroy - destroy devlink rate leaf
- *
- * @devlink_port: devlink port linked to the rate object
- *
- * Context: Takes and release devlink->lock <mutex>.
- */
-void devlink_rate_leaf_destroy(struct devlink_port *devlink_port)
-{
-	struct devlink_rate *devlink_rate = devlink_port->devlink_rate;
-	struct devlink *devlink = devlink_port->devlink;
-
-	if (!devlink_rate)
-		return;
-
-	mutex_lock(&devlink->lock);
-	devl_rate_leaf_destroy(devlink_port);
-	mutex_unlock(&devlink->lock);
-}
-EXPORT_SYMBOL_GPL(devlink_rate_leaf_destroy);
-
 /**
  * devl_rate_nodes_destroy - destroy all devlink rate nodes on device
  * @devlink: devlink instance
-- 
2.36.1



* [net-next 07/15] net/mlx5: Use devl_ API in mlx5e_devlink_port_register
From: Saeed Mahameed @ 2022-07-06 23:24 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Moshe Shemesh

From: Moshe Shemesh <moshe@nvidia.com>

Some of the flows invoked by mlx5_devlink_eswitch_mode_set() reach
mlx5_rescan_drivers_locked(), which can call
mlx5e_probe()/mlx5e_remove() and register/unregister mlx5e driver
ports accordingly. This can lead to a deadlock once
mlx5_devlink_eswitch_mode_set() starts using the devlink lock.
Use devl_port_register/unregister() instead of
devlink_port_register/unregister() and take the devlink instance lock
in the driver paths to this function, so it is held while calling the
devl_ API functions.

If probe or remove is called from the module init or cleanup flows,
devlink must be locked just before calling devl_port_register();
otherwise the call comes from an attach/detach or register/unregister
flow that already holds the lock. Add a flag to distinguish between
these cases.

This will be used by the downstream patch to invoke
mlx5_devlink_eswitch_mode_set() with devlink locked.
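
In condensed form, the resulting pattern (taken from the diff below):

	if (!(priv->mdev->priv.flags & MLX5_PRIV_FLAGS_MLX5E_LOCKED_FLOW))
		devl_lock(devlink);
	ret = devl_port_register(devlink, dl_port, dl_port_index);
	if (!(priv->mdev->priv.flags & MLX5_PRIV_FLAGS_MLX5E_LOCKED_FLOW))
		devl_unlock(devlink);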

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/dev.c | 29 +++++++++++++++++--
 .../ethernet/mellanox/mlx5/core/en/devlink.c  | 16 ++++++++--
 .../mellanox/mlx5/core/eswitch_offloads.c     |  2 ++
 include/linux/mlx5/driver.h                   |  4 +++
 4 files changed, 47 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/dev.c b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
index 50422b56a64d..ccf2068d2e79 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/dev.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
@@ -335,13 +335,16 @@ static void del_adev(struct auxiliary_device *adev)
 
 int mlx5_attach_device(struct mlx5_core_dev *dev)
 {
+	struct devlink *devlink = priv_to_devlink(dev);
 	struct mlx5_priv *priv = &dev->priv;
 	struct auxiliary_device *adev;
 	struct auxiliary_driver *adrv;
 	int ret = 0, i;
 
+	devl_lock(devlink);
 	mutex_lock(&mlx5_intf_mutex);
 	priv->flags &= ~MLX5_PRIV_FLAGS_DETACH;
+	priv->flags |= MLX5_PRIV_FLAGS_MLX5E_LOCKED_FLOW;
 	for (i = 0; i < ARRAY_SIZE(mlx5_adev_devices); i++) {
 		if (!priv->adev[i]) {
 			bool is_supported = false;
@@ -389,19 +392,24 @@ int mlx5_attach_device(struct mlx5_core_dev *dev)
 			break;
 		}
 	}
+	priv->flags &= ~MLX5_PRIV_FLAGS_MLX5E_LOCKED_FLOW;
 	mutex_unlock(&mlx5_intf_mutex);
+	devl_unlock(devlink);
 	return ret;
 }
 
 void mlx5_detach_device(struct mlx5_core_dev *dev)
 {
+	struct devlink *devlink = priv_to_devlink(dev);
 	struct mlx5_priv *priv = &dev->priv;
 	struct auxiliary_device *adev;
 	struct auxiliary_driver *adrv;
 	pm_message_t pm = {};
 	int i;
 
+	devl_lock(devlink);
 	mutex_lock(&mlx5_intf_mutex);
+	priv->flags |= MLX5_PRIV_FLAGS_MLX5E_LOCKED_FLOW;
 	for (i = ARRAY_SIZE(mlx5_adev_devices) - 1; i >= 0; i--) {
 		if (!priv->adev[i])
 			continue;
@@ -430,18 +438,24 @@ void mlx5_detach_device(struct mlx5_core_dev *dev)
 		del_adev(&priv->adev[i]->adev);
 		priv->adev[i] = NULL;
 	}
+	priv->flags &= ~MLX5_PRIV_FLAGS_MLX5E_LOCKED_FLOW;
 	priv->flags |= MLX5_PRIV_FLAGS_DETACH;
 	mutex_unlock(&mlx5_intf_mutex);
+	devl_unlock(devlink);
 }
 
 int mlx5_register_device(struct mlx5_core_dev *dev)
 {
+	struct devlink *devlink;
 	int ret;
 
+	devlink = priv_to_devlink(dev);
+	devl_lock(devlink);
 	mutex_lock(&mlx5_intf_mutex);
 	dev->priv.flags &= ~MLX5_PRIV_FLAGS_DISABLE_ALL_ADEV;
 	ret = mlx5_rescan_drivers_locked(dev);
 	mutex_unlock(&mlx5_intf_mutex);
+	devl_unlock(devlink);
 	if (ret)
 		mlx5_unregister_device(dev);
 
@@ -450,10 +464,15 @@ int mlx5_register_device(struct mlx5_core_dev *dev)
 
 void mlx5_unregister_device(struct mlx5_core_dev *dev)
 {
+	struct devlink *devlink;
+
+	devlink = priv_to_devlink(dev);
+	devl_lock(devlink);
 	mutex_lock(&mlx5_intf_mutex);
 	dev->priv.flags = MLX5_PRIV_FLAGS_DISABLE_ALL_ADEV;
 	mlx5_rescan_drivers_locked(dev);
 	mutex_unlock(&mlx5_intf_mutex);
+	devl_unlock(devlink);
 }
 
 static int add_drivers(struct mlx5_core_dev *dev)
@@ -526,16 +545,22 @@ static void delete_drivers(struct mlx5_core_dev *dev)
 int mlx5_rescan_drivers_locked(struct mlx5_core_dev *dev)
 {
 	struct mlx5_priv *priv = &dev->priv;
+	int err = 0;
 
 	lockdep_assert_held(&mlx5_intf_mutex);
 	if (priv->flags & MLX5_PRIV_FLAGS_DETACH)
 		return 0;
 
+	priv->flags |= MLX5_PRIV_FLAGS_MLX5E_LOCKED_FLOW;
 	delete_drivers(dev);
 	if (priv->flags & MLX5_PRIV_FLAGS_DISABLE_ALL_ADEV)
-		return 0;
+		goto out;
+
+	err = add_drivers(dev);
 
-	return add_drivers(dev);
+out:
+	priv->flags &= ~MLX5_PRIV_FLAGS_MLX5E_LOCKED_FLOW;
+	return err;
 }
 
 bool mlx5_same_hw_devs(struct mlx5_core_dev *dev, struct mlx5_core_dev *peer_dev)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/en/devlink.c
index ae52e7f38306..b69f9d10ccbd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/devlink.c
@@ -21,6 +21,7 @@ int mlx5e_devlink_port_register(struct mlx5e_priv *priv)
 	struct netdev_phys_item_id ppid = {};
 	struct devlink_port *dl_port;
 	unsigned int dl_port_index;
+	int ret;
 
 	if (mlx5_core_is_pf(priv->mdev)) {
 		attrs.flavour = DEVLINK_PORT_FLAVOUR_PHYSICAL;
@@ -41,7 +42,13 @@ int mlx5e_devlink_port_register(struct mlx5e_priv *priv)
 	memset(dl_port, 0, sizeof(*dl_port));
 	devlink_port_attrs_set(dl_port, &attrs);
 
-	return devlink_port_register(devlink, dl_port, dl_port_index);
+	if (!(priv->mdev->priv.flags & MLX5_PRIV_FLAGS_MLX5E_LOCKED_FLOW))
+		devl_lock(devlink);
+	ret = devl_port_register(devlink, dl_port, dl_port_index);
+	if (!(priv->mdev->priv.flags & MLX5_PRIV_FLAGS_MLX5E_LOCKED_FLOW))
+		devl_unlock(devlink);
+
+	return ret;
 }
 
 void mlx5e_devlink_port_type_eth_set(struct mlx5e_priv *priv)
@@ -54,8 +61,13 @@ void mlx5e_devlink_port_type_eth_set(struct mlx5e_priv *priv)
 void mlx5e_devlink_port_unregister(struct mlx5e_priv *priv)
 {
 	struct devlink_port *dl_port = mlx5e_devlink_get_dl_port(priv);
+	struct devlink *devlink = priv_to_devlink(priv->mdev);
 
-	devlink_port_unregister(dl_port);
+	if (!(priv->mdev->priv.flags & MLX5_PRIV_FLAGS_MLX5E_LOCKED_FLOW))
+		devl_lock(devlink);
+	devl_port_unregister(dl_port);
+	if (!(priv->mdev->priv.flags & MLX5_PRIV_FLAGS_MLX5E_LOCKED_FLOW))
+		devl_unlock(devlink);
 }
 
 struct devlink_port *mlx5e_get_devlink_port(struct net_device *dev)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 1bfbc88f513f..ccda3a0a2594 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -3400,7 +3400,9 @@ int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
 		err = esw_offloads_start(esw, extack);
 	} else if (mode == DEVLINK_ESWITCH_MODE_LEGACY) {
 		err = esw_offloads_stop(esw, extack);
+		devl_lock(devlink);
 		mlx5_rescan_drivers(esw->dev);
+		devl_unlock(devlink);
 	} else {
 		err = -EINVAL;
 	}
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 76d7661e3e63..bd882884b23c 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -551,6 +551,10 @@ enum {
 	 * creation/deletion on drivers rescan. Unset during device attach.
 	 */
 	MLX5_PRIV_FLAGS_DETACH = 1 << 2,
+	/* Distinguish between mlx5e_probe/remove called by module init/cleanup
+	 * and called by other flows which can already hold devlink lock
+	 */
+	MLX5_PRIV_FLAGS_MLX5E_LOCKED_FLOW = 1 << 3,
 };
 
 struct mlx5_adev {
-- 
2.36.1



* [net-next 08/15] net/mlx5: Remove devl_unlock from mlx5_devlink_eswitch_mode_set
From: Saeed Mahameed @ 2022-07-06 23:24 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Moshe Shemesh, Leon Romanovsky

From: Moshe Shemesh <moshe@nvidia.com>

The callback mlx5_devlink_eswitch_mode_set() had unlocked devlink as
a temporary workaround once the devlink instance lock was added to
the devlink eswitch callbacks. Now that all the flows triggered by
this function that took the devlink lock use the devl_ API, and all
parallel paths are locked, we can remove this workaround.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../mellanox/mlx5/core/eswitch_offloads.c     | 20 -------------------
 1 file changed, 20 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index ccda3a0a2594..d3da52e3fc67 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -2177,10 +2177,8 @@ static int esw_create_restore_table(struct mlx5_eswitch *esw)
 static int esw_offloads_start(struct mlx5_eswitch *esw,
 			      struct netlink_ext_ack *extack)
 {
-	struct devlink *devlink = priv_to_devlink(esw->dev);
 	int err, err1;
 
-	devl_lock(devlink);
 	esw->mode = MLX5_ESWITCH_OFFLOADS;
 	err = mlx5_eswitch_enable_locked(esw, esw->dev->priv.sriov.num_vfs);
 	if (err) {
@@ -2202,7 +2200,6 @@ static int esw_offloads_start(struct mlx5_eswitch *esw,
 					   "Inline mode is different between vports");
 		}
 	}
-	devl_unlock(devlink);
 	return err;
 }
 
@@ -3243,10 +3240,8 @@ int esw_offloads_enable(struct mlx5_eswitch *esw)
 static int esw_offloads_stop(struct mlx5_eswitch *esw,
 			     struct netlink_ext_ack *extack)
 {
-	struct devlink *devlink = priv_to_devlink(esw->dev);
 	int err, err1;
 
-	devl_lock(devlink);
 	esw->mode = MLX5_ESWITCH_LEGACY;
 	err = mlx5_eswitch_enable_locked(esw, MLX5_ESWITCH_IGNORE_NUM_VFS);
 	if (err) {
@@ -3258,7 +3253,6 @@ static int esw_offloads_stop(struct mlx5_eswitch *esw,
 					   "Failed setting eswitch back to offloads");
 		}
 	}
-	devl_unlock(devlink);
 
 	return err;
 }
@@ -3366,15 +3360,6 @@ int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
 	if (esw_mode_from_devlink(mode, &mlx5_mode))
 		return -EINVAL;
 
-	/* FIXME: devl_unlock() followed by devl_lock() inside driver callback
-	 * is never correct and prone to races. It's a transitional workaround,
-	 * never repeat this pattern.
-	 *
-	 * This code MUST be fixed before removing devlink_mutex as it is safe
-	 * to do only because of that mutex.
-	 */
-	devl_unlock(devlink);
-
 	mlx5_lag_disable_change(esw->dev);
 	err = mlx5_esw_try_lock(esw);
 	if (err < 0) {
@@ -3387,9 +3372,7 @@ int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
 	if (cur_mlx5_mode == mlx5_mode)
 		goto unlock;
 
-	devl_lock(devlink);
 	mlx5_eswitch_disable_locked(esw);
-	devl_unlock(devlink);
 	if (mode == DEVLINK_ESWITCH_MODE_SWITCHDEV) {
 		if (mlx5_devlink_trap_get_num_active(esw->dev)) {
 			NL_SET_ERR_MSG_MOD(extack,
@@ -3400,9 +3383,7 @@ int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
 		err = esw_offloads_start(esw, extack);
 	} else if (mode == DEVLINK_ESWITCH_MODE_LEGACY) {
 		err = esw_offloads_stop(esw, extack);
-		devl_lock(devlink);
 		mlx5_rescan_drivers(esw->dev);
-		devl_unlock(devlink);
 	} else {
 		err = -EINVAL;
 	}
@@ -3411,7 +3392,6 @@ int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
 	mlx5_esw_unlock(esw);
 enable_lag:
 	mlx5_lag_enable_change(esw->dev);
-	devl_lock(devlink);
 	return err;
 }
 
-- 
2.36.1



* [net-next 09/15] devlink: Hold the instance lock in port_new / port_del callbacks
From: Saeed Mahameed @ 2022-07-06 23:24 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Moshe Shemesh, Leon Romanovsky

From: Moshe Shemesh <moshe@nvidia.com>

Let the core take the devlink instance lock around the port_new and
port_del callbacks, and remove the now-redundant locking in the only
driver that currently uses them.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c | 4 ----
 net/core/devlink.c                                   | 6 +-----
 2 files changed, 1 insertion(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c
index 2068c22ff367..7d955a4d9f14 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c
@@ -355,9 +355,7 @@ int mlx5_devlink_sf_port_new(struct devlink *devlink,
 				   "Port add is only supported in eswitch switchdev mode or SF ports are disabled.");
 		return -EOPNOTSUPP;
 	}
-	devl_lock(devlink);
 	err = mlx5_sf_add(dev, table, new_attr, extack, new_port_index);
-	devl_unlock(devlink);
 	mlx5_sf_table_put(table);
 	return err;
 }
@@ -402,9 +400,7 @@ int mlx5_devlink_sf_port_del(struct devlink *devlink, unsigned int port_index,
 		goto sf_err;
 	}
 
-	devl_lock(devlink);
 	mlx5_esw_offloads_sf_vport_disable(esw, sf->hw_fn_id);
-	devl_unlock(devlink);
 	mlx5_sf_id_erase(table, sf);
 
 	mutex_lock(&table->sf_state_lock);
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 970e5c2a52bd..e206cc90bec5 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -1712,7 +1712,7 @@ static int devlink_port_new_notifiy(struct devlink *devlink,
 	if (!msg)
 		return -ENOMEM;
 
-	mutex_lock(&devlink->lock);
+	lockdep_assert_held(&devlink->lock);
 	devlink_port = devlink_port_get_by_index(devlink, port_index);
 	if (!devlink_port) {
 		err = -ENODEV;
@@ -1725,11 +1725,9 @@ static int devlink_port_new_notifiy(struct devlink *devlink,
 		goto out;
 
 	err = genlmsg_reply(msg, info);
-	mutex_unlock(&devlink->lock);
 	return err;
 
 out:
-	mutex_unlock(&devlink->lock);
 	nlmsg_free(msg);
 	return err;
 }
@@ -9067,13 +9065,11 @@ static const struct genl_small_ops devlink_nl_ops[] = {
 		.cmd = DEVLINK_CMD_PORT_NEW,
 		.doit = devlink_nl_cmd_port_new_doit,
 		.flags = GENL_ADMIN_PERM,
-		.internal_flags = DEVLINK_NL_FLAG_NO_LOCK,
 	},
 	{
 		.cmd = DEVLINK_CMD_PORT_DEL,
 		.doit = devlink_nl_cmd_port_del_doit,
 		.flags = GENL_ADMIN_PERM,
-		.internal_flags = DEVLINK_NL_FLAG_NO_LOCK,
 	},
 	{
 		.cmd = DEVLINK_CMD_LINECARD_GET,
-- 
2.36.1



* [net-next 10/15] net/tls: Perform immediate device ctx cleanup when possible
From: Saeed Mahameed @ 2022-07-06 23:24 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Maxim Mikityanskiy

From: Tariq Toukan <tariqt@nvidia.com>

The TLS context destructor can run in atomic context. Cleanup
operations for device-offloaded contexts may require access to, and
interaction with, the device callbacks, which might sleep. Hence, the
cleanup of such contexts must be deferred and completed inside an
async work.

For all others this is not necessary, as their cleanup is atomic.
Invoke cleanup immediately for them, avoiding the queueing of
redundant gc work.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 net/tls/tls_device.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index ec6f4b699a2b..2c004ce46887 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -95,16 +95,24 @@ static void tls_device_gc_task(struct work_struct *work)
 static void tls_device_queue_ctx_destruction(struct tls_context *ctx)
 {
 	unsigned long flags;
+	bool async_cleanup;
 
 	spin_lock_irqsave(&tls_device_lock, flags);
-	list_move_tail(&ctx->list, &tls_device_gc_list);
-
-	/* schedule_work inside the spinlock
-	 * to make sure tls_device_down waits for that work.
-	 */
-	schedule_work(&tls_device_gc_work);
+	async_cleanup = ctx->netdev && ctx->tx_conf == TLS_HW;
+	if (async_cleanup) {
+		list_move_tail(&ctx->list, &tls_device_gc_list);
 
+		/* schedule_work inside the spinlock
+		 * to make sure tls_device_down waits for that work.
+		 */
+		schedule_work(&tls_device_gc_work);
+	} else {
+		list_del(&ctx->list);
+	}
 	spin_unlock_irqrestore(&tls_device_lock, flags);
+
+	if (!async_cleanup)
+		tls_device_free_ctx(ctx);
 }
 
 /* We assume that the socket is already connected */
-- 
2.36.1



* [net-next 11/15] net/tls: Multi-threaded calls to TX tls_dev_del
From: Saeed Mahameed @ 2022-07-06 23:24 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Maxim Mikityanskiy

From: Tariq Toukan <tariqt@nvidia.com>

Multiple TLS device-offloaded contexts can be added in parallel via
concurrent calls to .tls_dev_add, while calls to .tls_dev_del are
serialized in tls_device_gc_task.

This behavior is not sustainable: it creates a rate gap between the
add and del operations (the addition rate outperforms the deletion
rate). When running for long enough, the TLS device resources get
exhausted, and offloading new connections fails.

Replace the single-threaded garbage collector work with a per-context
alternative, so destructions can be handled on several cores in
parallel (see the condensed sketch below).
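
In condensed form (taken from the diff below), each TX offload
context embeds its own work item, so destruction no longer funnels
through a single global gc list and work:

	struct destruct_work {
		struct work_struct work;
		struct tls_context *ctx;
	};

	/* At offload setup time: */
	INIT_WORK(&offload_ctx->destruct_work.work, tls_device_tx_del_task);
	offload_ctx->destruct_work.ctx = ctx;

	/* At destruction time: one work per context, run in parallel. */
	schedule_work(&offload_ctx->destruct_work.work);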

Tested with mlx5 device:
Before: 22141 add/sec,   103 del/sec
After:  11684 add/sec, 11684 del/sec

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 include/net/tls.h    |  6 +++++
 net/tls/tls_device.c | 56 ++++++++++++++------------------------------
 2 files changed, 24 insertions(+), 38 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 4fc16ca5f469..c4be74635502 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -163,6 +163,11 @@ struct tls_record_info {
 	skb_frag_t frags[MAX_SKB_FRAGS];
 };
 
+struct destruct_work {
+	struct work_struct work;
+	struct tls_context *ctx;
+};
+
 struct tls_offload_context_tx {
 	struct crypto_aead *aead_send;
 	spinlock_t lock;	/* protects records list */
@@ -174,6 +179,7 @@ struct tls_offload_context_tx {
 
 	struct scatterlist sg_tx_data[MAX_SKB_FRAGS];
 	void (*sk_destruct)(struct sock *sk);
+	struct destruct_work destruct_work;
 	u8 driver_state[] __aligned(8);
 	/* The TLS layer reserves room for driver specific state
 	 * Currently the belief is that there is not enough
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index 2c004ce46887..87401852e565 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -45,10 +45,6 @@
  */
 static DECLARE_RWSEM(device_offload_lock);
 
-static void tls_device_gc_task(struct work_struct *work);
-
-static DECLARE_WORK(tls_device_gc_work, tls_device_gc_task);
-static LIST_HEAD(tls_device_gc_list);
 static LIST_HEAD(tls_device_list);
 static LIST_HEAD(tls_device_down_list);
 static DEFINE_SPINLOCK(tls_device_lock);
@@ -67,29 +63,17 @@ static void tls_device_free_ctx(struct tls_context *ctx)
 	tls_ctx_free(NULL, ctx);
 }
 
-static void tls_device_gc_task(struct work_struct *work)
+static void tls_device_tx_del_task(struct work_struct *work)
 {
-	struct tls_context *ctx, *tmp;
-	unsigned long flags;
-	LIST_HEAD(gc_list);
-
-	spin_lock_irqsave(&tls_device_lock, flags);
-	list_splice_init(&tls_device_gc_list, &gc_list);
-	spin_unlock_irqrestore(&tls_device_lock, flags);
+	struct destruct_work *destruct_work =
+		container_of(work, struct destruct_work, work);
+	struct tls_context *ctx = destruct_work->ctx;
+	struct net_device *netdev = ctx->netdev;
 
-	list_for_each_entry_safe(ctx, tmp, &gc_list, list) {
-		struct net_device *netdev = ctx->netdev;
-
-		if (netdev && ctx->tx_conf == TLS_HW) {
-			netdev->tlsdev_ops->tls_dev_del(netdev, ctx,
-							TLS_OFFLOAD_CTX_DIR_TX);
-			dev_put(netdev);
-			ctx->netdev = NULL;
-		}
-
-		list_del(&ctx->list);
-		tls_device_free_ctx(ctx);
-	}
+	netdev->tlsdev_ops->tls_dev_del(netdev, ctx, TLS_OFFLOAD_CTX_DIR_TX);
+	dev_put(netdev);
+	ctx->netdev = NULL;
+	tls_device_free_ctx(ctx);
 }
 
 static void tls_device_queue_ctx_destruction(struct tls_context *ctx)
@@ -98,21 +82,17 @@ static void tls_device_queue_ctx_destruction(struct tls_context *ctx)
 	bool async_cleanup;
 
 	spin_lock_irqsave(&tls_device_lock, flags);
+	list_del(&ctx->list); /* Remove from tls_device_list / tls_device_down_list */
+	spin_unlock_irqrestore(&tls_device_lock, flags);
+
 	async_cleanup = ctx->netdev && ctx->tx_conf == TLS_HW;
 	if (async_cleanup) {
-		list_move_tail(&ctx->list, &tls_device_gc_list);
+		struct tls_offload_context_tx *offload_ctx = tls_offload_ctx_tx(ctx);
 
-		/* schedule_work inside the spinlock
-		 * to make sure tls_device_down waits for that work.
-		 */
-		schedule_work(&tls_device_gc_work);
+		schedule_work(&offload_ctx->destruct_work.work);
 	} else {
-		list_del(&ctx->list);
-	}
-	spin_unlock_irqrestore(&tls_device_lock, flags);
-
-	if (!async_cleanup)
 		tls_device_free_ctx(ctx);
+	}
 }
 
 /* We assume that the socket is already connected */
@@ -1149,6 +1129,9 @@ int tls_set_device_offload(struct sock *sk, struct tls_context *ctx)
 	start_marker_record->len = 0;
 	start_marker_record->num_frags = 0;
 
+	INIT_WORK(&offload_ctx->destruct_work.work, tls_device_tx_del_task);
+	offload_ctx->destruct_work.ctx = ctx;
+
 	INIT_LIST_HEAD(&offload_ctx->records_list);
 	list_add_tail(&start_marker_record->list, &offload_ctx->records_list);
 	spin_lock_init(&offload_ctx->lock);
@@ -1388,8 +1371,6 @@ static int tls_device_down(struct net_device *netdev)
 
 	up_write(&device_offload_lock);
 
-	flush_work(&tls_device_gc_work);
-
 	return NOTIFY_DONE;
 }
 
@@ -1435,6 +1416,5 @@ void __init tls_device_init(void)
 void __exit tls_device_cleanup(void)
 {
 	unregister_netdevice_notifier(&tls_dev_notifier);
-	flush_work(&tls_device_gc_work);
 	clean_acked_data_flush();
 }
-- 
2.36.1

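For reference, a minimal sketch of the deferred-cleanup pattern the
patch above moves to: the work item is embedded in the per-connection
object, and the handler recovers its owner via container_of(). The
foo_* names are illustrative only, not the actual tls_device code.

        #include <linux/slab.h>
        #include <linux/workqueue.h>

        struct foo_ctx {
                struct work_struct destruct_work;
                /* ... per-connection offload state ... */
        };

        static void foo_destruct(struct work_struct *work)
        {
                /* recover the owning object from the embedded work item */
                struct foo_ctx *ctx =
                        container_of(work, struct foo_ctx, destruct_work);

                /* safe to sleep here: talk to the device, then free */
                kfree(ctx);
        }

        static struct foo_ctx *foo_ctx_alloc(void)
        {
                struct foo_ctx *ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);

                if (ctx)
                        INIT_WORK(&ctx->destruct_work, foo_destruct);
                return ctx;
        }

        /* callable from atomic context: defer the sleeping cleanup */
        static void foo_ctx_destroy(struct foo_ctx *ctx)
        {
                schedule_work(&ctx->destruct_work);
        }

Each context gets its own work item, which is what lets the
tls_dev_del() calls run concurrently instead of being serialized
behind one global gc list.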

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [net-next 12/15] net/mlx5e: kTLS, Introduce TLS-specific create TIS
  2022-07-06 23:24 [pull request][net-next 00/15] mlx5 updates 2022-07-06 Saeed Mahameed
                   ` (10 preceding siblings ...)
  2022-07-06 23:24 ` [net-next 11/15] net/tls: Multi-threaded calls to TX tls_dev_del Saeed Mahameed
@ 2022-07-06 23:24 ` Saeed Mahameed
  2022-07-06 23:24 ` [net-next 13/15] net/mlx5e: kTLS, Take stats out of OOO handler Saeed Mahameed
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: Saeed Mahameed @ 2022-07-06 23:24 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman

From: Tariq Toukan <tariqt@nvidia.com>

TLS TIS objects have a defined role in mapping and reaching the HW TLS
contexts.  Some standard TIS attributes (like LAG port affinity) are
not relevant for them.

Use a dedicated TLS TIS create function instead of the generic
mlx5e_create_tis.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
index cc5cb3010e64..2cd0437666d2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
@@ -39,16 +39,20 @@ u16 mlx5e_ktls_get_stop_room(struct mlx5_core_dev *mdev, struct mlx5e_params *pa
 	return stop_room;
 }
 
+static void mlx5e_ktls_set_tisc(struct mlx5_core_dev *mdev, void *tisc)
+{
+	MLX5_SET(tisc, tisc, tls_en, 1);
+	MLX5_SET(tisc, tisc, pd, mdev->mlx5e_res.hw_objs.pdn);
+	MLX5_SET(tisc, tisc, transport_domain, mdev->mlx5e_res.hw_objs.td.tdn);
+}
+
 static int mlx5e_ktls_create_tis(struct mlx5_core_dev *mdev, u32 *tisn)
 {
 	u32 in[MLX5_ST_SZ_DW(create_tis_in)] = {};
-	void *tisc;
-
-	tisc = MLX5_ADDR_OF(create_tis_in, in, ctx);
 
-	MLX5_SET(tisc, tisc, tls_en, 1);
+	mlx5e_ktls_set_tisc(mdev, MLX5_ADDR_OF(create_tis_in, in, ctx));
 
-	return mlx5e_create_tis(mdev, in, tisn);
+	return mlx5_core_create_tis(mdev, in, tisn);
 }
 
 struct mlx5e_ktls_offload_context_tx {
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [net-next 13/15] net/mlx5e: kTLS, Take stats out of OOO handler
  2022-07-06 23:24 [pull request][net-next 00/15] mlx5 updates 2022-07-06 Saeed Mahameed
                   ` (11 preceding siblings ...)
  2022-07-06 23:24 ` [net-next 12/15] net/mlx5e: kTLS, Introduce TLS-specific create TIS Saeed Mahameed
@ 2022-07-06 23:24 ` Saeed Mahameed
  2022-07-06 23:24 ` [net-next 14/15] net/mlx5e: kTLS, Recycle objects of device-offloaded TLS TX connections Saeed Mahameed
  2022-07-06 23:24 ` [net-next 15/15] net/mlx5e: kTLS, Dynamically re-size TX recycling pool Saeed Mahameed
  14 siblings, 0 replies; 25+ messages in thread
From: Saeed Mahameed @ 2022-07-06 23:24 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman

From: Tariq Toukan <tariqt@nvidia.com>

Let the caller of mlx5e_ktls_tx_handle_ooo() take care of updating the
stats according to the returned value.  As the switch/case blocks are
already there, this change saves unnecessary branches in the handler.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../mellanox/mlx5/core/en_accel/ktls_tx.c     | 27 ++++++++-----------
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
index 2cd0437666d2..99e1cd015083 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
@@ -382,26 +382,17 @@ mlx5e_ktls_tx_handle_ooo(struct mlx5e_ktls_offload_context_tx *priv_tx,
 			 int datalen,
 			 u32 seq)
 {
-	struct mlx5e_sq_stats *stats = sq->stats;
 	enum mlx5e_ktls_sync_retval ret;
 	struct tx_sync_info info = {};
-	int i = 0;
+	int i;
 
 	ret = tx_sync_info_get(priv_tx, seq, datalen, &info);
-	if (unlikely(ret != MLX5E_KTLS_SYNC_DONE)) {
-		if (ret == MLX5E_KTLS_SYNC_SKIP_NO_DATA) {
-			stats->tls_skip_no_sync_data++;
-			return MLX5E_KTLS_SYNC_SKIP_NO_DATA;
-		}
-		/* We might get here if a retransmission reaches the driver
-		 * after the relevant record is acked.
+	if (unlikely(ret != MLX5E_KTLS_SYNC_DONE))
+		/* We might get here with ret == FAIL if a retransmission
+		 * reaches the driver after the relevant record is acked.
 		 * It should be safe to drop the packet in this case
 		 */
-		stats->tls_drop_no_sync_data++;
-		goto err_out;
-	}
-
-	stats->tls_ooo++;
+		return ret;
 
 	tx_post_resync_params(sq, priv_tx, info.rcd_sn);
 
@@ -413,7 +404,7 @@ mlx5e_ktls_tx_handle_ooo(struct mlx5e_ktls_offload_context_tx *priv_tx,
 		return MLX5E_KTLS_SYNC_DONE;
 	}
 
-	for (; i < info.nr_frags; i++) {
+	for (i = 0; i < info.nr_frags; i++) {
 		unsigned int orig_fsz, frag_offset = 0, n = 0;
 		skb_frag_t *f = &info.frags[i];
 
@@ -483,15 +474,19 @@ bool mlx5e_ktls_handle_tx_skb(struct net_device *netdev, struct mlx5e_txqsq *sq,
 		enum mlx5e_ktls_sync_retval ret =
 			mlx5e_ktls_tx_handle_ooo(priv_tx, sq, datalen, seq);
 
+		stats->tls_ooo++;
+
 		switch (ret) {
 		case MLX5E_KTLS_SYNC_DONE:
 			break;
 		case MLX5E_KTLS_SYNC_SKIP_NO_DATA:
+			stats->tls_skip_no_sync_data++;
 			if (likely(!skb->decrypted))
 				goto out;
 			WARN_ON_ONCE(1);
-			fallthrough;
+			goto err_out;
 		case MLX5E_KTLS_SYNC_FAIL:
+			stats->tls_drop_no_sync_data++;
 			goto err_out;
 		}
 	}
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [net-next 14/15] net/mlx5e: kTLS, Recycle objects of device-offloaded TLS TX connections
  2022-07-06 23:24 [pull request][net-next 00/15] mlx5 updates 2022-07-06 Saeed Mahameed
                   ` (12 preceding siblings ...)
  2022-07-06 23:24 ` [net-next 13/15] net/mlx5e: kTLS, Take stats out of OOO handler Saeed Mahameed
@ 2022-07-06 23:24 ` Saeed Mahameed
  2022-07-06 23:24 ` [net-next 15/15] net/mlx5e: kTLS, Dynamically re-size TX recycling pool Saeed Mahameed
  14 siblings, 0 replies; 25+ messages in thread
From: Saeed Mahameed @ 2022-07-06 23:24 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman

From: Tariq Toukan <tariqt@nvidia.com>

The transport interface send (TIS) object is responsible for performing
all transport related operations of the transmit side.  The ConnectX HW
uses a TIS object to save and access the TLS crypto information and state
of an offloaded TX kTLS connection.

Before this patch, we used to create a new TIS per connection and destroy
it once it’s closed. Every create and destroy of a TIS is a FW command.

Same applies for the private TLS context, where we used to dynamically
allocate and free it per connection.

Resource recycling reduces the impact of the allocation/free operations
and helps speed up the connection rate.

In this feature we maintain a pool of TX objects and use it to recycle
the resources instead of re-creating them per connection.

A cached TIS popped from the pool is updated to serve the new connection
via the fast-path HW interface, by updating the TLS static and progress
params. This is a very fast operation, significantly faster than FW
commands.

On recycling, a WQE fence is required after the context params change.
This guarantees that the data is sent after the context has been
successfully updated in hardware, and that the context modification
doesn't interfere with existing traffic.
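
At its core, the pool below is a mutex-protected free list. A condensed
sketch follows; tx_obj and tx_obj_create() are illustrative stand-ins
for mlx5e_ktls_offload_context_tx and its FW-command-based constructor,
and error handling is trimmed.

        #include <linux/list.h>
        #include <linux/mutex.h>

        struct tx_obj {
                struct list_head list_node;
                /* TIS and DEK handles would live here */
        };

        /* stand-in for the slow path (FW commands), definition omitted */
        static struct tx_obj *tx_obj_create(void);

        struct obj_pool {
                struct mutex lock;      /* protects list and size */
                struct list_head list;
                size_t size;
        };

        static struct tx_obj *pool_pop_sketch(struct obj_pool *pool)
        {
                struct tx_obj *obj;

                mutex_lock(&pool->lock);
                if (!pool->size) {
                        mutex_unlock(&pool->lock);
                        return tx_obj_create();
                }
                obj = list_first_entry(&pool->list, struct tx_obj, list_node);
                list_del(&obj->list_node);
                pool->size--;
                mutex_unlock(&pool->lock);
                return obj;
        }

        static void pool_push_sketch(struct obj_pool *pool, struct tx_obj *obj)
        {
                mutex_lock(&pool->lock);
                list_add(&obj->list_node, &pool->list); /* keep for next conn */
                pool->size++;
                mutex_unlock(&pool->lock);
        }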

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../mellanox/mlx5/core/en_accel/en_accel.h    |  10 +
 .../mellanox/mlx5/core/en_accel/ktls.h        |  14 ++
 .../mellanox/mlx5/core/en_accel/ktls_stats.c  |   2 +
 .../mellanox/mlx5/core/en_accel/ktls_tx.c     | 211 ++++++++++++++----
 .../net/ethernet/mellanox/mlx5/core/en_main.c |   9 +
 5 files changed, 199 insertions(+), 47 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h
index 04c0a5e1c89a..1839f1ab1ddd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h
@@ -194,4 +194,14 @@ static inline void mlx5e_accel_cleanup_rx(struct mlx5e_priv *priv)
 {
 	mlx5e_ktls_cleanup_rx(priv);
 }
+
+static inline int mlx5e_accel_init_tx(struct mlx5e_priv *priv)
+{
+	return mlx5e_ktls_init_tx(priv);
+}
+
+static inline void mlx5e_accel_cleanup_tx(struct mlx5e_priv *priv)
+{
+	mlx5e_ktls_cleanup_tx(priv);
+}
 #endif /* __MLX5E_EN_ACCEL_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.h
index d016624fbc9d..948400dee525 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.h
@@ -42,6 +42,8 @@ static inline bool mlx5e_ktls_type_check(struct mlx5_core_dev *mdev,
 }
 
 void mlx5e_ktls_build_netdev(struct mlx5e_priv *priv);
+int mlx5e_ktls_init_tx(struct mlx5e_priv *priv);
+void mlx5e_ktls_cleanup_tx(struct mlx5e_priv *priv);
 int mlx5e_ktls_init_rx(struct mlx5e_priv *priv);
 void mlx5e_ktls_cleanup_rx(struct mlx5e_priv *priv);
 int mlx5e_ktls_set_feature_rx(struct net_device *netdev, bool enable);
@@ -62,6 +64,8 @@ static inline bool mlx5e_is_ktls_rx(struct mlx5_core_dev *mdev)
 struct mlx5e_tls_sw_stats {
 	atomic64_t tx_tls_ctx;
 	atomic64_t tx_tls_del;
+	atomic64_t tx_tls_pool_alloc;
+	atomic64_t tx_tls_pool_free;
 	atomic64_t rx_tls_ctx;
 	atomic64_t rx_tls_del;
 };
@@ -69,6 +73,7 @@ struct mlx5e_tls_sw_stats {
 struct mlx5e_tls {
 	struct mlx5e_tls_sw_stats sw_stats;
 	struct workqueue_struct *rx_wq;
+	struct mlx5e_tls_tx_pool *tx_pool;
 };
 
 int mlx5e_ktls_init(struct mlx5e_priv *priv);
@@ -83,6 +88,15 @@ static inline void mlx5e_ktls_build_netdev(struct mlx5e_priv *priv)
 {
 }
 
+static inline int mlx5e_ktls_init_tx(struct mlx5e_priv *priv)
+{
+	return 0;
+}
+
+static inline void mlx5e_ktls_cleanup_tx(struct mlx5e_priv *priv)
+{
+}
+
 static inline int mlx5e_ktls_init_rx(struct mlx5e_priv *priv)
 {
 	return 0;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_stats.c
index 2ab46c4247ff..7c1c0eb16787 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_stats.c
@@ -41,6 +41,8 @@
 static const struct counter_desc mlx5e_ktls_sw_stats_desc[] = {
 	{ MLX5E_DECLARE_STAT(struct mlx5e_tls_sw_stats, tx_tls_ctx) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_tls_sw_stats, tx_tls_del) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_tls_sw_stats, tx_tls_pool_alloc) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_tls_sw_stats, tx_tls_pool_free) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_tls_sw_stats, rx_tls_ctx) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_tls_sw_stats, rx_tls_del) },
 };
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
index 99e1cd015083..24d1288e906a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
@@ -35,6 +35,7 @@ u16 mlx5e_ktls_get_stop_room(struct mlx5_core_dev *mdev, struct mlx5e_params *pa
 	stop_room += mlx5e_stop_room_for_wqe(mdev, MLX5E_TLS_SET_STATIC_PARAMS_WQEBBS);
 	stop_room += mlx5e_stop_room_for_wqe(mdev, MLX5E_TLS_SET_PROGRESS_PARAMS_WQEBBS);
 	stop_room += num_dumps * mlx5e_stop_room_for_wqe(mdev, MLX5E_KTLS_DUMP_WQEBBS);
+	stop_room += 1; /* fence nop */
 
 	return stop_room;
 }
@@ -56,13 +57,17 @@ static int mlx5e_ktls_create_tis(struct mlx5_core_dev *mdev, u32 *tisn)
 }
 
 struct mlx5e_ktls_offload_context_tx {
-	struct tls_offload_context_tx *tx_ctx;
-	struct tls12_crypto_info_aes_gcm_128 crypto_info;
-	struct mlx5e_tls_sw_stats *sw_stats;
+	/* fast path */
 	u32 expected_seq;
 	u32 tisn;
-	u32 key_id;
 	bool ctx_post_pending;
+	/* control / resync */
+	struct list_head list_node; /* member of the pool */
+	struct tls12_crypto_info_aes_gcm_128 crypto_info;
+	struct tls_offload_context_tx *tx_ctx;
+	struct mlx5_core_dev *mdev;
+	struct mlx5e_tls_sw_stats *sw_stats;
+	u32 key_id;
 };
 
 static void
@@ -87,28 +92,136 @@ mlx5e_get_ktls_tx_priv_ctx(struct tls_context *tls_ctx)
 	return *ctx;
 }
 
+static struct mlx5e_ktls_offload_context_tx *
+mlx5e_tls_priv_tx_init(struct mlx5_core_dev *mdev, struct mlx5e_tls_sw_stats *sw_stats)
+{
+	struct mlx5e_ktls_offload_context_tx *priv_tx;
+	int err;
+
+	priv_tx = kzalloc(sizeof(*priv_tx), GFP_KERNEL);
+	if (!priv_tx)
+		return ERR_PTR(-ENOMEM);
+
+	priv_tx->mdev = mdev;
+	priv_tx->sw_stats = sw_stats;
+
+	err = mlx5e_ktls_create_tis(mdev, &priv_tx->tisn);
+	if (err) {
+		kfree(priv_tx);
+		return ERR_PTR(err);
+	}
+
+	return priv_tx;
+}
+
+static void mlx5e_tls_priv_tx_cleanup(struct mlx5e_ktls_offload_context_tx *priv_tx)
+{
+	mlx5e_destroy_tis(priv_tx->mdev, priv_tx->tisn);
+	kfree(priv_tx);
+}
+
+static void mlx5e_tls_priv_tx_list_cleanup(struct list_head *list)
+{
+	struct mlx5e_ktls_offload_context_tx *obj;
+
+	list_for_each_entry(obj, list, list_node)
+		mlx5e_tls_priv_tx_cleanup(obj);
+}
+
+/* Recycling pool API */
+
+struct mlx5e_tls_tx_pool {
+	struct mlx5_core_dev *mdev;
+	struct mlx5e_tls_sw_stats *sw_stats;
+	struct mutex lock; /* Protects access to the pool */
+	struct list_head list;
+#define MLX5E_TLS_TX_POOL_MAX_SIZE (256)
+	size_t size;
+};
+
+static struct mlx5e_tls_tx_pool *mlx5e_tls_tx_pool_init(struct mlx5_core_dev *mdev,
+							struct mlx5e_tls_sw_stats *sw_stats)
+{
+	struct mlx5e_tls_tx_pool *pool;
+
+	pool = kvzalloc(sizeof(*pool), GFP_KERNEL);
+	if (!pool)
+		return NULL;
+
+	INIT_LIST_HEAD(&pool->list);
+	mutex_init(&pool->lock);
+
+	pool->mdev = mdev;
+	pool->sw_stats = sw_stats;
+
+	return pool;
+}
+
+static void mlx5e_tls_tx_pool_cleanup(struct mlx5e_tls_tx_pool *pool)
+{
+	mlx5e_tls_priv_tx_list_cleanup(&pool->list);
+	atomic64_add(pool->size, &pool->sw_stats->tx_tls_pool_free);
+	kvfree(pool);
+}
+
+static void pool_push(struct mlx5e_tls_tx_pool *pool, struct mlx5e_ktls_offload_context_tx *obj)
+{
+	mutex_lock(&pool->lock);
+	if (pool->size >= MLX5E_TLS_TX_POOL_MAX_SIZE) {
+		mutex_unlock(&pool->lock);
+		mlx5e_tls_priv_tx_cleanup(obj);
+		atomic64_inc(&pool->sw_stats->tx_tls_pool_free);
+		return;
+	}
+	list_add(&obj->list_node, &pool->list);
+	pool->size++;
+	mutex_unlock(&pool->lock);
+}
+
+static struct mlx5e_ktls_offload_context_tx *pool_pop(struct mlx5e_tls_tx_pool *pool)
+{
+	struct mlx5e_ktls_offload_context_tx *obj;
+
+	mutex_lock(&pool->lock);
+	if (pool->size == 0) {
+		obj = mlx5e_tls_priv_tx_init(pool->mdev, pool->sw_stats);
+		if (!IS_ERR(obj))
+			atomic64_inc(&pool->sw_stats->tx_tls_pool_alloc);
+		goto out;
+	}
+
+	obj = list_first_entry(&pool->list, struct mlx5e_ktls_offload_context_tx,
+			       list_node);
+	list_del(&obj->list_node);
+	pool->size--;
+out:
+	mutex_unlock(&pool->lock);
+	return obj;
+}
+
+/* End of pool API */
+
 int mlx5e_ktls_add_tx(struct net_device *netdev, struct sock *sk,
 		      struct tls_crypto_info *crypto_info, u32 start_offload_tcp_sn)
 {
 	struct mlx5e_ktls_offload_context_tx *priv_tx;
+	struct mlx5e_tls_tx_pool *pool;
 	struct tls_context *tls_ctx;
-	struct mlx5_core_dev *mdev;
 	struct mlx5e_priv *priv;
 	int err;
 
 	tls_ctx = tls_get_ctx(sk);
 	priv = netdev_priv(netdev);
-	mdev = priv->mdev;
+	pool = priv->tls->tx_pool;
 
-	priv_tx = kzalloc(sizeof(*priv_tx), GFP_KERNEL);
-	if (!priv_tx)
-		return -ENOMEM;
+	priv_tx = pool_pop(pool);
+	if (IS_ERR(priv_tx))
+		return PTR_ERR(priv_tx);
 
-	err = mlx5_ktls_create_key(mdev, crypto_info, &priv_tx->key_id);
+	err = mlx5_ktls_create_key(pool->mdev, crypto_info, &priv_tx->key_id);
 	if (err)
 		goto err_create_key;
 
-	priv_tx->sw_stats = &priv->tls->sw_stats;
 	priv_tx->expected_seq = start_offload_tcp_sn;
 	priv_tx->crypto_info  =
 		*(struct tls12_crypto_info_aes_gcm_128 *)crypto_info;
@@ -116,36 +229,29 @@ int mlx5e_ktls_add_tx(struct net_device *netdev, struct sock *sk,
 
 	mlx5e_set_ktls_tx_priv_ctx(tls_ctx, priv_tx);
 
-	err = mlx5e_ktls_create_tis(mdev, &priv_tx->tisn);
-	if (err)
-		goto err_create_tis;
-
 	priv_tx->ctx_post_pending = true;
 	atomic64_inc(&priv_tx->sw_stats->tx_tls_ctx);
 
 	return 0;
 
-err_create_tis:
-	mlx5_ktls_destroy_key(mdev, priv_tx->key_id);
 err_create_key:
-	kfree(priv_tx);
+	pool_push(pool, priv_tx);
 	return err;
 }
 
 void mlx5e_ktls_del_tx(struct net_device *netdev, struct tls_context *tls_ctx)
 {
 	struct mlx5e_ktls_offload_context_tx *priv_tx;
-	struct mlx5_core_dev *mdev;
+	struct mlx5e_tls_tx_pool *pool;
 	struct mlx5e_priv *priv;
 
 	priv_tx = mlx5e_get_ktls_tx_priv_ctx(tls_ctx);
 	priv = netdev_priv(netdev);
-	mdev = priv->mdev;
+	pool = priv->tls->tx_pool;
 
 	atomic64_inc(&priv_tx->sw_stats->tx_tls_del);
-	mlx5e_destroy_tis(mdev, priv_tx->tisn);
-	mlx5_ktls_destroy_key(mdev, priv_tx->key_id);
-	kfree(priv_tx);
+	mlx5_ktls_destroy_key(priv_tx->mdev, priv_tx->key_id);
+	pool_push(pool, priv_tx);
 }
 
 static void tx_fill_wi(struct mlx5e_txqsq *sq,
@@ -206,6 +312,16 @@ post_progress_params(struct mlx5e_txqsq *sq,
 	sq->pc += num_wqebbs;
 }
 
+static void tx_post_fence_nop(struct mlx5e_txqsq *sq)
+{
+	struct mlx5_wq_cyc *wq = &sq->wq;
+	u16 pi = mlx5_wq_cyc_ctr2ix(wq, sq->pc);
+
+	tx_fill_wi(sq, pi, 1, 0, NULL);
+
+	mlx5e_post_nop_fence(wq, sq->sqn, &sq->pc);
+}
+
 static void
 mlx5e_ktls_tx_post_param_wqes(struct mlx5e_txqsq *sq,
 			      struct mlx5e_ktls_offload_context_tx *priv_tx,
@@ -217,6 +333,7 @@ mlx5e_ktls_tx_post_param_wqes(struct mlx5e_txqsq *sq,
 		post_static_params(sq, priv_tx, fence_first_post);
 
 	post_progress_params(sq, priv_tx, progress_fence);
+	tx_post_fence_nop(sq);
 }
 
 struct tx_sync_info {
@@ -309,7 +426,7 @@ tx_post_resync_params(struct mlx5e_txqsq *sq,
 }
 
 static int
-tx_post_resync_dump(struct mlx5e_txqsq *sq, skb_frag_t *frag, u32 tisn, bool first)
+tx_post_resync_dump(struct mlx5e_txqsq *sq, skb_frag_t *frag, u32 tisn)
 {
 	struct mlx5_wqe_ctrl_seg *cseg;
 	struct mlx5_wqe_data_seg *dseg;
@@ -331,7 +448,6 @@ tx_post_resync_dump(struct mlx5e_txqsq *sq, skb_frag_t *frag, u32 tisn, bool fir
 	cseg->opmod_idx_opcode = cpu_to_be32((sq->pc << 8)  | MLX5_OPCODE_DUMP);
 	cseg->qpn_ds           = cpu_to_be32((sq->sqn << 8) | ds_cnt);
 	cseg->tis_tir_num      = cpu_to_be32(tisn << 8);
-	cseg->fm_ce_se         = first ? MLX5_FENCE_MODE_INITIATOR_SMALL : 0;
 
 	fsz = skb_frag_size(frag);
 	dma_addr = skb_frag_dma_map(sq->pdev, frag, 0, fsz,
@@ -366,16 +482,6 @@ void mlx5e_ktls_tx_handle_resync_dump_comp(struct mlx5e_txqsq *sq,
 	stats->tls_dump_bytes += wi->num_bytes;
 }
 
-static void tx_post_fence_nop(struct mlx5e_txqsq *sq)
-{
-	struct mlx5_wq_cyc *wq = &sq->wq;
-	u16 pi = mlx5_wq_cyc_ctr2ix(wq, sq->pc);
-
-	tx_fill_wi(sq, pi, 1, 0, NULL);
-
-	mlx5e_post_nop_fence(wq, sq->sqn, &sq->pc);
-}
-
 static enum mlx5e_ktls_sync_retval
 mlx5e_ktls_tx_handle_ooo(struct mlx5e_ktls_offload_context_tx *priv_tx,
 			 struct mlx5e_txqsq *sq,
@@ -396,14 +502,6 @@ mlx5e_ktls_tx_handle_ooo(struct mlx5e_ktls_offload_context_tx *priv_tx,
 
 	tx_post_resync_params(sq, priv_tx, info.rcd_sn);
 
-	/* If no dump WQE was sent, we need to have a fence NOP WQE before the
-	 * actual data xmit.
-	 */
-	if (!info.nr_frags) {
-		tx_post_fence_nop(sq);
-		return MLX5E_KTLS_SYNC_DONE;
-	}
-
 	for (i = 0; i < info.nr_frags; i++) {
 		unsigned int orig_fsz, frag_offset = 0, n = 0;
 		skb_frag_t *f = &info.frags[i];
@@ -411,13 +509,12 @@ mlx5e_ktls_tx_handle_ooo(struct mlx5e_ktls_offload_context_tx *priv_tx,
 		orig_fsz = skb_frag_size(f);
 
 		do {
-			bool fence = !(i || frag_offset);
 			unsigned int fsz;
 
 			n++;
 			fsz = min_t(unsigned int, sq->hw_mtu, orig_fsz - frag_offset);
 			skb_frag_size_set(f, fsz);
-			if (tx_post_resync_dump(sq, f, priv_tx->tisn, fence)) {
+			if (tx_post_resync_dump(sq, f, priv_tx->tisn)) {
 				page_ref_add(skb_frag_page(f), n - 1);
 				goto err_out;
 			}
@@ -465,9 +562,8 @@ bool mlx5e_ktls_handle_tx_skb(struct net_device *netdev, struct mlx5e_txqsq *sq,
 
 	priv_tx = mlx5e_get_ktls_tx_priv_ctx(tls_ctx);
 
-	if (unlikely(mlx5e_ktls_tx_offload_test_and_clear_pending(priv_tx))) {
+	if (unlikely(mlx5e_ktls_tx_offload_test_and_clear_pending(priv_tx)))
 		mlx5e_ktls_tx_post_param_wqes(sq, priv_tx, false, false);
-	}
 
 	seq = ntohl(tcp_hdr(skb)->seq);
 	if (unlikely(priv_tx->expected_seq != seq)) {
@@ -505,3 +601,24 @@ bool mlx5e_ktls_handle_tx_skb(struct net_device *netdev, struct mlx5e_txqsq *sq,
 	dev_kfree_skb_any(skb);
 	return false;
 }
+
+int mlx5e_ktls_init_tx(struct mlx5e_priv *priv)
+{
+	if (!mlx5e_is_ktls_tx(priv->mdev))
+		return 0;
+
+	priv->tls->tx_pool = mlx5e_tls_tx_pool_init(priv->mdev, &priv->tls->sw_stats);
+	if (!priv->tls->tx_pool)
+		return -ENOMEM;
+
+	return 0;
+}
+
+void mlx5e_ktls_cleanup_tx(struct mlx5e_priv *priv)
+{
+	if (!mlx5e_is_ktls_tx(priv->mdev))
+		return;
+
+	mlx5e_tls_tx_pool_cleanup(priv->tls->tx_pool);
+	priv->tls->tx_pool = NULL;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 087952b84ccb..eb18cd459c7c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3135,6 +3135,7 @@ int mlx5e_create_tises(struct mlx5e_priv *priv)
 
 static void mlx5e_cleanup_nic_tx(struct mlx5e_priv *priv)
 {
+	mlx5e_accel_cleanup_tx(priv);
 	mlx5e_destroy_tises(priv);
 }
 
@@ -5120,8 +5121,16 @@ static int mlx5e_init_nic_tx(struct mlx5e_priv *priv)
 		return err;
 	}
 
+	err = mlx5e_accel_init_tx(priv);
+	if (err)
+		goto err_destroy_tises;
+
 	mlx5e_dcbnl_initialize(priv);
 	return 0;
+
+err_destroy_tises:
+	mlx5e_destroy_tises(priv);
+	return err;
 }
 
 static void mlx5e_nic_enable(struct mlx5e_priv *priv)
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [net-next 15/15] net/mlx5e: kTLS, Dynamically re-size TX recycling pool
  2022-07-06 23:24 [pull request][net-next 00/15] mlx5 updates 2022-07-06 Saeed Mahameed
                   ` (13 preceding siblings ...)
  2022-07-06 23:24 ` [net-next 14/15] net/mlx5e: kTLS, Recycle objects of device-offloaded TLS TX connections Saeed Mahameed
@ 2022-07-06 23:24 ` Saeed Mahameed
  14 siblings, 0 replies; 25+ messages in thread
From: Saeed Mahameed @ 2022-07-06 23:24 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan

From: Tariq Toukan <tariqt@nvidia.com>

Let the TLS TX recycle pool be more flexible in size, by continuously
and dynamically allocating and releasing HW resources in response to
changes in the connection rate and load.

Allocate and release pool entries in bulks (16). Use a workqueue to
allocate/release in the background. Allocate a new bulk when the pool
size drops below the low threshold (1K). The symmetric operation is
done when the pool size goes above the upper threshold (4K).

Every idle pool entry holds 1 TIS and 1 DEK (HW resources), in
addition to ~100 bytes of host memory.

Start with an empty pool, to avoid wasting memory and HW resources for
non-TLS users that have device-offloaded TLS enabled.

Upon a new request, if the pool is empty, do not wait for a whole bulk
allocation to complete.  Instead, trigger an instant allocation of a
single resource to reduce latency.
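
A sketch of the watermark behavior described above, with constants
matching the commit message (tx_obj/tx_obj_create() as in the sketch
on the previous patch; the pool field names here are illustrative):

        #define POOL_BULK       16
        #define POOL_HIGH       (4 * 1024)
        #define POOL_LOW        (POOL_HIGH / 4)

        struct tx_pool {
                struct mutex lock;      /* protects list and size */
                struct list_head list;
                size_t size;
                struct workqueue_struct *wq;
                struct work_struct create_work;  /* allocates a bulk */
                struct work_struct destroy_work; /* frees a bulk */
        };

        static struct tx_obj *pool_pop_sketch(struct tx_pool *pool)
        {
                struct tx_obj *obj;

                mutex_lock(&pool->lock);
                if (!pool->size) {
                        /* kick a background bulk refill, but don't
                         * wait for it: serve this request with a
                         * single blocking allocation instead
                         */
                        queue_work(pool->wq, &pool->create_work);
                        mutex_unlock(&pool->lock);
                        return tx_obj_create();
                }
                obj = list_first_entry(&pool->list, struct tx_obj, list_node);
                list_del(&obj->list_node);
                if (--pool->size == POOL_LOW)   /* running low: grow */
                        queue_work(pool->wq, &pool->create_work);
                mutex_unlock(&pool->lock);
                return obj;
        }

        static void pool_push_sketch(struct tx_pool *pool, struct tx_obj *obj)
        {
                mutex_lock(&pool->lock);
                list_add(&obj->list_node, &pool->list);
                if (++pool->size == POOL_HIGH)  /* too many idle: shrink */
                        queue_work(pool->wq, &pool->destroy_work);
                mutex_unlock(&pool->lock);
        }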

Performance tests:
Before: 11,684 CPS
After:  16,556 CPS

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../mellanox/mlx5/core/en_accel/ktls_tx.c     | 315 ++++++++++++++++--
 1 file changed, 289 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
index 24d1288e906a..fc8860012a18 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
@@ -56,6 +56,36 @@ static int mlx5e_ktls_create_tis(struct mlx5_core_dev *mdev, u32 *tisn)
 	return mlx5_core_create_tis(mdev, in, tisn);
 }
 
+static int mlx5e_ktls_create_tis_cb(struct mlx5_core_dev *mdev,
+				    struct mlx5_async_ctx *async_ctx,
+				    u32 *out, int outlen,
+				    mlx5_async_cbk_t callback,
+				    struct mlx5_async_work *context)
+{
+	u32 in[MLX5_ST_SZ_DW(create_tis_in)] = {};
+
+	mlx5e_ktls_set_tisc(mdev, MLX5_ADDR_OF(create_tis_in, in, ctx));
+	MLX5_SET(create_tis_in, in, opcode, MLX5_CMD_OP_CREATE_TIS);
+
+	return mlx5_cmd_exec_cb(async_ctx, in, sizeof(in),
+				out, outlen, callback, context);
+}
+
+static int mlx5e_ktls_destroy_tis_cb(struct mlx5_core_dev *mdev, u32 tisn,
+				     struct mlx5_async_ctx *async_ctx,
+				     u32 *out, int outlen,
+				     mlx5_async_cbk_t callback,
+				     struct mlx5_async_work *context)
+{
+	u32 in[MLX5_ST_SZ_DW(destroy_tis_in)] = {};
+
+	MLX5_SET(destroy_tis_in, in, opcode, MLX5_CMD_OP_DESTROY_TIS);
+	MLX5_SET(destroy_tis_in, in, tisn, tisn);
+
+	return mlx5_cmd_exec_cb(async_ctx, in, sizeof(in),
+				out, outlen, callback, context);
+}
+
 struct mlx5e_ktls_offload_context_tx {
 	/* fast path */
 	u32 expected_seq;
@@ -68,6 +98,7 @@ struct mlx5e_ktls_offload_context_tx {
 	struct mlx5_core_dev *mdev;
 	struct mlx5e_tls_sw_stats *sw_stats;
 	u32 key_id;
+	u8 create_err : 1;
 };
 
 static void
@@ -92,8 +123,81 @@ mlx5e_get_ktls_tx_priv_ctx(struct tls_context *tls_ctx)
 	return *ctx;
 }
 
+/* struct for callback API management */
+struct mlx5e_async_ctx {
+	struct mlx5_async_work context;
+	struct mlx5_async_ctx async_ctx;
+	struct work_struct work;
+	struct mlx5e_ktls_offload_context_tx *priv_tx;
+	struct completion complete;
+	int err;
+	union {
+		u32 out_create[MLX5_ST_SZ_DW(create_tis_out)];
+		u32 out_destroy[MLX5_ST_SZ_DW(destroy_tis_out)];
+	};
+};
+
+static struct mlx5e_async_ctx *mlx5e_bulk_async_init(struct mlx5_core_dev *mdev, int n)
+{
+	struct mlx5e_async_ctx *bulk_async;
+	int i;
+
+	bulk_async = kvcalloc(n, sizeof(struct mlx5e_async_ctx), GFP_KERNEL);
+	if (!bulk_async)
+		return NULL;
+
+	for (i = 0; i < n; i++) {
+		struct mlx5e_async_ctx *async = &bulk_async[i];
+
+		mlx5_cmd_init_async_ctx(mdev, &async->async_ctx);
+		init_completion(&async->complete);
+	}
+
+	return bulk_async;
+}
+
+static void mlx5e_bulk_async_cleanup(struct mlx5e_async_ctx *bulk_async, int n)
+{
+	int i;
+
+	for (i = 0; i < n; i++) {
+		struct mlx5e_async_ctx *async = &bulk_async[i];
+
+		mlx5_cmd_cleanup_async_ctx(&async->async_ctx);
+	}
+	kvfree(bulk_async);
+}
+
+static void create_tis_callback(int status, struct mlx5_async_work *context)
+{
+	struct mlx5e_async_ctx *async =
+		container_of(context, struct mlx5e_async_ctx, context);
+	struct mlx5e_ktls_offload_context_tx *priv_tx = async->priv_tx;
+
+	if (status) {
+		async->err = status;
+		priv_tx->create_err = 1;
+		goto out;
+	}
+
+	priv_tx->tisn = MLX5_GET(create_tis_out, async->out_create, tisn);
+out:
+	complete(&async->complete);
+}
+
+static void destroy_tis_callback(int status, struct mlx5_async_work *context)
+{
+	struct mlx5e_async_ctx *async =
+		container_of(context, struct mlx5e_async_ctx, context);
+	struct mlx5e_ktls_offload_context_tx *priv_tx = async->priv_tx;
+
+	complete(&async->complete);
+	kfree(priv_tx);
+}
+
 static struct mlx5e_ktls_offload_context_tx *
-mlx5e_tls_priv_tx_init(struct mlx5_core_dev *mdev, struct mlx5e_tls_sw_stats *sw_stats)
+mlx5e_tls_priv_tx_init(struct mlx5_core_dev *mdev, struct mlx5e_tls_sw_stats *sw_stats,
+		       struct mlx5e_async_ctx *async)
 {
 	struct mlx5e_ktls_offload_context_tx *priv_tx;
 	int err;
@@ -105,76 +209,229 @@ mlx5e_tls_priv_tx_init(struct mlx5_core_dev *mdev, struct mlx5e_tls_sw_stats *sw
 	priv_tx->mdev = mdev;
 	priv_tx->sw_stats = sw_stats;
 
-	err = mlx5e_ktls_create_tis(mdev, &priv_tx->tisn);
-	if (err) {
-		kfree(priv_tx);
-		return ERR_PTR(err);
+	if (!async) {
+		err = mlx5e_ktls_create_tis(mdev, &priv_tx->tisn);
+		if (err)
+			goto err_out;
+	} else {
+		async->priv_tx = priv_tx;
+		err = mlx5e_ktls_create_tis_cb(mdev, &async->async_ctx,
+					       async->out_create, sizeof(async->out_create),
+					       create_tis_callback, &async->context);
+		if (err)
+			goto err_out;
 	}
 
 	return priv_tx;
+
+err_out:
+	kfree(priv_tx);
+	return ERR_PTR(err);
 }
 
-static void mlx5e_tls_priv_tx_cleanup(struct mlx5e_ktls_offload_context_tx *priv_tx)
+static void mlx5e_tls_priv_tx_cleanup(struct mlx5e_ktls_offload_context_tx *priv_tx,
+				      struct mlx5e_async_ctx *async)
 {
-	mlx5e_destroy_tis(priv_tx->mdev, priv_tx->tisn);
-	kfree(priv_tx);
+	if (priv_tx->create_err) {
+		complete(&async->complete);
+		kfree(priv_tx);
+		return;
+	}
+	async->priv_tx = priv_tx;
+	mlx5e_ktls_destroy_tis_cb(priv_tx->mdev, priv_tx->tisn,
+				  &async->async_ctx,
+				  async->out_destroy, sizeof(async->out_destroy),
+				  destroy_tis_callback, &async->context);
 }
 
-static void mlx5e_tls_priv_tx_list_cleanup(struct list_head *list)
+static void mlx5e_tls_priv_tx_list_cleanup(struct mlx5_core_dev *mdev,
+					   struct list_head *list, int size)
 {
 	struct mlx5e_ktls_offload_context_tx *obj;
+	struct mlx5e_async_ctx *bulk_async;
+	int i;
+
+	bulk_async = mlx5e_bulk_async_init(mdev, size);
+	if (!bulk_async)
+		return;
 
-	list_for_each_entry(obj, list, list_node)
-		mlx5e_tls_priv_tx_cleanup(obj);
+	i = 0;
+	list_for_each_entry(obj, list, list_node) {
+		mlx5e_tls_priv_tx_cleanup(obj, &bulk_async[i]);
+		i++;
+	}
+
+	for (i = 0; i < size; i++) {
+		struct mlx5e_async_ctx *async = &bulk_async[i];
+
+		wait_for_completion(&async->complete);
+	}
+	mlx5e_bulk_async_cleanup(bulk_async, size);
 }
 
 /* Recycling pool API */
 
+#define MLX5E_TLS_TX_POOL_BULK (16)
+#define MLX5E_TLS_TX_POOL_HIGH (4 * 1024)
+#define MLX5E_TLS_TX_POOL_LOW (MLX5E_TLS_TX_POOL_HIGH / 4)
+
 struct mlx5e_tls_tx_pool {
 	struct mlx5_core_dev *mdev;
 	struct mlx5e_tls_sw_stats *sw_stats;
 	struct mutex lock; /* Protects access to the pool */
 	struct list_head list;
-#define MLX5E_TLS_TX_POOL_MAX_SIZE (256)
 	size_t size;
+
+	struct workqueue_struct *wq;
+	struct work_struct create_work;
+	struct work_struct destroy_work;
 };
 
+static void create_work(struct work_struct *work)
+{
+	struct mlx5e_tls_tx_pool *pool =
+		container_of(work, struct mlx5e_tls_tx_pool, create_work);
+	struct mlx5e_ktls_offload_context_tx *obj;
+	struct mlx5e_async_ctx *bulk_async;
+	LIST_HEAD(local_list);
+	int i, j, err = 0;
+
+	bulk_async = mlx5e_bulk_async_init(pool->mdev, MLX5E_TLS_TX_POOL_BULK);
+	if (!bulk_async)
+		return;
+
+	for (i = 0; i < MLX5E_TLS_TX_POOL_BULK; i++) {
+		obj = mlx5e_tls_priv_tx_init(pool->mdev, pool->sw_stats, &bulk_async[i]);
+		if (IS_ERR(obj)) {
+			err = PTR_ERR(obj);
+			break;
+		}
+		list_add(&obj->list_node, &local_list);
+	}
+
+	for (j = 0; j < i; j++) {
+		struct mlx5e_async_ctx *async = &bulk_async[j];
+
+		wait_for_completion(&async->complete);
+		if (!err && async->err)
+			err = async->err;
+	}
+	atomic64_add(i, &pool->sw_stats->tx_tls_pool_alloc);
+	mlx5e_bulk_async_cleanup(bulk_async, MLX5E_TLS_TX_POOL_BULK);
+	if (err)
+		goto err_out;
+
+	mutex_lock(&pool->lock);
+	if (pool->size + MLX5E_TLS_TX_POOL_BULK >= MLX5E_TLS_TX_POOL_HIGH) {
+		mutex_unlock(&pool->lock);
+		goto err_out;
+	}
+	list_splice(&local_list, &pool->list);
+	pool->size += MLX5E_TLS_TX_POOL_BULK;
+	if (pool->size <= MLX5E_TLS_TX_POOL_LOW)
+		queue_work(pool->wq, work);
+	mutex_unlock(&pool->lock);
+	return;
+
+err_out:
+	mlx5e_tls_priv_tx_list_cleanup(pool->mdev, &local_list, i);
+	atomic64_add(i, &pool->sw_stats->tx_tls_pool_free);
+}
+
+static void destroy_work(struct work_struct *work)
+{
+	struct mlx5e_tls_tx_pool *pool =
+		container_of(work, struct mlx5e_tls_tx_pool, destroy_work);
+	struct mlx5e_ktls_offload_context_tx *obj;
+	LIST_HEAD(local_list);
+	int i = 0;
+
+	mutex_lock(&pool->lock);
+	if (pool->size < MLX5E_TLS_TX_POOL_HIGH) {
+		mutex_unlock(&pool->lock);
+		return;
+	}
+
+	list_for_each_entry(obj, &pool->list, list_node)
+		if (++i == MLX5E_TLS_TX_POOL_BULK)
+			break;
+
+	list_cut_position(&local_list, &pool->list, &obj->list_node);
+	pool->size -= MLX5E_TLS_TX_POOL_BULK;
+	if (pool->size >= MLX5E_TLS_TX_POOL_HIGH)
+		queue_work(pool->wq, work);
+	mutex_unlock(&pool->lock);
+
+	mlx5e_tls_priv_tx_list_cleanup(pool->mdev, &local_list, MLX5E_TLS_TX_POOL_BULK);
+	atomic64_add(MLX5E_TLS_TX_POOL_BULK, &pool->sw_stats->tx_tls_pool_free);
+}
+
 static struct mlx5e_tls_tx_pool *mlx5e_tls_tx_pool_init(struct mlx5_core_dev *mdev,
 							struct mlx5e_tls_sw_stats *sw_stats)
 {
 	struct mlx5e_tls_tx_pool *pool;
 
+	BUILD_BUG_ON(MLX5E_TLS_TX_POOL_LOW + MLX5E_TLS_TX_POOL_BULK >= MLX5E_TLS_TX_POOL_HIGH);
+
 	pool = kvzalloc(sizeof(*pool), GFP_KERNEL);
 	if (!pool)
 		return NULL;
 
+	pool->wq = create_singlethread_workqueue("mlx5e_tls_tx_pool");
+	if (!pool->wq)
+		goto err_free;
+
 	INIT_LIST_HEAD(&pool->list);
 	mutex_init(&pool->lock);
 
+	INIT_WORK(&pool->create_work, create_work);
+	INIT_WORK(&pool->destroy_work, destroy_work);
+
 	pool->mdev = mdev;
 	pool->sw_stats = sw_stats;
 
 	return pool;
+
+err_free:
+	kvfree(pool);
+	return NULL;
+}
+
+static void mlx5e_tls_tx_pool_list_cleanup(struct mlx5e_tls_tx_pool *pool)
+{
+	while (pool->size > MLX5E_TLS_TX_POOL_BULK) {
+		struct mlx5e_ktls_offload_context_tx *obj;
+		LIST_HEAD(local_list);
+		int i = 0;
+
+		list_for_each_entry(obj, &pool->list, list_node)
+			if (++i == MLX5E_TLS_TX_POOL_BULK)
+				break;
+
+		list_cut_position(&local_list, &pool->list, &obj->list_node);
+		mlx5e_tls_priv_tx_list_cleanup(pool->mdev, &local_list, MLX5E_TLS_TX_POOL_BULK);
+		atomic64_add(MLX5E_TLS_TX_POOL_BULK, &pool->sw_stats->tx_tls_pool_free);
+		pool->size -= MLX5E_TLS_TX_POOL_BULK;
+	}
+	if (pool->size) {
+		mlx5e_tls_priv_tx_list_cleanup(pool->mdev, &pool->list, pool->size);
+		atomic64_add(pool->size, &pool->sw_stats->tx_tls_pool_free);
+	}
 }
 
 static void mlx5e_tls_tx_pool_cleanup(struct mlx5e_tls_tx_pool *pool)
 {
-	mlx5e_tls_priv_tx_list_cleanup(&pool->list);
-	atomic64_add(pool->size, &pool->sw_stats->tx_tls_pool_free);
+	mlx5e_tls_tx_pool_list_cleanup(pool);
+	destroy_workqueue(pool->wq);
 	kvfree(pool);
 }
 
 static void pool_push(struct mlx5e_tls_tx_pool *pool, struct mlx5e_ktls_offload_context_tx *obj)
 {
 	mutex_lock(&pool->lock);
-	if (pool->size >= MLX5E_TLS_TX_POOL_MAX_SIZE) {
-		mutex_unlock(&pool->lock);
-		mlx5e_tls_priv_tx_cleanup(obj);
-		atomic64_inc(&pool->sw_stats->tx_tls_pool_free);
-		return;
-	}
 	list_add(&obj->list_node, &pool->list);
-	pool->size++;
+	if (++pool->size == MLX5E_TLS_TX_POOL_HIGH)
+		queue_work(pool->wq, &pool->destroy_work);
 	mutex_unlock(&pool->lock);
 }
 
@@ -183,18 +440,24 @@ static struct mlx5e_ktls_offload_context_tx *pool_pop(struct mlx5e_tls_tx_pool *
 	struct mlx5e_ktls_offload_context_tx *obj;
 
 	mutex_lock(&pool->lock);
-	if (pool->size == 0) {
-		obj = mlx5e_tls_priv_tx_init(pool->mdev, pool->sw_stats);
+	if (unlikely(pool->size == 0)) {
+		/* pool is empty:
+		 * - trigger the populating work, and
+		 * - serve the current context via the regular blocking api.
+		 */
+		queue_work(pool->wq, &pool->create_work);
+		mutex_unlock(&pool->lock);
+		obj = mlx5e_tls_priv_tx_init(pool->mdev, pool->sw_stats, NULL);
 		if (!IS_ERR(obj))
 			atomic64_inc(&pool->sw_stats->tx_tls_pool_alloc);
-		goto out;
+		return obj;
 	}
 
 	obj = list_first_entry(&pool->list, struct mlx5e_ktls_offload_context_tx,
 			       list_node);
 	list_del(&obj->list_node);
-	pool->size--;
-out:
+	if (--pool->size == MLX5E_TLS_TX_POOL_LOW)
+		queue_work(pool->wq, &pool->create_work);
 	mutex_unlock(&pool->lock);
 	return obj;
 }
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [net-next 10/15] net/tls: Perform immediate device ctx cleanup when possible
  2022-07-06 23:24 ` [net-next 10/15] net/tls: Perform immediate device ctx cleanup when possible Saeed Mahameed
@ 2022-07-07  2:21   ` Jakub Kicinski
  2022-07-07  6:51     ` Saeed Mahameed
  0 siblings, 1 reply; 25+ messages in thread
From: Jakub Kicinski @ 2022-07-07  2:21 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Saeed Mahameed,
	netdev, Tariq Toukan, Maxim Mikityanskiy

On Wed,  6 Jul 2022 16:24:16 -0700 Saeed Mahameed wrote:
> From: Tariq Toukan <tariqt@nvidia.com>
> 
> TLS context destructor can be run in atomic context. Cleanup operations
> for device-offloaded contexts could require access and interaction with
> the device callbacks, which might sleep. Hence, the cleanup of such
> contexts must be deferred and completed inside an async work.
> 
> For all others, this is not necessary, as cleanup is atomic. Invoke
> cleanup immediately for them, avoiding queueing redundant gc work.
> 
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

Not sure if posting core patches as part of driver PRs is a good idea,
if I ack this now the tag will not propagate.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [net-next 11/15] net/tls: Multi-threaded calls to TX tls_dev_del
  2022-07-06 23:24 ` [net-next 11/15] net/tls: Multi-threaded calls to TX tls_dev_del Saeed Mahameed
@ 2022-07-07  2:37   ` Jakub Kicinski
  2022-07-07 22:14     ` Tariq Toukan
  0 siblings, 1 reply; 25+ messages in thread
From: Jakub Kicinski @ 2022-07-07  2:37 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Saeed Mahameed,
	netdev, Tariq Toukan, Maxim Mikityanskiy

On Wed,  6 Jul 2022 16:24:17 -0700 Saeed Mahameed wrote:
> diff --git a/include/net/tls.h b/include/net/tls.h
> index 4fc16ca5f469..c4be74635502 100644
> --- a/include/net/tls.h
> +++ b/include/net/tls.h
> @@ -163,6 +163,11 @@ struct tls_record_info {
>  	skb_frag_t frags[MAX_SKB_FRAGS];
>  };
>  
> +struct destruct_work {
> +	struct work_struct work;
> +	struct tls_context *ctx;

Pretty strange to bundle the back-pointer with the work.
Why not put it directly in struct tls_offload_context_tx?

Also now that we have the backpointer, can we move the list member of
struct tls_context to the offload context? (I haven't checked if its
used in other places)

>  
>  	up_write(&device_offload_lock);
>  
> -	flush_work(&tls_device_gc_work);
> -
>  	return NOTIFY_DONE;
>  }
>  
> @@ -1435,6 +1416,5 @@ void __init tls_device_init(void)
>  void __exit tls_device_cleanup(void)
>  {
>  	unregister_netdevice_notifier(&tls_dev_notifier);
> -	flush_work(&tls_device_gc_work);
>  	clean_acked_data_flush();
>  }

Why don't we need the flush any more? The module reference is gone as
soon as destructor runs (i.e. on ULP cleanup), the work can still be
pending, no?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [net-next 10/15] net/tls: Perform immediate device ctx cleanup when possible
  2022-07-07  2:21   ` Jakub Kicinski
@ 2022-07-07  6:51     ` Saeed Mahameed
  2022-07-07 16:14       ` Jakub Kicinski
  0 siblings, 1 reply; 25+ messages in thread
From: Saeed Mahameed @ 2022-07-07  6:51 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Saeed Mahameed,
	netdev, Tariq Toukan, Maxim Mikityanskiy

On 06 Jul 19:21, Jakub Kicinski wrote:
>On Wed,  6 Jul 2022 16:24:16 -0700 Saeed Mahameed wrote:
>> From: Tariq Toukan <tariqt@nvidia.com>
>>
>> TLS context destructor can be run in atomic context. Cleanup operations
>> for device-offloaded contexts could require access and interaction with
>> the device callbacks, which might sleep. Hence, the cleanup of such
>> contexts must be deferred and completed inside an async work.
>>
>> For all others, this is not necessary, as cleanup is atomic. Invoke
>> cleanup immediately for them, avoiding queueing redundant gc work.
>>
>> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
>> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
>> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
>
>Not sure if posting core patches as part of driver PRs is a good idea,
>if I ack this now the tag will not propagate.

I agree. How about the devlink lock removal? Same thing?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [net-next 10/15] net/tls: Perform immediate device ctx cleanup when possible
  2022-07-07  6:51     ` Saeed Mahameed
@ 2022-07-07 16:14       ` Jakub Kicinski
  2022-07-07 17:29         ` Saeed Mahameed
  0 siblings, 1 reply; 25+ messages in thread
From: Jakub Kicinski @ 2022-07-07 16:14 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Saeed Mahameed,
	netdev, Tariq Toukan, Maxim Mikityanskiy

On Wed, 6 Jul 2022 23:51:14 -0700 Saeed Mahameed wrote:
> On 06 Jul 19:21, Jakub Kicinski wrote:
> >On Wed,  6 Jul 2022 16:24:16 -0700 Saeed Mahameed wrote:  
> >> From: Tariq Toukan <tariqt@nvidia.com>
> >>
> >> TLS context destructor can be run in atomic context. Cleanup operations
> >> for device-offloaded contexts could require access and interaction with
> >> the device callbacks, which might sleep. Hence, the cleanup of such
> >> contexts must be deferred and completed inside an async work.
> >>
> >> For all others, this is not necessary, as cleanup is atomic. Invoke
> >> cleanup immediately for them, avoiding queueing redundant gc work.
> >>
> >> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> >> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
> >> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>  
> >
> >Not sure if posting core patches as part of driver PRs is a good idea,
> >if I ack this now the tag will not propagate.  
> 
> I agree. How about the devlink lock removal? Same thing?

I didn't have the same reaction to the devlink part, perhaps because
of the clear driver dependency there and the fact we discussed that 
work thoroughly before.

Looking at it again it seems like the problem is that these are really
two independent series squashed together, no? Multiple driver features
mixed up in a series is fine but when changing the core let's stick to
clearer separation.

The objective is to get reviewers engaged, and it's really easy to miss
the core changes among the driver ones in a large multi-purpose series.

On the topic of PRs, does it matter to you if the core changes are
posted as a PR? I presume it's okay for those to come out as a normal
series with a proper subject and applied from the list?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [net-next 10/15] net/tls: Perform immediate device ctx cleanup when possible
  2022-07-07 16:14       ` Jakub Kicinski
@ 2022-07-07 17:29         ` Saeed Mahameed
  0 siblings, 0 replies; 25+ messages in thread
From: Saeed Mahameed @ 2022-07-07 17:29 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Saeed Mahameed,
	netdev, Tariq Toukan, Maxim Mikityanskiy

On 07 Jul 09:14, Jakub Kicinski wrote:
>On Wed, 6 Jul 2022 23:51:14 -0700 Saeed Mahameed wrote:
>> On 06 Jul 19:21, Jakub Kicinski wrote:
>> >On Wed,  6 Jul 2022 16:24:16 -0700 Saeed Mahameed wrote:
>> >> From: Tariq Toukan <tariqt@nvidia.com>
>> >>
>> >> TLS context destructor can be run in atomic context. Cleanup operations
>> >> for device-offloaded contexts could require access and interaction with
>> >> the device callbacks, which might sleep. Hence, the cleanup of such
>> >> contexts must be deferred and completed inside an async work.
>> >>
>> >> For all others, this is not necessary, as cleanup is atomic. Invoke
>> >> cleanup immediately for them, avoiding queueing redundant gc work.
>> >>
>> >> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
>> >> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
>> >> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
>> >
>> >Not sure if posting core patches as part of driver PRs is a good idea,
>> >if I ack this now the tag will not propagate.
>>
>> I agree. How about the devlink lock removal? Same thing?
>
>I didn't have the same reaction to the devlink part, perhaps because
>of the clear driver dependency there and the fact we discussed that
>work thoroughly before.
>
>Looking at it again it seems like the problem is that these are really
>two independent series squashed together, no? Multiple driver features
>mixed up in a series is fine but when changing the core let's stick to
>clearer separation.
>
>The objective is to get reviewers engaged, and it's really easy to miss
>the core changes among the driver ones in a large multi-purpose series.
>

I see, I will make the separation when I have core patches.

>On the topic of PRs, does it matter to you if the core changes are
>posted as a PR? I presume it's okay for those to come out as a normal
>series with a proper subject and applied from the list?
No, I don't mind that at all. It just means that some patchsets will
have to go a different path than what I have for mlx5; not a big deal


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [net-next 11/15] net/tls: Multi-threaded calls to TX tls_dev_del
  2022-07-07  2:37   ` Jakub Kicinski
@ 2022-07-07 22:14     ` Tariq Toukan
  2022-07-08  0:17       ` Jakub Kicinski
  0 siblings, 1 reply; 25+ messages in thread
From: Tariq Toukan @ 2022-07-07 22:14 UTC (permalink / raw)
  To: Jakub Kicinski, Saeed Mahameed
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Saeed Mahameed,
	netdev, Tariq Toukan, Maxim Mikityanskiy



On 7/7/2022 5:37 AM, Jakub Kicinski wrote:
> On Wed,  6 Jul 2022 16:24:17 -0700 Saeed Mahameed wrote:
>> diff --git a/include/net/tls.h b/include/net/tls.h
>> index 4fc16ca5f469..c4be74635502 100644
>> --- a/include/net/tls.h
>> +++ b/include/net/tls.h
>> @@ -163,6 +163,11 @@ struct tls_record_info {
>>   	skb_frag_t frags[MAX_SKB_FRAGS];
>>   };
>>   
>> +struct destruct_work {
>> +	struct work_struct work;
>> +	struct tls_context *ctx;
> 
> Pretty strange to bundle the back-pointer with the work.
> Why not put it directly in struct tls_offload_context_tx?
> 

I can put them directly under struct tls_offload_context_tx.
No strong reason, I just followed the code reference in 
include/net/tls.h :: struct tx_work.
I'll change it and respin.

> Also now that we have the backpointer, can we move the list member of
> struct tls_context to the offload context? (I haven't checked if its
> used in other places)

I'll check and move if possible.

> 
>>   
>>   	up_write(&device_offload_lock);
>>   
>> -	flush_work(&tls_device_gc_work);
>> -
>>   	return NOTIFY_DONE;
>>   }
>>   
>> @@ -1435,6 +1416,5 @@ void __init tls_device_init(void)
>>   void __exit tls_device_cleanup(void)
>>   {
>>   	unregister_netdevice_notifier(&tls_dev_notifier);
>> -	flush_work(&tls_device_gc_work);
>>   	clean_acked_data_flush();
>>   }
> 
> Why don't we need the flush any more? The module reference is gone as
> soon as destructor runs (i.e. on ULP cleanup), the work can still be
> pending, no?

So this garbage collector work does not exist anymore. It is replaced
by per-context works, which are not reachable from this function.
It seems that we need to guarantee completion of these works. Maybe by
queuing them to a new dedicated queue and flushing it here.

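A sketch of the fix Tariq outlines: a dedicated workqueue whose drain
at module exit guarantees no per-context work can still be running
module text. The names destruct_wq and *_sketch() are hypothetical,
and error handling is omitted.

        #include <linux/workqueue.h>

        static struct workqueue_struct *destruct_wq;

        static int tls_device_init_sketch(void)
        {
                destruct_wq = alloc_workqueue("tls_device_destruct", 0, 0);
                return destruct_wq ? 0 : -ENOMEM;
        }

        /* per-context teardown: queue onto the dedicated queue instead
         * of the system workqueue
         */
        static void queue_destruction_sketch(struct tls_offload_context_tx *offload_ctx)
        {
                queue_work(destruct_wq, &offload_ctx->destruct_work.work);
        }

        static void tls_device_cleanup_sketch(void)
        {
                /* destroy_workqueue() drains all pending per-context
                 * works first, so none can outlive the module
                 */
                destroy_workqueue(destruct_wq);
        }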

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [net-next 11/15] net/tls: Multi-threaded calls to TX tls_dev_del
  2022-07-07 22:14     ` Tariq Toukan
@ 2022-07-08  0:17       ` Jakub Kicinski
  2022-07-08 13:10         ` Maxim Mikityanskiy
  0 siblings, 1 reply; 25+ messages in thread
From: Jakub Kicinski @ 2022-07-08  0:17 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: Saeed Mahameed, David S. Miller, Paolo Abeni, Eric Dumazet,
	Saeed Mahameed, netdev, Tariq Toukan, Maxim Mikityanskiy

On Fri, 8 Jul 2022 01:14:32 +0300 Tariq Toukan wrote:
> > Why don't we need the flush any more? The module reference is gone as
> > soon as destructor runs (i.e. on ULP cleanup), the work can still be
> > pending, no?  
> 
> So this garbage collector work does not exist anymore. It is replaced
> by per-context works, which are not reachable from this function.
> It seems that we need to guarantee completion of these works. Maybe by
> queuing them to a new dedicated queue and flushing it here.

Yup, SG.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [net-next 11/15] net/tls: Multi-threaded calls to TX tls_dev_del
  2022-07-08  0:17       ` Jakub Kicinski
@ 2022-07-08 13:10         ` Maxim Mikityanskiy
  2022-07-08 18:10           ` Jakub Kicinski
  0 siblings, 1 reply; 25+ messages in thread
From: Maxim Mikityanskiy @ 2022-07-08 13:10 UTC (permalink / raw)
  To: ttoukan.linux, kuba
  Cc: Tariq Toukan, Saeed Mahameed, edumazet, davem, saeed, pabeni, netdev

On Thu, 2022-07-07 at 17:17 -0700, Jakub Kicinski wrote:
> On Fri, 8 Jul 2022 01:14:32 +0300 Tariq Toukan wrote:
> > > Why don't we need the flush any more? The module reference is gone as
> > > soon as destructor runs (i.e. on ULP cleanup), the work can still be
> > > pending, no?  

Is this an issue? The work doesn't seem to access any module-level
objects like tls_device_gc_list anymore. Did I miss anything?

> > 
> > So this garbage collector work does not exist anymore. It is replaced
> > by per-context works, which are not reachable from this function.
> > It seems that we need to guarantee completion of these works. Maybe by
> > queuing them to a new dedicated queue and flushing it here.
> 
> Yup, SG.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [net-next 11/15] net/tls: Multi-threaded calls to TX tls_dev_del
  2022-07-08 13:10         ` Maxim Mikityanskiy
@ 2022-07-08 18:10           ` Jakub Kicinski
  0 siblings, 0 replies; 25+ messages in thread
From: Jakub Kicinski @ 2022-07-08 18:10 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: ttoukan.linux, Tariq Toukan, Saeed Mahameed, edumazet, davem,
	saeed, pabeni, netdev

On Fri, 8 Jul 2022 13:10:38 +0000 Maxim Mikityanskiy wrote:
> On Thu, 2022-07-07 at 17:17 -0700, Jakub Kicinski wrote:
> > On Fri, 8 Jul 2022 01:14:32 +0300 Tariq Toukan wrote:  
> > > > Why don't we need the flush any more? The module reference is gone as
> > > > soon as destructor runs (i.e. on ULP cleanup), the work can still be
> > > > pending, no?    
> 
> Is this an issue? The work doesn't seem to access any module-level
> objects like tls_device_gc_list anymore. Did I miss anything?

The function itself is still in the module's text, right?

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2022-07-08 18:10 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-06 23:24 [pull request][net-next 00/15] mlx5 updates 2022-07-06 Saeed Mahameed
2022-07-06 23:24 ` [net-next 01/15] net/mlx5: Remove devl_unlock from mlx5_eswtich_mode_callback_enter Saeed Mahameed
2022-07-06 23:24 ` [net-next 02/15] net/mlx5: Use devl_ API for rate nodes destroy Saeed Mahameed
2022-07-06 23:24 ` [net-next 03/15] devlink: Remove unused function devlink_rate_nodes_destroy Saeed Mahameed
2022-07-06 23:24 ` [net-next 04/15] net/mlx5: Use devl_ API in mlx5_esw_offloads_devlink_port_register Saeed Mahameed
2022-07-06 23:24 ` [net-next 05/15] net/mlx5: Use devl_ API in mlx5_esw_devlink_sf_port_register Saeed Mahameed
2022-07-06 23:24 ` [net-next 06/15] devlink: Remove unused functions devlink_rate_leaf_create/destroy Saeed Mahameed
2022-07-06 23:24 ` [net-next 07/15] net/mlx5: Use devl_ API in mlx5e_devlink_port_register Saeed Mahameed
2022-07-06 23:24 ` [net-next 08/15] net/mlx5: Remove devl_unlock from mlx5_devlink_eswitch_mode_set Saeed Mahameed
2022-07-06 23:24 ` [net-next 09/15] devlink: Hold the instance lock in port_new / port_del callbacks Saeed Mahameed
2022-07-06 23:24 ` [net-next 10/15] net/tls: Perform immediate device ctx cleanup when possible Saeed Mahameed
2022-07-07  2:21   ` Jakub Kicinski
2022-07-07  6:51     ` Saeed Mahameed
2022-07-07 16:14       ` Jakub Kicinski
2022-07-07 17:29         ` Saeed Mahameed
2022-07-06 23:24 ` [net-next 11/15] net/tls: Multi-threaded calls to TX tls_dev_del Saeed Mahameed
2022-07-07  2:37   ` Jakub Kicinski
2022-07-07 22:14     ` Tariq Toukan
2022-07-08  0:17       ` Jakub Kicinski
2022-07-08 13:10         ` Maxim Mikityanskiy
2022-07-08 18:10           ` Jakub Kicinski
2022-07-06 23:24 ` [net-next 12/15] net/mlx5e: kTLS, Introduce TLS-specific create TIS Saeed Mahameed
2022-07-06 23:24 ` [net-next 13/15] net/mlx5e: kTLS, Take stats out of OOO handler Saeed Mahameed
2022-07-06 23:24 ` [net-next 14/15] net/mlx5e: kTLS, Recycle objects of device-offloaded TLS TX connections Saeed Mahameed
2022-07-06 23:24 ` [net-next 15/15] net/mlx5e: kTLS, Dynamically re-size TX recycling pool Saeed Mahameed
