* [PATCH mlx5-next 0/5] Improve mlx5 live migration driver
@ 2022-04-27  9:31 Yishai Hadas
  2022-04-27  9:31 ` [PATCH mlx5-next 1/5] vfio/mlx5: Reorganize the VF is migratable code Yishai Hadas
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Yishai Hadas @ 2022-04-27  9:31 UTC (permalink / raw)
  To: alex.williamson, jgg, saeedm
  Cc: kvm, netdev, kuba, leonro, yishaih, maorg, cohuck

This series improves the mlx5 live migration driver in a few aspects,
described below.

Refactor to enable running migration commands in parallel over the PF
command interface.

To achieve that, mlx5_core now exposes an API that lets the VF be
notified before the PF command interface goes down or comes back up
(e.g. upon a PF reload during health recovery).

With this functionality in place, the mlx5 vfio driver no longer needs
to take the global PF lock when using the command interface; it can
rely on the above mechanism to stay in sync with the PF.

From the kernel driver's point of view, this enables migrating multiple
VFs in parallel over the PF command interface.

In addition, the SAVE state command now uses the PF's async command
mode. This lets the driver return to user space earlier, once the
command has been issued successfully, improving latency by letting
things run in parallel.

Alex, as this series touches mlx5_core, we may need to send it as a
pull request to VFIO to avoid conflicts before acceptance.

Yishai

Yishai Hadas (5):
  vfio/mlx5: Reorganize the VF is migratable code
  net/mlx5: Expose mlx5_sriov_blocking_notifier_register /  unregister
    APIs
  vfio/mlx5: Manage the VF attach/detach callback from the PF
  vfio/mlx5: Refactor to enable VFs migration in parallel
  vfio/mlx5: Run the SAVE state command in an async mode

 .../net/ethernet/mellanox/mlx5/core/sriov.c   |  65 ++++-
 drivers/vfio/pci/mlx5/cmd.c                   | 229 +++++++++++++-----
 drivers/vfio/pci/mlx5/cmd.h                   |  50 +++-
 drivers/vfio/pci/mlx5/main.c                  | 133 +++++-----
 include/linux/mlx5/driver.h                   |  12 +
 5 files changed, 358 insertions(+), 131 deletions(-)

-- 
2.18.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH mlx5-next 1/5] vfio/mlx5: Reorganize the VF is migratable code
  2022-04-27  9:31 [PATCH mlx5-next 0/5] Improve mlx5 live migration driver Yishai Hadas
@ 2022-04-27  9:31 ` Yishai Hadas
  2022-05-04 20:13   ` Alex Williamson
  2022-04-27  9:31 ` [PATCH mlx5-next 2/5] net/mlx5: Expose mlx5_sriov_blocking_notifier_register / unregister APIs Yishai Hadas
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Yishai Hadas @ 2022-04-27  9:31 UTC (permalink / raw)
  To: alex.williamson, jgg, saeedm
  Cc: kvm, netdev, kuba, leonro, yishaih, maorg, cohuck

Move the "VF is migratable" check into a separate function so that
later patches in the series can reuse it.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/vfio/pci/mlx5/cmd.c  | 18 ++++++++++++++++++
 drivers/vfio/pci/mlx5/cmd.h  |  1 +
 drivers/vfio/pci/mlx5/main.c | 22 +++++++---------------
 3 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index 5c9f9218cc1d..d608b8167f58 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -71,6 +71,24 @@ int mlx5vf_cmd_query_vhca_migration_state(struct pci_dev *pdev, u16 vhca_id,
 	return ret;
 }
 
+bool mlx5vf_cmd_is_migratable(struct pci_dev *pdev)
+{
+	struct mlx5_core_dev *mdev = mlx5_vf_get_core_dev(pdev);
+	bool migratable = false;
+
+	if (!mdev)
+		return false;
+
+	if (!MLX5_CAP_GEN(mdev, migration))
+		goto end;
+
+	migratable = true;
+
+end:
+	mlx5_vf_put_core_dev(mdev);
+	return migratable;
+}
+
 int mlx5vf_cmd_get_vhca_id(struct pci_dev *pdev, u16 function_id, u16 *vhca_id)
 {
 	struct mlx5_core_dev *mdev = mlx5_vf_get_core_dev(pdev);
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index 1392a11a9cc0..2da6a1c0ec5c 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -29,6 +29,7 @@ int mlx5vf_cmd_resume_vhca(struct pci_dev *pdev, u16 vhca_id, u16 op_mod);
 int mlx5vf_cmd_query_vhca_migration_state(struct pci_dev *pdev, u16 vhca_id,
 					  size_t *state_size);
 int mlx5vf_cmd_get_vhca_id(struct pci_dev *pdev, u16 function_id, u16 *vhca_id);
+bool mlx5vf_cmd_is_migratable(struct pci_dev *pdev);
 int mlx5vf_cmd_save_vhca_state(struct pci_dev *pdev, u16 vhca_id,
 			       struct mlx5_vf_migration_file *migf);
 int mlx5vf_cmd_load_vhca_state(struct pci_dev *pdev, u16 vhca_id,
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index bbec5d288fee..2578f61eaeae 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -597,21 +597,13 @@ static int mlx5vf_pci_probe(struct pci_dev *pdev,
 		return -ENOMEM;
 	vfio_pci_core_init_device(&mvdev->core_device, pdev, &mlx5vf_pci_ops);
 
-	if (pdev->is_virtfn) {
-		struct mlx5_core_dev *mdev =
-			mlx5_vf_get_core_dev(pdev);
-
-		if (mdev) {
-			if (MLX5_CAP_GEN(mdev, migration)) {
-				mvdev->migrate_cap = 1;
-				mvdev->core_device.vdev.migration_flags =
-					VFIO_MIGRATION_STOP_COPY |
-					VFIO_MIGRATION_P2P;
-				mutex_init(&mvdev->state_mutex);
-				spin_lock_init(&mvdev->reset_lock);
-			}
-			mlx5_vf_put_core_dev(mdev);
-		}
+	if (pdev->is_virtfn && mlx5vf_cmd_is_migratable(pdev)) {
+		mvdev->migrate_cap = 1;
+		mvdev->core_device.vdev.migration_flags =
+			VFIO_MIGRATION_STOP_COPY |
+			VFIO_MIGRATION_P2P;
+		mutex_init(&mvdev->state_mutex);
+		spin_lock_init(&mvdev->reset_lock);
 	}
 
 	ret = vfio_pci_core_register_device(&mvdev->core_device);
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH mlx5-next 2/5] net/mlx5: Expose mlx5_sriov_blocking_notifier_register /  unregister APIs
  2022-04-27  9:31 [PATCH mlx5-next 0/5] Improve mlx5 live migration driver Yishai Hadas
  2022-04-27  9:31 ` [PATCH mlx5-next 1/5] vfio/mlx5: Reorganize the VF is migratable code Yishai Hadas
@ 2022-04-27  9:31 ` Yishai Hadas
  2022-05-04 13:55   ` Jason Gunthorpe
  2022-04-27  9:31 ` [PATCH mlx5-next 3/5] vfio/mlx5: Manage the VF attach/detach callback from the PF Yishai Hadas
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Yishai Hadas @ 2022-04-27  9:31 UTC (permalink / raw)
  To: alex.williamson, jgg, saeedm
  Cc: kvm, netdev, kuba, leonro, yishaih, maorg, cohuck

Expose the mlx5_sriov_blocking_notifier_register / unregister APIs to
let a VF register to be notified of its enablement / disablement by the PF.

Upon VF probe, the VF will call mlx5_sriov_blocking_notifier_register()
with its notifier block, and upon VF remove it will call
mlx5_sriov_blocking_notifier_unregister() to drop its registration.

This gives a VF the ability to clean up resources upon disablement,
before the command interface goes down, and likewise to perform setup
before it is enabled.

This may be used by a migration-capable VF in a few cases (e.g. PF
load/unload upon a health recovery).

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/sriov.c   | 65 ++++++++++++++++++-
 include/linux/mlx5/driver.h                   | 12 ++++
 2 files changed, 76 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
index 887ee0f729d1..2935614f6fa9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
@@ -87,6 +87,11 @@ static int mlx5_device_enable_sriov(struct mlx5_core_dev *dev, int num_vfs)
 enable_vfs_hca:
 	num_msix_count = mlx5_get_default_msix_vec_count(dev, num_vfs);
 	for (vf = 0; vf < num_vfs; vf++) {
+		/* Notify the VF before its enablement to let it set
+		 * some stuff.
+		 */
+		blocking_notifier_call_chain(&sriov->vfs_ctx[vf].notifier,
+					     MLX5_PF_NOTIFY_ENABLE_VF, dev);
 		err = mlx5_core_enable_hca(dev, vf + 1);
 		if (err) {
 			mlx5_core_warn(dev, "failed to enable VF %d (%d)\n", vf, err);
@@ -127,6 +132,11 @@ mlx5_device_disable_sriov(struct mlx5_core_dev *dev, int num_vfs, bool clear_vf)
 	for (vf = num_vfs - 1; vf >= 0; vf--) {
 		if (!sriov->vfs_ctx[vf].enabled)
 			continue;
+		/* Notify the VF before its disablement to let it clean
+		 * some resources.
+		 */
+		blocking_notifier_call_chain(&sriov->vfs_ctx[vf].notifier,
+					     MLX5_PF_NOTIFY_DISABLE_VF, dev);
 		err = mlx5_core_disable_hca(dev, vf + 1);
 		if (err) {
 			mlx5_core_warn(dev, "failed to disable VF %d\n", vf);
@@ -257,7 +267,7 @@ int mlx5_sriov_init(struct mlx5_core_dev *dev)
 {
 	struct mlx5_core_sriov *sriov = &dev->priv.sriov;
 	struct pci_dev *pdev = dev->pdev;
-	int total_vfs;
+	int total_vfs, i;
 
 	if (!mlx5_core_is_pf(dev))
 		return 0;
@@ -269,6 +279,9 @@ int mlx5_sriov_init(struct mlx5_core_dev *dev)
 	if (!sriov->vfs_ctx)
 		return -ENOMEM;
 
+	for (i = 0; i < total_vfs; i++)
+		BLOCKING_INIT_NOTIFIER_HEAD(&sriov->vfs_ctx[i].notifier);
+
 	return 0;
 }
 
@@ -281,3 +294,53 @@ void mlx5_sriov_cleanup(struct mlx5_core_dev *dev)
 
 	kfree(sriov->vfs_ctx);
 }
+
+/**
+ * mlx5_sriov_blocking_notifier_unregister - Unregister a VF from
+ * a notification block chain.
+ *
+ * @mdev: The mlx5 core device.
+ * @vf_id: The VF id.
+ * @nb: The notifier block to be unregistered.
+ */
+void mlx5_sriov_blocking_notifier_unregister(struct mlx5_core_dev *mdev,
+					     int vf_id,
+					     struct notifier_block *nb)
+{
+	struct mlx5_vf_context *vfs_ctx;
+	struct mlx5_core_sriov *sriov;
+
+	sriov = &mdev->priv.sriov;
+	if (WARN_ON(vf_id < 0 || vf_id >= sriov->num_vfs))
+		return;
+
+	vfs_ctx = &sriov->vfs_ctx[vf_id];
+	blocking_notifier_chain_unregister(&vfs_ctx->notifier, nb);
+}
+EXPORT_SYMBOL(mlx5_sriov_blocking_notifier_unregister);
+
+/**
+ * mlx5_sriov_blocking_notifier_register - Register a VF notification
+ * block chain.
+ *
+ * @mdev: The mlx5 core device.
+ * @vf_id: The VF id.
+ * @nb: The notifier block to be called upon the VF events.
+ *
+ * Returns 0 on success or an error code.
+ */
+int mlx5_sriov_blocking_notifier_register(struct mlx5_core_dev *mdev,
+					  int vf_id,
+					  struct notifier_block *nb)
+{
+	struct mlx5_vf_context *vfs_ctx;
+	struct mlx5_core_sriov *sriov;
+
+	sriov = &mdev->priv.sriov;
+	if (vf_id < 0 || vf_id >= sriov->num_vfs)
+		return -EINVAL;
+
+	vfs_ctx = &sriov->vfs_ctx[vf_id];
+	return blocking_notifier_chain_register(&vfs_ctx->notifier, nb);
+}
+EXPORT_SYMBOL(mlx5_sriov_blocking_notifier_register);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 9424503eb8d3..3d1594bad4ec 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -445,6 +445,11 @@ struct mlx5_qp_table {
 	struct radix_tree_root	tree;
 };
 
+enum {
+	MLX5_PF_NOTIFY_DISABLE_VF,
+	MLX5_PF_NOTIFY_ENABLE_VF,
+};
+
 struct mlx5_vf_context {
 	int	enabled;
 	u64	port_guid;
@@ -455,6 +460,7 @@ struct mlx5_vf_context {
 	u8	port_guid_valid:1;
 	u8	node_guid_valid:1;
 	enum port_state_policy	policy;
+	struct blocking_notifier_head notifier;
 };
 
 struct mlx5_core_sriov {
@@ -1155,6 +1161,12 @@ int mlx5_dm_sw_icm_dealloc(struct mlx5_core_dev *dev, enum mlx5_sw_icm_type type
 struct mlx5_core_dev *mlx5_vf_get_core_dev(struct pci_dev *pdev);
 void mlx5_vf_put_core_dev(struct mlx5_core_dev *mdev);
 
+int mlx5_sriov_blocking_notifier_register(struct mlx5_core_dev *mdev,
+					  int vf_id,
+					  struct notifier_block *nb);
+void mlx5_sriov_blocking_notifier_unregister(struct mlx5_core_dev *mdev,
+					     int vf_id,
+					     struct notifier_block *nb);
 #ifdef CONFIG_MLX5_CORE_IPOIB
 struct net_device *mlx5_rdma_netdev_alloc(struct mlx5_core_dev *mdev,
 					  struct ib_device *ibdev,
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH mlx5-next 3/5] vfio/mlx5: Manage the VF attach/detach callback from the PF
  2022-04-27  9:31 [PATCH mlx5-next 0/5] Improve mlx5 live migration driver Yishai Hadas
  2022-04-27  9:31 ` [PATCH mlx5-next 1/5] vfio/mlx5: Reorganize the VF is migratable code Yishai Hadas
  2022-04-27  9:31 ` [PATCH mlx5-next 2/5] net/mlx5: Expose mlx5_sriov_blocking_notifier_register / unregister APIs Yishai Hadas
@ 2022-04-27  9:31 ` Yishai Hadas
  2022-05-04 20:34   ` Alex Williamson
  2022-04-27  9:31 ` [PATCH mlx5-next 4/5] vfio/mlx5: Refactor to enable VFs migration in parallel Yishai Hadas
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Yishai Hadas @ 2022-04-27  9:31 UTC (permalink / raw)
  To: alex.williamson, jgg, saeedm
  Cc: kvm, netdev, kuba, leonro, yishaih, maorg, cohuck

Manage the VF attach/detach callbacks from the PF.

This lets the driver enable parallel VF migration, as introduced in
the next patch.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/vfio/pci/mlx5/cmd.c  | 59 +++++++++++++++++++++++++++++++++---
 drivers/vfio/pci/mlx5/cmd.h  | 23 +++++++++++++-
 drivers/vfio/pci/mlx5/main.c | 25 ++++-----------
 3 files changed, 82 insertions(+), 25 deletions(-)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index d608b8167f58..1f84d7b9b9e5 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -71,21 +71,70 @@ int mlx5vf_cmd_query_vhca_migration_state(struct pci_dev *pdev, u16 vhca_id,
 	return ret;
 }
 
-bool mlx5vf_cmd_is_migratable(struct pci_dev *pdev)
+static int mlx5fv_vf_event(struct notifier_block *nb,
+			   unsigned long event, void *data)
 {
-	struct mlx5_core_dev *mdev = mlx5_vf_get_core_dev(pdev);
+	struct mlx5vf_pci_core_device *mvdev =
+		container_of(nb, struct mlx5vf_pci_core_device, nb);
+
+	mutex_lock(&mvdev->state_mutex);
+	switch (event) {
+	case MLX5_PF_NOTIFY_ENABLE_VF:
+		mvdev->mdev_detach = false;
+		break;
+	case MLX5_PF_NOTIFY_DISABLE_VF:
+		mvdev->mdev_detach = true;
+		break;
+	default:
+		break;
+	}
+	mlx5vf_state_mutex_unlock(mvdev);
+	return 0;
+}
+
+void mlx5vf_cmd_remove_migratable(struct mlx5vf_pci_core_device *mvdev)
+{
+	mlx5_sriov_blocking_notifier_unregister(mvdev->mdev, mvdev->vf_id,
+						&mvdev->nb);
+}
+
+bool mlx5vf_cmd_is_migratable(struct mlx5vf_pci_core_device *mvdev)
+{
+	struct pci_dev *pdev = mvdev->core_device.pdev;
 	bool migratable = false;
+	int ret;
 
-	if (!mdev)
+	mvdev->mdev = mlx5_vf_get_core_dev(pdev);
+	if (!mvdev->mdev)
 		return false;
+	if (!MLX5_CAP_GEN(mvdev->mdev, migration))
+		goto end;
+	mvdev->vf_id = pci_iov_vf_id(pdev);
+	if (mvdev->vf_id < 0)
+		goto end;
 
-	if (!MLX5_CAP_GEN(mdev, migration))
+	mutex_init(&mvdev->state_mutex);
+	spin_lock_init(&mvdev->reset_lock);
+	mvdev->nb.notifier_call = mlx5fv_vf_event;
+	ret = mlx5_sriov_blocking_notifier_register(mvdev->mdev, mvdev->vf_id,
+						    &mvdev->nb);
+	if (ret)
 		goto end;
 
+	mutex_lock(&mvdev->state_mutex);
+	if (mvdev->mdev_detach)
+		goto unreg;
+
+	mlx5vf_state_mutex_unlock(mvdev);
 	migratable = true;
+	goto end;
 
+unreg:
+	mlx5vf_state_mutex_unlock(mvdev);
+	mlx5_sriov_blocking_notifier_unregister(mvdev->mdev, mvdev->vf_id,
+						&mvdev->nb);
 end:
-	mlx5_vf_put_core_dev(mdev);
+	mlx5_vf_put_core_dev(mvdev->mdev);
 	return migratable;
 }
 
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index 2da6a1c0ec5c..f47174eab4b8 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -7,6 +7,7 @@
 #define MLX5_VFIO_CMD_H
 
 #include <linux/kernel.h>
+#include <linux/vfio_pci_core.h>
 #include <linux/mlx5/driver.h>
 
 struct mlx5_vf_migration_file {
@@ -24,14 +25,34 @@ struct mlx5_vf_migration_file {
 	unsigned long last_offset;
 };
 
+struct mlx5vf_pci_core_device {
+	struct vfio_pci_core_device core_device;
+	int vf_id;
+	u16 vhca_id;
+	u8 migrate_cap:1;
+	u8 deferred_reset:1;
+	/* protect migration state */
+	struct mutex state_mutex;
+	enum vfio_device_mig_state mig_state;
+	/* protect the reset_done flow */
+	spinlock_t reset_lock;
+	struct mlx5_vf_migration_file *resuming_migf;
+	struct mlx5_vf_migration_file *saving_migf;
+	struct notifier_block nb;
+	struct mlx5_core_dev *mdev;
+	u8 mdev_detach:1;
+};
+
 int mlx5vf_cmd_suspend_vhca(struct pci_dev *pdev, u16 vhca_id, u16 op_mod);
 int mlx5vf_cmd_resume_vhca(struct pci_dev *pdev, u16 vhca_id, u16 op_mod);
 int mlx5vf_cmd_query_vhca_migration_state(struct pci_dev *pdev, u16 vhca_id,
 					  size_t *state_size);
 int mlx5vf_cmd_get_vhca_id(struct pci_dev *pdev, u16 function_id, u16 *vhca_id);
-bool mlx5vf_cmd_is_migratable(struct pci_dev *pdev);
+bool mlx5vf_cmd_is_migratable(struct mlx5vf_pci_core_device *mvdev);
+void mlx5vf_cmd_remove_migratable(struct mlx5vf_pci_core_device *mvdev);
 int mlx5vf_cmd_save_vhca_state(struct pci_dev *pdev, u16 vhca_id,
 			       struct mlx5_vf_migration_file *migf);
 int mlx5vf_cmd_load_vhca_state(struct pci_dev *pdev, u16 vhca_id,
 			       struct mlx5_vf_migration_file *migf);
+void mlx5vf_state_mutex_unlock(struct mlx5vf_pci_core_device *mvdev);
 #endif /* MLX5_VFIO_CMD_H */
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index 2578f61eaeae..445c516d38d9 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -17,7 +17,6 @@
 #include <linux/uaccess.h>
 #include <linux/vfio.h>
 #include <linux/sched/mm.h>
-#include <linux/vfio_pci_core.h>
 #include <linux/anon_inodes.h>
 
 #include "cmd.h"
@@ -25,20 +24,6 @@
 /* Arbitrary to prevent userspace from consuming endless memory */
 #define MAX_MIGRATION_SIZE (512*1024*1024)
 
-struct mlx5vf_pci_core_device {
-	struct vfio_pci_core_device core_device;
-	u16 vhca_id;
-	u8 migrate_cap:1;
-	u8 deferred_reset:1;
-	/* protect migration state */
-	struct mutex state_mutex;
-	enum vfio_device_mig_state mig_state;
-	/* protect the reset_done flow */
-	spinlock_t reset_lock;
-	struct mlx5_vf_migration_file *resuming_migf;
-	struct mlx5_vf_migration_file *saving_migf;
-};
-
 static struct page *
 mlx5vf_get_migration_page(struct mlx5_vf_migration_file *migf,
 			  unsigned long offset)
@@ -444,7 +429,7 @@ mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev,
  * This function is called in all state_mutex unlock cases to
  * handle a 'deferred_reset' if exists.
  */
-static void mlx5vf_state_mutex_unlock(struct mlx5vf_pci_core_device *mvdev)
+void mlx5vf_state_mutex_unlock(struct mlx5vf_pci_core_device *mvdev)
 {
 again:
 	spin_lock(&mvdev->reset_lock);
@@ -597,13 +582,11 @@ static int mlx5vf_pci_probe(struct pci_dev *pdev,
 		return -ENOMEM;
 	vfio_pci_core_init_device(&mvdev->core_device, pdev, &mlx5vf_pci_ops);
 
-	if (pdev->is_virtfn && mlx5vf_cmd_is_migratable(pdev)) {
+	if (pdev->is_virtfn && mlx5vf_cmd_is_migratable(mvdev)) {
 		mvdev->migrate_cap = 1;
 		mvdev->core_device.vdev.migration_flags =
 			VFIO_MIGRATION_STOP_COPY |
 			VFIO_MIGRATION_P2P;
-		mutex_init(&mvdev->state_mutex);
-		spin_lock_init(&mvdev->reset_lock);
 	}
 
 	ret = vfio_pci_core_register_device(&mvdev->core_device);
@@ -614,6 +597,8 @@ static int mlx5vf_pci_probe(struct pci_dev *pdev,
 	return 0;
 
 out_free:
+	if (mvdev->migrate_cap)
+		mlx5vf_cmd_remove_migratable(mvdev);
 	vfio_pci_core_uninit_device(&mvdev->core_device);
 	kfree(mvdev);
 	return ret;
@@ -624,6 +609,8 @@ static void mlx5vf_pci_remove(struct pci_dev *pdev)
 	struct mlx5vf_pci_core_device *mvdev = dev_get_drvdata(&pdev->dev);
 
 	vfio_pci_core_unregister_device(&mvdev->core_device);
+	if (mvdev->migrate_cap)
+		mlx5vf_cmd_remove_migratable(mvdev);
 	vfio_pci_core_uninit_device(&mvdev->core_device);
 	kfree(mvdev);
 }
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH mlx5-next 4/5] vfio/mlx5: Refactor to enable VFs migration in parallel
  2022-04-27  9:31 [PATCH mlx5-next 0/5] Improve mlx5 live migration driver Yishai Hadas
                   ` (2 preceding siblings ...)
  2022-04-27  9:31 ` [PATCH mlx5-next 3/5] vfio/mlx5: Manage the VF attach/detach callback from the PF Yishai Hadas
@ 2022-04-27  9:31 ` Yishai Hadas
  2022-04-27  9:31 ` [PATCH mlx5-next 5/5] vfio/mlx5: Run the SAVE state command in an async mode Yishai Hadas
  2022-05-04 13:29 ` [PATCH mlx5-next 0/5] Improve mlx5 live migration driver Yishai Hadas
  5 siblings, 0 replies; 16+ messages in thread
From: Yishai Hadas @ 2022-04-27  9:31 UTC (permalink / raw)
  To: alex.williamson, jgg, saeedm
  Cc: kvm, netdev, kuba, leonro, yishaih, maorg, cohuck

Refactor so that different VFs can run their commands over the PF
command interface in parallel, without blocking one another.

This is done by dropping the global PF lock used previously and
instead relying on the VF attach/detach mechanism for synchronization.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/vfio/pci/mlx5/cmd.c  | 102 +++++++++++++++--------------------
 drivers/vfio/pci/mlx5/cmd.h  |  11 ++--
 drivers/vfio/pci/mlx5/main.c |  44 ++++-----------
 3 files changed, 58 insertions(+), 99 deletions(-)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index 1f84d7b9b9e5..ba06b797d630 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -5,70 +5,65 @@
 
 #include "cmd.h"
 
-int mlx5vf_cmd_suspend_vhca(struct pci_dev *pdev, u16 vhca_id, u16 op_mod)
+static int mlx5vf_cmd_get_vhca_id(struct mlx5_core_dev *mdev, u16 function_id,
+				  u16 *vhca_id);
+
+int mlx5vf_cmd_suspend_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod)
 {
-	struct mlx5_core_dev *mdev = mlx5_vf_get_core_dev(pdev);
 	u32 out[MLX5_ST_SZ_DW(suspend_vhca_out)] = {};
 	u32 in[MLX5_ST_SZ_DW(suspend_vhca_in)] = {};
-	int ret;
 
-	if (!mdev)
+	lockdep_assert_held(&mvdev->state_mutex);
+	if (mvdev->mdev_detach)
 		return -ENOTCONN;
 
 	MLX5_SET(suspend_vhca_in, in, opcode, MLX5_CMD_OP_SUSPEND_VHCA);
-	MLX5_SET(suspend_vhca_in, in, vhca_id, vhca_id);
+	MLX5_SET(suspend_vhca_in, in, vhca_id, mvdev->vhca_id);
 	MLX5_SET(suspend_vhca_in, in, op_mod, op_mod);
 
-	ret = mlx5_cmd_exec_inout(mdev, suspend_vhca, in, out);
-	mlx5_vf_put_core_dev(mdev);
-	return ret;
+	return mlx5_cmd_exec_inout(mvdev->mdev, suspend_vhca, in, out);
 }
 
-int mlx5vf_cmd_resume_vhca(struct pci_dev *pdev, u16 vhca_id, u16 op_mod)
+int mlx5vf_cmd_resume_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod)
 {
-	struct mlx5_core_dev *mdev = mlx5_vf_get_core_dev(pdev);
 	u32 out[MLX5_ST_SZ_DW(resume_vhca_out)] = {};
 	u32 in[MLX5_ST_SZ_DW(resume_vhca_in)] = {};
-	int ret;
 
-	if (!mdev)
+	lockdep_assert_held(&mvdev->state_mutex);
+	if (mvdev->mdev_detach)
 		return -ENOTCONN;
 
 	MLX5_SET(resume_vhca_in, in, opcode, MLX5_CMD_OP_RESUME_VHCA);
-	MLX5_SET(resume_vhca_in, in, vhca_id, vhca_id);
+	MLX5_SET(resume_vhca_in, in, vhca_id, mvdev->vhca_id);
 	MLX5_SET(resume_vhca_in, in, op_mod, op_mod);
 
-	ret = mlx5_cmd_exec_inout(mdev, resume_vhca, in, out);
-	mlx5_vf_put_core_dev(mdev);
-	return ret;
+	return mlx5_cmd_exec_inout(mvdev->mdev, resume_vhca, in, out);
 }
 
-int mlx5vf_cmd_query_vhca_migration_state(struct pci_dev *pdev, u16 vhca_id,
+int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev,
 					  size_t *state_size)
 {
-	struct mlx5_core_dev *mdev = mlx5_vf_get_core_dev(pdev);
 	u32 out[MLX5_ST_SZ_DW(query_vhca_migration_state_out)] = {};
 	u32 in[MLX5_ST_SZ_DW(query_vhca_migration_state_in)] = {};
 	int ret;
 
-	if (!mdev)
+	lockdep_assert_held(&mvdev->state_mutex);
+	if (mvdev->mdev_detach)
 		return -ENOTCONN;
 
 	MLX5_SET(query_vhca_migration_state_in, in, opcode,
 		 MLX5_CMD_OP_QUERY_VHCA_MIGRATION_STATE);
-	MLX5_SET(query_vhca_migration_state_in, in, vhca_id, vhca_id);
+	MLX5_SET(query_vhca_migration_state_in, in, vhca_id, mvdev->vhca_id);
 	MLX5_SET(query_vhca_migration_state_in, in, op_mod, 0);
 
-	ret = mlx5_cmd_exec_inout(mdev, query_vhca_migration_state, in, out);
+	ret = mlx5_cmd_exec_inout(mvdev->mdev, query_vhca_migration_state, in,
+				  out);
 	if (ret)
-		goto end;
+		return ret;
 
 	*state_size = MLX5_GET(query_vhca_migration_state_out, out,
 			       required_umem_size);
-
-end:
-	mlx5_vf_put_core_dev(mdev);
-	return ret;
+	return 0;
 }
 
 static int mlx5fv_vf_event(struct notifier_block *nb,
@@ -125,6 +120,9 @@ bool mlx5vf_cmd_is_migratable(struct mlx5vf_pci_core_device *mvdev)
 	if (mvdev->mdev_detach)
 		goto unreg;
 
+	if (mlx5vf_cmd_get_vhca_id(mvdev->mdev, mvdev->vf_id + 1, &mvdev->vhca_id))
+		goto unreg;
+
 	mlx5vf_state_mutex_unlock(mvdev);
 	migratable = true;
 	goto end;
@@ -138,23 +136,18 @@ bool mlx5vf_cmd_is_migratable(struct mlx5vf_pci_core_device *mvdev)
 	return migratable;
 }
 
-int mlx5vf_cmd_get_vhca_id(struct pci_dev *pdev, u16 function_id, u16 *vhca_id)
+static int mlx5vf_cmd_get_vhca_id(struct mlx5_core_dev *mdev, u16 function_id,
+				  u16 *vhca_id)
 {
-	struct mlx5_core_dev *mdev = mlx5_vf_get_core_dev(pdev);
 	u32 in[MLX5_ST_SZ_DW(query_hca_cap_in)] = {};
 	int out_size;
 	void *out;
 	int ret;
 
-	if (!mdev)
-		return -ENOTCONN;
-
 	out_size = MLX5_ST_SZ_BYTES(query_hca_cap_out);
 	out = kzalloc(out_size, GFP_KERNEL);
-	if (!out) {
-		ret = -ENOMEM;
-		goto end;
-	}
+	if (!out)
+		return -ENOMEM;
 
 	MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP);
 	MLX5_SET(query_hca_cap_in, in, other_function, 1);
@@ -172,8 +165,6 @@ int mlx5vf_cmd_get_vhca_id(struct pci_dev *pdev, u16 function_id, u16 *vhca_id)
 
 err_exec:
 	kfree(out);
-end:
-	mlx5_vf_put_core_dev(mdev);
 	return ret;
 }
 
@@ -218,21 +209,23 @@ static int _create_state_mkey(struct mlx5_core_dev *mdev, u32 pdn,
 	return err;
 }
 
-int mlx5vf_cmd_save_vhca_state(struct pci_dev *pdev, u16 vhca_id,
+int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 			       struct mlx5_vf_migration_file *migf)
 {
-	struct mlx5_core_dev *mdev = mlx5_vf_get_core_dev(pdev);
 	u32 out[MLX5_ST_SZ_DW(save_vhca_state_out)] = {};
 	u32 in[MLX5_ST_SZ_DW(save_vhca_state_in)] = {};
+	struct mlx5_core_dev *mdev;
 	u32 pdn, mkey;
 	int err;
 
-	if (!mdev)
+	lockdep_assert_held(&mvdev->state_mutex);
+	if (mvdev->mdev_detach)
 		return -ENOTCONN;
 
+	mdev = mvdev->mdev;
 	err = mlx5_core_alloc_pd(mdev, &pdn);
 	if (err)
-		goto end;
+		return err;
 
 	err = dma_map_sgtable(mdev->device, &migf->table.sgt, DMA_FROM_DEVICE,
 			      0);
@@ -246,7 +239,7 @@ int mlx5vf_cmd_save_vhca_state(struct pci_dev *pdev, u16 vhca_id,
 	MLX5_SET(save_vhca_state_in, in, opcode,
 		 MLX5_CMD_OP_SAVE_VHCA_STATE);
 	MLX5_SET(save_vhca_state_in, in, op_mod, 0);
-	MLX5_SET(save_vhca_state_in, in, vhca_id, vhca_id);
+	MLX5_SET(save_vhca_state_in, in, vhca_id, mvdev->vhca_id);
 	MLX5_SET(save_vhca_state_in, in, mkey, mkey);
 	MLX5_SET(save_vhca_state_in, in, size, migf->total_length);
 
@@ -254,37 +247,28 @@ int mlx5vf_cmd_save_vhca_state(struct pci_dev *pdev, u16 vhca_id,
 	if (err)
 		goto err_exec;
 
-	migf->total_length =
-		MLX5_GET(save_vhca_state_out, out, actual_image_size);
-
-	mlx5_core_destroy_mkey(mdev, mkey);
-	mlx5_core_dealloc_pd(mdev, pdn);
-	dma_unmap_sgtable(mdev->device, &migf->table.sgt, DMA_FROM_DEVICE, 0);
-	mlx5_vf_put_core_dev(mdev);
-
-	return 0;
-
+	migf->total_length = MLX5_GET(save_vhca_state_out, out,
+				      actual_image_size);
 err_exec:
 	mlx5_core_destroy_mkey(mdev, mkey);
 err_create_mkey:
 	dma_unmap_sgtable(mdev->device, &migf->table.sgt, DMA_FROM_DEVICE, 0);
 err_dma_map:
 	mlx5_core_dealloc_pd(mdev, pdn);
-end:
-	mlx5_vf_put_core_dev(mdev);
 	return err;
 }
 
-int mlx5vf_cmd_load_vhca_state(struct pci_dev *pdev, u16 vhca_id,
+int mlx5vf_cmd_load_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 			       struct mlx5_vf_migration_file *migf)
 {
-	struct mlx5_core_dev *mdev = mlx5_vf_get_core_dev(pdev);
+	struct mlx5_core_dev *mdev;
 	u32 out[MLX5_ST_SZ_DW(save_vhca_state_out)] = {};
 	u32 in[MLX5_ST_SZ_DW(save_vhca_state_in)] = {};
 	u32 pdn, mkey;
 	int err;
 
-	if (!mdev)
+	lockdep_assert_held(&mvdev->state_mutex);
+	if (mvdev->mdev_detach)
 		return -ENOTCONN;
 
 	mutex_lock(&migf->lock);
@@ -293,6 +277,7 @@ int mlx5vf_cmd_load_vhca_state(struct pci_dev *pdev, u16 vhca_id,
 		goto end;
 	}
 
+	mdev = mvdev->mdev;
 	err = mlx5_core_alloc_pd(mdev, &pdn);
 	if (err)
 		goto end;
@@ -308,7 +293,7 @@ int mlx5vf_cmd_load_vhca_state(struct pci_dev *pdev, u16 vhca_id,
 	MLX5_SET(load_vhca_state_in, in, opcode,
 		 MLX5_CMD_OP_LOAD_VHCA_STATE);
 	MLX5_SET(load_vhca_state_in, in, op_mod, 0);
-	MLX5_SET(load_vhca_state_in, in, vhca_id, vhca_id);
+	MLX5_SET(load_vhca_state_in, in, vhca_id, mvdev->vhca_id);
 	MLX5_SET(load_vhca_state_in, in, mkey, mkey);
 	MLX5_SET(load_vhca_state_in, in, size, migf->total_length);
 
@@ -320,7 +305,6 @@ int mlx5vf_cmd_load_vhca_state(struct pci_dev *pdev, u16 vhca_id,
 err_reg:
 	mlx5_core_dealloc_pd(mdev, pdn);
 end:
-	mlx5_vf_put_core_dev(mdev);
 	mutex_unlock(&migf->lock);
 	return err;
 }
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index f47174eab4b8..3246c73395bc 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -43,16 +43,15 @@ struct mlx5vf_pci_core_device {
 	u8 mdev_detach:1;
 };
 
-int mlx5vf_cmd_suspend_vhca(struct pci_dev *pdev, u16 vhca_id, u16 op_mod);
-int mlx5vf_cmd_resume_vhca(struct pci_dev *pdev, u16 vhca_id, u16 op_mod);
-int mlx5vf_cmd_query_vhca_migration_state(struct pci_dev *pdev, u16 vhca_id,
+int mlx5vf_cmd_suspend_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod);
+int mlx5vf_cmd_resume_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod);
+int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev,
 					  size_t *state_size);
-int mlx5vf_cmd_get_vhca_id(struct pci_dev *pdev, u16 function_id, u16 *vhca_id);
 bool mlx5vf_cmd_is_migratable(struct mlx5vf_pci_core_device *mvdev);
 void mlx5vf_cmd_remove_migratable(struct mlx5vf_pci_core_device *mvdev);
-int mlx5vf_cmd_save_vhca_state(struct pci_dev *pdev, u16 vhca_id,
+int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 			       struct mlx5_vf_migration_file *migf);
-int mlx5vf_cmd_load_vhca_state(struct pci_dev *pdev, u16 vhca_id,
+int mlx5vf_cmd_load_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 			       struct mlx5_vf_migration_file *migf);
 void mlx5vf_state_mutex_unlock(struct mlx5vf_pci_core_device *mvdev);
 #endif /* MLX5_VFIO_CMD_H */
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index 445c516d38d9..f9793a627c24 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -208,8 +208,8 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev)
 	stream_open(migf->filp->f_inode, migf->filp);
 	mutex_init(&migf->lock);
 
-	ret = mlx5vf_cmd_query_vhca_migration_state(
-		mvdev->core_device.pdev, mvdev->vhca_id, &migf->total_length);
+	ret = mlx5vf_cmd_query_vhca_migration_state(mvdev,
+						    &migf->total_length);
 	if (ret)
 		goto out_free;
 
@@ -218,8 +218,7 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev)
 	if (ret)
 		goto out_free;
 
-	ret = mlx5vf_cmd_save_vhca_state(mvdev->core_device.pdev,
-					 mvdev->vhca_id, migf);
+	ret = mlx5vf_cmd_save_vhca_state(mvdev, migf);
 	if (ret)
 		goto out_free;
 	return migf;
@@ -346,8 +345,7 @@ mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev,
 	int ret;
 
 	if (cur == VFIO_DEVICE_STATE_RUNNING_P2P && new == VFIO_DEVICE_STATE_STOP) {
-		ret = mlx5vf_cmd_suspend_vhca(
-			mvdev->core_device.pdev, mvdev->vhca_id,
+		ret = mlx5vf_cmd_suspend_vhca(mvdev,
 			MLX5_SUSPEND_VHCA_IN_OP_MOD_SUSPEND_RESPONDER);
 		if (ret)
 			return ERR_PTR(ret);
@@ -355,8 +353,7 @@ mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev,
 	}
 
 	if (cur == VFIO_DEVICE_STATE_STOP && new == VFIO_DEVICE_STATE_RUNNING_P2P) {
-		ret = mlx5vf_cmd_resume_vhca(
-			mvdev->core_device.pdev, mvdev->vhca_id,
+		ret = mlx5vf_cmd_resume_vhca(mvdev,
 			MLX5_RESUME_VHCA_IN_OP_MOD_RESUME_RESPONDER);
 		if (ret)
 			return ERR_PTR(ret);
@@ -364,8 +361,7 @@ mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev,
 	}
 
 	if (cur == VFIO_DEVICE_STATE_RUNNING && new == VFIO_DEVICE_STATE_RUNNING_P2P) {
-		ret = mlx5vf_cmd_suspend_vhca(
-			mvdev->core_device.pdev, mvdev->vhca_id,
+		ret = mlx5vf_cmd_suspend_vhca(mvdev,
 			MLX5_SUSPEND_VHCA_IN_OP_MOD_SUSPEND_INITIATOR);
 		if (ret)
 			return ERR_PTR(ret);
@@ -373,8 +369,7 @@ mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev,
 	}
 
 	if (cur == VFIO_DEVICE_STATE_RUNNING_P2P && new == VFIO_DEVICE_STATE_RUNNING) {
-		ret = mlx5vf_cmd_resume_vhca(
-			mvdev->core_device.pdev, mvdev->vhca_id,
+		ret = mlx5vf_cmd_resume_vhca(mvdev,
 			MLX5_RESUME_VHCA_IN_OP_MOD_RESUME_INITIATOR);
 		if (ret)
 			return ERR_PTR(ret);
@@ -409,8 +404,7 @@ mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev,
 	}
 
 	if (cur == VFIO_DEVICE_STATE_RESUMING && new == VFIO_DEVICE_STATE_STOP) {
-		ret = mlx5vf_cmd_load_vhca_state(mvdev->core_device.pdev,
-						 mvdev->vhca_id,
+		ret = mlx5vf_cmd_load_vhca_state(mvdev,
 						 mvdev->resuming_migf);
 		if (ret)
 			return ERR_PTR(ret);
@@ -517,34 +511,16 @@ static int mlx5vf_pci_open_device(struct vfio_device *core_vdev)
 	struct mlx5vf_pci_core_device *mvdev = container_of(
 		core_vdev, struct mlx5vf_pci_core_device, core_device.vdev);
 	struct vfio_pci_core_device *vdev = &mvdev->core_device;
-	int vf_id;
 	int ret;
 
 	ret = vfio_pci_core_enable(vdev);
 	if (ret)
 		return ret;
 
-	if (!mvdev->migrate_cap) {
-		vfio_pci_core_finish_enable(vdev);
-		return 0;
-	}
-
-	vf_id = pci_iov_vf_id(vdev->pdev);
-	if (vf_id < 0) {
-		ret = vf_id;
-		goto out_disable;
-	}
-
-	ret = mlx5vf_cmd_get_vhca_id(vdev->pdev, vf_id + 1, &mvdev->vhca_id);
-	if (ret)
-		goto out_disable;
-
-	mvdev->mig_state = VFIO_DEVICE_STATE_RUNNING;
+	if (mvdev->migrate_cap)
+		mvdev->mig_state = VFIO_DEVICE_STATE_RUNNING;
 	vfio_pci_core_finish_enable(vdev);
 	return 0;
-out_disable:
-	vfio_pci_core_disable(vdev);
-	return ret;
 }
 
 static void mlx5vf_pci_close_device(struct vfio_device *core_vdev)
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH mlx5-next 5/5] vfio/mlx5: Run the SAVE state command in an async mode
  2022-04-27  9:31 [PATCH mlx5-next 0/5] Improve mlx5 live migration driver Yishai Hadas
                   ` (3 preceding siblings ...)
  2022-04-27  9:31 ` [PATCH mlx5-next 4/5] vfio/mlx5: Refactor to enable VFs migration in parallel Yishai Hadas
@ 2022-04-27  9:31 ` Yishai Hadas
  2022-05-04 13:29 ` [PATCH mlx5-next 0/5] Improve mlx5 live migration driver Yishai Hadas
  5 siblings, 0 replies; 16+ messages in thread
From: Yishai Hadas @ 2022-04-27  9:31 UTC (permalink / raw)
  To: alex.williamson, jgg, saeedm
  Cc: kvm, netdev, kuba, leonro, yishaih, maorg, cohuck

Use the PF asynchronous command mode for the SAVE state command.

This enables returning to user space earlier, once the command has been
issued successfully, and improves latency by letting things run in
parallel.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/vfio/pci/mlx5/cmd.c  | 72 ++++++++++++++++++++++++++++++++++--
 drivers/vfio/pci/mlx5/cmd.h  | 17 +++++++++
 drivers/vfio/pci/mlx5/main.c | 54 ++++++++++++++++++++++++---
 3 files changed, 134 insertions(+), 9 deletions(-)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index ba06b797d630..fad648d0c088 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -78,6 +78,7 @@ static int mlx5fv_vf_event(struct notifier_block *nb,
 		mvdev->mdev_detach = false;
 		break;
 	case MLX5_PF_NOTIFY_DISABLE_VF:
+		mlx5vf_disable_fds(mvdev);
 		mvdev->mdev_detach = true;
 		break;
 	default:
@@ -209,11 +210,56 @@ static int _create_state_mkey(struct mlx5_core_dev *mdev, u32 pdn,
 	return err;
 }
 
+void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work)
+{
+	struct mlx5vf_async_data *async_data = container_of(_work,
+		struct mlx5vf_async_data, work);
+	struct mlx5_vf_migration_file *migf = container_of(async_data,
+		struct mlx5_vf_migration_file, async_data);
+	struct mlx5_core_dev *mdev = migf->mvdev->mdev;
+
+	mutex_lock(&migf->lock);
+	if (async_data->status) {
+		migf->is_err = true;
+		wake_up_interruptible(&migf->poll_wait);
+	}
+	mutex_unlock(&migf->lock);
+
+	mlx5_core_destroy_mkey(mdev, async_data->mkey);
+	dma_unmap_sgtable(mdev->device, &migf->table.sgt, DMA_FROM_DEVICE, 0);
+	mlx5_core_dealloc_pd(mdev, async_data->pdn);
+	kvfree(async_data->out);
+	fput(migf->filp);
+}
+
+static void mlx5vf_save_callback(int status, struct mlx5_async_work *context)
+{
+	struct mlx5vf_async_data *async_data = container_of(context,
+			struct mlx5vf_async_data, cb_work);
+	struct mlx5_vf_migration_file *migf = container_of(async_data,
+			struct mlx5_vf_migration_file, async_data);
+
+	if (!status) {
+		WRITE_ONCE(migf->total_length,
+			   MLX5_GET(save_vhca_state_out, async_data->out,
+				    actual_image_size));
+		wake_up_interruptible(&migf->poll_wait);
+	}
+
+	/*
+	 * The error and the cleanup flows can't run from an
+	 * interrupt context
+	 */
+	async_data->status = status;
+	queue_work(migf->mvdev->cb_wq, &async_data->work);
+}
+
 int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 			       struct mlx5_vf_migration_file *migf)
 {
-	u32 out[MLX5_ST_SZ_DW(save_vhca_state_out)] = {};
+	u32 out_size = MLX5_ST_SZ_BYTES(save_vhca_state_out);
 	u32 in[MLX5_ST_SZ_DW(save_vhca_state_in)] = {};
+	struct mlx5vf_async_data *async_data;
 	struct mlx5_core_dev *mdev;
 	u32 pdn, mkey;
 	int err;
@@ -243,13 +289,31 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	MLX5_SET(save_vhca_state_in, in, mkey, mkey);
 	MLX5_SET(save_vhca_state_in, in, size, migf->total_length);
 
-	err = mlx5_cmd_exec_inout(mdev, save_vhca_state, in, out);
+	async_data = &migf->async_data;
+	async_data->out = kvzalloc(out_size, GFP_KERNEL);
+	if (!async_data->out) {
+		err = -ENOMEM;
+		goto err_out;
+	}
+
+	/* no data exists till the callback comes back */
+	migf->total_length = 0;
+	get_file(migf->filp);
+	async_data->mkey = mkey;
+	async_data->pdn = pdn;
+	err = mlx5_cmd_exec_cb(&migf->async_ctx, in, sizeof(in),
+			       async_data->out,
+			       out_size, mlx5vf_save_callback,
+			       &async_data->cb_work);
 	if (err)
 		goto err_exec;
 
-	migf->total_length = MLX5_GET(save_vhca_state_out, out,
-				      actual_image_size);
+	return 0;
+
 err_exec:
+	fput(migf->filp);
+	kvfree(async_data->out);
+err_out:
 	mlx5_core_destroy_mkey(mdev, mkey);
 err_create_mkey:
 	dma_unmap_sgtable(mdev->device, &migf->table.sgt, DMA_FROM_DEVICE, 0);
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index 3246c73395bc..f8f273faa5a8 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -10,10 +10,20 @@
 #include <linux/vfio_pci_core.h>
 #include <linux/mlx5/driver.h>
 
+struct mlx5vf_async_data {
+	struct mlx5_async_work cb_work;
+	struct work_struct work;
+	int status;
+	u32 pdn;
+	u32 mkey;
+	void *out;
+};
+
 struct mlx5_vf_migration_file {
 	struct file *filp;
 	struct mutex lock;
 	bool disabled;
+	u8 is_err:1;
 
 	struct sg_append_table table;
 	size_t total_length;
@@ -23,6 +33,10 @@ struct mlx5_vf_migration_file {
 	struct scatterlist *last_offset_sg;
 	unsigned int sg_last_entry;
 	unsigned long last_offset;
+	struct mlx5vf_pci_core_device *mvdev;
+	wait_queue_head_t poll_wait;
+	struct mlx5_async_ctx async_ctx;
+	struct mlx5vf_async_data async_data;
 };
 
 struct mlx5vf_pci_core_device {
@@ -38,6 +52,7 @@ struct mlx5vf_pci_core_device {
 	spinlock_t reset_lock;
 	struct mlx5_vf_migration_file *resuming_migf;
 	struct mlx5_vf_migration_file *saving_migf;
+	struct workqueue_struct *cb_wq;
 	struct notifier_block nb;
 	struct mlx5_core_dev *mdev;
 	u8 mdev_detach:1;
@@ -54,4 +69,6 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 int mlx5vf_cmd_load_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 			       struct mlx5_vf_migration_file *migf);
 void mlx5vf_state_mutex_unlock(struct mlx5vf_pci_core_device *mvdev);
+void mlx5vf_disable_fds(struct mlx5vf_pci_core_device *mvdev);
+void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work);
 #endif /* MLX5_VFIO_CMD_H */
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index f9793a627c24..6df7ad2dfa6d 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -134,12 +134,22 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len,
 		return -ESPIPE;
 	pos = &filp->f_pos;
 
+	if (!(filp->f_flags & O_NONBLOCK)) {
+		if (wait_event_interruptible(migf->poll_wait,
+			     READ_ONCE(migf->total_length) || migf->is_err))
+			return -ERESTARTSYS;
+	}
+
 	mutex_lock(&migf->lock);
+	if ((filp->f_flags & O_NONBLOCK) && !READ_ONCE(migf->total_length)) {
+		done = -EAGAIN;
+		goto out_unlock;
+	}
 	if (*pos > migf->total_length) {
 		done = -EINVAL;
 		goto out_unlock;
 	}
-	if (migf->disabled) {
+	if (migf->disabled || migf->is_err) {
 		done = -ENODEV;
 		goto out_unlock;
 	}
@@ -179,9 +189,28 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len,
 	return done;
 }
 
+static __poll_t mlx5vf_save_poll(struct file *filp,
+				 struct poll_table_struct *wait)
+{
+	struct mlx5_vf_migration_file *migf = filp->private_data;
+	__poll_t pollflags = 0;
+
+	poll_wait(filp, &migf->poll_wait, wait);
+
+	mutex_lock(&migf->lock);
+	if (migf->disabled || migf->is_err)
+		pollflags = EPOLLIN | EPOLLRDNORM | EPOLLRDHUP;
+	else if (READ_ONCE(migf->total_length))
+		pollflags = EPOLLIN | EPOLLRDNORM;
+	mutex_unlock(&migf->lock);
+
+	return pollflags;
+}
+
 static const struct file_operations mlx5vf_save_fops = {
 	.owner = THIS_MODULE,
 	.read = mlx5vf_save_read,
+	.poll = mlx5vf_save_poll,
 	.release = mlx5vf_release_file,
 	.llseek = no_llseek,
 };
@@ -207,7 +236,9 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev)
 
 	stream_open(migf->filp->f_inode, migf->filp);
 	mutex_init(&migf->lock);
-
+	init_waitqueue_head(&migf->poll_wait);
+	mlx5_cmd_init_async_ctx(mvdev->mdev, &migf->async_ctx);
+	INIT_WORK(&migf->async_data.work, mlx5vf_mig_file_cleanup_cb);
 	ret = mlx5vf_cmd_query_vhca_migration_state(mvdev,
 						    &migf->total_length);
 	if (ret)
@@ -218,6 +249,7 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev)
 	if (ret)
 		goto out_free;
 
+	migf->mvdev = mvdev;
 	ret = mlx5vf_cmd_save_vhca_state(mvdev, migf);
 	if (ret)
 		goto out_free;
@@ -323,7 +355,7 @@ mlx5vf_pci_resume_device_data(struct mlx5vf_pci_core_device *mvdev)
 	return migf;
 }
 
-static void mlx5vf_disable_fds(struct mlx5vf_pci_core_device *mvdev)
+void mlx5vf_disable_fds(struct mlx5vf_pci_core_device *mvdev)
 {
 	if (mvdev->resuming_migf) {
 		mlx5vf_disable_fd(mvdev->resuming_migf);
@@ -331,6 +363,8 @@ static void mlx5vf_disable_fds(struct mlx5vf_pci_core_device *mvdev)
 		mvdev->resuming_migf = NULL;
 	}
 	if (mvdev->saving_migf) {
+		mlx5_cmd_cleanup_async_ctx(&mvdev->saving_migf->async_ctx);
+		cancel_work_sync(&mvdev->saving_migf->async_data.work);
 		mlx5vf_disable_fd(mvdev->saving_migf);
 		fput(mvdev->saving_migf->filp);
 		mvdev->saving_migf = NULL;
@@ -560,6 +594,11 @@ static int mlx5vf_pci_probe(struct pci_dev *pdev,
 
 	if (pdev->is_virtfn && mlx5vf_cmd_is_migratable(mvdev)) {
 		mvdev->migrate_cap = 1;
+		mvdev->cb_wq = alloc_ordered_workqueue("mlx5vf_wq", 0);
+		if (!mvdev->cb_wq) {
+			ret = -ENOMEM;
+			goto out_free;
+		}
 		mvdev->core_device.vdev.migration_flags =
 			VFIO_MIGRATION_STOP_COPY |
 			VFIO_MIGRATION_P2P;
@@ -573,8 +612,11 @@ static int mlx5vf_pci_probe(struct pci_dev *pdev,
 	return 0;
 
 out_free:
-	if (mvdev->migrate_cap)
+	if (mvdev->migrate_cap) {
 		mlx5vf_cmd_remove_migratable(mvdev);
+		if (mvdev->cb_wq)
+			destroy_workqueue(mvdev->cb_wq);
+	}
 	vfio_pci_core_uninit_device(&mvdev->core_device);
 	kfree(mvdev);
 	return ret;
@@ -585,8 +627,10 @@ static void mlx5vf_pci_remove(struct pci_dev *pdev)
 	struct mlx5vf_pci_core_device *mvdev = dev_get_drvdata(&pdev->dev);
 
 	vfio_pci_core_unregister_device(&mvdev->core_device);
-	if (mvdev->migrate_cap)
+	if (mvdev->migrate_cap) {
 		mlx5vf_cmd_remove_migratable(mvdev);
+		destroy_workqueue(mvdev->cb_wq);
+	}
 	vfio_pci_core_uninit_device(&mvdev->core_device);
 	kfree(mvdev);
 }
-- 
2.18.1



* Re: [PATCH mlx5-next 0/5] Improve mlx5 live migration driver
  2022-04-27  9:31 [PATCH mlx5-next 0/5] Improve mlx5 live migration driver Yishai Hadas
                   ` (4 preceding siblings ...)
  2022-04-27  9:31 ` [PATCH mlx5-next 5/5] vfio/mlx5: Run the SAVE state command in an async mode Yishai Hadas
@ 2022-05-04 13:29 ` Yishai Hadas
  2022-05-04 20:19   ` Alex Williamson
  5 siblings, 1 reply; 16+ messages in thread
From: Yishai Hadas @ 2022-05-04 13:29 UTC (permalink / raw)
  To: alex.williamson, jgg, saeedm; +Cc: kvm, netdev, kuba, leonro, maorg, cohuck

On 27/04/2022 12:31, Yishai Hadas wrote:
> This series improves mlx5 live migration driver in few aspects as of
> below.
>
> Refactor to enable running migration commands in parallel over the PF
> command interface.
>
> To achieve that we exposed from mlx5_core an API to let the VF be
> notified before that the PF command interface goes down/up. (e.g. PF
> reload upon health recovery).
>
> Once having the above functionality in place mlx5 vfio doesn't need any
> more to obtain the global PF lock upon using the command interface but
> can rely on the above mechanism to be in sync with the PF.
>
> This can enable parallel VFs migration over the PF command interface
> from kernel driver point of view.
>
> In addition,
> Moved to use the PF async command mode for the SAVE state command.
> This enables returning earlier to user space upon issuing successfully
> the command and improve latency by let things run in parallel.
>
> Alex, as this series touches mlx5_core we may need to send this in a
> pull request format to VFIO to avoid conflicts before acceptance.
>
> Yishai
>
> Yishai Hadas (5):
>    vfio/mlx5: Reorganize the VF is migratable code
>    net/mlx5: Expose mlx5_sriov_blocking_notifier_register /  unregister
>      APIs
>    vfio/mlx5: Manage the VF attach/detach callback from the PF
>    vfio/mlx5: Refactor to enable VFs migration in parallel
>    vfio/mlx5: Run the SAVE state command in an async mode
>
>   .../net/ethernet/mellanox/mlx5/core/sriov.c   |  65 ++++-
>   drivers/vfio/pci/mlx5/cmd.c                   | 229 +++++++++++++-----
>   drivers/vfio/pci/mlx5/cmd.h                   |  50 +++-
>   drivers/vfio/pci/mlx5/main.c                  | 133 +++++-----
>   include/linux/mlx5/driver.h                   |  12 +
>   5 files changed, 358 insertions(+), 131 deletions(-)
>
Hi Alex,

Did you have a chance to look at the series? It touches mlx5 code
(vfio, net), with no core changes.

This could go via your tree as a PR from mlx5-next once you are fine
with it.

Thanks,
Yishai



* Re: [PATCH mlx5-next 2/5] net/mlx5: Expose mlx5_sriov_blocking_notifier_register /  unregister APIs
  2022-04-27  9:31 ` [PATCH mlx5-next 2/5] net/mlx5: Expose mlx5_sriov_blocking_notifier_register / unregister APIs Yishai Hadas
@ 2022-05-04 13:55   ` Jason Gunthorpe
  0 siblings, 0 replies; 16+ messages in thread
From: Jason Gunthorpe @ 2022-05-04 13:55 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: alex.williamson, saeedm, kvm, netdev, kuba, leonro, maorg, cohuck

On Wed, Apr 27, 2022 at 12:31:17PM +0300, Yishai Hadas wrote:
> Expose mlx5_sriov_blocking_notifier_register / unregister APIs to let a
> VF register to be notified for its enablement / disablement by the PF.
> 
> Upon VF probe it will call mlx5_sriov_blocking_notifier_register() with
> its notifier block and upon VF remove it will call
> mlx5_sriov_blocking_notifier_unregister() to drop its registration.
> 
> This gives a VF the ability to clean up some resources upon disable,
> before the command interface goes down, and likewise to set things up
> before it is enabled.
> 
> This may be used by a migration-capable VF in a few cases (e.g. PF
> load/unload upon a health recovery).
> 
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
> ---
>  .../net/ethernet/mellanox/mlx5/core/sriov.c   | 65 ++++++++++++++++++-
>  include/linux/mlx5/driver.h                   | 12 ++++
>  2 files changed, 76 insertions(+), 1 deletion(-)

This patch needs to be first and be on the mlx5 shared branch
 
Jason


* Re: [PATCH mlx5-next 1/5] vfio/mlx5: Reorganize the VF is migratable code
  2022-04-27  9:31 ` [PATCH mlx5-next 1/5] vfio/mlx5: Reorganize the VF is migratable code Yishai Hadas
@ 2022-05-04 20:13   ` Alex Williamson
  2022-05-08 12:56     ` Yishai Hadas
  0 siblings, 1 reply; 16+ messages in thread
From: Alex Williamson @ 2022-05-04 20:13 UTC (permalink / raw)
  To: Yishai Hadas; +Cc: jgg, saeedm, kvm, netdev, kuba, leonro, maorg, cohuck

On Wed, 27 Apr 2022 12:31:16 +0300
Yishai Hadas <yishaih@nvidia.com> wrote:

> Reorganize the VF is migratable code to be in a separate function, next
> patches from the series may use this.
> 
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  drivers/vfio/pci/mlx5/cmd.c  | 18 ++++++++++++++++++
>  drivers/vfio/pci/mlx5/cmd.h  |  1 +
>  drivers/vfio/pci/mlx5/main.c | 22 +++++++---------------
>  3 files changed, 26 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
> index 5c9f9218cc1d..d608b8167f58 100644
> --- a/drivers/vfio/pci/mlx5/cmd.c
> +++ b/drivers/vfio/pci/mlx5/cmd.c
> @@ -71,6 +71,24 @@ int mlx5vf_cmd_query_vhca_migration_state(struct pci_dev *pdev, u16 vhca_id,
>  	return ret;
>  }
>  
> +bool mlx5vf_cmd_is_migratable(struct pci_dev *pdev)
> +{
> +	struct mlx5_core_dev *mdev = mlx5_vf_get_core_dev(pdev);
> +	bool migratable = false;
> +
> +	if (!mdev)
> +		return false;
> +
> +	if (!MLX5_CAP_GEN(mdev, migration))
> +		goto end;
> +
> +	migratable = true;
> +
> +end:
> +	mlx5_vf_put_core_dev(mdev);
> +	return migratable;
> +}

This goto seems unnecessary; couldn't it instead be written:

{
	struct mlx5_core_dev *mdev = mlx5_vf_get_core_dev(pdev);
	bool migratable = true;

	if (!mdev)
		return false;

	if (!MLX5_CAP_GEN(mdev, migration))
		migratable = false;

	mlx5_vf_put_core_dev(mdev);
	return migratable;
}

Thanks,
Alex

> +
>  int mlx5vf_cmd_get_vhca_id(struct pci_dev *pdev, u16 function_id, u16 *vhca_id)
>  {
>  	struct mlx5_core_dev *mdev = mlx5_vf_get_core_dev(pdev);
> diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
> index 1392a11a9cc0..2da6a1c0ec5c 100644
> --- a/drivers/vfio/pci/mlx5/cmd.h
> +++ b/drivers/vfio/pci/mlx5/cmd.h
> @@ -29,6 +29,7 @@ int mlx5vf_cmd_resume_vhca(struct pci_dev *pdev, u16 vhca_id, u16 op_mod);
>  int mlx5vf_cmd_query_vhca_migration_state(struct pci_dev *pdev, u16 vhca_id,
>  					  size_t *state_size);
>  int mlx5vf_cmd_get_vhca_id(struct pci_dev *pdev, u16 function_id, u16 *vhca_id);
> +bool mlx5vf_cmd_is_migratable(struct pci_dev *pdev);
>  int mlx5vf_cmd_save_vhca_state(struct pci_dev *pdev, u16 vhca_id,
>  			       struct mlx5_vf_migration_file *migf);
>  int mlx5vf_cmd_load_vhca_state(struct pci_dev *pdev, u16 vhca_id,
> diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
> index bbec5d288fee..2578f61eaeae 100644
> --- a/drivers/vfio/pci/mlx5/main.c
> +++ b/drivers/vfio/pci/mlx5/main.c
> @@ -597,21 +597,13 @@ static int mlx5vf_pci_probe(struct pci_dev *pdev,
>  		return -ENOMEM;
>  	vfio_pci_core_init_device(&mvdev->core_device, pdev, &mlx5vf_pci_ops);
>  
> -	if (pdev->is_virtfn) {
> -		struct mlx5_core_dev *mdev =
> -			mlx5_vf_get_core_dev(pdev);
> -
> -		if (mdev) {
> -			if (MLX5_CAP_GEN(mdev, migration)) {
> -				mvdev->migrate_cap = 1;
> -				mvdev->core_device.vdev.migration_flags =
> -					VFIO_MIGRATION_STOP_COPY |
> -					VFIO_MIGRATION_P2P;
> -				mutex_init(&mvdev->state_mutex);
> -				spin_lock_init(&mvdev->reset_lock);
> -			}
> -			mlx5_vf_put_core_dev(mdev);
> -		}
> +	if (pdev->is_virtfn && mlx5vf_cmd_is_migratable(pdev)) {
> +		mvdev->migrate_cap = 1;
> +		mvdev->core_device.vdev.migration_flags =
> +			VFIO_MIGRATION_STOP_COPY |
> +			VFIO_MIGRATION_P2P;
> +		mutex_init(&mvdev->state_mutex);
> +		spin_lock_init(&mvdev->reset_lock);
>  	}
>  
>  	ret = vfio_pci_core_register_device(&mvdev->core_device);



* Re: [PATCH mlx5-next 0/5] Improve mlx5 live migration driver
  2022-05-04 13:29 ` [PATCH mlx5-next 0/5] Improve mlx5 live migration driver Yishai Hadas
@ 2022-05-04 20:19   ` Alex Williamson
  2022-05-04 21:33     ` Jason Gunthorpe
  0 siblings, 1 reply; 16+ messages in thread
From: Alex Williamson @ 2022-05-04 20:19 UTC (permalink / raw)
  To: Yishai Hadas; +Cc: jgg, saeedm, kvm, netdev, kuba, leonro, maorg, cohuck

On Wed, 4 May 2022 16:29:37 +0300
Yishai Hadas <yishaih@nvidia.com> wrote:

> On 27/04/2022 12:31, Yishai Hadas wrote:
> > This series improves mlx5 live migration driver in few aspects as of
> > below.
> >
> > Refactor to enable running migration commands in parallel over the PF
> > command interface.
> >
> > To achieve that we exposed from mlx5_core an API to let the VF be
> > notified before that the PF command interface goes down/up. (e.g. PF
> > reload upon health recovery).
> >
> > Once having the above functionality in place mlx5 vfio doesn't need any
> > more to obtain the global PF lock upon using the command interface but
> > can rely on the above mechanism to be in sync with the PF.
> >
> > This can enable parallel VFs migration over the PF command interface
> > from kernel driver point of view.
> >
> > In addition,
> > Moved to use the PF async command mode for the SAVE state command.
> > This enables returning earlier to user space upon issuing successfully
> > the command and improve latency by let things run in parallel.
> >
> > Alex, as this series touches mlx5_core we may need to send this in a
> > pull request format to VFIO to avoid conflicts before acceptance.
> >
> > Yishai
> >
> > Yishai Hadas (5):
> >    vfio/mlx5: Reorganize the VF is migratable code
> >    net/mlx5: Expose mlx5_sriov_blocking_notifier_register /  unregister
> >      APIs
> >    vfio/mlx5: Manage the VF attach/detach callback from the PF
> >    vfio/mlx5: Refactor to enable VFs migration in parallel
> >    vfio/mlx5: Run the SAVE state command in an async mode
> >
> >   .../net/ethernet/mellanox/mlx5/core/sriov.c   |  65 ++++-
> >   drivers/vfio/pci/mlx5/cmd.c                   | 229 +++++++++++++-----
> >   drivers/vfio/pci/mlx5/cmd.h                   |  50 +++-
> >   drivers/vfio/pci/mlx5/main.c                  | 133 +++++-----
> >   include/linux/mlx5/driver.h                   |  12 +
> >   5 files changed, 358 insertions(+), 131 deletions(-)
> >  
> Hi Alex,
> 
> Did you have a chance to look at the series? It touches mlx5 code
> (vfio, net), with no core changes.
> 
> This could go via your tree as a PR from mlx5-next once you are fine
> with it.

As Jason noted, the net/mlx5 changes seem confined to the 2nd patch,
which has no other dependencies in this series.  Is there something
else blocking committing that via the mlx tree and providing a branch
for the remainder to go in through the vfio tree?  Thanks,

Alex



* Re: [PATCH mlx5-next 3/5] vfio/mlx5: Manage the VF attach/detach callback from the PF
  2022-04-27  9:31 ` [PATCH mlx5-next 3/5] vfio/mlx5: Manage the VF attach/detach callback from the PF Yishai Hadas
@ 2022-05-04 20:34   ` Alex Williamson
  2022-05-08 13:04     ` Yishai Hadas
  0 siblings, 1 reply; 16+ messages in thread
From: Alex Williamson @ 2022-05-04 20:34 UTC (permalink / raw)
  To: Yishai Hadas; +Cc: jgg, saeedm, kvm, netdev, kuba, leonro, maorg, cohuck

On Wed, 27 Apr 2022 12:31:18 +0300
Yishai Hadas <yishaih@nvidia.com> wrote:

> Manage the VF attach/detach callback from the PF.
> 
> This lets the driver to enable parallel VFs migration as will be
> introduced in the next patch.
> 
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  drivers/vfio/pci/mlx5/cmd.c  | 59 +++++++++++++++++++++++++++++++++---
>  drivers/vfio/pci/mlx5/cmd.h  | 23 +++++++++++++-
>  drivers/vfio/pci/mlx5/main.c | 25 ++++-----------
>  3 files changed, 82 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
> index d608b8167f58..1f84d7b9b9e5 100644
> --- a/drivers/vfio/pci/mlx5/cmd.c
> +++ b/drivers/vfio/pci/mlx5/cmd.c
> @@ -71,21 +71,70 @@ int mlx5vf_cmd_query_vhca_migration_state(struct pci_dev *pdev, u16 vhca_id,
>  	return ret;
>  }
>  
> -bool mlx5vf_cmd_is_migratable(struct pci_dev *pdev)
> +static int mlx5fv_vf_event(struct notifier_block *nb,
> +			   unsigned long event, void *data)
>  {
> -	struct mlx5_core_dev *mdev = mlx5_vf_get_core_dev(pdev);
> +	struct mlx5vf_pci_core_device *mvdev =
> +		container_of(nb, struct mlx5vf_pci_core_device, nb);
> +
> +	mutex_lock(&mvdev->state_mutex);
> +	switch (event) {
> +	case MLX5_PF_NOTIFY_ENABLE_VF:
> +		mvdev->mdev_detach = false;
> +		break;
> +	case MLX5_PF_NOTIFY_DISABLE_VF:
> +		mvdev->mdev_detach = true;
> +		break;
> +	default:
> +		break;
> +	}
> +	mlx5vf_state_mutex_unlock(mvdev);
> +	return 0;
> +}
> +
> +void mlx5vf_cmd_remove_migratable(struct mlx5vf_pci_core_device *mvdev)
> +{
> +	mlx5_sriov_blocking_notifier_unregister(mvdev->mdev, mvdev->vf_id,
> +						&mvdev->nb);
> +}
> +
> +bool mlx5vf_cmd_is_migratable(struct mlx5vf_pci_core_device *mvdev)

Why did the original implementation take a pdev knowing we're going to
gut it in the next patch to use an mvdev?  The diff would be easier to
read.

There's also quite a lot of setup here now; it's no longer a simple
test of whether the device supports migration, which makes the name
misleading.  This looks like a "setup migration" function that should
return 0/-errno.

> +{
> +	struct pci_dev *pdev = mvdev->core_device.pdev;
>  	bool migratable = false;
> +	int ret;
>  
> -	if (!mdev)
> +	mvdev->mdev = mlx5_vf_get_core_dev(pdev);
> +	if (!mvdev->mdev)
>  		return false;
> +	if (!MLX5_CAP_GEN(mvdev->mdev, migration))
> +		goto end;
> +	mvdev->vf_id = pci_iov_vf_id(pdev);
> +	if (mvdev->vf_id < 0)
> +		goto end;
>  
> -	if (!MLX5_CAP_GEN(mdev, migration))
> +	mutex_init(&mvdev->state_mutex);
> +	spin_lock_init(&mvdev->reset_lock);
> +	mvdev->nb.notifier_call = mlx5fv_vf_event;
> +	ret = mlx5_sriov_blocking_notifier_register(mvdev->mdev, mvdev->vf_id,
> +						    &mvdev->nb);
> +	if (ret)
>  		goto end;
>  
> +	mutex_lock(&mvdev->state_mutex);
> +	if (mvdev->mdev_detach)
> +		goto unreg;
> +
> +	mlx5vf_state_mutex_unlock(mvdev);
>  	migratable = true;
> +	goto end;
>  
> +unreg:
> +	mlx5vf_state_mutex_unlock(mvdev);
> +	mlx5_sriov_blocking_notifier_unregister(mvdev->mdev, mvdev->vf_id,
> +						&mvdev->nb);
>  end:
> -	mlx5_vf_put_core_dev(mdev);
> +	mlx5_vf_put_core_dev(mvdev->mdev);
>  	return migratable;
>  }
>  
> diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
> index 2da6a1c0ec5c..f47174eab4b8 100644
> --- a/drivers/vfio/pci/mlx5/cmd.h
> +++ b/drivers/vfio/pci/mlx5/cmd.h
> @@ -7,6 +7,7 @@
>  #define MLX5_VFIO_CMD_H
>  
>  #include <linux/kernel.h>
> +#include <linux/vfio_pci_core.h>
>  #include <linux/mlx5/driver.h>
>  
>  struct mlx5_vf_migration_file {
> @@ -24,14 +25,34 @@ struct mlx5_vf_migration_file {
>  	unsigned long last_offset;
>  };
>  
> +struct mlx5vf_pci_core_device {
> +	struct vfio_pci_core_device core_device;
> +	int vf_id;
> +	u16 vhca_id;
> +	u8 migrate_cap:1;
> +	u8 deferred_reset:1;
> +	/* protect migration state */
> +	struct mutex state_mutex;
> +	enum vfio_device_mig_state mig_state;
> +	/* protect the reset_done flow */
> +	spinlock_t reset_lock;
> +	struct mlx5_vf_migration_file *resuming_migf;
> +	struct mlx5_vf_migration_file *saving_migf;
> +	struct notifier_block nb;
> +	struct mlx5_core_dev *mdev;
> +	u8 mdev_detach:1;
> +};
> +
>  int mlx5vf_cmd_suspend_vhca(struct pci_dev *pdev, u16 vhca_id, u16 op_mod);
>  int mlx5vf_cmd_resume_vhca(struct pci_dev *pdev, u16 vhca_id, u16 op_mod);
>  int mlx5vf_cmd_query_vhca_migration_state(struct pci_dev *pdev, u16 vhca_id,
>  					  size_t *state_size);
>  int mlx5vf_cmd_get_vhca_id(struct pci_dev *pdev, u16 function_id, u16 *vhca_id);
> -bool mlx5vf_cmd_is_migratable(struct pci_dev *pdev);
> +bool mlx5vf_cmd_is_migratable(struct mlx5vf_pci_core_device *mvdev);
> +void mlx5vf_cmd_remove_migratable(struct mlx5vf_pci_core_device *mvdev);
>  int mlx5vf_cmd_save_vhca_state(struct pci_dev *pdev, u16 vhca_id,
>  			       struct mlx5_vf_migration_file *migf);
>  int mlx5vf_cmd_load_vhca_state(struct pci_dev *pdev, u16 vhca_id,
>  			       struct mlx5_vf_migration_file *migf);
> +void mlx5vf_state_mutex_unlock(struct mlx5vf_pci_core_device *mvdev);
>  #endif /* MLX5_VFIO_CMD_H */
> diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
> index 2578f61eaeae..445c516d38d9 100644
> --- a/drivers/vfio/pci/mlx5/main.c
> +++ b/drivers/vfio/pci/mlx5/main.c
> @@ -17,7 +17,6 @@
>  #include <linux/uaccess.h>
>  #include <linux/vfio.h>
>  #include <linux/sched/mm.h>
> -#include <linux/vfio_pci_core.h>
>  #include <linux/anon_inodes.h>
>  
>  #include "cmd.h"
> @@ -25,20 +24,6 @@
>  /* Arbitrary to prevent userspace from consuming endless memory */
>  #define MAX_MIGRATION_SIZE (512*1024*1024)
>  
> -struct mlx5vf_pci_core_device {
> -	struct vfio_pci_core_device core_device;
> -	u16 vhca_id;
> -	u8 migrate_cap:1;
> -	u8 deferred_reset:1;
> -	/* protect migration state */
> -	struct mutex state_mutex;
> -	enum vfio_device_mig_state mig_state;
> -	/* protect the reset_done flow */
> -	spinlock_t reset_lock;
> -	struct mlx5_vf_migration_file *resuming_migf;
> -	struct mlx5_vf_migration_file *saving_migf;
> -};
> -
>  static struct page *
>  mlx5vf_get_migration_page(struct mlx5_vf_migration_file *migf,
>  			  unsigned long offset)
> @@ -444,7 +429,7 @@ mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev,
>   * This function is called in all state_mutex unlock cases to
>   * handle a 'deferred_reset' if exists.
>   */
> -static void mlx5vf_state_mutex_unlock(struct mlx5vf_pci_core_device *mvdev)
> +void mlx5vf_state_mutex_unlock(struct mlx5vf_pci_core_device *mvdev)
>  {
>  again:
>  	spin_lock(&mvdev->reset_lock);
> @@ -597,13 +582,11 @@ static int mlx5vf_pci_probe(struct pci_dev *pdev,
>  		return -ENOMEM;
>  	vfio_pci_core_init_device(&mvdev->core_device, pdev, &mlx5vf_pci_ops);
>  
> -	if (pdev->is_virtfn && mlx5vf_cmd_is_migratable(pdev)) {
> +	if (pdev->is_virtfn && mlx5vf_cmd_is_migratable(mvdev)) {
>  		mvdev->migrate_cap = 1;
>  		mvdev->core_device.vdev.migration_flags =
>  			VFIO_MIGRATION_STOP_COPY |
>  			VFIO_MIGRATION_P2P;

Why do these aspects of setting up migration remain here?  Do we even
need this new function to have a return value?  It looks like all of
this, and the pdev->is_virtfn test, could be pushed into the new
function, which could then return void.  Thanks,

Alex

> -		mutex_init(&mvdev->state_mutex);
> -		spin_lock_init(&mvdev->reset_lock);
>  	}
>  
>  	ret = vfio_pci_core_register_device(&mvdev->core_device);
> @@ -614,6 +597,8 @@ static int mlx5vf_pci_probe(struct pci_dev *pdev,
>  	return 0;
>  
>  out_free:
> +	if (mvdev->migrate_cap)
> +		mlx5vf_cmd_remove_migratable(mvdev);
>  	vfio_pci_core_uninit_device(&mvdev->core_device);
>  	kfree(mvdev);
>  	return ret;
> @@ -624,6 +609,8 @@ static void mlx5vf_pci_remove(struct pci_dev *pdev)
>  	struct mlx5vf_pci_core_device *mvdev = dev_get_drvdata(&pdev->dev);
>  
>  	vfio_pci_core_unregister_device(&mvdev->core_device);
> +	if (mvdev->migrate_cap)
> +		mlx5vf_cmd_remove_migratable(mvdev);
>  	vfio_pci_core_uninit_device(&mvdev->core_device);
>  	kfree(mvdev);
>  }


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH mlx5-next 0/5] Improve mlx5 live migration driver
  2022-05-04 20:19   ` Alex Williamson
@ 2022-05-04 21:33     ` Jason Gunthorpe
  2022-05-04 22:48       ` Alex Williamson
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2022-05-04 21:33 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Yishai Hadas, saeedm, kvm, netdev, kuba, leonro, maorg, cohuck

On Wed, May 04, 2022 at 02:19:19PM -0600, Alex Williamson wrote:

> > This may apparently go via your tree as a PR from mlx5-next once you
> > are fine with it.
> 
> As Jason noted, the net/mlx5 changes seem confined to the 2nd patch,
> which has no other dependencies in this series.  Is there something
> else blocking committing that via the mlx tree and providing a branch
> for the remainder to go in through the vfio tree?  Thanks,

Our process is to not add dead code to our non-rebasing branches until
we have an ack on the consumer patches.

So you can get a PR from Leon with everything sorted out including the
VFIO bits, or you can get a PR from Leon with just the shared branch,
after you say OK.

Jason


* Re: [PATCH mlx5-next 0/5] Improve mlx5 live migration driver
  2022-05-04 21:33     ` Jason Gunthorpe
@ 2022-05-04 22:48       ` Alex Williamson
  2022-05-05  5:38         ` Leon Romanovsky
  0 siblings, 1 reply; 16+ messages in thread
From: Alex Williamson @ 2022-05-04 22:48 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yishai Hadas, saeedm, kvm, netdev, kuba, leonro, maorg, cohuck

On Wed, 4 May 2022 18:33:09 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Wed, May 04, 2022 at 02:19:19PM -0600, Alex Williamson wrote:
> 
> > > This may apparently go via your tree as a PR from mlx5-next once you
> > > are fine with it.  
> > 
> > As Jason noted, the net/mlx5 changes seem confined to the 2nd patch,
> > which has no other dependencies in this series.  Is there something
> > else blocking committing that via the mlx tree and providing a branch
> > for the remainder to go in through the vfio tree?  Thanks,  
> 
> Our process is to not add dead code to our non-rebasing branches until
> we have an ack on the consumer patches.
> 
> So you can get a PR from Leon with everything sorted out including the
> VFIO bits, or you can get a PR from Leon with just the shared branch,
> after you say OK.

As long as Leon wants to wait for some acks in the former case, I'm fine
with either, but I don't expect to be able to shoot down the premise of
the series.  You folks are the experts on how your device works and there
are no API changes on the vfio side for me to critique here.  Thanks,

Alex



* Re: [PATCH mlx5-next 0/5] Improve mlx5 live migration driver
  2022-05-04 22:48       ` Alex Williamson
@ 2022-05-05  5:38         ` Leon Romanovsky
  0 siblings, 0 replies; 16+ messages in thread
From: Leon Romanovsky @ 2022-05-05  5:38 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Jason Gunthorpe, Yishai Hadas, saeedm, kvm, netdev, kuba, maorg, cohuck

On Wed, May 04, 2022 at 04:48:17PM -0600, Alex Williamson wrote:
> On Wed, 4 May 2022 18:33:09 -0300
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > On Wed, May 04, 2022 at 02:19:19PM -0600, Alex Williamson wrote:
> > 
> > > > This may apparently go via your tree as a PR from mlx5-next once you
> > > > are fine with it.  
> > > 
> > > As Jason noted, the net/mlx5 changes seem confined to the 2nd patch,
> > > which has no other dependencies in this series.  Is there something
> > > else blocking committing that via the mlx tree and providing a branch
> > > for the remainder to go in through the vfio tree?  Thanks,  
> > 
> > Our process is to not add dead code to our non-rebasing branches until
> > we have an ack on the consumer patches.
> > 
> > So you can get a PR from Leon with everything sorted out including the
> > VFIO bits, or you can get a PR from Leon with just the shared branch,
> > after you say OK.
> 
> As long as Leon wants to wait for some acks in the former case, I'm fine
> with either, but I don't expect to be able to shoot down the premise of
> the series.  You folks are the experts on how your device works and there
> are no API changes on the vfio side for me to critique here.  Thanks,

I will prepare PR on Sunday/Monday.

Thanks

> 
> Alex
> 


* Re: [PATCH mlx5-next 1/5] vfio/mlx5: Reorganize the VF is migratable code
  2022-05-04 20:13   ` Alex Williamson
@ 2022-05-08 12:56     ` Yishai Hadas
  0 siblings, 0 replies; 16+ messages in thread
From: Yishai Hadas @ 2022-05-08 12:56 UTC (permalink / raw)
  To: Alex Williamson; +Cc: jgg, saeedm, kvm, netdev, kuba, leonro, maorg, cohuck

On 04/05/2022 23:13, Alex Williamson wrote:
> On Wed, 27 Apr 2022 12:31:16 +0300
> Yishai Hadas <yishaih@nvidia.com> wrote:
>
>> Reorganize the VF is migratable code to be in a separate function, next
>> patches from the series may use this.
>>
>> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
>> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
>> ---
>>   drivers/vfio/pci/mlx5/cmd.c  | 18 ++++++++++++++++++
>>   drivers/vfio/pci/mlx5/cmd.h  |  1 +
>>   drivers/vfio/pci/mlx5/main.c | 22 +++++++---------------
>>   3 files changed, 26 insertions(+), 15 deletions(-)
>>
>> diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
>> index 5c9f9218cc1d..d608b8167f58 100644
>> --- a/drivers/vfio/pci/mlx5/cmd.c
>> +++ b/drivers/vfio/pci/mlx5/cmd.c
>> @@ -71,6 +71,24 @@ int mlx5vf_cmd_query_vhca_migration_state(struct pci_dev *pdev, u16 vhca_id,
>>   	return ret;
>>   }
>>   
>> +bool mlx5vf_cmd_is_migratable(struct pci_dev *pdev)
>> +{
>> +	struct mlx5_core_dev *mdev = mlx5_vf_get_core_dev(pdev);
>> +	bool migratable = false;
>> +
>> +	if (!mdev)
>> +		return false;
>> +
>> +	if (!MLX5_CAP_GEN(mdev, migration))
>> +		goto end;
>> +
>> +	migratable = true;
>> +
>> +end:
>> +	mlx5_vf_put_core_dev(mdev);
>> +	return migratable;
>> +}
> This goto seems unnecessary, couldn't it instead be written:
>
> {
> 	struct mlx5_core_dev *mdev = mlx5_vf_get_core_dev(pdev);
> 	bool migratable = true;
>
> 	if (!mdev)
> 		return false;
>
> 	if (!MLX5_CAP_GEN(mdev, migration))
> 		migratable = false;
>
> 	mlx5_vf_put_core_dev(mdev);
> 	return migratable;
> }
>
> Thanks,
> Alex


V1 will handle that as part of some refactoring, combining this patch 
and patch #3 based on your notes there.

Thanks.

>
>> +
>>   int mlx5vf_cmd_get_vhca_id(struct pci_dev *pdev, u16 function_id, u16 *vhca_id)
>>   {
>>   	struct mlx5_core_dev *mdev = mlx5_vf_get_core_dev(pdev);
>> diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
>> index 1392a11a9cc0..2da6a1c0ec5c 100644
>> --- a/drivers/vfio/pci/mlx5/cmd.h
>> +++ b/drivers/vfio/pci/mlx5/cmd.h
>> @@ -29,6 +29,7 @@ int mlx5vf_cmd_resume_vhca(struct pci_dev *pdev, u16 vhca_id, u16 op_mod);
>>   int mlx5vf_cmd_query_vhca_migration_state(struct pci_dev *pdev, u16 vhca_id,
>>   					  size_t *state_size);
>>   int mlx5vf_cmd_get_vhca_id(struct pci_dev *pdev, u16 function_id, u16 *vhca_id);
>> +bool mlx5vf_cmd_is_migratable(struct pci_dev *pdev);
>>   int mlx5vf_cmd_save_vhca_state(struct pci_dev *pdev, u16 vhca_id,
>>   			       struct mlx5_vf_migration_file *migf);
>>   int mlx5vf_cmd_load_vhca_state(struct pci_dev *pdev, u16 vhca_id,
>> diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
>> index bbec5d288fee..2578f61eaeae 100644
>> --- a/drivers/vfio/pci/mlx5/main.c
>> +++ b/drivers/vfio/pci/mlx5/main.c
>> @@ -597,21 +597,13 @@ static int mlx5vf_pci_probe(struct pci_dev *pdev,
>>   		return -ENOMEM;
>>   	vfio_pci_core_init_device(&mvdev->core_device, pdev, &mlx5vf_pci_ops);
>>   
>> -	if (pdev->is_virtfn) {
>> -		struct mlx5_core_dev *mdev =
>> -			mlx5_vf_get_core_dev(pdev);
>> -
>> -		if (mdev) {
>> -			if (MLX5_CAP_GEN(mdev, migration)) {
>> -				mvdev->migrate_cap = 1;
>> -				mvdev->core_device.vdev.migration_flags =
>> -					VFIO_MIGRATION_STOP_COPY |
>> -					VFIO_MIGRATION_P2P;
>> -				mutex_init(&mvdev->state_mutex);
>> -				spin_lock_init(&mvdev->reset_lock);
>> -			}
>> -			mlx5_vf_put_core_dev(mdev);
>> -		}
>> +	if (pdev->is_virtfn && mlx5vf_cmd_is_migratable(pdev)) {
>> +		mvdev->migrate_cap = 1;
>> +		mvdev->core_device.vdev.migration_flags =
>> +			VFIO_MIGRATION_STOP_COPY |
>> +			VFIO_MIGRATION_P2P;
>> +		mutex_init(&mvdev->state_mutex);
>> +		spin_lock_init(&mvdev->reset_lock);
>>   	}
>>   
>>   	ret = vfio_pci_core_register_device(&mvdev->core_device);




* Re: [PATCH mlx5-next 3/5] vfio/mlx5: Manage the VF attach/detach callback from the PF
  2022-05-04 20:34   ` Alex Williamson
@ 2022-05-08 13:04     ` Yishai Hadas
  0 siblings, 0 replies; 16+ messages in thread
From: Yishai Hadas @ 2022-05-08 13:04 UTC (permalink / raw)
  To: Alex Williamson; +Cc: jgg, saeedm, kvm, netdev, kuba, leonro, maorg, cohuck

On 04/05/2022 23:34, Alex Williamson wrote:
> On Wed, 27 Apr 2022 12:31:18 +0300
> Yishai Hadas <yishaih@nvidia.com> wrote:
>
>> Manage the VF attach/detach callback from the PF.
>>
>> This lets the driver enable parallel VF migration, as will be
>> introduced in the next patch.
>>
>> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
>> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
>> ---
>>   drivers/vfio/pci/mlx5/cmd.c  | 59 +++++++++++++++++++++++++++++++++---
>>   drivers/vfio/pci/mlx5/cmd.h  | 23 +++++++++++++-
>>   drivers/vfio/pci/mlx5/main.c | 25 ++++-----------
>>   3 files changed, 82 insertions(+), 25 deletions(-)
>>
>> diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
>> index d608b8167f58..1f84d7b9b9e5 100644
>> --- a/drivers/vfio/pci/mlx5/cmd.c
>> +++ b/drivers/vfio/pci/mlx5/cmd.c
>> @@ -71,21 +71,70 @@ int mlx5vf_cmd_query_vhca_migration_state(struct pci_dev *pdev, u16 vhca_id,
>>   	return ret;
>>   }
>>   
>> -bool mlx5vf_cmd_is_migratable(struct pci_dev *pdev)
>> +static int mlx5fv_vf_event(struct notifier_block *nb,
>> +			   unsigned long event, void *data)
>>   {
>> -	struct mlx5_core_dev *mdev = mlx5_vf_get_core_dev(pdev);
>> +	struct mlx5vf_pci_core_device *mvdev =
>> +		container_of(nb, struct mlx5vf_pci_core_device, nb);
>> +
>> +	mutex_lock(&mvdev->state_mutex);
>> +	switch (event) {
>> +	case MLX5_PF_NOTIFY_ENABLE_VF:
>> +		mvdev->mdev_detach = false;
>> +		break;
>> +	case MLX5_PF_NOTIFY_DISABLE_VF:
>> +		mvdev->mdev_detach = true;
>> +		break;
>> +	default:
>> +		break;
>> +	}
>> +	mlx5vf_state_mutex_unlock(mvdev);
>> +	return 0;
>> +}
>> +
>> +void mlx5vf_cmd_remove_migratable(struct mlx5vf_pci_core_device *mvdev)
>> +{
>> +	mlx5_sriov_blocking_notifier_unregister(mvdev->mdev, mvdev->vf_id,
>> +						&mvdev->nb);
>> +}
>> +
>> +bool mlx5vf_cmd_is_migratable(struct mlx5vf_pci_core_device *mvdev)
> Why did the original implementation take a pdev knowing we're going to
> gut it in the next patch to use an mvdev?  The diff would be easier to
> read.


Agree, in V1 I'll just combine this patch with the changes from patch #1.


>
> There's also quite a lot of setup here now; it's no longer a simple
> test whether the device supports migration which makes the name
> misleading.  This looks like a "setup migration" function that should
> return 0/-errno.


Thanks, makes sense.
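
For reference, a "setup migration" function returning 0/-errno could take roughly this shape (userspace sketch; mock types, stand-in names, and the vhca/notifier steps elided):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock of a setup function returning 0/-errno; every name is a
 * stand-in for the real mlx5 calls, not the actual API. */
struct mock_core_dev { bool cap_migration; };

struct mock_vf {
	struct mock_core_dev *mdev;
	int vf_id;
	bool migrate_cap;
};

static struct mock_core_dev mock_pf = { .cap_migration = true };

/* Stand-in for mlx5_vf_get_core_dev(): NULL when no PF is bound. */
static struct mock_core_dev *mock_get_core_dev(bool is_virtfn)
{
	return is_virtfn ? &mock_pf : NULL;
}

static int mock_setup_migration(struct mock_vf *vf, bool is_virtfn)
{
	vf->mdev = mock_get_core_dev(is_virtfn);
	if (!vf->mdev)
		return -ENODEV;
	if (!vf->mdev->cap_migration)
		return -EOPNOTSUPP;
	vf->vf_id = 0;		/* pci_iov_vf_id() stand-in */
	vf->migrate_cap = true;
	return 0;
}
```

The caller can then distinguish "no PF bound" from "no migration capability" instead of collapsing both into a bool.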

>> +{
>> +	struct pci_dev *pdev = mvdev->core_device.pdev;
>>   	bool migratable = false;
>> +	int ret;
>>   
>> -	if (!mdev)
>> +	mvdev->mdev = mlx5_vf_get_core_dev(pdev);
>> +	if (!mvdev->mdev)
>>   		return false;
>> +	if (!MLX5_CAP_GEN(mvdev->mdev, migration))
>> +		goto end;
>> +	mvdev->vf_id = pci_iov_vf_id(pdev);
>> +	if (mvdev->vf_id < 0)
>> +		goto end;
>>   
>> -	if (!MLX5_CAP_GEN(mdev, migration))
>> +	mutex_init(&mvdev->state_mutex);
>> +	spin_lock_init(&mvdev->reset_lock);
>> +	mvdev->nb.notifier_call = mlx5fv_vf_event;
>> +	ret = mlx5_sriov_blocking_notifier_register(mvdev->mdev, mvdev->vf_id,
>> +						    &mvdev->nb);
>> +	if (ret)
>>   		goto end;
>>   
>> +	mutex_lock(&mvdev->state_mutex);
>> +	if (mvdev->mdev_detach)
>> +		goto unreg;
>> +
>> +	mlx5vf_state_mutex_unlock(mvdev);
>>   	migratable = true;
>> +	goto end;
>>   
>> +unreg:
>> +	mlx5vf_state_mutex_unlock(mvdev);
>> +	mlx5_sriov_blocking_notifier_unregister(mvdev->mdev, mvdev->vf_id,
>> +						&mvdev->nb);
>>   end:
>> -	mlx5_vf_put_core_dev(mdev);
>> +	mlx5_vf_put_core_dev(mvdev->mdev);
>>   	return migratable;
>>   }
>>   
>> diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
>> index 2da6a1c0ec5c..f47174eab4b8 100644
>> --- a/drivers/vfio/pci/mlx5/cmd.h
>> +++ b/drivers/vfio/pci/mlx5/cmd.h
>> @@ -7,6 +7,7 @@
>>   #define MLX5_VFIO_CMD_H
>>   
>>   #include <linux/kernel.h>
>> +#include <linux/vfio_pci_core.h>
>>   #include <linux/mlx5/driver.h>
>>   
>>   struct mlx5_vf_migration_file {
>> @@ -24,14 +25,34 @@ struct mlx5_vf_migration_file {
>>   	unsigned long last_offset;
>>   };
>>   
>> +struct mlx5vf_pci_core_device {
>> +	struct vfio_pci_core_device core_device;
>> +	int vf_id;
>> +	u16 vhca_id;
>> +	u8 migrate_cap:1;
>> +	u8 deferred_reset:1;
>> +	/* protect migration state */
>> +	struct mutex state_mutex;
>> +	enum vfio_device_mig_state mig_state;
>> +	/* protect the reset_done flow */
>> +	spinlock_t reset_lock;
>> +	struct mlx5_vf_migration_file *resuming_migf;
>> +	struct mlx5_vf_migration_file *saving_migf;
>> +	struct notifier_block nb;
>> +	struct mlx5_core_dev *mdev;
>> +	u8 mdev_detach:1;
>> +};
>> +
>>   int mlx5vf_cmd_suspend_vhca(struct pci_dev *pdev, u16 vhca_id, u16 op_mod);
>>   int mlx5vf_cmd_resume_vhca(struct pci_dev *pdev, u16 vhca_id, u16 op_mod);
>>   int mlx5vf_cmd_query_vhca_migration_state(struct pci_dev *pdev, u16 vhca_id,
>>   					  size_t *state_size);
>>   int mlx5vf_cmd_get_vhca_id(struct pci_dev *pdev, u16 function_id, u16 *vhca_id);
>> -bool mlx5vf_cmd_is_migratable(struct pci_dev *pdev);
>> +bool mlx5vf_cmd_is_migratable(struct mlx5vf_pci_core_device *mvdev);
>> +void mlx5vf_cmd_remove_migratable(struct mlx5vf_pci_core_device *mvdev);
>>   int mlx5vf_cmd_save_vhca_state(struct pci_dev *pdev, u16 vhca_id,
>>   			       struct mlx5_vf_migration_file *migf);
>>   int mlx5vf_cmd_load_vhca_state(struct pci_dev *pdev, u16 vhca_id,
>>   			       struct mlx5_vf_migration_file *migf);
>> +void mlx5vf_state_mutex_unlock(struct mlx5vf_pci_core_device *mvdev);
>>   #endif /* MLX5_VFIO_CMD_H */
>> diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
>> index 2578f61eaeae..445c516d38d9 100644
>> --- a/drivers/vfio/pci/mlx5/main.c
>> +++ b/drivers/vfio/pci/mlx5/main.c
>> @@ -17,7 +17,6 @@
>>   #include <linux/uaccess.h>
>>   #include <linux/vfio.h>
>>   #include <linux/sched/mm.h>
>> -#include <linux/vfio_pci_core.h>
>>   #include <linux/anon_inodes.h>
>>   
>>   #include "cmd.h"
>> @@ -25,20 +24,6 @@
>>   /* Arbitrary to prevent userspace from consuming endless memory */
>>   #define MAX_MIGRATION_SIZE (512*1024*1024)
>>   
>> -struct mlx5vf_pci_core_device {
>> -	struct vfio_pci_core_device core_device;
>> -	u16 vhca_id;
>> -	u8 migrate_cap:1;
>> -	u8 deferred_reset:1;
>> -	/* protect migration state */
>> -	struct mutex state_mutex;
>> -	enum vfio_device_mig_state mig_state;
>> -	/* protect the reset_done flow */
>> -	spinlock_t reset_lock;
>> -	struct mlx5_vf_migration_file *resuming_migf;
>> -	struct mlx5_vf_migration_file *saving_migf;
>> -};
>> -
>>   static struct page *
>>   mlx5vf_get_migration_page(struct mlx5_vf_migration_file *migf,
>>   			  unsigned long offset)
>> @@ -444,7 +429,7 @@ mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev,
>>    * This function is called in all state_mutex unlock cases to
>>    * handle a 'deferred_reset' if exists.
>>    */
>> -static void mlx5vf_state_mutex_unlock(struct mlx5vf_pci_core_device *mvdev)
>> +void mlx5vf_state_mutex_unlock(struct mlx5vf_pci_core_device *mvdev)
>>   {
>>   again:
>>   	spin_lock(&mvdev->reset_lock);
>> @@ -597,13 +582,11 @@ static int mlx5vf_pci_probe(struct pci_dev *pdev,
>>   		return -ENOMEM;
>>   	vfio_pci_core_init_device(&mvdev->core_device, pdev, &mlx5vf_pci_ops);
>>   
>> -	if (pdev->is_virtfn && mlx5vf_cmd_is_migratable(pdev)) {
>> +	if (pdev->is_virtfn && mlx5vf_cmd_is_migratable(mvdev)) {
>>   		mvdev->migrate_cap = 1;
>>   		mvdev->core_device.vdev.migration_flags =
>>   			VFIO_MIGRATION_STOP_COPY |
>>   			VFIO_MIGRATION_P2P;
> Why do these aspects of setting up migration remain here?  Do we even
> need this new function to have a return value?  It looks like all of
> this and testing whether the pdev->is_virtfn could be pushed into the
> new function, which could then return void.  Thanks,


Makes sense, will be part of V1, thanks.


> Alex
>
>> -		mutex_init(&mvdev->state_mutex);
>> -		spin_lock_init(&mvdev->reset_lock);
>>   	}
>>   
>>   	ret = vfio_pci_core_register_device(&mvdev->core_device);
>> @@ -614,6 +597,8 @@ static int mlx5vf_pci_probe(struct pci_dev *pdev,
>>   	return 0;
>>   
>>   out_free:
>> +	if (mvdev->migrate_cap)
>> +		mlx5vf_cmd_remove_migratable(mvdev);
>>   	vfio_pci_core_uninit_device(&mvdev->core_device);
>>   	kfree(mvdev);
>>   	return ret;
>> @@ -624,6 +609,8 @@ static void mlx5vf_pci_remove(struct pci_dev *pdev)
>>   	struct mlx5vf_pci_core_device *mvdev = dev_get_drvdata(&pdev->dev);
>>   
>>   	vfio_pci_core_unregister_device(&mvdev->core_device);
>> +	if (mvdev->migrate_cap)
>> +		mlx5vf_cmd_remove_migratable(mvdev);
>>   	vfio_pci_core_uninit_device(&mvdev->core_device);
>>   	kfree(mvdev);
>>   }
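
Stripped of the kernel plumbing, the enable/disable notifier quoted above is a small state machine; a userspace mock (hypothetical names, no real mlx5 or notifier-chain calls):

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace mock of the PF->VF blocking notifier: the callback's only
 * job is to track, under the migration lock, whether the PF side of
 * the command interface is usable.  All names are stand-ins. */
enum mock_pf_event {
	MOCK_PF_NOTIFY_ENABLE_VF,
	MOCK_PF_NOTIFY_DISABLE_VF,
};

struct mock_vfdev {
	bool mdev_detach;	/* true: PF command interface unusable */
};

static int mock_vf_event(struct mock_vfdev *mvdev, enum mock_pf_event event)
{
	/* the real callback brackets this with state_mutex lock/unlock */
	switch (event) {
	case MOCK_PF_NOTIFY_ENABLE_VF:
		mvdev->mdev_detach = false;
		break;
	case MOCK_PF_NOTIFY_DISABLE_VF:
		mvdev->mdev_detach = true;
		break;
	}
	return 0;
}

/* Migration commands first check the flag instead of taking a global
 * PF lock, which is what allows VFs to migrate in parallel. */
static bool mock_can_issue_cmd(const struct mock_vfdev *mvdev)
{
	return !mvdev->mdev_detach;
}
```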




end of thread, other threads:[~2022-05-08 13:04 UTC | newest]

Thread overview: 16+ messages
2022-04-27  9:31 [PATCH mlx5-next 0/5] Improve mlx5 live migration driver Yishai Hadas
2022-04-27  9:31 ` [PATCH mlx5-next 1/5] vfio/mlx5: Reorganize the VF is migratable code Yishai Hadas
2022-05-04 20:13   ` Alex Williamson
2022-05-08 12:56     ` Yishai Hadas
2022-04-27  9:31 ` [PATCH mlx5-next 2/5] net/mlx5: Expose mlx5_sriov_blocking_notifier_register / unregister APIs Yishai Hadas
2022-05-04 13:55   ` Jason Gunthorpe
2022-04-27  9:31 ` [PATCH mlx5-next 3/5] vfio/mlx5: Manage the VF attach/detach callback from the PF Yishai Hadas
2022-05-04 20:34   ` Alex Williamson
2022-05-08 13:04     ` Yishai Hadas
2022-04-27  9:31 ` [PATCH mlx5-next 4/5] vfio/mlx5: Refactor to enable VFs migration in parallel Yishai Hadas
2022-04-27  9:31 ` [PATCH mlx5-next 5/5] vfio/mlx5: Run the SAVE state command in an async mode Yishai Hadas
2022-05-04 13:29 ` [PATCH mlx5-next 0/5] Improve mlx5 live migration driver Yishai Hadas
2022-05-04 20:19   ` Alex Williamson
2022-05-04 21:33     ` Jason Gunthorpe
2022-05-04 22:48       ` Alex Williamson
2022-05-05  5:38         ` Leon Romanovsky
