* [PATCH vfio 0/3] Few improvements in the migration area of mlx5 driver
From: Yishai Hadas @ 2023-01-24 14:49 UTC
  To: alex.williamson, jgg
  Cc: kvm, kevin.tian, joao.m.martins, leonro, shayd, yishaih, maorg, avihaih

This series includes a few improvements to the mlx5 driver in the
migration area, as described below.

The first patch adds an early check of whether the VF was configured to
support migration, and reports this capability accordingly.

This is a complementary patch to series [1], which was accepted in the
previous kernel cycle in the devlink/mlx5_core area and requires a
specific setting for a VF to become migration capable.

The other two patches improve the PRE_COPY flow on both the source and
the target by preparing ahead of time the resources (i.e. data buffers,
MKEY) that are needed upon STOP_COPY, thereby reducing the migration
downtime.

[1] https://www.spinics.net/lists/netdev/msg867066.html

Yishai

Shay Drory (1):
  vfio/mlx5: Check whether VF is migratable

Yishai Hadas (2):
  vfio/mlx5: Improve the source side flow upon pre_copy
  vfio/mlx5: Improve the target side flow to reduce downtime

 drivers/vfio/pci/mlx5/cmd.c  |  58 +++++++--
 drivers/vfio/pci/mlx5/cmd.h  |  28 +++-
 drivers/vfio/pci/mlx5/main.c | 244 ++++++++++++++++++++++++++++++-----
 3 files changed, 284 insertions(+), 46 deletions(-)

-- 
2.18.1



* [PATCH vfio 1/3] vfio/mlx5: Check whether VF is migratable
From: Yishai Hadas @ 2023-01-24 14:49 UTC
  To: alex.williamson, jgg
  Cc: kvm, kevin.tian, joao.m.martins, leonro, shayd, yishaih, maorg, avihaih

From: Shay Drory <shayd@nvidia.com>

Add a check of whether the VF is migratable, and mark the VFIO device
as migration capable only if it is.
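
For context, this capability surfaces to userspace through the standard
VFIO feature probe: when the check fails, the device simply does not
report VFIO_DEVICE_FEATURE_MIGRATION. A minimal userspace sketch of
such a probe (the helper below is hypothetical; the uAPI itself is
standard VFIO):

#include <linux/vfio.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Query the migration feature flags of a VFIO device fd; fails when
 * the device was not marked migration capable (e.g. a VF that was not
 * configured as migratable). */
static int migration_flags(int device_fd, uint64_t *flags)
{
        struct {
                struct vfio_device_feature feat;
                struct vfio_device_feature_migration mig;
        } q;

        memset(&q, 0, sizeof(q));
        q.feat.argsz = sizeof(q);
        q.feat.flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_MIGRATION;
        if (ioctl(device_fd, VFIO_DEVICE_FEATURE, &q.feat))
                return -1;

        *flags = q.mig.flags;   /* e.g. VFIO_MIGRATION_STOP_COPY | ... */
        return 0;
}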

Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/vfio/pci/mlx5/cmd.c | 27 +++++++++++++++++++++++++++
 drivers/vfio/pci/mlx5/cmd.h |  1 +
 2 files changed, 28 insertions(+)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index 0586f09c69af..e956e79626b7 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -7,6 +7,29 @@
 
 enum { CQ_OK = 0, CQ_EMPTY = -1, CQ_POLL_ERR = -2 };
 
+static int mlx5vf_is_migratable(struct mlx5_core_dev *mdev, u16 func_id)
+{
+	int query_sz = MLX5_ST_SZ_BYTES(query_hca_cap_out);
+	void *query_cap = NULL, *cap;
+	int ret;
+
+	query_cap = kzalloc(query_sz, GFP_KERNEL);
+	if (!query_cap)
+		return -ENOMEM;
+
+	ret = mlx5_vport_get_other_func_cap(mdev, func_id, query_cap,
+					    MLX5_CAP_GENERAL_2);
+	if (ret)
+		goto out;
+
+	cap = MLX5_ADDR_OF(query_hca_cap_out, query_cap, capability);
+	if (!MLX5_GET(cmd_hca_cap_2, cap, migratable))
+		ret = -EOPNOTSUPP;
+out:
+	kfree(query_cap);
+	return ret;
+}
+
 static int mlx5vf_cmd_get_vhca_id(struct mlx5_core_dev *mdev, u16 function_id,
 				  u16 *vhca_id);
 static void
@@ -195,6 +218,10 @@ void mlx5vf_cmd_set_migratable(struct mlx5vf_pci_core_device *mvdev,
 	if (mvdev->vf_id < 0)
 		goto end;
 
+	ret = mlx5vf_is_migratable(mvdev->mdev, mvdev->vf_id + 1);
+	if (ret)
+		goto end;
+
 	if (mlx5vf_cmd_get_vhca_id(mvdev->mdev, mvdev->vf_id + 1,
 				   &mvdev->vhca_id))
 		goto end;
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index 5483171d57ad..657d94affe2b 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -9,6 +9,7 @@
 #include <linux/kernel.h>
 #include <linux/vfio_pci_core.h>
 #include <linux/mlx5/driver.h>
+#include <linux/mlx5/vport.h>
 #include <linux/mlx5/cq.h>
 #include <linux/mlx5/qp.h>
 
-- 
2.18.1



* [PATCH vfio 2/3] vfio/mlx5: Improve the source side flow upon pre_copy
From: Yishai Hadas @ 2023-01-24 14:49 UTC
  To: alex.williamson, jgg
  Cc: kvm, kevin.tian, joao.m.martins, leonro, shayd, yishaih, maorg, avihaih

Improve the source side flow upon pre_copy as described below.

- Prepare the stop_copy buffers as part of moving to pre_copy.
- Send the target a record that includes the expected stop_copy size,
  letting it optimize its stop_copy flow as well.

To send the target this new record type (i.e.
MLX5_MIGF_HEADER_TAG_STOP_COPY_SIZE), split the current 64 header flag
bits into 32 flag bits and 32 tag bits. Each record now carries a tag,
plus a flag marking it as optional or mandatory; optional records that
the target does not recognize are ignored.

The above reduces the downtime upon stop_copy, as the relevant
resources are prepared ahead of time as part of pre_copy.
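
To illustrate the resulting stream format, here is a minimal
userspace-style sketch of a record walker (not the driver code; the
constants mirror those added to cmd.h below, while the parsing loop
itself is illustrative):

#include <endian.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors struct mlx5_vf_migration_header: 16 bytes on the wire. */
struct mig_header {
        uint64_t record_size;   /* little-endian */
        uint32_t flags;         /* mlx5_vf_migf_header_flags */
        uint32_t tag;           /* mlx5_vf_migf_header_tag */
};

enum { TAG_FW_DATA = 0, TAG_STOP_COPY_SIZE = 1 << 0 };
enum { FLAGS_TAG_OPTIONAL = 1 << 0 };

static int parse_records(const uint8_t *buf, size_t len)
{
        while (len >= sizeof(struct mig_header)) {
                struct mig_header h;
                uint64_t size;

                memcpy(&h, buf, sizeof(h));
                size = le64toh(h.record_size);
                buf += sizeof(h);
                len -= sizeof(h);
                if (size > len)
                        return -1;      /* truncated stream */

                switch (le32toh(h.tag)) {
                case TAG_FW_DATA:
                        /* "size" bytes of device image data follow */
                        break;
                case TAG_STOP_COPY_SIZE:
                        /* 8-byte LE hint with the expected stop_copy size */
                        break;
                default:
                        /* unknown records: skip if optional, else fail */
                        if (!(le32toh(h.flags) & FLAGS_TAG_OPTIONAL))
                                return -1;
                        break;
                }
                buf += size;
                len -= size;
        }
        return len ? -1 : 0;
}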

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/vfio/pci/mlx5/cmd.c  |  31 +++++---
 drivers/vfio/pci/mlx5/cmd.h  |  21 +++++-
 drivers/vfio/pci/mlx5/main.c | 133 +++++++++++++++++++++++++++++------
 3 files changed, 151 insertions(+), 34 deletions(-)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index e956e79626b7..5161d845c478 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -500,7 +500,7 @@ void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work)
 }
 
 static int add_buf_header(struct mlx5_vhca_data_buffer *header_buf,
-			  size_t image_size)
+			  size_t image_size, bool initial_pre_copy)
 {
 	struct mlx5_vf_migration_file *migf = header_buf->migf;
 	struct mlx5_vf_migration_header header = {};
@@ -508,7 +508,9 @@ static int add_buf_header(struct mlx5_vhca_data_buffer *header_buf,
 	struct page *page;
 	u8 *to_buff;
 
-	header.image_size = cpu_to_le64(image_size);
+	header.record_size = cpu_to_le64(image_size);
+	header.flags = cpu_to_le32(MLX5_MIGF_HEADER_FLAGS_TAG_MANDATORY);
+	header.tag = cpu_to_le32(MLX5_MIGF_HEADER_TAG_FW_DATA);
 	page = mlx5vf_get_migration_page(header_buf, 0);
 	if (!page)
 		return -EINVAL;
@@ -516,12 +518,13 @@ static int add_buf_header(struct mlx5_vhca_data_buffer *header_buf,
 	memcpy(to_buff, &header, sizeof(header));
 	kunmap_local(to_buff);
 	header_buf->length = sizeof(header);
-	header_buf->header_image_size = image_size;
 	header_buf->start_pos = header_buf->migf->max_pos;
 	migf->max_pos += header_buf->length;
 	spin_lock_irqsave(&migf->list_lock, flags);
 	list_add_tail(&header_buf->buf_elm, &migf->buf_list);
 	spin_unlock_irqrestore(&migf->list_lock, flags);
+	if (initial_pre_copy)
+		migf->pre_copy_initial_bytes += sizeof(header);
 	return 0;
 }
 
@@ -535,11 +538,14 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context)
 	if (!status) {
 		size_t image_size;
 		unsigned long flags;
+		bool initial_pre_copy = migf->state != MLX5_MIGF_STATE_PRE_COPY &&
+				!async_data->last_chunk;
 
 		image_size = MLX5_GET(save_vhca_state_out, async_data->out,
 				      actual_image_size);
 		if (async_data->header_buf) {
-			status = add_buf_header(async_data->header_buf, image_size);
+			status = add_buf_header(async_data->header_buf, image_size,
+						initial_pre_copy);
 			if (status)
 				goto err;
 		}
@@ -549,6 +555,8 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context)
 		spin_lock_irqsave(&migf->list_lock, flags);
 		list_add_tail(&async_data->buf->buf_elm, &migf->buf_list);
 		spin_unlock_irqrestore(&migf->list_lock, flags);
+		if (initial_pre_copy)
+			migf->pre_copy_initial_bytes += image_size;
 		migf->state = async_data->last_chunk ?
 			MLX5_MIGF_STATE_COMPLETE : MLX5_MIGF_STATE_PRE_COPY;
 		wake_up_interruptible(&migf->poll_wait);
@@ -610,11 +618,16 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	}
 
 	if (MLX5VF_PRE_COPY_SUPP(mvdev)) {
-		header_buf = mlx5vf_get_data_buffer(migf,
-			sizeof(struct mlx5_vf_migration_header), DMA_NONE);
-		if (IS_ERR(header_buf)) {
-			err = PTR_ERR(header_buf);
-			goto err_free;
+		if (async_data->last_chunk && migf->buf_header) {
+			header_buf = migf->buf_header;
+			migf->buf_header = NULL;
+		} else {
+			header_buf = mlx5vf_get_data_buffer(migf,
+				sizeof(struct mlx5_vf_migration_header), DMA_NONE);
+			if (IS_ERR(header_buf)) {
+				err = PTR_ERR(header_buf);
+				goto err_free;
+			}
 		}
 	}
 
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index 657d94affe2b..8f1bef580028 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -32,10 +32,26 @@ enum mlx5_vf_load_state {
 	MLX5_VF_LOAD_STATE_LOAD_IMAGE,
 };
 
+struct mlx5_vf_migration_tag_stop_copy_data {
+	__le64 stop_copy_size;
+};
+
+enum mlx5_vf_migf_header_flags {
+	MLX5_MIGF_HEADER_FLAGS_TAG_MANDATORY = 0,
+	MLX5_MIGF_HEADER_FLAGS_TAG_OPTIONAL = 1 << 0,
+};
+
+enum mlx5_vf_migf_header_tag {
+	MLX5_MIGF_HEADER_TAG_FW_DATA = 0,
+	MLX5_MIGF_HEADER_TAG_STOP_COPY_SIZE = 1 << 0,
+};
+
 struct mlx5_vf_migration_header {
-	__le64 image_size;
+	__le64 record_size;
 	/* For future use in case we may need to change the kernel protocol */
-	__le64 flags;
+	__le32 flags; /* Use mlx5_vf_migf_header_flags */
+	__le32 tag; /* Use mlx5_vf_migf_header_tag */
+	__u8 data[]; /* Its size is given in the record_size */
 };
 
 struct mlx5_vhca_data_buffer {
@@ -73,6 +89,7 @@ struct mlx5_vf_migration_file {
 	enum mlx5_vf_load_state load_state;
 	u32 pdn;
 	loff_t max_pos;
+	u64 pre_copy_initial_bytes;
 	struct mlx5_vhca_data_buffer *buf;
 	struct mlx5_vhca_data_buffer *buf_header;
 	spinlock_t list_lock;
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index 7ba127d8889a..6856e7b97533 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -304,6 +304,87 @@ static void mlx5vf_mark_err(struct mlx5_vf_migration_file *migf)
 	wake_up_interruptible(&migf->poll_wait);
 }
 
+static int mlx5vf_add_stop_copy_header(struct mlx5_vf_migration_file *migf)
+{
+	size_t size = sizeof(struct mlx5_vf_migration_header) +
+		sizeof(struct mlx5_vf_migration_tag_stop_copy_data);
+	struct mlx5_vf_migration_tag_stop_copy_data data = {};
+	struct mlx5_vhca_data_buffer *header_buf = NULL;
+	struct mlx5_vf_migration_header header = {};
+	unsigned long flags;
+	struct page *page;
+	u8 *to_buff;
+	int ret;
+
+	header_buf = mlx5vf_get_data_buffer(migf, size, DMA_NONE);
+	if (IS_ERR(header_buf))
+		return PTR_ERR(header_buf);
+
+	header.record_size = cpu_to_le64(sizeof(data));
+	header.flags = cpu_to_le32(MLX5_MIGF_HEADER_FLAGS_TAG_OPTIONAL);
+	header.tag = cpu_to_le32(MLX5_MIGF_HEADER_TAG_STOP_COPY_SIZE);
+	page = mlx5vf_get_migration_page(header_buf, 0);
+	if (!page) {
+		ret = -EINVAL;
+		goto err;
+	}
+	to_buff = kmap_local_page(page);
+	memcpy(to_buff, &header, sizeof(header));
+	header_buf->length = sizeof(header);
+	data.stop_copy_size = cpu_to_le64(migf->buf->allocated_length);
+	memcpy(to_buff + sizeof(header), &data, sizeof(data));
+	header_buf->length += sizeof(data);
+	kunmap_local(to_buff);
+	header_buf->start_pos = header_buf->migf->max_pos;
+	migf->max_pos += header_buf->length;
+	spin_lock_irqsave(&migf->list_lock, flags);
+	list_add_tail(&header_buf->buf_elm, &migf->buf_list);
+	spin_unlock_irqrestore(&migf->list_lock, flags);
+	migf->pre_copy_initial_bytes = size;
+	return 0;
+err:
+	mlx5vf_put_data_buffer(header_buf);
+	return ret;
+}
+
+static int mlx5vf_prep_stop_copy(struct mlx5_vf_migration_file *migf,
+				 size_t state_size)
+{
+	struct mlx5_vhca_data_buffer *buf;
+	size_t inc_state_size;
+	int ret;
+
+	/* let's be ready for stop_copy size that might grow by 10 percents */
+	if (check_add_overflow(state_size, state_size / 10, &inc_state_size))
+		inc_state_size = state_size;
+
+	buf = mlx5vf_get_data_buffer(migf, inc_state_size, DMA_FROM_DEVICE);
+	if (IS_ERR(buf))
+		return PTR_ERR(buf);
+
+	migf->buf = buf;
+	buf = mlx5vf_get_data_buffer(migf,
+			sizeof(struct mlx5_vf_migration_header), DMA_NONE);
+	if (IS_ERR(buf)) {
+		ret = PTR_ERR(buf);
+		goto err;
+	}
+
+	migf->buf_header = buf;
+	ret = mlx5vf_add_stop_copy_header(migf);
+	if (ret)
+		goto err_header;
+	return 0;
+
+err_header:
+	mlx5vf_put_data_buffer(migf->buf_header);
+	migf->buf_header = NULL;
+err:
+	mlx5vf_put_data_buffer(migf->buf);
+	migf->buf = NULL;
+	return ret;
+}
+
 static long mlx5vf_precopy_ioctl(struct file *filp, unsigned int cmd,
 				 unsigned long arg)
 {
@@ -314,7 +395,7 @@ static long mlx5vf_precopy_ioctl(struct file *filp, unsigned int cmd,
 	loff_t *pos = &filp->f_pos;
 	unsigned long minsz;
 	size_t inc_length = 0;
-	bool end_of_data;
+	bool end_of_data = false;
 	int ret;
 
 	if (cmd != VFIO_MIG_GET_PRECOPY_INFO)
@@ -358,25 +439,19 @@ static long mlx5vf_precopy_ioctl(struct file *filp, unsigned int cmd,
 		goto err_migf_unlock;
 	}
 
-	buf = mlx5vf_get_data_buff_from_pos(migf, *pos, &end_of_data);
-	if (buf) {
-		if (buf->start_pos == 0) {
-			info.initial_bytes = buf->header_image_size - *pos;
-		} else if (buf->start_pos ==
-				sizeof(struct mlx5_vf_migration_header)) {
-			/* First data buffer following the header */
-			info.initial_bytes = buf->start_pos +
-						buf->length - *pos;
-		} else {
-			info.dirty_bytes = buf->start_pos + buf->length - *pos;
-		}
+	if (migf->pre_copy_initial_bytes > *pos) {
+		info.initial_bytes = migf->pre_copy_initial_bytes - *pos;
 	} else {
-		if (!end_of_data) {
-			ret = -EINVAL;
-			goto err_migf_unlock;
+		buf = mlx5vf_get_data_buff_from_pos(migf, *pos, &end_of_data);
+		if (buf) {
+			info.dirty_bytes = buf->start_pos + buf->length - *pos;
+		} else {
+			if (!end_of_data) {
+				ret = -EINVAL;
+				goto err_migf_unlock;
+			}
+			info.dirty_bytes = inc_length;
 		}
-
-		info.dirty_bytes = inc_length;
 	}
 
 	if (!end_of_data || !inc_length) {
@@ -441,10 +516,16 @@ static int mlx5vf_pci_save_device_inc_data(struct mlx5vf_pci_core_device *mvdev)
 	if (ret)
 		goto err;
 
-	buf = mlx5vf_get_data_buffer(migf, length, DMA_FROM_DEVICE);
-	if (IS_ERR(buf)) {
-		ret = PTR_ERR(buf);
-		goto err;
+	/* Checking whether we have a matching pre-allocated buffer that can fit */
+	if (migf->buf && migf->buf->allocated_length >= length) {
+		buf = migf->buf;
+		migf->buf = NULL;
+	} else {
+		buf = mlx5vf_get_data_buffer(migf, length, DMA_FROM_DEVICE);
+		if (IS_ERR(buf)) {
+			ret = PTR_ERR(buf);
+			goto err;
+		}
 	}
 
 	ret = mlx5vf_cmd_save_vhca_state(mvdev, migf, buf, true, false);
@@ -503,6 +584,12 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev, bool track)
 	if (ret)
 		goto out_pd;
 
+	if (track) {
+		ret = mlx5vf_prep_stop_copy(migf, length);
+		if (ret)
+			goto out_pd;
+	}
+
 	buf = mlx5vf_alloc_data_buffer(migf, length, DMA_FROM_DEVICE);
 	if (IS_ERR(buf)) {
 		ret = PTR_ERR(buf);
@@ -516,7 +603,7 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev, bool track)
 out_save:
 	mlx5vf_free_data_buffer(buf);
 out_pd:
-	mlx5vf_cmd_dealloc_pd(migf);
+	mlx5fv_cmd_clean_migf_resources(migf);
 out_free:
 	fput(migf->filp);
 end:
-- 
2.18.1
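
The initial_bytes/dirty_bytes accounting above feeds the standard
VFIO_MIG_GET_PRECOPY_INFO uAPI. As background, a rough sketch of how a
userspace migration manager might drain the initial PRE_COPY data (the
helper and its loop structure are illustrative, not QEMU's actual
code):

#include <linux/vfio.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Read the currently available initial PRE_COPY data from a VFIO
 * migration data fd.  With this series, initial_bytes reflects the
 * precomputed migf->pre_copy_initial_bytes on the source side. */
static int pull_initial_precopy(int data_fd, void *dst, size_t dst_len)
{
        struct vfio_precopy_info info;
        size_t off = 0, want;

        memset(&info, 0, sizeof(info));
        info.argsz = sizeof(info);
        if (ioctl(data_fd, VFIO_MIG_GET_PRECOPY_INFO, &info))
                return -1;

        want = info.initial_bytes < dst_len ? info.initial_bytes : dst_len;
        while (off < want) {
                ssize_t n = read(data_fd, (char *)dst + off, want - off);

                if (n <= 0)
                        return -1;
                off += (size_t)n;
        }
        return 0;
}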



* [PATCH vfio 3/3] vfio/mlx5: Improve the target side flow to reduce downtime
From: Yishai Hadas @ 2023-01-24 14:49 UTC
  To: alex.williamson, jgg
  Cc: kvm, kevin.tian, joao.m.martins, leonro, shayd, yishaih, maorg, avihaih

Improve the target side flow to reduce downtime as described below.

- Support reading an optional record which includes the expected
  stop_copy size.
- Once the source sends this record, which is expected to arrive as
  part of the pre_copy flow, prepare data buffers large enough to hold
  the final stop_copy data.

The above reduces the migration downtime, as the resources needed to
load the image data are prepared ahead of time as part of pre_copy.
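
The sizing policy boils down to capping the untrusted hint from the
source and taking the larger of it and the current record size. A small
standalone illustration (MAX_LOAD_SIZE here is an arbitrary cap, not
the driver's actual value):

#include <stdint.h>

#define MAX_LOAD_SIZE (1ULL << 30)      /* illustrative cap only */

/* Pick a buffer size that fits both the record being read now and the
 * (capped) stop_copy size announced by the source, so the final
 * stop_copy load avoids a reallocation in the downtime window. */
static uint64_t image_buf_size(uint64_t record_size, uint64_t stop_copy_hint)
{
        uint64_t prep = stop_copy_hint < MAX_LOAD_SIZE ?
                        stop_copy_hint : MAX_LOAD_SIZE;

        return record_size > prep ? record_size : prep;
}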

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/vfio/pci/mlx5/cmd.h  |   6 +-
 drivers/vfio/pci/mlx5/main.c | 111 +++++++++++++++++++++++++++++++----
 2 files changed, 105 insertions(+), 12 deletions(-)

diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index 8f1bef580028..aec4c69dd6c1 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -27,6 +27,8 @@ enum mlx5_vf_migf_state {
 enum mlx5_vf_load_state {
 	MLX5_VF_LOAD_STATE_READ_IMAGE_NO_HEADER,
 	MLX5_VF_LOAD_STATE_READ_HEADER,
+	MLX5_VF_LOAD_STATE_PREP_HEADER_DATA,
+	MLX5_VF_LOAD_STATE_READ_HEADER_DATA,
 	MLX5_VF_LOAD_STATE_PREP_IMAGE,
 	MLX5_VF_LOAD_STATE_READ_IMAGE,
 	MLX5_VF_LOAD_STATE_LOAD_IMAGE,
@@ -59,7 +61,6 @@ struct mlx5_vhca_data_buffer {
 	loff_t start_pos;
 	u64 length;
 	u64 allocated_length;
-	u64 header_image_size;
 	u32 mkey;
 	enum dma_data_direction dma_dir;
 	u8 dmaed:1;
@@ -89,6 +90,9 @@ struct mlx5_vf_migration_file {
 	enum mlx5_vf_load_state load_state;
 	u32 pdn;
 	loff_t max_pos;
+	u64 record_size;
+	u32 record_tag;
+	u64 stop_copy_prep_size;
 	u64 pre_copy_initial_bytes;
 	struct mlx5_vhca_data_buffer *buf;
 	struct mlx5_vhca_data_buffer *buf_header;
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index 6856e7b97533..e897537a9e8a 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -703,6 +703,56 @@ mlx5vf_resume_read_image(struct mlx5_vf_migration_file *migf,
 	return 0;
 }
 
+static int
+mlx5vf_resume_read_header_data(struct mlx5_vf_migration_file *migf,
+			       struct mlx5_vhca_data_buffer *vhca_buf,
+			       const char __user **buf, size_t *len,
+			       loff_t *pos, ssize_t *done)
+{
+	size_t copy_len, to_copy;
+	size_t required_data;
+	u8 *to_buff;
+	int ret;
+
+	required_data = migf->record_size - vhca_buf->length;
+	to_copy = min_t(size_t, *len, required_data);
+	copy_len = to_copy;
+	while (to_copy) {
+		ret = mlx5vf_append_page_to_mig_buf(vhca_buf, buf, &to_copy, pos,
+						    done);
+		if (ret)
+			return ret;
+	}
+
+	*len -= copy_len;
+	if (vhca_buf->length == migf->record_size) {
+		switch (migf->record_tag) {
+		case MLX5_MIGF_HEADER_TAG_STOP_COPY_SIZE:
+		{
+			struct page *page;
+
+			page = mlx5vf_get_migration_page(vhca_buf, 0);
+			if (!page)
+				return -EINVAL;
+			to_buff = kmap_local_page(page);
+			migf->stop_copy_prep_size = min_t(u64,
+				le64_to_cpup((__le64 *)to_buff), MAX_LOAD_SIZE);
+			kunmap_local(to_buff);
+			break;
+		}
+		default:
+			/* Optional tag */
+			break;
+		}
+
+		migf->load_state = MLX5_VF_LOAD_STATE_READ_HEADER;
+		migf->max_pos += migf->record_size;
+		vhca_buf->length = 0;
+	}
+
+	return 0;
+}
+
 static int
 mlx5vf_resume_read_header(struct mlx5_vf_migration_file *migf,
 			  struct mlx5_vhca_data_buffer *vhca_buf,
@@ -733,23 +783,38 @@ mlx5vf_resume_read_header(struct mlx5_vf_migration_file *migf,
 	*len -= copy_len;
 	vhca_buf->length += copy_len;
 	if (vhca_buf->length == sizeof(struct mlx5_vf_migration_header)) {
-		u64 flags;
+		u64 record_size;
+		u32 flags;
 
-		vhca_buf->header_image_size = le64_to_cpup((__le64 *)to_buff);
-		if (vhca_buf->header_image_size > MAX_LOAD_SIZE) {
+		record_size = le64_to_cpup((__le64 *)to_buff);
+		if (record_size > MAX_LOAD_SIZE) {
 			ret = -ENOMEM;
 			goto end;
 		}
 
-		flags = le64_to_cpup((__le64 *)(to_buff +
+		migf->record_size = record_size;
+		flags = le32_to_cpup((__le32 *)(to_buff +
 			    offsetof(struct mlx5_vf_migration_header, flags)));
-		if (flags) {
-			ret = -EOPNOTSUPP;
-			goto end;
+		migf->record_tag = le32_to_cpup((__le32 *)(to_buff +
+			    offsetof(struct mlx5_vf_migration_header, tag)));
+		switch (migf->record_tag) {
+		case MLX5_MIGF_HEADER_TAG_FW_DATA:
+			migf->load_state = MLX5_VF_LOAD_STATE_PREP_IMAGE;
+			break;
+		case MLX5_MIGF_HEADER_TAG_STOP_COPY_SIZE:
+			migf->load_state = MLX5_VF_LOAD_STATE_PREP_HEADER_DATA;
+			break;
+		default:
+			if (!(flags & MLX5_MIGF_HEADER_FLAGS_TAG_OPTIONAL)) {
+				ret = -EOPNOTSUPP;
+				goto end;
+			}
+			/* We may read and skip this optional record data */
+			migf->load_state = MLX5_VF_LOAD_STATE_PREP_HEADER_DATA;
 		}
 
-		migf->load_state = MLX5_VF_LOAD_STATE_PREP_IMAGE;
 		migf->max_pos += vhca_buf->length;
+		vhca_buf->length = 0;
 		*has_work = true;
 	}
 end:
@@ -793,9 +858,34 @@ static ssize_t mlx5vf_resume_write(struct file *filp, const char __user *buf,
 			if (ret)
 				goto out_unlock;
 			break;
+		case MLX5_VF_LOAD_STATE_PREP_HEADER_DATA:
+			if (vhca_buf_header->allocated_length < migf->record_size) {
+				mlx5vf_free_data_buffer(vhca_buf_header);
+
+				migf->buf_header = mlx5vf_alloc_data_buffer(migf,
+						migf->record_size, DMA_NONE);
+				if (IS_ERR(migf->buf_header)) {
+					ret = PTR_ERR(migf->buf_header);
+					migf->buf_header = NULL;
+					goto out_unlock;
+				}
+
+				vhca_buf_header = migf->buf_header;
+			}
+
+			vhca_buf_header->start_pos = migf->max_pos;
+			migf->load_state = MLX5_VF_LOAD_STATE_READ_HEADER_DATA;
+			break;
+		case MLX5_VF_LOAD_STATE_READ_HEADER_DATA:
+			ret = mlx5vf_resume_read_header_data(migf, vhca_buf_header,
+							&buf, &len, pos, &done);
+			if (ret)
+				goto out_unlock;
+			break;
 		case MLX5_VF_LOAD_STATE_PREP_IMAGE:
 		{
-			u64 size = vhca_buf_header->header_image_size;
+			u64 size = max(migf->record_size,
+				       migf->stop_copy_prep_size);
 
 			if (vhca_buf->allocated_length < size) {
 				mlx5vf_free_data_buffer(vhca_buf);
@@ -824,7 +914,7 @@ static ssize_t mlx5vf_resume_write(struct file *filp, const char __user *buf,
 			break;
 		case MLX5_VF_LOAD_STATE_READ_IMAGE:
 			ret = mlx5vf_resume_read_image(migf, vhca_buf,
-						vhca_buf_header->header_image_size,
+						migf->record_size,
 						&buf, &len, pos, &done, &has_work);
 			if (ret)
 				goto out_unlock;
@@ -837,7 +927,6 @@ static ssize_t mlx5vf_resume_write(struct file *filp, const char __user *buf,
 
 			/* prep header buf for next image */
 			vhca_buf_header->length = 0;
-			vhca_buf_header->header_image_size = 0;
 			/* prep data buf for next image */
 			vhca_buf->length = 0;
 
-- 
2.18.1



* Re: [PATCH vfio 0/3] Few improvements in the migration area of mlx5 driver
From: Alex Williamson @ 2023-01-30 23:52 UTC
  To: Yishai Hadas
  Cc: jgg, kvm, kevin.tian, joao.m.martins, leonro, shayd, maorg, avihaih

On Tue, 24 Jan 2023 16:49:52 +0200
Yishai Hadas <yishaih@nvidia.com> wrote:

> This series includes a few improvements to the mlx5 driver in the
> migration area, as described below.
>
> The first patch adds an early check of whether the VF was configured to
> support migration, and reports this capability accordingly.
>
> This is a complementary patch to series [1], which was accepted in the
> previous kernel cycle in the devlink/mlx5_core area and requires a
> specific setting for a VF to become migration capable.
>
> The other two patches improve the PRE_COPY flow on both the source and
> the target by preparing ahead of time the resources (i.e. data buffers,
> MKEY) that are needed upon STOP_COPY, thereby reducing the migration
> downtime.
>
> [1] https://www.spinics.net/lists/netdev/msg867066.html
> 
> Yishai
> 
> Shay Drory (1):
>   vfio/mlx5: Check whether VF is migratable
> 
> Yishai Hadas (2):
>   vfio/mlx5: Improve the source side flow upon pre_copy
>   vfio/mlx5: Improve the target side flow to reduce downtime
> 
>  drivers/vfio/pci/mlx5/cmd.c  |  58 +++++++--
>  drivers/vfio/pci/mlx5/cmd.h  |  28 +++-
>  drivers/vfio/pci/mlx5/main.c | 244 ++++++++++++++++++++++++++++++-----
>  3 files changed, 284 insertions(+), 46 deletions(-)
> 

Applied to vfio next branch for v6.3.  Thanks,

Alex


