From: Yishai Hadas <yishaih@nvidia.com>
To: <alex.williamson@redhat.com>, <jgg@nvidia.com>
Cc: <kvm@vger.kernel.org>, <kevin.tian@intel.com>,
	<joao.m.martins@oracle.com>, <leonro@nvidia.com>,
	<shayd@nvidia.com>, <yishaih@nvidia.com>, <maorg@nvidia.com>,
	<avihaih@nvidia.com>, <cohuck@redhat.com>
Subject: [PATCH vfio 00/13] Add migration PRE_COPY support for mlx5 driver
Date: Sun, 6 Nov 2022 19:46:17 +0200
Message-ID: <20221106174630.25909-1-yishaih@nvidia.com>

This series adds the migration PRE_COPY uAPIs and their implementation
in the mlx5 driver.

The uAPIs follow an earlier discussion on the mailing list [1].

When those patches were sent, no driver implemented the uAPIs; the mlx5
driver now does.

The optional PRE_COPY state opens the saving data transfer FD before
reaching STOP_COPY and lets the device track changes to its internal
state as dirty, with the aim of reducing the volume of data transferred
in the STOP_COPY stage.

While in PRE_COPY the device remains RUNNING, but the saving FD is open.

A new ioctl, VFIO_MIG_GET_PRECOPY_INFO, lets userspace query the
progress of the precopy operation in the driver. Userspace is expected
to move to STOP_COPY no earlier than once the initial data set has been
transferred, and possibly only after the dirty size has shrunk
appropriately.
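
As a rough illustration of the decision described above, here is a
minimal sketch of a userspace policy. Everything in it is hypothetical:
the struct, its field names (modeled on the "initial data set" and
"dirty size" wording above), the function name and the threshold are
assumptions for the example, not the uAPI from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical copy of the values userspace would obtain from
 * VFIO_MIG_GET_PRECOPY_INFO. Field names are assumptions.
 */
struct sketch_precopy_progress {
	uint64_t initial_bytes;	/* initial data set not yet read */
	uint64_t dirty_bytes;	/* state dirtied since then, not yet read */
};

/*
 * Hypothetical policy: never move to STOP_COPY before the initial data
 * set has been fully transferred, then wait until the dirty size has
 * dropped to the caller's threshold.
 */
static bool sketch_ready_for_stop_copy(const struct sketch_precopy_progress *p,
				       uint64_t dirty_threshold)
{
	assert(p != NULL);

	if (p->initial_bytes != 0)
		return false;

	return p->dirty_bytes <= dirty_threshold;
}
```

The threshold is a policy knob of the caller; a real VMM would also
bound the number of polling iterations so that migration converges.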

Userspace can detect whether PRE_COPY is supported for a given device
by checking, once, the VFIO_MIGRATION_PRE_COPY flag reported by the
VFIO_DEVICE_FEATURE_MIGRATION feature of the VFIO_DEVICE_FEATURE ioctl.
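
The flag check itself is a plain bit test. The sketch below shows only
that step, on a flags value already read from the device. The SKETCH_*
bit values and the helper name are placeholders assumed for the
example; real code should take the definitions from <linux/vfio.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Placeholder bit values, assumed for this sketch only. Real code must
 * use VFIO_MIGRATION_STOP_COPY and VFIO_MIGRATION_PRE_COPY from
 * <linux/vfio.h> instead of these.
 */
#define SKETCH_MIGRATION_STOP_COPY	(UINT64_C(1) << 0)
#define SKETCH_MIGRATION_PRE_COPY	(UINT64_C(1) << 2)

/* True if the reported migration flags advertise PRE_COPY support. */
static bool sketch_supports_pre_copy(uint64_t migration_flags)
{
	return (migration_flags & SKETCH_MIGRATION_PRE_COPY) != 0;
}
```

Since the flag does not change for a given device, the result can be
cached after the first query.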

Further details are in the uAPI patch of this series.

Finally, the series adds the mlx5 implementation, based on the device
specification for PRE_COPY.

To support PRE_COPY, the mlx5 driver transfers multiple states (images)
of the device: the source VF can save and transfer multiple states, and
the target VF loads them in that order.

The device saves three kinds of states:
1) Initial state - when the device moves to PRE_COPY state.
2) Middle state - during the PRE_COPY phase, via
                  VFIO_MIG_GET_PRECOPY_INFO; there can be multiple such
                  states.
3) Final state - when the device moves to STOP_COPY state.

After moving to the PRE_COPY state, the user holds the saving FD and
should use it to transfer data from the source to the target while the
VM is still running. From the user's point of view it is a single stream
of data; from the mlx5 driver's point of view it consists of multiple
images/states. To delimit them, the driver adds headers with metadata on
the source, to be parsed on the target.
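
To illustrate the idea of one byte stream carrying several images, here
is a minimal sketch of header-based framing. The header layout (a single
host-endian size field) and all names are hypothetical and chosen for
the example; the actual mlx5 header format is defined by the patches in
this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-image header; not the mlx5 on-wire format. */
struct sketch_image_header {
	uint64_t image_size;	/* payload bytes following this header */
};

/*
 * Source side: append header + payload to the stream.
 * Returns the new stream length, or 0 if the image does not fit.
 */
static size_t sketch_put_image(uint8_t *stream, size_t len, size_t cap,
			       const void *payload, uint64_t size)
{
	struct sketch_image_header hdr = { .image_size = size };

	assert(len <= cap);
	if (cap - len < sizeof(hdr) || cap - len - sizeof(hdr) < size)
		return 0;

	memcpy(stream + len, &hdr, sizeof(hdr));
	memcpy(stream + len + sizeof(hdr), payload, size);
	return len + sizeof(hdr) + size;
}

/*
 * Target side: parse the image starting at *off and advance *off past
 * it. Returns the payload, or NULL at end of stream or on truncation.
 */
static const uint8_t *sketch_next_image(const uint8_t *stream, size_t len,
					size_t *off, uint64_t *size)
{
	struct sketch_image_header hdr;
	const uint8_t *payload;

	assert(*off <= len);
	if (len - *off < sizeof(hdr))
		return NULL;

	memcpy(&hdr, stream + *off, sizeof(hdr));
	if (len - *off - sizeof(hdr) < hdr.image_size)
		return NULL;

	payload = stream + *off + sizeof(hdr);
	*off += sizeof(hdr) + hdr.image_size;
	*size = hdr.image_size;
	return payload;
}
```

Calling sketch_next_image() in a loop yields the images in the order
they were saved, which matches the load order required on the target.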

At some point the user may switch the device state from PRE_COPY to
STOP_COPY; this triggers saving of the final state.

As discussed earlier on the mailing list, the data returned as part of
PRE_COPY need not bear any relation to the data size available during
the STOP_COPY phase.

For this, we have the VFIO_DEVICE_FEATURE_MIG_DATA_SIZE option [2],
which is also included in this series as its first patch.

With this series, the mlx5 driver gains about a 20-30 percent
improvement in downtime compared to the previous code, which did not
support PRE_COPY.

The series starts with some preparatory patches for managing multiple
images, followed by the PRE_COPY implementation itself.

The matching qemu changes can be previewed here [3].
They come on top of the v2 migration protocol patches that were already
sent to the mailing list.

Note:
As this series includes a net/mlx5 patch, we may need to send it to VFIO
as a pull request to avoid conflicts before acceptance.

[1] https://lore.kernel.org/kvm/20220302172903.1995-8-shameerali.kolothum.thodi@huawei.com/
[2] https://patchwork.kernel.org/project/kvm/patch/20221026072438.166707-1-yishaih@nvidia.com/
[3] https://github.com/avihai1122/qemu/commits/mig_v2_precopy

Yishai

Jason Gunthorpe (1):
  vfio: Extend the device migration protocol with PRE_COPY

Shay Drory (9):
  net/mlx5: Introduce ifc bits for pre_copy
  vfio/mlx5: Refactor total_length name and usage
  vfio/mlx5: Introduce device transitions of PRE_COPY
  vfio/mlx5: Introduce vfio precopy ioctl implementation
  vfio/mlx5: Manage read() of multiple state saves
  vfio/mlx5: Introduce SW headers for migration states
  vfio/mlx5: Introduce multiple loads
  vfio/mlx5: Fallback to STOP_COPY upon specific PRE_COPY error
  vfio/mlx5: Enable MIGRATION_PRE_COPY flag

Yishai Hadas (3):
  vfio: Add an option to get migration data size
  vfio/mlx5: Fix a typo in mlx5vf_cmd_load_vhca_state()
  vfio/mlx5: Enforce a single SAVE command at a time

 .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    |   9 +
 drivers/vfio/pci/mlx5/cmd.c                   | 131 +++--
 drivers/vfio/pci/mlx5/cmd.h                   |  32 +-
 drivers/vfio/pci/mlx5/main.c                  | 495 ++++++++++++++++--
 drivers/vfio/pci/vfio_pci_core.c              |   3 +-
 drivers/vfio/vfio_main.c                      | 106 +++-
 include/linux/mlx5/mlx5_ifc.h                 |  14 +-
 include/linux/vfio.h                          |   5 +
 include/uapi/linux/vfio.h                     | 135 ++++-
 9 files changed, 843 insertions(+), 87 deletions(-)

-- 
2.18.1


Thread overview: 25+ messages
2022-11-06 17:46 Yishai Hadas [this message]
2022-11-06 17:46 ` [PATCH vfio 01/13] vfio: Add an option to get migration data size Yishai Hadas
2022-11-09  7:42   ` liulongfang
2022-11-09 17:06   ` Jason Gunthorpe
2022-11-13 16:58     ` Yishai Hadas
2022-11-14 19:04       ` Alex Williamson
2022-11-06 17:46 ` [PATCH vfio 02/13] vfio/mlx5: Fix a typo in mlx5vf_cmd_load_vhca_state() Yishai Hadas
2022-11-09 17:06   ` Jason Gunthorpe
2022-11-06 17:46 ` [PATCH vfio 03/13] net/mlx5: Introduce ifc bits for pre_copy Yishai Hadas
2022-11-06 17:46 ` [PATCH vfio 04/13] vfio: Extend the device migration protocol with PRE_COPY Yishai Hadas
2022-11-06 17:46 ` [PATCH vfio 05/13] vfio/mlx5: Enforce a single SAVE command at a time Yishai Hadas
2022-11-09 18:04   ` Jason Gunthorpe
2022-11-10 10:38     ` Yishai Hadas
2022-11-06 17:46 ` [PATCH vfio 06/13] vfio/mlx5: Refactor total_length name and usage Yishai Hadas
2022-11-09 18:11   ` Jason Gunthorpe
2022-11-10 11:38     ` Yishai Hadas
2022-11-06 17:46 ` [PATCH vfio 07/13] vfio/mlx5: Introduce device transitions of PRE_COPY Yishai Hadas
2022-11-06 17:46 ` [PATCH vfio 08/13] vfio/mlx5: Introduce vfio precopy ioctl implementation Yishai Hadas
2022-11-06 17:46 ` [PATCH vfio 09/13] vfio/mlx5: Manage read() of multiple state saves Yishai Hadas
2022-11-06 17:46 ` [PATCH vfio 10/13] vfio/mlx5: Introduce SW headers for migration states Yishai Hadas
2022-11-09 18:38   ` Jason Gunthorpe
2022-11-06 17:46 ` [PATCH vfio 11/13] vfio/mlx5: Introduce multiple loads Yishai Hadas
2022-11-09 18:45   ` Jason Gunthorpe
2022-11-06 17:46 ` [PATCH vfio 12/13] vfio/mlx5: Fallback to STOP_COPY upon specific PRE_COPY error Yishai Hadas
2022-11-06 17:46 ` [PATCH vfio 13/13] vfio/mlx5: Enable MIGRATION_PRE_COPY flag Yishai Hadas
