From: Yishai Hadas <yishaih@nvidia.com>
To: <alex.williamson@redhat.com>, <bhelgaas@google.com>,
<jgg@nvidia.com>, <saeedm@nvidia.com>
Cc: <linux-pci@vger.kernel.org>, <kvm@vger.kernel.org>,
<netdev@vger.kernel.org>, <kuba@kernel.org>, <leonro@nvidia.com>,
<kwankhede@nvidia.com>, <mgurtovoy@nvidia.com>,
<yishaih@nvidia.com>, <maorg@nvidia.com>, <cohuck@redhat.com>,
<ashok.raj@intel.com>, <kevin.tian@intel.com>,
<shameerali.kolothum.thodi@huawei.com>
Subject: [PATCH V9 mlx5-next 00/15] Add mlx5 live migration driver and v2 migration protocol
Date: Thu, 24 Feb 2022 16:20:09 +0200
Message-ID: <20220224142024.147653-1-yishaih@nvidia.com>
This series adds an mlx5 live migration driver for migration-capable
VFs and includes the v2 migration protocol definition and its mlx5
implementation.
The mlx5 driver uses the vfio_pci_core split to create a specific VFIO
PCI driver that matches the mlx5 virtual functions. The driver provides
the same experience as normal vfio-pci with the addition of migration
support.
In hardware, migration is controlled by the PF through its mlx5_core
driver, and the VFIO PCI VF driver coordinates with the PF to execute
the migration actions.
The bulk of the v2 migration protocol is semantically the same as v1.
However, it has been recast into an FSM for the device_state, and the
actual syscall interface uses normal ioctl(), read() and write() instead
of building a syscall interface on top of the region.
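To illustrate the FSM idea, here is a minimal userspace sketch. It assumes
a reduced four-state model (STOP, RUNNING, STOP_COPY, RESUMING) in which
every arc has STOP at one end; the optional RUNNING_P2P state and the ERROR
state are left out. The names mig_state, mig_next_step() and mig_walk() are
hypothetical and are not taken from the series or from the uAPI header.

```c
#include <assert.h>

/* Hypothetical state names for this sketch; not the uAPI enum. */
enum mig_state {
	MIG_STOP,
	MIG_RUNNING,
	MIG_STOP_COPY,
	MIG_RESUMING,
	MIG_NUM_STATES,
};

/*
 * Return the next single-arc hop from cur toward target.
 * Assumption: every arc in this reduced model has STOP at one end,
 * so any path between two non-STOP states passes through STOP.
 */
enum mig_state mig_next_step(enum mig_state cur, enum mig_state target)
{
	assert(cur < MIG_NUM_STATES && target < MIG_NUM_STATES);
	if (cur == target || cur == MIG_STOP)
		return target;
	return MIG_STOP;
}

/*
 * Walk from cur to target one arc at a time, recording each state
 * entered in path[]. Returns the number of hops taken (at most max).
 */
int mig_walk(enum mig_state cur, enum mig_state target,
	     enum mig_state *path, int max)
{
	int n = 0;

	while (cur != target && n < max) {
		cur = mig_next_step(cur, target);
		path[n++] = cur;
	}
	return n;
}
```

In this model a request to go from RUNNING to STOP_COPY is carried out as
two hops, RUNNING -> STOP and then STOP -> STOP_COPY, so a driver only has
to implement the individual arcs.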
Several bits of infrastructure work are included here:
- pci_iov_vf_id() to help drivers like mlx5 figure out the VF index from
a BDF
- pci_iov_get_pf_drvdata() to clarify the tricky locking protocol when a
VF reaches into its PF's driver
- mlx5_core uses the normal SRIOV lifecycle and disables SRIOV before
driver remove, to be compatible with pci_iov_get_pf_drvdata()
- Lifting VFIO_DEVICE_FEATURE into core VFIO code
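As background for the VF index lookup in the first item above, the following
userspace sketch shows the usual SR-IOV arithmetic: a VF's routing ID is the
PF's routing ID plus First VF Offset plus index times VF Stride, so the index
can be recovered by reversing that. The struct sriov_layout type and the
make_rid() and vf_index_from_rid() helpers are hypothetical names for this
illustration, not kernel APIs, and the sketch treats a zero stride as invalid.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the SR-IOV values a PF advertises. */
struct sriov_layout {
	uint16_t pf_rid;       /* PF routing ID: bus << 8 | dev << 3 | fn */
	uint16_t first_offset; /* First VF Offset */
	uint16_t stride;       /* VF Stride */
	uint16_t num_vfs;      /* number of enabled VFs */
};

/* Pack bus/device/function into a 16-bit routing ID. */
uint16_t make_rid(uint8_t bus, uint8_t dev, uint8_t fn)
{
	assert(dev < 32 && fn < 8);
	return (uint16_t)((bus << 8) | (dev << 3) | fn);
}

/*
 * Return the zero-based VF index for vf_rid, or -1 if the routing ID
 * does not belong to one of this PF's enabled VFs.
 */
int vf_index_from_rid(const struct sriov_layout *l, uint16_t vf_rid)
{
	uint32_t first = (uint32_t)l->pf_rid + l->first_offset;
	uint32_t delta;

	/* Assumption of this sketch: a zero stride is rejected. */
	if (l->stride == 0 || vf_rid < first)
		return -1;

	delta = vf_rid - first;
	if (delta % l->stride != 0 || delta / l->stride >= l->num_vfs)
		return -1;

	return (int)(delta / l->stride);
}
```

For example, with a PF at 3b:00.0, a First VF Offset of 2, a stride of 1 and
four enabled VFs, the VF at 3b:00.2 maps to index 0 and the VF at 3b:00.5
maps to index 3.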
This series comes after a lot of discussion. Some major points:
- v1 ABI compatible migration defined using the same FSM approach:
https://lore.kernel.org/all/0-v1-a4f7cab64938+3f-vfio_mig_states_jgg@nvidia.com/
- Attempts to clarify how the v1 API works:
Alex's:
https://lore.kernel.org/kvm/163909282574.728533.7460416142511440919.stgit@omen/
Jason's:
https://lore.kernel.org/all/0-v3-184b374ad0a8+24c-vfio_mig_doc_jgg@nvidia.com/
- Etherpad exploring the scope and questions of general VFIO migration:
https://lore.kernel.org/kvm/87mtm2loml.fsf@redhat.com/
NOTE: As this series touches mlx5_core parts, we need to send it in a
pull request format to VFIO to avoid conflicts.
Matching qemu changes can be previewed here:
https://github.com/jgunthorpe/qemu/commits/vfio_migration_v2
Changes from V8: https://lore.kernel.org/kvm/20220220095716.153757-1-yishaih@nvidia.com/
vfio:
- Fix some documentation notes given by Alex and Cornelia for v2.
- Add Reviewed-by: Kevin Tian <kevin.tian@intel.com>
vfio/mlx5, net/mlx5:
- Use more inclusive terminology for slave/master as was asked by Alex.
Changes from V7: https://lore.kernel.org/kvm/20220207172216.206415-1-yishaih@nvidia.com/T/
vfio:
- Fix and improve some documentation notes.
- Improve vfio_ioctl_device_feature_migration() to check for the
existence of both set and get device ops.
- Improve some commit logs.
- Drop the PRE_COPY patch as was asked by Alex since we have no proposed
in-kernel users.
- Add Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>.
vfio/mlx5:
- Better packing struct mlx5vf_pci_core_device.
net/mlx5:
- Update mlx5 command list for error/debug cases.
Changes from V6: https://lore.kernel.org/netdev/20220130160826.32449-1-yishaih@nvidia.com/
vfio:
- Move to use the FEATURE ioctl for setting/getting the device state.
- Use state_flags_table as part of vfio_mig_get_next_state() and use
WARN_ON as Alex suggested.
- Leave the V1 definitions in the uAPI header and drop only its
documentation until V2 is part of Linus's tree.
- Fix errno usage in a few places.
- Improve and adapt the uAPI documentation to match the latest code.
- Put the VFIO_DEVICE_FEATURE_PCI_VF_TOKEN functionality into a separate
function.
- Fix some rebase note.
vfio/mlx5:
- Adapt to use the vfio core changes.
- Fix some bad flow upon load state.
Changes from V5: https://lore.kernel.org/kvm/20211027095658.144468-1-yishaih@nvidia.com/
vfio:
- Migration protocol v2:
+ enum for device state, not bitmap
+ ioctl to manipulate device_state, not a region
+ Only STOP_COPY is mandatory, P2P and PRE_COPY are optional, discovered
via VFIO_DEVICE_FEATURE
+ Migration data transfer is done via dedicated FD
- VFIO core code to implement the migration related ioctls and help
drivers implement it correctly
- VFIO_DEVICE_FEATURE refactor
- Delete migration protocol, drop patches fixing it
- Drop "vfio/pci_core: Make the region->release() function optional"
vfio/mlx5:
- Switch to use migration v2 protocol, with core helpers
- Eliminate the region implementation
Changes from V4: https://lore.kernel.org/kvm/20211026090605.91646-1-yishaih@nvidia.com/
vfio:
- Add some Reviewed-by.
- Rename to vfio_pci_core_aer_err_detected() as Alex asked.
vfio/mlx5:
- Improve to enter the error state only if unquiesce also fails.
- Fix some typos.
- Use the multi-line comment style as in drivers/vfio.
Changes from V3: https://lore.kernel.org/kvm/20211024083019.232813-1-yishaih@nvidia.com/
vfio/mlx5:
- Align with mlx5 latest specification to create the MKEY with full read
write permissions.
- Fix unlock ordering in mlx5vf_state_mutex_unlock() to prevent a
race.
Changes from V2: https://lore.kernel.org/kvm/20211019105838.227569-1-yishaih@nvidia.com/
vfio:
- Put and use the new macro VFIO_DEVICE_STATE_SET_ERROR as Alex asked.
vfio/mlx5:
- Improve/fix state checking as was asked by Alex & Jason.
- Let things be done in a deterministic way upon 'reset_done' following
the suggested algorithm by Jason.
- Align with mlx5 latest specification when calling the SAVE command.
- Fix some typos.
vdpa/mlx5:
- Drop the patch from the series based on the discussion in the mailing
list.
Changes from V1: https://lore.kernel.org/kvm/20211013094707.163054-1-yishaih@nvidia.com/
PCI/IOV:
- Add actual interface in the subject as was asked by Bjorn and add
his Acked-by.
- Move to check explicitly for !dev->is_virtfn as was asked by Alex.
vfio:
- Add a separate patch for fixing the non-compiled
VFIO_DEVICE_STATE_SET_ERROR macro.
- Expose vfio_pci_aer_err_detected() to be set by drivers on their own
PCI error handlers.
- Add a macro for VFIO_DEVICE_STATE_ERROR in the uapi header file as was
suggested by Alex.
vfio/mlx5:
- Improve to use xor as part of checking the 'state' change command as
was suggested by Alex.
- Set state to VFIO_DEVICE_STATE_ERROR when an error occurred instead of
VFIO_DEVICE_STATE_INVALID.
- Improve state checking as was suggested by Jason.
- Use its own PCI reset_done error handler as was suggested by Jason and
fix the locking scheme around the state mutex to work properly.
Changes from V0: https://lore.kernel.org/kvm/cover.1632305919.git.leonro@nvidia.com/
PCI/IOV:
- Add an API (i.e. pci_iov_get_pf_drvdata()) that allows SRIOV VF drivers
to reach the drvdata of a PF.
mlx5_core:
- Add an extra patch to disable SRIOV before PF removal.
- Adapt to use the above PCI/IOV API as part of mlx5_vf_get_core_dev().
- Reuse the exported PCI/IOV virtfn index function call (i.e. pci_iov_vf_id()).
vfio:
- Add support in the pci_core to let a driver be notified on
'reset_done' so that it can set its internal state accordingly.
- Add some helper stuff for 'invalid' state handling.
mlx5_vfio_pci:
- Move to use the 'command mode' instead of the 'state machine'
scheme as was discussed in the mailing list.
- Handle the RESET scenario when called by vfio_pci_core to set
its internal state accordingly.
- Set initial state as RUNNING.
- Put the driver files in a sub-folder under drivers/vfio/pci named mlx5
and update the MAINTAINERS file as was asked.
vdpa_mlx5:
- Add a new patch to use mlx5_vf_get_core_dev() to get PF device.
Jason Gunthorpe (6):
PCI/IOV: Add pci_iov_vf_id() to get VF index
PCI/IOV: Add pci_iov_get_pf_drvdata() to allow VF reaching the drvdata
of a PF
vfio: Have the core code decode the VFIO_DEVICE_FEATURE ioctl
vfio: Define device migration protocol v2
vfio: Extend the device migration protocol with RUNNING_P2P
vfio: Remove migration protocol v1 documentation
Leon Romanovsky (1):
net/mlx5: Reuse exported virtfn index function call
Yishai Hadas (8):
net/mlx5: Disable SRIOV before PF removal
net/mlx5: Expose APIs to get/put the mlx5 core device
net/mlx5: Introduce migration bits and structures
net/mlx5: Add migration commands definitions
vfio/mlx5: Expose migration commands over mlx5 device
vfio/mlx5: Implement vfio_pci driver for mlx5 devices
vfio/pci: Expose vfio_pci_core_aer_err_detected()
vfio/mlx5: Use its own PCI reset_done error handler
MAINTAINERS | 6 +
drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 10 +
.../net/ethernet/mellanox/mlx5/core/main.c | 45 ++
.../ethernet/mellanox/mlx5/core/mlx5_core.h | 1 +
.../net/ethernet/mellanox/mlx5/core/sriov.c | 17 +-
drivers/pci/iov.c | 43 ++
drivers/vfio/pci/Kconfig | 3 +
drivers/vfio/pci/Makefile | 2 +
drivers/vfio/pci/mlx5/Kconfig | 10 +
drivers/vfio/pci/mlx5/Makefile | 4 +
drivers/vfio/pci/mlx5/cmd.c | 259 +++++++
drivers/vfio/pci/mlx5/cmd.h | 36 +
drivers/vfio/pci/mlx5/main.c | 676 ++++++++++++++++++
drivers/vfio/pci/vfio_pci.c | 1 +
drivers/vfio/pci/vfio_pci_core.c | 101 ++-
drivers/vfio/vfio.c | 295 +++++++-
include/linux/mlx5/driver.h | 3 +
include/linux/mlx5/mlx5_ifc.h | 147 +++-
include/linux/pci.h | 15 +-
include/linux/vfio.h | 53 ++
include/linux/vfio_pci_core.h | 4 +
include/uapi/linux/vfio.h | 406 +++++------
22 files changed, 1846 insertions(+), 291 deletions(-)
create mode 100644 drivers/vfio/pci/mlx5/Kconfig
create mode 100644 drivers/vfio/pci/mlx5/Makefile
create mode 100644 drivers/vfio/pci/mlx5/cmd.c
create mode 100644 drivers/vfio/pci/mlx5/cmd.h
create mode 100644 drivers/vfio/pci/mlx5/main.c
--
2.18.1