From: "Tian, Kevin" <kevin.tian@intel.com>
To: Yishai Hadas <yishaih@nvidia.com>,
	"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
	"bhelgaas@google.com" <bhelgaas@google.com>,
	"jgg@nvidia.com" <jgg@nvidia.com>,
	"saeedm@nvidia.com" <saeedm@nvidia.com>
Cc: "linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"kuba@kernel.org" <kuba@kernel.org>,
	"leonro@nvidia.com" <leonro@nvidia.com>,
	"kwankhede@nvidia.com" <kwankhede@nvidia.com>,
	"mgurtovoy@nvidia.com" <mgurtovoy@nvidia.com>,
	"maorg@nvidia.com" <maorg@nvidia.com>,
	"cohuck@redhat.com" <cohuck@redhat.com>,
	"Raj, Ashok" <ashok.raj@intel.com>,
	"shameerali.kolothum.thodi@huawei.com" 
	<shameerali.kolothum.thodi@huawei.com>
Subject: RE: [PATCH V8 mlx5-next 09/15] vfio: Define device migration protocol v2
Date: Tue, 22 Feb 2022 01:55:20 +0000
Message-ID: <BN9PR11MB52769CE4DE386BDF1325F8698C3B9@BN9PR11MB5276.namprd11.prod.outlook.com>
In-Reply-To: <20220220095716.153757-10-yishaih@nvidia.com>

> From: Yishai Hadas <yishaih@nvidia.com>
> Sent: Sunday, February 20, 2022 5:57 PM
> 
> From: Jason Gunthorpe <jgg@nvidia.com>
> 
> Replace the existing region based migration protocol with an ioctl based
> protocol. The two protocols have the same general semantic behaviors, but
> the way the data is transported is changed.
> 
> This is the STOP_COPY portion of the new protocol, it defines the 5 states
> for basic stop and copy migration and the protocol to move the migration
> data in/out of the kernel.
> 
> Compared to the clarification of the v1 protocol Alex proposed:
> 
> https://lore.kernel.org/r/163909282574.728533.7460416142511440919.stgit@omen
> 
> This has a few deliberate functional differences:
> 
>  - ERROR arcs allow the device function to remain unchanged.
> 
>  - The protocol is not required to return to the original state on
>    transition failure. Instead userspace can execute an unwind back to
>    the original state, reset, or do something else without needing kernel
>    support. This simplifies the kernel design and should userspace choose
>    a policy like always reset, avoids doing useless work in the kernel
>    on error handling paths.
> 
>  - PRE_COPY is made optional, userspace must discover it before using it.
>    This reflects the fact that the majority of drivers we are aware of
>    right now will not implement PRE_COPY.
> 
>  - segmentation is not part of the data stream protocol, the receiver
>    does not have to reproduce the framing boundaries.
> 
> The hybrid FSM for the device_state is described as a Mealy machine by
> documenting each of the arcs the driver is required to implement. Defining
> the remaining set of old/new device_state transitions as 'combination
> transitions' which are naturally defined as taking multiple FSM arcs along
> the shortest path within the FSM's digraph allows a complete matrix of
> transitions.
> 
> A new VFIO_DEVICE_FEATURE of VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE is
> defined to replace writing to the device_state field in the region. This
> allows returning a brand new FD whenever the requested transition opens
> a data transfer session.
> 
> The VFIO core code implements the new feature and provides a helper
> function to the driver. Using the helper the driver only has to
> implement 6 of the FSM arcs and the other combination transitions are
> elaborated consistently from those arcs.
> 
> A new VFIO_DEVICE_FEATURE of VFIO_DEVICE_FEATURE_MIGRATION is defined to
> report the capability for migration and indicate which set of states and
> arcs are supported by the device. The FSM provides a lot of flexibility to
> make backwards compatible extensions but the VFIO_DEVICE_FEATURE also
> allows for future breaking extensions for scenarios that cannot support
> even the basic STOP_COPY requirements.
> 
> The VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE with the GET option (i.e.
> VFIO_DEVICE_FEATURE_GET) can be used to read the current migration state
> of the VFIO device.
> 
> Data transfer sessions are now carried over a file descriptor, instead of
> the region. The FD functions for the lifetime of the data transfer
> session. read() and write() transfer the data with normal Linux stream FD
> semantics. This design allows future expansion to support poll(),
> io_uring, and other performance optimizations.
> 
> The complicated mmap mode for data transfer is discarded as current qemu
> doesn't take meaningful advantage of it, and the new qemu implementation
> avoids substantially all the performance penalty of using a read() on the
> region.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
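
For anyone reading along, the stream semantics described in the commit
message (plain read()/write() on the data FD, with end of stream marking the
end of the device state) can be exercised from userspace with an ordinary
copy loop. The following is only a sketch under stated assumptions:
copy_stream() is a hypothetical helper name, not part of this patch or of any
VFIO library. It assumes src_fd is a data_fd returned by a
STOP -> STOP_COPY transition and dst_fd is whatever transport carries the
migration data.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy everything from src_fd to dst_fd until
 * read() reports end of stream. Returns the number of bytes copied,
 * or -1 on error with errno set by the failing call.
 */
static int64_t copy_stream(int src_fd, int dst_fd)
{
	char buf[4096];
	int64_t total = 0;

	assert(src_fd >= 0 && dst_fd >= 0);

	for (;;) {
		ssize_t n = read(src_fd, buf, sizeof(buf));

		if (n == 0)
			return total;	/* end of stream: all data was read */
		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry the read */
			return -1;
		}

		/* write() may be short, so loop until the chunk is out */
		for (ssize_t off = 0; off < n; ) {
			ssize_t w = write(dst_fd, buf + off, (size_t)(n - off));

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += w;
		}
		total += n;
	}
}
```

Nothing in the loop depends on where one read() ends and the next begins,
which is consistent with the commit message's point that the receiver does
not have to reproduce the framing boundaries.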

> ---
>  drivers/vfio/vfio.c       | 199 ++++++++++++++++++++++++++++++++++++++
>  include/linux/vfio.h      |  18 ++++
>  include/uapi/linux/vfio.h | 173 ++++++++++++++++++++++++++++++---
>  3 files changed, 377 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 71763e2ac561..b37ab27b511f 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -1557,6 +1557,197 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
>  	return 0;
>  }
> 
> +/*
> + * vfio_mig_get_next_state - Compute the next step in the FSM
> + * @cur_fsm - The current state the device is in
> + * @new_fsm - The target state to reach
> + * @next_fsm - Pointer to the next step to get to new_fsm
> + *
> + * Return 0 upon success, otherwise -errno
> + * Upon success the next step in the state progression between cur_fsm and
> + * new_fsm will be set in next_fsm.
> + *
> + * This breaks down requests for combination transitions into smaller steps and
> + * returns the next step to get to new_fsm. The function may need to be called
> + * multiple times before reaching new_fsm.
> + *
> + */
> +int vfio_mig_get_next_state(struct vfio_device *device,
> +			    enum vfio_device_mig_state cur_fsm,
> +			    enum vfio_device_mig_state new_fsm,
> +			    enum vfio_device_mig_state *next_fsm)
> +{
> +	enum { VFIO_DEVICE_NUM_STATES = VFIO_DEVICE_STATE_RESUMING + 1 };
> +	/*
> +	 * The coding in this table requires the driver to implement 6
> +	 * FSM arcs:
> +	 *         RESUMING -> STOP
> +	 *         RUNNING -> STOP
> +	 *         STOP -> RESUMING
> +	 *         STOP -> RUNNING
> +	 *         STOP -> STOP_COPY
> +	 *         STOP_COPY -> STOP
> +	 *
> +	 * The coding will step through multiple states for these combination
> +	 * transitions:
> +	 *         RESUMING -> STOP -> RUNNING
> +	 *         RESUMING -> STOP -> STOP_COPY
> +	 *         RUNNING -> STOP -> RESUMING
> +	 *         RUNNING -> STOP -> STOP_COPY
> +	 *         STOP_COPY -> STOP -> RESUMING
> +	 *         STOP_COPY -> STOP -> RUNNING
> +	 */
> +	static const u8 vfio_from_fsm_table[VFIO_DEVICE_NUM_STATES][VFIO_DEVICE_NUM_STATES] = {
> +		[VFIO_DEVICE_STATE_STOP] = {
> +			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
> +			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING,
> +			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP_COPY,
> +			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RESUMING,
> +			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
> +		},
> +		[VFIO_DEVICE_STATE_RUNNING] = {
> +			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
> +			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING,
> +			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP,
> +			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_STOP,
> +			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
> +		},
> +		[VFIO_DEVICE_STATE_STOP_COPY] = {
> +			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
> +			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_STOP,
> +			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP_COPY,
> +			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_STOP,
> +			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
> +		},
> +		[VFIO_DEVICE_STATE_RESUMING] = {
> +			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
> +			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_STOP,
> +			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP,
> +			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RESUMING,
> +			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
> +		},
> +		[VFIO_DEVICE_STATE_ERROR] = {
> +			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_ERROR,
> +			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_ERROR,
> +			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_ERROR,
> +			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_ERROR,
> +			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
> +		},
> +	};
> +
> +	if (WARN_ON(cur_fsm >= ARRAY_SIZE(vfio_from_fsm_table)))
> +		return -EINVAL;
> +
> +	if (new_fsm >= ARRAY_SIZE(vfio_from_fsm_table))
> +		return -EINVAL;
> +
> +	*next_fsm = vfio_from_fsm_table[cur_fsm][new_fsm];
> +	return (*next_fsm != VFIO_DEVICE_STATE_ERROR) ? 0 : -EINVAL;
> +}
> +EXPORT_SYMBOL_GPL(vfio_mig_get_next_state);
> +
> +/*
> + * Convert the driver's struct file into a FD number and return it to userspace
> + */
> +static int vfio_ioct_mig_return_fd(struct file *filp, void __user *arg,
> +				   struct vfio_device_feature_mig_state *mig)
> +{
> +	int ret;
> +	int fd;
> +
> +	fd = get_unused_fd_flags(O_CLOEXEC);
> +	if (fd < 0) {
> +		ret = fd;
> +		goto out_fput;
> +	}
> +
> +	mig->data_fd = fd;
> +	if (copy_to_user(arg, mig, sizeof(*mig))) {
> +		ret = -EFAULT;
> +		goto out_put_unused;
> +	}
> +	fd_install(fd, filp);
> +	return 0;
> +
> +out_put_unused:
> +	put_unused_fd(fd);
> +out_fput:
> +	fput(filp);
> +	return ret;
> +}
> +
> +static int
> +vfio_ioctl_device_feature_mig_device_state(struct vfio_device *device,
> +					   u32 flags, void __user *arg,
> +					   size_t argsz)
> +{
> +	size_t minsz =
> +		offsetofend(struct vfio_device_feature_mig_state, data_fd);
> +	struct vfio_device_feature_mig_state mig;
> +	struct file *filp = NULL;
> +	int ret;
> +
> +	if (!device->ops->migration_set_state ||
> +	    !device->ops->migration_get_state)
> +		return -ENOTTY;
> +
> +	ret = vfio_check_feature(flags, argsz,
> +				 VFIO_DEVICE_FEATURE_SET |
> +				 VFIO_DEVICE_FEATURE_GET,
> +				 sizeof(mig));
> +	if (ret != 1)
> +		return ret;
> +
> +	if (copy_from_user(&mig, arg, minsz))
> +		return -EFAULT;
> +
> +	if (flags & VFIO_DEVICE_FEATURE_GET) {
> +		enum vfio_device_mig_state curr_state;
> +
> +		ret = device->ops->migration_get_state(device, &curr_state);
> +		if (ret)
> +			return ret;
> +		mig.device_state = curr_state;
> +		goto out_copy;
> +	}
> +
> +	/* Handle the VFIO_DEVICE_FEATURE_SET */
> +	filp = device->ops->migration_set_state(device, mig.device_state);
> +	if (IS_ERR(filp) || !filp)
> +		goto out_copy;
> +
> +	return vfio_ioct_mig_return_fd(filp, arg, &mig);
> +out_copy:
> +	mig.data_fd = -1;
> +	if (copy_to_user(arg, &mig, sizeof(mig)))
> +		return -EFAULT;
> +	if (IS_ERR(filp))
> +		return PTR_ERR(filp);
> +	return 0;
> +}
> +
> +static int vfio_ioctl_device_feature_migration(struct vfio_device *device,
> +					       u32 flags, void __user *arg,
> +					       size_t argsz)
> +{
> +	struct vfio_device_feature_migration mig = {
> +		.flags = VFIO_MIGRATION_STOP_COPY,
> +	};
> +	int ret;
> +
> +	if (!device->ops->migration_set_state ||
> +	    !device->ops->migration_get_state)
> +		return -ENOTTY;
> +
> +	ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_GET,
> +				 sizeof(mig));
> +	if (ret != 1)
> +		return ret;
> +	if (copy_to_user(arg, &mig, sizeof(mig)))
> +		return -EFAULT;
> +	return 0;
> +}
> +
>  static int vfio_ioctl_device_feature(struct vfio_device *device,
>  				     struct vfio_device_feature __user *arg)
>  {
> @@ -1582,6 +1773,14 @@ static int vfio_ioctl_device_feature(struct vfio_device *device,
>  		return -EINVAL;
> 
>  	switch (feature.flags & VFIO_DEVICE_FEATURE_MASK) {
> +	case VFIO_DEVICE_FEATURE_MIGRATION:
> +		return vfio_ioctl_device_feature_migration(
> +			device, feature.flags, arg->data,
> +			feature.argsz - minsz);
> +	case VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE:
> +		return vfio_ioctl_device_feature_mig_device_state(
> +			device, feature.flags, arg->data,
> +			feature.argsz - minsz);
>  	default:
>  		if (unlikely(!device->ops->device_feature))
>  			return -EINVAL;
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index ca69516f869d..3bbadcdbc9c8 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -56,6 +56,14 @@ struct vfio_device {
>   *         match, -errno for abort (ex. match with insufficient or incorrect
>   *         additional args)
>   * @device_feature: Fill in the VFIO_DEVICE_FEATURE ioctl
> + * @migration_set_state: Optional callback to change the migration state for
> + *         devices that support migration. The returned FD is used for data
> + *         transfer according to the FSM definition. The driver is responsible
> + *         to ensure that FD reaches end of stream or error whenever the
> + *         migration FSM leaves a data transfer state or before close_device()
> + *         returns.
> + * @migration_get_state: Optional callback to get the migration state for
> + *         devices that support migration.
>   */
>  struct vfio_device_ops {
>  	char	*name;
> @@ -72,6 +80,11 @@ struct vfio_device_ops {
>  	int	(*match)(struct vfio_device *vdev, char *buf);
>  	int	(*device_feature)(struct vfio_device *device, u32 flags,
>  				  void __user *arg, size_t argsz);
> +	struct file *(*migration_set_state)(
> +		struct vfio_device *device,
> +		enum vfio_device_mig_state new_state);
> +	int (*migration_get_state)(struct vfio_device *device,
> +				   enum vfio_device_mig_state *curr_state);
>  };
> 
>  /**
> @@ -114,6 +127,11 @@ extern void vfio_device_put(struct vfio_device *device);
> 
>  int vfio_assign_device_set(struct vfio_device *device, void *set_id);
> 
> +int vfio_mig_get_next_state(struct vfio_device *device,
> +			    enum vfio_device_mig_state cur_fsm,
> +			    enum vfio_device_mig_state new_fsm,
> +			    enum vfio_device_mig_state *next_fsm);
> +
>  /*
>   * External user API
>   */
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index ef33ea002b0b..02b836ea8f46 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -605,25 +605,25 @@ struct vfio_region_gfx_edid {
> 
>  struct vfio_device_migration_info {
>  	__u32 device_state;         /* VFIO device state */
> -#define VFIO_DEVICE_STATE_STOP      (0)
> -#define VFIO_DEVICE_STATE_RUNNING   (1 << 0)
> -#define VFIO_DEVICE_STATE_SAVING    (1 << 1)
> -#define VFIO_DEVICE_STATE_RESUMING  (1 << 2)
> -#define VFIO_DEVICE_STATE_MASK      (VFIO_DEVICE_STATE_RUNNING | \
> -				     VFIO_DEVICE_STATE_SAVING |  \
> -				     VFIO_DEVICE_STATE_RESUMING)
> +#define VFIO_DEVICE_STATE_V1_STOP      (0)
> +#define VFIO_DEVICE_STATE_V1_RUNNING   (1 << 0)
> +#define VFIO_DEVICE_STATE_V1_SAVING    (1 << 1)
> +#define VFIO_DEVICE_STATE_V1_RESUMING  (1 << 2)
> +#define VFIO_DEVICE_STATE_MASK      (VFIO_DEVICE_STATE_V1_RUNNING | \
> +				     VFIO_DEVICE_STATE_V1_SAVING |  \
> +				     VFIO_DEVICE_STATE_V1_RESUMING)
> 
>  #define VFIO_DEVICE_STATE_VALID(state) \
> -	(state & VFIO_DEVICE_STATE_RESUMING ? \
> -	(state & VFIO_DEVICE_STATE_MASK) == VFIO_DEVICE_STATE_RESUMING : 1)
> +	(state & VFIO_DEVICE_STATE_V1_RESUMING ? \
> +	(state & VFIO_DEVICE_STATE_MASK) == VFIO_DEVICE_STATE_V1_RESUMING : 1)
> 
>  #define VFIO_DEVICE_STATE_IS_ERROR(state) \
> -	((state & VFIO_DEVICE_STATE_MASK) == (VFIO_DEVICE_STATE_SAVING | \
> -					      VFIO_DEVICE_STATE_RESUMING))
> +	((state & VFIO_DEVICE_STATE_MASK) == (VFIO_DEVICE_STATE_V1_SAVING | \
> +					      VFIO_DEVICE_STATE_V1_RESUMING))
> 
>  #define VFIO_DEVICE_STATE_SET_ERROR(state) \
> -	((state & ~VFIO_DEVICE_STATE_MASK) | VFIO_DEVICE_SATE_SAVING | \
> -					     VFIO_DEVICE_STATE_RESUMING)
> +	((state & ~VFIO_DEVICE_STATE_MASK) | VFIO_DEVICE_STATE_V1_SAVING | \
> +					     VFIO_DEVICE_STATE_V1_RESUMING)
> 
>  	__u32 reserved;
>  	__u64 pending_bytes;
> @@ -1002,6 +1002,153 @@ struct vfio_device_feature {
>   */
>  #define VFIO_DEVICE_FEATURE_PCI_VF_TOKEN	(0)
> 
> +/*
> + * Indicates the device can support the migration API through
> + * VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE. If present flags must be non-zero and
> + * VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE is supported. The RUNNING and
> + * ERROR states are always supported if this GET succeeds.
> + *
> + * VFIO_MIGRATION_STOP_COPY means that STOP, STOP_COPY and
> + * RESUMING are supported.
> + */
> +struct vfio_device_feature_migration {
> +	__aligned_u64 flags;
> +#define VFIO_MIGRATION_STOP_COPY	(1 << 0)
> +};
> +#define VFIO_DEVICE_FEATURE_MIGRATION 1
> +
> +/*
> + * Upon VFIO_DEVICE_FEATURE_SET, execute a migration state change on the VFIO
> + * device. The new state is supplied in device_state, see enum
> + * vfio_device_mig_state for details
> + *
> + * The kernel migration driver must fully transition the device to the new state
> + * value before the operation returns to the user.
> + *
> + * The kernel migration driver must not generate asynchronous device state
> + * transitions outside of manipulation by the user or the VFIO_DEVICE_RESET
> + * ioctl as described above.
> + *
> + * If this function fails then current device_state may be the original
> + * operating state or some other state along the combination transition path.
> + * The user can then decide if it should execute a VFIO_DEVICE_RESET, attempt
> + * to return to the original state, or attempt to return to some other state
> + * such as RUNNING or STOP.
> + *
> + * If the new_state starts a new data transfer session then the FD associated
> + * with that session is returned in data_fd. The user is responsible to close
> + * this FD when it is finished. The user must consider the migration data
> + * segments carried over the FD to be opaque and non-fungible. During RESUMING,
> + * the data segments must be written in the same order they came out of the
> + * saving side FD.
> + *
> + * Upon VFIO_DEVICE_FEATURE_GET, get the current migration state of the VFIO
> + * device, data_fd will be -1.
> + */
> +struct vfio_device_feature_mig_state {
> +	__u32 device_state; /* From enum vfio_device_mig_state */
> +	__s32 data_fd;
> +};
> +#define VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE 2
> +
> +/*
> + * The device migration Finite State Machine is described by the enum
> + * vfio_device_mig_state. Some of the FSM arcs will create a migration data
> + * transfer session by returning a FD, in this case the migration data will
> + * flow over the FD using read() and write() as discussed below.
> + *
> + * There are 5 states to support VFIO_MIGRATION_STOP_COPY:
> + *  RUNNING - The device is running normally
> + *  STOP - The device does not change the internal or external state
> + *  STOP_COPY - The device internal state can be read out
> + *  RESUMING - The device is stopped and is loading a new internal state
> + *  ERROR - The device has failed and must be reset
> + *
> + * The FSM takes actions on the arcs between FSM states. The driver implements
> + * the following behavior for the FSM arcs:
> + *
> + * RUNNING -> STOP
> + * STOP_COPY -> STOP
> + *   While in STOP the device must stop the operation of the device. The device
> + *   must not generate interrupts, DMA, or any other change to external state.
> + *   It must not change its internal state. When stopped the device and kernel
> + *   migration driver must accept and respond to interaction to support external
> + *   subsystems in the STOP state, for example PCI MSI-X and PCI config space.
> + *   Failure by the user to restrict device access while in STOP must not result
> + *   in error conditions outside the user context (ex. host system faults).
> + *
> + *   The STOP_COPY arc will terminate a data transfer session.
> + *
> + * RESUMING -> STOP
> + *   Leaving RESUMING terminates a data transfer session and indicates the
> + *   device should complete processing of the data delivered by write(). The
> + *   kernel migration driver should complete the incorporation of data written
> + *   to the data transfer FD into the device internal state and perform
> + *   final validity and consistency checking of the new device state. If the
> + *   user provided data is found to be incomplete, inconsistent, or otherwise
> + *   invalid, the migration driver must fail the SET_STATE ioctl and
> + *   optionally go to the ERROR state as described below.
> + *
> + *   While in STOP the device has the same behavior as other STOP states
> + *   described above.
> + *
> + *   To abort a RESUMING session the device must be reset.
> + *
> + * STOP -> RUNNING
> + *   While in RUNNING the device is fully operational, the device may generate
> + *   interrupts, DMA, respond to MMIO, all vfio device regions are functional,
> + *   and the device may advance its internal state.
> + *
> + * STOP -> STOP_COPY
> + *   This arc begins the process of saving the device state and will return a
> + *   new data_fd.
> + *
> + *   While in the STOP_COPY state the device has the same behavior as STOP
> + *   with the addition that the data transfer session continues to stream the
> + *   migration state. End of stream on the FD indicates the entire device
> + *   state has been transferred.
> + *
> + *   The user should take steps to restrict access to vfio device regions while
> + *   the device is in STOP_COPY or risk corruption of the device migration data
> + *   stream.
> + *
> + * STOP -> RESUMING
> + *   Entering the RESUMING state starts a process of restoring the device state
> + *   and will return a new data_fd. The data stream fed into the data_fd should
> + *   be taken from the data transfer output of a single FD during saving from
> + *   a compatible device. The migration driver may alter/reset the internal
> + *   device state for this arc if required to prepare the device to receive the
> + *   migration data.
> + *
> + * any -> ERROR
> + *   ERROR cannot be specified as a device state, however any transition request
> + *   can be failed with an errno return and may then move the device_state into
> + *   ERROR. In this case the device was unable to execute the requested arc and
> + *   was also unable to restore the device to any valid device_state.
> + *   To recover from ERROR VFIO_DEVICE_RESET must be used to return the
> + *   device_state back to RUNNING.
> + *
> + * The remaining possible transitions are interpreted as combinations of the
> + * above FSM arcs. As there are multiple paths through the FSM arcs the path
> + * should be selected based on the following rules:
> + *   - Select the shortest path.
> + * Refer to vfio_mig_get_next_state() for the result of the algorithm.
> + *
> + * The automatic transit through the FSM arcs that make up the combination
> + * transition is invisible to the user. When working with combination arcs the
> + * user may see any step along the path in the device_state if SET_STATE
> + * fails. When handling these types of errors users should anticipate future
> + * revisions of this protocol using new states and those states becoming
> + * visible in this case.
> + */
> +enum vfio_device_mig_state {
> +	VFIO_DEVICE_STATE_ERROR = 0,
> +	VFIO_DEVICE_STATE_STOP = 1,
> +	VFIO_DEVICE_STATE_RUNNING = 2,
> +	VFIO_DEVICE_STATE_STOP_COPY = 3,
> +	VFIO_DEVICE_STATE_RESUMING = 4,
> +};
> +
>  /* -------- API for Type1 VFIO IOMMU -------- */
> 
>  /**
> --
> 2.18.1
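
The routing rule encoded in vfio_from_fsm_table can be restated compactly:
a request whose current or target state is ERROR is rejected, a request that
starts or ends in STOP (or does not change state) is a single arc, and every
other request is routed through STOP first. Below is a userspace sketch under
that reading of the table. enum mig_state, mig_next_state() and mig_walk()
are hypothetical names that mirror the kernel definitions but are not part of
the patch, and -1 stands in for the kernel's -EINVAL.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative mirror of enum vfio_device_mig_state from the patch */
enum mig_state {
	MIG_ERROR = 0,
	MIG_STOP = 1,
	MIG_RUNNING = 2,
	MIG_STOP_COPY = 3,
	MIG_RESUMING = 4,
	MIG_NUM_STATES
};

/* One step toward target; returns 0 on success, -1 if not reachable */
static int mig_next_state(enum mig_state cur, enum mig_state target,
			  enum mig_state *next)
{
	assert(next != NULL);

	if ((unsigned int)cur >= MIG_NUM_STATES ||
	    (unsigned int)target >= MIG_NUM_STATES)
		return -1;
	if (cur == MIG_ERROR || target == MIG_ERROR) {
		*next = MIG_ERROR;	/* same result as the table's ERROR cells */
		return -1;
	}
	if (cur == target || cur == MIG_STOP || target == MIG_STOP)
		*next = target;		/* a single driver-implemented arc */
	else
		*next = MIG_STOP;	/* combination transition: go via STOP */
	return 0;
}

/* Record each intermediate state; returns the step count or -1 */
static int mig_walk(enum mig_state cur, enum mig_state target,
		    enum mig_state *path, size_t cap)
{
	size_t n = 0;

	for (;;) {
		enum mig_state next;

		if (mig_next_state(cur, target, &next))
			return -1;
		if (cur == target)
			return (int)n;
		if (n == cap)
			return -1;	/* caller's buffer is too small */
		path[n++] = next;
		cur = next;
	}
}
```

With these definitions, walking from MIG_RUNNING to MIG_STOP_COPY records
MIG_STOP and then MIG_STOP_COPY, which corresponds to the
RUNNING -> STOP -> STOP_COPY combination listed in the comment above the
table.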

