From: Kirti Wankhede <kwankhede@nvidia.com>
To: <alex.williamson@redhat.com>, <cjia@nvidia.com>
Cc: Zhengxiao.zx@Alibaba-inc.com, kevin.tian@intel.com,
yi.l.liu@intel.com, yan.y.zhao@intel.com, kvm@vger.kernel.org,
eskultet@redhat.com, ziye.yang@intel.com, qemu-devel@nongnu.org,
cohuck@redhat.com, shuangtai.tst@alibaba-inc.com,
dgilbert@redhat.com, zhi.a.wang@intel.com, mlevitsk@redhat.com,
pasic@linux.ibm.com, aik@ozlabs.ru,
Kirti Wankhede <kwankhede@nvidia.com>,
eauger@redhat.com, felipe@nutanix.com,
jonathan.davies@nutanix.com, changpeng.liu@intel.com,
Ken.Xue@amd.com
Subject: [PATCH v11 Kernel 1/6] vfio: KABI for migration interface for device state
Date: Tue, 17 Dec 2019 22:40:46 +0530
Message-ID: <1576602651-15430-2-git-send-email-kwankhede@nvidia.com>
In-Reply-To: <1576602651-15430-1-git-send-email-kwankhede@nvidia.com>
- Defined MIGRATION region type and sub-type.
- Defined vfio_device_migration_info structure, which is placed at offset 0
of the migration region to get/set VFIO device related information.
Defined members of the structure and their usage on read/write access.
- Defined device states and added state transition details in the comment.
- Added the sequence to be followed while saving and resuming VFIO device state.
Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
---
include/uapi/linux/vfio.h | 187 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 187 insertions(+)
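The device_state encoding defined in the vfio.h hunk of this patch can be checked from user space with a few lines of C. The following is a minimal sketch, not part of the patch: the bit macros are copied from the patch so the block compiles on its own, and the helper names mig_state_is_valid() and mig_state_update() are hypothetical. It assumes reserved bits 3-31 are ignored for validity and preserved on update, as the read-modify-write rule in the comment suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit definitions copied from this patch, not from an installed header. */
#define VFIO_DEVICE_STATE_STOP      (0)
#define VFIO_DEVICE_STATE_RUNNING   (1 << 0)
#define VFIO_DEVICE_STATE_SAVING    (1 << 1)
#define VFIO_DEVICE_STATE_RESUMING  (1 << 2)
#define VFIO_DEVICE_STATE_MASK      (VFIO_DEVICE_STATE_RUNNING | \
				     VFIO_DEVICE_STATE_SAVING |  \
				     VFIO_DEVICE_STATE_RESUMING)

/*
 * Hypothetical helper: true if the low three bits encode one of the five
 * valid states (000b, 001b, 010b, 011b, 100b). Reserved bits are ignored.
 */
static bool mig_state_is_valid(uint32_t state)
{
	uint32_t bits = state & VFIO_DEVICE_STATE_MASK;

	/* _RESUMING may not be combined with _RUNNING or _SAVING. */
	if (bits & VFIO_DEVICE_STATE_RESUMING)
		return bits == VFIO_DEVICE_STATE_RESUMING;
	return true;
}

/*
 * Hypothetical helper for the read-modify-write rule: replace only the
 * three state bits of the value read back and keep bits 3-31 untouched.
 */
static uint32_t mig_state_update(uint32_t old, uint32_t new_bits)
{
	assert((new_bits & ~(uint32_t)VFIO_DEVICE_STATE_MASK) == 0);
	return (old & ~(uint32_t)VFIO_DEVICE_STATE_MASK) | new_bits;
}
```

For example, an application entering pre-copy would read device_state, pass it through mig_state_update(cur, VFIO_DEVICE_STATE_RUNNING | VFIO_DEVICE_STATE_SAVING), and write the result back.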
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 9e843a147ead..b7ac8f7c0d3c 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -305,6 +305,7 @@ struct vfio_region_info_cap_type {
#define VFIO_REGION_TYPE_PCI_VENDOR_MASK (0xffff)
#define VFIO_REGION_TYPE_GFX (1)
#define VFIO_REGION_TYPE_CCW (2)
+#define VFIO_REGION_TYPE_MIGRATION (3)
/* sub-types for VFIO_REGION_TYPE_PCI_* */
@@ -379,6 +380,192 @@ struct vfio_region_gfx_edid {
/* sub-types for VFIO_REGION_TYPE_CCW */
#define VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD (1)
+/* sub-types for VFIO_REGION_TYPE_MIGRATION */
+#define VFIO_REGION_SUBTYPE_MIGRATION (1)
+
+/*
+ * Structure vfio_device_migration_info is placed at offset 0 of the
+ * VFIO_REGION_SUBTYPE_MIGRATION region to get/set VFIO device related
+ * migration information. Fields of this structure may only be accessed at
+ * their native width and alignment; any other access has an undefined result
+ * and vendor drivers should return an error.
+ *
+ * device_state: (read/write)
+ * Tells the vendor driver which state the VFIO device should transition
+ * to. If the state transition fails, the write to this field returns an
+ * error. It consists of 3 bits:
+ * - Bit 0 set indicates the _RUNNING state; bit 0 clear indicates the
+ * _STOP state. When the device is changed to _STOP, the driver should
+ * stop the device before write() returns.
+ * - Bit 1 set indicates the _SAVING state: the driver should start
+ * gathering device state information, which will be provided to the
+ * VFIO user space application to save the device's state.
+ * - Bit 2 set indicates the _RESUMING state: the driver should prepare to
+ * resume the device, using the data provided through the migration
+ * region.
+ * Bits 3 - 31 are reserved for future use. The user should perform a
+ * read-modify-write operation on this field.
+ *
+ * +------- _RESUMING
+ * |+------ _SAVING
+ * ||+----- _RUNNING
+ * |||
+ * 000b => Device Stopped, not saving or resuming
+ * 001b => Device running state, default state
+ * 010b => Stop Device & save device state, stop-and-copy state
+ * 011b => Device running and save device state, pre-copy state
+ * 100b => Device stopped and device state is resuming
+ * 101b => Invalid state
+ * 110b => Invalid state
+ * 111b => Invalid state
+ *
+ * State transitions:
+ *
+ *              _RESUMING  _RUNNING     Pre-copy    Stop-and-copy   _STOP
+ *                (100b)    (001b)       (011b)        (010b)      (000b)
+ * 0. Running or Default state
+ *                             |
+ *
+ * 1. Normal Shutdown (optional)
+ *                             |------------------------------------->|
+ *
+ * 2. Save state or Suspend
+ *                             |------------------------->|---------->|
+ *
+ * 3. Save state during live migration
+ *                             |----------->|------------>|---------->|
+ *
+ * 4. Resuming
+ *                  |<---------|
+ *
+ * 5. Resumed
+ *                  |--------->|
+ *
+ * 0. Default state of VFIO device is _RUNNING when VFIO application starts.
+ * 1. During normal VFIO application shutdown, the VFIO device state changes
+ * from _RUNNING to _STOP. This transition is optional: the user space
+ * application may or may not perform it, and the vendor driver may not
+ * need it.
+ * 2. When the VFIO application saves state or suspends, the VFIO device
+ * state transitions from _RUNNING to stop-and-copy state and then to
+ * _STOP.
+ * On state transition from _RUNNING to stop-and-copy, driver must
+ * stop device, save device state and send it to application through
+ * migration region.
+ * On _RUNNING to stop-and-copy state transition failure, application should
+ * set VFIO device state to _RUNNING.
+ * 3. In VFIO application live migration, state transition is from _RUNNING
+ * to pre-copy to stop-and-copy to _STOP.
+ * On state transition from _RUNNING to pre-copy, driver should start
+ * gathering device state while application is still running and send device
+ * state data to application through migration region.
+ * On state transition from pre-copy to stop-and-copy, driver must stop
+ * device, save device state and send it to application through migration
+ * region.
+ * On any failure during any of these state transitions, VFIO device state
+ * should be set to _RUNNING.
+ * 4. To start the resuming phase, VFIO device state should be transitioned
+ * from _RUNNING to _RESUMING state.
+ * In _RESUMING state, driver should use the device state data received
+ * through migration region to resume device.
+ * On failure during this state transition, application should set _RUNNING
+ * state.
+ * 5. After providing saved device data to driver, application should change
+ * state from _RESUMING to _RUNNING.
+ * On failure to transition to _RUNNING state, VFIO application should reset
+ * the device and set _RUNNING state so that device doesn't remain in unknown
+ * or bad state. On reset, driver must reset device and device should be
+ * available in default initial state, _RUNNING.
+ *
+ * pending_bytes: (read only)
+ * Number of pending bytes yet to be migrated from the vendor driver.
+ *
+ * data_offset: (read only)
+ * Offset in the migration region from which the user application should
+ * read device data during _SAVING state or to which it should write
+ * device data during _RESUMING state. See below for details of the
+ * sequence to be followed.
+ *
+ * data_size: (read/write)
+ * During _SAVING state, the user application should read data_size to get
+ * the size in bytes of the data copied into the migration region. During
+ * _RESUMING state, it should write the size in bytes of the data it copied
+ * into the migration region.
+ *
+ * Migration region looks like:
+ * ------------------------------------------------------------------
+ * |vfio_device_migration_info|    data section                     |
+ * |                          |     /////////////////////////////// |
+ * ------------------------------------------------------------------
+ * ^                           ^
+ * offset 0-trapped part       data_offset
+ *
+ * Structure vfio_device_migration_info is always followed by the data section
+ * in the region, so data_offset will always be non-0. The offset from which
+ * data is copied is decided by the kernel driver. The data section can be
+ * trapped, mapped, or partitioned, depending on how the kernel driver defines
+ * it. A data section partition can be marked as mmappable via the sparse mmap
+ * capability. If mmapped, data_offset should be page aligned, whereas the
+ * initial section, which contains the vfio_device_migration_info structure,
+ * might not end at a page-aligned offset. The user is not required to access
+ * the region via mmap regardless of the region's mmap capabilities.
+ * The vendor driver should decide whether and how to partition the data
+ * section, and should return data_offset accordingly.
+ *
+ * Sequence to be followed for _SAVING|_RUNNING device state or pre-copy phase
+ * and for _SAVING device state or stop-and-copy phase:
+ * a. Read pending_bytes, indicating the start of a new iteration to get
+ * device data. If pending_bytes > 0, go through the steps below.
+ * b. Read data_offset, which tells the kernel driver to make data available
+ * through the data section. The kernel driver should return from this read
+ * operation only after data is available from (region + data_offset) to
+ * (region + data_offset + data_size).
+ * c. Read data_size, the amount of data in bytes available through the
+ * migration region.
+ * d. Read data_size bytes of data from (region + data_offset) in the
+ * migration region.
+ * e. Process the data.
+ * f. Read pending_bytes; this read operation indicates that the data from
+ * the previous iteration has been read. If pending_bytes > 0, go to step b.
+ *
+ * The user can transition from _SAVING|_RUNNING (pre-copy) state to _SAVING
+ * (stop-and-copy) state regardless of pending_bytes.
+ * The user should iterate in _SAVING (stop-and-copy) state until
+ * pending_bytes is 0.
+ *
+ * Sequence to be followed while in _RESUMING device state:
+ * While data for this device is available, repeat the steps below:
+ * a. Read data_offset, the offset at which the user application should write
+ * data.
+ * b. Write data_size bytes of data to the migration region at data_offset.
+ * The data size can be the data packet size used at the source during
+ * _SAVING or the size of the migration region's data section, whichever
+ * is less.
+ * c. Write data_size, which tells the vendor driver that data has been
+ * written to the staging buffer. The vendor driver should read this data
+ * from the migration region and resume the device's state.
+ *
+ * For the user application, data is opaque. The user should write data in the
+ * same order as received.
+ */
+
+struct vfio_device_migration_info {
+ __u32 device_state; /* VFIO device state */
+#define VFIO_DEVICE_STATE_STOP (0)
+#define VFIO_DEVICE_STATE_RUNNING (1 << 0)
+#define VFIO_DEVICE_STATE_SAVING (1 << 1)
+#define VFIO_DEVICE_STATE_RESUMING (1 << 2)
+#define VFIO_DEVICE_STATE_MASK (VFIO_DEVICE_STATE_RUNNING | \
+ VFIO_DEVICE_STATE_SAVING | \
+ VFIO_DEVICE_STATE_RESUMING)
+
+#define VFIO_DEVICE_STATE_INVALID_CASE1 (VFIO_DEVICE_STATE_SAVING | \
+ VFIO_DEVICE_STATE_RESUMING)
+
+#define VFIO_DEVICE_STATE_INVALID_CASE2 (VFIO_DEVICE_STATE_RUNNING | \
+ VFIO_DEVICE_STATE_RESUMING)
+ __u32 reserved;
+ __u64 pending_bytes;
+ __u64 data_offset;
+ __u64 data_size;
+} __attribute__((packed));
+
/*
* The MSIX mappable capability informs that MSIX data of a BAR can be mmapped
* which allows direct access to non-MSIX registers which happened to be within
--
2.7.0
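
The _SAVING sequence (steps a-f in the comment above) could be driven from user space roughly as follows. This is a sketch under stated assumptions, not code from this series: the structure layout is copied from the patch so the block is self-contained, the mig_* helper names are hypothetical, `region` is assumed to be the migration region's offset within the device fd (as reported by VFIO_DEVICE_GET_REGION_INFO), and all accesses use pread() rather than mmap. Error handling is reduced to returning -1.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Layout copied from the patch above, not from an installed header. */
struct vfio_device_migration_info {
	uint32_t device_state;
	uint32_t reserved;
	uint64_t pending_bytes;
	uint64_t data_offset;
	uint64_t data_size;
} __attribute__((packed));

#define MIG_FIELD(f) offsetof(struct vfio_device_migration_info, f)

/* Hypothetical helper: read one 64-bit field at its native width. */
static int mig_read_u64(int fd, off_t region, size_t field, uint64_t *val)
{
	return pread(fd, val, sizeof(*val), region + (off_t)field) ==
	       (ssize_t)sizeof(*val) ? 0 : -1;
}

/*
 * Hypothetical helper covering steps b-d: read data_offset, then
 * data_size, then copy the chunk into buf. Returns the chunk length,
 * or -1 on error or if buf is too small for the chunk.
 */
static ssize_t mig_save_chunk(int fd, off_t region, void *buf, size_t len)
{
	uint64_t off, size;

	assert(buf != NULL);
	if (mig_read_u64(fd, region, MIG_FIELD(data_offset), &off) ||  /* b */
	    mig_read_u64(fd, region, MIG_FIELD(data_size), &size))     /* c */
		return -1;
	if (size > len)
		return -1;
	if (pread(fd, buf, (size_t)size, region + (off_t)off) !=
	    (ssize_t)size)                                             /* d */
		return -1;
	return (ssize_t)size;
}

/*
 * Steps a-f: keep pulling chunks while the driver reports pending bytes.
 * sink() is a caller-supplied callback for step e; it returns 0 on success.
 */
static int mig_save_all(int fd, off_t region, void *buf, size_t len,
			int (*sink)(const void *, size_t))
{
	uint64_t pending;
	ssize_t n;

	for (;;) {
		if (mig_read_u64(fd, region, MIG_FIELD(pending_bytes),
				 &pending))                            /* a, f */
			return -1;
		if (pending == 0)
			return 0;
		n = mig_save_chunk(fd, region, buf, len);
		if (n < 0 || sink(buf, (size_t)n))                     /* e */
			return -1;
	}
}
```

The ordering of the reads matters here: per the comment, reading data_offset is what asks the driver to stage the next chunk, so data_size is only meaningful after that read has returned.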
Thread overview: 20+ messages
2019-12-17 17:10 [PATCH v11 Kernel 0/6] KABIs to support migration for VFIO devices Kirti Wankhede
2019-12-17 17:10 ` Kirti Wankhede [this message]
2019-12-17 17:10 ` [PATCH v11 Kernel 2/6] vfio iommu: Add ioctl definition for dirty pages tracking Kirti Wankhede
2019-12-17 17:10 ` [PATCH v11 Kernel 3/6] vfio iommu: Implementation of ioctl to " Kirti Wankhede
2019-12-17 22:12 ` Alex Williamson
2020-01-07 20:07 ` Kirti Wankhede
2020-01-07 22:02 ` Alex Williamson
2020-01-08 20:01 ` Kirti Wankhede
2020-01-08 22:29 ` Alex Williamson
2020-01-09 13:29 ` Kirti Wankhede
2020-01-09 14:53 ` Alex Williamson
2019-12-17 17:10 ` [PATCH v11 Kernel 4/6] vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap Kirti Wankhede
2019-12-17 22:55 ` Alex Williamson
2019-12-17 17:10 ` [PATCH v11 Kernel 5/6] vfio iommu: Adds flag to indicate dirty pages tracking capability support Kirti Wankhede
2019-12-17 17:10 ` [PATCH v11 Kernel 6/6] vfio: Selective dirty page tracking if IOMMU backed device pins pages Kirti Wankhede
2019-12-18 0:12 ` Alex Williamson
2020-01-07 20:45 ` Kirti Wankhede
2020-01-08 0:09 ` Alex Williamson
2020-01-08 20:52 ` Kirti Wankhede
2020-01-08 22:59 ` Alex Williamson