All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v6 00/13] vfio/migration: Implement VFIO migration protocol v2
@ 2023-01-12  8:50 Avihai Horon
  2023-01-12  8:50 ` [PATCH v6 01/13] linux-headers: Update to v6.2-rc1 Avihai Horon
                   ` (12 more replies)
  0 siblings, 13 replies; 22+ messages in thread
From: Avihai Horon @ 2023-01-12  8:50 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Halil Pasic, Christian Borntraeger, Eric Farman,
	Richard Henderson, David Hildenbrand, Ilya Leoshkevich,
	Thomas Huth, Juan Quintela, Dr. David Alan Gilbert,
	Michael S. Tsirkin, Cornelia Huck, Paolo Bonzini,
	Stefan Hajnoczi, Fam Zheng, Eric Blake,
	Vladimir Sementsov-Ogievskiy, John Snow, Cédric Le Goater,
	qemu-s390x, qemu-block, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Avihai Horon, Kirti Wankhede, Tarun Gupta,
	Joao Martins

Hello,

Here is v6 of the series.
Thanks for reviewing!

Following VFIO migration protocol v2 acceptance in kernel, this series
implements VFIO migration according to the new v2 protocol and replaces
the now deprecated v1 implementation.

The main differences between v1 and v2 migration protocols are:
1. VFIO device state is represented as a finite state machine instead of
   a bitmap.

2. The migration interface with kernel is done using VFIO_DEVICE_FEATURE
   ioctl and normal read() and write() instead of the migration region
   used in v1.

3. Pre-copy is made optional in v2 protocol. Support for pre-copy will
   be added later on.

Full description of the v2 protocol and the differences from v1 can be
found here [1].



Patch list:

Patch 1 updates linux headers so we will have the MIG_DATA_SIZE ioctl.

Patches 2-8 are prep patches fixing bugs, adding QEMUFile function
that will be used later and refactoring v1 protocol code to make it
easier to add v2 protocol.

Patches 9-13 implement v2 protocol and remove v1 protocol.

Thanks.



Changes from v5 [2]:
- Dropped patch #3.
- Simplified patch #5 as per Alex's suggestion.
- Changed qemu_file_get_to_fd() to return -EIO instead of -1, as
  suggested by Cedric.
  Also changed it so now write returns -errno instead of -1 on error.
- Fixed compilation error reported by Cedric.
- Changed vfio_migration_query_flags() to print error message and return
  -errno in error case as suggested by Cedric.
- Added Reviewed-by tags.



Changes from v4 [3]:
- Rebased on latest master branch.
- Added linux header update to kernel v6.2-rc1.
- Merged preview patches (#13-14) into this series.



Changes from v3 [4]:
- Rebased on latest master branch.

- Dropped patch #1 "migration: Remove res_compatible parameter" as
  it's not mandatory to this series and needs some further discussion.

- Dropped patch #3 "migration: Block migration comment or code is
  wrong" as it has been merged already.

- Addressed overlooked corner case reported by Vladimir in patch #4
  "migration: Simplify migration_iteration_run()".

- Dropped patch #5 "vfio/migration: Fix wrong enum usage" as it has
  been merged already.

- In patch #12 "vfio/migration: Implement VFIO migration protocol v2":
  1. Changed vfio_save_pending() to update res_precopy_only instead of
     res_postcopy_only (as VFIO migration doesn’t support postcopy).
  2. Moved VFIOMigration->data_buffer allocation to vfio_save_setup()
     and its de-allocation to vfio_save_cleanup(), so now it's
     allocated when actually used (during migration and only on source
     side).

- Addressed Alex's comments:
  1. Eliminated code duplication in patch #7 "vfio/migration: Allow
     migration without VFIO IOMMU dirty tracking support".
  2. Removed redundant initialization of vfio_region_info in patch #10
     "vfio/migration: Move migration v1 logic to vfio_migration_init()".
  3. Added comment about VFIO_MIG_DATA_BUFFER_SIZE heuristic (and
     renamed to VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE).
  4. Cast migration structs to their actual types instead of void *.
  5. Return -errno and -EBADF instead of -1 in vfio_migration_set_state().
  6. Set migration->device_state to new_state even in case of data_fd
     out of sync. Although migration will be aborted, setting device
     state succeeded so we should reflect that.
  7. Renamed VFIO_MIG_PENDING_SIZE to VFIO_MIG_STOP_COPY_SIZE, set it
     to 100G and added a comment about the size choice.
  8. Changed vfio_save_block() to return -errno on error.
  9. Squashed Patch #14 to patch #12.
  10. Adjusted migration data buffer size according to MIG_DATA_SIZE
      ioctl.

- In preview patch #17 "vfio/migration: Query device data size in
  vfio_save_pending()" - changed vfio_save_pending() to report
  VFIO_MIG_STOP_COPY_SIZE on any error.
   
- Added another preview patch "vfio/migration: Optimize
  vfio_save_pending()".

- Added ret value on some traces as suggested by David.

- Added Reviewed-By tags.



Changes from v2 [5]:
- Rebased on top of latest master branch.

- Added relevant patches from Juan's RFC [6] with minor changes:
  1. Added Reviewed-by tag to patch #3 in the RFC.
  2. Adjusted patch #6 to work without patch #4 in the RFC.

- Added a new patch "vfio/migration: Fix wrong enum usage" that fixes a
  small bug in v1 code. This patch has been sent a few weeks ago [7] but
  wasn't taken yet.

- Patch #2 (vfio/migration: Skip pre-copy if dirty page tracking is not
  supported):
  1. Dropped this patch and replaced it with
     "vfio/migration: Allow migration without VFIO IOMMU dirty tracking
     support".
     The new patch uses a different approach – instead of skipping
     pre-copy phase completely, QEMU VFIO code will mark RAM dirty
     (instead of kernel). This ensures that current migration behavior
     is not changed and SLA is taken into account.

- Patch #4 (vfio/common: Change vfio_devices_all_running_and_saving()
  logic to equivalent one):
  1. Improved commit message to better explain the change.

- Patch #7 (vfio/migration: Implement VFIO migration protocol v2):
  1. Enhanced vfio_migration_set_state() error reporting.
  2. In vfio_save_complete_precopy() of v2 protocol - when changing
     device state to STOP, set recover state to ERROR instead of STOP as
     suggested by Joao.
  3. Constify SaveVMHandlers of v2 protocol.
  4. Modified trace_vfio_vmstate_change and
     trace_vfio_migration_set_state
     to print device state string instead of enum.
  5. Replaced qemu_put_buffer_async() with qemu_put_buffer() in
     vfio_save_block(), as requested by Juan.
  6. Implemented v2 protocol version of vfio_save_pending() as requested
     by Juan. Until ioctl to get device state size is added, we just
     report some big hard coded value, as agreed in KVM call.

- Patch #9 (vfio/migration: Reset device if setting recover state
  fails):
  1. Enhanced error reporting.
  2. Set VFIOMigration->device_state to RUNNING after device reset.

- Patch #11 (docs/devel: Align vfio-migration docs to VFIO migration
  v2):
  1. Adjusted vfio migration documentation to the added
     vfio_save_pending()

- Added the last patch (which is not for merging yet) that demonstrates
  how the new ioctl to get device state size will work once added.



Changes from v1 [8]:
- Split the big patch that replaced v1 with v2 into several patches as
  suggested by Joao, to make review easier.
- Change warn_report to warn_report_once when container doesn't support
  dirty tracking.
- Add Reviewed-by tag.

[1]
https://lore.kernel.org/all/20220224142024.147653-10-yishaih@nvidia.com/

[2]
https://lore.kernel.org/qemu-devel/20221229110345.12480-1-avihaih@nvidia.com/

[3]
https://lore.kernel.org/qemu-devel/20221130094414.27247-1-avihaih@nvidia.com/

[4]
https://lore.kernel.org/qemu-devel/20221103161620.13120-1-avihaih@nvidia.com/

[5]
https://lore.kernel.org/all/20220530170739.19072-1-avihaih@nvidia.com/

[6]
https://lore.kernel.org/qemu-devel/20221003031600.20084-1-quintela@redhat.com/T/

[7]
https://lore.kernel.org/all/20221016085752.32740-1-avihaih@nvidia.com/

[8]
https://lore.kernel.org/all/20220512154320.19697-1-avihaih@nvidia.com/

Avihai Horon (12):
  linux-headers: Update to v6.2-rc1
  vfio/migration: Fix NULL pointer dereference bug
  vfio/migration: Allow migration without VFIO IOMMU dirty tracking
    support
  migration/qemu-file: Add qemu_file_get_to_fd()
  vfio/common: Change vfio_devices_all_running_and_saving() logic to
    equivalent one
  vfio/migration: Move migration v1 logic to vfio_migration_init()
  vfio/migration: Rename functions/structs related to v1 protocol
  vfio/migration: Implement VFIO migration protocol v2
  vfio/migration: Optimize vfio_save_pending()
  vfio/migration: Remove VFIO migration protocol v1
  vfio: Alphabetize migration section of VFIO trace-events file
  docs/devel: Align VFIO migration docs to v2 protocol

Juan Quintela (1):
  migration: No save_live_pending() method uses the QEMUFile parameter

 docs/devel/vfio-migration.rst                 |  68 +-
 include/hw/vfio/vfio-common.h                 |  10 +-
 include/migration/register.h                  |   3 +-
 include/standard-headers/drm/drm_fourcc.h     |  63 +-
 include/standard-headers/linux/ethtool.h      |  81 +-
 include/standard-headers/linux/fuse.h         |  20 +-
 .../linux/input-event-codes.h                 |   4 +
 include/standard-headers/linux/pci_regs.h     |   2 +
 include/standard-headers/linux/virtio_blk.h   |  19 +
 include/standard-headers/linux/virtio_bt.h    |   8 +
 include/standard-headers/linux/virtio_net.h   |   4 +
 linux-headers/asm-arm64/kvm.h                 |   1 +
 linux-headers/asm-generic/hugetlb_encode.h    |  26 +-
 linux-headers/asm-generic/mman-common.h       |   2 +
 linux-headers/asm-mips/mman.h                 |   2 +
 linux-headers/asm-riscv/kvm.h                 |   7 +
 linux-headers/asm-x86/kvm.h                   |  11 +-
 linux-headers/linux/kvm.h                     |  32 +-
 linux-headers/linux/psci.h                    |  14 +
 linux-headers/linux/userfaultfd.h             |   4 +
 linux-headers/linux/vfio.h                    | 278 ++++++-
 migration/qemu-file.h                         |   1 +
 migration/savevm.h                            |   3 +-
 hw/s390x/s390-stattrib.c                      |   2 +-
 hw/vfio/common.c                              |  41 +-
 hw/vfio/migration.c                           | 748 ++++++------------
 migration/block-dirty-bitmap.c                |   3 +-
 migration/block.c                             |   2 +-
 migration/migration.c                         |   4 +-
 migration/qemu-file.c                         |  34 +
 migration/ram.c                               |   2 +-
 migration/savevm.c                            |   7 +-
 hw/vfio/trace-events                          |  29 +-
 33 files changed, 898 insertions(+), 637 deletions(-)

-- 
2.26.3



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v6 01/13] linux-headers: Update to v6.2-rc1
  2023-01-12  8:50 [PATCH v6 00/13] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
@ 2023-01-12  8:50 ` Avihai Horon
  2023-01-12  8:50 ` [PATCH v6 02/13] migration: No save_live_pending() method uses the QEMUFile parameter Avihai Horon
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Avihai Horon @ 2023-01-12  8:50 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Halil Pasic, Christian Borntraeger, Eric Farman,
	Richard Henderson, David Hildenbrand, Ilya Leoshkevich,
	Thomas Huth, Juan Quintela, Dr. David Alan Gilbert,
	Michael S. Tsirkin, Cornelia Huck, Paolo Bonzini,
	Stefan Hajnoczi, Fam Zheng, Eric Blake,
	Vladimir Sementsov-Ogievskiy, John Snow, Cédric Le Goater,
	qemu-s390x, qemu-block, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Avihai Horon, Kirti Wankhede, Tarun Gupta,
	Joao Martins

Update to commit 1b929c02afd3 ("Linux 6.2-rc1").

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 include/standard-headers/drm/drm_fourcc.h     |  63 +++-
 include/standard-headers/linux/ethtool.h      |  81 ++++-
 include/standard-headers/linux/fuse.h         |  20 +-
 .../linux/input-event-codes.h                 |   4 +
 include/standard-headers/linux/pci_regs.h     |   2 +
 include/standard-headers/linux/virtio_blk.h   |  19 ++
 include/standard-headers/linux/virtio_bt.h    |   8 +
 include/standard-headers/linux/virtio_net.h   |   4 +
 linux-headers/asm-arm64/kvm.h                 |   1 +
 linux-headers/asm-generic/hugetlb_encode.h    |  26 +-
 linux-headers/asm-generic/mman-common.h       |   2 +
 linux-headers/asm-mips/mman.h                 |   2 +
 linux-headers/asm-riscv/kvm.h                 |   7 +
 linux-headers/asm-x86/kvm.h                   |  11 +-
 linux-headers/linux/kvm.h                     |  32 +-
 linux-headers/linux/psci.h                    |  14 +
 linux-headers/linux/userfaultfd.h             |   4 +
 linux-headers/linux/vfio.h                    | 278 +++++++++++++++++-
 18 files changed, 522 insertions(+), 56 deletions(-)

diff --git a/include/standard-headers/drm/drm_fourcc.h b/include/standard-headers/drm/drm_fourcc.h
index 48b620cbef..69cab17b38 100644
--- a/include/standard-headers/drm/drm_fourcc.h
+++ b/include/standard-headers/drm/drm_fourcc.h
@@ -98,18 +98,42 @@ extern "C" {
 #define DRM_FORMAT_INVALID	0
 
 /* color index */
+#define DRM_FORMAT_C1		fourcc_code('C', '1', ' ', ' ') /* [7:0] C0:C1:C2:C3:C4:C5:C6:C7 1:1:1:1:1:1:1:1 eight pixels/byte */
+#define DRM_FORMAT_C2		fourcc_code('C', '2', ' ', ' ') /* [7:0] C0:C1:C2:C3 2:2:2:2 four pixels/byte */
+#define DRM_FORMAT_C4		fourcc_code('C', '4', ' ', ' ') /* [7:0] C0:C1 4:4 two pixels/byte */
 #define DRM_FORMAT_C8		fourcc_code('C', '8', ' ', ' ') /* [7:0] C */
 
-/* 8 bpp Red */
+/* 1 bpp Darkness (inverse relationship between channel value and brightness) */
+#define DRM_FORMAT_D1		fourcc_code('D', '1', ' ', ' ') /* [7:0] D0:D1:D2:D3:D4:D5:D6:D7 1:1:1:1:1:1:1:1 eight pixels/byte */
+
+/* 2 bpp Darkness (inverse relationship between channel value and brightness) */
+#define DRM_FORMAT_D2		fourcc_code('D', '2', ' ', ' ') /* [7:0] D0:D1:D2:D3 2:2:2:2 four pixels/byte */
+
+/* 4 bpp Darkness (inverse relationship between channel value and brightness) */
+#define DRM_FORMAT_D4		fourcc_code('D', '4', ' ', ' ') /* [7:0] D0:D1 4:4 two pixels/byte */
+
+/* 8 bpp Darkness (inverse relationship between channel value and brightness) */
+#define DRM_FORMAT_D8		fourcc_code('D', '8', ' ', ' ') /* [7:0] D */
+
+/* 1 bpp Red (direct relationship between channel value and brightness) */
+#define DRM_FORMAT_R1		fourcc_code('R', '1', ' ', ' ') /* [7:0] R0:R1:R2:R3:R4:R5:R6:R7 1:1:1:1:1:1:1:1 eight pixels/byte */
+
+/* 2 bpp Red (direct relationship between channel value and brightness) */
+#define DRM_FORMAT_R2		fourcc_code('R', '2', ' ', ' ') /* [7:0] R0:R1:R2:R3 2:2:2:2 four pixels/byte */
+
+/* 4 bpp Red (direct relationship between channel value and brightness) */
+#define DRM_FORMAT_R4		fourcc_code('R', '4', ' ', ' ') /* [7:0] R0:R1 4:4 two pixels/byte */
+
+/* 8 bpp Red (direct relationship between channel value and brightness) */
 #define DRM_FORMAT_R8		fourcc_code('R', '8', ' ', ' ') /* [7:0] R */
 
-/* 10 bpp Red */
+/* 10 bpp Red (direct relationship between channel value and brightness) */
 #define DRM_FORMAT_R10		fourcc_code('R', '1', '0', ' ') /* [15:0] x:R 6:10 little endian */
 
-/* 12 bpp Red */
+/* 12 bpp Red (direct relationship between channel value and brightness) */
 #define DRM_FORMAT_R12		fourcc_code('R', '1', '2', ' ') /* [15:0] x:R 4:12 little endian */
 
-/* 16 bpp Red */
+/* 16 bpp Red (direct relationship between channel value and brightness) */
 #define DRM_FORMAT_R16		fourcc_code('R', '1', '6', ' ') /* [15:0] R little endian */
 
 /* 16 bpp RG */
@@ -204,7 +228,9 @@ extern "C" {
 #define DRM_FORMAT_VYUY		fourcc_code('V', 'Y', 'U', 'Y') /* [31:0] Y1:Cb0:Y0:Cr0 8:8:8:8 little endian */
 
 #define DRM_FORMAT_AYUV		fourcc_code('A', 'Y', 'U', 'V') /* [31:0] A:Y:Cb:Cr 8:8:8:8 little endian */
+#define DRM_FORMAT_AVUY8888	fourcc_code('A', 'V', 'U', 'Y') /* [31:0] A:Cr:Cb:Y 8:8:8:8 little endian */
 #define DRM_FORMAT_XYUV8888	fourcc_code('X', 'Y', 'U', 'V') /* [31:0] X:Y:Cb:Cr 8:8:8:8 little endian */
+#define DRM_FORMAT_XVUY8888	fourcc_code('X', 'V', 'U', 'Y') /* [31:0] X:Cr:Cb:Y 8:8:8:8 little endian */
 #define DRM_FORMAT_VUY888	fourcc_code('V', 'U', '2', '4') /* [23:0] Cr:Cb:Y 8:8:8 little endian */
 #define DRM_FORMAT_VUY101010	fourcc_code('V', 'U', '3', '0') /* Y followed by U then V, 10:10:10. Non-linear modifier only */
 
@@ -717,6 +743,35 @@ extern "C" {
  */
 #define DRM_FORMAT_MOD_VIVANTE_SPLIT_SUPER_TILED fourcc_mod_code(VIVANTE, 4)
 
+/*
+ * Vivante TS (tile-status) buffer modifiers. They can be combined with all of
+ * the color buffer tiling modifiers defined above. When TS is present it's a
+ * separate buffer containing the clear/compression status of each tile. The
+ * modifiers are defined as VIVANTE_MOD_TS_c_s, where c is the color buffer
+ * tile size in bytes covered by one entry in the status buffer and s is the
+ * number of status bits per entry.
+ * We reserve the top 8 bits of the Vivante modifier space for tile status
+ * clear/compression modifiers, as future cores might add some more TS layout
+ * variations.
+ */
+#define VIVANTE_MOD_TS_64_4               (1ULL << 48)
+#define VIVANTE_MOD_TS_64_2               (2ULL << 48)
+#define VIVANTE_MOD_TS_128_4              (3ULL << 48)
+#define VIVANTE_MOD_TS_256_4              (4ULL << 48)
+#define VIVANTE_MOD_TS_MASK               (0xfULL << 48)
+
+/*
+ * Vivante compression modifiers. Those depend on a TS modifier being present
+ * as the TS bits get reinterpreted as compression tags instead of simple
+ * clear markers when compression is enabled.
+ */
+#define VIVANTE_MOD_COMP_DEC400           (1ULL << 52)
+#define VIVANTE_MOD_COMP_MASK             (0xfULL << 52)
+
+/* Masking out the extension bits will yield the base modifier. */
+#define VIVANTE_MOD_EXT_MASK              (VIVANTE_MOD_TS_MASK | \
+                                           VIVANTE_MOD_COMP_MASK)
+
 /* NVIDIA frame buffer modifiers */
 
 /*
diff --git a/include/standard-headers/linux/ethtool.h b/include/standard-headers/linux/ethtool.h
index 4537da20cc..87176ab075 100644
--- a/include/standard-headers/linux/ethtool.h
+++ b/include/standard-headers/linux/ethtool.h
@@ -159,8 +159,10 @@ static inline uint32_t ethtool_cmd_speed(const struct ethtool_cmd *ep)
  *	in its bus driver structure (e.g. pci_driver::name).  Must
  *	not be an empty string.
  * @version: Driver version string; may be an empty string
- * @fw_version: Firmware version string; may be an empty string
- * @erom_version: Expansion ROM version string; may be an empty string
+ * @fw_version: Firmware version string; driver defined; may be an
+ *	empty string
+ * @erom_version: Expansion ROM version string; driver defined; may be
+ *	an empty string
  * @bus_info: Device bus address.  This should match the dev_name()
  *	string for the underlying bus device, if there is one.  May be
  *	an empty string.
@@ -179,10 +181,6 @@ static inline uint32_t ethtool_cmd_speed(const struct ethtool_cmd *ep)
  *
  * Users can use the %ETHTOOL_GSSET_INFO command to get the number of
  * strings in any string set (from Linux 2.6.34).
- *
- * Drivers should set at most @driver, @version, @fw_version and
- * @bus_info in their get_drvinfo() implementation.  The ethtool
- * core fills in the other fields using other driver operations.
  */
 struct ethtool_drvinfo {
 	uint32_t	cmd;
@@ -736,6 +734,51 @@ enum ethtool_module_power_mode {
 	ETHTOOL_MODULE_POWER_MODE_HIGH,
 };
 
+/**
+ * enum ethtool_podl_pse_admin_state - operational state of the PoDL PSE
+ *	functions. IEEE 802.3-2018 30.15.1.1.2 aPoDLPSEAdminState
+ * @ETHTOOL_PODL_PSE_ADMIN_STATE_UNKNOWN: state of PoDL PSE functions are
+ * 	unknown
+ * @ETHTOOL_PODL_PSE_ADMIN_STATE_DISABLED: PoDL PSE functions are disabled
+ * @ETHTOOL_PODL_PSE_ADMIN_STATE_ENABLED: PoDL PSE functions are enabled
+ */
+enum ethtool_podl_pse_admin_state {
+	ETHTOOL_PODL_PSE_ADMIN_STATE_UNKNOWN = 1,
+	ETHTOOL_PODL_PSE_ADMIN_STATE_DISABLED,
+	ETHTOOL_PODL_PSE_ADMIN_STATE_ENABLED,
+};
+
+/**
+ * enum ethtool_podl_pse_pw_d_status - power detection status of the PoDL PSE.
+ *	IEEE 802.3-2018 30.15.1.1.3 aPoDLPSEPowerDetectionStatus:
+ * @ETHTOOL_PODL_PSE_PW_D_STATUS_UNKNOWN: PoDL PSE
+ * @ETHTOOL_PODL_PSE_PW_D_STATUS_DISABLED: "The enumeration “disabled” is
+ *	asserted true when the PoDL PSE state diagram variable mr_pse_enable is
+ *	false"
+ * @ETHTOOL_PODL_PSE_PW_D_STATUS_SEARCHING: "The enumeration “searching” is
+ *	asserted true when either of the PSE state diagram variables
+ *	pi_detecting or pi_classifying is true."
+ * @ETHTOOL_PODL_PSE_PW_D_STATUS_DELIVERING: "The enumeration “deliveringPower”
+ *	is asserted true when the PoDL PSE state diagram variable pi_powered is
+ *	true."
+ * @ETHTOOL_PODL_PSE_PW_D_STATUS_SLEEP: "The enumeration “sleep” is asserted
+ *	true when the PoDL PSE state diagram variable pi_sleeping is true."
+ * @ETHTOOL_PODL_PSE_PW_D_STATUS_IDLE: "The enumeration “idle” is asserted true
+ *	when the logical combination of the PoDL PSE state diagram variables
+ *	pi_prebiased*!pi_sleeping is true."
+ * @ETHTOOL_PODL_PSE_PW_D_STATUS_ERROR: "The enumeration “error” is asserted
+ *	true when the PoDL PSE state diagram variable overload_held is true."
+ */
+enum ethtool_podl_pse_pw_d_status {
+	ETHTOOL_PODL_PSE_PW_D_STATUS_UNKNOWN = 1,
+	ETHTOOL_PODL_PSE_PW_D_STATUS_DISABLED,
+	ETHTOOL_PODL_PSE_PW_D_STATUS_SEARCHING,
+	ETHTOOL_PODL_PSE_PW_D_STATUS_DELIVERING,
+	ETHTOOL_PODL_PSE_PW_D_STATUS_SLEEP,
+	ETHTOOL_PODL_PSE_PW_D_STATUS_IDLE,
+	ETHTOOL_PODL_PSE_PW_D_STATUS_ERROR,
+};
+
 /**
  * struct ethtool_gstrings - string set for data tagging
  * @cmd: Command number = %ETHTOOL_GSTRINGS
@@ -1692,6 +1735,13 @@ enum ethtool_link_mode_bit_indices {
 	ETHTOOL_LINK_MODE_100baseFX_Half_BIT		 = 90,
 	ETHTOOL_LINK_MODE_100baseFX_Full_BIT		 = 91,
 	ETHTOOL_LINK_MODE_10baseT1L_Full_BIT		 = 92,
+	ETHTOOL_LINK_MODE_800000baseCR8_Full_BIT	 = 93,
+	ETHTOOL_LINK_MODE_800000baseKR8_Full_BIT	 = 94,
+	ETHTOOL_LINK_MODE_800000baseDR8_Full_BIT	 = 95,
+	ETHTOOL_LINK_MODE_800000baseDR8_2_Full_BIT	 = 96,
+	ETHTOOL_LINK_MODE_800000baseSR8_Full_BIT	 = 97,
+	ETHTOOL_LINK_MODE_800000baseVR8_Full_BIT	 = 98,
+
 	/* must be last entry */
 	__ETHTOOL_LINK_MODE_MASK_NBITS
 };
@@ -1803,6 +1853,7 @@ enum ethtool_link_mode_bit_indices {
 #define SPEED_100000		100000
 #define SPEED_200000		200000
 #define SPEED_400000		400000
+#define SPEED_800000		800000
 
 #define SPEED_UNKNOWN		-1
 
@@ -1840,6 +1891,20 @@ static inline int ethtool_validate_duplex(uint8_t duplex)
 #define MASTER_SLAVE_STATE_SLAVE		3
 #define MASTER_SLAVE_STATE_ERR			4
 
+/* These are used to throttle the rate of data on the phy interface when the
+ * native speed of the interface is higher than the link speed. These should
+ * not be used for phy interfaces which natively support multiple speeds (e.g.
+ * MII or SGMII).
+ */
+/* No rate matching performed. */
+#define RATE_MATCH_NONE		0
+/* The phy sends pause frames to throttle the MAC. */
+#define RATE_MATCH_PAUSE	1
+/* The phy asserts CRS to prevent the MAC from transmitting. */
+#define RATE_MATCH_CRS		2
+/* The MAC is programmed with a sufficiently-large IPG. */
+#define RATE_MATCH_OPEN_LOOP	3
+
 /* Which connector port. */
 #define PORT_TP			0x00
 #define PORT_AUI		0x01
@@ -2033,8 +2098,8 @@ enum ethtool_reset_flags {
  *	reported consistently by PHYLIB.  Read-only.
  * @master_slave_cfg: Master/slave port mode.
  * @master_slave_state: Master/slave port state.
+ * @rate_matching: Rate adaptation performed by the PHY
  * @reserved: Reserved for future use; see the note on reserved space.
- * @reserved1: Reserved for future use; see the note on reserved space.
  * @link_mode_masks: Variable length bitmaps.
  *
  * If autonegotiation is disabled, the speed and @duplex represent the
@@ -2085,7 +2150,7 @@ struct ethtool_link_settings {
 	uint8_t	transceiver;
 	uint8_t	master_slave_cfg;
 	uint8_t	master_slave_state;
-	uint8_t	reserved1[1];
+	uint8_t	rate_matching;
 	uint32_t	reserved[7];
 	uint32_t	link_mode_masks[];
 	/* layout of link_mode_masks fields:
diff --git a/include/standard-headers/linux/fuse.h b/include/standard-headers/linux/fuse.h
index bda06258be..a1af78d989 100644
--- a/include/standard-headers/linux/fuse.h
+++ b/include/standard-headers/linux/fuse.h
@@ -194,6 +194,13 @@
  *  - add FUSE_SECURITY_CTX init flag
  *  - add security context to create, mkdir, symlink, and mknod requests
  *  - add FUSE_HAS_INODE_DAX, FUSE_ATTR_DAX
+ *
+ *  7.37
+ *  - add FUSE_TMPFILE
+ *
+ *  7.38
+ *  - add FUSE_EXPIRE_ONLY flag to fuse_notify_inval_entry
+ *  - add FOPEN_PARALLEL_DIRECT_WRITES
  */
 
 #ifndef _LINUX_FUSE_H
@@ -225,7 +232,7 @@
 #define FUSE_KERNEL_VERSION 7
 
 /** Minor version number of this interface */
-#define FUSE_KERNEL_MINOR_VERSION 36
+#define FUSE_KERNEL_MINOR_VERSION 38
 
 /** The node ID of the root inode */
 #define FUSE_ROOT_ID 1
@@ -297,6 +304,7 @@ struct fuse_file_lock {
  * FOPEN_CACHE_DIR: allow caching this directory
  * FOPEN_STREAM: the file is stream-like (no file position at all)
  * FOPEN_NOFLUSH: don't flush data cache on close (unless FUSE_WRITEBACK_CACHE)
+ * FOPEN_PARALLEL_DIRECT_WRITES: Allow concurrent direct writes on the same inode
  */
 #define FOPEN_DIRECT_IO		(1 << 0)
 #define FOPEN_KEEP_CACHE	(1 << 1)
@@ -304,6 +312,7 @@ struct fuse_file_lock {
 #define FOPEN_CACHE_DIR		(1 << 3)
 #define FOPEN_STREAM		(1 << 4)
 #define FOPEN_NOFLUSH		(1 << 5)
+#define FOPEN_PARALLEL_DIRECT_WRITES	(1 << 6)
 
 /**
  * INIT request/reply flags
@@ -484,6 +493,12 @@ struct fuse_file_lock {
  */
 #define FUSE_SETXATTR_ACL_KILL_SGID	(1 << 0)
 
+/**
+ * notify_inval_entry flags
+ * FUSE_EXPIRE_ONLY
+ */
+#define FUSE_EXPIRE_ONLY		(1 << 0)
+
 enum fuse_opcode {
 	FUSE_LOOKUP		= 1,
 	FUSE_FORGET		= 2,  /* no reply */
@@ -533,6 +548,7 @@ enum fuse_opcode {
 	FUSE_SETUPMAPPING	= 48,
 	FUSE_REMOVEMAPPING	= 49,
 	FUSE_SYNCFS		= 50,
+	FUSE_TMPFILE		= 51,
 
 	/* CUSE specific operations */
 	CUSE_INIT		= 4096,
@@ -911,7 +927,7 @@ struct fuse_notify_inval_inode_out {
 struct fuse_notify_inval_entry_out {
 	uint64_t	parent;
 	uint32_t	namelen;
-	uint32_t	padding;
+	uint32_t	flags;
 };
 
 struct fuse_notify_delete_out {
diff --git a/include/standard-headers/linux/input-event-codes.h b/include/standard-headers/linux/input-event-codes.h
index 50790aee5a..f6bab08540 100644
--- a/include/standard-headers/linux/input-event-codes.h
+++ b/include/standard-headers/linux/input-event-codes.h
@@ -614,6 +614,9 @@
 #define KEY_KBD_LAYOUT_NEXT	0x248	/* AC Next Keyboard Layout Select */
 #define KEY_EMOJI_PICKER	0x249	/* Show/hide emoji picker (HUTRR101) */
 #define KEY_DICTATE		0x24a	/* Start or Stop Voice Dictation Session (HUTRR99) */
+#define KEY_CAMERA_ACCESS_ENABLE	0x24b	/* Enables programmatic access to camera devices. (HUTRR72) */
+#define KEY_CAMERA_ACCESS_DISABLE	0x24c	/* Disables programmatic access to camera devices. (HUTRR72) */
+#define KEY_CAMERA_ACCESS_TOGGLE	0x24d	/* Toggles the current state of the camera access control. (HUTRR72) */
 
 #define KEY_BRIGHTNESS_MIN		0x250	/* Set Brightness to Minimum */
 #define KEY_BRIGHTNESS_MAX		0x251	/* Set Brightness to Maximum */
@@ -862,6 +865,7 @@
 #define ABS_TOOL_WIDTH		0x1c
 
 #define ABS_VOLUME		0x20
+#define ABS_PROFILE		0x21
 
 #define ABS_MISC		0x28
 
diff --git a/include/standard-headers/linux/pci_regs.h b/include/standard-headers/linux/pci_regs.h
index 57b8e2ffb1..85ab127881 100644
--- a/include/standard-headers/linux/pci_regs.h
+++ b/include/standard-headers/linux/pci_regs.h
@@ -1058,6 +1058,7 @@
 /* Precision Time Measurement */
 #define PCI_PTM_CAP			0x04	    /* PTM Capability */
 #define  PCI_PTM_CAP_REQ		0x00000001  /* Requester capable */
+#define  PCI_PTM_CAP_RES		0x00000002  /* Responder capable */
 #define  PCI_PTM_CAP_ROOT		0x00000004  /* Root capable */
 #define  PCI_PTM_GRANULARITY_MASK	0x0000FF00  /* Clock granularity */
 #define PCI_PTM_CTRL			0x08	    /* PTM Control */
@@ -1119,6 +1120,7 @@
 #define  PCI_DOE_STATUS_DATA_OBJECT_READY	0x80000000  /* Data Object Ready */
 #define PCI_DOE_WRITE		0x10    /* DOE Write Data Mailbox Register */
 #define PCI_DOE_READ		0x14    /* DOE Read Data Mailbox Register */
+#define PCI_DOE_CAP_SIZEOF	0x18	/* Size of DOE register block */
 
 /* DOE Data Object - note not actually registers */
 #define PCI_DOE_DATA_OBJECT_HEADER_1_VID		0x0000ffff
diff --git a/include/standard-headers/linux/virtio_blk.h b/include/standard-headers/linux/virtio_blk.h
index 2dcc90826a..e81715cd70 100644
--- a/include/standard-headers/linux/virtio_blk.h
+++ b/include/standard-headers/linux/virtio_blk.h
@@ -40,6 +40,7 @@
 #define VIRTIO_BLK_F_MQ		12	/* support more than one vq */
 #define VIRTIO_BLK_F_DISCARD	13	/* DISCARD is supported */
 #define VIRTIO_BLK_F_WRITE_ZEROES	14	/* WRITE ZEROES is supported */
+#define VIRTIO_BLK_F_SECURE_ERASE	16 /* Secure Erase is supported */
 
 /* Legacy feature bits */
 #ifndef VIRTIO_BLK_NO_LEGACY
@@ -119,6 +120,21 @@ struct virtio_blk_config {
 	uint8_t write_zeroes_may_unmap;
 
 	uint8_t unused1[3];
+
+	/* the next 3 entries are guarded by VIRTIO_BLK_F_SECURE_ERASE */
+	/*
+	 * The maximum secure erase sectors (in 512-byte sectors) for
+	 * one segment.
+	 */
+	__virtio32 max_secure_erase_sectors;
+	/*
+	 * The maximum number of secure erase segments in a
+	 * secure erase command.
+	 */
+	__virtio32 max_secure_erase_seg;
+	/* Secure erase commands must be aligned to this number of sectors. */
+	__virtio32 secure_erase_sector_alignment;
+
 } QEMU_PACKED;
 
 /*
@@ -153,6 +169,9 @@ struct virtio_blk_config {
 /* Write zeroes command */
 #define VIRTIO_BLK_T_WRITE_ZEROES	13
 
+/* Secure erase command */
+#define VIRTIO_BLK_T_SECURE_ERASE	14
+
 #ifndef VIRTIO_BLK_NO_LEGACY
 /* Barrier before this op. */
 #define VIRTIO_BLK_T_BARRIER	0x80000000
diff --git a/include/standard-headers/linux/virtio_bt.h b/include/standard-headers/linux/virtio_bt.h
index 245e1eff4b..a11ecc3f92 100644
--- a/include/standard-headers/linux/virtio_bt.h
+++ b/include/standard-headers/linux/virtio_bt.h
@@ -9,6 +9,7 @@
 #define VIRTIO_BT_F_VND_HCI	0	/* Indicates vendor command support */
 #define VIRTIO_BT_F_MSFT_EXT	1	/* Indicates MSFT vendor support */
 #define VIRTIO_BT_F_AOSP_EXT	2	/* Indicates AOSP vendor support */
+#define VIRTIO_BT_F_CONFIG_V2	3	/* Use second version configuration */
 
 enum virtio_bt_config_type {
 	VIRTIO_BT_CONFIG_TYPE_PRIMARY	= 0,
@@ -28,4 +29,11 @@ struct virtio_bt_config {
 	uint16_t msft_opcode;
 } QEMU_PACKED;
 
+struct virtio_bt_config_v2 {
+	uint8_t  type;
+	uint8_t  alignment;
+	uint16_t vendor;
+	uint16_t msft_opcode;
+};
+
 #endif /* _LINUX_VIRTIO_BT_H */
diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h
index 42c68caf71..c0e797067a 100644
--- a/include/standard-headers/linux/virtio_net.h
+++ b/include/standard-headers/linux/virtio_net.h
@@ -57,6 +57,9 @@
 					 * Steering */
 #define VIRTIO_NET_F_CTRL_MAC_ADDR 23	/* Set MAC address */
 #define VIRTIO_NET_F_NOTF_COAL	53	/* Device supports notifications coalescing */
+#define VIRTIO_NET_F_GUEST_USO4	54	/* Guest can handle USOv4 in. */
+#define VIRTIO_NET_F_GUEST_USO6	55	/* Guest can handle USOv6 in. */
+#define VIRTIO_NET_F_HOST_USO	56	/* Host can handle USO in. */
 #define VIRTIO_NET_F_HASH_REPORT  57	/* Supports hash report */
 #define VIRTIO_NET_F_RSS	  60	/* Supports RSS RX steering */
 #define VIRTIO_NET_F_RSC_EXT	  61	/* extended coalescing info */
@@ -130,6 +133,7 @@ struct virtio_net_hdr_v1 {
 #define VIRTIO_NET_HDR_GSO_TCPV4	1	/* GSO frame, IPv4 TCP (TSO) */
 #define VIRTIO_NET_HDR_GSO_UDP		3	/* GSO frame, IPv4 UDP (UFO) */
 #define VIRTIO_NET_HDR_GSO_TCPV6	4	/* GSO frame, IPv6 TCP */
+#define VIRTIO_NET_HDR_GSO_UDP_L4	5	/* GSO frame, IPv4& IPv6 UDP (USO) */
 #define VIRTIO_NET_HDR_GSO_ECN		0x80	/* TCP has ECN set */
 	uint8_t gso_type;
 	__virtio16 hdr_len;	/* Ethernet + IP + tcp/udp hdrs */
diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h
index 4bf2d7246e..a7cfefb3a8 100644
--- a/linux-headers/asm-arm64/kvm.h
+++ b/linux-headers/asm-arm64/kvm.h
@@ -43,6 +43,7 @@
 #define __KVM_HAVE_VCPU_EVENTS
 
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
+#define KVM_DIRTY_LOG_PAGE_OFFSET 64
 
 #define KVM_REG_SIZE(id)						\
 	(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
diff --git a/linux-headers/asm-generic/hugetlb_encode.h b/linux-headers/asm-generic/hugetlb_encode.h
index 4f3d5aaa11..de687009bf 100644
--- a/linux-headers/asm-generic/hugetlb_encode.h
+++ b/linux-headers/asm-generic/hugetlb_encode.h
@@ -20,18 +20,18 @@
 #define HUGETLB_FLAG_ENCODE_SHIFT	26
 #define HUGETLB_FLAG_ENCODE_MASK	0x3f
 
-#define HUGETLB_FLAG_ENCODE_16KB	(14 << HUGETLB_FLAG_ENCODE_SHIFT)
-#define HUGETLB_FLAG_ENCODE_64KB	(16 << HUGETLB_FLAG_ENCODE_SHIFT)
-#define HUGETLB_FLAG_ENCODE_512KB	(19 << HUGETLB_FLAG_ENCODE_SHIFT)
-#define HUGETLB_FLAG_ENCODE_1MB		(20 << HUGETLB_FLAG_ENCODE_SHIFT)
-#define HUGETLB_FLAG_ENCODE_2MB		(21 << HUGETLB_FLAG_ENCODE_SHIFT)
-#define HUGETLB_FLAG_ENCODE_8MB		(23 << HUGETLB_FLAG_ENCODE_SHIFT)
-#define HUGETLB_FLAG_ENCODE_16MB	(24 << HUGETLB_FLAG_ENCODE_SHIFT)
-#define HUGETLB_FLAG_ENCODE_32MB	(25 << HUGETLB_FLAG_ENCODE_SHIFT)
-#define HUGETLB_FLAG_ENCODE_256MB	(28 << HUGETLB_FLAG_ENCODE_SHIFT)
-#define HUGETLB_FLAG_ENCODE_512MB	(29 << HUGETLB_FLAG_ENCODE_SHIFT)
-#define HUGETLB_FLAG_ENCODE_1GB		(30 << HUGETLB_FLAG_ENCODE_SHIFT)
-#define HUGETLB_FLAG_ENCODE_2GB		(31 << HUGETLB_FLAG_ENCODE_SHIFT)
-#define HUGETLB_FLAG_ENCODE_16GB	(34 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_16KB	(14U << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_64KB	(16U << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_512KB	(19U << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_1MB		(20U << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_2MB		(21U << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_8MB		(23U << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_16MB	(24U << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_32MB	(25U << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_256MB	(28U << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_512MB	(29U << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_1GB		(30U << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_2GB		(31U << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_16GB	(34U << HUGETLB_FLAG_ENCODE_SHIFT)
 
 #endif /* _ASM_GENERIC_HUGETLB_ENCODE_H_ */
diff --git a/linux-headers/asm-generic/mman-common.h b/linux-headers/asm-generic/mman-common.h
index 6c1aa92a92..6ce1f1ceb4 100644
--- a/linux-headers/asm-generic/mman-common.h
+++ b/linux-headers/asm-generic/mman-common.h
@@ -77,6 +77,8 @@
 
 #define MADV_DONTNEED_LOCKED	24	/* like DONTNEED, but drop locked pages too */
 
+#define MADV_COLLAPSE	25		/* Synchronous hugepage collapse */
+
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/linux-headers/asm-mips/mman.h b/linux-headers/asm-mips/mman.h
index 1be428663c..c6e1fc77c9 100644
--- a/linux-headers/asm-mips/mman.h
+++ b/linux-headers/asm-mips/mman.h
@@ -103,6 +103,8 @@
 
 #define MADV_DONTNEED_LOCKED	24	/* like DONTNEED, but drop locked pages too */
 
+#define MADV_COLLAPSE	25		/* Synchronous hugepage collapse */
+
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/linux-headers/asm-riscv/kvm.h b/linux-headers/asm-riscv/kvm.h
index 7351417afd..92af6f3f05 100644
--- a/linux-headers/asm-riscv/kvm.h
+++ b/linux-headers/asm-riscv/kvm.h
@@ -48,6 +48,10 @@ struct kvm_sregs {
 /* CONFIG registers for KVM_GET_ONE_REG and KVM_SET_ONE_REG */
 struct kvm_riscv_config {
 	unsigned long isa;
+	unsigned long zicbom_block_size;
+	unsigned long mvendorid;
+	unsigned long marchid;
+	unsigned long mimpid;
 };
 
 /* CORE registers for KVM_GET_ONE_REG and KVM_SET_ONE_REG */
@@ -98,6 +102,9 @@ enum KVM_RISCV_ISA_EXT_ID {
 	KVM_RISCV_ISA_EXT_M,
 	KVM_RISCV_ISA_EXT_SVPBMT,
 	KVM_RISCV_ISA_EXT_SSTC,
+	KVM_RISCV_ISA_EXT_SVINVAL,
+	KVM_RISCV_ISA_EXT_ZIHINTPAUSE,
+	KVM_RISCV_ISA_EXT_ZICBOM,
 	KVM_RISCV_ISA_EXT_MAX,
 };
 
diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index 46de10a809..2747d2ce14 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -53,14 +53,6 @@
 /* Architectural interrupt line count. */
 #define KVM_NR_INTERRUPTS 256
 
-struct kvm_memory_alias {
-	__u32 slot;  /* this has a different namespace than memory slots */
-	__u32 flags;
-	__u64 guest_phys_addr;
-	__u64 memory_size;
-	__u64 target_phys_addr;
-};
-
 /* for KVM_GET_IRQCHIP and KVM_SET_IRQCHIP */
 struct kvm_pic_state {
 	__u8 last_irr;	/* edge detection */
@@ -214,6 +206,8 @@ struct kvm_msr_list {
 struct kvm_msr_filter_range {
 #define KVM_MSR_FILTER_READ  (1 << 0)
 #define KVM_MSR_FILTER_WRITE (1 << 1)
+#define KVM_MSR_FILTER_RANGE_VALID_MASK (KVM_MSR_FILTER_READ | \
+					 KVM_MSR_FILTER_WRITE)
 	__u32 flags;
 	__u32 nmsrs; /* number of msrs in bitmap */
 	__u32 base;  /* MSR index the bitmap starts at */
@@ -224,6 +218,7 @@ struct kvm_msr_filter_range {
 struct kvm_msr_filter {
 #define KVM_MSR_FILTER_DEFAULT_ALLOW (0 << 0)
 #define KVM_MSR_FILTER_DEFAULT_DENY  (1 << 0)
+#define KVM_MSR_FILTER_VALID_MASK (KVM_MSR_FILTER_DEFAULT_DENY)
 	__u32 flags;
 	struct kvm_msr_filter_range ranges[KVM_MSR_FILTER_MAX_RANGES];
 };
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index ebdafa576d..30b2795d10 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -86,14 +86,6 @@ struct kvm_debug_guest {
 /* *** End of deprecated interfaces *** */
 
 
-/* for KVM_CREATE_MEMORY_REGION */
-struct kvm_memory_region {
-	__u32 slot;
-	__u32 flags;
-	__u64 guest_phys_addr;
-	__u64 memory_size; /* bytes */
-};
-
 /* for KVM_SET_USER_MEMORY_REGION */
 struct kvm_userspace_memory_region {
 	__u32 slot;
@@ -104,9 +96,9 @@ struct kvm_userspace_memory_region {
 };
 
 /*
- * The bit 0 ~ bit 15 of kvm_memory_region::flags are visible for userspace,
- * other bits are reserved for kvm internal use which are defined in
- * include/linux/kvm_host.h.
+ * The bit 0 ~ bit 15 of kvm_userspace_memory_region::flags are visible for
+ * userspace, other bits are reserved for kvm internal use which are defined
+ * in include/linux/kvm_host.h.
  */
 #define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
 #define KVM_MEM_READONLY	(1UL << 1)
@@ -483,6 +475,9 @@ struct kvm_run {
 #define KVM_MSR_EXIT_REASON_INVAL	(1 << 0)
 #define KVM_MSR_EXIT_REASON_UNKNOWN	(1 << 1)
 #define KVM_MSR_EXIT_REASON_FILTER	(1 << 2)
+#define KVM_MSR_EXIT_REASON_VALID_MASK	(KVM_MSR_EXIT_REASON_INVAL   |	\
+					 KVM_MSR_EXIT_REASON_UNKNOWN |	\
+					 KVM_MSR_EXIT_REASON_FILTER)
 			__u32 reason; /* kernel -> user */
 			__u32 index; /* kernel -> user */
 			__u64 data; /* kernel <-> user */
@@ -1175,6 +1170,9 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_VM_DISABLE_NX_HUGE_PAGES 220
 #define KVM_CAP_S390_ZPCI_OP 221
 #define KVM_CAP_S390_CPU_TOPOLOGY 222
+#define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
+#define KVM_CAP_S390_PROTECTED_ASYNC_DISABLE 224
+#define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 225
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1264,6 +1262,7 @@ struct kvm_x86_mce {
 #define KVM_XEN_HVM_CONFIG_RUNSTATE		(1 << 3)
 #define KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL	(1 << 4)
 #define KVM_XEN_HVM_CONFIG_EVTCHN_SEND		(1 << 5)
+#define KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG	(1 << 6)
 
 struct kvm_xen_hvm_config {
 	__u32 flags;
@@ -1434,18 +1433,12 @@ struct kvm_vfio_spapr_tce {
 	__s32	tablefd;
 };
 
-/*
- * ioctls for VM fds
- */
-#define KVM_SET_MEMORY_REGION     _IOW(KVMIO,  0x40, struct kvm_memory_region)
 /*
  * KVM_CREATE_VCPU receives as a parameter the vcpu slot, and returns
  * a vcpu fd.
  */
 #define KVM_CREATE_VCPU           _IO(KVMIO,   0x41)
 #define KVM_GET_DIRTY_LOG         _IOW(KVMIO,  0x42, struct kvm_dirty_log)
-/* KVM_SET_MEMORY_ALIAS is obsolete: */
-#define KVM_SET_MEMORY_ALIAS      _IOW(KVMIO,  0x43, struct kvm_memory_alias)
 #define KVM_SET_NR_MMU_PAGES      _IO(KVMIO,   0x44)
 #define KVM_GET_NR_MMU_PAGES      _IO(KVMIO,   0x45)
 #define KVM_SET_USER_MEMORY_REGION _IOW(KVMIO, 0x46, \
@@ -1737,6 +1730,8 @@ enum pv_cmd_id {
 	KVM_PV_UNSHARE_ALL,
 	KVM_PV_INFO,
 	KVM_PV_DUMP,
+	KVM_PV_ASYNC_CLEANUP_PREPARE,
+	KVM_PV_ASYNC_CLEANUP_PERFORM,
 };
 
 struct kvm_pv_cmd {
@@ -1767,6 +1762,7 @@ struct kvm_xen_hvm_attr {
 	union {
 		__u8 long_mode;
 		__u8 vector;
+		__u8 runstate_update_flag;
 		struct {
 			__u64 gfn;
 		} shared_info;
@@ -1807,6 +1803,8 @@ struct kvm_xen_hvm_attr {
 /* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_EVTCHN_SEND */
 #define KVM_XEN_ATTR_TYPE_EVTCHN		0x3
 #define KVM_XEN_ATTR_TYPE_XEN_VERSION		0x4
+/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG */
+#define KVM_XEN_ATTR_TYPE_RUNSTATE_UPDATE_FLAG	0x5
 
 /* Per-vCPU Xen attributes */
 #define KVM_XEN_VCPU_GET_ATTR	_IOWR(KVMIO, 0xca, struct kvm_xen_vcpu_attr)
diff --git a/linux-headers/linux/psci.h b/linux-headers/linux/psci.h
index 213b2a0f70..e60dfd8907 100644
--- a/linux-headers/linux/psci.h
+++ b/linux-headers/linux/psci.h
@@ -48,12 +48,26 @@
 #define PSCI_0_2_FN64_MIGRATE_INFO_UP_CPU	PSCI_0_2_FN64(7)
 
 #define PSCI_1_0_FN_PSCI_FEATURES		PSCI_0_2_FN(10)
+#define PSCI_1_0_FN_CPU_FREEZE			PSCI_0_2_FN(11)
+#define PSCI_1_0_FN_CPU_DEFAULT_SUSPEND		PSCI_0_2_FN(12)
+#define PSCI_1_0_FN_NODE_HW_STATE		PSCI_0_2_FN(13)
 #define PSCI_1_0_FN_SYSTEM_SUSPEND		PSCI_0_2_FN(14)
 #define PSCI_1_0_FN_SET_SUSPEND_MODE		PSCI_0_2_FN(15)
+#define PSCI_1_0_FN_STAT_RESIDENCY		PSCI_0_2_FN(16)
+#define PSCI_1_0_FN_STAT_COUNT			PSCI_0_2_FN(17)
+
 #define PSCI_1_1_FN_SYSTEM_RESET2		PSCI_0_2_FN(18)
+#define PSCI_1_1_FN_MEM_PROTECT			PSCI_0_2_FN(19)
+#define PSCI_1_1_FN_MEM_PROTECT_CHECK_RANGE	PSCI_0_2_FN(19)
 
+#define PSCI_1_0_FN64_CPU_DEFAULT_SUSPEND	PSCI_0_2_FN64(12)
+#define PSCI_1_0_FN64_NODE_HW_STATE		PSCI_0_2_FN64(13)
 #define PSCI_1_0_FN64_SYSTEM_SUSPEND		PSCI_0_2_FN64(14)
+#define PSCI_1_0_FN64_STAT_RESIDENCY		PSCI_0_2_FN64(16)
+#define PSCI_1_0_FN64_STAT_COUNT		PSCI_0_2_FN64(17)
+
 #define PSCI_1_1_FN64_SYSTEM_RESET2		PSCI_0_2_FN64(18)
+#define PSCI_1_1_FN64_MEM_PROTECT_CHECK_RANGE	PSCI_0_2_FN64(19)
 
 /* PSCI v0.2 power state encoding for CPU_SUSPEND function */
 #define PSCI_0_2_POWER_STATE_ID_MASK		0xffff
diff --git a/linux-headers/linux/userfaultfd.h b/linux-headers/linux/userfaultfd.h
index a3a377cd44..ba5d0df52f 100644
--- a/linux-headers/linux/userfaultfd.h
+++ b/linux-headers/linux/userfaultfd.h
@@ -12,6 +12,10 @@
 
 #include <linux/types.h>
 
+/* ioctls for /dev/userfaultfd */
+#define USERFAULTFD_IOC 0xAA
+#define USERFAULTFD_IOC_NEW _IO(USERFAULTFD_IOC, 0x00)
+
 /*
  * If the UFFDIO_API is upgraded someday, the UFFDIO_UNREGISTER and
  * UFFDIO_WAKE ioctls should be defined as _IOW and not as _IOR.  In
diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index ede44b5572..c59692ce0b 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -819,12 +819,20 @@ struct vfio_device_feature {
  * VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P means that RUNNING_P2P
  * is supported in addition to the STOP_COPY states.
  *
+ * VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_PRE_COPY means that
+ * PRE_COPY is supported in addition to the STOP_COPY states.
+ *
+ * VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P | VFIO_MIGRATION_PRE_COPY
+ * means that RUNNING_P2P, PRE_COPY and PRE_COPY_P2P are supported
+ * in addition to the STOP_COPY states.
+ *
  * Other combinations of flags have behavior to be defined in the future.
  */
 struct vfio_device_feature_migration {
 	__aligned_u64 flags;
 #define VFIO_MIGRATION_STOP_COPY	(1 << 0)
 #define VFIO_MIGRATION_P2P		(1 << 1)
+#define VFIO_MIGRATION_PRE_COPY		(1 << 2)
 };
 #define VFIO_DEVICE_FEATURE_MIGRATION 1
 
@@ -875,8 +883,13 @@ struct vfio_device_feature_mig_state {
  *  RESUMING - The device is stopped and is loading a new internal state
  *  ERROR - The device has failed and must be reset
  *
- * And 1 optional state to support VFIO_MIGRATION_P2P:
+ * And optional states to support VFIO_MIGRATION_P2P:
  *  RUNNING_P2P - RUNNING, except the device cannot do peer to peer DMA
+ * And VFIO_MIGRATION_PRE_COPY:
+ *  PRE_COPY - The device is running normally but tracking internal state
+ *             changes
+ * And VFIO_MIGRATION_P2P | VFIO_MIGRATION_PRE_COPY:
+ *  PRE_COPY_P2P - PRE_COPY, except the device cannot do peer to peer DMA
  *
  * The FSM takes actions on the arcs between FSM states. The driver implements
  * the following behavior for the FSM arcs:
@@ -908,20 +921,48 @@ struct vfio_device_feature_mig_state {
  *
  *   To abort a RESUMING session the device must be reset.
  *
+ * PRE_COPY -> RUNNING
  * RUNNING_P2P -> RUNNING
  *   While in RUNNING the device is fully operational, the device may generate
  *   interrupts, DMA, respond to MMIO, all vfio device regions are functional,
  *   and the device may advance its internal state.
  *
+ *   The PRE_COPY arc will terminate a data transfer session.
+ *
+ * PRE_COPY_P2P -> RUNNING_P2P
  * RUNNING -> RUNNING_P2P
  * STOP -> RUNNING_P2P
  *   While in RUNNING_P2P the device is partially running in the P2P quiescent
  *   state defined below.
  *
+ *   The PRE_COPY_P2P arc will terminate a data transfer session.
+ *
+ * RUNNING -> PRE_COPY
+ * RUNNING_P2P -> PRE_COPY_P2P
  * STOP -> STOP_COPY
- *   This arc begin the process of saving the device state and will return a
- *   new data_fd.
+ *   PRE_COPY, PRE_COPY_P2P and STOP_COPY form the "saving group" of states
+ *   which share a data transfer session. Moving between these states alters
+ *   what is streamed in session, but does not terminate or otherwise affect
+ *   the associated fd.
+ *
+ *   These arcs begin the process of saving the device state and will return a
+ *   new data_fd. The migration driver may perform actions such as enabling
+ *   dirty logging of device state when entering PRE_COPY or PER_COPY_P2P.
+ *
+ *   Each arc does not change the device operation, the device remains
+ *   RUNNING, P2P quiesced or in STOP. The STOP_COPY state is described below
+ *   in PRE_COPY_P2P -> STOP_COPY.
+ *
+ * PRE_COPY -> PRE_COPY_P2P
+ *   Entering PRE_COPY_P2P continues all the behaviors of PRE_COPY above.
+ *   However, while in the PRE_COPY_P2P state, the device is partially running
+ *   in the P2P quiescent state defined below, like RUNNING_P2P.
+ *
+ * PRE_COPY_P2P -> PRE_COPY
+ *   This arc allows returning the device to a full RUNNING behavior while
+ *   continuing all the behaviors of PRE_COPY.
  *
+ * PRE_COPY_P2P -> STOP_COPY
  *   While in the STOP_COPY state the device has the same behavior as STOP
  *   with the addition that the data transfers session continues to stream the
  *   migration state. End of stream on the FD indicates the entire device
@@ -939,6 +980,13 @@ struct vfio_device_feature_mig_state {
  *   device state for this arc if required to prepare the device to receive the
  *   migration data.
  *
+ * STOP_COPY -> PRE_COPY
+ * STOP_COPY -> PRE_COPY_P2P
+ *   These arcs are not permitted and return error if requested. Future
+ *   revisions of this API may define behaviors for these arcs, in this case
+ *   support will be discoverable by a new flag in
+ *   VFIO_DEVICE_FEATURE_MIGRATION.
+ *
  * any -> ERROR
  *   ERROR cannot be specified as a device state, however any transition request
  *   can be failed with an errno return and may then move the device_state into
@@ -950,7 +998,7 @@ struct vfio_device_feature_mig_state {
  * The optional peer to peer (P2P) quiescent state is intended to be a quiescent
  * state for the device for the purposes of managing multiple devices within a
  * user context where peer-to-peer DMA between devices may be active. The
- * RUNNING_P2P states must prevent the device from initiating
+ * RUNNING_P2P and PRE_COPY_P2P states must prevent the device from initiating
  * any new P2P DMA transactions. If the device can identify P2P transactions
  * then it can stop only P2P DMA, otherwise it must stop all DMA. The migration
  * driver must complete any such outstanding operations prior to completing the
@@ -963,6 +1011,8 @@ struct vfio_device_feature_mig_state {
  * above FSM arcs. As there are multiple paths through the FSM arcs the path
  * should be selected based on the following rules:
  *   - Select the shortest path.
+ *   - The path cannot have saving group states as interior arcs, only
+ *     starting/end states.
  * Refer to vfio_mig_get_next_state() for the result of the algorithm.
  *
  * The automatic transit through the FSM arcs that make up the combination
@@ -976,6 +1026,9 @@ struct vfio_device_feature_mig_state {
  * support them. The user can discover if these states are supported by using
  * VFIO_DEVICE_FEATURE_MIGRATION. By using combination transitions the user can
  * avoid knowing about these optional states if the kernel driver supports them.
+ *
+ * Arcs touching PRE_COPY and PRE_COPY_P2P are removed if support for PRE_COPY
+ * is not present.
  */
 enum vfio_device_mig_state {
 	VFIO_DEVICE_STATE_ERROR = 0,
@@ -984,8 +1037,225 @@ enum vfio_device_mig_state {
 	VFIO_DEVICE_STATE_STOP_COPY = 3,
 	VFIO_DEVICE_STATE_RESUMING = 4,
 	VFIO_DEVICE_STATE_RUNNING_P2P = 5,
+	VFIO_DEVICE_STATE_PRE_COPY = 6,
+	VFIO_DEVICE_STATE_PRE_COPY_P2P = 7,
+};
+
+/**
+ * VFIO_MIG_GET_PRECOPY_INFO - _IO(VFIO_TYPE, VFIO_BASE + 21)
+ *
+ * This ioctl is used on the migration data FD in the precopy phase of the
+ * migration data transfer. It returns an estimate of the current data sizes
+ * remaining to be transferred. It allows the user to judge when it is
+ * appropriate to leave PRE_COPY for STOP_COPY.
+ *
+ * This ioctl is valid only in PRE_COPY states and kernel driver should
+ * return -EINVAL from any other migration state.
+ *
+ * The vfio_precopy_info data structure returned by this ioctl provides
+ * estimates of data available from the device during the PRE_COPY states.
+ * This estimate is split into two categories, initial_bytes and
+ * dirty_bytes.
+ *
+ * The initial_bytes field indicates the amount of initial precopy
+ * data available from the device. This field should have a non-zero initial
+ * value and decrease as migration data is read from the device.
+ * It is recommended to leave PRE_COPY for STOP_COPY only after this field
+ * reaches zero. Leaving PRE_COPY earlier might make things slower.
+ *
+ * The dirty_bytes field tracks device state changes relative to data
+ * previously retrieved.  This field starts at zero and may increase as
+ * the internal device state is modified or decrease as that modified
+ * state is read from the device.
+ *
+ * Userspace may use the combination of these fields to estimate the
+ * potential data size available during the PRE_COPY phases, as well as
+ * trends relative to the rate the device is dirtying its internal
+ * state, but these fields are not required to have any bearing relative
+ * to the data size available during the STOP_COPY phase.
+ *
+ * Drivers have a lot of flexibility in when and what they transfer during the
+ * PRE_COPY phase, and how they report this from VFIO_MIG_GET_PRECOPY_INFO.
+ *
+ * During pre-copy the migration data FD has a temporary "end of stream" that is
+ * reached when both initial_bytes and dirty_byte are zero. For instance, this
+ * may indicate that the device is idle and not currently dirtying any internal
+ * state. When read() is done on this temporary end of stream the kernel driver
+ * should return ENOMSG from read(). Userspace can wait for more data (which may
+ * never come) by using poll.
+ *
+ * Once in STOP_COPY the migration data FD has a permanent end of stream
+ * signaled in the usual way by read() always returning 0 and poll always
+ * returning readable. ENOMSG may not be returned in STOP_COPY.
+ * Support for this ioctl is mandatory if a driver claims to support
+ * VFIO_MIGRATION_PRE_COPY.
+ *
+ * Return: 0 on success, -1 and errno set on failure.
+ */
+struct vfio_precopy_info {
+	__u32 argsz;
+	__u32 flags;
+	__aligned_u64 initial_bytes;
+	__aligned_u64 dirty_bytes;
+};
+
+#define VFIO_MIG_GET_PRECOPY_INFO _IO(VFIO_TYPE, VFIO_BASE + 21)
+
+/*
+ * Upon VFIO_DEVICE_FEATURE_SET, allow the device to be moved into a low power
+ * state with the platform-based power management.  Device use of lower power
+ * states depends on factors managed by the runtime power management core,
+ * including system level support and coordinating support among dependent
+ * devices.  Enabling device low power entry does not guarantee lower power
+ * usage by the device, nor is a mechanism provided through this feature to
+ * know the current power state of the device.  If any device access happens
+ * (either from the host or through the vfio uAPI) when the device is in the
+ * low power state, then the host will move the device out of the low power
+ * state as necessary prior to the access.  Once the access is completed, the
+ * device may re-enter the low power state.  For single shot low power support
+ * with wake-up notification, see
+ * VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP below.  Access to mmap'd
+ * device regions is disabled on LOW_POWER_ENTRY and may only be resumed after
+ * calling LOW_POWER_EXIT.
+ */
+#define VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY 3
+
+/*
+ * This device feature has the same behavior as
+ * VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY with the exception that the user
+ * provides an eventfd for wake-up notification.  When the device moves out of
+ * the low power state for the wake-up, the host will not allow the device to
+ * re-enter a low power state without a subsequent user call to one of the low
+ * power entry device feature IOCTLs.  Access to mmap'd device regions is
+ * disabled on LOW_POWER_ENTRY_WITH_WAKEUP and may only be resumed after the
+ * low power exit.  The low power exit can happen either through LOW_POWER_EXIT
+ * or through any other access (where the wake-up notification has been
+ * generated).  The access to mmap'd device regions will not trigger low power
+ * exit.
+ *
+ * The notification through the provided eventfd will be generated only when
+ * the device has entered and is resumed from a low power state after
+ * calling this device feature IOCTL.  A device that has not entered low power
+ * state, as managed through the runtime power management core, will not
+ * generate a notification through the provided eventfd on access.  Calling the
+ * LOW_POWER_EXIT feature is optional in the case where notification has been
+ * signaled on the provided eventfd that a resume from low power has occurred.
+ */
+struct vfio_device_low_power_entry_with_wakeup {
+	__s32 wakeup_eventfd;
+	__u32 reserved;
+};
+
+#define VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP 4
+
+/*
+ * Upon VFIO_DEVICE_FEATURE_SET, disallow use of device low power states as
+ * previously enabled via VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY or
+ * VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP device features.
+ * This device feature IOCTL may itself generate a wakeup eventfd notification
+ * in the latter case if the device had previously entered a low power state.
+ */
+#define VFIO_DEVICE_FEATURE_LOW_POWER_EXIT 5
+
+/*
+ * Upon VFIO_DEVICE_FEATURE_SET start/stop device DMA logging.
+ * VFIO_DEVICE_FEATURE_PROBE can be used to detect if the device supports
+ * DMA logging.
+ *
+ * DMA logging allows a device to internally record what DMAs the device is
+ * initiating and report them back to userspace. It is part of the VFIO
+ * migration infrastructure that allows implementing dirty page tracking
+ * during the pre copy phase of live migration. Only DMA WRITEs are logged,
+ * and this API is not connected to VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE.
+ *
+ * When DMA logging is started a range of IOVAs to monitor is provided and the
+ * device can optimize its logging to cover only the IOVA range given. Each
+ * DMA that the device initiates inside the range will be logged by the device
+ * for later retrieval.
+ *
+ * page_size is an input that hints what tracking granularity the device
+ * should try to achieve. If the device cannot do the hinted page size then
+ * it's the driver choice which page size to pick based on its support.
+ * On output the device will return the page size it selected.
+ *
+ * ranges is a pointer to an array of
+ * struct vfio_device_feature_dma_logging_range.
+ *
+ * The core kernel code guarantees to support by minimum num_ranges that fit
+ * into a single kernel page. User space can try higher values but should give
+ * up if the above can't be achieved as of some driver limitations.
+ *
+ * A single call to start device DMA logging can be issued and a matching stop
+ * should follow at the end. Another start is not allowed in the meantime.
+ */
+struct vfio_device_feature_dma_logging_control {
+	__aligned_u64 page_size;
+	__u32 num_ranges;
+	__u32 __reserved;
+	__aligned_u64 ranges;
 };
 
+struct vfio_device_feature_dma_logging_range {
+	__aligned_u64 iova;
+	__aligned_u64 length;
+};
+
+#define VFIO_DEVICE_FEATURE_DMA_LOGGING_START 6
+
+/*
+ * Upon VFIO_DEVICE_FEATURE_SET stop device DMA logging that was started
+ * by VFIO_DEVICE_FEATURE_DMA_LOGGING_START
+ */
+#define VFIO_DEVICE_FEATURE_DMA_LOGGING_STOP 7
+
+/*
+ * Upon VFIO_DEVICE_FEATURE_GET read back and clear the device DMA log
+ *
+ * Query the device's DMA log for written pages within the given IOVA range.
+ * During querying the log is cleared for the IOVA range.
+ *
+ * bitmap is a pointer to an array of u64s that will hold the output bitmap
+ * with 1 bit reporting a page_size unit of IOVA. The mapping of IOVA to bits
+ * is given by:
+ *  bitmap[(addr - iova)/page_size] & (1ULL << (addr % 64))
+ *
+ * The input page_size can be any power of two value and does not have to
+ * match the value given to VFIO_DEVICE_FEATURE_DMA_LOGGING_START. The driver
+ * will format its internal logging to match the reporting page size, possibly
+ * by replicating bits if the internal page size is lower than requested.
+ *
+ * The LOGGING_REPORT will only set bits in the bitmap and never clear or
+ * perform any initialization of the user provided bitmap.
+ *
+ * If any error is returned userspace should assume that the dirty log is
+ * corrupted. Error recovery is to consider all memory dirty and try to
+ * restart the dirty tracking, or to abort/restart the whole migration.
+ *
+ * If DMA logging is not enabled, an error will be returned.
+ *
+ */
+struct vfio_device_feature_dma_logging_report {
+	__aligned_u64 iova;
+	__aligned_u64 length;
+	__aligned_u64 page_size;
+	__aligned_u64 bitmap;
+};
+
+#define VFIO_DEVICE_FEATURE_DMA_LOGGING_REPORT 8
+
+/*
+ * Upon VFIO_DEVICE_FEATURE_GET read back the estimated data length that will
+ * be required to complete stop copy.
+ *
+ * Note: Can be called on each device state.
+ */
+
+struct vfio_device_feature_mig_data_size {
+	__aligned_u64 stop_copy_length;
+};
+
+#define VFIO_DEVICE_FEATURE_MIG_DATA_SIZE 9
+
 /* -------- API for Type1 VFIO IOMMU -------- */
 
 /**
-- 
2.26.3



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v6 02/13] migration: No save_live_pending() method uses the QEMUFile parameter
  2023-01-12  8:50 [PATCH v6 00/13] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
  2023-01-12  8:50 ` [PATCH v6 01/13] linux-headers: Update to v6.2-rc1 Avihai Horon
@ 2023-01-12  8:50 ` Avihai Horon
  2023-01-12  8:50 ` [PATCH v6 03/13] vfio/migration: Fix NULL pointer dereference bug Avihai Horon
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Avihai Horon @ 2023-01-12  8:50 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Halil Pasic, Christian Borntraeger, Eric Farman,
	Richard Henderson, David Hildenbrand, Ilya Leoshkevich,
	Thomas Huth, Juan Quintela, Dr. David Alan Gilbert,
	Michael S. Tsirkin, Cornelia Huck, Paolo Bonzini,
	Stefan Hajnoczi, Fam Zheng, Eric Blake,
	Vladimir Sementsov-Ogievskiy, John Snow, Cédric Le Goater,
	qemu-s390x, qemu-block, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Avihai Horon, Kirti Wankhede, Tarun Gupta,
	Joao Martins

From: Juan Quintela <quintela@redhat.com>

So remove it everywhere.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/register.h   | 3 +--
 migration/savevm.h             | 3 +--
 hw/s390x/s390-stattrib.c       | 2 +-
 hw/vfio/migration.c            | 3 +--
 migration/block-dirty-bitmap.c | 3 +--
 migration/block.c              | 2 +-
 migration/migration.c          | 4 ++--
 migration/ram.c                | 2 +-
 migration/savevm.c             | 7 +++----
 9 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/include/migration/register.h b/include/migration/register.h
index c1dcff0f90..eb6266a877 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -46,8 +46,7 @@ typedef struct SaveVMHandlers {
 
     /* This runs outside the iothread lock!  */
     int (*save_setup)(QEMUFile *f, void *opaque);
-    void (*save_live_pending)(QEMUFile *f, void *opaque,
-                              uint64_t threshold_size,
+    void (*save_live_pending)(void *opaque, uint64_t threshold_size,
                               uint64_t *res_precopy_only,
                               uint64_t *res_compatible,
                               uint64_t *res_postcopy_only);
diff --git a/migration/savevm.h b/migration/savevm.h
index 6461342cb4..6dec468cc3 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -40,8 +40,7 @@ void qemu_savevm_state_cleanup(void);
 void qemu_savevm_state_complete_postcopy(QEMUFile *f);
 int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only,
                                        bool inactivate_disks);
-void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
-                               uint64_t *res_precopy_only,
+void qemu_savevm_state_pending(uint64_t max_size, uint64_t *res_precopy_only,
                                uint64_t *res_compatible,
                                uint64_t *res_postcopy_only);
 void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
diff --git a/hw/s390x/s390-stattrib.c b/hw/s390x/s390-stattrib.c
index 9eda1c3b2a..a553a1e850 100644
--- a/hw/s390x/s390-stattrib.c
+++ b/hw/s390x/s390-stattrib.c
@@ -182,7 +182,7 @@ static int cmma_save_setup(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static void cmma_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
+static void cmma_save_pending(void *opaque, uint64_t max_size,
                               uint64_t *res_precopy_only,
                               uint64_t *res_compatible,
                               uint64_t *res_postcopy_only)
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index c74453e0b5..e1413ac90c 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -456,8 +456,7 @@ static void vfio_save_cleanup(void *opaque)
     trace_vfio_save_cleanup(vbasedev->name);
 }
 
-static void vfio_save_pending(QEMUFile *f, void *opaque,
-                              uint64_t threshold_size,
+static void vfio_save_pending(void *opaque, uint64_t threshold_size,
                               uint64_t *res_precopy_only,
                               uint64_t *res_compatible,
                               uint64_t *res_postcopy_only)
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 283017d7d3..ffc433cd11 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -761,8 +761,7 @@ static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static void dirty_bitmap_save_pending(QEMUFile *f, void *opaque,
-                                      uint64_t max_size,
+static void dirty_bitmap_save_pending(void *opaque, uint64_t max_size,
                                       uint64_t *res_precopy_only,
                                       uint64_t *res_compatible,
                                       uint64_t *res_postcopy_only)
diff --git a/migration/block.c b/migration/block.c
index 4347da1526..b6a98caf78 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -862,7 +862,7 @@ static int block_save_complete(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static void block_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
+static void block_save_pending(void *opaque, uint64_t max_size,
                                uint64_t *res_precopy_only,
                                uint64_t *res_compatible,
                                uint64_t *res_postcopy_only)
diff --git a/migration/migration.c b/migration/migration.c
index 52b5d39244..9795d0ec5c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3751,8 +3751,8 @@ static MigIterateState migration_iteration_run(MigrationState *s)
     uint64_t pending_size, pend_pre, pend_compat, pend_post;
     bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
 
-    qemu_savevm_state_pending(s->to_dst_file, s->threshold_size, &pend_pre,
-                              &pend_compat, &pend_post);
+    qemu_savevm_state_pending(s->threshold_size, &pend_pre, &pend_compat,
+                              &pend_post);
     pending_size = pend_pre + pend_compat + pend_post;
 
     trace_migrate_pending(pending_size, s->threshold_size,
diff --git a/migration/ram.c b/migration/ram.c
index 334309f1c6..35a251b297 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3392,7 +3392,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
+static void ram_save_pending(void *opaque, uint64_t max_size,
                              uint64_t *res_precopy_only,
                              uint64_t *res_compatible,
                              uint64_t *res_postcopy_only)
diff --git a/migration/savevm.c b/migration/savevm.c
index a0cdb714f7..a94e637904 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1471,7 +1471,7 @@ flush:
  * the result is split into the amount for units that can and
  * for units that can't do postcopy.
  */
-void qemu_savevm_state_pending(QEMUFile *f, uint64_t threshold_size,
+void qemu_savevm_state_pending(uint64_t threshold_size,
                                uint64_t *res_precopy_only,
                                uint64_t *res_compatible,
                                uint64_t *res_postcopy_only)
@@ -1492,9 +1492,8 @@ void qemu_savevm_state_pending(QEMUFile *f, uint64_t threshold_size,
                 continue;
             }
         }
-        se->ops->save_live_pending(f, se->opaque, threshold_size,
-                                   res_precopy_only, res_compatible,
-                                   res_postcopy_only);
+        se->ops->save_live_pending(se->opaque, threshold_size, res_precopy_only,
+                                   res_compatible, res_postcopy_only);
     }
 }
 
-- 
2.26.3



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v6 03/13] vfio/migration: Fix NULL pointer dereference bug
  2023-01-12  8:50 [PATCH v6 00/13] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
  2023-01-12  8:50 ` [PATCH v6 01/13] linux-headers: Update to v6.2-rc1 Avihai Horon
  2023-01-12  8:50 ` [PATCH v6 02/13] migration: No save_live_pending() method uses the QEMUFile parameter Avihai Horon
@ 2023-01-12  8:50 ` Avihai Horon
  2023-01-12  8:50 ` [PATCH v6 04/13] vfio/migration: Allow migration without VFIO IOMMU dirty tracking support Avihai Horon
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Avihai Horon @ 2023-01-12  8:50 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Halil Pasic, Christian Borntraeger, Eric Farman,
	Richard Henderson, David Hildenbrand, Ilya Leoshkevich,
	Thomas Huth, Juan Quintela, Dr. David Alan Gilbert,
	Michael S. Tsirkin, Cornelia Huck, Paolo Bonzini,
	Stefan Hajnoczi, Fam Zheng, Eric Blake,
	Vladimir Sementsov-Ogievskiy, John Snow, Cédric Le Goater,
	qemu-s390x, qemu-block, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Avihai Horon, Kirti Wankhede, Tarun Gupta,
	Joao Martins

As part of its error flow, vfio_vmstate_change() accesses
MigrationState->to_dst_file without any checks. This can cause a NULL
pointer dereference if the error flow is taken and
MigrationState->to_dst_file is not set.

For example, this can happen if VM is started or stopped not during
migration and vfio_vmstate_change() error flow is taken, as
MigrationState->to_dst_file is not set at that time.

Fix it by checking that MigrationState->to_dst_file is set before using
it.

Fixes: 02a7e71b1e5b ("vfio: Add VM state change handler to know state of VM")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
 hw/vfio/migration.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index e1413ac90c..09fe7c1de2 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -743,7 +743,9 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state)
          */
         error_report("%s: Failed to set device state 0x%x", vbasedev->name,
                      (migration->device_state & mask) | value);
-        qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
+        if (migrate_get_current()->to_dst_file) {
+            qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
+        }
     }
     vbasedev->migration->vm_running = running;
     trace_vfio_vmstate_change(vbasedev->name, running, RunState_str(state),
-- 
2.26.3



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v6 04/13] vfio/migration: Allow migration without VFIO IOMMU dirty tracking support
  2023-01-12  8:50 [PATCH v6 00/13] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
                   ` (2 preceding siblings ...)
  2023-01-12  8:50 ` [PATCH v6 03/13] vfio/migration: Fix NULL pointer dereference bug Avihai Horon
@ 2023-01-12  8:50 ` Avihai Horon
  2023-01-12  8:50 ` [PATCH v6 05/13] migration/qemu-file: Add qemu_file_get_to_fd() Avihai Horon
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Avihai Horon @ 2023-01-12  8:50 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Halil Pasic, Christian Borntraeger, Eric Farman,
	Richard Henderson, David Hildenbrand, Ilya Leoshkevich,
	Thomas Huth, Juan Quintela, Dr. David Alan Gilbert,
	Michael S. Tsirkin, Cornelia Huck, Paolo Bonzini,
	Stefan Hajnoczi, Fam Zheng, Eric Blake,
	Vladimir Sementsov-Ogievskiy, John Snow, Cédric Le Goater,
	qemu-s390x, qemu-block, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Avihai Horon, Kirti Wankhede, Tarun Gupta,
	Joao Martins

Currently, if IOMMU of a VFIO container doesn't support dirty page
tracking, migration is blocked. This is because a DMA-able VFIO device
can dirty RAM pages without updating QEMU about it, thus breaking the
migration.

However, this doesn't mean that migration can't be done at all.
In such case, allow migration and let QEMU VFIO code mark all pages
dirty.

This guarantees that all pages that might have gotten dirty are reported
back, and thus guarantees a valid migration even without VFIO IOMMU
dirty tracking support.

The motivation for this patch is the introduction of iommufd [1].
iommufd can directly implement the /dev/vfio/vfio container IOCTLs by
mapping them into its internal ops, allowing the usage of these IOCTLs
over iommufd. However, VFIO IOMMU dirty tracking is not supported by
this VFIO compatibility API.

This patch will allow migration by hosts that use the VFIO compatibility
API and prevent migration regressions caused by the lack of VFIO IOMMU
dirty tracking support.

[1]
https://lore.kernel.org/kvm/0-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com/

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 hw/vfio/common.c    | 20 ++++++++++++++++++--
 hw/vfio/migration.c |  3 +--
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 130e5d1dc7..f6dd571549 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -488,6 +488,12 @@ static int vfio_dma_unmap(VFIOContainer *container,
         return -errno;
     }
 
+    if (iotlb && vfio_devices_all_running_and_saving(container)) {
+        cpu_physical_memory_set_dirty_range(iotlb->translated_addr, size,
+                                            tcg_enabled() ? DIRTY_CLIENTS_ALL :
+                                            DIRTY_CLIENTS_NOCODE);
+    }
+
     return 0;
 }
 
@@ -1201,6 +1207,10 @@ static void vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
         .argsz = sizeof(dirty),
     };
 
+    if (!container->dirty_pages_supported) {
+        return;
+    }
+
     if (start) {
         dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_START;
     } else {
@@ -1236,6 +1246,13 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
     uint64_t pages;
     int ret;
 
+    if (!container->dirty_pages_supported) {
+        cpu_physical_memory_set_dirty_range(ram_addr, size,
+                                            tcg_enabled() ? DIRTY_CLIENTS_ALL :
+                                            DIRTY_CLIENTS_NOCODE);
+        return 0;
+    }
+
     dbitmap = g_malloc0(sizeof(*dbitmap) + sizeof(*range));
 
     dbitmap->argsz = sizeof(*dbitmap) + sizeof(*range);
@@ -1409,8 +1426,7 @@ static void vfio_listener_log_sync(MemoryListener *listener,
 {
     VFIOContainer *container = container_of(listener, VFIOContainer, listener);
 
-    if (vfio_listener_skipped_section(section) ||
-        !container->dirty_pages_supported) {
+    if (vfio_listener_skipped_section(section)) {
         return;
     }
 
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 09fe7c1de2..552c2313b2 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -860,11 +860,10 @@ int64_t vfio_mig_bytes_transferred(void)
 
 int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
 {
-    VFIOContainer *container = vbasedev->group->container;
     struct vfio_region_info *info = NULL;
     int ret = -ENOTSUP;
 
-    if (!vbasedev->enable_migration || !container->dirty_pages_supported) {
+    if (!vbasedev->enable_migration) {
         goto add_blocker;
     }
 
-- 
2.26.3



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v6 05/13] migration/qemu-file: Add qemu_file_get_to_fd()
  2023-01-12  8:50 [PATCH v6 00/13] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
                   ` (3 preceding siblings ...)
  2023-01-12  8:50 ` [PATCH v6 04/13] vfio/migration: Allow migration without VFIO IOMMU dirty tracking support Avihai Horon
@ 2023-01-12  8:50 ` Avihai Horon
  2023-01-12 14:43   ` Cédric Le Goater
  2023-01-12  8:50 ` [PATCH v6 06/13] vfio/common: Change vfio_devices_all_running_and_saving() logic to equivalent one Avihai Horon
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 22+ messages in thread
From: Avihai Horon @ 2023-01-12  8:50 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Halil Pasic, Christian Borntraeger, Eric Farman,
	Richard Henderson, David Hildenbrand, Ilya Leoshkevich,
	Thomas Huth, Juan Quintela, Dr. David Alan Gilbert,
	Michael S. Tsirkin, Cornelia Huck, Paolo Bonzini,
	Stefan Hajnoczi, Fam Zheng, Eric Blake,
	Vladimir Sementsov-Ogievskiy, John Snow, Cédric Le Goater,
	qemu-s390x, qemu-block, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Avihai Horon, Kirti Wankhede, Tarun Gupta,
	Joao Martins

Add new function qemu_file_get_to_fd() that allows reading data from
QEMUFile and writing it straight into a given fd.

This will be used later in VFIO migration code.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
 migration/qemu-file.h |  1 +
 migration/qemu-file.c | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index fa13d04d78..9d0155a2a1 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -148,6 +148,7 @@ int qemu_file_shutdown(QEMUFile *f);
 QEMUFile *qemu_file_get_return_path(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
 void qemu_file_set_blocking(QEMUFile *f, bool block);
+int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size);
 
 void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 2d5f74ffc2..102ab3b439 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -940,3 +940,37 @@ QIOChannel *qemu_file_get_ioc(QEMUFile *file)
 {
     return file->ioc;
 }
+
+/*
+ * Read size bytes from QEMUFile f and write them to fd.
+ */
+int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size)
+{
+    while (size) {
+        size_t pending = f->buf_size - f->buf_index;
+        ssize_t rc;
+
+        if (!pending) {
+            rc = qemu_fill_buffer(f);
+            if (rc < 0) {
+                return rc;
+            }
+            if (rc == 0) {
+                return -EIO;
+            }
+            continue;
+        }
+
+        rc = write(fd, f->buf + f->buf_index, MIN(pending, size));
+        if (rc < 0) {
+            return -errno;
+        }
+        if (rc == 0) {
+            return -EIO;
+        }
+        f->buf_index += rc;
+        size -= rc;
+    }
+
+    return 0;
+}
-- 
2.26.3



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v6 06/13] vfio/common: Change vfio_devices_all_running_and_saving() logic to equivalent one
  2023-01-12  8:50 [PATCH v6 00/13] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
                   ` (4 preceding siblings ...)
  2023-01-12  8:50 ` [PATCH v6 05/13] migration/qemu-file: Add qemu_file_get_to_fd() Avihai Horon
@ 2023-01-12  8:50 ` Avihai Horon
  2023-01-13 15:46   ` Cédric Le Goater
  2023-01-12  8:50 ` [PATCH v6 07/13] vfio/migration: Move migration v1 logic to vfio_migration_init() Avihai Horon
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 22+ messages in thread
From: Avihai Horon @ 2023-01-12  8:50 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Halil Pasic, Christian Borntraeger, Eric Farman,
	Richard Henderson, David Hildenbrand, Ilya Leoshkevich,
	Thomas Huth, Juan Quintela, Dr. David Alan Gilbert,
	Michael S. Tsirkin, Cornelia Huck, Paolo Bonzini,
	Stefan Hajnoczi, Fam Zheng, Eric Blake,
	Vladimir Sementsov-Ogievskiy, John Snow, Cédric Le Goater,
	qemu-s390x, qemu-block, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Avihai Horon, Kirti Wankhede, Tarun Gupta,
	Joao Martins

vfio_devices_all_running_and_saving() is used to check if migration is
in pre-copy phase. This is done by checking if migration is in setup or
active states and if all VFIO devices are in pre-copy state, i.e.
_SAVING | _RUNNING.

In VFIO migration protocol v2 pre-copy support is made optional. Hence,
a matching v2 protocol pre-copy state can't be used here.

As preparation for adding v2 protocol, change
vfio_devices_all_running_and_saving() logic such that it doesn't use the
VFIO pre-copy state.

The new equivalent logic checks if migration is in active state and if
all VFIO devices are in running state [1]. No functional changes
intended.

[1] Note that checking if migration is in setup or active states and if
all VFIO devices are in running state doesn't guarantee that we are in
pre-copy phase, thus we check if migration is only in active state.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 hw/vfio/common.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index f6dd571549..3a35f4afad 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -40,6 +40,7 @@
 #include "trace.h"
 #include "qapi/error.h"
 #include "migration/migration.h"
+#include "migration/misc.h"
 #include "sysemu/tpm.h"
 
 VFIOGroupList vfio_group_list =
@@ -363,13 +364,16 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
     return true;
 }
 
-static bool vfio_devices_all_running_and_saving(VFIOContainer *container)
+/*
+ * Check if all VFIO devices are running and migration is active, which is
+ * essentially equivalent to the migration being in pre-copy phase.
+ */
+static bool vfio_devices_all_running_and_mig_active(VFIOContainer *container)
 {
     VFIOGroup *group;
     VFIODevice *vbasedev;
-    MigrationState *ms = migrate_get_current();
 
-    if (!migration_is_setup_or_active(ms->state)) {
+    if (!migration_is_active(migrate_get_current())) {
         return false;
     }
 
@@ -381,8 +385,7 @@ static bool vfio_devices_all_running_and_saving(VFIOContainer *container)
                 return false;
             }
 
-            if ((migration->device_state & VFIO_DEVICE_STATE_V1_SAVING) &&
-                (migration->device_state & VFIO_DEVICE_STATE_V1_RUNNING)) {
+            if (migration->device_state & VFIO_DEVICE_STATE_V1_RUNNING) {
                 continue;
             } else {
                 return false;
@@ -461,7 +464,7 @@ static int vfio_dma_unmap(VFIOContainer *container,
     };
 
     if (iotlb && container->dirty_pages_supported &&
-        vfio_devices_all_running_and_saving(container)) {
+        vfio_devices_all_running_and_mig_active(container)) {
         return vfio_dma_unmap_bitmap(container, iova, size, iotlb);
     }
 
@@ -488,7 +491,7 @@ static int vfio_dma_unmap(VFIOContainer *container,
         return -errno;
     }
 
-    if (iotlb && vfio_devices_all_running_and_saving(container)) {
+    if (iotlb && vfio_devices_all_running_and_mig_active(container)) {
         cpu_physical_memory_set_dirty_range(iotlb->translated_addr, size,
                                             tcg_enabled() ? DIRTY_CLIENTS_ALL :
                                             DIRTY_CLIENTS_NOCODE);
-- 
2.26.3



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v6 07/13] vfio/migration: Move migration v1 logic to vfio_migration_init()
  2023-01-12  8:50 [PATCH v6 00/13] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
                   ` (5 preceding siblings ...)
  2023-01-12  8:50 ` [PATCH v6 06/13] vfio/common: Change vfio_devices_all_running_and_saving() logic to equivalent one Avihai Horon
@ 2023-01-12  8:50 ` Avihai Horon
  2023-01-12  8:50 ` [PATCH v6 08/13] vfio/migration: Rename functions/structs related to v1 protocol Avihai Horon
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Avihai Horon @ 2023-01-12  8:50 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Halil Pasic, Christian Borntraeger, Eric Farman,
	Richard Henderson, David Hildenbrand, Ilya Leoshkevich,
	Thomas Huth, Juan Quintela, Dr. David Alan Gilbert,
	Michael S. Tsirkin, Cornelia Huck, Paolo Bonzini,
	Stefan Hajnoczi, Fam Zheng, Eric Blake,
	Vladimir Sementsov-Ogievskiy, John Snow, Cédric Le Goater,
	qemu-s390x, qemu-block, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Avihai Horon, Kirti Wankhede, Tarun Gupta,
	Joao Martins

Move vfio_dev_get_region_info() logic from vfio_migration_probe() to
vfio_migration_init(). This logic is specific to v1 protocol and moving
it will make it easier to add the v2 protocol implementation later.
No functional changes intended.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
---
 hw/vfio/migration.c  | 30 +++++++++++++++---------------
 hw/vfio/trace-events |  2 +-
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 552c2313b2..977da64411 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -788,14 +788,14 @@ static void vfio_migration_exit(VFIODevice *vbasedev)
     vbasedev->migration = NULL;
 }
 
-static int vfio_migration_init(VFIODevice *vbasedev,
-                               struct vfio_region_info *info)
+static int vfio_migration_init(VFIODevice *vbasedev)
 {
     int ret;
     Object *obj;
     VFIOMigration *migration;
     char id[256] = "";
     g_autofree char *path = NULL, *oid = NULL;
+    struct vfio_region_info *info;
 
     if (!vbasedev->ops->vfio_get_object) {
         return -EINVAL;
@@ -806,6 +806,14 @@ static int vfio_migration_init(VFIODevice *vbasedev,
         return -EINVAL;
     }
 
+    ret = vfio_get_dev_region_info(vbasedev,
+                                   VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
+                                   VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
+                                   &info);
+    if (ret) {
+        return ret;
+    }
+
     vbasedev->migration = g_new0(VFIOMigration, 1);
     vbasedev->migration->device_state = VFIO_DEVICE_STATE_V1_RUNNING;
     vbasedev->migration->vm_running = runstate_is_running();
@@ -825,6 +833,8 @@ static int vfio_migration_init(VFIODevice *vbasedev,
         goto err;
     }
 
+    g_free(info);
+
     migration = vbasedev->migration;
     migration->vbasedev = vbasedev;
 
@@ -847,6 +857,7 @@ static int vfio_migration_init(VFIODevice *vbasedev,
     return 0;
 
 err:
+    g_free(info);
     vfio_migration_exit(vbasedev);
     return ret;
 }
@@ -860,34 +871,23 @@ int64_t vfio_mig_bytes_transferred(void)
 
 int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
 {
-    struct vfio_region_info *info = NULL;
     int ret = -ENOTSUP;
 
     if (!vbasedev->enable_migration) {
         goto add_blocker;
     }
 
-    ret = vfio_get_dev_region_info(vbasedev,
-                                   VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
-                                   VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
-                                   &info);
-    if (ret) {
-        goto add_blocker;
-    }
-
-    ret = vfio_migration_init(vbasedev, info);
+    ret = vfio_migration_init(vbasedev);
     if (ret) {
         goto add_blocker;
     }
 
-    trace_vfio_migration_probe(vbasedev->name, info->index);
-    g_free(info);
+    trace_vfio_migration_probe(vbasedev->name);
     return 0;
 
 add_blocker:
     error_setg(&vbasedev->migration_blocker,
                "VFIO device doesn't support migration");
-    g_free(info);
 
     ret = migrate_add_blocker(vbasedev->migration_blocker, errp);
     if (ret < 0) {
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 73dffe9e00..b259dcc644 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -148,7 +148,7 @@ vfio_display_edid_update(uint32_t prefx, uint32_t prefy) "%ux%u"
 vfio_display_edid_write_error(void) ""
 
 # migration.c
-vfio_migration_probe(const char *name, uint32_t index) " (%s) Region %d"
+vfio_migration_probe(const char *name) " (%s)"
 vfio_migration_set_state(const char *name, uint32_t state) " (%s) state %d"
 vfio_vmstate_change(const char *name, int running, const char *reason, uint32_t dev_state) " (%s) running %d reason %s device state %d"
 vfio_migration_state_notifier(const char *name, const char *state) " (%s) state %s"
-- 
2.26.3



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v6 08/13] vfio/migration: Rename functions/structs related to v1 protocol
  2023-01-12  8:50 [PATCH v6 00/13] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
                   ` (6 preceding siblings ...)
  2023-01-12  8:50 ` [PATCH v6 07/13] vfio/migration: Move migration v1 logic to vfio_migration_init() Avihai Horon
@ 2023-01-12  8:50 ` Avihai Horon
  2023-01-12  8:50 ` [PATCH v6 09/13] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Avihai Horon @ 2023-01-12  8:50 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Halil Pasic, Christian Borntraeger, Eric Farman,
	Richard Henderson, David Hildenbrand, Ilya Leoshkevich,
	Thomas Huth, Juan Quintela, Dr. David Alan Gilbert,
	Michael S. Tsirkin, Cornelia Huck, Paolo Bonzini,
	Stefan Hajnoczi, Fam Zheng, Eric Blake,
	Vladimir Sementsov-Ogievskiy, John Snow, Cédric Le Goater,
	qemu-s390x, qemu-block, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Avihai Horon, Kirti Wankhede, Tarun Gupta,
	Joao Martins

To avoid name collisions, rename functions and structs related to VFIO
migration protocol v1. This will allow the two protocols to co-exist
when v2 protocol is added, until v1 is removed. No functional changes
intended.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
---
 include/hw/vfio/vfio-common.h |   2 +-
 hw/vfio/common.c              |   6 +-
 hw/vfio/migration.c           | 106 +++++++++++++++++-----------------
 hw/vfio/trace-events          |  12 ++--
 4 files changed, 63 insertions(+), 63 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index e573f5a9f1..bbaf72ba00 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -62,7 +62,7 @@ typedef struct VFIOMigration {
     struct VFIODevice *vbasedev;
     VMChangeStateEntry *vm_state;
     VFIORegion region;
-    uint32_t device_state;
+    uint32_t device_state_v1;
     int vm_running;
     Notifier migration_state;
     uint64_t pending_bytes;
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 3a35f4afad..550b2d7ded 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -355,8 +355,8 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
                 return false;
             }
 
-            if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF)
-                && (migration->device_state & VFIO_DEVICE_STATE_V1_RUNNING)) {
+            if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
+                (migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING)) {
                 return false;
             }
         }
@@ -385,7 +385,7 @@ static bool vfio_devices_all_running_and_mig_active(VFIOContainer *container)
                 return false;
             }
 
-            if (migration->device_state & VFIO_DEVICE_STATE_V1_RUNNING) {
+            if (migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING) {
                 continue;
             } else {
                 return false;
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 977da64411..9df859f4d3 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -107,8 +107,8 @@ static int vfio_mig_rw(VFIODevice *vbasedev, __u8 *buf, size_t count,
  * an error is returned.
  */
 
-static int vfio_migration_set_state(VFIODevice *vbasedev, uint32_t mask,
-                                    uint32_t value)
+static int vfio_migration_v1_set_state(VFIODevice *vbasedev, uint32_t mask,
+                                       uint32_t value)
 {
     VFIOMigration *migration = vbasedev->migration;
     VFIORegion *region = &migration->region;
@@ -145,8 +145,8 @@ static int vfio_migration_set_state(VFIODevice *vbasedev, uint32_t mask,
         return ret;
     }
 
-    migration->device_state = device_state;
-    trace_vfio_migration_set_state(vbasedev->name, device_state);
+    migration->device_state_v1 = device_state;
+    trace_vfio_migration_v1_set_state(vbasedev->name, device_state);
     return 0;
 }
 
@@ -260,8 +260,8 @@ static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev, uint64_t *size)
     return ret;
 }
 
-static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
-                            uint64_t data_size)
+static int vfio_v1_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
+                               uint64_t data_size)
 {
     VFIORegion *region = &vbasedev->migration->region;
     uint64_t data_offset = 0, size, report_size;
@@ -288,7 +288,7 @@ static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
             data_size = 0;
         }
 
-        trace_vfio_load_state_device_data(vbasedev->name, data_offset, size);
+        trace_vfio_v1_load_state_device_data(vbasedev->name, data_offset, size);
 
         while (size) {
             void *buf;
@@ -394,7 +394,7 @@ static int vfio_load_device_config_state(QEMUFile *f, void *opaque)
     return qemu_file_get_error(f);
 }
 
-static void vfio_migration_cleanup(VFIODevice *vbasedev)
+static void vfio_migration_v1_cleanup(VFIODevice *vbasedev)
 {
     VFIOMigration *migration = vbasedev->migration;
 
@@ -405,13 +405,13 @@ static void vfio_migration_cleanup(VFIODevice *vbasedev)
 
 /* ---------------------------------------------------------------------- */
 
-static int vfio_save_setup(QEMUFile *f, void *opaque)
+static int vfio_v1_save_setup(QEMUFile *f, void *opaque)
 {
     VFIODevice *vbasedev = opaque;
     VFIOMigration *migration = vbasedev->migration;
     int ret;
 
-    trace_vfio_save_setup(vbasedev->name);
+    trace_vfio_v1_save_setup(vbasedev->name);
 
     qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE);
 
@@ -431,8 +431,8 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
         }
     }
 
-    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_MASK,
-                                   VFIO_DEVICE_STATE_V1_SAVING);
+    ret = vfio_migration_v1_set_state(vbasedev, VFIO_DEVICE_STATE_MASK,
+                                      VFIO_DEVICE_STATE_V1_SAVING);
     if (ret) {
         error_report("%s: Failed to set state SAVING", vbasedev->name);
         return ret;
@@ -448,18 +448,18 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static void vfio_save_cleanup(void *opaque)
+static void vfio_v1_save_cleanup(void *opaque)
 {
     VFIODevice *vbasedev = opaque;
 
-    vfio_migration_cleanup(vbasedev);
+    vfio_migration_v1_cleanup(vbasedev);
     trace_vfio_save_cleanup(vbasedev->name);
 }
 
-static void vfio_save_pending(void *opaque, uint64_t threshold_size,
-                              uint64_t *res_precopy_only,
-                              uint64_t *res_compatible,
-                              uint64_t *res_postcopy_only)
+static void vfio_v1_save_pending(void *opaque, uint64_t threshold_size,
+                                 uint64_t *res_precopy_only,
+                                 uint64_t *res_compatible,
+                                 uint64_t *res_postcopy_only)
 {
     VFIODevice *vbasedev = opaque;
     VFIOMigration *migration = vbasedev->migration;
@@ -472,8 +472,8 @@ static void vfio_save_pending(void *opaque, uint64_t threshold_size,
 
     *res_precopy_only += migration->pending_bytes;
 
-    trace_vfio_save_pending(vbasedev->name, *res_precopy_only,
-                            *res_postcopy_only, *res_compatible);
+    trace_vfio_v1_save_pending(vbasedev->name, *res_precopy_only,
+                               *res_postcopy_only, *res_compatible);
 }
 
 static int vfio_save_iterate(QEMUFile *f, void *opaque)
@@ -523,15 +523,15 @@ static int vfio_save_iterate(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
+static int vfio_v1_save_complete_precopy(QEMUFile *f, void *opaque)
 {
     VFIODevice *vbasedev = opaque;
     VFIOMigration *migration = vbasedev->migration;
     uint64_t data_size;
     int ret;
 
-    ret = vfio_migration_set_state(vbasedev, ~VFIO_DEVICE_STATE_V1_RUNNING,
-                                   VFIO_DEVICE_STATE_V1_SAVING);
+    ret = vfio_migration_v1_set_state(vbasedev, ~VFIO_DEVICE_STATE_V1_RUNNING,
+                                      VFIO_DEVICE_STATE_V1_SAVING);
     if (ret) {
         error_report("%s: Failed to set state STOP and SAVING",
                      vbasedev->name);
@@ -568,13 +568,14 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
         return ret;
     }
 
-    ret = vfio_migration_set_state(vbasedev, ~VFIO_DEVICE_STATE_V1_SAVING, 0);
+    ret = vfio_migration_v1_set_state(vbasedev, ~VFIO_DEVICE_STATE_V1_SAVING,
+                                      0);
     if (ret) {
         error_report("%s: Failed to set state STOPPED", vbasedev->name);
         return ret;
     }
 
-    trace_vfio_save_complete_precopy(vbasedev->name);
+    trace_vfio_v1_save_complete_precopy(vbasedev->name);
     return ret;
 }
 
@@ -591,7 +592,7 @@ static void vfio_save_state(QEMUFile *f, void *opaque)
     }
 }
 
-static int vfio_load_setup(QEMUFile *f, void *opaque)
+static int vfio_v1_load_setup(QEMUFile *f, void *opaque)
 {
     VFIODevice *vbasedev = opaque;
     VFIOMigration *migration = vbasedev->migration;
@@ -607,8 +608,8 @@ static int vfio_load_setup(QEMUFile *f, void *opaque)
         }
     }
 
-    ret = vfio_migration_set_state(vbasedev, ~VFIO_DEVICE_STATE_MASK,
-                                   VFIO_DEVICE_STATE_V1_RESUMING);
+    ret = vfio_migration_v1_set_state(vbasedev, ~VFIO_DEVICE_STATE_MASK,
+                                      VFIO_DEVICE_STATE_V1_RESUMING);
     if (ret) {
         error_report("%s: Failed to set state RESUMING", vbasedev->name);
         if (migration->region.mmaps) {
@@ -618,11 +619,11 @@ static int vfio_load_setup(QEMUFile *f, void *opaque)
     return ret;
 }
 
-static int vfio_load_cleanup(void *opaque)
+static int vfio_v1_load_cleanup(void *opaque)
 {
     VFIODevice *vbasedev = opaque;
 
-    vfio_migration_cleanup(vbasedev);
+    vfio_migration_v1_cleanup(vbasedev);
     trace_vfio_load_cleanup(vbasedev->name);
     return 0;
 }
@@ -660,7 +661,7 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
             uint64_t data_size = qemu_get_be64(f);
 
             if (data_size) {
-                ret = vfio_load_buffer(f, vbasedev, data_size);
+                ret = vfio_v1_load_buffer(f, vbasedev, data_size);
                 if (ret < 0) {
                     return ret;
                 }
@@ -681,21 +682,21 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
     return ret;
 }
 
-static SaveVMHandlers savevm_vfio_handlers = {
-    .save_setup = vfio_save_setup,
-    .save_cleanup = vfio_save_cleanup,
-    .save_live_pending = vfio_save_pending,
+static SaveVMHandlers savevm_vfio_v1_handlers = {
+    .save_setup = vfio_v1_save_setup,
+    .save_cleanup = vfio_v1_save_cleanup,
+    .save_live_pending = vfio_v1_save_pending,
     .save_live_iterate = vfio_save_iterate,
-    .save_live_complete_precopy = vfio_save_complete_precopy,
+    .save_live_complete_precopy = vfio_v1_save_complete_precopy,
     .save_state = vfio_save_state,
-    .load_setup = vfio_load_setup,
-    .load_cleanup = vfio_load_cleanup,
+    .load_setup = vfio_v1_load_setup,
+    .load_cleanup = vfio_v1_load_cleanup,
     .load_state = vfio_load_state,
 };
 
 /* ---------------------------------------------------------------------- */
 
-static void vfio_vmstate_change(void *opaque, bool running, RunState state)
+static void vfio_v1_vmstate_change(void *opaque, bool running, RunState state)
 {
     VFIODevice *vbasedev = opaque;
     VFIOMigration *migration = vbasedev->migration;
@@ -735,21 +736,21 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state)
         }
     }
 
-    ret = vfio_migration_set_state(vbasedev, mask, value);
+    ret = vfio_migration_v1_set_state(vbasedev, mask, value);
     if (ret) {
         /*
          * Migration should be aborted in this case, but vm_state_notify()
          * currently does not support reporting failures.
          */
         error_report("%s: Failed to set device state 0x%x", vbasedev->name,
-                     (migration->device_state & mask) | value);
+                     (migration->device_state_v1 & mask) | value);
         if (migrate_get_current()->to_dst_file) {
             qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
         }
     }
     vbasedev->migration->vm_running = running;
-    trace_vfio_vmstate_change(vbasedev->name, running, RunState_str(state),
-            (migration->device_state & mask) | value);
+    trace_vfio_v1_vmstate_change(vbasedev->name, running, RunState_str(state),
+            (migration->device_state_v1 & mask) | value);
 }
 
 static void vfio_migration_state_notifier(Notifier *notifier, void *data)
@@ -768,10 +769,10 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data)
     case MIGRATION_STATUS_CANCELLED:
     case MIGRATION_STATUS_FAILED:
         bytes_transferred = 0;
-        ret = vfio_migration_set_state(vbasedev,
-                                       ~(VFIO_DEVICE_STATE_V1_SAVING |
-                                         VFIO_DEVICE_STATE_V1_RESUMING),
-                                       VFIO_DEVICE_STATE_V1_RUNNING);
+        ret = vfio_migration_v1_set_state(vbasedev,
+                                          ~(VFIO_DEVICE_STATE_V1_SAVING |
+                                            VFIO_DEVICE_STATE_V1_RESUMING),
+                                          VFIO_DEVICE_STATE_V1_RUNNING);
         if (ret) {
             error_report("%s: Failed to set state RUNNING", vbasedev->name);
         }
@@ -815,7 +816,7 @@ static int vfio_migration_init(VFIODevice *vbasedev)
     }
 
     vbasedev->migration = g_new0(VFIOMigration, 1);
-    vbasedev->migration->device_state = VFIO_DEVICE_STATE_V1_RUNNING;
+    vbasedev->migration->device_state_v1 = VFIO_DEVICE_STATE_V1_RUNNING;
     vbasedev->migration->vm_running = runstate_is_running();
 
     ret = vfio_region_setup(obj, vbasedev, &vbasedev->migration->region,
@@ -846,12 +847,11 @@ static int vfio_migration_init(VFIODevice *vbasedev)
     }
     strpadcpy(id, sizeof(id), path, '\0');
 
-    register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1, &savevm_vfio_handlers,
-                         vbasedev);
+    register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
+                         &savevm_vfio_v1_handlers, vbasedev);
 
-    migration->vm_state = qdev_add_vm_change_state_handler(vbasedev->dev,
-                                                           vfio_vmstate_change,
-                                                           vbasedev);
+    migration->vm_state = qdev_add_vm_change_state_handler(
+        vbasedev->dev, vfio_v1_vmstate_change, vbasedev);
     migration->migration_state.notify = vfio_migration_state_notifier;
     add_migration_state_change_notifier(&migration->migration_state);
     return 0;
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index b259dcc644..9eb9aff295 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -149,20 +149,20 @@ vfio_display_edid_write_error(void) ""
 
 # migration.c
 vfio_migration_probe(const char *name) " (%s)"
-vfio_migration_set_state(const char *name, uint32_t state) " (%s) state %d"
-vfio_vmstate_change(const char *name, int running, const char *reason, uint32_t dev_state) " (%s) running %d reason %s device state %d"
+vfio_migration_v1_set_state(const char *name, uint32_t state) " (%s) state %d"
+vfio_v1_vmstate_change(const char *name, int running, const char *reason, uint32_t dev_state) " (%s) running %d reason %s device state %d"
 vfio_migration_state_notifier(const char *name, const char *state) " (%s) state %s"
-vfio_save_setup(const char *name) " (%s)"
+vfio_v1_save_setup(const char *name) " (%s)"
 vfio_save_cleanup(const char *name) " (%s)"
 vfio_save_buffer(const char *name, uint64_t data_offset, uint64_t data_size, uint64_t pending) " (%s) Offset 0x%"PRIx64" size 0x%"PRIx64" pending 0x%"PRIx64
 vfio_update_pending(const char *name, uint64_t pending) " (%s) pending 0x%"PRIx64
 vfio_save_device_config_state(const char *name) " (%s)"
-vfio_save_pending(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t compatible) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" compatible 0x%"PRIx64
+vfio_v1_save_pending(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t compatible) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" compatible 0x%"PRIx64
 vfio_save_iterate(const char *name, int data_size) " (%s) data_size %d"
-vfio_save_complete_precopy(const char *name) " (%s)"
+vfio_v1_save_complete_precopy(const char *name) " (%s)"
 vfio_load_device_config_state(const char *name) " (%s)"
 vfio_load_state(const char *name, uint64_t data) " (%s) data 0x%"PRIx64
-vfio_load_state_device_data(const char *name, uint64_t data_offset, uint64_t data_size) " (%s) Offset 0x%"PRIx64" size 0x%"PRIx64
+vfio_v1_load_state_device_data(const char *name, uint64_t data_offset, uint64_t data_size) " (%s) Offset 0x%"PRIx64" size 0x%"PRIx64
 vfio_load_cleanup(const char *name) " (%s)"
 vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start) "container fd=%d, iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64
 vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu dirty @ 0x%"PRIx64" - 0x%"PRIx64
-- 
2.26.3



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v6 09/13] vfio/migration: Implement VFIO migration protocol v2
  2023-01-12  8:50 [PATCH v6 00/13] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
                   ` (7 preceding siblings ...)
  2023-01-12  8:50 ` [PATCH v6 08/13] vfio/migration: Rename functions/structs related to v1 protocol Avihai Horon
@ 2023-01-12  8:50 ` Avihai Horon
  2023-01-13 16:13   ` Cédric Le Goater
  2023-01-12  8:50 ` [PATCH v6 10/13] vfio/migration: Optimize vfio_save_pending() Avihai Horon
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 22+ messages in thread
From: Avihai Horon @ 2023-01-12  8:50 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Halil Pasic, Christian Borntraeger, Eric Farman,
	Richard Henderson, David Hildenbrand, Ilya Leoshkevich,
	Thomas Huth, Juan Quintela, Dr. David Alan Gilbert,
	Michael S. Tsirkin, Cornelia Huck, Paolo Bonzini,
	Stefan Hajnoczi, Fam Zheng, Eric Blake,
	Vladimir Sementsov-Ogievskiy, John Snow, Cédric Le Goater,
	qemu-s390x, qemu-block, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Avihai Horon, Kirti Wankhede, Tarun Gupta,
	Joao Martins

Implement the basic mandatory part of VFIO migration protocol v2.
This includes all functionality that is necessary to support
VFIO_MIGRATION_STOP_COPY part of the v2 protocol.

The two protocols, v1 and v2, will co-exist and in the following patches
v1 protocol code will be removed.

There are several main differences between v1 and v2 protocols:
- VFIO device state is now represented as a finite state machine instead
  of a bitmap.

- Migration interface with kernel is now done using VFIO_DEVICE_FEATURE
  ioctl and normal read() and write() instead of the migration region.

- Pre-copy is made optional in v2 protocol. Support for pre-copy will be
  added later on.

Detailed information about VFIO migration protocol v2 and its difference
compared to v1 protocol can be found here [1].

[1]
https://lore.kernel.org/all/20220224142024.147653-10-yishaih@nvidia.com/

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 include/hw/vfio/vfio-common.h |   5 +
 hw/vfio/common.c              |  19 +-
 hw/vfio/migration.c           | 455 +++++++++++++++++++++++++++++++---
 hw/vfio/trace-events          |   7 +
 4 files changed, 447 insertions(+), 39 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index bbaf72ba00..2ec3346fea 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -66,6 +66,11 @@ typedef struct VFIOMigration {
     int vm_running;
     Notifier migration_state;
     uint64_t pending_bytes;
+    enum vfio_device_mig_state device_state;
+    int data_fd;
+    void *data_buffer;
+    size_t data_buffer_size;
+    bool v2;
 } VFIOMigration;
 
 typedef struct VFIOAddressSpace {
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 550b2d7ded..dcaa77d2a8 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -355,10 +355,18 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
                 return false;
             }
 
-            if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
+            if (!migration->v2 &&
+                (vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
                 (migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING)) {
                 return false;
             }
+
+            if (migration->v2 &&
+                (vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
+                (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
+                 migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P)) {
+                return false;
+            }
         }
     }
     return true;
@@ -385,7 +393,14 @@ static bool vfio_devices_all_running_and_mig_active(VFIOContainer *container)
                 return false;
             }
 
-            if (migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING) {
+            if (!migration->v2 &&
+                migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING) {
+                continue;
+            }
+
+            if (migration->v2 &&
+                (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
+                 migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P)) {
                 continue;
             } else {
                 return false;
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 9df859f4d3..08f53189fa 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -10,6 +10,7 @@
 #include "qemu/osdep.h"
 #include "qemu/main-loop.h"
 #include "qemu/cutils.h"
+#include "qemu/units.h"
 #include <linux/vfio.h>
 #include <sys/ioctl.h>
 
@@ -44,8 +45,103 @@
 #define VFIO_MIG_FLAG_DEV_SETUP_STATE   (0xffffffffef100003ULL)
 #define VFIO_MIG_FLAG_DEV_DATA_STATE    (0xffffffffef100004ULL)
 
+/*
+ * This is an arbitrary size based on migration of mlx5 devices, where typically
+ * total device migration size is on the order of 100s of MB. Testing with
+ * larger values, e.g. 128MB and 1GB, did not show a performance improvement.
+ */
+#define VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE (1 * MiB)
+
 static int64_t bytes_transferred;
 
+static const char *mig_state_to_str(enum vfio_device_mig_state state)
+{
+    switch (state) {
+    case VFIO_DEVICE_STATE_ERROR:
+        return "ERROR";
+    case VFIO_DEVICE_STATE_STOP:
+        return "STOP";
+    case VFIO_DEVICE_STATE_RUNNING:
+        return "RUNNING";
+    case VFIO_DEVICE_STATE_STOP_COPY:
+        return "STOP_COPY";
+    case VFIO_DEVICE_STATE_RESUMING:
+        return "RESUMING";
+    case VFIO_DEVICE_STATE_RUNNING_P2P:
+        return "RUNNING_P2P";
+    default:
+        return "UNKNOWN STATE";
+    }
+}
+
+static int vfio_migration_set_state(VFIODevice *vbasedev,
+                                    enum vfio_device_mig_state new_state,
+                                    enum vfio_device_mig_state recover_state)
+{
+    VFIOMigration *migration = vbasedev->migration;
+    uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
+                              sizeof(struct vfio_device_feature_mig_state),
+                              sizeof(uint64_t))] = {};
+    struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
+    struct vfio_device_feature_mig_state *mig_state =
+        (struct vfio_device_feature_mig_state *)feature->data;
+
+    feature->argsz = sizeof(buf);
+    feature->flags =
+        VFIO_DEVICE_FEATURE_SET | VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE;
+    mig_state->device_state = new_state;
+    if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
+        /* Try to set the device in some good state */
+        int err = -errno;
+
+        error_report(
+            "%s: Failed setting device state to %s, err: %s. Setting device in recover state %s",
+                     vbasedev->name, mig_state_to_str(new_state),
+                     strerror(errno), mig_state_to_str(recover_state));
+
+        mig_state->device_state = recover_state;
+        if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
+            err = -errno;
+            error_report(
+                "%s: Failed setting device in recover state, err: %s. Resetting device",
+                         vbasedev->name, strerror(errno));
+
+            if (ioctl(vbasedev->fd, VFIO_DEVICE_RESET)) {
+                hw_error("%s: Failed resetting device, err: %s", vbasedev->name,
+                         strerror(errno));
+            }
+
+            migration->device_state = VFIO_DEVICE_STATE_RUNNING;
+
+            return err;
+        }
+
+        migration->device_state = recover_state;
+
+        return err;
+    }
+
+    migration->device_state = new_state;
+    if (mig_state->data_fd != -1) {
+        if (migration->data_fd != -1) {
+            /*
+             * This can happen if the device is asynchronously reset and
+             * terminates a data transfer.
+             */
+            error_report("%s: data_fd out of sync", vbasedev->name);
+            close(mig_state->data_fd);
+
+            return -EBADF;
+        }
+
+        migration->data_fd = mig_state->data_fd;
+    }
+
+    trace_vfio_migration_set_state(vbasedev->name, mig_state_to_str(new_state));
+
+    return 0;
+}
+
 static inline int vfio_mig_access(VFIODevice *vbasedev, void *val, int count,
                                   off_t off, bool iswrite)
 {
@@ -260,6 +356,18 @@ static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev, uint64_t *size)
     return ret;
 }
 
+static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
+                            uint64_t data_size)
+{
+    VFIOMigration *migration = vbasedev->migration;
+    int ret;
+
+    ret = qemu_file_get_to_fd(f, migration->data_fd, data_size);
+    trace_vfio_load_state_device_data(vbasedev->name, data_size, ret);
+
+    return ret;
+}
+
 static int vfio_v1_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
                                uint64_t data_size)
 {
@@ -394,6 +502,14 @@ static int vfio_load_device_config_state(QEMUFile *f, void *opaque)
     return qemu_file_get_error(f);
 }
 
+static void vfio_migration_cleanup(VFIODevice *vbasedev)
+{
+    VFIOMigration *migration = vbasedev->migration;
+
+    close(migration->data_fd);
+    migration->data_fd = -1;
+}
+
 static void vfio_migration_v1_cleanup(VFIODevice *vbasedev)
 {
     VFIOMigration *migration = vbasedev->migration;
@@ -403,8 +519,80 @@ static void vfio_migration_v1_cleanup(VFIODevice *vbasedev)
     }
 }
 
+static int vfio_query_stop_copy_size(VFIODevice *vbasedev,
+                                     uint64_t *stop_copy_size)
+{
+    uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
+                              sizeof(struct vfio_device_feature_mig_data_size),
+                              sizeof(uint64_t))] = {};
+    struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
+    struct vfio_device_feature_mig_data_size *mig_data_size =
+        (struct vfio_device_feature_mig_data_size *)feature->data;
+
+    feature->argsz = sizeof(buf);
+    feature->flags =
+        VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_MIG_DATA_SIZE;
+
+    if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
+        return -errno;
+    }
+
+    *stop_copy_size = mig_data_size->stop_copy_length;
+
+    return 0;
+}
+
+/* Returns 1 if end-of-stream is reached, 0 if more data and -1 if error */
+static int vfio_save_block(QEMUFile *f, VFIOMigration *migration)
+{
+    ssize_t data_size;
+
+    data_size = read(migration->data_fd, migration->data_buffer,
+                     migration->data_buffer_size);
+    if (data_size < 0) {
+        return -errno;
+    }
+    if (data_size == 0) {
+        return 1;
+    }
+
+    qemu_put_be64(f, VFIO_MIG_FLAG_DEV_DATA_STATE);
+    qemu_put_be64(f, data_size);
+    qemu_put_buffer(f, migration->data_buffer, data_size);
+    bytes_transferred += data_size;
+
+    trace_vfio_save_block(migration->vbasedev->name, data_size);
+
+    return qemu_file_get_error(f);
+}
+
 /* ---------------------------------------------------------------------- */
 
+static int vfio_save_setup(QEMUFile *f, void *opaque)
+{
+    VFIODevice *vbasedev = opaque;
+    VFIOMigration *migration = vbasedev->migration;
+    uint64_t stop_copy_size = VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE;
+
+    qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE);
+
+    vfio_query_stop_copy_size(vbasedev, &stop_copy_size);
+    migration->data_buffer_size = MIN(VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE,
+                                      stop_copy_size);
+    migration->data_buffer = g_try_malloc0(migration->data_buffer_size);
+    if (!migration->data_buffer) {
+        error_report("%s: Failed to allocate migration data buffer",
+                     vbasedev->name);
+        return -ENOMEM;
+    }
+
+    trace_vfio_save_setup(vbasedev->name, migration->data_buffer_size);
+
+    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
+
+    return qemu_file_get_error(f);
+}
+
 static int vfio_v1_save_setup(QEMUFile *f, void *opaque)
 {
     VFIODevice *vbasedev = opaque;
@@ -448,6 +636,17 @@ static int vfio_v1_save_setup(QEMUFile *f, void *opaque)
     return 0;
 }
 
+static void vfio_save_cleanup(void *opaque)
+{
+    VFIODevice *vbasedev = opaque;
+    VFIOMigration *migration = vbasedev->migration;
+
+    g_free(migration->data_buffer);
+    migration->data_buffer = NULL;
+    vfio_migration_cleanup(vbasedev);
+    trace_vfio_save_cleanup(vbasedev->name);
+}
+
 static void vfio_v1_save_cleanup(void *opaque)
 {
     VFIODevice *vbasedev = opaque;
@@ -456,6 +655,31 @@ static void vfio_v1_save_cleanup(void *opaque)
     trace_vfio_save_cleanup(vbasedev->name);
 }
 
+/*
+ * Migration size of VFIO devices can be as little as a few KBs or as big as
+ * many GBs. This value should be big enough to cover the worst case.
+ */
+#define VFIO_MIG_STOP_COPY_SIZE (100 * GiB)
+static void vfio_save_pending(void *opaque, uint64_t threshold_size,
+                              uint64_t *res_precopy_only,
+                              uint64_t *res_compatible,
+                              uint64_t *res_postcopy_only)
+{
+    VFIODevice *vbasedev = opaque;
+    uint64_t stop_copy_size = VFIO_MIG_STOP_COPY_SIZE;
+
+    /*
+     * If getting pending migration size fails, VFIO_MIG_STOP_COPY_SIZE is
+     * reported so downtime limit won't be violated.
+     */
+    vfio_query_stop_copy_size(vbasedev, &stop_copy_size);
+    *res_precopy_only += stop_copy_size;
+
+    trace_vfio_save_pending(vbasedev->name, *res_precopy_only,
+                            *res_postcopy_only, *res_compatible,
+                            stop_copy_size);
+}
+
 static void vfio_v1_save_pending(void *opaque, uint64_t threshold_size,
                                  uint64_t *res_precopy_only,
                                  uint64_t *res_compatible,
@@ -523,6 +747,41 @@ static int vfio_save_iterate(QEMUFile *f, void *opaque)
     return 0;
 }
 
+static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
+{
+    VFIODevice *vbasedev = opaque;
+    enum vfio_device_mig_state recover_state;
+    int ret;
+
+    /* We reach here with device state STOP only */
+    recover_state = VFIO_DEVICE_STATE_STOP;
+    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP_COPY,
+                                   recover_state);
+    if (ret) {
+        return ret;
+    }
+
+    do {
+        ret = vfio_save_block(f, vbasedev->migration);
+        if (ret < 0) {
+            return ret;
+        }
+    } while (!ret);
+
+    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
+    ret = qemu_file_get_error(f);
+    if (ret) {
+        return ret;
+    }
+
+    recover_state = VFIO_DEVICE_STATE_ERROR;
+    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
+                                   recover_state);
+    trace_vfio_save_complete_precopy(vbasedev->name, ret);
+
+    return ret;
+}
+
 static int vfio_v1_save_complete_precopy(QEMUFile *f, void *opaque)
 {
     VFIODevice *vbasedev = opaque;
@@ -592,6 +851,14 @@ static void vfio_save_state(QEMUFile *f, void *opaque)
     }
 }
 
+static int vfio_load_setup(QEMUFile *f, void *opaque)
+{
+    VFIODevice *vbasedev = opaque;
+
+    return vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING,
+                                   vbasedev->migration->device_state);
+}
+
 static int vfio_v1_load_setup(QEMUFile *f, void *opaque)
 {
     VFIODevice *vbasedev = opaque;
@@ -619,6 +886,16 @@ static int vfio_v1_load_setup(QEMUFile *f, void *opaque)
     return ret;
 }
 
+static int vfio_load_cleanup(void *opaque)
+{
+    VFIODevice *vbasedev = opaque;
+
+    vfio_migration_cleanup(vbasedev);
+    trace_vfio_load_cleanup(vbasedev->name);
+
+    return 0;
+}
+
 static int vfio_v1_load_cleanup(void *opaque)
 {
     VFIODevice *vbasedev = opaque;
@@ -661,7 +938,11 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
             uint64_t data_size = qemu_get_be64(f);
 
             if (data_size) {
-                ret = vfio_v1_load_buffer(f, vbasedev, data_size);
+                if (vbasedev->migration->v2) {
+                    ret = vfio_load_buffer(f, vbasedev, data_size);
+                } else {
+                    ret = vfio_v1_load_buffer(f, vbasedev, data_size);
+                }
                 if (ret < 0) {
                     return ret;
                 }
@@ -682,6 +963,17 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
     return ret;
 }
 
+static const SaveVMHandlers savevm_vfio_handlers = {
+    .save_setup = vfio_save_setup,
+    .save_cleanup = vfio_save_cleanup,
+    .save_live_pending = vfio_save_pending,
+    .save_live_complete_precopy = vfio_save_complete_precopy,
+    .save_state = vfio_save_state,
+    .load_setup = vfio_load_setup,
+    .load_cleanup = vfio_load_cleanup,
+    .load_state = vfio_load_state,
+};
+
 static SaveVMHandlers savevm_vfio_v1_handlers = {
     .save_setup = vfio_v1_save_setup,
     .save_cleanup = vfio_v1_save_cleanup,
@@ -696,6 +988,34 @@ static SaveVMHandlers savevm_vfio_v1_handlers = {
 
 /* ---------------------------------------------------------------------- */
 
+static void vfio_vmstate_change(void *opaque, bool running, RunState state)
+{
+    VFIODevice *vbasedev = opaque;
+    enum vfio_device_mig_state new_state;
+    int ret;
+
+    if (running) {
+        new_state = VFIO_DEVICE_STATE_RUNNING;
+    } else {
+        new_state = VFIO_DEVICE_STATE_STOP;
+    }
+
+    ret = vfio_migration_set_state(vbasedev, new_state,
+                                   VFIO_DEVICE_STATE_ERROR);
+    if (ret) {
+        /*
+         * Migration should be aborted in this case, but vm_state_notify()
+         * currently does not support reporting failures.
+         */
+        if (migrate_get_current()->to_dst_file) {
+            qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
+        }
+    }
+
+    trace_vfio_vmstate_change(vbasedev->name, running, RunState_str(state),
+                              mig_state_to_str(new_state));
+}
+
 static void vfio_v1_vmstate_change(void *opaque, bool running, RunState state)
 {
     VFIODevice *vbasedev = opaque;
@@ -769,12 +1089,17 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data)
     case MIGRATION_STATUS_CANCELLED:
     case MIGRATION_STATUS_FAILED:
         bytes_transferred = 0;
-        ret = vfio_migration_v1_set_state(vbasedev,
-                                          ~(VFIO_DEVICE_STATE_V1_SAVING |
-                                            VFIO_DEVICE_STATE_V1_RESUMING),
-                                          VFIO_DEVICE_STATE_V1_RUNNING);
-        if (ret) {
-            error_report("%s: Failed to set state RUNNING", vbasedev->name);
+        if (migration->v2) {
+            vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING,
+                                     VFIO_DEVICE_STATE_ERROR);
+        } else {
+            ret = vfio_migration_v1_set_state(vbasedev,
+                                              ~(VFIO_DEVICE_STATE_V1_SAVING |
+                                                VFIO_DEVICE_STATE_V1_RESUMING),
+                                              VFIO_DEVICE_STATE_V1_RUNNING);
+            if (ret) {
+                error_report("%s: Failed to set state RUNNING", vbasedev->name);
+            }
         }
     }
 }
@@ -783,12 +1108,42 @@ static void vfio_migration_exit(VFIODevice *vbasedev)
 {
     VFIOMigration *migration = vbasedev->migration;
 
-    vfio_region_exit(&migration->region);
-    vfio_region_finalize(&migration->region);
+    if (!migration->v2) {
+        vfio_region_exit(&migration->region);
+        vfio_region_finalize(&migration->region);
+    }
     g_free(vbasedev->migration);
     vbasedev->migration = NULL;
 }
 
+static int vfio_migration_query_flags(VFIODevice *vbasedev, uint64_t *mig_flags)
+{
+    uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
+                                  sizeof(struct vfio_device_feature_migration),
+                              sizeof(uint64_t))] = {};
+    struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
+    struct vfio_device_feature_migration *mig =
+        (struct vfio_device_feature_migration *)feature->data;
+
+    feature->argsz = sizeof(buf);
+    feature->flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_MIGRATION;
+    if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
+        if (errno == ENOTTY) {
+            error_report("%s: VFIO migration is not supported in kernel",
+                         vbasedev->name);
+        } else {
+            error_report("%s: Failed to query VFIO migration support, err: %s",
+                         vbasedev->name, strerror(errno));
+        }
+
+        return -errno;
+    }
+
+    *mig_flags = mig->flags;
+
+    return 0;
+}
+
 static int vfio_migration_init(VFIODevice *vbasedev)
 {
     int ret;
@@ -797,6 +1152,7 @@ static int vfio_migration_init(VFIODevice *vbasedev)
     char id[256] = "";
     g_autofree char *path = NULL, *oid = NULL;
     struct vfio_region_info *info;
+    uint64_t mig_flags;
 
     if (!vbasedev->ops->vfio_get_object) {
         return -EINVAL;
@@ -807,34 +1163,50 @@ static int vfio_migration_init(VFIODevice *vbasedev)
         return -EINVAL;
     }
 
-    ret = vfio_get_dev_region_info(vbasedev,
-                                   VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
-                                   VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
-                                   &info);
-    if (ret) {
-        return ret;
-    }
+    ret = vfio_migration_query_flags(vbasedev, &mig_flags);
+    if (!ret) {
+        /* Migration v2 */
+        /* Basic migration functionality must be supported */
+        if (!(mig_flags & VFIO_MIGRATION_STOP_COPY)) {
+            return -EOPNOTSUPP;
+        }
+        vbasedev->migration = g_new0(VFIOMigration, 1);
+        vbasedev->migration->device_state = VFIO_DEVICE_STATE_RUNNING;
+        vbasedev->migration->data_fd = -1;
+        vbasedev->migration->v2 = true;
+    } else if (ret == -ENOTTY) {
+        /* Migration v1 */
+        ret = vfio_get_dev_region_info(vbasedev,
+                                       VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
+                                       VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
+                                       &info);
+        if (ret) {
+            return ret;
+        }
 
-    vbasedev->migration = g_new0(VFIOMigration, 1);
-    vbasedev->migration->device_state_v1 = VFIO_DEVICE_STATE_V1_RUNNING;
-    vbasedev->migration->vm_running = runstate_is_running();
+        vbasedev->migration = g_new0(VFIOMigration, 1);
+        vbasedev->migration->device_state_v1 = VFIO_DEVICE_STATE_V1_RUNNING;
+        vbasedev->migration->vm_running = runstate_is_running();
 
-    ret = vfio_region_setup(obj, vbasedev, &vbasedev->migration->region,
-                            info->index, "migration");
-    if (ret) {
-        error_report("%s: Failed to setup VFIO migration region %d: %s",
-                     vbasedev->name, info->index, strerror(-ret));
-        goto err;
-    }
+        ret = vfio_region_setup(obj, vbasedev, &vbasedev->migration->region,
+                                info->index, "migration");
+        if (ret) {
+            error_report("%s: Failed to setup VFIO migration region %d: %s",
+                         vbasedev->name, info->index, strerror(-ret));
+            goto err;
+        }
 
-    if (!vbasedev->migration->region.size) {
-        error_report("%s: Invalid zero-sized VFIO migration region %d",
-                     vbasedev->name, info->index);
-        ret = -EINVAL;
-        goto err;
-    }
+        if (!vbasedev->migration->region.size) {
+            error_report("%s: Invalid zero-sized VFIO migration region %d",
+                         vbasedev->name, info->index);
+            ret = -EINVAL;
+            goto err;
+        }
 
-    g_free(info);
+        g_free(info);
+    } else {
+        return ret;
+    }
 
     migration = vbasedev->migration;
     migration->vbasedev = vbasedev;
@@ -847,11 +1219,20 @@ static int vfio_migration_init(VFIODevice *vbasedev)
     }
     strpadcpy(id, sizeof(id), path, '\0');
 
-    register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
-                         &savevm_vfio_v1_handlers, vbasedev);
+    if (migration->v2) {
+        register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
+                             &savevm_vfio_handlers, vbasedev);
+
+        migration->vm_state = qdev_add_vm_change_state_handler(
+            vbasedev->dev, vfio_vmstate_change, vbasedev);
+    } else {
+        register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
+                             &savevm_vfio_v1_handlers, vbasedev);
+
+        migration->vm_state = qdev_add_vm_change_state_handler(
+            vbasedev->dev, vfio_v1_vmstate_change, vbasedev);
+    }
 
-    migration->vm_state = qdev_add_vm_change_state_handler(
-        vbasedev->dev, vfio_v1_vmstate_change, vbasedev);
     migration->migration_state.notify = vfio_migration_state_notifier;
     add_migration_state_change_notifier(&migration->migration_state);
     return 0;
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 9eb9aff295..e7b3b4658c 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -149,20 +149,27 @@ vfio_display_edid_write_error(void) ""
 
 # migration.c
 vfio_migration_probe(const char *name) " (%s)"
+vfio_migration_set_state(const char *name, const char *state) " (%s) state %s"
 vfio_migration_v1_set_state(const char *name, uint32_t state) " (%s) state %d"
+vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
 vfio_v1_vmstate_change(const char *name, int running, const char *reason, uint32_t dev_state) " (%s) running %d reason %s device state %d"
 vfio_migration_state_notifier(const char *name, const char *state) " (%s) state %s"
+vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer size 0x%"PRIx64
 vfio_v1_save_setup(const char *name) " (%s)"
 vfio_save_cleanup(const char *name) " (%s)"
 vfio_save_buffer(const char *name, uint64_t data_offset, uint64_t data_size, uint64_t pending) " (%s) Offset 0x%"PRIx64" size 0x%"PRIx64" pending 0x%"PRIx64
 vfio_update_pending(const char *name, uint64_t pending) " (%s) pending 0x%"PRIx64
 vfio_save_device_config_state(const char *name) " (%s)"
+vfio_save_pending(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t compatible, uint64_t stopcopy_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" compatible 0x%"PRIx64" stopcopy size 0x%"PRIx64
 vfio_v1_save_pending(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t compatible) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" compatible 0x%"PRIx64
 vfio_save_iterate(const char *name, int data_size) " (%s) data_size %d"
+vfio_save_complete_precopy(const char *name, int ret) " (%s) ret %d"
 vfio_v1_save_complete_precopy(const char *name) " (%s)"
 vfio_load_device_config_state(const char *name) " (%s)"
 vfio_load_state(const char *name, uint64_t data) " (%s) data 0x%"PRIx64
 vfio_v1_load_state_device_data(const char *name, uint64_t data_offset, uint64_t data_size) " (%s) Offset 0x%"PRIx64" size 0x%"PRIx64
+vfio_load_state_device_data(const char *name, uint64_t data_size, int ret) " (%s) size 0x%"PRIx64" ret %d"
 vfio_load_cleanup(const char *name) " (%s)"
 vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start) "container fd=%d, iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64
 vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu dirty @ 0x%"PRIx64" - 0x%"PRIx64
+vfio_save_block(const char *name, int data_size) " (%s) data_size %d"
-- 
2.26.3



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v6 10/13] vfio/migration: Optimize vfio_save_pending()
  2023-01-12  8:50 [PATCH v6 00/13] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
                   ` (8 preceding siblings ...)
  2023-01-12  8:50 ` [PATCH v6 09/13] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
@ 2023-01-12  8:50 ` Avihai Horon
  2023-01-13 16:05   ` Cédric Le Goater
  2023-01-12  8:50 ` [PATCH v6 11/13] vfio/migration: Remove VFIO migration protocol v1 Avihai Horon
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 22+ messages in thread
From: Avihai Horon @ 2023-01-12  8:50 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Halil Pasic, Christian Borntraeger, Eric Farman,
	Richard Henderson, David Hildenbrand, Ilya Leoshkevich,
	Thomas Huth, Juan Quintela, Dr. David Alan Gilbert,
	Michael S. Tsirkin, Cornelia Huck, Paolo Bonzini,
	Stefan Hajnoczi, Fam Zheng, Eric Blake,
	Vladimir Sementsov-Ogievskiy, John Snow, Cédric Le Goater,
	qemu-s390x, qemu-block, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Avihai Horon, Kirti Wankhede, Tarun Gupta,
	Joao Martins

During pre-copy phase of migration vfio_save_pending() is called
repeatedly and queries the VFIO device for its pending data size.

As long as pending RAM size is over the threshold, migration can't
converge and be completed. Therefore, during this time there is no
point in querying the VFIO device pending data size.

Avoid these unnecessary queries by issuing them in a RAM pre-copy
notifier instead of vfio_save_pending().

This way the VFIO device is queried only when RAM pending data is
below the threshold, when there is an actual chance for migration to
converge.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 include/hw/vfio/vfio-common.h |  2 ++
 hw/vfio/migration.c           | 56 +++++++++++++++++++++++++++--------
 hw/vfio/trace-events          |  1 +
 3 files changed, 46 insertions(+), 13 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 2ec3346fea..113f8d9208 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -65,11 +65,13 @@ typedef struct VFIOMigration {
     uint32_t device_state_v1;
     int vm_running;
     Notifier migration_state;
+    NotifierWithReturn migration_data;
     uint64_t pending_bytes;
     enum vfio_device_mig_state device_state;
     int data_fd;
     void *data_buffer;
     size_t data_buffer_size;
+    uint64_t stop_copy_size;
     bool v2;
 } VFIOMigration;
 
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 08f53189fa..04f4397212 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -655,29 +655,19 @@ static void vfio_v1_save_cleanup(void *opaque)
     trace_vfio_save_cleanup(vbasedev->name);
 }
 
-/*
- * Migration size of VFIO devices can be as little as a few KBs or as big as
- * many GBs. This value should be big enough to cover the worst case.
- */
-#define VFIO_MIG_STOP_COPY_SIZE (100 * GiB)
 static void vfio_save_pending(void *opaque, uint64_t threshold_size,
                               uint64_t *res_precopy_only,
                               uint64_t *res_compatible,
                               uint64_t *res_postcopy_only)
 {
     VFIODevice *vbasedev = opaque;
-    uint64_t stop_copy_size = VFIO_MIG_STOP_COPY_SIZE;
+    VFIOMigration *migration = vbasedev->migration;
 
-    /*
-     * If getting pending migration size fails, VFIO_MIG_STOP_COPY_SIZE is
-     * reported so downtime limit won't be violated.
-     */
-    vfio_query_stop_copy_size(vbasedev, &stop_copy_size);
-    *res_precopy_only += stop_copy_size;
+    *res_precopy_only += migration->stop_copy_size;
 
     trace_vfio_save_pending(vbasedev->name, *res_precopy_only,
                             *res_postcopy_only, *res_compatible,
-                            stop_copy_size);
+                            migration->stop_copy_size);
 }
 
 static void vfio_v1_save_pending(void *opaque, uint64_t threshold_size,
@@ -1104,6 +1094,40 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data)
     }
 }
 
+/*
+ * Migration size of VFIO devices can be as little as a few KBs or as big as
+ * many GBs. This value should be big enough to cover the worst case.
+ */
+#define VFIO_MIG_STOP_COPY_SIZE (100 * GiB)
+static int vfio_migration_data_notifier(NotifierWithReturn *n, void *data)
+{
+    VFIOMigration *migration = container_of(n, VFIOMigration, migration_data);
+    VFIODevice *vbasedev = migration->vbasedev;
+    PrecopyNotifyData *pnd = data;
+
+    if (pnd->reason != PRECOPY_NOTIFY_AFTER_BITMAP_SYNC) {
+        return 0;
+    }
+
+    /* No need to get pending size when finishing migration */
+    if (runstate_check(RUN_STATE_FINISH_MIGRATE)) {
+        return 0;
+    }
+
+    if (vfio_query_stop_copy_size(vbasedev, &migration->stop_copy_size)) {
+        /*
+         * Failed to get pending migration size. Report big pending size so
+         * downtime limit won't be violated.
+         */
+        migration->stop_copy_size = VFIO_MIG_STOP_COPY_SIZE;
+    }
+
+    trace_vfio_migration_data_notifier(vbasedev->name,
+                                       migration->stop_copy_size);
+
+    return 0;
+}
+
 static void vfio_migration_exit(VFIODevice *vbasedev)
 {
     VFIOMigration *migration = vbasedev->migration;
@@ -1225,6 +1249,9 @@ static int vfio_migration_init(VFIODevice *vbasedev)
 
         migration->vm_state = qdev_add_vm_change_state_handler(
             vbasedev->dev, vfio_vmstate_change, vbasedev);
+
+        migration->migration_data.notify = vfio_migration_data_notifier;
+        precopy_add_notifier(&migration->migration_data);
     } else {
         register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
                              &savevm_vfio_v1_handlers, vbasedev);
@@ -1283,6 +1310,9 @@ void vfio_migration_finalize(VFIODevice *vbasedev)
     if (vbasedev->migration) {
         VFIOMigration *migration = vbasedev->migration;
 
+        if (migration->v2) {
+            precopy_remove_notifier(&migration->migration_data);
+        }
         remove_migration_state_change_notifier(&migration->migration_state);
         qemu_del_vm_change_state_handler(migration->vm_state);
         unregister_savevm(VMSTATE_IF(vbasedev->dev), "vfio", vbasedev);
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index e7b3b4658c..a8a64f0627 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -173,3 +173,4 @@ vfio_load_cleanup(const char *name) " (%s)"
 vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start) "container fd=%d, iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64
 vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu dirty @ 0x%"PRIx64" - 0x%"PRIx64
 vfio_save_block(const char *name, int data_size) " (%s) data_size %d"
+vfio_migration_data_notifier(const char *name, uint64_t stopcopy_size) " (%s) stopcopy size 0x%"PRIx64
-- 
2.26.3



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v6 11/13] vfio/migration: Remove VFIO migration protocol v1
  2023-01-12  8:50 [PATCH v6 00/13] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
                   ` (9 preceding siblings ...)
  2023-01-12  8:50 ` [PATCH v6 10/13] vfio/migration: Optimize vfio_save_pending() Avihai Horon
@ 2023-01-12  8:50 ` Avihai Horon
  2023-01-13 16:06   ` Cédric Le Goater
  2023-01-12  8:50 ` [PATCH v6 12/13] vfio: Alphabetize migration section of VFIO trace-events file Avihai Horon
  2023-01-12  8:50 ` [PATCH v6 13/13] docs/devel: Align VFIO migration docs to v2 protocol Avihai Horon
  12 siblings, 1 reply; 22+ messages in thread
From: Avihai Horon @ 2023-01-12  8:50 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Halil Pasic, Christian Borntraeger, Eric Farman,
	Richard Henderson, David Hildenbrand, Ilya Leoshkevich,
	Thomas Huth, Juan Quintela, Dr. David Alan Gilbert,
	Michael S. Tsirkin, Cornelia Huck, Paolo Bonzini,
	Stefan Hajnoczi, Fam Zheng, Eric Blake,
	Vladimir Sementsov-Ogievskiy, John Snow, Cédric Le Goater,
	qemu-s390x, qemu-block, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Avihai Horon, Kirti Wankhede, Tarun Gupta,
	Joao Martins

Now that v2 protocol implementation has been added, remove the
deprecated v1 implementation.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 include/hw/vfio/vfio-common.h |   5 -
 hw/vfio/common.c              |  19 +-
 hw/vfio/migration.c           | 703 +---------------------------------
 hw/vfio/trace-events          |   9 -
 4 files changed, 24 insertions(+), 712 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 113f8d9208..2aba45887c 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -61,18 +61,13 @@ typedef struct VFIORegion {
 typedef struct VFIOMigration {
     struct VFIODevice *vbasedev;
     VMChangeStateEntry *vm_state;
-    VFIORegion region;
-    uint32_t device_state_v1;
-    int vm_running;
     Notifier migration_state;
     NotifierWithReturn migration_data;
-    uint64_t pending_bytes;
     enum vfio_device_mig_state device_state;
     int data_fd;
     void *data_buffer;
     size_t data_buffer_size;
     uint64_t stop_copy_size;
-    bool v2;
 } VFIOMigration;
 
 typedef struct VFIOAddressSpace {
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index dcaa77d2a8..9a0dbee6b4 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -355,14 +355,7 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
                 return false;
             }
 
-            if (!migration->v2 &&
-                (vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
-                (migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING)) {
-                return false;
-            }
-
-            if (migration->v2 &&
-                (vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
+            if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
                 (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
                  migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P)) {
                 return false;
@@ -393,14 +386,8 @@ static bool vfio_devices_all_running_and_mig_active(VFIOContainer *container)
                 return false;
             }
 
-            if (!migration->v2 &&
-                migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING) {
-                continue;
-            }
-
-            if (migration->v2 &&
-                (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
-                 migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P)) {
+            if (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
+                migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P) {
                 continue;
             } else {
                 return false;
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 04f4397212..7688c83127 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -142,220 +142,6 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
     return 0;
 }
 
-static inline int vfio_mig_access(VFIODevice *vbasedev, void *val, int count,
-                                  off_t off, bool iswrite)
-{
-    int ret;
-
-    ret = iswrite ? pwrite(vbasedev->fd, val, count, off) :
-                    pread(vbasedev->fd, val, count, off);
-    if (ret < count) {
-        error_report("vfio_mig_%s %d byte %s: failed at offset 0x%"
-                     HWADDR_PRIx", err: %s", iswrite ? "write" : "read", count,
-                     vbasedev->name, off, strerror(errno));
-        return (ret < 0) ? ret : -EINVAL;
-    }
-    return 0;
-}
-
-static int vfio_mig_rw(VFIODevice *vbasedev, __u8 *buf, size_t count,
-                       off_t off, bool iswrite)
-{
-    int ret, done = 0;
-    __u8 *tbuf = buf;
-
-    while (count) {
-        int bytes = 0;
-
-        if (count >= 8 && !(off % 8)) {
-            bytes = 8;
-        } else if (count >= 4 && !(off % 4)) {
-            bytes = 4;
-        } else if (count >= 2 && !(off % 2)) {
-            bytes = 2;
-        } else {
-            bytes = 1;
-        }
-
-        ret = vfio_mig_access(vbasedev, tbuf, bytes, off, iswrite);
-        if (ret) {
-            return ret;
-        }
-
-        count -= bytes;
-        done += bytes;
-        off += bytes;
-        tbuf += bytes;
-    }
-    return done;
-}
-
-#define vfio_mig_read(f, v, c, o)       vfio_mig_rw(f, (__u8 *)v, c, o, false)
-#define vfio_mig_write(f, v, c, o)      vfio_mig_rw(f, (__u8 *)v, c, o, true)
-
-#define VFIO_MIG_STRUCT_OFFSET(f)       \
-                                 offsetof(struct vfio_device_migration_info, f)
-/*
- * Change the device_state register for device @vbasedev. Bits set in @mask
- * are preserved, bits set in @value are set, and bits not set in either @mask
- * or @value are cleared in device_state. If the register cannot be accessed,
- * the resulting state would be invalid, or the device enters an error state,
- * an error is returned.
- */
-
-static int vfio_migration_v1_set_state(VFIODevice *vbasedev, uint32_t mask,
-                                       uint32_t value)
-{
-    VFIOMigration *migration = vbasedev->migration;
-    VFIORegion *region = &migration->region;
-    off_t dev_state_off = region->fd_offset +
-                          VFIO_MIG_STRUCT_OFFSET(device_state);
-    uint32_t device_state;
-    int ret;
-
-    ret = vfio_mig_read(vbasedev, &device_state, sizeof(device_state),
-                        dev_state_off);
-    if (ret < 0) {
-        return ret;
-    }
-
-    device_state = (device_state & mask) | value;
-
-    if (!VFIO_DEVICE_STATE_VALID(device_state)) {
-        return -EINVAL;
-    }
-
-    ret = vfio_mig_write(vbasedev, &device_state, sizeof(device_state),
-                         dev_state_off);
-    if (ret < 0) {
-        int rret;
-
-        rret = vfio_mig_read(vbasedev, &device_state, sizeof(device_state),
-                             dev_state_off);
-
-        if ((rret < 0) || (VFIO_DEVICE_STATE_IS_ERROR(device_state))) {
-            hw_error("%s: Device in error state 0x%x", vbasedev->name,
-                     device_state);
-            return rret ? rret : -EIO;
-        }
-        return ret;
-    }
-
-    migration->device_state_v1 = device_state;
-    trace_vfio_migration_v1_set_state(vbasedev->name, device_state);
-    return 0;
-}
-
-static void *get_data_section_size(VFIORegion *region, uint64_t data_offset,
-                                   uint64_t data_size, uint64_t *size)
-{
-    void *ptr = NULL;
-    uint64_t limit = 0;
-    int i;
-
-    if (!region->mmaps) {
-        if (size) {
-            *size = MIN(data_size, region->size - data_offset);
-        }
-        return ptr;
-    }
-
-    for (i = 0; i < region->nr_mmaps; i++) {
-        VFIOMmap *map = region->mmaps + i;
-
-        if ((data_offset >= map->offset) &&
-            (data_offset < map->offset + map->size)) {
-
-            /* check if data_offset is within sparse mmap areas */
-            ptr = map->mmap + data_offset - map->offset;
-            if (size) {
-                *size = MIN(data_size, map->offset + map->size - data_offset);
-            }
-            break;
-        } else if ((data_offset < map->offset) &&
-                   (!limit || limit > map->offset)) {
-            /*
-             * data_offset is not within sparse mmap areas, find size of
-             * non-mapped area. Check through all list since region->mmaps list
-             * is not sorted.
-             */
-            limit = map->offset;
-        }
-    }
-
-    if (!ptr && size) {
-        *size = limit ? MIN(data_size, limit - data_offset) : data_size;
-    }
-    return ptr;
-}
-
-static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev, uint64_t *size)
-{
-    VFIOMigration *migration = vbasedev->migration;
-    VFIORegion *region = &migration->region;
-    uint64_t data_offset = 0, data_size = 0, sz;
-    int ret;
-
-    ret = vfio_mig_read(vbasedev, &data_offset, sizeof(data_offset),
-                      region->fd_offset + VFIO_MIG_STRUCT_OFFSET(data_offset));
-    if (ret < 0) {
-        return ret;
-    }
-
-    ret = vfio_mig_read(vbasedev, &data_size, sizeof(data_size),
-                        region->fd_offset + VFIO_MIG_STRUCT_OFFSET(data_size));
-    if (ret < 0) {
-        return ret;
-    }
-
-    trace_vfio_save_buffer(vbasedev->name, data_offset, data_size,
-                           migration->pending_bytes);
-
-    qemu_put_be64(f, data_size);
-    sz = data_size;
-
-    while (sz) {
-        void *buf;
-        uint64_t sec_size;
-        bool buf_allocated = false;
-
-        buf = get_data_section_size(region, data_offset, sz, &sec_size);
-
-        if (!buf) {
-            buf = g_try_malloc(sec_size);
-            if (!buf) {
-                error_report("%s: Error allocating buffer ", __func__);
-                return -ENOMEM;
-            }
-            buf_allocated = true;
-
-            ret = vfio_mig_read(vbasedev, buf, sec_size,
-                                region->fd_offset + data_offset);
-            if (ret < 0) {
-                g_free(buf);
-                return ret;
-            }
-        }
-
-        qemu_put_buffer(f, buf, sec_size);
-
-        if (buf_allocated) {
-            g_free(buf);
-        }
-        sz -= sec_size;
-        data_offset += sec_size;
-    }
-
-    ret = qemu_file_get_error(f);
-
-    if (!ret && size) {
-        *size = data_size;
-    }
-
-    bytes_transferred += data_size;
-    return ret;
-}
-
 static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
                             uint64_t data_size)
 {
@@ -368,96 +154,6 @@ static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
     return ret;
 }
 
-static int vfio_v1_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
-                               uint64_t data_size)
-{
-    VFIORegion *region = &vbasedev->migration->region;
-    uint64_t data_offset = 0, size, report_size;
-    int ret;
-
-    do {
-        ret = vfio_mig_read(vbasedev, &data_offset, sizeof(data_offset),
-                      region->fd_offset + VFIO_MIG_STRUCT_OFFSET(data_offset));
-        if (ret < 0) {
-            return ret;
-        }
-
-        if (data_offset + data_size > region->size) {
-            /*
-             * If data_size is greater than the data section of migration region
-             * then iterate the write buffer operation. This case can occur if
-             * size of migration region at destination is smaller than size of
-             * migration region at source.
-             */
-            report_size = size = region->size - data_offset;
-            data_size -= size;
-        } else {
-            report_size = size = data_size;
-            data_size = 0;
-        }
-
-        trace_vfio_v1_load_state_device_data(vbasedev->name, data_offset, size);
-
-        while (size) {
-            void *buf;
-            uint64_t sec_size;
-            bool buf_alloc = false;
-
-            buf = get_data_section_size(region, data_offset, size, &sec_size);
-
-            if (!buf) {
-                buf = g_try_malloc(sec_size);
-                if (!buf) {
-                    error_report("%s: Error allocating buffer ", __func__);
-                    return -ENOMEM;
-                }
-                buf_alloc = true;
-            }
-
-            qemu_get_buffer(f, buf, sec_size);
-
-            if (buf_alloc) {
-                ret = vfio_mig_write(vbasedev, buf, sec_size,
-                        region->fd_offset + data_offset);
-                g_free(buf);
-
-                if (ret < 0) {
-                    return ret;
-                }
-            }
-            size -= sec_size;
-            data_offset += sec_size;
-        }
-
-        ret = vfio_mig_write(vbasedev, &report_size, sizeof(report_size),
-                        region->fd_offset + VFIO_MIG_STRUCT_OFFSET(data_size));
-        if (ret < 0) {
-            return ret;
-        }
-    } while (data_size);
-
-    return 0;
-}
-
-static int vfio_update_pending(VFIODevice *vbasedev)
-{
-    VFIOMigration *migration = vbasedev->migration;
-    VFIORegion *region = &migration->region;
-    uint64_t pending_bytes = 0;
-    int ret;
-
-    ret = vfio_mig_read(vbasedev, &pending_bytes, sizeof(pending_bytes),
-                    region->fd_offset + VFIO_MIG_STRUCT_OFFSET(pending_bytes));
-    if (ret < 0) {
-        migration->pending_bytes = 0;
-        return ret;
-    }
-
-    migration->pending_bytes = pending_bytes;
-    trace_vfio_update_pending(vbasedev->name, pending_bytes);
-    return 0;
-}
-
 static int vfio_save_device_config_state(QEMUFile *f, void *opaque)
 {
     VFIODevice *vbasedev = opaque;
@@ -510,15 +206,6 @@ static void vfio_migration_cleanup(VFIODevice *vbasedev)
     migration->data_fd = -1;
 }
 
-static void vfio_migration_v1_cleanup(VFIODevice *vbasedev)
-{
-    VFIOMigration *migration = vbasedev->migration;
-
-    if (migration->region.mmaps) {
-        vfio_region_unmap(&migration->region);
-    }
-}
-
 static int vfio_query_stop_copy_size(VFIODevice *vbasedev,
                                      uint64_t *stop_copy_size)
 {
@@ -593,49 +280,6 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
     return qemu_file_get_error(f);
 }
 
-static int vfio_v1_save_setup(QEMUFile *f, void *opaque)
-{
-    VFIODevice *vbasedev = opaque;
-    VFIOMigration *migration = vbasedev->migration;
-    int ret;
-
-    trace_vfio_v1_save_setup(vbasedev->name);
-
-    qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE);
-
-    if (migration->region.mmaps) {
-        /*
-         * Calling vfio_region_mmap() from migration thread. Memory API called
-         * from this function require locking the iothread when called from
-         * outside the main loop thread.
-         */
-        qemu_mutex_lock_iothread();
-        ret = vfio_region_mmap(&migration->region);
-        qemu_mutex_unlock_iothread();
-        if (ret) {
-            error_report("%s: Failed to mmap VFIO migration region: %s",
-                         vbasedev->name, strerror(-ret));
-            error_report("%s: Falling back to slow path", vbasedev->name);
-        }
-    }
-
-    ret = vfio_migration_v1_set_state(vbasedev, VFIO_DEVICE_STATE_MASK,
-                                      VFIO_DEVICE_STATE_V1_SAVING);
-    if (ret) {
-        error_report("%s: Failed to set state SAVING", vbasedev->name);
-        return ret;
-    }
-
-    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
-
-    ret = qemu_file_get_error(f);
-    if (ret) {
-        return ret;
-    }
-
-    return 0;
-}
-
 static void vfio_save_cleanup(void *opaque)
 {
     VFIODevice *vbasedev = opaque;
@@ -647,14 +291,6 @@ static void vfio_save_cleanup(void *opaque)
     trace_vfio_save_cleanup(vbasedev->name);
 }
 
-static void vfio_v1_save_cleanup(void *opaque)
-{
-    VFIODevice *vbasedev = opaque;
-
-    vfio_migration_v1_cleanup(vbasedev);
-    trace_vfio_save_cleanup(vbasedev->name);
-}
-
 static void vfio_save_pending(void *opaque, uint64_t threshold_size,
                               uint64_t *res_precopy_only,
                               uint64_t *res_compatible,
@@ -670,73 +306,6 @@ static void vfio_save_pending(void *opaque, uint64_t threshold_size,
                             migration->stop_copy_size);
 }
 
-static void vfio_v1_save_pending(void *opaque, uint64_t threshold_size,
-                                 uint64_t *res_precopy_only,
-                                 uint64_t *res_compatible,
-                                 uint64_t *res_postcopy_only)
-{
-    VFIODevice *vbasedev = opaque;
-    VFIOMigration *migration = vbasedev->migration;
-    int ret;
-
-    ret = vfio_update_pending(vbasedev);
-    if (ret) {
-        return;
-    }
-
-    *res_precopy_only += migration->pending_bytes;
-
-    trace_vfio_v1_save_pending(vbasedev->name, *res_precopy_only,
-                               *res_postcopy_only, *res_compatible);
-}
-
-static int vfio_save_iterate(QEMUFile *f, void *opaque)
-{
-    VFIODevice *vbasedev = opaque;
-    VFIOMigration *migration = vbasedev->migration;
-    uint64_t data_size;
-    int ret;
-
-    qemu_put_be64(f, VFIO_MIG_FLAG_DEV_DATA_STATE);
-
-    if (migration->pending_bytes == 0) {
-        ret = vfio_update_pending(vbasedev);
-        if (ret) {
-            return ret;
-        }
-
-        if (migration->pending_bytes == 0) {
-            qemu_put_be64(f, 0);
-            qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
-            /* indicates data finished, goto complete phase */
-            return 1;
-        }
-    }
-
-    ret = vfio_save_buffer(f, vbasedev, &data_size);
-    if (ret) {
-        error_report("%s: vfio_save_buffer failed %s", vbasedev->name,
-                     strerror(errno));
-        return ret;
-    }
-
-    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
-
-    ret = qemu_file_get_error(f);
-    if (ret) {
-        return ret;
-    }
-
-    /*
-     * Reset pending_bytes as .save_live_pending is not called during savevm or
-     * snapshot case, in such case vfio_update_pending() at the start of this
-     * function updates pending_bytes.
-     */
-    migration->pending_bytes = 0;
-    trace_vfio_save_iterate(vbasedev->name, data_size);
-    return 0;
-}
-
 static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
 {
     VFIODevice *vbasedev = opaque;
@@ -772,62 +341,6 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
     return ret;
 }
 
-static int vfio_v1_save_complete_precopy(QEMUFile *f, void *opaque)
-{
-    VFIODevice *vbasedev = opaque;
-    VFIOMigration *migration = vbasedev->migration;
-    uint64_t data_size;
-    int ret;
-
-    ret = vfio_migration_v1_set_state(vbasedev, ~VFIO_DEVICE_STATE_V1_RUNNING,
-                                      VFIO_DEVICE_STATE_V1_SAVING);
-    if (ret) {
-        error_report("%s: Failed to set state STOP and SAVING",
-                     vbasedev->name);
-        return ret;
-    }
-
-    ret = vfio_update_pending(vbasedev);
-    if (ret) {
-        return ret;
-    }
-
-    while (migration->pending_bytes > 0) {
-        qemu_put_be64(f, VFIO_MIG_FLAG_DEV_DATA_STATE);
-        ret = vfio_save_buffer(f, vbasedev, &data_size);
-        if (ret < 0) {
-            error_report("%s: Failed to save buffer", vbasedev->name);
-            return ret;
-        }
-
-        if (data_size == 0) {
-            break;
-        }
-
-        ret = vfio_update_pending(vbasedev);
-        if (ret) {
-            return ret;
-        }
-    }
-
-    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
-
-    ret = qemu_file_get_error(f);
-    if (ret) {
-        return ret;
-    }
-
-    ret = vfio_migration_v1_set_state(vbasedev, ~VFIO_DEVICE_STATE_V1_SAVING,
-                                      0);
-    if (ret) {
-        error_report("%s: Failed to set state STOPPED", vbasedev->name);
-        return ret;
-    }
-
-    trace_vfio_v1_save_complete_precopy(vbasedev->name);
-    return ret;
-}
-
 static void vfio_save_state(QEMUFile *f, void *opaque)
 {
     VFIODevice *vbasedev = opaque;
@@ -849,33 +362,6 @@ static int vfio_load_setup(QEMUFile *f, void *opaque)
                                    vbasedev->migration->device_state);
 }
 
-static int vfio_v1_load_setup(QEMUFile *f, void *opaque)
-{
-    VFIODevice *vbasedev = opaque;
-    VFIOMigration *migration = vbasedev->migration;
-    int ret = 0;
-
-    if (migration->region.mmaps) {
-        ret = vfio_region_mmap(&migration->region);
-        if (ret) {
-            error_report("%s: Failed to mmap VFIO migration region %d: %s",
-                         vbasedev->name, migration->region.nr,
-                         strerror(-ret));
-            error_report("%s: Falling back to slow path", vbasedev->name);
-        }
-    }
-
-    ret = vfio_migration_v1_set_state(vbasedev, ~VFIO_DEVICE_STATE_MASK,
-                                      VFIO_DEVICE_STATE_V1_RESUMING);
-    if (ret) {
-        error_report("%s: Failed to set state RESUMING", vbasedev->name);
-        if (migration->region.mmaps) {
-            vfio_region_unmap(&migration->region);
-        }
-    }
-    return ret;
-}
-
 static int vfio_load_cleanup(void *opaque)
 {
     VFIODevice *vbasedev = opaque;
@@ -886,15 +372,6 @@ static int vfio_load_cleanup(void *opaque)
     return 0;
 }
 
-static int vfio_v1_load_cleanup(void *opaque)
-{
-    VFIODevice *vbasedev = opaque;
-
-    vfio_migration_v1_cleanup(vbasedev);
-    trace_vfio_load_cleanup(vbasedev->name);
-    return 0;
-}
-
 static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
 {
     VFIODevice *vbasedev = opaque;
@@ -928,11 +405,7 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
             uint64_t data_size = qemu_get_be64(f);
 
             if (data_size) {
-                if (vbasedev->migration->v2) {
-                    ret = vfio_load_buffer(f, vbasedev, data_size);
-                } else {
-                    ret = vfio_v1_load_buffer(f, vbasedev, data_size);
-                }
+                ret = vfio_load_buffer(f, vbasedev, data_size);
                 if (ret < 0) {
                     return ret;
                 }
@@ -964,18 +437,6 @@ static const SaveVMHandlers savevm_vfio_handlers = {
     .load_state = vfio_load_state,
 };
 
-static SaveVMHandlers savevm_vfio_v1_handlers = {
-    .save_setup = vfio_v1_save_setup,
-    .save_cleanup = vfio_v1_save_cleanup,
-    .save_live_pending = vfio_v1_save_pending,
-    .save_live_iterate = vfio_save_iterate,
-    .save_live_complete_precopy = vfio_v1_save_complete_precopy,
-    .save_state = vfio_save_state,
-    .load_setup = vfio_v1_load_setup,
-    .load_cleanup = vfio_v1_load_cleanup,
-    .load_state = vfio_load_state,
-};
-
 /* ---------------------------------------------------------------------- */
 
 static void vfio_vmstate_change(void *opaque, bool running, RunState state)
@@ -1006,70 +467,12 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state)
                               mig_state_to_str(new_state));
 }
 
-static void vfio_v1_vmstate_change(void *opaque, bool running, RunState state)
-{
-    VFIODevice *vbasedev = opaque;
-    VFIOMigration *migration = vbasedev->migration;
-    uint32_t value, mask;
-    int ret;
-
-    if (vbasedev->migration->vm_running == running) {
-        return;
-    }
-
-    if (running) {
-        /*
-         * Here device state can have one of _SAVING, _RESUMING or _STOP bit.
-         * Transition from _SAVING to _RUNNING can happen if there is migration
-         * failure, in that case clear _SAVING bit.
-         * Transition from _RESUMING to _RUNNING occurs during resuming
-         * phase, in that case clear _RESUMING bit.
-         * In both the above cases, set _RUNNING bit.
-         */
-        mask = ~VFIO_DEVICE_STATE_MASK;
-        value = VFIO_DEVICE_STATE_V1_RUNNING;
-    } else {
-        /*
-         * Here device state could be either _RUNNING or _SAVING|_RUNNING. Reset
-         * _RUNNING bit
-         */
-        mask = ~VFIO_DEVICE_STATE_V1_RUNNING;
-
-        /*
-         * When VM state transition to stop for savevm command, device should
-         * start saving data.
-         */
-        if (state == RUN_STATE_SAVE_VM) {
-            value = VFIO_DEVICE_STATE_V1_SAVING;
-        } else {
-            value = 0;
-        }
-    }
-
-    ret = vfio_migration_v1_set_state(vbasedev, mask, value);
-    if (ret) {
-        /*
-         * Migration should be aborted in this case, but vm_state_notify()
-         * currently does not support reporting failures.
-         */
-        error_report("%s: Failed to set device state 0x%x", vbasedev->name,
-                     (migration->device_state_v1 & mask) | value);
-        if (migrate_get_current()->to_dst_file) {
-            qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
-        }
-    }
-    vbasedev->migration->vm_running = running;
-    trace_vfio_v1_vmstate_change(vbasedev->name, running, RunState_str(state),
-            (migration->device_state_v1 & mask) | value);
-}
-
 static void vfio_migration_state_notifier(Notifier *notifier, void *data)
 {
     MigrationState *s = data;
     VFIOMigration *migration = container_of(notifier, VFIOMigration,
                                             migration_state);
     VFIODevice *vbasedev = migration->vbasedev;
-    int ret;
 
     trace_vfio_migration_state_notifier(vbasedev->name,
                                         MigrationStatus_str(s->state));
@@ -1079,18 +482,8 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data)
     case MIGRATION_STATUS_CANCELLED:
     case MIGRATION_STATUS_FAILED:
         bytes_transferred = 0;
-        if (migration->v2) {
-            vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING,
-                                     VFIO_DEVICE_STATE_ERROR);
-        } else {
-            ret = vfio_migration_v1_set_state(vbasedev,
-                                              ~(VFIO_DEVICE_STATE_V1_SAVING |
-                                                VFIO_DEVICE_STATE_V1_RESUMING),
-                                              VFIO_DEVICE_STATE_V1_RUNNING);
-            if (ret) {
-                error_report("%s: Failed to set state RUNNING", vbasedev->name);
-            }
-        }
+        vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING,
+                                 VFIO_DEVICE_STATE_ERROR);
     }
 }
 
@@ -1130,12 +523,6 @@ static int vfio_migration_data_notifier(NotifierWithReturn *n, void *data)
 
 static void vfio_migration_exit(VFIODevice *vbasedev)
 {
-    VFIOMigration *migration = vbasedev->migration;
-
-    if (!migration->v2) {
-        vfio_region_exit(&migration->region);
-        vfio_region_finalize(&migration->region);
-    }
     g_free(vbasedev->migration);
     vbasedev->migration = NULL;
 }
@@ -1175,7 +562,6 @@ static int vfio_migration_init(VFIODevice *vbasedev)
     VFIOMigration *migration;
     char id[256] = "";
     g_autofree char *path = NULL, *oid = NULL;
-    struct vfio_region_info *info;
     uint64_t mig_flags;
 
     if (!vbasedev->ops->vfio_get_object) {
@@ -1188,52 +574,20 @@ static int vfio_migration_init(VFIODevice *vbasedev)
     }
 
     ret = vfio_migration_query_flags(vbasedev, &mig_flags);
-    if (!ret) {
-        /* Migration v2 */
-        /* Basic migration functionality must be supported */
-        if (!(mig_flags & VFIO_MIGRATION_STOP_COPY)) {
-            return -EOPNOTSUPP;
-        }
-        vbasedev->migration = g_new0(VFIOMigration, 1);
-        vbasedev->migration->device_state = VFIO_DEVICE_STATE_RUNNING;
-        vbasedev->migration->data_fd = -1;
-        vbasedev->migration->v2 = true;
-    } else if (ret == -ENOTTY) {
-        /* Migration v1 */
-        ret = vfio_get_dev_region_info(vbasedev,
-                                       VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
-                                       VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
-                                       &info);
-        if (ret) {
-            return ret;
-        }
-
-        vbasedev->migration = g_new0(VFIOMigration, 1);
-        vbasedev->migration->device_state_v1 = VFIO_DEVICE_STATE_V1_RUNNING;
-        vbasedev->migration->vm_running = runstate_is_running();
-
-        ret = vfio_region_setup(obj, vbasedev, &vbasedev->migration->region,
-                                info->index, "migration");
-        if (ret) {
-            error_report("%s: Failed to setup VFIO migration region %d: %s",
-                         vbasedev->name, info->index, strerror(-ret));
-            goto err;
-        }
-
-        if (!vbasedev->migration->region.size) {
-            error_report("%s: Invalid zero-sized VFIO migration region %d",
-                         vbasedev->name, info->index);
-            ret = -EINVAL;
-            goto err;
-        }
-
-        g_free(info);
-    } else {
+    if (ret) {
         return ret;
     }
 
+    /* Basic migration functionality must be supported */
+    if (!(mig_flags & VFIO_MIGRATION_STOP_COPY)) {
+        return -EOPNOTSUPP;
+    }
+
+    vbasedev->migration = g_new0(VFIOMigration, 1);
     migration = vbasedev->migration;
     migration->vbasedev = vbasedev;
+    migration->device_state = VFIO_DEVICE_STATE_RUNNING;
+    migration->data_fd = -1;
 
     oid = vmstate_if_get_id(VMSTATE_IF(DEVICE(obj)));
     if (oid) {
@@ -1243,31 +597,18 @@ static int vfio_migration_init(VFIODevice *vbasedev)
     }
     strpadcpy(id, sizeof(id), path, '\0');
 
-    if (migration->v2) {
-        register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
-                             &savevm_vfio_handlers, vbasedev);
-
-        migration->vm_state = qdev_add_vm_change_state_handler(
-            vbasedev->dev, vfio_vmstate_change, vbasedev);
-
-        migration->migration_data.notify = vfio_migration_data_notifier;
-        precopy_add_notifier(&migration->migration_data);
-    } else {
-        register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
-                             &savevm_vfio_v1_handlers, vbasedev);
-
-        migration->vm_state = qdev_add_vm_change_state_handler(
-            vbasedev->dev, vfio_v1_vmstate_change, vbasedev);
-    }
+    register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1, &savevm_vfio_handlers,
+                         vbasedev);
 
+    migration->vm_state = qdev_add_vm_change_state_handler(vbasedev->dev,
+                                                           vfio_vmstate_change,
+                                                           vbasedev);
     migration->migration_state.notify = vfio_migration_state_notifier;
     add_migration_state_change_notifier(&migration->migration_state);
-    return 0;
+    migration->migration_data.notify = vfio_migration_data_notifier;
+    precopy_add_notifier(&migration->migration_data);
 
-err:
-    g_free(info);
-    vfio_migration_exit(vbasedev);
-    return ret;
+    return 0;
 }
 
 /* ---------------------------------------------------------------------- */
@@ -1310,9 +651,7 @@ void vfio_migration_finalize(VFIODevice *vbasedev)
     if (vbasedev->migration) {
         VFIOMigration *migration = vbasedev->migration;
 
-        if (migration->v2) {
-            precopy_remove_notifier(&migration->migration_data);
-        }
+        precopy_remove_notifier(&migration->migration_data);
         remove_migration_state_change_notifier(&migration->migration_state);
         qemu_del_vm_change_state_handler(migration->vm_state);
         unregister_savevm(VMSTATE_IF(vbasedev->dev), "vfio", vbasedev);
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index a8a64f0627..60c49b2ecf 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -150,24 +150,15 @@ vfio_display_edid_write_error(void) ""
 # migration.c
 vfio_migration_probe(const char *name) " (%s)"
 vfio_migration_set_state(const char *name, const char *state) " (%s) state %s"
-vfio_migration_v1_set_state(const char *name, uint32_t state) " (%s) state %d"
 vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
-vfio_v1_vmstate_change(const char *name, int running, const char *reason, uint32_t dev_state) " (%s) running %d reason %s device state %d"
 vfio_migration_state_notifier(const char *name, const char *state) " (%s) state %s"
 vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer size 0x%"PRIx64
-vfio_v1_save_setup(const char *name) " (%s)"
 vfio_save_cleanup(const char *name) " (%s)"
-vfio_save_buffer(const char *name, uint64_t data_offset, uint64_t data_size, uint64_t pending) " (%s) Offset 0x%"PRIx64" size 0x%"PRIx64" pending 0x%"PRIx64
-vfio_update_pending(const char *name, uint64_t pending) " (%s) pending 0x%"PRIx64
 vfio_save_device_config_state(const char *name) " (%s)"
 vfio_save_pending(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t compatible, uint64_t stopcopy_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" compatible 0x%"PRIx64" stopcopy size 0x%"PRIx64
-vfio_v1_save_pending(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t compatible) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" compatible 0x%"PRIx64
-vfio_save_iterate(const char *name, int data_size) " (%s) data_size %d"
 vfio_save_complete_precopy(const char *name, int ret) " (%s) ret %d"
-vfio_v1_save_complete_precopy(const char *name) " (%s)"
 vfio_load_device_config_state(const char *name) " (%s)"
 vfio_load_state(const char *name, uint64_t data) " (%s) data 0x%"PRIx64
-vfio_v1_load_state_device_data(const char *name, uint64_t data_offset, uint64_t data_size) " (%s) Offset 0x%"PRIx64" size 0x%"PRIx64
 vfio_load_state_device_data(const char *name, uint64_t data_size, int ret) " (%s) size 0x%"PRIx64" ret %d"
 vfio_load_cleanup(const char *name) " (%s)"
 vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start) "container fd=%d, iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64
-- 
2.26.3



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v6 12/13] vfio: Alphabetize migration section of VFIO trace-events file
  2023-01-12  8:50 [PATCH v6 00/13] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
                   ` (10 preceding siblings ...)
  2023-01-12  8:50 ` [PATCH v6 11/13] vfio/migration: Remove VFIO migration protocol v1 Avihai Horon
@ 2023-01-12  8:50 ` Avihai Horon
  2023-01-13 16:06   ` Cédric Le Goater
  2023-01-12  8:50 ` [PATCH v6 13/13] docs/devel: Align VFIO migration docs to v2 protocol Avihai Horon
  12 siblings, 1 reply; 22+ messages in thread
From: Avihai Horon @ 2023-01-12  8:50 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Halil Pasic, Christian Borntraeger, Eric Farman,
	Richard Henderson, David Hildenbrand, Ilya Leoshkevich,
	Thomas Huth, Juan Quintela, Dr. David Alan Gilbert,
	Michael S. Tsirkin, Cornelia Huck, Paolo Bonzini,
	Stefan Hajnoczi, Fam Zheng, Eric Blake,
	Vladimir Sementsov-Ogievskiy, John Snow, Cédric Le Goater,
	qemu-s390x, qemu-block, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Avihai Horon, Kirti Wankhede, Tarun Gupta,
	Joao Martins

Sort the migration section of VFIO trace events file alphabetically
and move two misplaced traces to common.c section.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 hw/vfio/trace-events | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 60c49b2ecf..db9cb94952 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -119,6 +119,8 @@ vfio_region_sparse_mmap_header(const char *name, int index, int nr_areas) "Devic
 vfio_region_sparse_mmap_entry(int i, unsigned long start, unsigned long end) "sparse entry %d [0x%lx - 0x%lx]"
 vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t subtype) "%s index %d, %08x/%0x8"
 vfio_dma_unmap_overflow_workaround(void) ""
+vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start) "container fd=%d, iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64
+vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu dirty @ 0x%"PRIx64" - 0x%"PRIx64
 
 # platform.c
 vfio_platform_base_device_init(char *name, int groupid) "%s belongs to group #%d"
@@ -148,20 +150,18 @@ vfio_display_edid_update(uint32_t prefx, uint32_t prefy) "%ux%u"
 vfio_display_edid_write_error(void) ""
 
 # migration.c
+vfio_load_cleanup(const char *name) " (%s)"
+vfio_load_device_config_state(const char *name) " (%s)"
+vfio_load_state(const char *name, uint64_t data) " (%s) data 0x%"PRIx64
+vfio_load_state_device_data(const char *name, uint64_t data_size, int ret) " (%s) size 0x%"PRIx64" ret %d"
+vfio_migration_data_notifier(const char *name, uint64_t stopcopy_size) " (%s) stopcopy size 0x%"PRIx64
 vfio_migration_probe(const char *name) " (%s)"
 vfio_migration_set_state(const char *name, const char *state) " (%s) state %s"
-vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
 vfio_migration_state_notifier(const char *name, const char *state) " (%s) state %s"
-vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer size 0x%"PRIx64
+vfio_save_block(const char *name, int data_size) " (%s) data_size %d"
 vfio_save_cleanup(const char *name) " (%s)"
+vfio_save_complete_precopy(const char *name, int ret) " (%s) ret %d"
 vfio_save_device_config_state(const char *name) " (%s)"
 vfio_save_pending(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t compatible, uint64_t stopcopy_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" compatible 0x%"PRIx64" stopcopy size 0x%"PRIx64
-vfio_save_complete_precopy(const char *name, int ret) " (%s) ret %d"
-vfio_load_device_config_state(const char *name) " (%s)"
-vfio_load_state(const char *name, uint64_t data) " (%s) data 0x%"PRIx64
-vfio_load_state_device_data(const char *name, uint64_t data_size, int ret) " (%s) size 0x%"PRIx64" ret %d"
-vfio_load_cleanup(const char *name) " (%s)"
-vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start) "container fd=%d, iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64
-vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu dirty @ 0x%"PRIx64" - 0x%"PRIx64
-vfio_save_block(const char *name, int data_size) " (%s) data_size %d"
-vfio_migration_data_notifier(const char *name, uint64_t stopcopy_size) " (%s) stopcopy size 0x%"PRIx64
+vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer size 0x%"PRIx64
+vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
-- 
2.26.3



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v6 13/13] docs/devel: Align VFIO migration docs to v2 protocol
  2023-01-12  8:50 [PATCH v6 00/13] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
                   ` (11 preceding siblings ...)
  2023-01-12  8:50 ` [PATCH v6 12/13] vfio: Alphabetize migration section of VFIO trace-events file Avihai Horon
@ 2023-01-12  8:50 ` Avihai Horon
  2023-01-13 17:12   ` Cédric Le Goater
  12 siblings, 1 reply; 22+ messages in thread
From: Avihai Horon @ 2023-01-12  8:50 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Halil Pasic, Christian Borntraeger, Eric Farman,
	Richard Henderson, David Hildenbrand, Ilya Leoshkevich,
	Thomas Huth, Juan Quintela, Dr. David Alan Gilbert,
	Michael S. Tsirkin, Cornelia Huck, Paolo Bonzini,
	Stefan Hajnoczi, Fam Zheng, Eric Blake,
	Vladimir Sementsov-Ogievskiy, John Snow, Cédric Le Goater,
	qemu-s390x, qemu-block, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Avihai Horon, Kirti Wankhede, Tarun Gupta,
	Joao Martins

Now that VFIO migration protocol v2 has been implemented and v1 protocol
has been removed, update the documentation according to v2 protocol.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 docs/devel/vfio-migration.rst | 68 ++++++++++++++++-------------------
 1 file changed, 30 insertions(+), 38 deletions(-)

diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
index 9ff6163c88..1d50c2fe5f 100644
--- a/docs/devel/vfio-migration.rst
+++ b/docs/devel/vfio-migration.rst
@@ -7,46 +7,39 @@ the guest is running on source host and restoring this saved state on the
 destination host. This document details how saving and restoring of VFIO
 devices is done in QEMU.
 
-Migration of VFIO devices consists of two phases: the optional pre-copy phase,
-and the stop-and-copy phase. The pre-copy phase is iterative and allows to
-accommodate VFIO devices that have a large amount of data that needs to be
-transferred. The iterative pre-copy phase of migration allows for the guest to
-continue whilst the VFIO device state is transferred to the destination, this
-helps to reduce the total downtime of the VM. VFIO devices can choose to skip
-the pre-copy phase of migration by returning pending_bytes as zero during the
-pre-copy phase.
+Migration of VFIO devices currently consists of a single stop-and-copy phase.
+During the stop-and-copy phase the guest is stopped and the entire VFIO device
+data is transferred to the destination.
+
+The pre-copy phase of migration is currently not supported for VFIO devices.
+Support for VFIO pre-copy will be added later on.
 
 A detailed description of the UAPI for VFIO device migration can be found in
-the comment for the ``vfio_device_migration_info`` structure in the header
-file linux-headers/linux/vfio.h.
+the comment for the ``vfio_device_mig_state`` structure in the header file
+linux-headers/linux/vfio.h.
 
 VFIO implements the device hooks for the iterative approach as follows:
 
-* A ``save_setup`` function that sets up the migration region and sets _SAVING
-  flag in the VFIO device state.
+* A ``save_setup`` function that sets up migration on the source.
 
-* A ``load_setup`` function that sets up the migration region on the
-  destination and sets _RESUMING flag in the VFIO device state.
+* A ``load_setup`` function that sets the VFIO device on the destination in
+  _RESUMING state.
 
 * A ``save_live_pending`` function that reads pending_bytes from the vendor
   driver, which indicates the amount of data that the vendor driver has yet to
   save for the VFIO device.
 
-* A ``save_live_iterate`` function that reads the VFIO device's data from the
-  vendor driver through the migration region during iterative phase.
-
 * A ``save_state`` function to save the device config space if it is present.
 
-* A ``save_live_complete_precopy`` function that resets _RUNNING flag from the
-  VFIO device state and iteratively copies the remaining data for the VFIO
-  device until the vendor driver indicates that no data remains (pending bytes
-  is zero).
+* A ``save_live_complete_precopy`` function that sets the VFIO device in
+  _STOP_COPY state and iteratively copies the data for the VFIO device until
+  the vendor driver indicates that no data remains.
 
 * A ``load_state`` function that loads the config section and the data
-  sections that are generated by the save functions above
+  sections that are generated by the save functions above.
 
 * ``cleanup`` functions for both save and load that perform any migration
-  related cleanup, including unmapping the migration region
+  related cleanup.
 
 
 The VFIO migration code uses a VM state change handler to change the VFIO
@@ -71,13 +64,13 @@ tracking can identify dirtied pages, but any page pinned by the vendor driver
 can also be written by the device. There is currently no device or IOMMU
 support for dirty page tracking in hardware.
 
-By default, dirty pages are tracked when the device is in pre-copy as well as
-stop-and-copy phase. So, a page pinned by the vendor driver will be copied to
-the destination in both phases. Copying dirty pages in pre-copy phase helps
-QEMU to predict if it can achieve its downtime tolerances. If QEMU during
-pre-copy phase keeps finding dirty pages continuously, then it understands
-that even in stop-and-copy phase, it is likely to find dirty pages and can
-predict the downtime accordingly.
+By default, dirty pages are tracked during pre-copy as well as stop-and-copy
+phase. So, a page pinned by the vendor driver will be copied to the destination
+in both phases. Copying dirty pages in pre-copy phase helps QEMU to predict if
+it can achieve its downtime tolerances. If QEMU during pre-copy phase keeps
+finding dirty pages continuously, then it understands that even in stop-and-copy
+phase, it is likely to find dirty pages and can predict the downtime
+accordingly.
 
 QEMU also provides a per device opt-out option ``pre-copy-dirty-page-tracking``
 which disables querying the dirty bitmap during pre-copy phase. If it is set to
@@ -111,23 +104,22 @@ Live migration save path
                                   |
                      migrate_init spawns migration_thread
                 Migration thread then calls each device's .save_setup()
-                    (RUNNING, _SETUP, _RUNNING|_SAVING)
+                       (RUNNING, _SETUP, _RUNNING)
                                   |
-                    (RUNNING, _ACTIVE, _RUNNING|_SAVING)
+                      (RUNNING, _ACTIVE, _RUNNING)
              If device is active, get pending_bytes by .save_live_pending()
           If total pending_bytes >= threshold_size, call .save_live_iterate()
-                  Data of VFIO device for pre-copy phase is copied
         Iterate till total pending bytes converge and are less than threshold
                                   |
   On migration completion, vCPU stops and calls .save_live_complete_precopy for
-   each active device. The VFIO device is then transitioned into _SAVING state
-                   (FINISH_MIGRATE, _DEVICE, _SAVING)
+  each active device. The VFIO device is then transitioned into _STOP_COPY state
+                  (FINISH_MIGRATE, _DEVICE, _STOP_COPY)
                                   |
      For the VFIO device, iterate in .save_live_complete_precopy until
                          pending data is 0
-                   (FINISH_MIGRATE, _DEVICE, _STOPPED)
+                   (FINISH_MIGRATE, _DEVICE, _STOP)
                                   |
-                 (FINISH_MIGRATE, _COMPLETED, _STOPPED)
+                 (FINISH_MIGRATE, _COMPLETED, _STOP)
              Migraton thread schedules cleanup bottom half and exits
 
 Live migration resume path
@@ -136,7 +128,7 @@ Live migration resume path
 ::
 
               Incoming migration calls .load_setup for each device
-                       (RESTORE_VM, _ACTIVE, _STOPPED)
+                       (RESTORE_VM, _ACTIVE, _STOP)
                                  |
        For each device, .load_state is called for that device section data
                        (RESTORE_VM, _ACTIVE, _RESUMING)
-- 
2.26.3



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v6 05/13] migration/qemu-file: Add qemu_file_get_to_fd()
  2023-01-12  8:50 ` [PATCH v6 05/13] migration/qemu-file: Add qemu_file_get_to_fd() Avihai Horon
@ 2023-01-12 14:43   ` Cédric Le Goater
  0 siblings, 0 replies; 22+ messages in thread
From: Cédric Le Goater @ 2023-01-12 14:43 UTC (permalink / raw)
  To: Avihai Horon, qemu-devel
  Cc: Alex Williamson, Halil Pasic, Christian Borntraeger, Eric Farman,
	Richard Henderson, David Hildenbrand, Ilya Leoshkevich,
	Thomas Huth, Juan Quintela, Dr. David Alan Gilbert,
	Michael S. Tsirkin, Cornelia Huck, Paolo Bonzini,
	Stefan Hajnoczi, Fam Zheng, Eric Blake,
	Vladimir Sementsov-Ogievskiy, John Snow, qemu-s390x, qemu-block,
	Yishai Hadas, Jason Gunthorpe, Maor Gottlieb, Kirti Wankhede,
	Tarun Gupta, Joao Martins

On 1/12/23 09:50, Avihai Horon wrote:
> Add new function qemu_file_get_to_fd() that allows reading data from
> QEMUFile and writing it straight into a given fd.
> 
> This will be used later in VFIO migration code.
> 
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


> ---
>   migration/qemu-file.h |  1 +
>   migration/qemu-file.c | 34 ++++++++++++++++++++++++++++++++++
>   2 files changed, 35 insertions(+)
> 
> diff --git a/migration/qemu-file.h b/migration/qemu-file.h
> index fa13d04d78..9d0155a2a1 100644
> --- a/migration/qemu-file.h
> +++ b/migration/qemu-file.h
> @@ -148,6 +148,7 @@ int qemu_file_shutdown(QEMUFile *f);
>   QEMUFile *qemu_file_get_return_path(QEMUFile *f);
>   void qemu_fflush(QEMUFile *f);
>   void qemu_file_set_blocking(QEMUFile *f, bool block);
> +int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size);
>   
>   void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
>   void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index 2d5f74ffc2..102ab3b439 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -940,3 +940,37 @@ QIOChannel *qemu_file_get_ioc(QEMUFile *file)
>   {
>       return file->ioc;
>   }
> +
> +/*
> + * Read size bytes from QEMUFile f and write them to fd.
> + */
> +int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size)
> +{
> +    while (size) {
> +        size_t pending = f->buf_size - f->buf_index;
> +        ssize_t rc;
> +
> +        if (!pending) {
> +            rc = qemu_fill_buffer(f);
> +            if (rc < 0) {
> +                return rc;
> +            }
> +            if (rc == 0) {
> +                return -EIO;
> +            }
> +            continue;
> +        }
> +
> +        rc = write(fd, f->buf + f->buf_index, MIN(pending, size));
> +        if (rc < 0) {
> +            return -errno;
> +        }
> +        if (rc == 0) {
> +            return -EIO;
> +        }
> +        f->buf_index += rc;
> +        size -= rc;
> +    }
> +
> +    return 0;
> +}



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v6 06/13] vfio/common: Change vfio_devices_all_running_and_saving() logic to equivalent one
  2023-01-12  8:50 ` [PATCH v6 06/13] vfio/common: Change vfio_devices_all_running_and_saving() logic to equivalent one Avihai Horon
@ 2023-01-13 15:46   ` Cédric Le Goater
  0 siblings, 0 replies; 22+ messages in thread
From: Cédric Le Goater @ 2023-01-13 15:46 UTC (permalink / raw)
  To: Avihai Horon, qemu-devel
  Cc: Alex Williamson, Halil Pasic, Christian Borntraeger, Eric Farman,
	Richard Henderson, David Hildenbrand, Ilya Leoshkevich,
	Thomas Huth, Juan Quintela, Dr. David Alan Gilbert,
	Michael S. Tsirkin, Cornelia Huck, Paolo Bonzini,
	Stefan Hajnoczi, Fam Zheng, Eric Blake,
	Vladimir Sementsov-Ogievskiy, John Snow, qemu-s390x, qemu-block,
	Yishai Hadas, Jason Gunthorpe, Maor Gottlieb, Kirti Wankhede,
	Tarun Gupta, Joao Martins

On 1/12/23 09:50, Avihai Horon wrote:
> vfio_devices_all_running_and_saving() is used to check if migration is
> in pre-copy phase. This is done by checking if migration is in setup or
> active states and if all VFIO devices are in pre-copy state, i.e.
> _SAVING | _RUNNING.
> 
> In VFIO migration protocol v2 pre-copy support is made optional. Hence,
> a matching v2 protocol pre-copy state can't be used here.
> 
> As preparation for adding v2 protocol, change
> vfio_devices_all_running_and_saving() logic such that it doesn't use the
> VFIO pre-copy state.
> 
> The new equivalent logic checks if migration is in active state and if
> all VFIO devices are in running state [1]. No functional changes
> intended.
> 
> [1] Note that checking if migration is in setup or active states and if
> all VFIO devices are in running state doesn't guarantee that we are in
> pre-copy phase, thus we check if migration is only in active state.
> 
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>


Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


> ---
>   hw/vfio/common.c | 17 ++++++++++-------
>   1 file changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index f6dd571549..3a35f4afad 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -40,6 +40,7 @@
>   #include "trace.h"
>   #include "qapi/error.h"
>   #include "migration/migration.h"
> +#include "migration/misc.h"
>   #include "sysemu/tpm.h"
>   
>   VFIOGroupList vfio_group_list =
> @@ -363,13 +364,16 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
>       return true;
>   }
>   
> -static bool vfio_devices_all_running_and_saving(VFIOContainer *container)
> +/*
> + * Check if all VFIO devices are running and migration is active, which is
> + * essentially equivalent to the migration being in pre-copy phase.
> + */
> +static bool vfio_devices_all_running_and_mig_active(VFIOContainer *container)
>   {
>       VFIOGroup *group;
>       VFIODevice *vbasedev;
> -    MigrationState *ms = migrate_get_current();
>   
> -    if (!migration_is_setup_or_active(ms->state)) {
> +    if (!migration_is_active(migrate_get_current())) {
>           return false;
>       }
>   
> @@ -381,8 +385,7 @@ static bool vfio_devices_all_running_and_saving(VFIOContainer *container)
>                   return false;
>               }
>   
> -            if ((migration->device_state & VFIO_DEVICE_STATE_V1_SAVING) &&
> -                (migration->device_state & VFIO_DEVICE_STATE_V1_RUNNING)) {
> +            if (migration->device_state & VFIO_DEVICE_STATE_V1_RUNNING) {
>                   continue;
>               } else {
>                   return false;
> @@ -461,7 +464,7 @@ static int vfio_dma_unmap(VFIOContainer *container,
>       };
>   
>       if (iotlb && container->dirty_pages_supported &&
> -        vfio_devices_all_running_and_saving(container)) {
> +        vfio_devices_all_running_and_mig_active(container)) {
>           return vfio_dma_unmap_bitmap(container, iova, size, iotlb);
>       }
>   
> @@ -488,7 +491,7 @@ static int vfio_dma_unmap(VFIOContainer *container,
>           return -errno;
>       }
>   
> -    if (iotlb && vfio_devices_all_running_and_saving(container)) {
> +    if (iotlb && vfio_devices_all_running_and_mig_active(container)) {
>           cpu_physical_memory_set_dirty_range(iotlb->translated_addr, size,
>                                               tcg_enabled() ? DIRTY_CLIENTS_ALL :
>                                               DIRTY_CLIENTS_NOCODE);



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v6 10/13] vfio/migration: Optimize vfio_save_pending()
  2023-01-12  8:50 ` [PATCH v6 10/13] vfio/migration: Optimize vfio_save_pending() Avihai Horon
@ 2023-01-13 16:05   ` Cédric Le Goater
  0 siblings, 0 replies; 22+ messages in thread
From: Cédric Le Goater @ 2023-01-13 16:05 UTC (permalink / raw)
  To: Avihai Horon, qemu-devel
  Cc: Alex Williamson, Halil Pasic, Christian Borntraeger, Eric Farman,
	Richard Henderson, David Hildenbrand, Ilya Leoshkevich,
	Thomas Huth, Juan Quintela, Dr. David Alan Gilbert,
	Michael S. Tsirkin, Cornelia Huck, Paolo Bonzini,
	Stefan Hajnoczi, Fam Zheng, Eric Blake,
	Vladimir Sementsov-Ogievskiy, John Snow, qemu-s390x, qemu-block,
	Yishai Hadas, Jason Gunthorpe, Maor Gottlieb, Kirti Wankhede,
	Tarun Gupta, Joao Martins

On 1/12/23 09:50, Avihai Horon wrote:
> During pre-copy phase of migration vfio_save_pending() is called
> repeatedly and queries the VFIO device for its pending data size.
> 
> As long as pending RAM size is over the threshold, migration can't
> converge and be completed. Therefore, during this time there is no
> point in querying the VFIO device pending data size.
> 
> Avoid these unnecessary queries by issuing them in a RAM pre-copy
> notifier instead of vfio_save_pending().
> 
> This way the VFIO device is queried only when RAM pending data is
> below the threshold, when there is an actual chance for migration to
> converge.
> 
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>


Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.



> ---
>   include/hw/vfio/vfio-common.h |  2 ++
>   hw/vfio/migration.c           | 56 +++++++++++++++++++++++++++--------
>   hw/vfio/trace-events          |  1 +
>   3 files changed, 46 insertions(+), 13 deletions(-)
> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 2ec3346fea..113f8d9208 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -65,11 +65,13 @@ typedef struct VFIOMigration {
>       uint32_t device_state_v1;
>       int vm_running;
>       Notifier migration_state;
> +    NotifierWithReturn migration_data;
>       uint64_t pending_bytes;
>       enum vfio_device_mig_state device_state;
>       int data_fd;
>       void *data_buffer;
>       size_t data_buffer_size;
> +    uint64_t stop_copy_size;
>       bool v2;
>   } VFIOMigration;
>   
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 08f53189fa..04f4397212 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -655,29 +655,19 @@ static void vfio_v1_save_cleanup(void *opaque)
>       trace_vfio_save_cleanup(vbasedev->name);
>   }
>   
> -/*
> - * Migration size of VFIO devices can be as little as a few KBs or as big as
> - * many GBs. This value should be big enough to cover the worst case.
> - */
> -#define VFIO_MIG_STOP_COPY_SIZE (100 * GiB)
>   static void vfio_save_pending(void *opaque, uint64_t threshold_size,
>                                 uint64_t *res_precopy_only,
>                                 uint64_t *res_compatible,
>                                 uint64_t *res_postcopy_only)
>   {
>       VFIODevice *vbasedev = opaque;
> -    uint64_t stop_copy_size = VFIO_MIG_STOP_COPY_SIZE;
> +    VFIOMigration *migration = vbasedev->migration;
>   
> -    /*
> -     * If getting pending migration size fails, VFIO_MIG_STOP_COPY_SIZE is
> -     * reported so downtime limit won't be violated.
> -     */
> -    vfio_query_stop_copy_size(vbasedev, &stop_copy_size);
> -    *res_precopy_only += stop_copy_size;
> +    *res_precopy_only += migration->stop_copy_size;
>   
>       trace_vfio_save_pending(vbasedev->name, *res_precopy_only,
>                               *res_postcopy_only, *res_compatible,
> -                            stop_copy_size);
> +                            migration->stop_copy_size);
>   }
>   
>   static void vfio_v1_save_pending(void *opaque, uint64_t threshold_size,
> @@ -1104,6 +1094,40 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data)
>       }
>   }
>   
> +/*
> + * Migration size of VFIO devices can be as little as a few KBs or as big as
> + * many GBs. This value should be big enough to cover the worst case.
> + */
> +#define VFIO_MIG_STOP_COPY_SIZE (100 * GiB)
> +static int vfio_migration_data_notifier(NotifierWithReturn *n, void *data)
> +{
> +    VFIOMigration *migration = container_of(n, VFIOMigration, migration_data);
> +    VFIODevice *vbasedev = migration->vbasedev;
> +    PrecopyNotifyData *pnd = data;
> +
> +    if (pnd->reason != PRECOPY_NOTIFY_AFTER_BITMAP_SYNC) {
> +        return 0;
> +    }
> +
> +    /* No need to get pending size when finishing migration */
> +    if (runstate_check(RUN_STATE_FINISH_MIGRATE)) {
> +        return 0;
> +    }
> +
> +    if (vfio_query_stop_copy_size(vbasedev, &migration->stop_copy_size)) {
> +        /*
> +         * Failed to get pending migration size. Report big pending size so
> +         * downtime limit won't be violated.
> +         */
> +        migration->stop_copy_size = VFIO_MIG_STOP_COPY_SIZE;
> +    }
> +
> +    trace_vfio_migration_data_notifier(vbasedev->name,
> +                                       migration->stop_copy_size);
> +
> +    return 0;
> +}
> +
>   static void vfio_migration_exit(VFIODevice *vbasedev)
>   {
>       VFIOMigration *migration = vbasedev->migration;
> @@ -1225,6 +1249,9 @@ static int vfio_migration_init(VFIODevice *vbasedev)
>   
>           migration->vm_state = qdev_add_vm_change_state_handler(
>               vbasedev->dev, vfio_vmstate_change, vbasedev);
> +
> +        migration->migration_data.notify = vfio_migration_data_notifier;
> +        precopy_add_notifier(&migration->migration_data);
>       } else {
>           register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
>                                &savevm_vfio_v1_handlers, vbasedev);
> @@ -1283,6 +1310,9 @@ void vfio_migration_finalize(VFIODevice *vbasedev)
>       if (vbasedev->migration) {
>           VFIOMigration *migration = vbasedev->migration;
>   
> +        if (migration->v2) {
> +            precopy_remove_notifier(&migration->migration_data);
> +        }
>           remove_migration_state_change_notifier(&migration->migration_state);
>           qemu_del_vm_change_state_handler(migration->vm_state);
>           unregister_savevm(VMSTATE_IF(vbasedev->dev), "vfio", vbasedev);
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index e7b3b4658c..a8a64f0627 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -173,3 +173,4 @@ vfio_load_cleanup(const char *name) " (%s)"
>   vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start) "container fd=%d, iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64
>   vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu dirty @ 0x%"PRIx64" - 0x%"PRIx64
>   vfio_save_block(const char *name, int data_size) " (%s) data_size %d"
> +vfio_migration_data_notifier(const char *name, uint64_t stopcopy_size) " (%s) stopcopy size 0x%"PRIx64



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v6 11/13] vfio/migration: Remove VFIO migration protocol v1
  2023-01-12  8:50 ` [PATCH v6 11/13] vfio/migration: Remove VFIO migration protocol v1 Avihai Horon
@ 2023-01-13 16:06   ` Cédric Le Goater
  0 siblings, 0 replies; 22+ messages in thread
From: Cédric Le Goater @ 2023-01-13 16:06 UTC (permalink / raw)
  To: Avihai Horon, qemu-devel
  Cc: Alex Williamson, Halil Pasic, Christian Borntraeger, Eric Farman,
	Richard Henderson, David Hildenbrand, Ilya Leoshkevich,
	Thomas Huth, Juan Quintela, Dr. David Alan Gilbert,
	Michael S. Tsirkin, Cornelia Huck, Paolo Bonzini,
	Stefan Hajnoczi, Fam Zheng, Eric Blake,
	Vladimir Sementsov-Ogievskiy, John Snow, qemu-s390x, qemu-block,
	Yishai Hadas, Jason Gunthorpe, Maor Gottlieb, Kirti Wankhede,
	Tarun Gupta, Joao Martins

On 1/12/23 09:50, Avihai Horon wrote:
> Now that v2 protocol implementation has been added, remove the
> deprecated v1 implementation.
> 
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>



Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


> ---
>   include/hw/vfio/vfio-common.h |   5 -
>   hw/vfio/common.c              |  19 +-
>   hw/vfio/migration.c           | 703 +---------------------------------
>   hw/vfio/trace-events          |   9 -
>   4 files changed, 24 insertions(+), 712 deletions(-)
> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 113f8d9208..2aba45887c 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -61,18 +61,13 @@ typedef struct VFIORegion {
>   typedef struct VFIOMigration {
>       struct VFIODevice *vbasedev;
>       VMChangeStateEntry *vm_state;
> -    VFIORegion region;
> -    uint32_t device_state_v1;
> -    int vm_running;
>       Notifier migration_state;
>       NotifierWithReturn migration_data;
> -    uint64_t pending_bytes;
>       enum vfio_device_mig_state device_state;
>       int data_fd;
>       void *data_buffer;
>       size_t data_buffer_size;
>       uint64_t stop_copy_size;
> -    bool v2;
>   } VFIOMigration;
>   
>   typedef struct VFIOAddressSpace {
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index dcaa77d2a8..9a0dbee6b4 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -355,14 +355,7 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
>                   return false;
>               }
>   
> -            if (!migration->v2 &&
> -                (vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
> -                (migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING)) {
> -                return false;
> -            }
> -
> -            if (migration->v2 &&
> -                (vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
> +            if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
>                   (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
>                    migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P)) {
>                   return false;
> @@ -393,14 +386,8 @@ static bool vfio_devices_all_running_and_mig_active(VFIOContainer *container)
>                   return false;
>               }
>   
> -            if (!migration->v2 &&
> -                migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING) {
> -                continue;
> -            }
> -
> -            if (migration->v2 &&
> -                (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
> -                 migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P)) {
> +            if (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
> +                migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P) {
>                   continue;
>               } else {
>                   return false;
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 04f4397212..7688c83127 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -142,220 +142,6 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
>       return 0;
>   }
>   
> -static inline int vfio_mig_access(VFIODevice *vbasedev, void *val, int count,
> -                                  off_t off, bool iswrite)
> -{
> -    int ret;
> -
> -    ret = iswrite ? pwrite(vbasedev->fd, val, count, off) :
> -                    pread(vbasedev->fd, val, count, off);
> -    if (ret < count) {
> -        error_report("vfio_mig_%s %d byte %s: failed at offset 0x%"
> -                     HWADDR_PRIx", err: %s", iswrite ? "write" : "read", count,
> -                     vbasedev->name, off, strerror(errno));
> -        return (ret < 0) ? ret : -EINVAL;
> -    }
> -    return 0;
> -}
> -
> -static int vfio_mig_rw(VFIODevice *vbasedev, __u8 *buf, size_t count,
> -                       off_t off, bool iswrite)
> -{
> -    int ret, done = 0;
> -    __u8 *tbuf = buf;
> -
> -    while (count) {
> -        int bytes = 0;
> -
> -        if (count >= 8 && !(off % 8)) {
> -            bytes = 8;
> -        } else if (count >= 4 && !(off % 4)) {
> -            bytes = 4;
> -        } else if (count >= 2 && !(off % 2)) {
> -            bytes = 2;
> -        } else {
> -            bytes = 1;
> -        }
> -
> -        ret = vfio_mig_access(vbasedev, tbuf, bytes, off, iswrite);
> -        if (ret) {
> -            return ret;
> -        }
> -
> -        count -= bytes;
> -        done += bytes;
> -        off += bytes;
> -        tbuf += bytes;
> -    }
> -    return done;
> -}
> -
> -#define vfio_mig_read(f, v, c, o)       vfio_mig_rw(f, (__u8 *)v, c, o, false)
> -#define vfio_mig_write(f, v, c, o)      vfio_mig_rw(f, (__u8 *)v, c, o, true)
> -
> -#define VFIO_MIG_STRUCT_OFFSET(f)       \
> -                                 offsetof(struct vfio_device_migration_info, f)
> -/*
> - * Change the device_state register for device @vbasedev. Bits set in @mask
> - * are preserved, bits set in @value are set, and bits not set in either @mask
> - * or @value are cleared in device_state. If the register cannot be accessed,
> - * the resulting state would be invalid, or the device enters an error state,
> - * an error is returned.
> - */
> -
> -static int vfio_migration_v1_set_state(VFIODevice *vbasedev, uint32_t mask,
> -                                       uint32_t value)
> -{
> -    VFIOMigration *migration = vbasedev->migration;
> -    VFIORegion *region = &migration->region;
> -    off_t dev_state_off = region->fd_offset +
> -                          VFIO_MIG_STRUCT_OFFSET(device_state);
> -    uint32_t device_state;
> -    int ret;
> -
> -    ret = vfio_mig_read(vbasedev, &device_state, sizeof(device_state),
> -                        dev_state_off);
> -    if (ret < 0) {
> -        return ret;
> -    }
> -
> -    device_state = (device_state & mask) | value;
> -
> -    if (!VFIO_DEVICE_STATE_VALID(device_state)) {
> -        return -EINVAL;
> -    }
> -
> -    ret = vfio_mig_write(vbasedev, &device_state, sizeof(device_state),
> -                         dev_state_off);
> -    if (ret < 0) {
> -        int rret;
> -
> -        rret = vfio_mig_read(vbasedev, &device_state, sizeof(device_state),
> -                             dev_state_off);
> -
> -        if ((rret < 0) || (VFIO_DEVICE_STATE_IS_ERROR(device_state))) {
> -            hw_error("%s: Device in error state 0x%x", vbasedev->name,
> -                     device_state);
> -            return rret ? rret : -EIO;
> -        }
> -        return ret;
> -    }
> -
> -    migration->device_state_v1 = device_state;
> -    trace_vfio_migration_v1_set_state(vbasedev->name, device_state);
> -    return 0;
> -}
> -
> -static void *get_data_section_size(VFIORegion *region, uint64_t data_offset,
> -                                   uint64_t data_size, uint64_t *size)
> -{
> -    void *ptr = NULL;
> -    uint64_t limit = 0;
> -    int i;
> -
> -    if (!region->mmaps) {
> -        if (size) {
> -            *size = MIN(data_size, region->size - data_offset);
> -        }
> -        return ptr;
> -    }
> -
> -    for (i = 0; i < region->nr_mmaps; i++) {
> -        VFIOMmap *map = region->mmaps + i;
> -
> -        if ((data_offset >= map->offset) &&
> -            (data_offset < map->offset + map->size)) {
> -
> -            /* check if data_offset is within sparse mmap areas */
> -            ptr = map->mmap + data_offset - map->offset;
> -            if (size) {
> -                *size = MIN(data_size, map->offset + map->size - data_offset);
> -            }
> -            break;
> -        } else if ((data_offset < map->offset) &&
> -                   (!limit || limit > map->offset)) {
> -            /*
> -             * data_offset is not within sparse mmap areas, find size of
> -             * non-mapped area. Check through all list since region->mmaps list
> -             * is not sorted.
> -             */
> -            limit = map->offset;
> -        }
> -    }
> -
> -    if (!ptr && size) {
> -        *size = limit ? MIN(data_size, limit - data_offset) : data_size;
> -    }
> -    return ptr;
> -}
> -
> -static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev, uint64_t *size)
> -{
> -    VFIOMigration *migration = vbasedev->migration;
> -    VFIORegion *region = &migration->region;
> -    uint64_t data_offset = 0, data_size = 0, sz;
> -    int ret;
> -
> -    ret = vfio_mig_read(vbasedev, &data_offset, sizeof(data_offset),
> -                      region->fd_offset + VFIO_MIG_STRUCT_OFFSET(data_offset));
> -    if (ret < 0) {
> -        return ret;
> -    }
> -
> -    ret = vfio_mig_read(vbasedev, &data_size, sizeof(data_size),
> -                        region->fd_offset + VFIO_MIG_STRUCT_OFFSET(data_size));
> -    if (ret < 0) {
> -        return ret;
> -    }
> -
> -    trace_vfio_save_buffer(vbasedev->name, data_offset, data_size,
> -                           migration->pending_bytes);
> -
> -    qemu_put_be64(f, data_size);
> -    sz = data_size;
> -
> -    while (sz) {
> -        void *buf;
> -        uint64_t sec_size;
> -        bool buf_allocated = false;
> -
> -        buf = get_data_section_size(region, data_offset, sz, &sec_size);
> -
> -        if (!buf) {
> -            buf = g_try_malloc(sec_size);
> -            if (!buf) {
> -                error_report("%s: Error allocating buffer ", __func__);
> -                return -ENOMEM;
> -            }
> -            buf_allocated = true;
> -
> -            ret = vfio_mig_read(vbasedev, buf, sec_size,
> -                                region->fd_offset + data_offset);
> -            if (ret < 0) {
> -                g_free(buf);
> -                return ret;
> -            }
> -        }
> -
> -        qemu_put_buffer(f, buf, sec_size);
> -
> -        if (buf_allocated) {
> -            g_free(buf);
> -        }
> -        sz -= sec_size;
> -        data_offset += sec_size;
> -    }
> -
> -    ret = qemu_file_get_error(f);
> -
> -    if (!ret && size) {
> -        *size = data_size;
> -    }
> -
> -    bytes_transferred += data_size;
> -    return ret;
> -}
> -
>   static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
>                               uint64_t data_size)
>   {
> @@ -368,96 +154,6 @@ static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
>       return ret;
>   }
>   
> -static int vfio_v1_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
> -                               uint64_t data_size)
> -{
> -    VFIORegion *region = &vbasedev->migration->region;
> -    uint64_t data_offset = 0, size, report_size;
> -    int ret;
> -
> -    do {
> -        ret = vfio_mig_read(vbasedev, &data_offset, sizeof(data_offset),
> -                      region->fd_offset + VFIO_MIG_STRUCT_OFFSET(data_offset));
> -        if (ret < 0) {
> -            return ret;
> -        }
> -
> -        if (data_offset + data_size > region->size) {
> -            /*
> -             * If data_size is greater than the data section of migration region
> -             * then iterate the write buffer operation. This case can occur if
> -             * size of migration region at destination is smaller than size of
> -             * migration region at source.
> -             */
> -            report_size = size = region->size - data_offset;
> -            data_size -= size;
> -        } else {
> -            report_size = size = data_size;
> -            data_size = 0;
> -        }
> -
> -        trace_vfio_v1_load_state_device_data(vbasedev->name, data_offset, size);
> -
> -        while (size) {
> -            void *buf;
> -            uint64_t sec_size;
> -            bool buf_alloc = false;
> -
> -            buf = get_data_section_size(region, data_offset, size, &sec_size);
> -
> -            if (!buf) {
> -                buf = g_try_malloc(sec_size);
> -                if (!buf) {
> -                    error_report("%s: Error allocating buffer ", __func__);
> -                    return -ENOMEM;
> -                }
> -                buf_alloc = true;
> -            }
> -
> -            qemu_get_buffer(f, buf, sec_size);
> -
> -            if (buf_alloc) {
> -                ret = vfio_mig_write(vbasedev, buf, sec_size,
> -                        region->fd_offset + data_offset);
> -                g_free(buf);
> -
> -                if (ret < 0) {
> -                    return ret;
> -                }
> -            }
> -            size -= sec_size;
> -            data_offset += sec_size;
> -        }
> -
> -        ret = vfio_mig_write(vbasedev, &report_size, sizeof(report_size),
> -                        region->fd_offset + VFIO_MIG_STRUCT_OFFSET(data_size));
> -        if (ret < 0) {
> -            return ret;
> -        }
> -    } while (data_size);
> -
> -    return 0;
> -}
> -
> -static int vfio_update_pending(VFIODevice *vbasedev)
> -{
> -    VFIOMigration *migration = vbasedev->migration;
> -    VFIORegion *region = &migration->region;
> -    uint64_t pending_bytes = 0;
> -    int ret;
> -
> -    ret = vfio_mig_read(vbasedev, &pending_bytes, sizeof(pending_bytes),
> -                    region->fd_offset + VFIO_MIG_STRUCT_OFFSET(pending_bytes));
> -    if (ret < 0) {
> -        migration->pending_bytes = 0;
> -        return ret;
> -    }
> -
> -    migration->pending_bytes = pending_bytes;
> -    trace_vfio_update_pending(vbasedev->name, pending_bytes);
> -    return 0;
> -}
> -
>   static int vfio_save_device_config_state(QEMUFile *f, void *opaque)
>   {
>       VFIODevice *vbasedev = opaque;
> @@ -510,15 +206,6 @@ static void vfio_migration_cleanup(VFIODevice *vbasedev)
>       migration->data_fd = -1;
>   }
>   
> -static void vfio_migration_v1_cleanup(VFIODevice *vbasedev)
> -{
> -    VFIOMigration *migration = vbasedev->migration;
> -
> -    if (migration->region.mmaps) {
> -        vfio_region_unmap(&migration->region);
> -    }
> -}
> -
>   static int vfio_query_stop_copy_size(VFIODevice *vbasedev,
>                                        uint64_t *stop_copy_size)
>   {
> @@ -593,49 +280,6 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
>       return qemu_file_get_error(f);
>   }
>   
> -static int vfio_v1_save_setup(QEMUFile *f, void *opaque)
> -{
> -    VFIODevice *vbasedev = opaque;
> -    VFIOMigration *migration = vbasedev->migration;
> -    int ret;
> -
> -    trace_vfio_v1_save_setup(vbasedev->name);
> -
> -    qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE);
> -
> -    if (migration->region.mmaps) {
> -        /*
> -         * Calling vfio_region_mmap() from migration thread. Memory API called
> -         * from this function require locking the iothread when called from
> -         * outside the main loop thread.
> -         */
> -        qemu_mutex_lock_iothread();
> -        ret = vfio_region_mmap(&migration->region);
> -        qemu_mutex_unlock_iothread();
> -        if (ret) {
> -            error_report("%s: Failed to mmap VFIO migration region: %s",
> -                         vbasedev->name, strerror(-ret));
> -            error_report("%s: Falling back to slow path", vbasedev->name);
> -        }
> -    }
> -
> -    ret = vfio_migration_v1_set_state(vbasedev, VFIO_DEVICE_STATE_MASK,
> -                                      VFIO_DEVICE_STATE_V1_SAVING);
> -    if (ret) {
> -        error_report("%s: Failed to set state SAVING", vbasedev->name);
> -        return ret;
> -    }
> -
> -    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
> -
> -    ret = qemu_file_get_error(f);
> -    if (ret) {
> -        return ret;
> -    }
> -
> -    return 0;
> -}
> -
>   static void vfio_save_cleanup(void *opaque)
>   {
>       VFIODevice *vbasedev = opaque;
> @@ -647,14 +291,6 @@ static void vfio_save_cleanup(void *opaque)
>       trace_vfio_save_cleanup(vbasedev->name);
>   }
>   
> -static void vfio_v1_save_cleanup(void *opaque)
> -{
> -    VFIODevice *vbasedev = opaque;
> -
> -    vfio_migration_v1_cleanup(vbasedev);
> -    trace_vfio_save_cleanup(vbasedev->name);
> -}
> -
>   static void vfio_save_pending(void *opaque, uint64_t threshold_size,
>                                 uint64_t *res_precopy_only,
>                                 uint64_t *res_compatible,
> @@ -670,73 +306,6 @@ static void vfio_save_pending(void *opaque, uint64_t threshold_size,
>                               migration->stop_copy_size);
>   }
>   
> -static void vfio_v1_save_pending(void *opaque, uint64_t threshold_size,
> -                                 uint64_t *res_precopy_only,
> -                                 uint64_t *res_compatible,
> -                                 uint64_t *res_postcopy_only)
> -{
> -    VFIODevice *vbasedev = opaque;
> -    VFIOMigration *migration = vbasedev->migration;
> -    int ret;
> -
> -    ret = vfio_update_pending(vbasedev);
> -    if (ret) {
> -        return;
> -    }
> -
> -    *res_precopy_only += migration->pending_bytes;
> -
> -    trace_vfio_v1_save_pending(vbasedev->name, *res_precopy_only,
> -                               *res_postcopy_only, *res_compatible);
> -}
> -
> -static int vfio_save_iterate(QEMUFile *f, void *opaque)
> -{
> -    VFIODevice *vbasedev = opaque;
> -    VFIOMigration *migration = vbasedev->migration;
> -    uint64_t data_size;
> -    int ret;
> -
> -    qemu_put_be64(f, VFIO_MIG_FLAG_DEV_DATA_STATE);
> -
> -    if (migration->pending_bytes == 0) {
> -        ret = vfio_update_pending(vbasedev);
> -        if (ret) {
> -            return ret;
> -        }
> -
> -        if (migration->pending_bytes == 0) {
> -            qemu_put_be64(f, 0);
> -            qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
> -            /* indicates data finished, goto complete phase */
> -            return 1;
> -        }
> -    }
> -
> -    ret = vfio_save_buffer(f, vbasedev, &data_size);
> -    if (ret) {
> -        error_report("%s: vfio_save_buffer failed %s", vbasedev->name,
> -                     strerror(errno));
> -        return ret;
> -    }
> -
> -    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
> -
> -    ret = qemu_file_get_error(f);
> -    if (ret) {
> -        return ret;
> -    }
> -
> -    /*
> -     * Reset pending_bytes as .save_live_pending is not called during savevm or
> -     * snapshot case, in such case vfio_update_pending() at the start of this
> -     * function updates pending_bytes.
> -     */
> -    migration->pending_bytes = 0;
> -    trace_vfio_save_iterate(vbasedev->name, data_size);
> -    return 0;
> -}
> -
>   static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
>   {
>       VFIODevice *vbasedev = opaque;
> @@ -772,62 +341,6 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
>       return ret;
>   }
>   
> -static int vfio_v1_save_complete_precopy(QEMUFile *f, void *opaque)
> -{
> -    VFIODevice *vbasedev = opaque;
> -    VFIOMigration *migration = vbasedev->migration;
> -    uint64_t data_size;
> -    int ret;
> -
> -    ret = vfio_migration_v1_set_state(vbasedev, ~VFIO_DEVICE_STATE_V1_RUNNING,
> -                                      VFIO_DEVICE_STATE_V1_SAVING);
> -    if (ret) {
> -        error_report("%s: Failed to set state STOP and SAVING",
> -                     vbasedev->name);
> -        return ret;
> -    }
> -
> -    ret = vfio_update_pending(vbasedev);
> -    if (ret) {
> -        return ret;
> -    }
> -
> -    while (migration->pending_bytes > 0) {
> -        qemu_put_be64(f, VFIO_MIG_FLAG_DEV_DATA_STATE);
> -        ret = vfio_save_buffer(f, vbasedev, &data_size);
> -        if (ret < 0) {
> -            error_report("%s: Failed to save buffer", vbasedev->name);
> -            return ret;
> -        }
> -
> -        if (data_size == 0) {
> -            break;
> -        }
> -
> -        ret = vfio_update_pending(vbasedev);
> -        if (ret) {
> -            return ret;
> -        }
> -    }
> -
> -    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
> -
> -    ret = qemu_file_get_error(f);
> -    if (ret) {
> -        return ret;
> -    }
> -
> -    ret = vfio_migration_v1_set_state(vbasedev, ~VFIO_DEVICE_STATE_V1_SAVING,
> -                                      0);
> -    if (ret) {
> -        error_report("%s: Failed to set state STOPPED", vbasedev->name);
> -        return ret;
> -    }
> -
> -    trace_vfio_v1_save_complete_precopy(vbasedev->name);
> -    return ret;
> -}
> -
>   static void vfio_save_state(QEMUFile *f, void *opaque)
>   {
>       VFIODevice *vbasedev = opaque;
> @@ -849,33 +362,6 @@ static int vfio_load_setup(QEMUFile *f, void *opaque)
>                                      vbasedev->migration->device_state);
>   }
>   
> -static int vfio_v1_load_setup(QEMUFile *f, void *opaque)
> -{
> -    VFIODevice *vbasedev = opaque;
> -    VFIOMigration *migration = vbasedev->migration;
> -    int ret = 0;
> -
> -    if (migration->region.mmaps) {
> -        ret = vfio_region_mmap(&migration->region);
> -        if (ret) {
> -            error_report("%s: Failed to mmap VFIO migration region %d: %s",
> -                         vbasedev->name, migration->region.nr,
> -                         strerror(-ret));
> -            error_report("%s: Falling back to slow path", vbasedev->name);
> -        }
> -    }
> -
> -    ret = vfio_migration_v1_set_state(vbasedev, ~VFIO_DEVICE_STATE_MASK,
> -                                      VFIO_DEVICE_STATE_V1_RESUMING);
> -    if (ret) {
> -        error_report("%s: Failed to set state RESUMING", vbasedev->name);
> -        if (migration->region.mmaps) {
> -            vfio_region_unmap(&migration->region);
> -        }
> -    }
> -    return ret;
> -}
> -
>   static int vfio_load_cleanup(void *opaque)
>   {
>       VFIODevice *vbasedev = opaque;
> @@ -886,15 +372,6 @@ static int vfio_load_cleanup(void *opaque)
>       return 0;
>   }
>   
> -static int vfio_v1_load_cleanup(void *opaque)
> -{
> -    VFIODevice *vbasedev = opaque;
> -
> -    vfio_migration_v1_cleanup(vbasedev);
> -    trace_vfio_load_cleanup(vbasedev->name);
> -    return 0;
> -}
> -
>   static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>   {
>       VFIODevice *vbasedev = opaque;
> @@ -928,11 +405,7 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>               uint64_t data_size = qemu_get_be64(f);
>   
>               if (data_size) {
> -                if (vbasedev->migration->v2) {
> -                    ret = vfio_load_buffer(f, vbasedev, data_size);
> -                } else {
> -                    ret = vfio_v1_load_buffer(f, vbasedev, data_size);
> -                }
> +                ret = vfio_load_buffer(f, vbasedev, data_size);
>                   if (ret < 0) {
>                       return ret;
>                   }
> @@ -964,18 +437,6 @@ static const SaveVMHandlers savevm_vfio_handlers = {
>       .load_state = vfio_load_state,
>   };
>   
> -static SaveVMHandlers savevm_vfio_v1_handlers = {
> -    .save_setup = vfio_v1_save_setup,
> -    .save_cleanup = vfio_v1_save_cleanup,
> -    .save_live_pending = vfio_v1_save_pending,
> -    .save_live_iterate = vfio_save_iterate,
> -    .save_live_complete_precopy = vfio_v1_save_complete_precopy,
> -    .save_state = vfio_save_state,
> -    .load_setup = vfio_v1_load_setup,
> -    .load_cleanup = vfio_v1_load_cleanup,
> -    .load_state = vfio_load_state,
> -};
> -
>   /* ---------------------------------------------------------------------- */
>   
>   static void vfio_vmstate_change(void *opaque, bool running, RunState state)
> @@ -1006,70 +467,12 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state)
>                                 mig_state_to_str(new_state));
>   }
>   
> -static void vfio_v1_vmstate_change(void *opaque, bool running, RunState state)
> -{
> -    VFIODevice *vbasedev = opaque;
> -    VFIOMigration *migration = vbasedev->migration;
> -    uint32_t value, mask;
> -    int ret;
> -
> -    if (vbasedev->migration->vm_running == running) {
> -        return;
> -    }
> -
> -    if (running) {
> -        /*
> -         * Here device state can have one of _SAVING, _RESUMING or _STOP bit.
> -         * Transition from _SAVING to _RUNNING can happen if there is migration
> -         * failure, in that case clear _SAVING bit.
> -         * Transition from _RESUMING to _RUNNING occurs during resuming
> -         * phase, in that case clear _RESUMING bit.
> -         * In both the above cases, set _RUNNING bit.
> -         */
> -        mask = ~VFIO_DEVICE_STATE_MASK;
> -        value = VFIO_DEVICE_STATE_V1_RUNNING;
> -    } else {
> -        /*
> -         * Here device state could be either _RUNNING or _SAVING|_RUNNING. Reset
> -         * _RUNNING bit
> -         */
> -        mask = ~VFIO_DEVICE_STATE_V1_RUNNING;
> -
> -        /*
> -         * When VM state transition to stop for savevm command, device should
> -         * start saving data.
> -         */
> -        if (state == RUN_STATE_SAVE_VM) {
> -            value = VFIO_DEVICE_STATE_V1_SAVING;
> -        } else {
> -            value = 0;
> -        }
> -    }
> -
> -    ret = vfio_migration_v1_set_state(vbasedev, mask, value);
> -    if (ret) {
> -        /*
> -         * Migration should be aborted in this case, but vm_state_notify()
> -         * currently does not support reporting failures.
> -         */
> -        error_report("%s: Failed to set device state 0x%x", vbasedev->name,
> -                     (migration->device_state_v1 & mask) | value);
> -        if (migrate_get_current()->to_dst_file) {
> -            qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
> -        }
> -    }
> -    vbasedev->migration->vm_running = running;
> -    trace_vfio_v1_vmstate_change(vbasedev->name, running, RunState_str(state),
> -            (migration->device_state_v1 & mask) | value);
> -}
> -
>   static void vfio_migration_state_notifier(Notifier *notifier, void *data)
>   {
>       MigrationState *s = data;
>       VFIOMigration *migration = container_of(notifier, VFIOMigration,
>                                               migration_state);
>       VFIODevice *vbasedev = migration->vbasedev;
> -    int ret;
>   
>       trace_vfio_migration_state_notifier(vbasedev->name,
>                                           MigrationStatus_str(s->state));
> @@ -1079,18 +482,8 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data)
>       case MIGRATION_STATUS_CANCELLED:
>       case MIGRATION_STATUS_FAILED:
>           bytes_transferred = 0;
> -        if (migration->v2) {
> -            vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING,
> -                                     VFIO_DEVICE_STATE_ERROR);
> -        } else {
> -            ret = vfio_migration_v1_set_state(vbasedev,
> -                                              ~(VFIO_DEVICE_STATE_V1_SAVING |
> -                                                VFIO_DEVICE_STATE_V1_RESUMING),
> -                                              VFIO_DEVICE_STATE_V1_RUNNING);
> -            if (ret) {
> -                error_report("%s: Failed to set state RUNNING", vbasedev->name);
> -            }
> -        }
> +        vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING,
> +                                 VFIO_DEVICE_STATE_ERROR);
>       }
>   }
>   
> @@ -1130,12 +523,6 @@ static int vfio_migration_data_notifier(NotifierWithReturn *n, void *data)
>   
>   static void vfio_migration_exit(VFIODevice *vbasedev)
>   {
> -    VFIOMigration *migration = vbasedev->migration;
> -
> -    if (!migration->v2) {
> -        vfio_region_exit(&migration->region);
> -        vfio_region_finalize(&migration->region);
> -    }
>       g_free(vbasedev->migration);
>       vbasedev->migration = NULL;
>   }
> @@ -1175,7 +562,6 @@ static int vfio_migration_init(VFIODevice *vbasedev)
>       VFIOMigration *migration;
>       char id[256] = "";
>       g_autofree char *path = NULL, *oid = NULL;
> -    struct vfio_region_info *info;
>       uint64_t mig_flags;
>   
>       if (!vbasedev->ops->vfio_get_object) {
> @@ -1188,52 +574,20 @@ static int vfio_migration_init(VFIODevice *vbasedev)
>       }
>   
>       ret = vfio_migration_query_flags(vbasedev, &mig_flags);
> -    if (!ret) {
> -        /* Migration v2 */
> -        /* Basic migration functionality must be supported */
> -        if (!(mig_flags & VFIO_MIGRATION_STOP_COPY)) {
> -            return -EOPNOTSUPP;
> -        }
> -        vbasedev->migration = g_new0(VFIOMigration, 1);
> -        vbasedev->migration->device_state = VFIO_DEVICE_STATE_RUNNING;
> -        vbasedev->migration->data_fd = -1;
> -        vbasedev->migration->v2 = true;
> -    } else if (ret == -ENOTTY) {
> -        /* Migration v1 */
> -        ret = vfio_get_dev_region_info(vbasedev,
> -                                       VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
> -                                       VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
> -                                       &info);
> -        if (ret) {
> -            return ret;
> -        }
> -
> -        vbasedev->migration = g_new0(VFIOMigration, 1);
> -        vbasedev->migration->device_state_v1 = VFIO_DEVICE_STATE_V1_RUNNING;
> -        vbasedev->migration->vm_running = runstate_is_running();
> -
> -        ret = vfio_region_setup(obj, vbasedev, &vbasedev->migration->region,
> -                                info->index, "migration");
> -        if (ret) {
> -            error_report("%s: Failed to setup VFIO migration region %d: %s",
> -                         vbasedev->name, info->index, strerror(-ret));
> -            goto err;
> -        }
> -
> -        if (!vbasedev->migration->region.size) {
> -            error_report("%s: Invalid zero-sized VFIO migration region %d",
> -                         vbasedev->name, info->index);
> -            ret = -EINVAL;
> -            goto err;
> -        }
> -
> -        g_free(info);
> -    } else {
> +    if (ret) {
>           return ret;
>       }
>   
> +    /* Basic migration functionality must be supported */
> +    if (!(mig_flags & VFIO_MIGRATION_STOP_COPY)) {
> +        return -EOPNOTSUPP;
> +    }
> +
> +    vbasedev->migration = g_new0(VFIOMigration, 1);
>       migration = vbasedev->migration;
>       migration->vbasedev = vbasedev;
> +    migration->device_state = VFIO_DEVICE_STATE_RUNNING;
> +    migration->data_fd = -1;
>   
>       oid = vmstate_if_get_id(VMSTATE_IF(DEVICE(obj)));
>       if (oid) {
> @@ -1243,31 +597,18 @@ static int vfio_migration_init(VFIODevice *vbasedev)
>       }
>       strpadcpy(id, sizeof(id), path, '\0');
>   
> -    if (migration->v2) {
> -        register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
> -                             &savevm_vfio_handlers, vbasedev);
> -
> -        migration->vm_state = qdev_add_vm_change_state_handler(
> -            vbasedev->dev, vfio_vmstate_change, vbasedev);
> -
> -        migration->migration_data.notify = vfio_migration_data_notifier;
> -        precopy_add_notifier(&migration->migration_data);
> -    } else {
> -        register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
> -                             &savevm_vfio_v1_handlers, vbasedev);
> -
> -        migration->vm_state = qdev_add_vm_change_state_handler(
> -            vbasedev->dev, vfio_v1_vmstate_change, vbasedev);
> -    }
> +    register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1, &savevm_vfio_handlers,
> +                         vbasedev);
>   
> +    migration->vm_state = qdev_add_vm_change_state_handler(vbasedev->dev,
> +                                                           vfio_vmstate_change,
> +                                                           vbasedev);
>       migration->migration_state.notify = vfio_migration_state_notifier;
>       add_migration_state_change_notifier(&migration->migration_state);
> -    return 0;
> +    migration->migration_data.notify = vfio_migration_data_notifier;
> +    precopy_add_notifier(&migration->migration_data);
>   
> -err:
> -    g_free(info);
> -    vfio_migration_exit(vbasedev);
> -    return ret;
> +    return 0;
>   }
>   
>   /* ---------------------------------------------------------------------- */
> @@ -1310,9 +651,7 @@ void vfio_migration_finalize(VFIODevice *vbasedev)
>       if (vbasedev->migration) {
>           VFIOMigration *migration = vbasedev->migration;
>   
> -        if (migration->v2) {
> -            precopy_remove_notifier(&migration->migration_data);
> -        }
> +        precopy_remove_notifier(&migration->migration_data);
>           remove_migration_state_change_notifier(&migration->migration_state);
>           qemu_del_vm_change_state_handler(migration->vm_state);
>           unregister_savevm(VMSTATE_IF(vbasedev->dev), "vfio", vbasedev);
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index a8a64f0627..60c49b2ecf 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -150,24 +150,15 @@ vfio_display_edid_write_error(void) ""
>   # migration.c
>   vfio_migration_probe(const char *name) " (%s)"
>   vfio_migration_set_state(const char *name, const char *state) " (%s) state %s"
> -vfio_migration_v1_set_state(const char *name, uint32_t state) " (%s) state %d"
>   vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
> -vfio_v1_vmstate_change(const char *name, int running, const char *reason, uint32_t dev_state) " (%s) running %d reason %s device state %d"
>   vfio_migration_state_notifier(const char *name, const char *state) " (%s) state %s"
>   vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer size 0x%"PRIx64
> -vfio_v1_save_setup(const char *name) " (%s)"
>   vfio_save_cleanup(const char *name) " (%s)"
> -vfio_save_buffer(const char *name, uint64_t data_offset, uint64_t data_size, uint64_t pending) " (%s) Offset 0x%"PRIx64" size 0x%"PRIx64" pending 0x%"PRIx64
> -vfio_update_pending(const char *name, uint64_t pending) " (%s) pending 0x%"PRIx64
>   vfio_save_device_config_state(const char *name) " (%s)"
>   vfio_save_pending(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t compatible, uint64_t stopcopy_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" compatible 0x%"PRIx64" stopcopy size 0x%"PRIx64
> -vfio_v1_save_pending(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t compatible) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" compatible 0x%"PRIx64
> -vfio_save_iterate(const char *name, int data_size) " (%s) data_size %d"
>   vfio_save_complete_precopy(const char *name, int ret) " (%s) ret %d"
> -vfio_v1_save_complete_precopy(const char *name) " (%s)"
>   vfio_load_device_config_state(const char *name) " (%s)"
>   vfio_load_state(const char *name, uint64_t data) " (%s) data 0x%"PRIx64
> -vfio_v1_load_state_device_data(const char *name, uint64_t data_offset, uint64_t data_size) " (%s) Offset 0x%"PRIx64" size 0x%"PRIx64
>   vfio_load_state_device_data(const char *name, uint64_t data_size, int ret) " (%s) size 0x%"PRIx64" ret %d"
>   vfio_load_cleanup(const char *name) " (%s)"
>   vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start) "container fd=%d, iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v6 12/13] vfio: Alphabetize migration section of VFIO trace-events file
  2023-01-12  8:50 ` [PATCH v6 12/13] vfio: Alphabetize migration section of VFIO trace-events file Avihai Horon
@ 2023-01-13 16:06   ` Cédric Le Goater
  0 siblings, 0 replies; 22+ messages in thread
From: Cédric Le Goater @ 2023-01-13 16:06 UTC (permalink / raw)
  To: Avihai Horon, qemu-devel
  Cc: Alex Williamson, Halil Pasic, Christian Borntraeger, Eric Farman,
	Richard Henderson, David Hildenbrand, Ilya Leoshkevich,
	Thomas Huth, Juan Quintela, Dr. David Alan Gilbert,
	Michael S. Tsirkin, Cornelia Huck, Paolo Bonzini,
	Stefan Hajnoczi, Fam Zheng, Eric Blake,
	Vladimir Sementsov-Ogievskiy, John Snow, qemu-s390x, qemu-block,
	Yishai Hadas, Jason Gunthorpe, Maor Gottlieb, Kirti Wankhede,
	Tarun Gupta, Joao Martins

On 1/12/23 09:50, Avihai Horon wrote:
> Sort the migration section of VFIO trace events file alphabetically
> and move two misplaced traces to common.c section.
> 
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>



Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.

> ---
>   hw/vfio/trace-events | 22 +++++++++++-----------
>   1 file changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index 60c49b2ecf..db9cb94952 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -119,6 +119,8 @@ vfio_region_sparse_mmap_header(const char *name, int index, int nr_areas) "Devic
>   vfio_region_sparse_mmap_entry(int i, unsigned long start, unsigned long end) "sparse entry %d [0x%lx - 0x%lx]"
>   vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t subtype) "%s index %d, %08x/%0x8"
>   vfio_dma_unmap_overflow_workaround(void) ""
> +vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start) "container fd=%d, iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64
> +vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu dirty @ 0x%"PRIx64" - 0x%"PRIx64
>   
>   # platform.c
>   vfio_platform_base_device_init(char *name, int groupid) "%s belongs to group #%d"
> @@ -148,20 +150,18 @@ vfio_display_edid_update(uint32_t prefx, uint32_t prefy) "%ux%u"
>   vfio_display_edid_write_error(void) ""
>   
>   # migration.c
> +vfio_load_cleanup(const char *name) " (%s)"
> +vfio_load_device_config_state(const char *name) " (%s)"
> +vfio_load_state(const char *name, uint64_t data) " (%s) data 0x%"PRIx64
> +vfio_load_state_device_data(const char *name, uint64_t data_size, int ret) " (%s) size 0x%"PRIx64" ret %d"
> +vfio_migration_data_notifier(const char *name, uint64_t stopcopy_size) " (%s) stopcopy size 0x%"PRIx64
>   vfio_migration_probe(const char *name) " (%s)"
>   vfio_migration_set_state(const char *name, const char *state) " (%s) state %s"
> -vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
>   vfio_migration_state_notifier(const char *name, const char *state) " (%s) state %s"
> -vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer size 0x%"PRIx64
> +vfio_save_block(const char *name, int data_size) " (%s) data_size %d"
>   vfio_save_cleanup(const char *name) " (%s)"
> +vfio_save_complete_precopy(const char *name, int ret) " (%s) ret %d"
>   vfio_save_device_config_state(const char *name) " (%s)"
>   vfio_save_pending(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t compatible, uint64_t stopcopy_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" compatible 0x%"PRIx64" stopcopy size 0x%"PRIx64
> -vfio_save_complete_precopy(const char *name, int ret) " (%s) ret %d"
> -vfio_load_device_config_state(const char *name) " (%s)"
> -vfio_load_state(const char *name, uint64_t data) " (%s) data 0x%"PRIx64
> -vfio_load_state_device_data(const char *name, uint64_t data_size, int ret) " (%s) size 0x%"PRIx64" ret %d"
> -vfio_load_cleanup(const char *name) " (%s)"
> -vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start) "container fd=%d, iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64
> -vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu dirty @ 0x%"PRIx64" - 0x%"PRIx64
> -vfio_save_block(const char *name, int data_size) " (%s) data_size %d"
> -vfio_migration_data_notifier(const char *name, uint64_t stopcopy_size) " (%s) stopcopy size 0x%"PRIx64
> +vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer size 0x%"PRIx64
> +vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v6 09/13] vfio/migration: Implement VFIO migration protocol v2
  2023-01-12  8:50 ` [PATCH v6 09/13] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
@ 2023-01-13 16:13   ` Cédric Le Goater
  2023-01-15 18:42     ` Avihai Horon
  0 siblings, 1 reply; 22+ messages in thread
From: Cédric Le Goater @ 2023-01-13 16:13 UTC (permalink / raw)
  To: Avihai Horon, qemu-devel
  Cc: Alex Williamson, Halil Pasic, Christian Borntraeger, Eric Farman,
	Richard Henderson, David Hildenbrand, Ilya Leoshkevich,
	Thomas Huth, Juan Quintela, Dr. David Alan Gilbert,
	Michael S. Tsirkin, Cornelia Huck, Paolo Bonzini,
	Stefan Hajnoczi, Fam Zheng, Eric Blake,
	Vladimir Sementsov-Ogievskiy, John Snow, qemu-s390x, qemu-block,
	Yishai Hadas, Jason Gunthorpe, Maor Gottlieb, Kirti Wankhede,
	Tarun Gupta, Joao Martins

On 1/12/23 09:50, Avihai Horon wrote:
> Implement the basic mandatory part of VFIO migration protocol v2.
> This includes all functionality that is necessary to support
> VFIO_MIGRATION_STOP_COPY part of the v2 protocol.
> 
> The two protocols, v1 and v2, will co-exist and in the following patches
> v1 protocol code will be removed.
> 
> There are several main differences between v1 and v2 protocols:
> - VFIO device state is now represented as a finite state machine instead
>    of a bitmap.
> 
> - Migration interface with kernel is now done using VFIO_DEVICE_FEATURE
>    ioctl and normal read() and write() instead of the migration region.
> 
> - Pre-copy is made optional in v2 protocol. Support for pre-copy will be
>    added later on.
> 
> Detailed information about VFIO migration protocol v2 and its difference
> compared to v1 protocol can be found here [1].
> 
> [1]
> https://lore.kernel.org/all/20220224142024.147653-10-yishaih@nvidia.com/
> 
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>


LGTM,

Reviewed-by: Cédric Le Goater <clg@redhat.com>

Still a small issue below,

> ---
>   include/hw/vfio/vfio-common.h |   5 +
>   hw/vfio/common.c              |  19 +-
>   hw/vfio/migration.c           | 455 +++++++++++++++++++++++++++++++---
>   hw/vfio/trace-events          |   7 +
>   4 files changed, 447 insertions(+), 39 deletions(-)
> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index bbaf72ba00..2ec3346fea 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -66,6 +66,11 @@ typedef struct VFIOMigration {
>       int vm_running;
>       Notifier migration_state;
>       uint64_t pending_bytes;
> +    enum vfio_device_mig_state device_state;
> +    int data_fd;
> +    void *data_buffer;
> +    size_t data_buffer_size;
> +    bool v2;
>   } VFIOMigration;
>   
>   typedef struct VFIOAddressSpace {
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 550b2d7ded..dcaa77d2a8 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -355,10 +355,18 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
>                   return false;
>               }
>   
> -            if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
> +            if (!migration->v2 &&
> +                (vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
>                   (migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING)) {
>                   return false;
>               }
> +
> +            if (migration->v2 &&
> +                (vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
> +                (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
> +                 migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P)) {
> +                return false;
> +            }
>           }
>       }
>       return true;
> @@ -385,7 +393,14 @@ static bool vfio_devices_all_running_and_mig_active(VFIOContainer *container)
>                   return false;
>               }
>   
> -            if (migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING) {
> +            if (!migration->v2 &&
> +                migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING) {
> +                continue;
> +            }
> +
> +            if (migration->v2 &&
> +                (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
> +                 migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P)) {
>                   continue;
>               } else {
>                   return false;
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 9df859f4d3..08f53189fa 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -10,6 +10,7 @@
>   #include "qemu/osdep.h"
>   #include "qemu/main-loop.h"
>   #include "qemu/cutils.h"
> +#include "qemu/units.h"
>   #include <linux/vfio.h>
>   #include <sys/ioctl.h>
>   
> @@ -44,8 +45,103 @@
>   #define VFIO_MIG_FLAG_DEV_SETUP_STATE   (0xffffffffef100003ULL)
>   #define VFIO_MIG_FLAG_DEV_DATA_STATE    (0xffffffffef100004ULL)
>   
> +/*
> + * This is an arbitrary size based on migration of mlx5 devices, where typically
> + * total device migration size is on the order of 100s of MB. Testing with
> + * larger values, e.g. 128MB and 1GB, did not show a performance improvement.
> + */
> +#define VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE (1 * MiB)
> +
>   static int64_t bytes_transferred;
>   
> +static const char *mig_state_to_str(enum vfio_device_mig_state state)
> +{
> +    switch (state) {
> +    case VFIO_DEVICE_STATE_ERROR:
> +        return "ERROR";
> +    case VFIO_DEVICE_STATE_STOP:
> +        return "STOP";
> +    case VFIO_DEVICE_STATE_RUNNING:
> +        return "RUNNING";
> +    case VFIO_DEVICE_STATE_STOP_COPY:
> +        return "STOP_COPY";
> +    case VFIO_DEVICE_STATE_RESUMING:
> +        return "RESUMING";
> +    case VFIO_DEVICE_STATE_RUNNING_P2P:
> +        return "RUNNING_P2P";
> +    default:
> +        return "UNKNOWN STATE";
> +    }
> +}
> +
> +static int vfio_migration_set_state(VFIODevice *vbasedev,
> +                                    enum vfio_device_mig_state new_state,
> +                                    enum vfio_device_mig_state recover_state)
> +{
> +    VFIOMigration *migration = vbasedev->migration;
> +    uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
> +                              sizeof(struct vfio_device_feature_mig_state),
> +                              sizeof(uint64_t))] = {};
> +    struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
> +    struct vfio_device_feature_mig_state *mig_state =
> +        (struct vfio_device_feature_mig_state *)feature->data;
> +
> +    feature->argsz = sizeof(buf);
> +    feature->flags =
> +        VFIO_DEVICE_FEATURE_SET | VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE;
> +    mig_state->device_state = new_state;
> +    if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
> +        /* Try to set the device in some good state */
> +        int err = -errno;
> +
> +        error_report(
> +            "%s: Failed setting device state to %s, err: %s. Setting device in recover state %s",
> +                     vbasedev->name, mig_state_to_str(new_state),
> +                     strerror(errno), mig_state_to_str(recover_state));
> +
> +        mig_state->device_state = recover_state;
> +        if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
> +            err = -errno;
> +            error_report(
> +                "%s: Failed setting device in recover state, err: %s. Resetting device",
> +                         vbasedev->name, strerror(errno));
> +
> +            if (ioctl(vbasedev->fd, VFIO_DEVICE_RESET)) {
> +                hw_error("%s: Failed resetting device, err: %s", vbasedev->name,
> +                         strerror(errno));
> +            }
> +
> +            migration->device_state = VFIO_DEVICE_STATE_RUNNING;
> +
> +            return err;
> +        }
> +
> +        migration->device_state = recover_state;
> +
> +        return err;
> +    }
> +
> +    migration->device_state = new_state;
> +    if (mig_state->data_fd != -1) {
> +        if (migration->data_fd != -1) {
> +            /*
> +             * This can happen if the device is asynchronously reset and
> +             * terminates a data transfer.
> +             */
> +            error_report("%s: data_fd out of sync", vbasedev->name);
> +            close(mig_state->data_fd);
> +
> +            return -EBADF;
> +        }
> +
> +        migration->data_fd = mig_state->data_fd;
> +    }
> +
> +    trace_vfio_migration_set_state(vbasedev->name, mig_state_to_str(new_state));
> +
> +    return 0;
> +}
> +
>   static inline int vfio_mig_access(VFIODevice *vbasedev, void *val, int count,
>                                     off_t off, bool iswrite)
>   {
> @@ -260,6 +356,18 @@ static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev, uint64_t *size)
>       return ret;
>   }
>   
> +static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
> +                            uint64_t data_size)
> +{
> +    VFIOMigration *migration = vbasedev->migration;
> +    int ret;
> +
> +    ret = qemu_file_get_to_fd(f, migration->data_fd, data_size);
> +    trace_vfio_load_state_device_data(vbasedev->name, data_size, ret);
> +
> +    return ret;
> +}
> +
>   static int vfio_v1_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
>                                  uint64_t data_size)
>   {
> @@ -394,6 +502,14 @@ static int vfio_load_device_config_state(QEMUFile *f, void *opaque)
>       return qemu_file_get_error(f);
>   }
>   
> +static void vfio_migration_cleanup(VFIODevice *vbasedev)
> +{
> +    VFIOMigration *migration = vbasedev->migration;
> +
> +    close(migration->data_fd);
> +    migration->data_fd = -1;
> +}
> +
>   static void vfio_migration_v1_cleanup(VFIODevice *vbasedev)
>   {
>       VFIOMigration *migration = vbasedev->migration;
> @@ -403,8 +519,80 @@ static void vfio_migration_v1_cleanup(VFIODevice *vbasedev)
>       }
>   }
>   
> +static int vfio_query_stop_copy_size(VFIODevice *vbasedev,
> +                                     uint64_t *stop_copy_size)
> +{
> +    uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
> +                              sizeof(struct vfio_device_feature_mig_data_size),
> +                              sizeof(uint64_t))] = {};
> +    struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
> +    struct vfio_device_feature_mig_data_size *mig_data_size =
> +        (struct vfio_device_feature_mig_data_size *)feature->data;
> +
> +    feature->argsz = sizeof(buf);
> +    feature->flags =
> +        VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_MIG_DATA_SIZE;
> +
> +    if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
> +        return -errno;
> +    }
> +
> +    *stop_copy_size = mig_data_size->stop_copy_length;
> +
> +    return 0;
> +}
> +
> +/* Returns 1 if end-of-stream is reached, 0 if more data and -1 if error */
> +static int vfio_save_block(QEMUFile *f, VFIOMigration *migration)
> +{
> +    ssize_t data_size;
> +
> +    data_size = read(migration->data_fd, migration->data_buffer,
> +                     migration->data_buffer_size);
> +    if (data_size < 0) {
> +        return -errno;
> +    }
> +    if (data_size == 0) {
> +        return 1;
> +    }
> +
> +    qemu_put_be64(f, VFIO_MIG_FLAG_DEV_DATA_STATE);
> +    qemu_put_be64(f, data_size);
> +    qemu_put_buffer(f, migration->data_buffer, data_size);
> +    bytes_transferred += data_size;
> +
> +    trace_vfio_save_block(migration->vbasedev->name, data_size);
> +
> +    return qemu_file_get_error(f);
> +}
> +
>   /* ---------------------------------------------------------------------- */
>   
> +static int vfio_save_setup(QEMUFile *f, void *opaque)
> +{
> +    VFIODevice *vbasedev = opaque;
> +    VFIOMigration *migration = vbasedev->migration;
> +    uint64_t stop_copy_size = VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE;
> +
> +    qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE);
> +
> +    vfio_query_stop_copy_size(vbasedev, &stop_copy_size);
> +    migration->data_buffer_size = MIN(VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE,
> +                                      stop_copy_size);
> +    migration->data_buffer = g_try_malloc0(migration->data_buffer_size);
> +    if (!migration->data_buffer) {
> +        error_report("%s: Failed to allocate migration data buffer",
> +                     vbasedev->name);
> +        return -ENOMEM;
> +    }
> +
> +    trace_vfio_save_setup(vbasedev->name, migration->data_buffer_size);
> +
> +    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
> +
> +    return qemu_file_get_error(f);
> +}
> +
>   static int vfio_v1_save_setup(QEMUFile *f, void *opaque)
>   {
>       VFIODevice *vbasedev = opaque;
> @@ -448,6 +636,17 @@ static int vfio_v1_save_setup(QEMUFile *f, void *opaque)
>       return 0;
>   }
>   
> +static void vfio_save_cleanup(void *opaque)
> +{
> +    VFIODevice *vbasedev = opaque;
> +    VFIOMigration *migration = vbasedev->migration;
> +
> +    g_free(migration->data_buffer);
> +    migration->data_buffer = NULL;
> +    vfio_migration_cleanup(vbasedev);
> +    trace_vfio_save_cleanup(vbasedev->name);
> +}
> +
>   static void vfio_v1_save_cleanup(void *opaque)
>   {
>       VFIODevice *vbasedev = opaque;
> @@ -456,6 +655,31 @@ static void vfio_v1_save_cleanup(void *opaque)
>       trace_vfio_save_cleanup(vbasedev->name);
>   }
>   
> +/*
> + * Migration size of VFIO devices can be as little as a few KBs or as big as
> + * many GBs. This value should be big enough to cover the worst case.
> + */
> +#define VFIO_MIG_STOP_COPY_SIZE (100 * GiB)
> +static void vfio_save_pending(void *opaque, uint64_t threshold_size,
> +                              uint64_t *res_precopy_only,
> +                              uint64_t *res_compatible,
> +                              uint64_t *res_postcopy_only)
> +{
> +    VFIODevice *vbasedev = opaque;
> +    uint64_t stop_copy_size = VFIO_MIG_STOP_COPY_SIZE;
> +
> +    /*
> +     * If getting pending migration size fails, VFIO_MIG_STOP_COPY_SIZE is
> +     * reported so downtime limit won't be violated.
> +     */
> +    vfio_query_stop_copy_size(vbasedev, &stop_copy_size);
> +    *res_precopy_only += stop_copy_size;
> +
> +    trace_vfio_save_pending(vbasedev->name, *res_precopy_only,
> +                            *res_postcopy_only, *res_compatible,
> +                            stop_copy_size);
> +}
> +
>   static void vfio_v1_save_pending(void *opaque, uint64_t threshold_size,
>                                    uint64_t *res_precopy_only,
>                                    uint64_t *res_compatible,
> @@ -523,6 +747,41 @@ static int vfio_save_iterate(QEMUFile *f, void *opaque)
>       return 0;
>   }
>   
> +static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
> +{
> +    VFIODevice *vbasedev = opaque;
> +    enum vfio_device_mig_state recover_state;
> +    int ret;
> +
> +    /* We reach here with device state STOP only */
> +    recover_state = VFIO_DEVICE_STATE_STOP;
> +    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP_COPY,
> +                                   recover_state);
> +    if (ret) {
> +        return ret;
> +    }
> +
> +    do {
> +        ret = vfio_save_block(f, vbasedev->migration);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +    } while (!ret);
> +
> +    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
> +    ret = qemu_file_get_error(f);
> +    if (ret) {
> +        return ret;
> +    }
> +
> +    recover_state = VFIO_DEVICE_STATE_ERROR;
> +    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
> +                                   recover_state);
> +    trace_vfio_save_complete_precopy(vbasedev->name, ret);
> +
> +    return ret;
> +}
> +
>   static int vfio_v1_save_complete_precopy(QEMUFile *f, void *opaque)
>   {
>       VFIODevice *vbasedev = opaque;
> @@ -592,6 +851,14 @@ static void vfio_save_state(QEMUFile *f, void *opaque)
>       }
>   }
>   
> +static int vfio_load_setup(QEMUFile *f, void *opaque)
> +{
> +    VFIODevice *vbasedev = opaque;
> +
> +    return vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING,
> +                                   vbasedev->migration->device_state);
> +}
> +
>   static int vfio_v1_load_setup(QEMUFile *f, void *opaque)
>   {
>       VFIODevice *vbasedev = opaque;
> @@ -619,6 +886,16 @@ static int vfio_v1_load_setup(QEMUFile *f, void *opaque)
>       return ret;
>   }
>   
> +static int vfio_load_cleanup(void *opaque)
> +{
> +    VFIODevice *vbasedev = opaque;
> +
> +    vfio_migration_cleanup(vbasedev);
> +    trace_vfio_load_cleanup(vbasedev->name);
> +
> +    return 0;
> +}
> +
>   static int vfio_v1_load_cleanup(void *opaque)
>   {
>       VFIODevice *vbasedev = opaque;
> @@ -661,7 +938,11 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>               uint64_t data_size = qemu_get_be64(f);
>   
>               if (data_size) {
> -                ret = vfio_v1_load_buffer(f, vbasedev, data_size);
> +                if (vbasedev->migration->v2) {
> +                    ret = vfio_load_buffer(f, vbasedev, data_size);
> +                } else {
> +                    ret = vfio_v1_load_buffer(f, vbasedev, data_size);
> +                }
>                   if (ret < 0) {
>                       return ret;
>                   }
> @@ -682,6 +963,17 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>       return ret;
>   }
>   
> +static const SaveVMHandlers savevm_vfio_handlers = {
> +    .save_setup = vfio_save_setup,
> +    .save_cleanup = vfio_save_cleanup,
> +    .save_live_pending = vfio_save_pending,
> +    .save_live_complete_precopy = vfio_save_complete_precopy,
> +    .save_state = vfio_save_state,
> +    .load_setup = vfio_load_setup,
> +    .load_cleanup = vfio_load_cleanup,
> +    .load_state = vfio_load_state,
> +};
> +
>   static SaveVMHandlers savevm_vfio_v1_handlers = {
>       .save_setup = vfio_v1_save_setup,
>       .save_cleanup = vfio_v1_save_cleanup,
> @@ -696,6 +988,34 @@ static SaveVMHandlers savevm_vfio_v1_handlers = {
>   
>   /* ---------------------------------------------------------------------- */
>   
> +static void vfio_vmstate_change(void *opaque, bool running, RunState state)
> +{
> +    VFIODevice *vbasedev = opaque;
> +    enum vfio_device_mig_state new_state;
> +    int ret;
> +
> +    if (running) {
> +        new_state = VFIO_DEVICE_STATE_RUNNING;
> +    } else {
> +        new_state = VFIO_DEVICE_STATE_STOP;
> +    }
> +
> +    ret = vfio_migration_set_state(vbasedev, new_state,
> +                                   VFIO_DEVICE_STATE_ERROR);
> +    if (ret) {
> +        /*
> +         * Migration should be aborted in this case, but vm_state_notify()
> +         * currently does not support reporting failures.
> +         */
> +        if (migrate_get_current()->to_dst_file) {
> +            qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
> +        }
> +    }
> +
> +    trace_vfio_vmstate_change(vbasedev->name, running, RunState_str(state),
> +                              mig_state_to_str(new_state));
> +}
> +
>   static void vfio_v1_vmstate_change(void *opaque, bool running, RunState state)
>   {
>       VFIODevice *vbasedev = opaque;
> @@ -769,12 +1089,17 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data)
>       case MIGRATION_STATUS_CANCELLED:
>       case MIGRATION_STATUS_FAILED:
>           bytes_transferred = 0;
> -        ret = vfio_migration_v1_set_state(vbasedev,
> -                                          ~(VFIO_DEVICE_STATE_V1_SAVING |
> -                                            VFIO_DEVICE_STATE_V1_RESUMING),
> -                                          VFIO_DEVICE_STATE_V1_RUNNING);
> -        if (ret) {
> -            error_report("%s: Failed to set state RUNNING", vbasedev->name);
> +        if (migration->v2) {
> +            vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING,
> +                                     VFIO_DEVICE_STATE_ERROR);
> +        } else {
> +            ret = vfio_migration_v1_set_state(vbasedev,
> +                                              ~(VFIO_DEVICE_STATE_V1_SAVING |
> +                                                VFIO_DEVICE_STATE_V1_RESUMING),
> +                                              VFIO_DEVICE_STATE_V1_RUNNING);
> +            if (ret) {
> +                error_report("%s: Failed to set state RUNNING", vbasedev->name);
> +            }
>           }
>       }
>   }
> @@ -783,12 +1108,42 @@ static void vfio_migration_exit(VFIODevice *vbasedev)
>   {
>       VFIOMigration *migration = vbasedev->migration;
>   
> -    vfio_region_exit(&migration->region);
> -    vfio_region_finalize(&migration->region);
> +    if (!migration->v2) {
> +        vfio_region_exit(&migration->region);
> +        vfio_region_finalize(&migration->region);
> +    }
>       g_free(vbasedev->migration);
>       vbasedev->migration = NULL;
>   }
>   
> +static int vfio_migration_query_flags(VFIODevice *vbasedev, uint64_t *mig_flags)
> +{
> +    uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
> +                                  sizeof(struct vfio_device_feature_migration),
> +                              sizeof(uint64_t))] = {};
> +    struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
> +    struct vfio_device_feature_migration *mig =
> +        (struct vfio_device_feature_migration *)feature->data;
> +
> +    feature->argsz = sizeof(buf);
> +    feature->flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_MIGRATION;
> +    if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
> +        if (errno == ENOTTY) {
> +            error_report("%s: VFIO migration is not supported in kernel",
> +                         vbasedev->name);
> +        } else {
> +            error_report("%s: Failed to query VFIO migration support, err: %s",
> +                         vbasedev->name, strerror(errno));
> +        }
> +
> +        return -errno;
> +    }
> +
> +    *mig_flags = mig->flags;
> +
> +    return 0;
> +}
> +
>   static int vfio_migration_init(VFIODevice *vbasedev)
>   {
>       int ret;
> @@ -797,6 +1152,7 @@ static int vfio_migration_init(VFIODevice *vbasedev)
>       char id[256] = "";
>       g_autofree char *path = NULL, *oid = NULL;
>       struct vfio_region_info *info;
> +    uint64_t mig_flags;

GCC 12 reports :

   ../hw/vfio/migration.c: In function ‘vfio_migration_init’:
   ../hw/vfio/migration.c:1170:25: error: ‘mig_flags’ may be used uninitialized [-Werror=maybe-uninitialized]
    1170 |         if (!(mig_flags & VFIO_MIGRATION_STOP_COPY)) {
         |              ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
   ../hw/vfio/migration.c:1155:14: note: ‘mig_flags’ was declared here
    1155 |     uint64_t mig_flags;
         |              ^~~~~~~~~
   cc1: all warnings being treated as errors

You will need a little fix for that. May be it can be done at merge time :

-    uint64_t mig_flags;
+    uint64_t mig_flags = 0;

Thanks,

C.

>   
>       if (!vbasedev->ops->vfio_get_object) {
>           return -EINVAL;
> @@ -807,34 +1163,50 @@ static int vfio_migration_init(VFIODevice *vbasedev)
>           return -EINVAL;
>       }
>   
> -    ret = vfio_get_dev_region_info(vbasedev,
> -                                   VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
> -                                   VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
> -                                   &info);
> -    if (ret) {
> -        return ret;
> -    }
> +    ret = vfio_migration_query_flags(vbasedev, &mig_flags);
> +    if (!ret) {
> +        /* Migration v2 */
> +        /* Basic migration functionality must be supported */
> +        if (!(mig_flags & VFIO_MIGRATION_STOP_COPY)) {
> +            return -EOPNOTSUPP;
> +        }
> +        vbasedev->migration = g_new0(VFIOMigration, 1);
> +        vbasedev->migration->device_state = VFIO_DEVICE_STATE_RUNNING;
> +        vbasedev->migration->data_fd = -1;
> +        vbasedev->migration->v2 = true;
> +    } else if (ret == -ENOTTY) {
> +        /* Migration v1 */
> +        ret = vfio_get_dev_region_info(vbasedev,
> +                                       VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
> +                                       VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
> +                                       &info);
> +        if (ret) {
> +            return ret;
> +        }
>   
> -    vbasedev->migration = g_new0(VFIOMigration, 1);
> -    vbasedev->migration->device_state_v1 = VFIO_DEVICE_STATE_V1_RUNNING;
> -    vbasedev->migration->vm_running = runstate_is_running();
> +        vbasedev->migration = g_new0(VFIOMigration, 1);
> +        vbasedev->migration->device_state_v1 = VFIO_DEVICE_STATE_V1_RUNNING;
> +        vbasedev->migration->vm_running = runstate_is_running();
>   
> -    ret = vfio_region_setup(obj, vbasedev, &vbasedev->migration->region,
> -                            info->index, "migration");
> -    if (ret) {
> -        error_report("%s: Failed to setup VFIO migration region %d: %s",
> -                     vbasedev->name, info->index, strerror(-ret));
> -        goto err;
> -    }
> +        ret = vfio_region_setup(obj, vbasedev, &vbasedev->migration->region,
> +                                info->index, "migration");
> +        if (ret) {
> +            error_report("%s: Failed to setup VFIO migration region %d: %s",
> +                         vbasedev->name, info->index, strerror(-ret));
> +            goto err;
> +        }
>   
> -    if (!vbasedev->migration->region.size) {
> -        error_report("%s: Invalid zero-sized VFIO migration region %d",
> -                     vbasedev->name, info->index);
> -        ret = -EINVAL;
> -        goto err;
> -    }
> +        if (!vbasedev->migration->region.size) {
> +            error_report("%s: Invalid zero-sized VFIO migration region %d",
> +                         vbasedev->name, info->index);
> +            ret = -EINVAL;
> +            goto err;
> +        }
>   
> -    g_free(info);
> +        g_free(info);
> +    } else {
> +        return ret;
> +    }
>   
>       migration = vbasedev->migration;
>       migration->vbasedev = vbasedev;
> @@ -847,11 +1219,20 @@ static int vfio_migration_init(VFIODevice *vbasedev)
>       }
>       strpadcpy(id, sizeof(id), path, '\0');
>   
> -    register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
> -                         &savevm_vfio_v1_handlers, vbasedev);
> +    if (migration->v2) {
> +        register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
> +                             &savevm_vfio_handlers, vbasedev);
> +
> +        migration->vm_state = qdev_add_vm_change_state_handler(
> +            vbasedev->dev, vfio_vmstate_change, vbasedev);
> +    } else {
> +        register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
> +                             &savevm_vfio_v1_handlers, vbasedev);
> +
> +        migration->vm_state = qdev_add_vm_change_state_handler(
> +            vbasedev->dev, vfio_v1_vmstate_change, vbasedev);
> +    }
>   
> -    migration->vm_state = qdev_add_vm_change_state_handler(
> -        vbasedev->dev, vfio_v1_vmstate_change, vbasedev);
>       migration->migration_state.notify = vfio_migration_state_notifier;
>       add_migration_state_change_notifier(&migration->migration_state);
>       return 0;
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index 9eb9aff295..e7b3b4658c 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -149,20 +149,27 @@ vfio_display_edid_write_error(void) ""
>   
>   # migration.c
>   vfio_migration_probe(const char *name) " (%s)"
> +vfio_migration_set_state(const char *name, const char *state) " (%s) state %s"
>   vfio_migration_v1_set_state(const char *name, uint32_t state) " (%s) state %d"
> +vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
>   vfio_v1_vmstate_change(const char *name, int running, const char *reason, uint32_t dev_state) " (%s) running %d reason %s device state %d"
>   vfio_migration_state_notifier(const char *name, const char *state) " (%s) state %s"
> +vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer size 0x%"PRIx64
>   vfio_v1_save_setup(const char *name) " (%s)"
>   vfio_save_cleanup(const char *name) " (%s)"
>   vfio_save_buffer(const char *name, uint64_t data_offset, uint64_t data_size, uint64_t pending) " (%s) Offset 0x%"PRIx64" size 0x%"PRIx64" pending 0x%"PRIx64
>   vfio_update_pending(const char *name, uint64_t pending) " (%s) pending 0x%"PRIx64
>   vfio_save_device_config_state(const char *name) " (%s)"
> +vfio_save_pending(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t compatible, uint64_t stopcopy_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" compatible 0x%"PRIx64" stopcopy size 0x%"PRIx64
>   vfio_v1_save_pending(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t compatible) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" compatible 0x%"PRIx64
>   vfio_save_iterate(const char *name, int data_size) " (%s) data_size %d"
> +vfio_save_complete_precopy(const char *name, int ret) " (%s) ret %d"
>   vfio_v1_save_complete_precopy(const char *name) " (%s)"
>   vfio_load_device_config_state(const char *name) " (%s)"
>   vfio_load_state(const char *name, uint64_t data) " (%s) data 0x%"PRIx64
>   vfio_v1_load_state_device_data(const char *name, uint64_t data_offset, uint64_t data_size) " (%s) Offset 0x%"PRIx64" size 0x%"PRIx64
> +vfio_load_state_device_data(const char *name, uint64_t data_size, int ret) " (%s) size 0x%"PRIx64" ret %d"
>   vfio_load_cleanup(const char *name) " (%s)"
>   vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start) "container fd=%d, iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64
>   vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu dirty @ 0x%"PRIx64" - 0x%"PRIx64
> +vfio_save_block(const char *name, int data_size) " (%s) data_size %d"



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v6 13/13] docs/devel: Align VFIO migration docs to v2 protocol
  2023-01-12  8:50 ` [PATCH v6 13/13] docs/devel: Align VFIO migration docs to v2 protocol Avihai Horon
@ 2023-01-13 17:12   ` Cédric Le Goater
  0 siblings, 0 replies; 22+ messages in thread
From: Cédric Le Goater @ 2023-01-13 17:12 UTC (permalink / raw)
  To: Avihai Horon, qemu-devel
  Cc: Alex Williamson, Halil Pasic, Christian Borntraeger, Eric Farman,
	Richard Henderson, David Hildenbrand, Ilya Leoshkevich,
	Thomas Huth, Juan Quintela, Dr. David Alan Gilbert,
	Michael S. Tsirkin, Cornelia Huck, Paolo Bonzini,
	Stefan Hajnoczi, Fam Zheng, Eric Blake,
	Vladimir Sementsov-Ogievskiy, John Snow, qemu-s390x, qemu-block,
	Yishai Hadas, Jason Gunthorpe, Maor Gottlieb, Kirti Wankhede,
	Tarun Gupta, Joao Martins

On 1/12/23 09:50, Avihai Horon wrote:
> Now that VFIO migration protocol v2 has been implemented and v1 protocol
> has been removed, update the documentation according to v2 protocol.
> 
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>


Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.



> ---
>   docs/devel/vfio-migration.rst | 68 ++++++++++++++++-------------------
>   1 file changed, 30 insertions(+), 38 deletions(-)
> 
> diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
> index 9ff6163c88..1d50c2fe5f 100644
> --- a/docs/devel/vfio-migration.rst
> +++ b/docs/devel/vfio-migration.rst
> @@ -7,46 +7,39 @@ the guest is running on source host and restoring this saved state on the
>   destination host. This document details how saving and restoring of VFIO
>   devices is done in QEMU.
>   
> -Migration of VFIO devices consists of two phases: the optional pre-copy phase,
> -and the stop-and-copy phase. The pre-copy phase is iterative and allows to
> -accommodate VFIO devices that have a large amount of data that needs to be
> -transferred. The iterative pre-copy phase of migration allows for the guest to
> -continue whilst the VFIO device state is transferred to the destination, this
> -helps to reduce the total downtime of the VM. VFIO devices can choose to skip
> -the pre-copy phase of migration by returning pending_bytes as zero during the
> -pre-copy phase.
> +Migration of VFIO devices currently consists of a single stop-and-copy phase.
> +During the stop-and-copy phase the guest is stopped and the entire VFIO device
> +data is transferred to the destination.
> +
> +The pre-copy phase of migration is currently not supported for VFIO devices.
> +Support for VFIO pre-copy will be added later on.
>   
>   A detailed description of the UAPI for VFIO device migration can be found in
> -the comment for the ``vfio_device_migration_info`` structure in the header
> -file linux-headers/linux/vfio.h.
> +the comment for the ``vfio_device_mig_state`` structure in the header file
> +linux-headers/linux/vfio.h.
>   
>   VFIO implements the device hooks for the iterative approach as follows:
>   
> -* A ``save_setup`` function that sets up the migration region and sets _SAVING
> -  flag in the VFIO device state.
> +* A ``save_setup`` function that sets up migration on the source.
>   
> -* A ``load_setup`` function that sets up the migration region on the
> -  destination and sets _RESUMING flag in the VFIO device state.
> +* A ``load_setup`` function that sets the VFIO device on the destination in
> +  _RESUMING state.
>   
>   * A ``save_live_pending`` function that reads pending_bytes from the vendor
>     driver, which indicates the amount of data that the vendor driver has yet to
>     save for the VFIO device.
>   
> -* A ``save_live_iterate`` function that reads the VFIO device's data from the
> -  vendor driver through the migration region during iterative phase.
> -
>   * A ``save_state`` function to save the device config space if it is present.
>   
> -* A ``save_live_complete_precopy`` function that resets _RUNNING flag from the
> -  VFIO device state and iteratively copies the remaining data for the VFIO
> -  device until the vendor driver indicates that no data remains (pending bytes
> -  is zero).
> +* A ``save_live_complete_precopy`` function that sets the VFIO device in
> +  _STOP_COPY state and iteratively copies the data for the VFIO device until
> +  the vendor driver indicates that no data remains.
>   
>   * A ``load_state`` function that loads the config section and the data
> -  sections that are generated by the save functions above
> +  sections that are generated by the save functions above.
>   
>   * ``cleanup`` functions for both save and load that perform any migration
> -  related cleanup, including unmapping the migration region
> +  related cleanup.
>   
>   
>   The VFIO migration code uses a VM state change handler to change the VFIO
> @@ -71,13 +64,13 @@ tracking can identify dirtied pages, but any page pinned by the vendor driver
>   can also be written by the device. There is currently no device or IOMMU
>   support for dirty page tracking in hardware.
>   
> -By default, dirty pages are tracked when the device is in pre-copy as well as
> -stop-and-copy phase. So, a page pinned by the vendor driver will be copied to
> -the destination in both phases. Copying dirty pages in pre-copy phase helps
> -QEMU to predict if it can achieve its downtime tolerances. If QEMU during
> -pre-copy phase keeps finding dirty pages continuously, then it understands
> -that even in stop-and-copy phase, it is likely to find dirty pages and can
> -predict the downtime accordingly.
> +By default, dirty pages are tracked during pre-copy as well as stop-and-copy
> +phase. So, a page pinned by the vendor driver will be copied to the destination
> +in both phases. Copying dirty pages in pre-copy phase helps QEMU to predict if
> +it can achieve its downtime tolerances. If QEMU during pre-copy phase keeps
> +finding dirty pages continuously, then it understands that even in stop-and-copy
> +phase, it is likely to find dirty pages and can predict the downtime
> +accordingly.
>   
>   QEMU also provides a per device opt-out option ``pre-copy-dirty-page-tracking``
>   which disables querying the dirty bitmap during pre-copy phase. If it is set to
> @@ -111,23 +104,22 @@ Live migration save path
>                                     |
>                        migrate_init spawns migration_thread
>                   Migration thread then calls each device's .save_setup()
> -                    (RUNNING, _SETUP, _RUNNING|_SAVING)
> +                       (RUNNING, _SETUP, _RUNNING)
>                                     |
> -                    (RUNNING, _ACTIVE, _RUNNING|_SAVING)
> +                      (RUNNING, _ACTIVE, _RUNNING)
>                If device is active, get pending_bytes by .save_live_pending()
>             If total pending_bytes >= threshold_size, call .save_live_iterate()
> -                  Data of VFIO device for pre-copy phase is copied
>           Iterate till total pending bytes converge and are less than threshold
>                                     |
>     On migration completion, vCPU stops and calls .save_live_complete_precopy for
> -   each active device. The VFIO device is then transitioned into _SAVING state
> -                   (FINISH_MIGRATE, _DEVICE, _SAVING)
> +  each active device. The VFIO device is then transitioned into _STOP_COPY state
> +                  (FINISH_MIGRATE, _DEVICE, _STOP_COPY)
>                                     |
>        For the VFIO device, iterate in .save_live_complete_precopy until
>                            pending data is 0
> -                   (FINISH_MIGRATE, _DEVICE, _STOPPED)
> +                   (FINISH_MIGRATE, _DEVICE, _STOP)
>                                     |
> -                 (FINISH_MIGRATE, _COMPLETED, _STOPPED)
> +                 (FINISH_MIGRATE, _COMPLETED, _STOP)
>                Migraton thread schedules cleanup bottom half and exits
>   
>   Live migration resume path
> @@ -136,7 +128,7 @@ Live migration resume path
>   ::
>   
>                 Incoming migration calls .load_setup for each device
> -                       (RESTORE_VM, _ACTIVE, _STOPPED)
> +                       (RESTORE_VM, _ACTIVE, _STOP)
>                                    |
>          For each device, .load_state is called for that device section data
>                          (RESTORE_VM, _ACTIVE, _RESUMING)



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v6 09/13] vfio/migration: Implement VFIO migration protocol v2
  2023-01-13 16:13   ` Cédric Le Goater
@ 2023-01-15 18:42     ` Avihai Horon
  0 siblings, 0 replies; 22+ messages in thread
From: Avihai Horon @ 2023-01-15 18:42 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: Alex Williamson, Halil Pasic, Christian Borntraeger, Eric Farman,
	Richard Henderson, David Hildenbrand, Ilya Leoshkevich,
	Thomas Huth, Juan Quintela, Dr. David Alan Gilbert,
	Michael S. Tsirkin, Cornelia Huck, Paolo Bonzini,
	Stefan Hajnoczi, Fam Zheng, Eric Blake,
	Vladimir Sementsov-Ogievskiy, John Snow, qemu-s390x, qemu-block,
	Yishai Hadas, Jason Gunthorpe, Maor Gottlieb, Kirti Wankhede,
	Tarun Gupta, Joao Martins


On 13/01/2023 18:13, Cédric Le Goater wrote:
> External email: Use caution opening links or attachments
>
>
> On 1/12/23 09:50, Avihai Horon wrote:
>> Implement the basic mandatory part of VFIO migration protocol v2.
>> This includes all functionality that is necessary to support
>> VFIO_MIGRATION_STOP_COPY part of the v2 protocol.
>>
>> The two protocols, v1 and v2, will co-exist and in the following patches
>> v1 protocol code will be removed.
>>
>> There are several main differences between v1 and v2 protocols:
>> - VFIO device state is now represented as a finite state machine instead
>>    of a bitmap.
>>
>> - Migration interface with kernel is now done using VFIO_DEVICE_FEATURE
>>    ioctl and normal read() and write() instead of the migration region.
>>
>> - Pre-copy is made optional in v2 protocol. Support for pre-copy will be
>>    added later on.
>>
>> Detailed information about VFIO migration protocol v2 and its difference
>> compared to v1 protocol can be found here [1].
>>
>> [1]
>> https://lore.kernel.org/all/20220224142024.147653-10-yishaih@nvidia.com/
>>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>
>
> LGTM,
>
> Reviewed-by: Cédric Le Goater <clg@redhat.com>
>
> Still a small issue below,
>
I have sent v7 with a fix for this.

Thanks for reviewing!

>> ---
>>   include/hw/vfio/vfio-common.h |   5 +
>>   hw/vfio/common.c              |  19 +-
>>   hw/vfio/migration.c           | 455 +++++++++++++++++++++++++++++++---
>>   hw/vfio/trace-events          |   7 +
>>   4 files changed, 447 insertions(+), 39 deletions(-)
>>
>> diff --git a/include/hw/vfio/vfio-common.h 
>> b/include/hw/vfio/vfio-common.h
>> index bbaf72ba00..2ec3346fea 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -66,6 +66,11 @@ typedef struct VFIOMigration {
>>       int vm_running;
>>       Notifier migration_state;
>>       uint64_t pending_bytes;
>> +    enum vfio_device_mig_state device_state;
>> +    int data_fd;
>> +    void *data_buffer;
>> +    size_t data_buffer_size;
>> +    bool v2;
>>   } VFIOMigration;
>>
>>   typedef struct VFIOAddressSpace {
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 550b2d7ded..dcaa77d2a8 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -355,10 +355,18 @@ static bool 
>> vfio_devices_all_dirty_tracking(VFIOContainer *container)
>>                   return false;
>>               }
>>
>> -            if ((vbasedev->pre_copy_dirty_page_tracking == 
>> ON_OFF_AUTO_OFF) &&
>> +            if (!migration->v2 &&
>> +                (vbasedev->pre_copy_dirty_page_tracking == 
>> ON_OFF_AUTO_OFF) &&
>>                   (migration->device_state_v1 & 
>> VFIO_DEVICE_STATE_V1_RUNNING)) {
>>                   return false;
>>               }
>> +
>> +            if (migration->v2 &&
>> +                (vbasedev->pre_copy_dirty_page_tracking == 
>> ON_OFF_AUTO_OFF) &&
>> +                (migration->device_state == 
>> VFIO_DEVICE_STATE_RUNNING ||
>> +                 migration->device_state == 
>> VFIO_DEVICE_STATE_RUNNING_P2P)) {
>> +                return false;
>> +            }
>>           }
>>       }
>>       return true;
>> @@ -385,7 +393,14 @@ static bool 
>> vfio_devices_all_running_and_mig_active(VFIOContainer *container)
>>                   return false;
>>               }
>>
>> -            if (migration->device_state_v1 & 
>> VFIO_DEVICE_STATE_V1_RUNNING) {
>> +            if (!migration->v2 &&
>> +                migration->device_state_v1 & 
>> VFIO_DEVICE_STATE_V1_RUNNING) {
>> +                continue;
>> +            }
>> +
>> +            if (migration->v2 &&
>> +                (migration->device_state == 
>> VFIO_DEVICE_STATE_RUNNING ||
>> +                 migration->device_state == 
>> VFIO_DEVICE_STATE_RUNNING_P2P)) {
>>                   continue;
>>               } else {
>>                   return false;
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index 9df859f4d3..08f53189fa 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -10,6 +10,7 @@
>>   #include "qemu/osdep.h"
>>   #include "qemu/main-loop.h"
>>   #include "qemu/cutils.h"
>> +#include "qemu/units.h"
>>   #include <linux/vfio.h>
>>   #include <sys/ioctl.h>
>>
>> @@ -44,8 +45,103 @@
>>   #define VFIO_MIG_FLAG_DEV_SETUP_STATE (0xffffffffef100003ULL)
>>   #define VFIO_MIG_FLAG_DEV_DATA_STATE (0xffffffffef100004ULL)
>>
>> +/*
>> + * This is an arbitrary size based on migration of mlx5 devices, 
>> where typically
>> + * total device migration size is on the order of 100s of MB. 
>> Testing with
>> + * larger values, e.g. 128MB and 1GB, did not show a performance 
>> improvement.
>> + */
>> +#define VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE (1 * MiB)
>> +
>>   static int64_t bytes_transferred;
>>
>> +static const char *mig_state_to_str(enum vfio_device_mig_state state)
>> +{
>> +    switch (state) {
>> +    case VFIO_DEVICE_STATE_ERROR:
>> +        return "ERROR";
>> +    case VFIO_DEVICE_STATE_STOP:
>> +        return "STOP";
>> +    case VFIO_DEVICE_STATE_RUNNING:
>> +        return "RUNNING";
>> +    case VFIO_DEVICE_STATE_STOP_COPY:
>> +        return "STOP_COPY";
>> +    case VFIO_DEVICE_STATE_RESUMING:
>> +        return "RESUMING";
>> +    case VFIO_DEVICE_STATE_RUNNING_P2P:
>> +        return "RUNNING_P2P";
>> +    default:
>> +        return "UNKNOWN STATE";
>> +    }
>> +}
>> +
>> +static int vfio_migration_set_state(VFIODevice *vbasedev,
>> +                                    enum vfio_device_mig_state 
>> new_state,
>> +                                    enum vfio_device_mig_state 
>> recover_state)
>> +{
>> +    VFIOMigration *migration = vbasedev->migration;
>> +    uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
>> +                              sizeof(struct 
>> vfio_device_feature_mig_state),
>> +                              sizeof(uint64_t))] = {};
>> +    struct vfio_device_feature *feature = (struct 
>> vfio_device_feature *)buf;
>> +    struct vfio_device_feature_mig_state *mig_state =
>> +        (struct vfio_device_feature_mig_state *)feature->data;
>> +
>> +    feature->argsz = sizeof(buf);
>> +    feature->flags =
>> +        VFIO_DEVICE_FEATURE_SET | VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE;
>> +    mig_state->device_state = new_state;
>> +    if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
>> +        /* Try to set the device in some good state */
>> +        int err = -errno;
>> +
>> +        error_report(
>> +            "%s: Failed setting device state to %s, err: %s. Setting 
>> device in recover state %s",
>> +                     vbasedev->name, mig_state_to_str(new_state),
>> +                     strerror(errno), mig_state_to_str(recover_state));
>> +
>> +        mig_state->device_state = recover_state;
>> +        if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
>> +            err = -errno;
>> +            error_report(
>> +                "%s: Failed setting device in recover state, err: 
>> %s. Resetting device",
>> +                         vbasedev->name, strerror(errno));
>> +
>> +            if (ioctl(vbasedev->fd, VFIO_DEVICE_RESET)) {
>> +                hw_error("%s: Failed resetting device, err: %s", 
>> vbasedev->name,
>> +                         strerror(errno));
>> +            }
>> +
>> +            migration->device_state = VFIO_DEVICE_STATE_RUNNING;
>> +
>> +            return err;
>> +        }
>> +
>> +        migration->device_state = recover_state;
>> +
>> +        return err;
>> +    }
>> +
>> +    migration->device_state = new_state;
>> +    if (mig_state->data_fd != -1) {
>> +        if (migration->data_fd != -1) {
>> +            /*
>> +             * This can happen if the device is asynchronously reset 
>> and
>> +             * terminates a data transfer.
>> +             */
>> +            error_report("%s: data_fd out of sync", vbasedev->name);
>> +            close(mig_state->data_fd);
>> +
>> +            return -EBADF;
>> +        }
>> +
>> +        migration->data_fd = mig_state->data_fd;
>> +    }
>> +
>> +    trace_vfio_migration_set_state(vbasedev->name, 
>> mig_state_to_str(new_state));
>> +
>> +    return 0;
>> +}
>> +
>>   static inline int vfio_mig_access(VFIODevice *vbasedev, void *val, 
>> int count,
>>                                     off_t off, bool iswrite)
>>   {
>> @@ -260,6 +356,18 @@ static int vfio_save_buffer(QEMUFile *f, 
>> VFIODevice *vbasedev, uint64_t *size)
>>       return ret;
>>   }
>>
>> +static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
>> +                            uint64_t data_size)
>> +{
>> +    VFIOMigration *migration = vbasedev->migration;
>> +    int ret;
>> +
>> +    ret = qemu_file_get_to_fd(f, migration->data_fd, data_size);
>> +    trace_vfio_load_state_device_data(vbasedev->name, data_size, ret);
>> +
>> +    return ret;
>> +}
>> +
>>   static int vfio_v1_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
>>                                  uint64_t data_size)
>>   {
>> @@ -394,6 +502,14 @@ static int 
>> vfio_load_device_config_state(QEMUFile *f, void *opaque)
>>       return qemu_file_get_error(f);
>>   }
>>
>> +static void vfio_migration_cleanup(VFIODevice *vbasedev)
>> +{
>> +    VFIOMigration *migration = vbasedev->migration;
>> +
>> +    close(migration->data_fd);
>> +    migration->data_fd = -1;
>> +}
>> +
>>   static void vfio_migration_v1_cleanup(VFIODevice *vbasedev)
>>   {
>>       VFIOMigration *migration = vbasedev->migration;
>> @@ -403,8 +519,80 @@ static void vfio_migration_v1_cleanup(VFIODevice 
>> *vbasedev)
>>       }
>>   }
>>
>> +static int vfio_query_stop_copy_size(VFIODevice *vbasedev,
>> +                                     uint64_t *stop_copy_size)
>> +{
>> +    uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
>> +                              sizeof(struct 
>> vfio_device_feature_mig_data_size),
>> +                              sizeof(uint64_t))] = {};
>> +    struct vfio_device_feature *feature = (struct 
>> vfio_device_feature *)buf;
>> +    struct vfio_device_feature_mig_data_size *mig_data_size =
>> +        (struct vfio_device_feature_mig_data_size *)feature->data;
>> +
>> +    feature->argsz = sizeof(buf);
>> +    feature->flags =
>> +        VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_MIG_DATA_SIZE;
>> +
>> +    if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
>> +        return -errno;
>> +    }
>> +
>> +    *stop_copy_size = mig_data_size->stop_copy_length;
>> +
>> +    return 0;
>> +}
>> +
>> +/* Returns 1 if end-of-stream is reached, 0 if more data and -1 if 
>> error */
>> +static int vfio_save_block(QEMUFile *f, VFIOMigration *migration)
>> +{
>> +    ssize_t data_size;
>> +
>> +    data_size = read(migration->data_fd, migration->data_buffer,
>> +                     migration->data_buffer_size);
>> +    if (data_size < 0) {
>> +        return -errno;
>> +    }
>> +    if (data_size == 0) {
>> +        return 1;
>> +    }
>> +
>> +    qemu_put_be64(f, VFIO_MIG_FLAG_DEV_DATA_STATE);
>> +    qemu_put_be64(f, data_size);
>> +    qemu_put_buffer(f, migration->data_buffer, data_size);
>> +    bytes_transferred += data_size;
>> +
>> +    trace_vfio_save_block(migration->vbasedev->name, data_size);
>> +
>> +    return qemu_file_get_error(f);
>> +}
>> +
>>   /* 
>> ---------------------------------------------------------------------- 
>> */
>>
>> +static int vfio_save_setup(QEMUFile *f, void *opaque)
>> +{
>> +    VFIODevice *vbasedev = opaque;
>> +    VFIOMigration *migration = vbasedev->migration;
>> +    uint64_t stop_copy_size = VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE;
>> +
>> +    qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE);
>> +
>> +    vfio_query_stop_copy_size(vbasedev, &stop_copy_size);
>> +    migration->data_buffer_size = 
>> MIN(VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE,
>> +                                      stop_copy_size);
>> +    migration->data_buffer = 
>> g_try_malloc0(migration->data_buffer_size);
>> +    if (!migration->data_buffer) {
>> +        error_report("%s: Failed to allocate migration data buffer",
>> +                     vbasedev->name);
>> +        return -ENOMEM;
>> +    }
>> +
>> +    trace_vfio_save_setup(vbasedev->name, migration->data_buffer_size);
>> +
>> +    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
>> +
>> +    return qemu_file_get_error(f);
>> +}
>> +
>>   static int vfio_v1_save_setup(QEMUFile *f, void *opaque)
>>   {
>>       VFIODevice *vbasedev = opaque;
>> @@ -448,6 +636,17 @@ static int vfio_v1_save_setup(QEMUFile *f, void 
>> *opaque)
>>       return 0;
>>   }
>>
>> +static void vfio_save_cleanup(void *opaque)
>> +{
>> +    VFIODevice *vbasedev = opaque;
>> +    VFIOMigration *migration = vbasedev->migration;
>> +
>> +    g_free(migration->data_buffer);
>> +    migration->data_buffer = NULL;
>> +    vfio_migration_cleanup(vbasedev);
>> +    trace_vfio_save_cleanup(vbasedev->name);
>> +}
>> +
>>   static void vfio_v1_save_cleanup(void *opaque)
>>   {
>>       VFIODevice *vbasedev = opaque;
>> @@ -456,6 +655,31 @@ static void vfio_v1_save_cleanup(void *opaque)
>>       trace_vfio_save_cleanup(vbasedev->name);
>>   }
>>
>> +/*
>> + * Migration size of VFIO devices can be as little as a few KBs or 
>> as big as
>> + * many GBs. This value should be big enough to cover the worst case.
>> + */
>> +#define VFIO_MIG_STOP_COPY_SIZE (100 * GiB)
>> +static void vfio_save_pending(void *opaque, uint64_t threshold_size,
>> +                              uint64_t *res_precopy_only,
>> +                              uint64_t *res_compatible,
>> +                              uint64_t *res_postcopy_only)
>> +{
>> +    VFIODevice *vbasedev = opaque;
>> +    uint64_t stop_copy_size = VFIO_MIG_STOP_COPY_SIZE;
>> +
>> +    /*
>> +     * If getting pending migration size fails, 
>> VFIO_MIG_STOP_COPY_SIZE is
>> +     * reported so downtime limit won't be violated.
>> +     */
>> +    vfio_query_stop_copy_size(vbasedev, &stop_copy_size);
>> +    *res_precopy_only += stop_copy_size;
>> +
>> +    trace_vfio_save_pending(vbasedev->name, *res_precopy_only,
>> +                            *res_postcopy_only, *res_compatible,
>> +                            stop_copy_size);
>> +}
>> +
>>   static void vfio_v1_save_pending(void *opaque, uint64_t 
>> threshold_size,
>>                                    uint64_t *res_precopy_only,
>>                                    uint64_t *res_compatible,
>> @@ -523,6 +747,41 @@ static int vfio_save_iterate(QEMUFile *f, void 
>> *opaque)
>>       return 0;
>>   }
>>
>> +static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
>> +{
>> +    VFIODevice *vbasedev = opaque;
>> +    enum vfio_device_mig_state recover_state;
>> +    int ret;
>> +
>> +    /* We reach here with device state STOP only */
>> +    recover_state = VFIO_DEVICE_STATE_STOP;
>> +    ret = vfio_migration_set_state(vbasedev, 
>> VFIO_DEVICE_STATE_STOP_COPY,
>> +                                   recover_state);
>> +    if (ret) {
>> +        return ret;
>> +    }
>> +
>> +    do {
>> +        ret = vfio_save_block(f, vbasedev->migration);
>> +        if (ret < 0) {
>> +            return ret;
>> +        }
>> +    } while (!ret);
>> +
>> +    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
>> +    ret = qemu_file_get_error(f);
>> +    if (ret) {
>> +        return ret;
>> +    }
>> +
>> +    recover_state = VFIO_DEVICE_STATE_ERROR;
>> +    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
>> +                                   recover_state);
>> +    trace_vfio_save_complete_precopy(vbasedev->name, ret);
>> +
>> +    return ret;
>> +}
>> +
>>   static int vfio_v1_save_complete_precopy(QEMUFile *f, void *opaque)
>>   {
>>       VFIODevice *vbasedev = opaque;
>> @@ -592,6 +851,14 @@ static void vfio_save_state(QEMUFile *f, void 
>> *opaque)
>>       }
>>   }
>>
>> +static int vfio_load_setup(QEMUFile *f, void *opaque)
>> +{
>> +    VFIODevice *vbasedev = opaque;
>> +
>> +    return vfio_migration_set_state(vbasedev, 
>> VFIO_DEVICE_STATE_RESUMING,
>> + vbasedev->migration->device_state);
>> +}
>> +
>>   static int vfio_v1_load_setup(QEMUFile *f, void *opaque)
>>   {
>>       VFIODevice *vbasedev = opaque;
>> @@ -619,6 +886,16 @@ static int vfio_v1_load_setup(QEMUFile *f, void 
>> *opaque)
>>       return ret;
>>   }
>>
>> +static int vfio_load_cleanup(void *opaque)
>> +{
>> +    VFIODevice *vbasedev = opaque;
>> +
>> +    vfio_migration_cleanup(vbasedev);
>> +    trace_vfio_load_cleanup(vbasedev->name);
>> +
>> +    return 0;
>> +}
>> +
>>   static int vfio_v1_load_cleanup(void *opaque)
>>   {
>>       VFIODevice *vbasedev = opaque;
>> @@ -661,7 +938,11 @@ static int vfio_load_state(QEMUFile *f, void 
>> *opaque, int version_id)
>>               uint64_t data_size = qemu_get_be64(f);
>>
>>               if (data_size) {
>> -                ret = vfio_v1_load_buffer(f, vbasedev, data_size);
>> +                if (vbasedev->migration->v2) {
>> +                    ret = vfio_load_buffer(f, vbasedev, data_size);
>> +                } else {
>> +                    ret = vfio_v1_load_buffer(f, vbasedev, data_size);
>> +                }
>>                   if (ret < 0) {
>>                       return ret;
>>                   }
>> @@ -682,6 +963,17 @@ static int vfio_load_state(QEMUFile *f, void 
>> *opaque, int version_id)
>>       return ret;
>>   }
>>
>> +static const SaveVMHandlers savevm_vfio_handlers = {
>> +    .save_setup = vfio_save_setup,
>> +    .save_cleanup = vfio_save_cleanup,
>> +    .save_live_pending = vfio_save_pending,
>> +    .save_live_complete_precopy = vfio_save_complete_precopy,
>> +    .save_state = vfio_save_state,
>> +    .load_setup = vfio_load_setup,
>> +    .load_cleanup = vfio_load_cleanup,
>> +    .load_state = vfio_load_state,
>> +};
>> +
>>   static SaveVMHandlers savevm_vfio_v1_handlers = {
>>       .save_setup = vfio_v1_save_setup,
>>       .save_cleanup = vfio_v1_save_cleanup,
>> @@ -696,6 +988,34 @@ static SaveVMHandlers savevm_vfio_v1_handlers = {
>>
>>   /* 
>> ---------------------------------------------------------------------- 
>> */
>>
>> +static void vfio_vmstate_change(void *opaque, bool running, RunState 
>> state)
>> +{
>> +    VFIODevice *vbasedev = opaque;
>> +    enum vfio_device_mig_state new_state;
>> +    int ret;
>> +
>> +    if (running) {
>> +        new_state = VFIO_DEVICE_STATE_RUNNING;
>> +    } else {
>> +        new_state = VFIO_DEVICE_STATE_STOP;
>> +    }
>> +
>> +    ret = vfio_migration_set_state(vbasedev, new_state,
>> +                                   VFIO_DEVICE_STATE_ERROR);
>> +    if (ret) {
>> +        /*
>> +         * Migration should be aborted in this case, but 
>> vm_state_notify()
>> +         * currently does not support reporting failures.
>> +         */
>> +        if (migrate_get_current()->to_dst_file) {
>> + qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
>> +        }
>> +    }
>> +
>> +    trace_vfio_vmstate_change(vbasedev->name, running, 
>> RunState_str(state),
>> +                              mig_state_to_str(new_state));
>> +}
>> +
>>   static void vfio_v1_vmstate_change(void *opaque, bool running, 
>> RunState state)
>>   {
>>       VFIODevice *vbasedev = opaque;
>> @@ -769,12 +1089,17 @@ static void 
>> vfio_migration_state_notifier(Notifier *notifier, void *data)
>>       case MIGRATION_STATUS_CANCELLED:
>>       case MIGRATION_STATUS_FAILED:
>>           bytes_transferred = 0;
>> -        ret = vfio_migration_v1_set_state(vbasedev,
>> - ~(VFIO_DEVICE_STATE_V1_SAVING |
>> - VFIO_DEVICE_STATE_V1_RESUMING),
>> - VFIO_DEVICE_STATE_V1_RUNNING);
>> -        if (ret) {
>> -            error_report("%s: Failed to set state RUNNING", 
>> vbasedev->name);
>> +        if (migration->v2) {
>> +            vfio_migration_set_state(vbasedev, 
>> VFIO_DEVICE_STATE_RUNNING,
>> +                                     VFIO_DEVICE_STATE_ERROR);
>> +        } else {
>> +            ret = vfio_migration_v1_set_state(vbasedev,
>> + ~(VFIO_DEVICE_STATE_V1_SAVING |
>> + VFIO_DEVICE_STATE_V1_RESUMING),
>> + VFIO_DEVICE_STATE_V1_RUNNING);
>> +            if (ret) {
>> +                error_report("%s: Failed to set state RUNNING", 
>> vbasedev->name);
>> +            }
>>           }
>>       }
>>   }
>> @@ -783,12 +1108,42 @@ static void vfio_migration_exit(VFIODevice 
>> *vbasedev)
>>   {
>>       VFIOMigration *migration = vbasedev->migration;
>>
>> -    vfio_region_exit(&migration->region);
>> -    vfio_region_finalize(&migration->region);
>> +    if (!migration->v2) {
>> +        vfio_region_exit(&migration->region);
>> +        vfio_region_finalize(&migration->region);
>> +    }
>>       g_free(vbasedev->migration);
>>       vbasedev->migration = NULL;
>>   }
>>
>> +static int vfio_migration_query_flags(VFIODevice *vbasedev, uint64_t 
>> *mig_flags)
>> +{
>> +    uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
>> +                                  sizeof(struct 
>> vfio_device_feature_migration),
>> +                              sizeof(uint64_t))] = {};
>> +    struct vfio_device_feature *feature = (struct 
>> vfio_device_feature *)buf;
>> +    struct vfio_device_feature_migration *mig =
>> +        (struct vfio_device_feature_migration *)feature->data;
>> +
>> +    feature->argsz = sizeof(buf);
>> +    feature->flags = VFIO_DEVICE_FEATURE_GET | 
>> VFIO_DEVICE_FEATURE_MIGRATION;
>> +    if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
>> +        if (errno == ENOTTY) {
>> +            error_report("%s: VFIO migration is not supported in 
>> kernel",
>> +                         vbasedev->name);
>> +        } else {
>> +            error_report("%s: Failed to query VFIO migration 
>> support, err: %s",
>> +                         vbasedev->name, strerror(errno));
>> +        }
>> +
>> +        return -errno;
>> +    }
>> +
>> +    *mig_flags = mig->flags;
>> +
>> +    return 0;
>> +}
>> +
>>   static int vfio_migration_init(VFIODevice *vbasedev)
>>   {
>>       int ret;
>> @@ -797,6 +1152,7 @@ static int vfio_migration_init(VFIODevice 
>> *vbasedev)
>>       char id[256] = "";
>>       g_autofree char *path = NULL, *oid = NULL;
>>       struct vfio_region_info *info;
>> +    uint64_t mig_flags;
>
> GCC 12 reports :
>
>   ../hw/vfio/migration.c: In function ‘vfio_migration_init’:
>   ../hw/vfio/migration.c:1170:25: error: ‘mig_flags’ may be used 
> uninitialized [-Werror=maybe-uninitialized]
>    1170 |         if (!(mig_flags & VFIO_MIGRATION_STOP_COPY)) {
>         |              ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
>   ../hw/vfio/migration.c:1155:14: note: ‘mig_flags’ was declared here
>    1155 |     uint64_t mig_flags;
>         |              ^~~~~~~~~
>   cc1: all warnings being treated as errors
>
> You will need a little fix for that. May be it can be done at merge 
> time :
>
> -    uint64_t mig_flags;
> +    uint64_t mig_flags = 0;
>
> Thanks,
>
> C.
>
>>
>>       if (!vbasedev->ops->vfio_get_object) {
>>           return -EINVAL;
>> @@ -807,34 +1163,50 @@ static int vfio_migration_init(VFIODevice 
>> *vbasedev)
>>           return -EINVAL;
>>       }
>>
>> -    ret = vfio_get_dev_region_info(vbasedev,
>> - VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
>> - VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
>> -                                   &info);
>> -    if (ret) {
>> -        return ret;
>> -    }
>> +    ret = vfio_migration_query_flags(vbasedev, &mig_flags);
>> +    if (!ret) {
>> +        /* Migration v2 */
>> +        /* Basic migration functionality must be supported */
>> +        if (!(mig_flags & VFIO_MIGRATION_STOP_COPY)) {
>> +            return -EOPNOTSUPP;
>> +        }
>> +        vbasedev->migration = g_new0(VFIOMigration, 1);
>> +        vbasedev->migration->device_state = VFIO_DEVICE_STATE_RUNNING;
>> +        vbasedev->migration->data_fd = -1;
>> +        vbasedev->migration->v2 = true;
>> +    } else if (ret == -ENOTTY) {
>> +        /* Migration v1 */
>> +        ret = vfio_get_dev_region_info(vbasedev,
>> + VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
>> + VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
>> +                                       &info);
>> +        if (ret) {
>> +            return ret;
>> +        }
>>
>> -    vbasedev->migration = g_new0(VFIOMigration, 1);
>> -    vbasedev->migration->device_state_v1 = 
>> VFIO_DEVICE_STATE_V1_RUNNING;
>> -    vbasedev->migration->vm_running = runstate_is_running();
>> +        vbasedev->migration = g_new0(VFIOMigration, 1);
>> +        vbasedev->migration->device_state_v1 = 
>> VFIO_DEVICE_STATE_V1_RUNNING;
>> +        vbasedev->migration->vm_running = runstate_is_running();
>>
>> -    ret = vfio_region_setup(obj, vbasedev, 
>> &vbasedev->migration->region,
>> -                            info->index, "migration");
>> -    if (ret) {
>> -        error_report("%s: Failed to setup VFIO migration region %d: 
>> %s",
>> -                     vbasedev->name, info->index, strerror(-ret));
>> -        goto err;
>> -    }
>> +        ret = vfio_region_setup(obj, vbasedev, 
>> &vbasedev->migration->region,
>> +                                info->index, "migration");
>> +        if (ret) {
>> +            error_report("%s: Failed to setup VFIO migration region 
>> %d: %s",
>> +                         vbasedev->name, info->index, strerror(-ret));
>> +            goto err;
>> +        }
>>
>> -    if (!vbasedev->migration->region.size) {
>> -        error_report("%s: Invalid zero-sized VFIO migration region %d",
>> -                     vbasedev->name, info->index);
>> -        ret = -EINVAL;
>> -        goto err;
>> -    }
>> +        if (!vbasedev->migration->region.size) {
>> +            error_report("%s: Invalid zero-sized VFIO migration 
>> region %d",
>> +                         vbasedev->name, info->index);
>> +            ret = -EINVAL;
>> +            goto err;
>> +        }
>>
>> -    g_free(info);
>> +        g_free(info);
>> +    } else {
>> +        return ret;
>> +    }
>>
>>       migration = vbasedev->migration;
>>       migration->vbasedev = vbasedev;
>> @@ -847,11 +1219,20 @@ static int vfio_migration_init(VFIODevice 
>> *vbasedev)
>>       }
>>       strpadcpy(id, sizeof(id), path, '\0');
>>
>> -    register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
>> -                         &savevm_vfio_v1_handlers, vbasedev);
>> +    if (migration->v2) {
>> +        register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
>> +                             &savevm_vfio_handlers, vbasedev);
>> +
>> +        migration->vm_state = qdev_add_vm_change_state_handler(
>> +            vbasedev->dev, vfio_vmstate_change, vbasedev);
>> +    } else {
>> +        register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
>> +                             &savevm_vfio_v1_handlers, vbasedev);
>> +
>> +        migration->vm_state = qdev_add_vm_change_state_handler(
>> +            vbasedev->dev, vfio_v1_vmstate_change, vbasedev);
>> +    }
>>
>> -    migration->vm_state = qdev_add_vm_change_state_handler(
>> -        vbasedev->dev, vfio_v1_vmstate_change, vbasedev);
>>       migration->migration_state.notify = vfio_migration_state_notifier;
>> add_migration_state_change_notifier(&migration->migration_state);
>>       return 0;
>> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
>> index 9eb9aff295..e7b3b4658c 100644
>> --- a/hw/vfio/trace-events
>> +++ b/hw/vfio/trace-events
>> @@ -149,20 +149,27 @@ vfio_display_edid_write_error(void) ""
>>
>>   # migration.c
>>   vfio_migration_probe(const char *name) " (%s)"
>> +vfio_migration_set_state(const char *name, const char *state) " (%s) 
>> state %s"
>>   vfio_migration_v1_set_state(const char *name, uint32_t state) " 
>> (%s) state %d"
>> +vfio_vmstate_change(const char *name, int running, const char 
>> *reason, const char *dev_state) " (%s) running %d reason %s device 
>> state %s"
>>   vfio_v1_vmstate_change(const char *name, int running, const char 
>> *reason, uint32_t dev_state) " (%s) running %d reason %s device state 
>> %d"
>>   vfio_migration_state_notifier(const char *name, const char *state) 
>> " (%s) state %s"
>> +vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) 
>> data buffer size 0x%"PRIx64
>>   vfio_v1_save_setup(const char *name) " (%s)"
>>   vfio_save_cleanup(const char *name) " (%s)"
>>   vfio_save_buffer(const char *name, uint64_t data_offset, uint64_t 
>> data_size, uint64_t pending) " (%s) Offset 0x%"PRIx64" size 
>> 0x%"PRIx64" pending 0x%"PRIx64
>>   vfio_update_pending(const char *name, uint64_t pending) " (%s) 
>> pending 0x%"PRIx64
>>   vfio_save_device_config_state(const char *name) " (%s)"
>> +vfio_save_pending(const char *name, uint64_t precopy, uint64_t 
>> postcopy, uint64_t compatible, uint64_t stopcopy_size) " (%s) precopy 
>> 0x%"PRIx64" postcopy 0x%"PRIx64" compatible 0x%"PRIx64" stopcopy size 
>> 0x%"PRIx64
>>   vfio_v1_save_pending(const char *name, uint64_t precopy, uint64_t 
>> postcopy, uint64_t compatible) " (%s) precopy 0x%"PRIx64" postcopy 
>> 0x%"PRIx64" compatible 0x%"PRIx64
>>   vfio_save_iterate(const char *name, int data_size) " (%s) data_size 
>> %d"
>> +vfio_save_complete_precopy(const char *name, int ret) " (%s) ret %d"
>>   vfio_v1_save_complete_precopy(const char *name) " (%s)"
>>   vfio_load_device_config_state(const char *name) " (%s)"
>>   vfio_load_state(const char *name, uint64_t data) " (%s) data 
>> 0x%"PRIx64
>>   vfio_v1_load_state_device_data(const char *name, uint64_t 
>> data_offset, uint64_t data_size) " (%s) Offset 0x%"PRIx64" size 
>> 0x%"PRIx64
>> +vfio_load_state_device_data(const char *name, uint64_t data_size, 
>> int ret) " (%s) size 0x%"PRIx64" ret %d"
>>   vfio_load_cleanup(const char *name) " (%s)"
>>   vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, 
>> uint64_t bitmap_size, uint64_t start) "container fd=%d, 
>> iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" 
>> start=0x%"PRIx64
>>   vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) 
>> "iommu dirty @ 0x%"PRIx64" - 0x%"PRIx64
>> +vfio_save_block(const char *name, int data_size) " (%s) data_size %d"
>


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2023-01-15 18:43 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-12  8:50 [PATCH v6 00/13] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
2023-01-12  8:50 ` [PATCH v6 01/13] linux-headers: Update to v6.2-rc1 Avihai Horon
2023-01-12  8:50 ` [PATCH v6 02/13] migration: No save_live_pending() method uses the QEMUFile parameter Avihai Horon
2023-01-12  8:50 ` [PATCH v6 03/13] vfio/migration: Fix NULL pointer dereference bug Avihai Horon
2023-01-12  8:50 ` [PATCH v6 04/13] vfio/migration: Allow migration without VFIO IOMMU dirty tracking support Avihai Horon
2023-01-12  8:50 ` [PATCH v6 05/13] migration/qemu-file: Add qemu_file_get_to_fd() Avihai Horon
2023-01-12 14:43   ` Cédric Le Goater
2023-01-12  8:50 ` [PATCH v6 06/13] vfio/common: Change vfio_devices_all_running_and_saving() logic to equivalent one Avihai Horon
2023-01-13 15:46   ` Cédric Le Goater
2023-01-12  8:50 ` [PATCH v6 07/13] vfio/migration: Move migration v1 logic to vfio_migration_init() Avihai Horon
2023-01-12  8:50 ` [PATCH v6 08/13] vfio/migration: Rename functions/structs related to v1 protocol Avihai Horon
2023-01-12  8:50 ` [PATCH v6 09/13] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
2023-01-13 16:13   ` Cédric Le Goater
2023-01-15 18:42     ` Avihai Horon
2023-01-12  8:50 ` [PATCH v6 10/13] vfio/migration: Optimize vfio_save_pending() Avihai Horon
2023-01-13 16:05   ` Cédric Le Goater
2023-01-12  8:50 ` [PATCH v6 11/13] vfio/migration: Remove VFIO migration protocol v1 Avihai Horon
2023-01-13 16:06   ` Cédric Le Goater
2023-01-12  8:50 ` [PATCH v6 12/13] vfio: Alphabetize migration section of VFIO trace-events file Avihai Horon
2023-01-13 16:06   ` Cédric Le Goater
2023-01-12  8:50 ` [PATCH v6 13/13] docs/devel: Align VFIO migration docs to v2 protocol Avihai Horon
2023-01-13 17:12   ` Cédric Le Goater

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.