* Re: [Qemu-devel] [PATCH V3 0/4] vfio: Introduce Live migration capability to vfio_mdev device
  2018-03-05  6:00 [Qemu-devel] [PATCH V3 0/4] vfio: Introduce Live migration capability to vfio_mdev device Yulei Zhang
@ 2018-03-05  6:00 ` no-reply
  2018-03-05  6:00 ` [Qemu-devel] [PATCH V3 1/4] vfio: introduce a new VFIO subregion for mdev device migration support Yulei Zhang
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: no-reply @ 2018-03-05  6:00 UTC (permalink / raw)
  To: yulei.zhang
  Cc: famz, qemu-devel, kevin.tian, alex.williamson, kwankhede, zhenyuw

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 1520229653-10658-1-git-send-email-yulei.zhang@intel.com
Subject: [Qemu-devel] [PATCH V3 0/4] vfio: Introduce Live migration capability to vfio_mdev device

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]               patchew/1520229653-10658-1-git-send-email-yulei.zhang@intel.com -> patchew/1520229653-10658-1-git-send-email-yulei.zhang@intel.com
Switched to a new branch 'test'
c4092ae00e vifo: introduce new VFIO ioctl VFIO_IOMMU_GET_DIRTY_BITMAP
a6090fffc5 vfio: Add struct vfio_vmstate_info to introduce put/get callback funtion for vfio device status save/restore
74f1f045a9 vfio: Add vm status change callback to stop/restart the mdev device
b32825279c vfio: introduce a new VFIO subregion for mdev device migration support

=== OUTPUT BEGIN ===
Checking PATCH 1/4: vfio: introduce a new VFIO subregion for mdev device migration support...
ERROR: initializer for struct VMStateDescription should normally be const
#49: FILE: hw/vfio/pci.c:3172:
+static VMStateDescription vfio_pci_vmstate = {

total: 1 errors, 0 warnings, 54 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 2/4: vfio: Add vm status change callback to stop/restart the mdev device...
Checking PATCH 3/4: vfio: Add struct vfio_vmstate_info to introduce put/get callback funtion for vfio device status save/restore...
Checking PATCH 4/4: vifo: introduce new VFIO ioctl VFIO_IOMMU_GET_DIRTY_BITMAP...
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org


* [Qemu-devel] [PATCH V3 0/4] vfio: Introduce Live migration capability to vfio_mdev device
@ 2018-03-05  6:00 Yulei Zhang
  2018-03-05  6:00 ` no-reply
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Yulei Zhang @ 2018-03-05  6:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: kevin.tian, zhenyuw, kwankhede, alex.williamson, Yulei Zhang

Summary

This RFC series would like to resume the discussion about how to
introduce the live migration capability to vfio mdev devices.

By adding a new vfio subtype region VFIO_REGION_SUBTYPE_DEVICE_STATE,
the mdev device will be marked as migratable if the new region exists
during initialization.

The intention of the new region is to use it for mdev device status
save and restore during the migration. Accesses to this region are
trapped and forwarded to the mdev device driver. The first byte of the
new region also controls the running state of the mdev device, so
during the migration, after stopping the mdev device, qemu can retrieve
the device-specific status from this region and transfer it to the
target VM side to restore the mdev device.

In addition, we add one new ioctl VFIO_IOMMU_GET_DIRTY_BITMAP to help
with mdev device dirty page synchronization during the migration.
Currently it only covers the static (stop-and-copy) phase; in the
future we would like to add a new interface for pre-copy.
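
For illustration only, the QEMU side of the new ioctl is expected to look
roughly like the sketch below (mirroring what patch 4 implements;
container_fd, section_start and section_size are placeholders here):

    struct vfio_iommu_get_dirty_bitmap *d;
    unsigned long pages = section_size >> TARGET_PAGE_BITS;
    unsigned long bitmap_bytes =
                    (BITS_TO_LONGS(pages) + 1) * sizeof(unsigned long);

    /* variable-length bitmap follows the fixed header, one bit per page */
    d = g_malloc0(sizeof(*d) + bitmap_bytes);
    d->start_addr = section_start;
    d->page_nr = pages;

    if (ioctl(container_fd, VFIO_IOMMU_GET_DIRTY_BITMAP, d) == 0) {
        /* feed the pinned-and-dirtied pages into qemu's dirty bitmap */
        cpu_physical_memory_set_dirty_lebitmap((unsigned long *)&d->dirty_bitmap,
                                               d->start_addr, d->page_nr);
    }
    g_free(d);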

Below is the vfio_mdev device migration sequence
Source VM side:
			start migration
				|
				V
        	 get the cpu state change callback, write to the
        	 subregion's first byte to stop the mdev device
				|
				V
		 query the dirty page bitmap from the iommu container
		 and add it into the qemu dirty list for synchronization
				|
				V
		 save the device status into the QEMUFile, which is
                     read from the vfio device subregion

Target VM side:
		   restore the mdev device after getting the
		     saved status context from the QEMUFile
				|
				V
		     get the cpu state change callback,
 		     write to the subregion's first byte to
                      start the mdev device to put it in
                      running status
				|
				V
			finish migration
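
(For illustration, the "stop"/"start" steps above are just a one-byte
write at offset 0 of the new subregion. A simplified sketch, with the
device fd and region offset taken from the vfio region info:)

    /* 0 = start, 1 = stop, matching VFIO_DEVICE_START/VFIO_DEVICE_STOP */
    static int mdev_set_running(int device_fd, off_t region_offset, bool running)
    {
        uint8_t state = running ? 0 : 1;

        if (pwrite(device_fd, &state, 1, region_offset) != 1) {
            return -errno;
        }
        return 0;
    }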

V3->V2:
1. rebase the patch to Qemu stable 2.10 branch.
2. use a common name for the subregion instead of specific for 
   intel IGD.

V1->V2:
Per Alex's suggestion:
1. use device subtype region instead of VFIO PCI fixed region.
2. remove unnecessary ioctl, use the first byte of subregion to 
   control the running state of mdev device.  
3. for dirty page synchronization, implement the interface with
   VFIOContainer instead of vfio pci device.

Yulei Zhang (4):
  vfio: introduce a new VFIO subregion for mdev device migration support
  vfio: Add vm status change callback to stop/restart the mdev device
  vfio: Add struct vfio_vmstate_info to introduce put/get callback
    funtion for vfio device status save/restore
  vifo: introduce new VFIO ioctl VFIO_IOMMU_GET_DIRTY_BITMAP

 hw/vfio/common.c              |  34 +++++++++
 hw/vfio/pci.c                 | 171 +++++++++++++++++++++++++++++++++++++++++-
 hw/vfio/pci.h                 |   1 +
 include/hw/vfio/vfio-common.h |   1 +
 linux-headers/linux/vfio.h    |  29 ++++++-
 5 files changed, 232 insertions(+), 4 deletions(-)

-- 
2.7.4


* [Qemu-devel] [PATCH V3 1/4] vfio: introduce a new VFIO subregion for mdev device migration support
  2018-03-05  6:00 [Qemu-devel] [PATCH V3 0/4] vfio: Introduce Live migration capability to vfio_mdev device Yulei Zhang
  2018-03-05  6:00 ` no-reply
@ 2018-03-05  6:00 ` Yulei Zhang
  2018-03-09 11:43   ` Dr. David Alan Gilbert
  2018-03-05  6:00 ` [Qemu-devel] [PATCH V3 2/4] vfio: Add vm status change callback to stop/restart the mdev device Yulei Zhang
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Yulei Zhang @ 2018-03-05  6:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: kevin.tian, zhenyuw, kwankhede, alex.williamson, Yulei Zhang

A new VFIO subregion, VFIO_REGION_SUBTYPE_DEVICE_STATE, is added
to fetch and restore the status of the mdev vGPU device during
live migration.

Signed-off-by: Yulei Zhang <yulei.zhang@intel.com>
---
 hw/vfio/pci.c              | 14 +++++++++++++-
 hw/vfio/pci.h              |  1 +
 linux-headers/linux/vfio.h |  9 ++++++---
 3 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 31e1edf..2fe20e4 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -37,6 +37,7 @@
 
 static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
 static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
+static VMStateDescription vfio_pci_vmstate;
 
 /*
  * Disabling BAR mmaping can be slow, but toggling it around INTx can
@@ -2813,6 +2814,17 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
         vfio_vga_quirk_setup(vdev);
     }
 
+    struct vfio_region_info *device_state;
+    /* device state region setup */
+    if (!vfio_get_dev_region_info(&vdev->vbasedev,
+                VFIO_REGION_TYPE_PCI_VENDOR_TYPE | PCI_VENDOR_ID_INTEL,
+                VFIO_REGION_SUBTYPE_DEVICE_STATE, &device_state)) {
+        memcpy(&vdev->device_state, device_state,
+               sizeof(struct vfio_region_info));
+        g_free(device_state);
+        vfio_pci_vmstate.unmigratable = 0;
+    }
+
     for (i = 0; i < PCI_ROM_SLOT; i++) {
         vfio_bar_quirk_setup(vdev, i);
     }
@@ -2994,7 +3006,7 @@ static Property vfio_pci_dev_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
-static const VMStateDescription vfio_pci_vmstate = {
+static VMStateDescription vfio_pci_vmstate = {
     .name = "vfio-pci",
     .unmigratable = 1,
 };
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index a8366bb..6a1d26e 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -116,6 +116,7 @@ typedef struct VFIOPCIDevice {
     VFIOBAR bars[PCI_NUM_REGIONS - 1]; /* No ROM */
     VFIOVGA *vga; /* 0xa0000, 0x3b0, 0x3c0 */
     void *igd_opregion;
+    struct vfio_region_info device_state;
     PCIHostDeviceAddress host;
     EventNotifier err_notifier;
     EventNotifier req_notifier;
diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index 4e7ab4c..c3b8e4a 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -296,9 +296,12 @@ struct vfio_region_info_cap_type {
 #define VFIO_REGION_TYPE_PCI_VENDOR_MASK	(0xffff)
 
 /* 8086 Vendor sub-types */
-#define VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION	(1)
-#define VFIO_REGION_SUBTYPE_INTEL_IGD_HOST_CFG	(2)
-#define VFIO_REGION_SUBTYPE_INTEL_IGD_LPC_CFG	(3)
+#define VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION		(1)
+#define VFIO_REGION_SUBTYPE_INTEL_IGD_HOST_CFG		(2)
+#define VFIO_REGION_SUBTYPE_INTEL_IGD_LPC_CFG		(3)
+
+/* Mdev sub-type for device state save and restore */
+#define VFIO_REGION_SUBTYPE_DEVICE_STATE	(4)
 
 /**
  * VFIO_DEVICE_GET_IRQ_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 9,
-- 
2.7.4


* [Qemu-devel] [PATCH V3 2/4] vfio: Add vm status change callback to stop/restart the mdev device
  2018-03-05  6:00 [Qemu-devel] [PATCH V3 0/4] vfio: Introduce Live migration capability to vfio_mdev device Yulei Zhang
  2018-03-05  6:00 ` no-reply
  2018-03-05  6:00 ` [Qemu-devel] [PATCH V3 1/4] vfio: introduce a new VFIO subregion for mdev device migration support Yulei Zhang
@ 2018-03-05  6:00 ` Yulei Zhang
  2018-03-05  6:00 ` [Qemu-devel] [PATCH V3 3/4] vfio: Add struct vfio_vmstate_info to introduce put/get callback funtion for vfio device status save/restore Yulei Zhang
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Yulei Zhang @ 2018-03-05  6:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: kevin.tian, zhenyuw, kwankhede, alex.williamson, Yulei Zhang

A VM status change handler is added to change the vfio pci device
status during the migration. It writes the demanded device status
to the DEVICE STATE subregion to stop the device on the source side
before fetching its status, and to start the device on the target side
after restoring its status.

Signed-off-by: Yulei Zhang <yulei.zhang@intel.com>
---
 hw/vfio/pci.c                 | 20 ++++++++++++++++++++
 include/hw/vfio/vfio-common.h |  1 +
 linux-headers/linux/vfio.h    |  3 +++
 3 files changed, 24 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 2fe20e4..3e2289c 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -38,6 +38,7 @@
 static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
 static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
 static VMStateDescription vfio_pci_vmstate;
+static void vfio_vm_change_state_handler(void *pv, int running, RunState state);
 
 /*
  * Disabling BAR mmaping can be slow, but toggling it around INTx can
@@ -2880,6 +2881,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
     vfio_register_err_notifier(vdev);
     vfio_register_req_notifier(vdev);
     vfio_setup_resetfn_quirk(vdev);
+    qemu_add_vm_change_state_handler(vfio_vm_change_state_handler, vdev);
 
     return;
 
@@ -2962,6 +2964,24 @@ post_reset:
     vfio_pci_post_reset(vdev);
 }
 
+static void vfio_vm_change_state_handler(void *pv, int running, RunState state)
+{
+    VFIOPCIDevice *vdev = pv;
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    uint8_t dev_state;
+    uint8_t sz = 1;
+
+    dev_state = running ? VFIO_DEVICE_START : VFIO_DEVICE_STOP;
+
+    if (pwrite(vdev->vbasedev.fd, &dev_state,
+               sz, vdev->device_state.offset) != sz) {
+        error_report("vfio: Failed to %s device", running ? "start" : "stop");
+        return;
+    }
+
+    vbasedev->device_state = dev_state;
+}
+
 static void vfio_instance_init(Object *obj)
 {
     PCIDevice *pci_dev = PCI_DEVICE(obj);
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index f3a2ac9..9c14a8f 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -125,6 +125,7 @@ typedef struct VFIODevice {
     unsigned int num_irqs;
     unsigned int num_regions;
     unsigned int flags;
+    bool device_state;
 } VFIODevice;
 
 struct VFIODeviceOps {
diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index c3b8e4a..4ddeebc 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -303,6 +303,9 @@ struct vfio_region_info_cap_type {
 /* Mdev sub-type for device state save and restore */
 #define VFIO_REGION_SUBTYPE_DEVICE_STATE	(4)
 
+#define VFIO_DEVICE_START	0
+#define VFIO_DEVICE_STOP	1
+
 /**
  * VFIO_DEVICE_GET_IRQ_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 9,
  *				    struct vfio_irq_info)
-- 
2.7.4


* [Qemu-devel] [PATCH V3 3/4] vfio: Add struct vfio_vmstate_info to introduce put/get callback funtion for vfio device status save/restore
  2018-03-05  6:00 [Qemu-devel] [PATCH V3 0/4] vfio: Introduce Live migration capability to vfio_mdev device Yulei Zhang
                   ` (2 preceding siblings ...)
  2018-03-05  6:00 ` [Qemu-devel] [PATCH V3 2/4] vfio: Add vm status change callback to stop/restart the mdev device Yulei Zhang
@ 2018-03-05  6:00 ` Yulei Zhang
  2018-03-09 11:59   ` Dr. David Alan Gilbert
  2018-03-05  6:00 ` [Qemu-devel] [PATCH V3 4/4] vifo: introduce new VFIO ioctl VFIO_IOMMU_GET_DIRTY_BITMAP Yulei Zhang
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Yulei Zhang @ 2018-03-05  6:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: kevin.tian, zhenyuw, kwankhede, alex.williamson, Yulei Zhang

Introduce the vfio_device_put/vfio_device_get functions for vfio device
state save/restore usage.

For VFIO pci device status migration, on the source side the function
vfio_device_put saves the following states:
1. pci configuration space addr0~addr5
2. pci configuration space msi_addr msi_data
3. pci device status fetched from the device driver

And on the target side the function vfio_device_get restores the same
states:
1. re-setup the pci bar configuration
2. re-setup the pci device msi configuration
3. restore the pci device status

Signed-off-by: Yulei Zhang <yulei.zhang@intel.com>
---
 hw/vfio/pci.c              | 137 +++++++++++++++++++++++++++++++++++++++++++++
 linux-headers/linux/vfio.h |   3 +
 2 files changed, 140 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 3e2289c..c1676cf 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2982,6 +2982,123 @@ static void vfio_vm_change_state_handler(void *pv, int running, RunState state)
     vbasedev->device_state = dev_state;
 }
 
+static int vfio_device_put(QEMUFile *f, void *pv, size_t size,
+                           VMStateField *field, QJSON *vmdesc)
+{
+    VFIOPCIDevice *vdev = pv;
+    PCIDevice *pdev = &vdev->pdev;
+    int sz = vdev->device_state.size - VFIO_DEVICE_STATE_OFFSET;
+    uint8_t *buf = NULL;
+    uint32_t msi_cfg, msi_lo, msi_hi, msi_data, bar_cfg, i;
+    bool msi_64bit;
+
+    for (i = 0; i < PCI_ROM_SLOT; i++) {
+        bar_cfg = pci_default_read_config(pdev, PCI_BASE_ADDRESS_0 + i * 4, 4);
+        qemu_put_be32(f, bar_cfg);
+    }
+
+    msi_cfg = pci_default_read_config(pdev, pdev->msi_cap + PCI_MSI_FLAGS, 2);
+    msi_64bit = !!(msi_cfg & PCI_MSI_FLAGS_64BIT);
+
+    msi_lo = pci_default_read_config(pdev,
+                                     pdev->msi_cap + PCI_MSI_ADDRESS_LO, 4);
+    qemu_put_be32(f, msi_lo);
+
+    if (msi_64bit) {
+        msi_hi = pci_default_read_config(pdev,
+                                         pdev->msi_cap + PCI_MSI_ADDRESS_HI,
+                                         4);
+        qemu_put_be32(f, msi_hi);
+    }
+
+    msi_data = pci_default_read_config(pdev,
+               pdev->msi_cap + (msi_64bit ? PCI_MSI_DATA_64 : PCI_MSI_DATA_32),
+               2);
+    qemu_put_be32(f, msi_data);
+
+    buf = g_malloc(sz);
+    if (buf == NULL) {
+        error_report("vfio: Failed to allocate memory for migrate");
+        goto exit;
+    }
+
+    if (pread(vdev->vbasedev.fd, buf, sz,
+              vdev->device_state.offset + VFIO_DEVICE_STATE_OFFSET) != sz) {
+        error_report("vfio: Failed to read Device State Region");
+        goto exit;
+    }
+
+    qemu_put_buffer(f, buf, sz);
+
+exit:
+    g_free(buf);
+
+    return 0;
+}
+
+static int vfio_device_get(QEMUFile *f, void *pv,
+                           size_t size, VMStateField *field)
+{
+    VFIOPCIDevice *vdev = pv;
+    PCIDevice *pdev = &vdev->pdev;
+    int sz = vdev->device_state.size - VFIO_DEVICE_STATE_OFFSET;
+    uint8_t *buf = NULL;
+    uint32_t ctl, msi_lo, msi_hi, msi_data, bar_cfg, i;
+    bool msi_64bit;
+
+    /* retore pci bar configuration */
+    ctl = pci_default_read_config(pdev, PCI_COMMAND, 2);
+    vfio_pci_write_config(pdev, PCI_COMMAND,
+                          ctl & (!(PCI_COMMAND_IO | PCI_COMMAND_MEMORY)), 2);
+    for (i = 0; i < PCI_ROM_SLOT; i++) {
+        bar_cfg = qemu_get_be32(f);
+        vfio_pci_write_config(pdev, PCI_BASE_ADDRESS_0 + i * 4, bar_cfg, 4);
+    }
+    vfio_pci_write_config(pdev, PCI_COMMAND,
+                          ctl | PCI_COMMAND_IO | PCI_COMMAND_MEMORY, 2);
+
+    /* restore msi configuration */
+    ctl = pci_default_read_config(pdev, pdev->msi_cap + PCI_MSI_FLAGS, 2);
+    msi_64bit = !!(ctl & PCI_MSI_FLAGS_64BIT);
+
+    vfio_pci_write_config(&vdev->pdev,
+                          pdev->msi_cap + PCI_MSI_FLAGS,
+                          ctl & (!PCI_MSI_FLAGS_ENABLE), 2);
+
+    msi_lo = qemu_get_be32(f);
+    vfio_pci_write_config(pdev, pdev->msi_cap + PCI_MSI_ADDRESS_LO, msi_lo, 4);
+
+    if (msi_64bit) {
+        msi_hi = qemu_get_be32(f);
+        vfio_pci_write_config(pdev, pdev->msi_cap + PCI_MSI_ADDRESS_HI,
+                              msi_hi, 4);
+    }
+    msi_data = qemu_get_be32(f);
+    vfio_pci_write_config(pdev,
+              pdev->msi_cap + (msi_64bit ? PCI_MSI_DATA_64 : PCI_MSI_DATA_32),
+              msi_data, 2);
+
+    vfio_pci_write_config(&vdev->pdev, pdev->msi_cap + PCI_MSI_FLAGS,
+                          ctl | PCI_MSI_FLAGS_ENABLE, 2);
+
+    buf = g_malloc(sz);
+    if (buf == NULL) {
+        error_report("vfio: Failed to allocate memory for migrate");
+        return -1;
+    }
+
+    qemu_get_buffer(f, buf, sz);
+    if (pwrite(vdev->vbasedev.fd, buf, sz,
+               vdev->device_state.offset + VFIO_DEVICE_STATE_OFFSET) != sz) {
+        error_report("vfio: Failed to write Device State Region");
+        return -1;
+    }
+
+    g_free(buf);
+
+    return 0;
+}
+
 static void vfio_instance_init(Object *obj)
 {
     PCIDevice *pci_dev = PCI_DEVICE(obj);
@@ -3026,9 +3143,29 @@ static Property vfio_pci_dev_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
+static const VMStateInfo vfio_vmstate_info = {
+    .name = "vfio-state",
+    .get = vfio_device_get,
+    .put = vfio_device_put,
+};
+
 static VMStateDescription vfio_pci_vmstate = {
     .name = "vfio-pci",
     .unmigratable = 1,
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        {
+            .name         = "vfio dev",
+            .version_id   = 0,
+            .field_exists = NULL,
+            .size         = 0,
+            .info         = &vfio_vmstate_info,
+            .flags        = VMS_SINGLE,
+            .offset       = 0,
+         },
+        VMSTATE_END_OF_LIST()
+    },
 };
 
 static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index 4ddeebc..4451a8f 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -303,6 +303,9 @@ struct vfio_region_info_cap_type {
 /* Mdev sub-type for device state save and restore */
 #define VFIO_REGION_SUBTYPE_DEVICE_STATE	(4)
 
+/* Offset in region to save device state */
+#define VFIO_DEVICE_STATE_OFFSET	1
+
 #define VFIO_DEVICE_START	0
 #define VFIO_DEVICE_STOP	1
 
-- 
2.7.4


* [Qemu-devel] [PATCH V3 4/4] vifo: introduce new VFIO ioctl VFIO_IOMMU_GET_DIRTY_BITMAP
  2018-03-05  6:00 [Qemu-devel] [PATCH V3 0/4] vfio: Introduce Live migration capability to vfio_mdev device Yulei Zhang
                   ` (3 preceding siblings ...)
  2018-03-05  6:00 ` [Qemu-devel] [PATCH V3 3/4] vfio: Add struct vfio_vmstate_info to introduce put/get callback funtion for vfio device status save/restore Yulei Zhang
@ 2018-03-05  6:00 ` Yulei Zhang
  2018-03-05 13:02 ` [Qemu-devel] [PATCH V3 0/4] vfio: Introduce Live migration capability to vfio_mdev device Kirti Wankhede
  2018-03-12 22:21 ` Alex Williamson
  6 siblings, 0 replies; 15+ messages in thread
From: Yulei Zhang @ 2018-03-05  6:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: kevin.tian, zhenyuw, kwankhede, alex.williamson, Yulei Zhang

The new VFIO ioctl VFIO_IOMMU_GET_DIRTY_BITMAP is used to fetch the
bitmap of pinned memory in the iommu container. We need to copy that
memory to the target during the migration, as it is dirtied by the
mdev devices.

Signed-off-by: Yulei Zhang <yulei.zhang@intel.com>
---
 hw/vfio/common.c           | 34 ++++++++++++++++++++++++++++++++++
 linux-headers/linux/vfio.h | 14 ++++++++++++++
 2 files changed, 48 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 7b2924c..a952554 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -35,6 +35,7 @@
 #include "sysemu/kvm.h"
 #include "trace.h"
 #include "qapi/error.h"
+#include "exec/ram_addr.h"
 
 struct vfio_group_head vfio_group_list =
     QLIST_HEAD_INITIALIZER(vfio_group_list);
@@ -624,9 +625,42 @@ static void vfio_listener_region_del(MemoryListener *listener,
     }
 }
 
+static void vfio_log_sync(MemoryListener *listener,
+                          MemoryRegionSection *section)
+{
+    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
+    VFIOGroup *group = QLIST_FIRST(&container->group_list);
+    VFIODevice *vbasedev;
+    QLIST_FOREACH(vbasedev, &group->device_list, next) {
+        if (vbasedev->device_state == VFIO_DEVICE_START) {
+            return;
+        }
+    }
+
+    struct vfio_iommu_get_dirty_bitmap *d;
+    ram_addr_t size = int128_get64(section->size);
+    unsigned long page_nr = size >> TARGET_PAGE_BITS;
+    unsigned long bitmap_size =
+                    (BITS_TO_LONGS(page_nr) + 1) * sizeof(unsigned long);
+    d = g_malloc0(sizeof(*d) + bitmap_size);
+    d->start_addr = section->offset_within_address_space;
+    d->page_nr = page_nr;
+
+    if (ioctl(container->fd, VFIO_IOMMU_GET_DIRTY_BITMAP, d)) {
+        error_report("vfio: Failed to fetch dirty pages for migration");
+        goto exit;
+    }
+
+    cpu_physical_memory_set_dirty_lebitmap((unsigned long *)&d->dirty_bitmap,
+                                           d->start_addr, d->page_nr);
+exit:
+    g_free(d);
+}
+
 static const MemoryListener vfio_memory_listener = {
     .region_add = vfio_listener_region_add,
     .region_del = vfio_listener_region_del,
+    .log_sync = vfio_log_sync,
 };
 
 static void vfio_listener_release(VFIOContainer *container)
diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index 4451a8f..a41f73b 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -574,6 +574,20 @@ struct vfio_iommu_type1_dma_unmap {
 #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
 #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
 
+/**
+ * VFIO_IOMMU_GET_DIRTY_BITMAP - _IOW(VFIO_TYPE, VFIO_BASE + 17,
+ *				    struct vfio_iommu_get_dirty_bitmap)
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+struct vfio_iommu_get_dirty_bitmap {
+	__u64	       start_addr;
+	__u64	       page_nr;
+	__u8           dirty_bitmap[];
+};
+
+#define VFIO_IOMMU_GET_DIRTY_BITMAP _IO(VFIO_TYPE, VFIO_BASE + 17)
+
 /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
 
 /*
-- 
2.7.4


* Re: [Qemu-devel] [PATCH V3 0/4] vfio: Introduce Live migration capability to vfio_mdev device
  2018-03-05  6:00 [Qemu-devel] [PATCH V3 0/4] vfio: Introduce Live migration capability to vfio_mdev device Yulei Zhang
                   ` (4 preceding siblings ...)
  2018-03-05  6:00 ` [Qemu-devel] [PATCH V3 4/4] vifo: introduce new VFIO ioctl VFIO_IOMMU_GET_DIRTY_BITMAP Yulei Zhang
@ 2018-03-05 13:02 ` Kirti Wankhede
  2018-03-06 13:34   ` Zhang, Yulei
  2018-03-12 22:21 ` Alex Williamson
  6 siblings, 1 reply; 15+ messages in thread
From: Kirti Wankhede @ 2018-03-05 13:02 UTC (permalink / raw)
  To: Yulei Zhang, qemu-devel; +Cc: kevin.tian, zhenyuw, alex.williamson

Hi Yulei Zhang,

This series is the same as the previous version, that is, there is no
pre-copy phase; it only takes care of the stop-and-copy phase.
As we discussed at KVM Forum 2017 in October, there should be
provision for a pre-copy phase.

Thanks,
Kirti

On 3/5/2018 11:30 AM, Yulei Zhang wrote:
> Summary
> 
> This series RFC would like to resume the discussion about how to
> introduce the live migration capability to vfio mdev device. 
> 
> By adding a new vfio subtype region VFIO_REGION_SUBTYPE_DEVICE_STATE,
> the mdev device will be set to migratable if the new region exist
> during the initialization.  
> 
> The intention to add the new region is using it for mdev device status
> save and restore during the migration. The access to this region
> will be trapped and forward to the mdev device driver, it also uses 
> the first byte in the new region to control the running state of mdev
> device, so during the migration after stop the mdev driver, qemu could
> retrieve the specific device status from this region and transfer to 
> the target VM side for the mdev device restore.
> 
> In addition,  we add one new ioctl VFIO_IOMMU_GET_DIRTY_BITMAP to help do 
> the mdev device dirty page synchronization during the migration, currently
> it is just for static copy, in the future we would like to add new interface
> for the pre-copy.
> 
> Below is the vfio_mdev device migration sequence
> Source VM side:
> 			start migration
> 				|
> 				V
>         	 get the cpu state change callback, write to the
>         	 subregion's first byte to stop the mdev device
> 				|
> 				V
> 		 quary the dirty page bitmap from iommu container 
> 		 and add into qemu dirty list for synchronization
> 				|
> 				V
> 		 save the deivce status into Qemufile which is 
>                      read from the vfio device subregion
> 
> Target VM side:
> 		   restore the mdev device after get the
> 		     saved status context from Qemufile
> 				|
> 				V
> 		     get the cpu state change callback
>  		     write to subregion's first byte to 
>                       start the mdev device to put it in 
>                       running status
> 				|
> 				V
> 			finish migration
> 
> V3->V2:
> 1. rebase the patch to Qemu stable 2.10 branch.
> 2. use a common name for the subregion instead of specific for 
>    intel IGD.
> 
> V1->V2:
> Per Alex's suggestion:
> 1. use device subtype region instead of VFIO PCI fixed region.
> 2. remove unnecessary ioctl, use the first byte of subregion to 
>    control the running state of mdev device.  
> 3. for dirty page synchronization, implement the interface with
>    VFIOContainer instead of vfio pci device.
> 
> Yulei Zhang (4):
>   vfio: introduce a new VFIO subregion for mdev device migration support
>   vfio: Add vm status change callback to stop/restart the mdev device
>   vfio: Add struct vfio_vmstate_info to introduce put/get callback
>     funtion for vfio device status save/restore
>   vifo: introduce new VFIO ioctl VFIO_IOMMU_GET_DIRTY_BITMAP
> 
>  hw/vfio/common.c              |  34 +++++++++
>  hw/vfio/pci.c                 | 171 +++++++++++++++++++++++++++++++++++++++++-
>  hw/vfio/pci.h                 |   1 +
>  include/hw/vfio/vfio-common.h |   1 +
>  linux-headers/linux/vfio.h    |  29 ++++++-
>  5 files changed, 232 insertions(+), 4 deletions(-)
> 


* Re: [Qemu-devel] [PATCH V3 0/4] vfio: Introduce Live migration capability to vfio_mdev device
  2018-03-05 13:02 ` [Qemu-devel] [PATCH V3 0/4] vfio: Introduce Live migration capability to vfio_mdev device Kirti Wankhede
@ 2018-03-06 13:34   ` Zhang, Yulei
  2018-03-07  2:11     ` Tian, Kevin
  0 siblings, 1 reply; 15+ messages in thread
From: Zhang, Yulei @ 2018-03-06 13:34 UTC (permalink / raw)
  To: Kirti Wankhede, qemu-devel; +Cc: Tian, Kevin, zhenyuw, alex.williamson

Hi Kirti,

Yes, that is the plan and we will address it in the coming versions.
In this version we just rebased the code and are looking for more input.

Thanks, 
Yulei

> -----Original Message-----
> From: Kirti Wankhede [mailto:kwankhede@nvidia.com]
> Sent: Monday, March 5, 2018 9:03 PM
> To: Zhang, Yulei <yulei.zhang@intel.com>; qemu-devel@nongnu.org
> Cc: Tian, Kevin <kevin.tian@intel.com>; zhenyuw@linux.intel.com;
> alex.williamson@redhat.com
> Subject: Re: [PATCH V3 0/4] vfio: Introduce Live migration capability to
> vfio_mdev device
> 
> Hi Yulei Zhang,
> 
> This series is same as the previous version, that is, there is no pre-copy
> phase. This only takes care of stop-and-copy phase.
> As per we discussed in KVM Forum 2017 in October, there should be
> provision of pre-copy phase.
> 
> Thanks,
> Kirti
> 
> On 3/5/2018 11:30 AM, Yulei Zhang wrote:
> > Summary
> >
> > This series RFC would like to resume the discussion about how to
> > introduce the live migration capability to vfio mdev device.
> >
> > By adding a new vfio subtype region
> VFIO_REGION_SUBTYPE_DEVICE_STATE,
> > the mdev device will be set to migratable if the new region exist
> > during the initialization.
> >
> > The intention to add the new region is using it for mdev device status
> > save and restore during the migration. The access to this region will
> > be trapped and forward to the mdev device driver, it also uses the
> > first byte in the new region to control the running state of mdev
> > device, so during the migration after stop the mdev driver, qemu could
> > retrieve the specific device status from this region and transfer to
> > the target VM side for the mdev device restore.
> >
> > In addition,  we add one new ioctl VFIO_IOMMU_GET_DIRTY_BITMAP to
> help
> > do the mdev device dirty page synchronization during the migration,
> > currently it is just for static copy, in the future we would like to
> > add new interface for the pre-copy.
> >
> > Below is the vfio_mdev device migration sequence Source VM side:
> > 			start migration
> > 				|
> > 				V
> >         	 get the cpu state change callback, write to the
> >         	 subregion's first byte to stop the mdev device
> > 				|
> > 				V
> > 		 quary the dirty page bitmap from iommu container
> > 		 and add into qemu dirty list for synchronization
> > 				|
> > 				V
> > 		 save the deivce status into Qemufile which is
> >                      read from the vfio device subregion
> >
> > Target VM side:
> > 		   restore the mdev device after get the
> > 		     saved status context from Qemufile
> > 				|
> > 				V
> > 		     get the cpu state change callback
> >  		     write to subregion's first byte to
> >                       start the mdev device to put it in
> >                       running status
> > 				|
> > 				V
> > 			finish migration
> >
> > V3->V2:
> > 1. rebase the patch to Qemu stable 2.10 branch.
> > 2. use a common name for the subregion instead of specific for
> >    intel IGD.
> >
> > V1->V2:
> > Per Alex's suggestion:
> > 1. use device subtype region instead of VFIO PCI fixed region.
> > 2. remove unnecessary ioctl, use the first byte of subregion to
> >    control the running state of mdev device.
> > 3. for dirty page synchronization, implement the interface with
> >    VFIOContainer instead of vfio pci device.
> >
> > Yulei Zhang (4):
> >   vfio: introduce a new VFIO subregion for mdev device migration support
> >   vfio: Add vm status change callback to stop/restart the mdev device
> >   vfio: Add struct vfio_vmstate_info to introduce put/get callback
> >     funtion for vfio device status save/restore
> >   vifo: introduce new VFIO ioctl VFIO_IOMMU_GET_DIRTY_BITMAP
> >
> >  hw/vfio/common.c              |  34 +++++++++
> >  hw/vfio/pci.c                 | 171
> +++++++++++++++++++++++++++++++++++++++++-
> >  hw/vfio/pci.h                 |   1 +
> >  include/hw/vfio/vfio-common.h |   1 +
> >  linux-headers/linux/vfio.h    |  29 ++++++-
> >  5 files changed, 232 insertions(+), 4 deletions(-)
> >


* Re: [Qemu-devel] [PATCH V3 0/4] vfio: Introduce Live migration capability to vfio_mdev device
  2018-03-06 13:34   ` Zhang, Yulei
@ 2018-03-07  2:11     ` Tian, Kevin
  0 siblings, 0 replies; 15+ messages in thread
From: Tian, Kevin @ 2018-03-07  2:11 UTC (permalink / raw)
  To: Zhang, Yulei, Kirti Wankhede, qemu-devel; +Cc: zhenyuw, alex.williamson

> From: Zhang, Yulei
> Sent: Tuesday, March 6, 2018 9:35 PM
> 
> Hi Kirti,
> 
> Yes, that is the plan and we will address it in the coming versions.
> In this version we just rebase the code and looking for more inputs.

That's not how a new version is expected to be provided. For review
comments which you received on previous versions, you need to
echo them in the new version, where 'echo' means either fixing them
and noting that in the change list, or providing a TODO list for
unhandled comments so reviewers know what to look at further. Also,
a rebase alone usually doesn't warrant a new version...

btw, when describing the change list in the version history, please
use v2->v3 instead of vice versa.

Thanks
Kevin

> 
> Thanks,
> Yulei
> 
> > -----Original Message-----
> > From: Kirti Wankhede [mailto:kwankhede@nvidia.com]
> > Sent: Monday, March 5, 2018 9:03 PM
> > To: Zhang, Yulei <yulei.zhang@intel.com>; qemu-devel@nongnu.org
> > Cc: Tian, Kevin <kevin.tian@intel.com>; zhenyuw@linux.intel.com;
> > alex.williamson@redhat.com
> > Subject: Re: [PATCH V3 0/4] vfio: Introduce Live migration capability to
> > vfio_mdev device
> >
> > Hi Yulei Zhang,
> >
> > This series is same as the previous version, that is, there is no pre-copy
> > phase. This only takes care of stop-and-copy phase.
> > As per we discussed in KVM Forum 2017 in October, there should be
> > provision of pre-copy phase.
> >
> > Thanks,
> > Kirti
> >
> > On 3/5/2018 11:30 AM, Yulei Zhang wrote:
> > > Summary
> > >
> > > This series RFC would like to resume the discussion about how to
> > > introduce the live migration capability to vfio mdev device.
> > >
> > > By adding a new vfio subtype region
> > VFIO_REGION_SUBTYPE_DEVICE_STATE,
> > > the mdev device will be set to migratable if the new region exist
> > > during the initialization.
> > >
> > > The intention to add the new region is using it for mdev device status
> > > save and restore during the migration. The access to this region will
> > > be trapped and forward to the mdev device driver, it also uses the
> > > first byte in the new region to control the running state of mdev
> > > device, so during the migration after stop the mdev driver, qemu could
> > > retrieve the specific device status from this region and transfer to
> > > the target VM side for the mdev device restore.
> > >
> > > In addition,  we add one new ioctl VFIO_IOMMU_GET_DIRTY_BITMAP
> to
> > help
> > > do the mdev device dirty page synchronization during the migration,
> > > currently it is just for static copy, in the future we would like to
> > > add new interface for the pre-copy.
> > >
> > > Below is the vfio_mdev device migration sequence Source VM side:
> > > 			start migration
> > > 				|
> > > 				V
> > >         	 get the cpu state change callback, write to the
> > >         	 subregion's first byte to stop the mdev device
> > > 				|
> > > 				V
> > > 		 quary the dirty page bitmap from iommu container
> > > 		 and add into qemu dirty list for synchronization
> > > 				|
> > > 				V
> > > 		 save the deivce status into Qemufile which is
> > >                      read from the vfio device subregion
> > >
> > > Target VM side:
> > > 		   restore the mdev device after get the
> > > 		     saved status context from Qemufile
> > > 				|
> > > 				V
> > > 		     get the cpu state change callback
> > >  		     write to subregion's first byte to
> > >                       start the mdev device to put it in
> > >                       running status
> > > 				|
> > > 				V
> > > 			finish migration
> > >
> > > V3->V2:
> > > 1. rebase the patch to Qemu stable 2.10 branch.
> > > 2. use a common name for the subregion instead of specific for
> > >    intel IGD.
> > >
> > > V1->V2:
> > > Per Alex's suggestion:
> > > 1. use device subtype region instead of VFIO PCI fixed region.
> > > 2. remove unnecessary ioctl, use the first byte of subregion to
> > >    control the running state of mdev device.
> > > 3. for dirty page synchronization, implement the interface with
> > >    VFIOContainer instead of vfio pci device.
> > >
> > > Yulei Zhang (4):
> > >   vfio: introduce a new VFIO subregion for mdev device migration
> support
> > >   vfio: Add vm status change callback to stop/restart the mdev device
> > >   vfio: Add struct vfio_vmstate_info to introduce put/get callback
> > >     funtion for vfio device status save/restore
> > >   vifo: introduce new VFIO ioctl VFIO_IOMMU_GET_DIRTY_BITMAP
> > >
> > >  hw/vfio/common.c              |  34 +++++++++
> > >  hw/vfio/pci.c                 | 171
> > +++++++++++++++++++++++++++++++++++++++++-
> > >  hw/vfio/pci.h                 |   1 +
> > >  include/hw/vfio/vfio-common.h |   1 +
> > >  linux-headers/linux/vfio.h    |  29 ++++++-
> > >  5 files changed, 232 insertions(+), 4 deletions(-)
> > >


* Re: [Qemu-devel] [PATCH V3 1/4] vfio: introduce a new VFIO subregion for mdev device migration support
  2018-03-05  6:00 ` [Qemu-devel] [PATCH V3 1/4] vfio: introduce a new VFIO subregion for mdev device migration support Yulei Zhang
@ 2018-03-09 11:43   ` Dr. David Alan Gilbert
  2018-03-13  8:27     ` Zhang, Yulei
  0 siblings, 1 reply; 15+ messages in thread
From: Dr. David Alan Gilbert @ 2018-03-09 11:43 UTC (permalink / raw)
  To: Yulei Zhang; +Cc: qemu-devel, kevin.tian, alex.williamson, kwankhede, zhenyuw

* Yulei Zhang (yulei.zhang@intel.com) wrote:
> New VFIO sub region VFIO_REGION_SUBTYPE_DEVICE_STATE is added
> to fetch and restore the status of mdev device vGPU during the
> live migration.
> 
> Signed-off-by: Yulei Zhang <yulei.zhang@intel.com>
> ---
>  hw/vfio/pci.c              | 14 +++++++++++++-
>  hw/vfio/pci.h              |  1 +
>  linux-headers/linux/vfio.h |  9 ++++++---
>  3 files changed, 20 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 31e1edf..2fe20e4 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -37,6 +37,7 @@
>  
>  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
>  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
> +static VMStateDescription vfio_pci_vmstate;
>  
>  /*
>   * Disabling BAR mmaping can be slow, but toggling it around INTx can
> @@ -2813,6 +2814,17 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>          vfio_vga_quirk_setup(vdev);
>      }
>  
> +    struct vfio_region_info *device_state;
> +    /* device state region setup */
> +    if (!vfio_get_dev_region_info(&vdev->vbasedev,
> +                VFIO_REGION_TYPE_PCI_VENDOR_TYPE | PCI_VENDOR_ID_INTEL,
> +                VFIO_REGION_SUBTYPE_DEVICE_STATE, &device_state)) {
> +        memcpy(&vdev->device_state, device_state,
> +               sizeof(struct vfio_region_info));
> +        g_free(device_state);
> +        vfio_pci_vmstate.unmigratable = 0;
> +    }

I don't think we've got any other code that changes the 'unmigratable'
flag on a vmstate.  If you've got a device that you don't know whether
it's migratable until runtime, then you should add a 'migration blocker'
if the device isn't migratable.
See for example the calls to migrate_add_blocker in hw/virtio/vhost.c
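
The usual shape is roughly the sketch below (helper name invented here,
not a drop-in patch):

  /* sketch: block migration when the device-state region is absent */
  static void vfio_migration_maybe_block(VFIOPCIDevice *vdev,
                                         bool has_state_region, Error **errp)
  {
      Error *blocker = NULL;

      if (has_state_region) {
          return;              /* device can be migrated, nothing to register */
      }

      error_setg(&blocker, "VFIO device %s does not support live migration",
                 vdev->vbasedev.name);
      if (migrate_add_blocker(blocker, errp) < 0) {
          error_free(blocker); /* rejected, e.g. under --only-migratable */
      }
  }

That also lets vfio_pci_vmstate stay const.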

Dave

>      for (i = 0; i < PCI_ROM_SLOT; i++) {
>          vfio_bar_quirk_setup(vdev, i);
>      }
> @@ -2994,7 +3006,7 @@ static Property vfio_pci_dev_properties[] = {
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> -static const VMStateDescription vfio_pci_vmstate = {
> +static VMStateDescription vfio_pci_vmstate = {
>      .name = "vfio-pci",
>      .unmigratable = 1,
>  };
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> index a8366bb..6a1d26e 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -116,6 +116,7 @@ typedef struct VFIOPCIDevice {
>      VFIOBAR bars[PCI_NUM_REGIONS - 1]; /* No ROM */
>      VFIOVGA *vga; /* 0xa0000, 0x3b0, 0x3c0 */
>      void *igd_opregion;
> +    struct vfio_region_info device_state;
>      PCIHostDeviceAddress host;
>      EventNotifier err_notifier;
>      EventNotifier req_notifier;
> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
> index 4e7ab4c..c3b8e4a 100644
> --- a/linux-headers/linux/vfio.h
> +++ b/linux-headers/linux/vfio.h
> @@ -296,9 +296,12 @@ struct vfio_region_info_cap_type {
>  #define VFIO_REGION_TYPE_PCI_VENDOR_MASK	(0xffff)
>  
>  /* 8086 Vendor sub-types */
> -#define VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION	(1)
> -#define VFIO_REGION_SUBTYPE_INTEL_IGD_HOST_CFG	(2)
> -#define VFIO_REGION_SUBTYPE_INTEL_IGD_LPC_CFG	(3)
> +#define VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION		(1)
> +#define VFIO_REGION_SUBTYPE_INTEL_IGD_HOST_CFG		(2)
> +#define VFIO_REGION_SUBTYPE_INTEL_IGD_LPC_CFG		(3)
> +
> +/* Mdev sub-type for device state save and restore */
> +#define VFIO_REGION_SUBTYPE_DEVICE_STATE	(4)
>  
>  /**
>   * VFIO_DEVICE_GET_IRQ_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 9,
> -- 
> 2.7.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH V3 3/4] vfio: Add struct vfio_vmstate_info to introduce put/get callback funtion for vfio device status save/restore
  2018-03-05  6:00 ` [Qemu-devel] [PATCH V3 3/4] vfio: Add struct vfio_vmstate_info to introduce put/get callback funtion for vfio device status save/restore Yulei Zhang
@ 2018-03-09 11:59   ` Dr. David Alan Gilbert
  2018-03-14  8:52     ` Zhang, Yulei
  0 siblings, 1 reply; 15+ messages in thread
From: Dr. David Alan Gilbert @ 2018-03-09 11:59 UTC (permalink / raw)
  To: Yulei Zhang; +Cc: qemu-devel, kevin.tian, alex.williamson, kwankhede, zhenyuw

* Yulei Zhang (yulei.zhang@intel.com) wrote:
> Introduce vfio_device_put/vfio_device_get funtion for vfio device state
> save/restore usage.
> 
> For VFIO pci device status migrate, on the source side with
> funtion vfio_device_put to save the following states
> 1. pci configuration space addr0~addr5
> 2. pci configuration space msi_addr msi_data
> 3. pci device status fetch from device driver
> 
> And on the target side with funtion vfio_device_get to restore
> the same states
> 1. re-setup the pci bar configuration
> 2. re-setup the pci device msi configuration
> 3. restore the pci device status
> 
> Signed-off-by: Yulei Zhang <yulei.zhang@intel.com>
> ---
>  hw/vfio/pci.c              | 137 +++++++++++++++++++++++++++++++++++++++++++++
>  linux-headers/linux/vfio.h |   3 +
>  2 files changed, 140 insertions(+)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 3e2289c..c1676cf 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2982,6 +2982,123 @@ static void vfio_vm_change_state_handler(void *pv, int running, RunState state)
>      vbasedev->device_state = dev_state;
>  }
>  
> +static int vfio_device_put(QEMUFile *f, void *pv, size_t size,
> +                           VMStateField *field, QJSON *vmdesc)
> +{
> +    VFIOPCIDevice *vdev = pv;
> +    PCIDevice *pdev = &vdev->pdev;
> +    int sz = vdev->device_state.size - VFIO_DEVICE_STATE_OFFSET;
> +    uint8_t *buf = NULL;
> +    uint32_t msi_cfg, msi_lo, msi_hi, msi_data, bar_cfg, i;
> +    bool msi_64bit;
> +
> +    for (i = 0; i < PCI_ROM_SLOT; i++) {
> +        bar_cfg = pci_default_read_config(pdev, PCI_BASE_ADDRESS_0 + i * 4, 4);
> +        qemu_put_be32(f, bar_cfg);
> +    }
> +
> +    msi_cfg = pci_default_read_config(pdev, pdev->msi_cap + PCI_MSI_FLAGS, 2);
> +    msi_64bit = !!(msi_cfg & PCI_MSI_FLAGS_64BIT);
> +
> +    msi_lo = pci_default_read_config(pdev,
> +                                     pdev->msi_cap + PCI_MSI_ADDRESS_LO, 4);
> +    qemu_put_be32(f, msi_lo);
> +
> +    if (msi_64bit) {
> +        msi_hi = pci_default_read_config(pdev,
> +                                         pdev->msi_cap + PCI_MSI_ADDRESS_HI,
> +                                         4);
> +        qemu_put_be32(f, msi_hi);
> +    }
> +
> +    msi_data = pci_default_read_config(pdev,
> +               pdev->msi_cap + (msi_64bit ? PCI_MSI_DATA_64 : PCI_MSI_DATA_32),
> +               2);
> +    qemu_put_be32(f, msi_data);
> +
> +    buf = g_malloc(sz);
> +    if (buf == NULL) {
> +        error_report("vfio: Failed to allocate memory for migrate");
> +        goto exit;

Yet exit does a 'return 0' - so we've not failed the migration and we've
generated a broken migration stream.
(same in the other exit cases)
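
i.e. something like this sketch (untested, and assuming a QEMU where the
.put return value is checked), so the failure actually propagates:

  if (pread(vdev->vbasedev.fd, buf, sz,
            vdev->device_state.offset + VFIO_DEVICE_STATE_OFFSET) != sz) {
      error_report("vfio: Failed to read Device State Region");
      g_free(buf);
      return -EIO;    /* non-zero from .put fails the migration */
  }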

> +    }
> +
> +    if (pread(vdev->vbasedev.fd, buf, sz,
> +              vdev->device_state.offset + VFIO_DEVICE_STATE_OFFSET) != sz) {
> +        error_report("vfio: Failed to read Device State Region");
> +        goto exit;
> +    }
> +
> +    qemu_put_buffer(f, buf, sz);
> +
> +exit:
> +    g_free(buf);
> +
> +    return 0;
> +}

We really try and avoid qemu_put_* and qemu_get* as much as possible
these days.
Please try building a data structure (possibly in a pre_save) and then
constructing a VMState with all the fields in it.
VMSTATE_WITH_TMP might help you in this case.

Doing it that way is much more robust against later changes.
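
The shape is roughly the sketch below (struct and field names invented
here, and it assumes a QEMU where pre_save returns int):

  typedef struct VFIOPCITmp {
      VFIOPCIDevice *parent;                /* must be the first field */
      uint32_t bar_cfg[PCI_ROM_SLOT];
      uint32_t msi_lo, msi_hi, msi_data;
  } VFIOPCITmp;

  static int vfio_pci_tmp_pre_save(void *opaque)
  {
      VFIOPCITmp *tmp = opaque;
      VFIOPCIDevice *vdev = tmp->parent;

      tmp->msi_lo = pci_default_read_config(&vdev->pdev,
                        vdev->pdev.msi_cap + PCI_MSI_ADDRESS_LO, 4);
      /* ...and similarly fill bar_cfg[] and the rest of the MSI state */
      return 0;
  }

  static const VMStateDescription vmstate_vfio_pci_tmp = {
      .name = "vfio-pci/tmp",
      .pre_save = vfio_pci_tmp_pre_save,
      /* a post_load here would write the fields back on the destination */
      .fields = (VMStateField[]) {
          VMSTATE_UINT32_ARRAY(bar_cfg, VFIOPCITmp, PCI_ROM_SLOT),
          VMSTATE_UINT32(msi_lo, VFIOPCITmp),
          VMSTATE_UINT32(msi_hi, VFIOPCITmp),
          VMSTATE_UINT32(msi_data, VFIOPCITmp),
          VMSTATE_END_OF_LIST()
      },
  };

  /* and in vfio_pci_vmstate.fields:
   *     VMSTATE_WITH_TMP(VFIOPCIDevice, VFIOPCITmp, vmstate_vfio_pci_tmp),
   */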

Also, I wonder if it's a good idea to include 'sz' (or at least
state.size) in the migration stream; if it was different on the
destination then things would go badly wrong. I don't know enough about
mdev to know if that can happen.  You'd need to validate it when it's
loaded.
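
e.g. something along these lines (a sketch):

  /* save side: qemu_put_be64(f, sz); before writing the buffer itself */

  /* load side: sanity-check the size before reading the buffer */
  uint64_t remote_sz = qemu_get_be64(f);
  if (remote_sz != sz) {
      error_report("vfio: device state size mismatch (%" PRIu64 " vs %d)",
                   remote_sz, sz);
      return -EINVAL;
  }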

Dave

> +
> +static int vfio_device_get(QEMUFile *f, void *pv,
> +                           size_t size, VMStateField *field)
> +{
> +    VFIOPCIDevice *vdev = pv;
> +    PCIDevice *pdev = &vdev->pdev;
> +    int sz = vdev->device_state.size - VFIO_DEVICE_STATE_OFFSET;
> +    uint8_t *buf = NULL;
> +    uint32_t ctl, msi_lo, msi_hi, msi_data, bar_cfg, i;
> +    bool msi_64bit;
> +
> +    /* retore pci bar configuration */
> +    ctl = pci_default_read_config(pdev, PCI_COMMAND, 2);
> +    vfio_pci_write_config(pdev, PCI_COMMAND,
> +                          ctl & (!(PCI_COMMAND_IO | PCI_COMMAND_MEMORY)), 2);
> +    for (i = 0; i < PCI_ROM_SLOT; i++) {
> +        bar_cfg = qemu_get_be32(f);
> +        vfio_pci_write_config(pdev, PCI_BASE_ADDRESS_0 + i * 4, bar_cfg, 4);
> +    }
> +    vfio_pci_write_config(pdev, PCI_COMMAND,
> +                          ctl | PCI_COMMAND_IO | PCI_COMMAND_MEMORY, 2);
> +
> +    /* restore msi configuration */
> +    ctl = pci_default_read_config(pdev, pdev->msi_cap + PCI_MSI_FLAGS, 2);
> +    msi_64bit = !!(ctl & PCI_MSI_FLAGS_64BIT);
> +
> +    vfio_pci_write_config(&vdev->pdev,
> +                          pdev->msi_cap + PCI_MSI_FLAGS,
> +                          ctl & (!PCI_MSI_FLAGS_ENABLE), 2);
> +
> +    msi_lo = qemu_get_be32(f);
> +    vfio_pci_write_config(pdev, pdev->msi_cap + PCI_MSI_ADDRESS_LO, msi_lo, 4);
> +
> +    if (msi_64bit) {
> +        msi_hi = qemu_get_be32(f);
> +        vfio_pci_write_config(pdev, pdev->msi_cap + PCI_MSI_ADDRESS_HI,
> +                              msi_hi, 4);
> +    }
> +    msi_data = qemu_get_be32(f);
> +    vfio_pci_write_config(pdev,
> +              pdev->msi_cap + (msi_64bit ? PCI_MSI_DATA_64 : PCI_MSI_DATA_32),
> +              msi_data, 2);
> +
> +    vfio_pci_write_config(&vdev->pdev, pdev->msi_cap + PCI_MSI_FLAGS,
> +                          ctl | PCI_MSI_FLAGS_ENABLE, 2);
> +
> +    buf = g_malloc(sz);
> +    if (buf == NULL) {
> +        error_report("vfio: Failed to allocate memory for migrate");
> +        return -1;
> +    }
> +
> +    qemu_get_buffer(f, buf, sz);
> +    if (pwrite(vdev->vbasedev.fd, buf, sz,
> +               vdev->device_state.offset + VFIO_DEVICE_STATE_OFFSET) != sz) {
> +        error_report("vfio: Failed to write Device State Region");
> +        return -1;
> +    }
> +
> +    g_free(buf);
> +
> +    return 0;
> +}
> +
>  static void vfio_instance_init(Object *obj)
>  {
>      PCIDevice *pci_dev = PCI_DEVICE(obj);
> @@ -3026,9 +3143,29 @@ static Property vfio_pci_dev_properties[] = {
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> +static const VMStateInfo vfio_vmstate_info = {
> +    .name = "vfio-state",
> +    .get = vfio_device_get,
> +    .put = vfio_device_put,
> +};
> +
>  static VMStateDescription vfio_pci_vmstate = {
>      .name = "vfio-pci",
>      .unmigratable = 1,
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields = (VMStateField[]) {
> +        {
> +            .name         = "vfio dev",
> +            .version_id   = 0,
> +            .field_exists = NULL,
> +            .size         = 0,
> +            .info         = &vfio_vmstate_info,
> +            .flags        = VMS_SINGLE,
> +            .offset       = 0,
> +         },
> +        VMSTATE_END_OF_LIST()
> +    },
>  };
>  
>  static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
> index 4ddeebc..4451a8f 100644
> --- a/linux-headers/linux/vfio.h
> +++ b/linux-headers/linux/vfio.h
> @@ -303,6 +303,9 @@ struct vfio_region_info_cap_type {
>  /* Mdev sub-type for device state save and restore */
>  #define VFIO_REGION_SUBTYPE_DEVICE_STATE	(4)
>  
> +/* Offset in region to save device state */
> +#define VFIO_DEVICE_STATE_OFFSET	1
> +
>  #define VFIO_DEVICE_START	0
>  #define VFIO_DEVICE_STOP	1
>  
> -- 
> 2.7.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH V3 0/4] vfio: Introduce Live migration capability to vfio_mdev device
  2018-03-05  6:00 [Qemu-devel] [PATCH V3 0/4] vfio: Introduce Live migration capability to vfio_mdev device Yulei Zhang
                   ` (5 preceding siblings ...)
  2018-03-05 13:02 ` [Qemu-devel] [PATCH V3 0/4] vfio: Introduce Live migration capability to vfio_mdev device Kirti Wankhede
@ 2018-03-12 22:21 ` Alex Williamson
  2018-03-13  8:16   ` Zhang, Yulei
  6 siblings, 1 reply; 15+ messages in thread
From: Alex Williamson @ 2018-03-12 22:21 UTC (permalink / raw)
  To: Yulei Zhang; +Cc: qemu-devel, kevin.tian, zhenyuw, kwankhede, Juan Quintela

[cc +Juan]

On Mon,  5 Mar 2018 14:00:49 +0800
Yulei Zhang <yulei.zhang@intel.com> wrote:

> Summary
> 
> This series RFC would like to resume the discussion about how to
> introduce the live migration capability to vfio mdev device. 
> 
> By adding a new vfio subtype region VFIO_REGION_SUBTYPE_DEVICE_STATE,
> the mdev device will be set to migratable if the new region exist
> during the initialization.  
> 
> The intention to add the new region is using it for mdev device status
> save and restore during the migration. The access to this region
> will be trapped and forward to the mdev device driver, it also uses 
> the first byte in the new region to control the running state of mdev
> device, so during the migration after stop the mdev driver, qemu could
> retrieve the specific device status from this region and transfer to 
> the target VM side for the mdev device restore.
> 
> In addition,  we add one new ioctl VFIO_IOMMU_GET_DIRTY_BITMAP to help do 
> the mdev device dirty page synchronization during the migration, currently
> it is just for static copy, in the future we would like to add new interface
> for the pre-copy.

Juan had concerns about another dirty bitmap implementation.  I'm not
sure what alternatives we have, but let's loop him in for guidance on
the best migration strategies.  The migration state for a device could
be many gigabytes.

> Below is the vfio_mdev device migration sequence
> Source VM side:
> 			start migration
> 				|
> 				V
>         	 get the cpu state change callback, write to the
>         	 subregion's first byte to stop the mdev device
> 				|
> 				V
> 		 quary the dirty page bitmap from iommu container 
> 		 and add into qemu dirty list for synchronization
> 				|
> 				V
> 		 save the deivce status into Qemufile which is 
>                      read from the vfio device subregion
> 
> Target VM side:
> 		   restore the mdev device after get the
> 		     saved status context from Qemufile
> 				|
> 				V
> 		     get the cpu state change callback
>  		     write to subregion's first byte to 
>                       start the mdev device to put it in 
>                       running status
> 				|
> 				V
> 			finish migration
> 
> V3->V2:
> 1. rebase the patch to Qemu stable 2.10 branch.
> 2. use a common name for the subregion instead of specific for 
>    intel IGD.

But it's still tied to Intel's vendor ID??

Thanks,
Alex


> 
> V1->V2:
> Per Alex's suggestion:
> 1. use device subtype region instead of VFIO PCI fixed region.
> 2. remove unnecessary ioctl, use the first byte of subregion to 
>    control the running state of mdev device.  
> 3. for dirty page synchronization, implement the interface with
>    VFIOContainer instead of vfio pci device.
> 
> Yulei Zhang (4):
>   vfio: introduce a new VFIO subregion for mdev device migration support
>   vfio: Add vm status change callback to stop/restart the mdev device
>   vfio: Add struct vfio_vmstate_info to introduce put/get callback
>     funtion for vfio device status save/restore
>   vifo: introduce new VFIO ioctl VFIO_IOMMU_GET_DIRTY_BITMAP
> 
>  hw/vfio/common.c              |  34 +++++++++
>  hw/vfio/pci.c                 | 171 +++++++++++++++++++++++++++++++++++++++++-
>  hw/vfio/pci.h                 |   1 +
>  include/hw/vfio/vfio-common.h |   1 +
>  linux-headers/linux/vfio.h    |  29 ++++++-
>  5 files changed, 232 insertions(+), 4 deletions(-)
> 


* Re: [Qemu-devel] [PATCH V3 0/4] vfio: Introduce Live migration capability to vfio_mdev device
  2018-03-12 22:21 ` Alex Williamson
@ 2018-03-13  8:16   ` Zhang, Yulei
  0 siblings, 0 replies; 15+ messages in thread
From: Zhang, Yulei @ 2018-03-13  8:16 UTC (permalink / raw)
  To: Alex Williamson
  Cc: qemu-devel, Tian, Kevin, zhenyuw, kwankhede, Juan Quintela



> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Tuesday, March 13, 2018 6:22 AM
> To: Zhang, Yulei <yulei.zhang@intel.com>
> Cc: qemu-devel@nongnu.org; Tian, Kevin <kevin.tian@intel.com>;
> zhenyuw@linux.intel.com; kwankhede@nvidia.com; Juan Quintela
> <quintela@redhat.com>
> Subject: Re: [PATCH V3 0/4] vfio: Introduce Live migration capability to
> vfio_mdev device
> 
> [cc +Juan]
> 
> On Mon,  5 Mar 2018 14:00:49 +0800
> Yulei Zhang <yulei.zhang@intel.com> wrote:
> 
> > Summary
> >
> > This series RFC would like to resume the discussion about how to
> > introduce the live migration capability to vfio mdev device.
> >
> > By adding a new vfio subtype region
> VFIO_REGION_SUBTYPE_DEVICE_STATE,
> > the mdev device will be set to migratable if the new region exist
> > during the initialization.
> >
> > The intention of adding the new region is to use it for mdev device status
> > save and restore during the migration. Accesses to this region are
> > trapped and forwarded to the mdev device driver, and the first byte of
> > the new region controls the running state of the mdev device, so during
> > the migration, after stopping the mdev driver, QEMU can retrieve the
> > device-specific status from this region and transfer it to the target VM
> > side for the mdev device restore.
> >
> > In addition, we add one new ioctl, VFIO_IOMMU_GET_DIRTY_BITMAP, to help
> > with mdev device dirty page synchronization during the migration;
> > currently it only covers the static copy, and in the future we would
> > like to add a new interface for pre-copy.
> 
> Juan had concerns about another dirty bitmap implementation.  I'm not
> sure what alternatives we have, but let's loop him in for guidance on
> the best migration strategies.  The migration state for a device could
> be many gigabytes.
> 
> > Below is the vfio_mdev device migration sequence
> > Source VM side:
> > 			start migration
> > 				|
> > 				V
> >         	 get the cpu state change callback, write to the
> >         	 subregion's first byte to stop the mdev device
> > 				|
> > 				V
> > 		 query the dirty page bitmap from the iommu container
> > 		 and add it into the QEMU dirty list for synchronization
> > 				|
> > 				V
> > 		 save the device status (read from the vfio
> >                      device subregion) into the QEMUFile
> >
> > Target VM side:
> > 		   restore the mdev device after getting the
> > 		     saved status context from the QEMUFile
> > 				|
> > 				V
> > 		     get the cpu state change callback
> >  		     write to subregion's first byte to
> >                       start the mdev device to put it in
> >                       running status
> > 				|
> > 				V
> > 			finish migration
> >
> > V3->V2:
> > 1. rebase the patch to Qemu stable 2.10 branch.
> > 2. use a common name for the subregion instead of one specific to
> >    Intel IGD.
> 
> But it's still tied to Intel's vendor ID??
> 
No, this is not necessary; I will remove the Intel vendor ID.
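
A vendor-neutral lookup might then look roughly like the sketch below. Note
that VFIO_REGION_TYPE_MDEV_DEVICE_STATE is a hypothetical constant used only
for illustration here; it is not defined by this series and would still have
to be added to the vfio UAPI header.

    struct vfio_region_info *device_state;

    /* Sketch: key the lookup off an assumed generic region type instead of
     * VFIO_REGION_TYPE_PCI_VENDOR_TYPE | PCI_VENDOR_ID_INTEL. */
    if (!vfio_get_dev_region_info(&vdev->vbasedev,
                                  VFIO_REGION_TYPE_MDEV_DEVICE_STATE,
                                  VFIO_REGION_SUBTYPE_DEVICE_STATE,
                                  &device_state)) {
        memcpy(&vdev->device_state, device_state,
               sizeof(struct vfio_region_info));
        g_free(device_state);
    }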

> Thanks,
> Alex
> 
> 
> >
> > V1->V2:
> > Per Alex's suggestion:
> > 1. use device subtype region instead of VFIO PCI fixed region.
> > 2. remove unnecessary ioctl, use the first byte of subregion to
> >    control the running state of mdev device.
> > 3. for dirty page synchronization, implement the interface with
> >    VFIOContainer instead of vfio pci device.
> >
> > Yulei Zhang (4):
> >   vfio: introduce a new VFIO subregion for mdev device migration support
> >   vfio: Add vm status change callback to stop/restart the mdev device
> >   vfio: Add struct vfio_vmstate_info to introduce put/get callback
> >     funtion for vfio device status save/restore
> >   vifo: introduce new VFIO ioctl VFIO_IOMMU_GET_DIRTY_BITMAP
> >
> >  hw/vfio/common.c              |  34 +++++++++
> >  hw/vfio/pci.c                 | 171
> +++++++++++++++++++++++++++++++++++++++++-
> >  hw/vfio/pci.h                 |   1 +
> >  include/hw/vfio/vfio-common.h |   1 +
> >  linux-headers/linux/vfio.h    |  29 ++++++-
> >  5 files changed, 232 insertions(+), 4 deletions(-)
> >

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH V3 1/4] vfio: introduce a new VFIO subregion for mdev device migration support
  2018-03-09 11:43   ` Dr. David Alan Gilbert
@ 2018-03-13  8:27     ` Zhang, Yulei
  0 siblings, 0 replies; 15+ messages in thread
From: Zhang, Yulei @ 2018-03-13  8:27 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Tian, Kevin, alex.williamson, kwankhede, zhenyuw



> -----Original Message-----
> From: Dr. David Alan Gilbert [mailto:dgilbert@redhat.com]
> Sent: Friday, March 9, 2018 7:43 PM
> To: Zhang, Yulei <yulei.zhang@intel.com>
> Cc: qemu-devel@nongnu.org; Tian, Kevin <kevin.tian@intel.com>;
> alex.williamson@redhat.com; kwankhede@nvidia.com;
> zhenyuw@linux.intel.com
> Subject: Re: [Qemu-devel] [PATCH V3 1/4] vfio: introduce a new VFIO
> subregion for mdev device migration support
> 
> * Yulei Zhang (yulei.zhang@intel.com) wrote:
> > New VFIO sub-region VFIO_REGION_SUBTYPE_DEVICE_STATE is added to fetch
> > and restore the status of the mdev device (vGPU) during live migration.
> >
> > Signed-off-by: Yulei Zhang <yulei.zhang@intel.com>
> > ---
> >  hw/vfio/pci.c              | 14 +++++++++++++-
> >  hw/vfio/pci.h              |  1 +
> >  linux-headers/linux/vfio.h |  9 ++++++---
> >  3 files changed, 20 insertions(+), 4 deletions(-)
> >
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 31e1edf..2fe20e4
> > 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -37,6 +37,7 @@
> >
> >  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);  static
> > void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
> > +static VMStateDescription vfio_pci_vmstate;
> >
> >  /*
> >   * Disabling BAR mmaping can be slow, but toggling it around INTx can
> > @@ -2813,6 +2814,17 @@ static void vfio_realize(PCIDevice *pdev, Error
> **errp)
> >          vfio_vga_quirk_setup(vdev);
> >      }
> >
> > +    struct vfio_region_info *device_state;
> > +    /* device state region setup */
> > +    if (!vfio_get_dev_region_info(&vdev->vbasedev,
> > +                VFIO_REGION_TYPE_PCI_VENDOR_TYPE |
> PCI_VENDOR_ID_INTEL,
> > +                VFIO_REGION_SUBTYPE_DEVICE_STATE, &device_state)) {
> > +        memcpy(&vdev->device_state, device_state,
> > +               sizeof(struct vfio_region_info));
> > +        g_free(device_state);
> > +        vfio_pci_vmstate.unmigratable = 0;
> > +    }
> 
> I don't think we've got any other code that changes the 'unmigratable'
> flag on a vmstate.  If you've got a device where you don't know whether it's
> migratable until runtime, then you should add a 'migration blocker'
> if the device isn't migratable.
> See for example the calls to migrate_add_blocker in hw/virtio/vhost.c
> 
> Dave
> 
Thanks, I see. We will leave the default value of unmigratable as 0 for
vfio-pci, and add a migration blocker if the subtype region doesn't exist.
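
A rough sketch of that direction is below. This is illustrative only (not the
code posted in this series): it assumes a new Error *migration_blocker member
in VFIOPCIDevice, the helper name is made up, and it relies on the blocker API
from include/migration/blocker.h.

    #include "qemu/osdep.h"
    #include "qapi/error.h"
    #include "migration/blocker.h"

    /* Sketch: register a per-device migration blocker when the device-state
     * subregion is absent, instead of toggling vfio_pci_vmstate.unmigratable
     * at runtime. */
    static void vfio_setup_migration(VFIOPCIDevice *vdev, Error **errp)
    {
        struct vfio_region_info *device_state;
        Error *err = NULL;

        if (!vfio_get_dev_region_info(&vdev->vbasedev,
                                      VFIO_REGION_TYPE_PCI_VENDOR_TYPE |
                                      PCI_VENDOR_ID_INTEL,
                                      VFIO_REGION_SUBTYPE_DEVICE_STATE,
                                      &device_state)) {
            memcpy(&vdev->device_state, device_state,
                   sizeof(struct vfio_region_info));
            g_free(device_state);
            return;                /* region present: the device can migrate */
        }

        /* No device-state region: this device cannot be migrated. */
        error_setg(&vdev->migration_blocker,
                   "VFIO device %s has no device-state region, migration disabled",
                   vdev->vbasedev.name);
        if (migrate_add_blocker(vdev->migration_blocker, &err) < 0) {
            error_propagate(errp, err);
            error_free(vdev->migration_blocker);
            vdev->migration_blocker = NULL;
        }
    }

The blocker would be dropped again with migrate_del_blocker() and error_free()
when the device is unrealized.
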

> >      for (i = 0; i < PCI_ROM_SLOT; i++) {
> >          vfio_bar_quirk_setup(vdev, i);
> >      }
> > @@ -2994,7 +3006,7 @@ static Property vfio_pci_dev_properties[] = {
> >      DEFINE_PROP_END_OF_LIST(),
> >  };
> >
> > -static const VMStateDescription vfio_pci_vmstate = {
> > +static VMStateDescription vfio_pci_vmstate = {
> >      .name = "vfio-pci",
> >      .unmigratable = 1,
> >  };
> > diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h index a8366bb..6a1d26e
> > 100644
> > --- a/hw/vfio/pci.h
> > +++ b/hw/vfio/pci.h
> > @@ -116,6 +116,7 @@ typedef struct VFIOPCIDevice {
> >      VFIOBAR bars[PCI_NUM_REGIONS - 1]; /* No ROM */
> >      VFIOVGA *vga; /* 0xa0000, 0x3b0, 0x3c0 */
> >      void *igd_opregion;
> > +    struct vfio_region_info device_state;
> >      PCIHostDeviceAddress host;
> >      EventNotifier err_notifier;
> >      EventNotifier req_notifier;
> > diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
> > index 4e7ab4c..c3b8e4a 100644
> > --- a/linux-headers/linux/vfio.h
> > +++ b/linux-headers/linux/vfio.h
> > @@ -296,9 +296,12 @@ struct vfio_region_info_cap_type {
> >  #define VFIO_REGION_TYPE_PCI_VENDOR_MASK	(0xffff)
> >
> >  /* 8086 Vendor sub-types */
> > -#define VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION	(1)
> > -#define VFIO_REGION_SUBTYPE_INTEL_IGD_HOST_CFG	(2)
> > -#define VFIO_REGION_SUBTYPE_INTEL_IGD_LPC_CFG	(3)
> > +#define VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION		(1)
> > +#define VFIO_REGION_SUBTYPE_INTEL_IGD_HOST_CFG		(2)
> > +#define VFIO_REGION_SUBTYPE_INTEL_IGD_LPC_CFG		(3)
> > +
> > +/* Mdev sub-type for device state save and restore */
> > +#define VFIO_REGION_SUBTYPE_DEVICE_STATE	(4)
> >
> >  /**
> >   * VFIO_DEVICE_GET_IRQ_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 9,
> > --
> > 2.7.4
> >
> >
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH V3 3/4] vfio: Add struct vfio_vmstate_info to introduce put/get callback funtion for vfio device status save/restore
  2018-03-09 11:59   ` Dr. David Alan Gilbert
@ 2018-03-14  8:52     ` Zhang, Yulei
  0 siblings, 0 replies; 15+ messages in thread
From: Zhang, Yulei @ 2018-03-14  8:52 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Tian, Kevin, alex.williamson, kwankhede, zhenyuw



> -----Original Message-----
> From: Dr. David Alan Gilbert [mailto:dgilbert@redhat.com]
> Sent: Friday, March 9, 2018 7:59 PM
> To: Zhang, Yulei <yulei.zhang@intel.com>
> Cc: qemu-devel@nongnu.org; Tian, Kevin <kevin.tian@intel.com>;
> alex.williamson@redhat.com; kwankhede@nvidia.com;
> zhenyuw@linux.intel.com
> Subject: Re: [Qemu-devel] [PATCH V3 3/4] vfio: Add struct vfio_vmstate_info
> to introduce put/get callback funtion for vfio device status save/restore
> 
> * Yulei Zhang (yulei.zhang@intel.com) wrote:
> > Introduce the vfio_device_put/vfio_device_get functions for vfio device
> > state save/restore usage.
> >
> > For VFIO PCI device status migration, on the source side the function
> > vfio_device_put saves the following state:
> > 1. PCI configuration space addr0~addr5
> > 2. PCI configuration space msi_addr and msi_data
> > 3. PCI device status fetched from the device driver
> >
> > And on the target side the function vfio_device_get restores the
> > same state:
> > 1. re-setup the PCI BAR configuration
> > 2. re-setup the PCI device MSI configuration
> > 3. restore the PCI device status
> >
> > Signed-off-by: Yulei Zhang <yulei.zhang@intel.com>
> > ---
> >  hw/vfio/pci.c              | 137
> +++++++++++++++++++++++++++++++++++++++++++++
> >  linux-headers/linux/vfio.h |   3 +
> >  2 files changed, 140 insertions(+)
> >
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 3e2289c..c1676cf
> > 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -2982,6 +2982,123 @@ static void
> vfio_vm_change_state_handler(void *pv, int running, RunState state)
> >      vbasedev->device_state = dev_state;  }
> >
> > +static int vfio_device_put(QEMUFile *f, void *pv, size_t size,
> > +                           VMStateField *field, QJSON *vmdesc) {
> > +    VFIOPCIDevice *vdev = pv;
> > +    PCIDevice *pdev = &vdev->pdev;
> > +    int sz = vdev->device_state.size - VFIO_DEVICE_STATE_OFFSET;
> > +    uint8_t *buf = NULL;
> > +    uint32_t msi_cfg, msi_lo, msi_hi, msi_data, bar_cfg, i;
> > +    bool msi_64bit;
> > +
> > +    for (i = 0; i < PCI_ROM_SLOT; i++) {
> > +        bar_cfg = pci_default_read_config(pdev, PCI_BASE_ADDRESS_0 + i *
> 4, 4);
> > +        qemu_put_be32(f, bar_cfg);
> > +    }
> > +
> > +    msi_cfg = pci_default_read_config(pdev, pdev->msi_cap +
> PCI_MSI_FLAGS, 2);
> > +    msi_64bit = !!(msi_cfg & PCI_MSI_FLAGS_64BIT);
> > +
> > +    msi_lo = pci_default_read_config(pdev,
> > +                                     pdev->msi_cap + PCI_MSI_ADDRESS_LO, 4);
> > +    qemu_put_be32(f, msi_lo);
> > +
> > +    if (msi_64bit) {
> > +        msi_hi = pci_default_read_config(pdev,
> > +                                         pdev->msi_cap + PCI_MSI_ADDRESS_HI,
> > +                                         4);
> > +        qemu_put_be32(f, msi_hi);
> > +    }
> > +
> > +    msi_data = pci_default_read_config(pdev,
> > +               pdev->msi_cap + (msi_64bit ? PCI_MSI_DATA_64 :
> PCI_MSI_DATA_32),
> > +               2);
> > +    qemu_put_be32(f, msi_data);
> > +
> > +    buf = g_malloc(sz);
> > +    if (buf == NULL) {
> > +        error_report("vfio: Failed to allocate memory for migrate");
> > +        goto exit;
> 
> Yet exit does a 'return 0' - so we've not failed the migration and we've
> generated a broken migration stream.
> (same in the other exit cases)
> 
> > +    }
> > +
> > +    if (pread(vdev->vbasedev.fd, buf, sz,
> > +              vdev->device_state.offset + VFIO_DEVICE_STATE_OFFSET) != sz) {
> > +        error_report("vfio: Failed to read Device State Region");
> > +        goto exit;
> > +    }
> > +
> > +    qemu_put_buffer(f, buf, sz);
> > +
> > +exit:
> > +    g_free(buf);
> > +
> > +    return 0;
> > +}
> 
> We really try and avoid qemu_put_* and qemu_get_* as much as possible
> these days.
> Please try building a data structure (possibly in a pre_save) and then
> constructing a VMState with all the fields in.
> VMSTATE_WITH_TMP might help you in this case.
> 
> Doing it that way is much more robust against later changes.
> 
> Also, I wonder if it's a good idea to include 'sz' (or at least
> state.size) in the migration stream; if it was different on the destination
> then things would go badly wrong. I don't know enough about mdev
> to know if that can happen.  You'd need to validate it when it's
> loaded.
> 
> Dave
> 
Hi Dave, this size is not included in the migration stream; it is reported by
the device state region when vfio-pci initializes, and its value varies across
different mdev devices.
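
For what it's worth, a minimal sketch of the VMSTATE_WITH_TMP shape you
describe might look like the below. This is illustrative only, not the code
posted here: it assumes a QEMU tree where .pre_save returns int, and it covers
only the fixed-size PCI registers, leaving the variable-size device-state
buffer to a separate field or get/put handler.

    /* Temporary structure filled in pre_save; the first member must be the
     * pointer back to the real device state (required by VMSTATE_WITH_TMP). */
    typedef struct VFIOPCITmp {
        VFIOPCIDevice *parent;
        uint32_t bars[PCI_ROM_SLOT];
        uint32_t msi_lo;
        uint32_t msi_hi;
        uint32_t msi_data;
    } VFIOPCITmp;

    static int vfio_pci_tmp_pre_save(void *opaque)
    {
        VFIOPCITmp *tmp = opaque;
        PCIDevice *pdev = &tmp->parent->pdev;
        bool msi_64bit;
        int i;

        /* Snapshot the BAR and MSI registers into plain fields so the stream
         * layout is described by VMState instead of hand-rolled puts/gets. */
        for (i = 0; i < PCI_ROM_SLOT; i++) {
            tmp->bars[i] = pci_default_read_config(pdev,
                                                   PCI_BASE_ADDRESS_0 + i * 4, 4);
        }

        msi_64bit = pci_default_read_config(pdev, pdev->msi_cap + PCI_MSI_FLAGS, 2)
                    & PCI_MSI_FLAGS_64BIT;
        tmp->msi_lo = pci_default_read_config(pdev,
                                              pdev->msi_cap + PCI_MSI_ADDRESS_LO, 4);
        tmp->msi_hi = msi_64bit ?
            pci_default_read_config(pdev, pdev->msi_cap + PCI_MSI_ADDRESS_HI, 4) : 0;
        tmp->msi_data = pci_default_read_config(pdev,
            pdev->msi_cap + (msi_64bit ? PCI_MSI_DATA_64 : PCI_MSI_DATA_32), 2);
        return 0;
    }

    static const VMStateDescription vmstate_vfio_pci_tmp = {
        .name = "vfio-pci/tmp",
        .pre_save = vfio_pci_tmp_pre_save,
        .fields = (VMStateField[]) {
            VMSTATE_UINT32_ARRAY(bars, VFIOPCITmp, PCI_ROM_SLOT),
            VMSTATE_UINT32(msi_lo, VFIOPCITmp),
            VMSTATE_UINT32(msi_hi, VFIOPCITmp),
            VMSTATE_UINT32(msi_data, VFIOPCITmp),
            VMSTATE_END_OF_LIST()
        }
    };

    /* ...and in vfio_pci_vmstate.fields:
     *     VMSTATE_WITH_TMP(VFIOPCIDevice, VFIOPCITmp, vmstate_vfio_pci_tmp),
     */

A matching post_load on the tmp description would then replay these registers
with vfio_pci_write_config() on the destination, replacing the open-coded
qemu_get_be32() calls.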

Yulei
> > +
> > +static int vfio_device_get(QEMUFile *f, void *pv,
> > +                           size_t size, VMStateField *field) {
> > +    VFIOPCIDevice *vdev = pv;
> > +    PCIDevice *pdev = &vdev->pdev;
> > +    int sz = vdev->device_state.size - VFIO_DEVICE_STATE_OFFSET;
> > +    uint8_t *buf = NULL;
> > +    uint32_t ctl, msi_lo, msi_hi, msi_data, bar_cfg, i;
> > +    bool msi_64bit;
> > +
> > +    /* restore pci bar configuration */
> > +    ctl = pci_default_read_config(pdev, PCI_COMMAND, 2);
> > +    vfio_pci_write_config(pdev, PCI_COMMAND,
> > +                          ctl & (!(PCI_COMMAND_IO | PCI_COMMAND_MEMORY)),
> 2);
> > +    for (i = 0; i < PCI_ROM_SLOT; i++) {
> > +        bar_cfg = qemu_get_be32(f);
> > +        vfio_pci_write_config(pdev, PCI_BASE_ADDRESS_0 + i * 4, bar_cfg, 4);
> > +    }
> > +    vfio_pci_write_config(pdev, PCI_COMMAND,
> > +                          ctl | PCI_COMMAND_IO | PCI_COMMAND_MEMORY,
> > + 2);
> > +
> > +    /* restore msi configuration */
> > +    ctl = pci_default_read_config(pdev, pdev->msi_cap + PCI_MSI_FLAGS,
> 2);
> > +    msi_64bit = !!(ctl & PCI_MSI_FLAGS_64BIT);
> > +
> > +    vfio_pci_write_config(&vdev->pdev,
> > +                          pdev->msi_cap + PCI_MSI_FLAGS,
> > +                          ctl & (!PCI_MSI_FLAGS_ENABLE), 2);
> > +
> > +    msi_lo = qemu_get_be32(f);
> > +    vfio_pci_write_config(pdev, pdev->msi_cap + PCI_MSI_ADDRESS_LO,
> > + msi_lo, 4);
> > +
> > +    if (msi_64bit) {
> > +        msi_hi = qemu_get_be32(f);
> > +        vfio_pci_write_config(pdev, pdev->msi_cap + PCI_MSI_ADDRESS_HI,
> > +                              msi_hi, 4);
> > +    }
> > +    msi_data = qemu_get_be32(f);
> > +    vfio_pci_write_config(pdev,
> > +              pdev->msi_cap + (msi_64bit ? PCI_MSI_DATA_64 :
> PCI_MSI_DATA_32),
> > +              msi_data, 2);
> > +
> > +    vfio_pci_write_config(&vdev->pdev, pdev->msi_cap + PCI_MSI_FLAGS,
> > +                          ctl | PCI_MSI_FLAGS_ENABLE, 2);
> > +
> > +    buf = g_malloc(sz);
> > +    if (buf == NULL) {
> > +        error_report("vfio: Failed to allocate memory for migrate");
> > +        return -1;
> > +    }
> > +
> > +    qemu_get_buffer(f, buf, sz);
> > +    if (pwrite(vdev->vbasedev.fd, buf, sz,
> > +               vdev->device_state.offset + VFIO_DEVICE_STATE_OFFSET) != sz) {
> > +        error_report("vfio: Failed to write Device State Region");
> > +        return -1;
> > +    }
> > +
> > +    g_free(buf);
> > +
> > +    return 0;
> > +}
> > +
> >  static void vfio_instance_init(Object *obj)  {
> >      PCIDevice *pci_dev = PCI_DEVICE(obj); @@ -3026,9 +3143,29 @@
> > static Property vfio_pci_dev_properties[] = {
> >      DEFINE_PROP_END_OF_LIST(),
> >  };
> >
> > +static const VMStateInfo vfio_vmstate_info = {
> > +    .name = "vfio-state",
> > +    .get = vfio_device_get,
> > +    .put = vfio_device_put,
> > +};
> > +
> >  static VMStateDescription vfio_pci_vmstate = {
> >      .name = "vfio-pci",
> >      .unmigratable = 1,
> > +    .version_id = 1,
> > +    .minimum_version_id = 1,
> > +    .fields = (VMStateField[]) {
> > +        {
> > +            .name         = "vfio dev",
> > +            .version_id   = 0,
> > +            .field_exists = NULL,
> > +            .size         = 0,
> > +            .info         = &vfio_vmstate_info,
> > +            .flags        = VMS_SINGLE,
> > +            .offset       = 0,
> > +         },
> > +        VMSTATE_END_OF_LIST()
> > +    },
> >  };
> >
> >  static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
> > diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
> > index 4ddeebc..4451a8f 100644
> > --- a/linux-headers/linux/vfio.h
> > +++ b/linux-headers/linux/vfio.h
> > @@ -303,6 +303,9 @@ struct vfio_region_info_cap_type {
> >  /* Mdev sub-type for device state save and restore */
> >  #define VFIO_REGION_SUBTYPE_DEVICE_STATE	(4)
> >
> > +/* Offset in region to save device state */
> > +#define VFIO_DEVICE_STATE_OFFSET	1
> > +
> >  #define VFIO_DEVICE_START	0
> >  #define VFIO_DEVICE_STOP	1
> >
> > --
> > 2.7.4
> >
> >
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2018-03-14  8:53 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-05  6:00 [Qemu-devel] [PATCH V3 0/4] vfio: Introduce Live migration capability to vfio_mdev device Yulei Zhang
2018-03-05  6:00 ` no-reply
2018-03-05  6:00 ` [Qemu-devel] [PATCH V3 1/4] vfio: introduce a new VFIO subregion for mdev device migration support Yulei Zhang
2018-03-09 11:43   ` Dr. David Alan Gilbert
2018-03-13  8:27     ` Zhang, Yulei
2018-03-05  6:00 ` [Qemu-devel] [PATCH V3 2/4] vfio: Add vm status change callback to stop/restart the mdev device Yulei Zhang
2018-03-05  6:00 ` [Qemu-devel] [PATCH V3 3/4] vfio: Add struct vfio_vmstate_info to introduce put/get callback funtion for vfio device status save/restore Yulei Zhang
2018-03-09 11:59   ` Dr. David Alan Gilbert
2018-03-14  8:52     ` Zhang, Yulei
2018-03-05  6:00 ` [Qemu-devel] [PATCH V3 4/4] vifo: introduce new VFIO ioctl VFIO_IOMMU_GET_DIRTY_BITMAP Yulei Zhang
2018-03-05 13:02 ` [Qemu-devel] [PATCH V3 0/4] vfio: Introduce Live migration capability to vfio_mdev device Kirti Wankhede
2018-03-06 13:34   ` Zhang, Yulei
2018-03-07  2:11     ` Tian, Kevin
2018-03-12 22:21 ` Alex Williamson
2018-03-13  8:16   ` Zhang, Yulei
