[RFC PATCH v2 0/3] vfio: Some fixes and optimizations for VFIO migration

All of lore.kernel.org
 help / color / mirror / Atom feed

* [RFC PATCH v2 0/3] vfio: Some fixes and optimizations for VFIO migration
@ 2020-12-09  8:09 Shenming Lu
  2020-12-09  8:09 ` [RFC PATCH v2 1/3] vfio: Move the saving of the config space to the right place in " Shenming Lu
                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Shenming Lu @ 2020-12-09  8:09 UTC (permalink / raw)
  To: Alex Williamson, Kirti Wankhede, Cornelia Huck,
	Dr . David Alan Gilbert, Eric Auger, mst, Marcel Apfelbaum
  Cc: Lorenzo Pieralisi, Neo Jia, Marc Zyngier, qemu-devel, lushenming,
	qemu-arm, yuzenghui, wanghaibin.wang

This patch set includes two fixes and one optimization for VFIO migration
as blew:
Patch 1-2:
- Fix two ordering problems in migration.

Patch 3:
- Optimize the enabling process of the MSI-X vectors in migration.

Thanks,
Shenming

Shenming Lu (3):
  vfio: Move the saving of the config space to the right place in VFIO
    migration
  vfio: Set the priority of the VFIO VM state change handler explicitly
  vfio: Avoid disabling and enabling vectors repeatedly in VFIO
    migration

 hw/pci/msix.c         | 17 +++++++++++++++++
 hw/vfio/migration.c   | 28 +++++++++++++++++-----------
 hw/vfio/pci.c         | 10 ++++++++--
 include/hw/pci/msix.h |  2 ++
 4 files changed, 44 insertions(+), 13 deletions(-)

-- 
2.19.1



^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC PATCH v2 1/3] vfio: Move the saving of the config space to the right place in VFIO migration
  2020-12-09  8:09 [RFC PATCH v2 0/3] vfio: Some fixes and optimizations for VFIO migration Shenming Lu
@ 2020-12-09  8:09 ` Shenming Lu
  2020-12-09 12:29   ` Cornelia Huck
  2021-02-18 14:42   ` Kirti Wankhede
  2020-12-09  8:09 ` [RFC PATCH v2 2/3] vfio: Set the priority of the VFIO VM state change handler explicitly Shenming Lu
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 24+ messages in thread
From: Shenming Lu @ 2020-12-09  8:09 UTC (permalink / raw)
  To: Alex Williamson, Kirti Wankhede, Cornelia Huck,
	Dr . David Alan Gilbert, Eric Auger, mst, Marcel Apfelbaum
  Cc: Lorenzo Pieralisi, Neo Jia, Marc Zyngier, qemu-devel, lushenming,
	qemu-arm, yuzenghui, wanghaibin.wang

On ARM64 the VFIO SET_IRQS ioctl is dependent on the VM interrupt
setup, if the restoring of the VFIO PCI device config space is
before the VGIC, an error might occur in the kernel.

So we move the saving of the config space to the non-iterable
process, so that it will be called after the VGIC according to
their priorities.

As for the possible dependence of the device specific migration
data on it's config space, we can let the vendor driver to
include any config info it needs in its own data stream.
(Should we note this in the header file linux-headers/linux/vfio.h?)

Signed-off-by: Shenming Lu <lushenming@huawei.com>
---
 hw/vfio/migration.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 00daa50ed8..3b9de1353a 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -575,11 +575,6 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
         return ret;
     }
 
-    ret = vfio_save_device_config_state(f, opaque);
-    if (ret) {
-        return ret;
-    }
-
     ret = vfio_update_pending(vbasedev);
     if (ret) {
         return ret;
@@ -620,6 +615,19 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
     return ret;
 }
 
+static void vfio_save_state(QEMUFile *f, void *opaque)
+{
+    VFIODevice *vbasedev = opaque;
+    int ret;
+
+    /* The device specific data is migrated in the iterable process. */
+    ret = vfio_save_device_config_state(f, opaque);
+    if (ret) {
+        error_report("%s: Failed to save device config space",
+                     vbasedev->name);
+    }
+}
+
 static int vfio_load_setup(QEMUFile *f, void *opaque)
 {
     VFIODevice *vbasedev = opaque;
@@ -670,11 +678,7 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
         switch (data) {
         case VFIO_MIG_FLAG_DEV_CONFIG_STATE:
         {
-            ret = vfio_load_device_config_state(f, opaque);
-            if (ret) {
-                return ret;
-            }
-            break;
+            return vfio_load_device_config_state(f, opaque);
         }
         case VFIO_MIG_FLAG_DEV_SETUP_STATE:
         {
@@ -720,6 +724,7 @@ static SaveVMHandlers savevm_vfio_handlers = {
     .save_live_pending = vfio_save_pending,
     .save_live_iterate = vfio_save_iterate,
     .save_live_complete_precopy = vfio_save_complete_precopy,
+    .save_state = vfio_save_state,
     .load_setup = vfio_load_setup,
     .load_cleanup = vfio_load_cleanup,
     .load_state = vfio_load_state,
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH v2 2/3] vfio: Set the priority of the VFIO VM state change handler explicitly
  2020-12-09  8:09 [RFC PATCH v2 0/3] vfio: Some fixes and optimizations for VFIO migration Shenming Lu
  2020-12-09  8:09 ` [RFC PATCH v2 1/3] vfio: Move the saving of the config space to the right place in " Shenming Lu
@ 2020-12-09  8:09 ` Shenming Lu
  2020-12-09 12:45   ` Cornelia Huck
  2021-01-26 21:36   ` Alex Williamson
  2020-12-09  8:09 ` [RFC PATCH v2 3/3] vfio: Avoid disabling and enabling vectors repeatedly in VFIO migration Shenming Lu
  2021-01-26  6:03 ` [RFC PATCH v2 0/3] vfio: Some fixes and optimizations for " Shenming Lu
  3 siblings, 2 replies; 24+ messages in thread
From: Shenming Lu @ 2020-12-09  8:09 UTC (permalink / raw)
  To: Alex Williamson, Kirti Wankhede, Cornelia Huck,
	Dr . David Alan Gilbert, Eric Auger, mst, Marcel Apfelbaum
  Cc: Lorenzo Pieralisi, Neo Jia, Marc Zyngier, qemu-devel, lushenming,
	qemu-arm, yuzenghui, wanghaibin.wang

In the VFIO VM state change handler, VFIO devices are transitioned
in the _SAVING state, which should keep them from sending interrupts.
Then we can save the pending states of all interrupts in the GIC VM
state change handler (on ARM).

So we have to set the priority of the VFIO VM state change handler
explicitly (like virtio devices) to ensure it is called before the
GIC's in saving.

Signed-off-by: Shenming Lu <lushenming@huawei.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
---
 hw/vfio/migration.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 3b9de1353a..97ea82b100 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -862,7 +862,8 @@ static int vfio_migration_init(VFIODevice *vbasedev,
     register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1, &savevm_vfio_handlers,
                          vbasedev);
 
-    migration->vm_state = qemu_add_vm_change_state_handler(vfio_vmstate_change,
+    migration->vm_state = qdev_add_vm_change_state_handler(vbasedev->dev,
+                                                           vfio_vmstate_change,
                                                            vbasedev);
     migration->migration_state.notify = vfio_migration_state_notifier;
     add_migration_state_change_notifier(&migration->migration_state);
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH v2 3/3] vfio: Avoid disabling and enabling vectors repeatedly in VFIO migration
  2020-12-09  8:09 [RFC PATCH v2 0/3] vfio: Some fixes and optimizations for VFIO migration Shenming Lu
  2020-12-09  8:09 ` [RFC PATCH v2 1/3] vfio: Move the saving of the config space to the right place in " Shenming Lu
  2020-12-09  8:09 ` [RFC PATCH v2 2/3] vfio: Set the priority of the VFIO VM state change handler explicitly Shenming Lu
@ 2020-12-09  8:09 ` Shenming Lu
  2021-01-26 21:36   ` Alex Williamson
  2021-01-26  6:03 ` [RFC PATCH v2 0/3] vfio: Some fixes and optimizations for " Shenming Lu
  3 siblings, 1 reply; 24+ messages in thread
From: Shenming Lu @ 2020-12-09  8:09 UTC (permalink / raw)
  To: Alex Williamson, Kirti Wankhede, Cornelia Huck,
	Dr . David Alan Gilbert, Eric Auger, mst, Marcel Apfelbaum
  Cc: Lorenzo Pieralisi, Neo Jia, Marc Zyngier, qemu-devel, lushenming,
	qemu-arm, yuzenghui, wanghaibin.wang

Different from the normal situation when the guest starts, we can
know the max unmasked vetctor (at the beginning) after msix_load()
in VFIO migration. So in order to avoid ineffectively disabling and
enabling vectors repeatedly, let's allocate all needed vectors first
and then enable these unmasked vectors one by one without disabling.

Signed-off-by: Shenming Lu <lushenming@huawei.com>
---
 hw/pci/msix.c         | 17 +++++++++++++++++
 hw/vfio/pci.c         | 10 ++++++++--
 include/hw/pci/msix.h |  2 ++
 3 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index 67e34f34d6..bf291d3ff8 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -557,6 +557,23 @@ unsigned int msix_nr_vectors_allocated(const PCIDevice *dev)
     return dev->msix_entries_nr;
 }
 
+int msix_get_max_unmasked_vector(PCIDevice *dev)
+{
+    int max_unmasked_vector = -1;
+    int vector;
+
+    if ((dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &
+        (MSIX_ENABLE_MASK | MSIX_MASKALL_MASK)) == MSIX_ENABLE_MASK) {
+        for (vector = 0; vector < dev->msix_entries_nr; vector++) {
+            if (!msix_is_masked(dev, vector)) {
+                max_unmasked_vector = vector;
+            }
+        }
+    }
+
+    return max_unmasked_vector;
+}
+
 static int msix_set_notifier_for_vector(PCIDevice *dev, unsigned int vector)
 {
     MSIMessage msg;
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 51dc373695..e755ed2514 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -568,6 +568,9 @@ static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
 
 static void vfio_msix_enable(VFIOPCIDevice *vdev)
 {
+    int max_unmasked_vector = msix_get_max_unmasked_vector(&vdev->pdev);
+    unsigned int used_vector = MAX(max_unmasked_vector, 0);
+
     vfio_disable_interrupts(vdev);
 
     vdev->msi_vectors = g_new0(VFIOMSIVector, vdev->msix->entries);
@@ -586,9 +589,12 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
      * triggering to userspace, then immediately release the vector, leaving
      * the physical device with no vectors enabled, but MSI-X enabled, just
      * like the guest view.
+     * If there are unmasked vectors (such as in migration) which will be
+     * enabled soon, we can allocate them here to avoid ineffectively disabling
+     * and enabling vectors repeatedly later.
      */
-    vfio_msix_vector_do_use(&vdev->pdev, 0, NULL, NULL);
-    vfio_msix_vector_release(&vdev->pdev, 0);
+    vfio_msix_vector_do_use(&vdev->pdev, used_vector, NULL, NULL);
+    vfio_msix_vector_release(&vdev->pdev, used_vector);
 
     if (msix_set_vector_notifiers(&vdev->pdev, vfio_msix_vector_use,
                                   vfio_msix_vector_release, NULL)) {
diff --git a/include/hw/pci/msix.h b/include/hw/pci/msix.h
index 4c4a60c739..4bfb463fa6 100644
--- a/include/hw/pci/msix.h
+++ b/include/hw/pci/msix.h
@@ -23,6 +23,8 @@ void msix_uninit_exclusive_bar(PCIDevice *dev);
 
 unsigned int msix_nr_vectors_allocated(const PCIDevice *dev);
 
+int msix_get_max_unmasked_vector(PCIDevice *dev);
+
 void msix_save(PCIDevice *dev, QEMUFile *f);
 void msix_load(PCIDevice *dev, QEMUFile *f);
 
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 1/3] vfio: Move the saving of the config space to the right place in VFIO migration
  2020-12-09  8:09 ` [RFC PATCH v2 1/3] vfio: Move the saving of the config space to the right place in " Shenming Lu
@ 2020-12-09 12:29   ` Cornelia Huck
  2020-12-09 18:34     ` Alex Williamson
  2021-02-18 14:42   ` Kirti Wankhede
  1 sibling, 1 reply; 24+ messages in thread
From: Cornelia Huck @ 2020-12-09 12:29 UTC (permalink / raw)
  To: Shenming Lu
  Cc: Lorenzo Pieralisi, Neo Jia, mst, qemu-devel, Marc Zyngier,
	Kirti Wankhede, Dr . David Alan Gilbert, Eric Auger,
	Alex Williamson, qemu-arm, yuzenghui, wanghaibin.wang

On Wed, 9 Dec 2020 16:09:17 +0800
Shenming Lu <lushenming@huawei.com> wrote:

> On ARM64 the VFIO SET_IRQS ioctl is dependent on the VM interrupt
> setup, if the restoring of the VFIO PCI device config space is
> before the VGIC, an error might occur in the kernel.
> 
> So we move the saving of the config space to the non-iterable
> process, so that it will be called after the VGIC according to
> their priorities.
> 
> As for the possible dependence of the device specific migration
> data on it's config space, we can let the vendor driver to
> include any config info it needs in its own data stream.
> (Should we note this in the header file linux-headers/linux/vfio.h?)

Given that the header is our primary source about how this interface
should act, we need to properly document expectations about what will
be saved/restored when there (well, in the source file in the kernel.)
That goes in both directions: what a userspace must implement, and what
a vendor driver can rely on.

[Related, but not a todo for you: I think we're still missing proper
documentation of the whole migration feature.]

> 
> Signed-off-by: Shenming Lu <lushenming@huawei.com>
> ---
>  hw/vfio/migration.c | 25 +++++++++++++++----------
>  1 file changed, 15 insertions(+), 10 deletions(-)



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 2/3] vfio: Set the priority of the VFIO VM state change handler explicitly
  2020-12-09  8:09 ` [RFC PATCH v2 2/3] vfio: Set the priority of the VFIO VM state change handler explicitly Shenming Lu
@ 2020-12-09 12:45   ` Cornelia Huck
  2020-12-10  3:11     ` Shenming Lu
  2021-01-26 21:36   ` Alex Williamson
  1 sibling, 1 reply; 24+ messages in thread
From: Cornelia Huck @ 2020-12-09 12:45 UTC (permalink / raw)
  To: Shenming Lu
  Cc: Lorenzo Pieralisi, Neo Jia, mst, qemu-devel, Marc Zyngier,
	Kirti Wankhede, Dr . David Alan Gilbert, Eric Auger,
	Alex Williamson, qemu-arm, yuzenghui, wanghaibin.wang

On Wed, 9 Dec 2020 16:09:18 +0800
Shenming Lu <lushenming@huawei.com> wrote:

> In the VFIO VM state change handler, VFIO devices are transitioned
> in the _SAVING state, which should keep them from sending interrupts.
> Then we can save the pending states of all interrupts in the GIC VM
> state change handler (on ARM).
> 
> So we have to set the priority of the VFIO VM state change handler
> explicitly (like virtio devices) to ensure it is called before the
> GIC's in saving.

What this patch does is to make the priority of the vfio migration
state change handler depending on the position in the qdev tree. As all
state change handlers with no explicit priority are added at priority
0, this will make sure that this handler runs before (save) resp. after
(restore) nearly all other handlers, which will address your issue here
(and possibly similar ones).

So, this patch seems fine for now, but I'm wondering whether we need to
think more about priorities for handlers in general, and if there are
more hidden dependencies lurking in there.

> 
> Signed-off-by: Shenming Lu <lushenming@huawei.com>
> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
> ---
>  hw/vfio/migration.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 3b9de1353a..97ea82b100 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -862,7 +862,8 @@ static int vfio_migration_init(VFIODevice *vbasedev,
>      register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1, &savevm_vfio_handlers,
>                           vbasedev);
>  
> -    migration->vm_state = qemu_add_vm_change_state_handler(vfio_vmstate_change,
> +    migration->vm_state = qdev_add_vm_change_state_handler(vbasedev->dev,
> +                                                           vfio_vmstate_change,
>                                                             vbasedev);
>      migration->migration_state.notify = vfio_migration_state_notifier;
>      add_migration_state_change_notifier(&migration->migration_state);



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 1/3] vfio: Move the saving of the config space to the right place in VFIO migration
  2020-12-09 12:29   ` Cornelia Huck
@ 2020-12-09 18:34     ` Alex Williamson
  2020-12-10  2:21       ` Shenming Lu
  0 siblings, 1 reply; 24+ messages in thread
From: Alex Williamson @ 2020-12-09 18:34 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Lorenzo Pieralisi, Neo Jia, mst, Marc Zyngier,
	Dr . David Alan Gilbert, qemu-devel, Shenming Lu, Kirti Wankhede,
	qemu-arm, yuzenghui, wanghaibin.wang, Eric Auger

On Wed, 9 Dec 2020 13:29:47 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Wed, 9 Dec 2020 16:09:17 +0800
> Shenming Lu <lushenming@huawei.com> wrote:
> 
> > On ARM64 the VFIO SET_IRQS ioctl is dependent on the VM interrupt
> > setup, if the restoring of the VFIO PCI device config space is
> > before the VGIC, an error might occur in the kernel.
> > 
> > So we move the saving of the config space to the non-iterable
> > process, so that it will be called after the VGIC according to
> > their priorities.
> > 
> > As for the possible dependence of the device specific migration
> > data on it's config space, we can let the vendor driver to
> > include any config info it needs in its own data stream.
> > (Should we note this in the header file linux-headers/linux/vfio.h?)  
> 
> Given that the header is our primary source about how this interface
> should act, we need to properly document expectations about what will
> be saved/restored when there (well, in the source file in the kernel.)
> That goes in both directions: what a userspace must implement, and what
> a vendor driver can rely on.
> 
> [Related, but not a todo for you: I think we're still missing proper
> documentation of the whole migration feature.]

Yes, we never saw anything past v1 of the documentation patch.  Thanks,

Alex


> > Signed-off-by: Shenming Lu <lushenming@huawei.com>
> > ---
> >  hw/vfio/migration.c | 25 +++++++++++++++----------
> >  1 file changed, 15 insertions(+), 10 deletions(-)  



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 1/3] vfio: Move the saving of the config space to the right place in VFIO migration
  2020-12-09 18:34     ` Alex Williamson
@ 2020-12-10  2:21       ` Shenming Lu
  2021-01-26 21:36         ` Alex Williamson
  0 siblings, 1 reply; 24+ messages in thread
From: Shenming Lu @ 2020-12-10  2:21 UTC (permalink / raw)
  To: Alex Williamson, Cornelia Huck
  Cc: Lorenzo Pieralisi, Neo Jia, mst, Marc Zyngier, qemu-devel,
	Dr . David Alan Gilbert, Eric Auger, Kirti Wankhede, qemu-arm,
	yuzenghui, wanghaibin.wang

On 2020/12/10 2:34, Alex Williamson wrote:
> On Wed, 9 Dec 2020 13:29:47 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
>> On Wed, 9 Dec 2020 16:09:17 +0800
>> Shenming Lu <lushenming@huawei.com> wrote:
>>
>>> On ARM64 the VFIO SET_IRQS ioctl is dependent on the VM interrupt
>>> setup, if the restoring of the VFIO PCI device config space is
>>> before the VGIC, an error might occur in the kernel.
>>>
>>> So we move the saving of the config space to the non-iterable
>>> process, so that it will be called after the VGIC according to
>>> their priorities.
>>>
>>> As for the possible dependence of the device specific migration
>>> data on it's config space, we can let the vendor driver to
>>> include any config info it needs in its own data stream.
>>> (Should we note this in the header file linux-headers/linux/vfio.h?)  
>>
>> Given that the header is our primary source about how this interface
>> should act, we need to properly document expectations about what will
>> be saved/restored when there (well, in the source file in the kernel.)
>> That goes in both directions: what a userspace must implement, and what
>> a vendor driver can rely on.

Yeah, in order to make the vendor driver and QEMU cooperate better, we might
need to document some expectations about the data section in the migration
region...

>>
>> [Related, but not a todo for you: I think we're still missing proper
>> documentation of the whole migration feature.]
> 
> Yes, we never saw anything past v1 of the documentation patch.  Thanks,
> 

By the way, is there anything unproper with this patch? Wish your suggestion. :-)

> Alex
> 
> 
>>> Signed-off-by: Shenming Lu <lushenming@huawei.com>
>>> ---
>>>  hw/vfio/migration.c | 25 +++++++++++++++----------
>>>  1 file changed, 15 insertions(+), 10 deletions(-)  
> 
> .
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 2/3] vfio: Set the priority of the VFIO VM state change handler explicitly
  2020-12-09 12:45   ` Cornelia Huck
@ 2020-12-10  3:11     ` Shenming Lu
  2020-12-10 18:39       ` Cornelia Huck
  0 siblings, 1 reply; 24+ messages in thread
From: Shenming Lu @ 2020-12-10  3:11 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Lorenzo Pieralisi, Neo Jia, mst, qemu-devel, Marc Zyngier,
	Kirti Wankhede, Dr . David Alan Gilbert, Eric Auger,
	Alex Williamson, qemu-arm, yuzenghui, wanghaibin.wang

On 2020/12/9 20:45, Cornelia Huck wrote:
> On Wed, 9 Dec 2020 16:09:18 +0800
> Shenming Lu <lushenming@huawei.com> wrote:
> 
>> In the VFIO VM state change handler, VFIO devices are transitioned
>> in the _SAVING state, which should keep them from sending interrupts.
>> Then we can save the pending states of all interrupts in the GIC VM
>> state change handler (on ARM).
>>
>> So we have to set the priority of the VFIO VM state change handler
>> explicitly (like virtio devices) to ensure it is called before the
>> GIC's in saving.
> 
> What this patch does is to make the priority of the vfio migration
> state change handler depending on the position in the qdev tree. As all
> state change handlers with no explicit priority are added at priority
> 0, this will make sure that this handler runs before (save) resp. after
> (restore) nearly all other handlers, which will address your issue here
> (and possibly similar ones).
> 
> So, this patch seems fine for now, but I'm wondering whether we need to
> think more about priorities for handlers in general, and if there are
> more hidden dependencies lurking in there.

As far as I know, as for the migration of interrupt, on x86 the sync from
the PIR field to the Virtual-APIC page for posted interrupts (in
KVM_GET_LAPIC ioctl) is after the pause of VFIO devices, which is fine.
Not sure about others...

Thanks,
Shenming

> 
>>
>> Signed-off-by: Shenming Lu <lushenming@huawei.com>
>> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
>> ---
>>  hw/vfio/migration.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index 3b9de1353a..97ea82b100 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -862,7 +862,8 @@ static int vfio_migration_init(VFIODevice *vbasedev,
>>      register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1, &savevm_vfio_handlers,
>>                           vbasedev);
>>  
>> -    migration->vm_state = qemu_add_vm_change_state_handler(vfio_vmstate_change,
>> +    migration->vm_state = qdev_add_vm_change_state_handler(vbasedev->dev,
>> +                                                           vfio_vmstate_change,
>>                                                             vbasedev);
>>      migration->migration_state.notify = vfio_migration_state_notifier;
>>      add_migration_state_change_notifier(&migration->migration_state);
> 
> .
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 2/3] vfio: Set the priority of the VFIO VM state change handler explicitly
  2020-12-10  3:11     ` Shenming Lu
@ 2020-12-10 18:39       ` Cornelia Huck
  0 siblings, 0 replies; 24+ messages in thread
From: Cornelia Huck @ 2020-12-10 18:39 UTC (permalink / raw)
  To: Shenming Lu
  Cc: Lorenzo Pieralisi, Neo Jia, mst, qemu-devel, Marc Zyngier,
	Kirti Wankhede, Dr . David Alan Gilbert, Eric Auger,
	Alex Williamson, qemu-arm, yuzenghui, wanghaibin.wang

On Thu, 10 Dec 2020 11:11:17 +0800
Shenming Lu <lushenming@huawei.com> wrote:

> On 2020/12/9 20:45, Cornelia Huck wrote:
> > On Wed, 9 Dec 2020 16:09:18 +0800
> > Shenming Lu <lushenming@huawei.com> wrote:
> >   
> >> In the VFIO VM state change handler, VFIO devices are transitioned
> >> in the _SAVING state, which should keep them from sending interrupts.
> >> Then we can save the pending states of all interrupts in the GIC VM
> >> state change handler (on ARM).
> >>
> >> So we have to set the priority of the VFIO VM state change handler
> >> explicitly (like virtio devices) to ensure it is called before the
> >> GIC's in saving.  
> > 
> > What this patch does is to make the priority of the vfio migration
> > state change handler depending on the position in the qdev tree. As all
> > state change handlers with no explicit priority are added at priority
> > 0, this will make sure that this handler runs before (save) resp. after
> > (restore) nearly all other handlers, which will address your issue here
> > (and possibly similar ones).
> > 
> > So, this patch seems fine for now, but I'm wondering whether we need to
> > think more about priorities for handlers in general, and if there are
> > more hidden dependencies lurking in there.  
> 
> As far as I know, as for the migration of interrupt, on x86 the sync from
> the PIR field to the Virtual-APIC page for posted interrupts (in
> KVM_GET_LAPIC ioctl) is after the pause of VFIO devices, which is fine.
> Not sure about others...

Interrupt handling had been my first guess, but it seems most interrupt
controllers don't use a state change handler anyway.

"handlers in general" actually also referred to use cases outside of
vfio migration, so out of scope of this patch...

> 
> Thanks,
> Shenming
> 
> >   
> >>
> >> Signed-off-by: Shenming Lu <lushenming@huawei.com>
> >> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
> >> ---
> >>  hw/vfio/migration.c | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> >> index 3b9de1353a..97ea82b100 100644
> >> --- a/hw/vfio/migration.c
> >> +++ b/hw/vfio/migration.c
> >> @@ -862,7 +862,8 @@ static int vfio_migration_init(VFIODevice *vbasedev,
> >>      register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1, &savevm_vfio_handlers,
> >>                           vbasedev);
> >>  
> >> -    migration->vm_state = qemu_add_vm_change_state_handler(vfio_vmstate_change,
> >> +    migration->vm_state = qdev_add_vm_change_state_handler(vbasedev->dev,
> >> +                                                           vfio_vmstate_change,
> >>                                                             vbasedev);
> >>      migration->migration_state.notify = vfio_migration_state_notifier;
> >>      add_migration_state_change_notifier(&migration->migration_state);  
> > 
> > .
> >   
> 

In any case,

Reviewed-by: Cornelia Huck <cohuck@redhat.com>



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 0/3] vfio: Some fixes and optimizations for VFIO migration
  2020-12-09  8:09 [RFC PATCH v2 0/3] vfio: Some fixes and optimizations for VFIO migration Shenming Lu
                   ` (2 preceding siblings ...)
  2020-12-09  8:09 ` [RFC PATCH v2 3/3] vfio: Avoid disabling and enabling vectors repeatedly in VFIO migration Shenming Lu
@ 2021-01-26  6:03 ` Shenming Lu
  3 siblings, 0 replies; 24+ messages in thread
From: Shenming Lu @ 2021-01-26  6:03 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Lorenzo Pieralisi, Neo Jia, mst, Marc Zyngier, Cornelia Huck,
	qemu-devel, Dr . David Alan Gilbert, Eric Auger, Kirti Wankhede,
	qemu-arm, yuzenghui, wanghaibin.wang

On 2020/12/9 16:09, Shenming Lu wrote:
> This patch set includes two fixes and one optimization for VFIO migration
> as blew:
> Patch 1-2:
> - Fix two ordering problems in migration.
> 
> Patch 3:
> - Optimize the enabling process of the MSI-X vectors in migration.
> 

Hi,

Friendly ping, is there any further comments on this series (especially
for patch 1 and 3)?  :-)

Thanks,
Shenming

> 
> Shenming Lu (3):
>   vfio: Move the saving of the config space to the right place in VFIO
>     migration
>   vfio: Set the priority of the VFIO VM state change handler explicitly
>   vfio: Avoid disabling and enabling vectors repeatedly in VFIO
>     migration
> 
>  hw/pci/msix.c         | 17 +++++++++++++++++
>  hw/vfio/migration.c   | 28 +++++++++++++++++-----------
>  hw/vfio/pci.c         | 10 ++++++++--
>  include/hw/pci/msix.h |  2 ++
>  4 files changed, 44 insertions(+), 13 deletions(-)
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 1/3] vfio: Move the saving of the config space to the right place in VFIO migration
  2020-12-10  2:21       ` Shenming Lu
@ 2021-01-26 21:36         ` Alex Williamson
  2021-01-27 21:52           ` Kirti Wankhede
  2021-02-18 14:45           ` Kirti Wankhede
  0 siblings, 2 replies; 24+ messages in thread
From: Alex Williamson @ 2021-01-26 21:36 UTC (permalink / raw)
  To: Shenming Lu
  Cc: Lorenzo Pieralisi, Neo Jia, mst, Marc Zyngier, Cornelia Huck,
	Dr . David Alan Gilbert, qemu-devel, Eric Auger, Kirti Wankhede,
	qemu-arm, yuzenghui, wanghaibin.wang

On Thu, 10 Dec 2020 10:21:21 +0800
Shenming Lu <lushenming@huawei.com> wrote:

> On 2020/12/10 2:34, Alex Williamson wrote:
> > On Wed, 9 Dec 2020 13:29:47 +0100
> > Cornelia Huck <cohuck@redhat.com> wrote:
> >   
> >> On Wed, 9 Dec 2020 16:09:17 +0800
> >> Shenming Lu <lushenming@huawei.com> wrote:
> >>  
> >>> On ARM64 the VFIO SET_IRQS ioctl is dependent on the VM interrupt
> >>> setup, if the restoring of the VFIO PCI device config space is
> >>> before the VGIC, an error might occur in the kernel.
> >>>
> >>> So we move the saving of the config space to the non-iterable
> >>> process, so that it will be called after the VGIC according to
> >>> their priorities.
> >>>
> >>> As for the possible dependence of the device specific migration
> >>> data on it's config space, we can let the vendor driver to
> >>> include any config info it needs in its own data stream.
> >>> (Should we note this in the header file linux-headers/linux/vfio.h?)    
> >>
> >> Given that the header is our primary source about how this interface
> >> should act, we need to properly document expectations about what will
> >> be saved/restored when there (well, in the source file in the kernel.)
> >> That goes in both directions: what a userspace must implement, and what
> >> a vendor driver can rely on.  
> 
> Yeah, in order to make the vendor driver and QEMU cooperate better, we might
> need to document some expectations about the data section in the migration
> region...
> >>
> >> [Related, but not a todo for you: I think we're still missing proper
> >> documentation of the whole migration feature.]  
> > 
> > Yes, we never saw anything past v1 of the documentation patch.  Thanks,
> >   
> 
> By the way, is there anything unproper with this patch? Wish your suggestion. :-)

I'm really hoping for some feedback from Kirti, I understand the NVIDIA
vGPU driver to have some dependency on this.  Thanks,

Alex

> >>> Signed-off-by: Shenming Lu <lushenming@huawei.com>
> >>> ---
> >>>  hw/vfio/migration.c | 25 +++++++++++++++----------
> >>>  1 file changed, 15 insertions(+), 10 deletions(-)    
> > 
> > .
> >   
> 



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 2/3] vfio: Set the priority of the VFIO VM state change handler explicitly
  2020-12-09  8:09 ` [RFC PATCH v2 2/3] vfio: Set the priority of the VFIO VM state change handler explicitly Shenming Lu
  2020-12-09 12:45   ` Cornelia Huck
@ 2021-01-26 21:36   ` Alex Williamson
  2021-01-27 11:20     ` Shenming Lu
  1 sibling, 1 reply; 24+ messages in thread
From: Alex Williamson @ 2021-01-26 21:36 UTC (permalink / raw)
  To: Shenming Lu
  Cc: Lorenzo Pieralisi, Neo Jia, mst, Marc Zyngier, Cornelia Huck,
	Dr . David Alan Gilbert, qemu-devel, Eric Auger, Kirti Wankhede,
	qemu-arm, yuzenghui, wanghaibin.wang

On Wed, 9 Dec 2020 16:09:18 +0800
Shenming Lu <lushenming@huawei.com> wrote:

> In the VFIO VM state change handler, VFIO devices are transitioned
> in the _SAVING state, which should keep them from sending interrupts.

Is this comment accurate?  It's my expectation that _SAVING has no
bearing on a device generating interrupts.  Interrupt generation must
be allowed to continue so long as the device is _RUNNING.  Thanks,

Alex

> Then we can save the pending states of all interrupts in the GIC VM
> state change handler (on ARM).
> 
> So we have to set the priority of the VFIO VM state change handler
> explicitly (like virtio devices) to ensure it is called before the
> GIC's in saving.
> 
> Signed-off-by: Shenming Lu <lushenming@huawei.com>
> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
> ---
>  hw/vfio/migration.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 3b9de1353a..97ea82b100 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -862,7 +862,8 @@ static int vfio_migration_init(VFIODevice *vbasedev,
>      register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1, &savevm_vfio_handlers,
>                           vbasedev);
>  
> -    migration->vm_state = qemu_add_vm_change_state_handler(vfio_vmstate_change,
> +    migration->vm_state = qdev_add_vm_change_state_handler(vbasedev->dev,
> +                                                           vfio_vmstate_change,
>                                                             vbasedev);
>      migration->migration_state.notify = vfio_migration_state_notifier;
>      add_migration_state_change_notifier(&migration->migration_state);



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 3/3] vfio: Avoid disabling and enabling vectors repeatedly in VFIO migration
  2020-12-09  8:09 ` [RFC PATCH v2 3/3] vfio: Avoid disabling and enabling vectors repeatedly in VFIO migration Shenming Lu
@ 2021-01-26 21:36   ` Alex Williamson
  2021-01-27 11:27     ` Shenming Lu
  0 siblings, 1 reply; 24+ messages in thread
From: Alex Williamson @ 2021-01-26 21:36 UTC (permalink / raw)
  To: Shenming Lu
  Cc: Lorenzo Pieralisi, Neo Jia, mst, Marc Zyngier, Cornelia Huck,
	Dr . David Alan Gilbert, qemu-devel, Eric Auger, Kirti Wankhede,
	qemu-arm, yuzenghui, wanghaibin.wang

On Wed, 9 Dec 2020 16:09:19 +0800
Shenming Lu <lushenming@huawei.com> wrote:

> Different from the normal situation when the guest starts, we can
> know the max unmasked vetctor (at the beginning) after msix_load()
> in VFIO migration. So in order to avoid ineffectively disabling and

s/ineffectively/inefficiently/?  It's "effective" either way I think.

> enabling vectors repeatedly, let's allocate all needed vectors first
> and then enable these unmasked vectors one by one without disabling.
> 
> Signed-off-by: Shenming Lu <lushenming@huawei.com>
> ---
>  hw/pci/msix.c         | 17 +++++++++++++++++
>  hw/vfio/pci.c         | 10 ++++++++--
>  include/hw/pci/msix.h |  2 ++
>  3 files changed, 27 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/pci/msix.c b/hw/pci/msix.c
> index 67e34f34d6..bf291d3ff8 100644
> --- a/hw/pci/msix.c
> +++ b/hw/pci/msix.c
> @@ -557,6 +557,23 @@ unsigned int msix_nr_vectors_allocated(const PCIDevice *dev)
>      return dev->msix_entries_nr;
>  }
>  
> +int msix_get_max_unmasked_vector(PCIDevice *dev)
> +{
> +    int max_unmasked_vector = -1;
> +    int vector;
> +
> +    if ((dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &
> +        (MSIX_ENABLE_MASK | MSIX_MASKALL_MASK)) == MSIX_ENABLE_MASK) {
> +        for (vector = 0; vector < dev->msix_entries_nr; vector++) {
> +            if (!msix_is_masked(dev, vector)) {
> +                max_unmasked_vector = vector;
> +            }
> +        }
> +    }
> +
> +    return max_unmasked_vector;
> +}

Comments from QEMU PCI folks?

> +
>  static int msix_set_notifier_for_vector(PCIDevice *dev, unsigned int vector)
>  {
>      MSIMessage msg;
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 51dc373695..e755ed2514 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -568,6 +568,9 @@ static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
>  
>  static void vfio_msix_enable(VFIOPCIDevice *vdev)
>  {
> +    int max_unmasked_vector = msix_get_max_unmasked_vector(&vdev->pdev);
> +    unsigned int used_vector = MAX(max_unmasked_vector, 0);
> +

The above PCI function could also be done inline here pretty easily too:

unsigned int nr, max_vec = 0;

if (!msix_masked(&vdev->pdev))
    for (nr = 0; nr < msix_nr_vectors_allocated(&vdev->pdev); nr++) {
        if (!msix_is_masked(&vdev->pdev, nr)) {
            max_vec = nr;
        }
    }
}

It's a bit cleaner than the msix utility function, imo. 

>      vfio_disable_interrupts(vdev);
>  
>      vdev->msi_vectors = g_new0(VFIOMSIVector, vdev->msix->entries);
> @@ -586,9 +589,12 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
>       * triggering to userspace, then immediately release the vector, leaving
>       * the physical device with no vectors enabled, but MSI-X enabled, just
>       * like the guest view.
> +     * If there are unmasked vectors (such as in migration) which will be
> +     * enabled soon, we can allocate them here to avoid ineffectively disabling
> +     * and enabling vectors repeatedly later.

It just happens that migration triggers this usage model where the
MSI-X enable bit is set with vectors unmasked in the vector table, but
this is not unique to migration, guests can follow this pattern as well.
Has this been tested with a variety of guests?  Logically it seems
correct, but always good to prove so.  Thanks,

Alex

>       */
> -    vfio_msix_vector_do_use(&vdev->pdev, 0, NULL, NULL);
> -    vfio_msix_vector_release(&vdev->pdev, 0);
> +    vfio_msix_vector_do_use(&vdev->pdev, used_vector, NULL, NULL);
> +    vfio_msix_vector_release(&vdev->pdev, used_vector);
>  
>      if (msix_set_vector_notifiers(&vdev->pdev, vfio_msix_vector_use,
>                                    vfio_msix_vector_release, NULL)) {
> diff --git a/include/hw/pci/msix.h b/include/hw/pci/msix.h
> index 4c4a60c739..4bfb463fa6 100644
> --- a/include/hw/pci/msix.h
> +++ b/include/hw/pci/msix.h
> @@ -23,6 +23,8 @@ void msix_uninit_exclusive_bar(PCIDevice *dev);
>  
>  unsigned int msix_nr_vectors_allocated(const PCIDevice *dev);
>  
> +int msix_get_max_unmasked_vector(PCIDevice *dev);
> +
>  void msix_save(PCIDevice *dev, QEMUFile *f);
>  void msix_load(PCIDevice *dev, QEMUFile *f);
>  



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 2/3] vfio: Set the priority of the VFIO VM state change handler explicitly
  2021-01-26 21:36   ` Alex Williamson
@ 2021-01-27 11:20     ` Shenming Lu
  2021-01-27 14:20       ` Alex Williamson
  0 siblings, 1 reply; 24+ messages in thread
From: Shenming Lu @ 2021-01-27 11:20 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Lorenzo Pieralisi, Neo Jia, mst, Marc Zyngier, Cornelia Huck,
	Dr . David Alan Gilbert, qemu-devel, Eric Auger, Kirti Wankhede,
	qemu-arm, yuzenghui, wanghaibin.wang

On 2021/1/27 5:36, Alex Williamson wrote:
> On Wed, 9 Dec 2020 16:09:18 +0800
> Shenming Lu <lushenming@huawei.com> wrote:
> 
>> In the VFIO VM state change handler, VFIO devices are transitioned
>> in the _SAVING state, which should keep them from sending interrupts.
> 
> Is this comment accurate?  It's my expectation that _SAVING has no
> bearing on a device generating interrupts.  Interrupt generation must
> be allowed to continue so long as the device is _RUNNING.  Thanks,
> 

To be more accurate, the _RUNNING bit in device_state is cleared in the
VFIO VM state change handler when stopping the VM. And if the device continues
to send interrupts after this, how can we save the states of device interrupts
in the stop-and-copy phase?...

Thanks,
Shenming

> Alex
> 
>> Then we can save the pending states of all interrupts in the GIC VM
>> state change handler (on ARM).
>>
>> So we have to set the priority of the VFIO VM state change handler
>> explicitly (like virtio devices) to ensure it is called before the
>> GIC's in saving.
>>
>> Signed-off-by: Shenming Lu <lushenming@huawei.com>
>> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
>> ---
>>  hw/vfio/migration.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index 3b9de1353a..97ea82b100 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -862,7 +862,8 @@ static int vfio_migration_init(VFIODevice *vbasedev,
>>      register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1, &savevm_vfio_handlers,
>>                           vbasedev);
>>  
>> -    migration->vm_state = qemu_add_vm_change_state_handler(vfio_vmstate_change,
>> +    migration->vm_state = qdev_add_vm_change_state_handler(vbasedev->dev,
>> +                                                           vfio_vmstate_change,
>>                                                             vbasedev);
>>      migration->migration_state.notify = vfio_migration_state_notifier;
>>      add_migration_state_change_notifier(&migration->migration_state);
> 
> .
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 3/3] vfio: Avoid disabling and enabling vectors repeatedly in VFIO migration
  2021-01-26 21:36   ` Alex Williamson
@ 2021-01-27 11:27     ` Shenming Lu
  2021-01-27 14:21       ` Alex Williamson
  0 siblings, 1 reply; 24+ messages in thread
From: Shenming Lu @ 2021-01-27 11:27 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Lorenzo Pieralisi, Neo Jia, mst, Marc Zyngier, Cornelia Huck,
	Dr . David Alan Gilbert, qemu-devel, Eric Auger, Kirti Wankhede,
	qemu-arm, yuzenghui, wanghaibin.wang

On 2021/1/27 5:36, Alex Williamson wrote:
> On Wed, 9 Dec 2020 16:09:19 +0800
> Shenming Lu <lushenming@huawei.com> wrote:
> 
>> Different from the normal situation when the guest starts, we can
>> know the max unmasked vetctor (at the beginning) after msix_load()
>> in VFIO migration. So in order to avoid ineffectively disabling and
> 
> s/ineffectively/inefficiently/?  It's "effective" either way I think.

Yeah, I should say "inefficiently". :-)

> 
>> enabling vectors repeatedly, let's allocate all needed vectors first
>> and then enable these unmasked vectors one by one without disabling.
>>
>> Signed-off-by: Shenming Lu <lushenming@huawei.com>
>> ---
>>  hw/pci/msix.c         | 17 +++++++++++++++++
>>  hw/vfio/pci.c         | 10 ++++++++--
>>  include/hw/pci/msix.h |  2 ++
>>  3 files changed, 27 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/pci/msix.c b/hw/pci/msix.c
>> index 67e34f34d6..bf291d3ff8 100644
>> --- a/hw/pci/msix.c
>> +++ b/hw/pci/msix.c
>> @@ -557,6 +557,23 @@ unsigned int msix_nr_vectors_allocated(const PCIDevice *dev)
>>      return dev->msix_entries_nr;
>>  }
>>  
>> +int msix_get_max_unmasked_vector(PCIDevice *dev)
>> +{
>> +    int max_unmasked_vector = -1;
>> +    int vector;
>> +
>> +    if ((dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &
>> +        (MSIX_ENABLE_MASK | MSIX_MASKALL_MASK)) == MSIX_ENABLE_MASK) {
>> +        for (vector = 0; vector < dev->msix_entries_nr; vector++) {
>> +            if (!msix_is_masked(dev, vector)) {
>> +                max_unmasked_vector = vector;
>> +            }
>> +        }
>> +    }
>> +
>> +    return max_unmasked_vector;
>> +}
> 
> Comments from QEMU PCI folks?
> 
>> +
>>  static int msix_set_notifier_for_vector(PCIDevice *dev, unsigned int vector)
>>  {
>>      MSIMessage msg;
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 51dc373695..e755ed2514 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -568,6 +568,9 @@ static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
>>  
>>  static void vfio_msix_enable(VFIOPCIDevice *vdev)
>>  {
>> +    int max_unmasked_vector = msix_get_max_unmasked_vector(&vdev->pdev);
>> +    unsigned int used_vector = MAX(max_unmasked_vector, 0);
>> +
> 
> The above PCI function could also be done inline here pretty easily too:
> 
> unsigned int nr, max_vec = 0;
> 
> if (!msix_masked(&vdev->pdev))
>     for (nr = 0; nr < msix_nr_vectors_allocated(&vdev->pdev); nr++) {
>         if (!msix_is_masked(&vdev->pdev, nr)) {
>             max_vec = nr;
>         }
>     }
> }
> 
> It's a bit cleaner than the msix utility function, imo.

Yeah, it makes sense.

> 
>>      vfio_disable_interrupts(vdev);
>>  
>>      vdev->msi_vectors = g_new0(VFIOMSIVector, vdev->msix->entries);
>> @@ -586,9 +589,12 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
>>       * triggering to userspace, then immediately release the vector, leaving
>>       * the physical device with no vectors enabled, but MSI-X enabled, just
>>       * like the guest view.
>> +     * If there are unmasked vectors (such as in migration) which will be
>> +     * enabled soon, we can allocate them here to avoid ineffectively disabling
>> +     * and enabling vectors repeatedly later.
> 
> It just happens that migration triggers this usage model where the
> MSI-X enable bit is set with vectors unmasked in the vector table, but
> this is not unique to migration, guests can follow this pattern as well.
> Has this been tested with a variety of guests?  Logically it seems
> correct, but always good to prove so.  Thanks,

I have tested it in migration and normal guest startup (only the latest Linux).
And I can try to test with some other kernels, could you be more specific about this?

Thanks,
Shenming

> 
> Alex
> 
>>       */
>> -    vfio_msix_vector_do_use(&vdev->pdev, 0, NULL, NULL);
>> -    vfio_msix_vector_release(&vdev->pdev, 0);
>> +    vfio_msix_vector_do_use(&vdev->pdev, used_vector, NULL, NULL);
>> +    vfio_msix_vector_release(&vdev->pdev, used_vector);
>>  
>>      if (msix_set_vector_notifiers(&vdev->pdev, vfio_msix_vector_use,
>>                                    vfio_msix_vector_release, NULL)) {
>> diff --git a/include/hw/pci/msix.h b/include/hw/pci/msix.h
>> index 4c4a60c739..4bfb463fa6 100644
>> --- a/include/hw/pci/msix.h
>> +++ b/include/hw/pci/msix.h
>> @@ -23,6 +23,8 @@ void msix_uninit_exclusive_bar(PCIDevice *dev);
>>  
>>  unsigned int msix_nr_vectors_allocated(const PCIDevice *dev);
>>  
>> +int msix_get_max_unmasked_vector(PCIDevice *dev);
>> +
>>  void msix_save(PCIDevice *dev, QEMUFile *f);
>>  void msix_load(PCIDevice *dev, QEMUFile *f);
>>  
> 
> .
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 2/3] vfio: Set the priority of the VFIO VM state change handler explicitly
  2021-01-27 11:20     ` Shenming Lu
@ 2021-01-27 14:20       ` Alex Williamson
  2021-01-28  2:35         ` Shenming Lu
  0 siblings, 1 reply; 24+ messages in thread
From: Alex Williamson @ 2021-01-27 14:20 UTC (permalink / raw)
  To: Shenming Lu
  Cc: Lorenzo Pieralisi, Neo Jia, mst, Marc Zyngier, Cornelia Huck,
	Dr . David Alan Gilbert, qemu-devel, Eric Auger, Kirti Wankhede,
	qemu-arm, yuzenghui, wanghaibin.wang

On Wed, 27 Jan 2021 19:20:06 +0800
Shenming Lu <lushenming@huawei.com> wrote:

> On 2021/1/27 5:36, Alex Williamson wrote:
> > On Wed, 9 Dec 2020 16:09:18 +0800
> > Shenming Lu <lushenming@huawei.com> wrote:
> >   
> >> In the VFIO VM state change handler, VFIO devices are transitioned
> >> in the _SAVING state, which should keep them from sending interrupts.  
> > 
> > Is this comment accurate?  It's my expectation that _SAVING has no
> > bearing on a device generating interrupts.  Interrupt generation must
> > be allowed to continue so long as the device is _RUNNING.  Thanks,
> >   
> 
> To be more accurate, the _RUNNING bit in device_state is cleared in the
> VFIO VM state change handler when stopping the VM. And if the device continues
> to send interrupts after this, how can we save the states of device interrupts
> in the stop-and-copy phase?...

Exactly, it's clearing the _RUNNING bit that makes the device stop,
including no longer generating interrupts.  Perhaps I incorrectly
inferred "_SAVING state" as referring to the _SAVING bit when you
actually intended:

   *  +------- _RESUMING
   *  |+------ _SAVING
   *  ||+----- _RUNNING
   *  |||
   *  000b => Device Stopped, not saving or resuming
   *  001b => Device running, which is the default state
-> *  010b => Stop the device & save the device state, stop-and-copy state

ie. the full state when only _SAVING is set.

Could we make the comment more clear to avoid this confusion?  Thanks,

Alex

> >> Then we can save the pending states of all interrupts in the GIC VM
> >> state change handler (on ARM).
> >>
> >> So we have to set the priority of the VFIO VM state change handler
> >> explicitly (like virtio devices) to ensure it is called before the
> >> GIC's in saving.
> >>
> >> Signed-off-by: Shenming Lu <lushenming@huawei.com>
> >> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
> >> ---
> >>  hw/vfio/migration.c | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> >> index 3b9de1353a..97ea82b100 100644
> >> --- a/hw/vfio/migration.c
> >> +++ b/hw/vfio/migration.c
> >> @@ -862,7 +862,8 @@ static int vfio_migration_init(VFIODevice *vbasedev,
> >>      register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1, &savevm_vfio_handlers,
> >>                           vbasedev);
> >>  
> >> -    migration->vm_state = qemu_add_vm_change_state_handler(vfio_vmstate_change,
> >> +    migration->vm_state = qdev_add_vm_change_state_handler(vbasedev->dev,
> >> +                                                           vfio_vmstate_change,
> >>                                                             vbasedev);
> >>      migration->migration_state.notify = vfio_migration_state_notifier;
> >>      add_migration_state_change_notifier(&migration->migration_state);  
> > 
> > .
> >   
> 



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 3/3] vfio: Avoid disabling and enabling vectors repeatedly in VFIO migration
  2021-01-27 11:27     ` Shenming Lu
@ 2021-01-27 14:21       ` Alex Williamson
  2021-02-01  3:12         ` Shenming Lu
  0 siblings, 1 reply; 24+ messages in thread
From: Alex Williamson @ 2021-01-27 14:21 UTC (permalink / raw)
  To: Shenming Lu
  Cc: Lorenzo Pieralisi, Neo Jia, mst, Marc Zyngier, Cornelia Huck,
	Dr . David Alan Gilbert, qemu-devel, Eric Auger, Kirti Wankhede,
	qemu-arm, yuzenghui, wanghaibin.wang

On Wed, 27 Jan 2021 19:27:35 +0800
Shenming Lu <lushenming@huawei.com> wrote:

> On 2021/1/27 5:36, Alex Williamson wrote:
> > On Wed, 9 Dec 2020 16:09:19 +0800
> > Shenming Lu <lushenming@huawei.com> wrote:
> >   
> >> Different from the normal situation when the guest starts, we can
> >> know the max unmasked vetctor (at the beginning) after msix_load()
> >> in VFIO migration. So in order to avoid ineffectively disabling and  
> > 
> > s/ineffectively/inefficiently/?  It's "effective" either way I think.  
> 
> Yeah, I should say "inefficiently". :-)
> 
> >   
> >> enabling vectors repeatedly, let's allocate all needed vectors first
> >> and then enable these unmasked vectors one by one without disabling.
> >>
> >> Signed-off-by: Shenming Lu <lushenming@huawei.com>
> >> ---
> >>  hw/pci/msix.c         | 17 +++++++++++++++++
> >>  hw/vfio/pci.c         | 10 ++++++++--
> >>  include/hw/pci/msix.h |  2 ++
> >>  3 files changed, 27 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/hw/pci/msix.c b/hw/pci/msix.c
> >> index 67e34f34d6..bf291d3ff8 100644
> >> --- a/hw/pci/msix.c
> >> +++ b/hw/pci/msix.c
> >> @@ -557,6 +557,23 @@ unsigned int msix_nr_vectors_allocated(const PCIDevice *dev)
> >>      return dev->msix_entries_nr;
> >>  }
> >>  
> >> +int msix_get_max_unmasked_vector(PCIDevice *dev)
> >> +{
> >> +    int max_unmasked_vector = -1;
> >> +    int vector;
> >> +
> >> +    if ((dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &
> >> +        (MSIX_ENABLE_MASK | MSIX_MASKALL_MASK)) == MSIX_ENABLE_MASK) {
> >> +        for (vector = 0; vector < dev->msix_entries_nr; vector++) {
> >> +            if (!msix_is_masked(dev, vector)) {
> >> +                max_unmasked_vector = vector;
> >> +            }
> >> +        }
> >> +    }
> >> +
> >> +    return max_unmasked_vector;
> >> +}  
> > 
> > Comments from QEMU PCI folks?
> >   
> >> +
> >>  static int msix_set_notifier_for_vector(PCIDevice *dev, unsigned int vector)
> >>  {
> >>      MSIMessage msg;
> >> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> >> index 51dc373695..e755ed2514 100644
> >> --- a/hw/vfio/pci.c
> >> +++ b/hw/vfio/pci.c
> >> @@ -568,6 +568,9 @@ static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
> >>  
> >>  static void vfio_msix_enable(VFIOPCIDevice *vdev)
> >>  {
> >> +    int max_unmasked_vector = msix_get_max_unmasked_vector(&vdev->pdev);
> >> +    unsigned int used_vector = MAX(max_unmasked_vector, 0);
> >> +  
> > 
> > The above PCI function could also be done inline here pretty easily too:
> > 
> > unsigned int nr, max_vec = 0;
> > 
> > if (!msix_masked(&vdev->pdev))
> >     for (nr = 0; nr < msix_nr_vectors_allocated(&vdev->pdev); nr++) {
> >         if (!msix_is_masked(&vdev->pdev, nr)) {
> >             max_vec = nr;
> >         }
> >     }
> > }
> > 
> > It's a bit cleaner than the msix utility function, imo.  
> 
> Yeah, it makes sense.
> 
> >   
> >>      vfio_disable_interrupts(vdev);
> >>  
> >>      vdev->msi_vectors = g_new0(VFIOMSIVector, vdev->msix->entries);
> >> @@ -586,9 +589,12 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
> >>       * triggering to userspace, then immediately release the vector, leaving
> >>       * the physical device with no vectors enabled, but MSI-X enabled, just
> >>       * like the guest view.
> >> +     * If there are unmasked vectors (such as in migration) which will be
> >> +     * enabled soon, we can allocate them here to avoid ineffectively disabling
> >> +     * and enabling vectors repeatedly later.  
> > 
> > It just happens that migration triggers this usage model where the
> > MSI-X enable bit is set with vectors unmasked in the vector table, but
> > this is not unique to migration, guests can follow this pattern as well.
> > Has this been tested with a variety of guests?  Logically it seems
> > correct, but always good to prove so.  Thanks,  
> 
> I have tested it in migration and normal guest startup (only the latest Linux).
> And I can try to test with some other kernels, could you be more specific about this?

Minimally also Windows, ideally a BSD as well.  Thanks,

Alex

> >>       */
> >> -    vfio_msix_vector_do_use(&vdev->pdev, 0, NULL, NULL);
> >> -    vfio_msix_vector_release(&vdev->pdev, 0);
> >> +    vfio_msix_vector_do_use(&vdev->pdev, used_vector, NULL, NULL);
> >> +    vfio_msix_vector_release(&vdev->pdev, used_vector);
> >>  
> >>      if (msix_set_vector_notifiers(&vdev->pdev, vfio_msix_vector_use,
> >>                                    vfio_msix_vector_release, NULL)) {
> >> diff --git a/include/hw/pci/msix.h b/include/hw/pci/msix.h
> >> index 4c4a60c739..4bfb463fa6 100644
> >> --- a/include/hw/pci/msix.h
> >> +++ b/include/hw/pci/msix.h
> >> @@ -23,6 +23,8 @@ void msix_uninit_exclusive_bar(PCIDevice *dev);
> >>  
> >>  unsigned int msix_nr_vectors_allocated(const PCIDevice *dev);
> >>  
> >> +int msix_get_max_unmasked_vector(PCIDevice *dev);
> >> +
> >>  void msix_save(PCIDevice *dev, QEMUFile *f);
> >>  void msix_load(PCIDevice *dev, QEMUFile *f);
> >>    
> > 
> > .
> >   
> 



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 1/3] vfio: Move the saving of the config space to the right place in VFIO migration
  2021-01-26 21:36         ` Alex Williamson
@ 2021-01-27 21:52           ` Kirti Wankhede
  2021-02-18 14:45           ` Kirti Wankhede
  1 sibling, 0 replies; 24+ messages in thread
From: Kirti Wankhede @ 2021-01-27 21:52 UTC (permalink / raw)
  To: Alex Williamson, Shenming Lu
  Cc: Lorenzo Pieralisi, Neo Jia, mst, Marc Zyngier, Cornelia Huck,
	qemu-devel, Dr . David Alan Gilbert, Eric Auger, qemu-arm,
	yuzenghui, wanghaibin.wang



On 1/27/2021 3:06 AM, Alex Williamson wrote:
> On Thu, 10 Dec 2020 10:21:21 +0800
> Shenming Lu <lushenming@huawei.com> wrote:
> 
>> On 2020/12/10 2:34, Alex Williamson wrote:
>>> On Wed, 9 Dec 2020 13:29:47 +0100
>>> Cornelia Huck <cohuck@redhat.com> wrote:
>>>    
>>>> On Wed, 9 Dec 2020 16:09:17 +0800
>>>> Shenming Lu <lushenming@huawei.com> wrote:
>>>>   
>>>>> On ARM64 the VFIO SET_IRQS ioctl is dependent on the VM interrupt
>>>>> setup, if the restoring of the VFIO PCI device config space is
>>>>> before the VGIC, an error might occur in the kernel.
>>>>>
>>>>> So we move the saving of the config space to the non-iterable
>>>>> process, so that it will be called after the VGIC according to
>>>>> their priorities.
>>>>>
>>>>> As for the possible dependence of the device specific migration
>>>>> data on it's config space, we can let the vendor driver to
>>>>> include any config info it needs in its own data stream.
>>>>> (Should we note this in the header file linux-headers/linux/vfio.h?)
>>>>
>>>> Given that the header is our primary source about how this interface
>>>> should act, we need to properly document expectations about what will
>>>> be saved/restored when there (well, in the source file in the kernel.)
>>>> That goes in both directions: what a userspace must implement, and what
>>>> a vendor driver can rely on.
>>
>> Yeah, in order to make the vendor driver and QEMU cooperate better, we might
>> need to document some expectations about the data section in the migration
>> region...
>>>>
>>>> [Related, but not a todo for you: I think we're still missing proper
>>>> documentation of the whole migration feature.]
>>>
>>> Yes, we never saw anything past v1 of the documentation patch.  Thanks,
>>>    
>>
>> By the way, is there anything unproper with this patch? Wish your suggestion. :-)
> 
> I'm really hoping for some feedback from Kirti, I understand the NVIDIA
> vGPU driver to have some dependency on this.  Thanks,
> 

I need to verify this patch. Spare me a day to verify this.

Thanks,
Kirti


> Alex
> 
>>>>> Signed-off-by: Shenming Lu <lushenming@huawei.com>
>>>>> ---
>>>>>   hw/vfio/migration.c | 25 +++++++++++++++----------
>>>>>   1 file changed, 15 insertions(+), 10 deletions(-)
>>>
>>> .
>>>    
>>
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 2/3] vfio: Set the priority of the VFIO VM state change handler explicitly
  2021-01-27 14:20       ` Alex Williamson
@ 2021-01-28  2:35         ` Shenming Lu
  0 siblings, 0 replies; 24+ messages in thread
From: Shenming Lu @ 2021-01-28  2:35 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Lorenzo Pieralisi, Neo Jia, mst, Marc Zyngier, Cornelia Huck,
	Dr . David Alan Gilbert, qemu-devel, Eric Auger, Kirti Wankhede,
	qemu-arm, yuzenghui, wanghaibin.wang

On 2021/1/27 22:20, Alex Williamson wrote:
> On Wed, 27 Jan 2021 19:20:06 +0800
> Shenming Lu <lushenming@huawei.com> wrote:
> 
>> On 2021/1/27 5:36, Alex Williamson wrote:
>>> On Wed, 9 Dec 2020 16:09:18 +0800
>>> Shenming Lu <lushenming@huawei.com> wrote:
>>>   
>>>> In the VFIO VM state change handler, VFIO devices are transitioned
>>>> in the _SAVING state, which should keep them from sending interrupts.  
>>>
>>> Is this comment accurate?  It's my expectation that _SAVING has no
>>> bearing on a device generating interrupts.  Interrupt generation must
>>> be allowed to continue so long as the device is _RUNNING.  Thanks,
>>>   
>>
>> To be more accurate, the _RUNNING bit in device_state is cleared in the
>> VFIO VM state change handler when stopping the VM. And if the device continues
>> to send interrupts after this, how can we save the states of device interrupts
>> in the stop-and-copy phase?...
> 
> Exactly, it's clearing the _RUNNING bit that makes the device stop,
> including no longer generating interrupts.  Perhaps I incorrectly
> inferred "_SAVING state" as referring to the _SAVING bit when you
> actually intended:
> 
>    *  +------- _RESUMING
>    *  |+------ _SAVING
>    *  ||+----- _RUNNING
>    *  |||
>    *  000b => Device Stopped, not saving or resuming
>    *  001b => Device running, which is the default state
> -> *  010b => Stop the device & save the device state, stop-and-copy state
> 
> ie. the full state when only _SAVING is set.
> 
> Could we make the comment more clear to avoid this confusion?  Thanks,
> 

OK, sorry for the confusion. I will modify the comment to:

In the VFIO VM state change handler when stopping the VM, the _RUNNING bit
in device_state is cleared which makes the VFIO device stop, including
no longer generating interrupts.

Thanks,
Shenming

> Alex
> 
>>>> Then we can save the pending states of all interrupts in the GIC VM
>>>> state change handler (on ARM).
>>>>
>>>> So we have to set the priority of the VFIO VM state change handler
>>>> explicitly (like virtio devices) to ensure it is called before the
>>>> GIC's in saving.
>>>>
>>>> Signed-off-by: Shenming Lu <lushenming@huawei.com>
>>>> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
>>>> ---
>>>>  hw/vfio/migration.c | 3 ++-
>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>>> index 3b9de1353a..97ea82b100 100644
>>>> --- a/hw/vfio/migration.c
>>>> +++ b/hw/vfio/migration.c
>>>> @@ -862,7 +862,8 @@ static int vfio_migration_init(VFIODevice *vbasedev,
>>>>      register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1, &savevm_vfio_handlers,
>>>>                           vbasedev);
>>>>  
>>>> -    migration->vm_state = qemu_add_vm_change_state_handler(vfio_vmstate_change,
>>>> +    migration->vm_state = qdev_add_vm_change_state_handler(vbasedev->dev,
>>>> +                                                           vfio_vmstate_change,
>>>>                                                             vbasedev);
>>>>      migration->migration_state.notify = vfio_migration_state_notifier;
>>>>      add_migration_state_change_notifier(&migration->migration_state);  
>>>
>>> .
>>>   
>>
> 
> .
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 3/3] vfio: Avoid disabling and enabling vectors repeatedly in VFIO migration
  2021-01-27 14:21       ` Alex Williamson
@ 2021-02-01  3:12         ` Shenming Lu
  0 siblings, 0 replies; 24+ messages in thread
From: Shenming Lu @ 2021-02-01  3:12 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Lorenzo Pieralisi, Neo Jia, mst, Marc Zyngier, Cornelia Huck,
	Dr . David Alan Gilbert, qemu-devel, Eric Auger, Kirti Wankhede,
	qemu-arm, yuzenghui, wanghaibin.wang

On 2021/1/27 22:21, Alex Williamson wrote:
> On Wed, 27 Jan 2021 19:27:35 +0800
> Shenming Lu <lushenming@huawei.com> wrote:
> 
>> On 2021/1/27 5:36, Alex Williamson wrote:
>>> On Wed, 9 Dec 2020 16:09:19 +0800
>>> Shenming Lu <lushenming@huawei.com> wrote:
>>>   
>>>> Different from the normal situation when the guest starts, we can
>>>> know the max unmasked vetctor (at the beginning) after msix_load()
>>>> in VFIO migration. So in order to avoid ineffectively disabling and  
>>>
>>> s/ineffectively/inefficiently/?  It's "effective" either way I think.  
>>
>> Yeah, I should say "inefficiently". :-)
>>
>>>   
>>>> enabling vectors repeatedly, let's allocate all needed vectors first
>>>> and then enable these unmasked vectors one by one without disabling.
>>>>
>>>> Signed-off-by: Shenming Lu <lushenming@huawei.com>
>>>> ---
>>>>  hw/pci/msix.c         | 17 +++++++++++++++++
>>>>  hw/vfio/pci.c         | 10 ++++++++--
>>>>  include/hw/pci/msix.h |  2 ++
>>>>  3 files changed, 27 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/hw/pci/msix.c b/hw/pci/msix.c
>>>> index 67e34f34d6..bf291d3ff8 100644
>>>> --- a/hw/pci/msix.c
>>>> +++ b/hw/pci/msix.c
>>>> @@ -557,6 +557,23 @@ unsigned int msix_nr_vectors_allocated(const PCIDevice *dev)
>>>>      return dev->msix_entries_nr;
>>>>  }
>>>>  
>>>> +int msix_get_max_unmasked_vector(PCIDevice *dev)
>>>> +{
>>>> +    int max_unmasked_vector = -1;
>>>> +    int vector;
>>>> +
>>>> +    if ((dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &
>>>> +        (MSIX_ENABLE_MASK | MSIX_MASKALL_MASK)) == MSIX_ENABLE_MASK) {
>>>> +        for (vector = 0; vector < dev->msix_entries_nr; vector++) {
>>>> +            if (!msix_is_masked(dev, vector)) {
>>>> +                max_unmasked_vector = vector;
>>>> +            }
>>>> +        }
>>>> +    }
>>>> +
>>>> +    return max_unmasked_vector;
>>>> +}  
>>>
>>> Comments from QEMU PCI folks?
>>>   
>>>> +
>>>>  static int msix_set_notifier_for_vector(PCIDevice *dev, unsigned int vector)
>>>>  {
>>>>      MSIMessage msg;
>>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>>> index 51dc373695..e755ed2514 100644
>>>> --- a/hw/vfio/pci.c
>>>> +++ b/hw/vfio/pci.c
>>>> @@ -568,6 +568,9 @@ static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
>>>>  
>>>>  static void vfio_msix_enable(VFIOPCIDevice *vdev)
>>>>  {
>>>> +    int max_unmasked_vector = msix_get_max_unmasked_vector(&vdev->pdev);
>>>> +    unsigned int used_vector = MAX(max_unmasked_vector, 0);
>>>> +  
>>>
>>> The above PCI function could also be done inline here pretty easily too:
>>>
>>> unsigned int nr, max_vec = 0;
>>>
>>> if (!msix_masked(&vdev->pdev))
>>>     for (nr = 0; nr < msix_nr_vectors_allocated(&vdev->pdev); nr++) {
>>>         if (!msix_is_masked(&vdev->pdev, nr)) {
>>>             max_vec = nr;
>>>         }
>>>     }
>>> }
>>>
>>> It's a bit cleaner than the msix utility function, imo.  
>>
>> Yeah, it makes sense.
>>
>>>   
>>>>      vfio_disable_interrupts(vdev);
>>>>  
>>>>      vdev->msi_vectors = g_new0(VFIOMSIVector, vdev->msix->entries);
>>>> @@ -586,9 +589,12 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
>>>>       * triggering to userspace, then immediately release the vector, leaving
>>>>       * the physical device with no vectors enabled, but MSI-X enabled, just
>>>>       * like the guest view.
>>>> +     * If there are unmasked vectors (such as in migration) which will be
>>>> +     * enabled soon, we can allocate them here to avoid ineffectively disabling
>>>> +     * and enabling vectors repeatedly later.  
>>>
>>> It just happens that migration triggers this usage model where the
>>> MSI-X enable bit is set with vectors unmasked in the vector table, but
>>> this is not unique to migration, guests can follow this pattern as well.
>>> Has this been tested with a variety of guests?  Logically it seems
>>> correct, but always good to prove so.  Thanks,  
>>
>> I have tested it in migration and normal guest startup (only the latest Linux).
>> And I can try to test with some other kernels, could you be more specific about this?
> 
> Minimally also Windows, ideally a BSD as well.  Thanks,
> 

Hi Alex,

I have tested this patch with a Windows guest (Windows Server 2012 R2 Datacenter, Intel
X722 Ethernet controller (passthrough)) and nothing went wrong. And I found that it does
trigger our usage model in the normal guest startup: has all needed vectors already unmasked
in the vector table when calling vfio_msix_enable()...

Thanks,
Shenming


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 1/3] vfio: Move the saving of the config space to the right place in VFIO migration
  2020-12-09  8:09 ` [RFC PATCH v2 1/3] vfio: Move the saving of the config space to the right place in " Shenming Lu
  2020-12-09 12:29   ` Cornelia Huck
@ 2021-02-18 14:42   ` Kirti Wankhede
  2021-02-19 10:33     ` Shenming Lu
  1 sibling, 1 reply; 24+ messages in thread
From: Kirti Wankhede @ 2021-02-18 14:42 UTC (permalink / raw)
  To: Shenming Lu, Alex Williamson, Cornelia Huck,
	Dr . David Alan Gilbert, Eric Auger, mst, Marcel Apfelbaum
  Cc: Lorenzo Pieralisi, Neo Jia, Marc Zyngier, qemu-devel, qemu-arm,
	yuzenghui, wanghaibin.wang



On 12/9/2020 1:39 PM, Shenming Lu wrote:
> On ARM64 the VFIO SET_IRQS ioctl is dependent on the VM interrupt
> setup, if the restoring of the VFIO PCI device config space is
> before the VGIC, an error might occur in the kernel.
> 
> So we move the saving of the config space to the non-iterable
> process, so that it will be called after the VGIC according to
> their priorities.
> 
> As for the possible dependence of the device specific migration
> data on it's config space, we can let the vendor driver to
> include any config info it needs in its own data stream.
> (Should we note this in the header file linux-headers/linux/vfio.h?)
> 
> Signed-off-by: Shenming Lu <lushenming@huawei.com>
> ---
>   hw/vfio/migration.c | 25 +++++++++++++++----------
>   1 file changed, 15 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 00daa50ed8..3b9de1353a 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -575,11 +575,6 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
>           return ret;
>       }
>   
> -    ret = vfio_save_device_config_state(f, opaque);
> -    if (ret) {
> -        return ret;
> -    }
> -
>       ret = vfio_update_pending(vbasedev);
>       if (ret) {
>           return ret;
> @@ -620,6 +615,19 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
>       return ret;
>   }
>   
> +static void vfio_save_state(QEMUFile *f, void *opaque)
> +{
> +    VFIODevice *vbasedev = opaque;
> +    int ret;
> +
> +    /* The device specific data is migrated in the iterable process. */
> +    ret = vfio_save_device_config_state(f, opaque);
> +    if (ret) {
> +        error_report("%s: Failed to save device config space",
> +                     vbasedev->name);
> +    }
> +}
> +

Since error is not propagated, set error in migration stream for 
migration to fail, use qemu_file_set_error() on error.

Thanks,
Kirti

>   static int vfio_load_setup(QEMUFile *f, void *opaque)
>   {
>       VFIODevice *vbasedev = opaque;
> @@ -670,11 +678,7 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>           switch (data) {
>           case VFIO_MIG_FLAG_DEV_CONFIG_STATE:
>           {
> -            ret = vfio_load_device_config_state(f, opaque);
> -            if (ret) {
> -                return ret;
> -            }
> -            break;
> +            return vfio_load_device_config_state(f, opaque);
>           }
>           case VFIO_MIG_FLAG_DEV_SETUP_STATE:
>           {
> @@ -720,6 +724,7 @@ static SaveVMHandlers savevm_vfio_handlers = {
>       .save_live_pending = vfio_save_pending,
>       .save_live_iterate = vfio_save_iterate,
>       .save_live_complete_precopy = vfio_save_complete_precopy,
> +    .save_state = vfio_save_state,
>       .load_setup = vfio_load_setup,
>       .load_cleanup = vfio_load_cleanup,
>       .load_state = vfio_load_state,
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 1/3] vfio: Move the saving of the config space to the right place in VFIO migration
  2021-01-26 21:36         ` Alex Williamson
  2021-01-27 21:52           ` Kirti Wankhede
@ 2021-02-18 14:45           ` Kirti Wankhede
  1 sibling, 0 replies; 24+ messages in thread
From: Kirti Wankhede @ 2021-02-18 14:45 UTC (permalink / raw)
  To: Alex Williamson, Shenming Lu
  Cc: Lorenzo Pieralisi, Neo Jia, mst, Marc Zyngier, Cornelia Huck,
	qemu-devel, Dr . David Alan Gilbert, Eric Auger, qemu-arm,
	yuzenghui, wanghaibin.wang



On 1/27/2021 3:06 AM, Alex Williamson wrote:
> On Thu, 10 Dec 2020 10:21:21 +0800
> Shenming Lu <lushenming@huawei.com> wrote:
> 
>> On 2020/12/10 2:34, Alex Williamson wrote:
>>> On Wed, 9 Dec 2020 13:29:47 +0100
>>> Cornelia Huck <cohuck@redhat.com> wrote:
>>>    
>>>> On Wed, 9 Dec 2020 16:09:17 +0800
>>>> Shenming Lu <lushenming@huawei.com> wrote:
>>>>   
>>>>> On ARM64 the VFIO SET_IRQS ioctl is dependent on the VM interrupt
>>>>> setup, if the restoring of the VFIO PCI device config space is
>>>>> before the VGIC, an error might occur in the kernel.
>>>>>
>>>>> So we move the saving of the config space to the non-iterable
>>>>> process, so that it will be called after the VGIC according to
>>>>> their priorities.
>>>>>
>>>>> As for the possible dependence of the device specific migration
>>>>> data on it's config space, we can let the vendor driver to
>>>>> include any config info it needs in its own data stream.
>>>>> (Should we note this in the header file linux-headers/linux/vfio.h?)
>>>>
>>>> Given that the header is our primary source about how this interface
>>>> should act, we need to properly document expectations about what will
>>>> be saved/restored when there (well, in the source file in the kernel.)
>>>> That goes in both directions: what a userspace must implement, and what
>>>> a vendor driver can rely on.
>>
>> Yeah, in order to make the vendor driver and QEMU cooperate better, we might
>> need to document some expectations about the data section in the migration
>> region...
>>>>
>>>> [Related, but not a todo for you: I think we're still missing proper
>>>> documentation of the whole migration feature.]
>>>
>>> Yes, we never saw anything past v1 of the documentation patch.  Thanks,
>>>

I'll get back on this and send next version soon.

>>
>> By the way, is there anything unproper with this patch? Wish your suggestion. :-)
> 
> I'm really hoping for some feedback from Kirti, I understand the NVIDIA
> vGPU driver to have some dependency on this.  Thanks,

NVIDIA driver doesn't use device config space value/information during 
device data restoration, so we are good with this change.

Thanks,
Kirti


> 
> Alex
> 
>>>>> Signed-off-by: Shenming Lu <lushenming@huawei.com>
>>>>> ---
>>>>>   hw/vfio/migration.c | 25 +++++++++++++++----------
>>>>>   1 file changed, 15 insertions(+), 10 deletions(-)
>>>
>>> .
>>>    
>>
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 1/3] vfio: Move the saving of the config space to the right place in VFIO migration
  2021-02-18 14:42   ` Kirti Wankhede
@ 2021-02-19 10:33     ` Shenming Lu
  0 siblings, 0 replies; 24+ messages in thread
From: Shenming Lu @ 2021-02-19 10:33 UTC (permalink / raw)
  To: Kirti Wankhede, Alex Williamson
  Cc: Lorenzo Pieralisi, Neo Jia, mst, Marc Zyngier, Cornelia Huck,
	qemu-devel, Dr . David Alan Gilbert, Eric Auger, qemu-arm,
	yuzenghui, wanghaibin.wang

On 2021/2/18 22:42, Kirti Wankhede wrote:
> 
> 
> On 12/9/2020 1:39 PM, Shenming Lu wrote:
>> On ARM64 the VFIO SET_IRQS ioctl is dependent on the VM interrupt
>> setup, if the restoring of the VFIO PCI device config space is
>> before the VGIC, an error might occur in the kernel.
>>
>> So we move the saving of the config space to the non-iterable
>> process, so that it will be called after the VGIC according to
>> their priorities.
>>
>> As for the possible dependence of the device specific migration
>> data on it's config space, we can let the vendor driver to
>> include any config info it needs in its own data stream.
>> (Should we note this in the header file linux-headers/linux/vfio.h?)
>>
>> Signed-off-by: Shenming Lu <lushenming@huawei.com>
>> ---
>>   hw/vfio/migration.c | 25 +++++++++++++++----------
>>   1 file changed, 15 insertions(+), 10 deletions(-)
>>
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index 00daa50ed8..3b9de1353a 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -575,11 +575,6 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
>>           return ret;
>>       }
>>   -    ret = vfio_save_device_config_state(f, opaque);
>> -    if (ret) {
>> -        return ret;
>> -    }
>> -
>>       ret = vfio_update_pending(vbasedev);
>>       if (ret) {
>>           return ret;
>> @@ -620,6 +615,19 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
>>       return ret;
>>   }
>>   +static void vfio_save_state(QEMUFile *f, void *opaque)
>> +{
>> +    VFIODevice *vbasedev = opaque;
>> +    int ret;
>> +
>> +    /* The device specific data is migrated in the iterable process. */
>> +    ret = vfio_save_device_config_state(f, opaque);
>> +    if (ret) {
>> +        error_report("%s: Failed to save device config space",
>> +                     vbasedev->name);
>> +    }
>> +}
>> +
> 
> Since error is not propagated, set error in migration stream for migration to fail, use qemu_file_set_error() on error.

Makes sense. I will send a v3 soon.	Thanks,

Shenming

> 
> Thanks,
> Kirti
> 
>>   static int vfio_load_setup(QEMUFile *f, void *opaque)
>>   {
>>       VFIODevice *vbasedev = opaque;
>> @@ -670,11 +678,7 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>>           switch (data) {
>>           case VFIO_MIG_FLAG_DEV_CONFIG_STATE:
>>           {
>> -            ret = vfio_load_device_config_state(f, opaque);
>> -            if (ret) {
>> -                return ret;
>> -            }
>> -            break;
>> +            return vfio_load_device_config_state(f, opaque);
>>           }
>>           case VFIO_MIG_FLAG_DEV_SETUP_STATE:
>>           {
>> @@ -720,6 +724,7 @@ static SaveVMHandlers savevm_vfio_handlers = {
>>       .save_live_pending = vfio_save_pending,
>>       .save_live_iterate = vfio_save_iterate,
>>       .save_live_complete_precopy = vfio_save_complete_precopy,
>> +    .save_state = vfio_save_state,
>>       .load_setup = vfio_load_setup,
>>       .load_cleanup = vfio_load_cleanup,
>>       .load_state = vfio_load_state,
>>
> .


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2021-02-19 10:34 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-09  8:09 [RFC PATCH v2 0/3] vfio: Some fixes and optimizations for VFIO migration Shenming Lu
2020-12-09  8:09 ` [RFC PATCH v2 1/3] vfio: Move the saving of the config space to the right place in " Shenming Lu
2020-12-09 12:29   ` Cornelia Huck
2020-12-09 18:34     ` Alex Williamson
2020-12-10  2:21       ` Shenming Lu
2021-01-26 21:36         ` Alex Williamson
2021-01-27 21:52           ` Kirti Wankhede
2021-02-18 14:45           ` Kirti Wankhede
2021-02-18 14:42   ` Kirti Wankhede
2021-02-19 10:33     ` Shenming Lu
2020-12-09  8:09 ` [RFC PATCH v2 2/3] vfio: Set the priority of the VFIO VM state change handler explicitly Shenming Lu
2020-12-09 12:45   ` Cornelia Huck
2020-12-10  3:11     ` Shenming Lu
2020-12-10 18:39       ` Cornelia Huck
2021-01-26 21:36   ` Alex Williamson
2021-01-27 11:20     ` Shenming Lu
2021-01-27 14:20       ` Alex Williamson
2021-01-28  2:35         ` Shenming Lu
2020-12-09  8:09 ` [RFC PATCH v2 3/3] vfio: Avoid disabling and enabling vectors repeatedly in VFIO migration Shenming Lu
2021-01-26 21:36   ` Alex Williamson
2021-01-27 11:27     ` Shenming Lu
2021-01-27 14:21       ` Alex Williamson
2021-02-01  3:12         ` Shenming Lu
2021-01-26  6:03 ` [RFC PATCH v2 0/3] vfio: Some fixes and optimizations for " Shenming Lu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.