All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/3] VFIO migration related refactor and bug fix
@ 2023-06-21  8:02 Zhenzhong Duan
  2023-06-21  8:02 ` [PATCH v3 1/3] vfio/pci: Fix resource leak in vfio_realize Zhenzhong Duan
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Zhenzhong Duan @ 2023-06-21  8:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.williamson, clg, joao.m.martins, avihaih, chao.p.peng

PATCH3 refactors the VFIO migration blocker related code based on
suggestions from Joao and Cedric, so that code is simpler and
"Migration disabled" printed in right case.

But before that works, also found some hotplug bugs when testing
blocker adding failed case. PATCH1 and PATCH2 fix them.

See patch description for details.

Thanks

v3: Add PATCH1,2 to fix hotplug bug
    Fix bugs in PATCH3 Avihai and Joao pointed out

Tested vfio hotplug/unplug with vfio migration supported and unsupported cases,
including blocker adding failed case.

Zhenzhong Duan (3):
  vfio/pci: Fix resource leak in vfio_realize
  vfio/pci: Fix a segfault in vfio_realize
  vfio/migration: vfio/migration: Refactor and fix print of "Migration
    disabled"

 hw/vfio/common.c              | 54 +++++------------------------------
 hw/vfio/migration.c           | 37 +++++++++++-------------
 hw/vfio/pci.c                 |  9 ++++--
 include/hw/vfio/vfio-common.h |  7 ++---
 4 files changed, 33 insertions(+), 74 deletions(-)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v3 1/3] vfio/pci: Fix resource leak in vfio_realize
  2023-06-21  8:02 [PATCH v3 0/3] VFIO migration related refactor and bug fix Zhenzhong Duan
@ 2023-06-21  8:02 ` Zhenzhong Duan
  2023-06-21 11:08   ` Joao Martins
  2023-06-21  8:02 ` [PATCH v3 2/3] vfio/pci: Fix a segfault " Zhenzhong Duan
  2023-06-21  8:02 ` [PATCH v3 3/3] vfio/migration: vfio/migration: Refactor and fix print of "Migration disabled" Zhenzhong Duan
  2 siblings, 1 reply; 20+ messages in thread
From: Zhenzhong Duan @ 2023-06-21  8:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.williamson, clg, joao.m.martins, avihaih, chao.p.peng

When adding migration blocker failed in vfio_migration_realize(),
hotplug will fail and we see below:

(qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,x-enable-migration=true
0000:81:11.1: VFIO migration is not supported in kernel
Error: disallowing migration blocker (--only-migratable) for: VFIO device doesn't support migration

If we hotplug again we should see same log as above, but we see:
(qemu) device_add vfio-pci,host=81:11.0,id=vfio0,bus=root0,x-enable-migration=true
Error: vfio 0000:81:11.0: device is already attached

That's because some references to VFIO device isn't released,
we should check return value of vfio_migration_realize() and
release the references, then VFIO device will be truely
released when hotplug failed.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 hw/vfio/pci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 73874a94de12..c71b0955d81c 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3210,6 +3210,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
         ret = vfio_migration_realize(vbasedev, errp);
         if (ret) {
             error_report("%s: Migration disabled", vbasedev->name);
+            goto out_deregister;
         }
     }
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 2/3] vfio/pci: Fix a segfault in vfio_realize
  2023-06-21  8:02 [PATCH v3 0/3] VFIO migration related refactor and bug fix Zhenzhong Duan
  2023-06-21  8:02 ` [PATCH v3 1/3] vfio/pci: Fix resource leak in vfio_realize Zhenzhong Duan
@ 2023-06-21  8:02 ` Zhenzhong Duan
  2023-06-21 11:08   ` Joao Martins
  2023-06-21  8:02 ` [PATCH v3 3/3] vfio/migration: vfio/migration: Refactor and fix print of "Migration disabled" Zhenzhong Duan
  2 siblings, 1 reply; 20+ messages in thread
From: Zhenzhong Duan @ 2023-06-21  8:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.williamson, clg, joao.m.martins, avihaih, chao.p.peng

In case irqchip_change_notifier isn't added, removing it triggers segfault.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 hw/vfio/pci.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index c71b0955d81c..82c4cf4f7609 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3222,7 +3222,9 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 
 out_deregister:
     pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
-    kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
+    if (vdev->irqchip_change_notifier.notify) {
+        kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
+    }
 out_teardown:
     vfio_teardown_msi(vdev);
     vfio_bars_exit(vdev);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 3/3] vfio/migration: vfio/migration: Refactor and fix print of "Migration disabled"
  2023-06-21  8:02 [PATCH v3 0/3] VFIO migration related refactor and bug fix Zhenzhong Duan
  2023-06-21  8:02 ` [PATCH v3 1/3] vfio/pci: Fix resource leak in vfio_realize Zhenzhong Duan
  2023-06-21  8:02 ` [PATCH v3 2/3] vfio/pci: Fix a segfault " Zhenzhong Duan
@ 2023-06-21  8:02 ` Zhenzhong Duan
  2023-06-26  9:34   ` Avihai Horon
  2 siblings, 1 reply; 20+ messages in thread
From: Zhenzhong Duan @ 2023-06-21  8:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.williamson, clg, joao.m.martins, avihaih, chao.p.peng

This patch refactors vfio_migration_realize() and its dependend code
as follows:

1. It's redundant in vfio_migration_realize() to registers multiple blockers,
   e.g: vIOMMU blocker can be refactored as per device blocker.
2. Change vfio_block_giommu_migration() to be only a per device checker.
3. Remove global vIOMMU blocker related stuff, e.g:
   giommu_migration_blocker, vfio_unblock_giommu_migration(),
   vfio_viommu_preset() and vfio_migration_finalize()
4. Change vfio_migration_realize() and dependent vfio_block_*_migration() to
   return bool type.
5. Change to print "Migration disabled" only after adding blocker succeed.
6. Add device name to errp so "info migrate" could be more informative.

migrate_add_blocker() returns 0 when successfully adding the migration blocker.
However, the caller of vfio_migration_realize() considers that migration was
blocked when the latter returned an error. What matters for migration is that
the blocker is added in core migration, so this cleans up usability such that
user sees "Migrate disabled" when any of the vfio migration blockers are active.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 hw/vfio/common.c              | 54 +++++------------------------------
 hw/vfio/migration.c           | 37 +++++++++++-------------
 hw/vfio/pci.c                 |  4 +--
 include/hw/vfio/vfio-common.h |  7 ++---
 4 files changed, 29 insertions(+), 73 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index fa8fd949b1cf..cc5f4e805341 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -362,8 +362,6 @@ bool vfio_mig_active(void)
 }
 
 static Error *multiple_devices_migration_blocker;
-static Error *giommu_migration_blocker;
-
 static unsigned int vfio_migratable_device_num(void)
 {
     VFIOGroup *group;
@@ -381,13 +379,13 @@ static unsigned int vfio_migratable_device_num(void)
     return device_num;
 }
 
-int vfio_block_multiple_devices_migration(Error **errp)
+bool vfio_block_multiple_devices_migration(Error **errp)
 {
     int ret;
 
     if (multiple_devices_migration_blocker ||
         vfio_migratable_device_num() <= 1) {
-        return 0;
+        return true;
     }
 
     error_setg(&multiple_devices_migration_blocker,
@@ -397,9 +395,11 @@ int vfio_block_multiple_devices_migration(Error **errp)
     if (ret < 0) {
         error_free(multiple_devices_migration_blocker);
         multiple_devices_migration_blocker = NULL;
+    } else {
+        error_report("Migration disabled, not support multiple VFIO devices");
     }
 
-    return ret;
+    return !ret;
 }
 
 void vfio_unblock_multiple_devices_migration(void)
@@ -414,49 +414,9 @@ void vfio_unblock_multiple_devices_migration(void)
     multiple_devices_migration_blocker = NULL;
 }
 
-static bool vfio_viommu_preset(void)
-{
-    VFIOAddressSpace *space;
-
-    QLIST_FOREACH(space, &vfio_address_spaces, list) {
-        if (space->as != &address_space_memory) {
-            return true;
-        }
-    }
-
-    return false;
-}
-
-int vfio_block_giommu_migration(Error **errp)
-{
-    int ret;
-
-    if (giommu_migration_blocker ||
-        !vfio_viommu_preset()) {
-        return 0;
-    }
-
-    error_setg(&giommu_migration_blocker,
-               "Migration is currently not supported with vIOMMU enabled");
-    ret = migrate_add_blocker(giommu_migration_blocker, errp);
-    if (ret < 0) {
-        error_free(giommu_migration_blocker);
-        giommu_migration_blocker = NULL;
-    }
-
-    return ret;
-}
-
-void vfio_migration_finalize(void)
+bool vfio_block_giommu_migration(VFIODevice *vbasedev)
 {
-    if (!giommu_migration_blocker ||
-        vfio_viommu_preset()) {
-        return;
-    }
-
-    migrate_del_blocker(giommu_migration_blocker);
-    error_free(giommu_migration_blocker);
-    giommu_migration_blocker = NULL;
+    return vbasedev->group->container->space->as != &address_space_memory;
 }
 
 static void vfio_set_migration_error(int err)
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 6b58dddb8859..7621074f156d 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -632,42 +632,39 @@ int64_t vfio_mig_bytes_transferred(void)
     return bytes_transferred;
 }
 
-int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
+/* Return true when either migration initialized or blocker registered */
+bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
 {
-    int ret = -ENOTSUP;
+    int ret;
 
-    if (!vbasedev->enable_migration) {
+    if (!vbasedev->enable_migration || vfio_migration_init(vbasedev)) {
+        error_setg(&vbasedev->migration_blocker,
+                   "VFIO device %s doesn't support migration", vbasedev->name);
         goto add_blocker;
     }
 
-    ret = vfio_migration_init(vbasedev);
-    if (ret) {
-        goto add_blocker;
-    }
-
-    ret = vfio_block_multiple_devices_migration(errp);
-    if (ret) {
-        return ret;
+    if (!vfio_block_multiple_devices_migration(errp)) {
+        return false;
     }
 
-    ret = vfio_block_giommu_migration(errp);
-    if (ret) {
-        return ret;
+    if (vfio_block_giommu_migration(vbasedev)) {
+        error_setg(&vbasedev->migration_blocker,
+                   "Migration is currently not supported on %s "
+                   "with vIOMMU enabled", vbasedev->name);
+        goto add_blocker;
     }
 
-    trace_vfio_migration_probe(vbasedev->name);
-    return 0;
+    return true;
 
 add_blocker:
-    error_setg(&vbasedev->migration_blocker,
-               "VFIO device doesn't support migration");
-
     ret = migrate_add_blocker(vbasedev->migration_blocker, errp);
     if (ret < 0) {
         error_free(vbasedev->migration_blocker);
         vbasedev->migration_blocker = NULL;
+    } else {
+        error_report("%s: Migration disabled", vbasedev->name);
     }
-    return ret;
+    return !ret;
 }
 
 void vfio_migration_exit(VFIODevice *vbasedev)
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 82c4cf4f7609..061ca96cbce2 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3209,7 +3209,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
     if (!pdev->failover_pair_id) {
         ret = vfio_migration_realize(vbasedev, errp);
         if (ret) {
-            error_report("%s: Migration disabled", vbasedev->name);
+            trace_vfio_migration_probe(vbasedev->name);
+        } else {
             goto out_deregister;
         }
     }
@@ -3250,7 +3251,6 @@ static void vfio_instance_finalize(Object *obj)
      */
     vfio_put_device(vdev);
     vfio_put_group(group);
-    vfio_migration_finalize();
 }
 
 static void vfio_exitfn(PCIDevice *pdev)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index eed244f25f34..a2e2171b1f93 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -220,9 +220,9 @@ typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
 extern VFIOGroupList vfio_group_list;
 
 bool vfio_mig_active(void);
-int vfio_block_multiple_devices_migration(Error **errp);
+bool vfio_block_multiple_devices_migration(Error **errp);
 void vfio_unblock_multiple_devices_migration(void);
-int vfio_block_giommu_migration(Error **errp);
+bool vfio_block_giommu_migration(VFIODevice *vbasedev);
 int64_t vfio_mig_bytes_transferred(void);
 
 #ifdef CONFIG_LINUX
@@ -246,8 +246,7 @@ int vfio_spapr_create_window(VFIOContainer *container,
 int vfio_spapr_remove_window(VFIOContainer *container,
                              hwaddr offset_within_address_space);
 
-int vfio_migration_realize(VFIODevice *vbasedev, Error **errp);
+bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp);
 void vfio_migration_exit(VFIODevice *vbasedev);
-void vfio_migration_finalize(void);
 
 #endif /* HW_VFIO_VFIO_COMMON_H */
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 1/3] vfio/pci: Fix resource leak in vfio_realize
  2023-06-21  8:02 ` [PATCH v3 1/3] vfio/pci: Fix resource leak in vfio_realize Zhenzhong Duan
@ 2023-06-21 11:08   ` Joao Martins
  2023-06-25  6:00     ` Duan, Zhenzhong
  0 siblings, 1 reply; 20+ messages in thread
From: Joao Martins @ 2023-06-21 11:08 UTC (permalink / raw)
  To: Zhenzhong Duan; +Cc: alex.williamson, clg, qemu-devel, avihaih, chao.p.peng



On 21/06/2023 09:02, Zhenzhong Duan wrote:
> When adding migration blocker failed in vfio_migration_realize(),
> hotplug will fail and we see below:
> 
> (qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,x-enable-migration=true
> 0000:81:11.1: VFIO migration is not supported in kernel
> Error: disallowing migration blocker (--only-migratable) for: VFIO device doesn't support migration
> 
> If we hotplug again we should see same log as above, but we see:
> (qemu) device_add vfio-pci,host=81:11.0,id=vfio0,bus=root0,x-enable-migration=true
> Error: vfio 0000:81:11.0: device is already attached
> 
> That's because some references to VFIO device isn't released,
> we should check return value of vfio_migration_realize() and
> release the references, then VFIO device will be truely
> released when hotplug failed.
> 
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
>  hw/vfio/pci.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 73874a94de12..c71b0955d81c 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3210,6 +3210,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>          ret = vfio_migration_realize(vbasedev, errp);
>          if (ret) {
>              error_report("%s: Migration disabled", vbasedev->name);
> +            goto out_deregister;
>          }
>      }
>  
This doesn't look right. This means that failure to support migration will
deregister the device. Migration "realize" should not condition as to whether
your device finishes the realize. Maybe the fix is elsewhere?


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 2/3] vfio/pci: Fix a segfault in vfio_realize
  2023-06-21  8:02 ` [PATCH v3 2/3] vfio/pci: Fix a segfault " Zhenzhong Duan
@ 2023-06-21 11:08   ` Joao Martins
  2023-06-25  6:01     ` Duan, Zhenzhong
  0 siblings, 1 reply; 20+ messages in thread
From: Joao Martins @ 2023-06-21 11:08 UTC (permalink / raw)
  To: Zhenzhong Duan; +Cc: alex.williamson, clg, qemu-devel, avihaih, chao.p.peng



On 21/06/2023 09:02, Zhenzhong Duan wrote:
> In case irqchip_change_notifier isn't added, removing it triggers segfault.
> 
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
>  hw/vfio/pci.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index c71b0955d81c..82c4cf4f7609 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3222,7 +3222,9 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>  
>  out_deregister:
>      pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
> -    kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
> +    if (vdev->irqchip_change_notifier.notify) {
> +        kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
> +    }

If the first patch ends up being pursued (which I am not quite sure) it should
be folded in the previous patch, as the out_deregister is used starting your
patch 1.

>  out_teardown:
>      vfio_teardown_msi(vdev);
>      vfio_bars_exit(vdev);


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v3 1/3] vfio/pci: Fix resource leak in vfio_realize
  2023-06-21 11:08   ` Joao Martins
@ 2023-06-25  6:00     ` Duan, Zhenzhong
  2023-06-26  7:02       ` Duan, Zhenzhong
  0 siblings, 1 reply; 20+ messages in thread
From: Duan, Zhenzhong @ 2023-06-25  6:00 UTC (permalink / raw)
  To: Martins, Joao; +Cc: alex.williamson, clg, qemu-devel, avihaih, Peng, Chao P

>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Sent: Wednesday, June 21, 2023 7:08 PM
>To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>Cc: alex.williamson@redhat.com; clg@redhat.com; qemu-devel@nongnu.org;
>avihaih@nvidia.com; Peng, Chao P <chao.p.peng@intel.com>
>Subject: Re: [PATCH v3 1/3] vfio/pci: Fix resource leak in vfio_realize
>
>
>
>On 21/06/2023 09:02, Zhenzhong Duan wrote:
>> When adding migration blocker failed in vfio_migration_realize(),
>> hotplug will fail and we see below:
>>
>> (qemu) device_add
>> vfio-pci,host=81:11.1,id=vfio1,bus=root1,x-enable-migration=true
>> 0000:81:11.1: VFIO migration is not supported in kernel
>> Error: disallowing migration blocker (--only-migratable) for: VFIO
>> device doesn't support migration
>>
>> If we hotplug again we should see same log as above, but we see:
>> (qemu) device_add
>> vfio-pci,host=81:11.0,id=vfio0,bus=root0,x-enable-migration=true
>> Error: vfio 0000:81:11.0: device is already attached
>>
>> That's because some references to VFIO device isn't released, we
>> should check return value of vfio_migration_realize() and release the
>> references, then VFIO device will be truely released when hotplug
>> failed.
>>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>>  hw/vfio/pci.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index
>> 73874a94de12..c71b0955d81c 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -3210,6 +3210,7 @@ static void vfio_realize(PCIDevice *pdev, Error
>**errp)
>>          ret = vfio_migration_realize(vbasedev, errp);
>>          if (ret) {
>>              error_report("%s: Migration disabled", vbasedev->name);
>> +            goto out_deregister;
>>          }
>>      }
>>
>This doesn't look right. This means that failure to support migration will
>deregister the device.

In my understanding, failure to support migration but success to add migration blocker will not deregister device. vfio_migration_realize() is successful in this case.
But failure to add migration blocker should deregister device, because vfio_exitfn() is never called in this case(errp set), jumping to out_deregister is the best choice. Then vfio_migration_realize() should return failure in this case.

> Migration "realize" should not condition as to whether
>your device finishes the realize.
What's the impact if we condition it?

Thanks
Zhenzhong



^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v3 2/3] vfio/pci: Fix a segfault in vfio_realize
  2023-06-21 11:08   ` Joao Martins
@ 2023-06-25  6:01     ` Duan, Zhenzhong
  2023-06-26  9:58       ` Joao Martins
  0 siblings, 1 reply; 20+ messages in thread
From: Duan, Zhenzhong @ 2023-06-25  6:01 UTC (permalink / raw)
  To: Martins, Joao; +Cc: alex.williamson, clg, qemu-devel, avihaih, Peng, Chao P

>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Sent: Wednesday, June 21, 2023 7:09 PM
>To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>Cc: alex.williamson@redhat.com; clg@redhat.com; qemu-devel@nongnu.org;
>avihaih@nvidia.com; Peng, Chao P <chao.p.peng@intel.com>
>Subject: Re: [PATCH v3 2/3] vfio/pci: Fix a segfault in vfio_realize
>
>
>
>On 21/06/2023 09:02, Zhenzhong Duan wrote:
>> In case irqchip_change_notifier isn't added, removing it triggers segfault.
>>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>>  hw/vfio/pci.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index
>> c71b0955d81c..82c4cf4f7609 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -3222,7 +3222,9 @@ static void vfio_realize(PCIDevice *pdev, Error
>> **errp)
>>
>>  out_deregister:
>>      pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
>> -    kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
>> +    if (vdev->irqchip_change_notifier.notify) {
>> +        kvm_irqchip_remove_change_notifier(&vdev-
>>irqchip_change_notifier);
>> +    }
>
>If the first patch ends up being pursued (which I am not quite sure) it should
>be folded in the previous patch, as the out_deregister is used starting your
>patch 1.
Sorry for late response, just back from vacation.

out_deregister isn't only for vfio migration, there are some other jump sites to out_deregister in vfio_realize. Take below code for example:

    if (vdev->display_xres || vdev->display_yres) {
        if (vdev->dpy == NULL) {
            error_setg(errp, "xres and yres properties require display=on");
            goto out_deregister;
        }

I can reproduce a segmentation fault when hotplug a vfio device using below cmd:
(qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,xres=1 Connection closed by foreign host.

After fix:
(qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,xres=1
Error: vfio 0000:81:11.1: xres and yres properties require display=on
(qemu)

Thanks
Zhenzhong



^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v3 1/3] vfio/pci: Fix resource leak in vfio_realize
  2023-06-25  6:00     ` Duan, Zhenzhong
@ 2023-06-26  7:02       ` Duan, Zhenzhong
  2023-06-26 10:07         ` Joao Martins
  0 siblings, 1 reply; 20+ messages in thread
From: Duan, Zhenzhong @ 2023-06-26  7:02 UTC (permalink / raw)
  To: Martins, Joao; +Cc: alex.williamson, clg, qemu-devel, avihaih, Peng, Chao P

>-----Original Message-----
>From: Duan, Zhenzhong
>Sent: Sunday, June 25, 2023 2:01 PM
>To: Joao Martins <joao.m.martins@oracle.com>
>Cc: alex.williamson@redhat.com; clg@redhat.com; qemu-devel@nongnu.org;
>avihaih@nvidia.com; Peng, Chao P <chao.p.peng@intel.com>
>Subject: RE: [PATCH v3 1/3] vfio/pci: Fix resource leak in vfio_realize
>
>>-----Original Message-----
>>From: Joao Martins <joao.m.martins@oracle.com>
>>Sent: Wednesday, June 21, 2023 7:08 PM
>>To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>>Cc: alex.williamson@redhat.com; clg@redhat.com; qemu-
>devel@nongnu.org;
>>avihaih@nvidia.com; Peng, Chao P <chao.p.peng@intel.com>
>>Subject: Re: [PATCH v3 1/3] vfio/pci: Fix resource leak in vfio_realize
>>
>>
>>
>>On 21/06/2023 09:02, Zhenzhong Duan wrote:
>>> When adding migration blocker failed in vfio_migration_realize(),
>>> hotplug will fail and we see below:
>>>
>>> (qemu) device_add
>>> vfio-pci,host=81:11.1,id=vfio1,bus=root1,x-enable-migration=true
>>> 0000:81:11.1: VFIO migration is not supported in kernel
>>> Error: disallowing migration blocker (--only-migratable) for: VFIO
>>> device doesn't support migration
>>>
>>> If we hotplug again we should see same log as above, but we see:
>>> (qemu) device_add
>>> vfio-pci,host=81:11.0,id=vfio0,bus=root0,x-enable-migration=true
>>> Error: vfio 0000:81:11.0: device is already attached
>>>
>>> That's because some references to VFIO device isn't released, we
>>> should check return value of vfio_migration_realize() and release the
>>> references, then VFIO device will be truely released when hotplug
>>> failed.
>>>
>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>> ---
>>>  hw/vfio/pci.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index
>>> 73874a94de12..c71b0955d81c 100644
>>> --- a/hw/vfio/pci.c
>>> +++ b/hw/vfio/pci.c
>>> @@ -3210,6 +3210,7 @@ static void vfio_realize(PCIDevice *pdev, Error
>>**errp)
>>>          ret = vfio_migration_realize(vbasedev, errp);
>>>          if (ret) {
>>>              error_report("%s: Migration disabled", vbasedev->name);
>>> +            goto out_deregister;
>>>          }
>>>      }
>>>
>>This doesn't look right. This means that failure to support migration
>>will deregister the device.
>
>In my understanding, failure to support migration but success to add
>migration blocker will not deregister device. vfio_migration_realize() is
>successful in this case.
>But failure to add migration blocker should deregister device, because
>vfio_exitfn() is never called in this case(errp set), jumping to out_deregister is
>the best choice. Then vfio_migration_realize() should return failure in this
>case.

I just realized the error path in vfio_realize() isn't adequate. We need to add more deregister code just as vfio_exitfn(), see below. I'll re-post after we are consensus on this and get some comments of PATCH3.

--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3210,7 +3210,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
         ret = vfio_migration_realize(vbasedev, errp);
         if (ret) {
             error_report("%s: Migration disabled", vbasedev->name);
-            goto out_deregister;
+            goto out_vfio_migration;
         }
     }

@@ -3220,11 +3220,17 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)

     return;

+out_vfio_migration:
+    vfio_migration_exit(vbasedev);
 out_deregister:
     pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
     if (vdev->irqchip_change_notifier.notify) {
         kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
     }
+    vfio_disable_interrupts(vdev);
+    if (vdev->intx.mmap_timer) {
+        timer_free(vdev->intx.mmap_timer);
+    }
 out_teardown:
     vfio_teardown_msi(vdev);
     vfio_bars_exit(vdev);

Thanks
Zhenzhong

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 3/3] vfio/migration: vfio/migration: Refactor and fix print of "Migration disabled"
  2023-06-21  8:02 ` [PATCH v3 3/3] vfio/migration: vfio/migration: Refactor and fix print of "Migration disabled" Zhenzhong Duan
@ 2023-06-26  9:34   ` Avihai Horon
  2023-06-26 10:18     ` Joao Martins
  2023-06-27  2:46     ` Duan, Zhenzhong
  0 siblings, 2 replies; 20+ messages in thread
From: Avihai Horon @ 2023-06-26  9:34 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, clg, joao.m.martins, chao.p.peng


On 21/06/2023 11:02, Zhenzhong Duan wrote:
> External email: Use caution opening links or attachments
>
>
> This patch refactors vfio_migration_realize() and its dependend code
> as follows:
>
> 1. It's redundant in vfio_migration_realize() to registers multiple blockers,
>     e.g: vIOMMU blocker can be refactored as per device blocker.
> 2. Change vfio_block_giommu_migration() to be only a per device checker.
> 3. Remove global vIOMMU blocker related stuff, e.g:
>     giommu_migration_blocker, vfio_unblock_giommu_migration(),
>     vfio_viommu_preset() and vfio_migration_finalize()
> 4. Change vfio_migration_realize() and dependent vfio_block_*_migration() to
>     return bool type.
> 5. Change to print "Migration disabled" only after adding blocker succeed.
> 6. Add device name to errp so "info migrate" could be more informative.
>
> migrate_add_blocker() returns 0 when successfully adding the migration blocker.
> However, the caller of vfio_migration_realize() considers that migration was
> blocked when the latter returned an error. What matters for migration is that
> the blocker is added in core migration, so this cleans up usability such that
> user sees "Migrate disabled" when any of the vfio migration blockers are active.
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
>   hw/vfio/common.c              | 54 +++++------------------------------
>   hw/vfio/migration.c           | 37 +++++++++++-------------
>   hw/vfio/pci.c                 |  4 +--
>   include/hw/vfio/vfio-common.h |  7 ++---
>   4 files changed, 29 insertions(+), 73 deletions(-)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index fa8fd949b1cf..cc5f4e805341 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -362,8 +362,6 @@ bool vfio_mig_active(void)
>   }
>
>   static Error *multiple_devices_migration_blocker;
> -static Error *giommu_migration_blocker;
> -
>   static unsigned int vfio_migratable_device_num(void)
>   {
>       VFIOGroup *group;
> @@ -381,13 +379,13 @@ static unsigned int vfio_migratable_device_num(void)
>       return device_num;
>   }
>
> -int vfio_block_multiple_devices_migration(Error **errp)
> +bool vfio_block_multiple_devices_migration(Error **errp)
>   {
>       int ret;
>
>       if (multiple_devices_migration_blocker ||
>           vfio_migratable_device_num() <= 1) {
> -        return 0;
> +        return true;
>       }
>
>       error_setg(&multiple_devices_migration_blocker,
> @@ -397,9 +395,11 @@ int vfio_block_multiple_devices_migration(Error **errp)
>       if (ret < 0) {
>           error_free(multiple_devices_migration_blocker);
>           multiple_devices_migration_blocker = NULL;
> +    } else {
> +        error_report("Migration disabled, not support multiple VFIO devices");
>       }
>
> -    return ret;
> +    return !ret;
>   }
>
>   void vfio_unblock_multiple_devices_migration(void)
> @@ -414,49 +414,9 @@ void vfio_unblock_multiple_devices_migration(void)
>       multiple_devices_migration_blocker = NULL;
>   }
>
> -static bool vfio_viommu_preset(void)
> -{
> -    VFIOAddressSpace *space;
> -
> -    QLIST_FOREACH(space, &vfio_address_spaces, list) {
> -        if (space->as != &address_space_memory) {
> -            return true;
> -        }
> -    }
> -
> -    return false;
> -}
> -
> -int vfio_block_giommu_migration(Error **errp)
> -{
> -    int ret;
> -
> -    if (giommu_migration_blocker ||
> -        !vfio_viommu_preset()) {
> -        return 0;
> -    }
> -
> -    error_setg(&giommu_migration_blocker,
> -               "Migration is currently not supported with vIOMMU enabled");
> -    ret = migrate_add_blocker(giommu_migration_blocker, errp);
> -    if (ret < 0) {
> -        error_free(giommu_migration_blocker);
> -        giommu_migration_blocker = NULL;
> -    }
> -
> -    return ret;
> -}
> -
> -void vfio_migration_finalize(void)
> +bool vfio_block_giommu_migration(VFIODevice *vbasedev)
>   {
> -    if (!giommu_migration_blocker ||
> -        vfio_viommu_preset()) {
> -        return;
> -    }
> -
> -    migrate_del_blocker(giommu_migration_blocker);
> -    error_free(giommu_migration_blocker);
> -    giommu_migration_blocker = NULL;
> +    return vbasedev->group->container->space->as != &address_space_memory;
>   }

I guess vfio_block_giommu_migration can be moved to migration.c and made 
static?
Although Joao's vIOMMU series will probably change that back later in 
some way.

>
>   static void vfio_set_migration_error(int err)
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 6b58dddb8859..7621074f156d 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -632,42 +632,39 @@ int64_t vfio_mig_bytes_transferred(void)
>       return bytes_transferred;
>   }
>
> -int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
> +/* Return true when either migration initialized or blocker registered */
> +bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
>   {
> -    int ret = -ENOTSUP;
> +    int ret;
>
> -    if (!vbasedev->enable_migration) {
> +    if (!vbasedev->enable_migration || vfio_migration_init(vbasedev)) {
> +        error_setg(&vbasedev->migration_blocker,
> +                   "VFIO device %s doesn't support migration", vbasedev->name);
>           goto add_blocker;
>       }
>
> -    ret = vfio_migration_init(vbasedev);
> -    if (ret) {
> -        goto add_blocker;
> -    }
> -
> -    ret = vfio_block_multiple_devices_migration(errp);
> -    if (ret) {
> -        return ret;
> +    if (!vfio_block_multiple_devices_migration(errp)) {
> +        return false;
>       }
>
> -    ret = vfio_block_giommu_migration(errp);
> -    if (ret) {
> -        return ret;
> +    if (vfio_block_giommu_migration(vbasedev)) {
> +        error_setg(&vbasedev->migration_blocker,
> +                   "Migration is currently not supported on %s "
> +                   "with vIOMMU enabled", vbasedev->name);
> +        goto add_blocker;
>       }
>
> -    trace_vfio_migration_probe(vbasedev->name);
> -    return 0;
> +    return true;
>
>   add_blocker:
> -    error_setg(&vbasedev->migration_blocker,
> -               "VFIO device doesn't support migration");
> -
>       ret = migrate_add_blocker(vbasedev->migration_blocker, errp);
>       if (ret < 0) {
>           error_free(vbasedev->migration_blocker);
>           vbasedev->migration_blocker = NULL;
> +    } else {
> +        error_report("%s: Migration disabled", vbasedev->name);

When x-enable-migration=off, "Migration disabled" error will be printed, 
but this is the expected behavior, so we should not print it in this case.

>       }
> -    return ret;
> +    return !ret;
>   }
>
>   void vfio_migration_exit(VFIODevice *vbasedev)
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 82c4cf4f7609..061ca96cbce2 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3209,7 +3209,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>       if (!pdev->failover_pair_id) {
>           ret = vfio_migration_realize(vbasedev, errp);
>           if (ret) {
> -            error_report("%s: Migration disabled", vbasedev->name);
> +            trace_vfio_migration_probe(vbasedev->name);

While we are here, let's rename trace_vfio_migration_probe() to 
trace_vfio_migration_realize() (I can do it too in my series).

Thanks.

> +        } else {
>               goto out_deregister;
>           }
>       }
> @@ -3250,7 +3251,6 @@ static void vfio_instance_finalize(Object *obj)
>        */
>       vfio_put_device(vdev);
>       vfio_put_group(group);
> -    vfio_migration_finalize();
>   }
>
>   static void vfio_exitfn(PCIDevice *pdev)
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index eed244f25f34..a2e2171b1f93 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -220,9 +220,9 @@ typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
>   extern VFIOGroupList vfio_group_list;
>
>   bool vfio_mig_active(void);
> -int vfio_block_multiple_devices_migration(Error **errp);
> +bool vfio_block_multiple_devices_migration(Error **errp);
>   void vfio_unblock_multiple_devices_migration(void);
> -int vfio_block_giommu_migration(Error **errp);
> +bool vfio_block_giommu_migration(VFIODevice *vbasedev);
>   int64_t vfio_mig_bytes_transferred(void);
>
>   #ifdef CONFIG_LINUX
> @@ -246,8 +246,7 @@ int vfio_spapr_create_window(VFIOContainer *container,
>   int vfio_spapr_remove_window(VFIOContainer *container,
>                                hwaddr offset_within_address_space);
>
> -int vfio_migration_realize(VFIODevice *vbasedev, Error **errp);
> +bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp);
>   void vfio_migration_exit(VFIODevice *vbasedev);
> -void vfio_migration_finalize(void);
>
>   #endif /* HW_VFIO_VFIO_COMMON_H */
> --
> 2.34.1
>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 2/3] vfio/pci: Fix a segfault in vfio_realize
  2023-06-25  6:01     ` Duan, Zhenzhong
@ 2023-06-26  9:58       ` Joao Martins
  0 siblings, 0 replies; 20+ messages in thread
From: Joao Martins @ 2023-06-26  9:58 UTC (permalink / raw)
  To: Duan, Zhenzhong; +Cc: alex.williamson, clg, qemu-devel, avihaih, Peng, Chao P

On 25/06/2023 07:01, Duan, Zhenzhong wrote:
>> -----Original Message-----
>> From: Joao Martins <joao.m.martins@oracle.com>
>> Sent: Wednesday, June 21, 2023 7:09 PM
>> To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>> Cc: alex.williamson@redhat.com; clg@redhat.com; qemu-devel@nongnu.org;
>> avihaih@nvidia.com; Peng, Chao P <chao.p.peng@intel.com>
>> Subject: Re: [PATCH v3 2/3] vfio/pci: Fix a segfault in vfio_realize
>>
>>
>>
>> On 21/06/2023 09:02, Zhenzhong Duan wrote:
>>> In case irqchip_change_notifier isn't added, removing it triggers segfault.
>>>
>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>> ---
>>>  hw/vfio/pci.c | 4 +++-
>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index
>>> c71b0955d81c..82c4cf4f7609 100644
>>> --- a/hw/vfio/pci.c
>>> +++ b/hw/vfio/pci.c
>>> @@ -3222,7 +3222,9 @@ static void vfio_realize(PCIDevice *pdev, Error
>>> **errp)
>>>
>>>  out_deregister:
>>>      pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
>>> -    kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
>>> +    if (vdev->irqchip_change_notifier.notify) {
>>> +        kvm_irqchip_remove_change_notifier(&vdev-
>>> irqchip_change_notifier);
>>> +    }
>>
>> If the first patch ends up being pursued (which I am not quite sure) it should
>> be folded in the previous patch, as the out_deregister is used starting your
>> patch 1.
> Sorry for late response, just back from vacation.
> 
> out_deregister isn't only for vfio migration, there are some other jump sites to out_deregister in vfio_realize. Take below code for example:
> 
>     if (vdev->display_xres || vdev->display_yres) {
>         if (vdev->dpy == NULL) {
>             error_setg(errp, "xres and yres properties require display=on");
>             goto out_deregister;
>         }
> 
> I can reproduce a segmentation fault when hotplug a vfio device using below cmd:
> (qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,xres=1 Connection closed by foreign host.
> 
> After fix:
> (qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,xres=1
> Error: vfio 0000:81:11.1: xres and yres properties require display=on
> (qemu)
> 

Makes sense. Let's keep it separate then.

> Thanks
> Zhenzhong
> 
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 1/3] vfio/pci: Fix resource leak in vfio_realize
  2023-06-26  7:02       ` Duan, Zhenzhong
@ 2023-06-26 10:07         ` Joao Martins
  2023-06-27  2:38           ` Duan, Zhenzhong
  0 siblings, 1 reply; 20+ messages in thread
From: Joao Martins @ 2023-06-26 10:07 UTC (permalink / raw)
  To: Duan, Zhenzhong; +Cc: alex.williamson, clg, qemu-devel, avihaih, Peng, Chao P

On 26/06/2023 08:02, Duan, Zhenzhong wrote:
>> -----Original Message-----
>> From: Duan, Zhenzhong
>> Sent: Sunday, June 25, 2023 2:01 PM
>> To: Joao Martins <joao.m.martins@oracle.com>
>> Cc: alex.williamson@redhat.com; clg@redhat.com; qemu-devel@nongnu.org;
>> avihaih@nvidia.com; Peng, Chao P <chao.p.peng@intel.com>
>> Subject: RE: [PATCH v3 1/3] vfio/pci: Fix resource leak in vfio_realize
>>
>>> -----Original Message-----
>>> From: Joao Martins <joao.m.martins@oracle.com>
>>> Sent: Wednesday, June 21, 2023 7:08 PM
>>> To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>>> Cc: alex.williamson@redhat.com; clg@redhat.com; qemu-
>> devel@nongnu.org;
>>> avihaih@nvidia.com; Peng, Chao P <chao.p.peng@intel.com>
>>> Subject: Re: [PATCH v3 1/3] vfio/pci: Fix resource leak in vfio_realize
>>>
>>>
>>>
>>> On 21/06/2023 09:02, Zhenzhong Duan wrote:
>>>> When adding migration blocker failed in vfio_migration_realize(),
>>>> hotplug will fail and we see below:
>>>>
>>>> (qemu) device_add
>>>> vfio-pci,host=81:11.1,id=vfio1,bus=root1,x-enable-migration=true
>>>> 0000:81:11.1: VFIO migration is not supported in kernel
>>>> Error: disallowing migration blocker (--only-migratable) for: VFIO
>>>> device doesn't support migration
>>>>
>>>> If we hotplug again we should see same log as above, but we see:
>>>> (qemu) device_add
>>>> vfio-pci,host=81:11.0,id=vfio0,bus=root0,x-enable-migration=true
>>>> Error: vfio 0000:81:11.0: device is already attached
>>>>
>>>> That's because some references to VFIO device isn't released, we
>>>> should check return value of vfio_migration_realize() and release the
>>>> references, then VFIO device will be truely released when hotplug
>>>> failed.
>>>>
>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>> ---
>>>>  hw/vfio/pci.c | 1 +
>>>>  1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index
>>>> 73874a94de12..c71b0955d81c 100644
>>>> --- a/hw/vfio/pci.c
>>>> +++ b/hw/vfio/pci.c
>>>> @@ -3210,6 +3210,7 @@ static void vfio_realize(PCIDevice *pdev, Error
>>> **errp)
>>>>          ret = vfio_migration_realize(vbasedev, errp);
>>>>          if (ret) {
>>>>              error_report("%s: Migration disabled", vbasedev->name);
>>>> +            goto out_deregister;
>>>>          }
>>>>      }
>>>>
>>> This doesn't look right. This means that failure to support migration
>>> will deregister the device.
>>
>> In my understanding, failure to support migration but success to add
>> migration blocker will not deregister device. vfio_migration_realize() is
>> successful in this case.
>> But failure to add migration blocker should deregister device, because
>> vfio_exitfn() is never called in this case(errp set), jumping to out_deregister is
>> the best choice. Then vfio_migration_realize() should return failure in this
>> case.
> 

I was checking other devices in the tree, and I think you are right. Failure to
add the migration blocker results in a failure of device realize. Which IIUC
only happens in 'only-migratable' setups and during snapshots.

Maybe also including a sentence explainer in the commit message is useful too.

> I just realized the error path in vfio_realize() isn't adequate. We need to add more deregister code just as vfio_exitfn(), see below. I'll re-post after we are consensus on this and get some comments of PATCH3.
> 
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3210,7 +3210,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>          ret = vfio_migration_realize(vbasedev, errp);
>          if (ret) {
>              error_report("%s: Migration disabled", vbasedev->name);
> -            goto out_deregister;
> +            goto out_vfio_migration;
>          }
>      }
> 
> @@ -3220,11 +3220,17 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
> 
>      return;
> 
> +out_vfio_migration:
> +    vfio_migration_exit(vbasedev);

This belongs in this patch from the looks

>  out_deregister:
>      pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
>      if (vdev->irqchip_change_notifier.notify) {
>          kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
>      }
> +    vfio_disable_interrupts(vdev);
> +    if (vdev->intx.mmap_timer) {
> +        timer_free(vdev->intx.mmap_timer);
> +    }

But this one suggests another one as it looks a pre-existing issue?

>  out_teardown:
>      vfio_teardown_msi(vdev);
>      vfio_bars_exit(vdev);
> 
> Thanks
> Zhenzhong


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 3/3] vfio/migration: vfio/migration: Refactor and fix print of "Migration disabled"
  2023-06-26  9:34   ` Avihai Horon
@ 2023-06-26 10:18     ` Joao Martins
  2023-06-27  2:55       ` Duan, Zhenzhong
  2023-06-27  2:46     ` Duan, Zhenzhong
  1 sibling, 1 reply; 20+ messages in thread
From: Joao Martins @ 2023-06-26 10:18 UTC (permalink / raw)
  To: Avihai Horon, Zhenzhong Duan
  Cc: alex.williamson, clg, chao.p.peng, qemu-devel



On 26/06/2023 10:34, Avihai Horon wrote:
> 
> On 21/06/2023 11:02, Zhenzhong Duan wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> This patch refactors vfio_migration_realize() and its dependend code
>> as follows:
>>
>> 1. It's redundant in vfio_migration_realize() to registers multiple blockers,
>>     e.g: vIOMMU blocker can be refactored as per device blocker.
>> 2. Change vfio_block_giommu_migration() to be only a per device checker.
>> 3. Remove global vIOMMU blocker related stuff, e.g:
>>     giommu_migration_blocker, vfio_unblock_giommu_migration(),
>>     vfio_viommu_preset() and vfio_migration_finalize()
>> 4. Change vfio_migration_realize() and dependent vfio_block_*_migration() to
>>     return bool type.
>> 5. Change to print "Migration disabled" only after adding blocker succeed.
>> 6. Add device name to errp so "info migrate" could be more informative.
>>
>> migrate_add_blocker() returns 0 when successfully adding the migration blocker.
>> However, the caller of vfio_migration_realize() considers that migration was
>> blocked when the latter returned an error. What matters for migration is that
>> the blocker is added in core migration, so this cleans up usability such that
>> user sees "Migrate disabled" when any of the vfio migration blockers are active.
>>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>>   hw/vfio/common.c              | 54 +++++------------------------------
>>   hw/vfio/migration.c           | 37 +++++++++++-------------
>>   hw/vfio/pci.c                 |  4 +--
>>   include/hw/vfio/vfio-common.h |  7 ++---
>>   4 files changed, 29 insertions(+), 73 deletions(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index fa8fd949b1cf..cc5f4e805341 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -362,8 +362,6 @@ bool vfio_mig_active(void)
>>   }
>>
>>   static Error *multiple_devices_migration_blocker;
>> -static Error *giommu_migration_blocker;
>> -
>>   static unsigned int vfio_migratable_device_num(void)
>>   {
>>       VFIOGroup *group;
>> @@ -381,13 +379,13 @@ static unsigned int vfio_migratable_device_num(void)
>>       return device_num;
>>   }
>>
>> -int vfio_block_multiple_devices_migration(Error **errp)
>> +bool vfio_block_multiple_devices_migration(Error **errp)
>>   {
>>       int ret;
>>
>>       if (multiple_devices_migration_blocker ||
>>           vfio_migratable_device_num() <= 1) {
>> -        return 0;
>> +        return true;
>>       }
>>
>>       error_setg(&multiple_devices_migration_blocker,
>> @@ -397,9 +395,11 @@ int vfio_block_multiple_devices_migration(Error **errp)
>>       if (ret < 0) {
>>           error_free(multiple_devices_migration_blocker);
>>           multiple_devices_migration_blocker = NULL;
>> +    } else {
>> +        error_report("Migration disabled, not support multiple VFIO devices");
>>       }
>>
>> -    return ret;
>> +    return !ret;
>>   }
>>
>>   void vfio_unblock_multiple_devices_migration(void)
>> @@ -414,49 +414,9 @@ void vfio_unblock_multiple_devices_migration(void)
>>       multiple_devices_migration_blocker = NULL;
>>   }
>>
>> -static bool vfio_viommu_preset(void)
>> -{
>> -    VFIOAddressSpace *space;
>> -
>> -    QLIST_FOREACH(space, &vfio_address_spaces, list) {
>> -        if (space->as != &address_space_memory) {
>> -            return true;
>> -        }
>> -    }
>> -
>> -    return false;
>> -}
>> -
>> -int vfio_block_giommu_migration(Error **errp)
>> -{
>> -    int ret;
>> -
>> -    if (giommu_migration_blocker ||
>> -        !vfio_viommu_preset()) {
>> -        return 0;
>> -    }
>> -
>> -    error_setg(&giommu_migration_blocker,
>> -               "Migration is currently not supported with vIOMMU enabled");
>> -    ret = migrate_add_blocker(giommu_migration_blocker, errp);
>> -    if (ret < 0) {
>> -        error_free(giommu_migration_blocker);
>> -        giommu_migration_blocker = NULL;
>> -    }
>> -
>> -    return ret;
>> -}
>> -
>> -void vfio_migration_finalize(void)
>> +bool vfio_block_giommu_migration(VFIODevice *vbasedev)
>>   {
>> -    if (!giommu_migration_blocker ||
>> -        vfio_viommu_preset()) {
>> -        return;
>> -    }
>> -
>> -    migrate_del_blocker(giommu_migration_blocker);
>> -    error_free(giommu_migration_blocker);
>> -    giommu_migration_blocker = NULL;
>> +    return vbasedev->group->container->space->as != &address_space_memory;
>>   }
> 
> I guess vfio_block_giommu_migration can be moved to migration.c and made static?
> Although Joao's vIOMMU series will probably change that back later in some way.
> 
I guess it makes sense -- the thing that was tieing him was the global migration
blocker, which is now consolidated into the main migration blocker.

My vIOMMU series will relax this condition yes (still same per-device scope).
And I will need to check the maximum IOVA in the vIOMMU. But it's new code I can
adjust and Zhenzhong shouldn't worry about the vIOMMU future stuff but I don't
expect to influence location, say if it gets moved into migration.c


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v3 1/3] vfio/pci: Fix resource leak in vfio_realize
  2023-06-26 10:07         ` Joao Martins
@ 2023-06-27  2:38           ` Duan, Zhenzhong
  2023-06-27 10:21             ` Joao Martins
  0 siblings, 1 reply; 20+ messages in thread
From: Duan, Zhenzhong @ 2023-06-27  2:38 UTC (permalink / raw)
  To: Martins, Joao; +Cc: alex.williamson, clg, qemu-devel, avihaih, Peng, Chao P

>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Sent: Monday, June 26, 2023 6:08 PM
>To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>Cc: alex.williamson@redhat.com; clg@redhat.com; qemu-devel@nongnu.org;
>avihaih@nvidia.com; Peng, Chao P <chao.p.peng@intel.com>
>Subject: Re: [PATCH v3 1/3] vfio/pci: Fix resource leak in vfio_realize
>
>On 26/06/2023 08:02, Duan, Zhenzhong wrote:
>>> -----Original Message-----
>>> From: Duan, Zhenzhong
>>> Sent: Sunday, June 25, 2023 2:01 PM
>>> To: Joao Martins <joao.m.martins@oracle.com>
>>> Cc: alex.williamson@redhat.com; clg@redhat.com;
>>> qemu-devel@nongnu.org; avihaih@nvidia.com; Peng, Chao P
>>> <chao.p.peng@intel.com>
>>> Subject: RE: [PATCH v3 1/3] vfio/pci: Fix resource leak in
>>> vfio_realize
>>>
>>>> -----Original Message-----
>>>> From: Joao Martins <joao.m.martins@oracle.com>
>>>> Sent: Wednesday, June 21, 2023 7:08 PM
>>>> To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>>>> Cc: alex.williamson@redhat.com; clg@redhat.com; qemu-
>>> devel@nongnu.org;
>>>> avihaih@nvidia.com; Peng, Chao P <chao.p.peng@intel.com>
>>>> Subject: Re: [PATCH v3 1/3] vfio/pci: Fix resource leak in
>>>> vfio_realize
>>>>
>>>>
>>>>
>>>> On 21/06/2023 09:02, Zhenzhong Duan wrote:
>>>>> When adding migration blocker failed in vfio_migration_realize(),
>>>>> hotplug will fail and we see below:
>>>>>
>>>>> (qemu) device_add
>>>>> vfio-pci,host=81:11.1,id=vfio1,bus=root1,x-enable-migration=true
>>>>> 0000:81:11.1: VFIO migration is not supported in kernel
>>>>> Error: disallowing migration blocker (--only-migratable) for: VFIO
>>>>> device doesn't support migration
>>>>>
>>>>> If we hotplug again we should see same log as above, but we see:
>>>>> (qemu) device_add
>>>>> vfio-pci,host=81:11.0,id=vfio0,bus=root0,x-enable-migration=true
>>>>> Error: vfio 0000:81:11.0: device is already attached
>>>>>
>>>>> That's because some references to VFIO device isn't released, we
>>>>> should check return value of vfio_migration_realize() and release
>>>>> the references, then VFIO device will be truely released when
>>>>> hotplug failed.
>>>>>
>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>> ---
>>>>>  hw/vfio/pci.c | 1 +
>>>>>  1 file changed, 1 insertion(+)
>>>>>
>>>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index
>>>>> 73874a94de12..c71b0955d81c 100644
>>>>> --- a/hw/vfio/pci.c
>>>>> +++ b/hw/vfio/pci.c
>>>>> @@ -3210,6 +3210,7 @@ static void vfio_realize(PCIDevice *pdev,
>>>>> Error
>>>> **errp)
>>>>>          ret = vfio_migration_realize(vbasedev, errp);
>>>>>          if (ret) {
>>>>>              error_report("%s: Migration disabled",
>>>>> vbasedev->name);
>>>>> +            goto out_deregister;
>>>>>          }
>>>>>      }
>>>>>
>>>> This doesn't look right. This means that failure to support
>>>> migration will deregister the device.
>>>
>>> In my understanding, failure to support migration but success to add
>>> migration blocker will not deregister device.
>>> vfio_migration_realize() is successful in this case.
>>> But failure to add migration blocker should deregister device,
>>> because
>>> vfio_exitfn() is never called in this case(errp set), jumping to
>>> out_deregister is the best choice. Then vfio_migration_realize()
>>> should return failure in this case.
>>
>
>I was checking other devices in the tree, and I think you are right. Failure to
>add the migration blocker results in a failure of device realize. Which IIUC only
>happens in 'only-migratable' setups and during snapshots.
Yes.

>
>Maybe also including a sentence explainer in the commit message is useful
>too.
Sure, will do.

>
>> I just realized the error path in vfio_realize() isn't adequate. We need to add
>more deregister code just as vfio_exitfn(), see below. I'll re-post after we are
>consensus on this and get some comments of PATCH3.
>>
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -3210,7 +3210,7 @@ static void vfio_realize(PCIDevice *pdev, Error
>**errp)
>>          ret = vfio_migration_realize(vbasedev, errp);
>>          if (ret) {
>>              error_report("%s: Migration disabled", vbasedev->name);
>> -            goto out_deregister;
>> +            goto out_vfio_migration;
>>          }
>>      }
>>
>> @@ -3220,11 +3220,17 @@ static void vfio_realize(PCIDevice *pdev,
>> Error **errp)
>>
>>      return;
>>
>> +out_vfio_migration:
>> +    vfio_migration_exit(vbasedev);
>
>This belongs in this patch from the looks
Yes, I plan to merge above change to this patch.

>
>>  out_deregister:
>>      pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
>>      if (vdev->irqchip_change_notifier.notify) {
>>          kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
>>      }
>> +    vfio_disable_interrupts(vdev);
>> +    if (vdev->intx.mmap_timer) {
>> +        timer_free(vdev->intx.mmap_timer);
>> +    }
>
>But this one suggests another one as it looks a pre-existing issue?
Yes, it's another resource leak I just found.
Not sure if it's better to also merge above change to this patch which is targeting resource leak issues,
or to PATCH2 which is targeting out_deregister path, or to create a new one.
Any suggestion?

Thanks
Zhenzhong

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v3 3/3] vfio/migration: vfio/migration: Refactor and fix print of "Migration disabled"
  2023-06-26  9:34   ` Avihai Horon
  2023-06-26 10:18     ` Joao Martins
@ 2023-06-27  2:46     ` Duan, Zhenzhong
  1 sibling, 0 replies; 20+ messages in thread
From: Duan, Zhenzhong @ 2023-06-27  2:46 UTC (permalink / raw)
  To: Avihai Horon, qemu-devel
  Cc: alex.williamson, clg, Martins, Joao, Peng, Chao P



>-----Original Message-----
>From: Avihai Horon <avihaih@nvidia.com>
>Sent: Monday, June 26, 2023 5:35 PM
>To: Duan, Zhenzhong <zhenzhong.duan@intel.com>; qemu-
>devel@nongnu.org
>Cc: alex.williamson@redhat.com; clg@redhat.com; Martins, Joao
><joao.m.martins@oracle.com>; Peng, Chao P <chao.p.peng@intel.com>
>Subject: Re: [PATCH v3 3/3] vfio/migration: vfio/migration: Refactor and fix
>print of "Migration disabled"
>
>
>On 21/06/2023 11:02, Zhenzhong Duan wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> This patch refactors vfio_migration_realize() and its dependend code
>> as follows:
>>
>> 1. It's redundant in vfio_migration_realize() to registers multiple blockers,
>>     e.g: vIOMMU blocker can be refactored as per device blocker.
>> 2. Change vfio_block_giommu_migration() to be only a per device checker.
>> 3. Remove global vIOMMU blocker related stuff, e.g:
>>     giommu_migration_blocker, vfio_unblock_giommu_migration(),
>>     vfio_viommu_preset() and vfio_migration_finalize() 4. Change
>> vfio_migration_realize() and dependent vfio_block_*_migration() to
>>     return bool type.
>> 5. Change to print "Migration disabled" only after adding blocker succeed.
>> 6. Add device name to errp so "info migrate" could be more informative.
>>
>> migrate_add_blocker() returns 0 when successfully adding the migration
>blocker.
>> However, the caller of vfio_migration_realize() considers that
>> migration was blocked when the latter returned an error. What matters
>> for migration is that the blocker is added in core migration, so this
>> cleans up usability such that user sees "Migrate disabled" when any of the
>vfio migration blockers are active.
>>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>>   hw/vfio/common.c              | 54 +++++------------------------------
>>   hw/vfio/migration.c           | 37 +++++++++++-------------
>>   hw/vfio/pci.c                 |  4 +--
>>   include/hw/vfio/vfio-common.h |  7 ++---
>>   4 files changed, 29 insertions(+), 73 deletions(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c index
>> fa8fd949b1cf..cc5f4e805341 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -362,8 +362,6 @@ bool vfio_mig_active(void)
>>   }
>>
>>   static Error *multiple_devices_migration_blocker;
>> -static Error *giommu_migration_blocker;
>> -
>>   static unsigned int vfio_migratable_device_num(void)
>>   {
>>       VFIOGroup *group;
>> @@ -381,13 +379,13 @@ static unsigned int
>vfio_migratable_device_num(void)
>>       return device_num;
>>   }
>>
>> -int vfio_block_multiple_devices_migration(Error **errp)
>> +bool vfio_block_multiple_devices_migration(Error **errp)
>>   {
>>       int ret;
>>
>>       if (multiple_devices_migration_blocker ||
>>           vfio_migratable_device_num() <= 1) {
>> -        return 0;
>> +        return true;
>>       }
>>
>>       error_setg(&multiple_devices_migration_blocker,
>> @@ -397,9 +395,11 @@ int vfio_block_multiple_devices_migration(Error
>**errp)
>>       if (ret < 0) {
>>           error_free(multiple_devices_migration_blocker);
>>           multiple_devices_migration_blocker = NULL;
>> +    } else {
>> +        error_report("Migration disabled, not support multiple VFIO
>> + devices");
>>       }
>>
>> -    return ret;
>> +    return !ret;
>>   }
>>
>>   void vfio_unblock_multiple_devices_migration(void)
>> @@ -414,49 +414,9 @@ void vfio_unblock_multiple_devices_migration(void)
>>       multiple_devices_migration_blocker = NULL;
>>   }
>>
>> -static bool vfio_viommu_preset(void)
>> -{
>> -    VFIOAddressSpace *space;
>> -
>> -    QLIST_FOREACH(space, &vfio_address_spaces, list) {
>> -        if (space->as != &address_space_memory) {
>> -            return true;
>> -        }
>> -    }
>> -
>> -    return false;
>> -}
>> -
>> -int vfio_block_giommu_migration(Error **errp) -{
>> -    int ret;
>> -
>> -    if (giommu_migration_blocker ||
>> -        !vfio_viommu_preset()) {
>> -        return 0;
>> -    }
>> -
>> -    error_setg(&giommu_migration_blocker,
>> -               "Migration is currently not supported with vIOMMU enabled");
>> -    ret = migrate_add_blocker(giommu_migration_blocker, errp);
>> -    if (ret < 0) {
>> -        error_free(giommu_migration_blocker);
>> -        giommu_migration_blocker = NULL;
>> -    }
>> -
>> -    return ret;
>> -}
>> -
>> -void vfio_migration_finalize(void)
>> +bool vfio_block_giommu_migration(VFIODevice *vbasedev)
>>   {
>> -    if (!giommu_migration_blocker ||
>> -        vfio_viommu_preset()) {
>> -        return;
>> -    }
>> -
>> -    migrate_del_blocker(giommu_migration_blocker);
>> -    error_free(giommu_migration_blocker);
>> -    giommu_migration_blocker = NULL;
>> +    return vbasedev->group->container->space->as !=
>> + &address_space_memory;
>>   }
>
>I guess vfio_block_giommu_migration can be moved to migration.c and made
>static?
>Although Joao's vIOMMU series will probably change that back later in some
>way.
Good idea, will do.

>
>>
>>   static void vfio_set_migration_error(int err)
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index 6b58dddb8859..7621074f156d 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -632,42 +632,39 @@ int64_t vfio_mig_bytes_transferred(void)
>>       return bytes_transferred;
>>   }
>>
>> -int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
>> +/* Return true when either migration initialized or blocker registered */
>> +bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
>>   {
>> -    int ret = -ENOTSUP;
>> +    int ret;
>>
>> -    if (!vbasedev->enable_migration) {
>> +    if (!vbasedev->enable_migration || vfio_migration_init(vbasedev)) {
>> +        error_setg(&vbasedev->migration_blocker,
>> +                   "VFIO device %s doesn't support migration", vbasedev->name);
>>           goto add_blocker;
>>       }
>>
>> -    ret = vfio_migration_init(vbasedev);
>> -    if (ret) {
>> -        goto add_blocker;
>> -    }
>> -
>> -    ret = vfio_block_multiple_devices_migration(errp);
>> -    if (ret) {
>> -        return ret;
>> +    if (!vfio_block_multiple_devices_migration(errp)) {
>> +        return false;
>>       }
>>
>> -    ret = vfio_block_giommu_migration(errp);
>> -    if (ret) {
>> -        return ret;
>> +    if (vfio_block_giommu_migration(vbasedev)) {
>> +        error_setg(&vbasedev->migration_blocker,
>> +                   "Migration is currently not supported on %s "
>> +                   "with vIOMMU enabled", vbasedev->name);
>> +        goto add_blocker;
>>       }
>>
>> -    trace_vfio_migration_probe(vbasedev->name);
>> -    return 0;
>> +    return true;
>>
>>   add_blocker:
>> -    error_setg(&vbasedev->migration_blocker,
>> -               "VFIO device doesn't support migration");
>> -
>>       ret = migrate_add_blocker(vbasedev->migration_blocker, errp);
>>       if (ret < 0) {
>>           error_free(vbasedev->migration_blocker);
>>           vbasedev->migration_blocker = NULL;
>> +    } else {
>> +        error_report("%s: Migration disabled", vbasedev->name);
>
>When x-enable-migration=off, "Migration disabled" error will be printed,
>but this is the expected behavior, so we should not print it in this case.
Make sense, will do.

>
>>       }
>> -    return ret;
>> +    return !ret;
>>   }
>>
>>   void vfio_migration_exit(VFIODevice *vbasedev)
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 82c4cf4f7609..061ca96cbce2 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -3209,7 +3209,8 @@ static void vfio_realize(PCIDevice *pdev, Error
>**errp)
>>       if (!pdev->failover_pair_id) {
>>           ret = vfio_migration_realize(vbasedev, errp);
>>           if (ret) {
>> -            error_report("%s: Migration disabled", vbasedev->name);
>> +            trace_vfio_migration_probe(vbasedev->name);
>
>While we are here, let's rename trace_vfio_migration_probe() to
>trace_vfio_migration_realize() (I can do it too in my series).
Good finding, will do.

Thanks
Zhenzhong

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v3 3/3] vfio/migration: vfio/migration: Refactor and fix print of "Migration disabled"
  2023-06-26 10:18     ` Joao Martins
@ 2023-06-27  2:55       ` Duan, Zhenzhong
  2023-06-27 10:56         ` Joao Martins
  0 siblings, 1 reply; 20+ messages in thread
From: Duan, Zhenzhong @ 2023-06-27  2:55 UTC (permalink / raw)
  To: Martins, Joao, Avihai Horon
  Cc: alex.williamson, clg, Peng, Chao P, qemu-devel



>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Sent: Monday, June 26, 2023 6:19 PM
>To: Avihai Horon <avihaih@nvidia.com>; Duan, Zhenzhong
><zhenzhong.duan@intel.com>
>Cc: alex.williamson@redhat.com; clg@redhat.com; Peng, Chao P
><chao.p.peng@intel.com>; qemu-devel@nongnu.org
>Subject: Re: [PATCH v3 3/3] vfio/migration: vfio/migration: Refactor and fix
>print of "Migration disabled"
>
>
>
>On 26/06/2023 10:34, Avihai Horon wrote:
>>
>> On 21/06/2023 11:02, Zhenzhong Duan wrote:
...
>>> -void vfio_migration_finalize(void)
>>> +bool vfio_block_giommu_migration(VFIODevice *vbasedev)
>>>   {
>>> -    if (!giommu_migration_blocker ||
>>> -        vfio_viommu_preset()) {
>>> -        return;
>>> -    }
>>> -
>>> -    migrate_del_blocker(giommu_migration_blocker);
>>> -    error_free(giommu_migration_blocker);
>>> -    giommu_migration_blocker = NULL;
>>> +    return vbasedev->group->container->space->as !=
>>> +&address_space_memory;
>>>   }
>>
>> I guess vfio_block_giommu_migration can be moved to migration.c and
>made static?
>> Although Joao's vIOMMU series will probably change that back later in some
>way.
>>
>I guess it makes sense -- the thing that was tieing him was the global migration
>blocker, which is now consolidated into the main migration blocker.
>
>My vIOMMU series will relax this condition yes (still same per-device scope).
>And I will need to check the maximum IOVA in the vIOMMU. But it's new code
>I can adjust and Zhenzhong shouldn't worry about the vIOMMU future stuff
A bit confused, you agreed to move vfio_block_giommu_migration into migration.c

>but I don't expect to influence location, say if it gets moved into migration.c
and final decision is no move for influencing location reason? Do I misunderstand?

Thanks
Zhenzhong

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 1/3] vfio/pci: Fix resource leak in vfio_realize
  2023-06-27  2:38           ` Duan, Zhenzhong
@ 2023-06-27 10:21             ` Joao Martins
  2023-06-27 10:28               ` Duan, Zhenzhong
  0 siblings, 1 reply; 20+ messages in thread
From: Joao Martins @ 2023-06-27 10:21 UTC (permalink / raw)
  To: Duan, Zhenzhong; +Cc: alex.williamson, clg, qemu-devel, avihaih, Peng, Chao P

>>>  out_deregister:
>>>      pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
>>>      if (vdev->irqchip_change_notifier.notify) {
>>>          kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
>>>      }
>>> +    vfio_disable_interrupts(vdev);
>>> +    if (vdev->intx.mmap_timer) {
>>> +        timer_free(vdev->intx.mmap_timer);
>>> +    }
>>
>> But this one suggests another one as it looks a pre-existing issue?
> Yes, it's another resource leak I just found.
> Not sure if it's better to also merge above change to this patch which is targeting resource leak issues,
> or to PATCH2 which is targeting out_deregister path, or to create a new one.
> Any suggestion?

In general they are all bugs in the same deregistration path, but each resource
is not being teardown correctly. I tend to prefer 'logical change' per commit,
so there's would be a fix the irqchip_change notifier and another one for
mmap_timer teardown. Both with the Fixes tags that introduced each bug. Unless
everything was introduced by the same change in which case you would do
everything in one patch.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v3 1/3] vfio/pci: Fix resource leak in vfio_realize
  2023-06-27 10:21             ` Joao Martins
@ 2023-06-27 10:28               ` Duan, Zhenzhong
  0 siblings, 0 replies; 20+ messages in thread
From: Duan, Zhenzhong @ 2023-06-27 10:28 UTC (permalink / raw)
  To: Martins, Joao; +Cc: alex.williamson, clg, qemu-devel, avihaih, Peng, Chao P

>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Sent: Tuesday, June 27, 2023 6:22 PM
>To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>Cc: alex.williamson@redhat.com; clg@redhat.com; qemu-devel@nongnu.org;
>avihaih@nvidia.com; Peng, Chao P <chao.p.peng@intel.com>
>Subject: Re: [PATCH v3 1/3] vfio/pci: Fix resource leak in vfio_realize
>
>>>>  out_deregister:
>>>>      pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
>>>>      if (vdev->irqchip_change_notifier.notify) {
>>>>          kvm_irqchip_remove_change_notifier(&vdev-
>>irqchip_change_notifier);
>>>>      }
>>>> +    vfio_disable_interrupts(vdev);
>>>> +    if (vdev->intx.mmap_timer) {
>>>> +        timer_free(vdev->intx.mmap_timer);
>>>> +    }
>>>
>>> But this one suggests another one as it looks a pre-existing issue?
>> Yes, it's another resource leak I just found.
>> Not sure if it's better to also merge above change to this patch which
>> is targeting resource leak issues, or to PATCH2 which is targeting
>out_deregister path, or to create a new one.
>> Any suggestion?
>
>In general they are all bugs in the same deregistration path, but each resource
>is not being teardown correctly. I tend to prefer 'logical change' per commit,
>so there's would be a fix the irqchip_change notifier and another one for
>mmap_timer teardown. Both with the Fixes tags that introduced each bug.
>Unless everything was introduced by the same change in which case you
>would do everything in one patch.
Clear.

Thanks
Zhenzhong

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 3/3] vfio/migration: vfio/migration: Refactor and fix print of "Migration disabled"
  2023-06-27  2:55       ` Duan, Zhenzhong
@ 2023-06-27 10:56         ` Joao Martins
  2023-06-28  2:26           ` Duan, Zhenzhong
  0 siblings, 1 reply; 20+ messages in thread
From: Joao Martins @ 2023-06-27 10:56 UTC (permalink / raw)
  To: Duan, Zhenzhong, Avihai Horon
  Cc: alex.williamson, clg, Peng, Chao P, qemu-devel

On 27/06/2023 03:55, Duan, Zhenzhong wrote:
>> I guess it makes sense -- the thing that was tieing him was the global migration
>> blocker, which is now consolidated into the main migration blocker.
>>
>> My vIOMMU series will relax this condition yes (still same per-device scope).
>> And I will need to check the maximum IOVA in the vIOMMU. But it's new code
>> I can adjust and Zhenzhong shouldn't worry about the vIOMMU future stuff
> A bit confused, you agreed to move vfio_block_giommu_migration into migration.c
> 
>> but I don't expect to influence location, say if it gets moved into migration.c
> and final decision is no move for influencing location reason? Do I misunderstand?

Sorry for the confusion.

The thing is that I will need 'similar code' to test if a vIOMMU is enabled or
not. The reason being that dirty tracking will need this to understand what to
track meaning to decide whether we track the whole address space or just the
linear map[0]. And all that code is in common, not migration.c and where I use
it will have to look at all address spaces (because dirty tracking is started
for all devices, so there's no vbasedev to look at).

Eventually after the vIOMMU stuff, the migration blocker condition will look
more or less like this:

	return (!vfio_viommu_preset(vbasedev) ||
		(vfio_viommu_preset(vbasedev) &&
		 !vfio_viommu_get_max_iova(&max)))

Whereby vfio_viommu_preset(vbasedev) is what you currently call
vfio_block_giommu_migration(vbasedev) in current patch. So perhaps I would say
to leave it in common.c and rename it to vfio_viommu_preset(vbasedev) with a
comment on top for /* Block migration with a vIOMMU */

Then when the time comes I can introduce in my vIOMMU series a
vfio_viommu_possible() [or some other name] when starting dirty tracking which
looks all VFIOAddressSpaces and reuses your vfio_viommu_preset(vbasedev). This
should cover current and future needs, without going to great extents beyond the
purpose of this patch?

[0]
https://lore.kernel.org/qemu-devel/20230622214845.3980-13-joao.m.martins@oracle.com/#iZ31hw:vfio:common.c


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v3 3/3] vfio/migration: vfio/migration: Refactor and fix print of "Migration disabled"
  2023-06-27 10:56         ` Joao Martins
@ 2023-06-28  2:26           ` Duan, Zhenzhong
  0 siblings, 0 replies; 20+ messages in thread
From: Duan, Zhenzhong @ 2023-06-28  2:26 UTC (permalink / raw)
  To: Martins, Joao, Avihai Horon
  Cc: alex.williamson, clg, Peng, Chao P, qemu-devel

>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Sent: Tuesday, June 27, 2023 6:57 PM
>To: Duan, Zhenzhong <zhenzhong.duan@intel.com>; Avihai Horon
><avihaih@nvidia.com>
>Cc: alex.williamson@redhat.com; clg@redhat.com; Peng, Chao P
><chao.p.peng@intel.com>; qemu-devel@nongnu.org
>Subject: Re: [PATCH v3 3/3] vfio/migration: vfio/migration: Refactor and fix
>print of "Migration disabled"
>
>On 27/06/2023 03:55, Duan, Zhenzhong wrote:
>>> I guess it makes sense -- the thing that was tieing him was the
>>> global migration blocker, which is now consolidated into the main
>migration blocker.
>>>
>>> My vIOMMU series will relax this condition yes (still same per-device scope).
>>> And I will need to check the maximum IOVA in the vIOMMU. But it's new
>>> code I can adjust and Zhenzhong shouldn't worry about the vIOMMU
>>> future stuff
>> A bit confused, you agreed to move vfio_block_giommu_migration into
>> migration.c
>>
>>> but I don't expect to influence location, say if it gets moved into
>>> migration.c
>> and final decision is no move for influencing location reason? Do I
>misunderstand?
>
>Sorry for the confusion.
>
>The thing is that I will need 'similar code' to test if a vIOMMU is enabled or not.
>The reason being that dirty tracking will need this to understand what to track
>meaning to decide whether we track the whole address space or just the
>linear map[0]. And all that code is in common, not migration.c and where I
>use it will have to look at all address spaces (because dirty tracking is started
>for all devices, so there's no vbasedev to look at).
>
>Eventually after the vIOMMU stuff, the migration blocker condition will look
>more or less like this:
>
>	return (!vfio_viommu_preset(vbasedev) ||
>		(vfio_viommu_preset(vbasedev) &&
>		 !vfio_viommu_get_max_iova(&max)))
>
>Whereby vfio_viommu_preset(vbasedev) is what you currently call
>vfio_block_giommu_migration(vbasedev) in current patch. So perhaps I would
>say to leave it in common.c and rename it to vfio_viommu_preset(vbasedev)
>with a comment on top for /* Block migration with a vIOMMU */
>
>Then when the time comes I can introduce in my vIOMMU series a
>vfio_viommu_possible() [or some other name] when starting dirty tracking
>which looks all VFIOAddressSpaces and reuses your
>vfio_viommu_preset(vbasedev). This should cover current and future needs,
>without going to great extents beyond the purpose of this patch?

Thanks for detailed explanation, clear, I'll update based on above suggestions.

Zhenzhong

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2023-06-28  2:27 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-21  8:02 [PATCH v3 0/3] VFIO migration related refactor and bug fix Zhenzhong Duan
2023-06-21  8:02 ` [PATCH v3 1/3] vfio/pci: Fix resource leak in vfio_realize Zhenzhong Duan
2023-06-21 11:08   ` Joao Martins
2023-06-25  6:00     ` Duan, Zhenzhong
2023-06-26  7:02       ` Duan, Zhenzhong
2023-06-26 10:07         ` Joao Martins
2023-06-27  2:38           ` Duan, Zhenzhong
2023-06-27 10:21             ` Joao Martins
2023-06-27 10:28               ` Duan, Zhenzhong
2023-06-21  8:02 ` [PATCH v3 2/3] vfio/pci: Fix a segfault " Zhenzhong Duan
2023-06-21 11:08   ` Joao Martins
2023-06-25  6:01     ` Duan, Zhenzhong
2023-06-26  9:58       ` Joao Martins
2023-06-21  8:02 ` [PATCH v3 3/3] vfio/migration: vfio/migration: Refactor and fix print of "Migration disabled" Zhenzhong Duan
2023-06-26  9:34   ` Avihai Horon
2023-06-26 10:18     ` Joao Martins
2023-06-27  2:55       ` Duan, Zhenzhong
2023-06-27 10:56         ` Joao Martins
2023-06-28  2:26           ` Duan, Zhenzhong
2023-06-27  2:46     ` Duan, Zhenzhong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.