All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/5] VFIO migration related refactor and bug fix
@ 2023-06-29  8:40 Zhenzhong Duan
  2023-06-29  8:40 ` [PATCH v4 1/5] vfio/pci: Fix a segfault in vfio_realize Zhenzhong Duan
                   ` (5 more replies)
  0 siblings, 6 replies; 25+ messages in thread
From: Zhenzhong Duan @ 2023-06-29  8:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.williamson, clg, joao.m.martins, avihaih, chao.p.peng

Hello,

PATCH5 refactors the VFIO migration blocker related code based on
suggestions from Joao and Cedric, so that code is simpler and
"Migration disabled" printed in right case.

But before that works, also found some hotplug bugs when testing
blocker adding failed case. PATCH1-4 fix them.

See patch description for details.

v4: Rebased on [1] which contains Avihai's patchset [2]
    Add more patches to fix resource leak issue, split based on
    different fix TAG per Joao
    Change to not print "Migration disabled" with explicit
    enable-migration=off per Avihai
    Rename vfio_block_giommu_migration to vfio_viommu_preset per Joao

v3: Add PATCH1,2 to fix hotplug bug
    Fix bugs in PATCH3 Avihai and Joao pointed out

Tested vfio hotplug/unplug with vfio migration supported and unsupported cases,
and different param of enable-migration=[on/off/auto] and -only-migratable.

[1] https://github.com/legoater/qemu/tree/vfio-next
[2] https://lists.nongnu.org/archive/html/qemu-devel/2023-06/msg06117.html

Thanks

Zhenzhong Duan (5):
  vfio/pci: Fix a segfault in vfio_realize
  vfio/pci: Free leaked timer in vfio_realize error path
  vfio/pci: Disable INTx in vfio_realize error path
  vfio/pci: Free resources when vfio_migration_realize fails
  vfio/migration: Refactor and fix print of "Migration disabled"

 hw/vfio/common.c              | 66 +++++++----------------------------
 hw/vfio/migration.c           | 30 +++++++++-------
 hw/vfio/pci.c                 | 18 +++++++---
 include/hw/vfio/vfio-common.h |  7 ++--
 4 files changed, 48 insertions(+), 73 deletions(-)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v4 1/5] vfio/pci: Fix a segfault in vfio_realize
  2023-06-29  8:40 [PATCH v4 0/5] VFIO migration related refactor and bug fix Zhenzhong Duan
@ 2023-06-29  8:40 ` Zhenzhong Duan
  2023-06-29 10:57   ` Joao Martins
  2023-06-29 13:06   ` Cédric Le Goater
  2023-06-29  8:40 ` [PATCH v4 2/5] vfio/pci: Free leaked timer in vfio_realize error path Zhenzhong Duan
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 25+ messages in thread
From: Zhenzhong Duan @ 2023-06-29  8:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.williamson, clg, joao.m.martins, avihaih, chao.p.peng

The kvm irqchip notifier is only registered if the device supports
INTx, however it's unconditionally removed in vfio realize error
path. If the assigned device does not support INTx, this will cause
QEMU to crash when vfio realize fails. Change it to conditionally
remove the notifier only if the notify hook is setup.

Before fix:
(qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,xres=1
Connection closed by foreign host.

After fix:
(qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,xres=1
Error: vfio 0000:81:11.1: xres and yres properties require display=on
(qemu)

Fixes: c5478fea27ac ("vfio/pci: Respond to KVM irqchip change notifier")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 hw/vfio/pci.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 73e19a04b2bf..48df517f79ee 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3221,7 +3221,9 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 
 out_deregister:
     pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
-    kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
+    if (vdev->irqchip_change_notifier.notify) {
+        kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
+    }
 out_teardown:
     vfio_teardown_msi(vdev);
     vfio_bars_exit(vdev);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v4 2/5] vfio/pci: Free leaked timer in vfio_realize error path
  2023-06-29  8:40 [PATCH v4 0/5] VFIO migration related refactor and bug fix Zhenzhong Duan
  2023-06-29  8:40 ` [PATCH v4 1/5] vfio/pci: Fix a segfault in vfio_realize Zhenzhong Duan
@ 2023-06-29  8:40 ` Zhenzhong Duan
  2023-06-29 10:59   ` Joao Martins
  2023-06-29 13:09   ` Cédric Le Goater
  2023-06-29  8:40 ` [PATCH v4 3/5] vfio/pci: Disable INTx " Zhenzhong Duan
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 25+ messages in thread
From: Zhenzhong Duan @ 2023-06-29  8:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.williamson, clg, joao.m.martins, avihaih, chao.p.peng

When vfio_realize fails, the mmap_timer used for INTx optimization
isn't freed. As this timer isn't activated yet, the potential impact
is just a piece of leaked memory.

Fixes: ea486926b07d ("vfio-pci: Update slow path INTx algorithm timer related")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 hw/vfio/pci.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 48df517f79ee..ab6645ba60af 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3224,6 +3224,9 @@ out_deregister:
     if (vdev->irqchip_change_notifier.notify) {
         kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
     }
+    if (vdev->intx.mmap_timer) {
+        timer_free(vdev->intx.mmap_timer);
+    }
 out_teardown:
     vfio_teardown_msi(vdev);
     vfio_bars_exit(vdev);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v4 3/5] vfio/pci: Disable INTx in vfio_realize error path
  2023-06-29  8:40 [PATCH v4 0/5] VFIO migration related refactor and bug fix Zhenzhong Duan
  2023-06-29  8:40 ` [PATCH v4 1/5] vfio/pci: Fix a segfault in vfio_realize Zhenzhong Duan
  2023-06-29  8:40 ` [PATCH v4 2/5] vfio/pci: Free leaked timer in vfio_realize error path Zhenzhong Duan
@ 2023-06-29  8:40 ` Zhenzhong Duan
  2023-06-29 11:24   ` Joao Martins
  2023-06-29  8:40 ` [PATCH v4 4/5] vfio/pci: Free resources when vfio_migration_realize fails Zhenzhong Duan
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Zhenzhong Duan @ 2023-06-29  8:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.williamson, clg, joao.m.martins, avihaih, chao.p.peng

When vfio realize fails, INTx isn't disabled if it has been enabled.
This may confuse host side with unhandled interrupt report.

Add a new label to be used for vfio_intx_enable() failed case.

Fixes: a9994687cb9b ("vfio/display: core & wireup")
Fixes: b290659fc3dd ("hw/vfio/display: add ramfb support")
Fixes: c62a0c7ce34e ("vfio/display: add xres + yres properties")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 hw/vfio/pci.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index ab6645ba60af..54a8179d1c64 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3167,7 +3167,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
         kvm_irqchip_add_change_notifier(&vdev->irqchip_change_notifier);
         ret = vfio_intx_enable(vdev, errp);
         if (ret) {
-            goto out_deregister;
+            goto out_intx_disable;
         }
     }
 
@@ -3220,6 +3220,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
     return;
 
 out_deregister:
+    vfio_disable_interrupts(vdev);
+out_intx_disable:
     pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
     if (vdev->irqchip_change_notifier.notify) {
         kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v4 4/5] vfio/pci: Free resources when vfio_migration_realize fails
  2023-06-29  8:40 [PATCH v4 0/5] VFIO migration related refactor and bug fix Zhenzhong Duan
                   ` (2 preceding siblings ...)
  2023-06-29  8:40 ` [PATCH v4 3/5] vfio/pci: Disable INTx " Zhenzhong Duan
@ 2023-06-29  8:40 ` Zhenzhong Duan
  2023-06-29 11:45   ` Joao Martins
  2023-06-29  8:40 ` [PATCH v4 5/5] vfio/migration: Refactor and fix print of "Migration disabled" Zhenzhong Duan
  2023-06-30  6:01 ` [PATCH v4 0/5] VFIO migration related refactor and bug fix Cédric Le Goater
  5 siblings, 1 reply; 25+ messages in thread
From: Zhenzhong Duan @ 2023-06-29  8:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.williamson, clg, joao.m.martins, avihaih, chao.p.peng

When vfio_realize() succeeds, hot unplug will call vfio_exitfn()
to free resources allocated in vfio_realize(); when vfio_realize()
fails, vfio_exitfn() is never called and we need to free resources
in vfio_realize().

In the case that vfio_migration_realize() fails,
e.g: with -only-migratable & enable-migration=off, we see below:

(qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,enable-migration=off
0000:81:11.1: Migration disabled
Error: disallowing migration blocker (--only-migratable) for: 0000:81:11.1: Migration is disabled for VFIO device

If we hotplug again we should see same log as above, but we see:
(qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,enable-migration=off
Error: vfio 0000:81:11.1: device is already attached

That's because some references to VFIO device isn't released,
we should check return value of vfio_migration_realize() and
release the references, then VFIO device will be truely
released when hotplug fails.

Fixes: a22651053b59 ("vfio: Make vfio-pci device migration capable")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 hw/vfio/pci.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 54a8179d1c64..dc69d3031b24 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3210,6 +3210,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
         ret = vfio_migration_realize(vbasedev, errp);
         if (ret) {
             error_report("%s: Migration disabled", vbasedev->name);
+            goto out_vfio_migration;
         }
     }
 
@@ -3219,6 +3220,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 
     return;
 
+out_vfio_migration:
+    vfio_migration_exit(vbasedev);
 out_deregister:
     vfio_disable_interrupts(vdev);
 out_intx_disable:
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v4 5/5] vfio/migration: Refactor and fix print of "Migration disabled"
  2023-06-29  8:40 [PATCH v4 0/5] VFIO migration related refactor and bug fix Zhenzhong Duan
                   ` (3 preceding siblings ...)
  2023-06-29  8:40 ` [PATCH v4 4/5] vfio/pci: Free resources when vfio_migration_realize fails Zhenzhong Duan
@ 2023-06-29  8:40 ` Zhenzhong Duan
  2023-06-29 12:44   ` Joao Martins
  2023-06-29 16:40   ` Cédric Le Goater
  2023-06-30  6:01 ` [PATCH v4 0/5] VFIO migration related refactor and bug fix Cédric Le Goater
  5 siblings, 2 replies; 25+ messages in thread
From: Zhenzhong Duan @ 2023-06-29  8:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.williamson, clg, joao.m.martins, avihaih, chao.p.peng

This patch refactors vfio_migration_realize() and its dependend code
as follows:

1. It's redundant in vfio_migration_realize() to registers multiple blockers,
   e.g: vIOMMU blocker can be refactored as per device blocker.
2. Change vfio_viommu_preset() to be only a per device checker.
3. Remove global vIOMMU blocker related stuff, e.g:
   giommu_migration_blocker, vfio_[block|unblock]_giommu_migration()
   and vfio_migration_finalize()
4. Change vfio_migration_realize(), vfio_block_multiple_devices_migration()
   vfio_block_migration() and vfio_viommu_preset() to return bool type.
5. Print "Migration disabled" depending on enable_migration property
   and print it as warning instead of error which is overkill.

migrate_add_blocker() returns 0 when successfully adding the migration blocker.
However, the caller of vfio_migration_realize() considers that migration was
blocked when the latter returned an error. What matters for migration is that
the blocker is added in core migration, so this cleans up usability such that
user sees "Migrate disabled" when any of the vfio migration blockers are active
and it's not intentionally forced by user with enable-migration=off.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 hw/vfio/common.c              | 66 +++++++----------------------------
 hw/vfio/migration.c           | 30 +++++++++-------
 hw/vfio/pci.c                 |  4 +--
 include/hw/vfio/vfio-common.h |  7 ++--
 4 files changed, 36 insertions(+), 71 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 77e2ee0e5c6e..c80ecb1da53f 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -362,7 +362,6 @@ bool vfio_mig_active(void)
 }
 
 static Error *multiple_devices_migration_blocker;
-static Error *giommu_migration_blocker;
 
 static unsigned int vfio_migratable_device_num(void)
 {
@@ -381,19 +380,19 @@ static unsigned int vfio_migratable_device_num(void)
     return device_num;
 }
 
-int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp)
+bool vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp)
 {
     int ret;
 
     if (multiple_devices_migration_blocker ||
         vfio_migratable_device_num() <= 1) {
-        return 0;
+        return true;
     }
 
     if (vbasedev->enable_migration == ON_OFF_AUTO_ON) {
         error_setg(errp, "Migration is currently not supported with multiple "
                          "VFIO devices");
-        return -EINVAL;
+        return false;
     }
 
     error_setg(&multiple_devices_migration_blocker,
@@ -403,9 +402,15 @@ int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp)
     if (ret < 0) {
         error_free(multiple_devices_migration_blocker);
         multiple_devices_migration_blocker = NULL;
+    } else {
+        /*
+         * Only ON_OFF_AUTO_AUTO case, ON_OFF_AUTO_OFF is checked
+         * in vfio_migration_realize().
+         */
+        warn_report("Migration disabled, not support multiple VFIO devices");
     }
 
-    return ret;
+    return !ret;
 }
 
 void vfio_unblock_multiple_devices_migration(void)
@@ -420,55 +425,10 @@ void vfio_unblock_multiple_devices_migration(void)
     multiple_devices_migration_blocker = NULL;
 }
 
-static bool vfio_viommu_preset(void)
+/* Block migration with a vIOMMU */
+bool vfio_viommu_preset(VFIODevice *vbasedev)
 {
-    VFIOAddressSpace *space;
-
-    QLIST_FOREACH(space, &vfio_address_spaces, list) {
-        if (space->as != &address_space_memory) {
-            return true;
-        }
-    }
-
-    return false;
-}
-
-int vfio_block_giommu_migration(VFIODevice *vbasedev, Error **errp)
-{
-    int ret;
-
-    if (giommu_migration_blocker ||
-        !vfio_viommu_preset()) {
-        return 0;
-    }
-
-    if (vbasedev->enable_migration == ON_OFF_AUTO_ON) {
-        error_setg(errp,
-                   "Migration is currently not supported with vIOMMU enabled");
-        return -EINVAL;
-    }
-
-    error_setg(&giommu_migration_blocker,
-               "Migration is currently not supported with vIOMMU enabled");
-    ret = migrate_add_blocker(giommu_migration_blocker, errp);
-    if (ret < 0) {
-        error_free(giommu_migration_blocker);
-        giommu_migration_blocker = NULL;
-    }
-
-    return ret;
-}
-
-void vfio_migration_finalize(void)
-{
-    if (!giommu_migration_blocker ||
-        vfio_viommu_preset()) {
-        return;
-    }
-
-    migrate_del_blocker(giommu_migration_blocker);
-    error_free(giommu_migration_blocker);
-    giommu_migration_blocker = NULL;
+    return vbasedev->group->container->space->as != &address_space_memory;
 }
 
 static void vfio_set_migration_error(int err)
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 1db7d52ab2c1..84036e5cfc01 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -802,13 +802,13 @@ static int vfio_migration_init(VFIODevice *vbasedev)
     return 0;
 }
 
-static int vfio_block_migration(VFIODevice *vbasedev, Error *err, Error **errp)
+static bool vfio_block_migration(VFIODevice *vbasedev, Error *err, Error **errp)
 {
     int ret;
 
     if (vbasedev->enable_migration == ON_OFF_AUTO_ON) {
         error_propagate(errp, err);
-        return -EINVAL;
+        return false;
     }
 
     vbasedev->migration_blocker = error_copy(err);
@@ -818,9 +818,11 @@ static int vfio_block_migration(VFIODevice *vbasedev, Error *err, Error **errp)
     if (ret < 0) {
         error_free(vbasedev->migration_blocker);
         vbasedev->migration_blocker = NULL;
+    } else if (vbasedev->enable_migration != ON_OFF_AUTO_OFF) {
+        warn_report("%s: Migration disabled", vbasedev->name);
     }
 
-    return ret;
+    return !ret;
 }
 
 /* ---------------------------------------------------------------------- */
@@ -835,7 +837,12 @@ void vfio_reset_bytes_transferred(void)
     bytes_transferred = 0;
 }
 
-int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
+/*
+ * Return true when either migration initialized or blocker registered.
+ * Currently only return false when adding blocker fails which will
+ * de-register vfio device.
+ */
+bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
 {
     Error *err = NULL;
     int ret;
@@ -873,18 +880,17 @@ int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
                     vbasedev->name);
     }
 
-    ret = vfio_block_multiple_devices_migration(vbasedev, errp);
-    if (ret) {
-        return ret;
+    if (!vfio_block_multiple_devices_migration(vbasedev, errp)) {
+        return false;
     }
 
-    ret = vfio_block_giommu_migration(vbasedev, errp);
-    if (ret) {
-        return ret;
+    if (vfio_viommu_preset(vbasedev)) {
+        error_setg(&err, "%s: Migration is currently not supported "
+                   "with vIOMMU enabled", vbasedev->name);
+        return vfio_block_migration(vbasedev, err, errp);
     }
 
-    trace_vfio_migration_realize(vbasedev->name);
-    return 0;
+    return true;
 }
 
 void vfio_migration_exit(VFIODevice *vbasedev)
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index dc69d3031b24..184d08568154 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3209,7 +3209,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
     if (!pdev->failover_pair_id) {
         ret = vfio_migration_realize(vbasedev, errp);
         if (ret) {
-            error_report("%s: Migration disabled", vbasedev->name);
+            trace_vfio_migration_realize(vbasedev->name);
+        } else {
             goto out_vfio_migration;
         }
     }
@@ -3257,7 +3258,6 @@ static void vfio_instance_finalize(Object *obj)
      */
     vfio_put_device(vdev);
     vfio_put_group(group);
-    vfio_migration_finalize();
 }
 
 static void vfio_exitfn(PCIDevice *pdev)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 93429b9abba0..3c18572322fc 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -225,9 +225,9 @@ typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
 extern VFIOGroupList vfio_group_list;
 
 bool vfio_mig_active(void);
-int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp);
+bool vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp);
 void vfio_unblock_multiple_devices_migration(void);
-int vfio_block_giommu_migration(VFIODevice *vbasedev, Error **errp);
+bool vfio_viommu_preset(VFIODevice *vbasedev);
 int64_t vfio_mig_bytes_transferred(void);
 void vfio_reset_bytes_transferred(void);
 
@@ -252,8 +252,7 @@ int vfio_spapr_create_window(VFIOContainer *container,
 int vfio_spapr_remove_window(VFIOContainer *container,
                              hwaddr offset_within_address_space);
 
-int vfio_migration_realize(VFIODevice *vbasedev, Error **errp);
+bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp);
 void vfio_migration_exit(VFIODevice *vbasedev);
-void vfio_migration_finalize(void);
 
 #endif /* HW_VFIO_VFIO_COMMON_H */
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 1/5] vfio/pci: Fix a segfault in vfio_realize
  2023-06-29  8:40 ` [PATCH v4 1/5] vfio/pci: Fix a segfault in vfio_realize Zhenzhong Duan
@ 2023-06-29 10:57   ` Joao Martins
  2023-06-29 13:06   ` Cédric Le Goater
  1 sibling, 0 replies; 25+ messages in thread
From: Joao Martins @ 2023-06-29 10:57 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel; +Cc: alex.williamson, clg, avihaih, chao.p.peng



On 29/06/2023 09:40, Zhenzhong Duan wrote:
> The kvm irqchip notifier is only registered if the device supports
> INTx, however it's unconditionally removed in vfio realize error
> path. If the assigned device does not support INTx, this will cause
> QEMU to crash when vfio realize fails. Change it to conditionally
> remove the notifier only if the notify hook is setup.
> 
> Before fix:
> (qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,xres=1
> Connection closed by foreign host.
> 
> After fix:
> (qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,xres=1
> Error: vfio 0000:81:11.1: xres and yres properties require display=on
> (qemu)
> 
> Fixes: c5478fea27ac ("vfio/pci: Respond to KVM irqchip change notifier")
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>

Reviewed-by: Joao Martins <joao.m.martins@oracle.com>

> ---
>  hw/vfio/pci.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 73e19a04b2bf..48df517f79ee 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3221,7 +3221,9 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>  
>  out_deregister:
>      pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
> -    kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
> +    if (vdev->irqchip_change_notifier.notify) {
> +        kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
> +    }
>  out_teardown:
>      vfio_teardown_msi(vdev);
>      vfio_bars_exit(vdev);


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 2/5] vfio/pci: Free leaked timer in vfio_realize error path
  2023-06-29  8:40 ` [PATCH v4 2/5] vfio/pci: Free leaked timer in vfio_realize error path Zhenzhong Duan
@ 2023-06-29 10:59   ` Joao Martins
  2023-06-29 13:09   ` Cédric Le Goater
  1 sibling, 0 replies; 25+ messages in thread
From: Joao Martins @ 2023-06-29 10:59 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel; +Cc: alex.williamson, clg, avihaih, chao.p.peng

On 29/06/2023 09:40, Zhenzhong Duan wrote:
> When vfio_realize fails, the mmap_timer used for INTx optimization
> isn't freed. As this timer isn't activated yet, the potential impact
> is just a piece of leaked memory.
> 
> Fixes: ea486926b07d ("vfio-pci: Update slow path INTx algorithm timer related")
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>

Reviewed-by: Joao Martins <joao.m.martins@oracle.com>

> ---
>  hw/vfio/pci.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 48df517f79ee..ab6645ba60af 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3224,6 +3224,9 @@ out_deregister:
>      if (vdev->irqchip_change_notifier.notify) {
>          kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
>      }
> +    if (vdev->intx.mmap_timer) {
> +        timer_free(vdev->intx.mmap_timer);
> +    }
>  out_teardown:
>      vfio_teardown_msi(vdev);
>      vfio_bars_exit(vdev);


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 3/5] vfio/pci: Disable INTx in vfio_realize error path
  2023-06-29  8:40 ` [PATCH v4 3/5] vfio/pci: Disable INTx " Zhenzhong Duan
@ 2023-06-29 11:24   ` Joao Martins
  2023-06-29 15:13     ` Cédric Le Goater
  0 siblings, 1 reply; 25+ messages in thread
From: Joao Martins @ 2023-06-29 11:24 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel; +Cc: alex.williamson, clg, avihaih, chao.p.peng

On 29/06/2023 09:40, Zhenzhong Duan wrote:
> When vfio realize fails, INTx isn't disabled if it has been enabled.
> This may confuse host side with unhandled interrupt report.
> 
> Add a new label to be used for vfio_intx_enable() failed case.
> 
> Fixes: a9994687cb9b ("vfio/display: core & wireup")
> Fixes: b290659fc3dd ("hw/vfio/display: add ramfb support")
> Fixes: c62a0c7ce34e ("vfio/display: add xres + yres properties")

Sounds to me the correct Fixes tag is the same as first patch i.e.:

Fixes: c5478fea27ac ("vfio/pci: Respond to KVM irqchip change notifier")

> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>

Looks good, but see some clarifications below.

> ---
>  hw/vfio/pci.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index ab6645ba60af..54a8179d1c64 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3167,7 +3167,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>          kvm_irqchip_add_change_notifier(&vdev->irqchip_change_notifier);
>          ret = vfio_intx_enable(vdev, errp);
>          if (ret) {
> -            goto out_deregister;
> +            goto out_intx_disable;
>          }
>      }
>  
> @@ -3220,6 +3220,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>      return;
>  
>  out_deregister:
> +    vfio_disable_interrupts(vdev);

You are calling vfio_disable_interrupts() when what you want is
vfio_intx_disable() ? But I guess your thinking was to call
vfio_disable_interrupt() which eventually calls vfio_intx_disable() in case INTx
was really setup, thus saving the duplicated check. The MSIx/MSI in realize() I
don't think they will be enabled at this point. Let me know if I misunderstood.

> +out_intx_disable:

Maybe 'out_intx_teardown' or 'out_intx_deregister' because you are not really
disabling INTx.

>      pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
>      if (vdev->irqchip_change_notifier.notify) {
>          kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 4/5] vfio/pci: Free resources when vfio_migration_realize fails
  2023-06-29  8:40 ` [PATCH v4 4/5] vfio/pci: Free resources when vfio_migration_realize fails Zhenzhong Duan
@ 2023-06-29 11:45   ` Joao Martins
  2023-06-29 15:23     ` Cédric Le Goater
  2023-06-30  1:23     ` Duan, Zhenzhong
  0 siblings, 2 replies; 25+ messages in thread
From: Joao Martins @ 2023-06-29 11:45 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel; +Cc: alex.williamson, clg, avihaih, chao.p.peng

On 29/06/2023 09:40, Zhenzhong Duan wrote:
> When vfio_realize() succeeds, hot unplug will call vfio_exitfn()
> to free resources allocated in vfio_realize(); when vfio_realize()
> fails, vfio_exitfn() is never called and we need to free resources
> in vfio_realize().
> 
> In the case that vfio_migration_realize() fails,
> e.g: with -only-migratable & enable-migration=off, we see below:
> 
> (qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,enable-migration=off
> 0000:81:11.1: Migration disabled
> Error: disallowing migration blocker (--only-migratable) for: 0000:81:11.1: Migration is disabled for VFIO device
> 
> If we hotplug again we should see same log as above, but we see:
> (qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,enable-migration=off
> Error: vfio 0000:81:11.1: device is already attached
> 
> That's because some references to VFIO device isn't released,
> we should check return value of vfio_migration_realize() and
> release the references, then VFIO device will be truely
> released when hotplug fails.
> 
> Fixes: a22651053b59 ("vfio: Make vfio-pci device migration capable")
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
>  hw/vfio/pci.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 54a8179d1c64..dc69d3031b24 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3210,6 +3210,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>          ret = vfio_migration_realize(vbasedev, errp);
>          if (ret) {
>              error_report("%s: Migration disabled", vbasedev->name);
> +            goto out_vfio_migration;
>          }
>      }
>  
> @@ -3219,6 +3220,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>  
>      return;
>  
> +out_vfio_migration:
> +    vfio_migration_exit(vbasedev);
>  out_deregister:
>      vfio_disable_interrupts(vdev);
>  out_intx_disable:

I agree with the general sentiment behind the change.
Clearly vfio::migration and vfio::migration_blocker are leaking from inside the
migration_realize() function.

But it is kinda awkward semantic that vfio_migration_realize() (or any realize)
failures need to be accompanied with a vfio_migration_exit() that tears down
state *leaked* by its realize() failure.

It sounds to me that this should be inside the vfio_migration_realize() not on
the caller? Unless QEMU ::realize() is expected to do this.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 5/5] vfio/migration: Refactor and fix print of "Migration disabled"
  2023-06-29  8:40 ` [PATCH v4 5/5] vfio/migration: Refactor and fix print of "Migration disabled" Zhenzhong Duan
@ 2023-06-29 12:44   ` Joao Martins
  2023-06-29 15:20     ` Avihai Horon
  2023-06-29 16:40   ` Cédric Le Goater
  1 sibling, 1 reply; 25+ messages in thread
From: Joao Martins @ 2023-06-29 12:44 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel; +Cc: alex.williamson, clg, avihaih, chao.p.peng

On 29/06/2023 09:40, Zhenzhong Duan wrote:
> This patch refactors vfio_migration_realize() and its dependend code
> as follows:
> 
> 1. It's redundant in vfio_migration_realize() to registers multiple blockers,
>    e.g: vIOMMU blocker can be refactored as per device blocker.
> 2. Change vfio_viommu_preset() to be only a per device checker.
> 3. Remove global vIOMMU blocker related stuff, e.g:
>    giommu_migration_blocker, vfio_[block|unblock]_giommu_migration()
>    and vfio_migration_finalize()
> 4. Change vfio_migration_realize(), vfio_block_multiple_devices_migration()
>    vfio_block_migration() and vfio_viommu_preset() to return bool type.
> 5. Print "Migration disabled" depending on enable_migration property
>    and print it as warning instead of error which is overkill.
> 
I am not enterily sure we need to keep "Migration disabled". Perhaps we should
just derisk from error to warning and use always the same error messages.

> migrate_add_blocker() returns 0 when successfully adding the migration blocker.
> However, the caller of vfio_migration_realize() considers that migration was
> blocked when the latter returned an error. What matters for migration is that
> the blocker is added in core migration, so this cleans up usability such that
> user sees "Migrate disabled" when any of the vfio migration blockers are active
> and it's not intentionally forced by user with enable-migration=off.
> 
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
>  hw/vfio/common.c              | 66 +++++++----------------------------
>  hw/vfio/migration.c           | 30 +++++++++-------
>  hw/vfio/pci.c                 |  4 +--
>  include/hw/vfio/vfio-common.h |  7 ++--
>  4 files changed, 36 insertions(+), 71 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 77e2ee0e5c6e..c80ecb1da53f 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -362,7 +362,6 @@ bool vfio_mig_active(void)
>  }
>  
>  static Error *multiple_devices_migration_blocker;
> -static Error *giommu_migration_blocker;
>  
>  static unsigned int vfio_migratable_device_num(void)
>  {
> @@ -381,19 +380,19 @@ static unsigned int vfio_migratable_device_num(void)
>      return device_num;
>  }
>  
> -int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp)
> +bool vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp)
>  {
>      int ret;
>  
>      if (multiple_devices_migration_blocker ||
>          vfio_migratable_device_num() <= 1) {
> -        return 0;
> +        return true;
>      }
>  
>      if (vbasedev->enable_migration == ON_OFF_AUTO_ON) {
>          error_setg(errp, "Migration is currently not supported with multiple "
>                           "VFIO devices");
> -        return -EINVAL;
> +        return false;
>      }
>  
>      error_setg(&multiple_devices_migration_blocker,
> @@ -403,9 +402,15 @@ int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp)
>      if (ret < 0) {
>          error_free(multiple_devices_migration_blocker);
>          multiple_devices_migration_blocker = NULL;
> +    } else {
> +        /*
> +         * Only ON_OFF_AUTO_AUTO case, ON_OFF_AUTO_OFF is checked
> +         * in vfio_migration_realize().
> +         */
> +        warn_report("Migration disabled, not support multiple VFIO devices");
>      }
>  

Perhaps you could stash the previous error message and use it in the
warn_report_error to consolidate the error messages e.g.

bool vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp)
{
    Error *err = NULL;

    if (multiple_devices_migration_blocker ||
        vfio_migratable_device_num() <= 1) {
        return true;
    }

    error_setg(&err, "%s: Migration is currently not supported with multiple "
                     "VFIO devices", vbasedev->name);

    if (vbasedev->enable_migration == ON_OFF_AUTO_ON) {
        error_propagate(errp, err);
        return -EINVAL;
    }

    ...
    if (ret < 0) {
    } else {
        /* Warns only on ON_OFF_AUTO_AUTO case */
        warn_report_err(err);
    }
}

> -    return ret;
> +    return !ret;
>  }
>  
>  void vfio_unblock_multiple_devices_migration(void)
> @@ -420,55 +425,10 @@ void vfio_unblock_multiple_devices_migration(void)
>      multiple_devices_migration_blocker = NULL;
>  }
>  
> -static bool vfio_viommu_preset(void)
> +/* Block migration with a vIOMMU */

I meant in the previous version to put the comment on top of the caller, not on
the definition. But with the new code structure from Avihai the error message
further below... it will look a bit redundant.

> +bool vfio_viommu_preset(VFIODevice *vbasedev)
>  {
> -    VFIOAddressSpace *space;
> -
> -    QLIST_FOREACH(space, &vfio_address_spaces, list) {
> -        if (space->as != &address_space_memory) {
> -            return true;
> -        }
> -    }
> -
> -    return false;
> -}
> -
> -int vfio_block_giommu_migration(VFIODevice *vbasedev, Error **errp)
> -{
> -    int ret;
> -
> -    if (giommu_migration_blocker ||
> -        !vfio_viommu_preset()) {
> -        return 0;
> -    }
> -
> -    if (vbasedev->enable_migration == ON_OFF_AUTO_ON) {
> -        error_setg(errp,
> -                   "Migration is currently not supported with vIOMMU enabled");
> -        return -EINVAL;
> -    }
> -
> -    error_setg(&giommu_migration_blocker,
> -               "Migration is currently not supported with vIOMMU enabled");
> -    ret = migrate_add_blocker(giommu_migration_blocker, errp);
> -    if (ret < 0) {
> -        error_free(giommu_migration_blocker);
> -        giommu_migration_blocker = NULL;
> -    }
> -
> -    return ret;
> -}
> -
> -void vfio_migration_finalize(void)
> -{
> -    if (!giommu_migration_blocker ||
> -        vfio_viommu_preset()) {
> -        return;
> -    }
> -
> -    migrate_del_blocker(giommu_migration_blocker);
> -    error_free(giommu_migration_blocker);
> -    giommu_migration_blocker = NULL;
> +    return vbasedev->group->container->space->as != &address_space_memory;
>  }
>  

nice consolidation

>  static void vfio_set_migration_error(int err)
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 1db7d52ab2c1..84036e5cfc01 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -802,13 +802,13 @@ static int vfio_migration_init(VFIODevice *vbasedev)
>      return 0;
>  }
>  
> -static int vfio_block_migration(VFIODevice *vbasedev, Error *err, Error **errp)
> +static bool vfio_block_migration(VFIODevice *vbasedev, Error *err, Error **errp)
>  {
>      int ret;
>  
>      if (vbasedev->enable_migration == ON_OFF_AUTO_ON) {
>          error_propagate(errp, err);
> -        return -EINVAL;
> +        return false;
>      }
>  
>      vbasedev->migration_blocker = error_copy(err);
> @@ -818,9 +818,11 @@ static int vfio_block_migration(VFIODevice *vbasedev, Error *err, Error **errp)
>      if (ret < 0) {
>          error_free(vbasedev->migration_blocker);
>          vbasedev->migration_blocker = NULL;
> +    } else if (vbasedev->enable_migration != ON_OFF_AUTO_OFF) {
> +        warn_report("%s: Migration disabled", vbasedev->name);
>      }
>  
Perhaps you can use the the local error to expand on why migration was disabled e.g.

	warn_report_err(err);

> -    return ret;
> +    return !ret;
>  }
>  
>  /* ---------------------------------------------------------------------- */
> @@ -835,7 +837,12 @@ void vfio_reset_bytes_transferred(void)
>      bytes_transferred = 0;
>  }
>  
> -int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
> +/*
> + * Return true when either migration initialized or blocker registered.
> + * Currently only return false when adding blocker fails which will
> + * de-register vfio device.
> + */
> +bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
>  {
>      Error *err = NULL;
>      int ret;
> @@ -873,18 +880,17 @@ int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
>                      vbasedev->name);
>      }
>  
> -    ret = vfio_block_multiple_devices_migration(vbasedev, errp);
> -    if (ret) {
> -        return ret;
> +    if (!vfio_block_multiple_devices_migration(vbasedev, errp)) {
> +        return false;
>      }
>  
> -    ret = vfio_block_giommu_migration(vbasedev, errp);
> -    if (ret) {
> -        return ret;
> +    if (vfio_viommu_preset(vbasedev)) {

The /* Block migration with a vIOMMU */

Would go above, but I don't think we need it anymore ...

> +        error_setg(&err, "%s: Migration is currently not supported "
> +                   "with vIOMMU enabled", vbasedev->name);
> +        return vfio_block_migration(vbasedev, err, errp);

... as the error message when placed here makes it obvious. So the comment I
suggested won't add much. Unless others disagree.

>      }
>  
> -    trace_vfio_migration_realize(vbasedev->name);
> -    return 0;
> +    return true;
>  }
>  
I think somewhere in function we should have vfio_migration_exit() being called
behind a label or elsewhere from vfio_migration_realize (...)

>  void vfio_migration_exit(VFIODevice *vbasedev)
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index dc69d3031b24..184d08568154 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3209,7 +3209,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>      if (!pdev->failover_pair_id) {
>          ret = vfio_migration_realize(vbasedev, errp);
>          if (ret) {
> -            error_report("%s: Migration disabled", vbasedev->name);
> +            trace_vfio_migration_realize(vbasedev->name);
> +        } else {
>              goto out_vfio_migration;
>          }
>      }

(...) Which then void the need for this change. Perhaps your previous patch
(4/5) could come after this refactor patch instead ... where you would fix the
unwinding error path inside the vfio_migration_realize() as opposed to
vfio_realize().

> @@ -3257,7 +3258,6 @@ static void vfio_instance_finalize(Object *obj)
>       */
>      vfio_put_device(vdev);
>      vfio_put_group(group);
> -    vfio_migration_finalize();
>  }
>  
>  static void vfio_exitfn(PCIDevice *pdev)
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 93429b9abba0..3c18572322fc 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -225,9 +225,9 @@ typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
>  extern VFIOGroupList vfio_group_list;
>  
>  bool vfio_mig_active(void);
> -int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp);
> +bool vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp);
>  void vfio_unblock_multiple_devices_migration(void);
> -int vfio_block_giommu_migration(VFIODevice *vbasedev, Error **errp);
> +bool vfio_viommu_preset(VFIODevice *vbasedev);
>  int64_t vfio_mig_bytes_transferred(void);
>  void vfio_reset_bytes_transferred(void);
>  
> @@ -252,8 +252,7 @@ int vfio_spapr_create_window(VFIOContainer *container,
>  int vfio_spapr_remove_window(VFIOContainer *container,
>                               hwaddr offset_within_address_space);
>  
> -int vfio_migration_realize(VFIODevice *vbasedev, Error **errp);
> +bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp);
>  void vfio_migration_exit(VFIODevice *vbasedev);
> -void vfio_migration_finalize(void);
>  
>  #endif /* HW_VFIO_VFIO_COMMON_H */


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 1/5] vfio/pci: Fix a segfault in vfio_realize
  2023-06-29  8:40 ` [PATCH v4 1/5] vfio/pci: Fix a segfault in vfio_realize Zhenzhong Duan
  2023-06-29 10:57   ` Joao Martins
@ 2023-06-29 13:06   ` Cédric Le Goater
  1 sibling, 0 replies; 25+ messages in thread
From: Cédric Le Goater @ 2023-06-29 13:06 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, joao.m.martins, avihaih, chao.p.peng

On 6/29/23 10:40, Zhenzhong Duan wrote:
> The kvm irqchip notifier is only registered if the device supports
> INTx, however it's unconditionally removed in vfio realize error
> path. If the assigned device does not support INTx, this will cause
> QEMU to crash when vfio realize fails. Change it to conditionally
> remove the notifier only if the notify hook is setup.
> 
> Before fix:
> (qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,xres=1
> Connection closed by foreign host.
> 
> After fix:
> (qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,xres=1
> Error: vfio 0000:81:11.1: xres and yres properties require display=on
> (qemu)
> 
> Fixes: c5478fea27ac ("vfio/pci: Respond to KVM irqchip change notifier")
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>

Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


> ---
>   hw/vfio/pci.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 73e19a04b2bf..48df517f79ee 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3221,7 +3221,9 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>   
>   out_deregister:
>       pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
> -    kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
> +    if (vdev->irqchip_change_notifier.notify) {
> +        kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
> +    }
>   out_teardown:
>       vfio_teardown_msi(vdev);
>       vfio_bars_exit(vdev);



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 2/5] vfio/pci: Free leaked timer in vfio_realize error path
  2023-06-29  8:40 ` [PATCH v4 2/5] vfio/pci: Free leaked timer in vfio_realize error path Zhenzhong Duan
  2023-06-29 10:59   ` Joao Martins
@ 2023-06-29 13:09   ` Cédric Le Goater
  1 sibling, 0 replies; 25+ messages in thread
From: Cédric Le Goater @ 2023-06-29 13:09 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, joao.m.martins, avihaih, chao.p.peng

On 6/29/23 10:40, Zhenzhong Duan wrote:
> When vfio_realize fails, the mmap_timer used for INTx optimization
> isn't freed. As this timer isn't activated yet, the potential impact
> is just a piece of leaked memory.
> 
> Fixes: ea486926b07d ("vfio-pci: Update slow path INTx algorithm timer related")
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>

Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


> ---
>   hw/vfio/pci.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 48df517f79ee..ab6645ba60af 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3224,6 +3224,9 @@ out_deregister:
>       if (vdev->irqchip_change_notifier.notify) {
>           kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
>       }
> +    if (vdev->intx.mmap_timer) {
> +        timer_free(vdev->intx.mmap_timer);
> +    }
>   out_teardown:
>       vfio_teardown_msi(vdev);
>       vfio_bars_exit(vdev);



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 3/5] vfio/pci: Disable INTx in vfio_realize error path
  2023-06-29 11:24   ` Joao Martins
@ 2023-06-29 15:13     ` Cédric Le Goater
  2023-06-29 15:33       ` Joao Martins
  0 siblings, 1 reply; 25+ messages in thread
From: Cédric Le Goater @ 2023-06-29 15:13 UTC (permalink / raw)
  To: Joao Martins, Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, avihaih, chao.p.peng

On 6/29/23 13:24, Joao Martins wrote:
> On 29/06/2023 09:40, Zhenzhong Duan wrote:
>> When vfio realize fails, INTx isn't disabled if it has been enabled.
>> This may confuse host side with unhandled interrupt report.
>>
>> Add a new label to be used for vfio_intx_enable() failed case.
>>
>> Fixes: a9994687cb9b ("vfio/display: core & wireup")
>> Fixes: b290659fc3dd ("hw/vfio/display: add ramfb support")
>> Fixes: c62a0c7ce34e ("vfio/display: add xres + yres properties")
> 
> Sounds to me the correct Fixes tag is the same as first patch i.e.:
> 
> Fixes: c5478fea27ac ("vfio/pci: Respond to KVM irqchip change notifier")
> 
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> 
> Looks good, but see some clarifications below.
> 
>> ---
>>   hw/vfio/pci.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index ab6645ba60af..54a8179d1c64 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -3167,7 +3167,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>>           kvm_irqchip_add_change_notifier(&vdev->irqchip_change_notifier);
>>           ret = vfio_intx_enable(vdev, errp);
>>           if (ret) {
>> -            goto out_deregister;
>> +            goto out_intx_disable;
>>           }
>>       }
>>   
>> @@ -3220,6 +3220,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>>       return;
>>   
>>   out_deregister:
>> +    vfio_disable_interrupts(vdev);
> 
> You are calling vfio_disable_interrupts() when what you want is
> vfio_intx_disable() ? But I guess your thinking was to call
> vfio_disable_interrupt() which eventually calls vfio_intx_disable() in case INTx
> was really setup, thus saving the duplicated check. The MSIx/MSI in realize() I
> don't think they will be enabled at this point. Let me know if I misunderstood.
> 
>> +out_intx_disable:
> 
> Maybe 'out_intx_teardown' or 'out_intx_deregister' because you are not really
> disabling INTx.

or simply extract from vfio_disable_interrupts() :
  
     if (vdev->interrupt == VFIO_INT_INTx) {
         vfio_intx_disable(vdev);
     }

and add the above code before cleaning up the intx routing
notifier without any new goto labels.

Thanks,

C.


> 
>>       pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
>>       if (vdev->irqchip_change_notifier.notify) {
>>           kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
> 



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 5/5] vfio/migration: Refactor and fix print of "Migration disabled"
  2023-06-29 12:44   ` Joao Martins
@ 2023-06-29 15:20     ` Avihai Horon
  2023-06-29 15:42       ` Joao Martins
  0 siblings, 1 reply; 25+ messages in thread
From: Avihai Horon @ 2023-06-29 15:20 UTC (permalink / raw)
  To: Joao Martins, Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, clg, chao.p.peng


On 29/06/2023 15:44, Joao Martins wrote:
> External email: Use caution opening links or attachments
>
>
> On 29/06/2023 09:40, Zhenzhong Duan wrote:
>> This patch refactors vfio_migration_realize() and its dependend code
>> as follows:
>>
>> 1. It's redundant in vfio_migration_realize() to registers multiple blockers,
>>     e.g: vIOMMU blocker can be refactored as per device blocker.
>> 2. Change vfio_viommu_preset() to be only a per device checker.
>> 3. Remove global vIOMMU blocker related stuff, e.g:
>>     giommu_migration_blocker, vfio_[block|unblock]_giommu_migration()
>>     and vfio_migration_finalize()
>> 4. Change vfio_migration_realize(), vfio_block_multiple_devices_migration()
>>     vfio_block_migration() and vfio_viommu_preset() to return bool type.
>> 5. Print "Migration disabled" depending on enable_migration property
>>     and print it as warning instead of error which is overkill.
>>
> I am not enterily sure we need to keep "Migration disabled". Perhaps we should
> just derisk from error to warning and use always the same error messages.
>
>> migrate_add_blocker() returns 0 when successfully adding the migration blocker.
>> However, the caller of vfio_migration_realize() considers that migration was
>> blocked when the latter returned an error. What matters for migration is that
>> the blocker is added in core migration, so this cleans up usability such that
>> user sees "Migrate disabled" when any of the vfio migration blockers are active
>> and it's not intentionally forced by user with enable-migration=off.
>>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>>   hw/vfio/common.c              | 66 +++++++----------------------------
>>   hw/vfio/migration.c           | 30 +++++++++-------
>>   hw/vfio/pci.c                 |  4 +--
>>   include/hw/vfio/vfio-common.h |  7 ++--
>>   4 files changed, 36 insertions(+), 71 deletions(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 77e2ee0e5c6e..c80ecb1da53f 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -362,7 +362,6 @@ bool vfio_mig_active(void)
>>   }
>>
>>   static Error *multiple_devices_migration_blocker;
>> -static Error *giommu_migration_blocker;
>>
>>   static unsigned int vfio_migratable_device_num(void)
>>   {
>> @@ -381,19 +380,19 @@ static unsigned int vfio_migratable_device_num(void)
>>       return device_num;
>>   }
>>
>> -int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp)
>> +bool vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp)
>>   {
>>       int ret;
>>
>>       if (multiple_devices_migration_blocker ||
>>           vfio_migratable_device_num() <= 1) {
>> -        return 0;
>> +        return true;
>>       }
>>
>>       if (vbasedev->enable_migration == ON_OFF_AUTO_ON) {
>>           error_setg(errp, "Migration is currently not supported with multiple "
>>                            "VFIO devices");
>> -        return -EINVAL;
>> +        return false;
>>       }
>>
>>       error_setg(&multiple_devices_migration_blocker,
>> @@ -403,9 +402,15 @@ int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp)
>>       if (ret < 0) {
>>           error_free(multiple_devices_migration_blocker);
>>           multiple_devices_migration_blocker = NULL;
>> +    } else {
>> +        /*
>> +         * Only ON_OFF_AUTO_AUTO case, ON_OFF_AUTO_OFF is checked
>> +         * in vfio_migration_realize().
>> +         */
>> +        warn_report("Migration disabled, not support multiple VFIO devices");
>>       }
>>
> Perhaps you could stash the previous error message and use it in the
> warn_report_error to consolidate the error messages e.g.
>
> bool vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp)
> {
>      Error *err = NULL;
>
>      if (multiple_devices_migration_blocker ||
>          vfio_migratable_device_num() <= 1) {
>          return true;
>      }
>
>      error_setg(&err, "%s: Migration is currently not supported with multiple "
>                       "VFIO devices", vbasedev->name);
>
>      if (vbasedev->enable_migration == ON_OFF_AUTO_ON) {
>          error_propagate(errp, err);
>          return -EINVAL;
>      }
>
>      ...
>      if (ret < 0) {
>      } else {
>          /* Warns only on ON_OFF_AUTO_AUTO case */
>          warn_report_err(err);

I'm not sure this warning is needed.
If I remember correctly, I think Alex didn't want migration 
error/warning messages to be logged in the AUTO case.

>      }
> }
>
>> -    return ret;
>> +    return !ret;
>>   }
>>
>>   void vfio_unblock_multiple_devices_migration(void)
>> @@ -420,55 +425,10 @@ void vfio_unblock_multiple_devices_migration(void)
>>       multiple_devices_migration_blocker = NULL;
>>   }
>>
>> -static bool vfio_viommu_preset(void)
>> +/* Block migration with a vIOMMU */
> I meant in the previous version to put the comment on top of the caller, not on
> the definition. But with the new code structure from Avihai the error message
> further below... it will look a bit redundant.
>
>> +bool vfio_viommu_preset(VFIODevice *vbasedev)
>>   {
>> -    VFIOAddressSpace *space;
>> -
>> -    QLIST_FOREACH(space, &vfio_address_spaces, list) {
>> -        if (space->as != &address_space_memory) {
>> -            return true;
>> -        }
>> -    }
>> -
>> -    return false;
>> -}
>> -
>> -int vfio_block_giommu_migration(VFIODevice *vbasedev, Error **errp)
>> -{
>> -    int ret;
>> -
>> -    if (giommu_migration_blocker ||
>> -        !vfio_viommu_preset()) {
>> -        return 0;
>> -    }
>> -
>> -    if (vbasedev->enable_migration == ON_OFF_AUTO_ON) {
>> -        error_setg(errp,
>> -                   "Migration is currently not supported with vIOMMU enabled");
>> -        return -EINVAL;
>> -    }
>> -
>> -    error_setg(&giommu_migration_blocker,
>> -               "Migration is currently not supported with vIOMMU enabled");
>> -    ret = migrate_add_blocker(giommu_migration_blocker, errp);
>> -    if (ret < 0) {
>> -        error_free(giommu_migration_blocker);
>> -        giommu_migration_blocker = NULL;
>> -    }
>> -
>> -    return ret;
>> -}
>> -
>> -void vfio_migration_finalize(void)
>> -{
>> -    if (!giommu_migration_blocker ||
>> -        vfio_viommu_preset()) {
>> -        return;
>> -    }
>> -
>> -    migrate_del_blocker(giommu_migration_blocker);
>> -    error_free(giommu_migration_blocker);
>> -    giommu_migration_blocker = NULL;
>> +    return vbasedev->group->container->space->as != &address_space_memory;
>>   }
>>
> nice consolidation
>
>>   static void vfio_set_migration_error(int err)
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index 1db7d52ab2c1..84036e5cfc01 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -802,13 +802,13 @@ static int vfio_migration_init(VFIODevice *vbasedev)
>>       return 0;
>>   }
>>
>> -static int vfio_block_migration(VFIODevice *vbasedev, Error *err, Error **errp)
>> +static bool vfio_block_migration(VFIODevice *vbasedev, Error *err, Error **errp)
>>   {
>>       int ret;
>>
>>       if (vbasedev->enable_migration == ON_OFF_AUTO_ON) {
>>           error_propagate(errp, err);
>> -        return -EINVAL;
>> +        return false;
>>       }
>>
>>       vbasedev->migration_blocker = error_copy(err);
>> @@ -818,9 +818,11 @@ static int vfio_block_migration(VFIODevice *vbasedev, Error *err, Error **errp)
>>       if (ret < 0) {
>>           error_free(vbasedev->migration_blocker);
>>           vbasedev->migration_blocker = NULL;
>> +    } else if (vbasedev->enable_migration != ON_OFF_AUTO_OFF) {
>> +        warn_report("%s: Migration disabled", vbasedev->name);
>>       }
>>
> Perhaps you can use the the local error to expand on why migration was disabled e.g.
>
>          warn_report_err(err);

Same here.

Thanks.

>
>> -    return ret;
>> +    return !ret;
>>   }
>>
>>   /* ---------------------------------------------------------------------- */
>> @@ -835,7 +837,12 @@ void vfio_reset_bytes_transferred(void)
>>       bytes_transferred = 0;
>>   }
>>
>> -int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
>> +/*
>> + * Return true when either migration initialized or blocker registered.
>> + * Currently only return false when adding blocker fails which will
>> + * de-register vfio device.
>> + */
>> +bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
>>   {
>>       Error *err = NULL;
>>       int ret;
>> @@ -873,18 +880,17 @@ int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
>>                       vbasedev->name);
>>       }
>>
>> -    ret = vfio_block_multiple_devices_migration(vbasedev, errp);
>> -    if (ret) {
>> -        return ret;
>> +    if (!vfio_block_multiple_devices_migration(vbasedev, errp)) {
>> +        return false;
>>       }
>>
>> -    ret = vfio_block_giommu_migration(vbasedev, errp);
>> -    if (ret) {
>> -        return ret;
>> +    if (vfio_viommu_preset(vbasedev)) {
> The /* Block migration with a vIOMMU */
>
> Would go above, but I don't think we need it anymore ...
>
>> +        error_setg(&err, "%s: Migration is currently not supported "
>> +                   "with vIOMMU enabled", vbasedev->name);
>> +        return vfio_block_migration(vbasedev, err, errp);
> ... as the error message when placed here makes it obvious. So the comment I
> suggested won't add much. Unless others disagree.
>
>>       }
>>
>> -    trace_vfio_migration_realize(vbasedev->name);
>> -    return 0;
>> +    return true;
>>   }
>>
> I think somewhere in function we should have vfio_migration_exit() being called
> behind a label or elsewhere from vfio_migration_realize (...)
>
>>   void vfio_migration_exit(VFIODevice *vbasedev)
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index dc69d3031b24..184d08568154 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -3209,7 +3209,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>>       if (!pdev->failover_pair_id) {
>>           ret = vfio_migration_realize(vbasedev, errp);
>>           if (ret) {
>> -            error_report("%s: Migration disabled", vbasedev->name);
>> +            trace_vfio_migration_realize(vbasedev->name);
>> +        } else {
>>               goto out_vfio_migration;
>>           }
>>       }
> (...) Which then void the need for this change. Perhaps your previous patch
> (4/5) could come after this refactor patch instead ... where you would fix the
> unwinding error path inside the vfio_migration_realize() as opposed to
> vfio_realize().
>
>> @@ -3257,7 +3258,6 @@ static void vfio_instance_finalize(Object *obj)
>>        */
>>       vfio_put_device(vdev);
>>       vfio_put_group(group);
>> -    vfio_migration_finalize();
>>   }
>>
>>   static void vfio_exitfn(PCIDevice *pdev)
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index 93429b9abba0..3c18572322fc 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -225,9 +225,9 @@ typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
>>   extern VFIOGroupList vfio_group_list;
>>
>>   bool vfio_mig_active(void);
>> -int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp);
>> +bool vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp);
>>   void vfio_unblock_multiple_devices_migration(void);
>> -int vfio_block_giommu_migration(VFIODevice *vbasedev, Error **errp);
>> +bool vfio_viommu_preset(VFIODevice *vbasedev);
>>   int64_t vfio_mig_bytes_transferred(void);
>>   void vfio_reset_bytes_transferred(void);
>>
>> @@ -252,8 +252,7 @@ int vfio_spapr_create_window(VFIOContainer *container,
>>   int vfio_spapr_remove_window(VFIOContainer *container,
>>                                hwaddr offset_within_address_space);
>>
>> -int vfio_migration_realize(VFIODevice *vbasedev, Error **errp);
>> +bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp);
>>   void vfio_migration_exit(VFIODevice *vbasedev);
>> -void vfio_migration_finalize(void);
>>
>>   #endif /* HW_VFIO_VFIO_COMMON_H */


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 4/5] vfio/pci: Free resources when vfio_migration_realize fails
  2023-06-29 11:45   ` Joao Martins
@ 2023-06-29 15:23     ` Cédric Le Goater
  2023-06-30  1:23     ` Duan, Zhenzhong
  1 sibling, 0 replies; 25+ messages in thread
From: Cédric Le Goater @ 2023-06-29 15:23 UTC (permalink / raw)
  To: Joao Martins, Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, avihaih, chao.p.peng

On 6/29/23 13:45, Joao Martins wrote:
> On 29/06/2023 09:40, Zhenzhong Duan wrote:
>> When vfio_realize() succeeds, hot unplug will call vfio_exitfn()
>> to free resources allocated in vfio_realize(); when vfio_realize()
>> fails, vfio_exitfn() is never called and we need to free resources
>> in vfio_realize().
>>
>> In the case that vfio_migration_realize() fails,
>> e.g: with -only-migratable & enable-migration=off, we see below:
>>
>> (qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,enable-migration=off
>> 0000:81:11.1: Migration disabled
>> Error: disallowing migration blocker (--only-migratable) for: 0000:81:11.1: Migration is disabled for VFIO device
>>
>> If we hotplug again we should see same log as above, but we see:
>> (qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,enable-migration=off
>> Error: vfio 0000:81:11.1: device is already attached
>>
>> That's because some references to VFIO device isn't released,
>> we should check return value of vfio_migration_realize() and
>> release the references, then VFIO device will be truely
>> released when hotplug fails.
>>
>> Fixes: a22651053b59 ("vfio: Make vfio-pci device migration capable")
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>>   hw/vfio/pci.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 54a8179d1c64..dc69d3031b24 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -3210,6 +3210,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>>           ret = vfio_migration_realize(vbasedev, errp);
>>           if (ret) {
>>               error_report("%s: Migration disabled", vbasedev->name);
>> +            goto out_vfio_migration;
>>           }
>>       }
>>   
>> @@ -3219,6 +3220,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>>   
>>       return;
>>   
>> +out_vfio_migration:
>> +    vfio_migration_exit(vbasedev);
>>   out_deregister:
>>       vfio_disable_interrupts(vdev);
>>   out_intx_disable:
> 
> I agree with the general sentiment behind the change.
> Clearly vfio::migration and vfio::migration_blocker are leaking from inside the
> migration_realize() function.
> 
> But it is kinda awkward semantic that vfio_migration_realize() (or any realize)
> failures need to be accompanied with a vfio_migration_exit() that tears down
> state *leaked* by its realize() failure.
> 
> It sounds to me that this should be inside the vfio_migration_realize() not on
> the caller? Unless QEMU ::realize() is expected to do this.
> 

I agree. vfio_migration_realize() should handle the cleanup of the resources
it allocated if there is a failure.

Thanks

C.



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 3/5] vfio/pci: Disable INTx in vfio_realize error path
  2023-06-29 15:13     ` Cédric Le Goater
@ 2023-06-29 15:33       ` Joao Martins
  2023-06-30  1:19         ` Duan, Zhenzhong
  0 siblings, 1 reply; 25+ messages in thread
From: Joao Martins @ 2023-06-29 15:33 UTC (permalink / raw)
  To: Cédric Le Goater, Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, avihaih, chao.p.peng



On 29/06/2023 16:13, Cédric Le Goater wrote:
> On 6/29/23 13:24, Joao Martins wrote:
>> On 29/06/2023 09:40, Zhenzhong Duan wrote:
>>> When vfio realize fails, INTx isn't disabled if it has been enabled.
>>> This may confuse host side with unhandled interrupt report.
>>>
>>> Add a new label to be used for vfio_intx_enable() failed case.
>>>
>>> Fixes: a9994687cb9b ("vfio/display: core & wireup")
>>> Fixes: b290659fc3dd ("hw/vfio/display: add ramfb support")
>>> Fixes: c62a0c7ce34e ("vfio/display: add xres + yres properties")
>>
>> Sounds to me the correct Fixes tag is the same as first patch i.e.:
>>
>> Fixes: c5478fea27ac ("vfio/pci: Respond to KVM irqchip change notifier")
>>
>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>
>> Looks good, but see some clarifications below.
>>
>>> ---
>>>   hw/vfio/pci.c | 4 +++-
>>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>> index ab6645ba60af..54a8179d1c64 100644
>>> --- a/hw/vfio/pci.c
>>> +++ b/hw/vfio/pci.c
>>> @@ -3167,7 +3167,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>>>           kvm_irqchip_add_change_notifier(&vdev->irqchip_change_notifier);
>>>           ret = vfio_intx_enable(vdev, errp);
>>>           if (ret) {
>>> -            goto out_deregister;
>>> +            goto out_intx_disable;
>>>           }
>>>       }
>>>   @@ -3220,6 +3220,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>>>       return;
>>>     out_deregister:
>>> +    vfio_disable_interrupts(vdev);
>>
>> You are calling vfio_disable_interrupts() when what you want is
>> vfio_intx_disable() ? But I guess your thinking was to call
>> vfio_disable_interrupt() which eventually calls vfio_intx_disable() in case INTx
>> was really setup, thus saving the duplicated check. The MSIx/MSI in realize() I
>> don't think they will be enabled at this point. Let me know if I misunderstood.
>>
>>> +out_intx_disable:
>>
>> Maybe 'out_intx_teardown' or 'out_intx_deregister' because you are not really
>> disabling INTx.
> 
> or simply extract from vfio_disable_interrupts() :
>  
>     if (vdev->interrupt == VFIO_INT_INTx) {
>         vfio_intx_disable(vdev);
>     }
> 
> and add the above code before cleaning up the intx routing
> notifier without any new goto labels.
> 
An even better option indeed.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 5/5] vfio/migration: Refactor and fix print of "Migration disabled"
  2023-06-29 15:20     ` Avihai Horon
@ 2023-06-29 15:42       ` Joao Martins
  2023-06-29 22:12         ` Alex Williamson
  0 siblings, 1 reply; 25+ messages in thread
From: Joao Martins @ 2023-06-29 15:42 UTC (permalink / raw)
  To: Avihai Horon, Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, clg, chao.p.peng

On 29/06/2023 16:20, Avihai Horon wrote:
> On 29/06/2023 15:44, Joao Martins wrote:
>> On 29/06/2023 09:40, Zhenzhong Duan wrote:
>>> This patch refactors vfio_migration_realize() and its dependend code
>>> as follows:
>>>
>>> 1. It's redundant in vfio_migration_realize() to registers multiple blockers,
>>>     e.g: vIOMMU blocker can be refactored as per device blocker.
>>> 2. Change vfio_viommu_preset() to be only a per device checker.
>>> 3. Remove global vIOMMU blocker related stuff, e.g:
>>>     giommu_migration_blocker, vfio_[block|unblock]_giommu_migration()
>>>     and vfio_migration_finalize()
>>> 4. Change vfio_migration_realize(), vfio_block_multiple_devices_migration()
>>>     vfio_block_migration() and vfio_viommu_preset() to return bool type.
>>> 5. Print "Migration disabled" depending on enable_migration property
>>>     and print it as warning instead of error which is overkill.
>>>
>> I am not enterily sure we need to keep "Migration disabled". Perhaps we should
>> just derisk from error to warning and use always the same error messages.
>>
>>> migrate_add_blocker() returns 0 when successfully adding the migration blocker.
>>> However, the caller of vfio_migration_realize() considers that migration was
>>> blocked when the latter returned an error. What matters for migration is that
>>> the blocker is added in core migration, so this cleans up usability such that
>>> user sees "Migrate disabled" when any of the vfio migration blockers are active
>>> and it's not intentionally forced by user with enable-migration=off.
>>>
>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>> ---
>>>   hw/vfio/common.c              | 66 +++++++----------------------------
>>>   hw/vfio/migration.c           | 30 +++++++++-------
>>>   hw/vfio/pci.c                 |  4 +--
>>>   include/hw/vfio/vfio-common.h |  7 ++--
>>>   4 files changed, 36 insertions(+), 71 deletions(-)
>>>
>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>> index 77e2ee0e5c6e..c80ecb1da53f 100644
>>> --- a/hw/vfio/common.c
>>> +++ b/hw/vfio/common.c
>>> @@ -362,7 +362,6 @@ bool vfio_mig_active(void)
>>>   }
>>>
>>>   static Error *multiple_devices_migration_blocker;
>>> -static Error *giommu_migration_blocker;
>>>
>>>   static unsigned int vfio_migratable_device_num(void)
>>>   {
>>> @@ -381,19 +380,19 @@ static unsigned int vfio_migratable_device_num(void)
>>>       return device_num;
>>>   }
>>>
>>> -int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp)
>>> +bool vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp)
>>>   {
>>>       int ret;
>>>
>>>       if (multiple_devices_migration_blocker ||
>>>           vfio_migratable_device_num() <= 1) {
>>> -        return 0;
>>> +        return true;
>>>       }
>>>
>>>       if (vbasedev->enable_migration == ON_OFF_AUTO_ON) {
>>>           error_setg(errp, "Migration is currently not supported with multiple "
>>>                            "VFIO devices");
>>> -        return -EINVAL;
>>> +        return false;
>>>       }
>>>
>>>       error_setg(&multiple_devices_migration_blocker,
>>> @@ -403,9 +402,15 @@ int vfio_block_multiple_devices_migration(VFIODevice
>>> *vbasedev, Error **errp)
>>>       if (ret < 0) {
>>>           error_free(multiple_devices_migration_blocker);
>>>           multiple_devices_migration_blocker = NULL;
>>> +    } else {
>>> +        /*
>>> +         * Only ON_OFF_AUTO_AUTO case, ON_OFF_AUTO_OFF is checked
>>> +         * in vfio_migration_realize().
>>> +         */
>>> +        warn_report("Migration disabled, not support multiple VFIO devices");
>>>       }
>>>
>> Perhaps you could stash the previous error message and use it in the
>> warn_report_error to consolidate the error messages e.g.
>>
>> bool vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp)
>> {
>>      Error *err = NULL;
>>
>>      if (multiple_devices_migration_blocker ||
>>          vfio_migratable_device_num() <= 1) {
>>          return true;
>>      }
>>
>>      error_setg(&err, "%s: Migration is currently not supported with multiple "
>>                       "VFIO devices", vbasedev->name);
>>
>>      if (vbasedev->enable_migration == ON_OFF_AUTO_ON) {
>>          error_propagate(errp, err);
>>          return -EINVAL;
>>      }
>>
>>      ...
>>      if (ret < 0) {
>>      } else {
>>          /* Warns only on ON_OFF_AUTO_AUTO case */
>>          warn_report_err(err);
> 
> I'm not sure this warning is needed.
> If I remember correctly, I think Alex didn't want migration error/warning
> messages to be logged in the AUTO case.
> 

Hmm, ok, I missed this from the previous discussions.

So today there are migration warnings in the current code. (even in the AUTO
case). So if we want them removed, then this patch would then just remove the
"Migration disabled" all together (in the two places we commented).

The rest of the cases already propagate the error I think. And the AUTO case
will always be blocked migration and see the same printed messages elsewhere.

>>      }
>> }
>>
>>> -    return ret;
>>> +    return !ret;
>>>   }
>>>
>>>   void vfio_unblock_multiple_devices_migration(void)
>>> @@ -420,55 +425,10 @@ void vfio_unblock_multiple_devices_migration(void)
>>>       multiple_devices_migration_blocker = NULL;
>>>   }
>>>
>>> -static bool vfio_viommu_preset(void)
>>> +/* Block migration with a vIOMMU */
>> I meant in the previous version to put the comment on top of the caller, not on
>> the definition. But with the new code structure from Avihai the error message
>> further below... it will look a bit redundant.
>>
>>> +bool vfio_viommu_preset(VFIODevice *vbasedev)
>>>   {
>>> -    VFIOAddressSpace *space;
>>> -
>>> -    QLIST_FOREACH(space, &vfio_address_spaces, list) {
>>> -        if (space->as != &address_space_memory) {
>>> -            return true;
>>> -        }
>>> -    }
>>> -
>>> -    return false;
>>> -}
>>> -
>>> -int vfio_block_giommu_migration(VFIODevice *vbasedev, Error **errp)
>>> -{
>>> -    int ret;
>>> -
>>> -    if (giommu_migration_blocker ||
>>> -        !vfio_viommu_preset()) {
>>> -        return 0;
>>> -    }
>>> -
>>> -    if (vbasedev->enable_migration == ON_OFF_AUTO_ON) {
>>> -        error_setg(errp,
>>> -                   "Migration is currently not supported with vIOMMU enabled");
>>> -        return -EINVAL;
>>> -    }
>>> -
>>> -    error_setg(&giommu_migration_blocker,
>>> -               "Migration is currently not supported with vIOMMU enabled");
>>> -    ret = migrate_add_blocker(giommu_migration_blocker, errp);
>>> -    if (ret < 0) {
>>> -        error_free(giommu_migration_blocker);
>>> -        giommu_migration_blocker = NULL;
>>> -    }
>>> -
>>> -    return ret;
>>> -}
>>> -
>>> -void vfio_migration_finalize(void)
>>> -{
>>> -    if (!giommu_migration_blocker ||
>>> -        vfio_viommu_preset()) {
>>> -        return;
>>> -    }
>>> -
>>> -    migrate_del_blocker(giommu_migration_blocker);
>>> -    error_free(giommu_migration_blocker);
>>> -    giommu_migration_blocker = NULL;
>>> +    return vbasedev->group->container->space->as != &address_space_memory;
>>>   }
>>>
>> nice consolidation
>>
>>>   static void vfio_set_migration_error(int err)
>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>> index 1db7d52ab2c1..84036e5cfc01 100644
>>> --- a/hw/vfio/migration.c
>>> +++ b/hw/vfio/migration.c
>>> @@ -802,13 +802,13 @@ static int vfio_migration_init(VFIODevice *vbasedev)
>>>       return 0;
>>>   }
>>>
>>> -static int vfio_block_migration(VFIODevice *vbasedev, Error *err, Error **errp)
>>> +static bool vfio_block_migration(VFIODevice *vbasedev, Error *err, Error
>>> **errp)
>>>   {
>>>       int ret;
>>>
>>>       if (vbasedev->enable_migration == ON_OFF_AUTO_ON) {
>>>           error_propagate(errp, err);
>>> -        return -EINVAL;
>>> +        return false;
>>>       }
>>>
>>>       vbasedev->migration_blocker = error_copy(err);
>>> @@ -818,9 +818,11 @@ static int vfio_block_migration(VFIODevice *vbasedev,
>>> Error *err, Error **errp)
>>>       if (ret < 0) {
>>>           error_free(vbasedev->migration_blocker);
>>>           vbasedev->migration_blocker = NULL;
>>> +    } else if (vbasedev->enable_migration != ON_OFF_AUTO_OFF) {
>>> +        warn_report("%s: Migration disabled", vbasedev->name);
>>>       }
>>>
>> Perhaps you can use the the local error to expand on why migration was
>> disabled e.g.
>>
>>          warn_report_err(err);
> 
> Same here.
> 
> Thanks.
> 
>>
>>> -    return ret;
>>> +    return !ret;
>>>   }
>>>
>>>   /* ---------------------------------------------------------------------- */
>>> @@ -835,7 +837,12 @@ void vfio_reset_bytes_transferred(void)
>>>       bytes_transferred = 0;
>>>   }
>>>
>>> -int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
>>> +/*
>>> + * Return true when either migration initialized or blocker registered.
>>> + * Currently only return false when adding blocker fails which will
>>> + * de-register vfio device.
>>> + */
>>> +bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
>>>   {
>>>       Error *err = NULL;
>>>       int ret;
>>> @@ -873,18 +880,17 @@ int vfio_migration_realize(VFIODevice *vbasedev, Error
>>> **errp)
>>>                       vbasedev->name);
>>>       }
>>>
>>> -    ret = vfio_block_multiple_devices_migration(vbasedev, errp);
>>> -    if (ret) {
>>> -        return ret;
>>> +    if (!vfio_block_multiple_devices_migration(vbasedev, errp)) {
>>> +        return false;
>>>       }
>>>
>>> -    ret = vfio_block_giommu_migration(vbasedev, errp);
>>> -    if (ret) {
>>> -        return ret;
>>> +    if (vfio_viommu_preset(vbasedev)) {
>> The /* Block migration with a vIOMMU */
>>
>> Would go above, but I don't think we need it anymore ...
>>
>>> +        error_setg(&err, "%s: Migration is currently not supported "
>>> +                   "with vIOMMU enabled", vbasedev->name);
>>> +        return vfio_block_migration(vbasedev, err, errp);
>> ... as the error message when placed here makes it obvious. So the comment I
>> suggested won't add much. Unless others disagree.
>>
>>>       }
>>>
>>> -    trace_vfio_migration_realize(vbasedev->name);
>>> -    return 0;
>>> +    return true;
>>>   }
>>>
>> I think somewhere in function we should have vfio_migration_exit() being called
>> behind a label or elsewhere from vfio_migration_realize (...)
>>
>>>   void vfio_migration_exit(VFIODevice *vbasedev)
>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>> index dc69d3031b24..184d08568154 100644
>>> --- a/hw/vfio/pci.c
>>> +++ b/hw/vfio/pci.c
>>> @@ -3209,7 +3209,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>>>       if (!pdev->failover_pair_id) {
>>>           ret = vfio_migration_realize(vbasedev, errp);
>>>           if (ret) {
>>> -            error_report("%s: Migration disabled", vbasedev->name);
>>> +            trace_vfio_migration_realize(vbasedev->name);
>>> +        } else {
>>>               goto out_vfio_migration;
>>>           }
>>>       }
>> (...) Which then void the need for this change. Perhaps your previous patch
>> (4/5) could come after this refactor patch instead ... where you would fix the
>> unwinding error path inside the vfio_migration_realize() as opposed to
>> vfio_realize().
>>
>>> @@ -3257,7 +3258,6 @@ static void vfio_instance_finalize(Object *obj)
>>>        */
>>>       vfio_put_device(vdev);
>>>       vfio_put_group(group);
>>> -    vfio_migration_finalize();
>>>   }
>>>
>>>   static void vfio_exitfn(PCIDevice *pdev)
>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>> index 93429b9abba0..3c18572322fc 100644
>>> --- a/include/hw/vfio/vfio-common.h
>>> +++ b/include/hw/vfio/vfio-common.h
>>> @@ -225,9 +225,9 @@ typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
>>>   extern VFIOGroupList vfio_group_list;
>>>
>>>   bool vfio_mig_active(void);
>>> -int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp);
>>> +bool vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp);
>>>   void vfio_unblock_multiple_devices_migration(void);
>>> -int vfio_block_giommu_migration(VFIODevice *vbasedev, Error **errp);
>>> +bool vfio_viommu_preset(VFIODevice *vbasedev);
>>>   int64_t vfio_mig_bytes_transferred(void);
>>>   void vfio_reset_bytes_transferred(void);
>>>
>>> @@ -252,8 +252,7 @@ int vfio_spapr_create_window(VFIOContainer *container,
>>>   int vfio_spapr_remove_window(VFIOContainer *container,
>>>                                hwaddr offset_within_address_space);
>>>
>>> -int vfio_migration_realize(VFIODevice *vbasedev, Error **errp);
>>> +bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp);
>>>   void vfio_migration_exit(VFIODevice *vbasedev);
>>> -void vfio_migration_finalize(void);
>>>
>>>   #endif /* HW_VFIO_VFIO_COMMON_H */


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 5/5] vfio/migration: Refactor and fix print of "Migration disabled"
  2023-06-29  8:40 ` [PATCH v4 5/5] vfio/migration: Refactor and fix print of "Migration disabled" Zhenzhong Duan
  2023-06-29 12:44   ` Joao Martins
@ 2023-06-29 16:40   ` Cédric Le Goater
  2023-06-30  1:40     ` Duan, Zhenzhong
  1 sibling, 1 reply; 25+ messages in thread
From: Cédric Le Goater @ 2023-06-29 16:40 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, joao.m.martins, avihaih, chao.p.peng

Hello Zhenzhong,

On 6/29/23 10:40, Zhenzhong Duan wrote:
> This patch refactors vfio_migration_realize() and its dependend code
> as follows:
> 
> 1. It's redundant in vfio_migration_realize() to registers multiple blockers,
>     e.g: vIOMMU blocker can be refactored as per device blocker.
> 2. Change vfio_viommu_preset() to be only a per device checker.
> 3. Remove global vIOMMU blocker related stuff, e.g:
>     giommu_migration_blocker, vfio_[block|unblock]_giommu_migration()
>     and vfio_migration_finalize()
> 4. Change vfio_migration_realize(), vfio_block_multiple_devices_migration()
>     vfio_block_migration() and vfio_viommu_preset() to return bool type.
> 5. Print "Migration disabled" depending on enable_migration property
>     and print it as warning instead of error which is overkill.


We are close to soft freeze and these combo patches adding various fixes
all at once are difficult to evaluate.

Please split this patch in multiple ones to ease the review.  May be
start with the  int -> bool conversion of the return values. It should
remove some noise.

Thanks,

C.

> migrate_add_blocker() returns 0 when successfully adding the migration blocker.
> However, the caller of vfio_migration_realize() considers that migration was
> blocked when the latter returned an error. What matters for migration is that
> the blocker is added in core migration, so this cleans up usability such that
> user sees "Migrate disabled" when any of the vfio migration blockers are active
> and it's not intentionally forced by user with enable-migration=off.
> 
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
>   hw/vfio/common.c              | 66 +++++++----------------------------
>   hw/vfio/migration.c           | 30 +++++++++-------
>   hw/vfio/pci.c                 |  4 +--
>   include/hw/vfio/vfio-common.h |  7 ++--
>   4 files changed, 36 insertions(+), 71 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 77e2ee0e5c6e..c80ecb1da53f 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -362,7 +362,6 @@ bool vfio_mig_active(void)
>   }
>   
>   static Error *multiple_devices_migration_blocker;
> -static Error *giommu_migration_blocker;
>   
>   static unsigned int vfio_migratable_device_num(void)
>   {
> @@ -381,19 +380,19 @@ static unsigned int vfio_migratable_device_num(void)
>       return device_num;
>   }
>   
> -int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp)
> +bool vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp)
>   {
>       int ret;
>   
>       if (multiple_devices_migration_blocker ||
>           vfio_migratable_device_num() <= 1) {
> -        return 0;
> +        return true;
>       }
>   
>       if (vbasedev->enable_migration == ON_OFF_AUTO_ON) {
>           error_setg(errp, "Migration is currently not supported with multiple "
>                            "VFIO devices");
> -        return -EINVAL;
> +        return false;
>       }
>   
>       error_setg(&multiple_devices_migration_blocker,
> @@ -403,9 +402,15 @@ int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp)
>       if (ret < 0) {
>           error_free(multiple_devices_migration_blocker);
>           multiple_devices_migration_blocker = NULL;
> +    } else {
> +        /*
> +         * Only ON_OFF_AUTO_AUTO case, ON_OFF_AUTO_OFF is checked
> +         * in vfio_migration_realize().
> +         */
> +        warn_report("Migration disabled, not support multiple VFIO devices");
>       }
>   
> -    return ret;
> +    return !ret;
>   }
>   
>   void vfio_unblock_multiple_devices_migration(void)
> @@ -420,55 +425,10 @@ void vfio_unblock_multiple_devices_migration(void)
>       multiple_devices_migration_blocker = NULL;
>   }
>   
> -static bool vfio_viommu_preset(void)
> +/* Block migration with a vIOMMU */
> +bool vfio_viommu_preset(VFIODevice *vbasedev)
>   {
> -    VFIOAddressSpace *space;
> -
> -    QLIST_FOREACH(space, &vfio_address_spaces, list) {
> -        if (space->as != &address_space_memory) {
> -            return true;
> -        }
> -    }
> -
> -    return false;
> -}
> -
> -int vfio_block_giommu_migration(VFIODevice *vbasedev, Error **errp)
> -{
> -    int ret;
> -
> -    if (giommu_migration_blocker ||
> -        !vfio_viommu_preset()) {
> -        return 0;
> -    }
> -
> -    if (vbasedev->enable_migration == ON_OFF_AUTO_ON) {
> -        error_setg(errp,
> -                   "Migration is currently not supported with vIOMMU enabled");
> -        return -EINVAL;
> -    }
> -
> -    error_setg(&giommu_migration_blocker,
> -               "Migration is currently not supported with vIOMMU enabled");
> -    ret = migrate_add_blocker(giommu_migration_blocker, errp);
> -    if (ret < 0) {
> -        error_free(giommu_migration_blocker);
> -        giommu_migration_blocker = NULL;
> -    }
> -
> -    return ret;
> -}
> -
> -void vfio_migration_finalize(void)
> -{
> -    if (!giommu_migration_blocker ||
> -        vfio_viommu_preset()) {
> -        return;
> -    }
> -
> -    migrate_del_blocker(giommu_migration_blocker);
> -    error_free(giommu_migration_blocker);
> -    giommu_migration_blocker = NULL;
> +    return vbasedev->group->container->space->as != &address_space_memory;
>   }
>   
>   static void vfio_set_migration_error(int err)
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 1db7d52ab2c1..84036e5cfc01 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -802,13 +802,13 @@ static int vfio_migration_init(VFIODevice *vbasedev)
>       return 0;
>   }
>   
> -static int vfio_block_migration(VFIODevice *vbasedev, Error *err, Error **errp)
> +static bool vfio_block_migration(VFIODevice *vbasedev, Error *err, Error **errp)
>   {
>       int ret;
>   
>       if (vbasedev->enable_migration == ON_OFF_AUTO_ON) {
>           error_propagate(errp, err);
> -        return -EINVAL;
> +        return false;
>       }
>   
>       vbasedev->migration_blocker = error_copy(err);
> @@ -818,9 +818,11 @@ static int vfio_block_migration(VFIODevice *vbasedev, Error *err, Error **errp)
>       if (ret < 0) {
>           error_free(vbasedev->migration_blocker);
>           vbasedev->migration_blocker = NULL;
> +    } else if (vbasedev->enable_migration != ON_OFF_AUTO_OFF) {
> +        warn_report("%s: Migration disabled", vbasedev->name);
>       }
>   
> -    return ret;
> +    return !ret;
>   }
>   
>   /* ---------------------------------------------------------------------- */
> @@ -835,7 +837,12 @@ void vfio_reset_bytes_transferred(void)
>       bytes_transferred = 0;
>   }
>   
> -int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
> +/*
> + * Return true when either migration initialized or blocker registered.
> + * Currently only return false when adding blocker fails which will
> + * de-register vfio device.
> + */
> +bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
>   {
>       Error *err = NULL;
>       int ret;
> @@ -873,18 +880,17 @@ int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
>                       vbasedev->name);
>       }
>   
> -    ret = vfio_block_multiple_devices_migration(vbasedev, errp);
> -    if (ret) {
> -        return ret;
> +    if (!vfio_block_multiple_devices_migration(vbasedev, errp)) {
> +        return false;
>       }
>   
> -    ret = vfio_block_giommu_migration(vbasedev, errp);
> -    if (ret) {
> -        return ret;
> +    if (vfio_viommu_preset(vbasedev)) {
> +        error_setg(&err, "%s: Migration is currently not supported "
> +                   "with vIOMMU enabled", vbasedev->name);
> +        return vfio_block_migration(vbasedev, err, errp);
>       }
>   
> -    trace_vfio_migration_realize(vbasedev->name);
> -    return 0;
> +    return true;
>   }
>   
>   void vfio_migration_exit(VFIODevice *vbasedev)
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index dc69d3031b24..184d08568154 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3209,7 +3209,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>       if (!pdev->failover_pair_id) {
>           ret = vfio_migration_realize(vbasedev, errp);
>           if (ret) {
> -            error_report("%s: Migration disabled", vbasedev->name);
> +            trace_vfio_migration_realize(vbasedev->name);
> +        } else {
>               goto out_vfio_migration;
>           }
>       }
> @@ -3257,7 +3258,6 @@ static void vfio_instance_finalize(Object *obj)
>        */
>       vfio_put_device(vdev);
>       vfio_put_group(group);
> -    vfio_migration_finalize();
>   }
>   
>   static void vfio_exitfn(PCIDevice *pdev)
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 93429b9abba0..3c18572322fc 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -225,9 +225,9 @@ typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
>   extern VFIOGroupList vfio_group_list;
>   
>   bool vfio_mig_active(void);
> -int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp);
> +bool vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp);
>   void vfio_unblock_multiple_devices_migration(void);
> -int vfio_block_giommu_migration(VFIODevice *vbasedev, Error **errp);
> +bool vfio_viommu_preset(VFIODevice *vbasedev);
>   int64_t vfio_mig_bytes_transferred(void);
>   void vfio_reset_bytes_transferred(void);
>   
> @@ -252,8 +252,7 @@ int vfio_spapr_create_window(VFIOContainer *container,
>   int vfio_spapr_remove_window(VFIOContainer *container,
>                                hwaddr offset_within_address_space);
>   
> -int vfio_migration_realize(VFIODevice *vbasedev, Error **errp);
> +bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp);
>   void vfio_migration_exit(VFIODevice *vbasedev);
> -void vfio_migration_finalize(void);
>   
>   #endif /* HW_VFIO_VFIO_COMMON_H */



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 5/5] vfio/migration: Refactor and fix print of "Migration disabled"
  2023-06-29 15:42       ` Joao Martins
@ 2023-06-29 22:12         ` Alex Williamson
  2023-06-30  1:38           ` Duan, Zhenzhong
  0 siblings, 1 reply; 25+ messages in thread
From: Alex Williamson @ 2023-06-29 22:12 UTC (permalink / raw)
  To: Joao Martins; +Cc: Avihai Horon, Zhenzhong Duan, qemu-devel, clg, chao.p.peng

On Thu, 29 Jun 2023 16:42:23 +0100
Joao Martins <joao.m.martins@oracle.com> wrote:

> On 29/06/2023 16:20, Avihai Horon wrote:
> > On 29/06/2023 15:44, Joao Martins wrote:  
> >> On 29/06/2023 09:40, Zhenzhong Duan wrote:  
> >>> This patch refactors vfio_migration_realize() and its dependend code
> >>> as follows:
> >>>
> >>> 1. It's redundant in vfio_migration_realize() to registers multiple blockers,
> >>>     e.g: vIOMMU blocker can be refactored as per device blocker.
> >>> 2. Change vfio_viommu_preset() to be only a per device checker.
> >>> 3. Remove global vIOMMU blocker related stuff, e.g:
> >>>     giommu_migration_blocker, vfio_[block|unblock]_giommu_migration()
> >>>     and vfio_migration_finalize()
> >>> 4. Change vfio_migration_realize(), vfio_block_multiple_devices_migration()
> >>>     vfio_block_migration() and vfio_viommu_preset() to return bool type.
> >>> 5. Print "Migration disabled" depending on enable_migration property
> >>>     and print it as warning instead of error which is overkill.
> >>>  
> >> I am not enterily sure we need to keep "Migration disabled". Perhaps we should
> >> just derisk from error to warning and use always the same error messages.
> >>  
> >>> migrate_add_blocker() returns 0 when successfully adding the migration blocker.
> >>> However, the caller of vfio_migration_realize() considers that migration was
> >>> blocked when the latter returned an error. What matters for migration is that
> >>> the blocker is added in core migration, so this cleans up usability such that
> >>> user sees "Migrate disabled" when any of the vfio migration blockers are active
> >>> and it's not intentionally forced by user with enable-migration=off.
> >>>
> >>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> >>> ---
> >>>   hw/vfio/common.c              | 66 +++++++----------------------------
> >>>   hw/vfio/migration.c           | 30 +++++++++-------
> >>>   hw/vfio/pci.c                 |  4 +--
> >>>   include/hw/vfio/vfio-common.h |  7 ++--
> >>>   4 files changed, 36 insertions(+), 71 deletions(-)
> >>>
> >>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> >>> index 77e2ee0e5c6e..c80ecb1da53f 100644
> >>> --- a/hw/vfio/common.c
> >>> +++ b/hw/vfio/common.c
> >>> @@ -362,7 +362,6 @@ bool vfio_mig_active(void)
> >>>   }
> >>>
> >>>   static Error *multiple_devices_migration_blocker;
> >>> -static Error *giommu_migration_blocker;
> >>>
> >>>   static unsigned int vfio_migratable_device_num(void)
> >>>   {
> >>> @@ -381,19 +380,19 @@ static unsigned int vfio_migratable_device_num(void)
> >>>       return device_num;
> >>>   }
> >>>
> >>> -int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp)
> >>> +bool vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp)
> >>>   {
> >>>       int ret;
> >>>
> >>>       if (multiple_devices_migration_blocker ||
> >>>           vfio_migratable_device_num() <= 1) {
> >>> -        return 0;
> >>> +        return true;
> >>>       }
> >>>
> >>>       if (vbasedev->enable_migration == ON_OFF_AUTO_ON) {
> >>>           error_setg(errp, "Migration is currently not supported with multiple "
> >>>                            "VFIO devices");
> >>> -        return -EINVAL;
> >>> +        return false;
> >>>       }
> >>>
> >>>       error_setg(&multiple_devices_migration_blocker,
> >>> @@ -403,9 +402,15 @@ int vfio_block_multiple_devices_migration(VFIODevice
> >>> *vbasedev, Error **errp)
> >>>       if (ret < 0) {
> >>>           error_free(multiple_devices_migration_blocker);
> >>>           multiple_devices_migration_blocker = NULL;
> >>> +    } else {
> >>> +        /*
> >>> +         * Only ON_OFF_AUTO_AUTO case, ON_OFF_AUTO_OFF is checked
> >>> +         * in vfio_migration_realize().
> >>> +         */
> >>> +        warn_report("Migration disabled, not support multiple VFIO devices");
> >>>       }
> >>>  
> >> Perhaps you could stash the previous error message and use it in the
> >> warn_report_error to consolidate the error messages e.g.
> >>
> >> bool vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp)
> >> {
> >>      Error *err = NULL;
> >>
> >>      if (multiple_devices_migration_blocker ||
> >>          vfio_migratable_device_num() <= 1) {
> >>          return true;
> >>      }
> >>
> >>      error_setg(&err, "%s: Migration is currently not supported with multiple "
> >>                       "VFIO devices", vbasedev->name);
> >>
> >>      if (vbasedev->enable_migration == ON_OFF_AUTO_ON) {
> >>          error_propagate(errp, err);
> >>          return -EINVAL;
> >>      }
> >>
> >>      ...
> >>      if (ret < 0) {
> >>      } else {
> >>          /* Warns only on ON_OFF_AUTO_AUTO case */
> >>          warn_report_err(err);  
> > 
> > I'm not sure this warning is needed.
> > If I remember correctly, I think Alex didn't want migration error/warning
> > messages to be logged in the AUTO case.

Correct.

> Hmm, ok, I missed this from the previous discussions.
> 
> So today there are migration warnings in the current code. (even in the AUTO
> case). So if we want them removed, then this patch would then just remove the
> "Migration disabled" all together (in the two places we commented).
> 
> The rest of the cases already propagate the error I think. And the AUTO case
> will always be blocked migration and see the same printed messages elsewhere.

I tested this with Avihai's series and saw the correct logging, at
least for a device that does not support migration.

In AUTO mode, we should only ever see errors or warnings if the device
supports migration and an error or incompatibility occurs while further
probing or configuring it.  Lack of support for migration should only
ever generate an error or warning when using enable_migration=on or the
global -only-migratable flag.

As I understood Avihai's patch, we're populating the Error pointer, but
we only ever propagate that error in the above cases.  Thanks,

Alex

> >>      }
> >> }
> >>  
> >>> -    return ret;
> >>> +    return !ret;
> >>>   }
> >>>
> >>>   void vfio_unblock_multiple_devices_migration(void)
> >>> @@ -420,55 +425,10 @@ void vfio_unblock_multiple_devices_migration(void)
> >>>       multiple_devices_migration_blocker = NULL;
> >>>   }
> >>>
> >>> -static bool vfio_viommu_preset(void)
> >>> +/* Block migration with a vIOMMU */  
> >> I meant in the previous version to put the comment on top of the caller, not on
> >> the definition. But with the new code structure from Avihai the error message
> >> further below... it will look a bit redundant.
> >>  
> >>> +bool vfio_viommu_preset(VFIODevice *vbasedev)
> >>>   {
> >>> -    VFIOAddressSpace *space;
> >>> -
> >>> -    QLIST_FOREACH(space, &vfio_address_spaces, list) {
> >>> -        if (space->as != &address_space_memory) {
> >>> -            return true;
> >>> -        }
> >>> -    }
> >>> -
> >>> -    return false;
> >>> -}
> >>> -
> >>> -int vfio_block_giommu_migration(VFIODevice *vbasedev, Error **errp)
> >>> -{
> >>> -    int ret;
> >>> -
> >>> -    if (giommu_migration_blocker ||
> >>> -        !vfio_viommu_preset()) {
> >>> -        return 0;
> >>> -    }
> >>> -
> >>> -    if (vbasedev->enable_migration == ON_OFF_AUTO_ON) {
> >>> -        error_setg(errp,
> >>> -                   "Migration is currently not supported with vIOMMU enabled");
> >>> -        return -EINVAL;
> >>> -    }
> >>> -
> >>> -    error_setg(&giommu_migration_blocker,
> >>> -               "Migration is currently not supported with vIOMMU enabled");
> >>> -    ret = migrate_add_blocker(giommu_migration_blocker, errp);
> >>> -    if (ret < 0) {
> >>> -        error_free(giommu_migration_blocker);
> >>> -        giommu_migration_blocker = NULL;
> >>> -    }
> >>> -
> >>> -    return ret;
> >>> -}
> >>> -
> >>> -void vfio_migration_finalize(void)
> >>> -{
> >>> -    if (!giommu_migration_blocker ||
> >>> -        vfio_viommu_preset()) {
> >>> -        return;
> >>> -    }
> >>> -
> >>> -    migrate_del_blocker(giommu_migration_blocker);
> >>> -    error_free(giommu_migration_blocker);
> >>> -    giommu_migration_blocker = NULL;
> >>> +    return vbasedev->group->container->space->as != &address_space_memory;
> >>>   }
> >>>  
> >> nice consolidation
> >>  
> >>>   static void vfio_set_migration_error(int err)
> >>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> >>> index 1db7d52ab2c1..84036e5cfc01 100644
> >>> --- a/hw/vfio/migration.c
> >>> +++ b/hw/vfio/migration.c
> >>> @@ -802,13 +802,13 @@ static int vfio_migration_init(VFIODevice *vbasedev)
> >>>       return 0;
> >>>   }
> >>>
> >>> -static int vfio_block_migration(VFIODevice *vbasedev, Error *err, Error **errp)
> >>> +static bool vfio_block_migration(VFIODevice *vbasedev, Error *err, Error
> >>> **errp)
> >>>   {
> >>>       int ret;
> >>>
> >>>       if (vbasedev->enable_migration == ON_OFF_AUTO_ON) {
> >>>           error_propagate(errp, err);
> >>> -        return -EINVAL;
> >>> +        return false;
> >>>       }
> >>>
> >>>       vbasedev->migration_blocker = error_copy(err);
> >>> @@ -818,9 +818,11 @@ static int vfio_block_migration(VFIODevice *vbasedev,
> >>> Error *err, Error **errp)
> >>>       if (ret < 0) {
> >>>           error_free(vbasedev->migration_blocker);
> >>>           vbasedev->migration_blocker = NULL;
> >>> +    } else if (vbasedev->enable_migration != ON_OFF_AUTO_OFF) {
> >>> +        warn_report("%s: Migration disabled", vbasedev->name);
> >>>       }
> >>>  
> >> Perhaps you can use the the local error to expand on why migration was
> >> disabled e.g.
> >>
> >>          warn_report_err(err);  
> > 
> > Same here.
> > 
> > Thanks.
> >   
> >>  
> >>> -    return ret;
> >>> +    return !ret;
> >>>   }
> >>>
> >>>   /* ---------------------------------------------------------------------- */
> >>> @@ -835,7 +837,12 @@ void vfio_reset_bytes_transferred(void)
> >>>       bytes_transferred = 0;
> >>>   }
> >>>
> >>> -int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
> >>> +/*
> >>> + * Return true when either migration initialized or blocker registered.
> >>> + * Currently only return false when adding blocker fails which will
> >>> + * de-register vfio device.
> >>> + */
> >>> +bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
> >>>   {
> >>>       Error *err = NULL;
> >>>       int ret;
> >>> @@ -873,18 +880,17 @@ int vfio_migration_realize(VFIODevice *vbasedev, Error
> >>> **errp)
> >>>                       vbasedev->name);
> >>>       }
> >>>
> >>> -    ret = vfio_block_multiple_devices_migration(vbasedev, errp);
> >>> -    if (ret) {
> >>> -        return ret;
> >>> +    if (!vfio_block_multiple_devices_migration(vbasedev, errp)) {
> >>> +        return false;
> >>>       }
> >>>
> >>> -    ret = vfio_block_giommu_migration(vbasedev, errp);
> >>> -    if (ret) {
> >>> -        return ret;
> >>> +    if (vfio_viommu_preset(vbasedev)) {  
> >> The /* Block migration with a vIOMMU */
> >>
> >> Would go above, but I don't think we need it anymore ...
> >>  
> >>> +        error_setg(&err, "%s: Migration is currently not supported "
> >>> +                   "with vIOMMU enabled", vbasedev->name);
> >>> +        return vfio_block_migration(vbasedev, err, errp);  
> >> ... as the error message when placed here makes it obvious. So the comment I
> >> suggested won't add much. Unless others disagree.
> >>  
> >>>       }
> >>>
> >>> -    trace_vfio_migration_realize(vbasedev->name);
> >>> -    return 0;
> >>> +    return true;
> >>>   }
> >>>  
> >> I think somewhere in function we should have vfio_migration_exit() being called
> >> behind a label or elsewhere from vfio_migration_realize (...)
> >>  
> >>>   void vfio_migration_exit(VFIODevice *vbasedev)
> >>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> >>> index dc69d3031b24..184d08568154 100644
> >>> --- a/hw/vfio/pci.c
> >>> +++ b/hw/vfio/pci.c
> >>> @@ -3209,7 +3209,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
> >>>       if (!pdev->failover_pair_id) {
> >>>           ret = vfio_migration_realize(vbasedev, errp);
> >>>           if (ret) {
> >>> -            error_report("%s: Migration disabled", vbasedev->name);
> >>> +            trace_vfio_migration_realize(vbasedev->name);
> >>> +        } else {
> >>>               goto out_vfio_migration;
> >>>           }
> >>>       }  
> >> (...) Which then void the need for this change. Perhaps your previous patch
> >> (4/5) could come after this refactor patch instead ... where you would fix the
> >> unwinding error path inside the vfio_migration_realize() as opposed to
> >> vfio_realize().
> >>  
> >>> @@ -3257,7 +3258,6 @@ static void vfio_instance_finalize(Object *obj)
> >>>        */
> >>>       vfio_put_device(vdev);
> >>>       vfio_put_group(group);
> >>> -    vfio_migration_finalize();
> >>>   }
> >>>
> >>>   static void vfio_exitfn(PCIDevice *pdev)
> >>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> >>> index 93429b9abba0..3c18572322fc 100644
> >>> --- a/include/hw/vfio/vfio-common.h
> >>> +++ b/include/hw/vfio/vfio-common.h
> >>> @@ -225,9 +225,9 @@ typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
> >>>   extern VFIOGroupList vfio_group_list;
> >>>
> >>>   bool vfio_mig_active(void);
> >>> -int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp);
> >>> +bool vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp);
> >>>   void vfio_unblock_multiple_devices_migration(void);
> >>> -int vfio_block_giommu_migration(VFIODevice *vbasedev, Error **errp);
> >>> +bool vfio_viommu_preset(VFIODevice *vbasedev);
> >>>   int64_t vfio_mig_bytes_transferred(void);
> >>>   void vfio_reset_bytes_transferred(void);
> >>>
> >>> @@ -252,8 +252,7 @@ int vfio_spapr_create_window(VFIOContainer *container,
> >>>   int vfio_spapr_remove_window(VFIOContainer *container,
> >>>                                hwaddr offset_within_address_space);
> >>>
> >>> -int vfio_migration_realize(VFIODevice *vbasedev, Error **errp);
> >>> +bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp);
> >>>   void vfio_migration_exit(VFIODevice *vbasedev);
> >>> -void vfio_migration_finalize(void);
> >>>
> >>>   #endif /* HW_VFIO_VFIO_COMMON_H */  



^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH v4 3/5] vfio/pci: Disable INTx in vfio_realize error path
  2023-06-29 15:33       ` Joao Martins
@ 2023-06-30  1:19         ` Duan, Zhenzhong
  0 siblings, 0 replies; 25+ messages in thread
From: Duan, Zhenzhong @ 2023-06-30  1:19 UTC (permalink / raw)
  To: Martins, Joao, Cédric Le Goater, qemu-devel
  Cc: alex.williamson, avihaih, Peng, Chao P

>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: Re: [PATCH v4 3/5] vfio/pci: Disable INTx in vfio_realize error path
>
>
>
>On 29/06/2023 16:13, Cédric Le Goater wrote:
>> On 6/29/23 13:24, Joao Martins wrote:
>>> On 29/06/2023 09:40, Zhenzhong Duan wrote:
>>>> When vfio realize fails, INTx isn't disabled if it has been enabled.
>>>> This may confuse host side with unhandled interrupt report.
>>>>
>>>> Add a new label to be used for vfio_intx_enable() failed case.
>>>>
>>>> Fixes: a9994687cb9b ("vfio/display: core & wireup")
>>>> Fixes: b290659fc3dd ("hw/vfio/display: add ramfb support")
>>>> Fixes: c62a0c7ce34e ("vfio/display: add xres + yres properties")
>>>
>>> Sounds to me the correct Fixes tag is the same as first patch i.e.:
>>>
>>> Fixes: c5478fea27ac ("vfio/pci: Respond to KVM irqchip change
>>> notifier")

OK, will use it.
Previously I thought I should pick commit a9994687cb9b which firstly
introduced the timer leak with a jump label out_teardown, then
b290659fc3dd and c62a0c7ce34e which used out_teardown.

>>>
>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>
>>> Looks good, but see some clarifications below.
>>>
>>>> ---
>>>>   hw/vfio/pci.c | 4 +++-
>>>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index
>>>> ab6645ba60af..54a8179d1c64 100644
>>>> --- a/hw/vfio/pci.c
>>>> +++ b/hw/vfio/pci.c
>>>> @@ -3167,7 +3167,7 @@ static void vfio_realize(PCIDevice *pdev,
>>>> Error **errp)
>>>>
>>>> kvm_irqchip_add_change_notifier(&vdev->irqchip_change_notifier);
>>>>           ret = vfio_intx_enable(vdev, errp);
>>>>           if (ret) {
>>>> -            goto out_deregister;
>>>> +            goto out_intx_disable;
>>>>           }
>>>>       }
>>>>   @@ -3220,6 +3220,8 @@ static void vfio_realize(PCIDevice *pdev,
>>>> Error **errp)
>>>>       return;
>>>>     out_deregister:
>>>> +    vfio_disable_interrupts(vdev);
>>>
>>> You are calling vfio_disable_interrupts() when what you want is
>>> vfio_intx_disable() ? But I guess your thinking was to call
>>> vfio_disable_interrupt() which eventually calls vfio_intx_disable()
>>> in case INTx was really setup, thus saving the duplicated check. The
>>> MSIx/MSI in realize() I don't think they will be enabled at this point.
Yes.

>>> Let me know if I misunderstood.
>>>
>>>> +out_intx_disable:
>>>
>>> Maybe 'out_intx_teardown' or 'out_intx_deregister' because you are
>>> not really disabling INTx.
>>
>> or simply extract from vfio_disable_interrupts() :
>>
>>     if (vdev->interrupt == VFIO_INT_INTx) {
>>         vfio_intx_disable(vdev);
>>     }
>>
>> and add the above code before cleaning up the intx routing notifier
>> without any new goto labels.
>>
>An even better option indeed.
Will do.

Thanks
Zhenzhong


^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH v4 4/5] vfio/pci: Free resources when vfio_migration_realize fails
  2023-06-29 11:45   ` Joao Martins
  2023-06-29 15:23     ` Cédric Le Goater
@ 2023-06-30  1:23     ` Duan, Zhenzhong
  1 sibling, 0 replies; 25+ messages in thread
From: Duan, Zhenzhong @ 2023-06-30  1:23 UTC (permalink / raw)
  To: Martins, Joao, qemu-devel; +Cc: alex.williamson, clg, avihaih, Peng, Chao P



>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: Re: [PATCH v4 4/5] vfio/pci: Free resources when
>vfio_migration_realize fails
>
>On 29/06/2023 09:40, Zhenzhong Duan wrote:
>> When vfio_realize() succeeds, hot unplug will call vfio_exitfn() to
>> free resources allocated in vfio_realize(); when vfio_realize() fails,
>> vfio_exitfn() is never called and we need to free resources in
>> vfio_realize().
>>
>> In the case that vfio_migration_realize() fails,
>> e.g: with -only-migratable & enable-migration=off, we see below:
>>
>> (qemu) device_add
>> vfio-pci,host=81:11.1,id=vfio1,bus=root1,enable-migration=off
>> 0000:81:11.1: Migration disabled
>> Error: disallowing migration blocker (--only-migratable) for:
>> 0000:81:11.1: Migration is disabled for VFIO device
>>
>> If we hotplug again we should see same log as above, but we see:
>> (qemu) device_add
>> vfio-pci,host=81:11.1,id=vfio1,bus=root1,enable-migration=off
>> Error: vfio 0000:81:11.1: device is already attached
>>
>> That's because some references to VFIO device isn't released, we
>> should check return value of vfio_migration_realize() and release the
>> references, then VFIO device will be truely released when hotplug
>> fails.
>>
>> Fixes: a22651053b59 ("vfio: Make vfio-pci device migration capable")
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>>  hw/vfio/pci.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index
>> 54a8179d1c64..dc69d3031b24 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -3210,6 +3210,7 @@ static void vfio_realize(PCIDevice *pdev, Error
>**errp)
>>          ret = vfio_migration_realize(vbasedev, errp);
>>          if (ret) {
>>              error_report("%s: Migration disabled", vbasedev->name);
>> +            goto out_vfio_migration;
>>          }
>>      }
>>
>> @@ -3219,6 +3220,8 @@ static void vfio_realize(PCIDevice *pdev, Error
>> **errp)
>>
>>      return;
>>
>> +out_vfio_migration:
>> +    vfio_migration_exit(vbasedev);
>>  out_deregister:
>>      vfio_disable_interrupts(vdev);
>>  out_intx_disable:
>
>I agree with the general sentiment behind the change.
>Clearly vfio::migration and vfio::migration_blocker are leaking from inside the
>migration_realize() function.
>
>But it is kinda awkward semantic that vfio_migration_realize() (or any realize)
>failures need to be accompanied with a vfio_migration_exit() that tears down
>state *leaked* by its realize() failure.
>
>It sounds to me that this should be inside the vfio_migration_realize() not on
>the caller? Unless QEMU ::realize() is expected to do this.
Good suggestion, will fix.

Thanks
Zhenzhong

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH v4 5/5] vfio/migration: Refactor and fix print of "Migration disabled"
  2023-06-29 22:12         ` Alex Williamson
@ 2023-06-30  1:38           ` Duan, Zhenzhong
  0 siblings, 0 replies; 25+ messages in thread
From: Duan, Zhenzhong @ 2023-06-30  1:38 UTC (permalink / raw)
  To: Alex Williamson, Martins, Joao
  Cc: Avihai Horon, qemu-devel, clg, Peng, Chao P

>-----Original Message-----
>From: Alex Williamson <alex.williamson@redhat.com>
>Subject: Re: [PATCH v4 5/5] vfio/migration: Refactor and fix print of "Migration
>disabled"
>
>On Thu, 29 Jun 2023 16:42:23 +0100
>Joao Martins <joao.m.martins@oracle.com> wrote:
>
>> On 29/06/2023 16:20, Avihai Horon wrote:
>> > On 29/06/2023 15:44, Joao Martins wrote:
>> >> On 29/06/2023 09:40, Zhenzhong Duan wrote:
...
>> >>> @@ -403,9 +402,15 @@ int
>> >>> vfio_block_multiple_devices_migration(VFIODevice
>> >>> *vbasedev, Error **errp)
>> >>>       if (ret < 0) {
>> >>>           error_free(multiple_devices_migration_blocker);
>> >>>           multiple_devices_migration_blocker = NULL;
>> >>> +    } else {
>> >>> +        /*
>> >>> +         * Only ON_OFF_AUTO_AUTO case, ON_OFF_AUTO_OFF is checked
>> >>> +         * in vfio_migration_realize().
>> >>> +         */
>> >>> +        warn_report("Migration disabled, not support multiple
>> >>> +VFIO devices");
>> >>>       }
>> >>>
>> >> Perhaps you could stash the previous error message and use it in
>> >> the warn_report_error to consolidate the error messages e.g.
>> >>
>> >> bool vfio_block_multiple_devices_migration(VFIODevice *vbasedev,
>> >> Error **errp) {
>> >>      Error *err = NULL;
>> >>
>> >>      if (multiple_devices_migration_blocker ||
>> >>          vfio_migratable_device_num() <= 1) {
>> >>          return true;
>> >>      }
>> >>
>> >>      error_setg(&err, "%s: Migration is currently not supported with
>multiple "
>> >>                       "VFIO devices", vbasedev->name);
>> >>
>> >>      if (vbasedev->enable_migration == ON_OFF_AUTO_ON) {
>> >>          error_propagate(errp, err);
>> >>          return -EINVAL;
>> >>      }
>> >>
>> >>      ...
>> >>      if (ret < 0) {
>> >>      } else {
>> >>          /* Warns only on ON_OFF_AUTO_AUTO case */
>> >>          warn_report_err(err);
>> >
>> > I'm not sure this warning is needed.
>> > If I remember correctly, I think Alex didn't want migration
>> > error/warning messages to be logged in the AUTO case.
>
>Correct.
>
>> Hmm, ok, I missed this from the previous discussions.
>>
>> So today there are migration warnings in the current code. (even in
>> the AUTO case). So if we want them removed, then this patch would then
>> just remove the "Migration disabled" all together (in the two places we
>commented).
>>
>> The rest of the cases already propagate the error I think. And the
>> AUTO case will always be blocked migration and see the same printed
>messages elsewhere.
>
>I tested this with Avihai's series and saw the correct logging, at least for a
>device that does not support migration.
>
>In AUTO mode, we should only ever see errors or warnings if the device
>supports migration and an error or incompatibility occurs while further
>probing or configuring it.  Lack of support for migration should only ever
>generate an error or warning when using enable_migration=on or the global -
>only-migratable flag.
Will remove the two places of "Migration disabled" print.

>
>As I understood Avihai's patch, we're populating the Error pointer, but we
>only ever propagate that error in the above cases.  Thanks,
>
>Alex
>
...
>> >>> +818,11 @@ static int vfio_block_migration(VFIODevice *vbasedev,
>> >>> Error *err, Error **errp)
>> >>>       if (ret < 0) {
>> >>>           error_free(vbasedev->migration_blocker);
>> >>>           vbasedev->migration_blocker = NULL;
>> >>> +    } else if (vbasedev->enable_migration != ON_OFF_AUTO_OFF) {
>> >>> +        warn_report("%s: Migration disabled", vbasedev->name);
>> >>>       }
>> >>>
>> >> Perhaps you can use the the local error to expand on why migration
>> >> was disabled e.g.
>> >>
>> >>          warn_report_err(err);
>> >
>> > Same here.
>> >
>> > Thanks.
>> >
>> >>
>> >>> -    return ret;
>> >>> +    return !ret;
>> >>>   }
>> >>>
>> >>>   /*
>> >>> ------------------------------------------------------------------
>> >>> ---- */ @@ -835,7 +837,12 @@ void
>> >>> vfio_reset_bytes_transferred(void)
>> >>>       bytes_transferred = 0;
>> >>>   }
>> >>>
>> >>> -int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
>> >>> +/*
>> >>> + * Return true when either migration initialized or blocker registered.
>> >>> + * Currently only return false when adding blocker fails which
>> >>> +will
>> >>> + * de-register vfio device.
>> >>> + */
>> >>> +bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
>> >>>   {
>> >>>       Error *err = NULL;
>> >>>       int ret;
>> >>> @@ -873,18 +880,17 @@ int vfio_migration_realize(VFIODevice
>> >>> *vbasedev, Error
>> >>> **errp)
>> >>>                       vbasedev->name);
>> >>>       }
>> >>>
>> >>> -    ret = vfio_block_multiple_devices_migration(vbasedev, errp);
>> >>> -    if (ret) {
>> >>> -        return ret;
>> >>> +    if (!vfio_block_multiple_devices_migration(vbasedev, errp)) {
>> >>> +        return false;
>> >>>       }
>> >>>
>> >>> -    ret = vfio_block_giommu_migration(vbasedev, errp);
>> >>> -    if (ret) {
>> >>> -        return ret;
>> >>> +    if (vfio_viommu_preset(vbasedev)) {
>> >> The /* Block migration with a vIOMMU */
>> >>
>> >> Would go above, but I don't think we need it anymore ...
Will remove it.

>> >>
>> >>> +        error_setg(&err, "%s: Migration is currently not supported "
>> >>> +                   "with vIOMMU enabled", vbasedev->name);
>> >>> +        return vfio_block_migration(vbasedev, err, errp);
>> >> ... as the error message when placed here makes it obvious. So the
>> >> comment I suggested won't add much. Unless others disagree.
>> >>
>> >>>       }
>> >>>
>> >>> -    trace_vfio_migration_realize(vbasedev->name);
>> >>> -    return 0;
>> >>> +    return true;
>> >>>   }
>> >>>
>> >> I think somewhere in function we should have vfio_migration_exit()
>> >> being called behind a label or elsewhere from
>> >> vfio_migration_realize (...)
>> >>
>> >>>   void vfio_migration_exit(VFIODevice *vbasedev) diff --git
>> >>> a/hw/vfio/pci.c b/hw/vfio/pci.c index dc69d3031b24..184d08568154
>> >>> 100644
>> >>> --- a/hw/vfio/pci.c
>> >>> +++ b/hw/vfio/pci.c
>> >>> @@ -3209,7 +3209,8 @@ static void vfio_realize(PCIDevice *pdev,
>> >>> Error **errp)
>> >>>       if (!pdev->failover_pair_id) {
>> >>>           ret = vfio_migration_realize(vbasedev, errp);
>> >>>           if (ret) {
>> >>> -            error_report("%s: Migration disabled",
>> >>> vbasedev->name);
>> >>> +            trace_vfio_migration_realize(vbasedev->name);
>> >>> +        } else {
>> >>>               goto out_vfio_migration;
>> >>>           }
>> >>>       }
>> >> (...) Which then void the need for this change. Perhaps your
>> >> previous patch
>> >> (4/5) could come after this refactor patch instead ... where you
>> >> would fix the unwinding error path inside the
>> >> vfio_migration_realize() as opposed to vfio_realize().
Sure, will fix.

Thanks
Zhenzhong

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH v4 5/5] vfio/migration: Refactor and fix print of "Migration disabled"
  2023-06-29 16:40   ` Cédric Le Goater
@ 2023-06-30  1:40     ` Duan, Zhenzhong
  0 siblings, 0 replies; 25+ messages in thread
From: Duan, Zhenzhong @ 2023-06-30  1:40 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: alex.williamson, Martins, Joao, avihaih, Peng, Chao P

>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Sent: Friday, June 30, 2023 12:40 AM
>Subject: Re: [PATCH v4 5/5] vfio/migration: Refactor and fix print of "Migration
>disabled"
>
>Hello Zhenzhong,
>
>On 6/29/23 10:40, Zhenzhong Duan wrote:
>> This patch refactors vfio_migration_realize() and its dependend code
>> as follows:
>>
>> 1. It's redundant in vfio_migration_realize() to registers multiple blockers,
>>     e.g: vIOMMU blocker can be refactored as per device blocker.
>> 2. Change vfio_viommu_preset() to be only a per device checker.
>> 3. Remove global vIOMMU blocker related stuff, e.g:
>>     giommu_migration_blocker, vfio_[block|unblock]_giommu_migration()
>>     and vfio_migration_finalize()
>> 4. Change vfio_migration_realize(), vfio_block_multiple_devices_migration()
>>     vfio_block_migration() and vfio_viommu_preset() to return bool type.
>> 5. Print "Migration disabled" depending on enable_migration property
>>     and print it as warning instead of error which is overkill.
>
>
>We are close to soft freeze and these combo patches adding various fixes all
>at once are difficult to evaluate.
>
>Please split this patch in multiple ones to ease the review.  May be start with
>the  int -> bool conversion of the return values. It should remove some noise.
Good suggestion! Will do.

Thanks
Zhenzhong

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 0/5] VFIO migration related refactor and bug fix
  2023-06-29  8:40 [PATCH v4 0/5] VFIO migration related refactor and bug fix Zhenzhong Duan
                   ` (4 preceding siblings ...)
  2023-06-29  8:40 ` [PATCH v4 5/5] vfio/migration: Refactor and fix print of "Migration disabled" Zhenzhong Duan
@ 2023-06-30  6:01 ` Cédric Le Goater
  5 siblings, 0 replies; 25+ messages in thread
From: Cédric Le Goater @ 2023-06-30  6:01 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, joao.m.martins, avihaih, chao.p.peng

On 6/29/23 10:40, Zhenzhong Duan wrote:
> Hello,
> 
> PATCH5 refactors the VFIO migration blocker related code based on
> suggestions from Joao and Cedric, so that code is simpler and
> "Migration disabled" printed in right case.
> 
> But before that works, also found some hotplug bugs when testing
> blocker adding failed case. PATCH1-4 fix them.
> 
> See patch description for details.
> 
> v4: Rebased on [1] which contains Avihai's patchset [2]
>      Add more patches to fix resource leak issue, split based on
>      different fix TAG per Joao
>      Change to not print "Migration disabled" with explicit
>      enable-migration=off per Avihai
>      Rename vfio_block_giommu_migration to vfio_viommu_preset per Joao
> 
> v3: Add PATCH1,2 to fix hotplug bug
>      Fix bugs in PATCH3 Avihai and Joao pointed out
> 
> Tested vfio hotplug/unplug with vfio migration supported and unsupported cases,
> and different param of enable-migration=[on/off/auto] and -only-migratable.
> 
> [1] https://github.com/legoater/qemu/tree/vfio-next
> [2] https://lists.nongnu.org/archive/html/qemu-devel/2023-06/msg06117.html
> 
> Thanks
> 
> Zhenzhong Duan (5):
>    vfio/pci: Fix a segfault in vfio_realize
>    vfio/pci: Free leaked timer in vfio_realize error path

Applied patch 1-2 to vfio-next. PR is sent.

Thanks,

C.


>    vfio/pci: Disable INTx in vfio_realize error path
>    vfio/pci: Free resources when vfio_migration_realize fails
>    vfio/migration: Refactor and fix print of "Migration disabled"
> 
>   hw/vfio/common.c              | 66 +++++++----------------------------
>   hw/vfio/migration.c           | 30 +++++++++-------
>   hw/vfio/pci.c                 | 18 +++++++---
>   include/hw/vfio/vfio-common.h |  7 ++--
>   4 files changed, 48 insertions(+), 73 deletions(-)
> 



^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2023-06-30  6:02 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-29  8:40 [PATCH v4 0/5] VFIO migration related refactor and bug fix Zhenzhong Duan
2023-06-29  8:40 ` [PATCH v4 1/5] vfio/pci: Fix a segfault in vfio_realize Zhenzhong Duan
2023-06-29 10:57   ` Joao Martins
2023-06-29 13:06   ` Cédric Le Goater
2023-06-29  8:40 ` [PATCH v4 2/5] vfio/pci: Free leaked timer in vfio_realize error path Zhenzhong Duan
2023-06-29 10:59   ` Joao Martins
2023-06-29 13:09   ` Cédric Le Goater
2023-06-29  8:40 ` [PATCH v4 3/5] vfio/pci: Disable INTx " Zhenzhong Duan
2023-06-29 11:24   ` Joao Martins
2023-06-29 15:13     ` Cédric Le Goater
2023-06-29 15:33       ` Joao Martins
2023-06-30  1:19         ` Duan, Zhenzhong
2023-06-29  8:40 ` [PATCH v4 4/5] vfio/pci: Free resources when vfio_migration_realize fails Zhenzhong Duan
2023-06-29 11:45   ` Joao Martins
2023-06-29 15:23     ` Cédric Le Goater
2023-06-30  1:23     ` Duan, Zhenzhong
2023-06-29  8:40 ` [PATCH v4 5/5] vfio/migration: Refactor and fix print of "Migration disabled" Zhenzhong Duan
2023-06-29 12:44   ` Joao Martins
2023-06-29 15:20     ` Avihai Horon
2023-06-29 15:42       ` Joao Martins
2023-06-29 22:12         ` Alex Williamson
2023-06-30  1:38           ` Duan, Zhenzhong
2023-06-29 16:40   ` Cédric Le Goater
2023-06-30  1:40     ` Duan, Zhenzhong
2023-06-30  6:01 ` [PATCH v4 0/5] VFIO migration related refactor and bug fix Cédric Le Goater

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.