* [PATCH v3 00/13] vfio/migration: Device dirty page tracking
@ 2023-03-04  1:43 Joao Martins
  2023-03-04  1:43 ` [PATCH v3 01/13] vfio/common: Fix error reporting in vfio_get_dirty_bitmap() Joao Martins
                   ` (14 more replies)
  0 siblings, 15 replies; 51+ messages in thread
From: Joao Martins @ 2023-03-04  1:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Joao Martins

Hey,

Presented herewith is a series based on the basic VFIO migration
protocol v2 implementation [1].

It is split from its parent series [5] to focus solely on device dirty
page tracking. Device dirty page tracking allows the VFIO device to
record its DMAs and report them back when needed. This is part of VFIO
migration and is used during the pre-copy phase of migration to track
the RAM pages that the device has written to and mark those pages dirty,
so they can later be re-sent to the target.

Device dirty page tracking uses the DMA logging uAPI to discover device
capabilities, to start and stop tracking, and to get a dirty page bitmap
report. Extra details and the uAPI definition can be found here [3].

Device dirty page tracking operates at VFIOContainer scope, i.e. when
dirty tracking is started or stopped, or a dirty page report is queried,
all devices within the VFIOContainer are iterated and the respective
operation is performed on each of them.
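For illustration only (not part of the series), the per-container iteration could be sketched as follows, with hypothetical simplified types standing in for the real VFIOContainer/VFIODevice structures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for VFIODevice/VFIOContainer. */
typedef struct Device {
    bool tracking;
    struct Device *next;
} Device;

typedef struct {
    Device *devices; /* singly-linked list of devices in the container */
} Container;

/* Start (or stop) dirty tracking on every device in the container.
 * The real code would issue the DMA logging uAPI per device. */
static int container_set_dirty_tracking(Container *c, bool start)
{
    for (Device *d = c->devices; d; d = d->next) {
        d->tracking = start;
    }
    return 0;
}
```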

Device dirty page tracking is used only if all devices within a
VFIOContainer support it. Otherwise, VFIO IOMMU dirty page tracking is
used, and if that is not supported either, memory is perpetually marked
dirty by QEMU. Note that since VFIO IOMMU dirty page tracking has no HW
support, the last two usually have the same effect of perpetually
marking all pages dirty.
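The fallback order described above can be condensed into a small (hypothetical, illustrative) selection function:

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    TRACK_DEVICE,    /* all devices in the container support DMA logging */
    TRACK_IOMMU,     /* VFIO IOMMU dirty tracking (no HW support today) */
    TRACK_ALL_DIRTY, /* QEMU perpetually marks all pages dirty */
} TrackMethod;

/* Pick the dirty tracking method per the fallback order above. */
static TrackMethod pick_tracking(bool all_devices_support,
                                 bool iommu_supports)
{
    if (all_devices_support) {
        return TRACK_DEVICE;
    }
    if (iommu_supports) {
        return TRACK_IOMMU;
    }
    return TRACK_ALL_DIRTY;
}
```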

Normally, when asked to start dirty tracking, all currently DMA-mapped
ranges are tracked by device dirty page tracking. When a vIOMMU is in
use, live migration is blocked; this is temporary, and a separate series
will add support for it. Thus this series focuses on getting the ground
work in first.

The series is organized as follows:

- Patches 1-7: Fix bugs and do some preparatory work required prior to
  adding device dirty page tracking.
- Patches 8-10: Implement device dirty page tracking.
- Patch 11: Block live migration with vIOMMU.
- Patches 12-13: Enable device dirty page tracking and document it.

Comments and improvements are, as usual, appreciated.

Thanks,
	Joao

Changes from v2 [5]:
- Split the initial dirty page tracking support from the parent series
  into smaller parts.
- Replace the IOVATree with a simple two-range setup: one range for the
  32-bit and another for the 64-bit address space. After discussion this
  was chosen to avoid the unnecessary complexity of the IOVATree, while
  also being more efficient and not stressing the uAPI limits as much.
  (patches 7 and 8)
- For now, exclude vIOMMU support and add a live migration blocker if a
  vIOMMU is passed in. This will be followed up with vIOMMU support in
  a separate series. (patch 10)
- Add new patches to share most helpers used across memory listeners.
  This is useful for reuse when recording DMA ranges. (patches 5 and 6)
- Adjust the documentation to avoid mentioning vIOMMU support and
  instead state that device dirty page tracking with a vIOMMU is
  blocked. Cédric gave an Rb, but I've dropped it considering the split
  and the lack of vIOMMU support. (patch 13)
- Improve VFIOBitmap by placing the 16-byte structure on the stack
  instead of allocating it. Remove the free helper function. (patch 4)
- Fix the compilation issues (patches 8 and 10). Possibly not 100%
  addressed, as I am still setting up an environment to reproduce them.
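As an aside, the two-range setup mentioned above could look roughly like this (a hypothetical sketch, not the series' actual code): each DMA-mapped IOVA range is folded into either a covering 32-bit range or a covering 64-bit range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t min, max; /* inclusive covering range */
    bool set;
} Range;

static void range_extend(Range *r, uint64_t lo, uint64_t hi)
{
    if (!r->set) {
        r->min = lo;
        r->max = hi;
        r->set = true;
        return;
    }
    if (lo < r->min) {
        r->min = lo;
    }
    if (hi > r->max) {
        r->max = hi;
    }
}

/* Fold a mapped IOVA range into the 32-bit or 64-bit covering range. */
static void record_mapping(Range *r32, Range *r64,
                           uint64_t iova, uint64_t size)
{
    uint64_t end = iova + size - 1;

    if (end <= UINT32_MAX) {
        range_extend(r32, iova, end);
    } else {
        range_extend(r64, iova, end);
    }
}
```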

Changes from v1 [4]:
- Rebased on latest master branch. As part of it, made some changes in
  pre-copy to adjust it to Juan's new patches:
  1. Added a new patch that passes threshold_size parameter to
     .state_pending_{estimate,exact}() handlers.
  2. Added a new patch that refactors vfio_save_block().
  3. Changed the pre-copy patch to cache and report pending pre-copy
     size in the .state_pending_estimate() handler.
- Removed unnecessary P2P code. This should be added later on when P2P
  support is added. (Alex)
- Moved the dirty sync to be after the DMA unmap in vfio_dma_unmap()
  (patch #11). (Alex)
- Stored vfio_devices_all_device_dirty_tracking()'s value in a local
  variable in vfio_get_dirty_bitmap() so it can be re-used (patch #11).
- Refactored the vIOMMU device dirty tracking ranges creation code to
  make it clearer (patch #15).
- Changed overflow check in vfio_iommu_range_is_device_tracked() to
  emphasize that we specifically check for 2^64 wrap around (patch #15).
- Added R-bs / Acks.

[1]
https://lore.kernel.org/qemu-devel/167658846945.932837.1420176491103357684.stgit@omen/

[2]
https://lore.kernel.org/kvm/20221206083438.37807-3-yishaih@nvidia.com/

[3]
https://lore.kernel.org/netdev/20220908183448.195262-4-yishaih@nvidia.com/

[4] https://lore.kernel.org/qemu-devel/20230126184948.10478-1-avihaih@nvidia.com/

[5] https://lore.kernel.org/qemu-devel/20230222174915.5647-1-avihaih@nvidia.com/


Avihai Horon (6):
  vfio/common: Fix error reporting in vfio_get_dirty_bitmap()
  vfio/common: Fix wrong %m usages
  vfio/common: Abort migration if dirty log start/stop/sync fails
  vfio/common: Add VFIOBitmap and alloc function
  vfio/common: Extract code from vfio_get_dirty_bitmap() to new function
  docs/devel: Document VFIO device dirty page tracking

Joao Martins (7):
  vfio/common: Add helper to validate iova/end against hostwin
  vfio/common: Consolidate skip/invalid section into helper
  vfio/common: Record DMA mapped IOVA ranges
  vfio/common: Add device dirty page tracking start/stop
  vfio/common: Add device dirty page bitmap sync
  vfio/migration: Block migration with vIOMMU
  vfio/migration: Query device dirty page tracking support

 docs/devel/vfio-migration.rst |  46 ++-
 hw/vfio/common.c              | 668 ++++++++++++++++++++++++++++------
 hw/vfio/migration.c           |  21 ++
 hw/vfio/trace-events          |   2 +
 include/hw/vfio/vfio-common.h |  15 +
 5 files changed, 617 insertions(+), 135 deletions(-)

-- 
2.17.2



^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH v3 01/13] vfio/common: Fix error reporting in vfio_get_dirty_bitmap()
  2023-03-04  1:43 [PATCH v3 00/13] vfio/migration: Device dirty page tracking Joao Martins
@ 2023-03-04  1:43 ` Joao Martins
  2023-03-04  1:43 ` [PATCH v3 02/13] vfio/common: Fix wrong %m usages Joao Martins
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 51+ messages in thread
From: Joao Martins @ 2023-03-04  1:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon

From: Avihai Horon <avihaih@nvidia.com>

Return -errno instead of -1 if VFIO_IOMMU_DIRTY_PAGES ioctl fails in
vfio_get_dirty_bitmap().
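The pattern is to capture -errno immediately after the failing call, before anything can clobber errno (illustrative sketch with a hypothetical failing wrapper, not the patch's code):

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for an ioctl that fails with -1 and sets errno. */
static int failing_ioctl(void)
{
    errno = EBUSY;
    return -1;
}

static int get_dirty_bitmap(void)
{
    int ret = failing_ioctl();

    if (ret) {
        ret = -errno; /* return the real error code, not -1 */
    }
    return ret;
}
```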

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
---
 hw/vfio/common.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index bab83c0e55cb..9fc305448fa2 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1335,6 +1335,7 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
 
     ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, dbitmap);
     if (ret) {
+        ret = -errno;
         error_report("Failed to get dirty bitmap for iova: 0x%"PRIx64
                 " size: 0x%"PRIx64" err: %d", (uint64_t)range->iova,
                 (uint64_t)range->size, errno);
-- 
2.17.2



^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v3 02/13] vfio/common: Fix wrong %m usages
  2023-03-04  1:43 [PATCH v3 00/13] vfio/migration: Device dirty page tracking Joao Martins
  2023-03-04  1:43 ` [PATCH v3 01/13] vfio/common: Fix error reporting in vfio_get_dirty_bitmap() Joao Martins
@ 2023-03-04  1:43 ` Joao Martins
  2023-03-04  1:43 ` [PATCH v3 03/13] vfio/common: Abort migration if dirty log start/stop/sync fails Joao Martins
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 51+ messages in thread
From: Joao Martins @ 2023-03-04  1:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon

From: Avihai Horon <avihaih@nvidia.com>

There are several places where the %m conversion is used if one of
vfio_dma_map(), vfio_dma_unmap() or vfio_get_dirty_bitmap() fails.

The %m usage in these places is wrong, since %m relies on the errno
value while the above functions don't report errors via errno.

Fix it by using strerror() with the returned value instead.
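To illustrate the distinction (a standalone sketch with a hypothetical map function, not the patch's code): %m formats strerror(errno), which is unrelated when the callee reports the error via its return value; strerror(-ret) reports the right thing.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for a call that reports failure through its
 * return value while leaving errno untouched. */
static int fake_dma_map(void)
{
    return -EINVAL;
}

/* %m would format strerror(errno), which is unrelated here; the fix is
 * to format strerror(-ret) from the returned value instead. */
static const char *map_error_string(int ret)
{
    return strerror(-ret);
}
```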

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
---
 hw/vfio/common.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 9fc305448fa2..4d26e9cccf91 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -703,17 +703,17 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
                            read_only);
         if (ret) {
             error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
-                         "0x%"HWADDR_PRIx", %p) = %d (%m)",
+                         "0x%"HWADDR_PRIx", %p) = %d (%s)",
                          container, iova,
-                         iotlb->addr_mask + 1, vaddr, ret);
+                         iotlb->addr_mask + 1, vaddr, ret, strerror(-ret));
         }
     } else {
         ret = vfio_dma_unmap(container, iova, iotlb->addr_mask + 1, iotlb);
         if (ret) {
             error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
-                         "0x%"HWADDR_PRIx") = %d (%m)",
+                         "0x%"HWADDR_PRIx") = %d (%s)",
                          container, iova,
-                         iotlb->addr_mask + 1, ret);
+                         iotlb->addr_mask + 1, ret, strerror(-ret));
         }
     }
 out:
@@ -1095,8 +1095,9 @@ static void vfio_listener_region_add(MemoryListener *listener,
                        vaddr, section->readonly);
     if (ret) {
         error_setg(&err, "vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
-                   "0x%"HWADDR_PRIx", %p) = %d (%m)",
-                   container, iova, int128_get64(llsize), vaddr, ret);
+                   "0x%"HWADDR_PRIx", %p) = %d (%s)",
+                   container, iova, int128_get64(llsize), vaddr, ret,
+                   strerror(-ret));
         if (memory_region_is_ram_device(section->mr)) {
             /* Allow unexpected mappings not to be fatal for RAM devices */
             error_report_err(err);
@@ -1228,16 +1229,18 @@ static void vfio_listener_region_del(MemoryListener *listener,
             ret = vfio_dma_unmap(container, iova, int128_get64(llsize), NULL);
             if (ret) {
                 error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
-                             "0x%"HWADDR_PRIx") = %d (%m)",
-                             container, iova, int128_get64(llsize), ret);
+                             "0x%"HWADDR_PRIx") = %d (%s)",
+                             container, iova, int128_get64(llsize), ret,
+                             strerror(-ret));
             }
             iova += int128_get64(llsize);
         }
         ret = vfio_dma_unmap(container, iova, int128_get64(llsize), NULL);
         if (ret) {
             error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
-                         "0x%"HWADDR_PRIx") = %d (%m)",
-                         container, iova, int128_get64(llsize), ret);
+                         "0x%"HWADDR_PRIx") = %d (%s)",
+                         container, iova, int128_get64(llsize), ret,
+                         strerror(-ret));
         }
     }
 
@@ -1384,9 +1387,9 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
                                     translated_addr);
         if (ret) {
             error_report("vfio_iommu_map_dirty_notify(%p, 0x%"HWADDR_PRIx", "
-                         "0x%"HWADDR_PRIx") = %d (%m)",
-                         container, iova,
-                         iotlb->addr_mask + 1, ret);
+                         "0x%"HWADDR_PRIx") = %d (%s)",
+                         container, iova, iotlb->addr_mask + 1, ret,
+                         strerror(-ret));
         }
     }
     rcu_read_unlock();
-- 
2.17.2



^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v3 03/13] vfio/common: Abort migration if dirty log start/stop/sync fails
  2023-03-04  1:43 [PATCH v3 00/13] vfio/migration: Device dirty page tracking Joao Martins
  2023-03-04  1:43 ` [PATCH v3 01/13] vfio/common: Fix error reporting in vfio_get_dirty_bitmap() Joao Martins
  2023-03-04  1:43 ` [PATCH v3 02/13] vfio/common: Fix wrong %m usages Joao Martins
@ 2023-03-04  1:43 ` Joao Martins
  2023-03-04  1:43 ` [PATCH v3 04/13] vfio/common: Add VFIOBitmap and alloc function Joao Martins
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 51+ messages in thread
From: Joao Martins @ 2023-03-04  1:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon

From: Avihai Horon <avihaih@nvidia.com>

If VFIO dirty pages log start/stop/sync fails during migration,
migration should be aborted, as pages dirtied by VFIO devices might not
be reported properly.

This is not the case today: in such a scenario only an error is
printed.

Fix it by aborting migration in the above scenario.

Fixes: 758b96b61d5c ("vfio/migrate: Move switch of dirty tracking into vfio_memory_listener")
Fixes: b6dd6504e303 ("vfio: Add vfio_listener_log_sync to mark dirty pages")
Fixes: 9e7b0442f23a ("vfio: Add ioctl to get dirty pages bitmap during dma unmap")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
---
 hw/vfio/common.c | 53 ++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 45 insertions(+), 8 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 4d26e9cccf91..4c801513136a 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -42,6 +42,7 @@
 #include "migration/migration.h"
 #include "migration/misc.h"
 #include "migration/blocker.h"
+#include "migration/qemu-file.h"
 #include "sysemu/tpm.h"
 
 VFIOGroupList vfio_group_list =
@@ -390,6 +391,19 @@ void vfio_unblock_multiple_devices_migration(void)
     multiple_devices_migration_blocker = NULL;
 }
 
+static void vfio_set_migration_error(int err)
+{
+    MigrationState *ms = migrate_get_current();
+
+    if (migration_is_setup_or_active(ms->state)) {
+        WITH_QEMU_LOCK_GUARD(&ms->qemu_file_lock) {
+            if (ms->to_dst_file) {
+                qemu_file_set_error(ms->to_dst_file, err);
+            }
+        }
+    }
+}
+
 static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
 {
     VFIOGroup *group;
@@ -680,6 +694,7 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
     if (iotlb->target_as != &address_space_memory) {
         error_report("Wrong target AS \"%s\", only system memory is allowed",
                      iotlb->target_as->name ? iotlb->target_as->name : "none");
+        vfio_set_migration_error(-EINVAL);
         return;
     }
 
@@ -714,6 +729,7 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
                          "0x%"HWADDR_PRIx") = %d (%s)",
                          container, iova,
                          iotlb->addr_mask + 1, ret, strerror(-ret));
+            vfio_set_migration_error(ret);
         }
     }
 out:
@@ -1259,7 +1275,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
     }
 }
 
-static void vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
+static int vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
 {
     int ret;
     struct vfio_iommu_type1_dirty_bitmap dirty = {
@@ -1267,7 +1283,7 @@ static void vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
     };
 
     if (!container->dirty_pages_supported) {
-        return;
+        return 0;
     }
 
     if (start) {
@@ -1278,23 +1294,34 @@ static void vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
 
     ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, &dirty);
     if (ret) {
+        ret = -errno;
         error_report("Failed to set dirty tracking flag 0x%x errno: %d",
                      dirty.flags, errno);
     }
+
+    return ret;
 }
 
 static void vfio_listener_log_global_start(MemoryListener *listener)
 {
     VFIOContainer *container = container_of(listener, VFIOContainer, listener);
+    int ret;
 
-    vfio_set_dirty_page_tracking(container, true);
+    ret = vfio_set_dirty_page_tracking(container, true);
+    if (ret) {
+        vfio_set_migration_error(ret);
+    }
 }
 
 static void vfio_listener_log_global_stop(MemoryListener *listener)
 {
     VFIOContainer *container = container_of(listener, VFIOContainer, listener);
+    int ret;
 
-    vfio_set_dirty_page_tracking(container, false);
+    ret = vfio_set_dirty_page_tracking(container, false);
+    if (ret) {
+        vfio_set_migration_error(ret);
+    }
 }
 
 static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
@@ -1370,19 +1397,18 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
     VFIOContainer *container = giommu->container;
     hwaddr iova = iotlb->iova + giommu->iommu_offset;
     ram_addr_t translated_addr;
+    int ret = -EINVAL;
 
     trace_vfio_iommu_map_dirty_notify(iova, iova + iotlb->addr_mask);
 
     if (iotlb->target_as != &address_space_memory) {
         error_report("Wrong target AS \"%s\", only system memory is allowed",
                      iotlb->target_as->name ? iotlb->target_as->name : "none");
-        return;
+        goto out;
     }
 
     rcu_read_lock();
     if (vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL)) {
-        int ret;
-
         ret = vfio_get_dirty_bitmap(container, iova, iotlb->addr_mask + 1,
                                     translated_addr);
         if (ret) {
@@ -1393,6 +1419,11 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
         }
     }
     rcu_read_unlock();
+
+out:
+    if (ret) {
+        vfio_set_migration_error(ret);
+    }
 }
 
 static int vfio_ram_discard_get_dirty_bitmap(MemoryRegionSection *section,
@@ -1485,13 +1516,19 @@ static void vfio_listener_log_sync(MemoryListener *listener,
         MemoryRegionSection *section)
 {
     VFIOContainer *container = container_of(listener, VFIOContainer, listener);
+    int ret;
 
     if (vfio_listener_skipped_section(section)) {
         return;
     }
 
     if (vfio_devices_all_dirty_tracking(container)) {
-        vfio_sync_dirty_bitmap(container, section);
+        ret = vfio_sync_dirty_bitmap(container, section);
+        if (ret) {
+            error_report("vfio: Failed to sync dirty bitmap, err: %d (%s)", ret,
+                         strerror(-ret));
+            vfio_set_migration_error(ret);
+        }
     }
 }
 
-- 
2.17.2



^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v3 04/13] vfio/common: Add VFIOBitmap and alloc function
  2023-03-04  1:43 [PATCH v3 00/13] vfio/migration: Device dirty page tracking Joao Martins
                   ` (2 preceding siblings ...)
  2023-03-04  1:43 ` [PATCH v3 03/13] vfio/common: Abort migration if dirty log start/stop/sync fails Joao Martins
@ 2023-03-04  1:43 ` Joao Martins
  2023-03-06 13:20   ` Cédric Le Goater
  2023-03-04  1:43 ` [PATCH v3 05/13] vfio/common: Add helper to validate iova/end against hostwin Joao Martins
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 51+ messages in thread
From: Joao Martins @ 2023-03-04  1:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon,
	Joao Martins

From: Avihai Horon <avihaih@nvidia.com>

There are already two places where dirty page bitmap allocation and
calculations are done in open code. With device dirty page tracking
being added in the next patches, there are going to be even more places.

To avoid code duplication, introduce a VFIOBitmap struct and a
corresponding alloc function, and use them where applicable.
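The sizing math the patch consolidates can be checked in isolation: one bit per host page, rounded up so the bitmap is a whole number of 64-bit words. A sketch, assuming a 4 KiB host page size (the real code uses qemu_real_host_page_size()):

```c
#include <assert.h>
#include <stdint.h>

#define HOST_PAGE_SIZE 4096ULL /* assumption; real code queries the host */

typedef struct {
    uint64_t size;  /* bitmap size in bytes */
    uint64_t pages; /* number of host pages covered */
} BitmapDims;

/* One bit per host page, rounded up to whole 64-bit words. */
static void bitmap_dims(BitmapDims *b, uint64_t range_size)
{
    b->pages = (range_size + HOST_PAGE_SIZE - 1) / HOST_PAGE_SIZE;
    b->size = ((b->pages + 63) / 64) * 8;
}
```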

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 hw/vfio/common.c | 75 +++++++++++++++++++++++++++++-------------------
 1 file changed, 46 insertions(+), 29 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 4c801513136a..151e7f40b73d 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -320,6 +320,27 @@ const MemoryRegionOps vfio_region_ops = {
  * Device state interfaces
  */
 
+typedef struct {
+    unsigned long *bitmap;
+    hwaddr size;
+    hwaddr pages;
+} VFIOBitmap;
+
+static int vfio_bitmap_alloc(VFIOBitmap *vbmap, hwaddr size)
+{
+    vbmap->pages = REAL_HOST_PAGE_ALIGN(size) / qemu_real_host_page_size();
+    vbmap->size = ROUND_UP(vbmap->pages, sizeof(__u64) * BITS_PER_BYTE) /
+                                         BITS_PER_BYTE;
+    vbmap->bitmap = g_try_malloc0(vbmap->size);
+    if (!vbmap->bitmap) {
+        errno = ENOMEM;
+
+        return -errno;
+    }
+
+    return 0;
+}
+
 bool vfio_mig_active(void)
 {
     VFIOGroup *group;
@@ -468,9 +489,14 @@ static int vfio_dma_unmap_bitmap(VFIOContainer *container,
 {
     struct vfio_iommu_type1_dma_unmap *unmap;
     struct vfio_bitmap *bitmap;
-    uint64_t pages = REAL_HOST_PAGE_ALIGN(size) / qemu_real_host_page_size();
+    VFIOBitmap vbmap;
     int ret;
 
+    ret = vfio_bitmap_alloc(&vbmap, size);
+    if (ret) {
+        return -errno;
+    }
+
     unmap = g_malloc0(sizeof(*unmap) + sizeof(*bitmap));
 
     unmap->argsz = sizeof(*unmap) + sizeof(*bitmap);
@@ -484,35 +510,28 @@ static int vfio_dma_unmap_bitmap(VFIOContainer *container,
      * qemu_real_host_page_size to mark those dirty. Hence set bitmap_pgsize
      * to qemu_real_host_page_size.
      */
-
     bitmap->pgsize = qemu_real_host_page_size();
-    bitmap->size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) /
-                   BITS_PER_BYTE;
+    bitmap->size = vbmap.size;
+    bitmap->data = (__u64 *)vbmap.bitmap;
 
-    if (bitmap->size > container->max_dirty_bitmap_size) {
-        error_report("UNMAP: Size of bitmap too big 0x%"PRIx64,
-                     (uint64_t)bitmap->size);
+    if (vbmap.size > container->max_dirty_bitmap_size) {
+        error_report("UNMAP: Size of bitmap too big 0x%"PRIx64, vbmap.size);
         ret = -E2BIG;
         goto unmap_exit;
     }
 
-    bitmap->data = g_try_malloc0(bitmap->size);
-    if (!bitmap->data) {
-        ret = -ENOMEM;
-        goto unmap_exit;
-    }
-
     ret = ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, unmap);
     if (!ret) {
-        cpu_physical_memory_set_dirty_lebitmap((unsigned long *)bitmap->data,
-                iotlb->translated_addr, pages);
+        cpu_physical_memory_set_dirty_lebitmap(vbmap.bitmap,
+                iotlb->translated_addr, vbmap.pages);
     } else {
         error_report("VFIO_UNMAP_DMA with DIRTY_BITMAP : %m");
     }
 
-    g_free(bitmap->data);
 unmap_exit:
     g_free(unmap);
+    g_free(vbmap.bitmap);
+
     return ret;
 }
 
@@ -1329,7 +1348,7 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
 {
     struct vfio_iommu_type1_dirty_bitmap *dbitmap;
     struct vfio_iommu_type1_dirty_bitmap_get *range;
-    uint64_t pages;
+    VFIOBitmap vbmap;
     int ret;
 
     if (!container->dirty_pages_supported) {
@@ -1339,6 +1358,11 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
         return 0;
     }
 
+    ret = vfio_bitmap_alloc(&vbmap, size);
+    if (ret) {
+        return -errno;
+    }
+
     dbitmap = g_malloc0(sizeof(*dbitmap) + sizeof(*range));
 
     dbitmap->argsz = sizeof(*dbitmap) + sizeof(*range);
@@ -1353,15 +1377,8 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
      * to qemu_real_host_page_size.
      */
     range->bitmap.pgsize = qemu_real_host_page_size();
-
-    pages = REAL_HOST_PAGE_ALIGN(range->size) / qemu_real_host_page_size();
-    range->bitmap.size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) /
-                                         BITS_PER_BYTE;
-    range->bitmap.data = g_try_malloc0(range->bitmap.size);
-    if (!range->bitmap.data) {
-        ret = -ENOMEM;
-        goto err_out;
-    }
+    range->bitmap.size = vbmap.size;
+    range->bitmap.data = (__u64 *)vbmap.bitmap;
 
     ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, dbitmap);
     if (ret) {
@@ -1372,14 +1389,14 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
         goto err_out;
     }
 
-    cpu_physical_memory_set_dirty_lebitmap((unsigned long *)range->bitmap.data,
-                                            ram_addr, pages);
+    cpu_physical_memory_set_dirty_lebitmap(vbmap.bitmap, ram_addr,
+                                           vbmap.pages);
 
     trace_vfio_get_dirty_bitmap(container->fd, range->iova, range->size,
                                 range->bitmap.size, ram_addr);
 err_out:
-    g_free(range->bitmap.data);
     g_free(dbitmap);
+    g_free(vbmap.bitmap);
 
     return ret;
 }
-- 
2.17.2



^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v3 05/13] vfio/common: Add helper to validate iova/end against hostwin
  2023-03-04  1:43 [PATCH v3 00/13] vfio/migration: Device dirty page tracking Joao Martins
                   ` (3 preceding siblings ...)
  2023-03-04  1:43 ` [PATCH v3 04/13] vfio/common: Add VFIOBitmap and alloc function Joao Martins
@ 2023-03-04  1:43 ` Joao Martins
  2023-03-06 13:24   ` Cédric Le Goater
  2023-03-04  1:43 ` [PATCH v3 06/13] vfio/common: Consolidate skip/invalid section into helper Joao Martins
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 51+ messages in thread
From: Joao Martins @ 2023-03-04  1:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Joao Martins

In preparation for use in device dirty tracking, move the code that
finds the container host DMA window for an IOVA range into a helper.
This avoids duplicating the common checks across listener callbacks.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 hw/vfio/common.c | 38 ++++++++++++++++++++------------------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 151e7f40b73d..80f3a1c44a01 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -903,6 +903,22 @@ static void vfio_unregister_ram_discard_listener(VFIOContainer *container,
     g_free(vrdl);
 }
 
+static VFIOHostDMAWindow *vfio_find_hostwin(VFIOContainer *container,
+                                            hwaddr iova, hwaddr end)
+{
+    VFIOHostDMAWindow *hostwin;
+    bool hostwin_found = false;
+
+    QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
+        if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
+            hostwin_found = true;
+            break;
+        }
+    }
+
+    return hostwin_found ? hostwin : NULL;
+}
+
 static bool vfio_known_safe_misalignment(MemoryRegionSection *section)
 {
     MemoryRegion *mr = section->mr;
@@ -928,7 +944,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
     void *vaddr;
     int ret;
     VFIOHostDMAWindow *hostwin;
-    bool hostwin_found;
     Error *err = NULL;
 
     if (vfio_listener_skipped_section(section)) {
@@ -1029,15 +1044,8 @@ static void vfio_listener_region_add(MemoryListener *listener,
 #endif
     }
 
-    hostwin_found = false;
-    QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
-        if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
-            hostwin_found = true;
-            break;
-        }
-    }
-
-    if (!hostwin_found) {
+    hostwin = vfio_find_hostwin(container, iova, end);
+    if (!hostwin) {
         error_setg(&err, "Container %p can't map guest IOVA region"
                    " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
         goto fail;
@@ -1239,15 +1247,9 @@ static void vfio_listener_region_del(MemoryListener *listener,
     if (memory_region_is_ram_device(section->mr)) {
         hwaddr pgmask;
         VFIOHostDMAWindow *hostwin;
-        bool hostwin_found = false;
 
-        QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
-            if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
-                hostwin_found = true;
-                break;
-            }
-        }
-        assert(hostwin_found); /* or region_add() would have failed */
+        hostwin = vfio_find_hostwin(container, iova, end);
+        assert(hostwin); /* or region_add() would have failed */
 
         pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
         try_unmap = !((iova & pgmask) || (int128_get64(llsize) & pgmask));
-- 
2.17.2



^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v3 06/13] vfio/common: Consolidate skip/invalid section into helper
  2023-03-04  1:43 [PATCH v3 00/13] vfio/migration: Device dirty page tracking Joao Martins
                   ` (4 preceding siblings ...)
  2023-03-04  1:43 ` [PATCH v3 05/13] vfio/common: Add helper to validate iova/end against hostwin Joao Martins
@ 2023-03-04  1:43 ` Joao Martins
  2023-03-06 13:33   ` Cédric Le Goater
  2023-03-04  1:43 ` [PATCH v3 07/13] vfio/common: Record DMA mapped IOVA ranges Joao Martins
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 51+ messages in thread
From: Joao Martins @ 2023-03-04  1:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Joao Martins

The checks are replicated in region_add and region_del, and will soon
be added to another memory listener dedicated to dirty tracking.

Move them into a new helper to avoid duplication.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 hw/vfio/common.c | 52 +++++++++++++++++++-----------------------------
 1 file changed, 21 insertions(+), 31 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 80f3a1c44a01..ed908e303dbb 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -935,23 +935,14 @@ static bool vfio_known_safe_misalignment(MemoryRegionSection *section)
     return true;
 }
 
-static void vfio_listener_region_add(MemoryListener *listener,
-                                     MemoryRegionSection *section)
+static bool vfio_listener_valid_section(MemoryRegionSection *section)
 {
-    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
-    hwaddr iova, end;
-    Int128 llend, llsize;
-    void *vaddr;
-    int ret;
-    VFIOHostDMAWindow *hostwin;
-    Error *err = NULL;
-
     if (vfio_listener_skipped_section(section)) {
         trace_vfio_listener_region_add_skip(
                 section->offset_within_address_space,
                 section->offset_within_address_space +
                 int128_get64(int128_sub(section->size, int128_one())));
-        return;
+        return false;
     }
 
     if (unlikely((section->offset_within_address_space &
@@ -966,6 +957,24 @@ static void vfio_listener_region_add(MemoryListener *listener,
                          section->offset_within_region,
                          qemu_real_host_page_size());
         }
+        return false;
+    }
+
+    return true;
+}
+
+static void vfio_listener_region_add(MemoryListener *listener,
+                                     MemoryRegionSection *section)
+{
+    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
+    hwaddr iova, end;
+    Int128 llend, llsize;
+    void *vaddr;
+    int ret;
+    VFIOHostDMAWindow *hostwin;
+    Error *err = NULL;
+
+    if (!vfio_listener_valid_section(section)) {
         return;
     }
 
@@ -1184,26 +1193,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
     int ret;
     bool try_unmap = true;
 
-    if (vfio_listener_skipped_section(section)) {
-        trace_vfio_listener_region_del_skip(
-                section->offset_within_address_space,
-                section->offset_within_address_space +
-                int128_get64(int128_sub(section->size, int128_one())));
-        return;
-    }
-
-    if (unlikely((section->offset_within_address_space &
-                  ~qemu_real_host_page_mask()) !=
-                 (section->offset_within_region & ~qemu_real_host_page_mask()))) {
-        if (!vfio_known_safe_misalignment(section)) {
-            error_report("%s received unaligned region %s iova=0x%"PRIx64
-                         " offset_within_region=0x%"PRIx64
-                         " qemu_real_host_page_size=0x%"PRIxPTR,
-                         __func__, memory_region_name(section->mr),
-                         section->offset_within_address_space,
-                         section->offset_within_region,
-                         qemu_real_host_page_size());
-        }
+    if (!vfio_listener_valid_section(section)) {
         return;
     }
 
-- 
2.17.2



^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v3 07/13] vfio/common: Record DMA mapped IOVA ranges
  2023-03-04  1:43 [PATCH v3 00/13] vfio/migration: Device dirty page tracking Joao Martins
                   ` (5 preceding siblings ...)
  2023-03-04  1:43 ` [PATCH v3 06/13] vfio/common: Consolidate skip/invalid section into helper Joao Martins
@ 2023-03-04  1:43 ` Joao Martins
  2023-03-06 13:41   ` Cédric Le Goater
                     ` (2 more replies)
  2023-03-04  1:43 ` [PATCH v3 08/13] vfio/common: Add device dirty page tracking start/stop Joao Martins
                   ` (7 subsequent siblings)
  14 siblings, 3 replies; 51+ messages in thread
From: Joao Martins @ 2023-03-04  1:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Joao Martins,
	Avihai Horon

According to the device DMA logging uAPI, IOVA ranges to be logged by
the device must be provided all at once upon DMA logging start.

As preparation for the following patches, which will add device dirty
page tracking, keep a record of all DMA mapped IOVA ranges so that they
can later be used when DMA logging is started.

Note that when a vIOMMU is enabled, DMA mapped IOVA ranges are not
tracked. This is due to the dynamic nature of vIOMMU DMA
mapping/unmapping.
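
The accumulation done by the new listener can be sketched in isolation: sections lying entirely below 4 GiB widen a 32-bit min/max range, everything else widens a 64-bit one, so at most two ranges are handed to the device. This is a simplified sketch mirroring the shape of `VFIODirtyTrackingRange`; unlike the patch (which zero-initializes the structure), it uses a `UINT64_MAX` sentinel for the minimums.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t hwaddr;

/* Simplified stand-in for VFIODirtyTrackingRange. */
typedef struct {
    hwaddr min32, max32;   /* sections entirely below 4 GiB */
    hwaddr min64, max64;   /* sections reaching 4 GiB or above */
} DirtyRange;

static void range_init(DirtyRange *r)
{
    r->min32 = r->min64 = UINT64_MAX;
    r->max32 = r->max64 = 0;
}

/* Fold one [iova, end] section into the running 32-bit or 64-bit range. */
static void range_update(DirtyRange *r, hwaddr iova, hwaddr end)
{
    if (end <= UINT32_MAX) {
        if (iova < r->min32) {
            r->min32 = iova;
        }
        if (end > r->max32) {
            r->max32 = end;
        }
    } else {
        if (iova < r->min64) {
            r->min64 = iova;
        }
        if (end > r->max64) {
            r->max64 = end;
        }
    }
}
```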

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 hw/vfio/common.c              | 84 +++++++++++++++++++++++++++++++++++
 hw/vfio/trace-events          |  1 +
 include/hw/vfio/vfio-common.h | 11 +++++
 3 files changed, 96 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index ed908e303dbb..d84e5fd86bb4 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -44,6 +44,7 @@
 #include "migration/blocker.h"
 #include "migration/qemu-file.h"
 #include "sysemu/tpm.h"
+#include "qemu/iova-tree.h"
 
 VFIOGroupList vfio_group_list =
     QLIST_HEAD_INITIALIZER(vfio_group_list);
@@ -1313,11 +1314,94 @@ static int vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
     return ret;
 }
 
+/*
+ * Called by the dirty tracking memory listener to calculate the iova/end
+ * for a given memory region section. The checks here replicate the logic
+ * in vfio_listener_region_{add,del}() used for the same purpose, and thus
+ * both listeners should be kept in sync.
+ */
+static bool vfio_get_section_iova_range(VFIOContainer *container,
+                                        MemoryRegionSection *section,
+                                        hwaddr *out_iova, hwaddr *out_end)
+{
+    Int128 llend;
+    hwaddr iova;
+
+    iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space);
+    llend = int128_make64(section->offset_within_address_space);
+    llend = int128_add(llend, section->size);
+    llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask()));
+
+    if (int128_ge(int128_make64(iova), llend)) {
+        return false;
+    }
+
+    *out_iova = iova;
+    *out_end = int128_get64(llend) - 1;
+    return true;
+}
+
+static void vfio_dirty_tracking_update(MemoryListener *listener,
+                                       MemoryRegionSection *section)
+{
+    VFIOContainer *container = container_of(listener, VFIOContainer,
+                                            tracking_listener);
+    VFIODirtyTrackingRange *range = &container->tracking_range;
+    hwaddr max32 = (1ULL << 32) - 1ULL;
+    hwaddr iova, end;
+
+    if (!vfio_listener_valid_section(section) ||
+        !vfio_get_section_iova_range(container, section, &iova, &end)) {
+        return;
+    }
+
+    WITH_QEMU_LOCK_GUARD(&container->tracking_mutex) {
+        if (iova < max32 && end <= max32) {
+                if (range->min32 > iova) {
+                    range->min32 = iova;
+                }
+                if (range->max32 < end) {
+                    range->max32 = end;
+                }
+                trace_vfio_device_dirty_tracking_update(iova, end,
+                                            range->min32, range->max32);
+        } else {
+                if (!range->min64 || range->min64 > iova) {
+                    range->min64 = iova;
+                }
+                if (range->max64 < end) {
+                    range->max64 = end;
+                }
+                trace_vfio_device_dirty_tracking_update(iova, end,
+                                            range->min64, range->max64);
+        }
+    }
+    return;
+}
+
+static const MemoryListener vfio_dirty_tracking_listener = {
+    .name = "vfio-tracking",
+    .region_add = vfio_dirty_tracking_update,
+};
+
+static void vfio_dirty_tracking_init(VFIOContainer *container)
+{
+    memset(&container->tracking_range, 0, sizeof(container->tracking_range));
+    qemu_mutex_init(&container->tracking_mutex);
+    container->tracking_listener = vfio_dirty_tracking_listener;
+    memory_listener_register(&container->tracking_listener,
+                             container->space->as);
+    memory_listener_unregister(&container->tracking_listener);
+    qemu_mutex_destroy(&container->tracking_mutex);
+}
+
 static void vfio_listener_log_global_start(MemoryListener *listener)
 {
     VFIOContainer *container = container_of(listener, VFIOContainer, listener);
     int ret;
 
+    vfio_dirty_tracking_init(container);
+
     ret = vfio_set_dirty_page_tracking(container, true);
     if (ret) {
         vfio_set_migration_error(ret);
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 669d9fe07cd9..d97a6de17921 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -104,6 +104,7 @@ vfio_known_safe_misalignment(const char *name, uint64_t iova, uint64_t offset_wi
 vfio_listener_region_add_no_dma_map(const char *name, uint64_t iova, uint64_t size, uint64_t page_size) "Region \"%s\" 0x%"PRIx64" size=0x%"PRIx64" is not aligned to 0x%"PRIx64" and cannot be mapped for DMA"
 vfio_listener_region_del_skip(uint64_t start, uint64_t end) "SKIPPING region_del 0x%"PRIx64" - 0x%"PRIx64
 vfio_listener_region_del(uint64_t start, uint64_t end) "region_del 0x%"PRIx64" - 0x%"PRIx64
+vfio_device_dirty_tracking_update(uint64_t start, uint64_t end, uint64_t min, uint64_t max) "section 0x%"PRIx64" - 0x%"PRIx64" -> update [0x%"PRIx64" - 0x%"PRIx64"]"
 vfio_disconnect_container(int fd) "close container->fd=%d"
 vfio_put_group(int fd) "close group->fd=%d"
 vfio_get_device(const char * name, unsigned int flags, unsigned int num_regions, unsigned int num_irqs) "Device %s flags: %u, regions: %u, irqs: %u"
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 87524c64a443..96791add2719 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -23,6 +23,7 @@
 
 #include "exec/memory.h"
 #include "qemu/queue.h"
+#include "qemu/iova-tree.h"
 #include "qemu/notify.h"
 #include "ui/console.h"
 #include "hw/display/ramfb.h"
@@ -68,6 +69,13 @@ typedef struct VFIOMigration {
     size_t data_buffer_size;
 } VFIOMigration;
 
+typedef struct VFIODirtyTrackingRange {
+    hwaddr min32;
+    hwaddr max32;
+    hwaddr min64;
+    hwaddr max64;
+} VFIODirtyTrackingRange;
+
 typedef struct VFIOAddressSpace {
     AddressSpace *as;
     QLIST_HEAD(, VFIOContainer) containers;
@@ -89,6 +97,9 @@ typedef struct VFIOContainer {
     uint64_t max_dirty_bitmap_size;
     unsigned long pgsizes;
     unsigned int dma_max_mappings;
+    VFIODirtyTrackingRange tracking_range;
+    QemuMutex tracking_mutex;
+    MemoryListener tracking_listener;
     QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
     QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
     QLIST_HEAD(, VFIOGroup) group_list;
-- 
2.17.2




* [PATCH v3 08/13] vfio/common: Add device dirty page tracking start/stop
  2023-03-04  1:43 [PATCH v3 00/13] vfio/migration: Device dirty page tracking Joao Martins
                   ` (6 preceding siblings ...)
  2023-03-04  1:43 ` [PATCH v3 07/13] vfio/common: Record DMA mapped IOVA ranges Joao Martins
@ 2023-03-04  1:43 ` Joao Martins
  2023-03-06 18:25   ` Cédric Le Goater
  2023-03-06 18:42   ` Alex Williamson
  2023-03-04  1:43 ` [PATCH v3 09/13] vfio/common: Extract code from vfio_get_dirty_bitmap() to new function Joao Martins
                   ` (6 subsequent siblings)
  14 siblings, 2 replies; 51+ messages in thread
From: Joao Martins @ 2023-03-04  1:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Joao Martins,
	Avihai Horon

Add device dirty page tracking start/stop functionality. This uses the
device DMA logging uAPI to start and stop dirty page tracking on each device.

Device dirty page tracking is used only if all devices within a
container support device dirty page tracking.
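
The STOP path in the patch builds a header-only `vfio_device_feature` in a `uint64_t` stack array sized with `DIV_ROUND_UP`, which guarantees 8-byte alignment for the ioctl argument without a heap allocation. A standalone sketch of that sizing idiom follows; the struct layouts here are simplified stand-ins, not the kernel's actual definitions.

```c
#include <assert.h>
#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Simplified stand-in for struct vfio_device_feature. */
struct feature_hdr {
    uint32_t argsz;
    uint32_t flags;
    uint8_t  data[];            /* feature-specific payload follows */
};

/* Simplified stand-in for the DMA logging control payload. */
struct logging_control {
    uint64_t page_size;
    uint32_t num_ranges;
    uint32_t reserved;
    uint64_t ranges;            /* userspace pointer, carried as u64 */
};

/*
 * Number of uint64_t words whose stack array covers header plus
 * payload; declaring the buffer as uint64_t[] gives the required
 * 8-byte alignment for free.
 */
enum {
    BUF_WORDS = DIV_ROUND_UP(sizeof(struct feature_hdr) +
                             sizeof(struct logging_control),
                             sizeof(uint64_t))
};
```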

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 hw/vfio/common.c              | 166 +++++++++++++++++++++++++++++++++-
 hw/vfio/trace-events          |   1 +
 include/hw/vfio/vfio-common.h |   2 +
 3 files changed, 166 insertions(+), 3 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index d84e5fd86bb4..aa0df0604704 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -453,6 +453,22 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
     return true;
 }
 
+static bool vfio_devices_all_device_dirty_tracking(VFIOContainer *container)
+{
+    VFIOGroup *group;
+    VFIODevice *vbasedev;
+
+    QLIST_FOREACH(group, &container->group_list, container_next) {
+        QLIST_FOREACH(vbasedev, &group->device_list, next) {
+            if (!vbasedev->dirty_pages_supported) {
+                return false;
+            }
+        }
+    }
+
+    return true;
+}
+
 /*
  * Check if all VFIO devices are running and migration is active, which is
  * essentially equivalent to the migration being in pre-copy phase.
@@ -1395,15 +1411,152 @@ static void vfio_dirty_tracking_init(VFIOContainer *container)
     qemu_mutex_destroy(&container->tracking_mutex);
 }
 
+static int vfio_devices_dma_logging_set(VFIOContainer *container,
+                                        struct vfio_device_feature *feature)
+{
+    bool status = (feature->flags & VFIO_DEVICE_FEATURE_MASK) ==
+                  VFIO_DEVICE_FEATURE_DMA_LOGGING_START;
+    VFIODevice *vbasedev;
+    VFIOGroup *group;
+    int ret = 0;
+
+    QLIST_FOREACH(group, &container->group_list, container_next) {
+        QLIST_FOREACH(vbasedev, &group->device_list, next) {
+            if (vbasedev->dirty_tracking == status) {
+                continue;
+            }
+
+            ret = ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature);
+            if (ret) {
+                ret = -errno;
+                error_report("%s: Failed to set DMA logging %s, err %d (%s)",
+                             vbasedev->name, status ? "start" : "stop", ret,
+                             strerror(errno));
+                goto out;
+            }
+            vbasedev->dirty_tracking = status;
+        }
+    }
+
+out:
+    return ret;
+}
+
+static int vfio_devices_dma_logging_stop(VFIOContainer *container)
+{
+    uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature),
+                              sizeof(uint64_t))] = {};
+    struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
+
+    feature->argsz = sizeof(buf);
+    feature->flags = VFIO_DEVICE_FEATURE_SET;
+    feature->flags |= VFIO_DEVICE_FEATURE_DMA_LOGGING_STOP;
+
+    return vfio_devices_dma_logging_set(container, feature);
+}
+
+static struct vfio_device_feature *
+vfio_device_feature_dma_logging_start_create(VFIOContainer *container)
+{
+    struct vfio_device_feature *feature;
+    size_t feature_size;
+    struct vfio_device_feature_dma_logging_control *control;
+    struct vfio_device_feature_dma_logging_range *ranges;
+    VFIODirtyTrackingRange *tracking = &container->tracking_range;
+
+    feature_size = sizeof(struct vfio_device_feature) +
+                   sizeof(struct vfio_device_feature_dma_logging_control);
+    feature = g_try_malloc0(feature_size);
+    if (!feature) {
+        errno = ENOMEM;
+        return NULL;
+    }
+    feature->argsz = feature_size;
+    feature->flags = VFIO_DEVICE_FEATURE_SET;
+    feature->flags |= VFIO_DEVICE_FEATURE_DMA_LOGGING_START;
+
+    control = (struct vfio_device_feature_dma_logging_control *)feature->data;
+    control->page_size = qemu_real_host_page_size();
+
+    /*
+     * DMA logging uAPI guarantees to support at least a number of ranges that
+     * fits into a single host kernel base page.
+     */
+    control->num_ranges = !!tracking->max32 + !!tracking->max64;
+    ranges = g_try_new0(struct vfio_device_feature_dma_logging_range,
+                        control->num_ranges);
+    if (!ranges) {
+        g_free(feature);
+        errno = ENOMEM;
+
+        return NULL;
+    }
+
+    control->ranges = (__aligned_u64)ranges;
+    if (tracking->max32) {
+        ranges->iova = tracking->min32;
+        ranges->length = (tracking->max32 - tracking->min32) + 1;
+        ranges++;
+    }
+    if (tracking->max64) {
+        ranges->iova = tracking->min64;
+        ranges->length = (tracking->max64 - tracking->min64) + 1;
+    }
+
+    trace_vfio_device_dirty_tracking_start(control->num_ranges,
+                                           tracking->min32, tracking->max32,
+                                           tracking->min64, tracking->max64);
+
+    return feature;
+}
+
+static void vfio_device_feature_dma_logging_start_destroy(
+    struct vfio_device_feature *feature)
+{
+    struct vfio_device_feature_dma_logging_control *control =
+        (struct vfio_device_feature_dma_logging_control *)feature->data;
+    struct vfio_device_feature_dma_logging_range *ranges =
+        (struct vfio_device_feature_dma_logging_range *)control->ranges;
+
+    g_free(ranges);
+    g_free(feature);
+}
+
+static int vfio_devices_dma_logging_start(VFIOContainer *container)
+{
+    struct vfio_device_feature *feature;
+    int ret = 0;
+
+    vfio_dirty_tracking_init(container);
+    feature = vfio_device_feature_dma_logging_start_create(container);
+    if (!feature) {
+        return -errno;
+    }
+
+    ret = vfio_devices_dma_logging_set(container, feature);
+    if (ret) {
+        vfio_devices_dma_logging_stop(container);
+    }
+
+    vfio_device_feature_dma_logging_start_destroy(feature);
+
+    return ret;
+}
+
 static void vfio_listener_log_global_start(MemoryListener *listener)
 {
     VFIOContainer *container = container_of(listener, VFIOContainer, listener);
     int ret;
 
-    vfio_dirty_tracking_init(container);
+    if (vfio_devices_all_device_dirty_tracking(container)) {
+        ret = vfio_devices_dma_logging_start(container);
+    } else {
+        ret = vfio_set_dirty_page_tracking(container, true);
+    }
 
-    ret = vfio_set_dirty_page_tracking(container, true);
     if (ret) {
+        error_report("vfio: Could not start dirty page tracking, err: %d (%s)",
+                     ret, strerror(-ret));
         vfio_set_migration_error(ret);
     }
 }
@@ -1413,8 +1566,15 @@ static void vfio_listener_log_global_stop(MemoryListener *listener)
     VFIOContainer *container = container_of(listener, VFIOContainer, listener);
     int ret;
 
-    ret = vfio_set_dirty_page_tracking(container, false);
+    if (vfio_devices_all_device_dirty_tracking(container)) {
+        ret = vfio_devices_dma_logging_stop(container);
+    } else {
+        ret = vfio_set_dirty_page_tracking(container, false);
+    }
+
     if (ret) {
+        error_report("vfio: Could not stop dirty page tracking, err: %d (%s)",
+                     ret, strerror(-ret));
         vfio_set_migration_error(ret);
     }
 }
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index d97a6de17921..7a7e0cfe5b23 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -105,6 +105,7 @@ vfio_listener_region_add_no_dma_map(const char *name, uint64_t iova, uint64_t si
 vfio_listener_region_del_skip(uint64_t start, uint64_t end) "SKIPPING region_del 0x%"PRIx64" - 0x%"PRIx64
 vfio_listener_region_del(uint64_t start, uint64_t end) "region_del 0x%"PRIx64" - 0x%"PRIx64
 vfio_device_dirty_tracking_update(uint64_t start, uint64_t end, uint64_t min, uint64_t max) "section 0x%"PRIx64" - 0x%"PRIx64" -> update [0x%"PRIx64" - 0x%"PRIx64"]"
+vfio_device_dirty_tracking_start(int nr_ranges, uint64_t min32, uint64_t max32, uint64_t min64, uint64_t max64) "nr_ranges %d 32:[0x%"PRIx64" - 0x%"PRIx64"], 64:[0x%"PRIx64" - 0x%"PRIx64"]"
 vfio_disconnect_container(int fd) "close container->fd=%d"
 vfio_put_group(int fd) "close group->fd=%d"
 vfio_get_device(const char * name, unsigned int flags, unsigned int num_regions, unsigned int num_irqs) "Device %s flags: %u, regions: %u, irqs: %u"
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 96791add2719..1cbbccd91e11 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -154,6 +154,8 @@ typedef struct VFIODevice {
     VFIOMigration *migration;
     Error *migration_blocker;
     OnOffAuto pre_copy_dirty_page_tracking;
+    bool dirty_pages_supported;
+    bool dirty_tracking;
 } VFIODevice;
 
 struct VFIODeviceOps {
-- 
2.17.2




* [PATCH v3 09/13] vfio/common: Extract code from vfio_get_dirty_bitmap() to new function
  2023-03-04  1:43 [PATCH v3 00/13] vfio/migration: Device dirty page tracking Joao Martins
                   ` (7 preceding siblings ...)
  2023-03-04  1:43 ` [PATCH v3 08/13] vfio/common: Add device dirty page tracking start/stop Joao Martins
@ 2023-03-04  1:43 ` Joao Martins
  2023-03-06 16:24   ` Cédric Le Goater
  2023-03-04  1:43 ` [PATCH v3 10/13] vfio/common: Add device dirty page bitmap sync Joao Martins
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 51+ messages in thread
From: Joao Martins @ 2023-03-04  1:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon,
	Joao Martins

From: Avihai Horon <avihaih@nvidia.com>

Extract the VFIO_IOMMU_DIRTY_PAGES ioctl code in vfio_get_dirty_bitmap()
to its own function.

This will make the code more readable when the next patch adds device
dirty page bitmap sync functionality.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 hw/vfio/common.c | 57 +++++++++++++++++++++++++++++-------------------
 1 file changed, 35 insertions(+), 22 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index aa0df0604704..b0c7d03279ab 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1579,26 +1579,13 @@ static void vfio_listener_log_global_stop(MemoryListener *listener)
     }
 }
 
-static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
-                                 uint64_t size, ram_addr_t ram_addr)
+static int vfio_query_dirty_bitmap(VFIOContainer *container, VFIOBitmap *vbmap,
+                                   hwaddr iova, hwaddr size)
 {
     struct vfio_iommu_type1_dirty_bitmap *dbitmap;
     struct vfio_iommu_type1_dirty_bitmap_get *range;
-    VFIOBitmap vbmap;
     int ret;
 
-    if (!container->dirty_pages_supported) {
-        cpu_physical_memory_set_dirty_range(ram_addr, size,
-                                            tcg_enabled() ? DIRTY_CLIENTS_ALL :
-                                            DIRTY_CLIENTS_NOCODE);
-        return 0;
-    }
-
-    ret = vfio_bitmap_alloc(&vbmap, size);
-    if (ret) {
-        return -errno;
-    }
-
     dbitmap = g_malloc0(sizeof(*dbitmap) + sizeof(*range));
 
     dbitmap->argsz = sizeof(*dbitmap) + sizeof(*range);
@@ -1613,8 +1600,8 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
      * to qemu_real_host_page_size.
      */
     range->bitmap.pgsize = qemu_real_host_page_size();
-    range->bitmap.size = vbmap.size;
-    range->bitmap.data = (__u64 *)vbmap.bitmap;
+    range->bitmap.size = vbmap->size;
+    range->bitmap.data = (__u64 *)vbmap->bitmap;
 
     ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, dbitmap);
     if (ret) {
@@ -1622,16 +1609,42 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
         error_report("Failed to get dirty bitmap for iova: 0x%"PRIx64
                 " size: 0x%"PRIx64" err: %d", (uint64_t)range->iova,
                 (uint64_t)range->size, errno);
-        goto err_out;
+    }
+
+    g_free(dbitmap);
+
+    return ret;
+}
+
+static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
+                                 uint64_t size, ram_addr_t ram_addr)
+{
+    VFIOBitmap vbmap;
+    int ret;
+
+    if (!container->dirty_pages_supported) {
+        cpu_physical_memory_set_dirty_range(ram_addr, size,
+                                            tcg_enabled() ? DIRTY_CLIENTS_ALL :
+                                            DIRTY_CLIENTS_NOCODE);
+        return 0;
+    }
+
+    ret = vfio_bitmap_alloc(&vbmap, size);
+    if (ret) {
+        return -errno;
+    }
+
+    ret = vfio_query_dirty_bitmap(container, &vbmap, iova, size);
+    if (ret) {
+        goto out;
     }
 
     cpu_physical_memory_set_dirty_lebitmap(vbmap.bitmap, ram_addr,
                                            vbmap.pages);
 
-    trace_vfio_get_dirty_bitmap(container->fd, range->iova, range->size,
-                                range->bitmap.size, ram_addr);
-err_out:
-    g_free(dbitmap);
+    trace_vfio_get_dirty_bitmap(container->fd, iova, size, vbmap.size,
+                                ram_addr);
+out:
     g_free(vbmap.bitmap);
 
     return ret;
-- 
2.17.2




* [PATCH v3 10/13] vfio/common: Add device dirty page bitmap sync
  2023-03-04  1:43 [PATCH v3 00/13] vfio/migration: Device dirty page tracking Joao Martins
                   ` (8 preceding siblings ...)
  2023-03-04  1:43 ` [PATCH v3 09/13] vfio/common: Extract code from vfio_get_dirty_bitmap() to new function Joao Martins
@ 2023-03-04  1:43 ` Joao Martins
  2023-03-06 19:22   ` Alex Williamson
  2023-03-04  1:43 ` [PATCH v3 11/13] vfio/migration: Block migration with vIOMMU Joao Martins
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 51+ messages in thread
From: Joao Martins @ 2023-03-04  1:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Joao Martins,
	Avihai Horon

Add device dirty page bitmap sync functionality. This uses the device
DMA logging uAPI to sync the dirty page bitmap from each device.

Device dirty page bitmap sync is used only if all devices within a
container support device dirty page tracking.
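
The report path hands each device a caller-allocated bitmap sized for the queried IOVA window: one bit per host page, rounded up to whole 64-bit words as the kernel bitmap format expects. A hypothetical sketch of that sizing arithmetic (mirroring what a `VFIOBitmap`-style helper must compute; `PAGE_SIZE` stands in for `qemu_real_host_page_size()`):

```c
#include <assert.h>
#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical page size, standing in for qemu_real_host_page_size(). */
#define PAGE_SIZE 4096ULL

/*
 * Bytes needed for a dirty bitmap covering `size` bytes of IOVA space:
 * one bit per page, rounded up to whole 64-bit words.
 */
static uint64_t dirty_bitmap_bytes(uint64_t size)
{
    uint64_t pages = DIV_ROUND_UP(size, PAGE_SIZE);

    return DIV_ROUND_UP(pages, 64) * sizeof(uint64_t);
}
```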

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 hw/vfio/common.c | 88 +++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 79 insertions(+), 9 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index b0c7d03279ab..5b8456975e97 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -342,6 +342,9 @@ static int vfio_bitmap_alloc(VFIOBitmap *vbmap, hwaddr size)
     return 0;
 }
 
+static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
+                                 uint64_t size, ram_addr_t ram_addr);
+
 bool vfio_mig_active(void)
 {
     VFIOGroup *group;
@@ -565,10 +568,16 @@ static int vfio_dma_unmap(VFIOContainer *container,
         .iova = iova,
         .size = size,
     };
+    bool need_dirty_sync = false;
+    int ret;
+
+    if (iotlb && vfio_devices_all_running_and_mig_active(container)) {
+        if (!vfio_devices_all_device_dirty_tracking(container) &&
+            container->dirty_pages_supported) {
+            return vfio_dma_unmap_bitmap(container, iova, size, iotlb);
+        }
 
-    if (iotlb && container->dirty_pages_supported &&
-        vfio_devices_all_running_and_mig_active(container)) {
-        return vfio_dma_unmap_bitmap(container, iova, size, iotlb);
+        need_dirty_sync = true;
     }
 
     while (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
@@ -594,10 +603,12 @@ static int vfio_dma_unmap(VFIOContainer *container,
         return -errno;
     }
 
-    if (iotlb && vfio_devices_all_running_and_mig_active(container)) {
-        cpu_physical_memory_set_dirty_range(iotlb->translated_addr, size,
-                                            tcg_enabled() ? DIRTY_CLIENTS_ALL :
-                                            DIRTY_CLIENTS_NOCODE);
+    if (need_dirty_sync) {
+        ret = vfio_get_dirty_bitmap(container, iova, size,
+                                    iotlb->translated_addr);
+        if (ret) {
+            return ret;
+        }
     }
 
     return 0;
@@ -1579,6 +1590,58 @@ static void vfio_listener_log_global_stop(MemoryListener *listener)
     }
 }
 
+static int vfio_device_dma_logging_report(VFIODevice *vbasedev, hwaddr iova,
+                                          hwaddr size, void *bitmap)
+{
+    uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
+                        sizeof(struct vfio_device_feature_dma_logging_report),
+                        sizeof(__aligned_u64))] = {};
+    struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
+    struct vfio_device_feature_dma_logging_report *report =
+        (struct vfio_device_feature_dma_logging_report *)feature->data;
+
+    report->iova = iova;
+    report->length = size;
+    report->page_size = qemu_real_host_page_size();
+    report->bitmap = (__aligned_u64)bitmap;
+
+    feature->argsz = sizeof(buf);
+    feature->flags =
+        VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_DMA_LOGGING_REPORT;
+
+    if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
+        return -errno;
+    }
+
+    return 0;
+}
+
+static int vfio_devices_query_dirty_bitmap(VFIOContainer *container,
+                                           VFIOBitmap *vbmap, hwaddr iova,
+                                           hwaddr size)
+{
+    VFIODevice *vbasedev;
+    VFIOGroup *group;
+    int ret;
+
+    QLIST_FOREACH(group, &container->group_list, container_next) {
+        QLIST_FOREACH(vbasedev, &group->device_list, next) {
+            ret = vfio_device_dma_logging_report(vbasedev, iova, size,
+                                                 vbmap->bitmap);
+            if (ret) {
+                error_report("%s: Failed to get DMA logging report, iova: "
+                             "0x%" HWADDR_PRIx ", size: 0x%" HWADDR_PRIx
+                             ", err: %d (%s)",
+                             vbasedev->name, iova, size, ret, strerror(-ret));
+
+                return ret;
+            }
+        }
+    }
+
+    return 0;
+}
+
 static int vfio_query_dirty_bitmap(VFIOContainer *container, VFIOBitmap *vbmap,
                                    hwaddr iova, hwaddr size)
 {
@@ -1619,10 +1682,12 @@ static int vfio_query_dirty_bitmap(VFIOContainer *container, VFIOBitmap *vbmap,
 static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
                                  uint64_t size, ram_addr_t ram_addr)
 {
+    bool all_device_dirty_tracking =
+        vfio_devices_all_device_dirty_tracking(container);
     VFIOBitmap vbmap;
     int ret;
 
-    if (!container->dirty_pages_supported) {
+    if (!container->dirty_pages_supported && !all_device_dirty_tracking) {
         cpu_physical_memory_set_dirty_range(ram_addr, size,
                                             tcg_enabled() ? DIRTY_CLIENTS_ALL :
                                             DIRTY_CLIENTS_NOCODE);
@@ -1634,7 +1699,12 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
         return -errno;
     }
 
-    ret = vfio_query_dirty_bitmap(container, &vbmap, iova, size);
+    if (all_device_dirty_tracking) {
+        ret = vfio_devices_query_dirty_bitmap(container, &vbmap, iova, size);
+    } else {
+        ret = vfio_query_dirty_bitmap(container, &vbmap, iova, size);
+    }
+
     if (ret) {
         goto out;
     }
-- 
2.17.2




* [PATCH v3 11/13] vfio/migration: Block migration with vIOMMU
  2023-03-04  1:43 [PATCH v3 00/13] vfio/migration: Device dirty page tracking Joao Martins
                   ` (9 preceding siblings ...)
  2023-03-04  1:43 ` [PATCH v3 10/13] vfio/common: Add device dirty page bitmap sync Joao Martins
@ 2023-03-04  1:43 ` Joao Martins
  2023-03-06 17:00   ` Cédric Le Goater
  2023-03-06 19:42   ` Alex Williamson
  2023-03-04  1:43 ` [PATCH v3 12/13] vfio/migration: Query device dirty page tracking support Joao Martins
                   ` (3 subsequent siblings)
  14 siblings, 2 replies; 51+ messages in thread
From: Joao Martins @ 2023-03-04  1:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Joao Martins

Migrating with a vIOMMU will require either tracking the maximum
IOMMU-supported address space (e.g. 39/48-bit address width on Intel),
or range-tracking the current mappings and dirty-tracking any new
mappings made after dirty tracking starts. That will be done in a
separate series, so add a live migration blocker until it is implemented.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 hw/vfio/common.c              | 51 +++++++++++++++++++++++++++++++++++
 hw/vfio/migration.c           |  6 +++++
 include/hw/vfio/vfio-common.h |  2 ++
 3 files changed, 59 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 5b8456975e97..9b909f856722 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -365,6 +365,7 @@ bool vfio_mig_active(void)
 }
 
 static Error *multiple_devices_migration_blocker;
+static Error *giommu_migration_blocker;
 
 static unsigned int vfio_migratable_device_num(void)
 {
@@ -416,6 +417,56 @@ void vfio_unblock_multiple_devices_migration(void)
     multiple_devices_migration_blocker = NULL;
 }
 
+static unsigned int vfio_use_iommu_device_num(void)
+{
+    VFIOGroup *group;
+    VFIODevice *vbasedev;
+    unsigned int device_num = 0;
+
+    QLIST_FOREACH(group, &vfio_group_list, next) {
+        QLIST_FOREACH(vbasedev, &group->device_list, next) {
+            if (vbasedev->group->container->space->as !=
+                                    &address_space_memory) {
+                device_num++;
+            }
+        }
+    }
+
+    return device_num;
+}
+
+int vfio_block_giommu_migration(Error **errp)
+{
+    int ret;
+
+    if (giommu_migration_blocker ||
+        !vfio_use_iommu_device_num()) {
+        return 0;
+    }
+
+    error_setg(&giommu_migration_blocker,
+               "Migration is currently not supported with vIOMMU enabled");
+    ret = migrate_add_blocker(giommu_migration_blocker, errp);
+    if (ret < 0) {
+        error_free(giommu_migration_blocker);
+        giommu_migration_blocker = NULL;
+    }
+
+    return ret;
+}
+
+void vfio_unblock_giommu_migration(void)
+{
+    if (!giommu_migration_blocker ||
+        vfio_use_iommu_device_num()) {
+        return;
+    }
+
+    migrate_del_blocker(giommu_migration_blocker);
+    error_free(giommu_migration_blocker);
+    giommu_migration_blocker = NULL;
+}
+
 static void vfio_set_migration_error(int err)
 {
     MigrationState *ms = migrate_get_current();
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index a2c3d9bade7f..3e75868ae7a9 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -634,6 +634,11 @@ int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
         return ret;
     }
 
+    ret = vfio_block_giommu_migration(errp);
+    if (ret) {
+        return ret;
+    }
+
     trace_vfio_migration_probe(vbasedev->name);
     return 0;
 
@@ -659,6 +664,7 @@ void vfio_migration_finalize(VFIODevice *vbasedev)
         unregister_savevm(VMSTATE_IF(vbasedev->dev), "vfio", vbasedev);
         vfio_migration_exit(vbasedev);
         vfio_unblock_multiple_devices_migration();
+        vfio_unblock_giommu_migration();
     }
 
     if (vbasedev->migration_blocker) {
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 1cbbccd91e11..38e44258925b 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -233,6 +233,8 @@ extern VFIOGroupList vfio_group_list;
 bool vfio_mig_active(void);
 int vfio_block_multiple_devices_migration(Error **errp);
 void vfio_unblock_multiple_devices_migration(void);
+int vfio_block_giommu_migration(Error **errp);
+void vfio_unblock_giommu_migration(void);
 int64_t vfio_mig_bytes_transferred(void);
 
 #ifdef CONFIG_LINUX
-- 
2.17.2



^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v3 12/13] vfio/migration: Query device dirty page tracking support
  2023-03-04  1:43 [PATCH v3 00/13] vfio/migration: Device dirty page tracking Joao Martins
                   ` (10 preceding siblings ...)
  2023-03-04  1:43 ` [PATCH v3 11/13] vfio/migration: Block migration with vIOMMU Joao Martins
@ 2023-03-04  1:43 ` Joao Martins
  2023-03-06 17:20   ` Cédric Le Goater
  2023-03-04  1:43 ` [PATCH v3 13/13] docs/devel: Document VFIO device dirty page tracking Joao Martins
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 51+ messages in thread
From: Joao Martins @ 2023-03-04  1:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Joao Martins,
	Avihai Horon

Now that everything has been set up for device dirty page tracking,
query the device for its dirty page tracking support.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 hw/vfio/migration.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 3e75868ae7a9..da3aa596b3ec 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -555,6 +555,19 @@ static int vfio_migration_query_flags(VFIODevice *vbasedev, uint64_t *mig_flags)
     return 0;
 }
 
+static bool vfio_dma_logging_supported(VFIODevice *vbasedev)
+{
+    uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature),
+                              sizeof(uint64_t))] = {};
+    struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
+
+    feature->argsz = sizeof(buf);
+    feature->flags =
+        VFIO_DEVICE_FEATURE_PROBE | VFIO_DEVICE_FEATURE_DMA_LOGGING_START;
+
+    return !ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature);
+}
+
 static int vfio_migration_init(VFIODevice *vbasedev)
 {
     int ret;
@@ -589,6 +602,8 @@ static int vfio_migration_init(VFIODevice *vbasedev)
     migration->device_state = VFIO_DEVICE_STATE_RUNNING;
     migration->data_fd = -1;
 
+    vbasedev->dirty_pages_supported = vfio_dma_logging_supported(vbasedev);
+
     oid = vmstate_if_get_id(VMSTATE_IF(DEVICE(obj)));
     if (oid) {
         path = g_strdup_printf("%s/vfio", oid);
-- 
2.17.2
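As an aside, the buffer-sizing idiom in vfio_dma_logging_supported() above
carves the variable-size feature argument out of a uint64_t array so the
payload stays 8-byte aligned. A standalone sketch of that idiom follows; the
struct definitions here are simplified stand-ins, not the real vfio uAPI
layouts:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the vfio uAPI structs: a fixed-size header
 * followed by a variable payload, as in struct vfio_device_feature. */
struct feature_header {
    uint32_t argsz;
    uint32_t flags;
    uint8_t  data[];            /* flexible array member */
};

struct logging_report {
    uint64_t iova;
    uint64_t length;
};

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Number of uint64_t slots needed to hold header + payload.  Declaring
 * the backing buffer as uint64_t[] (rather than char[]) guarantees
 * 8-byte alignment for the 64-bit fields that follow the header. */
static size_t feature_buf_len(void)
{
    return DIV_ROUND_UP(sizeof(struct feature_header) +
                        sizeof(struct logging_report),
                        sizeof(uint64_t));
}
```

In the real code the array is declared on the stack, viewed through a
struct vfio_device_feature pointer, and feature->argsz is set to sizeof(buf).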



^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v3 13/13] docs/devel: Document VFIO device dirty page tracking
  2023-03-04  1:43 [PATCH v3 00/13] vfio/migration: Device dirty page tracking Joao Martins
                   ` (11 preceding siblings ...)
  2023-03-04  1:43 ` [PATCH v3 12/13] vfio/migration: Query device dirty page tracking support Joao Martins
@ 2023-03-04  1:43 ` Joao Martins
  2023-03-06 17:15   ` Cédric Le Goater
  2023-03-05 20:57 ` [PATCH v3 00/13] vfio/migration: Device " Alex Williamson
  2023-03-06 17:23 ` Cédric Le Goater
  14 siblings, 1 reply; 51+ messages in thread
From: Joao Martins @ 2023-03-04  1:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon,
	Joao Martins

From: Avihai Horon <avihaih@nvidia.com>

Adjust the VFIO dirty page tracking documentation and add a section to
describe device dirty page tracking.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 docs/devel/vfio-migration.rst | 46 +++++++++++++++++++++++------------
 1 file changed, 31 insertions(+), 15 deletions(-)

diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
index c214c73e2818..1b68ccf11529 100644
--- a/docs/devel/vfio-migration.rst
+++ b/docs/devel/vfio-migration.rst
@@ -59,22 +59,37 @@ System memory dirty pages tracking
 ----------------------------------
 
 A ``log_global_start`` and ``log_global_stop`` memory listener callback informs
-the VFIO IOMMU module to start and stop dirty page tracking. A ``log_sync``
-memory listener callback marks those system memory pages as dirty which are
-used for DMA by the VFIO device. The dirty pages bitmap is queried per
-container. All pages pinned by the vendor driver through external APIs have to
-be marked as dirty during migration. When there are CPU writes, CPU dirty page
-tracking can identify dirtied pages, but any page pinned by the vendor driver
-can also be written by the device. There is currently no device or IOMMU
-support for dirty page tracking in hardware.
+the VFIO dirty tracking module to start and stop dirty page tracking. A
+``log_sync`` memory listener callback queries the dirty page bitmap from the
+dirty tracking module and marks system memory pages which were DMA-ed by the
+VFIO device as dirty. The dirty page bitmap is queried per container.
+
+Currently there are two ways dirty page tracking can be done:
+(1) Device dirty tracking:
+In this method the device is responsible for logging and reporting its DMAs.
+This method can be used only if the device is capable of tracking its DMAs.
+Discovering device capability, starting and stopping dirty tracking, and
+syncing the dirty bitmaps from the device are done using the DMA logging uAPI.
+More info about the uAPI can be found in the comments of the
+``vfio_device_feature_dma_logging_control`` and
+``vfio_device_feature_dma_logging_report`` structures in the header file
+linux-headers/linux/vfio.h.
+
+(2) VFIO IOMMU module:
+In this method dirty tracking is done by the IOMMU. However, there is currently
+no IOMMU support for dirty page tracking. For this reason, all pages are
+perpetually marked dirty, unless the device driver pins pages through external
+APIs, in which case only those pinned pages are perpetually marked dirty.
+
+If the above two methods are not supported, all pages are perpetually marked
+dirty by QEMU.
 
 By default, dirty pages are tracked during pre-copy as well as stop-and-copy
-phase. So, a page pinned by the vendor driver will be copied to the destination
-in both phases. Copying dirty pages in pre-copy phase helps QEMU to predict if
-it can achieve its downtime tolerances. If QEMU during pre-copy phase keeps
-finding dirty pages continuously, then it understands that even in stop-and-copy
-phase, it is likely to find dirty pages and can predict the downtime
-accordingly.
+phase. So, a page marked as dirty will be copied to the destination in both
+phases. Copying dirty pages in the pre-copy phase helps QEMU to predict whether
+it can achieve its downtime tolerances. If QEMU keeps finding dirty pages
+continuously during the pre-copy phase, it can infer that dirty pages are
+likely in the stop-and-copy phase too, and predict the downtime accordingly.
 
 QEMU also provides a per device opt-out option ``pre-copy-dirty-page-tracking``
 which disables querying the dirty bitmap during pre-copy phase. If it is set to
@@ -89,7 +104,8 @@ phase of migration. In that case, the unmap ioctl returns any dirty pages in
 that range and QEMU reports corresponding guest physical pages dirty. During
 stop-and-copy phase, an IOMMU notifier is used to get a callback for mapped
 pages and then dirty pages bitmap is fetched from VFIO IOMMU modules for those
-mapped ranges.
+mapped ranges. If device dirty tracking is enabled with vIOMMU, live migration
+will be blocked.
 
 Flow of state changes during Live migration
 ===========================================
-- 
2.17.2



^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 00/13] vfio/migration: Device dirty page tracking
  2023-03-04  1:43 [PATCH v3 00/13] vfio/migration: Device dirty page tracking Joao Martins
                   ` (12 preceding siblings ...)
  2023-03-04  1:43 ` [PATCH v3 13/13] docs/devel: Document VFIO device dirty page tracking Joao Martins
@ 2023-03-05 20:57 ` Alex Williamson
  2023-03-05 23:33   ` Joao Martins
  2023-03-06 17:23 ` Cédric Le Goater
  14 siblings, 1 reply; 51+ messages in thread
From: Alex Williamson @ 2023-03-05 20:57 UTC (permalink / raw)
  To: Joao Martins
  Cc: qemu-devel, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta

On Sat,  4 Mar 2023 01:43:30 +0000
Joao Martins <joao.m.martins@oracle.com> wrote:

> Hey,
> 
> Presented herewith a series based on the basic VFIO migration protocol v2
> implementation [1].
> 
> It is split from its parent series[5] to solely focus on device dirty
> page tracking. Device dirty page tracking allows the VFIO device to
> record its DMAs and report them back when needed. This is part of VFIO
> migration and is used during pre-copy phase of migration to track the
> RAM pages that the device has written to and mark those pages dirty, so
> they can later be re-sent to target.
> 
> Device dirty page tracking uses the DMA logging uAPI to discover device
> capabilities, to start and stop tracking, and to get dirty page bitmap
> report. Extra details and uAPI definition can be found here [3].
> 
> Device dirty page tracking operates in VFIOContainer scope. I.e., When
> dirty tracking is started, stopped or dirty page report is queried, all
> devices within a VFIOContainer are iterated and for each of them device
> dirty page tracking is started, stopped or dirty page report is queried,
> respectively.
> 
> Device dirty page tracking is used only if all devices within a
> VFIOContainer support it. Otherwise, VFIO IOMMU dirty page tracking is
> used, and if that is not supported as well, memory is perpetually marked
> dirty by QEMU. Note that since VFIO IOMMU dirty page tracking has no HW
> support, the last two usually have the same effect of perpetually
> marking all pages dirty.
> 
> Normally, when asked to start dirty tracking, all the currently DMA
> mapped ranges are tracked by device dirty page tracking. If using a
> vIOMMU we block live migration. It's temporary and a separate series is
> going to add support for it. Thus this series focus on getting the
> ground work first.
> 
> The series is organized as follows:
> 
> - Patches 1-7: Fix bugs and do some preparatory work required prior to
>   adding device dirty page tracking.
> - Patches 8-10: Implement device dirty page tracking.
> - Patch 11: Blocks live migration with vIOMMU.
> - Patches 12-13 enable device dirty page tracking and document it.
> 
> Comments, improvements as usual appreciated.

Still some CI failures:

https://gitlab.com/alex.williamson/qemu/-/pipelines/796657474

The docker failures are normal, afaict the rest are not.  Thanks,

Alex



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 00/13] vfio/migration: Device dirty page tracking
  2023-03-05 20:57 ` [PATCH v3 00/13] vfio/migration: Device " Alex Williamson
@ 2023-03-05 23:33   ` Joao Martins
  2023-03-06  2:19     ` Alex Williamson
  0 siblings, 1 reply; 51+ messages in thread
From: Joao Martins @ 2023-03-05 23:33 UTC (permalink / raw)
  To: Alex Williamson
  Cc: qemu-devel, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta

On 05/03/2023 20:57, Alex Williamson wrote:
> On Sat,  4 Mar 2023 01:43:30 +0000
> Joao Martins <joao.m.martins@oracle.com> wrote:
> 
>> Hey,
>>
>> Presented herewith a series based on the basic VFIO migration protocol v2
>> implementation [1].
>>
>> It is split from its parent series[5] to solely focus on device dirty
>> page tracking. Device dirty page tracking allows the VFIO device to
>> record its DMAs and report them back when needed. This is part of VFIO
>> migration and is used during pre-copy phase of migration to track the
>> RAM pages that the device has written to and mark those pages dirty, so
>> they can later be re-sent to target.
>>
>> Device dirty page tracking uses the DMA logging uAPI to discover device
>> capabilities, to start and stop tracking, and to get dirty page bitmap
>> report. Extra details and uAPI definition can be found here [3].
>>
>> Device dirty page tracking operates in VFIOContainer scope. I.e., When
>> dirty tracking is started, stopped or dirty page report is queried, all
>> devices within a VFIOContainer are iterated and for each of them device
>> dirty page tracking is started, stopped or dirty page report is queried,
>> respectively.
>>
>> Device dirty page tracking is used only if all devices within a
>> VFIOContainer support it. Otherwise, VFIO IOMMU dirty page tracking is
>> used, and if that is not supported as well, memory is perpetually marked
>> dirty by QEMU. Note that since VFIO IOMMU dirty page tracking has no HW
>> support, the last two usually have the same effect of perpetually
>> marking all pages dirty.
>>
>> Normally, when asked to start dirty tracking, all the currently DMA
>> mapped ranges are tracked by device dirty page tracking. If using a
>> vIOMMU we block live migration. It's temporary and a separate series is
>> going to add support for it. Thus this series focus on getting the
>> ground work first.
>>
>> The series is organized as follows:
>>
>> - Patches 1-7: Fix bugs and do some preparatory work required prior to
>>   adding device dirty page tracking.
>> - Patches 8-10: Implement device dirty page tracking.
>> - Patch 11: Blocks live migration with vIOMMU.
>> - Patches 12-13 enable device dirty page tracking and document it.
>>
>> Comments, improvements as usual appreciated.
> 
> Still some CI failures:
> 
> https://gitlab.com/alex.williamson/qemu/-/pipelines/796657474
> 
> The docker failures are normal, afaict the rest are not.  Thanks,
> 

Ugh, sorry

The patch below the scissors mark (and also attached as a file) fixes those
build issues. I managed to reproduce them on i386 target builds, and these
changes fix my 32-bit build.

I don't have a working Gitlab setup[*], though, to trigger the CI with the
wealth of targets it build-tests. Could you kindly run the attached patch
through a new pipeline (applied on top of the branch you just built) to see
whether the CI gets happy? I will fold these changes into the right patches
(patches 8 and 10) for the v4 spin.

	Joao

[*] I'm working with Gitlab support to understand what's wrong there with my
account.
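To illustrate the failure mode being fixed: on a 32-bit host, casting a
pointer directly to a 64-bit uAPI field (e.g. `(__aligned_u64)ranges`)
triggers a pointer-to-integer-of-different-size warning, while widening
through uintptr_t first is clean on both 32-bit and 64-bit hosts. A minimal
sketch of the cast idiom (helper names are hypothetical, not QEMU code):

```c
#include <assert.h>
#include <stdint.h>

/* On 32-bit hosts, (uint64_t)ptr casts a 32-bit pointer straight to a
 * 64-bit integer, which gcc flags as a cast to an integer of different
 * size.  Going through uintptr_t widens the pointer portably and is
 * well-defined in both directions. */
static uint64_t ptr_to_u64(void *ptr)
{
    return (uint64_t)(uintptr_t)ptr;
}

static void *u64_to_ptr(uint64_t value)
{
    return (void *)(uintptr_t)value;
}
```

The kernel side truncates nothing here: user pointers always fit in the
64-bit field, and the round trip back through uintptr_t recovers them.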

----------------->8-----------------

From bbf2c3bbb9c9e97f12dfe49f85dac8cc1f0c5d97 Mon Sep 17 00:00:00 2001
From: Joao Martins <joao.m.martins@oracle.com>
Date: Sun, 5 Mar 2023 18:12:29 -0500
Subject: [PATCH v3 14/13] vfio/common: Fix 32-bit builds

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 hw/vfio/common.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 9b909f856722..eecff5bb16c6 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1554,7 +1554,7 @@ vfio_device_feature_dma_logging_start_create(VFIOContainer *container)
         return NULL;
     }

-    control->ranges = (__aligned_u64)ranges;
+    control->ranges = (__u64)(uintptr_t)ranges;
     if (tracking->max32) {
         ranges->iova = tracking->min32;
         ranges->length = (tracking->max32 - tracking->min32) + 1;
@@ -1578,7 +1578,7 @@ static void vfio_device_feature_dma_logging_start_destroy(
     struct vfio_device_feature_dma_logging_control *control =
         (struct vfio_device_feature_dma_logging_control *)feature->data;
     struct vfio_device_feature_dma_logging_range *ranges =
-        (struct vfio_device_feature_dma_logging_range *)control->ranges;
+        (struct vfio_device_feature_dma_logging_range *)(uintptr_t) control->ranges;

     g_free(ranges);
     g_free(feature);
@@ -1646,7 +1646,7 @@ static int vfio_device_dma_logging_report(VFIODevice *vbasedev, hwaddr iova,
 {
     uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
                         sizeof(struct vfio_device_feature_dma_logging_report),
-                        sizeof(__aligned_u64))] = {};
+                        sizeof(__u64))] = {};
     struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
     struct vfio_device_feature_dma_logging_report *report =
         (struct vfio_device_feature_dma_logging_report *)feature->data;
@@ -1654,7 +1654,7 @@ static int vfio_device_dma_logging_report(VFIODevice *vbasedev, hwaddr iova,
     report->iova = iova;
     report->length = size;
     report->page_size = qemu_real_host_page_size();
-    report->bitmap = (__aligned_u64)bitmap;
+    report->bitmap = (__u64)(uintptr_t)bitmap;

     feature->argsz = sizeof(buf);
     feature->flags =
--
2.17.2



^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 00/13] vfio/migration: Device dirty page tracking
  2023-03-05 23:33   ` Joao Martins
@ 2023-03-06  2:19     ` Alex Williamson
  2023-03-06  9:45       ` Joao Martins
  0 siblings, 1 reply; 51+ messages in thread
From: Alex Williamson @ 2023-03-06  2:19 UTC (permalink / raw)
  To: Joao Martins
  Cc: qemu-devel, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta

On Sun, 5 Mar 2023 23:33:35 +0000
Joao Martins <joao.m.martins@oracle.com> wrote:

> On 05/03/2023 20:57, Alex Williamson wrote:
> > On Sat,  4 Mar 2023 01:43:30 +0000
> > Joao Martins <joao.m.martins@oracle.com> wrote:
> >   
> >> Hey,
> >>
> >> Presented herewith a series based on the basic VFIO migration protocol v2
> >> implementation [1].
> >>
> >> It is split from its parent series[5] to solely focus on device dirty
> >> page tracking. Device dirty page tracking allows the VFIO device to
> >> record its DMAs and report them back when needed. This is part of VFIO
> >> migration and is used during pre-copy phase of migration to track the
> >> RAM pages that the device has written to and mark those pages dirty, so
> >> they can later be re-sent to target.
> >>
> >> Device dirty page tracking uses the DMA logging uAPI to discover device
> >> capabilities, to start and stop tracking, and to get dirty page bitmap
> >> report. Extra details and uAPI definition can be found here [3].
> >>
> >> Device dirty page tracking operates in VFIOContainer scope. I.e., When
> >> dirty tracking is started, stopped or dirty page report is queried, all
> >> devices within a VFIOContainer are iterated and for each of them device
> >> dirty page tracking is started, stopped or dirty page report is queried,
> >> respectively.
> >>
> >> Device dirty page tracking is used only if all devices within a
> >> VFIOContainer support it. Otherwise, VFIO IOMMU dirty page tracking is
> >> used, and if that is not supported as well, memory is perpetually marked
> >> dirty by QEMU. Note that since VFIO IOMMU dirty page tracking has no HW
> >> support, the last two usually have the same effect of perpetually
> >> marking all pages dirty.
> >>
> >> Normally, when asked to start dirty tracking, all the currently DMA
> >> mapped ranges are tracked by device dirty page tracking. If using a
> >> vIOMMU we block live migration. It's temporary and a separate series is
> >> going to add support for it. Thus this series focus on getting the
> >> ground work first.
> >>
> >> The series is organized as follows:
> >>
> >> - Patches 1-7: Fix bugs and do some preparatory work required prior to
> >>   adding device dirty page tracking.
> >> - Patches 8-10: Implement device dirty page tracking.
> >> - Patch 11: Blocks live migration with vIOMMU.
> >> - Patches 12-13 enable device dirty page tracking and document it.
> >>
> >> Comments, improvements as usual appreciated.  
> > 
> > Still some CI failures:
> > 
> > https://gitlab.com/alex.williamson/qemu/-/pipelines/796657474
> > 
> > The docker failures are normal, afaict the rest are not.  Thanks,
> >   
> 
> Ugh, sorry
> 
> The patch below the scissors mark (and also attached as a file) fixes those
> build issues. I managed to reproduce them on i386 target builds, and these
> changes fix my 32-bit build.
> 
> I don't have a working Gitlab setup[*], though, to trigger the CI with the
> wealth of targets it build-tests. Could you kindly run the attached patch
> through a new pipeline (applied on top of the branch you just built) to see
> whether the CI gets happy? I will fold these changes into the right patches
> (patches 8 and 10) for the v4 spin.

Looks like this passes:

https://gitlab.com/alex.williamson/qemu/-/pipelines/796750136

Thanks,
Alex



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 00/13] vfio/migration: Device dirty page tracking
  2023-03-06  2:19     ` Alex Williamson
@ 2023-03-06  9:45       ` Joao Martins
  2023-03-06 11:05         ` Cédric Le Goater
  0 siblings, 1 reply; 51+ messages in thread
From: Joao Martins @ 2023-03-06  9:45 UTC (permalink / raw)
  To: Alex Williamson
  Cc: qemu-devel, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta

On 06/03/2023 02:19, Alex Williamson wrote:
> On Sun, 5 Mar 2023 23:33:35 +0000
> Joao Martins <joao.m.martins@oracle.com> wrote:
> 
>> On 05/03/2023 20:57, Alex Williamson wrote:
>>> On Sat,  4 Mar 2023 01:43:30 +0000
>>> Joao Martins <joao.m.martins@oracle.com> wrote:
>>>   
>>>> Hey,
>>>>
>>>> Presented herewith a series based on the basic VFIO migration protocol v2
>>>> implementation [1].
>>>>
>>>> It is split from its parent series[5] to solely focus on device dirty
>>>> page tracking. Device dirty page tracking allows the VFIO device to
>>>> record its DMAs and report them back when needed. This is part of VFIO
>>>> migration and is used during pre-copy phase of migration to track the
>>>> RAM pages that the device has written to and mark those pages dirty, so
>>>> they can later be re-sent to target.
>>>>
>>>> Device dirty page tracking uses the DMA logging uAPI to discover device
>>>> capabilities, to start and stop tracking, and to get dirty page bitmap
>>>> report. Extra details and uAPI definition can be found here [3].
>>>>
>>>> Device dirty page tracking operates in VFIOContainer scope. I.e., When
>>>> dirty tracking is started, stopped or dirty page report is queried, all
>>>> devices within a VFIOContainer are iterated and for each of them device
>>>> dirty page tracking is started, stopped or dirty page report is queried,
>>>> respectively.
>>>>
>>>> Device dirty page tracking is used only if all devices within a
>>>> VFIOContainer support it. Otherwise, VFIO IOMMU dirty page tracking is
>>>> used, and if that is not supported as well, memory is perpetually marked
>>>> dirty by QEMU. Note that since VFIO IOMMU dirty page tracking has no HW
>>>> support, the last two usually have the same effect of perpetually
>>>> marking all pages dirty.
>>>>
>>>> Normally, when asked to start dirty tracking, all the currently DMA
>>>> mapped ranges are tracked by device dirty page tracking. If using a
>>>> vIOMMU we block live migration. It's temporary and a separate series is
>>>> going to add support for it. Thus this series focus on getting the
>>>> ground work first.
>>>>
>>>> The series is organized as follows:
>>>>
>>>> - Patches 1-7: Fix bugs and do some preparatory work required prior to
>>>>   adding device dirty page tracking.
>>>> - Patches 8-10: Implement device dirty page tracking.
>>>> - Patch 11: Blocks live migration with vIOMMU.
>>>> - Patches 12-13 enable device dirty page tracking and document it.
>>>>
>>>> Comments, improvements as usual appreciated.  
>>>
>>> Still some CI failures:
>>>
>>> https://gitlab.com/alex.williamson/qemu/-/pipelines/796657474
>>>
>>> The docker failures are normal, afaict the rest are not.  Thanks,
>>>   
>>
>> Ugh, sorry
>>
>> The patch below the scissors mark (and also attached as a file) fixes those
>> build issues. I managed to reproduce them on i386 target builds, and these
>> changes fix my 32-bit build.
>>
>> I don't have a working Gitlab setup[*], though, to trigger the CI with the
>> wealth of targets it build-tests. Could you kindly run the attached patch
>> through a new pipeline (applied on top of the branch you just built) to see
>> whether the CI gets happy? I will fold these changes into the right patches
>> (patches 8 and 10) for the v4 spin.
> 
> Looks like this passes:
> 
> https://gitlab.com/alex.williamson/qemu/-/pipelines/796750136
> 
Great, I've staged these fixes in patches 8 & 10!

I have a sliver of hope that we might still make it by soft freeze (tomorrow?).
If you think it can still make it, and the rest of the series is good, then I
can follow up with v4 today/tomorrow. Thoughts?

	Joao


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 00/13] vfio/migration: Device dirty page tracking
  2023-03-06  9:45       ` Joao Martins
@ 2023-03-06 11:05         ` Cédric Le Goater
  2023-03-06 21:19           ` Alex Williamson
  0 siblings, 1 reply; 51+ messages in thread
From: Cédric Le Goater @ 2023-03-06 11:05 UTC (permalink / raw)
  To: Joao Martins, Alex Williamson
  Cc: qemu-devel, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta

On 3/6/23 10:45, Joao Martins wrote:
> On 06/03/2023 02:19, Alex Williamson wrote:
>> On Sun, 5 Mar 2023 23:33:35 +0000
>> Joao Martins <joao.m.martins@oracle.com> wrote:
>>
>>> On 05/03/2023 20:57, Alex Williamson wrote:
>>>> On Sat,  4 Mar 2023 01:43:30 +0000
>>>> Joao Martins <joao.m.martins@oracle.com> wrote:
>>>>    
>>>>> Hey,
>>>>>
>>>>> Presented herewith a series based on the basic VFIO migration protocol v2
>>>>> implementation [1].
>>>>>
>>>>> It is split from its parent series[5] to solely focus on device dirty
>>>>> page tracking. Device dirty page tracking allows the VFIO device to
>>>>> record its DMAs and report them back when needed. This is part of VFIO
>>>>> migration and is used during pre-copy phase of migration to track the
>>>>> RAM pages that the device has written to and mark those pages dirty, so
>>>>> they can later be re-sent to target.
>>>>>
>>>>> Device dirty page tracking uses the DMA logging uAPI to discover device
>>>>> capabilities, to start and stop tracking, and to get dirty page bitmap
>>>>> report. Extra details and uAPI definition can be found here [3].
>>>>>
>>>>> Device dirty page tracking operates in VFIOContainer scope. I.e., When
>>>>> dirty tracking is started, stopped or dirty page report is queried, all
>>>>> devices within a VFIOContainer are iterated and for each of them device
>>>>> dirty page tracking is started, stopped or dirty page report is queried,
>>>>> respectively.
>>>>>
>>>>> Device dirty page tracking is used only if all devices within a
>>>>> VFIOContainer support it. Otherwise, VFIO IOMMU dirty page tracking is
>>>>> used, and if that is not supported as well, memory is perpetually marked
>>>>> dirty by QEMU. Note that since VFIO IOMMU dirty page tracking has no HW
>>>>> support, the last two usually have the same effect of perpetually
>>>>> marking all pages dirty.
>>>>>
>>>>> Normally, when asked to start dirty tracking, all the currently DMA
>>>>> mapped ranges are tracked by device dirty page tracking. If using a
>>>>> vIOMMU we block live migration. It's temporary and a separate series is
>>>>> going to add support for it. Thus this series focuses on getting the
>>>>> ground work done first.
>>>>>
>>>>> The series is organized as follows:
>>>>>
>>>>> - Patches 1-7: Fix bugs and do some preparatory work required prior to
>>>>>    adding device dirty page tracking.
>>>>> - Patches 8-10: Implement device dirty page tracking.
>>>>> - Patch 11: Blocks live migration with vIOMMU.
>>>>> - Patches 12-13 enable device dirty page tracking and document it.
>>>>>
>>>>> Comments, improvements as usual appreciated.
>>>>
>>>> Still some CI failures:
>>>>
>>>> https://gitlab.com/alex.williamson/qemu/-/pipelines/796657474
>>>>
>>>> The docker failures are normal, afaict the rest are not.  Thanks,
>>>>    
>>>
>>> Ugh, sorry
>>>
>>> The patch below the scissors mark (also attached as a file) fixes those build
>>> issues. I managed to reproduce them on i386 target builds, and these changes
>>> fix my 32-bit build.
>>>
>>> I don't have a working GitLab setup[*] though to trigger the CI with the
>>> wealth of targets it build-tests. Could you kindly test the attached patch in
>>> a new pipeline (applied on top of the branch you just built) to see whether
>>> the CI is happy? I will fold these changes into the right patches (patches 8
>>> and 10) for the v4 spin.
>>
>> Looks like this passes:
>>
>> https://gitlab.com/alex.williamson/qemu/-/pipelines/796750136
>>
> Great, I've staged these fixes in patches 8 & 10!
> 
> I have a sliver of hope that we might still make it by soft freeze (tomorrow?).
> If you think it can still make it, assuming the rest of the series is good,
> then I can follow up with v4 today/tomorrow. Thoughts?

I would say, wait and see if a v4 is needed first. These changes are
relatively easy to fold in.

C.



> 
> 	Joao
> 



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 04/13] vfio/common: Add VFIOBitmap and alloc function
  2023-03-04  1:43 ` [PATCH v3 04/13] vfio/common: Add VFIOBitmap and alloc function Joao Martins
@ 2023-03-06 13:20   ` Cédric Le Goater
  2023-03-06 14:37     ` Joao Martins
  0 siblings, 1 reply; 51+ messages in thread
From: Cédric Le Goater @ 2023-03-06 13:20 UTC (permalink / raw)
  To: Joao Martins, qemu-devel
  Cc: Alex Williamson, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta, Avihai Horon

On 3/4/23 02:43, Joao Martins wrote:
> From: Avihai Horon <avihaih@nvidia.com>
> 
> There are already two places where dirty page bitmap allocation and
> calculations are done in open code. With device dirty page tracking
> being added in the next patches, there are going to be even more places.
> 
> To avoid code duplication, introduce VFIOBitmap struct and corresponding
> alloc function and use them where applicable.
> 
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>

One minor comment, only in case you respin,

Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.

> ---
>   hw/vfio/common.c | 75 +++++++++++++++++++++++++++++-------------------
>   1 file changed, 46 insertions(+), 29 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 4c801513136a..151e7f40b73d 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -320,6 +320,27 @@ const MemoryRegionOps vfio_region_ops = {
>    * Device state interfaces
>    */
>   
> +typedef struct {
> +    unsigned long *bitmap;
> +    hwaddr size;
> +    hwaddr pages;
> +} VFIOBitmap;
> +
> +static int vfio_bitmap_alloc(VFIOBitmap *vbmap, hwaddr size)
> +{
> +    vbmap->pages = REAL_HOST_PAGE_ALIGN(size) / qemu_real_host_page_size();
> +    vbmap->size = ROUND_UP(vbmap->pages, sizeof(__u64) * BITS_PER_BYTE) /
> +                                         BITS_PER_BYTE;
> +    vbmap->bitmap = g_try_malloc0(vbmap->size);
> +    if (!vbmap->bitmap) {
> +        errno = ENOMEM;
> +
> +        return -errno;

vfio_bitmap_alloc() could simply return ENOMEM now.

> +    }
> +
> +    return 0;
> +}
> +
>   bool vfio_mig_active(void)
>   {
>       VFIOGroup *group;
> @@ -468,9 +489,14 @@ static int vfio_dma_unmap_bitmap(VFIOContainer *container,
>   {
>       struct vfio_iommu_type1_dma_unmap *unmap;
>       struct vfio_bitmap *bitmap;
> -    uint64_t pages = REAL_HOST_PAGE_ALIGN(size) / qemu_real_host_page_size();
> +    VFIOBitmap vbmap;
>       int ret;
>   
> +    ret = vfio_bitmap_alloc(&vbmap, size);
> +    if (ret) {
> +        return -errno;
> +    }
> +
>       unmap = g_malloc0(sizeof(*unmap) + sizeof(*bitmap));
>   
>       unmap->argsz = sizeof(*unmap) + sizeof(*bitmap);
> @@ -484,35 +510,28 @@ static int vfio_dma_unmap_bitmap(VFIOContainer *container,
>        * qemu_real_host_page_size to mark those dirty. Hence set bitmap_pgsize
>        * to qemu_real_host_page_size.
>        */
> -
>       bitmap->pgsize = qemu_real_host_page_size();
> -    bitmap->size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) /
> -                   BITS_PER_BYTE;
> +    bitmap->size = vbmap.size;
> +    bitmap->data = (__u64 *)vbmap.bitmap;
>   
> -    if (bitmap->size > container->max_dirty_bitmap_size) {
> -        error_report("UNMAP: Size of bitmap too big 0x%"PRIx64,
> -                     (uint64_t)bitmap->size);
> +    if (vbmap.size > container->max_dirty_bitmap_size) {
> +        error_report("UNMAP: Size of bitmap too big 0x%"PRIx64, vbmap.size);
>           ret = -E2BIG;
>           goto unmap_exit;
>       }
>   
> -    bitmap->data = g_try_malloc0(bitmap->size);
> -    if (!bitmap->data) {
> -        ret = -ENOMEM;
> -        goto unmap_exit;
> -    }
> -
>       ret = ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, unmap);
>       if (!ret) {
> -        cpu_physical_memory_set_dirty_lebitmap((unsigned long *)bitmap->data,
> -                iotlb->translated_addr, pages);
> +        cpu_physical_memory_set_dirty_lebitmap(vbmap.bitmap,
> +                iotlb->translated_addr, vbmap.pages);
>       } else {
>           error_report("VFIO_UNMAP_DMA with DIRTY_BITMAP : %m");
>       }
>   
> -    g_free(bitmap->data);
>   unmap_exit:
>       g_free(unmap);
> +    g_free(vbmap.bitmap);
> +
>       return ret;
>   }
>   
> @@ -1329,7 +1348,7 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
>   {
>       struct vfio_iommu_type1_dirty_bitmap *dbitmap;
>       struct vfio_iommu_type1_dirty_bitmap_get *range;
> -    uint64_t pages;
> +    VFIOBitmap vbmap;
>       int ret;
>   
>       if (!container->dirty_pages_supported) {
> @@ -1339,6 +1358,11 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
>           return 0;
>       }
>   
> +    ret = vfio_bitmap_alloc(&vbmap, size);
> +    if (ret) {
> +        return -errno;
> +    }
> +
>       dbitmap = g_malloc0(sizeof(*dbitmap) + sizeof(*range));
>   
>       dbitmap->argsz = sizeof(*dbitmap) + sizeof(*range);
> @@ -1353,15 +1377,8 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
>        * to qemu_real_host_page_size.
>        */
>       range->bitmap.pgsize = qemu_real_host_page_size();
> -
> -    pages = REAL_HOST_PAGE_ALIGN(range->size) / qemu_real_host_page_size();
> -    range->bitmap.size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) /
> -                                         BITS_PER_BYTE;
> -    range->bitmap.data = g_try_malloc0(range->bitmap.size);
> -    if (!range->bitmap.data) {
> -        ret = -ENOMEM;
> -        goto err_out;
> -    }
> +    range->bitmap.size = vbmap.size;
> +    range->bitmap.data = (__u64 *)vbmap.bitmap;
>   
>       ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, dbitmap);
>       if (ret) {
> @@ -1372,14 +1389,14 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
>           goto err_out;
>       }
>   
> -    cpu_physical_memory_set_dirty_lebitmap((unsigned long *)range->bitmap.data,
> -                                            ram_addr, pages);
> +    cpu_physical_memory_set_dirty_lebitmap(vbmap.bitmap, ram_addr,
> +                                           vbmap.pages);
>   
>       trace_vfio_get_dirty_bitmap(container->fd, range->iova, range->size,
>                                   range->bitmap.size, ram_addr);
>   err_out:
> -    g_free(range->bitmap.data);
>       g_free(dbitmap);
> +    g_free(vbmap.bitmap);
>   
>       return ret;
>   }




* Re: [PATCH v3 05/13] vfio/common: Add helper to validate iova/end against hostwin
  2023-03-04  1:43 ` [PATCH v3 05/13] vfio/common: Add helper to validate iova/end against hostwin Joao Martins
@ 2023-03-06 13:24   ` Cédric Le Goater
  0 siblings, 0 replies; 51+ messages in thread
From: Cédric Le Goater @ 2023-03-06 13:24 UTC (permalink / raw)
  To: Joao Martins, qemu-devel
  Cc: Alex Williamson, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta

On 3/4/23 02:43, Joao Martins wrote:
> In preparation for use in device dirty tracking, move into a helper the code
> that finds the container host DMA window for an iova range.  This avoids
> duplicating the common checks across listener callbacks.
> 
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>

Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.

> ---
>   hw/vfio/common.c | 38 ++++++++++++++++++++------------------
>   1 file changed, 20 insertions(+), 18 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 151e7f40b73d..80f3a1c44a01 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -903,6 +903,22 @@ static void vfio_unregister_ram_discard_listener(VFIOContainer *container,
>       g_free(vrdl);
>   }
>   
> +static VFIOHostDMAWindow *vfio_find_hostwin(VFIOContainer *container,
> +                                            hwaddr iova, hwaddr end)
> +{
> +    VFIOHostDMAWindow *hostwin;
> +    bool hostwin_found = false;
> +
> +    QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
> +        if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
> +            hostwin_found = true;
> +            break;
> +        }
> +    }
> +
> +    return hostwin_found ? hostwin : NULL;
> +}
> +
>   static bool vfio_known_safe_misalignment(MemoryRegionSection *section)
>   {
>       MemoryRegion *mr = section->mr;
> @@ -928,7 +944,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
>       void *vaddr;
>       int ret;
>       VFIOHostDMAWindow *hostwin;
> -    bool hostwin_found;
>       Error *err = NULL;
>   
>       if (vfio_listener_skipped_section(section)) {
> @@ -1029,15 +1044,8 @@ static void vfio_listener_region_add(MemoryListener *listener,
>   #endif
>       }
>   
> -    hostwin_found = false;
> -    QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
> -        if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
> -            hostwin_found = true;
> -            break;
> -        }
> -    }
> -
> -    if (!hostwin_found) {
> +    hostwin = vfio_find_hostwin(container, iova, end);
> +    if (!hostwin) {
>           error_setg(&err, "Container %p can't map guest IOVA region"
>                      " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
>           goto fail;
> @@ -1239,15 +1247,9 @@ static void vfio_listener_region_del(MemoryListener *listener,
>       if (memory_region_is_ram_device(section->mr)) {
>           hwaddr pgmask;
>           VFIOHostDMAWindow *hostwin;
> -        bool hostwin_found = false;
>   
> -        QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
> -            if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
> -                hostwin_found = true;
> -                break;
> -            }
> -        }
> -        assert(hostwin_found); /* or region_add() would have failed */
> +        hostwin = vfio_find_hostwin(container, iova, end);
> +        assert(hostwin); /* or region_add() would have failed */
>   
>           pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
>           try_unmap = !((iova & pgmask) || (int128_get64(llsize) & pgmask));




* Re: [PATCH v3 06/13] vfio/common: Consolidate skip/invalid section into helper
  2023-03-04  1:43 ` [PATCH v3 06/13] vfio/common: Consolidate skip/invalid section into helper Joao Martins
@ 2023-03-06 13:33   ` Cédric Le Goater
  0 siblings, 0 replies; 51+ messages in thread
From: Cédric Le Goater @ 2023-03-06 13:33 UTC (permalink / raw)
  To: Joao Martins, qemu-devel
  Cc: Alex Williamson, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta

On 3/4/23 02:43, Joao Martins wrote:
> The checks are replicated across region_add and region_del
> and will soon be added in another memory listener dedicated
> to dirty tracking.
> 
> Move these into a new helper to avoid duplication.
> 
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>   hw/vfio/common.c | 52 +++++++++++++++++++-----------------------------
>   1 file changed, 21 insertions(+), 31 deletions(-)

LGTM, it is a valid change even without adding migration support.

Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 80f3a1c44a01..ed908e303dbb 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -935,23 +935,14 @@ static bool vfio_known_safe_misalignment(MemoryRegionSection *section)
>       return true;
>   }
>   
> -static void vfio_listener_region_add(MemoryListener *listener,
> -                                     MemoryRegionSection *section)
> +static bool vfio_listener_valid_section(MemoryRegionSection *section)
>   {
> -    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
> -    hwaddr iova, end;
> -    Int128 llend, llsize;
> -    void *vaddr;
> -    int ret;
> -    VFIOHostDMAWindow *hostwin;
> -    Error *err = NULL;
> -
>       if (vfio_listener_skipped_section(section)) {
>           trace_vfio_listener_region_add_skip(
>                   section->offset_within_address_space,
>                   section->offset_within_address_space +
>                   int128_get64(int128_sub(section->size, int128_one())));
> -        return;
> +        return false;
>       }
>   
>       if (unlikely((section->offset_within_address_space &
> @@ -966,6 +957,24 @@ static void vfio_listener_region_add(MemoryListener *listener,
>                            section->offset_within_region,
>                            qemu_real_host_page_size());
>           }
> +        return false;
> +    }
> +
> +    return true;
> +}
> +
> +static void vfio_listener_region_add(MemoryListener *listener,
> +                                     MemoryRegionSection *section)
> +{
> +    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
> +    hwaddr iova, end;
> +    Int128 llend, llsize;
> +    void *vaddr;
> +    int ret;
> +    VFIOHostDMAWindow *hostwin;
> +    Error *err = NULL;
> +
> +    if (!vfio_listener_valid_section(section)) {
>           return;
>       }
>   
> @@ -1184,26 +1193,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
>       int ret;
>       bool try_unmap = true;
>   
> -    if (vfio_listener_skipped_section(section)) {
> -        trace_vfio_listener_region_del_skip(
> -                section->offset_within_address_space,
> -                section->offset_within_address_space +
> -                int128_get64(int128_sub(section->size, int128_one())));
> -        return;
> -    }
> -
> -    if (unlikely((section->offset_within_address_space &
> -                  ~qemu_real_host_page_mask()) !=
> -                 (section->offset_within_region & ~qemu_real_host_page_mask()))) {
> -        if (!vfio_known_safe_misalignment(section)) {
> -            error_report("%s received unaligned region %s iova=0x%"PRIx64
> -                         " offset_within_region=0x%"PRIx64
> -                         " qemu_real_host_page_size=0x%"PRIxPTR,
> -                         __func__, memory_region_name(section->mr),
> -                         section->offset_within_address_space,
> -                         section->offset_within_region,
> -                         qemu_real_host_page_size());
> -        }
> +    if (!vfio_listener_valid_section(section)) {
>           return;
>       }
>   




* Re: [PATCH v3 07/13] vfio/common: Record DMA mapped IOVA ranges
  2023-03-04  1:43 ` [PATCH v3 07/13] vfio/common: Record DMA mapped IOVA ranges Joao Martins
@ 2023-03-06 13:41   ` Cédric Le Goater
  2023-03-06 14:37     ` Joao Martins
  2023-03-06 18:05   ` Cédric Le Goater
  2023-03-06 18:15   ` Alex Williamson
  2 siblings, 1 reply; 51+ messages in thread
From: Cédric Le Goater @ 2023-03-06 13:41 UTC (permalink / raw)
  To: Joao Martins, qemu-devel
  Cc: Alex Williamson, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta, Avihai Horon

On 3/4/23 02:43, Joao Martins wrote:
> According to the device DMA logging uAPI, IOVA ranges to be logged by
> the device must be provided all at once upon DMA logging start.
> 
> As preparation for the following patches which will add device dirty
> page tracking, keep a record of all DMA mapped IOVA ranges so later they
> can be used for DMA logging start.
> 
> Note that when vIOMMU is enabled DMA mapped IOVA ranges are not tracked.
> This is due to the dynamic nature of vIOMMU DMA mapping/unmapping.
> 
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>   hw/vfio/common.c              | 84 +++++++++++++++++++++++++++++++++++
>   hw/vfio/trace-events          |  1 +
>   include/hw/vfio/vfio-common.h | 11 +++++
>   3 files changed, 96 insertions(+)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index ed908e303dbb..d84e5fd86bb4 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -44,6 +44,7 @@
>   #include "migration/blocker.h"
>   #include "migration/qemu-file.h"
>   #include "sysemu/tpm.h"
> +#include "qemu/iova-tree.h"
>   
>   VFIOGroupList vfio_group_list =
>       QLIST_HEAD_INITIALIZER(vfio_group_list);
> @@ -1313,11 +1314,94 @@ static int vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
>       return ret;
>   }
>   
> +/*
> + * Called for the dirty tracking memory listener to calculate the iova/end
> + * for a given memory region section. The checks here replicate the logic
> + * in vfio_listener_region_{add,del}() used for the same purpose, and thus
> + * both listeners should be kept in sync.
> + */
> +static bool vfio_get_section_iova_range(VFIOContainer *container,
> +                                        MemoryRegionSection *section,
> +                                        hwaddr *out_iova, hwaddr *out_end)
> +{
> +    Int128 llend;
> +    hwaddr iova;
> +
> +    iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space);
> +    llend = int128_make64(section->offset_within_address_space);
> +    llend = int128_add(llend, section->size);
> +    llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask()));
> +
> +    if (int128_ge(int128_make64(iova), llend)) {
> +        return false;
> +    }
> +
> +    *out_iova = iova;
> +    *out_end = int128_get64(llend) - 1;
> +    return true;
> +}
> +
> +static void vfio_dirty_tracking_update(MemoryListener *listener,
> +                                       MemoryRegionSection *section)
> +{
> +    VFIOContainer *container = container_of(listener, VFIOContainer,
> +                                            tracking_listener);
> +    VFIODirtyTrackingRange *range = &container->tracking_range;
> +    hwaddr max32 = (1ULL << 32) - 1ULL;
> +    hwaddr iova, end;
> +
> +    if (!vfio_listener_valid_section(section) ||
> +        !vfio_get_section_iova_range(container, section, &iova, &end)) {
> +        return;
> +    }
> +
> +    WITH_QEMU_LOCK_GUARD(&container->tracking_mutex) {
> +        if (iova < max32 && end <= max32) {
> +                if (range->min32 > iova) {
> +                    range->min32 = iova;
> +                }
> +                if (range->max32 < end) {
> +                    range->max32 = end;
> +                }
> +                trace_vfio_device_dirty_tracking_update(iova, end,
> +                                            range->min32, range->max32);
> +        } else {
> +                if (!range->min64 || range->min64 > iova) {
> +                    range->min64 = iova;
> +                }
> +                if (range->max64 < end) {
> +                    range->max64 = end;
> +                }
> +                trace_vfio_device_dirty_tracking_update(iova, end,
> +                                            range->min64, range->max64);
> +        }
> +    }
> +    return;
> +}
> +
> +static const MemoryListener vfio_dirty_tracking_listener = {
> +    .name = "vfio-tracking",
> +    .region_add = vfio_dirty_tracking_update,
> +};
> +
> +static void vfio_dirty_tracking_init(VFIOContainer *container)
> +{
> +    memset(&container->tracking_range, 0, sizeof(container->tracking_range));
> +    qemu_mutex_init(&container->tracking_mutex);
> +    container->tracking_listener = vfio_dirty_tracking_listener;
> +    memory_listener_register(&container->tracking_listener,
> +                             container->space->as);

The following unregister+destroy calls seem to belong to a _fini routine.
Am I missing something ?

Thanks,

C.

> +    memory_listener_unregister(&container->tracking_listener);
> +    qemu_mutex_destroy(&container->tracking_mutex);
> +}
> +
>   static void vfio_listener_log_global_start(MemoryListener *listener)
>   {
>       VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>       int ret;
>   
> +    vfio_dirty_tracking_init(container);
> +
>       ret = vfio_set_dirty_page_tracking(container, true);
>       if (ret) {
>           vfio_set_migration_error(ret);
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index 669d9fe07cd9..d97a6de17921 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -104,6 +104,7 @@ vfio_known_safe_misalignment(const char *name, uint64_t iova, uint64_t offset_wi
>   vfio_listener_region_add_no_dma_map(const char *name, uint64_t iova, uint64_t size, uint64_t page_size) "Region \"%s\" 0x%"PRIx64" size=0x%"PRIx64" is not aligned to 0x%"PRIx64" and cannot be mapped for DMA"
>   vfio_listener_region_del_skip(uint64_t start, uint64_t end) "SKIPPING region_del 0x%"PRIx64" - 0x%"PRIx64
>   vfio_listener_region_del(uint64_t start, uint64_t end) "region_del 0x%"PRIx64" - 0x%"PRIx64
> +vfio_device_dirty_tracking_update(uint64_t start, uint64_t end, uint64_t min, uint64_t max) "section 0x%"PRIx64" - 0x%"PRIx64" -> update [0x%"PRIx64" - 0x%"PRIx64"]"
>   vfio_disconnect_container(int fd) "close container->fd=%d"
>   vfio_put_group(int fd) "close group->fd=%d"
>   vfio_get_device(const char * name, unsigned int flags, unsigned int num_regions, unsigned int num_irqs) "Device %s flags: %u, regions: %u, irqs: %u"
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 87524c64a443..96791add2719 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -23,6 +23,7 @@
>   
>   #include "exec/memory.h"
>   #include "qemu/queue.h"
> +#include "qemu/iova-tree.h"
>   #include "qemu/notify.h"
>   #include "ui/console.h"
>   #include "hw/display/ramfb.h"
> @@ -68,6 +69,13 @@ typedef struct VFIOMigration {
>       size_t data_buffer_size;
>   } VFIOMigration;
>   
> +typedef struct VFIODirtyTrackingRange {
> +    hwaddr min32;
> +    hwaddr max32;
> +    hwaddr min64;
> +    hwaddr max64;
> +} VFIODirtyTrackingRange;
> +
>   typedef struct VFIOAddressSpace {
>       AddressSpace *as;
>       QLIST_HEAD(, VFIOContainer) containers;
> @@ -89,6 +97,9 @@ typedef struct VFIOContainer {
>       uint64_t max_dirty_bitmap_size;
>       unsigned long pgsizes;
>       unsigned int dma_max_mappings;
> +    VFIODirtyTrackingRange tracking_range;
> +    QemuMutex tracking_mutex;
> +    MemoryListener tracking_listener;
>       QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
>       QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
>       QLIST_HEAD(, VFIOGroup) group_list;




* Re: [PATCH v3 07/13] vfio/common: Record DMA mapped IOVA ranges
  2023-03-06 13:41   ` Cédric Le Goater
@ 2023-03-06 14:37     ` Joao Martins
  2023-03-06 15:11       ` Alex Williamson
  0 siblings, 1 reply; 51+ messages in thread
From: Joao Martins @ 2023-03-06 14:37 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Alex Williamson, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta, Avihai Horon, qemu-devel

On 06/03/2023 13:41, Cédric Le Goater wrote:
> On 3/4/23 02:43, Joao Martins wrote:
>> According to the device DMA logging uAPI, IOVA ranges to be logged by
>> the device must be provided all at once upon DMA logging start.
>>
>> As preparation for the following patches which will add device dirty
>> page tracking, keep a record of all DMA mapped IOVA ranges so later they
>> can be used for DMA logging start.
>>
>> Note that when vIOMMU is enabled DMA mapped IOVA ranges are not tracked.
>> This is due to the dynamic nature of vIOMMU DMA mapping/unmapping.
>>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>>   hw/vfio/common.c              | 84 +++++++++++++++++++++++++++++++++++
>>   hw/vfio/trace-events          |  1 +
>>   include/hw/vfio/vfio-common.h | 11 +++++
>>   3 files changed, 96 insertions(+)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index ed908e303dbb..d84e5fd86bb4 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -44,6 +44,7 @@
>>   #include "migration/blocker.h"
>>   #include "migration/qemu-file.h"
>>   #include "sysemu/tpm.h"
>> +#include "qemu/iova-tree.h"
>>     VFIOGroupList vfio_group_list =
>>       QLIST_HEAD_INITIALIZER(vfio_group_list);
>> @@ -1313,11 +1314,94 @@ static int vfio_set_dirty_page_tracking(VFIOContainer
>> *container, bool start)
>>       return ret;
>>   }
>>   +/*
>> + * Called for the dirty tracking memory listener to calculate the iova/end
>> + * for a given memory region section. The checks here replicate the logic
>> + * in vfio_listener_region_{add,del}() used for the same purpose, and thus
>> + * both listeners should be kept in sync.
>> + */
>> +static bool vfio_get_section_iova_range(VFIOContainer *container,
>> +                                        MemoryRegionSection *section,
>> +                                        hwaddr *out_iova, hwaddr *out_end)
>> +{
>> +    Int128 llend;
>> +    hwaddr iova;
>> +
>> +    iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space);
>> +    llend = int128_make64(section->offset_within_address_space);
>> +    llend = int128_add(llend, section->size);
>> +    llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask()));
>> +
>> +    if (int128_ge(int128_make64(iova), llend)) {
>> +        return false;
>> +    }
>> +
>> +    *out_iova = iova;
>> +    *out_end = int128_get64(llend) - 1;
>> +    return true;
>> +}
>> +
>> +static void vfio_dirty_tracking_update(MemoryListener *listener,
>> +                                       MemoryRegionSection *section)
>> +{
>> +    VFIOContainer *container = container_of(listener, VFIOContainer,
>> +                                            tracking_listener);
>> +    VFIODirtyTrackingRange *range = &container->tracking_range;
>> +    hwaddr max32 = (1ULL << 32) - 1ULL;
>> +    hwaddr iova, end;
>> +
>> +    if (!vfio_listener_valid_section(section) ||
>> +        !vfio_get_section_iova_range(container, section, &iova, &end)) {
>> +        return;
>> +    }
>> +
>> +    WITH_QEMU_LOCK_GUARD(&container->tracking_mutex) {
>> +        if (iova < max32 && end <= max32) {
>> +                if (range->min32 > iova) {
>> +                    range->min32 = iova;
>> +                }
>> +                if (range->max32 < end) {
>> +                    range->max32 = end;
>> +                }
>> +                trace_vfio_device_dirty_tracking_update(iova, end,
>> +                                            range->min32, range->max32);
>> +        } else {
>> +                if (!range->min64 || range->min64 > iova) {
>> +                    range->min64 = iova;
>> +                }
>> +                if (range->max64 < end) {
>> +                    range->max64 = end;
>> +                }
>> +                trace_vfio_device_dirty_tracking_update(iova, end,
>> +                                            range->min64, range->max64);
>> +        }
>> +    }
>> +    return;
>> +}
>> +
>> +static const MemoryListener vfio_dirty_tracking_listener = {
>> +    .name = "vfio-tracking",
>> +    .region_add = vfio_dirty_tracking_update,
>> +};
>> +
>> +static void vfio_dirty_tracking_init(VFIOContainer *container)
>> +{
>> +    memset(&container->tracking_range, 0, sizeof(container->tracking_range));
>> +    qemu_mutex_init(&container->tracking_mutex);
>> +    container->tracking_listener = vfio_dirty_tracking_listener;
>> +    memory_listener_register(&container->tracking_listener,
>> +                             container->space->as);
> 
> The following unregister+destroy calls seem to belong to a _fini routine.
> Am I missing something ?
> 
The thinking is that once we register the memory listener, it will iterate
over all the sections, and once that is finished memory_listener_register()
returns. So the state we initialize here isn't needed anywhere else other
than to create the range, hence we destroy it right away. It used to be in
container_init(), but that was unnecessary overhead to keep around given
it's *only* needed when we start/stop dirty tracking.

So the reason I don't add a _fini method is that there's no need to tear
down the state anywhere else other than in this function.

I would argue that maybe I don't need the mutex at all, as this is all
serialized...

> Thanks,
> 
> C.
> 
>> +    memory_listener_unregister(&container->tracking_listener);
>> +    qemu_mutex_destroy(&container->tracking_mutex);
>> +}
>> +
>>   static void vfio_listener_log_global_start(MemoryListener *listener)
>>   {
>>       VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>>       int ret;
>>   +    vfio_dirty_tracking_init(container);
>> +
>>       ret = vfio_set_dirty_page_tracking(container, true);
>>       if (ret) {
>>           vfio_set_migration_error(ret);
>> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
>> index 669d9fe07cd9..d97a6de17921 100644
>> --- a/hw/vfio/trace-events
>> +++ b/hw/vfio/trace-events
>> @@ -104,6 +104,7 @@ vfio_known_safe_misalignment(const char *name, uint64_t
>> iova, uint64_t offset_wi
>>   vfio_listener_region_add_no_dma_map(const char *name, uint64_t iova,
>> uint64_t size, uint64_t page_size) "Region \"%s\" 0x%"PRIx64" size=0x%"PRIx64"
>> is not aligned to 0x%"PRIx64" and cannot be mapped for DMA"
>>   vfio_listener_region_del_skip(uint64_t start, uint64_t end) "SKIPPING
>> region_del 0x%"PRIx64" - 0x%"PRIx64
>>   vfio_listener_region_del(uint64_t start, uint64_t end) "region_del
>> 0x%"PRIx64" - 0x%"PRIx64
>> +vfio_device_dirty_tracking_update(uint64_t start, uint64_t end, uint64_t min,
>> uint64_t max) "section 0x%"PRIx64" - 0x%"PRIx64" -> update [0x%"PRIx64" -
>> 0x%"PRIx64"]"
>>   vfio_disconnect_container(int fd) "close container->fd=%d"
>>   vfio_put_group(int fd) "close group->fd=%d"
>>   vfio_get_device(const char * name, unsigned int flags, unsigned int
>> num_regions, unsigned int num_irqs) "Device %s flags: %u, regions: %u, irqs: %u"
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index 87524c64a443..96791add2719 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -23,6 +23,7 @@
>>     #include "exec/memory.h"
>>   #include "qemu/queue.h"
>> +#include "qemu/iova-tree.h"
>>   #include "qemu/notify.h"
>>   #include "ui/console.h"
>>   #include "hw/display/ramfb.h"
>> @@ -68,6 +69,13 @@ typedef struct VFIOMigration {
>>       size_t data_buffer_size;
>>   } VFIOMigration;
>>   +typedef struct VFIODirtyTrackingRange {
>> +    hwaddr min32;
>> +    hwaddr max32;
>> +    hwaddr min64;
>> +    hwaddr max64;
>> +} VFIODirtyTrackingRange;
>> +
>>   typedef struct VFIOAddressSpace {
>>       AddressSpace *as;
>>       QLIST_HEAD(, VFIOContainer) containers;
>> @@ -89,6 +97,9 @@ typedef struct VFIOContainer {
>>       uint64_t max_dirty_bitmap_size;
>>       unsigned long pgsizes;
>>       unsigned int dma_max_mappings;
>> +    VFIODirtyTrackingRange tracking_range;
>> +    QemuMutex tracking_mutex;
>> +    MemoryListener tracking_listener;
>>       QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
>>       QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
>>       QLIST_HEAD(, VFIOGroup) group_list;
> 


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 04/13] vfio/common: Add VFIOBitmap and alloc function
  2023-03-06 13:20   ` Cédric Le Goater
@ 2023-03-06 14:37     ` Joao Martins
  0 siblings, 0 replies; 51+ messages in thread
From: Joao Martins @ 2023-03-06 14:37 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: Alex Williamson, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta, Avihai Horon



On 06/03/2023 13:20, Cédric Le Goater wrote:
> On 3/4/23 02:43, Joao Martins wrote:
>> From: Avihai Horon <avihaih@nvidia.com>
>>
>> There are already two places where dirty page bitmap allocation and
>> calculations are done in open code. With device dirty page tracking
>> being added in the next patches, there are going to be even more places.
>>
>> To avoid code duplication, introduce VFIOBitmap struct and corresponding
>> alloc function and use them where applicable.
>>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> 
> One minor comment, only in case you respin,
> 
> Reviewed-by: Cédric Le Goater <clg@redhat.com>
> 

Thanks!

>>   hw/vfio/common.c | 75 +++++++++++++++++++++++++++++-------------------
>>   1 file changed, 46 insertions(+), 29 deletions(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 4c801513136a..151e7f40b73d 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -320,6 +320,27 @@ const MemoryRegionOps vfio_region_ops = {
>>    * Device state interfaces
>>    */
>>   +typedef struct {
>> +    unsigned long *bitmap;
>> +    hwaddr size;
>> +    hwaddr pages;
>> +} VFIOBitmap;
>> +
>> +static int vfio_bitmap_alloc(VFIOBitmap *vbmap, hwaddr size)
>> +{
>> +    vbmap->pages = REAL_HOST_PAGE_ALIGN(size) / qemu_real_host_page_size();
>> +    vbmap->size = ROUND_UP(vbmap->pages, sizeof(__u64) * BITS_PER_BYTE) /
>> +                                         BITS_PER_BYTE;
>> +    vbmap->bitmap = g_try_malloc0(vbmap->size);
>> +    if (!vbmap->bitmap) {
>> +        errno = ENOMEM;
>> +
>> +        return -errno;
> 
> vfio_bitmap_alloc() could simply return ENOMEM now.
> 
Gotcha.

>> +    }
>> +
>> +    return 0;
>> +}
>> +
>>   bool vfio_mig_active(void)
>>   {
>>       VFIOGroup *group;
>> @@ -468,9 +489,14 @@ static int vfio_dma_unmap_bitmap(VFIOContainer *container,
>>   {
>>       struct vfio_iommu_type1_dma_unmap *unmap;
>>       struct vfio_bitmap *bitmap;
>> -    uint64_t pages = REAL_HOST_PAGE_ALIGN(size) / qemu_real_host_page_size();
>> +    VFIOBitmap vbmap;
>>       int ret;
>>   +    ret = vfio_bitmap_alloc(&vbmap, size);
>> +    if (ret) {
>> +        return -errno;
>> +    }
>> +
>>       unmap = g_malloc0(sizeof(*unmap) + sizeof(*bitmap));
>>         unmap->argsz = sizeof(*unmap) + sizeof(*bitmap);
>> @@ -484,35 +510,28 @@ static int vfio_dma_unmap_bitmap(VFIOContainer *container,
>>        * qemu_real_host_page_size to mark those dirty. Hence set bitmap_pgsize
>>        * to qemu_real_host_page_size.
>>        */
>> -
>>       bitmap->pgsize = qemu_real_host_page_size();
>> -    bitmap->size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) /
>> -                   BITS_PER_BYTE;
>> +    bitmap->size = vbmap.size;
>> +    bitmap->data = (__u64 *)vbmap.bitmap;
>>   -    if (bitmap->size > container->max_dirty_bitmap_size) {
>> -        error_report("UNMAP: Size of bitmap too big 0x%"PRIx64,
>> -                     (uint64_t)bitmap->size);
>> +    if (vbmap.size > container->max_dirty_bitmap_size) {
>> +        error_report("UNMAP: Size of bitmap too big 0x%"PRIx64, vbmap.size);
>>           ret = -E2BIG;
>>           goto unmap_exit;
>>       }
>>   -    bitmap->data = g_try_malloc0(bitmap->size);
>> -    if (!bitmap->data) {
>> -        ret = -ENOMEM;
>> -        goto unmap_exit;
>> -    }
>> -
>>       ret = ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, unmap);
>>       if (!ret) {
>> -        cpu_physical_memory_set_dirty_lebitmap((unsigned long *)bitmap->data,
>> -                iotlb->translated_addr, pages);
>> +        cpu_physical_memory_set_dirty_lebitmap(vbmap.bitmap,
>> +                iotlb->translated_addr, vbmap.pages);
>>       } else {
>>           error_report("VFIO_UNMAP_DMA with DIRTY_BITMAP : %m");
>>       }
>>   -    g_free(bitmap->data);
>>   unmap_exit:
>>       g_free(unmap);
>> +    g_free(vbmap.bitmap);
>> +
>>       return ret;
>>   }
>>   @@ -1329,7 +1348,7 @@ static int vfio_get_dirty_bitmap(VFIOContainer
>> *container, uint64_t iova,
>>   {
>>       struct vfio_iommu_type1_dirty_bitmap *dbitmap;
>>       struct vfio_iommu_type1_dirty_bitmap_get *range;
>> -    uint64_t pages;
>> +    VFIOBitmap vbmap;
>>       int ret;
>>         if (!container->dirty_pages_supported) {
>> @@ -1339,6 +1358,11 @@ static int vfio_get_dirty_bitmap(VFIOContainer
>> *container, uint64_t iova,
>>           return 0;
>>       }
>>   +    ret = vfio_bitmap_alloc(&vbmap, size);
>> +    if (ret) {
>> +        return -errno;
>> +    }
>> +
>>       dbitmap = g_malloc0(sizeof(*dbitmap) + sizeof(*range));
>>         dbitmap->argsz = sizeof(*dbitmap) + sizeof(*range);
>> @@ -1353,15 +1377,8 @@ static int vfio_get_dirty_bitmap(VFIOContainer
>> *container, uint64_t iova,
>>        * to qemu_real_host_page_size.
>>        */
>>       range->bitmap.pgsize = qemu_real_host_page_size();
>> -
>> -    pages = REAL_HOST_PAGE_ALIGN(range->size) / qemu_real_host_page_size();
>> -    range->bitmap.size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) /
>> -                                         BITS_PER_BYTE;
>> -    range->bitmap.data = g_try_malloc0(range->bitmap.size);
>> -    if (!range->bitmap.data) {
>> -        ret = -ENOMEM;
>> -        goto err_out;
>> -    }
>> +    range->bitmap.size = vbmap.size;
>> +    range->bitmap.data = (__u64 *)vbmap.bitmap;
>>         ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, dbitmap);
>>       if (ret) {
>> @@ -1372,14 +1389,14 @@ static int vfio_get_dirty_bitmap(VFIOContainer
>> *container, uint64_t iova,
>>           goto err_out;
>>       }
>>   -    cpu_physical_memory_set_dirty_lebitmap((unsigned long
>> *)range->bitmap.data,
>> -                                            ram_addr, pages);
>> +    cpu_physical_memory_set_dirty_lebitmap(vbmap.bitmap, ram_addr,
>> +                                           vbmap.pages);
>>         trace_vfio_get_dirty_bitmap(container->fd, range->iova, range->size,
>>                                   range->bitmap.size, ram_addr);
>>   err_out:
>> -    g_free(range->bitmap.data);
>>       g_free(dbitmap);
>> +    g_free(vbmap.bitmap);
>>         return ret;
>>   }
> 



* Re: [PATCH v3 07/13] vfio/common: Record DMA mapped IOVA ranges
  2023-03-06 14:37     ` Joao Martins
@ 2023-03-06 15:11       ` Alex Williamson
  0 siblings, 0 replies; 51+ messages in thread
From: Alex Williamson @ 2023-03-06 15:11 UTC (permalink / raw)
  To: Joao Martins
  Cc: Cédric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon,
	qemu-devel

On Mon, 6 Mar 2023 14:37:04 +0000
Joao Martins <joao.m.martins@oracle.com> wrote:

> On 06/03/2023 13:41, Cédric Le Goater wrote:
> > On 3/4/23 02:43, Joao Martins wrote:  
> >> According to the device DMA logging uAPI, IOVA ranges to be logged by
> >> the device must be provided all at once upon DMA logging start.
> >>
> >> As preparation for the following patches which will add device dirty
> >> page tracking, keep a record of all DMA mapped IOVA ranges so later they
> >> can be used for DMA logging start.
> >>
> >> Note that when vIOMMU is enabled DMA mapped IOVA ranges are not tracked.
> >> This is due to the dynamic nature of vIOMMU DMA mapping/unmapping.
> >>
> >> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> >> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> >> ---
> >>   hw/vfio/common.c              | 84 +++++++++++++++++++++++++++++++++++
> >>   hw/vfio/trace-events          |  1 +
> >>   include/hw/vfio/vfio-common.h | 11 +++++
> >>   3 files changed, 96 insertions(+)
> >>
> >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> >> index ed908e303dbb..d84e5fd86bb4 100644
> >> --- a/hw/vfio/common.c
> >> +++ b/hw/vfio/common.c
> >> @@ -44,6 +44,7 @@
> >>   #include "migration/blocker.h"
> >>   #include "migration/qemu-file.h"
> >>   #include "sysemu/tpm.h"
> >> +#include "qemu/iova-tree.h"
> >>     VFIOGroupList vfio_group_list =
> >>       QLIST_HEAD_INITIALIZER(vfio_group_list);
> >> @@ -1313,11 +1314,94 @@ static int vfio_set_dirty_page_tracking(VFIOContainer
> >> *container, bool start)
> >>       return ret;
> >>   }
> >>   +/*
> >> + * Called for the dirty tracking memory listener to calculate the iova/end
> >> + * for a given memory region section. The checks here replicate the logic
> >> + * in vfio_listener_region_{add,del}() used for the same purpose. Thus,
> >> + * both listeners should be kept in sync.
> >> + */
> >> +static bool vfio_get_section_iova_range(VFIOContainer *container,
> >> +                                        MemoryRegionSection *section,
> >> +                                        hwaddr *out_iova, hwaddr *out_end)
> >> +{
> >> +    Int128 llend;
> >> +    hwaddr iova;
> >> +
> >> +    iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space);
> >> +    llend = int128_make64(section->offset_within_address_space);
> >> +    llend = int128_add(llend, section->size);
> >> +    llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask()));
> >> +
> >> +    if (int128_ge(int128_make64(iova), llend)) {
> >> +        return false;
> >> +    }
> >> +
> >> +    *out_iova = iova;
> >> +    *out_end = int128_get64(llend) - 1;
> >> +    return true;
> >> +}
> >> +
> >> +static void vfio_dirty_tracking_update(MemoryListener *listener,
> >> +                                       MemoryRegionSection *section)
> >> +{
> >> +    VFIOContainer *container = container_of(listener, VFIOContainer,
> >> +                                            tracking_listener);
> >> +    VFIODirtyTrackingRange *range = &container->tracking_range;
> >> +    hwaddr max32 = (1ULL << 32) - 1ULL;
> >> +    hwaddr iova, end;
> >> +
> >> +    if (!vfio_listener_valid_section(section) ||
> >> +        !vfio_get_section_iova_range(container, section, &iova, &end)) {
> >> +        return;
> >> +    }
> >> +
> >> +    WITH_QEMU_LOCK_GUARD(&container->tracking_mutex) {
> >> +        if (iova < max32 && end <= max32) {
> >> +                if (range->min32 > iova) {
> >> +                    range->min32 = iova;
> >> +                }
> >> +                if (range->max32 < end) {
> >> +                    range->max32 = end;
> >> +                }
> >> +                trace_vfio_device_dirty_tracking_update(iova, end,
> >> +                                            range->min32, range->max32);
> >> +        } else {
> >> +                if (!range->min64 || range->min64 > iova) {
> >> +                    range->min64 = iova;
> >> +                }
> >> +                if (range->max64 < end) {
> >> +                    range->max64 = end;
> >> +                }
> >> +                trace_vfio_device_dirty_tracking_update(iova, end,
> >> +                                            range->min64, range->max64);
> >> +        }
> >> +    }
> >> +    return;
> >> +}
> >> +
> >> +static const MemoryListener vfio_dirty_tracking_listener = {
> >> +    .name = "vfio-tracking",
> >> +    .region_add = vfio_dirty_tracking_update,
> >> +};
> >> +
> >> +static void vfio_dirty_tracking_init(VFIOContainer *container)
> >> +{
> >> +    memset(&container->tracking_range, 0, sizeof(container->tracking_range));
> >> +    qemu_mutex_init(&container->tracking_mutex);
> >> +    container->tracking_listener = vfio_dirty_tracking_listener;
> >> +    memory_listener_register(&container->tracking_listener,
> >> +                             container->space->as);  
> > 
> > The following unregister+destroy calls seem to belong to a _fini routine.
> > Am I missing something ?
> >   
> The thinking is that once we register the memory listener, it will iterate
> over all the sections, and once that is finished memory_listener_register()
> returns. So the state we initialize here isn't needed anywhere else other
> than to create the range, and hence we destroy it right away. It was at
> container_init(), but that was unnecessary overhead to keep around if it's
> *only* needed when we start/stop dirty tracking.
> 
> So the reason I don't add a _fini method is that there's no need to tear down
> the state anywhere other than in this function.
> 
> I would argue that maybe I don't need the mutex at all as this is all serialized...

Right, this is in line with my previous comments that we don't need to
keep the listener around since we don't expect changes to memory
regions, ex. virtio-mem slots are locked down during migration, and we
don't support removal of ranges from the device anyway.  We're done with
the listener after it's built our min/max ranges.  And yes, the mutex
seems superfluous.  Thanks,

Alex




* Re: [PATCH v3 09/13] vfio/common: Extract code from vfio_get_dirty_bitmap() to new function
  2023-03-04  1:43 ` [PATCH v3 09/13] vfio/common: Extract code from vfio_get_dirty_bitmap() to new function Joao Martins
@ 2023-03-06 16:24   ` Cédric Le Goater
  0 siblings, 0 replies; 51+ messages in thread
From: Cédric Le Goater @ 2023-03-06 16:24 UTC (permalink / raw)
  To: Joao Martins, qemu-devel
  Cc: Alex Williamson, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta, Avihai Horon

On 3/4/23 02:43, Joao Martins wrote:
> From: Avihai Horon <avihaih@nvidia.com>
> 
> Extract the VFIO_IOMMU_DIRTY_PAGES ioctl code in vfio_get_dirty_bitmap()
> to its own function.
> 
> This will help keep the code readable after the next patch adds
> device dirty page bitmap sync functionality.
> 
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>



Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.

> ---
>   hw/vfio/common.c | 57 +++++++++++++++++++++++++++++-------------------
>   1 file changed, 35 insertions(+), 22 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index aa0df0604704..b0c7d03279ab 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1579,26 +1579,13 @@ static void vfio_listener_log_global_stop(MemoryListener *listener)
>       }
>   }
>   
> -static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
> -                                 uint64_t size, ram_addr_t ram_addr)
> +static int vfio_query_dirty_bitmap(VFIOContainer *container, VFIOBitmap *vbmap,
> +                                   hwaddr iova, hwaddr size)
>   {
>       struct vfio_iommu_type1_dirty_bitmap *dbitmap;
>       struct vfio_iommu_type1_dirty_bitmap_get *range;
> -    VFIOBitmap vbmap;
>       int ret;
>   
> -    if (!container->dirty_pages_supported) {
> -        cpu_physical_memory_set_dirty_range(ram_addr, size,
> -                                            tcg_enabled() ? DIRTY_CLIENTS_ALL :
> -                                            DIRTY_CLIENTS_NOCODE);
> -        return 0;
> -    }
> -
> -    ret = vfio_bitmap_alloc(&vbmap, size);
> -    if (ret) {
> -        return -errno;
> -    }
> -
>       dbitmap = g_malloc0(sizeof(*dbitmap) + sizeof(*range));
>   
>       dbitmap->argsz = sizeof(*dbitmap) + sizeof(*range);
> @@ -1613,8 +1600,8 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
>        * to qemu_real_host_page_size.
>        */
>       range->bitmap.pgsize = qemu_real_host_page_size();
> -    range->bitmap.size = vbmap.size;
> -    range->bitmap.data = (__u64 *)vbmap.bitmap;
> +    range->bitmap.size = vbmap->size;
> +    range->bitmap.data = (__u64 *)vbmap->bitmap;
>   
>       ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, dbitmap);
>       if (ret) {
> @@ -1622,16 +1609,42 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
>           error_report("Failed to get dirty bitmap for iova: 0x%"PRIx64
>                   " size: 0x%"PRIx64" err: %d", (uint64_t)range->iova,
>                   (uint64_t)range->size, errno);
> -        goto err_out;
> +    }
> +
> +    g_free(dbitmap);
> +
> +    return ret;
> +}
> +
> +static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
> +                                 uint64_t size, ram_addr_t ram_addr)
> +{
> +    VFIOBitmap vbmap;
> +    int ret;
> +
> +    if (!container->dirty_pages_supported) {
> +        cpu_physical_memory_set_dirty_range(ram_addr, size,
> +                                            tcg_enabled() ? DIRTY_CLIENTS_ALL :
> +                                            DIRTY_CLIENTS_NOCODE);
> +        return 0;
> +    }
> +
> +    ret = vfio_bitmap_alloc(&vbmap, size);
> +    if (ret) {
> +        return -errno;
> +    }
> +
> +    ret = vfio_query_dirty_bitmap(container, &vbmap, iova, size);
> +    if (ret) {
> +        goto out;
>       }
>   
>       cpu_physical_memory_set_dirty_lebitmap(vbmap.bitmap, ram_addr,
>                                              vbmap.pages);
>   
> -    trace_vfio_get_dirty_bitmap(container->fd, range->iova, range->size,
> -                                range->bitmap.size, ram_addr);
> -err_out:
> -    g_free(dbitmap);
> +    trace_vfio_get_dirty_bitmap(container->fd, iova, size, vbmap.size,
> +                                ram_addr);
> +out:
>       g_free(vbmap.bitmap);
>   
>       return ret;




* Re: [PATCH v3 11/13] vfio/migration: Block migration with vIOMMU
  2023-03-04  1:43 ` [PATCH v3 11/13] vfio/migration: Block migration with vIOMMU Joao Martins
@ 2023-03-06 17:00   ` Cédric Le Goater
  2023-03-06 17:04     ` Joao Martins
  2023-03-06 19:42   ` Alex Williamson
  1 sibling, 1 reply; 51+ messages in thread
From: Cédric Le Goater @ 2023-03-06 17:00 UTC (permalink / raw)
  To: Joao Martins, qemu-devel
  Cc: Alex Williamson, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta

On 3/4/23 02:43, Joao Martins wrote:
> Migrating with vIOMMU will require either tracking the maximum
> IOMMU-supported address space (e.g. 39/48 address width on Intel)
> or range-tracking the current mappings and dirty-tracking the new
> ones after dirty tracking is started. This will be done as a separate
> series, so add a live migration blocker until that is fixed.
> 
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>   hw/vfio/common.c              | 51 +++++++++++++++++++++++++++++++++++
>   hw/vfio/migration.c           |  6 +++++
>   include/hw/vfio/vfio-common.h |  2 ++
>   3 files changed, 59 insertions(+)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 5b8456975e97..9b909f856722 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -365,6 +365,7 @@ bool vfio_mig_active(void)
>   }
>   
>   static Error *multiple_devices_migration_blocker;
> +static Error *giommu_migration_blocker;
>   
>   static unsigned int vfio_migratable_device_num(void)
>   {
> @@ -416,6 +417,56 @@ void vfio_unblock_multiple_devices_migration(void)
>       multiple_devices_migration_blocker = NULL;
>   }
>   
> +static unsigned int vfio_use_iommu_device_num(void)
> +{
> +    VFIOGroup *group;
> +    VFIODevice *vbasedev;
> +    unsigned int device_num = 0;
> +
> +    QLIST_FOREACH(group, &vfio_group_list, next) {
> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
> +            if (vbasedev->group->container->space->as !=
> +                                    &address_space_memory) {

Can't we avoid the second loop and test directly :

   group->container->space->as

?


The rest looks good. So,

Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.

> +                device_num++;
> +            }
> +        }
> +    }
> +
> +    return device_num;
> +}
> +
> +int vfio_block_giommu_migration(Error **errp)
> +{
> +    int ret;
> +
> +    if (giommu_migration_blocker ||
> +        !vfio_use_iommu_device_num()) {
> +        return 0;
> +    }
> +
> +    error_setg(&giommu_migration_blocker,
> +               "Migration is currently not supported with vIOMMU enabled");
> +    ret = migrate_add_blocker(giommu_migration_blocker, errp);
> +    if (ret < 0) {
> +        error_free(giommu_migration_blocker);
> +        giommu_migration_blocker = NULL;
> +    }
> +
> +    return ret;
> +}
> +
> +void vfio_unblock_giommu_migration(void)
> +{
> +    if (!giommu_migration_blocker ||
> +        vfio_use_iommu_device_num()) {
> +        return;
> +    }
> +
> +    migrate_del_blocker(giommu_migration_blocker);
> +    error_free(giommu_migration_blocker);
> +    giommu_migration_blocker = NULL;
> +}
> +
>   static void vfio_set_migration_error(int err)
>   {
>       MigrationState *ms = migrate_get_current();
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index a2c3d9bade7f..3e75868ae7a9 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -634,6 +634,11 @@ int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
>           return ret;
>       }
>   
> +    ret = vfio_block_giommu_migration(errp);
> +    if (ret) {
> +        return ret;
> +    }
> +
>       trace_vfio_migration_probe(vbasedev->name);
>       return 0;
>   
> @@ -659,6 +664,7 @@ void vfio_migration_finalize(VFIODevice *vbasedev)
>           unregister_savevm(VMSTATE_IF(vbasedev->dev), "vfio", vbasedev);
>           vfio_migration_exit(vbasedev);
>           vfio_unblock_multiple_devices_migration();
> +        vfio_unblock_giommu_migration();
>       }
>   
>       if (vbasedev->migration_blocker) {
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 1cbbccd91e11..38e44258925b 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -233,6 +233,8 @@ extern VFIOGroupList vfio_group_list;
>   bool vfio_mig_active(void);
>   int vfio_block_multiple_devices_migration(Error **errp);
>   void vfio_unblock_multiple_devices_migration(void);
> +int vfio_block_giommu_migration(Error **errp);
> +void vfio_unblock_giommu_migration(void);
>   int64_t vfio_mig_bytes_transferred(void);
>   
>   #ifdef CONFIG_LINUX




* Re: [PATCH v3 11/13] vfio/migration: Block migration with vIOMMU
  2023-03-06 17:00   ` Cédric Le Goater
@ 2023-03-06 17:04     ` Joao Martins
  0 siblings, 0 replies; 51+ messages in thread
From: Joao Martins @ 2023-03-06 17:04 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: Alex Williamson, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta

On 06/03/2023 17:00, Cédric Le Goater wrote:
> On 3/4/23 02:43, Joao Martins wrote:
>> Migrating with vIOMMU will require either tracking the maximum
>> IOMMU-supported address space (e.g. 39/48 address width on Intel)
>> or range-tracking the current mappings and dirty-tracking the new
>> ones after dirty tracking is started. This will be done as a separate
>> series, so add a live migration blocker until that is fixed.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>>   hw/vfio/common.c              | 51 +++++++++++++++++++++++++++++++++++
>>   hw/vfio/migration.c           |  6 +++++
>>   include/hw/vfio/vfio-common.h |  2 ++
>>   3 files changed, 59 insertions(+)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 5b8456975e97..9b909f856722 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -365,6 +365,7 @@ bool vfio_mig_active(void)
>>   }
>>     static Error *multiple_devices_migration_blocker;
>> +static Error *giommu_migration_blocker;
>>     static unsigned int vfio_migratable_device_num(void)
>>   {
>> @@ -416,6 +417,56 @@ void vfio_unblock_multiple_devices_migration(void)
>>       multiple_devices_migration_blocker = NULL;
>>   }
>>   +static unsigned int vfio_use_iommu_device_num(void)
>> +{
>> +    VFIOGroup *group;
>> +    VFIODevice *vbasedev;
>> +    unsigned int device_num = 0;
>> +
>> +    QLIST_FOREACH(group, &vfio_group_list, next) {
>> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
>> +            if (vbasedev->group->container->space->as !=
>> +                                    &address_space_memory) {
> 
> Can't we avoid the second loop and test directly :
> 
>   group->container->space->as
> 
> ?
>
Ah yes

> 
> The rest looks good. So,
> 
> Reviewed-by: Cédric Le Goater <clg@redhat.com>
> 
Thanks!

> Thanks,
> 
> C.
> 
>> +                device_num++;
>> +            }
>> +        }
>> +    }
>> +
>> +    return device_num;
>> +}
>> +
>> +int vfio_block_giommu_migration(Error **errp)
>> +{
>> +    int ret;
>> +
>> +    if (giommu_migration_blocker ||
>> +        !vfio_use_iommu_device_num()) {
>> +        return 0;
>> +    }
>> +
>> +    error_setg(&giommu_migration_blocker,
>> +               "Migration is currently not supported with vIOMMU enabled");
>> +    ret = migrate_add_blocker(giommu_migration_blocker, errp);
>> +    if (ret < 0) {
>> +        error_free(giommu_migration_blocker);
>> +        giommu_migration_blocker = NULL;
>> +    }
>> +
>> +    return ret;
>> +}
>> +
>> +void vfio_unblock_giommu_migration(void)
>> +{
>> +    if (!giommu_migration_blocker ||
>> +        vfio_use_iommu_device_num()) {
>> +        return;
>> +    }
>> +
>> +    migrate_del_blocker(giommu_migration_blocker);
>> +    error_free(giommu_migration_blocker);
>> +    giommu_migration_blocker = NULL;
>> +}
>> +
>>   static void vfio_set_migration_error(int err)
>>   {
>>       MigrationState *ms = migrate_get_current();
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index a2c3d9bade7f..3e75868ae7a9 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -634,6 +634,11 @@ int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
>>           return ret;
>>       }
>>   +    ret = vfio_block_giommu_migration(errp);
>> +    if (ret) {
>> +        return ret;
>> +    }
>> +
>>       trace_vfio_migration_probe(vbasedev->name);
>>       return 0;
>>   @@ -659,6 +664,7 @@ void vfio_migration_finalize(VFIODevice *vbasedev)
>>           unregister_savevm(VMSTATE_IF(vbasedev->dev), "vfio", vbasedev);
>>           vfio_migration_exit(vbasedev);
>>           vfio_unblock_multiple_devices_migration();
>> +        vfio_unblock_giommu_migration();
>>       }
>>         if (vbasedev->migration_blocker) {
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index 1cbbccd91e11..38e44258925b 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -233,6 +233,8 @@ extern VFIOGroupList vfio_group_list;
>>   bool vfio_mig_active(void);
>>   int vfio_block_multiple_devices_migration(Error **errp);
>>   void vfio_unblock_multiple_devices_migration(void);
>> +int vfio_block_giommu_migration(Error **errp);
>> +void vfio_unblock_giommu_migration(void);
>>   int64_t vfio_mig_bytes_transferred(void);
>>     #ifdef CONFIG_LINUX
> 



* Re: [PATCH v3 13/13] docs/devel: Document VFIO device dirty page tracking
  2023-03-04  1:43 ` [PATCH v3 13/13] docs/devel: Document VFIO device dirty page tracking Joao Martins
@ 2023-03-06 17:15   ` Cédric Le Goater
  2023-03-06 17:18     ` Joao Martins
  0 siblings, 1 reply; 51+ messages in thread
From: Cédric Le Goater @ 2023-03-06 17:15 UTC (permalink / raw)
  To: Joao Martins, qemu-devel
  Cc: Alex Williamson, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta, Avihai Horon

On 3/4/23 02:43, Joao Martins wrote:
> From: Avihai Horon <avihaih@nvidia.com>
> 
> Adjust the VFIO dirty page tracking documentation and add a section to
> describe device dirty page tracking.
> 
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>   docs/devel/vfio-migration.rst | 46 +++++++++++++++++++++++------------
>   1 file changed, 31 insertions(+), 15 deletions(-)
> 
> diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
> index c214c73e2818..1b68ccf11529 100644
> --- a/docs/devel/vfio-migration.rst
> +++ b/docs/devel/vfio-migration.rst
> @@ -59,22 +59,37 @@ System memory dirty pages tracking
>   ----------------------------------
>   
>   A ``log_global_start`` and ``log_global_stop`` memory listener callback informs
> -the VFIO IOMMU module to start and stop dirty page tracking. A ``log_sync``
> -memory listener callback marks those system memory pages as dirty which are
> -used for DMA by the VFIO device. The dirty pages bitmap is queried per
> -container. All pages pinned by the vendor driver through external APIs have to
> -be marked as dirty during migration. When there are CPU writes, CPU dirty page
> -tracking can identify dirtied pages, but any page pinned by the vendor driver
> -can also be written by the device. There is currently no device or IOMMU
> -support for dirty page tracking in hardware.
> +the VFIO dirty tracking module to start and stop dirty page tracking. A
> +``log_sync`` memory listener callback queries the dirty page bitmap from the
> +dirty tracking module and marks system memory pages which were DMA-ed by the
> +VFIO device as dirty. The dirty page bitmap is queried per container.
> +
> +Currently there are two ways dirty page tracking can be done:
> +(1) Device dirty tracking:
> +In this method the device is responsible for logging and reporting its DMAs. This
> +method can be used only if the device is capable of tracking its DMAs.
> +Discovering device capability, starting and stopping dirty tracking, and
> +syncing the dirty bitmaps from the device are done using the DMA logging uAPI.
> +More info about the uAPI can be found in the comments of the
> +``vfio_device_feature_dma_logging_control`` and
> +``vfio_device_feature_dma_logging_report`` structures in the header file
> +linux-headers/linux/vfio.h.
> +
> +(2) VFIO IOMMU module:
> +In this method dirty tracking is done by the IOMMU. However, there is currently no
> +IOMMU support for dirty page tracking. For this reason, all pages are
> +perpetually marked dirty, unless the device driver pins pages through external
> +APIs in which case only those pinned pages are perpetually marked dirty.
> +
> +If the above two methods are not supported, all pages are perpetually marked
> +dirty by QEMU.
>   
>   By default, dirty pages are tracked during pre-copy as well as stop-and-copy
> -phase. So, a page pinned by the vendor driver will be copied to the destination
> -in both phases. Copying dirty pages in pre-copy phase helps QEMU to predict if
> -it can achieve its downtime tolerances. If QEMU during pre-copy phase keeps
> -finding dirty pages continuously, then it understands that even in stop-and-copy
> -phase, it is likely to find dirty pages and can predict the downtime
> -accordingly.
> +phase. So, a page marked as dirty will be copied to the destination in both
> +phases. Copying dirty pages in pre-copy phase helps QEMU to predict if it can
> +achieve its downtime tolerances. If QEMU during pre-copy phase keeps finding
> +dirty pages continuously, then it understands that even in stop-and-copy phase,
> +it is likely to find dirty pages and can predict the downtime accordingly.
>   
>   QEMU also provides a per device opt-out option ``pre-copy-dirty-page-tracking``
>   which disables querying the dirty bitmap during pre-copy phase. If it is set to
> @@ -89,7 +104,8 @@ phase of migration. In that case, the unmap ioctl returns any dirty pages in
>   that range and QEMU reports corresponding guest physical pages dirty. During
>   stop-and-copy phase, an IOMMU notifier is used to get a callback for mapped
>   pages and then dirty pages bitmap is fetched from VFIO IOMMU modules for those
> -mapped ranges.
> +mapped ranges. If device dirty tracking is enabled with vIOMMU, live migration
> +will be blocked.

There is a limitation with multiple devices also.

Thanks,

C.

>   
>   Flow of state changes during Live migration
>   ===========================================



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 13/13] docs/devel: Document VFIO device dirty page tracking
  2023-03-06 17:15   ` Cédric Le Goater
@ 2023-03-06 17:18     ` Joao Martins
  2023-03-06 17:21       ` Joao Martins
  2023-03-06 17:21       ` Cédric Le Goater
  0 siblings, 2 replies; 51+ messages in thread
From: Joao Martins @ 2023-03-06 17:18 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: Alex Williamson, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta, Avihai Horon



On 06/03/2023 17:15, Cédric Le Goater wrote:
> On 3/4/23 02:43, Joao Martins wrote:
>> From: Avihai Horon <avihaih@nvidia.com>
>>
>> Adjust the VFIO dirty page tracking documentation and add a section to
>> describe device dirty page tracking.
>>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>>   docs/devel/vfio-migration.rst | 46 +++++++++++++++++++++++------------
>>   1 file changed, 31 insertions(+), 15 deletions(-)
>>
>> diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
>> index c214c73e2818..1b68ccf11529 100644
>> --- a/docs/devel/vfio-migration.rst
>> +++ b/docs/devel/vfio-migration.rst
>> @@ -59,22 +59,37 @@ System memory dirty pages tracking
>>   ----------------------------------
>>     A ``log_global_start`` and ``log_global_stop`` memory listener callback
>> informs
>> -the VFIO IOMMU module to start and stop dirty page tracking. A ``log_sync``
>> -memory listener callback marks those system memory pages as dirty which are
>> -used for DMA by the VFIO device. The dirty pages bitmap is queried per
>> -container. All pages pinned by the vendor driver through external APIs have to
>> -be marked as dirty during migration. When there are CPU writes, CPU dirty page
>> -tracking can identify dirtied pages, but any page pinned by the vendor driver
>> -can also be written by the device. There is currently no device or IOMMU
>> -support for dirty page tracking in hardware.
>> +the VFIO dirty tracking module to start and stop dirty page tracking. A
>> +``log_sync`` memory listener callback queries the dirty page bitmap from the
>> +dirty tracking module and marks system memory pages which were DMA-ed by the
>> +VFIO device as dirty. The dirty page bitmap is queried per container.
>> +
>> +Currently there are two ways dirty page tracking can be done:
>> +(1) Device dirty tracking:
>> +In this method the device is responsible to log and report its DMAs. This
>> +method can be used only if the device is capable of tracking its DMAs.
>> +Discovering device capability, starting and stopping dirty tracking, and
>> +syncing the dirty bitmaps from the device are done using the DMA logging uAPI.
>> +More info about the uAPI can be found in the comments of the
>> +``vfio_device_feature_dma_logging_control`` and
>> +``vfio_device_feature_dma_logging_report`` structures in the header file
>> +linux-headers/linux/vfio.h.
>> +
>> +(2) VFIO IOMMU module:
>> +In this method dirty tracking is done by IOMMU. However, there is currently no
>> +IOMMU support for dirty page tracking. For this reason, all pages are
>> +perpetually marked dirty, unless the device driver pins pages through external
>> +APIs in which case only those pinned pages are perpetually marked dirty.
>> +
>> +If the above two methods are not supported, all pages are perpetually marked
>> +dirty by QEMU.
>>     By default, dirty pages are tracked during pre-copy as well as stop-and-copy
>> -phase. So, a page pinned by the vendor driver will be copied to the destination
>> -in both phases. Copying dirty pages in pre-copy phase helps QEMU to predict if
>> -it can achieve its downtime tolerances. If QEMU during pre-copy phase keeps
>> -finding dirty pages continuously, then it understands that even in stop-and-copy
>> -phase, it is likely to find dirty pages and can predict the downtime
>> -accordingly.
>> +phase. So, a page marked as dirty will be copied to the destination in both
>> +phases. Copying dirty pages in pre-copy phase helps QEMU to predict if it can
>> +achieve its downtime tolerances. If QEMU during pre-copy phase keeps finding
>> +dirty pages continuously, then it understands that even in stop-and-copy phase,
>> +it is likely to find dirty pages and can predict the downtime accordingly.
>>     QEMU also provides a per device opt-out option
>> ``pre-copy-dirty-page-tracking``
>>   which disables querying the dirty bitmap during pre-copy phase. If it is set to
>> @@ -89,7 +104,8 @@ phase of migration. In that case, the unmap ioctl returns
>> any dirty pages in
>>   that range and QEMU reports corresponding guest physical pages dirty. During
>>   stop-and-copy phase, an IOMMU notifier is used to get a callback for mapped
>>   pages and then dirty pages bitmap is fetched from VFIO IOMMU modules for those
>> -mapped ranges.
>> +mapped ranges. If device dirty tracking is enabled with vIOMMU, live migration
>> +will be blocked.
> 
> There is a limitation with multiple devices also.
> 
I'm aware. I just didn't write it because the section I am changing is specific
to vIOMMU.

> Thanks,
> 
> C.
> 
>>     Flow of state changes during Live migration
>>   ===========================================
> 



* Re: [PATCH v3 12/13] vfio/migration: Query device dirty page tracking support
  2023-03-04  1:43 ` [PATCH v3 12/13] vfio/migration: Query device dirty page tracking support Joao Martins
@ 2023-03-06 17:20   ` Cédric Le Goater
  0 siblings, 0 replies; 51+ messages in thread
From: Cédric Le Goater @ 2023-03-06 17:20 UTC (permalink / raw)
  To: Joao Martins, qemu-devel
  Cc: Alex Williamson, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta, Avihai Horon

On 3/4/23 02:43, Joao Martins wrote:
> Now that everything has been set up for device dirty page tracking,
> query the device for device dirty page tracking support.
> 
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>


Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.

> ---
>   hw/vfio/migration.c | 15 +++++++++++++++
>   1 file changed, 15 insertions(+)
> 
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 3e75868ae7a9..da3aa596b3ec 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -555,6 +555,19 @@ static int vfio_migration_query_flags(VFIODevice *vbasedev, uint64_t *mig_flags)
>       return 0;
>   }
>   
> +static bool vfio_dma_logging_supported(VFIODevice *vbasedev)
> +{
> +    uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature),
> +                              sizeof(uint64_t))] = {};
> +    struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
> +
> +    feature->argsz = sizeof(buf);
> +    feature->flags =
> +        VFIO_DEVICE_FEATURE_PROBE | VFIO_DEVICE_FEATURE_DMA_LOGGING_START;
> +
> +    return !ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature);
> +}
> +
>   static int vfio_migration_init(VFIODevice *vbasedev)
>   {
>       int ret;
> @@ -589,6 +602,8 @@ static int vfio_migration_init(VFIODevice *vbasedev)
>       migration->device_state = VFIO_DEVICE_STATE_RUNNING;
>       migration->data_fd = -1;
>   
> +    vbasedev->dirty_pages_supported = vfio_dma_logging_supported(vbasedev);
> +
>       oid = vmstate_if_get_id(VMSTATE_IF(DEVICE(obj)));
>       if (oid) {
>           path = g_strdup_printf("%s/vfio", oid);




* Re: [PATCH v3 13/13] docs/devel: Document VFIO device dirty page tracking
  2023-03-06 17:18     ` Joao Martins
@ 2023-03-06 17:21       ` Joao Martins
  2023-03-06 17:21       ` Cédric Le Goater
  1 sibling, 0 replies; 51+ messages in thread
From: Joao Martins @ 2023-03-06 17:21 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: Alex Williamson, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta, Avihai Horon

On 06/03/2023 17:18, Joao Martins wrote:
> On 06/03/2023 17:15, Cédric Le Goater wrote:
>> On 3/4/23 02:43, Joao Martins wrote:
>>> From: Avihai Horon <avihaih@nvidia.com>
>>>
>>> Adjust the VFIO dirty page tracking documentation and add a section to
>>> describe device dirty page tracking.
>>>
>>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>> ---
>>>   docs/devel/vfio-migration.rst | 46 +++++++++++++++++++++++------------
>>>   1 file changed, 31 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
>>> index c214c73e2818..1b68ccf11529 100644
>>> --- a/docs/devel/vfio-migration.rst
>>> +++ b/docs/devel/vfio-migration.rst
>>> @@ -59,22 +59,37 @@ System memory dirty pages tracking
>>>   ----------------------------------
>>>     A ``log_global_start`` and ``log_global_stop`` memory listener callback
>>> informs
>>> -the VFIO IOMMU module to start and stop dirty page tracking. A ``log_sync``
>>> -memory listener callback marks those system memory pages as dirty which are
>>> -used for DMA by the VFIO device. The dirty pages bitmap is queried per
>>> -container. All pages pinned by the vendor driver through external APIs have to
>>> -be marked as dirty during migration. When there are CPU writes, CPU dirty page
>>> -tracking can identify dirtied pages, but any page pinned by the vendor driver
>>> -can also be written by the device. There is currently no device or IOMMU
>>> -support for dirty page tracking in hardware.
>>> +the VFIO dirty tracking module to start and stop dirty page tracking. A
>>> +``log_sync`` memory listener callback queries the dirty page bitmap from the
>>> +dirty tracking module and marks system memory pages which were DMA-ed by the
>>> +VFIO device as dirty. The dirty page bitmap is queried per container.
>>> +
>>> +Currently there are two ways dirty page tracking can be done:
>>> +(1) Device dirty tracking:
>>> +In this method the device is responsible to log and report its DMAs. This
>>> +method can be used only if the device is capable of tracking its DMAs.
>>> +Discovering device capability, starting and stopping dirty tracking, and
>>> +syncing the dirty bitmaps from the device are done using the DMA logging uAPI.
>>> +More info about the uAPI can be found in the comments of the
>>> +``vfio_device_feature_dma_logging_control`` and
>>> +``vfio_device_feature_dma_logging_report`` structures in the header file
>>> +linux-headers/linux/vfio.h.
>>> +
>>> +(2) VFIO IOMMU module:
>>> +In this method dirty tracking is done by IOMMU. However, there is currently no
>>> +IOMMU support for dirty page tracking. For this reason, all pages are
>>> +perpetually marked dirty, unless the device driver pins pages through external
>>> +APIs in which case only those pinned pages are perpetually marked dirty.
>>> +
>>> +If the above two methods are not supported, all pages are perpetually marked
>>> +dirty by QEMU.
>>>     By default, dirty pages are tracked during pre-copy as well as stop-and-copy
>>> -phase. So, a page pinned by the vendor driver will be copied to the destination
>>> -in both phases. Copying dirty pages in pre-copy phase helps QEMU to predict if
>>> -it can achieve its downtime tolerances. If QEMU during pre-copy phase keeps
>>> -finding dirty pages continuously, then it understands that even in stop-and-copy
>>> -phase, it is likely to find dirty pages and can predict the downtime
>>> -accordingly.
>>> +phase. So, a page marked as dirty will be copied to the destination in both
>>> +phases. Copying dirty pages in pre-copy phase helps QEMU to predict if it can
>>> +achieve its downtime tolerances. If QEMU during pre-copy phase keeps finding
>>> +dirty pages continuously, then it understands that even in stop-and-copy phase,
>>> +it is likely to find dirty pages and can predict the downtime accordingly.
>>>     QEMU also provides a per device opt-out option
>>> ``pre-copy-dirty-page-tracking``
>>>   which disables querying the dirty bitmap during pre-copy phase. If it is set to
>>> @@ -89,7 +104,8 @@ phase of migration. In that case, the unmap ioctl returns
>>> any dirty pages in
>>>   that range and QEMU reports corresponding guest physical pages dirty. During
>>>   stop-and-copy phase, an IOMMU notifier is used to get a callback for mapped
>>>   pages and then dirty pages bitmap is fetched from VFIO IOMMU modules for those
>>> -mapped ranges.
>>> +mapped ranges. If device dirty tracking is enabled with vIOMMU, live migration
>>> +will be blocked.
>>
>> There is a limitation with multiple devices also.
>>
> I'm aware. I just didn't write it because the section I am changing is specific
> to vIOMMU.
> 
... and this patch is covering device dirty tracking



* Re: [PATCH v3 13/13] docs/devel: Document VFIO device dirty page tracking
  2023-03-06 17:18     ` Joao Martins
  2023-03-06 17:21       ` Joao Martins
@ 2023-03-06 17:21       ` Cédric Le Goater
  1 sibling, 0 replies; 51+ messages in thread
From: Cédric Le Goater @ 2023-03-06 17:21 UTC (permalink / raw)
  To: Joao Martins, qemu-devel
  Cc: Alex Williamson, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta, Avihai Horon

On 3/6/23 18:18, Joao Martins wrote:
> 
> 
> On 06/03/2023 17:15, Cédric Le Goater wrote:
>> On 3/4/23 02:43, Joao Martins wrote:
>>> From: Avihai Horon <avihaih@nvidia.com>
>>>
>>> Adjust the VFIO dirty page tracking documentation and add a section to
>>> describe device dirty page tracking.
>>>
>>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>> ---
>>>    docs/devel/vfio-migration.rst | 46 +++++++++++++++++++++++------------
>>>    1 file changed, 31 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
>>> index c214c73e2818..1b68ccf11529 100644
>>> --- a/docs/devel/vfio-migration.rst
>>> +++ b/docs/devel/vfio-migration.rst
>>> @@ -59,22 +59,37 @@ System memory dirty pages tracking
>>>    ----------------------------------
>>>      A ``log_global_start`` and ``log_global_stop`` memory listener callback
>>> informs
>>> -the VFIO IOMMU module to start and stop dirty page tracking. A ``log_sync``
>>> -memory listener callback marks those system memory pages as dirty which are
>>> -used for DMA by the VFIO device. The dirty pages bitmap is queried per
>>> -container. All pages pinned by the vendor driver through external APIs have to
>>> -be marked as dirty during migration. When there are CPU writes, CPU dirty page
>>> -tracking can identify dirtied pages, but any page pinned by the vendor driver
>>> -can also be written by the device. There is currently no device or IOMMU
>>> -support for dirty page tracking in hardware.
>>> +the VFIO dirty tracking module to start and stop dirty page tracking. A
>>> +``log_sync`` memory listener callback queries the dirty page bitmap from the
>>> +dirty tracking module and marks system memory pages which were DMA-ed by the
>>> +VFIO device as dirty. The dirty page bitmap is queried per container.
>>> +
>>> +Currently there are two ways dirty page tracking can be done:
>>> +(1) Device dirty tracking:
>>> +In this method the device is responsible to log and report its DMAs. This
>>> +method can be used only if the device is capable of tracking its DMAs.
>>> +Discovering device capability, starting and stopping dirty tracking, and
>>> +syncing the dirty bitmaps from the device are done using the DMA logging uAPI.
>>> +More info about the uAPI can be found in the comments of the
>>> +``vfio_device_feature_dma_logging_control`` and
>>> +``vfio_device_feature_dma_logging_report`` structures in the header file
>>> +linux-headers/linux/vfio.h.
>>> +
>>> +(2) VFIO IOMMU module:
>>> +In this method dirty tracking is done by IOMMU. However, there is currently no
>>> +IOMMU support for dirty page tracking. For this reason, all pages are
>>> +perpetually marked dirty, unless the device driver pins pages through external
>>> +APIs in which case only those pinned pages are perpetually marked dirty.
>>> +
>>> +If the above two methods are not supported, all pages are perpetually marked
>>> +dirty by QEMU.
>>>      By default, dirty pages are tracked during pre-copy as well as stop-and-copy
>>> -phase. So, a page pinned by the vendor driver will be copied to the destination
>>> -in both phases. Copying dirty pages in pre-copy phase helps QEMU to predict if
>>> -it can achieve its downtime tolerances. If QEMU during pre-copy phase keeps
>>> -finding dirty pages continuously, then it understands that even in stop-and-copy
>>> -phase, it is likely to find dirty pages and can predict the downtime
>>> -accordingly.
>>> +phase. So, a page marked as dirty will be copied to the destination in both
>>> +phases. Copying dirty pages in pre-copy phase helps QEMU to predict if it can
>>> +achieve its downtime tolerances. If QEMU during pre-copy phase keeps finding
>>> +dirty pages continuously, then it understands that even in stop-and-copy phase,
>>> +it is likely to find dirty pages and can predict the downtime accordingly.
>>>      QEMU also provides a per device opt-out option
>>> ``pre-copy-dirty-page-tracking``
>>>    which disables querying the dirty bitmap during pre-copy phase. If it is set to
>>> @@ -89,7 +104,8 @@ phase of migration. In that case, the unmap ioctl returns
>>> any dirty pages in
>>>    that range and QEMU reports corresponding guest physical pages dirty. During
>>>    stop-and-copy phase, an IOMMU notifier is used to get a callback for mapped
>>>    pages and then dirty pages bitmap is fetched from VFIO IOMMU modules for those
>>> -mapped ranges.
>>> +mapped ranges. If device dirty tracking is enabled with vIOMMU, live migration
>>> +will be blocked.
>>
>> There is a limitation with multiple devices also.
>>
> I'm aware. I just didn't write it because the section I am changing is specific
> to vIOMMU.


Ah OK. I didn't check, sorry.

Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.





* Re: [PATCH v3 00/13] vfio/migration: Device dirty page tracking
  2023-03-04  1:43 [PATCH v3 00/13] vfio/migration: Device dirty page tracking Joao Martins
                   ` (13 preceding siblings ...)
  2023-03-05 20:57 ` [PATCH v3 00/13] vfio/migration: Device " Alex Williamson
@ 2023-03-06 17:23 ` Cédric Le Goater
  2023-03-06 19:41   ` Joao Martins
  2023-03-07  8:33   ` Avihai Horon
  14 siblings, 2 replies; 51+ messages in thread
From: Cédric Le Goater @ 2023-03-06 17:23 UTC (permalink / raw)
  To: Joao Martins, qemu-devel
  Cc: Alex Williamson, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta, Avihai Horon

On 3/4/23 02:43, Joao Martins wrote:
> Hey,
> 
> Presented herewith a series based on the basic VFIO migration protocol v2
> implementation [1].
> 
> It is split from its parent series[5] to solely focus on device dirty
> page tracking. Device dirty page tracking allows the VFIO device to
> record its DMAs and report them back when needed. This is part of VFIO
> migration and is used during pre-copy phase of migration to track the
> RAM pages that the device has written to and mark those pages dirty, so
> they can later be re-sent to target.
> 
> Device dirty page tracking uses the DMA logging uAPI to discover device
> capabilities, to start and stop tracking, and to get dirty page bitmap
> report. Extra details and uAPI definition can be found here [3].
> 
> Device dirty page tracking operates in VFIOContainer scope. I.e., When
> dirty tracking is started, stopped or dirty page report is queried, all
> devices within a VFIOContainer are iterated and for each of them device
> dirty page tracking is started, stopped or dirty page report is queried,
> respectively.
> 
> Device dirty page tracking is used only if all devices within a
> VFIOContainer support it. Otherwise, VFIO IOMMU dirty page tracking is
> used, and if that is not supported as well, memory is perpetually marked
> dirty by QEMU. Note that since VFIO IOMMU dirty page tracking has no HW
> support, the last two usually have the same effect of perpetually
> marking all pages dirty.
> 
> Normally, when asked to start dirty tracking, all the currently DMA
> mapped ranges are tracked by device dirty page tracking. If using a
> vIOMMU we block live migration. It's temporary and a separate series is
> going to add support for it. Thus this series focus on getting the
> ground work first.
> 
> The series is organized as follows:
> 
> - Patches 1-7: Fix bugs and do some preparatory work required prior to
>    adding device dirty page tracking.
> - Patches 8-10: Implement device dirty page tracking.
> - Patch 11: Blocks live migration with vIOMMU.
> - Patches 12-13 enable device dirty page tracking and document it.
> 
> Comments, improvements as usual appreciated.

It would be helpful to have some feedback from Avihai on the new patches
introduced in v3 or v4 before merging.

Also, (being curious) did you test migration with a TCG guest?

Thanks,

C.




* Re: [PATCH v3 07/13] vfio/common: Record DMA mapped IOVA ranges
  2023-03-04  1:43 ` [PATCH v3 07/13] vfio/common: Record DMA mapped IOVA ranges Joao Martins
  2023-03-06 13:41   ` Cédric Le Goater
@ 2023-03-06 18:05   ` Cédric Le Goater
  2023-03-06 19:45     ` Joao Martins
  2023-03-06 18:15   ` Alex Williamson
  2 siblings, 1 reply; 51+ messages in thread
From: Cédric Le Goater @ 2023-03-06 18:05 UTC (permalink / raw)
  To: Joao Martins, qemu-devel
  Cc: Alex Williamson, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta, Avihai Horon

On 3/4/23 02:43, Joao Martins wrote:
> According to the device DMA logging uAPI, IOVA ranges to be logged by
> the device must be provided all at once upon DMA logging start.
> 
> As preparation for the following patches which will add device dirty
> page tracking, keep a record of all DMA mapped IOVA ranges so later they
> can be used for DMA logging start.
> 
> Note that when vIOMMU is enabled DMA mapped IOVA ranges are not tracked.
> This is due to the dynamic nature of vIOMMU DMA mapping/unmapping.
> 
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>

One question below,

Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.

> ---
>   hw/vfio/common.c              | 84 +++++++++++++++++++++++++++++++++++
>   hw/vfio/trace-events          |  1 +
>   include/hw/vfio/vfio-common.h | 11 +++++
>   3 files changed, 96 insertions(+)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index ed908e303dbb..d84e5fd86bb4 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -44,6 +44,7 @@
>   #include "migration/blocker.h"
>   #include "migration/qemu-file.h"
>   #include "sysemu/tpm.h"
> +#include "qemu/iova-tree.h"
>   
>   VFIOGroupList vfio_group_list =
>       QLIST_HEAD_INITIALIZER(vfio_group_list);
> @@ -1313,11 +1314,94 @@ static int vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
>       return ret;
>   }
>   
> +/*
> + * Called for the dirty tracking memory listener to calculate the iova/end
> + * for a given memory region section. The checks here, replicate the logic
> + * in vfio_listener_region_{add,del}() used for the same purpose. And thus
> + * both listener should be kept in sync.
> + */
> +static bool vfio_get_section_iova_range(VFIOContainer *container,
> +                                        MemoryRegionSection *section,
> +                                        hwaddr *out_iova, hwaddr *out_end)
> +{
> +    Int128 llend;
> +    hwaddr iova;
> +
> +    iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space);
> +    llend = int128_make64(section->offset_within_address_space);
> +    llend = int128_add(llend, section->size);
> +    llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask()));
> +
> +    if (int128_ge(int128_make64(iova), llend)) {
> +        return false;
> +    }
> +
> +    *out_iova = iova;
> +    *out_end = int128_get64(llend) - 1;
> +    return true;
> +}
> +
> +static void vfio_dirty_tracking_update(MemoryListener *listener,
> +                                       MemoryRegionSection *section)
> +{
> +    VFIOContainer *container = container_of(listener, VFIOContainer,
> +                                            tracking_listener);
> +    VFIODirtyTrackingRange *range = &container->tracking_range;
> +    hwaddr max32 = (1ULL << 32) - 1ULL;
> +    hwaddr iova, end;
> +
> +    if (!vfio_listener_valid_section(section) ||
> +        !vfio_get_section_iova_range(container, section, &iova, &end)) {
> +        return;
> +    }
> +
> +    WITH_QEMU_LOCK_GUARD(&container->tracking_mutex) {
> +        if (iova < max32 && end <= max32) {
> +                if (range->min32 > iova) {

With the memset(0) done in vfio_dirty_tracking_init(), min32 will always
be 0. Is that OK?

> +                    range->min32 = iova;
> +                }
> +                if (range->max32 < end) {
> +                    range->max32 = end;
> +                }
> +                trace_vfio_device_dirty_tracking_update(iova, end,
> +                                            range->min32, range->max32);
> +        } else {
> +                if (!range->min64 || range->min64 > iova) {
> +                    range->min64 = iova;
> +                }
> +                if (range->max64 < end) {
> +                    range->max64 = end;
> +                }
> +                trace_vfio_device_dirty_tracking_update(iova, end,
> +                                            range->min64, range->max64);
> +        }
> +    }
> +    return;
> +}
> +
> +static const MemoryListener vfio_dirty_tracking_listener = {
> +    .name = "vfio-tracking",
> +    .region_add = vfio_dirty_tracking_update,
> +};
> +
> +static void vfio_dirty_tracking_init(VFIOContainer *container)
> +{
> +    memset(&container->tracking_range, 0, sizeof(container->tracking_range));
> +    qemu_mutex_init(&container->tracking_mutex);
> +    container->tracking_listener = vfio_dirty_tracking_listener;
> +    memory_listener_register(&container->tracking_listener,
> +                             container->space->as);
> +    memory_listener_unregister(&container->tracking_listener);
> +    qemu_mutex_destroy(&container->tracking_mutex);
> +}
> +
>   static void vfio_listener_log_global_start(MemoryListener *listener)
>   {
>       VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>       int ret;
>   
> +    vfio_dirty_tracking_init(container);
> +
>       ret = vfio_set_dirty_page_tracking(container, true);
>       if (ret) {
>           vfio_set_migration_error(ret);
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index 669d9fe07cd9..d97a6de17921 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -104,6 +104,7 @@ vfio_known_safe_misalignment(const char *name, uint64_t iova, uint64_t offset_wi
>   vfio_listener_region_add_no_dma_map(const char *name, uint64_t iova, uint64_t size, uint64_t page_size) "Region \"%s\" 0x%"PRIx64" size=0x%"PRIx64" is not aligned to 0x%"PRIx64" and cannot be mapped for DMA"
>   vfio_listener_region_del_skip(uint64_t start, uint64_t end) "SKIPPING region_del 0x%"PRIx64" - 0x%"PRIx64
>   vfio_listener_region_del(uint64_t start, uint64_t end) "region_del 0x%"PRIx64" - 0x%"PRIx64
> +vfio_device_dirty_tracking_update(uint64_t start, uint64_t end, uint64_t min, uint64_t max) "section 0x%"PRIx64" - 0x%"PRIx64" -> update [0x%"PRIx64" - 0x%"PRIx64"]"
>   vfio_disconnect_container(int fd) "close container->fd=%d"
>   vfio_put_group(int fd) "close group->fd=%d"
>   vfio_get_device(const char * name, unsigned int flags, unsigned int num_regions, unsigned int num_irqs) "Device %s flags: %u, regions: %u, irqs: %u"
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 87524c64a443..96791add2719 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -23,6 +23,7 @@
>   
>   #include "exec/memory.h"
>   #include "qemu/queue.h"
> +#include "qemu/iova-tree.h"
>   #include "qemu/notify.h"
>   #include "ui/console.h"
>   #include "hw/display/ramfb.h"
> @@ -68,6 +69,13 @@ typedef struct VFIOMigration {
>       size_t data_buffer_size;
>   } VFIOMigration;
>   
> +typedef struct VFIODirtyTrackingRange {
> +    hwaddr min32;
> +    hwaddr max32;
> +    hwaddr min64;
> +    hwaddr max64;
> +} VFIODirtyTrackingRange;
> +
>   typedef struct VFIOAddressSpace {
>       AddressSpace *as;
>       QLIST_HEAD(, VFIOContainer) containers;
> @@ -89,6 +97,9 @@ typedef struct VFIOContainer {
>       uint64_t max_dirty_bitmap_size;
>       unsigned long pgsizes;
>       unsigned int dma_max_mappings;
> +    VFIODirtyTrackingRange tracking_range;
> +    QemuMutex tracking_mutex;
> +    MemoryListener tracking_listener;
>       QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
>       QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
>       QLIST_HEAD(, VFIOGroup) group_list;




* Re: [PATCH v3 07/13] vfio/common: Record DMA mapped IOVA ranges
  2023-03-04  1:43 ` [PATCH v3 07/13] vfio/common: Record DMA mapped IOVA ranges Joao Martins
  2023-03-06 13:41   ` Cédric Le Goater
  2023-03-06 18:05   ` Cédric Le Goater
@ 2023-03-06 18:15   ` Alex Williamson
  2023-03-06 19:32     ` Joao Martins
  2 siblings, 1 reply; 51+ messages in thread
From: Alex Williamson @ 2023-03-06 18:15 UTC (permalink / raw)
  To: Joao Martins
  Cc: qemu-devel, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon

On Sat,  4 Mar 2023 01:43:37 +0000
Joao Martins <joao.m.martins@oracle.com> wrote:

> According to the device DMA logging uAPI, IOVA ranges to be logged by
> the device must be provided all at once upon DMA logging start.
> 
> As preparation for the following patches which will add device dirty
> page tracking, keep a record of all DMA mapped IOVA ranges so later they
> can be used for DMA logging start.
> 
> Note that when vIOMMU is enabled DMA mapped IOVA ranges are not tracked.
> This is due to the dynamic nature of vIOMMU DMA mapping/unmapping.

Commit log is outdated for this version.

> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>  hw/vfio/common.c              | 84 +++++++++++++++++++++++++++++++++++
>  hw/vfio/trace-events          |  1 +
>  include/hw/vfio/vfio-common.h | 11 +++++
>  3 files changed, 96 insertions(+)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index ed908e303dbb..d84e5fd86bb4 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -44,6 +44,7 @@
>  #include "migration/blocker.h"
>  #include "migration/qemu-file.h"
>  #include "sysemu/tpm.h"
> +#include "qemu/iova-tree.h"

Unnecessary

>  
>  VFIOGroupList vfio_group_list =
>      QLIST_HEAD_INITIALIZER(vfio_group_list);
> @@ -1313,11 +1314,94 @@ static int vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
>      return ret;
>  }
>  
> +/*
> + * Called for the dirty tracking memory listener to calculate the iova/end
> + * for a given memory region section. The checks here replicate the logic
> + * in vfio_listener_region_{add,del}() used for the same purpose, and thus
> + * both listeners should be kept in sync.
> + */
> +static bool vfio_get_section_iova_range(VFIOContainer *container,
> +                                        MemoryRegionSection *section,
> +                                        hwaddr *out_iova, hwaddr *out_end)
> +{
> +    Int128 llend;
> +    hwaddr iova;
> +
> +    iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space);
> +    llend = int128_make64(section->offset_within_address_space);
> +    llend = int128_add(llend, section->size);
> +    llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask()));
> +
> +    if (int128_ge(int128_make64(iova), llend)) {
> +        return false;
> +    }
> +
> +    *out_iova = iova;
> +    *out_end = int128_get64(llend) - 1;
> +    return true;
> +}

Not sure why this isn't turned into a helper here to avoid the issue
noted in the comment.  Also why do both of the existing listener
implementations resolve the end address as:

	int128_get64(int128_sub(llend, int128_one()));

While here we use:

	int128_get64(llend) - 1;

We're already out of sync.
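The divergence Alex points out is easy to reintroduce whenever the alignment logic is copied rather than shared. A standalone sketch of the shared-helper idea (not QEMU code: plain uint64_t stands in for hwaddr/Int128, and a fixed 4 KiB host page is assumed) that computes the aligned start and the *inclusive* end in one place:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative sketch, not QEMU code. One shared helper computes a
 * section's aligned [iova, end] pair so every listener agrees that
 * "end" means the inclusive last byte, i.e. aligned_limit - 1
 * (QEMU's int128_sub(llend, int128_one())). Assumes 4 KiB pages. */
#define HOST_PAGE_MASK (~(uint64_t)0xfff)

static bool section_iova_range(uint64_t offset, uint64_t size,
                               uint64_t *out_iova, uint64_t *out_end)
{
    uint64_t iova = (offset + 0xfff) & HOST_PAGE_MASK; /* round start up */
    uint64_t limit = (offset + size) & HOST_PAGE_MASK; /* round limit down */

    if (iova >= limit) {
        return false; /* section collapses after alignment */
    }
    *out_iova = iova;
    *out_end = limit - 1; /* inclusive end, computed in exactly one place */
    return true;
}
```

With the computation centralized, both the DMA-mapping listeners and the dirty-tracking listener would call the same function and cannot drift apart.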

> +
> +static void vfio_dirty_tracking_update(MemoryListener *listener,
> +                                       MemoryRegionSection *section)
> +{
> +    VFIOContainer *container = container_of(listener, VFIOContainer,
> +                                            tracking_listener);
> +    VFIODirtyTrackingRange *range = &container->tracking_range;
> +    hwaddr max32 = (1ULL << 32) - 1ULL;

UINT32_MAX

> +    hwaddr iova, end;
> +
> +    if (!vfio_listener_valid_section(section) ||
> +        !vfio_get_section_iova_range(container, section, &iova, &end)) {
> +        return;
> +    }
> +
> +    WITH_QEMU_LOCK_GUARD(&container->tracking_mutex) {
> +        if (iova < max32 && end <= max32) {
> +                if (range->min32 > iova) {
> +                    range->min32 = iova;
> +                }
> +                if (range->max32 < end) {
> +                    range->max32 = end;
> +                }
> +                trace_vfio_device_dirty_tracking_update(iova, end,
> +                                            range->min32, range->max32);
> +        } else {
> +                if (!range->min64 || range->min64 > iova) {
> +                    range->min64 = iova;
> +                }

I think this improperly handles a range starting at zero, min64 should
be UINT64_MAX initially.  For example, if we have ranges 0-8GB and
12-16GB, this effectively ignores the first range.  Likewise I think
range->min32 has a similar problem, it's initialized to zero, it will
never be updated to match a non-zero lowest range.  It needs to be
initialized to UINT32_MAX.

A comment describing the purpose of the 32/64 split tracking would be
useful too.
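The zero-start bug and the suggested fix can be shown in a few lines. This is a sketch of the corrected accumulation logic only (types and names are simplified stand-ins, not the patch's actual code): the minimums start at the type maximum so a range beginning at IOVA 0 still updates them, and the 32/64 split mirrors the patch's separate tracking of sub-4G and above-4G ranges.

```c
#include <stdint.h>

/* Sketch of the suggested fix, not QEMU code. min32/min64 must start
 * at the type maximum, otherwise a range starting at IOVA 0 (e.g.
 * Alex's 0-8GB example) can never lower an initial minimum of 0. */
typedef struct {
    uint64_t min32, max32; /* ranges ending below 4G */
    uint64_t min64, max64; /* ranges reaching 4G and above */
} DirtyRange;

static void range_init(DirtyRange *r)
{
    r->min32 = UINT32_MAX; /* not 0: 0 would mask every real minimum */
    r->min64 = UINT64_MAX;
    r->max32 = 0;
    r->max64 = 0;
}

static void range_update(DirtyRange *r, uint64_t iova, uint64_t end)
{
    if (end <= UINT32_MAX) {
        if (iova < r->min32) { r->min32 = iova; }
        if (end > r->max32)  { r->max32 = end; }
    } else {
        if (iova < r->min64) { r->min64 = iova; }
        if (end > r->max64)  { r->max64 = end; }
    }
}
```

Note how a section starting at 0 correctly becomes the 32-bit minimum, which the original `!range->min64 || ...` and zero-initialized `min32` tests would have missed.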

> +                if (range->max64 < end) {
> +                    range->max64 = end;
> +                }
> +                trace_vfio_device_dirty_tracking_update(iova, end,
> +                                            range->min64, range->max64);
> +        }
> +    }
> +    return;
> +}
> +
> +static const MemoryListener vfio_dirty_tracking_listener = {
> +    .name = "vfio-tracking",
> +    .region_add = vfio_dirty_tracking_update,
> +};
> +
> +static void vfio_dirty_tracking_init(VFIOContainer *container)
> +{
> +    memset(&container->tracking_range, 0, sizeof(container->tracking_range));
> +    qemu_mutex_init(&container->tracking_mutex);

As noted in other thread, this mutex seems unnecessary.

The listener needs to be embedded in an object, but it doesn't need to
be the container.  Couldn't we create:

typedef struct VFIODirtyRanges {
    VFIOContainer *container;
    VFIODirtyTrackingRange ranges;
    MemoryListener listener;
} VFIODirtyRanges;

For use here?  Caller could provide VFIODirtyTrackingRange pointer for
the resulting ranges, which then gets passed to
vfio_device_feature_dma_logging_start_create()
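The mechanics behind this suggestion are worth spelling out: a MemoryListener only needs to be embedded in *some* object that `container_of()` can recover, so a short-lived stack object works and removes the need for container fields or a mutex. A minimal standalone sketch (simplified stand-ins for QEMU's MemoryListener and register/unregister, which invoke the callback synchronously):

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch, not QEMU code: embed the listener in a
 * stack-local object and recover it with container_of(), as Alex
 * suggests, instead of storing listener state in the container. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct MemoryListener MemoryListener;
struct MemoryListener {
    void (*region_add)(MemoryListener *l, uint64_t iova, uint64_t end);
};

typedef struct VFIODirtyRanges {
    uint64_t min, max;
    MemoryListener listener;
} VFIODirtyRanges;

static void update(MemoryListener *l, uint64_t iova, uint64_t end)
{
    /* the callback finds its state via the embedded member */
    VFIODirtyRanges *r = container_of(l, VFIODirtyRanges, listener);
    if (iova < r->min) { r->min = iova; }
    if (end > r->max)  { r->max = end; }
}

static void collect(VFIODirtyRanges *r, const uint64_t (*secs)[2], int n)
{
    r->min = UINT64_MAX;
    r->max = 0;
    r->listener.region_add = update;
    /* memory_listener_register() would replay existing sections
     * synchronously through region_add; simulate that here */
    for (int i = 0; i < n; i++) {
        r->listener.region_add(&r->listener, secs[i][0], secs[i][1]);
    }
    /* after "unregister", nothing persists: no container state,
     * no mutex, the ranges live in the caller's stack object */
}
```

Because registration replays all sections before returning, the whole collection is done by the time `collect()` exits, which is exactly why the stack-local object is safe.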

> +    container->tracking_listener = vfio_dirty_tracking_listener;
> +    memory_listener_register(&container->tracking_listener,
> +                             container->space->as);
> +    memory_listener_unregister(&container->tracking_listener);

The fact that the listener callback is synchronous, and that we're
entirely done with it here, is subtle enough to deserve a comment.

> +    qemu_mutex_destroy(&container->tracking_mutex);
> +}
> +
>  static void vfio_listener_log_global_start(MemoryListener *listener)
>  {
>      VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>      int ret;
>  
> +    vfio_dirty_tracking_init(container);
> +
>      ret = vfio_set_dirty_page_tracking(container, true);
>      if (ret) {
>          vfio_set_migration_error(ret);
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index 669d9fe07cd9..d97a6de17921 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -104,6 +104,7 @@ vfio_known_safe_misalignment(const char *name, uint64_t iova, uint64_t offset_wi
>  vfio_listener_region_add_no_dma_map(const char *name, uint64_t iova, uint64_t size, uint64_t page_size) "Region \"%s\" 0x%"PRIx64" size=0x%"PRIx64" is not aligned to 0x%"PRIx64" and cannot be mapped for DMA"
>  vfio_listener_region_del_skip(uint64_t start, uint64_t end) "SKIPPING region_del 0x%"PRIx64" - 0x%"PRIx64
>  vfio_listener_region_del(uint64_t start, uint64_t end) "region_del 0x%"PRIx64" - 0x%"PRIx64
> +vfio_device_dirty_tracking_update(uint64_t start, uint64_t end, uint64_t min, uint64_t max) "section 0x%"PRIx64" - 0x%"PRIx64" -> update [0x%"PRIx64" - 0x%"PRIx64"]"
>  vfio_disconnect_container(int fd) "close container->fd=%d"
>  vfio_put_group(int fd) "close group->fd=%d"
>  vfio_get_device(const char * name, unsigned int flags, unsigned int num_regions, unsigned int num_irqs) "Device %s flags: %u, regions: %u, irqs: %u"
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 87524c64a443..96791add2719 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -23,6 +23,7 @@
>  
>  #include "exec/memory.h"
>  #include "qemu/queue.h"
> +#include "qemu/iova-tree.h"

Unused.

>  #include "qemu/notify.h"
>  #include "ui/console.h"
>  #include "hw/display/ramfb.h"
> @@ -68,6 +69,13 @@ typedef struct VFIOMigration {
>      size_t data_buffer_size;
>  } VFIOMigration;
>  
> +typedef struct VFIODirtyTrackingRange {
> +    hwaddr min32;
> +    hwaddr max32;
> +    hwaddr min64;
> +    hwaddr max64;
> +} VFIODirtyTrackingRange;
> +
>  typedef struct VFIOAddressSpace {
>      AddressSpace *as;
>      QLIST_HEAD(, VFIOContainer) containers;
> @@ -89,6 +97,9 @@ typedef struct VFIOContainer {
>      uint64_t max_dirty_bitmap_size;
>      unsigned long pgsizes;
>      unsigned int dma_max_mappings;
> +    VFIODirtyTrackingRange tracking_range;
> +    QemuMutex tracking_mutex;
> +    MemoryListener tracking_listener;

Let's make this unnecessary too.  Thanks,

Alex

>      QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
>      QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
>      QLIST_HEAD(, VFIOGroup) group_list;




* Re: [PATCH v3 08/13] vfio/common: Add device dirty page tracking start/stop
  2023-03-04  1:43 ` [PATCH v3 08/13] vfio/common: Add device dirty page tracking start/stop Joao Martins
@ 2023-03-06 18:25   ` Cédric Le Goater
  2023-03-06 18:42   ` Alex Williamson
  1 sibling, 0 replies; 51+ messages in thread
From: Cédric Le Goater @ 2023-03-06 18:25 UTC (permalink / raw)
  To: Joao Martins, qemu-devel
  Cc: Alex Williamson, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta, Avihai Horon

On 3/4/23 02:43, Joao Martins wrote:
> Add device dirty page tracking start/stop functionality. This uses the
> device DMA logging uAPI to start and stop dirty page tracking by device.
> 
> Device dirty page tracking is used only if all devices within a
> container support device dirty page tracking.
> 
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>

The device attribute 'dirty_pages_supported' is only set in patch 12,
when all checks are done. That was a bit confusing when I started
looking. It should be fine anyhow.

Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


> ---
>   hw/vfio/common.c              | 166 +++++++++++++++++++++++++++++++++-
>   hw/vfio/trace-events          |   1 +
>   include/hw/vfio/vfio-common.h |   2 +
>   3 files changed, 166 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index d84e5fd86bb4..aa0df0604704 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -453,6 +453,22 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
>       return true;
>   }
>   
> +static bool vfio_devices_all_device_dirty_tracking(VFIOContainer *container)
> +{
> +    VFIOGroup *group;
> +    VFIODevice *vbasedev;
> +
> +    QLIST_FOREACH(group, &container->group_list, container_next) {
> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
> +            if (!vbasedev->dirty_pages_supported) {
> +                return false;
> +            }
> +        }
> +    }
> +
> +    return true;
> +}
> +
>   /*
>    * Check if all VFIO devices are running and migration is active, which is
>    * essentially equivalent to the migration being in pre-copy phase.
> @@ -1395,15 +1411,152 @@ static void vfio_dirty_tracking_init(VFIOContainer *container)
>       qemu_mutex_destroy(&container->tracking_mutex);
>   }
>   
> +static int vfio_devices_dma_logging_set(VFIOContainer *container,
> +                                        struct vfio_device_feature *feature)
> +{
> +    bool status = (feature->flags & VFIO_DEVICE_FEATURE_MASK) ==
> +                  VFIO_DEVICE_FEATURE_DMA_LOGGING_START;
> +    VFIODevice *vbasedev;
> +    VFIOGroup *group;
> +    int ret = 0;
> +
> +    QLIST_FOREACH(group, &container->group_list, container_next) {
> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
> +            if (vbasedev->dirty_tracking == status) {
> +                continue;
> +            }
> +
> +            ret = ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature);
> +            if (ret) {
> +                ret = -errno;
> +                error_report("%s: Failed to set DMA logging %s, err %d (%s)",
> +                             vbasedev->name, status ? "start" : "stop", ret,
> +                             strerror(errno));
> +                goto out;
> +            }
> +            vbasedev->dirty_tracking = status;
> +        }
> +    }
> +
> +out:
> +    return ret;
> +}
> +
> +static int vfio_devices_dma_logging_stop(VFIOContainer *container)
> +{
> +    uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature),
> +                              sizeof(uint64_t))] = {};
> +    struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
> +
> +    feature->argsz = sizeof(buf);
> +    feature->flags = VFIO_DEVICE_FEATURE_SET;
> +    feature->flags |= VFIO_DEVICE_FEATURE_DMA_LOGGING_STOP;
> +
> +    return vfio_devices_dma_logging_set(container, feature);
> +}
> +
> +static struct vfio_device_feature *
> +vfio_device_feature_dma_logging_start_create(VFIOContainer *container)
> +{
> +    struct vfio_device_feature *feature;
> +    size_t feature_size;
> +    struct vfio_device_feature_dma_logging_control *control;
> +    struct vfio_device_feature_dma_logging_range *ranges;
> +    VFIODirtyTrackingRange *tracking = &container->tracking_range;
> +
> +    feature_size = sizeof(struct vfio_device_feature) +
> +                   sizeof(struct vfio_device_feature_dma_logging_control);
> +    feature = g_try_malloc0(feature_size);
> +    if (!feature) {
> +        errno = ENOMEM;
> +        return NULL;
> +    }
> +    feature->argsz = feature_size;
> +    feature->flags = VFIO_DEVICE_FEATURE_SET;
> +    feature->flags |= VFIO_DEVICE_FEATURE_DMA_LOGGING_START;
> +
> +    control = (struct vfio_device_feature_dma_logging_control *)feature->data;
> +    control->page_size = qemu_real_host_page_size();
> +
> +    /*
> +     * DMA logging uAPI guarantees to support at least a number of ranges that
> +     * fits into a single host kernel base page.
> +     */
> +    control->num_ranges = !!tracking->max32 + !!tracking->max64;
> +    ranges = g_try_new0(struct vfio_device_feature_dma_logging_range,
> +                        control->num_ranges);
> +    if (!ranges) {
> +        g_free(feature);
> +        errno = ENOMEM;
> +
> +        return NULL;
> +    }
> +
> +    control->ranges = (__aligned_u64)ranges;
> +    if (tracking->max32) {
> +        ranges->iova = tracking->min32;
> +        ranges->length = (tracking->max32 - tracking->min32) + 1;
> +        ranges++;
> +    }
> +    if (tracking->max64) {
> +        ranges->iova = tracking->min64;
> +        ranges->length = (tracking->max64 - tracking->min64) + 1;
> +    }
> +
> +    trace_vfio_device_dirty_tracking_start(control->num_ranges,
> +                                           tracking->min32, tracking->max32,
> +                                           tracking->min64, tracking->max64);
> +
> +    return feature;
> +}
> +
> +static void vfio_device_feature_dma_logging_start_destroy(
> +    struct vfio_device_feature *feature)
> +{
> +    struct vfio_device_feature_dma_logging_control *control =
> +        (struct vfio_device_feature_dma_logging_control *)feature->data;
> +    struct vfio_device_feature_dma_logging_range *ranges =
> +        (struct vfio_device_feature_dma_logging_range *)control->ranges;
> +
> +    g_free(ranges);
> +    g_free(feature);
> +}
> +
> +static int vfio_devices_dma_logging_start(VFIOContainer *container)
> +{
> +    struct vfio_device_feature *feature;
> +    int ret = 0;
> +
> +    vfio_dirty_tracking_init(container);
> +    feature = vfio_device_feature_dma_logging_start_create(container);
> +    if (!feature) {
> +        return -errno;
> +    }
> +
> +    ret = vfio_devices_dma_logging_set(container, feature);
> +    if (ret) {
> +        vfio_devices_dma_logging_stop(container);
> +    }
> +
> +    vfio_device_feature_dma_logging_start_destroy(feature);
> +
> +    return ret;
> +}
> +
>   static void vfio_listener_log_global_start(MemoryListener *listener)
>   {
>       VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>       int ret;
>   
> -    vfio_dirty_tracking_init(container);
> +    if (vfio_devices_all_device_dirty_tracking(container)) {
> +        ret = vfio_devices_dma_logging_start(container);
> +    } else {
> +        ret = vfio_set_dirty_page_tracking(container, true);
> +    }
>   
> -    ret = vfio_set_dirty_page_tracking(container, true);
>       if (ret) {
> +        error_report("vfio: Could not start dirty page tracking, err: %d (%s)",
> +                     ret, strerror(-ret));
>           vfio_set_migration_error(ret);
>       }
>   }
> @@ -1413,8 +1566,15 @@ static void vfio_listener_log_global_stop(MemoryListener *listener)
>       VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>       int ret;
>   
> -    ret = vfio_set_dirty_page_tracking(container, false);
> +    if (vfio_devices_all_device_dirty_tracking(container)) {
> +        ret = vfio_devices_dma_logging_stop(container);
> +    } else {
> +        ret = vfio_set_dirty_page_tracking(container, false);
> +    }
> +
>       if (ret) {
> +        error_report("vfio: Could not stop dirty page tracking, err: %d (%s)",
> +                     ret, strerror(-ret));
>           vfio_set_migration_error(ret);
>       }
>   }
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index d97a6de17921..7a7e0cfe5b23 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -105,6 +105,7 @@ vfio_listener_region_add_no_dma_map(const char *name, uint64_t iova, uint64_t si
>   vfio_listener_region_del_skip(uint64_t start, uint64_t end) "SKIPPING region_del 0x%"PRIx64" - 0x%"PRIx64
>   vfio_listener_region_del(uint64_t start, uint64_t end) "region_del 0x%"PRIx64" - 0x%"PRIx64
>   vfio_device_dirty_tracking_update(uint64_t start, uint64_t end, uint64_t min, uint64_t max) "section 0x%"PRIx64" - 0x%"PRIx64" -> update [0x%"PRIx64" - 0x%"PRIx64"]"
> +vfio_device_dirty_tracking_start(int nr_ranges, uint64_t min32, uint64_t max32, uint64_t min64, uint64_t max64) "nr_ranges %d 32:[0x%"PRIx64" - 0x%"PRIx64"], 64:[0x%"PRIx64" - 0x%"PRIx64"]"
>   vfio_disconnect_container(int fd) "close container->fd=%d"
>   vfio_put_group(int fd) "close group->fd=%d"
>   vfio_get_device(const char * name, unsigned int flags, unsigned int num_regions, unsigned int num_irqs) "Device %s flags: %u, regions: %u, irqs: %u"
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 96791add2719..1cbbccd91e11 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -154,6 +154,8 @@ typedef struct VFIODevice {
>       VFIOMigration *migration;
>       Error *migration_blocker;
>       OnOffAuto pre_copy_dirty_page_tracking;
> +    bool dirty_pages_supported;
> +    bool dirty_tracking;
>   } VFIODevice;
>   
>   struct VFIODeviceOps {




* Re: [PATCH v3 08/13] vfio/common: Add device dirty page tracking start/stop
  2023-03-04  1:43 ` [PATCH v3 08/13] vfio/common: Add device dirty page tracking start/stop Joao Martins
  2023-03-06 18:25   ` Cédric Le Goater
@ 2023-03-06 18:42   ` Alex Williamson
  2023-03-06 19:39     ` Joao Martins
  1 sibling, 1 reply; 51+ messages in thread
From: Alex Williamson @ 2023-03-06 18:42 UTC (permalink / raw)
  To: Joao Martins
  Cc: qemu-devel, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon

On Sat,  4 Mar 2023 01:43:38 +0000
Joao Martins <joao.m.martins@oracle.com> wrote:

> Add device dirty page tracking start/stop functionality. This uses the
> device DMA logging uAPI to start and stop dirty page tracking by device.
> 
> Device dirty page tracking is used only if all devices within a
> container support device dirty page tracking.
> 
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>  hw/vfio/common.c              | 166 +++++++++++++++++++++++++++++++++-
>  hw/vfio/trace-events          |   1 +
>  include/hw/vfio/vfio-common.h |   2 +
>  3 files changed, 166 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index d84e5fd86bb4..aa0df0604704 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -453,6 +453,22 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
>      return true;
>  }
>  
> +static bool vfio_devices_all_device_dirty_tracking(VFIOContainer *container)
> +{
> +    VFIOGroup *group;
> +    VFIODevice *vbasedev;
> +
> +    QLIST_FOREACH(group, &container->group_list, container_next) {
> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
> +            if (!vbasedev->dirty_pages_supported) {
> +                return false;
> +            }
> +        }
> +    }
> +
> +    return true;
> +}
> +
>  /*
>   * Check if all VFIO devices are running and migration is active, which is
>   * essentially equivalent to the migration being in pre-copy phase.
> @@ -1395,15 +1411,152 @@ static void vfio_dirty_tracking_init(VFIOContainer *container)
>      qemu_mutex_destroy(&container->tracking_mutex);
>  }
>  
> +static int vfio_devices_dma_logging_set(VFIOContainer *container,
> +                                        struct vfio_device_feature *feature)
> +{
> +    bool status = (feature->flags & VFIO_DEVICE_FEATURE_MASK) ==
> +                  VFIO_DEVICE_FEATURE_DMA_LOGGING_START;
> +    VFIODevice *vbasedev;
> +    VFIOGroup *group;
> +    int ret = 0;
> +
> +    QLIST_FOREACH(group, &container->group_list, container_next) {
> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
> +            if (vbasedev->dirty_tracking == status) {
> +                continue;
> +            }
> +
> +            ret = ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature);
> +            if (ret) {
> +                ret = -errno;
> +                error_report("%s: Failed to set DMA logging %s, err %d (%s)",
> +                             vbasedev->name, status ? "start" : "stop", ret,
> +                             strerror(errno));
> +                goto out;
> +            }

Exiting and returning an error on the first failure when starting dirty
tracking makes sense.  Does that behavior still make sense for the stop
path?  Maybe since we only support a single device it doesn't really
matter, but this needs to be revisited for multiple devices.  Thanks,

Alex
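For the multi-device case Alex raises, the usual pattern is to remember the first error on the stop path but keep iterating, so one failing device doesn't leave the others with tracking still enabled. A standalone sketch of that idea (illustrative names and a toy device struct, not the patch's code):

```c
#include <errno.h>

/* Sketch, not QEMU code: on stop, attempt every device, record the
 * first failure, but never bail early, so the remaining devices are
 * not left with DMA logging enabled. fail_errno simulates an ioctl
 * failure for that device. */
struct dev {
    int fail_errno; /* nonzero: pretend the stop ioctl fails with this */
    int tracking;   /* 1 while DMA logging is active */
};

static int stop_all(struct dev *devs, int n)
{
    int first_err = 0;

    for (int i = 0; i < n; i++) {
        if (devs[i].fail_errno) {
            if (!first_err) {
                first_err = -devs[i].fail_errno; /* remember first error */
            }
            continue; /* keep stopping the other devices */
        }
        devs[i].tracking = 0;
    }
    return first_err; /* reported, but every device was attempted */
}
```

The start path is different: there a failure should stop the loop and unwind, since partial tracking is useless, which is exactly the asymmetry the review comment is pointing at.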

> +            vbasedev->dirty_tracking = status;
> +        }
> +    }
> +
> +out:
> +    return ret;
> +}
> +
> +static int vfio_devices_dma_logging_stop(VFIOContainer *container)
> +{
> +    uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature),
> +                              sizeof(uint64_t))] = {};
> +    struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
> +
> +    feature->argsz = sizeof(buf);
> +    feature->flags = VFIO_DEVICE_FEATURE_SET;
> +    feature->flags |= VFIO_DEVICE_FEATURE_DMA_LOGGING_STOP;
> +
> +    return vfio_devices_dma_logging_set(container, feature);
> +}
> +
> +static struct vfio_device_feature *
> +vfio_device_feature_dma_logging_start_create(VFIOContainer *container)
> +{
> +    struct vfio_device_feature *feature;
> +    size_t feature_size;
> +    struct vfio_device_feature_dma_logging_control *control;
> +    struct vfio_device_feature_dma_logging_range *ranges;
> +    VFIODirtyTrackingRange *tracking = &container->tracking_range;
> +
> +    feature_size = sizeof(struct vfio_device_feature) +
> +                   sizeof(struct vfio_device_feature_dma_logging_control);
> +    feature = g_try_malloc0(feature_size);
> +    if (!feature) {
> +        errno = ENOMEM;
> +        return NULL;
> +    }
> +    feature->argsz = feature_size;
> +    feature->flags = VFIO_DEVICE_FEATURE_SET;
> +    feature->flags |= VFIO_DEVICE_FEATURE_DMA_LOGGING_START;
> +
> +    control = (struct vfio_device_feature_dma_logging_control *)feature->data;
> +    control->page_size = qemu_real_host_page_size();
> +
> +    /*
> +     * DMA logging uAPI guarantees to support at least a number of ranges that
> +     * fits into a single host kernel base page.
> +     */
> +    control->num_ranges = !!tracking->max32 + !!tracking->max64;
> +    ranges = g_try_new0(struct vfio_device_feature_dma_logging_range,
> +                        control->num_ranges);
> +    if (!ranges) {
> +        g_free(feature);
> +        errno = ENOMEM;
> +
> +        return NULL;
> +    }
> +
> +    control->ranges = (__aligned_u64)ranges;
> +    if (tracking->max32) {
> +        ranges->iova = tracking->min32;
> +        ranges->length = (tracking->max32 - tracking->min32) + 1;
> +        ranges++;
> +    }
> +    if (tracking->max64) {
> +        ranges->iova = tracking->min64;
> +        ranges->length = (tracking->max64 - tracking->min64) + 1;
> +    }
> +
> +    trace_vfio_device_dirty_tracking_start(control->num_ranges,
> +                                           tracking->min32, tracking->max32,
> +                                           tracking->min64, tracking->max64);
> +
> +    return feature;
> +}
> +
> +static void vfio_device_feature_dma_logging_start_destroy(
> +    struct vfio_device_feature *feature)
> +{
> +    struct vfio_device_feature_dma_logging_control *control =
> +        (struct vfio_device_feature_dma_logging_control *)feature->data;
> +    struct vfio_device_feature_dma_logging_range *ranges =
> +        (struct vfio_device_feature_dma_logging_range *)control->ranges;
> +
> +    g_free(ranges);
> +    g_free(feature);
> +}
> +
> +static int vfio_devices_dma_logging_start(VFIOContainer *container)
> +{
> +    struct vfio_device_feature *feature;
> +    int ret = 0;
> +
> +    vfio_dirty_tracking_init(container);
> +    feature = vfio_device_feature_dma_logging_start_create(container);
> +    if (!feature) {
> +        return -errno;
> +    }
> +
> +    ret = vfio_devices_dma_logging_set(container, feature);
> +    if (ret) {
> +        vfio_devices_dma_logging_stop(container);
> +    }
> +
> +    vfio_device_feature_dma_logging_start_destroy(feature);
> +
> +    return ret;
> +}
> +
>  static void vfio_listener_log_global_start(MemoryListener *listener)
>  {
>      VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>      int ret;
>  
> -    vfio_dirty_tracking_init(container);
> +    if (vfio_devices_all_device_dirty_tracking(container)) {
> +        ret = vfio_devices_dma_logging_start(container);
> +    } else {
> +        ret = vfio_set_dirty_page_tracking(container, true);
> +    }
>  
> -    ret = vfio_set_dirty_page_tracking(container, true);
>      if (ret) {
> +        error_report("vfio: Could not start dirty page tracking, err: %d (%s)",
> +                     ret, strerror(-ret));
>          vfio_set_migration_error(ret);
>      }
>  }
> @@ -1413,8 +1566,15 @@ static void vfio_listener_log_global_stop(MemoryListener *listener)
>      VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>      int ret;
>  
> -    ret = vfio_set_dirty_page_tracking(container, false);
> +    if (vfio_devices_all_device_dirty_tracking(container)) {
> +        ret = vfio_devices_dma_logging_stop(container);
> +    } else {
> +        ret = vfio_set_dirty_page_tracking(container, false);
> +    }
> +
>      if (ret) {
> +        error_report("vfio: Could not stop dirty page tracking, err: %d (%s)",
> +                     ret, strerror(-ret));
>          vfio_set_migration_error(ret);
>      }
>  }
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index d97a6de17921..7a7e0cfe5b23 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -105,6 +105,7 @@ vfio_listener_region_add_no_dma_map(const char *name, uint64_t iova, uint64_t si
>  vfio_listener_region_del_skip(uint64_t start, uint64_t end) "SKIPPING region_del 0x%"PRIx64" - 0x%"PRIx64
>  vfio_listener_region_del(uint64_t start, uint64_t end) "region_del 0x%"PRIx64" - 0x%"PRIx64
>  vfio_device_dirty_tracking_update(uint64_t start, uint64_t end, uint64_t min, uint64_t max) "section 0x%"PRIx64" - 0x%"PRIx64" -> update [0x%"PRIx64" - 0x%"PRIx64"]"
> +vfio_device_dirty_tracking_start(int nr_ranges, uint64_t min32, uint64_t max32, uint64_t min64, uint64_t max64) "nr_ranges %d 32:[0x%"PRIx64" - 0x%"PRIx64"], 64:[0x%"PRIx64" - 0x%"PRIx64"]"
>  vfio_disconnect_container(int fd) "close container->fd=%d"
>  vfio_put_group(int fd) "close group->fd=%d"
>  vfio_get_device(const char * name, unsigned int flags, unsigned int num_regions, unsigned int num_irqs) "Device %s flags: %u, regions: %u, irqs: %u"
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 96791add2719..1cbbccd91e11 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -154,6 +154,8 @@ typedef struct VFIODevice {
>      VFIOMigration *migration;
>      Error *migration_blocker;
>      OnOffAuto pre_copy_dirty_page_tracking;
> +    bool dirty_pages_supported;
> +    bool dirty_tracking;
>  } VFIODevice;
>  
>  struct VFIODeviceOps {




* Re: [PATCH v3 10/13] vfio/common: Add device dirty page bitmap sync
  2023-03-04  1:43 ` [PATCH v3 10/13] vfio/common: Add device dirty page bitmap sync Joao Martins
@ 2023-03-06 19:22   ` Alex Williamson
  2023-03-06 19:42     ` Joao Martins
  0 siblings, 1 reply; 51+ messages in thread
From: Alex Williamson @ 2023-03-06 19:22 UTC (permalink / raw)
  To: Joao Martins
  Cc: qemu-devel, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon

On Sat,  4 Mar 2023 01:43:40 +0000
Joao Martins <joao.m.martins@oracle.com> wrote:

> Add device dirty page bitmap sync functionality. This uses the device
> DMA logging uAPI to sync dirty page bitmap from the device.
> 
> Device dirty page bitmap sync is used only if all devices within a
> container support device dirty page tracking.
> 
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>  hw/vfio/common.c | 88 +++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 79 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index b0c7d03279ab..5b8456975e97 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -342,6 +342,9 @@ static int vfio_bitmap_alloc(VFIOBitmap *vbmap, hwaddr size)
>      return 0;
>  }
>  
> +static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
> +                                 uint64_t size, ram_addr_t ram_addr);
> +
>  bool vfio_mig_active(void)
>  {
>      VFIOGroup *group;
> @@ -565,10 +568,16 @@ static int vfio_dma_unmap(VFIOContainer *container,
>          .iova = iova,
>          .size = size,
>      };
> +    bool need_dirty_sync = false;
> +    int ret;
> +
> +    if (iotlb && vfio_devices_all_running_and_mig_active(container)) {
> +        if (!vfio_devices_all_device_dirty_tracking(container) &&
> +            container->dirty_pages_supported) {
> +            return vfio_dma_unmap_bitmap(container, iova, size, iotlb);
> +        }
>  
> -    if (iotlb && container->dirty_pages_supported &&
> -        vfio_devices_all_running_and_mig_active(container)) {
> -        return vfio_dma_unmap_bitmap(container, iova, size, iotlb);
> +        need_dirty_sync = true;
>      }
>  
>      while (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
> @@ -594,10 +603,12 @@ static int vfio_dma_unmap(VFIOContainer *container,
>          return -errno;
>      }
>  
> -    if (iotlb && vfio_devices_all_running_and_mig_active(container)) {
> -        cpu_physical_memory_set_dirty_range(iotlb->translated_addr, size,
> -                                            tcg_enabled() ? DIRTY_CLIENTS_ALL :
> -                                            DIRTY_CLIENTS_NOCODE);
> +    if (need_dirty_sync) {
> +        ret = vfio_get_dirty_bitmap(container, iova, size,
> +                                    iotlb->translated_addr);
> +        if (ret) {
> +            return ret;
> +        }
>      }
>  
>      return 0;
> @@ -1579,6 +1590,58 @@ static void vfio_listener_log_global_stop(MemoryListener *listener)
>      }
>  }
>  
> +static int vfio_device_dma_logging_report(VFIODevice *vbasedev, hwaddr iova,
> +                                          hwaddr size, void *bitmap)
> +{
> +    uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
> +                        sizeof(struct vfio_device_feature_dma_logging_report),
> +                        sizeof(__aligned_u64))] = {};
> +    struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
> +    struct vfio_device_feature_dma_logging_report *report =
> +        (struct vfio_device_feature_dma_logging_report *)feature->data;
> +
> +    report->iova = iova;
> +    report->length = size;
> +    report->page_size = qemu_real_host_page_size();
> +    report->bitmap = (__aligned_u64)bitmap;
> +
> +    feature->argsz = sizeof(buf);
> +    feature->flags =
> +        VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_DMA_LOGGING_REPORT;

Nit, the series is inconsistent between initializing flags as above and
as:

    feature->flags = VFIO_DEVICE_FEATURE_GET;
    feature->flags |= VFIO_DEVICE_FEATURE_DMA_LOGGING_REPORT;

My personal preference would be more like:

    feature->flags = VFIO_DEVICE_FEATURE_GET |
                     VFIO_DEVICE_FEATURE_DMA_LOGGING_REPORT;

Thanks,
Alex

> +
> +    if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
> +        return -errno;
> +    }
> +
> +    return 0;
> +}
> +
> +static int vfio_devices_query_dirty_bitmap(VFIOContainer *container,
> +                                           VFIOBitmap *vbmap, hwaddr iova,
> +                                           hwaddr size)
> +{
> +    VFIODevice *vbasedev;
> +    VFIOGroup *group;
> +    int ret;
> +
> +    QLIST_FOREACH(group, &container->group_list, container_next) {
> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
> +            ret = vfio_device_dma_logging_report(vbasedev, iova, size,
> +                                                 vbmap->bitmap);
> +            if (ret) {
> +                error_report("%s: Failed to get DMA logging report, iova: "
> +                             "0x%" HWADDR_PRIx ", size: 0x%" HWADDR_PRIx
> +                             ", err: %d (%s)",
> +                             vbasedev->name, iova, size, ret, strerror(-ret));
> +
> +                return ret;
> +            }
> +        }
> +    }
> +
> +    return 0;
> +}
> +
>  static int vfio_query_dirty_bitmap(VFIOContainer *container, VFIOBitmap *vbmap,
>                                     hwaddr iova, hwaddr size)
>  {
> @@ -1619,10 +1682,12 @@ static int vfio_query_dirty_bitmap(VFIOContainer *container, VFIOBitmap *vbmap,
>  static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
>                                   uint64_t size, ram_addr_t ram_addr)
>  {
> +    bool all_device_dirty_tracking =
> +        vfio_devices_all_device_dirty_tracking(container);
>      VFIOBitmap vbmap;
>      int ret;
>  
> -    if (!container->dirty_pages_supported) {
> +    if (!container->dirty_pages_supported && !all_device_dirty_tracking) {
>          cpu_physical_memory_set_dirty_range(ram_addr, size,
>                                              tcg_enabled() ? DIRTY_CLIENTS_ALL :
>                                              DIRTY_CLIENTS_NOCODE);
> @@ -1634,7 +1699,12 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
>          return -errno;
>      }
>  
> -    ret = vfio_query_dirty_bitmap(container, &vbmap, iova, size);
> +    if (all_device_dirty_tracking) {
> +        ret = vfio_devices_query_dirty_bitmap(container, &vbmap, iova, size);
> +    } else {
> +        ret = vfio_query_dirty_bitmap(container, &vbmap, iova, size);
> +    }
> +
>      if (ret) {
>          goto out;
>      }



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 07/13] vfio/common: Record DMA mapped IOVA ranges
  2023-03-06 18:15   ` Alex Williamson
@ 2023-03-06 19:32     ` Joao Martins
  0 siblings, 0 replies; 51+ messages in thread
From: Joao Martins @ 2023-03-06 19:32 UTC (permalink / raw)
  To: Alex Williamson
  Cc: qemu-devel, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon

On 06/03/2023 18:15, Alex Williamson wrote:
> On Sat,  4 Mar 2023 01:43:37 +0000
> Joao Martins <joao.m.martins@oracle.com> wrote:
> 
>> According to the device DMA logging uAPI, IOVA ranges to be logged by
>> the device must be provided all at once upon DMA logging start.
>>
>> As preparation for the following patches which will add device dirty
>> page tracking, keep a record of all DMA mapped IOVA ranges so later they
>> can be used for DMA logging start.
>>
>> Note that when vIOMMU is enabled DMA mapped IOVA ranges are not tracked.
>> This is due to the dynamic nature of vIOMMU DMA mapping/unmapping.
> 
> Commit log is outdated for this version.
>
I will remove the paragraph. I can't mention that vIOMMU usage blocks migration
here, as I effectively only do that later in the series.

>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>>  hw/vfio/common.c              | 84 +++++++++++++++++++++++++++++++++++
>>  hw/vfio/trace-events          |  1 +
>>  include/hw/vfio/vfio-common.h | 11 +++++
>>  3 files changed, 96 insertions(+)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index ed908e303dbb..d84e5fd86bb4 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -44,6 +44,7 @@
>>  #include "migration/blocker.h"
>>  #include "migration/qemu-file.h"
>>  #include "sysemu/tpm.h"
>> +#include "qemu/iova-tree.h"
> 
> Unnecessary
> 
True, I had already removed it for v4, as Avihai pointed that out to me offlist
too. Same for the other one further down.

>>  
>>  VFIOGroupList vfio_group_list =
>>      QLIST_HEAD_INITIALIZER(vfio_group_list);
>> @@ -1313,11 +1314,94 @@ static int vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
>>      return ret;
>>  }
>>  
>> +/*
>> + * Called for the dirty tracking memory listener to calculate the iova/end
>> + * for a given memory region section. The checks here, replicate the logic
>> + * in vfio_listener_region_{add,del}() used for the same purpose. And thus
>> + * both listener should be kept in sync.
>> + */
>> +static bool vfio_get_section_iova_range(VFIOContainer *container,
>> +                                        MemoryRegionSection *section,
>> +                                        hwaddr *out_iova, hwaddr *out_end)
>> +{
>> +    Int128 llend;
>> +    hwaddr iova;
>> +
>> +    iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space);
>> +    llend = int128_make64(section->offset_within_address_space);
>> +    llend = int128_add(llend, section->size);
>> +    llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask()));
>> +
>> +    if (int128_ge(int128_make64(iova), llend)) {
>> +        return false;
>> +    }
>> +
>> +    *out_iova = iova;
>> +    *out_end = int128_get64(llend) - 1;
>> +    return true;
>> +}
> 
> Not sure why this isn't turned into a helper here to avoid the issue
> noted in the comment. 

The reason I didn't reuse it directly is that the calculation happens in
different places in the existing listeners. But I notice now that I was
confusing it with llsize (in the old checks I have since removed). @end is not
used in the check that precedes it, so I am moving this calculation into a
helper. Presumably v4 will have a new preceding patch that adds this
vfio_get_section_iova_range() helper, and this patch will just use it.

> Also why do both of the existing listener
> implementations resolve the end address as:
> 
> 	int128_get64(int128_sub(llend, int128_one()));
> 
> While here we use:
> 
> 	int128_get64(llend) - 1;
> 
> We're already out of sync.
> 
True

>> +
>> +static void vfio_dirty_tracking_update(MemoryListener *listener,
>> +                                       MemoryRegionSection *section)
>> +{
>> +    VFIOContainer *container = container_of(listener, VFIOContainer,
>> +                                            tracking_listener);
>> +    VFIODirtyTrackingRange *range = &container->tracking_range;
>> +    hwaddr max32 = (1ULL << 32) - 1ULL;
> 
> UINT32_MAX
> 
/me nods

>> +    hwaddr iova, end;
>> +
>> +    if (!vfio_listener_valid_section(section) ||
>> +        !vfio_get_section_iova_range(container, section, &iova, &end)) {
>> +        return;
>> +    }
>> +
>> +    WITH_QEMU_LOCK_GUARD(&container->tracking_mutex) {
>> +        if (iova < max32 && end <= max32) {
>> +                if (range->min32 > iova) {
>> +                    range->min32 = iova;
>> +                }
>> +                if (range->max32 < end) {
>> +                    range->max32 = end;
>> +                }
>> +                trace_vfio_device_dirty_tracking_update(iova, end,
>> +                                            range->min32, range->max32);
>> +        } else {
>> +                if (!range->min64 || range->min64 > iova) {
>> +                    range->min64 = iova;
>> +                }
> 
> I think this improperly handles a range starting at zero, min64 should
> be UINT64_MAX initially.  For example, if we have ranges 0-8GB and
> 12-16GB, this effectively ignores the first range.  Likewise I think
> range->min32 has a similar problem, it's initialized to zero, it will
> never be updated to match a non-zero lowest range.  It needs to be
> initialized to UINT32_MAX.
> 
Yes, let me switch to that. I'll initialize min64/min32 to
UINT64_MAX/UINT32_MAX in the place where we initialize the state for the DMA
tracking listener.

> A comment describing the purpose of the 32/64 split tracking would be
> useful too.
> 

Yes, definitely, e.g.:

/*
 * The address space passed to the dirty tracker is reduced to two ranges: one
 * for 32-bit DMA ranges, and another for 64-bit DMA ranges. The underlying
 * dirty bitmap reports will query a sub-interval of each of these ranges. The
 * purpose of the dual-range handling is to cope with known big holes in the
 * address space, like the x86 AMD 1T hole. The alternative would be an
 * IOVATree, but that has a much bigger runtime overhead and unnecessary
 * complexity.
 */

>> +                if (range->max64 < end) {
>> +                    range->max64 = end;
>> +                }
>> +                trace_vfio_device_dirty_tracking_update(iova, end,
>> +                                            range->min64, range->max64);
>> +        }
>> +    }
>> +    return;
>> +}
>> +
>> +static const MemoryListener vfio_dirty_tracking_listener = {
>> +    .name = "vfio-tracking",
>> +    .region_add = vfio_dirty_tracking_update,
>> +};
>> +
>> +static void vfio_dirty_tracking_init(VFIOContainer *container)
>> +{
>> +    memset(&container->tracking_range, 0, sizeof(container->tracking_range));
>> +    qemu_mutex_init(&container->tracking_mutex);
> 
> As noted in other thread, this mutex seems unnecessary.
> 
Already deleted it for v4.

> The listener needs to be embedded in an object, but it doesn't need to
> be the container.  Couldn't we create:
> 
> typedef struct VFIODirtyRanges {
>     VFIOContainer *container;
>     VFIODirtyTrackingRange ranges;
>     MemoryListener listener;
> } VFIODirtyRanges;
> 
> For use here?  Caller could provide VFIODirtyTrackingRange pointer for
> the resulting ranges, which then gets passed to
> vfio_device_feature_dma_logging_start_create()

Oh, that would be so much cleaner, yes. Will switch to that.

> 
>> +    container->tracking_listener = vfio_dirty_tracking_listener;
>> +    memory_listener_register(&container->tracking_listener,
>> +                             container->space->as);
>> +    memory_listener_unregister(&container->tracking_listener);
> 
> It's sufficiently subtle that the listener callback is synchronous and
> we're done with it here for a comment.
> 

Will add a comment e.g.:

 /*
  * The memory listener is synchronous and only used to calculate the IOVA
  * ranges for dirty tracking. Unregister it after we are done, as we are not
  * interested in any follow-up updates.
  */

>> +    qemu_mutex_destroy(&container->tracking_mutex);
>> +}
>> +
>>  static void vfio_listener_log_global_start(MemoryListener *listener)
>>  {
>>      VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>>      int ret;
>>  
>> +    vfio_dirty_tracking_init(container);
>> +
>>      ret = vfio_set_dirty_page_tracking(container, true);
>>      if (ret) {
>>          vfio_set_migration_error(ret);
>> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
>> index 669d9fe07cd9..d97a6de17921 100644
>> --- a/hw/vfio/trace-events
>> +++ b/hw/vfio/trace-events
>> @@ -104,6 +104,7 @@ vfio_known_safe_misalignment(const char *name, uint64_t iova, uint64_t offset_wi
>>  vfio_listener_region_add_no_dma_map(const char *name, uint64_t iova, uint64_t size, uint64_t page_size) "Region \"%s\" 0x%"PRIx64" size=0x%"PRIx64" is not aligned to 0x%"PRIx64" and cannot be mapped for DMA"
>>  vfio_listener_region_del_skip(uint64_t start, uint64_t end) "SKIPPING region_del 0x%"PRIx64" - 0x%"PRIx64
>>  vfio_listener_region_del(uint64_t start, uint64_t end) "region_del 0x%"PRIx64" - 0x%"PRIx64
>> +vfio_device_dirty_tracking_update(uint64_t start, uint64_t end, uint64_t min, uint64_t max) "section 0x%"PRIx64" - 0x%"PRIx64" -> update [0x%"PRIx64" - 0x%"PRIx64"]"
>>  vfio_disconnect_container(int fd) "close container->fd=%d"
>>  vfio_put_group(int fd) "close group->fd=%d"
>>  vfio_get_device(const char * name, unsigned int flags, unsigned int num_regions, unsigned int num_irqs) "Device %s flags: %u, regions: %u, irqs: %u"
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index 87524c64a443..96791add2719 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -23,6 +23,7 @@
>>  
>>  #include "exec/memory.h"
>>  #include "qemu/queue.h"
>> +#include "qemu/iova-tree.h"
> 
> Unused.
> 
I had already removed it preemptively.

>>  #include "qemu/notify.h"
>>  #include "ui/console.h"
>>  #include "hw/display/ramfb.h"
>> @@ -68,6 +69,13 @@ typedef struct VFIOMigration {
>>      size_t data_buffer_size;
>>  } VFIOMigration;
>>  
>> +typedef struct VFIODirtyTrackingRange {
>> +    hwaddr min32;
>> +    hwaddr max32;
>> +    hwaddr min64;
>> +    hwaddr max64;
>> +} VFIODirtyTrackingRange;
>> +
>>  typedef struct VFIOAddressSpace {
>>      AddressSpace *as;
>>      QLIST_HEAD(, VFIOContainer) containers;
>> @@ -89,6 +97,9 @@ typedef struct VFIOContainer {
>>      uint64_t max_dirty_bitmap_size;
>>      unsigned long pgsizes;
>>      unsigned int dma_max_mappings;
>> +    VFIODirtyTrackingRange tracking_range;
>> +    QemuMutex tracking_mutex;
>> +    MemoryListener tracking_listener;
> 
> Let's make this unnecessary too.  Thanks,
> 
Got it.



* Re: [PATCH v3 08/13] vfio/common: Add device dirty page tracking start/stop
  2023-03-06 18:42   ` Alex Williamson
@ 2023-03-06 19:39     ` Joao Martins
  2023-03-06 20:00       ` Alex Williamson
  0 siblings, 1 reply; 51+ messages in thread
From: Joao Martins @ 2023-03-06 19:39 UTC (permalink / raw)
  To: Alex Williamson
  Cc: qemu-devel, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon

On 06/03/2023 18:42, Alex Williamson wrote:
> On Sat,  4 Mar 2023 01:43:38 +0000
> Joao Martins <joao.m.martins@oracle.com> wrote:
> 
>> Add device dirty page tracking start/stop functionality. This uses the
>> device DMA logging uAPI to start and stop dirty page tracking by device.
>>
>> Device dirty page tracking is used only if all devices within a
>> container support device dirty page tracking.
>>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>>  hw/vfio/common.c              | 166 +++++++++++++++++++++++++++++++++-
>>  hw/vfio/trace-events          |   1 +
>>  include/hw/vfio/vfio-common.h |   2 +
>>  3 files changed, 166 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index d84e5fd86bb4..aa0df0604704 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -453,6 +453,22 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
>>      return true;
>>  }
>>  
>> +static bool vfio_devices_all_device_dirty_tracking(VFIOContainer *container)
>> +{
>> +    VFIOGroup *group;
>> +    VFIODevice *vbasedev;
>> +
>> +    QLIST_FOREACH(group, &container->group_list, container_next) {
>> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
>> +            if (!vbasedev->dirty_pages_supported) {
>> +                return false;
>> +            }
>> +        }
>> +    }
>> +
>> +    return true;
>> +}
>> +
>>  /*
>>   * Check if all VFIO devices are running and migration is active, which is
>>   * essentially equivalent to the migration being in pre-copy phase.
>> @@ -1395,15 +1411,152 @@ static void vfio_dirty_tracking_init(VFIOContainer *container)
>>      qemu_mutex_destroy(&container->tracking_mutex);
>>  }
>>  
>> +static int vfio_devices_dma_logging_set(VFIOContainer *container,
>> +                                        struct vfio_device_feature *feature)
>> +{
>> +    bool status = (feature->flags & VFIO_DEVICE_FEATURE_MASK) ==
>> +                  VFIO_DEVICE_FEATURE_DMA_LOGGING_START;
>> +    VFIODevice *vbasedev;
>> +    VFIOGroup *group;
>> +    int ret = 0;
>> +
>> +    QLIST_FOREACH(group, &container->group_list, container_next) {
>> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
>> +            if (vbasedev->dirty_tracking == status) {
>> +                continue;
>> +            }
>> +
>> +            ret = ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature);
>> +            if (ret) {
>> +                ret = -errno;
>> +                error_report("%s: Failed to set DMA logging %s, err %d (%s)",
>> +                             vbasedev->name, status ? "start" : "stop", ret,
>> +                             strerror(errno));
>> +                goto out;
>> +            }
> 
> Exiting and returning an error on the first failure when starting dirty
> tracking makes sense.  Does that behavior still make sense for the stop
> path?  Maybe since we only support a single device it doesn't really
> matter, but this needs to be revisited for multiple devices.  Thanks,
> 

How about I test for @status and only exit early based on that?
(Maybe the variable should be renamed too.) E.g.:

if (ret) {
  ret = -errno;
  error_report("%s: Failed to set DMA logging %s, err %d (%s)",
               vbasedev->name, status ? "start" : "stop", ret,
               strerror(errno));
  if (status) {
      goto out;
  }
}

> Alex
> 
>> +            vbasedev->dirty_tracking = status;
>> +        }
>> +    }
>> +
>> +out:
>> +    return ret;
>> +}
>> +
>> +static int vfio_devices_dma_logging_stop(VFIOContainer *container)
>> +{
>> +    uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature),
>> +                              sizeof(uint64_t))] = {};
>> +    struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
>> +
>> +    feature->argsz = sizeof(buf);
>> +    feature->flags = VFIO_DEVICE_FEATURE_SET;
>> +    feature->flags |= VFIO_DEVICE_FEATURE_DMA_LOGGING_STOP;
>> +
>> +    return vfio_devices_dma_logging_set(container, feature);
>> +}
>> +
>> +static struct vfio_device_feature *
>> +vfio_device_feature_dma_logging_start_create(VFIOContainer *container)
>> +{
>> +    struct vfio_device_feature *feature;
>> +    size_t feature_size;
>> +    struct vfio_device_feature_dma_logging_control *control;
>> +    struct vfio_device_feature_dma_logging_range *ranges;
>> +    VFIODirtyTrackingRange *tracking = &container->tracking_range;
>> +
>> +    feature_size = sizeof(struct vfio_device_feature) +
>> +                   sizeof(struct vfio_device_feature_dma_logging_control);
>> +    feature = g_try_malloc0(feature_size);
>> +    if (!feature) {
>> +        errno = ENOMEM;
>> +        return NULL;
>> +    }
>> +    feature->argsz = feature_size;
>> +    feature->flags = VFIO_DEVICE_FEATURE_SET;
>> +    feature->flags |= VFIO_DEVICE_FEATURE_DMA_LOGGING_START;
>> +
>> +    control = (struct vfio_device_feature_dma_logging_control *)feature->data;
>> +    control->page_size = qemu_real_host_page_size();
>> +
>> +    /*
>> +     * DMA logging uAPI guarantees to support at least a number of ranges that
>> +     * fits into a single host kernel base page.
>> +     */
>> +    control->num_ranges = !!tracking->max32 + !!tracking->max64;
>> +    ranges = g_try_new0(struct vfio_device_feature_dma_logging_range,
>> +                        control->num_ranges);
>> +    if (!ranges) {
>> +        g_free(feature);
>> +        errno = ENOMEM;
>> +
>> +        return NULL;
>> +    }
>> +
>> +    control->ranges = (__aligned_u64)ranges;
>> +    if (tracking->max32) {
>> +        ranges->iova = tracking->min32;
>> +        ranges->length = (tracking->max32 - tracking->min32) + 1;
>> +        ranges++;
>> +    }
>> +    if (tracking->max64) {
>> +        ranges->iova = tracking->min64;
>> +        ranges->length = (tracking->max64 - tracking->min64) + 1;
>> +    }
>> +
>> +    trace_vfio_device_dirty_tracking_start(control->num_ranges,
>> +                                           tracking->min32, tracking->max32,
>> +                                           tracking->min64, tracking->max64);
>> +
>> +    return feature;
>> +}
>> +
>> +static void vfio_device_feature_dma_logging_start_destroy(
>> +    struct vfio_device_feature *feature)
>> +{
>> +    struct vfio_device_feature_dma_logging_control *control =
>> +        (struct vfio_device_feature_dma_logging_control *)feature->data;
>> +    struct vfio_device_feature_dma_logging_range *ranges =
>> +        (struct vfio_device_feature_dma_logging_range *)control->ranges;
>> +
>> +    g_free(ranges);
>> +    g_free(feature);
>> +}
>> +
>> +static int vfio_devices_dma_logging_start(VFIOContainer *container)
>> +{
>> +    struct vfio_device_feature *feature;
>> +    int ret = 0;
>> +
>> +    vfio_dirty_tracking_init(container);
>> +    feature = vfio_device_feature_dma_logging_start_create(container);
>> +    if (!feature) {
>> +        return -errno;
>> +    }
>> +
>> +    ret = vfio_devices_dma_logging_set(container, feature);
>> +    if (ret) {
>> +        vfio_devices_dma_logging_stop(container);
>> +    }
>> +
>> +    vfio_device_feature_dma_logging_start_destroy(feature);
>> +
>> +    return ret;
>> +}
>> +
>>  static void vfio_listener_log_global_start(MemoryListener *listener)
>>  {
>>      VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>>      int ret;
>>  
>> -    vfio_dirty_tracking_init(container);
>> +    if (vfio_devices_all_device_dirty_tracking(container)) {
>> +        ret = vfio_devices_dma_logging_start(container);
>> +    } else {
>> +        ret = vfio_set_dirty_page_tracking(container, true);
>> +    }
>>  
>> -    ret = vfio_set_dirty_page_tracking(container, true);
>>      if (ret) {
>> +        error_report("vfio: Could not start dirty page tracking, err: %d (%s)",
>> +                     ret, strerror(-ret));
>>          vfio_set_migration_error(ret);
>>      }
>>  }
>> @@ -1413,8 +1566,15 @@ static void vfio_listener_log_global_stop(MemoryListener *listener)
>>      VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>>      int ret;
>>  
>> -    ret = vfio_set_dirty_page_tracking(container, false);
>> +    if (vfio_devices_all_device_dirty_tracking(container)) {
>> +        ret = vfio_devices_dma_logging_stop(container);
>> +    } else {
>> +        ret = vfio_set_dirty_page_tracking(container, false);
>> +    }
>> +
>>      if (ret) {
>> +        error_report("vfio: Could not stop dirty page tracking, err: %d (%s)",
>> +                     ret, strerror(-ret));
>>          vfio_set_migration_error(ret);
>>      }
>>  }
>> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
>> index d97a6de17921..7a7e0cfe5b23 100644
>> --- a/hw/vfio/trace-events
>> +++ b/hw/vfio/trace-events
>> @@ -105,6 +105,7 @@ vfio_listener_region_add_no_dma_map(const char *name, uint64_t iova, uint64_t si
>>  vfio_listener_region_del_skip(uint64_t start, uint64_t end) "SKIPPING region_del 0x%"PRIx64" - 0x%"PRIx64
>>  vfio_listener_region_del(uint64_t start, uint64_t end) "region_del 0x%"PRIx64" - 0x%"PRIx64
>>  vfio_device_dirty_tracking_update(uint64_t start, uint64_t end, uint64_t min, uint64_t max) "section 0x%"PRIx64" - 0x%"PRIx64" -> update [0x%"PRIx64" - 0x%"PRIx64"]"
>> +vfio_device_dirty_tracking_start(int nr_ranges, uint64_t min32, uint64_t max32, uint64_t min64, uint64_t max64) "nr_ranges %d 32:[0x%"PRIx64" - 0x%"PRIx64"], 64:[0x%"PRIx64" - 0x%"PRIx64"]"
>>  vfio_disconnect_container(int fd) "close container->fd=%d"
>>  vfio_put_group(int fd) "close group->fd=%d"
>>  vfio_get_device(const char * name, unsigned int flags, unsigned int num_regions, unsigned int num_irqs) "Device %s flags: %u, regions: %u, irqs: %u"
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index 96791add2719..1cbbccd91e11 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -154,6 +154,8 @@ typedef struct VFIODevice {
>>      VFIOMigration *migration;
>>      Error *migration_blocker;
>>      OnOffAuto pre_copy_dirty_page_tracking;
>> +    bool dirty_pages_supported;
>> +    bool dirty_tracking;
>>  } VFIODevice;
>>  
>>  struct VFIODeviceOps {
> 



* Re: [PATCH v3 00/13] vfio/migration: Device dirty page tracking
  2023-03-06 17:23 ` Cédric Le Goater
@ 2023-03-06 19:41   ` Joao Martins
  2023-03-07  8:33   ` Avihai Horon
  1 sibling, 0 replies; 51+ messages in thread
From: Joao Martins @ 2023-03-06 19:41 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Alex Williamson, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta, Avihai Horon, qemu-devel

On 06/03/2023 17:23, Cédric Le Goater wrote:
> On 3/4/23 02:43, Joao Martins wrote:
>> Hey,
>>
>> Presented herewith a series based on the basic VFIO migration protocol v2
>> implementation [1].
>>
>> It is split from its parent series[5] to solely focus on device dirty
>> page tracking. Device dirty page tracking allows the VFIO device to
>> record its DMAs and report them back when needed. This is part of VFIO
>> migration and is used during pre-copy phase of migration to track the
>> RAM pages that the device has written to and mark those pages dirty, so
>> they can later be re-sent to target.
>>
>> Device dirty page tracking uses the DMA logging uAPI to discover device
>> capabilities, to start and stop tracking, and to get dirty page bitmap
>> report. Extra details and uAPI definition can be found here [3].
>>
>> Device dirty page tracking operates in VFIOContainer scope. I.e., When
>> dirty tracking is started, stopped or dirty page report is queried, all
>> devices within a VFIOContainer are iterated and for each of them device
>> dirty page tracking is started, stopped or dirty page report is queried,
>> respectively.
>>
>> Device dirty page tracking is used only if all devices within a
>> VFIOContainer support it. Otherwise, VFIO IOMMU dirty page tracking is
>> used, and if that is not supported as well, memory is perpetually marked
>> dirty by QEMU. Note that since VFIO IOMMU dirty page tracking has no HW
>> support, the last two usually have the same effect of perpetually
>> marking all pages dirty.
>>
>> Normally, when asked to start dirty tracking, all the currently DMA
>> mapped ranges are tracked by device dirty page tracking. If using a
>> vIOMMU we block live migration. It's temporary and a separate series is
>> going to add support for it. Thus this series focus on getting the
>> ground work first.
>>
>> The series is organized as follows:
>>
>> - Patches 1-7: Fix bugs and do some preparatory work required prior to
>>    adding device dirty page tracking.
>> - Patches 8-10: Implement device dirty page tracking.
>> - Patch 11: Blocks live migration with vIOMMU.
>> - Patches 12-13 enable device dirty page tracking and document it.
>>
>> Comments, improvements as usual appreciated.
> 
> It would be helpful to have some feed back from Avihai on the new patches
> introduced in v3 or v4 before merging.
> 
I am going to let him comment, but Avihai is definitely looking at and testing
it too. E.g. one comment he mentioned to me, which I have preemptively slated
for v4, is to remove the 2 unnecessary iova-tree.h includes in patch 7 (given
that I removed the need for an IOVATree at all).

> Also, (being curious) did you test migration with a TCG guest ?
> 
KVM guests only.

	Joao



* Re: [PATCH v3 11/13] vfio/migration: Block migration with vIOMMU
  2023-03-04  1:43 ` [PATCH v3 11/13] vfio/migration: Block migration with vIOMMU Joao Martins
  2023-03-06 17:00   ` Cédric Le Goater
@ 2023-03-06 19:42   ` Alex Williamson
  2023-03-06 23:10     ` Joao Martins
  1 sibling, 1 reply; 51+ messages in thread
From: Alex Williamson @ 2023-03-06 19:42 UTC (permalink / raw)
  To: Joao Martins
  Cc: qemu-devel, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta

On Sat,  4 Mar 2023 01:43:41 +0000
Joao Martins <joao.m.martins@oracle.com> wrote:

> Migrating with vIOMMU will require either tracking maximum
> IOMMU supported address space (e.g. 39/48 address width on Intel)
> or range-track current mappings and dirty track the new ones
> post starting dirty tracking. This will be done as a separate
> series, so add a live migration blocker until that is fixed.
> 
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>  hw/vfio/common.c              | 51 +++++++++++++++++++++++++++++++++++
>  hw/vfio/migration.c           |  6 +++++
>  include/hw/vfio/vfio-common.h |  2 ++
>  3 files changed, 59 insertions(+)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 5b8456975e97..9b909f856722 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -365,6 +365,7 @@ bool vfio_mig_active(void)
>  }
>  
>  static Error *multiple_devices_migration_blocker;
> +static Error *giommu_migration_blocker;
>  
>  static unsigned int vfio_migratable_device_num(void)
>  {
> @@ -416,6 +417,56 @@ void vfio_unblock_multiple_devices_migration(void)
>      multiple_devices_migration_blocker = NULL;
>  }
>  
> +static unsigned int vfio_use_iommu_device_num(void)
> +{
> +    VFIOGroup *group;
> +    VFIODevice *vbasedev;
> +    unsigned int device_num = 0;
> +
> +    QLIST_FOREACH(group, &vfio_group_list, next) {
> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
> +            if (vbasedev->group->container->space->as !=
> +                                    &address_space_memory) {
> +                device_num++;
> +            }
> +        }
> +    }
> +
> +    return device_num;
> +}

I'm not sure why we're counting devices since nobody uses that data,
but couldn't this be even more simple and efficient by walking the
vfio_address_spaces list?

static bool vfio_viommu_preset(void)
{
    VFIOAddressSpace *space;

    QLIST_FOREACH(space, &vfio_address_spaces, list) {
        if (space->as != &address_space_memory) {
            return true;
        }
    }

    return false;
}

> +
> +int vfio_block_giommu_migration(Error **errp)
> +{
> +    int ret;
> +
> +    if (giommu_migration_blocker ||
> +        !vfio_use_iommu_device_num()) {
> +        return 0;
> +    }
> +
> +    error_setg(&giommu_migration_blocker,
> +               "Migration is currently not supported with vIOMMU enabled");
> +    ret = migrate_add_blocker(giommu_migration_blocker, errp);
> +    if (ret < 0) {
> +        error_free(giommu_migration_blocker);
> +        giommu_migration_blocker = NULL;
> +    }
> +
> +    return ret;
> +}
> +
> +void vfio_unblock_giommu_migration(void)
> +{
> +    if (!giommu_migration_blocker ||
> +        vfio_use_iommu_device_num()) {
> +        return;
> +    }
> +
> +    migrate_del_blocker(giommu_migration_blocker);
> +    error_free(giommu_migration_blocker);
> +    giommu_migration_blocker = NULL;
> +}
> +
>  static void vfio_set_migration_error(int err)
>  {
>      MigrationState *ms = migrate_get_current();
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index a2c3d9bade7f..3e75868ae7a9 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -634,6 +634,11 @@ int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
>          return ret;
>      }
>  
> +    ret = vfio_block_giommu_migration(errp);
> +    if (ret) {
> +        return ret;
> +    }
> +
>      trace_vfio_migration_probe(vbasedev->name);
>      return 0;
>  
> @@ -659,6 +664,7 @@ void vfio_migration_finalize(VFIODevice *vbasedev)
>          unregister_savevm(VMSTATE_IF(vbasedev->dev), "vfio", vbasedev);
>          vfio_migration_exit(vbasedev);
>          vfio_unblock_multiple_devices_migration();
> +        vfio_unblock_giommu_migration();

Hmm, this actually gets called from vfio_exitfn(), doesn't all the
group, device, address space unlinking happen in
vfio_instance_finalize()?  Has this actually been tested to remove the
blocker?  And why is this a _finalize() function when it's called from
an exit callback?  Thanks,

Alex

>      }
>  
>      if (vbasedev->migration_blocker) {
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 1cbbccd91e11..38e44258925b 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -233,6 +233,8 @@ extern VFIOGroupList vfio_group_list;
>  bool vfio_mig_active(void);
>  int vfio_block_multiple_devices_migration(Error **errp);
>  void vfio_unblock_multiple_devices_migration(void);
> +int vfio_block_giommu_migration(Error **errp);
> +void vfio_unblock_giommu_migration(void);
>  int64_t vfio_mig_bytes_transferred(void);
>  
>  #ifdef CONFIG_LINUX



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 10/13] vfio/common: Add device dirty page bitmap sync
  2023-03-06 19:22   ` Alex Williamson
@ 2023-03-06 19:42     ` Joao Martins
  0 siblings, 0 replies; 51+ messages in thread
From: Joao Martins @ 2023-03-06 19:42 UTC (permalink / raw)
  To: Alex Williamson
  Cc: qemu-devel, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon



On 06/03/2023 19:22, Alex Williamson wrote:
> On Sat,  4 Mar 2023 01:43:40 +0000
> Joao Martins <joao.m.martins@oracle.com> wrote:
> 
>> Add device dirty page bitmap sync functionality. This uses the device
>> DMA logging uAPI to sync dirty page bitmap from the device.
>>
>> Device dirty page bitmap sync is used only if all devices within a
>> container support device dirty page tracking.
>>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>>  hw/vfio/common.c | 88 +++++++++++++++++++++++++++++++++++++++++++-----
>>  1 file changed, 79 insertions(+), 9 deletions(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index b0c7d03279ab..5b8456975e97 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -342,6 +342,9 @@ static int vfio_bitmap_alloc(VFIOBitmap *vbmap, hwaddr size)
>>      return 0;
>>  }
>>  
>> +static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
>> +                                 uint64_t size, ram_addr_t ram_addr);
>> +
>>  bool vfio_mig_active(void)
>>  {
>>      VFIOGroup *group;
>> @@ -565,10 +568,16 @@ static int vfio_dma_unmap(VFIOContainer *container,
>>          .iova = iova,
>>          .size = size,
>>      };
>> +    bool need_dirty_sync = false;
>> +    int ret;
>> +
>> +    if (iotlb && vfio_devices_all_running_and_mig_active(container)) {
>> +        if (!vfio_devices_all_device_dirty_tracking(container) &&
>> +            container->dirty_pages_supported) {
>> +            return vfio_dma_unmap_bitmap(container, iova, size, iotlb);
>> +        }
>>  
>> -    if (iotlb && container->dirty_pages_supported &&
>> -        vfio_devices_all_running_and_mig_active(container)) {
>> -        return vfio_dma_unmap_bitmap(container, iova, size, iotlb);
>> +        need_dirty_sync = true;
>>      }
>>  
>>      while (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
>> @@ -594,10 +603,12 @@ static int vfio_dma_unmap(VFIOContainer *container,
>>          return -errno;
>>      }
>>  
>> -    if (iotlb && vfio_devices_all_running_and_mig_active(container)) {
>> -        cpu_physical_memory_set_dirty_range(iotlb->translated_addr, size,
>> -                                            tcg_enabled() ? DIRTY_CLIENTS_ALL :
>> -                                            DIRTY_CLIENTS_NOCODE);
>> +    if (need_dirty_sync) {
>> +        ret = vfio_get_dirty_bitmap(container, iova, size,
>> +                                    iotlb->translated_addr);
>> +        if (ret) {
>> +            return ret;
>> +        }
>>      }
>>  
>>      return 0;
>> @@ -1579,6 +1590,58 @@ static void vfio_listener_log_global_stop(MemoryListener *listener)
>>      }
>>  }
>>  
>> +static int vfio_device_dma_logging_report(VFIODevice *vbasedev, hwaddr iova,
>> +                                          hwaddr size, void *bitmap)
>> +{
>> +    uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
>> +                        sizeof(struct vfio_device_feature_dma_logging_report),
>> +                        sizeof(__aligned_u64))] = {};
>> +    struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
>> +    struct vfio_device_feature_dma_logging_report *report =
>> +        (struct vfio_device_feature_dma_logging_report *)feature->data;
>> +
>> +    report->iova = iova;
>> +    report->length = size;
>> +    report->page_size = qemu_real_host_page_size();
>> +    report->bitmap = (__aligned_u64)bitmap;
>> +
>> +    feature->argsz = sizeof(buf);
>> +    feature->flags =
>> +        VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_DMA_LOGGING_REPORT;
> 
> Nit, the series is inconsistent between initializing flags as above and
> as:
> 
>     feature->flags = VFIO_DEVICE_FEATURE_GET;
>     feature->flags |= VFIO_DEVICE_FEATURE_DMA_LOGGING_REPORT;
> 
> My personal preference would be more like:
> 
>     feature->flags = VFIO_DEVICE_FEATURE_GET |
>                      VFIO_DEVICE_FEATURE_DMA_LOGGING_REPORT;
>

I'll switch the overall style to the latter.

> Thanks,
> Alex
> 
>> +
>> +    if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
>> +        return -errno;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int vfio_devices_query_dirty_bitmap(VFIOContainer *container,
>> +                                           VFIOBitmap *vbmap, hwaddr iova,
>> +                                           hwaddr size)
>> +{
>> +    VFIODevice *vbasedev;
>> +    VFIOGroup *group;
>> +    int ret;
>> +
>> +    QLIST_FOREACH(group, &container->group_list, container_next) {
>> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
>> +            ret = vfio_device_dma_logging_report(vbasedev, iova, size,
>> +                                                 vbmap->bitmap);
>> +            if (ret) {
>> +                error_report("%s: Failed to get DMA logging report, iova: "
>> +                             "0x%" HWADDR_PRIx ", size: 0x%" HWADDR_PRIx
>> +                             ", err: %d (%s)",
>> +                             vbasedev->name, iova, size, ret, strerror(-ret));
>> +
>> +                return ret;
>> +            }
>> +        }
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>>  static int vfio_query_dirty_bitmap(VFIOContainer *container, VFIOBitmap *vbmap,
>>                                     hwaddr iova, hwaddr size)
>>  {
>> @@ -1619,10 +1682,12 @@ static int vfio_query_dirty_bitmap(VFIOContainer *container, VFIOBitmap *vbmap,
>>  static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
>>                                   uint64_t size, ram_addr_t ram_addr)
>>  {
>> +    bool all_device_dirty_tracking =
>> +        vfio_devices_all_device_dirty_tracking(container);
>>      VFIOBitmap vbmap;
>>      int ret;
>>  
>> -    if (!container->dirty_pages_supported) {
>> +    if (!container->dirty_pages_supported && !all_device_dirty_tracking) {
>>          cpu_physical_memory_set_dirty_range(ram_addr, size,
>>                                              tcg_enabled() ? DIRTY_CLIENTS_ALL :
>>                                              DIRTY_CLIENTS_NOCODE);
>> @@ -1634,7 +1699,12 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
>>          return -errno;
>>      }
>>  
>> -    ret = vfio_query_dirty_bitmap(container, &vbmap, iova, size);
>> +    if (all_device_dirty_tracking) {
>> +        ret = vfio_devices_query_dirty_bitmap(container, &vbmap, iova, size);
>> +    } else {
>> +        ret = vfio_query_dirty_bitmap(container, &vbmap, iova, size);
>> +    }
>> +
>>      if (ret) {
>>          goto out;
>>      }
> 



* Re: [PATCH v3 07/13] vfio/common: Record DMA mapped IOVA ranges
  2023-03-06 18:05   ` Cédric Le Goater
@ 2023-03-06 19:45     ` Joao Martins
  0 siblings, 0 replies; 51+ messages in thread
From: Joao Martins @ 2023-03-06 19:45 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Alex Williamson, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta, Avihai Horon, qemu-devel



On 06/03/2023 18:05, Cédric Le Goater wrote:
> On 3/4/23 02:43, Joao Martins wrote:
>> According to the device DMA logging uAPI, IOVA ranges to be logged by
>> the device must be provided all at once upon DMA logging start.
>>
>> As preparation for the following patches which will add device dirty
>> page tracking, keep a record of all DMA mapped IOVA ranges so later they
>> can be used for DMA logging start.
>>
>> Note that when vIOMMU is enabled DMA mapped IOVA ranges are not tracked.
>> This is due to the dynamic nature of vIOMMU DMA mapping/unmapping.
>>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> 
> One question below,
> 
> Reviewed-by: Cédric Le Goater <clg@redhat.com>
> 
> Thanks,
> 
> C.
> 
>> ---
>>   hw/vfio/common.c              | 84 +++++++++++++++++++++++++++++++++++
>>   hw/vfio/trace-events          |  1 +
>>   include/hw/vfio/vfio-common.h | 11 +++++
>>   3 files changed, 96 insertions(+)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index ed908e303dbb..d84e5fd86bb4 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -44,6 +44,7 @@
>>   #include "migration/blocker.h"
>>   #include "migration/qemu-file.h"
>>   #include "sysemu/tpm.h"
>> +#include "qemu/iova-tree.h"
>>     VFIOGroupList vfio_group_list =
>>       QLIST_HEAD_INITIALIZER(vfio_group_list);
>> @@ -1313,11 +1314,94 @@ static int vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
>>       return ret;
>>   }
>>   +/*
>> + * Called for the dirty tracking memory listener to calculate the iova/end
>> + * for a given memory region section. The checks here replicate the logic
>> + * in vfio_listener_region_{add,del}() used for the same purpose, and thus
>> + * both listeners should be kept in sync.
>> + */
>> +static bool vfio_get_section_iova_range(VFIOContainer *container,
>> +                                        MemoryRegionSection *section,
>> +                                        hwaddr *out_iova, hwaddr *out_end)
>> +{
>> +    Int128 llend;
>> +    hwaddr iova;
>> +
>> +    iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space);
>> +    llend = int128_make64(section->offset_within_address_space);
>> +    llend = int128_add(llend, section->size);
>> +    llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask()));
>> +
>> +    if (int128_ge(int128_make64(iova), llend)) {
>> +        return false;
>> +    }
>> +
>> +    *out_iova = iova;
>> +    *out_end = int128_get64(llend) - 1;
>> +    return true;
>> +}
>> +
>> +static void vfio_dirty_tracking_update(MemoryListener *listener,
>> +                                       MemoryRegionSection *section)
>> +{
>> +    VFIOContainer *container = container_of(listener, VFIOContainer,
>> +                                            tracking_listener);
>> +    VFIODirtyTrackingRange *range = &container->tracking_range;
>> +    hwaddr max32 = (1ULL << 32) - 1ULL;
>> +    hwaddr iova, end;
>> +
>> +    if (!vfio_listener_valid_section(section) ||
>> +        !vfio_get_section_iova_range(container, section, &iova, &end)) {
>> +        return;
>> +    }
>> +
>> +    WITH_QEMU_LOCK_GUARD(&container->tracking_mutex) {
>> +        if (iova < max32 && end <= max32) {
>> +                if (range->min32 > iova) {
> 
> With the memset(0) done in vfio_dirty_tracking_init(), min32 will always
> be 0. Is that OK ?
> 
While it's OK, it relies on the assumption that the structure is zeroed out.
But Alex's comments will make this clearer (and cover all cases) and avoid the
assumption that a range always starts from 0.

>> +                    range->min32 = iova;
>> +                }
>> +                if (range->max32 < end) {
>> +                    range->max32 = end;
>> +                }
>> +                trace_vfio_device_dirty_tracking_update(iova, end,
>> +                                            range->min32, range->max32);
>> +        } else {
>> +                if (!range->min64 || range->min64 > iova) {
>> +                    range->min64 = iova;
>> +                }
>> +                if (range->max64 < end) {
>> +                    range->max64 = end;
>> +                }
>> +                trace_vfio_device_dirty_tracking_update(iova, end,
>> +                                            range->min64, range->max64);
>> +        }
>> +    }
>> +    return;
>> +}
>> +
>> +static const MemoryListener vfio_dirty_tracking_listener = {
>> +    .name = "vfio-tracking",
>> +    .region_add = vfio_dirty_tracking_update,
>> +};
>> +
>> +static void vfio_dirty_tracking_init(VFIOContainer *container)
>> +{
>> +    memset(&container->tracking_range, 0, sizeof(container->tracking_range));
>> +    qemu_mutex_init(&container->tracking_mutex);
>> +    container->tracking_listener = vfio_dirty_tracking_listener;
>> +    memory_listener_register(&container->tracking_listener,
>> +                             container->space->as);
>> +    memory_listener_unregister(&container->tracking_listener);
>> +    qemu_mutex_destroy(&container->tracking_mutex);
>> +}
>> +
>>   static void vfio_listener_log_global_start(MemoryListener *listener)
>>   {
>>       VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>>       int ret;
>>   +    vfio_dirty_tracking_init(container);
>> +
>>       ret = vfio_set_dirty_page_tracking(container, true);
>>       if (ret) {
>>           vfio_set_migration_error(ret);
>> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
>> index 669d9fe07cd9..d97a6de17921 100644
>> --- a/hw/vfio/trace-events
>> +++ b/hw/vfio/trace-events
>> @@ -104,6 +104,7 @@ vfio_known_safe_misalignment(const char *name, uint64_t iova, uint64_t offset_wi
>>   vfio_listener_region_add_no_dma_map(const char *name, uint64_t iova, uint64_t size, uint64_t page_size) "Region \"%s\" 0x%"PRIx64" size=0x%"PRIx64" is not aligned to 0x%"PRIx64" and cannot be mapped for DMA"
>>   vfio_listener_region_del_skip(uint64_t start, uint64_t end) "SKIPPING region_del 0x%"PRIx64" - 0x%"PRIx64
>>   vfio_listener_region_del(uint64_t start, uint64_t end) "region_del 0x%"PRIx64" - 0x%"PRIx64
>> +vfio_device_dirty_tracking_update(uint64_t start, uint64_t end, uint64_t min, uint64_t max) "section 0x%"PRIx64" - 0x%"PRIx64" -> update [0x%"PRIx64" - 0x%"PRIx64"]"
>>   vfio_disconnect_container(int fd) "close container->fd=%d"
>>   vfio_put_group(int fd) "close group->fd=%d"
>>   vfio_get_device(const char * name, unsigned int flags, unsigned int
>> num_regions, unsigned int num_irqs) "Device %s flags: %u, regions: %u, irqs: %u"
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index 87524c64a443..96791add2719 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -23,6 +23,7 @@
>>     #include "exec/memory.h"
>>   #include "qemu/queue.h"
>> +#include "qemu/iova-tree.h"
>>   #include "qemu/notify.h"
>>   #include "ui/console.h"
>>   #include "hw/display/ramfb.h"
>> @@ -68,6 +69,13 @@ typedef struct VFIOMigration {
>>       size_t data_buffer_size;
>>   } VFIOMigration;
>>   +typedef struct VFIODirtyTrackingRange {
>> +    hwaddr min32;
>> +    hwaddr max32;
>> +    hwaddr min64;
>> +    hwaddr max64;
>> +} VFIODirtyTrackingRange;
>> +
>>   typedef struct VFIOAddressSpace {
>>       AddressSpace *as;
>>       QLIST_HEAD(, VFIOContainer) containers;
>> @@ -89,6 +97,9 @@ typedef struct VFIOContainer {
>>       uint64_t max_dirty_bitmap_size;
>>       unsigned long pgsizes;
>>       unsigned int dma_max_mappings;
>> +    VFIODirtyTrackingRange tracking_range;
>> +    QemuMutex tracking_mutex;
>> +    MemoryListener tracking_listener;
>>       QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
>>       QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
>>       QLIST_HEAD(, VFIOGroup) group_list;
> 



* Re: [PATCH v3 08/13] vfio/common: Add device dirty page tracking start/stop
  2023-03-06 19:39     ` Joao Martins
@ 2023-03-06 20:00       ` Alex Williamson
  2023-03-06 23:12         ` Joao Martins
  0 siblings, 1 reply; 51+ messages in thread
From: Alex Williamson @ 2023-03-06 20:00 UTC (permalink / raw)
  To: Joao Martins
  Cc: qemu-devel, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon

On Mon, 6 Mar 2023 19:39:15 +0000
Joao Martins <joao.m.martins@oracle.com> wrote:

> On 06/03/2023 18:42, Alex Williamson wrote:
> > On Sat,  4 Mar 2023 01:43:38 +0000
> > Joao Martins <joao.m.martins@oracle.com> wrote:
> >   
> >> Add device dirty page tracking start/stop functionality. This uses the
> >> device DMA logging uAPI to start and stop dirty page tracking by device.
> >>
> >> Device dirty page tracking is used only if all devices within a
> >> container support device dirty page tracking.
> >>
> >> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> >> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> >> ---
> >>  hw/vfio/common.c              | 166 +++++++++++++++++++++++++++++++++-
> >>  hw/vfio/trace-events          |   1 +
> >>  include/hw/vfio/vfio-common.h |   2 +
> >>  3 files changed, 166 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> >> index d84e5fd86bb4..aa0df0604704 100644
> >> --- a/hw/vfio/common.c
> >> +++ b/hw/vfio/common.c
> >> @@ -453,6 +453,22 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
> >>      return true;
> >>  }
> >>  
> >> +static bool vfio_devices_all_device_dirty_tracking(VFIOContainer *container)
> >> +{
> >> +    VFIOGroup *group;
> >> +    VFIODevice *vbasedev;
> >> +
> >> +    QLIST_FOREACH(group, &container->group_list, container_next) {
> >> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
> >> +            if (!vbasedev->dirty_pages_supported) {
> >> +                return false;
> >> +            }
> >> +        }
> >> +    }
> >> +
> >> +    return true;
> >> +}
> >> +
> >>  /*
> >>   * Check if all VFIO devices are running and migration is active, which is
> >>   * essentially equivalent to the migration being in pre-copy phase.
> >> @@ -1395,15 +1411,152 @@ static void vfio_dirty_tracking_init(VFIOContainer *container)
> >>      qemu_mutex_destroy(&container->tracking_mutex);
> >>  }
> >>  
> >> +static int vfio_devices_dma_logging_set(VFIOContainer *container,
> >> +                                        struct vfio_device_feature *feature)
> >> +{
> >> +    bool status = (feature->flags & VFIO_DEVICE_FEATURE_MASK) ==
> >> +                  VFIO_DEVICE_FEATURE_DMA_LOGGING_START;
> >> +    VFIODevice *vbasedev;
> >> +    VFIOGroup *group;
> >> +    int ret = 0;
> >> +
> >> +    QLIST_FOREACH(group, &container->group_list, container_next) {
> >> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
> >> +            if (vbasedev->dirty_tracking == status) {
> >> +                continue;
> >> +            }
> >> +
> >> +            ret = ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature);
> >> +            if (ret) {
> >> +                ret = -errno;
> >> +                error_report("%s: Failed to set DMA logging %s, err %d (%s)",
> >> +                             vbasedev->name, status ? "start" : "stop", ret,
> >> +                             strerror(errno));
> >> +                goto out;
> >> +            }  
> > 
> > Exiting and returning an error on the first failure when starting dirty
> > tracking makes sense.  Does that behavior still make sense for the stop
> > path?  Maybe since we only support a single device it doesn't really
> > matter, but this needs to be revisited for multiple devices.  Thanks,
> >   
> 
> How about I test for @status and exit earlier based on that?
> (maybe the variable should be renamed too), e.g.:
> 
> if (ret) {
>   ret = -errno;
>   error_report("%s: Failed to set DMA logging %s, err %d (%s)",
>                vbasedev->name, status ? "start" : "stop", ret,
>                strerror(errno))
>   if (status) {
>       goto out;
>   }
> }

Yep, exit on first error enabling, continue on disabling makes more
sense, but then we need to look at what return code makes sense for the
teardown.  TBH, a teardown function would typically return void, so
it's possible we'd be better off not using this for both.  Thanks,

Alex

PS - no further original comments on v3 from me.

> >> +            vbasedev->dirty_tracking = status;
> >> +        }
> >> +    }
> >> +
> >> +out:
> >> +    return ret;
> >> +}
> >> +
> >> +static int vfio_devices_dma_logging_stop(VFIOContainer *container)
> >> +{
> >> +    uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature),
> >> +                              sizeof(uint64_t))] = {};
> >> +    struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
> >> +
> >> +    feature->argsz = sizeof(buf);
> >> +    feature->flags = VFIO_DEVICE_FEATURE_SET;
> >> +    feature->flags |= VFIO_DEVICE_FEATURE_DMA_LOGGING_STOP;
> >> +
> >> +    return vfio_devices_dma_logging_set(container, feature);
> >> +}
> >> +
> >> +static struct vfio_device_feature *
> >> +vfio_device_feature_dma_logging_start_create(VFIOContainer *container)
> >> +{
> >> +    struct vfio_device_feature *feature;
> >> +    size_t feature_size;
> >> +    struct vfio_device_feature_dma_logging_control *control;
> >> +    struct vfio_device_feature_dma_logging_range *ranges;
> >> +    VFIODirtyTrackingRange *tracking = &container->tracking_range;
> >> +
> >> +    feature_size = sizeof(struct vfio_device_feature) +
> >> +                   sizeof(struct vfio_device_feature_dma_logging_control);
> >> +    feature = g_try_malloc0(feature_size);
> >> +    if (!feature) {
> >> +        errno = ENOMEM;
> >> +        return NULL;
> >> +    }
> >> +    feature->argsz = feature_size;
> >> +    feature->flags = VFIO_DEVICE_FEATURE_SET;
> >> +    feature->flags |= VFIO_DEVICE_FEATURE_DMA_LOGGING_START;
> >> +
> >> +    control = (struct vfio_device_feature_dma_logging_control *)feature->data;
> >> +    control->page_size = qemu_real_host_page_size();
> >> +
> >> +    /*
> >> +     * DMA logging uAPI guarantees to support at least a number of ranges that
> >> +     * fits into a single host kernel base page.
> >> +     */
> >> +    control->num_ranges = !!tracking->max32 + !!tracking->max64;
> >> +    ranges = g_try_new0(struct vfio_device_feature_dma_logging_range,
> >> +                        control->num_ranges);
> >> +    if (!ranges) {
> >> +        g_free(feature);
> >> +        errno = ENOMEM;
> >> +
> >> +        return NULL;
> >> +    }
> >> +
> >> +    control->ranges = (__aligned_u64)ranges;
> >> +    if (tracking->max32) {
> >> +        ranges->iova = tracking->min32;
> >> +        ranges->length = (tracking->max32 - tracking->min32) + 1;
> >> +        ranges++;
> >> +    }
> >> +    if (tracking->max64) {
> >> +        ranges->iova = tracking->min64;
> >> +        ranges->length = (tracking->max64 - tracking->min64) + 1;
> >> +    }
> >> +
> >> +    trace_vfio_device_dirty_tracking_start(control->num_ranges,
> >> +                                           tracking->min32, tracking->max32,
> >> +                                           tracking->min64, tracking->max64);
> >> +
> >> +    return feature;
> >> +}
> >> +
> >> +static void vfio_device_feature_dma_logging_start_destroy(
> >> +    struct vfio_device_feature *feature)
> >> +{
> >> +    struct vfio_device_feature_dma_logging_control *control =
> >> +        (struct vfio_device_feature_dma_logging_control *)feature->data;
> >> +    struct vfio_device_feature_dma_logging_range *ranges =
> >> +        (struct vfio_device_feature_dma_logging_range *)control->ranges;
> >> +
> >> +    g_free(ranges);
> >> +    g_free(feature);
> >> +}
> >> +
> >> +static int vfio_devices_dma_logging_start(VFIOContainer *container)
> >> +{
> >> +    struct vfio_device_feature *feature;
> >> +    int ret = 0;
> >> +
> >> +    vfio_dirty_tracking_init(container);
> >> +    feature = vfio_device_feature_dma_logging_start_create(container);
> >> +    if (!feature) {
> >> +        return -errno;
> >> +    }
> >> +
> >> +    ret = vfio_devices_dma_logging_set(container, feature);
> >> +    if (ret) {
> >> +        vfio_devices_dma_logging_stop(container);
> >> +    }
> >> +
> >> +    vfio_device_feature_dma_logging_start_destroy(feature);
> >> +
> >> +    return ret;
> >> +}
> >> +
> >>  static void vfio_listener_log_global_start(MemoryListener *listener)
> >>  {
> >>      VFIOContainer *container = container_of(listener, VFIOContainer, listener);
> >>      int ret;
> >>  
> >> -    vfio_dirty_tracking_init(container);
> >> +    if (vfio_devices_all_device_dirty_tracking(container)) {
> >> +        ret = vfio_devices_dma_logging_start(container);
> >> +    } else {
> >> +        ret = vfio_set_dirty_page_tracking(container, true);
> >> +    }
> >>  
> >> -    ret = vfio_set_dirty_page_tracking(container, true);
> >>      if (ret) {
> >> +        error_report("vfio: Could not start dirty page tracking, err: %d (%s)",
> >> +                     ret, strerror(-ret));
> >>          vfio_set_migration_error(ret);
> >>      }
> >>  }
> >> @@ -1413,8 +1566,15 @@ static void vfio_listener_log_global_stop(MemoryListener *listener)
> >>      VFIOContainer *container = container_of(listener, VFIOContainer, listener);
> >>      int ret;
> >>  
> >> -    ret = vfio_set_dirty_page_tracking(container, false);
> >> +    if (vfio_devices_all_device_dirty_tracking(container)) {
> >> +        ret = vfio_devices_dma_logging_stop(container);
> >> +    } else {
> >> +        ret = vfio_set_dirty_page_tracking(container, false);
> >> +    }
> >> +
> >>      if (ret) {
> >> +        error_report("vfio: Could not stop dirty page tracking, err: %d (%s)",
> >> +                     ret, strerror(-ret));
> >>          vfio_set_migration_error(ret);
> >>      }
> >>  }
> >> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> >> index d97a6de17921..7a7e0cfe5b23 100644
> >> --- a/hw/vfio/trace-events
> >> +++ b/hw/vfio/trace-events
> >> @@ -105,6 +105,7 @@ vfio_listener_region_add_no_dma_map(const char *name, uint64_t iova, uint64_t si
> >>  vfio_listener_region_del_skip(uint64_t start, uint64_t end) "SKIPPING region_del 0x%"PRIx64" - 0x%"PRIx64
> >>  vfio_listener_region_del(uint64_t start, uint64_t end) "region_del 0x%"PRIx64" - 0x%"PRIx64
> >>  vfio_device_dirty_tracking_update(uint64_t start, uint64_t end, uint64_t min, uint64_t max) "section 0x%"PRIx64" - 0x%"PRIx64" -> update [0x%"PRIx64" - 0x%"PRIx64"]"
> >> +vfio_device_dirty_tracking_start(int nr_ranges, uint64_t min32, uint64_t max32, uint64_t min64, uint64_t max64) "nr_ranges %d 32:[0x%"PRIx64" - 0x%"PRIx64"], 64:[0x%"PRIx64" - 0x%"PRIx64"]"
> >>  vfio_disconnect_container(int fd) "close container->fd=%d"
> >>  vfio_put_group(int fd) "close group->fd=%d"
> >>  vfio_get_device(const char * name, unsigned int flags, unsigned int num_regions, unsigned int num_irqs) "Device %s flags: %u, regions: %u, irqs: %u"
> >> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> >> index 96791add2719..1cbbccd91e11 100644
> >> --- a/include/hw/vfio/vfio-common.h
> >> +++ b/include/hw/vfio/vfio-common.h
> >> @@ -154,6 +154,8 @@ typedef struct VFIODevice {
> >>      VFIOMigration *migration;
> >>      Error *migration_blocker;
> >>      OnOffAuto pre_copy_dirty_page_tracking;
> >> +    bool dirty_pages_supported;
> >> +    bool dirty_tracking;
> >>  } VFIODevice;
> >>  
> >>  struct VFIODeviceOps {  
> >   
> 




* Re: [PATCH v3 00/13] vfio/migration: Device dirty page tracking
  2023-03-06 11:05         ` Cédric Le Goater
@ 2023-03-06 21:19           ` Alex Williamson
  0 siblings, 0 replies; 51+ messages in thread
From: Alex Williamson @ 2023-03-06 21:19 UTC (permalink / raw)
  To: Joao Martins
  Cc: Cédric Le Goater, qemu-devel, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta

On Mon, 6 Mar 2023 12:05:06 +0100
Cédric Le Goater <clg@redhat.com> wrote:

> On 3/6/23 10:45, Joao Martins wrote:
> > On 06/03/2023 02:19, Alex Williamson wrote:  
> >> On Sun, 5 Mar 2023 23:33:35 +0000
> >> Joao Martins <joao.m.martins@oracle.com> wrote:
> >>  
> >>> On 05/03/2023 20:57, Alex Williamson wrote:  
> >>>> On Sat,  4 Mar 2023 01:43:30 +0000
> >>>> Joao Martins <joao.m.martins@oracle.com> wrote:
> >>>>      
> >>>>> Hey,
> >>>>>
> >>>>> Presented herewith a series based on the basic VFIO migration protocol v2
> >>>>> implementation [1].
> >>>>>
> >>>>> It is split from its parent series[5] to solely focus on device dirty
> >>>>> page tracking. Device dirty page tracking allows the VFIO device to
> >>>>> record its DMAs and report them back when needed. This is part of VFIO
> >>>>> migration and is used during pre-copy phase of migration to track the
> >>>>> RAM pages that the device has written to and mark those pages dirty, so
> >>>>> they can later be re-sent to target.
> >>>>>
> >>>>> Device dirty page tracking uses the DMA logging uAPI to discover device
> >>>>> capabilities, to start and stop tracking, and to get dirty page bitmap
> >>>>> report. Extra details and uAPI definition can be found here [3].
> >>>>>
> >>>>> Device dirty page tracking operates in VFIOContainer scope. I.e., When
> >>>>> dirty tracking is started, stopped or dirty page report is queried, all
> >>>>> devices within a VFIOContainer are iterated and for each of them device
> >>>>> dirty page tracking is started, stopped or dirty page report is queried,
> >>>>> respectively.
> >>>>>
> >>>>> Device dirty page tracking is used only if all devices within a
> >>>>> VFIOContainer support it. Otherwise, VFIO IOMMU dirty page tracking is
> >>>>> used, and if that is not supported as well, memory is perpetually marked
> >>>>> dirty by QEMU. Note that since VFIO IOMMU dirty page tracking has no HW
> >>>>> support, the last two usually have the same effect of perpetually
> >>>>> marking all pages dirty.
> >>>>>
> >>>>> Normally, when asked to start dirty tracking, all the currently DMA
> >>>>> mapped ranges are tracked by device dirty page tracking. If using a
> >>>>> vIOMMU we block live migration. It's temporary and a separate series is
> >>>>> going to add support for it. Thus this series focus on getting the
> >>>>> ground work first.
> >>>>>
> >>>>> The series is organized as follows:
> >>>>>
> >>>>> - Patches 1-7: Fix bugs and do some preparatory work required prior to
> >>>>>    adding device dirty page tracking.
> >>>>> - Patches 8-10: Implement device dirty page tracking.
> >>>>> - Patch 11: Blocks live migration with vIOMMU.
> >>>>> - Patches 12-13 enable device dirty page tracking and document it.
> >>>>>
> >>>>> Comments, improvements as usual appreciated.  
> >>>>
> >>>> Still some CI failures:
> >>>>
> >>>> https://gitlab.com/alex.williamson/qemu/-/pipelines/796657474
> >>>>
> >>>> The docker failures are normal, afaict the rest are not.  Thanks,
> >>>>      
> >>>
> >>> Ugh, sorry
> >>>
> >>> The patch below the scissors mark (and also attached as a file) fixes those
> >>> build issues. I managed to reproduce them on i386 target builds, and these
> >>> changes fix my 32-bit build.
> >>>
> >>> I don't have a working Gitlab setup[*], though, to trigger the CI and exercise
> >>> the wealth of targets it build-tests. Could you kindly run the attached patch
> >>> through a new pipeline (applied on top of the branch you just built) to check
> >>> whether the CI is happy? I will fold these changes into the right patches
> >>> (patches 8 and 10) for the v4 spin.  
> >>
> >> Looks like this passes:
> >>
> >> https://gitlab.com/alex.williamson/qemu/-/pipelines/796750136
> >>  
> > Great, I've staged these fixes in patches 8 & 10!
> > 
> > I have a sliver of hope that we might still make it by soft freeze (tomorrow?).
> > If you think it can still make it, assuming the rest of the series is good, I
> > can follow up with v4 today/tomorrow. Thoughts?  
> 
> I would say, wait and see if a v4 is needed first. These changes are
> relatively easy to fold in.

I think we have enough changes and fixes to post a v4 once you're happy
with it.  We should have tomorrow, the 7th, to get final reviews and
post a pull request.  Thanks,

Alex




* Re: [PATCH v3 11/13] vfio/migration: Block migration with vIOMMU
  2023-03-06 19:42   ` Alex Williamson
@ 2023-03-06 23:10     ` Joao Martins
  0 siblings, 0 replies; 51+ messages in thread
From: Joao Martins @ 2023-03-06 23:10 UTC (permalink / raw)
  To: Alex Williamson
  Cc: qemu-devel, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta

On 06/03/2023 19:42, Alex Williamson wrote:
> On Sat,  4 Mar 2023 01:43:41 +0000
> Joao Martins <joao.m.martins@oracle.com> wrote:
> 
>> Migrating with vIOMMU will require either tracking maximum
>> IOMMU supported address space (e.g. 39/48 address width on Intel)
>> or range-track current mappings and dirty track the new ones
>> post starting dirty tracking. This will be done as a separate
>> series, so add a live migration blocker until that is fixed.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>>  hw/vfio/common.c              | 51 +++++++++++++++++++++++++++++++++++
>>  hw/vfio/migration.c           |  6 +++++
>>  include/hw/vfio/vfio-common.h |  2 ++
>>  3 files changed, 59 insertions(+)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 5b8456975e97..9b909f856722 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -365,6 +365,7 @@ bool vfio_mig_active(void)
>>  }
>>  
>>  static Error *multiple_devices_migration_blocker;
>> +static Error *giommu_migration_blocker;
>>  
>>  static unsigned int vfio_migratable_device_num(void)
>>  {
>> @@ -416,6 +417,56 @@ void vfio_unblock_multiple_devices_migration(void)
>>      multiple_devices_migration_blocker = NULL;
>>  }
>>  
>> +static unsigned int vfio_use_iommu_device_num(void)
>> +{
>> +    VFIOGroup *group;
>> +    VFIODevice *vbasedev;
>> +    unsigned int device_num = 0;
>> +
>> +    QLIST_FOREACH(group, &vfio_group_list, next) {
>> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
>> +            if (vbasedev->group->container->space->as !=
>> +                                    &address_space_memory) {
>> +                device_num++;
>> +            }
>> +        }
>> +    }
>> +
>> +    return device_num;
>> +}
> 
> I'm not sure why we're counting devices since nobody uses that data,

My idea was to count devices in case some device is configured with the
bypass_iommu PCI device option. But that would always be caught by the same
check below anyway.

> but couldn't this be even more simple and efficient by walking the
> vfio_address_spaces list?
> 
Yes, or by iterating over the groups, as Cédric suggested.

We don't care about the number of devices per se, just whether *any* device is
using a vIOMMU or not.

> static bool vfio_viommu_preset(void)
> {
>     VFIOAddressSpace *space;
> 
>     QLIST_FOREACH(space, &vfio_address_spaces, list) {
>         if (space->as != &address_space_memory) {
>             return true;
>         }
>     }
> 
>     return false;
> }
> 

Let me switch to the above.
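
As a self-contained illustration of that check (using stand-in types and a
plain singly linked list, since QEMU's AddressSpace and QLIST definitions
aren't reproduced here), the "is any device behind a vIOMMU" question reduces
to a simple list walk:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the QEMU types; not the real definitions. */
typedef struct AddressSpace { int dummy; } AddressSpace;

static AddressSpace address_space_memory;   /* the plain system-memory AS */

typedef struct VFIOAddressSpace {
    AddressSpace *as;
    struct VFIOAddressSpace *next;
} VFIOAddressSpace;

static VFIOAddressSpace *vfio_address_spaces;   /* list head */

/* True when at least one VFIO address space is not plain system memory,
 * i.e. some device sits behind a vIOMMU. No device count is needed. */
static bool vfio_viommu_preset(void)
{
    for (VFIOAddressSpace *space = vfio_address_spaces; space;
         space = space->next) {
        if (space->as != &address_space_memory) {
            return true;
        }
    }
    return false;
}
```

The migration blocker then only needs this boolean: add the blocker when the
check returns true, drop it once it returns false.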

>> +
>> +int vfio_block_giommu_migration(Error **errp)
>> +{
>> +    int ret;
>> +
>> +    if (giommu_migration_blocker ||
>> +        !vfio_use_iommu_device_num()) {
>> +        return 0;
>> +    }
>> +
>> +    error_setg(&giommu_migration_blocker,
>> +               "Migration is currently not supported with vIOMMU enabled");
>> +    ret = migrate_add_blocker(giommu_migration_blocker, errp);
>> +    if (ret < 0) {
>> +        error_free(giommu_migration_blocker);
>> +        giommu_migration_blocker = NULL;
>> +    }
>> +
>> +    return ret;
>> +}
>> +
>> +void vfio_unblock_giommu_migration(void)
>> +{
>> +    if (!giommu_migration_blocker ||
>> +        vfio_use_iommu_device_num()) {
>> +        return;
>> +    }
>> +
>> +    migrate_del_blocker(giommu_migration_blocker);
>> +    error_free(giommu_migration_blocker);
>> +    giommu_migration_blocker = NULL;
>> +}
>> +
>>  static void vfio_set_migration_error(int err)
>>  {
>>      MigrationState *ms = migrate_get_current();
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index a2c3d9bade7f..3e75868ae7a9 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -634,6 +634,11 @@ int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
>>          return ret;
>>      }
>>  
>> +    ret = vfio_block_giommu_migration(errp);
>> +    if (ret) {
>> +        return ret;
>> +    }
>> +
>>      trace_vfio_migration_probe(vbasedev->name);
>>      return 0;
>>  
>> @@ -659,6 +664,7 @@ void vfio_migration_finalize(VFIODevice *vbasedev)
>>          unregister_savevm(VMSTATE_IF(vbasedev->dev), "vfio", vbasedev);
>>          vfio_migration_exit(vbasedev);
>>          vfio_unblock_multiple_devices_migration();
>> +        vfio_unblock_giommu_migration();
> 
> Hmm, this actually gets called from vfio_exitfn(), doesn't all the
> group, device, address space unlinking happen in
> vfio_instance_finalize()?  Has this actually been tested to remove the
> blocker?  And why is this a _finalize() function when it's called from
> an exit callback?  Thanks,
> 

I didn't test it correctly, clearly.

It doesn't work correctly as vfio_viommu_preset() always returns true and thus
it won't unblock, even after device deletion. I've moved
vfio_unblock_giommu_migration() into vfio_instance_finalize() like:

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 939dcc3d4a9e..30a271eab38c 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3185,6 +3185,7 @@ static void vfio_instance_finalize(Object *obj)
      */
     vfio_put_device(vdev);
     vfio_put_group(group);
+    vfio_unblock_giommu_migration();
 }



* Re: [PATCH v3 08/13] vfio/common: Add device dirty page tracking start/stop
  2023-03-06 20:00       ` Alex Williamson
@ 2023-03-06 23:12         ` Joao Martins
  0 siblings, 0 replies; 51+ messages in thread
From: Joao Martins @ 2023-03-06 23:12 UTC (permalink / raw)
  To: Alex Williamson
  Cc: qemu-devel, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon



On 06/03/2023 20:00, Alex Williamson wrote:
> On Mon, 6 Mar 2023 19:39:15 +0000
> Joao Martins <joao.m.martins@oracle.com> wrote:
> 
>> On 06/03/2023 18:42, Alex Williamson wrote:
>>> On Sat,  4 Mar 2023 01:43:38 +0000
>>> Joao Martins <joao.m.martins@oracle.com> wrote:
>>>   
>>>> Add device dirty page tracking start/stop functionality. This uses the
>>>> device DMA logging uAPI to start and stop dirty page tracking by device.
>>>>
>>>> Device dirty page tracking is used only if all devices within a
>>>> container support device dirty page tracking.
>>>>
>>>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>> ---
>>>>  hw/vfio/common.c              | 166 +++++++++++++++++++++++++++++++++-
>>>>  hw/vfio/trace-events          |   1 +
>>>>  include/hw/vfio/vfio-common.h |   2 +
>>>>  3 files changed, 166 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>>> index d84e5fd86bb4..aa0df0604704 100644
>>>> --- a/hw/vfio/common.c
>>>> +++ b/hw/vfio/common.c
>>>> @@ -453,6 +453,22 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
>>>>      return true;
>>>>  }
>>>>  
>>>> +static bool vfio_devices_all_device_dirty_tracking(VFIOContainer *container)
>>>> +{
>>>> +    VFIOGroup *group;
>>>> +    VFIODevice *vbasedev;
>>>> +
>>>> +    QLIST_FOREACH(group, &container->group_list, container_next) {
>>>> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
>>>> +            if (!vbasedev->dirty_pages_supported) {
>>>> +                return false;
>>>> +            }
>>>> +        }
>>>> +    }
>>>> +
>>>> +    return true;
>>>> +}
>>>> +
>>>>  /*
>>>>   * Check if all VFIO devices are running and migration is active, which is
>>>>   * essentially equivalent to the migration being in pre-copy phase.
>>>> @@ -1395,15 +1411,152 @@ static void vfio_dirty_tracking_init(VFIOContainer *container)
>>>>      qemu_mutex_destroy(&container->tracking_mutex);
>>>>  }
>>>>  
>>>> +static int vfio_devices_dma_logging_set(VFIOContainer *container,
>>>> +                                        struct vfio_device_feature *feature)
>>>> +{
>>>> +    bool status = (feature->flags & VFIO_DEVICE_FEATURE_MASK) ==
>>>> +                  VFIO_DEVICE_FEATURE_DMA_LOGGING_START;
>>>> +    VFIODevice *vbasedev;
>>>> +    VFIOGroup *group;
>>>> +    int ret = 0;
>>>> +
>>>> +    QLIST_FOREACH(group, &container->group_list, container_next) {
>>>> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
>>>> +            if (vbasedev->dirty_tracking == status) {
>>>> +                continue;
>>>> +            }
>>>> +
>>>> +            ret = ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature);
>>>> +            if (ret) {
>>>> +                ret = -errno;
>>>> +                error_report("%s: Failed to set DMA logging %s, err %d (%s)",
>>>> +                             vbasedev->name, status ? "start" : "stop", ret,
>>>> +                             strerror(errno));
>>>> +                goto out;
>>>> +            }  
>>>
>>> Exiting and returning an error on the first failure when starting dirty
>>> tracking makes sense.  Does that behavior still make sense for the stop
>>> path?  Maybe since we only support a single device it doesn't really
>>> matter, but this needs to be revisited for multiple devices.  Thanks,
>>>   
>>
>> How about I test for @status and exit early based on that?
>> (Maybe the variable should be renamed too.) E.g.:
>>
>> if (ret) {
>>   ret = -errno;
>>   error_report("%s: Failed to set DMA logging %s, err %d (%s)",
>>                vbasedev->name, status ? "start" : "stop", ret,
>>                strerror(errno));
>>   if (status) {
>>       goto out;
>>   }
>> }
> 
> Yep, exit on first error enabling, continue on disabling makes more
> sense, but then we need to look at what return code makes sense for the
> teardown.  TBH, a teardown function would typically return void, so
> it's possible we'd be better off not using this for both.  Thanks,
> 
Agreed. I can unroll the set helper into separate _start and _stop functions and make _stop return void.
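
A sketch of that unrolling, with a fake ioctl standing in for
VFIO_DEVICE_FEATURE and a flat array standing in for the container's
group/device lists (the names here are illustrative, not QEMU's): start is
fail-fast so the caller can roll back, while stop keeps going on errors so
the remaining devices are still torn down.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-in for VFIODevice; in QEMU these hang off the container's groups. */
typedef struct {
    const char *name;
    bool dirty_tracking;     /* current DMA-logging state */
    bool fail_ioctl;         /* test hook: make the fake ioctl fail */
} FakeDevice;

/* Fake VFIO_DEVICE_FEATURE ioctl: 0 on success, -1 with errno on failure. */
static int fake_dma_logging_ioctl(FakeDevice *dev)
{
    if (dev->fail_ioctl) {
        errno = EIO;
        return -1;
    }
    return 0;
}

/* Start logging on every device; bail out on the first failure so the
 * caller can stop the devices already started. */
static int dma_logging_start(FakeDevice *devs, int n)
{
    for (int i = 0; i < n; i++) {
        if (devs[i].dirty_tracking) {
            continue;
        }
        if (fake_dma_logging_ioctl(&devs[i])) {
            return -errno;
        }
        devs[i].dirty_tracking = true;
    }
    return 0;
}

/* Stop logging on every device; teardown never gives up early, it just
 * reports and continues so the remaining devices are still stopped. */
static void dma_logging_stop(FakeDevice *devs, int n)
{
    for (int i = 0; i < n; i++) {
        if (!devs[i].dirty_tracking) {
            continue;
        }
        if (fake_dma_logging_ioctl(&devs[i]) == 0) {
            devs[i].dirty_tracking = false;
        }
        /* on error: report and move on to the next device */
    }
}
```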

> Alex
> 
> PS - no further original comments on v3 from me.
> 
OK

>>>> +            vbasedev->dirty_tracking = status;
>>>> +        }
>>>> +    }
>>>> +
>>>> +out:
>>>> +    return ret;
>>>> +}
>>>> +
>>>> +static int vfio_devices_dma_logging_stop(VFIOContainer *container)
>>>> +{
>>>> +    uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature),
>>>> +                              sizeof(uint64_t))] = {};
>>>> +    struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
>>>> +
>>>> +    feature->argsz = sizeof(buf);
>>>> +    feature->flags = VFIO_DEVICE_FEATURE_SET;
>>>> +    feature->flags |= VFIO_DEVICE_FEATURE_DMA_LOGGING_STOP;
>>>> +
>>>> +    return vfio_devices_dma_logging_set(container, feature);
>>>> +}
>>>> +
>>>> +static struct vfio_device_feature *
>>>> +vfio_device_feature_dma_logging_start_create(VFIOContainer *container)
>>>> +{
>>>> +    struct vfio_device_feature *feature;
>>>> +    size_t feature_size;
>>>> +    struct vfio_device_feature_dma_logging_control *control;
>>>> +    struct vfio_device_feature_dma_logging_range *ranges;
>>>> +    VFIODirtyTrackingRange *tracking = &container->tracking_range;
>>>> +
>>>> +    feature_size = sizeof(struct vfio_device_feature) +
>>>> +                   sizeof(struct vfio_device_feature_dma_logging_control);
>>>> +    feature = g_try_malloc0(feature_size);
>>>> +    if (!feature) {
>>>> +        errno = ENOMEM;
>>>> +        return NULL;
>>>> +    }
>>>> +    feature->argsz = feature_size;
>>>> +    feature->flags = VFIO_DEVICE_FEATURE_SET;
>>>> +    feature->flags |= VFIO_DEVICE_FEATURE_DMA_LOGGING_START;
>>>> +
>>>> +    control = (struct vfio_device_feature_dma_logging_control *)feature->data;
>>>> +    control->page_size = qemu_real_host_page_size();
>>>> +
>>>> +    /*
>>>> +     * DMA logging uAPI guarantees to support at least a number of ranges that
>>>> +     * fits into a single host kernel base page.
>>>> +     */
>>>> +    control->num_ranges = !!tracking->max32 + !!tracking->max64;
>>>> +    ranges = g_try_new0(struct vfio_device_feature_dma_logging_range,
>>>> +                        control->num_ranges);
>>>> +    if (!ranges) {
>>>> +        g_free(feature);
>>>> +        errno = ENOMEM;
>>>> +
>>>> +        return NULL;
>>>> +    }
>>>> +
>>>> +    control->ranges = (__aligned_u64)ranges;
>>>> +    if (tracking->max32) {
>>>> +        ranges->iova = tracking->min32;
>>>> +        ranges->length = (tracking->max32 - tracking->min32) + 1;
>>>> +        ranges++;
>>>> +    }
>>>> +    if (tracking->max64) {
>>>> +        ranges->iova = tracking->min64;
>>>> +        ranges->length = (tracking->max64 - tracking->min64) + 1;
>>>> +    }
>>>> +
>>>> +    trace_vfio_device_dirty_tracking_start(control->num_ranges,
>>>> +                                           tracking->min32, tracking->max32,
>>>> +                                           tracking->min64, tracking->max64);
>>>> +
>>>> +    return feature;
>>>> +}
>>>> +
>>>> +static void vfio_device_feature_dma_logging_start_destroy(
>>>> +    struct vfio_device_feature *feature)
>>>> +{
>>>> +    struct vfio_device_feature_dma_logging_control *control =
>>>> +        (struct vfio_device_feature_dma_logging_control *)feature->data;
>>>> +    struct vfio_device_feature_dma_logging_range *ranges =
>>>> +        (struct vfio_device_feature_dma_logging_range *)control->ranges;
>>>> +
>>>> +    g_free(ranges);
>>>> +    g_free(feature);
>>>> +}
>>>> +
>>>> +static int vfio_devices_dma_logging_start(VFIOContainer *container)
>>>> +{
>>>> +    struct vfio_device_feature *feature;
>>>> +    int ret = 0;
>>>> +
>>>> +    vfio_dirty_tracking_init(container);
>>>> +    feature = vfio_device_feature_dma_logging_start_create(container);
>>>> +    if (!feature) {
>>>> +        return -errno;
>>>> +    }
>>>> +
>>>> +    ret = vfio_devices_dma_logging_set(container, feature);
>>>> +    if (ret) {
>>>> +        vfio_devices_dma_logging_stop(container);
>>>> +    }
>>>> +
>>>> +    vfio_device_feature_dma_logging_start_destroy(feature);
>>>> +
>>>> +    return ret;
>>>> +}
>>>> +
>>>>  static void vfio_listener_log_global_start(MemoryListener *listener)
>>>>  {
>>>>      VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>>>>      int ret;
>>>>  
>>>> -    vfio_dirty_tracking_init(container);
>>>> +    if (vfio_devices_all_device_dirty_tracking(container)) {
>>>> +        ret = vfio_devices_dma_logging_start(container);
>>>> +    } else {
>>>> +        ret = vfio_set_dirty_page_tracking(container, true);
>>>> +    }
>>>>  
>>>> -    ret = vfio_set_dirty_page_tracking(container, true);
>>>>      if (ret) {
>>>> +        error_report("vfio: Could not start dirty page tracking, err: %d (%s)",
>>>> +                     ret, strerror(-ret));
>>>>          vfio_set_migration_error(ret);
>>>>      }
>>>>  }
>>>> @@ -1413,8 +1566,15 @@ static void vfio_listener_log_global_stop(MemoryListener *listener)
>>>>      VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>>>>      int ret;
>>>>  
>>>> -    ret = vfio_set_dirty_page_tracking(container, false);
>>>> +    if (vfio_devices_all_device_dirty_tracking(container)) {
>>>> +        ret = vfio_devices_dma_logging_stop(container);
>>>> +    } else {
>>>> +        ret = vfio_set_dirty_page_tracking(container, false);
>>>> +    }
>>>> +
>>>>      if (ret) {
>>>> +        error_report("vfio: Could not stop dirty page tracking, err: %d (%s)",
>>>> +                     ret, strerror(-ret));
>>>>          vfio_set_migration_error(ret);
>>>>      }
>>>>  }
>>>> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
>>>> index d97a6de17921..7a7e0cfe5b23 100644
>>>> --- a/hw/vfio/trace-events
>>>> +++ b/hw/vfio/trace-events
>>>> @@ -105,6 +105,7 @@ vfio_listener_region_add_no_dma_map(const char *name, uint64_t iova, uint64_t si
>>>>  vfio_listener_region_del_skip(uint64_t start, uint64_t end) "SKIPPING region_del 0x%"PRIx64" - 0x%"PRIx64
>>>>  vfio_listener_region_del(uint64_t start, uint64_t end) "region_del 0x%"PRIx64" - 0x%"PRIx64
>>>>  vfio_device_dirty_tracking_update(uint64_t start, uint64_t end, uint64_t min, uint64_t max) "section 0x%"PRIx64" - 0x%"PRIx64" -> update [0x%"PRIx64" - 0x%"PRIx64"]"
>>>> +vfio_device_dirty_tracking_start(int nr_ranges, uint64_t min32, uint64_t max32, uint64_t min64, uint64_t max64) "nr_ranges %d 32:[0x%"PRIx64" - 0x%"PRIx64"], 64:[0x%"PRIx64" - 0x%"PRIx64"]"
>>>>  vfio_disconnect_container(int fd) "close container->fd=%d"
>>>>  vfio_put_group(int fd) "close group->fd=%d"
>>>>  vfio_get_device(const char * name, unsigned int flags, unsigned int num_regions, unsigned int num_irqs) "Device %s flags: %u, regions: %u, irqs: %u"
>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>>> index 96791add2719..1cbbccd91e11 100644
>>>> --- a/include/hw/vfio/vfio-common.h
>>>> +++ b/include/hw/vfio/vfio-common.h
>>>> @@ -154,6 +154,8 @@ typedef struct VFIODevice {
>>>>      VFIOMigration *migration;
>>>>      Error *migration_blocker;
>>>>      OnOffAuto pre_copy_dirty_page_tracking;
>>>> +    bool dirty_pages_supported;
>>>> +    bool dirty_tracking;
>>>>  } VFIODevice;
>>>>  
>>>>  struct VFIODeviceOps {  
>>>   
>>
> 



* Re: [PATCH v3 00/13] vfio/migration: Device dirty page tracking
  2023-03-06 17:23 ` Cédric Le Goater
  2023-03-06 19:41   ` Joao Martins
@ 2023-03-07  8:33   ` Avihai Horon
  1 sibling, 0 replies; 51+ messages in thread
From: Avihai Horon @ 2023-03-07  8:33 UTC (permalink / raw)
  To: Cédric Le Goater, Joao Martins, qemu-devel
  Cc: Alex Williamson, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta


On 06/03/2023 19:23, Cédric Le Goater wrote:
>
> On 3/4/23 02:43, Joao Martins wrote:
>> Hey,
>>
>> Presented herewith a series based on the basic VFIO migration 
>> protocol v2
>> implementation [1].
>>
>> It is split from its parent series[5] to solely focus on device dirty
>> page tracking. Device dirty page tracking allows the VFIO device to
>> record its DMAs and report them back when needed. This is part of VFIO
>> migration and is used during pre-copy phase of migration to track the
>> RAM pages that the device has written to and mark those pages dirty, so
>> they can later be re-sent to target.
>>
>> Device dirty page tracking uses the DMA logging uAPI to discover device
>> capabilities, to start and stop tracking, and to get dirty page bitmap
>> report. Extra details and uAPI definition can be found here [3].
>>
>> Device dirty page tracking operates in VFIOContainer scope. I.e., When
>> dirty tracking is started, stopped or dirty page report is queried, all
>> devices within a VFIOContainer are iterated and for each of them device
>> dirty page tracking is started, stopped or dirty page report is queried,
>> respectively.
>>
>> Device dirty page tracking is used only if all devices within a
>> VFIOContainer support it. Otherwise, VFIO IOMMU dirty page tracking is
>> used, and if that is not supported as well, memory is perpetually marked
>> dirty by QEMU. Note that since VFIO IOMMU dirty page tracking has no HW
>> support, the last two usually have the same effect of perpetually
>> marking all pages dirty.
>>
>> Normally, when asked to start dirty tracking, all the currently DMA
>> mapped ranges are tracked by device dirty page tracking. If using a
>> vIOMMU we block live migration. This is temporary; a separate series will
>> add support for it. Thus this series focuses on getting the
>> groundwork in place first.
>>
>> The series is organized as follows:
>>
>> - Patches 1-7: Fix bugs and do some preparatory work required prior to
>>    adding device dirty page tracking.
>> - Patches 8-10: Implement device dirty page tracking.
>> - Patch 11: Blocks live migration with vIOMMU.
>> - Patches 12-13 enable device dirty page tracking and document it.
>>
>> Comments, improvements as usual appreciated.
>
> It would be helpful to have some feedback from Avihai on the new patches
> introduced in v3 or v4 before merging.

Sure, will send it shortly.

Thanks.



