* [PATCH v4 00/14] vfio/migration: Device dirty page tracking
@ 2023-03-07  2:02 Joao Martins
  2023-03-07  2:02 ` [PATCH v4 01/14] vfio/common: Fix error reporting in vfio_get_dirty_bitmap() Joao Martins
                   ` (13 more replies)
  0 siblings, 14 replies; 38+ messages in thread
From: Joao Martins @ 2023-03-07  2:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon,
	Joao Martins

Hey,

Presented herewith is a series based on the basic VFIO migration protocol v2
implementation [1].

It is split from its parent series [5] to focus solely on device dirty
page tracking. Device dirty page tracking allows the VFIO device to
record its DMAs and report them back when needed. This is part of VFIO
migration and is used during the pre-copy phase of migration to track
the RAM pages that the device has written to and mark those pages dirty,
so they can later be re-sent to the target.

Device dirty page tracking uses the DMA logging uAPI to discover device
capabilities, to start and stop tracking, and to get dirty page bitmap
reports. Extra details and the uAPI definition can be found here [3].
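
For orientation, here is a minimal standalone sketch (not code from this
series) of handing a single IOVA range to the DMA logging start feature of
the uAPI in [3]; the function name and the reduced error handling are made
up for illustration:

    #include <errno.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    /* Illustration only: start device DMA logging for one IOVA range. */
    static int dma_logging_start_one_range(int device_fd, uint64_t iova,
                                           uint64_t length, uint64_t page_size)
    {
        struct vfio_device_feature_dma_logging_range range = {
            .iova = iova,
            .length = length,
        };
        size_t sz = sizeof(struct vfio_device_feature) +
                    sizeof(struct vfio_device_feature_dma_logging_control);
        struct vfio_device_feature *feature = calloc(1, sz);
        struct vfio_device_feature_dma_logging_control *control;
        int ret;

        if (!feature) {
            return -ENOMEM;
        }
        feature->argsz = sz;
        feature->flags = VFIO_DEVICE_FEATURE_SET |
                         VFIO_DEVICE_FEATURE_DMA_LOGGING_START;
        control = (void *)feature->data;
        control->page_size = page_size;   /* granularity of the dirty bitmap */
        control->num_ranges = 1;
        control->ranges = (uintptr_t)&range;

        ret = ioctl(device_fd, VFIO_DEVICE_FEATURE, feature) ? -errno : 0;
        free(feature);
        return ret;
    }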

Device dirty page tracking operates at VFIOContainer scope, i.e. when
dirty tracking is started or stopped, or a dirty page report is queried,
all devices within a VFIOContainer are iterated and the corresponding
operation (start, stop or dirty page report query) is performed on each
of them.
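
As a rough sketch of that container-scope iteration (the helper name and
callback are hypothetical; the types and QLIST_FOREACH come from QEMU's
hw/vfio/vfio-common.h and qemu/queue.h), each operation simply walks every
group and device in the container:

    /* Sketch only: apply one dirty tracking operation to every device. */
    static void vfio_container_for_each_device(VFIOContainer *container,
                                               void (*op)(VFIODevice *vbasedev))
    {
        VFIOGroup *group;
        VFIODevice *vbasedev;

        QLIST_FOREACH(group, &container->group_list, container_next) {
            QLIST_FOREACH(vbasedev, &group->device_list, next) {
                op(vbasedev); /* start, stop or query the dirty page report */
            }
        }
    }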

Device dirty page tracking is used only if all devices within a
VFIOContainer support it. Otherwise, VFIO IOMMU dirty page tracking is
used, and if that is not supported either, memory is perpetually marked
dirty by QEMU. Note that since VFIO IOMMU dirty page tracking has no HW
support, the last two usually have the same effect of perpetually
marking all pages dirty.
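
The resulting selection logic can be summarised with a small sketch (the
enum is purely illustrative; vfio_devices_all_device_dirty_tracking() is
added in patch 9 and container->dirty_pages_supported already exists):

    /* Sketch of how the dirty tracking backend is chosen per container. */
    typedef enum {
        DIRTY_TRACKING_DEVICE,  /* per-device DMA logging uAPI */
        DIRTY_TRACKING_IOMMU,   /* VFIO IOMMU type1 dirty tracking ioctls */
        DIRTY_TRACKING_NONE,    /* QEMU perpetually marks all pages dirty */
    } DirtyTrackingMode;

    static DirtyTrackingMode dirty_tracking_mode(VFIOContainer *container)
    {
        if (vfio_devices_all_device_dirty_tracking(container)) {
            return DIRTY_TRACKING_DEVICE;
        }
        if (container->dirty_pages_supported) {
            return DIRTY_TRACKING_IOMMU;
        }
        return DIRTY_TRACKING_NONE;
    }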

Normally, when asked to start dirty tracking, all currently DMA-mapped
ranges are tracked by device dirty page tracking. When a vIOMMU is in
use, live migration is blocked; this is temporary, and a separate series
is going to add support for it. Thus this series focuses on getting the
groundwork in place first.

The series is organized as follows:

- Patches 1-7: Fix bugs and do some preparatory work required prior to
  adding device dirty page tracking.
- Patches 8-11: Implement device dirty page tracking.
- Patch 12: Blocks live migration with vIOMMU.
- Patches 13-14: Detect device dirty page tracking and document it.

Comments and improvements are, as usual, appreciated.

Thanks,
	Joao

Changes from v3 [6]:
- Added R-bs in patches 4, 5, 6, 13 and 14
  (did not add the others because those patches changed considerably)
- Fixed the live migration unblocking by moving
  vfio_unblock_giommu_migration() into vfio_instance_finalize()
- Refactored/simplified the test for whether the vIOMMU is enabled
  (patch 12)
- Change the style of how we set features::flags
  (patch 9, 11)
- Return -ENOMEM in vfio_bitmap_alloc(), and change callsites to return
  ret instead of errno
  (patch 4)
- Remove iova-tree includes
- Initialize range min{32,64} to UINT{32,64}_MAX to better calculate the
  minimum range without assumptions.
- Add commentary on why we unregister the memory listener
- Add commentary about the dual split of ranges
- Removed the mutex because the memory listener callbacks are all serialized
- Move out the vfio_section_get_iova_range() into its own patch and
  make vfio_listener_region_add() use it too.
- Add a VFIODirtyRanges struct, allocated on the stack as opposed to
  being stored in the container, and make the listener be registered
  with it.
- Remove stale paragraph from commit message
  (patch 8)
- Unrolled vfio_device_dma_logging_set() into its own code in start(),
  which fails early and returns, and in stop(), which is void and never
  returns early.

Changes from v2 [5]:
- Split the initial dirty page tracking support out of the parent
  series into smaller parts.
- Replaced the IOVATree with a simple two-range setup: one range for the
  32-bit address space and another one for the 64-bit address space.
  After discussion it was settled this way because of the unnecessary
  complexity of the IOVATree, while the two-range approach is also more
  efficient and does not stress the uAPI limits as much. (patches 7 and 8)
- For now exclude vIOMMU, and so add a live migration blocker if a
  vIOMMU is passed in. This will be followed up with vIOMMU support in
  a separate series. (patch 10)
- Add new patches to reuse most helpers used across memory listeners.
  This is useful for reuse when recording DMA ranges. (patches 5 and 6)
- Adjust the documentation to avoid mentioning vIOMMU support and
  instead state that device dirty page tracking with a vIOMMU is
  blocked. Cedric gave an R-b, but I've dropped it considering the
  series split and the lack of vIOMMU support. (patch 13)
- Improve VFIOBitmap by placing it on the stack, avoiding a 16-byte
  allocation. Remove the free helper function. (patch 4)
- Fixed the compilation issues (patches 8 and 10). Possibly not 100%
  addressed, as I am still working out an environment to reproduce them.

Changes from v1 [4]:
- Rebased on latest master branch. As part of it, made some changes in
  pre-copy to adjust it to Juan's new patches:
  1. Added a new patch that passes threshold_size parameter to
     .state_pending_{estimate,exact}() handlers.
  2. Added a new patch that refactors vfio_save_block().
  3. Changed the pre-copy patch to cache and report pending pre-copy
     size in the .state_pending_estimate() handler.
- Removed unnecessary P2P code. This should be added later on when P2P
  support is added. (Alex)
- Moved the dirty sync to be after the DMA unmap in vfio_dma_unmap()
  (patch #11). (Alex)
- Stored vfio_devices_all_device_dirty_tracking()'s value in a local
  variable in vfio_get_dirty_bitmap() so it can be re-used (patch #11).
- Refactored the vIOMMU device dirty tracking ranges creation code to
  make it clearer (patch #15).
- Changed overflow check in vfio_iommu_range_is_device_tracked() to
  emphasize that we specifically check for 2^64 wrap around (patch #15).
- Added R-bs / Acks.

[1] https://lore.kernel.org/qemu-devel/167658846945.932837.1420176491103357684.stgit@omen/
[2] https://lore.kernel.org/kvm/20221206083438.37807-3-yishaih@nvidia.com/
[3] https://lore.kernel.org/netdev/20220908183448.195262-4-yishaih@nvidia.com/
[4] https://lore.kernel.org/qemu-devel/20230126184948.10478-1-avihaih@nvidia.com/
[5] https://lore.kernel.org/qemu-devel/20230222174915.5647-1-avihaih@nvidia.com/
[6] https://lore.kernel.org/qemu-devel/20230304014343.33646-1-joao.m.martins@oracle.com/

Avihai Horon (6):
  vfio/common: Fix error reporting in vfio_get_dirty_bitmap()
  vfio/common: Fix wrong %m usages
  vfio/common: Abort migration if dirty log start/stop/sync fails
  vfio/common: Add VFIOBitmap and alloc function
  vfio/common: Extract code from vfio_get_dirty_bitmap() to new function
  docs/devel: Document VFIO device dirty page tracking

Joao Martins (8):
  vfio/common: Add helper to validate iova/end against hostwin
  vfio/common: Consolidate skip/invalid section into helper
  vfio/common: Add helper to consolidate iova/end calculation
  vfio/common: Record DMA mapped IOVA ranges
  vfio/common: Add device dirty page tracking start/stop
  vfio/common: Add device dirty page bitmap sync
  vfio/migration: Block migration with vIOMMU
  vfio/migration: Query device dirty page tracking support

 docs/devel/vfio-migration.rst |  46 ++-
 hw/vfio/common.c              | 685 ++++++++++++++++++++++++++++------
 hw/vfio/migration.c           |  20 +
 hw/vfio/pci.c                 |   1 +
 hw/vfio/trace-events          |   2 +
 include/hw/vfio/vfio-common.h |  17 +
 6 files changed, 634 insertions(+), 137 deletions(-)

-- 
2.17.2




* [PATCH v4 01/14] vfio/common: Fix error reporting in vfio_get_dirty_bitmap()
  2023-03-07  2:02 [PATCH v4 00/14] vfio/migration: Device dirty page tracking Joao Martins
@ 2023-03-07  2:02 ` Joao Martins
  2023-03-07  2:02 ` [PATCH v4 02/14] vfio/common: Fix wrong %m usages Joao Martins
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 38+ messages in thread
From: Joao Martins @ 2023-03-07  2:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon,
	Joao Martins

From: Avihai Horon <avihaih@nvidia.com>

Return -errno instead of -1 if VFIO_IOMMU_DIRTY_PAGES ioctl fails in
vfio_get_dirty_bitmap().

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
---
 hw/vfio/common.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index bab83c0e55cb..9fc305448fa2 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1335,6 +1335,7 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
 
     ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, dbitmap);
     if (ret) {
+        ret = -errno;
         error_report("Failed to get dirty bitmap for iova: 0x%"PRIx64
                 " size: 0x%"PRIx64" err: %d", (uint64_t)range->iova,
                 (uint64_t)range->size, errno);
-- 
2.17.2




* [PATCH v4 02/14] vfio/common: Fix wrong %m usages
  2023-03-07  2:02 [PATCH v4 00/14] vfio/migration: Device dirty page tracking Joao Martins
  2023-03-07  2:02 ` [PATCH v4 01/14] vfio/common: Fix error reporting in vfio_get_dirty_bitmap() Joao Martins
@ 2023-03-07  2:02 ` Joao Martins
  2023-03-07  2:02 ` [PATCH v4 03/14] vfio/common: Abort migration if dirty log start/stop/sync fails Joao Martins
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 38+ messages in thread
From: Joao Martins @ 2023-03-07  2:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon,
	Joao Martins

From: Avihai Horon <avihaih@nvidia.com>

There are several places where the %m conversion is used if one of
vfio_dma_map(), vfio_dma_unmap() or vfio_get_dirty_bitmap() fails.

The %m usage in these places is wrong, since %m relies on the errno
value while the above functions don't report errors via errno.

Fix it by using strerror() with the returned value instead.
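
A tiny standalone illustration of the problem (the callee below is a
stand-in for the functions above, which report failure through their
return value rather than through errno):

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    /* Stand-in for vfio_dma_map() and friends: returns -errno, leaves errno alone. */
    static int do_map(void)
    {
        return -EINVAL;
    }

    int main(void)
    {
        int ret = do_map();

        errno = 0; /* nothing set errno here */
        printf("what %%m would print: %s\n", strerror(errno)); /* "Success" */
        printf("what we want:         %s\n", strerror(-ret));  /* "Invalid argument" */
        return 0;
    }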

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
---
 hw/vfio/common.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 9fc305448fa2..4d26e9cccf91 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -703,17 +703,17 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
                            read_only);
         if (ret) {
             error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
-                         "0x%"HWADDR_PRIx", %p) = %d (%m)",
+                         "0x%"HWADDR_PRIx", %p) = %d (%s)",
                          container, iova,
-                         iotlb->addr_mask + 1, vaddr, ret);
+                         iotlb->addr_mask + 1, vaddr, ret, strerror(-ret));
         }
     } else {
         ret = vfio_dma_unmap(container, iova, iotlb->addr_mask + 1, iotlb);
         if (ret) {
             error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
-                         "0x%"HWADDR_PRIx") = %d (%m)",
+                         "0x%"HWADDR_PRIx") = %d (%s)",
                          container, iova,
-                         iotlb->addr_mask + 1, ret);
+                         iotlb->addr_mask + 1, ret, strerror(-ret));
         }
     }
 out:
@@ -1095,8 +1095,9 @@ static void vfio_listener_region_add(MemoryListener *listener,
                        vaddr, section->readonly);
     if (ret) {
         error_setg(&err, "vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
-                   "0x%"HWADDR_PRIx", %p) = %d (%m)",
-                   container, iova, int128_get64(llsize), vaddr, ret);
+                   "0x%"HWADDR_PRIx", %p) = %d (%s)",
+                   container, iova, int128_get64(llsize), vaddr, ret,
+                   strerror(-ret));
         if (memory_region_is_ram_device(section->mr)) {
             /* Allow unexpected mappings not to be fatal for RAM devices */
             error_report_err(err);
@@ -1228,16 +1229,18 @@ static void vfio_listener_region_del(MemoryListener *listener,
             ret = vfio_dma_unmap(container, iova, int128_get64(llsize), NULL);
             if (ret) {
                 error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
-                             "0x%"HWADDR_PRIx") = %d (%m)",
-                             container, iova, int128_get64(llsize), ret);
+                             "0x%"HWADDR_PRIx") = %d (%s)",
+                             container, iova, int128_get64(llsize), ret,
+                             strerror(-ret));
             }
             iova += int128_get64(llsize);
         }
         ret = vfio_dma_unmap(container, iova, int128_get64(llsize), NULL);
         if (ret) {
             error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
-                         "0x%"HWADDR_PRIx") = %d (%m)",
-                         container, iova, int128_get64(llsize), ret);
+                         "0x%"HWADDR_PRIx") = %d (%s)",
+                         container, iova, int128_get64(llsize), ret,
+                         strerror(-ret));
         }
     }
 
@@ -1384,9 +1387,9 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
                                     translated_addr);
         if (ret) {
             error_report("vfio_iommu_map_dirty_notify(%p, 0x%"HWADDR_PRIx", "
-                         "0x%"HWADDR_PRIx") = %d (%m)",
-                         container, iova,
-                         iotlb->addr_mask + 1, ret);
+                         "0x%"HWADDR_PRIx") = %d (%s)",
+                         container, iova, iotlb->addr_mask + 1, ret,
+                         strerror(-ret));
         }
     }
     rcu_read_unlock();
-- 
2.17.2




* [PATCH v4 03/14] vfio/common: Abort migration if dirty log start/stop/sync fails
  2023-03-07  2:02 [PATCH v4 00/14] vfio/migration: Device dirty page tracking Joao Martins
  2023-03-07  2:02 ` [PATCH v4 01/14] vfio/common: Fix error reporting in vfio_get_dirty_bitmap() Joao Martins
  2023-03-07  2:02 ` [PATCH v4 02/14] vfio/common: Fix wrong %m usages Joao Martins
@ 2023-03-07  2:02 ` Joao Martins
  2023-03-07  2:02 ` [PATCH v4 04/14] vfio/common: Add VFIOBitmap and alloc function Joao Martins
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 38+ messages in thread
From: Joao Martins @ 2023-03-07  2:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon,
	Joao Martins

From: Avihai Horon <avihaih@nvidia.com>

If VFIO dirty pages log start/stop/sync fails during migration,
migration should be aborted as pages dirtied by VFIO devices might not
be reported properly.

This is not the case today, where in such a scenario only an error is
printed.

Fix it by aborting migration in the above scenario.

Fixes: 758b96b61d5c ("vfio/migrate: Move switch of dirty tracking into vfio_memory_listener")
Fixes: b6dd6504e303 ("vfio: Add vfio_listener_log_sync to mark dirty pages")
Fixes: 9e7b0442f23a ("vfio: Add ioctl to get dirty pages bitmap during dma unmap")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
---
 hw/vfio/common.c | 53 ++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 45 insertions(+), 8 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 4d26e9cccf91..4c801513136a 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -42,6 +42,7 @@
 #include "migration/migration.h"
 #include "migration/misc.h"
 #include "migration/blocker.h"
+#include "migration/qemu-file.h"
 #include "sysemu/tpm.h"
 
 VFIOGroupList vfio_group_list =
@@ -390,6 +391,19 @@ void vfio_unblock_multiple_devices_migration(void)
     multiple_devices_migration_blocker = NULL;
 }
 
+static void vfio_set_migration_error(int err)
+{
+    MigrationState *ms = migrate_get_current();
+
+    if (migration_is_setup_or_active(ms->state)) {
+        WITH_QEMU_LOCK_GUARD(&ms->qemu_file_lock) {
+            if (ms->to_dst_file) {
+                qemu_file_set_error(ms->to_dst_file, err);
+            }
+        }
+    }
+}
+
 static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
 {
     VFIOGroup *group;
@@ -680,6 +694,7 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
     if (iotlb->target_as != &address_space_memory) {
         error_report("Wrong target AS \"%s\", only system memory is allowed",
                      iotlb->target_as->name ? iotlb->target_as->name : "none");
+        vfio_set_migration_error(-EINVAL);
         return;
     }
 
@@ -714,6 +729,7 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
                          "0x%"HWADDR_PRIx") = %d (%s)",
                          container, iova,
                          iotlb->addr_mask + 1, ret, strerror(-ret));
+            vfio_set_migration_error(ret);
         }
     }
 out:
@@ -1259,7 +1275,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
     }
 }
 
-static void vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
+static int vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
 {
     int ret;
     struct vfio_iommu_type1_dirty_bitmap dirty = {
@@ -1267,7 +1283,7 @@ static void vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
     };
 
     if (!container->dirty_pages_supported) {
-        return;
+        return 0;
     }
 
     if (start) {
@@ -1278,23 +1294,34 @@ static void vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
 
     ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, &dirty);
     if (ret) {
+        ret = -errno;
         error_report("Failed to set dirty tracking flag 0x%x errno: %d",
                      dirty.flags, errno);
     }
+
+    return ret;
 }
 
 static void vfio_listener_log_global_start(MemoryListener *listener)
 {
     VFIOContainer *container = container_of(listener, VFIOContainer, listener);
+    int ret;
 
-    vfio_set_dirty_page_tracking(container, true);
+    ret = vfio_set_dirty_page_tracking(container, true);
+    if (ret) {
+        vfio_set_migration_error(ret);
+    }
 }
 
 static void vfio_listener_log_global_stop(MemoryListener *listener)
 {
     VFIOContainer *container = container_of(listener, VFIOContainer, listener);
+    int ret;
 
-    vfio_set_dirty_page_tracking(container, false);
+    ret = vfio_set_dirty_page_tracking(container, false);
+    if (ret) {
+        vfio_set_migration_error(ret);
+    }
 }
 
 static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
@@ -1370,19 +1397,18 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
     VFIOContainer *container = giommu->container;
     hwaddr iova = iotlb->iova + giommu->iommu_offset;
     ram_addr_t translated_addr;
+    int ret = -EINVAL;
 
     trace_vfio_iommu_map_dirty_notify(iova, iova + iotlb->addr_mask);
 
     if (iotlb->target_as != &address_space_memory) {
         error_report("Wrong target AS \"%s\", only system memory is allowed",
                      iotlb->target_as->name ? iotlb->target_as->name : "none");
-        return;
+        goto out;
     }
 
     rcu_read_lock();
     if (vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL)) {
-        int ret;
-
         ret = vfio_get_dirty_bitmap(container, iova, iotlb->addr_mask + 1,
                                     translated_addr);
         if (ret) {
@@ -1393,6 +1419,11 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
         }
     }
     rcu_read_unlock();
+
+out:
+    if (ret) {
+        vfio_set_migration_error(ret);
+    }
 }
 
 static int vfio_ram_discard_get_dirty_bitmap(MemoryRegionSection *section,
@@ -1485,13 +1516,19 @@ static void vfio_listener_log_sync(MemoryListener *listener,
         MemoryRegionSection *section)
 {
     VFIOContainer *container = container_of(listener, VFIOContainer, listener);
+    int ret;
 
     if (vfio_listener_skipped_section(section)) {
         return;
     }
 
     if (vfio_devices_all_dirty_tracking(container)) {
-        vfio_sync_dirty_bitmap(container, section);
+        ret = vfio_sync_dirty_bitmap(container, section);
+        if (ret) {
+            error_report("vfio: Failed to sync dirty bitmap, err: %d (%s)", ret,
+                         strerror(-ret));
+            vfio_set_migration_error(ret);
+        }
     }
 }
 
-- 
2.17.2




* [PATCH v4 04/14] vfio/common: Add VFIOBitmap and alloc function
  2023-03-07  2:02 [PATCH v4 00/14] vfio/migration: Device dirty page tracking Joao Martins
                   ` (2 preceding siblings ...)
  2023-03-07  2:02 ` [PATCH v4 03/14] vfio/common: Abort migration if dirty log start/stop/sync fails Joao Martins
@ 2023-03-07  2:02 ` Joao Martins
  2023-03-07  8:49   ` Avihai Horon
  2023-03-07  2:02 ` [PATCH v4 05/14] vfio/common: Add helper to validate iova/end against hostwin Joao Martins
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 38+ messages in thread
From: Joao Martins @ 2023-03-07  2:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon,
	Joao Martins

From: Avihai Horon <avihaih@nvidia.com>

There are already two places where dirty page bitmap allocation and
calculations are done in open code. With device dirty page tracking
being added in the next patches, there are going to be even more places.

To avoid code duplication, introduce VFIOBitmap struct and corresponding
alloc function and use them where applicable.
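
For reference, a quick self-contained check of the size math the new
helper performs (the values are chosen arbitrarily;
REAL_HOST_PAGE_ALIGN/ROUND_UP are reimplemented inline here):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t page_size = 4096;                 /* qemu_real_host_page_size() */
        uint64_t size = 1ULL << 30;                /* 1 GiB range to track */
        uint64_t pages = (size + page_size - 1) / page_size;   /* 262144 pages */
        /* round pages up to a multiple of 64 bits, then convert to bytes */
        uint64_t bitmap_bytes = ((pages + 63) / 64) * 64 / 8;   /* 32768 bytes */

        printf("pages=%llu bitmap_bytes=%llu\n",
               (unsigned long long)pages, (unsigned long long)bitmap_bytes);
        return 0;
    }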

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
---
 hw/vfio/common.c | 73 +++++++++++++++++++++++++++++-------------------
 1 file changed, 44 insertions(+), 29 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 4c801513136a..cec3de08d2b4 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -320,6 +320,25 @@ const MemoryRegionOps vfio_region_ops = {
  * Device state interfaces
  */
 
+typedef struct {
+    unsigned long *bitmap;
+    hwaddr size;
+    hwaddr pages;
+} VFIOBitmap;
+
+static int vfio_bitmap_alloc(VFIOBitmap *vbmap, hwaddr size)
+{
+    vbmap->pages = REAL_HOST_PAGE_ALIGN(size) / qemu_real_host_page_size();
+    vbmap->size = ROUND_UP(vbmap->pages, sizeof(__u64) * BITS_PER_BYTE) /
+                                         BITS_PER_BYTE;
+    vbmap->bitmap = g_try_malloc0(vbmap->size);
+    if (!vbmap->bitmap) {
+        return -ENOMEM;
+    }
+
+    return 0;
+}
+
 bool vfio_mig_active(void)
 {
     VFIOGroup *group;
@@ -468,9 +487,14 @@ static int vfio_dma_unmap_bitmap(VFIOContainer *container,
 {
     struct vfio_iommu_type1_dma_unmap *unmap;
     struct vfio_bitmap *bitmap;
-    uint64_t pages = REAL_HOST_PAGE_ALIGN(size) / qemu_real_host_page_size();
+    VFIOBitmap vbmap;
     int ret;
 
+    ret = vfio_bitmap_alloc(&vbmap, size);
+    if (ret) {
+        return ret;
+    }
+
     unmap = g_malloc0(sizeof(*unmap) + sizeof(*bitmap));
 
     unmap->argsz = sizeof(*unmap) + sizeof(*bitmap);
@@ -484,35 +508,28 @@ static int vfio_dma_unmap_bitmap(VFIOContainer *container,
      * qemu_real_host_page_size to mark those dirty. Hence set bitmap_pgsize
      * to qemu_real_host_page_size.
      */
-
     bitmap->pgsize = qemu_real_host_page_size();
-    bitmap->size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) /
-                   BITS_PER_BYTE;
+    bitmap->size = vbmap.size;
+    bitmap->data = (__u64 *)vbmap.bitmap;
 
-    if (bitmap->size > container->max_dirty_bitmap_size) {
-        error_report("UNMAP: Size of bitmap too big 0x%"PRIx64,
-                     (uint64_t)bitmap->size);
+    if (vbmap.size > container->max_dirty_bitmap_size) {
+        error_report("UNMAP: Size of bitmap too big 0x%"PRIx64, vbmap.size);
         ret = -E2BIG;
         goto unmap_exit;
     }
 
-    bitmap->data = g_try_malloc0(bitmap->size);
-    if (!bitmap->data) {
-        ret = -ENOMEM;
-        goto unmap_exit;
-    }
-
     ret = ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, unmap);
     if (!ret) {
-        cpu_physical_memory_set_dirty_lebitmap((unsigned long *)bitmap->data,
-                iotlb->translated_addr, pages);
+        cpu_physical_memory_set_dirty_lebitmap(vbmap.bitmap,
+                iotlb->translated_addr, vbmap.pages);
     } else {
         error_report("VFIO_UNMAP_DMA with DIRTY_BITMAP : %m");
     }
 
-    g_free(bitmap->data);
 unmap_exit:
     g_free(unmap);
+    g_free(vbmap.bitmap);
+
     return ret;
 }
 
@@ -1329,7 +1346,7 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
 {
     struct vfio_iommu_type1_dirty_bitmap *dbitmap;
     struct vfio_iommu_type1_dirty_bitmap_get *range;
-    uint64_t pages;
+    VFIOBitmap vbmap;
     int ret;
 
     if (!container->dirty_pages_supported) {
@@ -1339,6 +1356,11 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
         return 0;
     }
 
+    ret = vfio_bitmap_alloc(&vbmap, size);
+    if (ret) {
+        return ret;
+    }
+
     dbitmap = g_malloc0(sizeof(*dbitmap) + sizeof(*range));
 
     dbitmap->argsz = sizeof(*dbitmap) + sizeof(*range);
@@ -1353,15 +1375,8 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
      * to qemu_real_host_page_size.
      */
     range->bitmap.pgsize = qemu_real_host_page_size();
-
-    pages = REAL_HOST_PAGE_ALIGN(range->size) / qemu_real_host_page_size();
-    range->bitmap.size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) /
-                                         BITS_PER_BYTE;
-    range->bitmap.data = g_try_malloc0(range->bitmap.size);
-    if (!range->bitmap.data) {
-        ret = -ENOMEM;
-        goto err_out;
-    }
+    range->bitmap.size = vbmap.size;
+    range->bitmap.data = (__u64 *)vbmap.bitmap;
 
     ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, dbitmap);
     if (ret) {
@@ -1372,14 +1387,14 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
         goto err_out;
     }
 
-    cpu_physical_memory_set_dirty_lebitmap((unsigned long *)range->bitmap.data,
-                                            ram_addr, pages);
+    cpu_physical_memory_set_dirty_lebitmap(vbmap.bitmap, ram_addr,
+                                           vbmap.pages);
 
     trace_vfio_get_dirty_bitmap(container->fd, range->iova, range->size,
                                 range->bitmap.size, ram_addr);
 err_out:
-    g_free(range->bitmap.data);
     g_free(dbitmap);
+    g_free(vbmap.bitmap);
 
     return ret;
 }
-- 
2.17.2




* [PATCH v4 05/14] vfio/common: Add helper to validate iova/end against hostwin
  2023-03-07  2:02 [PATCH v4 00/14] vfio/migration: Device dirty page tracking Joao Martins
                   ` (3 preceding siblings ...)
  2023-03-07  2:02 ` [PATCH v4 04/14] vfio/common: Add VFIOBitmap and alloc function Joao Martins
@ 2023-03-07  2:02 ` Joao Martins
  2023-03-07  8:57   ` Avihai Horon
  2023-03-07  2:02 ` [PATCH v4 06/14] vfio/common: Consolidate skip/invalid section into helper Joao Martins
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 38+ messages in thread
From: Joao Martins @ 2023-03-07  2:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon,
	Joao Martins

In preparation for use in device dirty tracking, move the code that
finds the container host DMA window for a given IOVA range into a
helper. This avoids duplicating the common checks across listener
callbacks.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
---
 hw/vfio/common.c | 38 ++++++++++++++++++++------------------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index cec3de08d2b4..99acb998eb14 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -901,6 +901,22 @@ static void vfio_unregister_ram_discard_listener(VFIOContainer *container,
     g_free(vrdl);
 }
 
+static VFIOHostDMAWindow *vfio_find_hostwin(VFIOContainer *container,
+                                            hwaddr iova, hwaddr end)
+{
+    VFIOHostDMAWindow *hostwin;
+    bool hostwin_found = false;
+
+    QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
+        if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
+            hostwin_found = true;
+            break;
+        }
+    }
+
+    return hostwin_found ? hostwin : NULL;
+}
+
 static bool vfio_known_safe_misalignment(MemoryRegionSection *section)
 {
     MemoryRegion *mr = section->mr;
@@ -926,7 +942,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
     void *vaddr;
     int ret;
     VFIOHostDMAWindow *hostwin;
-    bool hostwin_found;
     Error *err = NULL;
 
     if (vfio_listener_skipped_section(section)) {
@@ -1027,15 +1042,8 @@ static void vfio_listener_region_add(MemoryListener *listener,
 #endif
     }
 
-    hostwin_found = false;
-    QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
-        if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
-            hostwin_found = true;
-            break;
-        }
-    }
-
-    if (!hostwin_found) {
+    hostwin = vfio_find_hostwin(container, iova, end);
+    if (!hostwin) {
         error_setg(&err, "Container %p can't map guest IOVA region"
                    " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
         goto fail;
@@ -1237,15 +1245,9 @@ static void vfio_listener_region_del(MemoryListener *listener,
     if (memory_region_is_ram_device(section->mr)) {
         hwaddr pgmask;
         VFIOHostDMAWindow *hostwin;
-        bool hostwin_found = false;
 
-        QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
-            if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
-                hostwin_found = true;
-                break;
-            }
-        }
-        assert(hostwin_found); /* or region_add() would have failed */
+        hostwin = vfio_find_hostwin(container, iova, end);
+        assert(hostwin); /* or region_add() would have failed */
 
         pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
         try_unmap = !((iova & pgmask) || (int128_get64(llsize) & pgmask));
-- 
2.17.2




* [PATCH v4 06/14] vfio/common: Consolidate skip/invalid section into helper
  2023-03-07  2:02 [PATCH v4 00/14] vfio/migration: Device dirty page tracking Joao Martins
                   ` (4 preceding siblings ...)
  2023-03-07  2:02 ` [PATCH v4 05/14] vfio/common: Add helper to validate iova/end against hostwin Joao Martins
@ 2023-03-07  2:02 ` Joao Martins
  2023-03-07  9:13   ` Avihai Horon
  2023-03-07  2:02 ` [PATCH v4 07/14] vfio/common: Add helper to consolidate iova/end calculation Joao Martins
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 38+ messages in thread
From: Joao Martins @ 2023-03-07  2:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon,
	Joao Martins

The checks are replicated in region_add and region_del and will soon
be added to another memory listener dedicated to dirty tracking.

Move these into a new helper to avoid duplication.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
---
 hw/vfio/common.c | 52 +++++++++++++++++++-----------------------------
 1 file changed, 21 insertions(+), 31 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 99acb998eb14..54b4a4fc7daf 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -933,23 +933,14 @@ static bool vfio_known_safe_misalignment(MemoryRegionSection *section)
     return true;
 }
 
-static void vfio_listener_region_add(MemoryListener *listener,
-                                     MemoryRegionSection *section)
+static bool vfio_listener_valid_section(MemoryRegionSection *section)
 {
-    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
-    hwaddr iova, end;
-    Int128 llend, llsize;
-    void *vaddr;
-    int ret;
-    VFIOHostDMAWindow *hostwin;
-    Error *err = NULL;
-
     if (vfio_listener_skipped_section(section)) {
         trace_vfio_listener_region_add_skip(
                 section->offset_within_address_space,
                 section->offset_within_address_space +
                 int128_get64(int128_sub(section->size, int128_one())));
-        return;
+        return false;
     }
 
     if (unlikely((section->offset_within_address_space &
@@ -964,6 +955,24 @@ static void vfio_listener_region_add(MemoryListener *listener,
                          section->offset_within_region,
                          qemu_real_host_page_size());
         }
+        return false;
+    }
+
+    return true;
+}
+
+static void vfio_listener_region_add(MemoryListener *listener,
+                                     MemoryRegionSection *section)
+{
+    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
+    hwaddr iova, end;
+    Int128 llend, llsize;
+    void *vaddr;
+    int ret;
+    VFIOHostDMAWindow *hostwin;
+    Error *err = NULL;
+
+    if (!vfio_listener_valid_section(section)) {
         return;
     }
 
@@ -1182,26 +1191,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
     int ret;
     bool try_unmap = true;
 
-    if (vfio_listener_skipped_section(section)) {
-        trace_vfio_listener_region_del_skip(
-                section->offset_within_address_space,
-                section->offset_within_address_space +
-                int128_get64(int128_sub(section->size, int128_one())));
-        return;
-    }
-
-    if (unlikely((section->offset_within_address_space &
-                  ~qemu_real_host_page_mask()) !=
-                 (section->offset_within_region & ~qemu_real_host_page_mask()))) {
-        if (!vfio_known_safe_misalignment(section)) {
-            error_report("%s received unaligned region %s iova=0x%"PRIx64
-                         " offset_within_region=0x%"PRIx64
-                         " qemu_real_host_page_size=0x%"PRIxPTR,
-                         __func__, memory_region_name(section->mr),
-                         section->offset_within_address_space,
-                         section->offset_within_region,
-                         qemu_real_host_page_size());
-        }
+    if (!vfio_listener_valid_section(section)) {
         return;
     }
 
-- 
2.17.2




* [PATCH v4 07/14] vfio/common: Add helper to consolidate iova/end calculation
  2023-03-07  2:02 [PATCH v4 00/14] vfio/migration: Device dirty page tracking Joao Martins
                   ` (5 preceding siblings ...)
  2023-03-07  2:02 ` [PATCH v4 06/14] vfio/common: Consolidate skip/invalid section into helper Joao Martins
@ 2023-03-07  2:02 ` Joao Martins
  2023-03-07  2:40   ` Alex Williamson
  2023-03-07  9:52   ` Avihai Horon
  2023-03-07  2:02 ` [PATCH v4 08/14] vfio/common: Record DMA mapped IOVA ranges Joao Martins
                   ` (6 subsequent siblings)
  13 siblings, 2 replies; 38+ messages in thread
From: Joao Martins @ 2023-03-07  2:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon,
	Joao Martins

In preparation for use in device dirty tracking, move the code that
calculates an iova/end range from the container/section into a helper.
This avoids duplicating the common checks across listener callbacks.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 hw/vfio/common.c | 37 ++++++++++++++++++++++++++++++-------
 1 file changed, 30 insertions(+), 7 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 54b4a4fc7daf..3a6491dbc523 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -961,6 +961,35 @@ static bool vfio_listener_valid_section(MemoryRegionSection *section)
     return true;
 }
 
+/*
+ * Called for the dirty tracking memory listener to calculate the iova/end
+ * for a given memory region section.
+ */
+static bool vfio_get_section_iova_range(VFIOContainer *container,
+                                        MemoryRegionSection *section,
+                                        hwaddr *out_iova, hwaddr *out_end,
+                                        Int128 *out_llend)
+{
+    Int128 llend;
+    hwaddr iova;
+
+    iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space);
+    llend = int128_make64(section->offset_within_address_space);
+    llend = int128_add(llend, section->size);
+    llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask()));
+
+    if (int128_ge(int128_make64(iova), llend)) {
+        return false;
+    }
+
+    *out_iova = iova;
+    *out_end = int128_get64(int128_sub(llend, int128_one()));
+    if (out_llend) {
+        *out_llend = llend;
+    }
+    return true;
+}
+
 static void vfio_listener_region_add(MemoryListener *listener,
                                      MemoryRegionSection *section)
 {
@@ -976,12 +1005,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
         return;
     }
 
-    iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space);
-    llend = int128_make64(section->offset_within_address_space);
-    llend = int128_add(llend, section->size);
-    llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask()));
-
-    if (int128_ge(int128_make64(iova), llend)) {
+    if (!vfio_get_section_iova_range(container, section, &iova, &end, &llend)) {
         if (memory_region_is_ram_device(section->mr)) {
             trace_vfio_listener_region_add_no_dma_map(
                 memory_region_name(section->mr),
@@ -991,7 +1015,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
         }
         return;
     }
-    end = int128_get64(int128_sub(llend, int128_one()));
 
     if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
         hwaddr pgsize = 0;
-- 
2.17.2




* [PATCH v4 08/14] vfio/common: Record DMA mapped IOVA ranges
  2023-03-07  2:02 [PATCH v4 00/14] vfio/migration: Device dirty page tracking Joao Martins
                   ` (6 preceding siblings ...)
  2023-03-07  2:02 ` [PATCH v4 07/14] vfio/common: Add helper to consolidate iova/end calculation Joao Martins
@ 2023-03-07  2:02 ` Joao Martins
  2023-03-07  2:57   ` Alex Williamson
  2023-03-07  2:02 ` [PATCH v4 09/14] vfio/common: Add device dirty page tracking start/stop Joao Martins
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 38+ messages in thread
From: Joao Martins @ 2023-03-07  2:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon,
	Joao Martins

According to the device DMA logging uAPI, IOVA ranges to be logged by
the device must be provided all at once upon DMA logging start.

As preparation for the following patches which will add device dirty
page tracking, keep a record of all DMA mapped IOVA ranges so later they
can be used for DMA logging start.
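
As a worked example of how the recorded mappings collapse into the ranges
later passed to the device (the section addresses below are made up; this
mirrors the min/max logic added to vfio_dirty_tracking_update() in the
hunk below):

    #include <inttypes.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* Made-up DMA mapped sections: two below 4G, one above the hole. */
        struct { uint64_t iova, end; } sections[] = {
            { 0x0ULL,           0x7fffffffULL },    /* low RAM */
            { 0xfec00000ULL,    0xfecfffffULL },    /* 32-bit MMIO */
            { 0x10000000000ULL, 0x1007fffffffULL }, /* RAM above the hole */
        };
        uint64_t min32 = UINT32_MAX, max32 = 0;
        uint64_t min64 = UINT64_MAX, max64 = 0;

        for (size_t i = 0; i < sizeof(sections) / sizeof(sections[0]); i++) {
            if (sections[i].end <= UINT32_MAX - 1ULL) {
                min32 = sections[i].iova < min32 ? sections[i].iova : min32;
                max32 = sections[i].end > max32 ? sections[i].end : max32;
            } else {
                min64 = sections[i].iova < min64 ? sections[i].iova : min64;
                max64 = sections[i].end > max64 ? sections[i].end : max64;
            }
        }
        /* Prints 32-bit [0x0, 0xfecfffff] and 64-bit [0x10000000000, 0x1007fffffff] */
        printf("32-bit range: [0x%" PRIx64 ", 0x%" PRIx64 "]\n", min32, max32);
        printf("64-bit range: [0x%" PRIx64 ", 0x%" PRIx64 "]\n", min64, max64);
        return 0;
    }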

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 hw/vfio/common.c              | 76 +++++++++++++++++++++++++++++++++++
 hw/vfio/trace-events          |  1 +
 include/hw/vfio/vfio-common.h | 13 ++++++
 3 files changed, 90 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 3a6491dbc523..a9b1fc999121 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1334,11 +1334,87 @@ static int vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
     return ret;
 }
 
+static void vfio_dirty_tracking_update(MemoryListener *listener,
+                                       MemoryRegionSection *section)
+{
+    VFIODirtyRanges *dirty = container_of(listener, VFIODirtyRanges, listener);
+    VFIODirtyTrackingRange *range = &dirty->ranges;
+    hwaddr max32 = UINT32_MAX - 1ULL;
+    hwaddr iova, end;
+
+    if (!vfio_listener_valid_section(section) ||
+        !vfio_get_section_iova_range(dirty->container, section,
+                                     &iova, &end, NULL)) {
+        return;
+    }
+
+    /*
+     * The address space passed to the dirty tracker is reduced to two ranges:
+     * one for 32-bit DMA ranges, and another one for 64-bit DMA ranges.
+     * The underlying reports of dirty will query a sub-interval of each of
+     * these ranges.
+     *
+     * The purpose of the dual range handling is to handle known cases of big
+     * holes in the address space, like the x86 AMD 1T hole. The alternative
+     * would be an IOVATree but that has a much bigger runtime overhead and
+     * unnecessary complexity.
+     */
+    if (iova < max32 && end <= max32) {
+        if (range->min32 > iova) {
+            range->min32 = iova;
+        }
+        if (range->max32 < end) {
+            range->max32 = end;
+        }
+        trace_vfio_device_dirty_tracking_update(iova, end,
+                                    range->min32, range->max32);
+    } else {
+        if (!range->min64 || range->min64 > iova) {
+            range->min64 = iova;
+        }
+        if (range->max64 < end) {
+            range->max64 = end;
+        }
+        trace_vfio_device_dirty_tracking_update(iova, end,
+                                    range->min64, range->max64);
+    }
+
+    return;
+}
+
+static const MemoryListener vfio_dirty_tracking_listener = {
+    .name = "vfio-tracking",
+    .region_add = vfio_dirty_tracking_update,
+};
+
+static void vfio_dirty_tracking_init(VFIOContainer *container,
+                                     VFIODirtyRanges *dirty)
+{
+    memset(dirty, 0, sizeof(*dirty));
+    dirty->ranges.min32 = UINT32_MAX;
+    dirty->ranges.min64 = UINT64_MAX;
+    dirty->listener = vfio_dirty_tracking_listener;
+    dirty->container = container;
+
+    memory_listener_register(&dirty->listener,
+                             container->space->as);
+
+    /*
+     * The memory listener is synchronous, and used to calculate the range
+     * to dirty tracking. Unregister it after we are done as we are not
+     * interested in any follow-up updates.
+     */
+    memory_listener_unregister(&dirty->listener);
+}
+
 static void vfio_listener_log_global_start(MemoryListener *listener)
 {
     VFIOContainer *container = container_of(listener, VFIOContainer, listener);
+    VFIODirtyRanges dirty;
     int ret;
 
+    vfio_dirty_tracking_init(container, &dirty);
+
     ret = vfio_set_dirty_page_tracking(container, true);
     if (ret) {
         vfio_set_migration_error(ret);
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 669d9fe07cd9..d97a6de17921 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -104,6 +104,7 @@ vfio_known_safe_misalignment(const char *name, uint64_t iova, uint64_t offset_wi
 vfio_listener_region_add_no_dma_map(const char *name, uint64_t iova, uint64_t size, uint64_t page_size) "Region \"%s\" 0x%"PRIx64" size=0x%"PRIx64" is not aligned to 0x%"PRIx64" and cannot be mapped for DMA"
 vfio_listener_region_del_skip(uint64_t start, uint64_t end) "SKIPPING region_del 0x%"PRIx64" - 0x%"PRIx64
 vfio_listener_region_del(uint64_t start, uint64_t end) "region_del 0x%"PRIx64" - 0x%"PRIx64
+vfio_device_dirty_tracking_update(uint64_t start, uint64_t end, uint64_t min, uint64_t max) "section 0x%"PRIx64" - 0x%"PRIx64" -> update [0x%"PRIx64" - 0x%"PRIx64"]"
 vfio_disconnect_container(int fd) "close container->fd=%d"
 vfio_put_group(int fd) "close group->fd=%d"
 vfio_get_device(const char * name, unsigned int flags, unsigned int num_regions, unsigned int num_irqs) "Device %s flags: %u, regions: %u, irqs: %u"
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 87524c64a443..0f84136cceb5 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -96,6 +96,19 @@ typedef struct VFIOContainer {
     QLIST_ENTRY(VFIOContainer) next;
 } VFIOContainer;
 
+typedef struct VFIODirtyTrackingRange {
+    hwaddr min32;
+    hwaddr max32;
+    hwaddr min64;
+    hwaddr max64;
+} VFIODirtyTrackingRange;
+
+typedef struct VFIODirtyRanges {
+    VFIOContainer *container;
+    VFIODirtyTrackingRange ranges;
+    MemoryListener listener;
+} VFIODirtyRanges;
+
 typedef struct VFIOGuestIOMMU {
     VFIOContainer *container;
     IOMMUMemoryRegion *iommu_mr;
-- 
2.17.2




* [PATCH v4 09/14] vfio/common: Add device dirty page tracking start/stop
  2023-03-07  2:02 [PATCH v4 00/14] vfio/migration: Device dirty page tracking Joao Martins
                   ` (7 preceding siblings ...)
  2023-03-07  2:02 ` [PATCH v4 08/14] vfio/common: Record DMA mapped IOVA ranges Joao Martins
@ 2023-03-07  2:02 ` Joao Martins
  2023-03-07 10:14   ` Avihai Horon
  2023-03-07  2:02 ` [PATCH v4 10/14] vfio/common: Extract code from vfio_get_dirty_bitmap() to new function Joao Martins
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 38+ messages in thread
From: Joao Martins @ 2023-03-07  2:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon,
	Joao Martins

Add device dirty page tracking start/stop functionality. This uses the
device DMA logging uAPI to start and stop dirty page tracking by device.

Device dirty page tracking is used only if all devices within a
container support device dirty page tracking.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 hw/vfio/common.c              | 175 +++++++++++++++++++++++++++++++++-
 hw/vfio/trace-events          |   1 +
 include/hw/vfio/vfio-common.h |   2 +
 3 files changed, 173 insertions(+), 5 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index a9b1fc999121..a42f5f1e7ffe 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -450,6 +450,22 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
     return true;
 }
 
+static bool vfio_devices_all_device_dirty_tracking(VFIOContainer *container)
+{
+    VFIOGroup *group;
+    VFIODevice *vbasedev;
+
+    QLIST_FOREACH(group, &container->group_list, container_next) {
+        QLIST_FOREACH(vbasedev, &group->device_list, next) {
+            if (!vbasedev->dirty_pages_supported) {
+                return false;
+            }
+        }
+    }
+
+    return true;
+}
+
 /*
  * Check if all VFIO devices are running and migration is active, which is
  * essentially equivalent to the migration being in pre-copy phase.
@@ -1407,16 +1423,158 @@ static void vfio_dirty_tracking_init(VFIOContainer *container,
     memory_listener_unregister(&dirty->listener);
 }
 
+static void vfio_devices_dma_logging_stop(VFIOContainer *container)
+{
+    uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature),
+                              sizeof(uint64_t))] = {};
+    struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
+    VFIODevice *vbasedev;
+    VFIOGroup *group;
+    int ret = 0;
+
+    feature->argsz = sizeof(buf);
+    feature->flags = VFIO_DEVICE_FEATURE_SET |
+                     VFIO_DEVICE_FEATURE_DMA_LOGGING_STOP;
+
+    QLIST_FOREACH(group, &container->group_list, container_next) {
+        QLIST_FOREACH(vbasedev, &group->device_list, next) {
+            if (!vbasedev->dirty_tracking) {
+                continue;
+            }
+
+            ret = ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature);
+            if (ret) {
+                warn_report("%s: Failed to stop DMA logging, err %d (%s)",
+                             vbasedev->name, ret, strerror(errno));
+            }
+            vbasedev->dirty_tracking = false;
+        }
+    }
+}
+
+static struct vfio_device_feature *
+vfio_device_feature_dma_logging_start_create(VFIOContainer *container,
+                                             VFIODirtyTrackingRange *tracking)
+{
+    struct vfio_device_feature *feature;
+    size_t feature_size;
+    struct vfio_device_feature_dma_logging_control *control;
+    struct vfio_device_feature_dma_logging_range *ranges;
+
+    feature_size = sizeof(struct vfio_device_feature) +
+                   sizeof(struct vfio_device_feature_dma_logging_control);
+    feature = g_try_malloc0(feature_size);
+    if (!feature) {
+        errno = ENOMEM;
+        return NULL;
+    }
+    feature->argsz = feature_size;
+    feature->flags = VFIO_DEVICE_FEATURE_SET |
+                     VFIO_DEVICE_FEATURE_DMA_LOGGING_START;
+
+    control = (struct vfio_device_feature_dma_logging_control *)feature->data;
+    control->page_size = qemu_real_host_page_size();
+
+    /*
+     * DMA logging uAPI guarantees to support at least a number of ranges that
+     * fits into a single host kernel base page.
+     */
+    control->num_ranges = !!tracking->max32 + !!tracking->max64;
+    ranges = g_try_new0(struct vfio_device_feature_dma_logging_range,
+                        control->num_ranges);
+    if (!ranges) {
+        g_free(feature);
+        errno = ENOMEM;
+
+        return NULL;
+    }
+
+    control->ranges = (__u64)(uintptr_t)ranges;
+    if (tracking->max32) {
+        ranges->iova = tracking->min32;
+        ranges->length = (tracking->max32 - tracking->min32) + 1;
+        ranges++;
+    }
+    if (tracking->max64) {
+        ranges->iova = tracking->min64;
+        ranges->length = (tracking->max64 - tracking->min64) + 1;
+    }
+
+    trace_vfio_device_dirty_tracking_start(control->num_ranges,
+                                           tracking->min32, tracking->max32,
+                                           tracking->min64, tracking->max64);
+
+    return feature;
+}
+
+static void vfio_device_feature_dma_logging_start_destroy(
+    struct vfio_device_feature *feature)
+{
+    struct vfio_device_feature_dma_logging_control *control =
+        (struct vfio_device_feature_dma_logging_control *)feature->data;
+    struct vfio_device_feature_dma_logging_range *ranges =
+        (struct vfio_device_feature_dma_logging_range *)(uintptr_t) control->ranges;
+
+    g_free(ranges);
+    g_free(feature);
+}
+
+static int vfio_devices_dma_logging_start(VFIOContainer *container)
+{
+    struct vfio_device_feature *feature;
+    VFIODirtyRanges dirty;
+    VFIODevice *vbasedev;
+    VFIOGroup *group;
+    int ret = 0;
+
+    vfio_dirty_tracking_init(container, &dirty);
+    feature = vfio_device_feature_dma_logging_start_create(container,
+                                                           &dirty.ranges);
+    if (!feature) {
+        return -errno;
+    }
+
+    QLIST_FOREACH(group, &container->group_list, container_next) {
+        QLIST_FOREACH(vbasedev, &group->device_list, next) {
+            if (vbasedev->dirty_tracking) {
+                continue;
+            }
+
+            ret = ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature);
+            if (ret) {
+                ret = -errno;
+                error_report("%s: Failed to start DMA logging, err %d (%s)",
+                             vbasedev->name, ret, strerror(errno));
+                goto out;
+            }
+            vbasedev->dirty_tracking = true;
+        }
+    }
+
+out:
+    if (ret) {
+        vfio_devices_dma_logging_stop(container);
+    }
+
+    vfio_device_feature_dma_logging_start_destroy(feature);
+
+    return ret;
+}
+
 static void vfio_listener_log_global_start(MemoryListener *listener)
 {
     VFIOContainer *container = container_of(listener, VFIOContainer, listener);
-    VFIODirtyRanges dirty;
     int ret;
 
-    vfio_dirty_tracking_init(container, &dirty);
+    if (vfio_devices_all_device_dirty_tracking(container)) {
+        ret = vfio_devices_dma_logging_start(container);
+    } else {
+        ret = vfio_set_dirty_page_tracking(container, true);
+    }
 
-    ret = vfio_set_dirty_page_tracking(container, true);
     if (ret) {
+        error_report("vfio: Could not start dirty page tracking, err: %d (%s)",
+                     ret, strerror(-ret));
         vfio_set_migration_error(ret);
     }
 }
@@ -1424,10 +1582,17 @@ static void vfio_listener_log_global_start(MemoryListener *listener)
 static void vfio_listener_log_global_stop(MemoryListener *listener)
 {
     VFIOContainer *container = container_of(listener, VFIOContainer, listener);
-    int ret;
+    int ret = 0;
+
+    if (vfio_devices_all_device_dirty_tracking(container)) {
+        vfio_devices_dma_logging_stop(container);
+    } else {
+        ret = vfio_set_dirty_page_tracking(container, false);
+    }
 
-    ret = vfio_set_dirty_page_tracking(container, false);
     if (ret) {
+        error_report("vfio: Could not stop dirty page tracking, err: %d (%s)",
+                     ret, strerror(-ret));
         vfio_set_migration_error(ret);
     }
 }
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index d97a6de17921..7a7e0cfe5b23 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -105,6 +105,7 @@ vfio_listener_region_add_no_dma_map(const char *name, uint64_t iova, uint64_t si
 vfio_listener_region_del_skip(uint64_t start, uint64_t end) "SKIPPING region_del 0x%"PRIx64" - 0x%"PRIx64
 vfio_listener_region_del(uint64_t start, uint64_t end) "region_del 0x%"PRIx64" - 0x%"PRIx64
 vfio_device_dirty_tracking_update(uint64_t start, uint64_t end, uint64_t min, uint64_t max) "section 0x%"PRIx64" - 0x%"PRIx64" -> update [0x%"PRIx64" - 0x%"PRIx64"]"
+vfio_device_dirty_tracking_start(int nr_ranges, uint64_t min32, uint64_t max32, uint64_t min64, uint64_t max64) "nr_ranges %d 32:[0x%"PRIx64" - 0x%"PRIx64"], 64:[0x%"PRIx64" - 0x%"PRIx64"]"
 vfio_disconnect_container(int fd) "close container->fd=%d"
 vfio_put_group(int fd) "close group->fd=%d"
 vfio_get_device(const char * name, unsigned int flags, unsigned int num_regions, unsigned int num_irqs) "Device %s flags: %u, regions: %u, irqs: %u"
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 0f84136cceb5..7817ca7d8706 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -156,6 +156,8 @@ typedef struct VFIODevice {
     VFIOMigration *migration;
     Error *migration_blocker;
     OnOffAuto pre_copy_dirty_page_tracking;
+    bool dirty_pages_supported;
+    bool dirty_tracking;
 } VFIODevice;
 
 struct VFIODeviceOps {
-- 
2.17.2




* [PATCH v4 10/14] vfio/common: Extract code from vfio_get_dirty_bitmap() to new function
  2023-03-07  2:02 [PATCH v4 00/14] vfio/migration: Device dirty page tracking Joao Martins
                   ` (8 preceding siblings ...)
  2023-03-07  2:02 ` [PATCH v4 09/14] vfio/common: Add device dirty page tracking start/stop Joao Martins
@ 2023-03-07  2:02 ` Joao Martins
  2023-03-07  2:02 ` [PATCH v4 11/14] vfio/common: Add device dirty page bitmap sync Joao Martins
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 38+ messages in thread
From: Joao Martins @ 2023-03-07  2:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon,
	Joao Martins

From: Avihai Horon <avihaih@nvidia.com>

Extract the VFIO_IOMMU_DIRTY_PAGES ioctl code in vfio_get_dirty_bitmap()
to its own function.

This will make the code more readable after the next patch adds device
dirty page bitmap sync functionality.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
---
 hw/vfio/common.c | 57 +++++++++++++++++++++++++++++-------------------
 1 file changed, 35 insertions(+), 22 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index a42f5f1e7ffe..136665ca2c4e 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1597,26 +1597,13 @@ static void vfio_listener_log_global_stop(MemoryListener *listener)
     }
 }
 
-static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
-                                 uint64_t size, ram_addr_t ram_addr)
+static int vfio_query_dirty_bitmap(VFIOContainer *container, VFIOBitmap *vbmap,
+                                   hwaddr iova, hwaddr size)
 {
     struct vfio_iommu_type1_dirty_bitmap *dbitmap;
     struct vfio_iommu_type1_dirty_bitmap_get *range;
-    VFIOBitmap vbmap;
     int ret;
 
-    if (!container->dirty_pages_supported) {
-        cpu_physical_memory_set_dirty_range(ram_addr, size,
-                                            tcg_enabled() ? DIRTY_CLIENTS_ALL :
-                                            DIRTY_CLIENTS_NOCODE);
-        return 0;
-    }
-
-    ret = vfio_bitmap_alloc(&vbmap, size);
-    if (ret) {
-        return ret;
-    }
-
     dbitmap = g_malloc0(sizeof(*dbitmap) + sizeof(*range));
 
     dbitmap->argsz = sizeof(*dbitmap) + sizeof(*range);
@@ -1631,8 +1618,8 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
      * to qemu_real_host_page_size.
      */
     range->bitmap.pgsize = qemu_real_host_page_size();
-    range->bitmap.size = vbmap.size;
-    range->bitmap.data = (__u64 *)vbmap.bitmap;
+    range->bitmap.size = vbmap->size;
+    range->bitmap.data = (__u64 *)vbmap->bitmap;
 
     ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, dbitmap);
     if (ret) {
@@ -1640,16 +1627,42 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
         error_report("Failed to get dirty bitmap for iova: 0x%"PRIx64
                 " size: 0x%"PRIx64" err: %d", (uint64_t)range->iova,
                 (uint64_t)range->size, errno);
-        goto err_out;
+    }
+
+    g_free(dbitmap);
+
+    return ret;
+}
+
+static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
+                                 uint64_t size, ram_addr_t ram_addr)
+{
+    VFIOBitmap vbmap;
+    int ret;
+
+    if (!container->dirty_pages_supported) {
+        cpu_physical_memory_set_dirty_range(ram_addr, size,
+                                            tcg_enabled() ? DIRTY_CLIENTS_ALL :
+                                            DIRTY_CLIENTS_NOCODE);
+        return 0;
+    }
+
+    ret = vfio_bitmap_alloc(&vbmap, size);
+    if (ret) {
+        return ret;
+    }
+
+    ret = vfio_query_dirty_bitmap(container, &vbmap, iova, size);
+    if (ret) {
+        goto out;
     }
 
     cpu_physical_memory_set_dirty_lebitmap(vbmap.bitmap, ram_addr,
                                            vbmap.pages);
 
-    trace_vfio_get_dirty_bitmap(container->fd, range->iova, range->size,
-                                range->bitmap.size, ram_addr);
-err_out:
-    g_free(dbitmap);
+    trace_vfio_get_dirty_bitmap(container->fd, iova, size, vbmap.size,
+                                ram_addr);
+out:
     g_free(vbmap.bitmap);
 
     return ret;
-- 
2.17.2



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v4 11/14] vfio/common: Add device dirty page bitmap sync
  2023-03-07  2:02 [PATCH v4 00/14] vfio/migration: Device dirty page tracking Joao Martins
                   ` (9 preceding siblings ...)
  2023-03-07  2:02 ` [PATCH v4 10/14] vfio/common: Extract code from vfio_get_dirty_bitmap() to new function Joao Martins
@ 2023-03-07  2:02 ` Joao Martins
  2023-03-07  2:02 ` [PATCH v4 12/14] vfio/migration: Block migration with vIOMMU Joao Martins
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 38+ messages in thread
From: Joao Martins @ 2023-03-07  2:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon,
	Joao Martins

Add device dirty page bitmap sync functionality. This uses the device
DMA logging uAPI to sync the dirty page bitmap from the device.

Device dirty page bitmap sync is used only if all devices within a
container support device dirty page tracking.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 hw/vfio/common.c | 88 +++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 79 insertions(+), 9 deletions(-)
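
A note for readers of the report path below: the bitmap returned by
VFIO_DEVICE_FEATURE_DMA_LOGGING_REPORT carries one bit per page of the
requested page_size, packed into little-endian 64-bit words, where a set
bit means the device wrote to that page since logging started. The sketch
below is illustration only (mark_page_dirty() is a hypothetical helper,
not part of this patch); cpu_physical_memory_set_dirty_lebitmap() does
the equivalent conversion inside QEMU.

#include <stdint.h>
#include <endian.h>

/* Bit i of the report covers the page at iova + i * page_size. */
static void walk_dma_logging_bitmap(const uint64_t *bitmap, uint64_t iova,
                                    uint64_t npages, uint64_t page_size)
{
    for (uint64_t i = 0; i < npages; i++) {
        uint64_t word = le64toh(bitmap[i / 64]);    /* words are little-endian */

        if (word & (1ULL << (i % 64))) {
            mark_page_dirty(iova + i * page_size);  /* hypothetical helper */
        }
    }
}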

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 136665ca2c4e..75b4902bbcc9 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -339,6 +339,9 @@ static int vfio_bitmap_alloc(VFIOBitmap *vbmap, hwaddr size)
     return 0;
 }
 
+static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
+                                 uint64_t size, ram_addr_t ram_addr);
+
 bool vfio_mig_active(void)
 {
     VFIOGroup *group;
@@ -562,10 +565,16 @@ static int vfio_dma_unmap(VFIOContainer *container,
         .iova = iova,
         .size = size,
     };
+    bool need_dirty_sync = false;
+    int ret;
+
+    if (iotlb && vfio_devices_all_running_and_mig_active(container)) {
+        if (!vfio_devices_all_device_dirty_tracking(container) &&
+            container->dirty_pages_supported) {
+            return vfio_dma_unmap_bitmap(container, iova, size, iotlb);
+        }
 
-    if (iotlb && container->dirty_pages_supported &&
-        vfio_devices_all_running_and_mig_active(container)) {
-        return vfio_dma_unmap_bitmap(container, iova, size, iotlb);
+        need_dirty_sync = true;
     }
 
     while (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
@@ -591,10 +600,12 @@ static int vfio_dma_unmap(VFIOContainer *container,
         return -errno;
     }
 
-    if (iotlb && vfio_devices_all_running_and_mig_active(container)) {
-        cpu_physical_memory_set_dirty_range(iotlb->translated_addr, size,
-                                            tcg_enabled() ? DIRTY_CLIENTS_ALL :
-                                            DIRTY_CLIENTS_NOCODE);
+    if (need_dirty_sync) {
+        ret = vfio_get_dirty_bitmap(container, iova, size,
+                                    iotlb->translated_addr);
+        if (ret) {
+            return ret;
+        }
     }
 
     return 0;
@@ -1597,6 +1608,58 @@ static void vfio_listener_log_global_stop(MemoryListener *listener)
     }
 }
 
+static int vfio_device_dma_logging_report(VFIODevice *vbasedev, hwaddr iova,
+                                          hwaddr size, void *bitmap)
+{
+    uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
+                        sizeof(struct vfio_device_feature_dma_logging_report),
+                        sizeof(__u64))] = {};
+    struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
+    struct vfio_device_feature_dma_logging_report *report =
+        (struct vfio_device_feature_dma_logging_report *)feature->data;
+
+    report->iova = iova;
+    report->length = size;
+    report->page_size = qemu_real_host_page_size();
+    report->bitmap = (__u64)(uintptr_t)bitmap;
+
+    feature->argsz = sizeof(buf);
+    feature->flags = VFIO_DEVICE_FEATURE_GET |
+                     VFIO_DEVICE_FEATURE_DMA_LOGGING_REPORT;
+
+    if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
+        return -errno;
+    }
+
+    return 0;
+}
+
+static int vfio_devices_query_dirty_bitmap(VFIOContainer *container,
+                                           VFIOBitmap *vbmap, hwaddr iova,
+                                           hwaddr size)
+{
+    VFIODevice *vbasedev;
+    VFIOGroup *group;
+    int ret;
+
+    QLIST_FOREACH(group, &container->group_list, container_next) {
+        QLIST_FOREACH(vbasedev, &group->device_list, next) {
+            ret = vfio_device_dma_logging_report(vbasedev, iova, size,
+                                                 vbmap->bitmap);
+            if (ret) {
+                error_report("%s: Failed to get DMA logging report, iova: "
+                             "0x%" HWADDR_PRIx ", size: 0x%" HWADDR_PRIx
+                             ", err: %d (%s)",
+                             vbasedev->name, iova, size, ret, strerror(-ret));
+
+                return ret;
+            }
+        }
+    }
+
+    return 0;
+}
+
 static int vfio_query_dirty_bitmap(VFIOContainer *container, VFIOBitmap *vbmap,
                                    hwaddr iova, hwaddr size)
 {
@@ -1637,10 +1700,12 @@ static int vfio_query_dirty_bitmap(VFIOContainer *container, VFIOBitmap *vbmap,
 static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
                                  uint64_t size, ram_addr_t ram_addr)
 {
+    bool all_device_dirty_tracking =
+        vfio_devices_all_device_dirty_tracking(container);
     VFIOBitmap vbmap;
     int ret;
 
-    if (!container->dirty_pages_supported) {
+    if (!container->dirty_pages_supported && !all_device_dirty_tracking) {
         cpu_physical_memory_set_dirty_range(ram_addr, size,
                                             tcg_enabled() ? DIRTY_CLIENTS_ALL :
                                             DIRTY_CLIENTS_NOCODE);
@@ -1652,7 +1717,12 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
         return ret;
     }
 
-    ret = vfio_query_dirty_bitmap(container, &vbmap, iova, size);
+    if (all_device_dirty_tracking) {
+        ret = vfio_devices_query_dirty_bitmap(container, &vbmap, iova, size);
+    } else {
+        ret = vfio_query_dirty_bitmap(container, &vbmap, iova, size);
+    }
+
     if (ret) {
         goto out;
     }
-- 
2.17.2



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v4 12/14] vfio/migration: Block migration with vIOMMU
  2023-03-07  2:02 [PATCH v4 00/14] vfio/migration: Device dirty page tracking Joao Martins
                   ` (10 preceding siblings ...)
  2023-03-07  2:02 ` [PATCH v4 11/14] vfio/common: Add device dirty page bitmap sync Joao Martins
@ 2023-03-07  2:02 ` Joao Martins
  2023-03-07 10:22   ` Cédric Le Goater
  2023-03-07  2:02 ` [PATCH v4 13/14] vfio/migration: Query device dirty page tracking support Joao Martins
  2023-03-07  2:02 ` [PATCH v4 14/14] docs/devel: Document VFIO device dirty page tracking Joao Martins
  13 siblings, 1 reply; 38+ messages in thread
From: Joao Martins @ 2023-03-07  2:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon,
	Joao Martins

Migrating with a vIOMMU will require either tracking the maximum
IOMMU-supported address space (e.g. the 39/48-bit address width on
Intel) or range-tracking the current mappings and dirty-tracking the
new ones made after dirty tracking starts. This will be done in a
separate series, so add a live migration blocker until then.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 hw/vfio/common.c              | 46 +++++++++++++++++++++++++++++++++++
 hw/vfio/migration.c           |  5 ++++
 hw/vfio/pci.c                 |  1 +
 include/hw/vfio/vfio-common.h |  2 ++
 4 files changed, 54 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 75b4902bbcc9..7278baa82f7d 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -362,6 +362,7 @@ bool vfio_mig_active(void)
 }
 
 static Error *multiple_devices_migration_blocker;
+static Error *giommu_migration_blocker;
 
 static unsigned int vfio_migratable_device_num(void)
 {
@@ -413,6 +414,51 @@ void vfio_unblock_multiple_devices_migration(void)
     multiple_devices_migration_blocker = NULL;
 }
 
+static bool vfio_viommu_preset(void)
+{
+    VFIOAddressSpace *space;
+
+    QLIST_FOREACH(space, &vfio_address_spaces, list) {
+        if (space->as != &address_space_memory) {
+            return true;
+        }
+    }
+
+    return false;
+}
+
+int vfio_block_giommu_migration(Error **errp)
+{
+    int ret;
+
+    if (giommu_migration_blocker ||
+        !vfio_viommu_preset()) {
+        return 0;
+    }
+
+    error_setg(&giommu_migration_blocker,
+               "Migration is currently not supported with vIOMMU enabled");
+    ret = migrate_add_blocker(giommu_migration_blocker, errp);
+    if (ret < 0) {
+        error_free(giommu_migration_blocker);
+        giommu_migration_blocker = NULL;
+    }
+
+    return ret;
+}
+
+void vfio_unblock_giommu_migration(void)
+{
+    if (!giommu_migration_blocker ||
+        vfio_viommu_preset()) {
+        return;
+    }
+
+    migrate_del_blocker(giommu_migration_blocker);
+    error_free(giommu_migration_blocker);
+    giommu_migration_blocker = NULL;
+}
+
 static void vfio_set_migration_error(int err)
 {
     MigrationState *ms = migrate_get_current();
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index a2c3d9bade7f..776fd2d7cdf3 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -634,6 +634,11 @@ int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
         return ret;
     }
 
+    ret = vfio_block_giommu_migration(errp);
+    if (ret) {
+        return ret;
+    }
+
     trace_vfio_migration_probe(vbasedev->name);
     return 0;
 
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 939dcc3d4a9e..30a271eab38c 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3185,6 +3185,7 @@ static void vfio_instance_finalize(Object *obj)
      */
     vfio_put_device(vdev);
     vfio_put_group(group);
+    vfio_unblock_giommu_migration();
 }
 
 static void vfio_exitfn(PCIDevice *pdev)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 7817ca7d8706..63f93ab54811 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -235,6 +235,8 @@ extern VFIOGroupList vfio_group_list;
 bool vfio_mig_active(void);
 int vfio_block_multiple_devices_migration(Error **errp);
 void vfio_unblock_multiple_devices_migration(void);
+int vfio_block_giommu_migration(Error **errp);
+void vfio_unblock_giommu_migration(void);
 int64_t vfio_mig_bytes_transferred(void);
 
 #ifdef CONFIG_LINUX
-- 
2.17.2



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v4 13/14] vfio/migration: Query device dirty page tracking support
  2023-03-07  2:02 [PATCH v4 00/14] vfio/migration: Device dirty page tracking Joao Martins
                   ` (11 preceding siblings ...)
  2023-03-07  2:02 ` [PATCH v4 12/14] vfio/migration: Block migration with vIOMMU Joao Martins
@ 2023-03-07  2:02 ` Joao Martins
  2023-03-07  2:02 ` [PATCH v4 14/14] docs/devel: Document VFIO device dirty page tracking Joao Martins
  13 siblings, 0 replies; 38+ messages in thread
From: Joao Martins @ 2023-03-07  2:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon,
	Joao Martins

Now that everything has been set up for device dirty page tracking,
query the device for its dirty page tracking support.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
---
 hw/vfio/migration.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)
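
For context, the per-device dirty_pages_supported flag set below is what
later gates the container-level choice between device dirty tracking and
the VFIO IOMMU path. A simplified sketch of that kind of container-wide
check is shown here for orientation only; the series'
vfio_devices_all_device_dirty_tracking(), added in an earlier patch, is
the authoritative version.

/*
 * Simplified illustration: device dirty tracking is used for a container
 * only if every device in it reported support via the probe below.
 */
static bool all_devices_support_dma_logging(VFIOContainer *container)
{
    VFIOGroup *group;
    VFIODevice *vbasedev;

    QLIST_FOREACH(group, &container->group_list, container_next) {
        QLIST_FOREACH(vbasedev, &group->device_list, next) {
            if (!vbasedev->dirty_pages_supported) {
                return false;
            }
        }
    }

    return true;
}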

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 776fd2d7cdf3..127a44ccaf19 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -555,6 +555,19 @@ static int vfio_migration_query_flags(VFIODevice *vbasedev, uint64_t *mig_flags)
     return 0;
 }
 
+static bool vfio_dma_logging_supported(VFIODevice *vbasedev)
+{
+    uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature),
+                              sizeof(uint64_t))] = {};
+    struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
+
+    feature->argsz = sizeof(buf);
+    feature->flags =
+        VFIO_DEVICE_FEATURE_PROBE | VFIO_DEVICE_FEATURE_DMA_LOGGING_START;
+
+    return !ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature);
+}
+
 static int vfio_migration_init(VFIODevice *vbasedev)
 {
     int ret;
@@ -589,6 +602,8 @@ static int vfio_migration_init(VFIODevice *vbasedev)
     migration->device_state = VFIO_DEVICE_STATE_RUNNING;
     migration->data_fd = -1;
 
+    vbasedev->dirty_pages_supported = vfio_dma_logging_supported(vbasedev);
+
     oid = vmstate_if_get_id(VMSTATE_IF(DEVICE(obj)));
     if (oid) {
         path = g_strdup_printf("%s/vfio", oid);
-- 
2.17.2



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v4 14/14] docs/devel: Document VFIO device dirty page tracking
  2023-03-07  2:02 [PATCH v4 00/14] vfio/migration: Device dirty page tracking Joao Martins
                   ` (12 preceding siblings ...)
  2023-03-07  2:02 ` [PATCH v4 13/14] vfio/migration: Query device dirty page tracking support Joao Martins
@ 2023-03-07  2:02 ` Joao Martins
  13 siblings, 0 replies; 38+ messages in thread
From: Joao Martins @ 2023-03-07  2:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon,
	Joao Martins

From: Avihai Horon <avihaih@nvidia.com>

Adjust the VFIO dirty page tracking documentation and add a section to
describe device dirty page tracking.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
---
 docs/devel/vfio-migration.rst | 46 +++++++++++++++++++++++------------
 1 file changed, 31 insertions(+), 15 deletions(-)
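
The documentation below points readers at the DMA logging uAPI structures
in linux-headers/linux/vfio.h. For orientation, here is a minimal
userspace sketch of starting device dirty tracking over a single IOVA
range, assuming the structure layout shipped with this series (error
paths trimmed); stopping uses the same VFIO_DEVICE_FEATURE ioctl with
VFIO_DEVICE_FEATURE_DMA_LOGGING_STOP and no payload.

#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Sketch: start DMA logging on one IOVA range for an open device fd. */
static int dma_logging_start_one_range(int device_fd, uint64_t iova,
                                       uint64_t length, uint64_t page_size)
{
    struct vfio_device_feature_dma_logging_range range = {
        .iova = iova,
        .length = length,
    };
    struct vfio_device_feature_dma_logging_control *control;
    struct vfio_device_feature *feature;
    size_t argsz = sizeof(*feature) + sizeof(*control);
    int ret;

    feature = calloc(1, argsz);
    if (!feature) {
        return -ENOMEM;
    }

    feature->argsz = argsz;
    feature->flags = VFIO_DEVICE_FEATURE_SET |
                     VFIO_DEVICE_FEATURE_DMA_LOGGING_START;

    control = (struct vfio_device_feature_dma_logging_control *)feature->data;
    control->page_size = page_size;
    control->num_ranges = 1;
    control->ranges = (uintptr_t)&range;

    ret = ioctl(device_fd, VFIO_DEVICE_FEATURE, feature) ? -errno : 0;
    free(feature);

    return ret;
}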

diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
index c214c73e2818..1b68ccf11529 100644
--- a/docs/devel/vfio-migration.rst
+++ b/docs/devel/vfio-migration.rst
@@ -59,22 +59,37 @@ System memory dirty pages tracking
 ----------------------------------
 
 A ``log_global_start`` and ``log_global_stop`` memory listener callback informs
-the VFIO IOMMU module to start and stop dirty page tracking. A ``log_sync``
-memory listener callback marks those system memory pages as dirty which are
-used for DMA by the VFIO device. The dirty pages bitmap is queried per
-container. All pages pinned by the vendor driver through external APIs have to
-be marked as dirty during migration. When there are CPU writes, CPU dirty page
-tracking can identify dirtied pages, but any page pinned by the vendor driver
-can also be written by the device. There is currently no device or IOMMU
-support for dirty page tracking in hardware.
+the VFIO dirty tracking module to start and stop dirty page tracking. A
+``log_sync`` memory listener callback queries the dirty page bitmap from the
+dirty tracking module and marks system memory pages which were DMA-ed by the
+VFIO device as dirty. The dirty page bitmap is queried per container.
+
+Currently there are two ways dirty page tracking can be done:
+(1) Device dirty tracking:
+In this method the device is responsible to log and report its DMAs. This
+method can be used only if the device is capable of tracking its DMAs.
+Discovering device capability, starting and stopping dirty tracking, and
+syncing the dirty bitmaps from the device are done using the DMA logging uAPI.
+More info about the uAPI can be found in the comments of the
+``vfio_device_feature_dma_logging_control`` and
+``vfio_device_feature_dma_logging_report`` structures in the header file
+linux-headers/linux/vfio.h.
+
+(2) VFIO IOMMU module:
+In this method dirty tracking is done by IOMMU. However, there is currently no
+IOMMU support for dirty page tracking. For this reason, all pages are
+perpetually marked dirty, unless the device driver pins pages through external
+APIs in which case only those pinned pages are perpetually marked dirty.
+
+If the above two methods are not supported, all pages are perpetually marked
+dirty by QEMU.
 
 By default, dirty pages are tracked during pre-copy as well as stop-and-copy
-phase. So, a page pinned by the vendor driver will be copied to the destination
-in both phases. Copying dirty pages in pre-copy phase helps QEMU to predict if
-it can achieve its downtime tolerances. If QEMU during pre-copy phase keeps
-finding dirty pages continuously, then it understands that even in stop-and-copy
-phase, it is likely to find dirty pages and can predict the downtime
-accordingly.
+phase. So, a page marked as dirty will be copied to the destination in both
+phases. Copying dirty pages in pre-copy phase helps QEMU to predict if it can
+achieve its downtime tolerances. If QEMU during pre-copy phase keeps finding
+dirty pages continuously, then it understands that even in stop-and-copy phase,
+it is likely to find dirty pages and can predict the downtime accordingly.
 
 QEMU also provides a per device opt-out option ``pre-copy-dirty-page-tracking``
 which disables querying the dirty bitmap during pre-copy phase. If it is set to
@@ -89,7 +104,8 @@ phase of migration. In that case, the unmap ioctl returns any dirty pages in
 that range and QEMU reports corresponding guest physical pages dirty. During
 stop-and-copy phase, an IOMMU notifier is used to get a callback for mapped
 pages and then dirty pages bitmap is fetched from VFIO IOMMU modules for those
-mapped ranges.
+mapped ranges. If device dirty tracking is enabled with vIOMMU, live migration
+will be blocked.
 
 Flow of state changes during Live migration
 ===========================================
-- 
2.17.2



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 07/14] vfio/common: Add helper to consolidate iova/end calculation
  2023-03-07  2:02 ` [PATCH v4 07/14] vfio/common: Add helper to consolidate iova/end calculation Joao Martins
@ 2023-03-07  2:40   ` Alex Williamson
  2023-03-07 10:11     ` Joao Martins
  2023-03-07  9:52   ` Avihai Horon
  1 sibling, 1 reply; 38+ messages in thread
From: Alex Williamson @ 2023-03-07  2:40 UTC (permalink / raw)
  To: Joao Martins
  Cc: qemu-devel, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon

On Tue,  7 Mar 2023 02:02:51 +0000
Joao Martins <joao.m.martins@oracle.com> wrote:

> In preparation to be used in device dirty tracking, move the code that
> calculates a iova/end range from the container/section.  This avoids
> duplication on the common checks across listener callbacks.
> 
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>  hw/vfio/common.c | 37 ++++++++++++++++++++++++++++++-------
>  1 file changed, 30 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 54b4a4fc7daf..3a6491dbc523 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -961,6 +961,35 @@ static bool vfio_listener_valid_section(MemoryRegionSection *section)
>      return true;
>  }
>  
> +/*
> + * Called for the dirty tracking memory listener to calculate the iova/end
> + * for a given memory region section.
> + */
> +static bool vfio_get_section_iova_range(VFIOContainer *container,
> +                                        MemoryRegionSection *section,
> +                                        hwaddr *out_iova, hwaddr *out_end,
> +                                        Int128 *out_llend)
> +{
> +    Int128 llend;
> +    hwaddr iova;
> +
> +    iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space);
> +    llend = int128_make64(section->offset_within_address_space);
> +    llend = int128_add(llend, section->size);
> +    llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask()));
> +
> +    if (int128_ge(int128_make64(iova), llend)) {
> +        return false;
> +    }
> +
> +    *out_iova = iova;
> +    *out_end = int128_get64(int128_sub(llend, int128_one()));
> +    if (out_llend) {
> +        *out_llend = llend;
> +    }
> +    return true;
> +}
> +
>  static void vfio_listener_region_add(MemoryListener *listener,
>                                       MemoryRegionSection *section)
>  {
> @@ -976,12 +1005,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>          return;
>      }
>  
> -    iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space);
> -    llend = int128_make64(section->offset_within_address_space);
> -    llend = int128_add(llend, section->size);
> -    llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask()));
> -
> -    if (int128_ge(int128_make64(iova), llend)) {
> +    if (!vfio_get_section_iova_range(container, section, &iova, &end, &llend)) {
>          if (memory_region_is_ram_device(section->mr)) {
>              trace_vfio_listener_region_add_no_dma_map(
>                  memory_region_name(section->mr),
> @@ -991,7 +1015,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
>          }
>          return;
>      }
> -    end = int128_get64(int128_sub(llend, int128_one()));
>  
>      if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
>          hwaddr pgsize = 0;

Shouldn't this convert vfio_listener_region_del() too?  Thanks,

Alex



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 08/14] vfio/common: Record DMA mapped IOVA ranges
  2023-03-07  2:02 ` [PATCH v4 08/14] vfio/common: Record DMA mapped IOVA ranges Joao Martins
@ 2023-03-07  2:57   ` Alex Williamson
  2023-03-07 10:08     ` Cédric Le Goater
  2023-03-07 10:16     ` Joao Martins
  0 siblings, 2 replies; 38+ messages in thread
From: Alex Williamson @ 2023-03-07  2:57 UTC (permalink / raw)
  To: Joao Martins
  Cc: qemu-devel, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon

On Tue,  7 Mar 2023 02:02:52 +0000
Joao Martins <joao.m.martins@oracle.com> wrote:

> According to the device DMA logging uAPI, IOVA ranges to be logged by
> the device must be provided all at once upon DMA logging start.
> 
> As preparation for the following patches which will add device dirty
> page tracking, keep a record of all DMA mapped IOVA ranges so later they
> can be used for DMA logging start.
> 
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>  hw/vfio/common.c              | 76 +++++++++++++++++++++++++++++++++++
>  hw/vfio/trace-events          |  1 +
>  include/hw/vfio/vfio-common.h | 13 ++++++
>  3 files changed, 90 insertions(+)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 3a6491dbc523..a9b1fc999121 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1334,11 +1334,87 @@ static int vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
>      return ret;
>  }
>  
> +static void vfio_dirty_tracking_update(MemoryListener *listener,
> +                                       MemoryRegionSection *section)
> +{
> +    VFIODirtyRanges *dirty = container_of(listener, VFIODirtyRanges, listener);
> +    VFIODirtyTrackingRange *range = &dirty->ranges;
> +    hwaddr max32 = UINT32_MAX - 1ULL;

The -1 is wrong here, UINT32_MAX is (2^32 - 1)

> +    hwaddr iova, end;
> +
> +    if (!vfio_listener_valid_section(section) ||
> +        !vfio_get_section_iova_range(dirty->container, section,
> +                                     &iova, &end, NULL)) {
> +        return;
> +    }
> +
> +    /*
> +     * The address space passed to the dirty tracker is reduced to two ranges:
> +     * one for 32-bit DMA ranges, and another one for 64-bit DMA ranges.
> +     * The underlying reports of dirty will query a sub-interval of each of
> +     * these ranges.
> +     *
> +     * The purpose of the dual range handling is to handle known cases of big
> +     * holes in the address space, like the x86 AMD 1T hole. The alternative
> +     * would be an IOVATree but that has a much bigger runtime overhead and
> +     * unnecessary complexity.
> +     */
> +    if (iova < max32 && end <= max32) {

Nit, the first test is redundant, iova is necessarily less than end.

> +        if (range->min32 > iova) {
> +            range->min32 = iova;
> +        }
> +        if (range->max32 < end) {
> +            range->max32 = end;
> +        }
> +        trace_vfio_device_dirty_tracking_update(iova, end,
> +                                    range->min32, range->max32);
> +    } else {
> +        if (!range->min64 || range->min64 > iova) {

The first test should be removed: min64 is now initialized to a
non-zero value (UINT64_MAX), so if it reads as zero it must have been
set by an earlier update, and the !range->min64 test would wrongly let
that set value be overridden.

> +            range->min64 = iova;
> +        }
> +        if (range->max64 < end) {
> +            range->max64 = end;
> +        }
> +        trace_vfio_device_dirty_tracking_update(iova, end,
> +                                    range->min64, range->max64);
> +    }
> +
> +    return;
> +}
> +
> +static const MemoryListener vfio_dirty_tracking_listener = {
> +    .name = "vfio-tracking",
> +    .region_add = vfio_dirty_tracking_update,
> +};
> +
> +static void vfio_dirty_tracking_init(VFIOContainer *container,
> +                                     VFIODirtyRanges *dirty)
> +{
> +    memset(dirty, 0, sizeof(*dirty));
> +    dirty->ranges.min32 = UINT32_MAX;
> +    dirty->ranges.min64 = UINT64_MAX;
> +    dirty->listener = vfio_dirty_tracking_listener;
> +    dirty->container = container;
> +

I was actually thinking the caller would just pass a
VFIODirtyTrackingRange and VFIODirtyRanges would be allocated on the
stack here, perhaps with both defined privately to this file, but this
works and we can refine later if we so decide.  Thanks,

Alex


> +    memory_listener_register(&dirty->listener,
> +                             container->space->as);
> +
> +    /*
> +     * The memory listener is synchronous, and used to calculate the range
> +     * to dirty tracking. Unregister it after we are done as we are not
> +     * interested in any follow-up updates.
> +     */
> +    memory_listener_unregister(&dirty->listener);
> +}
> +
>  static void vfio_listener_log_global_start(MemoryListener *listener)
>  {
>      VFIOContainer *container = container_of(listener, VFIOContainer, listener);
> +    VFIODirtyRanges dirty;
>      int ret;
>  
> +    vfio_dirty_tracking_init(container, &dirty);
> +
>      ret = vfio_set_dirty_page_tracking(container, true);
>      if (ret) {
>          vfio_set_migration_error(ret);
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index 669d9fe07cd9..d97a6de17921 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -104,6 +104,7 @@ vfio_known_safe_misalignment(const char *name, uint64_t iova, uint64_t offset_wi
>  vfio_listener_region_add_no_dma_map(const char *name, uint64_t iova, uint64_t size, uint64_t page_size) "Region \"%s\" 0x%"PRIx64" size=0x%"PRIx64" is not aligned to 0x%"PRIx64" and cannot be mapped for DMA"
>  vfio_listener_region_del_skip(uint64_t start, uint64_t end) "SKIPPING region_del 0x%"PRIx64" - 0x%"PRIx64
>  vfio_listener_region_del(uint64_t start, uint64_t end) "region_del 0x%"PRIx64" - 0x%"PRIx64
> +vfio_device_dirty_tracking_update(uint64_t start, uint64_t end, uint64_t min, uint64_t max) "section 0x%"PRIx64" - 0x%"PRIx64" -> update [0x%"PRIx64" - 0x%"PRIx64"]"
>  vfio_disconnect_container(int fd) "close container->fd=%d"
>  vfio_put_group(int fd) "close group->fd=%d"
>  vfio_get_device(const char * name, unsigned int flags, unsigned int num_regions, unsigned int num_irqs) "Device %s flags: %u, regions: %u, irqs: %u"
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 87524c64a443..0f84136cceb5 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -96,6 +96,19 @@ typedef struct VFIOContainer {
>      QLIST_ENTRY(VFIOContainer) next;
>  } VFIOContainer;
>  
> +typedef struct VFIODirtyTrackingRange {
> +    hwaddr min32;
> +    hwaddr max32;
> +    hwaddr min64;
> +    hwaddr max64;
> +} VFIODirtyTrackingRange;
> +
> +typedef struct VFIODirtyRanges {
> +    VFIOContainer *container;
> +    VFIODirtyTrackingRange ranges;
> +    MemoryListener listener;
> +} VFIODirtyRanges;
> +
>  typedef struct VFIOGuestIOMMU {
>      VFIOContainer *container;
>      IOMMUMemoryRegion *iommu_mr;



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 04/14] vfio/common: Add VFIOBitmap and alloc function
  2023-03-07  2:02 ` [PATCH v4 04/14] vfio/common: Add VFIOBitmap and alloc function Joao Martins
@ 2023-03-07  8:49   ` Avihai Horon
  2023-03-07 10:17     ` Joao Martins
  0 siblings, 1 reply; 38+ messages in thread
From: Avihai Horon @ 2023-03-07  8:49 UTC (permalink / raw)
  To: Joao Martins, qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta


On 07/03/2023 4:02, Joao Martins wrote:
>
> From: Avihai Horon <avihaih@nvidia.com>
>
> There are already two places where dirty page bitmap allocation and
> calculations are done in open code. With device dirty page tracking
> being added in next patches, there are going to be even more places.
>
> To avoid code duplication, introduce VFIOBitmap struct and corresponding
> alloc function and use them where applicable.

Nit: after splitting the series we still have only two places where we
allocate a dirty page bitmap.

So we can drop the second sentence:

There are two places where dirty page bitmap allocation and calculations
are done in open code.

To avoid code duplication, introduce VFIOBitmap struct and corresponding
alloc function and use them where applicable.

Thanks.

> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> Reviewed-by: Cédric Le Goater <clg@redhat.com>
> ---
>   hw/vfio/common.c | 73 +++++++++++++++++++++++++++++-------------------
>   1 file changed, 44 insertions(+), 29 deletions(-)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 4c801513136a..cec3de08d2b4 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -320,6 +320,25 @@ const MemoryRegionOps vfio_region_ops = {
>    * Device state interfaces
>    */
>
> +typedef struct {
> +    unsigned long *bitmap;
> +    hwaddr size;
> +    hwaddr pages;
> +} VFIOBitmap;
> +
> +static int vfio_bitmap_alloc(VFIOBitmap *vbmap, hwaddr size)
> +{
> +    vbmap->pages = REAL_HOST_PAGE_ALIGN(size) / qemu_real_host_page_size();
> +    vbmap->size = ROUND_UP(vbmap->pages, sizeof(__u64) * BITS_PER_BYTE) /
> +                                         BITS_PER_BYTE;
> +    vbmap->bitmap = g_try_malloc0(vbmap->size);
> +    if (!vbmap->bitmap) {
> +        return -ENOMEM;
> +    }
> +
> +    return 0;
> +}
> +
>   bool vfio_mig_active(void)
>   {
>       VFIOGroup *group;
> @@ -468,9 +487,14 @@ static int vfio_dma_unmap_bitmap(VFIOContainer *container,
>   {
>       struct vfio_iommu_type1_dma_unmap *unmap;
>       struct vfio_bitmap *bitmap;
> -    uint64_t pages = REAL_HOST_PAGE_ALIGN(size) / qemu_real_host_page_size();
> +    VFIOBitmap vbmap;
>       int ret;
>
> +    ret = vfio_bitmap_alloc(&vbmap, size);
> +    if (ret) {
> +        return ret;
> +    }
> +
>       unmap = g_malloc0(sizeof(*unmap) + sizeof(*bitmap));
>
>       unmap->argsz = sizeof(*unmap) + sizeof(*bitmap);
> @@ -484,35 +508,28 @@ static int vfio_dma_unmap_bitmap(VFIOContainer *container,
>        * qemu_real_host_page_size to mark those dirty. Hence set bitmap_pgsize
>        * to qemu_real_host_page_size.
>        */
> -
>       bitmap->pgsize = qemu_real_host_page_size();
> -    bitmap->size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) /
> -                   BITS_PER_BYTE;
> +    bitmap->size = vbmap.size;
> +    bitmap->data = (__u64 *)vbmap.bitmap;
>
> -    if (bitmap->size > container->max_dirty_bitmap_size) {
> -        error_report("UNMAP: Size of bitmap too big 0x%"PRIx64,
> -                     (uint64_t)bitmap->size);
> +    if (vbmap.size > container->max_dirty_bitmap_size) {
> +        error_report("UNMAP: Size of bitmap too big 0x%"PRIx64, vbmap.size);
>           ret = -E2BIG;
>           goto unmap_exit;
>       }
>
> -    bitmap->data = g_try_malloc0(bitmap->size);
> -    if (!bitmap->data) {
> -        ret = -ENOMEM;
> -        goto unmap_exit;
> -    }
> -
>       ret = ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, unmap);
>       if (!ret) {
> -        cpu_physical_memory_set_dirty_lebitmap((unsigned long *)bitmap->data,
> -                iotlb->translated_addr, pages);
> +        cpu_physical_memory_set_dirty_lebitmap(vbmap.bitmap,
> +                iotlb->translated_addr, vbmap.pages);
>       } else {
>           error_report("VFIO_UNMAP_DMA with DIRTY_BITMAP : %m");
>       }
>
> -    g_free(bitmap->data);
>   unmap_exit:
>       g_free(unmap);
> +    g_free(vbmap.bitmap);
> +
>       return ret;
>   }
>
> @@ -1329,7 +1346,7 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
>   {
>       struct vfio_iommu_type1_dirty_bitmap *dbitmap;
>       struct vfio_iommu_type1_dirty_bitmap_get *range;
> -    uint64_t pages;
> +    VFIOBitmap vbmap;
>       int ret;
>
>       if (!container->dirty_pages_supported) {
> @@ -1339,6 +1356,11 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
>           return 0;
>       }
>
> +    ret = vfio_bitmap_alloc(&vbmap, size);
> +    if (ret) {
> +        return ret;
> +    }
> +
>       dbitmap = g_malloc0(sizeof(*dbitmap) + sizeof(*range));
>
>       dbitmap->argsz = sizeof(*dbitmap) + sizeof(*range);
> @@ -1353,15 +1375,8 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
>        * to qemu_real_host_page_size.
>        */
>       range->bitmap.pgsize = qemu_real_host_page_size();
> -
> -    pages = REAL_HOST_PAGE_ALIGN(range->size) / qemu_real_host_page_size();
> -    range->bitmap.size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) /
> -                                         BITS_PER_BYTE;
> -    range->bitmap.data = g_try_malloc0(range->bitmap.size);
> -    if (!range->bitmap.data) {
> -        ret = -ENOMEM;
> -        goto err_out;
> -    }
> +    range->bitmap.size = vbmap.size;
> +    range->bitmap.data = (__u64 *)vbmap.bitmap;
>
>       ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, dbitmap);
>       if (ret) {
> @@ -1372,14 +1387,14 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
>           goto err_out;
>       }
>
> -    cpu_physical_memory_set_dirty_lebitmap((unsigned long *)range->bitmap.data,
> -                                            ram_addr, pages);
> +    cpu_physical_memory_set_dirty_lebitmap(vbmap.bitmap, ram_addr,
> +                                           vbmap.pages);
>
>       trace_vfio_get_dirty_bitmap(container->fd, range->iova, range->size,
>                                   range->bitmap.size, ram_addr);
>   err_out:
> -    g_free(range->bitmap.data);
>       g_free(dbitmap);
> +    g_free(vbmap.bitmap);
>
>       return ret;
>   }
> --
> 2.17.2
>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 05/14] vfio/common: Add helper to validate iova/end against hostwin
  2023-03-07  2:02 ` [PATCH v4 05/14] vfio/common: Add helper to validate iova/end against hostwin Joao Martins
@ 2023-03-07  8:57   ` Avihai Horon
  2023-03-07 10:18     ` Joao Martins
  0 siblings, 1 reply; 38+ messages in thread
From: Avihai Horon @ 2023-03-07  8:57 UTC (permalink / raw)
  To: Joao Martins, qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta


On 07/03/2023 4:02, Joao Martins wrote:
>
> In preparation to be used in device dirty tracking, move the code that
> finds the container host DMA window against a iova range.  This avoids
> duplication on the common checks across listener callbacks.

In the end this isn't used by device dirty tracking, so "In preparation
to be used in device dirty tracking" can be dropped.

Other than that, FWIW:

Reviewed-by: Avihai Horon <avihaih@nvidia.com>

Thanks.

>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> Reviewed-by: Cédric Le Goater <clg@redhat.com>
> ---
>   hw/vfio/common.c | 38 ++++++++++++++++++++------------------
>   1 file changed, 20 insertions(+), 18 deletions(-)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index cec3de08d2b4..99acb998eb14 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -901,6 +901,22 @@ static void vfio_unregister_ram_discard_listener(VFIOContainer *container,
>       g_free(vrdl);
>   }
>
> +static VFIOHostDMAWindow *vfio_find_hostwin(VFIOContainer *container,
> +                                            hwaddr iova, hwaddr end)
> +{
> +    VFIOHostDMAWindow *hostwin;
> +    bool hostwin_found = false;
> +
> +    QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
> +        if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
> +            hostwin_found = true;
> +            break;
> +        }
> +    }
> +
> +    return hostwin_found ? hostwin : NULL;
> +}
> +
>   static bool vfio_known_safe_misalignment(MemoryRegionSection *section)
>   {
>       MemoryRegion *mr = section->mr;
> @@ -926,7 +942,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
>       void *vaddr;
>       int ret;
>       VFIOHostDMAWindow *hostwin;
> -    bool hostwin_found;
>       Error *err = NULL;
>
>       if (vfio_listener_skipped_section(section)) {
> @@ -1027,15 +1042,8 @@ static void vfio_listener_region_add(MemoryListener *listener,
>   #endif
>       }
>
> -    hostwin_found = false;
> -    QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
> -        if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
> -            hostwin_found = true;
> -            break;
> -        }
> -    }
> -
> -    if (!hostwin_found) {
> +    hostwin = vfio_find_hostwin(container, iova, end);
> +    if (!hostwin) {
>           error_setg(&err, "Container %p can't map guest IOVA region"
>                      " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
>           goto fail;
> @@ -1237,15 +1245,9 @@ static void vfio_listener_region_del(MemoryListener *listener,
>       if (memory_region_is_ram_device(section->mr)) {
>           hwaddr pgmask;
>           VFIOHostDMAWindow *hostwin;
> -        bool hostwin_found = false;
>
> -        QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
> -            if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
> -                hostwin_found = true;
> -                break;
> -            }
> -        }
> -        assert(hostwin_found); /* or region_add() would have failed */
> +        hostwin = vfio_find_hostwin(container, iova, end);
> +        assert(hostwin); /* or region_add() would have failed */
>
>           pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
>           try_unmap = !((iova & pgmask) || (int128_get64(llsize) & pgmask));
> --
> 2.17.2
>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 06/14] vfio/common: Consolidate skip/invalid section into helper
  2023-03-07  2:02 ` [PATCH v4 06/14] vfio/common: Consolidate skip/invalid section into helper Joao Martins
@ 2023-03-07  9:13   ` Avihai Horon
  2023-03-07  9:47     ` Cédric Le Goater
  2023-03-07 10:21     ` Joao Martins
  0 siblings, 2 replies; 38+ messages in thread
From: Avihai Horon @ 2023-03-07  9:13 UTC (permalink / raw)
  To: Joao Martins, qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta


On 07/03/2023 4:02, Joao Martins wrote:
>
> The checks are replicated against region_add and region_del
> and will be soon added in another memory listener dedicated
> for dirty tracking.
>
> Move these into a new helper for avoid duplication.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> Reviewed-by: Cédric Le Goater <clg@redhat.com>
> ---
>   hw/vfio/common.c | 52 +++++++++++++++++++-----------------------------
>   1 file changed, 21 insertions(+), 31 deletions(-)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 99acb998eb14..54b4a4fc7daf 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -933,23 +933,14 @@ static bool vfio_known_safe_misalignment(MemoryRegionSection *section)
>       return true;
>   }
>
> -static void vfio_listener_region_add(MemoryListener *listener,
> -                                     MemoryRegionSection *section)
> +static bool vfio_listener_valid_section(MemoryRegionSection *section)
>   {
> -    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
> -    hwaddr iova, end;
> -    Int128 llend, llsize;
> -    void *vaddr;
> -    int ret;
> -    VFIOHostDMAWindow *hostwin;
> -    Error *err = NULL;
> -
>       if (vfio_listener_skipped_section(section)) {
>           trace_vfio_listener_region_add_skip(
>                   section->offset_within_address_space,
>                   section->offset_within_address_space +
>                   int128_get64(int128_sub(section->size, int128_one())));

The original code uses two different traces depending on add or del -- 
trace_vfio_listener_region_{add,del}_skip.
Should we combine the two traces into a single trace? If the distinction 
is important then maybe pass a flag or the caller name to indicate 
whether it's add, del or dirty tracking update?

But other than that:

Reviewed-by: Avihai Horon <avihaih@nvidia.com>

Thanks.

> -        return;
> +        return false;
>       }
>
>       if (unlikely((section->offset_within_address_space &
> @@ -964,6 +955,24 @@ static void vfio_listener_region_add(MemoryListener *listener,
>                            section->offset_within_region,
>                            qemu_real_host_page_size());
>           }
> +        return false;
> +    }
> +
> +    return true;
> +}
> +
> +static void vfio_listener_region_add(MemoryListener *listener,
> +                                     MemoryRegionSection *section)
> +{
> +    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
> +    hwaddr iova, end;
> +    Int128 llend, llsize;
> +    void *vaddr;
> +    int ret;
> +    VFIOHostDMAWindow *hostwin;
> +    Error *err = NULL;
> +
> +    if (!vfio_listener_valid_section(section)) {
>           return;
>       }
>
> @@ -1182,26 +1191,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
>       int ret;
>       bool try_unmap = true;
>
> -    if (vfio_listener_skipped_section(section)) {
> -        trace_vfio_listener_region_del_skip(
> -                section->offset_within_address_space,
> -                section->offset_within_address_space +
> -                int128_get64(int128_sub(section->size, int128_one())));
> -        return;
> -    }
> -
> -    if (unlikely((section->offset_within_address_space &
> -                  ~qemu_real_host_page_mask()) !=
> -                 (section->offset_within_region & ~qemu_real_host_page_mask()))) {
> -        if (!vfio_known_safe_misalignment(section)) {
> -            error_report("%s received unaligned region %s iova=0x%"PRIx64
> -                         " offset_within_region=0x%"PRIx64
> -                         " qemu_real_host_page_size=0x%"PRIxPTR,
> -                         __func__, memory_region_name(section->mr),
> -                         section->offset_within_address_space,
> -                         section->offset_within_region,
> -                         qemu_real_host_page_size());
> -        }
> +    if (!vfio_listener_valid_section(section)) {
>           return;
>       }
>
> --
> 2.17.2
>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 06/14] vfio/common: Consolidate skip/invalid section into helper
  2023-03-07  9:13   ` Avihai Horon
@ 2023-03-07  9:47     ` Cédric Le Goater
  2023-03-07 10:22       ` Joao Martins
  2023-03-07 10:21     ` Joao Martins
  1 sibling, 1 reply; 38+ messages in thread
From: Cédric Le Goater @ 2023-03-07  9:47 UTC (permalink / raw)
  To: Avihai Horon, Joao Martins, qemu-devel
  Cc: Alex Williamson, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta

On 3/7/23 10:13, Avihai Horon wrote:
> 
> On 07/03/2023 4:02, Joao Martins wrote:
>>
>> The checks are replicated against region_add and region_del
>> and will be soon added in another memory listener dedicated
>> for dirty tracking.
>>
>> Move these into a new helper for avoid duplication.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> Reviewed-by: Cédric Le Goater <clg@redhat.com>
>> ---
>>   hw/vfio/common.c | 52 +++++++++++++++++++-----------------------------
>>   1 file changed, 21 insertions(+), 31 deletions(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 99acb998eb14..54b4a4fc7daf 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -933,23 +933,14 @@ static bool vfio_known_safe_misalignment(MemoryRegionSection *section)
>>       return true;
>>   }
>>
>> -static void vfio_listener_region_add(MemoryListener *listener,
>> -                                     MemoryRegionSection *section)
>> +static bool vfio_listener_valid_section(MemoryRegionSection *section)
>>   {
>> -    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>> -    hwaddr iova, end;
>> -    Int128 llend, llsize;
>> -    void *vaddr;
>> -    int ret;
>> -    VFIOHostDMAWindow *hostwin;
>> -    Error *err = NULL;
>> -
>>       if (vfio_listener_skipped_section(section)) {
>>           trace_vfio_listener_region_add_skip(
>>                   section->offset_within_address_space,
>>                   section->offset_within_address_space +
>>                   int128_get64(int128_sub(section->size, int128_one())));
> 
> The original code uses two different traces depending on add or del -- trace_vfio_listener_region_{add,del}_skip.
> Should we combine the two traces into a single trace? If the distinction is important then maybe pass a flag or the caller name to indicate whether it's add, del or dirty tracking update?

I think introducing a new trace event 'trace_vfio_listener_region_skip'
to replace 'trace_vfio_listener_region_add_skip' above should be enough.

Thanks,

C.

> 
> But other than that:
> 
> Reviewed-by: Avihai Horon <avihaih@nvidia.com>
> 
> Thanks.
> 
>> -        return;
>> +        return false;
>>       }
>>
>>       if (unlikely((section->offset_within_address_space &
>> @@ -964,6 +955,24 @@ static void vfio_listener_region_add(MemoryListener *listener,
>>                            section->offset_within_region,
>>                            qemu_real_host_page_size());
>>           }
>> +        return false;
>> +    }
>> +
>> +    return true;
>> +}
>> +
>> +static void vfio_listener_region_add(MemoryListener *listener,
>> +                                     MemoryRegionSection *section)
>> +{
>> +    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>> +    hwaddr iova, end;
>> +    Int128 llend, llsize;
>> +    void *vaddr;
>> +    int ret;
>> +    VFIOHostDMAWindow *hostwin;
>> +    Error *err = NULL;
>> +
>> +    if (!vfio_listener_valid_section(section)) {
>>           return;
>>       }
>>
>> @@ -1182,26 +1191,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
>>       int ret;
>>       bool try_unmap = true;
>>
>> -    if (vfio_listener_skipped_section(section)) {
>> -        trace_vfio_listener_region_del_skip(
>> -                section->offset_within_address_space,
>> -                section->offset_within_address_space +
>> -                int128_get64(int128_sub(section->size, int128_one())));
>> -        return;
>> -    }
>> -
>> -    if (unlikely((section->offset_within_address_space &
>> -                  ~qemu_real_host_page_mask()) !=
>> -                 (section->offset_within_region & ~qemu_real_host_page_mask()))) {
>> -        if (!vfio_known_safe_misalignment(section)) {
>> -            error_report("%s received unaligned region %s iova=0x%"PRIx64
>> -                         " offset_within_region=0x%"PRIx64
>> -                         " qemu_real_host_page_size=0x%"PRIxPTR,
>> -                         __func__, memory_region_name(section->mr),
>> -                         section->offset_within_address_space,
>> -                         section->offset_within_region,
>> -                         qemu_real_host_page_size());
>> -        }
>> +    if (!vfio_listener_valid_section(section)) {
>>           return;
>>       }
>>
>> -- 
>> 2.17.2
>>
> 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 07/14] vfio/common: Add helper to consolidate iova/end calculation
  2023-03-07  2:02 ` [PATCH v4 07/14] vfio/common: Add helper to consolidate iova/end calculation Joao Martins
  2023-03-07  2:40   ` Alex Williamson
@ 2023-03-07  9:52   ` Avihai Horon
  2023-03-07 10:26     ` Joao Martins
  1 sibling, 1 reply; 38+ messages in thread
From: Avihai Horon @ 2023-03-07  9:52 UTC (permalink / raw)
  To: Joao Martins, qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta


On 07/03/2023 4:02, Joao Martins wrote:
>
> In preparation to be used in device dirty tracking, move the code that
> calculates a iova/end range from the container/section.  This avoids
> duplication on the common checks across listener callbacks.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>   hw/vfio/common.c | 37 ++++++++++++++++++++++++++++++-------
>   1 file changed, 30 insertions(+), 7 deletions(-)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 54b4a4fc7daf..3a6491dbc523 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -961,6 +961,35 @@ static bool vfio_listener_valid_section(MemoryRegionSection *section)
>       return true;
>   }
>
> +/*
> + * Called for the dirty tracking memory listener to calculate the iova/end
> + * for a given memory region section.
> + */

Should we just delete this comment? The function name is pretty clear.
Besides that, the comment is not completely accurate -- in this patch we 
are not using it yet for dirty tracking and it's also used for 
region_{add,del}.

Thanks.

> +static bool vfio_get_section_iova_range(VFIOContainer *container,
> +                                        MemoryRegionSection *section,
> +                                        hwaddr *out_iova, hwaddr *out_end,
> +                                        Int128 *out_llend)
> +{
> +    Int128 llend;
> +    hwaddr iova;
> +
> +    iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space);
> +    llend = int128_make64(section->offset_within_address_space);
> +    llend = int128_add(llend, section->size);
> +    llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask()));
> +
> +    if (int128_ge(int128_make64(iova), llend)) {
> +        return false;
> +    }
> +
> +    *out_iova = iova;
> +    *out_end = int128_get64(int128_sub(llend, int128_one()));
> +    if (out_llend) {
> +        *out_llend = llend;
> +    }
> +    return true;
> +}
> +
>   static void vfio_listener_region_add(MemoryListener *listener,
>                                        MemoryRegionSection *section)
>   {
> @@ -976,12 +1005,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>           return;
>       }
>
> -    iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space);
> -    llend = int128_make64(section->offset_within_address_space);
> -    llend = int128_add(llend, section->size);
> -    llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask()));
> -
> -    if (int128_ge(int128_make64(iova), llend)) {
> +    if (!vfio_get_section_iova_range(container, section, &iova, &end, &llend)) {
>           if (memory_region_is_ram_device(section->mr)) {
>               trace_vfio_listener_region_add_no_dma_map(
>                   memory_region_name(section->mr),
> @@ -991,7 +1015,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
>           }
>           return;
>       }
> -    end = int128_get64(int128_sub(llend, int128_one()));
>
>       if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
>           hwaddr pgsize = 0;
> --
> 2.17.2
>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 08/14] vfio/common: Record DMA mapped IOVA ranges
  2023-03-07  2:57   ` Alex Williamson
@ 2023-03-07 10:08     ` Cédric Le Goater
  2023-03-07 10:30       ` Joao Martins
  2023-03-07 10:16     ` Joao Martins
  1 sibling, 1 reply; 38+ messages in thread
From: Cédric Le Goater @ 2023-03-07 10:08 UTC (permalink / raw)
  To: Alex Williamson, Joao Martins
  Cc: qemu-devel, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta, Avihai Horon

On 3/7/23 03:57, Alex Williamson wrote:
> On Tue,  7 Mar 2023 02:02:52 +0000
> Joao Martins <joao.m.martins@oracle.com> wrote:
> 
>> According to the device DMA logging uAPI, IOVA ranges to be logged by
>> the device must be provided all at once upon DMA logging start.
>>
>> As preparation for the following patches which will add device dirty
>> page tracking, keep a record of all DMA mapped IOVA ranges so later they
>> can be used for DMA logging start.
>>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>>   hw/vfio/common.c              | 76 +++++++++++++++++++++++++++++++++++
>>   hw/vfio/trace-events          |  1 +
>>   include/hw/vfio/vfio-common.h | 13 ++++++
>>   3 files changed, 90 insertions(+)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 3a6491dbc523..a9b1fc999121 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -1334,11 +1334,87 @@ static int vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
>>       return ret;
>>   }
>>   
>> +static void vfio_dirty_tracking_update(MemoryListener *listener,
>> +                                       MemoryRegionSection *section)
>> +{
>> +    VFIODirtyRanges *dirty = container_of(listener, VFIODirtyRanges, listener);
>> +    VFIODirtyTrackingRange *range = &dirty->ranges;
>> +    hwaddr max32 = UINT32_MAX - 1ULL;
> 
> The -1 is wrong here, UINT32_MAX is (2^32 - 1)
> 
>> +    hwaddr iova, end;
>> +
>> +    if (!vfio_listener_valid_section(section) ||
>> +        !vfio_get_section_iova_range(dirty->container, section,
>> +                                     &iova, &end, NULL)) {
>> +        return;
>> +    }
>> +
>> +    /*
>> +     * The address space passed to the dirty tracker is reduced to two ranges:
>> +     * one for 32-bit DMA ranges, and another one for 64-bit DMA ranges.
>> +     * The underlying reports of dirty will query a sub-interval of each of
>> +     * these ranges.
>> +     *
>> +     * The purpose of the dual range handling is to handle known cases of big
>> +     * holes in the address space, like the x86 AMD 1T hole. The alternative
>> +     * would be an IOVATree but that has a much bigger runtime overhead and
>> +     * unnecessary complexity.
>> +     */
>> +    if (iova < max32 && end <= max32) {
> 
> Nit, the first test is redundant, iova is necessarily less than end.
> 
>> +        if (range->min32 > iova) {
>> +            range->min32 = iova;
>> +        }
>> +        if (range->max32 < end) {
>> +            range->max32 = end;
>> +        }
>> +        trace_vfio_device_dirty_tracking_update(iova, end,
>> +                                    range->min32, range->max32);
>> +    } else {
>> +        if (!range->min64 || range->min64 > iova) {
> 
> The first test should be removed, we're initializing min64 to a
> non-zero value now, so if it's zero it's been set and we can't
> de-prioritize that set value.
> 
>> +            range->min64 = iova;
>> +        }
>> +        if (range->max64 < end) {
>> +            range->max64 = end;
>> +        }
>> +        trace_vfio_device_dirty_tracking_update(iova, end,
>> +                                    range->min64, range->max64);
>> +    }
>> +
>> +    return;
>> +}
>> +
>> +static const MemoryListener vfio_dirty_tracking_listener = {
>> +    .name = "vfio-tracking",
>> +    .region_add = vfio_dirty_tracking_update,
>> +};
>> +
>> +static void vfio_dirty_tracking_init(VFIOContainer *container,
>> +                                     VFIODirtyRanges *dirty)
>> +{
>> +    memset(dirty, 0, sizeof(*dirty));
>> +    dirty->ranges.min32 = UINT32_MAX;
>> +    dirty->ranges.min64 = UINT64_MAX;
>> +    dirty->listener = vfio_dirty_tracking_listener;
>> +    dirty->container = container;
>> +
> 
> I was actually thinking the caller would just pass
> VFIODirtyTrackingRange and VFIODirtyRanges would be allocated on the
> stack here, perhaps both are defined private to this file, but this
> works and we can refine later if we so decide.  

It is true that vfio_devices_dma_logging_start() only needs
a VFIODirtyTrackingRange struct and not the VFIODirtyRanges struct,
which is a temporary structure used only for the dirty ranges calculation.
That would be nicer to have if you respin a v5.

I would rename VFIODirtyRanges to VFIODirtyRangesListener and
VFIODirtyTrackingRange to VFIODirtyRanges.
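
Something along these lines, as a rough sketch of the renaming (purely
illustrative; the field layout is taken from the patch as posted):

typedef struct VFIODirtyRanges {
    hwaddr min32;
    hwaddr max32;
    hwaddr min64;
    hwaddr max64;
} VFIODirtyRanges;

typedef struct VFIODirtyRangesListener {
    VFIOContainer *container;
    VFIODirtyRanges ranges;
    MemoryListener listener;
} VFIODirtyRangesListener;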

I am not sure they need to be in include/hw/vfio/vfio-common.h but
that seems to be the VFIO practice.

Thanks,

C.


> 
> Alex
> 
> 
>> +    memory_listener_register(&dirty->listener,
>> +                             container->space->as);
>> +
>> +    /*
>> +     * The memory listener is synchronous, and used to calculate the range
>> +     * to dirty tracking. Unregister it after we are done as we are not
>> +     * interested in any follow-up updates.
>> +     */
>> +    memory_listener_unregister(&dirty->listener);
>> +}
>> +
>>   static void vfio_listener_log_global_start(MemoryListener *listener)
>>   {
>>       VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>> +    VFIODirtyRanges dirty;
>>       int ret;
>>   
>> +    vfio_dirty_tracking_init(container, &dirty);
>> +
>>       ret = vfio_set_dirty_page_tracking(container, true);
>>       if (ret) {
>>           vfio_set_migration_error(ret);
>> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
>> index 669d9fe07cd9..d97a6de17921 100644
>> --- a/hw/vfio/trace-events
>> +++ b/hw/vfio/trace-events
>> @@ -104,6 +104,7 @@ vfio_known_safe_misalignment(const char *name, uint64_t iova, uint64_t offset_wi
>>   vfio_listener_region_add_no_dma_map(const char *name, uint64_t iova, uint64_t size, uint64_t page_size) "Region \"%s\" 0x%"PRIx64" size=0x%"PRIx64" is not aligned to 0x%"PRIx64" and cannot be mapped for DMA"
>>   vfio_listener_region_del_skip(uint64_t start, uint64_t end) "SKIPPING region_del 0x%"PRIx64" - 0x%"PRIx64
>>   vfio_listener_region_del(uint64_t start, uint64_t end) "region_del 0x%"PRIx64" - 0x%"PRIx64
>> +vfio_device_dirty_tracking_update(uint64_t start, uint64_t end, uint64_t min, uint64_t max) "section 0x%"PRIx64" - 0x%"PRIx64" -> update [0x%"PRIx64" - 0x%"PRIx64"]"
>>   vfio_disconnect_container(int fd) "close container->fd=%d"
>>   vfio_put_group(int fd) "close group->fd=%d"
>>   vfio_get_device(const char * name, unsigned int flags, unsigned int num_regions, unsigned int num_irqs) "Device %s flags: %u, regions: %u, irqs: %u"
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index 87524c64a443..0f84136cceb5 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -96,6 +96,19 @@ typedef struct VFIOContainer {
>>       QLIST_ENTRY(VFIOContainer) next;
>>   } VFIOContainer;
>>   
>> +typedef struct VFIODirtyTrackingRange {
>> +    hwaddr min32;
>> +    hwaddr max32;
>> +    hwaddr min64;
>> +    hwaddr max64;
>> +} VFIODirtyTrackingRange;
>> +
>> +typedef struct VFIODirtyRanges {
>> +    VFIOContainer *container;
>> +    VFIODirtyTrackingRange ranges;
>> +    MemoryListener listener;
>> +} VFIODirtyRanges;
>> +
>>   typedef struct VFIOGuestIOMMU {
>>       VFIOContainer *container;
>>       IOMMUMemoryRegion *iommu_mr;
> 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 07/14] vfio/common: Add helper to consolidate iova/end calculation
  2023-03-07  2:40   ` Alex Williamson
@ 2023-03-07 10:11     ` Joao Martins
  0 siblings, 0 replies; 38+ messages in thread
From: Joao Martins @ 2023-03-07 10:11 UTC (permalink / raw)
  To: Alex Williamson
  Cc: qemu-devel, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon

On 07/03/2023 02:40, Alex Williamson wrote:
> On Tue,  7 Mar 2023 02:02:51 +0000
> Joao Martins <joao.m.martins@oracle.com> wrote:
> 
>> In preparation to be used in device dirty tracking, move the code that
>> calculates a iova/end range from the container/section.  This avoids
>> duplication on the common checks across listener callbacks.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>>  hw/vfio/common.c | 37 ++++++++++++++++++++++++++++++-------
>>  1 file changed, 30 insertions(+), 7 deletions(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 54b4a4fc7daf..3a6491dbc523 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -961,6 +961,35 @@ static bool vfio_listener_valid_section(MemoryRegionSection *section)
>>      return true;
>>  }
>>  
>> +/*
>> + * Called for the dirty tracking memory listener to calculate the iova/end
>> + * for a given memory region section.
>> + */
>> +static bool vfio_get_section_iova_range(VFIOContainer *container,
>> +                                        MemoryRegionSection *section,
>> +                                        hwaddr *out_iova, hwaddr *out_end,
>> +                                        Int128 *out_llend)
>> +{
>> +    Int128 llend;
>> +    hwaddr iova;
>> +
>> +    iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space);
>> +    llend = int128_make64(section->offset_within_address_space);
>> +    llend = int128_add(llend, section->size);
>> +    llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask()));
>> +
>> +    if (int128_ge(int128_make64(iova), llend)) {
>> +        return false;
>> +    }
>> +
>> +    *out_iova = iova;
>> +    *out_end = int128_get64(int128_sub(llend, int128_one()));
>> +    if (out_llend) {
>> +        *out_llend = llend;
>> +    }
>> +    return true;
>> +}
>> +
>>  static void vfio_listener_region_add(MemoryListener *listener,
>>                                       MemoryRegionSection *section)
>>  {
>> @@ -976,12 +1005,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>>          return;
>>      }
>>  
>> -    iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space);
>> -    llend = int128_make64(section->offset_within_address_space);
>> -    llend = int128_add(llend, section->size);
>> -    llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask()));
>> -
>> -    if (int128_ge(int128_make64(iova), llend)) {
>> +    if (!vfio_get_section_iova_range(container, section, &iova, &end, &llend)) {
>>          if (memory_region_is_ram_device(section->mr)) {
>>              trace_vfio_listener_region_add_no_dma_map(
>>                  memory_region_name(section->mr),
>> @@ -991,7 +1015,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
>>          }
>>          return;
>>      }
>> -    end = int128_get64(int128_sub(llend, int128_one()));
>>  
>>      if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
>>          hwaddr pgsize = 0;
> 
> Shouldn't this convert vfio_listener_region_del() too?  Thanks,

Yeap.
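
For reference, vfio_listener_region_del() would presumably end up calling the
same helper, along the lines of (hypothetical sketch, not the actual hunk):

/* in vfio_listener_region_del(), replacing the open-coded iova/llend math */
if (!vfio_get_section_iova_range(container, section, &iova, &end, &llend)) {
    return;
}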


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 09/14] vfio/common: Add device dirty page tracking start/stop
  2023-03-07  2:02 ` [PATCH v4 09/14] vfio/common: Add device dirty page tracking start/stop Joao Martins
@ 2023-03-07 10:14   ` Avihai Horon
  2023-03-07 10:31     ` Joao Martins
  0 siblings, 1 reply; 38+ messages in thread
From: Avihai Horon @ 2023-03-07 10:14 UTC (permalink / raw)
  To: Joao Martins, qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta


On 07/03/2023 4:02, Joao Martins wrote:
>
>
> Add device dirty page tracking start/stop functionality. This uses the
> device DMA logging uAPI to start and stop dirty page tracking by device.
>
> Device dirty page tracking is used only if all devices within a
> container support device dirty page tracking.
>
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>   hw/vfio/common.c              | 175 +++++++++++++++++++++++++++++++++-
>   hw/vfio/trace-events          |   1 +
>   include/hw/vfio/vfio-common.h |   2 +
>   3 files changed, 173 insertions(+), 5 deletions(-)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index a9b1fc999121..a42f5f1e7ffe 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -450,6 +450,22 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
>       return true;
>   }
>
> +static bool vfio_devices_all_device_dirty_tracking(VFIOContainer *container)
> +{
> +    VFIOGroup *group;
> +    VFIODevice *vbasedev;
> +
> +    QLIST_FOREACH(group, &container->group_list, container_next) {
> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
> +            if (!vbasedev->dirty_pages_supported) {
> +                return false;
> +            }
> +        }
> +    }
> +
> +    return true;
> +}
> +
>   /*
>    * Check if all VFIO devices are running and migration is active, which is
>    * essentially equivalent to the migration being in pre-copy phase.
> @@ -1407,16 +1423,158 @@ static void vfio_dirty_tracking_init(VFIOContainer *container,
>       memory_listener_unregister(&dirty->listener);
>   }
>
> +static void vfio_devices_dma_logging_stop(VFIOContainer *container)
> +{
> +    uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature),
> +                              sizeof(uint64_t))] = {};
> +    struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
> +    VFIODevice *vbasedev;
> +    VFIOGroup *group;
> +    int ret = 0;
> +
> +    feature->argsz = sizeof(buf);
> +    feature->flags = VFIO_DEVICE_FEATURE_SET |
> +                     VFIO_DEVICE_FEATURE_DMA_LOGGING_STOP;
> +
> +    QLIST_FOREACH(group, &container->group_list, container_next) {
> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
> +            if (!vbasedev->dirty_tracking) {
> +                continue;
> +            }
> +
> +            ret = ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature);
> +            if (ret) {
> +                warn_report("%s: Failed to stop DMA logging, err %d (%s)",
> +                             vbasedev->name, ret, strerror(errno));
> +            }

Nit, no need for ret:

if (ioctl(...)) {
}

And regardless, need to replace ret with -errno in warn_report:

warn_report("%s: Failed to stop DMA logging, err %d (%s)",
             vbasedev->name, -errno, strerror(errno));
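
Putting the two together, the loop body would look roughly like this (just a
sketch of the suggested shape):

if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
    warn_report("%s: Failed to stop DMA logging, err %d (%s)",
                vbasedev->name, -errno, strerror(errno));
}
vbasedev->dirty_tracking = false;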

Thanks.

> +            vbasedev->dirty_tracking = false;
> +        }
> +    }
> +}
> +
> +static struct vfio_device_feature *
> +vfio_device_feature_dma_logging_start_create(VFIOContainer *container,
> +                                             VFIODirtyTrackingRange *tracking)
> +{
> +    struct vfio_device_feature *feature;
> +    size_t feature_size;
> +    struct vfio_device_feature_dma_logging_control *control;
> +    struct vfio_device_feature_dma_logging_range *ranges;
> +
> +    feature_size = sizeof(struct vfio_device_feature) +
> +                   sizeof(struct vfio_device_feature_dma_logging_control);
> +    feature = g_try_malloc0(feature_size);
> +    if (!feature) {
> +        errno = ENOMEM;
> +        return NULL;
> +    }
> +    feature->argsz = feature_size;
> +    feature->flags = VFIO_DEVICE_FEATURE_SET |
> +                     VFIO_DEVICE_FEATURE_DMA_LOGGING_START;
> +
> +    control = (struct vfio_device_feature_dma_logging_control *)feature->data;
> +    control->page_size = qemu_real_host_page_size();
> +
> +    /*
> +     * DMA logging uAPI guarantees to support at least a number of ranges that
> +     * fits into a single host kernel base page.
> +     */
> +    control->num_ranges = !!tracking->max32 + !!tracking->max64;
> +    ranges = g_try_new0(struct vfio_device_feature_dma_logging_range,
> +                        control->num_ranges);
> +    if (!ranges) {
> +        g_free(feature);
> +        errno = ENOMEM;
> +
> +        return NULL;
> +    }
> +
> +    control->ranges = (__u64)(uintptr_t)ranges;
> +    if (tracking->max32) {
> +        ranges->iova = tracking->min32;
> +        ranges->length = (tracking->max32 - tracking->min32) + 1;
> +        ranges++;
> +    }
> +    if (tracking->max64) {
> +        ranges->iova = tracking->min64;
> +        ranges->length = (tracking->max64 - tracking->min64) + 1;
> +    }
> +
> +    trace_vfio_device_dirty_tracking_start(control->num_ranges,
> +                                           tracking->min32, tracking->max32,
> +                                           tracking->min64, tracking->max64);
> +
> +    return feature;
> +}
> +
> +static void vfio_device_feature_dma_logging_start_destroy(
> +    struct vfio_device_feature *feature)
> +{
> +    struct vfio_device_feature_dma_logging_control *control =
> +        (struct vfio_device_feature_dma_logging_control *)feature->data;
> +    struct vfio_device_feature_dma_logging_range *ranges =
> +        (struct vfio_device_feature_dma_logging_range *)(uintptr_t) control->ranges;
> +
> +    g_free(ranges);
> +    g_free(feature);
> +}
> +
> +static int vfio_devices_dma_logging_start(VFIOContainer *container)
> +{
> +    struct vfio_device_feature *feature;
> +    VFIODirtyRanges dirty;
> +    VFIODevice *vbasedev;
> +    VFIOGroup *group;
> +    int ret = 0;
> +
> +    vfio_dirty_tracking_init(container, &dirty);
> +    feature = vfio_device_feature_dma_logging_start_create(container,
> +                                                           &dirty.ranges);
> +    if (!feature) {
> +        return -errno;
> +    }
> +
> +    QLIST_FOREACH(group, &container->group_list, container_next) {
> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
> +            if (vbasedev->dirty_tracking) {
> +                continue;
> +            }
> +
> +            ret = ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature);
> +            if (ret) {
> +                ret = -errno;
> +                error_report("%s: Failed to start DMA logging, err %d (%s)",
> +                             vbasedev->name, ret, strerror(errno));
> +                goto out;
> +            }
> +            vbasedev->dirty_tracking = true;
> +        }
> +    }
> +
> +out:
> +    if (ret) {
> +        vfio_devices_dma_logging_stop(container);
> +    }
> +
> +    vfio_device_feature_dma_logging_start_destroy(feature);
> +
> +    return ret;
> +}
> +
>   static void vfio_listener_log_global_start(MemoryListener *listener)
>   {
>       VFIOContainer *container = container_of(listener, VFIOContainer, listener);
> -    VFIODirtyRanges dirty;
>       int ret;
>
> -    vfio_dirty_tracking_init(container, &dirty);
> +    if (vfio_devices_all_device_dirty_tracking(container)) {
> +        ret = vfio_devices_dma_logging_start(container);
> +    } else {
> +        ret = vfio_set_dirty_page_tracking(container, true);
> +    }
>
> -    ret = vfio_set_dirty_page_tracking(container, true);
>       if (ret) {
> +        error_report("vfio: Could not start dirty page tracking, err: %d (%s)",
> +                     ret, strerror(-ret));
>           vfio_set_migration_error(ret);
>       }
>   }
> @@ -1424,10 +1582,17 @@ static void vfio_listener_log_global_start(MemoryListener *listener)
>   static void vfio_listener_log_global_stop(MemoryListener *listener)
>   {
>       VFIOContainer *container = container_of(listener, VFIOContainer, listener);
> -    int ret;
> +    int ret = 0;
> +
> +    if (vfio_devices_all_device_dirty_tracking(container)) {
> +        vfio_devices_dma_logging_stop(container);
> +    } else {
> +        ret = vfio_set_dirty_page_tracking(container, false);
> +    }
>
> -    ret = vfio_set_dirty_page_tracking(container, false);
>       if (ret) {
> +        error_report("vfio: Could not stop dirty page tracking, err: %d (%s)",
> +                     ret, strerror(-ret));
>           vfio_set_migration_error(ret);
>       }
>   }
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index d97a6de17921..7a7e0cfe5b23 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -105,6 +105,7 @@ vfio_listener_region_add_no_dma_map(const char *name, uint64_t iova, uint64_t si
>   vfio_listener_region_del_skip(uint64_t start, uint64_t end) "SKIPPING region_del 0x%"PRIx64" - 0x%"PRIx64
>   vfio_listener_region_del(uint64_t start, uint64_t end) "region_del 0x%"PRIx64" - 0x%"PRIx64
>   vfio_device_dirty_tracking_update(uint64_t start, uint64_t end, uint64_t min, uint64_t max) "section 0x%"PRIx64" - 0x%"PRIx64" -> update [0x%"PRIx64" - 0x%"PRIx64"]"
> +vfio_device_dirty_tracking_start(int nr_ranges, uint64_t min32, uint64_t max32, uint64_t min64, uint64_t max64) "nr_ranges %d 32:[0x%"PRIx64" - 0x%"PRIx64"], 64:[0x%"PRIx64" - 0x%"PRIx64"]"
>   vfio_disconnect_container(int fd) "close container->fd=%d"
>   vfio_put_group(int fd) "close group->fd=%d"
>   vfio_get_device(const char * name, unsigned int flags, unsigned int num_regions, unsigned int num_irqs) "Device %s flags: %u, regions: %u, irqs: %u"
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 0f84136cceb5..7817ca7d8706 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -156,6 +156,8 @@ typedef struct VFIODevice {
>       VFIOMigration *migration;
>       Error *migration_blocker;
>       OnOffAuto pre_copy_dirty_page_tracking;
> +    bool dirty_pages_supported;
> +    bool dirty_tracking;
>   } VFIODevice;
>
>   struct VFIODeviceOps {
> --
> 2.17.2
>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 08/14] vfio/common: Record DMA mapped IOVA ranges
  2023-03-07  2:57   ` Alex Williamson
  2023-03-07 10:08     ` Cédric Le Goater
@ 2023-03-07 10:16     ` Joao Martins
  2023-03-07 12:13       ` Joao Martins
  1 sibling, 1 reply; 38+ messages in thread
From: Joao Martins @ 2023-03-07 10:16 UTC (permalink / raw)
  To: Alex Williamson
  Cc: qemu-devel, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon

On 07/03/2023 02:57, Alex Williamson wrote:
> On Tue,  7 Mar 2023 02:02:52 +0000
> Joao Martins <joao.m.martins@oracle.com> wrote:
> 
>> According to the device DMA logging uAPI, IOVA ranges to be logged by
>> the device must be provided all at once upon DMA logging start.
>>
>> As preparation for the following patches which will add device dirty
>> page tracking, keep a record of all DMA mapped IOVA ranges so later they
>> can be used for DMA logging start.
>>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>>  hw/vfio/common.c              | 76 +++++++++++++++++++++++++++++++++++
>>  hw/vfio/trace-events          |  1 +
>>  include/hw/vfio/vfio-common.h | 13 ++++++
>>  3 files changed, 90 insertions(+)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 3a6491dbc523..a9b1fc999121 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -1334,11 +1334,87 @@ static int vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
>>      return ret;
>>  }
>>  
>> +static void vfio_dirty_tracking_update(MemoryListener *listener,
>> +                                       MemoryRegionSection *section)
>> +{
>> +    VFIODirtyRanges *dirty = container_of(listener, VFIODirtyRanges, listener);
>> +    VFIODirtyTrackingRange *range = &dirty->ranges;
>> +    hwaddr max32 = UINT32_MAX - 1ULL;
> 
> The -1 is wrong here, UINT32_MAX is (2^32 - 1)
> 
Ugh, what a distraction.

The reason it worked in my tests is that there's a hole at the boundary, so
being off by one didn't change the end range.

>> +    hwaddr iova, end;
>> +
>> +    if (!vfio_listener_valid_section(section) ||
>> +        !vfio_get_section_iova_range(dirty->container, section,
>> +                                     &iova, &end, NULL)) {
>> +        return;
>> +    }
>> +
>> +    /*
>> +     * The address space passed to the dirty tracker is reduced to two ranges:
>> +     * one for 32-bit DMA ranges, and another one for 64-bit DMA ranges.
>> +     * The underlying reports of dirty will query a sub-interval of each of
>> +     * these ranges.
>> +     *
>> +     * The purpose of the dual range handling is to handle known cases of big
>> +     * holes in the address space, like the x86 AMD 1T hole. The alternative
>> +     * would be an IOVATree but that has a much bigger runtime overhead and
>> +     * unnecessary complexity.
>> +     */
>> +    if (iova < max32 && end <= max32) {
> 
> Nit, the first test is redundant, iova is necessarily less than end.
>

I'll delete it.

>> +        if (range->min32 > iova) {
>> +            range->min32 = iova;
>> +        }
>> +        if (range->max32 < end) {
>> +            range->max32 = end;
>> +        }
>> +        trace_vfio_device_dirty_tracking_update(iova, end,
>> +                                    range->min32, range->max32);
>> +    } else {
>> +        if (!range->min64 || range->min64 > iova) {
> 
> The first test should be removed, we're initializing min64 to a
> non-zero value now, so if it's zero it's been set and we can't
> de-prioritize that set value.
> 
Distraction again; I was sure I had removed them all. And the test is pretty
useless, as it will never be true.
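
With those three points addressed, the update path would end up roughly like
this (sketch only, for the v5 respin):

hwaddr max32 = UINT32_MAX;
...
if (end <= max32) {
    if (range->min32 > iova) {
        range->min32 = iova;
    }
    if (range->max32 < end) {
        range->max32 = end;
    }
} else {
    if (range->min64 > iova) {
        range->min64 = iova;
    }
    if (range->max64 < end) {
        range->max64 = end;
    }
}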

>> +            range->min64 = iova;
>> +        }
>> +        if (range->max64 < end) {
>> +            range->max64 = end;
>> +        }
>> +        trace_vfio_device_dirty_tracking_update(iova, end,
>> +                                    range->min64, range->max64);
>> +    }
>> +
>> +    return;
>> +}
>> +
>> +static const MemoryListener vfio_dirty_tracking_listener = {
>> +    .name = "vfio-tracking",
>> +    .region_add = vfio_dirty_tracking_update,
>> +};
>> +
>> +static void vfio_dirty_tracking_init(VFIOContainer *container,
>> +                                     VFIODirtyRanges *dirty)
>> +{
>> +    memset(dirty, 0, sizeof(*dirty));
>> +    dirty->ranges.min32 = UINT32_MAX;
>> +    dirty->ranges.min64 = UINT64_MAX;
>> +    dirty->listener = vfio_dirty_tracking_listener;
>> +    dirty->container = container;
>> +
> 
> I was actually thinking the caller would just pass
> VFIODirtyTrackingRange and VFIODirtyRanges would be allocated on the
> stack here, perhaps both are defined private to this file, but this
> works and we can refine later if we so decide.  Thanks,
>
OK, I see what you mean. Since I have to respin v5, I'll fix this.
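
A rough sketch of what I have in mind for the respin (assuming the listener
wrapper struct becomes private to hw/vfio/common.c and keeps the current
field names):

static void vfio_dirty_tracking_init(VFIOContainer *container,
                                     VFIODirtyTrackingRange *tracking)
{
    VFIODirtyRanges dirty = {
        .container = container,
        .listener = vfio_dirty_tracking_listener,
        .ranges = {
            .min32 = UINT32_MAX,
            .min64 = UINT64_MAX,
        },
    };

    memory_listener_register(&dirty.listener, container->space->as);
    /* Synchronous listener: the ranges are final once registration returns. */
    memory_listener_unregister(&dirty.listener);
    *tracking = dirty.ranges;
}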

> 
>> +    memory_listener_register(&dirty->listener,
>> +                             container->space->as);
>> +
>> +    /*
>> +     * The memory listener is synchronous, and used to calculate the range
>> +     * to dirty tracking. Unregister it after we are done as we are not
>> +     * interested in any follow-up updates.
>> +     */
>> +    memory_listener_unregister(&dirty->listener);
>> +}
>> +
>>  static void vfio_listener_log_global_start(MemoryListener *listener)
>>  {
>>      VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>> +    VFIODirtyRanges dirty;
>>      int ret;
>>  
>> +    vfio_dirty_tracking_init(container, &dirty);
>> +
>>      ret = vfio_set_dirty_page_tracking(container, true);
>>      if (ret) {
>>          vfio_set_migration_error(ret);
>> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
>> index 669d9fe07cd9..d97a6de17921 100644
>> --- a/hw/vfio/trace-events
>> +++ b/hw/vfio/trace-events
>> @@ -104,6 +104,7 @@ vfio_known_safe_misalignment(const char *name, uint64_t iova, uint64_t offset_wi
>>  vfio_listener_region_add_no_dma_map(const char *name, uint64_t iova, uint64_t size, uint64_t page_size) "Region \"%s\" 0x%"PRIx64" size=0x%"PRIx64" is not aligned to 0x%"PRIx64" and cannot be mapped for DMA"
>>  vfio_listener_region_del_skip(uint64_t start, uint64_t end) "SKIPPING region_del 0x%"PRIx64" - 0x%"PRIx64
>>  vfio_listener_region_del(uint64_t start, uint64_t end) "region_del 0x%"PRIx64" - 0x%"PRIx64
>> +vfio_device_dirty_tracking_update(uint64_t start, uint64_t end, uint64_t min, uint64_t max) "section 0x%"PRIx64" - 0x%"PRIx64" -> update [0x%"PRIx64" - 0x%"PRIx64"]"
>>  vfio_disconnect_container(int fd) "close container->fd=%d"
>>  vfio_put_group(int fd) "close group->fd=%d"
>>  vfio_get_device(const char * name, unsigned int flags, unsigned int num_regions, unsigned int num_irqs) "Device %s flags: %u, regions: %u, irqs: %u"
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index 87524c64a443..0f84136cceb5 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -96,6 +96,19 @@ typedef struct VFIOContainer {
>>      QLIST_ENTRY(VFIOContainer) next;
>>  } VFIOContainer;
>>  
>> +typedef struct VFIODirtyTrackingRange {
>> +    hwaddr min32;
>> +    hwaddr max32;
>> +    hwaddr min64;
>> +    hwaddr max64;
>> +} VFIODirtyTrackingRange;
>> +
>> +typedef struct VFIODirtyRanges {
>> +    VFIOContainer *container;
>> +    VFIODirtyTrackingRange ranges;
>> +    MemoryListener listener;
>> +} VFIODirtyRanges;
>> +
>>  typedef struct VFIOGuestIOMMU {
>>      VFIOContainer *container;
>>      IOMMUMemoryRegion *iommu_mr;
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 04/14] vfio/common: Add VFIOBitmap and alloc function
  2023-03-07  8:49   ` Avihai Horon
@ 2023-03-07 10:17     ` Joao Martins
  0 siblings, 0 replies; 38+ messages in thread
From: Joao Martins @ 2023-03-07 10:17 UTC (permalink / raw)
  To: Avihai Horon, qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta

On 07/03/2023 08:49, Avihai Horon wrote:
> 
> On 07/03/2023 4:02, Joao Martins wrote:
>>
>>
>> From: Avihai Horon <avihaih@nvidia.com>
>>
>> There are already two places where dirty page bitmap allocation and
>> calculations are done in open code. With device dirty page tracking
>> being added in next patches, there are going to be even more places.
>>
>> To avoid code duplication, introduce VFIOBitmap struct and corresponding
>> alloc function and use them where applicable.
> 
> Nit, after splitting the series we still have only two places where we alloc
> dirty page bitmap.
> 
> So we can drop the second sentence:
> 
> There are two places where dirty page bitmap allocation and calculations
> are done in open code.
> 
> To avoid code duplication, introduce VFIOBitmap struct and corresponding
> alloc function and use them where applicable.
> 
Fixed, thanks!

> Thanks.
> 
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> Reviewed-by: Cédric Le Goater <clg@redhat.com>
>> ---
>>   hw/vfio/common.c | 73 +++++++++++++++++++++++++++++-------------------
>>   1 file changed, 44 insertions(+), 29 deletions(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 4c801513136a..cec3de08d2b4 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -320,6 +320,25 @@ const MemoryRegionOps vfio_region_ops = {
>>    * Device state interfaces
>>    */
>>
>> +typedef struct {
>> +    unsigned long *bitmap;
>> +    hwaddr size;
>> +    hwaddr pages;
>> +} VFIOBitmap;
>> +
>> +static int vfio_bitmap_alloc(VFIOBitmap *vbmap, hwaddr size)
>> +{
>> +    vbmap->pages = REAL_HOST_PAGE_ALIGN(size) / qemu_real_host_page_size();
>> +    vbmap->size = ROUND_UP(vbmap->pages, sizeof(__u64) * BITS_PER_BYTE) /
>> +                                         BITS_PER_BYTE;
>> +    vbmap->bitmap = g_try_malloc0(vbmap->size);
>> +    if (!vbmap->bitmap) {
>> +        return -ENOMEM;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>>   bool vfio_mig_active(void)
>>   {
>>       VFIOGroup *group;
>> @@ -468,9 +487,14 @@ static int vfio_dma_unmap_bitmap(VFIOContainer *container,
>>   {
>>       struct vfio_iommu_type1_dma_unmap *unmap;
>>       struct vfio_bitmap *bitmap;
>> -    uint64_t pages = REAL_HOST_PAGE_ALIGN(size) / qemu_real_host_page_size();
>> +    VFIOBitmap vbmap;
>>       int ret;
>>
>> +    ret = vfio_bitmap_alloc(&vbmap, size);
>> +    if (ret) {
>> +        return ret;
>> +    }
>> +
>>       unmap = g_malloc0(sizeof(*unmap) + sizeof(*bitmap));
>>
>>       unmap->argsz = sizeof(*unmap) + sizeof(*bitmap);
>> @@ -484,35 +508,28 @@ static int vfio_dma_unmap_bitmap(VFIOContainer *container,
>>        * qemu_real_host_page_size to mark those dirty. Hence set bitmap_pgsize
>>        * to qemu_real_host_page_size.
>>        */
>> -
>>       bitmap->pgsize = qemu_real_host_page_size();
>> -    bitmap->size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) /
>> -                   BITS_PER_BYTE;
>> +    bitmap->size = vbmap.size;
>> +    bitmap->data = (__u64 *)vbmap.bitmap;
>>
>> -    if (bitmap->size > container->max_dirty_bitmap_size) {
>> -        error_report("UNMAP: Size of bitmap too big 0x%"PRIx64,
>> -                     (uint64_t)bitmap->size);
>> +    if (vbmap.size > container->max_dirty_bitmap_size) {
>> +        error_report("UNMAP: Size of bitmap too big 0x%"PRIx64, vbmap.size);
>>           ret = -E2BIG;
>>           goto unmap_exit;
>>       }
>>
>> -    bitmap->data = g_try_malloc0(bitmap->size);
>> -    if (!bitmap->data) {
>> -        ret = -ENOMEM;
>> -        goto unmap_exit;
>> -    }
>> -
>>       ret = ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, unmap);
>>       if (!ret) {
>> -        cpu_physical_memory_set_dirty_lebitmap((unsigned long *)bitmap->data,
>> -                iotlb->translated_addr, pages);
>> +        cpu_physical_memory_set_dirty_lebitmap(vbmap.bitmap,
>> +                iotlb->translated_addr, vbmap.pages);
>>       } else {
>>           error_report("VFIO_UNMAP_DMA with DIRTY_BITMAP : %m");
>>       }
>>
>> -    g_free(bitmap->data);
>>   unmap_exit:
>>       g_free(unmap);
>> +    g_free(vbmap.bitmap);
>> +
>>       return ret;
>>   }
>>
>> @@ -1329,7 +1346,7 @@ static int vfio_get_dirty_bitmap(VFIOContainer
>> *container, uint64_t iova,
>>   {
>>       struct vfio_iommu_type1_dirty_bitmap *dbitmap;
>>       struct vfio_iommu_type1_dirty_bitmap_get *range;
>> -    uint64_t pages;
>> +    VFIOBitmap vbmap;
>>       int ret;
>>
>>       if (!container->dirty_pages_supported) {
>> @@ -1339,6 +1356,11 @@ static int vfio_get_dirty_bitmap(VFIOContainer
>> *container, uint64_t iova,
>>           return 0;
>>       }
>>
>> +    ret = vfio_bitmap_alloc(&vbmap, size);
>> +    if (ret) {
>> +        return ret;
>> +    }
>> +
>>       dbitmap = g_malloc0(sizeof(*dbitmap) + sizeof(*range));
>>
>>       dbitmap->argsz = sizeof(*dbitmap) + sizeof(*range);
>> @@ -1353,15 +1375,8 @@ static int vfio_get_dirty_bitmap(VFIOContainer
>> *container, uint64_t iova,
>>        * to qemu_real_host_page_size.
>>        */
>>       range->bitmap.pgsize = qemu_real_host_page_size();
>> -
>> -    pages = REAL_HOST_PAGE_ALIGN(range->size) / qemu_real_host_page_size();
>> -    range->bitmap.size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) /
>> -                                         BITS_PER_BYTE;
>> -    range->bitmap.data = g_try_malloc0(range->bitmap.size);
>> -    if (!range->bitmap.data) {
>> -        ret = -ENOMEM;
>> -        goto err_out;
>> -    }
>> +    range->bitmap.size = vbmap.size;
>> +    range->bitmap.data = (__u64 *)vbmap.bitmap;
>>
>>       ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, dbitmap);
>>       if (ret) {
>> @@ -1372,14 +1387,14 @@ static int vfio_get_dirty_bitmap(VFIOContainer
>> *container, uint64_t iova,
>>           goto err_out;
>>       }
>>
>> -    cpu_physical_memory_set_dirty_lebitmap((unsigned long *)range->bitmap.data,
>> -                                            ram_addr, pages);
>> +    cpu_physical_memory_set_dirty_lebitmap(vbmap.bitmap, ram_addr,
>> +                                           vbmap.pages);
>>
>>       trace_vfio_get_dirty_bitmap(container->fd, range->iova, range->size,
>>                                   range->bitmap.size, ram_addr);
>>   err_out:
>> -    g_free(range->bitmap.data);
>>       g_free(dbitmap);
>> +    g_free(vbmap.bitmap);
>>
>>       return ret;
>>   }
>> -- 
>> 2.17.2
>>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 05/14] vfio/common: Add helper to validate iova/end against hostwin
  2023-03-07  8:57   ` Avihai Horon
@ 2023-03-07 10:18     ` Joao Martins
  0 siblings, 0 replies; 38+ messages in thread
From: Joao Martins @ 2023-03-07 10:18 UTC (permalink / raw)
  To: Avihai Horon, qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta



On 07/03/2023 08:57, Avihai Horon wrote:
> 
> On 07/03/2023 4:02, Joao Martins wrote:
>>
>>
>> In preparation to be used in device dirty tracking, move the code that
>> finds the container host DMA window against a iova range.  This avoids
>> duplication on the common checks across listener callbacks.
> 
> Eventually this isn't used by device dirty tracking, so "In preparation to be
> used in device dirty tracking" can be dropped.
> 
Good catch; this has been here since the first version of the range checks,
which still had an over-complicated version of vfio_get_section_range(). I'll
remove it.

> Other than that, FWIW:
> 
> Reviewed-by: Avihai Horon <avihaih@nvidia.com>
> 
> Thanks.
> 
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> Reviewed-by: Cédric Le Goater <clg@redhat.com>
>> ---
>>   hw/vfio/common.c | 38 ++++++++++++++++++++------------------
>>   1 file changed, 20 insertions(+), 18 deletions(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index cec3de08d2b4..99acb998eb14 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -901,6 +901,22 @@ static void
>> vfio_unregister_ram_discard_listener(VFIOContainer *container,
>>       g_free(vrdl);
>>   }
>>
>> +static VFIOHostDMAWindow *vfio_find_hostwin(VFIOContainer *container,
>> +                                            hwaddr iova, hwaddr end)
>> +{
>> +    VFIOHostDMAWindow *hostwin;
>> +    bool hostwin_found = false;
>> +
>> +    QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
>> +        if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
>> +            hostwin_found = true;
>> +            break;
>> +        }
>> +    }
>> +
>> +    return hostwin_found ? hostwin : NULL;
>> +}
>> +
>>   static bool vfio_known_safe_misalignment(MemoryRegionSection *section)
>>   {
>>       MemoryRegion *mr = section->mr;
>> @@ -926,7 +942,6 @@ static void vfio_listener_region_add(MemoryListener
>> *listener,
>>       void *vaddr;
>>       int ret;
>>       VFIOHostDMAWindow *hostwin;
>> -    bool hostwin_found;
>>       Error *err = NULL;
>>
>>       if (vfio_listener_skipped_section(section)) {
>> @@ -1027,15 +1042,8 @@ static void vfio_listener_region_add(MemoryListener
>> *listener,
>>   #endif
>>       }
>>
>> -    hostwin_found = false;
>> -    QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
>> -        if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
>> -            hostwin_found = true;
>> -            break;
>> -        }
>> -    }
>> -
>> -    if (!hostwin_found) {
>> +    hostwin = vfio_find_hostwin(container, iova, end);
>> +    if (!hostwin) {
>>           error_setg(&err, "Container %p can't map guest IOVA region"
>>                      " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
>>           goto fail;
>> @@ -1237,15 +1245,9 @@ static void vfio_listener_region_del(MemoryListener
>> *listener,
>>       if (memory_region_is_ram_device(section->mr)) {
>>           hwaddr pgmask;
>>           VFIOHostDMAWindow *hostwin;
>> -        bool hostwin_found = false;
>>
>> -        QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
>> -            if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
>> -                hostwin_found = true;
>> -                break;
>> -            }
>> -        }
>> -        assert(hostwin_found); /* or region_add() would have failed */
>> +        hostwin = vfio_find_hostwin(container, iova, end);
>> +        assert(hostwin); /* or region_add() would have failed */
>>
>>           pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
>>           try_unmap = !((iova & pgmask) || (int128_get64(llsize) & pgmask));
>> -- 
>> 2.17.2
>>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 06/14] vfio/common: Consolidate skip/invalid section into helper
  2023-03-07  9:13   ` Avihai Horon
  2023-03-07  9:47     ` Cédric Le Goater
@ 2023-03-07 10:21     ` Joao Martins
  1 sibling, 0 replies; 38+ messages in thread
From: Joao Martins @ 2023-03-07 10:21 UTC (permalink / raw)
  To: Avihai Horon, qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta

On 07/03/2023 09:13, Avihai Horon wrote:
> On 07/03/2023 4:02, Joao Martins wrote:
>>
>>
>> The checks are replicated against region_add and region_del
>> and will be soon added in another memory listener dedicated
>> for dirty tracking.
>>
>> Move these into a new helper for avoid duplication.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> Reviewed-by: Cédric Le Goater <clg@redhat.com>
>> ---
>>   hw/vfio/common.c | 52 +++++++++++++++++++-----------------------------
>>   1 file changed, 21 insertions(+), 31 deletions(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 99acb998eb14..54b4a4fc7daf 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -933,23 +933,14 @@ static bool
>> vfio_known_safe_misalignment(MemoryRegionSection *section)
>>       return true;
>>   }
>>
>> -static void vfio_listener_region_add(MemoryListener *listener,
>> -                                     MemoryRegionSection *section)
>> +static bool vfio_listener_valid_section(MemoryRegionSection *section)
>>   {
>> -    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>> -    hwaddr iova, end;
>> -    Int128 llend, llsize;
>> -    void *vaddr;
>> -    int ret;
>> -    VFIOHostDMAWindow *hostwin;
>> -    Error *err = NULL;
>> -
>>       if (vfio_listener_skipped_section(section)) {
>>           trace_vfio_listener_region_add_skip(
>>                   section->offset_within_address_space,
>>                   section->offset_within_address_space +
>>                   int128_get64(int128_sub(section->size, int128_one())));
> 
> The original code uses two different traces depending on add or del --
> trace_vfio_listener_region_{add,del}_skip.
> Should we combine the two traces into a single trace? If the distinction is
> important then maybe pass a flag or the caller name to indicate whether it's
> add, del or dirty tracking update?
> 
I should say that the way I distinguish the two is that there's a new
dma_tracking_update tracepoint which tells you where it came from, and there
are separate region_add/del tracepoints. So despite the name we won't get
confused, IMHO.

> But other than that:
> 
> Reviewed-by: Avihai Horon <avihaih@nvidia.com>
> Thanks!

> Thanks.
> 
>> -        return;
>> +        return false;
>>       }
>>
>>       if (unlikely((section->offset_within_address_space &
>> @@ -964,6 +955,24 @@ static void vfio_listener_region_add(MemoryListener
>> *listener,
>>                            section->offset_within_region,
>>                            qemu_real_host_page_size());
>>           }
>> +        return false;
>> +    }
>> +
>> +    return true;
>> +}
>> +
>> +static void vfio_listener_region_add(MemoryListener *listener,
>> +                                     MemoryRegionSection *section)
>> +{
>> +    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>> +    hwaddr iova, end;
>> +    Int128 llend, llsize;
>> +    void *vaddr;
>> +    int ret;
>> +    VFIOHostDMAWindow *hostwin;
>> +    Error *err = NULL;
>> +
>> +    if (!vfio_listener_valid_section(section)) {
>>           return;
>>       }
>>
>> @@ -1182,26 +1191,7 @@ static void vfio_listener_region_del(MemoryListener
>> *listener,
>>       int ret;
>>       bool try_unmap = true;
>>
>> -    if (vfio_listener_skipped_section(section)) {
>> -        trace_vfio_listener_region_del_skip(
>> -                section->offset_within_address_space,
>> -                section->offset_within_address_space +
>> -                int128_get64(int128_sub(section->size, int128_one())));
>> -        return;
>> -    }
>> -
>> -    if (unlikely((section->offset_within_address_space &
>> -                  ~qemu_real_host_page_mask()) !=
>> -                 (section->offset_within_region &
>> ~qemu_real_host_page_mask()))) {
>> -        if (!vfio_known_safe_misalignment(section)) {
>> -            error_report("%s received unaligned region %s iova=0x%"PRIx64
>> -                         " offset_within_region=0x%"PRIx64
>> -                         " qemu_real_host_page_size=0x%"PRIxPTR,
>> -                         __func__, memory_region_name(section->mr),
>> -                         section->offset_within_address_space,
>> -                         section->offset_within_region,
>> -                         qemu_real_host_page_size());
>> -        }
>> +    if (!vfio_listener_valid_section(section)) {
>>           return;
>>       }
>>
>> -- 
>> 2.17.2
>>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 12/14] vfio/migration: Block migration with vIOMMU
  2023-03-07  2:02 ` [PATCH v4 12/14] vfio/migration: Block migration with vIOMMU Joao Martins
@ 2023-03-07 10:22   ` Cédric Le Goater
  2023-03-07 10:31     ` Joao Martins
  0 siblings, 1 reply; 38+ messages in thread
From: Cédric Le Goater @ 2023-03-07 10:22 UTC (permalink / raw)
  To: Joao Martins, qemu-devel
  Cc: Alex Williamson, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta, Avihai Horon

On 3/7/23 03:02, Joao Martins wrote:
> Migrating with vIOMMU will require either tracking maximum
> IOMMU supported address space (e.g. 39/48 address width on Intel)
> or range-track current mappings and dirty track the new ones
> post starting dirty tracking. This will be done as a separate
> series, so add a live migration blocker until that is fixed.
> 
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>

Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


> ---
>   hw/vfio/common.c              | 46 +++++++++++++++++++++++++++++++++++
>   hw/vfio/migration.c           |  5 ++++
>   hw/vfio/pci.c                 |  1 +
>   include/hw/vfio/vfio-common.h |  2 ++
>   4 files changed, 54 insertions(+)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 75b4902bbcc9..7278baa82f7d 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -362,6 +362,7 @@ bool vfio_mig_active(void)
>   }
>   
>   static Error *multiple_devices_migration_blocker;
> +static Error *giommu_migration_blocker;
>   
>   static unsigned int vfio_migratable_device_num(void)
>   {
> @@ -413,6 +414,51 @@ void vfio_unblock_multiple_devices_migration(void)
>       multiple_devices_migration_blocker = NULL;
>   }
>   
> +static bool vfio_viommu_preset(void)
> +{
> +    VFIOAddressSpace *space;
> +
> +    QLIST_FOREACH(space, &vfio_address_spaces, list) {
> +        if (space->as != &address_space_memory) {
> +            return true;
> +        }
> +    }
> +
> +    return false;
> +}
> +
> +int vfio_block_giommu_migration(Error **errp)
> +{
> +    int ret;
> +
> +    if (giommu_migration_blocker ||
> +        !vfio_viommu_preset()) {
> +        return 0;
> +    }
> +
> +    error_setg(&giommu_migration_blocker,
> +               "Migration is currently not supported with vIOMMU enabled");
> +    ret = migrate_add_blocker(giommu_migration_blocker, errp);
> +    if (ret < 0) {
> +        error_free(giommu_migration_blocker);
> +        giommu_migration_blocker = NULL;
> +    }
> +
> +    return ret;
> +}
> +
> +void vfio_unblock_giommu_migration(void)
> +{
> +    if (!giommu_migration_blocker ||
> +        vfio_viommu_preset()) {
> +        return;
> +    }
> +
> +    migrate_del_blocker(giommu_migration_blocker);
> +    error_free(giommu_migration_blocker);
> +    giommu_migration_blocker = NULL;
> +}
> +
>   static void vfio_set_migration_error(int err)
>   {
>       MigrationState *ms = migrate_get_current();
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index a2c3d9bade7f..776fd2d7cdf3 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -634,6 +634,11 @@ int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
>           return ret;
>       }
>   
> +    ret = vfio_block_giommu_migration(errp);
> +    if (ret) {
> +        return ret;
> +    }
> +
>       trace_vfio_migration_probe(vbasedev->name);
>       return 0;
>   
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 939dcc3d4a9e..30a271eab38c 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3185,6 +3185,7 @@ static void vfio_instance_finalize(Object *obj)
>        */
>       vfio_put_device(vdev);
>       vfio_put_group(group);
> +    vfio_unblock_giommu_migration();
>   }
>   
>   static void vfio_exitfn(PCIDevice *pdev)
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 7817ca7d8706..63f93ab54811 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -235,6 +235,8 @@ extern VFIOGroupList vfio_group_list;
>   bool vfio_mig_active(void);
>   int vfio_block_multiple_devices_migration(Error **errp);
>   void vfio_unblock_multiple_devices_migration(void);
> +int vfio_block_giommu_migration(Error **errp);
> +void vfio_unblock_giommu_migration(void);
>   int64_t vfio_mig_bytes_transferred(void);
>   
>   #ifdef CONFIG_LINUX



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 06/14] vfio/common: Consolidate skip/invalid section into helper
  2023-03-07  9:47     ` Cédric Le Goater
@ 2023-03-07 10:22       ` Joao Martins
  2023-03-07 11:00         ` Joao Martins
  0 siblings, 1 reply; 38+ messages in thread
From: Joao Martins @ 2023-03-07 10:22 UTC (permalink / raw)
  To: Cédric Le Goater, Avihai Horon, qemu-devel
  Cc: Alex Williamson, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta

On 07/03/2023 09:47, Cédric Le Goater wrote:
> On 3/7/23 10:13, Avihai Horon wrote:
>> On 07/03/2023 4:02, Joao Martins wrote:
>>>
>>> The checks are replicated against region_add and region_del
>>> and will be soon added in another memory listener dedicated
>>> for dirty tracking.
>>>
>>> Move these into a new helper for avoid duplication.
>>>
>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>> Reviewed-by: Cédric Le Goater <clg@redhat.com>
>>> ---
>>>   hw/vfio/common.c | 52 +++++++++++++++++++-----------------------------
>>>   1 file changed, 21 insertions(+), 31 deletions(-)
>>>
>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>> index 99acb998eb14..54b4a4fc7daf 100644
>>> --- a/hw/vfio/common.c
>>> +++ b/hw/vfio/common.c
>>> @@ -933,23 +933,14 @@ static bool
>>> vfio_known_safe_misalignment(MemoryRegionSection *section)
>>>       return true;
>>>   }
>>>
>>> -static void vfio_listener_region_add(MemoryListener *listener,
>>> -                                     MemoryRegionSection *section)
>>> +static bool vfio_listener_valid_section(MemoryRegionSection *section)
>>>   {
>>> -    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>>> -    hwaddr iova, end;
>>> -    Int128 llend, llsize;
>>> -    void *vaddr;
>>> -    int ret;
>>> -    VFIOHostDMAWindow *hostwin;
>>> -    Error *err = NULL;
>>> -
>>>       if (vfio_listener_skipped_section(section)) {
>>>           trace_vfio_listener_region_add_skip(
>>>                   section->offset_within_address_space,
>>>                   section->offset_within_address_space +
>>>                   int128_get64(int128_sub(section->size, int128_one())));
>>
>> The original code uses two different traces depending on add or del --
>> trace_vfio_listener_region_{add,del}_skip.
>> Should we combine the two traces into a single trace? If the distinction is
>> important then maybe pass a flag or the caller name to indicate whether it's
>> add, del or dirty tracking update?
> 
> I think introducing a new trace event 'trace_vfio_listener_region_skip'
> to replace 'trace_vfio_listener_region_add_skip' above should be enough.
> 
OK, I'll introduce a predecessor patch to change the name.
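
For reference, the hw/vfio/trace-events line would then presumably look
something like (hypothetical, name per Cédric's suggestion):

vfio_listener_region_skip(const char *name, uint64_t start, uint64_t end) "SKIPPING %s 0x%"PRIx64" - 0x%"PRIx64

with vfio_listener_valid_section() taking the caller name ("region_add" /
"region_del") so the existing add/del distinction is preserved.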

> Thanks,
> 
> C.
> 
>>
>> But other than that:
>>
>> Reviewed-by: Avihai Horon <avihaih@nvidia.com>
>>
>> Thanks.
>>
>>> -        return;
>>> +        return false;
>>>       }
>>>
>>>       if (unlikely((section->offset_within_address_space &
>>> @@ -964,6 +955,24 @@ static void vfio_listener_region_add(MemoryListener
>>> *listener,
>>>                            section->offset_within_region,
>>>                            qemu_real_host_page_size());
>>>           }
>>> +        return false;
>>> +    }
>>> +
>>> +    return true;
>>> +}
>>> +
>>> +static void vfio_listener_region_add(MemoryListener *listener,
>>> +                                     MemoryRegionSection *section)
>>> +{
>>> +    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>>> +    hwaddr iova, end;
>>> +    Int128 llend, llsize;
>>> +    void *vaddr;
>>> +    int ret;
>>> +    VFIOHostDMAWindow *hostwin;
>>> +    Error *err = NULL;
>>> +
>>> +    if (!vfio_listener_valid_section(section)) {
>>>           return;
>>>       }
>>>
>>> @@ -1182,26 +1191,7 @@ static void vfio_listener_region_del(MemoryListener
>>> *listener,
>>>       int ret;
>>>       bool try_unmap = true;
>>>
>>> -    if (vfio_listener_skipped_section(section)) {
>>> -        trace_vfio_listener_region_del_skip(
>>> -                section->offset_within_address_space,
>>> -                section->offset_within_address_space +
>>> -                int128_get64(int128_sub(section->size, int128_one())));
>>> -        return;
>>> -    }
>>> -
>>> -    if (unlikely((section->offset_within_address_space &
>>> -                  ~qemu_real_host_page_mask()) !=
>>> -                 (section->offset_within_region &
>>> ~qemu_real_host_page_mask()))) {
>>> -        if (!vfio_known_safe_misalignment(section)) {
>>> -            error_report("%s received unaligned region %s iova=0x%"PRIx64
>>> -                         " offset_within_region=0x%"PRIx64
>>> -                         " qemu_real_host_page_size=0x%"PRIxPTR,
>>> -                         __func__, memory_region_name(section->mr),
>>> -                         section->offset_within_address_space,
>>> -                         section->offset_within_region,
>>> -                         qemu_real_host_page_size());
>>> -        }
>>> +    if (!vfio_listener_valid_section(section)) {
>>>           return;
>>>       }
>>>
>>> -- 
>>> 2.17.2
>>>
>>
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 07/14] vfio/common: Add helper to consolidate iova/end calculation
  2023-03-07  9:52   ` Avihai Horon
@ 2023-03-07 10:26     ` Joao Martins
  0 siblings, 0 replies; 38+ messages in thread
From: Joao Martins @ 2023-03-07 10:26 UTC (permalink / raw)
  To: Avihai Horon, qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta



On 07/03/2023 09:52, Avihai Horon wrote:
> 
> On 07/03/2023 4:02, Joao Martins wrote:
>>
>>
>> In preparation to be used in device dirty tracking, move the code that
>> calculates a iova/end range from the container/section.  This avoids
>> duplication on the common checks across listener callbacks.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>>   hw/vfio/common.c | 37 ++++++++++++++++++++++++++++++-------
>>   1 file changed, 30 insertions(+), 7 deletions(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 54b4a4fc7daf..3a6491dbc523 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -961,6 +961,35 @@ static bool
>> vfio_listener_valid_section(MemoryRegionSection *section)
>>       return true;
>>   }
>>
>> +/*
>> + * Called for the dirty tracking memory listener to calculate the iova/end
>> + * for a given memory region section.
>> + */
> 
> Should we just delete this comment? The function name is pretty clear.
> Besides that, the comment is not completely accurate -- in this patch we are not
> using it yet for dirty tracking and it's also used for region_{add,del}.
> 
Yes, let me delete it.

> Thanks.
> 
>> +static bool vfio_get_section_iova_range(VFIOContainer *container,
>> +                                        MemoryRegionSection *section,
>> +                                        hwaddr *out_iova, hwaddr *out_end,
>> +                                        Int128 *out_llend)
>> +{
>> +    Int128 llend;
>> +    hwaddr iova;
>> +
>> +    iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space);
>> +    llend = int128_make64(section->offset_within_address_space);
>> +    llend = int128_add(llend, section->size);
>> +    llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask()));
>> +
>> +    if (int128_ge(int128_make64(iova), llend)) {
>> +        return false;
>> +    }
>> +
>> +    *out_iova = iova;
>> +    *out_end = int128_get64(int128_sub(llend, int128_one()));
>> +    if (out_llend) {
>> +        *out_llend = llend;
>> +    }
>> +    return true;
>> +}
>> +
>>   static void vfio_listener_region_add(MemoryListener *listener,
>>                                        MemoryRegionSection *section)
>>   {
>> @@ -976,12 +1005,7 @@ static void vfio_listener_region_add(MemoryListener
>> *listener,
>>           return;
>>       }
>>
>> -    iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space);
>> -    llend = int128_make64(section->offset_within_address_space);
>> -    llend = int128_add(llend, section->size);
>> -    llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask()));
>> -
>> -    if (int128_ge(int128_make64(iova), llend)) {
>> +    if (!vfio_get_section_iova_range(container, section, &iova, &end, &llend)) {
>>           if (memory_region_is_ram_device(section->mr)) {
>>               trace_vfio_listener_region_add_no_dma_map(
>>                   memory_region_name(section->mr),
>> @@ -991,7 +1015,6 @@ static void vfio_listener_region_add(MemoryListener
>> *listener,
>>           }
>>           return;
>>       }
>> -    end = int128_get64(int128_sub(llend, int128_one()));
>>
>>       if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
>>           hwaddr pgsize = 0;
>> -- 
>> 2.17.2
>>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 08/14] vfio/common: Record DMA mapped IOVA ranges
  2023-03-07 10:08     ` Cédric Le Goater
@ 2023-03-07 10:30       ` Joao Martins
  0 siblings, 0 replies; 38+ messages in thread
From: Joao Martins @ 2023-03-07 10:30 UTC (permalink / raw)
  To: Cédric Le Goater, Alex Williamson
  Cc: qemu-devel, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta, Avihai Horon

On 07/03/2023 10:08, Cédric Le Goater wrote:
> On 3/7/23 03:57, Alex Williamson wrote:
>> On Tue,  7 Mar 2023 02:02:52 +0000
>> Joao Martins <joao.m.martins@oracle.com> wrote:
>>> +static void vfio_dirty_tracking_init(VFIOContainer *container,
>>> +                                     VFIODirtyRanges *dirty)
>>> +{
>>> +    memset(dirty, 0, sizeof(*dirty));
>>> +    dirty->ranges.min32 = UINT32_MAX;
>>> +    dirty->ranges.min64 = UINT64_MAX;
>>> +    dirty->listener = vfio_dirty_tracking_listener;
>>> +    dirty->container = container;
>>> +
>>
>> I was actually thinking the caller would just pass
>> VFIODirtyTrackingRange and VFIODirtyRanges would be allocated on the
>> stack here, perhaps both are defined private to this file, but this
>> works and we can refine later if we so decide.  
> 
> It is true that vfio_devices_dma_logging_start() only needs
> a VFIODirtyTrackingRange struct and not the VFIODirtyRanges struct
> which is a temporary structure for the dirty ranges calculation.
> That would be nicer to have if you respin a v5.
> 
I can.

> I would rename VFIODirtyRanges to VFIODirtyRangesListener and
> VFIODirtyTrackingRange to VFIODirtyRanges.
> 
Better naming indeed.

> I am not sure they need to be in include/hw/vfio/vfio-common.h but
> that seems to be the VFIO practice.
> 
I can move them, as Alex also suggested. There are already
vfio_giommu_dirty_notifier and VFIOBitmap as private structures, and I don't
expect these will be used by other files.
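
For reference, something along these lines is what I have in mind (rough,
untested sketch using Cedric's naming; the real v5 may differ in details):

typedef struct VFIODirtyRanges {
    hwaddr min32;
    hwaddr max32;
    hwaddr min64;
    hwaddr max64;
} VFIODirtyRanges;

typedef struct VFIODirtyRangesListener {
    VFIOContainer *container;
    VFIODirtyRanges ranges;
    MemoryListener listener;
} VFIODirtyRangesListener;

static void vfio_dirty_tracking_init(VFIOContainer *container,
                                     VFIODirtyRanges *ranges)
{
    VFIODirtyRangesListener dirty;

    memset(&dirty, 0, sizeof(dirty));
    dirty.ranges.min32 = UINT32_MAX;
    dirty.ranges.min64 = UINT64_MAX;
    dirty.listener = vfio_dirty_tracking_listener;
    dirty.container = container;

    memory_listener_register(&dirty.listener, container->space->as);
    /*
     * The listener is synchronous, so by the time register returns the
     * ranges have been computed and it can be unregistered right away.
     */
    memory_listener_unregister(&dirty.listener);

    /* Hand back only the computed ranges; the listener itself is throwaway. */
    *ranges = dirty.ranges;
}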


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 12/14] vfio/migration: Block migration with vIOMMU
  2023-03-07 10:22   ` Cédric Le Goater
@ 2023-03-07 10:31     ` Joao Martins
  0 siblings, 0 replies; 38+ messages in thread
From: Joao Martins @ 2023-03-07 10:31 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: Alex Williamson, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta, Avihai Horon



On 07/03/2023 10:22, Cédric Le Goater wrote:
> On 3/7/23 03:02, Joao Martins wrote:
>> Migrating with vIOMMU will require either tracking maximum
>> IOMMU supported address space (e.g. 39/48 address width on Intel)
>> or range-track current mappings and dirty track the new ones
>> post starting dirty tracking. This will be done as a separate
>> series, so add a live migration blocker until that is fixed.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> 
> Reviewed-by: Cédric Le Goater <clg@redhat.com>
> 
Thanks!
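
(For the record, the blocker itself is conceptually a handful of lines; a
simplified, hypothetical sketch -- the function name and message are
placeholders, and the actual patch decides where it is registered and
released:)

static Error *giommu_migration_blocker;

static int vfio_block_giommu_migration(Error **errp)
{
    int ret;

    if (giommu_migration_blocker) {
        return 0;
    }

    error_setg(&giommu_migration_blocker,
               "Migration is currently not supported with vIOMMU enabled");
    ret = migrate_add_blocker(giommu_migration_blocker, errp);
    if (ret < 0) {
        /* Registration failed; drop the Error so it is not leaked. */
        error_free(giommu_migration_blocker);
        giommu_migration_blocker = NULL;
    }

    return ret;
}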


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 09/14] vfio/common: Add device dirty page tracking start/stop
  2023-03-07 10:14   ` Avihai Horon
@ 2023-03-07 10:31     ` Joao Martins
  0 siblings, 0 replies; 38+ messages in thread
From: Joao Martins @ 2023-03-07 10:31 UTC (permalink / raw)
  To: Avihai Horon
  Cc: Alex Williamson, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, qemu-devel

On 07/03/2023 10:14, Avihai Horon wrote:
> On 07/03/2023 4:02, Joao Martins wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> Add device dirty page tracking start/stop functionality. This uses the
>> device DMA logging uAPI to start and stop dirty page tracking by device.
>>
>> Device dirty page tracking is used only if all devices within a
>> container support device dirty page tracking.
>>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>>   hw/vfio/common.c              | 175 +++++++++++++++++++++++++++++++++-
>>   hw/vfio/trace-events          |   1 +
>>   include/hw/vfio/vfio-common.h |   2 +
>>   3 files changed, 173 insertions(+), 5 deletions(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index a9b1fc999121..a42f5f1e7ffe 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -450,6 +450,22 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer
>> *container)
>>       return true;
>>   }
>>
>> +static bool vfio_devices_all_device_dirty_tracking(VFIOContainer *container)
>> +{
>> +    VFIOGroup *group;
>> +    VFIODevice *vbasedev;
>> +
>> +    QLIST_FOREACH(group, &container->group_list, container_next) {
>> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
>> +            if (!vbasedev->dirty_pages_supported) {
>> +                return false;
>> +            }
>> +        }
>> +    }
>> +
>> +    return true;
>> +}
>> +
>>   /*
>>    * Check if all VFIO devices are running and migration is active, which is
>>    * essentially equivalent to the migration being in pre-copy phase.
>> @@ -1407,16 +1423,158 @@ static void vfio_dirty_tracking_init(VFIOContainer
>> *container,
>>       memory_listener_unregister(&dirty->listener);
>>   }
>>
>> +static void vfio_devices_dma_logging_stop(VFIOContainer *container)
>> +{
>> +    uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature),
>> +                              sizeof(uint64_t))] = {};
>> +    struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
>> +    VFIODevice *vbasedev;
>> +    VFIOGroup *group;
>> +    int ret = 0;
>> +
>> +    feature->argsz = sizeof(buf);
>> +    feature->flags = VFIO_DEVICE_FEATURE_SET |
>> +                     VFIO_DEVICE_FEATURE_DMA_LOGGING_STOP;
>> +
>> +    QLIST_FOREACH(group, &container->group_list, container_next) {
>> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
>> +            if (!vbasedev->dirty_tracking) {
>> +                continue;
>> +            }
>> +
>> +            ret = ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature);
>> +            if (ret) {
>> +                warn_report("%s: Failed to stop DMA logging, err %d (%s)",
>> +                             vbasedev->name, ret, strerror(errno));
>> +            }
> 
> Nit, no need for ret:
> 
> if (ioctl(...)) {
> }
> 
> And regardless, need to replace ret with -errno in warn_report:
> 
> warn_report("%s: Failed to stop DMA logging, err %d (%s)",
>             vbasedev->name, -errno, strerror(errno));
> 

I'll clean it up, thanks for the suggestion.
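
Something like this (untested sketch, just applying your suggestion):

            /* No need to keep the return value; report -errno directly. */
            if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
                warn_report("%s: Failed to stop DMA logging, err %d (%s)",
                            vbasedev->name, -errno, strerror(errno));
            }
            vbasedev->dirty_tracking = false;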

> Thanks.
> 
>> +            vbasedev->dirty_tracking = false;
>> +        }
>> +    }
>> +}
>> +
>> +static struct vfio_device_feature *
>> +vfio_device_feature_dma_logging_start_create(VFIOContainer *container,
>> +                                             VFIODirtyTrackingRange *tracking)
>> +{
>> +    struct vfio_device_feature *feature;
>> +    size_t feature_size;
>> +    struct vfio_device_feature_dma_logging_control *control;
>> +    struct vfio_device_feature_dma_logging_range *ranges;
>> +
>> +    feature_size = sizeof(struct vfio_device_feature) +
>> +                   sizeof(struct vfio_device_feature_dma_logging_control);
>> +    feature = g_try_malloc0(feature_size);
>> +    if (!feature) {
>> +        errno = ENOMEM;
>> +        return NULL;
>> +    }
>> +    feature->argsz = feature_size;
>> +    feature->flags = VFIO_DEVICE_FEATURE_SET |
>> +                     VFIO_DEVICE_FEATURE_DMA_LOGGING_START;
>> +
>> +    control = (struct vfio_device_feature_dma_logging_control *)feature->data;
>> +    control->page_size = qemu_real_host_page_size();
>> +
>> +    /*
>> +     * DMA logging uAPI guarantees to support at least a number of ranges that
>> +     * fits into a single host kernel base page.
>> +     */
>> +    control->num_ranges = !!tracking->max32 + !!tracking->max64;
>> +    ranges = g_try_new0(struct vfio_device_feature_dma_logging_range,
>> +                        control->num_ranges);
>> +    if (!ranges) {
>> +        g_free(feature);
>> +        errno = ENOMEM;
>> +
>> +        return NULL;
>> +    }
>> +
>> +    control->ranges = (__u64)(uintptr_t)ranges;
>> +    if (tracking->max32) {
>> +        ranges->iova = tracking->min32;
>> +        ranges->length = (tracking->max32 - tracking->min32) + 1;
>> +        ranges++;
>> +    }
>> +    if (tracking->max64) {
>> +        ranges->iova = tracking->min64;
>> +        ranges->length = (tracking->max64 - tracking->min64) + 1;
>> +    }
>> +
>> +    trace_vfio_device_dirty_tracking_start(control->num_ranges,
>> +                                           tracking->min32, tracking->max32,
>> +                                           tracking->min64, tracking->max64);
>> +
>> +    return feature;
>> +}
>> +
>> +static void vfio_device_feature_dma_logging_start_destroy(
>> +    struct vfio_device_feature *feature)
>> +{
>> +    struct vfio_device_feature_dma_logging_control *control =
>> +        (struct vfio_device_feature_dma_logging_control *)feature->data;
>> +    struct vfio_device_feature_dma_logging_range *ranges =
>> +        (struct vfio_device_feature_dma_logging_range *)(uintptr_t)
>> control->ranges;
>> +
>> +    g_free(ranges);
>> +    g_free(feature);
>> +}
>> +
>> +static int vfio_devices_dma_logging_start(VFIOContainer *container)
>> +{
>> +    struct vfio_device_feature *feature;
>> +    VFIODirtyRanges dirty;
>> +    VFIODevice *vbasedev;
>> +    VFIOGroup *group;
>> +    int ret = 0;
>> +
>> +    vfio_dirty_tracking_init(container, &dirty);
>> +    feature = vfio_device_feature_dma_logging_start_create(container,
>> +                                                           &dirty.ranges);
>> +    if (!feature) {
>> +        return -errno;
>> +    }
>> +
>> +    QLIST_FOREACH(group, &container->group_list, container_next) {
>> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
>> +            if (vbasedev->dirty_tracking) {
>> +                continue;
>> +            }
>> +
>> +            ret = ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature);
>> +            if (ret) {
>> +                ret = -errno;
>> +                error_report("%s: Failed to start DMA logging, err %d (%s)",
>> +                             vbasedev->name, ret, strerror(errno));
>> +                goto out;
>> +            }
>> +            vbasedev->dirty_tracking = true;
>> +        }
>> +    }
>> +
>> +out:
>> +    if (ret) {
>> +        vfio_devices_dma_logging_stop(container);
>> +    }
>> +
>> +    vfio_device_feature_dma_logging_start_destroy(feature);
>> +
>> +    return ret;
>> +}
>> +
>>   static void vfio_listener_log_global_start(MemoryListener *listener)
>>   {
>>       VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>> -    VFIODirtyRanges dirty;
>>       int ret;
>>
>> -    vfio_dirty_tracking_init(container, &dirty);
>> +    if (vfio_devices_all_device_dirty_tracking(container)) {
>> +        ret = vfio_devices_dma_logging_start(container);
>> +    } else {
>> +        ret = vfio_set_dirty_page_tracking(container, true);
>> +    }
>>
>> -    ret = vfio_set_dirty_page_tracking(container, true);
>>       if (ret) {
>> +        error_report("vfio: Could not start dirty page tracking, err: %d (%s)",
>> +                     ret, strerror(-ret));
>>           vfio_set_migration_error(ret);
>>       }
>>   }
>> @@ -1424,10 +1582,17 @@ static void
>> vfio_listener_log_global_start(MemoryListener *listener)
>>   static void vfio_listener_log_global_stop(MemoryListener *listener)
>>   {
>>       VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>> -    int ret;
>> +    int ret = 0;
>> +
>> +    if (vfio_devices_all_device_dirty_tracking(container)) {
>> +        vfio_devices_dma_logging_stop(container);
>> +    } else {
>> +        ret = vfio_set_dirty_page_tracking(container, false);
>> +    }
>>
>> -    ret = vfio_set_dirty_page_tracking(container, false);
>>       if (ret) {
>> +        error_report("vfio: Could not stop dirty page tracking, err: %d (%s)",
>> +                     ret, strerror(-ret));
>>           vfio_set_migration_error(ret);
>>       }
>>   }
>> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
>> index d97a6de17921..7a7e0cfe5b23 100644
>> --- a/hw/vfio/trace-events
>> +++ b/hw/vfio/trace-events
>> @@ -105,6 +105,7 @@ vfio_listener_region_add_no_dma_map(const char *name,
>> uint64_t iova, uint64_t si
>>   vfio_listener_region_del_skip(uint64_t start, uint64_t end) "SKIPPING
>> region_del 0x%"PRIx64" - 0x%"PRIx64
>>   vfio_listener_region_del(uint64_t start, uint64_t end) "region_del
>> 0x%"PRIx64" - 0x%"PRIx64
>>   vfio_device_dirty_tracking_update(uint64_t start, uint64_t end, uint64_t
>> min, uint64_t max) "section 0x%"PRIx64" - 0x%"PRIx64" -> update [0x%"PRIx64" -
>> 0x%"PRIx64"]"
>> +vfio_device_dirty_tracking_start(int nr_ranges, uint64_t min32, uint64_t
>> max32, uint64_t min64, uint64_t max64) "nr_ranges %d 32:[0x%"PRIx64" -
>> 0x%"PRIx64"], 64:[0x%"PRIx64" - 0x%"PRIx64"]"
>>   vfio_disconnect_container(int fd) "close container->fd=%d"
>>   vfio_put_group(int fd) "close group->fd=%d"
>>   vfio_get_device(const char * name, unsigned int flags, unsigned int
>> num_regions, unsigned int num_irqs) "Device %s flags: %u, regions: %u, irqs: %u"
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index 0f84136cceb5..7817ca7d8706 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -156,6 +156,8 @@ typedef struct VFIODevice {
>>       VFIOMigration *migration;
>>       Error *migration_blocker;
>>       OnOffAuto pre_copy_dirty_page_tracking;
>> +    bool dirty_pages_supported;
>> +    bool dirty_tracking;
>>   } VFIODevice;
>>
>>   struct VFIODeviceOps {
>> -- 
>> 2.17.2
>>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 06/14] vfio/common: Consolidate skip/invalid section into helper
  2023-03-07 10:22       ` Joao Martins
@ 2023-03-07 11:00         ` Joao Martins
  2023-03-07 11:07           ` Cédric Le Goater
  0 siblings, 1 reply; 38+ messages in thread
From: Joao Martins @ 2023-03-07 11:00 UTC (permalink / raw)
  To: Cédric Le Goater, Avihai Horon
  Cc: Alex Williamson, Yishai Hadas, qemu-devel, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta

On 07/03/2023 10:22, Joao Martins wrote:
> On 07/03/2023 09:47, Cédric Le Goater wrote:
>> On 3/7/23 10:13, Avihai Horon wrote:
>>> On 07/03/2023 4:02, Joao Martins wrote:
>>>> External email: Use caution opening links or attachments
>>>>
>>>> The checks are replicated between region_add and region_del
>>>> and will soon be added to another memory listener dedicated
>>>> to dirty tracking.
>>>>
>>>> Move these into a new helper to avoid duplication.
>>>>
>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>> Reviewed-by: Cédric Le Goater <clg@redhat.com>
>>>> ---
>>>>   hw/vfio/common.c | 52 +++++++++++++++++++-----------------------------
>>>>   1 file changed, 21 insertions(+), 31 deletions(-)
>>>>
>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>>> index 99acb998eb14..54b4a4fc7daf 100644
>>>> --- a/hw/vfio/common.c
>>>> +++ b/hw/vfio/common.c
>>>> @@ -933,23 +933,14 @@ static bool
>>>> vfio_known_safe_misalignment(MemoryRegionSection *section)
>>>>       return true;
>>>>   }
>>>>
>>>> -static void vfio_listener_region_add(MemoryListener *listener,
>>>> -                                     MemoryRegionSection *section)
>>>> +static bool vfio_listener_valid_section(MemoryRegionSection *section)
>>>>   {
>>>> -    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>>>> -    hwaddr iova, end;
>>>> -    Int128 llend, llsize;
>>>> -    void *vaddr;
>>>> -    int ret;
>>>> -    VFIOHostDMAWindow *hostwin;
>>>> -    Error *err = NULL;
>>>> -
>>>>       if (vfio_listener_skipped_section(section)) {
>>>>           trace_vfio_listener_region_add_skip(
>>>>                   section->offset_within_address_space,
>>>>                   section->offset_within_address_space +
>>>>                   int128_get64(int128_sub(section->size, int128_one())));
>>>
>>> The original code uses two different traces depending on add or del --
>>> trace_vfio_listener_region_{add,del}_skip.
>>> Should we combine the two traces into a single trace? If the distinction is
>>> important then maybe pass a flag or the caller name to indicate whether it's
>>> add, del or dirty tracking update?
>>
>> I think introducing a new trace event 'trace_vfio_listener_region_skip'
>> to replace 'trace_vfio_listener_region_add_skip' above should be enough.
>>
> OK, I'll introduce a predecessor patch to change the name.
> 

Albeit this trace_vfio_listener_region_skip will have a new argument that the
caller passes, e.g. region_add, region_del, tracking_update.
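
i.e. roughly (sketch only; exact event name and argument still to be settled):

    # trace-events
    vfio_listener_region_skip(const char *name, uint64_t start, uint64_t end) "SKIPPING %s 0x%"PRIx64" - 0x%"PRIx64

and at the call sites:

    trace_vfio_listener_region_skip("region_add",
            section->offset_within_address_space,
            section->offset_within_address_space +
            int128_get64(int128_sub(section->size, int128_one())));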


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 06/14] vfio/common: Consolidate skip/invalid section into helper
  2023-03-07 11:00         ` Joao Martins
@ 2023-03-07 11:07           ` Cédric Le Goater
  0 siblings, 0 replies; 38+ messages in thread
From: Cédric Le Goater @ 2023-03-07 11:07 UTC (permalink / raw)
  To: Joao Martins, Cédric Le Goater, Avihai Horon
  Cc: Alex Williamson, Yishai Hadas, qemu-devel, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta

On 3/7/23 12:00, Joao Martins wrote:
> On 07/03/2023 10:22, Joao Martins wrote:
>> On 07/03/2023 09:47, Cédric Le Goater wrote:
>>> On 3/7/23 10:13, Avihai Horon wrote:
>>>> On 07/03/2023 4:02, Joao Martins wrote:
>>>>> External email: Use caution opening links or attachments
>>>>>
>>>>> The checks are replicated between region_add and region_del
>>>>> and will soon be added to another memory listener dedicated
>>>>> to dirty tracking.
>>>>>
>>>>> Move these into a new helper to avoid duplication.
>>>>>
>>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>>> Reviewed-by: Cédric Le Goater <clg@redhat.com>
>>>>> ---
>>>>>    hw/vfio/common.c | 52 +++++++++++++++++++-----------------------------
>>>>>    1 file changed, 21 insertions(+), 31 deletions(-)
>>>>>
>>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>>>> index 99acb998eb14..54b4a4fc7daf 100644
>>>>> --- a/hw/vfio/common.c
>>>>> +++ b/hw/vfio/common.c
>>>>> @@ -933,23 +933,14 @@ static bool
>>>>> vfio_known_safe_misalignment(MemoryRegionSection *section)
>>>>>        return true;
>>>>>    }
>>>>>
>>>>> -static void vfio_listener_region_add(MemoryListener *listener,
>>>>> -                                     MemoryRegionSection *section)
>>>>> +static bool vfio_listener_valid_section(MemoryRegionSection *section)
>>>>>    {
>>>>> -    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>>>>> -    hwaddr iova, end;
>>>>> -    Int128 llend, llsize;
>>>>> -    void *vaddr;
>>>>> -    int ret;
>>>>> -    VFIOHostDMAWindow *hostwin;
>>>>> -    Error *err = NULL;
>>>>> -
>>>>>        if (vfio_listener_skipped_section(section)) {
>>>>>            trace_vfio_listener_region_add_skip(
>>>>>                    section->offset_within_address_space,
>>>>>                    section->offset_within_address_space +
>>>>>                    int128_get64(int128_sub(section->size, int128_one())));
>>>>
>>>> The original code uses two different traces depending on add or del --
>>>> trace_vfio_listener_region_{add,del}_skip.
>>>> Should we combine the two traces into a single trace? If the distinction is
>>>> important then maybe pass a flag or the caller name to indicate whether it's
>>>> add, del or dirty tracking update?
>>>
>>> I think introducing a new trace event 'trace_vfio_listener_region_skip'
>>> to replace 'trace_vfio_listener_region_add_skip' above should be enough.
>>>
>> OK, I'll introduce a predecessor patch to change the name.
>>
> 
> Albeit this trace_vfio_listener_region_skip will have a new argument which the
> caller passes e.g. region_add, region_skip, tracking_update.

Yes, that's fine. The important part is to be able to select a family
of events with '-trace vfio_listener_region*'.

Thanks,

C.



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 08/14] vfio/common: Record DMA mapped IOVA ranges
  2023-03-07 10:16     ` Joao Martins
@ 2023-03-07 12:13       ` Joao Martins
  0 siblings, 0 replies; 38+ messages in thread
From: Joao Martins @ 2023-03-07 12:13 UTC (permalink / raw)
  To: Alex Williamson
  Cc: qemu-devel, Cedric Le Goater, Yishai Hadas, Jason Gunthorpe,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon



On 07/03/2023 10:16, Joao Martins wrote:
> On 07/03/2023 02:57, Alex Williamson wrote:
>> On Tue,  7 Mar 2023 02:02:52 +0000
>> Joao Martins <joao.m.martins@oracle.com> wrote:
>>
>>> According to the device DMA logging uAPI, IOVA ranges to be logged by
>>> the device must be provided all at once upon DMA logging start.
>>>
>>> As preparation for the following patches which will add device dirty
>>> page tracking, keep a record of all DMA mapped IOVA ranges so later they
>>> can be used for DMA logging start.
>>>
>>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>> ---
>>>  hw/vfio/common.c              | 76 +++++++++++++++++++++++++++++++++++
>>>  hw/vfio/trace-events          |  1 +
>>>  include/hw/vfio/vfio-common.h | 13 ++++++
>>>  3 files changed, 90 insertions(+)
>>>
>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>> index 3a6491dbc523..a9b1fc999121 100644
>>> --- a/hw/vfio/common.c
>>> +++ b/hw/vfio/common.c
>>> @@ -1334,11 +1334,87 @@ static int vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
>>>      return ret;
>>>  }
>>>  
>>> +static void vfio_dirty_tracking_update(MemoryListener *listener,
>>> +                                       MemoryRegionSection *section)
>>> +{
>>> +    VFIODirtyRanges *dirty = container_of(listener, VFIODirtyRanges, listener);
>>> +    VFIODirtyTrackingRange *range = &dirty->ranges;
>>> +    hwaddr max32 = UINT32_MAX - 1ULL;
>>
>> The -1 is wrong here, UINT32_MAX is (2^32 - 1)
>>
> Ugh, what a distraction.
> 
> The reason it worked in my tests is that there's a hole at the boundary,
> so being off by one didn't change the end of the range.
> 
>>> +    hwaddr iova, end;
>>> +
>>> +    if (!vfio_listener_valid_section(section) ||
>>> +        !vfio_get_section_iova_range(dirty->container, section,
>>> +                                     &iova, &end, NULL)) {
>>> +        return;
>>> +    }
>>> +
>>> +    /*
>>> +     * The address space passed to the dirty tracker is reduced to two ranges:
>>> +     * one for 32-bit DMA ranges, and another one for 64-bit DMA ranges.
>>> +     * The underlying reports of dirty will query a sub-interval of each of
>>> +     * these ranges.
>>> +     *
>>> +     * The purpose of the dual range handling is to handle known cases of big
>>> +     * holes in the address space, like the x86 AMD 1T hole. The alternative
>>> +     * would be an IOVATree but that has a much bigger runtime overhead and
>>> +     * unnecessary complexity.
>>> +     */
>>> +    if (iova < max32 && end <= max32) {
>>
>> Nit, the first test is redundant, iova is necessarily less than end.
>>
> 
> I'll delete it.
> 
>>> +        if (range->min32 > iova) {
>>> +            range->min32 = iova;
>>> +        }
>>> +        if (range->max32 < end) {
>>> +            range->max32 = end;
>>> +        }
>>> +        trace_vfio_device_dirty_tracking_update(iova, end,
>>> +                                    range->min32, range->max32);
>>> +    } else {
>>> +        if (!range->min64 || range->min64 > iova) {
>>
>> The first test should be removed, we're initializing min64 to a
>> non-zero value now, so if it's zero it's been set and we can't
>> de-prioritize that set value.
>>
> Distraction again, I was sure I had removed them all. And the test is pretty
> useless, as this will never be true.
> 

Meanwhile I rewrote that as below for readability. It drops one level of
indentation and reads better to me, with less repetition of checks:
we select the min/max pair and use it to update the tracking range.

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index a03a2fdfafc5..2ba8fa9043d2 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1329,8 +1329,7 @@ static void vfio_dirty_tracking_update(MemoryListener *listener,
 {
     VFIODirtyRanges *dirty = container_of(listener, VFIODirtyRanges, listener);
     VFIODirtyTrackingRange *range = &dirty->ranges;
-    hwaddr max32 = UINT32_MAX - 1ULL;
-    hwaddr iova, end;
+    hwaddr iova, end, *min, *max;

     if (!vfio_listener_valid_section(section) ||
         !vfio_get_section_iova_range(dirty->container, section,
@@ -1349,26 +1348,17 @@ static void vfio_dirty_tracking_update(MemoryListener *listener,
      * would be an IOVATree but that has a much bigger runtime overhead and
      * unnecessary complexity.
      */
-    if (iova < max32 && end <= max32) {
-        if (range->min32 > iova) {
-            range->min32 = iova;
-        }
-        if (range->max32 < end) {
-            range->max32 = end;
-        }
-        trace_vfio_device_dirty_tracking_update(iova, end,
-                                    range->min32, range->max32);
-    } else {
-        if (!range->min64 || range->min64 > iova) {
-            range->min64 = iova;
-        }
-        if (range->max64 < end) {
-            range->max64 = end;
-        }
-        trace_vfio_device_dirty_tracking_update(iova, end,
-                                    range->min64, range->max64);
+    min = (end <= UINT32_MAX) ? &range->min32 : &range->min64;
+    max = (end <= UINT32_MAX) ? &range->max32 : &range->max64;
+
+    if (*min > iova) {
+        *min = iova;
+    }
+    if (*max < end) {
+        *max = end;
     }

+    trace_vfio_device_dirty_tracking_update(iova, end, *min, *max);
     return;
 }

>>> +            range->min64 = iova;
>>> +        }
>>> +        if (range->max64 < end) {
>>> +            range->max64 = end;
>>> +        }
>>> +        trace_vfio_device_dirty_tracking_update(iova, end,
>>> +                                    range->min64, range->max64);
>>> +    }
>>> +
>>> +    return;
>>> +}
>>> +
>>> +static const MemoryListener vfio_dirty_tracking_listener = {
>>> +    .name = "vfio-tracking",
>>> +    .region_add = vfio_dirty_tracking_update,
>>> +};
>>> +
>>> +static void vfio_dirty_tracking_init(VFIOContainer *container,
>>> +                                     VFIODirtyRanges *dirty)
>>> +{
>>> +    memset(dirty, 0, sizeof(*dirty));
>>> +    dirty->ranges.min32 = UINT32_MAX;
>>> +    dirty->ranges.min64 = UINT64_MAX;
>>> +    dirty->listener = vfio_dirty_tracking_listener;
>>> +    dirty->container = container;
>>> +
>>
>> I was actually thinking the caller would just pass
>> VFIODirtyTrackingRange and VFIODirtyRanges would be allocated on the
>> stack here, perhaps both are defined private to this file, but this
>> works and we can refine later if we so decide.  Thanks,
>>
> OK, I see what you mean. Since I have to respin v5, I'll fix this.
> 

I've done this, and made the declarations private (using Cedric's naming
suggestions).

>>
>>> +    memory_listener_register(&dirty->listener,
>>> +                             container->space->as);
>>> +
>>> +    /*
>>> +     * The memory listener is synchronous, and used to calculate the range
>>> +     * to dirty tracking. Unregister it after we are done as we are not
>>> +     * interested in any follow-up updates.
>>> +     */
>>> +    memory_listener_unregister(&dirty->listener);
>>> +}
>>> +
>>>  static void vfio_listener_log_global_start(MemoryListener *listener)
>>>  {
>>>      VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>>> +    VFIODirtyRanges dirty;
>>>      int ret;
>>>  
>>> +    vfio_dirty_tracking_init(container, &dirty);
>>> +
>>>      ret = vfio_set_dirty_page_tracking(container, true);
>>>      if (ret) {
>>>          vfio_set_migration_error(ret);
>>> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
>>> index 669d9fe07cd9..d97a6de17921 100644
>>> --- a/hw/vfio/trace-events
>>> +++ b/hw/vfio/trace-events
>>> @@ -104,6 +104,7 @@ vfio_known_safe_misalignment(const char *name, uint64_t iova, uint64_t offset_wi
>>>  vfio_listener_region_add_no_dma_map(const char *name, uint64_t iova, uint64_t size, uint64_t page_size) "Region \"%s\" 0x%"PRIx64" size=0x%"PRIx64" is not aligned to 0x%"PRIx64" and cannot be mapped for DMA"
>>>  vfio_listener_region_del_skip(uint64_t start, uint64_t end) "SKIPPING region_del 0x%"PRIx64" - 0x%"PRIx64
>>>  vfio_listener_region_del(uint64_t start, uint64_t end) "region_del 0x%"PRIx64" - 0x%"PRIx64
>>> +vfio_device_dirty_tracking_update(uint64_t start, uint64_t end, uint64_t min, uint64_t max) "section 0x%"PRIx64" - 0x%"PRIx64" -> update [0x%"PRIx64" - 0x%"PRIx64"]"
>>>  vfio_disconnect_container(int fd) "close container->fd=%d"
>>>  vfio_put_group(int fd) "close group->fd=%d"
>>>  vfio_get_device(const char * name, unsigned int flags, unsigned int num_regions, unsigned int num_irqs) "Device %s flags: %u, regions: %u, irqs: %u"
>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>> index 87524c64a443..0f84136cceb5 100644
>>> --- a/include/hw/vfio/vfio-common.h
>>> +++ b/include/hw/vfio/vfio-common.h
>>> @@ -96,6 +96,19 @@ typedef struct VFIOContainer {
>>>      QLIST_ENTRY(VFIOContainer) next;
>>>  } VFIOContainer;
>>>  
>>> +typedef struct VFIODirtyTrackingRange {
>>> +    hwaddr min32;
>>> +    hwaddr max32;
>>> +    hwaddr min64;
>>> +    hwaddr max64;
>>> +} VFIODirtyTrackingRange;
>>> +
>>> +typedef struct VFIODirtyRanges {
>>> +    VFIOContainer *container;
>>> +    VFIODirtyTrackingRange ranges;
>>> +    MemoryListener listener;
>>> +} VFIODirtyRanges;
>>> +
>>>  typedef struct VFIOGuestIOMMU {
>>>      VFIOContainer *container;
>>>      IOMMUMemoryRegion *iommu_mr;
>>


^ permalink raw reply related	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2023-03-07 12:14 UTC | newest]

Thread overview: 38+ messages
2023-03-07  2:02 [PATCH v4 00/14] vfio/migration: Device dirty page tracking Joao Martins
2023-03-07  2:02 ` [PATCH v4 01/14] vfio/common: Fix error reporting in vfio_get_dirty_bitmap() Joao Martins
2023-03-07  2:02 ` [PATCH v4 02/14] vfio/common: Fix wrong %m usages Joao Martins
2023-03-07  2:02 ` [PATCH v4 03/14] vfio/common: Abort migration if dirty log start/stop/sync fails Joao Martins
2023-03-07  2:02 ` [PATCH v4 04/14] vfio/common: Add VFIOBitmap and alloc function Joao Martins
2023-03-07  8:49   ` Avihai Horon
2023-03-07 10:17     ` Joao Martins
2023-03-07  2:02 ` [PATCH v4 05/14] vfio/common: Add helper to validate iova/end against hostwin Joao Martins
2023-03-07  8:57   ` Avihai Horon
2023-03-07 10:18     ` Joao Martins
2023-03-07  2:02 ` [PATCH v4 06/14] vfio/common: Consolidate skip/invalid section into helper Joao Martins
2023-03-07  9:13   ` Avihai Horon
2023-03-07  9:47     ` Cédric Le Goater
2023-03-07 10:22       ` Joao Martins
2023-03-07 11:00         ` Joao Martins
2023-03-07 11:07           ` Cédric Le Goater
2023-03-07 10:21     ` Joao Martins
2023-03-07  2:02 ` [PATCH v4 07/14] vfio/common: Add helper to consolidate iova/end calculation Joao Martins
2023-03-07  2:40   ` Alex Williamson
2023-03-07 10:11     ` Joao Martins
2023-03-07  9:52   ` Avihai Horon
2023-03-07 10:26     ` Joao Martins
2023-03-07  2:02 ` [PATCH v4 08/14] vfio/common: Record DMA mapped IOVA ranges Joao Martins
2023-03-07  2:57   ` Alex Williamson
2023-03-07 10:08     ` Cédric Le Goater
2023-03-07 10:30       ` Joao Martins
2023-03-07 10:16     ` Joao Martins
2023-03-07 12:13       ` Joao Martins
2023-03-07  2:02 ` [PATCH v4 09/14] vfio/common: Add device dirty page tracking start/stop Joao Martins
2023-03-07 10:14   ` Avihai Horon
2023-03-07 10:31     ` Joao Martins
2023-03-07  2:02 ` [PATCH v4 10/14] vfio/common: Extract code from vfio_get_dirty_bitmap() to new function Joao Martins
2023-03-07  2:02 ` [PATCH v4 11/14] vfio/common: Add device dirty page bitmap sync Joao Martins
2023-03-07  2:02 ` [PATCH v4 12/14] vfio/migration: Block migration with vIOMMU Joao Martins
2023-03-07 10:22   ` Cédric Le Goater
2023-03-07 10:31     ` Joao Martins
2023-03-07  2:02 ` [PATCH v4 13/14] vfio/migration: Query device dirty page tracking support Joao Martins
2023-03-07  2:02 ` [PATCH v4 14/14] docs/devel: Document VFIO device dirty page tracking Joao Martins
