All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFCv2 0/8] vfio/iommufd: IOMMUFD Dirty Tracking
@ 2024-02-12 13:56 Joao Martins
  2024-02-12 13:56 ` [PATCH RFCv2 1/8] backends/iommufd: Introduce helper function iommufd_device_get_hw_capabilities() Joao Martins
                   ` (8 more replies)
  0 siblings, 9 replies; 37+ messages in thread
From: Joao Martins @ 2024-02-12 13:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
	Cedric Le Goater, Paolo Bonzini, Daniel P . Berrange,
	Eduardo Habkost, Eric Blake, Markus Armbruster, Jason Gunthorpe,
	Avihai Horon, Joao Martins

This small series adds support for Dirty Tracking in IOMMUFD backend.
The sole reason I still made it RFC is because of the second patch,
where we are implementing user-managed auto domains.

In essence it is quite similar to the original IOMMUFD series where we
would allocate a HWPT, until we switched later on into a IOAS attach.
Patch 2 goes into more detail, but the gist is that there's two modes of
using IOMMUFD and by keep using kernel managed auto domains we would end
up duplicating the same flags we have in HWPT but into the VFIO IOAS
attach. While it is true that just adding a flag is simpler, it also
creates duplication and motivates duplicate what hwpt-alloc already has.
But there's a chance I have the wrong expectation here, so any feedback
welcome.

The series is divided into:

* Patch 1: Adds a simple helper to get device capabilities;

* Patches 2 - 5: IOMMUFD backend support for dirty tracking;

The workflow is relatively simple:

1) Probe device and allow dirty tracking in the HWPT
2) Toggling dirty tracking on/off
3) Read-and-clear of Dirty IOVAs

The heuristics selected for (1) were to enable it *if* device supports
migration but doesn't support VF dirty tracking or IOMMU dirty tracking
is supported. The latter is for the hotplug case where we can add a device
without a tracker and thus still support migration.

The unmap case is deferred until further vIOMMU support with migration
is added[3] which will then introduce the usage of
IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR in GET_DIRTY_BITMAP ioctl in the
dma unmap bitmap flow.

* Patches 6-8: Add disabling of hugepages to allow tracking at base
page; avoid blocking live migration where there's no VF dirty
tracker, considering that we have IOMMU dirty tracking. And allow
disabling VF dirty tracker via qemu command line.

This series builds on top of Zhengzhong series[0], but only requires the
first 9 patches i.e. up to ("vfio/pci: Initialize host iommu device
instance after attachment")[1] that are more generic IOMMUFD device
plumbing, and doesn't require the nesting counterpart.

This is stored on github:
https://github.com/jpemartins/qemu/commits/iommufd-v5

Note: While Linux v6.7 has IOMMU dirty tracking feature, I suggest folks
use the latest for-rc of iommufd kernel tree as there's some fixes there.

Comments and feedback appreciated.

Cheers,
        Joao

Chances since RFCv1[2]:
* Remove intel/amd dirty tracking emulation enabling
* Remove the dirtyrate improvement for VF/IOMMU dirty tracking
[Will pursue these two in separate series]
* Introduce auto domains support
* Enforce dirty tracking following the IOMMUFD UAPI for this
* Add support for toggling hugepages in IOMMUFD
* Auto enable support when VF supports migration to use IOMMU
when it doesn't have VF dirty tracking
* Add a parameter to toggle VF dirty tracking

[0] https://lore.kernel.org/qemu-devel/20240201072818.327930-1-zhenzhong.duan@intel.com/
[1] https://lore.kernel.org/qemu-devel/20240201072818.327930-10-zhenzhong.duan@intel.com/
[2] https://lore.kernel.org/qemu-devel/20220428211351.3897-1-joao.m.martins@oracle.com/
[3] https://lore.kernel.org/qemu-devel/20230622214845.3980-1-joao.m.martins@oracle.com/

Joao Martins (8):
  backends/iommufd: Introduce helper function
    iommufd_device_get_hw_capabilities()
  vfio/iommufd: Introduce auto domain creation
  vfio/iommufd: Probe and request hwpt dirty tracking capability
  vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
  vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
  backends/iommufd: Add ability to disable hugepages
  vfio/migration: Don't block migration device dirty tracking is
    unsupported
  vfio/common: Allow disabling device dirty page tracking

 backends/iommufd.c            | 133 ++++++++++++++++++++++++++++
 backends/trace-events         |   4 +
 hw/vfio/common.c              |  32 ++++++-
 hw/vfio/iommufd.c             | 162 ++++++++++++++++++++++++++++++++++
 hw/vfio/migration.c           |   5 +-
 hw/vfio/pci.c                 |   3 +
 include/hw/vfio/vfio-common.h |  12 +++
 include/sysemu/iommufd.h      |  17 ++++
 qapi/qom.json                 |   2 +-
 9 files changed, 367 insertions(+), 3 deletions(-)

-- 
2.39.3



^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH RFCv2 1/8] backends/iommufd: Introduce helper function iommufd_device_get_hw_capabilities()
  2024-02-12 13:56 [PATCH RFCv2 0/8] vfio/iommufd: IOMMUFD Dirty Tracking Joao Martins
@ 2024-02-12 13:56 ` Joao Martins
  2024-02-26  7:29   ` Duan, Zhenzhong
  2024-02-12 13:56 ` [PATCH RFCv2 2/8] vfio/iommufd: Introduce auto domain creation Joao Martins
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 37+ messages in thread
From: Joao Martins @ 2024-02-12 13:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
	Cedric Le Goater, Paolo Bonzini, Daniel P . Berrange,
	Eduardo Habkost, Eric Blake, Markus Armbruster, Jason Gunthorpe,
	Avihai Horon, Joao Martins

The new helper will fetch vendor agnostic IOMMU capabilities supported
both by hardware and software. Right now it is only iommu dirty
tracking.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 backends/iommufd.c       | 25 +++++++++++++++++++++++++
 include/sysemu/iommufd.h |  2 ++
 2 files changed, 27 insertions(+)

diff --git a/backends/iommufd.c b/backends/iommufd.c
index d92791bba935..8486894f1b3f 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -237,3 +237,28 @@ void iommufd_device_init(IOMMUFDDevice *idev)
     host_iommu_base_device_init(&idev->base, HID_IOMMUFD,
                                 sizeof(IOMMUFDDevice));
 }
+
+int iommufd_device_get_hw_capabilities(IOMMUFDDevice *idev, uint64_t *caps,
+                                       Error **errp)
+{
+    struct iommu_hw_info info = {
+        .size = sizeof(info),
+        .flags = 0,
+        .dev_id = idev->devid,
+        .data_len = 0,
+        .__reserved = 0,
+        .data_uptr = 0,
+        .out_capabilities = 0,
+    };
+    int ret;
+
+    ret = ioctl(idev->iommufd->fd, IOMMU_GET_HW_INFO, &info);
+    if (ret) {
+        error_setg_errno(errp, errno,
+                         "Failed to get hardware info capabilities");
+    } else {
+        *caps = info.out_capabilities;
+    }
+
+    return ret;
+}
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index c3f346976039..4afe97307dbe 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -47,4 +47,6 @@ typedef struct IOMMUFDDevice {
 } IOMMUFDDevice;
 
 void iommufd_device_init(IOMMUFDDevice *idev);
+int iommufd_device_get_hw_capabilities(IOMMUFDDevice *idev, uint64_t *caps,
+                                       Error **errp);
 #endif
-- 
2.39.3



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH RFCv2 2/8] vfio/iommufd: Introduce auto domain creation
  2024-02-12 13:56 [PATCH RFCv2 0/8] vfio/iommufd: IOMMUFD Dirty Tracking Joao Martins
  2024-02-12 13:56 ` [PATCH RFCv2 1/8] backends/iommufd: Introduce helper function iommufd_device_get_hw_capabilities() Joao Martins
@ 2024-02-12 13:56 ` Joao Martins
  2024-02-12 16:27   ` Jason Gunthorpe
                     ` (2 more replies)
  2024-02-12 13:56 ` [PATCH RFCv2 3/8] vfio/iommufd: Probe and request hwpt dirty tracking capability Joao Martins
                   ` (6 subsequent siblings)
  8 siblings, 3 replies; 37+ messages in thread
From: Joao Martins @ 2024-02-12 13:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
	Cedric Le Goater, Paolo Bonzini, Daniel P . Berrange,
	Eduardo Habkost, Eric Blake, Markus Armbruster, Jason Gunthorpe,
	Avihai Horon, Joao Martins

There's generally two modes of operation for IOMMUFD:

* The simple user API which intends to perform relatively simple things
with IOMMUs e.g. DPDK. It generally creates an IOAS and attach to VFIO
and mainly performs IOAS_MAP and UNMAP.

* The native IOMMUFD API where you have fine grained control of the
IOMMU domain and model it accordingly. This is where most new feature
are being steered to.

For dirty tracking 2) is required, as it needs to ensure that
the stage-2/parent IOMMU domain will only attach devices
that support dirty tracking (so far it is all homogeneous in x86, likely
not the case for smmuv3). Such invariant on dirty tracking provides a
useful guarantee to VMMs that will refuse incompatible device
attachments for IOMMU domains.

For dirty tracking such property is enabled/enforced via HWPT_ALLOC,
which is responsible for creating an IOMMU domain. This is contrast to
the 'simple API' where the IOMMU domain is created by IOMMUFD
automatically when it attaches to VFIO (usually referred as autodomains)

To support dirty tracking with the advanced IOMMUFD API, it needs
similar logic, where IOMMU domains are created and devices attached to
compatible domains. Essentially mimmicing kernel
iommufd_device_auto_get_domain(). If this fails (i.e. mdevs) it falls back
to IOAS attach.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
Right now the only alternative to a userspace autodomains implementation
is to mimmicing all the flags being added to HWPT_ALLOC but into VFIO
IOAS attach. So opted for autodomains userspace approach to avoid the
duplication of hwpt-alloc flags vs attach-ioas flags. I lack mdev real
drivers atm, so testing with those is still TBD.

Opinions, comments, welcome!
---
 backends/iommufd.c            | 29 +++++++++++++
 backends/trace-events         |  1 +
 hw/vfio/iommufd.c             | 78 +++++++++++++++++++++++++++++++++++
 include/hw/vfio/vfio-common.h |  9 ++++
 include/sysemu/iommufd.h      |  4 ++
 5 files changed, 121 insertions(+)

diff --git a/backends/iommufd.c b/backends/iommufd.c
index 8486894f1b3f..2970135af4b9 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -211,6 +211,35 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
     return ret;
 }
 
+int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
+                               uint32_t pt_id, uint32_t flags,
+                               uint32_t data_type, uint32_t data_len,
+                               void *data_ptr, uint32_t *out_hwpt)
+{
+    int ret;
+    struct iommu_hwpt_alloc alloc_hwpt = {
+        .size = sizeof(struct iommu_hwpt_alloc),
+        .flags = flags,
+        .dev_id = dev_id,
+        .pt_id = pt_id,
+        .data_type = data_type,
+        .data_len = data_len,
+        .data_uptr = (uint64_t)data_ptr,
+        .__reserved = 0,
+    };
+
+    ret = ioctl(iommufd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
+    trace_iommufd_backend_alloc_hwpt(iommufd, dev_id, pt_id, flags, data_type,
+                                     data_len, (uint64_t)data_ptr,
+                                     alloc_hwpt.out_hwpt_id, ret);
+    if (ret) {
+        error_report("IOMMU_HWPT_ALLOC failed: %m");
+    } else {
+        *out_hwpt = alloc_hwpt.out_hwpt_id;
+    }
+    return !ret ? 0 : -errno;
+}
+
 static const TypeInfo iommufd_backend_info = {
     .name = TYPE_IOMMUFD_BACKEND,
     .parent = TYPE_OBJECT,
diff --git a/backends/trace-events b/backends/trace-events
index d45c6e31a67e..f83a276a4253 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -13,5 +13,6 @@ iommu_backend_set_fd(int fd) "pre-opened /dev/iommu fd=%d"
 iommufd_backend_map_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, void *vaddr, bool readonly, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" addr=%p readonly=%d (%d)"
 iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping: iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
 iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
+iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
 iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d ioas=%d (%d)"
 iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 7d39d7a5fa51..ca7ec45e725c 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -219,10 +219,82 @@ static int iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
     return ret;
 }
 
+static int iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
+                                        VFIOIOMMUFDContainer *container,
+                                        Error **errp)
+{
+    int iommufd = vbasedev->iommufd_dev.iommufd->fd;
+    VFIOIOASHwpt *hwpt;
+    Error *err = NULL;
+    int ret = -EINVAL;
+    uint32_t hwpt_id;
+
+    /* Try to find a domain */
+    QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
+        ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, &err);
+        if (ret) {
+            /* -EINVAL means the domain is incompatible with the device. */
+            if (ret == -EINVAL) {
+                continue;
+            }
+            return ret;
+        } else {
+            vbasedev->hwpt = hwpt;
+            QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
+            return 0;
+        }
+    }
+
+    ret = iommufd_backend_alloc_hwpt(iommufd,
+                                     vbasedev->iommufd_dev.devid,
+                                     container->ioas_id, 0, 0, 0,
+                                     NULL, &hwpt_id);
+    if (ret) {
+        error_append_hint(&err,
+                   "Failed to allocate HWPT for device %s. Fallback to IOAS attach\n",
+                   vbasedev->name);
+        warn_report_err(err);
+        return ret;
+    }
+
+    hwpt = g_malloc0(sizeof(*hwpt));
+    hwpt->hwpt_id = hwpt_id;
+    QLIST_INIT(&hwpt->device_list);
+
+    ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, &err);
+    if (ret) {
+        iommufd_backend_free_id(container->be, hwpt->hwpt_id);
+        g_free(hwpt);
+        return ret;
+    }
+
+    vbasedev->hwpt = hwpt;
+    QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
+    QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
+    return 0;
+}
+
+static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev,
+                                         VFIOIOMMUFDContainer *container)
+{
+    VFIOIOASHwpt *hwpt = vbasedev->hwpt;
+
+    QLIST_REMOVE(vbasedev, hwpt_next);
+    QLIST_REMOVE(hwpt, next);
+    if (QLIST_EMPTY(&hwpt->device_list)) {
+        iommufd_backend_free_id(container->be, hwpt->hwpt_id);
+        g_free(hwpt);
+    }
+}
+
 static int iommufd_cdev_attach_container(VFIODevice *vbasedev,
                                          VFIOIOMMUFDContainer *container,
                                          Error **errp)
 {
+    if (!iommufd_cdev_autodomains_get(vbasedev, container, errp)) {
+        return 0;
+    }
+
     return iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp);
 }
 
@@ -231,6 +303,11 @@ static void iommufd_cdev_detach_container(VFIODevice *vbasedev,
 {
     Error *err = NULL;
 
+    if (vbasedev->hwpt) {
+        iommufd_cdev_autodomains_put(vbasedev, container);
+        return;
+    }
+
     if (iommufd_cdev_detach_ioas_hwpt(vbasedev, &err)) {
         error_report_err(err);
     }
@@ -370,6 +447,7 @@ static int iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
     container = g_malloc0(sizeof(*container));
     container->be = vbasedev->iommufd_dev.iommufd;
     container->ioas_id = ioas_id;
+    QLIST_INIT(&container->hwpt_list);
 
     bcontainer = &container->bcontainer;
     vfio_container_init(bcontainer, space, iommufd_vioc);
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 9c4b60c906d9..7f7d823221e2 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -93,10 +93,17 @@ typedef struct VFIOHostDMAWindow {
 
 typedef struct IOMMUFDBackend IOMMUFDBackend;
 
+typedef struct VFIOIOASHwpt {
+    uint32_t hwpt_id;
+    QLIST_HEAD(, VFIODevice) device_list;
+    QLIST_ENTRY(VFIOIOASHwpt) next;
+} VFIOIOASHwpt;
+
 typedef struct VFIOIOMMUFDContainer {
     VFIOContainerBase bcontainer;
     IOMMUFDBackend *be;
     uint32_t ioas_id;
+    QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
 } VFIOIOMMUFDContainer;
 
 /* Abstraction of host IOMMU legacy device */
@@ -136,6 +143,8 @@ typedef struct VFIODevice {
         IOMMULegacyDevice legacy_dev;
         IOMMUFDDevice iommufd_dev;
     };
+    VFIOIOASHwpt *hwpt;
+    QLIST_ENTRY(VFIODevice) hwpt_next;
 } VFIODevice;
 
 QEMU_BUILD_BUG_ON(offsetof(VFIODevice, legacy_dev.base) !=
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index 4afe97307dbe..1966b75caae2 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -49,4 +49,8 @@ typedef struct IOMMUFDDevice {
 void iommufd_device_init(IOMMUFDDevice *idev);
 int iommufd_device_get_hw_capabilities(IOMMUFDDevice *idev, uint64_t *caps,
                                        Error **errp);
+int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
+                               uint32_t pt_id, uint32_t flags,
+                               uint32_t data_type, uint32_t data_len,
+                               void *data_ptr, uint32_t *out_hwpt);
 #endif
-- 
2.39.3



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH RFCv2 3/8] vfio/iommufd: Probe and request hwpt dirty tracking capability
  2024-02-12 13:56 [PATCH RFCv2 0/8] vfio/iommufd: IOMMUFD Dirty Tracking Joao Martins
  2024-02-12 13:56 ` [PATCH RFCv2 1/8] backends/iommufd: Introduce helper function iommufd_device_get_hw_capabilities() Joao Martins
  2024-02-12 13:56 ` [PATCH RFCv2 2/8] vfio/iommufd: Introduce auto domain creation Joao Martins
@ 2024-02-12 13:56 ` Joao Martins
  2024-02-19  9:03   ` Avihai Horon
  2024-02-12 13:56 ` [PATCH RFCv2 4/8] vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support Joao Martins
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 37+ messages in thread
From: Joao Martins @ 2024-02-12 13:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
	Cedric Le Goater, Paolo Bonzini, Daniel P . Berrange,
	Eduardo Habkost, Eric Blake, Markus Armbruster, Jason Gunthorpe,
	Avihai Horon, Joao Martins

Probe hardware dirty tracking support by querying device hw capabilities
via IOMMUFD_GET_HW_INFO.

In preparation to using the dirty tracking UAPI, request dirty tracking in
the HWPT flags when the device doesn't support dirty page tracking or has
it disabled; or when support when the VF backing IOMMU supports dirty
tracking. The latter is in the possibility of a device being attached
that doesn't have a dirty tracker.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 hw/vfio/common.c              | 18 ++++++++++++++++++
 hw/vfio/iommufd.c             | 25 ++++++++++++++++++++++++-
 include/hw/vfio/vfio-common.h |  2 ++
 3 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index f7f85160be88..d8fc7077f839 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -216,6 +216,24 @@ bool vfio_devices_all_device_dirty_tracking(const VFIOContainerBase *bcontainer)
     return true;
 }
 
+bool vfio_device_migration_supported(VFIODevice *vbasedev)
+{
+    if (!vbasedev->migration) {
+        return false;
+    }
+
+    return vbasedev->migration->mig_flags & VFIO_MIGRATION_STOP_COPY;
+}
+
+bool vfio_device_dirty_pages_supported(VFIODevice *vbasedev)
+{
+    if (vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) {
+        return false;
+    }
+
+    return !vbasedev->dirty_pages_supported;
+}
+
 /*
  * Check if all VFIO devices are running and migration is active, which is
  * essentially equivalent to the migration being in pre-copy phase.
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index ca7ec45e725c..edacb6d72748 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -219,11 +219,26 @@ static int iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
     return ret;
 }
 
+static bool iommufd_dirty_pages_supported(IOMMUFDDevice *iommufd_dev,
+                                          Error **errp)
+{
+    uint64_t caps;
+    int r;
+
+    r = iommufd_device_get_hw_capabilities(iommufd_dev, &caps, errp);
+    if (r) {
+        return false;
+    }
+
+    return caps & IOMMU_HW_CAP_DIRTY_TRACKING;
+}
+
 static int iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
                                         VFIOIOMMUFDContainer *container,
                                         Error **errp)
 {
     int iommufd = vbasedev->iommufd_dev.iommufd->fd;
+    uint32_t flags = 0;
     VFIOIOASHwpt *hwpt;
     Error *err = NULL;
     int ret = -EINVAL;
@@ -245,9 +260,15 @@ static int iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
         }
     }
 
+    if ((vfio_device_migration_supported(vbasedev) &&
+         !vfio_device_dirty_pages_supported(vbasedev)) ||
+        iommufd_dirty_pages_supported(&vbasedev->iommufd_dev, &err)) {
+        flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
+    }
+
     ret = iommufd_backend_alloc_hwpt(iommufd,
                                      vbasedev->iommufd_dev.devid,
-                                     container->ioas_id, 0, 0, 0,
+                                     container->ioas_id, flags, 0, 0,
                                      NULL, &hwpt_id);
     if (ret) {
         error_append_hint(&err,
@@ -271,6 +292,8 @@ static int iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
     vbasedev->hwpt = hwpt;
     QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
     QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
+    container->bcontainer.dirty_pages_supported =
+                              (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING);
     return 0;
 }
 
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 7f7d823221e2..a3e691c126c6 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -271,6 +271,8 @@ bool
 vfio_devices_all_running_and_mig_active(const VFIOContainerBase *bcontainer);
 bool
 vfio_devices_all_device_dirty_tracking(const VFIOContainerBase *bcontainer);
+bool vfio_device_migration_supported(VFIODevice *vbasedev);
+bool vfio_device_dirty_pages_supported(VFIODevice *vbasedev);
 int vfio_devices_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
                                     VFIOBitmap *vbmap, hwaddr iova,
                                     hwaddr size);
-- 
2.39.3



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH RFCv2 4/8] vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
  2024-02-12 13:56 [PATCH RFCv2 0/8] vfio/iommufd: IOMMUFD Dirty Tracking Joao Martins
                   ` (2 preceding siblings ...)
  2024-02-12 13:56 ` [PATCH RFCv2 3/8] vfio/iommufd: Probe and request hwpt dirty tracking capability Joao Martins
@ 2024-02-12 13:56 ` Joao Martins
  2024-02-19  9:17   ` Avihai Horon
  2024-02-12 13:56 ` [PATCH RFCv2 5/8] vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support Joao Martins
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 37+ messages in thread
From: Joao Martins @ 2024-02-12 13:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
	Cedric Le Goater, Paolo Bonzini, Daniel P . Berrange,
	Eduardo Habkost, Eric Blake, Markus Armbruster, Jason Gunthorpe,
	Avihai Horon, Joao Martins

ioctl(iommufd, IOMMU_HWPT_SET_DIRTY_TRACKING, arg) is the UAPI that
enables or disables dirty page tracking.

It is called on the whole list of iommu domains it is are tracking,
and on failure it rolls it back.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 backends/iommufd.c       | 19 +++++++++++++++++++
 backends/trace-events    |  1 +
 hw/vfio/common.c         |  7 ++++++-
 hw/vfio/iommufd.c        | 28 ++++++++++++++++++++++++++++
 include/sysemu/iommufd.h |  3 +++
 5 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/backends/iommufd.c b/backends/iommufd.c
index 2970135af4b9..954de61c2da0 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -240,6 +240,25 @@ int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
     return !ret ? 0 : -errno;
 }
 
+int iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
+                                       bool start)
+{
+    int ret;
+    struct iommu_hwpt_set_dirty_tracking set_dirty = {
+            .size = sizeof(set_dirty),
+            .hwpt_id = hwpt_id,
+            .flags = !start ? 0 : IOMMU_HWPT_DIRTY_TRACKING_ENABLE,
+    };
+
+    ret = ioctl(be->fd, IOMMU_HWPT_SET_DIRTY_TRACKING, &set_dirty);
+    trace_iommufd_backend_set_dirty(be->fd, hwpt_id, start, ret);
+    if (ret) {
+        error_report("IOMMU_HWPT_SET_DIRTY_TRACKING failed: %s",
+                     strerror(errno));
+    }
+    return !ret ? 0 : -errno;
+}
+
 static const TypeInfo iommufd_backend_info = {
     .name = TYPE_IOMMUFD_BACKEND,
     .parent = TYPE_OBJECT,
diff --git a/backends/trace-events b/backends/trace-events
index f83a276a4253..feba2baca5f7 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -16,3 +16,4 @@ iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t si
 iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
 iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d ioas=%d (%d)"
 iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
+iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) " iommufd=%d hwpt=%d enable=%d (%d)"
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index d8fc7077f839..a940c0b6ede8 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -190,7 +190,7 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainerBase *bcontainer)
     QLIST_FOREACH(vbasedev, &bcontainer->device_list, container_next) {
         VFIOMigration *migration = vbasedev->migration;
 
-        if (!migration) {
+        if (!migration && !vbasedev->iommufd_dev.iommufd) {
             return false;
         }
 
@@ -199,6 +199,11 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainerBase *bcontainer)
              vfio_device_state_is_precopy(vbasedev))) {
             return false;
         }
+
+        if (vbasedev->iommufd_dev.iommufd &&
+            !bcontainer->dirty_pages_supported) {
+            return false;
+        }
     }
     return true;
 }
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index edacb6d72748..361e659288fd 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -25,6 +25,7 @@
 #include "qemu/cutils.h"
 #include "qemu/chardev_open.h"
 #include "pci.h"
+#include "migration/migration.h"
 
 static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
                             ram_addr_t size, void *vaddr, bool readonly)
@@ -115,6 +116,32 @@ static void iommufd_cdev_unbind_and_disconnect(VFIODevice *vbasedev)
     iommufd_backend_disconnect(vbasedev->iommufd_dev.iommufd);
 }
 
+static int iommufd_set_dirty_page_tracking(const VFIOContainerBase *bcontainer,
+                                           bool start)
+{
+    const VFIOIOMMUFDContainer *container =
+        container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
+    int ret;
+    VFIOIOASHwpt *hwpt;
+
+    QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
+        ret = iommufd_backend_set_dirty_tracking(container->be,
+                                                 hwpt->hwpt_id, start);
+        if (ret) {
+            goto err;
+        }
+    }
+
+    return 0;
+
+err:
+    QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
+        iommufd_backend_set_dirty_tracking(container->be,
+                                           hwpt->hwpt_id, !start);
+    }
+    return ret;
+}
+
 static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
 {
     long int ret = -ENOTTY;
@@ -737,6 +764,7 @@ static void vfio_iommu_iommufd_class_init(ObjectClass *klass, void *data)
     vioc->detach_device = iommufd_cdev_detach;
     vioc->pci_hot_reset = iommufd_cdev_pci_hot_reset;
     vioc->host_iommu_device_init = vfio_cdev_host_iommu_device_init;
+    vioc->set_dirty_page_tracking = iommufd_set_dirty_page_tracking;
 };
 
 static const TypeInfo types[] = {
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index 1966b75caae2..562c189dd92c 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -53,4 +53,7 @@ int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
                                uint32_t pt_id, uint32_t flags,
                                uint32_t data_type, uint32_t data_len,
                                void *data_ptr, uint32_t *out_hwpt);
+int iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
+                                       bool start);
+
 #endif
-- 
2.39.3



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH RFCv2 5/8] vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
  2024-02-12 13:56 [PATCH RFCv2 0/8] vfio/iommufd: IOMMUFD Dirty Tracking Joao Martins
                   ` (3 preceding siblings ...)
  2024-02-12 13:56 ` [PATCH RFCv2 4/8] vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support Joao Martins
@ 2024-02-12 13:56 ` Joao Martins
  2024-02-19  9:30   ` Avihai Horon
  2024-02-12 13:56 ` [PATCH RFCv2 6/8] backends/iommufd: Add ability to disable hugepages Joao Martins
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 37+ messages in thread
From: Joao Martins @ 2024-02-12 13:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
	Cedric Le Goater, Paolo Bonzini, Daniel P . Berrange,
	Eduardo Habkost, Eric Blake, Markus Armbruster, Jason Gunthorpe,
	Avihai Horon, Joao Martins

ioctl(iommufd, IOMMU_HWPT_GET_DIRTY_BITMAP, arg) is the UAPI
that fetches the bitmap that tells what was dirty in an IOVA
range.

A single bitmap is allocated and used across all the hwpts
sharing an IOAS which is then used in log_sync() to set Qemu
global bitmaps.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 backends/iommufd.c       | 24 ++++++++++++++++++++++++
 backends/trace-events    |  1 +
 hw/vfio/iommufd.c        | 30 ++++++++++++++++++++++++++++++
 include/sysemu/iommufd.h |  3 +++
 4 files changed, 58 insertions(+)

diff --git a/backends/iommufd.c b/backends/iommufd.c
index 954de61c2da0..dd676d493c37 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -259,6 +259,30 @@ int iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
     return !ret ? 0 : -errno;
 }
 
+int iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
+                                     uint64_t iova, ram_addr_t size,
+                                     uint64_t page_size, uint64_t *data)
+{
+    int ret;
+    struct iommu_hwpt_get_dirty_bitmap get_dirty_bitmap = {
+        .size = sizeof(get_dirty_bitmap),
+        .hwpt_id = hwpt_id,
+        .iova = iova, .length = size,
+        .page_size = page_size, .data = (uintptr_t)data,
+    };
+
+    ret = ioctl(be->fd, IOMMU_HWPT_GET_DIRTY_BITMAP, &get_dirty_bitmap);
+    trace_iommufd_backend_get_dirty_bitmap(be->fd, hwpt_id, iova, size,
+                                           page_size, ret);
+    if (ret) {
+        error_report("IOMMU_HWPT_GET_DIRTY_BITMAP (iova: 0x%"PRIx64
+                     " size: 0x%"PRIx64") failed: %s", iova,
+                     size, strerror(errno));
+    }
+
+    return !ret ? 0 : -errno;
+}
+
 static const TypeInfo iommufd_backend_info = {
     .name = TYPE_IOMMUFD_BACKEND,
     .parent = TYPE_OBJECT,
diff --git a/backends/trace-events b/backends/trace-events
index feba2baca5f7..11a27cb114b6 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -17,3 +17,4 @@ iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_
 iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d ioas=%d (%d)"
 iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
 iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) " iommufd=%d hwpt=%d enable=%d (%d)"
+iommufd_backend_get_dirty_bitmap(int iommufd, uint32_t hwpt_id, uint64_t iova, uint64_t size, uint64_t page_size, int ret) " iommufd=%d hwpt=%d iova=0x%"PRIx64" size=0x%"PRIx64" page_size=0x%"PRIx64" (%d)"
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 361e659288fd..79b13bd262cc 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -25,6 +25,7 @@
 #include "qemu/cutils.h"
 #include "qemu/chardev_open.h"
 #include "pci.h"
+#include "exec/ram_addr.h"
 #include "migration/migration.h"
 
 static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
@@ -142,6 +143,34 @@ err:
     return ret;
 }
 
+static int iommufd_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
+                                      VFIOBitmap *vbmap, uint64_t iova,
+                                      uint64_t size)
+{
+    VFIOIOMMUFDContainer *container = container_of(bcontainer,
+                                                   VFIOIOMMUFDContainer,
+                                                   bcontainer);
+    int ret;
+    VFIOIOASHwpt *hwpt;
+    unsigned long page_size;
+
+    if (!bcontainer->dirty_pages_supported) {
+        return 0;
+    }
+
+    page_size = qemu_real_host_page_size();
+    QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
+        ret = iommufd_backend_get_dirty_bitmap(container->be, hwpt->hwpt_id,
+                                               iova, size, page_size,
+                                               vbmap->bitmap);
+        if (ret) {
+            break;
+        }
+    }
+
+    return ret;
+}
+
 static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
 {
     long int ret = -ENOTTY;
@@ -765,6 +794,7 @@ static void vfio_iommu_iommufd_class_init(ObjectClass *klass, void *data)
     vioc->pci_hot_reset = iommufd_cdev_pci_hot_reset;
     vioc->host_iommu_device_init = vfio_cdev_host_iommu_device_init;
     vioc->set_dirty_page_tracking = iommufd_set_dirty_page_tracking;
+    vioc->query_dirty_bitmap = iommufd_query_dirty_bitmap;
 };
 
 static const TypeInfo types[] = {
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index 562c189dd92c..ba19b7ea4c19 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -55,5 +55,8 @@ int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
                                void *data_ptr, uint32_t *out_hwpt);
 int iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
                                        bool start);
+int iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
+                                     uint64_t iova, ram_addr_t size,
+                                     uint64_t page_size, uint64_t *data);
 
 #endif
-- 
2.39.3



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH RFCv2 6/8] backends/iommufd: Add ability to disable hugepages
  2024-02-12 13:56 [PATCH RFCv2 0/8] vfio/iommufd: IOMMUFD Dirty Tracking Joao Martins
                   ` (4 preceding siblings ...)
  2024-02-12 13:56 ` [PATCH RFCv2 5/8] vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support Joao Martins
@ 2024-02-12 13:56 ` Joao Martins
  2024-02-12 16:59   ` Jason Gunthorpe
                     ` (2 more replies)
  2024-02-12 13:56 ` [PATCH RFCv2 7/8] vfio/migration: Don't block migration device dirty tracking is unsupported Joao Martins
                   ` (2 subsequent siblings)
  8 siblings, 3 replies; 37+ messages in thread
From: Joao Martins @ 2024-02-12 13:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
	Cedric Le Goater, Paolo Bonzini, Daniel P . Berrange,
	Eduardo Habkost, Eric Blake, Markus Armbruster, Jason Gunthorpe,
	Avihai Horon, Joao Martins

Allow disabling hugepages to be dirty track at base page
granularity in similar vein to vfio_type1_iommu.disable_hugepages
but per IOAS.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 backends/iommufd.c       | 36 ++++++++++++++++++++++++++++++++++++
 backends/trace-events    |  1 +
 hw/vfio/iommufd.c        |  4 ++++
 include/sysemu/iommufd.h |  4 ++++
 qapi/qom.json            |  2 +-
 5 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/backends/iommufd.c b/backends/iommufd.c
index dd676d493c37..72fd98a9a50c 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -29,6 +29,7 @@ static void iommufd_backend_init(Object *obj)
     be->fd = -1;
     be->users = 0;
     be->owned = true;
+    be->hugepages = 1;
 }
 
 static void iommufd_backend_finalize(Object *obj)
@@ -63,6 +64,14 @@ static bool iommufd_backend_can_be_deleted(UserCreatable *uc)
     return !be->users;
 }
 
+static void iommufd_backend_set_hugepages(Object *obj, bool enabled,
+                                          Error **errp)
+{
+    IOMMUFDBackend *be = IOMMUFD_BACKEND(obj);
+
+    be->hugepages = enabled;
+}
+
 static void iommufd_backend_class_init(ObjectClass *oc, void *data)
 {
     UserCreatableClass *ucc = USER_CREATABLE_CLASS(oc);
@@ -70,6 +79,11 @@ static void iommufd_backend_class_init(ObjectClass *oc, void *data)
     ucc->can_be_deleted = iommufd_backend_can_be_deleted;
 
     object_class_property_add_str(oc, "fd", NULL, iommufd_backend_set_fd);
+
+    object_class_property_add_bool(oc, "hugepages", NULL,
+                                   iommufd_backend_set_hugepages);
+    object_class_property_set_description(oc, "hugepages",
+                                          "Set to 'off' to disable hugepages");
 }
 
 int iommufd_backend_connect(IOMMUFDBackend *be, Error **errp)
@@ -106,6 +120,28 @@ out:
     trace_iommufd_backend_disconnect(be->fd, be->users);
 }
 
+int iommufd_backend_set_option(int fd, uint32_t object_id,
+                               uint32_t option_id, uint64_t val64)
+{
+    int ret;
+    struct iommu_option option = {
+        .size = sizeof(option),
+        .option_id = option_id,
+        .op = IOMMU_OPTION_OP_SET,
+        .val64 = val64,
+        .object_id = object_id,
+    };
+
+    ret = ioctl(fd, IOMMU_OPTION, &option);
+    if (ret) {
+        error_report("Failed to set option %x to value %"PRIx64" %m", option_id,
+                     val64);
+    }
+    trace_iommufd_backend_set_option(fd, object_id, option_id, val64, ret);
+
+    return ret;
+}
+
 int iommufd_backend_alloc_ioas(IOMMUFDBackend *be, uint32_t *ioas_id,
                                Error **errp)
 {
diff --git a/backends/trace-events b/backends/trace-events
index 11a27cb114b6..076166552881 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -15,6 +15,7 @@ iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, uint64_t iova, u
 iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
 iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
 iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d ioas=%d (%d)"
+iommufd_backend_set_option(int iommufd, uint32_t object_id, uint32_t option_id, uint64_t val, int ret) " iommufd=%d object_id=%u option_id=%u val64=0x%"PRIx64" (%d)"
 iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
 iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) " iommufd=%d hwpt=%d enable=%d (%d)"
 iommufd_backend_get_dirty_bitmap(int iommufd, uint32_t hwpt_id, uint64_t iova, uint64_t size, uint64_t page_size, int ret) " iommufd=%d hwpt=%d iova=0x%"PRIx64" size=0x%"PRIx64" page_size=0x%"PRIx64" (%d)"
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 79b13bd262cc..697d40841d7f 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -521,6 +521,10 @@ static int iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
         goto err_alloc_ioas;
     }
 
+    if (!vbasedev->iommufd_dev.iommufd->hugepages) {
+        iommufd_backend_set_option(vbasedev->iommufd_dev.iommufd->fd, ioas_id,
+                                   IOMMU_OPTION_HUGE_PAGES, 0);
+    }
     trace_iommufd_cdev_alloc_ioas(vbasedev->iommufd_dev.iommufd->fd, ioas_id);
 
     container = g_malloc0(sizeof(*container));
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index ba19b7ea4c19..bc6607e3d444 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -19,6 +19,7 @@ struct IOMMUFDBackend {
     /*< protected >*/
     int fd;            /* /dev/iommu file descriptor */
     bool owned;        /* is the /dev/iommu opened internally */
+    bool hugepages;    /* are hugepages enabled on the IOAS */
     uint32_t users;
 
     /*< public >*/
@@ -30,6 +31,9 @@ void iommufd_backend_disconnect(IOMMUFDBackend *be);
 int iommufd_backend_alloc_ioas(IOMMUFDBackend *be, uint32_t *ioas_id,
                                Error **errp);
 void iommufd_backend_free_id(IOMMUFDBackend *be, uint32_t id);
+int iommufd_backend_set_option(int fd, uint32_t object_id,
+                               uint32_t option_id,
+                               uint64_t val64);
 int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
                             ram_addr_t size, void *vaddr, bool readonly);
 int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
diff --git a/qapi/qom.json b/qapi/qom.json
index 84af23fe245d..9ad27e2b939b 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -809,7 +809,7 @@
 # Since: 9.0
 ##
 { 'struct': 'IOMMUFDProperties',
-  'data': { '*fd': 'str' } }
+  'data': { '*fd': 'str', '*hugepages': 'bool' } }
 
 ##
 # @RngProperties:
-- 
2.39.3



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH RFCv2 7/8] vfio/migration: Don't block migration device dirty tracking is unsupported
  2024-02-12 13:56 [PATCH RFCv2 0/8] vfio/iommufd: IOMMUFD Dirty Tracking Joao Martins
                   ` (5 preceding siblings ...)
  2024-02-12 13:56 ` [PATCH RFCv2 6/8] backends/iommufd: Add ability to disable hugepages Joao Martins
@ 2024-02-12 13:56 ` Joao Martins
  2024-02-19 10:12   ` Avihai Horon
  2024-02-12 13:56 ` [PATCH RFCv2 8/8] vfio/common: Allow disabling device dirty page tracking Joao Martins
  2024-02-13 11:59 ` [PATCH RFCv2 0/8] vfio/iommufd: IOMMUFD Dirty Tracking Joao Martins
  8 siblings, 1 reply; 37+ messages in thread
From: Joao Martins @ 2024-02-12 13:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
	Cedric Le Goater, Paolo Bonzini, Daniel P . Berrange,
	Eduardo Habkost, Eric Blake, Markus Armbruster, Jason Gunthorpe,
	Avihai Horon, Joao Martins

By default VFIO migration is set to auto, which will support live
migration if the migration capability is set *and* also dirty page
tracking is supported.

For testing purposes one can force enable without dirty page tracking
via enable-migration=on, but that option is generally left for testing
purposes.

So starting with IOMMU dirty tracking it can use to acomodate the lack of
VF dirty page tracking allowing us to minimize the VF requirements for
migration and thus enabling migration by default for those.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 hw/vfio/iommufd.c        | 3 +--
 hw/vfio/migration.c      | 4 +++-
 include/sysemu/iommufd.h | 1 +
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 697d40841d7f..78d8f4391b68 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -275,8 +275,7 @@ static int iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
     return ret;
 }
 
-static bool iommufd_dirty_pages_supported(IOMMUFDDevice *iommufd_dev,
-                                          Error **errp)
+bool iommufd_dirty_pages_supported(IOMMUFDDevice *iommufd_dev, Error **errp)
 {
     uint64_t caps;
     int r;
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 70e6b1a709f9..674e76b3f3df 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -938,7 +938,9 @@ bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
         return !vfio_block_migration(vbasedev, err, errp);
     }
 
-    if (!vbasedev->dirty_pages_supported) {
+    if (!vbasedev->dirty_pages_supported &&
+        (vbasedev->iommufd_dev.iommufd &&
+         !iommufd_dirty_pages_supported(&vbasedev->iommufd_dev, &err))) {
         if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) {
             error_setg(&err,
                        "%s: VFIO device doesn't support device dirty tracking",
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index bc6607e3d444..d6be49f2ac78 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -53,6 +53,7 @@ typedef struct IOMMUFDDevice {
 void iommufd_device_init(IOMMUFDDevice *idev);
 int iommufd_device_get_hw_capabilities(IOMMUFDDevice *idev, uint64_t *caps,
                                        Error **errp);
+bool iommufd_dirty_pages_supported(IOMMUFDDevice *idev, Error **errp);
 int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
                                uint32_t pt_id, uint32_t flags,
                                uint32_t data_type, uint32_t data_len,
-- 
2.39.3



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH RFCv2 8/8] vfio/common: Allow disabling device dirty page tracking
  2024-02-12 13:56 [PATCH RFCv2 0/8] vfio/iommufd: IOMMUFD Dirty Tracking Joao Martins
                   ` (6 preceding siblings ...)
  2024-02-12 13:56 ` [PATCH RFCv2 7/8] vfio/migration: Don't block migration device dirty tracking is unsupported Joao Martins
@ 2024-02-12 13:56 ` Joao Martins
  2024-02-13 11:59 ` [PATCH RFCv2 0/8] vfio/iommufd: IOMMUFD Dirty Tracking Joao Martins
  8 siblings, 0 replies; 37+ messages in thread
From: Joao Martins @ 2024-02-12 13:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
	Cedric Le Goater, Paolo Bonzini, Daniel P . Berrange,
	Eduardo Habkost, Eric Blake, Markus Armbruster, Jason Gunthorpe,
	Avihai Horon, Joao Martins

The property 'x-pre-copy-dirty-page-tracking' allows disabling the whole
tracking of VF pre-copy phase of dirty page tracking, though it means
that it will only be used at the start of the switchover phase.

Add an option that disables the VF dirty page tracking, and fall
back into container-based dirty page tracking. This also allows to
use IOMMU dirty tracking even on VFs with their own dirty
tracker scheme.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 hw/vfio/common.c              | 7 +++++++
 hw/vfio/migration.c           | 3 ++-
 hw/vfio/pci.c                 | 3 +++
 include/hw/vfio/vfio-common.h | 1 +
 4 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index a940c0b6ede8..9fe113ea016d 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -213,6 +213,9 @@ bool vfio_devices_all_device_dirty_tracking(const VFIOContainerBase *bcontainer)
     VFIODevice *vbasedev;
 
     QLIST_FOREACH(vbasedev, &bcontainer->device_list, container_next) {
+        if (vbasedev->device_dirty_page_tracking == ON_OFF_AUTO_OFF) {
+            return false;
+        }
         if (!vbasedev->dirty_pages_supported) {
             return false;
         }
@@ -236,6 +239,10 @@ bool vfio_device_dirty_pages_supported(VFIODevice *vbasedev)
         return false;
     }
 
+    if (vbasedev->device_dirty_page_tracking == ON_OFF_AUTO_OFF) {
+        return false;
+    }
+
     return !vbasedev->dirty_pages_supported;
 }
 
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 674e76b3f3df..09e742fbeb9f 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -938,7 +938,8 @@ bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
         return !vfio_block_migration(vbasedev, err, errp);
     }
 
-    if (!vbasedev->dirty_pages_supported &&
+    if ((!vbasedev->dirty_pages_supported ||
+         vbasedev->device_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
         (vbasedev->iommufd_dev.iommufd &&
          !iommufd_dirty_pages_supported(&vbasedev->iommufd_dev, &err))) {
         if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) {
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index dedb64fc080e..d3b516b2e4d3 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3349,6 +3349,9 @@ static Property vfio_pci_dev_properties[] = {
     DEFINE_PROP_ON_OFF_AUTO("x-pre-copy-dirty-page-tracking", VFIOPCIDevice,
                             vbasedev.pre_copy_dirty_page_tracking,
                             ON_OFF_AUTO_ON),
+    DEFINE_PROP_ON_OFF_AUTO("x-device-dirty-page-tracking", VFIOPCIDevice,
+                            vbasedev.device_dirty_page_tracking,
+                            ON_OFF_AUTO_ON),
     DEFINE_PROP_ON_OFF_AUTO("display", VFIOPCIDevice,
                             display, ON_OFF_AUTO_OFF),
     DEFINE_PROP_UINT32("xres", VFIOPCIDevice, display_xres, 0),
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index a3e691c126c6..e67f5e74cebd 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -136,6 +136,7 @@ typedef struct VFIODevice {
     VFIOMigration *migration;
     Error *migration_blocker;
     OnOffAuto pre_copy_dirty_page_tracking;
+    OnOffAuto device_dirty_page_tracking;
     bool dirty_pages_supported;
     bool dirty_tracking;
     union {
-- 
2.39.3



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH RFCv2 2/8] vfio/iommufd: Introduce auto domain creation
  2024-02-12 13:56 ` [PATCH RFCv2 2/8] vfio/iommufd: Introduce auto domain creation Joao Martins
@ 2024-02-12 16:27   ` Jason Gunthorpe
  2024-02-12 18:09     ` Joao Martins
  2024-02-13 12:01   ` Joao Martins
  2024-02-19  8:58   ` Avihai Horon
  2 siblings, 1 reply; 37+ messages in thread
From: Jason Gunthorpe @ 2024-02-12 16:27 UTC (permalink / raw)
  To: Joao Martins
  Cc: qemu-devel, Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
	Cedric Le Goater, Paolo Bonzini, Daniel P . Berrange,
	Eduardo Habkost, Eric Blake, Markus Armbruster, Avihai Horon

On Mon, Feb 12, 2024 at 01:56:37PM +0000, Joao Martins wrote:
> There's generally two modes of operation for IOMMUFD:
> 
> * The simple user API which intends to perform relatively simple things
> with IOMMUs e.g. DPDK. It generally creates an IOAS and attach to VFIO
> and mainly performs IOAS_MAP and UNMAP.
> 
> * The native IOMMUFD API where you have fine grained control of the
> IOMMU domain and model it accordingly. This is where most new feature
> are being steered to.
> 
> For dirty tracking 2) is required, as it needs to ensure that
> the stage-2/parent IOMMU domain will only attach devices
> that support dirty tracking (so far it is all homogeneous in x86, likely
> not the case for smmuv3). Such invariant on dirty tracking provides a
> useful guarantee to VMMs that will refuse incompatible device
> attachments for IOMMU domains.
> 
> For dirty tracking such property is enabled/enforced via HWPT_ALLOC,
> which is responsible for creating an IOMMU domain. This is contrast to
> the 'simple API' where the IOMMU domain is created by IOMMUFD
> automatically when it attaches to VFIO (usually referred as autodomains)
> 
> To support dirty tracking with the advanced IOMMUFD API, it needs
> similar logic, where IOMMU domains are created and devices attached to
> compatible domains. Essentially mimmicing kernel
> iommufd_device_auto_get_domain(). If this fails (i.e. mdevs) it falls back
> to IOAS attach.
> 
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> Right now the only alternative to a userspace autodomains implementation
> is to mimmicing all the flags being added to HWPT_ALLOC but into VFIO
> IOAS attach.

It was my expectation that VMM userspace would implement direct hwpt
support. This is needed in all kinds of cases going forward because
hwpt allocation is not uniform across iommu instances and managing
this in the kernel is only feasible for simpler cases. Dirty tracking
would be one of them.

> +int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
> +                               uint32_t pt_id, uint32_t flags,
> +                               uint32_t data_type, uint32_t data_len,
> +                               void *data_ptr, uint32_t *out_hwpt)
> +{
> +    int ret;
> +    struct iommu_hwpt_alloc alloc_hwpt = {
> +        .size = sizeof(struct iommu_hwpt_alloc),
> +        .flags = flags,
> +        .dev_id = dev_id,
> +        .pt_id = pt_id,
> +        .data_type = data_type,
> +        .data_len = data_len,
> +        .data_uptr = (uint64_t)data_ptr,
> +        .__reserved = 0,

Unless the coding style requirs this it is not necessary to zero in
the __reserved within a bracketed in initializer..

> +static int iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
> +                                        VFIOIOMMUFDContainer *container,
> +                                        Error **errp)
> +{
> +    int iommufd = vbasedev->iommufd_dev.iommufd->fd;
> +    VFIOIOASHwpt *hwpt;
> +    Error *err = NULL;
> +    int ret = -EINVAL;
> +    uint32_t hwpt_id;
> +
> +    /* Try to find a domain */
> +    QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
> +        ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, &err);
> +        if (ret) {
> +            /* -EINVAL means the domain is incompatible with the device. */
> +            if (ret == -EINVAL) {

Please send a kernel kdoc patch to document this..

The approach looks good to me

The nesting patches surely need this too?

Jason


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFCv2 6/8] backends/iommufd: Add ability to disable hugepages
  2024-02-12 13:56 ` [PATCH RFCv2 6/8] backends/iommufd: Add ability to disable hugepages Joao Martins
@ 2024-02-12 16:59   ` Jason Gunthorpe
  2024-02-12 17:17   ` Markus Armbruster
  2024-02-19 10:05   ` Avihai Horon
  2 siblings, 0 replies; 37+ messages in thread
From: Jason Gunthorpe @ 2024-02-12 16:59 UTC (permalink / raw)
  To: Joao Martins
  Cc: qemu-devel, Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
	Cedric Le Goater, Paolo Bonzini, Daniel P . Berrange,
	Eduardo Habkost, Eric Blake, Markus Armbruster, Avihai Horon

On Mon, Feb 12, 2024 at 01:56:41PM +0000, Joao Martins wrote:
> Allow disabling hugepages to be dirty track at base page
> granularity in similar vein to vfio_type1_iommu.disable_hugepages
> but per IOAS.

No objection to this, but I just wanted to observe I didn't imagine
using this option for this purpose. It should work OK but it is a
pretty big an inefficient hammer :)

Jason


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFCv2 6/8] backends/iommufd: Add ability to disable hugepages
  2024-02-12 13:56 ` [PATCH RFCv2 6/8] backends/iommufd: Add ability to disable hugepages Joao Martins
  2024-02-12 16:59   ` Jason Gunthorpe
@ 2024-02-12 17:17   ` Markus Armbruster
  2024-02-12 19:16     ` Joao Martins
  2024-02-19 10:05   ` Avihai Horon
  2 siblings, 1 reply; 37+ messages in thread
From: Markus Armbruster @ 2024-02-12 17:17 UTC (permalink / raw)
  To: Joao Martins
  Cc: qemu-devel, Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
	Cedric Le Goater, Paolo Bonzini, Daniel P . Berrange,
	Eduardo Habkost, Eric Blake, Jason Gunthorpe, Avihai Horon

Joao Martins <joao.m.martins@oracle.com> writes:

> Allow disabling hugepages to be dirty track at base page
> granularity in similar vein to vfio_type1_iommu.disable_hugepages
> but per IOAS.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>

[...]

> diff --git a/qapi/qom.json b/qapi/qom.json
> index 84af23fe245d..9ad27e2b939b 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -809,7 +809,7 @@
>  # Since: 9.0
>  ##
>  { 'struct': 'IOMMUFDProperties',
> -  'data': { '*fd': 'str' } }
> +  'data': { '*fd': 'str', '*hugepages': 'bool' } }
>  
>  ##
>  # @RngProperties:

Missing documentation for the new member.

The latest QAPI PR is making this a hard error.



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFCv2 2/8] vfio/iommufd: Introduce auto domain creation
  2024-02-12 16:27   ` Jason Gunthorpe
@ 2024-02-12 18:09     ` Joao Martins
  2024-02-26  9:14       ` Duan, Zhenzhong
  0 siblings, 1 reply; 37+ messages in thread
From: Joao Martins @ 2024-02-12 18:09 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: qemu-devel, Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
	Cedric Le Goater, Paolo Bonzini, Daniel P . Berrange,
	Eduardo Habkost, Eric Blake, Markus Armbruster, Avihai Horon

On 12/02/2024 16:27, Jason Gunthorpe wrote:
> On Mon, Feb 12, 2024 at 01:56:37PM +0000, Joao Martins wrote:
>> There's generally two modes of operation for IOMMUFD:
>>
>> * The simple user API which intends to perform relatively simple things
>> with IOMMUs e.g. DPDK. It generally creates an IOAS and attach to VFIO
>> and mainly performs IOAS_MAP and UNMAP.
>>
>> * The native IOMMUFD API where you have fine grained control of the
>> IOMMU domain and model it accordingly. This is where most new feature
>> are being steered to.
>>
>> For dirty tracking 2) is required, as it needs to ensure that
>> the stage-2/parent IOMMU domain will only attach devices
>> that support dirty tracking (so far it is all homogeneous in x86, likely
>> not the case for smmuv3). Such invariant on dirty tracking provides a
>> useful guarantee to VMMs that will refuse incompatible device
>> attachments for IOMMU domains.
>>
>> For dirty tracking such property is enabled/enforced via HWPT_ALLOC,
>> which is responsible for creating an IOMMU domain. This is contrast to
>> the 'simple API' where the IOMMU domain is created by IOMMUFD
>> automatically when it attaches to VFIO (usually referred as autodomains)
>>
>> To support dirty tracking with the advanced IOMMUFD API, it needs
>> similar logic, where IOMMU domains are created and devices attached to
>> compatible domains. Essentially mimmicing kernel
>> iommufd_device_auto_get_domain(). If this fails (i.e. mdevs) it falls back
>> to IOAS attach.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> Right now the only alternative to a userspace autodomains implementation
>> is to mimmicing all the flags being added to HWPT_ALLOC but into VFIO
>> IOAS attach.
> 
> It was my expectation that VMM userspace would implement direct hwpt
> support. This is needed in all kinds of cases going forward because
> hwpt allocation is not uniform across iommu instances and managing
> this in the kernel is only feasible for simpler cases. Dirty tracking
> would be one of them.
> 

/me nods

>> +int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
>> +                               uint32_t pt_id, uint32_t flags,
>> +                               uint32_t data_type, uint32_t data_len,
>> +                               void *data_ptr, uint32_t *out_hwpt)
>> +{
>> +    int ret;
>> +    struct iommu_hwpt_alloc alloc_hwpt = {
>> +        .size = sizeof(struct iommu_hwpt_alloc),
>> +        .flags = flags,
>> +        .dev_id = dev_id,
>> +        .pt_id = pt_id,
>> +        .data_type = data_type,
>> +        .data_len = data_len,
>> +        .data_uptr = (uint64_t)data_ptr,
>> +        .__reserved = 0,
> 
> Unless the coding style requirs this it is not necessary to zero in
> the __reserved within a bracketed in initializer..
> 

Ah yes; and no other helper is doing this, so definitely doesn't look code
style. I'll remove it.

>> +static int iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>> +                                        VFIOIOMMUFDContainer *container,
>> +                                        Error **errp)
>> +{
>> +    int iommufd = vbasedev->iommufd_dev.iommufd->fd;
>> +    VFIOIOASHwpt *hwpt;
>> +    Error *err = NULL;
>> +    int ret = -EINVAL;
>> +    uint32_t hwpt_id;
>> +
>> +    /* Try to find a domain */
>> +    QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>> +        ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, &err);
>> +        if (ret) {
>> +            /* -EINVAL means the domain is incompatible with the device. */
>> +            if (ret == -EINVAL) {
> 
> Please send a kernel kdoc patch to document this..
> 

Ack

> The approach looks good to me
> 
> The nesting patches surely need this too?

From what I understand, yes, but they seem to be able to hid this inside
intel-iommu for the parent hwpt allocation somehow. I'll let them comment;


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFCv2 6/8] backends/iommufd: Add ability to disable hugepages
  2024-02-12 17:17   ` Markus Armbruster
@ 2024-02-12 19:16     ` Joao Martins
  0 siblings, 0 replies; 37+ messages in thread
From: Joao Martins @ 2024-02-12 19:16 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: qemu-devel, Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
	Cedric Le Goater, Paolo Bonzini, Daniel P . Berrange,
	Eduardo Habkost, Eric Blake, Jason Gunthorpe, Avihai Horon

On 12/02/2024 17:17, Markus Armbruster wrote:
> Joao Martins <joao.m.martins@oracle.com> writes:
> 
>> Allow disabling hugepages to be dirty track at base page
>> granularity in similar vein to vfio_type1_iommu.disable_hugepages
>> but per IOAS.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> 
> [...]
> 
>> diff --git a/qapi/qom.json b/qapi/qom.json
>> index 84af23fe245d..9ad27e2b939b 100644
>> --- a/qapi/qom.json
>> +++ b/qapi/qom.json
>> @@ -809,7 +809,7 @@
>>  # Since: 9.0
>>  ##
>>  { 'struct': 'IOMMUFDProperties',
>> -  'data': { '*fd': 'str' } }
>> +  'data': { '*fd': 'str', '*hugepages': 'bool' } }
>>  
>>  ##
>>  # @RngProperties:
> 
> Missing documentation for the new member.
> 
> The latest QAPI PR is making this a hard error.
> 

Gah, sorry. I think I didn't have that PR yet as I didn't hit any build errors.
Missing the doc was pure distraction.

Will fix it for the next version.

	Joao


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFCv2 0/8] vfio/iommufd: IOMMUFD Dirty Tracking
  2024-02-12 13:56 [PATCH RFCv2 0/8] vfio/iommufd: IOMMUFD Dirty Tracking Joao Martins
                   ` (7 preceding siblings ...)
  2024-02-12 13:56 ` [PATCH RFCv2 8/8] vfio/common: Allow disabling device dirty page tracking Joao Martins
@ 2024-02-13 11:59 ` Joao Martins
  2024-02-14 15:40   ` Cédric Le Goater
  8 siblings, 1 reply; 37+ messages in thread
From: Joao Martins @ 2024-02-13 11:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cedric Le Goater, Eric Auger, Yi Liu,
	Zhenzhong Duan, Paolo Bonzini, Daniel P . Berrange,
	Eduardo Habkost, Eric Blake, Markus Armbruster, Jason Gunthorpe,
	Avihai Horon

On 12/02/2024 13:56, Joao Martins wrote:
> This small series adds support for Dirty Tracking in IOMMUFD backend.
> The sole reason I still made it RFC is because of the second patch,
> where we are implementing user-managed auto domains.
> 
> In essence it is quite similar to the original IOMMUFD series where we
> would allocate a HWPT, until we switched later on into a IOAS attach.
> Patch 2 goes into more detail, but the gist is that there's two modes of
> using IOMMUFD and by keep using kernel managed auto domains we would end
> up duplicating the same flags we have in HWPT but into the VFIO IOAS
> attach. While it is true that just adding a flag is simpler, it also
> creates duplication and motivates duplicate what hwpt-alloc already has.
> But there's a chance I have the wrong expectation here, so any feedback
> welcome.
> 
> The series is divided into:
> 
> * Patch 1: Adds a simple helper to get device capabilities;
> 
> * Patches 2 - 5: IOMMUFD backend support for dirty tracking;
> 
> The workflow is relatively simple:
> 
> 1) Probe device and allow dirty tracking in the HWPT
> 2) Toggling dirty tracking on/off
> 3) Read-and-clear of Dirty IOVAs
> 
> The heuristics selected for (1) were to enable it *if* device supports
> migration but doesn't support VF dirty tracking or IOMMU dirty tracking
> is supported. The latter is for the hotplug case where we can add a device
> without a tracker and thus still support migration.
> 
> The unmap case is deferred until further vIOMMU support with migration
> is added[3] which will then introduce the usage of
> IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR in GET_DIRTY_BITMAP ioctl in the
> dma unmap bitmap flow.
> 
> * Patches 6-8: Add disabling of hugepages to allow tracking at base
> page; avoid blocking live migration where there's no VF dirty
> tracker, considering that we have IOMMU dirty tracking. And allow
> disabling VF dirty tracker via qemu command line.
> 
> This series builds on top of Zhengzhong series[0], but only requires the
> first 9 patches i.e. up to ("vfio/pci: Initialize host iommu device
> instance after attachment")[1] that are more generic IOMMUFD device
> plumbing, and doesn't require the nesting counterpart.
> 
I need to add that this series doesn't *need* to be based on Zhengzhong series.
Though given that he is consolidating how an IOMMUFD device info is represented
it felt the correct thing to do. For dirty tracking we mainly need the
dev_id/iommufd available when we are going to attach, that's it.

I've pushed this series version that doesn't have such dependency, let me know
if you want me to pursue this version instead going forward:

https://github.com/jpemartins/qemu/commits/iommufd-v5.nodeps

> This is stored on github:
> https://github.com/jpemartins/qemu/commits/iommufd-v5
> 
> Note: While Linux v6.7 has IOMMU dirty tracking feature, I suggest folks
> use the latest for-rc of iommufd kernel tree as there's some fixes there.
> 
> Comments and feedback appreciated.
> 
> Cheers,
>         Joao
> 
> Chances since RFCv1[2]:
> * Remove intel/amd dirty tracking emulation enabling
> * Remove the dirtyrate improvement for VF/IOMMU dirty tracking
> [Will pursue these two in separate series]
> * Introduce auto domains support
> * Enforce dirty tracking following the IOMMUFD UAPI for this
> * Add support for toggling hugepages in IOMMUFD
> * Auto enable support when VF supports migration to use IOMMU
> when it doesn't have VF dirty tracking
> * Add a parameter to toggle VF dirty tracking
> 
> [0] https://lore.kernel.org/qemu-devel/20240201072818.327930-1-zhenzhong.duan@intel.com/
> [1] https://lore.kernel.org/qemu-devel/20240201072818.327930-10-zhenzhong.duan@intel.com/
> [2] https://lore.kernel.org/qemu-devel/20220428211351.3897-1-joao.m.martins@oracle.com/
> [3] https://lore.kernel.org/qemu-devel/20230622214845.3980-1-joao.m.martins@oracle.com/
> 
> Joao Martins (8):
>   backends/iommufd: Introduce helper function
>     iommufd_device_get_hw_capabilities()
>   vfio/iommufd: Introduce auto domain creation
>   vfio/iommufd: Probe and request hwpt dirty tracking capability
>   vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
>   vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
>   backends/iommufd: Add ability to disable hugepages
>   vfio/migration: Don't block migration device dirty tracking is
>     unsupported
>   vfio/common: Allow disabling device dirty page tracking
> 
>  backends/iommufd.c            | 133 ++++++++++++++++++++++++++++
>  backends/trace-events         |   4 +
>  hw/vfio/common.c              |  32 ++++++-
>  hw/vfio/iommufd.c             | 162 ++++++++++++++++++++++++++++++++++
>  hw/vfio/migration.c           |   5 +-
>  hw/vfio/pci.c                 |   3 +
>  include/hw/vfio/vfio-common.h |  12 +++
>  include/sysemu/iommufd.h      |  17 ++++
>  qapi/qom.json                 |   2 +-
>  9 files changed, 367 insertions(+), 3 deletions(-)
> 



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFCv2 2/8] vfio/iommufd: Introduce auto domain creation
  2024-02-12 13:56 ` [PATCH RFCv2 2/8] vfio/iommufd: Introduce auto domain creation Joao Martins
  2024-02-12 16:27   ` Jason Gunthorpe
@ 2024-02-13 12:01   ` Joao Martins
  2024-02-19  8:58   ` Avihai Horon
  2 siblings, 0 replies; 37+ messages in thread
From: Joao Martins @ 2024-02-13 12:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
	Cedric Le Goater, Paolo Bonzini, Daniel P . Berrange,
	Eduardo Habkost, Eric Blake, Markus Armbruster, Jason Gunthorpe,
	Avihai Horon

On 12/02/2024 13:56, Joao Martins wrote:
> diff --git a/backends/iommufd.c b/backends/iommufd.c
> index 8486894f1b3f..2970135af4b9 100644
> --- a/backends/iommufd.c
> +++ b/backends/iommufd.c
> @@ -211,6 +211,35 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>      return ret;
>  }
>  
> +int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
> +                               uint32_t pt_id, uint32_t flags,
> +                               uint32_t data_type, uint32_t data_len,
> +                               void *data_ptr, uint32_t *out_hwpt)

This function looks to be against the style of the file. Which is to have
IOMMUFDBackend as an argument rather than passing the file descriptor directly.

Likewise for dev_id now that we have IOMMUFDDevice structure.

I've fixed that for the next version (whether or not we require Zhengzhong series).

	Joao


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFCv2 0/8] vfio/iommufd: IOMMUFD Dirty Tracking
  2024-02-13 11:59 ` [PATCH RFCv2 0/8] vfio/iommufd: IOMMUFD Dirty Tracking Joao Martins
@ 2024-02-14 15:40   ` Cédric Le Goater
  2024-02-14 16:25     ` Joao Martins
  0 siblings, 1 reply; 37+ messages in thread
From: Cédric Le Goater @ 2024-02-14 15:40 UTC (permalink / raw)
  To: Joao Martins, qemu-devel
  Cc: Alex Williamson, Eric Auger, Yi Liu, Zhenzhong Duan,
	Paolo Bonzini, Daniel P . Berrange, Eduardo Habkost, Eric Blake,
	Markus Armbruster, Jason Gunthorpe, Avihai Horon

Hello Joao,

On 2/13/24 12:59, Joao Martins wrote:
> On 12/02/2024 13:56, Joao Martins wrote:
>> This small series adds support for Dirty Tracking in IOMMUFD backend.
>> The sole reason I still made it RFC is because of the second patch,
>> where we are implementing user-managed auto domains.
>>
>> In essence it is quite similar to the original IOMMUFD series where we
>> would allocate a HWPT, until we switched later on into a IOAS attach.
>> Patch 2 goes into more detail, but the gist is that there's two modes of
>> using IOMMUFD and by keep using kernel managed auto domains we would end
>> up duplicating the same flags we have in HWPT but into the VFIO IOAS
>> attach. While it is true that just adding a flag is simpler, it also
>> creates duplication and motivates duplicate what hwpt-alloc already has.
>> But there's a chance I have the wrong expectation here, so any feedback
>> welcome.
>>
>> The series is divided into:
>>
>> * Patch 1: Adds a simple helper to get device capabilities;
>>
>> * Patches 2 - 5: IOMMUFD backend support for dirty tracking;
>>
>> The workflow is relatively simple:
>>
>> 1) Probe device and allow dirty tracking in the HWPT
>> 2) Toggling dirty tracking on/off
>> 3) Read-and-clear of Dirty IOVAs
>>
>> The heuristics selected for (1) were to enable it *if* device supports
>> migration but doesn't support VF dirty tracking or IOMMU dirty tracking
>> is supported. The latter is for the hotplug case where we can add a device
>> without a tracker and thus still support migration.
>>
>> The unmap case is deferred until further vIOMMU support with migration
>> is added[3] which will then introduce the usage of
>> IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR in GET_DIRTY_BITMAP ioctl in the
>> dma unmap bitmap flow.
>>
>> * Patches 6-8: Add disabling of hugepages to allow tracking at base
>> page; avoid blocking live migration where there's no VF dirty
>> tracker, considering that we have IOMMU dirty tracking. And allow
>> disabling VF dirty tracker via qemu command line.
>>
>> This series builds on top of Zhengzhong series[0], but only requires the
>> first 9 patches i.e. up to ("vfio/pci: Initialize host iommu device
>> instance after attachment")[1] that are more generic IOMMUFD device
>> plumbing, and doesn't require the nesting counterpart.
>>
> I need to add that this series doesn't *need* to be based on Zhengzhong series.
> Though given that he is consolidating how an IOMMUFD device info is represented
> it felt the correct thing to do. For dirty tracking we mainly need the
> dev_id/iommufd available when we are going to attach, that's it.
> 
> I've pushed this series version that doesn't have such dependency, let me know
> if you want me to pursue this version instead going forward:
> 
> https://github.com/jpemartins/qemu/commits/iommufd-v5.nodeps

I feel I have lost track of all the different patchsets.

To recap, there is yours :

* vfio/iommufd: IOMMUFD Dirty Tracking
   https://lore.kernel.org/qemu-devel/20240212135643.5858-1-joao.m.martins@oracle.com/

Zhengzhong's :

* [PATCH rfcv2 00/18] Check and sync host IOMMU cap/ecap with vIOMMU
   https://lore.kernel.org/qemu-devel/20240201072818.327930-1-zhenzhong.duan@intel.com/

Eric's :

* [RFC 0/7] VIRTIO-IOMMU/VFIO: Fix host iommu geometry handling for hotplugged devices
   https://lore.kernel.org/qemu-devel/20240117080414.316890-1-eric.auger@redhat.com/

Steve's:

* [PATCH V3 00/13] allow cpr-reboot for vfio
   https://lore.kernel.org/qemu-devel/1707418446-134863-1-git-send-email-steven.sistare@oracle.com/

Mine, which should be an RFC :

* [PATCH 00/14] migration: Improve error reporting
   https://lore.kernel.org/qemu-devel/20240207133347.1115903-1-clg@redhat.com/

Anything else ?

Thanks,

C.



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFCv2 0/8] vfio/iommufd: IOMMUFD Dirty Tracking
  2024-02-14 15:40   ` Cédric Le Goater
@ 2024-02-14 16:25     ` Joao Martins
  2024-02-15 14:20       ` Eric Auger
  0 siblings, 1 reply; 37+ messages in thread
From: Joao Martins @ 2024-02-14 16:25 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: Alex Williamson, Eric Auger, Yi Liu, Zhenzhong Duan,
	Paolo Bonzini, Daniel P . Berrange, Eduardo Habkost, Eric Blake,
	Markus Armbruster, Jason Gunthorpe, Avihai Horon

On 14/02/2024 15:40, Cédric Le Goater wrote:
> Hello Joao,
> 
> On 2/13/24 12:59, Joao Martins wrote:
>> On 12/02/2024 13:56, Joao Martins wrote:
>>> This small series adds support for Dirty Tracking in IOMMUFD backend.
>>> The sole reason I still made it RFC is because of the second patch,
>>> where we are implementing user-managed auto domains.
>>>
>>> In essence it is quite similar to the original IOMMUFD series where we
>>> would allocate a HWPT, until we switched later on into a IOAS attach.
>>> Patch 2 goes into more detail, but the gist is that there's two modes of
>>> using IOMMUFD and by keep using kernel managed auto domains we would end
>>> up duplicating the same flags we have in HWPT but into the VFIO IOAS
>>> attach. While it is true that just adding a flag is simpler, it also
>>> creates duplication and motivates duplicate what hwpt-alloc already has.
>>> But there's a chance I have the wrong expectation here, so any feedback
>>> welcome.
>>>
>>> The series is divided into:
>>>
>>> * Patch 1: Adds a simple helper to get device capabilities;
>>>
>>> * Patches 2 - 5: IOMMUFD backend support for dirty tracking;
>>>
>>> The workflow is relatively simple:
>>>
>>> 1) Probe device and allow dirty tracking in the HWPT
>>> 2) Toggling dirty tracking on/off
>>> 3) Read-and-clear of Dirty IOVAs
>>>
>>> The heuristics selected for (1) were to enable it *if* device supports
>>> migration but doesn't support VF dirty tracking or IOMMU dirty tracking
>>> is supported. The latter is for the hotplug case where we can add a device
>>> without a tracker and thus still support migration.
>>>
>>> The unmap case is deferred until further vIOMMU support with migration
>>> is added[3] which will then introduce the usage of
>>> IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR in GET_DIRTY_BITMAP ioctl in the
>>> dma unmap bitmap flow.
>>>
>>> * Patches 6-8: Add disabling of hugepages to allow tracking at base
>>> page; avoid blocking live migration where there's no VF dirty
>>> tracker, considering that we have IOMMU dirty tracking. And allow
>>> disabling VF dirty tracker via qemu command line.
>>>
>>> This series builds on top of Zhengzhong series[0], but only requires the
>>> first 9 patches i.e. up to ("vfio/pci: Initialize host iommu device
>>> instance after attachment")[1] that are more generic IOMMUFD device
>>> plumbing, and doesn't require the nesting counterpart.
>>>
>> I need to add that this series doesn't *need* to be based on Zhengzhong series.
>> Though given that he is consolidating how an IOMMUFD device info is represented
>> it felt the correct thing to do. For dirty tracking we mainly need the
>> dev_id/iommufd available when we are going to attach, that's it.
>>
>> I've pushed this series version that doesn't have such dependency, let me know
>> if you want me to pursue this version instead going forward:
>>
>> https://github.com/jpemartins/qemu/commits/iommufd-v5.nodeps
> 
> I feel I have lost track of all the different patchsets.
> 
> To recap, there is yours :
> 
> * vfio/iommufd: IOMMUFD Dirty Tracking
>  
> https://lore.kernel.org/qemu-devel/20240212135643.5858-1-joao.m.martins@oracle.com/
> 
> Zhengzhong's :
> 
> * [PATCH rfcv2 00/18] Check and sync host IOMMU cap/ecap with vIOMMU
>  
> https://lore.kernel.org/qemu-devel/20240201072818.327930-1-zhenzhong.duan@intel.com/
> 

There's also this one from Zhenzhong which depends on this set above:

	https://lore.kernel.org/qemu-devel/20240115103735.132209-1-zhenzhong.duan@intel.com/

But I suspect that part of it is stale already, considering a whole lot of
IOMMUFDDevice was reworked. Though the series is about bringup intel-iommu
nesting support.

> Eric's :
> 
> * [RFC 0/7] VIRTIO-IOMMU/VFIO: Fix host iommu geometry handling for hotplugged
> devices
>   https://lore.kernel.org/qemu-devel/20240117080414.316890-1-eric.auger@redhat.com/
> 
> Steve's:
> 
> * [PATCH V3 00/13] allow cpr-reboot for vfio
>  
> https://lore.kernel.org/qemu-devel/1707418446-134863-1-git-send-email-steven.sistare@oracle.com/
> 
> Mine, which should be an RFC :
> 
> * [PATCH 00/14] migration: Improve error reporting
>   https://lore.kernel.org/qemu-devel/20240207133347.1115903-1-clg@redhat.com/
> 
> Anything else ?

In terms of major series, I think you only forgot one. The rest look to be
what's out there.

Just to avoid confusion, yesterday's message was just providing an alternative
of this same series but it that wouldn't be dependent on:

	[PATCH rfcv2 00/18] Check and sync host IOMMU cap/ecap with vIOMMU

... which is what is posted in this link:

	https://github.com/jpemartins/qemu/commits/iommufd-v5.nodeps

While the series, as posted, is here:

	https://github.com/jpemartins/qemu/commits/iommufd-v5


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFCv2 0/8] vfio/iommufd: IOMMUFD Dirty Tracking
  2024-02-14 16:25     ` Joao Martins
@ 2024-02-15 14:20       ` Eric Auger
  0 siblings, 0 replies; 37+ messages in thread
From: Eric Auger @ 2024-02-15 14:20 UTC (permalink / raw)
  To: Joao Martins, Cédric Le Goater, qemu-devel
  Cc: Alex Williamson, Yi Liu, Zhenzhong Duan, Paolo Bonzini,
	Daniel P . Berrange, Eduardo Habkost, Eric Blake,
	Markus Armbruster, Jason Gunthorpe, Avihai Horon

Hi,

On 2/14/24 17:25, Joao Martins wrote:
> On 14/02/2024 15:40, Cédric Le Goater wrote:
>> Hello Joao,
>>
>> On 2/13/24 12:59, Joao Martins wrote:
>>> On 12/02/2024 13:56, Joao Martins wrote:
>>>> This small series adds support for Dirty Tracking in IOMMUFD backend.
>>>> The sole reason I still made it RFC is because of the second patch,
>>>> where we are implementing user-managed auto domains.
>>>>
>>>> In essence it is quite similar to the original IOMMUFD series where we
>>>> would allocate a HWPT, until we switched later on into a IOAS attach.
>>>> Patch 2 goes into more detail, but the gist is that there's two modes of
>>>> using IOMMUFD and by keep using kernel managed auto domains we would end
>>>> up duplicating the same flags we have in HWPT but into the VFIO IOAS
>>>> attach. While it is true that just adding a flag is simpler, it also
>>>> creates duplication and motivates duplicate what hwpt-alloc already has.
>>>> But there's a chance I have the wrong expectation here, so any feedback
>>>> welcome.
>>>>
>>>> The series is divided into:
>>>>
>>>> * Patch 1: Adds a simple helper to get device capabilities;
>>>>
>>>> * Patches 2 - 5: IOMMUFD backend support for dirty tracking;
>>>>
>>>> The workflow is relatively simple:
>>>>
>>>> 1) Probe device and allow dirty tracking in the HWPT
>>>> 2) Toggling dirty tracking on/off
>>>> 3) Read-and-clear of Dirty IOVAs
>>>>
>>>> The heuristics selected for (1) were to enable it *if* device supports
>>>> migration but doesn't support VF dirty tracking or IOMMU dirty tracking
>>>> is supported. The latter is for the hotplug case where we can add a device
>>>> without a tracker and thus still support migration.
>>>>
>>>> The unmap case is deferred until further vIOMMU support with migration
>>>> is added[3] which will then introduce the usage of
>>>> IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR in GET_DIRTY_BITMAP ioctl in the
>>>> dma unmap bitmap flow.
>>>>
>>>> * Patches 6-8: Add disabling of hugepages to allow tracking at base
>>>> page; avoid blocking live migration where there's no VF dirty
>>>> tracker, considering that we have IOMMU dirty tracking. And allow
>>>> disabling VF dirty tracker via qemu command line.
>>>>
>>>> This series builds on top of Zhengzhong series[0], but only requires the
>>>> first 9 patches i.e. up to ("vfio/pci: Initialize host iommu device
>>>> instance after attachment")[1] that are more generic IOMMUFD device
>>>> plumbing, and doesn't require the nesting counterpart.
>>>>
>>> I need to add that this series doesn't *need* to be based on Zhengzhong series.
>>> Though given that he is consolidating how an IOMMUFD device info is represented
>>> it felt the correct thing to do. For dirty tracking we mainly need the
>>> dev_id/iommufd available when we are going to attach, that's it.
>>>
>>> I've pushed this series version that doesn't have such dependency, let me know
>>> if you want me to pursue this version instead going forward:
>>>
>>> https://github.com/jpemartins/qemu/commits/iommufd-v5.nodeps
>> I feel I have lost track of all the different patchsets.
>>
>> To recap, there is yours :
>>
>> * vfio/iommufd: IOMMUFD Dirty Tracking
>>  
>> https://lore.kernel.org/qemu-devel/20240212135643.5858-1-joao.m.martins@oracle.com/
>>
>> Zhengzhong's :
>>
>> * [PATCH rfcv2 00/18] Check and sync host IOMMU cap/ecap with vIOMMU
>>  
>> https://lore.kernel.org/qemu-devel/20240201072818.327930-1-zhenzhong.duan@intel.com/
>>
> There's also this one from Zhenzhong which depends on this set above:
>
> 	https://lore.kernel.org/qemu-devel/20240115103735.132209-1-zhenzhong.duan@intel.com/
>
> But I suspect that part of it is stale already, considering a whole lot of
> IOMMUFDDevice was reworked. Though the series is about bringup intel-iommu
> nesting support.
>
>> Eric's :
>>
>> * [RFC 0/7] VIRTIO-IOMMU/VFIO: Fix host iommu geometry handling for hotplugged
>> devices
>>   https://lore.kernel.org/qemu-devel/20240117080414.316890-1-eric.auger@redhat.com/

don't spend time reviewing my series at that stage. I will review
Zhenzhong's

[PATCH rfcv2 00/18] Check and sync host IOMMU cap/ecap with vIOMMU

and try to rebase on it.

Thanks

Eric

>>
>> Steve's:
>>
>> * [PATCH V3 00/13] allow cpr-reboot for vfio
>>  
>> https://lore.kernel.org/qemu-devel/1707418446-134863-1-git-send-email-steven.sistare@oracle.com/
>>
>> Mine, which should be an RFC :
>>
>> * [PATCH 00/14] migration: Improve error reporting
>>   https://lore.kernel.org/qemu-devel/20240207133347.1115903-1-clg@redhat.com/
>>
>> Anything else ?
> In terms of major series, I think you only forgot one. The rest look to be
> what's out there.
>
> Just to avoid confusion, yesterday's message was just providing an alternative
> of this same series but it that wouldn't be dependent on:
>
> 	[PATCH rfcv2 00/18] Check and sync host IOMMU cap/ecap with vIOMMU
>
> ... which is what is posted in this link:
>
> 	https://github.com/jpemartins/qemu/commits/iommufd-v5.nodeps
>
> While the series, as posted, is here:
>
> 	https://github.com/jpemartins/qemu/commits/iommufd-v5
>



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFCv2 2/8] vfio/iommufd: Introduce auto domain creation
  2024-02-12 13:56 ` [PATCH RFCv2 2/8] vfio/iommufd: Introduce auto domain creation Joao Martins
  2024-02-12 16:27   ` Jason Gunthorpe
  2024-02-13 12:01   ` Joao Martins
@ 2024-02-19  8:58   ` Avihai Horon
  2024-02-20 10:42     ` Joao Martins
  2 siblings, 1 reply; 37+ messages in thread
From: Avihai Horon @ 2024-02-19  8:58 UTC (permalink / raw)
  To: Joao Martins, qemu-devel
  Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
	Cedric Le Goater, Paolo Bonzini, Daniel P . Berrange,
	Eduardo Habkost, Eric Blake, Markus Armbruster, Jason Gunthorpe

Hi Joao,

On 12/02/2024 15:56, Joao Martins wrote:
> External email: Use caution opening links or attachments
>
>
> There's generally two modes of operation for IOMMUFD:
>
> * The simple user API which intends to perform relatively simple things
> with IOMMUs e.g. DPDK. It generally creates an IOAS and attach to VFIO
> and mainly performs IOAS_MAP and UNMAP.
>
> * The native IOMMUFD API where you have fine grained control of the
> IOMMU domain and model it accordingly. This is where most new feature
> are being steered to.
>
> For dirty tracking 2) is required, as it needs to ensure that
> the stage-2/parent IOMMU domain will only attach devices
> that support dirty tracking (so far it is all homogeneous in x86, likely
> not the case for smmuv3). Such invariant on dirty tracking provides a
> useful guarantee to VMMs that will refuse incompatible device
> attachments for IOMMU domains.
>
> For dirty tracking such property is enabled/enforced via HWPT_ALLOC,
> which is responsible for creating an IOMMU domain. This is contrast to
> the 'simple API' where the IOMMU domain is created by IOMMUFD
> automatically when it attaches to VFIO (usually referred as autodomains)
>
> To support dirty tracking with the advanced IOMMUFD API, it needs
> similar logic, where IOMMU domains are created and devices attached to
> compatible domains. Essentially mimmicing kernel
> iommufd_device_auto_get_domain(). If this fails (i.e. mdevs) it falls back
> to IOAS attach.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> Right now the only alternative to a userspace autodomains implementation
> is to mimmicing all the flags being added to HWPT_ALLOC but into VFIO
> IOAS attach. So opted for autodomains userspace approach to avoid the
> duplication of hwpt-alloc flags vs attach-ioas flags. I lack mdev real
> drivers atm, so testing with those is still TBD.
>
> Opinions, comments, welcome!
> ---
>   backends/iommufd.c            | 29 +++++++++++++
>   backends/trace-events         |  1 +
>   hw/vfio/iommufd.c             | 78 +++++++++++++++++++++++++++++++++++
>   include/hw/vfio/vfio-common.h |  9 ++++
>   include/sysemu/iommufd.h      |  4 ++
>   5 files changed, 121 insertions(+)
>
> diff --git a/backends/iommufd.c b/backends/iommufd.c
> index 8486894f1b3f..2970135af4b9 100644
> --- a/backends/iommufd.c
> +++ b/backends/iommufd.c
> @@ -211,6 +211,35 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>       return ret;
>   }
>
> +int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
> +                               uint32_t pt_id, uint32_t flags,
> +                               uint32_t data_type, uint32_t data_len,
> +                               void *data_ptr, uint32_t *out_hwpt)
> +{
> +    int ret;
> +    struct iommu_hwpt_alloc alloc_hwpt = {
> +        .size = sizeof(struct iommu_hwpt_alloc),
> +        .flags = flags,
> +        .dev_id = dev_id,
> +        .pt_id = pt_id,
> +        .data_type = data_type,
> +        .data_len = data_len,
> +        .data_uptr = (uint64_t)data_ptr,
> +        .__reserved = 0,
> +    };
> +
> +    ret = ioctl(iommufd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
> +    trace_iommufd_backend_alloc_hwpt(iommufd, dev_id, pt_id, flags, data_type,
> +                                     data_len, (uint64_t)data_ptr,
> +                                     alloc_hwpt.out_hwpt_id, ret);
> +    if (ret) {
> +        error_report("IOMMU_HWPT_ALLOC failed: %m");
> +    } else {
> +        *out_hwpt = alloc_hwpt.out_hwpt_id;
> +    }
> +    return !ret ? 0 : -errno;
> +}
> +
>   static const TypeInfo iommufd_backend_info = {
>       .name = TYPE_IOMMUFD_BACKEND,
>       .parent = TYPE_OBJECT,
> diff --git a/backends/trace-events b/backends/trace-events
> index d45c6e31a67e..f83a276a4253 100644
> --- a/backends/trace-events
> +++ b/backends/trace-events
> @@ -13,5 +13,6 @@ iommu_backend_set_fd(int fd) "pre-opened /dev/iommu fd=%d"
>   iommufd_backend_map_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, void *vaddr, bool readonly, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" addr=%p readonly=%d (%d)"
>   iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping: iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
>   iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
> +iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
>   iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d ioas=%d (%d)"
>   iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 7d39d7a5fa51..ca7ec45e725c 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -219,10 +219,82 @@ static int iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
>       return ret;
>   }
>
> +static int iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
> +                                        VFIOIOMMUFDContainer *container,
> +                                        Error **errp)
> +{
> +    int iommufd = vbasedev->iommufd_dev.iommufd->fd;
> +    VFIOIOASHwpt *hwpt;
> +    Error *err = NULL;
> +    int ret = -EINVAL;

The -EINVAL initialization is not necessary.

> +    uint32_t hwpt_id;
> +
> +    /* Try to find a domain */
> +    QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
> +        ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, &err);
> +        if (ret) {
> +            /* -EINVAL means the domain is incompatible with the device. */
> +            if (ret == -EINVAL) {

On error iommufd_cdev_attach_ioas_hwpt() returns -1 and not -errno, so I 
guess we need to change it.

> +                continue;
> +            }
> +            return ret;
> +        } else {
> +            vbasedev->hwpt = hwpt;
> +            QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
> +            return 0;
> +        }
> +    }
> +
> +    ret = iommufd_backend_alloc_hwpt(iommufd,
> +                                     vbasedev->iommufd_dev.devid,
> +                                     container->ioas_id, 0, 0, 0,

Should we explicitly pass IOMMU_HWPT_DATA_NONE instead of 0?

> +                                     NULL, &hwpt_id);
> +    if (ret) {
> +        error_append_hint(&err,
> +                   "Failed to allocate HWPT for device %s. Fallback to IOAS attach\n",
> +                   vbasedev->name);
> +        warn_report_err(err);
> +        return ret;
> +    }
> +
> +    hwpt = g_malloc0(sizeof(*hwpt));
> +    hwpt->hwpt_id = hwpt_id;
> +    QLIST_INIT(&hwpt->device_list);
> +
> +    ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, &err);
> +    if (ret) {
> +        iommufd_backend_free_id(container->be, hwpt->hwpt_id);
> +        g_free(hwpt);
> +        return ret;
> +    }
> +
> +    vbasedev->hwpt = hwpt;
> +    QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
> +    QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
> +    return 0;

I think we need to improve error handling in this function.
There are various places where err is not freed/reported, not NULL-ed 
before re-used, or used in error_append_hint() although it can be NULL.
If there are places where Error can be ignored, we can pass NULL instead 
of &err.
Plus, errp parameter passed to this function is never used.

> +}
> +
> +static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev,
> +                                         VFIOIOMMUFDContainer *container)
> +{
> +    VFIOIOASHwpt *hwpt = vbasedev->hwpt;
> +
> +    QLIST_REMOVE(vbasedev, hwpt_next);
> +    QLIST_REMOVE(hwpt, next);

Shouldn't QLIST_REMOVE(hwpt, next) be moved inside the if?

> +    if (QLIST_EMPTY(&hwpt->device_list)) {
> +        iommufd_backend_free_id(container->be, hwpt->hwpt_id);
> +        g_free(hwpt);
> +    }
> +}
> +
>   static int iommufd_cdev_attach_container(VFIODevice *vbasedev,
>                                            VFIOIOMMUFDContainer *container,
>                                            Error **errp)
>   {
> +    if (!iommufd_cdev_autodomains_get(vbasedev, container, errp)) {
> +        return 0;
> +    }

If errp is used in iommufd_cdev_autodomains_get() eventually, we need to 
make sure it doens't contain an Error object before using it again below.

Thanks.

> +
>       return iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp);
>   }
>
> @@ -231,6 +303,11 @@ static void iommufd_cdev_detach_container(VFIODevice *vbasedev,
>   {
>       Error *err = NULL;
>
> +    if (vbasedev->hwpt) {
> +        iommufd_cdev_autodomains_put(vbasedev, container);
> +        return;
> +    }
> +
>       if (iommufd_cdev_detach_ioas_hwpt(vbasedev, &err)) {
>           error_report_err(err);
>       }
> @@ -370,6 +447,7 @@ static int iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
>       container = g_malloc0(sizeof(*container));
>       container->be = vbasedev->iommufd_dev.iommufd;
>       container->ioas_id = ioas_id;
> +    QLIST_INIT(&container->hwpt_list);
>
>       bcontainer = &container->bcontainer;
>       vfio_container_init(bcontainer, space, iommufd_vioc);
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 9c4b60c906d9..7f7d823221e2 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -93,10 +93,17 @@ typedef struct VFIOHostDMAWindow {
>
>   typedef struct IOMMUFDBackend IOMMUFDBackend;
>
> +typedef struct VFIOIOASHwpt {
> +    uint32_t hwpt_id;
> +    QLIST_HEAD(, VFIODevice) device_list;
> +    QLIST_ENTRY(VFIOIOASHwpt) next;
> +} VFIOIOASHwpt;
> +
>   typedef struct VFIOIOMMUFDContainer {
>       VFIOContainerBase bcontainer;
>       IOMMUFDBackend *be;
>       uint32_t ioas_id;
> +    QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
>   } VFIOIOMMUFDContainer;
>
>   /* Abstraction of host IOMMU legacy device */
> @@ -136,6 +143,8 @@ typedef struct VFIODevice {
>           IOMMULegacyDevice legacy_dev;
>           IOMMUFDDevice iommufd_dev;
>       };
> +    VFIOIOASHwpt *hwpt;
> +    QLIST_ENTRY(VFIODevice) hwpt_next;
>   } VFIODevice;
>
>   QEMU_BUILD_BUG_ON(offsetof(VFIODevice, legacy_dev.base) !=
> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
> index 4afe97307dbe..1966b75caae2 100644
> --- a/include/sysemu/iommufd.h
> +++ b/include/sysemu/iommufd.h
> @@ -49,4 +49,8 @@ typedef struct IOMMUFDDevice {
>   void iommufd_device_init(IOMMUFDDevice *idev);
>   int iommufd_device_get_hw_capabilities(IOMMUFDDevice *idev, uint64_t *caps,
>                                          Error **errp);
> +int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
> +                               uint32_t pt_id, uint32_t flags,
> +                               uint32_t data_type, uint32_t data_len,
> +                               void *data_ptr, uint32_t *out_hwpt);
>   #endif
> --
> 2.39.3
>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFCv2 3/8] vfio/iommufd: Probe and request hwpt dirty tracking capability
  2024-02-12 13:56 ` [PATCH RFCv2 3/8] vfio/iommufd: Probe and request hwpt dirty tracking capability Joao Martins
@ 2024-02-19  9:03   ` Avihai Horon
  2024-02-20 10:51     ` Joao Martins
  0 siblings, 1 reply; 37+ messages in thread
From: Avihai Horon @ 2024-02-19  9:03 UTC (permalink / raw)
  To: Joao Martins, qemu-devel
  Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
	Cedric Le Goater, Paolo Bonzini, Daniel P . Berrange,
	Eduardo Habkost, Eric Blake, Markus Armbruster, Jason Gunthorpe

Hi Joao,

On 12/02/2024 15:56, Joao Martins wrote:
> External email: Use caution opening links or attachments
>
>
> Probe hardware dirty tracking support by querying device hw capabilities
> via IOMMUFD_GET_HW_INFO.
>
> In preparation to using the dirty tracking UAPI, request dirty tracking in
> the HWPT flags when the device doesn't support dirty page tracking or has
> it disabled; or when support when the VF backing IOMMU supports dirty
> tracking. The latter is in the possibility of a device being attached
> that doesn't have a dirty tracker.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>   hw/vfio/common.c              | 18 ++++++++++++++++++
>   hw/vfio/iommufd.c             | 25 ++++++++++++++++++++++++-
>   include/hw/vfio/vfio-common.h |  2 ++
>   3 files changed, 44 insertions(+), 1 deletion(-)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index f7f85160be88..d8fc7077f839 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -216,6 +216,24 @@ bool vfio_devices_all_device_dirty_tracking(const VFIOContainerBase *bcontainer)
>       return true;
>   }
>
> +bool vfio_device_migration_supported(VFIODevice *vbasedev)
> +{
> +    if (!vbasedev->migration) {
> +        return false;
> +    }
> +
> +    return vbasedev->migration->mig_flags & VFIO_MIGRATION_STOP_COPY;

I think this is redundant, as (vbasedev->migration != NULL) implies 
(vbasedev->migration->mig_flags & VFIO_MIGRATION_STOP_COPY) == true.

> +}
> +
> +bool vfio_device_dirty_pages_supported(VFIODevice *vbasedev)
> +{
> +    if (vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) {
> +        return false;
> +    }
> +
> +    return !vbasedev->dirty_pages_supported;
> +}
> +
>   /*
>    * Check if all VFIO devices are running and migration is active, which is
>    * essentially equivalent to the migration being in pre-copy phase.
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index ca7ec45e725c..edacb6d72748 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -219,11 +219,26 @@ static int iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
>       return ret;
>   }
>
> +static bool iommufd_dirty_pages_supported(IOMMUFDDevice *iommufd_dev,
> +                                          Error **errp)
> +{
> +    uint64_t caps;
> +    int r;
> +
> +    r = iommufd_device_get_hw_capabilities(iommufd_dev, &caps, errp);
> +    if (r) {
> +        return false;
> +    }
> +
> +    return caps & IOMMU_HW_CAP_DIRTY_TRACKING;

The false return value of this function is overloaded, it can indicate 
both error and lack of DPT support.
Should we fail iommufd_cdev_autodomains_get() if 
iommufd_dirty_pages_supported() fails?
Otherwise, errp argument of iommufd_dirty_pages_supported() is redundant 
and we can handle iommufd_device_get_hw_capabilities() error locally.

> +}
> +
>   static int iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>                                           VFIOIOMMUFDContainer *container,
>                                           Error **errp)
>   {
>       int iommufd = vbasedev->iommufd_dev.iommufd->fd;
> +    uint32_t flags = 0;
>       VFIOIOASHwpt *hwpt;
>       Error *err = NULL;
>       int ret = -EINVAL;
> @@ -245,9 +260,15 @@ static int iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>           }
>       }
>
> +    if ((vfio_device_migration_supported(vbasedev) &&
> +         !vfio_device_dirty_pages_supported(vbasedev)) ||
> +        iommufd_dirty_pages_supported(&vbasedev->iommufd_dev, &err)) {

I think it's too early to check vfio_device_migration_supported() and 
vfio_device_dirty_pages_supported() here, as vfio_migration_init() 
hasn't been called yet so vbasedev->migration and 
vbasedev->dirty_pages_supported are not initialized.
Why do we need to check this? Can't we simply request IOMMUFD DPT if 
it's supported?

Thanks.

> +        flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
> +    }
> +
>       ret = iommufd_backend_alloc_hwpt(iommufd,
>                                        vbasedev->iommufd_dev.devid,
> -                                     container->ioas_id, 0, 0, 0,
> +                                     container->ioas_id, flags, 0, 0,
>                                        NULL, &hwpt_id);
>       if (ret) {
>           error_append_hint(&err,
> @@ -271,6 +292,8 @@ static int iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>       vbasedev->hwpt = hwpt;
>       QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>       QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
> +    container->bcontainer.dirty_pages_supported =
> +                              (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING);
>       return 0;
>   }
>
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 7f7d823221e2..a3e691c126c6 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -271,6 +271,8 @@ bool
>   vfio_devices_all_running_and_mig_active(const VFIOContainerBase *bcontainer);
>   bool
>   vfio_devices_all_device_dirty_tracking(const VFIOContainerBase *bcontainer);
> +bool vfio_device_migration_supported(VFIODevice *vbasedev);
> +bool vfio_device_dirty_pages_supported(VFIODevice *vbasedev);
>   int vfio_devices_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
>                                       VFIOBitmap *vbmap, hwaddr iova,
>                                       hwaddr size);
> --
> 2.39.3
>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFCv2 4/8] vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
  2024-02-12 13:56 ` [PATCH RFCv2 4/8] vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support Joao Martins
@ 2024-02-19  9:17   ` Avihai Horon
  0 siblings, 0 replies; 37+ messages in thread
From: Avihai Horon @ 2024-02-19  9:17 UTC (permalink / raw)
  To: Joao Martins, qemu-devel
  Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
	Cedric Le Goater, Paolo Bonzini, Daniel P . Berrange,
	Eduardo Habkost, Eric Blake, Markus Armbruster, Jason Gunthorpe

Hi Joao,

On 12/02/2024 15:56, Joao Martins wrote:
> External email: Use caution opening links or attachments
>
>
> ioctl(iommufd, IOMMU_HWPT_SET_DIRTY_TRACKING, arg) is the UAPI that
> enables or disables dirty page tracking.
>
> It is called on the whole list of iommu domains it is are tracking,
> and on failure it rolls it back.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>   backends/iommufd.c       | 19 +++++++++++++++++++
>   backends/trace-events    |  1 +
>   hw/vfio/common.c         |  7 ++++++-
>   hw/vfio/iommufd.c        | 28 ++++++++++++++++++++++++++++
>   include/sysemu/iommufd.h |  3 +++
>   5 files changed, 57 insertions(+), 1 deletion(-)
>
> diff --git a/backends/iommufd.c b/backends/iommufd.c
> index 2970135af4b9..954de61c2da0 100644
> --- a/backends/iommufd.c
> +++ b/backends/iommufd.c
> @@ -240,6 +240,25 @@ int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
>       return !ret ? 0 : -errno;
>   }
>
> +int iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
> +                                       bool start)
> +{
> +    int ret;
> +    struct iommu_hwpt_set_dirty_tracking set_dirty = {
> +            .size = sizeof(set_dirty),
> +            .hwpt_id = hwpt_id,
> +            .flags = !start ? 0 : IOMMU_HWPT_DIRTY_TRACKING_ENABLE,
> +    };
> +
> +    ret = ioctl(be->fd, IOMMU_HWPT_SET_DIRTY_TRACKING, &set_dirty);
> +    trace_iommufd_backend_set_dirty(be->fd, hwpt_id, start, ret);
> +    if (ret) {
> +        error_report("IOMMU_HWPT_SET_DIRTY_TRACKING failed: %s",
> +                     strerror(errno));
> +    }
> +    return !ret ? 0 : -errno;
> +}
> +
>   static const TypeInfo iommufd_backend_info = {
>       .name = TYPE_IOMMUFD_BACKEND,
>       .parent = TYPE_OBJECT,
> diff --git a/backends/trace-events b/backends/trace-events
> index f83a276a4253..feba2baca5f7 100644
> --- a/backends/trace-events
> +++ b/backends/trace-events
> @@ -16,3 +16,4 @@ iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t si
>   iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
>   iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d ioas=%d (%d)"
>   iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
> +iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) " iommufd=%d hwpt=%d enable=%d (%d)"

s/hwpt=%d/hwpt=%u

> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index d8fc7077f839..a940c0b6ede8 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -190,7 +190,7 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainerBase *bcontainer)
>       QLIST_FOREACH(vbasedev, &bcontainer->device_list, container_next) {
>           VFIOMigration *migration = vbasedev->migration;
>
> -        if (!migration) {
> +        if (!migration && !vbasedev->iommufd_dev.iommufd) {
>               return false;
>           }
>
> @@ -199,6 +199,11 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainerBase *bcontainer)
>                vfio_device_state_is_precopy(vbasedev))) {
>               return false;
>           }
> +
> +        if (vbasedev->iommufd_dev.iommufd &&
> +            !bcontainer->dirty_pages_supported) {
> +            return false;
> +        }

Why do we need this and the above?
IIUC, vfio_devices_all_dirty_tracking() is used to check if this is a 
"proper time" to issue a dirty page sync (e.g., if migration is active, 
if we are in pre-copy and dirty tracking during pre-copy is enabled).
If it's a "proper time" to do dirty page sync, even if 
bcontainer->dirty_pages_supported is false, we should still issue a 
dirty sync which will mark all dirty.

>       }
>       return true;
>   }
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index edacb6d72748..361e659288fd 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -25,6 +25,7 @@
>   #include "qemu/cutils.h"
>   #include "qemu/chardev_open.h"
>   #include "pci.h"
> +#include "migration/migration.h"

This is redundant.

Thanks.

>
>   static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
>                               ram_addr_t size, void *vaddr, bool readonly)
> @@ -115,6 +116,32 @@ static void iommufd_cdev_unbind_and_disconnect(VFIODevice *vbasedev)
>       iommufd_backend_disconnect(vbasedev->iommufd_dev.iommufd);
>   }
>
> +static int iommufd_set_dirty_page_tracking(const VFIOContainerBase *bcontainer,
> +                                           bool start)
> +{
> +    const VFIOIOMMUFDContainer *container =
> +        container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
> +    int ret;
> +    VFIOIOASHwpt *hwpt;
> +
> +    QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
> +        ret = iommufd_backend_set_dirty_tracking(container->be,
> +                                                 hwpt->hwpt_id, start);
> +        if (ret) {
> +            goto err;
> +        }
> +    }
> +
> +    return 0;
> +
> +err:
> +    QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
> +        iommufd_backend_set_dirty_tracking(container->be,
> +                                           hwpt->hwpt_id, !start);
> +    }
> +    return ret;
> +}
> +
>   static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
>   {
>       long int ret = -ENOTTY;
> @@ -737,6 +764,7 @@ static void vfio_iommu_iommufd_class_init(ObjectClass *klass, void *data)
>       vioc->detach_device = iommufd_cdev_detach;
>       vioc->pci_hot_reset = iommufd_cdev_pci_hot_reset;
>       vioc->host_iommu_device_init = vfio_cdev_host_iommu_device_init;
> +    vioc->set_dirty_page_tracking = iommufd_set_dirty_page_tracking;
>   };
>
>   static const TypeInfo types[] = {
> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
> index 1966b75caae2..562c189dd92c 100644
> --- a/include/sysemu/iommufd.h
> +++ b/include/sysemu/iommufd.h
> @@ -53,4 +53,7 @@ int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
>                                  uint32_t pt_id, uint32_t flags,
>                                  uint32_t data_type, uint32_t data_len,
>                                  void *data_ptr, uint32_t *out_hwpt);
> +int iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
> +                                       bool start);
> +
>   #endif
> --
> 2.39.3
>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFCv2 5/8] vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
  2024-02-12 13:56 ` [PATCH RFCv2 5/8] vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support Joao Martins
@ 2024-02-19  9:30   ` Avihai Horon
  2024-02-20 10:59     ` Joao Martins
  0 siblings, 1 reply; 37+ messages in thread
From: Avihai Horon @ 2024-02-19  9:30 UTC (permalink / raw)
  To: Joao Martins, qemu-devel
  Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
	Cedric Le Goater, Paolo Bonzini, Daniel P . Berrange,
	Eduardo Habkost, Eric Blake, Markus Armbruster, Jason Gunthorpe

Hi Joao,

On 12/02/2024 15:56, Joao Martins wrote:
> External email: Use caution opening links or attachments
>
>
> ioctl(iommufd, IOMMU_HWPT_GET_DIRTY_BITMAP, arg) is the UAPI
> that fetches the bitmap that tells what was dirty in an IOVA
> range.
>
> A single bitmap is allocated and used across all the hwpts
> sharing an IOAS which is then used in log_sync() to set Qemu
> global bitmaps.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>   backends/iommufd.c       | 24 ++++++++++++++++++++++++
>   backends/trace-events    |  1 +
>   hw/vfio/iommufd.c        | 30 ++++++++++++++++++++++++++++++
>   include/sysemu/iommufd.h |  3 +++
>   4 files changed, 58 insertions(+)
>
> diff --git a/backends/iommufd.c b/backends/iommufd.c
> index 954de61c2da0..dd676d493c37 100644
> --- a/backends/iommufd.c
> +++ b/backends/iommufd.c
> @@ -259,6 +259,30 @@ int iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
>       return !ret ? 0 : -errno;
>   }
>
> +int iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
> +                                     uint64_t iova, ram_addr_t size,
> +                                     uint64_t page_size, uint64_t *data)
> +{
> +    int ret;
> +    struct iommu_hwpt_get_dirty_bitmap get_dirty_bitmap = {
> +        .size = sizeof(get_dirty_bitmap),
> +        .hwpt_id = hwpt_id,
> +        .iova = iova, .length = size,
> +        .page_size = page_size, .data = (uintptr_t)data,

Member per line for readability?

> +    };
> +
> +    ret = ioctl(be->fd, IOMMU_HWPT_GET_DIRTY_BITMAP, &get_dirty_bitmap);
> +    trace_iommufd_backend_get_dirty_bitmap(be->fd, hwpt_id, iova, size,
> +                                           page_size, ret);
> +    if (ret) {
> +        error_report("IOMMU_HWPT_GET_DIRTY_BITMAP (iova: 0x%"PRIx64
> +                     " size: 0x%"PRIx64") failed: %s", iova,
> +                     size, strerror(errno));
> +    }
> +
> +    return !ret ? 0 : -errno;
> +}
> +
>   static const TypeInfo iommufd_backend_info = {
>       .name = TYPE_IOMMUFD_BACKEND,
>       .parent = TYPE_OBJECT,
> diff --git a/backends/trace-events b/backends/trace-events
> index feba2baca5f7..11a27cb114b6 100644
> --- a/backends/trace-events
> +++ b/backends/trace-events
> @@ -17,3 +17,4 @@ iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_
>   iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d ioas=%d (%d)"
>   iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
>   iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) " iommufd=%d hwpt=%d enable=%d (%d)"
> +iommufd_backend_get_dirty_bitmap(int iommufd, uint32_t hwpt_id, uint64_t iova, uint64_t size, uint64_t page_size, int ret) " iommufd=%d hwpt=%d iova=0x%"PRIx64" size=0x%"PRIx64" page_size=0x%"PRIx64" (%d)"

s/hwpt=%d/hwpt=%u

> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 361e659288fd..79b13bd262cc 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -25,6 +25,7 @@
>   #include "qemu/cutils.h"
>   #include "qemu/chardev_open.h"
>   #include "pci.h"
> +#include "exec/ram_addr.h"
>   #include "migration/migration.h"
>
>   static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
> @@ -142,6 +143,34 @@ err:
>       return ret;
>   }
>
> +static int iommufd_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
> +                                      VFIOBitmap *vbmap, uint64_t iova,
> +                                      uint64_t size)
> +{
> +    VFIOIOMMUFDContainer *container = container_of(bcontainer,
> +                                                   VFIOIOMMUFDContainer,
> +                                                   bcontainer);
> +    int ret;
> +    VFIOIOASHwpt *hwpt;
> +    unsigned long page_size;
> +
> +    if (!bcontainer->dirty_pages_supported) {

Do we need this check?
IIUC, if we got to iommufd_query_dirty_bitmap(), it means 
bcontainer->dirty_pages_supported is already true.

Thanks.

> +        return 0;
> +    }
> +
> +    page_size = qemu_real_host_page_size();
> +    QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
> +        ret = iommufd_backend_get_dirty_bitmap(container->be, hwpt->hwpt_id,
> +                                               iova, size, page_size,
> +                                               vbmap->bitmap);
> +        if (ret) {
> +            break;
> +        }
> +    }
> +
> +    return ret;
> +}
> +
>   static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
>   {
>       long int ret = -ENOTTY;
> @@ -765,6 +794,7 @@ static void vfio_iommu_iommufd_class_init(ObjectClass *klass, void *data)
>       vioc->pci_hot_reset = iommufd_cdev_pci_hot_reset;
>       vioc->host_iommu_device_init = vfio_cdev_host_iommu_device_init;
>       vioc->set_dirty_page_tracking = iommufd_set_dirty_page_tracking;
> +    vioc->query_dirty_bitmap = iommufd_query_dirty_bitmap;
>   };
>
>   static const TypeInfo types[] = {
> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
> index 562c189dd92c..ba19b7ea4c19 100644
> --- a/include/sysemu/iommufd.h
> +++ b/include/sysemu/iommufd.h
> @@ -55,5 +55,8 @@ int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
>                                  void *data_ptr, uint32_t *out_hwpt);
>   int iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
>                                          bool start);
> +int iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
> +                                     uint64_t iova, ram_addr_t size,
> +                                     uint64_t page_size, uint64_t *data);
>
>   #endif
> --
> 2.39.3
>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFCv2 6/8] backends/iommufd: Add ability to disable hugepages
  2024-02-12 13:56 ` [PATCH RFCv2 6/8] backends/iommufd: Add ability to disable hugepages Joao Martins
  2024-02-12 16:59   ` Jason Gunthorpe
  2024-02-12 17:17   ` Markus Armbruster
@ 2024-02-19 10:05   ` Avihai Horon
  2024-02-20 11:01     ` Joao Martins
  2 siblings, 1 reply; 37+ messages in thread
From: Avihai Horon @ 2024-02-19 10:05 UTC (permalink / raw)
  To: Joao Martins, qemu-devel
  Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
	Cedric Le Goater, Paolo Bonzini, Daniel P . Berrange,
	Eduardo Habkost, Eric Blake, Markus Armbruster, Jason Gunthorpe

Hi Joao,

On 12/02/2024 15:56, Joao Martins wrote:
> External email: Use caution opening links or attachments
>
>
> Allow disabling hugepages to be dirty track at base page
> granularity in similar vein to vfio_type1_iommu.disable_hugepages
> but per IOAS.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>   backends/iommufd.c       | 36 ++++++++++++++++++++++++++++++++++++
>   backends/trace-events    |  1 +
>   hw/vfio/iommufd.c        |  4 ++++
>   include/sysemu/iommufd.h |  4 ++++
>   qapi/qom.json            |  2 +-
>   5 files changed, 46 insertions(+), 1 deletion(-)
>
> diff --git a/backends/iommufd.c b/backends/iommufd.c
> index dd676d493c37..72fd98a9a50c 100644
> --- a/backends/iommufd.c
> +++ b/backends/iommufd.c
> @@ -29,6 +29,7 @@ static void iommufd_backend_init(Object *obj)
>       be->fd = -1;
>       be->users = 0;
>       be->owned = true;
> +    be->hugepages = 1;
>   }
>
>   static void iommufd_backend_finalize(Object *obj)
> @@ -63,6 +64,14 @@ static bool iommufd_backend_can_be_deleted(UserCreatable *uc)
>       return !be->users;
>   }
>
> +static void iommufd_backend_set_hugepages(Object *obj, bool enabled,
> +                                          Error **errp)
> +{
> +    IOMMUFDBackend *be = IOMMUFD_BACKEND(obj);
> +
> +    be->hugepages = enabled;
> +}
> +
>   static void iommufd_backend_class_init(ObjectClass *oc, void *data)
>   {
>       UserCreatableClass *ucc = USER_CREATABLE_CLASS(oc);
> @@ -70,6 +79,11 @@ static void iommufd_backend_class_init(ObjectClass *oc, void *data)
>       ucc->can_be_deleted = iommufd_backend_can_be_deleted;
>
>       object_class_property_add_str(oc, "fd", NULL, iommufd_backend_set_fd);
> +
> +    object_class_property_add_bool(oc, "hugepages", NULL,
> +                                   iommufd_backend_set_hugepages);
> +    object_class_property_set_description(oc, "hugepages",
> +                                          "Set to 'off' to disable hugepages");
>   }
>
>   int iommufd_backend_connect(IOMMUFDBackend *be, Error **errp)
> @@ -106,6 +120,28 @@ out:
>       trace_iommufd_backend_disconnect(be->fd, be->users);
>   }
>
> +int iommufd_backend_set_option(int fd, uint32_t object_id,
> +                               uint32_t option_id, uint64_t val64)
> +{
> +    int ret;
> +    struct iommu_option option = {
> +        .size = sizeof(option),
> +        .option_id = option_id,
> +        .op = IOMMU_OPTION_OP_SET,
> +        .val64 = val64,
> +        .object_id = object_id,
> +    };
> +
> +    ret = ioctl(fd, IOMMU_OPTION, &option);
> +    if (ret) {
> +        error_report("Failed to set option %x to value %"PRIx64" %m", option_id,
> +                     val64);
> +    }
> +    trace_iommufd_backend_set_option(fd, object_id, option_id, val64, ret);
> +
> +    return ret;
> +}
> +
>   int iommufd_backend_alloc_ioas(IOMMUFDBackend *be, uint32_t *ioas_id,
>                                  Error **errp)
>   {
> diff --git a/backends/trace-events b/backends/trace-events
> index 11a27cb114b6..076166552881 100644
> --- a/backends/trace-events
> +++ b/backends/trace-events
> @@ -15,6 +15,7 @@ iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, uint64_t iova, u
>   iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
>   iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
>   iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d ioas=%d (%d)"
> +iommufd_backend_set_option(int iommufd, uint32_t object_id, uint32_t option_id, uint64_t val, int ret) " iommufd=%d object_id=%u option_id=%u val64=0x%"PRIx64" (%d)"
>   iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
>   iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) " iommufd=%d hwpt=%d enable=%d (%d)"
>   iommufd_backend_get_dirty_bitmap(int iommufd, uint32_t hwpt_id, uint64_t iova, uint64_t size, uint64_t page_size, int ret) " iommufd=%d hwpt=%d iova=0x%"PRIx64" size=0x%"PRIx64" page_size=0x%"PRIx64" (%d)"
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 79b13bd262cc..697d40841d7f 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -521,6 +521,10 @@ static int iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
>           goto err_alloc_ioas;
>       }
>
> +    if (!vbasedev->iommufd_dev.iommufd->hugepages) {
> +        iommufd_backend_set_option(vbasedev->iommufd_dev.iommufd->fd, ioas_id,
> +                                   IOMMU_OPTION_HUGE_PAGES, 0);

Shouldn't we fail device attach if iommufd_backend_set_option() fails?

Thanks.

> +    }
>       trace_iommufd_cdev_alloc_ioas(vbasedev->iommufd_dev.iommufd->fd, ioas_id);
>
>       container = g_malloc0(sizeof(*container));
> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
> index ba19b7ea4c19..bc6607e3d444 100644
> --- a/include/sysemu/iommufd.h
> +++ b/include/sysemu/iommufd.h
> @@ -19,6 +19,7 @@ struct IOMMUFDBackend {
>       /*< protected >*/
>       int fd;            /* /dev/iommu file descriptor */
>       bool owned;        /* is the /dev/iommu opened internally */
> +    bool hugepages;    /* are hugepages enabled on the IOAS */
>       uint32_t users;
>
>       /*< public >*/
> @@ -30,6 +31,9 @@ void iommufd_backend_disconnect(IOMMUFDBackend *be);
>   int iommufd_backend_alloc_ioas(IOMMUFDBackend *be, uint32_t *ioas_id,
>                                  Error **errp);
>   void iommufd_backend_free_id(IOMMUFDBackend *be, uint32_t id);
> +int iommufd_backend_set_option(int fd, uint32_t object_id,
> +                               uint32_t option_id,
> +                               uint64_t val64);
>   int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
>                               ram_addr_t size, void *vaddr, bool readonly);
>   int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
> diff --git a/qapi/qom.json b/qapi/qom.json
> index 84af23fe245d..9ad27e2b939b 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -809,7 +809,7 @@
>   # Since: 9.0
>   ##
>   { 'struct': 'IOMMUFDProperties',
> -  'data': { '*fd': 'str' } }
> +  'data': { '*fd': 'str', '*hugepages': 'bool' } }
>
>   ##
>   # @RngProperties:
> --
> 2.39.3
>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFCv2 7/8] vfio/migration: Don't block migration device dirty tracking is unsupported
  2024-02-12 13:56 ` [PATCH RFCv2 7/8] vfio/migration: Don't block migration device dirty tracking is unsupported Joao Martins
@ 2024-02-19 10:12   ` Avihai Horon
  2024-02-20 11:05     ` Joao Martins
  0 siblings, 1 reply; 37+ messages in thread
From: Avihai Horon @ 2024-02-19 10:12 UTC (permalink / raw)
  To: Joao Martins, qemu-devel
  Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
	Cedric Le Goater, Paolo Bonzini, Daniel P . Berrange,
	Eduardo Habkost, Eric Blake, Markus Armbruster, Jason Gunthorpe

Hi Joao,

On 12/02/2024 15:56, Joao Martins wrote:
> External email: Use caution opening links or attachments
>
>
> By default VFIO migration is set to auto, which will support live
> migration if the migration capability is set *and* also dirty page
> tracking is supported.
>
> For testing purposes one can force enable without dirty page tracking
> via enable-migration=on, but that option is generally left for testing
> purposes.
>
> So starting with IOMMU dirty tracking it can use to acomodate the lack of
> VF dirty page tracking allowing us to minimize the VF requirements for
> migration and thus enabling migration by default for those.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>   hw/vfio/iommufd.c        | 3 +--
>   hw/vfio/migration.c      | 4 +++-
>   include/sysemu/iommufd.h | 1 +
>   3 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 697d40841d7f..78d8f4391b68 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -275,8 +275,7 @@ static int iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
>       return ret;
>   }
>
> -static bool iommufd_dirty_pages_supported(IOMMUFDDevice *iommufd_dev,
> -                                          Error **errp)
> +bool iommufd_dirty_pages_supported(IOMMUFDDevice *iommufd_dev, Error **errp)
>   {
>       uint64_t caps;
>       int r;
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 70e6b1a709f9..674e76b3f3df 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -938,7 +938,9 @@ bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
>           return !vfio_block_migration(vbasedev, err, errp);
>       }
>
> -    if (!vbasedev->dirty_pages_supported) {
> +    if (!vbasedev->dirty_pages_supported &&
> +        (vbasedev->iommufd_dev.iommufd &&

Shouldn't we check the type of base_hdev instead?

> +         !iommufd_dirty_pages_supported(&vbasedev->iommufd_dev, &err))) {

Maybe we can store IOMMUFD DPT support in iommufd_dev and use it instead 
of querying it here?

Thanks.

>           if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) {
>               error_setg(&err,
>                          "%s: VFIO device doesn't support device dirty tracking",
> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
> index bc6607e3d444..d6be49f2ac78 100644
> --- a/include/sysemu/iommufd.h
> +++ b/include/sysemu/iommufd.h
> @@ -53,6 +53,7 @@ typedef struct IOMMUFDDevice {
>   void iommufd_device_init(IOMMUFDDevice *idev);
>   int iommufd_device_get_hw_capabilities(IOMMUFDDevice *idev, uint64_t *caps,
>                                          Error **errp);
> +bool iommufd_dirty_pages_supported(IOMMUFDDevice *idev, Error **errp);
>   int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
>                                  uint32_t pt_id, uint32_t flags,
>                                  uint32_t data_type, uint32_t data_len,
> --
> 2.39.3
>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFCv2 2/8] vfio/iommufd: Introduce auto domain creation
  2024-02-19  8:58   ` Avihai Horon
@ 2024-02-20 10:42     ` Joao Martins
  0 siblings, 0 replies; 37+ messages in thread
From: Joao Martins @ 2024-02-20 10:42 UTC (permalink / raw)
  To: Avihai Horon, qemu-devel
  Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
	Cedric Le Goater, Paolo Bonzini, Daniel P . Berrange,
	Eduardo Habkost, Eric Blake, Markus Armbruster, Jason Gunthorpe

On 19/02/2024 08:58, Avihai Horon wrote:
> Hi Joao,
> 
> On 12/02/2024 15:56, Joao Martins wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> There's generally two modes of operation for IOMMUFD:
>>
>> * The simple user API which intends to perform relatively simple things
>> with IOMMUs e.g. DPDK. It generally creates an IOAS and attach to VFIO
>> and mainly performs IOAS_MAP and UNMAP.
>>
>> * The native IOMMUFD API where you have fine grained control of the
>> IOMMU domain and model it accordingly. This is where most new feature
>> are being steered to.
>>
>> For dirty tracking 2) is required, as it needs to ensure that
>> the stage-2/parent IOMMU domain will only attach devices
>> that support dirty tracking (so far it is all homogeneous in x86, likely
>> not the case for smmuv3). Such invariant on dirty tracking provides a
>> useful guarantee to VMMs that will refuse incompatible device
>> attachments for IOMMU domains.
>>
>> For dirty tracking such property is enabled/enforced via HWPT_ALLOC,
>> which is responsible for creating an IOMMU domain. This is contrast to
>> the 'simple API' where the IOMMU domain is created by IOMMUFD
>> automatically when it attaches to VFIO (usually referred as autodomains)
>>
>> To support dirty tracking with the advanced IOMMUFD API, it needs
>> similar logic, where IOMMU domains are created and devices attached to
>> compatible domains. Essentially mimmicing kernel
>> iommufd_device_auto_get_domain(). If this fails (i.e. mdevs) it falls back
>> to IOAS attach.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> Right now the only alternative to a userspace autodomains implementation
>> is to mimmicing all the flags being added to HWPT_ALLOC but into VFIO
>> IOAS attach. So opted for autodomains userspace approach to avoid the
>> duplication of hwpt-alloc flags vs attach-ioas flags. I lack mdev real
>> drivers atm, so testing with those is still TBD.
>>
>> Opinions, comments, welcome!
>> ---
>>   backends/iommufd.c            | 29 +++++++++++++
>>   backends/trace-events         |  1 +
>>   hw/vfio/iommufd.c             | 78 +++++++++++++++++++++++++++++++++++
>>   include/hw/vfio/vfio-common.h |  9 ++++
>>   include/sysemu/iommufd.h      |  4 ++
>>   5 files changed, 121 insertions(+)
>>
>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>> index 8486894f1b3f..2970135af4b9 100644
>> --- a/backends/iommufd.c
>> +++ b/backends/iommufd.c
>> @@ -211,6 +211,35 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be,
>> uint32_t ioas_id,
>>       return ret;
>>   }
>>
>> +int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
>> +                               uint32_t pt_id, uint32_t flags,
>> +                               uint32_t data_type, uint32_t data_len,
>> +                               void *data_ptr, uint32_t *out_hwpt)
>> +{
>> +    int ret;
>> +    struct iommu_hwpt_alloc alloc_hwpt = {
>> +        .size = sizeof(struct iommu_hwpt_alloc),
>> +        .flags = flags,
>> +        .dev_id = dev_id,
>> +        .pt_id = pt_id,
>> +        .data_type = data_type,
>> +        .data_len = data_len,
>> +        .data_uptr = (uint64_t)data_ptr,
>> +        .__reserved = 0,
>> +    };
>> +
>> +    ret = ioctl(iommufd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
>> +    trace_iommufd_backend_alloc_hwpt(iommufd, dev_id, pt_id, flags, data_type,
>> +                                     data_len, (uint64_t)data_ptr,
>> +                                     alloc_hwpt.out_hwpt_id, ret);
>> +    if (ret) {
>> +        error_report("IOMMU_HWPT_ALLOC failed: %m");
>> +    } else {
>> +        *out_hwpt = alloc_hwpt.out_hwpt_id;
>> +    }
>> +    return !ret ? 0 : -errno;
>> +}
>> +
>>   static const TypeInfo iommufd_backend_info = {
>>       .name = TYPE_IOMMUFD_BACKEND,
>>       .parent = TYPE_OBJECT,
>> diff --git a/backends/trace-events b/backends/trace-events
>> index d45c6e31a67e..f83a276a4253 100644
>> --- a/backends/trace-events
>> +++ b/backends/trace-events
>> @@ -13,5 +13,6 @@ iommu_backend_set_fd(int fd) "pre-opened /dev/iommu fd=%d"
>>   iommufd_backend_map_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t
>> size, void *vaddr, bool readonly, int ret) " iommufd=%d ioas=%d
>> iova=0x%"PRIx64" size=0x%"PRIx64" addr=%p readonly=%d (%d)"
>>   iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, uint64_t
>> iova, uint64_t size, int ret) " Unmap nonexistent mapping: iommufd=%d ioas=%d
>> iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
>>   iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova,
>> uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64"
>> (%d)"
>> +iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id,
>> uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t
>> out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u
>> len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
>>   iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d
>> ioas=%d (%d)"
>>   iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d
>> id=%d (%d)"
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index 7d39d7a5fa51..ca7ec45e725c 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -219,10 +219,82 @@ static int iommufd_cdev_detach_ioas_hwpt(VFIODevice
>> *vbasedev, Error **errp)
>>       return ret;
>>   }
>>
>> +static int iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>> +                                        VFIOIOMMUFDContainer *container,
>> +                                        Error **errp)
>> +{
>> +    int iommufd = vbasedev->iommufd_dev.iommufd->fd;
>> +    VFIOIOASHwpt *hwpt;
>> +    Error *err = NULL;
>> +    int ret = -EINVAL;
> 
> The -EINVAL initialization is not necessary.
> 
/me nods

>> +    uint32_t hwpt_id;
>> +
>> +    /* Try to find a domain */
>> +    QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>> +        ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, &err);
>> +        if (ret) {
>> +            /* -EINVAL means the domain is incompatible with the device. */
>> +            if (ret == -EINVAL) {
> 
> On error iommufd_cdev_attach_ioas_hwpt() returns -1 and not -errno, so I guess
> we need to change it.
> 

/me facepalm yes indeed

>> +                continue;
>> +            }
>> +            return ret;
>> +        } else {
>> +            vbasedev->hwpt = hwpt;
>> +            QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>> +            return 0;
>> +        }
>> +    }
>> +
>> +    ret = iommufd_backend_alloc_hwpt(iommufd,
>> +                                     vbasedev->iommufd_dev.devid,
>> +                                     container->ioas_id, 0, 0, 0,
> 
> Should we explicitly pass IOMMU_HWPT_DATA_NONE instead of 0?
> 
That's better yes.

>> +                                     NULL, &hwpt_id);
>> +    if (ret) {
>> +        error_append_hint(&err,
>> +                   "Failed to allocate HWPT for device %s. Fallback to IOAS
>> attach\n",
>> +                   vbasedev->name);
>> +        warn_report_err(err);
>> +        return ret;
>> +    }
>> +
>> +    hwpt = g_malloc0(sizeof(*hwpt));
>> +    hwpt->hwpt_id = hwpt_id;
>> +    QLIST_INIT(&hwpt->device_list);
>> +
>> +    ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, &err);
>> +    if (ret) {
>> +        iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>> +        g_free(hwpt);
>> +        return ret;
>> +    }
>> +
>> +    vbasedev->hwpt = hwpt;
>> +    QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>> +    QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
>> +    return 0;
> 
> I think we need to improve error handling in this function.
> There are various places where err is not freed/reported, not NULL-ed before
> re-used, or used in error_append_hint() although it can be NULL.
> If there are places where Error can be ignored, we can pass NULL instead of &err.
> Plus, errp parameter passed to this function is never used.
> 
Let me fix all those for the non-RFC v3.

>> +}
>> +
>> +static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev,
>> +                                         VFIOIOMMUFDContainer *container)
>> +{
>> +    VFIOIOASHwpt *hwpt = vbasedev->hwpt;
>> +
>> +    QLIST_REMOVE(vbasedev, hwpt_next);
>> +    QLIST_REMOVE(hwpt, next);
> 
> Shouldn't QLIST_REMOVE(hwpt, next) be moved inside the if?
> 
yes

>> +    if (QLIST_EMPTY(&hwpt->device_list)) {
>> +        iommufd_backend_free_id(container->be, hwpt->hwpt_id);
>> +        g_free(hwpt);
>> +    }
>> +}
>> +
>>   static int iommufd_cdev_attach_container(VFIODevice *vbasedev,
>>                                            VFIOIOMMUFDContainer *container,
>>                                            Error **errp)
>>   {
>> +    if (!iommufd_cdev_autodomains_get(vbasedev, container, errp)) {
>> +        return 0;
>> +    }
> 
> If errp is used in iommufd_cdev_autodomains_get() eventually, we need to make
> sure it doens't contain an Error object before using it again below.
> 
yes



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFCv2 3/8] vfio/iommufd: Probe and request hwpt dirty tracking capability
  2024-02-19  9:03   ` Avihai Horon
@ 2024-02-20 10:51     ` Joao Martins
  2024-02-20 12:46       ` Avihai Horon
  0 siblings, 1 reply; 37+ messages in thread
From: Joao Martins @ 2024-02-20 10:51 UTC (permalink / raw)
  To: Avihai Horon, qemu-devel
  Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
	Cedric Le Goater, Paolo Bonzini, Daniel P . Berrange,
	Eduardo Habkost, Eric Blake, Markus Armbruster, Jason Gunthorpe

On 19/02/2024 09:03, Avihai Horon wrote:
> Hi Joao,
> 
> On 12/02/2024 15:56, Joao Martins wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> Probe hardware dirty tracking support by querying device hw capabilities
>> via IOMMUFD_GET_HW_INFO.
>>
>> In preparation to using the dirty tracking UAPI, request dirty tracking in
>> the HWPT flags when the device doesn't support dirty page tracking or has
>> it disabled; or when support when the VF backing IOMMU supports dirty
>> tracking. The latter is in the possibility of a device being attached
>> that doesn't have a dirty tracker.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>>   hw/vfio/common.c              | 18 ++++++++++++++++++
>>   hw/vfio/iommufd.c             | 25 ++++++++++++++++++++++++-
>>   include/hw/vfio/vfio-common.h |  2 ++
>>   3 files changed, 44 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index f7f85160be88..d8fc7077f839 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -216,6 +216,24 @@ bool vfio_devices_all_device_dirty_tracking(const
>> VFIOContainerBase *bcontainer)
>>       return true;
>>   }
>>
>> +bool vfio_device_migration_supported(VFIODevice *vbasedev)
>> +{
>> +    if (!vbasedev->migration) {
>> +        return false;
>> +    }
>> +
>> +    return vbasedev->migration->mig_flags & VFIO_MIGRATION_STOP_COPY;
> 
> I think this is redundant, as (vbasedev->migration != NULL) implies
> (vbasedev->migration->mig_flags & VFIO_MIGRATION_STOP_COPY) == true.
> 

The check was there to prevent a null-deref in case the device didn't support
migration.

>> +}
>> +
>> +bool vfio_device_dirty_pages_supported(VFIODevice *vbasedev)
>> +{
>> +    if (vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) {
>> +        return false;
>> +    }
>> +
>> +    return !vbasedev->dirty_pages_supported;
>> +}
>> +
>>   /*
>>    * Check if all VFIO devices are running and migration is active, which is
>>    * essentially equivalent to the migration being in pre-copy phase.
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index ca7ec45e725c..edacb6d72748 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -219,11 +219,26 @@ static int iommufd_cdev_detach_ioas_hwpt(VFIODevice
>> *vbasedev, Error **errp)
>>       return ret;
>>   }
>>
>> +static bool iommufd_dirty_pages_supported(IOMMUFDDevice *iommufd_dev,
>> +                                          Error **errp)
>> +{
>> +    uint64_t caps;
>> +    int r;
>> +
>> +    r = iommufd_device_get_hw_capabilities(iommufd_dev, &caps, errp);
>> +    if (r) {
>> +        return false;
>> +    }
>> +
>> +    return caps & IOMMU_HW_CAP_DIRTY_TRACKING;
> 
> The false return value of this function is overloaded, it can indicate both
> error and lack of DPT support.
> Should we fail iommufd_cdev_autodomains_get() if iommufd_dirty_pages_supported()
> fails?

Definitely not.

> Otherwise, errp argument of iommufd_dirty_pages_supported() is redundant and we
> can handle iommufd_device_get_hw_capabilities() error locally.
> 
I'll handle locally.

>> +}
>> +
>>   static int iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>                                           VFIOIOMMUFDContainer *container,
>>                                           Error **errp)
>>   {
>>       int iommufd = vbasedev->iommufd_dev.iommufd->fd;
>> +    uint32_t flags = 0;
>>       VFIOIOASHwpt *hwpt;
>>       Error *err = NULL;
>>       int ret = -EINVAL;
>> @@ -245,9 +260,15 @@ static int iommufd_cdev_autodomains_get(VFIODevice
>> *vbasedev,
>>           }
>>       }
>>
>> +    if ((vfio_device_migration_supported(vbasedev) &&
>> +         !vfio_device_dirty_pages_supported(vbasedev)) ||
>> +        iommufd_dirty_pages_supported(&vbasedev->iommufd_dev, &err)) {
> 
> I think it's too early to check vfio_device_migration_supported() and
> vfio_device_dirty_pages_supported() here, as vfio_migration_init() hasn't been
> called yet so vbasedev->migration and vbasedev->dirty_pages_supported are not
> initialized.

I should replace with its own vfio device probing but the next point invalidates
this

> Why do we need to check this? Can't we simply request IOMMUFD DPT if it's
> supported?
> 
There's no point in force requesting dpt in the domain if the device doesn't do
migration that was my thinking here; but otoh as past hotplug bug fixes have
shown it needs to proof against a new device getting add up that supports
migration while and the unsupported one be removed. So I guess we might not have
another option but to always ask for it if supported.

> Thanks.
> 
>> +        flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>> +    }
>> +
>>       ret = iommufd_backend_alloc_hwpt(iommufd,
>>                                        vbasedev->iommufd_dev.devid,
>> -                                     container->ioas_id, 0, 0, 0,
>> +                                     container->ioas_id, flags, 0, 0,
>>                                        NULL, &hwpt_id);
>>       if (ret) {
>>           error_append_hint(&err,
>> @@ -271,6 +292,8 @@ static int iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>       vbasedev->hwpt = hwpt;
>>       QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>>       QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
>> +    container->bcontainer.dirty_pages_supported =
>> +                              (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING);
>>       return 0;
>>   }
>>
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index 7f7d823221e2..a3e691c126c6 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -271,6 +271,8 @@ bool
>>   vfio_devices_all_running_and_mig_active(const VFIOContainerBase *bcontainer);
>>   bool
>>   vfio_devices_all_device_dirty_tracking(const VFIOContainerBase *bcontainer);
>> +bool vfio_device_migration_supported(VFIODevice *vbasedev);
>> +bool vfio_device_dirty_pages_supported(VFIODevice *vbasedev);
>>   int vfio_devices_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
>>                                       VFIOBitmap *vbmap, hwaddr iova,
>>                                       hwaddr size);
>> -- 
>> 2.39.3
>>



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFCv2 5/8] vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
  2024-02-19  9:30   ` Avihai Horon
@ 2024-02-20 10:59     ` Joao Martins
  2024-02-20 12:52       ` Avihai Horon
  0 siblings, 1 reply; 37+ messages in thread
From: Joao Martins @ 2024-02-20 10:59 UTC (permalink / raw)
  To: Avihai Horon, qemu-devel
  Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
	Cedric Le Goater, Paolo Bonzini, Daniel P . Berrange,
	Eduardo Habkost, Eric Blake, Markus Armbruster, Jason Gunthorpe

On 19/02/2024 09:30, Avihai Horon wrote:
> Hi Joao,
> 
> On 12/02/2024 15:56, Joao Martins wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> ioctl(iommufd, IOMMU_HWPT_GET_DIRTY_BITMAP, arg) is the UAPI
>> that fetches the bitmap that tells what was dirty in an IOVA
>> range.
>>
>> A single bitmap is allocated and used across all the hwpts
>> sharing an IOAS which is then used in log_sync() to set Qemu
>> global bitmaps.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>>   backends/iommufd.c       | 24 ++++++++++++++++++++++++
>>   backends/trace-events    |  1 +
>>   hw/vfio/iommufd.c        | 30 ++++++++++++++++++++++++++++++
>>   include/sysemu/iommufd.h |  3 +++
>>   4 files changed, 58 insertions(+)
>>
>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>> index 954de61c2da0..dd676d493c37 100644
>> --- a/backends/iommufd.c
>> +++ b/backends/iommufd.c
>> @@ -259,6 +259,30 @@ int iommufd_backend_set_dirty_tracking(IOMMUFDBackend
>> *be, uint32_t hwpt_id,
>>       return !ret ? 0 : -errno;
>>   }
>>
>> +int iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
>> +                                     uint64_t iova, ram_addr_t size,
>> +                                     uint64_t page_size, uint64_t *data)
>> +{
>> +    int ret;
>> +    struct iommu_hwpt_get_dirty_bitmap get_dirty_bitmap = {
>> +        .size = sizeof(get_dirty_bitmap),
>> +        .hwpt_id = hwpt_id,
>> +        .iova = iova, .length = size,
>> +        .page_size = page_size, .data = (uintptr_t)data,
> 
> Member per line for readability?
> 

Yeap

>> +    };
>> +
>> +    ret = ioctl(be->fd, IOMMU_HWPT_GET_DIRTY_BITMAP, &get_dirty_bitmap);
>> +    trace_iommufd_backend_get_dirty_bitmap(be->fd, hwpt_id, iova, size,
>> +                                           page_size, ret);
>> +    if (ret) {
>> +        error_report("IOMMU_HWPT_GET_DIRTY_BITMAP (iova: 0x%"PRIx64
>> +                     " size: 0x%"PRIx64") failed: %s", iova,
>> +                     size, strerror(errno));
>> +    }
>> +
>> +    return !ret ? 0 : -errno;
>> +}
>> +
>>   static const TypeInfo iommufd_backend_info = {
>>       .name = TYPE_IOMMUFD_BACKEND,
>>       .parent = TYPE_OBJECT,
>> diff --git a/backends/trace-events b/backends/trace-events
>> index feba2baca5f7..11a27cb114b6 100644
>> --- a/backends/trace-events
>> +++ b/backends/trace-events
>> @@ -17,3 +17,4 @@ iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
>> uint32_t pt_id, uint32_
>>   iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d
>> ioas=%d (%d)"
>>   iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d
>> id=%d (%d)"
>>   iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int
>> ret) " iommufd=%d hwpt=%d enable=%d (%d)"
>> +iommufd_backend_get_dirty_bitmap(int iommufd, uint32_t hwpt_id, uint64_t
>> iova, uint64_t size, uint64_t page_size, int ret) " iommufd=%d hwpt=%d
>> iova=0x%"PRIx64" size=0x%"PRIx64" page_size=0x%"PRIx64" (%d)"
> 
> s/hwpt=%d/hwpt=%u
> 
/me nods

>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index 361e659288fd..79b13bd262cc 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -25,6 +25,7 @@
>>   #include "qemu/cutils.h"
>>   #include "qemu/chardev_open.h"
>>   #include "pci.h"
>> +#include "exec/ram_addr.h"
>>   #include "migration/migration.h"
>>
>>   static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
>> @@ -142,6 +143,34 @@ err:
>>       return ret;
>>   }
>>
>> +static int iommufd_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
>> +                                      VFIOBitmap *vbmap, uint64_t iova,
>> +                                      uint64_t size)
>> +{
>> +    VFIOIOMMUFDContainer *container = container_of(bcontainer,
>> +                                                   VFIOIOMMUFDContainer,
>> +                                                   bcontainer);
>> +    int ret;
>> +    VFIOIOASHwpt *hwpt;
>> +    unsigned long page_size;
>> +
>> +    if (!bcontainer->dirty_pages_supported) {
> 
> Do we need this check?
> IIUC, if we got to iommufd_query_dirty_bitmap(), it means
> bcontainer->dirty_pages_supported is already true.
> 

Not quite. Look again at vfio_get_dirty_bitmap(); furthermore
vfio_container_query_dirty_bitmap() doesn't check for dirty_pages_supported.

I guess this check should be better placed into container-base class rather than
here.

> Thanks.
> 
>> +        return 0;
>> +    }
>> +
>> +    page_size = qemu_real_host_page_size();
>> +    QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>> +        ret = iommufd_backend_get_dirty_bitmap(container->be, hwpt->hwpt_id,
>> +                                               iova, size, page_size,
>> +                                               vbmap->bitmap);
>> +        if (ret) {
>> +            break;
>> +        }
>> +    }
>> +
>> +    return ret;
>> +}
>> +
>>   static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
>>   {
>>       long int ret = -ENOTTY;
>> @@ -765,6 +794,7 @@ static void vfio_iommu_iommufd_class_init(ObjectClass
>> *klass, void *data)
>>       vioc->pci_hot_reset = iommufd_cdev_pci_hot_reset;
>>       vioc->host_iommu_device_init = vfio_cdev_host_iommu_device_init;
>>       vioc->set_dirty_page_tracking = iommufd_set_dirty_page_tracking;
>> +    vioc->query_dirty_bitmap = iommufd_query_dirty_bitmap;
>>   };
>>
>>   static const TypeInfo types[] = {
>> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>> index 562c189dd92c..ba19b7ea4c19 100644
>> --- a/include/sysemu/iommufd.h
>> +++ b/include/sysemu/iommufd.h
>> @@ -55,5 +55,8 @@ int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
>>                                  void *data_ptr, uint32_t *out_hwpt);
>>   int iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
>>                                          bool start);
>> +int iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
>> +                                     uint64_t iova, ram_addr_t size,
>> +                                     uint64_t page_size, uint64_t *data);
>>
>>   #endif
>> -- 
>> 2.39.3
>>



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFCv2 6/8] backends/iommufd: Add ability to disable hugepages
  2024-02-19 10:05   ` Avihai Horon
@ 2024-02-20 11:01     ` Joao Martins
  0 siblings, 0 replies; 37+ messages in thread
From: Joao Martins @ 2024-02-20 11:01 UTC (permalink / raw)
  To: Avihai Horon, qemu-devel
  Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
	Cedric Le Goater, Paolo Bonzini, Daniel P . Berrange,
	Eduardo Habkost, Eric Blake, Markus Armbruster, Jason Gunthorpe

On 19/02/2024 10:05, Avihai Horon wrote:
> Hi Joao,
> 
> On 12/02/2024 15:56, Joao Martins wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> Allow disabling hugepages to be dirty track at base page
>> granularity in similar vein to vfio_type1_iommu.disable_hugepages
>> but per IOAS.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>>   backends/iommufd.c       | 36 ++++++++++++++++++++++++++++++++++++
>>   backends/trace-events    |  1 +
>>   hw/vfio/iommufd.c        |  4 ++++
>>   include/sysemu/iommufd.h |  4 ++++
>>   qapi/qom.json            |  2 +-
>>   5 files changed, 46 insertions(+), 1 deletion(-)
>>
>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>> index dd676d493c37..72fd98a9a50c 100644
>> --- a/backends/iommufd.c
>> +++ b/backends/iommufd.c
>> @@ -29,6 +29,7 @@ static void iommufd_backend_init(Object *obj)
>>       be->fd = -1;
>>       be->users = 0;
>>       be->owned = true;
>> +    be->hugepages = 1;
>>   }
>>
>>   static void iommufd_backend_finalize(Object *obj)
>> @@ -63,6 +64,14 @@ static bool iommufd_backend_can_be_deleted(UserCreatable *uc)
>>       return !be->users;
>>   }
>>
>> +static void iommufd_backend_set_hugepages(Object *obj, bool enabled,
>> +                                          Error **errp)
>> +{
>> +    IOMMUFDBackend *be = IOMMUFD_BACKEND(obj);
>> +
>> +    be->hugepages = enabled;
>> +}
>> +
>>   static void iommufd_backend_class_init(ObjectClass *oc, void *data)
>>   {
>>       UserCreatableClass *ucc = USER_CREATABLE_CLASS(oc);
>> @@ -70,6 +79,11 @@ static void iommufd_backend_class_init(ObjectClass *oc,
>> void *data)
>>       ucc->can_be_deleted = iommufd_backend_can_be_deleted;
>>
>>       object_class_property_add_str(oc, "fd", NULL, iommufd_backend_set_fd);
>> +
>> +    object_class_property_add_bool(oc, "hugepages", NULL,
>> +                                   iommufd_backend_set_hugepages);
>> +    object_class_property_set_description(oc, "hugepages",
>> +                                          "Set to 'off' to disable hugepages");
>>   }
>>
>>   int iommufd_backend_connect(IOMMUFDBackend *be, Error **errp)
>> @@ -106,6 +120,28 @@ out:
>>       trace_iommufd_backend_disconnect(be->fd, be->users);
>>   }
>>
>> +int iommufd_backend_set_option(int fd, uint32_t object_id,
>> +                               uint32_t option_id, uint64_t val64)
>> +{
>> +    int ret;
>> +    struct iommu_option option = {
>> +        .size = sizeof(option),
>> +        .option_id = option_id,
>> +        .op = IOMMU_OPTION_OP_SET,
>> +        .val64 = val64,
>> +        .object_id = object_id,
>> +    };
>> +
>> +    ret = ioctl(fd, IOMMU_OPTION, &option);
>> +    if (ret) {
>> +        error_report("Failed to set option %x to value %"PRIx64" %m", option_id,
>> +                     val64);
>> +    }
>> +    trace_iommufd_backend_set_option(fd, object_id, option_id, val64, ret);
>> +
>> +    return ret;
>> +}
>> +
>>   int iommufd_backend_alloc_ioas(IOMMUFDBackend *be, uint32_t *ioas_id,
>>                                  Error **errp)
>>   {
>> diff --git a/backends/trace-events b/backends/trace-events
>> index 11a27cb114b6..076166552881 100644
>> --- a/backends/trace-events
>> +++ b/backends/trace-events
>> @@ -15,6 +15,7 @@ iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t
>> ioas, uint64_t iova, u
>>   iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova,
>> uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64"
>> (%d)"
>>   iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id,
>> uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t
>> out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u
>> len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
>>   iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d
>> ioas=%d (%d)"
>> +iommufd_backend_set_option(int iommufd, uint32_t object_id, uint32_t
>> option_id, uint64_t val, int ret) " iommufd=%d object_id=%u option_id=%u
>> val64=0x%"PRIx64" (%d)"
>>   iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d
>> id=%d (%d)"
>>   iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int
>> ret) " iommufd=%d hwpt=%d enable=%d (%d)"
>>   iommufd_backend_get_dirty_bitmap(int iommufd, uint32_t hwpt_id, uint64_t
>> iova, uint64_t size, uint64_t page_size, int ret) " iommufd=%d hwpt=%d
>> iova=0x%"PRIx64" size=0x%"PRIx64" page_size=0x%"PRIx64" (%d)"
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index 79b13bd262cc..697d40841d7f 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -521,6 +521,10 @@ static int iommufd_cdev_attach(const char *name,
>> VFIODevice *vbasedev,
>>           goto err_alloc_ioas;
>>       }
>>
>> +    if (!vbasedev->iommufd_dev.iommufd->hugepages) {
>> +        iommufd_backend_set_option(vbasedev->iommufd_dev.iommufd->fd, ioas_id,
>> +                                   IOMMU_OPTION_HUGE_PAGES, 0);
> 
> Shouldn't we fail device attach if iommufd_backend_set_option() fails?
> 

Let handle error correctly and fail the attach.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFCv2 7/8] vfio/migration: Don't block migration device dirty tracking is unsupported
  2024-02-19 10:12   ` Avihai Horon
@ 2024-02-20 11:05     ` Joao Martins
  0 siblings, 0 replies; 37+ messages in thread
From: Joao Martins @ 2024-02-20 11:05 UTC (permalink / raw)
  To: Avihai Horon, qemu-devel
  Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
	Cedric Le Goater, Paolo Bonzini, Daniel P . Berrange,
	Eduardo Habkost, Eric Blake, Markus Armbruster, Jason Gunthorpe

On 19/02/2024 10:12, Avihai Horon wrote:
> Hi Joao,
> 
> On 12/02/2024 15:56, Joao Martins wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> By default VFIO migration is set to auto, which will support live
>> migration if the migration capability is set *and* also dirty page
>> tracking is supported.
>>
>> For testing purposes one can force enable without dirty page tracking
>> via enable-migration=on, but that option is generally left for testing
>> purposes.
>>
>> So starting with IOMMU dirty tracking it can use to acomodate the lack of
>> VF dirty page tracking allowing us to minimize the VF requirements for
>> migration and thus enabling migration by default for those.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>>   hw/vfio/iommufd.c        | 3 +--
>>   hw/vfio/migration.c      | 4 +++-
>>   include/sysemu/iommufd.h | 1 +
>>   3 files changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index 697d40841d7f..78d8f4391b68 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -275,8 +275,7 @@ static int iommufd_cdev_detach_ioas_hwpt(VFIODevice
>> *vbasedev, Error **errp)
>>       return ret;
>>   }
>>
>> -static bool iommufd_dirty_pages_supported(IOMMUFDDevice *iommufd_dev,
>> -                                          Error **errp)
>> +bool iommufd_dirty_pages_supported(IOMMUFDDevice *iommufd_dev, Error **errp)
>>   {
>>       uint64_t caps;
>>       int r;
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index 70e6b1a709f9..674e76b3f3df 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -938,7 +938,9 @@ bool vfio_migration_realize(VFIODevice *vbasedev, Error
>> **errp)
>>           return !vfio_block_migration(vbasedev, err, errp);
>>       }
>>
>> -    if (!vbasedev->dirty_pages_supported) {
>> +    if (!vbasedev->dirty_pages_supported &&
>> +        (vbasedev->iommufd_dev.iommufd &&
> 
> Shouldn't we check the type of base_hdev instead?
> 

This was just me trying to be less tied to Zhenzhong series, but yes

>> +         !iommufd_dirty_pages_supported(&vbasedev->iommufd_dev, &err))) {
> 
> Maybe we can store IOMMUFD DPT support in iommufd_dev and use it instead of
> querying it here?

It's a good idea, and originally I had the ::capabilities stored in the
iommufd_dev and I was mainly checking this there.

I could fetch the capabilities after we get an idev and then this just tests
against HWCAP and this avoids having to call GET_HWINFO all the time


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFCv2 3/8] vfio/iommufd: Probe and request hwpt dirty tracking capability
  2024-02-20 10:51     ` Joao Martins
@ 2024-02-20 12:46       ` Avihai Horon
  0 siblings, 0 replies; 37+ messages in thread
From: Avihai Horon @ 2024-02-20 12:46 UTC (permalink / raw)
  To: Joao Martins, qemu-devel
  Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
	Cedric Le Goater, Paolo Bonzini, Daniel P . Berrange,
	Eduardo Habkost, Eric Blake, Markus Armbruster, Jason Gunthorpe


On 20/02/2024 12:51, Joao Martins wrote:
> External email: Use caution opening links or attachments
>
>
> On 19/02/2024 09:03, Avihai Horon wrote:
>> Hi Joao,
>>
>> On 12/02/2024 15:56, Joao Martins wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> Probe hardware dirty tracking support by querying device hw capabilities
>>> via IOMMUFD_GET_HW_INFO.
>>>
>>> In preparation to using the dirty tracking UAPI, request dirty tracking in
>>> the HWPT flags when the device doesn't support dirty page tracking or has
>>> it disabled; or when support when the VF backing IOMMU supports dirty
>>> tracking. The latter is in the possibility of a device being attached
>>> that doesn't have a dirty tracker.
>>>
>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>> ---
>>>    hw/vfio/common.c              | 18 ++++++++++++++++++
>>>    hw/vfio/iommufd.c             | 25 ++++++++++++++++++++++++-
>>>    include/hw/vfio/vfio-common.h |  2 ++
>>>    3 files changed, 44 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>> index f7f85160be88..d8fc7077f839 100644
>>> --- a/hw/vfio/common.c
>>> +++ b/hw/vfio/common.c
>>> @@ -216,6 +216,24 @@ bool vfio_devices_all_device_dirty_tracking(const
>>> VFIOContainerBase *bcontainer)
>>>        return true;
>>>    }
>>>
>>> +bool vfio_device_migration_supported(VFIODevice *vbasedev)
>>> +{
>>> +    if (!vbasedev->migration) {
>>> +        return false;
>>> +    }
>>> +
>>> +    return vbasedev->migration->mig_flags & VFIO_MIGRATION_STOP_COPY;
>> I think this is redundant, as (vbasedev->migration != NULL) implies
>> (vbasedev->migration->mig_flags & VFIO_MIGRATION_STOP_COPY) == true.
>>
> The check was there to prevent a null-deref in case the device didn't support
> migration.

I meant that "vbasedev->migration->mig_flags & VFIO_MIGRATION_STOP_COPY" 
check is redundant because if vbasedev->migration != NULL then 
VFIO_MIGRATION_STOP_COPY is supported (it's already checked in 
vfio_migration_init()).

But never mind, given what you wrote below.

>
>>> +}
>>> +
>>> +bool vfio_device_dirty_pages_supported(VFIODevice *vbasedev)
>>> +{
>>> +    if (vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) {
>>> +        return false;
>>> +    }
>>> +
>>> +    return !vbasedev->dirty_pages_supported;
>>> +}
>>> +
>>>    /*
>>>     * Check if all VFIO devices are running and migration is active, which is
>>>     * essentially equivalent to the migration being in pre-copy phase.
>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>> index ca7ec45e725c..edacb6d72748 100644
>>> --- a/hw/vfio/iommufd.c
>>> +++ b/hw/vfio/iommufd.c
>>> @@ -219,11 +219,26 @@ static int iommufd_cdev_detach_ioas_hwpt(VFIODevice
>>> *vbasedev, Error **errp)
>>>        return ret;
>>>    }
>>>
>>> +static bool iommufd_dirty_pages_supported(IOMMUFDDevice *iommufd_dev,
>>> +                                          Error **errp)
>>> +{
>>> +    uint64_t caps;
>>> +    int r;
>>> +
>>> +    r = iommufd_device_get_hw_capabilities(iommufd_dev, &caps, errp);
>>> +    if (r) {
>>> +        return false;
>>> +    }
>>> +
>>> +    return caps & IOMMU_HW_CAP_DIRTY_TRACKING;
>> The false return value of this function is overloaded, it can indicate both
>> error and lack of DPT support.
>> Should we fail iommufd_cdev_autodomains_get() if iommufd_dirty_pages_supported()
>> fails?
> Definitely not.
>
>> Otherwise, errp argument of iommufd_dirty_pages_supported() is redundant and we
>> can handle iommufd_device_get_hw_capabilities() error locally.
>>
> I'll handle locally.
>
>>> +}
>>> +
>>>    static int iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>>                                            VFIOIOMMUFDContainer *container,
>>>                                            Error **errp)
>>>    {
>>>        int iommufd = vbasedev->iommufd_dev.iommufd->fd;
>>> +    uint32_t flags = 0;
>>>        VFIOIOASHwpt *hwpt;
>>>        Error *err = NULL;
>>>        int ret = -EINVAL;
>>> @@ -245,9 +260,15 @@ static int iommufd_cdev_autodomains_get(VFIODevice
>>> *vbasedev,
>>>            }
>>>        }
>>>
>>> +    if ((vfio_device_migration_supported(vbasedev) &&
>>> +         !vfio_device_dirty_pages_supported(vbasedev)) ||
>>> +        iommufd_dirty_pages_supported(&vbasedev->iommufd_dev, &err)) {
>> I think it's too early to check vfio_device_migration_supported() and
>> vfio_device_dirty_pages_supported() here, as vfio_migration_init() hasn't been
>> called yet so vbasedev->migration and vbasedev->dirty_pages_supported are not
>> initialized.
> I should replace with its own vfio device probing but the next point invalidates
> this
>
>> Why do we need to check this? Can't we simply request IOMMUFD DPT if it's
>> supported?
>>
> There's no point in force requesting dpt in the domain if the device doesn't do
> migration that was my thinking here; but otoh as past hotplug bug fixes have
> shown it needs to proof against a new device getting add up that supports
> migration while and the unsupported one be removed. So I guess we might not have
> another option but to always ask for it if supported.
>
>> Thanks.
>>
>>> +        flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>> +    }
>>> +
>>>        ret = iommufd_backend_alloc_hwpt(iommufd,
>>>                                         vbasedev->iommufd_dev.devid,
>>> -                                     container->ioas_id, 0, 0, 0,
>>> +                                     container->ioas_id, flags, 0, 0,
>>>                                         NULL, &hwpt_id);
>>>        if (ret) {
>>>            error_append_hint(&err,
>>> @@ -271,6 +292,8 @@ static int iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>>        vbasedev->hwpt = hwpt;
>>>        QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
>>>        QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
>>> +    container->bcontainer.dirty_pages_supported =
>>> +                              (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING);
>>>        return 0;
>>>    }
>>>
>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>> index 7f7d823221e2..a3e691c126c6 100644
>>> --- a/include/hw/vfio/vfio-common.h
>>> +++ b/include/hw/vfio/vfio-common.h
>>> @@ -271,6 +271,8 @@ bool
>>>    vfio_devices_all_running_and_mig_active(const VFIOContainerBase *bcontainer);
>>>    bool
>>>    vfio_devices_all_device_dirty_tracking(const VFIOContainerBase *bcontainer);
>>> +bool vfio_device_migration_supported(VFIODevice *vbasedev);
>>> +bool vfio_device_dirty_pages_supported(VFIODevice *vbasedev);
>>>    int vfio_devices_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
>>>                                        VFIOBitmap *vbmap, hwaddr iova,
>>>                                        hwaddr size);
>>> --
>>> 2.39.3
>>>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFCv2 5/8] vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
  2024-02-20 10:59     ` Joao Martins
@ 2024-02-20 12:52       ` Avihai Horon
  2024-02-20 13:17         ` Joao Martins
  0 siblings, 1 reply; 37+ messages in thread
From: Avihai Horon @ 2024-02-20 12:52 UTC (permalink / raw)
  To: Joao Martins, qemu-devel
  Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
	Cedric Le Goater, Paolo Bonzini, Daniel P . Berrange,
	Eduardo Habkost, Eric Blake, Markus Armbruster, Jason Gunthorpe


On 20/02/2024 12:59, Joao Martins wrote:
> External email: Use caution opening links or attachments
>
>
> On 19/02/2024 09:30, Avihai Horon wrote:
>> Hi Joao,
>>
>> On 12/02/2024 15:56, Joao Martins wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> ioctl(iommufd, IOMMU_HWPT_GET_DIRTY_BITMAP, arg) is the UAPI
>>> that fetches the bitmap that tells what was dirty in an IOVA
>>> range.
>>>
>>> A single bitmap is allocated and used across all the hwpts
>>> sharing an IOAS which is then used in log_sync() to set Qemu
>>> global bitmaps.
>>>
>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>> ---
>>>    backends/iommufd.c       | 24 ++++++++++++++++++++++++
>>>    backends/trace-events    |  1 +
>>>    hw/vfio/iommufd.c        | 30 ++++++++++++++++++++++++++++++
>>>    include/sysemu/iommufd.h |  3 +++
>>>    4 files changed, 58 insertions(+)
>>>
>>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>>> index 954de61c2da0..dd676d493c37 100644
>>> --- a/backends/iommufd.c
>>> +++ b/backends/iommufd.c
>>> @@ -259,6 +259,30 @@ int iommufd_backend_set_dirty_tracking(IOMMUFDBackend
>>> *be, uint32_t hwpt_id,
>>>        return !ret ? 0 : -errno;
>>>    }
>>>
>>> +int iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
>>> +                                     uint64_t iova, ram_addr_t size,
>>> +                                     uint64_t page_size, uint64_t *data)
>>> +{
>>> +    int ret;
>>> +    struct iommu_hwpt_get_dirty_bitmap get_dirty_bitmap = {
>>> +        .size = sizeof(get_dirty_bitmap),
>>> +        .hwpt_id = hwpt_id,
>>> +        .iova = iova, .length = size,
>>> +        .page_size = page_size, .data = (uintptr_t)data,
>> Member per line for readability?
>>
> Yeap
>
>>> +    };
>>> +
>>> +    ret = ioctl(be->fd, IOMMU_HWPT_GET_DIRTY_BITMAP, &get_dirty_bitmap);
>>> +    trace_iommufd_backend_get_dirty_bitmap(be->fd, hwpt_id, iova, size,
>>> +                                           page_size, ret);
>>> +    if (ret) {
>>> +        error_report("IOMMU_HWPT_GET_DIRTY_BITMAP (iova: 0x%"PRIx64
>>> +                     " size: 0x%"PRIx64") failed: %s", iova,
>>> +                     size, strerror(errno));
>>> +    }
>>> +
>>> +    return !ret ? 0 : -errno;
>>> +}
>>> +
>>>    static const TypeInfo iommufd_backend_info = {
>>>        .name = TYPE_IOMMUFD_BACKEND,
>>>        .parent = TYPE_OBJECT,
>>> diff --git a/backends/trace-events b/backends/trace-events
>>> index feba2baca5f7..11a27cb114b6 100644
>>> --- a/backends/trace-events
>>> +++ b/backends/trace-events
>>> @@ -17,3 +17,4 @@ iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
>>> uint32_t pt_id, uint32_
>>>    iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d
>>> ioas=%d (%d)"
>>>    iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d
>>> id=%d (%d)"
>>>    iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int
>>> ret) " iommufd=%d hwpt=%d enable=%d (%d)"
>>> +iommufd_backend_get_dirty_bitmap(int iommufd, uint32_t hwpt_id, uint64_t
>>> iova, uint64_t size, uint64_t page_size, int ret) " iommufd=%d hwpt=%d
>>> iova=0x%"PRIx64" size=0x%"PRIx64" page_size=0x%"PRIx64" (%d)"
>> s/hwpt=%d/hwpt=%u
>>
> /me nods
>
>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>> index 361e659288fd..79b13bd262cc 100644
>>> --- a/hw/vfio/iommufd.c
>>> +++ b/hw/vfio/iommufd.c
>>> @@ -25,6 +25,7 @@
>>>    #include "qemu/cutils.h"
>>>    #include "qemu/chardev_open.h"
>>>    #include "pci.h"
>>> +#include "exec/ram_addr.h"
>>>    #include "migration/migration.h"
>>>
>>>    static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
>>> @@ -142,6 +143,34 @@ err:
>>>        return ret;
>>>    }
>>>
>>> +static int iommufd_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
>>> +                                      VFIOBitmap *vbmap, uint64_t iova,
>>> +                                      uint64_t size)
>>> +{
>>> +    VFIOIOMMUFDContainer *container = container_of(bcontainer,
>>> +                                                   VFIOIOMMUFDContainer,
>>> +                                                   bcontainer);
>>> +    int ret;
>>> +    VFIOIOASHwpt *hwpt;
>>> +    unsigned long page_size;
>>> +
>>> +    if (!bcontainer->dirty_pages_supported) {
>> Do we need this check?
>> IIUC, if we got to iommufd_query_dirty_bitmap(), it means
>> bcontainer->dirty_pages_supported is already true.
>>
> Not quite. Look again at vfio_get_dirty_bitmap(); furthermore
> vfio_container_query_dirty_bitmap() doesn't check for dirty_pages_supported.

vfio_get_dirty_bitmap() has this check at the beginning:

if (!bcontainer->dirty_pages_supported && !all_device_dirty_tracking) {
     cpu_physical_memory_set_dirty_range(ram_addr, size,
                                         tcg_enabled() ? DIRTY_CLIENTS_ALL :
                                         DIRTY_CLIENTS_NOCODE);
     return 0;
}

So if bcontainer->dirty_pages_supported is false we will mark all dirty 
and return (and not call vfio_container_query_dirty_bitmap()).

Or am I missing something?

> I guess this check should be better placed into container-base class rather than
> here.
>
>> Thanks.
>>
>>> +        return 0;
>>> +    }
>>> +
>>> +    page_size = qemu_real_host_page_size();
>>> +    QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>>> +        ret = iommufd_backend_get_dirty_bitmap(container->be, hwpt->hwpt_id,
>>> +                                               iova, size, page_size,
>>> +                                               vbmap->bitmap);
>>> +        if (ret) {
>>> +            break;
>>> +        }
>>> +    }
>>> +
>>> +    return ret;
>>> +}
>>> +
>>>    static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
>>>    {
>>>        long int ret = -ENOTTY;
>>> @@ -765,6 +794,7 @@ static void vfio_iommu_iommufd_class_init(ObjectClass
>>> *klass, void *data)
>>>        vioc->pci_hot_reset = iommufd_cdev_pci_hot_reset;
>>>        vioc->host_iommu_device_init = vfio_cdev_host_iommu_device_init;
>>>        vioc->set_dirty_page_tracking = iommufd_set_dirty_page_tracking;
>>> +    vioc->query_dirty_bitmap = iommufd_query_dirty_bitmap;
>>>    };
>>>
>>>    static const TypeInfo types[] = {
>>> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>>> index 562c189dd92c..ba19b7ea4c19 100644
>>> --- a/include/sysemu/iommufd.h
>>> +++ b/include/sysemu/iommufd.h
>>> @@ -55,5 +55,8 @@ int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
>>>                                   void *data_ptr, uint32_t *out_hwpt);
>>>    int iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
>>>                                           bool start);
>>> +int iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
>>> +                                     uint64_t iova, ram_addr_t size,
>>> +                                     uint64_t page_size, uint64_t *data);
>>>
>>>    #endif
>>> --
>>> 2.39.3
>>>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFCv2 5/8] vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
  2024-02-20 12:52       ` Avihai Horon
@ 2024-02-20 13:17         ` Joao Martins
  0 siblings, 0 replies; 37+ messages in thread
From: Joao Martins @ 2024-02-20 13:17 UTC (permalink / raw)
  To: Avihai Horon, qemu-devel
  Cc: Yi Liu, Eric Auger, Zhenzhong Duan, Alex Williamson,
	Cedric Le Goater, Paolo Bonzini, Daniel P . Berrange,
	Eduardo Habkost, Eric Blake, Markus Armbruster, Jason Gunthorpe

On 20/02/2024 12:52, Avihai Horon wrote:
> 
> On 20/02/2024 12:59, Joao Martins wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> On 19/02/2024 09:30, Avihai Horon wrote:
>>> Hi Joao,
>>>
>>> On 12/02/2024 15:56, Joao Martins wrote:
>>>> External email: Use caution opening links or attachments
>>>>
>>>>
>>>> ioctl(iommufd, IOMMU_HWPT_GET_DIRTY_BITMAP, arg) is the UAPI
>>>> that fetches the bitmap that tells what was dirty in an IOVA
>>>> range.
>>>>
>>>> A single bitmap is allocated and used across all the hwpts
>>>> sharing an IOAS which is then used in log_sync() to set Qemu
>>>> global bitmaps.
>>>>
>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>> ---
>>>>    backends/iommufd.c       | 24 ++++++++++++++++++++++++
>>>>    backends/trace-events    |  1 +
>>>>    hw/vfio/iommufd.c        | 30 ++++++++++++++++++++++++++++++
>>>>    include/sysemu/iommufd.h |  3 +++
>>>>    4 files changed, 58 insertions(+)
>>>>
>>>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>>>> index 954de61c2da0..dd676d493c37 100644
>>>> --- a/backends/iommufd.c
>>>> +++ b/backends/iommufd.c
>>>> @@ -259,6 +259,30 @@ int iommufd_backend_set_dirty_tracking(IOMMUFDBackend
>>>> *be, uint32_t hwpt_id,
>>>>        return !ret ? 0 : -errno;
>>>>    }
>>>>
>>>> +int iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
>>>> +                                     uint64_t iova, ram_addr_t size,
>>>> +                                     uint64_t page_size, uint64_t *data)
>>>> +{
>>>> +    int ret;
>>>> +    struct iommu_hwpt_get_dirty_bitmap get_dirty_bitmap = {
>>>> +        .size = sizeof(get_dirty_bitmap),
>>>> +        .hwpt_id = hwpt_id,
>>>> +        .iova = iova, .length = size,
>>>> +        .page_size = page_size, .data = (uintptr_t)data,
>>> Member per line for readability?
>>>
>> Yeap
>>
>>>> +    };
>>>> +
>>>> +    ret = ioctl(be->fd, IOMMU_HWPT_GET_DIRTY_BITMAP, &get_dirty_bitmap);
>>>> +    trace_iommufd_backend_get_dirty_bitmap(be->fd, hwpt_id, iova, size,
>>>> +                                           page_size, ret);
>>>> +    if (ret) {
>>>> +        error_report("IOMMU_HWPT_GET_DIRTY_BITMAP (iova: 0x%"PRIx64
>>>> +                     " size: 0x%"PRIx64") failed: %s", iova,
>>>> +                     size, strerror(errno));
>>>> +    }
>>>> +
>>>> +    return !ret ? 0 : -errno;
>>>> +}
>>>> +
>>>>    static const TypeInfo iommufd_backend_info = {
>>>>        .name = TYPE_IOMMUFD_BACKEND,
>>>>        .parent = TYPE_OBJECT,
>>>> diff --git a/backends/trace-events b/backends/trace-events
>>>> index feba2baca5f7..11a27cb114b6 100644
>>>> --- a/backends/trace-events
>>>> +++ b/backends/trace-events
>>>> @@ -17,3 +17,4 @@ iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
>>>> uint32_t pt_id, uint32_
>>>>    iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d
>>>> ioas=%d (%d)"
>>>>    iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d
>>>> id=%d (%d)"
>>>>    iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int
>>>> ret) " iommufd=%d hwpt=%d enable=%d (%d)"
>>>> +iommufd_backend_get_dirty_bitmap(int iommufd, uint32_t hwpt_id, uint64_t
>>>> iova, uint64_t size, uint64_t page_size, int ret) " iommufd=%d hwpt=%d
>>>> iova=0x%"PRIx64" size=0x%"PRIx64" page_size=0x%"PRIx64" (%d)"
>>> s/hwpt=%d/hwpt=%u
>>>
>> /me nods
>>
>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>> index 361e659288fd..79b13bd262cc 100644
>>>> --- a/hw/vfio/iommufd.c
>>>> +++ b/hw/vfio/iommufd.c
>>>> @@ -25,6 +25,7 @@
>>>>    #include "qemu/cutils.h"
>>>>    #include "qemu/chardev_open.h"
>>>>    #include "pci.h"
>>>> +#include "exec/ram_addr.h"
>>>>    #include "migration/migration.h"
>>>>
>>>>    static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr
>>>> iova,
>>>> @@ -142,6 +143,34 @@ err:
>>>>        return ret;
>>>>    }
>>>>
>>>> +static int iommufd_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
>>>> +                                      VFIOBitmap *vbmap, uint64_t iova,
>>>> +                                      uint64_t size)
>>>> +{
>>>> +    VFIOIOMMUFDContainer *container = container_of(bcontainer,
>>>> +                                                   VFIOIOMMUFDContainer,
>>>> +                                                   bcontainer);
>>>> +    int ret;
>>>> +    VFIOIOASHwpt *hwpt;
>>>> +    unsigned long page_size;
>>>> +
>>>> +    if (!bcontainer->dirty_pages_supported) {
>>> Do we need this check?
>>> IIUC, if we got to iommufd_query_dirty_bitmap(), it means
>>> bcontainer->dirty_pages_supported is already true.
>>>
>> Not quite. Look again at vfio_get_dirty_bitmap(); furthermore
>> vfio_container_query_dirty_bitmap() doesn't check for dirty_pages_supported.
> 
> vfio_get_dirty_bitmap() has this check at the beginning:
> 
> if (!bcontainer->dirty_pages_supported && !all_device_dirty_tracking) {
>     cpu_physical_memory_set_dirty_range(ram_addr, size,
>                                         tcg_enabled() ? DIRTY_CLIENTS_ALL :
>                                         DIRTY_CLIENTS_NOCODE);
>     return 0;
> }
> 
> So if bcontainer->dirty_pages_supported is false we will mark all dirty and
> return (and not call vfio_container_query_dirty_bitmap()).
> 
> Or am I missing something?
> 

Nah, I just lacked coffee.

You're right, while I read that check there, I misread what
vfio_devices_all_device_dirty_tracking() returns. And if that returns false it
means we need IOMMU dirty tracking and it was already tested false, otherwise
device dirty tracking is used instead.

	Joao


^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH RFCv2 1/8] backends/iommufd: Introduce helper function iommufd_device_get_hw_capabilities()
  2024-02-12 13:56 ` [PATCH RFCv2 1/8] backends/iommufd: Introduce helper function iommufd_device_get_hw_capabilities() Joao Martins
@ 2024-02-26  7:29   ` Duan, Zhenzhong
  2024-02-26 10:10     ` Joao Martins
  0 siblings, 1 reply; 37+ messages in thread
From: Duan, Zhenzhong @ 2024-02-26  7:29 UTC (permalink / raw)
  To: Joao Martins, qemu-devel
  Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
	Paolo Bonzini, Daniel P . Berrange, Eduardo Habkost, Eric Blake,
	Markus Armbruster, Jason Gunthorpe, Avihai Horon

Hi Joao,

>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: [PATCH RFCv2 1/8] backends/iommufd: Introduce helper function
>iommufd_device_get_hw_capabilities()
>
>The new helper will fetch vendor agnostic IOMMU capabilities supported
>both by hardware and software. Right now it is only iommu dirty
>tracking.
>
>Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>---
> backends/iommufd.c       | 25 +++++++++++++++++++++++++
> include/sysemu/iommufd.h |  2 ++
> 2 files changed, 27 insertions(+)
>
>diff --git a/backends/iommufd.c b/backends/iommufd.c
>index d92791bba935..8486894f1b3f 100644
>--- a/backends/iommufd.c
>+++ b/backends/iommufd.c
>@@ -237,3 +237,28 @@ void iommufd_device_init(IOMMUFDDevice *idev)
>     host_iommu_base_device_init(&idev->base, HID_IOMMUFD,
>                                 sizeof(IOMMUFDDevice));
> }
>+
>+int iommufd_device_get_hw_capabilities(IOMMUFDDevice *idev, uint64_t
>*caps,
>+                                       Error **errp)
>+{
>+    struct iommu_hw_info info = {
>+        .size = sizeof(info),
>+        .flags = 0,
>+        .dev_id = idev->devid,
>+        .data_len = 0,
>+        .__reserved = 0,
>+        .data_uptr = 0,
>+        .out_capabilities = 0,
>+    };
>+    int ret;
>+
>+    ret = ioctl(idev->iommufd->fd, IOMMU_GET_HW_INFO, &info);
>+    if (ret) {
>+        error_setg_errno(errp, errno,
>+                         "Failed to get hardware info capabilities");
>+    } else {
>+        *caps = info.out_capabilities;
>+    }
>+
>+    return ret;
>+}

This helper is redundant with https://lists.gnu.org/archive/html/qemu-devel/2024-02/msg00031.html
We have to get other elements in info in nesting series, so mind using that helper
Instead to avoid redundancy? I can move that patch ahead for your usage.

Thanks
Zhenzhong

>diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>index c3f346976039..4afe97307dbe 100644
>--- a/include/sysemu/iommufd.h
>+++ b/include/sysemu/iommufd.h
>@@ -47,4 +47,6 @@ typedef struct IOMMUFDDevice {
> } IOMMUFDDevice;
>
> void iommufd_device_init(IOMMUFDDevice *idev);
>+int iommufd_device_get_hw_capabilities(IOMMUFDDevice *idev, uint64_t
>*caps,
>+                                       Error **errp);
> #endif
>--
>2.39.3



^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH RFCv2 2/8] vfio/iommufd: Introduce auto domain creation
  2024-02-12 18:09     ` Joao Martins
@ 2024-02-26  9:14       ` Duan, Zhenzhong
  0 siblings, 0 replies; 37+ messages in thread
From: Duan, Zhenzhong @ 2024-02-26  9:14 UTC (permalink / raw)
  To: Joao Martins, Jason Gunthorpe
  Cc: qemu-devel, Liu, Yi L, Eric Auger, Alex Williamson,
	Cedric Le Goater, Paolo Bonzini, Daniel P . Berrange,
	Eduardo Habkost, Eric Blake, Markus Armbruster, Avihai Horon



>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: Re: [PATCH RFCv2 2/8] vfio/iommufd: Introduce auto domain
>creation
>
>On 12/02/2024 16:27, Jason Gunthorpe wrote:
>> On Mon, Feb 12, 2024 at 01:56:37PM +0000, Joao Martins wrote:
>>> There's generally two modes of operation for IOMMUFD:
>>>
>>> * The simple user API which intends to perform relatively simple things
>>> with IOMMUs e.g. DPDK. It generally creates an IOAS and attach to VFIO
>>> and mainly performs IOAS_MAP and UNMAP.
>>>
>>> * The native IOMMUFD API where you have fine grained control of the
>>> IOMMU domain and model it accordingly. This is where most new feature
>>> are being steered to.
>>>
>>> For dirty tracking 2) is required, as it needs to ensure that
>>> the stage-2/parent IOMMU domain will only attach devices
>>> that support dirty tracking (so far it is all homogeneous in x86, likely
>>> not the case for smmuv3). Such invariant on dirty tracking provides a
>>> useful guarantee to VMMs that will refuse incompatible device
>>> attachments for IOMMU domains.
>>>
>>> For dirty tracking such property is enabled/enforced via HWPT_ALLOC,
>>> which is responsible for creating an IOMMU domain. This is contrast to
>>> the 'simple API' where the IOMMU domain is created by IOMMUFD
>>> automatically when it attaches to VFIO (usually referred as autodomains)
>>>
>>> To support dirty tracking with the advanced IOMMUFD API, it needs
>>> similar logic, where IOMMU domains are created and devices attached to
>>> compatible domains. Essentially mimmicing kernel
>>> iommufd_device_auto_get_domain(). If this fails (i.e. mdevs) it falls back
>>> to IOAS attach.
>>>
>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>> ---
>>> Right now the only alternative to a userspace autodomains
>implementation
>>> is to mimmicing all the flags being added to HWPT_ALLOC but into VFIO
>>> IOAS attach.
>>
>> It was my expectation that VMM userspace would implement direct hwpt
>> support. This is needed in all kinds of cases going forward because
>> hwpt allocation is not uniform across iommu instances and managing
>> this in the kernel is only feasible for simpler cases. Dirty tracking
>> would be one of them.
>>
>
>/me nods
>
>>> +int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
>>> +                               uint32_t pt_id, uint32_t flags,
>>> +                               uint32_t data_type, uint32_t data_len,
>>> +                               void *data_ptr, uint32_t *out_hwpt)
>>> +{
>>> +    int ret;
>>> +    struct iommu_hwpt_alloc alloc_hwpt = {
>>> +        .size = sizeof(struct iommu_hwpt_alloc),
>>> +        .flags = flags,
>>> +        .dev_id = dev_id,
>>> +        .pt_id = pt_id,
>>> +        .data_type = data_type,
>>> +        .data_len = data_len,
>>> +        .data_uptr = (uint64_t)data_ptr,
>>> +        .__reserved = 0,
>>
>> Unless the coding style requirs this it is not necessary to zero in
>> the __reserved within a bracketed in initializer..
>>
>
>Ah yes; and no other helper is doing this, so definitely doesn't look code
>style. I'll remove it.
>
>>> +static int iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>> +                                        VFIOIOMMUFDContainer *container,
>>> +                                        Error **errp)
>>> +{
>>> +    int iommufd = vbasedev->iommufd_dev.iommufd->fd;
>>> +    VFIOIOASHwpt *hwpt;
>>> +    Error *err = NULL;
>>> +    int ret = -EINVAL;
>>> +    uint32_t hwpt_id;
>>> +
>>> +    /* Try to find a domain */
>>> +    QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>>> +        ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id,
>&err);
>>> +        if (ret) {
>>> +            /* -EINVAL means the domain is incompatible with the device. */
>>> +            if (ret == -EINVAL) {
>>
>> Please send a kernel kdoc patch to document this..
>>
>
>Ack
>
>> The approach looks good to me
>>
>> The nesting patches surely need this too?
>
>From what I understand, yes, but they seem to be able to hid this inside
>intel-iommu for the parent hwpt allocation somehow. I'll let them comment;

Yes, we have similar implementation in nesting series. See vtd_device_attach_container() in
https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg02750.html

Thanks
Zhenzhong

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFCv2 1/8] backends/iommufd: Introduce helper function iommufd_device_get_hw_capabilities()
  2024-02-26  7:29   ` Duan, Zhenzhong
@ 2024-02-26 10:10     ` Joao Martins
  2024-02-27  4:04       ` Duan, Zhenzhong
  0 siblings, 1 reply; 37+ messages in thread
From: Joao Martins @ 2024-02-26 10:10 UTC (permalink / raw)
  To: Duan, Zhenzhong, qemu-devel
  Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
	Paolo Bonzini, Daniel P . Berrange, Eduardo Habkost, Eric Blake,
	Markus Armbruster, Jason Gunthorpe, Avihai Horon

On 26/02/2024 07:29, Duan, Zhenzhong wrote:
> Hi Joao,
> 
>> -----Original Message-----
>> From: Joao Martins <joao.m.martins@oracle.com>
>> Subject: [PATCH RFCv2 1/8] backends/iommufd: Introduce helper function
>> iommufd_device_get_hw_capabilities()
>>
>> The new helper will fetch vendor agnostic IOMMU capabilities supported
>> both by hardware and software. Right now it is only iommu dirty
>> tracking.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> backends/iommufd.c       | 25 +++++++++++++++++++++++++
>> include/sysemu/iommufd.h |  2 ++
>> 2 files changed, 27 insertions(+)
>>
>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>> index d92791bba935..8486894f1b3f 100644
>> --- a/backends/iommufd.c
>> +++ b/backends/iommufd.c
>> @@ -237,3 +237,28 @@ void iommufd_device_init(IOMMUFDDevice *idev)
>>     host_iommu_base_device_init(&idev->base, HID_IOMMUFD,
>>                                 sizeof(IOMMUFDDevice));
>> }
>> +
>> +int iommufd_device_get_hw_capabilities(IOMMUFDDevice *idev, uint64_t
>> *caps,
>> +                                       Error **errp)
>> +{
>> +    struct iommu_hw_info info = {
>> +        .size = sizeof(info),
>> +        .flags = 0,
>> +        .dev_id = idev->devid,
>> +        .data_len = 0,
>> +        .__reserved = 0,
>> +        .data_uptr = 0,
>> +        .out_capabilities = 0,
>> +    };
>> +    int ret;
>> +
>> +    ret = ioctl(idev->iommufd->fd, IOMMU_GET_HW_INFO, &info);
>> +    if (ret) {
>> +        error_setg_errno(errp, errno,
>> +                         "Failed to get hardware info capabilities");
>> +    } else {
>> +        *caps = info.out_capabilities;
>> +    }
>> +
>> +    return ret;
>> +}
> 
> This helper is redundant with https://lists.gnu.org/archive/html/qemu-devel/2024-02/msg00031.html
> We have to get other elements in info in nesting series, so mind using that helper
> Instead to avoid redundancy? I can move that patch ahead for your usage.
> 

Sure.

Btw, speaking of which. You series could actually be split into two. One for
iommufd device abstraction part (patch 00-09) and another for the nesting bits
(10-18). FWIW this series here as submitted was actually just placing it on top
of the iommufd device abstraction

I am still planning on adding this same helper, probably just calling into
yours. Mostly because I disregard the data/data-size as I am only interested in
vendor agnostic capabilities.

> Thanks
> Zhenzhong
> 
>> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>> index c3f346976039..4afe97307dbe 100644
>> --- a/include/sysemu/iommufd.h
>> +++ b/include/sysemu/iommufd.h
>> @@ -47,4 +47,6 @@ typedef struct IOMMUFDDevice {
>> } IOMMUFDDevice;
>>
>> void iommufd_device_init(IOMMUFDDevice *idev);
>> +int iommufd_device_get_hw_capabilities(IOMMUFDDevice *idev, uint64_t
>> *caps,
>> +                                       Error **errp);
>> #endif
>> --
>> 2.39.3
> 



^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH RFCv2 1/8] backends/iommufd: Introduce helper function iommufd_device_get_hw_capabilities()
  2024-02-26 10:10     ` Joao Martins
@ 2024-02-27  4:04       ` Duan, Zhenzhong
  0 siblings, 0 replies; 37+ messages in thread
From: Duan, Zhenzhong @ 2024-02-27  4:04 UTC (permalink / raw)
  To: Joao Martins, qemu-devel
  Cc: Liu, Yi L, Eric Auger, Alex Williamson, Cedric Le Goater,
	Paolo Bonzini, Daniel P . Berrange, Eduardo Habkost, Eric Blake,
	Markus Armbruster, Jason Gunthorpe, Avihai Horon



>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: Re: [PATCH RFCv2 1/8] backends/iommufd: Introduce helper
>function iommufd_device_get_hw_capabilities()
>
>On 26/02/2024 07:29, Duan, Zhenzhong wrote:
>> Hi Joao,
>>
>>> -----Original Message-----
>>> From: Joao Martins <joao.m.martins@oracle.com>
>>> Subject: [PATCH RFCv2 1/8] backends/iommufd: Introduce helper
>function
>>> iommufd_device_get_hw_capabilities()
>>>
>>> The new helper will fetch vendor agnostic IOMMU capabilities supported
>>> both by hardware and software. Right now it is only iommu dirty
>>> tracking.
>>>
>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>> ---
>>> backends/iommufd.c       | 25 +++++++++++++++++++++++++
>>> include/sysemu/iommufd.h |  2 ++
>>> 2 files changed, 27 insertions(+)
>>>
>>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>>> index d92791bba935..8486894f1b3f 100644
>>> --- a/backends/iommufd.c
>>> +++ b/backends/iommufd.c
>>> @@ -237,3 +237,28 @@ void iommufd_device_init(IOMMUFDDevice
>*idev)
>>>     host_iommu_base_device_init(&idev->base, HID_IOMMUFD,
>>>                                 sizeof(IOMMUFDDevice));
>>> }
>>> +
>>> +int iommufd_device_get_hw_capabilities(IOMMUFDDevice *idev,
>uint64_t
>>> *caps,
>>> +                                       Error **errp)
>>> +{
>>> +    struct iommu_hw_info info = {
>>> +        .size = sizeof(info),
>>> +        .flags = 0,
>>> +        .dev_id = idev->devid,
>>> +        .data_len = 0,
>>> +        .__reserved = 0,
>>> +        .data_uptr = 0,
>>> +        .out_capabilities = 0,
>>> +    };
>>> +    int ret;
>>> +
>>> +    ret = ioctl(idev->iommufd->fd, IOMMU_GET_HW_INFO, &info);
>>> +    if (ret) {
>>> +        error_setg_errno(errp, errno,
>>> +                         "Failed to get hardware info capabilities");
>>> +    } else {
>>> +        *caps = info.out_capabilities;
>>> +    }
>>> +
>>> +    return ret;
>>> +}
>>
>> This helper is redundant with https://lists.gnu.org/archive/html/qemu-
>devel/2024-02/msg00031.html
>> We have to get other elements in info in nesting series, so mind using that
>helper
>> Instead to avoid redundancy? I can move that patch ahead for your usage.
>>
>
>Sure.
>
>Btw, speaking of which. You series could actually be split into two. One for
>iommufd device abstraction part (patch 00-09) and another for the nesting
>bits
>(10-18). FWIW this series here as submitted was actually just placing it on
>top
>of the iommufd device abstraction

I see, will split in next version.

>
>I am still planning on adding this same helper, probably just calling into
>yours. Mostly because I disregard the data/data-size as I am only interested
>in
>vendor agnostic capabilities.

Sounds good.

Thanks
Zhenzhong


^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2024-02-27  4:05 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-12 13:56 [PATCH RFCv2 0/8] vfio/iommufd: IOMMUFD Dirty Tracking Joao Martins
2024-02-12 13:56 ` [PATCH RFCv2 1/8] backends/iommufd: Introduce helper function iommufd_device_get_hw_capabilities() Joao Martins
2024-02-26  7:29   ` Duan, Zhenzhong
2024-02-26 10:10     ` Joao Martins
2024-02-27  4:04       ` Duan, Zhenzhong
2024-02-12 13:56 ` [PATCH RFCv2 2/8] vfio/iommufd: Introduce auto domain creation Joao Martins
2024-02-12 16:27   ` Jason Gunthorpe
2024-02-12 18:09     ` Joao Martins
2024-02-26  9:14       ` Duan, Zhenzhong
2024-02-13 12:01   ` Joao Martins
2024-02-19  8:58   ` Avihai Horon
2024-02-20 10:42     ` Joao Martins
2024-02-12 13:56 ` [PATCH RFCv2 3/8] vfio/iommufd: Probe and request hwpt dirty tracking capability Joao Martins
2024-02-19  9:03   ` Avihai Horon
2024-02-20 10:51     ` Joao Martins
2024-02-20 12:46       ` Avihai Horon
2024-02-12 13:56 ` [PATCH RFCv2 4/8] vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support Joao Martins
2024-02-19  9:17   ` Avihai Horon
2024-02-12 13:56 ` [PATCH RFCv2 5/8] vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support Joao Martins
2024-02-19  9:30   ` Avihai Horon
2024-02-20 10:59     ` Joao Martins
2024-02-20 12:52       ` Avihai Horon
2024-02-20 13:17         ` Joao Martins
2024-02-12 13:56 ` [PATCH RFCv2 6/8] backends/iommufd: Add ability to disable hugepages Joao Martins
2024-02-12 16:59   ` Jason Gunthorpe
2024-02-12 17:17   ` Markus Armbruster
2024-02-12 19:16     ` Joao Martins
2024-02-19 10:05   ` Avihai Horon
2024-02-20 11:01     ` Joao Martins
2024-02-12 13:56 ` [PATCH RFCv2 7/8] vfio/migration: Don't block migration device dirty tracking is unsupported Joao Martins
2024-02-19 10:12   ` Avihai Horon
2024-02-20 11:05     ` Joao Martins
2024-02-12 13:56 ` [PATCH RFCv2 8/8] vfio/common: Allow disabling device dirty page tracking Joao Martins
2024-02-13 11:59 ` [PATCH RFCv2 0/8] vfio/iommufd: IOMMUFD Dirty Tracking Joao Martins
2024-02-14 15:40   ` Cédric Le Goater
2024-02-14 16:25     ` Joao Martins
2024-02-15 14:20       ` Eric Auger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.