All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/5] Check host IOMMU compatilibity with vIOMMU
@ 2024-04-08  8:43 Zhenzhong Duan
  2024-04-08  8:44 ` [PATCH v2 1/5] intel_iommu: Extract out vtd_cap_init() to initialize cap/ecap Zhenzhong Duan
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Zhenzhong Duan @ 2024-04-08  8:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, eric.auger, peterx, jasowang, mst, jgg,
	nicolinc, joao.m.martins, kevin.tian, yi.l.liu, chao.p.peng,
	Zhenzhong Duan

Hi,

Based on Joao's suggestion, the iommufd nesting prerequisite series [1]
is further splitted to host IOMMU device abstract part [2] and vIOMMU
check part. This series implements the 2nd part.

1st part implements get_host_iommu_info() callback which vIOMMU can call to
get host IOMMU info. For legacy VFIO or VDPA device, aw_bits is provided;
for IOMMUFD backed device, IOMMUFD uAPI provides detailed cap/ecap bits from
host.

vIOMMU implements set/unset_iommu_device() callback to get HostIOMMUDevice
and call get_host_iommu_info(). So vIOMMU can do compatibility check with
the return host IOMMU info.

This is also a prerequisite for incoming iommufd nesting series:
'intel_iommu: Enable stage-1 translation' where HostIOMMUDevice provides
more data such as iommufd/devid/ioas_id and callback attach/detach_hwpt()
for vIOMMU to create nested hwpt, attaching/detaching hwpt, etc.

The major change of this version is dropping the cap/ecap update logic based
on MST's suggestion. We can add property for any cap/ecap bit when necessary
just like "aw-bits". This way we don't need to concern about migration
compatibility and code is cleaner.

Qemu code can be found at:
https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_nesting_preq_part2_v2

[1] https://lore.kernel.org/qemu-devel/20240201072818.327930-1-zhenzhong.duan@intel.com/
[2] https://lists.gnu.org/archive/html/qemu-devel/2024-04/msg00763.html

Thanks
Zhenzhong

Changelog:
v2:
- drop cap/ecap update logic (MST)
- check aw-bits from get_host_iommu_info() in legacy mode

v1:
- convert HostIOMMUDevice to sub object pointer in vtd_check_hdev

rfcv2:
- introduce common abstract HostIOMMUDevice and sub struct for different BEs (Eric, Cédric)
- remove iommufd_device.[ch] (Cédric)
- remove duplicate iommufd/devid define from VFIODevice (Eric)
- drop the p in aliased_pbus and aliased_pdevfn (Eric)
- assert devfn and iommu_bus in pci_device_get_iommu_bus_devfn (Cédric, Eric)
- use errp in iommufd_device_get_info (Eric)
- split and simplify cap/ecap check/sync code in intel_iommu.c (Cédric)
- move VTDHostIOMMUDevice declaration to intel_iommu_internal.h (Cédric)
- make '(vtd->cap_reg >> 16) & 0x3fULL' a MACRO and add missed '+1' (Cédric)
- block migration if vIOMMU cap/ecap updated based on host IOMMU cap/ecap
- add R-B


Yi Liu (2):
  intel_iommu: Implement set/unset_iommu_device() callback
  intel_iommu: Add a framework to do compatibility check with host IOMMU
    cap/ecap

Zhenzhong Duan (3):
  intel_iommu: Extract out vtd_cap_init() to initialize cap/ecap
  intel_iommu: Check for compatibility with legacy device
  intel_iommu: Check for compatibility with iommufd backed device

 hw/i386/intel_iommu_internal.h |   8 ++
 include/hw/i386/intel_iommu.h  |   3 +
 hw/i386/intel_iommu.c          | 242 +++++++++++++++++++++++++++------
 3 files changed, 211 insertions(+), 42 deletions(-)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v2 1/5] intel_iommu: Extract out vtd_cap_init() to initialize cap/ecap
  2024-04-08  8:43 [PATCH v2 0/5] Check host IOMMU compatilibity with vIOMMU Zhenzhong Duan
@ 2024-04-08  8:44 ` Zhenzhong Duan
  2024-04-08  8:44 ` [PATCH v2 2/5] intel_iommu: Implement set/unset_iommu_device() callback Zhenzhong Duan
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 19+ messages in thread
From: Zhenzhong Duan @ 2024-04-08  8:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, eric.auger, peterx, jasowang, mst, jgg,
	nicolinc, joao.m.martins, kevin.tian, yi.l.liu, chao.p.peng,
	Zhenzhong Duan, Marcel Apfelbaum, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost

Extract cap/ecap initialization in vtd_cap_init() to make code
cleaner.

No functional change intended.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 hw/i386/intel_iommu.c | 93 ++++++++++++++++++++++++-------------------
 1 file changed, 51 insertions(+), 42 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index cc8e59674e..519063c8f8 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3934,30 +3934,10 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
     return;
 }
 
-/* Do the initialization. It will also be called when reset, so pay
- * attention when adding new initialization stuff.
- */
-static void vtd_init(IntelIOMMUState *s)
+static void vtd_cap_init(IntelIOMMUState *s)
 {
     X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
 
-    memset(s->csr, 0, DMAR_REG_SIZE);
-    memset(s->wmask, 0, DMAR_REG_SIZE);
-    memset(s->w1cmask, 0, DMAR_REG_SIZE);
-    memset(s->womask, 0, DMAR_REG_SIZE);
-
-    s->root = 0;
-    s->root_scalable = false;
-    s->dmar_enabled = false;
-    s->intr_enabled = false;
-    s->iq_head = 0;
-    s->iq_tail = 0;
-    s->iq = 0;
-    s->iq_size = 0;
-    s->qi_enabled = false;
-    s->iq_last_desc_type = VTD_INV_DESC_NONE;
-    s->iq_dw = false;
-    s->next_frcd_reg = 0;
     s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND |
              VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS |
              VTD_CAP_MGAW(s->aw_bits);
@@ -3974,27 +3954,6 @@ static void vtd_init(IntelIOMMUState *s)
     }
     s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
 
-    /*
-     * Rsvd field masks for spte
-     */
-    vtd_spte_rsvd[0] = ~0ULL;
-    vtd_spte_rsvd[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->aw_bits,
-                                                  x86_iommu->dt_supported);
-    vtd_spte_rsvd[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->aw_bits);
-    vtd_spte_rsvd[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->aw_bits);
-    vtd_spte_rsvd[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->aw_bits);
-
-    vtd_spte_rsvd_large[2] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->aw_bits,
-                                                         x86_iommu->dt_supported);
-    vtd_spte_rsvd_large[3] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->aw_bits,
-                                                         x86_iommu->dt_supported);
-
-    if (s->scalable_mode || s->snoop_control) {
-        vtd_spte_rsvd[1] &= ~VTD_SPTE_SNP;
-        vtd_spte_rsvd_large[2] &= ~VTD_SPTE_SNP;
-        vtd_spte_rsvd_large[3] &= ~VTD_SPTE_SNP;
-    }
-
     if (x86_iommu_ir_supported(x86_iommu)) {
         s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV;
         if (s->intr_eim == ON_OFF_AUTO_ON) {
@@ -4027,6 +3986,56 @@ static void vtd_init(IntelIOMMUState *s)
     if (s->pasid) {
         s->ecap |= VTD_ECAP_PASID;
     }
+}
+
+/*
+ * Do the initialization. It will also be called when reset, so pay
+ * attention when adding new initialization stuff.
+ */
+static void vtd_init(IntelIOMMUState *s)
+{
+    X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
+
+    memset(s->csr, 0, DMAR_REG_SIZE);
+    memset(s->wmask, 0, DMAR_REG_SIZE);
+    memset(s->w1cmask, 0, DMAR_REG_SIZE);
+    memset(s->womask, 0, DMAR_REG_SIZE);
+
+    s->root = 0;
+    s->root_scalable = false;
+    s->dmar_enabled = false;
+    s->intr_enabled = false;
+    s->iq_head = 0;
+    s->iq_tail = 0;
+    s->iq = 0;
+    s->iq_size = 0;
+    s->qi_enabled = false;
+    s->iq_last_desc_type = VTD_INV_DESC_NONE;
+    s->iq_dw = false;
+    s->next_frcd_reg = 0;
+
+    vtd_cap_init(s);
+
+    /*
+     * Rsvd field masks for spte
+     */
+    vtd_spte_rsvd[0] = ~0ULL;
+    vtd_spte_rsvd[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->aw_bits,
+                                                  x86_iommu->dt_supported);
+    vtd_spte_rsvd[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->aw_bits);
+    vtd_spte_rsvd[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->aw_bits);
+    vtd_spte_rsvd[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->aw_bits);
+
+    vtd_spte_rsvd_large[2] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->aw_bits,
+                                                    x86_iommu->dt_supported);
+    vtd_spte_rsvd_large[3] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->aw_bits,
+                                                    x86_iommu->dt_supported);
+
+    if (s->scalable_mode || s->snoop_control) {
+        vtd_spte_rsvd[1] &= ~VTD_SPTE_SNP;
+        vtd_spte_rsvd_large[2] &= ~VTD_SPTE_SNP;
+        vtd_spte_rsvd_large[3] &= ~VTD_SPTE_SNP;
+    }
 
     vtd_reset_caches(s);
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 2/5] intel_iommu: Implement set/unset_iommu_device() callback
  2024-04-08  8:43 [PATCH v2 0/5] Check host IOMMU compatilibity with vIOMMU Zhenzhong Duan
  2024-04-08  8:44 ` [PATCH v2 1/5] intel_iommu: Extract out vtd_cap_init() to initialize cap/ecap Zhenzhong Duan
@ 2024-04-08  8:44 ` Zhenzhong Duan
  2024-04-08  8:44 ` [PATCH v2 3/5] intel_iommu: Add a framework to do compatibility check with host IOMMU cap/ecap Zhenzhong Duan
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 19+ messages in thread
From: Zhenzhong Duan @ 2024-04-08  8:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, eric.auger, peterx, jasowang, mst, jgg,
	nicolinc, joao.m.martins, kevin.tian, yi.l.liu, chao.p.peng,
	Yi Sun, Zhenzhong Duan, Paolo Bonzini, Richard Henderson,
	Eduardo Habkost, Marcel Apfelbaum

From: Yi Liu <yi.l.liu@intel.com>

Implement set/unset_iommu_device() callback in Intel vIOMMU.
In set call, a new structure VTDHostIOMMUDevice which holds
a reference to HostIOMMUDevice is stored in hash table
indexed by PCI BDF.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 hw/i386/intel_iommu_internal.h |  8 ++++
 include/hw/i386/intel_iommu.h  |  2 +
 hw/i386/intel_iommu.c          | 76 ++++++++++++++++++++++++++++++++++
 3 files changed, 86 insertions(+)

diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index f8cf99bddf..becafd03c1 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -537,4 +537,12 @@ typedef struct VTDRootEntry VTDRootEntry;
 #define VTD_SL_IGN_COM              0xbff0000000000000ULL
 #define VTD_SL_TM                   (1ULL << 62)
 
+
+typedef struct VTDHostIOMMUDevice {
+    IntelIOMMUState *iommu_state;
+    PCIBus *bus;
+    uint8_t devfn;
+    HostIOMMUDevice *dev;
+    QLIST_ENTRY(VTDHostIOMMUDevice) next;
+} VTDHostIOMMUDevice;
 #endif
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 7fa0a695c8..bbc7b96add 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -292,6 +292,8 @@ struct IntelIOMMUState {
     /* list of registered notifiers */
     QLIST_HEAD(, VTDAddressSpace) vtd_as_with_notifiers;
 
+    GHashTable *vtd_host_iommu_dev;             /* VTDHostIOMMUDevice */
+
     /* interrupt remapping */
     bool intr_enabled;              /* Whether guest enabled IR */
     dma_addr_t intr_root;           /* Interrupt remapping table pointer */
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 519063c8f8..4f84e2e801 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -237,6 +237,13 @@ static gboolean vtd_as_equal(gconstpointer v1, gconstpointer v2)
            (key1->pasid == key2->pasid);
 }
 
+static gboolean vtd_as_idev_equal(gconstpointer v1, gconstpointer v2)
+{
+    const struct vtd_as_key *key1 = v1;
+    const struct vtd_as_key *key2 = v2;
+
+    return (key1->bus == key2->bus) && (key1->devfn == key2->devfn);
+}
 /*
  * Note that we use pointer to PCIBus as the key, so hashing/shifting
  * based on the pointer value is intended. Note that we deal with
@@ -3812,6 +3819,70 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
     return vtd_dev_as;
 }
 
+static int vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
+                                    HostIOMMUDevice *hiod, Error **errp)
+{
+    IntelIOMMUState *s = opaque;
+    VTDHostIOMMUDevice *vtd_hdev;
+    struct vtd_as_key key = {
+        .bus = bus,
+        .devfn = devfn,
+    };
+    struct vtd_as_key *new_key;
+
+    assert(hiod);
+
+    vtd_iommu_lock(s);
+
+    vtd_hdev = g_hash_table_lookup(s->vtd_host_iommu_dev, &key);
+
+    if (vtd_hdev) {
+        error_setg(errp, "IOMMUFD device already exist");
+        vtd_iommu_unlock(s);
+        return -EEXIST;
+    }
+
+    vtd_hdev = g_malloc0(sizeof(VTDHostIOMMUDevice));
+    vtd_hdev->bus = bus;
+    vtd_hdev->devfn = (uint8_t)devfn;
+    vtd_hdev->iommu_state = s;
+    vtd_hdev->dev = hiod;
+
+    new_key = g_malloc(sizeof(*new_key));
+    new_key->bus = bus;
+    new_key->devfn = devfn;
+
+    object_ref(hiod);
+    g_hash_table_insert(s->vtd_host_iommu_dev, new_key, vtd_hdev);
+
+    vtd_iommu_unlock(s);
+
+    return 0;
+}
+
+static void vtd_dev_unset_iommu_device(PCIBus *bus, void *opaque, int devfn)
+{
+    IntelIOMMUState *s = opaque;
+    VTDHostIOMMUDevice *vtd_hdev;
+    struct vtd_as_key key = {
+        .bus = bus,
+        .devfn = devfn,
+    };
+
+    vtd_iommu_lock(s);
+
+    vtd_hdev = g_hash_table_lookup(s->vtd_host_iommu_dev, &key);
+    if (!vtd_hdev) {
+        vtd_iommu_unlock(s);
+        return;
+    }
+
+    g_hash_table_remove(s->vtd_host_iommu_dev, &key);
+    object_unref(vtd_hdev->dev);
+
+    vtd_iommu_unlock(s);
+}
+
 /* Unmap the whole range in the notifier's scope. */
 static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
 {
@@ -4116,6 +4187,8 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
 
 static PCIIOMMUOps vtd_iommu_ops = {
     .get_address_space = vtd_host_dma_iommu,
+    .set_iommu_device = vtd_dev_set_iommu_device,
+    .unset_iommu_device = vtd_dev_unset_iommu_device,
 };
 
 static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
@@ -4235,6 +4308,9 @@ static void vtd_realize(DeviceState *dev, Error **errp)
                                      g_free, g_free);
     s->vtd_address_spaces = g_hash_table_new_full(vtd_as_hash, vtd_as_equal,
                                       g_free, g_free);
+    s->vtd_host_iommu_dev = g_hash_table_new_full(vtd_as_hash,
+                                                  vtd_as_idev_equal,
+                                                  g_free, g_free);
     vtd_init(s);
     pci_setup_iommu(bus, &vtd_iommu_ops, dev);
     /* Pseudo address space under root PCI bus. */
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 3/5] intel_iommu: Add a framework to do compatibility check with host IOMMU cap/ecap
  2024-04-08  8:43 [PATCH v2 0/5] Check host IOMMU compatilibity with vIOMMU Zhenzhong Duan
  2024-04-08  8:44 ` [PATCH v2 1/5] intel_iommu: Extract out vtd_cap_init() to initialize cap/ecap Zhenzhong Duan
  2024-04-08  8:44 ` [PATCH v2 2/5] intel_iommu: Implement set/unset_iommu_device() callback Zhenzhong Duan
@ 2024-04-08  8:44 ` Zhenzhong Duan
  2024-04-15 15:31   ` Cédric Le Goater
  2024-04-08  8:44 ` [PATCH v2 4/5] intel_iommu: Check for compatibility with legacy device Zhenzhong Duan
  2024-04-08  8:44 ` [PATCH v2 5/5] intel_iommu: Check for compatibility with iommufd backed device Zhenzhong Duan
  4 siblings, 1 reply; 19+ messages in thread
From: Zhenzhong Duan @ 2024-04-08  8:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, eric.auger, peterx, jasowang, mst, jgg,
	nicolinc, joao.m.martins, kevin.tian, yi.l.liu, chao.p.peng,
	Yi Sun, Zhenzhong Duan, Marcel Apfelbaum, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost

From: Yi Liu <yi.l.liu@intel.com>

If check fails, the host side device(either vfio or vdpa device) should not
be passed to guest.

Implementation details for different backends will be in following patches.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 hw/i386/intel_iommu.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 4f84e2e801..a49b587c73 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -35,6 +35,7 @@
 #include "sysemu/kvm.h"
 #include "sysemu/dma.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/iommufd.h"
 #include "hw/i386/apic_internal.h"
 #include "kvm/kvm_i386.h"
 #include "migration/vmstate.h"
@@ -3819,6 +3820,32 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
     return vtd_dev_as;
 }
 
+static int vtd_check_legacy_hdev(IntelIOMMUState *s,
+                                 HostIOMMUDevice *hiod,
+                                 Error **errp)
+{
+    return 0;
+}
+
+static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
+                                  HostIOMMUDevice *hiod,
+                                  Error **errp)
+{
+    return 0;
+}
+
+static int vtd_check_hdev(IntelIOMMUState *s, VTDHostIOMMUDevice *vtd_hdev,
+                          Error **errp)
+{
+    HostIOMMUDevice *hiod = vtd_hdev->dev;
+
+    if (object_dynamic_cast(OBJECT(hiod), TYPE_HIOD_IOMMUFD)) {
+        return vtd_check_iommufd_hdev(s, hiod, errp);
+    }
+
+    return vtd_check_legacy_hdev(s, hiod, errp);
+}
+
 static int vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
                                     HostIOMMUDevice *hiod, Error **errp)
 {
@@ -3829,6 +3856,7 @@ static int vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
         .devfn = devfn,
     };
     struct vtd_as_key *new_key;
+    int ret;
 
     assert(hiod);
 
@@ -3848,6 +3876,13 @@ static int vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
     vtd_hdev->iommu_state = s;
     vtd_hdev->dev = hiod;
 
+    ret = vtd_check_hdev(s, vtd_hdev, errp);
+    if (ret) {
+        g_free(vtd_hdev);
+        vtd_iommu_unlock(s);
+        return ret;
+    }
+
     new_key = g_malloc(sizeof(*new_key));
     new_key->bus = bus;
     new_key->devfn = devfn;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 4/5] intel_iommu: Check for compatibility with legacy device
  2024-04-08  8:43 [PATCH v2 0/5] Check host IOMMU compatilibity with vIOMMU Zhenzhong Duan
                   ` (2 preceding siblings ...)
  2024-04-08  8:44 ` [PATCH v2 3/5] intel_iommu: Add a framework to do compatibility check with host IOMMU cap/ecap Zhenzhong Duan
@ 2024-04-08  8:44 ` Zhenzhong Duan
  2024-04-08  8:44 ` [PATCH v2 5/5] intel_iommu: Check for compatibility with iommufd backed device Zhenzhong Duan
  4 siblings, 0 replies; 19+ messages in thread
From: Zhenzhong Duan @ 2024-04-08  8:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, eric.auger, peterx, jasowang, mst, jgg,
	nicolinc, joao.m.martins, kevin.tian, yi.l.liu, chao.p.peng,
	Zhenzhong Duan, Marcel Apfelbaum, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost

Currently only stage-2 translation is supported which is backed by
shadow page table on host side. So we don't need exact matching of
each bit of cap/ecap between vIOMMU and host. However, we can still
ensure compatibility of host and vIOMMU's address width at least,
i.e., vIOMMU's aw-bits <= host IOMMU aw-bits, which is missed before.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 hw/i386/intel_iommu.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index a49b587c73..d2cd186df0 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3824,6 +3824,21 @@ static int vtd_check_legacy_hdev(IntelIOMMUState *s,
                                  HostIOMMUDevice *hiod,
                                  Error **errp)
 {
+    HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(hiod);
+    HIOD_LEGACY_INFO info;
+    int ret;
+
+    ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp);
+    if (ret) {
+        return ret;
+    }
+
+    if (s->aw_bits > info.aw_bits) {
+        error_setg(errp, "aw-bits %d > host aw-bits %d",
+                   s->aw_bits, info.aw_bits);
+        return -EINVAL;
+    }
+
     return 0;
 }
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 5/5] intel_iommu: Check for compatibility with iommufd backed device
  2024-04-08  8:43 [PATCH v2 0/5] Check host IOMMU compatilibity with vIOMMU Zhenzhong Duan
                   ` (3 preceding siblings ...)
  2024-04-08  8:44 ` [PATCH v2 4/5] intel_iommu: Check for compatibility with legacy device Zhenzhong Duan
@ 2024-04-08  8:44 ` Zhenzhong Duan
  4 siblings, 0 replies; 19+ messages in thread
From: Zhenzhong Duan @ 2024-04-08  8:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, eric.auger, peterx, jasowang, mst, jgg,
	nicolinc, joao.m.martins, kevin.tian, yi.l.liu, chao.p.peng,
	Zhenzhong Duan, Yi Sun, Marcel Apfelbaum, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost

Currently only stage-2 translation is supported which is backed by
shadow page table on host side. So we don't need exact matching of
each bit of cap/ecap between vIOMMU and host. However, we can still
ensure compatibility of host and vIOMMU's address width at least,
i.e., vIOMMU's aw-bits <= host IOMMU aw-bits, which is missed before.

When stage-1 translation is supported in future, a.k.a. scalable
modern mode, this mechanism will be further extended to check more
bits.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 include/hw/i386/intel_iommu.h |  1 +
 hw/i386/intel_iommu.c         | 23 +++++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index bbc7b96add..2bbde41e45 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -47,6 +47,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(IntelIOMMUState, INTEL_IOMMU_DEVICE)
 #define VTD_HOST_AW_48BIT           48
 #define VTD_HOST_ADDRESS_WIDTH      VTD_HOST_AW_39BIT
 #define VTD_HAW_MASK(aw)            ((1ULL << (aw)) - 1)
+#define VTD_MGAW_FROM_CAP(cap)      ((cap >> 16) & 0x3fULL)
 
 #define DMAR_REPORT_F_INTR          (1)
 
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index d2cd186df0..d8fac9ef9f 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3846,6 +3846,29 @@ static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
                                   HostIOMMUDevice *hiod,
                                   Error **errp)
 {
+    HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(hiod);
+    struct iommu_hw_info_vtd *vtd;
+    HIOD_IOMMUFD_INFO info;
+    int host_aw_bits, ret;
+
+    ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp);
+    if (ret) {
+        return ret;
+    }
+
+    if (info.type != IOMMU_HW_INFO_TYPE_INTEL_VTD) {
+        error_setg(errp, "IOMMU hardware is not compatible");
+        return -EINVAL;
+    }
+
+    vtd = &info.data.vtd;
+    host_aw_bits = VTD_MGAW_FROM_CAP(vtd->cap_reg) + 1;
+    if (s->aw_bits > host_aw_bits) {
+        error_setg(errp, "aw-bits %d > host aw-bits %d",
+                   s->aw_bits, host_aw_bits);
+        return -EINVAL;
+    }
+
     return 0;
 }
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 3/5] intel_iommu: Add a framework to do compatibility check with host IOMMU cap/ecap
  2024-04-08  8:44 ` [PATCH v2 3/5] intel_iommu: Add a framework to do compatibility check with host IOMMU cap/ecap Zhenzhong Duan
@ 2024-04-15 15:31   ` Cédric Le Goater
  2024-04-16  7:09     ` Duan, Zhenzhong
  0 siblings, 1 reply; 19+ messages in thread
From: Cédric Le Goater @ 2024-04-15 15:31 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, eric.auger, peterx, jasowang, mst, jgg,
	nicolinc, joao.m.martins, kevin.tian, yi.l.liu, chao.p.peng,
	Yi Sun, Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
	Eduardo Habkost

On 4/8/24 10:44, Zhenzhong Duan wrote:
> From: Yi Liu <yi.l.liu@intel.com>
> 
> If check fails, the host side device(either vfio or vdpa device) should not
> be passed to guest.
> 
> Implementation details for different backends will be in following patches.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
>   hw/i386/intel_iommu.c | 35 +++++++++++++++++++++++++++++++++++
>   1 file changed, 35 insertions(+)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 4f84e2e801..a49b587c73 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -35,6 +35,7 @@
>   #include "sysemu/kvm.h"
>   #include "sysemu/dma.h"
>   #include "sysemu/sysemu.h"
> +#include "sysemu/iommufd.h"
>   #include "hw/i386/apic_internal.h"
>   #include "kvm/kvm_i386.h"
>   #include "migration/vmstate.h"
> @@ -3819,6 +3820,32 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
>       return vtd_dev_as;
>   }
>   
> +static int vtd_check_legacy_hdev(IntelIOMMUState *s,
> +                                 HostIOMMUDevice *hiod,
> +                                 Error **errp)
> +{
> +    return 0;
> +}
> +
> +static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
> +                                  HostIOMMUDevice *hiod,
> +                                  Error **errp)
> +{
> +    return 0;
> +}
> +
> +static int vtd_check_hdev(IntelIOMMUState *s, VTDHostIOMMUDevice *vtd_hdev,
> +                          Error **errp)
> +{
> +    HostIOMMUDevice *hiod = vtd_hdev->dev;
> +
> +    if (object_dynamic_cast(OBJECT(hiod), TYPE_HIOD_IOMMUFD)) {
> +        return vtd_check_iommufd_hdev(s, hiod, errp);
> +    }
> +
> +    return vtd_check_legacy_hdev(s, hiod, errp);
> +}


I think we should be using the .get_host_iommu_info() class handler
instead. Can we refactor the code slightly to avoid this check on
the type ?


Thanks,

C.




> +
>   static int vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
>                                       HostIOMMUDevice *hiod, Error **errp)
>   {
> @@ -3829,6 +3856,7 @@ static int vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
>           .devfn = devfn,
>       };
>       struct vtd_as_key *new_key;
> +    int ret;
>   
>       assert(hiod);
>   
> @@ -3848,6 +3876,13 @@ static int vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
>       vtd_hdev->iommu_state = s;
>       vtd_hdev->dev = hiod;
>   
> +    ret = vtd_check_hdev(s, vtd_hdev, errp);
> +    if (ret) {
> +        g_free(vtd_hdev);
> +        vtd_iommu_unlock(s);
> +        return ret;
> +    }
> +
>       new_key = g_malloc(sizeof(*new_key));
>       new_key->bus = bus;
>       new_key->devfn = devfn;



^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH v2 3/5] intel_iommu: Add a framework to do compatibility check with host IOMMU cap/ecap
  2024-04-15 15:31   ` Cédric Le Goater
@ 2024-04-16  7:09     ` Duan, Zhenzhong
  2024-04-16 14:17       ` Cédric Le Goater
  0 siblings, 1 reply; 19+ messages in thread
From: Duan, Zhenzhong @ 2024-04-16  7:09 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: alex.williamson, eric.auger, peterx, jasowang, mst, jgg,
	nicolinc, joao.m.martins, Tian, Kevin, Liu, Yi L, Peng, Chao P,
	Yi Sun, Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
	Eduardo Habkost

Hi Cédric,

>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>compatibility check with host IOMMU cap/ecap
>
>On 4/8/24 10:44, Zhenzhong Duan wrote:
>> From: Yi Liu <yi.l.liu@intel.com>
>>
>> If check fails, the host side device(either vfio or vdpa device) should not
>> be passed to guest.
>>
>> Implementation details for different backends will be in following patches.
>>
>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>>   hw/i386/intel_iommu.c | 35
>+++++++++++++++++++++++++++++++++++
>>   1 file changed, 35 insertions(+)
>>
>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>> index 4f84e2e801..a49b587c73 100644
>> --- a/hw/i386/intel_iommu.c
>> +++ b/hw/i386/intel_iommu.c
>> @@ -35,6 +35,7 @@
>>   #include "sysemu/kvm.h"
>>   #include "sysemu/dma.h"
>>   #include "sysemu/sysemu.h"
>> +#include "sysemu/iommufd.h"
>>   #include "hw/i386/apic_internal.h"
>>   #include "kvm/kvm_i386.h"
>>   #include "migration/vmstate.h"
>> @@ -3819,6 +3820,32 @@ VTDAddressSpace
>*vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
>>       return vtd_dev_as;
>>   }
>>
>> +static int vtd_check_legacy_hdev(IntelIOMMUState *s,
>> +                                 HostIOMMUDevice *hiod,
>> +                                 Error **errp)
>> +{
>> +    return 0;
>> +}
>> +
>> +static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
>> +                                  HostIOMMUDevice *hiod,
>> +                                  Error **errp)
>> +{
>> +    return 0;
>> +}
>> +
>> +static int vtd_check_hdev(IntelIOMMUState *s, VTDHostIOMMUDevice
>*vtd_hdev,
>> +                          Error **errp)
>> +{
>> +    HostIOMMUDevice *hiod = vtd_hdev->dev;
>> +
>> +    if (object_dynamic_cast(OBJECT(hiod), TYPE_HIOD_IOMMUFD)) {
>> +        return vtd_check_iommufd_hdev(s, hiod, errp);
>> +    }
>> +
>> +    return vtd_check_legacy_hdev(s, hiod, errp);
>> +}
>
>
>I think we should be using the .get_host_iommu_info() class handler
>instead. Can we refactor the code slightly to avoid this check on
>the type ?

There is some difficulty ini avoiding this check, the behavior of vtd_check_legacy_hdev
and vtd_check_iommufd_hdev are different especially after nesting support introduced.
vtd_check_iommufd_hdev() has much wider check over cap/ecap bits besides aw_bits.
That the reason I have two functions to do different thing.
See:
https://github.com/yiliu1765/qemu/blob/zhenzhong/iommufd_nesting_rfcv2/hw/i386/intel_iommu.c#L5472

Meanwhile in vtd_check_legacy_hdev(), when legacy VFIO device attaches to modern vIOMMU,
this is unsupported and error out early, it will not call .get_host_iommu_info().
I mean we don't need to unconditionally call .get_host_iommu_info() in some cases.

Thanks
Zhenzhong

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 3/5] intel_iommu: Add a framework to do compatibility check with host IOMMU cap/ecap
  2024-04-16  7:09     ` Duan, Zhenzhong
@ 2024-04-16 14:17       ` Cédric Le Goater
  2024-04-17  4:21         ` Duan, Zhenzhong
  0 siblings, 1 reply; 19+ messages in thread
From: Cédric Le Goater @ 2024-04-16 14:17 UTC (permalink / raw)
  To: Duan, Zhenzhong, qemu-devel
  Cc: alex.williamson, eric.auger, peterx, jasowang, mst, jgg,
	nicolinc, joao.m.martins, Tian, Kevin, Liu, Yi L, Peng, Chao P,
	Yi Sun, Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
	Eduardo Habkost

Hello,

On 4/16/24 09:09, Duan, Zhenzhong wrote:
> Hi Cédric,
> 
>> -----Original Message-----
>> From: Cédric Le Goater <clg@redhat.com>
>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>> compatibility check with host IOMMU cap/ecap
>>
>> On 4/8/24 10:44, Zhenzhong Duan wrote:
>>> From: Yi Liu <yi.l.liu@intel.com>
>>>
>>> If check fails, the host side device(either vfio or vdpa device) should not
>>> be passed to guest.
>>>
>>> Implementation details for different backends will be in following patches.
>>>
>>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>>> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>> ---
>>>    hw/i386/intel_iommu.c | 35
>> +++++++++++++++++++++++++++++++++++
>>>    1 file changed, 35 insertions(+)
>>>
>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>>> index 4f84e2e801..a49b587c73 100644
>>> --- a/hw/i386/intel_iommu.c
>>> +++ b/hw/i386/intel_iommu.c
>>> @@ -35,6 +35,7 @@
>>>    #include "sysemu/kvm.h"
>>>    #include "sysemu/dma.h"
>>>    #include "sysemu/sysemu.h"
>>> +#include "sysemu/iommufd.h"
>>>    #include "hw/i386/apic_internal.h"
>>>    #include "kvm/kvm_i386.h"
>>>    #include "migration/vmstate.h"
>>> @@ -3819,6 +3820,32 @@ VTDAddressSpace
>> *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
>>>        return vtd_dev_as;
>>>    }
>>>
>>> +static int vtd_check_legacy_hdev(IntelIOMMUState *s,
>>> +                                 HostIOMMUDevice *hiod,
>>> +                                 Error **errp)
>>> +{
>>> +    return 0;
>>> +}
>>> +
>>> +static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
>>> +                                  HostIOMMUDevice *hiod,
>>> +                                  Error **errp)
>>> +{
>>> +    return 0;
>>> +}
>>> +
>>> +static int vtd_check_hdev(IntelIOMMUState *s, VTDHostIOMMUDevice
>> *vtd_hdev,
>>> +                          Error **errp)
>>> +{
>>> +    HostIOMMUDevice *hiod = vtd_hdev->dev;
>>> +
>>> +    if (object_dynamic_cast(OBJECT(hiod), TYPE_HIOD_IOMMUFD)) {
>>> +        return vtd_check_iommufd_hdev(s, hiod, errp);
>>> +    }
>>> +
>>> +    return vtd_check_legacy_hdev(s, hiod, errp);
>>> +}
>>
>>
>> I think we should be using the .get_host_iommu_info() class handler
>> instead. Can we refactor the code slightly to avoid this check on
>> the type ?
> 
> There is some difficulty ini avoiding this check, the behavior of vtd_check_legacy_hdev
> and vtd_check_iommufd_hdev are different especially after nesting support introduced.
> vtd_check_iommufd_hdev() has much wider check over cap/ecap bits besides aw_bits.

I think it is important to fully separate the vIOMMU model from the
host IOMMU backing device. Could we introduce a new HostIOMMUDeviceClass
handler .check_hdev() handler, which would call .get_host_iommu_info() ?


Thanks,

C.


> That the reason I have two functions to do different thing.
> See:
> https://github.com/yiliu1765/qemu/blob/zhenzhong/iommufd_nesting_rfcv2/hw/i386/intel_iommu.c#L5472
> 
> Meanwhile in vtd_check_legacy_hdev(), when legacy VFIO device attaches to modern vIOMMU,
> this is unsupported and error out early, it will not call .get_host_iommu_info().
> I mean we don't need to unconditionally call .get_host_iommu_info() in some cases.
> 
> Thanks
> Zhenzhong



^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH v2 3/5] intel_iommu: Add a framework to do compatibility check with host IOMMU cap/ecap
  2024-04-16 14:17       ` Cédric Le Goater
@ 2024-04-17  4:21         ` Duan, Zhenzhong
  2024-04-17  8:30           ` Cédric Le Goater
  0 siblings, 1 reply; 19+ messages in thread
From: Duan, Zhenzhong @ 2024-04-17  4:21 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: alex.williamson, eric.auger, peterx, jasowang, mst, jgg,
	nicolinc, joao.m.martins, Tian, Kevin, Liu, Yi L, Peng, Chao P,
	Yi Sun, Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
	Eduardo Habkost



>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>compatibility check with host IOMMU cap/ecap
>
>Hello,
>
>On 4/16/24 09:09, Duan, Zhenzhong wrote:
>> Hi Cédric,
>>
>>> -----Original Message-----
>>> From: Cédric Le Goater <clg@redhat.com>
>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>> compatibility check with host IOMMU cap/ecap
>>>
>>> On 4/8/24 10:44, Zhenzhong Duan wrote:
>>>> From: Yi Liu <yi.l.liu@intel.com>
>>>>
>>>> If check fails, the host side device(either vfio or vdpa device) should not
>>>> be passed to guest.
>>>>
>>>> Implementation details for different backends will be in following
>patches.
>>>>
>>>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>>>> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>> ---
>>>>    hw/i386/intel_iommu.c | 35
>>> +++++++++++++++++++++++++++++++++++
>>>>    1 file changed, 35 insertions(+)
>>>>
>>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>>>> index 4f84e2e801..a49b587c73 100644
>>>> --- a/hw/i386/intel_iommu.c
>>>> +++ b/hw/i386/intel_iommu.c
>>>> @@ -35,6 +35,7 @@
>>>>    #include "sysemu/kvm.h"
>>>>    #include "sysemu/dma.h"
>>>>    #include "sysemu/sysemu.h"
>>>> +#include "sysemu/iommufd.h"
>>>>    #include "hw/i386/apic_internal.h"
>>>>    #include "kvm/kvm_i386.h"
>>>>    #include "migration/vmstate.h"
>>>> @@ -3819,6 +3820,32 @@ VTDAddressSpace
>>> *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
>>>>        return vtd_dev_as;
>>>>    }
>>>>
>>>> +static int vtd_check_legacy_hdev(IntelIOMMUState *s,
>>>> +                                 HostIOMMUDevice *hiod,
>>>> +                                 Error **errp)
>>>> +{
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
>>>> +                                  HostIOMMUDevice *hiod,
>>>> +                                  Error **errp)
>>>> +{
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static int vtd_check_hdev(IntelIOMMUState *s,
>VTDHostIOMMUDevice
>>> *vtd_hdev,
>>>> +                          Error **errp)
>>>> +{
>>>> +    HostIOMMUDevice *hiod = vtd_hdev->dev;
>>>> +
>>>> +    if (object_dynamic_cast(OBJECT(hiod), TYPE_HIOD_IOMMUFD)) {
>>>> +        return vtd_check_iommufd_hdev(s, hiod, errp);
>>>> +    }
>>>> +
>>>> +    return vtd_check_legacy_hdev(s, hiod, errp);
>>>> +}
>>>
>>>
>>> I think we should be using the .get_host_iommu_info() class handler
>>> instead. Can we refactor the code slightly to avoid this check on
>>> the type ?
>>
>> There is some difficulty ini avoiding this check, the behavior of
>vtd_check_legacy_hdev
>> and vtd_check_iommufd_hdev are different especially after nesting
>support introduced.
>> vtd_check_iommufd_hdev() has much wider check over cap/ecap bits
>besides aw_bits.
>
>I think it is important to fully separate the vIOMMU model from the
>host IOMMU backing device. Could we introduce a new
>HostIOMMUDeviceClass
>handler .check_hdev() handler, which would call .get_host_iommu_info() ?

Understood, besides the new .check_hdev() handler, I think we also need a new interface
class TYPE_IOMMU_CHECK_HDEV which has two handlers check_[legacy|iommufd]_hdev(),
and different vIOMMUs have different implementation.

Then legacy and iommufd host device have different implementation of .check_hdev()
and calls into one of the two interface handlers.

Let me know if I misunderstand any of your point.

Thanks
Zhenzhong

>
>
>Thanks,
>
>C.
>
>
>> That the reason I have two functions to do different thing.
>> See:
>>
>https://github.com/yiliu1765/qemu/blob/zhenzhong/iommufd_nesting_rfc
>v2/hw/i386/intel_iommu.c#L5472
>>
>> Meanwhile in vtd_check_legacy_hdev(), when legacy VFIO device attaches
>to modern vIOMMU,
>> this is unsupported and error out early, it will not
>call .get_host_iommu_info().
>> I mean we don't need to unconditionally call .get_host_iommu_info() in
>some cases.
>>
>> Thanks
>> Zhenzhong


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 3/5] intel_iommu: Add a framework to do compatibility check with host IOMMU cap/ecap
  2024-04-17  4:21         ` Duan, Zhenzhong
@ 2024-04-17  8:30           ` Cédric Le Goater
  2024-04-17  9:24             ` Duan, Zhenzhong
  0 siblings, 1 reply; 19+ messages in thread
From: Cédric Le Goater @ 2024-04-17  8:30 UTC (permalink / raw)
  To: Duan, Zhenzhong, qemu-devel
  Cc: alex.williamson, eric.auger, peterx, jasowang, mst, jgg,
	nicolinc, joao.m.martins, Tian, Kevin, Liu, Yi L, Peng, Chao P,
	Yi Sun, Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
	Eduardo Habkost

On 4/17/24 06:21, Duan, Zhenzhong wrote:
> 
> 
>> -----Original Message-----
>> From: Cédric Le Goater <clg@redhat.com>
>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>> compatibility check with host IOMMU cap/ecap
>>
>> Hello,
>>
>> On 4/16/24 09:09, Duan, Zhenzhong wrote:
>>> Hi Cédric,
>>>
>>>> -----Original Message-----
>>>> From: Cédric Le Goater <clg@redhat.com>
>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>>> compatibility check with host IOMMU cap/ecap
>>>>
>>>> On 4/8/24 10:44, Zhenzhong Duan wrote:
>>>>> From: Yi Liu <yi.l.liu@intel.com>
>>>>>
>>>>> If check fails, the host side device(either vfio or vdpa device) should not
>>>>> be passed to guest.
>>>>>
>>>>> Implementation details for different backends will be in following
>> patches.
>>>>>
>>>>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>>>>> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>> ---
>>>>>     hw/i386/intel_iommu.c | 35
>>>> +++++++++++++++++++++++++++++++++++
>>>>>     1 file changed, 35 insertions(+)
>>>>>
>>>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>>>>> index 4f84e2e801..a49b587c73 100644
>>>>> --- a/hw/i386/intel_iommu.c
>>>>> +++ b/hw/i386/intel_iommu.c
>>>>> @@ -35,6 +35,7 @@
>>>>>     #include "sysemu/kvm.h"
>>>>>     #include "sysemu/dma.h"
>>>>>     #include "sysemu/sysemu.h"
>>>>> +#include "sysemu/iommufd.h"
>>>>>     #include "hw/i386/apic_internal.h"
>>>>>     #include "kvm/kvm_i386.h"
>>>>>     #include "migration/vmstate.h"
>>>>> @@ -3819,6 +3820,32 @@ VTDAddressSpace
>>>> *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
>>>>>         return vtd_dev_as;
>>>>>     }
>>>>>
>>>>> +static int vtd_check_legacy_hdev(IntelIOMMUState *s,
>>>>> +                                 HostIOMMUDevice *hiod,
>>>>> +                                 Error **errp)
>>>>> +{
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>> +static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
>>>>> +                                  HostIOMMUDevice *hiod,
>>>>> +                                  Error **errp)
>>>>> +{
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>> +static int vtd_check_hdev(IntelIOMMUState *s,
>> VTDHostIOMMUDevice
>>>> *vtd_hdev,
>>>>> +                          Error **errp)
>>>>> +{
>>>>> +    HostIOMMUDevice *hiod = vtd_hdev->dev;
>>>>> +
>>>>> +    if (object_dynamic_cast(OBJECT(hiod), TYPE_HIOD_IOMMUFD)) {
>>>>> +        return vtd_check_iommufd_hdev(s, hiod, errp);
>>>>> +    }
>>>>> +
>>>>> +    return vtd_check_legacy_hdev(s, hiod, errp);
>>>>> +}
>>>>
>>>>
>>>> I think we should be using the .get_host_iommu_info() class handler
>>>> instead. Can we refactor the code slightly to avoid this check on
>>>> the type ?
>>>
>>> There is some difficulty ini avoiding this check, the behavior of
>> vtd_check_legacy_hdev
>>> and vtd_check_iommufd_hdev are different especially after nesting
>> support introduced.
>>> vtd_check_iommufd_hdev() has much wider check over cap/ecap bits
>> besides aw_bits.
>>
>> I think it is important to fully separate the vIOMMU model from the
>> host IOMMU backing device. Could we introduce a new
>> HostIOMMUDeviceClass
>> handler .check_hdev() handler, which would call .get_host_iommu_info() ?
> 
> Understood, besides the new .check_hdev() handler, I think we also need a new interface
> class TYPE_IOMMU_CHECK_HDEV which has two handlers check_[legacy|iommufd]_hdev(),
> and different vIOMMUs have different implementation.

I am not sure to understand. Which class hierarchy would implement this
new "TYPE_IOMMU_CHECK_HDEV" interface ? vIOMMU or host iommu  ?

Could you please explain with an update of your diagram :

                         HostIOMMUDevice
                                | .get_host_iommu_info()
                                |
                                |
             .------------------------------------.
             |                  |                 |
       HIODLegacyVFIO    [HIODLegacyVDPA]    HIODIOMMUFD
             | .vdev            | [.vdev]         | .iommufd
                                                  | .devid
                                                  | [.ioas_id]
                                                  | [.attach_hwpt()]
                                                  | [.detach_hwpt()]
                                                  |
                                     .----------------------.
                                     |                      |
                            HIODIOMMUFDVFIO         [HIODIOMMUFDVDPA]
                                     | .vdev                | [.vdev]


Thanks,

C.


> Then legacy and iommufd host device have different implementation of .check_hdev()
> and calls into one of the two interface handlers.
> 
> Let me know if I misunderstand any of your point.
> 
> Thanks
> Zhenzhong
> 
>>
>>
>> Thanks,
>>
>> C.
>>
>>
>>> That the reason I have two functions to do different thing.
>>> See:
>>>
>> https://github.com/yiliu1765/qemu/blob/zhenzhong/iommufd_nesting_rfc
>> v2/hw/i386/intel_iommu.c#L5472
>>>
>>> Meanwhile in vtd_check_legacy_hdev(), when legacy VFIO device attaches
>> to modern vIOMMU,
>>> this is unsupported and error out early, it will not
>> call .get_host_iommu_info().
>>> I mean we don't need to unconditionally call .get_host_iommu_info() in
>> some cases.
>>>
>>> Thanks
>>> Zhenzhong
> 



^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH v2 3/5] intel_iommu: Add a framework to do compatibility check with host IOMMU cap/ecap
  2024-04-17  8:30           ` Cédric Le Goater
@ 2024-04-17  9:24             ` Duan, Zhenzhong
  2024-04-18  6:42               ` Cédric Le Goater
  0 siblings, 1 reply; 19+ messages in thread
From: Duan, Zhenzhong @ 2024-04-17  9:24 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: alex.williamson, eric.auger, peterx, jasowang, mst, jgg,
	nicolinc, joao.m.martins, Tian, Kevin, Liu, Yi L, Peng, Chao P,
	Yi Sun, Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
	Eduardo Habkost



>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>compatibility check with host IOMMU cap/ecap
>
>On 4/17/24 06:21, Duan, Zhenzhong wrote:
>>
>>
>>> -----Original Message-----
>>> From: Cédric Le Goater <clg@redhat.com>
>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>> compatibility check with host IOMMU cap/ecap
>>>
>>> Hello,
>>>
>>> On 4/16/24 09:09, Duan, Zhenzhong wrote:
>>>> Hi Cédric,
>>>>
>>>>> -----Original Message-----
>>>>> From: Cédric Le Goater <clg@redhat.com>
>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>>>> compatibility check with host IOMMU cap/ecap
>>>>>
>>>>> On 4/8/24 10:44, Zhenzhong Duan wrote:
>>>>>> From: Yi Liu <yi.l.liu@intel.com>
>>>>>>
>>>>>> If check fails, the host side device(either vfio or vdpa device) should
>not
>>>>>> be passed to guest.
>>>>>>
>>>>>> Implementation details for different backends will be in following
>>> patches.
>>>>>>
>>>>>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>>>>>> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
>>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>>> ---
>>>>>>     hw/i386/intel_iommu.c | 35
>>>>> +++++++++++++++++++++++++++++++++++
>>>>>>     1 file changed, 35 insertions(+)
>>>>>>
>>>>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>>>>>> index 4f84e2e801..a49b587c73 100644
>>>>>> --- a/hw/i386/intel_iommu.c
>>>>>> +++ b/hw/i386/intel_iommu.c
>>>>>> @@ -35,6 +35,7 @@
>>>>>>     #include "sysemu/kvm.h"
>>>>>>     #include "sysemu/dma.h"
>>>>>>     #include "sysemu/sysemu.h"
>>>>>> +#include "sysemu/iommufd.h"
>>>>>>     #include "hw/i386/apic_internal.h"
>>>>>>     #include "kvm/kvm_i386.h"
>>>>>>     #include "migration/vmstate.h"
>>>>>> @@ -3819,6 +3820,32 @@ VTDAddressSpace
>>>>> *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
>>>>>>         return vtd_dev_as;
>>>>>>     }
>>>>>>
>>>>>> +static int vtd_check_legacy_hdev(IntelIOMMUState *s,
>>>>>> +                                 HostIOMMUDevice *hiod,
>>>>>> +                                 Error **errp)
>>>>>> +{
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
>>>>>> +                                  HostIOMMUDevice *hiod,
>>>>>> +                                  Error **errp)
>>>>>> +{
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int vtd_check_hdev(IntelIOMMUState *s,
>>> VTDHostIOMMUDevice
>>>>> *vtd_hdev,
>>>>>> +                          Error **errp)
>>>>>> +{
>>>>>> +    HostIOMMUDevice *hiod = vtd_hdev->dev;
>>>>>> +
>>>>>> +    if (object_dynamic_cast(OBJECT(hiod), TYPE_HIOD_IOMMUFD)) {
>>>>>> +        return vtd_check_iommufd_hdev(s, hiod, errp);
>>>>>> +    }
>>>>>> +
>>>>>> +    return vtd_check_legacy_hdev(s, hiod, errp);
>>>>>> +}
>>>>>
>>>>>
>>>>> I think we should be using the .get_host_iommu_info() class handler
>>>>> instead. Can we refactor the code slightly to avoid this check on
>>>>> the type ?
>>>>
>>>> There is some difficulty ini avoiding this check, the behavior of
>>> vtd_check_legacy_hdev
>>>> and vtd_check_iommufd_hdev are different especially after nesting
>>> support introduced.
>>>> vtd_check_iommufd_hdev() has much wider check over cap/ecap bits
>>> besides aw_bits.
>>>
>>> I think it is important to fully separate the vIOMMU model from the
>>> host IOMMU backing device. Could we introduce a new
>>> HostIOMMUDeviceClass
>>> handler .check_hdev() handler, which would call .get_host_iommu_info() ?
>>
>> Understood, besides the new .check_hdev() handler, I think we also need a
>new interface
>> class TYPE_IOMMU_CHECK_HDEV which has two handlers
>check_[legacy|iommufd]_hdev(),
>> and different vIOMMUs have different implementation.
>
>I am not sure to understand. Which class hierarchy would implement this
>new "TYPE_IOMMU_CHECK_HDEV" interface ? vIOMMU or host iommu  ?
>
>Could you please explain with an update of your diagram :
>
>                         HostIOMMUDevice
>                                | .get_host_iommu_info()
>                                |
>                                |
>             .------------------------------------.
>             |                  |                 |
>       HIODLegacyVFIO    [HIODLegacyVDPA]    HIODIOMMUFD
>             | .vdev            | [.vdev]         | .iommufd
>                                                  | .devid
>                                                  | [.ioas_id]
>                                                  | [.attach_hwpt()]
>                                                  | [.detach_hwpt()]
>                                                  |
>                                     .----------------------.
>                                     |                      |
>                            HIODIOMMUFDVFIO         [HIODIOMMUFDVDPA]
>                                     | .vdev                | [.vdev]
>

Sure.

                         HostIOMMUDevice
                                | .get_host_iommu_info()
                                | .check_hdev()
                                |
                   .------------------------------.
                   |                              |
               HIODLegacy                    HIODIOMMUFD
                   |                              | .iommufd
             .--------------.                     | .devid
             |              |                     | [.ioas_id]
       HIODLegacyVFIO    [HIODLegacyVDPA]         | [.attach_hwpt()]
             | .vdev            | [.vdev]         | [.detach_hwpt()]
                                                  |
                                     .----------------------.
                                     |                      |
                            HIODIOMMUFDVFIO         [HIODIOMMUFDVDPA]
                                     | .vdev                | [.vdev]


HostIOMMUDevice only declare .check_hdev(), but
HIODLegacy and HIODIOMMUFD will implement .check_hdev().
E.g., hiod_legacy_check_hdev() and hiod_iommufd_check_hdev().

int hiod_legacy_check_hdev(HostIOMMUDevice *hiod, IOMMUCheckHDev *viommu, Error **errp)
{
    IOMMUCheckHDevClass *chdc = IOMMU_CHECK_HDEV_GET_CLASS(viommu);

    return chdc->check_legacy_hdev(viommu, hiod, errp);
}

int hiod_iommufd_check_hdev(HostIOMMUDevice *hiod, IOMMUCheckHDev *viommu, Error **errp)
{
    IOMMUCheckHDevClass *chdc = IOMMU_CHECK_HDEV_GET_CLASS(viommu);

    return chdc->check_iommufd_hdev(viommu, hiod, errp);
}

And we implement interface TYPE_IOMMU_CHECK_HDEV in intel-iommu module.
Certainly, we can also implement the same in other vIOMMUs we want.
See below pseudo change:

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 68380d50ca..173c702b9f 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -5521,12 +5521,9 @@ static int vtd_check_hdev(IntelIOMMUState *s, VTDHostIOMMUDevice *vtd_hdev,
                           Error **errp)
 {
     HostIOMMUDevice *hiod = vtd_hdev->dev;
+    HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(hiod);

-    if (object_dynamic_cast(OBJECT(hiod), TYPE_HIOD_IOMMUFD)) {
-        return vtd_check_iommufd_hdev(s, vtd_hdev, errp);
-    }
-
-    return vtd_check_legacy_hdev(s, hiod, errp);
+    return hiodc->check_hdev(IOMMU_CHECK_HDEV(s), hiod, errp);
 }

 static int vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
@@ -6076,6 +6073,7 @@ static void vtd_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
     X86IOMMUClass *x86_class = X86_IOMMU_DEVICE_CLASS(klass);
+    IOMMUCheckHDevClass *chdc = IOMMU_CHECK_HDEV_CLASS(klass);

     dc->reset = vtd_reset;
     dc->vmsd = &vtd_vmstate;
@@ -6087,6 +6085,8 @@ static void vtd_class_init(ObjectClass *klass, void *data)
     dc->user_creatable = true;
     set_bit(DEVICE_CATEGORY_MISC, dc->categories);
     dc->desc = "Intel IOMMU (VT-d) DMA Remapping device";
+    chdc->check_legacy_hdev = vtd_check_legacy_hdev;
+    chdc->check_iommufd_hdev = vtd_check_iommufd_hdev;
 }

 static const TypeInfo vtd_info = {
@@ -6094,6 +6094,10 @@ static const TypeInfo vtd_info = {
     .parent        = TYPE_X86_IOMMU_DEVICE,
     .instance_size = sizeof(IntelIOMMUState),
     .class_init    = vtd_class_init,
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_IOMMU_CHECK_HDEV },
+        { }
+    }
 };

Thanks
Zhenzhong

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 3/5] intel_iommu: Add a framework to do compatibility check with host IOMMU cap/ecap
  2024-04-17  9:24             ` Duan, Zhenzhong
@ 2024-04-18  6:42               ` Cédric Le Goater
  2024-04-18  8:42                 ` Duan, Zhenzhong
  0 siblings, 1 reply; 19+ messages in thread
From: Cédric Le Goater @ 2024-04-18  6:42 UTC (permalink / raw)
  To: Duan, Zhenzhong, qemu-devel
  Cc: alex.williamson, eric.auger, peterx, jasowang, mst, jgg,
	nicolinc, joao.m.martins, Tian, Kevin, Liu, Yi L, Peng, Chao P,
	Yi Sun, Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
	Eduardo Habkost

Hello Zhenzhong

On 4/17/24 11:24, Duan, Zhenzhong wrote:
> 
> 
>> -----Original Message-----
>> From: Cédric Le Goater <clg@redhat.com>
>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>> compatibility check with host IOMMU cap/ecap
>>
>> On 4/17/24 06:21, Duan, Zhenzhong wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Cédric Le Goater <clg@redhat.com>
>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>>> compatibility check with host IOMMU cap/ecap
>>>>
>>>> Hello,
>>>>
>>>> On 4/16/24 09:09, Duan, Zhenzhong wrote:
>>>>> Hi Cédric,
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Cédric Le Goater <clg@redhat.com>
>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>>>>> compatibility check with host IOMMU cap/ecap
>>>>>>
>>>>>> On 4/8/24 10:44, Zhenzhong Duan wrote:
>>>>>>> From: Yi Liu <yi.l.liu@intel.com>
>>>>>>>
>>>>>>> If check fails, the host side device(either vfio or vdpa device) should
>> not
>>>>>>> be passed to guest.
>>>>>>>
>>>>>>> Implementation details for different backends will be in following
>>>> patches.
>>>>>>>
>>>>>>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>>>>>>> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
>>>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>>>> ---
>>>>>>>      hw/i386/intel_iommu.c | 35
>>>>>> +++++++++++++++++++++++++++++++++++
>>>>>>>      1 file changed, 35 insertions(+)
>>>>>>>
>>>>>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>>>>>>> index 4f84e2e801..a49b587c73 100644
>>>>>>> --- a/hw/i386/intel_iommu.c
>>>>>>> +++ b/hw/i386/intel_iommu.c
>>>>>>> @@ -35,6 +35,7 @@
>>>>>>>      #include "sysemu/kvm.h"
>>>>>>>      #include "sysemu/dma.h"
>>>>>>>      #include "sysemu/sysemu.h"
>>>>>>> +#include "sysemu/iommufd.h"
>>>>>>>      #include "hw/i386/apic_internal.h"
>>>>>>>      #include "kvm/kvm_i386.h"
>>>>>>>      #include "migration/vmstate.h"
>>>>>>> @@ -3819,6 +3820,32 @@ VTDAddressSpace
>>>>>> *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
>>>>>>>          return vtd_dev_as;
>>>>>>>      }
>>>>>>>
>>>>>>> +static int vtd_check_legacy_hdev(IntelIOMMUState *s,
>>>>>>> +                                 HostIOMMUDevice *hiod,
>>>>>>> +                                 Error **errp)
>>>>>>> +{
>>>>>>> +    return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
>>>>>>> +                                  HostIOMMUDevice *hiod,
>>>>>>> +                                  Error **errp)
>>>>>>> +{
>>>>>>> +    return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static int vtd_check_hdev(IntelIOMMUState *s,
>>>> VTDHostIOMMUDevice
>>>>>> *vtd_hdev,
>>>>>>> +                          Error **errp)
>>>>>>> +{
>>>>>>> +    HostIOMMUDevice *hiod = vtd_hdev->dev;
>>>>>>> +
>>>>>>> +    if (object_dynamic_cast(OBJECT(hiod), TYPE_HIOD_IOMMUFD)) {
>>>>>>> +        return vtd_check_iommufd_hdev(s, hiod, errp);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    return vtd_check_legacy_hdev(s, hiod, errp);
>>>>>>> +}
>>>>>>
>>>>>>
>>>>>> I think we should be using the .get_host_iommu_info() class handler
>>>>>> instead. Can we refactor the code slightly to avoid this check on
>>>>>> the type ?
>>>>>
>>>>> There is some difficulty ini avoiding this check, the behavior of
>>>> vtd_check_legacy_hdev
>>>>> and vtd_check_iommufd_hdev are different especially after nesting
>>>> support introduced.
>>>>> vtd_check_iommufd_hdev() has much wider check over cap/ecap bits
>>>> besides aw_bits.
>>>>
>>>> I think it is important to fully separate the vIOMMU model from the
>>>> host IOMMU backing device. 

This comment is true for the structures also.

>>>> Could we introduce a new HostIOMMUDeviceClass
>>>> handler .check_hdev() handler, which would call .get_host_iommu_info() ?

This means that HIOD_LEGACY_INFO and HIOD_IOMMUFD_INFO should be
a common structure 'HostIOMMUDeviceInfo' holding all attributes
for the different backends. Each .get_host_iommu_info() implementation
would translate the specific host iommu device data presentation
into the common 'HostIOMMUDeviceInfo', this is true for host_aw_bits.

'type' could be handled the same way, with a 'HostIOMMUDeviceInfo'
type attribute and host iommu device type definitions, or as you
suggested with a QOM interface. This is more complex however. In
this case, I would suggest to implement a .compatible() handler to
compare the host iommu device type with the vIOMMU type.

The resulting check_hdev routine would look something like :

static int vtd_check_hdev(IntelIOMMUState *s, VTDHostIOMMUDevice *vtd_hdev,
                           Error **errp)
{
     HostIOMMUDevice *hiod = vtd_hdev->dev;
     HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(hiod);
     HostIOMMUDevice info;
     int host_aw_bits, ret;

     ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp);
     if (ret) {
         return ret;
     }

     ret = hiodc->is_compatible(hiod, VIOMMU_INTERFACE(s));
     if (ret) {
         return ret;
     }
     
     if (s->aw_bits > info.aw_bits) {
         error_setg(errp, "aw-bits %d > host aw-bits %d",
                    s->aw_bits, info.aw_bits);
         return -EINVAL;
     }
}

and the HostIOMMUDeviceClass::is_compatible() handler would call a
vIOMMUInterface::compatible() handler simply returning
IOMMU_HW_INFO_TYPE_INTEL_VTD. How does that sound ?

Including the type in HostIOMMUDeviceInfo is much simpler to start with.

Thanks,

C.





>>>
>>> Understood, besides the new .check_hdev() handler, I think we also need a
>> new interface
>>> class TYPE_IOMMU_CHECK_HDEV which has two handlers
>> check_[legacy|iommufd]_hdev(),
>>> and different vIOMMUs have different implementation.
>>
>> I am not sure to understand. Which class hierarchy would implement this
>> new "TYPE_IOMMU_CHECK_HDEV" interface ? vIOMMU or host iommu  ?
>>
>> Could you please explain with an update of your diagram :
>>
>>                          HostIOMMUDevice
>>                                 | .get_host_iommu_info()
>>                                 |
>>                                 |
>>              .------------------------------------.
>>              |                  |                 |
>>        HIODLegacyVFIO    [HIODLegacyVDPA]    HIODIOMMUFD
>>              | .vdev            | [.vdev]         | .iommufd
>>                                                   | .devid
>>                                                   | [.ioas_id]
>>                                                   | [.attach_hwpt()]
>>                                                   | [.detach_hwpt()]
>>                                                   |
>>                                      .----------------------.
>>                                      |                      |
>>                             HIODIOMMUFDVFIO         [HIODIOMMUFDVDPA]
>>                                      | .vdev                | [.vdev]
>>
> 
> Sure.
> 
>                           HostIOMMUDevice
>                                  | .get_host_iommu_info()
>                                  | .check_hdev()
>                                  |
>                     .------------------------------.
>                     |                              |
>                 HIODLegacy                    HIODIOMMUFD
>                     |                              | .iommufd
>               .--------------.                     | .devid
>               |              |                     | [.ioas_id]
>         HIODLegacyVFIO    [HIODLegacyVDPA]         | [.attach_hwpt()]
>               | .vdev            | [.vdev]         | [.detach_hwpt()]
>                                                    |
>                                       .----------------------.
>                                       |                      |
>                              HIODIOMMUFDVFIO         [HIODIOMMUFDVDPA]
>                                       | .vdev                | [.vdev]
> 
> 
> HostIOMMUDevice only declare .check_hdev(), but
> HIODLegacy and HIODIOMMUFD will implement .check_hdev().
> E.g., hiod_legacy_check_hdev() and hiod_iommufd_check_hdev().
> 
> int hiod_legacy_check_hdev(HostIOMMUDevice *hiod, IOMMUCheckHDev *viommu, Error **errp)
> {
>      IOMMUCheckHDevClass *chdc = IOMMU_CHECK_HDEV_GET_CLASS(viommu);
> 
>      return chdc->check_legacy_hdev(viommu, hiod, errp);
> }
> 
> int hiod_iommufd_check_hdev(HostIOMMUDevice *hiod, IOMMUCheckHDev *viommu, Error **errp)
> {
>      IOMMUCheckHDevClass *chdc = IOMMU_CHECK_HDEV_GET_CLASS(viommu);
> 
>      return chdc->check_iommufd_hdev(viommu, hiod, errp);
> }
> 
> And we implement interface TYPE_IOMMU_CHECK_HDEV in intel-iommu module.
> Certainly, we can also implement the same in other vIOMMUs we want.
> See below pseudo change:
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 68380d50ca..173c702b9f 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -5521,12 +5521,9 @@ static int vtd_check_hdev(IntelIOMMUState *s, VTDHostIOMMUDevice *vtd_hdev,
>                             Error **errp)
>   {
>       HostIOMMUDevice *hiod = vtd_hdev->dev;
> +    HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(hiod);
> 
> -    if (object_dynamic_cast(OBJECT(hiod), TYPE_HIOD_IOMMUFD)) {
> -        return vtd_check_iommufd_hdev(s, vtd_hdev, errp);
> -    }
> -
> -    return vtd_check_legacy_hdev(s, hiod, errp);
> +    return hiodc->check_hdev(IOMMU_CHECK_HDEV(s), hiod, errp);
>   }
> 
>   static int vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
> @@ -6076,6 +6073,7 @@ static void vtd_class_init(ObjectClass *klass, void *data)
>   {
>       DeviceClass *dc = DEVICE_CLASS(klass);
>       X86IOMMUClass *x86_class = X86_IOMMU_DEVICE_CLASS(klass);
> +    IOMMUCheckHDevClass *chdc = IOMMU_CHECK_HDEV_CLASS(klass);
> 
>       dc->reset = vtd_reset;
>       dc->vmsd = &vtd_vmstate;
> @@ -6087,6 +6085,8 @@ static void vtd_class_init(ObjectClass *klass, void *data)
>       dc->user_creatable = true;
>       set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>       dc->desc = "Intel IOMMU (VT-d) DMA Remapping device";
> +    chdc->check_legacy_hdev = vtd_check_legacy_hdev;
> +    chdc->check_iommufd_hdev = vtd_check_iommufd_hdev;
>   }
> 
>   static const TypeInfo vtd_info = {
> @@ -6094,6 +6094,10 @@ static const TypeInfo vtd_info = {
>       .parent        = TYPE_X86_IOMMU_DEVICE,
>       .instance_size = sizeof(IntelIOMMUState),
>       .class_init    = vtd_class_init,
> +    .interfaces = (InterfaceInfo[]) {
> +        { TYPE_IOMMU_CHECK_HDEV },
> +        { }
> +    }
>   };
> 
> Thanks
> Zhenzhong



^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH v2 3/5] intel_iommu: Add a framework to do compatibility check with host IOMMU cap/ecap
  2024-04-18  6:42               ` Cédric Le Goater
@ 2024-04-18  8:42                 ` Duan, Zhenzhong
  2024-04-19  6:20                   ` Cédric Le Goater
  0 siblings, 1 reply; 19+ messages in thread
From: Duan, Zhenzhong @ 2024-04-18  8:42 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: alex.williamson, eric.auger, peterx, jasowang, mst, jgg,
	nicolinc, joao.m.martins, Tian, Kevin, Liu, Yi L, Peng, Chao P,
	Yi Sun, Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
	Eduardo Habkost

Hi Cédric,

>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>compatibility check with host IOMMU cap/ecap
>
>Hello Zhenzhong
>
>On 4/17/24 11:24, Duan, Zhenzhong wrote:
>>
>>
>>> -----Original Message-----
>>> From: Cédric Le Goater <clg@redhat.com>
>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>> compatibility check with host IOMMU cap/ecap
>>>
>>> On 4/17/24 06:21, Duan, Zhenzhong wrote:
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Cédric Le Goater <clg@redhat.com>
>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>>>> compatibility check with host IOMMU cap/ecap
>>>>>
>>>>> Hello,
>>>>>
>>>>> On 4/16/24 09:09, Duan, Zhenzhong wrote:
>>>>>> Hi Cédric,
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Cédric Le Goater <clg@redhat.com>
>>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>>>>>> compatibility check with host IOMMU cap/ecap
>>>>>>>
>>>>>>> On 4/8/24 10:44, Zhenzhong Duan wrote:
>>>>>>>> From: Yi Liu <yi.l.liu@intel.com>
>>>>>>>>
>>>>>>>> If check fails, the host side device(either vfio or vdpa device) should
>>> not
>>>>>>>> be passed to guest.
>>>>>>>>
>>>>>>>> Implementation details for different backends will be in following
>>>>> patches.
>>>>>>>>
>>>>>>>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>>>>>>>> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
>>>>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>>>>> ---
>>>>>>>>      hw/i386/intel_iommu.c | 35
>>>>>>> +++++++++++++++++++++++++++++++++++
>>>>>>>>      1 file changed, 35 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>>>>>>>> index 4f84e2e801..a49b587c73 100644
>>>>>>>> --- a/hw/i386/intel_iommu.c
>>>>>>>> +++ b/hw/i386/intel_iommu.c
>>>>>>>> @@ -35,6 +35,7 @@
>>>>>>>>      #include "sysemu/kvm.h"
>>>>>>>>      #include "sysemu/dma.h"
>>>>>>>>      #include "sysemu/sysemu.h"
>>>>>>>> +#include "sysemu/iommufd.h"
>>>>>>>>      #include "hw/i386/apic_internal.h"
>>>>>>>>      #include "kvm/kvm_i386.h"
>>>>>>>>      #include "migration/vmstate.h"
>>>>>>>> @@ -3819,6 +3820,32 @@ VTDAddressSpace
>>>>>>> *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
>>>>>>>>          return vtd_dev_as;
>>>>>>>>      }
>>>>>>>>
>>>>>>>> +static int vtd_check_legacy_hdev(IntelIOMMUState *s,
>>>>>>>> +                                 HostIOMMUDevice *hiod,
>>>>>>>> +                                 Error **errp)
>>>>>>>> +{
>>>>>>>> +    return 0;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
>>>>>>>> +                                  HostIOMMUDevice *hiod,
>>>>>>>> +                                  Error **errp)
>>>>>>>> +{
>>>>>>>> +    return 0;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static int vtd_check_hdev(IntelIOMMUState *s,
>>>>> VTDHostIOMMUDevice
>>>>>>> *vtd_hdev,
>>>>>>>> +                          Error **errp)
>>>>>>>> +{
>>>>>>>> +    HostIOMMUDevice *hiod = vtd_hdev->dev;
>>>>>>>> +
>>>>>>>> +    if (object_dynamic_cast(OBJECT(hiod), TYPE_HIOD_IOMMUFD))
>{
>>>>>>>> +        return vtd_check_iommufd_hdev(s, hiod, errp);
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    return vtd_check_legacy_hdev(s, hiod, errp);
>>>>>>>> +}
>>>>>>>
>>>>>>>
>>>>>>> I think we should be using the .get_host_iommu_info() class handler
>>>>>>> instead. Can we refactor the code slightly to avoid this check on
>>>>>>> the type ?
>>>>>>
>>>>>> There is some difficulty ini avoiding this check, the behavior of
>>>>> vtd_check_legacy_hdev
>>>>>> and vtd_check_iommufd_hdev are different especially after nesting
>>>>> support introduced.
>>>>>> vtd_check_iommufd_hdev() has much wider check over cap/ecap bits
>>>>> besides aw_bits.
>>>>>
>>>>> I think it is important to fully separate the vIOMMU model from the
>>>>> host IOMMU backing device.
>
>This comment is true for the structures also.
>
>>>>> Could we introduce a new HostIOMMUDeviceClass
>>>>> handler .check_hdev() handler, which would
>call .get_host_iommu_info() ?
>
>This means that HIOD_LEGACY_INFO and HIOD_IOMMUFD_INFO should be
>a common structure 'HostIOMMUDeviceInfo' holding all attributes
>for the different backends. Each .get_host_iommu_info() implementation
>would translate the specific host iommu device data presentation
>into the common 'HostIOMMUDeviceInfo', this is true for host_aw_bits.

I see, it's just not easy to define the unified elements in HostIOMMUDeviceInfo
so that they maps to bits or fields in host return IOMMU info.

Different platform returned host IOMMU info is platform specific.
For vtd and siommu:

struct iommu_hw_info_vtd {
        __u32 flags;
        __u32 __reserved;
        __aligned_u64 cap_reg;
        __aligned_u64 ecap_reg;
};

struct iommu_hw_info_arm_smmuv3 {
       __u32 flags;
       __u32 __reserved;
       __u32 idr[6];
       __u32 iidr;
       __u32 aidr;
};

I can think of two kinds of declaration of HostIOMMUDeviceInfo:

struct HostIOMMUDeviceInfo {
    uint8_t aw_bits;
    enum iommu_hw_info_type type;
    union {
        struct iommu_hw_info_vtd vtd;
        struct iommu_hw_info_arm_smmuv3;
        ......
    } data;
}

or

struct HostIOMMUDeviceInfo {
    uint8_t aw_bits;
    enum iommu_hw_info_type type;
    __u32 flags;
    __aligned_u64 cap_reg;
    __aligned_u64 ecap_reg;
    __u32 idr[6];
    __u32 iidr;
    __u32 aidr;
   ......
}

Not clear if any is your expected format.

>
>'type' could be handled the same way, with a 'HostIOMMUDeviceInfo'
>type attribute and host iommu device type definitions, or as you
>suggested with a QOM interface. This is more complex however. In
>this case, I would suggest to implement a .compatible() handler to
>compare the host iommu device type with the vIOMMU type.
>
>The resulting check_hdev routine would look something like :
>
>static int vtd_check_hdev(IntelIOMMUState *s, VTDHostIOMMUDevice
>*vtd_hdev,
>                           Error **errp)
>{
>     HostIOMMUDevice *hiod = vtd_hdev->dev;
>     HostIOMMUDeviceClass *hiodc =
>HOST_IOMMU_DEVICE_GET_CLASS(hiod);
>     HostIOMMUDevice info;
>     int host_aw_bits, ret;
>
>     ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp);
>     if (ret) {
>         return ret;
>     }
>
>     ret = hiodc->is_compatible(hiod, VIOMMU_INTERFACE(s));
>     if (ret) {
>         return ret;
>     }
>
>     if (s->aw_bits > info.aw_bits) {
>         error_setg(errp, "aw-bits %d > host aw-bits %d",
>                    s->aw_bits, info.aw_bits);
>         return -EINVAL;
>     }
>}
>
>and the HostIOMMUDeviceClass::is_compatible() handler would call a
>vIOMMUInterface::compatible() handler simply returning
>IOMMU_HW_INFO_TYPE_INTEL_VTD. How does that sound ?

Not quite get what HostIOMMUDeviceClass::is_compatible() does.
Currently legacy and IOMMUFD host device has different check logic, how it can help
in merging vtd_check_legacy_hdev() and vtd_check_iommufd_hdev() into a single vtd_check_hdev()?

Below is the two functions after nesting series, for your easy reference:

static int vtd_check_legacy_hdev()
{
    if (s->scalable_modern) {
        /* Modern vIOMMU and legacy backend */
        error_setg(errp, "Need IOMMUFD backend in scalable modern mode");
        return -EINVAL;
    }

    ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp);
    if (ret) {
        return ret;
    }

    if (s->aw_bits > info.aw_bits) {
        error_setg(errp, "aw-bits %d > host aw-bits %d",
                   s->aw_bits, info.aw_bits);
        return -EINVAL;
    }

    return 0;
}

static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
                                  VTDHostIOMMUDevice *vtd_hdev,
                                  Error **errp)
{
    ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp);
    if (ret) {
        return ret;
    }

    if (info.type != IOMMU_HW_INFO_TYPE_INTEL_VTD) {
        error_setg(errp, "IOMMU hardware is not compatible");
        return -EINVAL;
    }

    vtd = &info.data.vtd;
    host_aw_bits = VTD_MGAW_FROM_CAP(vtd->cap_reg) + 1;
    if (s->aw_bits > host_aw_bits) {
        error_setg(errp, "aw-bits %d > host aw-bits %d",
                   s->aw_bits, host_aw_bits);
        return -EINVAL;
    }

    if (!s->scalable_modern) {
        goto done;
    }

    if (!(vtd->ecap_reg & VTD_ECAP_NEST)) {
        error_setg(errp, "Host IOMMU doesn't support nested translation");
        return -EINVAL;
    }

    if (s->fl1gp && !(vtd->cap_reg & VTD_CAP_FL1GP)) {
        error_setg(errp, "Stage-1 1GB huge page is unsupported by host IOMMU");
        return -EINVAL;
    }
    vtd_hdev->errata = vtd->flags & IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17;

done:
    return 0;
}

Thanks
Zhenzhong

>
>Including the type in HostIOMMUDeviceInfo is much simpler to start with.
>
>Thanks,
>
>C.
>
>
>
>
>
>>>>
>>>> Understood, besides the new .check_hdev() handler, I think we also
>need a
>>> new interface
>>>> class TYPE_IOMMU_CHECK_HDEV which has two handlers
>>> check_[legacy|iommufd]_hdev(),
>>>> and different vIOMMUs have different implementation.
>>>
>>> I am not sure to understand. Which class hierarchy would implement this
>>> new "TYPE_IOMMU_CHECK_HDEV" interface ? vIOMMU or host iommu  ?
>>>
>>> Could you please explain with an update of your diagram :
>>>
>>>                          HostIOMMUDevice
>>>                                 | .get_host_iommu_info()
>>>                                 |
>>>                                 |
>>>              .------------------------------------.
>>>              |                  |                 |
>>>        HIODLegacyVFIO    [HIODLegacyVDPA]    HIODIOMMUFD
>>>              | .vdev            | [.vdev]         | .iommufd
>>>                                                   | .devid
>>>                                                   | [.ioas_id]
>>>                                                   | [.attach_hwpt()]
>>>                                                   | [.detach_hwpt()]
>>>                                                   |
>>>                                      .----------------------.
>>>                                      |                      |
>>>                             HIODIOMMUFDVFIO         [HIODIOMMUFDVDPA]
>>>                                      | .vdev                | [.vdev]
>>>
>>
>> Sure.
>>
>>                           HostIOMMUDevice
>>                                  | .get_host_iommu_info()
>>                                  | .check_hdev()
>>                                  |
>>                     .------------------------------.
>>                     |                              |
>>                 HIODLegacy                    HIODIOMMUFD
>>                     |                              | .iommufd
>>               .--------------.                     | .devid
>>               |              |                     | [.ioas_id]
>>         HIODLegacyVFIO    [HIODLegacyVDPA]         | [.attach_hwpt()]
>>               | .vdev            | [.vdev]         | [.detach_hwpt()]
>>                                                    |
>>                                       .----------------------.
>>                                       |                      |
>>                              HIODIOMMUFDVFIO         [HIODIOMMUFDVDPA]
>>                                       | .vdev                | [.vdev]
>>
>>
>> HostIOMMUDevice only declare .check_hdev(), but
>> HIODLegacy and HIODIOMMUFD will implement .check_hdev().
>> E.g., hiod_legacy_check_hdev() and hiod_iommufd_check_hdev().
>>
>> int hiod_legacy_check_hdev(HostIOMMUDevice *hiod, IOMMUCheckHDev
>*viommu, Error **errp)
>> {
>>      IOMMUCheckHDevClass *chdc =
>IOMMU_CHECK_HDEV_GET_CLASS(viommu);
>>
>>      return chdc->check_legacy_hdev(viommu, hiod, errp);
>> }
>>
>> int hiod_iommufd_check_hdev(HostIOMMUDevice *hiod,
>IOMMUCheckHDev *viommu, Error **errp)
>> {
>>      IOMMUCheckHDevClass *chdc =
>IOMMU_CHECK_HDEV_GET_CLASS(viommu);
>>
>>      return chdc->check_iommufd_hdev(viommu, hiod, errp);
>> }
>>
>> And we implement interface TYPE_IOMMU_CHECK_HDEV in intel-iommu
>module.
>> Certainly, we can also implement the same in other vIOMMUs we want.
>> See below pseudo change:
>>
>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>> index 68380d50ca..173c702b9f 100644
>> --- a/hw/i386/intel_iommu.c
>> +++ b/hw/i386/intel_iommu.c
>> @@ -5521,12 +5521,9 @@ static int vtd_check_hdev(IntelIOMMUState
>*s, VTDHostIOMMUDevice *vtd_hdev,
>>                             Error **errp)
>>   {
>>       HostIOMMUDevice *hiod = vtd_hdev->dev;
>> +    HostIOMMUDeviceClass *hiodc =
>HOST_IOMMU_DEVICE_GET_CLASS(hiod);
>>
>> -    if (object_dynamic_cast(OBJECT(hiod), TYPE_HIOD_IOMMUFD)) {
>> -        return vtd_check_iommufd_hdev(s, vtd_hdev, errp);
>> -    }
>> -
>> -    return vtd_check_legacy_hdev(s, hiod, errp);
>> +    return hiodc->check_hdev(IOMMU_CHECK_HDEV(s), hiod, errp);
>>   }
>>
>>   static int vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int
>devfn,
>> @@ -6076,6 +6073,7 @@ static void vtd_class_init(ObjectClass *klass,
>void *data)
>>   {
>>       DeviceClass *dc = DEVICE_CLASS(klass);
>>       X86IOMMUClass *x86_class = X86_IOMMU_DEVICE_CLASS(klass);
>> +    IOMMUCheckHDevClass *chdc = IOMMU_CHECK_HDEV_CLASS(klass);
>>
>>       dc->reset = vtd_reset;
>>       dc->vmsd = &vtd_vmstate;
>> @@ -6087,6 +6085,8 @@ static void vtd_class_init(ObjectClass *klass,
>void *data)
>>       dc->user_creatable = true;
>>       set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>>       dc->desc = "Intel IOMMU (VT-d) DMA Remapping device";
>> +    chdc->check_legacy_hdev = vtd_check_legacy_hdev;
>> +    chdc->check_iommufd_hdev = vtd_check_iommufd_hdev;
>>   }
>>
>>   static const TypeInfo vtd_info = {
>> @@ -6094,6 +6094,10 @@ static const TypeInfo vtd_info = {
>>       .parent        = TYPE_X86_IOMMU_DEVICE,
>>       .instance_size = sizeof(IntelIOMMUState),
>>       .class_init    = vtd_class_init,
>> +    .interfaces = (InterfaceInfo[]) {
>> +        { TYPE_IOMMU_CHECK_HDEV },
>> +        { }
>> +    }
>>   };
>>
>> Thanks
>> Zhenzhong


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 3/5] intel_iommu: Add a framework to do compatibility check with host IOMMU cap/ecap
  2024-04-18  8:42                 ` Duan, Zhenzhong
@ 2024-04-19  6:20                   ` Cédric Le Goater
  2024-04-19  9:49                     ` Duan, Zhenzhong
  2024-04-25  8:46                     ` Duan, Zhenzhong
  0 siblings, 2 replies; 19+ messages in thread
From: Cédric Le Goater @ 2024-04-19  6:20 UTC (permalink / raw)
  To: Duan, Zhenzhong, qemu-devel
  Cc: alex.williamson, eric.auger, peterx, jasowang, mst, jgg,
	nicolinc, joao.m.martins, Tian, Kevin, Liu, Yi L, Peng, Chao P,
	Yi Sun, Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
	Eduardo Habkost

Hello Zhenzhong,

On 4/18/24 10:42, Duan, Zhenzhong wrote:
> Hi Cédric,
> 
>> -----Original Message-----
>> From: Cédric Le Goater <clg@redhat.com>
>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>> compatibility check with host IOMMU cap/ecap
>>
>> Hello Zhenzhong
>>
>> On 4/17/24 11:24, Duan, Zhenzhong wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Cédric Le Goater <clg@redhat.com>
>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>>> compatibility check with host IOMMU cap/ecap
>>>>
>>>> On 4/17/24 06:21, Duan, Zhenzhong wrote:
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Cédric Le Goater <clg@redhat.com>
>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>>>>> compatibility check with host IOMMU cap/ecap
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> On 4/16/24 09:09, Duan, Zhenzhong wrote:
>>>>>>> Hi Cédric,
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Cédric Le Goater <clg@redhat.com>
>>>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>>>>>>> compatibility check with host IOMMU cap/ecap
>>>>>>>>
>>>>>>>> On 4/8/24 10:44, Zhenzhong Duan wrote:
>>>>>>>>> From: Yi Liu <yi.l.liu@intel.com>
>>>>>>>>>
>>>>>>>>> If check fails, the host side device(either vfio or vdpa device) should
>>>> not
>>>>>>>>> be passed to guest.
>>>>>>>>>
>>>>>>>>> Implementation details for different backends will be in following
>>>>>> patches.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>>>>>>>>> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
>>>>>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>>>>>> ---
>>>>>>>>>       hw/i386/intel_iommu.c | 35
>>>>>>>> +++++++++++++++++++++++++++++++++++
>>>>>>>>>       1 file changed, 35 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>>>>>>>>> index 4f84e2e801..a49b587c73 100644
>>>>>>>>> --- a/hw/i386/intel_iommu.c
>>>>>>>>> +++ b/hw/i386/intel_iommu.c
>>>>>>>>> @@ -35,6 +35,7 @@
>>>>>>>>>       #include "sysemu/kvm.h"
>>>>>>>>>       #include "sysemu/dma.h"
>>>>>>>>>       #include "sysemu/sysemu.h"
>>>>>>>>> +#include "sysemu/iommufd.h"
>>>>>>>>>       #include "hw/i386/apic_internal.h"
>>>>>>>>>       #include "kvm/kvm_i386.h"
>>>>>>>>>       #include "migration/vmstate.h"
>>>>>>>>> @@ -3819,6 +3820,32 @@ VTDAddressSpace
>>>>>>>> *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
>>>>>>>>>           return vtd_dev_as;
>>>>>>>>>       }
>>>>>>>>>
>>>>>>>>> +static int vtd_check_legacy_hdev(IntelIOMMUState *s,
>>>>>>>>> +                                 HostIOMMUDevice *hiod,
>>>>>>>>> +                                 Error **errp)
>>>>>>>>> +{
>>>>>>>>> +    return 0;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
>>>>>>>>> +                                  HostIOMMUDevice *hiod,
>>>>>>>>> +                                  Error **errp)
>>>>>>>>> +{
>>>>>>>>> +    return 0;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static int vtd_check_hdev(IntelIOMMUState *s,
>>>>>> VTDHostIOMMUDevice
>>>>>>>> *vtd_hdev,
>>>>>>>>> +                          Error **errp)
>>>>>>>>> +{
>>>>>>>>> +    HostIOMMUDevice *hiod = vtd_hdev->dev;
>>>>>>>>> +
>>>>>>>>> +    if (object_dynamic_cast(OBJECT(hiod), TYPE_HIOD_IOMMUFD))
>> {
>>>>>>>>> +        return vtd_check_iommufd_hdev(s, hiod, errp);
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>> +    return vtd_check_legacy_hdev(s, hiod, errp);
>>>>>>>>> +}
>>>>>>>>
>>>>>>>>
>>>>>>>> I think we should be using the .get_host_iommu_info() class handler
>>>>>>>> instead. Can we refactor the code slightly to avoid this check on
>>>>>>>> the type ?
>>>>>>>
>>>>>>> There is some difficulty ini avoiding this check, the behavior of
>>>>>> vtd_check_legacy_hdev
>>>>>>> and vtd_check_iommufd_hdev are different especially after nesting
>>>>>> support introduced.
>>>>>>> vtd_check_iommufd_hdev() has much wider check over cap/ecap bits
>>>>>> besides aw_bits.
>>>>>>
>>>>>> I think it is important to fully separate the vIOMMU model from the
>>>>>> host IOMMU backing device.
>>
>> This comment is true for the structures also.
>>
>>>>>> Could we introduce a new HostIOMMUDeviceClass
>>>>>> handler .check_hdev() handler, which would
>> call .get_host_iommu_info() ?
>>
>> This means that HIOD_LEGACY_INFO and HIOD_IOMMUFD_INFO should be
>> a common structure 'HostIOMMUDeviceInfo' holding all attributes
>> for the different backends. Each .get_host_iommu_info() implementation
>> would translate the specific host iommu device data presentation
>> into the common 'HostIOMMUDeviceInfo', this is true for host_aw_bits.
> 
> I see, it's just not easy to define the unified elements in HostIOMMUDeviceInfo
> so that they maps to bits or fields in host return IOMMU info.

The proposal is adding a vIOMMU <-> HostIOMMUDevice interface and a new
API needs to be completely defined for it. The IOMMU backend implementation
could be anything, legacy, iommufd, iommufd v2, some other framework and
the vIOMMU shouldn't be aware of its implementation.

Exposing the kernel structures as done below should be avoided because
they are part of the QEMU <-> kernel IOMMUFD interface.


> Different platform returned host IOMMU info is platform specific.
> For vtd and siommu:
> 
> struct iommu_hw_info_vtd {
>          __u32 flags;
>          __u32 __reserved;
>          __aligned_u64 cap_reg;
>          __aligned_u64 ecap_reg;
> };
> 
> struct iommu_hw_info_arm_smmuv3 {
>         __u32 flags;
>         __u32 __reserved;
>         __u32 idr[6];
>         __u32 iidr;
>         __u32 aidr;
> };
> 
> I can think of two kinds of declaration of HostIOMMUDeviceInfo:
> 
> struct HostIOMMUDeviceInfo {
>      uint8_t aw_bits;
>      enum iommu_hw_info_type type;
>      union {
>          struct iommu_hw_info_vtd vtd;
>          struct iommu_hw_info_arm_smmuv3;
>          ......
>      } data;
> }
> 
> or
> 
> struct HostIOMMUDeviceInfo {
>      uint8_t aw_bits;
>      enum iommu_hw_info_type type;
>      __u32 flags;
>      __aligned_u64 cap_reg;
>      __aligned_u64 ecap_reg;
>      __u32 idr[6];
>      __u32 iidr;
>      __u32 aidr;
>     ......
> }
> 
> Not clear if any is your expected format.
>
>> 'type' could be handled the same way, with a 'HostIOMMUDeviceInfo'
>> type attribute and host iommu device type definitions, or as you
>> suggested with a QOM interface. This is more complex however. In
>> this case, I would suggest to implement a .compatible() handler to
>> compare the host iommu device type with the vIOMMU type.
>>
>> The resulting check_hdev routine would look something like :
>>
>> static int vtd_check_hdev(IntelIOMMUState *s, VTDHostIOMMUDevice
>> *vtd_hdev,
>>                            Error **errp)
>> {
>>      HostIOMMUDevice *hiod = vtd_hdev->dev;
>>      HostIOMMUDeviceClass *hiodc =
>> HOST_IOMMU_DEVICE_GET_CLASS(hiod);
>>      HostIOMMUDevice info;
>>      int host_aw_bits, ret;
>>
>>      ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp);
>>      if (ret) {
>>          return ret;
>>      }
>>
>>      ret = hiodc->is_compatible(hiod, VIOMMU_INTERFACE(s));
>>      if (ret) {
>>          return ret;
>>      }
>>
>>      if (s->aw_bits > info.aw_bits) {
>>          error_setg(errp, "aw-bits %d > host aw-bits %d",
>>                     s->aw_bits, info.aw_bits);
>>          return -EINVAL;
>>      }
>> }
>>
>> and the HostIOMMUDeviceClass::is_compatible() handler would call a
>> vIOMMUInterface::compatible() handler simply returning
>> IOMMU_HW_INFO_TYPE_INTEL_VTD. How does that sound ?
> 
> Not quite get what HostIOMMUDeviceClass::is_compatible() does.

HostIOMMUDeviceClass::is_compatible() calls in the host IOMMU backend
to determine which IOMMU types are exposed by the host, then calls the
vIOMMUInterface::compatible() handler to do the compare. API is to be
defined.

As a refinement, we could introduce in the vIOMMU <-> HostIOMMUDevice
interface capabilities, or features, to check more precisely the level
of compatibility between the vIOMMU and the host IOMMU device. This is
similar to what is done between QEMU and KVM.

If you think this is too complex, include type in HostIOMMUDeviceInfo.

> Currently legacy and IOMMUFD host device has different check logic, how it can help
> in merging vtd_check_legacy_hdev() and vtd_check_iommufd_hdev() into a single vtd_check_hdev()?

IMHO, IOMMU shouldn't be aware of the IOMMU backend implementation, but
if you think the Intel vIOMMU should access directly the iommufd backend
when available, then we should drop this proposal and revisit the design
to take a different approach.

> Below is the two functions after nesting series, for your easy reference:
> 
> static int vtd_check_legacy_hdev()
> {
>      if (s->scalable_modern) {
>          /* Modern vIOMMU and legacy backend */
>          error_setg(errp, "Need IOMMUFD backend in scalable modern mode");
>          return -EINVAL;
>      }

This part would typically go in the compatible() handler.

> 
>      ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp);
>      if (ret) {
>          return ret;
>      }
> 
>      if (s->aw_bits > info.aw_bits) {
>          error_setg(errp, "aw-bits %d > host aw-bits %d",
>                     s->aw_bits, info.aw_bits);
>          return -EINVAL;
>      }
> 
>      return 0;
> }
> 
> static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
>                                    VTDHostIOMMUDevice *vtd_hdev,
>                                    Error **errp)
> {
>      ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp);
>      if (ret) {
>          return ret;
>      }
> 
>      if (info.type != IOMMU_HW_INFO_TYPE_INTEL_VTD) {
>          error_setg(errp, "IOMMU hardware is not compatible");
>          return -EINVAL;
>      }
> 
>      vtd = &info.data.vtd;
>      host_aw_bits = VTD_MGAW_FROM_CAP(vtd->cap_reg) + 1;
>      if (s->aw_bits > host_aw_bits) {
>          error_setg(errp, "aw-bits %d > host aw-bits %d",
>                     s->aw_bits, host_aw_bits);
>          return -EINVAL;
>      }
> 
>      if (!s->scalable_modern) {
>          goto done;
>      }
> 
>      if (!(vtd->ecap_reg & VTD_ECAP_NEST)) {
>          error_setg(errp, "Host IOMMU doesn't support nested translation");
>          return -EINVAL;
>      }
> 
>      if (s->fl1gp && !(vtd->cap_reg & VTD_CAP_FL1GP)) {
>          error_setg(errp, "Stage-1 1GB huge page is unsupported by host IOMMU");
>          return -EINVAL;
>      }

These checks above would typically go in the compatible() handler also.

Now, the question is how useful will that framework be if hotplugging
devices always fail because the vIOMMU and host IOMMU devices have
incompatible settings/capabilities ? Shouldn't we also add properties
at the vIOMMU level ?


Thanks,

C.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH v2 3/5] intel_iommu: Add a framework to do compatibility check with host IOMMU cap/ecap
  2024-04-19  6:20                   ` Cédric Le Goater
@ 2024-04-19  9:49                     ` Duan, Zhenzhong
  2024-04-25  8:46                     ` Duan, Zhenzhong
  1 sibling, 0 replies; 19+ messages in thread
From: Duan, Zhenzhong @ 2024-04-19  9:49 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: alex.williamson, eric.auger, peterx, jasowang, mst, jgg,
	nicolinc, joao.m.martins, Tian, Kevin, Liu, Yi L, Peng, Chao P,
	Yi Sun, Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
	Eduardo Habkost



>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>compatibility check with host IOMMU cap/ecap
>
>Hello Zhenzhong,
>
>On 4/18/24 10:42, Duan, Zhenzhong wrote:
>> Hi Cédric,
>>
>>> -----Original Message-----
>>> From: Cédric Le Goater <clg@redhat.com>
>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>> compatibility check with host IOMMU cap/ecap
>>>
>>> Hello Zhenzhong
>>>
>>> On 4/17/24 11:24, Duan, Zhenzhong wrote:
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Cédric Le Goater <clg@redhat.com>
>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>>>> compatibility check with host IOMMU cap/ecap
>>>>>
>>>>> On 4/17/24 06:21, Duan, Zhenzhong wrote:
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Cédric Le Goater <clg@redhat.com>
>>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>>>>>> compatibility check with host IOMMU cap/ecap
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> On 4/16/24 09:09, Duan, Zhenzhong wrote:
>>>>>>>> Hi Cédric,
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Cédric Le Goater <clg@redhat.com>
>>>>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>>>>>>>> compatibility check with host IOMMU cap/ecap
>>>>>>>>>
>>>>>>>>> On 4/8/24 10:44, Zhenzhong Duan wrote:
>>>>>>>>>> From: Yi Liu <yi.l.liu@intel.com>
>>>>>>>>>>
>>>>>>>>>> If check fails, the host side device(either vfio or vdpa device)
>should
>>>>> not
>>>>>>>>>> be passed to guest.
>>>>>>>>>>
>>>>>>>>>> Implementation details for different backends will be in
>following
>>>>>>> patches.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>>>>>>>>>> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
>>>>>>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>>>>>>> ---
>>>>>>>>>>       hw/i386/intel_iommu.c | 35
>>>>>>>>> +++++++++++++++++++++++++++++++++++
>>>>>>>>>>       1 file changed, 35 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>>>>>>>>>> index 4f84e2e801..a49b587c73 100644
>>>>>>>>>> --- a/hw/i386/intel_iommu.c
>>>>>>>>>> +++ b/hw/i386/intel_iommu.c
>>>>>>>>>> @@ -35,6 +35,7 @@
>>>>>>>>>>       #include "sysemu/kvm.h"
>>>>>>>>>>       #include "sysemu/dma.h"
>>>>>>>>>>       #include "sysemu/sysemu.h"
>>>>>>>>>> +#include "sysemu/iommufd.h"
>>>>>>>>>>       #include "hw/i386/apic_internal.h"
>>>>>>>>>>       #include "kvm/kvm_i386.h"
>>>>>>>>>>       #include "migration/vmstate.h"
>>>>>>>>>> @@ -3819,6 +3820,32 @@ VTDAddressSpace
>>>>>>>>> *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
>>>>>>>>>>           return vtd_dev_as;
>>>>>>>>>>       }
>>>>>>>>>>
>>>>>>>>>> +static int vtd_check_legacy_hdev(IntelIOMMUState *s,
>>>>>>>>>> +                                 HostIOMMUDevice *hiod,
>>>>>>>>>> +                                 Error **errp)
>>>>>>>>>> +{
>>>>>>>>>> +    return 0;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
>>>>>>>>>> +                                  HostIOMMUDevice *hiod,
>>>>>>>>>> +                                  Error **errp)
>>>>>>>>>> +{
>>>>>>>>>> +    return 0;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +static int vtd_check_hdev(IntelIOMMUState *s,
>>>>>>> VTDHostIOMMUDevice
>>>>>>>>> *vtd_hdev,
>>>>>>>>>> +                          Error **errp)
>>>>>>>>>> +{
>>>>>>>>>> +    HostIOMMUDevice *hiod = vtd_hdev->dev;
>>>>>>>>>> +
>>>>>>>>>> +    if (object_dynamic_cast(OBJECT(hiod),
>TYPE_HIOD_IOMMUFD))
>>> {
>>>>>>>>>> +        return vtd_check_iommufd_hdev(s, hiod, errp);
>>>>>>>>>> +    }
>>>>>>>>>> +
>>>>>>>>>> +    return vtd_check_legacy_hdev(s, hiod, errp);
>>>>>>>>>> +}
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think we should be using the .get_host_iommu_info() class
>handler
>>>>>>>>> instead. Can we refactor the code slightly to avoid this check on
>>>>>>>>> the type ?
>>>>>>>>
>>>>>>>> There is some difficulty ini avoiding this check, the behavior of
>>>>>>> vtd_check_legacy_hdev
>>>>>>>> and vtd_check_iommufd_hdev are different especially after
>nesting
>>>>>>> support introduced.
>>>>>>>> vtd_check_iommufd_hdev() has much wider check over cap/ecap
>bits
>>>>>>> besides aw_bits.
>>>>>>>
>>>>>>> I think it is important to fully separate the vIOMMU model from the
>>>>>>> host IOMMU backing device.
>>>
>>> This comment is true for the structures also.
>>>
>>>>>>> Could we introduce a new HostIOMMUDeviceClass
>>>>>>> handler .check_hdev() handler, which would
>>> call .get_host_iommu_info() ?
>>>
>>> This means that HIOD_LEGACY_INFO and HIOD_IOMMUFD_INFO should
>be
>>> a common structure 'HostIOMMUDeviceInfo' holding all attributes
>>> for the different backends. Each .get_host_iommu_info() implementation
>>> would translate the specific host iommu device data presentation
>>> into the common 'HostIOMMUDeviceInfo', this is true for host_aw_bits.
>>
>> I see, it's just not easy to define the unified elements in
>HostIOMMUDeviceInfo
>> so that they maps to bits or fields in host return IOMMU info.
>
>The proposal is adding a vIOMMU <-> HostIOMMUDevice interface and a
>new
>API needs to be completely defined for it. The IOMMU backend
>implementation
>could be anything, legacy, iommufd, iommufd v2, some other framework
>and
>the vIOMMU shouldn't be aware of its implementation.

Kernel IOMMUFD provide a unified uAPI supporting different platform, arm,
x86, etc. Userspace representive IOMMUFD backend(backend/iommufd.c) does
the same, I'm treating it as a general library for other modules. That
means other modules, i.e., vfio, vdpa and vIOMMUs can call into the
functions iommufd library provides.

VFIO supports two backends, "VFIO legacy backend"(driver/vfio/container.c)
and "VFIO IOMMUFD backend" (driver/vfio/iommufd.c). The 2nd one is a thin
layer calling into "general IOMMUFD backend". "VFIO legacy backend" is
targeting VFIO usage so its scope is small than "general IOMMUFD backend".

I think we don't need to hide the general IOMMUFD backend from vIOMMU,
VFIO only need to provide critical data, such as iommufd/devid/etc to
vIOMMU, vIOMMU can call "general IOMMUFD backend" itself. It looks
unnecessary and inefficient to wrap "general IOMMUFD backend" provided
functions attach/detach_hwpt, alloc_ioas, free_id, etc in vIOMMU <-> HostIOMMUDevice
interface.

On the other hand, legacy will be deprecated by iommufd finally. iommufd v2
should be backward compatible with iommufd v1, v2 should not break existing
iommufd related code in vIOMMU. If it does, we treat it same as an other
framework.

For other framework, if it's only for VFIO usage and support nesting, I
remember the oldest design is working that way and replaced by IOMMUFD.
If it's also a general framework, I think we don't need to hide it under
VFIO, we can extend vIOMMU to call into it too.

>
>Exposing the kernel structures as done below should be avoided because
>they are part of the QEMU <-> kernel IOMMUFD interface.

But VFIO is only an agent calling kernel for host IOMMU info and pass it to
vIOMMU. VFIO doesn't need to know details of the kernel structures.
Is that ok?

For legacy VFIO backend, kernel doesn't provide a uAPI to get host IOMMU info.
So VFIO constructs a host_iommu_info for vIOMMU with its knowledge.

>
>
>> Different platform returned host IOMMU info is platform specific.
>> For vtd and siommu:
>>
>> struct iommu_hw_info_vtd {
>>          __u32 flags;
>>          __u32 __reserved;
>>          __aligned_u64 cap_reg;
>>          __aligned_u64 ecap_reg;
>> };
>>
>> struct iommu_hw_info_arm_smmuv3 {
>>         __u32 flags;
>>         __u32 __reserved;
>>         __u32 idr[6];
>>         __u32 iidr;
>>         __u32 aidr;
>> };
>>
>> I can think of two kinds of declaration of HostIOMMUDeviceInfo:
>>
>> struct HostIOMMUDeviceInfo {
>>      uint8_t aw_bits;
>>      enum iommu_hw_info_type type;
>>      union {
>>          struct iommu_hw_info_vtd vtd;
>>          struct iommu_hw_info_arm_smmuv3;
>>          ......
>>      } data;
>> }
>>
>> or
>>
>> struct HostIOMMUDeviceInfo {
>>      uint8_t aw_bits;
>>      enum iommu_hw_info_type type;
>>      __u32 flags;
>>      __aligned_u64 cap_reg;
>>      __aligned_u64 ecap_reg;
>>      __u32 idr[6];
>>      __u32 iidr;
>>      __u32 aidr;
>>     ......
>> }
>>
>> Not clear if any is your expected format.
>>
>>> 'type' could be handled the same way, with a 'HostIOMMUDeviceInfo'
>>> type attribute and host iommu device type definitions, or as you
>>> suggested with a QOM interface. This is more complex however. In
>>> this case, I would suggest to implement a .compatible() handler to
>>> compare the host iommu device type with the vIOMMU type.
>>>
>>> The resulting check_hdev routine would look something like :
>>>
>>> static int vtd_check_hdev(IntelIOMMUState *s, VTDHostIOMMUDevice
>>> *vtd_hdev,
>>>                            Error **errp)
>>> {
>>>      HostIOMMUDevice *hiod = vtd_hdev->dev;
>>>      HostIOMMUDeviceClass *hiodc =
>>> HOST_IOMMU_DEVICE_GET_CLASS(hiod);
>>>      HostIOMMUDevice info;
>>>      int host_aw_bits, ret;
>>>
>>>      ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp);
>>>      if (ret) {
>>>          return ret;
>>>      }
>>>
>>>      ret = hiodc->is_compatible(hiod, VIOMMU_INTERFACE(s));
>>>      if (ret) {
>>>          return ret;
>>>      }
>>>
>>>      if (s->aw_bits > info.aw_bits) {
>>>          error_setg(errp, "aw-bits %d > host aw-bits %d",
>>>                     s->aw_bits, info.aw_bits);
>>>          return -EINVAL;
>>>      }
>>> }
>>>
>>> and the HostIOMMUDeviceClass::is_compatible() handler would call a
>>> vIOMMUInterface::compatible() handler simply returning
>>> IOMMU_HW_INFO_TYPE_INTEL_VTD. How does that sound ?
>>
>> Not quite get what HostIOMMUDeviceClass::is_compatible() does.
>
>HostIOMMUDeviceClass::is_compatible() calls in the host IOMMU backend
>to determine which IOMMU types are exposed by the host, then calls the
>vIOMMUInterface::compatible() handler to do the compare. API is to be
>defined.
>
>As a refinement, we could introduce in the vIOMMU <-> HostIOMMUDevice
>interface capabilities, or features, to check more precisely the level
>of compatibility between the vIOMMU and the host IOMMU device. This is
>similar to what is done between QEMU and KVM.
>
>If you think this is too complex, include type in HostIOMMUDeviceInfo.

Yea, feel hard to define a common capabilities or features in HostIOMMUDeviceInfo
for different host platform IOMMUs.

>
>> Currently legacy and IOMMUFD host device has different check logic, how
>it can help
>> in merging vtd_check_legacy_hdev() and vtd_check_iommufd_hdev() into
>a single vtd_check_hdev()?
>
>IMHO, IOMMU shouldn't be aware of the IOMMU backend implementation,
>but
>if you think the Intel vIOMMU should access directly the iommufd backend
>when available, then we should drop this proposal and revisit the design
>to take a different approach.

See my explanation above of why iommufd backend isn't separated from
vIOMMU.

I like your suggestion of QOM which helped in device attach/detach_hwpt
interface. We only need to discuss further to get a consensus on
host_iommu_info and host iommu device interface with vIOMMU.

>
>> Below is the two functions after nesting series, for your easy reference:
>>
>> static int vtd_check_legacy_hdev()
>> {
>>      if (s->scalable_modern) {
>>          /* Modern vIOMMU and legacy backend */
>>          error_setg(errp, "Need IOMMUFD backend in scalable modern
>mode");
>>          return -EINVAL;
>>      }
>
>This part would typically go in the compatible() handler.
>
>>
>>      ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp);
>>      if (ret) {
>>          return ret;
>>      }
>>
>>      if (s->aw_bits > info.aw_bits) {
>>          error_setg(errp, "aw-bits %d > host aw-bits %d",
>>                     s->aw_bits, info.aw_bits);
>>          return -EINVAL;
>>      }
>>
>>      return 0;
>> }
>>
>> static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
>>                                    VTDHostIOMMUDevice *vtd_hdev,
>>                                    Error **errp)
>> {
>>      ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp);
>>      if (ret) {
>>          return ret;
>>      }
>>
>>      if (info.type != IOMMU_HW_INFO_TYPE_INTEL_VTD) {
>>          error_setg(errp, "IOMMU hardware is not compatible");
>>          return -EINVAL;
>>      }
>>
>>      vtd = &info.data.vtd;
>>      host_aw_bits = VTD_MGAW_FROM_CAP(vtd->cap_reg) + 1;
>>      if (s->aw_bits > host_aw_bits) {
>>          error_setg(errp, "aw-bits %d > host aw-bits %d",
>>                     s->aw_bits, host_aw_bits);
>>          return -EINVAL;
>>      }
>>
>>      if (!s->scalable_modern) {
>>          goto done;
>>      }
>>
>>      if (!(vtd->ecap_reg & VTD_ECAP_NEST)) {
>>          error_setg(errp, "Host IOMMU doesn't support nested translation");
>>          return -EINVAL;
>>      }
>>
>>      if (s->fl1gp && !(vtd->cap_reg & VTD_CAP_FL1GP)) {
>>          error_setg(errp, "Stage-1 1GB huge page is unsupported by host
>IOMMU");
>>          return -EINVAL;
>>      }
>
>These checks above would typically go in the compatible() handler also.

I'm afraid we still need "if (object_dynamic_cast(OBJECT(hiod), TYPE_HIOD_IOMMUFD)))"
check in .compatible() so that we can do different check for legacy and iommufd hdev.

>
>Now, the question is how useful will that framework be if hotplugging
>devices always fail because the vIOMMU and host IOMMU devices have
>incompatible settings/capabilities ? Shouldn't we also add properties
>at the vIOMMU level ?

Yes, we should.

We do add a property "x-cap-fl1gp" for s->fl1gp.
See https://github.com/yiliu1765/qemu/commit/0716f874527b20e8784ef1afaff66069cc3d7b60

Property "aw_bits" already exist for a long time.

The check on VTD_ECAP_NEST failure means host doesn't support nesting,
Property "x-scalable-mode=legacy" can help.

Thanks
Zhenzhong

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH v2 3/5] intel_iommu: Add a framework to do compatibility check with host IOMMU cap/ecap
  2024-04-19  6:20                   ` Cédric Le Goater
  2024-04-19  9:49                     ` Duan, Zhenzhong
@ 2024-04-25  8:46                     ` Duan, Zhenzhong
  2024-04-25 12:40                       ` Cédric Le Goater
  1 sibling, 1 reply; 19+ messages in thread
From: Duan, Zhenzhong @ 2024-04-25  8:46 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: alex.williamson, eric.auger, peterx, jasowang, mst, jgg,
	nicolinc, joao.m.martins, Tian, Kevin, Liu, Yi L, Peng, Chao P,
	Yi Sun, Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
	Eduardo Habkost

Hi Cédric,

>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>compatibility check with host IOMMU cap/ecap
>
>Hello Zhenzhong,
>
>On 4/18/24 10:42, Duan, Zhenzhong wrote:
>> Hi Cédric,
>>
>>> -----Original Message-----
>>> From: Cédric Le Goater <clg@redhat.com>
>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>> compatibility check with host IOMMU cap/ecap
>>>
>>> Hello Zhenzhong
>>>
>>> On 4/17/24 11:24, Duan, Zhenzhong wrote:
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Cédric Le Goater <clg@redhat.com>
>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>>>> compatibility check with host IOMMU cap/ecap
>>>>>
>>>>> On 4/17/24 06:21, Duan, Zhenzhong wrote:
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Cédric Le Goater <clg@redhat.com>
>>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>>>>>> compatibility check with host IOMMU cap/ecap
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> On 4/16/24 09:09, Duan, Zhenzhong wrote:
>>>>>>>> Hi Cédric,
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Cédric Le Goater <clg@redhat.com>
>>>>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>>>>>>>> compatibility check with host IOMMU cap/ecap
>>>>>>>>>
>>>>>>>>> On 4/8/24 10:44, Zhenzhong Duan wrote:
>>>>>>>>>> From: Yi Liu <yi.l.liu@intel.com>
>>>>>>>>>>
>>>>>>>>>> If check fails, the host side device(either vfio or vdpa device)
>should
>>>>> not
>>>>>>>>>> be passed to guest.
>>>>>>>>>>
>>>>>>>>>> Implementation details for different backends will be in
>following
>>>>>>> patches.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>>>>>>>>>> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
>>>>>>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>>>>>>> ---
>>>>>>>>>>       hw/i386/intel_iommu.c | 35
>>>>>>>>> +++++++++++++++++++++++++++++++++++
>>>>>>>>>>       1 file changed, 35 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>>>>>>>>>> index 4f84e2e801..a49b587c73 100644
>>>>>>>>>> --- a/hw/i386/intel_iommu.c
>>>>>>>>>> +++ b/hw/i386/intel_iommu.c
>>>>>>>>>> @@ -35,6 +35,7 @@
>>>>>>>>>>       #include "sysemu/kvm.h"
>>>>>>>>>>       #include "sysemu/dma.h"
>>>>>>>>>>       #include "sysemu/sysemu.h"
>>>>>>>>>> +#include "sysemu/iommufd.h"
>>>>>>>>>>       #include "hw/i386/apic_internal.h"
>>>>>>>>>>       #include "kvm/kvm_i386.h"
>>>>>>>>>>       #include "migration/vmstate.h"
>>>>>>>>>> @@ -3819,6 +3820,32 @@ VTDAddressSpace
>>>>>>>>> *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
>>>>>>>>>>           return vtd_dev_as;
>>>>>>>>>>       }
>>>>>>>>>>
>>>>>>>>>> +static int vtd_check_legacy_hdev(IntelIOMMUState *s,
>>>>>>>>>> +                                 HostIOMMUDevice *hiod,
>>>>>>>>>> +                                 Error **errp)
>>>>>>>>>> +{
>>>>>>>>>> +    return 0;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
>>>>>>>>>> +                                  HostIOMMUDevice *hiod,
>>>>>>>>>> +                                  Error **errp)
>>>>>>>>>> +{
>>>>>>>>>> +    return 0;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +static int vtd_check_hdev(IntelIOMMUState *s,
>>>>>>> VTDHostIOMMUDevice
>>>>>>>>> *vtd_hdev,
>>>>>>>>>> +                          Error **errp)
>>>>>>>>>> +{
>>>>>>>>>> +    HostIOMMUDevice *hiod = vtd_hdev->dev;
>>>>>>>>>> +
>>>>>>>>>> +    if (object_dynamic_cast(OBJECT(hiod),
>TYPE_HIOD_IOMMUFD))
>>> {
>>>>>>>>>> +        return vtd_check_iommufd_hdev(s, hiod, errp);
>>>>>>>>>> +    }
>>>>>>>>>> +
>>>>>>>>>> +    return vtd_check_legacy_hdev(s, hiod, errp);
>>>>>>>>>> +}
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think we should be using the .get_host_iommu_info() class
>handler
>>>>>>>>> instead. Can we refactor the code slightly to avoid this check on
>>>>>>>>> the type ?
>>>>>>>>
>>>>>>>> There is some difficulty ini avoiding this check, the behavior of
>>>>>>> vtd_check_legacy_hdev
>>>>>>>> and vtd_check_iommufd_hdev are different especially after
>nesting
>>>>>>> support introduced.
>>>>>>>> vtd_check_iommufd_hdev() has much wider check over cap/ecap
>bits
>>>>>>> besides aw_bits.
>>>>>>>
>>>>>>> I think it is important to fully separate the vIOMMU model from the
>>>>>>> host IOMMU backing device.
>>>
>>> This comment is true for the structures also.
>>>
>>>>>>> Could we introduce a new HostIOMMUDeviceClass
>>>>>>> handler .check_hdev() handler, which would
>>> call .get_host_iommu_info() ?
>>>
>>> This means that HIOD_LEGACY_INFO and HIOD_IOMMUFD_INFO should
>be
>>> a common structure 'HostIOMMUDeviceInfo' holding all attributes
>>> for the different backends. Each .get_host_iommu_info() implementation
>>> would translate the specific host iommu device data presentation
>>> into the common 'HostIOMMUDeviceInfo', this is true for host_aw_bits.
>>
>> I see, it's just not easy to define the unified elements in
>HostIOMMUDeviceInfo
>> so that they maps to bits or fields in host return IOMMU info.
>
>The proposal is adding a vIOMMU <-> HostIOMMUDevice interface and a
>new
>API needs to be completely defined for it. The IOMMU backend
>implementation
>could be anything, legacy, iommufd, iommufd v2, some other framework
>and
>the vIOMMU shouldn't be aware of its implementation.
>
>Exposing the kernel structures as done below should be avoided because
>they are part of the QEMU <-> kernel IOMMUFD interface.
>
>
>> Different platform returned host IOMMU info is platform specific.
>> For vtd and siommu:
>>
>> struct iommu_hw_info_vtd {
>>          __u32 flags;
>>          __u32 __reserved;
>>          __aligned_u64 cap_reg;
>>          __aligned_u64 ecap_reg;
>> };
>>
>> struct iommu_hw_info_arm_smmuv3 {
>>         __u32 flags;
>>         __u32 __reserved;
>>         __u32 idr[6];
>>         __u32 iidr;
>>         __u32 aidr;
>> };
>>
>> I can think of two kinds of declaration of HostIOMMUDeviceInfo:
>>
>> struct HostIOMMUDeviceInfo {
>>      uint8_t aw_bits;
>>      enum iommu_hw_info_type type;
>>      union {
>>          struct iommu_hw_info_vtd vtd;
>>          struct iommu_hw_info_arm_smmuv3;
>>          ......
>>      } data;
>> }
>>
>> or
>>
>> struct HostIOMMUDeviceInfo {
>>      uint8_t aw_bits;
>>      enum iommu_hw_info_type type;
>>      __u32 flags;
>>      __aligned_u64 cap_reg;
>>      __aligned_u64 ecap_reg;
>>      __u32 idr[6];
>>      __u32 iidr;
>>      __u32 aidr;
>>     ......
>> }
>>
>> Not clear if any is your expected format.
>>
>>> 'type' could be handled the same way, with a 'HostIOMMUDeviceInfo'
>>> type attribute and host iommu device type definitions, or as you
>>> suggested with a QOM interface. This is more complex however. In
>>> this case, I would suggest to implement a .compatible() handler to
>>> compare the host iommu device type with the vIOMMU type.
>>>
>>> The resulting check_hdev routine would look something like :
>>>
>>> static int vtd_check_hdev(IntelIOMMUState *s, VTDHostIOMMUDevice
>>> *vtd_hdev,
>>>                            Error **errp)
>>> {
>>>      HostIOMMUDevice *hiod = vtd_hdev->dev;
>>>      HostIOMMUDeviceClass *hiodc =
>>> HOST_IOMMU_DEVICE_GET_CLASS(hiod);
>>>      HostIOMMUDevice info;
>>>      int host_aw_bits, ret;
>>>
>>>      ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp);
>>>      if (ret) {
>>>          return ret;
>>>      }
>>>
>>>      ret = hiodc->is_compatible(hiod, VIOMMU_INTERFACE(s));
>>>      if (ret) {
>>>          return ret;
>>>      }
>>>
>>>      if (s->aw_bits > info.aw_bits) {
>>>          error_setg(errp, "aw-bits %d > host aw-bits %d",
>>>                     s->aw_bits, info.aw_bits);
>>>          return -EINVAL;
>>>      }
>>> }
>>>
>>> and the HostIOMMUDeviceClass::is_compatible() handler would call a
>>> vIOMMUInterface::compatible() handler simply returning
>>> IOMMU_HW_INFO_TYPE_INTEL_VTD. How does that sound ?
>>
>> Not quite get what HostIOMMUDeviceClass::is_compatible() does.
>
>HostIOMMUDeviceClass::is_compatible() calls in the host IOMMU backend
>to determine which IOMMU types are exposed by the host, then calls the
>vIOMMUInterface::compatible() handler to do the compare. API is to be
>defined.
>
>As a refinement, we could introduce in the vIOMMU <-> HostIOMMUDevice
>interface capabilities, or features, to check more precisely the level
>of compatibility between the vIOMMU and the host IOMMU device. This is
>similar to what is done between QEMU and KVM.
>
>If you think this is too complex, include type in HostIOMMUDeviceInfo.
>
>> Currently legacy and IOMMUFD host device has different check logic, how
>it can help
>> in merging vtd_check_legacy_hdev() and vtd_check_iommufd_hdev() into
>a single vtd_check_hdev()?
>
>IMHO, IOMMU shouldn't be aware of the IOMMU backend implementation,
>but
>if you think the Intel vIOMMU should access directly the iommufd backend
>when available, then we should drop this proposal and revisit the design
>to take a different approach.

I implemented a draft following your suggestions so we could explore further.
See https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_nesting_preq_v3_tmp

In this draft, it uses .check_cap() to query HOST_IOMMU_DEVICE_CAP_xxx
just like KVM CAPs.
A common HostIOMMUDeviceCaps structure is introduced to be used by
both legacy and iommufd backend.

It indeed is cleaner. Only problem is I failed to implement .compatible()
as all the check could go ahead by just calling check_cap().
Could you help a quick check to see if I misunderstood any of your suggestion?

Thanks
Zhenzhong

>
>> Below is the two functions after nesting series, for your easy reference:
>>
>> static int vtd_check_legacy_hdev()
>> {
>>      if (s->scalable_modern) {
>>          /* Modern vIOMMU and legacy backend */
>>          error_setg(errp, "Need IOMMUFD backend in scalable modern
>mode");
>>          return -EINVAL;
>>      }
>
>This part would typically go in the compatible() handler.
>
>>
>>      ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp);
>>      if (ret) {
>>          return ret;
>>      }
>>
>>      if (s->aw_bits > info.aw_bits) {
>>          error_setg(errp, "aw-bits %d > host aw-bits %d",
>>                     s->aw_bits, info.aw_bits);
>>          return -EINVAL;
>>      }
>>
>>      return 0;
>> }
>>
>> static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
>>                                    VTDHostIOMMUDevice *vtd_hdev,
>>                                    Error **errp)
>> {
>>      ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp);
>>      if (ret) {
>>          return ret;
>>      }
>>
>>      if (info.type != IOMMU_HW_INFO_TYPE_INTEL_VTD) {
>>          error_setg(errp, "IOMMU hardware is not compatible");
>>          return -EINVAL;
>>      }
>>
>>      vtd = &info.data.vtd;
>>      host_aw_bits = VTD_MGAW_FROM_CAP(vtd->cap_reg) + 1;
>>      if (s->aw_bits > host_aw_bits) {
>>          error_setg(errp, "aw-bits %d > host aw-bits %d",
>>                     s->aw_bits, host_aw_bits);
>>          return -EINVAL;
>>      }
>>
>>      if (!s->scalable_modern) {
>>          goto done;
>>      }
>>
>>      if (!(vtd->ecap_reg & VTD_ECAP_NEST)) {
>>          error_setg(errp, "Host IOMMU doesn't support nested translation");
>>          return -EINVAL;
>>      }
>>
>>      if (s->fl1gp && !(vtd->cap_reg & VTD_CAP_FL1GP)) {
>>          error_setg(errp, "Stage-1 1GB huge page is unsupported by host
>IOMMU");
>>          return -EINVAL;
>>      }
>
>These checks above would typically go in the compatible() handler also.
>
>Now, the question is how useful will that framework be if hotplugging
>devices always fail because the vIOMMU and host IOMMU devices have
>incompatible settings/capabilities ? Shouldn't we also add properties
>at the vIOMMU level ?
>
>
>Thanks,
>
>C.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 3/5] intel_iommu: Add a framework to do compatibility check with host IOMMU cap/ecap
  2024-04-25  8:46                     ` Duan, Zhenzhong
@ 2024-04-25 12:40                       ` Cédric Le Goater
  2024-04-26  3:10                         ` Duan, Zhenzhong
  0 siblings, 1 reply; 19+ messages in thread
From: Cédric Le Goater @ 2024-04-25 12:40 UTC (permalink / raw)
  To: Duan, Zhenzhong, qemu-devel
  Cc: alex.williamson, eric.auger, peterx, jasowang, mst, jgg,
	nicolinc, joao.m.martins, Tian, Kevin, Liu, Yi L, Peng, Chao P,
	Yi Sun, Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
	Eduardo Habkost

On 4/25/24 10:46, Duan, Zhenzhong wrote:
> Hi Cédric,
> 
>> -----Original Message-----
>> From: Cédric Le Goater <clg@redhat.com>
>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>> compatibility check with host IOMMU cap/ecap
>>
>> Hello Zhenzhong,
>>
>> On 4/18/24 10:42, Duan, Zhenzhong wrote:
>>> Hi Cédric,
>>>
>>>> -----Original Message-----
>>>> From: Cédric Le Goater <clg@redhat.com>
>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>>> compatibility check with host IOMMU cap/ecap
>>>>
>>>> Hello Zhenzhong
>>>>
>>>> On 4/17/24 11:24, Duan, Zhenzhong wrote:
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Cédric Le Goater <clg@redhat.com>
>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>>>>> compatibility check with host IOMMU cap/ecap
>>>>>>
>>>>>> On 4/17/24 06:21, Duan, Zhenzhong wrote:
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Cédric Le Goater <clg@redhat.com>
>>>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>>>>>>> compatibility check with host IOMMU cap/ecap
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> On 4/16/24 09:09, Duan, Zhenzhong wrote:
>>>>>>>>> Hi Cédric,
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Cédric Le Goater <clg@redhat.com>
>>>>>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>>>>>>>>> compatibility check with host IOMMU cap/ecap
>>>>>>>>>>
>>>>>>>>>> On 4/8/24 10:44, Zhenzhong Duan wrote:
>>>>>>>>>>> From: Yi Liu <yi.l.liu@intel.com>
>>>>>>>>>>>
>>>>>>>>>>> If check fails, the host side device(either vfio or vdpa device)
>> should
>>>>>> not
>>>>>>>>>>> be passed to guest.
>>>>>>>>>>>
>>>>>>>>>>> Implementation details for different backends will be in
>> following
>>>>>>>> patches.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>>>>>>>>>>> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
>>>>>>>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>>>>>>>> ---
>>>>>>>>>>>        hw/i386/intel_iommu.c | 35
>>>>>>>>>> +++++++++++++++++++++++++++++++++++
>>>>>>>>>>>        1 file changed, 35 insertions(+)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>>>>>>>>>>> index 4f84e2e801..a49b587c73 100644
>>>>>>>>>>> --- a/hw/i386/intel_iommu.c
>>>>>>>>>>> +++ b/hw/i386/intel_iommu.c
>>>>>>>>>>> @@ -35,6 +35,7 @@
>>>>>>>>>>>        #include "sysemu/kvm.h"
>>>>>>>>>>>        #include "sysemu/dma.h"
>>>>>>>>>>>        #include "sysemu/sysemu.h"
>>>>>>>>>>> +#include "sysemu/iommufd.h"
>>>>>>>>>>>        #include "hw/i386/apic_internal.h"
>>>>>>>>>>>        #include "kvm/kvm_i386.h"
>>>>>>>>>>>        #include "migration/vmstate.h"
>>>>>>>>>>> @@ -3819,6 +3820,32 @@ VTDAddressSpace
>>>>>>>>>> *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
>>>>>>>>>>>            return vtd_dev_as;
>>>>>>>>>>>        }
>>>>>>>>>>>
>>>>>>>>>>> +static int vtd_check_legacy_hdev(IntelIOMMUState *s,
>>>>>>>>>>> +                                 HostIOMMUDevice *hiod,
>>>>>>>>>>> +                                 Error **errp)
>>>>>>>>>>> +{
>>>>>>>>>>> +    return 0;
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>> +static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
>>>>>>>>>>> +                                  HostIOMMUDevice *hiod,
>>>>>>>>>>> +                                  Error **errp)
>>>>>>>>>>> +{
>>>>>>>>>>> +    return 0;
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>> +static int vtd_check_hdev(IntelIOMMUState *s,
>>>>>>>> VTDHostIOMMUDevice
>>>>>>>>>> *vtd_hdev,
>>>>>>>>>>> +                          Error **errp)
>>>>>>>>>>> +{
>>>>>>>>>>> +    HostIOMMUDevice *hiod = vtd_hdev->dev;
>>>>>>>>>>> +
>>>>>>>>>>> +    if (object_dynamic_cast(OBJECT(hiod),
>> TYPE_HIOD_IOMMUFD))
>>>> {
>>>>>>>>>>> +        return vtd_check_iommufd_hdev(s, hiod, errp);
>>>>>>>>>>> +    }
>>>>>>>>>>> +
>>>>>>>>>>> +    return vtd_check_legacy_hdev(s, hiod, errp);
>>>>>>>>>>> +}
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I think we should be using the .get_host_iommu_info() class
>> handler
>>>>>>>>>> instead. Can we refactor the code slightly to avoid this check on
>>>>>>>>>> the type ?
>>>>>>>>>
>>>>>>>>> There is some difficulty ini avoiding this check, the behavior of
>>>>>>>> vtd_check_legacy_hdev
>>>>>>>>> and vtd_check_iommufd_hdev are different especially after
>> nesting
>>>>>>>> support introduced.
>>>>>>>>> vtd_check_iommufd_hdev() has much wider check over cap/ecap
>> bits
>>>>>>>> besides aw_bits.
>>>>>>>>
>>>>>>>> I think it is important to fully separate the vIOMMU model from the
>>>>>>>> host IOMMU backing device.
>>>>
>>>> This comment is true for the structures also.
>>>>
>>>>>>>> Could we introduce a new HostIOMMUDeviceClass
>>>>>>>> handler .check_hdev() handler, which would
>>>> call .get_host_iommu_info() ?
>>>>
>>>> This means that HIOD_LEGACY_INFO and HIOD_IOMMUFD_INFO should
>> be
>>>> a common structure 'HostIOMMUDeviceInfo' holding all attributes
>>>> for the different backends. Each .get_host_iommu_info() implementation
>>>> would translate the specific host iommu device data presentation
>>>> into the common 'HostIOMMUDeviceInfo', this is true for host_aw_bits.
>>>
>>> I see, it's just not easy to define the unified elements in
>> HostIOMMUDeviceInfo
>>> so that they maps to bits or fields in host return IOMMU info.
>>
>> The proposal is adding a vIOMMU <-> HostIOMMUDevice interface and a
>> new
>> API needs to be completely defined for it. The IOMMU backend
>> implementation
>> could be anything, legacy, iommufd, iommufd v2, some other framework
>> and
>> the vIOMMU shouldn't be aware of its implementation.
>>
>> Exposing the kernel structures as done below should be avoided because
>> they are part of the QEMU <-> kernel IOMMUFD interface.
>>
>>
>>> Different platform returned host IOMMU info is platform specific.
>>> For vtd and siommu:
>>>
>>> struct iommu_hw_info_vtd {
>>>           __u32 flags;
>>>           __u32 __reserved;
>>>           __aligned_u64 cap_reg;
>>>           __aligned_u64 ecap_reg;
>>> };
>>>
>>> struct iommu_hw_info_arm_smmuv3 {
>>>          __u32 flags;
>>>          __u32 __reserved;
>>>          __u32 idr[6];
>>>          __u32 iidr;
>>>          __u32 aidr;
>>> };
>>>
>>> I can think of two kinds of declaration of HostIOMMUDeviceInfo:
>>>
>>> struct HostIOMMUDeviceInfo {
>>>       uint8_t aw_bits;
>>>       enum iommu_hw_info_type type;
>>>       union {
>>>           struct iommu_hw_info_vtd vtd;
>>>           struct iommu_hw_info_arm_smmuv3;
>>>           ......
>>>       } data;
>>> }
>>>
>>> or
>>>
>>> struct HostIOMMUDeviceInfo {
>>>       uint8_t aw_bits;
>>>       enum iommu_hw_info_type type;
>>>       __u32 flags;
>>>       __aligned_u64 cap_reg;
>>>       __aligned_u64 ecap_reg;
>>>       __u32 idr[6];
>>>       __u32 iidr;
>>>       __u32 aidr;
>>>      ......
>>> }
>>>
>>> Not clear if any is your expected format.
>>>
>>>> 'type' could be handled the same way, with a 'HostIOMMUDeviceInfo'
>>>> type attribute and host iommu device type definitions, or as you
>>>> suggested with a QOM interface. This is more complex however. In
>>>> this case, I would suggest to implement a .compatible() handler to
>>>> compare the host iommu device type with the vIOMMU type.
>>>>
>>>> The resulting check_hdev routine would look something like :
>>>>
>>>> static int vtd_check_hdev(IntelIOMMUState *s, VTDHostIOMMUDevice
>>>> *vtd_hdev,
>>>>                             Error **errp)
>>>> {
>>>>       HostIOMMUDevice *hiod = vtd_hdev->dev;
>>>>       HostIOMMUDeviceClass *hiodc =
>>>> HOST_IOMMU_DEVICE_GET_CLASS(hiod);
>>>>       HostIOMMUDevice info;
>>>>       int host_aw_bits, ret;
>>>>
>>>>       ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp);
>>>>       if (ret) {
>>>>           return ret;
>>>>       }
>>>>
>>>>       ret = hiodc->is_compatible(hiod, VIOMMU_INTERFACE(s));
>>>>       if (ret) {
>>>>           return ret;
>>>>       }
>>>>
>>>>       if (s->aw_bits > info.aw_bits) {
>>>>           error_setg(errp, "aw-bits %d > host aw-bits %d",
>>>>                      s->aw_bits, info.aw_bits);
>>>>           return -EINVAL;
>>>>       }
>>>> }
>>>>
>>>> and the HostIOMMUDeviceClass::is_compatible() handler would call a
>>>> vIOMMUInterface::compatible() handler simply returning
>>>> IOMMU_HW_INFO_TYPE_INTEL_VTD. How does that sound ?
>>>
>>> Not quite get what HostIOMMUDeviceClass::is_compatible() does.
>>
>> HostIOMMUDeviceClass::is_compatible() calls in the host IOMMU backend
>> to determine which IOMMU types are exposed by the host, then calls the
>> vIOMMUInterface::compatible() handler to do the compare. API is to be
>> defined.
>>
>> As a refinement, we could introduce in the vIOMMU <-> HostIOMMUDevice
>> interface capabilities, or features, to check more precisely the level
>> of compatibility between the vIOMMU and the host IOMMU device. This is
>> similar to what is done between QEMU and KVM.
>>
>> If you think this is too complex, include type in HostIOMMUDeviceInfo.
>>
>>> Currently legacy and IOMMUFD host device has different check logic, how
>> it can help
>>> in merging vtd_check_legacy_hdev() and vtd_check_iommufd_hdev() into
>> a single vtd_check_hdev()?
>>
>> IMHO, IOMMU shouldn't be aware of the IOMMU backend implementation,
>> but
>> if you think the Intel vIOMMU should access directly the iommufd backend
>> when available, then we should drop this proposal and revisit the design
>> to take a different approach.
> 
> I implemented a draft following your suggestions so we could explore further.
> See https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_nesting_preq_v3_tmp
> 
> In this draft, it uses .check_cap() to query HOST_IOMMU_DEVICE_CAP_xxx
> just like KVM CAPs.
> A common HostIOMMUDeviceCaps structure is introduced to be used by
> both legacy and iommufd backend.
> 
> It indeed is cleaner. Only problem is I failed to implement .compatible()
> as all the check could go ahead by just calling check_cap().
> Could you help a quick check to see if I misunderstood any of your suggestion?

Thanks for the changes. It looks cleaner and simpler ! Some comments,

* HostIOMMUDeviceIOMMUFDClass seems useless as it is empty. I don't
   remember if you told me already you had plans for future changes.
   Sorry about that if this is the case. I forgot.

* I would use the 'host_iommu_device_' prefix for external routines
   which are part of the HostIOMMUDevice API and use 'hiod_' for
   internal routines where it makes sense, to limit the name length for
   instance.

* I would rename HOST_IOMMU_DEVICE_CAP_IOMMUFD_V1 to
   HOST_IOMMU_DEVICE_CAP_IOMMUFD. I mentioned IOMMUFD v2 as a
   theoretical example of a different IOMMU interface. I don't think we
   need to anticipate yet :)

* HostIOMMUDeviceCaps is using 'enum iommu_hw_info_type' from
   'linux/iommufd.h', that's not my preferred choice but I won't
   object. The result looks good.

* HostIOMMUDevice now has a realize() routine to query the host IOMMU
   capability for later usage. This is a good idea. However, you could
   change the return value to bool and avoid the ERRP_GUARD() prologue.

* Beware of :
     
     struct Range {
         /*
          * Do not access members directly, use the functions!
          * A non-empty range has @lob <= @upb.
          * An empty range has @lob == @upb + 1.
          */
         uint64_t lob;        /* inclusive lower bound */
         uint64_t upb;        /* inclusive upper bound */
     };
     

* I think you could introduce a new VFIOIOMMUClass attribute. Let's
   call it VFIOIOMMUClass::hiod_typename. The creation of HostIOMMUDevice
   would become generic and you could move :

     hiod= HOST_IOMMU_DEVICE(object_new(TYPE_HOST_IOMMU_DEVICE_LEGACY_VFIO));
     HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp);
     if (*errp) {
         object_unref(hiod);
         return -EINVAL;
     }
     vbasedev->hiod = hiod;

   at the end of vfio_attach_device().


Thanks,

C.




^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH v2 3/5] intel_iommu: Add a framework to do compatibility check with host IOMMU cap/ecap
  2024-04-25 12:40                       ` Cédric Le Goater
@ 2024-04-26  3:10                         ` Duan, Zhenzhong
  0 siblings, 0 replies; 19+ messages in thread
From: Duan, Zhenzhong @ 2024-04-26  3:10 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: alex.williamson, eric.auger, peterx, jasowang, mst, jgg,
	nicolinc, joao.m.martins, Tian, Kevin, Liu, Yi L, Peng, Chao P,
	Yi Sun, Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
	Eduardo Habkost



>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>compatibility check with host IOMMU cap/ecap
>
>On 4/25/24 10:46, Duan, Zhenzhong wrote:
>> Hi Cédric,
>>
>>> -----Original Message-----
>>> From: Cédric Le Goater <clg@redhat.com>
>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>> compatibility check with host IOMMU cap/ecap
>>>
>>> Hello Zhenzhong,
>>>
>>> On 4/18/24 10:42, Duan, Zhenzhong wrote:
>>>> Hi Cédric,
>>>>
>>>>> -----Original Message-----
>>>>> From: Cédric Le Goater <clg@redhat.com>
>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>>>> compatibility check with host IOMMU cap/ecap
>>>>>
>>>>> Hello Zhenzhong
>>>>>
>>>>> On 4/17/24 11:24, Duan, Zhenzhong wrote:
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Cédric Le Goater <clg@redhat.com>
>>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>>>>>> compatibility check with host IOMMU cap/ecap
>>>>>>>
>>>>>>> On 4/17/24 06:21, Duan, Zhenzhong wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Cédric Le Goater <clg@redhat.com>
>>>>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>>>>>>>> compatibility check with host IOMMU cap/ecap
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> On 4/16/24 09:09, Duan, Zhenzhong wrote:
>>>>>>>>>> Hi Cédric,
>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: Cédric Le Goater <clg@redhat.com>
>>>>>>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to
>do
>>>>>>>>>>> compatibility check with host IOMMU cap/ecap
>>>>>>>>>>>
>>>>>>>>>>> On 4/8/24 10:44, Zhenzhong Duan wrote:
>>>>>>>>>>>> From: Yi Liu <yi.l.liu@intel.com>
>>>>>>>>>>>>
>>>>>>>>>>>> If check fails, the host side device(either vfio or vdpa device)
>>> should
>>>>>>> not
>>>>>>>>>>>> be passed to guest.
>>>>>>>>>>>>
>>>>>>>>>>>> Implementation details for different backends will be in
>>> following
>>>>>>>>> patches.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>>>>>>>>>>>> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
>>>>>>>>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>>        hw/i386/intel_iommu.c | 35
>>>>>>>>>>> +++++++++++++++++++++++++++++++++++
>>>>>>>>>>>>        1 file changed, 35 insertions(+)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>>>>>>>>>>>> index 4f84e2e801..a49b587c73 100644
>>>>>>>>>>>> --- a/hw/i386/intel_iommu.c
>>>>>>>>>>>> +++ b/hw/i386/intel_iommu.c
>>>>>>>>>>>> @@ -35,6 +35,7 @@
>>>>>>>>>>>>        #include "sysemu/kvm.h"
>>>>>>>>>>>>        #include "sysemu/dma.h"
>>>>>>>>>>>>        #include "sysemu/sysemu.h"
>>>>>>>>>>>> +#include "sysemu/iommufd.h"
>>>>>>>>>>>>        #include "hw/i386/apic_internal.h"
>>>>>>>>>>>>        #include "kvm/kvm_i386.h"
>>>>>>>>>>>>        #include "migration/vmstate.h"
>>>>>>>>>>>> @@ -3819,6 +3820,32 @@ VTDAddressSpace
>>>>>>>>>>> *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
>>>>>>>>>>>>            return vtd_dev_as;
>>>>>>>>>>>>        }
>>>>>>>>>>>>
>>>>>>>>>>>> +static int vtd_check_legacy_hdev(IntelIOMMUState *s,
>>>>>>>>>>>> +                                 HostIOMMUDevice *hiod,
>>>>>>>>>>>> +                                 Error **errp)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    return 0;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
>>>>>>>>>>>> +                                  HostIOMMUDevice *hiod,
>>>>>>>>>>>> +                                  Error **errp)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    return 0;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +static int vtd_check_hdev(IntelIOMMUState *s,
>>>>>>>>> VTDHostIOMMUDevice
>>>>>>>>>>> *vtd_hdev,
>>>>>>>>>>>> +                          Error **errp)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    HostIOMMUDevice *hiod = vtd_hdev->dev;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    if (object_dynamic_cast(OBJECT(hiod),
>>> TYPE_HIOD_IOMMUFD))
>>>>> {
>>>>>>>>>>>> +        return vtd_check_iommufd_hdev(s, hiod, errp);
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +
>>>>>>>>>>>> +    return vtd_check_legacy_hdev(s, hiod, errp);
>>>>>>>>>>>> +}
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I think we should be using the .get_host_iommu_info() class
>>> handler
>>>>>>>>>>> instead. Can we refactor the code slightly to avoid this check on
>>>>>>>>>>> the type ?
>>>>>>>>>>
>>>>>>>>>> There is some difficulty ini avoiding this check, the behavior of
>>>>>>>>> vtd_check_legacy_hdev
>>>>>>>>>> and vtd_check_iommufd_hdev are different especially after
>>> nesting
>>>>>>>>> support introduced.
>>>>>>>>>> vtd_check_iommufd_hdev() has much wider check over
>cap/ecap
>>> bits
>>>>>>>>> besides aw_bits.
>>>>>>>>>
>>>>>>>>> I think it is important to fully separate the vIOMMU model from
>the
>>>>>>>>> host IOMMU backing device.
>>>>>
>>>>> This comment is true for the structures also.
>>>>>
>>>>>>>>> Could we introduce a new HostIOMMUDeviceClass
>>>>>>>>> handler .check_hdev() handler, which would
>>>>> call .get_host_iommu_info() ?
>>>>>
>>>>> This means that HIOD_LEGACY_INFO and HIOD_IOMMUFD_INFO
>should
>>> be
>>>>> a common structure 'HostIOMMUDeviceInfo' holding all attributes
>>>>> for the different backends. Each .get_host_iommu_info()
>implementation
>>>>> would translate the specific host iommu device data presentation
>>>>> into the common 'HostIOMMUDeviceInfo', this is true for
>host_aw_bits.
>>>>
>>>> I see, it's just not easy to define the unified elements in
>>> HostIOMMUDeviceInfo
>>>> so that they maps to bits or fields in host return IOMMU info.
>>>
>>> The proposal is adding a vIOMMU <-> HostIOMMUDevice interface and a
>>> new
>>> API needs to be completely defined for it. The IOMMU backend
>>> implementation
>>> could be anything, legacy, iommufd, iommufd v2, some other framework
>>> and
>>> the vIOMMU shouldn't be aware of its implementation.
>>>
>>> Exposing the kernel structures as done below should be avoided because
>>> they are part of the QEMU <-> kernel IOMMUFD interface.
>>>
>>>
>>>> Different platform returned host IOMMU info is platform specific.
>>>> For vtd and siommu:
>>>>
>>>> struct iommu_hw_info_vtd {
>>>>           __u32 flags;
>>>>           __u32 __reserved;
>>>>           __aligned_u64 cap_reg;
>>>>           __aligned_u64 ecap_reg;
>>>> };
>>>>
>>>> struct iommu_hw_info_arm_smmuv3 {
>>>>          __u32 flags;
>>>>          __u32 __reserved;
>>>>          __u32 idr[6];
>>>>          __u32 iidr;
>>>>          __u32 aidr;
>>>> };
>>>>
>>>> I can think of two kinds of declaration of HostIOMMUDeviceInfo:
>>>>
>>>> struct HostIOMMUDeviceInfo {
>>>>       uint8_t aw_bits;
>>>>       enum iommu_hw_info_type type;
>>>>       union {
>>>>           struct iommu_hw_info_vtd vtd;
>>>>           struct iommu_hw_info_arm_smmuv3;
>>>>           ......
>>>>       } data;
>>>> }
>>>>
>>>> or
>>>>
>>>> struct HostIOMMUDeviceInfo {
>>>>       uint8_t aw_bits;
>>>>       enum iommu_hw_info_type type;
>>>>       __u32 flags;
>>>>       __aligned_u64 cap_reg;
>>>>       __aligned_u64 ecap_reg;
>>>>       __u32 idr[6];
>>>>       __u32 iidr;
>>>>       __u32 aidr;
>>>>      ......
>>>> }
>>>>
>>>> Not clear if any is your expected format.
>>>>
>>>>> 'type' could be handled the same way, with a 'HostIOMMUDeviceInfo'
>>>>> type attribute and host iommu device type definitions, or as you
>>>>> suggested with a QOM interface. This is more complex however. In
>>>>> this case, I would suggest to implement a .compatible() handler to
>>>>> compare the host iommu device type with the vIOMMU type.
>>>>>
>>>>> The resulting check_hdev routine would look something like :
>>>>>
>>>>> static int vtd_check_hdev(IntelIOMMUState *s,
>VTDHostIOMMUDevice
>>>>> *vtd_hdev,
>>>>>                             Error **errp)
>>>>> {
>>>>>       HostIOMMUDevice *hiod = vtd_hdev->dev;
>>>>>       HostIOMMUDeviceClass *hiodc =
>>>>> HOST_IOMMU_DEVICE_GET_CLASS(hiod);
>>>>>       HostIOMMUDevice info;
>>>>>       int host_aw_bits, ret;
>>>>>
>>>>>       ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp);
>>>>>       if (ret) {
>>>>>           return ret;
>>>>>       }
>>>>>
>>>>>       ret = hiodc->is_compatible(hiod, VIOMMU_INTERFACE(s));
>>>>>       if (ret) {
>>>>>           return ret;
>>>>>       }
>>>>>
>>>>>       if (s->aw_bits > info.aw_bits) {
>>>>>           error_setg(errp, "aw-bits %d > host aw-bits %d",
>>>>>                      s->aw_bits, info.aw_bits);
>>>>>           return -EINVAL;
>>>>>       }
>>>>> }
>>>>>
>>>>> and the HostIOMMUDeviceClass::is_compatible() handler would call a
>>>>> vIOMMUInterface::compatible() handler simply returning
>>>>> IOMMU_HW_INFO_TYPE_INTEL_VTD. How does that sound ?
>>>>
>>>> Not quite get what HostIOMMUDeviceClass::is_compatible() does.
>>>
>>> HostIOMMUDeviceClass::is_compatible() calls in the host IOMMU
>backend
>>> to determine which IOMMU types are exposed by the host, then calls the
>>> vIOMMUInterface::compatible() handler to do the compare. API is to be
>>> defined.
>>>
>>> As a refinement, we could introduce in the vIOMMU <->
>HostIOMMUDevice
>>> interface capabilities, or features, to check more precisely the level
>>> of compatibility between the vIOMMU and the host IOMMU device. This
>is
>>> similar to what is done between QEMU and KVM.
>>>
>>> If you think this is too complex, include type in HostIOMMUDeviceInfo.
>>>
>>>> Currently legacy and IOMMUFD host device has different check logic,
>how
>>> it can help
>>>> in merging vtd_check_legacy_hdev() and vtd_check_iommufd_hdev()
>into
>>> a single vtd_check_hdev()?
>>>
>>> IMHO, IOMMU shouldn't be aware of the IOMMU backend
>implementation,
>>> but
>>> if you think the Intel vIOMMU should access directly the iommufd
>backend
>>> when available, then we should drop this proposal and revisit the design
>>> to take a different approach.
>>
>> I implemented a draft following your suggestions so we could explore
>further.
>> See
>https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_nesting_pre
>q_v3_tmp
>>
>> In this draft, it uses .check_cap() to query HOST_IOMMU_DEVICE_CAP_xxx
>> just like KVM CAPs.
>> A common HostIOMMUDeviceCaps structure is introduced to be used by
>> both legacy and iommufd backend.
>>
>> It indeed is cleaner. Only problem is I failed to implement .compatible()
>> as all the check could go ahead by just calling check_cap().
>> Could you help a quick check to see if I misunderstood any of your
>suggestion?
>
>Thanks for the changes. It looks cleaner and simpler ! Some comments,
>
>* HostIOMMUDeviceIOMMUFDClass seems useless as it is empty. I don't
>   remember if you told me already you had plans for future changes.
>   Sorry about that if this is the case. I forgot.

Never mind😊, reason is:

In nesting series 
https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_nesting_rfcv2/
This commit 
https://github.com/yiliu1765/qemu/commit/581fc900aa296988eaa48abee6d68d3670faf8c9
implement [at|de]tach_hwpt handlers.

So I add an extra layer of abstract HostIOMMUDeviceIOMMUFDClass to define
[at|de]tach_hwpt handlers.

>
>* I would use the 'host_iommu_device_' prefix for external routines
>   which are part of the HostIOMMUDevice API and use 'hiod_' for
>   internal routines where it makes sense, to limit the name length for
>   instance.

Good idea, will do.

>
>* I would rename HOST_IOMMU_DEVICE_CAP_IOMMUFD_V1 to
>   HOST_IOMMU_DEVICE_CAP_IOMMUFD. I mentioned IOMMUFD v2 as a
>   theoretical example of a different IOMMU interface. I don't think we
>   need to anticipate yet :)

Will do.

>
>* HostIOMMUDeviceCaps is using 'enum iommu_hw_info_type' from
>   'linux/iommufd.h', that's not my preferred choice but I won't
>   object. The result looks good.

Ok, will keep it for now. We can change when you want in future.

>
>* HostIOMMUDevice now has a realize() routine to query the host IOMMU
>   capability for later usage. This is a good idea. However, you could
>   change the return value to bool and avoid the ERRP_GUARD() prologue.

Will do.

>
>* Beware of :
>
>     struct Range {
>         /*
>          * Do not access members directly, use the functions!
>          * A non-empty range has @lob <= @upb.
>          * An empty range has @lob == @upb + 1.
>          */
>         uint64_t lob;        /* inclusive lower bound */
>         uint64_t upb;        /* inclusive upper bound */
>     };

I remember😊, will add the change in formal version.

>
>
>* I think you could introduce a new VFIOIOMMUClass attribute. Let's
>   call it VFIOIOMMUClass::hiod_typename. The creation of
>HostIOMMUDevice
>   would become generic and you could move :
>
>     hiod=
>HOST_IOMMU_DEVICE(object_new(TYPE_HOST_IOMMU_DEVICE_LEGACY_V
>FIO));
>     HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp);
>     if (*errp) {
>         object_unref(hiod);
>         return -EINVAL;
>     }
>     vbasedev->hiod = hiod;
>
>   at the end of vfio_attach_device().

Good suggestion! Will do.

Thanks
Zhenzhong

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2024-04-26  3:11 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-08  8:43 [PATCH v2 0/5] Check host IOMMU compatilibity with vIOMMU Zhenzhong Duan
2024-04-08  8:44 ` [PATCH v2 1/5] intel_iommu: Extract out vtd_cap_init() to initialize cap/ecap Zhenzhong Duan
2024-04-08  8:44 ` [PATCH v2 2/5] intel_iommu: Implement set/unset_iommu_device() callback Zhenzhong Duan
2024-04-08  8:44 ` [PATCH v2 3/5] intel_iommu: Add a framework to do compatibility check with host IOMMU cap/ecap Zhenzhong Duan
2024-04-15 15:31   ` Cédric Le Goater
2024-04-16  7:09     ` Duan, Zhenzhong
2024-04-16 14:17       ` Cédric Le Goater
2024-04-17  4:21         ` Duan, Zhenzhong
2024-04-17  8:30           ` Cédric Le Goater
2024-04-17  9:24             ` Duan, Zhenzhong
2024-04-18  6:42               ` Cédric Le Goater
2024-04-18  8:42                 ` Duan, Zhenzhong
2024-04-19  6:20                   ` Cédric Le Goater
2024-04-19  9:49                     ` Duan, Zhenzhong
2024-04-25  8:46                     ` Duan, Zhenzhong
2024-04-25 12:40                       ` Cédric Le Goater
2024-04-26  3:10                         ` Duan, Zhenzhong
2024-04-08  8:44 ` [PATCH v2 4/5] intel_iommu: Check for compatibility with legacy device Zhenzhong Duan
2024-04-08  8:44 ` [PATCH v2 5/5] intel_iommu: Check for compatibility with iommufd backed device Zhenzhong Duan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.