* [PATCH 0/3] PASID support for Intel IOMMU
@ 2022-01-05  4:19 Jason Wang
  2022-01-05  4:19 ` [PATCH] intel-iommu: correctly check passthrough during translation Jason Wang
                   ` (3 more replies)
  0 siblings, 4 replies; 28+ messages in thread
From: Jason Wang @ 2022-01-05  4:19 UTC (permalink / raw)
  To: mst, peterx; +Cc: Jason Wang, yi.l.liu, yi.y.sun, qemu-devel

Hi All:

This series introduces PASID support for the Intel IOMMU. The work
builds on the previous scalable mode support by implementing
ECAP_PASID. A new "x-pasid-mode" option is introduced to enable this
mode. All internal vIOMMU code has been extended to support PASID
instead of the current RID2PASID method. The code is also capable of
provisioning address spaces with PASID. Note that no device can issue
PASID DMA right now; this needs future work.
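
As a sketch only (the exact machine and device flags here are
illustrative), the new mode is expected to be enabled on top of
scalable mode roughly like:

  qemu-system-x86_64 -M q35,kernel-irqchip=split -enable-kvm \
      -device intel-iommu,intremap=on,x-scalable-mode=on,x-pasid-mode=on \
      ...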

This will be used for prototyping PASID-based devices like virtio or
for future vPASID support in the Intel IOMMU.

Testing has been done with a Linux guest with scalable mode enabled
and disabled. A virtio prototype[1][2] that can issue PASID-based DMA
requests was also tested.
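
On the guest side, scalable mode was toggled with the usual intel_iommu
kernel command line options (assuming a reasonably recent guest kernel):

  intel_iommu=on,sm_on     # scalable mode enabled
  intel_iommu=on,sm_off    # scalable mode disabled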

This series depends on the fix of passthrough mode:

https://lore.kernel.org/all/20211222063956.2871-1-jasowang@redhat.com/T/

Please review.

[1] https://github.com/jasowang/qemu.git virtio-pasid
[2] https://github.com/jasowang/linux.git virtio-pasid

Jason Wang (3):
  intel-iommu: don't warn guest errors when getting rid2pasid entry
  intel-iommu: drop VTDBus
  intel-iommu: PASID support

 hw/i386/intel_iommu.c          | 538 +++++++++++++++++++++------------
 hw/i386/intel_iommu_internal.h |  14 +-
 hw/i386/trace-events           |   2 +
 include/hw/i386/intel_iommu.h  |  16 +-
 include/hw/pci/pci_bus.h       |   2 +
 5 files changed, 359 insertions(+), 213 deletions(-)

-- 
2.25.1




* [PATCH] intel-iommu: correctly check passthrough during translation
  2022-01-05  4:19 [PATCH 0/3] PASID support for Intel IOMMU Jason Wang
@ 2022-01-05  4:19 ` Jason Wang
  2022-01-05  4:19 ` [PATCH 1/3] intel-iommu: don't warn guest errors when getting rid2pasid entry Jason Wang
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 28+ messages in thread
From: Jason Wang @ 2022-01-05  4:19 UTC (permalink / raw)
  To: mst, peterx; +Cc: Jason Wang, yi.l.liu, yi.y.sun, qemu-devel

When scalable mode is enabled, the passthrough mode is not determined
by the context entry but by the PASID entry, so switch to the logic of
vtd_dev_pt_enabled() to determine the passthrough mode in
vtd_do_iommu_translate().

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/i386/intel_iommu.c | 38 +++++++++++++++++++++++---------------
 1 file changed, 23 insertions(+), 15 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index f584449d8d..f346a82652 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1512,11 +1512,29 @@ static int vtd_sync_shadow_page_table(VTDAddressSpace *vtd_as)
  * 1st-level translation or 2nd-level translation, it depends
  * on PGTT setting.
  */
-static bool vtd_dev_pt_enabled(VTDAddressSpace *as)
+static bool vtd_dev_pt_enabled(IntelIOMMUState *s, VTDContextEntry *ce)
+{
+    VTDPASIDEntry pe;
+    int ret;
+
+    if (s->root_scalable) {
+        ret = vtd_ce_get_rid2pasid_entry(s, ce, &pe);
+        if (ret) {
+            error_report_once("%s: vtd_ce_get_rid2pasid_entry error: %"PRId32,
+                              __func__, ret);
+            return false;
+        }
+        return (VTD_PE_GET_TYPE(&pe) == VTD_SM_PASID_ENTRY_PT);
+    }
+
+    return (vtd_ce_get_type(ce) == VTD_CONTEXT_TT_PASS_THROUGH);
+
+}
+
+static bool vtd_as_pt_enabled(VTDAddressSpace *as)
 {
     IntelIOMMUState *s;
     VTDContextEntry ce;
-    VTDPASIDEntry pe;
     int ret;
 
     assert(as);
@@ -1534,17 +1552,7 @@ static bool vtd_dev_pt_enabled(VTDAddressSpace *as)
         return false;
     }
 
-    if (s->root_scalable) {
-        ret = vtd_ce_get_rid2pasid_entry(s, &ce, &pe);
-        if (ret) {
-            error_report_once("%s: vtd_ce_get_rid2pasid_entry error: %"PRId32,
-                              __func__, ret);
-            return false;
-        }
-        return (VTD_PE_GET_TYPE(&pe) == VTD_SM_PASID_ENTRY_PT);
-    }
-
-    return (vtd_ce_get_type(&ce) == VTD_CONTEXT_TT_PASS_THROUGH);
+    return vtd_dev_pt_enabled(s, &ce);
 }
 
 /* Return whether the device is using IOMMU translation. */
@@ -1556,7 +1564,7 @@ static bool vtd_switch_address_space(VTDAddressSpace *as)
 
     assert(as);
 
-    use_iommu = as->iommu_state->dmar_enabled && !vtd_dev_pt_enabled(as);
+    use_iommu = as->iommu_state->dmar_enabled && !vtd_as_pt_enabled(as);
 
     trace_vtd_switch_address_space(pci_bus_num(as->bus),
                                    VTD_PCI_SLOT(as->devfn),
@@ -1749,7 +1757,7 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
      * We don't need to translate for pass-through context entries.
      * Also, let's ignore IOTLB caching as well for PT devices.
      */
-    if (vtd_ce_get_type(&ce) == VTD_CONTEXT_TT_PASS_THROUGH) {
+    if (vtd_dev_pt_enabled(s, &ce)) {
         entry->iova = addr & VTD_PAGE_MASK_4K;
         entry->translated_addr = entry->iova;
         entry->addr_mask = ~VTD_PAGE_MASK_4K;
-- 
2.25.1




* [PATCH 1/3] intel-iommu: don't warn guest errors when getting rid2pasid entry
  2022-01-05  4:19 [PATCH 0/3] PASID support for Intel IOMMU Jason Wang
  2022-01-05  4:19 ` [PATCH] intel-iommu: correctly check passthrough during translation Jason Wang
@ 2022-01-05  4:19 ` Jason Wang
  2022-01-13  3:35   ` Peter Xu
  2022-01-13  7:06   ` Michael S. Tsirkin
  2022-01-05  4:19 ` [PATCH 2/3] intel-iommu: drop VTDBus Jason Wang
  2022-01-05  4:19 ` [PATCH 3/3] intel-iommu: PASID support Jason Wang
  3 siblings, 2 replies; 28+ messages in thread
From: Jason Wang @ 2022-01-05  4:19 UTC (permalink / raw)
  To: mst, peterx; +Cc: Jason Wang, yi.l.liu, yi.y.sun, qemu-devel

We used to warn on a wrong rid2pasid entry. But this error can be
triggered by the guest and can happen during initialization, so let's
not warn in this case.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/i386/intel_iommu.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 4c6c016388..f2c7a23712 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1524,8 +1524,10 @@ static bool vtd_dev_pt_enabled(IntelIOMMUState *s, VTDContextEntry *ce)
     if (s->root_scalable) {
         ret = vtd_ce_get_rid2pasid_entry(s, ce, &pe);
         if (ret) {
-            error_report_once("%s: vtd_ce_get_rid2pasid_entry error: %"PRId32,
-                              __func__, ret);
+            /*
+             * This error is guest triggerable. We should assume PT
+             * is not enabled for safety.
+             */
             return false;
         }
         return (VTD_PE_GET_TYPE(&pe) == VTD_SM_PASID_ENTRY_PT);
-- 
2.25.1




* [PATCH 2/3] intel-iommu: drop VTDBus
  2022-01-05  4:19 [PATCH 0/3] PASID support for Intel IOMMU Jason Wang
  2022-01-05  4:19 ` [PATCH] intel-iommu: correctly check passthrough during translation Jason Wang
  2022-01-05  4:19 ` [PATCH 1/3] intel-iommu: don't warn guest errors when getting rid2pasid entry Jason Wang
@ 2022-01-05  4:19 ` Jason Wang
  2022-01-13  4:12   ` Peter Xu
  2022-01-05  4:19 ` [PATCH 3/3] intel-iommu: PASID support Jason Wang
  3 siblings, 1 reply; 28+ messages in thread
From: Jason Wang @ 2022-01-05  4:19 UTC (permalink / raw)
  To: mst, peterx; +Cc: Jason Wang, yi.l.liu, yi.y.sun, qemu-devel

The VTDBus structure was introduced as an intermediate step for
looking up the address space. This works well with SID based
matching/lookup, but when we want to support SID plus PASID based
address space lookup, this intermediate step turns out to be a
burden. So this patch simply drops the VTDBus structure and uses the
PCIBus and devfn as the key for the g_hash_table(). This simplifies
the code and the future PASID extension.
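
With this change an address space is looked up directly with a
composite (bus, devfn) key; a minimal sketch of the pattern, using the
identifiers introduced below:

  struct vtd_as_key key = {
      .bus = bus,
      .devfn = devfn,
  };
  VTDAddressSpace *vtd_as = g_hash_table_lookup(s->vtd_as, &key);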

This may be slightly slower for the former vtd_find_as_from_bus_num()
callers, but since they are all on the control path, we can afford it.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/i386/intel_iommu.c         | 183 +++++++++++++---------------------
 include/hw/i386/intel_iommu.h |  10 +-
 2 files changed, 69 insertions(+), 124 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index f2c7a23712..58c682097b 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -61,6 +61,11 @@
     }                                                                         \
 }
 
+struct vtd_as_key {
+    PCIBus *bus;
+    uint8_t devfn;
+};
+
 static void vtd_address_space_refresh_all(IntelIOMMUState *s);
 static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n);
 
@@ -190,12 +195,18 @@ static inline gboolean vtd_as_has_map_notifier(VTDAddressSpace *as)
 /* GHashTable functions */
 static gboolean vtd_uint64_equal(gconstpointer v1, gconstpointer v2)
 {
-    return *((const uint64_t *)v1) == *((const uint64_t *)v2);
+    const struct vtd_as_key *key1 = v1;
+    const struct vtd_as_key *key2 = v2;
+
+    return (key1->bus == key2->bus) && (key1->devfn == key2->devfn);
 }
 
 static guint vtd_uint64_hash(gconstpointer v)
 {
-    return (guint)*(const uint64_t *)v;
+    const struct vtd_as_key *key = v;
+    guint value = (guint)(uintptr_t)key->bus;
+
+    return (guint)(value << 8 | key->devfn);
 }
 
 static gboolean vtd_hash_remove_by_domain(gpointer key, gpointer value,
@@ -236,22 +247,14 @@ static gboolean vtd_hash_remove_by_page(gpointer key, gpointer value,
 static void vtd_reset_context_cache_locked(IntelIOMMUState *s)
 {
     VTDAddressSpace *vtd_as;
-    VTDBus *vtd_bus;
-    GHashTableIter bus_it;
-    uint32_t devfn_it;
+    GHashTableIter as_it;
 
     trace_vtd_context_cache_reset();
 
-    g_hash_table_iter_init(&bus_it, s->vtd_as_by_busptr);
+    g_hash_table_iter_init(&as_it, s->vtd_as);
 
-    while (g_hash_table_iter_next (&bus_it, NULL, (void**)&vtd_bus)) {
-        for (devfn_it = 0; devfn_it < PCI_DEVFN_MAX; ++devfn_it) {
-            vtd_as = vtd_bus->dev_as[devfn_it];
-            if (!vtd_as) {
-                continue;
-            }
-            vtd_as->context_cache_entry.context_cache_gen = 0;
-        }
+    while (g_hash_table_iter_next (&as_it, NULL, (void**)&vtd_as)) {
+        vtd_as->context_cache_entry.context_cache_gen = 0;
     }
     s->context_cache_gen = 1;
 }
@@ -986,32 +989,6 @@ static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, uint32_t level)
     return slpte & rsvd_mask;
 }
 
-/* Find the VTD address space associated with a given bus number */
-static VTDBus *vtd_find_as_from_bus_num(IntelIOMMUState *s, uint8_t bus_num)
-{
-    VTDBus *vtd_bus = s->vtd_as_by_bus_num[bus_num];
-    GHashTableIter iter;
-
-    if (vtd_bus) {
-        return vtd_bus;
-    }
-
-    /*
-     * Iterate over the registered buses to find the one which
-     * currently holds this bus number and update the bus_num
-     * lookup table.
-     */
-    g_hash_table_iter_init(&iter, s->vtd_as_by_busptr);
-    while (g_hash_table_iter_next(&iter, NULL, (void **)&vtd_bus)) {
-        if (pci_bus_num(vtd_bus->bus) == bus_num) {
-            s->vtd_as_by_bus_num[bus_num] = vtd_bus;
-            return vtd_bus;
-        }
-    }
-
-    return NULL;
-}
-
 /* Given the @iova, get relevant @slptep. @slpte_level will be the last level
  * of the translation, can be used for deciding the size of large page.
  */
@@ -1604,18 +1581,12 @@ static bool vtd_switch_address_space(VTDAddressSpace *as)
 
 static void vtd_switch_address_space_all(IntelIOMMUState *s)
 {
+    VTDAddressSpace *vtd_as;
     GHashTableIter iter;
-    VTDBus *vtd_bus;
-    int i;
-
-    g_hash_table_iter_init(&iter, s->vtd_as_by_busptr);
-    while (g_hash_table_iter_next(&iter, NULL, (void **)&vtd_bus)) {
-        for (i = 0; i < PCI_DEVFN_MAX; i++) {
-            if (!vtd_bus->dev_as[i]) {
-                continue;
-            }
-            vtd_switch_address_space(vtd_bus->dev_as[i]);
-        }
+
+    g_hash_table_iter_init(&iter, s->vtd_as);
+    while (g_hash_table_iter_next(&iter, NULL, (void **)&vtd_as)) {
+        vtd_switch_address_space(vtd_as);
     }
 }
 
@@ -1659,16 +1630,11 @@ static inline bool vtd_is_interrupt_addr(hwaddr addr)
 
 static void vtd_pt_enable_fast_path(IntelIOMMUState *s, uint16_t source_id)
 {
-    VTDBus *vtd_bus;
     VTDAddressSpace *vtd_as;
     bool success = false;
+    uintptr_t key = (uintptr_t)source_id;
 
-    vtd_bus = vtd_find_as_from_bus_num(s, VTD_SID_TO_BUS(source_id));
-    if (!vtd_bus) {
-        goto out;
-    }
-
-    vtd_as = vtd_bus->dev_as[VTD_SID_TO_DEVFN(source_id)];
+    vtd_as = g_hash_table_lookup(s->vtd_as, &key);
     if (!vtd_as) {
         goto out;
     }
@@ -1876,11 +1842,10 @@ static void vtd_context_device_invalidate(IntelIOMMUState *s,
                                           uint16_t source_id,
                                           uint16_t func_mask)
 {
+    GHashTableIter as_it;
     uint16_t mask;
-    VTDBus *vtd_bus;
     VTDAddressSpace *vtd_as;
     uint8_t bus_n, devfn;
-    uint16_t devfn_it;
 
     trace_vtd_inv_desc_cc_devices(source_id, func_mask);
 
@@ -1903,32 +1868,31 @@ static void vtd_context_device_invalidate(IntelIOMMUState *s,
     mask = ~mask;
 
     bus_n = VTD_SID_TO_BUS(source_id);
-    vtd_bus = vtd_find_as_from_bus_num(s, bus_n);
-    if (vtd_bus) {
-        devfn = VTD_SID_TO_DEVFN(source_id);
-        for (devfn_it = 0; devfn_it < PCI_DEVFN_MAX; ++devfn_it) {
-            vtd_as = vtd_bus->dev_as[devfn_it];
-            if (vtd_as && ((devfn_it & mask) == (devfn & mask))) {
-                trace_vtd_inv_desc_cc_device(bus_n, VTD_PCI_SLOT(devfn_it),
-                                             VTD_PCI_FUNC(devfn_it));
-                vtd_iommu_lock(s);
-                vtd_as->context_cache_entry.context_cache_gen = 0;
-                vtd_iommu_unlock(s);
-                /*
-                 * Do switch address space when needed, in case if the
-                 * device passthrough bit is switched.
-                 */
-                vtd_switch_address_space(vtd_as);
-                /*
-                 * So a device is moving out of (or moving into) a
-                 * domain, resync the shadow page table.
-                 * This won't bring bad even if we have no such
-                 * notifier registered - the IOMMU notification
-                 * framework will skip MAP notifications if that
-                 * happened.
-                 */
-                vtd_sync_shadow_page_table(vtd_as);
-            }
+    devfn = VTD_SID_TO_DEVFN(source_id);
+
+    g_hash_table_iter_init(&as_it, s->vtd_as);
+    while (g_hash_table_iter_next(&as_it, NULL, (void**)&vtd_as)) {
+        if ((pci_bus_num(vtd_as->bus) == bus_n) &&
+            (vtd_as->devfn & mask) == (devfn & mask)) {
+            trace_vtd_inv_desc_cc_device(bus_n, VTD_PCI_SLOT(vtd_as->devfn),
+                                         VTD_PCI_FUNC(vtd_as->devfn));
+            vtd_iommu_lock(s);
+            vtd_as->context_cache_entry.context_cache_gen = 0;
+            vtd_iommu_unlock(s);
+            /*
+             * Do switch address space when needed, in case if the
+             * device passthrough bit is switched.
+             */
+            vtd_switch_address_space(vtd_as);
+            /*
+             * So a device is moving out of (or moving into) a
+             * domain, resync the shadow page table.
+             * This won't bring bad even if we have no such
+             * notifier registered - the IOMMU notification
+             * framework will skip MAP notifications if that
+             * happened.
+             */
+            vtd_sync_shadow_page_table(vtd_as);
         }
     }
 }
@@ -2442,18 +2406,14 @@ static bool vtd_process_device_iotlb_desc(IntelIOMMUState *s,
 {
     VTDAddressSpace *vtd_dev_as;
     IOMMUTLBEvent event;
-    struct VTDBus *vtd_bus;
+    uintptr_t key;
     hwaddr addr;
     uint64_t sz;
     uint16_t sid;
-    uint8_t devfn;
     bool size;
-    uint8_t bus_num;
 
     addr = VTD_INV_DESC_DEVICE_IOTLB_ADDR(inv_desc->hi);
     sid = VTD_INV_DESC_DEVICE_IOTLB_SID(inv_desc->lo);
-    devfn = sid & 0xff;
-    bus_num = sid >> 8;
     size = VTD_INV_DESC_DEVICE_IOTLB_SIZE(inv_desc->hi);
 
     if ((inv_desc->lo & VTD_INV_DESC_DEVICE_IOTLB_RSVD_LO) ||
@@ -2464,12 +2424,8 @@ static bool vtd_process_device_iotlb_desc(IntelIOMMUState *s,
         return false;
     }
 
-    vtd_bus = vtd_find_as_from_bus_num(s, bus_num);
-    if (!vtd_bus) {
-        goto done;
-    }
-
-    vtd_dev_as = vtd_bus->dev_as[devfn];
+    key = (uintptr_t)sid;
+    vtd_dev_as = g_hash_table_lookup(s->vtd_as, &key);
     if (!vtd_dev_as) {
         goto done;
     }
@@ -3390,27 +3346,22 @@ static const MemoryRegionOps vtd_mem_ir_ops = {
 
 VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
 {
-    uintptr_t key = (uintptr_t)bus;
-    VTDBus *vtd_bus = g_hash_table_lookup(s->vtd_as_by_busptr, &key);
-    VTDAddressSpace *vtd_dev_as;
+    struct vtd_as_key key = {
+        .bus = bus,
+        .devfn = devfn,
+    };
+    VTDAddressSpace *vtd_dev_as = g_hash_table_lookup(s->vtd_as, &key);
     char name[128];
 
-    if (!vtd_bus) {
-        uintptr_t *new_key = g_malloc(sizeof(*new_key));
-        *new_key = (uintptr_t)bus;
-        /* No corresponding free() */
-        vtd_bus = g_malloc0(sizeof(VTDBus) + sizeof(VTDAddressSpace *) * \
-                            PCI_DEVFN_MAX);
-        vtd_bus->bus = bus;
-        g_hash_table_insert(s->vtd_as_by_busptr, new_key, vtd_bus);
-    }
+    if (!vtd_dev_as) {
+        struct vtd_as_key *new_key = g_malloc(sizeof(*new_key));
 
-    vtd_dev_as = vtd_bus->dev_as[devfn];
+        new_key->bus = bus;
+        new_key->devfn = devfn;
 
-    if (!vtd_dev_as) {
         snprintf(name, sizeof(name), "vtd-%02x.%x", PCI_SLOT(devfn),
                  PCI_FUNC(devfn));
-        vtd_bus->dev_as[devfn] = vtd_dev_as = g_malloc0(sizeof(VTDAddressSpace));
+        vtd_dev_as = g_malloc0(sizeof(VTDAddressSpace));
 
         vtd_dev_as->bus = bus;
         vtd_dev_as->devfn = (uint8_t)devfn;
@@ -3466,6 +3417,8 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
                                             &vtd_dev_as->nodmar, 0);
 
         vtd_switch_address_space(vtd_dev_as);
+
+        g_hash_table_insert(s->vtd_as, new_key, vtd_dev_as);
     }
     return vtd_dev_as;
 }
@@ -3750,6 +3703,7 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
     assert(0 <= devfn && devfn < PCI_DEVFN_MAX);
 
     vtd_as = vtd_find_add_as(s, bus, devfn);
+
     return &vtd_as->as;
 }
 
@@ -3835,7 +3789,6 @@ static void vtd_realize(DeviceState *dev, Error **errp)
 
     QLIST_INIT(&s->vtd_as_with_notifiers);
     qemu_mutex_init(&s->iommu_lock);
-    memset(s->vtd_as_by_bus_num, 0, sizeof(s->vtd_as_by_bus_num));
     memory_region_init_io(&s->csrmem, OBJECT(s), &vtd_mem_ops, s,
                           "intel_iommu", DMAR_REG_SIZE);
 
@@ -3857,8 +3810,8 @@ static void vtd_realize(DeviceState *dev, Error **errp)
     /* No corresponding destroy */
     s->iotlb = g_hash_table_new_full(vtd_uint64_hash, vtd_uint64_equal,
                                      g_free, g_free);
-    s->vtd_as_by_busptr = g_hash_table_new_full(vtd_uint64_hash, vtd_uint64_equal,
-                                              g_free, g_free);
+    s->vtd_as = g_hash_table_new_full(vtd_uint64_hash, vtd_uint64_equal,
+                                      g_free, g_free);
     vtd_init(s);
     sysbus_mmio_map(SYS_BUS_DEVICE(s), 0, Q35_HOST_BRIDGE_IOMMU_ADDR);
     pci_setup_iommu(bus, vtd_host_dma_iommu, dev);
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 41783ee46d..014d77dc2a 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -58,7 +58,6 @@ typedef struct VTDContextEntry VTDContextEntry;
 typedef struct VTDContextCacheEntry VTDContextCacheEntry;
 typedef struct VTDAddressSpace VTDAddressSpace;
 typedef struct VTDIOTLBEntry VTDIOTLBEntry;
-typedef struct VTDBus VTDBus;
 typedef union VTD_IR_TableEntry VTD_IR_TableEntry;
 typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress;
 typedef struct VTDPASIDDirEntry VTDPASIDDirEntry;
@@ -111,12 +110,6 @@ struct VTDAddressSpace {
     IOVATree *iova_tree;          /* Traces mapped IOVA ranges */
 };
 
-struct VTDBus {
-    PCIBus* bus;		/* A reference to the bus to provide translation for */
-    /* A table of VTDAddressSpace objects indexed by devfn */
-    VTDAddressSpace *dev_as[];
-};
-
 struct VTDIOTLBEntry {
     uint64_t gfn;
     uint16_t domain_id;
@@ -252,8 +245,7 @@ struct IntelIOMMUState {
     uint32_t context_cache_gen;     /* Should be in [1,MAX] */
     GHashTable *iotlb;              /* IOTLB */
 
-    GHashTable *vtd_as_by_busptr;   /* VTDBus objects indexed by PCIBus* reference */
-    VTDBus *vtd_as_by_bus_num[VTD_PCI_BUS_MAX]; /* VTDBus objects indexed by bus number */
+    GHashTable *vtd_as;             /* VTD address space indexed by source id*/
     /* list of registered notifiers */
     QLIST_HEAD(, VTDAddressSpace) vtd_as_with_notifiers;
 
-- 
2.25.1




* [PATCH 3/3] intel-iommu: PASID support
  2022-01-05  4:19 [PATCH 0/3] PASID support for Intel IOMMU Jason Wang
                   ` (2 preceding siblings ...)
  2022-01-05  4:19 ` [PATCH 2/3] intel-iommu: drop VTDBus Jason Wang
@ 2022-01-05  4:19 ` Jason Wang
  2022-01-13  5:06   ` Peter Xu
  3 siblings, 1 reply; 28+ messages in thread
From: Jason Wang @ 2022-01-05  4:19 UTC (permalink / raw)
  To: mst, peterx; +Cc: Jason Wang, yi.l.liu, yi.y.sun, qemu-devel

This patch introduces ECAP_PASID via "x-pasid-mode". Based on the
existing support for scalable mode, we need to implement the following
missing parts:

1) tag VTDAddressSpace with PASID and support IOMMU/DMA translation
   with PASID
2) tag IOTLB with PASID
3) PASID cache and its flush

For simplicity, the PASID cache is not implemented, so the PASID cache
flush can simply be a nop. This could be added in the future.

Note that although PASID based IOMMU translation is ready, no device
can issue PASID DMA right now. In this case, PCI_NO_PASID is used as
the PASID to identify the address space without PASID.
vtd_find_add_as() has been extended to provision address spaces with
PASID, which could be utilized by a future extension of the PCI core
to allow device models to use PASID based DMA translation.
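
As an illustration only (there is no in-tree caller yet), a device
model that wants a PASID-tagged address space could use the extended
helper roughly as below; the surrounding device code (pci_dev, pasid)
is hypothetical:

  /* Hypothetical caller: get an address space tagged with "pasid" */
  VTDAddressSpace *vtd_as = vtd_find_add_as(s, pci_get_bus(pci_dev),
                                            pci_dev->devfn, pasid);
  AddressSpace *as = &vtd_as->as; /* DMA via "as" uses this PASID */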

This feature would be useful for:

1) prototyping PASID support for devices like virtio
2) future vPASID work

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/i386/intel_iommu.c          | 355 +++++++++++++++++++++++++--------
 hw/i386/intel_iommu_internal.h |  14 +-
 hw/i386/trace-events           |   2 +
 include/hw/i386/intel_iommu.h  |   6 +-
 include/hw/pci/pci_bus.h       |   2 +
 5 files changed, 289 insertions(+), 90 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 58c682097b..04ae604ea1 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -64,6 +64,14 @@
 struct vtd_as_key {
     PCIBus *bus;
     uint8_t devfn;
+    uint32_t pasid;
+};
+
+struct vtd_iotlb_key {
+    uint16_t sid;
+    uint32_t pasid;
+    uint64_t gfn;
+    uint32_t level;
 };
 
 static void vtd_address_space_refresh_all(IntelIOMMUState *s);
@@ -193,15 +201,36 @@ static inline gboolean vtd_as_has_map_notifier(VTDAddressSpace *as)
 }
 
 /* GHashTable functions */
-static gboolean vtd_uint64_equal(gconstpointer v1, gconstpointer v2)
+static gboolean vtd_iotlb_equal(gconstpointer v1, gconstpointer v2)
+{
+    const struct vtd_iotlb_key *key1 = v1;
+    const struct vtd_iotlb_key *key2 = v2;
+
+    return key1->sid == key2->sid &&
+           key1->pasid == key2->pasid &&
+           key1->level == key2->level &&
+           key1->gfn == key2->gfn;
+}
+
+static guint vtd_iotlb_hash(gconstpointer v)
+{
+    const struct vtd_iotlb_key *key = v;
+
+    return key->gfn | ((key->sid) << VTD_IOTLB_SID_SHIFT) |
+           (key->level) << VTD_IOTLB_LVL_SHIFT |
+           (key->pasid) << VTD_IOTLB_PASID_SHIFT;
+}
+
+static gboolean vtd_as_equal(gconstpointer v1, gconstpointer v2)
 {
     const struct vtd_as_key *key1 = v1;
     const struct vtd_as_key *key2 = v2;
 
-    return (key1->bus == key2->bus) && (key1->devfn == key2->devfn);
+    return (key1->bus == key2->bus) && (key1->devfn == key2->devfn) &&
+           (key1->pasid == key2->pasid);
 }
 
-static guint vtd_uint64_hash(gconstpointer v)
+static guint vtd_as_hash(gconstpointer v)
 {
     const struct vtd_as_key *key = v;
     guint value = (guint)(uintptr_t)key->bus;
@@ -241,6 +270,29 @@ static gboolean vtd_hash_remove_by_page(gpointer key, gpointer value,
              (entry->gfn == gfn_tlb));
 }
 
+static gboolean vtd_hash_remove_by_page_pasid(gpointer key, gpointer value,
+                                              gpointer user_data)
+{
+    VTDIOTLBEntry *entry = (VTDIOTLBEntry *)value;
+    VTDIOTLBPageInvInfo *info = (VTDIOTLBPageInvInfo *)user_data;
+    uint64_t gfn = (info->addr >> VTD_PAGE_SHIFT_4K) & info->mask;
+    uint64_t gfn_tlb = (info->addr & entry->mask) >> VTD_PAGE_SHIFT_4K;
+    return (entry->domain_id == info->domain_id) &&
+           (entry->pasid == info->pasid) &&
+            (((entry->gfn & info->mask) == gfn) ||
+             (entry->gfn == gfn_tlb));
+}
+
+static gboolean vtd_hash_remove_by_pasid(gpointer key, gpointer value,
+                                         gpointer user_data)
+{
+    VTDIOTLBEntry *entry = (VTDIOTLBEntry *)value;
+    VTDIOTLBPageInvInfo *info = (VTDIOTLBPageInvInfo *)user_data;
+
+    return (entry->domain_id == info->domain_id) &&
+           (entry->pasid == info->pasid);
+}
+
 /* Reset all the gen of VTDAddressSpace to zero and set the gen of
  * IntelIOMMUState to 1.  Must be called with IOMMU lock held.
  */
@@ -281,13 +333,6 @@ static void vtd_reset_caches(IntelIOMMUState *s)
     vtd_iommu_unlock(s);
 }
 
-static uint64_t vtd_get_iotlb_key(uint64_t gfn, uint16_t source_id,
-                                  uint32_t level)
-{
-    return gfn | ((uint64_t)(source_id) << VTD_IOTLB_SID_SHIFT) |
-           ((uint64_t)(level) << VTD_IOTLB_LVL_SHIFT);
-}
-
 static uint64_t vtd_get_iotlb_gfn(hwaddr addr, uint32_t level)
 {
     return (addr & vtd_slpt_level_page_mask(level)) >> VTD_PAGE_SHIFT_4K;
@@ -295,15 +340,17 @@ static uint64_t vtd_get_iotlb_gfn(hwaddr addr, uint32_t level)
 
 /* Must be called with IOMMU lock held */
 static VTDIOTLBEntry *vtd_lookup_iotlb(IntelIOMMUState *s, uint16_t source_id,
-                                       hwaddr addr)
+                                       hwaddr addr, uint32_t pasid)
 {
+    struct vtd_iotlb_key key;
     VTDIOTLBEntry *entry;
-    uint64_t key;
     int level;
 
     for (level = VTD_SL_PT_LEVEL; level < VTD_SL_PML4_LEVEL; level++) {
-        key = vtd_get_iotlb_key(vtd_get_iotlb_gfn(addr, level),
-                                source_id, level);
+        key.gfn = vtd_get_iotlb_gfn(addr, level);
+        key.level = level;
+        key.sid = source_id;
+        key.pasid = pasid;
         entry = g_hash_table_lookup(s->iotlb, &key);
         if (entry) {
             goto out;
@@ -317,10 +364,11 @@ out:
 /* Must be with IOMMU lock held */
 static void vtd_update_iotlb(IntelIOMMUState *s, uint16_t source_id,
                              uint16_t domain_id, hwaddr addr, uint64_t slpte,
-                             uint8_t access_flags, uint32_t level)
+                             uint8_t access_flags, uint32_t level,
+                             uint32_t pasid)
 {
     VTDIOTLBEntry *entry = g_malloc(sizeof(*entry));
-    uint64_t *key = g_malloc(sizeof(*key));
+    struct vtd_iotlb_key *key = g_malloc(sizeof(*key));
     uint64_t gfn = vtd_get_iotlb_gfn(addr, level);
 
     trace_vtd_iotlb_page_update(source_id, addr, slpte, domain_id);
@@ -334,7 +382,13 @@ static void vtd_update_iotlb(IntelIOMMUState *s, uint16_t source_id,
     entry->slpte = slpte;
     entry->access_flags = access_flags;
     entry->mask = vtd_slpt_level_page_mask(level);
-    *key = vtd_get_iotlb_key(gfn, source_id, level);
+    entry->pasid = pasid;
+
+    key->gfn = gfn;
+    key->sid = source_id;
+    key->level = level;
+    key->pasid = pasid;
+
     g_hash_table_replace(s->iotlb, key, entry);
 }
 
@@ -803,13 +857,15 @@ static int vtd_get_pe_from_pasid_table(IntelIOMMUState *s,
 
 static int vtd_ce_get_rid2pasid_entry(IntelIOMMUState *s,
                                       VTDContextEntry *ce,
-                                      VTDPASIDEntry *pe)
+                                      VTDPASIDEntry *pe,
+                                      uint32_t pasid)
 {
-    uint32_t pasid;
     dma_addr_t pasid_dir_base;
     int ret = 0;
 
-    pasid = VTD_CE_GET_RID2PASID(ce);
+    if (pasid == PCI_NO_PASID) {
+        pasid = VTD_CE_GET_RID2PASID(ce);
+    }
     pasid_dir_base = VTD_CE_GET_PASID_DIR_TABLE(ce);
     ret = vtd_get_pe_from_pasid_table(s, pasid_dir_base, pasid, pe);
 
@@ -818,15 +874,17 @@ static int vtd_ce_get_rid2pasid_entry(IntelIOMMUState *s,
 
 static int vtd_ce_get_pasid_fpd(IntelIOMMUState *s,
                                 VTDContextEntry *ce,
-                                bool *pe_fpd_set)
+                                bool *pe_fpd_set,
+                                uint32_t pasid)
 {
     int ret;
-    uint32_t pasid;
     dma_addr_t pasid_dir_base;
     VTDPASIDDirEntry pdire;
     VTDPASIDEntry pe;
 
-    pasid = VTD_CE_GET_RID2PASID(ce);
+    if (pasid == PCI_NO_PASID) {
+        pasid = VTD_CE_GET_RID2PASID(ce);
+    }
     pasid_dir_base = VTD_CE_GET_PASID_DIR_TABLE(ce);
 
     /*
@@ -872,12 +930,13 @@ static inline uint32_t vtd_ce_get_level(VTDContextEntry *ce)
 }
 
 static uint32_t vtd_get_iova_level(IntelIOMMUState *s,
-                                   VTDContextEntry *ce)
+                                   VTDContextEntry *ce,
+                                   uint32_t pasid)
 {
     VTDPASIDEntry pe;
 
     if (s->root_scalable) {
-        vtd_ce_get_rid2pasid_entry(s, ce, &pe);
+        vtd_ce_get_rid2pasid_entry(s, ce, &pe, pasid);
         return VTD_PE_GET_LEVEL(&pe);
     }
 
@@ -890,12 +949,13 @@ static inline uint32_t vtd_ce_get_agaw(VTDContextEntry *ce)
 }
 
 static uint32_t vtd_get_iova_agaw(IntelIOMMUState *s,
-                                  VTDContextEntry *ce)
+                                  VTDContextEntry *ce,
+                                  uint32_t pasid)
 {
     VTDPASIDEntry pe;
 
     if (s->root_scalable) {
-        vtd_ce_get_rid2pasid_entry(s, ce, &pe);
+        vtd_ce_get_rid2pasid_entry(s, ce, &pe, pasid);
         return 30 + ((pe.val[0] >> 2) & VTD_SM_PASID_ENTRY_AW) * 9;
     }
 
@@ -937,31 +997,33 @@ static inline bool vtd_ce_type_check(X86IOMMUState *x86_iommu,
 }
 
 static inline uint64_t vtd_iova_limit(IntelIOMMUState *s,
-                                      VTDContextEntry *ce, uint8_t aw)
+                                      VTDContextEntry *ce, uint8_t aw,
+                                      uint32_t pasid)
 {
-    uint32_t ce_agaw = vtd_get_iova_agaw(s, ce);
+    uint32_t ce_agaw = vtd_get_iova_agaw(s, ce, pasid);
     return 1ULL << MIN(ce_agaw, aw);
 }
 
 /* Return true if IOVA passes range check, otherwise false. */
 static inline bool vtd_iova_range_check(IntelIOMMUState *s,
                                         uint64_t iova, VTDContextEntry *ce,
-                                        uint8_t aw)
+                                        uint8_t aw, uint32_t pasid)
 {
     /*
      * Check if @iova is above 2^X-1, where X is the minimum of MGAW
      * in CAP_REG and AW in context-entry.
      */
-    return !(iova & ~(vtd_iova_limit(s, ce, aw) - 1));
+    return !(iova & ~(vtd_iova_limit(s, ce, aw, pasid) - 1));
 }
 
 static dma_addr_t vtd_get_iova_pgtbl_base(IntelIOMMUState *s,
-                                          VTDContextEntry *ce)
+                                          VTDContextEntry *ce,
+                                          uint32_t pasid)
 {
     VTDPASIDEntry pe;
 
     if (s->root_scalable) {
-        vtd_ce_get_rid2pasid_entry(s, ce, &pe);
+        vtd_ce_get_rid2pasid_entry(s, ce, &pe, pasid);
         return pe.val[0] & VTD_SM_PASID_ENTRY_SLPTPTR;
     }
 
@@ -995,15 +1057,16 @@ static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, uint32_t level)
 static int vtd_iova_to_slpte(IntelIOMMUState *s, VTDContextEntry *ce,
                              uint64_t iova, bool is_write,
                              uint64_t *slptep, uint32_t *slpte_level,
-                             bool *reads, bool *writes, uint8_t aw_bits)
+                             bool *reads, bool *writes, uint8_t aw_bits,
+                             uint32_t pasid)
 {
-    dma_addr_t addr = vtd_get_iova_pgtbl_base(s, ce);
-    uint32_t level = vtd_get_iova_level(s, ce);
+    dma_addr_t addr = vtd_get_iova_pgtbl_base(s, ce, pasid);
+    uint32_t level = vtd_get_iova_level(s, ce, pasid);
     uint32_t offset;
     uint64_t slpte;
     uint64_t access_right_check;
 
-    if (!vtd_iova_range_check(s, iova, ce, aw_bits)) {
+    if (!vtd_iova_range_check(s, iova, ce, aw_bits, pasid)) {
         error_report_once("%s: detected IOVA overflow (iova=0x%" PRIx64 ")",
                           __func__, iova);
         return -VTD_FR_ADDR_BEYOND_MGAW;
@@ -1019,7 +1082,7 @@ static int vtd_iova_to_slpte(IntelIOMMUState *s, VTDContextEntry *ce,
         if (slpte == (uint64_t)-1) {
             error_report_once("%s: detected read error on DMAR slpte "
                               "(iova=0x%" PRIx64 ")", __func__, iova);
-            if (level == vtd_get_iova_level(s, ce)) {
+            if (level == vtd_get_iova_level(s, ce, pasid)) {
                 /* Invalid programming of context-entry */
                 return -VTD_FR_CONTEXT_ENTRY_INV;
             } else {
@@ -1261,18 +1324,19 @@ next:
  */
 static int vtd_page_walk(IntelIOMMUState *s, VTDContextEntry *ce,
                          uint64_t start, uint64_t end,
-                         vtd_page_walk_info *info)
+                         vtd_page_walk_info *info,
+                         uint32_t pasid)
 {
-    dma_addr_t addr = vtd_get_iova_pgtbl_base(s, ce);
-    uint32_t level = vtd_get_iova_level(s, ce);
+    dma_addr_t addr = vtd_get_iova_pgtbl_base(s, ce, pasid);
+    uint32_t level = vtd_get_iova_level(s, ce, pasid);
 
-    if (!vtd_iova_range_check(s, start, ce, info->aw)) {
+    if (!vtd_iova_range_check(s, start, ce, info->aw, pasid)) {
         return -VTD_FR_ADDR_BEYOND_MGAW;
     }
 
-    if (!vtd_iova_range_check(s, end, ce, info->aw)) {
+    if (!vtd_iova_range_check(s, end, ce, info->aw, pasid)) {
         /* Fix end so that it reaches the maximum */
-        end = vtd_iova_limit(s, ce, info->aw);
+        end = vtd_iova_limit(s, ce, info->aw, pasid);
     }
 
     return vtd_page_walk_level(addr, start, end, level, true, true, info);
@@ -1340,7 +1404,7 @@ static int vtd_ce_rid2pasid_check(IntelIOMMUState *s,
      * has valid rid2pasid setting, which includes valid
      * rid2pasid field and corresponding pasid entry setting
      */
-    return vtd_ce_get_rid2pasid_entry(s, ce, &pe);
+    return vtd_ce_get_rid2pasid_entry(s, ce, &pe, PCI_NO_PASID);
 }
 
 /* Map a device to its corresponding domain (context-entry) */
@@ -1423,12 +1487,13 @@ static int vtd_sync_shadow_page_hook(IOMMUTLBEvent *event,
 }
 
 static uint16_t vtd_get_domain_id(IntelIOMMUState *s,
-                                  VTDContextEntry *ce)
+                                  VTDContextEntry *ce,
+                                  uint32_t pasid)
 {
     VTDPASIDEntry pe;
 
     if (s->root_scalable) {
-        vtd_ce_get_rid2pasid_entry(s, ce, &pe);
+        vtd_ce_get_rid2pasid_entry(s, ce, &pe, pasid);
         return VTD_SM_PASID_ENTRY_DID(pe.val[1]);
     }
 
@@ -1446,10 +1511,10 @@ static int vtd_sync_shadow_page_table_range(VTDAddressSpace *vtd_as,
         .notify_unmap = true,
         .aw = s->aw_bits,
         .as = vtd_as,
-        .domain_id = vtd_get_domain_id(s, ce),
+        .domain_id = vtd_get_domain_id(s, ce, vtd_as->pasid),
     };
 
-    return vtd_page_walk(s, ce, addr, addr + size, &info);
+    return vtd_page_walk(s, ce, addr, addr + size, &info, vtd_as->pasid);
 }
 
 static int vtd_sync_shadow_page_table(VTDAddressSpace *vtd_as)
@@ -1493,13 +1558,14 @@ static int vtd_sync_shadow_page_table(VTDAddressSpace *vtd_as)
  * 1st-level translation or 2nd-level translation, it depends
  * on PGTT setting.
  */
-static bool vtd_dev_pt_enabled(IntelIOMMUState *s, VTDContextEntry *ce)
+static bool vtd_dev_pt_enabled(IntelIOMMUState *s, VTDContextEntry *ce,
+                               uint32_t pasid)
 {
     VTDPASIDEntry pe;
     int ret;
 
     if (s->root_scalable) {
-        ret = vtd_ce_get_rid2pasid_entry(s, ce, &pe);
+        ret = vtd_ce_get_rid2pasid_entry(s, ce, &pe, pasid);
         if (ret) {
             /*
              * This error is guest triggerable. We should assume PT
              * is not enabled for safety.
         return false;
     }
 
-    return vtd_dev_pt_enabled(s, &ce);
+    return vtd_dev_pt_enabled(s, &ce, as->pasid);
 }
 
 /* Return whether the device is using IOMMU translation. */
@@ -1669,7 +1735,7 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
     uint8_t bus_num = pci_bus_num(bus);
     VTDContextCacheEntry *cc_entry;
     uint64_t slpte, page_mask;
-    uint32_t level;
+    uint32_t level, pasid = vtd_as->pasid;
     uint16_t source_id = vtd_make_source_id(bus_num, devfn);
     int ret_fr;
     bool is_fpd_set = false;
@@ -1688,17 +1754,6 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
 
     cc_entry = &vtd_as->context_cache_entry;
 
-    /* Try to fetch slpte form IOTLB */
-    iotlb_entry = vtd_lookup_iotlb(s, source_id, addr);
-    if (iotlb_entry) {
-        trace_vtd_iotlb_page_hit(source_id, addr, iotlb_entry->slpte,
-                                 iotlb_entry->domain_id);
-        slpte = iotlb_entry->slpte;
-        access_flags = iotlb_entry->access_flags;
-        page_mask = iotlb_entry->mask;
-        goto out;
-    }
-
     /* Try to fetch context-entry from cache first */
     if (cc_entry->context_cache_gen == s->context_cache_gen) {
         trace_vtd_iotlb_cc_hit(bus_num, devfn, cc_entry->context_entry.hi,
@@ -1707,14 +1762,14 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
         ce = cc_entry->context_entry;
         is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
         if (!is_fpd_set && s->root_scalable) {
-            ret_fr = vtd_ce_get_pasid_fpd(s, &ce, &is_fpd_set);
+            ret_fr = vtd_ce_get_pasid_fpd(s, &ce, &is_fpd_set, pasid);
             VTD_PE_GET_FPD_ERR(ret_fr, is_fpd_set, s, source_id, addr, is_write);
         }
     } else {
         ret_fr = vtd_dev_to_context_entry(s, bus_num, devfn, &ce);
         is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
         if (!ret_fr && !is_fpd_set && s->root_scalable) {
-            ret_fr = vtd_ce_get_pasid_fpd(s, &ce, &is_fpd_set);
+            ret_fr = vtd_ce_get_pasid_fpd(s, &ce, &is_fpd_set, pasid);
         }
         VTD_PE_GET_FPD_ERR(ret_fr, is_fpd_set, s, source_id, addr, is_write);
         /* Update context-cache */
@@ -1725,11 +1780,16 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
         cc_entry->context_cache_gen = s->context_cache_gen;
     }
 
+    /* For requests without a PASID, use RID2PASID in scalable mode */
+    if ((pasid == PCI_NO_PASID) && s->root_scalable) {
+        pasid = VTD_CE_GET_RID2PASID(&ce);
+    }
+
     /*
      * We don't need to translate for pass-through context entries.
      * Also, let's ignore IOTLB caching as well for PT devices.
      */
-    if (vtd_dev_pt_enabled(s, &ce)) {
+    if (vtd_dev_pt_enabled(s, &ce, pasid)) {
         entry->iova = addr & VTD_PAGE_MASK_4K;
         entry->translated_addr = entry->iova;
         entry->addr_mask = ~VTD_PAGE_MASK_4K;
@@ -1750,14 +1810,24 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
         return true;
     }
 
+    iotlb_entry = vtd_lookup_iotlb(s, source_id, addr, pasid);
+    if (iotlb_entry) {
+        trace_vtd_iotlb_page_hit(source_id, addr, iotlb_entry->slpte,
+                                 iotlb_entry->domain_id);
+        slpte = iotlb_entry->slpte;
+        access_flags = iotlb_entry->access_flags;
+        page_mask = iotlb_entry->mask;
+        goto out;
+    }
+
     ret_fr = vtd_iova_to_slpte(s, &ce, addr, is_write, &slpte, &level,
-                               &reads, &writes, s->aw_bits);
+                               &reads, &writes, s->aw_bits, pasid);
     VTD_PE_GET_FPD_ERR(ret_fr, is_fpd_set, s, source_id, addr, is_write);
 
     page_mask = vtd_slpt_level_page_mask(level);
     access_flags = IOMMU_ACCESS_FLAG(reads, writes);
-    vtd_update_iotlb(s, source_id, vtd_get_domain_id(s, &ce), addr, slpte,
-                     access_flags, level);
+    vtd_update_iotlb(s, source_id, vtd_get_domain_id(s, &ce, pasid),
+                     addr, slpte, access_flags, level, pasid);
 out:
     vtd_iommu_unlock(s);
     entry->iova = addr & page_mask;
@@ -1949,7 +2019,7 @@ static void vtd_iotlb_domain_invalidate(IntelIOMMUState *s, uint16_t domain_id)
     QLIST_FOREACH(vtd_as, &s->vtd_as_with_notifiers, next) {
         if (!vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
                                       vtd_as->devfn, &ce) &&
-            domain_id == vtd_get_domain_id(s, &ce)) {
+            domain_id == vtd_get_domain_id(s, &ce, vtd_as->pasid)) {
             vtd_sync_shadow_page_table(vtd_as);
         }
     }
@@ -1957,7 +2027,7 @@ static void vtd_iotlb_domain_invalidate(IntelIOMMUState *s, uint16_t domain_id)
 
 static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
                                            uint16_t domain_id, hwaddr addr,
-                                           uint8_t am)
+                                             uint8_t am, uint32_t pasid)
 {
     VTDAddressSpace *vtd_as;
     VTDContextEntry ce;
@@ -1965,9 +2035,11 @@ static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
     hwaddr size = (1 << am) * VTD_PAGE_SIZE;
 
     QLIST_FOREACH(vtd_as, &(s->vtd_as_with_notifiers), next) {
+        if (pasid != PCI_NO_PASID && pasid != vtd_as->pasid)
+            continue;
         ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
                                        vtd_as->devfn, &ce);
-        if (!ret && domain_id == vtd_get_domain_id(s, &ce)) {
+        if (!ret && domain_id == vtd_get_domain_id(s, &ce, vtd_as->pasid)) {
             if (vtd_as_has_map_notifier(vtd_as)) {
                 /*
                  * As long as we have MAP notifications registered in
@@ -2011,7 +2083,52 @@ static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
     vtd_iommu_lock(s);
     g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page, &info);
     vtd_iommu_unlock(s);
-    vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am);
+    vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am, PCI_NO_PASID);
+}
+
+static void vtd_iotlb_page_pasid_invalidate(IntelIOMMUState *s,
+                                            uint16_t domain_id,
+                                            hwaddr addr, uint8_t am,
+                                            uint32_t pasid)
+{
+    VTDIOTLBPageInvInfo info;
+
+    trace_vtd_inv_desc_iotlb_pasid_pages(domain_id, addr, am, pasid);
+
+    assert(am <= VTD_MAMV);
+    info.domain_id = domain_id;
+    info.addr = addr;
+    info.mask = ~((1 << am) - 1);
+    info.pasid = pasid;
+    vtd_iommu_lock(s);
+    g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page_pasid, &info);
+    vtd_iommu_unlock(s);
+    vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am, pasid);
+}
+
+static void vtd_iotlb_pasid_invalidate(IntelIOMMUState *s, uint16_t domain_id,
+                                       uint32_t pasid)
+{
+    VTDIOTLBPageInvInfo info;
+    VTDAddressSpace *vtd_as;
+    VTDContextEntry ce;
+
+    trace_vtd_inv_desc_iotlb_pasid(domain_id, pasid);
+
+    info.domain_id = domain_id;
+    info.pasid = pasid;
+    vtd_iommu_lock(s);
+    g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_pasid, &info);
+    vtd_iommu_unlock(s);
+
+    QLIST_FOREACH(vtd_as, &s->vtd_as_with_notifiers, next) {
+        if (!vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
+                                      vtd_as->devfn, &ce) &&
+            domain_id == vtd_get_domain_id(s, &ce, vtd_as->pasid) &&
+            pasid == vtd_as->pasid) {
+            vtd_sync_shadow_page_table(vtd_as);
+        }
+    }
 }
 
 /* Flush IOTLB
@@ -2388,6 +2505,52 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
     return true;
 }
 
+static bool vtd_process_iotlb_pasid_desc(IntelIOMMUState *s,
+                                         VTDInvDesc *inv_desc)
+{
+    uint32_t pasid;
+    uint16_t domain_id;
+    uint8_t am;
+    hwaddr addr;
+
+    if ((inv_desc->lo & VTD_INV_DESC_IOTLB_PASID_RSVD_LO) ||
+        (inv_desc->hi & VTD_INV_DESC_IOTLB_PASID_RSVD_HI)) {
+        error_report_once("%s: invalid iotlb pasid inv desc: hi=0x%"PRIx64
+                          ", lo=0x%"PRIx64" (reserved bits unzero)",
+                          __func__, inv_desc->hi, inv_desc->lo);
+        return false;
+    }
+
+    switch (inv_desc->lo & VTD_INV_DESC_IOTLB_G) {
+    case VTD_INV_DESC_IOTLB_PASID_PASID:
+        domain_id = VTD_INV_DESC_IOTLB_DID(inv_desc->lo);
+        pasid = VTD_INV_DESC_IOTLB_PASID(inv_desc->lo);
+        vtd_iotlb_pasid_invalidate(s, domain_id, pasid);
+        break;
+    case VTD_INV_DESC_IOTLB_PASID_PAGE:
+        domain_id = VTD_INV_DESC_IOTLB_DID(inv_desc->lo);
+        pasid = VTD_INV_DESC_IOTLB_PASID(inv_desc->lo);
+        addr = VTD_INV_DESC_IOTLB_ADDR(inv_desc->hi);
+        am = VTD_INV_DESC_IOTLB_AM(inv_desc->hi);
+        if (am > VTD_MAMV) {
+            error_report_once("%s: invalid iotlb inv desc: hi=0x%"PRIx64
+                              ", lo=0x%"PRIx64" (am=%u > VTD_MAMV=%u)",
+                              __func__, inv_desc->hi, inv_desc->lo,
+                              am, (unsigned)VTD_MAMV);
+            return false;
+        }
+        vtd_iotlb_page_pasid_invalidate(s, domain_id, addr, am, pasid);
+        break;
+    default:
+        error_report_once("%s: invalid iotlb pasid inv desc: hi=0x%"PRIx64
+                          ", lo=0x%"PRIx64" (type mismatch: 0x%"PRIx64")",
+                          __func__, inv_desc->hi, inv_desc->lo,
+                          inv_desc->lo & VTD_INV_DESC_IOTLB_G);
+        return false;
+    }
+    return true;
+}
+
 static bool vtd_process_inv_iec_desc(IntelIOMMUState *s,
                                      VTDInvDesc *inv_desc)
 {
@@ -2487,15 +2650,19 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
         }
         break;
 
-    /*
-     * TODO: the entity of below two cases will be implemented in future series.
-     * To make guest (which integrates scalable mode support patch set in
-     * iommu driver) work, just return true is enough so far.
-     */
     case VTD_INV_DESC_PC:
+        trace_vtd_inv_desc("pc", inv_desc.hi, inv_desc.lo);
+        /* We don't implement PASID cache, so it's fine to simply check iqa.dw */
+        if (!s->iq_dw) {
+            return false;
+        }
         break;
 
     case VTD_INV_DESC_PIOTLB:
+        trace_vtd_inv_desc("pasid-iotlb", inv_desc.hi, inv_desc.lo);
+        if (!s->iq_dw || !vtd_process_iotlb_pasid_desc(s, &inv_desc)) {
+            return false;
+        }
         break;
 
     case VTD_INV_DESC_WAIT:
@@ -3071,6 +3238,7 @@ static Property vtd_properties[] = {
                       VTD_HOST_ADDRESS_WIDTH),
     DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, FALSE),
     DEFINE_PROP_BOOL("x-scalable-mode", IntelIOMMUState, scalable_mode, FALSE),
+    DEFINE_PROP_BOOL("x-pasid-mode", IntelIOMMUState, pasid, false),
     DEFINE_PROP_BOOL("dma-drain", IntelIOMMUState, dma_drain, true),
     DEFINE_PROP_END_OF_LIST(),
 };
@@ -3344,11 +3512,13 @@ static const MemoryRegionOps vtd_mem_ir_ops = {
     },
 };
 
-VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
+VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
+                                 int devfn, unsigned int pasid)
 {
     struct vtd_as_key key = {
         .bus = bus,
         .devfn = devfn,
+        .pasid = pasid,
     };
     VTDAddressSpace *vtd_dev_as = g_hash_table_lookup(s->vtd_as, &key);
     char name[128];
@@ -3358,6 +3528,7 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
 
         new_key->bus = bus;
         new_key->devfn = devfn;
+        new_key->pasid = pasid;
 
         snprintf(name, sizeof(name), "vtd-%02x.%x", PCI_SLOT(devfn),
                  PCI_FUNC(devfn));
@@ -3365,6 +3536,7 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
 
         vtd_dev_as->bus = bus;
         vtd_dev_as->devfn = (uint8_t)devfn;
+        vtd_dev_as->pasid = pasid;
         vtd_dev_as->iommu_state = s;
         vtd_dev_as->context_cache_entry.context_cache_gen = 0;
         vtd_dev_as->iova_tree = iova_tree_new();
@@ -3420,6 +3592,7 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
 
         g_hash_table_insert(s->vtd_as, new_key, vtd_dev_as);
     }
+
     return vtd_dev_as;
 }
 
@@ -3525,7 +3698,7 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
                                   "legacy mode",
                                   bus_n, PCI_SLOT(vtd_as->devfn),
                                   PCI_FUNC(vtd_as->devfn),
-                                  vtd_get_domain_id(s, &ce),
+                                  vtd_get_domain_id(s, &ce, vtd_as->pasid),
                                   ce.hi, ce.lo);
         if (vtd_as_has_map_notifier(vtd_as)) {
             /* This is required only for MAP typed notifiers */
@@ -3535,10 +3708,10 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
                 .notify_unmap = false,
                 .aw = s->aw_bits,
                 .as = vtd_as,
-                .domain_id = vtd_get_domain_id(s, &ce),
+                .domain_id = vtd_get_domain_id(s, &ce, vtd_as->pasid),
             };
 
-            vtd_page_walk(s, &ce, 0, ~0ULL, &info);
+            vtd_page_walk(s, &ce, 0, ~0ULL, &info, vtd_as->pasid);
         }
     } else {
         trace_vtd_replay_ce_invalid(bus_n, PCI_SLOT(vtd_as->devfn),
@@ -3629,6 +3802,10 @@ static void vtd_init(IntelIOMMUState *s)
         s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_SLTS;
     }
 
+    if (s->pasid) {
+        s->ecap |= VTD_ECAP_PASID;
+    }
+
     vtd_reset_caches(s);
 
     /* Define registers with default values and bit semantics */
@@ -3702,8 +3879,7 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
 
     assert(0 <= devfn && devfn < PCI_DEVFN_MAX);
 
-    vtd_as = vtd_find_add_as(s, bus, devfn);
-
+    vtd_as = vtd_find_add_as(s, bus, devfn, PCI_NO_PASID);
     return &vtd_as->as;
 }
 
@@ -3746,6 +3922,11 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
         return false;
     }
 
+    if (s->pasid && !s->scalable_mode) {
+        error_setg(errp, "Need to set scalable mode for PASID mode");
+        return false;
+    }
+
     return true;
 }
 
@@ -3808,9 +3989,9 @@ static void vtd_realize(DeviceState *dev, Error **errp)
 
     sysbus_init_mmio(SYS_BUS_DEVICE(s), &s->csrmem);
     /* No corresponding destroy */
-    s->iotlb = g_hash_table_new_full(vtd_uint64_hash, vtd_uint64_equal,
+    s->iotlb = g_hash_table_new_full(vtd_iotlb_hash, vtd_iotlb_equal,
                                      g_free, g_free);
-    s->vtd_as = g_hash_table_new_full(vtd_uint64_hash, vtd_uint64_equal,
+    s->vtd_as = g_hash_table_new_full(vtd_as_hash, vtd_as_equal,
                                       g_free, g_free);
     vtd_init(s);
     sysbus_mmio_map(SYS_BUS_DEVICE(s), 0, Q35_HOST_BRIDGE_IOMMU_ADDR);
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index a6c788049b..c0402113e2 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -114,8 +114,9 @@
                                      VTD_INTERRUPT_ADDR_FIRST + 1)
 
 /* The shift of source_id in the key of IOTLB hash table */
-#define VTD_IOTLB_SID_SHIFT         36
-#define VTD_IOTLB_LVL_SHIFT         52
+#define VTD_IOTLB_SID_SHIFT         20
+#define VTD_IOTLB_LVL_SHIFT         28
+#define VTD_IOTLB_PASID_SHIFT       30
 #define VTD_IOTLB_MAX_SIZE          1024    /* Max size of the hash table */
 
 /* IOTLB_REG */
@@ -190,6 +191,7 @@
 #define VTD_ECAP_PT                 (1ULL << 6)
 #define VTD_ECAP_MHMV               (15ULL << 20)
 #define VTD_ECAP_SRS                (1ULL << 31)
+#define VTD_ECAP_PASID              (1ULL << 40)
 #define VTD_ECAP_SMTS               (1ULL << 43)
 #define VTD_ECAP_SLTS               (1ULL << 46)
 
@@ -210,6 +212,8 @@
 #define VTD_CAP_DRAIN_READ          (1ULL << 55)
 #define VTD_CAP_DRAIN               (VTD_CAP_DRAIN_READ | VTD_CAP_DRAIN_WRITE)
 #define VTD_CAP_CM                  (1ULL << 7)
+#define VTD_PASID_ID_SHIFT          20
+#define VTD_PASID_ID_MASK           ((1ULL << VTD_PASID_ID_SHIFT) - 1)
 
 /* Supported Adjusted Guest Address Widths */
 #define VTD_CAP_SAGAW_SHIFT         8
@@ -379,6 +383,11 @@ typedef union VTDInvDesc VTDInvDesc;
 #define VTD_INV_DESC_IOTLB_AM(val)      ((val) & 0x3fULL)
 #define VTD_INV_DESC_IOTLB_RSVD_LO      0xffffffff0000ff00ULL
 #define VTD_INV_DESC_IOTLB_RSVD_HI      0xf80ULL
+#define VTD_INV_DESC_IOTLB_PASID_PASID  (2ULL << 4)
+#define VTD_INV_DESC_IOTLB_PASID_PAGE   (3ULL << 4)
+#define VTD_INV_DESC_IOTLB_PASID(val)   (((val) >> 32) & VTD_PASID_ID_MASK)
+#define VTD_INV_DESC_IOTLB_PASID_RSVD_LO      0xfff00000000001c0ULL
+#define VTD_INV_DESC_IOTLB_PASID_RSVD_HI      0xf80ULL
 
 /* Mask for Device IOTLB Invalidate Descriptor */
 #define VTD_INV_DESC_DEVICE_IOTLB_ADDR(val) ((val) & 0xfffffffffffff000ULL)
@@ -413,6 +422,7 @@ typedef union VTDInvDesc VTDInvDesc;
 /* Information about page-selective IOTLB invalidate */
 struct VTDIOTLBPageInvInfo {
     uint16_t domain_id;
+    uint32_t pasid;
     uint64_t addr;
     uint8_t mask;
 };
diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index 5bf7e52bf5..57beff0c17 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -12,6 +12,8 @@ vtd_inv_desc_cc_devices(uint16_t sid, uint16_t fmask) "context invalidate device
 vtd_inv_desc_iotlb_global(void) "iotlb invalidate global"
 vtd_inv_desc_iotlb_domain(uint16_t domain) "iotlb invalidate whole domain 0x%"PRIx16
 vtd_inv_desc_iotlb_pages(uint16_t domain, uint64_t addr, uint8_t mask) "iotlb invalidate domain 0x%"PRIx16" addr 0x%"PRIx64" mask 0x%"PRIx8
+vtd_inv_desc_iotlb_pasid_pages(uint16_t domain, uint64_t addr, uint8_t mask, uint32_t pasid) "iotlb invalidate domain 0x%"PRIx16" addr 0x%"PRIx64" mask 0x%"PRIx8" pasid 0x%"PRIx32
+vtd_inv_desc_iotlb_pasid(uint16_t domain, uint32_t pasid) "iotlb invalidate domain 0x%"PRIx16" pasid 0x%"PRIx32
 vtd_inv_desc_wait_sw(uint64_t addr, uint32_t data) "wait invalidate status write addr 0x%"PRIx64" data 0x%"PRIx32
 vtd_inv_desc_wait_irq(const char *msg) "%s"
 vtd_inv_desc_wait_write_fail(uint64_t hi, uint64_t lo) "write fail for wait desc hi 0x%"PRIx64" lo 0x%"PRIx64
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 014d77dc2a..2fadbf6033 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -97,6 +97,7 @@ struct VTDPASIDEntry {
 struct VTDAddressSpace {
     PCIBus *bus;
     uint8_t devfn;
+    uint32_t pasid;
     AddressSpace as;
     IOMMUMemoryRegion iommu;
     MemoryRegion root;          /* The root container of the device */
@@ -113,6 +114,7 @@ struct VTDAddressSpace {
 struct VTDIOTLBEntry {
     uint64_t gfn;
     uint16_t domain_id;
+    uint32_t pasid;
     uint64_t slpte;
     uint64_t mask;
     uint8_t access_flags;
@@ -258,6 +260,7 @@ struct IntelIOMMUState {
     bool buggy_eim;                 /* Force buggy EIM unless eim=off */
     uint8_t aw_bits;                /* Host/IOVA address width (in bits) */
     bool dma_drain;                 /* Whether DMA r/w draining enabled */
+    bool pasid;                     /* Whether to support PASID */
 
     /*
      * Protects IOMMU states in general.  Currently it protects the
@@ -269,6 +272,7 @@ struct IntelIOMMUState {
 /* Find the VTD Address space associated with the given bus pointer,
  * create a new one if none exists
  */
-VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn);
+VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
+                                 int devfn, unsigned int pasid);
 
 #endif
diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
index 347440d42c..cbfcf0b770 100644
--- a/include/hw/pci/pci_bus.h
+++ b/include/hw/pci/pci_bus.h
@@ -26,6 +26,8 @@ enum PCIBusFlags {
     PCI_BUS_EXTENDED_CONFIG_SPACE                           = 0x0002,
 };
 
+#define PCI_NO_PASID UINT32_MAX
+
 struct PCIBus {
     BusState qbus;
     enum PCIBusFlags flags;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/3] intel-iommu: don't warn guest errors when getting rid2pasid entry
  2022-01-05  4:19 ` [PATCH 1/3] intel-iommu: don't warn guest errors when getting rid2pasid entry Jason Wang
@ 2022-01-13  3:35   ` Peter Xu
  2022-01-13  6:16     ` Jason Wang
  2022-01-13  7:05     ` Michael S. Tsirkin
  2022-01-13  7:06   ` Michael S. Tsirkin
  1 sibling, 2 replies; 28+ messages in thread
From: Peter Xu @ 2022-01-13  3:35 UTC (permalink / raw)
  To: Jason Wang; +Cc: yi.l.liu, yi.y.sun, qemu-devel, mst

On Wed, Jan 05, 2022 at 12:19:43PM +0800, Jason Wang wrote:
> We use to warn on wrong rid2pasid entry. But this error could be
> triggered by the guest and could happens during initialization. So
> let's don't warn in this case.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  hw/i386/intel_iommu.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 4c6c016388..f2c7a23712 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -1524,8 +1524,10 @@ static bool vtd_dev_pt_enabled(IntelIOMMUState *s, VTDContextEntry *ce)
>      if (s->root_scalable) {
>          ret = vtd_ce_get_rid2pasid_entry(s, ce, &pe);
>          if (ret) {
> -            error_report_once("%s: vtd_ce_get_rid2pasid_entry error: %"PRId32,
> -                              __func__, ret);
> +            /*
> +             * This error is guest triggerable. We should assumt PT
> +             * not enabled for safety.
> +             */
>              return false;
>          }
>          return (VTD_PE_GET_TYPE(&pe) == VTD_SM_PASID_ENTRY_PT);
> -- 
> 2.25.1
> 

No strong opinion, but the thing is mostly all error_report_once() in this file
is guest triggerable.  If we remove this one then it's debatable on whether we
want to remove all.

IMHO we used the _once() variant just for this: it won't go into any form of
DoS, meanwhile we'll still get some information (as hypervisor) that the guest
OS may not be trustworthy.

So from that pov it's still useful?  Or is this error very special in some way?

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/3] intel-iommu: drop VTDBus
  2022-01-05  4:19 ` [PATCH 2/3] intel-iommu: drop VTDBus Jason Wang
@ 2022-01-13  4:12   ` Peter Xu
  2022-01-14  2:32     ` Jason Wang
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Xu @ 2022-01-13  4:12 UTC (permalink / raw)
  To: Jason Wang; +Cc: yi.l.liu, yi.y.sun, qemu-devel, mst

On Wed, Jan 05, 2022 at 12:19:44PM +0800, Jason Wang wrote:
> We introduce VTDBus structure as an intermediate step for searching
> the address space. This works well with SID based matching/lookup. But
> when we want to support SID plus PASID based address space lookup,
> this intermediate steps turns out to be a burden. So the patch simply
> drops the VTDBus structure and use the PCIBus and devfn as the key for
> the g_hash_table(). This simplifies the codes and the future PASID
> extension.
> 
> This may case slight slower for the vtd_find_as_from_bus_num() callers
> but since they are all called from the control path, we can afford it.

The only one I found is vtd_process_device_iotlb_desc(), which may get affected
the most; the rest mostly traverse all the address spaces anyway, so it
shouldn't hurt.

I think dev-iotlb can be invalidated in IO path too when kernel device drivers
are used?  It shouldn't affect much when the VM has a few devices, but IIUC
it'll further slow down the kernel drivers on vIOMMU.  Maybe it's not a huge
problem either.

> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  hw/i386/intel_iommu.c         | 183 +++++++++++++---------------------
>  include/hw/i386/intel_iommu.h |  10 +-
>  2 files changed, 69 insertions(+), 124 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index f2c7a23712..58c682097b 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -61,6 +61,11 @@
>      }                                                                         \
>  }
>  
> +struct vtd_as_key {
> +    PCIBus *bus;
> +    uint8_t devfn;
> +};
> +
>  static void vtd_address_space_refresh_all(IntelIOMMUState *s);
>  static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n);
>  
> @@ -190,12 +195,18 @@ static inline gboolean vtd_as_has_map_notifier(VTDAddressSpace *as)
>  /* GHashTable functions */
>  static gboolean vtd_uint64_equal(gconstpointer v1, gconstpointer v2)
>  {
> -    return *((const uint64_t *)v1) == *((const uint64_t *)v2);
> +    const struct vtd_as_key *key1 = v1;
> +    const struct vtd_as_key *key2 = v2;
> +
> +    return (key1->bus == key2->bus) && (key1->devfn == key2->devfn);
>  }
>  
>  static guint vtd_uint64_hash(gconstpointer v)
>  {
> -    return (guint)*(const uint64_t *)v;
> +    const struct vtd_as_key *key = v;
> +    guint value = (guint)(uintptr_t)key->bus;
> +
> +    return (guint)(value << 8 | key->devfn);

Note that value here is the PCIBus* pointer itself.  Just want to check with you
whether it's intended to use that as the hash value (or maybe you wanted to use
the Source ID, i.e. the bus number rather than the bus pointer)?
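
Just to spell out the two readings (a rough sketch only, reusing the vtd_as_key
layout from this patch; not a proposal):

/* (a) what the patch computes today: the truncated PCIBus pointer mixed
 * with the devfn */
static guint vtd_as_hash_by_busptr(gconstpointer v)
{
    const struct vtd_as_key *key = v;
    guint value = (guint)(uintptr_t)key->bus;

    return (guint)(value << 8 | key->devfn);
}

/* (b) what the Source-ID reading would look like; note the bus number is
 * guest-programmed, so entries would have to be rehashed if it changes */
static guint vtd_as_hash_by_sid(gconstpointer v)
{
    const struct vtd_as_key *key = v;

    return (guint)((pci_bus_num(key->bus) << 8) | key->devfn);
}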

>  }
>  
>  static gboolean vtd_hash_remove_by_domain(gpointer key, gpointer value,
> @@ -236,22 +247,14 @@ static gboolean vtd_hash_remove_by_page(gpointer key, gpointer value,
>  static void vtd_reset_context_cache_locked(IntelIOMMUState *s)
>  {
>      VTDAddressSpace *vtd_as;
> -    VTDBus *vtd_bus;
> -    GHashTableIter bus_it;
> -    uint32_t devfn_it;
> +    GHashTableIter as_it;
>  
>      trace_vtd_context_cache_reset();
>  
> -    g_hash_table_iter_init(&bus_it, s->vtd_as_by_busptr);
> +    g_hash_table_iter_init(&as_it, s->vtd_as);
>  
> -    while (g_hash_table_iter_next (&bus_it, NULL, (void**)&vtd_bus)) {
> -        for (devfn_it = 0; devfn_it < PCI_DEVFN_MAX; ++devfn_it) {
> -            vtd_as = vtd_bus->dev_as[devfn_it];
> -            if (!vtd_as) {
> -                continue;
> -            }
> -            vtd_as->context_cache_entry.context_cache_gen = 0;
> -        }
> +    while (g_hash_table_iter_next (&as_it, NULL, (void**)&vtd_as)) {
> +        vtd_as->context_cache_entry.context_cache_gen = 0;
>      }
>      s->context_cache_gen = 1;
>  }
> @@ -986,32 +989,6 @@ static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, uint32_t level)
>      return slpte & rsvd_mask;
>  }
>  
> -/* Find the VTD address space associated with a given bus number */
> -static VTDBus *vtd_find_as_from_bus_num(IntelIOMMUState *s, uint8_t bus_num)
> -{
> -    VTDBus *vtd_bus = s->vtd_as_by_bus_num[bus_num];
> -    GHashTableIter iter;
> -
> -    if (vtd_bus) {
> -        return vtd_bus;
> -    }
> -
> -    /*
> -     * Iterate over the registered buses to find the one which
> -     * currently holds this bus number and update the bus_num
> -     * lookup table.
> -     */
> -    g_hash_table_iter_init(&iter, s->vtd_as_by_busptr);
> -    while (g_hash_table_iter_next(&iter, NULL, (void **)&vtd_bus)) {
> -        if (pci_bus_num(vtd_bus->bus) == bus_num) {
> -            s->vtd_as_by_bus_num[bus_num] = vtd_bus;
> -            return vtd_bus;
> -        }
> -    }
> -
> -    return NULL;
> -}
> -
>  /* Given the @iova, get relevant @slptep. @slpte_level will be the last level
>   * of the translation, can be used for deciding the size of large page.
>   */
> @@ -1604,18 +1581,12 @@ static bool vtd_switch_address_space(VTDAddressSpace *as)
>  
>  static void vtd_switch_address_space_all(IntelIOMMUState *s)
>  {
> +    VTDAddressSpace *vtd_as;
>      GHashTableIter iter;
> -    VTDBus *vtd_bus;
> -    int i;
> -
> -    g_hash_table_iter_init(&iter, s->vtd_as_by_busptr);
> -    while (g_hash_table_iter_next(&iter, NULL, (void **)&vtd_bus)) {
> -        for (i = 0; i < PCI_DEVFN_MAX; i++) {
> -            if (!vtd_bus->dev_as[i]) {
> -                continue;
> -            }
> -            vtd_switch_address_space(vtd_bus->dev_as[i]);
> -        }
> +
> +    g_hash_table_iter_init(&iter, s->vtd_as);
> +    while (g_hash_table_iter_next(&iter, NULL, (void **)&vtd_as)) {
> +        vtd_switch_address_space(vtd_as);
>      }
>  }
>  
> @@ -1659,16 +1630,11 @@ static inline bool vtd_is_interrupt_addr(hwaddr addr)
>  
>  static void vtd_pt_enable_fast_path(IntelIOMMUState *s, uint16_t source_id)
>  {
> -    VTDBus *vtd_bus;
>      VTDAddressSpace *vtd_as;
>      bool success = false;
> +    uintptr_t key = (uintptr_t)source_id;
>  
> -    vtd_bus = vtd_find_as_from_bus_num(s, VTD_SID_TO_BUS(source_id));
> -    if (!vtd_bus) {
> -        goto out;
> -    }
> -
> -    vtd_as = vtd_bus->dev_as[VTD_SID_TO_DEVFN(source_id)];
> +    vtd_as = g_hash_table_lookup(s->vtd_as, &key);

I'm slightly confused - what I read below tells me that the key is "struct
vtd_as_key" at [1] below, but here it's a uintptr_t containing the SID.  I don't
think they match.. Did I misread something?

>      if (!vtd_as) {
>          goto out;
>      }
> @@ -1876,11 +1842,10 @@ static void vtd_context_device_invalidate(IntelIOMMUState *s,
>                                            uint16_t source_id,
>                                            uint16_t func_mask)
>  {
> +    GHashTableIter as_it;
>      uint16_t mask;
> -    VTDBus *vtd_bus;
>      VTDAddressSpace *vtd_as;
>      uint8_t bus_n, devfn;
> -    uint16_t devfn_it;
>  
>      trace_vtd_inv_desc_cc_devices(source_id, func_mask);
>  
> @@ -1903,32 +1868,31 @@ static void vtd_context_device_invalidate(IntelIOMMUState *s,
>      mask = ~mask;
>  
>      bus_n = VTD_SID_TO_BUS(source_id);
> -    vtd_bus = vtd_find_as_from_bus_num(s, bus_n);
> -    if (vtd_bus) {
> -        devfn = VTD_SID_TO_DEVFN(source_id);
> -        for (devfn_it = 0; devfn_it < PCI_DEVFN_MAX; ++devfn_it) {
> -            vtd_as = vtd_bus->dev_as[devfn_it];
> -            if (vtd_as && ((devfn_it & mask) == (devfn & mask))) {
> -                trace_vtd_inv_desc_cc_device(bus_n, VTD_PCI_SLOT(devfn_it),
> -                                             VTD_PCI_FUNC(devfn_it));
> -                vtd_iommu_lock(s);
> -                vtd_as->context_cache_entry.context_cache_gen = 0;
> -                vtd_iommu_unlock(s);
> -                /*
> -                 * Do switch address space when needed, in case if the
> -                 * device passthrough bit is switched.
> -                 */
> -                vtd_switch_address_space(vtd_as);
> -                /*
> -                 * So a device is moving out of (or moving into) a
> -                 * domain, resync the shadow page table.
> -                 * This won't bring bad even if we have no such
> -                 * notifier registered - the IOMMU notification
> -                 * framework will skip MAP notifications if that
> -                 * happened.
> -                 */
> -                vtd_sync_shadow_page_table(vtd_as);
> -            }
> +    devfn = VTD_SID_TO_DEVFN(source_id);
> +
> +    g_hash_table_iter_init(&as_it, s->vtd_as);
> +    while (g_hash_table_iter_next(&as_it, NULL, (void**)&vtd_as)) {
> +        if ((pci_bus_num(vtd_as->bus) == bus_n) &&
> +            (vtd_as->devfn & mask) == (devfn & mask)) {
> +            trace_vtd_inv_desc_cc_device(bus_n, VTD_PCI_SLOT(vtd_as->devfn),
> +                                         VTD_PCI_FUNC(vtd_as->devfn));
> +            vtd_iommu_lock(s);
> +            vtd_as->context_cache_entry.context_cache_gen = 0;
> +            vtd_iommu_unlock(s);
> +            /*
> +             * Do switch address space when needed, in case if the
> +             * device passthrough bit is switched.
> +             */
> +            vtd_switch_address_space(vtd_as);
> +            /*
> +             * So a device is moving out of (or moving into) a
> +             * domain, resync the shadow page table.
> +             * This won't bring bad even if we have no such
> +             * notifier registered - the IOMMU notification
> +             * framework will skip MAP notifications if that
> +             * happened.
> +             */
> +            vtd_sync_shadow_page_table(vtd_as);
>          }
>      }
>  }
> @@ -2442,18 +2406,14 @@ static bool vtd_process_device_iotlb_desc(IntelIOMMUState *s,
>  {
>      VTDAddressSpace *vtd_dev_as;
>      IOMMUTLBEvent event;
> -    struct VTDBus *vtd_bus;
> +    uintptr_t key;
>      hwaddr addr;
>      uint64_t sz;
>      uint16_t sid;
> -    uint8_t devfn;
>      bool size;
> -    uint8_t bus_num;
>  
>      addr = VTD_INV_DESC_DEVICE_IOTLB_ADDR(inv_desc->hi);
>      sid = VTD_INV_DESC_DEVICE_IOTLB_SID(inv_desc->lo);
> -    devfn = sid & 0xff;
> -    bus_num = sid >> 8;
>      size = VTD_INV_DESC_DEVICE_IOTLB_SIZE(inv_desc->hi);
>  
>      if ((inv_desc->lo & VTD_INV_DESC_DEVICE_IOTLB_RSVD_LO) ||
> @@ -2464,12 +2424,8 @@ static bool vtd_process_device_iotlb_desc(IntelIOMMUState *s,
>          return false;
>      }
>  
> -    vtd_bus = vtd_find_as_from_bus_num(s, bus_num);
> -    if (!vtd_bus) {
> -        goto done;
> -    }
> -
> -    vtd_dev_as = vtd_bus->dev_as[devfn];
> +    key = (uintptr_t)sid;
> +    vtd_dev_as = g_hash_table_lookup(s->vtd_as, &key);

Same question here.

>      if (!vtd_dev_as) {
>          goto done;
>      }
> @@ -3390,27 +3346,22 @@ static const MemoryRegionOps vtd_mem_ir_ops = {
>  
>  VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
>  {
> -    uintptr_t key = (uintptr_t)bus;
> -    VTDBus *vtd_bus = g_hash_table_lookup(s->vtd_as_by_busptr, &key);
> -    VTDAddressSpace *vtd_dev_as;
> +    struct vtd_as_key key = {
> +        .bus = bus,
> +        .devfn = devfn,
> +    };
> +    VTDAddressSpace *vtd_dev_as = g_hash_table_lookup(s->vtd_as, &key);
>      char name[128];
>  
> -    if (!vtd_bus) {
> -        uintptr_t *new_key = g_malloc(sizeof(*new_key));
> -        *new_key = (uintptr_t)bus;
> -        /* No corresponding free() */
> -        vtd_bus = g_malloc0(sizeof(VTDBus) + sizeof(VTDAddressSpace *) * \
> -                            PCI_DEVFN_MAX);
> -        vtd_bus->bus = bus;
> -        g_hash_table_insert(s->vtd_as_by_busptr, new_key, vtd_bus);
> -    }
> +    if (!vtd_dev_as) {
> +        struct vtd_as_key *new_key = g_malloc(sizeof(*new_key));
>  
> -    vtd_dev_as = vtd_bus->dev_as[devfn];
> +        new_key->bus = bus;
> +        new_key->devfn = devfn;
>  
> -    if (!vtd_dev_as) {
>          snprintf(name, sizeof(name), "vtd-%02x.%x", PCI_SLOT(devfn),
>                   PCI_FUNC(devfn));
> -        vtd_bus->dev_as[devfn] = vtd_dev_as = g_malloc0(sizeof(VTDAddressSpace));
> +        vtd_dev_as = g_malloc0(sizeof(VTDAddressSpace));
>  
>          vtd_dev_as->bus = bus;
>          vtd_dev_as->devfn = (uint8_t)devfn;
> @@ -3466,6 +3417,8 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
>                                              &vtd_dev_as->nodmar, 0);
>  
>          vtd_switch_address_space(vtd_dev_as);
> +
> +        g_hash_table_insert(s->vtd_as, new_key, vtd_dev_as);

[1]

>      }
>      return vtd_dev_as;
>  }
> @@ -3750,6 +3703,7 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
>      assert(0 <= devfn && devfn < PCI_DEVFN_MAX);
>  
>      vtd_as = vtd_find_add_as(s, bus, devfn);
> +
>      return &vtd_as->as;
>  }
>  
> @@ -3835,7 +3789,6 @@ static void vtd_realize(DeviceState *dev, Error **errp)
>  
>      QLIST_INIT(&s->vtd_as_with_notifiers);
>      qemu_mutex_init(&s->iommu_lock);
> -    memset(s->vtd_as_by_bus_num, 0, sizeof(s->vtd_as_by_bus_num));
>      memory_region_init_io(&s->csrmem, OBJECT(s), &vtd_mem_ops, s,
>                            "intel_iommu", DMAR_REG_SIZE);
>  
> @@ -3857,8 +3810,8 @@ static void vtd_realize(DeviceState *dev, Error **errp)
>      /* No corresponding destroy */
>      s->iotlb = g_hash_table_new_full(vtd_uint64_hash, vtd_uint64_equal,
>                                       g_free, g_free);
> -    s->vtd_as_by_busptr = g_hash_table_new_full(vtd_uint64_hash, vtd_uint64_equal,
> -                                              g_free, g_free);
> +    s->vtd_as = g_hash_table_new_full(vtd_uint64_hash, vtd_uint64_equal,
> +                                      g_free, g_free);
>      vtd_init(s);
>      sysbus_mmio_map(SYS_BUS_DEVICE(s), 0, Q35_HOST_BRIDGE_IOMMU_ADDR);
>      pci_setup_iommu(bus, vtd_host_dma_iommu, dev);
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> index 41783ee46d..014d77dc2a 100644
> --- a/include/hw/i386/intel_iommu.h
> +++ b/include/hw/i386/intel_iommu.h
> @@ -58,7 +58,6 @@ typedef struct VTDContextEntry VTDContextEntry;
>  typedef struct VTDContextCacheEntry VTDContextCacheEntry;
>  typedef struct VTDAddressSpace VTDAddressSpace;
>  typedef struct VTDIOTLBEntry VTDIOTLBEntry;
> -typedef struct VTDBus VTDBus;
>  typedef union VTD_IR_TableEntry VTD_IR_TableEntry;
>  typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress;
>  typedef struct VTDPASIDDirEntry VTDPASIDDirEntry;
> @@ -111,12 +110,6 @@ struct VTDAddressSpace {
>      IOVATree *iova_tree;          /* Traces mapped IOVA ranges */
>  };
>  
> -struct VTDBus {
> -    PCIBus* bus;		/* A reference to the bus to provide translation for */
> -    /* A table of VTDAddressSpace objects indexed by devfn */
> -    VTDAddressSpace *dev_as[];
> -};
> -
>  struct VTDIOTLBEntry {
>      uint64_t gfn;
>      uint16_t domain_id;
> @@ -252,8 +245,7 @@ struct IntelIOMMUState {
>      uint32_t context_cache_gen;     /* Should be in [1,MAX] */
>      GHashTable *iotlb;              /* IOTLB */
>  
> -    GHashTable *vtd_as_by_busptr;   /* VTDBus objects indexed by PCIBus* reference */
> -    VTDBus *vtd_as_by_bus_num[VTD_PCI_BUS_MAX]; /* VTDBus objects indexed by bus number */
> +    GHashTable *vtd_as;             /* VTD address space indexed by source id*/
>      /* list of registered notifiers */
>      QLIST_HEAD(, VTDAddressSpace) vtd_as_with_notifiers;
>  
> -- 
> 2.25.1
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/3] intel-iommu: PASID support
  2022-01-05  4:19 ` [PATCH 3/3] intel-iommu: PASID support Jason Wang
@ 2022-01-13  5:06   ` Peter Xu
  2022-01-13  7:16     ` Michael S. Tsirkin
  2022-01-14  2:47     ` Jason Wang
  0 siblings, 2 replies; 28+ messages in thread
From: Peter Xu @ 2022-01-13  5:06 UTC (permalink / raw)
  To: Jason Wang; +Cc: yi.l.liu, yi.y.sun, qemu-devel, mst

On Wed, Jan 05, 2022 at 12:19:45PM +0800, Jason Wang wrote:
> @@ -1725,11 +1780,16 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
>          cc_entry->context_cache_gen = s->context_cache_gen;
>      }
>  
> +    /* Try to fetch slpte form IOTLB */
> +    if ((pasid == PCI_NO_PASID) && s->root_scalable) {
> +        pasid = VTD_CE_GET_RID2PASID(&ce);
> +    }
> +
>      /*
>       * We don't need to translate for pass-through context entries.
>       * Also, let's ignore IOTLB caching as well for PT devices.
>       */
> -    if (vtd_dev_pt_enabled(s, &ce)) {
> +    if (vtd_dev_pt_enabled(s, &ce, pasid)) {
>          entry->iova = addr & VTD_PAGE_MASK_4K;
>          entry->translated_addr = entry->iova;
>          entry->addr_mask = ~VTD_PAGE_MASK_4K;
> @@ -1750,14 +1810,24 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
>          return true;
>      }
>  
> +    iotlb_entry = vtd_lookup_iotlb(s, source_id, addr, pasid);
> +    if (iotlb_entry) {
> +        trace_vtd_iotlb_page_hit(source_id, addr, iotlb_entry->slpte,
> +                                 iotlb_entry->domain_id);
> +        slpte = iotlb_entry->slpte;
> +        access_flags = iotlb_entry->access_flags;
> +        page_mask = iotlb_entry->mask;
> +        goto out;
> +    }

IIUC the iotlb lookup moved down just because, in the pasid==NO_PASID case,
we'll need to fetch the default pasid from the context entry first.  That looks
reasonable.

It's just a bit of pity because logically it'll slow down iotlb hits due to
context entry operations.  When NO_PASID we could have looked up iotlb without
checking pasid at all, assuming that "default pasid" will always match.  But
that is a little bit hacky.

vIOMMU seems to be mostly used for assigned devices and dpdk in production in
the future due to its slowness otherwise.. so maybe not a big deal at all.
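
To make that concrete, here is roughly what a pasid-aware IOTLB key looks like
(illustrative only; the actual encoding in the patch may pack these fields into
a single integer instead):

struct vtd_iotlb_key {
    uint64_t gfn;
    uint32_t pasid;
    uint16_t sid;
};

static guint vtd_iotlb_key_hash(gconstpointer v)
{
    const struct vtd_iotlb_key *key = v;

    /* only needs to be stable, not perfect */
    return (guint)(key->gfn ^ ((uint64_t)key->sid << 20) ^
                   ((uint64_t)key->pasid << 40));
}

static gboolean vtd_iotlb_key_equal(gconstpointer v1, gconstpointer v2)
{
    const struct vtd_iotlb_key *k1 = v1, *k2 = v2;

    return k1->gfn == k2->gfn && k1->sid == k2->sid && k1->pasid == k2->pasid;
}

Once the pasid is part of both the hash and the equality check, a NO_PASID
lookup has to know the default pasid up front, which is exactly why the context
entry is now fetched first.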

[...]

> @@ -2011,7 +2083,52 @@ static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
>      vtd_iommu_lock(s);
>      g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page, &info);
>      vtd_iommu_unlock(s);
> -    vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am);
> +    vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am, PCI_NO_PASID);
> +}
> +
> +static void vtd_iotlb_page_pasid_invalidate(IntelIOMMUState *s,
> +                                            uint16_t domain_id,
> +                                            hwaddr addr, uint8_t am,
> +                                            uint32_t pasid)
> +{
> +    VTDIOTLBPageInvInfo info;
> +
> +    trace_vtd_inv_desc_iotlb_pasid_pages(domain_id, addr, am, pasid);
> +
> +    assert(am <= VTD_MAMV);
> +    info.domain_id = domain_id;
> +    info.addr = addr;
> +    info.mask = ~((1 << am) - 1);
> +    info.pasid = pasid;
> +    vtd_iommu_lock(s);
> +    g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page_pasid, &info);
> +    vtd_iommu_unlock(s);
> +    vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am, pasid);

Hmm, I think indeed we need a notification, but it'll be unnecessary for
e.g. vfio map notifiers, because this is 1st level invalidation and at least so
far vfio map notifiers are rewalking only the 2nd level page table, so it'll be
destined to be a no-op and pure overhead.

> +}
> +
> +static void vtd_iotlb_pasid_invalidate(IntelIOMMUState *s, uint16_t domain_id,
> +                                       uint32_t pasid)
> +{
> +    VTDIOTLBPageInvInfo info;
> +    VTDAddressSpace *vtd_as;
> +    VTDContextEntry ce;
> +
> +    trace_vtd_inv_desc_iotlb_pasid(domain_id, pasid);
> +
> +    info.domain_id = domain_id;
> +    info.pasid = pasid;
> +    vtd_iommu_lock(s);
> +    g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_pasid, &info);
> +    vtd_iommu_unlock(s);
> +
> +    QLIST_FOREACH(vtd_as, &s->vtd_as_with_notifiers, next) {
> +        if (!vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
> +                                      vtd_as->devfn, &ce) &&
> +            domain_id == vtd_get_domain_id(s, &ce, vtd_as->pasid) &&
> +            pasid == vtd_as->pasid) {
> +            vtd_sync_shadow_page_table(vtd_as);

Do we need to rewalk the shadow pgtable (which is the 2nd level, afaict) even
if we got the 1st level pgtable invalidated?

> +        }
> +    }
>  }

The rest looks mostly good to me; thanks.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/3] intel-iommu: don't warn guest errors when getting rid2pasid entry
  2022-01-13  3:35   ` Peter Xu
@ 2022-01-13  6:16     ` Jason Wang
  2022-01-13  6:32       ` Peter Xu
  2022-01-13  7:05     ` Michael S. Tsirkin
  1 sibling, 1 reply; 28+ messages in thread
From: Jason Wang @ 2022-01-13  6:16 UTC (permalink / raw)
  To: Peter Xu; +Cc: Liu, Yi L, yi.y.sun, qemu-devel, mst

On Thu, Jan 13, 2022 at 11:35 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Wed, Jan 05, 2022 at 12:19:43PM +0800, Jason Wang wrote:
> > We use to warn on wrong rid2pasid entry. But this error could be
> > triggered by the guest and could happens during initialization. So
> > let's don't warn in this case.
> >
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> >  hw/i386/intel_iommu.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index 4c6c016388..f2c7a23712 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -1524,8 +1524,10 @@ static bool vtd_dev_pt_enabled(IntelIOMMUState *s, VTDContextEntry *ce)
> >      if (s->root_scalable) {
> >          ret = vtd_ce_get_rid2pasid_entry(s, ce, &pe);
> >          if (ret) {
> > -            error_report_once("%s: vtd_ce_get_rid2pasid_entry error: %"PRId32,
> > -                              __func__, ret);
> > +            /*
> > +             * This error is guest triggerable. We should assumt PT
> > +             * not enabled for safety.
> > +             */
> >              return false;
> >          }
> >          return (VTD_PE_GET_TYPE(&pe) == VTD_SM_PASID_ENTRY_PT);
> > --
> > 2.25.1
> >
>
> No strong opinion, but the thing is mostly all error_report_once() in this file
> is guest triggerable.  If we remove this one then it's debatable on whether we
> want to remove all.
>
> IMHO we used the _once() variant just for this: it won't go into any form of
> DoS, meanwhile we'll still get some information (as hypervisor) that the guest
> OS may not be trustworthy.
>
> So from that pov it's still useful?  Or is this error very special in some way?

I want to be consistent with vtd_as_pt_enabled() where we don't even
have error_report_once().

Thanks

>
> Thanks,
>
> --
> Peter Xu
>



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/3] intel-iommu: don't warn guest errors when getting rid2pasid entry
  2022-01-13  6:16     ` Jason Wang
@ 2022-01-13  6:32       ` Peter Xu
  0 siblings, 0 replies; 28+ messages in thread
From: Peter Xu @ 2022-01-13  6:32 UTC (permalink / raw)
  To: Jason Wang; +Cc: Liu, Yi L, yi.y.sun, qemu-devel, mst

On Thu, Jan 13, 2022 at 02:16:35PM +0800, Jason Wang wrote:
> On Thu, Jan 13, 2022 at 11:35 AM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Wed, Jan 05, 2022 at 12:19:43PM +0800, Jason Wang wrote:
> > > We use to warn on wrong rid2pasid entry. But this error could be
> > > triggered by the guest and could happens during initialization. So
> > > let's don't warn in this case.
> > >
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > ---
> > >  hw/i386/intel_iommu.c | 6 ++++--
> > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > > index 4c6c016388..f2c7a23712 100644
> > > --- a/hw/i386/intel_iommu.c
> > > +++ b/hw/i386/intel_iommu.c
> > > @@ -1524,8 +1524,10 @@ static bool vtd_dev_pt_enabled(IntelIOMMUState *s, VTDContextEntry *ce)
> > >      if (s->root_scalable) {
> > >          ret = vtd_ce_get_rid2pasid_entry(s, ce, &pe);
> > >          if (ret) {
> > > -            error_report_once("%s: vtd_ce_get_rid2pasid_entry error: %"PRId32,
> > > -                              __func__, ret);
> > > +            /*
> > > +             * This error is guest triggerable. We should assumt PT
> > > +             * not enabled for safety.
> > > +             */
> > >              return false;
> > >          }
> > >          return (VTD_PE_GET_TYPE(&pe) == VTD_SM_PASID_ENTRY_PT);
> > > --
> > > 2.25.1
> > >
> >
> > No strong opinion, but the thing is mostly all error_report_once() in this file
> > is guest triggerable.  If we remove this one then it's debatable on whether we
> > want to remove all.
> >
> > IMHO we used the _once() variant just for this: it won't go into any form of
> > DoS, meanwhile we'll still get some information (as hypervisor) that the guest
> > OS may not be trustworthy.
> >
> > So from that pov it's still useful?  Or is this error very special in some way?
> 
> I want to be consistent with vtd_as_pt_enabled() where we don't even
> have error_report_once().

According to the comments (which, I think, were left over by myself..) in
vtd_as_pt_enabled() - maybe indeed it could be triggered during guest boot.
Then I assume this can indeed be a special case.

Acked-by: Peter Xu <peterx@redhat.com>

Thanks.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/3] intel-iommu: don't warn guest errors when getting rid2pasid entry
  2022-01-13  3:35   ` Peter Xu
  2022-01-13  6:16     ` Jason Wang
@ 2022-01-13  7:05     ` Michael S. Tsirkin
  2022-01-14  3:02       ` Jason Wang
  1 sibling, 1 reply; 28+ messages in thread
From: Michael S. Tsirkin @ 2022-01-13  7:05 UTC (permalink / raw)
  To: Peter Xu; +Cc: Jason Wang, yi.l.liu, yi.y.sun, qemu-devel

On Thu, Jan 13, 2022 at 11:35:19AM +0800, Peter Xu wrote:
> On Wed, Jan 05, 2022 at 12:19:43PM +0800, Jason Wang wrote:
> > We use to warn on wrong rid2pasid entry. But this error could be
> > triggered by the guest and could happens during initialization. So
> > let's don't warn in this case.
> > 
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> >  hw/i386/intel_iommu.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index 4c6c016388..f2c7a23712 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -1524,8 +1524,10 @@ static bool vtd_dev_pt_enabled(IntelIOMMUState *s, VTDContextEntry *ce)
> >      if (s->root_scalable) {
> >          ret = vtd_ce_get_rid2pasid_entry(s, ce, &pe);
> >          if (ret) {
> > -            error_report_once("%s: vtd_ce_get_rid2pasid_entry error: %"PRId32,
> > -                              __func__, ret);
> > +            /*
> > +             * This error is guest triggerable. We should assumt PT
> > +             * not enabled for safety.
> > +             */
> >              return false;
> >          }
> >          return (VTD_PE_GET_TYPE(&pe) == VTD_SM_PASID_ENTRY_PT);
> > -- 
> > 2.25.1
> > 
> 
> No strong opinion, but the thing is mostly all error_report_once() in this file
> is guest triggerable.  If we remove this one then it's debatable on whether we
> want to remove all.
> 
> IMHO we used the _once() variant just for this: it won't go into any form of
> DoS, meanwhile we'll still get some information (as hypervisor) that the guest
> OS may not be trustworthy.
> 
> So from that pov it's still useful?  Or is this error very special in some way?
> 
> Thanks,


Well we have LOG_GUEST_ERROR for guest errors now.
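
For reference, at the call site this patch touches that would look something
like the below (just a sketch; qemu_log_mask()/LOG_GUEST_ERROR come from
qemu/log.h):

        if (ret) {
            /*
             * Guest triggerable: log it as a guest error rather than
             * warning, and assume PT is not enabled for safety.
             */
            qemu_log_mask(LOG_GUEST_ERROR,
                          "%s: vtd_ce_get_rid2pasid_entry error: %"PRId32"\n",
                          __func__, ret);
            return false;
        }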

> -- 
> Peter Xu



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/3] intel-iommu: don't warn guest errors when getting rid2pasid entry
  2022-01-05  4:19 ` [PATCH 1/3] intel-iommu: don't warn guest errors when getting rid2pasid entry Jason Wang
  2022-01-13  3:35   ` Peter Xu
@ 2022-01-13  7:06   ` Michael S. Tsirkin
  2022-01-14  2:56     ` Jason Wang
  1 sibling, 1 reply; 28+ messages in thread
From: Michael S. Tsirkin @ 2022-01-13  7:06 UTC (permalink / raw)
  To: Jason Wang; +Cc: yi.l.liu, yi.y.sun, qemu-devel, peterx

On Wed, Jan 05, 2022 at 12:19:43PM +0800, Jason Wang wrote:
> We use to warn on wrong rid2pasid entry. But this error could be
> triggered by the guest and could happens during initialization. So
> let's don't warn in this case.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  hw/i386/intel_iommu.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 4c6c016388..f2c7a23712 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -1524,8 +1524,10 @@ static bool vtd_dev_pt_enabled(IntelIOMMUState *s, VTDContextEntry *ce)
>      if (s->root_scalable) {
>          ret = vtd_ce_get_rid2pasid_entry(s, ce, &pe);
>          if (ret) {
> -            error_report_once("%s: vtd_ce_get_rid2pasid_entry error: %"PRId32,
> -                              __func__, ret);
> +            /*
> +             * This error is guest triggerable. We should assumt PT

typo

And drop "We should" pls, just use direct voice:
"Assume PT not enabled".

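i.e. something like (wording only, on top of this patch):

            /*
             * This error is guest triggerable.  Assume PT is not enabled
             * for safety.
             */
            return false;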

> +             * not enabled for safety.
> +             */
>              return false;
>          }
>          return (VTD_PE_GET_TYPE(&pe) == VTD_SM_PASID_ENTRY_PT);
> -- 
> 2.25.1



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/3] intel-iommu: PASID support
  2022-01-13  5:06   ` Peter Xu
@ 2022-01-13  7:16     ` Michael S. Tsirkin
  2022-01-14  2:47     ` Jason Wang
  1 sibling, 0 replies; 28+ messages in thread
From: Michael S. Tsirkin @ 2022-01-13  7:16 UTC (permalink / raw)
  To: Peter Xu; +Cc: Jason Wang, yi.l.liu, yi.y.sun, qemu-devel

On Thu, Jan 13, 2022 at 01:06:09PM +0800, Peter Xu wrote:
> On Wed, Jan 05, 2022 at 12:19:45PM +0800, Jason Wang wrote:
> > @@ -1725,11 +1780,16 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
> >          cc_entry->context_cache_gen = s->context_cache_gen;
> >      }
> >  
> > +    /* Try to fetch slpte form IOTLB */
> > +    if ((pasid == PCI_NO_PASID) && s->root_scalable) {
> > +        pasid = VTD_CE_GET_RID2PASID(&ce);
> > +    }
> > +
> >      /*
> >       * We don't need to translate for pass-through context entries.
> >       * Also, let's ignore IOTLB caching as well for PT devices.
> >       */
> > -    if (vtd_dev_pt_enabled(s, &ce)) {
> > +    if (vtd_dev_pt_enabled(s, &ce, pasid)) {
> >          entry->iova = addr & VTD_PAGE_MASK_4K;
> >          entry->translated_addr = entry->iova;
> >          entry->addr_mask = ~VTD_PAGE_MASK_4K;
> > @@ -1750,14 +1810,24 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
> >          return true;
> >      }
> >  
> > +    iotlb_entry = vtd_lookup_iotlb(s, source_id, addr, pasid);
> > +    if (iotlb_entry) {
> > +        trace_vtd_iotlb_page_hit(source_id, addr, iotlb_entry->slpte,
> > +                                 iotlb_entry->domain_id);
> > +        slpte = iotlb_entry->slpte;
> > +        access_flags = iotlb_entry->access_flags;
> > +        page_mask = iotlb_entry->mask;
> > +        goto out;
> > +    }
> 
> IIUC the iotlb lookup moved down just because, in the pasid==NO_PASID case,
> we'll need to fetch the default pasid from the context entry first.  That looks
> reasonable.
> 
> It's just a bit of pity because logically it'll slow down iotlb hits due to
> context entry operations.  When NO_PASID we could have looked up iotlb without
> checking pasid at all, assuming that "default pasid" will always match.  But
> that is a little bit hacky.

Maybe that's not a bad idea for an optimization.


> vIOMMU seems to be mostly used for assigned devices and dpdk in production in
> the future due to its slowness otherwise.. so maybe not a big deal at all.
> 
> [...]
> 
> > @@ -2011,7 +2083,52 @@ static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
> >      vtd_iommu_lock(s);
> >      g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page, &info);
> >      vtd_iommu_unlock(s);
> > -    vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am);
> > +    vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am, PCI_NO_PASID);
> > +}
> > +
> > +static void vtd_iotlb_page_pasid_invalidate(IntelIOMMUState *s,
> > +                                            uint16_t domain_id,
> > +                                            hwaddr addr, uint8_t am,
> > +                                            uint32_t pasid)
> > +{
> > +    VTDIOTLBPageInvInfo info;
> > +
> > +    trace_vtd_inv_desc_iotlb_pasid_pages(domain_id, addr, am, pasid);
> > +
> > +    assert(am <= VTD_MAMV);
> > +    info.domain_id = domain_id;
> > +    info.addr = addr;
> > +    info.mask = ~((1 << am) - 1);
> > +    info.pasid = pasid;
> > +    vtd_iommu_lock(s);
> > +    g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page_pasid, &info);
> > +    vtd_iommu_unlock(s);
> > +    vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am, pasid);
> 
> Hmm, I think indeed we need a notification, but it'll be unnecessary for
> e.g. vfio map notifiers, because this is 1st level invalidation and at least so
> far vfio map notifiers are rewalking only the 2nd level page table, so it'll be
> destined to be a no-op and pure overhead.
> 
> > +}
> > +
> > +static void vtd_iotlb_pasid_invalidate(IntelIOMMUState *s, uint16_t domain_id,
> > +                                       uint32_t pasid)
> > +{
> > +    VTDIOTLBPageInvInfo info;
> > +    VTDAddressSpace *vtd_as;
> > +    VTDContextEntry ce;
> > +
> > +    trace_vtd_inv_desc_iotlb_pasid(domain_id, pasid);
> > +
> > +    info.domain_id = domain_id;
> > +    info.pasid = pasid;
> > +    vtd_iommu_lock(s);
> > +    g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_pasid, &info);
> > +    vtd_iommu_unlock(s);
> > +
> > +    QLIST_FOREACH(vtd_as, &s->vtd_as_with_notifiers, next) {
> > +        if (!vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
> > +                                      vtd_as->devfn, &ce) &&
> > +            domain_id == vtd_get_domain_id(s, &ce, vtd_as->pasid) &&
> > +            pasid == vtd_as->pasid) {
> > +            vtd_sync_shadow_page_table(vtd_as);
> 
> Do we need to rewalk the shadow pgtable (which is the 2nd level, afaict) even
> if we got the 1st level pgtable invalidated?
> 
> > +        }
> > +    }
> >  }
> 
> The rest looks mostly good to me; thanks.
> 
> -- 
> Peter Xu



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/3] intel-iommu: drop VTDBus
  2022-01-13  4:12   ` Peter Xu
@ 2022-01-14  2:32     ` Jason Wang
  2022-01-14  9:15       ` Jason Wang
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Wang @ 2022-01-14  2:32 UTC (permalink / raw)
  To: Peter Xu; +Cc: yi.l.liu, yi.y.sun, qemu-devel, mst


On 2022/1/13 12:12 PM, Peter Xu wrote:
> On Wed, Jan 05, 2022 at 12:19:44PM +0800, Jason Wang wrote:
>> We introduce VTDBus structure as an intermediate step for searching
>> the address space. This works well with SID based matching/lookup. But
>> when we want to support SID plus PASID based address space lookup,
>> this intermediate steps turns out to be a burden. So the patch simply
>> drops the VTDBus structure and use the PCIBus and devfn as the key for
>> the g_hash_table(). This simplifies the codes and the future PASID
>> extension.
>>
>> This may case slight slower for the vtd_find_as_from_bus_num() callers
>> but since they are all called from the control path, we can afford it.
> The only one I found is vtd_process_device_iotlb_desc(), which may get affected
> the most; the rest mostly traverse all the address spaces anyway, so it
> shouldn't hurt.
>
> I think dev-iotlb can be invalidated in IO path too when kernel device drivers
> are used?  It shouldn't affect much when the VM has a few devices, but IIUC
> it'll further slow down the kernel drivers on vIOMMU.  Maybe it's not a huge
> problem either.


Maybe we can keep maintaining a cache to speed up the lookup for the
NO_PASID case.
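
One possible shape for that (hypothetical, not in this series): a small side
table keyed by source id, consulted only on the dev-iotlb/fast-path lookups and
flushed whenever the bus numbers may have changed, e.g.:

/* sketch: SID -> VTDAddressSpace* cache next to s->vtd_as;
 * created with g_hash_table_new(g_direct_hash, g_direct_equal) and
 * cleared with g_hash_table_remove_all() on context invalidation */
static VTDAddressSpace *vtd_sid_cache_lookup(GHashTable *sid_cache,
                                             uint16_t sid)
{
    return g_hash_table_lookup(sid_cache, GUINT_TO_POINTER((guint)sid));
}

static void vtd_sid_cache_insert(GHashTable *sid_cache, uint16_t sid,
                                 VTDAddressSpace *vtd_as)
{
    g_hash_table_insert(sid_cache, GUINT_TO_POINTER((guint)sid), vtd_as);
}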


>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>   hw/i386/intel_iommu.c         | 183 +++++++++++++---------------------
>>   include/hw/i386/intel_iommu.h |  10 +-
>>   2 files changed, 69 insertions(+), 124 deletions(-)
>>
>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>> index f2c7a23712..58c682097b 100644
>> --- a/hw/i386/intel_iommu.c
>> +++ b/hw/i386/intel_iommu.c
>> @@ -61,6 +61,11 @@
>>       }                                                                         \
>>   }
>>   
>> +struct vtd_as_key {
>> +    PCIBus *bus;
>> +    uint8_t devfn;
>> +};
>> +
>>   static void vtd_address_space_refresh_all(IntelIOMMUState *s);
>>   static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n);
>>   
>> @@ -190,12 +195,18 @@ static inline gboolean vtd_as_has_map_notifier(VTDAddressSpace *as)
>>   /* GHashTable functions */
>>   static gboolean vtd_uint64_equal(gconstpointer v1, gconstpointer v2)
>>   {
>> -    return *((const uint64_t *)v1) == *((const uint64_t *)v2);
>> +    const struct vtd_as_key *key1 = v1;
>> +    const struct vtd_as_key *key2 = v2;
>> +
>> +    return (key1->bus == key2->bus) && (key1->devfn == key2->devfn);
>>   }
>>   
>>   static guint vtd_uint64_hash(gconstpointer v)
>>   {
>> -    return (guint)*(const uint64_t *)v;
>> +    const struct vtd_as_key *key = v;
>> +    guint value = (guint)(uintptr_t)key->bus;
>> +
>> +    return (guint)(value << 8 | key->devfn);
> Note that value here is the PCIBus* pointer itself.  Just want to check with you
> whether it's intended to use that as the hash value (or maybe you wanted to use
> the Source ID, i.e. the bus number rather than the bus pointer)?


Right, SID should be used here.


>
>>   }
>>   
>>   static gboolean vtd_hash_remove_by_domain(gpointer key, gpointer value,
>> @@ -236,22 +247,14 @@ static gboolean vtd_hash_remove_by_page(gpointer key, gpointer value,
>>   static void vtd_reset_context_cache_locked(IntelIOMMUState *s)
>>   {
>>       VTDAddressSpace *vtd_as;
>> -    VTDBus *vtd_bus;
>> -    GHashTableIter bus_it;
>> -    uint32_t devfn_it;
>> +    GHashTableIter as_it;
>>   
>>       trace_vtd_context_cache_reset();
>>   
>> -    g_hash_table_iter_init(&bus_it, s->vtd_as_by_busptr);
>> +    g_hash_table_iter_init(&as_it, s->vtd_as);
>>   
>> -    while (g_hash_table_iter_next (&bus_it, NULL, (void**)&vtd_bus)) {
>> -        for (devfn_it = 0; devfn_it < PCI_DEVFN_MAX; ++devfn_it) {
>> -            vtd_as = vtd_bus->dev_as[devfn_it];
>> -            if (!vtd_as) {
>> -                continue;
>> -            }
>> -            vtd_as->context_cache_entry.context_cache_gen = 0;
>> -        }
>> +    while (g_hash_table_iter_next (&as_it, NULL, (void**)&vtd_as)) {
>> +        vtd_as->context_cache_entry.context_cache_gen = 0;
>>       }
>>       s->context_cache_gen = 1;
>>   }
>> @@ -986,32 +989,6 @@ static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, uint32_t level)
>>       return slpte & rsvd_mask;
>>   }
>>   
>> -/* Find the VTD address space associated with a given bus number */
>> -static VTDBus *vtd_find_as_from_bus_num(IntelIOMMUState *s, uint8_t bus_num)
>> -{
>> -    VTDBus *vtd_bus = s->vtd_as_by_bus_num[bus_num];
>> -    GHashTableIter iter;
>> -
>> -    if (vtd_bus) {
>> -        return vtd_bus;
>> -    }
>> -
>> -    /*
>> -     * Iterate over the registered buses to find the one which
>> -     * currently holds this bus number and update the bus_num
>> -     * lookup table.
>> -     */
>> -    g_hash_table_iter_init(&iter, s->vtd_as_by_busptr);
>> -    while (g_hash_table_iter_next(&iter, NULL, (void **)&vtd_bus)) {
>> -        if (pci_bus_num(vtd_bus->bus) == bus_num) {
>> -            s->vtd_as_by_bus_num[bus_num] = vtd_bus;
>> -            return vtd_bus;
>> -        }
>> -    }
>> -
>> -    return NULL;
>> -}
>> -
>>   /* Given the @iova, get relevant @slptep. @slpte_level will be the last level
>>    * of the translation, can be used for deciding the size of large page.
>>    */
>> @@ -1604,18 +1581,12 @@ static bool vtd_switch_address_space(VTDAddressSpace *as)
>>   
>>   static void vtd_switch_address_space_all(IntelIOMMUState *s)
>>   {
>> +    VTDAddressSpace *vtd_as;
>>       GHashTableIter iter;
>> -    VTDBus *vtd_bus;
>> -    int i;
>> -
>> -    g_hash_table_iter_init(&iter, s->vtd_as_by_busptr);
>> -    while (g_hash_table_iter_next(&iter, NULL, (void **)&vtd_bus)) {
>> -        for (i = 0; i < PCI_DEVFN_MAX; i++) {
>> -            if (!vtd_bus->dev_as[i]) {
>> -                continue;
>> -            }
>> -            vtd_switch_address_space(vtd_bus->dev_as[i]);
>> -        }
>> +
>> +    g_hash_table_iter_init(&iter, s->vtd_as);
>> +    while (g_hash_table_iter_next(&iter, NULL, (void **)&vtd_as)) {
>> +        vtd_switch_address_space(vtd_as);
>>       }
>>   }
>>   
>> @@ -1659,16 +1630,11 @@ static inline bool vtd_is_interrupt_addr(hwaddr addr)
>>   
>>   static void vtd_pt_enable_fast_path(IntelIOMMUState *s, uint16_t source_id)
>>   {
>> -    VTDBus *vtd_bus;
>>       VTDAddressSpace *vtd_as;
>>       bool success = false;
>> +    uintptr_t key = (uintptr_t)source_id;
>>   
>> -    vtd_bus = vtd_find_as_from_bus_num(s, VTD_SID_TO_BUS(source_id));
>> -    if (!vtd_bus) {
>> -        goto out;
>> -    }
>> -
>> -    vtd_as = vtd_bus->dev_as[VTD_SID_TO_DEVFN(source_id)];
>> +    vtd_as = g_hash_table_lookup(s->vtd_as, &key);
> I'm slightly confused - what I read below tells me that the key is "struct
> vtd_as_key" at [1] below, but here it's a uintptr_t containing the SID.  I don't
> think they match.. Did I misread something?


Nope, it looks like a bug. It looks to me we can't simply use the SID here
since the key requires a pointer to the PCIBus. The reason is that the dma_as
is initialized before the guest has programmed the root port, so we may end
up with a wrong bus number. I will fix this by using g_hash_table_find(),
which could be slower, but since this function is not on a frequent path it
should be acceptable.
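
Something along these lines, presumably (untested sketch; vtd_as_key and the
VTD_SID_TO_* macros are the ones from this series):

static gboolean vtd_as_match_sid(gpointer key, gpointer value,
                                 gpointer user_data)
{
    const struct vtd_as_key *as_key = key;
    uint16_t sid = *(uint16_t *)user_data;

    return pci_bus_num(as_key->bus) == VTD_SID_TO_BUS(sid) &&
           as_key->devfn == VTD_SID_TO_DEVFN(sid);
}

static VTDAddressSpace *vtd_find_as_by_sid(IntelIOMMUState *s, uint16_t sid)
{
    return g_hash_table_find(s->vtd_as, vtd_as_match_sid, &sid);
}

That is O(n) over the address spaces, which matches the slower-but-acceptable
trade-off above.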


>
>>       if (!vtd_as) {
>>           goto out;
>>       }
>> @@ -1876,11 +1842,10 @@ static void vtd_context_device_invalidate(IntelIOMMUState *s,
>>                                             uint16_t source_id,
>>                                             uint16_t func_mask)
>>   {
>> +    GHashTableIter as_it;
>>       uint16_t mask;
>> -    VTDBus *vtd_bus;
>>       VTDAddressSpace *vtd_as;
>>       uint8_t bus_n, devfn;
>> -    uint16_t devfn_it;
>>   
>>       trace_vtd_inv_desc_cc_devices(source_id, func_mask);
>>   
>> @@ -1903,32 +1868,31 @@ static void vtd_context_device_invalidate(IntelIOMMUState *s,
>>       mask = ~mask;
>>   
>>       bus_n = VTD_SID_TO_BUS(source_id);
>> -    vtd_bus = vtd_find_as_from_bus_num(s, bus_n);
>> -    if (vtd_bus) {
>> -        devfn = VTD_SID_TO_DEVFN(source_id);
>> -        for (devfn_it = 0; devfn_it < PCI_DEVFN_MAX; ++devfn_it) {
>> -            vtd_as = vtd_bus->dev_as[devfn_it];
>> -            if (vtd_as && ((devfn_it & mask) == (devfn & mask))) {
>> -                trace_vtd_inv_desc_cc_device(bus_n, VTD_PCI_SLOT(devfn_it),
>> -                                             VTD_PCI_FUNC(devfn_it));
>> -                vtd_iommu_lock(s);
>> -                vtd_as->context_cache_entry.context_cache_gen = 0;
>> -                vtd_iommu_unlock(s);
>> -                /*
>> -                 * Do switch address space when needed, in case if the
>> -                 * device passthrough bit is switched.
>> -                 */
>> -                vtd_switch_address_space(vtd_as);
>> -                /*
>> -                 * So a device is moving out of (or moving into) a
>> -                 * domain, resync the shadow page table.
>> -                 * This won't bring bad even if we have no such
>> -                 * notifier registered - the IOMMU notification
>> -                 * framework will skip MAP notifications if that
>> -                 * happened.
>> -                 */
>> -                vtd_sync_shadow_page_table(vtd_as);
>> -            }
>> +    devfn = VTD_SID_TO_DEVFN(source_id);
>> +
>> +    g_hash_table_iter_init(&as_it, s->vtd_as);
>> +    while (g_hash_table_iter_next(&as_it, NULL, (void**)&vtd_as)) {
>> +        if ((pci_bus_num(vtd_as->bus) == bus_n) &&
>> +            (vtd_as->devfn & mask) == (devfn & mask)) {
>> +            trace_vtd_inv_desc_cc_device(bus_n, VTD_PCI_SLOT(vtd_as->devfn),
>> +                                         VTD_PCI_FUNC(vtd_as->devfn));
>> +            vtd_iommu_lock(s);
>> +            vtd_as->context_cache_entry.context_cache_gen = 0;
>> +            vtd_iommu_unlock(s);
>> +            /*
>> +             * Do switch address space when needed, in case if the
>> +             * device passthrough bit is switched.
>> +             */
>> +            vtd_switch_address_space(vtd_as);
>> +            /*
>> +             * So a device is moving out of (or moving into) a
>> +             * domain, resync the shadow page table.
>> +             * This won't bring bad even if we have no such
>> +             * notifier registered - the IOMMU notification
>> +             * framework will skip MAP notifications if that
>> +             * happened.
>> +             */
>> +            vtd_sync_shadow_page_table(vtd_as);
>>           }
>>       }
>>   }
>> @@ -2442,18 +2406,14 @@ static bool vtd_process_device_iotlb_desc(IntelIOMMUState *s,
>>   {
>>       VTDAddressSpace *vtd_dev_as;
>>       IOMMUTLBEvent event;
>> -    struct VTDBus *vtd_bus;
>> +    uintptr_t key;
>>       hwaddr addr;
>>       uint64_t sz;
>>       uint16_t sid;
>> -    uint8_t devfn;
>>       bool size;
>> -    uint8_t bus_num;
>>   
>>       addr = VTD_INV_DESC_DEVICE_IOTLB_ADDR(inv_desc->hi);
>>       sid = VTD_INV_DESC_DEVICE_IOTLB_SID(inv_desc->lo);
>> -    devfn = sid & 0xff;
>> -    bus_num = sid >> 8;
>>       size = VTD_INV_DESC_DEVICE_IOTLB_SIZE(inv_desc->hi);
>>   
>>       if ((inv_desc->lo & VTD_INV_DESC_DEVICE_IOTLB_RSVD_LO) ||
>> @@ -2464,12 +2424,8 @@ static bool vtd_process_device_iotlb_desc(IntelIOMMUState *s,
>>           return false;
>>       }
>>   
>> -    vtd_bus = vtd_find_as_from_bus_num(s, bus_num);
>> -    if (!vtd_bus) {
>> -        goto done;
>> -    }
>> -
>> -    vtd_dev_as = vtd_bus->dev_as[devfn];
>> +    key = (uintptr_t)sid;
>> +    vtd_dev_as = g_hash_table_lookup(s->vtd_as, &key);
> Same question here.


See above reply.

Thanks


>
>>       if (!vtd_dev_as) {
>>           goto done;
>>       }
>> @@ -3390,27 +3346,22 @@ static const MemoryRegionOps vtd_mem_ir_ops = {
>>   
>>   VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
>>   {
>> -    uintptr_t key = (uintptr_t)bus;
>> -    VTDBus *vtd_bus = g_hash_table_lookup(s->vtd_as_by_busptr, &key);
>> -    VTDAddressSpace *vtd_dev_as;
>> +    struct vtd_as_key key = {
>> +        .bus = bus,
>> +        .devfn = devfn,
>> +    };
>> +    VTDAddressSpace *vtd_dev_as = g_hash_table_lookup(s->vtd_as, &key);
>>       char name[128];
>>   
>> -    if (!vtd_bus) {
>> -        uintptr_t *new_key = g_malloc(sizeof(*new_key));
>> -        *new_key = (uintptr_t)bus;
>> -        /* No corresponding free() */
>> -        vtd_bus = g_malloc0(sizeof(VTDBus) + sizeof(VTDAddressSpace *) * \
>> -                            PCI_DEVFN_MAX);
>> -        vtd_bus->bus = bus;
>> -        g_hash_table_insert(s->vtd_as_by_busptr, new_key, vtd_bus);
>> -    }
>> +    if (!vtd_dev_as) {
>> +        struct vtd_as_key *new_key = g_malloc(sizeof(*new_key));
>>   
>> -    vtd_dev_as = vtd_bus->dev_as[devfn];
>> +        new_key->bus = bus;
>> +        new_key->devfn = devfn;
>>   
>> -    if (!vtd_dev_as) {
>>           snprintf(name, sizeof(name), "vtd-%02x.%x", PCI_SLOT(devfn),
>>                    PCI_FUNC(devfn));
>> -        vtd_bus->dev_as[devfn] = vtd_dev_as = g_malloc0(sizeof(VTDAddressSpace));
>> +        vtd_dev_as = g_malloc0(sizeof(VTDAddressSpace));
>>   
>>           vtd_dev_as->bus = bus;
>>           vtd_dev_as->devfn = (uint8_t)devfn;
>> @@ -3466,6 +3417,8 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
>>                                               &vtd_dev_as->nodmar, 0);
>>   
>>           vtd_switch_address_space(vtd_dev_as);
>> +
>> +        g_hash_table_insert(s->vtd_as, new_key, vtd_dev_as);
> [1]
>
>>       }
>>       return vtd_dev_as;
>>   }
>> @@ -3750,6 +3703,7 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
>>       assert(0 <= devfn && devfn < PCI_DEVFN_MAX);
>>   
>>       vtd_as = vtd_find_add_as(s, bus, devfn);
>> +
>>       return &vtd_as->as;
>>   }
>>   
>> @@ -3835,7 +3789,6 @@ static void vtd_realize(DeviceState *dev, Error **errp)
>>   
>>       QLIST_INIT(&s->vtd_as_with_notifiers);
>>       qemu_mutex_init(&s->iommu_lock);
>> -    memset(s->vtd_as_by_bus_num, 0, sizeof(s->vtd_as_by_bus_num));
>>       memory_region_init_io(&s->csrmem, OBJECT(s), &vtd_mem_ops, s,
>>                             "intel_iommu", DMAR_REG_SIZE);
>>   
>> @@ -3857,8 +3810,8 @@ static void vtd_realize(DeviceState *dev, Error **errp)
>>       /* No corresponding destroy */
>>       s->iotlb = g_hash_table_new_full(vtd_uint64_hash, vtd_uint64_equal,
>>                                        g_free, g_free);
>> -    s->vtd_as_by_busptr = g_hash_table_new_full(vtd_uint64_hash, vtd_uint64_equal,
>> -                                              g_free, g_free);
>> +    s->vtd_as = g_hash_table_new_full(vtd_uint64_hash, vtd_uint64_equal,
>> +                                      g_free, g_free);
>>       vtd_init(s);
>>       sysbus_mmio_map(SYS_BUS_DEVICE(s), 0, Q35_HOST_BRIDGE_IOMMU_ADDR);
>>       pci_setup_iommu(bus, vtd_host_dma_iommu, dev);
>> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
>> index 41783ee46d..014d77dc2a 100644
>> --- a/include/hw/i386/intel_iommu.h
>> +++ b/include/hw/i386/intel_iommu.h
>> @@ -58,7 +58,6 @@ typedef struct VTDContextEntry VTDContextEntry;
>>   typedef struct VTDContextCacheEntry VTDContextCacheEntry;
>>   typedef struct VTDAddressSpace VTDAddressSpace;
>>   typedef struct VTDIOTLBEntry VTDIOTLBEntry;
>> -typedef struct VTDBus VTDBus;
>>   typedef union VTD_IR_TableEntry VTD_IR_TableEntry;
>>   typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress;
>>   typedef struct VTDPASIDDirEntry VTDPASIDDirEntry;
>> @@ -111,12 +110,6 @@ struct VTDAddressSpace {
>>       IOVATree *iova_tree;          /* Traces mapped IOVA ranges */
>>   };
>>   
>> -struct VTDBus {
>> -    PCIBus* bus;		/* A reference to the bus to provide translation for */
>> -    /* A table of VTDAddressSpace objects indexed by devfn */
>> -    VTDAddressSpace *dev_as[];
>> -};
>> -
>>   struct VTDIOTLBEntry {
>>       uint64_t gfn;
>>       uint16_t domain_id;
>> @@ -252,8 +245,7 @@ struct IntelIOMMUState {
>>       uint32_t context_cache_gen;     /* Should be in [1,MAX] */
>>       GHashTable *iotlb;              /* IOTLB */
>>   
>> -    GHashTable *vtd_as_by_busptr;   /* VTDBus objects indexed by PCIBus* reference */
>> -    VTDBus *vtd_as_by_bus_num[VTD_PCI_BUS_MAX]; /* VTDBus objects indexed by bus number */
>> +    GHashTable *vtd_as;             /* VTD address space indexed by source id*/
>>       /* list of registered notifiers */
>>       QLIST_HEAD(, VTDAddressSpace) vtd_as_with_notifiers;
>>   
>> -- 
>> 2.25.1
>>



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/3] intel-iommu: PASID support
  2022-01-13  5:06   ` Peter Xu
  2022-01-13  7:16     ` Michael S. Tsirkin
@ 2022-01-14  2:47     ` Jason Wang
  2022-01-14  3:31       ` Peter Xu
  1 sibling, 1 reply; 28+ messages in thread
From: Jason Wang @ 2022-01-14  2:47 UTC (permalink / raw)
  To: Peter Xu; +Cc: yi.l.liu, yi.y.sun, qemu-devel, mst


On 2022/1/13 1:06 PM, Peter Xu wrote:
> On Wed, Jan 05, 2022 at 12:19:45PM +0800, Jason Wang wrote:
>> @@ -1725,11 +1780,16 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
>>           cc_entry->context_cache_gen = s->context_cache_gen;
>>       }
>>   
>> +    /* Try to fetch slpte form IOTLB */
>> +    if ((pasid == PCI_NO_PASID) && s->root_scalable) {
>> +        pasid = VTD_CE_GET_RID2PASID(&ce);
>> +    }
>> +
>>       /*
>>        * We don't need to translate for pass-through context entries.
>>        * Also, let's ignore IOTLB caching as well for PT devices.
>>        */
>> -    if (vtd_dev_pt_enabled(s, &ce)) {
>> +    if (vtd_dev_pt_enabled(s, &ce, pasid)) {
>>           entry->iova = addr & VTD_PAGE_MASK_4K;
>>           entry->translated_addr = entry->iova;
>>           entry->addr_mask = ~VTD_PAGE_MASK_4K;
>> @@ -1750,14 +1810,24 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
>>           return true;
>>       }
>>   
>> +    iotlb_entry = vtd_lookup_iotlb(s, source_id, addr, pasid);
>> +    if (iotlb_entry) {
>> +        trace_vtd_iotlb_page_hit(source_id, addr, iotlb_entry->slpte,
>> +                                 iotlb_entry->domain_id);
>> +        slpte = iotlb_entry->slpte;
>> +        access_flags = iotlb_entry->access_flags;
>> +        page_mask = iotlb_entry->mask;
>> +        goto out;
>> +    }
> IIUC the iotlb lookup moved down just because the pasid==NO_PASID case then
> we'll need to fetch the default pasid from the context entry.  That looks
> reasonable.
>
> It's just a bit of pity because logically it'll slow down iotlb hits due to
> context entry operations.  When NO_PASID we could have looked up iotlb without
> checking pasid at all, assuming that "default pasid" will always match.  But
> that is a little bit hacky.


Right, but I think you meant to do this only when scalable mode is disabled.


>
> vIOMMU seems to be mostly used for assigned devices and dpdk in production in
> the future due to its slowness otherwise.. so maybe not a big deal at all.
>
> [...]
>
>> @@ -2011,7 +2083,52 @@ static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
>>       vtd_iommu_lock(s);
>>       g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page, &info);
>>       vtd_iommu_unlock(s);
>> -    vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am);
>> +    vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am, PCI_NO_PASID);
>> +}
>> +
>> +static void vtd_iotlb_page_pasid_invalidate(IntelIOMMUState *s,
>> +                                            uint16_t domain_id,
>> +                                            hwaddr addr, uint8_t am,
>> +                                            uint32_t pasid)
>> +{
>> +    VTDIOTLBPageInvInfo info;
>> +
>> +    trace_vtd_inv_desc_iotlb_pasid_pages(domain_id, addr, am, pasid);
>> +
>> +    assert(am <= VTD_MAMV);
>> +    info.domain_id = domain_id;
>> +    info.addr = addr;
>> +    info.mask = ~((1 << am) - 1);
>> +    info.pasid = pasid;
>> +    vtd_iommu_lock(s);
>> +    g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page_pasid, &info);
>> +    vtd_iommu_unlock(s);
>> +    vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am, pasid);
> Hmm, I think indeed we need a notification, but it'll be unnecessary for
> e.g. vfio map notifiers, because this is 1st level invalidation and at least so
> far vfio map notifiers are rewalking only the 2nd level page table, so it'll be
> destined to be a no-op and pure overhead.


Right, considering we don't implement L1 and we don't have a 1st-level 
abstraction in either vhost or vfio, we can simply remove this.


>
>> +}
>> +
>> +static void vtd_iotlb_pasid_invalidate(IntelIOMMUState *s, uint16_t domain_id,
>> +                                       uint32_t pasid)
>> +{
>> +    VTDIOTLBPageInvInfo info;
>> +    VTDAddressSpace *vtd_as;
>> +    VTDContextEntry ce;
>> +
>> +    trace_vtd_inv_desc_iotlb_pasid(domain_id, pasid);
>> +
>> +    info.domain_id = domain_id;
>> +    info.pasid = pasid;
>> +    vtd_iommu_lock(s);
>> +    g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_pasid, &info);
>> +    vtd_iommu_unlock(s);
>> +
>> +    QLIST_FOREACH(vtd_as, &s->vtd_as_with_notifiers, next) {
>> +        if (!vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
>> +                                      vtd_as->devfn, &ce) &&
>> +            domain_id == vtd_get_domain_id(s, &ce, vtd_as->pasid) &&
>> +            pasid == vtd_as->pasid) {
>> +            vtd_sync_shadow_page_table(vtd_as);
> Do we need to rewalk the shadow pgtable (which is the 2nd level, afaict) even
> if we got the 1st level pgtable invalidated?


Seems not, and this makes me think of removing the whole PASID-based 
invalidation logic, since it is for L1, which is not implemented in 
this series.


>
>> +        }
>> +    }
>>   }
> The rest looks mostly good to me; thanks.


Thanks


>




* Re: [PATCH 1/3] intel-iommu: don't warn guest errors when getting rid2pasid entry
  2022-01-13  7:06   ` Michael S. Tsirkin
@ 2022-01-14  2:56     ` Jason Wang
  0 siblings, 0 replies; 28+ messages in thread
From: Jason Wang @ 2022-01-14  2:56 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: yi.l.liu, yi.y.sun, qemu-devel, peterx


On 2022/1/13 15:06, Michael S. Tsirkin wrote:
> On Wed, Jan 05, 2022 at 12:19:43PM +0800, Jason Wang wrote:
>> We use to warn on wrong rid2pasid entry. But this error could be
>> triggered by the guest and could happens during initialization. So
>> let's don't warn in this case.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>   hw/i386/intel_iommu.c | 6 ++++--
>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>> index 4c6c016388..f2c7a23712 100644
>> --- a/hw/i386/intel_iommu.c
>> +++ b/hw/i386/intel_iommu.c
>> @@ -1524,8 +1524,10 @@ static bool vtd_dev_pt_enabled(IntelIOMMUState *s, VTDContextEntry *ce)
>>       if (s->root_scalable) {
>>           ret = vtd_ce_get_rid2pasid_entry(s, ce, &pe);
>>           if (ret) {
>> -            error_report_once("%s: vtd_ce_get_rid2pasid_entry error: %"PRId32,
>> -                              __func__, ret);
>> +            /*
>> +             * This error is guest triggerable. We should assumt PT
> typo
>
> And drop "We should" pls, just use direct voice:
> "Assume PT not enabled".


Fixed.

Thanks


>
>
>> +             * not enabled for safety.
>> +             */
>>               return false;
>>           }
>>           return (VTD_PE_GET_TYPE(&pe) == VTD_SM_PASID_ENTRY_PT);
>> -- 
>> 2.25.1




* Re: [PATCH 1/3] intel-iommu: don't warn guest errors when getting rid2pasid entry
  2022-01-13  7:05     ` Michael S. Tsirkin
@ 2022-01-14  3:02       ` Jason Wang
  0 siblings, 0 replies; 28+ messages in thread
From: Jason Wang @ 2022-01-14  3:02 UTC (permalink / raw)
  To: Michael S. Tsirkin, Peter Xu; +Cc: yi.l.liu, yi.y.sun, qemu-devel


On 2022/1/13 15:05, Michael S. Tsirkin wrote:
> On Thu, Jan 13, 2022 at 11:35:19AM +0800, Peter Xu wrote:
>> On Wed, Jan 05, 2022 at 12:19:43PM +0800, Jason Wang wrote:
>>> We use to warn on wrong rid2pasid entry. But this error could be
>>> triggered by the guest and could happens during initialization. So
>>> let's don't warn in this case.
>>>
>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>> ---
>>>   hw/i386/intel_iommu.c | 6 ++++--
>>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>>> index 4c6c016388..f2c7a23712 100644
>>> --- a/hw/i386/intel_iommu.c
>>> +++ b/hw/i386/intel_iommu.c
>>> @@ -1524,8 +1524,10 @@ static bool vtd_dev_pt_enabled(IntelIOMMUState *s, VTDContextEntry *ce)
>>>       if (s->root_scalable) {
>>>           ret = vtd_ce_get_rid2pasid_entry(s, ce, &pe);
>>>           if (ret) {
>>> -            error_report_once("%s: vtd_ce_get_rid2pasid_entry error: %"PRId32,
>>> -                              __func__, ret);
>>> +            /*
>>> +             * This error is guest triggerable. We should assumt PT
>>> +             * not enabled for safety.
>>> +             */
>>>               return false;
>>>           }
>>>           return (VTD_PE_GET_TYPE(&pe) == VTD_SM_PASID_ENTRY_PT);
>>> -- 
>>> 2.25.1
>>>
>> No strong opinion, but the thing is mostly all error_report_once() in this file
>> is guest triggerable.  If we remove this one then it's debatable on whether we
>> want to remove all.
>>
>> IMHO we used the _once() variant just for this: it won't go into any form of
>> DoS, meanwhile we'll still get some information (as hypervisor) that the guest
>> OS may not be trustworthy.
>>
>> So from that pov it's still useful?  Or is this error very special in some way?
>>
>> Thanks,
>
> Well we have LOG_GUEST_ERROR for guest errors now.


Ok, but this is not necessarily a guest error. (Inferring from the 
comment in vtd_as_pt_enabled()).
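
(For reference, the LOG_GUEST_ERROR form would be something like the sketch
below - it needs "qemu/log.h" - but since this is not clearly a guest error,
the patch just drops the report and returns false:)

    qemu_log_mask(LOG_GUEST_ERROR,
                  "%s: vtd_ce_get_rid2pasid_entry error: %"PRId32,
                  __func__, ret);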

Thanks


>
>> -- 
>> Peter Xu




* Re: [PATCH 3/3] intel-iommu: PASID support
  2022-01-14  2:47     ` Jason Wang
@ 2022-01-14  3:31       ` Peter Xu
  2022-01-14  5:58         ` Jason Wang
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Xu @ 2022-01-14  3:31 UTC (permalink / raw)
  To: Jason Wang; +Cc: yi.l.liu, yi.y.sun, qemu-devel, mst

On Fri, Jan 14, 2022 at 10:47:44AM +0800, Jason Wang wrote:
> 
> On 2022/1/13 13:06, Peter Xu wrote:
> > On Wed, Jan 05, 2022 at 12:19:45PM +0800, Jason Wang wrote:
> > > @@ -1725,11 +1780,16 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
> > >           cc_entry->context_cache_gen = s->context_cache_gen;
> > >       }
> > > +    /* Try to fetch slpte form IOTLB */
> > > +    if ((pasid == PCI_NO_PASID) && s->root_scalable) {
> > > +        pasid = VTD_CE_GET_RID2PASID(&ce);
> > > +    }
> > > +
> > >       /*
> > >        * We don't need to translate for pass-through context entries.
> > >        * Also, let's ignore IOTLB caching as well for PT devices.
> > >        */
> > > -    if (vtd_dev_pt_enabled(s, &ce)) {
> > > +    if (vtd_dev_pt_enabled(s, &ce, pasid)) {
> > >           entry->iova = addr & VTD_PAGE_MASK_4K;
> > >           entry->translated_addr = entry->iova;
> > >           entry->addr_mask = ~VTD_PAGE_MASK_4K;
> > > @@ -1750,14 +1810,24 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
> > >           return true;
> > >       }
> > > +    iotlb_entry = vtd_lookup_iotlb(s, source_id, addr, pasid);
> > > +    if (iotlb_entry) {
> > > +        trace_vtd_iotlb_page_hit(source_id, addr, iotlb_entry->slpte,
> > > +                                 iotlb_entry->domain_id);
> > > +        slpte = iotlb_entry->slpte;
> > > +        access_flags = iotlb_entry->access_flags;
> > > +        page_mask = iotlb_entry->mask;
> > > +        goto out;
> > > +    }
> > IIUC the iotlb lookup moved down just because the pasid==NO_PASID case then
> > we'll need to fetch the default pasid from the context entry.  That looks
> > reasonable.
> > 
> > It's just a bit of pity because logically it'll slow down iotlb hits due to
> > context entry operations.  When NO_PASID we could have looked up iotlb without
> > checking pasid at all, assuming that "default pasid" will always match.  But
> > that is a little bit hacky.
> 
> 
> Right, but I think you meant to do this only when scalable mode is disabled.

Yes, IMHO it will definitely suit the !scalable case, since that's exactly
what we did before.  What I'm also wondering is whether, even when scalable is
enabled but no "real" pasid is used, so that all the translations go through
the default pasid stored in the device context entry, we could maybe skip
checking the pasid at all.  The latter is the "hacky" part mentioned above.
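
Something like the sketch below, i.e. keep the legacy fast path before we read
the context entry (reusing the helpers from patch 3; untested, only to
illustrate):

    /* Legacy (!scalable) fast path: look up the IOTLB before touching
     * the context entry, as we did before this series. */
    if (!s->root_scalable) {
        iotlb_entry = vtd_lookup_iotlb(s, source_id, addr, PCI_NO_PASID);
        if (iotlb_entry) {
            slpte = iotlb_entry->slpte;
            access_flags = iotlb_entry->access_flags;
            page_mask = iotlb_entry->mask;
            goto out;
        }
    }
    /* ... otherwise fetch the context entry, resolve RID2PASID and do
     * the pasid-aware lookup as the patch already does ... */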

The other thing to mention is that, if we postpone the iotlb lookup until
after the context entry is fetched, then logically we can have a per-device
iotlb; that means we can also replace IntelIOMMUState.iotlb with
VTDAddressSpace.iotlb in the future, which can be more efficient.

Not sure whether Michael will have a preference, for me I think either way can
be done on top.

> 
> 
> > 
> > vIOMMU seems to be mostly used for assigned devices and dpdk in production in
> > the future due to its slowness otherwise.. so maybe not a big deal at all.
> > 
> > [...]
> > 
> > > @@ -2011,7 +2083,52 @@ static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
> > >       vtd_iommu_lock(s);
> > >       g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page, &info);
> > >       vtd_iommu_unlock(s);
> > > -    vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am);
> > > +    vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am, PCI_NO_PASID);
> > > +}
> > > +
> > > +static void vtd_iotlb_page_pasid_invalidate(IntelIOMMUState *s,
> > > +                                            uint16_t domain_id,
> > > +                                            hwaddr addr, uint8_t am,
> > > +                                            uint32_t pasid)
> > > +{
> > > +    VTDIOTLBPageInvInfo info;
> > > +
> > > +    trace_vtd_inv_desc_iotlb_pasid_pages(domain_id, addr, am, pasid);
> > > +
> > > +    assert(am <= VTD_MAMV);
> > > +    info.domain_id = domain_id;
> > > +    info.addr = addr;
> > > +    info.mask = ~((1 << am) - 1);
> > > +    info.pasid = pasid;
> > > +    vtd_iommu_lock(s);
> > > +    g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page_pasid, &info);
> > > +    vtd_iommu_unlock(s);
> > > +    vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am, pasid);
> > Hmm, I think indeed we need a notification, but it'll be unnecessary for
> > e.g. vfio map notifiers, because this is 1st level invalidation and at least so
> > far vfio map notifiers are rewalking only the 2nd level page table, so it'll be
> > destined to be a no-op and pure overhead.
> 
> 
> Right, consider we don't implement l1 and we don't have a 1st level
> abstraction in neither vhost nor vfio, we can simply remove this.

We probably still need the real pasid invalidation parts in the future?  Either
for vhost (if vhost is going to cache pasid-based translations), or for
compatible assigned devices in the future where the HW can cache it.

I'm not sure what the best way to do this is yet.  Perhaps add a new field to
vtd_iotlb_page_invalidate_notify() telling whether this is pasid-based or not
(basically, an invalidation for the 1st or the 2nd level pgtable)?  Then if it
is pasid-based, we could opt out of the shadow page walking.
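
Very roughly (the extra parameter is made up here, only to illustrate the
idea):

    static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
                                                 uint16_t domain_id,
                                                 hwaddr addr, uint8_t am,
                                                 uint32_t pasid,
                                                 bool first_level)
    {
        if (first_level) {
            /*
             * 1st-level (pasid-based) invalidation: the map notifiers
             * (e.g. vfio) only shadow the 2nd-level table, so skip the
             * page walk for them.
             */
            return;
        }
        /* ... existing 2nd-level page walk / notification ... */
    }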

But as you mentioned we could also postpone it to the future.  Your call. :-)

Thanks,

> 
> 
> > 
> > > +}
> > > +
> > > +static void vtd_iotlb_pasid_invalidate(IntelIOMMUState *s, uint16_t domain_id,
> > > +                                       uint32_t pasid)
> > > +{
> > > +    VTDIOTLBPageInvInfo info;
> > > +    VTDAddressSpace *vtd_as;
> > > +    VTDContextEntry ce;
> > > +
> > > +    trace_vtd_inv_desc_iotlb_pasid(domain_id, pasid);
> > > +
> > > +    info.domain_id = domain_id;
> > > +    info.pasid = pasid;
> > > +    vtd_iommu_lock(s);
> > > +    g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_pasid, &info);
> > > +    vtd_iommu_unlock(s);
> > > +
> > > +    QLIST_FOREACH(vtd_as, &s->vtd_as_with_notifiers, next) {
> > > +        if (!vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
> > > +                                      vtd_as->devfn, &ce) &&
> > > +            domain_id == vtd_get_domain_id(s, &ce, vtd_as->pasid) &&
> > > +            pasid == vtd_as->pasid) {
> > > +            vtd_sync_shadow_page_table(vtd_as);
> > Do we need to rewalk the shadow pgtable (which is the 2nd level, afaict) even
> > if we got the 1st level pgtable invalidated?
> 
> 
> Seems not and this makes me think to remove the whole PASID based
> invalidation logic since they are for L1 which is not implemented in this
> series.

-- 
Peter Xu




* Re: [PATCH 3/3] intel-iommu: PASID support
  2022-01-14  3:31       ` Peter Xu
@ 2022-01-14  5:58         ` Jason Wang
  2022-01-14  7:13           ` Peter Xu
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Wang @ 2022-01-14  5:58 UTC (permalink / raw)
  To: Peter Xu; +Cc: Liu, Yi L, yi.y.sun, qemu-devel, mst

On Fri, Jan 14, 2022 at 11:31 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Fri, Jan 14, 2022 at 10:47:44AM +0800, Jason Wang wrote:
> >
> > On 2022/1/13 13:06, Peter Xu wrote:
> > > On Wed, Jan 05, 2022 at 12:19:45PM +0800, Jason Wang wrote:
> > > > @@ -1725,11 +1780,16 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
> > > >           cc_entry->context_cache_gen = s->context_cache_gen;
> > > >       }
> > > > +    /* Try to fetch slpte form IOTLB */
> > > > +    if ((pasid == PCI_NO_PASID) && s->root_scalable) {
> > > > +        pasid = VTD_CE_GET_RID2PASID(&ce);
> > > > +    }
> > > > +
> > > >       /*
> > > >        * We don't need to translate for pass-through context entries.
> > > >        * Also, let's ignore IOTLB caching as well for PT devices.
> > > >        */
> > > > -    if (vtd_dev_pt_enabled(s, &ce)) {
> > > > +    if (vtd_dev_pt_enabled(s, &ce, pasid)) {
> > > >           entry->iova = addr & VTD_PAGE_MASK_4K;
> > > >           entry->translated_addr = entry->iova;
> > > >           entry->addr_mask = ~VTD_PAGE_MASK_4K;
> > > > @@ -1750,14 +1810,24 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
> > > >           return true;
> > > >       }
> > > > +    iotlb_entry = vtd_lookup_iotlb(s, source_id, addr, pasid);
> > > > +    if (iotlb_entry) {
> > > > +        trace_vtd_iotlb_page_hit(source_id, addr, iotlb_entry->slpte,
> > > > +                                 iotlb_entry->domain_id);
> > > > +        slpte = iotlb_entry->slpte;
> > > > +        access_flags = iotlb_entry->access_flags;
> > > > +        page_mask = iotlb_entry->mask;
> > > > +        goto out;
> > > > +    }
> > > IIUC the iotlb lookup moved down just because the pasid==NO_PASID case then
> > > we'll need to fetch the default pasid from the context entry.  That looks
> > > reasonable.
> > >
> > > It's just a bit of pity because logically it'll slow down iotlb hits due to
> > > context entry operations.  When NO_PASID we could have looked up iotlb without
> > > checking pasid at all, assuming that "default pasid" will always match.  But
> > > that is a little bit hacky.
> >
> >
> > Right, but I think you meant to do this only when scalable mode is disabled.
>
> Yes IMHO it will definitely suite for !scalable case since that's exactly what
> we did before.  What I'm also wondering is even if scalable is enabled but no
> "real" pasid is used, so if all the translations go through the default pasid
> that stored in the device context entry, then maybe we can ignore checking it.
> The latter is the "hacky" part mentioned above.

The problem I see is that we can't know what PASID is used as default
without reading the context entry?

>
> The other thing to mention is, if we postpone the iotlb lookup to be after
> context entry, then logically we can have per-device iotlb, that means we can
> replace IntelIOMMUState.iotlb with VTDAddressSpace.iotlb in the future, too,
> which can also be more efficient.

Right, but we still need to limit the total number of IOTLB slots, and ATS is
actually a better way to deal with the IOTLB bottleneck.
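
(For context, the global cache today is simply capped and flushed wholesale
when it grows too big - roughly the check below; I'm quoting the constant and
helper names from memory, so they may be slightly off:)

    if (g_hash_table_size(s->iotlb) >= VTD_IOTLB_MAX_SIZE) {
        vtd_reset_iotlb(s);
    }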

>
> Not sure whether Michael will have a preference, for me I think either way can
> be done on top.
>
> >
> >
> > >
> > > vIOMMU seems to be mostly used for assigned devices and dpdk in production in
> > > the future due to its slowness otherwise.. so maybe not a big deal at all.
> > >
> > > [...]
> > >
> > > > @@ -2011,7 +2083,52 @@ static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
> > > >       vtd_iommu_lock(s);
> > > >       g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page, &info);
> > > >       vtd_iommu_unlock(s);
> > > > -    vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am);
> > > > +    vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am, PCI_NO_PASID);
> > > > +}
> > > > +
> > > > +static void vtd_iotlb_page_pasid_invalidate(IntelIOMMUState *s,
> > > > +                                            uint16_t domain_id,
> > > > +                                            hwaddr addr, uint8_t am,
> > > > +                                            uint32_t pasid)
> > > > +{
> > > > +    VTDIOTLBPageInvInfo info;
> > > > +
> > > > +    trace_vtd_inv_desc_iotlb_pasid_pages(domain_id, addr, am, pasid);
> > > > +
> > > > +    assert(am <= VTD_MAMV);
> > > > +    info.domain_id = domain_id;
> > > > +    info.addr = addr;
> > > > +    info.mask = ~((1 << am) - 1);
> > > > +    info.pasid = pasid;
> > > > +    vtd_iommu_lock(s);
> > > > +    g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page_pasid, &info);
> > > > +    vtd_iommu_unlock(s);
> > > > +    vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am, pasid);
> > > Hmm, I think indeed we need a notification, but it'll be unnecessary for
> > > e.g. vfio map notifiers, because this is 1st level invalidation and at least so
> > > far vfio map notifiers are rewalking only the 2nd level page table, so it'll be
> > > destined to be a no-op and pure overhead.
> >
> >
> > Right, consider we don't implement l1 and we don't have a 1st level
> > abstraction in neither vhost nor vfio, we can simply remove this.
>
> We probably still need the real pasid invalidation parts in the future?

Yes.

>  Either
> for vhost (if vhost will going to cache pasid-based translations), or for
> compatible assigned devices in the future where the HW can cache it.

Vhost plans to support ASID here:

https://patchwork.kernel.org/project/kvm/patch/20201216064818.48239-11-jasowang@redhat.com/#23866593

>
> I'm not sure what's the best way to do this, yet. Perhaps adding a new field to
> vtd_iotlb_page_invalidate_notify() telling whether this is pasid-based or not
> (basically, an invalidation for 1st or 2nd level pgtable)?

AFAIK there's no L1 in the abstraction for the device IOTLB, just a
combined IOVA->GPA translation result.

>  Then if it is
> pasid-based, we could opt-out for the shadow page walking.
>
> But as you mentioned we could also postpone it to the future.  Your call. :-)

Right, I tend to defer it; otherwise there seems to be no way to test this.

Thanks

>
> Thanks,
>
> >
> >
> > >
> > > > +}
> > > > +
> > > > +static void vtd_iotlb_pasid_invalidate(IntelIOMMUState *s, uint16_t domain_id,
> > > > +                                       uint32_t pasid)
> > > > +{
> > > > +    VTDIOTLBPageInvInfo info;
> > > > +    VTDAddressSpace *vtd_as;
> > > > +    VTDContextEntry ce;
> > > > +
> > > > +    trace_vtd_inv_desc_iotlb_pasid(domain_id, pasid);
> > > > +
> > > > +    info.domain_id = domain_id;
> > > > +    info.pasid = pasid;
> > > > +    vtd_iommu_lock(s);
> > > > +    g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_pasid, &info);
> > > > +    vtd_iommu_unlock(s);
> > > > +
> > > > +    QLIST_FOREACH(vtd_as, &s->vtd_as_with_notifiers, next) {
> > > > +        if (!vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
> > > > +                                      vtd_as->devfn, &ce) &&
> > > > +            domain_id == vtd_get_domain_id(s, &ce, vtd_as->pasid) &&
> > > > +            pasid == vtd_as->pasid) {
> > > > +            vtd_sync_shadow_page_table(vtd_as);
> > > Do we need to rewalk the shadow pgtable (which is the 2nd level, afaict) even
> > > if we got the 1st level pgtable invalidated?
> >
> >
> > Seems not and this makes me think to remove the whole PASID based
> > invalidation logic since they are for L1 which is not implemented in this
> > series.
>
> --
> Peter Xu
>




* Re: [PATCH 3/3] intel-iommu: PASID support
  2022-01-14  5:58         ` Jason Wang
@ 2022-01-14  7:13           ` Peter Xu
  2022-01-14  7:22             ` Jason Wang
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Xu @ 2022-01-14  7:13 UTC (permalink / raw)
  To: Jason Wang; +Cc: Liu, Yi L, yi.y.sun, qemu-devel, mst

On Fri, Jan 14, 2022 at 01:58:07PM +0800, Jason Wang wrote:
> > > Right, but I think you meant to do this only when scalable mode is disabled.
> >
> > Yes IMHO it will definitely suite for !scalable case since that's exactly what
> > we did before.  What I'm also wondering is even if scalable is enabled but no
> > "real" pasid is used, so if all the translations go through the default pasid
> > that stored in the device context entry, then maybe we can ignore checking it.
> > The latter is the "hacky" part mentioned above.
> 
> The problem I see is that we can't know what PASID is used as default
> without reading the context entry?

Can the default NO_PASID be used in a mixture with !NO_PASID use cases on the
same device?  If that's possible, then I agree..

My previous idea was based on the assumption that if NO_PASID is used on one
device, then all translations will be based on NO_PASID, but now I'm not sure
of it.

> 
> >
> > The other thing to mention is, if we postpone the iotlb lookup to be after
> > context entry, then logically we can have per-device iotlb, that means we can
> > replace IntelIOMMUState.iotlb with VTDAddressSpace.iotlb in the future, too,
> > which can also be more efficient.
> 
> Right but we still need to limit the total slots and ATS is a better
> way to deal with the IOTLB bottleneck actually.

I think it depends on how the iotlb ghash is implemented.  Logically, if we
can split the cache per device it'll be slightly better, because we no longer
need to iterate over the iotlbs of other devices on lookup; meanwhile each
iotlb entry takes less space too (no devfn needed anymore).
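
As a sketch only (vtd_as->iotlb doesn't exist today, and the reduced key is
made up; it assumes the per-AS table is created with value-based hash/equal
callbacks):

    /* Per-device cache: the key no longer needs sid/devfn. */
    struct vtd_as_iotlb_key {
        uint64_t gfn;
        uint32_t pasid;
        uint32_t level;
    };

    static VTDIOTLBEntry *vtd_as_lookup_iotlb(VTDAddressSpace *vtd_as,
                                              uint64_t gfn, uint32_t pasid,
                                              uint32_t level)
    {
        struct vtd_as_iotlb_key key = {
            .gfn = gfn, .pasid = pasid, .level = level,
        };

        return g_hash_table_lookup(vtd_as->iotlb, &key);
    }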

Thanks,

-- 
Peter Xu




* Re: [PATCH 3/3] intel-iommu: PASID support
  2022-01-14  7:13           ` Peter Xu
@ 2022-01-14  7:22             ` Jason Wang
  2022-01-14  7:45               ` Peter Xu
  2022-01-14 12:58               ` Liu Yi L
  0 siblings, 2 replies; 28+ messages in thread
From: Jason Wang @ 2022-01-14  7:22 UTC (permalink / raw)
  To: Peter Xu; +Cc: Liu, Yi L, yi.y.sun, qemu-devel, mst

On Fri, Jan 14, 2022 at 3:13 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Fri, Jan 14, 2022 at 01:58:07PM +0800, Jason Wang wrote:
> > > > Right, but I think you meant to do this only when scalable mode is disabled.
> > >
> > > Yes IMHO it will definitely suite for !scalable case since that's exactly what
> > > we did before.  What I'm also wondering is even if scalable is enabled but no
> > > "real" pasid is used, so if all the translations go through the default pasid
> > > that stored in the device context entry, then maybe we can ignore checking it.
> > > The latter is the "hacky" part mentioned above.
> >
> > The problem I see is that we can't know what PASID is used as default
> > without reading the context entry?
>
> Can the default NO_PASID being used in mixture of !NO_PASID use case on the
> same device?  If that's possible, then I agree..

My understanding is that it is possible.

>
> My previous idea should be based on the fact that if NO_PASID is used on one
> device, then all translations will be based on NO_PASID, but now I'm not sure
> of it.

Actually, what I meant is:

device 1 using transactions without PASID with RID2PASID 1
device 2 using transactions without PASID with RID2PASID 2

Then we can't assume a default pasid here.

>
> >
> > >
> > > The other thing to mention is, if we postpone the iotlb lookup to be after
> > > context entry, then logically we can have per-device iotlb, that means we can
> > > replace IntelIOMMUState.iotlb with VTDAddressSpace.iotlb in the future, too,
> > > which can also be more efficient.
> >
> > Right but we still need to limit the total slots and ATS is a better
> > way to deal with the IOTLB bottleneck actually.
>
> I think it depends on how the iotlb ghash is implemented.  Logically I think if
> we can split the cache to per-device it'll be slightly better because we don't
> need to iterate over iotlbs of other devices when lookup anymore; meanwhile
> each iotlb takes less space too (no devfn needed anymore).

So we've already used the sid in the IOTLB hash; I wonder how much we can
gain from this.

Thanks

>
> Thanks,
>
> --
> Peter Xu
>




* Re: [PATCH 3/3] intel-iommu: PASID support
  2022-01-14  7:22             ` Jason Wang
@ 2022-01-14  7:45               ` Peter Xu
  2022-01-14  9:12                 ` Jason Wang
  2022-01-14 12:58               ` Liu Yi L
  1 sibling, 1 reply; 28+ messages in thread
From: Peter Xu @ 2022-01-14  7:45 UTC (permalink / raw)
  To: Jason Wang; +Cc: Liu, Yi L, yi.y.sun, qemu-devel, mst

On Fri, Jan 14, 2022 at 03:22:16PM +0800, Jason Wang wrote:
> On Fri, Jan 14, 2022 at 3:13 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Fri, Jan 14, 2022 at 01:58:07PM +0800, Jason Wang wrote:
> > > > > Right, but I think you meant to do this only when scalable mode is disabled.
> > > >
> > > > Yes IMHO it will definitely suite for !scalable case since that's exactly what
> > > > we did before.  What I'm also wondering is even if scalable is enabled but no
> > > > "real" pasid is used, so if all the translations go through the default pasid
> > > > that stored in the device context entry, then maybe we can ignore checking it.
> > > > The latter is the "hacky" part mentioned above.
> > >
> > > The problem I see is that we can't know what PASID is used as default
> > > without reading the context entry?
> >
> > Can the default NO_PASID being used in mixture of !NO_PASID use case on the
> > same device?  If that's possible, then I agree..
> 
> My understanding is that it is possible.

OK.

> 
> >
> > My previous idea should be based on the fact that if NO_PASID is used on one
> > device, then all translations will be based on NO_PASID, but now I'm not sure
> > of it.
> 
> Actually, what I meant is:
> 
> device 1 using transactions without PASID with RID2PASID 1
> device 2 using transactions without PASID with RID2PASID 2
> 
> Then we can't assume a default pasid here.

This seems fine, because "device N" is still part of the equation when looking
up, so we won't look up the wrong entry.

But yeah.. it could not really work anyway.

> 
> >
> > >
> > > >
> > > > The other thing to mention is, if we postpone the iotlb lookup to be after
> > > > context entry, then logically we can have per-device iotlb, that means we can
> > > > replace IntelIOMMUState.iotlb with VTDAddressSpace.iotlb in the future, too,
> > > > which can also be more efficient.
> > >
> > > Right but we still need to limit the total slots and ATS is a better
> > > way to deal with the IOTLB bottleneck actually.
> >
> > I think it depends on how the iotlb ghash is implemented.  Logically I think if
> > we can split the cache to per-device it'll be slightly better because we don't
> > need to iterate over iotlbs of other devices when lookup anymore; meanwhile
> > each iotlb takes less space too (no devfn needed anymore).
> 
> So we've already used sid in the IOTLB hash, I wonder how much we can
> gain form this.

I think at least we can shrink iotlb structures, e.g.:

struct vtd_iotlb_key {
    uint16_t sid;               <------ not needed
    uint32_t pasid;             <------ not needed
    uint64_t gfn;
    uint32_t level;
};

struct VTDIOTLBEntry {
    uint64_t gfn;
    uint16_t domain_id;
    uint32_t pasid;             <------ not needed
    uint64_t slpte;
    uint64_t mask;
    uint8_t access_flags;
};

Thanks,

-- 
Peter Xu




* Re: [PATCH 3/3] intel-iommu: PASID support
  2022-01-14  7:45               ` Peter Xu
@ 2022-01-14  9:12                 ` Jason Wang
  0 siblings, 0 replies; 28+ messages in thread
From: Jason Wang @ 2022-01-14  9:12 UTC (permalink / raw)
  To: Peter Xu; +Cc: Liu, Yi L, yi.y.sun, qemu-devel, mst

On Fri, Jan 14, 2022 at 3:45 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Fri, Jan 14, 2022 at 03:22:16PM +0800, Jason Wang wrote:
> > On Fri, Jan 14, 2022 at 3:13 PM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > On Fri, Jan 14, 2022 at 01:58:07PM +0800, Jason Wang wrote:
> > > > > > Right, but I think you meant to do this only when scalable mode is disabled.
> > > > >
> > > > > Yes IMHO it will definitely suite for !scalable case since that's exactly what
> > > > > we did before.  What I'm also wondering is even if scalable is enabled but no
> > > > > "real" pasid is used, so if all the translations go through the default pasid
> > > > > that stored in the device context entry, then maybe we can ignore checking it.
> > > > > The latter is the "hacky" part mentioned above.
> > > >
> > > > The problem I see is that we can't know what PASID is used as default
> > > > without reading the context entry?
> > >
> > > Can the default NO_PASID being used in mixture of !NO_PASID use case on the
> > > same device?  If that's possible, then I agree..
> >
> > My understanding is that it is possible.
>
> OK.
>
> >
> > >
> > > My previous idea should be based on the fact that if NO_PASID is used on one
> > > device, then all translations will be based on NO_PASID, but now I'm not sure
> > > of it.
> >
> > Actually, what I meant is:
> >
> > device 1 using transactions without PASID with RID2PASID 1
> > device 2 using transactions without PASID with RID2PASID 2
> >
> > Then we can't assume a default pasid here.
>
> This seems fine, because "device N" is still part of the equation when looking
> up, so we won't lookup wrong.

Right.

>
> But yeah.. it could not really work anyway.
>
> >
> > >
> > > >
> > > > >
> > > > > The other thing to mention is, if we postpone the iotlb lookup to be after
> > > > > context entry, then logically we can have per-device iotlb, that means we can
> > > > > replace IntelIOMMUState.iotlb with VTDAddressSpace.iotlb in the future, too,
> > > > > which can also be more efficient.
> > > >
> > > > Right but we still need to limit the total slots and ATS is a better
> > > > way to deal with the IOTLB bottleneck actually.
> > >
> > > I think it depends on how the iotlb ghash is implemented.  Logically I think if
> > > we can split the cache to per-device it'll be slightly better because we don't
> > > need to iterate over iotlbs of other devices when lookup anymore; meanwhile
> > > each iotlb takes less space too (no devfn needed anymore).
> >
> > So we've already used sid in the IOTLB hash, I wonder how much we can
> > gain form this.
>
> I think at least we can shrink iotlb structures, e.g.:
>
> struct vtd_iotlb_key {
>     uint16_t sid;               <------ not needed
>     uint32_t pasid;             <------ not needed
>     uint64_t gfn;
>     uint32_t level;
> };

I don't get why PASID is not needed.

Thanks

>
> struct VTDIOTLBEntry {
>     uint64_t gfn;
>     uint16_t domain_id;
>     uint32_t pasid;             <------ not needed
>     uint64_t slpte;
>     uint64_t mask;
>     uint8_t access_flags;
> };
>
> Thanks,
>
> --
> Peter Xu
>




* Re: [PATCH 2/3] intel-iommu: drop VTDBus
  2022-01-14  2:32     ` Jason Wang
@ 2022-01-14  9:15       ` Jason Wang
  2022-01-17  1:27         ` Peter Xu
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Wang @ 2022-01-14  9:15 UTC (permalink / raw)
  To: Peter Xu; +Cc: yi.l.liu, yi.y.sun, qemu-devel, mst


On 2022/1/14 10:32, Jason Wang wrote:
>>> dressSpace *as)
>>>   /* GHashTable functions */
>>>   static gboolean vtd_uint64_equal(gconstpointer v1, gconstpointer v2)
>>>   {
>>> -    return *((const uint64_t *)v1) == *((const uint64_t *)v2);
>>> +    const struct vtd_as_key *key1 = v1;
>>> +    const struct vtd_as_key *key2 = v2;
>>> +
>>> +    return (key1->bus == key2->bus) && (key1->devfn == key2->devfn);
>>>   }
>>>     static guint vtd_uint64_hash(gconstpointer v)
>>>   {
>>> -    return (guint)*(const uint64_t *)v;
>>> +    const struct vtd_as_key *key = v;
>>> +    guint value = (guint)(uintptr_t)key->bus;
>>> +
>>> +    return (guint)(value << 8 | key->devfn);
>> Note that value is a pointer to PCIBus*.  Just want to check with you 
>> that it's
>> intended to use this hash value (or maybe you wanted to use Source ID 
>> so it is
>> bus number to use not the bus pointer)?
>
>
> Right, SID should be used here.


Sorry for taking too long for the context switching ...

The hash and shift based on the bus pointer are intended, since we use the 
bus pointer as part of the key.  The reason is still that, during 
vtd_find_add_as(), the SID is not correct since the guest might not have 
finished initializing the device and the PCI bridge.
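
I.e. the lookup in vtd_find_add_as() is keyed only on what is already stable
at that point, roughly like this (quoting my own patch from memory, allocation
details omitted):

    struct vtd_as_key key = {
        .bus = bus,
        .devfn = devfn,
    };
    VTDAddressSpace *vtd_dev_as = g_hash_table_lookup(s->vtd_as, &key);

    if (!vtd_dev_as) {
        struct vtd_as_key *new_key = g_new(struct vtd_as_key, 1);

        *new_key = key;
        vtd_dev_as = g_new0(VTDAddressSpace, 1);
        /* ... set up bus/devfn/pasid and the memory regions ... */
        g_hash_table_insert(s->vtd_as, new_key, vtd_dev_as);
    }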

Thanks




* Re: [PATCH 3/3] intel-iommu: PASID support
  2022-01-14  7:22             ` Jason Wang
  2022-01-14  7:45               ` Peter Xu
@ 2022-01-14 12:58               ` Liu Yi L
  2022-01-17  6:01                 ` Jason Wang
  1 sibling, 1 reply; 28+ messages in thread
From: Liu Yi L @ 2022-01-14 12:58 UTC (permalink / raw)
  To: Jason Wang, Peter Xu; +Cc: yi.y.sun, qemu-devel, mst

On 2022/1/14 15:22, Jason Wang wrote:
> On Fri, Jan 14, 2022 at 3:13 PM Peter Xu <peterx@redhat.com> wrote:
>>
>> On Fri, Jan 14, 2022 at 01:58:07PM +0800, Jason Wang wrote:
>>>>> Right, but I think you meant to do this only when scalable mode is disabled.
>>>>
>>>> Yes IMHO it will definitely suite for !scalable case since that's exactly what
>>>> we did before.  What I'm also wondering is even if scalable is enabled but no
>>>> "real" pasid is used, so if all the translations go through the default pasid
>>>> that stored in the device context entry, then maybe we can ignore checking it.
>>>> The latter is the "hacky" part mentioned above.
>>>
>>> The problem I see is that we can't know what PASID is used as default
>>> without reading the context entry?
>>
>> Can the default NO_PASID being used in mixture of !NO_PASID use case on the
>> same device?  If that's possible, then I agree..
> 
> My understanding is that it is possible.
> 
>>
>> My previous idea should be based on the fact that if NO_PASID is used on one
>> device, then all translations will be based on NO_PASID, but now I'm not sure
>> of it.
> 
> Actually, what I meant is:
> 
> device 1 using transactions without PASID with RID2PASID 1
> device 2 using transactions without PASID with RID2PASID 2

Interesting series, Jason.

I haven't read through all your code yet, just a quick comment.  RID2PASID 1 
and RID2PASID 2 may be the same one.  The VT-d spec defines an RPS bit in the 
ecap register.  If it is reported as 0, that means the RID_PASID (previously 
called RID2PASID :-)) field of the scalable-mode context entry is not 
supported, and a PASID value of 0 will be used for transactions without 
PASID.  So in the code, you may check the RPS bit to see if the RID_PASID 
value is the same for all devices.
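
For example, something like the sketch below (VTD_ECAP_RPS and the helper
name are made up here, just to show the idea):

    /* Without RID-PASID support, RID_PASID is effectively fixed to 0. */
    static inline uint32_t vtd_ce_rid2pasid(IntelIOMMUState *s,
                                            VTDContextEntry *ce)
    {
        if (!(s->ecap & VTD_ECAP_RPS)) {
            return 0;
        }
        return VTD_CE_GET_RID2PASID(ce);
    }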

Regards,
Yi Liu

> Then we can't assume a default pasid here.
> 
>>
>>>
>>>>
>>>> The other thing to mention is, if we postpone the iotlb lookup to be after
>>>> context entry, then logically we can have per-device iotlb, that means we can
>>>> replace IntelIOMMUState.iotlb with VTDAddressSpace.iotlb in the future, too,
>>>> which can also be more efficient.
>>>
>>> Right but we still need to limit the total slots and ATS is a better
>>> way to deal with the IOTLB bottleneck actually.
>>
>> I think it depends on how the iotlb ghash is implemented.  Logically I think if
>> we can split the cache to per-device it'll be slightly better because we don't
>> need to iterate over iotlbs of other devices when lookup anymore; meanwhile
>> each iotlb takes less space too (no devfn needed anymore).
> 
> So we've already used sid in the IOTLB hash, I wonder how much we can
> gain form this.
> 
> Thanks
> 
>>
>> Thanks,
>>
>> --
>> Peter Xu
>>
> 

-- 
Regards,
Yi Liu



* Re: [PATCH 2/3] intel-iommu: drop VTDBus
  2022-01-14  9:15       ` Jason Wang
@ 2022-01-17  1:27         ` Peter Xu
  2022-01-17  1:42           ` Peter Xu
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Xu @ 2022-01-17  1:27 UTC (permalink / raw)
  To: Jason Wang; +Cc: yi.l.liu, yi.y.sun, qemu-devel, mst

On Fri, Jan 14, 2022 at 05:15:35PM +0800, Jason Wang wrote:
> 
> On 2022/1/14 10:32, Jason Wang wrote:
> > > > dressSpace *as)
> > > >   /* GHashTable functions */
> > > >   static gboolean vtd_uint64_equal(gconstpointer v1, gconstpointer v2)
> > > >   {
> > > > -    return *((const uint64_t *)v1) == *((const uint64_t *)v2);
> > > > +    const struct vtd_as_key *key1 = v1;
> > > > +    const struct vtd_as_key *key2 = v2;
> > > > +
> > > > +    return (key1->bus == key2->bus) && (key1->devfn == key2->devfn);
> > > >   }
> > > >     static guint vtd_uint64_hash(gconstpointer v)
> > > >   {
> > > > -    return (guint)*(const uint64_t *)v;
> > > > +    const struct vtd_as_key *key = v;
> > > > +    guint value = (guint)(uintptr_t)key->bus;
> > > > +
> > > > +    return (guint)(value << 8 | key->devfn);
> > > Note that value is a pointer to PCIBus*.  Just want to check with
> > > you that it's
> > > intended to use this hash value (or maybe you wanted to use Source
> > > ID so it is
> > > bus number to use not the bus pointer)?
> > 
> > 
> > Right, SID should be used here.
> 
> 
> Sorry for taking too long for the context switching ...
> 
> The hash and shift based the bus pointer is intended since we use bus
> pointer as part of the key. The reason is still, during vtd_find_add_as(),
> SID is not correct since guest might not finish the initialization of the
> device and PCI bridge.

Ah, right.

However here value is left-shifted 8 bits so I'm not sure how it could
guarantee no collision - logically any addresses that match lower 56 bits will
hit with the same devfn by accident.

I don't think it'll matter in reality, but... I'm wondering whether it's
easier to simply use g_direct_hash() (the GLib-provided hash function used
when hash==NULL is passed, IOW we simply pass hash_func=NULL to
g_hash_table_new) so we'd hash on the struct vtd_as_key * instead; IMHO
that'll suffice too.

Thanks,

-- 
Peter Xu




* Re: [PATCH 2/3] intel-iommu: drop VTDBus
  2022-01-17  1:27         ` Peter Xu
@ 2022-01-17  1:42           ` Peter Xu
  0 siblings, 0 replies; 28+ messages in thread
From: Peter Xu @ 2022-01-17  1:42 UTC (permalink / raw)
  To: Jason Wang; +Cc: yi.l.liu, yi.y.sun, qemu-devel, mst

On Mon, Jan 17, 2022 at 09:27:03AM +0800, Peter Xu wrote:
> On Fri, Jan 14, 2022 at 05:15:35PM +0800, Jason Wang wrote:
> > 
> > On 2022/1/14 10:32, Jason Wang wrote:
> > > > > dressSpace *as)
> > > > >   /* GHashTable functions */
> > > > >   static gboolean vtd_uint64_equal(gconstpointer v1, gconstpointer v2)
> > > > >   {
> > > > > -    return *((const uint64_t *)v1) == *((const uint64_t *)v2);
> > > > > +    const struct vtd_as_key *key1 = v1;
> > > > > +    const struct vtd_as_key *key2 = v2;
> > > > > +
> > > > > +    return (key1->bus == key2->bus) && (key1->devfn == key2->devfn);
> > > > >   }
> > > > >     static guint vtd_uint64_hash(gconstpointer v)
> > > > >   {
> > > > > -    return (guint)*(const uint64_t *)v;
> > > > > +    const struct vtd_as_key *key = v;
> > > > > +    guint value = (guint)(uintptr_t)key->bus;
> > > > > +
> > > > > +    return (guint)(value << 8 | key->devfn);
> > > > Note that value is a pointer to PCIBus*.  Just want to check with
> > > > you that it's
> > > > intended to use this hash value (or maybe you wanted to use Source
> > > > ID so it is
> > > > bus number to use not the bus pointer)?
> > > 
> > > 
> > > Right, SID should be used here.
> > 
> > 
> > Sorry for taking too long for the context switching ...
> > 
> > The hash and shift based the bus pointer is intended since we use bus
> > pointer as part of the key. The reason is still, during vtd_find_add_as(),
> > SID is not correct since guest might not finish the initialization of the
> > device and PCI bridge.
> 
> Ah, right.
> 
> However here value is left-shifted 8 bits so I'm not sure how it could
> guarantee no collision - logically any addresses that match lower 56 bits will
> hit with the same devfn by accident.
> 
> I don't think it'll matter in reality, but... wondering whether it's easier to
> simply use g_direct_hash() (the glibc provided hash function when hash==NULL is
> passed over, IOW we simply pass hash_func=NULL for g_hash_table_new) so we'll
> hash with struct vtd_as_key* instead; IMHO that'll suffice too.

Please ignore this - I think what you said should work indeed, and hash
collision should be fine with the equal function.  Dangling vtd_as_key* may not
really work.
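
FWIW, a tiny standalone example (GLib only; as_key is a stand-in for struct
vtd_as_key) of why the truncated hash is harmless: collisions only affect
bucket distribution, the equal callback still picks the right entry, and
lookups by a stack-allocated key work because comparison is by value:

#include <glib.h>
#include <stdint.h>
#include <stdio.h>

struct as_key {              /* stand-in for struct vtd_as_key */
    void *bus;               /* PCIBus * in the real code */
    uint8_t devfn;
};

static gboolean as_equal(gconstpointer v1, gconstpointer v2)
{
    const struct as_key *key1 = v1, *key2 = v2;

    return key1->bus == key2->bus && key1->devfn == key2->devfn;
}

static guint as_hash(gconstpointer v)
{
    const struct as_key *key = v;
    guint value = (guint)(uintptr_t)key->bus;

    /* Collisions are fine: as_equal() disambiguates within a bucket. */
    return (guint)(value << 8 | key->devfn);
}

int main(void)
{
    GHashTable *ht = g_hash_table_new_full(as_hash, as_equal, g_free, NULL);
    int bus_a, bus_b;        /* two fake "PCIBus" objects */
    struct as_key *key = g_new(struct as_key, 1);

    key->bus = &bus_a;
    key->devfn = 3;
    g_hash_table_insert(ht, key, (gpointer)"as for bus_a, devfn 3");

    struct as_key hit = { .bus = &bus_a, .devfn = 3 };
    struct as_key miss = { .bus = &bus_b, .devfn = 3 };

    printf("hit:  %s\n", (char *)g_hash_table_lookup(ht, &hit));
    printf("miss: %p\n", g_hash_table_lookup(ht, &miss));   /* NULL */

    g_hash_table_destroy(ht);
    return 0;
}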

-- 
Peter Xu




* Re: [PATCH 3/3] intel-iommu: PASID support
  2022-01-14 12:58               ` Liu Yi L
@ 2022-01-17  6:01                 ` Jason Wang
  0 siblings, 0 replies; 28+ messages in thread
From: Jason Wang @ 2022-01-17  6:01 UTC (permalink / raw)
  To: Liu Yi L; +Cc: yi.y.sun, qemu-devel, Peter Xu, mst

On Fri, Jan 14, 2022 at 8:59 PM Liu Yi L <yi.l.liu@intel.com> wrote:
>
> On 2022/1/14 15:22, Jason Wang wrote:
> > On Fri, Jan 14, 2022 at 3:13 PM Peter Xu <peterx@redhat.com> wrote:
> >>
> >> On Fri, Jan 14, 2022 at 01:58:07PM +0800, Jason Wang wrote:
> >>>>> Right, but I think you meant to do this only when scalable mode is disabled.
> >>>>
> >>>> Yes IMHO it will definitely suite for !scalable case since that's exactly what
> >>>> we did before.  What I'm also wondering is even if scalable is enabled but no
> >>>> "real" pasid is used, so if all the translations go through the default pasid
> >>>> that stored in the device context entry, then maybe we can ignore checking it.
> >>>> The latter is the "hacky" part mentioned above.
> >>>
> >>> The problem I see is that we can't know what PASID is used as default
> >>> without reading the context entry?
> >>
> >> Can the default NO_PASID being used in mixture of !NO_PASID use case on the
> >> same device?  If that's possible, then I agree..
> >
> > My understanding is that it is possible.
> >
> >>
> >> My previous idea should be based on the fact that if NO_PASID is used on one
> >> device, then all translations will be based on NO_PASID, but now I'm not sure
> >> of it.
> >
> > Actually, what I meant is:
> >
> > device 1 using transactions without PASID with RID2PASID 1
> > device 2 using transactions without PASID with RID2PASID 2
>
> Interesting series, Jason.
>
> haven't read through all your code yet. Just a quick comment. The
> RID2PASID1 and RID2PASID2 may be the same one. Vt-d spec has defined a RPS
> bit in ecap register. If it is reported as 0, that means the RID_PASID
> (previously it is called RID2PASID :-)) field of scalable mode context
> entry is not supported, a PASID value of 0 will be used for transactions
> wihout PASID. So in the code, you may check the RPS bit to see if the
> RID_PASID value are the same for all devices.

Good to know this; I will spend some time implementing it (probably on
top).  That can speed up the IOTLB lookup somewhat.

Thanks

>
> Regards,
> Yi Liu
>
> > Then we can't assume a default pasid here.
> >
> >>
> >>>
> >>>>
> >>>> The other thing to mention is, if we postpone the iotlb lookup to be after
> >>>> context entry, then logically we can have per-device iotlb, that means we can
> >>>> replace IntelIOMMUState.iotlb with VTDAddressSpace.iotlb in the future, too,
> >>>> which can also be more efficient.
> >>>
> >>> Right but we still need to limit the total slots and ATS is a better
> >>> way to deal with the IOTLB bottleneck actually.
> >>
> >> I think it depends on how the iotlb ghash is implemented.  Logically I think if
> >> we can split the cache to per-device it'll be slightly better because we don't
> >> need to iterate over iotlbs of other devices when lookup anymore; meanwhile
> >> each iotlb takes less space too (no devfn needed anymore).
> >
> > So we've already used sid in the IOTLB hash, I wonder how much we can
> > gain form this.
> >
> > Thanks
> >
> >>
> >> Thanks,
> >>
> >> --
> >> Peter Xu
> >>
> >
>
> --
> Regards,
> Yi Liu
>


