* [Qemu-devel] [PATCH v4 0/9] intel-iommu: nested vIOMMU, cleanups, bug fixes
@ 2018-05-18  7:25 Peter Xu
  2018-05-18  7:25 ` [Qemu-devel] [PATCH v4 1/9] intel-iommu: send PSI always even if across PDEs Peter Xu
                   ` (8 more replies)
  0 siblings, 9 replies; 12+ messages in thread
From: Peter Xu @ 2018-05-18  7:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Tian Kevin, QEMU Stable, Jintack Lim, peterx, Jason Wang,
	Michael S . Tsirkin, Alex Williamson

Upstream repo: https://github.com/xzpeter/qemu/tree/fix-vtd-dma

This version contains few functional changes; it is mostly comment
tweaks and patch squashing.  I squashed the last four patches since
they solve the same problem, and the old patch 9 on its own could
introduce problems without patch 12, so I squashed them all together.
I rewrote the commit message for that patch (which also made it
simpler to write the SECURITY part, since all four patches address a
single problem).  Reviewers can go back to version 3 and look at
patches 9-12, since the contents are exactly the same (and that split
may be easier to review).

V3: http://lists.gnu.org/archive/html/qemu-devel/2018-05/msg04009.html

I took the liberty of including Jintack's Reported-by and Tested-by on
the last big patch already, after running some smoke tests on this new
version.

v4:
- rename: notifiers_list to vtd_as_with_notifiers,
  vtd_as_notify_mappings to vtd_as_has_map_notifier [Eric]
- tune the comments in misc patches [Eric]
- add missing places to take iommu lock [Eric]
- for each patch: cc stable, add "SECURITY IMPLICATION" at the
  beginning of the commit message where necessary
- merge patches 9-12

Please review.  Thanks.

Peter Xu (9):
  intel-iommu: send PSI always even if across PDEs
  intel-iommu: remove IntelIOMMUNotifierNode
  intel-iommu: add iommu lock
  intel-iommu: only do page walk for MAP notifiers
  intel-iommu: introduce vtd_page_walk_info
  intel-iommu: pass in address space when page walk
  intel-iommu: trace domain id during page walk
  util: implement simple iova tree
  intel-iommu: rework the page walk logic

 include/hw/i386/intel_iommu.h |  19 +-
 include/qemu/iova-tree.h      | 134 ++++++++++++
 hw/i386/intel_iommu.c         | 396 +++++++++++++++++++++++++---------
 util/iova-tree.c              | 114 ++++++++++
 MAINTAINERS                   |   6 +
 hw/i386/trace-events          |   5 +-
 util/Makefile.objs            |   1 +
 7 files changed, 566 insertions(+), 109 deletions(-)
 create mode 100644 include/qemu/iova-tree.h
 create mode 100644 util/iova-tree.c

-- 
2.17.0


* [Qemu-devel] [PATCH v4 1/9] intel-iommu: send PSI always even if across PDEs
  2018-05-18  7:25 [Qemu-devel] [PATCH v4 0/9] intel-iommu: nested vIOMMU, cleanups, bug fixes Peter Xu
@ 2018-05-18  7:25 ` Peter Xu
  2018-05-18  7:25 ` [Qemu-devel] [PATCH v4 2/9] intel-iommu: remove IntelIOMMUNotifierNode Peter Xu
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Peter Xu @ 2018-05-18  7:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Tian Kevin, QEMU Stable, Jintack Lim, peterx, Jason Wang,
	Michael S . Tsirkin, Alex Williamson

SECURITY IMPLICATION: without this patch, any guest with both an
assigned device and a vIOMMU might encounter stale IO page mappings
even after the guest has unmapped the page.  The stale mappings are
limited to the guest's own memory range, so this should not affect
host memory or other guests on the host.

During the IOVA page table walk, there is a special case when the PSI
covers one whole PDE (Page Directory Entry, which contains 512 Page
Table Entries) or more.  Previously we skipped such an entry and did
not notify the IOMMU notifiers.  This is not correct: we should send an
UNMAP notification to the registered UNMAP notifiers in this case.
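
For example (assuming 4KiB pages), a PSI with address mask am = 9
describes 2^9 * 4KiB = 2MiB of IOVA space, which is exactly the region
covered by one PDE; with the old logic such an invalidation was skipped
entirely and no UNMAP notification ever reached the notifiers for that
2MiB range.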

For UNMAP-only notifiers, this can leave IOTLB entries cached in the
devices even after they have become invalid.  For MAP/UNMAP notifiers
like vfio-pci, this will cause stale page mappings.

This special case doesn't trigger often, but it is easily triggered by
nested device assignment, since in that case we may map the whole L2
guest RAM region into the device's IOVA address space (several GBs at
least), which is far bigger than a normal kernel driver's usage of the
device (typically tens of MBs).

Without this patch applied to the L1 QEMU, nested device assignment to
L2 guests dumps errors like:

qemu-system-x86_64: VFIO_MAP_DMA: -17
qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,
                    0x7f89a920d000) = -17 (File exists)

CC: QEMU Stable <qemu-stable@nongnu.org>
Acked-by: Jason Wang <jasowang@redhat.com>
[peterx: rewrite the commit message]
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/intel_iommu.c | 42 ++++++++++++++++++++++++++++++------------
 1 file changed, 30 insertions(+), 12 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index fb31de9416..b359efd6f9 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -722,6 +722,15 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova, bool is_write,
 
 typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
 
+static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
+                             vtd_page_walk_hook hook_fn, void *private)
+{
+    assert(hook_fn);
+    trace_vtd_page_walk_one(level, entry->iova, entry->translated_addr,
+                            entry->addr_mask, entry->perm);
+    return hook_fn(entry, private);
+}
+
 /**
  * vtd_page_walk_level - walk over specific level for IOVA range
  *
@@ -781,28 +790,37 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
          */
         entry_valid = read_cur | write_cur;
 
+        entry.target_as = &address_space_memory;
+        entry.iova = iova & subpage_mask;
+        entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
+        entry.addr_mask = ~subpage_mask;
+
         if (vtd_is_last_slpte(slpte, level)) {
-            entry.target_as = &address_space_memory;
-            entry.iova = iova & subpage_mask;
             /* NOTE: this is only meaningful if entry_valid == true */
             entry.translated_addr = vtd_get_slpte_addr(slpte, aw);
-            entry.addr_mask = ~subpage_mask;
-            entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
             if (!entry_valid && !notify_unmap) {
                 trace_vtd_page_walk_skip_perm(iova, iova_next);
                 goto next;
             }
-            trace_vtd_page_walk_one(level, entry.iova, entry.translated_addr,
-                                    entry.addr_mask, entry.perm);
-            if (hook_fn) {
-                ret = hook_fn(&entry, private);
-                if (ret < 0) {
-                    return ret;
-                }
+            ret = vtd_page_walk_one(&entry, level, hook_fn, private);
+            if (ret < 0) {
+                return ret;
             }
         } else {
             if (!entry_valid) {
-                trace_vtd_page_walk_skip_perm(iova, iova_next);
+                if (notify_unmap) {
+                    /*
+                     * The whole entry is invalid; unmap it all.
+                     * Translated address is meaningless, zero it.
+                     */
+                    entry.translated_addr = 0x0;
+                    ret = vtd_page_walk_one(&entry, level, hook_fn, private);
+                    if (ret < 0) {
+                        return ret;
+                    }
+                } else {
+                    trace_vtd_page_walk_skip_perm(iova, iova_next);
+                }
                 goto next;
             }
             ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, aw), iova,
-- 
2.17.0


* [Qemu-devel] [PATCH v4 2/9] intel-iommu: remove IntelIOMMUNotifierNode
  2018-05-18  7:25 [Qemu-devel] [PATCH v4 0/9] intel-iommu: nested vIOMMU, cleanups, bug fixes Peter Xu
  2018-05-18  7:25 ` [Qemu-devel] [PATCH v4 1/9] intel-iommu: send PSI always even if across PDEs Peter Xu
@ 2018-05-18  7:25 ` Peter Xu
  2018-05-18  7:25 ` [Qemu-devel] [PATCH v4 3/9] intel-iommu: add iommu lock Peter Xu
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Peter Xu @ 2018-05-18  7:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Tian Kevin, QEMU Stable, Jintack Lim, peterx, Jason Wang,
	Michael S . Tsirkin, Alex Williamson

The IntelIOMMUNotifierNode struct is not really necessary.  Remove it
and put the list entry directly into VTDAddressSpace; that simplifies
the code a lot.  While at it, rename the old notifiers_list into
vtd_as_with_notifiers.

CC: QEMU Stable <qemu-stable@nongnu.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/hw/i386/intel_iommu.h |  9 ++------
 hw/i386/intel_iommu.c         | 41 ++++++++++-------------------------
 2 files changed, 13 insertions(+), 37 deletions(-)

diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 45ec8919b6..032e33bcb2 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -67,7 +67,6 @@ typedef union VTD_IR_TableEntry VTD_IR_TableEntry;
 typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress;
 typedef struct VTDIrq VTDIrq;
 typedef struct VTD_MSIMessage VTD_MSIMessage;
-typedef struct IntelIOMMUNotifierNode IntelIOMMUNotifierNode;
 
 /* Context-Entry */
 struct VTDContextEntry {
@@ -93,6 +92,7 @@ struct VTDAddressSpace {
     MemoryRegion iommu_ir;      /* Interrupt region: 0xfeeXXXXX */
     IntelIOMMUState *iommu_state;
     VTDContextCacheEntry context_cache_entry;
+    QLIST_ENTRY(VTDAddressSpace) next;
 };
 
 struct VTDBus {
@@ -253,11 +253,6 @@ struct VTD_MSIMessage {
 /* When IR is enabled, all MSI/MSI-X data bits should be zero */
 #define VTD_IR_MSI_DATA          (0)
 
-struct IntelIOMMUNotifierNode {
-    VTDAddressSpace *vtd_as;
-    QLIST_ENTRY(IntelIOMMUNotifierNode) next;
-};
-
 /* The iommu (DMAR) device state struct */
 struct IntelIOMMUState {
     X86IOMMUState x86_iommu;
@@ -295,7 +290,7 @@ struct IntelIOMMUState {
     GHashTable *vtd_as_by_busptr;   /* VTDBus objects indexed by PCIBus* reference */
     VTDBus *vtd_as_by_bus_num[VTD_PCI_BUS_MAX]; /* VTDBus objects indexed by bus number */
     /* list of registered notifiers */
-    QLIST_HEAD(, IntelIOMMUNotifierNode) notifiers_list;
+    QLIST_HEAD(, VTDAddressSpace) vtd_as_with_notifiers;
 
     /* interrupt remapping */
     bool intr_enabled;              /* Whether guest enabled IR */
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index b359efd6f9..3df90457f8 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1248,10 +1248,10 @@ static void vtd_interrupt_remap_table_setup(IntelIOMMUState *s)
 
 static void vtd_iommu_replay_all(IntelIOMMUState *s)
 {
-    IntelIOMMUNotifierNode *node;
+    VTDAddressSpace *vtd_as;
 
-    QLIST_FOREACH(node, &s->notifiers_list, next) {
-        memory_region_iommu_replay_all(&node->vtd_as->iommu);
+    QLIST_FOREACH(vtd_as, &s->vtd_as_with_notifiers, next) {
+        memory_region_iommu_replay_all(&vtd_as->iommu);
     }
 }
 
@@ -1372,7 +1372,6 @@ static void vtd_iotlb_global_invalidate(IntelIOMMUState *s)
 
 static void vtd_iotlb_domain_invalidate(IntelIOMMUState *s, uint16_t domain_id)
 {
-    IntelIOMMUNotifierNode *node;
     VTDContextEntry ce;
     VTDAddressSpace *vtd_as;
 
@@ -1381,8 +1380,7 @@ static void vtd_iotlb_domain_invalidate(IntelIOMMUState *s, uint16_t domain_id)
     g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_domain,
                                 &domain_id);
 
-    QLIST_FOREACH(node, &s->notifiers_list, next) {
-        vtd_as = node->vtd_as;
+    QLIST_FOREACH(vtd_as, &s->vtd_as_with_notifiers, next) {
         if (!vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
                                       vtd_as->devfn, &ce) &&
             domain_id == VTD_CONTEXT_ENTRY_DID(ce.hi)) {
@@ -1402,12 +1400,11 @@ static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
                                            uint16_t domain_id, hwaddr addr,
                                            uint8_t am)
 {
-    IntelIOMMUNotifierNode *node;
+    VTDAddressSpace *vtd_as;
     VTDContextEntry ce;
     int ret;
 
-    QLIST_FOREACH(node, &(s->notifiers_list), next) {
-        VTDAddressSpace *vtd_as = node->vtd_as;
+    QLIST_FOREACH(vtd_as, &(s->vtd_as_with_notifiers), next) {
         ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
                                        vtd_as->devfn, &ce);
         if (!ret && domain_id == VTD_CONTEXT_ENTRY_DID(ce.hi)) {
@@ -2344,8 +2341,6 @@ static void vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
 {
     VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
     IntelIOMMUState *s = vtd_as->iommu_state;
-    IntelIOMMUNotifierNode *node = NULL;
-    IntelIOMMUNotifierNode *next_node = NULL;
 
     if (!s->caching_mode && new & IOMMU_NOTIFIER_MAP) {
         error_report("We need to set caching-mode=1 for intel-iommu to enable "
@@ -2354,21 +2349,9 @@ static void vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
     }
 
     if (old == IOMMU_NOTIFIER_NONE) {
-        node = g_malloc0(sizeof(*node));
-        node->vtd_as = vtd_as;
-        QLIST_INSERT_HEAD(&s->notifiers_list, node, next);
-        return;
-    }
-
-    /* update notifier node with new flags */
-    QLIST_FOREACH_SAFE(node, &s->notifiers_list, next, next_node) {
-        if (node->vtd_as == vtd_as) {
-            if (new == IOMMU_NOTIFIER_NONE) {
-                QLIST_REMOVE(node, next);
-                g_free(node);
-            }
-            return;
-        }
+        QLIST_INSERT_HEAD(&s->vtd_as_with_notifiers, vtd_as, next);
+    } else if (new == IOMMU_NOTIFIER_NONE) {
+        QLIST_REMOVE(vtd_as, next);
     }
 }
 
@@ -2838,12 +2821,10 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
 
 static void vtd_address_space_unmap_all(IntelIOMMUState *s)
 {
-    IntelIOMMUNotifierNode *node;
     VTDAddressSpace *vtd_as;
     IOMMUNotifier *n;
 
-    QLIST_FOREACH(node, &s->notifiers_list, next) {
-        vtd_as = node->vtd_as;
+    QLIST_FOREACH(vtd_as, &s->vtd_as_with_notifiers, next) {
         IOMMU_NOTIFIER_FOREACH(n, &vtd_as->iommu) {
             vtd_address_space_unmap(vtd_as, n);
         }
@@ -3088,7 +3069,7 @@ static void vtd_realize(DeviceState *dev, Error **errp)
         return;
     }
 
-    QLIST_INIT(&s->notifiers_list);
+    QLIST_INIT(&s->vtd_as_with_notifiers);
     memset(s->vtd_as_by_bus_num, 0, sizeof(s->vtd_as_by_bus_num));
     memory_region_init_io(&s->csrmem, OBJECT(s), &vtd_mem_ops, s,
                           "intel_iommu", DMAR_REG_SIZE);
-- 
2.17.0


* [Qemu-devel] [PATCH v4 3/9] intel-iommu: add iommu lock
  2018-05-18  7:25 [Qemu-devel] [PATCH v4 0/9] intel-iommu: nested vIOMMU, cleanups, bug fixes Peter Xu
  2018-05-18  7:25 ` [Qemu-devel] [PATCH v4 1/9] intel-iommu: send PSI always even if across PDEs Peter Xu
  2018-05-18  7:25 ` [Qemu-devel] [PATCH v4 2/9] intel-iommu: remove IntelIOMMUNotifierNode Peter Xu
@ 2018-05-18  7:25 ` Peter Xu
  2018-05-18  7:25 ` [Qemu-devel] [PATCH v4 4/9] intel-iommu: only do page walk for MAP notifiers Peter Xu
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Peter Xu @ 2018-05-18  7:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Tian Kevin, QEMU Stable, Jintack Lim, peterx, Jason Wang,
	Michael S . Tsirkin, Alex Williamson

SECURITY IMPLICATION: this patch fixes a potential race when multiple
threads access the IOMMU IOTLB cache.

Add a per-IOMMU big lock to protect IOMMU state.  Currently the only
things to be protected are the IOTLB and context caches, since they
can be accessed even without the BQL, e.g., in the IO dataplane.

Note that we don't need to protect the device page tables, since those
are fully controlled by the guest kernel.  However, there is still a
possibility that a malicious driver will program the device not to
obey that rule.  In that case QEMU can't really do anything useful;
the guest itself is responsible for the consequences.
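
For reference, the convention the patch follows is the usual split into
a *_locked() helper plus a thin wrapper that takes the new mutex.  A
minimal standalone sketch of that shape (the Demo* names below are made
up for illustration; only QemuMutex and the GHashTable-based cache
mirror the patch):

#include "qemu/osdep.h"
#include "qemu/thread.h"

typedef struct DemoIOMMUState {
    QemuMutex iommu_lock;   /* protects the shared caches below */
    GHashTable *iotlb;      /* IOTLB cache, reachable without the BQL */
} DemoIOMMUState;

/* Must be called with iommu_lock held */
static void demo_reset_iotlb_locked(DemoIOMMUState *s)
{
    g_hash_table_remove_all(s->iotlb);
}

static void demo_reset_iotlb(DemoIOMMUState *s)
{
    qemu_mutex_lock(&s->iommu_lock);
    demo_reset_iotlb_locked(s);
    qemu_mutex_unlock(&s->iommu_lock);
}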

CC: QEMU Stable <qemu-stable@nongnu.org>
Reported-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/hw/i386/intel_iommu.h |  6 ++++
 hw/i386/intel_iommu.c         | 56 +++++++++++++++++++++++++++++------
 2 files changed, 53 insertions(+), 9 deletions(-)

diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 032e33bcb2..016e74bedb 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -300,6 +300,12 @@ struct IntelIOMMUState {
     OnOffAuto intr_eim;             /* Toggle for EIM cabability */
     bool buggy_eim;                 /* Force buggy EIM unless eim=off */
     uint8_t aw_bits;                /* Host/IOVA address width (in bits) */
+
+    /*
+     * Protects IOMMU states in general.  Currently it protects the
+     * per-IOMMU IOTLB cache, and context entry cache in VTDAddressSpace.
+     */
+    QemuMutex iommu_lock;
 };
 
 /* Find the VTD Address space associated with the given bus pointer,
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 3df90457f8..8d4069de9f 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -128,6 +128,16 @@ static uint64_t vtd_set_clear_mask_quad(IntelIOMMUState *s, hwaddr addr,
     return new_val;
 }
 
+static inline void vtd_iommu_lock(IntelIOMMUState *s)
+{
+    qemu_mutex_lock(&s->iommu_lock);
+}
+
+static inline void vtd_iommu_unlock(IntelIOMMUState *s)
+{
+    qemu_mutex_unlock(&s->iommu_lock);
+}
+
 /* GHashTable functions */
 static gboolean vtd_uint64_equal(gconstpointer v1, gconstpointer v2)
 {
@@ -172,9 +182,9 @@ static gboolean vtd_hash_remove_by_page(gpointer key, gpointer value,
 }
 
 /* Reset all the gen of VTDAddressSpace to zero and set the gen of
- * IntelIOMMUState to 1.
+ * IntelIOMMUState to 1.  Must be called with IOMMU lock held.
  */
-static void vtd_reset_context_cache(IntelIOMMUState *s)
+static void vtd_reset_context_cache_locked(IntelIOMMUState *s)
 {
     VTDAddressSpace *vtd_as;
     VTDBus *vtd_bus;
@@ -197,12 +207,20 @@ static void vtd_reset_context_cache(IntelIOMMUState *s)
     s->context_cache_gen = 1;
 }
 
-static void vtd_reset_iotlb(IntelIOMMUState *s)
+/* Must be called with IOMMU lock held. */
+static void vtd_reset_iotlb_locked(IntelIOMMUState *s)
 {
     assert(s->iotlb);
     g_hash_table_remove_all(s->iotlb);
 }
 
+static void vtd_reset_iotlb(IntelIOMMUState *s)
+{
+    vtd_iommu_lock(s);
+    vtd_reset_iotlb_locked(s);
+    vtd_iommu_unlock(s);
+}
+
 static uint64_t vtd_get_iotlb_key(uint64_t gfn, uint16_t source_id,
                                   uint32_t level)
 {
@@ -215,6 +233,7 @@ static uint64_t vtd_get_iotlb_gfn(hwaddr addr, uint32_t level)
     return (addr & vtd_slpt_level_page_mask(level)) >> VTD_PAGE_SHIFT_4K;
 }
 
+/* Must be called with IOMMU lock held */
 static VTDIOTLBEntry *vtd_lookup_iotlb(IntelIOMMUState *s, uint16_t source_id,
                                        hwaddr addr)
 {
@@ -235,6 +254,7 @@ out:
     return entry;
 }
 
+/* Must be with IOMMU lock held */
 static void vtd_update_iotlb(IntelIOMMUState *s, uint16_t source_id,
                              uint16_t domain_id, hwaddr addr, uint64_t slpte,
                              uint8_t access_flags, uint32_t level)
@@ -246,7 +266,7 @@ static void vtd_update_iotlb(IntelIOMMUState *s, uint16_t source_id,
     trace_vtd_iotlb_page_update(source_id, addr, slpte, domain_id);
     if (g_hash_table_size(s->iotlb) >= VTD_IOTLB_MAX_SIZE) {
         trace_vtd_iotlb_reset("iotlb exceeds size limit");
-        vtd_reset_iotlb(s);
+        vtd_reset_iotlb_locked(s);
     }
 
     entry->gfn = gfn;
@@ -1106,7 +1126,7 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
     IntelIOMMUState *s = vtd_as->iommu_state;
     VTDContextEntry ce;
     uint8_t bus_num = pci_bus_num(bus);
-    VTDContextCacheEntry *cc_entry = &vtd_as->context_cache_entry;
+    VTDContextCacheEntry *cc_entry;
     uint64_t slpte, page_mask;
     uint32_t level;
     uint16_t source_id = vtd_make_source_id(bus_num, devfn);
@@ -1123,6 +1143,10 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
      */
     assert(!vtd_is_interrupt_addr(addr));
 
+    vtd_iommu_lock(s);
+
+    cc_entry = &vtd_as->context_cache_entry;
+
     /* Try to fetch slpte form IOTLB */
     iotlb_entry = vtd_lookup_iotlb(s, source_id, addr);
     if (iotlb_entry) {
@@ -1182,7 +1206,7 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
          * IOMMU region can be swapped back.
          */
         vtd_pt_enable_fast_path(s, source_id);
-
+        vtd_iommu_unlock(s);
         return true;
     }
 
@@ -1203,6 +1227,7 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
     vtd_update_iotlb(s, source_id, VTD_CONTEXT_ENTRY_DID(ce.hi), addr, slpte,
                      access_flags, level);
 out:
+    vtd_iommu_unlock(s);
     entry->iova = addr & page_mask;
     entry->translated_addr = vtd_get_slpte_addr(slpte, s->aw_bits) & page_mask;
     entry->addr_mask = ~page_mask;
@@ -1210,6 +1235,7 @@ out:
     return true;
 
 error:
+    vtd_iommu_unlock(s);
     entry->iova = 0;
     entry->translated_addr = 0;
     entry->addr_mask = 0;
@@ -1258,10 +1284,13 @@ static void vtd_iommu_replay_all(IntelIOMMUState *s)
 static void vtd_context_global_invalidate(IntelIOMMUState *s)
 {
     trace_vtd_inv_desc_cc_global();
+    /* Protects context cache */
+    vtd_iommu_lock(s);
     s->context_cache_gen++;
     if (s->context_cache_gen == VTD_CONTEXT_CACHE_GEN_MAX) {
-        vtd_reset_context_cache(s);
+        vtd_reset_context_cache_locked(s);
     }
+    vtd_iommu_unlock(s);
     vtd_switch_address_space_all(s);
     /*
      * From VT-d spec 6.5.2.1, a global context entry invalidation
@@ -1313,7 +1342,9 @@ static void vtd_context_device_invalidate(IntelIOMMUState *s,
             if (vtd_as && ((devfn_it & mask) == (devfn & mask))) {
                 trace_vtd_inv_desc_cc_device(bus_n, VTD_PCI_SLOT(devfn_it),
                                              VTD_PCI_FUNC(devfn_it));
+                vtd_iommu_lock(s);
                 vtd_as->context_cache_entry.context_cache_gen = 0;
+                vtd_iommu_unlock(s);
                 /*
                  * Do switch address space when needed, in case if the
                  * device passthrough bit is switched.
@@ -1377,8 +1408,10 @@ static void vtd_iotlb_domain_invalidate(IntelIOMMUState *s, uint16_t domain_id)
 
     trace_vtd_inv_desc_iotlb_domain(domain_id);
 
+    vtd_iommu_lock(s);
     g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_domain,
                                 &domain_id);
+    vtd_iommu_unlock(s);
 
     QLIST_FOREACH(vtd_as, &s->vtd_as_with_notifiers, next) {
         if (!vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
@@ -1426,7 +1459,9 @@ static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
     info.domain_id = domain_id;
     info.addr = addr;
     info.mask = ~((1 << am) - 1);
+    vtd_iommu_lock(s);
     g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page, &info);
+    vtd_iommu_unlock(s);
     vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am);
 }
 
@@ -2929,8 +2964,10 @@ static void vtd_init(IntelIOMMUState *s)
         s->cap |= VTD_CAP_CM;
     }
 
-    vtd_reset_context_cache(s);
-    vtd_reset_iotlb(s);
+    vtd_iommu_lock(s);
+    vtd_reset_context_cache_locked(s);
+    vtd_reset_iotlb_locked(s);
+    vtd_iommu_unlock(s);
 
     /* Define registers with default values and bit semantics */
     vtd_define_long(s, DMAR_VER_REG, 0x10UL, 0, 0);
@@ -3070,6 +3107,7 @@ static void vtd_realize(DeviceState *dev, Error **errp)
     }
 
     QLIST_INIT(&s->vtd_as_with_notifiers);
+    qemu_mutex_init(&s->iommu_lock);
     memset(s->vtd_as_by_bus_num, 0, sizeof(s->vtd_as_by_bus_num));
     memory_region_init_io(&s->csrmem, OBJECT(s), &vtd_mem_ops, s,
                           "intel_iommu", DMAR_REG_SIZE);
-- 
2.17.0


* [Qemu-devel] [PATCH v4 4/9] intel-iommu: only do page walk for MAP notifiers
  2018-05-18  7:25 [Qemu-devel] [PATCH v4 0/9] intel-iommu: nested vIOMMU, cleanups, bug fixes Peter Xu
                   ` (2 preceding siblings ...)
  2018-05-18  7:25 ` [Qemu-devel] [PATCH v4 3/9] intel-iommu: add iommu lock Peter Xu
@ 2018-05-18  7:25 ` Peter Xu
  2018-05-18  7:25 ` [Qemu-devel] [PATCH v4 5/9] intel-iommu: introduce vtd_page_walk_info Peter Xu
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Peter Xu @ 2018-05-18  7:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Tian Kevin, QEMU Stable, Jintack Lim, peterx, Jason Wang,
	Michael S . Tsirkin, Alex Williamson

For UNMAP-only IOMMU notifiers, we don't need to walk the page tables.
Speed up that path by skipping the page table walk.  That should boost
performance for UNMAP-only notifiers like vhost.

CC: QEMU Stable <qemu-stable@nongnu.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/hw/i386/intel_iommu.h |  2 ++
 hw/i386/intel_iommu.c         | 44 +++++++++++++++++++++++++++++++----
 2 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 016e74bedb..156f35e919 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -93,6 +93,8 @@ struct VTDAddressSpace {
     IntelIOMMUState *iommu_state;
     VTDContextCacheEntry context_cache_entry;
     QLIST_ENTRY(VTDAddressSpace) next;
+    /* Superset of notifier flags that this address space has */
+    IOMMUNotifierFlag notifier_flags;
 };
 
 struct VTDBus {
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 8d4069de9f..38ccc741f9 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -138,6 +138,12 @@ static inline void vtd_iommu_unlock(IntelIOMMUState *s)
     qemu_mutex_unlock(&s->iommu_lock);
 }
 
+/* Whether the address space needs to notify new mappings */
+static inline gboolean vtd_as_has_map_notifier(VTDAddressSpace *as)
+{
+    return as->notifier_flags & IOMMU_NOTIFIER_MAP;
+}
+
 /* GHashTable functions */
 static gboolean vtd_uint64_equal(gconstpointer v1, gconstpointer v2)
 {
@@ -1436,14 +1442,36 @@ static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
     VTDAddressSpace *vtd_as;
     VTDContextEntry ce;
     int ret;
+    hwaddr size = (1 << am) * VTD_PAGE_SIZE;
 
     QLIST_FOREACH(vtd_as, &(s->vtd_as_with_notifiers), next) {
         ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
                                        vtd_as->devfn, &ce);
         if (!ret && domain_id == VTD_CONTEXT_ENTRY_DID(ce.hi)) {
-            vtd_page_walk(&ce, addr, addr + (1 << am) * VTD_PAGE_SIZE,
-                          vtd_page_invalidate_notify_hook,
-                          (void *)&vtd_as->iommu, true, s->aw_bits);
+            if (vtd_as_has_map_notifier(vtd_as)) {
+                /*
+                 * As long as we have MAP notifications registered in
+                 * any of our IOMMU notifiers, we need to sync the
+                 * shadow page table.
+                 */
+                vtd_page_walk(&ce, addr, addr + size,
+                              vtd_page_invalidate_notify_hook,
+                              (void *)&vtd_as->iommu, true, s->aw_bits);
+            } else {
+                /*
+                 * For UNMAP-only notifiers, we don't need to walk the
+                 * page tables.  We just deliver the PSI down to
+                 * invalidate caches.
+                 */
+                IOMMUTLBEntry entry = {
+                    .target_as = &address_space_memory,
+                    .iova = addr,
+                    .translated_addr = 0,
+                    .addr_mask = size - 1,
+                    .perm = IOMMU_NONE,
+                };
+                memory_region_notify_iommu(&vtd_as->iommu, entry);
+            }
         }
     }
 }
@@ -2383,6 +2411,9 @@ static void vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
         exit(1);
     }
 
+    /* Update per-address-space notifier flags */
+    vtd_as->notifier_flags = new;
+
     if (old == IOMMU_NOTIFIER_NONE) {
         QLIST_INSERT_HEAD(&s->vtd_as_with_notifiers, vtd_as, next);
     } else if (new == IOMMU_NOTIFIER_NONE) {
@@ -2891,8 +2922,11 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
                                   PCI_FUNC(vtd_as->devfn),
                                   VTD_CONTEXT_ENTRY_DID(ce.hi),
                                   ce.hi, ce.lo);
-        vtd_page_walk(&ce, 0, ~0ULL, vtd_replay_hook, (void *)n, false,
-                      s->aw_bits);
+        if (vtd_as_has_map_notifier(vtd_as)) {
+            /* This is required only for MAP typed notifiers */
+            vtd_page_walk(&ce, 0, ~0ULL, vtd_replay_hook, (void *)n, false,
+                          s->aw_bits);
+        }
     } else {
         trace_vtd_replay_ce_invalid(bus_n, PCI_SLOT(vtd_as->devfn),
                                     PCI_FUNC(vtd_as->devfn));
-- 
2.17.0


* [Qemu-devel] [PATCH v4 5/9] intel-iommu: introduce vtd_page_walk_info
  2018-05-18  7:25 [Qemu-devel] [PATCH v4 0/9] intel-iommu: nested vIOMMU, cleanups, bug fixes Peter Xu
                   ` (3 preceding siblings ...)
  2018-05-18  7:25 ` [Qemu-devel] [PATCH v4 4/9] intel-iommu: only do page walk for MAP notifiers Peter Xu
@ 2018-05-18  7:25 ` Peter Xu
  2018-05-18  7:25 ` [Qemu-devel] [PATCH v4 6/9] intel-iommu: pass in address space when page walk Peter Xu
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Peter Xu @ 2018-05-18  7:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Tian Kevin, QEMU Stable, Jintack Lim, peterx, Jason Wang,
	Michael S . Tsirkin, Alex Williamson

During the recursive walk of the IOVA page tables, some stack variables
are constants that never change during the whole walk.  Isolate them
into a struct so that we don't need to pass those constants down the
stack at every recursion level.

CC: QEMU Stable <qemu-stable@nongnu.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/intel_iommu.c | 84 ++++++++++++++++++++++++++-----------------
 1 file changed, 52 insertions(+), 32 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 38ccc741f9..e247269659 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -748,9 +748,27 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova, bool is_write,
 
 typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
 
+/**
+ * Constant information used during page walking
+ *
+ * @hook_fn: hook func to be called when detected page
+ * @private: private data to be passed into hook func
+ * @notify_unmap: whether we should notify invalid entries
+ * @aw: maximum address width
+ */
+typedef struct {
+    vtd_page_walk_hook hook_fn;
+    void *private;
+    bool notify_unmap;
+    uint8_t aw;
+} vtd_page_walk_info;
+
 static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
-                             vtd_page_walk_hook hook_fn, void *private)
+                             vtd_page_walk_info *info)
 {
+    vtd_page_walk_hook hook_fn = info->hook_fn;
+    void *private = info->private;
+
     assert(hook_fn);
     trace_vtd_page_walk_one(level, entry->iova, entry->translated_addr,
                             entry->addr_mask, entry->perm);
@@ -763,17 +781,13 @@ static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
  * @addr: base GPA addr to start the walk
  * @start: IOVA range start address
  * @end: IOVA range end address (start <= addr < end)
- * @hook_fn: hook func to be called when detected page
- * @private: private data to be passed into hook func
  * @read: whether parent level has read permission
  * @write: whether parent level has write permission
- * @notify_unmap: whether we should notify invalid entries
- * @aw: maximum address width
+ * @info: constant information for the page walk
  */
 static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
-                               uint64_t end, vtd_page_walk_hook hook_fn,
-                               void *private, uint32_t level, bool read,
-                               bool write, bool notify_unmap, uint8_t aw)
+                               uint64_t end, uint32_t level, bool read,
+                               bool write, vtd_page_walk_info *info)
 {
     bool read_cur, write_cur, entry_valid;
     uint32_t offset;
@@ -823,24 +837,24 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
 
         if (vtd_is_last_slpte(slpte, level)) {
             /* NOTE: this is only meaningful if entry_valid == true */
-            entry.translated_addr = vtd_get_slpte_addr(slpte, aw);
-            if (!entry_valid && !notify_unmap) {
+            entry.translated_addr = vtd_get_slpte_addr(slpte, info->aw);
+            if (!entry_valid && !info->notify_unmap) {
                 trace_vtd_page_walk_skip_perm(iova, iova_next);
                 goto next;
             }
-            ret = vtd_page_walk_one(&entry, level, hook_fn, private);
+            ret = vtd_page_walk_one(&entry, level, info);
             if (ret < 0) {
                 return ret;
             }
         } else {
             if (!entry_valid) {
-                if (notify_unmap) {
+                if (info->notify_unmap) {
                     /*
                      * The whole entry is invalid; unmap it all.
                      * Translated address is meaningless, zero it.
                      */
                     entry.translated_addr = 0x0;
-                    ret = vtd_page_walk_one(&entry, level, hook_fn, private);
+                    ret = vtd_page_walk_one(&entry, level, info);
                     if (ret < 0) {
                         return ret;
                     }
@@ -849,10 +863,9 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
                 }
                 goto next;
             }
-            ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, aw), iova,
-                                      MIN(iova_next, end), hook_fn, private,
-                                      level - 1, read_cur, write_cur,
-                                      notify_unmap, aw);
+            ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, info->aw),
+                                      iova, MIN(iova_next, end), level - 1,
+                                      read_cur, write_cur, info);
             if (ret < 0) {
                 return ret;
             }
@@ -871,28 +884,24 @@ next:
  * @ce: context entry to walk upon
  * @start: IOVA address to start the walk
  * @end: IOVA range end address (start <= addr < end)
- * @hook_fn: the hook that to be called for each detected area
- * @private: private data for the hook function
- * @aw: maximum address width
+ * @info: page walking information struct
  */
 static int vtd_page_walk(VTDContextEntry *ce, uint64_t start, uint64_t end,
-                         vtd_page_walk_hook hook_fn, void *private,
-                         bool notify_unmap, uint8_t aw)
+                         vtd_page_walk_info *info)
 {
     dma_addr_t addr = vtd_ce_get_slpt_base(ce);
     uint32_t level = vtd_ce_get_level(ce);
 
-    if (!vtd_iova_range_check(start, ce, aw)) {
+    if (!vtd_iova_range_check(start, ce, info->aw)) {
         return -VTD_FR_ADDR_BEYOND_MGAW;
     }
 
-    if (!vtd_iova_range_check(end, ce, aw)) {
+    if (!vtd_iova_range_check(end, ce, info->aw)) {
         /* Fix end so that it reaches the maximum */
-        end = vtd_iova_limit(ce, aw);
+        end = vtd_iova_limit(ce, info->aw);
     }
 
-    return vtd_page_walk_level(addr, start, end, hook_fn, private,
-                               level, true, true, notify_unmap, aw);
+    return vtd_page_walk_level(addr, start, end, level, true, true, info);
 }
 
 /* Map a device to its corresponding domain (context-entry) */
@@ -1449,14 +1458,19 @@ static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
                                        vtd_as->devfn, &ce);
         if (!ret && domain_id == VTD_CONTEXT_ENTRY_DID(ce.hi)) {
             if (vtd_as_has_map_notifier(vtd_as)) {
+                vtd_page_walk_info info = {
+                    .hook_fn = vtd_page_invalidate_notify_hook,
+                    .private = (void *)&vtd_as->iommu,
+                    .notify_unmap = true,
+                    .aw = s->aw_bits,
+                };
+
                 /*
                  * As long as we have MAP notifications registered in
                  * any of our IOMMU notifiers, we need to sync the
                  * shadow page table.
                  */
-                vtd_page_walk(&ce, addr, addr + size,
-                              vtd_page_invalidate_notify_hook,
-                              (void *)&vtd_as->iommu, true, s->aw_bits);
+                vtd_page_walk(&ce, addr, addr + size, &info);
             } else {
                 /*
                  * For UNMAP-only notifiers, we don't need to walk the
@@ -2924,8 +2938,14 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
                                   ce.hi, ce.lo);
         if (vtd_as_has_map_notifier(vtd_as)) {
             /* This is required only for MAP typed notifiers */
-            vtd_page_walk(&ce, 0, ~0ULL, vtd_replay_hook, (void *)n, false,
-                          s->aw_bits);
+            vtd_page_walk_info info = {
+                .hook_fn = vtd_replay_hook,
+                .private = (void *)n,
+                .notify_unmap = false,
+                .aw = s->aw_bits,
+            };
+
+            vtd_page_walk(&ce, 0, ~0ULL, &info);
         }
     } else {
         trace_vtd_replay_ce_invalid(bus_n, PCI_SLOT(vtd_as->devfn),
-- 
2.17.0


* [Qemu-devel] [PATCH v4 6/9] intel-iommu: pass in address space when page walk
  2018-05-18  7:25 [Qemu-devel] [PATCH v4 0/9] intel-iommu: nested vIOMMU, cleanups, bug fixes Peter Xu
                   ` (4 preceding siblings ...)
  2018-05-18  7:25 ` [Qemu-devel] [PATCH v4 5/9] intel-iommu: introduce vtd_page_walk_info Peter Xu
@ 2018-05-18  7:25 ` Peter Xu
  2018-05-18  7:25 ` [Qemu-devel] [PATCH v4 7/9] intel-iommu: trace domain id during " Peter Xu
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Peter Xu @ 2018-05-18  7:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Tian Kevin, QEMU Stable, Jintack Lim, peterx, Jason Wang,
	Michael S . Tsirkin, Alex Williamson

Pass in the VTDAddressSpace as well.  It will be used in the follow-up
patches.

CC: QEMU Stable <qemu-stable@nongnu.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/intel_iommu.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index e247269659..a882894f49 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -754,9 +754,11 @@ typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
  * @hook_fn: hook func to be called when detected page
  * @private: private data to be passed into hook func
  * @notify_unmap: whether we should notify invalid entries
+ * @as: VT-d address space of the device
  * @aw: maximum address width
  */
 typedef struct {
+    VTDAddressSpace *as;
     vtd_page_walk_hook hook_fn;
     void *private;
     bool notify_unmap;
@@ -1463,6 +1465,7 @@ static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
                     .private = (void *)&vtd_as->iommu,
                     .notify_unmap = true,
                     .aw = s->aw_bits,
+                    .as = vtd_as,
                 };
 
                 /*
@@ -2943,6 +2946,7 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
                 .private = (void *)n,
                 .notify_unmap = false,
                 .aw = s->aw_bits,
+                .as = vtd_as,
             };
 
             vtd_page_walk(&ce, 0, ~0ULL, &info);
-- 
2.17.0


* [Qemu-devel] [PATCH v4 7/9] intel-iommu: trace domain id during page walk
  2018-05-18  7:25 [Qemu-devel] [PATCH v4 0/9] intel-iommu: nested vIOMMU, cleanups, bug fixes Peter Xu
                   ` (5 preceding siblings ...)
  2018-05-18  7:25 ` [Qemu-devel] [PATCH v4 6/9] intel-iommu: pass in address space when page walk Peter Xu
@ 2018-05-18  7:25 ` Peter Xu
  2018-05-18  7:25 ` [Qemu-devel] [PATCH v4 8/9] util: implement simple iova tree Peter Xu
  2018-05-18  7:25 ` [Qemu-devel] [PATCH v4 9/9] intel-iommu: rework the page walk logic Peter Xu
  8 siblings, 0 replies; 12+ messages in thread
From: Peter Xu @ 2018-05-18  7:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Tian Kevin, QEMU Stable, Jintack Lim, peterx, Jason Wang,
	Michael S . Tsirkin, Alex Williamson

This patch only modifies the trace points.

Previously we traced the page walk level.  That is redundant since we
already have the page mask (size).  Now we trace something much more
useful: the domain ID of the page walk.  That is handy when we trace
more than one device on the same system, so that we know which mapping
belongs to which domain.

CC: QEMU Stable <qemu-stable@nongnu.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/intel_iommu.c | 16 ++++++++++------
 hw/i386/trace-events  |  2 +-
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index a882894f49..61bb3d31e7 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -756,6 +756,7 @@ typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
  * @notify_unmap: whether we should notify invalid entries
  * @as: VT-d address space of the device
  * @aw: maximum address width
+ * @domain: domain ID of the page walk
  */
 typedef struct {
     VTDAddressSpace *as;
@@ -763,17 +764,18 @@ typedef struct {
     void *private;
     bool notify_unmap;
     uint8_t aw;
+    uint16_t domain_id;
 } vtd_page_walk_info;
 
-static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
-                             vtd_page_walk_info *info)
+static int vtd_page_walk_one(IOMMUTLBEntry *entry, vtd_page_walk_info *info)
 {
     vtd_page_walk_hook hook_fn = info->hook_fn;
     void *private = info->private;
 
     assert(hook_fn);
-    trace_vtd_page_walk_one(level, entry->iova, entry->translated_addr,
-                            entry->addr_mask, entry->perm);
+    trace_vtd_page_walk_one(info->domain_id, entry->iova,
+                            entry->translated_addr, entry->addr_mask,
+                            entry->perm);
     return hook_fn(entry, private);
 }
 
@@ -844,7 +846,7 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
                 trace_vtd_page_walk_skip_perm(iova, iova_next);
                 goto next;
             }
-            ret = vtd_page_walk_one(&entry, level, info);
+            ret = vtd_page_walk_one(&entry, info);
             if (ret < 0) {
                 return ret;
             }
@@ -856,7 +858,7 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
                      * Translated address is meaningless, zero it.
                      */
                     entry.translated_addr = 0x0;
-                    ret = vtd_page_walk_one(&entry, level, info);
+                    ret = vtd_page_walk_one(&entry, info);
                     if (ret < 0) {
                         return ret;
                     }
@@ -1466,6 +1468,7 @@ static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
                     .notify_unmap = true,
                     .aw = s->aw_bits,
                     .as = vtd_as,
+                    .domain_id = domain_id,
                 };
 
                 /*
@@ -2947,6 +2950,7 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
                 .notify_unmap = false,
                 .aw = s->aw_bits,
                 .as = vtd_as,
+                .domain_id = VTD_CONTEXT_ENTRY_DID(ce.hi),
             };
 
             vtd_page_walk(&ce, 0, ~0ULL, &info);
diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index 22d44648af..ca23ba9fad 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -39,7 +39,7 @@ vtd_fault_disabled(void) "Fault processing disabled for context entry"
 vtd_replay_ce_valid(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t domain, uint64_t hi, uint64_t lo) "replay valid context device %02"PRIx8":%02"PRIx8".%02"PRIx8" domain 0x%"PRIx16" hi 0x%"PRIx64" lo 0x%"PRIx64
 vtd_replay_ce_invalid(uint8_t bus, uint8_t dev, uint8_t fn) "replay invalid context device %02"PRIx8":%02"PRIx8".%02"PRIx8
 vtd_page_walk_level(uint64_t addr, uint32_t level, uint64_t start, uint64_t end) "walk (base=0x%"PRIx64", level=%"PRIu32") iova range 0x%"PRIx64" - 0x%"PRIx64
-vtd_page_walk_one(uint32_t level, uint64_t iova, uint64_t gpa, uint64_t mask, int perm) "detected page level 0x%"PRIx32" iova 0x%"PRIx64" -> gpa 0x%"PRIx64" mask 0x%"PRIx64" perm %d"
+vtd_page_walk_one(uint16_t domain, uint64_t iova, uint64_t gpa, uint64_t mask, int perm) "domain 0x%"PRIx16" iova 0x%"PRIx64" -> gpa 0x%"PRIx64" mask 0x%"PRIx64" perm %d"
 vtd_page_walk_skip_read(uint64_t iova, uint64_t next) "Page walk skip iova 0x%"PRIx64" - 0x%"PRIx64" due to unable to read"
 vtd_page_walk_skip_perm(uint64_t iova, uint64_t next) "Page walk skip iova 0x%"PRIx64" - 0x%"PRIx64" due to perm empty"
 vtd_page_walk_skip_reserve(uint64_t iova, uint64_t next) "Page walk skip iova 0x%"PRIx64" - 0x%"PRIx64" due to rsrv set"
-- 
2.17.0


* [Qemu-devel] [PATCH v4 8/9] util: implement simple iova tree
  2018-05-18  7:25 [Qemu-devel] [PATCH v4 0/9] intel-iommu: nested vIOMMU, cleanups, bug fixes Peter Xu
                   ` (6 preceding siblings ...)
  2018-05-18  7:25 ` [Qemu-devel] [PATCH v4 7/9] intel-iommu: trace domain id during " Peter Xu
@ 2018-05-18  7:25 ` Peter Xu
  2018-05-18  7:25 ` [Qemu-devel] [PATCH v4 9/9] intel-iommu: rework the page walk logic Peter Xu
  8 siblings, 0 replies; 12+ messages in thread
From: Peter Xu @ 2018-05-18  7:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Tian Kevin, QEMU Stable, Jintack Lim, peterx, Jason Wang,
	Michael S . Tsirkin, Alex Williamson

Introduce a simple iova tree implementation based on GTree.
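
As a quick illustration of how the API is meant to be used (the
addresses and the wrapper function below are made up for this example;
the types and function names come from the header added by this patch):

#include "qemu/iova-tree.h"

static void iova_tree_usage_example(void)
{
    IOVATree *tree = iova_tree_new();
    DMAMap *found;
    DMAMap map = {
        .iova = 0x10000,
        .translated_addr = 0x40000,
        .size = 0xfff,               /* inclusive: one 4KiB page */
        .perm = IOMMU_RW,
    };

    if (iova_tree_insert(tree, &map) != IOVA_OK) {
        /* overlapping range or invalid parameters */
    }

    /* Point lookup: returns the mapping that contains the address */
    found = iova_tree_find_address(tree, 0x10800);
    if (found) {
        /* found->translated_addr is the GPA backing this IOVA */
    }

    /* Remove everything covered by the given range, then clean up */
    iova_tree_remove(tree, &map);
    iova_tree_destroy(tree);
}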

CC: QEMU Stable <qemu-stable@nongnu.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/qemu/iova-tree.h | 134 +++++++++++++++++++++++++++++++++++++++
 util/iova-tree.c         | 114 +++++++++++++++++++++++++++++++++
 MAINTAINERS              |   6 ++
 util/Makefile.objs       |   1 +
 4 files changed, 255 insertions(+)
 create mode 100644 include/qemu/iova-tree.h
 create mode 100644 util/iova-tree.c

diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
new file mode 100644
index 0000000000..b061932097
--- /dev/null
+++ b/include/qemu/iova-tree.h
@@ -0,0 +1,134 @@
+/*
+ * A very simplified iova tree implementation based on GTree.
+ *
+ * Copyright 2018 Red Hat, Inc.
+ *
+ * Authors:
+ *  Peter Xu <peterx@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ */
+#ifndef IOVA_TREE_H
+#define IOVA_TREE_H
+
+/*
+ * Currently the iova tree will only keep range information, and no
+ * extra user data is allowed for each element.  A benefit is that we
+ * can merge adjacent ranges internally within the tree.  It can save
+ * a lot of memory when the ranges are split but mostly
+ * contiguous.
+ *
+ * Note that current implementation does not provide any thread
+ * protections.  Callers of the iova tree should be responsible
+ * for the thread safety issue.
+ */
+
+#include "qemu/osdep.h"
+#include "exec/memory.h"
+#include "exec/hwaddr.h"
+
+#define  IOVA_OK           (0)
+#define  IOVA_ERR_INVALID  (-1) /* Invalid parameters */
+#define  IOVA_ERR_OVERLAP  (-2) /* IOVA range overlapped */
+
+typedef struct IOVATree IOVATree;
+typedef struct DMAMap {
+    hwaddr iova;
+    hwaddr translated_addr;
+    hwaddr size;                /* Inclusive */
+    IOMMUAccessFlags perm;
+} QEMU_PACKED DMAMap;
+typedef gboolean (*iova_tree_iterator)(DMAMap *map);
+
+/**
+ * iova_tree_new:
+ *
+ * Create a new iova tree.
+ *
+ * Returns: the tree pointer when succeeded, or NULL if error.
+ */
+IOVATree *iova_tree_new(void);
+
+/**
+ * iova_tree_insert:
+ *
+ * @tree: the iova tree to insert
+ * @map: the mapping to insert
+ *
+ * Insert an iova range into the tree.  If there are overlapping
+ * ranges, IOVA_ERR_OVERLAP will be returned.
+ *
+ * Return: 0 if succeeded, or <0 if error.
+ */
+int iova_tree_insert(IOVATree *tree, DMAMap *map);
+
+/**
+ * iova_tree_remove:
+ *
+ * @tree: the iova tree to remove range from
+ * @map: the map range to remove
+ *
+ * Remove mappings from the tree that are covered by the map range
+ * provided.  The range does not need to be exactly what was inserted;
+ * all the mappings that are included in the provided range will be
+ * removed from the tree.  Here map->translated_addr is meaningless.
+ *
+ * Return: 0 if succeeded, or <0 if error.
+ */
+int iova_tree_remove(IOVATree *tree, DMAMap *map);
+
+/**
+ * iova_tree_find:
+ *
+ * @tree: the iova tree to search from
+ * @map: the mapping to search
+ *
+ * Search for a mapping in the iova tree that overlaps with the
+ * mapping range specified.  Only the first found mapping will be
+ * returned.
+ *
+ * Return: DMAMap pointer if found, or NULL if not found.  Note that
+ * the returned DMAMap pointer is maintained internally.  User should
+ * only read the content but never modify or free the content.  Also,
+ * user is responsible to make sure the pointer is valid (say, no
+ * concurrent deletion in progress).
+ */
+DMAMap *iova_tree_find(IOVATree *tree, DMAMap *map);
+
+/**
+ * iova_tree_find_address:
+ *
+ * @tree: the iova tree to search from
+ * @iova: the iova address to find
+ *
+ * Similar to iova_tree_find(), but it tries to find mapping with
+ * range iova=iova & size=0.
+ *
+ * Return: same as iova_tree_find().
+ */
+DMAMap *iova_tree_find_address(IOVATree *tree, hwaddr iova);
+
+/**
+ * iova_tree_foreach:
+ *
+ * @tree: the iova tree to iterate on
+ * @iterator: the iterator for the mappings, return true to stop
+ *
+ * Iterate over the iova tree.
+ *
+ * Return: None.
+ */
+void iova_tree_foreach(IOVATree *tree, iova_tree_iterator iterator);
+
+/**
+ * iova_tree_destroy:
+ *
+ * @tree: the iova tree to destroy
+ *
+ * Destroy an existing iova tree.
+ *
+ * Return: None.
+ */
+void iova_tree_destroy(IOVATree *tree);
+
+#endif
diff --git a/util/iova-tree.c b/util/iova-tree.c
new file mode 100644
index 0000000000..2d9cebfc89
--- /dev/null
+++ b/util/iova-tree.c
@@ -0,0 +1,114 @@
+/*
+ * IOVA tree implementation based on GTree.
+ *
+ * Copyright 2018 Red Hat, Inc.
+ *
+ * Authors:
+ *  Peter Xu <peterx@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ */
+
+#include <glib.h>
+#include "qemu/iova-tree.h"
+
+struct IOVATree {
+    GTree *tree;
+};
+
+static int iova_tree_compare(gconstpointer a, gconstpointer b, gpointer data)
+{
+    const DMAMap *m1 = a, *m2 = b;
+
+    if (m1->iova > m2->iova + m2->size) {
+        return 1;
+    }
+
+    if (m1->iova + m1->size < m2->iova) {
+        return -1;
+    }
+
+    /* Overlapped */
+    return 0;
+}
+
+IOVATree *iova_tree_new(void)
+{
+    IOVATree *iova_tree = g_new0(IOVATree, 1);
+
+    /* We don't have values actually, no need to free */
+    iova_tree->tree = g_tree_new_full(iova_tree_compare, NULL, g_free, NULL);
+
+    return iova_tree;
+}
+
+DMAMap *iova_tree_find(IOVATree *tree, DMAMap *map)
+{
+    return g_tree_lookup(tree->tree, map);
+}
+
+DMAMap *iova_tree_find_address(IOVATree *tree, hwaddr iova)
+{
+    DMAMap map = { .iova = iova, .size = 0 };
+
+    return iova_tree_find(tree, &map);
+}
+
+static inline void iova_tree_insert_internal(GTree *gtree, DMAMap *range)
+{
+    /* Key and value are sharing the same range data */
+    g_tree_insert(gtree, range, range);
+}
+
+int iova_tree_insert(IOVATree *tree, DMAMap *map)
+{
+    DMAMap *new;
+
+    if (map->iova + map->size < map->iova || map->perm == IOMMU_NONE) {
+        return IOVA_ERR_INVALID;
+    }
+
+    /* We don't allow inserting a range that overlaps with existing ones */
+    if (iova_tree_find(tree, map)) {
+        return IOVA_ERR_OVERLAP;
+    }
+
+    new = g_new0(DMAMap, 1);
+    memcpy(new, map, sizeof(*new));
+    iova_tree_insert_internal(tree->tree, new);
+
+    return IOVA_OK;
+}
+
+static gboolean iova_tree_traverse(gpointer key, gpointer value,
+                                gpointer data)
+{
+    iova_tree_iterator iterator = data;
+    DMAMap *map = key;
+
+    g_assert(key == value);
+
+    return iterator(map);
+}
+
+void iova_tree_foreach(IOVATree *tree, iova_tree_iterator iterator)
+{
+    g_tree_foreach(tree->tree, iova_tree_traverse, iterator);
+}
+
+int iova_tree_remove(IOVATree *tree, DMAMap *map)
+{
+    DMAMap *overlap;
+
+    while ((overlap = iova_tree_find(tree, map))) {
+        g_tree_remove(tree->tree, overlap);
+    }
+
+    return IOVA_OK;
+}
+
+void iova_tree_destroy(IOVATree *tree)
+{
+    g_tree_destroy(tree->tree);
+    g_free(tree);
+}
diff --git a/MAINTAINERS b/MAINTAINERS
index e187b1f18f..f07fcee72e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1783,6 +1783,12 @@ F: include/sysemu/replay.h
 F: docs/replay.txt
 F: stubs/replay.c
 
+IOVA Tree
+M: Peter Xu <peterx@redhat.com>
+S: Maintained
+F: include/qemu/iova-tree.h
+F: util/iova-tree.c
+
 Usermode Emulation
 ------------------
 Overall
diff --git a/util/Makefile.objs b/util/Makefile.objs
index 728c3541db..e1c3fed4dc 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -47,4 +47,5 @@ util-obj-y += qht.o
 util-obj-y += range.o
 util-obj-y += stats64.o
 util-obj-y += systemd.o
+util-obj-y += iova-tree.o
 util-obj-$(CONFIG_LINUX) += vfio-helpers.o
-- 
2.17.0


* [Qemu-devel] [PATCH v4 9/9] intel-iommu: rework the page walk logic
  2018-05-18  7:25 [Qemu-devel] [PATCH v4 0/9] intel-iommu: nested vIOMMU, cleanups, bug fixes Peter Xu
                   ` (7 preceding siblings ...)
  2018-05-18  7:25 ` [Qemu-devel] [PATCH v4 8/9] util: implement simple iova tree Peter Xu
@ 2018-05-18  7:25 ` Peter Xu
  2018-05-23 14:33   ` Michael S. Tsirkin
  8 siblings, 1 reply; 12+ messages in thread
From: Peter Xu @ 2018-05-18  7:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Tian Kevin, QEMU Stable, Jintack Lim, peterx, Jason Wang,
	Michael S . Tsirkin, Alex Williamson

SECURITY IMPLICATION: this patch fixes a potential small window during
which the DMA page table might be incomplete or invalid when the guest
sends domain/context invalidations to a device.  That can cause random
DMA errors for assigned devices.

This is a major change to the VT-d shadow page walking logic.  It
includes, but is not limited to:

- For each VTDAddressSpace, we now maintain which IOVA ranges we have
  mapped and which we have not.  With that information we only send
  MAP or UNMAP notifications when necessary: we don't send MAP notifies
  if we know the range is already mapped, and we don't send UNMAP
  notifies if we know the range was never mapped at all (a simplified
  sketch of this decision follows after the list).

- Introduce the vtd_sync_shadow_page_table[_range] APIs so that we can
  resync the shadow page table for a device from anywhere.

- When we receive a domain/context invalidation, we no longer run the
  replay logic; instead we use the new sync shadow page table API to
  resync the whole shadow page table without unmapping the whole
  region.  After this change, we only do the page walk once for each
  domain invalidation (previously it could run multiple times,
  depending on the number of notifiers per address space).

While at it, the page walking logic is also refactored to be simpler.
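
Before diving into the diff, here is a simplified sketch of the
per-entry decision that the reworked page walk makes against the
per-address-space IOVA tree.  The function name and surrounding
structure are illustrative only (the real code is vtd_page_walk_one()
below); only the iova-tree calls and the DMAMap/IOMMUTLBEntry types
come from this series:

#include "qemu/osdep.h"
#include "exec/memory.h"
#include "qemu/iova-tree.h"

typedef int (*walk_hook)(IOMMUTLBEntry *entry, void *opaque);

static int shadow_sync_one(IOVATree *iova_tree, IOMMUTLBEntry *entry,
                           walk_hook notify, void *opaque)
{
    DMAMap target = {
        .iova = entry->iova,
        .size = entry->addr_mask,        /* inclusive size of the range */
        .translated_addr = entry->translated_addr,
        .perm = entry->perm,
    };
    DMAMap *mapped = iova_tree_find(iova_tree, &target);

    if (entry->perm != IOMMU_NONE) {
        /* Guest wants the range mapped */
        if (mapped && !memcmp(mapped, &target, sizeof(target))) {
            return 0;                    /* already shadowed: skip MAP */
        }
        /* (if a different mapping exists, an UNMAP is emulated first) */
        iova_tree_insert(iova_tree, &target);
        return notify(entry, opaque);    /* send MAP */
    }

    /* Guest wants the range unmapped */
    if (!mapped) {
        return 0;                        /* never mapped: skip UNMAP */
    }
    iova_tree_remove(iova_tree, &target);
    return notify(entry, opaque);        /* send UNMAP */
}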

CC: QEMU Stable <qemu-stable@nongnu.org>
Reported-by: Jintack Lim <jintack@cs.columbia.edu>
Tested-by: Jintack Lim <jintack@cs.columbia.edu>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/hw/i386/intel_iommu.h |   2 +
 hw/i386/intel_iommu.c         | 213 +++++++++++++++++++++++++---------
 hw/i386/trace-events          |   3 +-
 3 files changed, 159 insertions(+), 59 deletions(-)

diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 156f35e919..fbfedcb1c0 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -27,6 +27,7 @@
 #include "hw/i386/ioapic.h"
 #include "hw/pci/msi.h"
 #include "hw/sysbus.h"
+#include "qemu/iova-tree.h"
 
 #define TYPE_INTEL_IOMMU_DEVICE "intel-iommu"
 #define INTEL_IOMMU_DEVICE(obj) \
@@ -95,6 +96,7 @@ struct VTDAddressSpace {
     QLIST_ENTRY(VTDAddressSpace) next;
     /* Superset of notifier flags that this address space has */
     IOMMUNotifierFlag notifier_flags;
+    IOVATree *iova_tree;          /* Traces mapped IOVA ranges */
 };
 
 struct VTDBus {
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 61bb3d31e7..b5a09b7908 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -769,10 +769,77 @@ typedef struct {
 
 static int vtd_page_walk_one(IOMMUTLBEntry *entry, vtd_page_walk_info *info)
 {
+    VTDAddressSpace *as = info->as;
     vtd_page_walk_hook hook_fn = info->hook_fn;
     void *private = info->private;
+    DMAMap target = {
+        .iova = entry->iova,
+        .size = entry->addr_mask,
+        .translated_addr = entry->translated_addr,
+        .perm = entry->perm,
+    };
+    DMAMap *mapped = iova_tree_find(as->iova_tree, &target);
+
+    if (entry->perm == IOMMU_NONE && !info->notify_unmap) {
+        trace_vtd_page_walk_one_skip_unmap(entry->iova, entry->addr_mask);
+        return 0;
+    }
 
     assert(hook_fn);
+
+    /* Update local IOVA mapped ranges */
+    if (entry->perm) {
+        if (mapped) {
+            /* If it's exactly the same translation, skip */
+            if (!memcmp(mapped, &target, sizeof(target))) {
+                trace_vtd_page_walk_one_skip_map(entry->iova, entry->addr_mask,
+                                                 entry->translated_addr);
+                return 0;
+            } else {
+                /*
+                 * Translation changed.  Normally this should not
+                 * happen, but it can happen when with buggy guest
+                 * OSes.  Note that there will be a small window that
+                 * we don't have map at all.  But that's the best
+                 * effort we can do.  The ideal way to emulate this is
+                 * atomically modify the PTE to follow what has
+                 * changed, but we can't.  One example is that vfio
+                 * driver only has VFIO_IOMMU_[UN]MAP_DMA but no
+                 * interface to modify a mapping (meanwhile it seems
+                 * meaningless to even provide one).  Anyway, let's
+                 * mark this as a TODO in case one day we'll have
+                 * a better solution.
+                 */
+                IOMMUAccessFlags cache_perm = entry->perm;
+                int ret;
+
+                /* Emulate an UNMAP */
+                entry->perm = IOMMU_NONE;
+                trace_vtd_page_walk_one(info->domain_id,
+                                        entry->iova,
+                                        entry->translated_addr,
+                                        entry->addr_mask,
+                                        entry->perm);
+                ret = hook_fn(entry, private);
+                if (ret) {
+                    return ret;
+                }
+                /* Drop any existing mapping */
+                iova_tree_remove(as->iova_tree, &target);
+                /* Recover the correct permission */
+                entry->perm = cache_perm;
+            }
+        }
+        iova_tree_insert(as->iova_tree, &target);
+    } else {
+        if (!mapped) {
+            /* Skip since we didn't map this range at all */
+            trace_vtd_page_walk_one_skip_unmap(entry->iova, entry->addr_mask);
+            return 0;
+        }
+        iova_tree_remove(as->iova_tree, &target);
+    }
+
     trace_vtd_page_walk_one(info->domain_id, entry->iova,
                             entry->translated_addr, entry->addr_mask,
                             entry->perm);
@@ -834,45 +901,34 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
          */
         entry_valid = read_cur | write_cur;
 
-        entry.target_as = &address_space_memory;
-        entry.iova = iova & subpage_mask;
-        entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
-        entry.addr_mask = ~subpage_mask;
-
-        if (vtd_is_last_slpte(slpte, level)) {
-            /* NOTE: this is only meaningful if entry_valid == true */
-            entry.translated_addr = vtd_get_slpte_addr(slpte, info->aw);
-            if (!entry_valid && !info->notify_unmap) {
-                trace_vtd_page_walk_skip_perm(iova, iova_next);
-                goto next;
-            }
-            ret = vtd_page_walk_one(&entry, info);
-            if (ret < 0) {
-                return ret;
-            }
-        } else {
-            if (!entry_valid) {
-                if (info->notify_unmap) {
-                    /*
-                     * The whole entry is invalid; unmap it all.
-                     * Translated address is meaningless, zero it.
-                     */
-                    entry.translated_addr = 0x0;
-                    ret = vtd_page_walk_one(&entry, info);
-                    if (ret < 0) {
-                        return ret;
-                    }
-                } else {
-                    trace_vtd_page_walk_skip_perm(iova, iova_next);
-                }
-                goto next;
-            }
+        if (!vtd_is_last_slpte(slpte, level) && entry_valid) {
+            /*
+             * This is a valid PDE (or even bigger than PDE).  We need
+             * to walk one further level.
+             */
             ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, info->aw),
                                       iova, MIN(iova_next, end), level - 1,
                                       read_cur, write_cur, info);
-            if (ret < 0) {
-                return ret;
-            }
+        } else {
+            /*
+             * This means we are either:
+             *
+             * (1) the real page entry (either 4K page, or huge page)
+             * (2) the whole range is invalid
+             *
+             * In either case, we send an IOTLB notification down.
+             */
+            entry.target_as = &address_space_memory;
+            entry.iova = iova & subpage_mask;
+            entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
+            entry.addr_mask = ~subpage_mask;
+            /* NOTE: this is only meaningful if entry_valid == true */
+            entry.translated_addr = vtd_get_slpte_addr(slpte, info->aw);
+            ret = vtd_page_walk_one(&entry, info);
+        }
+
+        if (ret < 0) {
+            return ret;
         }
 
 next:
@@ -964,6 +1020,58 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
     return 0;
 }
 
+static int vtd_sync_shadow_page_hook(IOMMUTLBEntry *entry,
+                                     void *private)
+{
+    memory_region_notify_iommu((IOMMUMemoryRegion *)private, *entry);
+    return 0;
+}
+
+/* If context entry is NULL, we'll try to fetch it on our own. */
+static int vtd_sync_shadow_page_table_range(VTDAddressSpace *vtd_as,
+                                            VTDContextEntry *ce,
+                                            hwaddr addr, hwaddr size)
+{
+    IntelIOMMUState *s = vtd_as->iommu_state;
+    vtd_page_walk_info info = {
+        .hook_fn = vtd_sync_shadow_page_hook,
+        .private = (void *)&vtd_as->iommu,
+        .notify_unmap = true,
+        .aw = s->aw_bits,
+        .as = vtd_as,
+    };
+    VTDContextEntry ce_cache;
+    int ret;
+
+    if (ce) {
+        /* If the caller provided context entry, use it */
+        ce_cache = *ce;
+    } else {
+        /* If the caller didn't provide ce, try to fetch */
+        ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
+                                       vtd_as->devfn, &ce_cache);
+        if (ret) {
+            /*
+             * This should not really happen, but in case it happens,
+             * we just skip the sync for this time.  After all we even
+             * don't have the root table pointer!
+             */
+            trace_vtd_err("Detected invalid context entry when "
+                          "trying to sync shadow page table");
+            return 0;
+        }
+    }
+
+    info.domain_id = VTD_CONTEXT_ENTRY_DID(ce_cache.hi);
+
+    return vtd_page_walk(&ce_cache, addr, addr + size, &info);
+}
+
+static int vtd_sync_shadow_page_table(VTDAddressSpace *vtd_as)
+{
+    return vtd_sync_shadow_page_table_range(vtd_as, NULL, 0, UINT64_MAX);
+}
+
 /*
  * Fetch translation type for specific device. Returns <0 if error
  * happens, otherwise return the shifted type to check against
@@ -1296,7 +1404,7 @@ static void vtd_iommu_replay_all(IntelIOMMUState *s)
     VTDAddressSpace *vtd_as;
 
     QLIST_FOREACH(vtd_as, &s->vtd_as_with_notifiers, next) {
-        memory_region_iommu_replay_all(&vtd_as->iommu);
+        vtd_sync_shadow_page_table(vtd_as);
     }
 }
 
@@ -1371,14 +1479,13 @@ static void vtd_context_device_invalidate(IntelIOMMUState *s,
                 vtd_switch_address_space(vtd_as);
                 /*
                  * So a device is moving out of (or moving into) a
-                 * domain, a replay() suites here to notify all the
-                 * IOMMU_NOTIFIER_MAP registers about this change.
+                 * domain, resync the shadow page table.
                  * This won't bring bad even if we have no such
                  * notifier registered - the IOMMU notification
                  * framework will skip MAP notifications if that
                  * happened.
                  */
-                memory_region_iommu_replay_all(&vtd_as->iommu);
+                vtd_sync_shadow_page_table(vtd_as);
             }
         }
     }
@@ -1436,18 +1543,11 @@ static void vtd_iotlb_domain_invalidate(IntelIOMMUState *s, uint16_t domain_id)
         if (!vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
                                       vtd_as->devfn, &ce) &&
             domain_id == VTD_CONTEXT_ENTRY_DID(ce.hi)) {
-            memory_region_iommu_replay_all(&vtd_as->iommu);
+            vtd_sync_shadow_page_table(vtd_as);
         }
     }
 }
 
-static int vtd_page_invalidate_notify_hook(IOMMUTLBEntry *entry,
-                                           void *private)
-{
-    memory_region_notify_iommu((IOMMUMemoryRegion *)private, *entry);
-    return 0;
-}
-
 static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
                                            uint16_t domain_id, hwaddr addr,
                                            uint8_t am)
@@ -1462,21 +1562,12 @@ static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
                                        vtd_as->devfn, &ce);
         if (!ret && domain_id == VTD_CONTEXT_ENTRY_DID(ce.hi)) {
             if (vtd_as_has_map_notifier(vtd_as)) {
-                vtd_page_walk_info info = {
-                    .hook_fn = vtd_page_invalidate_notify_hook,
-                    .private = (void *)&vtd_as->iommu,
-                    .notify_unmap = true,
-                    .aw = s->aw_bits,
-                    .as = vtd_as,
-                    .domain_id = domain_id,
-                };
-
                 /*
                  * As long as we have MAP notifications registered in
                  * any of our IOMMU notifiers, we need to sync the
                  * shadow page table.
                  */
-                vtd_page_walk(&ce, addr, addr + size, &info);
+                vtd_sync_shadow_page_table_range(vtd_as, &ce, addr, size);
             } else {
                 /*
                  * For UNMAP-only notifiers, we don't need to walk the
@@ -2806,6 +2897,7 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
         vtd_dev_as->devfn = (uint8_t)devfn;
         vtd_dev_as->iommu_state = s;
         vtd_dev_as->context_cache_entry.context_cache_gen = 0;
+        vtd_dev_as->iova_tree = iova_tree_new();
 
         /*
          * Memory region relationships looks like (Address range shows
@@ -2858,6 +2950,7 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
     hwaddr start = n->start;
     hwaddr end = n->end;
     IntelIOMMUState *s = as->iommu_state;
+    DMAMap map;
 
     /*
      * Note: all the codes in this function has a assumption that IOVA
@@ -2902,6 +2995,10 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
                              VTD_PCI_FUNC(as->devfn),
                              entry.iova, size);
 
+    map.iova = entry.iova;
+    map.size = entry.addr_mask;
+    iova_tree_remove(as->iova_tree, &map);
+
     memory_region_notify_one(n, &entry);
 }
 
diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index ca23ba9fad..e14d06ec83 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -40,8 +40,9 @@ vtd_replay_ce_valid(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t domain, uint6
 vtd_replay_ce_invalid(uint8_t bus, uint8_t dev, uint8_t fn) "replay invalid context device %02"PRIx8":%02"PRIx8".%02"PRIx8
 vtd_page_walk_level(uint64_t addr, uint32_t level, uint64_t start, uint64_t end) "walk (base=0x%"PRIx64", level=%"PRIu32") iova range 0x%"PRIx64" - 0x%"PRIx64
 vtd_page_walk_one(uint16_t domain, uint64_t iova, uint64_t gpa, uint64_t mask, int perm) "domain 0x%"PRIu16" iova 0x%"PRIx64" -> gpa 0x%"PRIx64" mask 0x%"PRIx64" perm %d"
+vtd_page_walk_one_skip_map(uint64_t iova, uint64_t mask, uint64_t translated) "iova 0x%"PRIx64" mask 0x%"PRIx64" translated 0x%"PRIx64
+vtd_page_walk_one_skip_unmap(uint64_t iova, uint64_t mask) "iova 0x%"PRIx64" mask 0x%"PRIx64
 vtd_page_walk_skip_read(uint64_t iova, uint64_t next) "Page walk skip iova 0x%"PRIx64" - 0x%"PRIx64" due to unable to read"
-vtd_page_walk_skip_perm(uint64_t iova, uint64_t next) "Page walk skip iova 0x%"PRIx64" - 0x%"PRIx64" due to perm empty"
 vtd_page_walk_skip_reserve(uint64_t iova, uint64_t next) "Page walk skip iova 0x%"PRIx64" - 0x%"PRIx64" due to rsrv set"
 vtd_switch_address_space(uint8_t bus, uint8_t slot, uint8_t fn, bool on) "Device %02x:%02x.%x switching address space (iommu enabled=%d)"
 vtd_as_unmap_whole(uint8_t bus, uint8_t slot, uint8_t fn, uint64_t iova, uint64_t size) "Device %02x:%02x.%x start 0x%"PRIx64" size 0x%"PRIx64
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH v4 9/9] intel-iommu: rework the page walk logic
  2018-05-18  7:25 ` [Qemu-devel] [PATCH v4 9/9] intel-iommu: rework the page walk logic Peter Xu
@ 2018-05-23 14:33   ` Michael S. Tsirkin
  2018-05-24  2:54     ` Peter Xu
  0 siblings, 1 reply; 12+ messages in thread
From: Michael S. Tsirkin @ 2018-05-23 14:33 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Tian Kevin, QEMU Stable, Jintack Lim, Jason Wang,
	Alex Williamson

On Fri, May 18, 2018 at 03:25:17PM +0800, Peter Xu wrote:
> SECURITY IMPLICATION: this patch fixes a small window during which the
> DMA page table might be incomplete or invalid when the guest sends
> domain/context invalidations to a device.  That window can cause random
> DMA errors for assigned devices.

So this is more of a correctness issue IMO. I don't see how
e.g. an application within the guest can cause any mischief
with this; it will just get a non-working device.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH v4 9/9] intel-iommu: rework the page walk logic
  2018-05-23 14:33   ` Michael S. Tsirkin
@ 2018-05-24  2:54     ` Peter Xu
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Xu @ 2018-05-24  2:54 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, Tian Kevin, QEMU Stable, Jintack Lim, Jason Wang,
	Alex Williamson

On Wed, May 23, 2018 at 05:33:34PM +0300, Michael S. Tsirkin wrote:
> On Fri, May 18, 2018 at 03:25:17PM +0800, Peter Xu wrote:
> > SECURITY IMPLICATION: this patch fixes a small window during which the
> > DMA page table might be incomplete or invalid when the guest sends
> > domain/context invalidations to a device.  That window can cause random
> > DMA errors for assigned devices.
> 
> So this is more of a correctness issue IMO. I don't see how
> e.g. an application within the guest can cause any mischief
> with this; it will just get a non-working device.

Yes, you are right.  It causes valid mappings to go missing, so there is
no risk to anyone; the device just stops working.  I should remove the
"SECURITY IMPLICATION" wording.

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-05-24  2:54 UTC | newest]

Thread overview: 12+ messages
2018-05-18  7:25 [Qemu-devel] [PATCH v4 0/9] intel-iommu: nested vIOMMU, cleanups, bug fixes Peter Xu
2018-05-18  7:25 ` [Qemu-devel] [PATCH v4 1/9] intel-iommu: send PSI always even if across PDEs Peter Xu
2018-05-18  7:25 ` [Qemu-devel] [PATCH v4 2/9] intel-iommu: remove IntelIOMMUNotifierNode Peter Xu
2018-05-18  7:25 ` [Qemu-devel] [PATCH v4 3/9] intel-iommu: add iommu lock Peter Xu
2018-05-18  7:25 ` [Qemu-devel] [PATCH v4 4/9] intel-iommu: only do page walk for MAP notifiers Peter Xu
2018-05-18  7:25 ` [Qemu-devel] [PATCH v4 5/9] intel-iommu: introduce vtd_page_walk_info Peter Xu
2018-05-18  7:25 ` [Qemu-devel] [PATCH v4 6/9] intel-iommu: pass in address space when page walk Peter Xu
2018-05-18  7:25 ` [Qemu-devel] [PATCH v4 7/9] intel-iommu: trace domain id during " Peter Xu
2018-05-18  7:25 ` [Qemu-devel] [PATCH v4 8/9] util: implement simple iova tree Peter Xu
2018-05-18  7:25 ` [Qemu-devel] [PATCH v4 9/9] intel-iommu: rework the page walk logic Peter Xu
2018-05-23 14:33   ` Michael S. Tsirkin
2018-05-24  2:54     ` Peter Xu
