All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v4 RESEND 0/3] IOMMU: intel_iommu support map and unmap notifications
@ 2016-10-17 15:44 Aviv B.D
  2016-10-17 15:44 ` [Qemu-devel] [PATCH v4 RESEND 1/3] IOMMU: add option to enable VTD_CAP_CM to vIOMMU capability exposed to guest Aviv B.D
                   ` (3 more replies)
  0 siblings, 4 replies; 31+ messages in thread
From: Aviv B.D @ 2016-10-17 15:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Alex Williamson, Peter Xu, Jan Kiszka,
	Aviv Ben-David

From: "Aviv Ben-David" <bd.aviv@gmail.com>

* Advertise the Cache Mode capability in the IOMMU capability register.
  This capability is controlled by the "cache-mode" property of the
  intel-iommu device. To enable it, invoke QEMU with
  "-device intel-iommu,cache-mode=true".

* On page cache invalidation in the Intel vIOMMU, check whether the domain
  belongs to a registered notifier, and notify accordingly.

Currently this patch series still does not enable VFIO device support with a
vIOMMU present. Current problems:
* vfio_iommu_map_notify is not aware of the memory range belonging to a
  specific VFIOGuestIOMMU.
* memory_region_iommu_replay hangs QEMU on startup while it iterates over the
  64-bit address space. Commenting out the call to this function yields a
  workable VFIO device while the vIOMMU is present.

Changes from v1 to v2:
* Remove the assumption that the cache is never cleared.
* Fix a lockup under high load.

Changes from v2 to v3:
* Remove debug leftovers.
* Split into separate commits.
* Change is_write to flags in vtd_do_iommu_translate, add IOMMU_NO_FAIL
  to suppress error propagation to the guest.

Changes from v3 to v4:
* Add a property to the intel_iommu device to control the CM capability,
  defaulting to false.
* Use s->iommu_ops.notify_flag_changed to register notifiers.

Changes from v4 to v4 RESEND:
* Fix coding style issues pointed out by the checkpatch.pl script.

Aviv Ben-David (3):
  IOMMU: add option to enable VTD_CAP_CM to vIOMMU capability exposed to
    guest
  IOMMU: change iommu_op->translate's is_write to flags, add support to
    NO_FAIL flag mode
  IOMMU: enable intel_iommu map and unmap notifiers

 exec.c                         |   3 +-
 hw/i386/amd_iommu.c            |   4 +-
 hw/i386/intel_iommu.c          | 175 +++++++++++++++++++++++++++++++++--------
 hw/i386/intel_iommu_internal.h |   3 +
 include/exec/memory.h          |   6 +-
 include/hw/i386/intel_iommu.h  |  11 +++
 memory.c                       |   3 +-
 7 files changed, 168 insertions(+), 37 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH v4 RESEND 1/3] IOMMU: add option to enable VTD_CAP_CM to vIOMMU capability exposed to guest
  2016-10-17 15:44 [Qemu-devel] [PATCH v4 RESEND 0/3] IOMMU: intel_iommu support map and unmap notifications Aviv B.D
@ 2016-10-17 15:44 ` Aviv B.D
  2016-10-21  7:14   ` Jason Wang
  2016-10-17 15:44 ` [Qemu-devel] [PATCH v4 RESEND 2/3] IOMMU: change iommu_op->translate's is_write to flags, add support to NO_FAIL flag mode Aviv B.D
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 31+ messages in thread
From: Aviv B.D @ 2016-10-17 15:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Alex Williamson, Peter Xu, Jan Kiszka,
	Aviv Ben-David

From: "Aviv Ben-David" <bd.aviv@gmail.com>

This capability requires the guest to invalidate the cache before each map
operation. We can use these invalidations to trap map operations in the
hypervisor.

Signed-off-by: Aviv Ben-David <bd.aviv@gmail.com>
---
 hw/i386/intel_iommu.c          | 5 +++++
 hw/i386/intel_iommu_internal.h | 1 +
 include/hw/i386/intel_iommu.h  | 2 ++
 3 files changed, 8 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 2efd69b..69730cb 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2012,6 +2012,7 @@ static const MemoryRegionOps vtd_mem_ops = {
 
 static Property vtd_properties[] = {
     DEFINE_PROP_UINT32("version", IntelIOMMUState, version, 0),
+    DEFINE_PROP_BOOL("cache-mode", IntelIOMMUState, cache_mode_enabled, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -2385,6 +2386,10 @@ static void vtd_init(IntelIOMMUState *s)
         s->ecap |= VTD_ECAP_IR | VTD_ECAP_EIM | VTD_ECAP_MHMV;
     }
 
+    if (s->cache_mode_enabled) {
+        s->cap |= VTD_CAP_CM;
+    }
+
     vtd_reset_context_cache(s);
     vtd_reset_iotlb(s);
 
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 0829a50..35d9f3a 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -201,6 +201,7 @@
 #define VTD_CAP_MAMV                (VTD_MAMV << 48)
 #define VTD_CAP_PSI                 (1ULL << 39)
 #define VTD_CAP_SLLPS               ((1ULL << 34) | (1ULL << 35))
+#define VTD_CAP_CM                  (1ULL << 7)
 
 /* Supported Adjusted Guest Address Widths */
 #define VTD_CAP_SAGAW_SHIFT         8
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index a42dbd7..7a94f16 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -258,6 +258,8 @@ struct IntelIOMMUState {
     uint8_t womask[DMAR_REG_SIZE];  /* WO (write only - read returns 0) */
     uint32_t version;
 
+    bool cache_mode_enabled;        /* RO - is cap CM enabled? */
+
     dma_addr_t root;                /* Current root table pointer */
     bool root_extended;             /* Type of root table (extended or not) */
     bool dmar_enabled;              /* Set if DMA remapping is enabled */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH v4 RESEND 2/3] IOMMU: change iommu_op->translate's is_write to flags, add support to NO_FAIL flag mode
  2016-10-17 15:44 [Qemu-devel] [PATCH v4 RESEND 0/3] IOMMU: intel_iommu support map and unmap notifications Aviv B.D
  2016-10-17 15:44 ` [Qemu-devel] [PATCH v4 RESEND 1/3] IOMMU: add option to enable VTD_CAP_CM to vIOMMU capability exposed to guest Aviv B.D
@ 2016-10-17 15:44 ` Aviv B.D
  2016-10-18  3:57   ` David Gibson
  2016-10-19  8:35   ` Peter Xu
  2016-10-17 15:44 ` [Qemu-devel] [PATCH v4 RESEND 3/3] IOMMU: enable intel_iommu map and unmap notifiers Aviv B.D
  2016-10-17 16:07 ` [Qemu-devel] [PATCH v4 RESEND 0/3] IOMMU: intel_iommu support map and unmap notifications Alex Williamson
  3 siblings, 2 replies; 31+ messages in thread
From: Aviv B.D @ 2016-10-17 15:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Alex Williamson, Peter Xu, Jan Kiszka,
	Aviv Ben-David

From: "Aviv Ben-David" <bd.aviv@gmail.com>

Support translation attempts without reporting an error to the guest on
translation failure.

Signed-off-by: Aviv Ben-David <bd.aviv@gmail.com>
---
 exec.c                |  3 ++-
 hw/i386/amd_iommu.c   |  4 +--
 hw/i386/intel_iommu.c | 70 +++++++++++++++++++++++++++++++++------------------
 include/exec/memory.h |  6 +++--
 memory.c              |  3 ++-
 5 files changed, 55 insertions(+), 31 deletions(-)

diff --git a/exec.c b/exec.c
index 374c364..266fa01 100644
--- a/exec.c
+++ b/exec.c
@@ -432,7 +432,8 @@ MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr,
             break;
         }
 
-        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
+        iotlb = mr->iommu_ops->translate(mr, addr,
+                                         is_write ? IOMMU_WO : IOMMU_RO);
         addr = ((iotlb.translated_addr & ~iotlb.addr_mask)
                 | (addr & iotlb.addr_mask));
         *plen = MIN(*plen, (addr | iotlb.addr_mask) - addr + 1);
diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 47b79d9..1f0d76b 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -988,7 +988,7 @@ static inline bool amdvi_is_interrupt_addr(hwaddr addr)
 }
 
 static IOMMUTLBEntry amdvi_translate(MemoryRegion *iommu, hwaddr addr,
-                                     bool is_write)
+                                     IOMMUAccessFlags flags)
 {
     AMDVIAddressSpace *as = container_of(iommu, AMDVIAddressSpace, iommu);
     AMDVIState *s = as->iommu_state;
@@ -1017,7 +1017,7 @@ static IOMMUTLBEntry amdvi_translate(MemoryRegion *iommu, hwaddr addr,
         return ret;
     }
 
-    amdvi_do_translate(as, addr, is_write, &ret);
+    amdvi_do_translate(as, addr, flags & IOMMU_WO, &ret);
     trace_amdvi_translation_result(as->bus_num, PCI_SLOT(as->devfn),
             PCI_FUNC(as->devfn), addr, ret.translated_addr);
     return ret;
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 69730cb..dcf45f0 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -364,7 +364,7 @@ static void vtd_set_frcd_and_update_ppf(IntelIOMMUState *s, uint16_t index)
 /* Must not update F field now, should be done later */
 static void vtd_record_frcd(IntelIOMMUState *s, uint16_t index,
                             uint16_t source_id, hwaddr addr,
-                            VTDFaultReason fault, bool is_write)
+                            VTDFaultReason fault, IOMMUAccessFlags flags)
 {
     uint64_t hi = 0, lo;
     hwaddr frcd_reg_addr = DMAR_FRCD_REG_OFFSET + (((uint64_t)index) << 4);
@@ -373,7 +373,7 @@ static void vtd_record_frcd(IntelIOMMUState *s, uint16_t index,
 
     lo = VTD_FRCD_FI(addr);
     hi = VTD_FRCD_SID(source_id) | VTD_FRCD_FR(fault);
-    if (!is_write) {
+    if (!(flags == IOMMU_WO || flags == IOMMU_RW)) {
         hi |= VTD_FRCD_T;
     }
     vtd_set_quad_raw(s, frcd_reg_addr, lo);
@@ -404,7 +404,7 @@ static bool vtd_try_collapse_fault(IntelIOMMUState *s, uint16_t source_id)
 /* Log and report an DMAR (address translation) fault to software */
 static void vtd_report_dmar_fault(IntelIOMMUState *s, uint16_t source_id,
                                   hwaddr addr, VTDFaultReason fault,
-                                  bool is_write)
+                                  IOMMUAccessFlags flags)
 {
     uint32_t fsts_reg = vtd_get_long_raw(s, DMAR_FSTS_REG);
 
@@ -415,7 +415,7 @@ static void vtd_report_dmar_fault(IntelIOMMUState *s, uint16_t source_id,
         return;
     }
     VTD_DPRINTF(FLOG, "sid 0x%"PRIx16 ", fault %d, addr 0x%"PRIx64
-                ", is_write %d", source_id, fault, addr, is_write);
+                ", flags %d", source_id, fault, addr, flags);
     if (fsts_reg & VTD_FSTS_PFO) {
         VTD_DPRINTF(FLOG, "new fault is not recorded due to "
                     "Primary Fault Overflow");
@@ -433,7 +433,7 @@ static void vtd_report_dmar_fault(IntelIOMMUState *s, uint16_t source_id,
         return;
     }
 
-    vtd_record_frcd(s, s->next_frcd_reg, source_id, addr, fault, is_write);
+    vtd_record_frcd(s, s->next_frcd_reg, source_id, addr, fault, flags);
 
     if (fsts_reg & VTD_FSTS_PPF) {
         VTD_DPRINTF(FLOG, "there are pending faults already, "
@@ -629,7 +629,8 @@ static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, uint32_t level)
 /* Given the @gpa, get relevant @slptep. @slpte_level will be the last level
  * of the translation, can be used for deciding the size of large page.
  */
-static int vtd_gpa_to_slpte(VTDContextEntry *ce, uint64_t gpa, bool is_write,
+static int vtd_gpa_to_slpte(VTDContextEntry *ce, uint64_t gpa,
+                            IOMMUAccessFlags flags,
                             uint64_t *slptep, uint32_t *slpte_level,
                             bool *reads, bool *writes)
 {
@@ -649,7 +650,20 @@ static int vtd_gpa_to_slpte(VTDContextEntry *ce, uint64_t gpa, bool is_write,
     }
 
     /* FIXME: what is the Atomics request here? */
-    access_right_check = is_write ? VTD_SL_W : VTD_SL_R;
+    switch (flags) {
+    case IOMMU_WO:
+        access_right_check = VTD_SL_W;
+        break;
+    case IOMMU_RO:
+        access_right_check = VTD_SL_R;
+        break;
+    case IOMMU_RW: /* fall through */
+    case IOMMU_NO_FAIL:
+        access_right_check = VTD_SL_R | VTD_SL_W;
+        break;
+    default:
+        assert(0);
+    }
 
     while (true) {
         offset = vtd_gpa_level_offset(gpa, level);
@@ -671,8 +685,9 @@ static int vtd_gpa_to_slpte(VTDContextEntry *ce, uint64_t gpa, bool is_write,
         if (!(slpte & access_right_check)) {
             VTD_DPRINTF(GENERAL, "error: lack of %s permission for "
                         "gpa 0x%"PRIx64 " slpte 0x%"PRIx64,
-                        (is_write ? "write" : "read"), gpa, slpte);
-            return is_write ? -VTD_FR_WRITE : -VTD_FR_READ;
+                        (flags == IOMMU_WO ? "write" : "read"), gpa, slpte);
+            return (flags == IOMMU_RW || flags == IOMMU_WO) ?
+                   -VTD_FR_WRITE : -VTD_FR_READ;
         }
         if (vtd_slpte_nonzero_rsvd(slpte, level)) {
             VTD_DPRINTF(GENERAL, "error: non-zero reserved field in second "
@@ -789,11 +804,12 @@ static inline bool vtd_is_interrupt_addr(hwaddr addr)
  *
  * @bus_num: The bus number
  * @devfn: The devfn, which is the  combined of device and function number
- * @is_write: The access is a write operation
+ * @flags: The requested access flags for the translation
  * @entry: IOMMUTLBEntry that contain the addr to be translated and result
  */
 static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
-                                   uint8_t devfn, hwaddr addr, bool is_write,
+                                   uint8_t devfn, hwaddr addr,
+                                   IOMMUAccessFlags flags,
                                    IOMMUTLBEntry *entry)
 {
     IntelIOMMUState *s = vtd_as->iommu_state;
@@ -811,7 +827,7 @@ static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
 
     /* Check if the request is in interrupt address range */
     if (vtd_is_interrupt_addr(addr)) {
-        if (is_write) {
+        if (flags == IOMMU_WO || flags == IOMMU_RW) {
             /* FIXME: since we don't know the length of the access here, we
              * treat Non-DWORD length write requests without PASID as
              * interrupt requests, too. Withoud interrupt remapping support,
@@ -827,7 +843,7 @@ static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
         } else {
             VTD_DPRINTF(GENERAL, "error: read request from interrupt address "
                         "gpa 0x%"PRIx64, addr);
-            vtd_report_dmar_fault(s, source_id, addr, VTD_FR_READ, is_write);
+            vtd_report_dmar_fault(s, source_id, addr, VTD_FR_READ, flags);
             return;
         }
     }
@@ -856,12 +872,14 @@ static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
         is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
         if (ret_fr) {
             ret_fr = -ret_fr;
-            if (is_fpd_set && vtd_is_qualified_fault(ret_fr)) {
-                VTD_DPRINTF(FLOG, "fault processing is disabled for DMA "
+            if (flags != IOMMU_NO_FAIL) {
+                if (is_fpd_set && vtd_is_qualified_fault(ret_fr)) {
+                    VTD_DPRINTF(FLOG, "fault processing is disabled for DMA "
                             "requests through this context-entry "
                             "(with FPD Set)");
-            } else {
-                vtd_report_dmar_fault(s, source_id, addr, ret_fr, is_write);
+                } else {
+                    vtd_report_dmar_fault(s, source_id, addr, ret_fr, flags);
+                }
             }
             return;
         }
@@ -874,15 +892,17 @@ static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
         cc_entry->context_cache_gen = s->context_cache_gen;
     }
 
-    ret_fr = vtd_gpa_to_slpte(&ce, addr, is_write, &slpte, &level,
+    ret_fr = vtd_gpa_to_slpte(&ce, addr, flags, &slpte, &level,
                               &reads, &writes);
     if (ret_fr) {
         ret_fr = -ret_fr;
-        if (is_fpd_set && vtd_is_qualified_fault(ret_fr)) {
-            VTD_DPRINTF(FLOG, "fault processing is disabled for DMA requests "
-                        "through this context-entry (with FPD Set)");
-        } else {
-            vtd_report_dmar_fault(s, source_id, addr, ret_fr, is_write);
+        if (flags != IOMMU_NO_FAIL) {
+            if (is_fpd_set && vtd_is_qualified_fault(ret_fr)) {
+                VTD_DPRINTF(FLOG, "fault processing is disabled for DMA "
+                        "requests through this context-entry (with FPD Set)");
+            } else {
+                vtd_report_dmar_fault(s, source_id, addr, ret_fr, flags);
+            }
         }
         return;
     }
@@ -1944,7 +1964,7 @@ static void vtd_mem_write(void *opaque, hwaddr addr,
 }
 
 static IOMMUTLBEntry vtd_iommu_translate(MemoryRegion *iommu, hwaddr addr,
-                                         bool is_write)
+                                         IOMMUAccessFlags flags)
 {
     VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
     IntelIOMMUState *s = vtd_as->iommu_state;
@@ -1966,7 +1986,7 @@ static IOMMUTLBEntry vtd_iommu_translate(MemoryRegion *iommu, hwaddr addr,
     }
 
     vtd_do_iommu_translate(vtd_as, vtd_as->bus, vtd_as->devfn, addr,
-                           is_write, &ret);
+                           flags, &ret);
     VTD_DPRINTF(MMU,
                 "bus %"PRIu8 " slot %"PRIu8 " func %"PRIu8 " devfn %"PRIu8
                 " gpa 0x%"PRIx64 " hpa 0x%"PRIx64, pci_bus_num(vtd_as->bus),
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 10d7eac..0d4acb9 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -57,6 +57,7 @@ typedef enum {
     IOMMU_RO   = 1,
     IOMMU_WO   = 2,
     IOMMU_RW   = 3,
+    IOMMU_NO_FAIL  = 4, /* may not be present, don't report errors to guest */
 } IOMMUAccessFlags;
 
 struct IOMMUTLBEntry {
@@ -168,10 +169,11 @@ struct MemoryRegionOps {
 };
 
 typedef struct MemoryRegionIOMMUOps MemoryRegionIOMMUOps;
-
 struct MemoryRegionIOMMUOps {
     /* Return a TLB entry that contains a given address. */
-    IOMMUTLBEntry (*translate)(MemoryRegion *iommu, hwaddr addr, bool is_write);
+    IOMMUTLBEntry (*translate)(MemoryRegion *iommu,
+                               hwaddr addr,
+                               IOMMUAccessFlags flags);
     /* Returns minimum supported page size */
     uint64_t (*get_min_page_size)(MemoryRegion *iommu);
     /* Called when IOMMU Notifier flag changed */
diff --git a/memory.c b/memory.c
index 58f9269..dfbb9a0 100644
--- a/memory.c
+++ b/memory.c
@@ -1563,7 +1563,8 @@ void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
     granularity = memory_region_iommu_get_min_page_size(mr);
 
     for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
-        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
+        iotlb = mr->iommu_ops->translate(mr, addr,
+                                         is_write ? IOMMU_WO : IOMMU_RO);
         if (iotlb.perm != IOMMU_NONE) {
             n->notify(n, &iotlb);
         }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH v4 RESEND 3/3] IOMMU: enable intel_iommu map and unmap notifiers
  2016-10-17 15:44 [Qemu-devel] [PATCH v4 RESEND 0/3] IOMMU: intel_iommu support map and unmap notifications Aviv B.D
  2016-10-17 15:44 ` [Qemu-devel] [PATCH v4 RESEND 1/3] IOMMU: add option to enable VTD_CAP_CM to vIOMMU capability exposed to guest Aviv B.D
  2016-10-17 15:44 ` [Qemu-devel] [PATCH v4 RESEND 2/3] IOMMU: change iommu_op->translate's is_write to flags, add support to NO_FAIL flag mode Aviv B.D
@ 2016-10-17 15:44 ` Aviv B.D
  2016-10-18  4:04   ` David Gibson
                     ` (2 more replies)
  2016-10-17 16:07 ` [Qemu-devel] [PATCH v4 RESEND 0/3] IOMMU: intel_iommu support map and unmap notifications Alex Williamson
  3 siblings, 3 replies; 31+ messages in thread
From: Aviv B.D @ 2016-10-17 15:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Alex Williamson, Peter Xu, Jan Kiszka,
	Aviv Ben-David

From: "Aviv Ben-David" <bd.aviv@gmail.com>

Add a list of registered vtd_as's to the Intel IOMMU state, to avoid iterating
over every PCI device when searching for the corresponding domain.

Signed-off-by: Aviv Ben-David <bd.aviv@gmail.com>
---
 hw/i386/intel_iommu.c          | 102 ++++++++++++++++++++++++++++++++++++++---
 hw/i386/intel_iommu_internal.h |   2 +
 include/hw/i386/intel_iommu.h  |   9 ++++
 3 files changed, 106 insertions(+), 7 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index dcf45f0..34fc1e8 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -51,6 +51,9 @@ static int vtd_dbgflags = VTD_DBGBIT(GENERAL) | VTD_DBGBIT(CSR);
 #define VTD_DPRINTF(what, fmt, ...) do {} while (0)
 #endif
 
+static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
+                                    uint8_t devfn, VTDContextEntry *ce);
+
 static void vtd_define_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val,
                             uint64_t wmask, uint64_t w1cmask)
 {
@@ -142,6 +145,23 @@ static uint64_t vtd_set_clear_mask_quad(IntelIOMMUState *s, hwaddr addr,
     return new_val;
 }
 
+static int vtd_get_did_dev(IntelIOMMUState *s, uint8_t bus_num, uint8_t devfn,
+                           uint16_t *domain_id)
+{
+    VTDContextEntry ce;
+    int ret_fr;
+
+    assert(domain_id);
+
+    ret_fr = vtd_dev_to_context_entry(s, bus_num, devfn, &ce);
+    if (ret_fr) {
+        return -1;
+    }
+
+    *domain_id =  VTD_CONTEXT_ENTRY_DID(ce.hi);
+    return 0;
+}
+
 /* GHashTable functions */
 static gboolean vtd_uint64_equal(gconstpointer v1, gconstpointer v2)
 {
@@ -683,9 +703,6 @@ static int vtd_gpa_to_slpte(VTDContextEntry *ce, uint64_t gpa,
         *reads = (*reads) && (slpte & VTD_SL_R);
         *writes = (*writes) && (slpte & VTD_SL_W);
         if (!(slpte & access_right_check)) {
-            VTD_DPRINTF(GENERAL, "error: lack of %s permission for "
-                        "gpa 0x%"PRIx64 " slpte 0x%"PRIx64,
-                        (flags == IOMMU_WO ? "write" : "read"), gpa, slpte);
             return (flags == IOMMU_RW || flags == IOMMU_WO) ?
                    -VTD_FR_WRITE : -VTD_FR_READ;
         }
@@ -734,9 +751,6 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
     }
 
     if (!vtd_context_entry_present(ce)) {
-        VTD_DPRINTF(GENERAL,
-                    "error: context-entry #%"PRIu8 "(bus #%"PRIu8 ") "
-                    "is not present", devfn, bus_num);
         return -VTD_FR_CONTEXT_ENTRY_P;
     } else if ((ce->hi & VTD_CONTEXT_ENTRY_RSVD_HI) ||
                (ce->lo & VTD_CONTEXT_ENTRY_RSVD_LO)) {
@@ -1065,6 +1079,55 @@ static void vtd_iotlb_domain_invalidate(IntelIOMMUState *s, uint16_t domain_id)
                                 &domain_id);
 }
 
+static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
+                                           uint16_t domain_id, hwaddr addr,
+                                           uint8_t am)
+{
+    IntelIOMMUNotifierNode *node;
+
+    QLIST_FOREACH(node, &(s->notifiers_list), next) {
+        VTDAddressSpace *vtd_as = node->vtd_as;
+        uint16_t vfio_domain_id;
+        int ret = vtd_get_did_dev(s, pci_bus_num(vtd_as->bus), vtd_as->devfn,
+                                  &vfio_domain_id);
+        if (!ret && domain_id == vfio_domain_id) {
+            IOMMUTLBEntry entry;
+
+            /* notify unmap */
+            if (node->notifier_flag & IOMMU_NOTIFIER_UNMAP) {
+                VTD_DPRINTF(GENERAL, "Remove addr 0x%"PRIx64 " mask %d",
+                            addr, am);
+                entry.target_as = &address_space_memory;
+                entry.iova = addr & VTD_PAGE_MASK_4K;
+                entry.translated_addr = 0;
+                entry.addr_mask = ~VTD_PAGE_MASK(VTD_PAGE_SHIFT_4K + am);
+                entry.perm = IOMMU_NONE;
+                memory_region_notify_iommu(&node->vtd_as->iommu, entry);
+            }
+
+            /* notify map */
+            if (node->notifier_flag & IOMMU_NOTIFIER_MAP) {
+                hwaddr original_addr = addr;
+                VTD_DPRINTF(GENERAL, "add addr 0x%"PRIx64 " mask %d", addr, am);
+                while (addr < original_addr + (1 << am) * VTD_PAGE_SIZE) {
+                    /* call to vtd_iommu_translate */
+                    IOMMUTLBEntry entry = s->iommu_ops.translate(
+                                                         &node->vtd_as->iommu,
+                                                         addr,
+                                                         IOMMU_NO_FAIL);
+                    if (entry.perm != IOMMU_NONE) {
+                        addr += entry.addr_mask + 1;
+                        memory_region_notify_iommu(&node->vtd_as->iommu, entry);
+                    } else {
+                        addr += VTD_PAGE_SIZE;
+                    }
+                }
+            }
+        }
+    }
+}
+
+
 static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
                                       hwaddr addr, uint8_t am)
 {
@@ -1075,6 +1138,8 @@ static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
     info.addr = addr;
     info.mask = ~((1 << am) - 1);
     g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page, &info);
+
+    vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am);
 }
 
 /* Flush IOTLB
@@ -2000,8 +2065,10 @@ static void vtd_iommu_notify_flag_changed(MemoryRegion *iommu,
                                           IOMMUNotifierFlag new)
 {
     VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
+    IntelIOMMUState *s = vtd_as->iommu_state;
+    IntelIOMMUNotifierNode *node = NULL;
 
-    if (new & IOMMU_NOTIFIER_MAP) {
+    if (!s->cache_mode_enabled && new & IOMMU_NOTIFIER_MAP) {
         error_report("Device at bus %s addr %02x.%d requires iommu "
                      "notifier which is currently not supported by "
                      "intel-iommu emulation",
@@ -2009,6 +2076,27 @@ static void vtd_iommu_notify_flag_changed(MemoryRegion *iommu,
                      PCI_FUNC(vtd_as->devfn));
         exit(1);
     }
+
+    /* Add a new node if no mapping existed before this call */
+    if (old == IOMMU_NOTIFIER_NONE) {
+        node = g_malloc0(sizeof(*node));
+        node->vtd_as = vtd_as;
+        node->notifier_flag = new;
+        QLIST_INSERT_HEAD(&s->notifiers_list, node, next);
+        return;
+    }
+
+    /* update notifier node with new flags */
+    QLIST_FOREACH(node, &s->notifiers_list, next) {
+        if (node->vtd_as == vtd_as) {
+            if (new == IOMMU_NOTIFIER_NONE) {
+                QLIST_REMOVE(node, next);
+            } else {
+                node->notifier_flag = new;
+            }
+            return;
+        }
+    }
 }
 
 static const VMStateDescription vtd_vmstate = {
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 35d9f3a..96df4ae 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -381,6 +381,8 @@ typedef struct VTDIOTLBPageInvInfo VTDIOTLBPageInvInfo;
 #define VTD_PAGE_SHIFT_1G           30
 #define VTD_PAGE_MASK_1G            (~((1ULL << VTD_PAGE_SHIFT_1G) - 1))
 
+#define VTD_PAGE_MASK(shift)        (~((1ULL << (shift)) - 1))
+
 struct VTDRootEntry {
     uint64_t val;
     uint64_t rsvd;
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 7a94f16..c160706 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -63,6 +63,7 @@ typedef union VTD_IR_TableEntry VTD_IR_TableEntry;
 typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress;
 typedef struct VTDIrq VTDIrq;
 typedef struct VTD_MSIMessage VTD_MSIMessage;
+typedef struct IntelIOMMUNotifierNode IntelIOMMUNotifierNode;
 
 /* Context-Entry */
 struct VTDContextEntry {
@@ -248,6 +249,12 @@ struct VTD_MSIMessage {
 /* When IR is enabled, all MSI/MSI-X data bits should be zero */
 #define VTD_IR_MSI_DATA          (0)
 
+struct IntelIOMMUNotifierNode {
+    VTDAddressSpace *vtd_as;
+    IOMMUNotifierFlag notifier_flag;
+    QLIST_ENTRY(IntelIOMMUNotifierNode) next;
+};
+
 /* The iommu (DMAR) device state struct */
 struct IntelIOMMUState {
     X86IOMMUState x86_iommu;
@@ -285,6 +292,8 @@ struct IntelIOMMUState {
     MemoryRegionIOMMUOps iommu_ops;
     GHashTable *vtd_as_by_busptr;   /* VTDBus objects indexed by PCIBus* reference */
     VTDBus *vtd_as_by_bus_num[VTD_PCI_BUS_MAX]; /* VTDBus objects indexed by bus number */
+    /* list of registered notifiers */
+    QLIST_HEAD(, IntelIOMMUNotifierNode) notifiers_list;
 
     /* interrupt remapping */
     bool intr_enabled;              /* Whether guest enabled IR */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH v4 RESEND 0/3] IOMMU: intel_iommu support map and unmap notifications
  2016-10-17 15:44 [Qemu-devel] [PATCH v4 RESEND 0/3] IOMMU: intel_iommu support map and unmap notifications Aviv B.D
                   ` (2 preceding siblings ...)
  2016-10-17 15:44 ` [Qemu-devel] [PATCH v4 RESEND 3/3] IOMMU: enable intel_iommu map and unmap notifiers Aviv B.D
@ 2016-10-17 16:07 ` Alex Williamson
  2016-10-18  4:06   ` David Gibson
  2016-10-20 19:17   ` Aviv B.D.
  3 siblings, 2 replies; 31+ messages in thread
From: Alex Williamson @ 2016-10-17 16:07 UTC (permalink / raw)
  To: Aviv B.D; +Cc: qemu-devel, Michael S. Tsirkin, Peter Xu, Jan Kiszka

On Mon, 17 Oct 2016 18:44:21 +0300
"Aviv B.D" <bd.aviv@gmail.com> wrote:

> From: "Aviv Ben-David" <bd.aviv@gmail.com>
> 
> * Advertise the Cache Mode capability in the IOMMU capability register.
>   This capability is controlled by the "cache-mode" property of the
>   intel-iommu device. To enable it, invoke QEMU with
>   "-device intel-iommu,cache-mode=true".
> 
> * On page cache invalidation in the Intel vIOMMU, check whether the domain
>   belongs to a registered notifier, and notify accordingly.
> 
> Currently this patch series still does not enable VFIO device support with a
> vIOMMU present. Current problems:
> * vfio_iommu_map_notify is not aware of the memory range belonging to a
>   specific VFIOGuestIOMMU.

Could you elaborate on why this is an issue?

> * memory_region_iommu_replay hangs QEMU on startup while it iterates over the
>   64-bit address space. Commenting out the call to this function yields a
>   workable VFIO device while the vIOMMU is present.

This has been discussed previously; it would be incorrect for vfio not
to call the replay function.  The solution is to add an iommu driver
callback to efficiently walk the mappings within a MemoryRegion.
Thanks,

Alex

> Changes from v1 to v2:
> * Remove the assumption that the cache is never cleared.
> * Fix a lockup under high load.
> 
> Changes from v2 to v3:
> * Remove debug leftovers.
> * Split into separate commits.
> * Change is_write to flags in vtd_do_iommu_translate, add IOMMU_NO_FAIL
>   to suppress error propagation to the guest.
> 
> Changes from v3 to v4:
> * Add a property to the intel_iommu device to control the CM capability,
>   defaulting to false.
> * Use s->iommu_ops.notify_flag_changed to register notifiers.
> 
> Changes from v4 to v4 RESEND:
> * Fix coding style issues pointed out by the checkpatch.pl script.
> 
> Aviv Ben-David (3):
>   IOMMU: add option to enable VTD_CAP_CM to vIOMMU capability exposed to
>     guest
>   IOMMU: change iommu_op->translate's is_write to flags, add support to
>     NO_FAIL flag mode
>   IOMMU: enable intel_iommu map and unmap notifiers
> 
>  exec.c                         |   3 +-
>  hw/i386/amd_iommu.c            |   4 +-
>  hw/i386/intel_iommu.c          | 175 +++++++++++++++++++++++++++++++++--------
>  hw/i386/intel_iommu_internal.h |   3 +
>  include/exec/memory.h          |   6 +-
>  include/hw/i386/intel_iommu.h  |  11 +++
>  memory.c                       |   3 +-
>  7 files changed, 168 insertions(+), 37 deletions(-)
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH v4 RESEND 2/3] IOMMU: change iommu_op->translate's is_write to flags, add support to NO_FAIL flag mode
  2016-10-17 15:44 ` [Qemu-devel] [PATCH v4 RESEND 2/3] IOMMU: change iommu_op->translate's is_write to flags, add support to NO_FAIL flag mode Aviv B.D
@ 2016-10-18  3:57   ` David Gibson
  2016-10-19  8:35   ` Peter Xu
  1 sibling, 0 replies; 31+ messages in thread
From: David Gibson @ 2016-10-18  3:57 UTC (permalink / raw)
  To: Aviv B.D
  Cc: qemu-devel, Jan Kiszka, Alex Williamson, Peter Xu, Michael S. Tsirkin


On Mon, Oct 17, 2016 at 06:44:23PM +0300, Aviv B.D wrote:
> From: "Aviv Ben-David" <bd.aviv@gmail.com>
> 
> Support translation attempts without reporting an error to the guest on
> translation failure.
> 
> Signed-off-by: Aviv Ben-David <bd.aviv@gmail.com>
> ---
>  exec.c                |  3 ++-
>  hw/i386/amd_iommu.c   |  4 +--
>  hw/i386/intel_iommu.c | 70 +++++++++++++++++++++++++++++++++------------------
>  include/exec/memory.h |  6 +++--
>  memory.c              |  3 ++-
>  5 files changed, 55 insertions(+), 31 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 374c364..266fa01 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -432,7 +432,8 @@ MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr,
>              break;
>          }
>  
> -        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
> +        iotlb = mr->iommu_ops->translate(mr, addr,
> +                                         is_write ? IOMMU_WO : IOMMU_RO);
>          addr = ((iotlb.translated_addr & ~iotlb.addr_mask)
>                  | (addr & iotlb.addr_mask));
>          *plen = MIN(*plen, (addr | iotlb.addr_mask) - addr + 1);
> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
> index 47b79d9..1f0d76b 100644
> --- a/hw/i386/amd_iommu.c
> +++ b/hw/i386/amd_iommu.c
> @@ -988,7 +988,7 @@ static inline bool amdvi_is_interrupt_addr(hwaddr addr)
>  }
>  
>  static IOMMUTLBEntry amdvi_translate(MemoryRegion *iommu, hwaddr addr,
> -                                     bool is_write)
> +                                     IOMMUAccessFlags flags)

You've also updated the intel viommu implementation for the new
notifier flags semantics, but none of the others (AMD and sPAPR).
That needs to be done as well.

>  {
>      AMDVIAddressSpace *as = container_of(iommu, AMDVIAddressSpace, iommu);
>      AMDVIState *s = as->iommu_state;
> @@ -1017,7 +1017,7 @@ static IOMMUTLBEntry amdvi_translate(MemoryRegion *iommu, hwaddr addr,
>          return ret;
>      }
>  
> -    amdvi_do_translate(as, addr, is_write, &ret);
> +    amdvi_do_translate(as, addr, flags & IOMMU_WO, &ret);
>      trace_amdvi_translation_result(as->bus_num, PCI_SLOT(as->devfn),
>              PCI_FUNC(as->devfn), addr, ret.translated_addr);
>      return ret;
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 69730cb..dcf45f0 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -364,7 +364,7 @@ static void vtd_set_frcd_and_update_ppf(IntelIOMMUState *s, uint16_t index)
>  /* Must not update F field now, should be done later */
>  static void vtd_record_frcd(IntelIOMMUState *s, uint16_t index,
>                              uint16_t source_id, hwaddr addr,
> -                            VTDFaultReason fault, bool is_write)
> +                            VTDFaultReason fault, IOMMUAccessFlags flags)
>  {
>      uint64_t hi = 0, lo;
>      hwaddr frcd_reg_addr = DMAR_FRCD_REG_OFFSET + (((uint64_t)index) << 4);
> @@ -373,7 +373,7 @@ static void vtd_record_frcd(IntelIOMMUState *s, uint16_t index,
>  
>      lo = VTD_FRCD_FI(addr);
>      hi = VTD_FRCD_SID(source_id) | VTD_FRCD_FR(fault);
> -    if (!is_write) {
> +    if (!(flags == IOMMU_WO || flags == IOMMU_RW)) {
>          hi |= VTD_FRCD_T;
>      }
>      vtd_set_quad_raw(s, frcd_reg_addr, lo);
> @@ -404,7 +404,7 @@ static bool vtd_try_collapse_fault(IntelIOMMUState *s, uint16_t source_id)
>  /* Log and report an DMAR (address translation) fault to software */
>  static void vtd_report_dmar_fault(IntelIOMMUState *s, uint16_t source_id,
>                                    hwaddr addr, VTDFaultReason fault,
> -                                  bool is_write)
> +                                  IOMMUAccessFlags flags)
>  {
>      uint32_t fsts_reg = vtd_get_long_raw(s, DMAR_FSTS_REG);
>  
> @@ -415,7 +415,7 @@ static void vtd_report_dmar_fault(IntelIOMMUState *s, uint16_t source_id,
>          return;
>      }
>      VTD_DPRINTF(FLOG, "sid 0x%"PRIx16 ", fault %d, addr 0x%"PRIx64
> -                ", is_write %d", source_id, fault, addr, is_write);
> +                ", flags %d", source_id, fault, addr, flags);
>      if (fsts_reg & VTD_FSTS_PFO) {
>          VTD_DPRINTF(FLOG, "new fault is not recorded due to "
>                      "Primary Fault Overflow");
> @@ -433,7 +433,7 @@ static void vtd_report_dmar_fault(IntelIOMMUState *s, uint16_t source_id,
>          return;
>      }
>  
> -    vtd_record_frcd(s, s->next_frcd_reg, source_id, addr, fault, is_write);
> +    vtd_record_frcd(s, s->next_frcd_reg, source_id, addr, fault, flags);
>  
>      if (fsts_reg & VTD_FSTS_PPF) {
>          VTD_DPRINTF(FLOG, "there are pending faults already, "
> @@ -629,7 +629,8 @@ static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, uint32_t level)
>  /* Given the @gpa, get relevant @slptep. @slpte_level will be the last level
>   * of the translation, can be used for deciding the size of large page.
>   */
> -static int vtd_gpa_to_slpte(VTDContextEntry *ce, uint64_t gpa, bool is_write,
> +static int vtd_gpa_to_slpte(VTDContextEntry *ce, uint64_t gpa,
> +                            IOMMUAccessFlags flags,
>                              uint64_t *slptep, uint32_t *slpte_level,
>                              bool *reads, bool *writes)
>  {
> @@ -649,7 +650,20 @@ static int vtd_gpa_to_slpte(VTDContextEntry *ce, uint64_t gpa, bool is_write,
>      }
>  
>      /* FIXME: what is the Atomics request here? */
> -    access_right_check = is_write ? VTD_SL_W : VTD_SL_R;
> +    switch (flags) {
> +    case IOMMU_WO:
> +        access_right_check = VTD_SL_W;
> +        break;
> +    case IOMMU_RO:
> +        access_right_check = VTD_SL_R;
> +        break;
> +    case IOMMU_RW: /* fall through */
> +    case IOMMU_NO_FAIL:
> +        access_right_check = VTD_SL_R | VTD_SL_W;
> +        break;

I'm very confused by the semantics of IOMMU_NO_FAIL.  At first I
thought it was an additional flag that would be ORed with the access
mode to trigger extra behaviour.  But here it seems that you'd use it
_instead_ of a normal access mode.  I don't really see how that makes
sense.
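For what it's worth, the OR-able layout the reviewer expected would keep the access mode and the no-fail behaviour orthogonal. A standalone sketch with invented names (this is what was expected, not what the patch implements):

```c
#include <assert.h>

/* Sketch of OR-able flags: the access mode lives in the low bits and
 * NO_FAIL is a modifier bit, so a caller can write
 * XIOMMU_RW | XIOMMU_NO_FAIL.  Names are invented for illustration;
 * the patch under review instead treats NO_FAIL as a mode of its own. */
enum {
    XIOMMU_NONE    = 0,
    XIOMMU_RO      = 1 << 0,
    XIOMMU_WO      = 1 << 1,
    XIOMMU_RW      = XIOMMU_RO | XIOMMU_WO,
    XIOMMU_NO_FAIL = 1 << 2,  /* modifier: never fault the guest */
};

/* Access-mode tests ignore the modifier bit entirely. */
static int xiommu_is_write(int flags)
{
    return !!(flags & XIOMMU_WO);
}

/* Fault reporting is gated only on the modifier bit. */
static int xiommu_may_fault(int flags)
{
    return !(flags & XIOMMU_NO_FAIL);
}
```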

> +    default:
> +        assert(0);
> +    }
>  
>      while (true) {
>          offset = vtd_gpa_level_offset(gpa, level);
> @@ -671,8 +685,9 @@ static int vtd_gpa_to_slpte(VTDContextEntry *ce, uint64_t gpa, bool is_write,
>          if (!(slpte & access_right_check)) {
>              VTD_DPRINTF(GENERAL, "error: lack of %s permission for "
>                          "gpa 0x%"PRIx64 " slpte 0x%"PRIx64,
> -                        (is_write ? "write" : "read"), gpa, slpte);
> -            return is_write ? -VTD_FR_WRITE : -VTD_FR_READ;
> +                        (flags == IOMMU_WO ? "write" : "read"), gpa, slpte);
> +            return (flags == IOMMU_RW || flags == IOMMU_WO) ?
> +                   -VTD_FR_WRITE : -VTD_FR_READ;
>          }
>          if (vtd_slpte_nonzero_rsvd(slpte, level)) {
>              VTD_DPRINTF(GENERAL, "error: non-zero reserved field in second "
> @@ -789,11 +804,12 @@ static inline bool vtd_is_interrupt_addr(hwaddr addr)
>   *
>   * @bus_num: The bus number
>   * @devfn: The devfn, which is the  combined of device and function number
> - * @is_write: The access is a write operation
> + * @flags: The requested access permission flags
>   * @entry: IOMMUTLBEntry that contain the addr to be translated and result
>   */
>  static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
> -                                   uint8_t devfn, hwaddr addr, bool is_write,
> +                                   uint8_t devfn, hwaddr addr,
> +                                   IOMMUAccessFlags flags,
>                                     IOMMUTLBEntry *entry)
>  {
>      IntelIOMMUState *s = vtd_as->iommu_state;
> @@ -811,7 +827,7 @@ static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
>  
>      /* Check if the request is in interrupt address range */
>      if (vtd_is_interrupt_addr(addr)) {
> -        if (is_write) {
> +        if (flags == IOMMU_WO || flags == IOMMU_RW) {
>              /* FIXME: since we don't know the length of the access here, we
>               * treat Non-DWORD length write requests without PASID as
>               * interrupt requests, too. Withoud interrupt remapping support,
> @@ -827,7 +843,7 @@ static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
>          } else {
>              VTD_DPRINTF(GENERAL, "error: read request from interrupt address "
>                          "gpa 0x%"PRIx64, addr);
> -            vtd_report_dmar_fault(s, source_id, addr, VTD_FR_READ, is_write);
> +            vtd_report_dmar_fault(s, source_id, addr, VTD_FR_READ, flags);
>              return;
>          }
>      }
> @@ -856,12 +872,14 @@ static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
>          is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
>          if (ret_fr) {
>              ret_fr = -ret_fr;
> -            if (is_fpd_set && vtd_is_qualified_fault(ret_fr)) {
> -                VTD_DPRINTF(FLOG, "fault processing is disabled for DMA "
> +            if (flags != IOMMU_NO_FAIL) {
> +                if (is_fpd_set && vtd_is_qualified_fault(ret_fr)) {
> +                    VTD_DPRINTF(FLOG, "fault processing is disabled for DMA "
>                              "requests through this context-entry "
>                              "(with FPD Set)");

Um.. the descriptions seem to imply that NO_FAIL prevents reporting
failures to the guest.  But here it seems to be the opposite - you
only transmit the fault if the flags are != IOMMU_NO_FAIL.

> -            } else {
> -                vtd_report_dmar_fault(s, source_id, addr, ret_fr, is_write);
> +                } else {
> +                    vtd_report_dmar_fault(s, source_id, addr, ret_fr, flags);
> +                }
>              }
>              return;
>          }
> @@ -874,15 +892,17 @@ static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
>          cc_entry->context_cache_gen = s->context_cache_gen;
>      }
>  
> -    ret_fr = vtd_gpa_to_slpte(&ce, addr, is_write, &slpte, &level,
> +    ret_fr = vtd_gpa_to_slpte(&ce, addr, flags, &slpte, &level,
>                                &reads, &writes);
>      if (ret_fr) {
>          ret_fr = -ret_fr;
> -        if (is_fpd_set && vtd_is_qualified_fault(ret_fr)) {
> -            VTD_DPRINTF(FLOG, "fault processing is disabled for DMA requests "
> -                        "through this context-entry (with FPD Set)");
> -        } else {
> -            vtd_report_dmar_fault(s, source_id, addr, ret_fr, is_write);
> +        if (flags != IOMMU_NO_FAIL) {
> +            if (is_fpd_set && vtd_is_qualified_fault(ret_fr)) {
> +                VTD_DPRINTF(FLOG, "fault processing is disabled for DMA "
> +                        "requests through this context-entry (with FPD Set)");
> +            } else {
> +                vtd_report_dmar_fault(s, source_id, addr, ret_fr, flags);
> +            }
>          }
>          return;
>      }
> @@ -1944,7 +1964,7 @@ static void vtd_mem_write(void *opaque, hwaddr addr,
>  }
>  
>  static IOMMUTLBEntry vtd_iommu_translate(MemoryRegion *iommu, hwaddr addr,
> -                                         bool is_write)
> +                                         IOMMUAccessFlags flags)
>  {
>      VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
>      IntelIOMMUState *s = vtd_as->iommu_state;
> @@ -1966,7 +1986,7 @@ static IOMMUTLBEntry vtd_iommu_translate(MemoryRegion *iommu, hwaddr addr,
>      }
>  
>      vtd_do_iommu_translate(vtd_as, vtd_as->bus, vtd_as->devfn, addr,
> -                           is_write, &ret);
> +                           flags, &ret);
>      VTD_DPRINTF(MMU,
>                  "bus %"PRIu8 " slot %"PRIu8 " func %"PRIu8 " devfn %"PRIu8
>                  " gpa 0x%"PRIx64 " hpa 0x%"PRIx64, pci_bus_num(vtd_as->bus),
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 10d7eac..0d4acb9 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -57,6 +57,7 @@ typedef enum {
>      IOMMU_RO   = 1,
>      IOMMU_WO   = 2,
>      IOMMU_RW   = 3,
> +    IOMMU_NO_FAIL  = 4, /* may not be present, don't report error to guest */
>  } IOMMUAccessFlags;
>  
>  struct IOMMUTLBEntry {
> @@ -168,10 +169,11 @@ struct MemoryRegionOps {
>  };
>  
>  typedef struct MemoryRegionIOMMUOps MemoryRegionIOMMUOps;
> -
>  struct MemoryRegionIOMMUOps {
>      /* Return a TLB entry that contains a given address. */
> -    IOMMUTLBEntry (*translate)(MemoryRegion *iommu, hwaddr addr, bool is_write);
> +    IOMMUTLBEntry (*translate)(MemoryRegion *iommu,
> +                               hwaddr addr,
> +                               IOMMUAccessFlags flags);
>      /* Returns minimum supported page size */
>      uint64_t (*get_min_page_size)(MemoryRegion *iommu);
>      /* Called when IOMMU Notifier flag changed */
> diff --git a/memory.c b/memory.c
> index 58f9269..dfbb9a0 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1563,7 +1563,8 @@ void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
>      granularity = memory_region_iommu_get_min_page_size(mr);
>  
>      for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
> -        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
> +        iotlb = mr->iommu_ops->translate(mr, addr,
> +                                         is_write ? IOMMU_WO : IOMMU_RO);

It probably makes more sense to push the IOMMUAccessFlags parameter
into memory_region_iommu_replay().

>          if (iotlb.perm != IOMMU_NONE) {
>              n->notify(n, &iotlb);
>          }

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



* Re: [Qemu-devel] [PATCH v4 RESEND 3/3] IOMMU: enable intel_iommu map and unmap notifiers
  2016-10-17 15:44 ` [Qemu-devel] [PATCH v4 RESEND 3/3] IOMMU: enable intel_iommu map and unmap notifiers Aviv B.D
@ 2016-10-18  4:04   ` David Gibson
  2016-10-19  9:33   ` Peter Xu
  2016-10-20  7:28   ` Peter Xu
  2 siblings, 0 replies; 31+ messages in thread
From: David Gibson @ 2016-10-18  4:04 UTC (permalink / raw)
  To: Aviv B.D
  Cc: qemu-devel, Jan Kiszka, Alex Williamson, Peter Xu, Michael S. Tsirkin


On Mon, Oct 17, 2016 at 06:44:24PM +0300, Aviv B.D wrote:
> From: "Aviv Ben-David" <bd.aviv@gmail.com>
> 
> Add a list of registered vtd_as's to the Intel IOMMU state to avoid
> iterating over each PCI device when searching for the corresponding domain.
> 
> Signed-off-by: Aviv Ben-David <bd.aviv@gmail.com>
> ---
>  hw/i386/intel_iommu.c          | 102 ++++++++++++++++++++++++++++++++++++++---
>  hw/i386/intel_iommu_internal.h |   2 +
>  include/hw/i386/intel_iommu.h  |   9 ++++
>  3 files changed, 106 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index dcf45f0..34fc1e8 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -51,6 +51,9 @@ static int vtd_dbgflags = VTD_DBGBIT(GENERAL) | VTD_DBGBIT(CSR);
>  #define VTD_DPRINTF(what, fmt, ...) do {} while (0)
>  #endif
>  
> +static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
> +                                    uint8_t devfn, VTDContextEntry *ce);
> +
>  static void vtd_define_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val,
>                              uint64_t wmask, uint64_t w1cmask)
>  {
> @@ -142,6 +145,23 @@ static uint64_t vtd_set_clear_mask_quad(IntelIOMMUState *s, hwaddr addr,
>      return new_val;
>  }
>  
> +static int vtd_get_did_dev(IntelIOMMUState *s, uint8_t bus_num, uint8_t devfn,
> +                           uint16_t *domain_id)
> +{
> +    VTDContextEntry ce;
> +    int ret_fr;
> +
> +    assert(domain_id);
> +
> +    ret_fr = vtd_dev_to_context_entry(s, bus_num, devfn, &ce);
> +    if (ret_fr) {
> +        return -1;
> +    }
> +
> +    *domain_id =  VTD_CONTEXT_ENTRY_DID(ce.hi);
> +    return 0;
> +}
> +
>  /* GHashTable functions */
>  static gboolean vtd_uint64_equal(gconstpointer v1, gconstpointer v2)
>  {
> @@ -683,9 +703,6 @@ static int vtd_gpa_to_slpte(VTDContextEntry *ce, uint64_t gpa,
>          *reads = (*reads) && (slpte & VTD_SL_R);
>          *writes = (*writes) && (slpte & VTD_SL_W);
>          if (!(slpte & access_right_check)) {
> -            VTD_DPRINTF(GENERAL, "error: lack of %s permission for "
> -                        "gpa 0x%"PRIx64 " slpte 0x%"PRIx64,
> -                        (flags == IOMMU_WO ? "write" : "read"), gpa, slpte);
>              return (flags == IOMMU_RW || flags == IOMMU_WO) ?
>                     -VTD_FR_WRITE : -VTD_FR_READ;
>          }
> @@ -734,9 +751,6 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
>      }
>  
>      if (!vtd_context_entry_present(ce)) {
> -        VTD_DPRINTF(GENERAL,
> -                    "error: context-entry #%"PRIu8 "(bus #%"PRIu8 ") "
> -                    "is not present", devfn, bus_num);
>          return -VTD_FR_CONTEXT_ENTRY_P;
>      } else if ((ce->hi & VTD_CONTEXT_ENTRY_RSVD_HI) ||
>                 (ce->lo & VTD_CONTEXT_ENTRY_RSVD_LO)) {
> @@ -1065,6 +1079,55 @@ static void vtd_iotlb_domain_invalidate(IntelIOMMUState *s, uint16_t domain_id)
>                                  &domain_id);
>  }
>  
> +static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
> +                                           uint16_t domain_id, hwaddr addr,
> +                                           uint8_t am)
> +{
> +    IntelIOMMUNotifierNode *node;
> +
> +    QLIST_FOREACH(node, &(s->notifiers_list), next) {

It's not really obvious to me why you need this additional list of
IntelIOMMUNotifierNode structures, rather than just the notifier list
already built into each MemoryRegion.
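One plausible reason for the side list (speculating, with invented names): the invalidation event arrives at the IntelIOMMUState, which owns many address spaces, while each MemoryRegion only knows its own notifiers. A standalone model of the lookup such a list enables — without it, the state must scan every address space and filter on both domain and notifier registration:

```c
#include <assert.h>

/* Toy model of the design question, not QEMU code.  The IOMMU state
 * owns several address spaces; a domain-wide invalidation must find
 * the ones that (a) belong to the domain and (b) have a notifier
 * registered.  All names here are invented for illustration. */
#define MAX_AS 8

typedef struct {
    int domain_id;
    int has_notifier;
} ToyAS;

typedef struct {
    ToyAS as[MAX_AS];
    int nr_as;
} ToyState;

/* Scan every address space, filtering on domain and registration;
 * a side list of registered notifiers would shrink this walk. */
static int notify_domain(ToyState *s, int domain_id)
{
    int notified = 0;
    for (int i = 0; i < s->nr_as; i++) {
        if (s->as[i].has_notifier && s->as[i].domain_id == domain_id) {
            notified++;
        }
    }
    return notified;
}
```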

> +        VTDAddressSpace *vtd_as = node->vtd_as;
> +        uint16_t vfio_domain_id;
> +        int ret = vtd_get_did_dev(s, pci_bus_num(vtd_as->bus), vtd_as->devfn,
> +                                  &vfio_domain_id);
> +        if (!ret && domain_id == vfio_domain_id) {
> +            IOMMUTLBEntry entry;
> +
> +            /* notify unmap */
> +            if (node->notifier_flag & IOMMU_NOTIFIER_UNMAP) {
> +                VTD_DPRINTF(GENERAL, "Remove addr 0x%"PRIx64 " mask %d",
> +                            addr, am);
> +                entry.target_as = &address_space_memory;
> +                entry.iova = addr & VTD_PAGE_MASK_4K;
> +                entry.translated_addr = 0;
> +                entry.addr_mask = ~VTD_PAGE_MASK(VTD_PAGE_SHIFT_4K + am);
> +                entry.perm = IOMMU_NONE;
> +                memory_region_notify_iommu(&node->vtd_as->iommu, entry);
> +            }
> +
> +            /* notify map */
> +            if (node->notifier_flag & IOMMU_NOTIFIER_MAP) {
> +                hwaddr original_addr = addr;
> +                VTD_DPRINTF(GENERAL, "add addr 0x%"PRIx64 " mask %d", addr, am);
> +                while (addr < original_addr + (1 << am) * VTD_PAGE_SIZE) {
> +                    /* call to vtd_iommu_translate */
> +                    IOMMUTLBEntry entry = s->iommu_ops.translate(
> +                                                         &node->vtd_as->iommu,
> +                                                         addr,
> +                                                         IOMMU_NO_FAIL);
> +                    if (entry.perm != IOMMU_NONE) {
> +                        addr += entry.addr_mask + 1;
> +                        memory_region_notify_iommu(&node->vtd_as->iommu, entry);
> +                    } else {
> +                        addr += VTD_PAGE_SIZE;
> +                    }
> +                }
> +            }
> +        }
> +    }
> +}
> +
> +
>  static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
>                                        hwaddr addr, uint8_t am)
>  {
> @@ -1075,6 +1138,8 @@ static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
>      info.addr = addr;
>      info.mask = ~((1 << am) - 1);
>      g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page, &info);
> +
> +    vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am);
>  }
>  
>  /* Flush IOTLB
> @@ -2000,8 +2065,10 @@ static void vtd_iommu_notify_flag_changed(MemoryRegion *iommu,
>                                            IOMMUNotifierFlag new)
>  {
>      VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
> +    IntelIOMMUState *s = vtd_as->iommu_state;
> +    IntelIOMMUNotifierNode *node = NULL;
>  
> -    if (new & IOMMU_NOTIFIER_MAP) {
> +    if (!s->cache_mode_enabled && new & IOMMU_NOTIFIER_MAP) {
>          error_report("Device at bus %s addr %02x.%d requires iommu "
>                       "notifier which is currently not supported by "
>                       "intel-iommu emulation",
> @@ -2009,6 +2076,27 @@ static void vtd_iommu_notify_flag_changed(MemoryRegion *iommu,
>                       PCI_FUNC(vtd_as->devfn));
>          exit(1);
>      }
> +
> +    /* Add a new node if no notifier was registered before this call */
> +    if (old == IOMMU_NOTIFIER_NONE) {
> +        node = g_malloc0(sizeof(*node));
> +        node->vtd_as = vtd_as;
> +        node->notifier_flag = new;
> +        QLIST_INSERT_HEAD(&s->notifiers_list, node, next);
> +        return;
> +    }
> +
> +    /* update notifier node with new flags */
> +    QLIST_FOREACH(node, &s->notifiers_list, next) {
> +        if (node->vtd_as == vtd_as) {
> +            if (new == IOMMU_NOTIFIER_NONE) {
> +                QLIST_REMOVE(node, next);
> +            } else {
> +                node->notifier_flag = new;
> +            }
> +            return;
> +        }
> +    }
>  }
>  
>  static const VMStateDescription vtd_vmstate = {
> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> index 35d9f3a..96df4ae 100644
> --- a/hw/i386/intel_iommu_internal.h
> +++ b/hw/i386/intel_iommu_internal.h
> @@ -381,6 +381,8 @@ typedef struct VTDIOTLBPageInvInfo VTDIOTLBPageInvInfo;
>  #define VTD_PAGE_SHIFT_1G           30
>  #define VTD_PAGE_MASK_1G            (~((1ULL << VTD_PAGE_SHIFT_1G) - 1))
>  
> +#define VTD_PAGE_MASK(shift)        (~((1ULL << (shift)) - 1))
> +
>  struct VTDRootEntry {
>      uint64_t val;
>      uint64_t rsvd;
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> index 7a94f16..c160706 100644
> --- a/include/hw/i386/intel_iommu.h
> +++ b/include/hw/i386/intel_iommu.h
> @@ -63,6 +63,7 @@ typedef union VTD_IR_TableEntry VTD_IR_TableEntry;
>  typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress;
>  typedef struct VTDIrq VTDIrq;
>  typedef struct VTD_MSIMessage VTD_MSIMessage;
> +typedef struct IntelIOMMUNotifierNode IntelIOMMUNotifierNode;
>  
>  /* Context-Entry */
>  struct VTDContextEntry {
> @@ -248,6 +249,12 @@ struct VTD_MSIMessage {
>  /* When IR is enabled, all MSI/MSI-X data bits should be zero */
>  #define VTD_IR_MSI_DATA          (0)
>  
> +struct IntelIOMMUNotifierNode {
> +    VTDAddressSpace *vtd_as;
> +    IOMMUNotifierFlag notifier_flag;
> +    QLIST_ENTRY(IntelIOMMUNotifierNode) next;
> +};
> +
>  /* The iommu (DMAR) device state struct */
>  struct IntelIOMMUState {
>      X86IOMMUState x86_iommu;
> @@ -285,6 +292,8 @@ struct IntelIOMMUState {
>      MemoryRegionIOMMUOps iommu_ops;
>      GHashTable *vtd_as_by_busptr;   /* VTDBus objects indexed by PCIBus* reference */
>      VTDBus *vtd_as_by_bus_num[VTD_PCI_BUS_MAX]; /* VTDBus objects indexed by bus number */
> +    /* list of registered notifiers */
> +    QLIST_HEAD(, IntelIOMMUNotifierNode) notifiers_list;
>  
>      /* interrupt remapping */
>      bool intr_enabled;              /* Whether guest enabled IR */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



* Re: [Qemu-devel] [PATCH v4 RESEND 0/3] IOMMU: intel_iommu support map and unmap notifications
  2016-10-17 16:07 ` [Qemu-devel] [PATCH v4 RESEND 0/3] IOMMU: intel_iommu support map and unmap notifications Alex Williamson
@ 2016-10-18  4:06   ` David Gibson
  2016-10-18  4:47     ` Alex Williamson
  2016-10-20 19:17   ` Aviv B.D.
  1 sibling, 1 reply; 31+ messages in thread
From: David Gibson @ 2016-10-18  4:06 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Aviv B.D, Jan Kiszka, qemu-devel, Peter Xu, Michael S. Tsirkin


On Mon, Oct 17, 2016 at 10:07:36AM -0600, Alex Williamson wrote:
> On Mon, 17 Oct 2016 18:44:21 +0300
> "Aviv B.D" <bd.aviv@gmail.com> wrote:
> 
> > From: "Aviv Ben-David" <bd.aviv@gmail.com>
> > 
> > * Advertise Cache Mode capability in the iommu cap register. 
> >   This capability is controlled by "cache-mode" property of intel-iommu device.
> >   To enable this option call QEMU with "-device intel-iommu,cache-mode=true".
> > 
> > * On page cache invalidation in the intel vIOMMU, check if the domain belongs to a
> >   registered notifier, and notify accordingly.
> > 
> > Currently this patch still doesn't enable VFIO device support with a vIOMMU 
> > present. Current problems:
> > * vfio_iommu_map_notify is not aware of the memory range belonging to a specific 
> >   VFIOGuestIOMMU.
> 
> Could you elaborate on why this is an issue?
> 
> > * memory_region_iommu_replay hangs QEMU on startup while it iterates over the
> >   64-bit address space. Commenting out the call to this function enables a
> >   workable VFIO device while the vIOMMU is present.
> 
> This has been discussed previously, it would be incorrect for vfio not
> to call the replay function.  The solution is to add an iommu driver
> callback to efficiently walk the mappings within a MemoryRegion.

Right, replay is a bit of a hack.  There are a couple of other
approaches that might be adequate without a new callback:
   - Make the VFIOGuestIOMMU aware of the guest address range mapped
     by the vIOMMU.  Intel currently advertises that as a full 64-bit
     address space, but I bet that's not actually true in practice.
   - Have the IOMMU MR advertise a (minimum) page size for vIOMMU
     mappings.  That may let you step through the range with greater
     strides.

> Thanks,
> 
> Alex
> 
> > Changes from v1 to v2:
> > * remove the assumption that the cache does not clear
> > * fix a lockup under high load.
> > 
> > Changes from v2 to v3:
> > * remove debug leftovers
> > * split into separate commits
> > * change is_write to flags in vtd_do_iommu_translate, add IOMMU_NO_FAIL 
> >   to suppress errors from propagating to the guest.
> > 
> > Changes from v3 to v4:
> > * Add a property to the intel_iommu device to control the CM capability, 
> >   defaulting to False.
> > * Use s->iommu_ops.notify_flag_changed to register notifiers.
> > 
> > Changes from v4 to v4 RESEND:
> > * Fix coding style issues pointed out by the checkpatch.pl script.
> > 
> > Aviv Ben-David (3):
> >   IOMMU: add option to enable VTD_CAP_CM to vIOMMU capability exposed to
> >     guest
> >   IOMMU: change iommu_op->translate's is_write to flags, add support to
> >     NO_FAIL flag mode
> >   IOMMU: enable intel_iommu map and unmap notifiers
> > 
> >  exec.c                         |   3 +-
> >  hw/i386/amd_iommu.c            |   4 +-
> >  hw/i386/intel_iommu.c          | 175 +++++++++++++++++++++++++++++++++--------
> >  hw/i386/intel_iommu_internal.h |   3 +
> >  include/exec/memory.h          |   6 +-
> >  include/hw/i386/intel_iommu.h  |  11 +++
> >  memory.c                       |   3 +-
> >  7 files changed, 168 insertions(+), 37 deletions(-)
> > 
> 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



* Re: [Qemu-devel] [PATCH v4 RESEND 0/3] IOMMU: intel_iommu support map and unmap notifications
  2016-10-18  4:06   ` David Gibson
@ 2016-10-18  4:47     ` Alex Williamson
  2016-10-18  5:52       ` David Gibson
  0 siblings, 1 reply; 31+ messages in thread
From: Alex Williamson @ 2016-10-18  4:47 UTC (permalink / raw)
  To: David Gibson
  Cc: Aviv B.D, Jan Kiszka, qemu-devel, Peter Xu, Michael S. Tsirkin

On Tue, 18 Oct 2016 15:06:55 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Mon, Oct 17, 2016 at 10:07:36AM -0600, Alex Williamson wrote:
> > On Mon, 17 Oct 2016 18:44:21 +0300
> > "Aviv B.D" <bd.aviv@gmail.com> wrote:
> >   
> > > From: "Aviv Ben-David" <bd.aviv@gmail.com>
> > > 
> > > * Advertise Cache Mode capability in the iommu cap register. 
> > >   This capability is controlled by "cache-mode" property of intel-iommu device.
> > >   To enable this option call QEMU with "-device intel-iommu,cache-mode=true".
> > > 
> > > * On page cache invalidation in the intel vIOMMU, check if the domain belongs to a
> > >   registered notifier, and notify accordingly.
> > > 
> > > Currently this patch still doesn't enable VFIO device support with a vIOMMU 
> > > present. Current problems:
> > > * vfio_iommu_map_notify is not aware of the memory range belonging to a specific 
> > >   VFIOGuestIOMMU.  
> > 
> > Could you elaborate on why this is an issue?
> >   
> > > * memory_region_iommu_replay hangs QEMU on startup while it iterates over the
> > >   64-bit address space. Commenting out the call to this function enables a
> > >   workable VFIO device while the vIOMMU is present.
> > 
> > This has been discussed previously, it would be incorrect for vfio not
> > to call the replay function.  The solution is to add an iommu driver
> > callback to efficiently walk the mappings within a MemoryRegion.  
> 
> Right, replay is a bit of a hack.  There are a couple of other
> approaches that might be adequate without a new callback:
>    - Make the VFIOGuestIOMMU aware of the guest address range mapped
>      by the vIOMMU.  Intel currently advertises that as a full 64-bit
>      address space, but I bet that's not actually true in practice.
>    - Have the IOMMU MR advertise a (minimum) page size for vIOMMU
>      mappings.  That may let you stpe through the range with greater
>      strides

Hmm, VT-d supports at least a 39-bit address width and always supports
a minimum 4k page size, so yes that does reduce us from 2^52 steps down
to 2^27, but it's still absurd to walk through the raw address space.
It does however seem correct to create the MemoryRegion with a width
that actually matches the IOMMU capability, but I don't think that's a
sufficient fix by itself.  Thanks,
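The step counts quoted above follow directly from address width minus page shift; a quick standalone check of the arithmetic (nothing QEMU-specific):

```c
#include <assert.h>
#include <stdint.h>

/* Number of page-sized steps needed to scan a 2^addr_width address
 * space at 2^page_shift granularity: a full 64-bit walk at 4K pages
 * is 2^52 iterations, and VT-d's 39-bit width still leaves 2^27. */
static uint64_t replay_steps(unsigned addr_width, unsigned page_shift)
{
    return UINT64_C(1) << (addr_width - page_shift);
}
```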

Alex


* Re: [Qemu-devel] [PATCH v4 RESEND 0/3] IOMMU: intel_iommu support map and unmap notifications
  2016-10-18  4:47     ` Alex Williamson
@ 2016-10-18  5:52       ` David Gibson
  2016-10-18  8:03         ` Alex Williamson
  0 siblings, 1 reply; 31+ messages in thread
From: David Gibson @ 2016-10-18  5:52 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Aviv B.D, Jan Kiszka, qemu-devel, Peter Xu, Michael S. Tsirkin


On Mon, Oct 17, 2016 at 10:47:02PM -0600, Alex Williamson wrote:
> On Tue, 18 Oct 2016 15:06:55 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Mon, Oct 17, 2016 at 10:07:36AM -0600, Alex Williamson wrote:
> > > On Mon, 17 Oct 2016 18:44:21 +0300
> > > "Aviv B.D" <bd.aviv@gmail.com> wrote:
> > >   
> > > > From: "Aviv Ben-David" <bd.aviv@gmail.com>
> > > > 
> > > > * Advertize Cache Mode capability in iommu cap register. 
> > > >   This capability is controlled by "cache-mode" property of intel-iommu device.
> > > >   To enable this option call QEMU with "-device intel-iommu,cache-mode=true".
> > > > 
> > > > * On page cache invalidation in intel vIOMMU, check if the domain belong to
> > > >   registered notifier, and notify accordingly.
> > > > 
> > > > Currently this patch still doesn't enabling VFIO devices support with vIOMMU 
> > > > present. Current problems:
> > > > * vfio_iommu_map_notify is not aware about memory range belong to specific 
> > > >   VFIOGuestIOMMU.  
> > > 
> > > Could you elaborate on why this is an issue?
> > >   
> > > > * memory_region_iommu_replay hangs QEMU on start up while it itterate over 
> > > >   64bit address space. Commenting out the call to this function enables 
> > > >   workable VFIO device while vIOMMU present.  
> > > 
> > > This has been discussed previously, it would be incorrect for vfio not
> > > to call the replay function.  The solution is to add an iommu driver
> > > callback to efficiently walk the mappings within a MemoryRegion.  
> > 
> > Right, replay is a bit of a hack.  There are a couple of other
> > approaches that might be adequate without a new callback:
> >    - Make the VFIOGuestIOMMU aware of the guest address range mapped
> >      by the vIOMMU.  Intel currently advertises that as a full 64-bit
> >      address space, but I bet that's not actually true in practice.
> >    - Have the IOMMU MR advertise a (minimum) page size for vIOMMU
> >      mappings.  That may let you stpe through the range with greater
> >      strides
> 
> Hmm, VT-d supports at least a 39-bit address width and always supports
> a minimum 4k page size, so yes that does reduce us from 2^52 steps down
> to 2^27,

Right, which is probably doable, if not ideal.

> but it's still absurd to walk through the raw address space.

Well.. it depends on the internal structure of the IOMMU.  For Power,
it's traditionally just a 1-level page table, so we can't actually do
any better than stepping through each IOMMU page.

> It does however seem correct to create the MemoryRegion with a width
> that actually matches the IOMMU capability, but I don't think that's a
> sufficient fix by itself.  Thanks,

I suspect it would actually make it workable in the short term.

But I don't disagree that a "traverse" or "replay" callback of some
sort in the iommu_ops is a better idea long term.  Having a fallback
to the current replay implementation if the callback isn't supplied
seems pretty reasonable though.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



* Re: [Qemu-devel] [PATCH v4 RESEND 0/3] IOMMU: intel_iommu support map and unmap notifications
  2016-10-18  5:52       ` David Gibson
@ 2016-10-18  8:03         ` Alex Williamson
  0 siblings, 0 replies; 31+ messages in thread
From: Alex Williamson @ 2016-10-18  8:03 UTC (permalink / raw)
  To: David Gibson
  Cc: Aviv B.D, Jan Kiszka, qemu-devel, Peter Xu, Michael S. Tsirkin

On Tue, 18 Oct 2016 16:52:04 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Mon, Oct 17, 2016 at 10:47:02PM -0600, Alex Williamson wrote:
> > On Tue, 18 Oct 2016 15:06:55 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > On Mon, Oct 17, 2016 at 10:07:36AM -0600, Alex Williamson wrote:  
> > > > On Mon, 17 Oct 2016 18:44:21 +0300
> > > > "Aviv B.D" <bd.aviv@gmail.com> wrote:
> > > >     
> > > > > From: "Aviv Ben-David" <bd.aviv@gmail.com>
> > > > > 
> > > > > * Advertize Cache Mode capability in iommu cap register. 
> > > > >   This capability is controlled by "cache-mode" property of intel-iommu device.
> > > > >   To enable this option call QEMU with "-device intel-iommu,cache-mode=true".
> > > > > 
> > > > > * On page cache invalidation in intel vIOMMU, check if the domain belong to
> > > > >   registered notifier, and notify accordingly.
> > > > > 
> > > > > Currently this patch still doesn't enabling VFIO devices support with vIOMMU 
> > > > > present. Current problems:
> > > > > * vfio_iommu_map_notify is not aware about memory range belong to specific 
> > > > >   VFIOGuestIOMMU.    
> > > > 
> > > > Could you elaborate on why this is an issue?
> > > >     
> > > > > * memory_region_iommu_replay hangs QEMU on start up while it itterate over 
> > > > >   64bit address space. Commenting out the call to this function enables 
> > > > >   workable VFIO device while vIOMMU present.    
> > > > 
> > > > This has been discussed previously, it would be incorrect for vfio not
> > > > to call the replay function.  The solution is to add an iommu driver
> > > > callback to efficiently walk the mappings within a MemoryRegion.    
> > > 
> > > Right, replay is a bit of a hack.  There are a couple of other
> > > approaches that might be adequate without a new callback:
> > >    - Make the VFIOGuestIOMMU aware of the guest address range mapped
> > >      by the vIOMMU.  Intel currently advertises that as a full 64-bit
> > >      address space, but I bet that's not actually true in practice.
> > >    - Have the IOMMU MR advertise a (minimum) page size for vIOMMU
> > >      mappings.  That may let you stpe through the range with greater
> > >      strides  
> > 
> > Hmm, VT-d supports at least a 39-bit address width and always supports
> > a minimum 4k page size, so yes that does reduce us from 2^52 steps down
> > to 2^27,  
> 
> Right, which is probably doable, if not ideal
> 
> > but it's still absurd to walk through the raw address space.  
> 
> Well.. it depends on the internal structure of the IOMMU.  For Power,
> it's traditionally just a 1-level page table, so we can't actually do
> any better than stepping through each IOMMU page.

Intel always has at least a 3-level page table AIUI.
 
> > It does however seem correct to create the MemoryRegion with a width
> > that actually matches the IOMMU capability, but I don't think that's a
> > sufficient fix by itself.  Thanks,  
> 
> I suspect it would actually make it workable in the short term.
> 
> But I don't disagree that a "traverse" or "replay" callback of some
> sort in the iommu_ops is a better idea long term.  Having a fallback
> to the current replay implementation if the callback isn't supplied
> seems pretty reasonable though.

Exactly, the callback could be optional where IOMMUs that supply a
relatively small IOVA window could fallback to the code we have today.
Thanks,

Alex


* Re: [Qemu-devel] [PATCH v4 RESEND 2/3] IOMMU: change iommu_op->translate's is_write to flags, add support to NO_FAIL flag mode
  2016-10-17 15:44 ` [Qemu-devel] [PATCH v4 RESEND 2/3] IOMMU: change iommu_op->translate's is_write to flags, add support to NO_FAIL flag mode Aviv B.D
  2016-10-18  3:57   ` David Gibson
@ 2016-10-19  8:35   ` Peter Xu
  2016-10-20 18:54     ` Aviv B.D.
  1 sibling, 1 reply; 31+ messages in thread
From: Peter Xu @ 2016-10-19  8:35 UTC (permalink / raw)
  To: Aviv B.D; +Cc: qemu-devel, Michael S. Tsirkin, Alex Williamson, Jan Kiszka

On Mon, Oct 17, 2016 at 06:44:23PM +0300, Aviv B.D wrote:

[...]

> @@ -364,7 +364,7 @@ static void vtd_set_frcd_and_update_ppf(IntelIOMMUState *s, uint16_t index)
>  /* Must not update F field now, should be done later */
>  static void vtd_record_frcd(IntelIOMMUState *s, uint16_t index,
>                              uint16_t source_id, hwaddr addr,
> -                            VTDFaultReason fault, bool is_write)
> +                            VTDFaultReason fault, IOMMUAccessFlags flags)

I think we don't need to change this is_write into flags. We can just
make sure we translate the flags into is_write when calling
vtd_record_frcd. After all, NO_FAIL makes no sense in this function.

>  {
>      uint64_t hi = 0, lo;
>      hwaddr frcd_reg_addr = DMAR_FRCD_REG_OFFSET + (((uint64_t)index) << 4);
> @@ -373,7 +373,7 @@ static void vtd_record_frcd(IntelIOMMUState *s, uint16_t index,
>  
>      lo = VTD_FRCD_FI(addr);
>      hi = VTD_FRCD_SID(source_id) | VTD_FRCD_FR(fault);
> -    if (!is_write) {
> +    if (!(flags == IOMMU_WO || flags == IOMMU_RW)) {
>          hi |= VTD_FRCD_T;
>      }
>      vtd_set_quad_raw(s, frcd_reg_addr, lo);
> @@ -404,7 +404,7 @@ static bool vtd_try_collapse_fault(IntelIOMMUState *s, uint16_t source_id)
>  /* Log and report an DMAR (address translation) fault to software */
>  static void vtd_report_dmar_fault(IntelIOMMUState *s, uint16_t source_id,
>                                    hwaddr addr, VTDFaultReason fault,
> -                                  bool is_write)
> +                                  IOMMUAccessFlags flags)

Here as well. IMHO we can just keep the old is_write, and tune the
callers.

>  {
>      uint32_t fsts_reg = vtd_get_long_raw(s, DMAR_FSTS_REG);
>  
> @@ -415,7 +415,7 @@ static void vtd_report_dmar_fault(IntelIOMMUState *s, uint16_t source_id,
>          return;
>      }
>      VTD_DPRINTF(FLOG, "sid 0x%"PRIx16 ", fault %d, addr 0x%"PRIx64
> -                ", is_write %d", source_id, fault, addr, is_write);
> +                ", flags %d", source_id, fault, addr, flags);
>      if (fsts_reg & VTD_FSTS_PFO) {
>          VTD_DPRINTF(FLOG, "new fault is not recorded due to "
>                      "Primary Fault Overflow");
> @@ -433,7 +433,7 @@ static void vtd_report_dmar_fault(IntelIOMMUState *s, uint16_t source_id,
>          return;
>      }
>  
> -    vtd_record_frcd(s, s->next_frcd_reg, source_id, addr, fault, is_write);
> +    vtd_record_frcd(s, s->next_frcd_reg, source_id, addr, fault, flags);

... so if you agree on my previous comments, here we can use:

       vtd_record_frcd(s, s->next_frcd_reg, source_id, addr, fault,
                       flags & IOMMU_WO);

and keep the vtd_record_frcd() untouched.

[...]

> @@ -789,11 +804,12 @@ static inline bool vtd_is_interrupt_addr(hwaddr addr)
>   *
>   * @bus_num: The bus number
>   * @devfn: The devfn, which is the  combined of device and function number
> - * @is_write: The access is a write operation
> + * @flags: The access is a write operation

Need to fix the comment to suit the new flag.

>   * @entry: IOMMUTLBEntry that contain the addr to be translated and result
>   */
>  static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
> -                                   uint8_t devfn, hwaddr addr, bool is_write,
> +                                   uint8_t devfn, hwaddr addr,
> +                                   IOMMUAccessFlags flags,
>                                     IOMMUTLBEntry *entry)
>  {
>      IntelIOMMUState *s = vtd_as->iommu_state;
> @@ -811,7 +827,7 @@ static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
>  
>      /* Check if the request is in interrupt address range */
>      if (vtd_is_interrupt_addr(addr)) {
> -        if (is_write) {
> +        if (flags == IOMMU_WO || flags == IOMMU_RW) {

I suggest we directly use (flags & IOMMU_WO) in all similar places.

>              /* FIXME: since we don't know the length of the access here, we
>               * treat Non-DWORD length write requests without PASID as
>               * interrupt requests, too. Withoud interrupt remapping support,
> @@ -827,7 +843,7 @@ static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
>          } else {
>              VTD_DPRINTF(GENERAL, "error: read request from interrupt address "
>                          "gpa 0x%"PRIx64, addr);
> -            vtd_report_dmar_fault(s, source_id, addr, VTD_FR_READ, is_write);
> +            vtd_report_dmar_fault(s, source_id, addr, VTD_FR_READ, flags);

Again, we can avoid touching vtd_report_dmar_fault() interface, and
use the old is_write. Looks like NO_FAIL makes no sense for it as well.

Thanks,

-- peterx


* Re: [Qemu-devel] [PATCH v4 RESEND 3/3] IOMMU: enable intel_iommu map and unmap notifiers
  2016-10-17 15:44 ` [Qemu-devel] [PATCH v4 RESEND 3/3] IOMMU: enable intel_iommu map and unmap notifiers Aviv B.D
  2016-10-18  4:04   ` David Gibson
@ 2016-10-19  9:33   ` Peter Xu
  2016-10-20 19:11     ` Aviv B.D.
  2016-10-20  7:28   ` Peter Xu
  2 siblings, 1 reply; 31+ messages in thread
From: Peter Xu @ 2016-10-19  9:33 UTC (permalink / raw)
  To: Aviv B.D; +Cc: qemu-devel, Michael S. Tsirkin, Alex Williamson, Jan Kiszka

On Mon, Oct 17, 2016 at 06:44:24PM +0300, Aviv B.D wrote:
> From: "Aviv Ben-David" <bd.aviv@gmail.com>
> 
> Adds a list of registered vtd_as's to intel iommu state to save
> iteration over each PCI device in a search of the corrosponding domain.
> 
> Signed-off-by: Aviv Ben-David <bd.aviv@gmail.com>
> ---
>  hw/i386/intel_iommu.c          | 102 ++++++++++++++++++++++++++++++++++++++---
>  hw/i386/intel_iommu_internal.h |   2 +
>  include/hw/i386/intel_iommu.h  |   9 ++++
>  3 files changed, 106 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index dcf45f0..34fc1e8 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -51,6 +51,9 @@ static int vtd_dbgflags = VTD_DBGBIT(GENERAL) | VTD_DBGBIT(CSR);
>  #define VTD_DPRINTF(what, fmt, ...) do {} while (0)
>  #endif
>  
> +static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
> +                                    uint8_t devfn, VTDContextEntry *ce);
> +
>  static void vtd_define_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val,
>                              uint64_t wmask, uint64_t w1cmask)
>  {
> @@ -142,6 +145,23 @@ static uint64_t vtd_set_clear_mask_quad(IntelIOMMUState *s, hwaddr addr,
>      return new_val;
>  }
>  
> +static int vtd_get_did_dev(IntelIOMMUState *s, uint8_t bus_num, uint8_t devfn,
> +                           uint16_t *domain_id)
> +{
> +    VTDContextEntry ce;
> +    int ret_fr;
> +
> +    assert(domain_id);
> +
> +    ret_fr = vtd_dev_to_context_entry(s, bus_num, devfn, &ce);
> +    if (ret_fr) {
> +        return -1;
> +    }
> +
> +    *domain_id =  VTD_CONTEXT_ENTRY_DID(ce.hi);
                    ^ one more space

> +    return 0;
> +}
> +
>  /* GHashTable functions */
>  static gboolean vtd_uint64_equal(gconstpointer v1, gconstpointer v2)
>  {
> @@ -683,9 +703,6 @@ static int vtd_gpa_to_slpte(VTDContextEntry *ce, uint64_t gpa,
>          *reads = (*reads) && (slpte & VTD_SL_R);
>          *writes = (*writes) && (slpte & VTD_SL_W);
>          if (!(slpte & access_right_check)) {
> -            VTD_DPRINTF(GENERAL, "error: lack of %s permission for "
> -                        "gpa 0x%"PRIx64 " slpte 0x%"PRIx64,
> -                        (flags == IOMMU_WO ? "write" : "read"), gpa, slpte);

Could I ask why we are removing these lines? It can be useful if we
have permission issues.

>              return (flags == IOMMU_RW || flags == IOMMU_WO) ?
>                     -VTD_FR_WRITE : -VTD_FR_READ;
>          }
> @@ -734,9 +751,6 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
>      }
>  
>      if (!vtd_context_entry_present(ce)) {
> -        VTD_DPRINTF(GENERAL,
> -                    "error: context-entry #%"PRIu8 "(bus #%"PRIu8 ") "
> -                    "is not present", devfn, bus_num);

Here as well. Any reason to remove it?

>          return -VTD_FR_CONTEXT_ENTRY_P;
>      } else if ((ce->hi & VTD_CONTEXT_ENTRY_RSVD_HI) ||
>                 (ce->lo & VTD_CONTEXT_ENTRY_RSVD_LO)) {
> @@ -1065,6 +1079,55 @@ static void vtd_iotlb_domain_invalidate(IntelIOMMUState *s, uint16_t domain_id)
>                                  &domain_id);
>  }
>  
> +static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
> +                                           uint16_t domain_id, hwaddr addr,
> +                                           uint8_t am)
> +{

The logic of this function looks strange to me.

> +    IntelIOMMUNotifierNode *node;
> +
> +    QLIST_FOREACH(node, &(s->notifiers_list), next) {
> +        VTDAddressSpace *vtd_as = node->vtd_as;
> +        uint16_t vfio_domain_id;
> +        int ret = vtd_get_did_dev(s, pci_bus_num(vtd_as->bus), vtd_as->devfn,
> +                                  &vfio_domain_id);
> +        if (!ret && domain_id == vfio_domain_id) {
> +            IOMMUTLBEntry entry;
> +
> +            /* notify unmap */
> +            if (node->notifier_flag & IOMMU_NOTIFIER_UNMAP) {

First of all, if we are talking about VFIO, notifier_flag should
always be MAP|UNMAP. So in that case, for newly mapped entries, looks
like we will first send an UNMAP, then a MAP?

> +                VTD_DPRINTF(GENERAL, "Remove addr 0x%"PRIx64 " mask %d",
> +                            addr, am);
> +                entry.target_as = &address_space_memory;
> +                entry.iova = addr & VTD_PAGE_MASK_4K;
> +                entry.translated_addr = 0;
> +                entry.addr_mask = ~VTD_PAGE_MASK(VTD_PAGE_SHIFT_4K + am);
> +                entry.perm = IOMMU_NONE;
> +                memory_region_notify_iommu(&node->vtd_as->iommu, entry);
> +            }
> +
> +            /* notify map */
> +            if (node->notifier_flag & IOMMU_NOTIFIER_MAP) {
> +                hwaddr original_addr = addr;
> +                VTD_DPRINTF(GENERAL, "add addr 0x%"PRIx64 " mask %d", addr, am);
> +                while (addr < original_addr + (1 << am) * VTD_PAGE_SIZE) {
> +                    /* call to vtd_iommu_translate */
> +                    IOMMUTLBEntry entry = s->iommu_ops.translate(
> +                                                         &node->vtd_as->iommu,
> +                                                         addr,
> +                                                         IOMMU_NO_FAIL);
> +                    if (entry.perm != IOMMU_NONE) {
> +                        addr += entry.addr_mask + 1;
> +                        memory_region_notify_iommu(&node->vtd_as->iommu, entry);
> +                    } else {
> +                        addr += VTD_PAGE_SIZE;

IIUC, here is the point that we found "the page is gone" (so this is
an UNMAP invalidation), and we should do memory_region_iommu_notify()
for the whole area with IOMMU_NONE. Then we just quit the loop since
continuous translate()s should fail as well if the first page is
missing.

Please correct if I am wrong.

Thanks,

-- peterx


* Re: [Qemu-devel] [PATCH v4 RESEND 3/3] IOMMU: enable intel_iommu map and unmap notifiers
  2016-10-17 15:44 ` [Qemu-devel] [PATCH v4 RESEND 3/3] IOMMU: enable intel_iommu map and unmap notifiers Aviv B.D
  2016-10-18  4:04   ` David Gibson
  2016-10-19  9:33   ` Peter Xu
@ 2016-10-20  7:28   ` Peter Xu
  2 siblings, 0 replies; 31+ messages in thread
From: Peter Xu @ 2016-10-20  7:28 UTC (permalink / raw)
  To: Aviv B.D; +Cc: qemu-devel, Michael S. Tsirkin, Alex Williamson, Jan Kiszka

On Mon, Oct 17, 2016 at 06:44:24PM +0300, Aviv B.D wrote:

[...]

> @@ -2000,8 +2065,10 @@ static void vtd_iommu_notify_flag_changed(MemoryRegion *iommu,
>                                            IOMMUNotifierFlag new)
>  {
>      VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
> +    IntelIOMMUState *s = vtd_as->iommu_state;
> +    IntelIOMMUNotifierNode *node = NULL;
>  
> -    if (new & IOMMU_NOTIFIER_MAP) {
> +    if (!s->cache_mode_enabled && new & IOMMU_NOTIFIER_MAP) {
>          error_report("Device at bus %s addr %02x.%d requires iommu "
>                       "notifier which is currently not supported by "
>                       "intel-iommu emulation",

Here, after the patch works, we can modify the warning message into
something like:

"We need to set cache_mode=1 for intel-iommu to enable device
assignment with IOMMU protection."

> @@ -2009,6 +2076,27 @@ static void vtd_iommu_notify_flag_changed(MemoryRegion *iommu,
>                       PCI_FUNC(vtd_as->devfn));
>          exit(1);
>      }
> +
> +    /* Add new ndoe if no mapping was exising before this call */
> +    if (old == IOMMU_NOTIFIER_NONE) {
> +        node = g_malloc0(sizeof(*node));
> +        node->vtd_as = vtd_as;
> +        node->notifier_flag = new;
> +        QLIST_INSERT_HEAD(&s->notifiers_list, node, next);
> +        return;
> +    }
> +
> +    /* update notifier node with new flags */
> +    QLIST_FOREACH(node, &s->notifiers_list, next) {

Though in this case it is safe, I would still suggest we use
QLIST_FOREACH_SAFE here.

> +        if (node->vtd_as == vtd_as) {
> +            if (new == IOMMU_NOTIFIER_NONE) {
> +                QLIST_REMOVE(node, next);

Memory leak here?

Thanks,

-- peterx


* Re: [Qemu-devel] [PATCH v4 RESEND 2/3] IOMMU: change iommu_op->translate's is_write to flags, add support to NO_FAIL flag mode
  2016-10-19  8:35   ` Peter Xu
@ 2016-10-20 18:54     ` Aviv B.D.
  0 siblings, 0 replies; 31+ messages in thread
From: Aviv B.D. @ 2016-10-20 18:54 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Michael S. Tsirkin, Alex Williamson, Jan Kiszka

Hi,
You are right, and I will revert those functions you mentioned.

Thanks,
Aviv.



On Wed, Oct 19, 2016 at 11:35 AM, Peter Xu <peterx@redhat.com> wrote:

> On Mon, Oct 17, 2016 at 06:44:23PM +0300, Aviv B.D wrote:
>
> [...]
>
> > @@ -364,7 +364,7 @@ static void vtd_set_frcd_and_update_ppf(IntelIOMMUState *s, uint16_t index)
> >  /* Must not update F field now, should be done later */
> >  static void vtd_record_frcd(IntelIOMMUState *s, uint16_t index,
> >                              uint16_t source_id, hwaddr addr,
> > -                            VTDFaultReason fault, bool is_write)
> > +                            VTDFaultReason fault, IOMMUAccessFlags flags)
>
> I think we don't need to change this is_write into flags. We can just
> make sure we translate the flags into is_write when calling
> vtd_record_frcd. After all, NO_FAIL makes no sense in this function.
>
> >  {
> >      uint64_t hi = 0, lo;
> >      hwaddr frcd_reg_addr = DMAR_FRCD_REG_OFFSET + (((uint64_t)index) << 4);
> > @@ -373,7 +373,7 @@ static void vtd_record_frcd(IntelIOMMUState *s, uint16_t index,
> >
> >      lo = VTD_FRCD_FI(addr);
> >      hi = VTD_FRCD_SID(source_id) | VTD_FRCD_FR(fault);
> > -    if (!is_write) {
> > +    if (!(flags == IOMMU_WO || flags == IOMMU_RW)) {
> >          hi |= VTD_FRCD_T;
> >      }
> >      vtd_set_quad_raw(s, frcd_reg_addr, lo);
> > @@ -404,7 +404,7 @@ static bool vtd_try_collapse_fault(IntelIOMMUState *s, uint16_t source_id)
> >  /* Log and report an DMAR (address translation) fault to software */
> >  static void vtd_report_dmar_fault(IntelIOMMUState *s, uint16_t source_id,
> >                                    hwaddr addr, VTDFaultReason fault,
> > -                                  bool is_write)
> > +                                  IOMMUAccessFlags flags)
>
> Here as well. IMHO we can just keep the old is_write, and tune the
> callers.
>
> >  {
> >      uint32_t fsts_reg = vtd_get_long_raw(s, DMAR_FSTS_REG);
> >
> > @@ -415,7 +415,7 @@ static void vtd_report_dmar_fault(IntelIOMMUState *s, uint16_t source_id,
> >          return;
> >      }
> >      VTD_DPRINTF(FLOG, "sid 0x%"PRIx16 ", fault %d, addr 0x%"PRIx64
> > -                ", is_write %d", source_id, fault, addr, is_write);
> > +                ", flags %d", source_id, fault, addr, flags);
> >      if (fsts_reg & VTD_FSTS_PFO) {
> >          VTD_DPRINTF(FLOG, "new fault is not recorded due to "
> >                      "Primary Fault Overflow");
> > @@ -433,7 +433,7 @@ static void vtd_report_dmar_fault(IntelIOMMUState *s, uint16_t source_id,
> >          return;
> >      }
> >
> > -    vtd_record_frcd(s, s->next_frcd_reg, source_id, addr, fault, is_write);
> > +    vtd_record_frcd(s, s->next_frcd_reg, source_id, addr, fault, flags);
>
> ... so if you agree on my previous comments, here we can use:
>
>        vtd_record_frcd(s, s->next_frcd_reg, source_id, addr, fault,
>                        flags & IOMMU_WO);
>
> and keep the vtd_record_frcd() untouched.
>
> [...]
>
> > @@ -789,11 +804,12 @@ static inline bool vtd_is_interrupt_addr(hwaddr addr)
> >   *
> >   * @bus_num: The bus number
> >   * @devfn: The devfn, which is the  combined of device and function number
> > - * @is_write: The access is a write operation
> > + * @flags: The access is a write operation
>
> Need to fix comment to suite the new flag.
>
> >   * @entry: IOMMUTLBEntry that contain the addr to be translated and result
> >   */
> >  static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
> > -                                   uint8_t devfn, hwaddr addr, bool is_write,
> > +                                   uint8_t devfn, hwaddr addr,
> > +                                   IOMMUAccessFlags flags,
> >                                     IOMMUTLBEntry *entry)
> >  {
> >      IntelIOMMUState *s = vtd_as->iommu_state;
> > @@ -811,7 +827,7 @@ static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
> >
> >      /* Check if the request is in interrupt address range */
> >      if (vtd_is_interrupt_addr(addr)) {
> > -        if (is_write) {
> > +        if (flags == IOMMU_WO || flags == IOMMU_RW) {
>
> I suggest we directly use (flags & IOMMU_WO) in all similar places.
>
> >              /* FIXME: since we don't know the length of the access here, we
> >               * treat Non-DWORD length write requests without PASID as
> >               * interrupt requests, too. Withoud interrupt remapping support,
> > @@ -827,7 +843,7 @@ static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
> >          } else {
> >              VTD_DPRINTF(GENERAL, "error: read request from interrupt address "
> >                          "gpa 0x%"PRIx64, addr);
> > -            vtd_report_dmar_fault(s, source_id, addr, VTD_FR_READ, is_write);
> > +            vtd_report_dmar_fault(s, source_id, addr, VTD_FR_READ, flags);
>
> Again, we can avoid touching vtd_report_dmar_fault() interface, and
> use the old is_write. Looks like NO_FAIL makes no sense for it as well.
>
> Thanks,
>
> -- peterx
>


* Re: [Qemu-devel] [PATCH v4 RESEND 3/3] IOMMU: enable intel_iommu map and unmap notifiers
  2016-10-19  9:33   ` Peter Xu
@ 2016-10-20 19:11     ` Aviv B.D.
  2016-10-20 19:11       ` Aviv B.D.
  2016-10-21  3:57       ` Peter Xu
  0 siblings, 2 replies; 31+ messages in thread
From: Aviv B.D. @ 2016-10-20 19:11 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Michael S. Tsirkin, Alex Williamson, Jan Kiszka

On Wed, Oct 19, 2016 at 12:33 PM, Peter Xu <peterx@redhat.com> wrote:

> On Mon, Oct 17, 2016 at 06:44:24PM +0300, Aviv B.D wrote:
> > From: "Aviv Ben-David" <bd.aviv@gmail.com>
> >
> > Adds a list of registered vtd_as's to intel iommu state to save
> > iteration over each PCI device in a search of the corrosponding domain.
> >
> > Signed-off-by: Aviv Ben-David <bd.aviv@gmail.com>
> > ---
> >  hw/i386/intel_iommu.c          | 102 ++++++++++++++++++++++++++++++++++++++---
> >  hw/i386/intel_iommu_internal.h |   2 +
> >  include/hw/i386/intel_iommu.h  |   9 ++++
> >  3 files changed, 106 insertions(+), 7 deletions(-)
> >
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index dcf45f0..34fc1e8 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -51,6 +51,9 @@ static int vtd_dbgflags = VTD_DBGBIT(GENERAL) | VTD_DBGBIT(CSR);
> >  #define VTD_DPRINTF(what, fmt, ...) do {} while (0)
> >  #endif
> >
> > +static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
> > +                                    uint8_t devfn, VTDContextEntry *ce);
> > +
> >  static void vtd_define_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val,
> >                              uint64_t wmask, uint64_t w1cmask)
> >  {
> > @@ -142,6 +145,23 @@ static uint64_t vtd_set_clear_mask_quad(IntelIOMMUState *s, hwaddr addr,
> >      return new_val;
> >  }
> >
> > +static int vtd_get_did_dev(IntelIOMMUState *s, uint8_t bus_num, uint8_t devfn,
> > +                           uint16_t *domain_id)
> > +{
> > +    VTDContextEntry ce;
> > +    int ret_fr;
> > +
> > +    assert(domain_id);
> > +
> > +    ret_fr = vtd_dev_to_context_entry(s, bus_num, devfn, &ce);
> > +    if (ret_fr) {
> > +        return -1;
> > +    }
> > +
> > +    *domain_id =  VTD_CONTEXT_ENTRY_DID(ce.hi);
>                     ^ one more space
>
> > +    return 0;
> > +}
> > +
> >  /* GHashTable functions */
> >  static gboolean vtd_uint64_equal(gconstpointer v1, gconstpointer v2)
> >  {
> > @@ -683,9 +703,6 @@ static int vtd_gpa_to_slpte(VTDContextEntry *ce, uint64_t gpa,
> >          *reads = (*reads) && (slpte & VTD_SL_R);
> >          *writes = (*writes) && (slpte & VTD_SL_W);
> >          if (!(slpte & access_right_check)) {
> > -            VTD_DPRINTF(GENERAL, "error: lack of %s permission for "
> > -                        "gpa 0x%"PRIx64 " slpte 0x%"PRIx64,
> > -                        (flags == IOMMU_WO ? "write" : "read"), gpa, slpte);
>
> Could I ask why we are removing these lines? It can be useful if we
> have permission issues.
>

I will restore those lines when (flags & NO_FAIL) == 0.

>
> >              return (flags == IOMMU_RW || flags == IOMMU_WO) ?
> >                     -VTD_FR_WRITE : -VTD_FR_READ;
> >          }
> > @@ -734,9 +751,6 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
> >      }
> >
> >      if (!vtd_context_entry_present(ce)) {
> > -        VTD_DPRINTF(GENERAL,
> > -                    "error: context-entry #%"PRIu8 "(bus #%"PRIu8 ") "
> > -                    "is not present", devfn, bus_num);
>
> Here as well. Any reason to remove it?
>
>
Here as well...


> >          return -VTD_FR_CONTEXT_ENTRY_P;
> >      } else if ((ce->hi & VTD_CONTEXT_ENTRY_RSVD_HI) ||
> >                 (ce->lo & VTD_CONTEXT_ENTRY_RSVD_LO)) {
> > @@ -1065,6 +1079,55 @@ static void vtd_iotlb_domain_invalidate(IntelIOMMUState *s, uint16_t domain_id)
> >                                  &domain_id);
> >  }
> >
> > +static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
> > +                                           uint16_t domain_id, hwaddr addr,
> > +                                           uint8_t am)
> > +{
>
> The logic of this function looks strange to me.
>
> > +    IntelIOMMUNotifierNode *node;
> > +
> > +    QLIST_FOREACH(node, &(s->notifiers_list), next) {
> > +        VTDAddressSpace *vtd_as = node->vtd_as;
> > +        uint16_t vfio_domain_id;
> > +        int ret = vtd_get_did_dev(s, pci_bus_num(vtd_as->bus), vtd_as->devfn,
> > +                                  &vfio_domain_id);
> > +        if (!ret && domain_id == vfio_domain_id) {
> > +            IOMMUTLBEntry entry;
> > +
> > +            /* notify unmap */
> > +            if (node->notifier_flag & IOMMU_NOTIFIER_UNMAP) {
>
> First of all, if we are talking about VFIO, notifier_flag should
> always be MAP|UNMAP. So in that case, for newly mapped entries, looks
> like we will first send an UNMAP, then a MAP?
>

You are correct, there is no valid reason to have notifier_flag other than
MAP|UNMAP, at least for VFIO.
I'm not sure there won't be a good reason to do otherwise in the future,
so my code supports this scenario...


> > +                VTD_DPRINTF(GENERAL, "Remove addr 0x%"PRIx64 " mask %d",
> > +                            addr, am);
> > +                entry.target_as = &address_space_memory;
> > +                entry.iova = addr & VTD_PAGE_MASK_4K;
> > +                entry.translated_addr = 0;
> > +                entry.addr_mask = ~VTD_PAGE_MASK(VTD_PAGE_SHIFT_4K + am);
> > +                entry.perm = IOMMU_NONE;
> > +                memory_region_notify_iommu(&node->vtd_as->iommu, entry);
> > +            }
> > +
> > +            /* notify map */
> > +            if (node->notifier_flag & IOMMU_NOTIFIER_MAP) {
> > +                hwaddr original_addr = addr;
> > +                VTD_DPRINTF(GENERAL, "add addr 0x%"PRIx64 " mask %d", addr, am);
> > +                while (addr < original_addr + (1 << am) * VTD_PAGE_SIZE) {
> > +                    /* call to vtd_iommu_translate */
> > +                    IOMMUTLBEntry entry = s->iommu_ops.translate(
> > +                                                         &node->vtd_as->iommu,
> > +                                                         addr,
> > +                                                         IOMMU_NO_FAIL);
> > +                    if (entry.perm != IOMMU_NONE) {
> > +                        addr += entry.addr_mask + 1;
> > +                        memory_region_notify_iommu(&node->vtd_as->iommu, entry);
> > +                    } else {
> > +                        addr += VTD_PAGE_SIZE;
>
> IIUC, here is the point that we found "the page is gone" (so this is
> an UNMAP invalidation), and we should do memory_region_iommu_notify()
> for the whole area with IOMMU_NONE. Then we just quit the loop since
> continuous translate()s should fail as well if the first page is
> missing.
>
> Please correct if I am wrong.
>

If I remember correctly I encountered a few cases where there was a hole of
unmapped memory in the middle of otherwise mapped pages. If I remember
correctly it was with Linux kernel 4.4, but I'm not sure.


> Thanks,
>
> -- peterx
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH v4 RESEND 3/3] IOMMU: enable intel_iommu map and unmap notifiers
  2016-10-20 19:11     ` Aviv B.D.
@ 2016-10-20 19:11       ` Aviv B.D.
  2016-10-21  3:57       ` Peter Xu
  1 sibling, 0 replies; 31+ messages in thread
From: Aviv B.D. @ 2016-10-20 19:11 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Michael S. Tsirkin, Alex Williamson, Jan Kiszka

I will fix all those issues.
Thanks,
Aviv.

On Thu, Oct 20, 2016 at 10:11 PM, Aviv B.D. <bd.aviv@gmail.com> wrote:

>
>
> On Wed, Oct 19, 2016 at 12:33 PM, Peter Xu <peterx@redhat.com> wrote:
>
>> On Mon, Oct 17, 2016 at 06:44:24PM +0300, Aviv B.D wrote:
>> > From: "Aviv Ben-David" <bd.aviv@gmail.com>
>> >
>> > Adds a list of registered vtd_as's to intel iommu state to save
>> > iteration over each PCI device in a search of the corrosponding domain.
>> >
>> > Signed-off-by: Aviv Ben-David <bd.aviv@gmail.com>
>> > ---
>> >  hw/i386/intel_iommu.c          | 102 ++++++++++++++++++++++++++++++
>> ++++++++---
>> >  hw/i386/intel_iommu_internal.h |   2 +
>> >  include/hw/i386/intel_iommu.h  |   9 ++++
>> >  3 files changed, 106 insertions(+), 7 deletions(-)
>> >
>> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>> > index dcf45f0..34fc1e8 100644
>> > --- a/hw/i386/intel_iommu.c
>> > +++ b/hw/i386/intel_iommu.c
>> > @@ -51,6 +51,9 @@ static int vtd_dbgflags = VTD_DBGBIT(GENERAL) |
>> VTD_DBGBIT(CSR);
>> >  #define VTD_DPRINTF(what, fmt, ...) do {} while (0)
>> >  #endif
>> >
>> > +static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t
>> bus_num,
>> > +                                    uint8_t devfn, VTDContextEntry
>> *ce);
>> > +
>> >  static void vtd_define_quad(IntelIOMMUState *s, hwaddr addr, uint64_t
>> val,
>> >                              uint64_t wmask, uint64_t w1cmask)
>> >  {
>> > @@ -142,6 +145,23 @@ static uint64_t vtd_set_clear_mask_quad(IntelIOMMUState
>> *s, hwaddr addr,
>> >      return new_val;
>> >  }
>> >
>> > +static int vtd_get_did_dev(IntelIOMMUState *s, uint8_t bus_num,
>> uint8_t devfn,
>> > +                           uint16_t *domain_id)
>> > +{
>> > +    VTDContextEntry ce;
>> > +    int ret_fr;
>> > +
>> > +    assert(domain_id);
>> > +
>> > +    ret_fr = vtd_dev_to_context_entry(s, bus_num, devfn, &ce);
>> > +    if (ret_fr) {
>> > +        return -1;
>> > +    }
>> > +
>> > +    *domain_id =  VTD_CONTEXT_ENTRY_DID(ce.hi);
>>                     ^ one more space
>>
>> > +    return 0;
>> > +}
>> > +
>> >  /* GHashTable functions */
>> >  static gboolean vtd_uint64_equal(gconstpointer v1, gconstpointer v2)
>> >  {
>> > @@ -683,9 +703,6 @@ static int vtd_gpa_to_slpte(VTDContextEntry *ce,
>> uint64_t gpa,
>> >          *reads = (*reads) && (slpte & VTD_SL_R);
>> >          *writes = (*writes) && (slpte & VTD_SL_W);
>> >          if (!(slpte & access_right_check)) {
>> > -            VTD_DPRINTF(GENERAL, "error: lack of %s permission for "
>> > -                        "gpa 0x%"PRIx64 " slpte 0x%"PRIx64,
>> > -                        (flags == IOMMU_WO ? "write" : "read"), gpa,
>> slpte);
>>
>> Could I ask why we are removing these lines? It can be useful if we
>> have permission issues.
>>
>
> I will return Those lines if flags & NO_FAIL == 0
>
>>
>> >              return (flags == IOMMU_RW || flags == IOMMU_WO) ?
>> >                     -VTD_FR_WRITE : -VTD_FR_READ;
>> >          }
>> > @@ -734,9 +751,6 @@ static int vtd_dev_to_context_entry(IntelIOMMUState
>> *s, uint8_t bus_num,
>> >      }
>> >
>> >      if (!vtd_context_entry_present(ce)) {
>> > -        VTD_DPRINTF(GENERAL,
>> > -                    "error: context-entry #%"PRIu8 "(bus #%"PRIu8 ") "
>> > -                    "is not present", devfn, bus_num);
>>
>> Here as well. Any reason to remove it?
>>
>>
> Here as well...
>
>
>> >          return -VTD_FR_CONTEXT_ENTRY_P;
>> >      } else if ((ce->hi & VTD_CONTEXT_ENTRY_RSVD_HI) ||
>> >                 (ce->lo & VTD_CONTEXT_ENTRY_RSVD_LO)) {
>> > @@ -1065,6 +1079,55 @@ static void vtd_iotlb_domain_invalidate(IntelIOMMUState
>> *s, uint16_t domain_id)
>> >                                  &domain_id);
>> >  }
>> >
>> > +static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
>> > +                                           uint16_t domain_id, hwaddr
>> addr,
>> > +                                           uint8_t am)
>> > +{
>>
>> The logic of this function looks strange to me.
>>
>> > +    IntelIOMMUNotifierNode *node;
>> > +
>> > +    QLIST_FOREACH(node, &(s->notifiers_list), next) {
>> > +        VTDAddressSpace *vtd_as = node->vtd_as;
>> > +        uint16_t vfio_domain_id;
>> > +        int ret = vtd_get_did_dev(s, pci_bus_num(vtd_as->bus),
>> vtd_as->devfn,
>> > +                                  &vfio_domain_id);
>> > +        if (!ret && domain_id == vfio_domain_id) {
>> > +            IOMMUTLBEntry entry;
>> > +
>> > +            /* notify unmap */
>> > +            if (node->notifier_flag & IOMMU_NOTIFIER_UNMAP) {
>>
>> First of all, if we are talking about VFIO, notifier_flag should
>> always be MAP|UNMAP. So in that case, for newly mapped entries, looks
>> like we will first send an UNMAP, then a MAP?
>>
>
> You are correct, there is no valid reason to have notifier_flag other than
> MAP|UNMAP
> at least for VFIO.
> I'm not sure if in the feature there won't be good reason to do otherwise,
> so my
> code support this scenario...
>
>
>> > +                VTD_DPRINTF(GENERAL, "Remove addr 0x%"PRIx64 " mask
>> %d",
>> > +                            addr, am);
>> > +                entry.target_as = &address_space_memory;
>> > +                entry.iova = addr & VTD_PAGE_MASK_4K;
>> > +                entry.translated_addr = 0;
>> > +                entry.addr_mask = ~VTD_PAGE_MASK(VTD_PAGE_SHIFT_4K +
>> am);
>> > +                entry.perm = IOMMU_NONE;
>> > +                memory_region_notify_iommu(&node->vtd_as->iommu,
>> entry);
>> > +            }
>> > +
>> > +            /* notify map */
>> > +            if (node->notifier_flag & IOMMU_NOTIFIER_MAP) {
>> > +                hwaddr original_addr = addr;
>> > +                VTD_DPRINTF(GENERAL, "add addr 0x%"PRIx64 " mask %d",
>> addr, am);
>> > +                while (addr < original_addr + (1 << am) *
>> VTD_PAGE_SIZE) {
>> > +                    /* call to vtd_iommu_translate */
>> > +                    IOMMUTLBEntry entry = s->iommu_ops.translate(
>> > +
>>  &node->vtd_as->iommu,
>> > +                                                         addr,
>> > +
>>  IOMMU_NO_FAIL);
>> > +                    if (entry.perm != IOMMU_NONE) {
>> > +                        addr += entry.addr_mask + 1;
>> > +                        memory_region_notify_iommu(&node->vtd_as->iommu,
>> entry);
>> > +                    } else {
>> > +                        addr += VTD_PAGE_SIZE;
>>
>> IIUC, here is the point that we found "the page is gone" (so this is
>> an UNMAP invalidation), and we should do memory_region_iommu_notify()
>> for the whole area with IOMMU_NONE. Then we just quit the loop since
>> continuous translate()s should fail as well if the first page is
>> missing.
>>
>> Please correct if I am wrong.
>>
>
> If I remember correctly I encounter a few cases where there was hole of
> unmaped
> memory in the middle of otherwise mapped pages. If I remember correctly it
> was
> with linux kernel 4.4, but I'm not sure.
>
>
>> Thanks,
>>
>> -- peterx
>>
>
>


* Re: [Qemu-devel] [PATCH v4 RESEND 0/3] IOMMU: intel_iommu support map and unmap notifications
  2016-10-17 16:07 ` [Qemu-devel] [PATCH v4 RESEND 0/3] IOMMU: intel_iommu support map and unmap notifications Alex Williamson
  2016-10-18  4:06   ` David Gibson
@ 2016-10-20 19:17   ` Aviv B.D.
  2016-10-20 20:06     ` Alex Williamson
  1 sibling, 1 reply; 31+ messages in thread
From: Aviv B.D. @ 2016-10-20 19:17 UTC (permalink / raw)
  To: Alex Williamson; +Cc: qemu-devel, Michael S. Tsirkin, Peter Xu, Jan Kiszka

On Mon, Oct 17, 2016 at 7:07 PM, Alex Williamson <alex.williamson@redhat.com
> wrote:

> On Mon, 17 Oct 2016 18:44:21 +0300
> "Aviv B.D" <bd.aviv@gmail.com> wrote:
>
> > From: "Aviv Ben-David" <bd.aviv@gmail.com>
> >
> > * Advertize Cache Mode capability in iommu cap register.
> >   This capability is controlled by "cache-mode" property of intel-iommu
> device.
> >   To enable this option call QEMU with "-device
> intel-iommu,cache-mode=true".
> >
> > * On page cache invalidation in intel vIOMMU, check if the domain belong
> to
> >   registered notifier, and notify accordingly.
> >
> > Currently this patch still doesn't enabling VFIO devices support with
> vIOMMU
> > present. Current problems:
> > * vfio_iommu_map_notify is not aware about memory range belong to
> specific
> >   VFIOGuestIOMMU.
>
> Could you elaborate on why this is an issue?
>

In my setup VFIO registered two memory areas with one page of unregistered
memory between them.

When I call memory_region_notify_iommu it invokes the VFIO notifier function
twice; the second call fails with a warning to the console because the new
mapping is already present.

The VFIO notifier function should ignore any IOMMUTLBEntry that is not in
its range.


>
> > * memory_region_iommu_replay hangs QEMU on start up while it itterate
> over
> >   64bit address space. Commenting out the call to this function enables
> >   workable VFIO device while vIOMMU present.
>
> This has been discussed previously, it would be incorrect for vfio not
> to call the replay function.  The solution is to add an iommu driver
> callback to efficiently walk the mappings within a MemoryRegion.
> Thanks,
>

I'm aware of those discussions; I put this info here as a helpful reminder
for anyone who wants to test my code.


> Alex
>
> > Changes from v1 to v2:
> > * remove assumption that the cache do not clears
> > * fix lockup on high load.
> >
> > Changes from v2 to v3:
> > * remove debug leftovers
> > * split to sepearate commits
> > * change is_write to flags in vtd_do_iommu_translate, add IOMMU_NO_FAIL
> >   to suppress error propagating to guest.
> >
> > Changes from v3 to v4:
> > * Add property to intel_iommu device to control the CM capability,
> >   default to False.
> > * Use s->iommu_ops.notify_flag_changed to register notifiers.
> >
> > Changes from v4 to v4 RESEND:
> > * Fix codding style pointed by checkpatch.pl script.
> >
> > Aviv Ben-David (3):
> >   IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to
> >     guest
> >   IOMMU: change iommu_op->translate's is_write to flags, add support to
> >     NO_FAIL flag mode
> >   IOMMU: enable intel_iommu map and unmap notifiers
> >
> >  exec.c                         |   3 +-
> >  hw/i386/amd_iommu.c            |   4 +-
> >  hw/i386/intel_iommu.c          | 175 ++++++++++++++++++++++++++++++
> +++--------
> >  hw/i386/intel_iommu_internal.h |   3 +
> >  include/exec/memory.h          |   6 +-
> >  include/hw/i386/intel_iommu.h  |  11 +++
> >  memory.c                       |   3 +-
> >  7 files changed, 168 insertions(+), 37 deletions(-)
> >
>
>
Aviv.


* Re: [Qemu-devel] [PATCH v4 RESEND 0/3] IOMMU: intel_iommu support map and unmap notifications
  2016-10-20 19:17   ` Aviv B.D.
@ 2016-10-20 20:06     ` Alex Williamson
  2016-10-21  0:50       ` David Gibson
  0 siblings, 1 reply; 31+ messages in thread
From: Alex Williamson @ 2016-10-20 20:06 UTC (permalink / raw)
  To: Aviv B.D.
  Cc: qemu-devel, Michael S. Tsirkin, Peter Xu, Jan Kiszka, David Gibson

[cc +david]

On Thu, 20 Oct 2016 22:17:18 +0300
"Aviv B.D." <bd.aviv@gmail.com> wrote:

> On Mon, Oct 17, 2016 at 7:07 PM, Alex Williamson <alex.williamson@redhat.com
> > wrote:  
> 
> > On Mon, 17 Oct 2016 18:44:21 +0300
> > "Aviv B.D" <bd.aviv@gmail.com> wrote:
> >  
> > > From: "Aviv Ben-David" <bd.aviv@gmail.com>
> > >
> > > * Advertize Cache Mode capability in iommu cap register.
> > >   This capability is controlled by "cache-mode" property of intel-iommu  
> > device.  
> > >   To enable this option call QEMU with "-device  
> > intel-iommu,cache-mode=true".  
> > >
> > > * On page cache invalidation in intel vIOMMU, check if the domain belong  
> > to  
> > >   registered notifier, and notify accordingly.
> > >
> > > Currently this patch still doesn't enabling VFIO devices support with  
> > vIOMMU  
> > > present. Current problems:
> > > * vfio_iommu_map_notify is not aware about memory range belong to  
> > specific  
> > >   VFIOGuestIOMMU.  
> >
> > Could you elaborate on why this is an issue?
> >  
> 
> In my setup the VFIO registered two memory areas with one page of
> unregistered memory
> between them.
> 
> When I'm calling memory_region_notify_iommu it calls the notifier function
> of VFIO twice
> when the second time is failing with warning to console as the new mapping
> is already present.
> 
> The notifier function of VFIO should ignore IOMMUTLBEntry that is not in
> the correct
> range.

Hmm, right, vfio_listener_region_add() is called for a
MemoryRegionSection, but then we add an iommu notifier to the
MemoryRegion, so we end up with a notifier per MemoryRegionSection
regardless of whether they're backed by the same MemoryRegion.  Seems
like we need a MemoryRegion-based list of VFIOGuestIOMMUs so we only
register once per MemoryRegion and then sort through the list of
VFIOGuestIOMMUs for a given MemoryRegion to find the one affected.
David, does that sound right to you?

I am curious why you get two regions separated by one page, can you
give an example of the ranges for each?  Thanks,

Alex


* Re: [Qemu-devel] [PATCH v4 RESEND 0/3] IOMMU: intel_iommu support map and unmap notifications
  2016-10-20 20:06     ` Alex Williamson
@ 2016-10-21  0:50       ` David Gibson
  2016-10-21  5:17         ` Peter Xu
  0 siblings, 1 reply; 31+ messages in thread
From: David Gibson @ 2016-10-21  0:50 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Aviv B.D., qemu-devel, Michael S. Tsirkin, Peter Xu, Jan Kiszka


On Thu, Oct 20, 2016 at 02:06:08PM -0600, Alex Williamson wrote:
> [cc +david]
> 
> On Thu, 20 Oct 2016 22:17:18 +0300
> "Aviv B.D." <bd.aviv@gmail.com> wrote:
> 
> > On Mon, Oct 17, 2016 at 7:07 PM, Alex Williamson <alex.williamson@redhat.com
> > > wrote:  
> > 
> > > On Mon, 17 Oct 2016 18:44:21 +0300
> > > "Aviv B.D" <bd.aviv@gmail.com> wrote:
> > >  
> > > > From: "Aviv Ben-David" <bd.aviv@gmail.com>
> > > >
> > > > * Advertize Cache Mode capability in iommu cap register.
> > > >   This capability is controlled by "cache-mode" property of intel-iommu  
> > > device.  
> > > >   To enable this option call QEMU with "-device  
> > > intel-iommu,cache-mode=true".  
> > > >
> > > > * On page cache invalidation in intel vIOMMU, check if the domain belong  
> > > to  
> > > >   registered notifier, and notify accordingly.
> > > >
> > > > Currently this patch still doesn't enabling VFIO devices support with  
> > > vIOMMU  
> > > > present. Current problems:
> > > > * vfio_iommu_map_notify is not aware about memory range belong to  
> > > specific  
> > > >   VFIOGuestIOMMU.  
> > >
> > > Could you elaborate on why this is an issue?
> > >  
> > 
> > In my setup the VFIO registered two memory areas with one page of
> > unregistered memory
> > between them.
> > 
> > When I'm calling memory_region_notify_iommu it calls the notifier function
> > of VFIO twice
> > when the second time is failing with warning to console as the new mapping
> > is already present.
> > 
> > The notifier function of VFIO should ignore IOMMUTLBEntry that is not in
> > the correct
> > range.
> 
> Hmm, right vfio_listener_region_add() is called for a
> MemoryRegionSection, but then we add an iommu notifier to the
> MemoryRegion, so we end up with a notifier per MemoryRegionSection
> regardless of whether they're backed by the same MemoryRegion.  Seems
> like we need a MemoryRegion based list of VFIOGuestIOMMUs so we only
> register once per MemoryRegion and then sort though the list of
> VFIOGuestIOMMUs for a given MemoryRegion to find the one affected.
> David, does that sound right to you?

Well, I think that would work.  But I think it would be better to fix
it from the other side:

We add the range to be notified into the IOMMUNotifier structure and
filter based on that range in memory_region_notify_iommu.

It means a little more list searching and filtering on notify, but it
avoids having to keep another list and search on the VFIO side.  I
think it will also deal better with cases where part of an IOMMU-mapped
region is inaccessible due to an intermediate bridge.
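
David's idea can be modelled with a small self-contained sketch (this is
roughly the direction QEMU's memory API later took, with start/end fields on
the notifier). All type names and fields below are simplified stand-ins, not
the real QEMU structures:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t hwaddr;

/* Simplified stand-ins for IOMMUTLBEntry and IOMMUNotifier. The
 * start/end range carried by the notifier is the addition David
 * proposes; `fired` stands in for the notify callback. */
typedef struct {
    hwaddr iova;
    hwaddr addr_mask;   /* entry covers [iova, iova + addr_mask] */
} IOMMUTLBEntry;

typedef struct IOMMUNotifier {
    hwaddr start;       /* first IOVA this notifier cares about */
    hwaddr end;         /* last IOVA it cares about (inclusive) */
    int fired;          /* test counter in place of n->notify(n, &entry) */
    struct IOMMUNotifier *next;
} IOMMUNotifier;

/* Core-side filtering: skip any notifier whose registered range does
 * not intersect the entry being notified. */
static void model_notify_iommu(IOMMUNotifier *head, IOMMUTLBEntry entry)
{
    hwaddr entry_end = entry.iova + entry.addr_mask;
    for (IOMMUNotifier *n = head; n; n = n->next) {
        if (entry.iova > n->end || entry_end < n->start) {
            continue;   /* no overlap: this section is unaffected */
        }
        n->fired++;     /* real code would invoke the callback here */
    }
}
```

With two notifiers standing in for two MemoryRegionSections separated by a
hole, an entry that falls in the first range fires only the first notifier,
which is exactly the filtering the duplicate-mapping warning calls for.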

> I am curious why you get two regions separated by one page, can you
> give an example of the ranges for each?  Thanks,
> 
> Alex
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



* Re: [Qemu-devel] [PATCH v4 RESEND 3/3] IOMMU: enable intel_iommu map and unmap notifiers
  2016-10-20 19:11     ` Aviv B.D.
  2016-10-20 19:11       ` Aviv B.D.
@ 2016-10-21  3:57       ` Peter Xu
  2016-10-24  7:53         ` Aviv B.D.
  1 sibling, 1 reply; 31+ messages in thread
From: Peter Xu @ 2016-10-21  3:57 UTC (permalink / raw)
  To: Aviv B.D.; +Cc: qemu-devel, Michael S. Tsirkin, Alex Williamson, Jan Kiszka

On Thu, Oct 20, 2016 at 10:11:15PM +0300, Aviv B.D. wrote:

[...]

> > > +static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
> > > +                                           uint16_t domain_id, hwaddr
> > addr,
> > > +                                           uint8_t am)
> > > +{
> >
> > The logic of this function looks strange to me.
> >
> > > +    IntelIOMMUNotifierNode *node;
> > > +
> > > +    QLIST_FOREACH(node, &(s->notifiers_list), next) {
> > > +        VTDAddressSpace *vtd_as = node->vtd_as;
> > > +        uint16_t vfio_domain_id;
> > > +        int ret = vtd_get_did_dev(s, pci_bus_num(vtd_as->bus),
> > vtd_as->devfn,
> > > +                                  &vfio_domain_id);
> > > +        if (!ret && domain_id == vfio_domain_id) {
> > > +            IOMMUTLBEntry entry;
> > > +
> > > +            /* notify unmap */
> > > +            if (node->notifier_flag & IOMMU_NOTIFIER_UNMAP) {
> >
> > First of all, if we are talking about VFIO, notifier_flag should
> > always be MAP|UNMAP. So in that case, for newly mapped entries, looks
> > like we will first send an UNMAP, then a MAP?
> >
> 
> You are correct, there is no valid reason to have notifier_flag other than
> MAP|UNMAP
> at least for VFIO.

Not sure whether you know about the upstream vhost work: vhost is going
to support the Intel IOMMU, and in that case it will only register for
the UNMAP notifier, not MAP.

> I'm not sure if in the feature there won't be good reason to do otherwise,
> so my
> code support this scenario...

My point is, I think you should not send this notify unconditionally.
IMO the logic should be simpler here, like:

    foreach (page in invalidation range) {
        IOMMUTLBEntry entry = s->iommu_ops.translate();
        if (entry.perm != IOMMU_NONE && notifier_flag & NOTIFIER_MAP) {
            /* this is a map, user requested MAP flag */
            memory_region_notify_iommu(entry);
        } else if (entry.perm == IOMMU_NONE && notifier_flag &
                   NOTIFIER_UNMAP) {
            /* this is an unmap, user requested UNMAP */
            entry = ... /* set up the entry */
            memory_region_notify_iommu(entry);
        }
    }

This is to follow your logic. I don't know whether this is efficient
enough; maybe it's good for a first version. The problem is that when you
call translate(), you have to walk the page tables from the root directory
every time. A faster way may be to provide a function that walks a specific
address range. If you are going to implement the replay logic that
Alex/David mentioned, maybe that will help too (it walks over the whole
64-bit range).
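
As a rough illustration of the loop shape suggested above, here is a
self-contained model. The translate stub, page size, and flag constants are
simplified stand-ins for the vtd_* and IOMMU_* definitions in the patch, and
the counters replace the real memory_region_notify_iommu() calls:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t hwaddr;

#define VTD_PAGE_SIZE   0x1000ULL
#define IOMMU_NONE      0
#define IOMMU_RW        3
#define NOTIFIER_MAP    (1 << 0)
#define NOTIFIER_UNMAP  (1 << 1)

typedef struct {
    hwaddr iova;
    hwaddr addr_mask;
    int perm;
} IOMMUTLBEntry;

/* Stand-in for s->iommu_ops.translate(); assumed layout: one unmapped
 * page (a hole) in the middle of an otherwise mapped range. */
static IOMMUTLBEntry fake_translate(hwaddr addr)
{
    IOMMUTLBEntry e = { addr, VTD_PAGE_SIZE - 1, IOMMU_RW };
    if (addr == 0x2000) {
        e.perm = IOMMU_NONE;    /* the hole Aviv observed */
    }
    return e;
}

static int maps, unmaps;

/* One pass over the invalidated range: emit either a MAP or an UNMAP
 * notification per page, depending on what translate() reports and
 * which flags the notifier registered for. */
static void model_page_invalidate(hwaddr addr, uint8_t am, int notifier_flag)
{
    hwaddr end = addr + ((hwaddr)1 << am) * VTD_PAGE_SIZE;
    while (addr < end) {
        IOMMUTLBEntry entry = fake_translate(addr);
        if (entry.perm != IOMMU_NONE) {
            if (notifier_flag & NOTIFIER_MAP) {
                maps++;         /* would notify the mapped entry */
            }
            addr += entry.addr_mask + 1;
        } else {
            if (notifier_flag & NOTIFIER_UNMAP) {
                unmaps++;       /* would notify an IOMMU_NONE entry */
            }
            addr += VTD_PAGE_SIZE;
        }
    }
}
```

Invalidating four pages from 0 with the hole at 0x2000 yields three map
notifications and one unmap, and a vhost-style UNMAP-only notifier sees
only the unmap.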

> 
> 
> > > +                VTD_DPRINTF(GENERAL, "Remove addr 0x%"PRIx64 " mask %d",
> > > +                            addr, am);
> > > +                entry.target_as = &address_space_memory;
> > > +                entry.iova = addr & VTD_PAGE_MASK_4K;
> > > +                entry.translated_addr = 0;
> > > +                entry.addr_mask = ~VTD_PAGE_MASK(VTD_PAGE_SHIFT_4K +
> > am);
> > > +                entry.perm = IOMMU_NONE;
> > > +                memory_region_notify_iommu(&node->vtd_as->iommu,
> > entry);
> > > +            }
> > > +
> > > +            /* notify map */
> > > +            if (node->notifier_flag & IOMMU_NOTIFIER_MAP) {
> > > +                hwaddr original_addr = addr;
> > > +                VTD_DPRINTF(GENERAL, "add addr 0x%"PRIx64 " mask %d",
> > addr, am);
> > > +                while (addr < original_addr + (1 << am) *
> > VTD_PAGE_SIZE) {
> > > +                    /* call to vtd_iommu_translate */
> > > +                    IOMMUTLBEntry entry = s->iommu_ops.translate(
> > > +
> >  &node->vtd_as->iommu,
> > > +                                                         addr,
> > > +                                                         IOMMU_NO_FAIL);
> > > +                    if (entry.perm != IOMMU_NONE) {
> > > +                        addr += entry.addr_mask + 1;
> > > +                        memory_region_notify_iommu(&node->vtd_as->iommu,
> > entry);
> > > +                    } else {
> > > +                        addr += VTD_PAGE_SIZE;
> >
> > IIUC, here is the point that we found "the page is gone" (so this is
> > an UNMAP invalidation), and we should do memory_region_iommu_notify()
> > for the whole area with IOMMU_NONE. Then we just quit the loop since
> > continuous translate()s should fail as well if the first page is
> > missing.
> >
> > Please correct if I am wrong.
> >
> 
> If I remember correctly I encounter a few cases where there was hole of
> unmaped
> memory in the middle of otherwise mapped pages. If I remember correctly it
> was
> with linux kernel 4.4, but I'm not sure.

I see. Thanks.

-- peterx


* Re: [Qemu-devel] [PATCH v4 RESEND 0/3] IOMMU: intel_iommu support map and unmap notifications
  2016-10-21  0:50       ` David Gibson
@ 2016-10-21  5:17         ` Peter Xu
  2016-10-21 14:43           ` Alex Williamson
  2016-10-24  6:03           ` David Gibson
  0 siblings, 2 replies; 31+ messages in thread
From: Peter Xu @ 2016-10-21  5:17 UTC (permalink / raw)
  To: David Gibson
  Cc: Alex Williamson, Aviv B.D., qemu-devel, Michael S. Tsirkin, Jan Kiszka

On Fri, Oct 21, 2016 at 11:50:05AM +1100, David Gibson wrote:

[...]

> > > In my setup the VFIO registered two memory areas with one page of
> > > unregistered memory
> > > between them.
> > > 
> > > When I'm calling memory_region_notify_iommu it calls the notifier function
> > > of VFIO twice
> > > when the second time is failing with warning to console as the new mapping
> > > is already present.
> > > 
> > > The notifier function of VFIO should ignore IOMMUTLBEntry that is not in
> > > the correct
> > > range.
> > 
> > Hmm, right vfio_listener_region_add() is called for a
> > MemoryRegionSection, but then we add an iommu notifier to the
> > MemoryRegion, so we end up with a notifier per MemoryRegionSection
> > regardless of whether they're backed by the same MemoryRegion.  Seems
> > like we need a MemoryRegion based list of VFIOGuestIOMMUs so we only
> > register once per MemoryRegion and then sort though the list of
> > VFIOGuestIOMMUs for a given MemoryRegion to find the one affected.
> > David, does that sound right to you?

I think we already have such a list (VFIOContainer.giommu_list)? Can
we use that to do it? When we try to add a new IOMMU notifier for a
specific MemoryRegion, we can first traverse VFIOContainer.giommu_list
to see whether that MemoryRegion is already registered, and only
register if it's the first one.
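
A minimal model of that traversal, with simplified stand-ins for
VFIOContainer and VFIOGuestIOMMU (the real structures use QLIST macros and
carry more state):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct MemoryRegion { int dummy; } MemoryRegion;

/* Each VFIOGuestIOMMU records the MemoryRegion it was created for;
 * the container keeps a singly linked list of them, modeling
 * VFIOContainer.giommu_list. */
typedef struct VFIOGuestIOMMU {
    MemoryRegion *iommu_mr;
    struct VFIOGuestIOMMU *next;
} VFIOGuestIOMMU;

typedef struct {
    VFIOGuestIOMMU *giommu_list;
    int notifiers_registered;   /* counts notifier registrations */
} VFIOContainer;

/* Before registering a notifier for a new section, walk the existing
 * list and register with the MemoryRegion only if no VFIOGuestIOMMU
 * references it yet. */
static void model_region_add(VFIOContainer *c, MemoryRegion *mr,
                             VFIOGuestIOMMU *g)
{
    bool seen = false;
    for (VFIOGuestIOMMU *it = c->giommu_list; it; it = it->next) {
        if (it->iommu_mr == mr) {
            seen = true;
            break;
        }
    }
    g->iommu_mr = mr;
    g->next = c->giommu_list;
    c->giommu_list = g;
    if (!seen) {
        c->notifiers_registered++;  /* would register the iommu notifier */
    }
}
```

Two sections backed by the same MemoryRegion then produce a single notifier
registration, avoiding the duplicate notifications described earlier in the
thread.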

> 
> Well, I think that would work.  But I think it would be better to fix
> it from the other side:
> 
> We add the range to be notified into the IOMMUNotifier structure and
> filter based on that range in memory_region_notify_iommu.
> 
> It means a little more list searching and filtering on notify, but it
> avoids having to have another list and search on the VFIO side.  I
> think it will also better deal with cases where part of an IOMMU
> mapped region is inaccessible due to an intermediate bridge.

IIUC, this will still need to keep several VFIOGuestIOMMUs which
contain exactly the same content?

Thanks,

-- peterx


* Re: [Qemu-devel] [PATCH v4 RESEND 1/3] IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to guest
  2016-10-17 15:44 ` [Qemu-devel] [PATCH v4 RESEND 1/3] IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to guest Aviv B.D
@ 2016-10-21  7:14   ` Jason Wang
  2016-10-21 19:47     ` Michael S. Tsirkin
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Wang @ 2016-10-21  7:14 UTC (permalink / raw)
  To: Aviv B.D, qemu-devel
  Cc: Jan Kiszka, Alex Williamson, Peter Xu, Michael S. Tsirkin



On 2016-10-17 at 23:44, Aviv B.D wrote:
> From: "Aviv Ben-David" <bd.aviv@gmail.com>
>
> This capability asks the guest to invalidate cache before each map operation.
> We can use this invalidation to trap map operations in the hypervisor.
>
> Signed-off-by: Aviv Ben-David <bd.aviv@gmail.com>
> ---
>   hw/i386/intel_iommu.c          | 5 +++++
>   hw/i386/intel_iommu_internal.h | 1 +
>   include/hw/i386/intel_iommu.h  | 2 ++
>   3 files changed, 8 insertions(+)

As I asked in the previous version, this may not be sufficient.

CM requires caching fault translations, which is not implemented in this
patch. A guest can easily notice this kind of spec violation.

Thanks

>
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 2efd69b..69730cb 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -2012,6 +2012,7 @@ static const MemoryRegionOps vtd_mem_ops = {
>   
>   static Property vtd_properties[] = {
>       DEFINE_PROP_UINT32("version", IntelIOMMUState, version, 0),
> +    DEFINE_PROP_BOOL("cache-mode", IntelIOMMUState, cache_mode_enabled, FALSE),
>       DEFINE_PROP_END_OF_LIST(),
>   };
>   
> @@ -2385,6 +2386,10 @@ static void vtd_init(IntelIOMMUState *s)
>           s->ecap |= VTD_ECAP_IR | VTD_ECAP_EIM | VTD_ECAP_MHMV;
>       }
>   
> +    if (s->cache_mode_enabled) {
> +        s->cap |= VTD_CAP_CM;
> +    }
> +
>       vtd_reset_context_cache(s);
>       vtd_reset_iotlb(s);
>   
> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> index 0829a50..35d9f3a 100644
> --- a/hw/i386/intel_iommu_internal.h
> +++ b/hw/i386/intel_iommu_internal.h
> @@ -201,6 +201,7 @@
>   #define VTD_CAP_MAMV                (VTD_MAMV << 48)
>   #define VTD_CAP_PSI                 (1ULL << 39)
>   #define VTD_CAP_SLLPS               ((1ULL << 34) | (1ULL << 35))
> +#define VTD_CAP_CM                  (1ULL << 7)
>   
>   /* Supported Adjusted Guest Address Widths */
>   #define VTD_CAP_SAGAW_SHIFT         8
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> index a42dbd7..7a94f16 100644
> --- a/include/hw/i386/intel_iommu.h
> +++ b/include/hw/i386/intel_iommu.h
> @@ -258,6 +258,8 @@ struct IntelIOMMUState {
>       uint8_t womask[DMAR_REG_SIZE];  /* WO (write only - read returns 0) */
>       uint32_t version;
>   
> +    bool cache_mode_enabled;        /* RO - is cap CM enabled? */
> +
>       dma_addr_t root;                /* Current root table pointer */
>       bool root_extended;             /* Type of root table (extended or not) */
>       bool dmar_enabled;              /* Set if DMA remapping is enabled */


* Re: [Qemu-devel] [PATCH v4 RESEND 0/3] IOMMU: intel_iommu support map and unmap notifications
  2016-10-21  5:17         ` Peter Xu
@ 2016-10-21 14:43           ` Alex Williamson
  2016-10-31  6:47             ` Peter Xu
  2016-10-24  6:03           ` David Gibson
  1 sibling, 1 reply; 31+ messages in thread
From: Alex Williamson @ 2016-10-21 14:43 UTC (permalink / raw)
  To: Peter Xu
  Cc: David Gibson, Aviv B.D., qemu-devel, Michael S. Tsirkin, Jan Kiszka

On Fri, 21 Oct 2016 13:17:08 +0800
Peter Xu <peterx@redhat.com> wrote:

> On Fri, Oct 21, 2016 at 11:50:05AM +1100, David Gibson wrote:
> 
> [...]
> 
> > > > In my setup the VFIO registered two memory areas with one page of
> > > > unregistered memory
> > > > between them.
> > > > 
> > > > When I'm calling memory_region_notify_iommu it calls the notifier function
> > > > of VFIO twice
> > > > when the second time is failing with warning to console as the new mapping
> > > > is already present.
> > > > 
> > > > The notifier function of VFIO should ignore IOMMUTLBEntry that is not in
> > > > the correct
> > > > range.  
> > > 
> > > Hmm, right vfio_listener_region_add() is called for a
> > > MemoryRegionSection, but then we add an iommu notifier to the
> > > MemoryRegion, so we end up with a notifier per MemoryRegionSection
> > > regardless of whether they're backed by the same MemoryRegion.  Seems
> > > like we need a MemoryRegion based list of VFIOGuestIOMMUs so we only
> > > register once per MemoryRegion and then sort though the list of
> > > VFIOGuestIOMMUs for a given MemoryRegion to find the one affected.
> > > David, does that sound right to you?  
> 
> I think we already have such a list (VFIOContainer.giommu_list)? Can
> we use that to do it? When we try to add a new IOMMU notifier for
> specific MemoryRegion, we can first traverse VFIOContainer.giommu_list
> and see whether there are existing MemoryRegion registered, and we
> only register if it's the first one.

That's effectively what I'm suggesting except I was trying to add some
efficiency by tracking both the MemoryRegion and the
MemoryRegionSection separately so we don't need to traverse a list to
see if a MemoryRegion is already registered.
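The idea Alex sketches above can be modeled in a few lines of C: register one notifier per distinct MemoryRegion, however many MemoryRegionSections are backed by it. Everything below is an illustrative toy, not QEMU's real types or API:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for QEMU structures (illustrative only). */
typedef struct MemoryRegion { int id; } MemoryRegion;

typedef struct GuestIOMMU {
    MemoryRegion *mr;
    int num_sections;           /* sections backed by this region */
    struct GuestIOMMU *next;
} GuestIOMMU;

static GuestIOMMU *giommu_list;

/* Called once per MemoryRegionSection; the (stubbed) notifier
 * registration happens only the first time a region is seen. */
static GuestIOMMU *giommu_add_section(MemoryRegion *mr)
{
    static GuestIOMMU pool[16];
    static int used;
    GuestIOMMU *g;

    for (g = giommu_list; g; g = g->next) {
        if (g->mr == mr) {      /* region already registered */
            g->num_sections++;
            return g;
        }
    }
    assert(used < 16);
    g = &pool[used++];
    g->mr = mr;
    g->num_sections = 1;        /* register_notifier(mr) would go here */
    g->next = giommu_list;
    giommu_list = g;
    return g;
}
```

With this shape the notifier fires once per MemoryRegion, and the per-section records can then be searched to find the affected VFIOGuestIOMMU.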

> > 
> > Well, I think that would work.  But I think it would be better to fix
> > it from the other side:
> > 
> > We add the range to be notified into the IOMMUNotifier structure and
> > filter based on that range in memory_region_notify_iommu.
> > 
> > It means a little more list searching and filtering on notify, but it
> > avoids having to have another list and search on the VFIO side.  I
> > think it will also better deal with cases where part of an IOMMU
> > mapped region is inaccessible due to an intermediate bridge.  
> 
> IIUC, this will still need to keep several VFIOGuestIOMMUs which
> contain exactly the same content?

They contain different offsets, since this is based on the section
rather than the MemoryRegion.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH v4 RESEND 1/3] IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to guest
  2016-10-21  7:14   ` Jason Wang
@ 2016-10-21 19:47     ` Michael S. Tsirkin
  2016-10-24  2:32       ` Jason Wang
  0 siblings, 1 reply; 31+ messages in thread
From: Michael S. Tsirkin @ 2016-10-21 19:47 UTC (permalink / raw)
  To: Jason Wang; +Cc: Aviv B.D, qemu-devel, Jan Kiszka, Alex Williamson, Peter Xu

On Fri, Oct 21, 2016 at 03:14:00PM +0800, Jason Wang wrote:
> 
> 
On 2016-10-17 23:44, Aviv B.D wrote:
> > From: "Aviv Ben-David" <bd.aviv@gmail.com>
> > 
> > This capability asks the guest to invalidate cache before each map operation.
> > We can use this invalidation to trap map operations in the hypervisor.
> > 
> > Signed-off-by: Aviv Ben-David <bd.aviv@gmail.com>
> > ---
> >   hw/i386/intel_iommu.c          | 5 +++++
> >   hw/i386/intel_iommu_internal.h | 1 +
> >   include/hw/i386/intel_iommu.h  | 2 ++
> >   3 files changed, 8 insertions(+)
> 
> As I asked in the previous version, this may not be sufficient.
> 
> CM requires caching faulting translations, which is not implemented in
> this patch.

I'm not sure why there would be a requirement to cache
fault information. A cache can always be invalidated for
any reason; in particular, an empty cache is always OK.


> Guest can easily notice this kind of spec violation.

How?

> Thanks
> 
> > 
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index 2efd69b..69730cb 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -2012,6 +2012,7 @@ static const MemoryRegionOps vtd_mem_ops = {
> >   static Property vtd_properties[] = {
> >       DEFINE_PROP_UINT32("version", IntelIOMMUState, version, 0),
> > +    DEFINE_PROP_BOOL("cache-mode", IntelIOMMUState, cache_mode_enabled, FALSE),
> >       DEFINE_PROP_END_OF_LIST(),
> >   };
> > @@ -2385,6 +2386,10 @@ static void vtd_init(IntelIOMMUState *s)
> >           s->ecap |= VTD_ECAP_IR | VTD_ECAP_EIM | VTD_ECAP_MHMV;
> >       }
> > +    if (s->cache_mode_enabled) {
> > +        s->cap |= VTD_CAP_CM;
> > +    }
> > +
> >       vtd_reset_context_cache(s);
> >       vtd_reset_iotlb(s);
> > diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> > index 0829a50..35d9f3a 100644
> > --- a/hw/i386/intel_iommu_internal.h
> > +++ b/hw/i386/intel_iommu_internal.h
> > @@ -201,6 +201,7 @@
> >   #define VTD_CAP_MAMV                (VTD_MAMV << 48)
> >   #define VTD_CAP_PSI                 (1ULL << 39)
> >   #define VTD_CAP_SLLPS               ((1ULL << 34) | (1ULL << 35))
> > +#define VTD_CAP_CM                  (1ULL << 7)
> >   /* Supported Adjusted Guest Address Widths */
> >   #define VTD_CAP_SAGAW_SHIFT         8
> > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> > index a42dbd7..7a94f16 100644
> > --- a/include/hw/i386/intel_iommu.h
> > +++ b/include/hw/i386/intel_iommu.h
> > @@ -258,6 +258,8 @@ struct IntelIOMMUState {
> >       uint8_t womask[DMAR_REG_SIZE];  /* WO (write only - read returns 0) */
> >       uint32_t version;
> > +    bool cache_mode_enabled;        /* RO - is cap CM enabled? */
> > +
> >       dma_addr_t root;                /* Current root table pointer */
> >       bool root_extended;             /* Type of root table (extended or not) */
> >       bool dmar_enabled;              /* Set if DMA remapping is enabled */


* Re: [Qemu-devel] [PATCH v4 RESEND 1/3] IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to guest
  2016-10-21 19:47     ` Michael S. Tsirkin
@ 2016-10-24  2:32       ` Jason Wang
  0 siblings, 0 replies; 31+ messages in thread
From: Jason Wang @ 2016-10-24  2:32 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jan Kiszka, Aviv B.D, Alex Williamson, qemu-devel, Peter Xu



On 2016-10-22 03:47, Michael S. Tsirkin wrote:
> On Fri, Oct 21, 2016 at 03:14:00PM +0800, Jason Wang wrote:
>> >
>> >
>> >On 2016-10-17 23:44, Aviv B.D wrote:
>>> > >From: "Aviv Ben-David"<bd.aviv@gmail.com>
>>> > >
>>> > >This capability asks the guest to invalidate cache before each map operation.
>>> > >We can use this invalidation to trap map operations in the hypervisor.
>>> > >
>>> > >Signed-off-by: Aviv Ben-David<bd.aviv@gmail.com>
>>> > >---
>>> > >   hw/i386/intel_iommu.c          | 5 +++++
>>> > >   hw/i386/intel_iommu_internal.h | 1 +
>>> > >   include/hw/i386/intel_iommu.h  | 2 ++
>>> > >   3 files changed, 8 insertions(+)
>> >
>> >As I asked in the previous version, this may not be sufficient.
>> >
>> >CM requires caching faulting translations, which is not implemented
>> >in this patch.
> I'm not sure why there would be a requirement to cache
> fault information. A cache can always be invalidated for
> any reason; in particular, an empty cache is always OK.

s/requires/may/. But what we do here is "don't cache them at all".
Isn't this an obvious violation?

An empty cache only works if we don't implement a real IOTLB but
traverse the IO page tables each time.

>
>> >Guest can easily notice this kind of spec violation.
> How?
>

I guess this may do the detection:

1) map iova A to be non-present.
2) invalidate iova A
3) access iova A
4) map iova A to addr B
5) access iova A

A correctly implemented CM may fault in step 5, but with this patch
we don't.
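Jason's five-step probe can be modeled with a toy IOTLB for a single iova, where a `cache_faults` flag selects whether non-present (faulting) translations are cached, as a CM-correct implementation may do and the patch under discussion does not. This is an illustrative simplification, not VT-d code:

```c
#include <assert.h>

/* Toy model of the probe above (single iova, illustrative only). */
#define NOT_PRESENT 0ULL

static unsigned long long page_table;   /* current mapping for the iova */

typedef struct {
    int cache_faults;                   /* cache non-present entries? */
    int cached_valid;
    unsigned long long cached;
} IOTLB;

static unsigned long long iotlb_translate(IOTLB *tlb)
{
    if (tlb->cached_valid) {
        return tlb->cached;             /* hit, even if it is a fault */
    }
    unsigned long long r = page_table;  /* walk the IO page tables */
    if (r != NOT_PRESENT || tlb->cache_faults) {
        tlb->cached_valid = 1;
        tlb->cached = r;
    }
    return r;
}

static void iotlb_invalidate(IOTLB *tlb)
{
    tlb->cached_valid = 0;
}
```

Running steps 1-5 against both variants shows the observable difference: the fault-caching IOTLB still faults at step 5, while the non-caching one silently returns the new mapping, which is what a guest could detect.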


* Re: [Qemu-devel] [PATCH v4 RESEND 0/3] IOMMU: intel_iommu support map and unmap notifications
  2016-10-21  5:17         ` Peter Xu
  2016-10-21 14:43           ` Alex Williamson
@ 2016-10-24  6:03           ` David Gibson
  1 sibling, 0 replies; 31+ messages in thread
From: David Gibson @ 2016-10-24  6:03 UTC (permalink / raw)
  To: Peter Xu
  Cc: Alex Williamson, Aviv B.D., qemu-devel, Michael S. Tsirkin, Jan Kiszka


On Fri, Oct 21, 2016 at 01:17:08PM +0800, Peter Xu wrote:
> On Fri, Oct 21, 2016 at 11:50:05AM +1100, David Gibson wrote:
> 
> [...]
> 
> > > > In my setup VFIO registered two memory areas with one page of
> > > > unregistered memory between them.
> > > > 
> > > > When I call memory_region_notify_iommu it invokes VFIO's notifier
> > > > function twice, and the second call fails with a warning on the
> > > > console because the new mapping is already present.
> > > > 
> > > > The VFIO notifier function should ignore an IOMMUTLBEntry that is
> > > > not in the correct range.
> > > 
> > > Hmm, right vfio_listener_region_add() is called for a
> > > MemoryRegionSection, but then we add an iommu notifier to the
> > > MemoryRegion, so we end up with a notifier per MemoryRegionSection
> > > regardless of whether they're backed by the same MemoryRegion.  Seems
> > > like we need a MemoryRegion based list of VFIOGuestIOMMUs so we only
> > > register once per MemoryRegion and then sort through the list of
> > > VFIOGuestIOMMUs for a given MemoryRegion to find the one affected.
> > > David, does that sound right to you?
> 
> I think we already have such a list (VFIOContainer.giommu_list)?  Can
> we use that to do it? When we try to add a new IOMMU notifier for a
> specific MemoryRegion, we can first traverse VFIOContainer.giommu_list
> to see whether that MemoryRegion is already registered, and only
> register if it is the first one.

Yes, I think that would work, but I still prefer the alternate
approach I describe below.

> > Well, I think that would work.  But I think it would be better to fix
> > it from the other side:
> > 
> > We add the range to be notified into the IOMMUNotifier structure and
> > filter based on that range in memory_region_notify_iommu.
> > 
> > It means a little more list searching and filtering on notify, but it
> > avoids having to have another list and search on the VFIO side.  I
> > think it will also better deal with cases where part of an IOMMU
> > mapped region is inaccessible due to an intermediate bridge.
> 
> IIUC, this will still need to keep several VFIOGuestIOMMUs which
> contain exactly the same content?

Well, it would no longer be exactly the same content since the
different ranges would be stored within the IOMMUNotifier
substructure.  But, yes, it would be very similar.

I think it's still worth it for improved clarity, and for the
possibility of handling future cases where part of the IOMMU region
simply isn't accessible at all in the CPU address space.
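A minimal sketch of the range-filtering idea David describes, assuming each notifier records the window it cares about and the notify path skips entries outside it. The types below are simplified stand-ins for QEMU's, not the real API:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long long hwaddr;

/* Simplified stand-ins for QEMU's types (illustrative only). */
typedef struct {
    hwaddr iova;
    hwaddr addr_mask;    /* entry covers [iova, iova + addr_mask] */
} IOMMUTLBEntry;

typedef struct IOMMUNotifier {
    hwaddr start, end;   /* inclusive range this notifier cares about */
    int fired;
    struct IOMMUNotifier *next;
} IOMMUNotifier;

/* Fire only the notifiers whose range overlaps the TLB entry. */
static void notify_iommu(IOMMUNotifier *list, const IOMMUTLBEntry *e)
{
    for (IOMMUNotifier *n = list; n; n = n->next) {
        if (e->iova > n->end || e->iova + e->addr_mask < n->start) {
            continue;    /* entirely outside this notifier's window */
        }
        n->fired++;      /* real code would invoke n->notify(n, e) */
    }
}
```

In a setup like Aviv's, two areas separated by a one-page hole, an entry inside the first area then reaches only the notifier registered for that range, avoiding the duplicate-mapping warning.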

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



* Re: [Qemu-devel] [PATCH v4 RESEND 3/3] IOMMU: enable intel_iommu map and unmap notifiers
  2016-10-21  3:57       ` Peter Xu
@ 2016-10-24  7:53         ` Aviv B.D.
  2016-10-24  8:02           ` Peter Xu
  0 siblings, 1 reply; 31+ messages in thread
From: Aviv B.D. @ 2016-10-24  7:53 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Michael S. Tsirkin, Alex Williamson, Jan Kiszka

On Fri, Oct 21, 2016 at 6:57 AM, Peter Xu <peterx@redhat.com> wrote:

> On Thu, Oct 20, 2016 at 10:11:15PM +0300, Aviv B.D. wrote:
>
> [...]
>
> > > > +static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
> > > > +                                           uint16_t domain_id,
> > > > +                                           hwaddr addr,
> > > > +                                           uint8_t am)
> > > > +{
> > >
> > > The logic of this function looks strange to me.
> > >
> > > > +    IntelIOMMUNotifierNode *node;
> > > > +
> > > > +    QLIST_FOREACH(node, &(s->notifiers_list), next) {
> > > > +        VTDAddressSpace *vtd_as = node->vtd_as;
> > > > +        uint16_t vfio_domain_id;
> > > > +        int ret = vtd_get_did_dev(s, pci_bus_num(vtd_as->bus),
> > > > +                                  vtd_as->devfn, &vfio_domain_id);
> > > > +        if (!ret && domain_id == vfio_domain_id) {
> > > > +            IOMMUTLBEntry entry;
> > > > +
> > > > +            /* notify unmap */
> > > > +            if (node->notifier_flag & IOMMU_NOTIFIER_UNMAP) {
> > >
> > > First of all, if we are talking about VFIO, notifier_flag should
> > > always be MAP|UNMAP. So in that case, for newly mapped entries, looks
> > > like we will first send an UNMAP, then a MAP?
> > >
> >
> > You are correct, there is no valid reason to have notifier_flag other
> > than MAP|UNMAP, at least for VFIO.
>
> Not sure whether you know about the upstream vhost work: vhost is
> going to support Intel IOMMU, and in that case it will only register
> the UNMAP notifier, not MAP.
>
> > I'm not sure that in the future there won't be a good reason to do
> > otherwise, so my code supports this scenario...
>
> My point is, I think you should not send this notify unconditionally.
> IMO the logic should be simpler here, like:
>
>     foreach (page in invalidation range) {
>         IOTLBEntry entry = m->iommu_ops.translate();
>         if (entry.perm != IOMMU_NONE && notifier_flag & NOTIFIER_MAP) {
>             /* this is map, user requested MAP flag */
>             memory_region_iommu_notify(entry);
>         } else if (entry.perm == IOMMU_NONE && notifier_flag &
>                 NOTIFIER_UNMAP) {
>             /* this is unmap, user requested UNMAP */
>             entry = ... /* setup the entry */
>             memory_region_iommu_notify(entry);
>         }
>     }
>

This was my first algorithm, but VFIO does not support remapping an
already mapped page. Before each MAP operation in VFIO one must unmap
first, and therefore I'm sending the unmap notifications blindly
beforehand. I can rearrange my code closer to your suggestion.
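The unmap-before-map behavior described above can be sketched with a toy model of a VFIO-like backend that rejects double maps; the hypothetical `notify_map()` below sends the blind unmap first so that a remap always succeeds. Names are illustrative, not QEMU's or the kernel's:

```c
#include <assert.h>

/* Toy per-page map table standing in for the VFIO type1 backend. */
#define MAX_PAGES 16
static int vfio_mapped[MAX_PAGES];

static int vfio_dma_map(int page)
{
    if (vfio_mapped[page]) {
        return -1;              /* double map is rejected */
    }
    vfio_mapped[page] = 1;
    return 0;
}

static void vfio_dma_unmap(int page)
{
    vfio_mapped[page] = 0;      /* unmapping a hole is a no-op here */
}

/* Blind unmap-before-map, as described above (hypothetical helper). */
static int notify_map(int page)
{
    vfio_dma_unmap(page);       /* unconditional UNMAP notification */
    return vfio_dma_map(page);  /* the MAP then starts from a clean slot */
}
```

The cost is a redundant UNMAP for pages that were never mapped, which is why Peter suggests restructuring the loop so each page gets exactly one notification.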


>
> This is to follow your logic. I don't know whether this is efficient
> enough; maybe it is good for the first version. The problem is, when
> you call translate(), you will need to walk the page tables every time
> from the root directory. A faster way may be: provide a function to
> walk a specific address range. If you are going to implement the
> replay logic that Alex/David mentioned, maybe that will help too (walk
> over the whole 64-bit range).
>
Interesting idea, but I prefer to add it in a separate patch set after
this one is committed, if that's OK.


> >
> >
> > > > +                VTD_DPRINTF(GENERAL, "Remove addr 0x%"PRIx64 " mask %d",
> > > > +                            addr, am);
> > > > +                entry.target_as = &address_space_memory;
> > > > +                entry.iova = addr & VTD_PAGE_MASK_4K;
> > > > +                entry.translated_addr = 0;
> > > > +                entry.addr_mask = ~VTD_PAGE_MASK(VTD_PAGE_SHIFT_4K + am);
> > > > +                entry.perm = IOMMU_NONE;
> > > > +                memory_region_notify_iommu(&node->vtd_as->iommu, entry);
> > > > +            }
> > > > +
> > > > +            /* notify map */
> > > > +            if (node->notifier_flag & IOMMU_NOTIFIER_MAP) {
> > > > +                hwaddr original_addr = addr;
> > > > +                VTD_DPRINTF(GENERAL, "add addr 0x%"PRIx64 " mask %d",
> > > > +                            addr, am);
> > > > +                while (addr < original_addr + (1 << am) * VTD_PAGE_SIZE) {
> > > > +                    /* call to vtd_iommu_translate */
> > > > +                    IOMMUTLBEntry entry = s->iommu_ops.translate(
> > > > +                                              &node->vtd_as->iommu,
> > > > +                                              addr, IOMMU_NO_FAIL);
> > > > +                    if (entry.perm != IOMMU_NONE) {
> > > > +                        addr += entry.addr_mask + 1;
> > > > +                        memory_region_notify_iommu(&node->vtd_as->iommu,
> > > > +                                                   entry);
> > > > +                    } else {
> > > > +                        addr += VTD_PAGE_SIZE;
> > >
> > > IIUC, here is the point that we found "the page is gone" (so this is
> > > an UNMAP invalidation), and we should do memory_region_iommu_notify()
> > > for the whole area with IOMMU_NONE. Then we just quit the loop since
> > > continuous translate()s should fail as well if the first page is
> > > missing.
> > >
> > > Please correct me if I am wrong.
> > >
> >
> > If I remember correctly, I encountered a few cases where there was a
> > hole of unmapped memory in the middle of otherwise mapped pages. I
> > think it was with Linux kernel 4.4, but I'm not sure.
>
> I see. Thanks.
>
> -- peterx
>

Best regards,
Aviv.


* Re: [Qemu-devel] [PATCH v4 RESEND 3/3] IOMMU: enable intel_iommu map and unmap notifiers
  2016-10-24  7:53         ` Aviv B.D.
@ 2016-10-24  8:02           ` Peter Xu
  2016-10-25 10:07             ` Aviv B.D.
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Xu @ 2016-10-24  8:02 UTC (permalink / raw)
  To: Aviv B.D.; +Cc: qemu-devel, Michael S. Tsirkin, Alex Williamson, Jan Kiszka

On Mon, Oct 24, 2016 at 10:53:01AM +0300, Aviv B.D. wrote:

[...]

> This was my first algorithm, but VFIO does not support remapping an
> already mapped page. Before each MAP operation in VFIO one must unmap
> first, and therefore I'm sending the unmap notifications blindly
> beforehand. I can rearrange my code closer to your suggestion.

If so, I would suggest we solve the real problem first: we should not
notify VFIO twice on map(), but only once. IMO either Alex's or
David's suggestion (in the other mail) is a good start.

> 
> 
> >
> > This is to follow your logic. I don't know whether this is efficient
> > enough; maybe it is good for the first version. The problem is, when
> > you call translate(), you will need to walk the page tables every
> > time from the root directory. A faster way may be: provide a
> > function to walk a specific address range. If you are going to
> > implement the replay logic that Alex/David mentioned, maybe that
> > will help too (walk over the whole 64-bit range).
> >
> Interesting idea, but I prefer to add it in a separate patch set after
> this one is committed, if that's OK.

Sure.

-- peterx


* Re: [Qemu-devel] [PATCH v4 RESEND 3/3] IOMMU: enable intel_iommu map and unmap notifiers
  2016-10-24  8:02           ` Peter Xu
@ 2016-10-25 10:07             ` Aviv B.D.
  0 siblings, 0 replies; 31+ messages in thread
From: Aviv B.D. @ 2016-10-25 10:07 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Michael S. Tsirkin, Alex Williamson, Jan Kiszka

On Mon, Oct 24, 2016 at 11:02 AM, Peter Xu <peterx@redhat.com> wrote:

> On Mon, Oct 24, 2016 at 10:53:01AM +0300, Aviv B.D. wrote:
>
> [...]
>
> > This was my first algorithm, but VFIO does not support remapping an
> > already mapped page. Before each MAP operation in VFIO one must
> > unmap first, and therefore I'm sending the unmap notifications
> > blindly beforehand. I can rearrange my code closer to your
> > suggestion.
>
> If so, I would suggest we solve the real problem first: we should not
> notify VFIO twice on map(), but only once. IMO either Alex's or
> David's suggestion (in the other mail) is a good start.
>
OK. I will publish a new patch set that notifies only once per page.
I prefer David's suggestion of adding the range information to the
notifier struct and checking it from inside the VFIO notification
function.


> >
> >
> > >
> > > This is to follow your logic. I don't know whether this is
> > > efficient enough; maybe it is good for the first version. The
> > > problem is, when you call translate(), you will need to walk the
> > > page tables every time from the root directory. A faster way may
> > > be: provide a function to walk a specific address range. If you
> > > are going to implement the replay logic that Alex/David mentioned,
> > > maybe that will help too (walk over the whole 64-bit range).
> > >
> > Interesting idea, but I prefer to add it in a separate patch set
> > after this one is committed, if that's OK.
>
> Sure.
>
> -- peterx
>

Thanks,
Aviv.


* Re: [Qemu-devel] [PATCH v4 RESEND 0/3] IOMMU: intel_iommu support map and unmap notifications
  2016-10-21 14:43           ` Alex Williamson
@ 2016-10-31  6:47             ` Peter Xu
  0 siblings, 0 replies; 31+ messages in thread
From: Peter Xu @ 2016-10-31  6:47 UTC (permalink / raw)
  To: Alex Williamson
  Cc: David Gibson, Aviv B.D., qemu-devel, Michael S. Tsirkin, Jan Kiszka

On Fri, Oct 21, 2016 at 08:43:05AM -0600, Alex Williamson wrote:

[...]

> > > Well, I think that would work.  But I think it would be better to fix
> > > it from the other side:
> > > 
> > > We add the range to be notified into the IOMMUNotifier structure and
> > > filter based on that range in memory_region_notify_iommu.
> > > 
> > > It means a little more list searching and filtering on notify, but it
> > > avoids having to have another list and search on the VFIO side.  I
> > > think it will also better deal with cases where part of an IOMMU
> > > mapped region is inaccessible due to an intermediate bridge.  
> > 
> > IIUC, this will still need to keep several VFIOGuestIOMMUs which
> > contain exactly the same content?
> 
> They contain different offsets, since this is based on the section
> rather than the MemoryRegion.  Thanks,

Though vfio_listener_region_add() is called per memory region section,
VFIOGuestIOMMU looks like it is per memory region? See in
vfio_listener_region_add():

        giommu->iommu_offset = section->offset_within_address_space -
                               section->offset_within_region;

Here VFIOGuestIOMMU.iommu_offset should be the offset of the memory
region (rather than the memory region section offset), right?
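For illustration, the `iommu_offset` computation quoted above can be checked with made-up numbers: two sections of the same region, at different offsets within it, yield the same `iommu_offset`, which is why it reads as a per-region rather than per-section quantity. Types and values below are simplified assumptions, not QEMU's real structures:

```c
#include <assert.h>

typedef unsigned long long hwaddr;

/* Simplified stand-in for a MemoryRegionSection (illustrative). */
typedef struct {
    hwaddr offset_within_address_space;
    hwaddr offset_within_region;
} Section;

/* giommu->iommu_offset as computed in vfio_listener_region_add(). */
static hwaddr iommu_offset(const Section *s)
{
    return s->offset_within_address_space - s->offset_within_region;
}
```

If a region is mapped contiguously at guest physical 0x100000000, a section at region offset 0 and a section at region offset 0x2000 produce the same `iommu_offset`, independent of which section they came from.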

Thanks,

-- peterx


end of thread, other threads:[~2016-10-31  6:47 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-17 15:44 [Qemu-devel] [PATCH v4 RESEND 0/3] IOMMU: intel_iommu support map and unmap notifications Aviv B.D
2016-10-17 15:44 ` [Qemu-devel] [PATCH v4 RESEND 1/3] IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to guest Aviv B.D
2016-10-21  7:14   ` Jason Wang
2016-10-21 19:47     ` Michael S. Tsirkin
2016-10-24  2:32       ` Jason Wang
2016-10-17 15:44 ` [Qemu-devel] [PATCH v4 RESEND 2/3] IOMMU: change iommu_op->translate's is_write to flags, add support to NO_FAIL flag mode Aviv B.D
2016-10-18  3:57   ` David Gibson
2016-10-19  8:35   ` Peter Xu
2016-10-20 18:54     ` Aviv B.D.
2016-10-17 15:44 ` [Qemu-devel] [PATCH v4 RESEND 3/3] IOMMU: enable intel_iommu map and unmap notifiers Aviv B.D
2016-10-18  4:04   ` David Gibson
2016-10-19  9:33   ` Peter Xu
2016-10-20 19:11     ` Aviv B.D.
2016-10-20 19:11       ` Aviv B.D.
2016-10-21  3:57       ` Peter Xu
2016-10-24  7:53         ` Aviv B.D.
2016-10-24  8:02           ` Peter Xu
2016-10-25 10:07             ` Aviv B.D.
2016-10-20  7:28   ` Peter Xu
2016-10-17 16:07 ` [Qemu-devel] [PATCH v4 RESEND 0/3] IOMMU: intel_iommu support map and unmap notifications Alex Williamson
2016-10-18  4:06   ` David Gibson
2016-10-18  4:47     ` Alex Williamson
2016-10-18  5:52       ` David Gibson
2016-10-18  8:03         ` Alex Williamson
2016-10-20 19:17   ` Aviv B.D.
2016-10-20 20:06     ` Alex Williamson
2016-10-21  0:50       ` David Gibson
2016-10-21  5:17         ` Peter Xu
2016-10-21 14:43           ` Alex Williamson
2016-10-31  6:47             ` Peter Xu
2016-10-24  6:03           ` David Gibson
