All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v6 0/3] IOMMU: intel_iommu support map and unmap notifications
@ 2016-11-08 11:04 Aviv B.D
  2016-11-08 11:04 ` [Qemu-devel] [PATCH v6 1/3] IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to guest Aviv B.D
                   ` (3 more replies)
  0 siblings, 4 replies; 32+ messages in thread
From: Aviv B.D @ 2016-11-08 11:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Alex Williamson, Peter Xu, Jan Kiszka,
	Aviv Ben-David

From: "Aviv Ben-David" <bd.aviv@gmail.com>

* Advertize Cache Mode capability in iommu cap register. 
  This capability is controlled by "cache-mode" property of intel-iommu device.
  To enable this option call QEMU with "-device intel-iommu,cache-mode=true".

* On page cache invalidation in intel vIOMMU, check if the domain belong to
  registered notifier, and notify accordingly.

Currently this patch still doesn't enabling VFIO devices support with vIOMMU 
present. Current problems:
* vfio_iommu_map_notify is not aware about memory range belong to specific 
  VFIOGuestIOMMU.
* memory_region_iommu_replay hangs QEMU on start up while it itterate over 
  64bit address space. Commenting out the call to this function enables 
  workable VFIO device while vIOMMU present.
* vfio_iommu_map_notify should check if address space range is suitable for 
  current notifier.

Changes from v1 to v2:
* remove assumption that the cache do not clears
* fix lockup on high load.

Changes from v2 to v3:
* remove debug leftovers
* split to sepearate commits
* change is_write to flags in vtd_do_iommu_translate, add IOMMU_NO_FAIL 
  to suppress error propagating to guest.

Changes from v3 to v4:
* Add property to intel_iommu device to control the CM capability, 
  default to False.
* Use s->iommu_ops.notify_flag_changed to register notifiers.

Changes from v4 to v4 RESEND:
* Fix codding style pointed by checkpatch.pl script.

Changes from v4 to v5:
* Reduce the number of changes in patch 2 and make flags real bitfield.
* Revert deleted debug prints.
* Fix memory leak in patch 3.

Changes from v5 to v6:
* fix prototype of iommu_translate function for more IOMMU types.
* VFIO will be notified only on the difference, without unmap 
  before change to maps.

Aviv Ben-David (3):
  IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to
    guest
  IOMMU: change iommu_op->translate's is_write to flags, add support to
    NO_FAIL flag mode
  IOMMU: enable intel_iommu map and unmap notifiers

 exec.c                         |   3 +-
 hw/alpha/typhoon.c             |   2 +-
 hw/i386/amd_iommu.c            |   4 +-
 hw/i386/intel_iommu.c          | 160 +++++++++++++++++++++++++++++++++--------
 hw/i386/intel_iommu_internal.h |   3 +
 hw/pci-host/apb.c              |   2 +-
 hw/ppc/spapr_iommu.c           |   2 +-
 hw/s390x/s390-pci-bus.c        |   2 +-
 include/exec/memory.h          |   6 +-
 include/hw/i386/intel_iommu.h  |  11 +++
 memory.c                       |   3 +-
 11 files changed, 160 insertions(+), 38 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH v6 1/3] IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to guest
  2016-11-08 11:04 [Qemu-devel] [PATCH v6 0/3] IOMMU: intel_iommu support map and unmap notifications Aviv B.D
@ 2016-11-08 11:04 ` Aviv B.D
  2016-11-09  7:28   ` Jason Wang
  2016-11-08 11:04 ` [Qemu-devel] [PATCH v6 2/3] IOMMU: change iommu_op->translate's is_write to flags, add support to NO_FAIL flag mode Aviv B.D
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 32+ messages in thread
From: Aviv B.D @ 2016-11-08 11:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Alex Williamson, Peter Xu, Jan Kiszka,
	Aviv Ben-David

From: "Aviv Ben-David" <bd.aviv@gmail.com>

This capability asks the guest to invalidate cache before each map operation.
We can use this invalidation to trap map operations in the hypervisor.

Signed-off-by: Aviv Ben-David <bd.aviv@gmail.com>
---
 hw/i386/intel_iommu.c          | 5 +++++
 hw/i386/intel_iommu_internal.h | 1 +
 include/hw/i386/intel_iommu.h  | 2 ++
 3 files changed, 8 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 1655a65..834887f 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2017,6 +2017,7 @@ static Property vtd_properties[] = {
     DEFINE_PROP_ON_OFF_AUTO("eim", IntelIOMMUState, intr_eim,
                             ON_OFF_AUTO_AUTO),
     DEFINE_PROP_BOOL("x-buggy-eim", IntelIOMMUState, buggy_eim, false),
+    DEFINE_PROP_BOOL("cache-mode", IntelIOMMUState, cache_mode_enabled, FALSE),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -2391,6 +2392,10 @@ static void vtd_init(IntelIOMMUState *s)
         assert(s->intr_eim != ON_OFF_AUTO_AUTO);
     }
 
+    if (s->cache_mode_enabled) {
+        s->cap |= VTD_CAP_CM;
+    }
+
     vtd_reset_context_cache(s);
     vtd_reset_iotlb(s);
 
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 0829a50..35d9f3a 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -201,6 +201,7 @@
 #define VTD_CAP_MAMV                (VTD_MAMV << 48)
 #define VTD_CAP_PSI                 (1ULL << 39)
 #define VTD_CAP_SLLPS               ((1ULL << 34) | (1ULL << 35))
+#define VTD_CAP_CM                  (1ULL << 7)
 
 /* Supported Adjusted Guest Address Widths */
 #define VTD_CAP_SAGAW_SHIFT         8
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 1989c1e..42d293f 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -258,6 +258,8 @@ struct IntelIOMMUState {
     uint8_t womask[DMAR_REG_SIZE];  /* WO (write only - read returns 0) */
     uint32_t version;
 
+    bool cache_mode_enabled;        /* RO - is cap CM enabled? */
+
     dma_addr_t root;                /* Current root table pointer */
     bool root_extended;             /* Type of root table (extended or not) */
     bool dmar_enabled;              /* Set if DMA remapping is enabled */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH v6 2/3] IOMMU: change iommu_op->translate's is_write to flags, add support to NO_FAIL flag mode
  2016-11-08 11:04 [Qemu-devel] [PATCH v6 0/3] IOMMU: intel_iommu support map and unmap notifications Aviv B.D
  2016-11-08 11:04 ` [Qemu-devel] [PATCH v6 1/3] IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to guest Aviv B.D
@ 2016-11-08 11:04 ` Aviv B.D
  2016-11-08 11:04 ` [Qemu-devel] [PATCH v6 3/3] IOMMU: enable intel_iommu map and unmap notifiers Aviv B.D
  2016-11-10 15:14 ` [Qemu-devel] [PATCH v6 0/3] IOMMU: intel_iommu support map and unmap notifications Michael S. Tsirkin
  3 siblings, 0 replies; 32+ messages in thread
From: Aviv B.D @ 2016-11-08 11:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Alex Williamson, Peter Xu, Jan Kiszka,
	Aviv Ben-David

From: "Aviv Ben-David" <bd.aviv@gmail.com>

Supports translation trials without reporting error to guest on
translation failure.

Signed-off-by: Aviv Ben-David <bd.aviv@gmail.com>
---
 exec.c                  |  3 ++-
 hw/alpha/typhoon.c      |  2 +-
 hw/i386/amd_iommu.c     |  4 ++--
 hw/i386/intel_iommu.c   | 59 +++++++++++++++++++++++++++++++------------------
 hw/pci-host/apb.c       |  2 +-
 hw/ppc/spapr_iommu.c    |  2 +-
 hw/s390x/s390-pci-bus.c |  2 +-
 include/exec/memory.h   |  6 +++--
 memory.c                |  3 ++-
 9 files changed, 52 insertions(+), 31 deletions(-)

diff --git a/exec.c b/exec.c
index 963b91a..5687c2a 100644
--- a/exec.c
+++ b/exec.c
@@ -466,7 +466,8 @@ MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr,
             break;
         }
 
-        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
+        iotlb = mr->iommu_ops->translate(mr, addr,
+                                         is_write ? IOMMU_WO : IOMMU_RO);
         addr = ((iotlb.translated_addr & ~iotlb.addr_mask)
                 | (addr & iotlb.addr_mask));
         *plen = MIN(*plen, (addr | iotlb.addr_mask) - addr + 1);
diff --git a/hw/alpha/typhoon.c b/hw/alpha/typhoon.c
index 883db13..a2840ac 100644
--- a/hw/alpha/typhoon.c
+++ b/hw/alpha/typhoon.c
@@ -664,7 +664,7 @@ static bool window_translate(TyphoonWindow *win, hwaddr addr,
 /* TODO: A translation failure here ought to set PCI error codes on the
    Pchip and generate a machine check interrupt.  */
 static IOMMUTLBEntry typhoon_translate_iommu(MemoryRegion *iommu, hwaddr addr,
-                                             bool is_write)
+                                             IOMMUAccessFlags flags)
 {
     TyphoonPchip *pchip = container_of(iommu, TyphoonPchip, iommu);
     IOMMUTLBEntry ret;
diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 47b79d9..1f0d76b 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -988,7 +988,7 @@ static inline bool amdvi_is_interrupt_addr(hwaddr addr)
 }
 
 static IOMMUTLBEntry amdvi_translate(MemoryRegion *iommu, hwaddr addr,
-                                     bool is_write)
+                                     IOMMUAccessFlags flags)
 {
     AMDVIAddressSpace *as = container_of(iommu, AMDVIAddressSpace, iommu);
     AMDVIState *s = as->iommu_state;
@@ -1017,7 +1017,7 @@ static IOMMUTLBEntry amdvi_translate(MemoryRegion *iommu, hwaddr addr,
         return ret;
     }
 
-    amdvi_do_translate(as, addr, is_write, &ret);
+    amdvi_do_translate(as, addr, flags & IOMMU_WO, &ret);
     trace_amdvi_translation_result(as->bus_num, PCI_SLOT(as->devfn),
             PCI_FUNC(as->devfn), addr, ret.translated_addr);
     return ret;
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 834887f..6ba915c 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -631,7 +631,8 @@ static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, uint32_t level)
 /* Given the @gpa, get relevant @slptep. @slpte_level will be the last level
  * of the translation, can be used for deciding the size of large page.
  */
-static int vtd_gpa_to_slpte(VTDContextEntry *ce, uint64_t gpa, bool is_write,
+static int vtd_gpa_to_slpte(VTDContextEntry *ce, uint64_t gpa,
+                            IOMMUAccessFlags flags,
                             uint64_t *slptep, uint32_t *slpte_level,
                             bool *reads, bool *writes)
 {
@@ -640,7 +641,7 @@ static int vtd_gpa_to_slpte(VTDContextEntry *ce, uint64_t gpa, bool is_write,
     uint32_t offset;
     uint64_t slpte;
     uint32_t ce_agaw = vtd_get_agaw_from_context_entry(ce);
-    uint64_t access_right_check;
+    uint64_t access_right_check = 0;
 
     /* Check if @gpa is above 2^X-1, where X is the minimum of MGAW in CAP_REG
      * and AW in context-entry.
@@ -651,7 +652,15 @@ static int vtd_gpa_to_slpte(VTDContextEntry *ce, uint64_t gpa, bool is_write,
     }
 
     /* FIXME: what is the Atomics request here? */
-    access_right_check = is_write ? VTD_SL_W : VTD_SL_R;
+    if (flags & IOMMU_WO) {
+        access_right_check |= VTD_SL_W;
+    }
+    if (flags & IOMMU_RO) {
+        access_right_check |= VTD_SL_R;
+    }
+    if (flags & IOMMU_NO_FAIL) {
+        access_right_check |= VTD_SL_R | VTD_SL_W;
+    }
 
     while (true) {
         offset = vtd_gpa_level_offset(gpa, level);
@@ -673,8 +682,8 @@ static int vtd_gpa_to_slpte(VTDContextEntry *ce, uint64_t gpa, bool is_write,
         if (!(slpte & access_right_check)) {
             VTD_DPRINTF(GENERAL, "error: lack of %s permission for "
                         "gpa 0x%"PRIx64 " slpte 0x%"PRIx64,
-                        (is_write ? "write" : "read"), gpa, slpte);
-            return is_write ? -VTD_FR_WRITE : -VTD_FR_READ;
+                        (flags & IOMMU_WO ? "write" : "read"), gpa, slpte);
+            return (flags & IOMMU_WO) ? -VTD_FR_WRITE : -VTD_FR_READ;
         }
         if (vtd_slpte_nonzero_rsvd(slpte, level)) {
             VTD_DPRINTF(GENERAL, "error: non-zero reserved field in second "
@@ -791,11 +800,13 @@ static inline bool vtd_is_interrupt_addr(hwaddr addr)
  *
  * @bus_num: The bus number
  * @devfn: The devfn, which is the  combined of device and function number
- * @is_write: The access is a write operation
+ * @flags: The access permission of the operation, use IOMMU_NO_FAIL to
+ *         suppress translation errors (e.g. no mapping present)
  * @entry: IOMMUTLBEntry that contain the addr to be translated and result
  */
 static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
-                                   uint8_t devfn, hwaddr addr, bool is_write,
+                                   uint8_t devfn, hwaddr addr,
+                                   IOMMUAccessFlags flags,
                                    IOMMUTLBEntry *entry)
 {
     IntelIOMMUState *s = vtd_as->iommu_state;
@@ -813,7 +824,7 @@ static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
 
     /* Check if the request is in interrupt address range */
     if (vtd_is_interrupt_addr(addr)) {
-        if (is_write) {
+        if (flags & IOMMU_WO) {
             /* FIXME: since we don't know the length of the access here, we
              * treat Non-DWORD length write requests without PASID as
              * interrupt requests, too. Withoud interrupt remapping support,
@@ -829,7 +840,8 @@ static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
         } else {
             VTD_DPRINTF(GENERAL, "error: read request from interrupt address "
                         "gpa 0x%"PRIx64, addr);
-            vtd_report_dmar_fault(s, source_id, addr, VTD_FR_READ, is_write);
+            vtd_report_dmar_fault(s, source_id, addr, VTD_FR_READ,
+                    flags & IOMMU_WO);
             return;
         }
     }
@@ -858,12 +870,15 @@ static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
         is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
         if (ret_fr) {
             ret_fr = -ret_fr;
-            if (is_fpd_set && vtd_is_qualified_fault(ret_fr)) {
-                VTD_DPRINTF(FLOG, "fault processing is disabled for DMA "
+            if (!(flags & IOMMU_NO_FAIL)) {
+                if (is_fpd_set && vtd_is_qualified_fault(ret_fr)) {
+                    VTD_DPRINTF(FLOG, "fault processing is disabled for DMA "
                             "requests through this context-entry "
                             "(with FPD Set)");
-            } else {
-                vtd_report_dmar_fault(s, source_id, addr, ret_fr, is_write);
+                } else {
+                    vtd_report_dmar_fault(s, source_id, addr, ret_fr,
+                            flags & IOMMU_WO);
+                }
             }
             return;
         }
@@ -876,15 +891,17 @@ static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
         cc_entry->context_cache_gen = s->context_cache_gen;
     }
 
-    ret_fr = vtd_gpa_to_slpte(&ce, addr, is_write, &slpte, &level,
+    ret_fr = vtd_gpa_to_slpte(&ce, addr, flags, &slpte, &level,
                               &reads, &writes);
     if (ret_fr) {
         ret_fr = -ret_fr;
-        if (is_fpd_set && vtd_is_qualified_fault(ret_fr)) {
-            VTD_DPRINTF(FLOG, "fault processing is disabled for DMA requests "
-                        "through this context-entry (with FPD Set)");
-        } else {
-            vtd_report_dmar_fault(s, source_id, addr, ret_fr, is_write);
+        if (!(flags & IOMMU_NO_FAIL)) {
+            if (is_fpd_set && vtd_is_qualified_fault(ret_fr)) {
+                VTD_DPRINTF(FLOG, "fault processing is disabled for DMA "
+                        "requests through this context-entry (with FPD Set)");
+            } else {
+                vtd_report_dmar_fault(s, source_id, addr, ret_fr, flags);
+            }
         }
         return;
     }
@@ -1946,7 +1963,7 @@ static void vtd_mem_write(void *opaque, hwaddr addr,
 }
 
 static IOMMUTLBEntry vtd_iommu_translate(MemoryRegion *iommu, hwaddr addr,
-                                         bool is_write)
+                                         IOMMUAccessFlags flags)
 {
     VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
     IntelIOMMUState *s = vtd_as->iommu_state;
@@ -1968,7 +1985,7 @@ static IOMMUTLBEntry vtd_iommu_translate(MemoryRegion *iommu, hwaddr addr,
     }
 
     vtd_do_iommu_translate(vtd_as, vtd_as->bus, vtd_as->devfn, addr,
-                           is_write, &ret);
+                           flags, &ret);
     VTD_DPRINTF(MMU,
                 "bus %"PRIu8 " slot %"PRIu8 " func %"PRIu8 " devfn %"PRIu8
                 " gpa 0x%"PRIx64 " hpa 0x%"PRIx64, pci_bus_num(vtd_as->bus),
diff --git a/hw/pci-host/apb.c b/hw/pci-host/apb.c
index 653e711..eaa7f34 100644
--- a/hw/pci-host/apb.c
+++ b/hw/pci-host/apb.c
@@ -209,7 +209,7 @@ static AddressSpace *pbm_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
 
 /* Called from RCU critical section */
 static IOMMUTLBEntry pbm_translate_iommu(MemoryRegion *iommu, hwaddr addr,
-                                         bool is_write)
+                                         IOMMUAccessFlags flags)
 {
     IOMMUState *is = container_of(iommu, IOMMUState, iommu);
     hwaddr baseaddr, offset;
diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index ae30bbe..d19c3ff 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -110,7 +110,7 @@ static void spapr_tce_free_table(uint64_t *table, int fd, uint32_t nb_table)
 
 /* Called from RCU critical section */
 static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr,
-                                               bool is_write)
+                                               IOMMUAccessFlags flags)
 {
     sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
     uint64_t tce;
diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index b7f8bca..964b78e 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -378,7 +378,7 @@ out:
 }
 
 static IOMMUTLBEntry s390_translate_iommu(MemoryRegion *iommu, hwaddr addr,
-                                          bool is_write)
+                                          IOMMUAccessFlags flags)
 {
     uint64_t pte;
     uint32_t flags;
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 79ccaab..eab464a 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -57,6 +57,7 @@ typedef enum {
     IOMMU_RO   = 1,
     IOMMU_WO   = 2,
     IOMMU_RW   = 3,
+    IOMMU_NO_FAIL  = 4, /* may not be present, don't repport error to guest */
 } IOMMUAccessFlags;
 
 struct IOMMUTLBEntry {
@@ -168,10 +169,11 @@ struct MemoryRegionOps {
 };
 
 typedef struct MemoryRegionIOMMUOps MemoryRegionIOMMUOps;
-
 struct MemoryRegionIOMMUOps {
     /* Return a TLB entry that contains a given address. */
-    IOMMUTLBEntry (*translate)(MemoryRegion *iommu, hwaddr addr, bool is_write);
+    IOMMUTLBEntry (*translate)(MemoryRegion *iommu,
+                               hwaddr addr,
+                               IOMMUAccessFlags flags);
     /* Returns minimum supported page size */
     uint64_t (*get_min_page_size)(MemoryRegion *iommu);
     /* Called when IOMMU Notifier flag changed */
diff --git a/memory.c b/memory.c
index edbc701..83ec3ec 100644
--- a/memory.c
+++ b/memory.c
@@ -1555,7 +1555,8 @@ void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
     granularity = memory_region_iommu_get_min_page_size(mr);
 
     for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
-        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
+        iotlb = mr->iommu_ops->translate(mr, addr,
+                                         is_write ? IOMMU_WO : IOMMU_RO);
         if (iotlb.perm != IOMMU_NONE) {
             n->notify(n, &iotlb);
         }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH v6 3/3] IOMMU: enable intel_iommu map and unmap notifiers
  2016-11-08 11:04 [Qemu-devel] [PATCH v6 0/3] IOMMU: intel_iommu support map and unmap notifications Aviv B.D
  2016-11-08 11:04 ` [Qemu-devel] [PATCH v6 1/3] IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to guest Aviv B.D
  2016-11-08 11:04 ` [Qemu-devel] [PATCH v6 2/3] IOMMU: change iommu_op->translate's is_write to flags, add support to NO_FAIL flag mode Aviv B.D
@ 2016-11-08 11:04 ` Aviv B.D
  2016-11-10 15:13   ` Michael S. Tsirkin
  2016-11-10 15:14 ` [Qemu-devel] [PATCH v6 0/3] IOMMU: intel_iommu support map and unmap notifications Michael S. Tsirkin
  3 siblings, 1 reply; 32+ messages in thread
From: Aviv B.D @ 2016-11-08 11:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Alex Williamson, Peter Xu, Jan Kiszka,
	Aviv Ben-David

From: "Aviv Ben-David" <bd.aviv@gmail.com>

Adds a list of registered vtd_as's to intel iommu state to save
iteration over each PCI device in a search of the corrosponding domain.

Signed-off-by: Aviv Ben-David <bd.aviv@gmail.com>
---
 hw/i386/intel_iommu.c          | 96 +++++++++++++++++++++++++++++++++++++++---
 hw/i386/intel_iommu_internal.h |  2 +
 include/hw/i386/intel_iommu.h  |  9 ++++
 3 files changed, 100 insertions(+), 7 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 6ba915c..095b56d 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -54,6 +54,9 @@ static int vtd_dbgflags = VTD_DBGBIT(GENERAL) | VTD_DBGBIT(CSR);
 #define VTD_DPRINTF(what, fmt, ...) do {} while (0)
 #endif
 
+static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
+                                    uint8_t devfn, VTDContextEntry *ce);
+
 static void vtd_define_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val,
                             uint64_t wmask, uint64_t w1cmask)
 {
@@ -145,6 +148,23 @@ static uint64_t vtd_set_clear_mask_quad(IntelIOMMUState *s, hwaddr addr,
     return new_val;
 }
 
+static int vtd_get_did_dev(IntelIOMMUState *s, uint8_t bus_num, uint8_t devfn,
+                           uint16_t *domain_id)
+{
+    VTDContextEntry ce;
+    int ret_fr;
+
+    assert(domain_id);
+
+    ret_fr = vtd_dev_to_context_entry(s, bus_num, devfn, &ce);
+    if (ret_fr) {
+        return -1;
+    }
+
+    *domain_id =  VTD_CONTEXT_ENTRY_DID(ce.hi);
+    return 0;
+}
+
 /* GHashTable functions */
 static gboolean vtd_uint64_equal(gconstpointer v1, gconstpointer v2)
 {
@@ -679,7 +699,7 @@ static int vtd_gpa_to_slpte(VTDContextEntry *ce, uint64_t gpa,
         }
         *reads = (*reads) && (slpte & VTD_SL_R);
         *writes = (*writes) && (slpte & VTD_SL_W);
-        if (!(slpte & access_right_check)) {
+        if (!(slpte & access_right_check) && !(flags & IOMMU_NO_FAIL)) {
             VTD_DPRINTF(GENERAL, "error: lack of %s permission for "
                         "gpa 0x%"PRIx64 " slpte 0x%"PRIx64,
                         (flags & IOMMU_WO ? "write" : "read"), gpa, slpte);
@@ -1064,6 +1084,44 @@ static void vtd_iotlb_domain_invalidate(IntelIOMMUState *s, uint16_t domain_id)
                                 &domain_id);
 }
 
+static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
+                                           uint16_t domain_id, hwaddr addr,
+                                           uint8_t am)
+{
+    IntelIOMMUNotifierNode *node;
+
+    QLIST_FOREACH(node, &(s->notifiers_list), next) {
+        VTDAddressSpace *vtd_as = node->vtd_as;
+        uint16_t vfio_domain_id;
+        int ret = vtd_get_did_dev(s, pci_bus_num(vtd_as->bus), vtd_as->devfn,
+                                  &vfio_domain_id);
+        if (!ret && domain_id == vfio_domain_id) {
+            hwaddr original_addr = addr;
+
+            while (addr < original_addr + (1 << am) * VTD_PAGE_SIZE) {
+                IOMMUTLBEntry entry = s->iommu_ops.translate(
+                                                         &node->vtd_as->iommu,
+                                                         addr,
+                                                         IOMMU_NO_FAIL);
+
+                if (entry.perm == IOMMU_NONE &&
+                        node->notifier_flag & IOMMU_NOTIFIER_UNMAP) {
+                    entry.target_as = &address_space_memory;
+                    entry.iova = addr & VTD_PAGE_MASK_4K;
+                    entry.translated_addr = 0;
+                    entry.addr_mask = ~VTD_PAGE_MASK(VTD_PAGE_SHIFT);
+                    memory_region_notify_iommu(&node->vtd_as->iommu, entry);
+                    addr += VTD_PAGE_SIZE;
+                } else if (node->notifier_flag & IOMMU_NOTIFIER_MAP) {
+                        memory_region_notify_iommu(&node->vtd_as->iommu, entry);
+                        addr += entry.addr_mask + 1;
+                }
+            }
+        }
+    }
+}
+
+
 static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
                                       hwaddr addr, uint8_t am)
 {
@@ -1074,6 +1132,8 @@ static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
     info.addr = addr;
     info.mask = ~((1 << am) - 1);
     g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page, &info);
+
+    vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am);
 }
 
 /* Flush IOTLB
@@ -1999,15 +2059,37 @@ static void vtd_iommu_notify_flag_changed(MemoryRegion *iommu,
                                           IOMMUNotifierFlag new)
 {
     VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
+    IntelIOMMUState *s = vtd_as->iommu_state;
+    IntelIOMMUNotifierNode *node = NULL;
+    IntelIOMMUNotifierNode *next_node = NULL;
 
-    if (new & IOMMU_NOTIFIER_MAP) {
-        error_report("Device at bus %s addr %02x.%d requires iommu "
-                     "notifier which is currently not supported by "
-                     "intel-iommu emulation",
-                     vtd_as->bus->qbus.name, PCI_SLOT(vtd_as->devfn),
-                     PCI_FUNC(vtd_as->devfn));
+    if (!s->cache_mode_enabled && new & IOMMU_NOTIFIER_MAP) {
+        error_report("We need to set cache_mode=1 for intel-iommu to enable "
+                     "device assignment with IOMMU protection.");
         exit(1);
     }
+
+    /* Add new ndoe if no mapping was exising before this call */
+    if (old == IOMMU_NOTIFIER_NONE) {
+        node = g_malloc0(sizeof(*node));
+        node->vtd_as = vtd_as;
+        node->notifier_flag = new;
+        QLIST_INSERT_HEAD(&s->notifiers_list, node, next);
+        return;
+    }
+
+    /* update notifier node with new flags */
+    QLIST_FOREACH_SAFE(node, &s->notifiers_list, next, next_node) {
+        if (node->vtd_as == vtd_as) {
+            if (new == IOMMU_NOTIFIER_NONE) {
+                QLIST_REMOVE(node, next);
+                g_free(node);
+            } else {
+                node->notifier_flag = new;
+            }
+            return;
+        }
+    }
 }
 
 static const VMStateDescription vtd_vmstate = {
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 35d9f3a..96df4ae 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -381,6 +381,8 @@ typedef struct VTDIOTLBPageInvInfo VTDIOTLBPageInvInfo;
 #define VTD_PAGE_SHIFT_1G           30
 #define VTD_PAGE_MASK_1G            (~((1ULL << VTD_PAGE_SHIFT_1G) - 1))
 
+#define VTD_PAGE_MASK(shift)        (~((1ULL << (shift)) - 1))
+
 struct VTDRootEntry {
     uint64_t val;
     uint64_t rsvd;
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 42d293f..ebe9250 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -63,6 +63,7 @@ typedef union VTD_IR_TableEntry VTD_IR_TableEntry;
 typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress;
 typedef struct VTDIrq VTDIrq;
 typedef struct VTD_MSIMessage VTD_MSIMessage;
+typedef struct IntelIOMMUNotifierNode IntelIOMMUNotifierNode;
 
 /* Context-Entry */
 struct VTDContextEntry {
@@ -248,6 +249,12 @@ struct VTD_MSIMessage {
 /* When IR is enabled, all MSI/MSI-X data bits should be zero */
 #define VTD_IR_MSI_DATA          (0)
 
+struct IntelIOMMUNotifierNode {
+    VTDAddressSpace *vtd_as;
+    IOMMUNotifierFlag notifier_flag;
+    QLIST_ENTRY(IntelIOMMUNotifierNode) next;
+};
+
 /* The iommu (DMAR) device state struct */
 struct IntelIOMMUState {
     X86IOMMUState x86_iommu;
@@ -285,6 +292,8 @@ struct IntelIOMMUState {
     MemoryRegionIOMMUOps iommu_ops;
     GHashTable *vtd_as_by_busptr;   /* VTDBus objects indexed by PCIBus* reference */
     VTDBus *vtd_as_by_bus_num[VTD_PCI_BUS_MAX]; /* VTDBus objects indexed by bus number */
+    /* list of registered notifiers */
+    QLIST_HEAD(, IntelIOMMUNotifierNode) notifiers_list;
 
     /* interrupt remapping */
     bool intr_enabled;              /* Whether guest enabled IR */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/3] IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to guest
  2016-11-08 11:04 ` [Qemu-devel] [PATCH v6 1/3] IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to guest Aviv B.D
@ 2016-11-09  7:28   ` Jason Wang
  2016-11-09 22:00     ` Michael S. Tsirkin
  0 siblings, 1 reply; 32+ messages in thread
From: Jason Wang @ 2016-11-09  7:28 UTC (permalink / raw)
  To: Aviv B.D, qemu-devel
  Cc: Jan Kiszka, Alex Williamson, Peter Xu, Michael S. Tsirkin



On 2016年11月08日 19:04, Aviv B.D wrote:
> From: "Aviv Ben-David" <bd.aviv@gmail.com>
>
> This capability asks the guest to invalidate cache before each map operation.
> We can use this invalidation to trap map operations in the hypervisor.

Hi:

Like I've asked twice in the past, I want to know why don't you cache 
translation faults as what spec required (especially this is a guest 
visible behavior)?

Btw, please cc me on posting future versions.

Thanks

>
> Signed-off-by: Aviv Ben-David <bd.aviv@gmail.com>
> ---
>   hw/i386/intel_iommu.c          | 5 +++++
>   hw/i386/intel_iommu_internal.h | 1 +
>   include/hw/i386/intel_iommu.h  | 2 ++
>   3 files changed, 8 insertions(+)
>
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 1655a65..834887f 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -2017,6 +2017,7 @@ static Property vtd_properties[] = {
>       DEFINE_PROP_ON_OFF_AUTO("eim", IntelIOMMUState, intr_eim,
>                               ON_OFF_AUTO_AUTO),
>       DEFINE_PROP_BOOL("x-buggy-eim", IntelIOMMUState, buggy_eim, false),
> +    DEFINE_PROP_BOOL("cache-mode", IntelIOMMUState, cache_mode_enabled, FALSE),
>       DEFINE_PROP_END_OF_LIST(),
>   };
>   
> @@ -2391,6 +2392,10 @@ static void vtd_init(IntelIOMMUState *s)
>           assert(s->intr_eim != ON_OFF_AUTO_AUTO);
>       }
>   
> +    if (s->cache_mode_enabled) {
> +        s->cap |= VTD_CAP_CM;
> +    }
> +
>       vtd_reset_context_cache(s);
>       vtd_reset_iotlb(s);
>   
> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> index 0829a50..35d9f3a 100644
> --- a/hw/i386/intel_iommu_internal.h
> +++ b/hw/i386/intel_iommu_internal.h
> @@ -201,6 +201,7 @@
>   #define VTD_CAP_MAMV                (VTD_MAMV << 48)
>   #define VTD_CAP_PSI                 (1ULL << 39)
>   #define VTD_CAP_SLLPS               ((1ULL << 34) | (1ULL << 35))
> +#define VTD_CAP_CM                  (1ULL << 7)
>   
>   /* Supported Adjusted Guest Address Widths */
>   #define VTD_CAP_SAGAW_SHIFT         8
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> index 1989c1e..42d293f 100644
> --- a/include/hw/i386/intel_iommu.h
> +++ b/include/hw/i386/intel_iommu.h
> @@ -258,6 +258,8 @@ struct IntelIOMMUState {
>       uint8_t womask[DMAR_REG_SIZE];  /* WO (write only - read returns 0) */
>       uint32_t version;
>   
> +    bool cache_mode_enabled;        /* RO - is cap CM enabled? */
> +
>       dma_addr_t root;                /* Current root table pointer */
>       bool root_extended;             /* Type of root table (extended or not) */
>       bool dmar_enabled;              /* Set if DMA remapping is enabled */

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/3] IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to guest
  2016-11-09  7:28   ` Jason Wang
@ 2016-11-09 22:00     ` Michael S. Tsirkin
  2016-11-11  2:32       ` Jason Wang
  0 siblings, 1 reply; 32+ messages in thread
From: Michael S. Tsirkin @ 2016-11-09 22:00 UTC (permalink / raw)
  To: Jason Wang; +Cc: Aviv B.D, qemu-devel, Jan Kiszka, Alex Williamson, Peter Xu

On Wed, Nov 09, 2016 at 03:28:02PM +0800, Jason Wang wrote:
> 
> 
> On 2016年11月08日 19:04, Aviv B.D wrote:
> > From: "Aviv Ben-David" <bd.aviv@gmail.com>
> > 
> > This capability asks the guest to invalidate cache before each map operation.
> > We can use this invalidation to trap map operations in the hypervisor.
> 
> Hi:
> 
> Like I've asked twice in the past, I want to know why don't you cache
> translation faults as what spec required (especially this is a guest visible
> behavior)?
> 
> Btw, please cc me on posting future versions.
> 
> Thanks

Caching isn't guest visible. Spec just says you *can* cache,
not that you must.

> > 
> > Signed-off-by: Aviv Ben-David <bd.aviv@gmail.com>
> > ---
> >   hw/i386/intel_iommu.c          | 5 +++++
> >   hw/i386/intel_iommu_internal.h | 1 +
> >   include/hw/i386/intel_iommu.h  | 2 ++
> >   3 files changed, 8 insertions(+)
> > 
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index 1655a65..834887f 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -2017,6 +2017,7 @@ static Property vtd_properties[] = {
> >       DEFINE_PROP_ON_OFF_AUTO("eim", IntelIOMMUState, intr_eim,
> >                               ON_OFF_AUTO_AUTO),
> >       DEFINE_PROP_BOOL("x-buggy-eim", IntelIOMMUState, buggy_eim, false),
> > +    DEFINE_PROP_BOOL("cache-mode", IntelIOMMUState, cache_mode_enabled, FALSE),
> >       DEFINE_PROP_END_OF_LIST(),
> >   };
> > @@ -2391,6 +2392,10 @@ static void vtd_init(IntelIOMMUState *s)
> >           assert(s->intr_eim != ON_OFF_AUTO_AUTO);
> >       }
> > +    if (s->cache_mode_enabled) {
> > +        s->cap |= VTD_CAP_CM;
> > +    }
> > +
> >       vtd_reset_context_cache(s);
> >       vtd_reset_iotlb(s);
> > diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> > index 0829a50..35d9f3a 100644
> > --- a/hw/i386/intel_iommu_internal.h
> > +++ b/hw/i386/intel_iommu_internal.h
> > @@ -201,6 +201,7 @@
> >   #define VTD_CAP_MAMV                (VTD_MAMV << 48)
> >   #define VTD_CAP_PSI                 (1ULL << 39)
> >   #define VTD_CAP_SLLPS               ((1ULL << 34) | (1ULL << 35))
> > +#define VTD_CAP_CM                  (1ULL << 7)
> >   /* Supported Adjusted Guest Address Widths */
> >   #define VTD_CAP_SAGAW_SHIFT         8
> > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> > index 1989c1e..42d293f 100644
> > --- a/include/hw/i386/intel_iommu.h
> > +++ b/include/hw/i386/intel_iommu.h
> > @@ -258,6 +258,8 @@ struct IntelIOMMUState {
> >       uint8_t womask[DMAR_REG_SIZE];  /* WO (write only - read returns 0) */
> >       uint32_t version;
> > +    bool cache_mode_enabled;        /* RO - is cap CM enabled? */
> > +
> >       dma_addr_t root;                /* Current root table pointer */
> >       bool root_extended;             /* Type of root table (extended or not) */
> >       bool dmar_enabled;              /* Set if DMA remapping is enabled */

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v6 3/3] IOMMU: enable intel_iommu map and unmap notifiers
  2016-11-08 11:04 ` [Qemu-devel] [PATCH v6 3/3] IOMMU: enable intel_iommu map and unmap notifiers Aviv B.D
@ 2016-11-10 15:13   ` Michael S. Tsirkin
  0 siblings, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2016-11-10 15:13 UTC (permalink / raw)
  To: Aviv B.D; +Cc: qemu-devel, Alex Williamson, Peter Xu, Jan Kiszka

On Tue, Nov 08, 2016 at 01:04:24PM +0200, Aviv B.D wrote:
> From: "Aviv Ben-David" <bd.aviv@gmail.com>
> 
> Adds a list of registered vtd_as's to intel iommu state to save
> iteration over each PCI device in a search of the corrosponding domain.
> 
> Signed-off-by: Aviv Ben-David <bd.aviv@gmail.com>
> ---
>  hw/i386/intel_iommu.c          | 96 +++++++++++++++++++++++++++++++++++++++---
>  hw/i386/intel_iommu_internal.h |  2 +
>  include/hw/i386/intel_iommu.h  |  9 ++++
>  3 files changed, 100 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 6ba915c..095b56d 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -54,6 +54,9 @@ static int vtd_dbgflags = VTD_DBGBIT(GENERAL) | VTD_DBGBIT(CSR);
>  #define VTD_DPRINTF(what, fmt, ...) do {} while (0)
>  #endif
>  
> +static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
> +                                    uint8_t devfn, VTDContextEntry *ce);
> +

Please put the whole function here, don't declare static ones.


>  static void vtd_define_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val,
>                              uint64_t wmask, uint64_t w1cmask)
>  {
> @@ -145,6 +148,23 @@ static uint64_t vtd_set_clear_mask_quad(IntelIOMMUState *s, hwaddr addr,
>      return new_val;
>  }
>  
> +static int vtd_get_did_dev(IntelIOMMUState *s, uint8_t bus_num, uint8_t devfn,
> +                           uint16_t *domain_id)
> +{
> +    VTDContextEntry ce;
> +    int ret_fr;
> +
> +    assert(domain_id);
> +
> +    ret_fr = vtd_dev_to_context_entry(s, bus_num, devfn, &ce);
> +    if (ret_fr) {
> +        return -1;
> +    }
> +
> +    *domain_id =  VTD_CONTEXT_ENTRY_DID(ce.hi);
> +    return 0;
> +}
> +
>  /* GHashTable functions */
>  static gboolean vtd_uint64_equal(gconstpointer v1, gconstpointer v2)
>  {
> @@ -679,7 +699,7 @@ static int vtd_gpa_to_slpte(VTDContextEntry *ce, uint64_t gpa,
>          }
>          *reads = (*reads) && (slpte & VTD_SL_R);
>          *writes = (*writes) && (slpte & VTD_SL_W);
> -        if (!(slpte & access_right_check)) {
> +        if (!(slpte & access_right_check) && !(flags & IOMMU_NO_FAIL)) {
>              VTD_DPRINTF(GENERAL, "error: lack of %s permission for "
>                          "gpa 0x%"PRIx64 " slpte 0x%"PRIx64,
>                          (flags & IOMMU_WO ? "write" : "read"), gpa, slpte);
> @@ -1064,6 +1084,44 @@ static void vtd_iotlb_domain_invalidate(IntelIOMMUState *s, uint16_t domain_id)
>                                  &domain_id);
>  }
>  
> +static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
> +                                           uint16_t domain_id, hwaddr addr,
> +                                           uint8_t am)
> +{
> +    IntelIOMMUNotifierNode *node;
> +
> +    QLIST_FOREACH(node, &(s->notifiers_list), next) {
> +        VTDAddressSpace *vtd_as = node->vtd_as;
> +        uint16_t vfio_domain_id;
> +        int ret = vtd_get_did_dev(s, pci_bus_num(vtd_as->bus), vtd_as->devfn,
> +                                  &vfio_domain_id);

Space won't hurt here, after a declaration.

> +        if (!ret && domain_id == vfio_domain_id) {
> +            hwaddr original_addr = addr;
> +
> +            while (addr < original_addr + (1 << am) * VTD_PAGE_SIZE) {
> +                IOMMUTLBEntry entry = s->iommu_ops.translate(
> +                                                         &node->vtd_as->iommu,
> +                                                         addr,
> +                                                         IOMMU_NO_FAIL);
> +
> +                if (entry.perm == IOMMU_NONE &&
> +                        node->notifier_flag & IOMMU_NOTIFIER_UNMAP) {
> +                    entry.target_as = &address_space_memory;
> +                    entry.iova = addr & VTD_PAGE_MASK_4K;
> +                    entry.translated_addr = 0;
> +                    entry.addr_mask = ~VTD_PAGE_MASK(VTD_PAGE_SHIFT);
> +                    memory_region_notify_iommu(&node->vtd_as->iommu, entry);
> +                    addr += VTD_PAGE_SIZE;
> +                } else if (node->notifier_flag & IOMMU_NOTIFIER_MAP) {
> +                        memory_region_notify_iommu(&node->vtd_as->iommu, entry);
> +                        addr += entry.addr_mask + 1;
> +                }
> +            }
> +        }
> +    }
> +}
> +
> +
>  static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
>                                        hwaddr addr, uint8_t am)
>  {
> @@ -1074,6 +1132,8 @@ static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
>      info.addr = addr;
>      info.mask = ~((1 << am) - 1);
>      g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page, &info);
> +
> +    vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am);
>  }
>  
>  /* Flush IOTLB
> @@ -1999,15 +2059,37 @@ static void vtd_iommu_notify_flag_changed(MemoryRegion *iommu,
>                                            IOMMUNotifierFlag new)
>  {
>      VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
> +    IntelIOMMUState *s = vtd_as->iommu_state;
> +    IntelIOMMUNotifierNode *node = NULL;
> +    IntelIOMMUNotifierNode *next_node = NULL;
>  
> -    if (new & IOMMU_NOTIFIER_MAP) {
> -        error_report("Device at bus %s addr %02x.%d requires iommu "
> -                     "notifier which is currently not supported by "
> -                     "intel-iommu emulation",
> -                     vtd_as->bus->qbus.name, PCI_SLOT(vtd_as->devfn),
> -                     PCI_FUNC(vtd_as->devfn));
> +    if (!s->cache_mode_enabled && new & IOMMU_NOTIFIER_MAP) {
> +        error_report("We need to set cache_mode=1 for intel-iommu to enable "
> +                     "device assignment with IOMMU protection.");
>          exit(1);
>      }

Isn't there a way to fail assignment cleanly?

Can be fixed by patch on top I guess.

> +
> +    /* Add new ndoe if no mapping was exising before this call */
> +    if (old == IOMMU_NOTIFIER_NONE) {
> +        node = g_malloc0(sizeof(*node));
> +        node->vtd_as = vtd_as;
> +        node->notifier_flag = new;
> +        QLIST_INSERT_HEAD(&s->notifiers_list, node, next);
> +        return;
> +    }
> +
> +    /* update notifier node with new flags */
> +    QLIST_FOREACH_SAFE(node, &s->notifiers_list, next, next_node) {
> +        if (node->vtd_as == vtd_as) {
> +            if (new == IOMMU_NOTIFIER_NONE) {
> +                QLIST_REMOVE(node, next);
> +                g_free(node);
> +            } else {
> +                node->notifier_flag = new;
> +            }
> +            return;
> +        }
> +    }
>  }
>  
>  static const VMStateDescription vtd_vmstate = {
> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> index 35d9f3a..96df4ae 100644
> --- a/hw/i386/intel_iommu_internal.h
> +++ b/hw/i386/intel_iommu_internal.h
> @@ -381,6 +381,8 @@ typedef struct VTDIOTLBPageInvInfo VTDIOTLBPageInvInfo;
>  #define VTD_PAGE_SHIFT_1G           30
>  #define VTD_PAGE_MASK_1G            (~((1ULL << VTD_PAGE_SHIFT_1G) - 1))
>  
> +#define VTD_PAGE_MASK(shift)        (~((1ULL << (shift)) - 1))
> +
>  struct VTDRootEntry {
>      uint64_t val;
>      uint64_t rsvd;
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> index 42d293f..ebe9250 100644
> --- a/include/hw/i386/intel_iommu.h
> +++ b/include/hw/i386/intel_iommu.h
> @@ -63,6 +63,7 @@ typedef union VTD_IR_TableEntry VTD_IR_TableEntry;
>  typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress;
>  typedef struct VTDIrq VTDIrq;
>  typedef struct VTD_MSIMessage VTD_MSIMessage;
> +typedef struct IntelIOMMUNotifierNode IntelIOMMUNotifierNode;
>  
>  /* Context-Entry */
>  struct VTDContextEntry {
> @@ -248,6 +249,12 @@ struct VTD_MSIMessage {
>  /* When IR is enabled, all MSI/MSI-X data bits should be zero */
>  #define VTD_IR_MSI_DATA          (0)
>  
> +struct IntelIOMMUNotifierNode {
> +    VTDAddressSpace *vtd_as;
> +    IOMMUNotifierFlag notifier_flag;
> +    QLIST_ENTRY(IntelIOMMUNotifierNode) next;
> +};
> +
>  /* The iommu (DMAR) device state struct */
>  struct IntelIOMMUState {
>      X86IOMMUState x86_iommu;
> @@ -285,6 +292,8 @@ struct IntelIOMMUState {
>      MemoryRegionIOMMUOps iommu_ops;
>      GHashTable *vtd_as_by_busptr;   /* VTDBus objects indexed by PCIBus* reference */
>      VTDBus *vtd_as_by_bus_num[VTD_PCI_BUS_MAX]; /* VTDBus objects indexed by bus number */
> +    /* list of registered notifiers */
> +    QLIST_HEAD(, IntelIOMMUNotifierNode) notifiers_list;
>  
>      /* interrupt remapping */
>      bool intr_enabled;              /* Whether guest enabled IR */
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v6 0/3] IOMMU: intel_iommu support map and unmap notifications
  2016-11-08 11:04 [Qemu-devel] [PATCH v6 0/3] IOMMU: intel_iommu support map and unmap notifications Aviv B.D
                   ` (2 preceding siblings ...)
  2016-11-08 11:04 ` [Qemu-devel] [PATCH v6 3/3] IOMMU: enable intel_iommu map and unmap notifiers Aviv B.D
@ 2016-11-10 15:14 ` Michael S. Tsirkin
  2016-11-10 15:30   ` Alex Williamson
  3 siblings, 1 reply; 32+ messages in thread
From: Michael S. Tsirkin @ 2016-11-10 15:14 UTC (permalink / raw)
  To: Aviv B.D; +Cc: qemu-devel, Alex Williamson, Peter Xu, Jan Kiszka

On Tue, Nov 08, 2016 at 01:04:21PM +0200, Aviv B.D wrote:
> From: "Aviv Ben-David" <bd.aviv@gmail.com>
> 
> * Advertize Cache Mode capability in iommu cap register. 
>   This capability is controlled by "cache-mode" property of intel-iommu device.
>   To enable this option call QEMU with "-device intel-iommu,cache-mode=true".
> 
> * On page cache invalidation in intel vIOMMU, check if the domain belong to
>   registered notifier, and notify accordingly.

This looks sane I think. Alex, care to comment?
Merging will have to wait until after the release.
Pls remember to re-test and re-ping then.


> Currently this patch still doesn't enabling VFIO devices support with vIOMMU 
> present. Current problems:
> * vfio_iommu_map_notify is not aware about memory range belong to specific 
>   VFIOGuestIOMMU.
> * memory_region_iommu_replay hangs QEMU on start up while it itterate over 
>   64bit address space. Commenting out the call to this function enables 
>   workable VFIO device while vIOMMU present.
> * vfio_iommu_map_notify should check if address space range is suitable for 
>   current notifier.
> 
> Changes from v1 to v2:
> * remove assumption that the cache do not clears
> * fix lockup on high load.
> 
> Changes from v2 to v3:
> * remove debug leftovers
> * split to sepearate commits
> * change is_write to flags in vtd_do_iommu_translate, add IOMMU_NO_FAIL 
>   to suppress error propagating to guest.
> 
> Changes from v3 to v4:
> * Add property to intel_iommu device to control the CM capability, 
>   default to False.
> * Use s->iommu_ops.notify_flag_changed to register notifiers.
> 
> Changes from v4 to v4 RESEND:
> * Fix codding style pointed by checkpatch.pl script.
> 
> Changes from v4 to v5:
> * Reduce the number of changes in patch 2 and make flags real bitfield.
> * Revert deleted debug prints.
> * Fix memory leak in patch 3.
> 
> Changes from v5 to v6:
> * fix prototype of iommu_translate function for more IOMMU types.
> * VFIO will be notified only on the difference, without unmap 
>   before change to maps.
> 
> Aviv Ben-David (3):
>   IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to
>     guest
>   IOMMU: change iommu_op->translate's is_write to flags, add support to
>     NO_FAIL flag mode
>   IOMMU: enable intel_iommu map and unmap notifiers
> 
>  exec.c                         |   3 +-
>  hw/alpha/typhoon.c             |   2 +-
>  hw/i386/amd_iommu.c            |   4 +-
>  hw/i386/intel_iommu.c          | 160 +++++++++++++++++++++++++++++++++--------
>  hw/i386/intel_iommu_internal.h |   3 +
>  hw/pci-host/apb.c              |   2 +-
>  hw/ppc/spapr_iommu.c           |   2 +-
>  hw/s390x/s390-pci-bus.c        |   2 +-
>  include/exec/memory.h          |   6 +-
>  include/hw/i386/intel_iommu.h  |  11 +++
>  memory.c                       |   3 +-
>  11 files changed, 160 insertions(+), 38 deletions(-)
> 
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v6 0/3] IOMMU: intel_iommu support map and unmap notifications
  2016-11-10 15:14 ` [Qemu-devel] [PATCH v6 0/3] IOMMU: intel_iommu support map and unmap notifications Michael S. Tsirkin
@ 2016-11-10 15:30   ` Alex Williamson
  2016-11-10 15:54     ` Michael S. Tsirkin
  0 siblings, 1 reply; 32+ messages in thread
From: Alex Williamson @ 2016-11-10 15:30 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Aviv B.D, Jan Kiszka, qemu-devel, Peter Xu

On Thu, 10 Nov 2016 17:14:24 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Tue, Nov 08, 2016 at 01:04:21PM +0200, Aviv B.D wrote:
> > From: "Aviv Ben-David" <bd.aviv@gmail.com>
> > 
> > * Advertize Cache Mode capability in iommu cap register. 
> >   This capability is controlled by "cache-mode" property of intel-iommu device.
> >   To enable this option call QEMU with "-device intel-iommu,cache-mode=true".
> > 
> > * On page cache invalidation in intel vIOMMU, check if the domain belong to
> >   registered notifier, and notify accordingly.  
> 
> This looks sane I think. Alex, care to comment?
> Merging will have to wait until after the release.
> Pls remember to re-test and re-ping then.

I don't think it's suitable for upstream until there's a reasonable
replay mechanism and we straighten out whether it's expected to get
multiple notifies and the notif-ee is responsible for filtering
them or if the notif-er should do filtering.  Without those, this is
effectively just an RFC.  Thanks,

Alex

> > Currently this patch still doesn't enabling VFIO devices support with vIOMMU 
> > present. Current problems:
> > * vfio_iommu_map_notify is not aware about memory range belong to specific 
> >   VFIOGuestIOMMU.
> > * memory_region_iommu_replay hangs QEMU on start up while it itterate over 
> >   64bit address space. Commenting out the call to this function enables 
> >   workable VFIO device while vIOMMU present.
> > * vfio_iommu_map_notify should check if address space range is suitable for 
> >   current notifier.
> > 
> > Changes from v1 to v2:
> > * remove assumption that the cache do not clears
> > * fix lockup on high load.
> > 
> > Changes from v2 to v3:
> > * remove debug leftovers
> > * split to sepearate commits
> > * change is_write to flags in vtd_do_iommu_translate, add IOMMU_NO_FAIL 
> >   to suppress error propagating to guest.
> > 
> > Changes from v3 to v4:
> > * Add property to intel_iommu device to control the CM capability, 
> >   default to False.
> > * Use s->iommu_ops.notify_flag_changed to register notifiers.
> > 
> > Changes from v4 to v4 RESEND:
> > * Fix codding style pointed by checkpatch.pl script.
> > 
> > Changes from v4 to v5:
> > * Reduce the number of changes in patch 2 and make flags real bitfield.
> > * Revert deleted debug prints.
> > * Fix memory leak in patch 3.
> > 
> > Changes from v5 to v6:
> > * fix prototype of iommu_translate function for more IOMMU types.
> > * VFIO will be notified only on the difference, without unmap 
> >   before change to maps.
> > 
> > Aviv Ben-David (3):
> >   IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to
> >     guest
> >   IOMMU: change iommu_op->translate's is_write to flags, add support to
> >     NO_FAIL flag mode
> >   IOMMU: enable intel_iommu map and unmap notifiers
> > 
> >  exec.c                         |   3 +-
> >  hw/alpha/typhoon.c             |   2 +-
> >  hw/i386/amd_iommu.c            |   4 +-
> >  hw/i386/intel_iommu.c          | 160 +++++++++++++++++++++++++++++++++--------
> >  hw/i386/intel_iommu_internal.h |   3 +
> >  hw/pci-host/apb.c              |   2 +-
> >  hw/ppc/spapr_iommu.c           |   2 +-
> >  hw/s390x/s390-pci-bus.c        |   2 +-
> >  include/exec/memory.h          |   6 +-
> >  include/hw/i386/intel_iommu.h  |  11 +++
> >  memory.c                       |   3 +-
> >  11 files changed, 160 insertions(+), 38 deletions(-)
> > 
> > -- 
> > 1.9.1  
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v6 0/3] IOMMU: intel_iommu support map and unmap notifications
  2016-11-10 15:30   ` Alex Williamson
@ 2016-11-10 15:54     ` Michael S. Tsirkin
  2016-11-10 16:04       ` Alex Williamson
  0 siblings, 1 reply; 32+ messages in thread
From: Michael S. Tsirkin @ 2016-11-10 15:54 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Aviv B.D, Jan Kiszka, qemu-devel, Peter Xu

On Thu, Nov 10, 2016 at 08:30:21AM -0700, Alex Williamson wrote:
> On Thu, 10 Nov 2016 17:14:24 +0200
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Tue, Nov 08, 2016 at 01:04:21PM +0200, Aviv B.D wrote:
> > > From: "Aviv Ben-David" <bd.aviv@gmail.com>
> > > 
> > > * Advertize Cache Mode capability in iommu cap register. 
> > >   This capability is controlled by "cache-mode" property of intel-iommu device.
> > >   To enable this option call QEMU with "-device intel-iommu,cache-mode=true".
> > > 
> > > * On page cache invalidation in intel vIOMMU, check if the domain belong to
> > >   registered notifier, and notify accordingly.  
> > 
> > This looks sane I think. Alex, care to comment?
> > Merging will have to wait until after the release.
> > Pls remember to re-test and re-ping then.
> 
> I don't think it's suitable for upstream until there's a reasonable
> replay mechanism

Could you pls clarify what do you mean by replay?
Is this when you attach a device by hotplug to
a running system?

If yes this can maybe be addressed by disabling hotplug temporarily.


> and we straighten out whether it's expected to get
> multiple notifies and the notif-ee is responsible for filtering
> them or if the notif-er should do filtering.

OK this is a documentation thing.

>  Without those, this is
> effectively just an RFC.

It's infrastructure without users so it doesn't break things,
I'm more interested in seeing whether it's broken in
some way than whether it's complete.

The patchset spent out of tree too long and I'd like to see
us make progress towards device assignment working with
vIOMMU sooner rather than later, so if it's broken I won't
merge it but if it's incomplete I will.

>  Thanks,
> 
> Alex
> 
> > > Currently this patch still doesn't enabling VFIO devices support with vIOMMU 
> > > present. Current problems:
> > > * vfio_iommu_map_notify is not aware about memory range belong to specific 
> > >   VFIOGuestIOMMU.
> > > * memory_region_iommu_replay hangs QEMU on start up while it itterate over 
> > >   64bit address space. Commenting out the call to this function enables 
> > >   workable VFIO device while vIOMMU present.
> > > * vfio_iommu_map_notify should check if address space range is suitable for 
> > >   current notifier.
> > > 
> > > Changes from v1 to v2:
> > > * remove assumption that the cache do not clears
> > > * fix lockup on high load.
> > > 
> > > Changes from v2 to v3:
> > > * remove debug leftovers
> > > * split to sepearate commits
> > > * change is_write to flags in vtd_do_iommu_translate, add IOMMU_NO_FAIL 
> > >   to suppress error propagating to guest.
> > > 
> > > Changes from v3 to v4:
> > > * Add property to intel_iommu device to control the CM capability, 
> > >   default to False.
> > > * Use s->iommu_ops.notify_flag_changed to register notifiers.
> > > 
> > > Changes from v4 to v4 RESEND:
> > > * Fix codding style pointed by checkpatch.pl script.
> > > 
> > > Changes from v4 to v5:
> > > * Reduce the number of changes in patch 2 and make flags real bitfield.
> > > * Revert deleted debug prints.
> > > * Fix memory leak in patch 3.
> > > 
> > > Changes from v5 to v6:
> > > * fix prototype of iommu_translate function for more IOMMU types.
> > > * VFIO will be notified only on the difference, without unmap 
> > >   before change to maps.
> > > 
> > > Aviv Ben-David (3):
> > >   IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to
> > >     guest
> > >   IOMMU: change iommu_op->translate's is_write to flags, add support to
> > >     NO_FAIL flag mode
> > >   IOMMU: enable intel_iommu map and unmap notifiers
> > > 
> > >  exec.c                         |   3 +-
> > >  hw/alpha/typhoon.c             |   2 +-
> > >  hw/i386/amd_iommu.c            |   4 +-
> > >  hw/i386/intel_iommu.c          | 160 +++++++++++++++++++++++++++++++++--------
> > >  hw/i386/intel_iommu_internal.h |   3 +
> > >  hw/pci-host/apb.c              |   2 +-
> > >  hw/ppc/spapr_iommu.c           |   2 +-
> > >  hw/s390x/s390-pci-bus.c        |   2 +-
> > >  include/exec/memory.h          |   6 +-
> > >  include/hw/i386/intel_iommu.h  |  11 +++
> > >  memory.c                       |   3 +-
> > >  11 files changed, 160 insertions(+), 38 deletions(-)
> > > 
> > > -- 
> > > 1.9.1  
> > 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v6 0/3] IOMMU: intel_iommu support map and unmap notifications
  2016-11-10 15:54     ` Michael S. Tsirkin
@ 2016-11-10 16:04       ` Alex Williamson
  2016-11-10 19:20         ` Michael S. Tsirkin
  0 siblings, 1 reply; 32+ messages in thread
From: Alex Williamson @ 2016-11-10 16:04 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Aviv B.D, Jan Kiszka, qemu-devel, Peter Xu

On Thu, 10 Nov 2016 17:54:35 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Thu, Nov 10, 2016 at 08:30:21AM -0700, Alex Williamson wrote:
> > On Thu, 10 Nov 2016 17:14:24 +0200
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >   
> > > On Tue, Nov 08, 2016 at 01:04:21PM +0200, Aviv B.D wrote:  
> > > > From: "Aviv Ben-David" <bd.aviv@gmail.com>
> > > > 
> > > > * Advertize Cache Mode capability in iommu cap register. 
> > > >   This capability is controlled by "cache-mode" property of intel-iommu device.
> > > >   To enable this option call QEMU with "-device intel-iommu,cache-mode=true".
> > > > 
> > > > * On page cache invalidation in intel vIOMMU, check if the domain belong to
> > > >   registered notifier, and notify accordingly.    
> > > 
> > > This looks sane I think. Alex, care to comment?
> > > Merging will have to wait until after the release.
> > > Pls remember to re-test and re-ping then.  
> > 
> > I don't think it's suitable for upstream until there's a reasonable
> > replay mechanism  
> 
> Could you pls clarify what do you mean by replay?
> Is this when you attach a device by hotplug to
> a running system?
> 
> If yes this can maybe be addressed by disabling hotplug temporarily.

No, hotplug is not required, moving a device between existing domains
requires replay, ie. actually using it for nested device assignment.

> > and we straighten out whether it's expected to get
> > multiple notifies and the notif-ee is responsible for filtering
> > them or if the notif-er should do filtering.  
> 
> OK this is a documentation thing.

Well no, it needs to be decided and if necessary implemented.

> >  Without those, this is
> > effectively just an RFC.  
> 
> It's infrastructure without users so it doesn't break things,
> I'm more interested in seeing whether it's broken in
> some way than whether it's complete.

If it allows use with vfio but doesn't fully implement the complete set
of interfaces, it does break things.  We currently prevent viommu usage
with vfio because it is incomplete.
 
> The patchset spent out of tree too long and I'd like to see
> us make progress towards device assignment working with
> vIOMMU sooner rather than later, so if it's broken I won't
> merge it but if it's incomplete I will.

So long as it's incomplete and still prevents vfio usage, I'm ok with
merging it, but I don't want to enable vfio usage until it's complete.
Thanks,

Alex

> > > > Currently this patch still doesn't enabling VFIO devices support with vIOMMU 
> > > > present. Current problems:
> > > > * vfio_iommu_map_notify is not aware about memory range belong to specific 
> > > >   VFIOGuestIOMMU.
> > > > * memory_region_iommu_replay hangs QEMU on start up while it itterate over 
> > > >   64bit address space. Commenting out the call to this function enables 
> > > >   workable VFIO device while vIOMMU present.
> > > > * vfio_iommu_map_notify should check if address space range is suitable for 
> > > >   current notifier.
> > > > 
> > > > Changes from v1 to v2:
> > > > * remove assumption that the cache do not clears
> > > > * fix lockup on high load.
> > > > 
> > > > Changes from v2 to v3:
> > > > * remove debug leftovers
> > > > * split to sepearate commits
> > > > * change is_write to flags in vtd_do_iommu_translate, add IOMMU_NO_FAIL 
> > > >   to suppress error propagating to guest.
> > > > 
> > > > Changes from v3 to v4:
> > > > * Add property to intel_iommu device to control the CM capability, 
> > > >   default to False.
> > > > * Use s->iommu_ops.notify_flag_changed to register notifiers.
> > > > 
> > > > Changes from v4 to v4 RESEND:
> > > > * Fix codding style pointed by checkpatch.pl script.
> > > > 
> > > > Changes from v4 to v5:
> > > > * Reduce the number of changes in patch 2 and make flags real bitfield.
> > > > * Revert deleted debug prints.
> > > > * Fix memory leak in patch 3.
> > > > 
> > > > Changes from v5 to v6:
> > > > * fix prototype of iommu_translate function for more IOMMU types.
> > > > * VFIO will be notified only on the difference, without unmap 
> > > >   before change to maps.
> > > > 
> > > > Aviv Ben-David (3):
> > > >   IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to
> > > >     guest
> > > >   IOMMU: change iommu_op->translate's is_write to flags, add support to
> > > >     NO_FAIL flag mode
> > > >   IOMMU: enable intel_iommu map and unmap notifiers
> > > > 
> > > >  exec.c                         |   3 +-
> > > >  hw/alpha/typhoon.c             |   2 +-
> > > >  hw/i386/amd_iommu.c            |   4 +-
> > > >  hw/i386/intel_iommu.c          | 160 +++++++++++++++++++++++++++++++++--------
> > > >  hw/i386/intel_iommu_internal.h |   3 +
> > > >  hw/pci-host/apb.c              |   2 +-
> > > >  hw/ppc/spapr_iommu.c           |   2 +-
> > > >  hw/s390x/s390-pci-bus.c        |   2 +-
> > > >  include/exec/memory.h          |   6 +-
> > > >  include/hw/i386/intel_iommu.h  |  11 +++
> > > >  memory.c                       |   3 +-
> > > >  11 files changed, 160 insertions(+), 38 deletions(-)
> > > > 
> > > > -- 
> > > > 1.9.1    
> > >   

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v6 0/3] IOMMU: intel_iommu support map and unmap notifications
  2016-11-10 16:04       ` Alex Williamson
@ 2016-11-10 19:20         ` Michael S. Tsirkin
  2016-11-10 19:44           ` Alex Williamson
  2016-11-16 19:20           ` Aviv B.D.
  0 siblings, 2 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2016-11-10 19:20 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Aviv B.D, Jan Kiszka, qemu-devel, Peter Xu

On Thu, Nov 10, 2016 at 09:04:13AM -0700, Alex Williamson wrote:
> On Thu, 10 Nov 2016 17:54:35 +0200
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Thu, Nov 10, 2016 at 08:30:21AM -0700, Alex Williamson wrote:
> > > On Thu, 10 Nov 2016 17:14:24 +0200
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > >   
> > > > On Tue, Nov 08, 2016 at 01:04:21PM +0200, Aviv B.D wrote:  
> > > > > From: "Aviv Ben-David" <bd.aviv@gmail.com>
> > > > > 
> > > > > * Advertize Cache Mode capability in iommu cap register. 
> > > > >   This capability is controlled by "cache-mode" property of intel-iommu device.
> > > > >   To enable this option call QEMU with "-device intel-iommu,cache-mode=true".
> > > > > 
> > > > > * On page cache invalidation in intel vIOMMU, check if the domain belong to
> > > > >   registered notifier, and notify accordingly.    
> > > > 
> > > > This looks sane I think. Alex, care to comment?
> > > > Merging will have to wait until after the release.
> > > > Pls remember to re-test and re-ping then.  
> > > 
> > > I don't think it's suitable for upstream until there's a reasonable
> > > replay mechanism  
> > 
> > Could you pls clarify what do you mean by replay?
> > Is this when you attach a device by hotplug to
> > a running system?
> > 
> > If yes this can maybe be addressed by disabling hotplug temporarily.
> 
> No, hotplug is not required, moving a device between existing domains
> requires replay, ie. actually using it for nested device assignment.

Good point, that one is a correctness thing. Aviv,
could you add this in TODO list in a cover letter pls?

> > > and we straighten out whether it's expected to get
> > > multiple notifies and the notif-ee is responsible for filtering
> > > them or if the notif-er should do filtering.  
> > 
> > OK this is a documentation thing.
> 
> Well no, it needs to be decided and if necessary implemented.

Let's assume it's the notif-ee for now. Less is more and all that.

> > >  Without those, this is
> > > effectively just an RFC.  
> > 
> > It's infrastructure without users so it doesn't break things,
> > I'm more interested in seeing whether it's broken in
> > some way than whether it's complete.
> 
> If it allows use with vfio but doesn't fully implement the complete set
> of interfaces, it does break things.  We currently prevent viommu usage
> with vfio because it is incomplete.

Right - that bit is still in as far as I can see.

> > The patchset spent out of tree too long and I'd like to see
> > us make progress towards device assignment working with
> > vIOMMU sooner rather than later, so if it's broken I won't
> > merge it but if it's incomplete I will.
> 
> So long as it's incomplete and still prevents vfio usage, I'm ok with
> merging it, but I don't want to enable vfio usage until it's complete.
> Thanks,
> 
> Alex
> 
> > > > > Currently this patch still doesn't enabling VFIO devices support with vIOMMU 
> > > > > present. Current problems:
> > > > > * vfio_iommu_map_notify is not aware about memory range belong to specific 
> > > > >   VFIOGuestIOMMU.
> > > > > * memory_region_iommu_replay hangs QEMU on start up while it itterate over 
> > > > >   64bit address space. Commenting out the call to this function enables 
> > > > >   workable VFIO device while vIOMMU present.
> > > > > * vfio_iommu_map_notify should check if address space range is suitable for 
> > > > >   current notifier.
> > > > > 
> > > > > Changes from v1 to v2:
> > > > > * remove assumption that the cache do not clears
> > > > > * fix lockup on high load.
> > > > > 
> > > > > Changes from v2 to v3:
> > > > > * remove debug leftovers
> > > > > * split to sepearate commits
> > > > > * change is_write to flags in vtd_do_iommu_translate, add IOMMU_NO_FAIL 
> > > > >   to suppress error propagating to guest.
> > > > > 
> > > > > Changes from v3 to v4:
> > > > > * Add property to intel_iommu device to control the CM capability, 
> > > > >   default to False.
> > > > > * Use s->iommu_ops.notify_flag_changed to register notifiers.
> > > > > 
> > > > > Changes from v4 to v4 RESEND:
> > > > > * Fix codding style pointed by checkpatch.pl script.
> > > > > 
> > > > > Changes from v4 to v5:
> > > > > * Reduce the number of changes in patch 2 and make flags real bitfield.
> > > > > * Revert deleted debug prints.
> > > > > * Fix memory leak in patch 3.
> > > > > 
> > > > > Changes from v5 to v6:
> > > > > * fix prototype of iommu_translate function for more IOMMU types.
> > > > > * VFIO will be notified only on the difference, without unmap 
> > > > >   before change to maps.
> > > > > 
> > > > > Aviv Ben-David (3):
> > > > >   IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to
> > > > >     guest
> > > > >   IOMMU: change iommu_op->translate's is_write to flags, add support to
> > > > >     NO_FAIL flag mode
> > > > >   IOMMU: enable intel_iommu map and unmap notifiers
> > > > > 
> > > > >  exec.c                         |   3 +-
> > > > >  hw/alpha/typhoon.c             |   2 +-
> > > > >  hw/i386/amd_iommu.c            |   4 +-
> > > > >  hw/i386/intel_iommu.c          | 160 +++++++++++++++++++++++++++++++++--------
> > > > >  hw/i386/intel_iommu_internal.h |   3 +
> > > > >  hw/pci-host/apb.c              |   2 +-
> > > > >  hw/ppc/spapr_iommu.c           |   2 +-
> > > > >  hw/s390x/s390-pci-bus.c        |   2 +-
> > > > >  include/exec/memory.h          |   6 +-
> > > > >  include/hw/i386/intel_iommu.h  |  11 +++
> > > > >  memory.c                       |   3 +-
> > > > >  11 files changed, 160 insertions(+), 38 deletions(-)
> > > > > 
> > > > > -- 
> > > > > 1.9.1    
> > > >   

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v6 0/3] IOMMU: intel_iommu support map and unmap notifications
  2016-11-10 19:20         ` Michael S. Tsirkin
@ 2016-11-10 19:44           ` Alex Williamson
  2016-11-16 13:54             ` Michael S. Tsirkin
  2016-11-16 19:20           ` Aviv B.D.
  1 sibling, 1 reply; 32+ messages in thread
From: Alex Williamson @ 2016-11-10 19:44 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Aviv B.D, Jan Kiszka, qemu-devel, Peter Xu

On Thu, 10 Nov 2016 21:20:36 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Thu, Nov 10, 2016 at 09:04:13AM -0700, Alex Williamson wrote:
> > On Thu, 10 Nov 2016 17:54:35 +0200
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >   
> > > On Thu, Nov 10, 2016 at 08:30:21AM -0700, Alex Williamson wrote:  
> > > > On Thu, 10 Nov 2016 17:14:24 +0200
> > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > >     
> > > > > On Tue, Nov 08, 2016 at 01:04:21PM +0200, Aviv B.D wrote:    
> > > > > > From: "Aviv Ben-David" <bd.aviv@gmail.com>
> > > > > > 
> > > > > > * Advertize Cache Mode capability in iommu cap register. 
> > > > > >   This capability is controlled by "cache-mode" property of intel-iommu device.
> > > > > >   To enable this option call QEMU with "-device intel-iommu,cache-mode=true".
> > > > > > 
> > > > > > * On page cache invalidation in intel vIOMMU, check if the domain belong to
> > > > > >   registered notifier, and notify accordingly.      
> > > > > 
> > > > > This looks sane I think. Alex, care to comment?
> > > > > Merging will have to wait until after the release.
> > > > > Pls remember to re-test and re-ping then.    
> > > > 
> > > > I don't think it's suitable for upstream until there's a reasonable
> > > > replay mechanism    
> > > 
> > > Could you pls clarify what do you mean by replay?
> > > Is this when you attach a device by hotplug to
> > > a running system?
> > > 
> > > If yes this can maybe be addressed by disabling hotplug temporarily.  
> > 
> > No, hotplug is not required, moving a device between existing domains
> > requires replay, ie. actually using it for nested device assignment.  
> 
> Good point, that one is a correctness thing. Aviv,
> could you add this in TODO list in a cover letter pls?
> 
> > > > and we straighten out whether it's expected to get
> > > > multiple notifies and the notif-ee is responsible for filtering
> > > > them or if the notif-er should do filtering.    
> > > 
> > > OK this is a documentation thing.  
> > 
> > Well no, it needs to be decided and if necessary implemented.  
> 
> Let's assume it's the notif-ee for now. Less is more and all that.

I think this is opposite of the approach dwg suggested.
 
> > > >  Without those, this is
> > > > effectively just an RFC.    
> > > 
> > > It's infrastructure without users so it doesn't break things,
> > > I'm more interested in seeing whether it's broken in
> > > some way than whether it's complete.  
> > 
> > If it allows use with vfio but doesn't fully implement the complete set
> > of interfaces, it does break things.  We currently prevent viommu usage
> > with vfio because it is incomplete.  
> 
> Right - that bit is still in as far as I can see.

Nope, 3/3 changes vtd_iommu_notify_flag_changed() to allow use with
vfio even though it's still incomplete.  We would at least need
something like a replay callback for VT-d that triggers an abort if you
still want to accept it incomplete.  Thanks,

Alex

> > > The patchset spent out of tree too long and I'd like to see
> > > us make progress towards device assignment working with
> > > vIOMMU sooner rather than later, so if it's broken I won't
> > > merge it but if it's incomplete I will.  
> > 
> > So long as it's incomplete and still prevents vfio usage, I'm ok with
> > merging it, but I don't want to enable vfio usage until it's complete.
> > Thanks,
> > 
> > Alex
> >   
> > > > > > Currently this patch still doesn't enabling VFIO devices support with vIOMMU 
> > > > > > present. Current problems:
> > > > > > * vfio_iommu_map_notify is not aware about memory range belong to specific 
> > > > > >   VFIOGuestIOMMU.
> > > > > > * memory_region_iommu_replay hangs QEMU on start up while it itterate over 
> > > > > >   64bit address space. Commenting out the call to this function enables 
> > > > > >   workable VFIO device while vIOMMU present.
> > > > > > * vfio_iommu_map_notify should check if address space range is suitable for 
> > > > > >   current notifier.
> > > > > > 
> > > > > > Changes from v1 to v2:
> > > > > > * remove assumption that the cache do not clears
> > > > > > * fix lockup on high load.
> > > > > > 
> > > > > > Changes from v2 to v3:
> > > > > > * remove debug leftovers
> > > > > > * split to sepearate commits
> > > > > > * change is_write to flags in vtd_do_iommu_translate, add IOMMU_NO_FAIL 
> > > > > >   to suppress error propagating to guest.
> > > > > > 
> > > > > > Changes from v3 to v4:
> > > > > > * Add property to intel_iommu device to control the CM capability, 
> > > > > >   default to False.
> > > > > > * Use s->iommu_ops.notify_flag_changed to register notifiers.
> > > > > > 
> > > > > > Changes from v4 to v4 RESEND:
> > > > > > * Fix codding style pointed by checkpatch.pl script.
> > > > > > 
> > > > > > Changes from v4 to v5:
> > > > > > * Reduce the number of changes in patch 2 and make flags real bitfield.
> > > > > > * Revert deleted debug prints.
> > > > > > * Fix memory leak in patch 3.
> > > > > > 
> > > > > > Changes from v5 to v6:
> > > > > > * fix prototype of iommu_translate function for more IOMMU types.
> > > > > > * VFIO will be notified only on the difference, without unmap 
> > > > > >   before change to maps.
> > > > > > 
> > > > > > Aviv Ben-David (3):
> > > > > >   IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to
> > > > > >     guest
> > > > > >   IOMMU: change iommu_op->translate's is_write to flags, add support to
> > > > > >     NO_FAIL flag mode
> > > > > >   IOMMU: enable intel_iommu map and unmap notifiers
> > > > > > 
> > > > > >  exec.c                         |   3 +-
> > > > > >  hw/alpha/typhoon.c             |   2 +-
> > > > > >  hw/i386/amd_iommu.c            |   4 +-
> > > > > >  hw/i386/intel_iommu.c          | 160 +++++++++++++++++++++++++++++++++--------
> > > > > >  hw/i386/intel_iommu_internal.h |   3 +
> > > > > >  hw/pci-host/apb.c              |   2 +-
> > > > > >  hw/ppc/spapr_iommu.c           |   2 +-
> > > > > >  hw/s390x/s390-pci-bus.c        |   2 +-
> > > > > >  include/exec/memory.h          |   6 +-
> > > > > >  include/hw/i386/intel_iommu.h  |  11 +++
> > > > > >  memory.c                       |   3 +-
> > > > > >  11 files changed, 160 insertions(+), 38 deletions(-)
> > > > > > 
> > > > > > -- 
> > > > > > 1.9.1      
> > > > >     

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/3] IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to guest
  2016-11-09 22:00     ` Michael S. Tsirkin
@ 2016-11-11  2:32       ` Jason Wang
  2016-11-11  3:39         ` Michael S. Tsirkin
  0 siblings, 1 reply; 32+ messages in thread
From: Jason Wang @ 2016-11-11  2:32 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Aviv B.D, qemu-devel, Jan Kiszka, Alex Williamson, Peter Xu



On 2016年11月10日 06:00, Michael S. Tsirkin wrote:
> On Wed, Nov 09, 2016 at 03:28:02PM +0800, Jason Wang wrote:
>> >
>> >
>> >On 2016年11月08日 19:04, Aviv B.D wrote:
>>> > >From: "Aviv Ben-David"<bd.aviv@gmail.com>
>>> > >
>>> > >This capability asks the guest to invalidate cache before each map operation.
>>> > >We can use this invalidation to trap map operations in the hypervisor.
>> >
>> >Hi:
>> >
>> >Like I've asked twice in the past, I want to know why don't you cache
>> >translation faults as what spec required (especially this is a guest visible
>> >behavior)?
>> >
>> >Btw, please cc me on posting future versions.
>> >
>> >Thanks
> Caching isn't guest visible.

Seems not, if one fault mapping were cached by IOTLB. Guest can notice 
this behavior.

> Spec just says you*can*  cache,
> not that you must.
>

Yes, but what did in this patch is "don't". What I suggest is just a 
"can", since anyway the IOTLB entries were limited and could be replaced 
by other.

Thanks

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/3] IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to guest
  2016-11-11  2:32       ` Jason Wang
@ 2016-11-11  3:39         ` Michael S. Tsirkin
  2016-11-11  4:15           ` Jason Wang
  0 siblings, 1 reply; 32+ messages in thread
From: Michael S. Tsirkin @ 2016-11-11  3:39 UTC (permalink / raw)
  To: Jason Wang; +Cc: Aviv B.D, qemu-devel, Jan Kiszka, Alex Williamson, Peter Xu

On Fri, Nov 11, 2016 at 10:32:42AM +0800, Jason Wang wrote:
> 
> 
> On 2016年11月10日 06:00, Michael S. Tsirkin wrote:
> > On Wed, Nov 09, 2016 at 03:28:02PM +0800, Jason Wang wrote:
> > > >
> > > >
> > > >On 2016年11月08日 19:04, Aviv B.D wrote:
> > > > > >From: "Aviv Ben-David"<bd.aviv@gmail.com>
> > > > > >
> > > > > >This capability asks the guest to invalidate cache before each map operation.
> > > > > >We can use this invalidation to trap map operations in the hypervisor.
> > > >
> > > >Hi:
> > > >
> > > >Like I've asked twice in the past, I want to know why don't you cache
> > > >translation faults as what spec required (especially this is a guest visible
> > > >behavior)?
> > > >
> > > >Btw, please cc me on posting future versions.
> > > >
> > > >Thanks
> > Caching isn't guest visible.
> 
> Seems not, if one fault mapping were cached by IOTLB. Guest can notice this
> behavior.

Sorry, I don't get what you are saying.

> > Spec just says you*can*  cache,
> > not that you must.
> > 
> 
> Yes, but what did in this patch is "don't". What I suggest is just a "can",
> since anyway the IOTLB entries were limited and could be replaced by other.
> 
> Thanks

Have trouble understanding this. Can you given an example of
a guest visible difference?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/3] IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to guest
  2016-11-11  3:39         ` Michael S. Tsirkin
@ 2016-11-11  4:15           ` Jason Wang
  2016-11-21 12:41             ` Aviv B.D.
  2016-11-21 13:35             ` Michael S. Tsirkin
  0 siblings, 2 replies; 32+ messages in thread
From: Jason Wang @ 2016-11-11  4:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Aviv B.D, qemu-devel, Jan Kiszka, Alex Williamson, Peter Xu



On 2016年11月11日 11:39, Michael S. Tsirkin wrote:
> On Fri, Nov 11, 2016 at 10:32:42AM +0800, Jason Wang wrote:
>>
>> On 2016年11月10日 06:00, Michael S. Tsirkin wrote:
>>> On Wed, Nov 09, 2016 at 03:28:02PM +0800, Jason Wang wrote:
>>>>>
>>>>> On 2016年11月08日 19:04, Aviv B.D wrote:
>>>>>>> From: "Aviv Ben-David"<bd.aviv@gmail.com>
>>>>>>>
>>>>>>> This capability asks the guest to invalidate cache before each map operation.
>>>>>>> We can use this invalidation to trap map operations in the hypervisor.
>>>>> Hi:
>>>>>
>>>>> Like I've asked twice in the past, I want to know why don't you cache
>>>>> translation faults as what spec required (especially this is a guest visible
>>>>> behavior)?
>>>>>
>>>>> Btw, please cc me on posting future versions.
>>>>>
>>>>> Thanks
>>> Caching isn't guest visible.
>> Seems not, if one fault mapping were cached by IOTLB. Guest can notice this
>> behavior.
> Sorry, I don't get what you are saying.
>
>>> Spec just says you*can*  cache,
>>> not that you must.
>>>
>> Yes, but what did in this patch is "don't". What I suggest is just a "can",
>> since anyway the IOTLB entries were limited and could be replaced by other.
>>
>> Thanks
> Have trouble understanding this. Can you given an example of
> a guest visible difference?

I guess this may do the detection:

1) map iova A to be non-present.
2) invalidate iova A
3) map iova A to addr B
4) access iova A

A correct implemented CM may meet fault in step 4, but with this patch, 
we never.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v6 0/3] IOMMU: intel_iommu support map and unmap notifications
  2016-11-10 19:44           ` Alex Williamson
@ 2016-11-16 13:54             ` Michael S. Tsirkin
  2016-11-16 15:34               ` Alex Williamson
  0 siblings, 1 reply; 32+ messages in thread
From: Michael S. Tsirkin @ 2016-11-16 13:54 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Aviv B.D, Jan Kiszka, qemu-devel, Peter Xu

On Thu, Nov 10, 2016 at 12:44:47PM -0700, Alex Williamson wrote:
> On Thu, 10 Nov 2016 21:20:36 +0200
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Thu, Nov 10, 2016 at 09:04:13AM -0700, Alex Williamson wrote:
> > > On Thu, 10 Nov 2016 17:54:35 +0200
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > >   
> > > > On Thu, Nov 10, 2016 at 08:30:21AM -0700, Alex Williamson wrote:  
> > > > > On Thu, 10 Nov 2016 17:14:24 +0200
> > > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > >     
> > > > > > On Tue, Nov 08, 2016 at 01:04:21PM +0200, Aviv B.D wrote:    
> > > > > > > From: "Aviv Ben-David" <bd.aviv@gmail.com>
> > > > > > > 
> > > > > > > * Advertize Cache Mode capability in iommu cap register. 
> > > > > > >   This capability is controlled by "cache-mode" property of intel-iommu device.
> > > > > > >   To enable this option call QEMU with "-device intel-iommu,cache-mode=true".
> > > > > > > 
> > > > > > > * On page cache invalidation in intel vIOMMU, check if the domain belong to
> > > > > > >   registered notifier, and notify accordingly.      
> > > > > > 
> > > > > > This looks sane I think. Alex, care to comment?
> > > > > > Merging will have to wait until after the release.
> > > > > > Pls remember to re-test and re-ping then.    
> > > > > 
> > > > > I don't think it's suitable for upstream until there's a reasonable
> > > > > replay mechanism    
> > > > 
> > > > Could you pls clarify what do you mean by replay?
> > > > Is this when you attach a device by hotplug to
> > > > a running system?
> > > > 
> > > > If yes this can maybe be addressed by disabling hotplug temporarily.  
> > > 
> > > No, hotplug is not required, moving a device between existing domains
> > > requires replay, ie. actually using it for nested device assignment.  
> > 
> > Good point, that one is a correctness thing. Aviv,
> > could you add this in TODO list in a cover letter pls?
> > 
> > > > > and we straighten out whether it's expected to get
> > > > > multiple notifies and the notif-ee is responsible for filtering
> > > > > them or if the notif-er should do filtering.    
> > > > 
> > > > OK this is a documentation thing.  
> > > 
> > > Well no, it needs to be decided and if necessary implemented.  
> > 
> > Let's assume it's the notif-ee for now. Less is more and all that.
> 
> I think this is opposite of the approach dwg suggested.
>  
> > > > >  Without those, this is
> > > > > effectively just an RFC.    
> > > > 
> > > > It's infrastructure without users so it doesn't break things,
> > > > I'm more interested in seeing whether it's broken in
> > > > some way than whether it's complete.  
> > > 
> > > If it allows use with vfio but doesn't fully implement the complete set
> > > of interfaces, it does break things.  We currently prevent viommu usage
> > > with vfio because it is incomplete.  
> > 
> > Right - that bit is still in as far as I can see.
> 
> Nope, 3/3 changes vtd_iommu_notify_flag_changed() to allow use with
> vfio even though it's still incomplete.  We would at least need
> something like a replay callback for VT-d that triggers an abort if you
> still want to accept it incomplete.  Thanks,
> 
> Alex

IIUC practically things seems to work, right?
So how about disabling by default with a flag for people that want to
experiment with it?
E.g. x-vfio-allow-broken-translations ?

I would like to help this make progress such that 1. Aviv
gets the credit he did so far and 2. more people can join
development and help complete it.

> > > > The patchset spent out of tree too long and I'd like to see
> > > > us make progress towards device assignment working with
> > > > vIOMMU sooner rather than later, so if it's broken I won't
> > > > merge it but if it's incomplete I will.  
> > > 
> > > So long as it's incomplete and still prevents vfio usage, I'm ok with
> > > merging it, but I don't want to enable vfio usage until it's complete.
> > > Thanks,
> > > 
> > > Alex
> > >   
> > > > > > > Currently this patch still doesn't enabling VFIO devices support with vIOMMU 
> > > > > > > present. Current problems:
> > > > > > > * vfio_iommu_map_notify is not aware about memory range belong to specific 
> > > > > > >   VFIOGuestIOMMU.
> > > > > > > * memory_region_iommu_replay hangs QEMU on start up while it itterate over 
> > > > > > >   64bit address space. Commenting out the call to this function enables 
> > > > > > >   workable VFIO device while vIOMMU present.
> > > > > > > * vfio_iommu_map_notify should check if address space range is suitable for 
> > > > > > >   current notifier.
> > > > > > > 
> > > > > > > Changes from v1 to v2:
> > > > > > > * remove assumption that the cache do not clears
> > > > > > > * fix lockup on high load.
> > > > > > > 
> > > > > > > Changes from v2 to v3:
> > > > > > > * remove debug leftovers
> > > > > > > * split to sepearate commits
> > > > > > > * change is_write to flags in vtd_do_iommu_translate, add IOMMU_NO_FAIL 
> > > > > > >   to suppress error propagating to guest.
> > > > > > > 
> > > > > > > Changes from v3 to v4:
> > > > > > > * Add property to intel_iommu device to control the CM capability, 
> > > > > > >   default to False.
> > > > > > > * Use s->iommu_ops.notify_flag_changed to register notifiers.
> > > > > > > 
> > > > > > > Changes from v4 to v4 RESEND:
> > > > > > > * Fix codding style pointed by checkpatch.pl script.
> > > > > > > 
> > > > > > > Changes from v4 to v5:
> > > > > > > * Reduce the number of changes in patch 2 and make flags real bitfield.
> > > > > > > * Revert deleted debug prints.
> > > > > > > * Fix memory leak in patch 3.
> > > > > > > 
> > > > > > > Changes from v5 to v6:
> > > > > > > * fix prototype of iommu_translate function for more IOMMU types.
> > > > > > > * VFIO will be notified only on the difference, without unmap 
> > > > > > >   before change to maps.
> > > > > > > 
> > > > > > > Aviv Ben-David (3):
> > > > > > >   IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to
> > > > > > >     guest
> > > > > > >   IOMMU: change iommu_op->translate's is_write to flags, add support to
> > > > > > >     NO_FAIL flag mode
> > > > > > >   IOMMU: enable intel_iommu map and unmap notifiers
> > > > > > > 
> > > > > > >  exec.c                         |   3 +-
> > > > > > >  hw/alpha/typhoon.c             |   2 +-
> > > > > > >  hw/i386/amd_iommu.c            |   4 +-
> > > > > > >  hw/i386/intel_iommu.c          | 160 +++++++++++++++++++++++++++++++++--------
> > > > > > >  hw/i386/intel_iommu_internal.h |   3 +
> > > > > > >  hw/pci-host/apb.c              |   2 +-
> > > > > > >  hw/ppc/spapr_iommu.c           |   2 +-
> > > > > > >  hw/s390x/s390-pci-bus.c        |   2 +-
> > > > > > >  include/exec/memory.h          |   6 +-
> > > > > > >  include/hw/i386/intel_iommu.h  |  11 +++
> > > > > > >  memory.c                       |   3 +-
> > > > > > >  11 files changed, 160 insertions(+), 38 deletions(-)
> > > > > > > 
> > > > > > > -- 
> > > > > > > 1.9.1      
> > > > > >     

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v6 0/3] IOMMU: intel_iommu support map and unmap notifications
  2016-11-16 13:54             ` Michael S. Tsirkin
@ 2016-11-16 15:34               ` Alex Williamson
  2016-11-16 19:50                 ` Aviv B.D.
  0 siblings, 1 reply; 32+ messages in thread
From: Alex Williamson @ 2016-11-16 15:34 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Aviv B.D, Jan Kiszka, qemu-devel, Peter Xu

On Wed, 16 Nov 2016 15:54:56 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Thu, Nov 10, 2016 at 12:44:47PM -0700, Alex Williamson wrote:
> > On Thu, 10 Nov 2016 21:20:36 +0200
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >   
> > > On Thu, Nov 10, 2016 at 09:04:13AM -0700, Alex Williamson wrote:  
> > > > On Thu, 10 Nov 2016 17:54:35 +0200
> > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > >     
> > > > > On Thu, Nov 10, 2016 at 08:30:21AM -0700, Alex Williamson wrote:    
> > > > > > On Thu, 10 Nov 2016 17:14:24 +0200
> > > > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > >       
> > > > > > > On Tue, Nov 08, 2016 at 01:04:21PM +0200, Aviv B.D wrote:      
> > > > > > > > From: "Aviv Ben-David" <bd.aviv@gmail.com>
> > > > > > > > 
> > > > > > > > * Advertize Cache Mode capability in iommu cap register. 
> > > > > > > >   This capability is controlled by "cache-mode" property of intel-iommu device.
> > > > > > > >   To enable this option call QEMU with "-device intel-iommu,cache-mode=true".
> > > > > > > > 
> > > > > > > > * On page cache invalidation in intel vIOMMU, check if the domain belong to
> > > > > > > >   registered notifier, and notify accordingly.        
> > > > > > > 
> > > > > > > This looks sane I think. Alex, care to comment?
> > > > > > > Merging will have to wait until after the release.
> > > > > > > Pls remember to re-test and re-ping then.      
> > > > > > 
> > > > > > I don't think it's suitable for upstream until there's a reasonable
> > > > > > replay mechanism      
> > > > > 
> > > > > Could you pls clarify what do you mean by replay?
> > > > > Is this when you attach a device by hotplug to
> > > > > a running system?
> > > > > 
> > > > > If yes this can maybe be addressed by disabling hotplug temporarily.    
> > > > 
> > > > No, hotplug is not required, moving a device between existing domains
> > > > requires replay, ie. actually using it for nested device assignment.    
> > > 
> > > Good point, that one is a correctness thing. Aviv,
> > > could you add this in TODO list in a cover letter pls?
> > >   
> > > > > > and we straighten out whether it's expected to get
> > > > > > multiple notifies and the notif-ee is responsible for filtering
> > > > > > them or if the notif-er should do filtering.      
> > > > > 
> > > > > OK this is a documentation thing.    
> > > > 
> > > > Well no, it needs to be decided and if necessary implemented.    
> > > 
> > > Let's assume it's the notif-ee for now. Less is more and all that.  
> > 
> > I think this is opposite of the approach dwg suggested.
> >    
> > > > > >  Without those, this is
> > > > > > effectively just an RFC.      
> > > > > 
> > > > > It's infrastructure without users so it doesn't break things,
> > > > > I'm more interested in seeing whether it's broken in
> > > > > some way than whether it's complete.    
> > > > 
> > > > If it allows use with vfio but doesn't fully implement the complete set
> > > > of interfaces, it does break things.  We currently prevent viommu usage
> > > > with vfio because it is incomplete.    
> > > 
> > > Right - that bit is still in as far as I can see.  
> > 
> > Nope, 3/3 changes vtd_iommu_notify_flag_changed() to allow use with
> > vfio even though it's still incomplete.  We would at least need
> > something like a replay callback for VT-d that triggers an abort if you
> > still want to accept it incomplete.  Thanks,
> > 
> > Alex  
> 
> IIUC practically things seems to work, right?

AFAIK, no.

> So how about disabling by default with a flag for people that want to
> experiment with it?
> E.g. x-vfio-allow-broken-translations ?

We've already been through one round of "intel-iommu is incomplete for
use with device assignment, how can we prevent it from being used",
which led to the notify_flag_changed callback on MemoryRegionIOMMUOps.
This series now claims to fix that yet still doesn't provide a
mechanism to do memory_region_iommu_replay() given that VT-d has a much
larger address width.  Why is the onus on vfio to resolve this or
provide some sort of workaround?  vfio is using the QEMU iommu
interface correctly, intel-iommu is still incomplete. The least it
could do is add an optional replay callback to MemoryRegionIOMMUOps
that supersedes the existing memory_region_iommu_replay() code and
triggers an abort when it gets called.  I don't know what an
x-vfio-allow-broken-translations option would do, how I'd implement it,
or why I'd bother to implement it.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v6 0/3] IOMMU: intel_iommu support map and unmap notifications
  2016-11-10 19:20         ` Michael S. Tsirkin
  2016-11-10 19:44           ` Alex Williamson
@ 2016-11-16 19:20           ` Aviv B.D.
  1 sibling, 0 replies; 32+ messages in thread
From: Aviv B.D. @ 2016-11-16 19:20 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Alex Williamson, Jan Kiszka, qemu-devel, Peter Xu

On Thu, Nov 10, 2016 at 9:20 PM, Michael S. Tsirkin <mst@redhat.com> wrote:

> On Thu, Nov 10, 2016 at 09:04:13AM -0700, Alex Williamson wrote:
> > On Thu, 10 Nov 2016 17:54:35 +0200
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >
> > > On Thu, Nov 10, 2016 at 08:30:21AM -0700, Alex Williamson wrote:
> > > > On Thu, 10 Nov 2016 17:14:24 +0200
> > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > >
> > > > > On Tue, Nov 08, 2016 at 01:04:21PM +0200, Aviv B.D wrote:
> > > > > > From: "Aviv Ben-David" <bd.aviv@gmail.com>
> > > > > >
> > > > > > * Advertize Cache Mode capability in iommu cap register.
> > > > > >   This capability is controlled by "cache-mode" property of
> intel-iommu device.
> > > > > >   To enable this option call QEMU with "-device
> intel-iommu,cache-mode=true".
> > > > > >
> > > > > > * On page cache invalidation in intel vIOMMU, check if the
> domain belong to
> > > > > >   registered notifier, and notify accordingly.
> > > > >
> > > > > This looks sane I think. Alex, care to comment?
> > > > > Merging will have to wait until after the release.
> > > > > Pls remember to re-test and re-ping then.
> > > >
> > > > I don't think it's suitable for upstream until there's a reasonable
> > > > replay mechanism
> > >
> > > Could you pls clarify what do you mean by replay?
> > > Is this when you attach a device by hotplug to
> > > a running system?
> > >
> > > If yes this can maybe be addressed by disabling hotplug temporarily.
> >
> > No, hotplug is not required, moving a device between existing domains
> > requires replay, ie. actually using it for nested device assignment.
>
> Good point, that one is a correctness thing. Aviv,
> could you add this in TODO list in a cover letter pls?
>

Sure, no problem.


>
> > > > and we straighten out whether it's expected to get
> > > > multiple notifies and the notif-ee is responsible for filtering
> > > > them or if the notif-er should do filtering.
> > >
> > > OK this is a documentation thing.
> >
> > Well no, it needs to be decided and if necessary implemented.
>
> Let's assume it's the notif-ee for now. Less is more and all that.
>
> > > >  Without those, this is
> > > > effectively just an RFC.
> > >
> > > It's infrastructure without users so it doesn't break things,
> > > I'm more interested in seeing whether it's broken in
> > > some way than whether it's complete.
> >
> > If it allows use with vfio but doesn't fully implement the complete set
> > of interfaces, it does break things.  We currently prevent viommu usage
> > with vfio because it is incomplete.
>
> Right - that bit is still in as far as I can see.
>
> > > The patchset spent out of tree too long and I'd like to see
> > > us make progress towards device assignment working with
> > > vIOMMU sooner rather than later, so if it's broken I won't
> > > merge it but if it's incomplete I will.
> >
> > So long as it's incomplete and still prevents vfio usage, I'm ok with
> > merging it, but I don't want to enable vfio usage until it's complete.
> > Thanks,
> >
> > Alex
> >
> > > > > > Currently this patch still doesn't enabling VFIO devices support
> with vIOMMU
> > > > > > present. Current problems:
> > > > > > * vfio_iommu_map_notify is not aware about memory range belong
> to specific
> > > > > >   VFIOGuestIOMMU.
> > > > > > * memory_region_iommu_replay hangs QEMU on start up while it
> itterate over
> > > > > >   64bit address space. Commenting out the call to this function
> enables
> > > > > >   workable VFIO device while vIOMMU present.
> > > > > > * vfio_iommu_map_notify should check if address space range is
> suitable for
> > > > > >   current notifier.
> > > > > >
> > > > > > Changes from v1 to v2:
> > > > > > * remove assumption that the cache do not clears
> > > > > > * fix lockup on high load.
> > > > > >
> > > > > > Changes from v2 to v3:
> > > > > > * remove debug leftovers
> > > > > > * split to sepearate commits
> > > > > > * change is_write to flags in vtd_do_iommu_translate, add
> IOMMU_NO_FAIL
> > > > > >   to suppress error propagating to guest.
> > > > > >
> > > > > > Changes from v3 to v4:
> > > > > > * Add property to intel_iommu device to control the CM
> capability,
> > > > > >   default to False.
> > > > > > * Use s->iommu_ops.notify_flag_changed to register notifiers.
> > > > > >
> > > > > > Changes from v4 to v4 RESEND:
> > > > > > * Fix codding style pointed by checkpatch.pl script.
> > > > > >
> > > > > > Changes from v4 to v5:
> > > > > > * Reduce the number of changes in patch 2 and make flags real
> bitfield.
> > > > > > * Revert deleted debug prints.
> > > > > > * Fix memory leak in patch 3.
> > > > > >
> > > > > > Changes from v5 to v6:
> > > > > > * fix prototype of iommu_translate function for more IOMMU types.
> > > > > > * VFIO will be notified only on the difference, without unmap
> > > > > >   before change to maps.
> > > > > >
> > > > > > Aviv Ben-David (3):
> > > > > >   IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility
> exposoed to
> > > > > >     guest
> > > > > >   IOMMU: change iommu_op->translate's is_write to flags, add
> support to
> > > > > >     NO_FAIL flag mode
> > > > > >   IOMMU: enable intel_iommu map and unmap notifiers
> > > > > >
> > > > > >  exec.c                         |   3 +-
> > > > > >  hw/alpha/typhoon.c             |   2 +-
> > > > > >  hw/i386/amd_iommu.c            |   4 +-
> > > > > >  hw/i386/intel_iommu.c          | 160
> +++++++++++++++++++++++++++++++++--------
> > > > > >  hw/i386/intel_iommu_internal.h |   3 +
> > > > > >  hw/pci-host/apb.c              |   2 +-
> > > > > >  hw/ppc/spapr_iommu.c           |   2 +-
> > > > > >  hw/s390x/s390-pci-bus.c        |   2 +-
> > > > > >  include/exec/memory.h          |   6 +-
> > > > > >  include/hw/i386/intel_iommu.h  |  11 +++
> > > > > >  memory.c                       |   3 +-
> > > > > >  11 files changed, 160 insertions(+), 38 deletions(-)
> > > > > >
> > > > > > --
> > > > > > 1.9.1
> > > > >
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v6 0/3] IOMMU: intel_iommu support map and unmap notifications
  2016-11-16 15:34               ` Alex Williamson
@ 2016-11-16 19:50                 ` Aviv B.D.
  2016-11-16 19:57                   ` Michael S. Tsirkin
  2016-11-16 20:03                   ` Alex Williamson
  0 siblings, 2 replies; 32+ messages in thread
From: Aviv B.D. @ 2016-11-16 19:50 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Michael S. Tsirkin, Jan Kiszka, qemu-devel, Peter Xu

On Wed, Nov 16, 2016 at 5:34 PM, Alex Williamson <alex.williamson@redhat.com
> wrote:

> On Wed, 16 Nov 2016 15:54:56 +0200
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
> > On Thu, Nov 10, 2016 at 12:44:47PM -0700, Alex Williamson wrote:
> > > On Thu, 10 Nov 2016 21:20:36 +0200
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > >
> > > > On Thu, Nov 10, 2016 at 09:04:13AM -0700, Alex Williamson wrote:
> > > > > On Thu, 10 Nov 2016 17:54:35 +0200
> > > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > >
> > > > > > On Thu, Nov 10, 2016 at 08:30:21AM -0700, Alex Williamson wrote:
> > > > > > > On Thu, 10 Nov 2016 17:14:24 +0200
> > > > > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > > On Tue, Nov 08, 2016 at 01:04:21PM +0200, Aviv B.D wrote:
> > > > > > > > > From: "Aviv Ben-David" <bd.aviv@gmail.com>
> > > > > > > > >
> > > > > > > > > * Advertize Cache Mode capability in iommu cap register.
> > > > > > > > >   This capability is controlled by "cache-mode" property
> of intel-iommu device.
> > > > > > > > >   To enable this option call QEMU with "-device
> intel-iommu,cache-mode=true".
> > > > > > > > >
> > > > > > > > > * On page cache invalidation in intel vIOMMU, check if the
> domain belong to
> > > > > > > > >   registered notifier, and notify accordingly.
> > > > > > > >
> > > > > > > > This looks sane I think. Alex, care to comment?
> > > > > > > > Merging will have to wait until after the release.
> > > > > > > > Pls remember to re-test and re-ping then.
> > > > > > >
> > > > > > > I don't think it's suitable for upstream until there's a
> reasonable
> > > > > > > replay mechanism
> > > > > >
> > > > > > Could you pls clarify what do you mean by replay?
> > > > > > Is this when you attach a device by hotplug to
> > > > > > a running system?
> > > > > >
> > > > > > If yes this can maybe be addressed by disabling hotplug
> temporarily.
> > > > >
> > > > > No, hotplug is not required, moving a device between existing
> domains
> > > > > requires replay, ie. actually using it for nested device
> assignment.
> > > >
> > > > Good point, that one is a correctness thing. Aviv,
> > > > could you add this in TODO list in a cover letter pls?
> > > >
> > > > > > > and we straighten out whether it's expected to get
> > > > > > > multiple notifies and the notif-ee is responsible for filtering
> > > > > > > them or if the notif-er should do filtering.
> > > > > >
> > > > > > OK this is a documentation thing.
> > > > >
> > > > > Well no, it needs to be decided and if necessary implemented.
> > > >
> > > > Let's assume it's the notif-ee for now. Less is more and all that.
> > >
> > > I think this is opposite of the approach dwg suggested.
> > >
> > > > > > >  Without those, this is
> > > > > > > effectively just an RFC.
> > > > > >
> > > > > > It's infrastructure without users so it doesn't break things,
> > > > > > I'm more interested in seeing whether it's broken in
> > > > > > some way than whether it's complete.
> > > > >
> > > > > If it allows use with vfio but doesn't fully implement the
> complete set
> > > > > of interfaces, it does break things.  We currently prevent viommu
> usage
> > > > > with vfio because it is incomplete.
> > > >
> > > > Right - that bit is still in as far as I can see.
> > >
> > > Nope, 3/3 changes vtd_iommu_notify_flag_changed() to allow use with
> > > vfio even though it's still incomplete.  We would at least need
> > > something like a replay callback for VT-d that triggers an abort if you
> > > still want to accept it incomplete.  Thanks,
> > >
> > > Alex
> >
> > IIUC practically things seems to work, right?
>
> AFAIK, no.
>
> > So how about disabling by default with a flag for people that want to
> > experiment with it?
> > E.g. x-vfio-allow-broken-translations ?
>
> We've already been through one round of "intel-iommu is incomplete for
> use with device assignment, how can we prevent it from being used",
> which led to the notify_flag_changed callback on MemoryRegionIOMMUOps.
> This series now claims to fix that yet still doesn't provide a
> mechanism to do memory_region_iommu_replay() given that VT-d has a much
> larger address width.  Why is the onus on vfio to resolve this or
> provide some sort of workaround?  vfio is using the QEMU iommu
> interface correctly, intel-iommu is still incomplete. The least it
> could do is add an optional replay callback to MemoryRegionIOMMUOps
> that supersedes the existing memory_region_iommu_replay() code and
> triggers an abort when it gets called.  I don't know what an
> x-vfio-allow-broken-translations option would do, how I'd implement it,
> or why I'd bother to implement it.  Thanks,
>
> Alex
>

I'll implement your suggestion regarding the replay framwork.
Probably in another patch set, if it is OK?

Thanks,
Aviv.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v6 0/3] IOMMU: intel_iommu support map and unmap notifications
  2016-11-16 19:50                 ` Aviv B.D.
@ 2016-11-16 19:57                   ` Michael S. Tsirkin
  2016-11-21 11:36                     ` Aviv B.D.
  2016-11-16 20:03                   ` Alex Williamson
  1 sibling, 1 reply; 32+ messages in thread
From: Michael S. Tsirkin @ 2016-11-16 19:57 UTC (permalink / raw)
  To: Aviv B.D.; +Cc: Alex Williamson, Jan Kiszka, qemu-devel, Peter Xu

On Wed, Nov 16, 2016 at 09:50:46PM +0200, Aviv B.D. wrote:
> 
> 
> On Wed, Nov 16, 2016 at 5:34 PM, Alex Williamson <alex.williamson@redhat.com>
> wrote:
> 
>     On Wed, 16 Nov 2016 15:54:56 +0200
>     "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
>     > On Thu, Nov 10, 2016 at 12:44:47PM -0700, Alex Williamson wrote:
>     > > On Thu, 10 Nov 2016 21:20:36 +0200
>     > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
>     > >
>     > > > On Thu, Nov 10, 2016 at 09:04:13AM -0700, Alex Williamson wrote:
>     > > > > On Thu, 10 Nov 2016 17:54:35 +0200
>     > > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
>     > > > >
>     > > > > > On Thu, Nov 10, 2016 at 08:30:21AM -0700, Alex Williamson wrote:
>     > > > > > > On Thu, 10 Nov 2016 17:14:24 +0200
>     > > > > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
>     > > > > > >
>     > > > > > > > On Tue, Nov 08, 2016 at 01:04:21PM +0200, Aviv B.D wrote:
>     > > > > > > > > From: "Aviv Ben-David" <bd.aviv@gmail.com>
>     > > > > > > > >
>     > > > > > > > > * Advertize Cache Mode capability in iommu cap register.
>     > > > > > > > >   This capability is controlled by "cache-mode" property of
>     intel-iommu device.
>     > > > > > > > >   To enable this option call QEMU with "-device
>     intel-iommu,cache-mode=true".
>     > > > > > > > >
>     > > > > > > > > * On page cache invalidation in intel vIOMMU, check if the
>     domain belong to
>     > > > > > > > >   registered notifier, and notify accordingly.
>     > > > > > > >
>     > > > > > > > This looks sane I think. Alex, care to comment?
>     > > > > > > > Merging will have to wait until after the release.
>     > > > > > > > Pls remember to re-test and re-ping then.
>     > > > > > >
>     > > > > > > I don't think it's suitable for upstream until there's a
>     reasonable
>     > > > > > > replay mechanism
>     > > > > >
>     > > > > > Could you pls clarify what do you mean by replay?
>     > > > > > Is this when you attach a device by hotplug to
>     > > > > > a running system?
>     > > > > >
>     > > > > > If yes this can maybe be addressed by disabling hotplug
>     temporarily.
>     > > > >
>     > > > > No, hotplug is not required, moving a device between existing
>     domains
>     > > > > requires replay, ie. actually using it for nested device
>     assignment.
>     > > >
>     > > > Good point, that one is a correctness thing. Aviv,
>     > > > could you add this in TODO list in a cover letter pls?
>     > > >
>     > > > > > > and we straighten out whether it's expected to get
>     > > > > > > multiple notifies and the notif-ee is responsible for filtering
>     > > > > > > them or if the notif-er should do filtering.
>     > > > > >
>     > > > > > OK this is a documentation thing.
>     > > > >
>     > > > > Well no, it needs to be decided and if necessary implemented.
>     > > >
>     > > > Let's assume it's the notif-ee for now. Less is more and all that.
>     > >
>     > > I think this is opposite of the approach dwg suggested.
>     > >
>     > > > > > >  Without those, this is
>     > > > > > > effectively just an RFC.
>     > > > > >
>     > > > > > It's infrastructure without users so it doesn't break things,
>     > > > > > I'm more interested in seeing whether it's broken in
>     > > > > > some way than whether it's complete.
>     > > > >
>     > > > > If it allows use with vfio but doesn't fully implement the complete
>     set
>     > > > > of interfaces, it does break things.  We currently prevent viommu
>     usage
>     > > > > with vfio because it is incomplete.
>     > > >
>     > > > Right - that bit is still in as far as I can see.
>     > >
>     > > Nope, 3/3 changes vtd_iommu_notify_flag_changed() to allow use with
>     > > vfio even though it's still incomplete.  We would at least need
>     > > something like a replay callback for VT-d that triggers an abort if you
>     > > still want to accept it incomplete.  Thanks,
>     > >
>     > > Alex
>     >
>     > IIUC practically things seems to work, right?
> 
>     AFAIK, no.
>    
>     > So how about disabling by default with a flag for people that want to
>     > experiment with it?
>     > E.g. x-vfio-allow-broken-translations ?
> 
>     We've already been through one round of "intel-iommu is incomplete for
>     use with device assignment, how can we prevent it from being used",
>     which led to the notify_flag_changed callback on MemoryRegionIOMMUOps.
>     This series now claims to fix that yet still doesn't provide a
>     mechanism to do memory_region_iommu_replay() given that VT-d has a much
>     larger address width.  Why is the onus on vfio to resolve this or
>     provide some sort of workaround?  vfio is using the QEMU iommu
>     interface correctly, intel-iommu is still incomplete. The least it
>     could do is add an optional replay callback to MemoryRegionIOMMUOps
>     that supersedes the existing memory_region_iommu_replay() code and
>     triggers an abort when it gets called.  I don't know what an
>     x-vfio-allow-broken-translations option would do, how I'd implement it,
>     or why I'd bother to implement it.  Thanks,
> 
>     Alex
> 
> 
> I'll implement your suggestion regarding the replay framwork. 
> Probably in another patch set, if it is OK?
> 
> Thanks,
> Aviv. 
> 

Alex points out that with 3/3 setting cache_mode=1 will enable VFIO
device assignment. Question would be, does it actually work for you
in this mode? If not it seems important not to enable it for users.

-- 
MST

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v6 0/3] IOMMU: intel_iommu support map and unmap notifications
  2016-11-16 19:50                 ` Aviv B.D.
  2016-11-16 19:57                   ` Michael S. Tsirkin
@ 2016-11-16 20:03                   ` Alex Williamson
  2016-11-16 20:17                     ` Michael S. Tsirkin
  1 sibling, 1 reply; 32+ messages in thread
From: Alex Williamson @ 2016-11-16 20:03 UTC (permalink / raw)
  To: Aviv B.D.; +Cc: Michael S. Tsirkin, Jan Kiszka, qemu-devel, Peter Xu

On Wed, 16 Nov 2016 21:50:46 +0200
"Aviv B.D." <bd.aviv@gmail.com> wrote:

> On Wed, Nov 16, 2016 at 5:34 PM, Alex Williamson <alex.williamson@redhat.com
> > wrote:  
> 
> > On Wed, 16 Nov 2016 15:54:56 +0200
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >  
> > > On Thu, Nov 10, 2016 at 12:44:47PM -0700, Alex Williamson wrote:  
> > > > On Thu, 10 Nov 2016 21:20:36 +0200
> > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > >  
> > > > > On Thu, Nov 10, 2016 at 09:04:13AM -0700, Alex Williamson wrote:  
> > > > > > On Thu, 10 Nov 2016 17:54:35 +0200
> > > > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > >  
> > > > > > > On Thu, Nov 10, 2016 at 08:30:21AM -0700, Alex Williamson wrote:  
> > > > > > > > On Thu, 10 Nov 2016 17:14:24 +0200
> > > > > > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > >  
> > > > > > > > > On Tue, Nov 08, 2016 at 01:04:21PM +0200, Aviv B.D wrote:  
> > > > > > > > > > From: "Aviv Ben-David" <bd.aviv@gmail.com>
> > > > > > > > > >
> > > > > > > > > > * Advertize Cache Mode capability in iommu cap register.
> > > > > > > > > >   This capability is controlled by "cache-mode" property  
> > of intel-iommu device.  
> > > > > > > > > >   To enable this option call QEMU with "-device  
> > intel-iommu,cache-mode=true".  
> > > > > > > > > >
> > > > > > > > > > * On page cache invalidation in intel vIOMMU, check if the  
> > domain belong to  
> > > > > > > > > >   registered notifier, and notify accordingly.  
> > > > > > > > >
> > > > > > > > > This looks sane I think. Alex, care to comment?
> > > > > > > > > Merging will have to wait until after the release.
> > > > > > > > > Pls remember to re-test and re-ping then.  
> > > > > > > >
> > > > > > > > I don't think it's suitable for upstream until there's a  
> > reasonable  
> > > > > > > > replay mechanism  
> > > > > > >
> > > > > > > Could you pls clarify what do you mean by replay?
> > > > > > > Is this when you attach a device by hotplug to
> > > > > > > a running system?
> > > > > > >
> > > > > > > If yes this can maybe be addressed by disabling hotplug  
> > temporarily.  
> > > > > >
> > > > > > No, hotplug is not required, moving a device between existing  
> > domains  
> > > > > > requires replay, ie. actually using it for nested device  
> > assignment.  
> > > > >
> > > > > Good point, that one is a correctness thing. Aviv,
> > > > > could you add this in TODO list in a cover letter pls?
> > > > >  
> > > > > > > > and we straighten out whether it's expected to get
> > > > > > > > multiple notifies and the notif-ee is responsible for filtering
> > > > > > > > them or if the notif-er should do filtering.  
> > > > > > >
> > > > > > > OK this is a documentation thing.  
> > > > > >
> > > > > > Well no, it needs to be decided and if necessary implemented.  
> > > > >
> > > > > Let's assume it's the notif-ee for now. Less is more and all that.  
> > > >
> > > > I think this is opposite of the approach dwg suggested.
> > > >  
> > > > > > > >  Without those, this is
> > > > > > > > effectively just an RFC.  
> > > > > > >
> > > > > > > It's infrastructure without users so it doesn't break things,
> > > > > > > I'm more interested in seeing whether it's broken in
> > > > > > > some way than whether it's complete.  
> > > > > >
> > > > > > If it allows use with vfio but doesn't fully implement the  
> > complete set  
> > > > > > of interfaces, it does break things.  We currently prevent viommu  
> > usage  
> > > > > > with vfio because it is incomplete.  
> > > > >
> > > > > Right - that bit is still in as far as I can see.  
> > > >
> > > > Nope, 3/3 changes vtd_iommu_notify_flag_changed() to allow use with
> > > > vfio even though it's still incomplete.  We would at least need
> > > > something like a replay callback for VT-d that triggers an abort if you
> > > > still want to accept it incomplete.  Thanks,
> > > >
> > > > Alex  
> > >
> > > IIUC practically things seems to work, right?  
> >
> > AFAIK, no.
> >  
> > > So how about disabling by default with a flag for people that want to
> > > experiment with it?
> > > E.g. x-vfio-allow-broken-translations ?  
> >
> > We've already been through one round of "intel-iommu is incomplete for
> > use with device assignment, how can we prevent it from being used",
> > which led to the notify_flag_changed callback on MemoryRegionIOMMUOps.
> > This series now claims to fix that yet still doesn't provide a
> > mechanism to do memory_region_iommu_replay() given that VT-d has a much
> > larger address width.  Why is the onus on vfio to resolve this or
> > provide some sort of workaround?  vfio is using the QEMU iommu
> > interface correctly, intel-iommu is still incomplete. The least it
> > could do is add an optional replay callback to MemoryRegionIOMMUOps
> > that supersedes the existing memory_region_iommu_replay() code and
> > triggers an abort when it gets called.  I don't know what an
> > x-vfio-allow-broken-translations option would do, how I'd implement it,
> > or why I'd bother to implement it.  Thanks,
> >
> > Alex
> >  
> 
> I'll implement your suggestion regarding the replay framwork.
> Probably in another patch set, if it is OK?

I think it needs to be committed in the same series that enables the
notifier_flag_changed callback, otherwise we're opening the door to
make use of vfio devices when we know that support is incomplete.  QEMU
is in hard freeze anyway, this has missed 2.8.  I would absolutely
object to including this in 2.8 without the replay provision.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v6 0/3] IOMMU: intel_iommu support map and unmap notifications
  2016-11-16 20:03                   ` Alex Williamson
@ 2016-11-16 20:17                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2016-11-16 20:17 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Aviv B.D., Jan Kiszka, qemu-devel, Peter Xu

On Wed, Nov 16, 2016 at 01:03:14PM -0700, Alex Williamson wrote:
> On Wed, 16 Nov 2016 21:50:46 +0200
> "Aviv B.D." <bd.aviv@gmail.com> wrote:
> 
> > On Wed, Nov 16, 2016 at 5:34 PM, Alex Williamson <alex.williamson@redhat.com
> > > wrote:  
> > 
> > > On Wed, 16 Nov 2016 15:54:56 +0200
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > >  
> > > > On Thu, Nov 10, 2016 at 12:44:47PM -0700, Alex Williamson wrote:  
> > > > > On Thu, 10 Nov 2016 21:20:36 +0200
> > > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > >  
> > > > > > On Thu, Nov 10, 2016 at 09:04:13AM -0700, Alex Williamson wrote:  
> > > > > > > On Thu, 10 Nov 2016 17:54:35 +0200
> > > > > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > >  
> > > > > > > > On Thu, Nov 10, 2016 at 08:30:21AM -0700, Alex Williamson wrote:  
> > > > > > > > > On Thu, 10 Nov 2016 17:14:24 +0200
> > > > > > > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > >  
> > > > > > > > > > On Tue, Nov 08, 2016 at 01:04:21PM +0200, Aviv B.D wrote:  
> > > > > > > > > > > From: "Aviv Ben-David" <bd.aviv@gmail.com>
> > > > > > > > > > >
> > > > > > > > > > > * Advertize Cache Mode capability in iommu cap register.
> > > > > > > > > > >   This capability is controlled by "cache-mode" property  
> > > of intel-iommu device.  
> > > > > > > > > > >   To enable this option call QEMU with "-device  
> > > intel-iommu,cache-mode=true".  
> > > > > > > > > > >
> > > > > > > > > > > * On page cache invalidation in intel vIOMMU, check if the  
> > > domain belong to  
> > > > > > > > > > >   registered notifier, and notify accordingly.  
> > > > > > > > > >
> > > > > > > > > > This looks sane I think. Alex, care to comment?
> > > > > > > > > > Merging will have to wait until after the release.
> > > > > > > > > > Pls remember to re-test and re-ping then.  
> > > > > > > > >
> > > > > > > > > I don't think it's suitable for upstream until there's a  
> > > reasonable  
> > > > > > > > > replay mechanism  
> > > > > > > >
> > > > > > > > Could you pls clarify what do you mean by replay?
> > > > > > > > Is this when you attach a device by hotplug to
> > > > > > > > a running system?
> > > > > > > >
> > > > > > > > If yes this can maybe be addressed by disabling hotplug  
> > > temporarily.  
> > > > > > >
> > > > > > > No, hotplug is not required, moving a device between existing  
> > > domains  
> > > > > > > requires replay, ie. actually using it for nested device  
> > > assignment.  
> > > > > >
> > > > > > Good point, that one is a correctness thing. Aviv,
> > > > > > could you add this in TODO list in a cover letter pls?
> > > > > >  
> > > > > > > > > and we straighten out whether it's expected to get
> > > > > > > > > multiple notifies and the notif-ee is responsible for filtering
> > > > > > > > > them or if the notif-er should do filtering.  
> > > > > > > >
> > > > > > > > OK this is a documentation thing.  
> > > > > > >
> > > > > > > Well no, it needs to be decided and if necessary implemented.  
> > > > > >
> > > > > > Let's assume it's the notif-ee for now. Less is more and all that.  
> > > > >
> > > > > I think this is opposite of the approach dwg suggested.
> > > > >  
> > > > > > > > >  Without those, this is
> > > > > > > > > effectively just an RFC.  
> > > > > > > >
> > > > > > > > It's infrastructure without users so it doesn't break things,
> > > > > > > > I'm more interested in seeing whether it's broken in
> > > > > > > > some way than whether it's complete.  
> > > > > > >
> > > > > > > If it allows use with vfio but doesn't fully implement the  
> > > complete set  
> > > > > > > of interfaces, it does break things.  We currently prevent viommu  
> > > usage  
> > > > > > > with vfio because it is incomplete.  
> > > > > >
> > > > > > Right - that bit is still in as far as I can see.  
> > > > >
> > > > > Nope, 3/3 changes vtd_iommu_notify_flag_changed() to allow use with
> > > > > vfio even though it's still incomplete.  We would at least need
> > > > > something like a replay callback for VT-d that triggers an abort if you
> > > > > still want to accept it incomplete.  Thanks,
> > > > >
> > > > > Alex  
> > > >
> > > > IIUC practically things seems to work, right?  
> > >
> > > AFAIK, no.
> > >  
> > > > So how about disabling by default with a flag for people that want to
> > > > experiment with it?
> > > > E.g. x-vfio-allow-broken-translations ?  
> > >
> > > We've already been through one round of "intel-iommu is incomplete for
> > > use with device assignment, how can we prevent it from being used",
> > > which led to the notify_flag_changed callback on MemoryRegionIOMMUOps.
> > > This series now claims to fix that yet still doesn't provide a
> > > mechanism to do memory_region_iommu_replay() given that VT-d has a much
> > > larger address width.  Why is the onus on vfio to resolve this or
> > > provide some sort of workaround?  vfio is using the QEMU iommu
> > > interface correctly, intel-iommu is still incomplete. The least it
> > > could do is add an optional replay callback to MemoryRegionIOMMUOps
> > > that supersedes the existing memory_region_iommu_replay() code and
> > > triggers an abort when it gets called.  I don't know what an
> > > x-vfio-allow-broken-translations option would do, how I'd implement it,
> > > or why I'd bother to implement it.  Thanks,
> > >
> > > Alex
> > >  
> > 
> > I'll implement your suggestion regarding the replay framwork.
> > Probably in another patch set, if it is OK?
> 
> I think it needs to be committed in the same series that enables the
> notifier_flag_changed callback, otherwise we're opening the door to
> make use of vfio devices when we know that support is incomplete.  QEMU
> is in hard freeze anyway, this has missed 2.8.  I would absolutely
> object to including this in 2.8 without the replay provision.  Thanks,
> 
> Alex

I don't think this is 2.8 material but I'd like to start merging
this support first thing afterwards.

-- 
MST

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v6 0/3] IOMMU: intel_iommu support map and unmap notifications
  2016-11-16 19:57                   ` Michael S. Tsirkin
@ 2016-11-21 11:36                     ` Aviv B.D.
  0 siblings, 0 replies; 32+ messages in thread
From: Aviv B.D. @ 2016-11-21 11:36 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Alex Williamson, Jan Kiszka, qemu-devel, Peter Xu

On Wed, Nov 16, 2016 at 9:57 PM, Michael S. Tsirkin <mst@redhat.com> wrote:

> On Wed, Nov 16, 2016 at 09:50:46PM +0200, Aviv B.D. wrote:
> >
> >
> > On Wed, Nov 16, 2016 at 5:34 PM, Alex Williamson <
> alex.williamson@redhat.com>
> > wrote:
> >
> >     On Wed, 16 Nov 2016 15:54:56 +0200
> >     "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >
> >     > On Thu, Nov 10, 2016 at 12:44:47PM -0700, Alex Williamson wrote:
> >     > > On Thu, 10 Nov 2016 21:20:36 +0200
> >     > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >     > >
> >     > > > On Thu, Nov 10, 2016 at 09:04:13AM -0700, Alex Williamson
> wrote:
> >     > > > > On Thu, 10 Nov 2016 17:54:35 +0200
> >     > > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >     > > > >
> >     > > > > > On Thu, Nov 10, 2016 at 08:30:21AM -0700, Alex Williamson
> wrote:
> >     > > > > > > On Thu, 10 Nov 2016 17:14:24 +0200
> >     > > > > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >     > > > > > >
> >     > > > > > > > On Tue, Nov 08, 2016 at 01:04:21PM +0200, Aviv B.D
> wrote:
> >     > > > > > > > > From: "Aviv Ben-David" <bd.aviv@gmail.com>
> >     > > > > > > > >
> >     > > > > > > > > * Advertize Cache Mode capability in iommu cap
> register.
> >     > > > > > > > >   This capability is controlled by "cache-mode"
> property of
> >     intel-iommu device.
> >     > > > > > > > >   To enable this option call QEMU with "-device
> >     intel-iommu,cache-mode=true".
> >     > > > > > > > >
> >     > > > > > > > > * On page cache invalidation in intel vIOMMU, check
> if the
> >     domain belong to
> >     > > > > > > > >   registered notifier, and notify accordingly.
> >     > > > > > > >
> >     > > > > > > > This looks sane I think. Alex, care to comment?
> >     > > > > > > > Merging will have to wait until after the release.
> >     > > > > > > > Pls remember to re-test and re-ping then.
> >     > > > > > >
> >     > > > > > > I don't think it's suitable for upstream until there's a
> >     reasonable
> >     > > > > > > replay mechanism
> >     > > > > >
> >     > > > > > Could you pls clarify what do you mean by replay?
> >     > > > > > Is this when you attach a device by hotplug to
> >     > > > > > a running system?
> >     > > > > >
> >     > > > > > If yes this can maybe be addressed by disabling hotplug
> >     temporarily.
> >     > > > >
> >     > > > > No, hotplug is not required, moving a device between existing
> >     domains
> >     > > > > requires replay, ie. actually using it for nested device
> >     assignment.
> >     > > >
> >     > > > Good point, that one is a correctness thing. Aviv,
> >     > > > could you add this in TODO list in a cover letter pls?
> >     > > >
> >     > > > > > > and we straighten out whether it's expected to get
> >     > > > > > > multiple notifies and the notif-ee is responsible for
> filtering
> >     > > > > > > them or if the notif-er should do filtering.
> >     > > > > >
> >     > > > > > OK this is a documentation thing.
> >     > > > >
> >     > > > > Well no, it needs to be decided and if necessary implemented.
> >     > > >
> >     > > > Let's assume it's the notif-ee for now. Less is more and all
> that.
> >     > >
> >     > > I think this is opposite of the approach dwg suggested.
> >     > >
> >     > > > > > >  Without those, this is
> >     > > > > > > effectively just an RFC.
> >     > > > > >
> >     > > > > > It's infrastructure without users so it doesn't break
> things,
> >     > > > > > I'm more interested in seeing whether it's broken in
> >     > > > > > some way than whether it's complete.
> >     > > > >
> >     > > > > If it allows use with vfio but doesn't fully implement the
> complete
> >     set
> >     > > > > of interfaces, it does break things.  We currently prevent
> viommu
> >     usage
> >     > > > > with vfio because it is incomplete.
> >     > > >
> >     > > > Right - that bit is still in as far as I can see.
> >     > >
> >     > > Nope, 3/3 changes vtd_iommu_notify_flag_changed() to allow use
> with
> >     > > vfio even though it's still incomplete.  We would at least need
> >     > > something like a replay callback for VT-d that triggers an abort
> if you
> >     > > still want to accept it incomplete.  Thanks,
> >     > >
> >     > > Alex
> >     >
> >     > IIUC practically things seems to work, right?
> >
> >     AFAIK, no.
> >
> >     > So how about disabling by default with a flag for people that want
> to
> >     > experiment with it?
> >     > E.g. x-vfio-allow-broken-translations ?
> >
> >     We've already been through one round of "intel-iommu is incomplete
> for
> >     use with device assignment, how can we prevent it from being used",
> >     which led to the notify_flag_changed callback on
> MemoryRegionIOMMUOps.
> >     This series now claims to fix that yet still doesn't provide a
> >     mechanism to do memory_region_iommu_replay() given that VT-d has a
> much
> >     larger address width.  Why is the onus on vfio to resolve this or
> >     provide some sort of workaround?  vfio is using the QEMU iommu
> >     interface correctly, intel-iommu is still incomplete. The least it
> >     could do is add an optional replay callback to MemoryRegionIOMMUOps
> >     that supersedes the existing memory_region_iommu_replay() code and
> >     triggers an abort when it gets called.  I don't know what an
> >     x-vfio-allow-broken-translations option would do, how I'd implement
> it,
> >     or why I'd bother to implement it.  Thanks,
> >
> >     Alex
> >
> >
> > I'll implement your suggestion regarding the replay framwork.
> > Probably in another patch set, if it is OK?
> >
> > Thanks,
> > Aviv.
> >
>
> Alex points out that with 3/3 setting cache_mode=1 will enable VFIO
> device assignment. Question would be, does it actually work for you
> in this mode? If not it seems important not to enable it for users.
>
> --
> MST
>

The patch is actually working with just commenting out the replay call in
vfio.
I didn't tested it with nesting environment but with DPDK it worked great.

Aviv.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/3] IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to guest
  2016-11-11  4:15           ` Jason Wang
@ 2016-11-21 12:41             ` Aviv B.D.
  2016-11-22  3:59               ` Jason Wang
  2016-11-21 13:35             ` Michael S. Tsirkin
  1 sibling, 1 reply; 32+ messages in thread
From: Aviv B.D. @ 2016-11-21 12:41 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, qemu-devel, Jan Kiszka, Alex Williamson, Peter Xu

On Fri, Nov 11, 2016 at 6:15 AM, Jason Wang <jasowang@redhat.com> wrote:

>
>
> On 2016年11月11日 11:39, Michael S. Tsirkin wrote:
>
>> On Fri, Nov 11, 2016 at 10:32:42AM +0800, Jason Wang wrote:
>>
>>>
>>> On 2016年11月10日 06:00, Michael S. Tsirkin wrote:
>>>
>>>> On Wed, Nov 09, 2016 at 03:28:02PM +0800, Jason Wang wrote:
>>>>
>>>>>
>>>>>> On 2016年11月08日 19:04, Aviv B.D wrote:
>>>>>>
>>>>>>> From: "Aviv Ben-David"<bd.aviv@gmail.com>
>>>>>>>>
>>>>>>>> This capability asks the guest to invalidate cache before each map
>>>>>>>> operation.
>>>>>>>> We can use this invalidation to trap map operations in the
>>>>>>>> hypervisor.
>>>>>>>>
>>>>>>> Hi:
>>>>>>
>>>>>> Like I've asked twice in the past, I want to know why don't you cache
>>>>>> translation faults as what spec required (especially this is a guest
>>>>>> visible
>>>>>> behavior)?
>>>>>>
>>>>>> Btw, please cc me on posting future versions.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>> Caching isn't guest visible.
>>>>
>>> Seems not, if one fault mapping were cached by IOTLB. Guest can notice
>>> this
>>> behavior.
>>>
>> Sorry, I don't get what you are saying.
>>
>> Spec just says you*can*  cache,
>>>> not that you must.
>>>>
>>>> Yes, but what did in this patch is "don't". What I suggest is just a
>>> "can",
>>> since anyway the IOTLB entries were limited and could be replaced by
>>> other.
>>>
>>> Thanks
>>>
>> Have trouble understanding this. Can you given an example of
>> a guest visible difference?
>>
>
> I guess this may do the detection:
>
> 1) map iova A to be non-present.
> 2) invalidate iova A
> 3) map iova A to addr B
> 4) access iova A
>
> A correct implemented CM may meet fault in step 4, but with this patch, we
> never.
>
>
By the specification (from intel) section 6.1 (Caching mode) tells that
this bit may be present
only on virtual machines. So with just this point the guest can detect that
it running in
a VM of some sort. Additionally, the wording of the specification enable
the host to issue a fault
it your scenario but, doesn't requiring it.

Thanks,
Aviv.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/3] IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to guest
  2016-11-11  4:15           ` Jason Wang
  2016-11-21 12:41             ` Aviv B.D.
@ 2016-11-21 13:35             ` Michael S. Tsirkin
  2016-11-22  4:11               ` Jason Wang
  1 sibling, 1 reply; 32+ messages in thread
From: Michael S. Tsirkin @ 2016-11-21 13:35 UTC (permalink / raw)
  To: Jason Wang; +Cc: Aviv B.D, qemu-devel, Jan Kiszka, Alex Williamson, Peter Xu

On Fri, Nov 11, 2016 at 12:15:48PM +0800, Jason Wang wrote:
> 
> 
> On 2016年11月11日 11:39, Michael S. Tsirkin wrote:
> > On Fri, Nov 11, 2016 at 10:32:42AM +0800, Jason Wang wrote:
> > > 
> > > On 2016年11月10日 06:00, Michael S. Tsirkin wrote:
> > > > On Wed, Nov 09, 2016 at 03:28:02PM +0800, Jason Wang wrote:
> > > > > > 
> > > > > > On 2016年11月08日 19:04, Aviv B.D wrote:
> > > > > > > > From: "Aviv Ben-David"<bd.aviv@gmail.com>
> > > > > > > > 
> > > > > > > > This capability asks the guest to invalidate cache before each map operation.
> > > > > > > > We can use this invalidation to trap map operations in the hypervisor.
> > > > > > Hi:
> > > > > > 
> > > > > > Like I've asked twice in the past, I want to know why don't you cache
> > > > > > translation faults as what spec required (especially this is a guest visible
> > > > > > behavior)?
> > > > > > 
> > > > > > Btw, please cc me on posting future versions.
> > > > > > 
> > > > > > Thanks
> > > > Caching isn't guest visible.
> > > Seems not, if one fault mapping were cached by IOTLB. Guest can notice this
> > > behavior.
> > Sorry, I don't get what you are saying.
> > 
> > > > Spec just says you*can*  cache,
> > > > not that you must.
> > > > 
> > > Yes, but what did in this patch is "don't". What I suggest is just a "can",
> > > since anyway the IOTLB entries were limited and could be replaced by other.
> > > 
> > > Thanks
> > Have trouble understanding this. Can you given an example of
> > a guest visible difference?
> 
> I guess this may do the detection:
> 
> 1) map iova A to be non-present.
> 2) invalidate iova A
> 3) map iova A to addr B
> 4) access iova A
> 
> A correct implemented CM may meet fault in step 4, but with this patch, we
> never.

I think that IOTLB is free to invalidate entries at any point,
so the fault is not guaranteed on bare metal.

-- 
MST

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/3] IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to guest
  2016-11-21 12:41             ` Aviv B.D.
@ 2016-11-22  3:59               ` Jason Wang
  0 siblings, 0 replies; 32+ messages in thread
From: Jason Wang @ 2016-11-22  3:59 UTC (permalink / raw)
  To: Aviv B.D.
  Cc: Michael S. Tsirkin, qemu-devel, Jan Kiszka, Alex Williamson, Peter Xu



On 2016年11月21日 20:41, Aviv B.D. wrote:
>
>
> On Fri, Nov 11, 2016 at 6:15 AM, Jason Wang <jasowang@redhat.com 
> <mailto:jasowang@redhat.com>> wrote:
>
>
>
>     On 2016年11月11日 11:39, Michael S. Tsirkin wrote:
>
>         On Fri, Nov 11, 2016 at 10:32:42AM +0800, Jason Wang wrote:
>
>
>             On 2016年11月10日 06:00, Michael S. Tsirkin wrote:
>
>                 On Wed, Nov 09, 2016 at 03:28:02PM +0800, Jason Wang
>                 wrote:
>
>
>                         On 2016年11月08日 19:04, Aviv B.D wrote:
>
>                                 From: "Aviv
>                                 Ben-David"<bd.aviv@gmail.com
>                                 <mailto:bd.aviv@gmail.com>>
>
>                                 This capability asks the guest to
>                                 invalidate cache before each map
>                                 operation.
>                                 We can use this invalidation to trap
>                                 map operations in the hypervisor.
>
>                         Hi:
>
>                         Like I've asked twice in the past, I want to
>                         know why don't you cache
>                         translation faults as what spec required
>                         (especially this is a guest visible
>                         behavior)?
>
>                         Btw, please cc me on posting future versions.
>
>                         Thanks
>
>                 Caching isn't guest visible.
>
>             Seems not, if one fault mapping were cached by IOTLB.
>             Guest can notice this
>             behavior.
>
>         Sorry, I don't get what you are saying.
>
>                 Spec just says you*can*  cache,
>                 not that you must.
>
>             Yes, but what did in this patch is "don't". What I suggest
>             is just a "can",
>             since anyway the IOTLB entries were limited and could be
>             replaced by other.
>
>             Thanks
>
>         Have trouble understanding this. Can you given an example of
>         a guest visible difference?
>
>
>     I guess this may do the detection:
>
>     1) map iova A to be non-present.
>     2) invalidate iova A
>     3) map iova A to addr B
>     4) access iova A
>
>     A correct implemented CM may meet fault in step 4, but with this
>     patch, we never.
>
>
> By the specification (from intel) section 6.1 (Caching mode) tells 
> that this bit may be present
> only on virtual machines.

Yes, but we should also align the behavior to the spec.

> So with just this point the guest can detect that it running in
> a VM of some sort. Additionally, the wording of the specification 
> enable the host to issue a fault
> it your scenario but, doesn't requiring it.

I'm not sure I get the meaning, is there a section that describes your 
meaning in the spec?

Thanks

>
> Thanks,
> Aviv.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/3] IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to guest
  2016-11-21 13:35             ` Michael S. Tsirkin
@ 2016-11-22  4:11               ` Jason Wang
  2016-11-22 13:15                 ` Aviv B.D.
  2016-11-22 15:09                 ` Michael S. Tsirkin
  0 siblings, 2 replies; 32+ messages in thread
From: Jason Wang @ 2016-11-22  4:11 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jan Kiszka, Aviv B.D, Alex Williamson, qemu-devel, Peter Xu



On 2016年11月21日 21:35, Michael S. Tsirkin wrote:
> On Fri, Nov 11, 2016 at 12:15:48PM +0800, Jason Wang wrote:
>> >
>> >
>> >On 2016年11月11日 11:39, Michael S. Tsirkin wrote:
>>> > >On Fri, Nov 11, 2016 at 10:32:42AM +0800, Jason Wang wrote:
>>>> > > >
>>>> > > >On 2016年11月10日 06:00, Michael S. Tsirkin wrote:
>>>>> > > > >On Wed, Nov 09, 2016 at 03:28:02PM +0800, Jason Wang wrote:
>>>>>>> > > > > > >
>>>>>>> > > > > > >On 2016年11月08日 19:04, Aviv B.D wrote:
>>>>>>>>> > > > > > > > >From: "Aviv Ben-David"<bd.aviv@gmail.com>
>>>>>>>>> > > > > > > > >
>>>>>>>>> > > > > > > > >This capability asks the guest to invalidate cache before each map operation.
>>>>>>>>> > > > > > > > >We can use this invalidation to trap map operations in the hypervisor.
>>>>>>> > > > > > >Hi:
>>>>>>> > > > > > >
>>>>>>> > > > > > >Like I've asked twice in the past, I want to know why don't you cache
>>>>>>> > > > > > >translation faults as what spec required (especially this is a guest visible
>>>>>>> > > > > > >behavior)?
>>>>>>> > > > > > >
>>>>>>> > > > > > >Btw, please cc me on posting future versions.
>>>>>>> > > > > > >
>>>>>>> > > > > > >Thanks
>>>>> > > > >Caching isn't guest visible.
>>>> > > >Seems not, if one fault mapping were cached by IOTLB. Guest can notice this
>>>> > > >behavior.
>>> > >Sorry, I don't get what you are saying.
>>> > >
>>>>> > > > >Spec just says you*can*  cache,
>>>>> > > > >not that you must.
>>>>> > > > >
>>>> > > >Yes, but what did in this patch is "don't". What I suggest is just a "can",
>>>> > > >since anyway the IOTLB entries were limited and could be replaced by other.
>>>> > > >
>>>> > > >Thanks
>>> > >Have trouble understanding this. Can you given an example of
>>> > >a guest visible difference?
>> >
>> >I guess this may do the detection:
>> >
>> >1) map iova A to be non-present.
>> >2) invalidate iova A
>> >3) map iova A to addr B
>> >4) access iova A
>> >
>> >A correct implemented CM may meet fault in step 4, but with this patch, we
>> >never.
> I think that IOTLB is free to invalidate entries at any point,
> so the fault is not guaranteed on bare metal.

Yes, that's the interesting point. The fault is not guaranteed but 
conditional. And we have similar issue for IEC.

So in conclusion (since I can't find a hardware IOMMU that have CM set):

1) If we don't cache fault conditions, we are still in question that 
whether it was spec compatible.
2) If we do cache fault conditions, we are 100% sure it was spec 
compatible.

Consider 2) is not complicated, we'd better do it I believe?

Thanks

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/3] IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to guest
  2016-11-22  4:11               ` Jason Wang
@ 2016-11-22 13:15                 ` Aviv B.D.
  2016-11-22 15:13                   ` Michael S. Tsirkin
  2016-11-22 15:09                 ` Michael S. Tsirkin
  1 sibling, 1 reply; 32+ messages in thread
From: Aviv B.D. @ 2016-11-22 13:15 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Jan Kiszka, Alex Williamson, qemu-devel, Peter Xu

On Tue, Nov 22, 2016 at 6:11 AM, Jason Wang <jasowang@redhat.com> wrote:

>
>
> On 2016年11月21日 21:35, Michael S. Tsirkin wrote:
>
>> On Fri, Nov 11, 2016 at 12:15:48PM +0800, Jason Wang wrote:
>>
>>> >
>>> >
>>> >On 2016年11月11日 11:39, Michael S. Tsirkin wrote:
>>>
>>>> > >On Fri, Nov 11, 2016 at 10:32:42AM +0800, Jason Wang wrote:
>>>>
>>>>> > > >
>>>>> > > >On 2016年11月10日 06:00, Michael S. Tsirkin wrote:
>>>>>
>>>>>> > > > >On Wed, Nov 09, 2016 at 03:28:02PM +0800, Jason Wang wrote:
>>>>>>
>>>>>>> > > > > > >
>>>>>>>> > > > > > >On 2016年11月08日 19:04, Aviv B.D wrote:
>>>>>>>>
>>>>>>>>> > > > > > > > >From: "Aviv Ben-David"<bd.aviv@gmail.com>
>>>>>>>>>> > > > > > > > >
>>>>>>>>>> > > > > > > > >This capability asks the guest to invalidate cache
>>>>>>>>>> before each map operation.
>>>>>>>>>> > > > > > > > >We can use this invalidation to trap map
>>>>>>>>>> operations in the hypervisor.
>>>>>>>>>>
>>>>>>>>> > > > > > >Hi:
>>>>>>>> > > > > > >
>>>>>>>> > > > > > >Like I've asked twice in the past, I want to know why
>>>>>>>> don't you cache
>>>>>>>> > > > > > >translation faults as what spec required (especially
>>>>>>>> this is a guest visible
>>>>>>>> > > > > > >behavior)?
>>>>>>>> > > > > > >
>>>>>>>> > > > > > >Btw, please cc me on posting future versions.
>>>>>>>> > > > > > >
>>>>>>>> > > > > > >Thanks
>>>>>>>>
>>>>>>> > > > >Caching isn't guest visible.
>>>>>>
>>>>> > > >Seems not, if one fault mapping were cached by IOTLB. Guest can
>>>>> notice this
>>>>> > > >behavior.
>>>>>
>>>> > >Sorry, I don't get what you are saying.
>>>> > >
>>>>
>>>>> > > > >Spec just says you*can*  cache,
>>>>>> > > > >not that you must.
>>>>>> > > > >
>>>>>>
>>>>> > > >Yes, but what did in this patch is "don't". What I suggest is
>>>>> just a "can",
>>>>> > > >since anyway the IOTLB entries were limited and could be replaced
>>>>> by other.
>>>>> > > >
>>>>> > > >Thanks
>>>>>
>>>> > >Have trouble understanding this. Can you given an example of
>>>> > >a guest visible difference?
>>>>
>>> >
>>> >I guess this may do the detection:
>>> >
>>> >1) map iova A to be non-present.
>>> >2) invalidate iova A
>>> >3) map iova A to addr B
>>> >4) access iova A
>>> >
>>> >A correct implemented CM may meet fault in step 4, but with this patch,
>>> we
>>> >never.
>>>
>> I think that IOTLB is free to invalidate entries at any point,
>> so the fault is not guaranteed on bare metal.
>>
>
> Yes, that's the interesting point. The fault is not guaranteed but
> conditional. And we have similar issue for IEC.
>
> So in conclusion (since I can't find a hardware IOMMU that have CM set):
>
> The spec says that there is no hardware IOMMU with CM bit set. This bit
exists only to enable efficient emulation of iommu.


> 1) If we don't cache fault conditions, we are still in question that
> whether it was spec compatible.
> 2) If we do cache fault conditions, we are 100% sure it was spec
> compatible.
>
Consider 2) is not complicated, we'd better do it I believe?
>
> "For implementations reporting Caching Mode (CM) as 1 in the Capability
Register, above conditions may cause caching of the entry that resulted in
the fault, and require explicit invalidation by software to invalidate such
cached entries." (
http://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/vt-directed-io-spec.pdf,,
section 6.2.2, page 82). The key word is "may", therefore I do not think
that actually faulting the guest is necessary in this case.

However, if you insist on this part I will add the relevant code...

Thanks
>

Aviv.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/3] IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to guest
  2016-11-22  4:11               ` Jason Wang
  2016-11-22 13:15                 ` Aviv B.D.
@ 2016-11-22 15:09                 ` Michael S. Tsirkin
  2016-12-01  3:02                   ` Tian, Kevin
  1 sibling, 1 reply; 32+ messages in thread
From: Michael S. Tsirkin @ 2016-11-22 15:09 UTC (permalink / raw)
  To: Jason Wang; +Cc: Jan Kiszka, Aviv B.D, Alex Williamson, qemu-devel, Peter Xu

On Tue, Nov 22, 2016 at 12:11:21PM +0800, Jason Wang wrote:
> Yes, that's the interesting point. The fault is not guaranteed but
> conditional. And we have similar issue for IEC.
> 
> So in conclusion (since I can't find a hardware IOMMU that have CM set):
> 
> 1) If we don't cache fault conditions, we are still in question that whether
> it was spec compatible.
> 2) If we do cache fault conditions, we are 100% sure it was spec compatible.
> 
> Consider 2) is not complicated, we'd better do it I believe?
> 
> Thanks

IMO it's just a confusing jargon used by intel architects.
CM is there exactly so you can shadow the tables, but
they were trying hard to use wording that also makes
sense to hardware/driver developers.

Nothing to worry about.

-- 
MST

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/3] IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to guest
  2016-11-22 13:15                 ` Aviv B.D.
@ 2016-11-22 15:13                   ` Michael S. Tsirkin
  0 siblings, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2016-11-22 15:13 UTC (permalink / raw)
  To: Aviv B.D.; +Cc: Jason Wang, Jan Kiszka, Alex Williamson, qemu-devel, Peter Xu

On Tue, Nov 22, 2016 at 03:15:45PM +0200, Aviv B.D. wrote:
> However, if you insist on this part I will add the relevant code...

I wouldn't bother at this point. Getting to a point where
we can merge it with vfio support seems more important.

-- 
MST

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/3] IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to guest
  2016-11-22 15:09                 ` Michael S. Tsirkin
@ 2016-12-01  3:02                   ` Tian, Kevin
  0 siblings, 0 replies; 32+ messages in thread
From: Tian, Kevin @ 2016-12-01  3:02 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang
  Cc: Jan Kiszka, Aviv B.D, Alex Williamson, Peter Xu, qemu-devel

> From: Michael S. Tsirkin
> Sent: Tuesday, November 22, 2016 11:10 PM
> 
> On Tue, Nov 22, 2016 at 12:11:21PM +0800, Jason Wang wrote:
> > Yes, that's the interesting point. The fault is not guaranteed but
> > conditional. And we have similar issue for IEC.
> >
> > So in conclusion (since I can't find a hardware IOMMU that have CM set):
> >
> > 1) If we don't cache fault conditions, we are still in question that whether
> > it was spec compatible.
> > 2) If we do cache fault conditions, we are 100% sure it was spec compatible.
> >
> > Consider 2) is not complicated, we'd better do it I believe?
> >
> > Thanks
> 
> IMO it's just a confusing jargon used by intel architects.
> CM is there exactly so you can shadow the tables, but
> they were trying hard to use wording that also makes
> sense to hardware/driver developers.
> 
> Nothing to worry about.
> 

Agree. Software shouldn't make any assumption that fault entries
will be cached when caching mode is enabled. It's worded just
because some very old generation did cache non-present/faulting
entries so IOMMU driver needs to be aware of such potential side
effect. while for virtualization the only purpose is to ask guest side 
to do IOTLB invalidation for any PTE change. It's completely fine 
to do simple thing here for vIOMMU here.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2016-12-01  3:02 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-08 11:04 [Qemu-devel] [PATCH v6 0/3] IOMMU: intel_iommu support map and unmap notifications Aviv B.D
2016-11-08 11:04 ` [Qemu-devel] [PATCH v6 1/3] IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to guest Aviv B.D
2016-11-09  7:28   ` Jason Wang
2016-11-09 22:00     ` Michael S. Tsirkin
2016-11-11  2:32       ` Jason Wang
2016-11-11  3:39         ` Michael S. Tsirkin
2016-11-11  4:15           ` Jason Wang
2016-11-21 12:41             ` Aviv B.D.
2016-11-22  3:59               ` Jason Wang
2016-11-21 13:35             ` Michael S. Tsirkin
2016-11-22  4:11               ` Jason Wang
2016-11-22 13:15                 ` Aviv B.D.
2016-11-22 15:13                   ` Michael S. Tsirkin
2016-11-22 15:09                 ` Michael S. Tsirkin
2016-12-01  3:02                   ` Tian, Kevin
2016-11-08 11:04 ` [Qemu-devel] [PATCH v6 2/3] IOMMU: change iommu_op->translate's is_write to flags, add support to NO_FAIL flag mode Aviv B.D
2016-11-08 11:04 ` [Qemu-devel] [PATCH v6 3/3] IOMMU: enable intel_iommu map and unmap notifiers Aviv B.D
2016-11-10 15:13   ` Michael S. Tsirkin
2016-11-10 15:14 ` [Qemu-devel] [PATCH v6 0/3] IOMMU: intel_iommu support map and unmap notifications Michael S. Tsirkin
2016-11-10 15:30   ` Alex Williamson
2016-11-10 15:54     ` Michael S. Tsirkin
2016-11-10 16:04       ` Alex Williamson
2016-11-10 19:20         ` Michael S. Tsirkin
2016-11-10 19:44           ` Alex Williamson
2016-11-16 13:54             ` Michael S. Tsirkin
2016-11-16 15:34               ` Alex Williamson
2016-11-16 19:50                 ` Aviv B.D.
2016-11-16 19:57                   ` Michael S. Tsirkin
2016-11-21 11:36                     ` Aviv B.D.
2016-11-16 20:03                   ` Alex Williamson
2016-11-16 20:17                     ` Michael S. Tsirkin
2016-11-16 19:20           ` Aviv B.D.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.