All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 0/7] VT-d: PT (passthrough) mode support and misc fixes
@ 2017-04-17 11:32 Peter Xu
  2017-04-17 11:32 ` [Qemu-devel] [PATCH v2 1/7] memory: tune last param of iommu_ops.translate() Peter Xu
                   ` (7 more replies)
  0 siblings, 8 replies; 27+ messages in thread
From: Peter Xu @ 2017-04-17 11:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S . Tsirkin, Lan Tianyu, peterx, Jason Wang,
	David Gibson, Marcel Apfelbaum

v2:
- resending v1 since my patch script had a regression, which failed to
  send the patches to the list... trying again. sorry for the noise!

This series is for 2.10, and based on following series:

    [PATCH v8 0/9] VT-d: vfio enablement and misc enhances

This series add support for per-device passthrough mode for VT-d
emulation, along with some tweaks on existing codes.

Patches 1-2 are memory related cleanups.
Patches 3-6 are VT-d cleanups.
Patch 7 add support for passthrough.

todo:
- vhost: when pt is specified for vhost-enabled device, kernel can
  avoid translating the message and fallback to no-iommu mode.

Please review. Thanks.

Peter Xu (7):
  memory: tune last param of iommu_ops.translate()
  memory: remove the last param in memory_region_iommu_replay()
  x86-iommu: use DeviceClass properties
  intel_iommu: renaming context entry helpers
  intel_iommu: provide vtd_ce_get_type()
  intel_iommu: use IOMMU_ACCESS_FLAG()
  intel_iommu: support passthrough (PT)

 exec.c                         |   6 +-
 hw/alpha/typhoon.c             |   2 +-
 hw/dma/rc4030.c                |   2 +-
 hw/i386/amd_iommu.c            |   4 +-
 hw/i386/intel_iommu.c          | 146 +++++++++++++++++++++++++++++++----------
 hw/i386/intel_iommu_internal.h |   1 +
 hw/i386/trace-events           |   1 +
 hw/i386/x86-iommu.c            |  48 +++-----------
 hw/pci-host/apb.c              |   2 +-
 hw/ppc/spapr_iommu.c           |   2 +-
 hw/s390x/s390-pci-bus.c        |   2 +-
 hw/s390x/s390-pci-inst.c       |   2 +-
 hw/vfio/common.c               |   2 +-
 include/exec/memory.h          |  15 +++--
 include/hw/i386/x86-iommu.h    |   1 +
 memory.c                       |   7 +-
 16 files changed, 146 insertions(+), 97 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v2 1/7] memory: tune last param of iommu_ops.translate()
  2017-04-17 11:32 [Qemu-devel] [PATCH v2 0/7] VT-d: PT (passthrough) mode support and misc fixes Peter Xu
@ 2017-04-17 11:32 ` Peter Xu
  2017-04-18  4:07   ` David Gibson
  2017-04-17 11:32 ` [Qemu-devel] [PATCH v2 2/7] memory: remove the last param in memory_region_iommu_replay() Peter Xu
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Peter Xu @ 2017-04-17 11:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S . Tsirkin, Lan Tianyu, peterx, Jason Wang,
	David Gibson, Marcel Apfelbaum, Paolo Bonzini

This patch converts the old "is_write" bool into IOMMUAccessFlags. The
difference is that "is_write" can only express either read/write, but
sometimes what we really want is "none" here (neither read nor write).
Replay is an good example - during replay, we should not check any RW
permission bits since thats not an actual IO at all.

CC: Paolo Bonzini <pbonzini@redhat.com>
CC: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 exec.c                   |  6 ++++--
 hw/alpha/typhoon.c       |  2 +-
 hw/dma/rc4030.c          |  2 +-
 hw/i386/amd_iommu.c      |  4 ++--
 hw/i386/intel_iommu.c    |  4 ++--
 hw/pci-host/apb.c        |  2 +-
 hw/ppc/spapr_iommu.c     |  2 +-
 hw/s390x/s390-pci-bus.c  |  2 +-
 hw/s390x/s390-pci-inst.c |  2 +-
 include/exec/memory.h    | 10 ++++++++--
 memory.c                 |  3 ++-
 11 files changed, 24 insertions(+), 15 deletions(-)

diff --git a/exec.c b/exec.c
index c97ef4a..188892b 100644
--- a/exec.c
+++ b/exec.c
@@ -475,7 +475,8 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
             break;
         }
 
-        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
+        iotlb = mr->iommu_ops->translate(mr, addr, is_write ?
+                                         IOMMU_WO : IOMMU_RO);
         if (!(iotlb.perm & (1 << is_write))) {
             iotlb.target_as = NULL;
             break;
@@ -507,7 +508,8 @@ MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr,
             break;
         }
 
-        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
+        iotlb = mr->iommu_ops->translate(mr, addr, is_write ?
+                                         IOMMU_WO : IOMMU_RO);
         addr = ((iotlb.translated_addr & ~iotlb.addr_mask)
                 | (addr & iotlb.addr_mask));
         *plen = MIN(*plen, (addr | iotlb.addr_mask) - addr + 1);
diff --git a/hw/alpha/typhoon.c b/hw/alpha/typhoon.c
index f50f5cf..c1cf780 100644
--- a/hw/alpha/typhoon.c
+++ b/hw/alpha/typhoon.c
@@ -664,7 +664,7 @@ static bool window_translate(TyphoonWindow *win, hwaddr addr,
 /* TODO: A translation failure here ought to set PCI error codes on the
    Pchip and generate a machine check interrupt.  */
 static IOMMUTLBEntry typhoon_translate_iommu(MemoryRegion *iommu, hwaddr addr,
-                                             bool is_write)
+                                             IOMMUAccessFlags flag)
 {
     TyphoonPchip *pchip = container_of(iommu, TyphoonPchip, iommu);
     IOMMUTLBEntry ret;
diff --git a/hw/dma/rc4030.c b/hw/dma/rc4030.c
index 0080141..edf9432 100644
--- a/hw/dma/rc4030.c
+++ b/hw/dma/rc4030.c
@@ -489,7 +489,7 @@ static const MemoryRegionOps jazzio_ops = {
 };
 
 static IOMMUTLBEntry rc4030_dma_translate(MemoryRegion *iommu, hwaddr addr,
-                                          bool is_write)
+                                          IOMMUAccessFlags flag)
 {
     rc4030State *s = container_of(iommu, rc4030State, dma_mr);
     IOMMUTLBEntry ret = {
diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index f86a40a..42b34ef 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -987,7 +987,7 @@ static inline bool amdvi_is_interrupt_addr(hwaddr addr)
 }
 
 static IOMMUTLBEntry amdvi_translate(MemoryRegion *iommu, hwaddr addr,
-                                     bool is_write)
+                                     IOMMUAccessFlags flag)
 {
     AMDVIAddressSpace *as = container_of(iommu, AMDVIAddressSpace, iommu);
     AMDVIState *s = as->iommu_state;
@@ -1016,7 +1016,7 @@ static IOMMUTLBEntry amdvi_translate(MemoryRegion *iommu, hwaddr addr,
         return ret;
     }
 
-    amdvi_do_translate(as, addr, is_write, &ret);
+    amdvi_do_translate(as, addr, flag & IOMMU_WO, &ret);
     trace_amdvi_translation_result(as->bus_num, PCI_SLOT(as->devfn),
             PCI_FUNC(as->devfn), addr, ret.translated_addr);
     return ret;
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 02f047c..ea54ec3 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2221,7 +2221,7 @@ static void vtd_mem_write(void *opaque, hwaddr addr,
 }
 
 static IOMMUTLBEntry vtd_iommu_translate(MemoryRegion *iommu, hwaddr addr,
-                                         bool is_write)
+                                         IOMMUAccessFlags flag)
 {
     VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
     IntelIOMMUState *s = vtd_as->iommu_state;
@@ -2243,7 +2243,7 @@ static IOMMUTLBEntry vtd_iommu_translate(MemoryRegion *iommu, hwaddr addr,
     }
 
     vtd_do_iommu_translate(vtd_as, vtd_as->bus, vtd_as->devfn, addr,
-                           is_write, &ret);
+                           flag & IOMMU_WO, &ret);
     VTD_DPRINTF(MMU,
                 "bus %"PRIu8 " slot %"PRIu8 " func %"PRIu8 " devfn %"PRIu8
                 " iova 0x%"PRIx64 " hpa 0x%"PRIx64, pci_bus_num(vtd_as->bus),
diff --git a/hw/pci-host/apb.c b/hw/pci-host/apb.c
index 653e711..ad7abb2 100644
--- a/hw/pci-host/apb.c
+++ b/hw/pci-host/apb.c
@@ -209,7 +209,7 @@ static AddressSpace *pbm_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
 
 /* Called from RCU critical section */
 static IOMMUTLBEntry pbm_translate_iommu(MemoryRegion *iommu, hwaddr addr,
-                                         bool is_write)
+                                         IOMMUAccessFlags flag)
 {
     IOMMUState *is = container_of(iommu, IOMMUState, iommu);
     hwaddr baseaddr, offset;
diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index ae30bbe..a3ae78d 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -110,7 +110,7 @@ static void spapr_tce_free_table(uint64_t *table, int fd, uint32_t nb_table)
 
 /* Called from RCU critical section */
 static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr,
-                                               bool is_write)
+                                               IOMMUAccessFlags flag)
 {
     sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
     uint64_t tce;
diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 69b0291..9b2c344 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -355,7 +355,7 @@ out:
 }
 
 static IOMMUTLBEntry s390_translate_iommu(MemoryRegion *mr, hwaddr addr,
-                                          bool is_write)
+                                          IOMMUAccessFlags flag)
 {
     uint64_t pte;
     uint32_t flags;
diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index d2a8c0a..0087aa7 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -622,7 +622,7 @@ int rpcit_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2)
 
     mr = &iommu->iommu_mr;
     while (start < end) {
-        entry = mr->iommu_ops->translate(mr, start, 0);
+        entry = mr->iommu_ops->translate(mr, start, IOMMU_NONE);
 
         if (!entry.translated_addr) {
             pbdev->state = ZPCI_FS_ERROR;
diff --git a/include/exec/memory.h b/include/exec/memory.h
index c4fc94d..9047bf3 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -185,8 +185,14 @@ struct MemoryRegionOps {
 typedef struct MemoryRegionIOMMUOps MemoryRegionIOMMUOps;
 
 struct MemoryRegionIOMMUOps {
-    /* Return a TLB entry that contains a given address. */
-    IOMMUTLBEntry (*translate)(MemoryRegion *iommu, hwaddr addr, bool is_write);
+    /*
+     * Return a TLB entry that contains a given address. Flag should
+     * be the access permission of this translation operation. We can
+     * set flag to IOMMU_NONE to mean that we don't need any
+     * read/write permission checks, like, when for region replay.
+     */
+    IOMMUTLBEntry (*translate)(MemoryRegion *iommu, hwaddr addr,
+                               IOMMUAccessFlags flag);
     /* Returns minimum supported page size */
     uint64_t (*get_min_page_size)(MemoryRegion *iommu);
     /* Called when IOMMU Notifier flag changed */
diff --git a/memory.c b/memory.c
index b782d5b..47dc107 100644
--- a/memory.c
+++ b/memory.c
@@ -1625,6 +1625,7 @@ void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
 {
     hwaddr addr, granularity;
     IOMMUTLBEntry iotlb;
+    IOMMUAccessFlags flag = is_write ? IOMMU_WO : IOMMU_RO;
 
     /* If the IOMMU has its own replay callback, override */
     if (mr->iommu_ops->replay) {
@@ -1635,7 +1636,7 @@ void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
     granularity = memory_region_iommu_get_min_page_size(mr);
 
     for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
-        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
+        iotlb = mr->iommu_ops->translate(mr, addr, flag);
         if (iotlb.perm != IOMMU_NONE) {
             n->notify(n, &iotlb);
         }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v2 2/7] memory: remove the last param in memory_region_iommu_replay()
  2017-04-17 11:32 [Qemu-devel] [PATCH v2 0/7] VT-d: PT (passthrough) mode support and misc fixes Peter Xu
  2017-04-17 11:32 ` [Qemu-devel] [PATCH v2 1/7] memory: tune last param of iommu_ops.translate() Peter Xu
@ 2017-04-17 11:32 ` Peter Xu
  2017-04-18  4:08   ` David Gibson
  2017-04-17 11:32 ` [Qemu-devel] [PATCH v2 3/7] x86-iommu: use DeviceClass properties Peter Xu
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Peter Xu @ 2017-04-17 11:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S . Tsirkin, Lan Tianyu, peterx, Jason Wang,
	David Gibson, Marcel Apfelbaum, Paolo Bonzini

We were always passing in that one as "false" to assume that's an read
operation, and we also assume that IOMMU translation would always have
that read permission. A better permission would be IOMMU_NONE since the
replay is after all not a real read operation, but just a page table
rebuilding process.

CC: David Gibson <david@gibson.dropbear.id.au>
CC: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/vfio/common.c      | 2 +-
 include/exec/memory.h | 5 +----
 memory.c              | 8 +++-----
 3 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 6b33b9f..d008a4b 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -488,7 +488,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
         QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
 
         memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
-        memory_region_iommu_replay(giommu->iommu, &giommu->n, false);
+        memory_region_iommu_replay(giommu->iommu, &giommu->n);
 
         return;
     }
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 9047bf3..8721d53 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -731,11 +731,8 @@ void memory_region_register_iommu_notifier(MemoryRegion *mr,
  *
  * @mr: the memory region to observe
  * @n: the notifier to which to replay iommu mappings
- * @is_write: Whether to treat the replay as a translate "write"
- *     through the iommu
  */
-void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
-                                bool is_write);
+void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n);
 
 /**
  * memory_region_iommu_replay_all: replay existing IOMMU translations
diff --git a/memory.c b/memory.c
index 47dc107..6b2fdb7 100644
--- a/memory.c
+++ b/memory.c
@@ -1620,12 +1620,10 @@ uint64_t memory_region_iommu_get_min_page_size(MemoryRegion *mr)
     return TARGET_PAGE_SIZE;
 }
 
-void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
-                                bool is_write)
+void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n)
 {
     hwaddr addr, granularity;
     IOMMUTLBEntry iotlb;
-    IOMMUAccessFlags flag = is_write ? IOMMU_WO : IOMMU_RO;
 
     /* If the IOMMU has its own replay callback, override */
     if (mr->iommu_ops->replay) {
@@ -1636,7 +1634,7 @@ void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
     granularity = memory_region_iommu_get_min_page_size(mr);
 
     for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
-        iotlb = mr->iommu_ops->translate(mr, addr, flag);
+        iotlb = mr->iommu_ops->translate(mr, addr, IOMMU_NONE);
         if (iotlb.perm != IOMMU_NONE) {
             n->notify(n, &iotlb);
         }
@@ -1654,7 +1652,7 @@ void memory_region_iommu_replay_all(MemoryRegion *mr)
     IOMMUNotifier *notifier;
 
     IOMMU_NOTIFIER_FOREACH(notifier, mr) {
-        memory_region_iommu_replay(mr, notifier, false);
+        memory_region_iommu_replay(mr, notifier);
     }
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v2 3/7] x86-iommu: use DeviceClass properties
  2017-04-17 11:32 [Qemu-devel] [PATCH v2 0/7] VT-d: PT (passthrough) mode support and misc fixes Peter Xu
  2017-04-17 11:32 ` [Qemu-devel] [PATCH v2 1/7] memory: tune last param of iommu_ops.translate() Peter Xu
  2017-04-17 11:32 ` [Qemu-devel] [PATCH v2 2/7] memory: remove the last param in memory_region_iommu_replay() Peter Xu
@ 2017-04-17 11:32 ` Peter Xu
  2017-04-17 11:32 ` [Qemu-devel] [PATCH v2 4/7] intel_iommu: renaming context entry helpers Peter Xu
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 27+ messages in thread
From: Peter Xu @ 2017-04-17 11:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S . Tsirkin, Lan Tianyu, peterx, Jason Wang,
	David Gibson, Marcel Apfelbaum

No reason to keep tens of lines if we can do it actually far shorter.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/x86-iommu.c | 47 +++++++----------------------------------------
 1 file changed, 7 insertions(+), 40 deletions(-)

diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
index 23dcd3f..02b8825 100644
--- a/hw/i386/x86-iommu.c
+++ b/hw/i386/x86-iommu.c
@@ -88,55 +88,22 @@ static void x86_iommu_realize(DeviceState *dev, Error **errp)
     x86_iommu_set_default(X86_IOMMU_DEVICE(dev));
 }
 
+static Property x86_iommu_properties[] = {
+    DEFINE_PROP_BOOL("intremap", X86IOMMUState, intr_supported, false),
+    DEFINE_PROP_BOOL("device-iotlb", X86IOMMUState, dt_supported, false),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
 static void x86_iommu_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
     dc->realize = x86_iommu_realize;
-}
-
-static bool x86_iommu_intremap_prop_get(Object *o, Error **errp)
-{
-    X86IOMMUState *s = X86_IOMMU_DEVICE(o);
-    return s->intr_supported;
-}
-
-static void x86_iommu_intremap_prop_set(Object *o, bool value, Error **errp)
-{
-    X86IOMMUState *s = X86_IOMMU_DEVICE(o);
-    s->intr_supported = value;
-}
-
-static bool x86_iommu_device_iotlb_prop_get(Object *o, Error **errp)
-{
-    X86IOMMUState *s = X86_IOMMU_DEVICE(o);
-    return s->dt_supported;
-}
-
-static void x86_iommu_device_iotlb_prop_set(Object *o, bool value, Error **errp)
-{
-    X86IOMMUState *s = X86_IOMMU_DEVICE(o);
-    s->dt_supported = value;
-}
-
-static void x86_iommu_instance_init(Object *o)
-{
-    X86IOMMUState *s = X86_IOMMU_DEVICE(o);
-
-    /* By default, do not support IR */
-    s->intr_supported = false;
-    object_property_add_bool(o, "intremap", x86_iommu_intremap_prop_get,
-                             x86_iommu_intremap_prop_set, NULL);
-    s->dt_supported = false;
-    object_property_add_bool(o, "device-iotlb",
-                             x86_iommu_device_iotlb_prop_get,
-                             x86_iommu_device_iotlb_prop_set,
-                             NULL);
+    dc->props = x86_iommu_properties;
 }
 
 static const TypeInfo x86_iommu_info = {
     .name          = TYPE_X86_IOMMU_DEVICE,
     .parent        = TYPE_SYS_BUS_DEVICE,
-    .instance_init = x86_iommu_instance_init,
     .instance_size = sizeof(X86IOMMUState),
     .class_init    = x86_iommu_class_init,
     .class_size    = sizeof(X86IOMMUClass),
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v2 4/7] intel_iommu: renaming context entry helpers
  2017-04-17 11:32 [Qemu-devel] [PATCH v2 0/7] VT-d: PT (passthrough) mode support and misc fixes Peter Xu
                   ` (2 preceding siblings ...)
  2017-04-17 11:32 ` [Qemu-devel] [PATCH v2 3/7] x86-iommu: use DeviceClass properties Peter Xu
@ 2017-04-17 11:32 ` Peter Xu
  2017-04-17 11:32 ` [Qemu-devel] [PATCH v2 5/7] intel_iommu: provide vtd_ce_get_type() Peter Xu
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 27+ messages in thread
From: Peter Xu @ 2017-04-17 11:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S . Tsirkin, Lan Tianyu, peterx, Jason Wang,
	David Gibson, Marcel Apfelbaum

The old names are too long and less ordered. Let's start to use
vtd_ce_*() as a pattern.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/intel_iommu.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index ea54ec3..4628f04 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -512,7 +512,7 @@ static int vtd_get_root_entry(IntelIOMMUState *s, uint8_t index,
     return 0;
 }
 
-static inline bool vtd_context_entry_present(VTDContextEntry *context)
+static inline bool vtd_ce_present(VTDContextEntry *context)
 {
     return context->lo & VTD_CONTEXT_ENTRY_P;
 }
@@ -533,7 +533,7 @@ static int vtd_get_context_entry_from_root(VTDRootEntry *root, uint8_t index,
     return 0;
 }
 
-static inline dma_addr_t vtd_get_slpt_base_from_context(VTDContextEntry *ce)
+static inline dma_addr_t vtd_ce_get_slpt_base(VTDContextEntry *ce)
 {
     return ce->lo & VTD_CONTEXT_ENTRY_SLPTPTR;
 }
@@ -585,19 +585,19 @@ static inline bool vtd_is_level_supported(IntelIOMMUState *s, uint32_t level)
 /* Get the page-table level that hardware should use for the second-level
  * page-table walk from the Address Width field of context-entry.
  */
-static inline uint32_t vtd_get_level_from_context_entry(VTDContextEntry *ce)
+static inline uint32_t vtd_ce_get_level(VTDContextEntry *ce)
 {
     return 2 + (ce->hi & VTD_CONTEXT_ENTRY_AW);
 }
 
-static inline uint32_t vtd_get_agaw_from_context_entry(VTDContextEntry *ce)
+static inline uint32_t vtd_ce_get_agaw(VTDContextEntry *ce)
 {
     return 30 + (ce->hi & VTD_CONTEXT_ENTRY_AW) * 9;
 }
 
 static inline uint64_t vtd_iova_limit(VTDContextEntry *ce)
 {
-    uint32_t ce_agaw = vtd_get_agaw_from_context_entry(ce);
+    uint32_t ce_agaw = vtd_ce_get_agaw(ce);
     return 1ULL << MIN(ce_agaw, VTD_MGAW);
 }
 
@@ -642,8 +642,8 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova, bool is_write,
                              uint64_t *slptep, uint32_t *slpte_level,
                              bool *reads, bool *writes)
 {
-    dma_addr_t addr = vtd_get_slpt_base_from_context(ce);
-    uint32_t level = vtd_get_level_from_context_entry(ce);
+    dma_addr_t addr = vtd_ce_get_slpt_base(ce);
+    uint32_t level = vtd_ce_get_level(ce);
     uint32_t offset;
     uint64_t slpte;
     uint64_t access_right_check;
@@ -664,7 +664,7 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova, bool is_write,
             VTD_DPRINTF(GENERAL, "error: fail to access second-level paging "
                         "entry at level %"PRIu32 " for iova 0x%"PRIx64,
                         level, iova);
-            if (level == vtd_get_level_from_context_entry(ce)) {
+            if (level == vtd_ce_get_level(ce)) {
                 /* Invalid programming of context-entry */
                 return -VTD_FR_CONTEXT_ENTRY_INV;
             } else {
@@ -809,8 +809,8 @@ static int vtd_page_walk(VTDContextEntry *ce, uint64_t start, uint64_t end,
                          vtd_page_walk_hook hook_fn, void *private,
                          bool notify_unmap)
 {
-    dma_addr_t addr = vtd_get_slpt_base_from_context(ce);
-    uint32_t level = vtd_get_level_from_context_entry(ce);
+    dma_addr_t addr = vtd_ce_get_slpt_base(ce);
+    uint32_t level = vtd_ce_get_level(ce);
 
     if (!vtd_iova_range_check(start, ce)) {
         return -VTD_FR_ADDR_BEYOND_MGAW;
@@ -851,7 +851,7 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
         return ret_fr;
     }
 
-    if (!vtd_context_entry_present(ce)) {
+    if (!vtd_ce_present(ce)) {
         /* Not error - it's okay we don't have context entry. */
         trace_vtd_ce_not_present(bus_num, devfn);
         return -VTD_FR_CONTEXT_ENTRY_P;
@@ -861,7 +861,7 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
         return -VTD_FR_CONTEXT_ENTRY_RSVD;
     }
     /* Check if the programming of context-entry is valid */
-    if (!vtd_is_level_supported(s, vtd_get_level_from_context_entry(ce))) {
+    if (!vtd_is_level_supported(s, vtd_ce_get_level(ce))) {
         trace_vtd_ce_invalid(ce->hi, ce->lo);
         return -VTD_FR_CONTEXT_ENTRY_INV;
     } else {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v2 5/7] intel_iommu: provide vtd_ce_get_type()
  2017-04-17 11:32 [Qemu-devel] [PATCH v2 0/7] VT-d: PT (passthrough) mode support and misc fixes Peter Xu
                   ` (3 preceding siblings ...)
  2017-04-17 11:32 ` [Qemu-devel] [PATCH v2 4/7] intel_iommu: renaming context entry helpers Peter Xu
@ 2017-04-17 11:32 ` Peter Xu
  2017-04-17 11:32 ` [Qemu-devel] [PATCH v2 6/7] intel_iommu: use IOMMU_ACCESS_FLAG() Peter Xu
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 27+ messages in thread
From: Peter Xu @ 2017-04-17 11:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S . Tsirkin, Lan Tianyu, peterx, Jason Wang,
	David Gibson, Marcel Apfelbaum

Helper to fetch VT-d context entry type.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/intel_iommu.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 4628f04..c8751ba 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -595,6 +595,11 @@ static inline uint32_t vtd_ce_get_agaw(VTDContextEntry *ce)
     return 30 + (ce->hi & VTD_CONTEXT_ENTRY_AW) * 9;
 }
 
+static inline uint32_t vtd_ce_get_type(VTDContextEntry *ce)
+{
+    return ce->lo & VTD_CONTEXT_ENTRY_TT;
+}
+
 static inline uint64_t vtd_iova_limit(VTDContextEntry *ce)
 {
     uint32_t ce_agaw = vtd_ce_get_agaw(ce);
@@ -865,7 +870,7 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
         trace_vtd_ce_invalid(ce->hi, ce->lo);
         return -VTD_FR_CONTEXT_ENTRY_INV;
     } else {
-        switch (ce->lo & VTD_CONTEXT_ENTRY_TT) {
+        switch (vtd_ce_get_type(ce)) {
         case VTD_CONTEXT_TT_MULTI_LEVEL:
             /* fall through */
         case VTD_CONTEXT_TT_DEV_IOTLB:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v2 6/7] intel_iommu: use IOMMU_ACCESS_FLAG()
  2017-04-17 11:32 [Qemu-devel] [PATCH v2 0/7] VT-d: PT (passthrough) mode support and misc fixes Peter Xu
                   ` (4 preceding siblings ...)
  2017-04-17 11:32 ` [Qemu-devel] [PATCH v2 5/7] intel_iommu: provide vtd_ce_get_type() Peter Xu
@ 2017-04-17 11:32 ` Peter Xu
  2017-04-17 11:32 ` [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT) Peter Xu
  2017-04-18 10:29 ` [Qemu-devel] [PATCH v2 0/7] VT-d: PT (passthrough) mode support and misc fixes Paolo Bonzini
  7 siblings, 0 replies; 27+ messages in thread
From: Peter Xu @ 2017-04-17 11:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S . Tsirkin, Lan Tianyu, peterx, Jason Wang,
	David Gibson, Marcel Apfelbaum

We have that now, so why not use it.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/intel_iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index c8751ba..05ae631 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1010,7 +1010,7 @@ out:
     entry->iova = addr & page_mask;
     entry->translated_addr = vtd_get_slpte_addr(slpte) & page_mask;
     entry->addr_mask = ~page_mask;
-    entry->perm = (writes ? 2 : 0) + (reads ? 1 : 0);
+    entry->perm = IOMMU_ACCESS_FLAG(reads, writes);
 }
 
 static void vtd_root_table_setup(IntelIOMMUState *s)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT)
  2017-04-17 11:32 [Qemu-devel] [PATCH v2 0/7] VT-d: PT (passthrough) mode support and misc fixes Peter Xu
                   ` (5 preceding siblings ...)
  2017-04-17 11:32 ` [Qemu-devel] [PATCH v2 6/7] intel_iommu: use IOMMU_ACCESS_FLAG() Peter Xu
@ 2017-04-17 11:32 ` Peter Xu
  2017-04-18  4:30   ` Liu, Yi L
  2017-04-19  7:13   ` Liu, Yi L
  2017-04-18 10:29 ` [Qemu-devel] [PATCH v2 0/7] VT-d: PT (passthrough) mode support and misc fixes Paolo Bonzini
  7 siblings, 2 replies; 27+ messages in thread
From: Peter Xu @ 2017-04-17 11:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S . Tsirkin, Lan Tianyu, peterx, Jason Wang,
	David Gibson, Marcel Apfelbaum

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/intel_iommu.c          | 109 +++++++++++++++++++++++++++++++++--------
 hw/i386/intel_iommu_internal.h |   1 +
 hw/i386/trace-events           |   1 +
 hw/i386/x86-iommu.c            |   1 +
 include/hw/i386/x86-iommu.h    |   1 +
 5 files changed, 93 insertions(+), 20 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 05ae631..deb2007 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -872,7 +872,7 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
     } else {
         switch (vtd_ce_get_type(ce)) {
         case VTD_CONTEXT_TT_MULTI_LEVEL:
-            /* fall through */
+        case VTD_CONTEXT_TT_PASS_THROUGH:
         case VTD_CONTEXT_TT_DEV_IOTLB:
             break;
         default:
@@ -883,6 +883,73 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
     return 0;
 }
 
+/* Fetch translation type for specific device. Returns <0 if error
+ * happens, otherwise return the shifted type to check against
+ * VTD_CONTEXT_TT_*. */
+static int vtd_dev_get_trans_type(VTDAddressSpace *as)
+{
+    IntelIOMMUState *s;
+    VTDContextEntry ce;
+    int ret;
+
+    s = as->iommu_state;
+
+    ret = vtd_dev_to_context_entry(s, pci_bus_num(as->bus),
+                                   as->devfn, &ce);
+    if (ret) {
+        return ret;
+    }
+
+    return vtd_ce_get_type(&ce);
+}
+
+static bool vtd_dev_pt_enabled(VTDAddressSpace *as)
+{
+    int ret;
+
+    assert(as);
+
+    ret = vtd_dev_get_trans_type(as);
+    if (ret < 0) {
+        /*
+         * Possibly failed to parse the context entry for some reason
+         * (e.g., during init, or any guest configuration errors on
+         * context entries). We should assume PT not enabled for
+         * safety.
+         */
+        return false;
+    }
+
+    return ret == VTD_CONTEXT_TT_PASS_THROUGH;
+}
+
+static void vtd_switch_address_space(VTDAddressSpace *as)
+{
+    bool use_iommu;
+
+    assert(as);
+
+    use_iommu = as->iommu_state->dmar_enabled;
+    if (use_iommu) {
+        /* Further checks per-device configuration */
+        use_iommu &= !vtd_dev_pt_enabled(as);
+    }
+
+    trace_vtd_switch_address_space(pci_bus_num(as->bus),
+                                   VTD_PCI_SLOT(as->devfn),
+                                   VTD_PCI_FUNC(as->devfn),
+                                   use_iommu);
+
+    /* Turn off first then on the other */
+    if (use_iommu) {
+        memory_region_set_enabled(&as->sys_alias, false);
+        memory_region_set_enabled(&as->iommu, true);
+    } else {
+        memory_region_set_enabled(&as->iommu, false);
+        memory_region_set_enabled(&as->sys_alias, true);
+    }
+}
+
 static inline uint16_t vtd_make_source_id(uint8_t bus_num, uint8_t devfn)
 {
     return ((bus_num & 0xffUL) << 8) | (devfn & 0xffUL);
@@ -991,6 +1058,18 @@ static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
         cc_entry->context_cache_gen = s->context_cache_gen;
     }
 
+    /*
+     * We don't need to translate for pass-through context entries.
+     * Also, let's ignore IOTLB caching as well for PT devices.
+     */
+    if (vtd_ce_get_type(&ce) == VTD_CONTEXT_TT_PASS_THROUGH) {
+        entry->translated_addr = entry->iova;
+        entry->addr_mask = VTD_PAGE_SIZE - 1;
+        entry->perm = IOMMU_RW;
+        trace_vtd_translate_pt(source_id, entry->iova);
+        return;
+    }
+
     ret_fr = vtd_iova_to_slpte(&ce, addr, is_write, &slpte, &level,
                                &reads, &writes);
     if (ret_fr) {
@@ -1135,6 +1214,11 @@ static void vtd_context_device_invalidate(IntelIOMMUState *s,
                                              VTD_PCI_FUNC(devfn_it));
                 vtd_as->context_cache_entry.context_cache_gen = 0;
                 /*
+                 * Do switch address space when needed, in case if the
+                 * device passthrough bit is switched.
+                 */
+                vtd_switch_address_space(vtd_as);
+                /*
                  * So a device is moving out of (or moving into) a
                  * domain, a replay() suites here to notify all the
                  * IOMMU_NOTIFIER_MAP registers about this change.
@@ -1366,25 +1450,6 @@ static void vtd_handle_gcmd_sirtp(IntelIOMMUState *s)
     vtd_set_clear_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_IRTPS);
 }
 
-static void vtd_switch_address_space(VTDAddressSpace *as)
-{
-    assert(as);
-
-    trace_vtd_switch_address_space(pci_bus_num(as->bus),
-                                   VTD_PCI_SLOT(as->devfn),
-                                   VTD_PCI_FUNC(as->devfn),
-                                   as->iommu_state->dmar_enabled);
-
-    /* Turn off first then on the other */
-    if (as->iommu_state->dmar_enabled) {
-        memory_region_set_enabled(&as->sys_alias, false);
-        memory_region_set_enabled(&as->iommu, true);
-    } else {
-        memory_region_set_enabled(&as->iommu, false);
-        memory_region_set_enabled(&as->sys_alias, true);
-    }
-}
-
 static void vtd_switch_address_space_all(IntelIOMMUState *s)
 {
     GHashTableIter iter;
@@ -2849,6 +2914,10 @@ static void vtd_init(IntelIOMMUState *s)
         s->ecap |= VTD_ECAP_DT;
     }
 
+    if (x86_iommu->pt_supported) {
+        s->ecap |= VTD_ECAP_PT;
+    }
+
     if (s->caching_mode) {
         s->cap |= VTD_CAP_CM;
     }
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 29d6707..0e73a65 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -187,6 +187,7 @@
 /* Interrupt Remapping support */
 #define VTD_ECAP_IR                 (1ULL << 3)
 #define VTD_ECAP_EIM                (1ULL << 4)
+#define VTD_ECAP_PT                 (1ULL << 6)
 #define VTD_ECAP_MHMV               (15ULL << 20)
 
 /* CAP_REG */
diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index 04a6980..867ad0b 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -38,6 +38,7 @@ vtd_page_walk_skip_perm(uint64_t iova, uint64_t next) "Page walk skip iova 0x%"P
 vtd_page_walk_skip_reserve(uint64_t iova, uint64_t next) "Page walk skip iova 0x%"PRIx64" - 0x%"PRIx64" due to rsrv set"
 vtd_switch_address_space(uint8_t bus, uint8_t slot, uint8_t fn, bool on) "Device %02x:%02x.%x switching address space (iommu enabled=%d)"
 vtd_as_unmap_whole(uint8_t bus, uint8_t slot, uint8_t fn, uint64_t iova, uint64_t size) "Device %02x:%02x.%x start 0x%"PRIx64" size 0x%"PRIx64
+vtd_translate_pt(uint16_t sid, uint64_t addr) "source id 0x%"PRIu16", iova 0x%"PRIx64
 
 # hw/i386/amd_iommu.c
 amdvi_evntlog_fail(uint64_t addr, uint32_t head) "error: fail to write at addr 0x%"PRIx64" +  offset 0x%"PRIx32
diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
index 02b8825..293caf8 100644
--- a/hw/i386/x86-iommu.c
+++ b/hw/i386/x86-iommu.c
@@ -91,6 +91,7 @@ static void x86_iommu_realize(DeviceState *dev, Error **errp)
 static Property x86_iommu_properties[] = {
     DEFINE_PROP_BOOL("intremap", X86IOMMUState, intr_supported, false),
     DEFINE_PROP_BOOL("device-iotlb", X86IOMMUState, dt_supported, false),
+    DEFINE_PROP_BOOL("pt", X86IOMMUState, pt_supported, true),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/i386/x86-iommu.h b/include/hw/i386/x86-iommu.h
index 361c07c..ef89c0c 100644
--- a/include/hw/i386/x86-iommu.h
+++ b/include/hw/i386/x86-iommu.h
@@ -74,6 +74,7 @@ struct X86IOMMUState {
     SysBusDevice busdev;
     bool intr_supported;        /* Whether vIOMMU supports IR */
     bool dt_supported;          /* Whether vIOMMU supports DT */
+    bool pt_supported;          /* Whether vIOMMU supports pass-through */
     IommuType type;             /* IOMMU type - AMD/Intel     */
     QLIST_HEAD(, IEC_Notifier) iec_notifiers; /* IEC notify list */
 };
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/7] memory: tune last param of iommu_ops.translate()
  2017-04-17 11:32 ` [Qemu-devel] [PATCH v2 1/7] memory: tune last param of iommu_ops.translate() Peter Xu
@ 2017-04-18  4:07   ` David Gibson
  0 siblings, 0 replies; 27+ messages in thread
From: David Gibson @ 2017-04-18  4:07 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Michael S . Tsirkin, Lan Tianyu, Jason Wang,
	Marcel Apfelbaum, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 9877 bytes --]

On Mon, Apr 17, 2017 at 07:32:04PM +0800, Peter Xu wrote:
> This patch converts the old "is_write" bool into IOMMUAccessFlags. The
> difference is that "is_write" can only express either read/write, but
> sometimes what we really want is "none" here (neither read nor write).
> Replay is an good example - during replay, we should not check any RW
> permission bits since thats not an actual IO at all.
> 
> CC: Paolo Bonzini <pbonzini@redhat.com>
> CC: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

spapr specific part,

Acked-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  exec.c                   |  6 ++++--
>  hw/alpha/typhoon.c       |  2 +-
>  hw/dma/rc4030.c          |  2 +-
>  hw/i386/amd_iommu.c      |  4 ++--
>  hw/i386/intel_iommu.c    |  4 ++--
>  hw/pci-host/apb.c        |  2 +-
>  hw/ppc/spapr_iommu.c     |  2 +-
>  hw/s390x/s390-pci-bus.c  |  2 +-
>  hw/s390x/s390-pci-inst.c |  2 +-
>  include/exec/memory.h    | 10 ++++++++--
>  memory.c                 |  3 ++-
>  11 files changed, 24 insertions(+), 15 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index c97ef4a..188892b 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -475,7 +475,8 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
>              break;
>          }
>  
> -        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
> +        iotlb = mr->iommu_ops->translate(mr, addr, is_write ?
> +                                         IOMMU_WO : IOMMU_RO);
>          if (!(iotlb.perm & (1 << is_write))) {
>              iotlb.target_as = NULL;
>              break;
> @@ -507,7 +508,8 @@ MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr,
>              break;
>          }
>  
> -        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
> +        iotlb = mr->iommu_ops->translate(mr, addr, is_write ?
> +                                         IOMMU_WO : IOMMU_RO);
>          addr = ((iotlb.translated_addr & ~iotlb.addr_mask)
>                  | (addr & iotlb.addr_mask));
>          *plen = MIN(*plen, (addr | iotlb.addr_mask) - addr + 1);
> diff --git a/hw/alpha/typhoon.c b/hw/alpha/typhoon.c
> index f50f5cf..c1cf780 100644
> --- a/hw/alpha/typhoon.c
> +++ b/hw/alpha/typhoon.c
> @@ -664,7 +664,7 @@ static bool window_translate(TyphoonWindow *win, hwaddr addr,
>  /* TODO: A translation failure here ought to set PCI error codes on the
>     Pchip and generate a machine check interrupt.  */
>  static IOMMUTLBEntry typhoon_translate_iommu(MemoryRegion *iommu, hwaddr addr,
> -                                             bool is_write)
> +                                             IOMMUAccessFlags flag)
>  {
>      TyphoonPchip *pchip = container_of(iommu, TyphoonPchip, iommu);
>      IOMMUTLBEntry ret;
> diff --git a/hw/dma/rc4030.c b/hw/dma/rc4030.c
> index 0080141..edf9432 100644
> --- a/hw/dma/rc4030.c
> +++ b/hw/dma/rc4030.c
> @@ -489,7 +489,7 @@ static const MemoryRegionOps jazzio_ops = {
>  };
>  
>  static IOMMUTLBEntry rc4030_dma_translate(MemoryRegion *iommu, hwaddr addr,
> -                                          bool is_write)
> +                                          IOMMUAccessFlags flag)
>  {
>      rc4030State *s = container_of(iommu, rc4030State, dma_mr);
>      IOMMUTLBEntry ret = {
> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
> index f86a40a..42b34ef 100644
> --- a/hw/i386/amd_iommu.c
> +++ b/hw/i386/amd_iommu.c
> @@ -987,7 +987,7 @@ static inline bool amdvi_is_interrupt_addr(hwaddr addr)
>  }
>  
>  static IOMMUTLBEntry amdvi_translate(MemoryRegion *iommu, hwaddr addr,
> -                                     bool is_write)
> +                                     IOMMUAccessFlags flag)
>  {
>      AMDVIAddressSpace *as = container_of(iommu, AMDVIAddressSpace, iommu);
>      AMDVIState *s = as->iommu_state;
> @@ -1016,7 +1016,7 @@ static IOMMUTLBEntry amdvi_translate(MemoryRegion *iommu, hwaddr addr,
>          return ret;
>      }
>  
> -    amdvi_do_translate(as, addr, is_write, &ret);
> +    amdvi_do_translate(as, addr, flag & IOMMU_WO, &ret);
>      trace_amdvi_translation_result(as->bus_num, PCI_SLOT(as->devfn),
>              PCI_FUNC(as->devfn), addr, ret.translated_addr);
>      return ret;
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 02f047c..ea54ec3 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -2221,7 +2221,7 @@ static void vtd_mem_write(void *opaque, hwaddr addr,
>  }
>  
>  static IOMMUTLBEntry vtd_iommu_translate(MemoryRegion *iommu, hwaddr addr,
> -                                         bool is_write)
> +                                         IOMMUAccessFlags flag)
>  {
>      VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
>      IntelIOMMUState *s = vtd_as->iommu_state;
> @@ -2243,7 +2243,7 @@ static IOMMUTLBEntry vtd_iommu_translate(MemoryRegion *iommu, hwaddr addr,
>      }
>  
>      vtd_do_iommu_translate(vtd_as, vtd_as->bus, vtd_as->devfn, addr,
> -                           is_write, &ret);
> +                           flag & IOMMU_WO, &ret);
>      VTD_DPRINTF(MMU,
>                  "bus %"PRIu8 " slot %"PRIu8 " func %"PRIu8 " devfn %"PRIu8
>                  " iova 0x%"PRIx64 " hpa 0x%"PRIx64, pci_bus_num(vtd_as->bus),
> diff --git a/hw/pci-host/apb.c b/hw/pci-host/apb.c
> index 653e711..ad7abb2 100644
> --- a/hw/pci-host/apb.c
> +++ b/hw/pci-host/apb.c
> @@ -209,7 +209,7 @@ static AddressSpace *pbm_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
>  
>  /* Called from RCU critical section */
>  static IOMMUTLBEntry pbm_translate_iommu(MemoryRegion *iommu, hwaddr addr,
> -                                         bool is_write)
> +                                         IOMMUAccessFlags flag)
>  {
>      IOMMUState *is = container_of(iommu, IOMMUState, iommu);
>      hwaddr baseaddr, offset;
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index ae30bbe..a3ae78d 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -110,7 +110,7 @@ static void spapr_tce_free_table(uint64_t *table, int fd, uint32_t nb_table)
>  
>  /* Called from RCU critical section */
>  static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr,
> -                                               bool is_write)
> +                                               IOMMUAccessFlags flag)
>  {
>      sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
>      uint64_t tce;
> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
> index 69b0291..9b2c344 100644
> --- a/hw/s390x/s390-pci-bus.c
> +++ b/hw/s390x/s390-pci-bus.c
> @@ -355,7 +355,7 @@ out:
>  }
>  
>  static IOMMUTLBEntry s390_translate_iommu(MemoryRegion *mr, hwaddr addr,
> -                                          bool is_write)
> +                                          IOMMUAccessFlags flag)
>  {
>      uint64_t pte;
>      uint32_t flags;
> diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
> index d2a8c0a..0087aa7 100644
> --- a/hw/s390x/s390-pci-inst.c
> +++ b/hw/s390x/s390-pci-inst.c
> @@ -622,7 +622,7 @@ int rpcit_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2)
>  
>      mr = &iommu->iommu_mr;
>      while (start < end) {
> -        entry = mr->iommu_ops->translate(mr, start, 0);
> +        entry = mr->iommu_ops->translate(mr, start, IOMMU_NONE);
>  
>          if (!entry.translated_addr) {
>              pbdev->state = ZPCI_FS_ERROR;
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index c4fc94d..9047bf3 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -185,8 +185,14 @@ struct MemoryRegionOps {
>  typedef struct MemoryRegionIOMMUOps MemoryRegionIOMMUOps;
>  
>  struct MemoryRegionIOMMUOps {
> -    /* Return a TLB entry that contains a given address. */
> -    IOMMUTLBEntry (*translate)(MemoryRegion *iommu, hwaddr addr, bool is_write);
> +    /*
> +     * Return a TLB entry that contains a given address. Flag should
> +     * be the access permission of this translation operation. We can
> +     * set flag to IOMMU_NONE to mean that we don't need any
> +     * read/write permission checks, like, when for region replay.
> +     */
> +    IOMMUTLBEntry (*translate)(MemoryRegion *iommu, hwaddr addr,
> +                               IOMMUAccessFlags flag);
>      /* Returns minimum supported page size */
>      uint64_t (*get_min_page_size)(MemoryRegion *iommu);
>      /* Called when IOMMU Notifier flag changed */
> diff --git a/memory.c b/memory.c
> index b782d5b..47dc107 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1625,6 +1625,7 @@ void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
>  {
>      hwaddr addr, granularity;
>      IOMMUTLBEntry iotlb;
> +    IOMMUAccessFlags flag = is_write ? IOMMU_WO : IOMMU_RO;
>  
>      /* If the IOMMU has its own replay callback, override */
>      if (mr->iommu_ops->replay) {
> @@ -1635,7 +1636,7 @@ void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
>      granularity = memory_region_iommu_get_min_page_size(mr);
>  
>      for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
> -        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
> +        iotlb = mr->iommu_ops->translate(mr, addr, flag);
>          if (iotlb.perm != IOMMU_NONE) {
>              n->notify(n, &iotlb);
>          }

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/7] memory: remove the last param in memory_region_iommu_replay()
  2017-04-17 11:32 ` [Qemu-devel] [PATCH v2 2/7] memory: remove the last param in memory_region_iommu_replay() Peter Xu
@ 2017-04-18  4:08   ` David Gibson
  0 siblings, 0 replies; 27+ messages in thread
From: David Gibson @ 2017-04-18  4:08 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Michael S . Tsirkin, Lan Tianyu, Jason Wang,
	Marcel Apfelbaum, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 3690 bytes --]

On Mon, Apr 17, 2017 at 07:32:05PM +0800, Peter Xu wrote:
> We were always passing in that one as "false" to assume that's an read
> operation, and we also assume that IOMMU translation would always have
> that read permission. A better permission would be IOMMU_NONE since the
> replay is after all not a real read operation, but just a page table
> rebuilding process.
> 
> CC: David Gibson <david@gibson.dropbear.id.au>
> CC: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  hw/vfio/common.c      | 2 +-
>  include/exec/memory.h | 5 +----
>  memory.c              | 8 +++-----
>  3 files changed, 5 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 6b33b9f..d008a4b 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -488,7 +488,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>          QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
>  
>          memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
> -        memory_region_iommu_replay(giommu->iommu, &giommu->n, false);
> +        memory_region_iommu_replay(giommu->iommu, &giommu->n);
>  
>          return;
>      }
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 9047bf3..8721d53 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -731,11 +731,8 @@ void memory_region_register_iommu_notifier(MemoryRegion *mr,
>   *
>   * @mr: the memory region to observe
>   * @n: the notifier to which to replay iommu mappings
> - * @is_write: Whether to treat the replay as a translate "write"
> - *     through the iommu
>   */
> -void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
> -                                bool is_write);
> +void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n);
>  
>  /**
>   * memory_region_iommu_replay_all: replay existing IOMMU translations
> diff --git a/memory.c b/memory.c
> index 47dc107..6b2fdb7 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1620,12 +1620,10 @@ uint64_t memory_region_iommu_get_min_page_size(MemoryRegion *mr)
>      return TARGET_PAGE_SIZE;
>  }
>  
> -void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
> -                                bool is_write)
> +void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n)
>  {
>      hwaddr addr, granularity;
>      IOMMUTLBEntry iotlb;
> -    IOMMUAccessFlags flag = is_write ? IOMMU_WO : IOMMU_RO;
>  
>      /* If the IOMMU has its own replay callback, override */
>      if (mr->iommu_ops->replay) {
> @@ -1636,7 +1634,7 @@ void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
>      granularity = memory_region_iommu_get_min_page_size(mr);
>  
>      for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
> -        iotlb = mr->iommu_ops->translate(mr, addr, flag);
> +        iotlb = mr->iommu_ops->translate(mr, addr, IOMMU_NONE);
>          if (iotlb.perm != IOMMU_NONE) {
>              n->notify(n, &iotlb);
>          }
> @@ -1654,7 +1652,7 @@ void memory_region_iommu_replay_all(MemoryRegion *mr)
>      IOMMUNotifier *notifier;
>  
>      IOMMU_NOTIFIER_FOREACH(notifier, mr) {
> -        memory_region_iommu_replay(mr, notifier, false);
> +        memory_region_iommu_replay(mr, notifier);
>      }
>  }
>  

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT)
  2017-04-17 11:32 ` [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT) Peter Xu
@ 2017-04-18  4:30   ` Liu, Yi L
  2017-04-18  4:54     ` Peter Xu
  2017-04-19  7:13   ` Liu, Yi L
  1 sibling, 1 reply; 27+ messages in thread
From: Liu, Yi L @ 2017-04-18  4:30 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Lan, Tianyu, Michael S . Tsirkin, Jason Wang, Marcel Apfelbaum,
	David Gibson

> -----Original Message-----
> From: Qemu-devel [mailto:qemu-devel-bounces+yi.l.liu=intel.com@nongnu.org] On
> Behalf Of Peter Xu
> Sent: Monday, April 17, 2017 7:32 PM
> To: qemu-devel@nongnu.org
> Cc: Lan, Tianyu <tianyu.lan@intel.com>; Michael S . Tsirkin <mst@redhat.com>;
> Jason Wang <jasowang@redhat.com>; peterx@redhat.com; Marcel Apfelbaum
> <marcel@redhat.com>; David Gibson <david@gibson.dropbear.id.au>
> Subject: [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT)
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/i386/intel_iommu.c          | 109 +++++++++++++++++++++++++++++++++--------
>  hw/i386/intel_iommu_internal.h |   1 +
>  hw/i386/trace-events           |   1 +
>  hw/i386/x86-iommu.c            |   1 +
>  include/hw/i386/x86-iommu.h    |   1 +
>  5 files changed, 93 insertions(+), 20 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index
> 05ae631..deb2007 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -872,7 +872,7 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s,
> uint8_t bus_num,
>      } else {
>          switch (vtd_ce_get_type(ce)) {
>          case VTD_CONTEXT_TT_MULTI_LEVEL:
> -            /* fall through */
> +        case VTD_CONTEXT_TT_PASS_THROUGH:
>          case VTD_CONTEXT_TT_DEV_IOTLB:
>              break;
>          default:
> @@ -883,6 +883,73 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s,
> uint8_t bus_num,
>      return 0;
>  }
> 
> +/* Fetch translation type for specific device. Returns <0 if error
> + * happens, otherwise return the shifted type to check against
> + * VTD_CONTEXT_TT_*. */
> +static int vtd_dev_get_trans_type(VTDAddressSpace *as) {
> +    IntelIOMMUState *s;
> +    VTDContextEntry ce;
> +    int ret;
> +
> +    s = as->iommu_state;
> +
> +    ret = vtd_dev_to_context_entry(s, pci_bus_num(as->bus),
> +                                   as->devfn, &ce);
> +    if (ret) {
> +        return ret;
> +    }
> +
> +    return vtd_ce_get_type(&ce);
> +}
> +
> +static bool vtd_dev_pt_enabled(VTDAddressSpace *as) {
> +    int ret;
> +
> +    assert(as);
> +
> +    ret = vtd_dev_get_trans_type(as);
> +    if (ret < 0) {
> +        /*
> +         * Possibly failed to parse the context entry for some reason
> +         * (e.g., during init, or any guest configuration errors on
> +         * context entries). We should assume PT not enabled for
> +         * safety.
> +         */
> +        return false;
> +    }
> +
> +    return ret == VTD_CONTEXT_TT_PASS_THROUGH; }
> +
> +static void vtd_switch_address_space(VTDAddressSpace *as) {
> +    bool use_iommu;
> +
> +    assert(as);
> +
> +    use_iommu = as->iommu_state->dmar_enabled;
> +    if (use_iommu) {
> +        /* Further checks per-device configuration */
> +        use_iommu &= !vtd_dev_pt_enabled(as);
> +    }
> +
> +    trace_vtd_switch_address_space(pci_bus_num(as->bus),
> +                                   VTD_PCI_SLOT(as->devfn),
> +                                   VTD_PCI_FUNC(as->devfn),
> +                                   use_iommu);

Hi Peter,

Skip address space switching is a good idea to support Passthru mode.
However, without the address space, the vfio notifier would not be
registered, thus vIOMMU emulator has no way to connect to host. It is
no harm if there is only map/unmap notifier. But if we have more notifiers
other than map/unmap, it may be a problem.

I think we need to reconsider it here. 

Regards,
Yi L
> +    /* Turn off first then on the other */
> +    if (use_iommu) {
> +        memory_region_set_enabled(&as->sys_alias, false);
> +        memory_region_set_enabled(&as->iommu, true);
> +    } else {
> +        memory_region_set_enabled(&as->iommu, false);
> +        memory_region_set_enabled(&as->sys_alias, true);
> +    }
> +}
> +
>  static inline uint16_t vtd_make_source_id(uint8_t bus_num, uint8_t devfn)  {
>      return ((bus_num & 0xffUL) << 8) | (devfn & 0xffUL); @@ -991,6 +1058,18 @@
> static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
>          cc_entry->context_cache_gen = s->context_cache_gen;
>      }
> 
> +    /*
> +     * We don't need to translate for pass-through context entries.
> +     * Also, let's ignore IOTLB caching as well for PT devices.
> +     */
> +    if (vtd_ce_get_type(&ce) == VTD_CONTEXT_TT_PASS_THROUGH) {
> +        entry->translated_addr = entry->iova;
> +        entry->addr_mask = VTD_PAGE_SIZE - 1;
> +        entry->perm = IOMMU_RW;
> +        trace_vtd_translate_pt(source_id, entry->iova);
> +        return;
> +    }
> +
>      ret_fr = vtd_iova_to_slpte(&ce, addr, is_write, &slpte, &level,
>                                 &reads, &writes);
>      if (ret_fr) {
> @@ -1135,6 +1214,11 @@ static void
> vtd_context_device_invalidate(IntelIOMMUState *s,
>                                               VTD_PCI_FUNC(devfn_it));
>                  vtd_as->context_cache_entry.context_cache_gen = 0;
>                  /*
> +                 * Do switch address space when needed, in case if the
> +                 * device passthrough bit is switched.
> +                 */
> +                vtd_switch_address_space(vtd_as);
> +                /*
>                   * So a device is moving out of (or moving into) a
>                   * domain, a replay() suites here to notify all the
>                   * IOMMU_NOTIFIER_MAP registers about this change.
> @@ -1366,25 +1450,6 @@ static void vtd_handle_gcmd_sirtp(IntelIOMMUState *s)
>      vtd_set_clear_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_IRTPS);  }
> 
> -static void vtd_switch_address_space(VTDAddressSpace *as) -{
> -    assert(as);
> -
> -    trace_vtd_switch_address_space(pci_bus_num(as->bus),
> -                                   VTD_PCI_SLOT(as->devfn),
> -                                   VTD_PCI_FUNC(as->devfn),
> -                                   as->iommu_state->dmar_enabled);
> -
> -    /* Turn off first then on the other */
> -    if (as->iommu_state->dmar_enabled) {
> -        memory_region_set_enabled(&as->sys_alias, false);
> -        memory_region_set_enabled(&as->iommu, true);
> -    } else {
> -        memory_region_set_enabled(&as->iommu, false);
> -        memory_region_set_enabled(&as->sys_alias, true);
> -    }
> -}
> -
>  static void vtd_switch_address_space_all(IntelIOMMUState *s)  {
>      GHashTableIter iter;
> @@ -2849,6 +2914,10 @@ static void vtd_init(IntelIOMMUState *s)
>          s->ecap |= VTD_ECAP_DT;
>      }
> 
> +    if (x86_iommu->pt_supported) {
> +        s->ecap |= VTD_ECAP_PT;
> +    }
> +
>      if (s->caching_mode) {
>          s->cap |= VTD_CAP_CM;
>      }
> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h index
> 29d6707..0e73a65 100644
> --- a/hw/i386/intel_iommu_internal.h
> +++ b/hw/i386/intel_iommu_internal.h
> @@ -187,6 +187,7 @@
>  /* Interrupt Remapping support */
>  #define VTD_ECAP_IR                 (1ULL << 3)
>  #define VTD_ECAP_EIM                (1ULL << 4)
> +#define VTD_ECAP_PT                 (1ULL << 6)
>  #define VTD_ECAP_MHMV               (15ULL << 20)
> 
>  /* CAP_REG */
> diff --git a/hw/i386/trace-events b/hw/i386/trace-events index 04a6980..867ad0b
> 100644
> --- a/hw/i386/trace-events
> +++ b/hw/i386/trace-events
> @@ -38,6 +38,7 @@ vtd_page_walk_skip_perm(uint64_t iova, uint64_t next) "Page
> walk skip iova 0x%"P  vtd_page_walk_skip_reserve(uint64_t iova, uint64_t next)
> "Page walk skip iova 0x%"PRIx64" - 0x%"PRIx64" due to rsrv set"
>  vtd_switch_address_space(uint8_t bus, uint8_t slot, uint8_t fn, bool on)
> "Device %02x:%02x.%x switching address space (iommu enabled=%d)"
>  vtd_as_unmap_whole(uint8_t bus, uint8_t slot, uint8_t fn, uint64_t iova, uint64_t
> size) "Device %02x:%02x.%x start 0x%"PRIx64" size 0x%"PRIx64
> +vtd_translate_pt(uint16_t sid, uint64_t addr) "source id 0x%"PRIu16",
> +iova 0x%"PRIx64
> 
>  # hw/i386/amd_iommu.c
>  amdvi_evntlog_fail(uint64_t addr, uint32_t head) "error: fail to write at addr
> 0x%"PRIx64" +  offset 0x%"PRIx32 diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-
> iommu.c index 02b8825..293caf8 100644
> --- a/hw/i386/x86-iommu.c
> +++ b/hw/i386/x86-iommu.c
> @@ -91,6 +91,7 @@ static void x86_iommu_realize(DeviceState *dev, Error **errp)
> static Property x86_iommu_properties[] = {
>      DEFINE_PROP_BOOL("intremap", X86IOMMUState, intr_supported, false),
>      DEFINE_PROP_BOOL("device-iotlb", X86IOMMUState, dt_supported, false),
> +    DEFINE_PROP_BOOL("pt", X86IOMMUState, pt_supported, true),
>      DEFINE_PROP_END_OF_LIST(),
>  };
> 
> diff --git a/include/hw/i386/x86-iommu.h b/include/hw/i386/x86-iommu.h index
> 361c07c..ef89c0c 100644
> --- a/include/hw/i386/x86-iommu.h
> +++ b/include/hw/i386/x86-iommu.h
> @@ -74,6 +74,7 @@ struct X86IOMMUState {
>      SysBusDevice busdev;
>      bool intr_supported;        /* Whether vIOMMU supports IR */
>      bool dt_supported;          /* Whether vIOMMU supports DT */
> +    bool pt_supported;          /* Whether vIOMMU supports pass-through */
>      IommuType type;             /* IOMMU type - AMD/Intel     */
>      QLIST_HEAD(, IEC_Notifier) iec_notifiers; /* IEC notify list */  };
> --
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT)
  2017-04-18  4:30   ` Liu, Yi L
@ 2017-04-18  4:54     ` Peter Xu
  2017-04-18  6:02       ` Liu, Yi L
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Xu @ 2017-04-18  4:54 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: qemu-devel, Lan, Tianyu, Michael S . Tsirkin, Jason Wang,
	Marcel Apfelbaum, David Gibson

On Tue, Apr 18, 2017 at 04:30:40AM +0000, Liu, Yi L wrote:

[...]

> > +static void vtd_switch_address_space(VTDAddressSpace *as) {
> > +    bool use_iommu;
> > +
> > +    assert(as);
> > +
> > +    use_iommu = as->iommu_state->dmar_enabled;
> > +    if (use_iommu) {
> > +        /* Further checks per-device configuration */
> > +        use_iommu &= !vtd_dev_pt_enabled(as);
> > +    }
> > +
> > +    trace_vtd_switch_address_space(pci_bus_num(as->bus),
> > +                                   VTD_PCI_SLOT(as->devfn),
> > +                                   VTD_PCI_FUNC(as->devfn),
> > +                                   use_iommu);
> 
> Hi Peter,
> 
> Skip address space switching is a good idea to support Passthru mode.
> However, without the address space, the vfio notifier would not be
> registered, thus vIOMMU emulator has no way to connect to host. It is
> no harm if there is only map/unmap notifier. But if we have more notifiers
> other than map/unmap, it may be a problem.
> 
> I think we need to reconsider it here. 

For now I think as switching is good to us in general. Could I know
more context about this? Would it be okay to work on top of this in
the future?

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT)
  2017-04-18  4:54     ` Peter Xu
@ 2017-04-18  6:02       ` Liu, Yi L
  2017-04-18  7:27         ` Peter Xu
  0 siblings, 1 reply; 27+ messages in thread
From: Liu, Yi L @ 2017-04-18  6:02 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Lan, Tianyu, Michael S . Tsirkin, Jason Wang,
	Marcel Apfelbaum, David Gibson

> > Hi Peter,
> >
> > Skip address space switching is a good idea to support Passthru mode.
> > However, without the address space, the vfio notifier would not be
> > registered, thus vIOMMU emulator has no way to connect to host. It is
> > no harm if there is only map/unmap notifier. But if we have more
> > notifiers other than map/unmap, it may be a problem.
> >
> > I think we need to reconsider it here.
> 
> For now I think as switching is good to us in general. Could I know more context
> about this? Would it be okay to work on top of this in the future?
> 

Let me explain. For vSVM enabling, it needs to add new notifier to bind gPASID table
to host. If an assigned device uses SVM in guest, as designed guest IOMMU driver would
set "pt" for it. This is to make sure the host second-level page table stores GPA->HPA
mapping. So that pIOMMU can do nested translation to achieve gVA->GPA GPA->HPA 
mapping.

So the situation is if we want to keep GPA->HPA mapping, we should skip address space
switch, but the vfio listener would not know vIOMMU is added, so no vfio notifier would
be registered. But if we do the switch, the GPA->HPA mapping is unmapped. And need
another way to build the GPA->HPA mapping.

I think we may have two choice here. Pls let me know your ideas.

1. skip the switch for "pt" mode, find other way to register vfio notifiers. not sure if this
is workable. Just a quick thought.

2. do the switch, and rebuild GPA->HPA mapping if device use "pt" mode. For this option,
I also have two way to build the GPA->HPA mapping.
a) walk all the memory region sections in address_space_memory, and build the mapping.
address_space_memory.dispatch->map.sections, sections stores all the mapped sections.

b) let guest iommu driver mimics a 1:1 mapping for the devices use "pt" mode. in this way,
the GPA->HPA mapping could be setup by walking the guest SL page table. This is consistent
with your vIOVA replay solution.

Also I'm curious if Tianyu's fault report framework needs to register new notifiers.

Thanks,
Yi L

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT)
  2017-04-18  6:02       ` Liu, Yi L
@ 2017-04-18  7:27         ` Peter Xu
  2017-04-18  9:04           ` Liu, Yi L
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Xu @ 2017-04-18  7:27 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: qemu-devel, Lan, Tianyu, Michael S . Tsirkin, Jason Wang,
	Marcel Apfelbaum, David Gibson

On Tue, Apr 18, 2017 at 06:02:44AM +0000, Liu, Yi L wrote:
> > > Hi Peter,
> > >
> > > Skip address space switching is a good idea to support Passthru mode.
> > > However, without the address space, the vfio notifier would not be
> > > registered, thus vIOMMU emulator has no way to connect to host. It is
> > > no harm if there is only map/unmap notifier. But if we have more
> > > notifiers other than map/unmap, it may be a problem.
> > >
> > > I think we need to reconsider it here.
> > 
> > For now I think as switching is good to us in general. Could I know more context
> > about this? Would it be okay to work on top of this in the future?
> > 
> 
> Let me explain. For vSVM enabling, it needs to add new notifier to bind gPASID table
> to host. If an assigned device uses SVM in guest, as designed guest IOMMU driver would
> set "pt" for it. This is to make sure the host second-level page table stores GPA->HPA
> mapping. So that pIOMMU can do nested translation to achieve gVA->GPA GPA->HPA 
> mapping.

That should mean that in the guest it will only enable first level
translation, while in the host it'll be nested (first + second)? Then
you should be trapping the guest extended context entry invalidation,
then pass the PASID table pointer downward to the host IOMMU, am I
correct?

> 
> So the situation is if we want to keep GPA->HPA mapping, we should skip address space
> switch, but the vfio listener would not know vIOMMU is added, so no vfio notifier would
> be registered. But if we do the switch, the GPA->HPA mapping is unmapped. And need
> another way to build the GPA->HPA mapping.

If my understanding above is correct, current IOMMU notifier may not
satisfy your need. If you see the current notify interface, it's
delivering IOMMUTLBEntry. I suspect it only suites for IOTLB
notifications, but not if you want to pass some pointer (one GPA)
downward somewhere.

> 
> I think we may have two choice here. Pls let me know your ideas.
> 
> 1. skip the switch for "pt" mode, find other way to register vfio notifiers. not sure if this
> is workable. Just a quick thought.
> 
> 2. do the switch, and rebuild GPA->HPA mapping if device use "pt" mode. For this option,
> I also have two way to build the GPA->HPA mapping.
> a) walk all the memory region sections in address_space_memory, and build the mapping.
> address_space_memory.dispatch->map.sections, sections stores all the mapped sections.
> 
> b) let guest iommu driver mimics a 1:1 mapping for the devices use "pt" mode. in this way,
> the GPA->HPA mapping could be setup by walking the guest SL page table. This is consistent
> with your vIOVA replay solution.

I'll prefer (1). Reason explained above.

> 
> Also I'm curious if Tianyu's fault report framework needs to register new notifiers.

For Tianyu's case, I feel like we need other kind of notifier as well,
but not the current IOTLB one. And, of course in this case it'll be in
an reverted direction, which is from device to vIOMMU.

(I am thinking whether it's a good time now to let any PCI device able
 to fetch its direct owner IOMMU object, then it can register anything
 it want onto that object maybe?)

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT)
  2017-04-18  7:27         ` Peter Xu
@ 2017-04-18  9:04           ` Liu, Yi L
  2017-04-19  7:27             ` Lan Tianyu
  0 siblings, 1 reply; 27+ messages in thread
From: Liu, Yi L @ 2017-04-18  9:04 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Lan, Tianyu, Michael S . Tsirkin, Jason Wang,
	Marcel Apfelbaum, David Gibson, Tian, Kevin

> -----Original Message-----
> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Tuesday, April 18, 2017 3:27 PM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Cc: qemu-devel@nongnu.org; Lan, Tianyu <tianyu.lan@intel.com>; Michael S .
> Tsirkin <mst@redhat.com>; Jason Wang <jasowang@redhat.com>; Marcel
> Apfelbaum <marcel@redhat.com>; David Gibson <david@gibson.dropbear.id.au>
> Subject: Re: [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT)
> 
> On Tue, Apr 18, 2017 at 06:02:44AM +0000, Liu, Yi L wrote:
> > > > Hi Peter,
> > > >
> > > > Skip address space switching is a good idea to support Passthru mode.
> > > > However, without the address space, the vfio notifier would not be
> > > > registered, thus vIOMMU emulator has no way to connect to host. It
> > > > is no harm if there is only map/unmap notifier. But if we have
> > > > more notifiers other than map/unmap, it may be a problem.
> > > >
> > > > I think we need to reconsider it here.
> > >
> > > For now I think as switching is good to us in general. Could I know
> > > more context about this? Would it be okay to work on top of this in the future?
> > >
> >
> > Let me explain. For vSVM enabling, it needs to add new notifier to
> > bind gPASID table to host. If an assigned device uses SVM in guest, as
> > designed guest IOMMU driver would set "pt" for it. This is to make
> > sure the host second-level page table stores GPA->HPA mapping. So that
> > pIOMMU can do nested translation to achieve gVA->GPA GPA->HPA mapping.
> 
> That should mean that in the guest it will only enable first level translation, while in
> the host it'll be nested (first + second)? Then you should be trapping the guest
> extended context entry invalidation, then pass the PASID table pointer downward to
> the host IOMMU, am I correct?

exactly.

> 
> >
> > So the situation is if we want to keep GPA->HPA mapping, we should
> > skip address space switch, but the vfio listener would not know vIOMMU
> > is added, so no vfio notifier would be registered. But if we do the
> > switch, the GPA->HPA mapping is unmapped. And need another way to build the
> GPA->HPA mapping.
> 
> If my understanding above is correct, current IOMMU notifier may not satisfy your
> need. If you see the current notify interface, it's delivering IOMMUTLBEntry. I
> suspect it only suites for IOTLB notifications, but not if you want to pass some
> pointer (one GPA) downward somewhere.

Yeah, you got it more than absolutely. One of my patch is to modify the para to be
void * and let each notifiers to translate separately. I think it should be a reasonable
change.

Supposedly, I would send RFC for vSVM soon. We may talk more it at that thread.

> >
> > I think we may have two choice here. Pls let me know your ideas.
> >
> > 1. skip the switch for "pt" mode, find other way to register vfio
> > notifiers. not sure if this is workable. Just a quick thought.
> >
> > 2. do the switch, and rebuild GPA->HPA mapping if device use "pt"
> > mode. For this option, I also have two way to build the GPA->HPA mapping.
> > a) walk all the memory region sections in address_space_memory, and build the
> mapping.
> > address_space_memory.dispatch->map.sections, sections stores all the mapped
> sections.
> >
> > b) let guest iommu driver mimics a 1:1 mapping for the devices use
> > "pt" mode. in this way, the GPA->HPA mapping could be setup by walking
> > the guest SL page table. This is consistent with your vIOVA replay solution.
> 
> I'll prefer (1). Reason explained above.
> 
> >
> > Also I'm curious if Tianyu's fault report framework needs to register new notifiers.
> 
> For Tianyu's case, I feel like we need other kind of notifier as well, but not the
> current IOTLB one. And, of course in this case it'll be in an reverted direction, which
> is from device to vIOMMU.
> 
> (I am thinking whether it's a good time now to let any PCI device able  to fetch its
> direct owner IOMMU object, then it can register anything  it want onto that object
> maybe?)
> 
Hmmm, let's see Tianyu's comment as he's driving fault report. Let's keep in touch on
this Passthru-Mode enabling.

Thanks,
Yi L


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/7] VT-d: PT (passthrough) mode support and misc fixes
  2017-04-17 11:32 [Qemu-devel] [PATCH v2 0/7] VT-d: PT (passthrough) mode support and misc fixes Peter Xu
                   ` (6 preceding siblings ...)
  2017-04-17 11:32 ` [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT) Peter Xu
@ 2017-04-18 10:29 ` Paolo Bonzini
  7 siblings, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2017-04-18 10:29 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Lan Tianyu, Michael S . Tsirkin, Jason Wang, Marcel Apfelbaum,
	David Gibson



On 17/04/2017 13:32, Peter Xu wrote:
> v2:
> - resending v1 since my patch script had a regression, which failed to
>   send the patches to the list... trying again. sorry for the noise!
> 
> This series is for 2.10, and based on following series:
> 
>     [PATCH v8 0/9] VT-d: vfio enablement and misc enhances
> 
> This series add support for per-device passthrough mode for VT-d
> emulation, along with some tweaks on existing codes.
> 
> Patches 1-2 are memory related cleanups.

These two:

Acked-by: Paolo Bonzini <pbonzini@redhat.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT)
  2017-04-17 11:32 ` [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT) Peter Xu
  2017-04-18  4:30   ` Liu, Yi L
@ 2017-04-19  7:13   ` Liu, Yi L
  2017-04-20  3:07     ` Peter Xu
  1 sibling, 1 reply; 27+ messages in thread
From: Liu, Yi L @ 2017-04-19  7:13 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Lan, Tianyu, Michael S . Tsirkin, Jason Wang, Marcel Apfelbaum,
	David Gibson

Peter,

Besides the gPA->hPA mapping, pt mode enabling requires some sanity check on
s->pt_supported. See comments inline.

> -----Original Message-----
> From: Qemu-devel [mailto:qemu-devel-bounces+yi.l.liu=intel.com@nongnu.org] On
> Behalf Of Peter Xu
> Sent: Monday, April 17, 2017 7:32 PM
> To: qemu-devel@nongnu.org
> Cc: Lan, Tianyu <tianyu.lan@intel.com>; Michael S . Tsirkin <mst@redhat.com>;
> Jason Wang <jasowang@redhat.com>; peterx@redhat.com; Marcel Apfelbaum
> <marcel@redhat.com>; David Gibson <david@gibson.dropbear.id.au>
> Subject: [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT)
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/i386/intel_iommu.c          | 109 +++++++++++++++++++++++++++++++++--------
>  hw/i386/intel_iommu_internal.h |   1 +
>  hw/i386/trace-events           |   1 +
>  hw/i386/x86-iommu.c            |   1 +
>  include/hw/i386/x86-iommu.h    |   1 +
>  5 files changed, 93 insertions(+), 20 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index
> 05ae631..deb2007 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -872,7 +872,7 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s,
> uint8_t bus_num,
>      } else {
>          switch (vtd_ce_get_type(ce)) {
>          case VTD_CONTEXT_TT_MULTI_LEVEL:
> -            /* fall through */
> +        case VTD_CONTEXT_TT_PASS_THROUGH:

Passthru type is a reserved type if ecap.PT=0, so add a check here. If s->pt_supported
is false, report this translation type as unsupported and report an invalid context 
programming to guest.

>          case VTD_CONTEXT_TT_DEV_IOTLB:
>              break;
>          default:
> @@ -883,6 +883,73 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s,
> uint8_t bus_num,
>      return 0;
>  }
> 
> +/* Fetch translation type for specific device. Returns <0 if error
> + * happens, otherwise return the shifted type to check against
> + * VTD_CONTEXT_TT_*. */
> +static int vtd_dev_get_trans_type(VTDAddressSpace *as) {
> +    IntelIOMMUState *s;
> +    VTDContextEntry ce;
> +    int ret;
> +
> +    s = as->iommu_state;
> +
> +    ret = vtd_dev_to_context_entry(s, pci_bus_num(as->bus),
> +                                   as->devfn, &ce);
> +    if (ret) {
> +        return ret;
> +    }
> +
> +    return vtd_ce_get_type(&ce);
> +}
> +
> +static bool vtd_dev_pt_enabled(VTDAddressSpace *as) {
> +    int ret;
> +
> +    assert(as);
> +
> +    ret = vtd_dev_get_trans_type(as);
> +    if (ret < 0) {
> +        /*
> +         * Possibly failed to parse the context entry for some reason
> +         * (e.g., during init, or any guest configuration errors on
> +         * context entries). We should assume PT not enabled for
> +         * safety.
> +         */
> +        return false;
> +    }
> +
> +    return ret == VTD_CONTEXT_TT_PASS_THROUGH; }
> +
> +static void vtd_switch_address_space(VTDAddressSpace *as) {
> +    bool use_iommu;
> +
> +    assert(as);
> +
> +    use_iommu = as->iommu_state->dmar_enabled;
> +    if (use_iommu) {

if s->pt_supported is false, no need to check the context.TT, even
context.TT=pt, then it should be an invalid context programming.

Regards,
Yi L

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT)
  2017-04-18  9:04           ` Liu, Yi L
@ 2017-04-19  7:27             ` Lan Tianyu
  2017-04-20  3:04               ` Peter Xu
  0 siblings, 1 reply; 27+ messages in thread
From: Lan Tianyu @ 2017-04-19  7:27 UTC (permalink / raw)
  To: Liu, Yi L, Peter Xu
  Cc: qemu-devel, Michael S . Tsirkin, Jason Wang, Marcel Apfelbaum,
	David Gibson, Tian, Kevin

Hi All:
	Sorry for later response.
On 2017年04月18日 17:04, Liu, Yi L wrote:
>> -----Original Message-----
>> From: Peter Xu [mailto:peterx@redhat.com]
>> Sent: Tuesday, April 18, 2017 3:27 PM
>> To: Liu, Yi L <yi.l.liu@intel.com>
>> Cc: qemu-devel@nongnu.org; Lan, Tianyu <tianyu.lan@intel.com>; Michael S .
>> Tsirkin <mst@redhat.com>; Jason Wang <jasowang@redhat.com>; Marcel
>> Apfelbaum <marcel@redhat.com>; David Gibson <david@gibson.dropbear.id.au>
>> Subject: Re: [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT)
>>
>> On Tue, Apr 18, 2017 at 06:02:44AM +0000, Liu, Yi L wrote:
>>>>> Hi Peter,
>>>>>
>>>>> Skip address space switching is a good idea to support Passthru mode.
>>>>> However, without the address space, the vfio notifier would not be
>>>>> registered, thus vIOMMU emulator has no way to connect to host. It
>>>>> is no harm if there is only map/unmap notifier. But if we have
>>>>> more notifiers other than map/unmap, it may be a problem.
>>>>>
>>>>> I think we need to reconsider it here.
>>>>
>>>> For now I think as switching is good to us in general. Could I know
>>>> more context about this? Would it be okay to work on top of this in the future?
>>>>
>>>
>>> Let me explain. For vSVM enabling, it needs to add new notifier to
>>> bind gPASID table to host. If an assigned device uses SVM in guest, as
>>> designed guest IOMMU driver would set "pt" for it. This is to make
>>> sure the host second-level page table stores GPA->HPA mapping. So that
>>> pIOMMU can do nested translation to achieve gVA->GPA GPA->HPA mapping.
>>
>> That should mean that in the guest it will only enable first level translation, while in
>> the host it'll be nested (first + second)? Then you should be trapping the guest
>> extended context entry invalidation, then pass the PASID table pointer downward to
>> the host IOMMU, am I correct?
> 
> exactly.
> 
>>
>>>
>>> So the situation is if we want to keep GPA->HPA mapping, we should
>>> skip address space switch, but the vfio listener would not know vIOMMU
>>> is added, so no vfio notifier would be registered. But if we do the
>>> switch, the GPA->HPA mapping is unmapped. And need another way to build the
>> GPA->HPA mapping.
>>
>> If my understanding above is correct, current IOMMU notifier may not satisfy your
>> need. If you see the current notify interface, it's delivering IOMMUTLBEntry. I
>> suspect it only suites for IOTLB notifications, but not if you want to pass some
>> pointer (one GPA) downward somewhere.
> 
> Yeah, you got it more than absolutely. One of my patch is to modify the para to be
> void * and let each notifiers to translate separately. I think it should be a reasonable
> change.
> 
> Supposedly, I would send RFC for vSVM soon. We may talk more it at that thread.
> 
>>>
>>> I think we may have two choice here. Pls let me know your ideas.
>>>
>>> 1. skip the switch for "pt" mode, find other way to register vfio
>>> notifiers. not sure if this is workable. Just a quick thought.
>>>
>>> 2. do the switch, and rebuild GPA->HPA mapping if device use "pt"
>>> mode. For this option, I also have two way to build the GPA->HPA mapping.
>>> a) walk all the memory region sections in address_space_memory, and build the
>> mapping.
>>> address_space_memory.dispatch->map.sections, sections stores all the mapped
>> sections.
>>>
>>> b) let guest iommu driver mimics a 1:1 mapping for the devices use
>>> "pt" mode. in this way, the GPA->HPA mapping could be setup by walking
>>> the guest SL page table. This is consistent with your vIOVA replay solution.
>>
>> I'll prefer (1). Reason explained above.
>>
>>>
>>> Also I'm curious if Tianyu's fault report framework needs to register new notifiers.
>>
>> For Tianyu's case, I feel like we need other kind of notifier as well, but not the
>> current IOTLB one. And, of course in this case it'll be in an reverted direction, which
>> is from device to vIOMMU.
>>
>> (I am thinking whether it's a good time now to let any PCI device able  to fetch its
>> direct owner IOMMU object, then it can register anything  it want onto that object
>> maybe?)
>>
> Hmmm, let's see Tianyu's comment as he's driving fault report. Let's keep in touch on
> this Passthru-Mode enabling.

In my previous RFC patchset of fault event reporting, I registered fault
notifier when there is a VFIO group attached to VFIO container and used
the address space to check whether vIOMMU is added. The address space is
returned by vtd_host_dma_iommu(). vtd_find_add_as() initializes device's
IOMMU memory region and put it into device's address space as root
memory region.
Does this make sense?

@@ -1103,6 +1132,14 @@ static int vfio_connect_container(VFIOGroup
*group, AddressSpace *as,
         goto listener_release_exit;
     }

+    if (memory_region_is_iommu(container->space->as->root)) {
+        if (vfio_set_iommu_fault_notifier(container)) {
+            error_setg_errno(errp, -ret,
+                "Fail to set IOMMU fault notifier");
+            goto listener_release_exit;
+        }
+    }
+
     container->initialized = true;

-- 
Best regards
Tianyu Lan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT)
  2017-04-19  7:27             ` Lan Tianyu
@ 2017-04-20  3:04               ` Peter Xu
  2017-04-20  4:55                 ` Liu, Yi L
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Xu @ 2017-04-20  3:04 UTC (permalink / raw)
  To: Lan Tianyu
  Cc: Liu, Yi L, qemu-devel, Michael S . Tsirkin, Jason Wang,
	Marcel Apfelbaum, David Gibson, Tian, Kevin

On Wed, Apr 19, 2017 at 03:27:52PM +0800, Lan Tianyu wrote:
> Hi All:
> 	Sorry for later response.
> On 2017年04月18日 17:04, Liu, Yi L wrote:
> >> -----Original Message-----
> >> From: Peter Xu [mailto:peterx@redhat.com]
> >> Sent: Tuesday, April 18, 2017 3:27 PM
> >> To: Liu, Yi L <yi.l.liu@intel.com>
> >> Cc: qemu-devel@nongnu.org; Lan, Tianyu <tianyu.lan@intel.com>; Michael S .
> >> Tsirkin <mst@redhat.com>; Jason Wang <jasowang@redhat.com>; Marcel
> >> Apfelbaum <marcel@redhat.com>; David Gibson <david@gibson.dropbear.id.au>
> >> Subject: Re: [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT)
> >>
> >> On Tue, Apr 18, 2017 at 06:02:44AM +0000, Liu, Yi L wrote:
> >>>>> Hi Peter,
> >>>>>
> >>>>> Skip address space switching is a good idea to support Passthru mode.
> >>>>> However, without the address space, the vfio notifier would not be
> >>>>> registered, thus vIOMMU emulator has no way to connect to host. It
> >>>>> is no harm if there is only map/unmap notifier. But if we have
> >>>>> more notifiers other than map/unmap, it may be a problem.
> >>>>>
> >>>>> I think we need to reconsider it here.
> >>>>
> >>>> For now I think as switching is good to us in general. Could I know
> >>>> more context about this? Would it be okay to work on top of this in the future?
> >>>>
> >>>
> >>> Let me explain. For vSVM enabling, it needs to add new notifier to
> >>> bind gPASID table to host. If an assigned device uses SVM in guest, as
> >>> designed guest IOMMU driver would set "pt" for it. This is to make
> >>> sure the host second-level page table stores GPA->HPA mapping. So that
> >>> pIOMMU can do nested translation to achieve gVA->GPA GPA->HPA mapping.
> >>
> >> That should mean that in the guest it will only enable first level translation, while in
> >> the host it'll be nested (first + second)? Then you should be trapping the guest
> >> extended context entry invalidation, then pass the PASID table pointer downward to
> >> the host IOMMU, am I correct?
> > 
> > exactly.
> > 
> >>
> >>>
> >>> So the situation is if we want to keep GPA->HPA mapping, we should
> >>> skip address space switch, but the vfio listener would not know vIOMMU
> >>> is added, so no vfio notifier would be registered. But if we do the
> >>> switch, the GPA->HPA mapping is unmapped. And need another way to build the
> >> GPA->HPA mapping.
> >>
> >> If my understanding above is correct, current IOMMU notifier may not satisfy your
> >> need. If you see the current notify interface, it's delivering IOMMUTLBEntry. I
> >> suspect it only suites for IOTLB notifications, but not if you want to pass some
> >> pointer (one GPA) downward somewhere.
> > 
> > Yeah, you got it more than absolutely. One of my patch is to modify the para to be
> > void * and let each notifiers to translate separately. I think it should be a reasonable
> > change.
> > 
> > Supposedly, I would send RFC for vSVM soon. We may talk more it at that thread.

I suspect whether it'll be good that we hang everything under current
IOMMU notifiers... But sure I'm looking forward to your RFC. :)

> > 
> >>>
> >>> I think we may have two choice here. Pls let me know your ideas.
> >>>
> >>> 1. skip the switch for "pt" mode, find other way to register vfio
> >>> notifiers. not sure if this is workable. Just a quick thought.
> >>>
> >>> 2. do the switch, and rebuild GPA->HPA mapping if device use "pt"
> >>> mode. For this option, I also have two way to build the GPA->HPA mapping.
> >>> a) walk all the memory region sections in address_space_memory, and build the
> >> mapping.
> >>> address_space_memory.dispatch->map.sections, sections stores all the mapped
> >> sections.
> >>>
> >>> b) let guest iommu driver mimics a 1:1 mapping for the devices use
> >>> "pt" mode. in this way, the GPA->HPA mapping could be setup by walking
> >>> the guest SL page table. This is consistent with your vIOVA replay solution.
> >>
> >> I'll prefer (1). Reason explained above.
> >>
> >>>
> >>> Also I'm curious if Tianyu's fault report framework needs to register new notifiers.
> >>
> >> For Tianyu's case, I feel like we need other kind of notifier as well, but not the
> >> current IOTLB one. And, of course in this case it'll be in an reverted direction, which
> >> is from device to vIOMMU.
> >>
> >> (I am thinking whether it's a good time now to let any PCI device able  to fetch its
> >> direct owner IOMMU object, then it can register anything  it want onto that object
> >> maybe?)
> >>
> > Hmmm, let's see Tianyu's comment as he's driving fault report. Let's keep in touch on
> > this Passthru-Mode enabling.
> 
> In my previous RFC patchset of fault event reporting, I registered fault
> notifier when there is a VFIO group attached to VFIO container and used
> the address space to check whether vIOMMU is added. The address space is
> returned by vtd_host_dma_iommu(). vtd_find_add_as() initializes device's
> IOMMU memory region and put it into device's address space as root
> memory region.
> Does this make sense?
> 
> @@ -1103,6 +1132,14 @@ static int vfio_connect_container(VFIOGroup
> *group, AddressSpace *as,
>          goto listener_release_exit;
>      }
> 
> +    if (memory_region_is_iommu(container->space->as->root)) {

I would suggest we don't play with as->root any more. After vtd vfio
series, this may not be true if passthrough mode is enabled (then the
root may be switched to an system memory alias). I don't know the best
way to check this, one alternative might be that we check whether
container->space->as == system_memory(), it should be workable, but in
a slightly hackish way.

Thanks,

> +        if (vfio_set_iommu_fault_notifier(container)) {
> +            error_setg_errno(errp, -ret,
> +                "Fail to set IOMMU fault notifier");
> +            goto listener_release_exit;
> +        }
> +    }
> +
>      container->initialized = true;
> 
> -- 
> Best regards
> Tianyu Lan

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT)
  2017-04-19  7:13   ` Liu, Yi L
@ 2017-04-20  3:07     ` Peter Xu
  0 siblings, 0 replies; 27+ messages in thread
From: Peter Xu @ 2017-04-20  3:07 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: qemu-devel, Lan, Tianyu, Michael S . Tsirkin, Jason Wang,
	Marcel Apfelbaum, David Gibson

On Wed, Apr 19, 2017 at 07:13:47AM +0000, Liu, Yi L wrote:
> Peter,
> 
> Besides the gPA->hPA mapping, pt mode enabling requires some sanity check on
> s->pt_supported. See comments inline.
> 
> > -----Original Message-----
> > From: Qemu-devel [mailto:qemu-devel-bounces+yi.l.liu=intel.com@nongnu.org] On
> > Behalf Of Peter Xu
> > Sent: Monday, April 17, 2017 7:32 PM
> > To: qemu-devel@nongnu.org
> > Cc: Lan, Tianyu <tianyu.lan@intel.com>; Michael S . Tsirkin <mst@redhat.com>;
> > Jason Wang <jasowang@redhat.com>; peterx@redhat.com; Marcel Apfelbaum
> > <marcel@redhat.com>; David Gibson <david@gibson.dropbear.id.au>
> > Subject: [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT)
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  hw/i386/intel_iommu.c          | 109 +++++++++++++++++++++++++++++++++--------
> >  hw/i386/intel_iommu_internal.h |   1 +
> >  hw/i386/trace-events           |   1 +
> >  hw/i386/x86-iommu.c            |   1 +
> >  include/hw/i386/x86-iommu.h    |   1 +
> >  5 files changed, 93 insertions(+), 20 deletions(-)
> > 
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index
> > 05ae631..deb2007 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -872,7 +872,7 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s,
> > uint8_t bus_num,
> >      } else {
> >          switch (vtd_ce_get_type(ce)) {
> >          case VTD_CONTEXT_TT_MULTI_LEVEL:
> > -            /* fall through */
> > +        case VTD_CONTEXT_TT_PASS_THROUGH:
> 
> Passthru type is a reserved type if ecap.PT=0, so add a check here. If s->pt_supported
> is false, report this translation type as unsupported and report an invalid context 
> programming to guest.
> 
> >          case VTD_CONTEXT_TT_DEV_IOTLB:
> >              break;
> >          default:
> > @@ -883,6 +883,73 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s,
> > uint8_t bus_num,
> >      return 0;
> >  }
> > 
> > +/* Fetch translation type for specific device. Returns <0 if error
> > + * happens, otherwise return the shifted type to check against
> > + * VTD_CONTEXT_TT_*. */
> > +static int vtd_dev_get_trans_type(VTDAddressSpace *as) {
> > +    IntelIOMMUState *s;
> > +    VTDContextEntry ce;
> > +    int ret;
> > +
> > +    s = as->iommu_state;
> > +
> > +    ret = vtd_dev_to_context_entry(s, pci_bus_num(as->bus),
> > +                                   as->devfn, &ce);
> > +    if (ret) {
> > +        return ret;
> > +    }
> > +
> > +    return vtd_ce_get_type(&ce);
> > +}
> > +
> > +static bool vtd_dev_pt_enabled(VTDAddressSpace *as) {
> > +    int ret;
> > +
> > +    assert(as);
> > +
> > +    ret = vtd_dev_get_trans_type(as);
> > +    if (ret < 0) {
> > +        /*
> > +         * Possibly failed to parse the context entry for some reason
> > +         * (e.g., during init, or any guest configuration errors on
> > +         * context entries). We should assume PT not enabled for
> > +         * safety.
> > +         */
> > +        return false;
> > +    }
> > +
> > +    return ret == VTD_CONTEXT_TT_PASS_THROUGH; }
> > +
> > +static void vtd_switch_address_space(VTDAddressSpace *as) {
> > +    bool use_iommu;
> > +
> > +    assert(as);
> > +
> > +    use_iommu = as->iommu_state->dmar_enabled;
> > +    if (use_iommu) {
> 
> if s->pt_supported is false, no need to check the context.TT, even
> context.TT=pt, then it should be an invalid context programming.

Agree with you on both comments. Thanks. :-)

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT)
  2017-04-20  3:04               ` Peter Xu
@ 2017-04-20  4:55                 ` Liu, Yi L
  2017-04-20  5:40                   ` Peter Xu
  0 siblings, 1 reply; 27+ messages in thread
From: Liu, Yi L @ 2017-04-20  4:55 UTC (permalink / raw)
  To: Peter Xu, Lan, Tianyu
  Cc: qemu-devel, Michael S . Tsirkin, Jason Wang, Marcel Apfelbaum,
	David Gibson, Tian, Kevin

> -----Original Message-----
> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Thursday, April 20, 2017 11:04 AM
> To: Lan, Tianyu <tianyu.lan@intel.com>
> Cc: Liu, Yi L <yi.l.liu@intel.com>; qemu-devel@nongnu.org; Michael S . Tsirkin
> <mst@redhat.com>; Jason Wang <jasowang@redhat.com>; Marcel Apfelbaum
> <marcel@redhat.com>; David Gibson <david@gibson.dropbear.id.au>; Tian, Kevin
> <kevin.tian@intel.com>
> Subject: Re: [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT)
> 
> On Wed, Apr 19, 2017 at 03:27:52PM +0800, Lan Tianyu wrote:
> > Hi All:
> > 	Sorry for later response.
> > On 2017年04月18日 17:04, Liu, Yi L wrote:
> > >> -----Original Message-----
> > >> From: Peter Xu [mailto:peterx@redhat.com]
> > >> Sent: Tuesday, April 18, 2017 3:27 PM
> > >> To: Liu, Yi L <yi.l.liu@intel.com>
> > >> Cc: qemu-devel@nongnu.org; Lan, Tianyu <tianyu.lan@intel.com>; Michael S .
> > >> Tsirkin <mst@redhat.com>; Jason Wang <jasowang@redhat.com>; Marcel
> > >> Apfelbaum <marcel@redhat.com>; David Gibson
> > >> <david@gibson.dropbear.id.au>
> > >> Subject: Re: [Qemu-devel] [PATCH v2 7/7] intel_iommu: support
> > >> passthrough (PT)
> > >>
> > >> On Tue, Apr 18, 2017 at 06:02:44AM +0000, Liu, Yi L wrote:
> > >>>>> Hi Peter,
> > >>>>>
> > >>>>> Skip address space switching is a good idea to support Passthru mode.
> > >>>>> However, without the address space, the vfio notifier would not
> > >>>>> be registered, thus vIOMMU emulator has no way to connect to
> > >>>>> host. It is no harm if there is only map/unmap notifier. But if
> > >>>>> we have more notifiers other than map/unmap, it may be a problem.
> > >>>>>
> > >>>>> I think we need to reconsider it here.
> > >>>>
> > >>>> For now I think as switching is good to us in general. Could I
> > >>>> know more context about this? Would it be okay to work on top of this in the
> future?
> > >>>>
> > >>>
> > >>> Let me explain. For vSVM enabling, it needs to add new notifier to
> > >>> bind gPASID table to host. If an assigned device uses SVM in
> > >>> guest, as designed guest IOMMU driver would set "pt" for it. This
> > >>> is to make sure the host second-level page table stores GPA->HPA
> > >>> mapping. So that pIOMMU can do nested translation to achieve gVA->GPA
> GPA->HPA mapping.
> > >>
> > >> That should mean that in the guest it will only enable first level
> > >> translation, while in the host it'll be nested (first + second)?
> > >> Then you should be trapping the guest extended context entry
> > >> invalidation, then pass the PASID table pointer downward to the host IOMMU,
> am I correct?
> > >
> > > exactly.
> > >
> > >>
> > >>>
> > >>> So the situation is if we want to keep GPA->HPA mapping, we should
> > >>> skip address space switch, but the vfio listener would not know
> > >>> vIOMMU is added, so no vfio notifier would be registered. But if
> > >>> we do the switch, the GPA->HPA mapping is unmapped. And need
> > >>> another way to build the
> > >> GPA->HPA mapping.
> > >>
> > >> If my understanding above is correct, current IOMMU notifier may
> > >> not satisfy your need. If you see the current notify interface,
> > >> it's delivering IOMMUTLBEntry. I suspect it only suites for IOTLB
> > >> notifications, but not if you want to pass some pointer (one GPA) downward
> somewhere.
> > >
> > > Yeah, you got it more than absolutely. One of my patch is to modify
> > > the para to be void * and let each notifiers to translate
> > > separately. I think it should be a reasonable change.
> > >
> > > Supposedly, I would send RFC for vSVM soon. We may talk more it at that thread.
> 
> I suspect whether it'll be good that we hang everything under current IOMMU
> notifiers... But sure I'm looking forward to your RFC. :)

The current IOMMU notifier is no doubt MAP/UNMAP specific. However, I think it should
be a common requirement that vIOMMU emulator needs notifiers other than MAP/UNMAP
to notify pIOMMU. So I think it's better to show the basic design for vSVM. And discuss
the best way to place the new notifiers. Would accelerate my work. And thx for your
understanding.

> > >
> > >>>
> > >>> I think we may have two choice here. Pls let me know your ideas.
> > >>>
> > >>> 1. skip the switch for "pt" mode, find other way to register vfio
> > >>> notifiers. not sure if this is workable. Just a quick thought.
> > >>>
> > >>> 2. do the switch, and rebuild GPA->HPA mapping if device use "pt"
> > >>> mode. For this option, I also have two way to build the GPA->HPA mapping.
> > >>> a) walk all the memory region sections in address_space_memory,
> > >>> and build the
> > >> mapping.
> > >>> address_space_memory.dispatch->map.sections, sections stores all
> > >>> the mapped
> > >> sections.
> > >>>
> > >>> b) let guest iommu driver mimics a 1:1 mapping for the devices use
> > >>> "pt" mode. in this way, the GPA->HPA mapping could be setup by
> > >>> walking the guest SL page table. This is consistent with your vIOVA replay
> solution.
> > >>
> > >> I'll prefer (1). Reason explained above.
> > >>
> > >>>
> > >>> Also I'm curious if Tianyu's fault report framework needs to register new
> notifiers.
> > >>
> > >> For Tianyu's case, I feel like we need other kind of notifier as
> > >> well, but not the current IOTLB one. And, of course in this case
> > >> it'll be in an reverted direction, which is from device to vIOMMU.
> > >>
> > >> (I am thinking whether it's a good time now to let any PCI device
> > >> able  to fetch its direct owner IOMMU object, then it can register
> > >> anything  it want onto that object
> > >> maybe?)
> > >>
> > > Hmmm, let's see Tianyu's comment as he's driving fault report. Let's
> > > keep in touch on this Passthru-Mode enabling.
> >
> > In my previous RFC patchset of fault event reporting, I registered
> > fault notifier when there is a VFIO group attached to VFIO container
> > and used the address space to check whether vIOMMU is added. The
> > address space is returned by vtd_host_dma_iommu(). vtd_find_add_as()
> > initializes device's IOMMU memory region and put it into device's
> > address space as root memory region.
> > Does this make sense?
> >
> > @@ -1103,6 +1132,14 @@ static int vfio_connect_container(VFIOGroup
> > *group, AddressSpace *as,
> >          goto listener_release_exit;
> >      }
> >
> > +    if (memory_region_is_iommu(container->space->as->root)) {
> 
> I would suggest we don't play with as->root any more. After vtd vfio series, this may
> not be true if passthrough mode is enabled (then the root may be switched to an
> system memory alias). I don't know the best way to check this, one alternative might
> be that we check whether
> container->space->as == system_memory(), it should be workable, but in
> a slightly hackish way.

In my understanding, container->space->as->root cannot work here no matter passthru-mode
is enabled or not. The code here is aiming to check if vIOMMU exists. After the vfio series,
the vtd_dev_as->root is not initialized to be a iommu MemoryRegion. Compared with checking
if it is system_memory(), I think adding a mechanism to get the iommu MemoryRegion may
be a better choice. Just like the current pci_device_iommu_address_space().

> 
> > +        if (vfio_set_iommu_fault_notifier(container)) {
> > +            error_setg_errno(errp, -ret,
> > +                "Fail to set IOMMU fault notifier");
> > +            goto listener_release_exit;
> > +        }
> > +    }
> > +
> >      container->initialized = true;
> >
> > --
> > Best regards
> > Tianyu Lan
> 
> --
> Peter Xu

Thanks,
Yi L

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT)
  2017-04-20  4:55                 ` Liu, Yi L
@ 2017-04-20  5:40                   ` Peter Xu
  2017-04-20  6:36                     ` Liu, Yi L
  2017-04-20  6:51                     ` Lan, Tianyu
  0 siblings, 2 replies; 27+ messages in thread
From: Peter Xu @ 2017-04-20  5:40 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Lan, Tianyu, qemu-devel, Michael S . Tsirkin, Jason Wang,
	Marcel Apfelbaum, David Gibson, Tian, Kevin

On Thu, Apr 20, 2017 at 04:55:24AM +0000, Liu, Yi L wrote:

[...]

> > > In my previous RFC patchset of fault event reporting, I registered
> > > fault notifier when there is a VFIO group attached to VFIO container
> > > and used the address space to check whether vIOMMU is added. The
> > > address space is returned by vtd_host_dma_iommu(). vtd_find_add_as()
> > > initializes device's IOMMU memory region and put it into device's
> > > address space as root memory region.
> > > Does this make sense?
> > >
> > > @@ -1103,6 +1132,14 @@ static int vfio_connect_container(VFIOGroup
> > > *group, AddressSpace *as,
> > >          goto listener_release_exit;
> > >      }
> > >
> > > +    if (memory_region_is_iommu(container->space->as->root)) {
> > 
> > I would suggest we don't play with as->root any more. After vtd vfio series, this may
> > not be true if passthrough mode is enabled (then the root may be switched to an
> > system memory alias). I don't know the best way to check this, one alternative might
> > be that we check whether
> > container->space->as == system_memory(), it should be workable, but in

Sorry, I was meaning &address_space_memory.

> > a slightly hackish way.
> 
> In my understanding, container->space->as->root cannot work here no matter passthru-mode
> is enabled or not. The code here is aiming to check if vIOMMU exists. After the vfio series,
> the vtd_dev_as->root is not initialized to be a iommu MemoryRegion. Compared with checking
> if it is system_memory(), I think adding a mechanism to get the iommu MemoryRegion may
> be a better choice. Just like the current pci_device_iommu_address_space().

Isn't pci_device_iommu_address_space() used to get that IOMMU memory
region? And, one thing to mention is that container->space->as is
actually derived from pci_device_iommu_address_space() (when calling
vfio_get_group()).

I feel like that playing around with an IOMMU memory region is still
not clear enough in many cases. I still feel like some day we would
like an "IOMMU object". Then, we can register non-iotlb notifiers
against that IOMMU object, rather than memory regions...

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT)
  2017-04-20  5:40                   ` Peter Xu
@ 2017-04-20  6:36                     ` Liu, Yi L
  2017-04-20  7:04                       ` Peter Xu
  2017-04-20  6:51                     ` Lan, Tianyu
  1 sibling, 1 reply; 27+ messages in thread
From: Liu, Yi L @ 2017-04-20  6:36 UTC (permalink / raw)
  To: Peter Xu
  Cc: Lan, Tianyu, qemu-devel, Michael S . Tsirkin, Jason Wang,
	Marcel Apfelbaum, David Gibson, Tian, Kevin



> -----Original Message-----
> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Thursday, April 20, 2017 1:40 PM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Cc: Lan, Tianyu <tianyu.lan@intel.com>; qemu-devel@nongnu.org; Michael S .
> Tsirkin <mst@redhat.com>; Jason Wang <jasowang@redhat.com>; Marcel
> Apfelbaum <marcel@redhat.com>; David Gibson <david@gibson.dropbear.id.au>;
> Tian, Kevin <kevin.tian@intel.com>
> Subject: Re: [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT)
> 
> On Thu, Apr 20, 2017 at 04:55:24AM +0000, Liu, Yi L wrote:
> 
> [...]
> 
> > > > In my previous RFC patchset of fault event reporting, I registered
> > > > fault notifier when there is a VFIO group attached to VFIO
> > > > container and used the address space to check whether vIOMMU is
> > > > added. The address space is returned by vtd_host_dma_iommu().
> > > > vtd_find_add_as() initializes device's IOMMU memory region and put
> > > > it into device's address space as root memory region.
> > > > Does this make sense?
> > > >
> > > > @@ -1103,6 +1132,14 @@ static int vfio_connect_container(VFIOGroup
> > > > *group, AddressSpace *as,
> > > >          goto listener_release_exit;
> > > >      }
> > > >
> > > > +    if (memory_region_is_iommu(container->space->as->root)) {
> > >
> > > I would suggest we don't play with as->root any more. After vtd vfio
> > > series, this may not be true if passthrough mode is enabled (then
> > > the root may be switched to an system memory alias). I don't know
> > > the best way to check this, one alternative might be that we check
> > > whether
> > > container->space->as == system_memory(), it should be workable, but
> > > container->space->in
> 
> Sorry, I was meaning &address_space_memory.

correct, I got your point all the same.

> 
> > > a slightly hackish way.
> >
> > In my understanding, container->space->as->root cannot work here no
> > matter passthru-mode is enabled or not. The code here is aiming to
> > check if vIOMMU exists. After the vfio series, the vtd_dev_as->root is
> > not initialized to be a iommu MemoryRegion. Compared with checking if
> > it is system_memory(), I think adding a mechanism to get the iommu
> MemoryRegion may be a better choice. Just like the current
> pci_device_iommu_address_space().
> 
> Isn't pci_device_iommu_address_space() used to get that IOMMU memory region?

It actually returns the AddressSpace, and the AddressSpace includes a memory region.
It is as->root. But after adding the vfio series, through the IOMMU memory region
is got, but it has no iommu_ops. Just as the following code shows. That's why I said
even without passthru-mode, Tianyu's that code snippet is not able to get the correct
check.

        memory_region_init(&vtd_dev_as->root, OBJECT(s),
                           "vtd_root", UINT64_MAX);
        address_space_init(&vtd_dev_as->as, &vtd_dev_as->root, name);

> And, one thing to mention is that container->space->as is actually derived from
> pci_device_iommu_address_space() (when calling vfio_get_group()).

yes, it is.

> I feel like that playing around with an IOMMU memory region is still not clear
> enough in many cases. I still feel like some day we would like an "IOMMU object".
> Then, we can register non-iotlb notifiers against that IOMMU object, rather than
> memory regions...
> 
haven’t think much about it. if you have any early patch, may be glad to have a look.

Thanks,
Yi L

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT)
  2017-04-20  5:40                   ` Peter Xu
  2017-04-20  6:36                     ` Liu, Yi L
@ 2017-04-20  6:51                     ` Lan, Tianyu
  2017-04-20  7:00                       ` Peter Xu
  1 sibling, 1 reply; 27+ messages in thread
From: Lan, Tianyu @ 2017-04-20  6:51 UTC (permalink / raw)
  To: Peter Xu, Liu, Yi L
  Cc: qemu-devel, Michael S . Tsirkin, Jason Wang, Marcel Apfelbaum,
	David Gibson, Tian, Kevin

On 4/20/2017 1:40 PM, Peter Xu wrote:
> On Thu, Apr 20, 2017 at 04:55:24AM +0000, Liu, Yi L wrote:
>
> [...]
>
>>>> In my previous RFC patchset of fault event reporting, I registered
>>>> fault notifier when there is a VFIO group attached to VFIO container
>>>> and used the address space to check whether vIOMMU is added. The
>>>> address space is returned by vtd_host_dma_iommu(). vtd_find_add_as()
>>>> initializes device's IOMMU memory region and put it into device's
>>>> address space as root memory region.
>>>> Does this make sense?
>>>>
>>>> @@ -1103,6 +1132,14 @@ static int vfio_connect_container(VFIOGroup
>>>> *group, AddressSpace *as,
>>>>          goto listener_release_exit;
>>>>      }
>>>>
>>>> +    if (memory_region_is_iommu(container->space->as->root)) {
>>>
>>> I would suggest we don't play with as->root any more. After vtd vfio series, this may
>>> not be true if passthrough mode is enabled (then the root may be switched to an
>>> system memory alias). I don't know the best way to check this, one alternative might
>>> be that we check whether
>>> container->space->as == system_memory(), it should be workable, but in
>
> Sorry, I was meaning &address_space_memory.
>
>>> a slightly hackish way.
>>
>> In my understanding, container->space->as->root cannot work here no matter passthru-mode
>> is enabled or not. The code here is aiming to check if vIOMMU exists. After the vfio series,
>> the vtd_dev_as->root is not initialized to be a iommu MemoryRegion. Compared with checking
>> if it is system_memory(), I think adding a mechanism to get the iommu MemoryRegion may
>> be a better choice. Just like the current pci_device_iommu_address_space().
>
> Isn't pci_device_iommu_address_space() used to get that IOMMU memory
> region? And, one thing to mention is that container->space->as is
> actually derived from pci_device_iommu_address_space() (when calling
> vfio_get_group()).
>
> I feel like that playing around with an IOMMU memory region is still
> not clear enough in many cases. I still feel like some day we would
> like an "IOMMU object". Then, we can register non-iotlb notifiers
> against that IOMMU object, rather than memory regions...

Our target is to check whether assigned device is under a vIOMMU.
We may check whether iommu_fn point has been populated in its pci bus' 
data structure(PCIDevice). This is what pci_device_iommu_address_space()
does.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT)
  2017-04-20  6:51                     ` Lan, Tianyu
@ 2017-04-20  7:00                       ` Peter Xu
  0 siblings, 0 replies; 27+ messages in thread
From: Peter Xu @ 2017-04-20  7:00 UTC (permalink / raw)
  To: Lan, Tianyu
  Cc: Liu, Yi L, qemu-devel, Michael S . Tsirkin, Jason Wang,
	Marcel Apfelbaum, David Gibson, Tian, Kevin

On Thu, Apr 20, 2017 at 02:51:11PM +0800, Lan, Tianyu wrote:
> On 4/20/2017 1:40 PM, Peter Xu wrote:

[...]

> >>>a slightly hackish way.
> >>
> >>In my understanding, container->space->as->root cannot work here no matter passthru-mode
> >>is enabled or not. The code here is aiming to check if vIOMMU exists. After the vfio series,
> >>the vtd_dev_as->root is not initialized to be a iommu MemoryRegion. Compared with checking
> >>if it is system_memory(), I think adding a mechanism to get the iommu MemoryRegion may
> >>be a better choice. Just like the current pci_device_iommu_address_space().
> >
> >Isn't pci_device_iommu_address_space() used to get that IOMMU memory
> >region? And, one thing to mention is that container->space->as is
> >actually derived from pci_device_iommu_address_space() (when calling
> >vfio_get_group()).
> >
> >I feel like that playing around with an IOMMU memory region is still
> >not clear enough in many cases. I still feel like some day we would
> >like an "IOMMU object". Then, we can register non-iotlb notifiers
> >against that IOMMU object, rather than memory regions...
> 
> Our target is to check whether assigned device is under a vIOMMU.
> We may check whether iommu_fn point has been populated in its pci bus' data
> structure(PCIDevice). This is what pci_device_iommu_address_space()
> does.

Agreed. Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT)
  2017-04-20  6:36                     ` Liu, Yi L
@ 2017-04-20  7:04                       ` Peter Xu
  2017-04-20  7:10                         ` Lan Tianyu
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Xu @ 2017-04-20  7:04 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Lan, Tianyu, qemu-devel, Michael S . Tsirkin, Jason Wang,
	Marcel Apfelbaum, David Gibson, Tian, Kevin

On Thu, Apr 20, 2017 at 06:36:16AM +0000, Liu, Yi L wrote:

[...]

> > > In my understanding, container->space->as->root cannot work here no
> > > matter passthru-mode is enabled or not. The code here is aiming to
> > > check if vIOMMU exists. After the vfio series, the vtd_dev_as->root is
> > > not initialized to be a iommu MemoryRegion. Compared with checking if
> > > it is system_memory(), I think adding a mechanism to get the iommu
> > MemoryRegion may be a better choice. Just like the current
> > pci_device_iommu_address_space().
> > 
> > Isn't pci_device_iommu_address_space() used to get that IOMMU memory region?

Again I should say s/memory region/address space/...

> 
> It actually returns the AddressSpace, and the AddressSpace includes a memory region.
> It is as->root. But after adding the vfio series, through the IOMMU memory region
> is got, but it has no iommu_ops. Just as the following code shows. That's why I said
> even without passthru-mode, Tianyu's that code snippet is not able to get the correct
> check.
> 
>         memory_region_init(&vtd_dev_as->root, OBJECT(s),
>                            "vtd_root", UINT64_MAX);
>         address_space_init(&vtd_dev_as->as, &vtd_dev_as->root, name);

The problem is, I am not sure whether there is always _only_ one IOMMU
region behind one device. E.g., IIUC ppc can have more than one IOMMU
memory regions, but translate for different address ranges.

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT)
  2017-04-20  7:04                       ` Peter Xu
@ 2017-04-20  7:10                         ` Lan Tianyu
  0 siblings, 0 replies; 27+ messages in thread
From: Lan Tianyu @ 2017-04-20  7:10 UTC (permalink / raw)
  To: Peter Xu, Liu, Yi L
  Cc: qemu-devel, Michael S . Tsirkin, Jason Wang, Marcel Apfelbaum,
	David Gibson, Tian, Kevin

On 2017年04月20日 15:04, Peter Xu wrote:
> On Thu, Apr 20, 2017 at 06:36:16AM +0000, Liu, Yi L wrote:
> 
> [...]
> 
>>>> In my understanding, container->space->as->root cannot work here no
>>>> matter passthru-mode is enabled or not. The code here is aiming to
>>>> check if vIOMMU exists. After the vfio series, the vtd_dev_as->root is
>>>> not initialized to be a iommu MemoryRegion. Compared with checking if
>>>> it is system_memory(), I think adding a mechanism to get the iommu
>>> MemoryRegion may be a better choice. Just like the current
>>> pci_device_iommu_address_space().
>>>
>>> Isn't pci_device_iommu_address_space() used to get that IOMMU memory region?
> 
> Again I should say s/memory region/address space/...
> 
>>
>> It actually returns the AddressSpace, and the AddressSpace includes a memory region.
>> It is as->root. But after adding the vfio series, through the IOMMU memory region
>> is got, but it has no iommu_ops. Just as the following code shows. That's why I said
>> even without passthru-mode, Tianyu's that code snippet is not able to get the correct
>> check.
>>
>>         memory_region_init(&vtd_dev_as->root, OBJECT(s),
>>                            "vtd_root", UINT64_MAX);
>>         address_space_init(&vtd_dev_as->as, &vtd_dev_as->root, name);
> 
> The problem is, I am not sure whether there is always _only_ one IOMMU
> region behind one device. E.g., IIUC ppc can have more than one IOMMU
> memory regions, but translate for different address ranges.
> 

If we really want to check this via memory region, we may check all
subregions under container's address space. Register vIOMMU related
notifiers if one of them has iommu_ops,


-- 
Best regards
Tianyu Lan

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2017-04-20  7:19 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-17 11:32 [Qemu-devel] [PATCH v2 0/7] VT-d: PT (passthrough) mode support and misc fixes Peter Xu
2017-04-17 11:32 ` [Qemu-devel] [PATCH v2 1/7] memory: tune last param of iommu_ops.translate() Peter Xu
2017-04-18  4:07   ` David Gibson
2017-04-17 11:32 ` [Qemu-devel] [PATCH v2 2/7] memory: remove the last param in memory_region_iommu_replay() Peter Xu
2017-04-18  4:08   ` David Gibson
2017-04-17 11:32 ` [Qemu-devel] [PATCH v2 3/7] x86-iommu: use DeviceClass properties Peter Xu
2017-04-17 11:32 ` [Qemu-devel] [PATCH v2 4/7] intel_iommu: renaming context entry helpers Peter Xu
2017-04-17 11:32 ` [Qemu-devel] [PATCH v2 5/7] intel_iommu: provide vtd_ce_get_type() Peter Xu
2017-04-17 11:32 ` [Qemu-devel] [PATCH v2 6/7] intel_iommu: use IOMMU_ACCESS_FLAG() Peter Xu
2017-04-17 11:32 ` [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT) Peter Xu
2017-04-18  4:30   ` Liu, Yi L
2017-04-18  4:54     ` Peter Xu
2017-04-18  6:02       ` Liu, Yi L
2017-04-18  7:27         ` Peter Xu
2017-04-18  9:04           ` Liu, Yi L
2017-04-19  7:27             ` Lan Tianyu
2017-04-20  3:04               ` Peter Xu
2017-04-20  4:55                 ` Liu, Yi L
2017-04-20  5:40                   ` Peter Xu
2017-04-20  6:36                     ` Liu, Yi L
2017-04-20  7:04                       ` Peter Xu
2017-04-20  7:10                         ` Lan Tianyu
2017-04-20  6:51                     ` Lan, Tianyu
2017-04-20  7:00                       ` Peter Xu
2017-04-19  7:13   ` Liu, Yi L
2017-04-20  3:07     ` Peter Xu
2017-04-18 10:29 ` [Qemu-devel] [PATCH v2 0/7] VT-d: PT (passthrough) mode support and misc fixes Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.