* [Qemu-devel] [PATCH v3 00/12] Introduce new iommu notifier framework for virt-SVA
@ 2018-03-01 10:33 Liu, Yi L
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 01/12] memory: rename existing iommu notifier to be iommu mr notifier Liu, Yi L
                   ` (12 more replies)
  0 siblings, 13 replies; 65+ messages in thread
From: Liu, Yi L @ 2018-03-01 10:33 UTC (permalink / raw)
  To: qemu-devel, mst, david
  Cc: pbonzini, alex.williamson, eric.auger.pro, yi.l.liu, peterx,
	kevin.tian, jasowang

This patchset introduces a notifier framework for virt-SVA.
The virt-SVA design details can be found at the link below.

https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg04925.html

SVA is short for Shared Virtual Addressing, which was called Shared
Virtual Memory in previous patchsets. However, SVM is confusing as it
can also be short for Secure Virtual Machine, so this patchset uses
Shared Virtual Addressing instead of Shared Virtual Memory. The same
naming will be applied in future SVA-related patch series as well.

Qemu has an existing notifier framework based on MemoryRegion, which
is used for MAP/UNMAP notifications. However, it is not well suited
for virt-SVA, for the reasons below:
- virt-SVA works along with PT = 1
- with PT = 1, the IOMMU MRs are disabled, so MR notifiers are not
  registered
- the new notifiers do not fit nicely in this framework as they need
  to be registered even if PT = 1
- hence a new framework is needed to attach the new notifiers
- additional background can be found at:
  https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg04931.html

So a new iommu notifier framework is needed. This patchset introduces
a notifier framework based on IOMMUSVAContext, which is introduced as
an abstraction of the virt-SVA operations in Qemu.
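
For reference, below is a minimal sketch of the new notifier API added
in patches 03 - 04. The callback my_sva_tlb_inv_notify() and the wrapper
my_setup_sva_notifier() are hypothetical names for illustration only;
the types and functions come from hw/core/pasid.h introduced in this
series.

#include "qemu/osdep.h"
#include "hw/core/pasid.h"

/* Hypothetical callback: forward a guest TLB invalidation to the host */
static void my_sva_tlb_inv_notify(IOMMUSVANotifier *n,
                                  IOMMUSVAEventData *event_data)
{
    /* e.g. hand event_data->data off to the host via a VFIO ioctl */
}

/* Register the callback on a PASID-tagged context; note that this does
 * not depend on any IOMMU MemoryRegion, so it works with PT = 1. */
static void my_setup_sva_notifier(IOMMUSVAContext *sva_ctx,
                                  IOMMUSVANotifier *n)
{
    iommu_sva_ctx_init(sva_ctx);
    iommu_sva_notifier_register(sva_ctx, n, my_sva_tlb_inv_notify,
                                IOMMU_SVA_EVENT_TLB_INV);
}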

Patch Overview:
* 1 - 2: rename the existing IOMMU MemoryRegion notifier framework
* 3 - 4: introduce the SVA notifier framework based on IOMMUSVAContext
* 5 - 7: introduce PCISVAOps and expose the SVA notifier framework
         through the hw/pci layer
* 8 - 12: show the usage of the SVA notifiers in the Intel vIOMMU
          emulator

[v2->v3 changes]
* Rephrase the cover letter
* Follow David's suggestion and take emulated SVA capable devices
  into consideration
* Rename IOMMUObject to IOMMUSVAContext
* Expose SVA notifier registration through the hw/pci layer
* Rename hw/core/iommu.c to hw/core/pasid.c and
  include/hw/core/iommu.h to include/hw/core/pasid.h
* Use SVA instead of SVM throughout the patchset
* Rename the patchset title (previously "Introduce new iommu notifier framework")

v2 link:
https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg04553.html

[v1->v2 changes]
* Rephrase the cover letter
* Re-order the patches
* Split the patch that introduces IOMMUObject and AddressSpaceOps
* Address two missed list initializations spotted by Eric Auger

v1 link:
http://qemu-devel.nongnu.narkive.com/XhqBQ8wc/resend-patch-0-6-introduce-new-iommu-notifier-framework

The original patchset from Peter Xu can be found at the link below.

https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg05360.html

Comments and suggestions are always welcome, thanks.

Liu, Yi L (10):
  vfio: rename GuestIOMMU to be GuestIOMMUMR
  vfio/pci: add notify framework based on IOMMUSVAContext
  hw/pci: introduce PCISVAOps to PCIDevice
  vfio/pci: provide vfio_pci_sva_ops instance
  vfio/pci: register sva notifier
  hw/pci: introduce pci_device_notify_iommu()
  intel_iommu: record assigned devices in a list
  intel_iommu: bind guest pasid table to host
  intel_iommu: add framework for PASID AddressSpace management
  intel_iommu: bind device to PASID tagged AddressSpace

Peter Xu (2):
  memory: rename existing iommu notifier to be iommu mr notifier
  hw/core: introduce IOMMUSVAContext for virt-SVA

 hw/alpha/typhoon.c             |   2 +-
 hw/core/Makefile.objs          |   1 +
 hw/core/pasid.c                |  64 ++++++++++
 hw/hppa/dino.c                 |   2 +-
 hw/i386/amd_iommu.c            |   8 +-
 hw/i386/intel_iommu.c          | 272 +++++++++++++++++++++++++++++++++++++----
 hw/i386/intel_iommu_internal.h |  10 ++
 hw/pci-host/ppce500.c          |   2 +-
 hw/pci-host/prep.c             |   2 +-
 hw/pci-host/sabre.c            |   2 +-
 hw/pci/pci.c                   |  85 ++++++++++++-
 hw/ppc/spapr_iommu.c           |   8 +-
 hw/ppc/spapr_pci.c             |   2 +-
 hw/s390x/s390-pci-bus.c        |   6 +-
 hw/vfio/common.c               |  28 +++--
 hw/vfio/pci.c                  |  84 +++++++++++++
 hw/virtio/vhost.c              |  10 +-
 include/exec/memory.h          |  55 +++++----
 include/hw/core/pasid.h        | 110 +++++++++++++++++
 include/hw/i386/intel_iommu.h  |  43 ++++++-
 include/hw/pci/pci.h           |  33 ++++-
 include/hw/pci/pci_bus.h       |   1 +
 include/hw/vfio/vfio-common.h  |  19 ++-
 include/hw/virtio/vhost.h      |   4 +-
 memory.c                       |  37 +++---
 25 files changed, 774 insertions(+), 116 deletions(-)
 create mode 100644 hw/core/pasid.c
 create mode 100644 include/hw/core/pasid.h

-- 
1.9.1


* [Qemu-devel] [PATCH v3 01/12] memory: rename existing iommu notifier to be iommu mr notifier
  2018-03-01 10:33 [Qemu-devel] [PATCH v3 00/12] Introduce new iommu notifier framework for virt-SVA Liu, Yi L
@ 2018-03-01 10:33 ` Liu, Yi L
  2018-03-02 15:01   ` Paolo Bonzini
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 02/12] vfio: rename GuestIOMMU to be GuestIOMMUMR Liu, Yi L
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 65+ messages in thread
From: Liu, Yi L @ 2018-03-01 10:33 UTC (permalink / raw)
  To: qemu-devel, mst, david
  Cc: pbonzini, alex.williamson, eric.auger.pro, yi.l.liu, peterx,
	kevin.tian, jasowang, Liu, Yi L

From: Peter Xu <peterx@redhat.com>

The existing IOMMU notifiers are mostly used for [dev-]IOTLB purposes
and are not suitable for other kinds of notifiers (one example would
be the future virt-SVA support). Since the current notifiers target a
single memory region, rename the iommu notifier definitions
accordingly.

This patch makes the following changes:
* rename all the notifier types from the IOMMU_NOTIFIER_* prefix to
  IOMMU_MR_EVENT_* to better show their usage (for memory regions)
* rename IOMMUNotifier to IOMMUMRNotifier
* rename iommu_notifier to iommu_mr_notifier

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 hw/i386/amd_iommu.c           |  6 ++---
 hw/i386/intel_iommu.c         | 34 +++++++++++++-------------
 hw/ppc/spapr_iommu.c          |  8 +++----
 hw/s390x/s390-pci-bus.c       |  2 +-
 hw/vfio/common.c              | 10 ++++----
 hw/virtio/vhost.c             | 10 ++++----
 include/exec/memory.h         | 55 ++++++++++++++++++++++---------------------
 include/hw/i386/intel_iommu.h |  8 +++----
 include/hw/vfio/vfio-common.h |  2 +-
 include/hw/virtio/vhost.h     |  4 ++--
 memory.c                      | 37 +++++++++++++++--------------
 11 files changed, 89 insertions(+), 87 deletions(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 63d46ff..7bfde37 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -1075,12 +1075,12 @@ static const MemoryRegionOps mmio_mem_ops = {
 };
 
 static void amdvi_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
-                                            IOMMUNotifierFlag old,
-                                            IOMMUNotifierFlag new)
+                                            IOMMUMREventFlag old,
+                                            IOMMUMREventFlag new)
 {
     AMDVIAddressSpace *as = container_of(iommu, AMDVIAddressSpace, iommu);
 
-    if (new & IOMMU_NOTIFIER_MAP) {
+    if (new & IOMMU_MR_EVENT_MAP) {
         error_report("device %02x.%02x.%x requires iommu notifier which is not "
                      "currently supported", as->bus_num, PCI_SLOT(as->devfn),
                      PCI_FUNC(as->devfn));
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 2e841cd..9edf392 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1230,7 +1230,7 @@ static void vtd_interrupt_remap_table_setup(IntelIOMMUState *s)
 
 static void vtd_iommu_replay_all(IntelIOMMUState *s)
 {
-    IntelIOMMUNotifierNode *node;
+    IntelIOMMUMRNotifierNode *node;
 
     QLIST_FOREACH(node, &s->notifiers_list, next) {
         memory_region_iommu_replay_all(&node->vtd_as->iommu);
@@ -1304,7 +1304,7 @@ static void vtd_context_device_invalidate(IntelIOMMUState *s,
                 /*
                  * So a device is moving out of (or moving into) a
                  * domain, a replay() suites here to notify all the
-                 * IOMMU_NOTIFIER_MAP registers about this change.
+                 * IOMMU_MR_EVENT_MAP registers about this change.
                  * This won't bring bad even if we have no such
                  * notifier registered - the IOMMU notification
                  * framework will skip MAP notifications if that
@@ -1354,7 +1354,7 @@ static void vtd_iotlb_global_invalidate(IntelIOMMUState *s)
 
 static void vtd_iotlb_domain_invalidate(IntelIOMMUState *s, uint16_t domain_id)
 {
-    IntelIOMMUNotifierNode *node;
+    IntelIOMMUMRNotifierNode *node;
     VTDContextEntry ce;
     VTDAddressSpace *vtd_as;
 
@@ -1384,7 +1384,7 @@ static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
                                            uint16_t domain_id, hwaddr addr,
                                            uint8_t am)
 {
-    IntelIOMMUNotifierNode *node;
+    IntelIOMMUMRNotifierNode *node;
     VTDContextEntry ce;
     int ret;
 
@@ -2314,21 +2314,21 @@ static IOMMUTLBEntry vtd_iommu_translate(IOMMUMemoryRegion *iommu, hwaddr addr,
 }
 
 static void vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
-                                          IOMMUNotifierFlag old,
-                                          IOMMUNotifierFlag new)
+                                          IOMMUMREventFlag old,
+                                          IOMMUMREventFlag new)
 {
     VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
     IntelIOMMUState *s = vtd_as->iommu_state;
-    IntelIOMMUNotifierNode *node = NULL;
-    IntelIOMMUNotifierNode *next_node = NULL;
+    IntelIOMMUMRNotifierNode *node = NULL;
+    IntelIOMMUMRNotifierNode *next_node = NULL;
 
-    if (!s->caching_mode && new & IOMMU_NOTIFIER_MAP) {
+    if (!s->caching_mode && new & IOMMU_MR_EVENT_MAP) {
         error_report("We need to set caching-mode=1 for intel-iommu to enable "
                      "device assignment with IOMMU protection.");
         exit(1);
     }
 
-    if (old == IOMMU_NOTIFIER_NONE) {
+    if (old == IOMMU_MR_EVENT_NONE) {
         node = g_malloc0(sizeof(*node));
         node->vtd_as = vtd_as;
         QLIST_INSERT_HEAD(&s->notifiers_list, node, next);
@@ -2338,7 +2338,7 @@ static void vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
     /* update notifier node with new flags */
     QLIST_FOREACH_SAFE(node, &s->notifiers_list, next, next_node) {
         if (node->vtd_as == vtd_as) {
-            if (new == IOMMU_NOTIFIER_NONE) {
+            if (new == IOMMU_MR_EVENT_NONE) {
                 QLIST_REMOVE(node, next);
                 g_free(node);
             }
@@ -2757,7 +2757,7 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
 }
 
 /* Unmap the whole range in the notifier's scope. */
-static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
+static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUMRNotifier *n)
 {
     IOMMUTLBEntry entry;
     hwaddr size;
@@ -2813,13 +2813,13 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
 
 static void vtd_address_space_unmap_all(IntelIOMMUState *s)
 {
-    IntelIOMMUNotifierNode *node;
+    IntelIOMMUMRNotifierNode *node;
     VTDAddressSpace *vtd_as;
-    IOMMUNotifier *n;
+    IOMMUMRNotifier *n;
 
     QLIST_FOREACH(node, &s->notifiers_list, next) {
         vtd_as = node->vtd_as;
-        IOMMU_NOTIFIER_FOREACH(n, &vtd_as->iommu) {
+        IOMMU_MR_NOTIFIER_FOREACH(n, &vtd_as->iommu) {
             vtd_address_space_unmap(vtd_as, n);
         }
     }
@@ -2827,11 +2827,11 @@ static void vtd_address_space_unmap_all(IntelIOMMUState *s)
 
 static int vtd_replay_hook(IOMMUTLBEntry *entry, void *private)
 {
-    memory_region_notify_one((IOMMUNotifier *)private, entry);
+    memory_region_notify_one((IOMMUMRNotifier *)private, entry);
     return 0;
 }
 
-static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
+static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUMRNotifier *n)
 {
     VTDAddressSpace *vtd_as = container_of(iommu_mr, VTDAddressSpace, iommu);
     IntelIOMMUState *s = vtd_as->iommu_state;
diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index aaa6010..74cddc3 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -174,14 +174,14 @@ static int spapr_tce_get_attr(IOMMUMemoryRegion *iommu,
 }
 
 static void spapr_tce_notify_flag_changed(IOMMUMemoryRegion *iommu,
-                                          IOMMUNotifierFlag old,
-                                          IOMMUNotifierFlag new)
+                                          IOMMUMREventFlag old,
+                                          IOMMUMREventFlag new)
 {
     struct sPAPRTCETable *tbl = container_of(iommu, sPAPRTCETable, iommu);
 
-    if (old == IOMMU_NOTIFIER_NONE && new != IOMMU_NOTIFIER_NONE) {
+    if (old == IOMMU_MR_EVENT_NONE && new != IOMMU_MR_EVENT_NONE) {
         spapr_tce_set_need_vfio(tbl, true);
-    } else if (old != IOMMU_NOTIFIER_NONE && new == IOMMU_NOTIFIER_NONE) {
+    } else if (old != IOMMU_MR_EVENT_NONE && new == IOMMU_MR_EVENT_NONE) {
         spapr_tce_set_need_vfio(tbl, false);
     }
 }
diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 77a50ca..1bad7ab 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -541,7 +541,7 @@ err:
 }
 
 static void s390_pci_iommu_replay(IOMMUMemoryRegion *iommu,
-                                  IOMMUNotifier *notifier)
+                                  IOMMUMRNotifier *notifier)
 {
     /* It's impossible to plug a pci device on s390x that already has iommu
      * mappings which need to be replayed, that is due to the "one iommu per
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index f895e3c..cbda506 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -346,7 +346,7 @@ static bool vfio_get_vaddr(IOMMUTLBEntry *iotlb, void **vaddr,
     return true;
 }
 
-static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
+static void vfio_iommu_map_notify(IOMMUMRNotifier *n, IOMMUTLBEntry *iotlb)
 {
     VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
     VFIOContainer *container = giommu->container;
@@ -522,10 +522,10 @@ static void vfio_listener_region_add(MemoryListener *listener,
         llend = int128_add(int128_make64(section->offset_within_region),
                            section->size);
         llend = int128_sub(llend, int128_one());
-        iommu_notifier_init(&giommu->n, vfio_iommu_map_notify,
-                            IOMMU_NOTIFIER_ALL,
-                            section->offset_within_region,
-                            int128_get64(llend));
+        iommu_mr_notifier_init(&giommu->n, vfio_iommu_map_notify,
+                               IOMMU_MR_EVENT_ALL,
+                               section->offset_within_region,
+                               int128_get64(llend));
         QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
 
         memory_region_register_iommu_notifier(section->mr, &giommu->n);
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 4a44e6e..49f0a26 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -579,7 +579,7 @@ static void vhost_region_addnop(MemoryListener *listener,
     vhost_region_add_section(dev, section);
 }
 
-static void vhost_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
+static void vhost_iommu_unmap_notify(IOMMUMRNotifier *n, IOMMUTLBEntry *iotlb)
 {
     struct vhost_iommu *iommu = container_of(n, struct vhost_iommu, n);
     struct vhost_dev *hdev = iommu->hdev;
@@ -607,10 +607,10 @@ static void vhost_iommu_region_add(MemoryListener *listener,
     end = int128_add(int128_make64(section->offset_within_region),
                      section->size);
     end = int128_sub(end, int128_one());
-    iommu_notifier_init(&iommu->n, vhost_iommu_unmap_notify,
-                        IOMMU_NOTIFIER_UNMAP,
-                        section->offset_within_region,
-                        int128_get64(end));
+    iommu_mr_notifier_init(&iommu->n, vhost_iommu_unmap_notify,
+                           IOMMU_MR_EVENT_UNMAP,
+                           section->offset_within_region,
+                           int128_get64(end));
     iommu->mr = section->mr;
     iommu->iommu_offset = section->offset_within_address_space -
                           section->offset_within_region;
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 15e8111..520d409 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -75,36 +75,36 @@ struct IOMMUTLBEntry {
 };
 
 /*
- * Bitmap for different IOMMUNotifier capabilities. Each notifier can
+ * Bitmap for different IOMMUMRNotifier capabilities. Each notifier can
  * register with one or multiple IOMMU Notifier capability bit(s).
  */
 typedef enum {
-    IOMMU_NOTIFIER_NONE = 0,
+    IOMMU_MR_EVENT_NONE = 0,
     /* Notify cache invalidations */
-    IOMMU_NOTIFIER_UNMAP = 0x1,
+    IOMMU_MR_EVENT_UNMAP = 0x1,
     /* Notify entry changes (newly created entries) */
-    IOMMU_NOTIFIER_MAP = 0x2,
-} IOMMUNotifierFlag;
+    IOMMU_MR_EVENT_MAP = 0x2,
+} IOMMUMREventFlag;
 
-#define IOMMU_NOTIFIER_ALL (IOMMU_NOTIFIER_MAP | IOMMU_NOTIFIER_UNMAP)
+#define IOMMU_MR_EVENT_ALL (IOMMU_MR_EVENT_MAP | IOMMU_MR_EVENT_UNMAP)
 
-struct IOMMUNotifier;
-typedef void (*IOMMUNotify)(struct IOMMUNotifier *notifier,
+struct IOMMUMRNotifier;
+typedef void (*IOMMUMRNotify)(struct IOMMUMRNotifier *notifier,
                             IOMMUTLBEntry *data);
 
-struct IOMMUNotifier {
-    IOMMUNotify notify;
-    IOMMUNotifierFlag notifier_flags;
+struct IOMMUMRNotifier {
+    IOMMUMRNotify notify;
+    IOMMUMREventFlag notifier_flags;
     /* Notify for address space range start <= addr <= end */
     hwaddr start;
     hwaddr end;
-    QLIST_ENTRY(IOMMUNotifier) node;
+    QLIST_ENTRY(IOMMUMRNotifier) node;
 };
-typedef struct IOMMUNotifier IOMMUNotifier;
+typedef struct IOMMUMRNotifier IOMMUMRNotifier;
 
-static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
-                                       IOMMUNotifierFlag flags,
-                                       hwaddr start, hwaddr end)
+static inline void iommu_mr_notifier_init(IOMMUMRNotifier *n, IOMMUMRNotify fn,
+                                          IOMMUMREventFlag flags,
+                                          hwaddr start, hwaddr end)
 {
     n->notify = fn;
     n->notifier_flags = flags;
@@ -210,10 +210,10 @@ typedef struct IOMMUMemoryRegionClass {
     uint64_t (*get_min_page_size)(IOMMUMemoryRegion *iommu);
     /* Called when IOMMU Notifier flag changed */
     void (*notify_flag_changed)(IOMMUMemoryRegion *iommu,
-                                IOMMUNotifierFlag old_flags,
-                                IOMMUNotifierFlag new_flags);
+                                IOMMUMREventFlag old_flags,
+                                IOMMUMREventFlag new_flags);
     /* Set this up to provide customized IOMMU replay function */
-    void (*replay)(IOMMUMemoryRegion *iommu, IOMMUNotifier *notifier);
+    void (*replay)(IOMMUMemoryRegion *iommu, IOMMUMRNotifier *notifier);
 
     /* Get IOMMU misc attributes */
     int (*get_attr)(IOMMUMemoryRegion *iommu, enum IOMMUMemoryRegionAttr,
@@ -267,11 +267,11 @@ struct MemoryRegion {
 struct IOMMUMemoryRegion {
     MemoryRegion parent_obj;
 
-    QLIST_HEAD(, IOMMUNotifier) iommu_notify;
-    IOMMUNotifierFlag iommu_notify_flags;
+    QLIST_HEAD(, IOMMUMRNotifier) iommu_notify;
+    IOMMUMREventFlag iommu_notify_flags;
 };
 
-#define IOMMU_NOTIFIER_FOREACH(n, mr) \
+#define IOMMU_MR_NOTIFIER_FOREACH(n, mr) \
     QLIST_FOREACH((n), &(mr)->iommu_notify, node)
 
 /**
@@ -913,7 +913,7 @@ void memory_region_notify_iommu(IOMMUMemoryRegion *iommu_mr,
  *         replaces all old entries for the same virtual I/O address range.
  *         Deleted entries have .@perm == 0.
  */
-void memory_region_notify_one(IOMMUNotifier *notifier,
+void memory_region_notify_one(IOMMUMRNotifier *notifier,
                               IOMMUTLBEntry *entry);
 
 /**
@@ -921,12 +921,12 @@ void memory_region_notify_one(IOMMUNotifier *notifier,
  * IOMMU translation entries.
  *
  * @mr: the memory region to observe
- * @n: the IOMMUNotifier to be added; the notify callback receives a
+ * @n: the IOMMUMRNotifier to be added; the notify callback receives a
  *     pointer to an #IOMMUTLBEntry as the opaque value; the pointer
  *     ceases to be valid on exit from the notifier.
  */
 void memory_region_register_iommu_notifier(MemoryRegion *mr,
-                                           IOMMUNotifier *n);
+                                           IOMMUMRNotifier *n);
 
 /**
  * memory_region_iommu_replay: replay existing IOMMU translations to
@@ -936,7 +936,8 @@ void memory_region_register_iommu_notifier(MemoryRegion *mr,
  * @iommu_mr: the memory region to observe
  * @n: the notifier to which to replay iommu mappings
  */
-void memory_region_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n);
+void memory_region_iommu_replay(IOMMUMemoryRegion *iommu_mr,
+                                IOMMUMRNotifier *n);
 
 /**
  * memory_region_iommu_replay_all: replay existing IOMMU translations
@@ -955,7 +956,7 @@ void memory_region_iommu_replay_all(IOMMUMemoryRegion *iommu_mr);
  * @n: the notifier to be removed.
  */
 void memory_region_unregister_iommu_notifier(MemoryRegion *mr,
-                                             IOMMUNotifier *n);
+                                             IOMMUMRNotifier *n);
 
 /**
  * memory_region_iommu_get_attr: return an IOMMU attr if get_attr() is
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 45ec891..1df6fa9 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -67,7 +67,7 @@ typedef union VTD_IR_TableEntry VTD_IR_TableEntry;
 typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress;
 typedef struct VTDIrq VTDIrq;
 typedef struct VTD_MSIMessage VTD_MSIMessage;
-typedef struct IntelIOMMUNotifierNode IntelIOMMUNotifierNode;
+typedef struct IntelIOMMUMRNotifierNode IntelIOMMUMRNotifierNode;
 
 /* Context-Entry */
 struct VTDContextEntry {
@@ -253,9 +253,9 @@ struct VTD_MSIMessage {
 /* When IR is enabled, all MSI/MSI-X data bits should be zero */
 #define VTD_IR_MSI_DATA          (0)
 
-struct IntelIOMMUNotifierNode {
+struct IntelIOMMUMRNotifierNode {
     VTDAddressSpace *vtd_as;
-    QLIST_ENTRY(IntelIOMMUNotifierNode) next;
+    QLIST_ENTRY(IntelIOMMUMRNotifierNode) next;
 };
 
 /* The iommu (DMAR) device state struct */
@@ -295,7 +295,7 @@ struct IntelIOMMUState {
     GHashTable *vtd_as_by_busptr;   /* VTDBus objects indexed by PCIBus* reference */
     VTDBus *vtd_as_by_bus_num[VTD_PCI_BUS_MAX]; /* VTDBus objects indexed by bus number */
     /* list of registered notifiers */
-    QLIST_HEAD(, IntelIOMMUNotifierNode) notifiers_list;
+    QLIST_HEAD(, IntelIOMMUMRNotifierNode) notifiers_list;
 
     /* interrupt remapping */
     bool intr_enabled;              /* Whether guest enabled IR */
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index f3a2ac9..865e3e7 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -97,7 +97,7 @@ typedef struct VFIOGuestIOMMU {
     VFIOContainer *container;
     IOMMUMemoryRegion *iommu;
     hwaddr iommu_offset;
-    IOMMUNotifier n;
+    IOMMUMRNotifier n;
     QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
 } VFIOGuestIOMMU;
 
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index a7f449f..401ce60 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -42,7 +42,7 @@ struct vhost_iommu {
     struct vhost_dev *hdev;
     MemoryRegion *mr;
     hwaddr iommu_offset;
-    IOMMUNotifier n;
+    IOMMUMRNotifier n;
     QLIST_ENTRY(vhost_iommu) iommu_next;
 };
 
@@ -80,7 +80,7 @@ struct vhost_dev {
     struct vhost_log *log;
     QLIST_ENTRY(vhost_dev) entry;
     QLIST_HEAD(, vhost_iommu) iommu_list;
-    IOMMUNotifier n;
+    IOMMUMRNotifier n;
     const VhostDevConfigOps *config_ops;
 };
 
diff --git a/memory.c b/memory.c
index 6515131..6882a78 100644
--- a/memory.c
+++ b/memory.c
@@ -1702,7 +1702,7 @@ void memory_region_init_iommu(void *_iommu_mr,
     iommu_mr = IOMMU_MEMORY_REGION(mr);
     mr->terminates = true;  /* then re-forwards */
     QLIST_INIT(&iommu_mr->iommu_notify);
-    iommu_mr->iommu_notify_flags = IOMMU_NOTIFIER_NONE;
+    iommu_mr->iommu_notify_flags = IOMMU_MR_EVENT_NONE;
 }
 
 static void memory_region_finalize(Object *obj)
@@ -1799,12 +1799,12 @@ bool memory_region_is_logging(MemoryRegion *mr, uint8_t client)
 
 static void memory_region_update_iommu_notify_flags(IOMMUMemoryRegion *iommu_mr)
 {
-    IOMMUNotifierFlag flags = IOMMU_NOTIFIER_NONE;
-    IOMMUNotifier *iommu_notifier;
+    IOMMUMREventFlag flags = IOMMU_MR_EVENT_NONE;
+    IOMMUMRNotifier *iommu_mr_notifier;
     IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_GET_CLASS(iommu_mr);
 
-    IOMMU_NOTIFIER_FOREACH(iommu_notifier, iommu_mr) {
-        flags |= iommu_notifier->notifier_flags;
+    IOMMU_MR_NOTIFIER_FOREACH(iommu_mr_notifier, iommu_mr) {
+        flags |= iommu_mr_notifier->notifier_flags;
     }
 
     if (flags != iommu_mr->iommu_notify_flags && imrc->notify_flag_changed) {
@@ -1817,7 +1817,7 @@ static void memory_region_update_iommu_notify_flags(IOMMUMemoryRegion *iommu_mr)
 }
 
 void memory_region_register_iommu_notifier(MemoryRegion *mr,
-                                           IOMMUNotifier *n)
+                                           IOMMUMRNotifier *n)
 {
     IOMMUMemoryRegion *iommu_mr;
 
@@ -1828,7 +1828,7 @@ void memory_region_register_iommu_notifier(MemoryRegion *mr,
 
     /* We need to register for at least one bitfield */
     iommu_mr = IOMMU_MEMORY_REGION(mr);
-    assert(n->notifier_flags != IOMMU_NOTIFIER_NONE);
+    assert(n->notifier_flags != IOMMU_MR_EVENT_NONE);
     assert(n->start <= n->end);
     QLIST_INSERT_HEAD(&iommu_mr->iommu_notify, n, node);
     memory_region_update_iommu_notify_flags(iommu_mr);
@@ -1844,7 +1844,8 @@ uint64_t memory_region_iommu_get_min_page_size(IOMMUMemoryRegion *iommu_mr)
     return TARGET_PAGE_SIZE;
 }
 
-void memory_region_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
+void memory_region_iommu_replay(IOMMUMemoryRegion *iommu_mr,
+                                IOMMUMRNotifier *n)
 {
     MemoryRegion *mr = MEMORY_REGION(iommu_mr);
     IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_GET_CLASS(iommu_mr);
@@ -1875,15 +1876,15 @@ void memory_region_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
 
 void memory_region_iommu_replay_all(IOMMUMemoryRegion *iommu_mr)
 {
-    IOMMUNotifier *notifier;
+    IOMMUMRNotifier *notifier;
 
-    IOMMU_NOTIFIER_FOREACH(notifier, iommu_mr) {
+    IOMMU_MR_NOTIFIER_FOREACH(notifier, iommu_mr) {
         memory_region_iommu_replay(iommu_mr, notifier);
     }
 }
 
 void memory_region_unregister_iommu_notifier(MemoryRegion *mr,
-                                             IOMMUNotifier *n)
+                                             IOMMUMRNotifier *n)
 {
     IOMMUMemoryRegion *iommu_mr;
 
@@ -1896,10 +1897,10 @@ void memory_region_unregister_iommu_notifier(MemoryRegion *mr,
     memory_region_update_iommu_notify_flags(iommu_mr);
 }
 
-void memory_region_notify_one(IOMMUNotifier *notifier,
+void memory_region_notify_one(IOMMUMRNotifier *notifier,
                               IOMMUTLBEntry *entry)
 {
-    IOMMUNotifierFlag request_flags;
+    IOMMUMREventFlag request_flags;
 
     /*
      * Skip the notification if the notification does not overlap
@@ -1911,9 +1912,9 @@ void memory_region_notify_one(IOMMUNotifier *notifier,
     }
 
     if (entry->perm & IOMMU_RW) {
-        request_flags = IOMMU_NOTIFIER_MAP;
+        request_flags = IOMMU_MR_EVENT_MAP;
     } else {
-        request_flags = IOMMU_NOTIFIER_UNMAP;
+        request_flags = IOMMU_MR_EVENT_UNMAP;
     }
 
     if (notifier->notifier_flags & request_flags) {
@@ -1924,12 +1925,12 @@ void memory_region_notify_one(IOMMUNotifier *notifier,
 void memory_region_notify_iommu(IOMMUMemoryRegion *iommu_mr,
                                 IOMMUTLBEntry entry)
 {
-    IOMMUNotifier *iommu_notifier;
+    IOMMUMRNotifier *iommu_mr_notifier;
 
     assert(memory_region_is_iommu(MEMORY_REGION(iommu_mr)));
 
-    IOMMU_NOTIFIER_FOREACH(iommu_notifier, iommu_mr) {
-        memory_region_notify_one(iommu_notifier, &entry);
+    IOMMU_MR_NOTIFIER_FOREACH(iommu_mr_notifier, iommu_mr) {
+        memory_region_notify_one(iommu_mr_notifier, &entry);
     }
 }
 
-- 
1.9.1


* [Qemu-devel] [PATCH v3 02/12] vfio: rename GuestIOMMU to be GuestIOMMUMR
  2018-03-01 10:33 [Qemu-devel] [PATCH v3 00/12] Introduce new iommu notifier framework for virt-SVA Liu, Yi L
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 01/12] memory: rename existing iommu notifier to be iommu mr notifier Liu, Yi L
@ 2018-03-01 10:33 ` Liu, Yi L
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 03/12] hw/core: introduce IOMMUSVAContext for virt-SVA Liu, Yi L
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 65+ messages in thread
From: Liu, Yi L @ 2018-03-01 10:33 UTC (permalink / raw)
  To: qemu-devel, mst, david
  Cc: pbonzini, alex.williamson, eric.auger.pro, yi.l.liu, peterx,
	kevin.tian, jasowang, Liu, Yi L

This patch renames GuestIOMMU to GuestIOMMUMR, since the existing
GuestIOMMU is used for MemoryRegion related notifiers.

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 hw/vfio/common.c              | 17 +++++++++--------
 include/hw/vfio/vfio-common.h |  8 ++++----
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index cbda506..06277d2 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -348,7 +348,7 @@ static bool vfio_get_vaddr(IOMMUTLBEntry *iotlb, void **vaddr,
 
 static void vfio_iommu_map_notify(IOMMUMRNotifier *n, IOMMUTLBEntry *iotlb)
 {
-    VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
+    VFIOGuestIOMMUMR *giommu = container_of(n, VFIOGuestIOMMUMR, n);
     VFIOContainer *container = giommu->container;
     hwaddr iova = iotlb->iova + giommu->iommu_offset;
     bool read_only;
@@ -504,7 +504,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
     memory_region_ref(section->mr);
 
     if (memory_region_is_iommu(section->mr)) {
-        VFIOGuestIOMMU *giommu;
+        VFIOGuestIOMMUMR *giommu;
         IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(section->mr);
 
         trace_vfio_listener_region_add_iommu(iova, end);
@@ -526,7 +526,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
                                IOMMU_MR_EVENT_ALL,
                                section->offset_within_region,
                                int128_get64(llend));
-        QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
+        QLIST_INSERT_HEAD(&container->giommu_mr_list, giommu, giommu_next);
 
         memory_region_register_iommu_notifier(section->mr, &giommu->n);
         memory_region_iommu_replay(giommu->iommu, &giommu->n);
@@ -593,9 +593,9 @@ static void vfio_listener_region_del(MemoryListener *listener,
     }
 
     if (memory_region_is_iommu(section->mr)) {
-        VFIOGuestIOMMU *giommu;
+        VFIOGuestIOMMUMR *giommu;
 
-        QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
+        QLIST_FOREACH(giommu, &container->giommu_mr_list, giommu_next) {
             if (MEMORY_REGION(giommu->iommu) == section->mr &&
                 giommu->n.start == section->offset_within_region) {
                 memory_region_unregister_iommu_notifier(section->mr,
@@ -1017,7 +1017,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     container = g_malloc0(sizeof(*container));
     container->space = space;
     container->fd = fd;
-    QLIST_INIT(&container->giommu_list);
+    QLIST_INIT(&container->giommu_mr_list);
     QLIST_INIT(&container->hostwin_list);
     if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU) ||
         ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU)) {
@@ -1206,11 +1206,12 @@ static void vfio_disconnect_container(VFIOGroup *group)
 
     if (QLIST_EMPTY(&container->group_list)) {
         VFIOAddressSpace *space = container->space;
-        VFIOGuestIOMMU *giommu, *tmp;
+        VFIOGuestIOMMUMR *giommu, *tmp;
 
         QLIST_REMOVE(container, next);
 
-        QLIST_FOREACH_SAFE(giommu, &container->giommu_list, giommu_next, tmp) {
+        QLIST_FOREACH_SAFE(giommu, &container->giommu_mr_list,
+                           giommu_next, tmp) {
             memory_region_unregister_iommu_notifier(
                     MEMORY_REGION(giommu->iommu), &giommu->n);
             QLIST_REMOVE(giommu, giommu_next);
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 865e3e7..702a085 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -87,19 +87,19 @@ typedef struct VFIOContainer {
      * contiguous IOVA window.  We may need to generalize that in
      * future
      */
-    QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
+    QLIST_HEAD(, VFIOGuestIOMMUMR) giommu_mr_list;
     QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
     QLIST_HEAD(, VFIOGroup) group_list;
     QLIST_ENTRY(VFIOContainer) next;
 } VFIOContainer;
 
-typedef struct VFIOGuestIOMMU {
+typedef struct VFIOGuestIOMMUMR {
     VFIOContainer *container;
     IOMMUMemoryRegion *iommu;
     hwaddr iommu_offset;
     IOMMUMRNotifier n;
-    QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
-} VFIOGuestIOMMU;
+    QLIST_ENTRY(VFIOGuestIOMMUMR) giommu_next;
+} VFIOGuestIOMMUMR;
 
 typedef struct VFIOHostDMAWindow {
     hwaddr min_iova;
-- 
1.9.1


* [Qemu-devel] [PATCH v3 03/12] hw/core: introduce IOMMUSVAContext for virt-SVA
  2018-03-01 10:33 [Qemu-devel] [PATCH v3 00/12] Introduce new iommu notifier framework for virt-SVA Liu, Yi L
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 01/12] memory: rename existing iommu notifier to be iommu mr notifier Liu, Yi L
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 02/12] vfio: rename GuestIOMMU to be GuestIOMMUMR Liu, Yi L
@ 2018-03-01 10:33 ` Liu, Yi L
  2018-03-02 15:13   ` Paolo Bonzini
  2018-03-06  8:51   ` Liu, Yi L
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 04/12] vfio/pci: add notify framework based on IOMMUSVAContext Liu, Yi L
                   ` (9 subsequent siblings)
  12 siblings, 2 replies; 65+ messages in thread
From: Liu, Yi L @ 2018-03-01 10:33 UTC (permalink / raw)
  To: qemu-devel, mst, david
  Cc: pbonzini, alex.williamson, eric.auger.pro, yi.l.liu, peterx,
	kevin.tian, jasowang, Liu, Yi L

From: Peter Xu <peterx@redhat.com>

This patch adds IOMMUSVAContext as an abstraction for virt-SVA in
Qemu.

An IOMMUSVAContext is per-PASID (Process Address Space ID). Each
PASID-tagged AddressSpace should have an IOMMUSVAContext created for
it. virt-SVA emulation for emulated SVA capable devices would use the
IOMMUSVAContext, and for assigned devices Qemu also needs to propagate
guest TLB flushes to the host through the sva_notifiers attached to
the IOMMUSVAContext.

This patch proposes to include an sva_notifier list and an
IOMMUSVAContextOps in IOMMUSVAContext.

* The sva_notifier list would hold TLB invalidation notifiers used to
  propagate the guest's IOTLB flushes to the host.
* The first callback in IOMMUSVAContextOps would be an address
  translation callback. For SVA aware DMAs issued by emulated SVA
  capable devices, Qemu has to emulate data reads/writes to the guest
  process address space, which requires address translation through
  the guest process page table. The IOMMUSVAContextOps.translate()
  callback would be helpful for emulating such SVA capable devices
  (see the sketch after the note below).

Note: completing the IOMMUSVAContext based address translation
framework may duplicate quite a bit of the existing MemoryRegion
based translation code in Qemu. As this patchset mainly targets
assigned SVA capable devices, that duplication is not done here. If
emulating SVA capable devices becomes a requirement in the future, a
separate patchset would be needed to complete the translation
framework.
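
For illustration only, below is a minimal, hypothetical sketch of how
an emulated SVA capable device model might consume the translate()
callback once such a translation framework exists. The helper name
emulated_dev_sva_dma_read() and the use of address_space_read() as the
backing accessor are assumptions; only the pasid.h types come from
this patch.

/* Hypothetical sketch, not part of this patch */
static bool emulated_dev_sva_dma_read(IOMMUSVAContext *sva_ctx, hwaddr va,
                                      uint8_t *buf, int len)
{
    IOMMUSVATLBEntry entry;

    if (!sva_ctx->sva_ctx_ops || !sva_ctx->sva_ctx_ops->translate) {
        return false;
    }

    /* Resolve the PASID-tagged virtual address via the vIOMMU-provided op */
    entry = sva_ctx->sva_ctx_ops->translate(sva_ctx, va, false);
    if (!(entry.perm & IOMMU_SVA_RO)) {
        return false;
    }

    /* Access the translated address in the returned target address space */
    return address_space_read(entry.target_as,
                              entry.translated_addr + (va & entry.addr_mask),
                              MEMTXATTRS_UNSPECIFIED, buf, len) == MEMTX_OK;
}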

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 hw/core/Makefile.objs   |   1 +
 hw/core/pasid.c         |  64 ++++++++++++++++++++++++++++
 include/hw/core/pasid.h | 110 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 175 insertions(+)
 create mode 100644 hw/core/pasid.c
 create mode 100644 include/hw/core/pasid.h

diff --git a/hw/core/Makefile.objs b/hw/core/Makefile.objs
index 1240728..01989d2 100644
--- a/hw/core/Makefile.objs
+++ b/hw/core/Makefile.objs
@@ -6,6 +6,7 @@ common-obj-$(CONFIG_SOFTMMU) += fw-path-provider.o
 # irq.o needed for qdev GPIO handling:
 common-obj-y += irq.o
 common-obj-y += hotplug.o
+common-obj-y += pasid.o
 common-obj-$(CONFIG_SOFTMMU) += nmi.o
 
 common-obj-$(CONFIG_EMPTY_SLOT) += empty_slot.o
diff --git a/hw/core/pasid.c b/hw/core/pasid.c
new file mode 100644
index 0000000..c4b0c5d
--- /dev/null
+++ b/hw/core/pasid.c
@@ -0,0 +1,64 @@
+/*
+ * QEMU abstract of Shared Virtual Memory logic
+ *
+ * Copyright (C) 2018 Red Hat Inc.
+ *
+ * Authors: Peter Xu <peterx@redhat.com>,
+ *          Liu, Yi L <yi.l.liu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/core/pasid.h"
+
+void iommu_sva_notifier_register(IOMMUSVAContext *sva_ctx,
+                                 IOMMUSVANotifier *n,
+                                 IOMMUSVANotifyFn fn,
+                                 IOMMUSVAEvent event)
+{
+    n->event = event;
+    n->sva_notify = fn;
+    QLIST_INSERT_HEAD(&sva_ctx->sva_notifiers, n, node);
+    return;
+}
+
+void iommu_sva_notifier_unregister(IOMMUSVAContext *sva_ctx,
+                                   IOMMUSVANotifier *notifier)
+{
+    IOMMUSVANotifier *cur, *next;
+
+    QLIST_FOREACH_SAFE(cur, &sva_ctx->sva_notifiers, node, next) {
+        if (cur == notifier) {
+            QLIST_REMOVE(cur, node);
+            break;
+        }
+    }
+}
+
+void iommu_sva_notify(IOMMUSVAContext *sva_ctx, IOMMUSVAEventData *event_data)
+{
+    IOMMUSVANotifier *cur;
+
+    QLIST_FOREACH(cur, &sva_ctx->sva_notifiers, node) {
+        if ((cur->event == event_data->event) && cur->sva_notify) {
+            cur->sva_notify(cur, event_data);
+        }
+    }
+}
+
+void iommu_sva_ctx_init(IOMMUSVAContext *sva_ctx)
+{
+    QLIST_INIT(&sva_ctx->sva_notifiers);
+}
diff --git a/include/hw/core/pasid.h b/include/hw/core/pasid.h
new file mode 100644
index 0000000..4c7dccb
--- /dev/null
+++ b/include/hw/core/pasid.h
@@ -0,0 +1,110 @@
+/*
+ * QEMU abstraction of Shared Virtual Memory
+ *
+ * Copyright (C) 2018 Red Hat Inc.
+ *
+ * Authors: Peter Xu <peterx@redhat.com>,
+ *          Liu, Yi L <yi.l.liu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HW_PCI_PASID_H
+#define HW_PCI_PASID_H
+
+#include "qemu/queue.h"
+#ifndef CONFIG_USER_ONLY
+#include "exec/hwaddr.h"
+#endif
+
+typedef struct IOMMUSVAContext IOMMUSVAContext;
+
+enum IOMMUSVAEvent {
+    IOMMU_SVA_EVENT_TLB_INV,
+};
+typedef enum IOMMUSVAEvent IOMMUSVAEvent;
+
+struct IOMMUSVAEventData {
+    IOMMUSVAEvent event;
+    uint64_t length;
+    void *data;
+};
+typedef struct IOMMUSVAEventData IOMMUSVAEventData;
+
+typedef struct IOMMUSVANotifier IOMMUSVANotifier;
+
+typedef void (*IOMMUSVANotifyFn)(IOMMUSVANotifier *notifier,
+                                 IOMMUSVAEventData *event_data);
+
+typedef struct IOMMUSVATLBEntry IOMMUSVATLBEntry;
+
+/* See address_space_translate: bit 0 is read, bit 1 is write.  */
+typedef enum {
+    IOMMU_SVA_NONE = 0,
+    IOMMU_SVA_RO   = 1,
+    IOMMU_SVA_WO   = 2,
+    IOMMU_SVA_RW   = 3,
+} IOMMUSVAAccessFlags;
+
+#define IOMMU_SVA_ACCESS_FLAG(r, w) (((r) ? IOMMU_SVA_RO : 0) | \
+                                     ((w) ? IOMMU_SVA_WO : 0))
+
+struct IOMMUSVATLBEntry {
+    AddressSpace    *target_as;
+    hwaddr           va;
+    hwaddr           translated_addr;
+    hwaddr           addr_mask;  /* 0xfff = 4k translation */
+    IOMMUSVAAccessFlags perm;
+};
+
+typedef struct IOMMUSVAContextOps IOMMUSVAContextOps;
+struct IOMMUSVAContextOps {
+    /* Return a TLB entry that contains a given address. */
+    IOMMUSVATLBEntry (*translate)(IOMMUSVAContext *sva_ctx,
+                                  hwaddr addr, bool is_write);
+};
+
+struct IOMMUSVANotifier {
+    IOMMUSVANotifyFn sva_notify;
+    /*
+     * What events we are listening to. Let's allow multiple event
+     * registrations from beginning.
+     */
+    IOMMUSVAEvent event;
+    QLIST_ENTRY(IOMMUSVANotifier) node;
+};
+
+/*
+ * This stands for an IOMMU unit. Any translation device should have
+ * this struct inside its own structure to make sure it can leverage
+ * common IOMMU functionalities.
+ */
+struct IOMMUSVAContext {
+    uint32_t pasid;
+    QLIST_HEAD(, IOMMUSVANotifier) sva_notifiers;
+    const IOMMUSVAContextOps *sva_ctx_ops;
+};
+
+void iommu_sva_notifier_register(IOMMUSVAContext *sva_ctx,
+                                 IOMMUSVANotifier *n,
+                                 IOMMUSVANotifyFn fn,
+                                 IOMMUSVAEvent event);
+void iommu_sva_notifier_unregister(IOMMUSVAContext *sva_ctx,
+                                   IOMMUSVANotifier *notifier);
+void iommu_sva_notify(IOMMUSVAContext *sva_ctx,
+                      IOMMUSVAEventData *event_data);
+
+void iommu_sva_ctx_init(IOMMUSVAContext *sva_ctx);
+
+#endif
-- 
1.9.1


* [Qemu-devel] [PATCH v3 04/12] vfio/pci: add notify framework based on IOMMUSVAContext
  2018-03-01 10:33 [Qemu-devel] [PATCH v3 00/12] Introduce new iommu notifier framework for virt-SVA Liu, Yi L
                   ` (2 preceding siblings ...)
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 03/12] hw/core: introduce IOMMUSVAContext for virt-SVA Liu, Yi L
@ 2018-03-01 10:33 ` Liu, Yi L
  2018-03-05  7:45   ` Peter Xu
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 05/12] hw/pci: introduce PCISVAOps to PCIDevice Liu, Yi L
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 65+ messages in thread
From: Liu, Yi L @ 2018-03-01 10:33 UTC (permalink / raw)
  To: qemu-devel, mst, david
  Cc: pbonzini, alex.williamson, eric.auger.pro, yi.l.liu, peterx,
	kevin.tian, jasowang, Liu, Yi L

This patch introduces a notify framework for IOMMUSVAContext.sva_notifiers.

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 hw/vfio/common.c              | 1 +
 include/hw/vfio/vfio-common.h | 9 +++++++++
 2 files changed, 10 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 06277d2..1cc96df 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1019,6 +1019,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     container->fd = fd;
     QLIST_INIT(&container->giommu_mr_list);
     QLIST_INIT(&container->hostwin_list);
+    QLIST_INIT(&container->gsva_ctx_list);
     if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU) ||
         ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU)) {
         bool v2 = !!ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU);
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 702a085..4c16b4c 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -29,6 +29,7 @@
 #ifdef CONFIG_LINUX
 #include <linux/vfio.h>
 #endif
+#include "hw/core/pasid.h"
 
 #define ERR_PREFIX "vfio error: %s: "
 #define WARN_PREFIX "vfio warning: %s: "
@@ -88,6 +89,7 @@ typedef struct VFIOContainer {
      * future
      */
     QLIST_HEAD(, VFIOGuestIOMMUMR) giommu_mr_list;
+    QLIST_HEAD(, VFIOGuestIOMMUSVAContext) gsva_ctx_list;
     QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
     QLIST_HEAD(, VFIOGroup) group_list;
     QLIST_ENTRY(VFIOContainer) next;
@@ -101,6 +103,13 @@ typedef struct VFIOGuestIOMMUMR {
     QLIST_ENTRY(VFIOGuestIOMMUMR) giommu_next;
 } VFIOGuestIOMMUMR;
 
+typedef struct VFIOGuestIOMMUSVAContext {
+    VFIOContainer *container;
+    IOMMUSVAContext *sva_ctx;
+    IOMMUSVANotifier n;
+    QLIST_ENTRY(VFIOGuestIOMMUSVAContext) gsva_ctx_next;
+} VFIOGuestIOMMUSVAContext;
+
 typedef struct VFIOHostDMAWindow {
     hwaddr min_iova;
     hwaddr max_iova;
-- 
1.9.1


* [Qemu-devel] [PATCH v3 05/12] hw/pci: introduce PCISVAOps to PCIDevice
  2018-03-01 10:33 [Qemu-devel] [PATCH v3 00/12] Introduce new iommu notifier framework for virt-SVA Liu, Yi L
                   ` (3 preceding siblings ...)
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 04/12] vfio/pci: add notify framework based on IOMMUSVAContext Liu, Yi L
@ 2018-03-01 10:33 ` Liu, Yi L
  2018-03-02 15:10   ` Paolo Bonzini
  2018-03-06 10:33   ` Liu, Yi L
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 06/12] vfio/pci: provide vfio_pci_sva_ops instance Liu, Yi L
                   ` (7 subsequent siblings)
  12 siblings, 2 replies; 65+ messages in thread
From: Liu, Yi L @ 2018-03-01 10:33 UTC (permalink / raw)
  To: qemu-devel, mst, david
  Cc: pbonzini, alex.williamson, eric.auger.pro, yi.l.liu, peterx,
	kevin.tian, jasowang, Liu, Yi L

This patch introduces PCISVAOps for virt-SVA.

So far, setting up virt-SVA for an assigned SVA capable device
requires configuring the host translation structures. e.g. for VT-d,
the guest pasid table has to be set to the host and nested translation
enabled. Besides, the vIOMMU emulator needs to forward the guest's
cache invalidations to the host. On VT-d these are the guest's
invalidations of 1st level translation related caches, and such
invalidations should be forwarded to the host.

The proposed PCISVAOps are:
* sva_bind_pasid_table: set the guest pasid table to the host and
                        enable nested translation in the host
* sva_register_notifier: register an sva_notifier to forward the
                         guest's cache invalidations to the host
* sva_unregister_notifier: unregister an sva_notifier

The PCISVAOps should be provided by vfio or similar modules, mainly
for assigned SVA capable devices.

Take virt-SVA on VT-d as an example:
If a guest wants to set up virt-SVA for an assigned SVA capable
device, it programs its context entry. The vIOMMU emulator captures
the guest's context entry programming, figures out the target device,
and uses the pci_device_sva_bind_pasid_table() API to bind the guest
pasid table to the host.

The guest would also program its pasid table. The vIOMMU emulator
captures the guest's pasid entry programming. Qemu then needs to
allocate an AddressSpace to stand for the pasid tagged address space,
and it also needs to register an sva_notifier to forward future cache
invalidation requests to the host (a sketch of this flow follows
below).

Allocating an AddressSpace to stand for the pasid tagged address space
serves the emulation of emulated SVA capable devices. Such devices may
issue SVA aware DMAs, and Qemu needs to emulate reads/writes to a
pasid tagged AddressSpace; thus an abstraction for such an address
space is needed in Qemu.
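
A minimal sketch of the vIOMMU-side call flow described above, for
illustration only. The viommu_handle_* function names and their
parameters are hypothetical; only the pci_device_sva_* APIs come from
this patch, and the real VT-d hooks are added in patches 08 - 12.

/* Hypothetical sketch, not part of this patch */
static void viommu_handle_guest_context_entry(PCIBus *bus, int32_t devfn,
                                              uint64_t guest_pasidt_addr,
                                              uint32_t pasidt_size)
{
    /* Bind the guest pasid table to the host, enabling nested translation */
    pci_device_sva_bind_pasid_table(bus, devfn, guest_pasidt_addr,
                                    pasidt_size);
}

static void viommu_handle_guest_pasid_entry(PCIBus *bus, int32_t devfn,
                                            IOMMUSVAContext *sva_ctx)
{
    /* Hook up cache invalidation forwarding for this PASID-tagged context */
    pci_device_sva_register_notifier(bus, devfn, sva_ctx);
}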

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 hw/pci/pci.c         | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/pci/pci.h | 21 ++++++++++++++++++
 2 files changed, 81 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index e006b6a..157fe21 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2573,6 +2573,66 @@ void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque)
     bus->iommu_opaque = opaque;
 }
 
+void pci_setup_sva_ops(PCIDevice *dev, PCISVAOps *ops)
+{
+    if (dev) {
+        dev->sva_ops = ops;
+    }
+    return;
+}
+
+void pci_device_sva_bind_pasid_table(PCIBus *bus,
+                     int32_t devfn, uint64_t addr, uint32_t size)
+{
+    PCIDevice *dev;
+
+    if (!bus) {
+        return;
+    }
+
+    dev = bus->devices[devfn];
+    if (dev && dev->sva_ops) {
+        dev->sva_ops->sva_bind_pasid_table(bus, devfn, addr, size);
+    }
+    return;
+}
+
+void pci_device_sva_register_notifier(PCIBus *bus, int32_t devfn,
+                                      IOMMUSVAContext *sva_ctx)
+{
+    PCIDevice *dev;
+
+    if (!bus) {
+        return;
+    }
+
+    dev = bus->devices[devfn];
+    if (dev && dev->sva_ops) {
+        dev->sva_ops->sva_register_notifier(bus,
+                                            devfn,
+                                            sva_ctx);
+    }
+    return;
+}
+
+void pci_device_sva_unregister_notifier(PCIBus *bus, int32_t devfn,
+                                        IOMMUSVAContext *sva_ctx)
+{
+    PCIDevice *dev;
+
+    if (!bus) {
+        return;
+    }
+
+    dev = bus->devices[devfn];
+    if (dev && dev->sva_ops) {
+        dev->sva_ops->sva_unregister_notifier(bus,
+                                              devfn,
+                                              sva_ctx);
+    }
+    return;
+}
+
 static void pci_dev_get_w64(PCIBus *b, PCIDevice *dev, void *opaque)
 {
     Range *range = opaque;
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index d8c18c7..32889a4 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -10,6 +10,8 @@
 
 #include "hw/pci/pcie.h"
 
+#include "hw/core/pasid.h"
+
 extern bool pci_available;
 
 /* PCI bus */
@@ -262,6 +264,16 @@ struct PCIReqIDCache {
 };
 typedef struct PCIReqIDCache PCIReqIDCache;
 
+typedef struct PCISVAOps PCISVAOps;
+struct PCISVAOps {
+    void (*sva_bind_pasid_table)(PCIBus *bus, int32_t devfn,
+             uint64_t pasidt_addr, uint32_t size);
+    void (*sva_register_notifier)(PCIBus *bus, int32_t devfn,
+                                  IOMMUSVAContext *sva_ctx);
+    void (*sva_unregister_notifier)(PCIBus *bus, int32_t devfn,
+                                    IOMMUSVAContext *sva_ctx);
+};
+
 struct PCIDevice {
     DeviceState qdev;
 
@@ -351,6 +363,7 @@ struct PCIDevice {
     MSIVectorUseNotifier msix_vector_use_notifier;
     MSIVectorReleaseNotifier msix_vector_release_notifier;
     MSIVectorPollNotifier msix_vector_poll_notifier;
+    PCISVAOps *sva_ops;
 };
 
 void pci_register_bar(PCIDevice *pci_dev, int region_num,
@@ -477,6 +490,14 @@ typedef AddressSpace *(*PCIIOMMUFunc)(PCIBus *, void *, int);
 AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
 void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque);
 
+void pci_setup_sva_ops(PCIDevice *dev, PCISVAOps *ops);
+void pci_device_sva_bind_pasid_table(PCIBus *bus, int32_t devfn,
+                     uint64_t pasidt_addr, uint32_t size);
+void pci_device_sva_register_notifier(PCIBus *bus, int32_t devfn,
+                                      IOMMUSVAContext *sva_ctx);
+void pci_device_sva_unregister_notifier(PCIBus *bus, int32_t devfn,
+                                       IOMMUSVAContext *sva_ctx);
+
 static inline void
 pci_set_byte(uint8_t *config, uint8_t val)
 {
-- 
1.9.1


* [Qemu-devel] [PATCH v3 06/12] vfio/pci: provide vfio_pci_sva_ops instance
  2018-03-01 10:33 [Qemu-devel] [PATCH v3 00/12] Introduce new iommu notifier framework for virt-SVA Liu, Yi L
                   ` (4 preceding siblings ...)
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 05/12] hw/pci: introduce PCISVAOps to PCIDevice Liu, Yi L
@ 2018-03-01 10:33 ` Liu, Yi L
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 07/12] vfio/pci: register sva notifier Liu, Yi L
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 65+ messages in thread
From: Liu, Yi L @ 2018-03-01 10:33 UTC (permalink / raw)
  To: qemu-devel, mst, david
  Cc: pbonzini, alex.williamson, eric.auger.pro, yi.l.liu, peterx,
	kevin.tian, jasowang, Liu, Yi L

VFIO is the bridge between the vIOMMU and the host IOMMU, and needs to
provide APIs for the vIOMMU emulator to push configurations to the
host IOMMU. In this patchset, such APIs are exposed through hw/pci.
This patch provides the vfio_pci_sva_ops instance with placeholder
callbacks and sets it up in vfio_realize().

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 hw/vfio/pci.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 033cc8d..a60a4d7 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2775,6 +2775,34 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
     vdev->req_enabled = false;
 }
 
+static void vfio_pci_device_sva_bind_pasid_table(PCIBus *bus,
+                 int32_t devfn, uint64_t pasidt_addr, uint32_t size)
+{
+ /* Propagate the guest pasid table pointer to host IOMMU, and
+    enable nested translation accordingly. Depends on HW design.
+    So far, Intel VT-d and AMD IOMMU requires it. */
+}
+
+static void vfio_pci_device_sva_register_notifier(PCIBus *bus,
+                          int32_t devfn, IOMMUSVAContext *sva_ctx)
+{
+    /* Register notifier for TLB invalidation propagation
+       */
+}
+
+static void vfio_pci_device_sva_unregister_notifier(PCIBus *bus,
+                          int32_t devfn, IOMMUSVAContext *sva_ctx)
+{
+    /* Unregister notifier for TLB invalidation propagation
+       */
+}
+
+static PCISVAOps vfio_pci_sva_ops = {
+    .sva_bind_pasid_table = vfio_pci_device_sva_bind_pasid_table,
+    .sva_register_notifier = vfio_pci_device_sva_register_notifier,
+    .sva_unregister_notifier = vfio_pci_device_sva_unregister_notifier,
+};
+
 static void vfio_realize(PCIDevice *pdev, Error **errp)
 {
     VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
@@ -3019,6 +3047,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
     vfio_register_req_notifier(vdev);
     vfio_setup_resetfn_quirk(vdev);
 
+    pci_setup_sva_ops(pdev, &vfio_pci_sva_ops);
+
     return;
 
 out_teardown:
-- 
1.9.1


* [Qemu-devel] [PATCH v3 07/12] vfio/pci: register sva notifier
  2018-03-01 10:33 [Qemu-devel] [PATCH v3 00/12] Introduce new iommu notifier framework for virt-SVA Liu, Yi L
                   ` (5 preceding siblings ...)
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 06/12] vfio/pci: provide vfio_pci_sva_ops instance Liu, Yi L
@ 2018-03-01 10:33 ` Liu, Yi L
  2018-03-06  6:44   ` Peter Xu
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 08/12] hw/pci: introduce pci_device_notify_iommu() Liu, Yi L
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 65+ messages in thread
From: Liu, Yi L @ 2018-03-01 10:33 UTC (permalink / raw)
  To: qemu-devel, mst, david
  Cc: pbonzini, alex.williamson, eric.auger.pro, yi.l.liu, peterx,
	kevin.tian, jasowang, Liu, Yi L

This patch shows how an sva notifier is registered, and provides an
example by registering a notify function for TLB flush propagation.
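
For context, below is a minimal, hypothetical sketch of the
counterpart on the vIOMMU side: how an emulator could fire the
notifier registered in this patch when the guest flushes its 1st level
caches. The function name viommu_propagate_tlb_inv() and the payload
layout are assumptions; iommu_sva_notify() and IOMMUSVAEventData come
from hw/core/pasid.h.

/* Hypothetical sketch, not part of this patch */
static void viommu_propagate_tlb_inv(IOMMUSVAContext *sva_ctx,
                                     void *inv_desc, uint64_t desc_len)
{
    IOMMUSVAEventData event_data = {
        .event  = IOMMU_SVA_EVENT_TLB_INV,
        .length = desc_len,
        .data   = inv_desc,
    };

    /* Walks sva_ctx->sva_notifiers and invokes notifiers registered for
     * IOMMU_SVA_EVENT_TLB_INV, e.g. vfio_iommu_sva_tlb_invalidate_notify */
    iommu_sva_notify(sva_ctx, &event_data);
}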

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 hw/vfio/pci.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 53 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index a60a4d7..b7297cc 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2775,6 +2775,26 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
     vdev->req_enabled = false;
 }
 
+static VFIOContainer *vfio_get_container_from_busdev(PCIBus *bus,
+                                                     int32_t devfn)
+{
+    VFIOGroup *group;
+    VFIOPCIDevice *vdev_iter;
+    VFIODevice *vbasedev_iter;
+    PCIDevice *pdev_iter;
+
+    QLIST_FOREACH(group, &vfio_group_list, next) {
+        QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
+            vdev_iter = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
+            pdev_iter = &vdev_iter->pdev;
+            if (pci_get_bus(pdev_iter) == bus && pdev_iter->devfn == devfn) {
+                return group->container;
+            }
+        }
+    }
+    return NULL;
+}
+
 static void vfio_pci_device_sva_bind_pasid_table(PCIBus *bus,
                  int32_t devfn, uint64_t pasidt_addr, uint32_t size)
 {
@@ -2783,11 +2803,42 @@ static void vfio_pci_device_sva_bind_pasid_table(PCIBus *bus,
     So far, Intel VT-d and AMD IOMMU require it. */
 }
 
+static void vfio_iommu_sva_tlb_invalidate_notify(IOMMUSVANotifier *n,
+                                                 IOMMUSVAEventData *event_data)
+{
+/*  Sample code; it will be detailed in the coming virt-SVA patchset.
+    VFIOGuestIOMMUSVAContext *gsva_ctx;
+    IOMMUSVAContext *sva_ctx;
+    VFIOContainer *container;
+
+    gsva_ctx = container_of(n, VFIOGuestIOMMUSVAContext, n);
+    container = gsva_ctx->container;
+
+    TODO: forward to host through VFIO IOCTL
+*/
+}
+
 static void vfio_pci_device_sva_register_notifier(PCIBus *bus,
                           int32_t devfn, IOMMUSVAContext *sva_ctx)
 {
-    /* Register notifier for TLB invalidation propagation
-       */
+    VFIOContainer *container = vfio_get_container_from_busdev(bus, devfn);
+
+    if (container != NULL) {
+        VFIOGuestIOMMUSVAContext *gsva_ctx;
+        gsva_ctx = g_malloc0(sizeof(*gsva_ctx));
+        gsva_ctx->sva_ctx = sva_ctx;
+        gsva_ctx->container = container;
+        QLIST_INSERT_HEAD(&container->gsva_ctx_list,
+                          gsva_ctx,
+                          gsva_ctx_next);
+       /* Register vfio_iommu_sva_tlb_invalidate_notify with event flag
+           IOMMU_SVA_EVENT_TLB_INV */
+        iommu_sva_notifier_register(sva_ctx,
+                                    &gsva_ctx->n,
+                                    vfio_iommu_sva_tlb_invalidate_notify,
+                                    IOMMU_SVA_EVENT_TLB_INV);
+        return;
+    }
 }
 
 static void vfio_pci_device_sva_unregister_notifier(PCIBus *bus,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH v3 08/12] hw/pci: introduce pci_device_notify_iommu()
  2018-03-01 10:33 [Qemu-devel] [PATCH v3 00/12] Introduce new iommu notifier framework for virt-SVA Liu, Yi L
                   ` (6 preceding siblings ...)
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 07/12] vfio/pci: register sva notifier Liu, Yi L
@ 2018-03-01 10:33 ` Liu, Yi L
  2018-03-02 15:12   ` Paolo Bonzini
                     ` (2 more replies)
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 09/12] intel_iommu: record assigned devices in a list Liu, Yi L
                   ` (4 subsequent siblings)
  12 siblings, 3 replies; 65+ messages in thread
From: Liu, Yi L @ 2018-03-01 10:33 UTC (permalink / raw)
  To: qemu-devel, mst, david
  Cc: pbonzini, alex.williamson, eric.auger.pro, yi.l.liu, peterx,
	kevin.tian, jasowang, Liu, Yi L

This patch adds pci_device_notify_iommu() to notify the virtual IOMMU
emulator when an assigned device is added, and adds a new notify_fn
callback in PCIBus. The vIOMMU emulator provides the instance of this
callback.

Reason:
When a virtual IOMMU is exposed to the guest, the vIOMMU emulator needs
to program the host IOMMU to set up DMA mappings for assigned devices.
This is a per-device operation, so to be efficient the vIOMMU emulator
needs to record the assigned devices.

Example: for devices assigned through vfio, vfio_realize() calls
pci_device_notify_iommu() to notify the vIOMMU emulator to record the
necessary info for the assigned device.
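
The hook is not VFIO specific. As a sketch, a (purely hypothetical)
emulated SVA-capable device model could use it the same way in its
realize/exit paths, mirroring what vfio_realize()/vfio_exitfn() do in
the hunks below:

static void my_sva_dev_realize(PCIDevice *pdev, Error **errp)
{
    /* device specific initialization would go here */

    /* let the vIOMMU emulator record this device */
    pci_device_notify_iommu(pdev, PCI_NTY_DEV_ADD);
}

static void my_sva_dev_exit(PCIDevice *pdev)
{
    /* let the vIOMMU emulator drop its record of this device */
    pci_device_notify_iommu(pdev, PCI_NTY_DEV_DEL);
}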

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 hw/alpha/typhoon.c       |  2 +-
 hw/hppa/dino.c           |  2 +-
 hw/i386/amd_iommu.c      |  2 +-
 hw/i386/intel_iommu.c    | 22 +++++++++++++++++++++-
 hw/pci-host/ppce500.c    |  2 +-
 hw/pci-host/prep.c       |  2 +-
 hw/pci-host/sabre.c      |  2 +-
 hw/pci/pci.c             | 25 +++++++++++++++++++++++--
 hw/ppc/spapr_pci.c       |  2 +-
 hw/s390x/s390-pci-bus.c  |  4 ++--
 hw/vfio/pci.c            |  3 +++
 include/hw/pci/pci.h     | 12 +++++++++++-
 include/hw/pci/pci_bus.h |  1 +
 13 files changed, 68 insertions(+), 13 deletions(-)

diff --git a/hw/alpha/typhoon.c b/hw/alpha/typhoon.c
index 6a40869..a7b02cd 100644
--- a/hw/alpha/typhoon.c
+++ b/hw/alpha/typhoon.c
@@ -894,7 +894,7 @@ PCIBus *typhoon_init(ram_addr_t ram_size, ISABus **isa_bus,
                              "iommu-typhoon", UINT64_MAX);
     address_space_init(&s->pchip.iommu_as, MEMORY_REGION(&s->pchip.iommu),
                        "pchip0-pci");
-    pci_setup_iommu(b, typhoon_pci_dma_iommu, s);
+    pci_setup_iommu(b, typhoon_pci_dma_iommu, NULL, s);
 
     /* Pchip0 PCI special/interrupt acknowledge, 0x801.F800.0000, 64MB.  */
     memory_region_init_io(&s->pchip.reg_iack, OBJECT(s), &alpha_pci_iack_ops,
diff --git a/hw/hppa/dino.c b/hw/hppa/dino.c
index 15aefde..7867b46 100644
--- a/hw/hppa/dino.c
+++ b/hw/hppa/dino.c
@@ -481,7 +481,7 @@ PCIBus *dino_init(MemoryRegion *addr_space,
                                 0xf0000000 + DINO_MEM_CHUNK_SIZE,
                                 &s->bm_pci_alias);
     address_space_init(&s->bm_as, &s->bm, "pci-bm");
-    pci_setup_iommu(b, dino_pcihost_set_iommu, s);
+    pci_setup_iommu(b, dino_pcihost_set_iommu, NULL, s);
 
     *p_rtc_irq = qemu_allocate_irq(dino_set_timer_irq, s, 0);
     *p_ser_irq = qemu_allocate_irq(dino_set_serial_irq, s, 0);
diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 7bfde37..341a14d 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -1178,7 +1178,7 @@ static void amdvi_realize(DeviceState *dev, Error **err)
 
     sysbus_init_mmio(SYS_BUS_DEVICE(s), &s->mmio);
     sysbus_mmio_map(SYS_BUS_DEVICE(s), 0, AMDVI_BASE_ADDR);
-    pci_setup_iommu(bus, amdvi_host_dma_iommu, s);
+    pci_setup_iommu(bus, amdvi_host_dma_iommu, NULL, s);
     s->devid = object_property_get_int(OBJECT(&s->pci), "addr", err);
     msi_init(&s->pci.dev, 0, 1, true, false, err);
     amdvi_init(s);
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 9edf392..2fd0a6d 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3005,6 +3005,26 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
     return &vtd_as->as;
 }
 
+static int vtd_device_notify(PCIBus *bus,
+                             void *opaque,
+                             int devfn,
+                             PCIDevNotifyType type)
+{
+    IntelIOMMUState *s = opaque;
+    VTDAddressSpace *vtd_as;
+
+    assert(0 <= devfn && devfn < PCI_DEVFN_MAX);
+
+    vtd_as = vtd_find_add_as(s, bus, devfn);
+
+    if (vtd_as == NULL) {
+        return -1;
+    }
+
+    /* TODO: record assigned device in IOMMU Emulator */
+    return 0;
+}
+
 static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
 {
     X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
@@ -3075,7 +3095,7 @@ static void vtd_realize(DeviceState *dev, Error **errp)
                                               g_free, g_free);
     vtd_init(s);
     sysbus_mmio_map(SYS_BUS_DEVICE(s), 0, Q35_HOST_BRIDGE_IOMMU_ADDR);
-    pci_setup_iommu(bus, vtd_host_dma_iommu, dev);
+    pci_setup_iommu(bus, vtd_host_dma_iommu, vtd_device_notify, dev);
     /* Pseudo address space under root PCI bus. */
     pcms->ioapic_as = vtd_host_dma_iommu(bus, s, Q35_PSEUDO_DEVFN_IOAPIC);
 }
diff --git a/hw/pci-host/ppce500.c b/hw/pci-host/ppce500.c
index eb75e08..8175df5 100644
--- a/hw/pci-host/ppce500.c
+++ b/hw/pci-host/ppce500.c
@@ -469,7 +469,7 @@ static int e500_pcihost_initfn(SysBusDevice *dev)
     memory_region_init(&s->bm, OBJECT(s), "bm-e500", UINT64_MAX);
     memory_region_add_subregion(&s->bm, 0x0, &s->busmem);
     address_space_init(&s->bm_as, &s->bm, "pci-bm");
-    pci_setup_iommu(b, e500_pcihost_set_iommu, s);
+    pci_setup_iommu(b, e500_pcihost_set_iommu, NULL, s);
 
     pci_create_simple(b, 0, "e500-host-bridge");
 
diff --git a/hw/pci-host/prep.c b/hw/pci-host/prep.c
index 01f67f9..b4d02cf 100644
--- a/hw/pci-host/prep.c
+++ b/hw/pci-host/prep.c
@@ -282,7 +282,7 @@ static void raven_pcihost_initfn(Object *obj)
     memory_region_add_subregion(&s->bm, 0         , &s->bm_pci_memory_alias);
     memory_region_add_subregion(&s->bm, 0x80000000, &s->bm_ram_alias);
     address_space_init(&s->bm_as, &s->bm, "raven-bm");
-    pci_setup_iommu(&s->pci_bus, raven_pcihost_set_iommu, s);
+    pci_setup_iommu(&s->pci_bus, raven_pcihost_set_iommu, NULL, s);
 
     h->bus = &s->pci_bus;
 
diff --git a/hw/pci-host/sabre.c b/hw/pci-host/sabre.c
index e2f4ee4..e119753 100644
--- a/hw/pci-host/sabre.c
+++ b/hw/pci-host/sabre.c
@@ -399,7 +399,7 @@ static void sabre_realize(DeviceState *dev, Error **errp)
     /* IOMMU */
     memory_region_add_subregion_overlap(&s->sabre_config, 0x200,
                     sysbus_mmio_get_region(SYS_BUS_DEVICE(s->iommu), 0), 1);
-    pci_setup_iommu(phb->bus, sabre_pci_dma_iommu, s->iommu);
+    pci_setup_iommu(phb->bus, sabre_pci_dma_iommu, NULL, s->iommu);
 
     /* APB secondary busses */
     pci_dev = pci_create_multifunction(phb->bus, PCI_DEVFN(1, 0), true,
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 157fe21..0f2db02 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2567,9 +2567,30 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
     return &address_space_memory;
 }
 
-void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque)
+void pci_device_notify_iommu(PCIDevice *dev, PCIDevNotifyType type)
 {
-    bus->iommu_fn = fn;
+    PCIBus *bus = PCI_BUS(pci_get_bus(dev));
+    PCIBus *iommu_bus = bus;
+
+    while (iommu_bus && !iommu_bus->iommu_fn && iommu_bus->parent_dev) {
+        iommu_bus = PCI_BUS(pci_get_bus(iommu_bus->parent_dev));
+    }
+    if (iommu_bus && iommu_bus->notify_fn) {
+        iommu_bus->notify_fn(bus,
+                             iommu_bus->iommu_opaque,
+                             dev->devfn,
+                             type);
+    }
+    return;
+}
+
+void pci_setup_iommu(PCIBus *bus,
+                     PCIIOMMUFunc iommu_fn,
+                     PCIIOMMUNotifyFunc notify_fn,
+                     void *opaque)
+{
+    bus->iommu_fn = iommu_fn;
+    bus->notify_fn = notify_fn;
     bus->iommu_opaque = opaque;
 }
 
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 39a1498..2be0a0f 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1687,7 +1687,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
     memory_region_add_subregion(&sphb->iommu_root, SPAPR_PCI_MSI_WINDOW,
                                 &sphb->msiwindow);
 
-    pci_setup_iommu(bus, spapr_pci_dma_iommu, sphb);
+    pci_setup_iommu(bus, spapr_pci_dma_iommu, NULL, sphb);
 
     pci_bus_set_route_irq_fn(bus, spapr_route_intx_pin_to_irq);
 
diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 1bad7ab..cc650c4 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -705,7 +705,7 @@ static int s390_pcihost_init(SysBusDevice *dev)
                               s390_pci_set_irq, s390_pci_map_irq, NULL,
                               get_system_memory(), get_system_io(), 0, 64,
                               TYPE_PCI_BUS);
-    pci_setup_iommu(b, s390_pci_dma_iommu, s);
+    pci_setup_iommu(b, s390_pci_dma_iommu, NULL, s);
 
     bus = BUS(b);
     qbus_set_hotplug_handler(bus, DEVICE(dev), NULL);
@@ -817,7 +817,7 @@ static void s390_pcihost_hot_plug(HotplugHandler *hotplug_dev,
         PCIDevice *pdev = PCI_DEVICE(dev);
 
         pci_bridge_map_irq(pb, dev->id, s390_pci_map_irq);
-        pci_setup_iommu(&pb->sec_bus, s390_pci_dma_iommu, s);
+        pci_setup_iommu(&pb->sec_bus, s390_pci_dma_iommu, NULL, s);
 
         bus = BUS(&pb->sec_bus);
         qbus_set_hotplug_handler(bus, DEVICE(s), errp);
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index b7297cc..a9c0898 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3098,6 +3098,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
     vfio_register_req_notifier(vdev);
     vfio_setup_resetfn_quirk(vdev);
 
+    pci_device_notify_iommu(pdev, PCI_NTY_DEV_ADD);
+
     pci_setup_sva_ops(pdev, &vfio_pci_sva_ops);
 
     return;
@@ -3134,6 +3136,7 @@ static void vfio_exitfn(PCIDevice *pdev)
 {
     VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
 
+    pci_device_notify_iommu(pdev, PCI_NTY_DEV_DEL);
     vfio_unregister_req_notifier(vdev);
     vfio_unregister_err_notifier(vdev);
     pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 32889a4..964be2b 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -485,10 +485,20 @@ void pci_bus_get_w64_range(PCIBus *bus, Range *range);
 
 void pci_device_deassert_intx(PCIDevice *dev);
 
+enum PCIDevNotifyType {
+    PCI_NTY_DEV_ADD = 0,
+    PCI_NTY_DEV_DEL,
+};
+typedef enum PCIDevNotifyType PCIDevNotifyType;
 typedef AddressSpace *(*PCIIOMMUFunc)(PCIBus *, void *, int);
+typedef int (*PCIIOMMUNotifyFunc)(PCIBus *, void *, int, PCIDevNotifyType);
 
 AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
-void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque);
+void pci_device_notify_iommu(PCIDevice *dev, PCIDevNotifyType type);
+void pci_setup_iommu(PCIBus *bus,
+                     PCIIOMMUFunc iommu_fn,
+                     PCIIOMMUNotifyFunc notify_fn,
+                     void *opaque);
 
 void pci_setup_sva_ops(PCIDevice *dev, PCISVAOps *ops);
 void pci_device_sva_bind_pasid_table(PCIBus *bus, int32_t devfn,
diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
index b7da8f5..54a0c8e 100644
--- a/include/hw/pci/pci_bus.h
+++ b/include/hw/pci/pci_bus.h
@@ -21,6 +21,7 @@ typedef struct PCIBusClass {
 struct PCIBus {
     BusState qbus;
     PCIIOMMUFunc iommu_fn;
+    PCIIOMMUNotifyFunc notify_fn;
     void *iommu_opaque;
     uint8_t devfn_min;
     uint32_t slot_reserved_mask;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH v3 09/12] intel_iommu: record assigned devices in a list
  2018-03-01 10:33 [Qemu-devel] [PATCH v3 00/12] Introduce new iommu notifier framework for virt-SVA Liu, Yi L
                   ` (7 preceding siblings ...)
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 08/12] hw/pci: introduce pci_device_notify_iommu() Liu, Yi L
@ 2018-03-01 10:33 ` Liu, Yi L
  2018-03-02 15:08   ` Paolo Bonzini
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 10/12] intel_iommu: bind guest pasid table to host Liu, Yi L
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 65+ messages in thread
From: Liu, Yi L @ 2018-03-01 10:33 UTC (permalink / raw)
  To: qemu-devel, mst, david
  Cc: pbonzini, alex.williamson, eric.auger.pro, yi.l.liu, peterx,
	kevin.tian, jasowang, Liu, Yi L

This patch records assigned devices in a list within the Intel vIOMMU
emulator. The recorded info can be used to filter out the affected
assigned devices when Qemu captures a guest cache invalidation request.

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 hw/i386/intel_iommu.c         | 31 ++++++++++++++++++++++++++-----
 include/hw/i386/intel_iommu.h | 11 ++++++++++-
 2 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 2fd0a6d..978f47a 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2685,7 +2685,10 @@ static const MemoryRegionOps vtd_mem_ir_ops = {
     },
 };
 
-VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
+VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s,
+                                 PCIBus *bus,
+                                 int devfn,
+                                 bool allocate)
 {
     uintptr_t key = (uintptr_t)bus;
     VTDBus *vtd_bus = g_hash_table_lookup(s->vtd_as_by_busptr, &key);
@@ -2704,7 +2707,7 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
 
     vtd_dev_as = vtd_bus->dev_as[devfn];
 
-    if (!vtd_dev_as) {
+    if (!vtd_dev_as && allocate) {
         snprintf(name, sizeof(name), "intel_iommu_devfn_%d", devfn);
         vtd_bus->dev_as[devfn] = vtd_dev_as = g_malloc0(sizeof(VTDAddressSpace));
 
@@ -3001,7 +3004,7 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
 
     assert(0 <= devfn && devfn < PCI_DEVFN_MAX);
 
-    vtd_as = vtd_find_add_as(s, bus, devfn);
+    vtd_as = vtd_find_add_as(s, bus, devfn, true);
     return &vtd_as->as;
 }
 
@@ -3012,16 +3015,34 @@ static int vtd_device_notify(PCIBus *bus,
 {
     IntelIOMMUState *s = opaque;
     VTDAddressSpace *vtd_as;
+    IntelIOMMUAssignedDeviceNode *node = NULL;
+    IntelIOMMUAssignedDeviceNode *next_node = NULL;
 
     assert(0 <= devfn && devfn < PCI_DEVFN_MAX);
 
-    vtd_as = vtd_find_add_as(s, bus, devfn);
+    vtd_as = vtd_find_add_as(s, bus, devfn, false);
 
     if (vtd_as == NULL) {
         return -1;
     }
 
-    /* TODO: record assigned device in IOMMU Emulator */
+    if (type == PCI_NTY_DEV_ADD) {
+        node = g_malloc0(sizeof(*node));
+        node->vtd_as = vtd_as;
+        QLIST_INSERT_HEAD(&s->assigned_device_list, node, next);
+        return 0;
+    }
+
+    QLIST_FOREACH_SAFE(node, &s->assigned_device_list, next, next_node) {
+        if (node->vtd_as == vtd_as) {
+            if (type == PCI_NTY_DEV_DEL) {
+                QLIST_REMOVE(node, next);
+                g_free(node);
+            }
+            break;
+        }
+    }
+
     return 0;
 }
 
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 1df6fa9..0b6dc32 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -68,6 +68,7 @@ typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress;
 typedef struct VTDIrq VTDIrq;
 typedef struct VTD_MSIMessage VTD_MSIMessage;
 typedef struct IntelIOMMUMRNotifierNode IntelIOMMUMRNotifierNode;
+typedef struct IntelIOMMUAssignedDeviceNode IntelIOMMUAssignedDeviceNode;
 
 /* Context-Entry */
 struct VTDContextEntry {
@@ -258,6 +259,11 @@ struct IntelIOMMUMRNotifierNode {
     QLIST_ENTRY(IntelIOMMUMRNotifierNode) next;
 };
 
+struct IntelIOMMUAssignedDeviceNode {
+    VTDAddressSpace *vtd_as;
+    QLIST_ENTRY(IntelIOMMUAssignedDeviceNode) next;
+};
+
 /* The iommu (DMAR) device state struct */
 struct IntelIOMMUState {
     X86IOMMUState x86_iommu;
@@ -296,6 +302,8 @@ struct IntelIOMMUState {
     VTDBus *vtd_as_by_bus_num[VTD_PCI_BUS_MAX]; /* VTDBus objects indexed by bus number */
     /* list of registered notifiers */
     QLIST_HEAD(, IntelIOMMUMRNotifierNode) notifiers_list;
+    /* list of assigned devices */
+    QLIST_HEAD(, IntelIOMMUAssignedDeviceNode) assigned_device_list;
 
     /* interrupt remapping */
     bool intr_enabled;              /* Whether guest enabled IR */
@@ -310,6 +318,7 @@ struct IntelIOMMUState {
 /* Find the VTD Address space associated with the given bus pointer,
  * create a new one if none exists
  */
-VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn);
+VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
+                                 int devfn, bool allocate);
 
 #endif
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH v3 10/12] intel_iommu: bind guest pasid table to host
  2018-03-01 10:33 [Qemu-devel] [PATCH v3 00/12] Introduce new iommu notifier framework for virt-SVA Liu, Yi L
                   ` (8 preceding siblings ...)
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 09/12] intel_iommu: record assigned devices in a list Liu, Yi L
@ 2018-03-01 10:33 ` Liu, Yi L
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 11/12] intel_iommu: add framework for PASID AddressSpace management Liu, Yi L
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 65+ messages in thread
From: Liu, Yi L @ 2018-03-01 10:33 UTC (permalink / raw)
  To: qemu-devel, mst, david
  Cc: pbonzini, alex.williamson, eric.auger.pro, yi.l.liu, peterx,
	kevin.tian, jasowang, Liu, Yi L

For assigned SVA-capable devices, the guest pasid table needs to be
bound to the host. The Intel vIOMMU emulator captures the
device-selective context cache flush and propagates the guest pasid
table pointer to the host; the host IOMMU driver then configures it
in its translation structure.
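
For the complete picture, the VFIO callback that receives this pointer
(a stub in the earlier vfio/pci patches) would eventually do something
along the lines below. This is only a sketch; the lookup helper comes
from the sva notifier patch, and the actual host interface (a VFIO
ioctl on the container) is intentionally left open here:

static void vfio_pci_device_sva_bind_pasid_table(PCIBus *bus,
                 int32_t devfn, uint64_t pasidt_addr, uint32_t size)
{
    VFIOContainer *container = vfio_get_container_from_busdev(bus, devfn);

    if (!container) {
        return;
    }

    /* forward pasidt_addr/size to the host IOMMU through a VFIO ioctl
     * on container->fd; the exact ioctl interface is still TBD */
}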

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 hw/i386/intel_iommu.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 978f47a..d92a66d 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -37,6 +37,10 @@
 #include "kvm_i386.h"
 #include "trace.h"
 
+static bool vtd_device_is_assigned(IntelIOMMUState *s,
+                                   PCIBus *bus,
+                                   uint16_t devfn);
+
 static void vtd_define_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val,
                             uint64_t wmask, uint64_t w1cmask)
 {
@@ -1255,6 +1259,20 @@ static void vtd_context_global_invalidate(IntelIOMMUState *s)
     vtd_iommu_replay_all(s);
 }
 
+static uint64_t vtd_get_pasid_table_from_context(VTDContextEntry *ce)
+{
+    uint64_t pasidt_addr = ce->hi;
+    /* TODO: TBD */
+    return pasidt_addr;
+}
+
+static uint32_t vtd_get_pasidt_size_from_context(VTDContextEntry *ce)
+{
+    uint32_t pasidt_size = ce->hi;
+    /* TODO: TBD */
+    return pasidt_size;
+}
+
 /* Do a context-cache device-selective invalidation.
  * @func_mask: FM field after shifting
  */
@@ -1291,6 +1309,11 @@ static void vtd_context_device_invalidate(IntelIOMMUState *s,
     if (vtd_bus) {
         devfn = VTD_SID_TO_DEVFN(source_id);
         for (devfn_it = 0; devfn_it < PCI_DEVFN_MAX; ++devfn_it) {
+            VTDContextEntry ce;
+            int ret = 0;
+            uint64_t pasidt_addr;
+            uint32_t size;
+
             vtd_as = vtd_bus->dev_as[devfn_it];
             if (vtd_as && ((devfn_it & mask) == (devfn & mask))) {
                 trace_vtd_inv_desc_cc_device(bus_n, VTD_PCI_SLOT(devfn_it),
@@ -1311,6 +1334,26 @@ static void vtd_context_device_invalidate(IntelIOMMUState *s,
                  * happened.
                  */
                 memory_region_iommu_replay_all(&vtd_as->iommu);
+
+                /*
+                 * If the device is an SVA capable assigned device, its
+                 * guest pasid table needs to be bound to the host
+                 *
+                 */
+                if (!vtd_device_is_assigned(s, vtd_as->bus, devfn_it)) {
+                    continue;
+                }
+
+                ret = vtd_dev_to_context_entry(s, bus_n,
+                                               vtd_as->devfn, &ce);
+                if (ret) {
+                    continue;
+                }
+
+                pasidt_addr = vtd_get_pasid_table_from_context(&ce);
+                size = vtd_get_pasidt_size_from_context(&ce);
+                pci_device_sva_bind_pasid_table(vtd_as->bus, devfn_it,
+                                                pasidt_addr, size);
             }
         }
     }
@@ -3046,6 +3089,32 @@ static int vtd_device_notify(PCIBus *bus,
     return 0;
 }
 
+static bool vtd_device_is_assigned(IntelIOMMUState *s,
+                                   PCIBus *bus,
+                                   uint16_t devfn)
+{
+    VTDAddressSpace *vtd_as;
+    IntelIOMMUAssignedDeviceNode *node = NULL;
+    IntelIOMMUAssignedDeviceNode *next_node = NULL;
+
+    vtd_as = vtd_find_add_as(s, bus, devfn, false);
+
+    if (vtd_as == NULL) {
+        /*
+         * If vtd_as is NULL, return false to be safe
+         */
+        return false;
+    }
+
+    QLIST_FOREACH_SAFE(node, &s->assigned_device_list, next, next_node) {
+        if (node->vtd_as == vtd_as) {
+            return true;
+        }
+    }
+
+    return false;
+}
+
 static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
 {
     X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH v3 11/12] intel_iommu: add framework for PASID AddressSpace management
  2018-03-01 10:33 [Qemu-devel] [PATCH v3 00/12] Introduce new iommu notifier framework for virt-SVA Liu, Yi L
                   ` (9 preceding siblings ...)
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 10/12] intel_iommu: bind guest pasid table to host Liu, Yi L
@ 2018-03-01 10:33 ` Liu, Yi L
  2018-03-02 14:52   ` Paolo Bonzini
  2018-03-02 15:00   ` Paolo Bonzini
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 12/12] intel_iommu: bind device to PASID tagged AddressSpace Liu, Yi L
  2018-03-06  6:55 ` [Qemu-devel] [PATCH v3 00/12] Introduce new iommu notifier framework for virt-SVA Peter Xu
  12 siblings, 2 replies; 65+ messages in thread
From: Liu, Yi L @ 2018-03-01 10:33 UTC (permalink / raw)
  To: qemu-devel, mst, david
  Cc: pbonzini, alex.williamson, eric.auger.pro, yi.l.liu, peterx,
	kevin.tian, jasowang, Liu, Yi L

This patch introduces a framework to manage PASID tagged AddressSpaces
in the Intel vIOMMU emulator. A PASID tagged AddressSpace is an address
space which is an abstraction of a guest process address space in Qemu.
The management framework is as below:

         s->pasid_as_list
              /|\ \
             / | \ \
     pasid_as_node  ...
        /|\ \
       / | \ \
  device ...

There is a list that stores all the PASID tagged AddressSpaces, and
each PASID tagged AddressSpace has a device list behind it. This is due
to the fact that a PASID tagged AddressSpace can have multiple devices
bound to it.
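
To illustrate the two levels, a sketch of how the structures added
below might be walked when an event has to reach every device bound to
a given pasid. The helper name is a placeholder; the actual lookup and
bind logic arrive in the next patch:

static void vtd_pasid_as_for_each_device(IntelIOMMUState *s, uint32_t pasid)
{
    IntelPASIDNode *node;
    VTDDeviceNode *dev_node;

    QLIST_FOREACH(node, &s->pasid_as_list, next) {
        VTDPASIDAddressSpace *pasid_as = node->pasid_as;

        if (pasid_as->sva_ctx.pasid != pasid) {
            continue;
        }
        QLIST_FOREACH(dev_node, &pasid_as->device_list, next) {
            /* act on (dev_node->bus, dev_node->devfn) here */
        }
    }
}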

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 hw/i386/intel_iommu.c         |  1 +
 include/hw/i386/intel_iommu.h | 24 ++++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index d92a66d..b8e8dbb 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3174,6 +3174,7 @@ static void vtd_realize(DeviceState *dev, Error **errp)
     }
 
     QLIST_INIT(&s->notifiers_list);
+    QLIST_INIT(&s->pasid_as_list);
     memset(s->vtd_as_by_bus_num, 0, sizeof(s->vtd_as_by_bus_num));
     memory_region_init_io(&s->csrmem, OBJECT(s), &vtd_mem_ops, s,
                           "intel_iommu", DMAR_REG_SIZE);
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 0b6dc32..c45dbfe 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -61,6 +61,7 @@ typedef struct VTDContextEntry VTDContextEntry;
 typedef struct VTDContextCacheEntry VTDContextCacheEntry;
 typedef struct IntelIOMMUState IntelIOMMUState;
 typedef struct VTDAddressSpace VTDAddressSpace;
+typedef struct VTDPASIDAddressSpace VTDPASIDAddressSpace;
 typedef struct VTDIOTLBEntry VTDIOTLBEntry;
 typedef struct VTDBus VTDBus;
 typedef union VTD_IR_TableEntry VTD_IR_TableEntry;
@@ -69,6 +70,8 @@ typedef struct VTDIrq VTDIrq;
 typedef struct VTD_MSIMessage VTD_MSIMessage;
 typedef struct IntelIOMMUMRNotifierNode IntelIOMMUMRNotifierNode;
 typedef struct IntelIOMMUAssignedDeviceNode IntelIOMMUAssignedDeviceNode;
+typedef struct IntelPASIDNode IntelPASIDNode;
+typedef struct VTDDeviceNode VTDDeviceNode;
 
 /* Context-Entry */
 struct VTDContextEntry {
@@ -84,6 +87,20 @@ struct VTDContextCacheEntry {
     struct VTDContextEntry context_entry;
 };
 
+struct VTDDeviceNode {
+    PCIBus *bus;
+    uint8_t devfn;
+    QLIST_ENTRY(VTDDeviceNode) next;
+};
+
+struct VTDPASIDAddressSpace {
+    AddressSpace as;
+    IOMMUSVAContext sva_ctx;
+    IntelIOMMUState *iommu_state;
+    /* list of devices binded to a pasid tagged address space */
+    QLIST_HEAD(, VTDDeviceNode) device_list;
+};
+
 struct VTDAddressSpace {
     PCIBus *bus;
     uint8_t devfn;
@@ -264,6 +281,11 @@ struct IntelIOMMUAssignedDeviceNode {
     QLIST_ENTRY(IntelIOMMUAssignedDeviceNode) next;
 };
 
+struct IntelPASIDNode {
+    VTDPASIDAddressSpace *pasid_as;
+    QLIST_ENTRY(IntelPASIDNode) next;
+};
+
 /* The iommu (DMAR) device state struct */
 struct IntelIOMMUState {
     X86IOMMUState x86_iommu;
@@ -304,6 +326,8 @@ struct IntelIOMMUState {
     QLIST_HEAD(, IntelIOMMUMRNotifierNode) notifiers_list;
     /* list of assigned devices */
     QLIST_HEAD(, IntelIOMMUAssignedDeviceNode) assigned_device_list;
+    /* list of pasid tagged address space */
+    QLIST_HEAD(, IntelPASIDNode) pasid_as_list;
 
     /* interrupt remapping */
     bool intr_enabled;              /* Whether guest enabled IR */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH v3 12/12] intel_iommu: bind device to PASID tagged AddressSpace
  2018-03-01 10:33 [Qemu-devel] [PATCH v3 00/12] Introduce new iommu notifier framework for virt-SVA Liu, Yi L
                   ` (10 preceding siblings ...)
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 11/12] intel_iommu: add framework for PASID AddressSpace management Liu, Yi L
@ 2018-03-01 10:33 ` Liu, Yi L
  2018-03-02 14:51   ` Paolo Bonzini
  2018-03-06 11:43   ` Peter Xu
  2018-03-06  6:55 ` [Qemu-devel] [PATCH v3 00/12] Introduce new iommu notifier framework for virt-SVA Peter Xu
  12 siblings, 2 replies; 65+ messages in thread
From: Liu, Yi L @ 2018-03-01 10:33 UTC (permalink / raw)
  To: qemu-devel, mst, david
  Cc: pbonzini, alex.williamson, eric.auger.pro, yi.l.liu, peterx,
	kevin.tian, jasowang, Liu, Yi L

This patch shows the idea of how a device is bound to a PASID tagged
AddressSpace.

When the Intel vIOMMU emulator detects a pasid table entry programming
from the guest, it first finds a VTDPASIDAddressSpace using the pasid
field of the pasid cache invalidation request.

* If it is to bind a device to a guest process, the device needs to be
  added to the device list behind the VTDPASIDAddressSpace. And if the
  device is an assigned device, an sva_notifier needs to be registered
  for future tlb flushing whenever a mapping in the process address
  space changes.

* If it is to unbind a device from a guest process, the device needs to
  be removed from the device list behind the VTDPASIDAddressSpace, and
  the sva_notifier also needs to be unregistered if the device is an
  assigned device.

This patch hasn't added the unbind logic yet. It depends on guest pasid
table entry parsing, which requires further emulation. The intent here
is just to show the idea of the PASID tagged AddressSpace management
framework; the full unregister logic will be included in a future
virt-SVA patchset.
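
For illustration, a rough sketch of what the missing unbind path could
look like, assuming a pci_device_sva_unregister_notifier() wrapper that
mirrors the register call used in vtd_bind_device_to_pasid_as() below:

static void vtd_unbind_device_from_pasid_as(VTDPASIDAddressSpace *vtd_pasid_as,
                                            PCIBus *bus, uint8_t devfn)
{
    VTDDeviceNode *node, *tmp;

    QLIST_FOREACH_SAFE(node, &vtd_pasid_as->device_list, next, tmp) {
        if (node->bus == bus && node->devfn == devfn) {
            /* stop receiving tlb flush events for this device */
            pci_device_sva_unregister_notifier(bus, devfn,
                                               &vtd_pasid_as->sva_ctx);
            QLIST_REMOVE(node, next);
            g_free(node);
            return;
        }
    }
}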

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 hw/i386/intel_iommu.c          | 119 +++++++++++++++++++++++++++++++++++++++++
 hw/i386/intel_iommu_internal.h |  10 ++++
 2 files changed, 129 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index b8e8dbb..ed07035 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1801,6 +1801,118 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
     return true;
 }
 
+static VTDPASIDAddressSpace *vtd_get_pasid_as(IntelIOMMUState *s,
+                                              uint32_t pasid)
+{
+    VTDPASIDAddressSpace *vtd_pasid_as = NULL;
+    IntelPASIDNode *node;
+    char name[128];
+
+    QLIST_FOREACH(node, &(s->pasid_as_list), next) {
+        vtd_pasid_as = node->pasid_as;
+        if (pasid == vtd_pasid_as->sva_ctx.pasid) {
+            return vtd_pasid_as;
+        }
+    }
+
+    vtd_pasid_as = g_malloc0(sizeof(*vtd_pasid_as));
+    vtd_pasid_as->iommu_state = s;
+    snprintf(name, sizeof(name), "intel_iommu_pasid_%d", pasid);
+    address_space_init(&vtd_pasid_as->as, NULL, "pasid");
+    QLIST_INIT(&vtd_pasid_as->device_list);
+
+    node = g_malloc0(sizeof(*node));
+    node->pasid_as = vtd_pasid_as;
+    QLIST_INSERT_HEAD(&s->pasid_as_list, node, next);
+
+    return vtd_pasid_as;
+}
+
+static void vtd_bind_device_to_pasid_as(VTDPASIDAddressSpace *vtd_pasid_as,
+                                        PCIBus *bus, uint8_t devfn)
+{
+    VTDDeviceNode *node = NULL;
+
+    QLIST_FOREACH(node, &(vtd_pasid_as->device_list), next) {
+        if (node->bus == bus && node->devfn == devfn) {
+            return;
+        }
+    }
+
+    node = g_malloc0(sizeof(*node));
+    node->bus = bus;
+    node->devfn = devfn;
+    QLIST_INSERT_HEAD(&(vtd_pasid_as->device_list), node, next);
+
+    pci_device_sva_register_notifier(bus, devfn, &vtd_pasid_as->sva_ctx);
+
+    return;
+}
+
+static bool vtd_process_pc_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
+{
+
+    IntelIOMMUAssignedDeviceNode *node = NULL;
+    int ret = 0;
+
+    uint16_t domain_id;
+    uint32_t pasid;
+    VTDPASIDAddressSpace *vtd_pasid_as;
+
+    if ((inv_desc->lo & VTD_INV_DESC_PASIDC_RSVD_LO) ||
+        (inv_desc->hi & VTD_INV_DESC_PASIDC_RSVD_HI)) {
+        return false;
+    }
+
+    domain_id = VTD_INV_DESC_PASIDC_DID(inv_desc->lo);
+
+    switch (inv_desc->lo & VTD_INV_DESC_PASIDC_G) {
+    case VTD_INV_DESC_PASIDC_ALL_ALL:
+        /* TODO: invalidate all pasid related cache */
+        break;
+
+    case VTD_INV_DESC_PASIDC_PASID_SI:
+        pasid = VTD_INV_DESC_PASIDC_PASID(inv_desc->lo);
+        vtd_pasid_as = vtd_get_pasid_as(s, pasid);
+        QLIST_FOREACH(node, &(s->assigned_device_list), next) {
+            VTDAddressSpace *vtd_as = node->vtd_as;
+            VTDContextEntry ce;
+            uint16_t did;
+            uint8_t bus = pci_bus_num(vtd_as->bus);
+            ret = vtd_dev_to_context_entry(s, bus,
+                                   vtd_as->devfn, &ce);
+            if (ret != 0) {
+                continue;
+            }
+
+            did = VTD_CONTEXT_ENTRY_DID(ce.hi);
+            /*
+             * If the did field equals the domain_id field of the inv
+             * descriptor, the device is affected by this invalidation
+             * request and needs to be bound to or unbound from the
+             * pasid tagged address space.
+             * a) If it is a bind, add the device to the device list
+             *    and register a tlb flush notifier for it.
+             * b) If it is an unbind, remove the device from the device
+             *    list and unregister the tlb flush notifier.
+             * TODO: add the unbind logic accordingly; it depends on
+             *       guest pasid table entry parsing, which is not done here.
+             *
+             */
+            if (did == domain_id) {
+                vtd_bind_device_to_pasid_as(vtd_pasid_as,
+                                  vtd_as->bus, vtd_as->devfn);
+            }
+        }
+        break;
+
+    default:
+        return false;
+    }
+
+    return true;
+}
+
 static bool vtd_process_inv_iec_desc(IntelIOMMUState *s,
                                      VTDInvDesc *inv_desc)
 {
@@ -1911,6 +2023,13 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
         }
         break;
 
+    case VTD_INV_DESC_PC:
+        trace_vtd_inv_desc("pc", inv_desc.hi, inv_desc.lo);
+        if (!vtd_process_pc_desc(s, &inv_desc)) {
+            return false;
+        }
+        break;
+
     case VTD_INV_DESC_IEC:
         trace_vtd_inv_desc("iec", inv_desc.hi, inv_desc.lo);
         if (!vtd_process_inv_iec_desc(s, &inv_desc)) {
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index d084099..31d0d53 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -332,6 +332,7 @@ typedef union VTDInvDesc VTDInvDesc;
 #define VTD_INV_DESC_IEC                0x4 /* Interrupt Entry Cache
                                                Invalidate Descriptor */
 #define VTD_INV_DESC_WAIT               0x5 /* Invalidation Wait Descriptor */
+#define VTD_INV_DESC_PC                 0x7 /* PASID-cache Invalidate Desc */
 #define VTD_INV_DESC_NONE               0   /* Not an Invalidate Descriptor */
 
 /* Masks for Invalidation Wait Descriptor*/
@@ -388,6 +389,15 @@ typedef union VTDInvDesc VTDInvDesc;
 #define VTD_SPTE_LPAGE_L4_RSVD_MASK(aw) \
         (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM))
 
+#define VTD_INV_DESC_PASIDC_G          (3ULL << 4)
+#define VTD_INV_DESC_PASIDC_PASID(val) (((val) >> 32) & 0xfffffULL)
+#define VTD_INV_DESC_PASIDC_DID(val)   (((val) >> 16) & VTD_DOMAIN_ID_MASK)
+#define VTD_INV_DESC_PASIDC_RSVD_LO    0xfff000000000ffc0ULL
+#define VTD_INV_DESC_PASIDC_RSVD_HI    0xffffffffffffffffULL
+
+#define VTD_INV_DESC_PASIDC_ALL_ALL    (0ULL << 4)
+#define VTD_INV_DESC_PASIDC_PASID_SI   (1ULL << 4)
+
 /* Information about page-selective IOTLB invalidate */
 struct VTDIOTLBPageInvInfo {
     uint16_t domain_id;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 12/12] intel_iommu: bind device to PASID tagged AddressSpace
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 12/12] intel_iommu: bind device to PASID tagged AddressSpace Liu, Yi L
@ 2018-03-02 14:51   ` Paolo Bonzini
  2018-03-05  9:56     ` Liu, Yi L
  2018-03-06 11:43   ` Peter Xu
  1 sibling, 1 reply; 65+ messages in thread
From: Paolo Bonzini @ 2018-03-02 14:51 UTC (permalink / raw)
  To: Liu, Yi L, qemu-devel, mst, david
  Cc: alex.williamson, eric.auger.pro, yi.l.liu, peterx, kevin.tian, jasowang

On 01/03/2018 11:33, Liu, Yi L wrote:
> +    IntelPASIDNode *node;
> +    char name[128];
> +
> +    QLIST_FOREACH(node, &(s->pasid_as_list), next) {
> +        vtd_pasid_as = node->pasid_as;
> +        if (pasid == vtd_pasid_as->sva_ctx.pasid) {
> +            return vtd_pasid_as;
> +        }
> +    }
> +
> +    vtd_pasid_as = g_malloc0(sizeof(*vtd_pasid_as));
> +    vtd_pasid_as->iommu_state = s;
> +    snprintf(name, sizeof(name), "intel_iommu_pasid_%d", pasid);
> +    address_space_init(&vtd_pasid_as->as, NULL, "pasid");

The name is unused here.  The call to address_space_init should probably
use it.

You also don't need the separate IntelPASIDNode, because the
QLIST_ENTRY can be placed directly in VTDPASIDAddressSpace.

Paolo

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 11/12] intel_iommu: add framework for PASID AddressSpace management
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 11/12] intel_iommu: add framework for PASID AddressSpace management Liu, Yi L
@ 2018-03-02 14:52   ` Paolo Bonzini
  2018-03-05  9:12     ` Liu, Yi L
  2018-03-02 15:00   ` Paolo Bonzini
  1 sibling, 1 reply; 65+ messages in thread
From: Paolo Bonzini @ 2018-03-02 14:52 UTC (permalink / raw)
  To: Liu, Yi L, qemu-devel, mst, david
  Cc: alex.williamson, eric.auger.pro, yi.l.liu, peterx, kevin.tian, jasowang

On 01/03/2018 11:33, Liu, Yi L wrote:
> This patch introduces a framework to manage PASID tagged AddressSpaces
> in the Intel vIOMMU emulator. A PASID tagged AddressSpace is an address
> space which is an abstraction of a guest process address space in Qemu.
> The management framework is as below:
> 
>          s->pasid_as_list
>               /|\ \
>              / | \ \
>      pasid_as_node  ...
>         /|\ \
>        / | \ \
>   device ...
> 
> There is a list that stores all the PASID tagged AddressSpaces, and
> each PASID tagged AddressSpace has a device list behind it. This is due
> to the fact that a PASID tagged AddressSpace can have multiple devices
> bound to it.
> 
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> ---
>  hw/i386/intel_iommu.c         |  1 +
>  include/hw/i386/intel_iommu.h | 24 ++++++++++++++++++++++++
>  2 files changed, 25 insertions(+)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index d92a66d..b8e8dbb 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -3174,6 +3174,7 @@ static void vtd_realize(DeviceState *dev, Error **errp)
>      }
>  
>      QLIST_INIT(&s->notifiers_list);
> +    QLIST_INIT(&s->pasid_as_list);
>      memset(s->vtd_as_by_bus_num, 0, sizeof(s->vtd_as_by_bus_num));
>      memory_region_init_io(&s->csrmem, OBJECT(s), &vtd_mem_ops, s,
>                            "intel_iommu", DMAR_REG_SIZE);
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> index 0b6dc32..c45dbfe 100644
> --- a/include/hw/i386/intel_iommu.h
> +++ b/include/hw/i386/intel_iommu.h
> @@ -61,6 +61,7 @@ typedef struct VTDContextEntry VTDContextEntry;
>  typedef struct VTDContextCacheEntry VTDContextCacheEntry;
>  typedef struct IntelIOMMUState IntelIOMMUState;
>  typedef struct VTDAddressSpace VTDAddressSpace;
> +typedef struct VTDPASIDAddressSpace VTDPASIDAddressSpace;
>  typedef struct VTDIOTLBEntry VTDIOTLBEntry;
>  typedef struct VTDBus VTDBus;
>  typedef union VTD_IR_TableEntry VTD_IR_TableEntry;
> @@ -69,6 +70,8 @@ typedef struct VTDIrq VTDIrq;
>  typedef struct VTD_MSIMessage VTD_MSIMessage;
>  typedef struct IntelIOMMUMRNotifierNode IntelIOMMUMRNotifierNode;
>  typedef struct IntelIOMMUAssignedDeviceNode IntelIOMMUAssignedDeviceNode;
> +typedef struct IntelPASIDNode IntelPASIDNode;
> +typedef struct VTDDeviceNode VTDDeviceNode;
>  
>  /* Context-Entry */
>  struct VTDContextEntry {
> @@ -84,6 +87,20 @@ struct VTDContextCacheEntry {
>      struct VTDContextEntry context_entry;
>  };
>  
> +struct VTDDeviceNode {
> +    PCIBus *bus;
> +    uint8_t devfn;
> +    QLIST_ENTRY(VTDDeviceNode) next;
> +};
> +
> +struct VTDPASIDAddressSpace {
> +    AddressSpace as;
> +    IOMMUSVAContext sva_ctx;
> +    IntelIOMMUState *iommu_state;
> +    /* list of devices binded to a pasid tagged address space */
> +    QLIST_HEAD(, VTDDeviceNode) device_list;
> +};
> +
>  struct VTDAddressSpace {
>      PCIBus *bus;
>      uint8_t devfn;
> @@ -264,6 +281,11 @@ struct IntelIOMMUAssignedDeviceNode {
>      QLIST_ENTRY(IntelIOMMUAssignedDeviceNode) next;
>  };
>  
> +struct IntelPASIDNode {
> +    VTDPASIDAddressSpace *pasid_as;
> +    QLIST_ENTRY(IntelPASIDNode) next;
> +};
> +
>  /* The iommu (DMAR) device state struct */
>  struct IntelIOMMUState {
>      X86IOMMUState x86_iommu;
> @@ -304,6 +326,8 @@ struct IntelIOMMUState {
>      QLIST_HEAD(, IntelIOMMUMRNotifierNode) notifiers_list;
>      /* list of assigned devices */
>      QLIST_HEAD(, IntelIOMMUAssignedDeviceNode) assigned_device_list;
> +    /* list of pasid tagged address space */
> +    QLIST_HEAD(, IntelPASIDNode) pasid_as_list;
>  
>      /* interrupt remapping */
>      bool intr_enabled;              /* Whether guest enabled IR */
> 

Please merge this patch with the next one, since they are basically the
.h and .c sides of the same thing.

Paolo

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 11/12] intel_iommu: add framework for PASID AddressSpace management
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 11/12] intel_iommu: add framework for PASID AddressSpace management Liu, Yi L
  2018-03-02 14:52   ` Paolo Bonzini
@ 2018-03-02 15:00   ` Paolo Bonzini
  2018-03-05  9:11     ` Liu, Yi L
  1 sibling, 1 reply; 65+ messages in thread
From: Paolo Bonzini @ 2018-03-02 15:00 UTC (permalink / raw)
  To: Liu, Yi L, qemu-devel, mst, david
  Cc: alex.williamson, eric.auger.pro, yi.l.liu, peterx, kevin.tian, jasowang

On 01/03/2018 11:33, Liu, Yi L wrote:
> +struct VTDDeviceNode {
> +    PCIBus *bus;
> +    uint8_t devfn;
> +    QLIST_ENTRY(VTDDeviceNode) next;
> +};

Do you really need VTDDeviceNode?  I think can you simply put the
QLIST_ENTRY in VTDAddressSpace (named e.g. next_by_pasid), since
VTDAddressSpace already includes a (bus, devfn).

Thanks,

Paolo

> +struct VTDPASIDAddressSpace {
> +    AddressSpace as;
> +    IOMMUSVAContext sva_ctx;
> +    IntelIOMMUState *iommu_state;
> +    /* list of devices binded to a pasid tagged address space */
> +    QLIST_HEAD(, VTDDeviceNode) device_list;
> +};
> +

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 01/12] memory: rename existing iommu notifier to be iommu mr notifier
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 01/12] memory: rename existing iommu notifier to be iommu mr notifier Liu, Yi L
@ 2018-03-02 15:01   ` Paolo Bonzini
  2018-03-05 10:09     ` Liu, Yi L
  0 siblings, 1 reply; 65+ messages in thread
From: Paolo Bonzini @ 2018-03-02 15:01 UTC (permalink / raw)
  To: Liu, Yi L, qemu-devel, mst, david
  Cc: alex.williamson, eric.auger.pro, yi.l.liu, peterx, kevin.tian, jasowang

On 01/03/2018 11:33, Liu, Yi L wrote:
> From: Peter Xu <peterx@redhat.com>
> 
> IOMMU notifiers before are mostly used for [dev-]IOTLB stuffs. It is not
> suitable for other kind of notifiers (one example would be the future
> virt-svm support). Considering that current notifiers are targeted for
> per memory region, renaming the iommu notifier definitions.
> 
> This patch has following changes:
> * all the notifier types from IOMMU_NOTIFIER_* prefix into IOMMU_MR_EVENT_*
>   to better show its usage (for memory regions).
> * rename IOMMUNotifier to IOMMUMRNotifier
> * rename iommu_notifier to iommu_mr_notifier

Do you need this?  Could the IOMMUSVANotifier simply be renamed to
SVANotifier?

Thanks,

Paolo

> Signed-off-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 09/12] intel_iommu: record assigned devices in a list
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 09/12] intel_iommu: record assigned devices in a list Liu, Yi L
@ 2018-03-02 15:08   ` Paolo Bonzini
  2018-03-05  9:39     ` Liu, Yi L
  0 siblings, 1 reply; 65+ messages in thread
From: Paolo Bonzini @ 2018-03-02 15:08 UTC (permalink / raw)
  To: Liu, Yi L, qemu-devel, mst, david
  Cc: alex.williamson, eric.auger.pro, yi.l.liu, peterx, kevin.tian, jasowang

On 01/03/2018 11:33, Liu, Yi L wrote:
>  
> +struct IntelIOMMUAssignedDeviceNode {
> +    VTDAddressSpace *vtd_as;
> +    QLIST_ENTRY(IntelIOMMUAssignedDeviceNode) next;
> +};
> +

This QLIST_ENTRY can also be placed directly in VTDAddressSpace (e.g.
next_assigned_dev), so that the notify function can simply do a
QLIST_REMOVE when an assigned device is hot-unplugged.

Does the notify_func need an "unbind from PASID address space" step,
that does the opposite of vtd_bind_device_to_pasid_as?

Paolo

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 05/12] hw/pci: introduce PCISVAOps to PCIDevice
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 05/12] hw/pci: introduce PCISVAOps to PCIDevice Liu, Yi L
@ 2018-03-02 15:10   ` Paolo Bonzini
  2018-03-05  8:11     ` Liu, Yi L
  2018-03-06 10:33   ` Liu, Yi L
  1 sibling, 1 reply; 65+ messages in thread
From: Paolo Bonzini @ 2018-03-02 15:10 UTC (permalink / raw)
  To: Liu, Yi L, qemu-devel, mst, david
  Cc: alex.williamson, eric.auger.pro, yi.l.liu, peterx, kevin.tian, jasowang

On 01/03/2018 11:33, Liu, Yi L wrote:
> +void pci_setup_sva_ops(PCIDevice *dev, PCISVAOps *ops)
> +{
> +    if (dev) {
> +        dev->sva_ops = ops;
> +    }
> +    return;
> +}
> +

Better:

{
    assert(ops && !dev->sva_ops);
    dev->sva_ops = ops;
}

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 08/12] hw/pci: introduce pci_device_notify_iommu()
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 08/12] hw/pci: introduce pci_device_notify_iommu() Liu, Yi L
@ 2018-03-02 15:12   ` Paolo Bonzini
  2018-03-05  8:42     ` Liu, Yi L
  2018-03-02 16:06   ` Paolo Bonzini
  2018-03-05  8:27   ` Peter Xu
  2 siblings, 1 reply; 65+ messages in thread
From: Paolo Bonzini @ 2018-03-02 15:12 UTC (permalink / raw)
  To: Liu, Yi L, qemu-devel, mst, david
  Cc: alex.williamson, eric.auger.pro, yi.l.liu, peterx, kevin.tian, jasowang

On 01/03/2018 11:33, Liu, Yi L wrote:
> This patch adds pci_device_notify_iommu() to notify the virtual IOMMU
> emulator when an assigned device is added, and adds a new notify_fn
> callback in PCIBus. The vIOMMU emulator provides the instance of this
> callback.
> 
> Reason:
> When a virtual IOMMU is exposed to the guest, the vIOMMU emulator needs
> to program the host IOMMU to set up DMA mappings for assigned devices.
> This is a per-device operation, so to be efficient the vIOMMU emulator
> needs to record the assigned devices.
> 
> Example: for devices assigned through vfio, vfio_realize() calls
> pci_device_notify_iommu() to notify the vIOMMU emulator to record the
> necessary info for the assigned device.

I think the notification should not be left to the individual device.
Instead, the add notification should be done in pci_setup_sva_ops, and
the delete notification should be done in pci_qdev_unrealize, before
calling pc->exit, and only if dev->sva_ops is not NULL.

In general I think it's better to change your names from "assigned_dev"
to "sva_dev", because the point of the list is to only iterate over
devices that might be interested in using SVA.

Paolo

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 03/12] hw/core: introduce IOMMUSVAContext for virt-SVA
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 03/12] hw/core: introduce IOMMUSVAContext for virt-SVA Liu, Yi L
@ 2018-03-02 15:13   ` Paolo Bonzini
  2018-03-05  8:10     ` Liu, Yi L
  2018-03-06  8:51   ` Liu, Yi L
  1 sibling, 1 reply; 65+ messages in thread
From: Paolo Bonzini @ 2018-03-02 15:13 UTC (permalink / raw)
  To: Liu, Yi L, qemu-devel, mst, david
  Cc: alex.williamson, eric.auger.pro, yi.l.liu, peterx, kevin.tian, jasowang

On 01/03/2018 11:33, Liu, Yi L wrote:
> +void iommu_sva_notifier_unregister(IOMMUSVAContext *sva_ctx,
> +                                   IOMMUSVANotifier *notifier)
> +{
> +    IOMMUSVANotifier *cur, *next;
> +
> +    QLIST_FOREACH_SAFE(cur, &sva_ctx->sva_notifiers, node, next) {
> +        if (cur == notifier) {
> +            QLIST_REMOVE(cur, node);
> +            break;
> +        }
> +    }
> +}

It's enough to just do QLIST_REMOVE(notifier, node) here.

Paolo

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 08/12] hw/pci: introduce pci_device_notify_iommu()
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 08/12] hw/pci: introduce pci_device_notify_iommu() Liu, Yi L
  2018-03-02 15:12   ` Paolo Bonzini
@ 2018-03-02 16:06   ` Paolo Bonzini
  2018-03-05  8:43     ` Liu, Yi L
  2018-03-05  8:27   ` Peter Xu
  2 siblings, 1 reply; 65+ messages in thread
From: Paolo Bonzini @ 2018-03-02 16:06 UTC (permalink / raw)
  To: Liu, Yi L, qemu-devel, mst, david
  Cc: alex.williamson, eric.auger.pro, yi.l.liu, peterx, kevin.tian, jasowang

On 01/03/2018 11:33, Liu, Yi L wrote:
> +    pci_device_notify_iommu(pdev, PCI_NTY_DEV_ADD);
> +
>      pci_setup_sva_ops(pdev, &vfio_pci_sva_ops);
>  
>      return;
> @@ -3134,6 +3136,7 @@ static void vfio_exitfn(PCIDevice *pdev)
>  {
>      VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>  
> +    pci_device_notify_iommu(pdev, PCI_NTY_DEV_DEL);

Please make the names longer: PCI_IOMMU_NOTIFY_DEVICE_ADDED and
PCI_IOMMU_NOTIFY_DEVICE_REMOVED.  (This is independent of my other
remark, about doing this in generic PCI code for all devices that
register SVA ops).

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 04/12] vfio/pci: add notify framework based on IOMMUSVAContext
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 04/12] vfio/pci: add notify framework based on IOMMUSVAContext Liu, Yi L
@ 2018-03-05  7:45   ` Peter Xu
  2018-03-05  8:05     ` Liu, Yi L
  0 siblings, 1 reply; 65+ messages in thread
From: Peter Xu @ 2018-03-05  7:45 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: qemu-devel, mst, david, pbonzini, alex.williamson,
	eric.auger.pro, yi.l.liu, kevin.tian, jasowang

On Thu, Mar 01, 2018 at 06:33:27PM +0800, Liu, Yi L wrote:
> This patch introduces a notify framework for IOMMUSVAContext.sva_notifiers.
> 
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> ---
>  hw/vfio/common.c              | 1 +
>  include/hw/vfio/vfio-common.h | 9 +++++++++
>  2 files changed, 10 insertions(+)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 06277d2..1cc96df 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1019,6 +1019,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>      container->fd = fd;
>      QLIST_INIT(&container->giommu_mr_list);
>      QLIST_INIT(&container->hostwin_list);
> +    QLIST_INIT(&container->gsva_ctx_list);
>      if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU) ||
>          ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU)) {
>          bool v2 = !!ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU);
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 702a085..4c16b4c 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -29,6 +29,7 @@
>  #ifdef CONFIG_LINUX
>  #include <linux/vfio.h>
>  #endif
> +#include "hw/core/pasid.h"
>  
>  #define ERR_PREFIX "vfio error: %s: "
>  #define WARN_PREFIX "vfio warning: %s: "
> @@ -88,6 +89,7 @@ typedef struct VFIOContainer {
>       * future
>       */
>      QLIST_HEAD(, VFIOGuestIOMMUMR) giommu_mr_list;
> +    QLIST_HEAD(, VFIOGuestIOMMUSVAContext) gsva_ctx_list;

IIUC vfio container is per-domain, so here we have a per-domain
context.  Does that mean that all the devices in the same IOMMU group
(or say, share the 2nd level page table) must share the same PASID
table (or say, the 1st level page table)?  Thanks,

>      QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
>      QLIST_HEAD(, VFIOGroup) group_list;
>      QLIST_ENTRY(VFIOContainer) next;
> @@ -101,6 +103,13 @@ typedef struct VFIOGuestIOMMUMR {
>      QLIST_ENTRY(VFIOGuestIOMMUMR) giommu_next;
>  } VFIOGuestIOMMUMR;
>  
> +typedef struct VFIOGuestIOMMUSVAContext {
> +    VFIOContainer *container;
> +    IOMMUSVAContext *sva_ctx;
> +    IOMMUSVANotifier n;
> +    QLIST_ENTRY(VFIOGuestIOMMUSVAContext) gsva_ctx_next;
> +} VFIOGuestIOMMUSVAContext;
> +
>  typedef struct VFIOHostDMAWindow {
>      hwaddr min_iova;
>      hwaddr max_iova;
> -- 
> 1.9.1
> 

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 04/12] vfio/pci: add notify framework based on IOMMUSVAContext
  2018-03-05  7:45   ` Peter Xu
@ 2018-03-05  8:05     ` Liu, Yi L
  0 siblings, 0 replies; 65+ messages in thread
From: Liu, Yi L @ 2018-03-05  8:05 UTC (permalink / raw)
  To: Peter Xu
  Cc: kevin.tian, yi.l.liu, mst, jasowang, qemu-devel, alex.williamson,
	pbonzini, eric.auger.pro, david

On Mon, Mar 05, 2018 at 03:45:55PM +0800, Peter Xu wrote:
> On Thu, Mar 01, 2018 at 06:33:27PM +0800, Liu, Yi L wrote:
> > This patch introduces a notify framework for IOMMUSVAContext.sva_notifiers.
> > 
> > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> > ---
> >  hw/vfio/common.c              | 1 +
> >  include/hw/vfio/vfio-common.h | 9 +++++++++
> >  2 files changed, 10 insertions(+)
> > 
> > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > index 06277d2..1cc96df 100644
> > --- a/hw/vfio/common.c
> > +++ b/hw/vfio/common.c
> > @@ -1019,6 +1019,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
> >      container->fd = fd;
> >      QLIST_INIT(&container->giommu_mr_list);
> >      QLIST_INIT(&container->hostwin_list);
> > +    QLIST_INIT(&container->gsva_ctx_list);
> >      if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU) ||
> >          ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU)) {
> >          bool v2 = !!ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU);
> > diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> > index 702a085..4c16b4c 100644
> > --- a/include/hw/vfio/vfio-common.h
> > +++ b/include/hw/vfio/vfio-common.h
> > @@ -29,6 +29,7 @@
> >  #ifdef CONFIG_LINUX
> >  #include <linux/vfio.h>
> >  #endif
> > +#include "hw/core/pasid.h"
> >  
> >  #define ERR_PREFIX "vfio error: %s: "
> >  #define WARN_PREFIX "vfio warning: %s: "
> > @@ -88,6 +89,7 @@ typedef struct VFIOContainer {
> >       * future
> >       */
> >      QLIST_HEAD(, VFIOGuestIOMMUMR) giommu_mr_list;
> > +    QLIST_HEAD(, VFIOGuestIOMMUSVAContext) gsva_ctx_list;
> 
> IIUC vfio container is per-domain, so here we have a per-domain
> context.  Does that mean that all the devices in the same IOMMU group
> (or say, share the 2nd level page table) must share the same PASID
> table (or say, the 1st level page table)?  Thanks,

Correct. We are also discussing this in another thread, which is
about the bind granularity. Basically, SVA may only work for the
case in which there is only a single device in an IOMMU group. This
is based on the fact that there is no special handling of the PASID
TLP Prefix in the PCI-E spec.

https://patchwork.kernel.org/patch/10213877/

Thanks,
Yi Liu

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 03/12] hw/core: introduce IOMMUSVAContext for virt-SVA
  2018-03-02 15:13   ` Paolo Bonzini
@ 2018-03-05  8:10     ` Liu, Yi L
  0 siblings, 0 replies; 65+ messages in thread
From: Liu, Yi L @ 2018-03-05  8:10 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, mst, david, alex.williamson, eric.auger.pro,
	yi.l.liu, peterx, kevin.tian, jasowang

On Fri, Mar 02, 2018 at 04:13:17PM +0100, Paolo Bonzini wrote:
> On 01/03/2018 11:33, Liu, Yi L wrote:
> > +void iommu_sva_notifier_unregister(IOMMUSVAContext *sva_ctx,
> > +                                   IOMMUSVANotifier *notifier)
> > +{
> > +    IOMMUSVANotifier *cur, *next;
> > +
> > +    QLIST_FOREACH_SAFE(cur, &sva_ctx->sva_notifiers, node, next) {
> > +        if (cur == notifier) {
> > +            QLIST_REMOVE(cur, node);
> > +            break;
> > +        }
> > +    }
> > +}
> 
> It's enough to just do QLIST_REMOVE(notifier, node) here.

Thanks, will apply in the next version.
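
Just to confirm, the simplified version would then look roughly like
this (a minimal sketch, assuming the IOMMUSVAContext/IOMMUSVANotifier
definitions from patch 03 stay as they are):

void iommu_sva_notifier_unregister(IOMMUSVAContext *sva_ctx,
                                   IOMMUSVANotifier *notifier)
{
    /* the notifier is known to be linked on sva_ctx->sva_notifiers,
     * so a direct removal is enough, no list walk needed */
    QLIST_REMOVE(notifier, node);
}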

Regards,
Yi Liu 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 05/12] hw/pci: introduce PCISVAOps to PCIDevice
  2018-03-02 15:10   ` Paolo Bonzini
@ 2018-03-05  8:11     ` Liu, Yi L
  0 siblings, 0 replies; 65+ messages in thread
From: Liu, Yi L @ 2018-03-05  8:11 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, mst, david, alex.williamson, eric.auger.pro,
	yi.l.liu, peterx, kevin.tian, jasowang

On Fri, Mar 02, 2018 at 04:10:48PM +0100, Paolo Bonzini wrote:
> On 01/03/2018 11:33, Liu, Yi L wrote:
> > +void pci_setup_sva_ops(PCIDevice *dev, PCISVAOps *ops)
> > +{
> > +    if (dev) {
> > +        dev->sva_ops = ops;
> > +    }
> > +    return;
> > +}
> > +
> 
> Better:
> 
> {
>     assert(ops && !dev->sva_ops);
>     dev->sva_ops = ops;
> }

Thanks, will apply in the next version.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 08/12] hw/pci: introduce pci_device_notify_iommu()
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 08/12] hw/pci: introduce pci_device_notify_iommu() Liu, Yi L
  2018-03-02 15:12   ` Paolo Bonzini
  2018-03-02 16:06   ` Paolo Bonzini
@ 2018-03-05  8:27   ` Peter Xu
  2018-03-05  8:46     ` Liu, Yi L
  2 siblings, 1 reply; 65+ messages in thread
From: Peter Xu @ 2018-03-05  8:27 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: qemu-devel, mst, david, pbonzini, alex.williamson,
	eric.auger.pro, yi.l.liu, kevin.tian, jasowang

On Thu, Mar 01, 2018 at 06:33:31PM +0800, Liu, Yi L wrote:

[...]

> -void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque)
> +void pci_device_notify_iommu(PCIDevice *dev, PCIDevNotifyType type)
>  {
> -    bus->iommu_fn = fn;
> +    PCIBus *bus = PCI_BUS(pci_get_bus(dev));
> +    PCIBus *iommu_bus = bus;
> +
> +    while (iommu_bus && !iommu_bus->iommu_fn && iommu_bus->parent_dev) {
> +        iommu_bus = PCI_BUS(pci_get_bus(iommu_bus->parent_dev));
> +    }
> +    if (iommu_bus && iommu_bus->notify_fn) {
> +        iommu_bus->notify_fn(bus,
> +                             iommu_bus->iommu_opaque,
> +                             dev->devfn,
> +                             type);

We didn't really check the return code for notify function.  What if
it failed?  If we care, we'd better handle the failure; or we can just
define the notify_fn() to return void (now it's int).

> +    }
> +    return;

I saw many places in the series that you added explicit return for
"void" return-typed functions.  IMHO all of them can be dropped.

> +}

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 08/12] hw/pci: introduce pci_device_notify_iommu()
  2018-03-02 15:12   ` Paolo Bonzini
@ 2018-03-05  8:42     ` Liu, Yi L
  2018-03-06 10:18       ` Paolo Bonzini
  0 siblings, 1 reply; 65+ messages in thread
From: Liu, Yi L @ 2018-03-05  8:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, mst, david, alex.williamson, eric.auger.pro,
	yi.l.liu, peterx, kevin.tian, jasowang

On Fri, Mar 02, 2018 at 04:12:01PM +0100, Paolo Bonzini wrote:
> On 01/03/2018 11:33, Liu, Yi L wrote:
> > This patch adds pci_device_notify_iommu() for notify virtual IOMMU
> > emulator when assigned device is added. And adds a new notify_func
> > in PCIBus. vIOMMU emulator provides the instance of this notify_func.
> > 
> > Reason:
> > When virtual IOMMU is exposed to guest, vIOMMU emulator needs to
> > programm host IOMMU to setup DMA mapping for assigned devices. This
> > is a per-device operation, to be efficient, vIOMMU emulator needs
> > to record the assigned devices.
> > 
> > Example: devices assigned thru vfio, vfio_realize would call
> > pci_device_notify_iommu() to notify vIOMMU emulator to record necessary
> > info for assigned device.
> 
> I think the notification should not be left to the individual device.
> Instead, the add notification should be done in pci_setup_sva_ops, and
> the delete notification should be done in pci_qdev_unrealize, before
> calling pc->exit, and only if dev->sva_ops is not NULL.

Agreed. I think it works together with your comments against
"[PATCH v3 05/12] hw/pci: introduce PCISVAOps to PCIDevice". I will
apply it in the next version, roughly as sketched below.
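
Just to double check the direction, something like this (a rough
sketch only; it reuses the assert() version of pci_setup_sva_ops you
suggested and the longer notify names from your other reply):

void pci_setup_sva_ops(PCIDevice *dev, PCISVAOps *ops)
{
    assert(ops && !dev->sva_ops);
    dev->sva_ops = ops;
    /* the device has declared itself SVA capable, tell the vIOMMU */
    pci_device_notify_iommu(dev, PCI_IOMMU_NOTIFY_DEVICE_ADDED);
}

and in pci_qdev_unrealize(), before the pc->exit() call:

    if (pci_dev->sva_ops) {
        pci_device_notify_iommu(pci_dev, PCI_IOMMU_NOTIFY_DEVICE_REMOVED);
    }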
 
> In general I think it's better to change your names from "assigned_dev"
> to "sva_dev", because the point of the list is to only iterate over
> devices that might be interested in using SVA.

For "assigned_dev", my purpose is to distinguish "assigned devices" from
emulated devices. Only the SVA usage on "assigned devices" is cared here.
But it is true only SVA capable device is interested. So I may need to
rename it as "assigned_sva_dev". How about your opinion?

Thanks,
Yi Liu

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 08/12] hw/pci: introduce pci_device_notify_iommu()
  2018-03-02 16:06   ` Paolo Bonzini
@ 2018-03-05  8:43     ` Liu, Yi L
  2018-03-05 10:43       ` Peter Xu
  0 siblings, 1 reply; 65+ messages in thread
From: Liu, Yi L @ 2018-03-05  8:43 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, mst, david, alex.williamson, eric.auger.pro,
	yi.l.liu, peterx, kevin.tian, jasowang

On Fri, Mar 02, 2018 at 05:06:56PM +0100, Paolo Bonzini wrote:
> On 01/03/2018 11:33, Liu, Yi L wrote:
> > +    pci_device_notify_iommu(pdev, PCI_NTY_DEV_ADD);
> > +
> >      pci_setup_sva_ops(pdev, &vfio_pci_sva_ops);
> >  
> >      return;
> > @@ -3134,6 +3136,7 @@ static void vfio_exitfn(PCIDevice *pdev)
> >  {
> >      VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> >  
> > +    pci_device_notify_iommu(pdev, PCI_NTY_DEV_DEL);
> 
> Please make the names longer: PCI_IOMMU_NOTIFY_DEVICE_ADDED and
> PCI_IOMMU_NOTIFY_DEVICE_REMOVED.  (This is independent of my other
> remark, about doing this in generic PCI code for all devices that
> register SVA ops).

Thanks for the suggestion, will apply.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 08/12] hw/pci: introduce pci_device_notify_iommu()
  2018-03-05  8:27   ` Peter Xu
@ 2018-03-05  8:46     ` Liu, Yi L
  0 siblings, 0 replies; 65+ messages in thread
From: Liu, Yi L @ 2018-03-05  8:46 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, mst, david, pbonzini, alex.williamson,
	eric.auger.pro, yi.l.liu, kevin.tian, jasowang

On Mon, Mar 05, 2018 at 04:27:43PM +0800, Peter Xu wrote:
> On Thu, Mar 01, 2018 at 06:33:31PM +0800, Liu, Yi L wrote:
> 
> [...]
> 
> > -void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque)
> > +void pci_device_notify_iommu(PCIDevice *dev, PCIDevNotifyType type)
> >  {
> > -    bus->iommu_fn = fn;
> > +    PCIBus *bus = PCI_BUS(pci_get_bus(dev));
> > +    PCIBus *iommu_bus = bus;
> > +
> > +    while (iommu_bus && !iommu_bus->iommu_fn && iommu_bus->parent_dev) {
> > +        iommu_bus = PCI_BUS(pci_get_bus(iommu_bus->parent_dev));
> > +    }
> > +    if (iommu_bus && iommu_bus->notify_fn) {
> > +        iommu_bus->notify_fn(bus,
> > +                             iommu_bus->iommu_opaque,
> > +                             dev->devfn,
> > +                             type);
> 
> We didn't really check the return code for notify function.  What if
> it failed?  If we care, we'd better handle the failure; or we can just
> define the notify_fn() to return void (now it's int).

Good catch. I think we need to handle the failure, and the user should
be aware of it. I'll add that in the next version, roughly as sketched
below.
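
For instance (just a sketch of the direction, exact error reporting
still to be decided):

int pci_device_notify_iommu(PCIDevice *dev, PCIDevNotifyType type)
{
    PCIBus *bus = PCI_BUS(pci_get_bus(dev));
    PCIBus *iommu_bus = bus;

    while (iommu_bus && !iommu_bus->iommu_fn && iommu_bus->parent_dev) {
        iommu_bus = PCI_BUS(pci_get_bus(iommu_bus->parent_dev));
    }
    if (iommu_bus && iommu_bus->notify_fn) {
        /* propagate the vIOMMU's return code to the caller */
        return iommu_bus->notify_fn(bus, iommu_bus->iommu_opaque,
                                    dev->devfn, type);
    }
    return 0;
}

so that e.g. vfio_realize() can fail the realize if the vIOMMU refuses
to record the device.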
 
> > +    }
> > +    return;
> 
> I saw many places in the series that you added explicit return for
> "void" return-typed functions.  IMHO all of them can be dropped.

Thanks for spotting it, will fix them in the next version.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 11/12] intel_iommu: add framework for PASID AddressSpace management
  2018-03-02 15:00   ` Paolo Bonzini
@ 2018-03-05  9:11     ` Liu, Yi L
  2018-03-06 10:26       ` Paolo Bonzini
  0 siblings, 1 reply; 65+ messages in thread
From: Liu, Yi L @ 2018-03-05  9:11 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, mst, david, kevin.tian, yi.l.liu, jasowang, peterx,
	alex.williamson, eric.auger.pro

On Fri, Mar 02, 2018 at 04:00:23PM +0100, Paolo Bonzini wrote:
> On 01/03/2018 11:33, Liu, Yi L wrote:
> > +struct VTDDeviceNode {
> > +    PCIBus *bus;
> > +    uint8_t devfn;
> > +    QLIST_ENTRY(VTDDeviceNode) next;
> > +};
> 
> Do you really need VTDDeviceNode?  I think can you simply put the
> QLIST_ENTRY in VTDAddressSpace (named e.g. next_by_pasid), since
> VTDAddressSpace already includes a (bus, devfn).

The existing VTDAddressSpace is actually per-device, while for a PASID
tagged address space it is possible to have multiple devices tied to a
single PASID tagged address space. Reusing VTDAddressSpace could be a
choice since it is a per-device structure, but it may be misleading
since there are other fields in VTDAddressSpace. This is why I proposed
to have VTDDeviceNode. But consolidation is possible here, see the
sketch below.
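
If we go the consolidation route, it would roughly look like this (a
sketch only, existing VTDAddressSpace fields elided; "next_by_pasid" is
the linkage name you suggested):

struct VTDAddressSpace {
    PCIBus *bus;
    uint8_t devfn;
    /* ... existing per-device fields ... */
    /* linkage on the owning VTDPASIDAddressSpace's device list */
    QLIST_ENTRY(VTDAddressSpace) next_by_pasid;
};

struct VTDPASIDAddressSpace {
    AddressSpace as;
    IOMMUSVAContext sva_ctx;
    IntelIOMMUState *iommu_state;
    /* list of devices bound to this pasid tagged address space */
    QLIST_HEAD(, VTDAddressSpace) device_list;
};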

Thanks,
Yi Liu

> > +struct VTDPASIDAddressSpace {
> > +    AddressSpace as;
> > +    IOMMUSVAContext sva_ctx;
> > +    IntelIOMMUState *iommu_state;
> > +    /* list of devices binded to a pasid tagged address space */
> > +    QLIST_HEAD(, VTDDeviceNode) device_list;
> > +};
> > +
> 
> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 11/12] intel_iommu: add framework for PASID AddressSpace management
  2018-03-02 14:52   ` Paolo Bonzini
@ 2018-03-05  9:12     ` Liu, Yi L
  0 siblings, 0 replies; 65+ messages in thread
From: Liu, Yi L @ 2018-03-05  9:12 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, mst, david, alex.williamson, eric.auger.pro,
	yi.l.liu, peterx, kevin.tian, jasowang

On Fri, Mar 02, 2018 at 03:52:44PM +0100, Paolo Bonzini wrote:
> On 01/03/2018 11:33, Liu, Yi L wrote:
> 
> Please merge this patch with the next one, since they are basically the
> .h and .c sides of the same thing.

Yes, will do it in the next version.

Thanks,
Yi Liu

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 09/12] intel_iommu: record assigned devices in a list
  2018-03-02 15:08   ` Paolo Bonzini
@ 2018-03-05  9:39     ` Liu, Yi L
  0 siblings, 0 replies; 65+ messages in thread
From: Liu, Yi L @ 2018-03-05  9:39 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, mst, david, alex.williamson, eric.auger.pro,
	yi.l.liu, peterx, kevin.tian, jasowang

On Fri, Mar 02, 2018 at 04:08:47PM +0100, Paolo Bonzini wrote:
> On 01/03/2018 11:33, Liu, Yi L wrote:
> >  
> > +struct IntelIOMMUAssignedDeviceNode {
> > +    VTDAddressSpace *vtd_as;
> > +    QLIST_ENTRY(IntelIOMMUAssignedDeviceNode) next;
> > +};
> > +
> 
> This QLIST_ENTRY can also be placed directly in VTDAddressSpace (e.g.
> next_assigned_dev), so that the notify function can simply do a
> QLIST_REMOVE when an assigned device is hot-unplugged.

Thanks for this idea, I will try to do the consolidation together with
your other comments.
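
i.e. presumably something like this (sketch only, existing fields
elided; "next_assigned_dev" is the name you suggested):

struct VTDAddressSpace {
    /* ... existing per-device fields ... */
    /* linkage on the vIOMMU's list of assigned (SVA) devices */
    QLIST_ENTRY(VTDAddressSpace) next_assigned_dev;
};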
 
> Does the notify_func need an "unbind from PASID address space" step,
> that does the opposite of vtd_bind_device_to_pasid_as?

Yes, it is possible that a device is un-tied from a PASID tagged
address space, and this should be emulated in Qemu. Thanks for the
reminder.
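
Roughly something like this (just a sketch; the helper name mirrors
vtd_bind_device_to_pasid_as and is made up here, and it still uses the
VTDDeviceNode list from patch 11 before any consolidation):

static void vtd_unbind_device_from_pasid_as(VTDPASIDAddressSpace *pasid_as,
                                            PCIBus *bus, uint8_t devfn)
{
    VTDDeviceNode *node, *tmp;

    QLIST_FOREACH_SAFE(node, &pasid_as->device_list, next, tmp) {
        if (node->bus == bus && node->devfn == devfn) {
            QLIST_REMOVE(node, next);
            g_free(node);
            return;
        }
    }
}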

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 12/12] intel_iommu: bind device to PASID tagged AddressSpace
  2018-03-02 14:51   ` Paolo Bonzini
@ 2018-03-05  9:56     ` Liu, Yi L
  0 siblings, 0 replies; 65+ messages in thread
From: Liu, Yi L @ 2018-03-05  9:56 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, mst, david, alex.williamson, eric.auger.pro,
	yi.l.liu, peterx, kevin.tian, jasowang

On Fri, Mar 02, 2018 at 03:51:53PM +0100, Paolo Bonzini wrote:
> On 01/03/2018 11:33, Liu, Yi L wrote:
> > +    IntelPASIDNode *node;
> > +    char name[128];
> > +
> > +    QLIST_FOREACH(node, &(s->pasid_as_list), next) {
> > +        vtd_pasid_as = node->pasid_as;
> > +        if (pasid == vtd_pasid_as->sva_ctx.pasid) {
> > +            return vtd_pasid_as;
> > +        }
> > +    }
> > +
> > +    vtd_pasid_as = g_malloc0(sizeof(*vtd_pasid_as));
> > +    vtd_pasid_as->iommu_state = s;
> > +    snprintf(name, sizeof(name), "intel_iommu_pasid_%d", pasid);
> > +    address_space_init(&vtd_pasid_as->as, NULL, "pasid");
> 
> The name is unused here.  The call to address_space_init should probably
> use it.

yes, it is. I missed it. Thanks for catching it.
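
i.e. presumably the fix is just to pass the name that was already
built (both lines taken from the patch above):

    snprintf(name, sizeof(name), "intel_iommu_pasid_%d", pasid);
    address_space_init(&vtd_pasid_as->as, NULL, name);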

> You also don't need the separate IntelPASIDNode, because the
> QLIST_ENTRY can be placed directly in VTDPASIDAddressSpace.

Will refine it in the next version.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 01/12] memory: rename existing iommu notifier to be iommu mr notifier
  2018-03-02 15:01   ` Paolo Bonzini
@ 2018-03-05 10:09     ` Liu, Yi L
  0 siblings, 0 replies; 65+ messages in thread
From: Liu, Yi L @ 2018-03-05 10:09 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, mst, david, kevin.tian, yi.l.liu, jasowang, peterx,
	alex.williamson, eric.auger.pro

On Fri, Mar 02, 2018 at 04:01:11PM +0100, Paolo Bonzini wrote:
> On 01/03/2018 11:33, Liu, Yi L wrote:
> > From: Peter Xu <peterx@redhat.com>
> > 
> > IOMMU notifiers before are mostly used for [dev-]IOTLB stuffs. It is not
> > suitable for other kind of notifiers (one example would be the future
> > virt-svm support). Considering that current notifiers are targeted for
> > per memory region, renaming the iommu notifier definitions.
> > 
> > This patch has following changes:
> > * all the notifier types from IOMMU_NOTIFIER_* prefix into IOMMU_MR_EVENT_*
> >   to better show its usage (for memory regions).
> > * rename IOMMUNotifier to IOMMUMRNotifier
> > * rename iommu_notifier to iommu_mr_notifier
> 
> Do you need this?  Could the IOMMUSVANotifier simply be renamed to
> SVANotifier?

I also considered it before sending out this series. It was necessary
under the previous naming (the new notifier was planned to be named
IOMMUNotifier). However, it is not necessary now since the new notifier
can be SVANotifier.

But the changes in this patch still have some benefit, e.g. the naming
becomes much clearer since the existing iommu notifier is actually
IOMMU MR based. So I didn't remove it from this series. I plan to remove
it in the next version as it is not necessary for this series now; even
if we need it, it can be done in another series. What is your opinion?

Thanks,
Yi Liu
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 08/12] hw/pci: introduce pci_device_notify_iommu()
  2018-03-05  8:43     ` Liu, Yi L
@ 2018-03-05 10:43       ` Peter Xu
  2018-03-06 10:19         ` Paolo Bonzini
  0 siblings, 1 reply; 65+ messages in thread
From: Peter Xu @ 2018-03-05 10:43 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Paolo Bonzini, qemu-devel, mst, david, alex.williamson,
	eric.auger.pro, yi.l.liu, kevin.tian, jasowang

On Mon, Mar 05, 2018 at 04:43:09PM +0800, Liu, Yi L wrote:
> On Fri, Mar 02, 2018 at 05:06:56PM +0100, Paolo Bonzini wrote:
> > On 01/03/2018 11:33, Liu, Yi L wrote:
> > > +    pci_device_notify_iommu(pdev, PCI_NTY_DEV_ADD);
> > > +
> > >      pci_setup_sva_ops(pdev, &vfio_pci_sva_ops);
> > >  
> > >      return;
> > > @@ -3134,6 +3136,7 @@ static void vfio_exitfn(PCIDevice *pdev)
> > >  {
> > >      VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> > >  
> > > +    pci_device_notify_iommu(pdev, PCI_NTY_DEV_DEL);
> > 
> > Please make the names longer: PCI_IOMMU_NOTIFY_DEVICE_ADDED and
> > PCI_IOMMU_NOTIFY_DEVICE_REMOVED.  (This is independent of my other
> > remark, about doing this in generic PCI code for all devices that
> > register SVA ops).
> 
> Thanks for the suggestion, will appply.

Isn't the name too generic if it's tailored for VFIO only? Would
something like PCI_IOMMU_NOTIFY_VFIO_ADD be a bit better?

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 07/12] vfio/pci: register sva notifier
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 07/12] vfio/pci: register sva notifier Liu, Yi L
@ 2018-03-06  6:44   ` Peter Xu
  2018-03-06  8:00     ` Liu, Yi L
  0 siblings, 1 reply; 65+ messages in thread
From: Peter Xu @ 2018-03-06  6:44 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: qemu-devel, mst, david, pbonzini, alex.williamson,
	eric.auger.pro, yi.l.liu, kevin.tian, jasowang

On Thu, Mar 01, 2018 at 06:33:30PM +0800, Liu, Yi L wrote:
> This patch shows how sva notifier is registered. And provided
> an example by registering notify func for tlb flush propagation.
> 
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> ---
>  hw/vfio/pci.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 53 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index a60a4d7..b7297cc 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2775,6 +2775,26 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
>      vdev->req_enabled = false;
>  }
>  
> +static VFIOContainer *vfio_get_container_from_busdev(PCIBus *bus,
> +                                                     int32_t devfn)
> +{
> +    VFIOGroup *group;
> +    VFIOPCIDevice *vdev_iter;
> +    VFIODevice *vbasedev_iter;
> +    PCIDevice *pdev_iter;
> +
> +    QLIST_FOREACH(group, &vfio_group_list, next) {
> +        QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
> +            vdev_iter = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
> +            pdev_iter = &vdev_iter->pdev;
> +            if (pci_get_bus(pdev_iter) == bus && pdev_iter->devfn == devfn) {
> +                return group->container;
> +            }
> +        }
> +    }
> +    return NULL;
> +}
> +
>  static void vfio_pci_device_sva_bind_pasid_table(PCIBus *bus,
>                   int32_t devfn, uint64_t pasidt_addr, uint32_t size)
>  {
> @@ -2783,11 +2803,42 @@ static void vfio_pci_device_sva_bind_pasid_table(PCIBus *bus,
>      So far, Intel VT-d and AMD IOMMU requires it. */
>  }
>  
> +static void vfio_iommu_sva_tlb_invalidate_notify(IOMMUSVANotifier *n,
> +                                                 IOMMUSVAEventData *event_data)
> +{
> +/*  Sample code, would be detailed in coming virt-SVA patchset.
> +    VFIOGuestIOMMUSVAContext *gsva_ctx;
> +    IOMMUSVAContext *sva_ctx;
> +    VFIOContainer *container;
> +
> +    gsva_ctx = container_of(n, VFIOGuestIOMMUSVAContext, n);
> +    container = gsva_ctx->container;
> +
> +    TODO: forward to host through VFIO IOCTL

IMHO if the series is not ready for merging, we can still mark it as
RFC and declare that so people won't need to go into details of the
patches.

> +*/
> +}
> +
>  static void vfio_pci_device_sva_register_notifier(PCIBus *bus,
>                            int32_t devfn, IOMMUSVAContext *sva_ctx)
>  {
> -    /* Register notifier for TLB invalidation propagation
> -       */
> +    VFIOContainer *container = vfio_get_container_from_busdev(bus, devfn);
> +
> +    if (container != NULL) {
> +        VFIOGuestIOMMUSVAContext *gsva_ctx;
> +        gsva_ctx = g_malloc0(sizeof(*gsva_ctx));
> +        gsva_ctx->sva_ctx = sva_ctx;
> +        gsva_ctx->container = container;
> +        QLIST_INSERT_HEAD(&container->gsva_ctx_list,
> +                          gsva_ctx,
> +                          gsva_ctx_next);
> +       /* Register vfio_iommu_sva_tlb_invalidate_notify with event flag
> +           IOMMU_SVA_EVENT_TLB_INV */
> +        iommu_sva_notifier_register(sva_ctx,
> +                                    &gsva_ctx->n,
> +                                    vfio_iommu_sva_tlb_invalidate_notify,
> +                                    IOMMU_SVA_EVENT_TLB_INV);

I would squash this patch into the previous one, since basically this
is only part of the implementation to provide the vfio-specific
register hook.

But a more important question is... why this?

IMHO the notifier registration can be general for PCI.  Why does vfio
need to provide its own register callback?  Would it be enough if it
only provided its own notify callback?

Thanks,

> +        return;
> +    }
>  }
>  
>  static void vfio_pci_device_sva_unregister_notifier(PCIBus *bus,
> -- 
> 1.9.1
> 

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 00/12] Introduce new iommu notifier framework for virt-SVA
  2018-03-01 10:33 [Qemu-devel] [PATCH v3 00/12] Introduce new iommu notifier framework for virt-SVA Liu, Yi L
                   ` (11 preceding siblings ...)
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 12/12] intel_iommu: bind device to PASID tagged AddressSpace Liu, Yi L
@ 2018-03-06  6:55 ` Peter Xu
  2018-03-06  7:45   ` Liu, Yi L
  12 siblings, 1 reply; 65+ messages in thread
From: Peter Xu @ 2018-03-06  6:55 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: qemu-devel, mst, david, pbonzini, alex.williamson,
	eric.auger.pro, yi.l.liu, kevin.tian, jasowang

On Thu, Mar 01, 2018 at 06:33:23PM +0800, Liu, Yi L wrote:
> This patchset is to introduce a notifier framework for virt-SVA.
> You may find virt-SVA design details from the link below.
> 
> https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg04925.html
> 
> SVA is short for Shared Virtual Addressing. This is also called Shared
> Virtual Memory in previous patchsets. However, SVM is confusing as it
> can also be short for Secure Virtual Machine. So this patchset use
> Shared Virtual Addressing instead of Shared Virtual Memory. And it
> would be applied in future (SVA)related patch series as well.
> 
> Qemu has an existing notifier framework based on MemoryRegion, which
> are used for MAP/UNMAP. However, it is not well suited for virt-SVA.
> Reasons are as below:
> - virt-SVA works along with PT = 1
> - if PT = 1 IOMMU MR are disabled so MR notifier are not registered
> - new notifiers do not fit nicely in this framework as they need to be
>   registered even if PT = 1
> - need a new framework to attach the new notifiers
> - Additional background can be got from:
>   https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg04931.html
> 
> So a new iommu notifier framework is needed. This patchset introduces
> a notifier framework based on IOMMUSVAContext. IOMMUSVAContext is
> introduced to be an abstract of virt-SVA operations in Qemu.
> 
> Patch Overview:
> * 1 - 2: rename existing naming, the IOMMU MemoryRegion Notifier
>          framework
> * 3 - 4: introduce SVA notifier framework based on IOMMUSVAContext
> * 5 - 7: introduce PCISVAOps and expose the SVA notfier framework
>          through hw/pci layer
> * 8 - 12: show the usage of SVA notifier in Intel vIOMMU emulator

Do you have online branch so that I can check out?

The patches are a bit scattered and it's really hard for me to
reference things within them... So a complete tree to read would be
nice.

I roughly went over most of the patches, and the framework you
introduced is still not that clear to me.  For now I feel like it can
be simplified somehow, but I'll hold and speak after I read the whole
tree again.

Also, it'll be good if you can always provide some status update on
the kernel counterpart.

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 00/12] Introduce new iommu notifier framework for virt-SVA
  2018-03-06  6:55 ` [Qemu-devel] [PATCH v3 00/12] Introduce new iommu notifier framework for virt-SVA Peter Xu
@ 2018-03-06  7:45   ` Liu, Yi L
  2018-03-07  5:38     ` Peter Xu
  0 siblings, 1 reply; 65+ messages in thread
From: Liu, Yi L @ 2018-03-06  7:45 UTC (permalink / raw)
  To: Peter Xu, Liu, Yi L
  Cc: qemu-devel, mst, david, pbonzini, alex.williamson,
	eric.auger.pro, Tian, Kevin, jasowang

> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Tuesday, March 6, 2018 2:56 PM
> Subject: Re: [PATCH v3 00/12] Introduce new iommu notifier framework for virt-SVA
> 
> On Thu, Mar 01, 2018 at 06:33:23PM +0800, Liu, Yi L wrote:
> > This patchset is to introduce a notifier framework for virt-SVA.
> > You may find virt-SVA design details from the link below.
> >
> > https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg04925.html
> >
> > SVA is short for Shared Virtual Addressing. This is also called Shared
> > Virtual Memory in previous patchsets. However, SVM is confusing as it
> > can also be short for Secure Virtual Machine. So this patchset use
> > Shared Virtual Addressing instead of Shared Virtual Memory. And it
> > would be applied in future (SVA)related patch series as well.
> >
> > Qemu has an existing notifier framework based on MemoryRegion, which
> > are used for MAP/UNMAP. However, it is not well suited for virt-SVA.
> > Reasons are as below:
> > - virt-SVA works along with PT = 1
> > - if PT = 1 IOMMU MR are disabled so MR notifier are not registered
> > - new notifiers do not fit nicely in this framework as they need to be
> >   registered even if PT = 1
> > - need a new framework to attach the new notifiers
> > - Additional background can be got from:
> >   https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg04931.html
> >
> > So a new iommu notifier framework is needed. This patchset introduces
> > a notifier framework based on IOMMUSVAContext. IOMMUSVAContext is
> > introduced to be an abstract of virt-SVA operations in Qemu.
> >
> > Patch Overview:
> > * 1 - 2: rename existing naming, the IOMMU MemoryRegion Notifier
> >          framework
> > * 3 - 4: introduce SVA notifier framework based on IOMMUSVAContext
> > * 5 - 7: introduce PCISVAOps and expose the SVA notfier framework
> >          through hw/pci layer
> > * 8 - 12: show the usage of SVA notifier in Intel vIOMMU emulator
> 
> Do you have online branch so that I can check out?

yes, I should have pasted it. Here it is:
https://github.com/luxis1999/sva_notifier.git

> The patches are a bit scattered and it's really hard for me to
> reference things within it... So a complete tree to read would be
> nice.
> 
> I roughly went over most of the patches, and the framework you
> introduced is still not that clear to me.  For now I feel like it can
> be simplified somehow, but I'll hold and speak after I read the whole
> tree again.
> 
> Also, it'll be good too if you can always provide some status update
> of the kernel-counterpart it.

Good suggestion. This patchset only affects Qemu, but for the whole
virt-SVA enabling there are kernel counterparts. I will provide that
status in the virt-SVA patchset series.

Thanks,
Yi Liu

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 07/12] vfio/pci: register sva notifier
  2018-03-06  6:44   ` Peter Xu
@ 2018-03-06  8:00     ` Liu, Yi L
  2018-03-06 12:09       ` Peter Xu
  0 siblings, 1 reply; 65+ messages in thread
From: Liu, Yi L @ 2018-03-06  8:00 UTC (permalink / raw)
  To: Peter Xu, Liu, Yi L
  Cc: qemu-devel, mst, david, pbonzini, alex.williamson,
	eric.auger.pro, Tian, Kevin, jasowang

> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Tuesday, March 6, 2018 2:45 PM
> Subject: Re: [PATCH v3 07/12] vfio/pci: register sva notifier
> 
> On Thu, Mar 01, 2018 at 06:33:30PM +0800, Liu, Yi L wrote:
> > This patch shows how sva notifier is registered. And provided an
> > example by registering notify func for tlb flush propagation.
> >
> > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> > ---
> >  hw/vfio/pci.c | 55
> > +++++++++++++++++++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 53 insertions(+), 2 deletions(-)
> >
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index a60a4d7..b7297cc
> > 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -2775,6 +2775,26 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice
> *vdev)
> >      vdev->req_enabled = false;
> >  }
> >
> > +static VFIOContainer *vfio_get_container_from_busdev(PCIBus *bus,
> > +                                                     int32_t devfn) {
> > +    VFIOGroup *group;
> > +    VFIOPCIDevice *vdev_iter;
> > +    VFIODevice *vbasedev_iter;
> > +    PCIDevice *pdev_iter;
> > +
> > +    QLIST_FOREACH(group, &vfio_group_list, next) {
> > +        QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
> > +            vdev_iter = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
> > +            pdev_iter = &vdev_iter->pdev;
> > +            if (pci_get_bus(pdev_iter) == bus && pdev_iter->devfn == devfn) {
> > +                return group->container;
> > +            }
> > +        }
> > +    }
> > +    return NULL;
> > +}
> > +
> >  static void vfio_pci_device_sva_bind_pasid_table(PCIBus *bus,
> >                   int32_t devfn, uint64_t pasidt_addr, uint32_t size)
> > { @@ -2783,11 +2803,42 @@ static void
> > vfio_pci_device_sva_bind_pasid_table(PCIBus *bus,
> >      So far, Intel VT-d and AMD IOMMU requires it. */  }
> >
> > +static void vfio_iommu_sva_tlb_invalidate_notify(IOMMUSVANotifier *n,
> > +                                                 IOMMUSVAEventData
> > +*event_data) {
> > +/*  Sample code, would be detailed in coming virt-SVA patchset.
> > +    VFIOGuestIOMMUSVAContext *gsva_ctx;
> > +    IOMMUSVAContext *sva_ctx;
> > +    VFIOContainer *container;
> > +
> > +    gsva_ctx = container_of(n, VFIOGuestIOMMUSVAContext, n);
> > +    container = gsva_ctx->container;
> > +
> > +    TODO: forward to host through VFIO IOCTL
> 
> IMHO if the series is not ready for merging, we can still mark it as RFC and declare
> that so people won't need to go into details of the patches.

Thanks for the suggestion. Actually, I was hesitant about it. As you may know, this is actually
the 3rd version of this effort. But yes, I will follow your suggestion in coming versions.

> > +*/
> > +}
> > +
> >  static void vfio_pci_device_sva_register_notifier(PCIBus *bus,
> >                            int32_t devfn, IOMMUSVAContext *sva_ctx)  {
> > -    /* Register notifier for TLB invalidation propagation
> > -       */
> > +    VFIOContainer *container = vfio_get_container_from_busdev(bus,
> > + devfn);
> > +
> > +    if (container != NULL) {
> > +        VFIOGuestIOMMUSVAContext *gsva_ctx;
> > +        gsva_ctx = g_malloc0(sizeof(*gsva_ctx));
> > +        gsva_ctx->sva_ctx = sva_ctx;
> > +        gsva_ctx->container = container;
> > +        QLIST_INSERT_HEAD(&container->gsva_ctx_list,
> > +                          gsva_ctx,
> > +                          gsva_ctx_next);
> > +       /* Register vfio_iommu_sva_tlb_invalidate_notify with event flag
> > +           IOMMU_SVA_EVENT_TLB_INV */
> > +        iommu_sva_notifier_register(sva_ctx,
> > +                                    &gsva_ctx->n,
> > +                                    vfio_iommu_sva_tlb_invalidate_notify,
> > +                                    IOMMU_SVA_EVENT_TLB_INV);
> 
> I would squash this patch into previous one since basically this is only part of the
> implementation to provide vfio-speicific register hook.

sure.

> But a more important question is... why this?
> 
> IMHO the notifier registration can be general for PCI.  Why vfio needs to provide it's
> own register callback?  Would it be enough if it only provides its own notify callback?

The notifiers are in VFIO. However, the registration is controlled by the vIOMMU emulator.
In this series, a PASID tagged AddressSpace is introduced, and the new notifiers are for
such address spaces. Such an address space is created and deleted by the vIOMMU emulator,
so the notifier registration needs to happen accordingly.

e.g. when a guest SVM application binds a device to a process, the guest programs its
iommu translation structures; the vIOMMU emulator captures the change, creates a PASID
tagged address space for it and registers the notifiers.

That's why I do it in such a manner, roughly as in the sketch below.
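
In (pseudo) code, the flow on the vIOMMU side would be roughly the
following (a sketch based on the helpers in patches 11/12; the lookup
helper name and the exact signatures here are illustrative, the real
ones may differ):

    /* guest programmed a pasid entry for (bus, devfn, pasid) */
    vtd_pasid_as = vtd_find_add_pasid_as(s, pasid);
    vtd_bind_device_to_pasid_as(vtd_pasid_as, bus, devfn);
    /* let whoever provided the PCISVAOps (vfio here) register its
     * notifiers on this PASID tagged context */
    pci_device_sva_register_notifier(bus, devfn, &vtd_pasid_as->sva_ctx);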

Thanks,
Yi Liu



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 03/12] hw/core: introduce IOMMUSVAContext for virt-SVA
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 03/12] hw/core: introduce IOMMUSVAContext for virt-SVA Liu, Yi L
  2018-03-02 15:13   ` Paolo Bonzini
@ 2018-03-06  8:51   ` Liu, Yi L
  1 sibling, 0 replies; 65+ messages in thread
From: Liu, Yi L @ 2018-03-06  8:51 UTC (permalink / raw)
  To: qemu-devel, mst, david
  Cc: pbonzini, alex.williamson, eric.auger.pro, yi.l.liu, peterx,
	kevin.tian, jasowang

On Mon, Mar 05, 2018 at 02:25:09PM +1100, David Gibson wrote:
> On Thu, Mar 01, 2018 at 06:31:53PM +0800, Liu, Yi L wrote:
> > From: Peter Xu <peterx@redhat.com>
> >
> > This patch adds IOMMUSVAContext as an abstract for virt-SVA in
> > Qemu.
> >
> > IOMMUSVAContext is per-PASID(Process Address Space Identity).
> > A PASID Tagged AddressSpace should have an IOMMUSVAContext
> > created for it. virt-SVA emulation for emulated SVA capable
> > devices would use IOMMUSVAContext. And for assigned devices,
> > Qemu also needs to propagate guest tlb flush to host through
> > the sva_notifer based on IOMMUSVAContext.
> >
> > This patch proposes to include a sva_notifier list and
> > an IOMMUSVAContextOps in IOMMUSVAContext.
> >
> > * The sva_notifier list would include tlb invalidate nofitifer
> >   to propagate guest's iotlb flush to host.
> > * The first callback in IOMMUSVAContextOps would be an address
> >   translation callback. For the SVA aware DMAs issued by emulated
> >   SVA capable devices, it requires Qemu to emulate data read/write
> >   to guest process address space. Qemu needs to do address translation
> >   with guest process page table. So the IOMMUSVAContextOps.translate()
> >   callback would be helpful for emulating SVA capable devices.
> >
> > Note: to fulfill the IOMMUSVAContext based address translation
> > framework, may duplicate quite a few existing MemoryRegion based
> > translation code in Qemu. As this patchset is mainly to support
> > assigned SVA capable devices. So this patchset hasn't done the
> > duplication. In future, if any requirement for emulating SVA
> > capable device, it would require a separate patchset to fulfill
> > the translation framework.
> >
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> > ---
> >  hw/core/Makefile.objs   |   1 +
> >  hw/core/pasid.c         |  64 ++++++++++++++++++++++++++++
> >  include/hw/core/pasid.h | 110 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 175 insertions(+)
> >  create mode 100644 hw/core/pasid.c
> >  create mode 100644 include/hw/core/pasid.h
>
> [snip]
> > +
> > +#ifndef HW_PCI_PASID_H
> > +#define HW_PCI_PASID_H
> > +
> > +#include "qemu/queue.h"
> > +#ifndef CONFIG_USER_ONLY
> > +#include "exec/hwaddr.h"
> > +#endif
> > +
> > +typedef struct IOMMUSVAContext IOMMUSVAContext;
> > +
> > +enum IOMMUSVAEvent {
> > +    IOMMU_SVA_EVENT_TLB_INV,
> > +};
> > +typedef enum IOMMUSVAEvent IOMMUSVAEvent;
> > +
> > +struct IOMMUSVAEventData {
> > +    IOMMUSVAEvent event;
> > +    uint64_t length;
> > +    void *data;
> > +};
> > +typedef struct IOMMUSVAEventData IOMMUSVAEventData;
> > +
> > +typedef struct IOMMUSVANotifier IOMMUSVANotifier;
> > +
> > +typedef void (*IOMMUSVANotifyFn)(IOMMUSVANotifier *notifier,
> > +                                 IOMMUSVAEventData *event_data);
> > +
> > +typedef struct IOMMUSVATLBEntry IOMMUSVATLBEntry;
> > +
> > +/* See address_space_translate: bit 0 is read, bit 1 is write.  */
> > +typedef enum {
> > +    IOMMU_SVA_NONE = 0,
> > +    IOMMU_SVA_RO   = 1,
> > +    IOMMU_SVA_WO   = 2,
> > +    IOMMU_SVA_RW   = 3,
> > +} IOMMUSVAAccessFlags;
> > +
> > +#define IOMMU_SVA_ACCESS_FLAG(r, w) (((r) ? IOMMU_SVA_RO : 0) | \
> > +                                     ((w) ? IOMMU_SVA_WO : 0))
> > +
> > +struct IOMMUSVATLBEntry {
> > +    AddressSpace    *target_as;
> > +    hwaddr           va;
> > +    hwaddr           translated_addr;
> > +    hwaddr           addr_mask;  /* 0xfff = 4k translation */
> > +    IOMMUSVAAccessFlags perm;
> > +};
> > +
> > +typedef struct IOMMUSVAContextOps IOMMUSVAContextOps;
> > +struct IOMMUSVAContextOps {
> > +    /* Return a TLB entry that contains a given address. */
> > +    IOMMUSVATLBEntry (*translate)(IOMMUSVAContext *sva_ctx,
> > +                                  hwaddr addr, bool is_write);
> > +};
>
> A lot of the above seems to just duplicate stuff from IOMMU MRs and
> it's not clear why we need both.

Yes, this is for the potential SVA aware DMA emulation, and it is
similar to the IOMMU MRs. The only difference is that the translation
for a PASID tagged address space is based on IOMMUSVAContext. As for
why we need both, it is because it is not proper to mix the SVA
notifiers with the MAP/UNMAP notifiers in one chain.

> > +struct IOMMUSVANotifier {
> > +    IOMMUSVANotifyFn sva_notify;
> > +    /*
> > +     * What events we are listening to. Let's allow multiple event
> > +     * registrations from beginning.
> > +     */
> > +    IOMMUSVAEvent event;
> > +    QLIST_ENTRY(IOMMUSVANotifier) node;
> > +};
> > +
> > +/*
> > + * This stands for an IOMMU unit. Any translation device should have
> > + * this struct inside its own structure to make sure it can leverage
> > + * common IOMMU functionalities.
> > + */
> > +struct IOMMUSVAContext {
> > +    uint32_t pasid;
> > +    QLIST_HEAD(, IOMMUSVANotifier) sva_notifiers;
> > +    const IOMMUSVAContextOps *sva_ctx_ops;
> > +};
>
> I think the problem is here.  The SVAContext represents a *single*
> PASID, and once you have a single PASID the resulting object *is*
> functionally equivalent to an AddressSpace (though effectively
> required to have nothing but a single IOMMUMR within it).

I also evaluated reusing the IOMMU MR. If we reuse the IOMMU MR, the SVA
notifiers would sit in the same list as the MAP/UNMAP notifiers. This may
break some existing logic, e.g. each registration of an MR notifier results
in a flag change, and some vIOMMU emulator logic relies on that. Also, the
replay logic in the intel_iommu emulator relies on the MAP/UNMAP notifiers,
so adding a new kind of notifier to that list could be confusing. So I
didn't go with reusing the IOMMU MR, but any better idea would be welcome.

> It also seems to me unlikely that different PASIDs for the same device
> / IOMMU domain will have truly different sva_ctx_ops.

Yes, sva_ctx_ops should be the same for different PASIDs. So far, the
translate callback is the only candidate.

> It really seems to me the object you actually want is a level up from
> that, representing the whole cluster of address spaces indexed by
> PASID.  They would have the same operations for all PASIDs in the
> cluster, but those would take the pasid number.

Yes, that's also my thought. Here IOMMUSVAContext is supposed to be
per-PASID, but the sva_ctx_ops pointer is actually shared by all
IOMMUSVAContext instances. For the sva_notifiers list, I think it should
be per-PASID since some address spaces indexed by PASID do not require
an SVA notifier, e.g. the one bound to an emulated SVA capable device.
But the notifier functions are also shared by all IOMMUSVAContext
instances.

In this series, IOMMUSVAContext co-exists with an AddressSpace within a
super structure (VTDPASIDAddressSpace in patch 11 of this series).

Thanks,
Yi Liu

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 08/12] hw/pci: introduce pci_device_notify_iommu()
  2018-03-05  8:42     ` Liu, Yi L
@ 2018-03-06 10:18       ` Paolo Bonzini
  2018-03-06 11:03         ` Liu, Yi L
  0 siblings, 1 reply; 65+ messages in thread
From: Paolo Bonzini @ 2018-03-06 10:18 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: qemu-devel, mst, david, alex.williamson, eric.auger.pro,
	yi.l.liu, peterx, kevin.tian, jasowang

On 05/03/2018 09:42, Liu, Yi L wrote:
>> In general I think it's better to change your names from "assigned_dev"
>> to "sva_dev", because the point of the list is to only iterate over
>> devices that might be interested in using SVA.
>
> For "assigned_dev", my purpose is to distinguish "assigned devices" from
> emulated devices. Only the SVA usage on "assigned devices" is cared here.
> But it is true only SVA capable device is interested. So I may need to
> rename it as "assigned_sva_dev". How about your opinion?

What you care about is not whether the device assigned, but rather
whether it called or not pci_setup_sva_ops.  Currently only VFIO does
this, but that's not a requirement.  Hence my suggestion of calling it
sva_dev.

Paolo

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 08/12] hw/pci: introduce pci_device_notify_iommu()
  2018-03-05 10:43       ` Peter Xu
@ 2018-03-06 10:19         ` Paolo Bonzini
  2018-03-06 10:47           ` Peter Xu
  0 siblings, 1 reply; 65+ messages in thread
From: Paolo Bonzini @ 2018-03-06 10:19 UTC (permalink / raw)
  To: Peter Xu, Liu, Yi L
  Cc: qemu-devel, mst, david, alex.williamson, eric.auger.pro,
	yi.l.liu, kevin.tian, jasowang

On 05/03/2018 11:43, Peter Xu wrote:
> On Mon, Mar 05, 2018 at 04:43:09PM +0800, Liu, Yi L wrote:
>> On Fri, Mar 02, 2018 at 05:06:56PM +0100, Paolo Bonzini wrote:
>>> On 01/03/2018 11:33, Liu, Yi L wrote:
>>>> +    pci_device_notify_iommu(pdev, PCI_NTY_DEV_ADD);
>>>> +
>>>>      pci_setup_sva_ops(pdev, &vfio_pci_sva_ops);
>>>>  
>>>>      return;
>>>> @@ -3134,6 +3136,7 @@ static void vfio_exitfn(PCIDevice *pdev)
>>>>  {
>>>>      VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>>>>  
>>>> +    pci_device_notify_iommu(pdev, PCI_NTY_DEV_DEL);
>>>
>>> Please make the names longer: PCI_IOMMU_NOTIFY_DEVICE_ADDED and
>>> PCI_IOMMU_NOTIFY_DEVICE_REMOVED.  (This is independent of my other
>>> remark, about doing this in generic PCI code for all devices that
>>> register SVA ops).
>>
>> Thanks for the suggestion, will appply.
> 
> Isn't the name too generic if it's tailored for VFIO only? Would
> something like PCI_IOMMU_NOTIFY_VFIO_ADD be a bit better?

I don't think it's for VFIO only.  It's just that VFIO is the only
caller of pci_setup_sva_ops.

Paolo

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 11/12] intel_iommu: add framework for PASID AddressSpace management
  2018-03-05  9:11     ` Liu, Yi L
@ 2018-03-06 10:26       ` Paolo Bonzini
  2018-03-08 10:42         ` Liu, Yi L
  0 siblings, 1 reply; 65+ messages in thread
From: Paolo Bonzini @ 2018-03-06 10:26 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: qemu-devel, mst, david, kevin.tian, yi.l.liu, jasowang, peterx,
	alex.williamson, eric.auger.pro

On 05/03/2018 10:11, Liu, Yi L wrote:
>> Do you really need VTDDeviceNode?  I think can you simply put the
>> QLIST_ENTRY in VTDAddressSpace (named e.g. next_by_pasid), since
>> VTDAddressSpace already includes a (bus, devfn).
> Existing VTDAddressSpace is actaully per-device. While for PASID tagged
> address space, it is possible to have multiple devices tied to a single
> PASID tagged address space.

Yes, that's the purpose of VTDPASIDAddressSpace.

> Reuse VTDAddressSpace could be a choice since
> it is a per-device structure, but it may be missleading since there is
> other fileds in VTDAddressSpace. This is why I proposed to have VTDDeviceNode.

I think it's okay to put all per-device setup in VTDAddressSpace.  Later
if it makes sense VTDAddressSpace could become a union, according to
whether the IOMMU is configured for PASID or requester ID operation, and
could be renamed to VTDDeviceInfo.  But for now it's not needed.

Paolo

> But consolidation is possible here.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 05/12] hw/pci: introduce PCISVAOps to PCIDevice
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 05/12] hw/pci: introduce PCISVAOps to PCIDevice Liu, Yi L
  2018-03-02 15:10   ` Paolo Bonzini
@ 2018-03-06 10:33   ` Liu, Yi L
  2018-04-12  2:36     ` David Gibson
  1 sibling, 1 reply; 65+ messages in thread
From: Liu, Yi L @ 2018-03-06 10:33 UTC (permalink / raw)
  To: qemu-devel, mst, david
  Cc: pbonzini, alex.williamson, eric.auger.pro, yi.l.liu, peterx,
	kevin.tian, jasowang

On Mon, Mar 05, 2018 at 02:31:44PM +1100, David Gibson wrote:
> On Thu, Mar 01, 2018 at 06:31:55PM +0800, Liu, Yi L wrote:
> > This patch intoduces PCISVAOps for virt-SVA.
> >
> > So far, to setup virt-SVA for assigned SVA capable device, needs to
> > config host translation structures. e.g. for VT-d, needs to set the
> > guest pasid table to host and enable nested translation. Besides,
> > vIOMMU emulator needs to forward guest's cache invalidation to host.
> > On VT-d, it is guest's invalidation to 1st level translation related
> > cache, such invalidation should be forwarded to host.
> >
> > Proposed PCISVAOps are:
> > * sva_bind_guest_pasid_table: set the guest pasid table to host, and
> >                               enable nested translation in host
> > * sva_register_notifier: register sva_notifier to forward guest's
> >                          cache invalidation to host
> > * sva_unregister_notifier: unregister sva_notifier
> >
> > The PCISVAOps should be provided by vfio or modules alike. Mainly for
> > assigned SVA capable devices.
> >
> > Take virt-SVA on VT-d as an exmaple:
> > If a guest wants to setup virt-SVA for an assigned SVA capable device,
> > it programs its context entry. vIOMMU emulator captures guest's context
> > entry programming, and figure out the target device. vIOMMU emulator
> > use the pci_device_sva_bind_pasid_table() API to bind the guest pasid
> > table to host.
> >
> > Guest would also program its pasid table. vIOMMU emulator captures
> > guest's pasid entry programming. In Qemu, needs to allocate an
> > AddressSpace to stand for the pasid tagged address space and Qemu also
> > needs to register sva_notifier to forward future cache invalidation
> > request to host.
> >
> > Allocating AddressSpace to stand for the pasid tagged address space is
> > for the emulation of emulated SVA capable devices. Emulated SVA capable
> > devices may issue SVA aware DMAs, Qemu needs to emulate read/write to a
> > pasid tagged AddressSpace. Thus needs an abstraction for such address
> > space in Qemu.
> >
> > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
>
> So PCISVAOps is roughly equivalent to the cluster-of-PASIDs context I
> was suggesting in my earlier comments,

Yes, it is. The purpose is to expose the pasid table binding and sva
notifier registration/unregistration to vIOMMU emulators.

> however it's only an ops
> structure.  That means you can't easily share a context between
> multiple PCI devices which is unfortunate because:
>     * The simplest use case for SVA I can see would just put the
>       same set of PASIDs into place for every SVA capable device

Do you mean for an emulated SVA capable device?

>     * Sometimes the IOMMU can't determine exactly what device a DMA
>       came from.  Now the bridge cases where this applies are probably
>       unlikely with SVA devices, but I wouldn't want to bet on it.  In
>       addition, the chances some manufacturer will eventually put out
>       a buggy multifunction SVA capable device that use the wrong RIDs
>       for the secondary functions is pretty darn high.

I'm not sure I 100% got your point here. Do you mean a physical device?
In a PCIe TLP, the DMA packet should have a RID field? It looks more
like a hardware layer problem. This series only provides the necessary
software support to make sure the guest's SVA operation is well prepared
before the SVA device issues SVA aware DMAs, e.g. linking the guest's
pasid table to the host and configuring the iommu translation in nested
mode.

>
> So I think instead you want a cluster-of-PASIDs object which has an
> ops table including both these and the per-PASID calls from the
> earlier patches (but the per-PASID calls would now take an explicit
> PASID value).

I didn't quite get "including both these and the per-PASID calls".
What do you mean by "these"? Do you mean the PCISVAOps?

Thanks,
Yi Liu
> > ---
> >  hw/pci/pci.c         | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  include/hw/pci/pci.h | 21 ++++++++++++++++++
> >  2 files changed, 81 insertions(+)
> >
> > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > index e006b6a..157fe21 100644
> > --- a/hw/pci/pci.c
> > +++ b/hw/pci/pci.c
> > @@ -2573,6 +2573,66 @@ void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque)
> >      bus->iommu_opaque = opaque;
> >  }
> >
> > +void pci_setup_sva_ops(PCIDevice *dev, PCISVAOps *ops)
> > +{
> > +    if (dev) {
> > +        dev->sva_ops = ops;
> > +    }
> > +    return;
> > +}
> > +
> > +void pci_device_sva_bind_pasid_table(PCIBus *bus,
> > +                     int32_t devfn, uint64_t addr, uint32_t size)
> > +{
> > +    PCIDevice *dev;
> > +
> > +    if (!bus) {
> > +        return;
> > +    }
> > +
> > +    dev = bus->devices[devfn];
> > +    if (dev && dev->sva_ops) {
> > +        dev->sva_ops->sva_bind_pasid_table(bus, devfn, addr, size);
> > +    }
> > +    return;
> > +}
> > +
> > +void pci_device_sva_register_notifier(PCIBus *bus, int32_t devfn,
> > +                                      IOMMUSVAContext *sva_ctx)
> > +{
> > +    PCIDevice *dev;
> > +
> > +    if (!bus) {
> > +        return;
> > +    }
> > +
> > +    dev = bus->devices[devfn];
> > +    if (dev && dev->sva_ops) {
> > +        dev->sva_ops->sva_register_notifier(bus,
> > +                                            devfn,
> > +                                            sva_ctx);
> > +    }
> > +    return;
> > +}
> > +
> > +void pci_device_sva_unregister_notifier(PCIBus *bus, int32_t devfn,
> > +                                        IOMMUSVAContext *sva_ctx)
> > +{
> > +    PCIDevice *dev;
> > +
> > +    if (!bus) {
> > +        return;
> > +    }
> > +
> > +    dev = bus->devices[devfn];
> > +    if (dev && dev->sva_ops) {
> > +        dev->sva_ops->sva_unregister_notifier(bus,
> > +                                              devfn,
> > +                                              sva_ctx);
> > +    }
> > +    return;
> > +}
> > +
> >  static void pci_dev_get_w64(PCIBus *b, PCIDevice *dev, void *opaque)
> >  {
> >      Range *range = opaque;
> > diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> > index d8c18c7..32889a4 100644
> > --- a/include/hw/pci/pci.h
> > +++ b/include/hw/pci/pci.h
> > @@ -10,6 +10,8 @@
> >
> >  #include "hw/pci/pcie.h"
> >
> > +#include "hw/core/pasid.h"
> > +
> >  extern bool pci_available;
> >
> >  /* PCI bus */
> > @@ -262,6 +264,16 @@ struct PCIReqIDCache {
> >  };
> >  typedef struct PCIReqIDCache PCIReqIDCache;
> >
> > +typedef struct PCISVAOps PCISVAOps;
> > +struct PCISVAOps {
> > +    void (*sva_bind_pasid_table)(PCIBus *bus, int32_t devfn,
> > +             uint64_t pasidt_addr, uint32_t size);
> > +    void (*sva_register_notifier)(PCIBus *bus, int32_t devfn,
> > +                                  IOMMUSVAContext *sva_ctx);
> > +    void (*sva_unregister_notifier)(PCIBus *bus, int32_t devfn,
> > +                                    IOMMUSVAContext *sva_ctx);
> > +};
> > +
> >  struct PCIDevice {
> >      DeviceState qdev;
> >
> > @@ -351,6 +363,7 @@ struct PCIDevice {
> >      MSIVectorUseNotifier msix_vector_use_notifier;
> >      MSIVectorReleaseNotifier msix_vector_release_notifier;
> >      MSIVectorPollNotifier msix_vector_poll_notifier;
> > +    PCISVAOps *sva_ops;
> >  };
> >
> >  void pci_register_bar(PCIDevice *pci_dev, int region_num,
> > @@ -477,6 +490,14 @@ typedef AddressSpace *(*PCIIOMMUFunc)(PCIBus *, void *, int);
> >  AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
> >  void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque);
> >
> > +void pci_setup_sva_ops(PCIDevice *dev, PCISVAOps *ops);
> > +void pci_device_sva_bind_pasid_table(PCIBus *bus, int32_t devfn,
> > +                     uint64_t pasidt_addr, uint32_t size);
> > +void pci_device_sva_register_notifier(PCIBus *bus, int32_t devfn,
> > +                                      IOMMUSVAContext *sva_ctx);
> > +void pci_device_sva_unregister_notifier(PCIBus *bus, int32_t devfn,
> > +                                       IOMMUSVAContext *sva_ctx);
> > +
> >  static inline void
> >  pci_set_byte(uint8_t *config, uint8_t val)
> >  {

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 08/12] hw/pci: introduce pci_device_notify_iommu()
  2018-03-06 10:19         ` Paolo Bonzini
@ 2018-03-06 10:47           ` Peter Xu
  2018-03-06 11:06             ` Liu, Yi L
  0 siblings, 1 reply; 65+ messages in thread
From: Peter Xu @ 2018-03-06 10:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Liu, Yi L, qemu-devel, mst, david, alex.williamson,
	eric.auger.pro, yi.l.liu, kevin.tian, jasowang

On Tue, Mar 06, 2018 at 11:19:13AM +0100, Paolo Bonzini wrote:
> On 05/03/2018 11:43, Peter Xu wrote:
> > On Mon, Mar 05, 2018 at 04:43:09PM +0800, Liu, Yi L wrote:
> >> On Fri, Mar 02, 2018 at 05:06:56PM +0100, Paolo Bonzini wrote:
> >>> On 01/03/2018 11:33, Liu, Yi L wrote:
> >>>> +    pci_device_notify_iommu(pdev, PCI_NTY_DEV_ADD);
> >>>> +
> >>>>      pci_setup_sva_ops(pdev, &vfio_pci_sva_ops);
> >>>>  
> >>>>      return;
> >>>> @@ -3134,6 +3136,7 @@ static void vfio_exitfn(PCIDevice *pdev)
> >>>>  {
> >>>>      VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> >>>>  
> >>>> +    pci_device_notify_iommu(pdev, PCI_NTY_DEV_DEL);
> >>>
> >>> Please make the names longer: PCI_IOMMU_NOTIFY_DEVICE_ADDED and
> >>> PCI_IOMMU_NOTIFY_DEVICE_REMOVED.  (This is independent of my other
> >>> remark, about doing this in generic PCI code for all devices that
> >>> register SVA ops).
> >>
> >> Thanks for the suggestion, will appply.
> > 
> > Isn't the name too generic if it's tailored for VFIO only? Would
> > something like PCI_IOMMU_NOTIFY_VFIO_ADD be a bit better?
> 
> I don't think it's for VFIO only.  It's just that VFIO is the only
> caller of pci_setup_sva_ops.

Indeed.  E.g., we can have emulated devices that also want to provide
the SVA ops.

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 08/12] hw/pci: introduce pci_device_notify_iommu()
  2018-03-06 10:18       ` Paolo Bonzini
@ 2018-03-06 11:03         ` Liu, Yi L
  2018-03-06 11:22           ` Paolo Bonzini
  0 siblings, 1 reply; 65+ messages in thread
From: Liu, Yi L @ 2018-03-06 11:03 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kevin.tian, yi.l.liu, mst, jasowang, qemu-devel, peterx,
	alex.williamson, eric.auger.pro, david

On Tue, Mar 06, 2018 at 11:18:43AM +0100, Paolo Bonzini wrote:
> On 05/03/2018 09:42, Liu, Yi L wrote:
> >> In general I think it's better to change your names from "assigned_dev"
> >> to "sva_dev", because the point of the list is to only iterate over
> >> devices that might be interested in using SVA.
> >
> > For "assigned_dev", my purpose is to distinguish "assigned devices" from
> > emulated devices. Only the SVA usage on "assigned devices" is cared here.
> > But it is true only SVA capable device is interested. So I may need to
> > rename it as "assigned_sva_dev". How about your opinion?
> 
> What you care about is not whether the device assigned, but rather
> whether it called or not pci_setup_sva_ops.  Currently only VFIO does
> this, but that's not a requirement.  Hence my suggestion of calling it
> sva_dev.

Yes, only VFIO calls pci_setup_sva_ops so far, but it should not be
limited to VFIO. I'll apply this in the next version.

Thanks,
Yi Liu

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 08/12] hw/pci: introduce pci_device_notify_iommu()
  2018-03-06 10:47           ` Peter Xu
@ 2018-03-06 11:06             ` Liu, Yi L
  0 siblings, 0 replies; 65+ messages in thread
From: Liu, Yi L @ 2018-03-06 11:06 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, qemu-devel, mst, david, alex.williamson,
	eric.auger.pro, yi.l.liu, kevin.tian, jasowang

On Tue, Mar 06, 2018 at 06:47:27PM +0800, Peter Xu wrote:
> On Tue, Mar 06, 2018 at 11:19:13AM +0100, Paolo Bonzini wrote:
> > On 05/03/2018 11:43, Peter Xu wrote:
> > > On Mon, Mar 05, 2018 at 04:43:09PM +0800, Liu, Yi L wrote:
> > >> On Fri, Mar 02, 2018 at 05:06:56PM +0100, Paolo Bonzini wrote:
> > >>> On 01/03/2018 11:33, Liu, Yi L wrote:
> > >>>> +    pci_device_notify_iommu(pdev, PCI_NTY_DEV_ADD);
> > >>>> +
> > >>>>      pci_setup_sva_ops(pdev, &vfio_pci_sva_ops);
> > >>>>  
> > >>>>      return;
> > >>>> @@ -3134,6 +3136,7 @@ static void vfio_exitfn(PCIDevice *pdev)
> > >>>>  {
> > >>>>      VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> > >>>>  
> > >>>> +    pci_device_notify_iommu(pdev, PCI_NTY_DEV_DEL);
> > >>>
> > >>> Please make the names longer: PCI_IOMMU_NOTIFY_DEVICE_ADDED and
> > >>> PCI_IOMMU_NOTIFY_DEVICE_REMOVED.  (This is independent of my other
> > >>> remark, about doing this in generic PCI code for all devices that
> > >>> register SVA ops).
> > >>
> > >> Thanks for the suggestion, will apply.
> > > 
> > > Isn't the name too generic if it's tailored for VFIO only? Would
> > > something like PCI_IOMMU_NOTIFY_VFIO_ADD be a bit better?
> > 
> > I don't think it's for VFIO only.  It's just that VFIO is the only
> > caller of pci_setup_sva_ops.
> 
> Indeed.  E.g., we can have emulated devices that also want to provide
> the SVA ops.

Exactly as Paolo commented. ^_^

Thanks,
Yi Liu 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 08/12] hw/pci: introduce pci_device_notify_iommu()
  2018-03-06 11:03         ` Liu, Yi L
@ 2018-03-06 11:22           ` Paolo Bonzini
  2018-03-06 11:27             ` Liu, Yi L
  0 siblings, 1 reply; 65+ messages in thread
From: Paolo Bonzini @ 2018-03-06 11:22 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: kevin.tian, yi.l.liu, mst, jasowang, qemu-devel, peterx,
	alex.williamson, eric.auger.pro, david

On 06/03/2018 12:03, Liu, Yi L wrote:
> On Tue, Mar 06, 2018 at 11:18:43AM +0100, Paolo Bonzini wrote:
>> On 05/03/2018 09:42, Liu, Yi L wrote:
>>>> In general I think it's better to change your names from "assigned_dev"
>>>> to "sva_dev", because the point of the list is to only iterate over
>>>> devices that might be interested in using SVA.
>>>
>>> For "assigned_dev", my purpose is to distinguish "assigned devices" from
>>> emulated devices. Only the SVA usage on "assigned devices" is cared here.
>>> But it is true only SVA capable device is interested. So I may need to
>>> rename it as "assigned_sva_dev". How about your opinion?
>>
>> What you care about is not whether the device assigned, but rather
>> whether it called or not pci_setup_sva_ops.  Currently only VFIO does
>> this, but that's not a requirement.  Hence my suggestion of calling it
>> sva_dev.
> 
> Yes, only VFIO calls pci_setup_sva_ops so far, but it should not be limited to that.
> I'll apply the rename in the next version.

For what it's worth, I agree with David's suggestion for naming (so
pci_setup_pasid_ops, pasid_dev, etc.)

Paolo

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 08/12] hw/pci: introduce pci_device_notify_iommu()
  2018-03-06 11:22           ` Paolo Bonzini
@ 2018-03-06 11:27             ` Liu, Yi L
  0 siblings, 0 replies; 65+ messages in thread
From: Liu, Yi L @ 2018-03-06 11:27 UTC (permalink / raw)
  To: Paolo Bonzini, Liu, Yi L
  Cc: Tian, Kevin, mst, jasowang, qemu-devel, peterx, alex.williamson,
	eric.auger.pro, david

> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Tuesday, March 6, 2018 7:22 PM
> Subject: Re: [Qemu-devel] [PATCH v3 08/12] hw/pci: introduce
> pci_device_notify_iommu()
> 
> On 06/03/2018 12:03, Liu, Yi L wrote:
> > On Tue, Mar 06, 2018 at 11:18:43AM +0100, Paolo Bonzini wrote:
> >> On 05/03/2018 09:42, Liu, Yi L wrote:
> >>>> In general I think it's better to change your names from "assigned_dev"
> >>>> to "sva_dev", because the point of the list is to only iterate over
> >>>> devices that might be interested in using SVA.
> >>>
> >>> For "assigned_dev", my purpose is to distinguish "assigned devices"
> >>> from emulated devices. Only the SVA usage on "assigned devices" is cared here.
> >>> But it is true only SVA capable device is interested. So I may need
> >>> to rename it as "assigned_sva_dev". How about your opinion?
> >>
> >> What you care about is not whether the device assigned, but rather
> >> whether it called or not pci_setup_sva_ops.  Currently only VFIO does
> >> this, but that's not a requirement.  Hence my suggestion of calling
> >> it sva_dev.
> >
> > Yes, only VFIO calls pci_setup_sva_ops so far, but it should not limited to.
> > I'll apply in next version.
> 
> For what it's worth, I agree with David's suggestion for naming (so
> pci_setup_pasid_ops, pasid_dev, etc.)

Thanks, Paolo. I will follow the suggestions from you two.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 12/12] intel_iommu: bind device to PASID tagged AddressSpace
  2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 12/12] intel_iommu: bind device to PASID tagged AddressSpace Liu, Yi L
  2018-03-02 14:51   ` Paolo Bonzini
@ 2018-03-06 11:43   ` Peter Xu
  2018-03-08  9:39     ` Liu, Yi L
  1 sibling, 1 reply; 65+ messages in thread
From: Peter Xu @ 2018-03-06 11:43 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: qemu-devel, mst, david, pbonzini, alex.williamson,
	eric.auger.pro, yi.l.liu, kevin.tian, jasowang

On Thu, Mar 01, 2018 at 06:33:35PM +0800, Liu, Yi L wrote:
> This patch shows the idea of how a device is bound to a PASID tagged
> AddressSpace.
> 
> When the Intel vIOMMU emulator detects a pasid table entry programming
> from the guest, it first finds a VTDPASIDAddressSpace with the pasid
> field of the pasid cache invalidate request.
> 
> * If it is to bind a device to a guest process, add the device to the
>   device list behind the VTDPASIDAddressSpace. And if the device is an
>   assigned device, register an sva_notifier for future tlb flushing in
>   case any mapping of the process address space changes.
> 
> * If it is to unbind a device from a guest process, remove the device
>   from the device list behind the VTDPASIDAddressSpace. And also
>   unregister the sva_notifier if the device is an assigned device.
> 
> This patch hasn't added the unbind logic. It depends on guest pasid
> table entry parsing which requires further emulation. Here we just want
> to show the idea of the PASID tagged AddressSpace management framework.
> Full unregister logic will be included in the future virt-SVA patchset.
> 
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> ---
>  hw/i386/intel_iommu.c          | 119 +++++++++++++++++++++++++++++++++++++++++
>  hw/i386/intel_iommu_internal.h |  10 ++++
>  2 files changed, 129 insertions(+)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index b8e8dbb..ed07035 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -1801,6 +1801,118 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
>      return true;
>  }
>  
> +static VTDPASIDAddressSpace *vtd_get_pasid_as(IntelIOMMUState *s,
> +                                              uint32_t pasid)
> +{
> +    VTDPASIDAddressSpace *vtd_pasid_as = NULL;
> +    IntelPASIDNode *node;
> +    char name[128];
> +
> +    QLIST_FOREACH(node, &(s->pasid_as_list), next) {
> +        vtd_pasid_as = node->pasid_as;
> +        if (pasid == vtd_pasid_as->sva_ctx.pasid) {
> +            return vtd_pasid_as;
> +        }
> +    }

This seems to be a per-iommu pasid table.  However, from the spec it
looks more like it should be per-domain (I'm looking at figure 3-8).
For example, each domain should be able to have its own pasid table.
Then IIUC a pasid context will need a (domain, pasid) tuple to
identify it, not only the pasid itself?

And, do we need to destroy the pasid context after it's freed by the
guest?  Here it seems that we'll cache it forever.

> +
> +    vtd_pasid_as = g_malloc0(sizeof(*vtd_pasid_as));
> +    vtd_pasid_as->iommu_state = s;
> +    snprintf(name, sizeof(name), "intel_iommu_pasid_%d", pasid);
> +    address_space_init(&vtd_pasid_as->as, NULL, "pasid");

I saw that this is only inited and never used.  Could I ask when this
will be used?

> +    QLIST_INIT(&vtd_pasid_as->device_list);
> +
> +    node = g_malloc0(sizeof(*node));
> +    node->pasid_as = vtd_pasid_as;
> +    QLIST_INSERT_HEAD(&s->pasid_as_list, node, next);
> +
> +    return vtd_pasid_as;
> +}
> +
> +static void vtd_bind_device_to_pasid_as(VTDPASIDAddressSpace *vtd_pasid_as,
> +                                        PCIBus *bus, uint8_t devfn)
> +{
> +    VTDDeviceNode *node = NULL;
> +
> +    QLIST_FOREACH(node, &(vtd_pasid_as->device_list), next) {
> +        if (node->bus == bus && node->devfn == devfn) {
> +            return;
> +        }
> +    }
> +
> +    node = g_malloc0(sizeof(*node));
> +    node->bus = bus;
> +    node->devfn = devfn;
> +    QLIST_INSERT_HEAD(&(vtd_pasid_as->device_list), node, next);

So here I have the same confusion - IIUC according to the spec two
devices can have different pasid tables, however they can be sharing
the same PASID number (e.g., pasid=1) in the table.  Here, since
vtd_pasid_as is only per-IOMMU, could it be possible that we put multiple
devices under the same PASID context while they are actually not sharing
the same process page table?  Problematic?

Please correct me if needed.

> +
> +    pci_device_sva_register_notifier(bus, devfn, &vtd_pasid_as->sva_ctx);
> +
> +    return;
> +}
> +
> +static bool vtd_process_pc_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
> +{
> +
> +    IntelIOMMUAssignedDeviceNode *node = NULL;
> +    int ret = 0;
> +
> +    uint16_t domain_id;
> +    uint32_t pasid;
> +    VTDPASIDAddressSpace *vtd_pasid_as;
> +
> +    if ((inv_desc->lo & VTD_INV_DESC_PASIDC_RSVD_LO) ||
> +        (inv_desc->hi & VTD_INV_DESC_PASIDC_RSVD_HI)) {
> +        return false;
> +    }
> +
> +    domain_id = VTD_INV_DESC_PASIDC_DID(inv_desc->lo);
> +
> +    switch (inv_desc->lo & VTD_INV_DESC_PASIDC_G) {
> +    case VTD_INV_DESC_PASIDC_ALL_ALL:
> +        /* TODO: invalidate all pasid related cache */

I think it's fine as RFC, but we'd better have this in the final
version?

IIUC you'll need caching-mode too for virt-sva, and here you'll
possibly need to walk and scan every context entry that has the same
domain ID specified in the invalidation request.  Maybe further you'll
need to scan the pasid entries too, register notifiers when needed.

Thanks,

> +        break;
> +
> +    case VTD_INV_DESC_PASIDC_PASID_SI:
> +        pasid = VTD_INV_DESC_PASIDC_PASID(inv_desc->lo);
> +        vtd_pasid_as = vtd_get_pasid_as(s, pasid);
> +        QLIST_FOREACH(node, &(s->assigned_device_list), next) {
> +            VTDAddressSpace *vtd_as = node->vtd_as;
> +            VTDContextEntry ce;
> +            uint16_t did;
> +            uint8_t bus = pci_bus_num(vtd_as->bus);
> +            ret = vtd_dev_to_context_entry(s, bus,
> +                                   vtd_as->devfn, &ce);
> +            if (ret != 0) {
> +                continue;
> +            }
> +
> +            did = VTD_CONTEXT_ENTRY_DID(ce.hi);
> +            /*
> +             * If the did field equals the domain_id field of the inv_desc,
> +             * then the device is affected by this invalidate request and
> +             * needs to be bound to or unbound from the pasid tagged
> +             * address space.
> +             * a) If it is a bind, add the device to the device list and
> +             *    register a tlb flush notifier for it
> +             * b) If it is an unbind, remove the device from the device
> +             *    list, and unregister the tlb flush notifier
> +             * TODO: add the unbind logic accordingly; it depends on guest
> +             *       pasid table entry parsing, which is not emulated here.
> +             *
> +             */
> +            if (did == domain_id) {
> +                vtd_bind_device_to_pasid_as(vtd_pasid_as,
> +                                  vtd_as->bus, vtd_as->devfn);
> +            }
> +        }
> +        break;
> +
> +    default:
> +        return false;
> +    }
> +
> +    return true;
> +}
> +

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 07/12] vfio/pci: register sva notifier
  2018-03-06  8:00     ` Liu, Yi L
@ 2018-03-06 12:09       ` Peter Xu
  2018-03-08 11:22         ` Liu, Yi L
  0 siblings, 1 reply; 65+ messages in thread
From: Peter Xu @ 2018-03-06 12:09 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Liu, Yi L, qemu-devel, mst, david, pbonzini, alex.williamson,
	eric.auger.pro, Tian, Kevin, jasowang

On Tue, Mar 06, 2018 at 08:00:41AM +0000, Liu, Yi L wrote:
> > From: Peter Xu [mailto:peterx@redhat.com]
> > Sent: Tuesday, March 6, 2018 2:45 PM
> > Subject: Re: [PATCH v3 07/12] vfio/pci: register sva notifier
> > 
> > On Thu, Mar 01, 2018 at 06:33:30PM +0800, Liu, Yi L wrote:
> > > This patch shows how sva notifier is registered. And provided an
> > > example by registering notify func for tlb flush propagation.
> > >
> > > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> > > ---
> > >  hw/vfio/pci.c | 55
> > > +++++++++++++++++++++++++++++++++++++++++++++++++++++--
> > >  1 file changed, 53 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index a60a4d7..b7297cc
> > > 100644
> > > --- a/hw/vfio/pci.c
> > > +++ b/hw/vfio/pci.c
> > > @@ -2775,6 +2775,26 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice
> > *vdev)
> > >      vdev->req_enabled = false;
> > >  }
> > >
> > > +static VFIOContainer *vfio_get_container_from_busdev(PCIBus *bus,
> > > +                                                     int32_t devfn) {
> > > +    VFIOGroup *group;
> > > +    VFIOPCIDevice *vdev_iter;
> > > +    VFIODevice *vbasedev_iter;
> > > +    PCIDevice *pdev_iter;
> > > +
> > > +    QLIST_FOREACH(group, &vfio_group_list, next) {
> > > +        QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
> > > +            vdev_iter = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
> > > +            pdev_iter = &vdev_iter->pdev;
> > > +            if (pci_get_bus(pdev_iter) == bus && pdev_iter->devfn == devfn) {
> > > +                return group->container;
> > > +            }
> > > +        }
> > > +    }
> > > +    return NULL;
> > > +}
> > > +
> > >  static void vfio_pci_device_sva_bind_pasid_table(PCIBus *bus,
> > >                   int32_t devfn, uint64_t pasidt_addr, uint32_t size)
> > > { @@ -2783,11 +2803,42 @@ static void
> > > vfio_pci_device_sva_bind_pasid_table(PCIBus *bus,
> > >      So far, Intel VT-d and AMD IOMMU requires it. */  }
> > >
> > > +static void vfio_iommu_sva_tlb_invalidate_notify(IOMMUSVANotifier *n,
> > > +                                                 IOMMUSVAEventData
> > > +*event_data) {
> > > +/*  Sample code, would be detailed in coming virt-SVA patchset.
> > > +    VFIOGuestIOMMUSVAContext *gsva_ctx;
> > > +    IOMMUSVAContext *sva_ctx;
> > > +    VFIOContainer *container;
> > > +
> > > +    gsva_ctx = container_of(n, VFIOGuestIOMMUSVAContext, n);
> > > +    container = gsva_ctx->container;
> > > +
> > > +    TODO: forward to host through VFIO IOCTL
> > 
> > IMHO if the series is not ready for merging, we can still mark it as RFC and declare
> > that so people won't need to go into details of the patches.
> 
> Thanks for the suggestion. Actually, I was hesitant about it. As you may know, this is already
> the 3rd version of this effort. But yes, I will follow your suggestion in coming versions.

Yeah, it's been a long way since the first version of the work.
However, IMHO it's not about which version you are working with, it's
about whether you think it's a complete work and ready to be merged.
IMHO if you are very sure it's not ready for merging, we'd better
add the RFC tag, or mention that in the cover letter.  That way the
maintainer won't accidentally merge your series; meanwhile
reviewers will know the state of the series so they can decide which
aspects to focus on during the review.

> 
> > > +*/
> > > +}
> > > +
> > >  static void vfio_pci_device_sva_register_notifier(PCIBus *bus,
> > >                            int32_t devfn, IOMMUSVAContext *sva_ctx)  {
> > > -    /* Register notifier for TLB invalidation propagation
> > > -       */
> > > +    VFIOContainer *container = vfio_get_container_from_busdev(bus,
> > > + devfn);
> > > +
> > > +    if (container != NULL) {
> > > +        VFIOGuestIOMMUSVAContext *gsva_ctx;
> > > +        gsva_ctx = g_malloc0(sizeof(*gsva_ctx));
> > > +        gsva_ctx->sva_ctx = sva_ctx;
> > > +        gsva_ctx->container = container;
> > > +        QLIST_INSERT_HEAD(&container->gsva_ctx_list,
> > > +                          gsva_ctx,
> > > +                          gsva_ctx_next);
> > > +       /* Register vfio_iommu_sva_tlb_invalidate_notify with event flag
> > > +           IOMMU_SVA_EVENT_TLB_INV */
> > > +        iommu_sva_notifier_register(sva_ctx,
> > > +                                    &gsva_ctx->n,
> > > +                                    vfio_iommu_sva_tlb_invalidate_notify,
> > > +                                    IOMMU_SVA_EVENT_TLB_INV);
> > 
> > I would squash this patch into previous one since basically this is only part of the
> > implementation to provide vfio-speicific register hook.
> 
> sure.
> 
> > But a more important question is... why this?
> > 
> > IMHO the notifier registration can be general for PCI.  Why vfio needs to provide it's
> > own register callback?  Would it be enough if it only provides its own notify callback?
> 
> The notifiers are in VFIO. However, the registration is controlled by the vIOMMU emulator.
> In this series, a PASID tagged Address Space is introduced, and the new notifiers are for
> such an Address Space. Such an Address Space is created and deleted by the vIOMMU emulator,
> so the notifier registration needs to happen accordingly.
> 
> e.g. when a guest SVM application binds a device to a process, it programs the guest iommu
> translation structure; the vIOMMU emulator captures the change, creates a PASID
> tagged Address Space for it and registers the notifiers.
> 
> That's why I do it in such a manner.

I agree that the things are mostly managed by the vIOMMU, but I still
cannot understand why vfio must have its own register hook.

Let me try to explain my question a bit more.  Basically I was
asking whether we can remove the register/unregister hook from
the SVAOps, and instead have something like (I'll start to use
pasid as the prefix):

struct PCIPASIDOps {
    void (*pasid_bind_table)(PCIBus *bus, int32_t devfn, ...);
    void (*pasid_invalidate_extend_iotlb)(PCIBus *bus, int32_t devfn, ...);
};

Firstly we keep the bind_table operation, but instead of the reg/unreg
we only provide a hook that a device can override to listen for
extended-iotlb invalidations.

IMHO the vIOMMU should even be able to hide the detailed PASID
information here, and only call pasid_invalidate_extend_iotlb() when
the device gets extended-iotlb invalidations (then it passes them down
to the host IOMMU, with the same information such as domain ID, PASID,
granularity).
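
For example, something like below on the hw/pci side (just a sketch; the
wrapper name, the pasid_ops field and the parameter list are all made up
here):

void pci_device_pasid_invalidate_extend_iotlb(PCIBus *bus, int32_t devfn,
                                              uint32_t pasid, hwaddr addr,
                                              uint64_t size,
                                              uint32_t granularity)
{
    PCIDevice *dev = bus->devices[devfn];

    /* only devices that installed PCIPASIDOps get the callback */
    if (dev && dev->pasid_ops && dev->pasid_ops->pasid_invalidate_extend_iotlb) {
        dev->pasid_ops->pasid_invalidate_extend_iotlb(bus, devfn, pasid,
                                                      addr, size, granularity);
    }
}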

Would that work?

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 00/12] Introduce new iommu notifier framework for virt-SVA
  2018-03-06  7:45   ` Liu, Yi L
@ 2018-03-07  5:38     ` Peter Xu
  2018-03-08  9:10       ` Liu, Yi L
  0 siblings, 1 reply; 65+ messages in thread
From: Peter Xu @ 2018-03-07  5:38 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Liu, Yi L, qemu-devel, mst, david, pbonzini, alex.williamson,
	eric.auger.pro, Tian, Kevin, jasowang

On Tue, Mar 06, 2018 at 07:45:39AM +0000, Liu, Yi L wrote:

[...]

> > Do you have online branch so that I can check out?
> 
> yes, I should have pasted it. Here it is:
> https://github.com/luxis1999/sva_notifier.git

Thanks.

> 
> > The patches are a bit scattered and it's really hard for me to
> > reference things within it... So a complete tree to read would be
> > nice.
> > 
> > I roughly went over most of the patches, and the framework you
> > introduced is still not that clear to me.  For now I feel like it can
> > be simplified somehow, but I'll hold and speak after I read the whole
> > tree again.
> > 
> > Also, it'll be good too if you can always provide some status update
> > of the kernel-counterpart it.
> 
> Good suggestion. This patchset only affects Qemu, but for the whole
> virt-SVA enabling there are kernel counterparts. I would do
> it in the virt-SVA patchset series.

If you still want to post separately, I'm thinking it may be
better to put the vfio changes into the 2nd virt-sva series, since they
look more like they belong in that category.  Or, say, we can introduce
SVAOps/PASIDOps, implement more vIOMMU invalidation request
handling, and call it in IOMMU code, but not implement any of the
devices (vfio) that provide those ops.

Or maybe we can just post the whole stuff altogether, since after all
these two series are still closely related IMHO (e.g., the SVAOps
definition should be closely related to how the first vfio user would
like to use it).

Only my two cents, and I don't know what other people think.  It's up
to you after all. :)

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 00/12] Introduce new iommu notifier framework for virt-SVA
  2018-03-07  5:38     ` Peter Xu
@ 2018-03-08  9:10       ` Liu, Yi L
  0 siblings, 0 replies; 65+ messages in thread
From: Liu, Yi L @ 2018-03-08  9:10 UTC (permalink / raw)
  To: Peter Xu
  Cc: Liu, Yi L, qemu-devel, mst, david, pbonzini, alex.williamson,
	eric.auger.pro, Tian, Kevin, jasowang

> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Wednesday, March 7, 2018 1:38 PM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Cc: Liu, Yi L <yi.l.liu@linux.intel.com>; qemu-devel@nongnu.org; mst@redhat.com;
> david@gibson.dropbear.id.au; pbonzini@redhat.com; alex.williamson@redhat.com;
> eric.auger.pro@gmail.com; Tian, Kevin <kevin.tian@intel.com>;
> jasowang@redhat.com
> Subject: Re: [PATCH v3 00/12] Introduce new iommu notifier framework for virt-SVA
> 
> On Tue, Mar 06, 2018 at 07:45:39AM +0000, Liu, Yi L wrote:
> 
> [...]
> 
> > > Do you have online branch so that I can check out?
> >
> > yes, I should have pasted it. Here it is:
> > https://github.com/luxis1999/sva_notifier.git
> 
> Thanks.
> 
> >
> > > The patches are a bit scattered and it's really hard for me to
> > > reference things within it... So a complete tree to read would be
> > > nice.
> > >
> > > I roughly went over most of the patches, and the framework you
> > > introduced is still not that clear to me.  For now I feel like it
> > > can be simplified somehow, but I'll hold and speak after I read the
> > > whole tree again.
> > >
> > > Also, it'll be good too if you can always provide some status update
> > > of the kernel-counterpart it.
> >
> > Good suggestion. For this patchset, it only affects Qemu. Yeah, but
> > for the whole virt-SVA enabling, there is kernel-counterparts. I would
> > do it in the virt-SVA patchset series.
> 
> If you still want to post separately - I'm thinking whether it'll be good you put the
> vfio changes into the 2nd virt-sva series, since that looks more like in that category.
> Or say, we can introduce SVAOps/PASIDOps, we implement more vIOMMU
> invalidation request handling, we call it in IOMMU code, but we don't implement any
> of the device (vfio) that provide that ops.
> 
> Or maybe we can just post the whole stuff altogether, since after all these two series
> are still closely related IMHO (e.g., the SVAOps definition should be closely related to
> how the first vfio user would like to use it).
> 
> Only my two cents, and I don't know how other people think.  It's up to you after
> all. :)

Your suggestion is appreciated. My initial plan is: once the SVAOps/PASIDOps and
SVAContext proposal is accepted by reviewers, I can further merge the two
series. So far, I still need to work with David on the SVAContext definition. I'll
weigh your suggestion when sending the next version.

Thanks,
Yi Liu


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 12/12] intel_iommu: bind device to PASID tagged AddressSpace
  2018-03-06 11:43   ` Peter Xu
@ 2018-03-08  9:39     ` Liu, Yi L
  2018-03-09  7:59       ` Peter Xu
  0 siblings, 1 reply; 65+ messages in thread
From: Liu, Yi L @ 2018-03-08  9:39 UTC (permalink / raw)
  To: Peter Xu, Liu, Yi L
  Cc: qemu-devel, mst, david, pbonzini, alex.williamson,
	eric.auger.pro, Tian, Kevin, jasowang

> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Tuesday, March 6, 2018 7:44 PM
> Subject: Re: [PATCH v3 12/12] intel_iommu: bind device to PASID tagged
> AddressSpace
> 
> On Thu, Mar 01, 2018 at 06:33:35PM +0800, Liu, Yi L wrote:
> > This patch shows the idea of how a device is binded to a PASID tagged
> > AddressSpace.
> >
> > when Intel vIOMMU emulator detected a pasid table entry programming
> > from guest. Intel vIOMMU emulator firstly finds a VTDPASIDAddressSpace
> > with the pasid field of pasid cache invalidate request.
> >
> > * If it is to bind a device to a guest process, needs add the device
> >   to the device list behind the VTDPASIDAddressSpace. And if the device
> >   is assigned device, need to register sva_notfier for future tlb
> >   flushing if any mapping changed to the process address space.
> >
> > * If it is to unbind a device from a guest process, then need to remove
> >   the device from the device list behind the VTDPASIDAddressSpace.
> >   And also needs to unregister the sva_notfier if the device is assigned
> >   device.
> >
> > This patch hasn't added the unbind logic. It depends on guest pasid
> > table entry parsing which requires further emulation. Here just want
> > to show the idea for the PASID tagged AddressSpace management framework.
> > Full unregister logic would be included in future virt-SVA patchset.
> >
> > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> > ---
> >  hw/i386/intel_iommu.c          | 119
> +++++++++++++++++++++++++++++++++++++++++
> >  hw/i386/intel_iommu_internal.h |  10 ++++
> >  2 files changed, 129 insertions(+)
> >
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index b8e8dbb..ed07035 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -1801,6 +1801,118 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState
> *s, VTDInvDesc *inv_desc)
> >      return true;
> >  }
> >
> > +static VTDPASIDAddressSpace *vtd_get_pasid_as(IntelIOMMUState *s,
> > +                                              uint32_t pasid)
> > +{
> > +    VTDPASIDAddressSpace *vtd_pasid_as = NULL;
> > +    IntelPASIDNode *node;
> > +    char name[128];
> > +
> > +    QLIST_FOREACH(node, &(s->pasid_as_list), next) {
> > +        vtd_pasid_as = node->pasid_as;
> > +        if (pasid == vtd_pasid_as->sva_ctx.pasid) {
> > +            return vtd_pasid_as;
> > +        }
> > +    }
> 
> This seems to be a per-iommu pasid table.  However from the spec it
> looks more like that should be per-domain (I'm seeing figure 3-8).
> For example, each domain should be able to have its own pasid table.
> Then IIUC a pasid context will need a (domain, pasid) tuple to
> identify, not only the pasid itself?

Yes, this is a per-iommu table here. Actually, how we assemble the
table depends on the PASID namespace. You may refer to the
iommu driver code (intel-svm.c); there it is actually per-iommu:

		/* Do not use PASID 0 in caching mode (virtualised IOMMU) */
		ret = idr_alloc(&iommu->pasid_idr, svm,
				!!cap_caching_mode(iommu->cap),
				pasid_max - 1, GFP_KERNEL);

> 
> And, do we need to destroy the pasid context after it's freed by the
> guest?  Here it seems that we'll cache it forever.

Yes, we need to do it. A PASID can be bound to multiple devices. If there
is no device bound to it any more, it needs to be destroyed. This may be done
with a refcount. As I mentioned in the description, that requires further
vIOMMU emulation, so I didn't include it here, but it should be covered
in the final version. Good catch.
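
Roughly, the unbind side could look like below (just a sketch; the refcnt
field, vtd_destroy_pasid_as() and the unregister helper do not exist in
this series yet):

static void vtd_unbind_device_from_pasid_as(VTDPASIDAddressSpace *vtd_pasid_as,
                                            PCIBus *bus, uint8_t devfn)
{
    VTDDeviceNode *node, *tmp;

    QLIST_FOREACH_SAFE(node, &vtd_pasid_as->device_list, next, tmp) {
        if (node->bus == bus && node->devfn == devfn) {
            /* stop receiving tlb flush notifications for this device */
            pci_device_sva_unregister_notifier(bus, devfn,
                                               &vtd_pasid_as->sva_ctx);
            QLIST_REMOVE(node, next);
            g_free(node);
            /* destroy the PASID address space once no device is bound */
            if (--vtd_pasid_as->refcnt == 0) {
                vtd_destroy_pasid_as(vtd_pasid_as);
            }
            return;
        }
    }
}
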

> 
> > +
> > +    vtd_pasid_as = g_malloc0(sizeof(*vtd_pasid_as));
> > +    vtd_pasid_as->iommu_state = s;
> > +    snprintf(name, sizeof(name), "intel_iommu_pasid_%d", pasid);
> > +    address_space_init(&vtd_pasid_as->as, NULL, "pasid");
> 
> I saw that this is only inited and never used.  Could I ask when this
> will be used?

The AddressSpace is actually introduced for future support of emulated
SVA capable devices and possible 1st level page table shadowing (similar
to the 2nd level page table shadowing you upstreamed).

> 
> > +    QLIST_INIT(&vtd_pasid_as->device_list);
> > +
> > +    node = g_malloc0(sizeof(*node));
> > +    node->pasid_as = vtd_pasid_as;
> > +    QLIST_INSERT_HEAD(&s->pasid_as_list, node, next);
> > +
> > +    return vtd_pasid_as;
> > +}
> > +
> > +static void vtd_bind_device_to_pasid_as(VTDPASIDAddressSpace *vtd_pasid_as,
> > +                                        PCIBus *bus, uint8_t devfn)
> > +{
> > +    VTDDeviceNode *node = NULL;
> > +
> > +    QLIST_FOREACH(node, &(vtd_pasid_as->device_list), next) {
> > +        if (node->bus == bus && node->devfn == devfn) {
> > +            return;
> > +        }
> > +    }
> > +
> > +    node = g_malloc0(sizeof(*node));
> > +    node->bus = bus;
> > +    node->devfn = devfn;
> > +    QLIST_INSERT_HEAD(&(vtd_pasid_as->device_list), node, next);
> 
> So here I have the same confusion - IIUC according to the spec two
> devices can have differnet pasid tables, however they can be sharing
> the same PASID number (e.g., pasid=1) in the table.  

Do you mean the pasid table in the iommu driver? I cannot say it is impossible,
but you may notice that in the current iommu driver, the devices behind a single
iommu unit share a pasid table.

> Here since
> vtd_pasid_as is only per-IOMMU, could it possible that we put multiple
> devices under same PASID context while actually they are not sharing
> the same process page table?  Problematic?

You are correct, two devices may be under the same PASID context. As for
the case you described, I don't think it is allowed, as it breaks the
PASID concept. Software should avoid it.

Does that make sense?

> 
> Please correct me if needed.
> 
> > +
> > +    pci_device_sva_register_notifier(bus, devfn, &vtd_pasid_as->sva_ctx);
> > +
> > +    return;
> > +}
> > +
> > +static bool vtd_process_pc_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
> > +{
> > +
> > +    IntelIOMMUAssignedDeviceNode *node = NULL;
> > +    int ret = 0;
> > +
> > +    uint16_t domain_id;
> > +    uint32_t pasid;
> > +    VTDPASIDAddressSpace *vtd_pasid_as;
> > +
> > +    if ((inv_desc->lo & VTD_INV_DESC_PASIDC_RSVD_LO) ||
> > +        (inv_desc->hi & VTD_INV_DESC_PASIDC_RSVD_HI)) {
> > +        return false;
> > +    }
> > +
> > +    domain_id = VTD_INV_DESC_PASIDC_DID(inv_desc->lo);
> > +
> > +    switch (inv_desc->lo & VTD_INV_DESC_PASIDC_G) {
> > +    case VTD_INV_DESC_PASIDC_ALL_ALL:
> > +        /* TODO: invalidate all pasid related cache */
> 
> I think it's fine as RFC, but we'd better have this in the final
> version?

Definitely.

> 
> IIUC you'll need caching-mode too for virt-sva, and here you'll
> possibly need to walk and scan every context entry that has the same
> domain ID specified in the invalidation request.  Maybe further you'll
> need to scan the pasid entries too, register notifiers when needed.

Sure, should be in the full virt-sva series.

Thanks,
Yi Liu

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 11/12] intel_iommu: add framework for PASID AddressSpace management
  2018-03-06 10:26       ` Paolo Bonzini
@ 2018-03-08 10:42         ` Liu, Yi L
  0 siblings, 0 replies; 65+ messages in thread
From: Liu, Yi L @ 2018-03-08 10:42 UTC (permalink / raw)
  To: Paolo Bonzini, Liu, Yi L
  Cc: qemu-devel, mst, david, Tian, Kevin, jasowang, peterx,
	alex.williamson, eric.auger.pro

> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Tuesday, March 6, 2018 6:27 PM
> Subject: Re: [Qemu-devel] [PATCH v3 11/12] intel_iommu: add framework for PASID
> AddressSpace management
> 
> On 05/03/2018 10:11, Liu, Yi L wrote:
> >> Do you really need VTDDeviceNode?  I think can you simply put the
> >> QLIST_ENTRY in VTDAddressSpace (named e.g. next_by_pasid), since
> >> VTDAddressSpace already includes a (bus, devfn).
> > Existing VTDAddressSpace is actaully per-device. While for PASID
> > tagged address space, it is possible to have multiple devices tied to
> > a single PASID tagged address space.
> 
> Yes, that's the purpose of VTDPASIDAddressSpace.
> 
> > Reuse VTDAddressSpace could be a choice since it is a per-device
> > structure, but it may be missleading since there is other fileds in
> > VTDAddressSpace. This is why I proposed to have VTDDeviceNode.
> 
> I think it's okay to put all per-device setup in VTDAddressSpace.  Later if it makes
> sense VTDAddressSpace could become a union, according to whether the IOMMU is
> configured for PASID or requester ID operation, and could be renamed to
> VTDDeviceInfo.  But for now it's not needed.

Agreed. Let me apply the idea in next version.

Thanks,
Yi Liu

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 07/12] vfio/pci: register sva notifier
  2018-03-06 12:09       ` Peter Xu
@ 2018-03-08 11:22         ` Liu, Yi L
  2018-03-09  7:05           ` Peter Xu
  0 siblings, 1 reply; 65+ messages in thread
From: Liu, Yi L @ 2018-03-08 11:22 UTC (permalink / raw)
  To: Peter Xu
  Cc: Liu, Yi L, qemu-devel, mst, david, pbonzini, alex.williamson,
	eric.auger.pro, Tian, Kevin, jasowang

> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Tuesday, March 6, 2018 8:10 PM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Cc: Liu, Yi L <yi.l.liu@linux.intel.com>; qemu-devel@nongnu.org; mst@redhat.com;
> david@gibson.dropbear.id.au; pbonzini@redhat.com; alex.williamson@redhat.com;
> eric.auger.pro@gmail.com; Tian, Kevin <kevin.tian@intel.com>;
> jasowang@redhat.com
> Subject: Re: [PATCH v3 07/12] vfio/pci: register sva notifier
> 
> On Tue, Mar 06, 2018 at 08:00:41AM +0000, Liu, Yi L wrote:
> > > From: Peter Xu [mailto:peterx@redhat.com]
> > > Sent: Tuesday, March 6, 2018 2:45 PM
> > > Subject: Re: [PATCH v3 07/12] vfio/pci: register sva notifier
> > >
> > > On Thu, Mar 01, 2018 at 06:33:30PM +0800, Liu, Yi L wrote:
> > > > This patch shows how sva notifier is registered. And provided an
> > > > example by registering notify func for tlb flush propagation.
> > > >
> > > > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> > > > ---
> > > >  hw/vfio/pci.c | 55
> > > > +++++++++++++++++++++++++++++++++++++++++++++++++++++--
> > > >  1 file changed, 53 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index a60a4d7..b7297cc
> > > > 100644
> > > > --- a/hw/vfio/pci.c
> > > > +++ b/hw/vfio/pci.c
> > > > @@ -2775,6 +2775,26 @@ static void
> > > > vfio_unregister_req_notifier(VFIOPCIDevice
> > > *vdev)
> > > >      vdev->req_enabled = false;
> > > >  }
> > > >
> > > > +static VFIOContainer *vfio_get_container_from_busdev(PCIBus *bus,
> > > > +                                                     int32_t devfn) {
> > > > +    VFIOGroup *group;
> > > > +    VFIOPCIDevice *vdev_iter;
> > > > +    VFIODevice *vbasedev_iter;
> > > > +    PCIDevice *pdev_iter;
> > > > +
> > > > +    QLIST_FOREACH(group, &vfio_group_list, next) {
> > > > +        QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
> > > > +            vdev_iter = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
> > > > +            pdev_iter = &vdev_iter->pdev;
> > > > +            if (pci_get_bus(pdev_iter) == bus && pdev_iter->devfn == devfn) {
> > > > +                return group->container;
> > > > +            }
> > > > +        }
> > > > +    }
> > > > +    return NULL;
> > > > +}
> > > > +
> > > >  static void vfio_pci_device_sva_bind_pasid_table(PCIBus *bus,
> > > >                   int32_t devfn, uint64_t pasidt_addr, uint32_t
> > > > size) { @@ -2783,11 +2803,42 @@ static void
> > > > vfio_pci_device_sva_bind_pasid_table(PCIBus *bus,
> > > >      So far, Intel VT-d and AMD IOMMU requires it. */  }
> > > >
> > > > +static void vfio_iommu_sva_tlb_invalidate_notify(IOMMUSVANotifier *n,
> > > > +
> > > > +IOMMUSVAEventData
> > > > +*event_data) {
> > > > +/*  Sample code, would be detailed in coming virt-SVA patchset.
> > > > +    VFIOGuestIOMMUSVAContext *gsva_ctx;
> > > > +    IOMMUSVAContext *sva_ctx;
> > > > +    VFIOContainer *container;
> > > > +
> > > > +    gsva_ctx = container_of(n, VFIOGuestIOMMUSVAContext, n);
> > > > +    container = gsva_ctx->container;
> > > > +
> > > > +    TODO: forward to host through VFIO IOCTL
> > >
> > > IMHO if the series is not ready for merging, we can still mark it as
> > > RFC and declare that so people won't need to go into details of the patches.
> >
> > Thanks for the suggestion. Actually, I was hesitating it. As you may
> > know, this is actually 3rd version of this effort. But yes, I would follow your
> suggestion in coming versions.
> 
> Yeah, it's a long way even since the first version of the work.
> However IMHO it's not about which version are you working with, it's about whether
> you think it's a complete work and ready to be merged.
> IMHO if you are very sure it's not good for merging, we should better provide the
> RFC tag, or mention that in the cover letter.  So firstly the maintainer won't
> accidentaly merge your series; meanwhile reviewers will know the state of series so
> they can decide on which aspect they'll focus on during the review.

Thanks for the guidance~

> >
> > > > +*/
> > > > +}
> > > > +
> > > >  static void vfio_pci_device_sva_register_notifier(PCIBus *bus,
> > > >                            int32_t devfn, IOMMUSVAContext *sva_ctx)  {
> > > > -    /* Register notifier for TLB invalidation propagation
> > > > -       */
> > > > +    VFIOContainer *container =
> > > > + vfio_get_container_from_busdev(bus,
> > > > + devfn);
> > > > +
> > > > +    if (container != NULL) {
> > > > +        VFIOGuestIOMMUSVAContext *gsva_ctx;
> > > > +        gsva_ctx = g_malloc0(sizeof(*gsva_ctx));
> > > > +        gsva_ctx->sva_ctx = sva_ctx;
> > > > +        gsva_ctx->container = container;
> > > > +        QLIST_INSERT_HEAD(&container->gsva_ctx_list,
> > > > +                          gsva_ctx,
> > > > +                          gsva_ctx_next);
> > > > +       /* Register vfio_iommu_sva_tlb_invalidate_notify with event flag
> > > > +           IOMMU_SVA_EVENT_TLB_INV */
> > > > +        iommu_sva_notifier_register(sva_ctx,
> > > > +                                    &gsva_ctx->n,
> > > > +                                    vfio_iommu_sva_tlb_invalidate_notify,
> > > > +                                    IOMMU_SVA_EVENT_TLB_INV);
> > >
> > > I would squash this patch into previous one since basically this is
> > > only part of the implementation to provide vfio-speicific register hook.
> >
> > sure.
> >
> > > But a more important question is... why this?
> > >
> > > IMHO the notifier registration can be general for PCI.  Why vfio
> > > needs to provide it's own register callback?  Would it be enough if it only
> provides its own notify callback?
> >
> > The notifiers are in VFIO. However, the registration is controlled by vIOMMU
> emulator.
> > In this series, PASID tagged Address Space is introduced. And the new
> > notifiers are for such Address Space. Such Address Space is created and deleted in
> vIOMMU emulator.
> > So the notifier registration needs to happen accordingly.
> >
> > e.g. guest SVM application bind a device to a process, it programs the
> > guest iommu translation structure, vIOMMU emulator captures the
> > change, and create a PASID tagged Address Space for it and register notifiers.
> >
> > That's why I do it in such a manner.
> 
> I agree that the things are mostly managed by vIOMMU, but I still cannot understand
> why vfio must have its own register hook.
> 
> Let me try to explain a bit more on my question.  Basically I was asking about
> whether we can removet the register/unregister hook in the SVAOps, instead we can
> have something like (I'll start to use pasid as prefix):
> 
> struct PCIPASIDOps {
>     void (*pasid_bind_table)(PCIBus *bus, int32_t devfn, ...);
>     void (*pasid_invalidate_extend_iotlb)(PCIBus *bus, int32_t devfn, ...) };
> 
> Firstly we keep the bind_table operation, but instead of the reg/unreg we only
> provide a hook that device can override to listen to extend iotlb invalidations.

Yeah, I also considered doing invalidation in this manner, but turned to the one in this
patch. The reason is as below:
    the invalidate operation is supposed to be passed down through a vfio container IOCTL;
    for each pasid_invalidate_extend_iotlb() call, we would need to figure out the vfio
    container, which may be time consuming. Please refer to vfio_get_container_from_busdev()
    in this patch. If we expose register/unregister hooks, searching for the container
    happens only in the register/unregister phase, and future invalidations are just
    notifier firings.
With the reason above, I chose the register/unregister hook solution. If there is a way
to avoid the container search, it would be better to do it in your proposal. Please feel
free to let me know if you have any idea.

> IMHO my understanding is that the vIOMMU should be able to even hide the detailed
> PASID information here, and only call the
> pasid_invalidate_extend_iotlb() if the device gets extended-iotlb invalidations (then
> it passes it down to host IOMMU, with the same information like domain ID, PASID,
> granularity).
> 
> Would that work?

Address, size, PASID and granularity may be enough; the DID should be handled on the host side.
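
e.g. the event data could roughly carry the following (just a sketch; the
struct and field names are made up and not final):

typedef struct IOMMUSVATLBInvEventData {
    uint32_t pasid;        /* guest PASID the invalidation applies to */
    hwaddr   addr;         /* start address of the range */
    uint64_t size;         /* size of the range in bytes */
    uint32_t granularity;  /* e.g. page-selective vs. PASID-selective */
} IOMMUSVATLBInvEventData;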

Thanks,
Yi Liu

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 07/12] vfio/pci: register sva notifier
  2018-03-08 11:22         ` Liu, Yi L
@ 2018-03-09  7:05           ` Peter Xu
  2018-03-09 10:25             ` Liu, Yi L
  0 siblings, 1 reply; 65+ messages in thread
From: Peter Xu @ 2018-03-09  7:05 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Liu, Yi L, qemu-devel, mst, david, pbonzini, alex.williamson,
	eric.auger.pro, Tian, Kevin, jasowang

On Thu, Mar 08, 2018 at 11:22:26AM +0000, Liu, Yi L wrote:
> > From: Peter Xu [mailto:peterx@redhat.com]
> > Sent: Tuesday, March 6, 2018 8:10 PM
> > To: Liu, Yi L <yi.l.liu@intel.com>
> > Cc: Liu, Yi L <yi.l.liu@linux.intel.com>; qemu-devel@nongnu.org; mst@redhat.com;
> > david@gibson.dropbear.id.au; pbonzini@redhat.com; alex.williamson@redhat.com;
> > eric.auger.pro@gmail.com; Tian, Kevin <kevin.tian@intel.com>;
> > jasowang@redhat.com
> > Subject: Re: [PATCH v3 07/12] vfio/pci: register sva notifier
> > 
> > On Tue, Mar 06, 2018 at 08:00:41AM +0000, Liu, Yi L wrote:
> > > > From: Peter Xu [mailto:peterx@redhat.com]
> > > > Sent: Tuesday, March 6, 2018 2:45 PM
> > > > Subject: Re: [PATCH v3 07/12] vfio/pci: register sva notifier
> > > >
> > > > On Thu, Mar 01, 2018 at 06:33:30PM +0800, Liu, Yi L wrote:
> > > > > This patch shows how sva notifier is registered. And provided an
> > > > > example by registering notify func for tlb flush propagation.
> > > > >
> > > > > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> > > > > ---
> > > > >  hw/vfio/pci.c | 55
> > > > > +++++++++++++++++++++++++++++++++++++++++++++++++++++--
> > > > >  1 file changed, 53 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index a60a4d7..b7297cc
> > > > > 100644
> > > > > --- a/hw/vfio/pci.c
> > > > > +++ b/hw/vfio/pci.c
> > > > > @@ -2775,6 +2775,26 @@ static void
> > > > > vfio_unregister_req_notifier(VFIOPCIDevice
> > > > *vdev)
> > > > >      vdev->req_enabled = false;
> > > > >  }
> > > > >
> > > > > +static VFIOContainer *vfio_get_container_from_busdev(PCIBus *bus,
> > > > > +                                                     int32_t devfn) {
> > > > > +    VFIOGroup *group;
> > > > > +    VFIOPCIDevice *vdev_iter;
> > > > > +    VFIODevice *vbasedev_iter;
> > > > > +    PCIDevice *pdev_iter;
> > > > > +
> > > > > +    QLIST_FOREACH(group, &vfio_group_list, next) {
> > > > > +        QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
> > > > > +            vdev_iter = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
> > > > > +            pdev_iter = &vdev_iter->pdev;
> > > > > +            if (pci_get_bus(pdev_iter) == bus && pdev_iter->devfn == devfn) {
> > > > > +                return group->container;
> > > > > +            }
> > > > > +        }
> > > > > +    }
> > > > > +    return NULL;
> > > > > +}
> > > > > +
> > > > >  static void vfio_pci_device_sva_bind_pasid_table(PCIBus *bus,
> > > > >                   int32_t devfn, uint64_t pasidt_addr, uint32_t
> > > > > size) { @@ -2783,11 +2803,42 @@ static void
> > > > > vfio_pci_device_sva_bind_pasid_table(PCIBus *bus,
> > > > >      So far, Intel VT-d and AMD IOMMU requires it. */  }
> > > > >
> > > > > +static void vfio_iommu_sva_tlb_invalidate_notify(IOMMUSVANotifier *n,
> > > > > +
> > > > > +IOMMUSVAEventData
> > > > > +*event_data) {
> > > > > +/*  Sample code, would be detailed in coming virt-SVA patchset.
> > > > > +    VFIOGuestIOMMUSVAContext *gsva_ctx;
> > > > > +    IOMMUSVAContext *sva_ctx;
> > > > > +    VFIOContainer *container;
> > > > > +
> > > > > +    gsva_ctx = container_of(n, VFIOGuestIOMMUSVAContext, n);
> > > > > +    container = gsva_ctx->container;
> > > > > +
> > > > > +    TODO: forward to host through VFIO IOCTL
> > > >
> > > > IMHO if the series is not ready for merging, we can still mark it as
> > > > RFC and declare that so people won't need to go into details of the patches.
> > >
> > > Thanks for the suggestion. Actually, I was hesitating it. As you may
> > > know, this is actually 3rd version of this effort. But yes, I would follow your
> > suggestion in coming versions.
> > 
> > Yeah, it's a long way even since the first version of the work.
> > However IMHO it's not about which version are you working with, it's about whether
> > you think it's a complete work and ready to be merged.
> > IMHO if you are very sure it's not good for merging, we should better provide the
> > RFC tag, or mention that in the cover letter.  So firstly the maintainer won't
> > accidentaly merge your series; meanwhile reviewers will know the state of series so
> > they can decide on which aspect they'll focus on during the review.
> 
> thanks for the guiding~
> 
> > >
> > > > > +*/
> > > > > +}
> > > > > +
> > > > >  static void vfio_pci_device_sva_register_notifier(PCIBus *bus,
> > > > >                            int32_t devfn, IOMMUSVAContext *sva_ctx)  {
> > > > > -    /* Register notifier for TLB invalidation propagation
> > > > > -       */
> > > > > +    VFIOContainer *container =
> > > > > + vfio_get_container_from_busdev(bus,
> > > > > + devfn);
> > > > > +
> > > > > +    if (container != NULL) {
> > > > > +        VFIOGuestIOMMUSVAContext *gsva_ctx;
> > > > > +        gsva_ctx = g_malloc0(sizeof(*gsva_ctx));
> > > > > +        gsva_ctx->sva_ctx = sva_ctx;
> > > > > +        gsva_ctx->container = container;
> > > > > +        QLIST_INSERT_HEAD(&container->gsva_ctx_list,
> > > > > +                          gsva_ctx,
> > > > > +                          gsva_ctx_next);
> > > > > +       /* Register vfio_iommu_sva_tlb_invalidate_notify with event flag
> > > > > +           IOMMU_SVA_EVENT_TLB_INV */
> > > > > +        iommu_sva_notifier_register(sva_ctx,
> > > > > +                                    &gsva_ctx->n,
> > > > > +                                    vfio_iommu_sva_tlb_invalidate_notify,
> > > > > +                                    IOMMU_SVA_EVENT_TLB_INV);
> > > >
> > > > I would squash this patch into previous one since basically this is
> > > > only part of the implementation to provide vfio-speicific register hook.
> > >
> > > sure.
> > >
> > > > But a more important question is... why this?
> > > >
> > > > IMHO the notifier registration can be general for PCI.  Why vfio
> > > > needs to provide it's own register callback?  Would it be enough if it only
> > provides its own notify callback?
> > >
> > > The notifiers are in VFIO. However, the registration is controlled by vIOMMU
> > emulator.
> > > In this series, PASID tagged Address Space is introduced. And the new
> > > notifiers are for such Address Space. Such Address Space is created and deleted in
> > vIOMMU emulator.
> > > So the notifier registration needs to happen accordingly.
> > >
> > > e.g. guest SVM application bind a device to a process, it programs the
> > > guest iommu translation structure, vIOMMU emulator captures the
> > > change, and create a PASID tagged Address Space for it and register notifiers.
> > >
> > > That's why I do it in such a manner.
> > 
> > I agree that the things are mostly managed by vIOMMU, but I still cannot understand
> > why vfio must have its own register hook.
> > 
> > Let me try to explain a bit more on my question.  Basically I was asking about
> > whether we can removet the register/unregister hook in the SVAOps, instead we can
> > have something like (I'll start to use pasid as prefix):
> > 
> > struct PCIPASIDOps {
> >     void (*pasid_bind_table)(PCIBus *bus, int32_t devfn, ...);
> >     void (*pasid_invalidate_extend_iotlb)(PCIBus *bus, int32_t devfn, ...) };
> > 
> > Firstly we keep the bind_table operation, but instead of the reg/unreg we only
> > provide a hook that device can override to listen to extend iotlb invalidations.
> 
> Yeah, I also considered do invalidation this manner. I turned to the one in this patch.
> Reason as below:
>     the invalidate operation is supposed to pass down thru vfio container IOCTL, for
>     each pasid_invalidate_extend_iotlb() calling, it needs to figure out a vfio container,
>     which may be time consuming. Pls refer to vfio_get_container_from_busdev() in this
>     patch. If we expose register/unregister hook, searching container will happen only in
>     the register/unregister phase. And future invalidation could only be notifier firing.
> With the reason above, I chose the register/unregister hook solution. If there is solution
> to save the container searching, it would be better to do it in your proposal. Pls feel free
> to let me know if any idea from you.

If PCIBus* and devfn are passed into
pasid_invalidate_extend_iotlb() (let's assume it's called this way),
then IMHO we can get the PCIDevice*, which can be upcast to a
VFIOPCIDevice; wouldn't VFIOPCIDevice.vbasedev.group->container then be
the container for that device?
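
Something like below, as a quick sketch (assuming the callback is only
installed on vfio-pci devices, so the upcast is safe):

static VFIOContainer *vfio_container_from_pdev(PCIBus *bus, int32_t devfn)
{
    /* pci_find_device() is the existing hw/pci lookup; this helper
     * itself is illustrative only */
    PCIDevice *pdev = pci_find_device(bus, pci_bus_num(bus), devfn);
    VFIOPCIDevice *vdev;

    if (!pdev) {
        return NULL;
    }
    vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
    return vdev->vbasedev.group->container;
}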

> 
> > IMHO my understanding is that the vIOMMU should be able to even hide the detailed
> > PASID information here, and only call the
> > pasid_invalidate_extend_iotlb() if the device gets extended-iotlb invalidations (then
> > it passes it down to host IOMMU, with the same information like domain ID, PASID,
> > granularity).
> > 
> > Would that work?
> 
> address, size, PASID, granularity may be enough. DID should be in host.

Yeah, it is.

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 12/12] intel_iommu: bind device to PASID tagged AddressSpace
  2018-03-08  9:39     ` Liu, Yi L
@ 2018-03-09  7:59       ` Peter Xu
  2018-03-09  8:09         ` Tian, Kevin
  2018-03-09 11:05         ` Liu, Yi L
  0 siblings, 2 replies; 65+ messages in thread
From: Peter Xu @ 2018-03-09  7:59 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Liu, Yi L, qemu-devel, mst, david, pbonzini, alex.williamson,
	eric.auger.pro, Tian, Kevin, jasowang

On Thu, Mar 08, 2018 at 09:39:18AM +0000, Liu, Yi L wrote:
> > From: Peter Xu [mailto:peterx@redhat.com]
> > Sent: Tuesday, March 6, 2018 7:44 PM
> > Subject: Re: [PATCH v3 12/12] intel_iommu: bind device to PASID tagged
> > AddressSpace
> > 
> > On Thu, Mar 01, 2018 at 06:33:35PM +0800, Liu, Yi L wrote:
> > > This patch shows the idea of how a device is binded to a PASID tagged
> > > AddressSpace.
> > >
> > > when Intel vIOMMU emulator detected a pasid table entry programming
> > > from guest. Intel vIOMMU emulator firstly finds a VTDPASIDAddressSpace
> > > with the pasid field of pasid cache invalidate request.
> > >
> > > * If it is to bind a device to a guest process, needs add the device
> > >   to the device list behind the VTDPASIDAddressSpace. And if the device
> > >   is assigned device, need to register sva_notfier for future tlb
> > >   flushing if any mapping changed to the process address space.
> > >
> > > * If it is to unbind a device from a guest process, then need to remove
> > >   the device from the device list behind the VTDPASIDAddressSpace.
> > >   And also needs to unregister the sva_notfier if the device is assigned
> > >   device.
> > >
> > > This patch hasn't added the unbind logic. It depends on guest pasid
> > > table entry parsing which requires further emulation. Here just want
> > > to show the idea for the PASID tagged AddressSpace management framework.
> > > Full unregister logic would be included in future virt-SVA patchset.
> > >
> > > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> > > ---
> > >  hw/i386/intel_iommu.c          | 119
> > +++++++++++++++++++++++++++++++++++++++++
> > >  hw/i386/intel_iommu_internal.h |  10 ++++
> > >  2 files changed, 129 insertions(+)
> > >
> > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > > index b8e8dbb..ed07035 100644
> > > --- a/hw/i386/intel_iommu.c
> > > +++ b/hw/i386/intel_iommu.c
> > > @@ -1801,6 +1801,118 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState
> > *s, VTDInvDesc *inv_desc)
> > >      return true;
> > >  }
> > >
> > > +static VTDPASIDAddressSpace *vtd_get_pasid_as(IntelIOMMUState *s,
> > > +                                              uint32_t pasid)
> > > +{
> > > +    VTDPASIDAddressSpace *vtd_pasid_as = NULL;
> > > +    IntelPASIDNode *node;
> > > +    char name[128];
> > > +
> > > +    QLIST_FOREACH(node, &(s->pasid_as_list), next) {
> > > +        vtd_pasid_as = node->pasid_as;
> > > +        if (pasid == vtd_pasid_as->sva_ctx.pasid) {
> > > +            return vtd_pasid_as;
> > > +        }
> > > +    }
> > 
> > This seems to be a per-iommu pasid table.  However from the spec it
> > looks more like that should be per-domain (I'm seeing figure 3-8).
> > For example, each domain should be able to have its own pasid table.
> > Then IIUC a pasid context will need a (domain, pasid) tuple to
> > identify, not only the pasid itself?
> 
> Yes, this is a per-iommu table here. Actually, how we assemble the
> table here depends on the PASID namespace. You may refer to the
> iommu driver code. intel-svm.c, it's actually per-iommu.
> 
> 		/* Do not use PASID 0 in caching mode (virtualised IOMMU) */
> 		ret = idr_alloc(&iommu->pasid_idr, svm,
> 				!!cap_caching_mode(iommu->cap),
> 				pasid_max - 1, GFP_KERNEL);

Thanks for the pointer.

However, from the spec I see that the PASID table pointer is per
context entry, which IMHO means that the spec allows the PASID table to
be different from one device to another.  Even if current Linux shares
a single PASID table now, I don't know whether that may change in the
future.  Also, what if we run a guest with another OS besides Linux?

After all we are emulating the device, so IIUC the only thing we
should follow is the spec.

> 
> > 
> > And, do we need to destroy the pasid context after it's freed by the
> > guest?  Here it seems that we'll cache it forever.
> 
> If we need to do it. A PASID can be bind to multiple devices. If there
> is no device binding on it, then needs to destroy it. This may be done
> by refcount. As I mentioned in the description, that requires further
> vIOMMU emulation, so I didn't include it. But it should be covered
> in final version. Good catch.
> 
> > 
> > > +
> > > +    vtd_pasid_as = g_malloc0(sizeof(*vtd_pasid_as));
> > > +    vtd_pasid_as->iommu_state = s;
> > > +    snprintf(name, sizeof(name), "intel_iommu_pasid_%d", pasid);
> > > +    address_space_init(&vtd_pasid_as->as, NULL, "pasid");
> > 
> > I saw that this is only inited and never used.  Could I ask when this
> > will be used?
> 
> AddressSpace is actually introduced for future support of emulated
> SVA capable devices and possible 1st level paging shadowing(similar
> to the 2nd level page table shadowing you upstreamed).

I am not sure whether that will be useful even if there is such a
device.  The reason is that current with-IOMMU guests actually
"somehow" bypass the address space framework by calling the IOMMU MR's
translate() to do the page walking. If there is an emulated device
that (for example) supports PASID, with the 1st level page table
enabled, I think it'll also work naturally with the current
translate() interface; it's just that in the VT-d code we'll need to
walk a process page table this time rather than the IOMMU device page
table.

And no matter what, I would prefer you drop this address space until
it is first used.

> 
> > 
> > > +    QLIST_INIT(&vtd_pasid_as->device_list);
> > > +
> > > +    node = g_malloc0(sizeof(*node));
> > > +    node->pasid_as = vtd_pasid_as;
> > > +    QLIST_INSERT_HEAD(&s->pasid_as_list, node, next);
> > > +
> > > +    return vtd_pasid_as;
> > > +}
> > > +
> > > +static void vtd_bind_device_to_pasid_as(VTDPASIDAddressSpace *vtd_pasid_as,
> > > +                                        PCIBus *bus, uint8_t devfn)
> > > +{
> > > +    VTDDeviceNode *node = NULL;
> > > +
> > > +    QLIST_FOREACH(node, &(vtd_pasid_as->device_list), next) {
> > > +        if (node->bus == bus && node->devfn == devfn) {
> > > +            return;
> > > +        }
> > > +    }
> > > +
> > > +    node = g_malloc0(sizeof(*node));
> > > +    node->bus = bus;
> > > +    node->devfn = devfn;
> > > +    QLIST_INSERT_HEAD(&(vtd_pasid_as->device_list), node, next);
> > 
> > So here I have the same confusion - IIUC according to the spec two
> > devices can have different pasid tables, however they can be sharing
> > the same PASID number (e.g., pasid=1) in the table.
> 
> Do you mean the pasid table in the iommu driver? I cannot say it is
> impossible, but you may notice that in the current iommu driver, the
> devices behind a single iommu unit share a pasid table.
> 
> > Here since
> > vtd_pasid_as is only per-IOMMU, could it be possible that we put
> > multiple devices under the same PASID context while actually they are
> > not sharing the same process page table?  Problematic?
> 
> You are correct, two devices may be under the same PASID context. For
> the case you described, I don't think it is allowed as it breaks the
> PASID concept. Software should avoid it.

Yeah, so here my question would be the same as above: does the spec
require that all devices _must_ share a PASID table, or is it just
that we _can_ share one in a first version of the Linux SVA
implementation?

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 12/12] intel_iommu: bind device to PASID tagged AddressSpace
  2018-03-09  7:59       ` Peter Xu
@ 2018-03-09  8:09         ` Tian, Kevin
  2018-03-09 11:05         ` Liu, Yi L
  1 sibling, 0 replies; 65+ messages in thread
From: Tian, Kevin @ 2018-03-09  8:09 UTC (permalink / raw)
  To: Peter Xu, Liu, Yi L
  Cc: Liu, Yi L, qemu-devel, mst, david, pbonzini, alex.williamson,
	eric.auger.pro, jasowang

> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Friday, March 9, 2018 3:59 PM
> 
> On Thu, Mar 08, 2018 at 09:39:18AM +0000, Liu, Yi L wrote:
> > > From: Peter Xu [mailto:peterx@redhat.com]
> > > Sent: Tuesday, March 6, 2018 7:44 PM
> > > Subject: Re: [PATCH v3 12/12] intel_iommu: bind device to PASID tagged
> > > AddressSpace
> > >
> > > On Thu, Mar 01, 2018 at 06:33:35PM +0800, Liu, Yi L wrote:
> > > > This patch shows the idea of how a device is binded to a PASID tagged
> > > > AddressSpace.
> > > >
> > > > when Intel vIOMMU emulator detected a pasid table entry
> programming
> > > > from guest. Intel vIOMMU emulator firstly finds a
> VTDPASIDAddressSpace
> > > > with the pasid field of pasid cache invalidate request.
> > > >
> > > > * If it is to bind a device to a guest process, needs add the device
> > > >   to the device list behind the VTDPASIDAddressSpace. And if the
> device
> > > >   is assigned device, need to register sva_notfier for future tlb
> > > >   flushing if any mapping changed to the process address space.
> > > >
> > > > * If it is to unbind a device from a guest process, then need to remove
> > > >   the device from the device list behind the VTDPASIDAddressSpace.
> > > >   And also needs to unregister the sva_notfier if the device is assigned
> > > >   device.
> > > >
> > > > This patch hasn't added the unbind logic. It depends on guest pasid
> > > > table entry parsing which requires further emulation. Here just want
> > > > to show the idea for the PASID tagged AddressSpace management
> framework.
> > > > Full unregister logic would be included in future virt-SVA patchset.
> > > >
> > > > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> > > > ---
> > > >  hw/i386/intel_iommu.c          | 119
> > > +++++++++++++++++++++++++++++++++++++++++
> > > >  hw/i386/intel_iommu_internal.h |  10 ++++
> > > >  2 files changed, 129 insertions(+)
> > > >
> > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > > > index b8e8dbb..ed07035 100644
> > > > --- a/hw/i386/intel_iommu.c
> > > > +++ b/hw/i386/intel_iommu.c
> > > > @@ -1801,6 +1801,118 @@ static bool
> vtd_process_iotlb_desc(IntelIOMMUState
> > > *s, VTDInvDesc *inv_desc)
> > > >      return true;
> > > >  }
> > > >
> > > > +static VTDPASIDAddressSpace *vtd_get_pasid_as(IntelIOMMUState
> *s,
> > > > +                                              uint32_t pasid)
> > > > +{
> > > > +    VTDPASIDAddressSpace *vtd_pasid_as = NULL;
> > > > +    IntelPASIDNode *node;
> > > > +    char name[128];
> > > > +
> > > > +    QLIST_FOREACH(node, &(s->pasid_as_list), next) {
> > > > +        vtd_pasid_as = node->pasid_as;
> > > > +        if (pasid == vtd_pasid_as->sva_ctx.pasid) {
> > > > +            return vtd_pasid_as;
> > > > +        }
> > > > +    }
> > >
> > > This seems to be a per-iommu pasid table.  However from the spec it
> > > looks more like that should be per-domain (I'm seeing figure 3-8).
> > > For example, each domain should be able to have its own pasid table.
> > > Then IIUC a pasid context will need a (domain, pasid) tuple to
> > > identify, not only the pasid itself?
> >
> > Yes, this is a per-iommu table here. Actually, how we assemble the
> > table here depends on the PASID namespace. You may refer to the
> > iommu driver code. intel-svm.c, it's actually per-iommu.
> >
> > 		/* Do not use PASID 0 in caching mode (virtualised IOMMU)
> */
> > 		ret = idr_alloc(&iommu->pasid_idr, svm,
> > 				!!cap_caching_mode(iommu->cap),
> > 				pasid_max - 1, GFP_KERNEL);
> 
> Thanks for the pointer.
> 
> However from the spec, I see that PASID table pointer is per-context,
> IMHO which means that the spec will allow the PASID table to be
> different from one device to another.  Even if current Linux is
> sharing a single PASID table now, I don't know whether that can be
> expanded in the future.  Also, what if we run a guest with another OS
> besides Linux?
> 
> After all we are emulating the device, so IIUC the only thing we
> should follow is the spec.
> 
> >
> > >
> > > And, do we need to destroy the pasid context after it's freed by the
> > > guest?  Here it seems that we'll cache it forever.
> >
> > If we need to do it. A PASID can be bind to multiple devices. If there
> > is no device binding on it, then needs to destroy it. This may be done
> > by refcount. As I mentioned in the description, that requires further
> > vIOMMU emulation, so I didn't include it. But it should be covered
> > in final version. Good catch.
> >
> > >
> > > > +
> > > > +    vtd_pasid_as = g_malloc0(sizeof(*vtd_pasid_as));
> > > > +    vtd_pasid_as->iommu_state = s;
> > > > +    snprintf(name, sizeof(name), "intel_iommu_pasid_%d", pasid);
> > > > +    address_space_init(&vtd_pasid_as->as, NULL, "pasid");
> > >
> > > I saw that this is only inited and never used.  Could I ask when this
> > > will be used?
> >
> > AddressSpace is actually introduced for future support of emulated
> > SVA capable devices and possible 1st level paging shadowing(similar
> > to the 2nd level page table shadowing you upstreamed).
> 
> I am not sure whether that can be useful even if there will be such a
> device.  The reason is that if you see current with-IOMMU guests, they
> are actually "somehow" bypassing the address space framework by
> calling the IOMMU MR's translate() to do the page walking. If there
> will be an emulated device that (for example) supports PASID, and with
> the 1st page table enabled, I think it'll also work naturally with
> current translate() interface, just that in the VT-d code we'll find
> that we'll need to walk a process page table this time rather than the
> IOMMU device page table.
> 
> And no matter what, I would prefer you drop this address space until
> it'll be firstly used.
> 
> >
> > >
> > > > +    QLIST_INIT(&vtd_pasid_as->device_list);
> > > > +
> > > > +    node = g_malloc0(sizeof(*node));
> > > > +    node->pasid_as = vtd_pasid_as;
> > > > +    QLIST_INSERT_HEAD(&s->pasid_as_list, node, next);
> > > > +
> > > > +    return vtd_pasid_as;
> > > > +}
> > > > +
> > > > +static void vtd_bind_device_to_pasid_as(VTDPASIDAddressSpace
> *vtd_pasid_as,
> > > > +                                        PCIBus *bus, uint8_t devfn)
> > > > +{
> > > > +    VTDDeviceNode *node = NULL;
> > > > +
> > > > +    QLIST_FOREACH(node, &(vtd_pasid_as->device_list), next) {
> > > > +        if (node->bus == bus && node->devfn == devfn) {
> > > > +            return;
> > > > +        }
> > > > +    }
> > > > +
> > > > +    node = g_malloc0(sizeof(*node));
> > > > +    node->bus = bus;
> > > > +    node->devfn = devfn;
> > > > +    QLIST_INSERT_HEAD(&(vtd_pasid_as->device_list), node, next);
> > >
> > > So here I have the same confusion - IIUC according to the spec two
> > > devices can have differnet pasid tables, however they can be sharing
> > > the same PASID number (e.g., pasid=1) in the table.
> >
> > Do you mean the pasid table in iommu driver? I can not say it is
> impossible,
> > but you may notice that in current iommu driver, the devices behind a
> single
> > iommu unit shared pasid table.
> >
> > > Here since
> > > vtd_pasid_as is only per-IOMMU, could it possible that we put multiple
> > > devices under same PASID context while actually they are not sharing
> > > the same process page table?  Problematic?
> >
> > You are correct, two devices may under same PASID context. For the case
> > you described, I don't think it is allowed as it breaks the PASID concept.
> > Software should avoid it.
> 
> Yeh, so here my question would be the same as above: is it following
> the spec that all devices _must_ share a PASID table between devices,
> or it is just that we _can_ share it as a first version of Linux SVA
> implementation?
> 

The spec defines the PASID table as per-device. Software may decide
whether to share a PASID table between devices based on its needs
(e.g. with kernel drivers, sharing a PASID table can reduce footprint,
but with user-space drivers a per-device PASID table is necessary to
ensure isolation). VT-d emulation code shouldn't stick to one specific
software usage here...

Thanks
Kevin

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 07/12] vfio/pci: register sva notifier
  2018-03-09  7:05           ` Peter Xu
@ 2018-03-09 10:25             ` Liu, Yi L
  0 siblings, 0 replies; 65+ messages in thread
From: Liu, Yi L @ 2018-03-09 10:25 UTC (permalink / raw)
  To: Peter Xu
  Cc: Liu, Yi L, qemu-devel, mst, david, pbonzini, alex.williamson,
	eric.auger.pro, Tian, Kevin, jasowang

> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Friday, March 9, 2018 3:06 PM
> To: Liu, Yi L <yi.l.liu@intel.com>
[...]
> > > > > > +
> > > > > >  static void vfio_pci_device_sva_register_notifier(PCIBus *bus,
> > > > > >                            int32_t devfn, IOMMUSVAContext *sva_ctx)  {
> > > > > > -    /* Register notifier for TLB invalidation propagation
> > > > > > -       */
> > > > > > +    VFIOContainer *container = vfio_get_container_from_busdev(bus, devfn);
> > > > > > +
> > > > > > +    if (container != NULL) {
> > > > > > +        VFIOGuestIOMMUSVAContext *gsva_ctx;
> > > > > > +        gsva_ctx = g_malloc0(sizeof(*gsva_ctx));
> > > > > > +        gsva_ctx->sva_ctx = sva_ctx;
> > > > > > +        gsva_ctx->container = container;
> > > > > > +        QLIST_INSERT_HEAD(&container->gsva_ctx_list,
> > > > > > +                          gsva_ctx,
> > > > > > +                          gsva_ctx_next);
> > > > > > +       /* Register vfio_iommu_sva_tlb_invalidate_notify with event flag
> > > > > > +           IOMMU_SVA_EVENT_TLB_INV */
> > > > > > +        iommu_sva_notifier_register(sva_ctx,
> > > > > > +                                    &gsva_ctx->n,
> > > > > > +                                    vfio_iommu_sva_tlb_invalidate_notify,
> > > > > > +                                    IOMMU_SVA_EVENT_TLB_INV);
> > > > >
> > > > > I would squash this patch into previous one since basically this
> > > > > is only part of the implementation to provide vfio-speicific register hook.
> > > >
> > > > sure.
> > > >
> > > > > But a more important question is... why this?
> > > > >
> > > > > IMHO the notifier registration can be general for PCI.  Why vfio
> > > > > needs to provide it's own register callback?  Would it be enough
> > > > > if it only
> > > provides its own notify callback?
> > > >
> > > > The notifiers are in VFIO. However, the registration is controlled
> > > > by the vIOMMU emulator.
> > > > In this series, a PASID tagged Address Space is introduced. And the
> > > > new notifiers are for such Address Spaces. Such an Address Space is
> > > > created and deleted in the vIOMMU emulator.
> > > > So the notifier registration needs to happen accordingly.
> > > >
> > > > e.g. a guest SVM application binds a device to a process: it programs
> > > > the guest iommu translation structure, the vIOMMU emulator captures
> > > > the change, creates a PASID tagged Address Space for it and registers
> > > > notifiers.
> > > >
> > > > That's why I do it in such a manner.
> > >
> > > I agree that the things are mostly managed by vIOMMU, but I still
> > > cannot understand why vfio must have its own register hook.
> > >
> > > Let me try to explain a bit more on my question.  Basically I was
> > > asking about whether we can removet the register/unregister hook in
> > > the SVAOps, instead we can have something like (I'll start to use pasid as prefix):
> > >
> > > struct PCIPASIDOps {
> > >     void (*pasid_bind_table)(PCIBus *bus, int32_t devfn, ...);
> > >     void (*pasid_invalidate_extend_iotlb)(PCIBus *bus, int32_t devfn, ...);
> > > };
> > >
> > > Firstly we keep the bind_table operation, but instead of the
> > > reg/unreg we only provide a hook that device can override to listen to extend
> iotlb invalidations.
> >
> > Yeah, I also considered doing invalidation in this manner, but I
> > turned to the one in this patch. The reason is as below:
> >     the invalidate operation is supposed to be passed down through a
> >     vfio container IOCTL; for each pasid_invalidate_extend_iotlb()
> >     call, it needs to figure out a vfio container, which may be time
> >     consuming. Pls refer to vfio_get_container_from_busdev() in this
> >     patch. If we expose register/unregister hooks, the container
> >     search only happens in the register/unregister phase, and future
> >     invalidations are just notifier firing.
> > With the reason above, I chose the register/unregister hook solution.
> > If there is a solution that avoids the container search, it would be
> > better to do it in your proposal. Pls feel free to let me know if you
> > have any idea.
> 
> If there is PCIBus* and devfn passed into
> pasid_invalidate_extend_iotlb() (let's assume it's called this way), then IMHO we can
> get the PCIDevice*, which can be upcast to a VFIOPCIDevice, then would
> VFIOPCIDevice.vbasedev.group->container be the container for that device?

Good catch, will apply. Let me try to use it in the next version.
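Something like the sketch below, I suppose (the helper name is made up
here, and a real version would have to check that the PCIDevice really
is a vfio-pci device before the DO_UPCAST()):

/* Sketch, not tested: resolve the VFIO container directly from
 * (bus, devfn) at invalidation time, so no register hook is needed.
 * Assumes the device is known to be a vfio-pci device. */
static VFIOContainer *vfio_container_from_busdev(PCIBus *bus, int32_t devfn)
{
    PCIDevice *pdev = pci_find_device(bus, pci_bus_num(bus), devfn);
    VFIOPCIDevice *vdev;

    if (!pdev) {
        return NULL;
    }
    vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
    return vdev->vbasedev.group->container;
}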

Thanks,
Yi Liu

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 12/12] intel_iommu: bind device to PASID tagged AddressSpace
  2018-03-09  7:59       ` Peter Xu
  2018-03-09  8:09         ` Tian, Kevin
@ 2018-03-09 11:05         ` Liu, Yi L
  1 sibling, 0 replies; 65+ messages in thread
From: Liu, Yi L @ 2018-03-09 11:05 UTC (permalink / raw)
  To: Peter Xu
  Cc: Liu, Yi L, qemu-devel, mst, david, pbonzini, alex.williamson,
	eric.auger.pro, Tian, Kevin, jasowang

> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Friday, March 9, 2018 3:59 PM
> Subject: Re: [PATCH v3 12/12] intel_iommu: bind device to PASID tagged
> AddressSpace
> 
> On Thu, Mar 08, 2018 at 09:39:18AM +0000, Liu, Yi L wrote:
> > > From: Peter Xu [mailto:peterx@redhat.com]
> > > Sent: Tuesday, March 6, 2018 7:44 PM
> > > Subject: Re: [PATCH v3 12/12] intel_iommu: bind device to PASID
> > > tagged AddressSpace
> > >
> > > On Thu, Mar 01, 2018 at 06:33:35PM +0800, Liu, Yi L wrote:
> > > > This patch shows the idea of how a device is binded to a PASID
> > > > tagged AddressSpace.
> > > >
> > > > when Intel vIOMMU emulator detected a pasid table entry
> > > > programming from guest. Intel vIOMMU emulator firstly finds a
> > > > VTDPASIDAddressSpace with the pasid field of pasid cache invalidate request.
> > > >
> > > > * If it is to bind a device to a guest process, needs add the device
> > > >   to the device list behind the VTDPASIDAddressSpace. And if the device
> > > >   is assigned device, need to register sva_notfier for future tlb
> > > >   flushing if any mapping changed to the process address space.
> > > >
> > > > * If it is to unbind a device from a guest process, then need to remove
> > > >   the device from the device list behind the VTDPASIDAddressSpace.
> > > >   And also needs to unregister the sva_notfier if the device is assigned
> > > >   device.
> > > >
> > > > This patch hasn't added the unbind logic. It depends on guest
> > > > pasid table entry parsing which requires further emulation. Here
> > > > just want to show the idea for the PASID tagged AddressSpace management
> framework.
> > > > Full unregister logic would be included in future virt-SVA patchset.
> > > >
> > > > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> > > > ---
> > > >  hw/i386/intel_iommu.c          | 119
> > > +++++++++++++++++++++++++++++++++++++++++
> > > >  hw/i386/intel_iommu_internal.h |  10 ++++
> > > >  2 files changed, 129 insertions(+)
> > > >
> > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index
> > > > b8e8dbb..ed07035 100644
> > > > --- a/hw/i386/intel_iommu.c
> > > > +++ b/hw/i386/intel_iommu.c
> > > > @@ -1801,6 +1801,118 @@ static bool
> > > > vtd_process_iotlb_desc(IntelIOMMUState
> > > *s, VTDInvDesc *inv_desc)
> > > >      return true;
> > > >  }
> > > >
> > > > +static VTDPASIDAddressSpace *vtd_get_pasid_as(IntelIOMMUState *s,
> > > > +                                              uint32_t pasid) {
> > > > +    VTDPASIDAddressSpace *vtd_pasid_as = NULL;
> > > > +    IntelPASIDNode *node;
> > > > +    char name[128];
> > > > +
> > > > +    QLIST_FOREACH(node, &(s->pasid_as_list), next) {
> > > > +        vtd_pasid_as = node->pasid_as;
> > > > +        if (pasid == vtd_pasid_as->sva_ctx.pasid) {
> > > > +            return vtd_pasid_as;
> > > > +        }
> > > > +    }
> > >
> > > This seems to be a per-iommu pasid table.  However from the spec it
> > > looks more like that should be per-domain (I'm seeing figure 3-8).
> > > For example, each domain should be able to have its own pasid table.
> > > Then IIUC a pasid context will need a (domain, pasid) tuple to
> > > identify, not only the pasid itself?
> >
> > Yes, this is a per-iommu table here. Actually, how we assemble the
> > table here depends on the PASID namespace. You may refer to the iommu
> > driver code. intel-svm.c, it's actually per-iommu.
> >
> > 		/* Do not use PASID 0 in caching mode (virtualised IOMMU) */
> > 		ret = idr_alloc(&iommu->pasid_idr, svm,
> > 				!!cap_caching_mode(iommu->cap),
> > 				pasid_max - 1, GFP_KERNEL);
> 
> Thanks for the pointer.
> 
> However from the spec, I see that PASID table pointer is per-context, IMHO which
> means that the spec will allow the PASID table to be different from one device to
> another.  Even if current Linux is sharing a single PASID table now, I don't know
> whether that can be expanded in the future.  Also, what if we run a guest with
> another OS besides Linux?
> 
> After all we are emulating the device, so IIUC the only thing we should follow is the
> spec.

Agree, and this just echoes Kevin's reply. Let me re-consider the way
all the PASID tagged address spaces are maintained here.
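One direction would be to key the lookup on more than the bare pasid,
e.g. the (domain id, pasid) tuple you mentioned. A rough sketch with
invented names, nothing final:

/* Sketch: look PASID address spaces up by (domain id, pasid) instead
 * of by the bare pasid. */
typedef struct VTDPASIDKey {
    uint16_t domain_id;
    uint32_t pasid;
} VTDPASIDKey;

static guint vtd_pasid_key_hash(gconstpointer v)
{
    const VTDPASIDKey *key = v;

    return ((guint)key->domain_id << 20) ^ (guint)key->pasid;
}

static gboolean vtd_pasid_key_equal(gconstpointer v1, gconstpointer v2)
{
    const VTDPASIDKey *k1 = v1, *k2 = v2;

    return k1->domain_id == k2->domain_id && k1->pasid == k2->pasid;
}

/* e.g. at init time:
 *   s->pasid_as_table = g_hash_table_new_full(vtd_pasid_key_hash,
 *                                             vtd_pasid_key_equal,
 *                                             g_free, g_free);
 */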

> 
> >
> > >
> > > And, do we need to destroy the pasid context after it's freed by the
> > > guest?  Here it seems that we'll cache it forever.
> >
> > If we need to do it. A PASID can be bind to multiple devices. If there
> > is no device binding on it, then needs to destroy it. This may be done
> > by refcount. As I mentioned in the description, that requires further
> > vIOMMU emulation, so I didn't include it. But it should be covered in
> > final version. Good catch.
> >
> > >
> > > > +
> > > > +    vtd_pasid_as = g_malloc0(sizeof(*vtd_pasid_as));
> > > > +    vtd_pasid_as->iommu_state = s;
> > > > +    snprintf(name, sizeof(name), "intel_iommu_pasid_%d", pasid);
> > > > +    address_space_init(&vtd_pasid_as->as, NULL, "pasid");
> > >
> > > I saw that this is only inited and never used.  Could I ask when
> > > this will be used?
> >
> > AddressSpace is actually introduced for future support of emulated SVA
> > capable devices and possible 1st level paging shadowing(similar to the
> > 2nd level page table shadowing you upstreamed).
> 
> I am not sure whether that can be useful even if there will be such a device.  The
> reason is that if you see current with-IOMMU guests, they are actually "somehow"
> bypassing the address space framework by calling the IOMMU MR's translate() to do
> the page walking. If there will be an emulated device that (for example) supports
> PASID, and with the 1st page table enabled, I think it'll also work naturally with
> current translate() interface, just that in the VT-d code we'll find that we'll need to
> walk a process page table this time rather than the IOMMU device page table.
> 
> And no matter what, I would prefer you drop this address space until it'll be firstly
> used.

Yeah, I will. We may add a parameter like pasid to the existing MR
translate() interface to meet the requirement.
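Just to illustrate the thought (this is not the current QEMU interface,
the extra parameter is hypothetical):

/* Today's IOMMUMemoryRegionClass callback is roughly
 *   IOMMUTLBEntry (*translate)(IOMMUMemoryRegion *iommu, hwaddr addr,
 *                              IOMMUAccessFlags flag);
 * the idea would thread a pasid through it, with a sentinel value for
 * non-SVA requests.  Purely a sketch: */
#define PASID_NONE  ((uint32_t)-1)

typedef IOMMUTLBEntry (*sva_translate_fn)(IOMMUMemoryRegion *iommu,
                                          hwaddr addr,
                                          IOMMUAccessFlags flag,
                                          uint32_t pasid);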

> >
> > >
> > > > +    QLIST_INIT(&vtd_pasid_as->device_list);
> > > > +
> > > > +    node = g_malloc0(sizeof(*node));
> > > > +    node->pasid_as = vtd_pasid_as;
> > > > +    QLIST_INSERT_HEAD(&s->pasid_as_list, node, next);
> > > > +
> > > > +    return vtd_pasid_as;
> > > > +}
> > > > +
> > > > +static void vtd_bind_device_to_pasid_as(VTDPASIDAddressSpace
> *vtd_pasid_as,
> > > > +                                        PCIBus *bus, uint8_t
> > > > +devfn) {
> > > > +    VTDDeviceNode *node = NULL;
> > > > +
> > > > +    QLIST_FOREACH(node, &(vtd_pasid_as->device_list), next) {
> > > > +        if (node->bus == bus && node->devfn == devfn) {
> > > > +            return;
> > > > +        }
> > > > +    }
> > > > +
> > > > +    node = g_malloc0(sizeof(*node));
> > > > +    node->bus = bus;
> > > > +    node->devfn = devfn;
> > > > +    QLIST_INSERT_HEAD(&(vtd_pasid_as->device_list), node, next);
> > >
> > > So here I have the same confusion - IIUC according to the spec two
> > > devices can have differnet pasid tables, however they can be sharing
> > > the same PASID number (e.g., pasid=1) in the table.
> >
> > Do you mean the pasid table in iommu driver? I can not say it is
> > impossible, but you may notice that in current iommu driver, the
> > devices behind a single iommu unit shared pasid table.
> >
> > > Here since
> > > vtd_pasid_as is only per-IOMMU, could it possible that we put
> > > multiple devices under same PASID context while actually they are
> > > not sharing the same process page table?  Problematic?
> >
> > You are correct, two devices may under same PASID context. For the
> > case you described, I don't think it is allowed as it breaks the PASID concept.
> > Software should avoid it.
> 
> Yeh, so here my question would be the same as above: is it following the spec that
> all devices _must_ share a PASID table between devices, or it is just that we _can_
> share it as a first version of Linux SVA implementation?

Thanks,
Yi Liu

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 05/12] hw/pci: introduce PCISVAOps to PCIDevice
  2018-03-06 10:33   ` Liu, Yi L
@ 2018-04-12  2:36     ` David Gibson
  2018-04-12 11:06       ` Liu, Yi L
  0 siblings, 1 reply; 65+ messages in thread
From: David Gibson @ 2018-04-12  2:36 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: qemu-devel, mst, pbonzini, alex.williamson, eric.auger.pro,
	yi.l.liu, peterx, kevin.tian, jasowang

[-- Attachment #1: Type: text/plain, Size: 5493 bytes --]

On Tue, Mar 06, 2018 at 06:33:52PM +0800, Liu, Yi L wrote:
> On Mon, Mar 05, 2018 at 02:31:44PM +1100, David Gibson wrote:
> > On Thu, Mar 01, 2018 at 06:31:55PM +0800, Liu, Yi L wrote:
> > > This patch introduces PCISVAOps for virt-SVA.
> > >
> > > So far, to setup virt-SVA for assigned SVA capable device, needs to
> > > config host translation structures. e.g. for VT-d, needs to set the
> > > guest pasid table to host and enable nested translation. Besides,
> > > vIOMMU emulator needs to forward guest's cache invalidation to host.
> > > On VT-d, it is guest's invalidation to 1st level translation related
> > > cache, such invalidation should be forwarded to host.
> > >
> > > Proposed PCISVAOps are:
> > > * sva_bind_guest_pasid_table: set the guest pasid table to host, and
> > >                               enable nested translation in host
> > > * sva_register_notifier: register sva_notifier to forward guest's
> > >                          cache invalidation to host
> > > * sva_unregister_notifier: unregister sva_notifier
> > >
> > > The PCISVAOps should be provided by vfio or modules alike. Mainly for
> > > assigned SVA capable devices.
> > >
> > > Take virt-SVA on VT-d as an example:
> > > If a guest wants to setup virt-SVA for an assigned SVA capable device,
> > > it programs its context entry. vIOMMU emulator captures guest's context
> > > entry programming, and figure out the target device. vIOMMU emulator
> > > use the pci_device_sva_bind_pasid_table() API to bind the guest pasid
> > > table to host.
> > >
> > > Guest would also program its pasid table. vIOMMU emulator captures
> > > guest's pasid entry programming. In Qemu, needs to allocate an
> > > AddressSpace to stand for the pasid tagged address space and Qemu also
> > > needs to register sva_notifier to forward future cache invalidation
> > > request to host.
> > >
> > > Allocating AddressSpace to stand for the pasid tagged address space is
> > > for the emulation of emulated SVA capable devices. Emulated SVA capable
> > > devices may issue SVA aware DMAs, Qemu needs to emulate read/write to a
> > > pasid tagged AddressSpace. Thus needs an abstraction for such address
> > > space in Qemu.
> > >
> > > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> >
> > So PCISVAOps is roughly equivalent to the cluster-of-PASIDs context I
> > was suggesting in my earlier comments,
> 
> Yes, it is. The purpose is to expose pasid table bind and sva notifier
> registration/unregistration to vIOMMU emulators.
> 
> > however it's only an ops
> > structure.  That means you can't easily share a context between
> > multiple PCI devices which is unfortunate because:
> >     * The simplest use case for SVA I can see would just put the
> >       same set of PASIDs into place for every SVA capable device
> 
> Do you mean for emulated SVA capable device?

Not necessarily.  I'd expect that model could be useful for both
emulated and passthrough SVA capable devices.

> >     * Sometimes the IOMMU can't determine exactly what device a DMA
> >       came from.  Now the bridge cases where this applies are probably
> >       unlikely with SVA devices, but I wouldn't want to bet on it.  In
> >       addition, the chances some manufacturer will eventually put out
> >       a buggy multifunction SVA capable device that use the wrong RIDs
> >       for the secondary functions is pretty darn high.
> 
> I'm not sure I 100% got your point here. Do you mean a physical device?
> In a PCIe TLP, a DMA packet should have a RID field?

Yes, but that RID isn't accurate in all cases.

One case is if you have a PCIe device behind both a PCIe->PCI and
PCI->PCIe bridge.  Now obviously SVA won't work in that case, but it
would be good to at least detect it and refuse to attempt SVA.

Another case is with a buggy device that just sends the wrong RID.  In
particular there are some multifunction devices that use function 0's
RID for all functions.  Obviously that's a hardware bug and we can't
expect everything to work in this case.  But forcing all the functions
to share an SVAContext in this case - like we already force them to
share an IOMMU group - allows us to reason about what will and won't work.

> And it looks more like
> a hardware layer problem. This series only provides the necessary
> software support to make sure the guest's SVA operation is well prepared
> before the SVA device issues SVA aware DMA, e.g. linking the guest's
> pasid table to the host and configuring the iommu translation in nested
> mode.
> 
> >
> > So I think instead you want a cluster-of-PASIDs object which has an
> > ops table including both these and the per-PASID calls from the
> > earlier patches (but the per-PASID calls would now take an explicit
> > PASID value).
> 
> I didn't quite get "including both these and the per-PASID calls".
> What do you mean by "these"? Do you mean the PCISVAOps?

I mean that I think PCISVAOps should become a full object that includes
an ops table, rather than just an ops table.  That table would include
the things currently in PCISVAOps.  It would also include callbacks for
the things that are in your per-PASID object in this draft, but those
callbacks would now need to take an explicit PASID parameter.
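Roughly the shape I have in mind (all names below are illustrative
only, not a concrete proposal):

/* A context object owns the ops table; the per-PASID callbacks take an
 * explicit pasid argument, so one context can cover a cluster of PASIDs
 * and be shared by several devices. */
typedef struct PCIPASIDContext PCIPASIDContext;

typedef struct PCIPASIDContextOps {
    void (*bind_pasid_table)(PCIPASIDContext *ctx,
                             uint64_t pasidt_addr, uint32_t size);
    void (*flush_pasid_iotlb)(PCIPASIDContext *ctx, uint32_t pasid,
                              hwaddr addr, hwaddr len);
} PCIPASIDContextOps;

struct PCIPASIDContext {
    const PCIPASIDContextOps *ops;
    void *backend_opaque;   /* e.g. the VFIO container */
};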

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v3 05/12] hw/pci: introduce PCISVAOps to PCIDevice
  2018-04-12  2:36     ` David Gibson
@ 2018-04-12 11:06       ` Liu, Yi L
  0 siblings, 0 replies; 65+ messages in thread
From: Liu, Yi L @ 2018-04-12 11:06 UTC (permalink / raw)
  To: David Gibson, Liu, Yi L
  Cc: qemu-devel, mst, pbonzini, alex.williamson, eric.auger.pro,
	peterx, Tian, Kevin, jasowang

Hi David,

> From: David Gibson [mailto:david@gibson.dropbear.id.au]
> Sent: Thursday, April 12, 2018 10:36 AM
> On Tue, Mar 06, 2018 at 06:33:52PM +0800, Liu, Yi L wrote:
> > On Mon, Mar 05, 2018 at 02:31:44PM +1100, David Gibson wrote:
> > > On Thu, Mar 01, 2018 at 06:31:55PM +0800, Liu, Yi L wrote:
> > > > This patch intoduces PCISVAOps for virt-SVA.
> > > >
> > > > So far, to setup virt-SVA for assigned SVA capable device, needs to
> > > > config host translation structures. e.g. for VT-d, needs to set the
> > > > guest pasid table to host and enable nested translation. Besides,
> > > > vIOMMU emulator needs to forward guest's cache invalidation to host.
> > > > On VT-d, it is guest's invalidation to 1st level translation related
> > > > cache, such invalidation should be forwarded to host.
> > > >
> > > > Proposed PCISVAOps are:
> > > > * sva_bind_guest_pasid_table: set the guest pasid table to host, and
> > > >                               enable nested translation in host
> > > > * sva_register_notifier: register sva_notifier to forward guest's
> > > >                          cache invalidation to host
> > > > * sva_unregister_notifier: unregister sva_notifier
> > > >
> > > > The PCISVAOps should be provided by vfio or modules alike. Mainly for
> > > > assigned SVA capable devices.
> > > >
> > > > Take virt-SVA on VT-d as an exmaple:
> > > > If a guest wants to setup virt-SVA for an assigned SVA capable device,
> > > > it programs its context entry. vIOMMU emulator captures guest's context
> > > > entry programming, and figure out the target device. vIOMMU emulator
> > > > use the pci_device_sva_bind_pasid_table() API to bind the guest pasid
> > > > table to host.
> > > >
> > > > Guest would also program its pasid table. vIOMMU emulator captures
> > > > guest's pasid entry programming. In Qemu, needs to allocate an
> > > > AddressSpace to stand for the pasid tagged address space and Qemu also
> > > > needs to register sva_notifier to forward future cache invalidation
> > > > request to host.
> > > >
> > > > Allocating AddressSpace to stand for the pasid tagged address space is
> > > > for the emulation of emulated SVA capable devices. Emulated SVA capable
> > > > devices may issue SVA aware DMAs, Qemu needs to emulate read/write to a
> > > > pasid tagged AddressSpace. Thus needs an abstraction for such address
> > > > space in Qemu.
> > > >
> > > > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> > >
> > > So PCISVAOps is roughly equivalent to the cluster-of-PASIDs context I
> > > was suggesting in my earlier comments,
> >
> > yes, it is. The purpose is to expose pasid table bind and sva notfier
> > registration/unregistration to vIOMMU emulators.
> >
> > > however it's only an ops
> > > structure.  That means you can't easily share a context between
> > > multiple PCI devices which is unfortunate because:
> > >     * The simplest use case for SVA I can see would just put the
> > >       same set of PASIDs into place for every SVA capable device
> >
> > Do you mean for emulated SVA capable device?
> 
> Not necessarily.  I'd expect that model could be useful for both
> emulated and passthrough SVA capable devices.
> 
> > >     * Sometimes the IOMMU can't determine exactly what device a DMA
> > >       came from.  Now the bridge cases where this applies are probably
> > >       unlikely with SVA devices, but I wouldn't want to bet on it.  In
> > >       addition, the chances some manufacturer will eventually put out
> > >       a buggy multifunction SVA capable device that use the wrong RIDs
> > >       for the secondary functions is pretty darn high.
> >
> > I'm not sure I 100% got your point here. Do yu mean physical device?
> > In PCIE TLP, DMA packet should have a RID field?
> 
> Yes, but that RID isn't accurate in all cases.
> 
> One case is if you have a PCIe device behind both a PCIe->PCI and
> PCI->PCIe bridge.  Now obviously SVA won't work in that case, but it
> would be good to at least detect it and refuse to attempt SVA.
> 
> Another case is with a buggy device that just sends the wrong RID.  In
> particular there are some multifunction devices that use function 0's
> RID for all functions.  Obviously that's a hardware bug and we can't
> expect everything to work in this case.  But forcing all the functions
> to share an SVAContext in this case - like we alreayd force them to
> share an IOMMU group - allows us to reason about what will and won't work

Agree.

> 
> > And it looks more like
> > a hardware layer trouble. For this series, it only provides necessary
> > software support to make sure guest's SVA operation is well prepared
> > before the SVA device issues the SVA aware DMA. e.g. link guest's pasid
> > table to host, and config iommu translation in nested mode.

yes, it is.

> >
> > >
> > > So I think instead you want a cluster-of-PASIDs object which has an
> > > ops table including both these and the per-PASID calls from the
> > > earlier patches (but the per-PASID calls would now take an explicit
> > > PASID value).
> >
> > I didn't quite get "including both these and the per-PASID calls".
> > What do you mean by "these"? Do you mean the PCISVAOps?
> 
> I mean that I think PCISVAOps should become a full object including an
> ops table, not just an ops table.  That table would include the things
> currently in PCISVAOps.  It would also include callbacks for the
> things that are in your per-PASID object in this draft, but those
> callbacks would now need to take an explicit PASIC parameter.

Based on some comments from Peter Xu and your comments on
"[PATCH v3 03/12] hw/core: introduce IOMMUSVAContext for virt-SVA",
I've considered a new approach which might be able to reuse the existing
MemoryRegion based translation logic. I'm preparing some code to
show it. Before that, I'd like to hear your opinion.

As we discussed, for assigned devices we want to prepare the
configuration before the SVA device issues SVA aware DMA. And this can
be achieved by the PCISVAOps proposed below:

struct PCISVAOps {
    void (*pasid_bind_table)(PCIBus *bus, int32_t devfn,
                    uint64_t pasidt_addr, uint32_t size);
    void (*pasid_invalidate_extend_iotlb)(PCIBus *bus,
                              int32_t devfn, void *data);
};
This is no longer notifier based; for further extension, more callbacks
could be added to this ops structure. Previously I thought a notifier
was better, but Peter corrected me in another thread.
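For reference, the vIOMMU side could reach these ops through a thin
wrapper like the sketch below (the wrapper name and the pasid_ops field
on PCIDevice are tentative, mirroring pci_device_sva_bind_pasid_table()
in the current series):

/* Sketch only: dispatch from the vIOMMU emulator to the per-device ops. */
void pci_device_pasid_bind_table(PCIBus *bus, int32_t devfn,
                                 uint64_t pasidt_addr, uint32_t size)
{
    PCIDevice *dev;

    if (!bus || devfn < 0 || devfn >= PCI_DEVFN_MAX) {
        return;
    }
    dev = bus->devices[devfn];
    if (dev && dev->pasid_ops && dev->pasid_ops->pasid_bind_table) {
        dev->pasid_ops->pasid_bind_table(bus, devfn, pasidt_addr, size);
    }
}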

For emulated devices, on the other hand, we need to support address
translation. So I introduced IOMMUSVAContext and a translate callback in
this series. But it duplicates much of the MemoryRegion based
translation logic. So I reconsidered whether it is OK to reuse
MemoryRegion, and it seems possible. Below is my thought; I took VT-d as
the example, and you may check whether it works based on your
understanding.

1) add "pasid" and "pasid_allocated" field in the structure below. For each
PASID tagged address space, Qemu creates a VTDAddressSpace instance,
and set the pasid field. For PCI DMA address space, it won't init pasid and
pasid_allocated field.

struct VTDAddressSpace {
     PCIBus *bus;
     uint8_t devfn;
+    bool pasid_allocated;
+    uint32_t pasid;
     AddressSpace as;
     IOMMUMemoryRegion iommu;
     MemoryRegion root;
     ...
}

When an emulated SVA capable device issues SVA aware DMA, its device
model should be able to get a PASID and then the correct AddressSpace.
The DMA emulation logic would finally call into the imrc->translate()
callback provided by the IOMMU emulator. In the callback, it can get the
VTDAddressSpace and check whether a PASID is allocated. If yes, do the
translation with the corresponding 1st-level page table; if no, walk the
I/O page table (2nd-level page table). See the sketch below.
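In code, the check in the callback could look roughly like this (the two
walk helpers are placeholders for logic that doesn't exist yet):

/* Sketch of the dispatch described above; vtd_do_first_level_walk() and
 * vtd_do_second_level_walk() are placeholders, not existing functions. */
static IOMMUTLBEntry vtd_iommu_translate_sketch(IOMMUMemoryRegion *iommu,
                                                hwaddr addr,
                                                IOMMUAccessFlags flag)
{
    VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);

    if (vtd_as->pasid_allocated) {
        /* PASID tagged AS: walk the guest 1st-level (process) page
         * table selected by vtd_as->pasid. */
        return vtd_do_first_level_walk(vtd_as, addr, flag);
    }

    /* Ordinary PCI DMA: walk the 2nd-level I/O page table as today. */
    return vtd_do_second_level_walk(vtd_as, addr, flag);
}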

Another benefit of reusing MemoryRegion: if someone wants to implement
SVA with a shadowing solution, they could use the MAP/UNMAP API to
shadow the SVA mappings to the host iommu.

Thanks,
Yi Liu

^ permalink raw reply	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH v3 12/12] intel_iommu: bind device to PASID tagged AddressSpace
  2018-03-01 10:31 Liu, Yi L
@ 2018-03-01 10:32 ` Liu, Yi L
  0 siblings, 0 replies; 65+ messages in thread
From: Liu, Yi L @ 2018-03-01 10:32 UTC (permalink / raw)
  To: qemu-devel, mst, david
  Cc: pbonzini, alex.williamson, eric.auger.pro, Liu, Yi L

This patch shows the idea of how a device is bound to a PASID tagged
AddressSpace.

When the Intel vIOMMU emulator detects a pasid table entry programming
from the guest, it first finds a VTDPASIDAddressSpace with the pasid
field of the pasid cache invalidate request.

* If it is to bind a device to a guest process, the device needs to be
  added to the device list behind the VTDPASIDAddressSpace. And if the
  device is an assigned device, an sva_notifier needs to be registered
  for future tlb flushing if any mapping of the process address space
  changes.

* If it is to unbind a device from a guest process, the device needs to
  be removed from the device list behind the VTDPASIDAddressSpace.
  The sva_notifier also needs to be unregistered if the device is an
  assigned device.

This patch hasn't added the unbind logic. It depends on guest pasid
table entry parsing which requires further emulation. Here we just want
to show the idea of the PASID tagged AddressSpace management framework.
Full unregister logic would be included in a future virt-SVA patchset.

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 hw/i386/intel_iommu.c          | 119 +++++++++++++++++++++++++++++++++++++++++
 hw/i386/intel_iommu_internal.h |  10 ++++
 2 files changed, 129 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index b8e8dbb..ed07035 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1801,6 +1801,118 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
     return true;
 }
 
+static VTDPASIDAddressSpace *vtd_get_pasid_as(IntelIOMMUState *s,
+                                              uint32_t pasid)
+{
+    VTDPASIDAddressSpace *vtd_pasid_as = NULL;
+    IntelPASIDNode *node;
+    char name[128];
+
+    QLIST_FOREACH(node, &(s->pasid_as_list), next) {
+        vtd_pasid_as = node->pasid_as;
+        if (pasid == vtd_pasid_as->sva_ctx.pasid) {
+            return vtd_pasid_as;
+        }
+    }
+
+    vtd_pasid_as = g_malloc0(sizeof(*vtd_pasid_as));
+    vtd_pasid_as->iommu_state = s;
+    snprintf(name, sizeof(name), "intel_iommu_pasid_%d", pasid);
+    address_space_init(&vtd_pasid_as->as, NULL, name);
+    QLIST_INIT(&vtd_pasid_as->device_list);
+
+    node = g_malloc0(sizeof(*node));
+    node->pasid_as = vtd_pasid_as;
+    QLIST_INSERT_HEAD(&s->pasid_as_list, node, next);
+
+    return vtd_pasid_as;
+}
+
+static void vtd_bind_device_to_pasid_as(VTDPASIDAddressSpace *vtd_pasid_as,
+                                        PCIBus *bus, uint8_t devfn)
+{
+    VTDDeviceNode *node = NULL;
+
+    QLIST_FOREACH(node, &(vtd_pasid_as->device_list), next) {
+        if (node->bus == bus && node->devfn == devfn) {
+            return;
+        }
+    }
+
+    node = g_malloc0(sizeof(*node));
+    node->bus = bus;
+    node->devfn = devfn;
+    QLIST_INSERT_HEAD(&(vtd_pasid_as->device_list), node, next);
+
+    pci_device_sva_register_notifier(bus, devfn, &vtd_pasid_as->sva_ctx);
+
+    return;
+}
+
+static bool vtd_process_pc_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
+{
+
+    IntelIOMMUAssignedDeviceNode *node = NULL;
+    int ret = 0;
+
+    uint16_t domain_id;
+    uint32_t pasid;
+    VTDPASIDAddressSpace *vtd_pasid_as;
+
+    if ((inv_desc->lo & VTD_INV_DESC_PASIDC_RSVD_LO) ||
+        (inv_desc->hi & VTD_INV_DESC_PASIDC_RSVD_HI)) {
+        return false;
+    }
+
+    domain_id = VTD_INV_DESC_PASIDC_DID(inv_desc->lo);
+
+    switch (inv_desc->lo & VTD_INV_DESC_PASIDC_G) {
+    case VTD_INV_DESC_PASIDC_ALL_ALL:
+        /* TODO: invalidate all pasid related cache */
+        break;
+
+    case VTD_INV_DESC_PASIDC_PASID_SI:
+        pasid = VTD_INV_DESC_PASIDC_PASID(inv_desc->lo);
+        vtd_pasid_as = vtd_get_pasid_as(s, pasid);
+        QLIST_FOREACH(node, &(s->assigned_device_list), next) {
+            VTDAddressSpace *vtd_as = node->vtd_as;
+            VTDContextEntry ce;
+            uint16_t did;
+            uint8_t bus = pci_bus_num(vtd_as->bus);
+            ret = vtd_dev_to_context_entry(s, bus,
+                                   vtd_as->devfn, &ce);
+            if (ret != 0) {
+                continue;
+            }
+
+            did = VTD_CONTEXT_ENTRY_DID(ce.hi);
+            /*
+             * If the did field equals the domain_id field of the
+             * invalidation descriptor, the device is affected by this
+             * invalidate request and needs to be bound to or unbound
+             * from the pasid tagged address space.
+             * a) If it is a bind, add the device to the device list
+             *    and register a tlb flush notifier for it.
+             * b) If it is an unbind, remove the device from the device
+             *    list and unregister the tlb flush notifier.
+             * TODO: add the unbind logic accordingly; it depends on
+             *       guest pasid table entry parsing, which is not done
+             *       here yet.
+             */
+            if (did == domain_id) {
+                vtd_bind_device_to_pasid_as(vtd_pasid_as,
+                                  vtd_as->bus, vtd_as->devfn);
+            }
+        }
+        break;
+
+    default:
+        return false;
+    }
+
+    return true;
+}
+
 static bool vtd_process_inv_iec_desc(IntelIOMMUState *s,
                                      VTDInvDesc *inv_desc)
 {
@@ -1911,6 +2023,13 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
         }
         break;
 
+    case VTD_INV_DESC_PC:
+        trace_vtd_inv_desc("pc", inv_desc.hi, inv_desc.lo);
+        if (!vtd_process_pc_desc(s, &inv_desc)) {
+            return false;
+        }
+        break;
+
     case VTD_INV_DESC_IEC:
         trace_vtd_inv_desc("iec", inv_desc.hi, inv_desc.lo);
         if (!vtd_process_inv_iec_desc(s, &inv_desc)) {
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index d084099..31d0d53 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -332,6 +332,7 @@ typedef union VTDInvDesc VTDInvDesc;
 #define VTD_INV_DESC_IEC                0x4 /* Interrupt Entry Cache
                                                Invalidate Descriptor */
 #define VTD_INV_DESC_WAIT               0x5 /* Invalidation Wait Descriptor */
+#define VTD_INV_DESC_PC                 0x7 /* PASID-cache Invalidate Desc */
 #define VTD_INV_DESC_NONE               0   /* Not an Invalidate Descriptor */
 
 /* Masks for Invalidation Wait Descriptor*/
@@ -388,6 +389,15 @@ typedef union VTDInvDesc VTDInvDesc;
 #define VTD_SPTE_LPAGE_L4_RSVD_MASK(aw) \
         (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM))
 
+#define VTD_INV_DESC_PASIDC_G          (3ULL << 4)
+#define VTD_INV_DESC_PASIDC_PASID(val) (((val) >> 32) & 0xfffffULL)
+#define VTD_INV_DESC_PASIDC_DID(val)   (((val) >> 16) & VTD_DOMAIN_ID_MASK)
+#define VTD_INV_DESC_PASIDC_RSVD_LO    0xfff000000000ffc0ULL
+#define VTD_INV_DESC_PASIDC_RSVD_HI    0xffffffffffffffffULL
+
+#define VTD_INV_DESC_PASIDC_ALL_ALL    (0ULL << 4)
+#define VTD_INV_DESC_PASIDC_PASID_SI   (1ULL << 4)
+
 /* Information about page-selective IOTLB invalidate */
 struct VTDIOTLBPageInvInfo {
     uint16_t domain_id;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 65+ messages in thread

end of thread, other threads:[~2018-04-12 11:06 UTC | newest]

Thread overview: 65+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-01 10:33 [Qemu-devel] [PATCH v3 00/12] Introduce new iommu notifier framework for virt-SVA Liu, Yi L
2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 01/12] memory: rename existing iommu notifier to be iommu mr notifier Liu, Yi L
2018-03-02 15:01   ` Paolo Bonzini
2018-03-05 10:09     ` Liu, Yi L
2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 02/12] vfio: rename GuestIOMMU to be GuestIOMMUMR Liu, Yi L
2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 03/12] hw/core: introduce IOMMUSVAContext for virt-SVA Liu, Yi L
2018-03-02 15:13   ` Paolo Bonzini
2018-03-05  8:10     ` Liu, Yi L
2018-03-06  8:51   ` Liu, Yi L
2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 04/12] vfio/pci: add notify framework based on IOMMUSVAContext Liu, Yi L
2018-03-05  7:45   ` Peter Xu
2018-03-05  8:05     ` Liu, Yi L
2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 05/12] hw/pci: introduce PCISVAOps to PCIDevice Liu, Yi L
2018-03-02 15:10   ` Paolo Bonzini
2018-03-05  8:11     ` Liu, Yi L
2018-03-06 10:33   ` Liu, Yi L
2018-04-12  2:36     ` David Gibson
2018-04-12 11:06       ` Liu, Yi L
2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 06/12] vfio/pci: provide vfio_pci_sva_ops instance Liu, Yi L
2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 07/12] vfio/pci: register sva notifier Liu, Yi L
2018-03-06  6:44   ` Peter Xu
2018-03-06  8:00     ` Liu, Yi L
2018-03-06 12:09       ` Peter Xu
2018-03-08 11:22         ` Liu, Yi L
2018-03-09  7:05           ` Peter Xu
2018-03-09 10:25             ` Liu, Yi L
2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 08/12] hw/pci: introduce pci_device_notify_iommu() Liu, Yi L
2018-03-02 15:12   ` Paolo Bonzini
2018-03-05  8:42     ` Liu, Yi L
2018-03-06 10:18       ` Paolo Bonzini
2018-03-06 11:03         ` Liu, Yi L
2018-03-06 11:22           ` Paolo Bonzini
2018-03-06 11:27             ` Liu, Yi L
2018-03-02 16:06   ` Paolo Bonzini
2018-03-05  8:43     ` Liu, Yi L
2018-03-05 10:43       ` Peter Xu
2018-03-06 10:19         ` Paolo Bonzini
2018-03-06 10:47           ` Peter Xu
2018-03-06 11:06             ` Liu, Yi L
2018-03-05  8:27   ` Peter Xu
2018-03-05  8:46     ` Liu, Yi L
2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 09/12] intel_iommu: record assigned devices in a list Liu, Yi L
2018-03-02 15:08   ` Paolo Bonzini
2018-03-05  9:39     ` Liu, Yi L
2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 10/12] intel_iommu: bind guest pasid table to host Liu, Yi L
2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 11/12] intel_iommu: add framework for PASID AddressSpace management Liu, Yi L
2018-03-02 14:52   ` Paolo Bonzini
2018-03-05  9:12     ` Liu, Yi L
2018-03-02 15:00   ` Paolo Bonzini
2018-03-05  9:11     ` Liu, Yi L
2018-03-06 10:26       ` Paolo Bonzini
2018-03-08 10:42         ` Liu, Yi L
2018-03-01 10:33 ` [Qemu-devel] [PATCH v3 12/12] intel_iommu: bind device to PASID tagged AddressSpace Liu, Yi L
2018-03-02 14:51   ` Paolo Bonzini
2018-03-05  9:56     ` Liu, Yi L
2018-03-06 11:43   ` Peter Xu
2018-03-08  9:39     ` Liu, Yi L
2018-03-09  7:59       ` Peter Xu
2018-03-09  8:09         ` Tian, Kevin
2018-03-09 11:05         ` Liu, Yi L
2018-03-06  6:55 ` [Qemu-devel] [PATCH v3 00/12] Introduce new iommu notifier framework for virt-SVA Peter Xu
2018-03-06  7:45   ` Liu, Yi L
2018-03-07  5:38     ` Peter Xu
2018-03-08  9:10       ` Liu, Yi L
  -- strict thread matches above, loose matches on Subject: below --
2018-03-01 10:31 Liu, Yi L
2018-03-01 10:32 ` [Qemu-devel] [PATCH v3 12/12] intel_iommu: bind device to PASID tagged AddressSpace Liu, Yi L
