All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v3 0/3] Introduce IOMMUNotifier struct
@ 2016-09-07  5:32 Peter Xu
  2016-09-07  5:32 ` [Qemu-devel] [PATCH v3 1/3] memory: introduce IOMMUNotifier and its caps Peter Xu
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Peter Xu @ 2016-09-07  5:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: dgibson, jasowang, mst, pbonzini, cornelia.huck, wexu, vkaplans,
	alex.williamson, peterx

V3:
- use QLIST instead of embedding Notifier into IOMMUNotifier [Paolo]
- fix a build error for ppc64-softmmu

The idea originates from one of Alex's reply:

  https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg00254.html

But after further discussions, it seems that only adding a simple type
for notifier is not enough. This series introduced IOMMUNotifier
struct to replace the old Notifier interface. Along with it, we can
provide registration for one (or multiple) of the IOMMU notifications:

- cache invalidations
- entry changes

This is a support material for Jason's vhost dmar patchset.

Please read commit messages for detailed information. Thanks,

Peter Xu (3):
  memory: introduce IOMMUNotifier and its caps
  memory: generalize iommu_ops.notify_started to notifier_add
  intel_iommu: allow invalidation typed notifiers

 hw/i386/intel_iommu.c         | 16 +++++++++------
 hw/ppc/spapr_iommu.c          | 18 +++++++++++------
 hw/vfio/common.c              |  3 ++-
 include/exec/memory.h         | 47 +++++++++++++++++++++++++++++++++----------
 include/hw/ppc/spapr.h        |  1 +
 include/hw/vfio/vfio-common.h |  2 +-
 memory.c                      | 43 +++++++++++++++++++++++++++------------
 7 files changed, 92 insertions(+), 38 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH v3 1/3] memory: introduce IOMMUNotifier and its caps
  2016-09-07  5:32 [Qemu-devel] [PATCH v3 0/3] Introduce IOMMUNotifier struct Peter Xu
@ 2016-09-07  5:32 ` Peter Xu
  2016-09-07  6:02   ` David Gibson
  2016-09-07  5:32 ` [Qemu-devel] [PATCH v3 2/3] memory: generalize iommu_ops.notify_started to notifier_add Peter Xu
  2016-09-07  5:32 ` [Qemu-devel] [PATCH v3 3/3] intel_iommu: allow invalidation typed notifiers Peter Xu
  2 siblings, 1 reply; 15+ messages in thread
From: Peter Xu @ 2016-09-07  5:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: dgibson, jasowang, mst, pbonzini, cornelia.huck, wexu, vkaplans,
	alex.williamson, peterx

IOMMU Notifier list is used for notifying IO address mapping changes.
Currently VFIO is the only user.

However it is possible that future consumer like vhost would like to
only listen to part of its notifications (e.g., cache invalidations).

This patch introduced IOMMUNotifier and IOMMUNotfierCap bits for a finer
grained control of it.

IOMMUNotifier contains a bitfield for the notify consumer describing
what kind of notification it is interested in. Currently two kinds of
notifications are defined:

- IOMMU_NOTIFIER_CHANGE:       for entry changes (additions)
- IOMMU_NOTIFIER_INVALIDATION: for entry removals (cache invalidates)

When registering the IOMMU notifier, we need to specify one or multiple
capability bit(s) to listen to.

When notifications are triggered, it will be checked against the
notifier's capability bits, and only notifiers with registered bits will
be notified.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/vfio/common.c              |  3 ++-
 include/exec/memory.h         | 39 ++++++++++++++++++++++++++++++++-------
 include/hw/vfio/vfio-common.h |  2 +-
 memory.c                      | 37 ++++++++++++++++++++++++++++---------
 4 files changed, 63 insertions(+), 18 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index b313e7c..b0cea2c 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -293,7 +293,7 @@ static bool vfio_listener_skipped_section(MemoryRegionSection *section)
            section->offset_within_address_space & (1ULL << 63);
 }
 
-static void vfio_iommu_map_notify(Notifier *n, void *data)
+static void vfio_iommu_map_notify(IOMMUNotifier *n, void *data)
 {
     VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
     VFIOContainer *container = giommu->container;
@@ -454,6 +454,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
                                section->offset_within_region;
         giommu->container = container;
         giommu->n.notify = vfio_iommu_map_notify;
+        giommu->n.notifier_caps = IOMMU_NOTIFIER_ALL;
         QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
 
         memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 3e4d416..92f14db 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -67,6 +67,28 @@ struct IOMMUTLBEntry {
     IOMMUAccessFlags perm;
 };
 
+/*
+ * Bitmap for differnet IOMMUNotifier capabilities. Each notifier can
+ * register with one or multiple IOMMU Notifier capability bit(s).
+ */
+typedef enum {
+    IOMMU_NOTIFIER_NONE = 0,
+    /* Notify cache invalidations */
+    IOMMU_NOTIFIER_INVALIDATION = 0x1,
+    /* Notify entry changes (newly created entries) */
+    IOMMU_NOTIFIER_CHANGE = 0x2,
+} IOMMUNotifierCap;
+
+#define IOMMU_NOTIFIER_ALL (IOMMU_NOTIFIER_INVALIDATION | \
+                            IOMMU_NOTIFIER_CHANGE)
+
+struct IOMMUNotifier {
+    void (*notify)(struct IOMMUNotifier *notifier, void *data);
+    IOMMUNotifierCap notifier_caps;
+    QLIST_ENTRY(IOMMUNotifier) node;
+};
+typedef struct IOMMUNotifier IOMMUNotifier;
+
 /* New-style MMIO accessors can indicate that the transaction failed.
  * A zero (MEMTX_OK) response means success; anything else is a failure
  * of some kind. The memory subsystem will bitwise-OR together results
@@ -201,7 +223,7 @@ struct MemoryRegion {
     const char *name;
     unsigned ioeventfd_nb;
     MemoryRegionIoeventfd *ioeventfds;
-    NotifierList iommu_notify;
+    QLIST_HEAD(, IOMMUNotifier) iommu_notify;
 };
 
 /**
@@ -620,11 +642,12 @@ void memory_region_notify_iommu(MemoryRegion *mr,
  * IOMMU translation entries.
  *
  * @mr: the memory region to observe
- * @n: the notifier to be added; the notifier receives a pointer to an
- *     #IOMMUTLBEntry as the opaque value; the pointer ceases to be
- *     valid on exit from the notifier.
+ * @n: the IOMMUNotifier to be added; the notify callback receives a
+ *     pointer to an #IOMMUTLBEntry as the opaque value; the pointer
+ *     ceases to be valid on exit from the notifier.
  */
-void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n);
+void memory_region_register_iommu_notifier(MemoryRegion *mr,
+                                           IOMMUNotifier *n);
 
 /**
  * memory_region_iommu_replay: replay existing IOMMU translations to
@@ -636,7 +659,8 @@ void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n);
  * @is_write: Whether to treat the replay as a translate "write"
  *     through the iommu
  */
-void memory_region_iommu_replay(MemoryRegion *mr, Notifier *n, bool is_write);
+void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
+                                bool is_write);
 
 /**
  * memory_region_unregister_iommu_notifier: unregister a notifier for
@@ -646,7 +670,8 @@ void memory_region_iommu_replay(MemoryRegion *mr, Notifier *n, bool is_write);
  *      needs to be called
  * @n: the notifier to be removed.
  */
-void memory_region_unregister_iommu_notifier(MemoryRegion *mr, Notifier *n);
+void memory_region_unregister_iommu_notifier(MemoryRegion *mr,
+                                             IOMMUNotifier *n);
 
 /**
  * memory_region_name: get a memory region's name
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 94dfae3..c17602e 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -93,7 +93,7 @@ typedef struct VFIOGuestIOMMU {
     VFIOContainer *container;
     MemoryRegion *iommu;
     hwaddr iommu_offset;
-    Notifier n;
+    IOMMUNotifier n;
     QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
 } VFIOGuestIOMMU;
 
diff --git a/memory.c b/memory.c
index 0eb6895..45a3902 100644
--- a/memory.c
+++ b/memory.c
@@ -1418,7 +1418,7 @@ void memory_region_init_iommu(MemoryRegion *mr,
     memory_region_init(mr, owner, name, size);
     mr->iommu_ops = ops,
     mr->terminates = true;  /* then re-forwards */
-    notifier_list_init(&mr->iommu_notify);
+    QLIST_INIT(&mr->iommu_notify);
 }
 
 static void memory_region_finalize(Object *obj)
@@ -1513,13 +1513,16 @@ bool memory_region_is_logging(MemoryRegion *mr, uint8_t client)
     return memory_region_get_dirty_log_mask(mr) & (1 << client);
 }
 
-void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n)
+void memory_region_register_iommu_notifier(MemoryRegion *mr,
+                                           IOMMUNotifier *n)
 {
+    /* We need to register for at least one bitfield */
+    assert(n->notifier_caps != IOMMU_NOTIFIER_NONE);
     if (mr->iommu_ops->notify_started &&
-        QLIST_EMPTY(&mr->iommu_notify.notifiers)) {
+        QLIST_EMPTY(&mr->iommu_notify)) {
         mr->iommu_ops->notify_started(mr);
     }
-    notifier_list_add(&mr->iommu_notify, n);
+    QLIST_INSERT_HEAD(&mr->iommu_notify, n, node);
 }
 
 uint64_t memory_region_iommu_get_min_page_size(MemoryRegion *mr)
@@ -1531,7 +1534,8 @@ uint64_t memory_region_iommu_get_min_page_size(MemoryRegion *mr)
     return TARGET_PAGE_SIZE;
 }
 
-void memory_region_iommu_replay(MemoryRegion *mr, Notifier *n, bool is_write)
+void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
+                                bool is_write)
 {
     hwaddr addr, granularity;
     IOMMUTLBEntry iotlb;
@@ -1552,11 +1556,12 @@ void memory_region_iommu_replay(MemoryRegion *mr, Notifier *n, bool is_write)
     }
 }
 
-void memory_region_unregister_iommu_notifier(MemoryRegion *mr, Notifier *n)
+void memory_region_unregister_iommu_notifier(MemoryRegion *mr,
+                                             IOMMUNotifier *n)
 {
-    notifier_remove(n);
+    QLIST_REMOVE(n, node);
     if (mr->iommu_ops->notify_stopped &&
-        QLIST_EMPTY(&mr->iommu_notify.notifiers)) {
+        QLIST_EMPTY(&mr->iommu_notify)) {
         mr->iommu_ops->notify_stopped(mr);
     }
 }
@@ -1564,8 +1569,22 @@ void memory_region_unregister_iommu_notifier(MemoryRegion *mr, Notifier *n)
 void memory_region_notify_iommu(MemoryRegion *mr,
                                 IOMMUTLBEntry entry)
 {
+    IOMMUNotifier *iommu_notifier;
+    IOMMUNotifierCap request_cap;
+
     assert(memory_region_is_iommu(mr));
-    notifier_list_notify(&mr->iommu_notify, &entry);
+
+    if (entry.perm & IOMMU_RW) {
+        request_cap = IOMMU_NOTIFIER_CHANGE;
+    } else {
+        request_cap = IOMMU_NOTIFIER_INVALIDATION;
+    }
+
+    QLIST_FOREACH(iommu_notifier, &mr->iommu_notify, node) {
+        if (iommu_notifier->notifier_caps & request_cap) {
+            iommu_notifier->notify(iommu_notifier, &entry);
+        }
+    }
 }
 
 void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH v3 2/3] memory: generalize iommu_ops.notify_started to notifier_add
  2016-09-07  5:32 [Qemu-devel] [PATCH v3 0/3] Introduce IOMMUNotifier struct Peter Xu
  2016-09-07  5:32 ` [Qemu-devel] [PATCH v3 1/3] memory: introduce IOMMUNotifier and its caps Peter Xu
@ 2016-09-07  5:32 ` Peter Xu
  2016-09-07  6:05   ` David Gibson
  2016-09-07  5:32 ` [Qemu-devel] [PATCH v3 3/3] intel_iommu: allow invalidation typed notifiers Peter Xu
  2 siblings, 1 reply; 15+ messages in thread
From: Peter Xu @ 2016-09-07  5:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: dgibson, jasowang, mst, pbonzini, cornelia.huck, wexu, vkaplans,
	alex.williamson, peterx

Considering that we may have multiple IOMMU notifier consumers in the
future, converting iommu_ops.notify_{started|stopped} into some more
general form. Now we can trap all notifier registerations and
deregistrations, rather than only the first ones.

Power was leveraging the notifier_{started|stopped}, adding iommu_user
field for counting on Power guests to achieve the same goal.

Suggested-by:  Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/intel_iommu.c  |  4 ++--
 hw/ppc/spapr_iommu.c   | 18 ++++++++++++------
 include/exec/memory.h  |  8 ++++----
 include/hw/ppc/spapr.h |  1 +
 memory.c               | 10 ++++------
 5 files changed, 23 insertions(+), 18 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 28c31a2..c6bd8f6 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1974,7 +1974,7 @@ static IOMMUTLBEntry vtd_iommu_translate(MemoryRegion *iommu, hwaddr addr,
     return ret;
 }
 
-static void vtd_iommu_notify_started(MemoryRegion *iommu)
+static void vtd_iommu_notifier_add(MemoryRegion *iommu, IOMMUNotifier *n)
 {
     VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
 
@@ -2348,7 +2348,7 @@ static void vtd_init(IntelIOMMUState *s)
     memset(s->womask, 0, DMAR_REG_SIZE);
 
     s->iommu_ops.translate = vtd_iommu_translate;
-    s->iommu_ops.notify_started = vtd_iommu_notify_started;
+    s->iommu_ops.notifier_add = vtd_iommu_notifier_add;
     s->root = 0;
     s->root_extended = false;
     s->dmar_enabled = false;
diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index 6bc4d4d..99c83a4 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -156,14 +156,20 @@ static uint64_t spapr_tce_get_min_page_size(MemoryRegion *iommu)
     return 1ULL << tcet->page_shift;
 }
 
-static void spapr_tce_notify_started(MemoryRegion *iommu)
+static void spapr_tce_notifier_add(MemoryRegion *iommu, IOMMUNotifier *n)
 {
-    spapr_tce_set_need_vfio(container_of(iommu, sPAPRTCETable, iommu), true);
+    sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
+    if (tcet->iommu_users++ == 0) {
+        spapr_tce_set_need_vfio(tcet, true);
+    }
 }
 
-static void spapr_tce_notify_stopped(MemoryRegion *iommu)
+static void spapr_tce_notifier_del(MemoryRegion *iommu, IOMMUNotifier *n)
 {
-    spapr_tce_set_need_vfio(container_of(iommu, sPAPRTCETable, iommu), false);
+    sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
+    if (--tcet->iommu_users == 0) {
+        spapr_tce_set_need_vfio(tcet, false);
+    }
 }
 
 static int spapr_tce_table_post_load(void *opaque, int version_id)
@@ -246,8 +252,8 @@ static const VMStateDescription vmstate_spapr_tce_table = {
 static MemoryRegionIOMMUOps spapr_iommu_ops = {
     .translate = spapr_tce_translate_iommu,
     .get_min_page_size = spapr_tce_get_min_page_size,
-    .notify_started = spapr_tce_notify_started,
-    .notify_stopped = spapr_tce_notify_stopped,
+    .notifier_add = spapr_tce_notifier_add,
+    .notifier_del = spapr_tce_notifier_del,
 };
 
 static int spapr_tce_table_realize(DeviceState *dev)
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 92f14db..9efcf1b 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -175,10 +175,10 @@ struct MemoryRegionIOMMUOps {
     IOMMUTLBEntry (*translate)(MemoryRegion *iommu, hwaddr addr, bool is_write);
     /* Returns minimum supported page size */
     uint64_t (*get_min_page_size)(MemoryRegion *iommu);
-    /* Called when the first notifier is set */
-    void (*notify_started)(MemoryRegion *iommu);
-    /* Called when the last notifier is removed */
-    void (*notify_stopped)(MemoryRegion *iommu);
+    /* Called when someone registers to the notify list */
+    void (*notifier_add)(MemoryRegion *iommu, IOMMUNotifier *n);
+    /* Called when someone unregisters from the notify list */
+    void (*notifier_del)(MemoryRegion *iommu, IOMMUNotifier *n);
 };
 
 typedef struct CoalescedMemoryRange CoalescedMemoryRange;
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index caf7be9..08627ec 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -564,6 +564,7 @@ struct sPAPRTCETable {
     MemoryRegion root, iommu;
     struct VIOsPAPRDevice *vdev; /* for @bypass migration compatibility only */
     QLIST_ENTRY(sPAPRTCETable) list;
+    int iommu_users;
 };
 
 sPAPRTCETable *spapr_tce_find_by_liobn(target_ulong liobn);
diff --git a/memory.c b/memory.c
index 45a3902..d913043 100644
--- a/memory.c
+++ b/memory.c
@@ -1518,9 +1518,8 @@ void memory_region_register_iommu_notifier(MemoryRegion *mr,
 {
     /* We need to register for at least one bitfield */
     assert(n->notifier_caps != IOMMU_NOTIFIER_NONE);
-    if (mr->iommu_ops->notify_started &&
-        QLIST_EMPTY(&mr->iommu_notify)) {
-        mr->iommu_ops->notify_started(mr);
+    if (mr->iommu_ops->notifier_add) {
+        mr->iommu_ops->notifier_add(mr, n);
     }
     QLIST_INSERT_HEAD(&mr->iommu_notify, n, node);
 }
@@ -1560,9 +1559,8 @@ void memory_region_unregister_iommu_notifier(MemoryRegion *mr,
                                              IOMMUNotifier *n)
 {
     QLIST_REMOVE(n, node);
-    if (mr->iommu_ops->notify_stopped &&
-        QLIST_EMPTY(&mr->iommu_notify)) {
-        mr->iommu_ops->notify_stopped(mr);
+    if (mr->iommu_ops->notifier_del) {
+        mr->iommu_ops->notifier_del(mr, n);
     }
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH v3 3/3] intel_iommu: allow invalidation typed notifiers
  2016-09-07  5:32 [Qemu-devel] [PATCH v3 0/3] Introduce IOMMUNotifier struct Peter Xu
  2016-09-07  5:32 ` [Qemu-devel] [PATCH v3 1/3] memory: introduce IOMMUNotifier and its caps Peter Xu
  2016-09-07  5:32 ` [Qemu-devel] [PATCH v3 2/3] memory: generalize iommu_ops.notify_started to notifier_add Peter Xu
@ 2016-09-07  5:32 ` Peter Xu
  2 siblings, 0 replies; 15+ messages in thread
From: Peter Xu @ 2016-09-07  5:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: dgibson, jasowang, mst, pbonzini, cornelia.huck, wexu, vkaplans,
	alex.williamson, peterx

Intel vIOMMU is still lacking of a complete IOMMU notifier mechanism.
Before that is achieved, let's open a door for vhost DMAR support, which
only requires device-IOTLB based cache invalidations.

Meanwhile, converting hw_error() to error_report() and exit(), to make
the error messages clean and obvious (so no CPU registers will be
dumped).

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/intel_iommu.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index c6bd8f6..227c931 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1978,10 +1978,14 @@ static void vtd_iommu_notifier_add(MemoryRegion *iommu, IOMMUNotifier *n)
 {
     VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
 
-    hw_error("Device at bus %s addr %02x.%d requires iommu notifier which "
-             "is currently not supported by intel-iommu emulation",
-             vtd_as->bus->qbus.name, PCI_SLOT(vtd_as->devfn),
-             PCI_FUNC(vtd_as->devfn));
+    if (n->notifier_caps & IOMMU_NOTIFIER_CHANGE) {
+        error_report("Device at bus %s addr %02x.%d requires iommu "
+                     "notifier which is currently not supported by "
+                     "intel-iommu emulation",
+                     vtd_as->bus->qbus.name, PCI_SLOT(vtd_as->devfn),
+                     PCI_FUNC(vtd_as->devfn));
+        exit(1);
+    }
 }
 
 static const VMStateDescription vtd_vmstate = {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/3] memory: introduce IOMMUNotifier and its caps
  2016-09-07  5:32 ` [Qemu-devel] [PATCH v3 1/3] memory: introduce IOMMUNotifier and its caps Peter Xu
@ 2016-09-07  6:02   ` David Gibson
  2016-09-07  7:09     ` Peter Xu
  0 siblings, 1 reply; 15+ messages in thread
From: David Gibson @ 2016-09-07  6:02 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, mst, jasowang, vkaplans, alex.williamson, wexu,
	pbonzini, cornelia.huck, dgibson

[-- Attachment #1: Type: text/plain, Size: 11120 bytes --]

On Wed, Sep 07, 2016 at 01:32:22PM +0800, Peter Xu wrote:
> IOMMU Notifier list is used for notifying IO address mapping changes.
> Currently VFIO is the only user.
> 
> However it is possible that future consumer like vhost would like to
> only listen to part of its notifications (e.g., cache invalidations).
> 
> This patch introduced IOMMUNotifier and IOMMUNotfierCap bits for a finer
> grained control of it.
> 
> IOMMUNotifier contains a bitfield for the notify consumer describing
> what kind of notification it is interested in. Currently two kinds of
> notifications are defined:
> 
> - IOMMU_NOTIFIER_CHANGE:       for entry changes (additions)
> - IOMMU_NOTIFIER_INVALIDATION: for entry removals (cache invalidates)

As noted on the other thread, I think the correct options for your
bitmap here are "map" and "unmap".  Which are triggered depends on the
permissions / existence of the *previous* mapping, as well as the new
one.

You could in fact have "map-read", "map-write", "unmap-read",
"unmap-write" as separate bitmap options (e.g. changing a mapping from
RO to WO would be both a read-unmap and write-map event).  I can't see
any real use for that though, so just "map" and "unmap" are probably
sufficient.

> When registering the IOMMU notifier, we need to specify one or multiple
> capability bit(s) to listen to.
> 
> When notifications are triggered, it will be checked against the
> notifier's capability bits, and only notifiers with registered bits will
> be notified.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/vfio/common.c              |  3 ++-
>  include/exec/memory.h         | 39 ++++++++++++++++++++++++++++++++-------
>  include/hw/vfio/vfio-common.h |  2 +-
>  memory.c                      | 37 ++++++++++++++++++++++++++++---------
>  4 files changed, 63 insertions(+), 18 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index b313e7c..b0cea2c 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -293,7 +293,7 @@ static bool vfio_listener_skipped_section(MemoryRegionSection *section)
>             section->offset_within_address_space & (1ULL << 63);
>  }
>  
> -static void vfio_iommu_map_notify(Notifier *n, void *data)
> +static void vfio_iommu_map_notify(IOMMUNotifier *n, void *data)
>  {
>      VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
>      VFIOContainer *container = giommu->container;
> @@ -454,6 +454,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>                                 section->offset_within_region;
>          giommu->container = container;
>          giommu->n.notify = vfio_iommu_map_notify;
> +        giommu->n.notifier_caps = IOMMU_NOTIFIER_ALL;

"caps" isn't really right.  It's a *requirement* that VFIO get all the
notifications, not a capability.  "caps" would only make sense on the
other side (the vIOMMU implementation).

>          QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
>  
>          memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 3e4d416..92f14db 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -67,6 +67,28 @@ struct IOMMUTLBEntry {
>      IOMMUAccessFlags perm;
>  };
>  
> +/*
> + * Bitmap for differnet IOMMUNotifier capabilities. Each notifier can
> + * register with one or multiple IOMMU Notifier capability bit(s).
> + */
> +typedef enum {
> +    IOMMU_NOTIFIER_NONE = 0,
> +    /* Notify cache invalidations */
> +    IOMMU_NOTIFIER_INVALIDATION = 0x1,
> +    /* Notify entry changes (newly created entries) */
> +    IOMMU_NOTIFIER_CHANGE = 0x2,
> +} IOMMUNotifierCap;
> +
> +#define IOMMU_NOTIFIER_ALL (IOMMU_NOTIFIER_INVALIDATION | \
> +                            IOMMU_NOTIFIER_CHANGE)
> +
> +struct IOMMUNotifier {
> +    void (*notify)(struct IOMMUNotifier *notifier, void *data);
> +    IOMMUNotifierCap notifier_caps;
> +    QLIST_ENTRY(IOMMUNotifier) node;
> +};
> +typedef struct IOMMUNotifier IOMMUNotifier;
> +
>  /* New-style MMIO accessors can indicate that the transaction failed.
>   * A zero (MEMTX_OK) response means success; anything else is a failure
>   * of some kind. The memory subsystem will bitwise-OR together results
> @@ -201,7 +223,7 @@ struct MemoryRegion {
>      const char *name;
>      unsigned ioeventfd_nb;
>      MemoryRegionIoeventfd *ioeventfds;
> -    NotifierList iommu_notify;
> +    QLIST_HEAD(, IOMMUNotifier) iommu_notify;
>  };
>  
>  /**
> @@ -620,11 +642,12 @@ void memory_region_notify_iommu(MemoryRegion *mr,
>   * IOMMU translation entries.
>   *
>   * @mr: the memory region to observe
> - * @n: the notifier to be added; the notifier receives a pointer to an
> - *     #IOMMUTLBEntry as the opaque value; the pointer ceases to be
> - *     valid on exit from the notifier.
> + * @n: the IOMMUNotifier to be added; the notify callback receives a
> + *     pointer to an #IOMMUTLBEntry as the opaque value; the pointer
> + *     ceases to be valid on exit from the notifier.
>   */
> -void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n);
> +void memory_region_register_iommu_notifier(MemoryRegion *mr,
> +                                           IOMMUNotifier *n);

It seems to me that this should be allowed to fail, if the notifier
you're trying to register requires notifications that the MR
implementation can't supply.  That seems cleaner than delaying the
checking until the notification actually happens.

>  /**
>   * memory_region_iommu_replay: replay existing IOMMU translations to
> @@ -636,7 +659,8 @@ void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n);
>   * @is_write: Whether to treat the replay as a translate "write"
>   *     through the iommu
>   */
> -void memory_region_iommu_replay(MemoryRegion *mr, Notifier *n, bool is_write);
> +void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
> +                                bool is_write);
>  
>  /**
>   * memory_region_unregister_iommu_notifier: unregister a notifier for
> @@ -646,7 +670,8 @@ void memory_region_iommu_replay(MemoryRegion *mr, Notifier *n, bool is_write);
>   *      needs to be called
>   * @n: the notifier to be removed.
>   */
> -void memory_region_unregister_iommu_notifier(MemoryRegion *mr, Notifier *n);
> +void memory_region_unregister_iommu_notifier(MemoryRegion *mr,
> +                                             IOMMUNotifier *n);
>  
>  /**
>   * memory_region_name: get a memory region's name
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 94dfae3..c17602e 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -93,7 +93,7 @@ typedef struct VFIOGuestIOMMU {
>      VFIOContainer *container;
>      MemoryRegion *iommu;
>      hwaddr iommu_offset;
> -    Notifier n;
> +    IOMMUNotifier n;
>      QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
>  } VFIOGuestIOMMU;
>  
> diff --git a/memory.c b/memory.c
> index 0eb6895..45a3902 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1418,7 +1418,7 @@ void memory_region_init_iommu(MemoryRegion *mr,
>      memory_region_init(mr, owner, name, size);
>      mr->iommu_ops = ops,
>      mr->terminates = true;  /* then re-forwards */
> -    notifier_list_init(&mr->iommu_notify);
> +    QLIST_INIT(&mr->iommu_notify);
>  }
>  
>  static void memory_region_finalize(Object *obj)
> @@ -1513,13 +1513,16 @@ bool memory_region_is_logging(MemoryRegion *mr, uint8_t client)
>      return memory_region_get_dirty_log_mask(mr) & (1 << client);
>  }
>  
> -void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n)
> +void memory_region_register_iommu_notifier(MemoryRegion *mr,
> +                                           IOMMUNotifier *n)
>  {
> +    /* We need to register for at least one bitfield */
> +    assert(n->notifier_caps != IOMMU_NOTIFIER_NONE);

Not sure if it makes sense to implement NOTIFIER_NONE as a no-op just
for orthogonality.

>      if (mr->iommu_ops->notify_started &&
> -        QLIST_EMPTY(&mr->iommu_notify.notifiers)) {
> +        QLIST_EMPTY(&mr->iommu_notify)) {
>          mr->iommu_ops->notify_started(mr);

As noted above, I think register_notify should get the ability to
fail, which would happen if notify_started() failed (obviously it
needs to get a failure mode as well.  Basically notify_started is
required to check that this vIOMMU is able to supply the notifications
that have been requested.

>      }
> -    notifier_list_add(&mr->iommu_notify, n);
> +    QLIST_INSERT_HEAD(&mr->iommu_notify, n, node);
>  }
>  
>  uint64_t memory_region_iommu_get_min_page_size(MemoryRegion *mr)
> @@ -1531,7 +1534,8 @@ uint64_t memory_region_iommu_get_min_page_size(MemoryRegion *mr)
>      return TARGET_PAGE_SIZE;
>  }
>  
> -void memory_region_iommu_replay(MemoryRegion *mr, Notifier *n, bool is_write)
> +void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
> +                                bool is_write)
>  {
>      hwaddr addr, granularity;
>      IOMMUTLBEntry iotlb;
> @@ -1552,11 +1556,12 @@ void memory_region_iommu_replay(MemoryRegion *mr, Notifier *n, bool is_write)
>      }
>  }
>  
> -void memory_region_unregister_iommu_notifier(MemoryRegion *mr, Notifier *n)
> +void memory_region_unregister_iommu_notifier(MemoryRegion *mr,
> +                                             IOMMUNotifier *n)
>  {
> -    notifier_remove(n);
> +    QLIST_REMOVE(n, node);
>      if (mr->iommu_ops->notify_stopped &&
> -        QLIST_EMPTY(&mr->iommu_notify.notifiers)) {
> +        QLIST_EMPTY(&mr->iommu_notify)) {
>          mr->iommu_ops->notify_stopped(mr);
>      }
>  }
> @@ -1564,8 +1569,22 @@ void memory_region_unregister_iommu_notifier(MemoryRegion *mr, Notifier *n)
>  void memory_region_notify_iommu(MemoryRegion *mr,
>                                  IOMMUTLBEntry entry)
>  {
> +    IOMMUNotifier *iommu_notifier;
> +    IOMMUNotifierCap request_cap;
> +
>      assert(memory_region_is_iommu(mr));
> -    notifier_list_notify(&mr->iommu_notify, &entry);
> +
> +    if (entry.perm & IOMMU_RW) {
> +        request_cap = IOMMU_NOTIFIER_CHANGE;
> +    } else {
> +        request_cap = IOMMU_NOTIFIER_INVALIDATION;
> +    }

As noted right at the top, I don't think this logic is really right.
An in-place change should be treated as both a map and unmap.

> +    QLIST_FOREACH(iommu_notifier, &mr->iommu_notify, node) {
> +        if (iommu_notifier->notifier_caps & request_cap) {
> +            iommu_notifier->notify(iommu_notifier, &entry);
> +        }
> +    }
>  }
>  
>  void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/3] memory: generalize iommu_ops.notify_started to notifier_add
  2016-09-07  5:32 ` [Qemu-devel] [PATCH v3 2/3] memory: generalize iommu_ops.notify_started to notifier_add Peter Xu
@ 2016-09-07  6:05   ` David Gibson
  2016-09-07  7:23     ` Peter Xu
  2016-09-07 10:54     ` Paolo Bonzini
  0 siblings, 2 replies; 15+ messages in thread
From: David Gibson @ 2016-09-07  6:05 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, mst, jasowang, vkaplans, alex.williamson, wexu,
	pbonzini, cornelia.huck, dgibson

[-- Attachment #1: Type: text/plain, Size: 6414 bytes --]

On Wed, Sep 07, 2016 at 01:32:23PM +0800, Peter Xu wrote:
> Considering that we may have multiple IOMMU notifier consumers in the
> future, converting iommu_ops.notify_{started|stopped} into some more
> general form. Now we can trap all notifier registerations and
> deregistrations, rather than only the first ones.
> 
> Power was leveraging the notifier_{started|stopped}, adding iommu_user
> field for counting on Power guests to achieve the same goal.

Requiring each vIOMMU frontend to reference count or whatever seems
like a pain.  The semantics of notify_started() were designed to avoid
that.

Instead I'd suggest a callback which gets tripped any time the logical
OR of the requested notifications for all current notifiers changes.

> Suggested-by:  Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/i386/intel_iommu.c  |  4 ++--
>  hw/ppc/spapr_iommu.c   | 18 ++++++++++++------
>  include/exec/memory.h  |  8 ++++----
>  include/hw/ppc/spapr.h |  1 +
>  memory.c               | 10 ++++------
>  5 files changed, 23 insertions(+), 18 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 28c31a2..c6bd8f6 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -1974,7 +1974,7 @@ static IOMMUTLBEntry vtd_iommu_translate(MemoryRegion *iommu, hwaddr addr,
>      return ret;
>  }
>  
> -static void vtd_iommu_notify_started(MemoryRegion *iommu)
> +static void vtd_iommu_notifier_add(MemoryRegion *iommu, IOMMUNotifier *n)
>  {
>      VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
>  
> @@ -2348,7 +2348,7 @@ static void vtd_init(IntelIOMMUState *s)
>      memset(s->womask, 0, DMAR_REG_SIZE);
>  
>      s->iommu_ops.translate = vtd_iommu_translate;
> -    s->iommu_ops.notify_started = vtd_iommu_notify_started;
> +    s->iommu_ops.notifier_add = vtd_iommu_notifier_add;
>      s->root = 0;
>      s->root_extended = false;
>      s->dmar_enabled = false;
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 6bc4d4d..99c83a4 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -156,14 +156,20 @@ static uint64_t spapr_tce_get_min_page_size(MemoryRegion *iommu)
>      return 1ULL << tcet->page_shift;
>  }
>  
> -static void spapr_tce_notify_started(MemoryRegion *iommu)
> +static void spapr_tce_notifier_add(MemoryRegion *iommu, IOMMUNotifier *n)
>  {
> -    spapr_tce_set_need_vfio(container_of(iommu, sPAPRTCETable, iommu), true);
> +    sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
> +    if (tcet->iommu_users++ == 0) {
> +        spapr_tce_set_need_vfio(tcet, true);
> +    }
>  }
>  
> -static void spapr_tce_notify_stopped(MemoryRegion *iommu)
> +static void spapr_tce_notifier_del(MemoryRegion *iommu, IOMMUNotifier *n)
>  {
> -    spapr_tce_set_need_vfio(container_of(iommu, sPAPRTCETable, iommu), false);
> +    sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
> +    if (--tcet->iommu_users == 0) {
> +        spapr_tce_set_need_vfio(tcet, false);
> +    }
>  }
>  
>  static int spapr_tce_table_post_load(void *opaque, int version_id)
> @@ -246,8 +252,8 @@ static const VMStateDescription vmstate_spapr_tce_table = {
>  static MemoryRegionIOMMUOps spapr_iommu_ops = {
>      .translate = spapr_tce_translate_iommu,
>      .get_min_page_size = spapr_tce_get_min_page_size,
> -    .notify_started = spapr_tce_notify_started,
> -    .notify_stopped = spapr_tce_notify_stopped,
> +    .notifier_add = spapr_tce_notifier_add,
> +    .notifier_del = spapr_tce_notifier_del,
>  };
>  
>  static int spapr_tce_table_realize(DeviceState *dev)
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 92f14db..9efcf1b 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -175,10 +175,10 @@ struct MemoryRegionIOMMUOps {
>      IOMMUTLBEntry (*translate)(MemoryRegion *iommu, hwaddr addr, bool is_write);
>      /* Returns minimum supported page size */
>      uint64_t (*get_min_page_size)(MemoryRegion *iommu);
> -    /* Called when the first notifier is set */
> -    void (*notify_started)(MemoryRegion *iommu);
> -    /* Called when the last notifier is removed */
> -    void (*notify_stopped)(MemoryRegion *iommu);
> +    /* Called when someone registers to the notify list */
> +    void (*notifier_add)(MemoryRegion *iommu, IOMMUNotifier *n);
> +    /* Called when someone unregisters from the notify list */
> +    void (*notifier_del)(MemoryRegion *iommu, IOMMUNotifier *n);
>  };
>  
>  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index caf7be9..08627ec 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -564,6 +564,7 @@ struct sPAPRTCETable {
>      MemoryRegion root, iommu;
>      struct VIOsPAPRDevice *vdev; /* for @bypass migration compatibility only */
>      QLIST_ENTRY(sPAPRTCETable) list;
> +    int iommu_users;
>  };
>  
>  sPAPRTCETable *spapr_tce_find_by_liobn(target_ulong liobn);
> diff --git a/memory.c b/memory.c
> index 45a3902..d913043 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1518,9 +1518,8 @@ void memory_region_register_iommu_notifier(MemoryRegion *mr,
>  {
>      /* We need to register for at least one bitfield */
>      assert(n->notifier_caps != IOMMU_NOTIFIER_NONE);
> -    if (mr->iommu_ops->notify_started &&
> -        QLIST_EMPTY(&mr->iommu_notify)) {
> -        mr->iommu_ops->notify_started(mr);
> +    if (mr->iommu_ops->notifier_add) {
> +        mr->iommu_ops->notifier_add(mr, n);
>      }
>      QLIST_INSERT_HEAD(&mr->iommu_notify, n, node);
>  }
> @@ -1560,9 +1559,8 @@ void memory_region_unregister_iommu_notifier(MemoryRegion *mr,
>                                               IOMMUNotifier *n)
>  {
>      QLIST_REMOVE(n, node);
> -    if (mr->iommu_ops->notify_stopped &&
> -        QLIST_EMPTY(&mr->iommu_notify)) {
> -        mr->iommu_ops->notify_stopped(mr);
> +    if (mr->iommu_ops->notifier_del) {
> +        mr->iommu_ops->notifier_del(mr, n);
>      }
>  }
>  

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/3] memory: introduce IOMMUNotifier and its caps
  2016-09-07  6:02   ` David Gibson
@ 2016-09-07  7:09     ` Peter Xu
  2016-09-07 10:20       ` David Gibson
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Xu @ 2016-09-07  7:09 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-devel, mst, jasowang, vkaplans, alex.williamson, wexu,
	pbonzini, cornelia.huck, dgibson

On Wed, Sep 07, 2016 at 04:02:39PM +1000, David Gibson wrote:
> On Wed, Sep 07, 2016 at 01:32:22PM +0800, Peter Xu wrote:
> > IOMMU Notifier list is used for notifying IO address mapping changes.
> > Currently VFIO is the only user.
> > 
> > However it is possible that future consumer like vhost would like to
> > only listen to part of its notifications (e.g., cache invalidations).
> > 
> > This patch introduced IOMMUNotifier and IOMMUNotfierCap bits for a finer
> > grained control of it.
> > 
> > IOMMUNotifier contains a bitfield for the notify consumer describing
> > what kind of notification it is interested in. Currently two kinds of
> > notifications are defined:
> > 
> > - IOMMU_NOTIFIER_CHANGE:       for entry changes (additions)
> > - IOMMU_NOTIFIER_INVALIDATION: for entry removals (cache invalidates)
> 
> As noted on the other thread, I think the correct options for your
> bitmap here are "map" and "unmap".  Which are triggered depends on the
> permissions / existence of the *previous* mapping, as well as the new
> one.

As mentioned in previous reply, both work for me. :)

If you insist on changing it (without anyone that strongly
disagree...), I can do it in next spin.

> 
> You could in fact have "map-read", "map-write", "unmap-read",
> "unmap-write" as separate bitmap options (e.g. changing a mapping from
> RO to WO would be both a read-unmap and write-map event).  I can't see
> any real use for that though, so just "map" and "unmap" are probably
> sufficient.

Agreed. We can enhance it in the future if there is any real
requirement. Before that, it would be over-engineering.

(Btw, we should not need {read|write}_unmap in all cases. IIUC unmap
 should not need any permission check.)

[...]

> > -static void vfio_iommu_map_notify(Notifier *n, void *data)
> > +static void vfio_iommu_map_notify(IOMMUNotifier *n, void *data)
> >  {
> >      VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
> >      VFIOContainer *container = giommu->container;
> > @@ -454,6 +454,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
> >                                 section->offset_within_region;
> >          giommu->container = container;
> >          giommu->n.notify = vfio_iommu_map_notify;
> > +        giommu->n.notifier_caps = IOMMU_NOTIFIER_ALL;
> 
> "caps" isn't really right.  It's a *requirement* that VFIO get all the
> notifications, not a capability.  "caps" would only make sense on the
> other side (the vIOMMU implementation).

Sounds reasonable. How about "flags"? Or any suggestion?

[...]

> >  /**
> > @@ -620,11 +642,12 @@ void memory_region_notify_iommu(MemoryRegion *mr,
> >   * IOMMU translation entries.
> >   *
> >   * @mr: the memory region to observe
> > - * @n: the notifier to be added; the notifier receives a pointer to an
> > - *     #IOMMUTLBEntry as the opaque value; the pointer ceases to be
> > - *     valid on exit from the notifier.
> > + * @n: the IOMMUNotifier to be added; the notify callback receives a
> > + *     pointer to an #IOMMUTLBEntry as the opaque value; the pointer
> > + *     ceases to be valid on exit from the notifier.
> >   */
> > -void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n);
> > +void memory_region_register_iommu_notifier(MemoryRegion *mr,
> > +                                           IOMMUNotifier *n);
> 
> It seems to me that this should be allowed to fail, if the notifier
> you're trying to register requires notifications that the MR
> implementation can't supply.  That seems cleaner than delaying the
> checking until the notification actually happens.

Do we have real use case for this one? For example, do we have case
that VM "registering required notifications that MR cannot support"
can still work?

Currently there is only one use case AFAIK, which is VFIO+IntelIOMMU.
In that case, I take it a configuration error (we should never allow
that configuration happen). IMHO All configuration errors should be
reported ASAP, and we should never let VM start.

> 
> >  /**
> >   * memory_region_iommu_replay: replay existing IOMMU translations to
> > @@ -636,7 +659,8 @@ void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n);
> >   * @is_write: Whether to treat the replay as a translate "write"
> >   *     through the iommu
> >   */
> > -void memory_region_iommu_replay(MemoryRegion *mr, Notifier *n, bool is_write);
> > +void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
> > +                                bool is_write);
> >  
> >  /**
> >   * memory_region_unregister_iommu_notifier: unregister a notifier for
> > @@ -646,7 +670,8 @@ void memory_region_iommu_replay(MemoryRegion *mr, Notifier *n, bool is_write);
> >   *      needs to be called
> >   * @n: the notifier to be removed.
> >   */
> > -void memory_region_unregister_iommu_notifier(MemoryRegion *mr, Notifier *n);
> > +void memory_region_unregister_iommu_notifier(MemoryRegion *mr,
> > +                                             IOMMUNotifier *n);
> >  
> >  /**
> >   * memory_region_name: get a memory region's name
> > diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> > index 94dfae3..c17602e 100644
> > --- a/include/hw/vfio/vfio-common.h
> > +++ b/include/hw/vfio/vfio-common.h
> > @@ -93,7 +93,7 @@ typedef struct VFIOGuestIOMMU {
> >      VFIOContainer *container;
> >      MemoryRegion *iommu;
> >      hwaddr iommu_offset;
> > -    Notifier n;
> > +    IOMMUNotifier n;
> >      QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
> >  } VFIOGuestIOMMU;
> >  
> > diff --git a/memory.c b/memory.c
> > index 0eb6895..45a3902 100644
> > --- a/memory.c
> > +++ b/memory.c
> > @@ -1418,7 +1418,7 @@ void memory_region_init_iommu(MemoryRegion *mr,
> >      memory_region_init(mr, owner, name, size);
> >      mr->iommu_ops = ops,
> >      mr->terminates = true;  /* then re-forwards */
> > -    notifier_list_init(&mr->iommu_notify);
> > +    QLIST_INIT(&mr->iommu_notify);
> >  }
> >  
> >  static void memory_region_finalize(Object *obj)
> > @@ -1513,13 +1513,16 @@ bool memory_region_is_logging(MemoryRegion *mr, uint8_t client)
> >      return memory_region_get_dirty_log_mask(mr) & (1 << client);
> >  }
> >  
> > -void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n)
> > +void memory_region_register_iommu_notifier(MemoryRegion *mr,
> > +                                           IOMMUNotifier *n)
> >  {
> > +    /* We need to register for at least one bitfield */
> > +    assert(n->notifier_caps != IOMMU_NOTIFIER_NONE);
> 
> Not sure if it makes sense to implement NOTIFIER_NONE as a no-op just
> for orthogonality.

I would think it is a programming error when registering to IOMMO
notifier without any flags set (or cap). So I put an assert() here.
Even if we have a no-op thing, anyone registers with
IOMMU_NOTIFIER_NONE flag is suspecious and strange.

> 
> >      if (mr->iommu_ops->notify_started &&
> > -        QLIST_EMPTY(&mr->iommu_notify.notifiers)) {
> > +        QLIST_EMPTY(&mr->iommu_notify)) {
> >          mr->iommu_ops->notify_started(mr);
> 
> As noted above, I think register_notify should get the ability to
> fail, which would happen if notify_started() failed (obviously it
> needs to get a failure mode as well.  Basically notify_started is
> required to check that this vIOMMU is able to supply the notifications
> that have been requested.

Same as above, I just failed to think out a use case for that yet. As
long as we can have a use case for it, it's easy to replace the
returned void type into something like int, and report it up along the
way.

[...]

> > @@ -1564,8 +1569,22 @@ void memory_region_unregister_iommu_notifier(MemoryRegion *mr, Notifier *n)
> >  void memory_region_notify_iommu(MemoryRegion *mr,
> >                                  IOMMUTLBEntry entry)
> >  {
> > +    IOMMUNotifier *iommu_notifier;
> > +    IOMMUNotifierCap request_cap;
> > +
> >      assert(memory_region_is_iommu(mr));
> > -    notifier_list_notify(&mr->iommu_notify, &entry);
> > +
> > +    if (entry.perm & IOMMU_RW) {
> > +        request_cap = IOMMU_NOTIFIER_CHANGE;
> > +    } else {
> > +        request_cap = IOMMU_NOTIFIER_INVALIDATION;
> > +    }
> 
> As noted right at the top, I don't think this logic is really right.
> An in-place change should be treated as both a map and unmap.

IIUC, guest needs to send two invalidations for an in-place change,
right? So a "change" will actually trigger this twice.

(At least this should be true for Power, and I am guessing the same
 for Intel, otherwise PowerIOMMU+VFIO should not work before this
 series...)

Thanks!

-- peterx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/3] memory: generalize iommu_ops.notify_started to notifier_add
  2016-09-07  6:05   ` David Gibson
@ 2016-09-07  7:23     ` Peter Xu
  2016-09-07 10:23       ` David Gibson
  2016-09-07 10:54     ` Paolo Bonzini
  1 sibling, 1 reply; 15+ messages in thread
From: Peter Xu @ 2016-09-07  7:23 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-devel, mst, jasowang, vkaplans, alex.williamson, wexu,
	pbonzini, cornelia.huck, dgibson

On Wed, Sep 07, 2016 at 04:05:50PM +1000, David Gibson wrote:
> On Wed, Sep 07, 2016 at 01:32:23PM +0800, Peter Xu wrote:
> > Considering that we may have multiple IOMMU notifier consumers in the
> > future, converting iommu_ops.notify_{started|stopped} into some more
> > general form. Now we can trap all notifier registerations and
> > deregistrations, rather than only the first ones.
> > 
> > Power was leveraging the notifier_{started|stopped}, adding iommu_user
> > field for counting on Power guests to achieve the same goal.
> 
> Requiring each vIOMMU frontend to reference count or whatever seems
> like a pain.  The semantics of notify_started() were designed to avoid
> that.

The problem is that, I think we need something like "notifier_add",
just like you have mentioned before, e.g., what if we have more than
one registers for the notifier list? And if with that, it'll be
awkward to still keep the notify_started since it's actually a subset
of "notifier_add".

Considering the above, I think simply adding a count for IOMMUs who
want it is a reasonable trade-off.

> 
> Instead I'd suggest a callback which gets tripped any time the logical
> OR of the requested notifications for all current notifiers changes.

If so, we will need two callbacks (notify_started,
notifier_xxx_changed) instead of one. In that case I'd prefer a single
notifier_add. Besides that, I'd say the notifier_xxx_changed interface
is really hard to understand from the first glance.

Another reason to have notifier_add() is that, this is more easily to
be extended. E.g., we can add more fields to IOMMUNotifier in the
future (a channel between the consumer and provider), and it'll be
passed to each IOMMU's notifier_add() naturally.

Thanks,

-- peterx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/3] memory: introduce IOMMUNotifier and its caps
  2016-09-07  7:09     ` Peter Xu
@ 2016-09-07 10:20       ` David Gibson
  2016-09-08 10:00         ` Peter Xu
  0 siblings, 1 reply; 15+ messages in thread
From: David Gibson @ 2016-09-07 10:20 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, mst, jasowang, vkaplans, alex.williamson, wexu,
	pbonzini, cornelia.huck, dgibson

[-- Attachment #1: Type: text/plain, Size: 10597 bytes --]

On Wed, Sep 07, 2016 at 03:09:16PM +0800, Peter Xu wrote:
> On Wed, Sep 07, 2016 at 04:02:39PM +1000, David Gibson wrote:
> > On Wed, Sep 07, 2016 at 01:32:22PM +0800, Peter Xu wrote:
> > > IOMMU Notifier list is used for notifying IO address mapping changes.
> > > Currently VFIO is the only user.
> > > 
> > > However it is possible that future consumer like vhost would like to
> > > only listen to part of its notifications (e.g., cache invalidations).
> > > 
> > > This patch introduced IOMMUNotifier and IOMMUNotfierCap bits for a finer
> > > grained control of it.
> > > 
> > > IOMMUNotifier contains a bitfield for the notify consumer describing
> > > what kind of notification it is interested in. Currently two kinds of
> > > notifications are defined:
> > > 
> > > - IOMMU_NOTIFIER_CHANGE:       for entry changes (additions)
> > > - IOMMU_NOTIFIER_INVALIDATION: for entry removals (cache invalidates)
> > 
> > As noted on the other thread, I think the correct options for your
> > bitmap here are "map" and "unmap".  Which are triggered depends on the
> > permissions / existence of the *previous* mapping, as well as the new
> > one.
> 
> As mentioned in previous reply, both work for me. :)
> 
> If you insist on changing it (without anyone that strongly
> disagree...), I can do it in next spin.

This is not just a name change I'm proposing, but a semantic change
(or at least a clarification).

> > You could in fact have "map-read", "map-write", "unmap-read",
> > "unmap-write" as separate bitmap options (e.g. changing a mapping from
> > RO to WO would be both a read-unmap and write-map event).  I can't see
> > any real use for that though, so just "map" and "unmap" are probably
> > sufficient.
> 
> Agreed. We can enhance it in the future if there is any real
> requirement. Before that, it would be over-engineering.
> 
> (Btw, we should not need {read|write}_unmap in all cases. IIUC unmap
>  should not need any permission check.)

In practice probably not, but they are distinct operations.
read_unmap means a readable mapping has been removed, write_unmap
means a writable mapping has been removed.  Again - the permissions on
the *old* mapping are what matters here.

> 
> [...]
> 
> > > -static void vfio_iommu_map_notify(Notifier *n, void *data)
> > > +static void vfio_iommu_map_notify(IOMMUNotifier *n, void *data)
> > >  {
> > >      VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
> > >      VFIOContainer *container = giommu->container;
> > > @@ -454,6 +454,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
> > >                                 section->offset_within_region;
> > >          giommu->container = container;
> > >          giommu->n.notify = vfio_iommu_map_notify;
> > > +        giommu->n.notifier_caps = IOMMU_NOTIFIER_ALL;
> > 
> > "caps" isn't really right.  It's a *requirement* that VFIO get all the
> > notifications, not a capability.  "caps" would only make sense on the
> > other side (the vIOMMU implementation).
> 
> Sounds reasonable. How about "flags"? Or any suggestion?

"flags" would do.  I feel like there should be a better name, but I
can't think of it.

> [...]
> 
> > >  /**
> > > @@ -620,11 +642,12 @@ void memory_region_notify_iommu(MemoryRegion *mr,
> > >   * IOMMU translation entries.
> > >   *
> > >   * @mr: the memory region to observe
> > > - * @n: the notifier to be added; the notifier receives a pointer to an
> > > - *     #IOMMUTLBEntry as the opaque value; the pointer ceases to be
> > > - *     valid on exit from the notifier.
> > > + * @n: the IOMMUNotifier to be added; the notify callback receives a
> > > + *     pointer to an #IOMMUTLBEntry as the opaque value; the pointer
> > > + *     ceases to be valid on exit from the notifier.
> > >   */
> > > -void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n);
> > > +void memory_region_register_iommu_notifier(MemoryRegion *mr,
> > > +                                           IOMMUNotifier *n);
> > 
> > It seems to me that this should be allowed to fail, if the notifier
> > you're trying to register requires notifications that the MR
> > implementation can't supply.  That seems cleaner than delaying the
> > checking until the notification actually happens.
> 
> Do we have real use case for this one? For example, do we have case
> that VM "registering required notifications that MR cannot support"
> can still work?
> 
> Currently there is only one use case AFAIK, which is VFIO+IntelIOMMU.
> In that case, I take it a configuration error (we should never allow
> that configuration happen). IMHO All configuration errors should be
> reported ASAP, and we should never let VM start.

Yes... I'm not proposing changing that.  I just think it would be
cleaner to report the error through the error channels, instead of
just aborting.

> > >  /**
> > >   * memory_region_iommu_replay: replay existing IOMMU translations to
> > > @@ -636,7 +659,8 @@ void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n);
> > >   * @is_write: Whether to treat the replay as a translate "write"
> > >   *     through the iommu
> > >   */
> > > -void memory_region_iommu_replay(MemoryRegion *mr, Notifier *n, bool is_write);
> > > +void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
> > > +                                bool is_write);
> > >  
> > >  /**
> > >   * memory_region_unregister_iommu_notifier: unregister a notifier for
> > > @@ -646,7 +670,8 @@ void memory_region_iommu_replay(MemoryRegion *mr, Notifier *n, bool is_write);
> > >   *      needs to be called
> > >   * @n: the notifier to be removed.
> > >   */
> > > -void memory_region_unregister_iommu_notifier(MemoryRegion *mr, Notifier *n);
> > > +void memory_region_unregister_iommu_notifier(MemoryRegion *mr,
> > > +                                             IOMMUNotifier *n);
> > >  
> > >  /**
> > >   * memory_region_name: get a memory region's name
> > > diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> > > index 94dfae3..c17602e 100644
> > > --- a/include/hw/vfio/vfio-common.h
> > > +++ b/include/hw/vfio/vfio-common.h
> > > @@ -93,7 +93,7 @@ typedef struct VFIOGuestIOMMU {
> > >      VFIOContainer *container;
> > >      MemoryRegion *iommu;
> > >      hwaddr iommu_offset;
> > > -    Notifier n;
> > > +    IOMMUNotifier n;
> > >      QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
> > >  } VFIOGuestIOMMU;
> > >  
> > > diff --git a/memory.c b/memory.c
> > > index 0eb6895..45a3902 100644
> > > --- a/memory.c
> > > +++ b/memory.c
> > > @@ -1418,7 +1418,7 @@ void memory_region_init_iommu(MemoryRegion *mr,
> > >      memory_region_init(mr, owner, name, size);
> > >      mr->iommu_ops = ops,
> > >      mr->terminates = true;  /* then re-forwards */
> > > -    notifier_list_init(&mr->iommu_notify);
> > > +    QLIST_INIT(&mr->iommu_notify);
> > >  }
> > >  
> > >  static void memory_region_finalize(Object *obj)
> > > @@ -1513,13 +1513,16 @@ bool memory_region_is_logging(MemoryRegion *mr, uint8_t client)
> > >      return memory_region_get_dirty_log_mask(mr) & (1 << client);
> > >  }
> > >  
> > > -void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n)
> > > +void memory_region_register_iommu_notifier(MemoryRegion *mr,
> > > +                                           IOMMUNotifier *n)
> > >  {
> > > +    /* We need to register for at least one bitfield */
> > > +    assert(n->notifier_caps != IOMMU_NOTIFIER_NONE);
> > 
> > Not sure if it makes sense to implement NOTIFIER_NONE as a no-op just
> > for orthogonality.
> 
> I would think it is a programming error when registering to IOMMO
> notifier without any flags set (or cap). So I put an assert() here.
> Even if we have a no-op thing, anyone registers with
> IOMMU_NOTIFIER_NONE flag is suspecious and strange.

True.

> > 
> > >      if (mr->iommu_ops->notify_started &&
> > > -        QLIST_EMPTY(&mr->iommu_notify.notifiers)) {
> > > +        QLIST_EMPTY(&mr->iommu_notify)) {
> > >          mr->iommu_ops->notify_started(mr);
> > 
> > As noted above, I think register_notify should get the ability to
> > fail, which would happen if notify_started() failed (obviously it
> > needs to get a failure mode as well.  Basically notify_started is
> > required to check that this vIOMMU is able to supply the notifications
> > that have been requested.
> 
> Same as above, I just failed to think out a use case for that yet. As
> long as we can have a use case for it, it's easy to replace the
> returned void type into something like int, and report it up along the
> way.

Hm, I suppose.

> 
> [...]
> 
> > > @@ -1564,8 +1569,22 @@ void memory_region_unregister_iommu_notifier(MemoryRegion *mr, Notifier *n)
> > >  void memory_region_notify_iommu(MemoryRegion *mr,
> > >                                  IOMMUTLBEntry entry)
> > >  {
> > > +    IOMMUNotifier *iommu_notifier;
> > > +    IOMMUNotifierCap request_cap;
> > > +
> > >      assert(memory_region_is_iommu(mr));
> > > -    notifier_list_notify(&mr->iommu_notify, &entry);
> > > +
> > > +    if (entry.perm & IOMMU_RW) {
> > > +        request_cap = IOMMU_NOTIFIER_CHANGE;
> > > +    } else {
> > > +        request_cap = IOMMU_NOTIFIER_INVALIDATION;
> > > +    }
> > 
> > As noted right at the top, I don't think this logic is really right.
> > An in-place change should be treated as both a map and unmap.
> 
> IIUC, guest needs to send two invalidations for an in-place change,
> right? So a "change" will actually trigger this twice.

No, or at least not necessarily.  How the invalidate is reported
really depends on the guest side vIOMMU interface.  Maybe it requires
an explicit invalidate followed by a set, maybe a direct in place
replacement is possible (the PAPR hypercall interface allows this at
least in theory).

What I had in mind here is that assuming the vIOMMU can detect an in
place change, it would then ping all notifiers that have *either* the
MAP or UNMAP flag bits set.

> (At least this should be true for Power, and I am guessing the same
>  for Intel, otherwise PowerIOMMU+VFIO should not work before this
>  series...)

I don't really follow what you're saying here.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/3] memory: generalize iommu_ops.notify_started to notifier_add
  2016-09-07  7:23     ` Peter Xu
@ 2016-09-07 10:23       ` David Gibson
  0 siblings, 0 replies; 15+ messages in thread
From: David Gibson @ 2016-09-07 10:23 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, mst, jasowang, vkaplans, alex.williamson, wexu,
	pbonzini, cornelia.huck, dgibson

[-- Attachment #1: Type: text/plain, Size: 2515 bytes --]

On Wed, Sep 07, 2016 at 03:23:16PM +0800, Peter Xu wrote:
> On Wed, Sep 07, 2016 at 04:05:50PM +1000, David Gibson wrote:
> > On Wed, Sep 07, 2016 at 01:32:23PM +0800, Peter Xu wrote:
> > > Considering that we may have multiple IOMMU notifier consumers in the
> > > future, converting iommu_ops.notify_{started|stopped} into some more
> > > general form. Now we can trap all notifier registerations and
> > > deregistrations, rather than only the first ones.
> > > 
> > > Power was leveraging the notifier_{started|stopped}, adding iommu_user
> > > field for counting on Power guests to achieve the same goal.
> > 
> > Requiring each vIOMMU frontend to reference count or whatever seems
> > like a pain.  The semantics of notify_started() were designed to avoid
> > that.
> 
> The problem is that, I think we need something like "notifier_add",
> just like you have mentioned before, e.g., what if we have more than
> one registers for the notifier list?

The approach I had in mind is that whenever the notifier list changed,
the infrastructure would rescan the list and recompute the bitmask of
"events any notifier cares about".  If that changed from the previous
value, it would call the notify_change() cb on the vIOMMU.

> And if with that, it'll be
> awkward to still keep the notify_started since it's actually a subset
> of "notifier_add".
> 
> Considering the above, I think simply adding a count for IOMMUs who
> want it is a reasonable trade-off.
> 
> > 
> > Instead I'd suggest a callback which gets tripped any time the logical
> > OR of the requested notifications for all current notifiers changes.
> 
> If so, we will need two callbacks (notify_started,
> notifier_xxx_changed) instead of one.

No, just one.  notify_started is just notify_change() with a non-zero
bitmask, when the mask was previously zero.

> In that case I'd prefer a single
> notifier_add. Besides that, I'd say the notifier_xxx_changed interface
> is really hard to understand from the first glance.
> 
> Another reason to have notifier_add() is that, this is more easily to
> be extended. E.g., we can add more fields to IOMMUNotifier in the
> future (a channel between the consumer and provider), and it'll be
> passed to each IOMMU's notifier_add() naturally.

Hm, I suppose.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/3] memory: generalize iommu_ops.notify_started to notifier_add
  2016-09-07  6:05   ` David Gibson
  2016-09-07  7:23     ` Peter Xu
@ 2016-09-07 10:54     ` Paolo Bonzini
  2016-09-08 10:22       ` Peter Xu
  1 sibling, 1 reply; 15+ messages in thread
From: Paolo Bonzini @ 2016-09-07 10:54 UTC (permalink / raw)
  To: David Gibson, Peter Xu
  Cc: qemu-devel, mst, jasowang, vkaplans, alex.williamson, wexu,
	cornelia.huck, dgibson



On 07/09/2016 08:05, David Gibson wrote:
> On Wed, Sep 07, 2016 at 01:32:23PM +0800, Peter Xu wrote:
>> Considering that we may have multiple IOMMU notifier consumers in the
>> future, converting iommu_ops.notify_{started|stopped} into some more
>> general form. Now we can trap all notifier registerations and
>> deregistrations, rather than only the first ones.
>>
>> Power was leveraging the notifier_{started|stopped}, adding iommu_user
>> field for counting on Power guests to achieve the same goal.
> 
> Requiring each vIOMMU frontend to reference count or whatever seems
> like a pain.  The semantics of notify_started() were designed to avoid
> that.
> 
> Instead I'd suggest a callback which gets tripped any time the logical
> OR of the requested notifications for all current notifiers changes.

I like the suggestion.  Alternatively you could call notify_stopped if
old & ~new is nonzero (and you pass old & ~new), and notify_started if
new & ~old is nonzero (and you pass new & ~old).

Paolo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/3] memory: introduce IOMMUNotifier and its caps
  2016-09-07 10:20       ` David Gibson
@ 2016-09-08 10:00         ` Peter Xu
  0 siblings, 0 replies; 15+ messages in thread
From: Peter Xu @ 2016-09-08 10:00 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-devel, mst, jasowang, vkaplans, alex.williamson, wexu,
	pbonzini, cornelia.huck, dgibson

On Wed, Sep 07, 2016 at 08:20:35PM +1000, David Gibson wrote:
> On Wed, Sep 07, 2016 at 03:09:16PM +0800, Peter Xu wrote:
> > On Wed, Sep 07, 2016 at 04:02:39PM +1000, David Gibson wrote:
> > > On Wed, Sep 07, 2016 at 01:32:22PM +0800, Peter Xu wrote:
> > > > IOMMU Notifier list is used for notifying IO address mapping changes.
> > > > Currently VFIO is the only user.
> > > > 
> > > > However it is possible that future consumer like vhost would like to
> > > > only listen to part of its notifications (e.g., cache invalidations).
> > > > 
> > > > This patch introduced IOMMUNotifier and IOMMUNotfierCap bits for a finer
> > > > grained control of it.
> > > > 
> > > > IOMMUNotifier contains a bitfield for the notify consumer describing
> > > > what kind of notification it is interested in. Currently two kinds of
> > > > notifications are defined:
> > > > 
> > > > - IOMMU_NOTIFIER_CHANGE:       for entry changes (additions)
> > > > - IOMMU_NOTIFIER_INVALIDATION: for entry removals (cache invalidates)
> > > 
> > > As noted on the other thread, I think the correct options for your
> > > bitmap here are "map" and "unmap".  Which are triggered depends on the
> > > permissions / existence of the *previous* mapping, as well as the new
> > > one.
> > 
> > As mentioned in previous reply, both work for me. :)
> > 
> > If you insist on changing it (without anyone that strongly
> > disagree...), I can do it in next spin.
> 
> This is not just a name change I'm proposing, but a semantic change
> (or at least a clarification).

I see that kernel IOMMU driver is using map_page() and unmap_page()
for its interface. Now I prefer MAP/UNMAP.

> 
> > > You could in fact have "map-read", "map-write", "unmap-read",
> > > "unmap-write" as separate bitmap options (e.g. changing a mapping from
> > > RO to WO would be both a read-unmap and write-map event).  I can't see
> > > any real use for that though, so just "map" and "unmap" are probably
> > > sufficient.
> > 
> > Agreed. We can enhance it in the future if there is any real
> > requirement. Before that, it would be over-engineering.
> > 
> > (Btw, we should not need {read|write}_unmap in all cases. IIUC unmap
> >  should not need any permission check.)
> 
> In practice probably not, but they are distinct operations.
> read_unmap means a readable mapping has been removed, write_unmap
> means a writable mapping has been removed.  Again - the permissions on
> the *old* mapping are what matters here.

Why they are distinct operations? Or could you help explain in what
case would we need this flag (read/write) for an unmap() operation?

> 
> > 
> > [...]
> > 
> > > > -static void vfio_iommu_map_notify(Notifier *n, void *data)
> > > > +static void vfio_iommu_map_notify(IOMMUNotifier *n, void *data)
> > > >  {
> > > >      VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
> > > >      VFIOContainer *container = giommu->container;
> > > > @@ -454,6 +454,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
> > > >                                 section->offset_within_region;
> > > >          giommu->container = container;
> > > >          giommu->n.notify = vfio_iommu_map_notify;
> > > > +        giommu->n.notifier_caps = IOMMU_NOTIFIER_ALL;
> > > 
> > > "caps" isn't really right.  It's a *requirement* that VFIO get all the
> > > notifications, not a capability.  "caps" would only make sense on the
> > > other side (the vIOMMU implementation).
> > 
> > Sounds reasonable. How about "flags"? Or any suggestion?
> 
> "flags" would do.  I feel like there should be a better name, but I
> can't think of it.

Sure. I can switch.

> 
> > [...]
> > 
> > > >  /**
> > > > @@ -620,11 +642,12 @@ void memory_region_notify_iommu(MemoryRegion *mr,
> > > >   * IOMMU translation entries.
> > > >   *
> > > >   * @mr: the memory region to observe
> > > > - * @n: the notifier to be added; the notifier receives a pointer to an
> > > > - *     #IOMMUTLBEntry as the opaque value; the pointer ceases to be
> > > > - *     valid on exit from the notifier.
> > > > + * @n: the IOMMUNotifier to be added; the notify callback receives a
> > > > + *     pointer to an #IOMMUTLBEntry as the opaque value; the pointer
> > > > + *     ceases to be valid on exit from the notifier.
> > > >   */
> > > > -void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n);
> > > > +void memory_region_register_iommu_notifier(MemoryRegion *mr,
> > > > +                                           IOMMUNotifier *n);
> > > 
> > > It seems to me that this should be allowed to fail, if the notifier
> > > you're trying to register requires notifications that the MR
> > > implementation can't supply.  That seems cleaner than delaying the
> > > checking until the notification actually happens.
> > 
> > Do we have real use case for this one? For example, do we have case
> > that VM "registering required notifications that MR cannot support"
> > can still work?
> > 
> > Currently there is only one use case AFAIK, which is VFIO+IntelIOMMU.
> > In that case, I take it a configuration error (we should never allow
> > that configuration happen). IMHO All configuration errors should be
> > reported ASAP, and we should never let VM start.
> 
> Yes... I'm not proposing changing that.  I just think it would be
> cleaner to report the error through the error channels, instead of
> just aborting.

Agree. But since I have no obvious reason to change the return code,
I'd prefer to keep it as it is for this series if you don't mind.

[...]

> > > > @@ -1564,8 +1569,22 @@ void memory_region_unregister_iommu_notifier(MemoryRegion *mr, Notifier *n)
> > > >  void memory_region_notify_iommu(MemoryRegion *mr,
> > > >                                  IOMMUTLBEntry entry)
> > > >  {
> > > > +    IOMMUNotifier *iommu_notifier;
> > > > +    IOMMUNotifierCap request_cap;
> > > > +
> > > >      assert(memory_region_is_iommu(mr));
> > > > -    notifier_list_notify(&mr->iommu_notify, &entry);
> > > > +
> > > > +    if (entry.perm & IOMMU_RW) {
> > > > +        request_cap = IOMMU_NOTIFIER_CHANGE;
> > > > +    } else {
> > > > +        request_cap = IOMMU_NOTIFIER_INVALIDATION;
> > > > +    }
> > > 
> > > As noted right at the top, I don't think this logic is really right.
> > > An in-place change should be treated as both a map and unmap.
> > 
> > IIUC, guest needs to send two invalidations for an in-place change,
> > right? So a "change" will actually trigger this twice.
> 
> No, or at least not necessarily.  How the invalidate is reported
> really depends on the guest side vIOMMU interface.  Maybe it requires
> an explicit invalidate followed by a set, maybe a direct in place
> replacement is possible (the PAPR hypercall interface allows this at
> least in theory).
> 
> What I had in mind here is that assuming the vIOMMU can detect an in
> place change, it would then ping all notifiers that have *either* the
> MAP or UNMAP flag bits set.

Yes, so I think we don't need a "change" interface. And maybe we
should start using MAP/UNMAP for the flags naming to avoid unecessary
misunderstandings (when we talk about CHANGE flag, it's actually
map(), but not unmap() + map()).

> 
> > (At least this should be true for Power, and I am guessing the same
> >  for Intel, otherwise PowerIOMMU+VFIO should not work before this
> >  series...)
> 
> I don't really follow what you're saying here.

I meant at least current Power IOMMU should be sending two notifies if
a "change" happened, otherwise current code won't work.

Thanks,

-- peterx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/3] memory: generalize iommu_ops.notify_started to notifier_add
  2016-09-07 10:54     ` Paolo Bonzini
@ 2016-09-08 10:22       ` Peter Xu
  2016-09-12  1:17         ` David Gibson
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Xu @ 2016-09-08 10:22 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: David Gibson, qemu-devel, mst, jasowang, vkaplans,
	alex.williamson, wexu, cornelia.huck, dgibson

On Wed, Sep 07, 2016 at 12:54:23PM +0200, Paolo Bonzini wrote:
> 
> 
> On 07/09/2016 08:05, David Gibson wrote:
> > On Wed, Sep 07, 2016 at 01:32:23PM +0800, Peter Xu wrote:
> >> Considering that we may have multiple IOMMU notifier consumers in the
> >> future, converting iommu_ops.notify_{started|stopped} into some more
> >> general form. Now we can trap all notifier registerations and
> >> deregistrations, rather than only the first ones.
> >>
> >> Power was leveraging the notifier_{started|stopped}, adding iommu_user
> >> field for counting on Power guests to achieve the same goal.
> > 
> > Requiring each vIOMMU frontend to reference count or whatever seems
> > like a pain.  The semantics of notify_started() were designed to avoid
> > that.
> > 
> > Instead I'd suggest a callback which gets tripped any time the logical
> > OR of the requested notifications for all current notifiers changes.
> 
> I like the suggestion.  Alternatively you could call notify_stopped if
> old & ~new is nonzero (and you pass old & ~new), and notify_started if
> new & ~old is nonzero (and you pass new & ~old).

I think now I understand the point... Then I'd prefer to use David's
suggestion. A single notify_changed() looks cleaner. To be more
explicit, I would prefer to rename it to notifier_flag_changed(),
since notify_changed() looks like to be called every time notifier
list changed, but actually it is for monitoring the flags.

Thanks,

-- peterx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/3] memory: generalize iommu_ops.notify_started to notifier_add
  2016-09-08 10:22       ` Peter Xu
@ 2016-09-12  1:17         ` David Gibson
  2016-09-12  5:54           ` Peter Xu
  0 siblings, 1 reply; 15+ messages in thread
From: David Gibson @ 2016-09-12  1:17 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, qemu-devel, mst, jasowang, vkaplans,
	alex.williamson, wexu, cornelia.huck, dgibson

[-- Attachment #1: Type: text/plain, Size: 2011 bytes --]

On Thu, Sep 08, 2016 at 06:22:40PM +0800, Peter Xu wrote:
> On Wed, Sep 07, 2016 at 12:54:23PM +0200, Paolo Bonzini wrote:
> > 
> > 
> > On 07/09/2016 08:05, David Gibson wrote:
> > > On Wed, Sep 07, 2016 at 01:32:23PM +0800, Peter Xu wrote:
> > >> Considering that we may have multiple IOMMU notifier consumers in the
> > >> future, converting iommu_ops.notify_{started|stopped} into some more
> > >> general form. Now we can trap all notifier registerations and
> > >> deregistrations, rather than only the first ones.
> > >>
> > >> Power was leveraging the notifier_{started|stopped}, adding iommu_user
> > >> field for counting on Power guests to achieve the same goal.
> > > 
> > > Requiring each vIOMMU frontend to reference count or whatever seems
> > > like a pain.  The semantics of notify_started() were designed to avoid
> > > that.
> > > 
> > > Instead I'd suggest a callback which gets tripped any time the logical
> > > OR of the requested notifications for all current notifiers changes.
> > 
> > I like the suggestion.  Alternatively you could call notify_stopped if
> > old & ~new is nonzero (and you pass old & ~new), and notify_started if
> > new & ~old is nonzero (and you pass new & ~old).
> 
> I think now I understand the point... Then I'd prefer to use David's
> suggestion. A single notify_changed() looks cleaner. To be more
> explicit, I would prefer to rename it to notifier_flag_changed(),
> since notify_changed() looks like to be called every time notifier
> list changed, but actually it is for monitoring the flags.

That sounds reasonable.  I think notifier_flag_changed() should be
passed both the old and new flags, to save the backend having to keep
track of the old ones - which flags have changed might affect what the
callback needs to do.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/3] memory: generalize iommu_ops.notify_started to notifier_add
  2016-09-12  1:17         ` David Gibson
@ 2016-09-12  5:54           ` Peter Xu
  0 siblings, 0 replies; 15+ messages in thread
From: Peter Xu @ 2016-09-12  5:54 UTC (permalink / raw)
  To: David Gibson
  Cc: Paolo Bonzini, qemu-devel, mst, jasowang, vkaplans,
	alex.williamson, wexu, cornelia.huck, dgibson

On Mon, Sep 12, 2016 at 11:17:37AM +1000, David Gibson wrote:

[...]

> > I think now I understand the point... Then I'd prefer to use David's
> > suggestion. A single notify_changed() looks cleaner. To be more
> > explicit, I would prefer to rename it to notifier_flag_changed(),
> > since notify_changed() looks like to be called every time notifier
> > list changed, but actually it is for monitoring the flags.
> 
> That sounds reasonable.  I think notifier_flag_changed() should be
> passed both the old and new flags, to save the backend having to keep
> track of the old ones - which flags have changed might affect what the
> callback needs to do.

Agree. That's exactly what v4 was doing. Thanks,

-- peterx

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2016-09-12  5:54 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-07  5:32 [Qemu-devel] [PATCH v3 0/3] Introduce IOMMUNotifier struct Peter Xu
2016-09-07  5:32 ` [Qemu-devel] [PATCH v3 1/3] memory: introduce IOMMUNotifier and its caps Peter Xu
2016-09-07  6:02   ` David Gibson
2016-09-07  7:09     ` Peter Xu
2016-09-07 10:20       ` David Gibson
2016-09-08 10:00         ` Peter Xu
2016-09-07  5:32 ` [Qemu-devel] [PATCH v3 2/3] memory: generalize iommu_ops.notify_started to notifier_add Peter Xu
2016-09-07  6:05   ` David Gibson
2016-09-07  7:23     ` Peter Xu
2016-09-07 10:23       ` David Gibson
2016-09-07 10:54     ` Paolo Bonzini
2016-09-08 10:22       ` Peter Xu
2016-09-12  1:17         ` David Gibson
2016-09-12  5:54           ` Peter Xu
2016-09-07  5:32 ` [Qemu-devel] [PATCH v3 3/3] intel_iommu: allow invalidation typed notifiers Peter Xu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.