* [Qemu-devel] [PATCH v2 0/6] ARM SMMUv3: Fix spurious notification errors and stall with vfio-pci
@ 2019-07-01  9:30 Eric Auger
  2019-07-01  9:30 ` [Qemu-devel] [PATCH v2 1/6] memory: Remove unused memory_region_iommu_replay_all() Eric Auger
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Eric Auger @ 2019-07-01  9:30 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	peterx, pbonzini, alex.williamson

This series fixes the guest stall observed when running a guest that
is exposed to an SMMUv3 together with a VFIO-PCI device. As a reminder,
SMMUv3 is not yet integrated with VFIO (the device will not work
properly), but this shouldn't prevent the guest from booting.

It also silences some spurious translation configuration decoding
errors (STE out of span or invalid STE) that may happen on guest IOVA
invalidation notifications.

Best Regards

Eric

History:

v1 -> v2:
- Added "memory: Remove unused memory_region_iommu_replay_all()" &
  "hw/arm/smmuv3: Log a guest error when decoding an invalid STE"
- do not attempt to implement a replay callback; instead skip the
  replay call when it is not needed
- explain why we do not remove other log messages on config decoding

Eric Auger (6):
  memory: Remove unused memory_region_iommu_replay_all()
  memory: Add IOMMU_ATTR_VFIO_NESTED IOMMU memory region attribute
  hw/vfio/common: Do not replay IOMMU mappings in nested case
  hw/arm/smmuv3: Advertise VFIO_NESTED
  hw/arm/smmuv3: Log a guest error when decoding an invalid STE
  hw/arm/smmuv3: Remove spurious error messages on IOVA invalidations

 hw/arm/smmuv3-internal.h |  1 +
 hw/arm/smmuv3.c          | 26 ++++++++++++++++++++------
 hw/vfio/common.c         |  7 ++++++-
 include/exec/memory.h    | 13 ++-----------
 memory.c                 |  9 ---------
 5 files changed, 29 insertions(+), 27 deletions(-)

-- 
2.20.1




* [Qemu-devel] [PATCH v2 1/6] memory: Remove unused memory_region_iommu_replay_all()
  2019-07-01  9:30 [Qemu-devel] [PATCH v2 0/6] ARM SMMUv3: Fix spurious notification errors and stall with vfio-pci Eric Auger
@ 2019-07-01  9:30 ` Eric Auger
  2019-07-01  9:58   ` Philippe Mathieu-Daudé
  2019-07-03  5:41   ` Peter Xu
  2019-07-01  9:30 ` [Qemu-devel] [PATCH v2 2/6] memory: Add IOMMU_ATTR_VFIO_NESTED IOMMU memory region attribute Eric Auger
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 17+ messages in thread
From: Eric Auger @ 2019-07-01  9:30 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	peterx, pbonzini, alex.williamson

memory_region_iommu_replay_all is not used. Remove it.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reported-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/exec/memory.h | 10 ----------
 memory.c              |  9 ---------
 2 files changed, 19 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index e6140e8a04..bdd76653a8 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1076,16 +1076,6 @@ void memory_region_register_iommu_notifier(MemoryRegion *mr,
  */
 void memory_region_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n);
 
-/**
- * memory_region_iommu_replay_all: replay existing IOMMU translations
- * to all the notifiers registered.
- *
- * Note: this is not related to record-and-replay functionality.
- *
- * @iommu_mr: the memory region to observe
- */
-void memory_region_iommu_replay_all(IOMMUMemoryRegion *iommu_mr);
-
 /**
  * memory_region_unregister_iommu_notifier: unregister a notifier for
  * changes to IOMMU translation entries.
diff --git a/memory.c b/memory.c
index 0a089a73ae..290a1493ef 100644
--- a/memory.c
+++ b/memory.c
@@ -1910,15 +1910,6 @@ void memory_region_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
     }
 }
 
-void memory_region_iommu_replay_all(IOMMUMemoryRegion *iommu_mr)
-{
-    IOMMUNotifier *notifier;
-
-    IOMMU_NOTIFIER_FOREACH(notifier, iommu_mr) {
-        memory_region_iommu_replay(iommu_mr, notifier);
-    }
-}
-
 void memory_region_unregister_iommu_notifier(MemoryRegion *mr,
                                              IOMMUNotifier *n)
 {
-- 
2.20.1




* [Qemu-devel] [PATCH v2 2/6] memory: Add IOMMU_ATTR_VFIO_NESTED IOMMU memory region attribute
  2019-07-01  9:30 [Qemu-devel] [PATCH v2 0/6] ARM SMMUv3: Fix spurious notification errors and stall with vfio-pci Eric Auger
  2019-07-01  9:30 ` [Qemu-devel] [PATCH v2 1/6] memory: Remove unused memory_region_iommu_replay_all() Eric Auger
@ 2019-07-01  9:30 ` Eric Auger
  2019-07-03  5:42   ` Peter Xu
  2019-07-01  9:30 ` [Qemu-devel] [PATCH v2 3/6] hw/vfio/common: Do not replay IOMMU mappings in nested case Eric Auger
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Eric Auger @ 2019-07-01  9:30 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	peterx, pbonzini, alex.williamson

We introduce a new IOMMU Memory Region attribute,
IOMMU_ATTR_VFIO_NESTED, that tells whether the virtual IOMMU
requires physical nested stages for VFIO integration.

The current Intel virtual IOMMU device supports "Caching
Mode" and does not require two stages at the physical level to be
integrated with VFIO. However, SMMUv3 does not implement such a
"caching mode" and requires the use of physical stage 1 for VFIO
integration.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 include/exec/memory.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index bdd76653a8..dd7ef23f96 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -204,7 +204,8 @@ struct MemoryRegionOps {
 };
 
 enum IOMMUMemoryRegionAttr {
-    IOMMU_ATTR_SPAPR_TCE_FD
+    IOMMU_ATTR_SPAPR_TCE_FD,
+    IOMMU_ATTR_VFIO_NESTED,
 };
 
 /**
-- 
2.20.1




* [Qemu-devel] [PATCH v2 3/6] hw/vfio/common: Do not replay IOMMU mappings in nested case
  2019-07-01  9:30 [Qemu-devel] [PATCH v2 0/6] ARM SMMUv3: Fix spurious notification errors and stall with vfio-pci Eric Auger
  2019-07-01  9:30 ` [Qemu-devel] [PATCH v2 1/6] memory: Remove unused memory_region_iommu_replay_all() Eric Auger
  2019-07-01  9:30 ` [Qemu-devel] [PATCH v2 2/6] memory: Add IOMMU_ATTR_VFIO_NESTED IOMMU memory region attribute Eric Auger
@ 2019-07-01  9:30 ` Eric Auger
  2019-07-03  5:41   ` Peter Xu
  2019-07-01  9:30 ` [Qemu-devel] [PATCH v2 4/6] hw/arm/smmuv3: Advertise VFIO_NESTED Eric Auger
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Eric Auger @ 2019-07-01  9:30 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	peterx, pbonzini, alex.williamson

In nested mode, the stage 1 translation tables are owned by
the guest and there is no caching on host side. So there is
no need to replay the mappings.

As of today, the SMMUv3 nested mode is not yet implemented
and there is no functional VFIO integration without it. But
keeping the replay call would execute the default implementation
of memory_region_iommu_replay and attempt to translate the whole
address range, completely stalling QEMU. Keeping the MAP/UNMAP
notifier registration makes it possible to hit a warning message
in the SMMUv3 device that tells the user which VFIO device will not
function properly:

"qemu-system-aarch64: -device vfio-pci,host=0000:89:00.0: warning:
SMMUv3 does not support notification on MAP: device vfio-pci will not
function properly"

Besides, removing the replay call now allows the guest to boot.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
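Note: the decision described above can be modelled outside of QEMU.
The following is only a minimal sketch, assuming simplified stand-in
types: ToyIommu, toy_get_attr(), toy_replay() and toy_region_add() are
hypothetical names used for illustration, not QEMU APIs. It also
assumes that an IOMMU model without a get_attr callback makes the
query fail, which leaves "nested" at its default value of false.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for QEMU's types; not the real definitions. */
enum ToyIommuAttr {
    TOY_ATTR_SPAPR_TCE_FD,
    TOY_ATTR_VFIO_NESTED,
};

typedef struct ToyIommu ToyIommu;
struct ToyIommu {
    /* Optional callback; NULL means the model exposes no attributes. */
    int (*get_attr)(ToyIommu *iommu, enum ToyIommuAttr attr, void *data);
    int replay_count;   /* number of replays requested so far */
};

/* Query an attribute; fails with -EINVAL when no callback is set. */
static int toy_get_attr(ToyIommu *iommu, enum ToyIommuAttr attr, void *data)
{
    if (!iommu->get_attr) {
        return -EINVAL;
    }
    return iommu->get_attr(iommu, attr, data);
}

/* Callback of an IOMMU model that advertises the nested attribute. */
static int nested_get_attr(ToyIommu *iommu, enum ToyIommuAttr attr,
                           void *data)
{
    (void)iommu;
    assert(data != NULL);
    if (attr == TOY_ATTR_VFIO_NESTED) {
        *(bool *)data = true;
        return 0;
    }
    return -EINVAL;
}

/* Stand-in for the (potentially very slow) replay of all mappings. */
static void toy_replay(ToyIommu *iommu)
{
    iommu->replay_count++;
}

/* Returns true when a replay was performed, false when it was skipped. */
static bool toy_region_add(ToyIommu *iommu)
{
    bool nested = false;

    /* The return value is ignored on purpose: on failure, nested stays false. */
    toy_get_attr(iommu, TOY_ATTR_VFIO_NESTED, &nested);
    if (!nested) {
        toy_replay(iommu);
        return true;
    }
    return false;
}
```

With these definitions, toy_region_add() replays once for a model that
has no get_attr callback and never for a model that uses
nested_get_attr. Defaulting "nested" to false keeps the existing
behaviour for every IOMMU that does not know about the attribute.
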
 hw/vfio/common.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index a859298fda..9ea58df67a 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -604,6 +604,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
     if (memory_region_is_iommu(section->mr)) {
         VFIOGuestIOMMU *giommu;
         IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(section->mr);
+        bool nested = false;
         int iommu_idx;
 
         trace_vfio_listener_region_add_iommu(iova, end);
@@ -631,8 +632,12 @@ static void vfio_listener_region_add(MemoryListener *listener,
         QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
 
         memory_region_register_iommu_notifier(section->mr, &giommu->n);
-        memory_region_iommu_replay(giommu->iommu, &giommu->n);
 
+        memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_VFIO_NESTED,
+                                     (void *)&nested);
+        if (!nested) {
+            memory_region_iommu_replay(iommu_mr, &giommu->n);
+        }
         return;
     }
 
-- 
2.20.1




* [Qemu-devel] [PATCH v2 4/6] hw/arm/smmuv3: Advertise VFIO_NESTED
  2019-07-01  9:30 [Qemu-devel] [PATCH v2 0/6] ARM SMMUv3: Fix spurious notification errors and stall with vfio-pci Eric Auger
                   ` (2 preceding siblings ...)
  2019-07-01  9:30 ` [Qemu-devel] [PATCH v2 3/6] hw/vfio/common: Do not replay IOMMU mappings in nested case Eric Auger
@ 2019-07-01  9:30 ` Eric Auger
  2019-07-01  9:30 ` [Qemu-devel] [PATCH v2 5/6] hw/arm/smmuv3: Log a guest error when decoding an invalid STE Eric Auger
  2019-07-01  9:30 ` [Qemu-devel] [PATCH v2 6/6] hw/arm/smmuv3: Remove spurious error messages on IOVA invalidations Eric Auger
  5 siblings, 0 replies; 17+ messages in thread
From: Eric Auger @ 2019-07-01  9:30 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	peterx, pbonzini, alex.williamson

Virtual SMMUv3 requires physical nested stages for VFIO integration.
Advertise this attribute.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
 hw/arm/smmuv3.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index e96d5beb9a..384c02cb91 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1490,6 +1490,17 @@ static void smmuv3_notify_flag_changed(IOMMUMemoryRegion *iommu,
     }
 }
 
+static int smmuv3_get_attr(IOMMUMemoryRegion *iommu,
+                           enum IOMMUMemoryRegionAttr attr,
+                           void *data)
+{
+    if (attr == IOMMU_ATTR_VFIO_NESTED) {
+        *(bool *) data = true;
+        return 0;
+    }
+    return -EINVAL;
+}
+
 static void smmuv3_iommu_memory_region_class_init(ObjectClass *klass,
                                                   void *data)
 {
@@ -1497,6 +1508,7 @@ static void smmuv3_iommu_memory_region_class_init(ObjectClass *klass,
 
     imrc->translate = smmuv3_translate;
     imrc->notify_flag_changed = smmuv3_notify_flag_changed;
+    imrc->get_attr = smmuv3_get_attr;
 }
 
 static const TypeInfo smmuv3_type_info = {
-- 
2.20.1




* [Qemu-devel] [PATCH v2 5/6] hw/arm/smmuv3: Log a guest error when decoding an invalid STE
  2019-07-01  9:30 [Qemu-devel] [PATCH v2 0/6] ARM SMMUv3: Fix spurious notification errors and stall with vfio-pci Eric Auger
                   ` (3 preceding siblings ...)
  2019-07-01  9:30 ` [Qemu-devel] [PATCH v2 4/6] hw/arm/smmuv3: Advertise VFIO_NESTED Eric Auger
@ 2019-07-01  9:30 ` Eric Auger
  2019-07-01  9:58   ` Philippe Mathieu-Daudé
  2019-07-01  9:30 ` [Qemu-devel] [PATCH v2 6/6] hw/arm/smmuv3: Remove spurious error messages on IOVA invalidations Eric Auger
  5 siblings, 1 reply; 17+ messages in thread
From: Eric Auger @ 2019-07-01  9:30 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	peterx, pbonzini, alex.williamson

Log a guest error when encountering an invalid STE.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/arm/smmuv3.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 384c02cb91..2e270a0f07 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -320,6 +320,7 @@ static int decode_ste(SMMUv3State *s, SMMUTransCfg *cfg,
     uint32_t config;
 
     if (!STE_VALID(ste)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "invalid STE\n");
         goto bad_ste;
     }
 
-- 
2.20.1




* [Qemu-devel] [PATCH v2 6/6] hw/arm/smmuv3: Remove spurious error messages on IOVA invalidations
  2019-07-01  9:30 [Qemu-devel] [PATCH v2 0/6] ARM SMMUv3: Fix spurious notification errors and stall with vfio-pci Eric Auger
                   ` (4 preceding siblings ...)
  2019-07-01  9:30 ` [Qemu-devel] [PATCH v2 5/6] hw/arm/smmuv3: Log a guest error when decoding an invalid STE Eric Auger
@ 2019-07-01  9:30 ` Eric Auger
  5 siblings, 0 replies; 17+ messages in thread
From: Eric Auger @ 2019-07-01  9:30 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	peterx, pbonzini, alex.williamson

An IOVA/ASID invalidation is notified to all IOMMU Memory Regions
through smmuv3_inv_notifiers_iova/smmuv3_notify_iova.

When the notification occurs it is possible that some of the
PCIe devices associated with the notified regions do not have a
valid stream table entry. In that case we output a LOG_GUEST_ERROR
message, for example:

"invalid sid=<SID> (L1STD span=0)"
"smmuv3_notify_iova error decoding the configuration for iommu mr=<MR>"

This is unfortunate as the user gets the impression that there
are some translation decoding errors when there are none.

This patch adds a new field, inval_ste_allowed, to SMMUEventInfo.
It tells whether the detection of an invalid STE may be silently
ignored instead of leading to an error report. inval_ste_allowed
is set before doing the invalidations and left unset on actual
translations.

The other configuration decoding error messages are kept since if the
STE is valid then the rest of the config must be correct.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v1 -> v2:
- explain why we keep the other config decoding errors
- handle the new guest error log on STE invalid
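
For illustration, the gating described in the commit message can be
sketched as standalone C. This is a simplified model under stated
assumptions, not the SMMUv3 code: ToyEventInfo, toy_log_guest_error()
and toy_decode_ste() are hypothetical names, and the guest error log
is replaced by a plain counter so the effect can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for SMMUEventInfo. */
typedef struct ToyEventInfo {
    bool inval_ste_allowed;  /* true on the invalidation path only */
} ToyEventInfo;

/* Counts the guest errors that would have been logged. */
static int guest_error_count;

static void toy_log_guest_error(const char *msg)
{
    (void)msg;
    guest_error_count++;
}

/*
 * Returns 0 when the STE is valid and -1 otherwise. An invalid STE is
 * only reported when the caller did not mark it as tolerated.
 */
static int toy_decode_ste(bool ste_valid, const ToyEventInfo *event)
{
    assert(event != NULL);
    if (!ste_valid) {
        if (!event->inval_ste_allowed) {
            toy_log_guest_error("invalid STE");
        }
        return -1;
    }
    return 0;
}
```

In this model the decode still fails in both cases; the flag only
controls whether the failure is reported. A caller on the translation
path would leave the flag false, whereas a caller on the invalidation
path would set it to true.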
---
 hw/arm/smmuv3-internal.h |  1 +
 hw/arm/smmuv3.c          | 15 ++++++++-------
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
index b160289cd1..d190181ef1 100644
--- a/hw/arm/smmuv3-internal.h
+++ b/hw/arm/smmuv3-internal.h
@@ -381,6 +381,7 @@ typedef struct SMMUEventInfo {
     uint32_t sid;
     bool recorded;
     bool record_trans_faults;
+    bool inval_ste_allowed;
     union {
         struct {
             uint32_t ssid;
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 2e270a0f07..517755aed5 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -320,7 +320,9 @@ static int decode_ste(SMMUv3State *s, SMMUTransCfg *cfg,
     uint32_t config;
 
     if (!STE_VALID(ste)) {
-        qemu_log_mask(LOG_GUEST_ERROR, "invalid STE\n");
+        if (!event->inval_ste_allowed) {
+            qemu_log_mask(LOG_GUEST_ERROR, "invalid STE\n");
+        }
         goto bad_ste;
     }
 
@@ -405,7 +407,7 @@ static int smmu_find_ste(SMMUv3State *s, uint32_t sid, STE *ste,
 
         span = L1STD_SPAN(&l1std);
 
-        if (!span) {
+        if (!span && !event->inval_ste_allowed) {
             /* l2ptr is not valid */
             qemu_log_mask(LOG_GUEST_ERROR,
                           "invalid sid=%d (L1STD span=0)\n", sid);
@@ -603,7 +605,9 @@ static IOMMUTLBEntry smmuv3_translate(IOMMUMemoryRegion *mr, hwaddr addr,
     SMMUDevice *sdev = container_of(mr, SMMUDevice, iommu);
     SMMUv3State *s = sdev->smmu;
     uint32_t sid = smmu_get_sid(sdev);
-    SMMUEventInfo event = {.type = SMMU_EVT_NONE, .sid = sid};
+    SMMUEventInfo event = {.type = SMMU_EVT_NONE,
+                           .sid = sid,
+                           .inval_ste_allowed = false};
     SMMUPTWEventInfo ptw_info = {};
     SMMUTranslationStatus status;
     SMMUState *bs = ARM_SMMU(s);
@@ -796,16 +800,13 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
                                dma_addr_t iova)
 {
     SMMUDevice *sdev = container_of(mr, SMMUDevice, iommu);
-    SMMUEventInfo event = {};
+    SMMUEventInfo event = {.inval_ste_allowed = true};
     SMMUTransTableInfo *tt;
     SMMUTransCfg *cfg;
     IOMMUTLBEntry entry;
 
     cfg = smmuv3_get_config(sdev, &event);
     if (!cfg) {
-        qemu_log_mask(LOG_GUEST_ERROR,
-                      "%s error decoding the configuration for iommu mr=%s\n",
-                      __func__, mr->parent_obj.name);
         return;
     }
 
-- 
2.20.1




* Re: [Qemu-devel] [PATCH v2 5/6] hw/arm/smmuv3: Log a guest error when decoding an invalid STE
  2019-07-01  9:30 ` [Qemu-devel] [PATCH v2 5/6] hw/arm/smmuv3: Log a guest error when decoding an invalid STE Eric Auger
@ 2019-07-01  9:58   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 17+ messages in thread
From: Philippe Mathieu-Daudé @ 2019-07-01  9:58 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	peterx, pbonzini, alex.williamson

On 7/1/19 11:30 AM, Eric Auger wrote:
> Log a guest error when encountering an invalid STE.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
>  hw/arm/smmuv3.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> index 384c02cb91..2e270a0f07 100644
> --- a/hw/arm/smmuv3.c
> +++ b/hw/arm/smmuv3.c
> @@ -320,6 +320,7 @@ static int decode_ste(SMMUv3State *s, SMMUTransCfg *cfg,
>      uint32_t config;
>  
>      if (!STE_VALID(ste)) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "invalid STE\n");
>          goto bad_ste;
>      }
>  
> 

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>



* Re: [Qemu-devel] [PATCH v2 1/6] memory: Remove unused memory_region_iommu_replay_all()
  2019-07-01  9:30 ` [Qemu-devel] [PATCH v2 1/6] memory: Remove unused memory_region_iommu_replay_all() Eric Auger
@ 2019-07-01  9:58   ` Philippe Mathieu-Daudé
  2019-07-03  5:41   ` Peter Xu
  1 sibling, 0 replies; 17+ messages in thread
From: Philippe Mathieu-Daudé @ 2019-07-01  9:58 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	peterx, pbonzini, alex.williamson

On 7/1/19 11:30 AM, Eric Auger wrote:
> memory_region_iommu_replay_all is not used. Remove it.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Reported-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  include/exec/memory.h | 10 ----------
>  memory.c              |  9 ---------
>  2 files changed, 19 deletions(-)
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index e6140e8a04..bdd76653a8 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -1076,16 +1076,6 @@ void memory_region_register_iommu_notifier(MemoryRegion *mr,
>   */
>  void memory_region_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n);
>  
> -/**
> - * memory_region_iommu_replay_all: replay existing IOMMU translations
> - * to all the notifiers registered.
> - *
> - * Note: this is not related to record-and-replay functionality.
> - *
> - * @iommu_mr: the memory region to observe
> - */
> -void memory_region_iommu_replay_all(IOMMUMemoryRegion *iommu_mr);
> -
>  /**
>   * memory_region_unregister_iommu_notifier: unregister a notifier for
>   * changes to IOMMU translation entries.
> diff --git a/memory.c b/memory.c
> index 0a089a73ae..290a1493ef 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1910,15 +1910,6 @@ void memory_region_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
>      }
>  }
>  
> -void memory_region_iommu_replay_all(IOMMUMemoryRegion *iommu_mr)
> -{
> -    IOMMUNotifier *notifier;
> -
> -    IOMMU_NOTIFIER_FOREACH(notifier, iommu_mr) {
> -        memory_region_iommu_replay(iommu_mr, notifier);
> -    }
> -}
> -
>  void memory_region_unregister_iommu_notifier(MemoryRegion *mr,
>                                               IOMMUNotifier *n)
>  {
> 

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>



* Re: [Qemu-devel] [PATCH v2 3/6] hw/vfio/common: Do not replay IOMMU mappings in nested case
  2019-07-01  9:30 ` [Qemu-devel] [PATCH v2 3/6] hw/vfio/common: Do not replay IOMMU mappings in nested case Eric Auger
@ 2019-07-03  5:41   ` Peter Xu
  2019-07-03  9:04     ` Auger Eric
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Xu @ 2019-07-03  5:41 UTC (permalink / raw)
  To: Eric Auger
  Cc: peter.maydell, qemu-devel, alex.williamson, qemu-arm, pbonzini,
	eric.auger.pro

On Mon, Jul 01, 2019 at 11:30:31AM +0200, Eric Auger wrote:
> In nested mode, the stage 1 translation tables are owned by
> the guest and there is no caching on host side. So there is
> no need to replay the mappings.
> 
> As of today, the SMMUv3 nested mode is not yet implemented
> and there is no functional VFIO integration without. But
> keeping the replay call would execute the default implementation
> of memory_region_iommu_replay and attempt to translate the whole
> address range, completely stalling qemu. Keeping the MAP/UNMAP
> notifier registration allows to hit a warning message in the
> SMMUv3 device that tells the user which VFIO device will not
> function properly:
> 
> "qemu-system-aarch64: -device vfio-pci,host=0000:89:00.0: warning:
> SMMUv3 does not support notification on MAP: device vfio-pci will not
> function properly"
> 
> Besides, removing the replay call now allows the guest to boot.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
>  hw/vfio/common.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index a859298fda..9ea58df67a 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -604,6 +604,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>      if (memory_region_is_iommu(section->mr)) {
>          VFIOGuestIOMMU *giommu;
>          IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(section->mr);
> +        bool nested = false;
>          int iommu_idx;
>  
>          trace_vfio_listener_region_add_iommu(iova, end);
> @@ -631,8 +632,12 @@ static void vfio_listener_region_add(MemoryListener *listener,
>          QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
>  
>          memory_region_register_iommu_notifier(section->mr, &giommu->n);
> -        memory_region_iommu_replay(giommu->iommu, &giommu->n);
>  
> +        memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_VFIO_NESTED,
> +                                     (void *)&nested);
> +        if (!nested) {
> +            memory_region_iommu_replay(iommu_mr, &giommu->n);
> +        }

For nested, do we need these IOMMU notifiers after all?

I'm asking because the no-IOMMU case of vfio_listener_region_add()
seems to suit nested page tables very well to me.  For example,
vfio does not need to listen to MAP events any more because we'll
simply share the guest IOMMU page table to be the 1st level page table
of the host SMMU IIUC.  And if we have 2nd page table changes (like
memory hotplug) then IMHO vfio_listener_region_add() will do this for
us as well just like when there's no SMMU.

Another thing is that IOMMU_ATTR_VFIO_NESTED will be the same for all
the memory regions, so it also seems a bit awkward to make it per
memory region.  If you see the other real user of this flag (which is
IOMMU_ATTR_SPAPR_TCE_FD) it's per memory region.

Regards,

-- 
Peter Xu



* Re: [Qemu-devel] [PATCH v2 1/6] memory: Remove unused memory_region_iommu_replay_all()
  2019-07-01  9:30 ` [Qemu-devel] [PATCH v2 1/6] memory: Remove unused memory_region_iommu_replay_all() Eric Auger
  2019-07-01  9:58   ` Philippe Mathieu-Daudé
@ 2019-07-03  5:41   ` Peter Xu
  1 sibling, 0 replies; 17+ messages in thread
From: Peter Xu @ 2019-07-03  5:41 UTC (permalink / raw)
  To: Eric Auger
  Cc: peter.maydell, qemu-devel, alex.williamson, qemu-arm, pbonzini,
	eric.auger.pro

On Mon, Jul 01, 2019 at 11:30:29AM +0200, Eric Auger wrote:
> memory_region_iommu_replay_all is not used. Remove it.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Reported-by: Peter Maydell <peter.maydell@linaro.org>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



* Re: [Qemu-devel] [PATCH v2 2/6] memory: Add IOMMU_ATTR_VFIO_NESTED IOMMU memory region attribute
  2019-07-01  9:30 ` [Qemu-devel] [PATCH v2 2/6] memory: Add IOMMU_ATTR_VFIO_NESTED IOMMU memory region attribute Eric Auger
@ 2019-07-03  5:42   ` Peter Xu
  2019-07-03  9:10     ` Auger Eric
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Xu @ 2019-07-03  5:42 UTC (permalink / raw)
  To: Eric Auger
  Cc: peter.maydell, qemu-devel, alex.williamson, qemu-arm, pbonzini,
	eric.auger.pro

On Mon, Jul 01, 2019 at 11:30:30AM +0200, Eric Auger wrote:
> We introduce a new IOMMU Memory Region attribute,
> IOMMU_ATTR_VFIO_NESTED that tells whether the virtual IOMMU
> requires physical nested stages for VFIO integration.
> 
> Current Intel virtual IOMMU device supports "Caching
> Mode" and does not require 2 stages at physical level to be
> integrated with VFIO. However SMMUv3 does not implement such
> "caching mode" and requires to use physical stage 1 for VFIO
> integration.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
>  include/exec/memory.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index bdd76653a8..dd7ef23f96 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -204,7 +204,8 @@ struct MemoryRegionOps {
>  };
>  
>  enum IOMMUMemoryRegionAttr {
> -    IOMMU_ATTR_SPAPR_TCE_FD
> +    IOMMU_ATTR_SPAPR_TCE_FD,
> +    IOMMU_ATTR_VFIO_NESTED,

IMHO it'll be better if this patch can be squashed into the first user
of the new flag to better clarify itself on why it will be needed (if
finally we still would like to have this flag).

Regards,

-- 
Peter Xu



* Re: [Qemu-devel] [PATCH v2 3/6] hw/vfio/common: Do not replay IOMMU mappings in nested case
  2019-07-03  5:41   ` Peter Xu
@ 2019-07-03  9:04     ` Auger Eric
  2019-07-03 10:21       ` Peter Xu
  0 siblings, 1 reply; 17+ messages in thread
From: Auger Eric @ 2019-07-03  9:04 UTC (permalink / raw)
  To: Peter Xu
  Cc: peter.maydell, qemu-devel, alex.williamson, qemu-arm, pbonzini,
	eric.auger.pro

Hi Peter,

On 7/3/19 7:41 AM, Peter Xu wrote:
> On Mon, Jul 01, 2019 at 11:30:31AM +0200, Eric Auger wrote:
>> In nested mode, the stage 1 translation tables are owned by
>> the guest and there is no caching on host side. So there is
>> no need to replay the mappings.
>>
>> As of today, the SMMUv3 nested mode is not yet implemented
>> and there is no functional VFIO integration without. But
>> keeping the replay call would execute the default implementation
>> of memory_region_iommu_replay and attempt to translate the whole
>> address range, completely stalling qemu. Keeping the MAP/UNMAP
>> notifier registration allows to hit a warning message in the
>> SMMUv3 device that tells the user which VFIO device will not
>> function properly:
>>
>> "qemu-system-aarch64: -device vfio-pci,host=0000:89:00.0: warning:
>> SMMUv3 does not support notification on MAP: device vfio-pci will not
>> function properly"
>>
>> Besides, removing the replay call now allows the guest to boot.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> ---
>>  hw/vfio/common.c | 7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index a859298fda..9ea58df67a 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -604,6 +604,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>>      if (memory_region_is_iommu(section->mr)) {
>>          VFIOGuestIOMMU *giommu;
>>          IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(section->mr);
>> +        bool nested = false;
>>          int iommu_idx;
>>  
>>          trace_vfio_listener_region_add_iommu(iova, end);
>> @@ -631,8 +632,12 @@ static void vfio_listener_region_add(MemoryListener *listener,
>>          QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
>>  
>>          memory_region_register_iommu_notifier(section->mr, &giommu->n);
>> -        memory_region_iommu_replay(giommu->iommu, &giommu->n);
>>  
>> +        memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_VFIO_NESTED,
>> +                                     (void *)&nested);
>> +        if (!nested) {
>> +            memory_region_iommu_replay(iommu_mr, &giommu->n);
>> +        }
> 
> For nested, do we need these IOMMU notifiers after all?
> 
> I'm asking because the no-IOMMU case of vfio_listener_region_add()
> seems to suite very well for nested page tables to me.  For example,
> vfio does not need to listen to MAP events any more because we'll
> simply share the guest IOMMU page table to be the 1st level page table
> of the host SMMU IIUC.
We don't need the MAP notifier but we need the UNMAP notifier: when the
guest invalidates an ASID/IOVA we need to propagate this to the physical
IOMMU.

As mentioned in the cover letter, at the moment, I still register both
MAP/UNMAP notifiers as the MAP notifier registration produces an
explicit warning message in the SMMUv3 device. If I remove the
registration we will lose this message. I hope this code is just an
intermediate state towards the actual nested stage support.
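
To illustrate the MAP versus UNMAP point, here is a minimal sketch. It
assumes hypothetical flag names and values (they are not copied from
QEMU's headers), and toy_required_flags() and toy_requests_map() are
made-up helpers for illustration only.

```c
#include <stdbool.h>

/* Hypothetical notifier flags, chosen for illustration only. */
enum ToyNotifierFlag {
    TOY_NOTIFIER_NONE  = 0,
    TOY_NOTIFIER_UNMAP = 1 << 0,
    TOY_NOTIFIER_MAP   = 1 << 1,
};

/* Flags a VFIO-like listener strictly needs, depending on the mode. */
static int toy_required_flags(bool nested)
{
    if (nested) {
        /* Guest owns stage 1: only invalidations must reach the host. */
        return TOY_NOTIFIER_UNMAP;
    }
    /* Shadowing mode: new mappings and invalidations are both needed. */
    return TOY_NOTIFIER_MAP | TOY_NOTIFIER_UNMAP;
}

/* True when the flag set includes MAP, the case that triggers the warning. */
static bool toy_requests_map(int flags)
{
    return (flags & TOY_NOTIFIER_MAP) != 0;
}
```

In this model, registering only toy_required_flags(true) would never
request MAP, so the warning would not be produced; registering both
flags, as the series currently does, keeps it.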

> And if we have 2nd page table changes (like
> memory hotplug) then IMHO vfio_listener_region_add() will do this for
> us as well just like when there's no SMMU.

In the current integration, see [RFC v4 20/27] hw/vfio/common: Setup
nested stage mappings (https://patchwork.kernel.org/patch/10962721/) I
use a prereg_listener for stage 2 mappings.
> 
> Another thing is that IOMMU_ATTR_VFIO_NESTED will be the same for all
> the memory regions, so it also seems a bit awkward to make it per
> memory region.  If you see the other real user of this flag (which is
> IOMMU_ATTR_SPAPR_TCE_FD) it's per memory region.

That's correct: all SMMUv3 regions will return this value. But what
other API can be used to query IOMMU level attributes?

On the other hand,

Alexey's commit f1334de60b2 ("memory/iommu: Add get_attr()") says:
    This adds get_attr() to IOMMUMemoryRegionClass, like
    iommu_ops::domain_get_attr in the Linux kernel.

and DOMAIN_ATTR_NESTING is part of enum iommu_attr at kernel level.

Thanks

Eric



> 
> Regards,
> 



* Re: [Qemu-devel] [PATCH v2 2/6] memory: Add IOMMU_ATTR_VFIO_NESTED IOMMU memory region attribute
  2019-07-03  5:42   ` Peter Xu
@ 2019-07-03  9:10     ` Auger Eric
  0 siblings, 0 replies; 17+ messages in thread
From: Auger Eric @ 2019-07-03  9:10 UTC (permalink / raw)
  To: Peter Xu
  Cc: peter.maydell, qemu-devel, alex.williamson, qemu-arm, pbonzini,
	eric.auger.pro

Hi Peter,

On 7/3/19 7:42 AM, Peter Xu wrote:
> On Mon, Jul 01, 2019 at 11:30:30AM +0200, Eric Auger wrote:
>> We introduce a new IOMMU Memory Region attribute,
>> IOMMU_ATTR_VFIO_NESTED that tells whether the virtual IOMMU
>> requires physical nested stages for VFIO integration.
>>
>> Current Intel virtual IOMMU device supports "Caching
>> Mode" and does not require 2 stages at physical level to be
>> integrated with VFIO. However SMMUv3 does not implement such
>> "caching mode" and requires to use physical stage 1 for VFIO
>> integration.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> ---
>>  include/exec/memory.h | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>> index bdd76653a8..dd7ef23f96 100644
>> --- a/include/exec/memory.h
>> +++ b/include/exec/memory.h
>> @@ -204,7 +204,8 @@ struct MemoryRegionOps {
>>  };
>>  
>>  enum IOMMUMemoryRegionAttr {
>> -    IOMMU_ATTR_SPAPR_TCE_FD
>> +    IOMMU_ATTR_SPAPR_TCE_FD,
>> +    IOMMU_ATTR_VFIO_NESTED,
> 
> IMHO it'll be better if this patch can be squashed into the first user
> of the new flag to better clarify itself on why it will be needed (if
> finally we still would like to have this flag).
Sure, I will squash it.

Nested mode requires significant adaptations in the current
hw/vfio/common.c code to register specific notifiers: UNMAP, config
change, and MSI binding notifiers (this last one actually uses a MAP
notifier, by the way). So one way or another we need to recognize there
that an IOMMU works in nested mode.

Thanks

Eric
> 
> Regards,
> 



* Re: [Qemu-devel] [PATCH v2 3/6] hw/vfio/common: Do not replay IOMMU mappings in nested case
  2019-07-03  9:04     ` Auger Eric
@ 2019-07-03 10:21       ` Peter Xu
  2019-07-03 10:45         ` Auger Eric
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Xu @ 2019-07-03 10:21 UTC (permalink / raw)
  To: Auger Eric
  Cc: peter.maydell, qemu-devel, alex.williamson, qemu-arm, pbonzini,
	eric.auger.pro

On Wed, Jul 03, 2019 at 11:04:38AM +0200, Auger Eric wrote:
> Hi Peter,

Hi, Eric,

> 
> On 7/3/19 7:41 AM, Peter Xu wrote:
> > On Mon, Jul 01, 2019 at 11:30:31AM +0200, Eric Auger wrote:
> >> In nested mode, the stage 1 translation tables are owned by
> >> the guest and there is no caching on host side. So there is
> >> no need to replay the mappings.
> >>
> >> As of today, the SMMUv3 nested mode is not yet implemented
> >> and there is no functional VFIO integration without. But
> >> keeping the replay call would execute the default implementation
> >> of memory_region_iommu_replay and attempt to translate the whole
> >> address range, completely stalling qemu. Keeping the MAP/UNMAP
> >> notifier registration allows to hit a warning message in the
> >> SMMUv3 device that tells the user which VFIO device will not
> >> function properly:
> >>
> >> "qemu-system-aarch64: -device vfio-pci,host=0000:89:00.0: warning:
> >> SMMUv3 does not support notification on MAP: device vfio-pci will not
> >> function properly"
> >>
> >> Besides, removing the replay call now allows the guest to boot.
> >>
> >> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >> ---
> >>  hw/vfio/common.c | 7 ++++++-
> >>  1 file changed, 6 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> >> index a859298fda..9ea58df67a 100644
> >> --- a/hw/vfio/common.c
> >> +++ b/hw/vfio/common.c
> >> @@ -604,6 +604,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
> >>      if (memory_region_is_iommu(section->mr)) {
> >>          VFIOGuestIOMMU *giommu;
> >>          IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(section->mr);
> >> +        bool nested = false;
> >>          int iommu_idx;
> >>  
> >>          trace_vfio_listener_region_add_iommu(iova, end);
> >> @@ -631,8 +632,12 @@ static void vfio_listener_region_add(MemoryListener *listener,
> >>          QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
> >>  
> >>          memory_region_register_iommu_notifier(section->mr, &giommu->n);
> >> -        memory_region_iommu_replay(giommu->iommu, &giommu->n);
> >>  
> >> +        memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_VFIO_NESTED,
> >> +                                     (void *)&nested);
> >> +        if (!nested) {
> >> +            memory_region_iommu_replay(iommu_mr, &giommu->n);
> >> +        }
> > 
> > For nested, do we need these IOMMU notifiers after all?
> > 
> > I'm asking because the no-IOMMU case of vfio_listener_region_add()
> > seems to suit very well for nested page tables to me.  For example,
> > vfio does not need to listen to MAP events any more because we'll
> > simply share the guest IOMMU page table to be the 1st level page table
> > of the host SMMU IIUC.
> We don't need the MAP notifier but we need the UNMAP notifier: when the
> guest invalidates an ASID/IOVA we need to propagate this to the physical
> IOMMU.

Indeed we need the unmaps.  However I've got a major confusion here:
With nested mode, we should need unmap events for the 1st level rather
than the 2nd level, am I right?  I mean, the invalidate request should
be a GVA range rather than GPA range?  While here IIUC
vfio_listener_region_add() should be working on GPA address space.

I don't know SMMU well enough, but for Intel there should be two
different kinds of invalidation messages.  Because we still don't
support nested mode on Intel, the 1st level invalidation
(VTD_INV_DESC_PIOTLB) is not yet implemented.  And IMHO if it is going to
be implemented, I think it should be different from the current
IOMMU_NOTIFIER_UNMAP in that it should not even need to bind to a
memory region, and modules like vfio should simply deliver that exact
message to the host IOMMU driver for the GVA range to be invalidated,
just like what it will do with the root pointer of guest 1st level
page table.

> 
> As mentioned in the cover letter, at the moment, I still register both
> MAP/UNMAP notifiers as the MAP notifier registration produces an
> explicit warning message in the SMMUv3 device. If I remove the
> registration we will lose this message. I hope this code is just an
> intermediate state towards the actual nested stage support.

I didn't see it in the cover letter.  Would you please provide a link
to the message?

> 
>   And if we have 2nd page table changes (like
> > memory hotplug) then IMHO vfio_listener_region_add() will do this for
> > us as well just like when there's no SMMU.
> 
> In the current integration, see [RFC v4 20/27] hw/vfio/common: Setup
> nested stage mappings (https://patchwork.kernel.org/patch/10962721/) I
> use a prereg_listener for stage 2 mappings.
> > 
> > Another thing is that IOMMU_ATTR_VFIO_NESTED will be the same for all
> > the memory regions, so it also seems a bit awkward to make it per
> > memory region.  If you see the other real user of this flag (which is
> > IOMMU_ATTR_SPAPR_TCE_FD) it's per memory region.
> 
> That's correct: all SMMUv3 regions will return this value. But what other
> API can be used to query IOMMU level attributes?
> 
> On the other hand,
> 
> Alexey's commit f1334de60b2 ("memory/iommu: Add get_attr()") says:
>     This adds get_attr() to IOMMUMemoryRegionClass, like
>     iommu_ops::domain_get_attr in the Linux kernel.
> 
> and DOMAIN_ATTR_NESTING is part of enum iommu_attr at kernel level.

Yeah it's fine to me.

Thanks,

-- 
Peter Xu



* Re: [Qemu-devel] [PATCH v2 3/6] hw/vfio/common: Do not replay IOMMU mappings in nested case
  2019-07-03 10:21       ` Peter Xu
@ 2019-07-03 10:45         ` Auger Eric
  2019-07-04  2:36           ` Peter Xu
  0 siblings, 1 reply; 17+ messages in thread
From: Auger Eric @ 2019-07-03 10:45 UTC (permalink / raw)
  To: Peter Xu
  Cc: peter.maydell, qemu-devel, alex.williamson, qemu-arm, pbonzini,
	eric.auger.pro

Hi Peter,
On 7/3/19 12:21 PM, Peter Xu wrote:
> On Wed, Jul 03, 2019 at 11:04:38AM +0200, Auger Eric wrote:
>> Hi Peter,
> 
> Hi, Eric,
> 
>>
>> On 7/3/19 7:41 AM, Peter Xu wrote:
>>> On Mon, Jul 01, 2019 at 11:30:31AM +0200, Eric Auger wrote:
>>>> In nested mode, the stage 1 translation tables are owned by
>>>> the guest and there is no caching on host side. So there is
>>>> no need to replay the mappings.
>>>>
>>>> As of today, the SMMUv3 nested mode is not yet implemented
>>>> and there is no functional VFIO integration without. But
>>>> keeping the replay call would execute the default implementation
>>>> of memory_region_iommu_replay and attempt to translate the whole
>>>> address range, completely stalling qemu. Keeping the MAP/UNMAP
>>>> notifier registration allows to hit a warning message in the
>>>> SMMUv3 device that tells the user which VFIO device will not
>>>> function properly:
>>>>
>>>> "qemu-system-aarch64: -device vfio-pci,host=0000:89:00.0: warning:
>>>> SMMUv3 does not support notification on MAP: device vfio-pci will not
>>>> function properly"
>>>>
>>>> Besides, removing the replay call now allows the guest to boot.
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>> ---
>>>>  hw/vfio/common.c | 7 ++++++-
>>>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>>> index a859298fda..9ea58df67a 100644
>>>> --- a/hw/vfio/common.c
>>>> +++ b/hw/vfio/common.c
>>>> @@ -604,6 +604,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>>>>      if (memory_region_is_iommu(section->mr)) {
>>>>          VFIOGuestIOMMU *giommu;
>>>>          IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(section->mr);
>>>> +        bool nested = false;
>>>>          int iommu_idx;
>>>>  
>>>>          trace_vfio_listener_region_add_iommu(iova, end);
>>>> @@ -631,8 +632,12 @@ static void vfio_listener_region_add(MemoryListener *listener,
>>>>          QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
>>>>  
>>>>          memory_region_register_iommu_notifier(section->mr, &giommu->n);
>>>> -        memory_region_iommu_replay(giommu->iommu, &giommu->n);
>>>>  
>>>> +        memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_VFIO_NESTED,
>>>> +                                     (void *)&nested);
>>>> +        if (!nested) {
>>>> +            memory_region_iommu_replay(iommu_mr, &giommu->n);
>>>> +        }
>>>
>>> For nested, do we need these IOMMU notifiers after all?
>>>
>>> I'm asking because the no-IOMMU case of vfio_listener_region_add()
>>> seems to suit very well for nested page tables to me.  For example,
>>> vfio does not need to listen to MAP events any more because we'll
>>> simply share the guest IOMMU page table to be the 1st level page table
>>> of the host SMMU IIUC.
>> We don't need the MAP notifier but we need the UNMAP notifier: when the
>> guest invalidates an ASID/IOVA we need to propagate this to the physical
>> IOMMU.
> 
> Indeed we need the unmaps.  However I've got a major confusion here:
> With nested mode, we should need unmap events for the 1st level rather
> than the 2nd level, am I right?

yes that's correct

  I mean, the invalidate request should
> be a GVA range rather than GPA range?  While here IIUC
> vfio_listener_region_add() should be working on GPA address space.

Sorry, I don't get your point. My understanding is that in
vfio_listener_region_add() we detect the addition of an IOMMU MR and
init a notifier that covers the input AS it translates (GVA). When the
guest sends an IOTLB invalidation on its first stage, this is trapped,
we notify the UNMAP notifier, and this eventually produces a stage 1
invalidation at physical level (through the VFIO/IOMMU kernel path). This
piece is not yet implemented: see below.
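
The flow described above can be sketched roughly as follows. This is a
toy model only, not the QEMU notifier API: all toy_* and TOY_* names are
invented for the example, and the "forward to the physical IOMMU" step
is reduced to recording the request, since the real path (VFIO ioctl
plus host IOMMU driver) is exactly the part that is not implemented
yet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical notifier flags, loosely modelled on MAP/UNMAP. */
enum {
    TOY_NOTIFIER_UNMAP = 1 << 0,
    TOY_NOTIFIER_MAP   = 1 << 1,
};

struct toy_notifier {
    unsigned flags;        /* which events this notifier wants */
    uint64_t start, end;   /* inclusive range of the input address space */
    void (*notify)(struct toy_notifier *n, uint64_t iova, uint64_t size);
    uint64_t last_iova;    /* last invalidation seen (for inspection) */
    uint64_t last_size;
    int calls;
};

/*
 * Placeholder for "issue a stage 1 invalidation on the physical IOMMU".
 * Here it only records what would have been forwarded.
 */
static void toy_forward_stage1_inval(struct toy_notifier *n,
                                     uint64_t iova, uint64_t size)
{
    n->last_iova = iova;
    n->last_size = size;
    n->calls++;
}

/*
 * Called when a guest IOTLB invalidation is trapped: deliver it to every
 * UNMAP notifier whose range overlaps [iova, iova + size - 1].
 * Returns the number of notifiers that were called.
 */
static int toy_guest_invalidate(struct toy_notifier *list, size_t count,
                                uint64_t iova, uint64_t size)
{
    uint64_t last;
    int delivered = 0;
    size_t i;

    assert(size > 0);
    last = iova + size - 1;
    assert(last >= iova);   /* the caller must not wrap around */

    for (i = 0; i < count; i++) {
        struct toy_notifier *n = &list[i];

        if (!(n->flags & TOY_NOTIFIER_UNMAP)) {
            continue;       /* MAP-only notifiers are not interested */
        }
        if (last < n->start || iova > n->end) {
            continue;       /* no overlap with the registered range */
        }
        n->notify(n, iova, size);
        delivered++;
    }
    return delivered;
}
```

The point of the sketch is only that the notifier range here is an
input (guest IOVA/GVA) range, so a stage 1 invalidation from the guest
matches it directly.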


> 
> I don't know SMMU well enough, but for Intel there should be two
> different kinds of invalidation messages.  Because we still don't
> support nested mode on Intel, the 1st level invalidation
> (VTD_INV_DESC_PIOTLB) is not yet implemented.  And IMHO if it is going to
> be implemented, I think it should be different from the current
> IOMMU_NOTIFIER_UNMAP
Yes, the UNMAP notifier implementation is definitely different. It
calls a VFIO ioctl that eventually produces a physical IOMMU stage 1
invalidation. See https://patchwork.kernel.org/patch/10962721/.

Maybe the confusion comes from the fact that this patch is *not* an
integration of nested SMMUv3 with VFIO. SMMUv3/VFIO still does not
work. The patch just allows the guest to boot by bypassing the replay
function. If it makes things clearer, maybe I should simply assert() in
case we detect a VFIO device protected by an SMMUv3.

 in that it should not even need to bind to a
> memory region, and modules like vfio should simply deliver that exact
> message to the host IOMMU driver for the GVA range to be invalidated,
> just like what it will do with the root pointer of guest 1st level
> page table.
> 
>>
>> As mentioned in the cover letter, at the moment, I still register both
>> MAP/UNMAP notifiers as the MAP notifier registration produces an
>> explicit warning message in the SMMUv3 device. If I remove the
>> registration we will lose this message. I hope this code is just an
>> intermediate state towards the actual nested stage support.
> 
> I didn't see it in the cover letter.  Would you please provide a link
> to the message?
Sorry, it is in the commit message of this patch. I was referring to:

 "qemu-system-aarch64: -device vfio-pci,host=0000:89:00.0: warning:
SMMUv3 does not support notification on MAP: device vfio-pci will not
function properly"
> 
>>
>>   And if we have 2nd page table changes (like
>>> memory hotplug) then IMHO vfio_listener_region_add() will do this for
>>> us as well just like when there's no SMMU.
>>
>> In the current integration, see [RFC v4 20/27] hw/vfio/common: Setup
>> nested stage mappings (https://patchwork.kernel.org/patch/10962721/) I
>> use a prereg_listener for stage 2 mappings.
>>>
>>> Another thing is that IOMMU_ATTR_VFIO_NESTED will be the same for all
>>> the memory regions, so it also seems a bit awkward to make it per
>>> memory region.  If you see the other real user of this flag (which is
>>> IOMMU_ATTR_SPAPR_TCE_FD) it's per memory region.
>>
>> That's correct: all SMMUv3 regions will return this value. But what other
>> API can be used to query IOMMU level attributes?
>>
>> On the other hand,
>>
>> Alexey's commit f1334de60b2 ("memory/iommu: Add get_attr()") says:
>>     This adds get_attr() to IOMMUMemoryRegionClass, like
>>     iommu_ops::domain_get_attr in the Linux kernel.
>>
>> and DOMAIN_ATTR_NESTING is part of enum iommu_attr at kernel level.
> 
> Yeah it's fine to me.

Thanks

Eric
> 
> Thanks,
> 



* Re: [Qemu-devel] [PATCH v2 3/6] hw/vfio/common: Do not replay IOMMU mappings in nested case
  2019-07-03 10:45         ` Auger Eric
@ 2019-07-04  2:36           ` Peter Xu
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Xu @ 2019-07-04  2:36 UTC (permalink / raw)
  To: Auger Eric
  Cc: peter.maydell, qemu-devel, alex.williamson, qemu-arm, pbonzini,
	eric.auger.pro

On Wed, Jul 03, 2019 at 12:45:37PM +0200, Auger Eric wrote:
> Hi Peter,

Hi, Eric,

> On 7/3/19 12:21 PM, Peter Xu wrote:
> > On Wed, Jul 03, 2019 at 11:04:38AM +0200, Auger Eric wrote:
> >> Hi Peter,
> > 
> > Hi, Eric,
> > 
> >>
> >> On 7/3/19 7:41 AM, Peter Xu wrote:
> >>> On Mon, Jul 01, 2019 at 11:30:31AM +0200, Eric Auger wrote:
> >>>> In nested mode, the stage 1 translation tables are owned by
> >>>> the guest and there is no caching on host side. So there is
> >>>> no need to replay the mappings.
> >>>>
> >>>> As of today, the SMMUv3 nested mode is not yet implemented
> >>>> and there is no functional VFIO integration without. But
> >>>> keeping the replay call would execute the default implementation
> >>>> of memory_region_iommu_replay and attempt to translate the whole
> >>>> address range, completely stalling qemu. Keeping the MAP/UNMAP
> >>>> notifier registration allows to hit a warning message in the
> >>>> SMMUv3 device that tells the user which VFIO device will not
> >>>> function properly:
> >>>>
> >>>> "qemu-system-aarch64: -device vfio-pci,host=0000:89:00.0: warning:
> >>>> SMMUv3 does not support notification on MAP: device vfio-pci will not
> >>>> function properly"
> >>>>
> >>>> Besides, removing the replay call now allows the guest to boot.
> >>>>
> >>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >>>> ---
> >>>>  hw/vfio/common.c | 7 ++++++-
> >>>>  1 file changed, 6 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> >>>> index a859298fda..9ea58df67a 100644
> >>>> --- a/hw/vfio/common.c
> >>>> +++ b/hw/vfio/common.c
> >>>> @@ -604,6 +604,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
> >>>>      if (memory_region_is_iommu(section->mr)) {
> >>>>          VFIOGuestIOMMU *giommu;
> >>>>          IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(section->mr);
> >>>> +        bool nested = false;
> >>>>          int iommu_idx;
> >>>>  
> >>>>          trace_vfio_listener_region_add_iommu(iova, end);
> >>>> @@ -631,8 +632,12 @@ static void vfio_listener_region_add(MemoryListener *listener,
> >>>>          QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
> >>>>  
> >>>>          memory_region_register_iommu_notifier(section->mr, &giommu->n);
> >>>> -        memory_region_iommu_replay(giommu->iommu, &giommu->n);
> >>>>  
> >>>> +        memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_VFIO_NESTED,
> >>>> +                                     (void *)&nested);
> >>>> +        if (!nested) {
> >>>> +            memory_region_iommu_replay(iommu_mr, &giommu->n);
> >>>> +        }
> >>>
> >>> For nested, do we need these IOMMU notifiers after all?
> >>>
> >>> I'm asking because the no-IOMMU case of vfio_listener_region_add()
> >>> seems to suit very well for nested page tables to me.  For example,
> >>> vfio does not need to listen to MAP events any more because we'll
> >>> simply share the guest IOMMU page table to be the 1st level page table
> >>> of the host SMMU IIUC.
> >> We don't need the MAP notifier but we need the UNMAP notifier: when the
> >> guest invalidates an ASID/IOVA we need to propagate this to the physical
> >> IOMMU.
> > 
> > Indeed we need the unmaps.  However I've got a major confusion here:
> > With nested mode, we should need unmap events for the 1st level rather
> > than the 2nd level, am I right?
> 
> yes that's correct
> 
>   I mean, the invalidate request should
> > be a GVA range rather than GPA range?  While here IIUC
> > vfio_listener_region_add() should be working on GPA address space.
> 
> Sorry, I don't get your point. My understanding is that in
> vfio_listener_region_add() we detect the addition of an IOMMU MR and
> init a notifier that covers the input AS it translates (GVA). When the
> guest sends an IOTLB invalidation on its first stage, this is trapped,
> we notify the UNMAP notifier, and this eventually produces a stage 1
> invalidation at physical level (through the VFIO/IOMMU kernel path). This
> piece is not yet implemented: see below.
> 
> 
> > 
> > I don't know SMMU well enough, but for Intel there should be two
> > different kinds of invalidation messages.  Because we still don't
> > support nested mode on Intel, the 1st level invalidation
> > (VTD_INV_DESC_PIOTLB) is not yet implemented.  And IMHO if it is going to
> > be implemented, I think it should be different from the current
> > IOMMU_NOTIFIER_UNMAP
> Yes, the UNMAP notifier implementation is definitely different. It
> calls a VFIO ioctl that eventually produces a physical IOMMU stage 1
> invalidation. See https://patchwork.kernel.org/patch/10962721/.

[1]

> 
> Maybe the confusion comes from the fact that this patch is *not* an
> integration of nested SMMUv3 with VFIO. SMMUv3/VFIO still does not
> work. The patch just allows the guest to boot by bypassing the replay
> function. If it makes things clearer, maybe I should simply assert() in
> case we detect a VFIO device protected by an SMMUv3.

Actually that's also my question to your other patch [1]:

+        if (container->iommu_type == VFIO_TYPE1_NESTING_IOMMU) {
+            /* Config notifier to propagate guest stage 1 config changes */
+            giommu = vfio_alloc_guest_iommu(container, iommu_mr, offset);
+            iommu_config_notifier_init(&giommu->n, vfio_iommu_nested_notify,
+                                       IOMMU_NOTIFIER_CONFIG_PASID, iommu_idx);
+            QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
+            memory_region_register_iommu_notifier(section->mr, &giommu->n);
+
+            /* IOTLB unmap notifier to propagate guest IOTLB invalidations */
+            giommu = vfio_alloc_guest_iommu(container, iommu_mr, offset);
+            iommu_iotlb_notifier_init(&giommu->n, vfio_iommu_unmap_notify,
+                                      IOMMU_NOTIFIER_IOTLB_UNMAP,
+                                      section->offset_within_region,
+                                      int128_get64(llend),
+                                      iommu_idx);
+            QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
+            memory_region_register_iommu_notifier(section->mr, &giommu->n);
+        } else {

It'll be fine if we finally want to do it this way, but it feels a bit
confusing to me when we register these notifiers with the current
IOMMU notifiers, because IMHO both of these kinds of events:

  - PASID root pointer
  - PASID-based IOTLB invalidations

should not bind to any memory region at all, and should not have a
concept of "memory range to register".  It'll be easier for me to
understand if vfio simply registers with the IOMMU directly (or maybe
registering with the PCI layer could be a bit better from a code
perspective?) in this case with these two notifiers, which seem to
have nothing to do with the current memory region framework.
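
To illustrate the alternative, a minimal sketch of a per-device
registration that carries no memory region and no address range could
look like the following. This is purely hypothetical: none of the
toy_* names exist in QEMU, and the two callbacks simply mirror the two
event kinds listed above (PASID root pointer, PASID-based IOTLB
invalidation).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device hooks; there is no "range to register". */
struct toy_nested_ops {
    void (*set_pasid_root)(void *opaque, uint64_t root);
    void (*invalidate)(void *opaque, uint32_t pasid,
                       uint64_t gva, uint64_t size);
};

/* Toy device: holds at most one set of nested-mode hooks. */
struct toy_device {
    const struct toy_nested_ops *ops;
    void *opaque;
};

/* What a vfio-like listener might keep; here it only records events. */
struct toy_vfio_state {
    uint64_t pasid_root;
    uint32_t last_pasid;
    uint64_t last_gva;
    uint64_t last_size;
    int invalidations;
};

static void toy_vfio_set_pasid_root(void *opaque, uint64_t root)
{
    struct toy_vfio_state *s = opaque;

    s->pasid_root = root;   /* would be passed down to the host IOMMU */
}

static void toy_vfio_invalidate(void *opaque, uint32_t pasid,
                                uint64_t gva, uint64_t size)
{
    struct toy_vfio_state *s = opaque;

    s->last_pasid = pasid;  /* would become a host stage 1 invalidation */
    s->last_gva = gva;
    s->last_size = size;
    s->invalidations++;
}

static const struct toy_nested_ops toy_vfio_ops = {
    .set_pasid_root = toy_vfio_set_pasid_root,
    .invalidate = toy_vfio_invalidate,
};

/* Registration is per device, not per memory region. */
static void toy_device_register_nested(struct toy_device *dev,
                                       const struct toy_nested_ops *ops,
                                       void *opaque)
{
    assert(dev && ops);
    dev->ops = ops;
    dev->opaque = opaque;
}

/* Emulated-IOMMU side: returns -1 when nobody is registered. */
static int toy_device_set_pasid_root(struct toy_device *dev, uint64_t root)
{
    if (!dev->ops || !dev->ops->set_pasid_root) {
        return -1;
    }
    dev->ops->set_pasid_root(dev->opaque, root);
    return 0;
}

static int toy_device_invalidate(struct toy_device *dev, uint32_t pasid,
                                 uint64_t gva, uint64_t size)
{
    if (!dev->ops || !dev->ops->invalidate) {
        return -1;
    }
    dev->ops->invalidate(dev->opaque, pasid, gva, size);
    return 0;
}
```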

My vague memory is that Liu Yi has done some similar work (e.g.,
introducing some PCI level notifiers and letting VFIO register to those
instead for the nested case; that's for Intel, but IMHO it
suits ARM too), but I've totally forgotten the details.

Thanks,

-- 
Peter Xu



end of thread, other threads:[~2019-07-04  2:38 UTC | newest]

Thread overview: 17+ messages
2019-07-01  9:30 [Qemu-devel] [PATCH v2 0/6] ARM SMMUv3: Fix spurious notification errors and stall with vfio-pci Eric Auger
2019-07-01  9:30 ` [Qemu-devel] [PATCH v2 1/6] memory: Remove unused memory_region_iommu_replay_all() Eric Auger
2019-07-01  9:58   ` Philippe Mathieu-Daudé
2019-07-03  5:41   ` Peter Xu
2019-07-01  9:30 ` [Qemu-devel] [PATCH v2 2/6] memory: Add IOMMU_ATTR_VFIO_NESTED IOMMU memory region attribute Eric Auger
2019-07-03  5:42   ` Peter Xu
2019-07-03  9:10     ` Auger Eric
2019-07-01  9:30 ` [Qemu-devel] [PATCH v2 3/6] hw/vfio/common: Do not replay IOMMU mappings in nested case Eric Auger
2019-07-03  5:41   ` Peter Xu
2019-07-03  9:04     ` Auger Eric
2019-07-03 10:21       ` Peter Xu
2019-07-03 10:45         ` Auger Eric
2019-07-04  2:36           ` Peter Xu
2019-07-01  9:30 ` [Qemu-devel] [PATCH v2 4/6] hw/arm/smmuv3: Advertise VFIO_NESTED Eric Auger
2019-07-01  9:30 ` [Qemu-devel] [PATCH v2 5/6] hw/arm/smmuv3: Log a guest error when decoding an invalid STE Eric Auger
2019-07-01  9:58   ` Philippe Mathieu-Daudé
2019-07-01  9:30 ` [Qemu-devel] [PATCH v2 6/6] hw/arm/smmuv3: Remove spurious error messages on IOVA invalidations Eric Auger
