* [PATCH v4 0/2] vhost-vdpa: add support for vIOMMU
@ 2022-10-27  7:40 Cindy Lu
  2022-10-27  7:40 ` [PATCH v4 1/2] vfio: move the function vfio_get_xlat_addr() to memory.c Cindy Lu
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Cindy Lu @ 2022-10-27  7:40 UTC (permalink / raw)
  To: lulu, alex.williamson, jasowang, mst, pbonzini, peterx, david,
	f4bug, sgarzare
  Cc: qemu-devel

These patches add vIOMMU support to the vDPA device.

Changes in V3:
1. Move the function vfio_get_xlat_addr() to memory.c.
2. Use the existing memory listener; when the MR is an IOMMU MR,
   call iommu_region_add/iommu_region_del.

Changes in V4:
1. Make the comments in vfio_get_xlat_addr() more general.
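
As a rough illustration of item 2 in the V3 changes, the sketch below
shows a single listener callback routing IOMMU regions to a separate
handler while RAM regions keep the direct mapping path. All names here
(toy_listener, toy_section, region_kind) are hypothetical stand-ins and
not QEMU types; the counters only record which path was taken.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are QEMU types. */
enum region_kind { REGION_RAM, REGION_IOMMU, REGION_OTHER };

struct toy_section {
    enum region_kind kind;
};

struct toy_listener {
    int ram_mapped;       /* RAM sections mapped directly */
    int iommu_notifiers;  /* IOMMU sections with a notifier registered */
};

/* Returns true if the section was handled, false if it was skipped. */
static bool toy_region_add(struct toy_listener *l, const struct toy_section *s)
{
    switch (s->kind) {
    case REGION_IOMMU:
        /* IOMMU path: register a notifier instead of mapping now. */
        l->iommu_notifiers++;
        return true;
    case REGION_RAM:
        /* RAM path: map the section directly. */
        l->ram_mapped++;
        return true;
    default:
        return false;
    }
}

/* Mirror image of toy_region_add(). */
static bool toy_region_del(struct toy_listener *l, const struct toy_section *s)
{
    switch (s->kind) {
    case REGION_IOMMU:
        l->iommu_notifiers--;
        return true;
    case REGION_RAM:
        l->ram_mapped--;
        return true;
    default:
        return false;
    }
}
```

The point of the design is that one listener sees both kinds of
section, so no second listener has to be registered for the IOMMU case.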

Cindy Lu (2):
  vfio: move the function vfio_get_xlat_addr() to memory.c
  vhost-vdpa: add support for vIOMMU

 hw/vfio/common.c               |  92 +----------------------
 hw/virtio/vhost-vdpa.c         | 131 ++++++++++++++++++++++++++++++---
 include/exec/memory.h          |   4 +
 include/hw/virtio/vhost-vdpa.h |  10 +++
 softmmu/memory.c               |  84 +++++++++++++++++++++
 5 files changed, 222 insertions(+), 99 deletions(-)

-- 
2.34.3




* [PATCH v4 1/2] vfio: move the function vfio_get_xlat_addr() to memory.c
  2022-10-27  7:40 [PATCH v4 0/2] vhost-vdpa: add support for vIOMMU Cindy Lu
@ 2022-10-27  7:40 ` Cindy Lu
  2022-10-27  8:11   ` Jason Wang
                     ` (2 more replies)
  2022-10-27  7:40 ` [PATCH v4 2/2] vhost-vdpa: add support for vIOMMU Cindy Lu
  2022-10-29  7:57 ` [PATCH v4 0/2] " Michael S. Tsirkin
  2 siblings, 3 replies; 14+ messages in thread
From: Cindy Lu @ 2022-10-27  7:40 UTC (permalink / raw)
  To: lulu, alex.williamson, jasowang, mst, pbonzini, peterx, david,
	f4bug, sgarzare
  Cc: qemu-devel

Move the function vfio_get_xlat_addr() to softmmu/memory.c and
rename it to memory_get_xlat_addr(), so that other devices,
such as the vDPA device, can use it.

Signed-off-by: Cindy Lu <lulu@redhat.com>
---
 hw/vfio/common.c      | 92 ++-----------------------------------------
 include/exec/memory.h |  4 ++
 softmmu/memory.c      | 84 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 92 insertions(+), 88 deletions(-)
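
For readers skimming the diff, here is a minimal sketch of the
granularity test the moved helper performs (`len & iotlb->addr_mask`),
reduced to plain integers. The names toy_translate_len() and
toy_granularity_ok() are hypothetical, and the "translation" is only a
clamp to the space remaining in the target; it assumes addr_mask has
the form 2^n - 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model of translation: the requested length is clamped to what
 * remains in the target region, which is how a too-small target would
 * shorten it.
 */
static uint64_t toy_translate_len(uint64_t want, uint64_t remaining)
{
    return want < remaining ? want : remaining;
}

/*
 * Start from one full IOMMU page (addr_mask + 1). If translation left
 * the length untouched, it is still a power of two and shares no bits
 * with addr_mask. If it was shortened to a smaller non-zero value, that
 * value lies entirely within addr_mask, so the AND is non-zero.
 */
static bool toy_granularity_ok(uint64_t addr_mask, uint64_t remaining)
{
    uint64_t len = toy_translate_len(addr_mask + 1, remaining);

    return (len & addr_mask) == 0;
}
```

With a 4 KiB IOMMU page (addr_mask 0xfff), a target with at least
0x1000 bytes remaining passes, while one with only 0x800 bytes
remaining is rejected.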

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index ace9562a9b..2b5a9f3d8d 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -574,92 +574,6 @@ static bool vfio_listener_skipped_section(MemoryRegionSection *section)
            section->offset_within_address_space & (1ULL << 63);
 }
 
-/* Called with rcu_read_lock held.  */
-static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
-                               ram_addr_t *ram_addr, bool *read_only)
-{
-    MemoryRegion *mr;
-    hwaddr xlat;
-    hwaddr len = iotlb->addr_mask + 1;
-    bool writable = iotlb->perm & IOMMU_WO;
-
-    /*
-     * The IOMMU TLB entry we have just covers translation through
-     * this IOMMU to its immediate target.  We need to translate
-     * it the rest of the way through to memory.
-     */
-    mr = address_space_translate(&address_space_memory,
-                                 iotlb->translated_addr,
-                                 &xlat, &len, writable,
-                                 MEMTXATTRS_UNSPECIFIED);
-    if (!memory_region_is_ram(mr)) {
-        error_report("iommu map to non memory area %"HWADDR_PRIx"",
-                     xlat);
-        return false;
-    } else if (memory_region_has_ram_discard_manager(mr)) {
-        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(mr);
-        MemoryRegionSection tmp = {
-            .mr = mr,
-            .offset_within_region = xlat,
-            .size = int128_make64(len),
-        };
-
-        /*
-         * Malicious VMs can map memory into the IOMMU, which is expected
-         * to remain discarded. vfio will pin all pages, populating memory.
-         * Disallow that. vmstate priorities make sure any RamDiscardManager
-         * were already restored before IOMMUs are restored.
-         */
-        if (!ram_discard_manager_is_populated(rdm, &tmp)) {
-            error_report("iommu map to discarded memory (e.g., unplugged via"
-                         " virtio-mem): %"HWADDR_PRIx"",
-                         iotlb->translated_addr);
-            return false;
-        }
-
-        /*
-         * Malicious VMs might trigger discarding of IOMMU-mapped memory. The
-         * pages will remain pinned inside vfio until unmapped, resulting in a
-         * higher memory consumption than expected. If memory would get
-         * populated again later, there would be an inconsistency between pages
-         * pinned by vfio and pages seen by QEMU. This is the case until
-         * unmapped from the IOMMU (e.g., during device reset).
-         *
-         * With malicious guests, we really only care about pinning more memory
-         * than expected. RLIMIT_MEMLOCK set for the user/process can never be
-         * exceeded and can be used to mitigate this problem.
-         */
-        warn_report_once("Using vfio with vIOMMUs and coordinated discarding of"
-                         " RAM (e.g., virtio-mem) works, however, malicious"
-                         " guests can trigger pinning of more memory than"
-                         " intended via an IOMMU. It's possible to mitigate "
-                         " by setting/adjusting RLIMIT_MEMLOCK.");
-    }
-
-    /*
-     * Translation truncates length to the IOMMU page size,
-     * check that it did not truncate too much.
-     */
-    if (len & iotlb->addr_mask) {
-        error_report("iommu has granularity incompatible with target AS");
-        return false;
-    }
-
-    if (vaddr) {
-        *vaddr = memory_region_get_ram_ptr(mr) + xlat;
-    }
-
-    if (ram_addr) {
-        *ram_addr = memory_region_get_ram_addr(mr) + xlat;
-    }
-
-    if (read_only) {
-        *read_only = !writable || mr->readonly;
-    }
-
-    return true;
-}
-
 static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
 {
     VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
@@ -682,7 +596,8 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
     if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
         bool read_only;
 
-        if (!vfio_get_xlat_addr(iotlb, &vaddr, NULL, &read_only)) {
+        if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only,
+                                  &address_space_memory)) {
             goto out;
         }
         /*
@@ -1359,7 +1274,8 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
     }
 
     rcu_read_lock();
-    if (vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL)) {
+    if (memory_get_xlat_addr(iotlb, NULL, &translated_addr, NULL,
+                             &address_space_memory)) {
         int ret;
 
         ret = vfio_get_dirty_bitmap(container, iova, iotlb->addr_mask + 1,
diff --git a/include/exec/memory.h b/include/exec/memory.h
index bfb1de8eea..282de1d5ad 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -713,6 +713,10 @@ void ram_discard_manager_register_listener(RamDiscardManager *rdm,
 void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
                                              RamDiscardListener *rdl);
 
+bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
+                          ram_addr_t *ram_addr, bool *read_only,
+                          AddressSpace *as);
+
 typedef struct CoalescedMemoryRange CoalescedMemoryRange;
 typedef struct MemoryRegionIoeventfd MemoryRegionIoeventfd;
 
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 7ba2048836..8586863ffa 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -2121,6 +2121,90 @@ void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
     rdmc->unregister_listener(rdm, rdl);
 }
 
+/* Called with rcu_read_lock held.  */
+bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
+                          ram_addr_t *ram_addr, bool *read_only,
+                          AddressSpace *as)
+{
+    MemoryRegion *mr;
+    hwaddr xlat;
+    hwaddr len = iotlb->addr_mask + 1;
+    bool writable = iotlb->perm & IOMMU_WO;
+
+    /*
+     * The IOMMU TLB entry we have just covers translation through
+     * this IOMMU to its immediate target.  We need to translate
+     * it the rest of the way through to memory.
+     */
+    mr = address_space_translate(as, iotlb->translated_addr, &xlat, &len,
+                                 writable, MEMTXATTRS_UNSPECIFIED);
+    if (!memory_region_is_ram(mr)) {
+        error_report("iommu map to non memory area %" HWADDR_PRIx "", xlat);
+        return false;
+    } else if (memory_region_has_ram_discard_manager(mr)) {
+        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(mr);
+        MemoryRegionSection tmp = {
+            .mr = mr,
+            .offset_within_region = xlat,
+            .size = int128_make64(len),
+        };
+
+        /*
+         * Malicious VMs can map memory into the IOMMU, which is expected
+         * to remain discarded. device will pin all pages, populating memory.
+         * Disallow that. vmstate priorities make sure any RamDiscardManager
+         * were already restored before IOMMUs are restored.
+         */
+        if (!ram_discard_manager_is_populated(rdm, &tmp)) {
+            error_report("iommu map to discarded memory (e.g., unplugged via"
+                         " virtio-mem): %" HWADDR_PRIx "",
+                         iotlb->translated_addr);
+            return false;
+        }
+
+        /*
+         * Malicious VMs might trigger discarding of IOMMU-mapped memory. The
+         * pages will remain pinned inside device until unmapped, resulting in a
+         * higher memory consumption than expected. If memory would get
+         * populated again later, there would be an inconsistency between pages
+         * pinned by device and pages seen by QEMU. This is the case until
+         * unmapped from the IOMMU (e.g., during device reset).
+         *
+         * With malicious guests, we really only care about pinning more memory
+         * than expected. RLIMIT_MEMLOCK set for the user/process can never be
+         * exceeded and can be used to mitigate this problem.
+         */
+        warn_report_once("Using device with vIOMMUs and coordinated discarding"
+                         " of RAM (e.g., virtio-mem) works, however, malicious"
+                         " guests can trigger pinning of more memory than"
+                         " intended via an IOMMU. It's possible to mitigate "
+                         " by setting/adjusting RLIMIT_MEMLOCK.");
+    }
+
+    /*
+     * Translation truncates length to the IOMMU page size,
+     * check that it did not truncate too much.
+     */
+    if (len & iotlb->addr_mask) {
+        error_report("iommu has granularity incompatible with target AS");
+        return false;
+    }
+
+    if (vaddr) {
+        *vaddr = memory_region_get_ram_ptr(mr) + xlat;
+    }
+
+    if (ram_addr) {
+        *ram_addr = memory_region_get_ram_addr(mr) + xlat;
+    }
+
+    if (read_only) {
+        *read_only = !writable || mr->readonly;
+    }
+
+    return true;
+}
+
 void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
 {
     uint8_t mask = 1 << client;
-- 
2.34.3




* [PATCH v4 2/2] vhost-vdpa: add support for vIOMMU
  2022-10-27  7:40 [PATCH v4 0/2] vhost-vdpa: add support for vIOMMU Cindy Lu
  2022-10-27  7:40 ` [PATCH v4 1/2] vfio: move the function vfio_get_xlat_addr() to memory.c Cindy Lu
@ 2022-10-27  7:40 ` Cindy Lu
  2022-10-27  8:10   ` Jason Wang
  2022-10-29  7:57 ` [PATCH v4 0/2] " Michael S. Tsirkin
  2 siblings, 1 reply; 14+ messages in thread
From: Cindy Lu @ 2022-10-27  7:40 UTC (permalink / raw)
  To: lulu, alex.williamson, jasowang, mst, pbonzini, peterx, david,
	f4bug, sgarzare
  Cc: qemu-devel

Add support for vIOMMU by adding new functions that handle IOMMU MRs:
- During iommu_region_add, register a specific IOMMU notifier
  and store all notifiers in a list.
- During iommu_region_del, find the matching IOMMU notifier and
  delete it from the list.

Verified with the vp_vdpa and vdpa_sim_net drivers.

Signed-off-by: Cindy Lu <lulu@redhat.com>
---
 hw/virtio/vhost-vdpa.c         | 131 ++++++++++++++++++++++++++++++---
 include/hw/virtio/vhost-vdpa.h |  10 +++
 2 files changed, 130 insertions(+), 11 deletions(-)
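
The add/del bookkeeping described in the commit message can be
sketched without any QEMU types, as below. Everything in it
(toy_iommu, toy_iommu_section, the hand-rolled singly linked list) is a
hypothetical simplification: the patch itself uses QLIST and
IOMMUNotifier, and this sketch skips notifier registration and replay
entirely. It only shows how the inclusive range and the offset are
derived, and how an entry is matched on region and start.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the fields read from a section. */
struct toy_iommu_section {
    const void *mr;                 /* identity of the IOMMU region */
    uint64_t offset_within_region;
    uint64_t offset_within_as;
    uint64_t size;                  /* assumed non-zero */
};

/* Hypothetical per-notifier record kept in the list. */
struct toy_iommu {
    const void *mr;
    uint64_t start, end;            /* inclusive range, region offsets */
    uint64_t iommu_offset;          /* region offset -> address-space offset */
    struct toy_iommu *next;
};

/* region_add: record the covered range and push onto the list head. */
static struct toy_iommu *toy_iommu_add(struct toy_iommu **head,
                                       const struct toy_iommu_section *s)
{
    struct toy_iommu *n = calloc(1, sizeof(*n));

    if (!n) {
        return NULL;
    }
    n->mr = s->mr;
    n->start = s->offset_within_region;
    n->end = s->offset_within_region + s->size - 1;
    n->iommu_offset = s->offset_within_as - s->offset_within_region;
    n->next = *head;
    *head = n;
    return n;
}

/* region_del: unlink and free the first entry matching region and start. */
static int toy_iommu_del(struct toy_iommu **head,
                         const struct toy_iommu_section *s)
{
    for (struct toy_iommu **p = head; *p; p = &(*p)->next) {
        struct toy_iommu *n = *p;

        if (n->mr == s->mr && n->start == s->offset_within_region) {
            *p = n->next;
            free(n);
            return 1;   /* removed; stop before touching freed memory */
        }
    }
    return 0;           /* nothing matched */
}

/* Notifier side: shift an IOMMU-relative iova by the stored offset. */
static uint64_t toy_notify_iova(const struct toy_iommu *n, uint64_t iova)
{
    return iova + n->iommu_offset;
}
```

For a section at region offset 0x1000, address-space offset 0x5000 and
size 0x2000, the record covers 0x1000..0x2fff with an offset of 0x4000,
so an event at iova 0x1800 is forwarded as 0x5800.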

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 3ff9ce3501..407f3e9ac2 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -26,6 +26,7 @@
 #include "cpu.h"
 #include "trace.h"
 #include "qapi/error.h"
+#include "hw/virtio/virtio-access.h"
 
 /*
  * Return one past the end of the end of section. Be careful with uint64_t
@@ -44,7 +45,6 @@ static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
                                                 uint64_t iova_min,
                                                 uint64_t iova_max)
 {
-    Int128 llend;
 
     if ((!memory_region_is_ram(section->mr) &&
          !memory_region_is_iommu(section->mr)) ||
@@ -61,14 +61,6 @@ static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
         return true;
     }
 
-    llend = vhost_vdpa_section_end(section);
-    if (int128_gt(llend, int128_make64(iova_max))) {
-        error_report("RAM section out of device range (max=0x%" PRIx64
-                     ", end addr=0x%" PRIx64 ")",
-                     iova_max, int128_get64(llend));
-        return true;
-    }
-
     return false;
 }
 
@@ -173,6 +165,115 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
     v->iotlb_batch_begin_sent = false;
 }
 
+static void vhost_vdpa_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
+{
+    struct vdpa_iommu *iommu = container_of(n, struct vdpa_iommu, n);
+
+    hwaddr iova = iotlb->iova + iommu->iommu_offset;
+    struct vhost_vdpa *v = iommu->dev;
+    void *vaddr;
+    int ret;
+
+    if (iotlb->target_as != &address_space_memory) {
+        error_report("Wrong target AS \"%s\", only system memory is allowed",
+                     iotlb->target_as->name ? iotlb->target_as->name : "none");
+        return;
+    }
+    RCU_READ_LOCK_GUARD();
+    vhost_vdpa_iotlb_batch_begin_once(v);
+
+    if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
+        bool read_only;
+
+        if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only,
+                                  &address_space_memory)) {
+            return;
+        }
+        ret =
+            vhost_vdpa_dma_map(v, iova, iotlb->addr_mask + 1, vaddr, read_only);
+        if (ret) {
+            error_report("vhost_vdpa_dma_map(%p, 0x%" HWADDR_PRIx ", "
+                         "0x%" HWADDR_PRIx ", %p) = %d (%m)",
+                         v, iova, iotlb->addr_mask + 1, vaddr, ret);
+        }
+    } else {
+        ret = vhost_vdpa_dma_unmap(v, iova, iotlb->addr_mask + 1);
+        if (ret) {
+            error_report("vhost_vdpa_dma_unmap(%p, 0x%" HWADDR_PRIx ", "
+                         "0x%" HWADDR_PRIx ") = %d (%m)",
+                         v, iova, iotlb->addr_mask + 1, ret);
+        }
+    }
+}
+
+static void vhost_vdpa_iommu_region_add(MemoryListener *listener,
+                                        MemoryRegionSection *section)
+{
+    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
+
+    struct vdpa_iommu *iommu;
+    Int128 end;
+    int iommu_idx;
+    IOMMUMemoryRegion *iommu_mr;
+    int ret;
+
+    if (!memory_region_is_iommu(section->mr)) {
+        return;
+    }
+
+    iommu_mr = IOMMU_MEMORY_REGION(section->mr);
+
+    iommu = g_malloc0(sizeof(*iommu));
+    end =  int128_add(int128_make64(section->offset_within_region),
+        section->size);
+    end = int128_sub(end, int128_one());
+    iommu_idx = memory_region_iommu_attrs_to_index(iommu_mr,
+        MEMTXATTRS_UNSPECIFIED);
+
+    iommu->iommu_mr = iommu_mr;
+
+    iommu_notifier_init(
+        &iommu->n, vhost_vdpa_iommu_map_notify, IOMMU_NOTIFIER_IOTLB_EVENTS,
+        section->offset_within_region, int128_get64(end), iommu_idx);
+    iommu->iommu_offset =
+        section->offset_within_address_space - section->offset_within_region;
+    iommu->dev = v;
+
+    ret = memory_region_register_iommu_notifier(section->mr, &iommu->n, NULL);
+    if (ret) {
+        g_free(iommu);
+        return;
+    }
+
+    QLIST_INSERT_HEAD(&v->iommu_list, iommu, iommu_next);
+    memory_region_iommu_replay(iommu->iommu_mr, &iommu->n);
+
+    return;
+}
+
+static void vhost_vdpa_iommu_region_del(MemoryListener *listener,
+                                        MemoryRegionSection *section)
+{
+    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
+
+    struct vdpa_iommu *iommu;
+
+    if (!memory_region_is_iommu(section->mr)) {
+        return;
+    }
+
+    QLIST_FOREACH(iommu, &v->iommu_list, iommu_next)
+    {
+        if (MEMORY_REGION(iommu->iommu_mr) == section->mr &&
+            iommu->n.start == section->offset_within_region) {
+            memory_region_unregister_iommu_notifier(section->mr, &iommu->n);
+            QLIST_REMOVE(iommu, iommu_next);
+            g_free(iommu);
+            break;
+        }
+    }
+}
+
 static void vhost_vdpa_listener_region_add(MemoryListener *listener,
                                            MemoryRegionSection *section)
 {
@@ -186,6 +287,10 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
                                             v->iova_range.last)) {
         return;
     }
+    if (memory_region_is_iommu(section->mr)) {
+        vhost_vdpa_iommu_region_add(listener, section);
+        return;
+    }
 
     if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
                  (section->offset_within_region & ~TARGET_PAGE_MASK))) {
@@ -260,6 +365,10 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
                                             v->iova_range.last)) {
         return;
     }
+    if (memory_region_is_iommu(section->mr)) {
+        vhost_vdpa_iommu_region_del(listener, section);
+        return;
+    }
 
     if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
                  (section->offset_within_region & ~TARGET_PAGE_MASK))) {
@@ -587,7 +696,6 @@ static int vhost_vdpa_cleanup(struct vhost_dev *dev)
     v = dev->opaque;
     trace_vhost_vdpa_cleanup(dev, v);
     vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
-    memory_listener_unregister(&v->listener);
     vhost_vdpa_svq_cleanup(dev);
 
     dev->opaque = NULL;
@@ -1127,7 +1235,8 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
     }
 
     if (started) {
-        memory_listener_register(&v->listener, &address_space_memory);
+        memory_listener_register(&v->listener, dev->vdev->dma_as);
+
         return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
     } else {
         vhost_vdpa_reset_device(dev);
diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index d10a89303e..64a46e37cb 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -41,8 +41,18 @@ typedef struct vhost_vdpa {
     void *shadow_vq_ops_opaque;
     struct vhost_dev *dev;
     VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
+    QLIST_HEAD(, vdpa_iommu) iommu_list;
+    IOMMUNotifier n;
 } VhostVDPA;
 
+struct vdpa_iommu {
+    struct vhost_vdpa *dev;
+    IOMMUMemoryRegion *iommu_mr;
+    hwaddr iommu_offset;
+    IOMMUNotifier n;
+    QLIST_ENTRY(vdpa_iommu) iommu_next;
+};
+
 int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
                        void *vaddr, bool readonly);
 int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova, hwaddr size);
-- 
2.34.3




* Re: [PATCH v4 2/2] vhost-vdpa: add support for vIOMMU
  2022-10-27  7:40 ` [PATCH v4 2/2] vhost-vdpa: add support for vIOMMU Cindy Lu
@ 2022-10-27  8:10   ` Jason Wang
  0 siblings, 0 replies; 14+ messages in thread
From: Jason Wang @ 2022-10-27  8:10 UTC (permalink / raw)
  To: Cindy Lu
  Cc: alex.williamson, mst, pbonzini, peterx, david, f4bug, sgarzare,
	qemu-devel

On Thu, Oct 27, 2022 at 3:41 PM Cindy Lu <lulu@redhat.com> wrote:
>
> Add support for vIOMMU by adding new functions that handle IOMMU MRs:
> - During iommu_region_add, register a specific IOMMU notifier
>   and store all notifiers in a list.
> - During iommu_region_del, find the matching IOMMU notifier and
>   delete it from the list.
>
> Verified with the vp_vdpa and vdpa_sim_net drivers.
>
> Signed-off-by: Cindy Lu <lulu@redhat.com>

Acked-by: Jason Wang <jasowang@redhat.com>

(some nits, see below)

> ---
>  hw/virtio/vhost-vdpa.c         | 131 ++++++++++++++++++++++++++++++---
>  include/hw/virtio/vhost-vdpa.h |  10 +++
>  2 files changed, 130 insertions(+), 11 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 3ff9ce3501..407f3e9ac2 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -26,6 +26,7 @@
>  #include "cpu.h"
>  #include "trace.h"
>  #include "qapi/error.h"
> +#include "hw/virtio/virtio-access.h"
>
>  /*
>   * Return one past the end of the end of section. Be careful with uint64_t
> @@ -44,7 +45,6 @@ static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
>                                                  uint64_t iova_min,
>                                                  uint64_t iova_max)
>  {
> -    Int128 llend;
>
>      if ((!memory_region_is_ram(section->mr) &&
>           !memory_region_is_iommu(section->mr)) ||
> @@ -61,14 +61,6 @@ static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
>          return true;
>      }
>
> -    llend = vhost_vdpa_section_end(section);
> -    if (int128_gt(llend, int128_make64(iova_max))) {
> -        error_report("RAM section out of device range (max=0x%" PRIx64
> -                     ", end addr=0x%" PRIx64 ")",
> -                     iova_max, int128_get64(llend));
> -        return true;
> -    }
> -
>      return false;
>  }
>
> @@ -173,6 +165,115 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
>      v->iotlb_batch_begin_sent = false;
>  }
>
> +static void vhost_vdpa_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> +{
> +    struct vdpa_iommu *iommu = container_of(n, struct vdpa_iommu, n);
> +
> +    hwaddr iova = iotlb->iova + iommu->iommu_offset;
> +    struct vhost_vdpa *v = iommu->dev;
> +    void *vaddr;
> +    int ret;
> +
> +    if (iotlb->target_as != &address_space_memory) {
> +        error_report("Wrong target AS \"%s\", only system memory is allowed",
> +                     iotlb->target_as->name ? iotlb->target_as->name : "none");
> +        return;
> +    }
> +    RCU_READ_LOCK_GUARD();
> +    vhost_vdpa_iotlb_batch_begin_once(v);
> +
> +    if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
> +        bool read_only;
> +
> +        if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only,
> +                                  &address_space_memory)) {
> +            return;
> +        }
> +        ret =
> +            vhost_vdpa_dma_map(v, iova, iotlb->addr_mask + 1, vaddr, read_only);
> +        if (ret) {
> +            error_report("vhost_vdpa_dma_map(%p, 0x%" HWADDR_PRIx ", "
> +                         "0x%" HWADDR_PRIx ", %p) = %d (%m)",
> +                         v, iova, iotlb->addr_mask + 1, vaddr, ret);
> +        }
> +    } else {
> +        ret = vhost_vdpa_dma_unmap(v, iova, iotlb->addr_mask + 1);
> +        if (ret) {
> +            error_report("vhost_vdpa_dma_unmap(%p, 0x%" HWADDR_PRIx ", "
> +                         "0x%" HWADDR_PRIx ") = %d (%m)",
> +                         v, iova, iotlb->addr_mask + 1, ret);
> +        }
> +    }
> +}
> +
> +static void vhost_vdpa_iommu_region_add(MemoryListener *listener,
> +                                        MemoryRegionSection *section)
> +{
> +    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
> +
> +    struct vdpa_iommu *iommu;
> +    Int128 end;
> +    int iommu_idx;
> +    IOMMUMemoryRegion *iommu_mr;
> +    int ret;
> +
> +    if (!memory_region_is_iommu(section->mr)) {
> +        return;

Nit: the caller already performs this check, so there is no need to
check twice. (This could be done on top.)

> +    }
> +
> +    iommu_mr = IOMMU_MEMORY_REGION(section->mr);
> +
> +    iommu = g_malloc0(sizeof(*iommu));
> +    end =  int128_add(int128_make64(section->offset_within_region),
> +        section->size);
> +    end = int128_sub(end, int128_one());
> +    iommu_idx = memory_region_iommu_attrs_to_index(iommu_mr,
> +        MEMTXATTRS_UNSPECIFIED);
> +
> +    iommu->iommu_mr = iommu_mr;
> +
> +    iommu_notifier_init(
> +        &iommu->n, vhost_vdpa_iommu_map_notify, IOMMU_NOTIFIER_IOTLB_EVENTS,
> +        section->offset_within_region, int128_get64(end), iommu_idx);
> +    iommu->iommu_offset =
> +        section->offset_within_address_space - section->offset_within_region;
> +    iommu->dev = v;
> +
> +    ret = memory_region_register_iommu_notifier(section->mr, &iommu->n, NULL);
> +    if (ret) {
> +        g_free(iommu);
> +        return;
> +    }
> +
> +    QLIST_INSERT_HEAD(&v->iommu_list, iommu, iommu_next);
> +    memory_region_iommu_replay(iommu->iommu_mr, &iommu->n);
> +
> +    return;
> +}
> +
> +static void vhost_vdpa_iommu_region_del(MemoryListener *listener,
> +                                        MemoryRegionSection *section)
> +{
> +    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
> +
> +    struct vdpa_iommu *iommu;
> +
> +    if (!memory_region_is_iommu(section->mr)) {
> +        return;

Ditto.

Thanks

> +    }
> +
> +    QLIST_FOREACH(iommu, &v->iommu_list, iommu_next)
> +    {
> +        if (MEMORY_REGION(iommu->iommu_mr) == section->mr &&
> +            iommu->n.start == section->offset_within_region) {
> +            memory_region_unregister_iommu_notifier(section->mr, &iommu->n);
> +            QLIST_REMOVE(iommu, iommu_next);
> +            g_free(iommu);
> +            break;
> +        }
> +    }
> +}
> +
>  static void vhost_vdpa_listener_region_add(MemoryListener *listener,
>                                             MemoryRegionSection *section)
>  {
> @@ -186,6 +287,10 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
>                                              v->iova_range.last)) {
>          return;
>      }
> +    if (memory_region_is_iommu(section->mr)) {
> +        vhost_vdpa_iommu_region_add(listener, section);
> +        return;
> +    }
>
>      if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
>                   (section->offset_within_region & ~TARGET_PAGE_MASK))) {
> @@ -260,6 +365,10 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
>                                              v->iova_range.last)) {
>          return;
>      }
> +    if (memory_region_is_iommu(section->mr)) {
> +        vhost_vdpa_iommu_region_del(listener, section);
> +        return;
> +    }
>
>      if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
>                   (section->offset_within_region & ~TARGET_PAGE_MASK))) {
> @@ -587,7 +696,6 @@ static int vhost_vdpa_cleanup(struct vhost_dev *dev)
>      v = dev->opaque;
>      trace_vhost_vdpa_cleanup(dev, v);
>      vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
> -    memory_listener_unregister(&v->listener);
>      vhost_vdpa_svq_cleanup(dev);
>
>      dev->opaque = NULL;
> @@ -1127,7 +1235,8 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>      }
>
>      if (started) {
> -        memory_listener_register(&v->listener, &address_space_memory);
> +        memory_listener_register(&v->listener, dev->vdev->dma_as);
> +
>          return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
>      } else {
>          vhost_vdpa_reset_device(dev);
> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> index d10a89303e..64a46e37cb 100644
> --- a/include/hw/virtio/vhost-vdpa.h
> +++ b/include/hw/virtio/vhost-vdpa.h
> @@ -41,8 +41,18 @@ typedef struct vhost_vdpa {
>      void *shadow_vq_ops_opaque;
>      struct vhost_dev *dev;
>      VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
> +    QLIST_HEAD(, vdpa_iommu) iommu_list;
> +    IOMMUNotifier n;
>  } VhostVDPA;
>
> +struct vdpa_iommu {
> +    struct vhost_vdpa *dev;
> +    IOMMUMemoryRegion *iommu_mr;
> +    hwaddr iommu_offset;
> +    IOMMUNotifier n;
> +    QLIST_ENTRY(vdpa_iommu) iommu_next;
> +};
> +
>  int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
>                         void *vaddr, bool readonly);
>  int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova, hwaddr size);
> --
> 2.34.3
>




* Re: [PATCH v4 1/2] vfio: move the function vfio_get_xlat_addr() to memory.c
  2022-10-27  7:40 ` [PATCH v4 1/2] vfio: move the function vfio_get_xlat_addr() to memory.c Cindy Lu
@ 2022-10-27  8:11   ` Jason Wang
  2022-10-27 15:30   ` Peter Xu
  2022-10-27 21:11   ` Alex Williamson
  2 siblings, 0 replies; 14+ messages in thread
From: Jason Wang @ 2022-10-27  8:11 UTC (permalink / raw)
  To: Cindy Lu
  Cc: alex.williamson, mst, pbonzini, peterx, david, f4bug, sgarzare,
	qemu-devel

On Thu, Oct 27, 2022 at 3:41 PM Cindy Lu <lulu@redhat.com> wrote:
>
> Move the function vfio_get_xlat_addr() to softmmu/memory.c and
> rename it to memory_get_xlat_addr(), so that other devices,
> such as the vDPA device, can use it.
>
> Signed-off-by: Cindy Lu <lulu@redhat.com>

Acked-by: Jason Wang <jasowang@redhat.com>

> ---
>  hw/vfio/common.c      | 92 ++-----------------------------------------
>  include/exec/memory.h |  4 ++
>  softmmu/memory.c      | 84 +++++++++++++++++++++++++++++++++++++++
>  3 files changed, 92 insertions(+), 88 deletions(-)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index ace9562a9b..2b5a9f3d8d 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -574,92 +574,6 @@ static bool vfio_listener_skipped_section(MemoryRegionSection *section)
>             section->offset_within_address_space & (1ULL << 63);
>  }
>
> -/* Called with rcu_read_lock held.  */
> -static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> -                               ram_addr_t *ram_addr, bool *read_only)
> -{
> -    MemoryRegion *mr;
> -    hwaddr xlat;
> -    hwaddr len = iotlb->addr_mask + 1;
> -    bool writable = iotlb->perm & IOMMU_WO;
> -
> -    /*
> -     * The IOMMU TLB entry we have just covers translation through
> -     * this IOMMU to its immediate target.  We need to translate
> -     * it the rest of the way through to memory.
> -     */
> -    mr = address_space_translate(&address_space_memory,
> -                                 iotlb->translated_addr,
> -                                 &xlat, &len, writable,
> -                                 MEMTXATTRS_UNSPECIFIED);
> -    if (!memory_region_is_ram(mr)) {
> -        error_report("iommu map to non memory area %"HWADDR_PRIx"",
> -                     xlat);
> -        return false;
> -    } else if (memory_region_has_ram_discard_manager(mr)) {
> -        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(mr);
> -        MemoryRegionSection tmp = {
> -            .mr = mr,
> -            .offset_within_region = xlat,
> -            .size = int128_make64(len),
> -        };
> -
> -        /*
> -         * Malicious VMs can map memory into the IOMMU, which is expected
> -         * to remain discarded. vfio will pin all pages, populating memory.
> -         * Disallow that. vmstate priorities make sure any RamDiscardManager
> -         * were already restored before IOMMUs are restored.
> -         */
> -        if (!ram_discard_manager_is_populated(rdm, &tmp)) {
> -            error_report("iommu map to discarded memory (e.g., unplugged via"
> -                         " virtio-mem): %"HWADDR_PRIx"",
> -                         iotlb->translated_addr);
> -            return false;
> -        }
> -
> -        /*
> -         * Malicious VMs might trigger discarding of IOMMU-mapped memory. The
> -         * pages will remain pinned inside vfio until unmapped, resulting in a
> -         * higher memory consumption than expected. If memory would get
> -         * populated again later, there would be an inconsistency between pages
> -         * pinned by vfio and pages seen by QEMU. This is the case until
> -         * unmapped from the IOMMU (e.g., during device reset).
> -         *
> -         * With malicious guests, we really only care about pinning more memory
> -         * than expected. RLIMIT_MEMLOCK set for the user/process can never be
> -         * exceeded and can be used to mitigate this problem.
> -         */
> -        warn_report_once("Using vfio with vIOMMUs and coordinated discarding of"
> -                         " RAM (e.g., virtio-mem) works, however, malicious"
> -                         " guests can trigger pinning of more memory than"
> -                         " intended via an IOMMU. It's possible to mitigate "
> -                         " by setting/adjusting RLIMIT_MEMLOCK.");
> -    }
> -
> -    /*
> -     * Translation truncates length to the IOMMU page size,
> -     * check that it did not truncate too much.
> -     */
> -    if (len & iotlb->addr_mask) {
> -        error_report("iommu has granularity incompatible with target AS");
> -        return false;
> -    }
> -
> -    if (vaddr) {
> -        *vaddr = memory_region_get_ram_ptr(mr) + xlat;
> -    }
> -
> -    if (ram_addr) {
> -        *ram_addr = memory_region_get_ram_addr(mr) + xlat;
> -    }
> -
> -    if (read_only) {
> -        *read_only = !writable || mr->readonly;
> -    }
> -
> -    return true;
> -}
> -
>  static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
>  {
>      VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
> @@ -682,7 +596,8 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
>      if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
>          bool read_only;
>
> -        if (!vfio_get_xlat_addr(iotlb, &vaddr, NULL, &read_only)) {
> +        if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only,
> +                                  &address_space_memory)) {
>              goto out;
>          }
>          /*
> @@ -1359,7 +1274,8 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
>      }
>
>      rcu_read_lock();
> -    if (vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL)) {
> +    if (memory_get_xlat_addr(iotlb, NULL, &translated_addr, NULL,
> +                             &address_space_memory)) {
>          int ret;
>
>          ret = vfio_get_dirty_bitmap(container, iova, iotlb->addr_mask + 1,
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index bfb1de8eea..282de1d5ad 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -713,6 +713,10 @@ void ram_discard_manager_register_listener(RamDiscardManager *rdm,
>  void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
>                                               RamDiscardListener *rdl);
>
> +bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> +                          ram_addr_t *ram_addr, bool *read_only,
> +                          AddressSpace *as);
> +
>  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
>  typedef struct MemoryRegionIoeventfd MemoryRegionIoeventfd;
>
> diff --git a/softmmu/memory.c b/softmmu/memory.c
> index 7ba2048836..8586863ffa 100644
> --- a/softmmu/memory.c
> +++ b/softmmu/memory.c
> @@ -2121,6 +2121,90 @@ void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
>      rdmc->unregister_listener(rdm, rdl);
>  }
>
> +/* Called with rcu_read_lock held.  */
> +bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> +                          ram_addr_t *ram_addr, bool *read_only,
> +                          AddressSpace *as)
> +{
> +    MemoryRegion *mr;
> +    hwaddr xlat;
> +    hwaddr len = iotlb->addr_mask + 1;
> +    bool writable = iotlb->perm & IOMMU_WO;
> +
> +    /*
> +     * The IOMMU TLB entry we have just covers translation through
> +     * this IOMMU to its immediate target.  We need to translate
> +     * it the rest of the way through to memory.
> +     */
> +    mr = address_space_translate(as, iotlb->translated_addr, &xlat, &len,
> +                                 writable, MEMTXATTRS_UNSPECIFIED);
> +    if (!memory_region_is_ram(mr)) {
> +        error_report("iommu map to non memory area %" HWADDR_PRIx "", xlat);
> +        return false;
> +    } else if (memory_region_has_ram_discard_manager(mr)) {
> +        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(mr);
> +        MemoryRegionSection tmp = {
> +            .mr = mr,
> +            .offset_within_region = xlat,
> +            .size = int128_make64(len),
> +        };
> +
> +        /*
> +         * Malicious VMs can map memory into the IOMMU, which is expected
> +         * to remain discarded. device will pin all pages, populating memory.
> +         * Disallow that. vmstate priorities make sure any RamDiscardManager
> +         * were already restored before IOMMUs are restored.
> +         */
> +        if (!ram_discard_manager_is_populated(rdm, &tmp)) {
> +            error_report("iommu map to discarded memory (e.g., unplugged via"
> +                         " virtio-mem): %" HWADDR_PRIx "",
> +                         iotlb->translated_addr);
> +            return false;
> +        }
> +
> +        /*
> +         * Malicious VMs might trigger discarding of IOMMU-mapped memory. The
> +         * pages will remain pinned inside device until unmapped, resulting in a
> +         * higher memory consumption than expected. If memory would get
> +         * populated again later, there would be an inconsistency between pages
> +         * pinned by device and pages seen by QEMU. This is the case until
> +         * unmapped from the IOMMU (e.g., during device reset).
> +         *
> +         * With malicious guests, we really only care about pinning more memory
> +         * than expected. RLIMIT_MEMLOCK set for the user/process can never be
> +         * exceeded and can be used to mitigate this problem.
> +         */
> +        warn_report_once("Using device with vIOMMUs and coordinated discarding"
> +                         " of RAM (e.g., virtio-mem) works, however, malicious"
> +                         " guests can trigger pinning of more memory than"
> +                         " intended via an IOMMU. It's possible to mitigate "
> +                         " by setting/adjusting RLIMIT_MEMLOCK.");
> +    }
> +
> +    /*
> +     * Translation truncates length to the IOMMU page size,
> +     * check that it did not truncate too much.
> +     */
> +    if (len & iotlb->addr_mask) {
> +        error_report("iommu has granularity incompatible with target AS");
> +        return false;
> +    }
> +
> +    if (vaddr) {
> +        *vaddr = memory_region_get_ram_ptr(mr) + xlat;
> +    }
> +
> +    if (ram_addr) {
> +        *ram_addr = memory_region_get_ram_addr(mr) + xlat;
> +    }
> +
> +    if (read_only) {
> +        *read_only = !writable || mr->readonly;
> +    }
> +
> +    return true;
> +}
> +
>  void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
>  {
>      uint8_t mask = 1 << client;
> --
> 2.34.3
>




* Re: [PATCH v4 1/2] vfio: move the function vfio_get_xlat_addr() to memory.c
  2022-10-27  7:40 ` [PATCH v4 1/2] vfio: move the function vfio_get_xlat_addr() to memory.c Cindy Lu
  2022-10-27  8:11   ` Jason Wang
@ 2022-10-27 15:30   ` Peter Xu
  2022-10-27 21:11   ` Alex Williamson
  2 siblings, 0 replies; 14+ messages in thread
From: Peter Xu @ 2022-10-27 15:30 UTC (permalink / raw)
  To: Cindy Lu
  Cc: alex.williamson, jasowang, mst, pbonzini, david, f4bug, sgarzare,
	qemu-devel

On Thu, Oct 27, 2022 at 03:40:31PM +0800, Cindy Lu wrote:
> Move the function vfio_get_xlat_addr to softmmu/memory.c, and
> change the name to memory_get_xlat_addr(). So we can use this
> function in other devices, such as vDPA devices.
> 
> Signed-off-by: Cindy Lu <lulu@redhat.com>

Acked-by: Peter Xu <peterx@redhat.com>

Trivial nit below.

[...]

> +bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> +                          ram_addr_t *ram_addr, bool *read_only,
> +                          AddressSpace *as)
> +{
> +    MemoryRegion *mr;
> +    hwaddr xlat;
> +    hwaddr len = iotlb->addr_mask + 1;
> +    bool writable = iotlb->perm & IOMMU_WO;
> +
> +    /*
> +     * The IOMMU TLB entry we have just covers translation through
> +     * this IOMMU to its immediate target.  We need to translate
> +     * it the rest of the way through to memory.
> +     */
> +    mr = address_space_translate(as, iotlb->translated_addr, &xlat, &len,
> +                                 writable, MEMTXATTRS_UNSPECIFIED);

Can "as" be anything other than address_space_memory in this case?

Peeking at the next patch, I have a feeling that you only wanted to check
iotlb->target_as, but that can be checked in this function too, so the new
parameter may not be needed.  Another benefit is that we can also drop the
same check in vfio_iommu_map_notify() and
vfio_iommu_map_dirty_notify():

    if (iotlb->target_as != &address_space_memory) {
        error_report("Wrong target AS \"%s\", only system memory is allowed",
                     iotlb->target_as->name ? iotlb->target_as->name : "none");
        return;
    }
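
As a rough illustration only, the check above could be folded into the shared
helper along these lines.  This is a self-contained sketch: the structs are
minimal stand-ins rather than the real QEMU definitions, fprintf() stands in
for error_report(), and xlat_target_is_system_memory() is a hypothetical
name, not an existing function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Minimal stand-ins for the QEMU types; NOT the real definitions. */
typedef struct AddressSpace {
    const char *name;
} AddressSpace;

typedef struct IOMMUTLBEntry {
    AddressSpace *target_as;
} IOMMUTLBEntry;

/* Stand-in for QEMU's global system memory address space. */
static AddressSpace address_space_memory = { .name = "memory" };

/*
 * Hypothetical helper: returns true only when the IOTLB entry targets
 * system memory.  Doing this inside the translation helper would make
 * the extra AddressSpace parameter unnecessary.
 */
static bool xlat_target_is_system_memory(const IOMMUTLBEntry *iotlb)
{
    assert(iotlb != NULL && iotlb->target_as != NULL);

    if (iotlb->target_as != &address_space_memory) {
        /* fprintf() stands in for error_report() in this sketch. */
        fprintf(stderr,
                "Wrong target AS \"%s\", only system memory is allowed\n",
                iotlb->target_as->name ? iotlb->target_as->name : "none");
        return false;
    }
    return true;
}
```

With something like this at the top of the helper, both vfio notifiers could
rely on its return value instead of repeating the test themselves.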

Thanks,

-- 
Peter Xu




* Re: [PATCH v4 1/2] vfio: move the function vfio_get_xlat_addr() to memory.c
  2022-10-27  7:40 ` [PATCH v4 1/2] vfio: move the function vfio_get_xlat_addr() to memory.c Cindy Lu
  2022-10-27  8:11   ` Jason Wang
  2022-10-27 15:30   ` Peter Xu
@ 2022-10-27 21:11   ` Alex Williamson
  2022-10-28  1:50     ` Jason Wang
  2 siblings, 1 reply; 14+ messages in thread
From: Alex Williamson @ 2022-10-27 21:11 UTC (permalink / raw)
  To: Cindy Lu
  Cc: jasowang, mst, pbonzini, peterx, david, f4bug, sgarzare, qemu-devel

On Thu, 27 Oct 2022 15:40:31 +0800
Cindy Lu <lulu@redhat.com> wrote:

> Move the function vfio_get_xlat_addr to softmmu/memory.c, and
> change the name to memory_get_xlat_addr(). So we can use this
> function in other devices, such as vDPA devices.
> 
> Signed-off-by: Cindy Lu <lulu@redhat.com>
> ---
>  hw/vfio/common.c      | 92 ++-----------------------------------------
>  include/exec/memory.h |  4 ++
>  softmmu/memory.c      | 84 +++++++++++++++++++++++++++++++++++++++
>  3 files changed, 92 insertions(+), 88 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index ace9562a9b..2b5a9f3d8d 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -574,92 +574,6 @@ static bool vfio_listener_skipped_section(MemoryRegionSection *section)
>             section->offset_within_address_space & (1ULL << 63);
>  }
>  
> -/* Called with rcu_read_lock held.  */
> -static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> -                               ram_addr_t *ram_addr, bool *read_only)
> -{
> -    MemoryRegion *mr;
> -    hwaddr xlat;
> -    hwaddr len = iotlb->addr_mask + 1;
> -    bool writable = iotlb->perm & IOMMU_WO;
> -
> -    /*
> -     * The IOMMU TLB entry we have just covers translation through
> -     * this IOMMU to its immediate target.  We need to translate
> -     * it the rest of the way through to memory.
> -     */
> -    mr = address_space_translate(&address_space_memory,
> -                                 iotlb->translated_addr,
> -                                 &xlat, &len, writable,
> -                                 MEMTXATTRS_UNSPECIFIED);
> -    if (!memory_region_is_ram(mr)) {
> -        error_report("iommu map to non memory area %"HWADDR_PRIx"",
> -                     xlat);
> -        return false;
> -    } else if (memory_region_has_ram_discard_manager(mr)) {
> -        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(mr);
> -        MemoryRegionSection tmp = {
> -            .mr = mr,
> -            .offset_within_region = xlat,
> -            .size = int128_make64(len),
> -        };
> -
> -        /*
> -         * Malicious VMs can map memory into the IOMMU, which is expected
> -         * to remain discarded. vfio will pin all pages, populating memory.
> -         * Disallow that. vmstate priorities make sure any RamDiscardManager
> -         * were already restored before IOMMUs are restored.
> -         */
> -        if (!ram_discard_manager_is_populated(rdm, &tmp)) {
> -            error_report("iommu map to discarded memory (e.g., unplugged via"
> -                         " virtio-mem): %"HWADDR_PRIx"",
> -                         iotlb->translated_addr);
> -            return false;
> -        }
> -
> -        /*
> -         * Malicious VMs might trigger discarding of IOMMU-mapped memory. The
> -         * pages will remain pinned inside vfio until unmapped, resulting in a
> -         * higher memory consumption than expected. If memory would get
> -         * populated again later, there would be an inconsistency between pages
> -         * pinned by vfio and pages seen by QEMU. This is the case until
> -         * unmapped from the IOMMU (e.g., during device reset).
> -         *
> -         * With malicious guests, we really only care about pinning more memory
> -         * than expected. RLIMIT_MEMLOCK set for the user/process can never be
> -         * exceeded and can be used to mitigate this problem.
> -         */
> -        warn_report_once("Using vfio with vIOMMUs and coordinated discarding of"
> -                         " RAM (e.g., virtio-mem) works, however, malicious"
> -                         " guests can trigger pinning of more memory than"
> -                         " intended via an IOMMU. It's possible to mitigate "
> -                         " by setting/adjusting RLIMIT_MEMLOCK.");
> -    }
> -
> -    /*
> -     * Translation truncates length to the IOMMU page size,
> -     * check that it did not truncate too much.
> -     */
> -    if (len & iotlb->addr_mask) {
> -        error_report("iommu has granularity incompatible with target AS");
> -        return false;
> -    }
> -
> -    if (vaddr) {
> -        *vaddr = memory_region_get_ram_ptr(mr) + xlat;
> -    }
> -
> -    if (ram_addr) {
> -        *ram_addr = memory_region_get_ram_addr(mr) + xlat;
> -    }
> -
> -    if (read_only) {
> -        *read_only = !writable || mr->readonly;
> -    }
> -
> -    return true;
> -}
> -
>  static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
>  {
>      VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
> @@ -682,7 +596,8 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
>      if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
>          bool read_only;
>  
> -        if (!vfio_get_xlat_addr(iotlb, &vaddr, NULL, &read_only)) {
> +        if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only,
> +                                  &address_space_memory)) {
>              goto out;
>          }
>          /*
> @@ -1359,7 +1274,8 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
>      }
>  
>      rcu_read_lock();
> -    if (vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL)) {
> +    if (memory_get_xlat_addr(iotlb, NULL, &translated_addr, NULL,
> +                             &address_space_memory)) {
>          int ret;
>  
>          ret = vfio_get_dirty_bitmap(container, iova, iotlb->addr_mask + 1,
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index bfb1de8eea..282de1d5ad 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -713,6 +713,10 @@ void ram_discard_manager_register_listener(RamDiscardManager *rdm,
>  void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
>                                               RamDiscardListener *rdl);
>  
> +bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> +                          ram_addr_t *ram_addr, bool *read_only,
> +                          AddressSpace *as);
> +
>  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
>  typedef struct MemoryRegionIoeventfd MemoryRegionIoeventfd;
>  
> diff --git a/softmmu/memory.c b/softmmu/memory.c
> index 7ba2048836..8586863ffa 100644
> --- a/softmmu/memory.c
> +++ b/softmmu/memory.c
> @@ -2121,6 +2121,90 @@ void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
>      rdmc->unregister_listener(rdm, rdl);
>  }
>  
> +/* Called with rcu_read_lock held.  */
> +bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> +                          ram_addr_t *ram_addr, bool *read_only,
> +                          AddressSpace *as)
> +{
> +    MemoryRegion *mr;
> +    hwaddr xlat;
> +    hwaddr len = iotlb->addr_mask + 1;
> +    bool writable = iotlb->perm & IOMMU_WO;
> +
> +    /*
> +     * The IOMMU TLB entry we have just covers translation through
> +     * this IOMMU to its immediate target.  We need to translate
> +     * it the rest of the way through to memory.
> +     */
> +    mr = address_space_translate(as, iotlb->translated_addr, &xlat, &len,
> +                                 writable, MEMTXATTRS_UNSPECIFIED);
> +    if (!memory_region_is_ram(mr)) {
> +        error_report("iommu map to non memory area %" HWADDR_PRIx "", xlat);
> +        return false;
> +    } else if (memory_region_has_ram_discard_manager(mr)) {
> +        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(mr);
> +        MemoryRegionSection tmp = {
> +            .mr = mr,
> +            .offset_within_region = xlat,
> +            .size = int128_make64(len),
> +        };
> +
> +        /*
> +         * Malicious VMs can map memory into the IOMMU, which is expected
> +         * to remain discarded. device will pin all pages, populating memory.
> +         * Disallow that. vmstate priorities make sure any RamDiscardManager
> +         * were already restored before IOMMUs are restored.
> +         */
> +        if (!ram_discard_manager_is_populated(rdm, &tmp)) {
> +            error_report("iommu map to discarded memory (e.g., unplugged via"
> +                         " virtio-mem): %" HWADDR_PRIx "",
> +                         iotlb->translated_addr);
> +            return false;
> +        }
> +
> +        /*
> +         * Malicious VMs might trigger discarding of IOMMU-mapped memory. The
> +         * pages will remain pinned inside device until unmapped, resulting in a
> +         * higher memory consumption than expected. If memory would get
> +         * populated again later, there would be an inconsistency between pages
> +         * pinned by device and pages seen by QEMU. This is the case until
> +         * unmapped from the IOMMU (e.g., during device reset).
> +         *
> +         * With malicious guests, we really only care about pinning more memory
> +         * than expected. RLIMIT_MEMLOCK set for the user/process can never be
> +         * exceeded and can be used to mitigate this problem.
> +         */
> +        warn_report_once("Using device with vIOMMUs and coordinated discarding"
> +                         " of RAM (e.g., virtio-mem) works, however, malicious"
> +                         " guests can trigger pinning of more memory than"
> +                         " intended via an IOMMU. It's possible to mitigate "
> +                         " by setting/adjusting RLIMIT_MEMLOCK.");

Is this really fit to be in shared code?  Simply replacing "vfio" with
"device" for comments and warnings that are really of concern for a
specific use case doesn't look much better to me.

I think translating an unpopulated address, as in the previous test
above, is generally invalid, but the comment is certainly trying to
frame the severity of this error relative to a specific use case.

Here we're generating an unconditional warning, assuming that this code
path has been triggered by device code, for the condition of simply
asking for a translation to a MemoryRegion under discard manager
control?  Again, isn't that an action that has implications for a
specific use case of a device that supports pinning host memory?

Should the shared code be generating this warning, or could an optional
pointer arg be updated to indicate a translation to discard manager
controlled memory, so that this comment and warning remain in the
caller?  Thanks,

Alex
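
A minimal sketch of the optional out-pointer idea, to make the proposed split
concrete.  Everything here is an assumption for illustration: MockRegion
stands in for MemoryRegion, and mock_get_xlat_addr() and
mock_pinning_device_map() are hypothetical names, not QEMU functions.  The
shared helper only reports whether the target is under discard manager
control; the caller that pins memory decides whether to warn.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for a MemoryRegion; only the fields this sketch needs. */
typedef struct MockRegion {
    bool is_ram;
    bool has_discard_manager;
    bool populated;
} MockRegion;

/*
 * Hypothetical shared helper.  It contains no device-specific warning.
 * If mr_has_discard_manager is non-NULL, it is set to whether the
 * translation landed in discard manager controlled memory.
 */
static bool mock_get_xlat_addr(const MockRegion *mr,
                               bool *mr_has_discard_manager)
{
    assert(mr != NULL);

    if (mr_has_discard_manager) {
        *mr_has_discard_manager = false;
    }
    if (!mr->is_ram) {
        return false;
    }
    if (mr->has_discard_manager) {
        /* Translating to unpopulated memory is invalid for any caller. */
        if (!mr->populated) {
            return false;
        }
        if (mr_has_discard_manager) {
            *mr_has_discard_manager = true;
        }
    }
    return true;
}

/*
 * Hypothetical caller for a device that pins host memory.  The warning
 * about pinning stays here, next to the use case it applies to.
 */
static bool mock_pinning_device_map(const MockRegion *mr, int *warnings)
{
    bool discard = false;

    if (!mock_get_xlat_addr(mr, &discard)) {
        return false;
    }
    if (discard) {
        /* fprintf() stands in for warn_report_once() in this sketch. */
        fprintf(stderr, "pinning memory under discard manager control\n");
        (*warnings)++;
    }
    return true;
}
```

A caller that does not pin memory would pass NULL for the out-pointer and
never see the warning.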

> +    }
> +
> +    /*
> +     * Translation truncates length to the IOMMU page size,
> +     * check that it did not truncate too much.
> +     */
> +    if (len & iotlb->addr_mask) {
> +        error_report("iommu has granularity incompatible with target AS");
> +        return false;
> +    }
> +
> +    if (vaddr) {
> +        *vaddr = memory_region_get_ram_ptr(mr) + xlat;
> +    }
> +
> +    if (ram_addr) {
> +        *ram_addr = memory_region_get_ram_addr(mr) + xlat;
> +    }
> +
> +    if (read_only) {
> +        *read_only = !writable || mr->readonly;
> +    }
> +
> +    return true;
> +}
> +
>  void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
>  {
>      uint8_t mask = 1 << client;




* Re: [PATCH v4 1/2] vfio: move the function vfio_get_xlat_addr() to memory.c
  2022-10-27 21:11   ` Alex Williamson
@ 2022-10-28  1:50     ` Jason Wang
  2022-10-28  2:08       ` Alex Williamson
  0 siblings, 1 reply; 14+ messages in thread
From: Jason Wang @ 2022-10-28  1:50 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Cindy Lu, mst, pbonzini, peterx, david, f4bug, sgarzare, qemu-devel

On Fri, Oct 28, 2022 at 5:11 AM Alex Williamson
<alex.williamson@redhat.com> wrote:
>
> On Thu, 27 Oct 2022 15:40:31 +0800
> Cindy Lu <lulu@redhat.com> wrote:
>
> > Move the function vfio_get_xlat_addr to softmmu/memory.c, and
> > change the name to memory_get_xlat_addr(). So we can use this
> > function in other devices, such as vDPA devices.
> >
> > Signed-off-by: Cindy Lu <lulu@redhat.com>
> > ---
> >  hw/vfio/common.c      | 92 ++-----------------------------------------
> >  include/exec/memory.h |  4 ++
> >  softmmu/memory.c      | 84 +++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 92 insertions(+), 88 deletions(-)
> >
> > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > index ace9562a9b..2b5a9f3d8d 100644
> > --- a/hw/vfio/common.c
> > +++ b/hw/vfio/common.c
> > @@ -574,92 +574,6 @@ static bool vfio_listener_skipped_section(MemoryRegionSection *section)
> >             section->offset_within_address_space & (1ULL << 63);
> >  }
> >
> > -/* Called with rcu_read_lock held.  */
> > -static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> > -                               ram_addr_t *ram_addr, bool *read_only)
> > -{
> > -    MemoryRegion *mr;
> > -    hwaddr xlat;
> > -    hwaddr len = iotlb->addr_mask + 1;
> > -    bool writable = iotlb->perm & IOMMU_WO;
> > -
> > -    /*
> > -     * The IOMMU TLB entry we have just covers translation through
> > -     * this IOMMU to its immediate target.  We need to translate
> > -     * it the rest of the way through to memory.
> > -     */
> > -    mr = address_space_translate(&address_space_memory,
> > -                                 iotlb->translated_addr,
> > -                                 &xlat, &len, writable,
> > -                                 MEMTXATTRS_UNSPECIFIED);
> > -    if (!memory_region_is_ram(mr)) {
> > -        error_report("iommu map to non memory area %"HWADDR_PRIx"",
> > -                     xlat);
> > -        return false;
> > -    } else if (memory_region_has_ram_discard_manager(mr)) {
> > -        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(mr);
> > -        MemoryRegionSection tmp = {
> > -            .mr = mr,
> > -            .offset_within_region = xlat,
> > -            .size = int128_make64(len),
> > -        };
> > -
> > -        /*
> > -         * Malicious VMs can map memory into the IOMMU, which is expected
> > -         * to remain discarded. vfio will pin all pages, populating memory.
> > -         * Disallow that. vmstate priorities make sure any RamDiscardManager
> > -         * were already restored before IOMMUs are restored.
> > -         */
> > -        if (!ram_discard_manager_is_populated(rdm, &tmp)) {
> > -            error_report("iommu map to discarded memory (e.g., unplugged via"
> > -                         " virtio-mem): %"HWADDR_PRIx"",
> > -                         iotlb->translated_addr);
> > -            return false;
> > -        }
> > -
> > -        /*
> > -         * Malicious VMs might trigger discarding of IOMMU-mapped memory. The
> > -         * pages will remain pinned inside vfio until unmapped, resulting in a
> > -         * higher memory consumption than expected. If memory would get
> > -         * populated again later, there would be an inconsistency between pages
> > -         * pinned by vfio and pages seen by QEMU. This is the case until
> > -         * unmapped from the IOMMU (e.g., during device reset).
> > -         *
> > -         * With malicious guests, we really only care about pinning more memory
> > -         * than expected. RLIMIT_MEMLOCK set for the user/process can never be
> > -         * exceeded and can be used to mitigate this problem.
> > -         */
> > -        warn_report_once("Using vfio with vIOMMUs and coordinated discarding of"
> > -                         " RAM (e.g., virtio-mem) works, however, malicious"
> > -                         " guests can trigger pinning of more memory than"
> > -                         " intended via an IOMMU. It's possible to mitigate "
> > -                         " by setting/adjusting RLIMIT_MEMLOCK.");
> > -    }
> > -
> > -    /*
> > -     * Translation truncates length to the IOMMU page size,
> > -     * check that it did not truncate too much.
> > -     */
> > -    if (len & iotlb->addr_mask) {
> > -        error_report("iommu has granularity incompatible with target AS");
> > -        return false;
> > -    }
> > -
> > -    if (vaddr) {
> > -        *vaddr = memory_region_get_ram_ptr(mr) + xlat;
> > -    }
> > -
> > -    if (ram_addr) {
> > -        *ram_addr = memory_region_get_ram_addr(mr) + xlat;
> > -    }
> > -
> > -    if (read_only) {
> > -        *read_only = !writable || mr->readonly;
> > -    }
> > -
> > -    return true;
> > -}
> > -
> >  static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> >  {
> >      VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
> > @@ -682,7 +596,8 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> >      if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
> >          bool read_only;
> >
> > -        if (!vfio_get_xlat_addr(iotlb, &vaddr, NULL, &read_only)) {
> > +        if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only,
> > +                                  &address_space_memory)) {
> >              goto out;
> >          }
> >          /*
> > @@ -1359,7 +1274,8 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> >      }
> >
> >      rcu_read_lock();
> > -    if (vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL)) {
> > +    if (memory_get_xlat_addr(iotlb, NULL, &translated_addr, NULL,
> > +                             &address_space_memory)) {
> >          int ret;
> >
> >          ret = vfio_get_dirty_bitmap(container, iova, iotlb->addr_mask + 1,
> > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > index bfb1de8eea..282de1d5ad 100644
> > --- a/include/exec/memory.h
> > +++ b/include/exec/memory.h
> > @@ -713,6 +713,10 @@ void ram_discard_manager_register_listener(RamDiscardManager *rdm,
> >  void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
> >                                               RamDiscardListener *rdl);
> >
> > +bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> > +                          ram_addr_t *ram_addr, bool *read_only,
> > +                          AddressSpace *as);
> > +
> >  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
> >  typedef struct MemoryRegionIoeventfd MemoryRegionIoeventfd;
> >
> > diff --git a/softmmu/memory.c b/softmmu/memory.c
> > index 7ba2048836..8586863ffa 100644
> > --- a/softmmu/memory.c
> > +++ b/softmmu/memory.c
> > @@ -2121,6 +2121,90 @@ void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
> >      rdmc->unregister_listener(rdm, rdl);
> >  }
> >
> > +/* Called with rcu_read_lock held.  */
> > +bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> > +                          ram_addr_t *ram_addr, bool *read_only,
> > +                          AddressSpace *as)
> > +{
> > +    MemoryRegion *mr;
> > +    hwaddr xlat;
> > +    hwaddr len = iotlb->addr_mask + 1;
> > +    bool writable = iotlb->perm & IOMMU_WO;
> > +
> > +    /*
> > +     * The IOMMU TLB entry we have just covers translation through
> > +     * this IOMMU to its immediate target.  We need to translate
> > +     * it the rest of the way through to memory.
> > +     */
> > +    mr = address_space_translate(as, iotlb->translated_addr, &xlat, &len,
> > +                                 writable, MEMTXATTRS_UNSPECIFIED);
> > +    if (!memory_region_is_ram(mr)) {
> > +        error_report("iommu map to non memory area %" HWADDR_PRIx "", xlat);
> > +        return false;
> > +    } else if (memory_region_has_ram_discard_manager(mr)) {
> > +        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(mr);
> > +        MemoryRegionSection tmp = {
> > +            .mr = mr,
> > +            .offset_within_region = xlat,
> > +            .size = int128_make64(len),
> > +        };
> > +
> > +        /*
> > +         * Malicious VMs can map memory into the IOMMU, which is expected
> > +         * to remain discarded. device will pin all pages, populating memory.
> > +         * Disallow that. vmstate priorities make sure any RamDiscardManager
> > +         * were already restored before IOMMUs are restored.
> > +         */
> > +        if (!ram_discard_manager_is_populated(rdm, &tmp)) {
> > +            error_report("iommu map to discarded memory (e.g., unplugged via"
> > +                         " virtio-mem): %" HWADDR_PRIx "",
> > +                         iotlb->translated_addr);
> > +            return false;
> > +        }
> > +
> > +        /*
> > +         * Malicious VMs might trigger discarding of IOMMU-mapped memory. The
> > +         * pages will remain pinned inside device until unmapped, resulting in a
> > +         * higher memory consumption than expected. If memory would get
> > +         * populated again later, there would be an inconsistency between pages
> > +         * pinned by device and pages seen by QEMU. This is the case until
> > +         * unmapped from the IOMMU (e.g., during device reset).
> > +         *
> > +         * With malicious guests, we really only care about pinning more memory
> > +         * than expected. RLIMIT_MEMLOCK set for the user/process can never be
> > +         * exceeded and can be used to mitigate this problem.
> > +         */
> > +        warn_report_once("Using device with vIOMMUs and coordinated discarding"
> > +                         " of RAM (e.g., virtio-mem) works, however, malicious"
> > +                         " guests can trigger pinning of more memory than"
> > +                         " intended via an IOMMU. It's possible to mitigate "
> > +                         " by setting/adjusting RLIMIT_MEMLOCK.");
>
> Is this really fit to be in shared code?  Simply replacing "vfio" with
> "device" for comments and warnings that are really of concern for a
> specific use case doesn't look much better to me.
>
> I think translating an unpopulated address, as in the previous test
> above, is generally invalid, but the comment is certainly trying to
> frame the severity of this error relative to a specific use case.
>
> Here we're generating an unconditional warning, assuming that this code
> path has been triggered by device code, for the condition of simply
> asking for a translation to a MemoryRegion under discard manager
> control?  Again, isn't that an action that has implications for a
> specific use case of a device that supports pinning host memory?


Or can we rename the function to memory_get_xlat_addr_no_discard()?
That name looks more general and fits callers that don't want to
map a region that has a discard manager.

>
> Should the shared code be generating this warning, or could an optional
> pointer arg be updated to indicate a translation to discard manager
> controlled memory and this comment and warning should remain in the
> caller?  Thanks,

I think this should also work.
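
A rough sketch of the optional-pointer variant, in case it helps the
discussion. Everything here is a hypothetical stand-in (MockRegion,
mock_get_xlat_addr(), mock_pinning_map(), the discard_managed argument);
none of it is the actual QEMU API, and the real helper would keep the
checks from the patch above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a MemoryRegion, so the sketch builds alone. */
typedef struct {
    bool is_ram;               /* models memory_region_is_ram() */
    bool has_discard_manager;  /* models memory_region_has_ram_discard_manager() */
    bool populated;            /* models ram_discard_manager_is_populated() */
    uint8_t *host_base;        /* models memory_region_get_ram_ptr() */
} MockRegion;

/*
 * Shared helper: rejects non-RAM and unpopulated targets, but emits no
 * device-specific warning. If discard_managed is non-NULL, it is set to
 * true when the target is under discard manager control.
 */
static bool mock_get_xlat_addr(const MockRegion *mr, uint64_t xlat,
                               void **vaddr, bool *discard_managed)
{
    assert(mr != NULL);

    if (discard_managed) {
        *discard_managed = false;
    }
    if (!mr->is_ram) {
        return false;
    }
    if (mr->has_discard_manager) {
        if (!mr->populated) {
            return false;  /* translating discarded memory is invalid */
        }
        if (discard_managed) {
            *discard_managed = true;
        }
    }
    if (vaddr) {
        *vaddr = mr->host_base + xlat;
    }
    return true;
}

/* Caller that pins host memory: the comment and warning would live here. */
static bool mock_pinning_map(const MockRegion *mr, uint64_t xlat,
                             int *warnings)
{
    void *vaddr;
    bool discard_managed;

    if (!mock_get_xlat_addr(mr, xlat, &vaddr, &discard_managed)) {
        return false;
    }
    if (discard_managed) {
        (*warnings)++;  /* placeholder for the warn_report_once() call */
    }
    return true;
}
```

With this shape, a caller that does not pin memory can pass NULL and
never triggers the warning, while the unpopulated case is still rejected
for every caller.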

Thanks

>
> Alex
>
> > +    }
> > +
> > +    /*
> > +     * Translation truncates length to the IOMMU page size,
> > +     * check that it did not truncate too much.
> > +     */
> > +    if (len & iotlb->addr_mask) {
> > +        error_report("iommu has granularity incompatible with target AS");
> > +        return false;
> > +    }
> > +
> > +    if (vaddr) {
> > +        *vaddr = memory_region_get_ram_ptr(mr) + xlat;
> > +    }
> > +
> > +    if (ram_addr) {
> > +        *ram_addr = memory_region_get_ram_addr(mr) + xlat;
> > +    }
> > +
> > +    if (read_only) {
> > +        *read_only = !writable || mr->readonly;
> > +    }
> > +
> > +    return true;
> > +}
> > +
> >  void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
> >  {
> >      uint8_t mask = 1 << client;
>




* Re: [PATCH v4 1/2] vfio: move the function vfio_get_xlat_addr() to memory.c
  2022-10-28  1:50     ` Jason Wang
@ 2022-10-28  2:08       ` Alex Williamson
  2022-10-28  2:23         ` Jason Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Alex Williamson @ 2022-10-28  2:08 UTC (permalink / raw)
  To: Jason Wang
  Cc: Cindy Lu, mst, pbonzini, peterx, david, f4bug, sgarzare, qemu-devel

On Fri, 28 Oct 2022 09:50:10 +0800
Jason Wang <jasowang@redhat.com> wrote:

> On Fri, Oct 28, 2022 at 5:11 AM Alex Williamson
> <alex.williamson@redhat.com> wrote:
> >
> > On Thu, 27 Oct 2022 15:40:31 +0800
> > Cindy Lu <lulu@redhat.com> wrote:
> >  
> > > > Move the function vfio_get_xlat_addr() to softmmu/memory.c and
> > > > rename it to memory_get_xlat_addr(), so that it can be used by
> > > > other devices, such as vDPA devices.
> > >
> > > Signed-off-by: Cindy Lu <lulu@redhat.com>
> > > ---
> > >  hw/vfio/common.c      | 92 ++-----------------------------------------
> > >  include/exec/memory.h |  4 ++
> > >  softmmu/memory.c      | 84 +++++++++++++++++++++++++++++++++++++++
> > >  3 files changed, 92 insertions(+), 88 deletions(-)
> > >
> > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > > index ace9562a9b..2b5a9f3d8d 100644
> > > --- a/hw/vfio/common.c
> > > +++ b/hw/vfio/common.c
> > > @@ -574,92 +574,6 @@ static bool vfio_listener_skipped_section(MemoryRegionSection *section)
> > >             section->offset_within_address_space & (1ULL << 63);
> > >  }
> > >
> > > -/* Called with rcu_read_lock held.  */
> > > -static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> > > -                               ram_addr_t *ram_addr, bool *read_only)
> > > -{
> > > -    MemoryRegion *mr;
> > > -    hwaddr xlat;
> > > -    hwaddr len = iotlb->addr_mask + 1;
> > > -    bool writable = iotlb->perm & IOMMU_WO;
> > > -
> > > -    /*
> > > -     * The IOMMU TLB entry we have just covers translation through
> > > -     * this IOMMU to its immediate target.  We need to translate
> > > -     * it the rest of the way through to memory.
> > > -     */
> > > -    mr = address_space_translate(&address_space_memory,
> > > -                                 iotlb->translated_addr,
> > > -                                 &xlat, &len, writable,
> > > -                                 MEMTXATTRS_UNSPECIFIED);
> > > -    if (!memory_region_is_ram(mr)) {
> > > -        error_report("iommu map to non memory area %"HWADDR_PRIx"",
> > > -                     xlat);
> > > -        return false;
> > > -    } else if (memory_region_has_ram_discard_manager(mr)) {
> > > -        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(mr);
> > > -        MemoryRegionSection tmp = {
> > > -            .mr = mr,
> > > -            .offset_within_region = xlat,
> > > -            .size = int128_make64(len),
> > > -        };
> > > -
> > > -        /*
> > > -         * Malicious VMs can map memory into the IOMMU, which is expected
> > > -         * to remain discarded. vfio will pin all pages, populating memory.
> > > -         * Disallow that. vmstate priorities make sure any RamDiscardManager
> > > -         * were already restored before IOMMUs are restored.
> > > -         */
> > > -        if (!ram_discard_manager_is_populated(rdm, &tmp)) {
> > > -            error_report("iommu map to discarded memory (e.g., unplugged via"
> > > -                         " virtio-mem): %"HWADDR_PRIx"",
> > > -                         iotlb->translated_addr);
> > > -            return false;
> > > -        }
> > > -
> > > -        /*
> > > -         * Malicious VMs might trigger discarding of IOMMU-mapped memory. The
> > > -         * pages will remain pinned inside vfio until unmapped, resulting in a
> > > -         * higher memory consumption than expected. If memory would get
> > > -         * populated again later, there would be an inconsistency between pages
> > > -         * pinned by vfio and pages seen by QEMU. This is the case until
> > > -         * unmapped from the IOMMU (e.g., during device reset).
> > > -         *
> > > -         * With malicious guests, we really only care about pinning more memory
> > > -         * than expected. RLIMIT_MEMLOCK set for the user/process can never be
> > > -         * exceeded and can be used to mitigate this problem.
> > > -         */
> > > -        warn_report_once("Using vfio with vIOMMUs and coordinated discarding of"
> > > -                         " RAM (e.g., virtio-mem) works, however, malicious"
> > > -                         " guests can trigger pinning of more memory than"
> > > -                         " intended via an IOMMU. It's possible to mitigate "
> > > -                         " by setting/adjusting RLIMIT_MEMLOCK.");
> > > -    }
> > > -
> > > -    /*
> > > -     * Translation truncates length to the IOMMU page size,
> > > -     * check that it did not truncate too much.
> > > -     */
> > > -    if (len & iotlb->addr_mask) {
> > > -        error_report("iommu has granularity incompatible with target AS");
> > > -        return false;
> > > -    }
> > > -
> > > -    if (vaddr) {
> > > -        *vaddr = memory_region_get_ram_ptr(mr) + xlat;
> > > -    }
> > > -
> > > -    if (ram_addr) {
> > > -        *ram_addr = memory_region_get_ram_addr(mr) + xlat;
> > > -    }
> > > -
> > > -    if (read_only) {
> > > -        *read_only = !writable || mr->readonly;
> > > -    }
> > > -
> > > -    return true;
> > > -}
> > > -
> > >  static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> > >  {
> > >      VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
> > > @@ -682,7 +596,8 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> > >      if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
> > >          bool read_only;
> > >
> > > -        if (!vfio_get_xlat_addr(iotlb, &vaddr, NULL, &read_only)) {
> > > +        if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only,
> > > +                                  &address_space_memory)) {
> > >              goto out;
> > >          }
> > >          /*
> > > @@ -1359,7 +1274,8 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> > >      }
> > >
> > >      rcu_read_lock();
> > > -    if (vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL)) {
> > > +    if (memory_get_xlat_addr(iotlb, NULL, &translated_addr, NULL,
> > > +                             &address_space_memory)) {
> > >          int ret;
> > >
> > >          ret = vfio_get_dirty_bitmap(container, iova, iotlb->addr_mask + 1,
> > > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > > index bfb1de8eea..282de1d5ad 100644
> > > --- a/include/exec/memory.h
> > > +++ b/include/exec/memory.h
> > > @@ -713,6 +713,10 @@ void ram_discard_manager_register_listener(RamDiscardManager *rdm,
> > >  void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
> > >                                               RamDiscardListener *rdl);
> > >
> > > +bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> > > +                          ram_addr_t *ram_addr, bool *read_only,
> > > +                          AddressSpace *as);
> > > +
> > >  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
> > >  typedef struct MemoryRegionIoeventfd MemoryRegionIoeventfd;
> > >
> > > diff --git a/softmmu/memory.c b/softmmu/memory.c
> > > index 7ba2048836..8586863ffa 100644
> > > --- a/softmmu/memory.c
> > > +++ b/softmmu/memory.c
> > > @@ -2121,6 +2121,90 @@ void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
> > >      rdmc->unregister_listener(rdm, rdl);
> > >  }
> > >
> > > +/* Called with rcu_read_lock held.  */
> > > +bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> > > +                          ram_addr_t *ram_addr, bool *read_only,
> > > +                          AddressSpace *as)
> > > +{
> > > +    MemoryRegion *mr;
> > > +    hwaddr xlat;
> > > +    hwaddr len = iotlb->addr_mask + 1;
> > > +    bool writable = iotlb->perm & IOMMU_WO;
> > > +
> > > +    /*
> > > +     * The IOMMU TLB entry we have just covers translation through
> > > +     * this IOMMU to its immediate target.  We need to translate
> > > +     * it the rest of the way through to memory.
> > > +     */
> > > +    mr = address_space_translate(as, iotlb->translated_addr, &xlat, &len,
> > > +                                 writable, MEMTXATTRS_UNSPECIFIED);
> > > +    if (!memory_region_is_ram(mr)) {
> > > +        error_report("iommu map to non memory area %" HWADDR_PRIx "", xlat);
> > > +        return false;
> > > +    } else if (memory_region_has_ram_discard_manager(mr)) {
> > > +        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(mr);
> > > +        MemoryRegionSection tmp = {
> > > +            .mr = mr,
> > > +            .offset_within_region = xlat,
> > > +            .size = int128_make64(len),
> > > +        };
> > > +
> > > +        /*
> > > +         * Malicious VMs can map memory into the IOMMU, which is expected
> > > +         * to remain discarded. device will pin all pages, populating memory.
> > > +         * Disallow that. vmstate priorities make sure any RamDiscardManager
> > > +         * were already restored before IOMMUs are restored.
> > > +         */
> > > +        if (!ram_discard_manager_is_populated(rdm, &tmp)) {
> > > +            error_report("iommu map to discarded memory (e.g., unplugged via"
> > > +                         " virtio-mem): %" HWADDR_PRIx "",
> > > +                         iotlb->translated_addr);
> > > +            return false;
> > > +        }
> > > +
> > > +        /*
> > > +         * Malicious VMs might trigger discarding of IOMMU-mapped memory. The
> > > +         * pages will remain pinned inside device until unmapped, resulting in a
> > > +         * higher memory consumption than expected. If memory would get
> > > +         * populated again later, there would be an inconsistency between pages
> > > +         * pinned by device and pages seen by QEMU. This is the case until
> > > +         * unmapped from the IOMMU (e.g., during device reset).
> > > +         *
> > > +         * With malicious guests, we really only care about pinning more memory
> > > +         * than expected. RLIMIT_MEMLOCK set for the user/process can never be
> > > +         * exceeded and can be used to mitigate this problem.
> > > +         */
> > > +        warn_report_once("Using device with vIOMMUs and coordinated discarding"
> > > +                         " of RAM (e.g., virtio-mem) works, however, malicious"
> > > +                         " guests can trigger pinning of more memory than"
> > > +                         " intended via an IOMMU. It's possible to mitigate "
> > > +                         " by setting/adjusting RLIMIT_MEMLOCK.");  
> >
> > Is this really fit to be in shared code?  Simply replacing "vfio" with
> > "device" for comments and warnings that are really of concern for a
> > specific use case doesn't look much better to me.
> >
> > I think translating an unpopulated address, as in the previous test
> > above, is generally invalid, but the comment is certainly trying to
> > frame the severity of this error relative to a specific use case.
> >
> > Here we're generating an unconditional warning, assuming that this code
> > path has been triggered by device code, for the condition of simply
> > asking for a translation to a MemoryRegion under discard manager
> > control?  Again, isn't that an action that has implications for a
> > specific use case of a device that supports pinning host memory?  
> 
> 
> > Or can we rename the function to memory_get_xlat_addr_no_discard()?
> > That name looks more general and fits callers that don't want to
> > map a region that has a discard manager.

Is a guest restricted from mapping virtio-mem regions to a device?
AFAIK, this is something a guest can do, and we can't restrict it
from doing so. It just opens some potential for malicious activity,
which we rely on things like locked memory limits to keep from
getting out of hand.  Thanks,

Alex
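
For reference, the locked memory limit mentioned above is the one
reported by getrlimit(RLIMIT_MEMLOCK). A minimal userspace sketch of
comparing a pinning budget against it follows. fits_memlock() and
memlock_allows() are hypothetical helper names, and the real accounting
is done by the kernel, not by code like this:

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: would pinning `bytes` more stay within `limit`,
 * given that `pinned` bytes are already accounted for?
 */
static bool fits_memlock(rlim_t limit, uint64_t pinned, uint64_t bytes)
{
    if (limit == RLIM_INFINITY) {
        return true;
    }
    if (pinned > limit) {
        return false;
    }
    /* Subtract instead of adding, so the check cannot overflow. */
    return bytes <= limit - pinned;
}

/* Same check against the soft RLIMIT_MEMLOCK of the calling process. */
static bool memlock_allows(uint64_t pinned, uint64_t bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        return false;  /* be conservative if the limit cannot be read */
    }
    return fits_memlock(rl.rlim_cur, pinned, bytes);
}
```

This only illustrates why a sane limit bounds how much extra memory a
malicious guest can get pinned; it is not a proposal for the patch.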

> > Should the shared code be generating this warning, or could an optional
> > pointer arg be updated to indicate a translation to discard manager
> > controlled memory and this comment and warning should remain in the
> > caller?  Thanks,  
> 
> I think this should also work.
> 
> Thanks
> 
> >
> > Alex
> >  
> > > +    }
> > > +
> > > +    /*
> > > +     * Translation truncates length to the IOMMU page size,
> > > +     * check that it did not truncate too much.
> > > +     */
> > > +    if (len & iotlb->addr_mask) {
> > > +        error_report("iommu has granularity incompatible with target AS");
> > > +        return false;
> > > +    }
> > > +
> > > +    if (vaddr) {
> > > +        *vaddr = memory_region_get_ram_ptr(mr) + xlat;
> > > +    }
> > > +
> > > +    if (ram_addr) {
> > > +        *ram_addr = memory_region_get_ram_addr(mr) + xlat;
> > > +    }
> > > +
> > > +    if (read_only) {
> > > +        *read_only = !writable || mr->readonly;
> > > +    }
> > > +
> > > +    return true;
> > > +}
> > > +
> > >  void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
> > >  {
> > >      uint8_t mask = 1 << client;  
> >  
> 




* Re: [PATCH v4 1/2] vfio: move the function vfio_get_xlat_addr() to memory.c
  2022-10-28  2:08       ` Alex Williamson
@ 2022-10-28  2:23         ` Jason Wang
  2022-10-28  2:35           ` Alex Williamson
  0 siblings, 1 reply; 14+ messages in thread
From: Jason Wang @ 2022-10-28  2:23 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Cindy Lu, mst, pbonzini, peterx, david, f4bug, sgarzare, qemu-devel

On Fri, Oct 28, 2022 at 10:08 AM Alex Williamson
<alex.williamson@redhat.com> wrote:
>
> On Fri, 28 Oct 2022 09:50:10 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
> > On Fri, Oct 28, 2022 at 5:11 AM Alex Williamson
> > <alex.williamson@redhat.com> wrote:
> > >
> > > On Thu, 27 Oct 2022 15:40:31 +0800
> > > Cindy Lu <lulu@redhat.com> wrote:
> > >
> > > > > Move the function vfio_get_xlat_addr() to softmmu/memory.c and
> > > > > rename it to memory_get_xlat_addr(), so that it can be used by
> > > > > other devices, such as vDPA devices.
> > > >
> > > > Signed-off-by: Cindy Lu <lulu@redhat.com>
> > > > ---
> > > >  hw/vfio/common.c      | 92 ++-----------------------------------------
> > > >  include/exec/memory.h |  4 ++
> > > >  softmmu/memory.c      | 84 +++++++++++++++++++++++++++++++++++++++
> > > >  3 files changed, 92 insertions(+), 88 deletions(-)
> > > >
> > > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > > > index ace9562a9b..2b5a9f3d8d 100644
> > > > --- a/hw/vfio/common.c
> > > > +++ b/hw/vfio/common.c
> > > > @@ -574,92 +574,6 @@ static bool vfio_listener_skipped_section(MemoryRegionSection *section)
> > > >             section->offset_within_address_space & (1ULL << 63);
> > > >  }
> > > >
> > > > -/* Called with rcu_read_lock held.  */
> > > > -static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> > > > -                               ram_addr_t *ram_addr, bool *read_only)
> > > > -{
> > > > -    MemoryRegion *mr;
> > > > -    hwaddr xlat;
> > > > -    hwaddr len = iotlb->addr_mask + 1;
> > > > -    bool writable = iotlb->perm & IOMMU_WO;
> > > > -
> > > > -    /*
> > > > -     * The IOMMU TLB entry we have just covers translation through
> > > > -     * this IOMMU to its immediate target.  We need to translate
> > > > -     * it the rest of the way through to memory.
> > > > -     */
> > > > -    mr = address_space_translate(&address_space_memory,
> > > > -                                 iotlb->translated_addr,
> > > > -                                 &xlat, &len, writable,
> > > > -                                 MEMTXATTRS_UNSPECIFIED);
> > > > -    if (!memory_region_is_ram(mr)) {
> > > > -        error_report("iommu map to non memory area %"HWADDR_PRIx"",
> > > > -                     xlat);
> > > > -        return false;
> > > > -    } else if (memory_region_has_ram_discard_manager(mr)) {
> > > > -        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(mr);
> > > > -        MemoryRegionSection tmp = {
> > > > -            .mr = mr,
> > > > -            .offset_within_region = xlat,
> > > > -            .size = int128_make64(len),
> > > > -        };
> > > > -
> > > > -        /*
> > > > -         * Malicious VMs can map memory into the IOMMU, which is expected
> > > > -         * to remain discarded. vfio will pin all pages, populating memory.
> > > > -         * Disallow that. vmstate priorities make sure any RamDiscardManager
> > > > -         * were already restored before IOMMUs are restored.
> > > > -         */
> > > > -        if (!ram_discard_manager_is_populated(rdm, &tmp)) {
> > > > -            error_report("iommu map to discarded memory (e.g., unplugged via"
> > > > -                         " virtio-mem): %"HWADDR_PRIx"",
> > > > -                         iotlb->translated_addr);
> > > > -            return false;
> > > > -        }
> > > > -
> > > > -        /*
> > > > -         * Malicious VMs might trigger discarding of IOMMU-mapped memory. The
> > > > -         * pages will remain pinned inside vfio until unmapped, resulting in a
> > > > -         * higher memory consumption than expected. If memory would get
> > > > -         * populated again later, there would be an inconsistency between pages
> > > > -         * pinned by vfio and pages seen by QEMU. This is the case until
> > > > -         * unmapped from the IOMMU (e.g., during device reset).
> > > > -         *
> > > > -         * With malicious guests, we really only care about pinning more memory
> > > > -         * than expected. RLIMIT_MEMLOCK set for the user/process can never be
> > > > -         * exceeded and can be used to mitigate this problem.
> > > > -         */
> > > > -        warn_report_once("Using vfio with vIOMMUs and coordinated discarding of"
> > > > -                         " RAM (e.g., virtio-mem) works, however, malicious"
> > > > -                         " guests can trigger pinning of more memory than"
> > > > -                         " intended via an IOMMU. It's possible to mitigate "
> > > > -                         " by setting/adjusting RLIMIT_MEMLOCK.");
> > > > -    }
> > > > -
> > > > -    /*
> > > > -     * Translation truncates length to the IOMMU page size,
> > > > -     * check that it did not truncate too much.
> > > > -     */
> > > > -    if (len & iotlb->addr_mask) {
> > > > -        error_report("iommu has granularity incompatible with target AS");
> > > > -        return false;
> > > > -    }
> > > > -
> > > > -    if (vaddr) {
> > > > -        *vaddr = memory_region_get_ram_ptr(mr) + xlat;
> > > > -    }
> > > > -
> > > > -    if (ram_addr) {
> > > > -        *ram_addr = memory_region_get_ram_addr(mr) + xlat;
> > > > -    }
> > > > -
> > > > -    if (read_only) {
> > > > -        *read_only = !writable || mr->readonly;
> > > > -    }
> > > > -
> > > > -    return true;
> > > > -}
> > > > -
> > > >  static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> > > >  {
> > > >      VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
> > > > @@ -682,7 +596,8 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> > > >      if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
> > > >          bool read_only;
> > > >
> > > > -        if (!vfio_get_xlat_addr(iotlb, &vaddr, NULL, &read_only)) {
> > > > +        if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only,
> > > > +                                  &address_space_memory)) {
> > > >              goto out;
> > > >          }
> > > >          /*
> > > > @@ -1359,7 +1274,8 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> > > >      }
> > > >
> > > >      rcu_read_lock();
> > > > -    if (vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL)) {
> > > > +    if (memory_get_xlat_addr(iotlb, NULL, &translated_addr, NULL,
> > > > +                             &address_space_memory)) {
> > > >          int ret;
> > > >
> > > >          ret = vfio_get_dirty_bitmap(container, iova, iotlb->addr_mask + 1,
> > > > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > > > index bfb1de8eea..282de1d5ad 100644
> > > > --- a/include/exec/memory.h
> > > > +++ b/include/exec/memory.h
> > > > @@ -713,6 +713,10 @@ void ram_discard_manager_register_listener(RamDiscardManager *rdm,
> > > >  void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
> > > >                                               RamDiscardListener *rdl);
> > > >
> > > > +bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> > > > +                          ram_addr_t *ram_addr, bool *read_only,
> > > > +                          AddressSpace *as);
> > > > +
> > > >  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
> > > >  typedef struct MemoryRegionIoeventfd MemoryRegionIoeventfd;
> > > >
> > > > diff --git a/softmmu/memory.c b/softmmu/memory.c
> > > > index 7ba2048836..8586863ffa 100644
> > > > --- a/softmmu/memory.c
> > > > +++ b/softmmu/memory.c
> > > > @@ -2121,6 +2121,90 @@ void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
> > > >      rdmc->unregister_listener(rdm, rdl);
> > > >  }
> > > >
> > > > +/* Called with rcu_read_lock held.  */
> > > > +bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> > > > +                          ram_addr_t *ram_addr, bool *read_only,
> > > > +                          AddressSpace *as)
> > > > +{
> > > > +    MemoryRegion *mr;
> > > > +    hwaddr xlat;
> > > > +    hwaddr len = iotlb->addr_mask + 1;
> > > > +    bool writable = iotlb->perm & IOMMU_WO;
> > > > +
> > > > +    /*
> > > > +     * The IOMMU TLB entry we have just covers translation through
> > > > +     * this IOMMU to its immediate target.  We need to translate
> > > > +     * it the rest of the way through to memory.
> > > > +     */
> > > > +    mr = address_space_translate(as, iotlb->translated_addr, &xlat, &len,
> > > > +                                 writable, MEMTXATTRS_UNSPECIFIED);
> > > > +    if (!memory_region_is_ram(mr)) {
> > > > +        error_report("iommu map to non memory area %" HWADDR_PRIx "", xlat);
> > > > +        return false;
> > > > +    } else if (memory_region_has_ram_discard_manager(mr)) {
> > > > +        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(mr);
> > > > +        MemoryRegionSection tmp = {
> > > > +            .mr = mr,
> > > > +            .offset_within_region = xlat,
> > > > +            .size = int128_make64(len),
> > > > +        };
> > > > +
> > > > +        /*
> > > > +         * Malicious VMs can map memory into the IOMMU, which is expected
> > > > +         * to remain discarded. device will pin all pages, populating memory.
> > > > +         * Disallow that. vmstate priorities make sure any RamDiscardManager
> > > > +         * were already restored before IOMMUs are restored.
> > > > +         */
> > > > +        if (!ram_discard_manager_is_populated(rdm, &tmp)) {
> > > > +            error_report("iommu map to discarded memory (e.g., unplugged via"
> > > > +                         " virtio-mem): %" HWADDR_PRIx "",
> > > > +                         iotlb->translated_addr);
> > > > +            return false;
> > > > +        }
> > > > +
> > > > +        /*
> > > > +         * Malicious VMs might trigger discarding of IOMMU-mapped memory. The
> > > > +         * pages will remain pinned inside device until unmapped, resulting in a
> > > > +         * higher memory consumption than expected. If memory would get
> > > > +         * populated again later, there would be an inconsistency between pages
> > > > +         * pinned by device and pages seen by QEMU. This is the case until
> > > > +         * unmapped from the IOMMU (e.g., during device reset).
> > > > +         *
> > > > +         * With malicious guests, we really only care about pinning more memory
> > > > +         * than expected. RLIMIT_MEMLOCK set for the user/process can never be
> > > > +         * exceeded and can be used to mitigate this problem.
> > > > +         */
> > > > +        warn_report_once("Using device with vIOMMUs and coordinated discarding"
> > > > +                         " of RAM (e.g., virtio-mem) works, however, malicious"
> > > > +                         " guests can trigger pinning of more memory than"
> > > > +                         " intended via an IOMMU. It's possible to mitigate "
> > > > +                         " by setting/adjusting RLIMIT_MEMLOCK.");
> > >
> > > Is this really fit to be in shared code?  Simply replacing "vfio" with
> > > "device" for comments and warnings that are really of concern for a
> > > specific use case doesn't look much better to me.
> > >
> > > I think translating an unpopulated address, as in the previous test
> > > above, is generally invalid, but the comment is certainly trying to
> > > frame the severity of this error relative to a specific use case.
> > >
> > > Here we're generating an unconditional warning, assuming that this code
> > > path has been triggered by device code, for the condition of simply
> > > asking for a translation to a MemoryRegion under discard manager
> > > control?  Again, isn't that an action that has implications for a
> > > specific use case of a device that supports pinning host memory?
> >
> >
> > > Or can we rename the function to memory_get_xlat_addr_no_discard()?
> > > That name looks more general and fits callers that don't want to
> > > map a region that has a discard manager.
>
> Is a guest restricted from mapping virtio-mem regions to a device?

For a region that is not populated, mapping should be restricted. If
this is wrong, we need a separate fix.

> AFAIK, this is something a guest can do, and we can't restrict it
> from doing so. It just opens some potential for malicious activity,
> which we rely on things like locked memory limits to keep from
> getting out of hand.  Thanks,

So the problem is the name; it could be
memory_get_xlat_addr_no_unpopulated_discard().

Or, as I replied, we can stick to what you suggested.

Thanks

>
> Alex
>
> > > Should the shared code be generating this warning, or could an optional
> > > pointer arg be updated to indicate a translation to discard manager
> > > controlled memory and this comment and warning should remain in the
> > > caller?  Thanks,
> >
> > I think this should also work.
> >
> > Thanks
> >
> > >
> > > Alex
> > >
> > > > +    }
> > > > +
> > > > +    /*
> > > > +     * Translation truncates length to the IOMMU page size,
> > > > +     * check that it did not truncate too much.
> > > > +     */
> > > > +    if (len & iotlb->addr_mask) {
> > > > +        error_report("iommu has granularity incompatible with target AS");
> > > > +        return false;
> > > > +    }
> > > > +
> > > > +    if (vaddr) {
> > > > +        *vaddr = memory_region_get_ram_ptr(mr) + xlat;
> > > > +    }
> > > > +
> > > > +    if (ram_addr) {
> > > > +        *ram_addr = memory_region_get_ram_addr(mr) + xlat;
> > > > +    }
> > > > +
> > > > +    if (read_only) {
> > > > +        *read_only = !writable || mr->readonly;
> > > > +    }
> > > > +
> > > > +    return true;
> > > > +}
> > > > +
> > > >  void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
> > > >  {
> > > >      uint8_t mask = 1 << client;
> > >
> >
>




* Re: [PATCH v4 1/2] vfio: move the function vfio_get_xlat_addr() to memory.c
  2022-10-28  2:23         ` Jason Wang
@ 2022-10-28  2:35           ` Alex Williamson
  2022-10-28  2:40             ` Jason Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Alex Williamson @ 2022-10-28  2:35 UTC (permalink / raw)
  To: Jason Wang
  Cc: Cindy Lu, mst, pbonzini, peterx, david, f4bug, sgarzare, qemu-devel

On Fri, 28 Oct 2022 10:23:16 +0800
Jason Wang <jasowang@redhat.com> wrote:

> On Fri, Oct 28, 2022 at 10:08 AM Alex Williamson
> <alex.williamson@redhat.com> wrote:
> >
> > On Fri, 28 Oct 2022 09:50:10 +0800
> > Jason Wang <jasowang@redhat.com> wrote:
> >  
> > > On Fri, Oct 28, 2022 at 5:11 AM Alex Williamson
> > > <alex.williamson@redhat.com> wrote:  
> > > >
> > > > On Thu, 27 Oct 2022 15:40:31 +0800
> > > > Cindy Lu <lulu@redhat.com> wrote:
> > > >  
> > > > > > Move the function vfio_get_xlat_addr() to softmmu/memory.c and
> > > > > > rename it to memory_get_xlat_addr(), so that it can be used by
> > > > > > other devices, such as vDPA devices.
> > > > >
> > > > > Signed-off-by: Cindy Lu <lulu@redhat.com>
> > > > > ---
> > > > >  hw/vfio/common.c      | 92 ++-----------------------------------------
> > > > >  include/exec/memory.h |  4 ++
> > > > >  softmmu/memory.c      | 84 +++++++++++++++++++++++++++++++++++++++
> > > > >  3 files changed, 92 insertions(+), 88 deletions(-)
> > > > >
> > > > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > > > > index ace9562a9b..2b5a9f3d8d 100644
> > > > > --- a/hw/vfio/common.c
> > > > > +++ b/hw/vfio/common.c
> > > > > @@ -574,92 +574,6 @@ static bool vfio_listener_skipped_section(MemoryRegionSection *section)
> > > > >             section->offset_within_address_space & (1ULL << 63);
> > > > >  }
> > > > >
> > > > > -/* Called with rcu_read_lock held.  */
> > > > > -static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> > > > > -                               ram_addr_t *ram_addr, bool *read_only)
> > > > > -{
> > > > > -    MemoryRegion *mr;
> > > > > -    hwaddr xlat;
> > > > > -    hwaddr len = iotlb->addr_mask + 1;
> > > > > -    bool writable = iotlb->perm & IOMMU_WO;
> > > > > -
> > > > > -    /*
> > > > > -     * The IOMMU TLB entry we have just covers translation through
> > > > > -     * this IOMMU to its immediate target.  We need to translate
> > > > > -     * it the rest of the way through to memory.
> > > > > -     */
> > > > > -    mr = address_space_translate(&address_space_memory,
> > > > > -                                 iotlb->translated_addr,
> > > > > -                                 &xlat, &len, writable,
> > > > > -                                 MEMTXATTRS_UNSPECIFIED);
> > > > > -    if (!memory_region_is_ram(mr)) {
> > > > > -        error_report("iommu map to non memory area %"HWADDR_PRIx"",
> > > > > -                     xlat);
> > > > > -        return false;
> > > > > -    } else if (memory_region_has_ram_discard_manager(mr)) {
> > > > > -        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(mr);
> > > > > -        MemoryRegionSection tmp = {
> > > > > -            .mr = mr,
> > > > > -            .offset_within_region = xlat,
> > > > > -            .size = int128_make64(len),
> > > > > -        };
> > > > > -
> > > > > -        /*
> > > > > -         * Malicious VMs can map memory into the IOMMU, which is expected
> > > > > -         * to remain discarded. vfio will pin all pages, populating memory.
> > > > > -         * Disallow that. vmstate priorities make sure any RamDiscardManager
> > > > > -         * were already restored before IOMMUs are restored.
> > > > > -         */
> > > > > -        if (!ram_discard_manager_is_populated(rdm, &tmp)) {
> > > > > -            error_report("iommu map to discarded memory (e.g., unplugged via"
> > > > > -                         " virtio-mem): %"HWADDR_PRIx"",
> > > > > -                         iotlb->translated_addr);
> > > > > -            return false;
> > > > > -        }
> > > > > -
> > > > > -        /*
> > > > > -         * Malicious VMs might trigger discarding of IOMMU-mapped memory. The
> > > > > -         * pages will remain pinned inside vfio until unmapped, resulting in a
> > > > > -         * higher memory consumption than expected. If memory would get
> > > > > -         * populated again later, there would be an inconsistency between pages
> > > > > -         * pinned by vfio and pages seen by QEMU. This is the case until
> > > > > -         * unmapped from the IOMMU (e.g., during device reset).
> > > > > -         *
> > > > > -         * With malicious guests, we really only care about pinning more memory
> > > > > -         * than expected. RLIMIT_MEMLOCK set for the user/process can never be
> > > > > -         * exceeded and can be used to mitigate this problem.
> > > > > -         */
> > > > > -        warn_report_once("Using vfio with vIOMMUs and coordinated discarding of"
> > > > > -                         " RAM (e.g., virtio-mem) works, however, malicious"
> > > > > -                         " guests can trigger pinning of more memory than"
> > > > > -                         " intended via an IOMMU. It's possible to mitigate "
> > > > > -                         " by setting/adjusting RLIMIT_MEMLOCK.");
> > > > > -    }
> > > > > -
> > > > > -    /*
> > > > > -     * Translation truncates length to the IOMMU page size,
> > > > > -     * check that it did not truncate too much.
> > > > > -     */
> > > > > -    if (len & iotlb->addr_mask) {
> > > > > -        error_report("iommu has granularity incompatible with target AS");
> > > > > -        return false;
> > > > > -    }
> > > > > -
> > > > > -    if (vaddr) {
> > > > > -        *vaddr = memory_region_get_ram_ptr(mr) + xlat;
> > > > > -    }
> > > > > -
> > > > > -    if (ram_addr) {
> > > > > -        *ram_addr = memory_region_get_ram_addr(mr) + xlat;
> > > > > -    }
> > > > > -
> > > > > -    if (read_only) {
> > > > > -        *read_only = !writable || mr->readonly;
> > > > > -    }
> > > > > -
> > > > > -    return true;
> > > > > -}
> > > > > -
> > > > >  static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> > > > >  {
> > > > >      VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
> > > > > @@ -682,7 +596,8 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> > > > >      if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
> > > > >          bool read_only;
> > > > >
> > > > > -        if (!vfio_get_xlat_addr(iotlb, &vaddr, NULL, &read_only)) {
> > > > > +        if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only,
> > > > > +                                  &address_space_memory)) {
> > > > >              goto out;
> > > > >          }
> > > > >          /*
> > > > > @@ -1359,7 +1274,8 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> > > > >      }
> > > > >
> > > > >      rcu_read_lock();
> > > > > -    if (vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL)) {
> > > > > +    if (memory_get_xlat_addr(iotlb, NULL, &translated_addr, NULL,
> > > > > +                             &address_space_memory)) {
> > > > >          int ret;
> > > > >
> > > > >          ret = vfio_get_dirty_bitmap(container, iova, iotlb->addr_mask + 1,
> > > > > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > > > > index bfb1de8eea..282de1d5ad 100644
> > > > > --- a/include/exec/memory.h
> > > > > +++ b/include/exec/memory.h
> > > > > @@ -713,6 +713,10 @@ void ram_discard_manager_register_listener(RamDiscardManager *rdm,
> > > > >  void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
> > > > >                                               RamDiscardListener *rdl);
> > > > >
> > > > > +bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> > > > > +                          ram_addr_t *ram_addr, bool *read_only,
> > > > > +                          AddressSpace *as);
> > > > > +
> > > > >  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
> > > > >  typedef struct MemoryRegionIoeventfd MemoryRegionIoeventfd;
> > > > >
> > > > > diff --git a/softmmu/memory.c b/softmmu/memory.c
> > > > > index 7ba2048836..8586863ffa 100644
> > > > > --- a/softmmu/memory.c
> > > > > +++ b/softmmu/memory.c
> > > > > @@ -2121,6 +2121,90 @@ void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
> > > > >      rdmc->unregister_listener(rdm, rdl);
> > > > >  }
> > > > >
> > > > > +/* Called with rcu_read_lock held.  */
> > > > > +bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> > > > > +                          ram_addr_t *ram_addr, bool *read_only,
> > > > > +                          AddressSpace *as)
> > > > > +{
> > > > > +    MemoryRegion *mr;
> > > > > +    hwaddr xlat;
> > > > > +    hwaddr len = iotlb->addr_mask + 1;
> > > > > +    bool writable = iotlb->perm & IOMMU_WO;
> > > > > +
> > > > > +    /*
> > > > > +     * The IOMMU TLB entry we have just covers translation through
> > > > > +     * this IOMMU to its immediate target.  We need to translate
> > > > > +     * it the rest of the way through to memory.
> > > > > +     */
> > > > > +    mr = address_space_translate(as, iotlb->translated_addr, &xlat, &len,
> > > > > +                                 writable, MEMTXATTRS_UNSPECIFIED);
> > > > > +    if (!memory_region_is_ram(mr)) {
> > > > > +        error_report("iommu map to non memory area %" HWADDR_PRIx "", xlat);
> > > > > +        return false;
> > > > > +    } else if (memory_region_has_ram_discard_manager(mr)) {
> > > > > +        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(mr);
> > > > > +        MemoryRegionSection tmp = {
> > > > > +            .mr = mr,
> > > > > +            .offset_within_region = xlat,
> > > > > +            .size = int128_make64(len),
> > > > > +        };
> > > > > +
> > > > > +        /*
> > > > > +         * Malicious VMs can map memory into the IOMMU, which is expected
> > > > > +         * to remain discarded. device will pin all pages, populating memory.
> > > > > +         * Disallow that. vmstate priorities make sure any RamDiscardManager
> > > > > +         * were already restored before IOMMUs are restored.
> > > > > +         */
> > > > > +        if (!ram_discard_manager_is_populated(rdm, &tmp)) {
> > > > > +            error_report("iommu map to discarded memory (e.g., unplugged via"
> > > > > +                         " virtio-mem): %" HWADDR_PRIx "",
> > > > > +                         iotlb->translated_addr);
> > > > > +            return false;
> > > > > +        }
> > > > > +
> > > > > +        /*
> > > > > +         * Malicious VMs might trigger discarding of IOMMU-mapped memory. The
> > > > > +         * pages will remain pinned inside device until unmapped, resulting in a
> > > > > +         * higher memory consumption than expected. If memory would get
> > > > > +         * populated again later, there would be an inconsistency between pages
> > > > > +         * pinned by device and pages seen by QEMU. This is the case until
> > > > > +         * unmapped from the IOMMU (e.g., during device reset).
> > > > > +         *
> > > > > +         * With malicious guests, we really only care about pinning more memory
> > > > > +         * than expected. RLIMIT_MEMLOCK set for the user/process can never be
> > > > > +         * exceeded and can be used to mitigate this problem.
> > > > > +         */
> > > > > +        warn_report_once("Using device with vIOMMUs and coordinated discarding"
> > > > > +                         " of RAM (e.g., virtio-mem) works, however, malicious"
> > > > > +                         " guests can trigger pinning of more memory than"
> > > > > +                         " intended via an IOMMU. It's possible to mitigate "
> > > > > +                         " by setting/adjusting RLIMIT_MEMLOCK.");  
> > > >
> > > > Is this really fit to be in shared code?  Simply replacing "vfio" with
> > > > "device" for comments and warnings that are really of concern for a
> > > > specific use case doesn't look much better to me.
> > > >
> > > > I think translating an unpopulated address, as in the previous test
> > > > above, is generally invalid, but the comment is certainly trying to
> > > > frame the severity of this error relative to a specific use case.
> > > >
> > > > Here we're generating an unconditional warning, assuming that this code
> > > > path has been triggered by device code, for the condition of simply
> > > > asking for a translation to a MemoryRegion under discard manager
> > > > control?  Again, isn't that an action that has implications for a
> > > > specific use case of a device that supports pinning host memory?  
> > >
> > >
> > > Or can we rename the function to memory_get_xlat_addr_no_discard()?
> > > This looks more general and fit for the caller that doesn't want to
> > > map region that has a discard manager.  
> >
> > Is a guest restricted from mapping virtio-mem regions to a device?  
> 
> For the region that is not populated, it should be restricted. If this
> is wrong, we need a separate fix.
> 
> > AFAIK, this is something that a guest can do and we can't restrict them
> > from doing it, it's just that it opens some potential for malicious
> > activity that we rely on things like locked memory limits to keep from
> > getting out of hand.  Thanks,  
> 
> So it's the fault of the name, it could be
> memory_get_xlat_addr_no_unpopulated_discard().

An unpopulated discard region has no translation; it's invalid.  That's
the previous test above, where we return false.  The comment there is a
bit vfio-specific, but I think the behavior is universal.  That doesn't
resolve this warn_report_once() for simply trying to translate something
backed by virtio-mem, though.  Thanks,

Alex
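
To make the distinction above concrete, here is a minimal, self-contained
C sketch. It is not QEMU code: struct xlat_target, enum xlat_result and
classify_xlat() are hypothetical names, and the three booleans only stand
in for the results of memory_region_is_ram(),
memory_region_has_ram_discard_manager() and
ram_discard_manager_is_populated() as used in the patch. It shows that an
unpopulated target is rejected for every caller, while a populated target
under discard manager control is still a valid translation.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the properties the patch inspects on the MR. */
struct xlat_target {
    bool is_ram;               /* models memory_region_is_ram(mr) */
    bool has_discard_manager;  /* models memory_region_has_ram_discard_manager(mr) */
    bool is_populated;         /* models ram_discard_manager_is_populated(rdm, &tmp) */
};

enum xlat_result {
    XLAT_INVALID,              /* no usable translation: the helper returns false */
    XLAT_OK,                   /* plain RAM, nothing special */
    XLAT_OK_DISCARD_MANAGED,   /* valid, but of interest to callers that pin memory */
};

/* Classify a translation target the way the discussion above describes. */
enum xlat_result classify_xlat(const struct xlat_target *t)
{
    if (!t->is_ram) {
        return XLAT_INVALID;            /* "iommu map to non memory area" */
    }
    if (!t->has_discard_manager) {
        return XLAT_OK;
    }
    if (!t->is_populated) {
        return XLAT_INVALID;            /* unpopulated: invalid for any caller */
    }
    return XLAT_OK_DISCARD_MANAGED;     /* the case that triggers the warning today */
}
```

Only the last outcome is the one whose handling is under debate: the
translation succeeds, and whether a warning is appropriate depends on what
the caller does with the result.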
 
> Or as I replied, stick to what you've suggested.
> 
> Thanks
> 
> >
> > Alex
> >  
> > > > Should the shared code be generating this warning, or could an optional
> > > > pointer arg be updated to indicate a translation to discard manager
> > > > controlled memory and this comment and warning should remain in the
> > > > caller?  Thanks,  
> > >
> > > I think this should also work.
> > >
> > > Thanks
> > >  
> > > >
> > > > Alex
> > > >  
> > > > > +    }
> > > > > +
> > > > > +    /*
> > > > > +     * Translation truncates length to the IOMMU page size,
> > > > > +     * check that it did not truncate too much.
> > > > > +     */
> > > > > +    if (len & iotlb->addr_mask) {
> > > > > +        error_report("iommu has granularity incompatible with target AS");
> > > > > +        return false;
> > > > > +    }
> > > > > +
> > > > > +    if (vaddr) {
> > > > > +        *vaddr = memory_region_get_ram_ptr(mr) + xlat;
> > > > > +    }
> > > > > +
> > > > > +    if (ram_addr) {
> > > > > +        *ram_addr = memory_region_get_ram_addr(mr) + xlat;
> > > > > +    }
> > > > > +
> > > > > +    if (read_only) {
> > > > > +        *read_only = !writable || mr->readonly;
> > > > > +    }
> > > > > +
> > > > > +    return true;
> > > > > +}
> > > > > +
> > > > >  void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
> > > > >  {
> > > > >      uint8_t mask = 1 << client;  
> > > >  
> > >  
> >  
> 




* Re: [PATCH v4 1/2] vfio: move the function vfio_get_xlat_addr() to memory.c
  2022-10-28  2:35           ` Alex Williamson
@ 2022-10-28  2:40             ` Jason Wang
  0 siblings, 0 replies; 14+ messages in thread
From: Jason Wang @ 2022-10-28  2:40 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Cindy Lu, mst, pbonzini, peterx, david, f4bug, sgarzare, qemu-devel

On Fri, Oct 28, 2022 at 10:35 AM Alex Williamson
<alex.williamson@redhat.com> wrote:
>
> On Fri, 28 Oct 2022 10:23:16 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
> > On Fri, Oct 28, 2022 at 10:08 AM Alex Williamson
> > <alex.williamson@redhat.com> wrote:
> > >
> > > On Fri, 28 Oct 2022 09:50:10 +0800
> > > Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > > On Fri, Oct 28, 2022 at 5:11 AM Alex Williamson
> > > > <alex.williamson@redhat.com> wrote:
> > > > >
> > > > > On Thu, 27 Oct 2022 15:40:31 +0800
> > > > > Cindy Lu <lulu@redhat.com> wrote:
> > > > >
> > > > > > > Move the function vfio_get_xlat_addr() to softmmu/memory.c and
> > > > > > > rename it to memory_get_xlat_addr(), so that other devices,
> > > > > > > such as vDPA devices, can use it.
> > > > > >
> > > > > > Signed-off-by: Cindy Lu <lulu@redhat.com>
> > > > > > ---
> > > > > >  hw/vfio/common.c      | 92 ++-----------------------------------------
> > > > > >  include/exec/memory.h |  4 ++
> > > > > >  softmmu/memory.c      | 84 +++++++++++++++++++++++++++++++++++++++
> > > > > >  3 files changed, 92 insertions(+), 88 deletions(-)
> > > > > >
> > > > > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > > > > > index ace9562a9b..2b5a9f3d8d 100644
> > > > > > --- a/hw/vfio/common.c
> > > > > > +++ b/hw/vfio/common.c
> > > > > > @@ -574,92 +574,6 @@ static bool vfio_listener_skipped_section(MemoryRegionSection *section)
> > > > > >             section->offset_within_address_space & (1ULL << 63);
> > > > > >  }
> > > > > >
> > > > > > -/* Called with rcu_read_lock held.  */
> > > > > > -static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> > > > > > -                               ram_addr_t *ram_addr, bool *read_only)
> > > > > > -{
> > > > > > -    MemoryRegion *mr;
> > > > > > -    hwaddr xlat;
> > > > > > -    hwaddr len = iotlb->addr_mask + 1;
> > > > > > -    bool writable = iotlb->perm & IOMMU_WO;
> > > > > > -
> > > > > > -    /*
> > > > > > -     * The IOMMU TLB entry we have just covers translation through
> > > > > > -     * this IOMMU to its immediate target.  We need to translate
> > > > > > -     * it the rest of the way through to memory.
> > > > > > -     */
> > > > > > -    mr = address_space_translate(&address_space_memory,
> > > > > > -                                 iotlb->translated_addr,
> > > > > > -                                 &xlat, &len, writable,
> > > > > > -                                 MEMTXATTRS_UNSPECIFIED);
> > > > > > -    if (!memory_region_is_ram(mr)) {
> > > > > > -        error_report("iommu map to non memory area %"HWADDR_PRIx"",
> > > > > > -                     xlat);
> > > > > > -        return false;
> > > > > > -    } else if (memory_region_has_ram_discard_manager(mr)) {
> > > > > > -        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(mr);
> > > > > > -        MemoryRegionSection tmp = {
> > > > > > -            .mr = mr,
> > > > > > -            .offset_within_region = xlat,
> > > > > > -            .size = int128_make64(len),
> > > > > > -        };
> > > > > > -
> > > > > > -        /*
> > > > > > -         * Malicious VMs can map memory into the IOMMU, which is expected
> > > > > > -         * to remain discarded. vfio will pin all pages, populating memory.
> > > > > > -         * Disallow that. vmstate priorities make sure any RamDiscardManager
> > > > > > -         * were already restored before IOMMUs are restored.
> > > > > > -         */
> > > > > > -        if (!ram_discard_manager_is_populated(rdm, &tmp)) {
> > > > > > -            error_report("iommu map to discarded memory (e.g., unplugged via"
> > > > > > -                         " virtio-mem): %"HWADDR_PRIx"",
> > > > > > -                         iotlb->translated_addr);
> > > > > > -            return false;
> > > > > > -        }
> > > > > > -
> > > > > > -        /*
> > > > > > -         * Malicious VMs might trigger discarding of IOMMU-mapped memory. The
> > > > > > -         * pages will remain pinned inside vfio until unmapped, resulting in a
> > > > > > -         * higher memory consumption than expected. If memory would get
> > > > > > -         * populated again later, there would be an inconsistency between pages
> > > > > > -         * pinned by vfio and pages seen by QEMU. This is the case until
> > > > > > -         * unmapped from the IOMMU (e.g., during device reset).
> > > > > > -         *
> > > > > > -         * With malicious guests, we really only care about pinning more memory
> > > > > > -         * than expected. RLIMIT_MEMLOCK set for the user/process can never be
> > > > > > -         * exceeded and can be used to mitigate this problem.
> > > > > > -         */
> > > > > > -        warn_report_once("Using vfio with vIOMMUs and coordinated discarding of"
> > > > > > -                         " RAM (e.g., virtio-mem) works, however, malicious"
> > > > > > -                         " guests can trigger pinning of more memory than"
> > > > > > -                         " intended via an IOMMU. It's possible to mitigate "
> > > > > > -                         " by setting/adjusting RLIMIT_MEMLOCK.");
> > > > > > -    }
> > > > > > -
> > > > > > -    /*
> > > > > > -     * Translation truncates length to the IOMMU page size,
> > > > > > -     * check that it did not truncate too much.
> > > > > > -     */
> > > > > > -    if (len & iotlb->addr_mask) {
> > > > > > -        error_report("iommu has granularity incompatible with target AS");
> > > > > > -        return false;
> > > > > > -    }
> > > > > > -
> > > > > > -    if (vaddr) {
> > > > > > -        *vaddr = memory_region_get_ram_ptr(mr) + xlat;
> > > > > > -    }
> > > > > > -
> > > > > > -    if (ram_addr) {
> > > > > > -        *ram_addr = memory_region_get_ram_addr(mr) + xlat;
> > > > > > -    }
> > > > > > -
> > > > > > -    if (read_only) {
> > > > > > -        *read_only = !writable || mr->readonly;
> > > > > > -    }
> > > > > > -
> > > > > > -    return true;
> > > > > > -}
> > > > > > -
> > > > > >  static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> > > > > >  {
> > > > > >      VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
> > > > > > @@ -682,7 +596,8 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> > > > > >      if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
> > > > > >          bool read_only;
> > > > > >
> > > > > > -        if (!vfio_get_xlat_addr(iotlb, &vaddr, NULL, &read_only)) {
> > > > > > +        if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only,
> > > > > > +                                  &address_space_memory)) {
> > > > > >              goto out;
> > > > > >          }
> > > > > >          /*
> > > > > > @@ -1359,7 +1274,8 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> > > > > >      }
> > > > > >
> > > > > >      rcu_read_lock();
> > > > > > -    if (vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL)) {
> > > > > > +    if (memory_get_xlat_addr(iotlb, NULL, &translated_addr, NULL,
> > > > > > +                             &address_space_memory)) {
> > > > > >          int ret;
> > > > > >
> > > > > >          ret = vfio_get_dirty_bitmap(container, iova, iotlb->addr_mask + 1,
> > > > > > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > > > > > index bfb1de8eea..282de1d5ad 100644
> > > > > > --- a/include/exec/memory.h
> > > > > > +++ b/include/exec/memory.h
> > > > > > @@ -713,6 +713,10 @@ void ram_discard_manager_register_listener(RamDiscardManager *rdm,
> > > > > >  void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
> > > > > >                                               RamDiscardListener *rdl);
> > > > > >
> > > > > > +bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> > > > > > +                          ram_addr_t *ram_addr, bool *read_only,
> > > > > > +                          AddressSpace *as);
> > > > > > +
> > > > > >  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
> > > > > >  typedef struct MemoryRegionIoeventfd MemoryRegionIoeventfd;
> > > > > >
> > > > > > diff --git a/softmmu/memory.c b/softmmu/memory.c
> > > > > > index 7ba2048836..8586863ffa 100644
> > > > > > --- a/softmmu/memory.c
> > > > > > +++ b/softmmu/memory.c
> > > > > > @@ -2121,6 +2121,90 @@ void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
> > > > > >      rdmc->unregister_listener(rdm, rdl);
> > > > > >  }
> > > > > >
> > > > > > +/* Called with rcu_read_lock held.  */
> > > > > > +bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> > > > > > +                          ram_addr_t *ram_addr, bool *read_only,
> > > > > > +                          AddressSpace *as)
> > > > > > +{
> > > > > > +    MemoryRegion *mr;
> > > > > > +    hwaddr xlat;
> > > > > > +    hwaddr len = iotlb->addr_mask + 1;
> > > > > > +    bool writable = iotlb->perm & IOMMU_WO;
> > > > > > +
> > > > > > +    /*
> > > > > > +     * The IOMMU TLB entry we have just covers translation through
> > > > > > +     * this IOMMU to its immediate target.  We need to translate
> > > > > > +     * it the rest of the way through to memory.
> > > > > > +     */
> > > > > > +    mr = address_space_translate(as, iotlb->translated_addr, &xlat, &len,
> > > > > > +                                 writable, MEMTXATTRS_UNSPECIFIED);
> > > > > > +    if (!memory_region_is_ram(mr)) {
> > > > > > +        error_report("iommu map to non memory area %" HWADDR_PRIx "", xlat);
> > > > > > +        return false;
> > > > > > +    } else if (memory_region_has_ram_discard_manager(mr)) {
> > > > > > +        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(mr);
> > > > > > +        MemoryRegionSection tmp = {
> > > > > > +            .mr = mr,
> > > > > > +            .offset_within_region = xlat,
> > > > > > +            .size = int128_make64(len),
> > > > > > +        };
> > > > > > +
> > > > > > +        /*
> > > > > > +         * Malicious VMs can map memory into the IOMMU, which is expected
> > > > > > +         * to remain discarded. device will pin all pages, populating memory.
> > > > > > +         * Disallow that. vmstate priorities make sure any RamDiscardManager
> > > > > > +         * were already restored before IOMMUs are restored.
> > > > > > +         */
> > > > > > +        if (!ram_discard_manager_is_populated(rdm, &tmp)) {
> > > > > > +            error_report("iommu map to discarded memory (e.g., unplugged via"
> > > > > > +                         " virtio-mem): %" HWADDR_PRIx "",
> > > > > > +                         iotlb->translated_addr);
> > > > > > +            return false;
> > > > > > +        }
> > > > > > +
> > > > > > +        /*
> > > > > > +         * Malicious VMs might trigger discarding of IOMMU-mapped memory. The
> > > > > > +         * pages will remain pinned inside device until unmapped, resulting in a
> > > > > > +         * higher memory consumption than expected. If memory would get
> > > > > > +         * populated again later, there would be an inconsistency between pages
> > > > > > +         * pinned by device and pages seen by QEMU. This is the case until
> > > > > > +         * unmapped from the IOMMU (e.g., during device reset).
> > > > > > +         *
> > > > > > +         * With malicious guests, we really only care about pinning more memory
> > > > > > +         * than expected. RLIMIT_MEMLOCK set for the user/process can never be
> > > > > > +         * exceeded and can be used to mitigate this problem.
> > > > > > +         */
> > > > > > +        warn_report_once("Using device with vIOMMUs and coordinated discarding"
> > > > > > +                         " of RAM (e.g., virtio-mem) works, however, malicious"
> > > > > > +                         " guests can trigger pinning of more memory than"
> > > > > > +                         " intended via an IOMMU. It's possible to mitigate "
> > > > > > +                         " by setting/adjusting RLIMIT_MEMLOCK.");
> > > > >
> > > > > Is this really fit to be in shared code?  Simply replacing "vfio" with
> > > > > "device" for comments and warnings that are really of concern for a
> > > > > specific use case doesn't look much better to me.
> > > > >
> > > > > I think translating an unpopulated address, as in the previous test
> > > > > above, is generally invalid, but the comment is certainly trying to
> > > > > frame the severity of this error relative to a specific use case.
> > > > >
> > > > > Here we're generating an unconditional warning, assuming that this code
> > > > > path has been triggered by device code, for the condition of simply
> > > > > asking for a translation to a MemoryRegion under discard manager
> > > > > control?  Again, isn't that an action that has implications for a
> > > > > specific use case of a device that supports pinning host memory?
> > > >
> > > >
> > > > Or can we rename the function to memory_get_xlat_addr_no_discard()?
> > > > This looks more general and fit for the caller that doesn't want to
> > > > map region that has a discard manager.
> > >
> > > Is a guest restricted from mapping virtio-mem regions to a device?
> >
> > For the region that is not populated, it should be restricted. If this
> > is wrong, we need a separate fix.
> >
> > > AFAIK, this is something that a guest can do and we can't restrict them
> > > from doing it, it's just that it opens some potential for malicious
> > > activity that we rely on things like locked memory limits to keep from
> > > getting out of hand.  Thanks,
> >
> > So it's the fault of the name, it could be
> > memory_get_xlat_addr_no_unpopulated_discard().
>
> An unpopulated discard region has no translation; it's invalid.  That's
> the previous test above, where we return false.  The comment there is a
> bit vfio-specific, but I think the behavior is universal.  That doesn't
> resolve this warn_report_once() for simply trying to translate something
> backed by virtio-mem, though.  Thanks,

Ok, fine, let's add an optional arg then.

Thanks
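
A rough sketch of what the optional argument could look like follows. This
is an illustration under stated assumptions, not the v5 patch: the names
get_xlat_addr_sketch(), pinning_caller_map() and the discard_managed
parameter are hypothetical, and struct xlat_target is a simplified stand-in
for the MemoryRegion checks in memory_get_xlat_addr(). The shared helper
only reports the condition; the caller that pins memory decides whether to
warn.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the MemoryRegion properties the helper checks. */
struct xlat_target {
    bool is_ram;
    bool has_discard_manager;
    bool is_populated;
};

/*
 * Shared helper: returns false when there is no valid translation.
 * If discard_managed is non-NULL, it is set to whether the target is
 * under discard manager control. No warning is emitted here.
 */
bool get_xlat_addr_sketch(const struct xlat_target *t, bool *discard_managed)
{
    if (!t->is_ram) {
        return false;
    }
    if (t->has_discard_manager && !t->is_populated) {
        return false;               /* universal: unpopulated is invalid */
    }
    if (discard_managed) {
        *discard_managed = t->has_discard_manager;
    }
    return true;
}

/*
 * A caller that pins host memory (vfio-like). The use-case-specific
 * warning lives here and is printed at most once via *warned.
 */
bool pinning_caller_map(const struct xlat_target *t, bool *warned)
{
    bool discard_managed = false;

    if (!get_xlat_addr_sketch(t, &discard_managed)) {
        return false;
    }
    if (discard_managed && !*warned) {
        fprintf(stderr, "warning: malicious guests can trigger pinning of "
                        "more memory than intended; consider RLIMIT_MEMLOCK\n");
        *warned = true;
    }
    return true;
}
```

A caller that does not pin memory can simply pass NULL for the extra
argument and never sees the warning.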

>
> Alex
>
> > Or as I replied, stick to what you've suggested.
> >
> > Thanks
> >
> > >
> > > Alex
> > >
> > > > > Should the shared code be generating this warning, or could an optional
> > > > > pointer arg be updated to indicate a translation to discard manager
> > > > > controlled memory and this comment and warning should remain in the
> > > > > caller?  Thanks,
> > > >
> > > > I think this should also work.
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > Alex
> > > > >
> > > > > > +    }
> > > > > > +
> > > > > > +    /*
> > > > > > +     * Translation truncates length to the IOMMU page size,
> > > > > > +     * check that it did not truncate too much.
> > > > > > +     */
> > > > > > +    if (len & iotlb->addr_mask) {
> > > > > > +        error_report("iommu has granularity incompatible with target AS");
> > > > > > +        return false;
> > > > > > +    }
> > > > > > +
> > > > > > +    if (vaddr) {
> > > > > > +        *vaddr = memory_region_get_ram_ptr(mr) + xlat;
> > > > > > +    }
> > > > > > +
> > > > > > +    if (ram_addr) {
> > > > > > +        *ram_addr = memory_region_get_ram_addr(mr) + xlat;
> > > > > > +    }
> > > > > > +
> > > > > > +    if (read_only) {
> > > > > > +        *read_only = !writable || mr->readonly;
> > > > > > +    }
> > > > > > +
> > > > > > +    return true;
> > > > > > +}
> > > > > > +
> > > > > >  void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
> > > > > >  {
> > > > > >      uint8_t mask = 1 << client;
> > > > >
> > > >
> > >
> >
>




* Re: [PATCH v4 0/2] vhost-vdpa: add support for vIOMMU
  2022-10-27  7:40 [PATCH v4 0/2] vhost-vdpa: add support for vIOMMU Cindy Lu
  2022-10-27  7:40 ` [PATCH v4 1/2] vfio: move the function vfio_get_xlat_addr() to memory.c Cindy Lu
  2022-10-27  7:40 ` [PATCH v4 2/2] vhost-vdpa: add support for vIOMMU Cindy Lu
@ 2022-10-29  7:57 ` Michael S. Tsirkin
  2022-10-29  7:58   ` Cindy Lu
  2 siblings, 1 reply; 14+ messages in thread
From: Michael S. Tsirkin @ 2022-10-29  7:57 UTC (permalink / raw)
  To: Cindy Lu
  Cc: alex.williamson, jasowang, pbonzini, peterx, david, f4bug,
	sgarzare, qemu-devel

On Thu, Oct 27, 2022 at 03:40:30PM +0800, Cindy Lu wrote:
> These patches add vIOMMU support to vDPA devices.
> 
> Changes in v3:
> 1. Move the function vfio_get_xlat_addr() to memory.c.
> 2. Use the existing memory listener; when the MR is an
>    IOMMU MR, call iommu_region_add/iommu_region_del.
> 
> Changes in v4:
> 1. Make the comments in vfio_get_xlat_addr() more general.

I expect there will be v5 addressing Alex's comments.

> Cindy Lu (2):
>   vfio: move the function vfio_get_xlat_addr() to memory.c
>   vhost-vdpa: add support for vIOMMU
> 
>  hw/vfio/common.c               |  92 +----------------------
>  hw/virtio/vhost-vdpa.c         | 131 ++++++++++++++++++++++++++++++---
>  include/exec/memory.h          |   4 +
>  include/hw/virtio/vhost-vdpa.h |  10 +++
>  softmmu/memory.c               |  84 +++++++++++++++++++++
>  5 files changed, 222 insertions(+), 99 deletions(-)
> 
> -- 
> 2.34.3




* Re: [PATCH v4 0/2] vhost-vdpa: add support for vIOMMU
  2022-10-29  7:57 ` [PATCH v4 0/2] " Michael S. Tsirkin
@ 2022-10-29  7:58   ` Cindy Lu
  0 siblings, 0 replies; 14+ messages in thread
From: Cindy Lu @ 2022-10-29  7:58 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: alex.williamson, jasowang, pbonzini, peterx, david, f4bug,
	sgarzare, qemu-devel

On Sat, 29 Oct 2022 at 15:57, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Oct 27, 2022 at 03:40:30PM +0800, Cindy Lu wrote:
> > These patches add vIOMMU support to vDPA devices.
> >
> > Changes in v3:
> > 1. Move the function vfio_get_xlat_addr() to memory.c.
> > 2. Use the existing memory listener; when the MR is an
> >    IOMMU MR, call iommu_region_add/iommu_region_del.
> >
> > Changes in v4:
> > 1. Make the comments in vfio_get_xlat_addr() more general.
>
> I expect there will be v5 addressing Alex's comments.
>
Sure, thanks Michael. I will post it soon.
Thanks
Cindy
> > Cindy Lu (2):
> >   vfio: move the function vfio_get_xlat_addr() to memory.c
> >   vhost-vdpa: add support for vIOMMU
> >
> >  hw/vfio/common.c               |  92 +----------------------
> >  hw/virtio/vhost-vdpa.c         | 131 ++++++++++++++++++++++++++++++---
> >  include/exec/memory.h          |   4 +
> >  include/hw/virtio/vhost-vdpa.h |  10 +++
> >  softmmu/memory.c               |  84 +++++++++++++++++++++
> >  5 files changed, 222 insertions(+), 99 deletions(-)
> >
> > --
> > 2.34.3
>




end of thread, other threads:[~2022-10-29  8:00 UTC | newest]

Thread overview: 14+ messages
2022-10-27  7:40 [PATCH v4 0/2] vhost-vdpa: add support for vIOMMU Cindy Lu
2022-10-27  7:40 ` [PATCH v4 1/2] vfio: move the function vfio_get_xlat_addr() to memory.c Cindy Lu
2022-10-27  8:11   ` Jason Wang
2022-10-27 15:30   ` Peter Xu
2022-10-27 21:11   ` Alex Williamson
2022-10-28  1:50     ` Jason Wang
2022-10-28  2:08       ` Alex Williamson
2022-10-28  2:23         ` Jason Wang
2022-10-28  2:35           ` Alex Williamson
2022-10-28  2:40             ` Jason Wang
2022-10-27  7:40 ` [PATCH v4 2/2] vhost-vdpa: add support for vIOMMU Cindy Lu
2022-10-27  8:10   ` Jason Wang
2022-10-29  7:57 ` [PATCH v4 0/2] " Michael S. Tsirkin
2022-10-29  7:58   ` Cindy Lu
