* [PATCH v4 00/15] Ram blocks with resizeable anonymous allocations under POSIX
@ 2020-03-05 14:29 David Hildenbrand
  2020-03-05 14:29 ` [PATCH v4 01/15] util: vfio-helpers: Fix qemu_vfio_close() David Hildenbrand
                   ` (14 more replies)
  0 siblings, 15 replies; 27+ messages in thread
From: David Hildenbrand @ 2020-03-05 14:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Richard Henderson,
	Igor Kotrasinski, David Hildenbrand, Dr . David Alan Gilbert,
	Peter Xu, Greg Kurz, Alex Williamson, Shameerali Kolothum Thodi,
	Murilo Opsfelder Araujo, Igor Mammedov, Paolo Bonzini,
	Stefan Weil, Stefan Hajnoczi, Richard Henderson

This series is based on [1]:
    "[PATCH v3 00/13] migrate/ram: Fix resizing RAM blocks while migrating"

We already allow resizable ram blocks for anonymous memory; however, they
are not actually resized. All memory is mmap()ed R/W, including the memory
exceeding the used_length, up to the max_length. When resizing, effectively
only the boundary is moved.

Implement actually resizable anonymous allocations and make use of them in
resizable ram blocks when possible. Memory exceeding the used_length will
be inaccessible. Ram block notifiers especially require care.

Having actually resizable anonymous allocations (via mmap hackery) allows
us to reserve a big region in virtual address space and grow the
accessible/usable part on demand. Even if "/proc/sys/vm/overcommit_memory"
is set to "never" under Linux, huge reservations will succeed. If there is
not enough memory when resizing (to populate parts of the reserved region),
the resize will fail. Only the actually used size is accounted as reserved
by the OS.
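
The underlying idea, as a minimal anonymous-memory sketch (names and
helpers are illustrative, not the series' API):

    #include <stdbool.h>
    #include <stddef.h>
    #include <sys/mman.h>

    /* Reserve max_size of virtual address space. PROT_NONE +
     * MAP_NORESERVE is not accounted, so this succeeds even with
     * overcommit_memory set to "never". */
    static void *reserve(size_t max_size)
    {
        return mmap(NULL, max_size, PROT_NONE,
                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    }

    /* Make the first size bytes of the reservation accessible. Fails
     * with MAP_FAILED if the OS cannot back the now-accounted memory. */
    static bool activate(void *ptr, size_t size)
    {
        return mmap(ptr, size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
                    -1, 0) != MAP_FAILED;
    }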

Notably, memory notifiers already handle resizing by first removing the
old region and then re-adding the resized region. prealloc is currently
not possible with resizable ram blocks. mlock() should continue to work
as is. Resizing is currently rare and must only happen at the start of an
incoming migration, or during resets. No code path (except the HAX and SEV
ram block notifiers) should access memory outside of the usable range -
and if we ever find one, it has to be fixed (I did not identify any).

E.g., virtio-mem [2] wants to reserve big resizable memory regions and
grow the usable part on demand. I think this change is worth sending out
individually. I did extensive testing of this with virtio-mem (which makes
it very easy to trigger resizes), including precopy and postcopy migration.

Accompanied by a bunch of minor fixes and cleanups.

v3 -> v4:
- Added RBs
- "util/mmap-alloc: Factor out activating of memory to mmap_activate()"
-- use "activate" instead of "populate"
- "util: vfio-helpers: Implement ram_block_resized()"
-- Also store max_size in mappings and assert against it
-- Better comment why atomic resizes are not possible
- "exec: Ram blocks with resizeable anonymous allocations under POSIX"
-- Assert that RAM_RESIZEABLE_ALLOC is not set before allocating

v2 -> v3:
- Rebased on current master/[1].
- "util: vfio-helpers: Factor out and fix processing of existing ram
   blocks"
-- moved to [1]
- "util: vfio-helpers: Remove Error parameter from qemu_vfio_undo_mapping()"
-- Better patch description
- "util/mmap-alloc: Factor out calculation of pagesize to mmap_pagesize()"
-- is now "util/mmap-alloc: Factor out calculation of the pagesize for the
   guard page"
-- Decided to keep special handling for the guard page for now
-- Dropped rb's
- "util/mmap-alloc: Prepare for resizeable mmaps"
-- Now asserts sizes against the real page size
- "numa: Teach ram block notifiers about resizable ram blocks"
-- Split. One part is in [1], the other is now in "numa: Introduce
   ram_block_notifiers_support_resize()"
- "exec: Ram blocks with resizeable anonymous allocations under POSIX"
-- Call qemu_ram_apply_settings() only on populated parts. Call it on
   freshly populated parts when growing.
- Minor changes

v1 -> v2:
- Add "util: vfio-helpers: Fix qemu_vfio_close()"
- Add "util: vfio-helpers: Remove Error parameter from
       qemu_vfio_undo_mapping()"
- Add "util: vfio-helpers: Factor out removal from
       qemu_vfio_undo_mapping()"
- "util/mmap-alloc: ..."
 -- Minor changes due to review feedback (e.g., assert alignment, return
    bool when resizing)
- "util: vfio-helpers: Implement ram_block_resized()"
 -- Reserve max_size in the IOVA address space.
 -- On resize, undo old mapping and do new mapping. We can later implement
    a new ioctl to resize the mapping directly.
- "numa: Teach ram block notifiers about resizable ram blocks"
 -- Pass size/max_size to ram block notifiers, which makes things easier and
    cleaner
- "exec: Ram blocks with resizable anonymous allocations under POSIX"
 -- Adapt to new ram block notifiers
 -- Shrink after notifying. Always trigger ram block notifiers on resizes
 -- Add a safety net that all ram block notifiers registered at runtime
    support resizes.

[1] https://lkml.kernel.org/r/20200226155304.60219-1-david@redhat.com
[2] https://lore.kernel.org/kvm/20191212171137.13872-1-david@redhat.com/

David Hildenbrand (15):
  util: vfio-helpers: Fix qemu_vfio_close()
  util: vfio-helpers: Remove Error parameter from
    qemu_vfio_undo_mapping()
  util: vfio-helpers: Factor out removal from qemu_vfio_undo_mapping()
  exec: Factor out setting ram settings (madvise ...) into
    qemu_ram_apply_settings()
  exec: Reuse qemu_ram_apply_settings() in qemu_ram_remap()
  exec: Drop "shared" parameter from ram_block_add()
  util/mmap-alloc: Factor out calculation of the pagesize for the guard
    page
  util/mmap-alloc: Factor out reserving of a memory region to
    mmap_reserve()
  util/mmap-alloc: Factor out activating of memory to mmap_activate()
  util/mmap-alloc: Prepare for resizeable mmaps
  util/mmap-alloc: Implement resizeable mmaps
  util: vfio-helpers: Implement ram_block_resized()
  util: oslib: Resizeable anonymous allocations under POSIX
  numa: Introduce ram_block_notifiers_support_resize()
  exec: Ram blocks with resizeable anonymous allocations under POSIX

 exec.c                    | 100 ++++++++++++++++++-----
 hw/core/numa.c            |  19 +++++
 include/exec/cpu-common.h |   2 +
 include/exec/memory.h     |   8 ++
 include/exec/ramlist.h    |   1 +
 include/qemu/mmap-alloc.h |  21 +++--
 include/qemu/osdep.h      |   6 +-
 util/mmap-alloc.c         | 168 ++++++++++++++++++++++++--------------
 util/oslib-posix.c        |  37 ++++++++-
 util/oslib-win32.c        |  14 ++++
 util/trace-events         |  11 ++-
 util/vfio-helpers.c       | 145 ++++++++++++++++++++++++--------
 12 files changed, 404 insertions(+), 128 deletions(-)

-- 
2.24.1




* [PATCH v4 01/15] util: vfio-helpers: Fix qemu_vfio_close()
  2020-03-05 14:29 [PATCH v4 00/15] Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
@ 2020-03-05 14:29 ` David Hildenbrand
  2020-04-17 10:22   ` Philippe Mathieu-Daudé
  2020-03-05 14:29 ` [PATCH v4 02/15] util: vfio-helpers: Remove Error parameter from qemu_vfio_undo_mapping() David Hildenbrand
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2020-03-05 14:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	David Hildenbrand, Dr . David Alan Gilbert, Peter Xu,
	Alex Williamson, Murilo Opsfelder Araujo, Igor Mammedov,
	Paolo Bonzini, Stefan Hajnoczi, Richard Henderson

qemu_vfio_undo_mapping() decrements the number of mappings and shifts the
remaining array elements down to fill the gap.

Iterating forward over all elements therefore does not work as expected:
each removal moves the next mapping into the slot just processed, where
the loop steps over it. Let's remove from the end instead, making sure
all mappings are removed properly.
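
To illustrate the problem with a minimal sketch (not patch code):

    /* Forward iteration over a self-compacting array: */
    for (i = 0; i < s->nr_mappings; ++i) {
        /* Removes mappings[i] and shifts the remaining entries down,
         * so the old mappings[i + 1] becomes mappings[i] - but the
         * loop advances to i + 1 and skips it. With nr_mappings also
         * shrinking, only every other mapping gets removed. */
        qemu_vfio_undo_mapping(s, &s->mappings[i], NULL);
    }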

Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/vfio-helpers.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 9ec01bfe26..f31aa77ffe 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -695,13 +695,11 @@ static void qemu_vfio_reset(QEMUVFIOState *s)
 /* Close and free the VFIO resources. */
 void qemu_vfio_close(QEMUVFIOState *s)
 {
-    int i;
-
     if (!s) {
         return;
     }
-    for (i = 0; i < s->nr_mappings; ++i) {
-        qemu_vfio_undo_mapping(s, &s->mappings[i], NULL);
+    while (s->nr_mappings) {
+        qemu_vfio_undo_mapping(s, &s->mappings[s->nr_mappings - 1], NULL);
     }
     ram_block_notifier_remove(&s->ram_notifier);
     qemu_vfio_reset(s);
-- 
2.24.1




* [PATCH v4 02/15] util: vfio-helpers: Remove Error parameter from qemu_vfio_undo_mapping()
  2020-03-05 14:29 [PATCH v4 00/15] Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
  2020-03-05 14:29 ` [PATCH v4 01/15] util: vfio-helpers: Fix qemu_vfio_close() David Hildenbrand
@ 2020-03-05 14:29 ` David Hildenbrand
  2020-03-25 14:32   ` Murilo Opsfelder Araújo
  2020-03-05 14:29 ` [PATCH v4 03/15] util: vfio-helpers: Factor out removal " David Hildenbrand
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2020-03-05 14:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	David Hildenbrand, Dr . David Alan Gilbert, Peter Xu,
	Alex Williamson, Murilo Opsfelder Araujo, Igor Mammedov,
	Paolo Bonzini, Stefan Hajnoczi, Richard Henderson

Every caller discards the error. Let's use error_report() instead, so this
error doesn't get lost.

This is now the same error handling as in qemu_vfio_do_mapping(). However,
we don't report any errors to the caller via the return value. This seems
to be one of those "will never happen, but better to print an error
because the caller can't handle it either way" cases.

Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/vfio-helpers.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index f31aa77ffe..7085ca702c 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -541,8 +541,7 @@ static int qemu_vfio_do_mapping(QEMUVFIOState *s, void *host, size_t size,
 /**
  * Undo the DMA mapping from @s with VFIO, and remove from mapping list.
  */
-static void qemu_vfio_undo_mapping(QEMUVFIOState *s, IOVAMapping *mapping,
-                                   Error **errp)
+static void qemu_vfio_undo_mapping(QEMUVFIOState *s, IOVAMapping *mapping)
 {
     int index;
     struct vfio_iommu_type1_dma_unmap unmap = {
@@ -557,7 +556,7 @@ static void qemu_vfio_undo_mapping(QEMUVFIOState *s, IOVAMapping *mapping,
     assert(QEMU_IS_ALIGNED(mapping->size, qemu_real_host_page_size));
     assert(index >= 0 && index < s->nr_mappings);
     if (ioctl(s->container, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
-        error_setg_errno(errp, errno, "VFIO_UNMAP_DMA failed");
+        error_report("VFIO_UNMAP_DMA failed: %s", strerror(errno));
     }
     memmove(mapping, &s->mappings[index + 1],
             sizeof(s->mappings[0]) * (s->nr_mappings - index - 1));
@@ -622,7 +621,7 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
             assert(qemu_vfio_verify_mappings(s));
             ret = qemu_vfio_do_mapping(s, host, size, iova0);
             if (ret) {
-                qemu_vfio_undo_mapping(s, mapping, NULL);
+                qemu_vfio_undo_mapping(s, mapping);
                 goto out;
             }
             s->low_water_mark += size;
@@ -682,7 +681,7 @@ void qemu_vfio_dma_unmap(QEMUVFIOState *s, void *host)
     if (!m) {
         goto out;
     }
-    qemu_vfio_undo_mapping(s, m, NULL);
+    qemu_vfio_undo_mapping(s, m);
 out:
     qemu_mutex_unlock(&s->lock);
 }
@@ -699,7 +698,7 @@ void qemu_vfio_close(QEMUVFIOState *s)
         return;
     }
     while (s->nr_mappings) {
-        qemu_vfio_undo_mapping(s, &s->mappings[s->nr_mappings - 1], NULL);
+        qemu_vfio_undo_mapping(s, &s->mappings[s->nr_mappings - 1]);
     }
     ram_block_notifier_remove(&s->ram_notifier);
     qemu_vfio_reset(s);
-- 
2.24.1




* [PATCH v4 03/15] util: vfio-helpers: Factor out removal from qemu_vfio_undo_mapping()
  2020-03-05 14:29 [PATCH v4 00/15] Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
  2020-03-05 14:29 ` [PATCH v4 01/15] util: vfio-helpers: Fix qemu_vfio_close() David Hildenbrand
  2020-03-05 14:29 ` [PATCH v4 02/15] util: vfio-helpers: Remove Error parameter from qemu_vfio_undo_mapping() David Hildenbrand
@ 2020-03-05 14:29 ` David Hildenbrand
  2020-03-25 14:45   ` Murilo Opsfelder Araújo
  2020-03-05 14:29 ` [PATCH v4 04/15] exec: Factor out setting ram settings (madvise ...) into qemu_ram_apply_settings() David Hildenbrand
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2020-03-05 14:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	David Hildenbrand, Dr . David Alan Gilbert, Peter Xu,
	Alex Williamson, Murilo Opsfelder Araujo, Igor Mammedov,
	Paolo Bonzini, Stefan Hajnoczi, Richard Henderson

Factor the removal out and use it properly where applicable. Make
qemu_vfio_undo_mapping() look like qemu_vfio_do_mapping(), passing the
size and iova instead of the mapping.

Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/vfio-helpers.c | 43 +++++++++++++++++++++++++++----------------
 1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 7085ca702c..f0c77f0d69 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -518,6 +518,20 @@ static IOVAMapping *qemu_vfio_add_mapping(QEMUVFIOState *s,
     return insert;
 }
 
+/**
+ * Remove the mapping from @s and free it.
+ */
+static void qemu_vfio_remove_mapping(QEMUVFIOState *s, IOVAMapping *mapping)
+{
+    const int index = mapping - s->mappings;
+
+    assert(index >= 0 && index < s->nr_mappings);
+    memmove(mapping, &s->mappings[index + 1],
+            sizeof(s->mappings[0]) * (s->nr_mappings - index - 1));
+    s->nr_mappings--;
+    s->mappings = g_renew(IOVAMapping, s->mappings, s->nr_mappings);
+}
+
 /* Do the DMA mapping with VFIO. */
 static int qemu_vfio_do_mapping(QEMUVFIOState *s, void *host, size_t size,
                                 uint64_t iova)
@@ -539,29 +553,22 @@ static int qemu_vfio_do_mapping(QEMUVFIOState *s, void *host, size_t size,
 }
 
 /**
- * Undo the DMA mapping from @s with VFIO, and remove from mapping list.
+ * Undo the DMA mapping from @s with VFIO.
  */
-static void qemu_vfio_undo_mapping(QEMUVFIOState *s, IOVAMapping *mapping)
+static void qemu_vfio_undo_mapping(QEMUVFIOState *s, size_t size, uint64_t iova)
 {
-    int index;
     struct vfio_iommu_type1_dma_unmap unmap = {
         .argsz = sizeof(unmap),
         .flags = 0,
-        .iova = mapping->iova,
-        .size = mapping->size,
+        .iova = iova,
+        .size = size,
     };
 
-    index = mapping - s->mappings;
-    assert(mapping->size > 0);
-    assert(QEMU_IS_ALIGNED(mapping->size, qemu_real_host_page_size));
-    assert(index >= 0 && index < s->nr_mappings);
+    assert(size > 0);
+    assert(QEMU_IS_ALIGNED(size, qemu_real_host_page_size));
     if (ioctl(s->container, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
         error_report("VFIO_UNMAP_DMA failed: %s", strerror(errno));
     }
-    memmove(mapping, &s->mappings[index + 1],
-            sizeof(s->mappings[0]) * (s->nr_mappings - index - 1));
-    s->nr_mappings--;
-    s->mappings = g_renew(IOVAMapping, s->mappings, s->nr_mappings);
 }
 
 /* Check if the mapping list is (ascending) ordered. */
@@ -621,7 +628,7 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
             assert(qemu_vfio_verify_mappings(s));
             ret = qemu_vfio_do_mapping(s, host, size, iova0);
             if (ret) {
-                qemu_vfio_undo_mapping(s, mapping);
+                qemu_vfio_remove_mapping(s, mapping);
                 goto out;
             }
             s->low_water_mark += size;
@@ -681,7 +688,8 @@ void qemu_vfio_dma_unmap(QEMUVFIOState *s, void *host)
     if (!m) {
         goto out;
     }
-    qemu_vfio_undo_mapping(s, m);
+    qemu_vfio_undo_mapping(s, m->size, m->iova);
+    qemu_vfio_remove_mapping(s, m);
 out:
     qemu_mutex_unlock(&s->lock);
 }
@@ -698,7 +706,10 @@ void qemu_vfio_close(QEMUVFIOState *s)
         return;
     }
     while (s->nr_mappings) {
-        qemu_vfio_undo_mapping(s, &s->mappings[s->nr_mappings - 1]);
+        IOVAMapping *m = &s->mappings[s->nr_mappings - 1];
+
+        qemu_vfio_undo_mapping(s, m->size, m->iova);
+        qemu_vfio_remove_mapping(s, m);
     }
     ram_block_notifier_remove(&s->ram_notifier);
     qemu_vfio_reset(s);
-- 
2.24.1




* [PATCH v4 04/15] exec: Factor out setting ram settings (madvise ...) into qemu_ram_apply_settings()
  2020-03-05 14:29 [PATCH v4 00/15] Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
                   ` (2 preceding siblings ...)
  2020-03-05 14:29 ` [PATCH v4 03/15] util: vfio-helpers: Factor out removal " David Hildenbrand
@ 2020-03-05 14:29 ` David Hildenbrand
  2020-03-05 14:29 ` [PATCH v4 05/15] exec: Reuse qemu_ram_apply_settings() in qemu_ram_remap() David Hildenbrand
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: David Hildenbrand @ 2020-03-05 14:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Richard Henderson,
	Igor Kotrasinski, David Hildenbrand, Dr . David Alan Gilbert,
	Peter Xu, Murilo Opsfelder Araujo, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

Factor all settings out into qemu_ram_apply_settings().

For memory_try_enable_merging(), the important bit is that it won't be
called under Xen - which is still the case, as new_block->host will
remain NULL.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 exec.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/exec.c b/exec.c
index 3e980e14e0..13a0ca91fb 100644
--- a/exec.c
+++ b/exec.c
@@ -2069,6 +2069,21 @@ static int memory_try_enable_merging(void *addr, size_t len)
     return qemu_madvise(addr, len, QEMU_MADV_MERGEABLE);
 }
 
+static void qemu_ram_apply_settings(void *host, size_t length)
+{
+    memory_try_enable_merging(host, length);
+    qemu_ram_setup_dump(host, length);
+    qemu_madvise(host, length, QEMU_MADV_HUGEPAGE);
+    /*
+     * MADV_DONTFORK is also needed by KVM in absence of synchronous MMU
+     * Configure it unless the machine is a qtest server, in which case KVM is
+     * not used and it may be forked (eg for fuzzing purposes).
+     */
+    if (!qtest_enabled()) {
+        qemu_madvise(host, length, QEMU_MADV_DONTFORK);
+    }
+}
+
 /*
  * Resizing RAM while migrating can result in the migration being canceled.
  * Care has to be taken if the guest might have already detected the memory.
@@ -2227,7 +2242,6 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
                 qemu_mutex_unlock_ramlist();
                 return;
             }
-            memory_try_enable_merging(new_block->host, new_block->max_length);
         }
     }
 
@@ -2265,17 +2279,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
                                         DIRTY_CLIENTS_ALL);
 
     if (new_block->host) {
-        qemu_ram_setup_dump(new_block->host, new_block->max_length);
-        qemu_madvise(new_block->host, new_block->max_length, QEMU_MADV_HUGEPAGE);
-        /*
-         * MADV_DONTFORK is also needed by KVM in absence of synchronous MMU
-         * Configure it unless the machine is a qtest server, in which case
-         * KVM is not used and it may be forked (eg for fuzzing purposes).
-         */
-        if (!qtest_enabled()) {
-            qemu_madvise(new_block->host, new_block->max_length,
-                         QEMU_MADV_DONTFORK);
-        }
+        qemu_ram_apply_settings(new_block->host, new_block->max_length);
         ram_block_notify_add(new_block->host, new_block->used_length,
                              new_block->max_length);
     }
-- 
2.24.1




* [PATCH v4 05/15] exec: Reuse qemu_ram_apply_settings() in qemu_ram_remap()
  2020-03-05 14:29 [PATCH v4 00/15] Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
                   ` (3 preceding siblings ...)
  2020-03-05 14:29 ` [PATCH v4 04/15] exec: Factor out setting ram settings (madvise ...) into qemu_ram_apply_settings() David Hildenbrand
@ 2020-03-05 14:29 ` David Hildenbrand
  2020-03-05 14:29 ` [PATCH v4 06/15] exec: Drop "shared" parameter from ram_block_add() David Hildenbrand
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: David Hildenbrand @ 2020-03-05 14:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Richard Henderson,
	Igor Kotrasinski, David Hildenbrand, Dr . David Alan Gilbert,
	Peter Xu, Murilo Opsfelder Araujo, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

I don't see why we shouldn't apply all settings to make it look like the
surrounding RAM (and enable proper VMA merging).

Note: memory backend settings might have overridden these settings. We
would need a callback to let the memory backend fix that up.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 exec.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/exec.c b/exec.c
index 13a0ca91fb..7df1ecaf5d 100644
--- a/exec.c
+++ b/exec.c
@@ -2516,8 +2516,7 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
                                  length, addr);
                     exit(1);
                 }
-                memory_try_enable_merging(vaddr, length);
-                qemu_ram_setup_dump(vaddr, length);
+                qemu_ram_apply_settings(vaddr, length);
             }
         }
     }
-- 
2.24.1




* [PATCH v4 06/15] exec: Drop "shared" parameter from ram_block_add()
  2020-03-05 14:29 [PATCH v4 00/15] Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
                   ` (4 preceding siblings ...)
  2020-03-05 14:29 ` [PATCH v4 05/15] exec: Reuse qemu_ram_apply_settings() in qemu_ram_remap() David Hildenbrand
@ 2020-03-05 14:29 ` David Hildenbrand
  2020-03-05 14:29 ` [PATCH v4 07/15] util/mmap-alloc: Factor out calculation of the pagesize for the guard page David Hildenbrand
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: David Hildenbrand @ 2020-03-05 14:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Richard Henderson,
	Igor Kotrasinski, David Hildenbrand, Dr . David Alan Gilbert,
	Peter Xu, Murilo Opsfelder Araujo, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

Properly store it in the flags of the ram block instead (the flag already
exists and is used).

E.g., qemu_ram_is_shared() now correctly returns true for all ram blocks
that are actually shared.
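
For reference, the query then boils down to a flag check, along these
lines (a sketch mirroring the flag usage above):

    bool qemu_ram_is_shared(RAMBlock *rb)
    {
        return rb->flags & RAM_SHARED;
    }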

Reviewed-by: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 exec.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/exec.c b/exec.c
index 7df1ecaf5d..9c3cc79193 100644
--- a/exec.c
+++ b/exec.c
@@ -2211,7 +2211,7 @@ static void dirty_memory_extend(ram_addr_t old_ram_size,
     }
 }
 
-static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
+static void ram_block_add(RAMBlock *new_block, Error **errp)
 {
     RAMBlock *block;
     RAMBlock *last_block = NULL;
@@ -2234,7 +2234,8 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
             }
         } else {
             new_block->host = phys_mem_alloc(new_block->max_length,
-                                             &new_block->mr->align, shared);
+                                             &new_block->mr->align,
+                                             qemu_ram_is_shared(new_block));
             if (!new_block->host) {
                 error_setg_errno(errp, errno,
                                  "cannot set up guest memory '%s'",
@@ -2339,7 +2340,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
         return NULL;
     }
 
-    ram_block_add(new_block, &local_err, ram_flags & RAM_SHARED);
+    ram_block_add(new_block, &local_err);
     if (local_err) {
         g_free(new_block);
         error_propagate(errp, local_err);
@@ -2401,10 +2402,13 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
     if (host) {
         new_block->flags |= RAM_PREALLOC;
     }
+    if (share) {
+        new_block->flags |= RAM_SHARED;
+    }
     if (resizeable) {
         new_block->flags |= RAM_RESIZEABLE;
     }
-    ram_block_add(new_block, &local_err, share);
+    ram_block_add(new_block, &local_err);
     if (local_err) {
         g_free(new_block);
         error_propagate(errp, local_err);
-- 
2.24.1




* [PATCH v4 07/15] util/mmap-alloc: Factor out calculation of the pagesize for the guard page
  2020-03-05 14:29 [PATCH v4 00/15] Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
                   ` (5 preceding siblings ...)
  2020-03-05 14:29 ` [PATCH v4 06/15] exec: Drop "shared" parameter from ram_block_add() David Hildenbrand
@ 2020-03-05 14:29 ` David Hildenbrand
  2020-03-25 15:03   ` Murilo Opsfelder Araújo
  2020-03-05 14:29 ` [PATCH v4 08/15] util/mmap-alloc: Factor out reserving of a memory region to mmap_reserve() David Hildenbrand
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2020-03-05 14:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	David Hildenbrand, Dr . David Alan Gilbert, Peter Xu, Greg Kurz,
	Murilo Opsfelder Araujo, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

Let's factor out calculating the size of the guard page and rename the
variable to make it clearer that this pagesize only applies to the
guard page.

Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Cc: Greg Kurz <groug@kaod.org>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/mmap-alloc.c | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 27dcccd8ec..f0277f9fad 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -82,17 +82,27 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
     return qemu_real_host_page_size;
 }
 
+static inline size_t mmap_guard_pagesize(int fd)
+{
+#if defined(__powerpc64__) && defined(__linux__)
+    /* Mappings in the same segment must share the same page size */
+    return qemu_fd_getpagesize(fd);
+#else
+    return qemu_real_host_page_size;
+#endif
+}
+
 void *qemu_ram_mmap(int fd,
                     size_t size,
                     size_t align,
                     bool shared,
                     bool is_pmem)
 {
+    const size_t guard_pagesize = mmap_guard_pagesize(fd);
     int flags;
     int map_sync_flags = 0;
     int guardfd;
     size_t offset;
-    size_t pagesize;
     size_t total;
     void *guardptr;
     void *ptr;
@@ -113,8 +123,7 @@ void *qemu_ram_mmap(int fd,
      * anonymous memory is OK.
      */
     flags = MAP_PRIVATE;
-    pagesize = qemu_fd_getpagesize(fd);
-    if (fd == -1 || pagesize == qemu_real_host_page_size) {
+    if (fd == -1 || guard_pagesize == qemu_real_host_page_size) {
         guardfd = -1;
         flags |= MAP_ANONYMOUS;
     } else {
@@ -123,7 +132,6 @@ void *qemu_ram_mmap(int fd,
     }
 #else
     guardfd = -1;
-    pagesize = qemu_real_host_page_size;
     flags = MAP_PRIVATE | MAP_ANONYMOUS;
 #endif
 
@@ -135,7 +143,7 @@ void *qemu_ram_mmap(int fd,
 
     assert(is_power_of_2(align));
     /* Always align to host page size */
-    assert(align >= pagesize);
+    assert(align >= guard_pagesize);
 
     flags = MAP_FIXED;
     flags |= fd == -1 ? MAP_ANONYMOUS : 0;
@@ -189,8 +197,8 @@ void *qemu_ram_mmap(int fd,
      * a guard page guarding against potential buffer overflows.
      */
     total -= offset;
-    if (total > size + pagesize) {
-        munmap(ptr + size + pagesize, total - size - pagesize);
+    if (total > size + guard_pagesize) {
+        munmap(ptr + size + guard_pagesize, total - size - guard_pagesize);
     }
 
     return ptr;
@@ -198,15 +206,8 @@ void *qemu_ram_mmap(int fd,
 
 void qemu_ram_munmap(int fd, void *ptr, size_t size)
 {
-    size_t pagesize;
-
     if (ptr) {
         /* Unmap both the RAM block and the guard page */
-#if defined(__powerpc64__) && defined(__linux__)
-        pagesize = qemu_fd_getpagesize(fd);
-#else
-        pagesize = qemu_real_host_page_size;
-#endif
-        munmap(ptr, size + pagesize);
+        munmap(ptr, size + mmap_guard_pagesize(fd));
     }
 }
-- 
2.24.1




* [PATCH v4 08/15] util/mmap-alloc: Factor out reserving of a memory region to mmap_reserve()
  2020-03-05 14:29 [PATCH v4 00/15] Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
                   ` (6 preceding siblings ...)
  2020-03-05 14:29 ` [PATCH v4 07/15] util/mmap-alloc: Factor out calculation of the pagesize for the guard page David Hildenbrand
@ 2020-03-05 14:29 ` David Hildenbrand
  2020-03-05 14:29 ` [PATCH v4 09/15] util/mmap-alloc: Factor out activating of memory to mmap_activate() David Hildenbrand
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: David Hildenbrand @ 2020-03-05 14:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Richard Henderson,
	Igor Kotrasinski, David Hildenbrand, Dr . David Alan Gilbert,
	Peter Xu, Greg Kurz, Murilo Opsfelder Araujo, Igor Mammedov,
	Paolo Bonzini, Richard Henderson

We want to reserve a memory region without actually populating memory.
Let's factor that out.

Reviewed-by: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Greg Kurz <groug@kaod.org>
Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/mmap-alloc.c | 58 +++++++++++++++++++++++++++--------------------
 1 file changed, 33 insertions(+), 25 deletions(-)

diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index f0277f9fad..9e9534a07e 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -82,6 +82,38 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
     return qemu_real_host_page_size;
 }
 
+/*
+ * Reserve a new memory region of the requested size to be used for mapping
+ * from the given fd (if any).
+ */
+static void *mmap_reserve(size_t size, int fd)
+{
+    int flags = MAP_PRIVATE;
+
+#if defined(__powerpc64__) && defined(__linux__)
+    /*
+     * On ppc64 mappings in the same segment (aka slice) must share the same
+     * page size. Since we will be re-allocating part of this segment
+     * from the supplied fd, we should make sure to use the same page size, to
+     * this end we mmap the supplied fd.  In this case, set MAP_NORESERVE to
+     * avoid allocating backing store memory.
+     * We do this unless we are using the system page size, in which case
+     * anonymous memory is OK.
+     */
+    if (fd == -1 || qemu_fd_getpagesize(fd) == qemu_real_host_page_size) {
+        fd = -1;
+        flags |= MAP_ANONYMOUS;
+    } else {
+        flags |= MAP_NORESERVE;
+    }
+#else
+    fd = -1;
+    flags |= MAP_ANONYMOUS;
+#endif
+
+    return mmap(0, size, PROT_NONE, flags, fd, 0);
+}
+
 static inline size_t mmap_guard_pagesize(int fd)
 {
 #if defined(__powerpc64__) && defined(__linux__)
@@ -101,7 +133,6 @@ void *qemu_ram_mmap(int fd,
     const size_t guard_pagesize = mmap_guard_pagesize(fd);
     int flags;
     int map_sync_flags = 0;
-    int guardfd;
     size_t offset;
     size_t total;
     void *guardptr;
@@ -113,30 +144,7 @@ void *qemu_ram_mmap(int fd,
      */
     total = size + align;
 
-#if defined(__powerpc64__) && defined(__linux__)
-    /* On ppc64 mappings in the same segment (aka slice) must share the same
-     * page size. Since we will be re-allocating part of this segment
-     * from the supplied fd, we should make sure to use the same page size, to
-     * this end we mmap the supplied fd.  In this case, set MAP_NORESERVE to
-     * avoid allocating backing store memory.
-     * We do this unless we are using the system page size, in which case
-     * anonymous memory is OK.
-     */
-    flags = MAP_PRIVATE;
-    if (fd == -1 || guard_pagesize == qemu_real_host_page_size) {
-        guardfd = -1;
-        flags |= MAP_ANONYMOUS;
-    } else {
-        guardfd = fd;
-        flags |= MAP_NORESERVE;
-    }
-#else
-    guardfd = -1;
-    flags = MAP_PRIVATE | MAP_ANONYMOUS;
-#endif
-
-    guardptr = mmap(0, total, PROT_NONE, flags, guardfd, 0);
-
+    guardptr = mmap_reserve(total, fd);
     if (guardptr == MAP_FAILED) {
         return MAP_FAILED;
     }
-- 
2.24.1




* [PATCH v4 09/15] util/mmap-alloc: Factor out activating of memory to mmap_activate()
  2020-03-05 14:29 [PATCH v4 00/15] Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
                   ` (7 preceding siblings ...)
  2020-03-05 14:29 ` [PATCH v4 08/15] util/mmap-alloc: Factor out reserving of a memory region to mmap_reserve() David Hildenbrand
@ 2020-03-05 14:29 ` David Hildenbrand
  2020-03-05 14:29 ` [PATCH v4 10/15] util/mmap-alloc: Prepare for resizeable mmaps David Hildenbrand
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: David Hildenbrand @ 2020-03-05 14:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Richard Henderson,
	Igor Kotrasinski, David Hildenbrand, Dr . David Alan Gilbert,
	Peter Xu, Greg Kurz, Murilo Opsfelder Araujo, Igor Mammedov,
	Paolo Bonzini, Richard Henderson

We want to activate memory within a reserved memory region, to make it
accessible. Let's factor that out.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Greg Kurz <groug@kaod.org>
Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/mmap-alloc.c | 90 +++++++++++++++++++++++++----------------------
 1 file changed, 48 insertions(+), 42 deletions(-)

diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 9e9534a07e..8f40ef4fed 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -114,6 +114,51 @@ static void *mmap_reserve(size_t size, int fd)
     return mmap(0, size, PROT_NONE, flags, fd, 0);
 }
 
+/*
+ * Activate memory in a reserved region from the given fd (if any), to make
+ * it accessible.
+ */
+static void *mmap_activate(void *ptr, size_t size, int fd, bool shared,
+                           bool is_pmem)
+{
+    int map_sync_flags = 0;
+    int flags = MAP_FIXED;
+    void *activated_ptr;
+
+    flags |= fd == -1 ? MAP_ANONYMOUS : 0;
+    flags |= shared ? MAP_SHARED : MAP_PRIVATE;
+    if (shared && is_pmem) {
+        map_sync_flags = MAP_SYNC | MAP_SHARED_VALIDATE;
+    }
+
+    activated_ptr = mmap(ptr, size, PROT_READ | PROT_WRITE,
+                         flags | map_sync_flags, fd, 0);
+    if (activated_ptr == MAP_FAILED && map_sync_flags) {
+        if (errno == ENOTSUP) {
+            char *proc_link = g_strdup_printf("/proc/self/fd/%d", fd);
+            char *file_name = g_malloc0(PATH_MAX);
+            int len = readlink(proc_link, file_name, PATH_MAX - 1);
+
+            if (len < 0) {
+                len = 0;
+            }
+            file_name[len] = '\0';
+            fprintf(stderr, "Warning: requesting persistence across crashes "
+                    "for backend file %s failed. Proceeding without "
+                    "persistence, data might become corrupted in case of host "
+                    "crash.\n", file_name);
+            g_free(proc_link);
+            g_free(file_name);
+        }
+        /*
+         * If mmap failed with MAP_SHARED_VALIDATE | MAP_SYNC, we will try
+         * again without these flags to handle backwards compatibility.
+         */
+        activated_ptr = mmap(ptr, size, PROT_READ | PROT_WRITE, flags, fd, 0);
+    }
+    return activated_ptr;
+}
+
 static inline size_t mmap_guard_pagesize(int fd)
 {
 #if defined(__powerpc64__) && defined(__linux__)
@@ -131,12 +176,8 @@ void *qemu_ram_mmap(int fd,
                     bool is_pmem)
 {
     const size_t guard_pagesize = mmap_guard_pagesize(fd);
-    int flags;
-    int map_sync_flags = 0;
-    size_t offset;
-    size_t total;
-    void *guardptr;
-    void *ptr;
+    size_t offset, total;
+    void *ptr, *guardptr;
 
     /*
      * Note: this always allocates at least one extra page of virtual address
@@ -153,44 +194,9 @@ void *qemu_ram_mmap(int fd,
     /* Always align to host page size */
     assert(align >= guard_pagesize);
 
-    flags = MAP_FIXED;
-    flags |= fd == -1 ? MAP_ANONYMOUS : 0;
-    flags |= shared ? MAP_SHARED : MAP_PRIVATE;
-    if (shared && is_pmem) {
-        map_sync_flags = MAP_SYNC | MAP_SHARED_VALIDATE;
-    }
-
     offset = QEMU_ALIGN_UP((uintptr_t)guardptr, align) - (uintptr_t)guardptr;
 
-    ptr = mmap(guardptr + offset, size, PROT_READ | PROT_WRITE,
-               flags | map_sync_flags, fd, 0);
-
-    if (ptr == MAP_FAILED && map_sync_flags) {
-        if (errno == ENOTSUP) {
-            char *proc_link, *file_name;
-            int len;
-            proc_link = g_strdup_printf("/proc/self/fd/%d", fd);
-            file_name = g_malloc0(PATH_MAX);
-            len = readlink(proc_link, file_name, PATH_MAX - 1);
-            if (len < 0) {
-                len = 0;
-            }
-            file_name[len] = '\0';
-            fprintf(stderr, "Warning: requesting persistence across crashes "
-                    "for backend file %s failed. Proceeding without "
-                    "persistence, data might become corrupted in case of host "
-                    "crash.\n", file_name);
-            g_free(proc_link);
-            g_free(file_name);
-        }
-        /*
-         * if map failed with MAP_SHARED_VALIDATE | MAP_SYNC,
-         * we will remove these flags to handle compatibility.
-         */
-        ptr = mmap(guardptr + offset, size, PROT_READ | PROT_WRITE,
-                   flags, fd, 0);
-    }
-
+    ptr = mmap_activate(guardptr + offset, size, fd, shared, is_pmem);
     if (ptr == MAP_FAILED) {
         munmap(guardptr, total);
         return MAP_FAILED;
-- 
2.24.1




* [PATCH v4 10/15] util/mmap-alloc: Prepare for resizeable mmaps
  2020-03-05 14:29 [PATCH v4 00/15] Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
                   ` (8 preceding siblings ...)
  2020-03-05 14:29 ` [PATCH v4 09/15] util/mmap-alloc: Factor out activating of memory to mmap_activate() David Hildenbrand
@ 2020-03-05 14:29 ` David Hildenbrand
  2020-03-25 15:09   ` Murilo Opsfelder Araújo
  2020-03-05 14:29 ` [PATCH v4 11/15] util/mmap-alloc: Implement " David Hildenbrand
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2020-03-05 14:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Richard Henderson,
	Igor Kotrasinski, David Hildenbrand, Dr . David Alan Gilbert,
	Peter Xu, Greg Kurz, Murilo Opsfelder Araujo, Igor Mammedov,
	Paolo Bonzini, Richard Henderson

When shrinking a mmap, we want to re-reserve the already activated area.
When growing a memory region, we want to activate it starting at a given
fd_offset. Prepare by allowing these parameters to be passed.

Also, let's make sure we always process full pages, to avoid
unmapping/remapping pages that are already in use when
growing/shrinking. Add some asserts.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Greg Kurz <groug@kaod.org>
Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/mmap-alloc.c | 34 +++++++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 8f40ef4fed..2767caa33b 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -83,12 +83,12 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
 }
 
 /*
- * Reserve a new memory region of the requested size to be used for mapping
- * from the given fd (if any).
+ * Reserve a new memory region of the requested size or re-reserve parts
+ * of an activated region to be used for mapping from the given fd (if any).
  */
-static void *mmap_reserve(size_t size, int fd)
+static void *mmap_reserve(void *ptr, size_t size, int fd)
 {
-    int flags = MAP_PRIVATE;
+    int flags = MAP_PRIVATE | (ptr ? MAP_FIXED : 0);
 
 #if defined(__powerpc64__) && defined(__linux__)
     /*
@@ -111,20 +111,24 @@ static void *mmap_reserve(size_t size, int fd)
     flags |= MAP_ANONYMOUS;
 #endif
 
-    return mmap(0, size, PROT_NONE, flags, fd, 0);
+    return mmap(ptr, size, PROT_NONE, flags, fd, 0);
 }
 
 /*
  * Activate memory in a reserved region from the given fd (if any), to make
  * it accessible.
  */
-static void *mmap_activate(void *ptr, size_t size, int fd, bool shared,
-                           bool is_pmem)
+static void *mmap_activate(void *ptr, size_t size, int fd, size_t fd_offset,
+                           bool shared, bool is_pmem)
 {
     int map_sync_flags = 0;
     int flags = MAP_FIXED;
     void *activated_ptr;
 
+    if (fd == -1) {
+        fd_offset = 0;
+    }
+
     flags |= fd == -1 ? MAP_ANONYMOUS : 0;
     flags |= shared ? MAP_SHARED : MAP_PRIVATE;
     if (shared && is_pmem) {
@@ -132,7 +136,7 @@ static void *mmap_activate(void *ptr, size_t size, int fd, bool shared,
     }
 
     activated_ptr = mmap(ptr, size, PROT_READ | PROT_WRITE,
-                         flags | map_sync_flags, fd, 0);
+                         flags | map_sync_flags, fd, fd_offset);
     if (activated_ptr == MAP_FAILED && map_sync_flags) {
         if (errno == ENOTSUP) {
             char *proc_link = g_strdup_printf("/proc/self/fd/%d", fd);
@@ -154,7 +158,8 @@ static void *mmap_activate(void *ptr, size_t size, int fd, bool shared,
          * If mmap failed with MAP_SHARED_VALIDATE | MAP_SYNC, we will try
          * again without these flags to handle backwards compatibility.
          */
-        activated_ptr = mmap(ptr, size, PROT_READ | PROT_WRITE, flags, fd, 0);
+        activated_ptr = mmap(ptr, size, PROT_READ | PROT_WRITE, flags, fd,
+                             fd_offset);
     }
     return activated_ptr;
 }
@@ -176,16 +181,19 @@ void *qemu_ram_mmap(int fd,
                     bool is_pmem)
 {
     const size_t guard_pagesize = mmap_guard_pagesize(fd);
+    const size_t pagesize = qemu_fd_getpagesize(fd);
     size_t offset, total;
     void *ptr, *guardptr;
 
+    g_assert(QEMU_IS_ALIGNED(size, pagesize));
+
     /*
      * Note: this always allocates at least one extra page of virtual address
      * space, even if size is already aligned.
      */
     total = size + align;
 
-    guardptr = mmap_reserve(total, fd);
+    guardptr = mmap_reserve(NULL, total, fd);
     if (guardptr == MAP_FAILED) {
         return MAP_FAILED;
     }
@@ -196,7 +204,7 @@ void *qemu_ram_mmap(int fd,
 
     offset = QEMU_ALIGN_UP((uintptr_t)guardptr, align) - (uintptr_t)guardptr;
 
-    ptr = mmap_activate(guardptr + offset, size, fd, shared, is_pmem);
+    ptr = mmap_activate(guardptr + offset, size, fd, 0, shared, is_pmem);
     if (ptr == MAP_FAILED) {
         munmap(guardptr, total);
         return MAP_FAILED;
@@ -220,6 +228,10 @@ void *qemu_ram_mmap(int fd,
 
 void qemu_ram_munmap(int fd, void *ptr, size_t size)
 {
+    const size_t pagesize = qemu_fd_getpagesize(fd);
+
+    g_assert(QEMU_IS_ALIGNED(size, pagesize));
+
     if (ptr) {
         /* Unmap both the RAM block and the guard page */
         munmap(ptr, size + mmap_guard_pagesize(fd));
-- 
2.24.1




* [PATCH v4 11/15] util/mmap-alloc: Implement resizeable mmaps
  2020-03-05 14:29 [PATCH v4 00/15] Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
                   ` (9 preceding siblings ...)
  2020-03-05 14:29 ` [PATCH v4 10/15] util/mmap-alloc: Prepare for resizeable mmaps David Hildenbrand
@ 2020-03-05 14:29 ` David Hildenbrand
  2020-03-25 15:14   ` Murilo Opsfelder Araújo
  2020-03-05 14:29 ` [PATCH v4 12/15] util: vfio-helpers: Implement ram_block_resized() David Hildenbrand
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2020-03-05 14:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Richard Henderson,
	Igor Kotrasinski, David Hildenbrand, Dr . David Alan Gilbert,
	Peter Xu, Greg Kurz, Murilo Opsfelder Araujo, Igor Mammedov,
	Paolo Bonzini, Richard Henderson

Implement resizeable mmaps. For now, the actual resizing is not wired up.
Introduce qemu_ram_mmap_resizeable() and qemu_ram_mmap_resize(). Make
qemu_ram_mmap() a wrapper around qemu_ram_mmap_resizeable().
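
A hypothetical usage sketch of the new interface (fd == -1 denotes
anonymous memory; all sizes must be multiples of the page size):

    /* reserve max_size, but only activate the first size bytes */
    void *ptr = qemu_ram_mmap_resizeable(-1, size, max_size, align,
                                         false, false);
    if (ptr != MAP_FAILED) {
        /* grow: activates [size, new_size) within the reservation */
        if (!qemu_ram_mmap_resize(ptr, -1, size, new_size, false, false)) {
            /* growing failed, e.g., not enough memory to populate */
        }
        qemu_ram_munmap(-1, ptr, max_size); /* always pass max_size */
    }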

Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Greg Kurz <groug@kaod.org>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/qemu/mmap-alloc.h | 21 +++++++++++--------
 util/mmap-alloc.c         | 43 ++++++++++++++++++++++++++++-----------
 2 files changed, 44 insertions(+), 20 deletions(-)

diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index e786266b92..ca8f7edf70 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -7,11 +7,13 @@ size_t qemu_fd_getpagesize(int fd);
 size_t qemu_mempath_getpagesize(const char *mem_path);
 
 /**
- * qemu_ram_mmap: mmap the specified file or device.
+ * qemu_ram_mmap_resizeable: reserve a memory region of @max_size to mmap the
+ *                           specified file or device and mmap @size of it.
  *
  * Parameters:
  *  @fd: the file or the device to mmap
  *  @size: the number of bytes to be mmaped
+ *  @max_size: the number of bytes to be reserved
  *  @align: if not zero, specify the alignment of the starting mapping address;
  *          otherwise, the alignment in use will be determined by QEMU.
  *  @shared: map has RAM_SHARED flag.
@@ -21,12 +23,15 @@ size_t qemu_mempath_getpagesize(const char *mem_path);
  *  On success, return a pointer to the mapped area.
  *  On failure, return MAP_FAILED.
  */
-void *qemu_ram_mmap(int fd,
-                    size_t size,
-                    size_t align,
-                    bool shared,
-                    bool is_pmem);
-
-void qemu_ram_munmap(int fd, void *ptr, size_t size);
+void *qemu_ram_mmap_resizeable(int fd, size_t size, size_t max_size,
+                              size_t align, bool shared, bool is_pmem);
+bool qemu_ram_mmap_resize(void *ptr, int fd, size_t old_size, size_t new_size,
+                          bool shared, bool is_pmem);
+static inline void *qemu_ram_mmap(int fd, size_t size, size_t align,
+                                  bool shared, bool is_pmem)
+{
+    return qemu_ram_mmap_resizeable(fd, size, size, align, shared, is_pmem);
+}
+void qemu_ram_munmap(int fd, void *ptr, size_t max_size);
 
 #endif
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 2767caa33b..7ed85f16d3 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -174,11 +174,8 @@ static inline size_t mmap_guard_pagesize(int fd)
 #endif
 }
 
-void *qemu_ram_mmap(int fd,
-                    size_t size,
-                    size_t align,
-                    bool shared,
-                    bool is_pmem)
+void *qemu_ram_mmap_resizeable(int fd, size_t size, size_t max_size,
+                               size_t align, bool shared, bool is_pmem)
 {
     const size_t guard_pagesize = mmap_guard_pagesize(fd);
     const size_t pagesize = qemu_fd_getpagesize(fd);
@@ -186,12 +183,14 @@ void *qemu_ram_mmap(int fd,
     void *ptr, *guardptr;
 
     g_assert(QEMU_IS_ALIGNED(size, pagesize));
+    g_assert(QEMU_IS_ALIGNED(max_size, pagesize));
 
     /*
      * Note: this always allocates at least one extra page of virtual address
-     * space, even if size is already aligned.
+     * space, even if the size is already aligned. We will reserve an area of
+     * at least max_size, but only activate the requested part of it.
      */
-    total = size + align;
+    total = max_size + align;
 
     guardptr = mmap_reserve(NULL, total, fd);
     if (guardptr == MAP_FAILED) {
@@ -219,21 +218,41 @@ void *qemu_ram_mmap(int fd,
      * a guard page guarding against potential buffer overflows.
      */
     total -= offset;
-    if (total > size + guard_pagesize) {
-        munmap(ptr + size + guard_pagesize, total - size - guard_pagesize);
+    if (total > max_size + guard_pagesize) {
+        munmap(ptr + max_size + guard_pagesize,
+               total - max_size - guard_pagesize);
     }
 
     return ptr;
 }
 
-void qemu_ram_munmap(int fd, void *ptr, size_t size)
+bool qemu_ram_mmap_resize(void *ptr, int fd, size_t old_size, size_t new_size,
+                          bool shared, bool is_pmem)
 {
     const size_t pagesize = qemu_fd_getpagesize(fd);
 
-    g_assert(QEMU_IS_ALIGNED(size, pagesize));
+    g_assert(QEMU_IS_ALIGNED(old_size, pagesize));
+    g_assert(QEMU_IS_ALIGNED(new_size, pagesize));
+
+    if (old_size < new_size) {
+        /* activate the missing piece in the reserved area */
+        ptr = mmap_activate(ptr + old_size, new_size - old_size, fd, old_size,
+                            shared, is_pmem);
+    } else if (old_size > new_size) {
+        /* discard this piece, marking it reserved */
+        ptr = mmap_reserve(ptr + new_size, old_size - new_size, fd);
+    }
+    return ptr != MAP_FAILED;
+}
+
+void qemu_ram_munmap(int fd, void *ptr, size_t max_size)
+{
+    const size_t pagesize = qemu_fd_getpagesize(fd);
+
+    g_assert(QEMU_IS_ALIGNED(max_size, pagesize));
 
     if (ptr) {
         /* Unmap both the RAM block and the guard page */
-        munmap(ptr, size + mmap_guard_pagesize(fd));
+        munmap(ptr, max_size + mmap_guard_pagesize(fd));
     }
 }
-- 
2.24.1




* [PATCH v4 12/15] util: vfio-helpers: Implement ram_block_resized()
  2020-03-05 14:29 [PATCH v4 00/15] Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
                   ` (10 preceding siblings ...)
  2020-03-05 14:29 ` [PATCH v4 11/15] util/mmap-alloc: Implement " David Hildenbrand
@ 2020-03-05 14:29 ` David Hildenbrand
  2020-03-25 15:17   ` Murilo Opsfelder Araújo
  2020-03-05 14:29 ` [PATCH v4 13/15] util: oslib: Resizeable anonymous allocations under POSIX David Hildenbrand
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2020-03-05 14:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	David Hildenbrand, Dr . David Alan Gilbert, Peter Xu,
	Alex Williamson, Murilo Opsfelder Araujo, Igor Mammedov,
	Paolo Bonzini, Stefan Hajnoczi, Richard Henderson

Let's implement ram_block_resized(), allowing resizeable mappings.

For resizeable mappings, we reserve $max_size IOVA address space, but only
map $size of it. When resizing, unmap the old part and remap the new
part. We would need, e.g., a new ioctl to do this atomically (e.g., to
resize while the guest is running).

Right now, we only resize RAM blocks during incoming migration (when
syncing RAM block sizes during the precopy phase) or after guest
resets when building ACPI tables. Any future user of resizeable RAM has to
be aware that vfio mappings have to be treated with care.
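
Conceptually, the non-atomic resize corresponds to this sketch (plain
VFIO ioctls, not the patch code):

    struct vfio_iommu_type1_dma_unmap unmap = {
        .argsz = sizeof(unmap),
        .iova  = iova,
        .size  = old_size,
    };
    struct vfio_iommu_type1_dma_map map = {
        .argsz = sizeof(map),
        .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
        .vaddr = (uintptr_t)host,
        .iova  = iova,
        .size  = new_size,
    };

    /* There is a window between the two ioctls where guest DMA to the
     * range would fail - hence the need for an atomic resize ioctl. */
    ioctl(container, VFIO_IOMMU_UNMAP_DMA, &unmap);
    ioctl(container, VFIO_IOMMU_MAP_DMA, &map);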

Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/trace-events   |  7 ++--
 util/vfio-helpers.c | 95 +++++++++++++++++++++++++++++++++++++++------
 2 files changed, 88 insertions(+), 14 deletions(-)

diff --git a/util/trace-events b/util/trace-events
index 83b6639018..a4d39eca5e 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -74,10 +74,11 @@ qemu_mutex_unlock(void *mutex, const char *file, const int line) "released mutex
 
 # vfio-helpers.c
 qemu_vfio_dma_reset_temporary(void *s) "s %p"
-qemu_vfio_ram_block_added(void *s, void *p, size_t size) "s %p host %p size 0x%zx"
-qemu_vfio_ram_block_removed(void *s, void *p, size_t size) "s %p host %p size 0x%zx"
+qemu_vfio_ram_block_added(void *s, void *p, size_t size, size_t max_size) "s %p host %p size 0x%zx max_size 0x%zx"
+qemu_vfio_ram_block_removed(void *s, void *p, size_t size, size_t max_size) "s %p host %p size 0x%zx max_size 0x%zx"
+qemu_vfio_ram_block_resized(void *s, void *p, size_t old_size, size_t new_size) "s %p host %p old_size 0x%zx new_size 0x%zx"
 qemu_vfio_find_mapping(void *s, void *p) "s %p host %p"
-qemu_vfio_new_mapping(void *s, void *host, size_t size, int index, uint64_t iova) "s %p host %p size %zu index %d iova 0x%"PRIx64
+qemu_vfio_new_mapping(void *s, void *host, size_t size, size_t max_size, int index, uint64_t iova) "s %p host %p size %zu max_size %zu index %d iova 0x%"PRIx64
 qemu_vfio_do_mapping(void *s, void *host, size_t size, uint64_t iova) "s %p host %p size %zu iova 0x%"PRIx64
 qemu_vfio_dma_map(void *s, void *host, size_t size, bool temporary, uint64_t *iova) "s %p host %p size %zu temporary %d iova %p"
 qemu_vfio_dma_unmap(void *s, void *host) "s %p host %p"
diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index f0c77f0d69..789faf38bd 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -36,6 +36,7 @@ typedef struct {
     /* Page aligned addr. */
     void *host;
     size_t size;
+    size_t max_size;
     uint64_t iova;
 } IOVAMapping;
 
@@ -372,14 +373,20 @@ fail_container:
     return ret;
 }
 
+static int qemu_vfio_dma_map_resizeable(QEMUVFIOState *s, void *host,
+                                        size_t size, size_t max_size,
+                                        bool temporary, uint64_t *iova);
+static void qemu_vfio_dma_map_resize(QEMUVFIOState *s, void *host,
+                                     size_t old_size, size_t new_size);
+
 static void qemu_vfio_ram_block_added(RAMBlockNotifier *n, void *host,
                                       size_t size, size_t max_size)
 {
     QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
     int ret;
 
-    trace_qemu_vfio_ram_block_added(s, host, max_size);
-    ret = qemu_vfio_dma_map(s, host, max_size, false, NULL);
+    trace_qemu_vfio_ram_block_added(s, host, size, max_size);
+    ret = qemu_vfio_dma_map_resizeable(s, host, size, max_size, false, NULL);
     if (ret) {
         error_report("qemu_vfio_dma_map(%p, %zu) failed: %s", host, max_size,
                      strerror(-ret));
@@ -391,16 +398,28 @@ static void qemu_vfio_ram_block_removed(RAMBlockNotifier *n, void *host,
 {
     QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
     if (host) {
-        trace_qemu_vfio_ram_block_removed(s, host, max_size);
+        trace_qemu_vfio_ram_block_removed(s, host, size, max_size);
         qemu_vfio_dma_unmap(s, host);
     }
 }
 
+static void qemu_vfio_ram_block_resized(RAMBlockNotifier *n, void *host,
+                                        size_t old_size, size_t new_size)
+{
+    QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
+
+    if (host) {
+        trace_qemu_vfio_ram_block_resized(s, host, old_size, new_size);
+        qemu_vfio_dma_map_resize(s, host, old_size, new_size);
+    }
+}
+
 static void qemu_vfio_open_common(QEMUVFIOState *s)
 {
     qemu_mutex_init(&s->lock);
     s->ram_notifier.ram_block_added = qemu_vfio_ram_block_added;
     s->ram_notifier.ram_block_removed = qemu_vfio_ram_block_removed;
+    s->ram_notifier.ram_block_resized = qemu_vfio_ram_block_resized;
     s->low_water_mark = QEMU_VFIO_IOVA_MIN;
     s->high_water_mark = QEMU_VFIO_IOVA_MAX;
     ram_block_notifier_add(&s->ram_notifier);
@@ -495,16 +514,23 @@ static IOVAMapping *qemu_vfio_find_mapping(QEMUVFIOState *s, void *host,
  */
 static IOVAMapping *qemu_vfio_add_mapping(QEMUVFIOState *s,
                                           void *host, size_t size,
-                                          int index, uint64_t iova)
+                                          size_t max_size, int index,
+                                          uint64_t iova)
 {
+    const IOVAMapping m = {
+        .host = host,
+        .size = size,
+        .max_size = max_size,
+        .iova = iova,
+    };
     int shift;
-    IOVAMapping m = {.host = host, .size = size, .iova = iova};
     IOVAMapping *insert;
 
     assert(QEMU_IS_ALIGNED(size, qemu_real_host_page_size));
+    assert(size <= max_size);
     assert(QEMU_IS_ALIGNED(s->low_water_mark, qemu_real_host_page_size));
     assert(QEMU_IS_ALIGNED(s->high_water_mark, qemu_real_host_page_size));
-    trace_qemu_vfio_new_mapping(s, host, size, index, iova);
+    trace_qemu_vfio_new_mapping(s, host, size, max_size, index, iova);
 
     assert(index >= 0);
     s->nr_mappings++;
@@ -597,9 +623,14 @@ static bool qemu_vfio_verify_mappings(QEMUVFIOState *s)
  * the result in @iova if not NULL. The caller need to make sure the area is
  * aligned to page size, and mustn't overlap with existing mapping areas (split
  * mapping status within this area is not allowed).
+ *
+ * If size < max_size, a region of max_size in the IOVA address space is
+ * reserved, such that the mapping can later be resized. Resizeable mappings
+ * are only allowed for !temporary mappings.
  */
-int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
-                      bool temporary, uint64_t *iova)
+static int qemu_vfio_dma_map_resizeable(QEMUVFIOState *s, void *host,
+                                        size_t size, size_t max_size,
+                                        bool temporary, uint64_t *iova)
 {
     int ret = 0;
     int index;
@@ -608,19 +639,24 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
 
     assert(QEMU_PTR_IS_ALIGNED(host, qemu_real_host_page_size));
     assert(QEMU_IS_ALIGNED(size, qemu_real_host_page_size));
+    assert(QEMU_IS_ALIGNED(max_size, qemu_real_host_page_size));
+    assert(size == max_size || !temporary);
+    assert(size <= max_size);
+
     trace_qemu_vfio_dma_map(s, host, size, temporary, iova);
     qemu_mutex_lock(&s->lock);
     mapping = qemu_vfio_find_mapping(s, host, &index);
     if (mapping) {
         iova0 = mapping->iova + ((uint8_t *)host - (uint8_t *)mapping->host);
     } else {
-        if (s->high_water_mark - s->low_water_mark + 1 < size) {
+        if (s->high_water_mark - s->low_water_mark + 1 < max_size) {
             ret = -ENOMEM;
             goto out;
         }
         if (!temporary) {
             iova0 = s->low_water_mark;
-            mapping = qemu_vfio_add_mapping(s, host, size, index + 1, iova0);
+            mapping = qemu_vfio_add_mapping(s, host, size, max_size, index + 1,
+                                            iova0);
             if (!mapping) {
                 ret = -ENOMEM;
                 goto out;
@@ -631,7 +667,7 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
                 qemu_vfio_remove_mapping(s, mapping);
                 goto out;
             }
-            s->low_water_mark += size;
+            s->low_water_mark += max_size;
             qemu_vfio_dump_mappings(s);
         } else {
             iova0 = s->high_water_mark - size;
@@ -650,6 +686,12 @@ out:
     return ret;
 }
 
+int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
+                      bool temporary, uint64_t *iova)
+{
+    return qemu_vfio_dma_map_resizeable(s, host, size, size, temporary, iova);
+}
+
 /* Reset the high watermark and free all "temporary" mappings. */
 int qemu_vfio_dma_reset_temporary(QEMUVFIOState *s)
 {
@@ -694,6 +736,38 @@ out:
     qemu_mutex_unlock(&s->lock);
 }
 
+static void qemu_vfio_dma_map_resize(QEMUVFIOState *s, void *host,
+                                     size_t old_size, size_t new_size)
+{
+    IOVAMapping *m;
+    int index = 0;
+
+    qemu_mutex_lock(&s->lock);
+    m = qemu_vfio_find_mapping(s, host, &index);
+    if (!m) {
+        qemu_mutex_unlock(&s->lock);
+        return;
+    }
+    assert(m->size == old_size);
+    assert(new_size <= m->max_size);
+
+    /*
+     * For now, we must unmap the whole mapped range first and remap with
+     * the new size. The reason is that VFIO_IOMMU_UNMAP_DMA might fail
+     * when partially unmapping previous mappings. Although we could add
+     * new mappings to extend the old range, we won't always be able to
+     * shrink. The side effect is that it's never safe to resize during
+     * VM execution; we'll, e.g., need a new ioctl to make this work.
+     */
+    qemu_vfio_undo_mapping(s, m->iova, m->size);
+    qemu_vfio_do_mapping(s, host, m->iova, new_size);
+
+    m->size = new_size;
+    assert(qemu_vfio_verify_mappings(s));
+
+    qemu_mutex_unlock(&s->lock);
+}
+
 static void qemu_vfio_reset(QEMUVFIOState *s)
 {
     ioctl(s->device, VFIO_DEVICE_RESET);
-- 
2.24.1




* [PATCH v4 13/15] util: oslib: Resizeable anonymous allocations under POSIX
  2020-03-05 14:29 [PATCH v4 00/15] Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
                   ` (11 preceding siblings ...)
  2020-03-05 14:29 ` [PATCH v4 12/15] util: vfio-helpers: Implement ram_block_resized() David Hildenbrand
@ 2020-03-05 14:29 ` David Hildenbrand
  2020-03-25 15:20   ` Murilo Opsfelder Araújo
  2020-03-05 14:29 ` [PATCH v4 14/15] numa: Introduce ram_block_notifiers_support_resize() David Hildenbrand
  2020-03-05 14:29 ` [PATCH v4 15/15] exec: Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
  14 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2020-03-05 14:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Stefan Weil,
	Igor Kotrasinski, David Hildenbrand, Dr . David Alan Gilbert,
	Peter Xu, Murilo Opsfelder Araujo, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

Introduce qemu_anon_ram_alloc_resizeable() and qemu_anon_ram_resize().
Implement them under POSIX; under WIN32, the stubs simply fail (returning
NULL and false, respectively).

Under POSIX, we make use of resizeable mmaps. An implementation under
WIN32 should be possible and can be added later.

In qemu_anon_ram_free(), rename the size parameter to max_size, to make
it clearer that we have to use the maximum size when freeing resizeable
anonymous allocations.
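
For illustration only (the sizes are made up; GiB is from
"qemu/units.h"), the expected calling convention is roughly:

    uint64_t align;
    void *ptr;

    ptr = qemu_anon_ram_alloc_resizeable(1 * GiB, 4 * GiB, &align, false);
    if (ptr && qemu_anon_ram_resize(ptr, 1 * GiB, 2 * GiB, false)) {
        /* the second GiB is now accessible as well */
    }
    qemu_anon_ram_free(ptr, 4 * GiB);  /* always free with the maximum size */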

Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Stefan Weil <sw@weilnetz.de>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/qemu/osdep.h |  6 +++++-
 util/oslib-posix.c   | 37 ++++++++++++++++++++++++++++++++++---
 util/oslib-win32.c   | 14 ++++++++++++++
 util/trace-events    |  4 +++-
 4 files changed, 56 insertions(+), 5 deletions(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 9bd3dcfd13..a1ea9e043d 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -311,8 +311,12 @@ int qemu_daemon(int nochdir, int noclose);
 void *qemu_try_memalign(size_t alignment, size_t size);
 void *qemu_memalign(size_t alignment, size_t size);
 void *qemu_anon_ram_alloc(size_t size, uint64_t *align, bool shared);
+void *qemu_anon_ram_alloc_resizeable(size_t size, size_t max_size,
+                                     uint64_t *align, bool shared);
+bool qemu_anon_ram_resize(void *ptr, size_t old_size, size_t new_size,
+                          bool shared);
 void qemu_vfree(void *ptr);
-void qemu_anon_ram_free(void *ptr, size_t size);
+void qemu_anon_ram_free(void *ptr, size_t max_size);
 
 #define QEMU_MADV_INVALID -1
 
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 897e8f3ba6..34b1ce74b7 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -223,16 +223,47 @@ void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
     return ptr;
 }
 
+void *qemu_anon_ram_alloc_resizeable(size_t size, size_t max_size,
+                                     uint64_t *alignment, bool shared)
+{
+    size_t align = QEMU_VMALLOC_ALIGN;
+    void *ptr = qemu_ram_mmap_resizeable(-1, size, max_size, align, shared,
+                                         false);
+
+    if (ptr == MAP_FAILED) {
+        return NULL;
+    }
+
+    if (alignment) {
+        *alignment = align;
+    }
+
+    trace_qemu_anon_ram_alloc_resizeable(size, max_size, ptr);
+    return ptr;
+}
+
+bool qemu_anon_ram_resize(void *ptr, size_t old_size, size_t new_size,
+                          bool shared)
+{
+    bool resized = qemu_ram_mmap_resize(ptr, -1, old_size, new_size, shared,
+                                        false);
+
+    if (resized) {
+        trace_qemu_anon_ram_resize(old_size, new_size, ptr);
+    }
+    return resized;
+}
+
 void qemu_vfree(void *ptr)
 {
     trace_qemu_vfree(ptr);
     free(ptr);
 }
 
-void qemu_anon_ram_free(void *ptr, size_t size)
+void qemu_anon_ram_free(void *ptr, size_t max_size)
 {
-    trace_qemu_anon_ram_free(ptr, size);
-    qemu_ram_munmap(-1, ptr, size);
+    trace_qemu_anon_ram_free(ptr, max_size);
+    qemu_ram_munmap(-1, ptr, max_size);
 }
 
 void qemu_set_block(int fd)
diff --git a/util/oslib-win32.c b/util/oslib-win32.c
index e9b14ab178..c034fdfe01 100644
--- a/util/oslib-win32.c
+++ b/util/oslib-win32.c
@@ -90,6 +90,20 @@ void *qemu_anon_ram_alloc(size_t size, uint64_t *align, bool shared)
     return ptr;
 }
 
+void *qemu_anon_ram_alloc_resizeable(size_t size, size_t max_size,
+                                     uint64_t *align, bool shared)
+{
+    /* resizeable ram not implemented yet */
+    return NULL;
+}
+
+bool qemu_anon_ram_resize(void *ptr, size_t old_size, size_t new_size,
+                          bool shared)
+{
+    /* resizeable ram not implemented yet */
+    return false;
+}
+
 void qemu_vfree(void *ptr)
 {
     trace_qemu_vfree(ptr);
diff --git a/util/trace-events b/util/trace-events
index a4d39eca5e..24a6f1a1e1 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -46,8 +46,10 @@ qemu_co_mutex_unlock_return(void *mutex, void *self) "mutex %p self %p"
 # oslib-posix.c
 qemu_memalign(size_t alignment, size_t size, void *ptr) "alignment %zu size %zu ptr %p"
 qemu_anon_ram_alloc(size_t size, void *ptr) "size %zu ptr %p"
+qemu_anon_ram_alloc_resizeable(size_t size, size_t max_size, void *ptr) "size %zu max_size %zu ptr %p"
+qemu_anon_ram_resize(size_t old_size, size_t new_size, void *ptr) "old_size %zu new_size %zu ptr %p"
 qemu_vfree(void *ptr) "ptr %p"
-qemu_anon_ram_free(void *ptr, size_t size) "ptr %p size %zu"
+qemu_anon_ram_free(void *ptr, size_t max_size) "ptr %p max_size %zu"
 
 # hbitmap.c
 hbitmap_iter_skip_words(const void *hb, void *hbi, uint64_t pos, unsigned long cur) "hb %p hbi %p pos %"PRId64" cur 0x%lx"
-- 
2.24.1




* [PATCH v4 14/15] numa: Introduce ram_block_notifiers_support_resize()
  2020-03-05 14:29 [PATCH v4 00/15] Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
                   ` (12 preceding siblings ...)
  2020-03-05 14:29 ` [PATCH v4 13/15] util: oslib: Resizeable anonymous allocations under POSIX David Hildenbrand
@ 2020-03-05 14:29 ` David Hildenbrand
  2020-03-25 15:24   ` Murilo Opsfelder Araújo
  2020-03-05 14:29 ` [PATCH v4 15/15] exec: Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
  14 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2020-03-05 14:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	David Hildenbrand, Dr . David Alan Gilbert, Peter Xu,
	Murilo Opsfelder Araujo, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

We want to actually use resizeable allocations in resizeable ram blocks
(IOW, make everything between used_length and max_length inaccessible) -
however, not all ram block notifiers can support that.

Introduce a way to detect whether any registered notifier does not
support resizes - ram_block_notifiers_support_resize() - which we can
later use to fall back to legacy handling if a registered notifier
(esp., SEV and HAX) does not support actual resizes.
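
A sketch of the intended use (patch 15 wires this up in
ram_block_alloc_ram(); qemu_ram_is_resizeable() is introduced there as
well):

    if (qemu_ram_is_resizeable(rb) && ram_block_notifiers_support_resize()) {
        /* every notifier implements ram_block_resized() - we may use
         * actually resizeable allocations */
    } else {
        /* legacy handling: allocate the full max_length upfront */
    }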

Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/core/numa.c         | 12 ++++++++++++
 include/exec/ramlist.h |  1 +
 2 files changed, 13 insertions(+)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index 37ce175e13..1d5288c22c 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -914,3 +914,15 @@ void ram_block_notify_resize(void *host, size_t old_size, size_t new_size)
         }
     }
 }
+
+bool ram_block_notifiers_support_resize(void)
+{
+    RAMBlockNotifier *notifier;
+
+    QLIST_FOREACH(notifier, &ram_list.ramblock_notifiers, next) {
+        if (!notifier->ram_block_resized) {
+            return false;
+        }
+    }
+    return true;
+}
diff --git a/include/exec/ramlist.h b/include/exec/ramlist.h
index 293c0ddabe..ac5811be96 100644
--- a/include/exec/ramlist.h
+++ b/include/exec/ramlist.h
@@ -79,6 +79,7 @@ void ram_block_notifier_remove(RAMBlockNotifier *n);
 void ram_block_notify_add(void *host, size_t size, size_t max_size);
 void ram_block_notify_remove(void *host, size_t size, size_t max_size);
 void ram_block_notify_resize(void *host, size_t old_size, size_t new_size);
+bool ram_block_notifiers_support_resize(void);
 
 void ram_block_dump(Monitor *mon);
 
-- 
2.24.1




* [PATCH v4 15/15] exec: Ram blocks with resizeable anonymous allocations under POSIX
  2020-03-05 14:29 [PATCH v4 00/15] Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
                   ` (13 preceding siblings ...)
  2020-03-05 14:29 ` [PATCH v4 14/15] numa: Introduce ram_block_notifiers_support_resize() David Hildenbrand
@ 2020-03-05 14:29 ` David Hildenbrand
  2020-03-25 15:34   ` Murilo Opsfelder Araújo
  14 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2020-03-05 14:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Stefan Weil,
	Igor Kotrasinski, David Hildenbrand, Dr . David Alan Gilbert,
	Peter Xu, Shameerali Kolothum Thodi, Murilo Opsfelder Araujo,
	Igor Mammedov, Paolo Bonzini, Richard Henderson

We can now make use of resizeable anonymous allocations to implement
actually resizeable ram blocks. Resizeable anonymous allocations are
not implemented under WIN32 yet and are not available when using
alternative allocators. Fall back to the existing handling.

We also have to fall back to the existing handling in case any ram block
notifier does not support resizing (esp., AMD SEV, HAX) yet. Remember
in RAM_RESIZEABLE_ALLOC if we are using resizeable anonymous allocations.

Try to grow early, as growing can easily fail when out of memory. Shrink
late and ignore errors (nothing will actually break); warn only.

The benefit of actually resizeable ram blocks is that, e.g., under Linux,
only the actual size will be reserved (even if
"/proc/sys/vm/overcommit_memory" is set to "never"). Additional memory will
be reserved when trying to resize, which allows us to have ram blocks that
start small but can theoretically grow very large.

Note1: We are not able to create resizeable ram blocks with pre-allocated
       memory yet, so prealloc is not affected.
Note2: mlock should work as it used to as os_mlock() does a
       mlockall(MCL_CURRENT | MCL_FUTURE), which includes future
       mappings.
Note3: Nobody should access memory beyond used_length. Memory notifiers
       already properly take care of this, only ram block notifiers
       violate this constraint and, therefore, have to be special-cased.
       Especially, any ram block notifier that might dynamically
       register at runtime (e.g., vfio) has to support resizes. Add an
       assert for that. Both HAX and SEV register early, so they are
       fine.
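
Condensed for illustration (error handling and details omitted; the
variables mirror the fields used in qemu_ram_resize() below), the
resulting flow for blocks with RAM_RESIZEABLE_ALLOC is:

    /* grow early - the only step that is allowed to fail */
    if (oldsize < newsize &&
        !qemu_anon_ram_resize(host, oldsize, newsize, shared)) {
        return -ENOMEM;
    }
    ram_block_notify_resize(host, oldsize, newsize);
    /* ... update used_length, bitmaps, call the resized() callback ... */
    /* shrink late - failure is harmless */
    if (newsize < oldsize &&
        !qemu_anon_ram_resize(host, oldsize, newsize, shared)) {
        warn_report("Shrinking memory allocation failed");
    }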

Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Stefan Weil <sw@weilnetz.de>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 exec.c                    | 65 ++++++++++++++++++++++++++++++++++++---
 hw/core/numa.c            |  7 +++++
 include/exec/cpu-common.h |  2 ++
 include/exec/memory.h     |  8 +++++
 4 files changed, 77 insertions(+), 5 deletions(-)

diff --git a/exec.c b/exec.c
index 9c3cc79193..6c6b6e12d2 100644
--- a/exec.c
+++ b/exec.c
@@ -2001,6 +2001,16 @@ void qemu_ram_unset_migratable(RAMBlock *rb)
     rb->flags &= ~RAM_MIGRATABLE;
 }
 
+bool qemu_ram_is_resizeable(RAMBlock *rb)
+{
+    return rb->flags & RAM_RESIZEABLE;
+}
+
+bool qemu_ram_is_resizeable_alloc(RAMBlock *rb)
+{
+    return rb->flags & RAM_RESIZEABLE_ALLOC;
+}
+
 /* Called with iothread lock held.  */
 void qemu_ram_set_idstr(RAMBlock *new_block, const char *name, DeviceState *dev)
 {
@@ -2094,6 +2104,7 @@ static void qemu_ram_apply_settings(void *host, size_t length)
  */
 int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
 {
+    const bool shared = block->flags & RAM_SHARED;
     const ram_addr_t oldsize = block->used_length;
 
     assert(block);
@@ -2104,7 +2115,7 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
         return 0;
     }
 
-    if (!(block->flags & RAM_RESIZEABLE)) {
+    if (!qemu_ram_is_resizeable(block)) {
         error_setg_errno(errp, EINVAL,
                          "Length mismatch: %s: 0x" RAM_ADDR_FMT
                          " in != 0x" RAM_ADDR_FMT, block->idstr,
@@ -2120,6 +2131,15 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
         return -EINVAL;
     }
 
+    if (oldsize < newsize && qemu_ram_is_resizeable_alloc(block)) {
+        if (!qemu_anon_ram_resize(block->host, oldsize, newsize, shared)) {
+            error_setg_errno(errp, ENOMEM, "Cannot allocate enough memory");
+            return -ENOMEM;
+        }
+        /* apply settings for the newly accessible memory */
+        qemu_ram_apply_settings(block->host + oldsize, newsize - oldsize);
+    }
+
     /* Notify before modifying the ram block and touching the bitmaps. */
     if (block->host) {
         ram_block_notify_resize(block->host, oldsize, newsize);
@@ -2133,6 +2153,16 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
     if (block->resized) {
         block->resized(block->idstr, newsize, block->host);
     }
+
+    /*
+     * Shrinking will only fail in rare scenarios (e.g., maximum number of
+     * mappings reached), and can be ignored. Warn only.
+     */
+    if (newsize < oldsize && qemu_ram_is_resizeable_alloc(block) &&
+        !qemu_anon_ram_resize(block->host, oldsize, newsize, shared)) {
+        warn_report("Shrinking memory allocation failed");
+    }
+
     return 0;
 }
 
@@ -2211,6 +2241,29 @@ static void dirty_memory_extend(ram_addr_t old_ram_size,
     }
 }
 
+static void ram_block_alloc_ram(RAMBlock *rb)
+{
+    const bool shared = qemu_ram_is_shared(rb);
+
+    assert(!(rb->flags & RAM_RESIZEABLE_ALLOC));
+    /*
+     * If we can, try to allocate actually resizeable ram. Will also fail
+     * if qemu_anon_ram_alloc_resizeable() is not implemented.
+     */
+    if (phys_mem_alloc == qemu_anon_ram_alloc &&
+        qemu_ram_is_resizeable(rb) &&
+        ram_block_notifiers_support_resize()) {
+        rb->host = qemu_anon_ram_alloc_resizeable(rb->used_length,
+                                                  rb->max_length,
+                                                  &rb->mr->align, shared);
+        if (rb->host) {
+            rb->flags |= RAM_RESIZEABLE_ALLOC;
+            return;
+        }
+    }
+    rb->host = phys_mem_alloc(rb->max_length, &rb->mr->align, shared);
+}
+
 static void ram_block_add(RAMBlock *new_block, Error **errp)
 {
     RAMBlock *block;
@@ -2233,9 +2286,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
                 return;
             }
         } else {
-            new_block->host = phys_mem_alloc(new_block->max_length,
-                                             &new_block->mr->align,
-                                             qemu_ram_is_shared(new_block));
+            ram_block_alloc_ram(new_block);
             if (!new_block->host) {
                 error_setg_errno(errp, errno,
                                  "cannot set up guest memory '%s'",
@@ -2280,7 +2331,11 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
                                         DIRTY_CLIENTS_ALL);
 
     if (new_block->host) {
-        qemu_ram_apply_settings(new_block->host, new_block->max_length);
+        if (qemu_ram_is_resizeable_alloc(new_block)) {
+            qemu_ram_apply_settings(new_block->host, new_block->used_length);
+        } else {
+            qemu_ram_apply_settings(new_block->host, new_block->max_length);
+        }
         ram_block_notify_add(new_block->host, new_block->used_length,
                              new_block->max_length);
     }
diff --git a/hw/core/numa.c b/hw/core/numa.c
index 1d5288c22c..c547549e49 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -862,6 +862,13 @@ static int ram_block_notify_add_single(RAMBlock *rb, void *opaque)
     RAMBlockNotifier *notifier = opaque;
 
     if (host) {
+        /*
+         * Dynamically adding notifiers that don't support resizes is forbidden
+         * when dealing with resizeable ram blocks that have actually resizeable
+         * allocations.
+         */
+        g_assert(!qemu_ram_is_resizeable_alloc(rb) ||
+                 notifier->ram_block_resized);
         notifier->ram_block_added(notifier, host, size, max_size);
     }
     return 0;
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 09decb8d93..aacbf33b85 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -66,6 +66,8 @@ void qemu_ram_set_uf_zeroable(RAMBlock *rb);
 bool qemu_ram_is_migratable(RAMBlock *rb);
 void qemu_ram_set_migratable(RAMBlock *rb);
 void qemu_ram_unset_migratable(RAMBlock *rb);
+bool qemu_ram_is_resizeable(RAMBlock *rb);
+bool qemu_ram_is_resizeable_alloc(RAMBlock *rb);
 
 size_t qemu_ram_pagesize(RAMBlock *block);
 size_t qemu_ram_pagesize_largest(void);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index b9b9470a56..74805dd448 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -129,6 +129,14 @@ typedef struct IOMMUNotifier IOMMUNotifier;
 /* RAM is a persistent kind memory */
 #define RAM_PMEM (1 << 5)
 
+/*
+ * Implies RAM_RESIZEABLE. Memory beyond the used_length is inaccessible
+ * (esp. initially and after resizing). For such memory blocks, only the
+ * used_length is reserved in the OS - resizing might fail. Will only be
+ * used with host OS support and if all ram block notifiers support resizing.
+ */
+#define RAM_RESIZEABLE_ALLOC (1 << 6)
+
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
                                        IOMMUNotifierFlag flags,
                                        hwaddr start, hwaddr end,
-- 
2.24.1




* Re: [PATCH v4 02/15] util: vfio-helpers: Remove Error parameter from qemu_vfio_undo_mapping()
  2020-03-05 14:29 ` [PATCH v4 02/15] util: vfio-helpers: Remove Error parameter from qemu_vfio_undo_mapping() David Hildenbrand
@ 2020-03-25 14:32   ` Murilo Opsfelder Araújo
  0 siblings, 0 replies; 27+ messages in thread
From: Murilo Opsfelder Araújo @ 2020-03-25 14:32 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	Stefan Hajnoczi, qemu-devel, Peter Xu, Dr . David Alan Gilbert,
	Alex Williamson, Igor Mammedov, Paolo Bonzini, Richard Henderson

On Thursday, March 5, 2020 11:29:32 AM -03 David Hildenbrand wrote:
> Everybody discards the error. Let's error_report() instead so this error
> doesn't get lost.
>
> This is now the same error handling as in qemu_vfio_do_mapping(). However,
> we don't report any errors via the return value to the caller. This seems
> to be one of those "will never happen, but let's print an error anyway
> because the caller can't handle it either way" cases.
>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---

Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>

>  util/vfio-helpers.c | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
> index f31aa77ffe..7085ca702c 100644
> --- a/util/vfio-helpers.c
> +++ b/util/vfio-helpers.c
> @@ -541,8 +541,7 @@ static int qemu_vfio_do_mapping(QEMUVFIOState *s, void *host, size_t size,
>  /**
>   * Undo the DMA mapping from @s with VFIO, and remove from mapping list.
>   */
> -static void qemu_vfio_undo_mapping(QEMUVFIOState *s, IOVAMapping *mapping,
> -                                   Error **errp)
> +static void qemu_vfio_undo_mapping(QEMUVFIOState *s, IOVAMapping *mapping)
>  {
>      int index;
>      struct vfio_iommu_type1_dma_unmap unmap = {
> @@ -557,7 +556,7 @@ static void qemu_vfio_undo_mapping(QEMUVFIOState *s, IOVAMapping *mapping,
>      assert(QEMU_IS_ALIGNED(mapping->size, qemu_real_host_page_size));
>      assert(index >= 0 && index < s->nr_mappings);
>      if (ioctl(s->container, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
> -        error_setg_errno(errp, errno, "VFIO_UNMAP_DMA failed");
> +        error_report("VFIO_UNMAP_DMA failed: %s", strerror(errno));
>      }
>      memmove(mapping, &s->mappings[index + 1],
>              sizeof(s->mappings[0]) * (s->nr_mappings - index - 1));
> @@ -622,7 +621,7 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
>              assert(qemu_vfio_verify_mappings(s));
>              ret = qemu_vfio_do_mapping(s, host, size, iova0);
>              if (ret) {
> -                qemu_vfio_undo_mapping(s, mapping, NULL);
> +                qemu_vfio_undo_mapping(s, mapping);
>                  goto out;
>              }
>              s->low_water_mark += size;
> @@ -682,7 +681,7 @@ void qemu_vfio_dma_unmap(QEMUVFIOState *s, void *host)
>      if (!m) {
>          goto out;
>      }
> -    qemu_vfio_undo_mapping(s, m, NULL);
> +    qemu_vfio_undo_mapping(s, m);
>  out:
>      qemu_mutex_unlock(&s->lock);
>  }
> @@ -699,7 +698,7 @@ void qemu_vfio_close(QEMUVFIOState *s)
>          return;
>      }
>      while (s->nr_mappings) {
> -        qemu_vfio_undo_mapping(s, &s->mappings[s->nr_mappings - 1], NULL);
> +        qemu_vfio_undo_mapping(s, &s->mappings[s->nr_mappings - 1]);
>      }
>      ram_block_notifier_remove(&s->ram_notifier);
>      qemu_vfio_reset(s);


--
Murilo



* Re: [PATCH v4 03/15] util: vfio-helpers: Factor out removal from qemu_vfio_undo_mapping()
  2020-03-05 14:29 ` [PATCH v4 03/15] util: vfio-helpers: Factor out removal " David Hildenbrand
@ 2020-03-25 14:45   ` Murilo Opsfelder Araújo
  0 siblings, 0 replies; 27+ messages in thread
From: Murilo Opsfelder Araújo @ 2020-03-25 14:45 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	Stefan Hajnoczi, qemu-devel, Peter Xu, Dr . David Alan Gilbert,
	Alex Williamson, Igor Mammedov, Paolo Bonzini, Richard Henderson

On Thursday, March 5, 2020 11:29:33 AM -03 David Hildenbrand wrote:
> Factor it out and properly use it where applicable. Make
> qemu_vfio_undo_mapping() look like qemu_vfio_do_mapping(), passing the
> size and iova, not the mapping.
>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---

Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>

>  util/vfio-helpers.c | 43 +++++++++++++++++++++++++++----------------
>  1 file changed, 27 insertions(+), 16 deletions(-)
>
> diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
> index 7085ca702c..f0c77f0d69 100644
> --- a/util/vfio-helpers.c
> +++ b/util/vfio-helpers.c
> @@ -518,6 +518,20 @@ static IOVAMapping *qemu_vfio_add_mapping(QEMUVFIOState *s,
>      return insert;
>  }
>
> +/**
> + * Remove the mapping from @s and free it.
> + */
> +static void qemu_vfio_remove_mapping(QEMUVFIOState *s, IOVAMapping *mapping)
> +{
> +    const int index = mapping - s->mappings;
> +
> +    assert(index >= 0 && index < s->nr_mappings);
> +    memmove(mapping, &s->mappings[index + 1],
> +            sizeof(s->mappings[0]) * (s->nr_mappings - index - 1));
> +    s->nr_mappings--;
> +    s->mappings = g_renew(IOVAMapping, s->mappings, s->nr_mappings);
> +}
> +
>  /* Do the DMA mapping with VFIO. */
>  static int qemu_vfio_do_mapping(QEMUVFIOState *s, void *host, size_t size,
>                                  uint64_t iova)
> @@ -539,29 +553,22 @@ static int qemu_vfio_do_mapping(QEMUVFIOState *s, void *host, size_t size,
>  }
>
>  /**
> - * Undo the DMA mapping from @s with VFIO, and remove from mapping list.
> + * Undo the DMA mapping from @s with VFIO.
>   */
> -static void qemu_vfio_undo_mapping(QEMUVFIOState *s, IOVAMapping *mapping)
> +static void qemu_vfio_undo_mapping(QEMUVFIOState *s, size_t size, uint64_t iova)
>  {
> -    int index;
>      struct vfio_iommu_type1_dma_unmap unmap = {
>          .argsz = sizeof(unmap),
>          .flags = 0,
> -        .iova = mapping->iova,
> -        .size = mapping->size,
> +        .iova = iova,
> +        .size = size,
>      };
>
> -    index = mapping - s->mappings;
> -    assert(mapping->size > 0);
> -    assert(QEMU_IS_ALIGNED(mapping->size, qemu_real_host_page_size));
> -    assert(index >= 0 && index < s->nr_mappings);
> +    assert(size > 0);
> +    assert(QEMU_IS_ALIGNED(size, qemu_real_host_page_size));
>      if (ioctl(s->container, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
>          error_report("VFIO_UNMAP_DMA failed: %s", strerror(errno));
>      }
> -    memmove(mapping, &s->mappings[index + 1],
> -            sizeof(s->mappings[0]) * (s->nr_mappings - index - 1));
> -    s->nr_mappings--;
> -    s->mappings = g_renew(IOVAMapping, s->mappings, s->nr_mappings);
>  }
>
>  /* Check if the mapping list is (ascending) ordered. */
> @@ -621,7 +628,7 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
>              assert(qemu_vfio_verify_mappings(s));
>              ret = qemu_vfio_do_mapping(s, host, size, iova0);
>              if (ret) {
> -                qemu_vfio_undo_mapping(s, mapping);
> +                qemu_vfio_remove_mapping(s, mapping);
>                  goto out;
>              }
>              s->low_water_mark += size;
> @@ -681,7 +688,8 @@ void qemu_vfio_dma_unmap(QEMUVFIOState *s, void *host)
>      if (!m) {
>          goto out;
>      }
> -    qemu_vfio_undo_mapping(s, m);
> +    qemu_vfio_undo_mapping(s, m->size, m->iova);
> +    qemu_vfio_remove_mapping(s, m);
>  out:
>      qemu_mutex_unlock(&s->lock);
>  }
> @@ -698,7 +706,10 @@ void qemu_vfio_close(QEMUVFIOState *s)
>          return;
>      }
>      while (s->nr_mappings) {
> -        qemu_vfio_undo_mapping(s, &s->mappings[s->nr_mappings - 1]);
> +        IOVAMapping *m = &s->mappings[s->nr_mappings - 1];
> +
> +        qemu_vfio_undo_mapping(s, m->size, m->iova);
> +        qemu_vfio_remove_mapping(s, m);
>      }
>      ram_block_notifier_remove(&s->ram_notifier);
>      qemu_vfio_reset(s);


--
Murilo



* Re: [PATCH v4 07/15] util/mmap-alloc: Factor out calculation of the pagesize for the guard page
  2020-03-05 14:29 ` [PATCH v4 07/15] util/mmap-alloc: Factor out calculation of the pagesize for the guard page David Hildenbrand
@ 2020-03-25 15:03   ` Murilo Opsfelder Araújo
  0 siblings, 0 replies; 27+ messages in thread
From: Murilo Opsfelder Araújo @ 2020-03-25 15:03 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	qemu-devel, Peter Xu, Dr . David Alan Gilbert, Greg Kurz,
	Igor Mammedov, Paolo Bonzini, Richard Henderson

On Thursday, March 5, 2020 11:29:37 AM -03 David Hildenbrand wrote:
> Let's factor out calculating the size of the guard page and rename the
> variable to make it clearer that this pagesize only applies to the
> guard page.
>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
> Cc: Greg Kurz <groug@kaod.org>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---

Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>

>  util/mmap-alloc.c | 31 ++++++++++++++++---------------
>  1 file changed, 16 insertions(+), 15 deletions(-)
>
> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> index 27dcccd8ec..f0277f9fad 100644
> --- a/util/mmap-alloc.c
> +++ b/util/mmap-alloc.c
> @@ -82,17 +82,27 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
>      return qemu_real_host_page_size;
>  }
>
> +static inline size_t mmap_guard_pagesize(int fd)
> +{
> +#if defined(__powerpc64__) && defined(__linux__)
> +    /* Mappings in the same segment must share the same page size */
> +    return qemu_fd_getpagesize(fd);
> +#else
> +    return qemu_real_host_page_size;
> +#endif
> +}
> +
>  void *qemu_ram_mmap(int fd,
>                      size_t size,
>                      size_t align,
>                      bool shared,
>                      bool is_pmem)
>  {
> +    const size_t guard_pagesize = mmap_guard_pagesize(fd);
>      int flags;
>      int map_sync_flags = 0;
>      int guardfd;
>      size_t offset;
> -    size_t pagesize;
>      size_t total;
>      void *guardptr;
>      void *ptr;
> @@ -113,8 +123,7 @@ void *qemu_ram_mmap(int fd,
>       * anonymous memory is OK.
>       */
>      flags = MAP_PRIVATE;
> -    pagesize = qemu_fd_getpagesize(fd);
> -    if (fd == -1 || pagesize == qemu_real_host_page_size) {
> +    if (fd == -1 || guard_pagesize == qemu_real_host_page_size) {
>          guardfd = -1;
>          flags |= MAP_ANONYMOUS;
>      } else {
> @@ -123,7 +132,6 @@ void *qemu_ram_mmap(int fd,
>      }
>  #else
>      guardfd = -1;
> -    pagesize = qemu_real_host_page_size;
>      flags = MAP_PRIVATE | MAP_ANONYMOUS;
>  #endif
>
> @@ -135,7 +143,7 @@ void *qemu_ram_mmap(int fd,
>
>      assert(is_power_of_2(align));
>      /* Always align to host page size */
> -    assert(align >= pagesize);
> +    assert(align >= guard_pagesize);
>
>      flags = MAP_FIXED;
>      flags |= fd == -1 ? MAP_ANONYMOUS : 0;
> @@ -189,8 +197,8 @@ void *qemu_ram_mmap(int fd,
>       * a guard page guarding against potential buffer overflows.
>       */
>      total -= offset;
> -    if (total > size + pagesize) {
> -        munmap(ptr + size + pagesize, total - size - pagesize);
> +    if (total > size + guard_pagesize) {
> +        munmap(ptr + size + guard_pagesize, total - size - guard_pagesize);
> }
>
>      return ptr;
> @@ -198,15 +206,8 @@ void *qemu_ram_mmap(int fd,
>
>  void qemu_ram_munmap(int fd, void *ptr, size_t size)
>  {
> -    size_t pagesize;
> -
>      if (ptr) {
>          /* Unmap both the RAM block and the guard page */
> -#if defined(__powerpc64__) && defined(__linux__)
> -        pagesize = qemu_fd_getpagesize(fd);
> -#else
> -        pagesize = qemu_real_host_page_size;
> -#endif
> -        munmap(ptr, size + pagesize);
> +        munmap(ptr, size + mmap_guard_pagesize(fd));
>      }
>  }


--
Murilo



* Re: [PATCH v4 10/15] util/mmap-alloc: Prepare for resizeable mmaps
  2020-03-05 14:29 ` [PATCH v4 10/15] util/mmap-alloc: Prepare for resizeable mmaps David Hildenbrand
@ 2020-03-25 15:09   ` Murilo Opsfelder Araújo
  0 siblings, 0 replies; 27+ messages in thread
From: Murilo Opsfelder Araújo @ 2020-03-25 15:09 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	Richard Henderson, qemu-devel, Peter Xu, Dr . David Alan Gilbert,
	Greg Kurz, Igor Mammedov, Paolo Bonzini, Richard Henderson

On Thursday, March 5, 2020 11:29:40 AM -03 David Hildenbrand wrote:
> When shrinking an mmap, we want to re-reserve the already activated area.
> When growing a memory region, we want to activate it starting at a given
> fd_offset. Prepare by allowing these parameters to be passed.
>
> Also, let's make sure we always process full pages, to avoid
> unmapping/remapping pages that are already in use when
> growing/shrinking. Add some asserts.
>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Cc: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
> Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Greg Kurz <groug@kaod.org>
> Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---

Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>

>  util/mmap-alloc.c | 34 +++++++++++++++++++++++-----------
>  1 file changed, 23 insertions(+), 11 deletions(-)
>
> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> index 8f40ef4fed..2767caa33b 100644
> --- a/util/mmap-alloc.c
> +++ b/util/mmap-alloc.c
> @@ -83,12 +83,12 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
>  }
>
>  /*
> - * Reserve a new memory region of the requested size to be used for mapping
> - * from the given fd (if any).
> + * Reserve a new memory region of the requested size or re-reserve parts
> + * of an activated region to be used for mapping from the given fd (if any).
>   */
> -static void *mmap_reserve(size_t size, int fd)
> +static void *mmap_reserve(void *ptr, size_t size, int fd)
>  {
> -    int flags = MAP_PRIVATE;
> +    int flags = MAP_PRIVATE | (ptr ? MAP_FIXED : 0);
>
>  #if defined(__powerpc64__) && defined(__linux__)
>      /*
> @@ -111,20 +111,24 @@ static void *mmap_reserve(size_t size, int fd)
>      flags |= MAP_ANONYMOUS;
>  #endif
>
> -    return mmap(0, size, PROT_NONE, flags, fd, 0);
> +    return mmap(ptr, size, PROT_NONE, flags, fd, 0);
>  }
>
>  /*
>   * Activate memory in a reserved region from the given fd (if any), to make
> * it accessible.
>   */
> -static void *mmap_activate(void *ptr, size_t size, int fd, bool shared,
> -                           bool is_pmem)
> +static void *mmap_activate(void *ptr, size_t size, int fd, size_t fd_offset,
> +                           bool shared, bool is_pmem)
>  {
>      int map_sync_flags = 0;
>      int flags = MAP_FIXED;
>      void *activated_ptr;
>
> +    if (fd == -1) {
> +        fd_offset = 0;
> +    }
> +
>      flags |= fd == -1 ? MAP_ANONYMOUS : 0;
>      flags |= shared ? MAP_SHARED : MAP_PRIVATE;
>      if (shared && is_pmem) {
> @@ -132,7 +136,7 @@ static void *mmap_activate(void *ptr, size_t size, int fd, bool shared,
>  }
>
>      activated_ptr = mmap(ptr, size, PROT_READ | PROT_WRITE,
> -                         flags | map_sync_flags, fd, 0);
> +                         flags | map_sync_flags, fd, fd_offset);
>      if (activated_ptr == MAP_FAILED && map_sync_flags) {
>          if (errno == ENOTSUP) {
>              char *proc_link = g_strdup_printf("/proc/self/fd/%d", fd);
> @@ -154,7 +158,8 @@ static void *mmap_activate(void *ptr, size_t size, int fd, bool shared,
>       * If mmap failed with MAP_SHARED_VALIDATE | MAP_SYNC, we will try
>       * again without these flags to handle backwards compatibility.
>       */
> -        activated_ptr = mmap(ptr, size, PROT_READ | PROT_WRITE, flags, fd, 0);
> +        activated_ptr = mmap(ptr, size, PROT_READ | PROT_WRITE, flags, fd,
> +                             fd_offset);
>      }
>      return activated_ptr;
>  }
> @@ -176,16 +181,19 @@ void *qemu_ram_mmap(int fd,
>                      bool is_pmem)
>  {
>      const size_t guard_pagesize = mmap_guard_pagesize(fd);
> +    const size_t pagesize = qemu_fd_getpagesize(fd);
>      size_t offset, total;
>      void *ptr, *guardptr;
>
> +    g_assert(QEMU_IS_ALIGNED(size, pagesize));
> +
>      /*
>       * Note: this always allocates at least one extra page of virtual address
>       * space, even if size is already aligned.
>       */
>      total = size + align;
>
> -    guardptr = mmap_reserve(total, fd);
> +    guardptr = mmap_reserve(NULL, total, fd);
>      if (guardptr == MAP_FAILED) {
>          return MAP_FAILED;
>      }
> @@ -196,7 +204,7 @@ void *qemu_ram_mmap(int fd,
>
>      offset = QEMU_ALIGN_UP((uintptr_t)guardptr, align) - (uintptr_t)guardptr;
>
> -    ptr = mmap_activate(guardptr + offset, size, fd, shared, is_pmem);
> +    ptr = mmap_activate(guardptr + offset, size, fd, 0, shared, is_pmem);
>      if (ptr == MAP_FAILED) {
>          munmap(guardptr, total);
>          return MAP_FAILED;
> @@ -220,6 +228,10 @@ void *qemu_ram_mmap(int fd,
>
>  void qemu_ram_munmap(int fd, void *ptr, size_t size)
>  {
> +    const size_t pagesize = qemu_fd_getpagesize(fd);
> +
> +    g_assert(QEMU_IS_ALIGNED(size, pagesize));
> +
>      if (ptr) {
>          /* Unmap both the RAM block and the guard page */
>          munmap(ptr, size + mmap_guard_pagesize(fd));


--
Murilo



* Re: [PATCH v4 11/15] util/mmap-alloc: Implement resizeable mmaps
  2020-03-05 14:29 ` [PATCH v4 11/15] util/mmap-alloc: Implement " David Hildenbrand
@ 2020-03-25 15:14   ` Murilo Opsfelder Araújo
  0 siblings, 0 replies; 27+ messages in thread
From: Murilo Opsfelder Araújo @ 2020-03-25 15:14 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	Richard Henderson, qemu-devel, Peter Xu, Dr . David Alan Gilbert,
	Greg Kurz, Igor Mammedov, Paolo Bonzini, Richard Henderson

On Thursday, March 5, 2020 11:29:41 AM -03 David Hildenbrand wrote:
> Implement resizeable mmaps. For now, the actual resizing is not wired up.
> Introduce qemu_ram_mmap_resizeable() and qemu_ram_mmap_resize(). Make
> qemu_ram_mmap() a wrapper of qemu_ram_mmap_resizeable().
>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Cc: Richard Henderson <richard.henderson@linaro.org>
> Cc: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
> Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Greg Kurz <groug@kaod.org>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---

Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>

>  include/qemu/mmap-alloc.h | 21 +++++++++++--------
>  util/mmap-alloc.c         | 43 ++++++++++++++++++++++++++++-----------
>  2 files changed, 44 insertions(+), 20 deletions(-)
>
> diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
> index e786266b92..ca8f7edf70 100644
> --- a/include/qemu/mmap-alloc.h
> +++ b/include/qemu/mmap-alloc.h
> @@ -7,11 +7,13 @@ size_t qemu_fd_getpagesize(int fd);
>  size_t qemu_mempath_getpagesize(const char *mem_path);
>
>  /**
> - * qemu_ram_mmap: mmap the specified file or device.
> + * qemu_ram_mmap_resizeable: reserve a memory region of @max_size to mmap the
> + *                           specified file or device and mmap @size of it.
>   *
>   * Parameters:
>   *  @fd: the file or the device to mmap
>   *  @size: the number of bytes to be mmaped
> + *  @max_size: the number of bytes to be reserved
>   *  @align: if not zero, specify the alignment of the starting mapping address;
>   *          otherwise, the alignment in use will be determined by QEMU.
>   *  @shared: map has RAM_SHARED flag.
> @@ -21,12 +23,15 @@ size_t qemu_mempath_getpagesize(const char *mem_path);
>   *  On success, return a pointer to the mapped area.
>   *  On failure, return MAP_FAILED.
>   */
> -void *qemu_ram_mmap(int fd,
> -                    size_t size,
> -                    size_t align,
> -                    bool shared,
> -                    bool is_pmem);
> -
> -void qemu_ram_munmap(int fd, void *ptr, size_t size);
> +void *qemu_ram_mmap_resizeable(int fd, size_t size, size_t max_size,
> +                              size_t align, bool shared, bool is_pmem);
> +bool qemu_ram_mmap_resize(void *ptr, int fd, size_t old_size, size_t new_size,
> +                          bool shared, bool is_pmem);
> +static inline void *qemu_ram_mmap(int fd, size_t size, size_t align,
> +                                  bool shared, bool is_pmem)
> +{
> +    return qemu_ram_mmap_resizeable(fd, size, size, align, shared, is_pmem);
> +}
> +void qemu_ram_munmap(int fd, void *ptr, size_t max_size);
>
>  #endif
> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> index 2767caa33b..7ed85f16d3 100644
> --- a/util/mmap-alloc.c
> +++ b/util/mmap-alloc.c
> @@ -174,11 +174,8 @@ static inline size_t mmap_guard_pagesize(int fd)
>  #endif
>  }
>
> -void *qemu_ram_mmap(int fd,
> -                    size_t size,
> -                    size_t align,
> -                    bool shared,
> -                    bool is_pmem)
> +void *qemu_ram_mmap_resizeable(int fd, size_t size, size_t max_size,
> +                               size_t align, bool shared, bool is_pmem)
>  {
>      const size_t guard_pagesize = mmap_guard_pagesize(fd);
>      const size_t pagesize = qemu_fd_getpagesize(fd);
> @@ -186,12 +183,14 @@ void *qemu_ram_mmap(int fd,
>      void *ptr, *guardptr;
>
>      g_assert(QEMU_IS_ALIGNED(size, pagesize));
> +    g_assert(QEMU_IS_ALIGNED(max_size, pagesize));
>
>      /*
>       * Note: this always allocates at least one extra page of virtual address
> -     * space, even if size is already aligned.
> +     * space, even if the size is already aligned. We will reserve an area of
> +     * at least max_size, but only activate the requested part of it.
>       */
> -    total = size + align;
> +    total = max_size + align;
>
>      guardptr = mmap_reserve(NULL, total, fd);
>      if (guardptr == MAP_FAILED) {
> @@ -219,21 +218,41 @@ void *qemu_ram_mmap(int fd,
>       * a guard page guarding against potential buffer overflows.
>       */
>      total -= offset;
> -    if (total > size + guard_pagesize) {
> -        munmap(ptr + size + guard_pagesize, total - size - guard_pagesize);
> +    if (total > max_size + guard_pagesize) {
> +        munmap(ptr + max_size + guard_pagesize,
> +               total - max_size - guard_pagesize);
>      }
>
>      return ptr;
>  }
>
> -void qemu_ram_munmap(int fd, void *ptr, size_t size)
> +bool qemu_ram_mmap_resize(void *ptr, int fd, size_t old_size, size_t new_size,
> +                          bool shared, bool is_pmem)
>  {
>      const size_t pagesize = qemu_fd_getpagesize(fd);
>
> -    g_assert(QEMU_IS_ALIGNED(size, pagesize));
> +    g_assert(QEMU_IS_ALIGNED(old_size, pagesize));
> +    g_assert(QEMU_IS_ALIGNED(new_size, pagesize));
> +
> +    if (old_size < new_size) {
> +        /* activate the missing piece in the reserved area */
> +        ptr = mmap_activate(ptr + old_size, new_size - old_size, fd, old_size,
> +                            shared, is_pmem);
> +    } else if (old_size > new_size) {
> +        /* discard this piece, marking it reserved */
> +        ptr = mmap_reserve(ptr + new_size, old_size - new_size, fd);
> +    }
> +    return ptr != MAP_FAILED;
> +}
> +
> +void qemu_ram_munmap(int fd, void *ptr, size_t max_size)
> +{
> +    const size_t pagesize = qemu_fd_getpagesize(fd);
> +
> +    g_assert(QEMU_IS_ALIGNED(max_size, pagesize));
>
>      if (ptr) {
>          /* Unmap both the RAM block and the guard page */
> -        munmap(ptr, size + mmap_guard_pagesize(fd));
> +        munmap(ptr, max_size + mmap_guard_pagesize(fd));
>      }
>  }


--
Murilo



* Re: [PATCH v4 12/15] util: vfio-helpers: Implement ram_block_resized()
  2020-03-05 14:29 ` [PATCH v4 12/15] util: vfio-helpers: Implement ram_block_resized() David Hildenbrand
@ 2020-03-25 15:17   ` Murilo Opsfelder Araújo
  0 siblings, 0 replies; 27+ messages in thread
From: Murilo Opsfelder Araújo @ 2020-03-25 15:17 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	Stefan Hajnoczi, qemu-devel, Peter Xu, Dr . David Alan Gilbert,
	Alex Williamson, Igor Mammedov, Paolo Bonzini, Richard Henderson

On Thursday, March 5, 2020 11:29:42 AM -03 David Hildenbrand wrote:
> Let's implement ram_block_resized(), allowing resizeable mappings.
>
> For resizeable mappings, we reserve $max_size of IOVA address space, but
> only map $size of it. When resizing, we unmap the old part and remap the
> new part. We would need, e.g., a new ioctl to do this atomically (e.g., to
> resize while the guest is running).
>
> Right now, we only resize RAM blocks during incoming migration (when
> syncing RAM block sizes during the precopy phase) or after guest resets
> when building ACPI tables. Any future user of resizeable RAM has to be
> aware that vfio has to be treated with care.
>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---

Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>

>  util/trace-events   |  7 ++--
>  util/vfio-helpers.c | 95 +++++++++++++++++++++++++++++++++++++++------
>  2 files changed, 88 insertions(+), 14 deletions(-)
>
> diff --git a/util/trace-events b/util/trace-events
> index 83b6639018..a4d39eca5e 100644
> --- a/util/trace-events
> +++ b/util/trace-events
> @@ -74,10 +74,11 @@ qemu_mutex_unlock(void *mutex, const char *file, const int line) "released mutex
>
>  # vfio-helpers.c
>  qemu_vfio_dma_reset_temporary(void *s) "s %p"
> -qemu_vfio_ram_block_added(void *s, void *p, size_t size) "s %p host %p size 0x%zx"
> -qemu_vfio_ram_block_removed(void *s, void *p, size_t size) "s %p host %p size 0x%zx"
> +qemu_vfio_ram_block_added(void *s, void *p, size_t size, size_t max_size) "s %p host %p size 0x%zx max_size 0x%zx"
> +qemu_vfio_ram_block_removed(void *s, void *p, size_t size, size_t max_size) "s %p host %p size 0x%zx max_size 0x%zx"
> +qemu_vfio_ram_block_resized(void *s, void *p, size_t old_size, size_t new_size) "s %p host %p old_size 0x%zx new_size 0x%zx"
>  qemu_vfio_find_mapping(void *s, void *p) "s %p host %p"
> -qemu_vfio_new_mapping(void *s, void *host, size_t size, int index, uint64_t iova) "s %p host %p size %zu index %d iova 0x%"PRIx64
> +qemu_vfio_new_mapping(void *s, void *host, size_t size, size_t max_size, int index, uint64_t iova) "s %p host %p size %zu max_size %zu index %d iova 0x%"PRIx64
>  qemu_vfio_do_mapping(void *s, void *host, size_t size, uint64_t iova) "s %p host %p size %zu iova 0x%"PRIx64
>  qemu_vfio_dma_map(void *s, void *host, size_t size, bool temporary, uint64_t *iova) "s %p host %p size %zu temporary %d iova %p"
>  qemu_vfio_dma_unmap(void *s, void *host) "s %p host %p"
> diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
> index f0c77f0d69..789faf38bd 100644
> --- a/util/vfio-helpers.c
> +++ b/util/vfio-helpers.c
> @@ -36,6 +36,7 @@ typedef struct {
>      /* Page aligned addr. */
>      void *host;
>      size_t size;
> +    size_t max_size;
>      uint64_t iova;
>  } IOVAMapping;
>
> @@ -372,14 +373,20 @@ fail_container:
>      return ret;
>  }
>
> +static int qemu_vfio_dma_map_resizeable(QEMUVFIOState *s, void *host,
> +                                        size_t size, size_t max_size,
> +                                        bool temporary, uint64_t *iova);
> +static void qemu_vfio_dma_map_resize(QEMUVFIOState *s, void *host,
> +                                     size_t old_size, size_t new_size);
> +
>  static void qemu_vfio_ram_block_added(RAMBlockNotifier *n, void *host,
>                                        size_t size, size_t max_size)
>  {
>      QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
>      int ret;
>
> -    trace_qemu_vfio_ram_block_added(s, host, max_size);
> -    ret = qemu_vfio_dma_map(s, host, max_size, false, NULL);
> +    trace_qemu_vfio_ram_block_added(s, host, size, max_size);
> +    ret = qemu_vfio_dma_map_resizeable(s, host, size, max_size, false, NULL);
>      if (ret) {
>          error_report("qemu_vfio_dma_map(%p, %zu) failed: %s", host,
>                       max_size, strerror(-ret));
> @@ -391,16 +398,28 @@ static void qemu_vfio_ram_block_removed(RAMBlockNotifier *n, void *host,
>  {
>      QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
>      if (host) {
> -        trace_qemu_vfio_ram_block_removed(s, host, max_size);
> +        trace_qemu_vfio_ram_block_removed(s, host, size, max_size);
>          qemu_vfio_dma_unmap(s, host);
>      }
>  }
>
> +static void qemu_vfio_ram_block_resized(RAMBlockNotifier *n, void *host,
> +                                        size_t old_size, size_t new_size)
> +{
> +    QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
> +
> +    if (host) {
> +        trace_qemu_vfio_ram_block_resized(s, host, old_size, new_size);
> +        qemu_vfio_dma_map_resize(s, host, old_size, new_size);
> +    }
> +}
> +
>  static void qemu_vfio_open_common(QEMUVFIOState *s)
>  {
>      qemu_mutex_init(&s->lock);
>      s->ram_notifier.ram_block_added = qemu_vfio_ram_block_added;
>      s->ram_notifier.ram_block_removed = qemu_vfio_ram_block_removed;
> +    s->ram_notifier.ram_block_resized = qemu_vfio_ram_block_resized;
>      s->low_water_mark = QEMU_VFIO_IOVA_MIN;
>      s->high_water_mark = QEMU_VFIO_IOVA_MAX;
>      ram_block_notifier_add(&s->ram_notifier);
> @@ -495,16 +514,23 @@ static IOVAMapping *qemu_vfio_find_mapping(QEMUVFIOState *s, void *host,
>   */
>  static IOVAMapping *qemu_vfio_add_mapping(QEMUVFIOState *s,
>                                            void *host, size_t size,
> -                                          int index, uint64_t iova)
> +                                          size_t max_size, int index,
> +                                          uint64_t iova)
>  {
> +    const IOVAMapping m = {
> +        .host = host,
> +        .size = size,
> +        .max_size = max_size,
> +        .iova = iova,
> +    };
>      int shift;
> -    IOVAMapping m = {.host = host, .size = size, .iova = iova};
>      IOVAMapping *insert;
>
>      assert(QEMU_IS_ALIGNED(size, qemu_real_host_page_size));
> +    assert(size <= max_size);
>      assert(QEMU_IS_ALIGNED(s->low_water_mark, qemu_real_host_page_size));
>      assert(QEMU_IS_ALIGNED(s->high_water_mark, qemu_real_host_page_size));
> -    trace_qemu_vfio_new_mapping(s, host, size, index, iova);
> +    trace_qemu_vfio_new_mapping(s, host, size, max_size, index, iova);
>
>      assert(index >= 0);
>      s->nr_mappings++;
> @@ -597,9 +623,14 @@ static bool qemu_vfio_verify_mappings(QEMUVFIOState *s)
>   * the result in @iova if not NULL. The caller need to make sure the area is
>   * aligned to page size, and mustn't overlap with existing mapping areas (split
>   * mapping status within this area is not allowed).
> + *
> + * If size < max_size, a region of max_size in IOVA address space is reserved,
> + * such that the mapping can later be resized. Resizeable mappings are only
> + * allowed for !temporary mappings.
>   */
> -int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
> -                      bool temporary, uint64_t *iova)
> +static int qemu_vfio_dma_map_resizeable(QEMUVFIOState *s, void *host,
> +                                        size_t size, size_t max_size,
> +                                        bool temporary, uint64_t *iova)
>  {
>      int ret = 0;
>      int index;
> @@ -608,19 +639,24 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
>
>      assert(QEMU_PTR_IS_ALIGNED(host, qemu_real_host_page_size));
>      assert(QEMU_IS_ALIGNED(size, qemu_real_host_page_size));
> +    assert(QEMU_IS_ALIGNED(max_size, qemu_real_host_page_size));
> +    assert(size == max_size || !temporary);
> +    assert(size <= max_size);
> +
>      trace_qemu_vfio_dma_map(s, host, size, temporary, iova);
>      qemu_mutex_lock(&s->lock);
>      mapping = qemu_vfio_find_mapping(s, host, &index);
>      if (mapping) {
>          iova0 = mapping->iova + ((uint8_t *)host - (uint8_t *)mapping->host);
>      } else {
> -        if (s->high_water_mark - s->low_water_mark + 1 < size) {
> +        if (s->high_water_mark - s->low_water_mark + 1 < max_size) {
>              ret = -ENOMEM;
>              goto out;
>          }
>          if (!temporary) {
>              iova0 = s->low_water_mark;
> -            mapping = qemu_vfio_add_mapping(s, host, size, index + 1, iova0);
> +            mapping = qemu_vfio_add_mapping(s, host, size, max_size, index + 1,
> +                                            iova0);
>              if (!mapping) {
>                  ret = -ENOMEM;
>                  goto out;
> @@ -631,7 +667,7 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
>                  qemu_vfio_remove_mapping(s, mapping);
>                  goto out;
>              }
> -            s->low_water_mark += size;
> +            s->low_water_mark += max_size;
>              qemu_vfio_dump_mappings(s);
>          } else {
>              iova0 = s->high_water_mark - size;
> @@ -650,6 +686,12 @@ out:
>      return ret;
>  }
>
> +int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
> +                      bool temporary, uint64_t *iova)
> +{
> +    return qemu_vfio_dma_map_resizeable(s, host, size, size, temporary, iova);
> +}
> +
>  /* Reset the high watermark and free all "temporary" mappings. */
>  int qemu_vfio_dma_reset_temporary(QEMUVFIOState *s)
>  {
> @@ -694,6 +736,37 @@ out:
>      qemu_mutex_unlock(&s->lock);
>  }
>
> +static void qemu_vfio_dma_map_resize(QEMUVFIOState *s, void *host,
> +                                     size_t old_size, size_t new_size)
> +{
> +    IOVAMapping *m;
> +    int index = 0;
> +
> +    qemu_mutex_lock(&s->lock);
> +    m = qemu_vfio_find_mapping(s, host, &index);
> +    if (!m) {
> +        qemu_mutex_unlock(&s->lock);
> +        return;
> +    }
> +    assert(m->size == old_size);
> +    assert(new_size <= m->max_size);
> +
> +    /*
> +     * For now, we must unmap the whole mapped range first and remap with
> +     * the new size. The reason is that VFIO_IOMMU_UNMAP_DMA might fail
> +     * when partially unmapping previous mappings. Although we could add
> +     * new mappings to extend the old range, we won't always be able to
> +     * shrink. The side effect is that it's never safe to resize during VM
> +     * execution and we'll need, e.g., a new ioctl to make this work.
> +     */
> +    qemu_vfio_undo_mapping(s, m->iova, m->size);
> +    qemu_vfio_do_mapping(s, host, m->iova, new_size);
> +
> +    m->size = new_size;
> +    assert(qemu_vfio_verify_mappings(s));
> +
> +    qemu_mutex_unlock(&s->lock);
> +}
> +
>  static void qemu_vfio_reset(QEMUVFIOState *s)
>  {
>      ioctl(s->device, VFIO_DEVICE_RESET);
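
For readers without the VFIO background: the resize path above boils down
to an unmap of the whole old range followed by a fresh map at the same
IOVA, because the type1 IOMMU backend cannot split existing mappings. A
rough standalone sketch of that sequence against the ioctls from
<linux/vfio.h> - the container fd, the function name and the (minimal)
error handling are illustrative, this is not the patch's code:

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    static int dma_mapping_resize(int container, void *host, uint64_t iova,
                                  uint64_t old_size, uint64_t new_size)
    {
        struct vfio_iommu_type1_dma_unmap unmap = {
            .argsz = sizeof(unmap),
            .iova = iova,
            .size = old_size,
        };
        struct vfio_iommu_type1_dma_map map = {
            .argsz = sizeof(map),
            .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
            .vaddr = (uintptr_t)host,
            .iova = iova,
            .size = new_size,
        };

        /* Type1 cannot partially unmap, so drop the whole old range... */
        if (ioctl(container, VFIO_IOMMU_UNMAP_DMA, &unmap) < 0) {
            return -1;
        }
        /*
         * ...then map the new size at the same IOVA. Between the two calls,
         * device DMA to the range would fault - which is exactly why the
         * comment above says resizing is never safe during VM execution
         * without a new (atomic) ioctl.
         */
        return ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
    }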


--
Murilo



* Re: [PATCH v4 13/15] util: oslib: Resizeable anonymous allocations under POSIX
  2020-03-05 14:29 ` [PATCH v4 13/15] util: oslib: Resizeable anonymous allocations under POSIX David Hildenbrand
@ 2020-03-25 15:20   ` Murilo Opsfelder Araújo
  0 siblings, 0 replies; 27+ messages in thread
From: Murilo Opsfelder Araújo @ 2020-03-25 15:20 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Michael S . Tsirkin, Stefan Weil,
	Igor Kotrasinski, qemu-devel, Peter Xu, Dr . David Alan Gilbert,
	Igor Mammedov, Paolo Bonzini, Richard Henderson

On Thursday, March 5, 2020 11:29:43 AM -03 David Hildenbrand wrote:
> Introduce qemu_anon_ram_alloc_resizeable() and qemu_anon_ram_resize().
> Implement them under POSIX and make them fail (NULL/false) under WIN32.
>
> Under POSIX, we make use of resizeable mmaps. An implementation under
> WIN32 is theoretically possible AFAIK and can be added later.
>
> In qemu_anon_ram_free(), rename the size parameter to max_size, to make
> it clearer that we have to use the maximum size when freeing resizeable
> anonymous allocations.
>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: Stefan Weil <sw@weilnetz.de>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---

Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>

>  include/qemu/osdep.h |  6 +++++-
>  util/oslib-posix.c   | 37 ++++++++++++++++++++++++++++++++++---
>  util/oslib-win32.c   | 14 ++++++++++++++
>  util/trace-events    |  4 +++-
>  4 files changed, 56 insertions(+), 5 deletions(-)
>
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index 9bd3dcfd13..a1ea9e043d 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -311,8 +311,12 @@ int qemu_daemon(int nochdir, int noclose);
>  void *qemu_try_memalign(size_t alignment, size_t size);
>  void *qemu_memalign(size_t alignment, size_t size);
>  void *qemu_anon_ram_alloc(size_t size, uint64_t *align, bool shared);
> +void *qemu_anon_ram_alloc_resizeable(size_t size, size_t max_size,
> +                                     uint64_t *align, bool shared);
> +bool qemu_anon_ram_resize(void *ptr, size_t old_size, size_t new_size,
> +                          bool shared);
>  void qemu_vfree(void *ptr);
> -void qemu_anon_ram_free(void *ptr, size_t size);
> +void qemu_anon_ram_free(void *ptr, size_t max_size);
>
>  #define QEMU_MADV_INVALID -1
>
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index 897e8f3ba6..34b1ce74b7 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -223,16 +223,47 @@ void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
>      return ptr;
>  }
>
> +void *qemu_anon_ram_alloc_resizeable(size_t size, size_t max_size,
> +                                     uint64_t *alignment, bool shared)
> +{
> +    size_t align = QEMU_VMALLOC_ALIGN;
> +    void *ptr = qemu_ram_mmap_resizeable(-1, size, max_size, align, shared,
> +                                         false);
> +
> +    if (ptr == MAP_FAILED) {
> +        return NULL;
> +    }
> +
> +    if (alignment) {
> +        *alignment = align;
> +    }
> +
> +    trace_qemu_anon_ram_alloc_resizeable(size, max_size, ptr);
> +    return ptr;
> +}
> +
> +bool qemu_anon_ram_resize(void *ptr, size_t old_size, size_t new_size,
> +                          bool shared)
> +{
> +    bool resized = qemu_ram_mmap_resize(ptr, -1, old_size, new_size, shared,
> +                                        false);
> +
> +    if (resized) {
> +        trace_qemu_anon_ram_resize(old_size, new_size, ptr);
> +    }
> +    return resized;
> +}
> +
>  void qemu_vfree(void *ptr)
>  {
>      trace_qemu_vfree(ptr);
>      free(ptr);
>  }
>
> -void qemu_anon_ram_free(void *ptr, size_t size)
> +void qemu_anon_ram_free(void *ptr, size_t max_size)
>  {
> -    trace_qemu_anon_ram_free(ptr, size);
> -    qemu_ram_munmap(-1, ptr, size);
> +    trace_qemu_anon_ram_free(ptr, max_size);
> +    qemu_ram_munmap(-1, ptr, max_size);
>  }
>
>  void qemu_set_block(int fd)
> diff --git a/util/oslib-win32.c b/util/oslib-win32.c
> index e9b14ab178..c034fdfe01 100644
> --- a/util/oslib-win32.c
> +++ b/util/oslib-win32.c
> @@ -90,6 +90,20 @@ void *qemu_anon_ram_alloc(size_t size, uint64_t *align, bool shared)
>      return ptr;
>  }
>
> +void *qemu_anon_ram_alloc_resizeable(size_t size, size_t max_size,
> +                                     uint64_t *align, bool shared)
> +{
> +    /* resizeable ram not implemented yet */
> +    return NULL;
> +}
> +
> +bool qemu_anon_ram_resize(void *ptr, size_t old_size, size_t new_size,
> +                          bool shared)
> +{
> +    /* resizeable ram not implemented yet */
> +    return false;
> +}
> +
>  void qemu_vfree(void *ptr)
>  {
>      trace_qemu_vfree(ptr);
> diff --git a/util/trace-events b/util/trace-events
> index a4d39eca5e..24a6f1a1e1 100644
> --- a/util/trace-events
> +++ b/util/trace-events
> @@ -46,8 +46,10 @@ qemu_co_mutex_unlock_return(void *mutex, void *self) "mutex %p self %p"
>  # oslib-posix.c
>  qemu_memalign(size_t alignment, size_t size, void *ptr) "alignment %zu size %zu ptr %p"
>  qemu_anon_ram_alloc(size_t size, void *ptr) "size %zu ptr %p"
> +qemu_anon_ram_alloc_resizeable(size_t size, size_t max_size, void *ptr) "size %zu max_size %zu ptr %p"
> +qemu_anon_ram_resize(size_t old_size, size_t new_size, void *ptr) "old_size %zu new_size %zu ptr %p"
>  qemu_vfree(void *ptr) "ptr %p"
> -qemu_anon_ram_free(void *ptr, size_t size) "ptr %p size %zu"
> +qemu_anon_ram_free(void *ptr, size_t max_size) "ptr %p max_size %zu"
>
>  # hbitmap.c
>  hbitmap_iter_skip_words(const void *hb, void *hbi, uint64_t pos, unsigned long cur) "hb %p hbi %p pos %"PRId64" cur 0x%lx"
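
To make the mechanism behind qemu_anon_ram_alloc_resizeable() and
qemu_anon_ram_resize() concrete, here is a minimal POSIX-only sketch of a
resizeable anonymous allocation: reserve the maximum as inaccessible
address space, populate only the used prefix, and flip pages between the
two states on resize. The helper names are made up for illustration (the
series' real implementation, qemu_ram_mmap_resizeable() and
qemu_ram_mmap_resize(), additionally deals with guard pages, alignment and
shared mappings), and all sizes are assumed page-aligned:

    #include <stdbool.h>
    #include <stddef.h>
    #include <sys/mman.h>

    /* Reserve max_size of address space without committing any memory. */
    static void *anon_reserve(size_t max_size)
    {
        void *ptr = mmap(NULL, max_size, PROT_NONE,
                         MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        return ptr == MAP_FAILED ? NULL : ptr;
    }

    /* Grow: make the first new_size bytes accessible. mmap() with MAP_FIXED
     * atomically replaces the PROT_NONE pages; only this part counts
     * against the overcommit limit. */
    static bool anon_grow(void *ptr, size_t new_size)
    {
        return mmap(ptr, new_size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
                    -1, 0) != MAP_FAILED;
    }

    /* Shrink: turn the tail back into an inaccessible reservation,
     * discarding its memory. */
    static bool anon_shrink(void *ptr, size_t old_size, size_t new_size)
    {
        return mmap((char *)ptr + new_size, old_size - new_size, PROT_NONE,
                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE | MAP_FIXED,
                    -1, 0) != MAP_FAILED;
    }

A WIN32 counterpart along the same lines would presumably combine
VirtualAlloc(MEM_RESERVE) with VirtualAlloc(MEM_COMMIT) and
VirtualFree(MEM_DECOMMIT), which is why the commit message calls it
theoretically possible.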


--
Murilo



* Re: [PATCH v4 14/15] numa: Introduce ram_block_notifiers_support_resize()
  2020-03-05 14:29 ` [PATCH v4 14/15] numa: Introduce ram_block_notifiers_support_resize() David Hildenbrand
@ 2020-03-25 15:24   ` Murilo Opsfelder Araújo
  0 siblings, 0 replies; 27+ messages in thread
From: Murilo Opsfelder Araújo @ 2020-03-25 15:24 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	qemu-devel, Peter Xu, Dr . David Alan Gilbert, Igor Mammedov,
	Paolo Bonzini, Richard Henderson

On Thursday, March 5, 2020 11:29:44 AM -03 David Hildenbrand wrote:
> We want to actually use resizeable allocations in resizeable ram blocks
> (IOW, make everything between used_length and max_length inaccessible) -
> however, not all ram block notifiers can support that.
>
> Introduce a way to detect if any registered notifier does not
> support resizes - ram_block_notifiers_support_resize() - which we can later
> use to fall back to legacy handling if a registered notifier (esp., SEV and
> HAX) does not support actual resizes.
>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---

Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>

>  hw/core/numa.c         | 12 ++++++++++++
>  include/exec/ramlist.h |  1 +
>  2 files changed, 13 insertions(+)
>
> diff --git a/hw/core/numa.c b/hw/core/numa.c
> index 37ce175e13..1d5288c22c 100644
> --- a/hw/core/numa.c
> +++ b/hw/core/numa.c
> @@ -914,3 +914,15 @@ void ram_block_notify_resize(void *host, size_t old_size, size_t new_size)
>          }
>      }
>  }
> +
> +bool ram_block_notifiers_support_resize(void)
> +{
> +    RAMBlockNotifier *notifier;
> +
> +    QLIST_FOREACH(notifier, &ram_list.ramblock_notifiers, next) {
> +        if (!notifier->ram_block_resized) {
> +            return false;
> +        }
> +    }
> +    return true;
> +}
> diff --git a/include/exec/ramlist.h b/include/exec/ramlist.h
> index 293c0ddabe..ac5811be96 100644
> --- a/include/exec/ramlist.h
> +++ b/include/exec/ramlist.h
> @@ -79,6 +79,7 @@ void ram_block_notifier_remove(RAMBlockNotifier *n);
>  void ram_block_notify_add(void *host, size_t size, size_t max_size);
>  void ram_block_notify_remove(void *host, size_t size, size_t max_size);
>  void ram_block_notify_resize(void *host, size_t old_size, size_t new_size);
> +bool ram_block_notifiers_support_resize(void);
>
>  void ram_block_dump(Monitor *mon);


--
Murilo



* Re: [PATCH v4 15/15] exec: Ram blocks with resizeable anonymous allocations under POSIX
  2020-03-05 14:29 ` [PATCH v4 15/15] exec: Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
@ 2020-03-25 15:34   ` Murilo Opsfelder Araújo
  2020-03-27 11:24     ` David Hildenbrand
  0 siblings, 1 reply; 27+ messages in thread
From: Murilo Opsfelder Araújo @ 2020-03-25 15:34 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Michael S . Tsirkin, Stefan Weil,
	Igor Kotrasinski, qemu-devel, Peter Xu, Dr . David Alan Gilbert,
	Shameerali Kolothum Thodi, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

On Thursday, March 5, 2020 11:29:45 AM -03 David Hildenbrand wrote:
> We can now make use of resizeable anonymous allocations to implement
> actually resizeable ram blocks. Resizeable anonymous allocations are
> not implemented under WIN32 yet and are not available when using
> alternative allocators. Fall back to the existing handling.
>
> We also have to fall back to the existing handling in case any ram block
> notifier does not support resizing (esp., AMD SEV, HAX) yet. Remember
> in RAM_RESIZEABLE_ALLOC if we are using resizeable anonymous allocations.
>
> Try to grow early, as that can easily fail if out of memory. Shrink late
> and ignore errors (nothing will actually break). Warn only.
>
> The benefit of actually resizeable ram blocks is that e.g., under Linux,
> only the actual size will be reserved (even if
> "/proc/sys/vm/overcommit_memory" is set to "never"). Additional memory will
> be reserved when trying to resize, which allows for ram blocks that
> start small but can theoretically grow very large.
>
> Note1: We are not able to create resizeable ram blocks with pre-allocated
>        memory yet, so prealloc is not affected.
> Note2: mlock should work as it used to as os_mlock() does a
>        mlockall(MCL_CURRENT | MCL_FUTURE), which includes future
>        mappings.
> Note3: Nobody should access memory beyond used_length. Memory notifiers
>        already properly take care of this, only ram block notifiers
>        violate this constraint and, therefore, have to be special-cased.
>        Especially, any ram block notifier that might dynamically
>        register at runtime (e.g., vfio) has to support resizes. Add an
>        assert for that. Both, HAX and SEV register early, so they are
>        fine.
>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: Stefan Weil <sw@weilnetz.de>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  exec.c                    | 65 ++++++++++++++++++++++++++++++++++++---
>  hw/core/numa.c            |  7 +++++
>  include/exec/cpu-common.h |  2 ++
>  include/exec/memory.h     |  8 +++++
>  4 files changed, 77 insertions(+), 5 deletions(-)
>
> diff --git a/exec.c b/exec.c
> index 9c3cc79193..6c6b6e12d2 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2001,6 +2001,16 @@ void qemu_ram_unset_migratable(RAMBlock *rb)
>      rb->flags &= ~RAM_MIGRATABLE;
>  }
>
> +bool qemu_ram_is_resizeable(RAMBlock *rb)
> +{
> +    return rb->flags & RAM_RESIZEABLE;
> +}
> +
> +bool qemu_ram_is_resizeable_alloc(RAMBlock *rb)
> +{
> +    return rb->flags & RAM_RESIZEABLE_ALLOC;
> +}
> +
>  /* Called with iothread lock held.  */
>  void qemu_ram_set_idstr(RAMBlock *new_block, const char *name, DeviceState *dev)
>  {
> @@ -2094,6 +2104,7 @@ static void qemu_ram_apply_settings(void *host, size_t length)
>   */
>  int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
>  {
> +    const bool shared = block->flags & RAM_SHARED;

Do you think a new function, for example, qemu_ram_is_shared(), to check
for the RAM_SHARED flag would be welcome as well, similar to what is done
in qemu_ram_is_resizeable() and qemu_ram_is_resizeable_alloc()?
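
Something along these lines, mirroring the one-line accessors quoted above
(just a sketch of the suggestion - an equivalent helper may already exist):

    bool qemu_ram_is_shared(RAMBlock *rb)
    {
        return rb->flags & RAM_SHARED;
    }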

Apart from that,

Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>

>      const ram_addr_t oldsize = block->used_length;
>
>      assert(block);
> @@ -2104,7 +2115,7 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
>          return 0;
>      }
>
> -    if (!(block->flags & RAM_RESIZEABLE)) {
> +    if (!qemu_ram_is_resizeable(block)) {
>          error_setg_errno(errp, EINVAL,
>                           "Length mismatch: %s: 0x" RAM_ADDR_FMT
>                           " in != 0x" RAM_ADDR_FMT, block->idstr,
> @@ -2120,6 +2131,15 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
>          return -EINVAL;
>      }
>
> +    if (oldsize < newsize && qemu_ram_is_resizeable_alloc(block)) {
> +        if (!qemu_anon_ram_resize(block->host, oldsize, newsize, shared)) {
> +            error_setg_errno(errp, -ENOMEM, "Cannot allocate enough memory.");
> +            return -ENOMEM;
> +        }
> +        /* apply settings for the newly accessible memory */
> +        qemu_ram_apply_settings(block->host + oldsize, newsize - oldsize);
> +    }
> +
>      /* Notify before modifying the ram block and touching the bitmaps. */
>      if (block->host) {
>          ram_block_notify_resize(block->host, oldsize, newsize);
> @@ -2133,6 +2153,16 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
>      if (block->resized) {
>          block->resized(block->idstr, newsize, block->host);
>      }
> +
> +    /*
> +     * Shrinking will only fail in rare scenarios (e.g., maximum number of
> +     * mappings reached), and can be ignored. Warn only.
> +     */
> +    if (newsize < oldsize && qemu_ram_is_resizeable_alloc(block) &&
> +        !qemu_anon_ram_resize(block->host, oldsize, newsize, shared)) {
> +        warn_report("Shrinking memory allocation failed.");
> +    }
> +
>      return 0;
>  }
>
> @@ -2211,6 +2241,29 @@ static void dirty_memory_extend(ram_addr_t old_ram_size,
>  }
>  }
>
> +static void ram_block_alloc_ram(RAMBlock *rb)
> +{
> +    const bool shared = qemu_ram_is_shared(rb);
> +
> +    assert(!(rb->flags & RAM_RESIZEABLE_ALLOC));
> +    /*
> +     * If we can, try to allocate actually resizeable ram. Will also fail
> +     * if qemu_anon_ram_alloc_resizeable() is not implemented.
> +     */
> +    if (phys_mem_alloc == qemu_anon_ram_alloc &&
> +        qemu_ram_is_resizeable(rb) &&
> +        ram_block_notifiers_support_resize()) {
> +        rb->host = qemu_anon_ram_alloc_resizeable(rb->used_length,
> +                                                  rb->max_length,
> +                                                  &rb->mr->align, shared);
> +        if (rb->host) {
> +            rb->flags |= RAM_RESIZEABLE_ALLOC;
> +            return;
> +        }
> +    }
> +    rb->host = phys_mem_alloc(rb->max_length, &rb->mr->align, shared);
> +}
> +
>  static void ram_block_add(RAMBlock *new_block, Error **errp)
>  {
>      RAMBlock *block;
> @@ -2233,9 +2286,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
>                  return;
>              }
>          } else {
> -            new_block->host = phys_mem_alloc(new_block->max_length,
> -                                             &new_block->mr->align,
> -                                             qemu_ram_is_shared(new_block));
> +            ram_block_alloc_ram(new_block);
>              if (!new_block->host) {
>                  error_setg_errno(errp, errno,
>                                   "cannot set up guest memory '%s'",
> @@ -2280,7 +2331,11 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
>                                              DIRTY_CLIENTS_ALL);
>
>      if (new_block->host) {
> -        qemu_ram_apply_settings(new_block->host, new_block->max_length);
> +        if (qemu_ram_is_resizeable_alloc(new_block)) {
> +            qemu_ram_apply_settings(new_block->host, new_block->used_length);
> +        } else {
> +            qemu_ram_apply_settings(new_block->host, new_block->max_length);
> +        }
>          ram_block_notify_add(new_block->host, new_block->used_length,
>                               new_block->max_length);
>      }
> diff --git a/hw/core/numa.c b/hw/core/numa.c
> index 1d5288c22c..c547549e49 100644
> --- a/hw/core/numa.c
> +++ b/hw/core/numa.c
> @@ -862,6 +862,13 @@ static int ram_block_notify_add_single(RAMBlock *rb, void *opaque)
>      RAMBlockNotifier *notifier = opaque;
>
>      if (host) {
> +        /*
> +         * Dynamically adding notifiers that don't support resizes is
> +         * forbidden when dealing with resizeable ram blocks that have
> +         * actually resizeable allocations.
> +         */
> +        g_assert(!qemu_ram_is_resizeable_alloc(rb) ||
> +                 notifier->ram_block_resized);
>          notifier->ram_block_added(notifier, host, size, max_size);
>      }
>      return 0;
> diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
> index 09decb8d93..aacbf33b85 100644
> --- a/include/exec/cpu-common.h
> +++ b/include/exec/cpu-common.h
> @@ -66,6 +66,8 @@ void qemu_ram_set_uf_zeroable(RAMBlock *rb);
>  bool qemu_ram_is_migratable(RAMBlock *rb);
>  void qemu_ram_set_migratable(RAMBlock *rb);
>  void qemu_ram_unset_migratable(RAMBlock *rb);
> +bool qemu_ram_is_resizeable(RAMBlock *rb);
> +bool qemu_ram_is_resizeable_alloc(RAMBlock *rb);
>
>  size_t qemu_ram_pagesize(RAMBlock *block);
>  size_t qemu_ram_pagesize_largest(void);
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index b9b9470a56..74805dd448 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -129,6 +129,14 @@ typedef struct IOMMUNotifier IOMMUNotifier;
>  /* RAM is a persistent kind memory */
>  #define RAM_PMEM (1 << 5)
>
> +/*
> + * Implies RAM_RESIZEABLE. Memory beyond the used_length is inaccessible
> + * (esp. initially and after resizing). For such memory blocks, only the
> + * used_length is reserved in the OS - resizing might fail. Will only be
> + * used with host OS support and if all ram block notifiers support resizing.
> + */
> +#define RAM_RESIZEABLE_ALLOC (1 << 6)
> +
>  static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
>                                         IOMMUNotifierFlag flags,
>                                         hwaddr start, hwaddr end,
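
Pulling the qemu_ram_resize() hunks together, the "grow early, shrink
late" ordering the commit message describes is essentially the following
schematic C - the QEMU specifics are stripped and all helper names here
are placeholders, not the patch's API:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Stand-ins for qemu_anon_ram_resize() in each direction. */
    static bool os_try_grow(size_t old_size, size_t new_size) { return true; }
    static bool os_try_shrink(size_t old_size, size_t new_size) { return true; }
    static void notify_resized(size_t old_size, size_t new_size) { }

    static int resize(size_t *used_length, size_t new_size)
    {
        size_t old_size = *used_length;

        /* Grow first: the only step that can fail (out of memory), and it
         * fails before any state (notifiers, bitmaps, used_length) is
         * touched. */
        if (new_size > old_size && !os_try_grow(old_size, new_size)) {
            return -1;
        }

        notify_resized(old_size, new_size);
        *used_length = new_size;

        /* Shrink last, warn only: if it fails, the tail merely stays
         * mapped, and nothing accesses memory beyond used_length. */
        if (new_size < old_size && !os_try_shrink(old_size, new_size)) {
            fprintf(stderr, "shrinking memory allocation failed\n");
        }
        return 0;
    }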


--
Murilo



* Re: [PATCH v4 15/15] exec: Ram blocks with resizeable anonymous allocations under POSIX
  2020-03-25 15:34   ` Murilo Opsfelder Araújo
@ 2020-03-27 11:24     ` David Hildenbrand
  0 siblings, 0 replies; 27+ messages in thread
From: David Hildenbrand @ 2020-03-27 11:24 UTC (permalink / raw)
  To: Murilo Opsfelder Araújo
  Cc: Eduardo Habkost, Michael S . Tsirkin, Stefan Weil,
	Igor Kotrasinski, qemu-devel, Peter Xu, Dr . David Alan Gilbert,
	Shameerali Kolothum Thodi, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

On 25.03.20 16:34, Murilo Opsfelder Araújo wrote:
> On Thursday, March 5, 2020 11:29:45 AM -03 David Hildenbrand wrote:
>> We can now make use of resizeable anonymous allocations to implement
>> actually resizeable ram blocks. Resizeable anonymous allocations are
>> not implemented under WIN32 yet and are not available when using
>> alternative allocators. Fall back to the existing handling.
>>
>> We also have to fallback to the existing handling in case any ram block
>> notifier does not support resizing (esp., AMD SEV, HAX) yet. Remember
>> in RAM_RESIZEABLE_ALLOC if we are using resizeable anonymous allocations.
>>
>> Try to grow early, as that can easily fail if out of memory. Shrink late
>> and ignore errors (nothing will actually break). Warn only.
>>
>> The benefit of actually resizeable ram blocks is that e.g., under Linux,
>> only the actual size will be reserved (even if
>> "/proc/sys/vm/overcommit_memory" is set to "never"). Additional memory will
>> be reserved when trying to resize, which allows to have ram blocks that
>> start small but can theoretically grow very large.
>>
>> Note1: We are not able to create resizeable ram blocks with pre-allocated
>>        memory yet, so prealloc is not affected.
>> Note2: mlock should work as it used to as os_mlock() does a
>>        mlockall(MCL_CURRENT | MCL_FUTURE), which includes future
>>        mappings.
>> Note3: Nobody should access memory beyond used_length. Memory notifiers
>>        already properly take care of this, only ram block notifiers
>>        violate this constraint and, therefore, have to be special-cased.
>>        Especially, any ram block notifier that might dynamically
>>        register at runtime (e.g., vfio) has to support resizes. Add an
>>        assert for that. Both, HAX and SEV register early, so they are
>>        fine.
>>
>> Reviewed-by: Peter Xu <peterx@redhat.com>
>> Cc: Richard Henderson <rth@twiddle.net>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>> Cc: Eduardo Habkost <ehabkost@redhat.com>
>> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
>> Cc: Stefan Weil <sw@weilnetz.de>
>> Cc: Igor Mammedov <imammedo@redhat.com>
>> Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  exec.c                    | 65 ++++++++++++++++++++++++++++++++++++---
>>  hw/core/numa.c            |  7 +++++
>>  include/exec/cpu-common.h |  2 ++
>>  include/exec/memory.h     |  8 +++++
>>  4 files changed, 77 insertions(+), 5 deletions(-)
>>
>> diff --git a/exec.c b/exec.c
>> index 9c3cc79193..6c6b6e12d2 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -2001,6 +2001,16 @@ void qemu_ram_unset_migratable(RAMBlock *rb)
>>      rb->flags &= ~RAM_MIGRATABLE;
>>  }
>>
>> +bool qemu_ram_is_resizeable(RAMBlock *rb)
>> +{
>> +    return rb->flags & RAM_RESIZEABLE;
>> +}
>> +
>> +bool qemu_ram_is_resizeable_alloc(RAMBlock *rb)
>> +{
>> +    return rb->flags & RAM_RESIZEABLE_ALLOC;
>> +}
>> +
>>  /* Called with iothread lock held.  */
>>  void qemu_ram_set_idstr(RAMBlock *new_block, const char *name, DeviceState
>> *dev) {
>> @@ -2094,6 +2104,7 @@ static void qemu_ram_apply_settings(void *host, size_t
>> length) */
>>  int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
>>  {
>> +    const bool shared = block->flags & RAM_SHARED;
> 
> Do you think a new function, for example, qemu_ram_is_shared(), to check
> for the RAM_SHARED flag would be welcome as well, similar to what is done
> in qemu_ram_is_resizeable() and qemu_ram_is_resizeable_alloc()?

We have that one already, and I'll just reuse it :)

Thanks!

> 
> Apart from that,
> 
> Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
> 

-- 
Thanks,

David / dhildenb




* Re: [PATCH v4 01/15] util: vfio-helpers: Fix qemu_vfio_close()
  2020-03-05 14:29 ` [PATCH v4 01/15] util: vfio-helpers: Fix qemu_vfio_close() David Hildenbrand
@ 2020-04-17 10:22   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 27+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-04-17 10:22 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	Dr . David Alan Gilbert, Peter Xu, Alex Williamson,
	Paolo Bonzini, Stefan Hajnoczi, Murilo Opsfelder Araujo,
	Igor Mammedov, Richard Henderson

On 3/5/20 3:29 PM, David Hildenbrand wrote:
> qemu_vfio_undo_mapping() will decrement the number of mappings and
> reshuffle the array elements to fit into the reduced size.
> 
> Iterating over all elements like this does not work as expected; let's make
> sure to remove all mappings properly.
> 
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>   util/vfio-helpers.c | 6 ++----
>   1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
> index 9ec01bfe26..f31aa77ffe 100644
> --- a/util/vfio-helpers.c
> +++ b/util/vfio-helpers.c
> @@ -695,13 +695,11 @@ static void qemu_vfio_reset(QEMUVFIOState *s)
>   /* Close and free the VFIO resources. */
>   void qemu_vfio_close(QEMUVFIOState *s)
>   {
> -    int i;
> -
>       if (!s) {
>           return;
>       }
> -    for (i = 0; i < s->nr_mappings; ++i) {
> -        qemu_vfio_undo_mapping(s, &s->mappings[i], NULL);
> +    while (s->nr_mappings) {
> +        qemu_vfio_undo_mapping(s, &s->mappings[s->nr_mappings - 1], NULL);
>       }
>       ram_block_notifier_remove(&s->ram_notifier);
>       qemu_vfio_reset(s);
> 
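
To see why the old loop leaves mappings behind: qemu_vfio_undo_mapping()
compacts the array, so advancing the index skips the entry that just slid
into the current slot. A standalone toy (not QEMU code; the element values
are irrelevant) demonstrating the pattern:

    #include <stdio.h>
    #include <string.h>

    static int nr;
    static int items[8];

    /* Mimics qemu_vfio_undo_mapping(): compact the array and shrink it. */
    static void remove_at(int i)
    {
        memmove(&items[i], &items[i + 1], (nr - i - 1) * sizeof(items[0]));
        nr--;
    }

    int main(void)
    {
        nr = 4;
        /* Buggy: the index advances while the array shrinks underneath. */
        for (int i = 0; i < nr; ++i) {
            remove_at(i);
        }
        printf("after buggy loop: %d left\n", nr);  /* prints 2, not 0 */

        /* Fixed pattern from the patch: always remove the last element. */
        while (nr) {
            remove_at(nr - 1);
        }
        printf("after fixed loop: %d left\n", nr);  /* prints 0 */
        return 0;
    }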

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>




end of thread

Thread overview: 27+ messages
2020-03-05 14:29 [PATCH v4 00/15] Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
2020-03-05 14:29 ` [PATCH v4 01/15] util: vfio-helpers: Fix qemu_vfio_close() David Hildenbrand
2020-04-17 10:22   ` Philippe Mathieu-Daudé
2020-03-05 14:29 ` [PATCH v4 02/15] util: vfio-helpers: Remove Error parameter from qemu_vfio_undo_mapping() David Hildenbrand
2020-03-25 14:32   ` Murilo Opsfelder Araújo
2020-03-05 14:29 ` [PATCH v4 03/15] util: vfio-helpers: Factor out removal " David Hildenbrand
2020-03-25 14:45   ` Murilo Opsfelder Araújo
2020-03-05 14:29 ` [PATCH v4 04/15] exec: Factor out setting ram settings (madvise ...) into qemu_ram_apply_settings() David Hildenbrand
2020-03-05 14:29 ` [PATCH v4 05/15] exec: Reuse qemu_ram_apply_settings() in qemu_ram_remap() David Hildenbrand
2020-03-05 14:29 ` [PATCH v4 06/15] exec: Drop "shared" parameter from ram_block_add() David Hildenbrand
2020-03-05 14:29 ` [PATCH v4 07/15] util/mmap-alloc: Factor out calculation of the pagesize for the guard page David Hildenbrand
2020-03-25 15:03   ` Murilo Opsfelder Araújo
2020-03-05 14:29 ` [PATCH v4 08/15] util/mmap-alloc: Factor out reserving of a memory region to mmap_reserve() David Hildenbrand
2020-03-05 14:29 ` [PATCH v4 09/15] util/mmap-alloc: Factor out activating of memory to mmap_activate() David Hildenbrand
2020-03-05 14:29 ` [PATCH v4 10/15] util/mmap-alloc: Prepare for resizeable mmaps David Hildenbrand
2020-03-25 15:09   ` Murilo Opsfelder Araújo
2020-03-05 14:29 ` [PATCH v4 11/15] util/mmap-alloc: Implement " David Hildenbrand
2020-03-25 15:14   ` Murilo Opsfelder Araújo
2020-03-05 14:29 ` [PATCH v4 12/15] util: vfio-helpers: Implement ram_block_resized() David Hildenbrand
2020-03-25 15:17   ` Murilo Opsfelder Araújo
2020-03-05 14:29 ` [PATCH v4 13/15] util: oslib: Resizeable anonymous allocations under POSIX David Hildenbrand
2020-03-25 15:20   ` Murilo Opsfelder Araújo
2020-03-05 14:29 ` [PATCH v4 14/15] numa: Introduce ram_block_notifiers_support_resize() David Hildenbrand
2020-03-25 15:24   ` Murilo Opsfelder Araújo
2020-03-05 14:29 ` [PATCH v4 15/15] exec: Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
2020-03-25 15:34   ` Murilo Opsfelder Araújo
2020-03-27 11:24     ` David Hildenbrand
