* [PATCH v3 00/15] Ram blocks with resizeable anonymous allocations under POSIX
@ 2020-02-27 10:11 David Hildenbrand
  2020-02-27 10:11 ` [PATCH v3 01/15] util: vfio-helpers: Fix qemu_vfio_close() David Hildenbrand
                   ` (14 more replies)
  0 siblings, 15 replies; 31+ messages in thread
From: David Hildenbrand @ 2020-02-27 10:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Richard Henderson,
	Igor Kotrasinski, David Hildenbrand, Dr . David Alan Gilbert,
	Peter Xu, Greg Kurz, Alex Williamson, Shameerali Kolothum Thodi,
	Murilo Opsfelder Araujo, Igor Mammedov, Paolo Bonzini,
	Stefan Weil, Stefan Hajnoczi, Richard Henderson

This series is now based on [1]:
    "[PATCH v3 00/13] migrate/ram: Fix resizing RAM blocks while migrating"

We already allow resizable ram blocks for anonymous memory; however, they
are not actually resized. All memory is mmap()ed R/W, including the memory
exceeding the used_length, up to the max_length.

When resizing, effectively only the boundary is moved. Implement actually
resizable anonymous allocations and make use of them in resizable ram
blocks when possible. Memory exceeding the used_length will be
inaccessible. Ram block notifiers, especially, require care.

Having actually resizable anonymous allocations (via mmap hackery) allows
us to reserve a big region in virtual address space and grow the
accessible/usable part on demand. Even if "/proc/sys/vm/overcommit_memory"
is set to "never" under Linux, huge reservations will succeed. If there is
not enough memory when resizing (to populate parts of the reserved region),
trying to resize will fail. Only the actually used size is reserved in the
OS.

Notably, memory notifiers already handle resizing by first removing
the old region, and then re-adding the resized region. prealloc is
currently not possible with resizable ram blocks. mlock() should continue
to work as is. Resizing is currently rare and must only happen at the
start of an incoming migration, or during resets. No code path (except the
HAX and SEV ram block notifiers) should access memory outside of the usable
range - and if we ever find one, it has to be fixed (I did not
identify any).

E.g., virtio-mem [2] wants to reserve big resizable memory regions and
grow the usable part on demand. I think this change is worth sending out
individually. I tested this extensively with virtio-mem (which makes it
very easy to trigger resizes), including precopy and postcopy migration.

Accompanied by a bunch of minor fixes and cleanups.

v2 -> v3:
- Rebased on current master/[1].
- "util: vfio-helpers: Factor out and fix processing of existing ram
   blocks"
-- moved to [1]
- "util: vfio-helpers: Remove Error parameter from qemu_vfio_undo_mapping()"
-- Better patch description
- "util/mmap-alloc: Factor out calculation of pagesize to mmap_pagesize()"
-- is now "util/mmap-alloc: Factor out calculation of the pagesize for the
   guard page"
-- Decided to keep special handling for the guard page for now
-- Dropped rb's
- "util/mmap-alloc: Prepare for resizeable mmaps"
-- Don't assert sizes against the real page size
- "numa: Teach ram block notifiers about resizable ram blocks"
-- Split. One part is in [1], the other is now in "numa: Introduce
   ram_block_notifiers_support_resize()"
- "exec: Ram blocks with resizeable anonymous allocations under POSIX"
-- Call qemu_ram_apply_settings() only on populated parts. Call it on
   freshly populated parts when growing.
- Minor changes

v1 -> v2:
- Add "util: vfio-helpers: Fix qemu_vfio_close()"
- Add "util: vfio-helpers: Remove Error parameter from
       qemu_vfio_undo_mapping()"
- Add "util: vfio-helpers: Factor out removal from
       qemu_vfio_undo_mapping()"
- "util/mmap-alloc: ..."
 -- Minor changes due to review feedback (e.g., assert alignment, return
    bool when resizing)
- "util: vfio-helpers: Implement ram_block_resized()"
 -- Reserve max_size in the IOVA address space.
 -- On resize, undo old mapping and do new mapping. We can later implement
    a new ioctl to resize the mapping directly.
- "numa: Teach ram block notifiers about resizable ram blocks"
 -- Pass size/max_size to ram block notifiers, which makes things easier
    and cleaner
- "exec: Ram blocks with resizable anonymous allocations under POSIX"
 -- Adapt to new ram block notifiers
 -- Shrink after notifying. Always trigger ram block notifiers on resizes
 -- Add a safety net that all ram block notifiers registered at runtime
    support resizes.

[1] https://lkml.kernel.org/r/20200226155304.60219-1-david@redhat.com
[2] https://lore.kernel.org/kvm/20191212171137.13872-1-david@redhat.com/

David Hildenbrand (15):
  util: vfio-helpers: Fix qemu_vfio_close()
  util: vfio-helpers: Remove Error parameter from
    qemu_vfio_undo_mapping()
  util: vfio-helpers: Factor out removal from qemu_vfio_undo_mapping()
  exec: Factor out setting ram settings (madvise ...) into
    qemu_ram_apply_settings()
  exec: Reuse qemu_ram_apply_settings() in qemu_ram_remap()
  exec: Drop "shared" parameter from ram_block_add()
  util/mmap-alloc: Factor out calculation of the pagesize for the guard
    page
  util/mmap-alloc: Factor out reserving of a memory region to
    mmap_reserve()
  util/mmap-alloc: Factor out populating of memory to mmap_populate()
  util/mmap-alloc: Prepare for resizeable mmaps
  util/mmap-alloc: Implement resizeable mmaps
  util: vfio-helpers: Implement ram_block_resized()
  util: oslib: Resizeable anonymous allocations under POSIX
  numa: Introduce ram_block_notifiers_support_resize()
  exec: Ram blocks with resizeable anonymous allocations under POSIX

 exec.c                    |  99 +++++++++++++++++-----
 hw/core/numa.c            |  19 +++++
 include/exec/cpu-common.h |   2 +
 include/exec/memory.h     |   8 ++
 include/exec/ramlist.h    |   1 +
 include/qemu/mmap-alloc.h |  21 +++--
 include/qemu/osdep.h      |   6 +-
 util/mmap-alloc.c         | 167 ++++++++++++++++++++++++--------------
 util/oslib-posix.c        |  37 ++++++++-
 util/oslib-win32.c        |  14 ++++
 util/trace-events         |   9 +-
 util/vfio-helpers.c       | 120 ++++++++++++++++++++-------
 12 files changed, 380 insertions(+), 123 deletions(-)

-- 
2.24.1



^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v3 01/15] util: vfio-helpers: Fix qemu_vfio_close()
  2020-02-27 10:11 [PATCH v3 00/15] Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
@ 2020-02-27 10:11 ` David Hildenbrand
  2020-02-27 10:11 ` [PATCH v3 02/15] util: vfio-helpers: Remove Error parameter from qemu_vfio_undo_mapping() David Hildenbrand
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2020-02-27 10:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	David Hildenbrand, Dr . David Alan Gilbert, Peter Xu,
	Alex Williamson, Murilo Opsfelder Araujo, Igor Mammedov,
	Paolo Bonzini, Stefan Hajnoczi, Richard Henderson

qemu_vfio_undo_mapping() will decrement the number of mappings and
reshuffle the array elements to fit into the reduced size.

Iterating forward over the array while removing therefore skips every
other mapping. Let's make sure to remove all mappings properly by always
removing the last one until none are left.

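The bug can be demonstrated with a simplified array (illustrative only,
not the real IOVAMapping list): remove_at() shifts later elements down
and shrinks the count, just like qemu_vfio_undo_mapping() does.

```c
#include <assert.h>

static int nr;
static int items[8];

/* Remove element i: shift the tail down and shrink the count. */
static void remove_at(int i)
{
    for (int j = i; j < nr - 1; j++) {
        items[j] = items[j + 1];
    }
    nr--;
}

/* Buggy: advancing i while remove_at() shifts elements skips entries. */
static void remove_all_buggy(void)
{
    for (int i = 0; i < nr; i++) {
        remove_at(i);
    }
}

/* Fixed: always remove the last remaining element until none are left. */
static void remove_all_fixed(void)
{
    while (nr) {
        remove_at(nr - 1);
    }
}
```

The buggy variant leaves half the elements behind, which is exactly why
qemu_vfio_close() switched to a while (s->nr_mappings) loop.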
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/vfio-helpers.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 9ec01bfe26..f31aa77ffe 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -695,13 +695,11 @@ static void qemu_vfio_reset(QEMUVFIOState *s)
 /* Close and free the VFIO resources. */
 void qemu_vfio_close(QEMUVFIOState *s)
 {
-    int i;
-
     if (!s) {
         return;
     }
-    for (i = 0; i < s->nr_mappings; ++i) {
-        qemu_vfio_undo_mapping(s, &s->mappings[i], NULL);
+    while (s->nr_mappings) {
+        qemu_vfio_undo_mapping(s, &s->mappings[s->nr_mappings - 1], NULL);
     }
     ram_block_notifier_remove(&s->ram_notifier);
     qemu_vfio_reset(s);
-- 
2.24.1




* [PATCH v3 02/15] util: vfio-helpers: Remove Error parameter from qemu_vfio_undo_mapping()
  2020-02-27 10:11 [PATCH v3 00/15] Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
  2020-02-27 10:11 ` [PATCH v3 01/15] util: vfio-helpers: Fix qemu_vfio_close() David Hildenbrand
@ 2020-02-27 10:11 ` David Hildenbrand
  2020-02-27 10:11 ` [PATCH v3 03/15] util: vfio-helpers: Factor out removal " David Hildenbrand
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2020-02-27 10:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	David Hildenbrand, Dr . David Alan Gilbert, Peter Xu,
	Alex Williamson, Murilo Opsfelder Araujo, Igor Mammedov,
	Paolo Bonzini, Stefan Hajnoczi, Richard Henderson

Everybody discards the error. Let's error_report() instead so this error
doesn't get lost.

This is now the same error handling as in qemu_vfio_do_mapping(). However,
we don't report any errors via the return value to the caller. This seems
to be one of these "will never happen, but let's better print an error
because the caller can't handle it either way" cases.

Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/vfio-helpers.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index f31aa77ffe..7085ca702c 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -541,8 +541,7 @@ static int qemu_vfio_do_mapping(QEMUVFIOState *s, void *host, size_t size,
 /**
  * Undo the DMA mapping from @s with VFIO, and remove from mapping list.
  */
-static void qemu_vfio_undo_mapping(QEMUVFIOState *s, IOVAMapping *mapping,
-                                   Error **errp)
+static void qemu_vfio_undo_mapping(QEMUVFIOState *s, IOVAMapping *mapping)
 {
     int index;
     struct vfio_iommu_type1_dma_unmap unmap = {
@@ -557,7 +556,7 @@ static void qemu_vfio_undo_mapping(QEMUVFIOState *s, IOVAMapping *mapping,
     assert(QEMU_IS_ALIGNED(mapping->size, qemu_real_host_page_size));
     assert(index >= 0 && index < s->nr_mappings);
     if (ioctl(s->container, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
-        error_setg_errno(errp, errno, "VFIO_UNMAP_DMA failed");
+        error_report("VFIO_UNMAP_DMA failed: %s", strerror(errno));
     }
     memmove(mapping, &s->mappings[index + 1],
             sizeof(s->mappings[0]) * (s->nr_mappings - index - 1));
@@ -622,7 +621,7 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
             assert(qemu_vfio_verify_mappings(s));
             ret = qemu_vfio_do_mapping(s, host, size, iova0);
             if (ret) {
-                qemu_vfio_undo_mapping(s, mapping, NULL);
+                qemu_vfio_undo_mapping(s, mapping);
                 goto out;
             }
             s->low_water_mark += size;
@@ -682,7 +681,7 @@ void qemu_vfio_dma_unmap(QEMUVFIOState *s, void *host)
     if (!m) {
         goto out;
     }
-    qemu_vfio_undo_mapping(s, m, NULL);
+    qemu_vfio_undo_mapping(s, m);
 out:
     qemu_mutex_unlock(&s->lock);
 }
@@ -699,7 +698,7 @@ void qemu_vfio_close(QEMUVFIOState *s)
         return;
     }
     while (s->nr_mappings) {
-        qemu_vfio_undo_mapping(s, &s->mappings[s->nr_mappings - 1], NULL);
+        qemu_vfio_undo_mapping(s, &s->mappings[s->nr_mappings - 1]);
     }
     ram_block_notifier_remove(&s->ram_notifier);
     qemu_vfio_reset(s);
-- 
2.24.1




* [PATCH v3 03/15] util: vfio-helpers: Factor out removal from qemu_vfio_undo_mapping()
  2020-02-27 10:11 [PATCH v3 00/15] Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
  2020-02-27 10:11 ` [PATCH v3 01/15] util: vfio-helpers: Fix qemu_vfio_close() David Hildenbrand
  2020-02-27 10:11 ` [PATCH v3 02/15] util: vfio-helpers: Remove Error parameter from qemu_vfio_undo_mapping() David Hildenbrand
@ 2020-02-27 10:11 ` David Hildenbrand
  2020-02-27 10:11 ` [PATCH v3 04/15] exec: Factor out setting ram settings (madvise ...) into qemu_ram_apply_settings() David Hildenbrand
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2020-02-27 10:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	David Hildenbrand, Dr . David Alan Gilbert, Peter Xu,
	Alex Williamson, Murilo Opsfelder Araujo, Igor Mammedov,
	Paolo Bonzini, Stefan Hajnoczi, Richard Henderson

Factor it out and properly use it where applicable. Make
qemu_vfio_undo_mapping() look like qemu_vfio_do_mapping(), passing the
size and iova, not the mapping.

Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/vfio-helpers.c | 43 +++++++++++++++++++++++++++----------------
 1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 7085ca702c..f0c77f0d69 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -518,6 +518,20 @@ static IOVAMapping *qemu_vfio_add_mapping(QEMUVFIOState *s,
     return insert;
 }
 
+/**
+ * Remove the mapping from @s and free it.
+ */
+static void qemu_vfio_remove_mapping(QEMUVFIOState *s, IOVAMapping *mapping)
+{
+    const int index = mapping - s->mappings;
+
+    assert(index >= 0 && index < s->nr_mappings);
+    memmove(mapping, &s->mappings[index + 1],
+            sizeof(s->mappings[0]) * (s->nr_mappings - index - 1));
+    s->nr_mappings--;
+    s->mappings = g_renew(IOVAMapping, s->mappings, s->nr_mappings);
+}
+
 /* Do the DMA mapping with VFIO. */
 static int qemu_vfio_do_mapping(QEMUVFIOState *s, void *host, size_t size,
                                 uint64_t iova)
@@ -539,29 +553,22 @@ static int qemu_vfio_do_mapping(QEMUVFIOState *s, void *host, size_t size,
 }
 
 /**
- * Undo the DMA mapping from @s with VFIO, and remove from mapping list.
+ * Undo the DMA mapping from @s with VFIO.
  */
-static void qemu_vfio_undo_mapping(QEMUVFIOState *s, IOVAMapping *mapping)
+static void qemu_vfio_undo_mapping(QEMUVFIOState *s, size_t size, uint64_t iova)
 {
-    int index;
     struct vfio_iommu_type1_dma_unmap unmap = {
         .argsz = sizeof(unmap),
         .flags = 0,
-        .iova = mapping->iova,
-        .size = mapping->size,
+        .iova = iova,
+        .size = size,
     };
 
-    index = mapping - s->mappings;
-    assert(mapping->size > 0);
-    assert(QEMU_IS_ALIGNED(mapping->size, qemu_real_host_page_size));
-    assert(index >= 0 && index < s->nr_mappings);
+    assert(size > 0);
+    assert(QEMU_IS_ALIGNED(size, qemu_real_host_page_size));
     if (ioctl(s->container, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
         error_report("VFIO_UNMAP_DMA failed: %s", strerror(errno));
     }
-    memmove(mapping, &s->mappings[index + 1],
-            sizeof(s->mappings[0]) * (s->nr_mappings - index - 1));
-    s->nr_mappings--;
-    s->mappings = g_renew(IOVAMapping, s->mappings, s->nr_mappings);
 }
 
 /* Check if the mapping list is (ascending) ordered. */
@@ -621,7 +628,7 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
             assert(qemu_vfio_verify_mappings(s));
             ret = qemu_vfio_do_mapping(s, host, size, iova0);
             if (ret) {
-                qemu_vfio_undo_mapping(s, mapping);
+                qemu_vfio_remove_mapping(s, mapping);
                 goto out;
             }
             s->low_water_mark += size;
@@ -681,7 +688,8 @@ void qemu_vfio_dma_unmap(QEMUVFIOState *s, void *host)
     if (!m) {
         goto out;
     }
-    qemu_vfio_undo_mapping(s, m);
+    qemu_vfio_undo_mapping(s, m->size, m->iova);
+    qemu_vfio_remove_mapping(s, m);
 out:
     qemu_mutex_unlock(&s->lock);
 }
@@ -698,7 +706,10 @@ void qemu_vfio_close(QEMUVFIOState *s)
         return;
     }
     while (s->nr_mappings) {
-        qemu_vfio_undo_mapping(s, &s->mappings[s->nr_mappings - 1]);
+        IOVAMapping *m = &s->mappings[s->nr_mappings - 1];
+
+        qemu_vfio_undo_mapping(s, m->size, m->iova);
+        qemu_vfio_remove_mapping(s, m);
     }
     ram_block_notifier_remove(&s->ram_notifier);
     qemu_vfio_reset(s);
-- 
2.24.1




* [PATCH v3 04/15] exec: Factor out setting ram settings (madvise ...) into qemu_ram_apply_settings()
  2020-02-27 10:11 [PATCH v3 00/15] Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
                   ` (2 preceding siblings ...)
  2020-02-27 10:11 ` [PATCH v3 03/15] util: vfio-helpers: Factor out removal " David Hildenbrand
@ 2020-02-27 10:11 ` David Hildenbrand
  2020-02-27 10:11 ` [PATCH v3 05/15] exec: Reuse qemu_ram_apply_settings() in qemu_ram_remap() David Hildenbrand
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2020-02-27 10:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Richard Henderson,
	Igor Kotrasinski, David Hildenbrand, Dr . David Alan Gilbert,
	Peter Xu, Murilo Opsfelder Araujo, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

Factor all settings out into qemu_ram_apply_settings().

For memory_try_enable_merging(), the important bit is that it won't be
called with XEN - which is still the case, as new_block->host will
remain NULL.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 exec.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/exec.c b/exec.c
index 9d351a7492..a0a8d93fee 100644
--- a/exec.c
+++ b/exec.c
@@ -2069,6 +2069,21 @@ static int memory_try_enable_merging(void *addr, size_t len)
     return qemu_madvise(addr, len, QEMU_MADV_MERGEABLE);
 }
 
+static void qemu_ram_apply_settings(void *host, size_t length)
+{
+    memory_try_enable_merging(host, length);
+    qemu_ram_setup_dump(host, length);
+    qemu_madvise(host, length, QEMU_MADV_HUGEPAGE);
+    /*
+     * MADV_DONTFORK is also needed by KVM in absence of synchronous MMU
+     * Configure it unless the machine is a qtest server, in which case KVM is
+     * not used and it may be forked (eg for fuzzing purposes).
+     */
+    if (!qtest_enabled()) {
+        qemu_madvise(host, length, QEMU_MADV_DONTFORK);
+    }
+}
+
 /*
  * Resizing RAM while migrating can result in the migration being canceled.
  * Care has to be taken if the guest might have already detected the memory.
@@ -2227,7 +2242,6 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
                 qemu_mutex_unlock_ramlist();
                 return;
             }
-            memory_try_enable_merging(new_block->host, new_block->max_length);
         }
     }
 
@@ -2265,17 +2279,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
                                         DIRTY_CLIENTS_ALL);
 
     if (new_block->host) {
-        qemu_ram_setup_dump(new_block->host, new_block->max_length);
-        qemu_madvise(new_block->host, new_block->max_length, QEMU_MADV_HUGEPAGE);
-        /*
-         * MADV_DONTFORK is also needed by KVM in absence of synchronous MMU
-         * Configure it unless the machine is a qtest server, in which case
-         * KVM is not used and it may be forked (eg for fuzzing purposes).
-         */
-        if (!qtest_enabled()) {
-            qemu_madvise(new_block->host, new_block->max_length,
-                         QEMU_MADV_DONTFORK);
-        }
+        qemu_ram_apply_settings(new_block->host, new_block->max_length);
         ram_block_notify_add(new_block->host, new_block->used_length,
                              new_block->max_length);
     }
-- 
2.24.1




* [PATCH v3 05/15] exec: Reuse qemu_ram_apply_settings() in qemu_ram_remap()
  2020-02-27 10:11 [PATCH v3 00/15] Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
                   ` (3 preceding siblings ...)
  2020-02-27 10:11 ` [PATCH v3 04/15] exec: Factor out setting ram settings (madvise ...) into qemu_ram_apply_settings() David Hildenbrand
@ 2020-02-27 10:11 ` David Hildenbrand
  2020-02-27 10:11 ` [PATCH v3 06/15] exec: Drop "shared" parameter from ram_block_add() David Hildenbrand
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2020-02-27 10:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Richard Henderson,
	Igor Kotrasinski, David Hildenbrand, Dr . David Alan Gilbert,
	Peter Xu, Murilo Opsfelder Araujo, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

I don't see why we shouldn't apply all settings to make it look like the
surrounding RAM (and enable proper VMA merging).

Note: memory backend settings might have overridden these settings. We
would need a callback to let the memory backend fix that up.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 exec.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/exec.c b/exec.c
index a0a8d93fee..1fc1fa237e 100644
--- a/exec.c
+++ b/exec.c
@@ -2516,8 +2516,7 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
                                  length, addr);
                     exit(1);
                 }
-                memory_try_enable_merging(vaddr, length);
-                qemu_ram_setup_dump(vaddr, length);
+                qemu_ram_apply_settings(vaddr, length);
             }
         }
     }
-- 
2.24.1




* [PATCH v3 06/15] exec: Drop "shared" parameter from ram_block_add()
  2020-02-27 10:11 [PATCH v3 00/15] Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
                   ` (4 preceding siblings ...)
  2020-02-27 10:11 ` [PATCH v3 05/15] exec: Reuse qemu_ram_apply_settings() in qemu_ram_remap() David Hildenbrand
@ 2020-02-27 10:11 ` David Hildenbrand
  2020-02-27 10:11 ` [PATCH v3 07/15] util/mmap-alloc: Factor out calculation of the pagesize for the guard page David Hildenbrand
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2020-02-27 10:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Richard Henderson,
	Igor Kotrasinski, David Hildenbrand, Dr . David Alan Gilbert,
	Peter Xu, Murilo Opsfelder Araujo, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

Properly store it in the flags of the ram block instead (and the flag
even already exists and is used).

E.g., qemu_ram_is_shared() now properly succeeds on all ram blocks that are
actually shared.

Reviewed-by: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 exec.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/exec.c b/exec.c
index 1fc1fa237e..cdf962a936 100644
--- a/exec.c
+++ b/exec.c
@@ -2211,7 +2211,7 @@ static void dirty_memory_extend(ram_addr_t old_ram_size,
     }
 }
 
-static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
+static void ram_block_add(RAMBlock *new_block, Error **errp)
 {
     RAMBlock *block;
     RAMBlock *last_block = NULL;
@@ -2234,7 +2234,8 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
             }
         } else {
             new_block->host = phys_mem_alloc(new_block->max_length,
-                                             &new_block->mr->align, shared);
+                                             &new_block->mr->align,
+                                             qemu_ram_is_shared(new_block));
             if (!new_block->host) {
                 error_setg_errno(errp, errno,
                                  "cannot set up guest memory '%s'",
@@ -2339,7 +2340,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
         return NULL;
     }
 
-    ram_block_add(new_block, &local_err, ram_flags & RAM_SHARED);
+    ram_block_add(new_block, &local_err);
     if (local_err) {
         g_free(new_block);
         error_propagate(errp, local_err);
@@ -2401,10 +2402,13 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
     if (host) {
         new_block->flags |= RAM_PREALLOC;
     }
+    if (share) {
+        new_block->flags |= RAM_SHARED;
+    }
     if (resizeable) {
         new_block->flags |= RAM_RESIZEABLE;
     }
-    ram_block_add(new_block, &local_err, share);
+    ram_block_add(new_block, &local_err);
     if (local_err) {
         g_free(new_block);
         error_propagate(errp, local_err);
-- 
2.24.1




* [PATCH v3 07/15] util/mmap-alloc: Factor out calculation of the pagesize for the guard page
  2020-02-27 10:11 [PATCH v3 00/15] Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
                   ` (5 preceding siblings ...)
  2020-02-27 10:11 ` [PATCH v3 06/15] exec: Drop "shared" parameter from ram_block_add() David Hildenbrand
@ 2020-02-27 10:11 ` David Hildenbrand
  2020-02-28 19:43   ` Peter Xu
  2020-02-27 10:11 ` [PATCH v3 08/15] util/mmap-alloc: Factor out reserving of a memory region to mmap_reserve() David Hildenbrand
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 31+ messages in thread
From: David Hildenbrand @ 2020-02-27 10:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	David Hildenbrand, Dr . David Alan Gilbert, Peter Xu, Greg Kurz,
	Murilo Opsfelder Araujo, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

Let's factor out calculating the size of the guard page and rename the
variable to make it clearer that this pagesize only applies to the
guard page.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Cc: Greg Kurz <groug@kaod.org>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/mmap-alloc.c | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 27dcccd8ec..f0277f9fad 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -82,17 +82,27 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
     return qemu_real_host_page_size;
 }
 
+static inline size_t mmap_guard_pagesize(int fd)
+{
+#if defined(__powerpc64__) && defined(__linux__)
+    /* Mappings in the same segment must share the same page size */
+    return qemu_fd_getpagesize(fd);
+#else
+    return qemu_real_host_page_size;
+#endif
+}
+
 void *qemu_ram_mmap(int fd,
                     size_t size,
                     size_t align,
                     bool shared,
                     bool is_pmem)
 {
+    const size_t guard_pagesize = mmap_guard_pagesize(fd);
     int flags;
     int map_sync_flags = 0;
     int guardfd;
     size_t offset;
-    size_t pagesize;
     size_t total;
     void *guardptr;
     void *ptr;
@@ -113,8 +123,7 @@ void *qemu_ram_mmap(int fd,
      * anonymous memory is OK.
      */
     flags = MAP_PRIVATE;
-    pagesize = qemu_fd_getpagesize(fd);
-    if (fd == -1 || pagesize == qemu_real_host_page_size) {
+    if (fd == -1 || guard_pagesize == qemu_real_host_page_size) {
         guardfd = -1;
         flags |= MAP_ANONYMOUS;
     } else {
@@ -123,7 +132,6 @@ void *qemu_ram_mmap(int fd,
     }
 #else
     guardfd = -1;
-    pagesize = qemu_real_host_page_size;
     flags = MAP_PRIVATE | MAP_ANONYMOUS;
 #endif
 
@@ -135,7 +143,7 @@ void *qemu_ram_mmap(int fd,
 
     assert(is_power_of_2(align));
     /* Always align to host page size */
-    assert(align >= pagesize);
+    assert(align >= guard_pagesize);
 
     flags = MAP_FIXED;
     flags |= fd == -1 ? MAP_ANONYMOUS : 0;
@@ -189,8 +197,8 @@ void *qemu_ram_mmap(int fd,
      * a guard page guarding against potential buffer overflows.
      */
     total -= offset;
-    if (total > size + pagesize) {
-        munmap(ptr + size + pagesize, total - size - pagesize);
+    if (total > size + guard_pagesize) {
+        munmap(ptr + size + guard_pagesize, total - size - guard_pagesize);
     }
 
     return ptr;
@@ -198,15 +206,8 @@ void *qemu_ram_mmap(int fd,
 
 void qemu_ram_munmap(int fd, void *ptr, size_t size)
 {
-    size_t pagesize;
-
     if (ptr) {
         /* Unmap both the RAM block and the guard page */
-#if defined(__powerpc64__) && defined(__linux__)
-        pagesize = qemu_fd_getpagesize(fd);
-#else
-        pagesize = qemu_real_host_page_size;
-#endif
-        munmap(ptr, size + pagesize);
+        munmap(ptr, size + mmap_guard_pagesize(fd));
     }
 }
-- 
2.24.1




* [PATCH v3 08/15] util/mmap-alloc: Factor out reserving of a memory region to mmap_reserve()
  2020-02-27 10:11 [PATCH v3 00/15] Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
                   ` (6 preceding siblings ...)
  2020-02-27 10:11 ` [PATCH v3 07/15] util/mmap-alloc: Factor out calculation of the pagesize for the guard page David Hildenbrand
@ 2020-02-27 10:11 ` David Hildenbrand
  2020-02-27 10:11 ` [PATCH v3 09/15] util/mmap-alloc: Factor out populating of memory to mmap_populate() David Hildenbrand
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2020-02-27 10:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Richard Henderson,
	Igor Kotrasinski, David Hildenbrand, Dr . David Alan Gilbert,
	Peter Xu, Greg Kurz, Murilo Opsfelder Araujo, Igor Mammedov,
	Paolo Bonzini, Richard Henderson

We want to reserve a memory region without actually populating memory.
Let's factor that out.

Reviewed-by: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Greg Kurz <groug@kaod.org>
Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/mmap-alloc.c | 58 +++++++++++++++++++++++++++--------------------
 1 file changed, 33 insertions(+), 25 deletions(-)

diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index f0277f9fad..9e9534a07e 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -82,6 +82,38 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
     return qemu_real_host_page_size;
 }
 
+/*
+ * Reserve a new memory region of the requested size to be used for mapping
+ * from the given fd (if any).
+ */
+static void *mmap_reserve(size_t size, int fd)
+{
+    int flags = MAP_PRIVATE;
+
+#if defined(__powerpc64__) && defined(__linux__)
+    /*
+     * On ppc64 mappings in the same segment (aka slice) must share the same
+     * page size. Since we will be re-allocating part of this segment
+     * from the supplied fd, we should make sure to use the same page size, to
+     * this end we mmap the supplied fd.  In this case, set MAP_NORESERVE to
+     * avoid allocating backing store memory.
+     * We do this unless we are using the system page size, in which case
+     * anonymous memory is OK.
+     */
+    if (fd == -1 || qemu_fd_getpagesize(fd) == qemu_real_host_page_size) {
+        fd = -1;
+        flags |= MAP_ANONYMOUS;
+    } else {
+        flags |= MAP_NORESERVE;
+    }
+#else
+    fd = -1;
+    flags |= MAP_ANONYMOUS;
+#endif
+
+    return mmap(0, size, PROT_NONE, flags, fd, 0);
+}
+
 static inline size_t mmap_guard_pagesize(int fd)
 {
 #if defined(__powerpc64__) && defined(__linux__)
@@ -101,7 +133,6 @@ void *qemu_ram_mmap(int fd,
     const size_t guard_pagesize = mmap_guard_pagesize(fd);
     int flags;
     int map_sync_flags = 0;
-    int guardfd;
     size_t offset;
     size_t total;
     void *guardptr;
@@ -113,30 +144,7 @@ void *qemu_ram_mmap(int fd,
      */
     total = size + align;
 
-#if defined(__powerpc64__) && defined(__linux__)
-    /* On ppc64 mappings in the same segment (aka slice) must share the same
-     * page size. Since we will be re-allocating part of this segment
-     * from the supplied fd, we should make sure to use the same page size, to
-     * this end we mmap the supplied fd.  In this case, set MAP_NORESERVE to
-     * avoid allocating backing store memory.
-     * We do this unless we are using the system page size, in which case
-     * anonymous memory is OK.
-     */
-    flags = MAP_PRIVATE;
-    if (fd == -1 || guard_pagesize == qemu_real_host_page_size) {
-        guardfd = -1;
-        flags |= MAP_ANONYMOUS;
-    } else {
-        guardfd = fd;
-        flags |= MAP_NORESERVE;
-    }
-#else
-    guardfd = -1;
-    flags = MAP_PRIVATE | MAP_ANONYMOUS;
-#endif
-
-    guardptr = mmap(0, total, PROT_NONE, flags, guardfd, 0);
-
+    guardptr = mmap_reserve(total, fd);
     if (guardptr == MAP_FAILED) {
         return MAP_FAILED;
     }
-- 
2.24.1




* [PATCH v3 09/15] util/mmap-alloc: Factor out populating of memory to mmap_populate()
  2020-02-27 10:11 [PATCH v3 00/15] Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
                   ` (7 preceding siblings ...)
  2020-02-27 10:11 ` [PATCH v3 08/15] util/mmap-alloc: Factor out reserving of a memory region to mmap_reserve() David Hildenbrand
@ 2020-02-27 10:11 ` David Hildenbrand
  2020-03-03  8:43   ` David Hildenbrand
  2020-02-27 10:12 ` [PATCH v3 10/15] util/mmap-alloc: Prepare for resizeable mmaps David Hildenbrand
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 31+ messages in thread
From: David Hildenbrand @ 2020-02-27 10:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Richard Henderson,
	Igor Kotrasinski, David Hildenbrand, Dr . David Alan Gilbert,
	Peter Xu, Greg Kurz, Murilo Opsfelder Araujo, Igor Mammedov,
	Paolo Bonzini, Richard Henderson

We want to populate memory within a reserved memory region. Let's factor
that out.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Greg Kurz <groug@kaod.org>
Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/mmap-alloc.c | 89 +++++++++++++++++++++++++----------------------
 1 file changed, 47 insertions(+), 42 deletions(-)

diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 9e9534a07e..73179f36d6 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -114,6 +114,50 @@ static void *mmap_reserve(size_t size, int fd)
     return mmap(0, size, PROT_NONE, flags, fd, 0);
 }
 
+/*
+ * Populate memory in a reserved region from the given fd (if any).
+ */
+static void *mmap_populate(void *ptr, size_t size, int fd, bool shared,
+                           bool is_pmem)
+{
+    int map_sync_flags = 0;
+    int flags = MAP_FIXED;
+    void *populated_ptr;
+
+    flags |= fd == -1 ? MAP_ANONYMOUS : 0;
+    flags |= shared ? MAP_SHARED : MAP_PRIVATE;
+    if (shared && is_pmem) {
+        map_sync_flags = MAP_SYNC | MAP_SHARED_VALIDATE;
+    }
+
+    populated_ptr = mmap(ptr, size, PROT_READ | PROT_WRITE,
+                         flags | map_sync_flags, fd, 0);
+    if (populated_ptr == MAP_FAILED && map_sync_flags) {
+        if (errno == ENOTSUP) {
+            char *proc_link = g_strdup_printf("/proc/self/fd/%d", fd);
+            char *file_name = g_malloc0(PATH_MAX);
+            int len = readlink(proc_link, file_name, PATH_MAX - 1);
+
+            if (len < 0) {
+                len = 0;
+            }
+            file_name[len] = '\0';
+            fprintf(stderr, "Warning: requesting persistence across crashes "
+                    "for backend file %s failed. Proceeding without "
+                    "persistence, data might become corrupted in case of host "
+                    "crash.\n", file_name);
+            g_free(proc_link);
+            g_free(file_name);
+        }
+        /*
+         * If mmap failed with MAP_SHARED_VALIDATE | MAP_SYNC, we will try
+         * again without these flags to handle backwards compatibility.
+         */
+        populated_ptr = mmap(ptr, size, PROT_READ | PROT_WRITE, flags, fd, 0);
+    }
+    return populated_ptr;
+}
+
 static inline size_t mmap_guard_pagesize(int fd)
 {
 #if defined(__powerpc64__) && defined(__linux__)
@@ -131,12 +175,8 @@ void *qemu_ram_mmap(int fd,
                     bool is_pmem)
 {
     const size_t guard_pagesize = mmap_guard_pagesize(fd);
-    int flags;
-    int map_sync_flags = 0;
-    size_t offset;
-    size_t total;
-    void *guardptr;
-    void *ptr;
+    size_t offset, total;
+    void *ptr, *guardptr;
 
     /*
      * Note: this always allocates at least one extra page of virtual address
@@ -153,44 +193,9 @@ void *qemu_ram_mmap(int fd,
     /* Always align to host page size */
     assert(align >= guard_pagesize);
 
-    flags = MAP_FIXED;
-    flags |= fd == -1 ? MAP_ANONYMOUS : 0;
-    flags |= shared ? MAP_SHARED : MAP_PRIVATE;
-    if (shared && is_pmem) {
-        map_sync_flags = MAP_SYNC | MAP_SHARED_VALIDATE;
-    }
-
     offset = QEMU_ALIGN_UP((uintptr_t)guardptr, align) - (uintptr_t)guardptr;
 
-    ptr = mmap(guardptr + offset, size, PROT_READ | PROT_WRITE,
-               flags | map_sync_flags, fd, 0);
-
-    if (ptr == MAP_FAILED && map_sync_flags) {
-        if (errno == ENOTSUP) {
-            char *proc_link, *file_name;
-            int len;
-            proc_link = g_strdup_printf("/proc/self/fd/%d", fd);
-            file_name = g_malloc0(PATH_MAX);
-            len = readlink(proc_link, file_name, PATH_MAX - 1);
-            if (len < 0) {
-                len = 0;
-            }
-            file_name[len] = '\0';
-            fprintf(stderr, "Warning: requesting persistence across crashes "
-                    "for backend file %s failed. Proceeding without "
-                    "persistence, data might become corrupted in case of host "
-                    "crash.\n", file_name);
-            g_free(proc_link);
-            g_free(file_name);
-        }
-        /*
-         * if map failed with MAP_SHARED_VALIDATE | MAP_SYNC,
-         * we will remove these flags to handle compatibility.
-         */
-        ptr = mmap(guardptr + offset, size, PROT_READ | PROT_WRITE,
-                   flags, fd, 0);
-    }
-
+    ptr = mmap_populate(guardptr + offset, size, fd, shared, is_pmem);
     if (ptr == MAP_FAILED) {
         munmap(guardptr, total);
         return MAP_FAILED;
-- 
2.24.1




* [PATCH v3 10/15] util/mmap-alloc: Prepare for resizeable mmaps
  2020-02-27 10:11 [PATCH v3 00/15] Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
                   ` (8 preceding siblings ...)
  2020-02-27 10:11 ` [PATCH v3 09/15] util/mmap-alloc: Factor out populating of memory to mmap_populate() David Hildenbrand
@ 2020-02-27 10:12 ` David Hildenbrand
  2020-02-27 10:12 ` [PATCH v3 11/15] util/mmap-alloc: Implement " David Hildenbrand
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2020-02-27 10:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Richard Henderson,
	Igor Kotrasinski, David Hildenbrand, Dr . David Alan Gilbert,
	Peter Xu, Greg Kurz, Murilo Opsfelder Araujo, Igor Mammedov,
	Paolo Bonzini, Richard Henderson

When shrinking an mmap, we want to re-reserve the previously populated area.
When growing a memory region, we want to populate starting at a given
fd_offset. Prepare by allowing these parameters to be passed in.

Also, let's make sure we always process full pages, to avoid
unmapping/remapping pages that are already in use when
growing/shrinking. Add some asserts.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Greg Kurz <groug@kaod.org>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/mmap-alloc.c | 34 +++++++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 73179f36d6..c6a6075215 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -83,12 +83,12 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
 }
 
 /*
- * Reserve a new memory region of the requested size to be used for mapping
- * from the given fd (if any).
+ * Reserve a new memory region of the requested size or re-reserve parts
+ * of an existing region to be used for mapping from the given fd (if any).
  */
-static void *mmap_reserve(size_t size, int fd)
+static void *mmap_reserve(void *ptr, size_t size, int fd)
 {
-    int flags = MAP_PRIVATE;
+    int flags = MAP_PRIVATE | (ptr ? MAP_FIXED : 0);
 
 #if defined(__powerpc64__) && defined(__linux__)
     /*
@@ -111,19 +111,23 @@ static void *mmap_reserve(size_t size, int fd)
     flags |= MAP_ANONYMOUS;
 #endif
 
-    return mmap(0, size, PROT_NONE, flags, fd, 0);
+    return mmap(ptr, size, PROT_NONE, flags, fd, 0);
 }
 
 /*
  * Populate memory in a reserved region from the given fd (if any).
  */
-static void *mmap_populate(void *ptr, size_t size, int fd, bool shared,
-                           bool is_pmem)
+static void *mmap_populate(void *ptr, size_t size, int fd, size_t fd_offset,
+                           bool shared, bool is_pmem)
 {
     int map_sync_flags = 0;
     int flags = MAP_FIXED;
     void *populated_ptr;
 
+    if (fd == -1) {
+        fd_offset = 0;
+    }
+
     flags |= fd == -1 ? MAP_ANONYMOUS : 0;
     flags |= shared ? MAP_SHARED : MAP_PRIVATE;
     if (shared && is_pmem) {
@@ -131,7 +135,7 @@ static void *mmap_populate(void *ptr, size_t size, int fd, bool shared,
     }
 
     populated_ptr = mmap(ptr, size, PROT_READ | PROT_WRITE,
-                         flags | map_sync_flags, fd, 0);
+                         flags | map_sync_flags, fd, fd_offset);
     if (populated_ptr == MAP_FAILED && map_sync_flags) {
         if (errno == ENOTSUP) {
             char *proc_link = g_strdup_printf("/proc/self/fd/%d", fd);
@@ -153,7 +157,8 @@ static void *mmap_populate(void *ptr, size_t size, int fd, bool shared,
          * If mmap failed with MAP_SHARED_VALIDATE | MAP_SYNC, we will try
          * again without these flags to handle backwards compatibility.
          */
-        populated_ptr = mmap(ptr, size, PROT_READ | PROT_WRITE, flags, fd, 0);
+        populated_ptr = mmap(ptr, size, PROT_READ | PROT_WRITE, flags, fd,
+                             fd_offset);
     }
     return populated_ptr;
 }
@@ -175,16 +180,19 @@ void *qemu_ram_mmap(int fd,
                     bool is_pmem)
 {
     const size_t guard_pagesize = mmap_guard_pagesize(fd);
+    const size_t pagesize = qemu_fd_getpagesize(fd);
     size_t offset, total;
     void *ptr, *guardptr;
 
+    g_assert(QEMU_IS_ALIGNED(size, pagesize));
+
     /*
      * Note: this always allocates at least one extra page of virtual address
      * space, even if size is already aligned.
      */
     total = size + align;
 
-    guardptr = mmap_reserve(total, fd);
+    guardptr = mmap_reserve(NULL, total, fd);
     if (guardptr == MAP_FAILED) {
         return MAP_FAILED;
     }
@@ -195,7 +203,7 @@ void *qemu_ram_mmap(int fd,
 
     offset = QEMU_ALIGN_UP((uintptr_t)guardptr, align) - (uintptr_t)guardptr;
 
-    ptr = mmap_populate(guardptr + offset, size, fd, shared, is_pmem);
+    ptr = mmap_populate(guardptr + offset, size, fd, 0, shared, is_pmem);
     if (ptr == MAP_FAILED) {
         munmap(guardptr, total);
         return MAP_FAILED;
@@ -219,6 +227,10 @@ void *qemu_ram_mmap(int fd,
 
 void qemu_ram_munmap(int fd, void *ptr, size_t size)
 {
+    const size_t pagesize = qemu_fd_getpagesize(fd);
+
+    g_assert(QEMU_IS_ALIGNED(size, pagesize));
+
     if (ptr) {
         /* Unmap both the RAM block and the guard page */
         munmap(ptr, size + mmap_guard_pagesize(fd));
-- 
2.24.1




* [PATCH v3 11/15] util/mmap-alloc: Implement resizeable mmaps
  2020-02-27 10:11 [PATCH v3 00/15] Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
                   ` (9 preceding siblings ...)
  2020-02-27 10:12 ` [PATCH v3 10/15] util/mmap-alloc: Prepare for resizeable mmaps David Hildenbrand
@ 2020-02-27 10:12 ` David Hildenbrand
  2020-02-28 19:43   ` Peter Xu
  2020-02-27 10:12 ` [PATCH v3 12/15] util: vfio-helpers: Implement ram_block_resized() David Hildenbrand
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 31+ messages in thread
From: David Hildenbrand @ 2020-02-27 10:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Richard Henderson,
	Igor Kotrasinski, David Hildenbrand, Dr . David Alan Gilbert,
	Peter Xu, Greg Kurz, Murilo Opsfelder Araujo, Igor Mammedov,
	Paolo Bonzini, Richard Henderson

Implement resizeable mmaps. For now, the actual resizing is not wired up.
Introduce qemu_ram_mmap_resizeable() and qemu_ram_mmap_resize(). Make
qemu_ram_mmap() a wrapper of qemu_ram_mmap_resizeable().

Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Greg Kurz <groug@kaod.org>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/qemu/mmap-alloc.h | 21 +++++++++++--------
 util/mmap-alloc.c         | 43 ++++++++++++++++++++++++++++-----------
 2 files changed, 44 insertions(+), 20 deletions(-)

diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index e786266b92..ca8f7edf70 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -7,11 +7,13 @@ size_t qemu_fd_getpagesize(int fd);
 size_t qemu_mempath_getpagesize(const char *mem_path);
 
 /**
- * qemu_ram_mmap: mmap the specified file or device.
+ * qemu_ram_mmap_resizeable: reserve a memory region of @max_size to mmap the
+ *                           specified file or device and mmap @size of it.
  *
  * Parameters:
  *  @fd: the file or the device to mmap
  *  @size: the number of bytes to be mmaped
+ *  @max_size: the number of bytes to be reserved
  *  @align: if not zero, specify the alignment of the starting mapping address;
  *          otherwise, the alignment in use will be determined by QEMU.
  *  @shared: map has RAM_SHARED flag.
@@ -21,12 +23,15 @@ size_t qemu_mempath_getpagesize(const char *mem_path);
  *  On success, return a pointer to the mapped area.
  *  On failure, return MAP_FAILED.
  */
-void *qemu_ram_mmap(int fd,
-                    size_t size,
-                    size_t align,
-                    bool shared,
-                    bool is_pmem);
-
-void qemu_ram_munmap(int fd, void *ptr, size_t size);
+void *qemu_ram_mmap_resizeable(int fd, size_t size, size_t max_size,
+                              size_t align, bool shared, bool is_pmem);
+bool qemu_ram_mmap_resize(void *ptr, int fd, size_t old_size, size_t new_size,
+                          bool shared, bool is_pmem);
+static inline void *qemu_ram_mmap(int fd, size_t size, size_t align,
+                                  bool shared, bool is_pmem)
+{
+    return qemu_ram_mmap_resizeable(fd, size, size, align, shared, is_pmem);
+}
+void qemu_ram_munmap(int fd, void *ptr, size_t max_size);
 
 #endif
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index c6a6075215..8f9c533f32 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -173,11 +173,8 @@ static inline size_t mmap_guard_pagesize(int fd)
 #endif
 }
 
-void *qemu_ram_mmap(int fd,
-                    size_t size,
-                    size_t align,
-                    bool shared,
-                    bool is_pmem)
+void *qemu_ram_mmap_resizeable(int fd, size_t size, size_t max_size,
+                               size_t align, bool shared, bool is_pmem)
 {
     const size_t guard_pagesize = mmap_guard_pagesize(fd);
     const size_t pagesize = qemu_fd_getpagesize(fd);
@@ -185,12 +182,14 @@ void *qemu_ram_mmap(int fd,
     void *ptr, *guardptr;
 
     g_assert(QEMU_IS_ALIGNED(size, pagesize));
+    g_assert(QEMU_IS_ALIGNED(max_size, pagesize));
 
     /*
      * Note: this always allocates at least one extra page of virtual address
-     * space, even if size is already aligned.
+     * space, even if the size is already aligned. We will reserve an area of
+     * at least max_size, but only populate the requested part of it.
      */
-    total = size + align;
+    total = max_size + align;
 
     guardptr = mmap_reserve(NULL, total, fd);
     if (guardptr == MAP_FAILED) {
@@ -218,21 +217,41 @@ void *qemu_ram_mmap(int fd,
      * a guard page guarding against potential buffer overflows.
      */
     total -= offset;
-    if (total > size + guard_pagesize) {
-        munmap(ptr + size + guard_pagesize, total - size - guard_pagesize);
+    if (total > max_size + guard_pagesize) {
+        munmap(ptr + max_size + guard_pagesize,
+               total - max_size - guard_pagesize);
     }
 
     return ptr;
 }
 
-void qemu_ram_munmap(int fd, void *ptr, size_t size)
+bool qemu_ram_mmap_resize(void *ptr, int fd, size_t old_size, size_t new_size,
+                          bool shared, bool is_pmem)
 {
     const size_t pagesize = qemu_fd_getpagesize(fd);
 
-    g_assert(QEMU_IS_ALIGNED(size, pagesize));
+    g_assert(QEMU_IS_ALIGNED(old_size, pagesize));
+    g_assert(QEMU_IS_ALIGNED(new_size, pagesize));
+
+    if (old_size < new_size) {
+        /* populate the missing piece into the reserved area */
+        ptr = mmap_populate(ptr + old_size, new_size - old_size, fd, old_size,
+                            shared, is_pmem);
+    } else if (old_size > new_size) {
+        /* discard this piece, marking it reserved */
+        ptr = mmap_reserve(ptr + new_size, old_size - new_size, fd);
+    }
+    return ptr != MAP_FAILED;
+}
+
+void qemu_ram_munmap(int fd, void *ptr, size_t max_size)
+{
+    const size_t pagesize = qemu_fd_getpagesize(fd);
+
+    g_assert(QEMU_IS_ALIGNED(max_size, pagesize));
 
     if (ptr) {
         /* Unmap both the RAM block and the guard page */
-        munmap(ptr, size + mmap_guard_pagesize(fd));
+        munmap(ptr, max_size + mmap_guard_pagesize(fd));
     }
 }
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v3 12/15] util: vfio-helpers: Implement ram_block_resized()
  2020-02-27 10:11 [PATCH v3 00/15] Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
                   ` (10 preceding siblings ...)
  2020-02-27 10:12 ` [PATCH v3 11/15] util/mmap-alloc: Implement " David Hildenbrand
@ 2020-02-27 10:12 ` David Hildenbrand
  2020-02-28 19:42   ` Peter Xu
  2020-02-28 19:55   ` Peter Xu
  2020-02-27 10:12 ` [PATCH v3 13/15] util: oslib: Resizeable anonymous allocations under POSIX David Hildenbrand
                   ` (2 subsequent siblings)
  14 siblings, 2 replies; 31+ messages in thread
From: David Hildenbrand @ 2020-02-27 10:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	David Hildenbrand, Dr . David Alan Gilbert, Peter Xu,
	Alex Williamson, Murilo Opsfelder Araujo, Igor Mammedov,
	Paolo Bonzini, Stefan Hajnoczi, Richard Henderson

Let's implement ram_block_resized(), allowing resizeable mappings.

For resizeable mappings, we reserve $max_size IOVA address space, but only
map $size of it. When resizing, we unmap the old part and remap the new
part. Doing this atomically (e.g., to resize while the guest is running)
would require a new ioctl, so such resizes are not allowed for now.

Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/trace-events   |  5 ++--
 util/vfio-helpers.c | 70 ++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 66 insertions(+), 9 deletions(-)

diff --git a/util/trace-events b/util/trace-events
index 83b6639018..88b7dbf4a5 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -74,8 +74,9 @@ qemu_mutex_unlock(void *mutex, const char *file, const int line) "released mutex
 
 # vfio-helpers.c
 qemu_vfio_dma_reset_temporary(void *s) "s %p"
-qemu_vfio_ram_block_added(void *s, void *p, size_t size) "s %p host %p size 0x%zx"
-qemu_vfio_ram_block_removed(void *s, void *p, size_t size) "s %p host %p size 0x%zx"
+qemu_vfio_ram_block_added(void *s, void *p, size_t size, size_t max_size) "s %p host %p size 0x%zx max_size 0x%zx"
+qemu_vfio_ram_block_removed(void *s, void *p, size_t size, size_t max_size) "s %p host %p size 0x%zx max_size 0x%zx"
+qemu_vfio_ram_block_resized(void *s, void *p, size_t old_size, size_t new_size) "s %p host %p old_size 0x%zx new_size 0x%zx"
 qemu_vfio_find_mapping(void *s, void *p) "s %p host %p"
 qemu_vfio_new_mapping(void *s, void *host, size_t size, int index, uint64_t iova) "s %p host %p size %zu index %d iova 0x%"PRIx64
 qemu_vfio_do_mapping(void *s, void *host, size_t size, uint64_t iova) "s %p host %p size %zu iova 0x%"PRIx64
diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index f0c77f0d69..566eb4e1eb 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -372,14 +372,20 @@ fail_container:
     return ret;
 }
 
+static int qemu_vfio_dma_map_resizeable(QEMUVFIOState *s, void *host,
+                                        size_t size, size_t max_size,
+                                        bool temporary, uint64_t *iova);
+static void qemu_vfio_dma_map_resize(QEMUVFIOState *s, void *host,
+                                     size_t old_size, size_t new_size);
+
 static void qemu_vfio_ram_block_added(RAMBlockNotifier *n, void *host,
                                       size_t size, size_t max_size)
 {
     QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
     int ret;
 
-    trace_qemu_vfio_ram_block_added(s, host, max_size);
-    ret = qemu_vfio_dma_map(s, host, max_size, false, NULL);
+    trace_qemu_vfio_ram_block_added(s, host, size, max_size);
+    ret = qemu_vfio_dma_map_resizeable(s, host, size, max_size, false, NULL);
     if (ret) {
         error_report("qemu_vfio_dma_map(%p, %zu) failed: %s", host, max_size,
                      strerror(-ret));
@@ -391,16 +397,28 @@ static void qemu_vfio_ram_block_removed(RAMBlockNotifier *n, void *host,
 {
     QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
     if (host) {
-        trace_qemu_vfio_ram_block_removed(s, host, max_size);
+        trace_qemu_vfio_ram_block_removed(s, host, size, max_size);
         qemu_vfio_dma_unmap(s, host);
     }
 }
 
+static void qemu_vfio_ram_block_resized(RAMBlockNotifier *n, void *host,
+                                        size_t old_size, size_t new_size)
+{
+    QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
+
+    if (host) {
+        trace_qemu_vfio_ram_block_resized(s, host, old_size, new_size);
+        qemu_vfio_dma_map_resize(s, host, old_size, new_size);
+    }
+}
+
 static void qemu_vfio_open_common(QEMUVFIOState *s)
 {
     qemu_mutex_init(&s->lock);
     s->ram_notifier.ram_block_added = qemu_vfio_ram_block_added;
     s->ram_notifier.ram_block_removed = qemu_vfio_ram_block_removed;
+    s->ram_notifier.ram_block_resized = qemu_vfio_ram_block_resized;
     s->low_water_mark = QEMU_VFIO_IOVA_MIN;
     s->high_water_mark = QEMU_VFIO_IOVA_MAX;
     ram_block_notifier_add(&s->ram_notifier);
@@ -597,9 +615,14 @@ static bool qemu_vfio_verify_mappings(QEMUVFIOState *s)
  * the result in @iova if not NULL. The caller need to make sure the area is
  * aligned to page size, and mustn't overlap with existing mapping areas (split
  * mapping status within this area is not allowed).
+ *
+ * If size < max_size, a region of max_size in IOVA address space is reserved, such
+ * that the mapping can later be resized. Resizeable mappings are only allowed
+ * for !temporary mappings.
  */
-int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
-                      bool temporary, uint64_t *iova)
+static int qemu_vfio_dma_map_resizeable(QEMUVFIOState *s, void *host,
+                                        size_t size, size_t max_size,
+                                        bool temporary, uint64_t *iova)
 {
     int ret = 0;
     int index;
@@ -608,13 +631,17 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
 
     assert(QEMU_PTR_IS_ALIGNED(host, qemu_real_host_page_size));
     assert(QEMU_IS_ALIGNED(size, qemu_real_host_page_size));
+    assert(QEMU_IS_ALIGNED(max_size, qemu_real_host_page_size));
+    assert(size == max_size || !temporary);
+    assert(size <= max_size);
+
     trace_qemu_vfio_dma_map(s, host, size, temporary, iova);
     qemu_mutex_lock(&s->lock);
     mapping = qemu_vfio_find_mapping(s, host, &index);
     if (mapping) {
         iova0 = mapping->iova + ((uint8_t *)host - (uint8_t *)mapping->host);
     } else {
-        if (s->high_water_mark - s->low_water_mark + 1 < size) {
+        if (s->high_water_mark - s->low_water_mark + 1 < max_size) {
             ret = -ENOMEM;
             goto out;
         }
@@ -631,7 +658,7 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
                 qemu_vfio_remove_mapping(s, mapping);
                 goto out;
             }
-            s->low_water_mark += size;
+            s->low_water_mark += max_size;
             qemu_vfio_dump_mappings(s);
         } else {
             iova0 = s->high_water_mark - size;
@@ -650,6 +677,12 @@ out:
     return ret;
 }
 
+int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
+                      bool temporary, uint64_t *iova)
+{
+    return qemu_vfio_dma_map_resizeable(s, host, size, size, temporary, iova);
+}
+
 /* Reset the high watermark and free all "temporary" mappings. */
 int qemu_vfio_dma_reset_temporary(QEMUVFIOState *s)
 {
@@ -694,6 +727,29 @@ out:
     qemu_mutex_unlock(&s->lock);
 }
 
+static void qemu_vfio_dma_map_resize(QEMUVFIOState *s, void *host,
+                                     size_t old_size, size_t new_size)
+{
+    IOVAMapping *m;
+    int index = 0;
+
+    qemu_mutex_lock(&s->lock);
+    m = qemu_vfio_find_mapping(s, host, &index);
+    if (!m) {
+        qemu_mutex_unlock(&s->lock);
+        return;
+    }
+    assert(m->size == old_size);
+    /* Note: Not atomic - we need a new ioctl for that. */
+    qemu_vfio_undo_mapping(s, m->iova, m->size);
+    qemu_vfio_do_mapping(s, host, m->iova, new_size);
+
+    m->size = new_size;
+    assert(qemu_vfio_verify_mappings(s));
+
+    qemu_mutex_unlock(&s->lock);
+}
+
 static void qemu_vfio_reset(QEMUVFIOState *s)
 {
     ioctl(s->device, VFIO_DEVICE_RESET);
-- 
2.24.1




* [PATCH v3 13/15] util: oslib: Resizeable anonymous allocations under POSIX
  2020-02-27 10:11 [PATCH v3 00/15] Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
                   ` (11 preceding siblings ...)
  2020-02-27 10:12 ` [PATCH v3 12/15] util: vfio-helpers: Implement ram_block_resized() David Hildenbrand
@ 2020-02-27 10:12 ` David Hildenbrand
  2020-02-28 20:11   ` Peter Xu
  2020-02-27 10:12 ` [PATCH v3 14/15] numa: Introduce ram_block_notifiers_support_resize() David Hildenbrand
  2020-02-27 10:12 ` [PATCH v3 15/15] exec: Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
  14 siblings, 1 reply; 31+ messages in thread
From: David Hildenbrand @ 2020-02-27 10:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Stefan Weil,
	Igor Kotrasinski, David Hildenbrand, Dr . David Alan Gilbert,
	Peter Xu, Murilo Opsfelder Araujo, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

Introduce qemu_anon_ram_alloc_resizeable() and qemu_anon_ram_resize().
Implement them under POSIX and make them return NULL under WIN32.

Under POSIX, we make use of resizeable mmaps. An implementation under
WIN32 should be possible as well and can be added later.

In qemu_anon_ram_free(), rename the size parameter to max_size, to make
it clearer that we have to use the maximum size when freeing resizeable
anonymous allocations.

Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Stefan Weil <sw@weilnetz.de>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/qemu/osdep.h |  6 +++++-
 util/oslib-posix.c   | 37 ++++++++++++++++++++++++++++++++++---
 util/oslib-win32.c   | 14 ++++++++++++++
 util/trace-events    |  4 +++-
 4 files changed, 56 insertions(+), 5 deletions(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 9bd3dcfd13..a1ea9e043d 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -311,8 +311,12 @@ int qemu_daemon(int nochdir, int noclose);
 void *qemu_try_memalign(size_t alignment, size_t size);
 void *qemu_memalign(size_t alignment, size_t size);
 void *qemu_anon_ram_alloc(size_t size, uint64_t *align, bool shared);
+void *qemu_anon_ram_alloc_resizeable(size_t size, size_t max_size,
+                                     uint64_t *align, bool shared);
+bool qemu_anon_ram_resize(void *ptr, size_t old_size, size_t new_size,
+                          bool shared);
 void qemu_vfree(void *ptr);
-void qemu_anon_ram_free(void *ptr, size_t size);
+void qemu_anon_ram_free(void *ptr, size_t max_size);
 
 #define QEMU_MADV_INVALID -1
 
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 897e8f3ba6..34b1ce74b7 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -223,16 +223,47 @@ void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
     return ptr;
 }
 
+void *qemu_anon_ram_alloc_resizeable(size_t size, size_t max_size,
+                                     uint64_t *alignment, bool shared)
+{
+    size_t align = QEMU_VMALLOC_ALIGN;
+    void *ptr = qemu_ram_mmap_resizeable(-1, size, max_size, align, shared,
+                                         false);
+
+    if (ptr == MAP_FAILED) {
+        return NULL;
+    }
+
+    if (alignment) {
+        *alignment = align;
+    }
+
+    trace_qemu_anon_ram_alloc_resizeable(size, max_size, ptr);
+    return ptr;
+}
+
+bool qemu_anon_ram_resize(void *ptr, size_t old_size, size_t new_size,
+                          bool shared)
+{
+    bool resized = qemu_ram_mmap_resize(ptr, -1, old_size, new_size, shared,
+                                        false);
+
+    if (resized) {
+        trace_qemu_anon_ram_resize(old_size, new_size, ptr);
+    }
+    return resized;
+}
+
 void qemu_vfree(void *ptr)
 {
     trace_qemu_vfree(ptr);
     free(ptr);
 }
 
-void qemu_anon_ram_free(void *ptr, size_t size)
+void qemu_anon_ram_free(void *ptr, size_t max_size)
 {
-    trace_qemu_anon_ram_free(ptr, size);
-    qemu_ram_munmap(-1, ptr, size);
+    trace_qemu_anon_ram_free(ptr, max_size);
+    qemu_ram_munmap(-1, ptr, max_size);
 }
 
 void qemu_set_block(int fd)
diff --git a/util/oslib-win32.c b/util/oslib-win32.c
index e9b14ab178..c034fdfe01 100644
--- a/util/oslib-win32.c
+++ b/util/oslib-win32.c
@@ -90,6 +90,20 @@ void *qemu_anon_ram_alloc(size_t size, uint64_t *align, bool shared)
     return ptr;
 }
 
+void *qemu_anon_ram_alloc_resizeable(size_t size, size_t max_size,
+                                     uint64_t *align, bool shared)
+{
+    /* resizeable ram not implemented yet */
+    return NULL;
+}
+
+bool qemu_anon_ram_resize(void *ptr, size_t old_size, size_t new_size,
+                          bool shared)
+{
+    /* resizeable ram not implemented yet */
+    return false;
+}
+
 void qemu_vfree(void *ptr)
 {
     trace_qemu_vfree(ptr);
diff --git a/util/trace-events b/util/trace-events
index 88b7dbf4a5..e4af970312 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -46,8 +46,10 @@ qemu_co_mutex_unlock_return(void *mutex, void *self) "mutex %p self %p"
 # oslib-posix.c
 qemu_memalign(size_t alignment, size_t size, void *ptr) "alignment %zu size %zu ptr %p"
 qemu_anon_ram_alloc(size_t size, void *ptr) "size %zu ptr %p"
+qemu_anon_ram_alloc_resizeable(size_t size, size_t max_size, void *ptr) "size %zu max_size %zu ptr %p"
+qemu_anon_ram_resize(size_t old_size, size_t new_size, void *ptr) "old_size %zu new_size %zu ptr %p"
 qemu_vfree(void *ptr) "ptr %p"
-qemu_anon_ram_free(void *ptr, size_t size) "ptr %p size %zu"
+qemu_anon_ram_free(void *ptr, size_t max_size) "ptr %p max_size %zu"
 
 # hbitmap.c
 hbitmap_iter_skip_words(const void *hb, void *hbi, uint64_t pos, unsigned long cur) "hb %p hbi %p pos %"PRId64" cur 0x%lx"
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v3 14/15] numa: Introduce ram_block_notifiers_support_resize()
  2020-02-27 10:11 [PATCH v3 00/15] Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
                   ` (12 preceding siblings ...)
  2020-02-27 10:12 ` [PATCH v3 13/15] util: oslib: Resizeable anonymous allocations under POSIX David Hildenbrand
@ 2020-02-27 10:12 ` David Hildenbrand
  2020-02-28 20:11   ` Peter Xu
  2020-02-27 10:12 ` [PATCH v3 15/15] exec: Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
  14 siblings, 1 reply; 31+ messages in thread
From: David Hildenbrand @ 2020-02-27 10:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	David Hildenbrand, Dr . David Alan Gilbert, Peter Xu,
	Murilo Opsfelder Araujo, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

We want to actually use resizeable allocations in resizeable ram blocks
(IOW, make everything between used_length and max_length inaccessible) -
however, not all ram block notifiers can support that.

Introduce a way to detect if any registered notifier does not
support resizes - ram_block_notifiers_support_resize() - which we can later
use to fall back to legacy handling if a registered notifier (esp., SEV and
HAX) does not support actual resizes.
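The detection itself is just a list walk over the registered notifiers. A standalone sketch (with a simplified struct standing in for QEMU's RAMBlockNotifier; names are illustrative only):

```c
#include <stddef.h>

/* Simplified stand-in for RAMBlockNotifier. */
typedef struct Notifier {
    void (*ram_block_added)(void *host, size_t size, size_t max_size);
    /* Notifiers that cannot handle resizes leave this hook NULL. */
    void (*ram_block_resized)(void *host, size_t old_size, size_t new_size);
} Notifier;

/* Used by the test below as a non-NULL resize hook. */
static void noop_resized(void *host, size_t old_size, size_t new_size)
{
    (void)host; (void)old_size; (void)new_size;
}

/* Resizes are only safe if every registered notifier implements the hook. */
static int notifiers_support_resize(const Notifier *list, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (!list[i].ram_block_resized) {
            return 0;
        }
    }
    return 1;
}
```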

Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/core/numa.c         | 12 ++++++++++++
 include/exec/ramlist.h |  1 +
 2 files changed, 13 insertions(+)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index 37ce175e13..1d5288c22c 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -914,3 +914,15 @@ void ram_block_notify_resize(void *host, size_t old_size, size_t new_size)
         }
     }
 }
+
+bool ram_block_notifiers_support_resize(void)
+{
+    RAMBlockNotifier *notifier;
+
+    QLIST_FOREACH(notifier, &ram_list.ramblock_notifiers, next) {
+        if (!notifier->ram_block_resized) {
+            return false;
+        }
+    }
+    return true;
+}
diff --git a/include/exec/ramlist.h b/include/exec/ramlist.h
index 293c0ddabe..ac5811be96 100644
--- a/include/exec/ramlist.h
+++ b/include/exec/ramlist.h
@@ -79,6 +79,7 @@ void ram_block_notifier_remove(RAMBlockNotifier *n);
 void ram_block_notify_add(void *host, size_t size, size_t max_size);
 void ram_block_notify_remove(void *host, size_t size, size_t max_size);
 void ram_block_notify_resize(void *host, size_t old_size, size_t new_size);
+bool ram_block_notifiers_support_resize(void);
 
 void ram_block_dump(Monitor *mon);
 
-- 
2.24.1




* [PATCH v3 15/15] exec: Ram blocks with resizeable anonymous allocations under POSIX
  2020-02-27 10:11 [PATCH v3 00/15] Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
                   ` (13 preceding siblings ...)
  2020-02-27 10:12 ` [PATCH v3 14/15] numa: Introduce ram_block_notifiers_support_resize() David Hildenbrand
@ 2020-02-27 10:12 ` David Hildenbrand
  2020-02-28 20:21   ` Peter Xu
  14 siblings, 1 reply; 31+ messages in thread
From: David Hildenbrand @ 2020-02-27 10:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Stefan Weil,
	Igor Kotrasinski, David Hildenbrand, Dr . David Alan Gilbert,
	Peter Xu, Shameerali Kolothum Thodi, Murilo Opsfelder Araujo,
	Igor Mammedov, Paolo Bonzini, Richard Henderson

We can now make use of resizeable anonymous allocations to implement
actually resizeable ram blocks. Resizeable anonymous allocations are
not implemented under WIN32 yet and are not available when using
alternative allocators. Fall back to the existing handling.

We also have to fall back to the existing handling in case any ram block
notifier does not support resizing (esp., AMD SEV, HAX) yet. Remember
in the RAM_RESIZEABLE_ALLOC flag whether we are using resizeable anonymous
allocations.

Try to grow early, as that can easily fail if out of memory. Shrink late
and ignore errors (nothing will actually break). Warn only.

The benefit of actually resizeable ram blocks is that e.g., under Linux,
only the actual size will be reserved (even if
"/proc/sys/vm/overcommit_memory" is set to "never"). Additional memory will
be reserved when trying to resize, which makes it possible to have ram blocks that
start small but can theoretically grow very large.

Note1: We are not able to create resizeable ram blocks with pre-allocated
       memory yet, so prealloc is not affected.
Note2: mlock should work as it used to as os_mlock() does a
       mlockall(MCL_CURRENT | MCL_FUTURE), which includes future
       mappings.
Note3: Nobody should access memory beyond used_length. Memory notifiers
       already properly take care of this, only ram block notifiers
       violate this constraint and, therefore, have to be special-cased.
       Especially, any ram block notifier that might dynamically
       register at runtime (e.g., vfio) has to support resizes. Add an
       assert for that. Both, HAX and SEV register early, so they are
       fine.
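The resulting ordering (grow before notifying, shrink after) can be modelled in isolation. Everything below is a simplified sketch of the control flow, not the patched qemu_ram_resize() itself:

```c
#include <stddef.h>

typedef struct Block {
    size_t used_length;
    size_t allocated;   /* what the (model) allocator has populated */
} Block;

/* Stand-in for qemu_anon_ram_resize(); always succeeds in this model. */
static int model_alloc_resize(Block *b, size_t new_size)
{
    b->allocated = new_size;
    return 1;
}

static int model_ram_resize(Block *b, size_t new_size)
{
    const size_t old_size = b->used_length;

    if (new_size > old_size && !model_alloc_resize(b, new_size)) {
        return -1;                     /* grow early: fail hard (-ENOMEM) */
    }
    b->used_length = new_size;         /* notifiers + bitmaps updated here */
    if (new_size < old_size) {
        (void)model_alloc_resize(b, new_size);  /* shrink late: warn only */
    }
    return 0;
}
```

Growing first guarantees notifiers never see a used_length that is not backed by accessible memory; shrinking last means a shrink failure leaves only an over-allocation, nothing breaks.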

Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Stefan Weil <sw@weilnetz.de>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 exec.c                    | 64 ++++++++++++++++++++++++++++++++++++---
 hw/core/numa.c            |  7 +++++
 include/exec/cpu-common.h |  2 ++
 include/exec/memory.h     |  8 +++++
 4 files changed, 76 insertions(+), 5 deletions(-)

diff --git a/exec.c b/exec.c
index cdf962a936..ba72f040e5 100644
--- a/exec.c
+++ b/exec.c
@@ -2001,6 +2001,16 @@ void qemu_ram_unset_migratable(RAMBlock *rb)
     rb->flags &= ~RAM_MIGRATABLE;
 }
 
+bool qemu_ram_is_resizeable(RAMBlock *rb)
+{
+    return rb->flags & RAM_RESIZEABLE;
+}
+
+bool qemu_ram_is_resizeable_alloc(RAMBlock *rb)
+{
+    return rb->flags & RAM_RESIZEABLE_ALLOC;
+}
+
 /* Called with iothread lock held.  */
 void qemu_ram_set_idstr(RAMBlock *new_block, const char *name, DeviceState *dev)
 {
@@ -2094,6 +2104,7 @@ static void qemu_ram_apply_settings(void *host, size_t length)
  */
 int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
 {
+    const bool shared = block->flags & RAM_SHARED;
     const ram_addr_t oldsize = block->used_length;
 
     assert(block);
@@ -2104,7 +2115,7 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
         return 0;
     }
 
-    if (!(block->flags & RAM_RESIZEABLE)) {
+    if (!qemu_ram_is_resizeable(block)) {
         error_setg_errno(errp, EINVAL,
                          "Length mismatch: %s: 0x" RAM_ADDR_FMT
                          " in != 0x" RAM_ADDR_FMT, block->idstr,
@@ -2120,6 +2131,15 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
         return -EINVAL;
     }
 
+    if (oldsize < newsize && qemu_ram_is_resizeable_alloc(block)) {
+        if (!qemu_anon_ram_resize(block->host, oldsize, newsize, shared)) {
+            error_setg_errno(errp, ENOMEM, "Cannot allocate enough memory");
+            return -ENOMEM;
+        }
+        /* apply settings for the newly accessible memory */
+        qemu_ram_apply_settings(block->host + oldsize, newsize - oldsize);
+    }
+
     /* Notify before modifying the ram block and touching the bitmaps. */
     if (block->host) {
         ram_block_notify_resize(block->host, oldsize, newsize);
@@ -2133,6 +2153,16 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
     if (block->resized) {
         block->resized(block->idstr, newsize, block->host);
     }
+
+    /*
+     * Shrinking will only fail in rare scenarios (e.g., maximum number of
+     * mappings reached), and can be ignored. Warn only.
+     */
+    if (newsize < oldsize && qemu_ram_is_resizeable_alloc(block) &&
+        !qemu_anon_ram_resize(block->host, oldsize, newsize, shared)) {
+        warn_report("Shrinking memory allocation failed.");
+    }
+
     return 0;
 }
 
@@ -2211,6 +2241,28 @@ static void dirty_memory_extend(ram_addr_t old_ram_size,
     }
 }
 
+static void ram_block_alloc_ram(RAMBlock *rb)
+{
+    const bool shared = qemu_ram_is_shared(rb);
+
+    /*
+     * If we can, try to allocate actually resizeable ram. Will also fail
+     * if qemu_anon_ram_alloc_resizeable() is not implemented.
+     */
+    if (phys_mem_alloc == qemu_anon_ram_alloc &&
+        qemu_ram_is_resizeable(rb) &&
+        ram_block_notifiers_support_resize()) {
+        rb->host = qemu_anon_ram_alloc_resizeable(rb->used_length,
+                                                  rb->max_length,
+                                                  &rb->mr->align, shared);
+        if (rb->host) {
+            rb->flags |= RAM_RESIZEABLE_ALLOC;
+            return;
+        }
+    }
+    rb->host = phys_mem_alloc(rb->max_length, &rb->mr->align, shared);
+}
+
 static void ram_block_add(RAMBlock *new_block, Error **errp)
 {
     RAMBlock *block;
@@ -2233,9 +2285,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
                 return;
             }
         } else {
-            new_block->host = phys_mem_alloc(new_block->max_length,
-                                             &new_block->mr->align,
-                                             qemu_ram_is_shared(new_block));
+            ram_block_alloc_ram(new_block);
             if (!new_block->host) {
                 error_setg_errno(errp, errno,
                                  "cannot set up guest memory '%s'",
@@ -2280,7 +2330,11 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
                                         DIRTY_CLIENTS_ALL);
 
     if (new_block->host) {
-        qemu_ram_apply_settings(new_block->host, new_block->max_length);
+        if (qemu_ram_is_resizeable_alloc(new_block)) {
+            qemu_ram_apply_settings(new_block->host, new_block->used_length);
+        } else {
+            qemu_ram_apply_settings(new_block->host, new_block->max_length);
+        }
         ram_block_notify_add(new_block->host, new_block->used_length,
                              new_block->max_length);
     }
diff --git a/hw/core/numa.c b/hw/core/numa.c
index 1d5288c22c..c547549e49 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -862,6 +862,13 @@ static int ram_block_notify_add_single(RAMBlock *rb, void *opaque)
     RAMBlockNotifier *notifier = opaque;
 
     if (host) {
+        /*
+         * Dynamically adding notifiers that don't support resizes is forbidden
+         * when dealing with resizeable ram blocks that have actually resizeable
+         * allocations.
+         */
+        g_assert(!qemu_ram_is_resizeable_alloc(rb) ||
+                 notifier->ram_block_resized);
         notifier->ram_block_added(notifier, host, size, max_size);
     }
     return 0;
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 09decb8d93..aacbf33b85 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -66,6 +66,8 @@ void qemu_ram_set_uf_zeroable(RAMBlock *rb);
 bool qemu_ram_is_migratable(RAMBlock *rb);
 void qemu_ram_set_migratable(RAMBlock *rb);
 void qemu_ram_unset_migratable(RAMBlock *rb);
+bool qemu_ram_is_resizeable(RAMBlock *rb);
+bool qemu_ram_is_resizeable_alloc(RAMBlock *rb);
 
 size_t qemu_ram_pagesize(RAMBlock *block);
 size_t qemu_ram_pagesize_largest(void);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index b9b9470a56..74805dd448 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -129,6 +129,14 @@ typedef struct IOMMUNotifier IOMMUNotifier;
 /* RAM is a persistent kind memory */
 #define RAM_PMEM (1 << 5)
 
+/*
+ * Implies RAM_RESIZEABLE. Memory beyond the used_length is inaccessible
+ * (esp. initially and after resizing). For such memory blocks, only the
+ * used_length is reserved in the OS - resizing might fail. Will only be
+ * used with host OS support and if all ram block notifiers support resizing.
+ */
+#define RAM_RESIZEABLE_ALLOC (1 << 6)
+
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
                                        IOMMUNotifierFlag flags,
                                        hwaddr start, hwaddr end,
-- 
2.24.1




* Re: [PATCH v3 12/15] util: vfio-helpers: Implement ram_block_resized()
  2020-02-27 10:12 ` [PATCH v3 12/15] util: vfio-helpers: Implement ram_block_resized() David Hildenbrand
@ 2020-02-28 19:42   ` Peter Xu
  2020-02-28 20:16     ` David Hildenbrand
  2020-02-28 19:55   ` Peter Xu
  1 sibling, 1 reply; 31+ messages in thread
From: Peter Xu @ 2020-02-28 19:42 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	Stefan Hajnoczi, qemu-devel, Dr . David Alan Gilbert,
	Alex Williamson, Murilo Opsfelder Araujo, Igor Mammedov,
	Paolo Bonzini, Richard Henderson

On Thu, Feb 27, 2020 at 11:12:02AM +0100, David Hildenbrand wrote:
> Let's implement ram_block_resized(), allowing resizeable mappings.
> 
> For resizeable mappings, we reserve $max_size IOVA address space, but only
> map $size of it. When resizing, unmap the old part and remap the new
> part. We'll need a new ioctl to do this atomically (e.g., to resize
> while the guest is running - not allowed for now).

Curious: I think it's true for now because resizing only happens
during reboot or destination VM during migration (but before
switching).  However is that guaranteed too in the future?

[...]

> @@ -631,7 +658,7 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
>                  qemu_vfio_remove_mapping(s, mapping);
>                  goto out;
>              }
> -            s->low_water_mark += size;
> +            s->low_water_mark += max_size;

I think it's fine to only increase the low water mark here, however
imo it would be better to also cache the max size in IOVAMapping too,
then in resize() we double check new_size <= max_size?  It also makes
IOVAMapping more self contained.
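The suggestion could look roughly like this (the max_size field is the proposed addition, the other fields mirror the existing IOVAMapping shape; all names here are illustrative):

```c
#include <stddef.h>
#include <stdint.h>

/* IOVAMapping additionally remembers the reserved span, so a resize
 * request can be validated against it locally. */
typedef struct IOVAMapping {
    void *host;
    uint64_t iova;
    size_t size;        /* currently mapped */
    size_t max_size;    /* IOVA space reserved at map time (suggested) */
} IOVAMapping;

/* A resize must stay within the IOVA range reserved when mapping. */
static int iova_mapping_can_resize(const IOVAMapping *m, size_t new_size)
{
    return new_size <= m->max_size;
}
```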

Thanks,

-- 
Peter Xu




* Re: [PATCH v3 07/15] util/mmap-alloc: Factor out calculation of the pagesize for the guard page
  2020-02-27 10:11 ` [PATCH v3 07/15] util/mmap-alloc: Factor out calculation of the pagesize for the guard page David Hildenbrand
@ 2020-02-28 19:43   ` Peter Xu
  0 siblings, 0 replies; 31+ messages in thread
From: Peter Xu @ 2020-02-28 19:43 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	qemu-devel, Dr . David Alan Gilbert, Greg Kurz,
	Murilo Opsfelder Araujo, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

On Thu, Feb 27, 2020 at 11:11:57AM +0100, David Hildenbrand wrote:
> Let's factor out calculating the size of the guard page and rename the
> variable to make it clearer that this pagesize only applies to the
> guard page.
> 
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
> Cc: Greg Kurz <groug@kaod.org>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu




* Re: [PATCH v3 11/15] util/mmap-alloc: Implement resizeable mmaps
  2020-02-27 10:12 ` [PATCH v3 11/15] util/mmap-alloc: Implement " David Hildenbrand
@ 2020-02-28 19:43   ` Peter Xu
  0 siblings, 0 replies; 31+ messages in thread
From: Peter Xu @ 2020-02-28 19:43 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	Richard Henderson, qemu-devel, Dr . David Alan Gilbert,
	Greg Kurz, Murilo Opsfelder Araujo, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

On Thu, Feb 27, 2020 at 11:12:01AM +0100, David Hildenbrand wrote:
> Implement resizeable mmaps. For now, the actual resizing is not wired up.
> Introduce qemu_ram_mmap_resizeable() and qemu_ram_mmap_resize(). Make
> qemu_ram_mmap() a wrapper of qemu_ram_mmap_resizeable().
> 
> Cc: Richard Henderson <richard.henderson@linaro.org>
> Cc: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
> Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Greg Kurz <groug@kaod.org>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu




* Re: [PATCH v3 12/15] util: vfio-helpers: Implement ram_block_resized()
  2020-02-27 10:12 ` [PATCH v3 12/15] util: vfio-helpers: Implement ram_block_resized() David Hildenbrand
  2020-02-28 19:42   ` Peter Xu
@ 2020-02-28 19:55   ` Peter Xu
  2020-02-28 20:19     ` David Hildenbrand
  1 sibling, 1 reply; 31+ messages in thread
From: Peter Xu @ 2020-02-28 19:55 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	Stefan Hajnoczi, qemu-devel, Dr . David Alan Gilbert,
	Alex Williamson, Murilo Opsfelder Araujo, Igor Mammedov,
	Paolo Bonzini, Richard Henderson

On Thu, Feb 27, 2020 at 11:12:02AM +0100, David Hildenbrand wrote:
> +static void qemu_vfio_dma_map_resize(QEMUVFIOState *s, void *host,
> +                                     size_t old_size, size_t new_size)
> +{
> +    IOVAMapping *m;
> +    int index = 0;
> +
> +    qemu_mutex_lock(&s->lock);
> +    m = qemu_vfio_find_mapping(s, host, &index);
> +    if (!m) {
> +        return;
> +    }
> +    assert(m->size == old_size);
> +
> +    /* Note: Not atomic - we need a new ioctl for that. */
> +    qemu_vfio_undo_mapping(s, m->iova, m->size);
> +    qemu_vfio_do_mapping(s, host, m->iova, new_size);

Another way to ask my previous question 1 (in the other reply): Can we
simply map/unmap the extra, while keeping the shared part untouched?

Thanks,

> +
> +    m->size = new_size;
> +    assert(qemu_vfio_verify_mappings(s));
> +
> +    qemu_mutex_unlock(&s->lock);
> +}

-- 
Peter Xu




* Re: [PATCH v3 13/15] util: oslib: Resizeable anonymous allocations under POSIX
  2020-02-27 10:12 ` [PATCH v3 13/15] util: oslib: Resizeable anonymous allocations under POSIX David Hildenbrand
@ 2020-02-28 20:11   ` Peter Xu
  0 siblings, 0 replies; 31+ messages in thread
From: Peter Xu @ 2020-02-28 20:11 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Michael S . Tsirkin, Stefan Weil,
	Igor Kotrasinski, qemu-devel, Dr . David Alan Gilbert,
	Murilo Opsfelder Araujo, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

On Thu, Feb 27, 2020 at 11:12:03AM +0100, David Hildenbrand wrote:
> Introduce qemu_anon_ram_alloc_resizeable() and qemu_anon_ram_resize().
> Implement them under POSIX and make them return NULL under WIN32.
> 
> Under POSIX, we make use of resizeable mmaps. An implementation under
> WIN32 is theoretically possible AFAIK and can be added later.
> 
> In qemu_anon_ram_free(), rename the size parameter to max_size, to make
> it clearer that we have to use the maximum size when freeing resizeable
> anonymous allocations.
> 
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: Stefan Weil <sw@weilnetz.de>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu




* Re: [PATCH v3 14/15] numa: Introduce ram_block_notifiers_support_resize()
  2020-02-27 10:12 ` [PATCH v3 14/15] numa: Introduce ram_block_notifiers_support_resize() David Hildenbrand
@ 2020-02-28 20:11   ` Peter Xu
  0 siblings, 0 replies; 31+ messages in thread
From: Peter Xu @ 2020-02-28 20:11 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	qemu-devel, Dr . David Alan Gilbert, Murilo Opsfelder Araujo,
	Igor Mammedov, Paolo Bonzini, Richard Henderson

On Thu, Feb 27, 2020 at 11:12:04AM +0100, David Hildenbrand wrote:
> We want to actually use resizeable allocations in resizeable ram blocks
> (IOW, make everything between used_length and max_length inaccessible) -
> however, not all ram block notifiers can support that.
> 
> Introduce a way to detect if any registered notifier does not
> support resizes - ram_block_notifiers_support_resize() - which we can later
> use to fall back to legacy handling if a registered notifier (esp., SEV and
> HAX) does not support actual resizes.
> 
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu




* Re: [PATCH v3 12/15] util: vfio-helpers: Implement ram_block_resized()
  2020-02-28 19:42   ` Peter Xu
@ 2020-02-28 20:16     ` David Hildenbrand
  2020-02-28 21:01       ` Peter Xu
  0 siblings, 1 reply; 31+ messages in thread
From: David Hildenbrand @ 2020-02-28 20:16 UTC (permalink / raw)
  To: Peter Xu
  Cc: Igor Kotrasinski, Eduardo Habkost, David Hildenbrand,
	Michael S . Tsirkin, Stefan Hajnoczi, qemu-devel,
	Dr . David Alan Gilbert, Alex Williamson,
	Murilo Opsfelder Araujo, Igor Mammedov, Paolo Bonzini,
	Richard Henderson



> Am 28.02.2020 um 20:43 schrieb Peter Xu <peterx@redhat.com>:
> 
> On Thu, Feb 27, 2020 at 11:12:02AM +0100, David Hildenbrand wrote:
>> Let's implement ram_block_resized(), allowing resizeable mappings.
>> 
>> For resizeable mappings, we reserve $max_size IOVA address space, but only
>> map $size of it. When resizing, unmap the old part and remap the new
>> part. We'll need a new ioctl to do this atomically (e.g., to resize
>> while the guest is running - not allowed for now).
> 
> Curious: I think it's true for now because resizing only happens
> during reboot or destination VM during migration (but before
> switching).  However is that guaranteed too in the future?
> 

E.g., virtio-mem will run mutually exclusive with vfio until vfio no longer blindly pins all guest memory (IOW, is compatible with memory overcommit and discarding of ram blocks).

> [...]
> 
>> @@ -631,7 +658,7 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
>>                 qemu_vfio_remove_mapping(s, mapping);
>>                 goto out;
>>             }
>> -            s->low_water_mark += size;
>> +            s->low_water_mark += max_size;
> 
> I think it's fine to only increase the low water mark here, however
> imo it would be better to also cache the max size in IOVAMapping too,
> then in resize() we double check new_size <= max_size?  It also makes
> IOVAMapping more self contained.
> 

I'll have a look at how much additional code that implies - if it's simple, I'll do it.

Thanks!

> Thanks,
> 
> -- 
> Peter Xu
> 




* Re: [PATCH v3 12/15] util: vfio-helpers: Implement ram_block_resized()
  2020-02-28 19:55   ` Peter Xu
@ 2020-02-28 20:19     ` David Hildenbrand
  2020-02-28 20:49       ` Peter Xu
  0 siblings, 1 reply; 31+ messages in thread
From: David Hildenbrand @ 2020-02-28 20:19 UTC (permalink / raw)
  To: Peter Xu
  Cc: Igor Kotrasinski, Eduardo Habkost, David Hildenbrand,
	Michael S . Tsirkin, Stefan Hajnoczi, qemu-devel,
	Dr . David Alan Gilbert, Alex Williamson,
	Murilo Opsfelder Araujo, Igor Mammedov, Paolo Bonzini,
	Richard Henderson



> Am 28.02.2020 um 20:55 schrieb Peter Xu <peterx@redhat.com>:
> 
> On Thu, Feb 27, 2020 at 11:12:02AM +0100, David Hildenbrand wrote:
>> +static void qemu_vfio_dma_map_resize(QEMUVFIOState *s, void *host,
>> +                                     size_t old_size, size_t new_size)
>> +{
>> +    IOVAMapping *m;
>> +    int index = 0;
>> +
>> +    qemu_mutex_lock(&s->lock);
>> +    m = qemu_vfio_find_mapping(s, host, &index);
>> +    if (!m) {
>> +        return;
>> +    }
>> +    assert(m->size == old_size);
>> +
>> +    /* Note: Not atomic - we need a new ioctl for that. */
>> +    qemu_vfio_undo_mapping(s, m->iova, m->size);
>> +    qemu_vfio_do_mapping(s, host, m->iova, new_size);
> 
> Another way to ask my previous question 1 (in the other reply): Can we
> simply map/unmap the extra, while keeping the shared part untouched?

As far as I understand the kernel implementation, unfortunately no. You might be able to grow (by mapping the delta), but shrinking is not possible AFAIR. And I *think* with many resizes, there could be an issue if I remember correctly.

Thanks!

> 
> Thanks,
> 
>> +
>> +    m->size = new_size;
>> +    assert(qemu_vfio_verify_mappings(s));
>> +
>> +    qemu_mutex_unlock(&s->lock);
>> +}
> 
> -- 
> Peter Xu
> 


* Re: [PATCH v3 15/15] exec: Ram blocks with resizeable anonymous allocations under POSIX
  2020-02-27 10:12 ` [PATCH v3 15/15] exec: Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
@ 2020-02-28 20:21   ` Peter Xu
  2020-03-02 15:27     ` David Hildenbrand
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Xu @ 2020-02-28 20:21 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Michael S . Tsirkin, Stefan Weil,
	Igor Kotrasinski, qemu-devel, Shameerali Kolothum Thodi,
	Dr . David Alan Gilbert, Murilo Opsfelder Araujo, Igor Mammedov,
	Paolo Bonzini, Richard Henderson

On Thu, Feb 27, 2020 at 11:12:05AM +0100, David Hildenbrand wrote:

[...]

> +static void ram_block_alloc_ram(RAMBlock *rb)
> +{
> +    const bool shared = qemu_ram_is_shared(rb);
> +
> +    /*
> +     * If we can, try to allocate actually resizeable ram. Will also fail
> +     * if qemu_anon_ram_alloc_resizeable() is not implemented.
> +     */
> +    if (phys_mem_alloc == qemu_anon_ram_alloc &&
> +        qemu_ram_is_resizeable(rb) &&
> +        ram_block_notifiers_support_resize()) {
> +        rb->host = qemu_anon_ram_alloc_resizeable(rb->used_length,
> +                                                  rb->max_length,
> +                                                  &rb->mr->align, shared);
> +        if (rb->host) {
> +            rb->flags |= RAM_RESIZEABLE_ALLOC;

A trivial nit: If it should only be set automatically by the memory
code, shall we mark it out somewhere just in case someone passes this
flag in explicitly (which, iiuc, is a misuse)?  Maybe:

  assert(!(rb->flags & RAM_RESIZEABLE_ALLOC));

At the entry of this function?  Other than that it looks sane to me:

Reviewed-by: Peter Xu <peterx@redhat.com>

> +            return;
> +        }
> +    }
> +    rb->host = phys_mem_alloc(rb->max_length, &rb->mr->align, shared);
> +}

Thanks,

-- 
Peter Xu




* Re: [PATCH v3 12/15] util: vfio-helpers: Implement ram_block_resized()
  2020-02-28 20:19     ` David Hildenbrand
@ 2020-02-28 20:49       ` Peter Xu
  2020-02-28 20:56         ` David Hildenbrand
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Xu @ 2020-02-28 20:49 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Igor Kotrasinski, Eduardo Habkost, David Hildenbrand,
	Michael S . Tsirkin, Stefan Hajnoczi, qemu-devel,
	Dr . David Alan Gilbert, Alex Williamson,
	Murilo Opsfelder Araujo, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

On Fri, Feb 28, 2020 at 03:19:45PM -0500, David Hildenbrand wrote:
> 
> 
> > Am 28.02.2020 um 20:55 schrieb Peter Xu <peterx@redhat.com>:
> > 
> > On Thu, Feb 27, 2020 at 11:12:02AM +0100, David Hildenbrand wrote:
> >> +static void qemu_vfio_dma_map_resize(QEMUVFIOState *s, void *host,
> >> +                                     size_t old_size, size_t new_size)
> >> +{
> >> +    IOVAMapping *m;
> >> +    int index = 0;
> >> +
> >> +    qemu_mutex_lock(&s->lock);
> >> +    m = qemu_vfio_find_mapping(s, host, &index);
> >> +    if (!m) {
> >> +        qemu_mutex_unlock(&s->lock);
> >> +        return;
> >> +    }
> >> +    assert(m->size == old_size);
> >> +
> >> +    /* Note: Not atomic - we need a new ioctl for that. */
> >> +    qemu_vfio_undo_mapping(s, m->iova, m->size);
> >> +    qemu_vfio_do_mapping(s, host, m->iova, new_size);
> > 
> > Another way to ask my previous question 1 (in the other reply): Can we
> > simply map/unmap the extra, while keep the shared untouched?
> 
> As far as I understand the kernel implementation, unfortunately no. You might be able to grow (by mapping the delta), but shrinking is not possible AFAIR. And IIRC, doing many such resizes could itself become an issue.

Ah right (and I think the same thing will happen to vfio-pci during
memory_region_set_size()).  Then please double confirm that virtio-mem
is disabled for both vfio-helper users and vfio-pci.  Also, would you
mind comment above the unmap+map with more information?  E.g.:

  For now, we must unmap the whole mapped range first and remap with
  the new size.  The reason is that VFIO_IOMMU_UNMAP_DMA might fail
  on partial unmaps of previous mappings, so even though we could add
  extra mappings to extend the old range, we would not always be able
  to shrink it again.  The side effect is that it is never safe to
  resize while the VM is executing.

Or something better.  With an enriched comment, I think it's fine:

Reviewed-by: Peter Xu <peterx@redhat.com>

Thanks,

-- 
Peter Xu
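The constraint behind the discussion above (unmap only succeeds for a full previously mapped range, so a resize must be a non-atomic unmap plus remap) can be illustrated with a toy model. All names here are invented for illustration; this is not the VFIO API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of one DMA mapping where, mirroring the VFIO type1
 * behavior discussed in the thread, unmap is all-or-nothing. */
typedef struct {
    uint64_t iova;
    size_t size;
    bool mapped;
} ToyMapping;

static int toy_unmap(ToyMapping *m, uint64_t iova, size_t size)
{
    if (!m->mapped || iova != m->iova || size != m->size) {
        return -1; /* partial unmap rejected */
    }
    m->mapped = false;
    return 0;
}

static int toy_map(ToyMapping *m, uint64_t iova, size_t size)
{
    if (m->mapped) {
        return -1;
    }
    m->iova = iova;
    m->size = size;
    m->mapped = true;
    return 0;
}

/* Resize = full unmap + remap at the same IOVA. Between the two calls
 * nothing is mapped, which is why resizing during VM execution (with
 * DMA possibly in flight) is unsafe. */
static int toy_resize(ToyMapping *m, size_t new_size)
{
    uint64_t iova = m->iova;
    if (toy_unmap(m, iova, m->size) < 0) {
        return -1;
    }
    return toy_map(m, iova, new_size);
}
```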




* Re: [PATCH v3 12/15] util: vfio-helpers: Implement ram_block_resized()
  2020-02-28 20:49       ` Peter Xu
@ 2020-02-28 20:56         ` David Hildenbrand
  0 siblings, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2020-02-28 20:56 UTC (permalink / raw)
  To: Peter Xu
  Cc: Igor Kotrasinski, Eduardo Habkost, David Hildenbrand,
	Michael S . Tsirkin, Stefan Hajnoczi, qemu-devel,
	Dr . David Alan Gilbert, Alex Williamson,
	Murilo Opsfelder Araujo, Igor Mammedov, Paolo Bonzini,
	Richard Henderson



> Am 28.02.2020 um 21:49 schrieb Peter Xu <peterx@redhat.com>:
> 
> On Fri, Feb 28, 2020 at 03:19:45PM -0500, David Hildenbrand wrote:
>> 
>> 
>>>> Am 28.02.2020 um 20:55 schrieb Peter Xu <peterx@redhat.com>:
>>> 
>>> On Thu, Feb 27, 2020 at 11:12:02AM +0100, David Hildenbrand wrote:
>>>> +static void qemu_vfio_dma_map_resize(QEMUVFIOState *s, void *host,
>>>> +                                     size_t old_size, size_t new_size)
>>>> +{
>>>> +    IOVAMapping *m;
>>>> +    int index = 0;
>>>> +
>>>> +    qemu_mutex_lock(&s->lock);
>>>> +    m = qemu_vfio_find_mapping(s, host, &index);
> >>>> +    if (!m) {
> >>>> +        qemu_mutex_unlock(&s->lock);
> >>>> +        return;
> >>>> +    }
>>>> +    assert(m->size == old_size);
>>>> +
>>>> +    /* Note: Not atomic - we need a new ioctl for that. */
>>>> +    qemu_vfio_undo_mapping(s, m->iova, m->size);
>>>> +    qemu_vfio_do_mapping(s, host, m->iova, new_size);
>>> 
>>> Another way to ask my previous question 1 (in the other reply): Can we
>>> simply map/unmap the extra, while keep the shared untouched?
>> 
>> As far as I understand the kernel implementation, unfortunately no. You might be able to grow (by mapping the delta), but shrinking is not possible AFAIR. And IIRC, doing many such resizes could itself become an issue.
> 
> Ah right (and I think the same thing will happen to vfio-pci during
> memory_region_set_size()).  Then please double confirm that virtio-mem
> is disabled for both vfio-helper users and vfio-pci.  Also, would you

Yes, will double check, should have taken care of both.

> mind comment above the unmap+map with more information?  E.g.:
> 

Sure, will add - thanks!

>  For now, we must unmap the whole mapped range first and remap with
>  the new size.  The reason is that VFIO_IOMMU_UNMAP_DMA might fail
>  on partial unmaps of previous mappings, so even though we could add
>  extra mappings to extend the old range, we would not always be able
>  to shrink it again.  The side effect is that it is never safe to
>  resize while the VM is executing.
> 
> Or something better.  With an enriched comment, I think it's fine:
> 
> Reviewed-by: Peter Xu <peterx@redhat.com>
> 

Thanks Peter! Have a nice weekend!




* Re: [PATCH v3 12/15] util: vfio-helpers: Implement ram_block_resized()
  2020-02-28 20:16     ` David Hildenbrand
@ 2020-02-28 21:01       ` Peter Xu
  2020-03-02 15:01         ` David Hildenbrand
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Xu @ 2020-02-28 21:01 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	Stefan Hajnoczi, qemu-devel, Dr . David Alan Gilbert,
	Alex Williamson, Murilo Opsfelder Araujo, Igor Mammedov,
	Paolo Bonzini, Richard Henderson

On Fri, Feb 28, 2020 at 09:16:28PM +0100, David Hildenbrand wrote:
> 

[...]

> >> @@ -631,7 +658,7 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
> >>                 qemu_vfio_remove_mapping(s, mapping);
> >>                 goto out;
> >>             }
> >> -            s->low_water_mark += size;
> >> +            s->low_water_mark += max_size;
> > 
> > I think it's fine to only increase the low water mark here, however
> > imo it would be better to also cache the max size in IOVAMapping too,
> > then in resize() we double check new_size <= max_size?  It also makes
> > IOVAMapping more self contained.
> > 
> 
> I'll have a look at how much additional code that implies - if it's simple, I'll do it.

AFAICT it'll be as simple as introducing IOVAMapping.max_size, then
pass max_size into qemu_vfio_add_mapping().  Thanks,
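Peter's suggestion amounts to something like the following sketch. The structure and function names are illustrative, not the actual QEMU vfio-helpers code: cache the maximum size in the mapping at map time, so a later resize can be validated locally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative mapping record: max_size remembers how much IOVA space
 * was set aside when the mapping was first created. */
typedef struct {
    uint64_t iova;
    size_t size;     /* currently mapped size */
    size_t max_size; /* IOVA space reserved at map time */
} IOVAMappingSketch;

/* A resize may only move within the reserved IOVA range; anything
 * larger would collide with mappings placed after it. */
static int iova_mapping_resize(IOVAMappingSketch *m, size_t new_size)
{
    if (new_size > m->max_size) {
        return -1;
    }
    m->size = new_size;
    return 0;
}
```

This pairs naturally with advancing the low water mark by max_size instead of size, as in the hunk quoted above: the IOVA space for the eventual maximum is reserved up front, and resizes merely move within it.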

-- 
Peter Xu




* Re: [PATCH v3 12/15] util: vfio-helpers: Implement ram_block_resized()
  2020-02-28 21:01       ` Peter Xu
@ 2020-03-02 15:01         ` David Hildenbrand
  0 siblings, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2020-03-02 15:01 UTC (permalink / raw)
  To: Peter Xu
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	Stefan Hajnoczi, qemu-devel, Dr . David Alan Gilbert,
	Alex Williamson, Murilo Opsfelder Araujo, Igor Mammedov,
	Paolo Bonzini, Richard Henderson

On 28.02.20 22:01, Peter Xu wrote:
> On Fri, Feb 28, 2020 at 09:16:28PM +0100, David Hildenbrand wrote:
>>
> 
> [...]
> 
>>>> @@ -631,7 +658,7 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
>>>>                 qemu_vfio_remove_mapping(s, mapping);
>>>>                 goto out;
>>>>             }
>>>> -            s->low_water_mark += size;
>>>> +            s->low_water_mark += max_size;
>>>
>>> I think it's fine to only increase the low water mark here, however
>>> imo it would be better to also cache the max size in IOVAMapping too,
>>> then in resize() we double check new_size <= max_size?  It also makes
>>> IOVAMapping more self contained.
>>>
>>
>> I'll have a look at how much additional code that implies - if it's simple, I'll do it.
> 
> AFAICT it'll be as simple as introducing IOVAMapping.max_size, then
> pass max_size into qemu_vfio_add_mapping().  Thanks,
> 

Yeah, was answering from my mobile without code at hand :) added! Thanks!

-- 
Thanks,

David / dhildenb




* Re: [PATCH v3 15/15] exec: Ram blocks with resizeable anonymous allocations under POSIX
  2020-02-28 20:21   ` Peter Xu
@ 2020-03-02 15:27     ` David Hildenbrand
  0 siblings, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2020-03-02 15:27 UTC (permalink / raw)
  To: Peter Xu
  Cc: Eduardo Habkost, Michael S . Tsirkin, Stefan Weil,
	Igor Kotrasinski, qemu-devel, Shameerali Kolothum Thodi,
	Dr . David Alan Gilbert, Murilo Opsfelder Araujo, Igor Mammedov,
	Paolo Bonzini, Richard Henderson

On 28.02.20 21:21, Peter Xu wrote:
> On Thu, Feb 27, 2020 at 11:12:05AM +0100, David Hildenbrand wrote:
> 
> [...]
> 
>> +static void ram_block_alloc_ram(RAMBlock *rb)
>> +{
>> +    const bool shared = qemu_ram_is_shared(rb);
>> +
>> +    /*
>> +     * If we can, try to allocate actually resizeable ram. Will also fail
>> +     * if qemu_anon_ram_alloc_resizeable() is not implemented.
>> +     */
>> +    if (phys_mem_alloc == qemu_anon_ram_alloc &&
>> +        qemu_ram_is_resizeable(rb) &&
>> +        ram_block_notifiers_support_resize()) {
>> +        rb->host = qemu_anon_ram_alloc_resizeable(rb->used_length,
>> +                                                  rb->max_length,
>> +                                                  &rb->mr->align, shared);
>> +        if (rb->host) {
>> +            rb->flags |= RAM_RESIZEABLE_ALLOC;
> 
> A trivial nit: If it should only be set automatically by the memory
> code, shall we catch it somewhere just in case someone passed this
> flag in explicitly (which, iiuc, is a misuse)?  Maybe:
> 
>   assert(!(rb->flags & RAM_RESIZEABLE_ALLOC));
> 
> At the entry of this function?  Other than that it looks sane to me:

Excellent idea, thanks!


-- 
Thanks,

David / dhildenb




* Re: [PATCH v3 09/15] util/mmap-alloc: Factor out populating of memory to mmap_populate()
  2020-02-27 10:11 ` [PATCH v3 09/15] util/mmap-alloc: Factor out populating of memory to mmap_populate() David Hildenbrand
@ 2020-03-03  8:43   ` David Hildenbrand
  0 siblings, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2020-03-03  8:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	Richard Henderson, Dr . David Alan Gilbert, Peter Xu, Greg Kurz,
	Murilo Opsfelder Araujo, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

On 27.02.20 11:11, David Hildenbrand wrote:
> We want to populate memory within a reserved memory region. Let's factor
> that out.
> 
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Cc: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Greg Kurz <groug@kaod.org>
> Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Thought a bit longer about the terminology (as brought up by Peter).
Populate might indeed be confusing, because we are not actually
populating all memory (e.g., as MAP_POPULATE would). We enable/activate
the mapping so it becomes accessible.

Hmmm

mmap_enable() ?

mmap_activate() ?

I think I like the latter. Alternatives?

-- 
Thanks,

David / dhildenb




end of thread, other threads:[~2020-03-03  8:44 UTC | newest]

Thread overview: 31+ messages
-- links below jump to the message on this page --
2020-02-27 10:11 [PATCH v3 00/15] Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
2020-02-27 10:11 ` [PATCH v3 01/15] util: vfio-helpers: Fix qemu_vfio_close() David Hildenbrand
2020-02-27 10:11 ` [PATCH v3 02/15] util: vfio-helpers: Remove Error parameter from qemu_vfio_undo_mapping() David Hildenbrand
2020-02-27 10:11 ` [PATCH v3 03/15] util: vfio-helpers: Factor out removal " David Hildenbrand
2020-02-27 10:11 ` [PATCH v3 04/15] exec: Factor out setting ram settings (madvise ...) into qemu_ram_apply_settings() David Hildenbrand
2020-02-27 10:11 ` [PATCH v3 05/15] exec: Reuse qemu_ram_apply_settings() in qemu_ram_remap() David Hildenbrand
2020-02-27 10:11 ` [PATCH v3 06/15] exec: Drop "shared" parameter from ram_block_add() David Hildenbrand
2020-02-27 10:11 ` [PATCH v3 07/15] util/mmap-alloc: Factor out calculation of the pagesize for the guard page David Hildenbrand
2020-02-28 19:43   ` Peter Xu
2020-02-27 10:11 ` [PATCH v3 08/15] util/mmap-alloc: Factor out reserving of a memory region to mmap_reserve() David Hildenbrand
2020-02-27 10:11 ` [PATCH v3 09/15] util/mmap-alloc: Factor out populating of memory to mmap_populate() David Hildenbrand
2020-03-03  8:43   ` David Hildenbrand
2020-02-27 10:12 ` [PATCH v3 10/15] util/mmap-alloc: Prepare for resizeable mmaps David Hildenbrand
2020-02-27 10:12 ` [PATCH v3 11/15] util/mmap-alloc: Implement " David Hildenbrand
2020-02-28 19:43   ` Peter Xu
2020-02-27 10:12 ` [PATCH v3 12/15] util: vfio-helpers: Implement ram_block_resized() David Hildenbrand
2020-02-28 19:42   ` Peter Xu
2020-02-28 20:16     ` David Hildenbrand
2020-02-28 21:01       ` Peter Xu
2020-03-02 15:01         ` David Hildenbrand
2020-02-28 19:55   ` Peter Xu
2020-02-28 20:19     ` David Hildenbrand
2020-02-28 20:49       ` Peter Xu
2020-02-28 20:56         ` David Hildenbrand
2020-02-27 10:12 ` [PATCH v3 13/15] util: oslib: Resizeable anonymous allocations under POSIX David Hildenbrand
2020-02-28 20:11   ` Peter Xu
2020-02-27 10:12 ` [PATCH v3 14/15] numa: Introduce ram_block_notifiers_support_resize() David Hildenbrand
2020-02-28 20:11   ` Peter Xu
2020-02-27 10:12 ` [PATCH v3 15/15] exec: Ram blocks with resizeable anonymous allocations under POSIX David Hildenbrand
2020-02-28 20:21   ` Peter Xu
2020-03-02 15:27     ` David Hildenbrand
