* [PATCH v2 fixed 00/16] Ram blocks with resizable anonymous allocations under POSIX
@ 2020-02-12 13:42 David Hildenbrand
  2020-02-12 13:42 ` [PATCH v2 fixed 01/16] util: vfio-helpers: Factor out and fix processing of existing ram blocks David Hildenbrand
                   ` (16 more replies)
  0 siblings, 17 replies; 46+ messages in thread
From: David Hildenbrand @ 2020-02-12 13:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Murilo Opsfelder Araujo, Igor Kotrasinski, Eduardo Habkost,
	Michael S . Tsirkin, Stefan Weil, David Hildenbrand,
	Richard Henderson, Dr . David Alan Gilbert,
	Shameerali Kolothum Thodi, Greg Kurz, Paul Durrant,
	Alex Williamson, Stefano Stabellini, Igor Mammedov,
	Anthony Perard, Paolo Bonzini, Stefan Hajnoczi,
	Richard Henderson

We already allow resizable ram blocks for anonymous memory; however, they
are not actually resized. All memory is mmap()ed R/W, including the memory
exceeding the used_length, up to the max_length.

When resizing, effectively only the boundary is moved. Implement actually
resizable anonymous allocations and make use of them in resizable ram
blocks when possible. Memory exceeding the used_length will be
inaccessible. Ram block notifiers, in particular, require care.

Having actually resizable anonymous allocations (via mmap hackery) allows
us to reserve a big region in virtual address space and grow the
accessible/usable part on demand. Even if "/proc/sys/vm/overcommit_memory"
is set to "never" under Linux, huge reservations will succeed. If there is
not enough memory when resizing (to populate parts of the reserved region),
trying to resize will fail. Only the actually used size is reserved in the
OS.
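
At its core this is the usual reserve-then-populate mmap() pattern. A
minimal sketch of the idea for the anonymous case (simplified; not the
actual QEMU helpers, variable names are illustrative):

    /* reserve max_size of virtual address space; inaccessible and not
     * consuming backing memory */
    void *ptr = mmap(NULL, max_size, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

    /* growing: populate (make accessible) only the first new_size bytes */
    mmap(ptr, new_size, PROT_READ | PROT_WRITE,
         MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

    /* shrinking: turn the tail back into an inaccessible reservation */
    mmap(ptr + new_size, old_size - new_size, PROT_NONE,
         MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE | MAP_FIXED, -1, 0);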

E.g., virtio-mem [1] wants to reserve big resizable memory regions and
grow the usable part on demand. I think this change is worth sending out
individually, accompanied by a bunch of minor fixes and cleanups.

Notably, memory notifiers already handle resizing by first removing
the old region and then re-adding the resized region. prealloc is
currently not possible with resizable ram blocks. mlock() should continue
to work as is. Resizing is currently rare and must only happen at the
start of an incoming migration, or during resets. No code path (except the
HAX and SEV ram block notifiers) should access memory outside of the usable
range - and if we ever find one, it has to be fixed (I did not
identify any).
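
For a notifier without native resize support, that remove + re-add
handling boils down to something like this (illustrative sketch only, in
terms of the existing RAMBlockNotifier add/remove callbacks; not code
from this series):

    static void notify_resize_by_readd(RAMBlockNotifier *n, void *host,
                                       size_t old_size, size_t new_size)
    {
        /* drop the old registration, then register the resized region */
        n->ram_block_removed(n, host, old_size);
        n->ram_block_added(n, host, new_size);
    }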

v1 -> v2:
- Add "util: vfio-helpers: Fix qemu_vfio_close()"
- Add "util: vfio-helpers: Remove Error parameter from
       qemu_vfio_undo_mapping()"
- Add "util: vfio-helpers: Factor out removal from
       qemu_vfio_undo_mapping()"
- "util/mmap-alloc: ..."
 -- Minor changes due to review feedback (e.g., assert alignment, return
    bool when resizing)
- "util: vfio-helpers: Implement ram_block_resized()"
 -- Reserve max_size in the IOVA address space.
 -- On resize, undo old mapping and do new mapping. We can later implement
    a new ioctl to resize the mapping directly.
- "numa: Teach ram block notifiers about resizable ram blocks"
 -- Pass size/max_size to ram block notifiers, which makes things easier and
    cleaner
- "exec: Ram blocks with resizable anonymous allocations under POSIX"
 -- Adapt to new ram block notifiers
 -- Shrink after notifying. Always trigger ram block notifiers on resizes
 -- Add a safety net that all ram block notifiers registered at runtime
    support resizes.

[1] https://lore.kernel.org/kvm/20191212171137.13872-1-david@redhat.com/

David Hildenbrand (16):
  util: vfio-helpers: Factor out and fix processing of existing ram
    blocks
  util: vfio-helpers: Fix qemu_vfio_close()
  util: vfio-helpers: Remove Error parameter from
    qemu_vfio_undo_mapping()
  util: vfio-helpers: Factor out removal from qemu_vfio_undo_mapping()
  exec: Factor out setting ram settings (madvise ...) into
    qemu_ram_apply_settings()
  exec: Reuse qemu_ram_apply_settings() in qemu_ram_remap()
  exec: Drop "shared" parameter from ram_block_add()
  util/mmap-alloc: Factor out calculation of pagesize to mmap_pagesize()
  util/mmap-alloc: Factor out reserving of a memory region to
    mmap_reserve()
  util/mmap-alloc: Factor out populating of memory to mmap_populate()
  util/mmap-alloc: Prepare for resizable mmaps
  util/mmap-alloc: Implement resizable mmaps
  numa: Teach ram block notifiers about resizable ram blocks
  util: vfio-helpers: Implement ram_block_resized()
  util: oslib: Resizable anonymous allocations under POSIX
  exec: Ram blocks with resizable anonymous allocations under POSIX

 exec.c                     | 104 +++++++++++++++++++----
 hw/core/numa.c             |  53 +++++++++++-
 hw/i386/xen/xen-mapcache.c |   7 +-
 include/exec/cpu-common.h  |   3 +
 include/exec/memory.h      |   8 ++
 include/exec/ramlist.h     |  14 +++-
 include/qemu/mmap-alloc.h  |  21 +++--
 include/qemu/osdep.h       |   6 +-
 stubs/ram-block.c          |  20 -----
 target/i386/hax-mem.c      |   5 +-
 target/i386/sev.c          |  18 ++--
 util/mmap-alloc.c          | 165 +++++++++++++++++++++++--------------
 util/oslib-posix.c         |  37 ++++++++-
 util/oslib-win32.c         |  14 ++++
 util/trace-events          |   9 +-
 util/vfio-helpers.c        | 145 +++++++++++++++++++++-----------
 16 files changed, 450 insertions(+), 179 deletions(-)

-- 
2.24.1




* [PATCH v2 fixed 01/16] util: vfio-helpers: Factor out and fix processing of existing ram blocks
  2020-02-12 13:42 [PATCH v2 fixed 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
@ 2020-02-12 13:42 ` David Hildenbrand
  2020-02-18 22:00   ` Peter Xu
  2020-02-12 13:42 ` [PATCH v2 fixed 02/16] util: vfio-helpers: Fix qemu_vfio_close() David Hildenbrand
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2020-02-12 13:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, David Hildenbrand,
	Stefan Hajnoczi, Dr . David Alan Gilbert, Alex Williamson,
	Igor Mammedov, Paolo Bonzini, Richard Henderson

Factor it out into common code when a new notifier is registered, just
as done with the memory region notifier. This allows us to have the
logic about how to process existing ram blocks in a central place (which
will be extended soon).

Just like when adding a new ram block, we have to register the max_length
for now. We don't have a way to get notified about resizes yet, and some
memory would not be mapped when growing the ram block.

Note: Currently, ram blocks are only "fake resized". All memory
(max_length) is accessible.

We can get rid of a bunch of functions in stubs/ram-block.c. Print the
warning from inside qemu_vfio_ram_block_added().

Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 exec.c                    |  5 +++++
 hw/core/numa.c            | 14 ++++++++++++++
 include/exec/cpu-common.h |  1 +
 stubs/ram-block.c         | 20 --------------------
 util/vfio-helpers.c       | 28 +++++++---------------------
 5 files changed, 27 insertions(+), 41 deletions(-)

diff --git a/exec.c b/exec.c
index 67e520d18e..05cfe868ab 100644
--- a/exec.c
+++ b/exec.c
@@ -2017,6 +2017,11 @@ ram_addr_t qemu_ram_get_used_length(RAMBlock *rb)
     return rb->used_length;
 }
 
+ram_addr_t qemu_ram_get_max_length(RAMBlock *rb)
+{
+    return rb->max_length;
+}
+
 bool qemu_ram_is_shared(RAMBlock *rb)
 {
     return rb->flags & RAM_SHARED;
diff --git a/hw/core/numa.c b/hw/core/numa.c
index 0d1b4be76a..6599c69e05 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -899,9 +899,23 @@ void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms)
     }
 }
 
+static int ram_block_notify_add_single(RAMBlock *rb, void *opaque)
+{
+    const ram_addr_t max_size = qemu_ram_get_max_length(rb);
+    void *host = qemu_ram_get_host_addr(rb);
+    RAMBlockNotifier *notifier = opaque;
+
+    if (host) {
+        notifier->ram_block_added(notifier, host, max_size);
+    }
+    return 0;
+}
+
 void ram_block_notifier_add(RAMBlockNotifier *n)
 {
     QLIST_INSERT_HEAD(&ram_list.ramblock_notifiers, n, next);
+    /* Notify about all existing ram blocks. */
+    qemu_ram_foreach_block(ram_block_notify_add_single, n);
 }
 
 void ram_block_notifier_remove(RAMBlockNotifier *n)
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 81753bbb34..9760ac9068 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -59,6 +59,7 @@ const char *qemu_ram_get_idstr(RAMBlock *rb);
 void *qemu_ram_get_host_addr(RAMBlock *rb);
 ram_addr_t qemu_ram_get_offset(RAMBlock *rb);
 ram_addr_t qemu_ram_get_used_length(RAMBlock *rb);
+ram_addr_t qemu_ram_get_max_length(RAMBlock *rb);
 bool qemu_ram_is_shared(RAMBlock *rb);
 bool qemu_ram_is_uf_zeroable(RAMBlock *rb);
 void qemu_ram_set_uf_zeroable(RAMBlock *rb);
diff --git a/stubs/ram-block.c b/stubs/ram-block.c
index 73c0a3ee08..10855b52dd 100644
--- a/stubs/ram-block.c
+++ b/stubs/ram-block.c
@@ -2,21 +2,6 @@
 #include "exec/ramlist.h"
 #include "exec/cpu-common.h"
 
-void *qemu_ram_get_host_addr(RAMBlock *rb)
-{
-    return 0;
-}
-
-ram_addr_t qemu_ram_get_offset(RAMBlock *rb)
-{
-    return 0;
-}
-
-ram_addr_t qemu_ram_get_used_length(RAMBlock *rb)
-{
-    return 0;
-}
-
 void ram_block_notifier_add(RAMBlockNotifier *n)
 {
 }
@@ -24,8 +9,3 @@ void ram_block_notifier_add(RAMBlockNotifier *n)
 void ram_block_notifier_remove(RAMBlockNotifier *n)
 {
 }
-
-int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
-{
-    return 0;
-}
diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 813f7ec564..71e02e7f35 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -376,8 +376,13 @@ static void qemu_vfio_ram_block_added(RAMBlockNotifier *n,
                                       void *host, size_t size)
 {
     QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
+    int ret;
+
     trace_qemu_vfio_ram_block_added(s, host, size);
-    qemu_vfio_dma_map(s, host, size, false, NULL);
+    ret = qemu_vfio_dma_map(s, host, size, false, NULL);
+    if (ret) {
+        error_report("qemu_vfio_dma_map(%p, %zu) failed: %d", host, size, ret);
+    }
 }
 
 static void qemu_vfio_ram_block_removed(RAMBlockNotifier *n,
@@ -390,33 +395,14 @@ static void qemu_vfio_ram_block_removed(RAMBlockNotifier *n,
     }
 }
 
-static int qemu_vfio_init_ramblock(RAMBlock *rb, void *opaque)
-{
-    void *host_addr = qemu_ram_get_host_addr(rb);
-    ram_addr_t length = qemu_ram_get_used_length(rb);
-    int ret;
-    QEMUVFIOState *s = opaque;
-
-    if (!host_addr) {
-        return 0;
-    }
-    ret = qemu_vfio_dma_map(s, host_addr, length, false, NULL);
-    if (ret) {
-        fprintf(stderr, "qemu_vfio_init_ramblock: failed %p %" PRId64 "\n",
-                host_addr, (uint64_t)length);
-    }
-    return 0;
-}
-
 static void qemu_vfio_open_common(QEMUVFIOState *s)
 {
     qemu_mutex_init(&s->lock);
     s->ram_notifier.ram_block_added = qemu_vfio_ram_block_added;
     s->ram_notifier.ram_block_removed = qemu_vfio_ram_block_removed;
-    ram_block_notifier_add(&s->ram_notifier);
     s->low_water_mark = QEMU_VFIO_IOVA_MIN;
     s->high_water_mark = QEMU_VFIO_IOVA_MAX;
-    qemu_ram_foreach_block(qemu_vfio_init_ramblock, s);
+    ram_block_notifier_add(&s->ram_notifier);
 }
 
 /**
-- 
2.24.1




* [PATCH v2 fixed 02/16] util: vfio-helpers: Fix qemu_vfio_close()
  2020-02-12 13:42 [PATCH v2 fixed 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
  2020-02-12 13:42 ` [PATCH v2 fixed 01/16] util: vfio-helpers: Factor out and fix processing of existing ram blocks David Hildenbrand
@ 2020-02-12 13:42 ` David Hildenbrand
  2020-02-18 22:00   ` Peter Xu
  2020-02-12 13:42 ` [PATCH v2 fixed 03/16] util: vfio-helpers: Remove Error parameter from qemu_vfio_undo_mapping() David Hildenbrand
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2020-02-12 13:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, David Hildenbrand,
	Stefan Hajnoczi, Dr . David Alan Gilbert, Alex Williamson,
	Igor Mammedov, Paolo Bonzini, Richard Henderson

qemu_vfio_undo_mapping() will decrement the number of mappings and
reshuffle the array elements to fit into the reduced size.

Iterating over all elements like this does not work as expected; let's make
sure to remove all mappings properly.
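
The problem is the classic remove-while-iterating-forward pattern: every
removal compacts the array, so the element that slides into slot i is
skipped once i is incremented. Roughly (illustration only, not QEMU code):

    /* with entries [A, B, C, D]: i == 0 removes A and B slides into slot 0;
     * i == 1 then removes C, skipping B -- B and D are never removed */
    for (i = 0; i < nr; ++i) {
        remove_and_compact(&array[i]); /* shifts array[i+1..] down, nr-- */
    }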

Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/vfio-helpers.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 71e02e7f35..d6332522c1 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -694,13 +694,11 @@ static void qemu_vfio_reset(QEMUVFIOState *s)
 /* Close and free the VFIO resources. */
 void qemu_vfio_close(QEMUVFIOState *s)
 {
-    int i;
-
     if (!s) {
         return;
     }
-    for (i = 0; i < s->nr_mappings; ++i) {
-        qemu_vfio_undo_mapping(s, &s->mappings[i], NULL);
+    while (s->nr_mappings) {
+        qemu_vfio_undo_mapping(s, &s->mappings[s->nr_mappings - 1], NULL);
     }
     ram_block_notifier_remove(&s->ram_notifier);
     qemu_vfio_reset(s);
-- 
2.24.1




* [PATCH v2 fixed 03/16] util: vfio-helpers: Remove Error parameter from qemu_vfio_undo_mapping()
  2020-02-12 13:42 [PATCH v2 fixed 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
  2020-02-12 13:42 ` [PATCH v2 fixed 01/16] util: vfio-helpers: Factor out and fix processing of existing ram blocks David Hildenbrand
  2020-02-12 13:42 ` [PATCH v2 fixed 02/16] util: vfio-helpers: Fix qemu_vfio_close() David Hildenbrand
@ 2020-02-12 13:42 ` David Hildenbrand
  2020-02-18 22:07   ` Peter Xu
  2020-02-12 13:42 ` [PATCH v2 fixed 04/16] util: vfio-helpers: Factor out removal " David Hildenbrand
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2020-02-12 13:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, David Hildenbrand,
	Stefan Hajnoczi, Dr . David Alan Gilbert, Alex Williamson,
	Igor Mammedov, Paolo Bonzini, Richard Henderson

Every caller discards the error. Let's use error_report() instead so this error
doesn't get lost.

Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/vfio-helpers.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index d6332522c1..13dd962d95 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -540,8 +540,7 @@ static int qemu_vfio_do_mapping(QEMUVFIOState *s, void *host, size_t size,
 /**
  * Undo the DMA mapping from @s with VFIO, and remove from mapping list.
  */
-static void qemu_vfio_undo_mapping(QEMUVFIOState *s, IOVAMapping *mapping,
-                                   Error **errp)
+static void qemu_vfio_undo_mapping(QEMUVFIOState *s, IOVAMapping *mapping)
 {
     int index;
     struct vfio_iommu_type1_dma_unmap unmap = {
@@ -556,7 +555,7 @@ static void qemu_vfio_undo_mapping(QEMUVFIOState *s, IOVAMapping *mapping,
     assert(QEMU_IS_ALIGNED(mapping->size, qemu_real_host_page_size));
     assert(index >= 0 && index < s->nr_mappings);
     if (ioctl(s->container, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
-        error_setg(errp, "VFIO_UNMAP_DMA failed: %d", -errno);
+        error_report("VFIO_UNMAP_DMA failed: %d", -errno);
     }
     memmove(mapping, &s->mappings[index + 1],
             sizeof(s->mappings[0]) * (s->nr_mappings - index - 1));
@@ -621,7 +620,7 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
             assert(qemu_vfio_verify_mappings(s));
             ret = qemu_vfio_do_mapping(s, host, size, iova0);
             if (ret) {
-                qemu_vfio_undo_mapping(s, mapping, NULL);
+                qemu_vfio_undo_mapping(s, mapping);
                 goto out;
             }
             s->low_water_mark += size;
@@ -681,7 +680,7 @@ void qemu_vfio_dma_unmap(QEMUVFIOState *s, void *host)
     if (!m) {
         goto out;
     }
-    qemu_vfio_undo_mapping(s, m, NULL);
+    qemu_vfio_undo_mapping(s, m);
 out:
     qemu_mutex_unlock(&s->lock);
 }
@@ -698,7 +697,7 @@ void qemu_vfio_close(QEMUVFIOState *s)
         return;
     }
     while (s->nr_mappings) {
-        qemu_vfio_undo_mapping(s, &s->mappings[s->nr_mappings - 1], NULL);
+        qemu_vfio_undo_mapping(s, &s->mappings[s->nr_mappings - 1]);
     }
     ram_block_notifier_remove(&s->ram_notifier);
     qemu_vfio_reset(s);
-- 
2.24.1




* [PATCH v2 fixed 04/16] util: vfio-helpers: Factor out removal from qemu_vfio_undo_mapping()
  2020-02-12 13:42 [PATCH v2 fixed 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
                   ` (2 preceding siblings ...)
  2020-02-12 13:42 ` [PATCH v2 fixed 03/16] util: vfio-helpers: Remove Error parameter from qemu_vfio_undo_mapping() David Hildenbrand
@ 2020-02-12 13:42 ` David Hildenbrand
  2020-02-18 22:10   ` Peter Xu
  2020-02-12 13:42 ` [PATCH v2 fixed 05/16] exec: Factor out setting ram settings (madvise ...) into qemu_ram_apply_settings() David Hildenbrand
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2020-02-12 13:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, David Hildenbrand,
	Stefan Hajnoczi, Dr . David Alan Gilbert, Alex Williamson,
	Igor Mammedov, Paolo Bonzini, Richard Henderson

Factor it out and properly use it where applicable. Make
qemu_vfio_undo_mapping() look like qemu_vfio_do_mapping(), passing the
size and iova, not the mapping.

Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/vfio-helpers.c | 43 +++++++++++++++++++++++++++----------------
 1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 13dd962d95..b3adc328db 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -517,6 +517,20 @@ static IOVAMapping *qemu_vfio_add_mapping(QEMUVFIOState *s,
     return insert;
 }
 
+/**
+ * Remove the mapping from @s and free it.
+ */
+static void qemu_vfio_remove_mapping(QEMUVFIOState *s, IOVAMapping *mapping)
+{
+    const int index = mapping - s->mappings;
+
+    assert(index >= 0 && index < s->nr_mappings);
+    memmove(mapping, &s->mappings[index + 1],
+            sizeof(s->mappings[0]) * (s->nr_mappings - index - 1));
+    s->nr_mappings--;
+    s->mappings = g_renew(IOVAMapping, s->mappings, s->nr_mappings);
+}
+
 /* Do the DMA mapping with VFIO. */
 static int qemu_vfio_do_mapping(QEMUVFIOState *s, void *host, size_t size,
                                 uint64_t iova)
@@ -538,29 +552,22 @@ static int qemu_vfio_do_mapping(QEMUVFIOState *s, void *host, size_t size,
 }
 
 /**
- * Undo the DMA mapping from @s with VFIO, and remove from mapping list.
+ * Undo the DMA mapping from @s with VFIO.
  */
-static void qemu_vfio_undo_mapping(QEMUVFIOState *s, IOVAMapping *mapping)
+static void qemu_vfio_undo_mapping(QEMUVFIOState *s, size_t size, uint64_t iova)
 {
-    int index;
     struct vfio_iommu_type1_dma_unmap unmap = {
         .argsz = sizeof(unmap),
         .flags = 0,
-        .iova = mapping->iova,
-        .size = mapping->size,
+        .iova = iova,
+        .size = size,
     };
 
-    index = mapping - s->mappings;
-    assert(mapping->size > 0);
-    assert(QEMU_IS_ALIGNED(mapping->size, qemu_real_host_page_size));
-    assert(index >= 0 && index < s->nr_mappings);
+    assert(size > 0);
+    assert(QEMU_IS_ALIGNED(size, qemu_real_host_page_size));
     if (ioctl(s->container, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
         error_report("VFIO_UNMAP_DMA failed: %d", -errno);
     }
-    memmove(mapping, &s->mappings[index + 1],
-            sizeof(s->mappings[0]) * (s->nr_mappings - index - 1));
-    s->nr_mappings--;
-    s->mappings = g_renew(IOVAMapping, s->mappings, s->nr_mappings);
 }
 
 /* Check if the mapping list is (ascending) ordered. */
@@ -620,7 +627,7 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
             assert(qemu_vfio_verify_mappings(s));
             ret = qemu_vfio_do_mapping(s, host, size, iova0);
             if (ret) {
-                qemu_vfio_undo_mapping(s, mapping);
+                qemu_vfio_remove_mapping(s, mapping);
                 goto out;
             }
             s->low_water_mark += size;
@@ -680,7 +687,8 @@ void qemu_vfio_dma_unmap(QEMUVFIOState *s, void *host)
     if (!m) {
         goto out;
     }
-    qemu_vfio_undo_mapping(s, m);
+    qemu_vfio_undo_mapping(s, m->size, m->iova);
+    qemu_vfio_remove_mapping(s, m);
 out:
     qemu_mutex_unlock(&s->lock);
 }
@@ -697,7 +705,10 @@ void qemu_vfio_close(QEMUVFIOState *s)
         return;
     }
     while (s->nr_mappings) {
-        qemu_vfio_undo_mapping(s, &s->mappings[s->nr_mappings - 1]);
+        IOVAMapping *m = &s->mappings[s->nr_mappings - 1];
+
+        qemu_vfio_undo_mapping(s, m->size, m->iova);
+        qemu_vfio_remove_mapping(s, m);
     }
     ram_block_notifier_remove(&s->ram_notifier);
     qemu_vfio_reset(s);
-- 
2.24.1




* [PATCH v2 fixed 05/16] exec: Factor out setting ram settings (madvise ...) into qemu_ram_apply_settings()
  2020-02-12 13:42 [PATCH v2 fixed 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
                   ` (3 preceding siblings ...)
  2020-02-12 13:42 ` [PATCH v2 fixed 04/16] util: vfio-helpers: Factor out removal " David Hildenbrand
@ 2020-02-12 13:42 ` David Hildenbrand
  2020-02-18 22:10   ` Peter Xu
  2020-02-12 13:42 ` [PATCH v2 fixed 06/16] exec: Reuse qemu_ram_apply_settings() in qemu_ram_remap() David Hildenbrand
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2020-02-12 13:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, David Hildenbrand,
	Richard Henderson, Dr . David Alan Gilbert, Igor Mammedov,
	Paolo Bonzini, Richard Henderson

Factor all settings out into qemu_ram_apply_settings().

For memory_try_enable_merging(), the important bit is that it won't be
called under Xen - which is still the case, as new_block->host will
remain NULL.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 exec.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/exec.c b/exec.c
index 05cfe868ab..31a462a7d3 100644
--- a/exec.c
+++ b/exec.c
@@ -2121,6 +2121,15 @@ static int memory_try_enable_merging(void *addr, size_t len)
     return qemu_madvise(addr, len, QEMU_MADV_MERGEABLE);
 }
 
+static void qemu_ram_apply_settings(void *host, size_t length)
+{
+    memory_try_enable_merging(host, length);
+    qemu_ram_setup_dump(host, length);
+    qemu_madvise(host, length, QEMU_MADV_HUGEPAGE);
+    /* MADV_DONTFORK is also needed by KVM in absence of synchronous MMU */
+    qemu_madvise(host, length, QEMU_MADV_DONTFORK);
+}
+
 /* Only legal before guest might have detected the memory size: e.g. on
  * incoming migration, or right after reset.
  *
@@ -2271,7 +2280,6 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
                 qemu_mutex_unlock_ramlist();
                 return;
             }
-            memory_try_enable_merging(new_block->host, new_block->max_length);
         }
     }
 
@@ -2309,10 +2317,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
                                         DIRTY_CLIENTS_ALL);
 
     if (new_block->host) {
-        qemu_ram_setup_dump(new_block->host, new_block->max_length);
-        qemu_madvise(new_block->host, new_block->max_length, QEMU_MADV_HUGEPAGE);
-        /* MADV_DONTFORK is also needed by KVM in absence of synchronous MMU */
-        qemu_madvise(new_block->host, new_block->max_length, QEMU_MADV_DONTFORK);
+        qemu_ram_apply_settings(new_block->host, new_block->max_length);
         ram_block_notify_add(new_block->host, new_block->max_length);
     }
 }
-- 
2.24.1




* [PATCH v2 fixed 06/16] exec: Reuse qemu_ram_apply_settings() in qemu_ram_remap()
  2020-02-12 13:42 [PATCH v2 fixed 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
                   ` (4 preceding siblings ...)
  2020-02-12 13:42 ` [PATCH v2 fixed 05/16] exec: Factor out setting ram settings (madvise ...) into qemu_ram_apply_settings() David Hildenbrand
@ 2020-02-12 13:42 ` David Hildenbrand
  2020-02-18 22:11   ` Peter Xu
  2020-02-12 13:42 ` [PATCH v2 fixed 07/16] exec: Drop "shared" parameter from ram_block_add() David Hildenbrand
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2020-02-12 13:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, David Hildenbrand,
	Richard Henderson, Dr . David Alan Gilbert, Igor Mammedov,
	Paolo Bonzini, Richard Henderson

I don't see why we shouldn't apply all settings to make it look like the
surrounding RAM (and enable proper VMA merging).

Note: memory backend settings might have overridden these settings. We
would need a callback to let the memory backend fix that up.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 exec.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/exec.c b/exec.c
index 31a462a7d3..f7525867ec 100644
--- a/exec.c
+++ b/exec.c
@@ -2552,8 +2552,7 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
                                  length, addr);
                     exit(1);
                 }
-                memory_try_enable_merging(vaddr, length);
-                qemu_ram_setup_dump(vaddr, length);
+                qemu_ram_apply_settings(vaddr, length);
             }
         }
     }
-- 
2.24.1




* [PATCH v2 fixed 07/16] exec: Drop "shared" parameter from ram_block_add()
  2020-02-12 13:42 [PATCH v2 fixed 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
                   ` (5 preceding siblings ...)
  2020-02-12 13:42 ` [PATCH v2 fixed 06/16] exec: Reuse qemu_ram_apply_settings() in qemu_ram_remap() David Hildenbrand
@ 2020-02-12 13:42 ` David Hildenbrand
  2020-02-18 22:12   ` Peter Xu
  2020-02-12 13:42 ` [PATCH v2 fixed 08/16] util/mmap-alloc: Factor out calculation of pagesize to mmap_pagesize() David Hildenbrand
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2020-02-12 13:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Igor Kotrasinski, Eduardo Habkost, Michael S . Tsirkin,
	David Hildenbrand, Richard Henderson, Dr . David Alan Gilbert,
	Igor Mammedov, Paolo Bonzini, Richard Henderson

Properly store it in the flags of the ram block instead (the flag
already exists and is used).

E.g., qemu_ram_is_shared() now properly succeeds on all ram blocks that are
actually shared.

Reviewed-by: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 exec.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/exec.c b/exec.c
index f7525867ec..fc65c4f7ca 100644
--- a/exec.c
+++ b/exec.c
@@ -2249,7 +2249,7 @@ static void dirty_memory_extend(ram_addr_t old_ram_size,
     }
 }
 
-static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
+static void ram_block_add(RAMBlock *new_block, Error **errp)
 {
     RAMBlock *block;
     RAMBlock *last_block = NULL;
@@ -2272,7 +2272,8 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
             }
         } else {
             new_block->host = phys_mem_alloc(new_block->max_length,
-                                             &new_block->mr->align, shared);
+                                             &new_block->mr->align,
+                                             qemu_ram_is_shared(new_block));
             if (!new_block->host) {
                 error_setg_errno(errp, errno,
                                  "cannot set up guest memory '%s'",
@@ -2376,7 +2377,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
         return NULL;
     }
 
-    ram_block_add(new_block, &local_err, ram_flags & RAM_SHARED);
+    ram_block_add(new_block, &local_err);
     if (local_err) {
         g_free(new_block);
         error_propagate(errp, local_err);
@@ -2438,10 +2439,13 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
     if (host) {
         new_block->flags |= RAM_PREALLOC;
     }
+    if (share) {
+        new_block->flags |= RAM_SHARED;
+    }
     if (resizeable) {
         new_block->flags |= RAM_RESIZEABLE;
     }
-    ram_block_add(new_block, &local_err, share);
+    ram_block_add(new_block, &local_err);
     if (local_err) {
         g_free(new_block);
         error_propagate(errp, local_err);
-- 
2.24.1




* [PATCH v2 fixed 08/16] util/mmap-alloc: Factor out calculation of pagesize to mmap_pagesize()
  2020-02-12 13:42 [PATCH v2 fixed 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
                   ` (6 preceding siblings ...)
  2020-02-12 13:42 ` [PATCH v2 fixed 07/16] exec: Drop "shared" parameter from ram_block_add() David Hildenbrand
@ 2020-02-12 13:42 ` David Hildenbrand
  2020-02-19 22:46   ` Peter Xu
  2020-02-12 13:42 ` [PATCH v2 fixed 09/16] util/mmap-alloc: Factor out reserving of a memory region to mmap_reserve() David Hildenbrand
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2020-02-12 13:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Igor Kotrasinski, Eduardo Habkost, Michael S . Tsirkin,
	David Hildenbrand, Richard Henderson, Dr . David Alan Gilbert,
	Greg Kurz, Murilo Opsfelder Araujo, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

Factor it out and add a comment.

Reviewed-by: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Cc: Greg Kurz <groug@kaod.org>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/mmap-alloc.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 27dcccd8ec..82f02a2cec 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -82,17 +82,27 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
     return qemu_real_host_page_size;
 }
 
+static inline size_t mmap_pagesize(int fd)
+{
+#if defined(__powerpc64__) && defined(__linux__)
+    /* Mappings in the same segment must share the same page size */
+    return qemu_fd_getpagesize(fd);
+#else
+    return qemu_real_host_page_size;
+#endif
+}
+
 void *qemu_ram_mmap(int fd,
                     size_t size,
                     size_t align,
                     bool shared,
                     bool is_pmem)
 {
+    const size_t pagesize = mmap_pagesize(fd);
     int flags;
     int map_sync_flags = 0;
     int guardfd;
     size_t offset;
-    size_t pagesize;
     size_t total;
     void *guardptr;
     void *ptr;
@@ -113,7 +123,6 @@ void *qemu_ram_mmap(int fd,
      * anonymous memory is OK.
      */
     flags = MAP_PRIVATE;
-    pagesize = qemu_fd_getpagesize(fd);
     if (fd == -1 || pagesize == qemu_real_host_page_size) {
         guardfd = -1;
         flags |= MAP_ANONYMOUS;
@@ -123,7 +132,6 @@ void *qemu_ram_mmap(int fd,
     }
 #else
     guardfd = -1;
-    pagesize = qemu_real_host_page_size;
     flags = MAP_PRIVATE | MAP_ANONYMOUS;
 #endif
 
@@ -198,15 +206,10 @@ void *qemu_ram_mmap(int fd,
 
 void qemu_ram_munmap(int fd, void *ptr, size_t size)
 {
-    size_t pagesize;
+    const size_t pagesize = mmap_pagesize(fd);
 
     if (ptr) {
         /* Unmap both the RAM block and the guard page */
-#if defined(__powerpc64__) && defined(__linux__)
-        pagesize = qemu_fd_getpagesize(fd);
-#else
-        pagesize = qemu_real_host_page_size;
-#endif
         munmap(ptr, size + pagesize);
     }
 }
-- 
2.24.1




* [PATCH v2 fixed 09/16] util/mmap-alloc: Factor out reserving of a memory region to mmap_reserve()
  2020-02-12 13:42 [PATCH v2 fixed 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
                   ` (7 preceding siblings ...)
  2020-02-12 13:42 ` [PATCH v2 fixed 08/16] util/mmap-alloc: Factor out calculation of pagesize to mmap_pagesize() David Hildenbrand
@ 2020-02-12 13:42 ` David Hildenbrand
  2020-02-19 22:47   ` Peter Xu
  2020-02-12 13:42 ` [PATCH v2 fixed 10/16] util/mmap-alloc: Factor out populating of memory to mmap_populate() David Hildenbrand
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2020-02-12 13:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Igor Kotrasinski, Eduardo Habkost, Michael S . Tsirkin,
	David Hildenbrand, Richard Henderson, Dr . David Alan Gilbert,
	Greg Kurz, Murilo Opsfelder Araujo, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

We want to reserve a memory region without actually populating memory.
Let's factor that out.

Reviewed-by: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Greg Kurz <groug@kaod.org>
Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/mmap-alloc.c | 58 +++++++++++++++++++++++++++--------------------
 1 file changed, 33 insertions(+), 25 deletions(-)

diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 82f02a2cec..43a26f38a8 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -82,6 +82,38 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
     return qemu_real_host_page_size;
 }
 
+/*
+ * Reserve a new memory region of the requested size to be used for mapping
+ * from the given fd (if any).
+ */
+static void *mmap_reserve(size_t size, int fd)
+{
+    int flags = MAP_PRIVATE;
+
+#if defined(__powerpc64__) && defined(__linux__)
+    /*
+     * On ppc64 mappings in the same segment (aka slice) must share the same
+     * page size. Since we will be re-allocating part of this segment
+     * from the supplied fd, we should make sure to use the same page size, to
+     * this end we mmap the supplied fd.  In this case, set MAP_NORESERVE to
+     * avoid allocating backing store memory.
+     * We do this unless we are using the system page size, in which case
+     * anonymous memory is OK.
+     */
+    if (fd == -1 || qemu_fd_getpagesize(fd) == qemu_real_host_page_size) {
+        fd = -1;
+        flags |= MAP_ANONYMOUS;
+    } else {
+        flags |= MAP_NORESERVE;
+    }
+#else
+    fd = -1;
+    flags |= MAP_ANONYMOUS;
+#endif
+
+    return mmap(0, size, PROT_NONE, flags, fd, 0);
+}
+
 static inline size_t mmap_pagesize(int fd)
 {
 #if defined(__powerpc64__) && defined(__linux__)
@@ -101,7 +133,6 @@ void *qemu_ram_mmap(int fd,
     const size_t pagesize = mmap_pagesize(fd);
     int flags;
     int map_sync_flags = 0;
-    int guardfd;
     size_t offset;
     size_t total;
     void *guardptr;
@@ -113,30 +144,7 @@ void *qemu_ram_mmap(int fd,
      */
     total = size + align;
 
-#if defined(__powerpc64__) && defined(__linux__)
-    /* On ppc64 mappings in the same segment (aka slice) must share the same
-     * page size. Since we will be re-allocating part of this segment
-     * from the supplied fd, we should make sure to use the same page size, to
-     * this end we mmap the supplied fd.  In this case, set MAP_NORESERVE to
-     * avoid allocating backing store memory.
-     * We do this unless we are using the system page size, in which case
-     * anonymous memory is OK.
-     */
-    flags = MAP_PRIVATE;
-    if (fd == -1 || pagesize == qemu_real_host_page_size) {
-        guardfd = -1;
-        flags |= MAP_ANONYMOUS;
-    } else {
-        guardfd = fd;
-        flags |= MAP_NORESERVE;
-    }
-#else
-    guardfd = -1;
-    flags = MAP_PRIVATE | MAP_ANONYMOUS;
-#endif
-
-    guardptr = mmap(0, total, PROT_NONE, flags, guardfd, 0);
-
+    guardptr = mmap_reserve(total, fd);
     if (guardptr == MAP_FAILED) {
         return MAP_FAILED;
     }
-- 
2.24.1




* [PATCH v2 fixed 10/16] util/mmap-alloc: Factor out populating of memory to mmap_populate()
  2020-02-12 13:42 [PATCH v2 fixed 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
                   ` (8 preceding siblings ...)
  2020-02-12 13:42 ` [PATCH v2 fixed 09/16] util/mmap-alloc: Factor out reserving of a memory region to mmap_reserve() David Hildenbrand
@ 2020-02-12 13:42 ` David Hildenbrand
  2020-02-19 22:49   ` Peter Xu
  2020-02-12 13:42 ` [PATCH v2 fixed 11/16] util/mmap-alloc: Prepare for resizable mmaps David Hildenbrand
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2020-02-12 13:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Igor Kotrasinski, Eduardo Habkost, Michael S . Tsirkin,
	David Hildenbrand, Richard Henderson, Dr . David Alan Gilbert,
	Greg Kurz, Murilo Opsfelder Araujo, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

We want to populate memory within a reserved memory region. Let's factor
that out.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Cc: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Greg Kurz <groug@kaod.org>
Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/mmap-alloc.c | 89 +++++++++++++++++++++++++----------------------
 1 file changed, 47 insertions(+), 42 deletions(-)

diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 43a26f38a8..2f366dae72 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -114,6 +114,50 @@ static void *mmap_reserve(size_t size, int fd)
     return mmap(0, size, PROT_NONE, flags, fd, 0);
 }
 
+/*
+ * Populate memory in a reserved region from the given fd (if any).
+ */
+static void *mmap_populate(void *ptr, size_t size, int fd, bool shared,
+                           bool is_pmem)
+{
+    int map_sync_flags = 0;
+    int flags = MAP_FIXED;
+    void *populated_ptr;
+
+    flags |= fd == -1 ? MAP_ANONYMOUS : 0;
+    flags |= shared ? MAP_SHARED : MAP_PRIVATE;
+    if (shared && is_pmem) {
+        map_sync_flags = MAP_SYNC | MAP_SHARED_VALIDATE;
+    }
+
+    populated_ptr = mmap(ptr, size, PROT_READ | PROT_WRITE,
+                         flags | map_sync_flags, fd, 0);
+    if (populated_ptr == MAP_FAILED && map_sync_flags) {
+        if (errno == ENOTSUP) {
+            char *proc_link = g_strdup_printf("/proc/self/fd/%d", fd);
+            char *file_name = g_malloc0(PATH_MAX);
+            int len = readlink(proc_link, file_name, PATH_MAX - 1);
+
+            if (len < 0) {
+                len = 0;
+            }
+            file_name[len] = '\0';
+            fprintf(stderr, "Warning: requesting persistence across crashes "
+                    "for backend file %s failed. Proceeding without "
+                    "persistence, data might become corrupted in case of host "
+                    "crash.\n", file_name);
+            g_free(proc_link);
+            g_free(file_name);
+        }
+        /*
+         * If mmap failed with MAP_SHARED_VALIDATE | MAP_SYNC, we will try
+         * again without these flags to handle backwards compatibility.
+         */
+        populated_ptr = mmap(ptr, size, PROT_READ | PROT_WRITE, flags, fd, 0);
+    }
+    return populated_ptr;
+}
+
 static inline size_t mmap_pagesize(int fd)
 {
 #if defined(__powerpc64__) && defined(__linux__)
@@ -131,12 +175,8 @@ void *qemu_ram_mmap(int fd,
                     bool is_pmem)
 {
     const size_t pagesize = mmap_pagesize(fd);
-    int flags;
-    int map_sync_flags = 0;
-    size_t offset;
-    size_t total;
-    void *guardptr;
-    void *ptr;
+    size_t offset, total;
+    void *ptr, *guardptr;
 
     /*
      * Note: this always allocates at least one extra page of virtual address
@@ -153,44 +193,9 @@ void *qemu_ram_mmap(int fd,
     /* Always align to host page size */
     assert(align >= pagesize);
 
-    flags = MAP_FIXED;
-    flags |= fd == -1 ? MAP_ANONYMOUS : 0;
-    flags |= shared ? MAP_SHARED : MAP_PRIVATE;
-    if (shared && is_pmem) {
-        map_sync_flags = MAP_SYNC | MAP_SHARED_VALIDATE;
-    }
-
     offset = QEMU_ALIGN_UP((uintptr_t)guardptr, align) - (uintptr_t)guardptr;
 
-    ptr = mmap(guardptr + offset, size, PROT_READ | PROT_WRITE,
-               flags | map_sync_flags, fd, 0);
-
-    if (ptr == MAP_FAILED && map_sync_flags) {
-        if (errno == ENOTSUP) {
-            char *proc_link, *file_name;
-            int len;
-            proc_link = g_strdup_printf("/proc/self/fd/%d", fd);
-            file_name = g_malloc0(PATH_MAX);
-            len = readlink(proc_link, file_name, PATH_MAX - 1);
-            if (len < 0) {
-                len = 0;
-            }
-            file_name[len] = '\0';
-            fprintf(stderr, "Warning: requesting persistence across crashes "
-                    "for backend file %s failed. Proceeding without "
-                    "persistence, data might become corrupted in case of host "
-                    "crash.\n", file_name);
-            g_free(proc_link);
-            g_free(file_name);
-        }
-        /*
-         * if map failed with MAP_SHARED_VALIDATE | MAP_SYNC,
-         * we will remove these flags to handle compatibility.
-         */
-        ptr = mmap(guardptr + offset, size, PROT_READ | PROT_WRITE,
-                   flags, fd, 0);
-    }
-
+    ptr = mmap_populate(guardptr + offset, size, fd, shared, is_pmem);
     if (ptr == MAP_FAILED) {
         munmap(guardptr, total);
         return MAP_FAILED;
-- 
2.24.1




* [PATCH v2 fixed 11/16] util/mmap-alloc: Prepare for resizable mmaps
  2020-02-12 13:42 [PATCH v2 fixed 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
                   ` (9 preceding siblings ...)
  2020-02-12 13:42 ` [PATCH v2 fixed 10/16] util/mmap-alloc: Factor out populating of memory to mmap_populate() David Hildenbrand
@ 2020-02-12 13:42 ` David Hildenbrand
  2020-02-19 22:50   ` Peter Xu
  2020-02-12 13:42 ` [PATCH v2 fixed 12/16] util/mmap-alloc: Implement " David Hildenbrand
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2020-02-12 13:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Igor Kotrasinski, Eduardo Habkost, Michael S . Tsirkin,
	David Hildenbrand, Richard Henderson, Dr . David Alan Gilbert,
	Greg Kurz, Murilo Opsfelder Araujo, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

When shrinking an mmap, we want to re-reserve the already populated area.
When growing a memory region, we want to populate starting at a given
fd_offset. Prepare by allowing these parameters to be passed.

Also, let's make sure we always process full pages, to avoid
unmapping/remapping pages that are already in use when
growing/shrinking. Add some asserts.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Cc: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Greg Kurz <groug@kaod.org>
Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/mmap-alloc.c | 31 ++++++++++++++++++++-----------
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 2f366dae72..fb7ef588fe 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -83,12 +83,12 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
 }
 
 /*
- * Reserve a new memory region of the requested size to be used for mapping
- * from the given fd (if any).
+ * Reserve a new memory region of the requested size or re-reserve parts
+ * of an existing region to be used for mapping from the given fd (if any).
  */
-static void *mmap_reserve(size_t size, int fd)
+static void *mmap_reserve(void *ptr, size_t size, int fd)
 {
-    int flags = MAP_PRIVATE;
+    int flags = MAP_PRIVATE | (ptr ? MAP_FIXED : 0);
 
 #if defined(__powerpc64__) && defined(__linux__)
     /*
@@ -111,19 +111,23 @@ static void *mmap_reserve(size_t size, int fd)
     flags |= MAP_ANONYMOUS;
 #endif
 
-    return mmap(0, size, PROT_NONE, flags, fd, 0);
+    return mmap(ptr, size, PROT_NONE, flags, fd, 0);
 }
 
 /*
  * Populate memory in a reserved region from the given fd (if any).
  */
-static void *mmap_populate(void *ptr, size_t size, int fd, bool shared,
-                           bool is_pmem)
+static void *mmap_populate(void *ptr, size_t size, int fd, size_t fd_offset,
+                           bool shared, bool is_pmem)
 {
     int map_sync_flags = 0;
     int flags = MAP_FIXED;
     void *populated_ptr;
 
+    if (fd == -1) {
+        fd_offset = 0;
+    }
+
     flags |= fd == -1 ? MAP_ANONYMOUS : 0;
     flags |= shared ? MAP_SHARED : MAP_PRIVATE;
     if (shared && is_pmem) {
@@ -131,7 +135,7 @@ static void *mmap_populate(void *ptr, size_t size, int fd, bool shared,
     }
 
     populated_ptr = mmap(ptr, size, PROT_READ | PROT_WRITE,
-                         flags | map_sync_flags, fd, 0);
+                         flags | map_sync_flags, fd, fd_offset);
     if (populated_ptr == MAP_FAILED && map_sync_flags) {
         if (errno == ENOTSUP) {
             char *proc_link = g_strdup_printf("/proc/self/fd/%d", fd);
@@ -153,7 +157,8 @@ static void *mmap_populate(void *ptr, size_t size, int fd, bool shared,
          * If mmap failed with MAP_SHARED_VALIDATE | MAP_SYNC, we will try
          * again without these flags to handle backwards compatibility.
          */
-        populated_ptr = mmap(ptr, size, PROT_READ | PROT_WRITE, flags, fd, 0);
+        populated_ptr = mmap(ptr, size, PROT_READ | PROT_WRITE, flags, fd,
+                             fd_offset);
     }
     return populated_ptr;
 }
@@ -178,13 +183,15 @@ void *qemu_ram_mmap(int fd,
     size_t offset, total;
     void *ptr, *guardptr;
 
+    g_assert(QEMU_IS_ALIGNED(size, pagesize));
+
     /*
      * Note: this always allocates at least one extra page of virtual address
      * space, even if size is already aligned.
      */
     total = size + align;
 
-    guardptr = mmap_reserve(total, fd);
+    guardptr = mmap_reserve(0, total, fd);
     if (guardptr == MAP_FAILED) {
         return MAP_FAILED;
     }
@@ -195,7 +202,7 @@ void *qemu_ram_mmap(int fd,
 
     offset = QEMU_ALIGN_UP((uintptr_t)guardptr, align) - (uintptr_t)guardptr;
 
-    ptr = mmap_populate(guardptr + offset, size, fd, shared, is_pmem);
+    ptr = mmap_populate(guardptr + offset, size, fd, 0, shared, is_pmem);
     if (ptr == MAP_FAILED) {
         munmap(guardptr, total);
         return MAP_FAILED;
@@ -221,6 +228,8 @@ void qemu_ram_munmap(int fd, void *ptr, size_t size)
 {
     const size_t pagesize = mmap_pagesize(fd);
 
+    g_assert(QEMU_IS_ALIGNED(size, pagesize));
+
     if (ptr) {
         /* Unmap both the RAM block and the guard page */
         munmap(ptr, size + pagesize);
-- 
2.24.1




* [PATCH v2 fixed 12/16] util/mmap-alloc: Implement resizable mmaps
  2020-02-12 13:42 [PATCH v2 fixed 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
                   ` (10 preceding siblings ...)
  2020-02-12 13:42 ` [PATCH v2 fixed 11/16] util/mmap-alloc: Prepare for resizable mmaps David Hildenbrand
@ 2020-02-12 13:42 ` David Hildenbrand
  2020-02-12 13:42 ` [PATCH v2 fixed 13/16] numa: Teach ram block notifiers about resizable ram blocks David Hildenbrand
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 46+ messages in thread
From: David Hildenbrand @ 2020-02-12 13:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Igor Kotrasinski, Eduardo Habkost, Michael S . Tsirkin,
	David Hildenbrand, Richard Henderson, Dr . David Alan Gilbert,
	Greg Kurz, Murilo Opsfelder Araujo, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

Implement resizable mmaps. For now, the actual resizing is not wired up.
Introduce qemu_ram_mmap_resizable() and qemu_ram_mmap_resize(). Make
qemu_ram_mmap() a wrapper around qemu_ram_mmap_resizable().
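
A hedged usage sketch of the new interface from the caller side (not part
of this patch; local variable names are illustrative):

    /* reserve max_size of address space, populate only the first size bytes */
    void *ptr = qemu_ram_mmap_resizable(fd, size, max_size, align,
                                        shared, is_pmem);
    if (ptr == MAP_FAILED) {
        /* error handling */
    }

    /* later: grow (or shrink) the populated part within the reservation */
    if (!qemu_ram_mmap_resize(ptr, fd, size, new_size, shared, is_pmem)) {
        /* resize failed, e.g., not enough memory to populate the new part */
    }

    /* teardown always unmaps the full reservation */
    qemu_ram_munmap(fd, ptr, max_size);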

Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Greg Kurz <groug@kaod.org>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/qemu/mmap-alloc.h | 21 ++++++++++++--------
 util/mmap-alloc.c         | 42 ++++++++++++++++++++++++++++-----------
 2 files changed, 43 insertions(+), 20 deletions(-)

diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index e786266b92..3a219721e3 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -7,11 +7,13 @@ size_t qemu_fd_getpagesize(int fd);
 size_t qemu_mempath_getpagesize(const char *mem_path);
 
 /**
- * qemu_ram_mmap: mmap the specified file or device.
+ * qemu_ram_mmap_resizable: reserve a memory region of @max_size to mmap the
+ *                          specified file or device and mmap @size of it.
  *
  * Parameters:
  *  @fd: the file or the device to mmap
  *  @size: the number of bytes to be mmaped
+ *  @max_size: the number of bytes to be reserved
  *  @align: if not zero, specify the alignment of the starting mapping address;
  *          otherwise, the alignment in use will be determined by QEMU.
  *  @shared: map has RAM_SHARED flag.
@@ -21,12 +23,15 @@ size_t qemu_mempath_getpagesize(const char *mem_path);
  *  On success, return a pointer to the mapped area.
  *  On failure, return MAP_FAILED.
  */
-void *qemu_ram_mmap(int fd,
-                    size_t size,
-                    size_t align,
-                    bool shared,
-                    bool is_pmem);
-
-void qemu_ram_munmap(int fd, void *ptr, size_t size);
+void *qemu_ram_mmap_resizable(int fd, size_t size, size_t max_size,
+                              size_t align, bool shared, bool is_pmem);
+bool qemu_ram_mmap_resize(void *ptr, int fd, size_t old_size, size_t new_size,
+                          bool shared, bool is_pmem);
+static inline void *qemu_ram_mmap(int fd, size_t size, size_t align,
+                                  bool shared, bool is_pmem)
+{
+    return qemu_ram_mmap_resizable(fd, size, size, align, shared, is_pmem);
+}
+void qemu_ram_munmap(int fd, void *ptr, size_t max_size);
 
 #endif
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index fb7ef588fe..164b88a088 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -173,23 +173,22 @@ static inline size_t mmap_pagesize(int fd)
 #endif
 }
 
-void *qemu_ram_mmap(int fd,
-                    size_t size,
-                    size_t align,
-                    bool shared,
-                    bool is_pmem)
+void *qemu_ram_mmap_resizable(int fd, size_t size, size_t max_size,
+                              size_t align, bool shared, bool is_pmem)
 {
     const size_t pagesize = mmap_pagesize(fd);
     size_t offset, total;
     void *ptr, *guardptr;
 
     g_assert(QEMU_IS_ALIGNED(size, pagesize));
+    g_assert(QEMU_IS_ALIGNED(max_size, pagesize));
 
     /*
      * Note: this always allocates at least one extra page of virtual address
-     * space, even if size is already aligned.
+     * space, even if the size is already aligned. We will reserve an area of
+     * at least max_size, but only populate the requested part of it.
      */
-    total = size + align;
+    total = max_size + align;
 
     guardptr = mmap_reserve(0, total, fd);
     if (guardptr == MAP_FAILED) {
@@ -217,21 +216,40 @@ void *qemu_ram_mmap(int fd,
      * a guard page guarding against potential buffer overflows.
      */
     total -= offset;
-    if (total > size + pagesize) {
-        munmap(ptr + size + pagesize, total - size - pagesize);
+    if (total > max_size + pagesize) {
+        munmap(ptr + max_size + pagesize, total - max_size - pagesize);
     }
 
     return ptr;
 }
 
-void qemu_ram_munmap(int fd, void *ptr, size_t size)
+bool qemu_ram_mmap_resize(void *ptr, int fd, size_t old_size, size_t new_size,
+                          bool shared, bool is_pmem)
 {
     const size_t pagesize = mmap_pagesize(fd);
 
-    g_assert(QEMU_IS_ALIGNED(size, pagesize));
+    g_assert(QEMU_IS_ALIGNED(old_size, pagesize));
+    g_assert(QEMU_IS_ALIGNED(new_size, pagesize));
+
+    if (old_size < new_size) {
+        /* populate the missing piece into the reserved area */
+        ptr = mmap_populate(ptr + old_size, new_size - old_size, fd, old_size,
+                            shared, is_pmem);
+    } else if (old_size > new_size) {
+        /* discard this piece, marking it reserved */
+        ptr = mmap_reserve(ptr + new_size, old_size - new_size, fd);
+    }
+    return ptr != MAP_FAILED;
+}
+
+void qemu_ram_munmap(int fd, void *ptr, size_t max_size)
+{
+    const size_t pagesize = mmap_pagesize(fd);
+
+    g_assert(QEMU_IS_ALIGNED(max_size, pagesize));
 
     if (ptr) {
         /* Unmap both the RAM block and the guard page */
-        munmap(ptr, size + pagesize);
+        munmap(ptr, max_size + pagesize);
     }
 }
-- 
2.24.1




* [PATCH v2 fixed 13/16] numa: Teach ram block notifiers about resizable ram blocks
  2020-02-12 13:42 [PATCH v2 fixed 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
                   ` (11 preceding siblings ...)
  2020-02-12 13:42 ` [PATCH v2 fixed 12/16] util/mmap-alloc: Implement " David Hildenbrand
@ 2020-02-12 13:42 ` David Hildenbrand
  2020-02-13 12:41   ` Paul Durrant
  2020-02-12 13:42 ` [PATCH v2 fixed 14/16] util: vfio-helpers: Implement ram_block_resized() David Hildenbrand
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2020-02-12 13:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefano Stabellini, Eduardo Habkost, Michael S . Tsirkin,
	David Hildenbrand, Dr . David Alan Gilbert, Paul Durrant,
	xen-devel, Igor Mammedov, Anthony Perard, Paolo Bonzini,
	Richard Henderson

We want to actually resize ram blocks (make everything between
used_length and max_length inaccessible) - however, not all ram block
notifiers will support that. Let's teach the notifier that ram blocks
are indeed resizable, but keep using max_size in the existing notifiers.

Supply the max_size when adding and removing ram blocks. Also, notify on
resizes. Introduce a way to detect if any registered notifier does not
support resizes - ram_block_notifiers_support_resize() - which we can later
use to fall back to legacy handling if a registered notifier (esp., SEV and
HAX) does not support actual resizes.
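
For illustration only (this is not part of the patch), a notifier that opts
in to resize notifications would be wired up roughly like the sketch below;
the my_* identifiers are hypothetical, the callback signatures are the ones
introduced here:

    static void my_ram_block_added(RAMBlockNotifier *n, void *host,
                                   size_t size, size_t max_size)
    {
        /* register the used part; max_size is the size of the reservation */
    }

    static void my_ram_block_removed(RAMBlockNotifier *n, void *host,
                                     size_t size, size_t max_size)
    {
        /* undo whatever was registered for this block */
    }

    static void my_ram_block_resized(RAMBlockNotifier *n, void *host,
                                     size_t old_size, size_t new_size)
    {
        /* adjust the registration from old_size to new_size */
    }

    static RAMBlockNotifier my_notifier = {
        .ram_block_added = my_ram_block_added,
        .ram_block_removed = my_ram_block_removed,
        .ram_block_resized = my_ram_block_resized,
    };

    /* during initialization */
    ram_block_notifier_add(&my_notifier);

A notifier that leaves ram_block_resized unset keeps working as before, but
then ram_block_notifiers_support_resize() returns false.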

Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: Paul Durrant <paul@xen.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: xen-devel@lists.xenproject.org
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 exec.c                     | 13 +++++++++++--
 hw/core/numa.c             | 34 +++++++++++++++++++++++++++++-----
 hw/i386/xen/xen-mapcache.c |  7 ++++---
 include/exec/ramlist.h     | 14 ++++++++++----
 target/i386/hax-mem.c      |  5 +++--
 target/i386/sev.c          | 18 ++++++++++--------
 util/vfio-helpers.c        | 17 +++++++++--------
 7 files changed, 76 insertions(+), 32 deletions(-)

diff --git a/exec.c b/exec.c
index fc65c4f7ca..f2d30479b8 100644
--- a/exec.c
+++ b/exec.c
@@ -2139,6 +2139,8 @@ static void qemu_ram_apply_settings(void *host, size_t length)
  */
 int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
 {
+    const ram_addr_t oldsize = block->used_length;
+
     assert(block);
 
     newsize = HOST_PAGE_ALIGN(newsize);
@@ -2167,6 +2169,11 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
     block->used_length = newsize;
     cpu_physical_memory_set_dirty_range(block->offset, block->used_length,
                                         DIRTY_CLIENTS_ALL);
+
+    if (block->host) {
+        ram_block_notify_resized(block->host, oldsize, newsize);
+    }
+
     memory_region_set_size(block->mr, newsize);
     if (block->resized) {
         block->resized(block->idstr, newsize, block->host);
@@ -2319,7 +2326,8 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
 
     if (new_block->host) {
         qemu_ram_apply_settings(new_block->host, new_block->max_length);
-        ram_block_notify_add(new_block->host, new_block->max_length);
+        ram_block_notify_add(new_block->host, new_block->used_length,
+                             new_block->max_length);
     }
 }
 
@@ -2502,7 +2510,8 @@ void qemu_ram_free(RAMBlock *block)
     }
 
     if (block->host) {
-        ram_block_notify_remove(block->host, block->max_length);
+        ram_block_notify_remove(block->host, block->used_length,
+                                block->max_length);
     }
 
     qemu_mutex_lock_ramlist();
diff --git a/hw/core/numa.c b/hw/core/numa.c
index 6599c69e05..5b20dc726d 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -902,11 +902,12 @@ void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms)
 static int ram_block_notify_add_single(RAMBlock *rb, void *opaque)
 {
     const ram_addr_t max_size = qemu_ram_get_max_length(rb);
+    const ram_addr_t size = qemu_ram_get_used_length(rb);
     void *host = qemu_ram_get_host_addr(rb);
     RAMBlockNotifier *notifier = opaque;
 
     if (host) {
-        notifier->ram_block_added(notifier, host, max_size);
+        notifier->ram_block_added(notifier, host, size, max_size);
     }
     return 0;
 }
@@ -923,20 +924,43 @@ void ram_block_notifier_remove(RAMBlockNotifier *n)
     QLIST_REMOVE(n, next);
 }
 
-void ram_block_notify_add(void *host, size_t size)
+void ram_block_notify_add(void *host, size_t size, size_t max_size)
 {
     RAMBlockNotifier *notifier;
 
     QLIST_FOREACH(notifier, &ram_list.ramblock_notifiers, next) {
-        notifier->ram_block_added(notifier, host, size);
+        notifier->ram_block_added(notifier, host, size, max_size);
     }
 }
 
-void ram_block_notify_remove(void *host, size_t size)
+void ram_block_notify_remove(void *host, size_t size, size_t max_size)
 {
     RAMBlockNotifier *notifier;
 
     QLIST_FOREACH(notifier, &ram_list.ramblock_notifiers, next) {
-        notifier->ram_block_removed(notifier, host, size);
+        notifier->ram_block_removed(notifier, host, size, max_size);
     }
 }
+
+void ram_block_notify_resized(void *host, size_t old_size, size_t new_size)
+{
+    RAMBlockNotifier *notifier;
+
+    QLIST_FOREACH(notifier, &ram_list.ramblock_notifiers, next) {
+        if (notifier->ram_block_resized) {
+            notifier->ram_block_resized(notifier, host, old_size, new_size);
+        }
+    }
+}
+
+bool ram_block_notifiers_support_resize(void)
+{
+    RAMBlockNotifier *notifier;
+
+    QLIST_FOREACH(notifier, &ram_list.ramblock_notifiers, next) {
+        if (!notifier->ram_block_resized) {
+            return false;
+        }
+    }
+    return true;
+}
diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
index 5b120ed44b..d6dcea65d1 100644
--- a/hw/i386/xen/xen-mapcache.c
+++ b/hw/i386/xen/xen-mapcache.c
@@ -169,7 +169,8 @@ static void xen_remap_bucket(MapCacheEntry *entry,
 
     if (entry->vaddr_base != NULL) {
         if (!(entry->flags & XEN_MAPCACHE_ENTRY_DUMMY)) {
-            ram_block_notify_remove(entry->vaddr_base, entry->size);
+            ram_block_notify_remove(entry->vaddr_base, entry->size,
+                                    entry->size);
         }
         if (munmap(entry->vaddr_base, entry->size) != 0) {
             perror("unmap fails");
@@ -211,7 +212,7 @@ static void xen_remap_bucket(MapCacheEntry *entry,
     }
 
     if (!(entry->flags & XEN_MAPCACHE_ENTRY_DUMMY)) {
-        ram_block_notify_add(vaddr_base, size);
+        ram_block_notify_add(vaddr_base, size, size);
     }
 
     entry->vaddr_base = vaddr_base;
@@ -452,7 +453,7 @@ static void xen_invalidate_map_cache_entry_unlocked(uint8_t *buffer)
     }
 
     pentry->next = entry->next;
-    ram_block_notify_remove(entry->vaddr_base, entry->size);
+    ram_block_notify_remove(entry->vaddr_base, entry->size, entry->size);
     if (munmap(entry->vaddr_base, entry->size) != 0) {
         perror("unmap fails");
         exit(-1);
diff --git a/include/exec/ramlist.h b/include/exec/ramlist.h
index bc4faa1b00..92e548461e 100644
--- a/include/exec/ramlist.h
+++ b/include/exec/ramlist.h
@@ -65,15 +65,21 @@ void qemu_mutex_lock_ramlist(void);
 void qemu_mutex_unlock_ramlist(void);
 
 struct RAMBlockNotifier {
-    void (*ram_block_added)(RAMBlockNotifier *n, void *host, size_t size);
-    void (*ram_block_removed)(RAMBlockNotifier *n, void *host, size_t size);
+    void (*ram_block_added)(RAMBlockNotifier *n, void *host, size_t size,
+                            size_t max_size);
+    void (*ram_block_removed)(RAMBlockNotifier *n, void *host, size_t size,
+                              size_t max_size);
+    void (*ram_block_resized)(RAMBlockNotifier *n, void *host, size_t old_size,
+                              size_t new_size);
     QLIST_ENTRY(RAMBlockNotifier) next;
 };
 
 void ram_block_notifier_add(RAMBlockNotifier *n);
 void ram_block_notifier_remove(RAMBlockNotifier *n);
-void ram_block_notify_add(void *host, size_t size);
-void ram_block_notify_remove(void *host, size_t size);
+void ram_block_notify_add(void *host, size_t size, size_t max_size);
+void ram_block_notify_remove(void *host, size_t size, size_t max_size);
+void ram_block_notify_resized(void *host, size_t old_size, size_t new_size);
+bool ram_block_notifiers_support_resize(void);
 
 void ram_block_dump(Monitor *mon);
 
diff --git a/target/i386/hax-mem.c b/target/i386/hax-mem.c
index 6bb5a24917..454d7fb212 100644
--- a/target/i386/hax-mem.c
+++ b/target/i386/hax-mem.c
@@ -293,7 +293,8 @@ static MemoryListener hax_memory_listener = {
     .priority = 10,
 };
 
-static void hax_ram_block_added(RAMBlockNotifier *n, void *host, size_t size)
+static void hax_ram_block_added(RAMBlockNotifier *n, void *host, size_t size,
+                                size_t max_size)
 {
     /*
      * We must register each RAM block with the HAXM kernel module, or
@@ -304,7 +305,7 @@ static void hax_ram_block_added(RAMBlockNotifier *n, void *host, size_t size)
      * host physical pages for the RAM block as part of this registration
      * process, hence the name hax_populate_ram().
      */
-    if (hax_populate_ram((uint64_t)(uintptr_t)host, size) < 0) {
+    if (hax_populate_ram((uint64_t)(uintptr_t)host, max_size) < 0) {
         fprintf(stderr, "HAX failed to populate RAM\n");
         abort();
     }
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 024bb24e51..6b4cee24a2 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -129,7 +129,8 @@ sev_set_guest_state(SevState new_state)
 }
 
 static void
-sev_ram_block_added(RAMBlockNotifier *n, void *host, size_t size)
+sev_ram_block_added(RAMBlockNotifier *n, void *host, size_t size,
+                    size_t max_size)
 {
     int r;
     struct kvm_enc_region range;
@@ -146,19 +147,20 @@ sev_ram_block_added(RAMBlockNotifier *n, void *host, size_t size)
     }
 
     range.addr = (__u64)(unsigned long)host;
-    range.size = size;
+    range.size = max_size;
 
-    trace_kvm_memcrypt_register_region(host, size);
+    trace_kvm_memcrypt_register_region(host, max_size);
     r = kvm_vm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_REG_REGION, &range);
     if (r) {
         error_report("%s: failed to register region (%p+%#zx) error '%s'",
-                     __func__, host, size, strerror(errno));
+                     __func__, host, max_size, strerror(errno));
         exit(1);
     }
 }
 
 static void
-sev_ram_block_removed(RAMBlockNotifier *n, void *host, size_t size)
+sev_ram_block_removed(RAMBlockNotifier *n, void *host, size_t size,
+                      size_t max_size)
 {
     int r;
     struct kvm_enc_region range;
@@ -175,13 +177,13 @@ sev_ram_block_removed(RAMBlockNotifier *n, void *host, size_t size)
     }
 
     range.addr = (__u64)(unsigned long)host;
-    range.size = size;
+    range.size = max_size;
 
-    trace_kvm_memcrypt_unregister_region(host, size);
+    trace_kvm_memcrypt_unregister_region(host, max_size);
     r = kvm_vm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_UNREG_REGION, &range);
     if (r) {
         error_report("%s: failed to unregister region (%p+%#zx)",
-                     __func__, host, size);
+                     __func__, host, max_size);
     }
 }
 
diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index b3adc328db..3db6aa49f4 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -372,25 +372,26 @@ fail_container:
     return ret;
 }
 
-static void qemu_vfio_ram_block_added(RAMBlockNotifier *n,
-                                      void *host, size_t size)
+static void qemu_vfio_ram_block_added(RAMBlockNotifier *n, void *host,
+                                      size_t size, size_t max_size)
 {
     QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
     int ret;
 
-    trace_qemu_vfio_ram_block_added(s, host, size);
-    ret = qemu_vfio_dma_map(s, host, size, false, NULL);
+    trace_qemu_vfio_ram_block_added(s, host, max_size);
+    ret = qemu_vfio_dma_map(s, host, max_size, false, NULL);
     if (ret) {
-        error_report("qemu_vfio_dma_map(%p, %zu) failed: %d", host, size, ret);
+        error_report("qemu_vfio_dma_map(%p, %zu) failed: %d", host,
+                     max_size, ret);
     }
 }
 
-static void qemu_vfio_ram_block_removed(RAMBlockNotifier *n,
-                                        void *host, size_t size)
+static void qemu_vfio_ram_block_removed(RAMBlockNotifier *n, void *host,
+                                        size_t size, size_t max_size)
 {
     QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
     if (host) {
-        trace_qemu_vfio_ram_block_removed(s, host, size);
+        trace_qemu_vfio_ram_block_removed(s, host, max_size);
         qemu_vfio_dma_unmap(s, host);
     }
 }
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v2 fixed 14/16] util: vfio-helpers: Implement ram_block_resized()
  2020-02-12 13:42 [PATCH v2 fixed 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
                   ` (12 preceding siblings ...)
  2020-02-12 13:42 ` [PATCH v2 fixed 13/16] numa: Teach ram block notifiers about resizable ram blocks David Hildenbrand
@ 2020-02-12 13:42 ` David Hildenbrand
  2020-02-12 13:42 ` [PATCH v2 fixed 15/16] util: oslib: Resizable anonymous allocations under POSIX David Hildenbrand
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 46+ messages in thread
From: David Hildenbrand @ 2020-02-12 13:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, David Hildenbrand,
	Stefan Hajnoczi, Dr . David Alan Gilbert, Alex Williamson,
	Igor Mammedov, Paolo Bonzini, Richard Henderson

Let's implement ram_block_resized(), allowing resizable mappings.

For resizable mappings, we reserve $max_size of IOVA address space, but only
map $size of it. When resizing, unmap the old part and remap the new
part. We'll need a new ioctl to do this atomically (e.g., to resize
while the guest is running - not allowed for now).
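
As an illustration (a diagram, not code from this patch), the IOVA space of
one resizable mapping ends up looking like this:

    iova0               iova0 + $size             iova0 + $max_size
      |------ mapped ------|------ reserved, not mapped ------|

The low water mark is advanced by $max_size, so the reserved tail can never
be handed out to another mapping. Resizing unmaps [iova0, iova0 + $old_size)
and maps [iova0, iova0 + $new_size) at the same iova0.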

Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/trace-events   |  5 ++--
 util/vfio-helpers.c | 70 ++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 66 insertions(+), 9 deletions(-)

diff --git a/util/trace-events b/util/trace-events
index 83b6639018..88b7dbf4a5 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -74,8 +74,9 @@ qemu_mutex_unlock(void *mutex, const char *file, const int line) "released mutex
 
 # vfio-helpers.c
 qemu_vfio_dma_reset_temporary(void *s) "s %p"
-qemu_vfio_ram_block_added(void *s, void *p, size_t size) "s %p host %p size 0x%zx"
-qemu_vfio_ram_block_removed(void *s, void *p, size_t size) "s %p host %p size 0x%zx"
+qemu_vfio_ram_block_added(void *s, void *p, size_t size, size_t max_size) "s %p host %p size 0x%zx max_size 0x%zx"
+qemu_vfio_ram_block_removed(void *s, void *p, size_t size, size_t max_size) "s %p host %p size 0x%zx max_size 0x%zx"
+qemu_vfio_ram_block_resized(void *s, void *p, size_t old_size, size_t new_size) "s %p host %p old_size 0x%zx new_size 0x%zx"
 qemu_vfio_find_mapping(void *s, void *p) "s %p host %p"
 qemu_vfio_new_mapping(void *s, void *host, size_t size, int index, uint64_t iova) "s %p host %p size %zu index %d iova 0x%"PRIx64
 qemu_vfio_do_mapping(void *s, void *host, size_t size, uint64_t iova) "s %p host %p size %zu iova 0x%"PRIx64
diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 3db6aa49f4..70877b9ebd 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -372,14 +372,20 @@ fail_container:
     return ret;
 }
 
+static int qemu_vfio_dma_map_resizable(QEMUVFIOState *s, void *host,
+                                       size_t size, size_t max_size,
+                                       bool temporary, uint64_t *iova);
+static void qemu_vfio_dma_map_resize(QEMUVFIOState *s, void *host,
+                                     size_t old_size, size_t new_size);
+
 static void qemu_vfio_ram_block_added(RAMBlockNotifier *n, void *host,
                                       size_t size, size_t max_size)
 {
     QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
     int ret;
 
-    trace_qemu_vfio_ram_block_added(s, host, max_size);
-    ret = qemu_vfio_dma_map(s, host, max_size, false, NULL);
+    trace_qemu_vfio_ram_block_added(s, host, size, max_size);
+    ret = qemu_vfio_dma_map_resizable(s, host, size, max_size, false, NULL);
     if (ret) {
         error_report("qemu_vfio_dma_map(%p, %zu) failed: %d", host,
                      max_size, ret);
@@ -391,16 +397,28 @@ static void qemu_vfio_ram_block_removed(RAMBlockNotifier *n, void *host,
 {
     QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
     if (host) {
-        trace_qemu_vfio_ram_block_removed(s, host, max_size);
+        trace_qemu_vfio_ram_block_removed(s, host, size, max_size);
         qemu_vfio_dma_unmap(s, host);
     }
 }
 
+static void qemu_vfio_ram_block_resized(RAMBlockNotifier *n, void *host,
+                                        size_t old_size, size_t new_size)
+{
+    QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
+
+    if (host) {
+        trace_qemu_vfio_ram_block_resized(s, host, old_size, new_size);
+        qemu_vfio_dma_map_resize(s, host, old_size, new_size);
+    }
+}
+
 static void qemu_vfio_open_common(QEMUVFIOState *s)
 {
     qemu_mutex_init(&s->lock);
     s->ram_notifier.ram_block_added = qemu_vfio_ram_block_added;
     s->ram_notifier.ram_block_removed = qemu_vfio_ram_block_removed;
+    s->ram_notifier.ram_block_resized = qemu_vfio_ram_block_resized;
     s->low_water_mark = QEMU_VFIO_IOVA_MIN;
     s->high_water_mark = QEMU_VFIO_IOVA_MAX;
     ram_block_notifier_add(&s->ram_notifier);
@@ -597,9 +615,14 @@ static bool qemu_vfio_verify_mappings(QEMUVFIOState *s)
  * the result in @iova if not NULL. The caller need to make sure the area is
  * aligned to page size, and mustn't overlap with existing mapping areas (split
  * mapping status within this area is not allowed).
+ *
+ * If size < max_size, a region of max_size in IOVA address is reserved, such
+ * that the mapping can later be resized. Resizable mappings are only allowed
+ * for !temporary mappings.
  */
-int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
-                      bool temporary, uint64_t *iova)
+static int qemu_vfio_dma_map_resizable(QEMUVFIOState *s, void *host,
+                                       size_t size, size_t max_size,
+                                       bool temporary, uint64_t *iova)
 {
     int ret = 0;
     int index;
@@ -608,13 +631,17 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
 
     assert(QEMU_PTR_IS_ALIGNED(host, qemu_real_host_page_size));
     assert(QEMU_IS_ALIGNED(size, qemu_real_host_page_size));
+    assert(QEMU_IS_ALIGNED(max_size, qemu_real_host_page_size));
+    assert(size == max_size || !temporary);
+    assert(size <= max_size);
+
     trace_qemu_vfio_dma_map(s, host, size, temporary, iova);
     qemu_mutex_lock(&s->lock);
     mapping = qemu_vfio_find_mapping(s, host, &index);
     if (mapping) {
         iova0 = mapping->iova + ((uint8_t *)host - (uint8_t *)mapping->host);
     } else {
-        if (s->high_water_mark - s->low_water_mark + 1 < size) {
+        if (s->high_water_mark - s->low_water_mark + 1 < max_size) {
             ret = -ENOMEM;
             goto out;
         }
@@ -631,7 +658,7 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
                 qemu_vfio_remove_mapping(s, mapping);
                 goto out;
             }
-            s->low_water_mark += size;
+            s->low_water_mark += max_size;
             qemu_vfio_dump_mappings(s);
         } else {
             iova0 = s->high_water_mark - size;
@@ -650,6 +677,12 @@ out:
     return ret;
 }
 
+int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
+                      bool temporary, uint64_t *iova)
+{
+    return qemu_vfio_dma_map_resizable(s, host, size, size, temporary, iova);
+}
+
 /* Reset the high watermark and free all "temporary" mappings. */
 int qemu_vfio_dma_reset_temporary(QEMUVFIOState *s)
 {
@@ -694,6 +727,29 @@ out:
     qemu_mutex_unlock(&s->lock);
 }
 
+static void qemu_vfio_dma_map_resize(QEMUVFIOState *s, void *host,
+                                     size_t old_size, size_t new_size)
+{
+    IOVAMapping *m;
+    int index = 0;
+
+    qemu_mutex_lock(&s->lock);
+    m = qemu_vfio_find_mapping(s, host, &index);
+    if (!m) {
+        return;
+    }
+    assert(m->size == old_size);
+
+    /* Note: Not atomic - we need a new ioctl for that. */
+    qemu_vfio_undo_mapping(s, m->iova, m->size);
+    qemu_vfio_do_mapping(s, host, m->iova, new_size);
+
+    m->size = new_size;
+    assert(qemu_vfio_verify_mappings(s));
+
+    qemu_mutex_unlock(&s->lock);
+}
+
 static void qemu_vfio_reset(QEMUVFIOState *s)
 {
     ioctl(s->device, VFIO_DEVICE_RESET);
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v2 fixed 15/16] util: oslib: Resizable anonymous allocations under POSIX
  2020-02-12 13:42 [PATCH v2 fixed 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
                   ` (13 preceding siblings ...)
  2020-02-12 13:42 ` [PATCH v2 fixed 14/16] util: vfio-helpers: Implement ram_block_resized() David Hildenbrand
@ 2020-02-12 13:42 ` David Hildenbrand
  2020-02-12 13:42 ` [PATCH v2 fixed 16/16] exec: Ram blocks with resizable " David Hildenbrand
  2020-02-12 18:03 ` [PATCH v2 fixed 00/16] " David Hildenbrand
  16 siblings, 0 replies; 46+ messages in thread
From: David Hildenbrand @ 2020-02-12 13:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Stefan Weil,
	David Hildenbrand, Dr . David Alan Gilbert, Igor Mammedov,
	Paolo Bonzini, Richard Henderson

Introduce qemu_anon_ram_alloc_resizable() and qemu_anon_ram_resize().
Implement them under POSIX; under WIN32, allocation returns NULL and
resizing returns false for now.

Under POSIX, we make use of resizable mmaps. An implementation under
WIN32 is theoretically possible AFAIK and can be added later.

In qemu_anon_ram_free(), rename the size parameter to max_size, to make
it clearer that we have to use the maximum size when freeing resizable
anonymous allocations.
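
A minimal usage sketch, assuming the GiB macro from "qemu/units.h"; the
sizes are arbitrary examples and error handling is only hinted at:

    uint64_t align = 0;
    void *ptr;

    /* Reserve 4 GiB of address space, but only back the first 1 GiB. */
    ptr = qemu_anon_ram_alloc_resizable(1 * GiB, 4 * GiB, &align, false);
    if (!ptr) {
        /* reservation/allocation failed */
    }

    /* Grow the accessible part to 2 GiB on demand. */
    if (!qemu_anon_ram_resize(ptr, 1 * GiB, 2 * GiB, false)) {
        /* populating the additional 1 GiB failed, the old part stays intact */
    }

    /* Freeing always takes the maximum size. */
    qemu_anon_ram_free(ptr, 4 * GiB);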

Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Stefan Weil <sw@weilnetz.de>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/qemu/osdep.h |  6 +++++-
 util/oslib-posix.c   | 37 ++++++++++++++++++++++++++++++++++---
 util/oslib-win32.c   | 14 ++++++++++++++
 util/trace-events    |  4 +++-
 4 files changed, 56 insertions(+), 5 deletions(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 9bd3dcfd13..84c54c1647 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -311,8 +311,12 @@ int qemu_daemon(int nochdir, int noclose);
 void *qemu_try_memalign(size_t alignment, size_t size);
 void *qemu_memalign(size_t alignment, size_t size);
 void *qemu_anon_ram_alloc(size_t size, uint64_t *align, bool shared);
+void *qemu_anon_ram_alloc_resizable(size_t size, size_t max_size,
+                                    uint64_t *align, bool shared);
+bool qemu_anon_ram_resize(void *ptr, size_t old_size, size_t new_size,
+                          bool shared);
 void qemu_vfree(void *ptr);
-void qemu_anon_ram_free(void *ptr, size_t size);
+void qemu_anon_ram_free(void *ptr, size_t max_size);
 
 #define QEMU_MADV_INVALID -1
 
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 5a291cc982..147246d543 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -219,16 +219,47 @@ void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
     return ptr;
 }
 
+void *qemu_anon_ram_alloc_resizable(size_t size, size_t max_size,
+                                    uint64_t *alignment, bool shared)
+{
+    size_t align = QEMU_VMALLOC_ALIGN;
+    void *ptr = qemu_ram_mmap_resizable(-1, size, max_size, align, shared,
+                                        false);
+
+    if (ptr == MAP_FAILED) {
+        return NULL;
+    }
+
+    if (alignment) {
+        *alignment = align;
+    }
+
+    trace_qemu_anon_ram_alloc_resizable(size, max_size, ptr);
+    return ptr;
+}
+
+bool qemu_anon_ram_resize(void *ptr, size_t old_size, size_t new_size,
+                          bool shared)
+{
+    bool resized = qemu_ram_mmap_resize(ptr, -1, old_size, new_size, shared,
+                                        false);
+
+    if (resized) {
+        trace_qemu_anon_ram_resize(old_size, new_size, ptr);
+    }
+    return resized;
+}
+
 void qemu_vfree(void *ptr)
 {
     trace_qemu_vfree(ptr);
     free(ptr);
 }
 
-void qemu_anon_ram_free(void *ptr, size_t size)
+void qemu_anon_ram_free(void *ptr, size_t max_size)
 {
-    trace_qemu_anon_ram_free(ptr, size);
-    qemu_ram_munmap(-1, ptr, size);
+    trace_qemu_anon_ram_free(ptr, max_size);
+    qemu_ram_munmap(-1, ptr, max_size);
 }
 
 void qemu_set_block(int fd)
diff --git a/util/oslib-win32.c b/util/oslib-win32.c
index e9b14ab178..5ba872bd3b 100644
--- a/util/oslib-win32.c
+++ b/util/oslib-win32.c
@@ -90,6 +90,20 @@ void *qemu_anon_ram_alloc(size_t size, uint64_t *align, bool shared)
     return ptr;
 }
 
+void *qemu_anon_ram_alloc_resizable(size_t size, size_t max_size,
+                                    uint64_t *align, bool shared)
+{
+    /* resizable ram not implemented yet */
+    return NULL;
+}
+
+bool qemu_anon_ram_resize(void *ptr, size_t old_size, size_t new_size,
+                          bool shared)
+{
+    /* resizable ram not implemented yet */
+    return false;
+}
+
 void qemu_vfree(void *ptr)
 {
     trace_qemu_vfree(ptr);
diff --git a/util/trace-events b/util/trace-events
index 88b7dbf4a5..8f44dcc1a0 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -46,8 +46,10 @@ qemu_co_mutex_unlock_return(void *mutex, void *self) "mutex %p self %p"
 # oslib-posix.c
 qemu_memalign(size_t alignment, size_t size, void *ptr) "alignment %zu size %zu ptr %p"
 qemu_anon_ram_alloc(size_t size, void *ptr) "size %zu ptr %p"
+qemu_anon_ram_alloc_resizable(size_t size, size_t max_size, void *ptr) "size %zu max_size %zu ptr %p"
+qemu_anon_ram_resize(size_t old_size, size_t new_size, void *ptr) "old_size %zu new_size %zu ptr %p"
 qemu_vfree(void *ptr) "ptr %p"
-qemu_anon_ram_free(void *ptr, size_t size) "ptr %p size %zu"
+qemu_anon_ram_free(void *ptr, size_t max_size) "ptr %p max_size %zu"
 
 # hbitmap.c
 hbitmap_iter_skip_words(const void *hb, void *hbi, uint64_t pos, unsigned long cur) "hb %p hbi %p pos %"PRId64" cur 0x%lx"
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v2 fixed 16/16] exec: Ram blocks with resizable anonymous allocations under POSIX
  2020-02-12 13:42 [PATCH v2 fixed 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
                   ` (14 preceding siblings ...)
  2020-02-12 13:42 ` [PATCH v2 fixed 15/16] util: oslib: Resizable anonymous allocations under POSIX David Hildenbrand
@ 2020-02-12 13:42 ` David Hildenbrand
  2020-02-12 18:03 ` [PATCH v2 fixed 00/16] " David Hildenbrand
  16 siblings, 0 replies; 46+ messages in thread
From: David Hildenbrand @ 2020-02-12 13:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Stefan Weil,
	David Hildenbrand, Dr . David Alan Gilbert,
	Shameerali Kolothum Thodi, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

We can now make use of resizable anonymous allocations to implement
actually resizable ram blocks. Resizable anonymous allocations are
not implemented under WIN32 yet and are not available when using
alternative allocators. Fall back to the existing handling.

We also have to fall back to the existing handling in case any ram block
notifier does not support resizing (esp., AMD SEV, HAX) yet. Remember
in RAM_RESIZEABLE_ALLOC if we are using resizable anonymous allocations.

As the mmap()-hackery will invalidate some madvise settings, we have to
re-apply them after resizing. After resizing, notify the ram block
notifiers.

Try to grow early, as that can easily fail if out of memory. Shrink late
and ignore errors (nothing will actually break). Warn only.
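
Condensed, the ordering in qemu_ram_resize() comes down to the sketch below
(not literal code; the real function also updates the dirty bitmap and the
memory region size and re-applies the madvise settings):

    /* Grow early: populating new memory is the step that can fail. */
    if (oldsize < newsize && qemu_ram_is_resizable_alloc(block) &&
        !qemu_anon_ram_resize(block->host, oldsize, newsize, shared)) {
        return -ENOMEM;
    }

    block->used_length = newsize;
    ram_block_notify_resized(block->host, oldsize, newsize);

    /* Shrink late: a failure here only wastes memory, so warn and continue. */
    if (newsize < oldsize && qemu_ram_is_resizable_alloc(block) &&
        !qemu_anon_ram_resize(block->host, oldsize, newsize, shared)) {
        warn_report("Shrinking memory allocation failed.");
    }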

The benefit of actually resizable ram blocks is that e.g., under Linux,
only the actual size will be reserved (even if
"/proc/sys/vm/overcommit_memory" is set to "never"). Additional memory will
be reserved when trying to resize, which makes it possible to have ram
blocks that start small but can theoretically grow very large.

Note1: We are not able to create resizable ram blocks with pre-allocated
       memory yet, so prealloc is not affected.
Note2: mlock should work as it used to as os_mlock() does a
       mlockall(MCL_CURRENT | MCL_FUTURE), which includes future
       mappings.
Note3: Nobody should access memory beyond used_length. Memory notifiers
       already properly take care of this; only ram block notifiers
       violate this constraint and, therefore, have to be special-cased.
       Especially, any ram block notifier that might dynamically
       register at runtime (e.g., vfio) has to support resizes. Add an
       assert for that. Both HAX and SEV register early, so they are
       fine.

Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Stefan Weil <sw@weilnetz.de>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 exec.c                    | 60 ++++++++++++++++++++++++++++++++++++---
 hw/core/numa.c            |  7 +++++
 include/exec/cpu-common.h |  2 ++
 include/exec/memory.h     |  8 ++++++
 4 files changed, 73 insertions(+), 4 deletions(-)

diff --git a/exec.c b/exec.c
index f2d30479b8..71e32dcc11 100644
--- a/exec.c
+++ b/exec.c
@@ -2053,6 +2053,16 @@ void qemu_ram_unset_migratable(RAMBlock *rb)
     rb->flags &= ~RAM_MIGRATABLE;
 }
 
+bool qemu_ram_is_resizable(RAMBlock *rb)
+{
+    return rb->flags & RAM_RESIZEABLE;
+}
+
+bool qemu_ram_is_resizable_alloc(RAMBlock *rb)
+{
+    return rb->flags & RAM_RESIZEABLE_ALLOC;
+}
+
 /* Called with iothread lock held.  */
 void qemu_ram_set_idstr(RAMBlock *new_block, const char *name, DeviceState *dev)
 {
@@ -2139,6 +2149,7 @@ static void qemu_ram_apply_settings(void *host, size_t length)
  */
 int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
 {
+    const bool shared = block->flags & RAM_SHARED;
     const ram_addr_t oldsize = block->used_length;
 
     assert(block);
@@ -2149,7 +2160,7 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
         return 0;
     }
 
-    if (!(block->flags & RAM_RESIZEABLE)) {
+    if (!qemu_ram_is_resizable(block)) {
         error_setg_errno(errp, EINVAL,
                          "Length mismatch: %s: 0x" RAM_ADDR_FMT
                          " in != 0x" RAM_ADDR_FMT, block->idstr,
@@ -2165,6 +2176,12 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
         return -EINVAL;
     }
 
+    if (oldsize < newsize && qemu_ram_is_resizable_alloc(block) &&
+        !qemu_anon_ram_resize(block->host, oldsize, newsize, shared)) {
+        error_setg_errno(errp, ENOMEM, "Cannot allocate enough memory.");
+        return -ENOMEM;
+    }
+
     cpu_physical_memory_clear_dirty_range(block->offset, block->used_length);
     block->used_length = newsize;
     cpu_physical_memory_set_dirty_range(block->offset, block->used_length,
@@ -2178,6 +2195,21 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
     if (block->resized) {
         block->resized(block->idstr, newsize, block->host);
     }
+
+    /*
+     * Shrinking will only fail in rare scenarios (e.g., maximum number of
+     * mappings reached), and can be ignored. Warn only.
+     */
+    if (newsize < oldsize && qemu_ram_is_resizable_alloc(block) &&
+        !qemu_anon_ram_resize(block->host, oldsize, newsize, shared)) {
+        warn_report("Shrinking memory allocation failed.");
+    }
+
+    if (block->host && qemu_ram_is_resizable_alloc(block)) {
+        /* re-apply settings that might have been overridden by the resize */
+        qemu_ram_apply_settings(block->host, block->max_length);
+    }
+
     return 0;
 }
 
@@ -2256,6 +2288,28 @@ static void dirty_memory_extend(ram_addr_t old_ram_size,
     }
 }
 
+static void ram_block_alloc_ram(RAMBlock *rb)
+{
+    const bool shared = qemu_ram_is_shared(rb);
+
+    /*
+     * If we can, try to allocate actually resizable ram. Will also fail
+     * if qemu_anon_ram_alloc_resizable() is not implemented.
+     */
+    if (phys_mem_alloc == qemu_anon_ram_alloc &&
+        qemu_ram_is_resizable(rb) &&
+        ram_block_notifiers_support_resize()) {
+        rb->host = qemu_anon_ram_alloc_resizable(rb->used_length,
+                                                 rb->max_length, &rb->mr->align,
+                                                 shared);
+        if (rb->host) {
+            rb->flags |= RAM_RESIZEABLE_ALLOC;
+            return;
+        }
+    }
+    rb->host = phys_mem_alloc(rb->max_length, &rb->mr->align, shared);
+}
+
 static void ram_block_add(RAMBlock *new_block, Error **errp)
 {
     RAMBlock *block;
@@ -2278,9 +2332,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
                 return;
             }
         } else {
-            new_block->host = phys_mem_alloc(new_block->max_length,
-                                             &new_block->mr->align,
-                                             qemu_ram_is_shared(new_block));
+            ram_block_alloc_ram(new_block);
             if (!new_block->host) {
                 error_setg_errno(errp, errno,
                                  "cannot set up guest memory '%s'",
diff --git a/hw/core/numa.c b/hw/core/numa.c
index 5b20dc726d..601cf9f603 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -907,6 +907,13 @@ static int ram_block_notify_add_single(RAMBlock *rb, void *opaque)
     RAMBlockNotifier *notifier = opaque;
 
     if (host) {
+        /*
+         * Dynamically adding notifiers that don't support resizes is forbidden
+         * when dealing with resizable ram blocks that have actually resizable
+         * allocations.
+         */
+        g_assert(!qemu_ram_is_resizable_alloc(rb) ||
+                 notifier->ram_block_resized);
         notifier->ram_block_added(notifier, host, size, max_size);
     }
     return 0;
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 9760ac9068..a9c76bd5ef 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -66,6 +66,8 @@ void qemu_ram_set_uf_zeroable(RAMBlock *rb);
 bool qemu_ram_is_migratable(RAMBlock *rb);
 void qemu_ram_set_migratable(RAMBlock *rb);
 void qemu_ram_unset_migratable(RAMBlock *rb);
+bool qemu_ram_is_resizable(RAMBlock *rb);
+bool qemu_ram_is_resizable_alloc(RAMBlock *rb);
 
 size_t qemu_ram_pagesize(RAMBlock *block);
 size_t qemu_ram_pagesize_largest(void);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index e85b7de99a..19417943a2 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -129,6 +129,14 @@ typedef struct IOMMUNotifier IOMMUNotifier;
 /* RAM is a persistent kind memory */
 #define RAM_PMEM (1 << 5)
 
+/*
+ * Implies RAM_RESIZEABLE. Memory beyond the used_length is inaccessible
+ * (esp. initially and after resizing). For such memory blocks, only the
+ * used_length is reserved in the OS - resizing might fail. Will only be
+ * used with host OS support and if all ram block notifiers support resizing.
+ */
+#define RAM_RESIZEABLE_ALLOC (1 << 6)
+
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
                                        IOMMUNotifierFlag flags,
                                        hwaddr start, hwaddr end,
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 fixed 00/16] Ram blocks with resizable anonymous allocations under POSIX
  2020-02-12 13:42 [PATCH v2 fixed 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
                   ` (15 preceding siblings ...)
  2020-02-12 13:42 ` [PATCH v2 fixed 16/16] exec: Ram blocks with resizable " David Hildenbrand
@ 2020-02-12 18:03 ` David Hildenbrand
  2020-02-13 13:36   ` David Hildenbrand
  2020-02-14 13:08   ` Dr. David Alan Gilbert
  16 siblings, 2 replies; 46+ messages in thread
From: David Hildenbrand @ 2020-02-12 18:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: Murilo Opsfelder Araujo, Stefano Stabellini, Eduardo Habkost,
	Michael S . Tsirkin, Stefan Weil, Igor Kotrasinski,
	Richard Henderson, Dr . David Alan Gilbert,
	Shameerali Kolothum Thodi, Greg Kurz, Paul Durrant,
	Alex Williamson, Igor Mammedov, Anthony Perard, Paolo Bonzini,
	Stefan Hajnoczi, Richard Henderson

On 12.02.20 14:42, David Hildenbrand wrote:
> We already allow resizable ram blocks for anonymous memory, however, they
> are not actually resized. All memory is mmaped() R/W, including the memory
> exceeding the used_length, up to the max_length.
> 
> When resizing, effectively only the boundary is moved. Implement actually
> resizable anonymous allocations and make use of them in resizable ram
> blocks when possible. Memory exceeding the used_length will be
> inaccessible. Especially ram block notifiers require care.
> 
> Having actually resizable anonymous allocations (via mmap-hackery) allows
> to reserve a big region in virtual address space and grow the
> accessible/usable part on demand. Even if "/proc/sys/vm/overcommit_memory"
> is set to "never" under Linux, huge reservations will succeed. If there is
> not enough memory when resizing (to populate parts of the reserved region),
> trying to resize will fail. Only the actually used size is reserved in the
> OS.
> 
> E.g., virtio-mem [1] wants to reserve big resizable memory regions and
> grow the usable part on demand. I think this change is worth sending out
> individually. Accompanied by a bunch of minor fixes and cleanups.
> 
> Especially, memory notifiers already handle resizing by first removing
> the old region, and then re-adding the resized region. prealloc is
> currently not possible with resizable ram blocks. mlock() should continue
> to work as is. Resizing is currently rare and must only happen on the
> start of an incoming migration, or during resets. No code path (except
> HAX and SEV ram block notifiers) should access memory outside of the usable
> range - and if we ever find one, that one has to be fixed (I did not
> identify any).
> 
> v1 -> v2:
> - Add "util: vfio-helpers: Fix qemu_vfio_close()"
> - Add "util: vfio-helpers: Remove Error parameter from
>        qemu_vfio_undo_mapping()"
> - Add "util: vfio-helpers: Factor out removal from
>        qemu_vfio_undo_mapping()"
> - "util/mmap-alloc: ..."
>  -- Minor changes due to review feedback (e.g., assert alignment, return
>     bool when resizing)
> - "util: vfio-helpers: Implement ram_block_resized()"
>  -- Reserve max_size in the IOVA address space.
>  -- On resize, undo old mapping and do new mapping. We can later implement
>     a new ioctl to resize the mapping directly.
> - "numa: Teach ram block notifiers about resizable ram blocks"
>  -- Pass size/max_size to ram block notifiers, which makes things easier an
>     cleaner
> - "exec: Ram blocks with resizable anonymous allocations under POSIX"
>  -- Adapt to new ram block notifiers
>  -- Shrink after notifying. Always trigger ram block notifiers on resizes
>  -- Add a safety net that all ram block notifiers registered at runtime
>     support resizes.
> 
> [1] https://lore.kernel.org/kvm/20191212171137.13872-1-david@redhat.com/
> 
> David Hildenbrand (16):
>   util: vfio-helpers: Factor out and fix processing of existing ram
>     blocks
>   util: vfio-helpers: Fix qemu_vfio_close()
>   util: vfio-helpers: Remove Error parameter from
>     qemu_vfio_undo_mapping()
>   util: vfio-helpers: Factor out removal from qemu_vfio_undo_mapping()
>   exec: Factor out setting ram settings (madvise ...) into
>     qemu_ram_apply_settings()
>   exec: Reuse qemu_ram_apply_settings() in qemu_ram_remap()
>   exec: Drop "shared" parameter from ram_block_add()
>   util/mmap-alloc: Factor out calculation of pagesize to mmap_pagesize()
>   util/mmap-alloc: Factor out reserving of a memory region to
>     mmap_reserve()
>   util/mmap-alloc: Factor out populating of memory to mmap_populate()
>   util/mmap-alloc: Prepare for resizable mmaps
>   util/mmap-alloc: Implement resizable mmaps
>   numa: Teach ram block notifiers about resizable ram blocks
>   util: vfio-helpers: Implement ram_block_resized()
>   util: oslib: Resizable anonymous allocations under POSIX
>   exec: Ram blocks with resizable anonymous allocations under POSIX
> 
>  exec.c                     | 104 +++++++++++++++++++----
>  hw/core/numa.c             |  53 +++++++++++-
>  hw/i386/xen/xen-mapcache.c |   7 +-
>  include/exec/cpu-common.h  |   3 +
>  include/exec/memory.h      |   8 ++
>  include/exec/ramlist.h     |  14 +++-
>  include/qemu/mmap-alloc.h  |  21 +++--
>  include/qemu/osdep.h       |   6 +-
>  stubs/ram-block.c          |  20 -----
>  target/i386/hax-mem.c      |   5 +-
>  target/i386/sev.c          |  18 ++--
>  util/mmap-alloc.c          | 165 +++++++++++++++++++++++--------------
>  util/oslib-posix.c         |  37 ++++++++-
>  util/oslib-win32.c         |  14 ++++
>  util/trace-events          |   9 +-
>  util/vfio-helpers.c        | 145 +++++++++++++++++++++-----------
>  16 files changed, 450 insertions(+), 179 deletions(-)
> 

1. I will do resizable -> resizeable
2. I think migration might indeed need some care regarding
   max_length. We should never migrate anything beyond used_length. And
   if we receive something, it should be discarded. Will look into that.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 fixed 13/16] numa: Teach ram block notifiers about resizable ram blocks
  2020-02-12 13:42 ` [PATCH v2 fixed 13/16] numa: Teach ram block notifiers about resizable ram blocks David Hildenbrand
@ 2020-02-13 12:41   ` Paul Durrant
  0 siblings, 0 replies; 46+ messages in thread
From: Paul Durrant @ 2020-02-13 12:41 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Stefano Stabellini, Eduardo Habkost, Michael S . Tsirkin,
	open list:All patches CC here, Dr . David Alan Gilbert,
	xen-devel, Igor Mammedov, Anthony Perard, Paolo Bonzini,
	Richard Henderson

On Wed, 12 Feb 2020 at 14:44, David Hildenbrand <david@redhat.com> wrote:
>
> We want to actually resize ram blocks (make everything between
> used_length and max_length inaccessible) - however, not all ram block
> notifiers will support that. Let's teach the notifier that ram blocks
> are indeed resizable, but keep using max_size in the existing notifiers.
>
> Supply the max_size when adding and removing ram blocks. Also, notify on
> resizes. Introduce a way to detect if any registered notifier does not
> support resizes - ram_block_notifiers_support_resize() - which we can later
> use to fallback to legacy handling if a registered notifier (esp., SEV and
> HAX) does not support actual resizes.
>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Anthony Perard <anthony.perard@citrix.com>
> Cc: Paul Durrant <paul@xen.org>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: xen-devel@lists.xenproject.org
> Cc: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Xen parts...

Acked-by: Paul Durrant <paul@xen.org>


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 fixed 00/16] Ram blocks with resizable anonymous allocations under POSIX
  2020-02-12 18:03 ` [PATCH v2 fixed 00/16] " David Hildenbrand
@ 2020-02-13 13:36   ` David Hildenbrand
  2020-02-14 13:08   ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 46+ messages in thread
From: David Hildenbrand @ 2020-02-13 13:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: Murilo Opsfelder Araujo, Stefano Stabellini, Eduardo Habkost,
	Michael S . Tsirkin, Stefan Weil, Igor Kotrasinski,
	Richard Henderson, Dr . David Alan Gilbert,
	Shameerali Kolothum Thodi, Greg Kurz, Paul Durrant,
	Alex Williamson, Igor Mammedov, Anthony Perard, Paolo Bonzini,
	Stefan Hajnoczi, Richard Henderson

On 12.02.20 19:03, David Hildenbrand wrote:
> On 12.02.20 14:42, David Hildenbrand wrote:
>> We already allow resizable ram blocks for anonymous memory, however, they
>> are not actually resized. All memory is mmaped() R/W, including the memory
>> exceeding the used_length, up to the max_length.
>>
>> When resizing, effectively only the boundary is moved. Implement actually
>> resizable anonymous allocations and make use of them in resizable ram
>> blocks when possible. Memory exceeding the used_length will be
>> inaccessible. Especially ram block notifiers require care.
>>
>> Having actually resizable anonymous allocations (via mmap-hackery) allows
>> to reserve a big region in virtual address space and grow the
>> accessible/usable part on demand. Even if "/proc/sys/vm/overcommit_memory"
>> is set to "never" under Linux, huge reservations will succeed. If there is
>> not enough memory when resizing (to populate parts of the reserved region),
>> trying to resize will fail. Only the actually used size is reserved in the
>> OS.
>>
>> E.g., virtio-mem [1] wants to reserve big resizable memory regions and
>> grow the usable part on demand. I think this change is worth sending out
>> individually. Accompanied by a bunch of minor fixes and cleanups.
>>
>> Especially, memory notifiers already handle resizing by first removing
>> the old region, and then re-adding the resized region. prealloc is
>> currently not possible with resizable ram blocks. mlock() should continue
>> to work as is. Resizing is currently rare and must only happen on the
>> start of an incoming migration, or during resets. No code path (except
>> HAX and SEV ram block notifiers) should access memory outside of the usable
>> range - and if we ever find one, that one has to be fixed (I did not
>> identify any).
>>
>> v1 -> v2:
>> - Add "util: vfio-helpers: Fix qemu_vfio_close()"
>> - Add "util: vfio-helpers: Remove Error parameter from
>>        qemu_vfio_undo_mapping()"
>> - Add "util: vfio-helpers: Factor out removal from
>>        qemu_vfio_undo_mapping()"
>> - "util/mmap-alloc: ..."
>>  -- Minor changes due to review feedback (e.g., assert alignment, return
>>     bool when resizing)
>> - "util: vfio-helpers: Implement ram_block_resized()"
>>  -- Reserve max_size in the IOVA address space.
>>  -- On resize, undo old mapping and do new mapping. We can later implement
>>     a new ioctl to resize the mapping directly.
>> - "numa: Teach ram block notifiers about resizable ram blocks"
>>  -- Pass size/max_size to ram block notifiers, which makes things easier an
>>     cleaner
>> - "exec: Ram blocks with resizable anonymous allocations under POSIX"
>>  -- Adapt to new ram block notifiers
>>  -- Shrink after notifying. Always trigger ram block notifiers on resizes
>>  -- Add a safety net that all ram block notifiers registered at runtime
>>     support resizes.
>>
>> [1] https://lore.kernel.org/kvm/20191212171137.13872-1-david@redhat.com/
>>
>> David Hildenbrand (16):
>>   util: vfio-helpers: Factor out and fix processing of existing ram
>>     blocks
>>   util: vfio-helpers: Fix qemu_vfio_close()
>>   util: vfio-helpers: Remove Error parameter from
>>     qemu_vfio_undo_mapping()
>>   util: vfio-helpers: Factor out removal from qemu_vfio_undo_mapping()
>>   exec: Factor out setting ram settings (madvise ...) into
>>     qemu_ram_apply_settings()
>>   exec: Reuse qemu_ram_apply_settings() in qemu_ram_remap()
>>   exec: Drop "shared" parameter from ram_block_add()
>>   util/mmap-alloc: Factor out calculation of pagesize to mmap_pagesize()
>>   util/mmap-alloc: Factor out reserving of a memory region to
>>     mmap_reserve()
>>   util/mmap-alloc: Factor out populating of memory to mmap_populate()
>>   util/mmap-alloc: Prepare for resizable mmaps
>>   util/mmap-alloc: Implement resizable mmaps
>>   numa: Teach ram block notifiers about resizable ram blocks
>>   util: vfio-helpers: Implement ram_block_resized()
>>   util: oslib: Resizable anonymous allocations under POSIX
>>   exec: Ram blocks with resizable anonymous allocations under POSIX
>>
>>  exec.c                     | 104 +++++++++++++++++++----
>>  hw/core/numa.c             |  53 +++++++++++-
>>  hw/i386/xen/xen-mapcache.c |   7 +-
>>  include/exec/cpu-common.h  |   3 +
>>  include/exec/memory.h      |   8 ++
>>  include/exec/ramlist.h     |  14 +++-
>>  include/qemu/mmap-alloc.h  |  21 +++--
>>  include/qemu/osdep.h       |   6 +-
>>  stubs/ram-block.c          |  20 -----
>>  target/i386/hax-mem.c      |   5 +-
>>  target/i386/sev.c          |  18 ++--
>>  util/mmap-alloc.c          | 165 +++++++++++++++++++++++--------------
>>  util/oslib-posix.c         |  37 ++++++++-
>>  util/oslib-win32.c         |  14 ++++
>>  util/trace-events          |   9 +-
>>  util/vfio-helpers.c        | 145 +++++++++++++++++++++-----------
>>  16 files changed, 450 insertions(+), 179 deletions(-)
>>
> 
> 1. I will do resizable -> resizeable
> 2. I think migration might indeed need some care regarding
>    max_length. We should never migrate anything beyond used_length. And
>    if we receive something, it should be discarded. Will look into that.

So I double checked and migration will never read or write beyond
used_length. Proper offset_in_ramblock() checks are in place. Will add
some cleanups to v3, though :)
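
For reference, the check in question is essentially the following helper
from include/exec/ram_addr.h (cited from memory, the exact wording there
may differ slightly):

    static inline bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset)
    {
        return (uint64_t)offset < b->used_length;
    }

i.e., offsets at or beyond used_length are never read or written.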


-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 fixed 00/16] Ram blocks with resizable anonymous allocations under POSIX
  2020-02-12 18:03 ` [PATCH v2 fixed 00/16] " David Hildenbrand
  2020-02-13 13:36   ` David Hildenbrand
@ 2020-02-14 13:08   ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 46+ messages in thread
From: Dr. David Alan Gilbert @ 2020-02-14 13:08 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Murilo Opsfelder Araujo, Stefano Stabellini, Eduardo Habkost,
	Michael S . Tsirkin, Stefan Weil, Igor Kotrasinski,
	Richard Henderson, qemu-devel, Shameerali Kolothum Thodi,
	Greg Kurz, Paul Durrant, Alex Williamson, Igor Mammedov,
	Anthony Perard, Paolo Bonzini, Stefan Hajnoczi,
	Richard Henderson

* David Hildenbrand (david@redhat.com) wrote:
> On 12.02.20 14:42, David Hildenbrand wrote:
> > We already allow resizable ram blocks for anonymous memory, however, they
> > are not actually resized. All memory is mmaped() R/W, including the memory
> > exceeding the used_length, up to the max_length.
> > 
> > When resizing, effectively only the boundary is moved. Implement actually
> > resizable anonymous allocations and make use of them in resizable ram
> > blocks when possible. Memory exceeding the used_length will be
> > inaccessible. Especially ram block notifiers require care.
> > 
> > Having actually resizable anonymous allocations (via mmap-hackery) allows
> > to reserve a big region in virtual address space and grow the
> > accessible/usable part on demand. Even if "/proc/sys/vm/overcommit_memory"
> > is set to "never" under Linux, huge reservations will succeed. If there is
> > not enough memory when resizing (to populate parts of the reserved region),
> > trying to resize will fail. Only the actually used size is reserved in the
> > OS.
> > 
> > E.g., virtio-mem [1] wants to reserve big resizable memory regions and
> > grow the usable part on demand. I think this change is worth sending out
> > individually. Accompanied by a bunch of minor fixes and cleanups.
> > 
> > Especially, memory notifiers already handle resizing by first removing
> > the old region, and then re-adding the resized region. prealloc is
> > currently not possible with resizable ram blocks. mlock() should continue
> > to work as is. Resizing is currently rare and must only happen on the
> > start of an incoming migration, or during resets. No code path (except
> > HAX and SEV ram block notifiers) should access memory outside of the usable
> > range - and if we ever find one, that one has to be fixed (I did not
> > identify any).
> > 
> > v1 -> v2:
> > - Add "util: vfio-helpers: Fix qemu_vfio_close()"
> > - Add "util: vfio-helpers: Remove Error parameter from
> >        qemu_vfio_undo_mapping()"
> > - Add "util: vfio-helpers: Factor out removal from
> >        qemu_vfio_undo_mapping()"
> > - "util/mmap-alloc: ..."
> >  -- Minor changes due to review feedback (e.g., assert alignment, return
> >     bool when resizing)
> > - "util: vfio-helpers: Implement ram_block_resized()"
> >  -- Reserve max_size in the IOVA address space.
> >  -- On resize, undo old mapping and do new mapping. We can later implement
> >     a new ioctl to resize the mapping directly.
> > - "numa: Teach ram block notifiers about resizable ram blocks"
> >  -- Pass size/max_size to ram block notifiers, which makes things easier an
> >     cleaner
> > - "exec: Ram blocks with resizable anonymous allocations under POSIX"
> >  -- Adapt to new ram block notifiers
> >  -- Shrink after notifying. Always trigger ram block notifiers on resizes
> >  -- Add a safety net that all ram block notifiers registered at runtime
> >     support resizes.
> > 
> > [1] https://lore.kernel.org/kvm/20191212171137.13872-1-david@redhat.com/
> > 
> > David Hildenbrand (16):
> >   util: vfio-helpers: Factor out and fix processing of existing ram
> >     blocks
> >   util: vfio-helpers: Fix qemu_vfio_close()
> >   util: vfio-helpers: Remove Error parameter from
> >     qemu_vfio_undo_mapping()
> >   util: vfio-helpers: Factor out removal from qemu_vfio_undo_mapping()
> >   exec: Factor out setting ram settings (madvise ...) into
> >     qemu_ram_apply_settings()
> >   exec: Reuse qemu_ram_apply_settings() in qemu_ram_remap()
> >   exec: Drop "shared" parameter from ram_block_add()
> >   util/mmap-alloc: Factor out calculation of pagesize to mmap_pagesize()
> >   util/mmap-alloc: Factor out reserving of a memory region to
> >     mmap_reserve()
> >   util/mmap-alloc: Factor out populating of memory to mmap_populate()
> >   util/mmap-alloc: Prepare for resizable mmaps
> >   util/mmap-alloc: Implement resizable mmaps
> >   numa: Teach ram block notifiers about resizable ram blocks
> >   util: vfio-helpers: Implement ram_block_resized()
> >   util: oslib: Resizable anonymous allocations under POSIX
> >   exec: Ram blocks with resizable anonymous allocations under POSIX
> > 
> >  exec.c                     | 104 +++++++++++++++++++----
> >  hw/core/numa.c             |  53 +++++++++++-
> >  hw/i386/xen/xen-mapcache.c |   7 +-
> >  include/exec/cpu-common.h  |   3 +
> >  include/exec/memory.h      |   8 ++
> >  include/exec/ramlist.h     |  14 +++-
> >  include/qemu/mmap-alloc.h  |  21 +++--
> >  include/qemu/osdep.h       |   6 +-
> >  stubs/ram-block.c          |  20 -----
> >  target/i386/hax-mem.c      |   5 +-
> >  target/i386/sev.c          |  18 ++--
> >  util/mmap-alloc.c          | 165 +++++++++++++++++++++++--------------
> >  util/oslib-posix.c         |  37 ++++++++-
> >  util/oslib-win32.c         |  14 ++++
> >  util/trace-events          |   9 +-
> >  util/vfio-helpers.c        | 145 +++++++++++++++++++++-----------
> >  16 files changed, 450 insertions(+), 179 deletions(-)
> > 
> 
> 1. I will do resizable -> resizeable
> 2. I think migration might indeed need some care regarding
>    max_length. We should never migrate anything beyond used_length. And
>    if we receive something, it should be discarded. Will look into that.

It feels like we should warn/error if we receive something beyond used?
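
(Roughly the kind of check I'd expect in the ram load path - just a sketch,
the helper name and the exact failure handling here are illustrative:)

    /* Sketch: only map a received page if it falls within used_length. */
    static void *host_from_block_offset(RAMBlock *block, ram_addr_t offset)
    {
        if (!block || !block->host || offset >= block->used_length) {
            return NULL;    /* caller would then fail the migration stream */
        }
        return block->host + offset;
    }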

Dave

> -- 
> Thanks,
> 
> David / dhildenb
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 fixed 01/16] util: vfio-helpers: Factor out and fix processing of existing ram blocks
  2020-02-12 13:42 ` [PATCH v2 fixed 01/16] util: vfio-helpers: Factor out and fix processing of existing ram blocks David Hildenbrand
@ 2020-02-18 22:00   ` Peter Xu
  2020-02-19  8:43     ` David Hildenbrand
  0 siblings, 1 reply; 46+ messages in thread
From: Peter Xu @ 2020-02-18 22:00 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Michael S . Tsirkin, Dr . David Alan Gilbert,
	qemu-devel, Alex Williamson, Stefan Hajnoczi, Paolo Bonzini,
	Igor Mammedov, Richard Henderson

On Wed, Feb 12, 2020 at 02:42:39PM +0100, David Hildenbrand wrote:
> Factor it out into common code when a new notifier is registered, just
> as done with the memory region notifier. This allows us to have the
> logic about how to process existing ram blocks at a central place (which
> will be extended soon).
> 
> Just like when adding a new ram block, we have to register the max_length
> for now. We don't have a way to get notified about resizes yet, and some
> memory would not be mapped when growing the ram block.
> 
> Note: Currently, ram blocks are only "fake resized". All memory
> (max_length) is accessible.
> 
> We can get rid of a bunch of functions in stubs/ram-block.c . Print the
> warning from inside qemu_vfio_ram_block_added().
> 
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  exec.c                    |  5 +++++
>  hw/core/numa.c            | 14 ++++++++++++++
>  include/exec/cpu-common.h |  1 +
>  stubs/ram-block.c         | 20 --------------------
>  util/vfio-helpers.c       | 28 +++++++---------------------
>  5 files changed, 27 insertions(+), 41 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 67e520d18e..05cfe868ab 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2017,6 +2017,11 @@ ram_addr_t qemu_ram_get_used_length(RAMBlock *rb)
>      return rb->used_length;
>  }
>  
> +ram_addr_t qemu_ram_get_max_length(RAMBlock *rb)
> +{
> +    return rb->max_length;
> +}
> +
>  bool qemu_ram_is_shared(RAMBlock *rb)
>  {
>      return rb->flags & RAM_SHARED;
> diff --git a/hw/core/numa.c b/hw/core/numa.c
> index 0d1b4be76a..6599c69e05 100644
> --- a/hw/core/numa.c
> +++ b/hw/core/numa.c
> @@ -899,9 +899,23 @@ void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms)
>      }
>  }
>  
> +static int ram_block_notify_add_single(RAMBlock *rb, void *opaque)
> +{
> +    const ram_addr_t max_size = qemu_ram_get_max_length(rb);
> +    void *host = qemu_ram_get_host_addr(rb);
> +    RAMBlockNotifier *notifier = opaque;
> +
> +    if (host) {
> +        notifier->ram_block_added(notifier, host, max_size);
> +    }
> +    return 0;
> +}
> +
>  void ram_block_notifier_add(RAMBlockNotifier *n)
>  {
>      QLIST_INSERT_HEAD(&ram_list.ramblock_notifiers, n, next);
> +    /* Notify about all existing ram blocks. */
> +    qemu_ram_foreach_block(ram_block_notify_add_single, n);
>  }
>  
>  void ram_block_notifier_remove(RAMBlockNotifier *n)
> diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
> index 81753bbb34..9760ac9068 100644
> --- a/include/exec/cpu-common.h
> +++ b/include/exec/cpu-common.h
> @@ -59,6 +59,7 @@ const char *qemu_ram_get_idstr(RAMBlock *rb);
>  void *qemu_ram_get_host_addr(RAMBlock *rb);
>  ram_addr_t qemu_ram_get_offset(RAMBlock *rb);
>  ram_addr_t qemu_ram_get_used_length(RAMBlock *rb);
> +ram_addr_t qemu_ram_get_max_length(RAMBlock *rb);
>  bool qemu_ram_is_shared(RAMBlock *rb);
>  bool qemu_ram_is_uf_zeroable(RAMBlock *rb);
>  void qemu_ram_set_uf_zeroable(RAMBlock *rb);
> diff --git a/stubs/ram-block.c b/stubs/ram-block.c
> index 73c0a3ee08..10855b52dd 100644
> --- a/stubs/ram-block.c
> +++ b/stubs/ram-block.c
> @@ -2,21 +2,6 @@
>  #include "exec/ramlist.h"
>  #include "exec/cpu-common.h"
>  
> -void *qemu_ram_get_host_addr(RAMBlock *rb)
> -{
> -    return 0;
> -}
> -
> -ram_addr_t qemu_ram_get_offset(RAMBlock *rb)
> -{
> -    return 0;
> -}
> -
> -ram_addr_t qemu_ram_get_used_length(RAMBlock *rb)
> -{
> -    return 0;
> -}

Maybe put into another patch?

Actually I'm wondering whether it would be worth doing...  They're still
declared in include/exec/cpu-common.h, so logically whoever includes the
header but links against the stubs can still call these functions.  So
keeping them there still makes sense to me.

> -
>  void ram_block_notifier_add(RAMBlockNotifier *n)
>  {
>  }
> @@ -24,8 +9,3 @@ void ram_block_notifier_add(RAMBlockNotifier *n)
>  void ram_block_notifier_remove(RAMBlockNotifier *n)
>  {
>  }
> -
> -int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
> -{
> -    return 0;
> -}
> diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
> index 813f7ec564..71e02e7f35 100644
> --- a/util/vfio-helpers.c
> +++ b/util/vfio-helpers.c
> @@ -376,8 +376,13 @@ static void qemu_vfio_ram_block_added(RAMBlockNotifier *n,
>                                        void *host, size_t size)
>  {
>      QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
> +    int ret;
> +
>      trace_qemu_vfio_ram_block_added(s, host, size);
> -    qemu_vfio_dma_map(s, host, size, false, NULL);
> +    ret = qemu_vfio_dma_map(s, host, size, false, NULL);
> +    if (ret) {
> +        error_report("qemu_vfio_dma_map(%p, %zu) failed: %d", host, size, ret);
> +    }

Irrelevant change (another patch)?

>  }
>  
>  static void qemu_vfio_ram_block_removed(RAMBlockNotifier *n,
> @@ -390,33 +395,14 @@ static void qemu_vfio_ram_block_removed(RAMBlockNotifier *n,
>      }
>  }
>  
> -static int qemu_vfio_init_ramblock(RAMBlock *rb, void *opaque)
> -{
> -    void *host_addr = qemu_ram_get_host_addr(rb);
> -    ram_addr_t length = qemu_ram_get_used_length(rb);
> -    int ret;
> -    QEMUVFIOState *s = opaque;
> -
> -    if (!host_addr) {
> -        return 0;
> -    }
> -    ret = qemu_vfio_dma_map(s, host_addr, length, false, NULL);
> -    if (ret) {
> -        fprintf(stderr, "qemu_vfio_init_ramblock: failed %p %" PRId64 "\n",
> -                host_addr, (uint64_t)length);
> -    }
> -    return 0;
> -}
> -
>  static void qemu_vfio_open_common(QEMUVFIOState *s)
>  {
>      qemu_mutex_init(&s->lock);
>      s->ram_notifier.ram_block_added = qemu_vfio_ram_block_added;
>      s->ram_notifier.ram_block_removed = qemu_vfio_ram_block_removed;
> -    ram_block_notifier_add(&s->ram_notifier);
>      s->low_water_mark = QEMU_VFIO_IOVA_MIN;
>      s->high_water_mark = QEMU_VFIO_IOVA_MAX;
> -    qemu_ram_foreach_block(qemu_vfio_init_ramblock, s);
> +    ram_block_notifier_add(&s->ram_notifier);

Pure question: this looks like a good improvement, however do you know
why HAX and SEV do not need to init ramblock?

Thanks,

>  }
>  
>  /**
> -- 
> 2.24.1
> 
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 fixed 02/16] util: vfio-helpers: Fix qemu_vfio_close()
  2020-02-12 13:42 ` [PATCH v2 fixed 02/16] util: vfio-helpers: Fix qemu_vfio_close() David Hildenbrand
@ 2020-02-18 22:00   ` Peter Xu
  0 siblings, 0 replies; 46+ messages in thread
From: Peter Xu @ 2020-02-18 22:00 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Michael S . Tsirkin, Dr . David Alan Gilbert,
	qemu-devel, Alex Williamson, Stefan Hajnoczi, Paolo Bonzini,
	Igor Mammedov, Richard Henderson

On Wed, Feb 12, 2020 at 02:42:40PM +0100, David Hildenbrand wrote:
> qemu_vfio_undo_mapping() will decrement the number of mappings and
> reshuffle the array elements to fit into the reduced size.
> 
> Iterating over all elements like this does not work as expected, let's make
> sure to remove all mappings properly.
> 
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 fixed 03/16] util: vfio-helpers: Remove Error parameter from qemu_vfio_undo_mapping()
  2020-02-12 13:42 ` [PATCH v2 fixed 03/16] util: vfio-helpers: Remove Error parameter from qemu_vfio_undo_mapping() David Hildenbrand
@ 2020-02-18 22:07   ` Peter Xu
  2020-02-19  8:49     ` David Hildenbrand
  0 siblings, 1 reply; 46+ messages in thread
From: Peter Xu @ 2020-02-18 22:07 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Michael S . Tsirkin, Dr . David Alan Gilbert,
	qemu-devel, Alex Williamson, Stefan Hajnoczi, Paolo Bonzini,
	Igor Mammedov, Richard Henderson

On Wed, Feb 12, 2020 at 02:42:41PM +0100, David Hildenbrand wrote:
> Everybody discards the error. Let's error_report() instead so this error
> doesn't get lost.
> 
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

IMHO error_setg() should be preferred over error_report(),
because it carries a context that is delivered to the caller, so the error
has a better chance of being used in a better way (e.g., QMP only
supports error_setg()).

A better solution would be to deliver the error further up.  For example,
qemu_vfio_dma_map() is one caller of qemu_vfio_undo_mapping(); if you
look at the callers of qemu_vfio_dma_map() you'll notice most of them have
an Error** defined (e.g., nvme_init_queue).  Then we can link all of them
up.

Another, lazier solution (especially if vfio-helpers are still mostly
used only by advanced users) is to simply pass in &error_abort for
the three callers, so the errors won't be missed...
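
(A sketch of what linking them up could look like - the signatures here are
illustrative, not a concrete proposal:)

    /* Sketch: keep the Error ** parameter and propagate instead of printing. */
    static void qemu_vfio_undo_mapping(QEMUVFIOState *s, IOVAMapping *mapping,
                                       Error **errp)
    {
        struct vfio_iommu_type1_dma_unmap unmap = {
            .argsz = sizeof(unmap),
            .flags = 0,
            .iova = mapping->iova,
            .size = mapping->size,
        };

        if (ioctl(s->container, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
            error_setg_errno(errp, errno, "VFIO_UNMAP_DMA failed");
        }
        /* removal from s->mappings[] stays as in the patch */
    }

with qemu_vfio_dma_map() then gaining an Error **errp of its own, so that
callers like nvme_init_queue() can receive the error.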

Thanks,

> ---
>  util/vfio-helpers.c | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
> index d6332522c1..13dd962d95 100644
> --- a/util/vfio-helpers.c
> +++ b/util/vfio-helpers.c
> @@ -540,8 +540,7 @@ static int qemu_vfio_do_mapping(QEMUVFIOState *s, void *host, size_t size,
>  /**
>   * Undo the DMA mapping from @s with VFIO, and remove from mapping list.
>   */
> -static void qemu_vfio_undo_mapping(QEMUVFIOState *s, IOVAMapping *mapping,
> -                                   Error **errp)
> +static void qemu_vfio_undo_mapping(QEMUVFIOState *s, IOVAMapping *mapping)
>  {
>      int index;
>      struct vfio_iommu_type1_dma_unmap unmap = {
> @@ -556,7 +555,7 @@ static void qemu_vfio_undo_mapping(QEMUVFIOState *s, IOVAMapping *mapping,
>      assert(QEMU_IS_ALIGNED(mapping->size, qemu_real_host_page_size));
>      assert(index >= 0 && index < s->nr_mappings);
>      if (ioctl(s->container, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
> -        error_setg(errp, "VFIO_UNMAP_DMA failed: %d", -errno);
> +        error_report("VFIO_UNMAP_DMA failed: %d", -errno);
>      }
>      memmove(mapping, &s->mappings[index + 1],
>              sizeof(s->mappings[0]) * (s->nr_mappings - index - 1));
> @@ -621,7 +620,7 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
>              assert(qemu_vfio_verify_mappings(s));
>              ret = qemu_vfio_do_mapping(s, host, size, iova0);
>              if (ret) {
> -                qemu_vfio_undo_mapping(s, mapping, NULL);
> +                qemu_vfio_undo_mapping(s, mapping);
>                  goto out;
>              }
>              s->low_water_mark += size;
> @@ -681,7 +680,7 @@ void qemu_vfio_dma_unmap(QEMUVFIOState *s, void *host)
>      if (!m) {
>          goto out;
>      }
> -    qemu_vfio_undo_mapping(s, m, NULL);
> +    qemu_vfio_undo_mapping(s, m);
>  out:
>      qemu_mutex_unlock(&s->lock);
>  }
> @@ -698,7 +697,7 @@ void qemu_vfio_close(QEMUVFIOState *s)
>          return;
>      }
>      while (s->nr_mappings) {
> -        qemu_vfio_undo_mapping(s, &s->mappings[s->nr_mappings - 1], NULL);
> +        qemu_vfio_undo_mapping(s, &s->mappings[s->nr_mappings - 1]);
>      }
>      ram_block_notifier_remove(&s->ram_notifier);
>      qemu_vfio_reset(s);
> -- 
> 2.24.1
> 
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 fixed 04/16] util: vfio-helpers: Factor out removal from qemu_vfio_undo_mapping()
  2020-02-12 13:42 ` [PATCH v2 fixed 04/16] util: vfio-helpers: Factor out removal " David Hildenbrand
@ 2020-02-18 22:10   ` Peter Xu
  0 siblings, 0 replies; 46+ messages in thread
From: Peter Xu @ 2020-02-18 22:10 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Michael S . Tsirkin, Dr . David Alan Gilbert,
	qemu-devel, Alex Williamson, Stefan Hajnoczi, Paolo Bonzini,
	Igor Mammedov, Richard Henderson

On Wed, Feb 12, 2020 at 02:42:42PM +0100, David Hildenbrand wrote:
> Factor it out and properly use it where applicable. Make
> qemu_vfio_undo_mapping() look like qemu_vfio_do_mapping(), passing the
> size and iova, not the mapping.
> 
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 fixed 05/16] exec: Factor out setting ram settings (madvise ...) into qemu_ram_apply_settings()
  2020-02-12 13:42 ` [PATCH v2 fixed 05/16] exec: Factor out setting ram settings (madvise ...) into qemu_ram_apply_settings() David Hildenbrand
@ 2020-02-18 22:10   ` Peter Xu
  0 siblings, 0 replies; 46+ messages in thread
From: Peter Xu @ 2020-02-18 22:10 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Michael S . Tsirkin, Richard Henderson,
	qemu-devel, Dr . David Alan Gilbert, Paolo Bonzini,
	Igor Mammedov, Richard Henderson

On Wed, Feb 12, 2020 at 02:42:43PM +0100, David Hildenbrand wrote:
> Factor all settings out into qemu_ram_apply_settings().
> 
> For memory_try_enable_merging(), the important bit is that it won't be
> called with XEN - which is now still the case as new_block->host will
> remain NULL.
> 
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 fixed 06/16] exec: Reuse qemu_ram_apply_settings() in qemu_ram_remap()
  2020-02-12 13:42 ` [PATCH v2 fixed 06/16] exec: Reuse qemu_ram_apply_settings() in qemu_ram_remap() David Hildenbrand
@ 2020-02-18 22:11   ` Peter Xu
  0 siblings, 0 replies; 46+ messages in thread
From: Peter Xu @ 2020-02-18 22:11 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Michael S . Tsirkin, Richard Henderson,
	qemu-devel, Dr . David Alan Gilbert, Paolo Bonzini,
	Igor Mammedov, Richard Henderson

On Wed, Feb 12, 2020 at 02:42:44PM +0100, David Hildenbrand wrote:
> I don't see why we shouldn't apply all settings to make it look like the
> surrounding RAM (and enable proper VMA merging).
> 
> Note: memory backend settings might have overridden these settings. We
> would need a callback to let the memory backend fix that up.
> 
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 fixed 07/16] exec: Drop "shared" parameter from ram_block_add()
  2020-02-12 13:42 ` [PATCH v2 fixed 07/16] exec: Drop "shared" parameter from ram_block_add() David Hildenbrand
@ 2020-02-18 22:12   ` Peter Xu
  0 siblings, 0 replies; 46+ messages in thread
From: Peter Xu @ 2020-02-18 22:12 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	Richard Henderson, qemu-devel, Dr . David Alan Gilbert,
	Paolo Bonzini, Igor Mammedov, Richard Henderson

On Wed, Feb 12, 2020 at 02:42:45PM +0100, David Hildenbrand wrote:
> Properly store it in the flags of the ram block instead (and the flag
> even already exists and is used).
> 
> E.g., qemu_ram_is_shared() now properly succeeds on all ram blocks that are
> actually shared.
> 
> Reviewed-by: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 fixed 01/16] util: vfio-helpers: Factor out and fix processing of existing ram blocks
  2020-02-18 22:00   ` Peter Xu
@ 2020-02-19  8:43     ` David Hildenbrand
  2020-02-19 11:27       ` David Hildenbrand
  2020-02-19 17:34       ` Peter Xu
  0 siblings, 2 replies; 46+ messages in thread
From: David Hildenbrand @ 2020-02-19  8:43 UTC (permalink / raw)
  To: Peter Xu
  Cc: Eduardo Habkost, Michael S . Tsirkin, Dr . David Alan Gilbert,
	qemu-devel, Alex Williamson, Stefan Hajnoczi, Paolo Bonzini,
	Igor Mammedov, Richard Henderson

On 18.02.20 23:00, Peter Xu wrote:
> On Wed, Feb 12, 2020 at 02:42:39PM +0100, David Hildenbrand wrote:
>> Factor it out into common code when a new notifier is registered, just
>> as done with the memory region notifier. This allows us to have the
>> logic about how to process existing ram blocks at a central place (which
>> will be extended soon).
>>
>> Just like when adding a new ram block, we have to register the max_length
>> for now. We don't have a way to get notified about resizes yet, and some
>> memory would not be mapped when growing the ram block.
>>
>> Note: Currently, ram blocks are only "fake resized". All memory
>> (max_length) is accessible.
>>
>> We can get rid of a bunch of functions in stubs/ram-block.c . Print the
>> warning from inside qemu_vfio_ram_block_added().

[...]

>>  #include "exec/ramlist.h"
>>  #include "exec/cpu-common.h"
>>  
>> -void *qemu_ram_get_host_addr(RAMBlock *rb)
>> -{
>> -    return 0;
>> -}
>> -
>> -ram_addr_t qemu_ram_get_offset(RAMBlock *rb)
>> -{
>> -    return 0;
>> -}
>> -
>> -ram_addr_t qemu_ram_get_used_length(RAMBlock *rb)
>> -{
>> -    return 0;
>> -}
> 
> Maybe put into another patch?
> 
> Actually I'm thinking whether it would worth to do...  They're still
> declared in include/exec/cpu-common.h, so logically who includes the
> header but linked against stubs can still call this function.  So
> keeping them there still make sense to me.

Why keep dead code around? If you look closely, the stubs really only
contain what's strictly necessary to make the current code compile, not
every available ramblock-related function.

I don't see a good reason for a separate patch either (after all, we're
removing the last users in this patch), but if more people agree, I can
move it to a separate patch.
[...]

>> diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
>> index 813f7ec564..71e02e7f35 100644
>> --- a/util/vfio-helpers.c
>> +++ b/util/vfio-helpers.c
>> @@ -376,8 +376,13 @@ static void qemu_vfio_ram_block_added(RAMBlockNotifier *n,
>>                                        void *host, size_t size)
>>  {
>>      QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
>> +    int ret;
>> +
>>      trace_qemu_vfio_ram_block_added(s, host, size);
>> -    qemu_vfio_dma_map(s, host, size, false, NULL);
>> +    ret = qemu_vfio_dma_map(s, host, size, false, NULL);
>> +    if (ret) {
>> +        error_report("qemu_vfio_dma_map(%p, %zu) failed: %d", host, size, ret);
>> +    }
> 
> Irrelevant change (another patch)?

This is the error that was printed in qemu_vfio_init_ramblock(). Not
moving it in this patch would mean we would stop printing the error.
[...]

>> -
>>  static void qemu_vfio_open_common(QEMUVFIOState *s)
>>  {
>>      qemu_mutex_init(&s->lock);
>>      s->ram_notifier.ram_block_added = qemu_vfio_ram_block_added;
>>      s->ram_notifier.ram_block_removed = qemu_vfio_ram_block_removed;
>> -    ram_block_notifier_add(&s->ram_notifier);
>>      s->low_water_mark = QEMU_VFIO_IOVA_MIN;
>>      s->high_water_mark = QEMU_VFIO_IOVA_MAX;
>> -    qemu_ram_foreach_block(qemu_vfio_init_ramblock, s);
>> +    ram_block_notifier_add(&s->ram_notifier);
> 
> Pure question: this looks like a good improvement, however do you know
> why HAX and SEV do not need to init ramblock?

They register very early (e.g., at accel init time), before any ram
blocks are added.
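
(From memory - just to illustrate the ordering, the details may differ - the
SEV side does something like this from its accel setup path:)

    static struct RAMBlockNotifier sev_ram_notifier = {
        .ram_block_added = sev_ram_block_added,
        .ram_block_removed = sev_ram_block_removed,
    };

    /* registered during accelerator init, i.e. before any RAMBlock exists */
    ram_block_notifier_add(&sev_ram_notifier);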

Thanks for your review!

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 fixed 03/16] util: vfio-helpers: Remove Error parameter from qemu_vfio_undo_mapping()
  2020-02-18 22:07   ` Peter Xu
@ 2020-02-19  8:49     ` David Hildenbrand
  0 siblings, 0 replies; 46+ messages in thread
From: David Hildenbrand @ 2020-02-19  8:49 UTC (permalink / raw)
  To: Peter Xu
  Cc: Eduardo Habkost, Michael S . Tsirkin, Dr . David Alan Gilbert,
	qemu-devel, Alex Williamson, Stefan Hajnoczi, Paolo Bonzini,
	Igor Mammedov, Richard Henderson

On 18.02.20 23:07, Peter Xu wrote:
> On Wed, Feb 12, 2020 at 02:42:41PM +0100, David Hildenbrand wrote:
>> Everybody discards the error. Let's error_report() instead so this error
>> doesn't get lost.
>>
>> Cc: Richard Henderson <rth@twiddle.net>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Eduardo Habkost <ehabkost@redhat.com>
>> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
>> Cc: Alex Williamson <alex.williamson@redhat.com>
>> Cc: Stefan Hajnoczi <stefanha@redhat.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
> 
> IMHO error_setg() should be preferred comparing to error_report()
> because it has a context to be delivered to the caller, so the error
> has a better chance to be used in a better way (e.g., QMP only
> supports error_setg()).

Please note that I decided to go for error_report() because that's also
what's done in the matching opposite operation, qemu_vfio_do_mapping().

> 
> A better solution is that we deliver the error upper.  For example,
> qemu_vfio_dma_map() is one caller of qemu_vfio_undo_mapping, if you
> see the callers of qemu_vfio_dma_map() you'll notice most of them has
> Error** defined (e.g., nvme_init_queue).  Then we can link all of them
> up.

Propagating errors is helpful if the caller can actually do something
with the error. If it's a function that's supposed to never fail (which
is the case here obviously), then there is not much benefit in doing so.

> 
> Another lazy solution (and especially if vfio-helpers are still mostly
> used only by advanced users) is we can simply pass in &error_abort for
> the three callers then they won't be missed...

I prefer this variant, as it matches qemu_vfio_do_mapping() - except
that we don't report -errno - which is fine, because the function is
expected to not fail in any sane use case.

Thanks!

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 fixed 01/16] util: vfio-helpers: Factor out and fix processing of existing ram blocks
  2020-02-19  8:43     ` David Hildenbrand
@ 2020-02-19 11:27       ` David Hildenbrand
  2020-02-19 17:34       ` Peter Xu
  1 sibling, 0 replies; 46+ messages in thread
From: David Hildenbrand @ 2020-02-19 11:27 UTC (permalink / raw)
  To: Peter Xu
  Cc: Eduardo Habkost, Michael S . Tsirkin, Dr . David Alan Gilbert,
	qemu-devel, Alex Williamson, Stefan Hajnoczi, Paolo Bonzini,
	Igor Mammedov, Richard Henderson

On 19.02.20 09:43, David Hildenbrand wrote:
> On 18.02.20 23:00, Peter Xu wrote:
>> On Wed, Feb 12, 2020 at 02:42:39PM +0100, David Hildenbrand wrote:
>>> Factor it out into common code when a new notifier is registered, just
>>> as done with the memory region notifier. This allows us to have the
>>> logic about how to process existing ram blocks at a central place (which
>>> will be extended soon).
>>>
>>> Just like when adding a new ram block, we have to register the max_length
>>> for now. We don't have a way to get notified about resizes yet, and some
>>> memory would not be mapped when growing the ram block.
>>>
>>> Note: Currently, ram blocks are only "fake resized". All memory
>>> (max_length) is accessible.
>>>
>>> We can get rid of a bunch of functions in stubs/ram-block.c . Print the
>>> warning from inside qemu_vfio_ram_block_added().
> 
> [...]
> 
>>>  #include "exec/ramlist.h"
>>>  #include "exec/cpu-common.h"
>>>  
>>> -void *qemu_ram_get_host_addr(RAMBlock *rb)
>>> -{
>>> -    return 0;
>>> -}
>>> -
>>> -ram_addr_t qemu_ram_get_offset(RAMBlock *rb)
>>> -{
>>> -    return 0;
>>> -}
>>> -
>>> -ram_addr_t qemu_ram_get_used_length(RAMBlock *rb)
>>> -{
>>> -    return 0;
>>> -}
>>
>> Maybe put into another patch?
>>
>> Actually I'm thinking whether it would worth to do...  They're still
>> declared in include/exec/cpu-common.h, so logically who includes the
>> header but linked against stubs can still call this function.  So
>> keeping them there still make sense to me.
> 
> Why keep dead code around? If you look closely, the stubs really only
> contain what's strictly necessary to make current code compile, not any
> available ramblock related function.
> 
> I don't see a good reason for a separate patch either (after all, we're
> removing the last users in this patch), but if more people agree, I can
> move it to a separate patch.

FWIW, moved it to a separate patch :)


-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 fixed 01/16] util: vfio-helpers: Factor out and fix processing of existing ram blocks
  2020-02-19  8:43     ` David Hildenbrand
  2020-02-19 11:27       ` David Hildenbrand
@ 2020-02-19 17:34       ` Peter Xu
  1 sibling, 0 replies; 46+ messages in thread
From: Peter Xu @ 2020-02-19 17:34 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Michael S . Tsirkin, Dr . David Alan Gilbert,
	qemu-devel, Alex Williamson, Stefan Hajnoczi, Paolo Bonzini,
	Igor Mammedov, Richard Henderson

On Wed, Feb 19, 2020 at 09:43:02AM +0100, David Hildenbrand wrote:
> On 18.02.20 23:00, Peter Xu wrote:
> > On Wed, Feb 12, 2020 at 02:42:39PM +0100, David Hildenbrand wrote:
> >> Factor it out into common code when a new notifier is registered, just
> >> as done with the memory region notifier. This allows us to have the
> >> logic about how to process existing ram blocks at a central place (which
> >> will be extended soon).
> >>
> >> Just like when adding a new ram block, we have to register the max_length
> >> for now. We don't have a way to get notified about resizes yet, and some
> >> memory would not be mapped when growing the ram block.
> >>
> >> Note: Currently, ram blocks are only "fake resized". All memory
> >> (max_length) is accessible.
> >>
> >> We can get rid of a bunch of functions in stubs/ram-block.c . Print the
> >> warning from inside qemu_vfio_ram_block_added().
> 
> [...]
> 
> >>  #include "exec/ramlist.h"
> >>  #include "exec/cpu-common.h"
> >>  
> >> -void *qemu_ram_get_host_addr(RAMBlock *rb)
> >> -{
> >> -    return 0;
> >> -}
> >> -
> >> -ram_addr_t qemu_ram_get_offset(RAMBlock *rb)
> >> -{
> >> -    return 0;
> >> -}
> >> -
> >> -ram_addr_t qemu_ram_get_used_length(RAMBlock *rb)
> >> -{
> >> -    return 0;
> >> -}
> > 
> > Maybe put into another patch?
> > 
> > Actually I'm thinking whether it would worth to do...  They're still
> > declared in include/exec/cpu-common.h, so logically who includes the
> > header but linked against stubs can still call this function.  So
> > keeping them there still make sense to me.
> 
> Why keep dead code around? If you look closely, the stubs really only
> contain what's strictly necessary to make current code compile, not any
> available ramblock related function.

Seems correct.  Then it's fine.

> 
> I don't see a good reason for a separate patch either (after all, we're
> removing the last users in this patch), but if more people agree, I can
> move it to a separate patch.
> [...]
> 
> >> diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
> >> index 813f7ec564..71e02e7f35 100644
> >> --- a/util/vfio-helpers.c
> >> +++ b/util/vfio-helpers.c
> >> @@ -376,8 +376,13 @@ static void qemu_vfio_ram_block_added(RAMBlockNotifier *n,
> >>                                        void *host, size_t size)
> >>  {
> >>      QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
> >> +    int ret;
> >> +
> >>      trace_qemu_vfio_ram_block_added(s, host, size);
> >> -    qemu_vfio_dma_map(s, host, size, false, NULL);
> >> +    ret = qemu_vfio_dma_map(s, host, size, false, NULL);
> >> +    if (ret) {
> >> +        error_report("qemu_vfio_dma_map(%p, %zu) failed: %d", host, size, ret);
> >> +    }
> > 
> > Irrelevant change (another patch)?
> 
> This is the error that was printed in qemu_vfio_init_ramblock(). Not
> moving it in this patch would mean we would stop printing the error.
> [...]
> 
> >> -
> >>  static void qemu_vfio_open_common(QEMUVFIOState *s)
> >>  {
> >>      qemu_mutex_init(&s->lock);
> >>      s->ram_notifier.ram_block_added = qemu_vfio_ram_block_added;
> >>      s->ram_notifier.ram_block_removed = qemu_vfio_ram_block_removed;
> >> -    ram_block_notifier_add(&s->ram_notifier);
> >>      s->low_water_mark = QEMU_VFIO_IOVA_MIN;
> >>      s->high_water_mark = QEMU_VFIO_IOVA_MAX;
> >> -    qemu_ram_foreach_block(qemu_vfio_init_ramblock, s);
> >> +    ram_block_notifier_add(&s->ram_notifier);
> > 
> > Pure question: this looks like a good improvement, however do you know
> > why HAX and SEV do not need to init ramblock?
> 
> They register very early (e.g., at accel init time), before any ram
> blocks are added.

That's what I thought, but I do feel like it's tricky (nothing
to do with this patch, though).  Say, I don't see how it's guaranteed
that accel init will always happen before any ramblocks are created.

Anyway, your patch looks good from that point of view. :)

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 fixed 08/16] util/mmap-alloc: Factor out calculation of pagesize to mmap_pagesize()
  2020-02-12 13:42 ` [PATCH v2 fixed 08/16] util/mmap-alloc: Factor out calculation of pagesize to mmap_pagesize() David Hildenbrand
@ 2020-02-19 22:46   ` Peter Xu
  2020-02-24 10:50     ` David Hildenbrand
  0 siblings, 1 reply; 46+ messages in thread
From: Peter Xu @ 2020-02-19 22:46 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	Richard Henderson, qemu-devel, Dr . David Alan Gilbert,
	Greg Kurz, Paolo Bonzini, Murilo Opsfelder Araujo, Igor Mammedov,
	Richard Henderson

On Wed, Feb 12, 2020 at 02:42:46PM +0100, David Hildenbrand wrote:
> Factor it out and add a comment.
> 
> Reviewed-by: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
> Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
> Cc: Greg Kurz <groug@kaod.org>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  util/mmap-alloc.c | 21 ++++++++++++---------
>  1 file changed, 12 insertions(+), 9 deletions(-)
> 
> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> index 27dcccd8ec..82f02a2cec 100644
> --- a/util/mmap-alloc.c
> +++ b/util/mmap-alloc.c
> @@ -82,17 +82,27 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
>      return qemu_real_host_page_size;
>  }
>  
> +static inline size_t mmap_pagesize(int fd)
> +{
> +#if defined(__powerpc64__) && defined(__linux__)
> +    /* Mappings in the same segment must share the same page size */
> +    return qemu_fd_getpagesize(fd);
> +#else
> +    return qemu_real_host_page_size;
> +#endif
> +}

Pure question: This will return 4K even for huge pages on x86, is this
what we want?

This is of course not related to this specific patch, which still
follows the old code, but I'm wondering whether it was intended
even in the old code (or does it have anything to do with the
MAP_NORESERVE fix for ppc64 huge pages?).  Do you know the answer?

Thanks,

> +
>  void *qemu_ram_mmap(int fd,
>                      size_t size,
>                      size_t align,
>                      bool shared,
>                      bool is_pmem)
>  {
> +    const size_t pagesize = mmap_pagesize(fd);
>      int flags;
>      int map_sync_flags = 0;
>      int guardfd;
>      size_t offset;
> -    size_t pagesize;
>      size_t total;
>      void *guardptr;
>      void *ptr;
> @@ -113,7 +123,6 @@ void *qemu_ram_mmap(int fd,
>       * anonymous memory is OK.
>       */
>      flags = MAP_PRIVATE;
> -    pagesize = qemu_fd_getpagesize(fd);
>      if (fd == -1 || pagesize == qemu_real_host_page_size) {
>          guardfd = -1;
>          flags |= MAP_ANONYMOUS;
> @@ -123,7 +132,6 @@ void *qemu_ram_mmap(int fd,
>      }
>  #else
>      guardfd = -1;
> -    pagesize = qemu_real_host_page_size;
>      flags = MAP_PRIVATE | MAP_ANONYMOUS;
>  #endif
>  
> @@ -198,15 +206,10 @@ void *qemu_ram_mmap(int fd,
>  
>  void qemu_ram_munmap(int fd, void *ptr, size_t size)
>  {
> -    size_t pagesize;
> +    const size_t pagesize = mmap_pagesize(fd);
>  
>      if (ptr) {
>          /* Unmap both the RAM block and the guard page */
> -#if defined(__powerpc64__) && defined(__linux__)
> -        pagesize = qemu_fd_getpagesize(fd);
> -#else
> -        pagesize = qemu_real_host_page_size;
> -#endif
>          munmap(ptr, size + pagesize);
>      }
>  }
> -- 
> 2.24.1
> 
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 fixed 09/16] util/mmap-alloc: Factor out reserving of a memory region to mmap_reserve()
  2020-02-12 13:42 ` [PATCH v2 fixed 09/16] util/mmap-alloc: Factor out reserving of a memory region to mmap_reserve() David Hildenbrand
@ 2020-02-19 22:47   ` Peter Xu
  0 siblings, 0 replies; 46+ messages in thread
From: Peter Xu @ 2020-02-19 22:47 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	Richard Henderson, qemu-devel, Dr . David Alan Gilbert,
	Greg Kurz, Paolo Bonzini, Murilo Opsfelder Araujo, Igor Mammedov,
	Richard Henderson

On Wed, Feb 12, 2020 at 02:42:47PM +0100, David Hildenbrand wrote:
> We want to reserve a memory region without actually populating memory.
> Let's factor that out.
> 
> Reviewed-by: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
> Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Greg Kurz <groug@kaod.org>
> Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 fixed 10/16] util/mmap-alloc: Factor out populating of memory to mmap_populate()
  2020-02-12 13:42 ` [PATCH v2 fixed 10/16] util/mmap-alloc: Factor out populating of memory to mmap_populate() David Hildenbrand
@ 2020-02-19 22:49   ` Peter Xu
  2020-02-24 10:54     ` David Hildenbrand
  0 siblings, 1 reply; 46+ messages in thread
From: Peter Xu @ 2020-02-19 22:49 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	Richard Henderson, qemu-devel, Dr . David Alan Gilbert,
	Greg Kurz, Paolo Bonzini, Murilo Opsfelder Araujo, Igor Mammedov,
	Richard Henderson

On Wed, Feb 12, 2020 at 02:42:48PM +0100, David Hildenbrand wrote:
> We want to populate memory within a reserved memory region. Let's factor
> that out.
> 
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
> Cc: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Greg Kurz <groug@kaod.org>
> Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

The naming could be a bit misleading IMO, because we don't actually populate
the memory and it's still serviced on demand.  However, I don't have a
quick, better name for it either...
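
(For readers without the patch at hand: roughly, the helper maps the requested
range read/write on top of the reserved region - a simplified sketch, not the
exact code:)

    /* Sketch: make [ptr, ptr + size) accessible inside a region that was
     * previously reserved with PROT_NONE; everything beyond it stays
     * inaccessible. */
    static void *mmap_populate(void *ptr, size_t size, int fd, bool shared)
    {
        int flags = MAP_FIXED | (shared ? MAP_SHARED : MAP_PRIVATE) |
                    (fd == -1 ? MAP_ANONYMOUS : 0);

        return mmap(ptr, size, PROT_READ | PROT_WRITE, flags, fd, 0);
    }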

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 fixed 11/16] util/mmap-alloc: Prepare for resizable mmaps
  2020-02-12 13:42 ` [PATCH v2 fixed 11/16] util/mmap-alloc: Prepare for resizable mmaps David Hildenbrand
@ 2020-02-19 22:50   ` Peter Xu
  2020-02-24 11:00     ` David Hildenbrand
  0 siblings, 1 reply; 46+ messages in thread
From: Peter Xu @ 2020-02-19 22:50 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	Richard Henderson, qemu-devel, Dr . David Alan Gilbert,
	Greg Kurz, Paolo Bonzini, Murilo Opsfelder Araujo, Igor Mammedov,
	Richard Henderson

On Wed, Feb 12, 2020 at 02:42:49PM +0100, David Hildenbrand wrote:
> @@ -178,13 +183,15 @@ void *qemu_ram_mmap(int fd,
>      size_t offset, total;
>      void *ptr, *guardptr;
>  
> +    g_assert(QEMU_IS_ALIGNED(size, pagesize));

(NOTE: the assertion is fine, but as I mentioned in the previous patch, I
 think this pagesize might not be the real one that's going to be
 mapped...)

> +
>      /*
>       * Note: this always allocates at least one extra page of virtual address
>       * space, even if size is already aligned.
>       */
>      total = size + align;
>  
> -    guardptr = mmap_reserve(total, fd);
> +    guardptr = mmap_reserve(0, total, fd);

s/0/NULL/

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 fixed 08/16] util/mmap-alloc: Factor out calculation of pagesize to mmap_pagesize()
  2020-02-19 22:46   ` Peter Xu
@ 2020-02-24 10:50     ` David Hildenbrand
  2020-02-24 10:57       ` David Hildenbrand
  0 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2020-02-24 10:50 UTC (permalink / raw)
  To: Peter Xu
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	Richard Henderson, qemu-devel, Dr . David Alan Gilbert,
	Greg Kurz, Paolo Bonzini, Murilo Opsfelder Araujo, Igor Mammedov,
	Richard Henderson

On 19.02.20 23:46, Peter Xu wrote:
> On Wed, Feb 12, 2020 at 02:42:46PM +0100, David Hildenbrand wrote:
>> Factor it out and add a comment.
>>
>> Reviewed-by: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
>> Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
>> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>> Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
>> Cc: Greg Kurz <groug@kaod.org>
>> Cc: Eduardo Habkost <ehabkost@redhat.com>
>> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>> Cc: Igor Mammedov <imammedo@redhat.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  util/mmap-alloc.c | 21 ++++++++++++---------
>>  1 file changed, 12 insertions(+), 9 deletions(-)
>>
>> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
>> index 27dcccd8ec..82f02a2cec 100644
>> --- a/util/mmap-alloc.c
>> +++ b/util/mmap-alloc.c
>> @@ -82,17 +82,27 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
>>      return qemu_real_host_page_size;
>>  }
>>  
>> +static inline size_t mmap_pagesize(int fd)
>> +{
>> +#if defined(__powerpc64__) && defined(__linux__)
>> +    /* Mappings in the same segment must share the same page size */
>> +    return qemu_fd_getpagesize(fd);
>> +#else
>> +    return qemu_real_host_page_size;
>> +#endif
>> +}
> 
> Pure question: This will return 4K even for huge pages on x86, is this
> what we want?

(was asking myself the same question) I *think* it's intended. It's
mainly only used to allocate one additional guard page. The callers of
qemu_ram_mmap() make sure that the size is properly aligned (e.g., to
huge pages).

Of course, a 4k guard page is sufficient - unless we can't use that
(special case for ppc64 here).
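
(Standalone illustration of the guard-page idea - not the QEMU code, just the
concept, assuming Linux for MAP_NORESERVE:)

    #include <stddef.h>
    #include <sys/mman.h>

    /* Reserve size + one guard page; only [ptr, ptr + size) becomes usable,
     * the trailing page stays PROT_NONE to catch accesses past the end. */
    static void *alloc_with_guard(size_t size, size_t pagesize)
    {
        size_t total = size + pagesize;
        void *ptr = mmap(NULL, total, PROT_NONE,
                         MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

        if (ptr == MAP_FAILED) {
            return NULL;
        }
        if (mprotect(ptr, size, PROT_READ | PROT_WRITE) != 0) {
            munmap(ptr, total);
            return NULL;
        }
        return ptr;
    }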

Thanks!

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 fixed 10/16] util/mmap-alloc: Factor out populating of memory to mmap_populate()
  2020-02-19 22:49   ` Peter Xu
@ 2020-02-24 10:54     ` David Hildenbrand
  0 siblings, 0 replies; 46+ messages in thread
From: David Hildenbrand @ 2020-02-24 10:54 UTC (permalink / raw)
  To: Peter Xu
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	Richard Henderson, qemu-devel, Dr . David Alan Gilbert,
	Greg Kurz, Paolo Bonzini, Murilo Opsfelder Araujo, Igor Mammedov,
	Richard Henderson

On 19.02.20 23:49, Peter Xu wrote:
> On Wed, Feb 12, 2020 at 02:42:48PM +0100, David Hildenbrand wrote:
>> We want to populate memory within a reserved memory region. Let's factor
>> that out.
>>
>> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>> Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
>> Cc: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>> Cc: Greg Kurz <groug@kaod.org>
>> Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
>> Cc: Eduardo Habkost <ehabkost@redhat.com>
>> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>> Cc: Igor Mammedov <imammedo@redhat.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
> 
> The naming could be a bit misleading IMO, because we didn't populate
> the memory and it's still serviced on demand.  However I don't have a
> quick and better name of that either...

Right, it depends, I would say. With enable_mlock it would get serviced
right away. I also didn't find a better matching name ...

Thanks!

> 
> Reviewed-by: Peter Xu <peterx@redhat.com>
> 
-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 fixed 08/16] util/mmap-alloc: Factor out calculation of pagesize to mmap_pagesize()
  2020-02-24 10:50     ` David Hildenbrand
@ 2020-02-24 10:57       ` David Hildenbrand
  2020-02-24 14:16         ` Murilo Opsfelder Araújo
  2020-02-24 17:36         ` Peter Xu
  0 siblings, 2 replies; 46+ messages in thread
From: David Hildenbrand @ 2020-02-24 10:57 UTC (permalink / raw)
  To: Peter Xu
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	Richard Henderson, qemu-devel, Dr . David Alan Gilbert,
	Greg Kurz, Paolo Bonzini, Murilo Opsfelder Araujo, Igor Mammedov,
	Richard Henderson

On 24.02.20 11:50, David Hildenbrand wrote:
> On 19.02.20 23:46, Peter Xu wrote:
>> On Wed, Feb 12, 2020 at 02:42:46PM +0100, David Hildenbrand wrote:
>>> Factor it out and add a comment.
>>>
>>> Reviewed-by: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
>>> Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
>>> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>>> Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
>>> Cc: Greg Kurz <groug@kaod.org>
>>> Cc: Eduardo Habkost <ehabkost@redhat.com>
>>> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>> Cc: Igor Mammedov <imammedo@redhat.com>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>>  util/mmap-alloc.c | 21 ++++++++++++---------
>>>  1 file changed, 12 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
>>> index 27dcccd8ec..82f02a2cec 100644
>>> --- a/util/mmap-alloc.c
>>> +++ b/util/mmap-alloc.c
>>> @@ -82,17 +82,27 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
>>>      return qemu_real_host_page_size;
>>>  }
>>>  
>>> +static inline size_t mmap_pagesize(int fd)
>>> +{
>>> +#if defined(__powerpc64__) && defined(__linux__)
>>> +    /* Mappings in the same segment must share the same page size */
>>> +    return qemu_fd_getpagesize(fd);
>>> +#else
>>> +    return qemu_real_host_page_size;
>>> +#endif
>>> +}
>>
>> Pure question: This will return 4K even for huge pages on x86, is this
>> what we want?
> 
> (was asking myself the same question) I *think* it's intended. It's
> mainly only used to allocate one additional guard page. The callers of
> qemu_ram_mmap() make sure that the size is properly aligned (e.g., to
> huge pages).
> 
> Of course, a 4k guard page is sufficient - unless we can't use that
> (special case for ppc64 here).
> 
> Thanks!
> 

We could rename the function to mmap_guard_pagesize(), thoughts?

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 fixed 11/16] util/mmap-alloc: Prepare for resizable mmaps
  2020-02-19 22:50   ` Peter Xu
@ 2020-02-24 11:00     ` David Hildenbrand
  0 siblings, 0 replies; 46+ messages in thread
From: David Hildenbrand @ 2020-02-24 11:00 UTC (permalink / raw)
  To: Peter Xu
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	Richard Henderson, qemu-devel, Dr . David Alan Gilbert,
	Greg Kurz, Paolo Bonzini, Murilo Opsfelder Araujo, Igor Mammedov,
	Richard Henderson

On 19.02.20 23:50, Peter Xu wrote:
> On Wed, Feb 12, 2020 at 02:42:49PM +0100, David Hildenbrand wrote:
>> @@ -178,13 +183,15 @@ void *qemu_ram_mmap(int fd,
>>      size_t offset, total;
>>      void *ptr, *guardptr;
>>  
>> +    g_assert(QEMU_IS_ALIGNED(size, pagesize));
> 
> (NOTE: assertion is fine, but as I mentioned in previous patch, I
>  think this pagesize could not be the real one that's going to be
>  mapped...)

Right, maybe rename mmap_pagesize() to mmap_guard_pagesize() and provide
the real pagesize via mmap_pagesize() + assert sizes against that.
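
Roughly, sketching the split with the names from this thread (just an idea,
not a patch):

    /* page size the guard page must have (ppc64 segment restriction) */
    static size_t mmap_guard_pagesize(int fd)
    {
    #if defined(__powerpc64__) && defined(__linux__)
        /* Mappings in the same segment must share the same page size. */
        return qemu_fd_getpagesize(fd);
    #else
        return qemu_real_host_page_size;
    #endif
    }

    /* real page size backing the mapping, for size/alignment assertions */
    static size_t mmap_pagesize(int fd)
    {
        return qemu_fd_getpagesize(fd);
    }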


-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 fixed 08/16] util/mmap-alloc: Factor out calculation of pagesize to mmap_pagesize()
  2020-02-24 10:57       ` David Hildenbrand
@ 2020-02-24 14:16         ` Murilo Opsfelder Araújo
  2020-02-24 14:25           ` Murilo Opsfelder Araújo
  2020-02-26 17:36           ` David Hildenbrand
  2020-02-24 17:36         ` Peter Xu
  1 sibling, 2 replies; 46+ messages in thread
From: Murilo Opsfelder Araújo @ 2020-02-24 14:16 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	Richard Henderson, qemu-devel, Peter Xu, Dr . David Alan Gilbert,
	Greg Kurz, Paolo Bonzini, Igor Mammedov, Richard Henderson

On Monday, February 24, 2020 7:57:03 AM -03 David Hildenbrand wrote:
> On 24.02.20 11:50, David Hildenbrand wrote:
> > On 19.02.20 23:46, Peter Xu wrote:
> >> On Wed, Feb 12, 2020 at 02:42:46PM +0100, David Hildenbrand wrote:
> >>> Factor it out and add a comment.
> >>>
> >>> Reviewed-by: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
> >>> Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
> >>> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> >>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> >>> Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
> >>> Cc: Greg Kurz <groug@kaod.org>
> >>> Cc: Eduardo Habkost <ehabkost@redhat.com>
> >>> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >>> Cc: Igor Mammedov <imammedo@redhat.com>
> >>> Signed-off-by: David Hildenbrand <david@redhat.com>
> >>> ---
> >>>
> >>>  util/mmap-alloc.c | 21 ++++++++++++---------
> >>>  1 file changed, 12 insertions(+), 9 deletions(-)
> >>>
> >>> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> >>> index 27dcccd8ec..82f02a2cec 100644
> >>> --- a/util/mmap-alloc.c
> >>> +++ b/util/mmap-alloc.c
> >>> @@ -82,17 +82,27 @@ size_t qemu_mempath_getpagesize(const char
> >>> *mem_path)
> >>>
> >>>      return qemu_real_host_page_size;
> >>>
> >>>  }
> >>>
> >>> +static inline size_t mmap_pagesize(int fd)
> >>> +{
> >>> +#if defined(__powerpc64__) && defined(__linux__)
> >>> +    /* Mappings in the same segment must share the same page size */
> >>> +    return qemu_fd_getpagesize(fd);
> >>> +#else
> >>> +    return qemu_real_host_page_size;
> >>> +#endif
> >>> +}
> >>
> >> Pure question: This will return 4K even for huge pages on x86, is this
> >> what we want?
> >
> > (was asking myself the same question) I *think* it's intended. It's
> > mainly only used to allocate one additional guard page. The callers of
> > qemu_ram_mmap() make sure that the size is properly aligned (e.g., to
> > huge pages).
> >
> > Of course, a 4k guard page is sufficient - unless we can't use that
> > (special case for ppc64 here).
> >
> > Thanks!
>
> We could rename the function to mmap_guard_pagesize(), thoughts?

The existing qemu_fd_getpagesize() already returns qemu_real_host_page_size for
non-anonymous mappings (when fd == -1).  I think this new mmap_pagesize() could
be dropped in favor of qemu_fd_getpagesize().

A side effect of this change would be guard page using a bit more memory for
non-anonymous mapping.  Could that be a problem?

What do you think?

--
Murilo


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 fixed 08/16] util/mmap-alloc: Factor out calculation of pagesize to mmap_pagesize()
  2020-02-24 14:16         ` Murilo Opsfelder Araújo
@ 2020-02-24 14:25           ` Murilo Opsfelder Araújo
  2020-02-24 14:39             ` David Hildenbrand
  2020-02-26 17:36           ` David Hildenbrand
  1 sibling, 1 reply; 46+ messages in thread
From: Murilo Opsfelder Araújo @ 2020-02-24 14:25 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	Richard Henderson, qemu-devel, Peter Xu, Dr . David Alan Gilbert,
	Greg Kurz, Paolo Bonzini, Igor Mammedov, Richard Henderson

On Monday, February 24, 2020 11:16:16 AM -03 Murilo Opsfelder Araújo wrote:
> On Monday, February 24, 2020 7:57:03 AM -03 David Hildenbrand wrote:
> > On 24.02.20 11:50, David Hildenbrand wrote:
> > > On 19.02.20 23:46, Peter Xu wrote:
> > >> On Wed, Feb 12, 2020 at 02:42:46PM +0100, David Hildenbrand wrote:
> > >>> Factor it out and add a comment.
> > >>>
> > >>> Reviewed-by: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
> > >>> Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
> > >>> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> > >>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> > >>> Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
> > >>> Cc: Greg Kurz <groug@kaod.org>
> > >>> Cc: Eduardo Habkost <ehabkost@redhat.com>
> > >>> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > >>> Cc: Igor Mammedov <imammedo@redhat.com>
> > >>> Signed-off-by: David Hildenbrand <david@redhat.com>
> > >>> ---
> > >>>
> > >>>  util/mmap-alloc.c | 21 ++++++++++++---------
> > >>>  1 file changed, 12 insertions(+), 9 deletions(-)
> > >>>
> > >>> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> > >>> index 27dcccd8ec..82f02a2cec 100644
> > >>> --- a/util/mmap-alloc.c
> > >>> +++ b/util/mmap-alloc.c
> > >>> @@ -82,17 +82,27 @@ size_t qemu_mempath_getpagesize(const char
> > >>> *mem_path)
> > >>>
> > >>>      return qemu_real_host_page_size;
> > >>>
> > >>>  }
> > >>>
> > >>> +static inline size_t mmap_pagesize(int fd)
> > >>> +{
> > >>> +#if defined(__powerpc64__) && defined(__linux__)
> > >>> +    /* Mappings in the same segment must share the same page size */
> > >>> +    return qemu_fd_getpagesize(fd);
> > >>> +#else
> > >>> +    return qemu_real_host_page_size;
> > >>> +#endif
> > >>> +}
> > >>
> > >> Pure question: This will return 4K even for huge pages on x86, is this
> > >> what we want?
> > >
> > > (was asking myself the same question) I *think* it's intended. It's
> > > mainly only used to allocate one additional guard page. The callers of
> > > qemu_ram_mmap() make sure that the size is properly aligned (e.g., to
> > > huge pages).
> > >
> > > Of course, a 4k guard page is sufficient - unless we can't use that
> > > (special case for ppc64 here).
> > >
> > > Thanks!
> >
> > We could rename the function to mmap_guard_pagesize(), thoughts?
>
> The existing qemu_fd_getpagesize() already returns qemu_real_host_page_size
> for non-anonymous mappings (when fd == -1).  I think this new
> mmap_pagesize() could be dropped in favor of qemu_fd_getpagesize().

s/non-//

I mean "for anonymous mappings".

>
> A side effect of this change would be guard page using a bit more memory for
> non-anonymous mapping.  Could that be a problem?
>
> What do you think?
>
> --
> Murilo


--
Murilo


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 fixed 08/16] util/mmap-alloc: Factor out calculation of pagesize to mmap_pagesize()
  2020-02-24 14:25           ` Murilo Opsfelder Araújo
@ 2020-02-24 14:39             ` David Hildenbrand
  0 siblings, 0 replies; 46+ messages in thread
From: David Hildenbrand @ 2020-02-24 14:39 UTC (permalink / raw)
  To: Murilo Opsfelder Araújo
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	Richard Henderson, qemu-devel, Peter Xu, Dr . David Alan Gilbert,
	Greg Kurz, Paolo Bonzini, Igor Mammedov, Richard Henderson

On 24.02.20 15:25, Murilo Opsfelder Araújo wrote:
> On Monday, February 24, 2020 11:16:16 AM -03 Murilo Opsfelder Araújo wrote:
>> On Monday, February 24, 2020 7:57:03 AM -03 David Hildenbrand wrote:
>>> On 24.02.20 11:50, David Hildenbrand wrote:
>>>> On 19.02.20 23:46, Peter Xu wrote:
>>>>> On Wed, Feb 12, 2020 at 02:42:46PM +0100, David Hildenbrand wrote:
>>>>>> Factor it out and add a comment.
>>>>>>
>>>>>> Reviewed-by: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
>>>>>> Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
>>>>>> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>>>>>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>>>>>> Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
>>>>>> Cc: Greg Kurz <groug@kaod.org>
>>>>>> Cc: Eduardo Habkost <ehabkost@redhat.com>
>>>>>> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>>>>> Cc: Igor Mammedov <imammedo@redhat.com>
>>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>>>> ---
>>>>>>
>>>>>>  util/mmap-alloc.c | 21 ++++++++++++---------
>>>>>>  1 file changed, 12 insertions(+), 9 deletions(-)
>>>>>>
>>>>>> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
>>>>>> index 27dcccd8ec..82f02a2cec 100644
>>>>>> --- a/util/mmap-alloc.c
>>>>>> +++ b/util/mmap-alloc.c
>>>>>> @@ -82,17 +82,27 @@ size_t qemu_mempath_getpagesize(const char
>>>>>> *mem_path)
>>>>>>
>>>>>>      return qemu_real_host_page_size;
>>>>>>
>>>>>>  }
>>>>>>
>>>>>> +static inline size_t mmap_pagesize(int fd)
>>>>>> +{
>>>>>> +#if defined(__powerpc64__) && defined(__linux__)
>>>>>> +    /* Mappings in the same segment must share the same page size */
>>>>>> +    return qemu_fd_getpagesize(fd);
>>>>>> +#else
>>>>>> +    return qemu_real_host_page_size;
>>>>>> +#endif
>>>>>> +}
>>>>>
>>>>> Pure question: This will return 4K even for huge pages on x86, is this
>>>>> what we want?
>>>>
>>>> (was asking myself the same question) I *think* it's intended. It's
>>>> mainly only used to allocate one additional guard page. The callers of
>>>> qemu_ram_mmap() make sure that the size is properly aligned (e.g., to
>>>> huge pages).
>>>>
>>>> Of course, a 4k guard page is sufficient - unless we can't use that
>>>> (special case for ppc64 here).
>>>>
>>>> Thanks!
>>>
>>> We could rename the function to mmap_guard_pagesize(), thoughts?
>>
>> The existing qemu_fd_getpagesize() already returns qemu_real_host_page_size
>> for non-anonymous mappings (when fd == -1).  I think this new
>> mmap_pagesize() could be dropped in favor of qemu_fd_getpagesize().
> 
> s/non-//
> 
> I mean "for anonymous mappings".
> 
>>
>> A side effect of this change would be guard page using a bit more memory for
>> non-anonymous mapping.  Could that be a problem?
>>
>> What do you think?

At least under Linux it won't be an issue, I guess. The same is probably
true for other OSes as well - the memory is not even readable, so it
should not consume any actual memory; it only consumes space in the
virtual address space.
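
For illustration only, a minimal standalone sketch (plain mmap(2) under
Linux, not QEMU code) of why a PROT_NONE reservation costs only address
space:

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    const size_t size = 1UL << 30; /* reserve 1 GiB of address space */

    /*
     * PROT_NONE + MAP_NORESERVE: the kernel only sets up a VMA; nothing
     * is populated or accounted against overcommit, so the reservation
     * consumes virtual address space only, not memory.
     */
    void *p = mmap(NULL, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    printf("reserved %zu bytes at %p\n", size, p);
    munmap(p, size);
    return 0;
}

RSS stays flat after the mmap(); only the virtual size of the process
grows.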

Does anyone reading along want to object?

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 fixed 08/16] util/mmap-alloc: Factor out calculation of pagesize to mmap_pagesize()
  2020-02-24 10:57       ` David Hildenbrand
  2020-02-24 14:16         ` Murilo Opsfelder Araújo
@ 2020-02-24 17:36         ` Peter Xu
  2020-02-24 18:37           ` David Hildenbrand
  1 sibling, 1 reply; 46+ messages in thread
From: Peter Xu @ 2020-02-24 17:36 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	Richard Henderson, qemu-devel, Dr . David Alan Gilbert,
	Greg Kurz, Paolo Bonzini, Murilo Opsfelder Araujo, Igor Mammedov,
	Richard Henderson

On Mon, Feb 24, 2020 at 11:57:03AM +0100, David Hildenbrand wrote:
> On 24.02.20 11:50, David Hildenbrand wrote:
> > On 19.02.20 23:46, Peter Xu wrote:
> >> On Wed, Feb 12, 2020 at 02:42:46PM +0100, David Hildenbrand wrote:
> >>> Factor it out and add a comment.
> >>>
> >>> Reviewed-by: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
> >>> Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
> >>> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> >>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> >>> Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
> >>> Cc: Greg Kurz <groug@kaod.org>
> >>> Cc: Eduardo Habkost <ehabkost@redhat.com>
> >>> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >>> Cc: Igor Mammedov <imammedo@redhat.com>
> >>> Signed-off-by: David Hildenbrand <david@redhat.com>
> >>> ---
> >>>  util/mmap-alloc.c | 21 ++++++++++++---------
> >>>  1 file changed, 12 insertions(+), 9 deletions(-)
> >>>
> >>> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> >>> index 27dcccd8ec..82f02a2cec 100644
> >>> --- a/util/mmap-alloc.c
> >>> +++ b/util/mmap-alloc.c
> >>> @@ -82,17 +82,27 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
> >>>      return qemu_real_host_page_size;
> >>>  }
> >>>  
> >>> +static inline size_t mmap_pagesize(int fd)
> >>> +{
> >>> +#if defined(__powerpc64__) && defined(__linux__)
> >>> +    /* Mappings in the same segment must share the same page size */
> >>> +    return qemu_fd_getpagesize(fd);
> >>> +#else
> >>> +    return qemu_real_host_page_size;
> >>> +#endif
> >>> +}
> >>
> >> Pure question: This will return 4K even for huge pages on x86, is this
> >> what we want?
> > 
> > (was asking myself the same question) I *think* it's intended. It's
> > mainly only used to allocate one additional guard page. The callers of
> > qemu_ram_mmap() make sure that the size is properly aligned (e.g., to
> > huge pages).
> > 
> > Of course, a 4k guard page is sufficient - unless we can't use that
> > (special case for ppc64 here).
> > 
> > Thanks!
> > 
> 
> We could rename the function to mmap_guard_pagesize(), thoughts?

Yeah, this looks better.

Off topic: Actually I'm still confused about why we'd need to do so
much just to leave a PROT_NONE page after the buffer.  If the
hypervisor is buggy, it can logically write anywhere after all.
It's not a stack structure but can be any guest RAM, so I'm not sure
overflow detection really matters that much...

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 fixed 08/16] util/mmap-alloc: Factor out calculation of pagesize to mmap_pagesize()
  2020-02-24 17:36         ` Peter Xu
@ 2020-02-24 18:37           ` David Hildenbrand
  0 siblings, 0 replies; 46+ messages in thread
From: David Hildenbrand @ 2020-02-24 18:37 UTC (permalink / raw)
  To: Peter Xu
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	Richard Henderson, qemu-devel, Dr . David Alan Gilbert,
	Greg Kurz, Paolo Bonzini, Murilo Opsfelder Araujo, Igor Mammedov,
	Richard Henderson

On 24.02.20 18:36, Peter Xu wrote:
> On Mon, Feb 24, 2020 at 11:57:03AM +0100, David Hildenbrand wrote:
>> On 24.02.20 11:50, David Hildenbrand wrote:
>>> On 19.02.20 23:46, Peter Xu wrote:
>>>> On Wed, Feb 12, 2020 at 02:42:46PM +0100, David Hildenbrand wrote:
>>>>> Factor it out and add a comment.
>>>>>
>>>>> Reviewed-by: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
>>>>> Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
>>>>> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>>>>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>>>>> Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
>>>>> Cc: Greg Kurz <groug@kaod.org>
>>>>> Cc: Eduardo Habkost <ehabkost@redhat.com>
>>>>> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>>>> Cc: Igor Mammedov <imammedo@redhat.com>
>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>>> ---
>>>>>  util/mmap-alloc.c | 21 ++++++++++++---------
>>>>>  1 file changed, 12 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
>>>>> index 27dcccd8ec..82f02a2cec 100644
>>>>> --- a/util/mmap-alloc.c
>>>>> +++ b/util/mmap-alloc.c
>>>>> @@ -82,17 +82,27 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
>>>>>      return qemu_real_host_page_size;
>>>>>  }
>>>>>  
>>>>> +static inline size_t mmap_pagesize(int fd)
>>>>> +{
>>>>> +#if defined(__powerpc64__) && defined(__linux__)
>>>>> +    /* Mappings in the same segment must share the same page size */
>>>>> +    return qemu_fd_getpagesize(fd);
>>>>> +#else
>>>>> +    return qemu_real_host_page_size;
>>>>> +#endif
>>>>> +}
>>>>
>>>> Pure question: This will return 4K even for huge pages on x86, is this
>>>> what we want?
>>>
>>> (was asking myself the same question) I *think* it's intended. It's
>>> mainly only used to allocate one additional guard page. The callers of
>>> qemu_ram_mmap() make sure that the size is properly aligned (e.g., to
>>> huge pages).
>>>
>>> Of course, a 4k guard page is sufficient - unless we can't use that
>>> (special case for ppc64 here).
>>>
>>> Thanks!
>>>
>>
>> We could rename the function to mmap_guard_pagesize(), thoughts?
> 
> Yeh this looks better.
> 
> Off topic: Actually I'm still confused about why we'd need to do so
> much just to leave a PROT_NONE page after the buffer.  If the
> hypervisor is buggy, it can logically write anywhere after all.
> It's not a stack structure but can be any guest RAM, so I'm not sure
> overflow detection really matters that much...

The comment in the file says

"Leave a single PROT_NONE page allocated after the RAM block, to serve
as a guard page guarding against potential buffer overflows."

So it's really just a safety net.
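
In case it is not obvious how the guard page ends up there, this is
roughly the pattern (a heavily simplified sketch of the anonymous case;
error paths and the ppc64 page-size special case are omitted, and
ram_mmap_with_guard() is just an illustrative name - the real code is
qemu_ram_mmap() in util/mmap-alloc.c):

#include <sys/mman.h>

static void *ram_mmap_with_guard(size_t size, size_t guard_pagesize)
{
    /* Reserve the RAM block plus one extra page, all PROT_NONE. */
    size_t total = size + guard_pagesize;
    void *guarded = mmap(NULL, total, PROT_NONE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (guarded == MAP_FAILED) {
        return MAP_FAILED;
    }

    /* Map the usable part R/W on top; the trailing page stays PROT_NONE. */
    void *ptr = mmap(guarded, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (ptr == MAP_FAILED) {
        munmap(guarded, total);
        return MAP_FAILED;
    }

    /* A buggy access just past ptr + size now faults immediately. */
    return ptr;
}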

> 
> Thanks,
> 


-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 fixed 08/16] util/mmap-alloc: Factor out calculation of pagesize to mmap_pagesize()
  2020-02-24 14:16         ` Murilo Opsfelder Araújo
  2020-02-24 14:25           ` Murilo Opsfelder Araújo
@ 2020-02-26 17:36           ` David Hildenbrand
  1 sibling, 0 replies; 46+ messages in thread
From: David Hildenbrand @ 2020-02-26 17:36 UTC (permalink / raw)
  To: Murilo Opsfelder Araújo
  Cc: Eduardo Habkost, Michael S . Tsirkin, Igor Kotrasinski,
	Richard Henderson, qemu-devel, Peter Xu, Dr . David Alan Gilbert,
	Greg Kurz, Paolo Bonzini, Igor Mammedov, Richard Henderson

On 24.02.20 15:16, Murilo Opsfelder Araújo wrote:
> On Monday, February 24, 2020 7:57:03 AM -03 David Hildenbrand wrote:
>> On 24.02.20 11:50, David Hildenbrand wrote:
>>> On 19.02.20 23:46, Peter Xu wrote:
>>>> On Wed, Feb 12, 2020 at 02:42:46PM +0100, David Hildenbrand wrote:
>>>>> Factor it out and add a comment.
>>>>>
>>>>> Reviewed-by: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
>>>>> Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
>>>>> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>>>>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>>>>> Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
>>>>> Cc: Greg Kurz <groug@kaod.org>
>>>>> Cc: Eduardo Habkost <ehabkost@redhat.com>
>>>>> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>>>> Cc: Igor Mammedov <imammedo@redhat.com>
>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>>> ---
>>>>>
>>>>>  util/mmap-alloc.c | 21 ++++++++++++---------
>>>>>  1 file changed, 12 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
>>>>> index 27dcccd8ec..82f02a2cec 100644
>>>>> --- a/util/mmap-alloc.c
>>>>> +++ b/util/mmap-alloc.c
>>>>> @@ -82,17 +82,27 @@ size_t qemu_mempath_getpagesize(const char
>>>>> *mem_path)
>>>>>
>>>>>      return qemu_real_host_page_size;
>>>>>
>>>>>  }
>>>>>
>>>>> +static inline size_t mmap_pagesize(int fd)
>>>>> +{
>>>>> +#if defined(__powerpc64__) && defined(__linux__)
>>>>> +    /* Mappings in the same segment must share the same page size */
>>>>> +    return qemu_fd_getpagesize(fd);
>>>>> +#else
>>>>> +    return qemu_real_host_page_size;
>>>>> +#endif
>>>>> +}
>>>>
>>>> Pure question: This will return 4K even for huge pages on x86, is this
>>>> what we want?
>>>
>>> (was asking myself the same question) I *think* it's intended. It's
>>> mainly only used to allocate one additional guard page. The callers of
>>> qemu_ram_mmap() make sure that the size is properly aligned (e.g., to
>>> huge pages).
>>>
>>> Of course, a 4k guard page is sufficient - unless we can't use that
>>> (special case for ppc64 here).
>>>
>>> Thanks!
>>
>> We could rename the function to mmap_guard_pagesize(), thoughts?
> 
> The existing qemu_fd_getpagesize() already returns qemu_real_host_page_size for
> non-anonymous mappings (when fd == -1).  I think this new mmap_pagesize() could
> be dropped in favor of qemu_fd_getpagesize().
> 
> A side effect of this change would be guard page using a bit more memory for
> non-anonymous mapping.  Could that be a problem?
> 
> What do you think?

So, I'll stick to mmap_guard_pagesize() for now, but use the actual page
size (via qemu_fd_getpagesize()) in the new size asserts.
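
Roughly what that could look like (a hypothetical sketch, not the
actual follow-up patch; check_mmap_sizes() is only an illustrative
helper name):

#include "qemu/osdep.h"
#include "qemu/mmap-alloc.h"

static inline size_t mmap_guard_pagesize(int fd)
{
#if defined(__powerpc64__) && defined(__linux__)
    /* Mappings in the same segment must share the same page size */
    return qemu_fd_getpagesize(fd);
#else
    return qemu_real_host_page_size;
#endif
}

/* Illustrative only: the size asserts use the mapping's real page size. */
static void check_mmap_sizes(int fd, size_t size, size_t max_size)
{
    const size_t pagesize = qemu_fd_getpagesize(fd);

    assert(QEMU_IS_ALIGNED(size, pagesize));
    assert(QEMU_IS_ALIGNED(max_size, pagesize));
}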

Thanks!

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2020-02-26 17:38 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-12 13:42 [PATCH v2 fixed 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
2020-02-12 13:42 ` [PATCH v2 fixed 01/16] util: vfio-helpers: Factor out and fix processing of existing ram blocks David Hildenbrand
2020-02-18 22:00   ` Peter Xu
2020-02-19  8:43     ` David Hildenbrand
2020-02-19 11:27       ` David Hildenbrand
2020-02-19 17:34       ` Peter Xu
2020-02-12 13:42 ` [PATCH v2 fixed 02/16] util: vfio-helpers: Fix qemu_vfio_close() David Hildenbrand
2020-02-18 22:00   ` Peter Xu
2020-02-12 13:42 ` [PATCH v2 fixed 03/16] util: vfio-helpers: Remove Error parameter from qemu_vfio_undo_mapping() David Hildenbrand
2020-02-18 22:07   ` Peter Xu
2020-02-19  8:49     ` David Hildenbrand
2020-02-12 13:42 ` [PATCH v2 fixed 04/16] util: vfio-helpers: Factor out removal " David Hildenbrand
2020-02-18 22:10   ` Peter Xu
2020-02-12 13:42 ` [PATCH v2 fixed 05/16] exec: Factor out setting ram settings (madvise ...) into qemu_ram_apply_settings() David Hildenbrand
2020-02-18 22:10   ` Peter Xu
2020-02-12 13:42 ` [PATCH v2 fixed 06/16] exec: Reuse qemu_ram_apply_settings() in qemu_ram_remap() David Hildenbrand
2020-02-18 22:11   ` Peter Xu
2020-02-12 13:42 ` [PATCH v2 fixed 07/16] exec: Drop "shared" parameter from ram_block_add() David Hildenbrand
2020-02-18 22:12   ` Peter Xu
2020-02-12 13:42 ` [PATCH v2 fixed 08/16] util/mmap-alloc: Factor out calculation of pagesize to mmap_pagesize() David Hildenbrand
2020-02-19 22:46   ` Peter Xu
2020-02-24 10:50     ` David Hildenbrand
2020-02-24 10:57       ` David Hildenbrand
2020-02-24 14:16         ` Murilo Opsfelder Araújo
2020-02-24 14:25           ` Murilo Opsfelder Araújo
2020-02-24 14:39             ` David Hildenbrand
2020-02-26 17:36           ` David Hildenbrand
2020-02-24 17:36         ` Peter Xu
2020-02-24 18:37           ` David Hildenbrand
2020-02-12 13:42 ` [PATCH v2 fixed 09/16] util/mmap-alloc: Factor out reserving of a memory region to mmap_reserve() David Hildenbrand
2020-02-19 22:47   ` Peter Xu
2020-02-12 13:42 ` [PATCH v2 fixed 10/16] util/mmap-alloc: Factor out populating of memory to mmap_populate() David Hildenbrand
2020-02-19 22:49   ` Peter Xu
2020-02-24 10:54     ` David Hildenbrand
2020-02-12 13:42 ` [PATCH v2 fixed 11/16] util/mmap-alloc: Prepare for resizable mmaps David Hildenbrand
2020-02-19 22:50   ` Peter Xu
2020-02-24 11:00     ` David Hildenbrand
2020-02-12 13:42 ` [PATCH v2 fixed 12/16] util/mmap-alloc: Implement " David Hildenbrand
2020-02-12 13:42 ` [PATCH v2 fixed 13/16] numa: Teach ram block notifiers about resizable ram blocks David Hildenbrand
2020-02-13 12:41   ` Paul Durrant
2020-02-12 13:42 ` [PATCH v2 fixed 14/16] util: vfio-helpers: Implement ram_block_resized() David Hildenbrand
2020-02-12 13:42 ` [PATCH v2 fixed 15/16] util: oslib: Resizable anonymous allocations under POSIX David Hildenbrand
2020-02-12 13:42 ` [PATCH v2 fixed 16/16] exec: Ram blocks with resizable " David Hildenbrand
2020-02-12 18:03 ` [PATCH v2 fixed 00/16] " David Hildenbrand
2020-02-13 13:36   ` David Hildenbrand
2020-02-14 13:08   ` Dr. David Alan Gilbert

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).