qemu-devel.nongnu.org archive mirror
* [PATCH v4 00/14] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property
@ 2021-03-19 10:12 David Hildenbrand
  2021-03-19 10:12 ` [PATCH v4 01/14] softmmu/physmem: Mark shared anonymous memory RAM_SHARED David Hildenbrand
                   ` (13 more replies)
  0 siblings, 14 replies; 25+ messages in thread
From: David Hildenbrand @ 2021-03-19 10:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Murilo Opsfelder Araujo, Igor Kotrasinski,
	Eduardo Habkost, Michael S. Tsirkin, David Hildenbrand,
	Richard Henderson, Dr. David Alan Gilbert, Peter Xu, Greg Kurz,
	Stefan Hajnoczi, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Markus Armbruster

Some fixes for shared anonymous memory, cleanups previously sent in another
context (resizeable allocations), followed by RAM_NORESERVE: implementing
it under POSIX using MAP_NORESERVE and letting users configure it for
memory backends using the "reserve" property (default: true).

In the context of QEMU, MAP_NORESERVE under Linux has an effect on:
1) Private/shared anonymous memory
-> memory-backend-ram,id=mem0,size=10G
2) Private fd-based mappings
-> memory-backend-file,id=mem0,size=10G,mem-path=/dev/shm/0
-> memory-backend-memfd,id=mem0,size=10G
3) Private/shared hugetlb mappings
-> memory-backend-memfd,id=mem0,size=10G,hugetlb=on,hugetlbsize=2M

With MAP_NORESERVE ("reserve=off"), we won't be reserving swap space (cases
1 and 2) or huge pages (case 3) for the whole memory region.
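
For example, with the "reserve" property added by the hostmem patches in this
series, reservation can be disabled per memory backend:
-> memory-backend-ram,id=mem0,size=10G,reserve=off
-> memory-backend-memfd,id=mem0,size=10G,hugetlb=on,hugetlbsize=2M,reserve=off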

The target use case is virtio-mem, which dynamically exposes memory
inside a large, sparse memory area to the VM. MAP_NORESERVE tells the OS
"this mapping might be very sparse". This essentially allows avoiding
having to set "/proc/sys/vm/overcommit_memory" to 1 when using
virtio-mem, and will also allow supporting hugetlbfs in the future.

virtio-mem currently only supports anonymous memory; in the future we
also want to support private memfd, shared file-based and shared
hugetlbfs mappings.
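
A rough sketch of the target setup, combining the new "reserve" property with
the existing virtio-mem-pci device (machine options such as maxmem/slots
omitted):
  -object memory-backend-ram,id=vmem0,size=32G,reserve=off \
  -device virtio-mem-pci,id=vm0,memdev=vmem0,requested-size=2G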

virtio-mem features I am currently working on that will make it all
play together with this work include:
1. Introducing a prealloc option for virtio-mem (e.g., using fallocate()
   when plugging blocks) to fail nicely when running out of
   backing storage like huge pages ("prealloc=on").
2. Handling virtio-mem requests via an iothread to not hold the BQL while
   populating/preallocating memory ("iothread=X").
3. Protecting unplugged memory, e.g., using userfaultfd ("prot=uffd").
4. Dynamic reservation of swap space ("reserve=on")
5. Supporting resizeable RAM blocks/memory regions, such that we won't
   always expose a large, sparse memory region to the VM.
6. (resizeable allocations / optimized mmap handling when resizing RAM
    blocks)

v3 -> v4:
- Minor comment/description updates
- "softmmu/physmem: Fix ram_block_discard_range() to handle shared ..."
-- Extended description
- "util/mmap-alloc: Pass flags instead of separate bools to ..."
-- Move flags to include/qemu/osdep.h and rename to "QEMU_MAP_*"
- "memory: Introduce RAM_NORESERVE and wire it up in qemu_ram_mmap()"
-- Adjust to new flags. Handle errors in mmap_activate() for now.
- "util/mmap-alloc: Support RAM_NORESERVE via MAP_NORESERVE under Linux"
-- Restrict support to Linux only for now
- "qmp: Include "reserve" property of memory backends"
-- Added
- "hmp: Print "reserve" property of memory backends with ..."
-- Added

v2 -> v3:
- Renamed "softmmu/physmem: Drop "shared" parameter from ram_block_add()"
  to "softmmu/physmem: Mark shared anonymous memory RAM_SHARED" and
  adjusted the description
- Added "softmmu/physmem: Fix ram_block_discard_range() to handle shared
  anonymous memory"
- Added "softmmu/physmem: Fix qemu_ram_remap() to handle shared anonymous
  memory"
- Added "util/mmap-alloc: Pass flags instead of separate bools to
  qemu_ram_mmap()"
- "util/mmap-alloc: Support RAM_NORESERVE via MAP_NORESERVE"
-- Further tweak code comments
-- Handle shared anonymous memory

v1 -> v2:
- Rebased to upstream and the phys_mem_alloc simplifications
-- Upstream added the "map_offset" parameter to many RAM allocation
   interfaces.
- "softmmu/physmem: Drop "shared" parameter from ram_block_add()"
-- Use local variable "shared"
- "memory: introduce RAM_NORESERVE and wire it up in qemu_ram_mmap()"
-- Simplify due to phys_mem_alloc changes
- "util/mmap-alloc: Support RAM_NORESERVE via MAP_NORESERVE"
-- Add a whole bunch of comments.
-- Exclude shared anonymous memory that QEMU doesn't use
-- Special-case readonly mappings

Cc: Peter Xu <peterx@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: "Philippe Mathieu-Daudé" <philmd@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Cc: Greg Kurz <groug@kaod.org>
Cc: Liam Merwick <liam.merwick@oracle.com>
Cc: Marcel Apfelbaum <mapfelba@redhat.com>

David Hildenbrand (14):
  softmmu/physmem: Mark shared anonymous memory RAM_SHARED
  softmmu/physmem: Fix ram_block_discard_range() to handle shared
    anonymous memory
  softmmu/physmem: Fix qemu_ram_remap() to handle shared anonymous
    memory
  util/mmap-alloc: Factor out calculation of the pagesize for the guard
    page
  util/mmap-alloc: Factor out reserving of a memory region to
    mmap_reserve()
  util/mmap-alloc: Factor out activating of memory to mmap_activate()
  softmmu/memory: Pass ram_flags to qemu_ram_alloc_from_fd()
  softmmu/memory: Pass ram_flags to
    memory_region_init_ram_shared_nomigrate()
  util/mmap-alloc: Pass flags instead of separate bools to
    qemu_ram_mmap()
  memory: Introduce RAM_NORESERVE and wire it up in qemu_ram_mmap()
  util/mmap-alloc: Support RAM_NORESERVE via MAP_NORESERVE under Linux
  hostmem: Wire up RAM_NORESERVE via "reserve" property
  qmp: Include "reserve" property of memory backends
  hmp: Print "reserve" property of memory backends with "info memdev"

 backends/hostmem-file.c                       |  11 +-
 backends/hostmem-memfd.c                      |   8 +-
 backends/hostmem-ram.c                        |   7 +-
 backends/hostmem.c                            |  33 +++
 hw/core/machine-hmp-cmds.c                    |   2 +
 hw/core/machine-qmp-cmds.c                    |   1 +
 hw/m68k/next-cube.c                           |   4 +-
 hw/misc/ivshmem.c                             |   5 +-
 include/exec/cpu-common.h                     |   1 +
 include/exec/memory.h                         |  42 ++--
 include/exec/ram_addr.h                       |   9 +-
 include/qemu/mmap-alloc.h                     |  16 +-
 include/qemu/osdep.h                          |  30 ++-
 include/sysemu/hostmem.h                      |   2 +-
 migration/ram.c                               |   3 +-
 qapi/machine.json                             |   6 +
 .../memory-region-housekeeping.cocci          |   8 +-
 softmmu/memory.c                              |  27 ++-
 softmmu/physmem.c                             |  61 +++--
 util/mmap-alloc.c                             | 212 +++++++++++++-----
 util/oslib-posix.c                            |   7 +-
 util/oslib-win32.c                            |  13 +-
 22 files changed, 352 insertions(+), 156 deletions(-)

-- 
2.29.2




* [PATCH v4 01/14] softmmu/physmem: Mark shared anonymous memory RAM_SHARED
  2021-03-19 10:12 [PATCH v4 00/14] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
@ 2021-03-19 10:12 ` David Hildenbrand
  2021-03-19 10:12 ` [PATCH v4 02/14] softmmu/physmem: Fix ram_block_discard_range() to handle shared anonymous memory David Hildenbrand
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: David Hildenbrand @ 2021-03-19 10:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Murilo Opsfelder Araujo, Igor Kotrasinski,
	Eduardo Habkost, Michael S. Tsirkin, David Hildenbrand,
	Richard Henderson, Dr. David Alan Gilbert, Peter Xu, Greg Kurz,
	Stefan Hajnoczi, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Markus Armbruster

Let's drop the "shared" parameter from ram_block_add() and properly
store it in the flags of the ram block instead, such that
qemu_ram_is_shared() properly succeeds on all ram blocks that were mapped
MAP_SHARED.

We'll use this information next to fix some cases with shared anonymous
memory.

Reviewed-by: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 softmmu/physmem.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 85034d9c11..76bb8e324e 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -1927,8 +1927,9 @@ static void dirty_memory_extend(ram_addr_t old_ram_size,
     }
 }
 
-static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
+static void ram_block_add(RAMBlock *new_block, Error **errp)
 {
+    const bool shared = qemu_ram_is_shared(new_block);
     RAMBlock *block;
     RAMBlock *last_block = NULL;
     ram_addr_t old_ram_size, new_ram_size;
@@ -2064,7 +2065,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
         return NULL;
     }
 
-    ram_block_add(new_block, &local_err, ram_flags & RAM_SHARED);
+    ram_block_add(new_block, &local_err);
     if (local_err) {
         g_free(new_block);
         error_propagate(errp, local_err);
@@ -2127,10 +2128,13 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
     if (host) {
         new_block->flags |= RAM_PREALLOC;
     }
+    if (share) {
+        new_block->flags |= RAM_SHARED;
+    }
     if (resizeable) {
         new_block->flags |= RAM_RESIZEABLE;
     }
-    ram_block_add(new_block, &local_err, share);
+    ram_block_add(new_block, &local_err);
     if (local_err) {
         g_free(new_block);
         error_propagate(errp, local_err);
-- 
2.29.2




* [PATCH v4 02/14] softmmu/physmem: Fix ram_block_discard_range() to handle shared anonymous memory
  2021-03-19 10:12 [PATCH v4 00/14] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
  2021-03-19 10:12 ` [PATCH v4 01/14] softmmu/physmem: Mark shared anonymous memory RAM_SHARED David Hildenbrand
@ 2021-03-19 10:12 ` David Hildenbrand
  2021-03-19 10:12 ` [PATCH v4 03/14] softmmu/physmem: Fix qemu_ram_remap() " David Hildenbrand
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: David Hildenbrand @ 2021-03-19 10:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Murilo Opsfelder Araujo, Igor Kotrasinski,
	Eduardo Habkost, Michael S. Tsirkin, David Hildenbrand,
	Richard Henderson, Dr. David Alan Gilbert, Peter Xu, Greg Kurz,
	Stefan Hajnoczi, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Markus Armbruster

We can create shared anonymous memory via
    "-object memory-backend-ram,share=on,..."
which is, for example, required by PVRDMA for mremap() to work.

Shared anonymous memory is weird, though. Instead of MADV_DONTNEED, we
have to use MADV_REMOVE: MADV_DONTNEED will only remove/zap all relevant
page table entries of the current process; the backing storage will not
get removed, resulting in no reduced memory consumption and a
repopulation of the previous content on the next access.

Shared anonymous memory is internally really just shmem, but without an
fd exposed. As we cannot use fallocate() without an fd to discard the
backing storage, MADV_REMOVE gets the same job done without an fd, as
documented in "man 2 madvise". Removing the backing storage implicitly
invalidates all page table entries of relevant mappings; an additional
MADV_DONTNEED is not required.
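
The difference can be illustrated with a minimal, standalone Linux example
(not QEMU code; assumes a shmem-backed mapping, i.e., shared anonymous
memory, where MADV_REMOVE is supported):

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        const size_t size = 2 * 1024 * 1024;

        /* shared anonymous memory -- internally backed by shmem */
        char *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        memset(p, 0xaa, size);

        /*
         * MADV_DONTNEED only zaps the page table entries of this process;
         * the shmem backing storage stays allocated and the old content
         * reappears on the next access.
         */
        madvise(p, size, MADV_DONTNEED);
        printf("after MADV_DONTNEED: 0x%x\n", p[0] & 0xff); /* 0xaa */

        /*
         * MADV_REMOVE frees the backing storage (like punching a hole with
         * fallocate(), but without needing an fd); reads return zeroes.
         */
        madvise(p, size, MADV_REMOVE);
        printf("after MADV_REMOVE:   0x%x\n", p[0] & 0xff); /* 0x0 */

        munmap(p, size);
        return 0;
    }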

Fixes: 06329ccecfa0 ("mem: add share parameter to memory-backend-ram")
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 softmmu/physmem.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 76bb8e324e..afff96a6dc 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -3506,6 +3506,7 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
         /* The logic here is messy;
          *    madvise DONTNEED fails for hugepages
          *    fallocate works on hugepages and shmem
+         *    shared anonymous memory requires madvise REMOVE
          */
         need_madvise = (rb->page_size == qemu_host_page_size);
         need_fallocate = rb->fd != -1;
@@ -3539,7 +3540,11 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
              * fallocate'd away).
              */
 #if defined(CONFIG_MADVISE)
-            ret =  madvise(host_startaddr, length, MADV_DONTNEED);
+            if (qemu_ram_is_shared(rb) && rb->fd < 0) {
+                ret = madvise(host_startaddr, length, MADV_REMOVE);
+            } else {
+                ret = madvise(host_startaddr, length, MADV_DONTNEED);
+            }
             if (ret) {
                 ret = -errno;
                 error_report("ram_block_discard_range: Failed to discard range "
-- 
2.29.2




* [PATCH v4 03/14] softmmu/physmem: Fix qemu_ram_remap() to handle shared anonymous memory
  2021-03-19 10:12 [PATCH v4 00/14] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
  2021-03-19 10:12 ` [PATCH v4 01/14] softmmu/physmem: Mark shared anonymous memory RAM_SHARED David Hildenbrand
  2021-03-19 10:12 ` [PATCH v4 02/14] softmmu/physmem: Fix ram_block_discard_range() to handle shared anonymous memory David Hildenbrand
@ 2021-03-19 10:12 ` David Hildenbrand
  2021-03-23 20:40   ` Peter Xu
  2021-03-19 10:12 ` [PATCH v4 04/14] util/mmap-alloc: Factor out calculation of the pagesize for the guard page David Hildenbrand
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 25+ messages in thread
From: David Hildenbrand @ 2021-03-19 10:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Murilo Opsfelder Araujo, Igor Kotrasinski,
	Eduardo Habkost, Michael S. Tsirkin, David Hildenbrand,
	Richard Henderson, Dr. David Alan Gilbert, Peter Xu, Greg Kurz,
	Stefan Hajnoczi, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Markus Armbruster

RAM_SHARED now also properly indicates shared anonymous memory. Let's check
that flag for anonymous memory as well, to restore the proper mapping.

Fixes: 06329ccecfa0 ("mem: add share parameter to memory-backend-ram")
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 softmmu/physmem.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index afff96a6dc..cc59f05593 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -2222,13 +2222,13 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
                 abort();
             } else {
                 flags = MAP_FIXED;
+                flags |= block->flags & RAM_SHARED ?
+                         MAP_SHARED : MAP_PRIVATE;
                 if (block->fd >= 0) {
-                    flags |= (block->flags & RAM_SHARED ?
-                              MAP_SHARED : MAP_PRIVATE);
                     area = mmap(vaddr, length, PROT_READ | PROT_WRITE,
                                 flags, block->fd, offset);
                 } else {
-                    flags |= MAP_PRIVATE | MAP_ANONYMOUS;
+                    flags |= MAP_ANONYMOUS;
                     area = mmap(vaddr, length, PROT_READ | PROT_WRITE,
                                 flags, -1, 0);
                 }
-- 
2.29.2




* [PATCH v4 04/14] util/mmap-alloc: Factor out calculation of the pagesize for the guard page
  2021-03-19 10:12 [PATCH v4 00/14] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
                   ` (2 preceding siblings ...)
  2021-03-19 10:12 ` [PATCH v4 03/14] softmmu/physmem: Fix qemu_ram_remap() " David Hildenbrand
@ 2021-03-19 10:12 ` David Hildenbrand
  2021-03-19 10:12 ` [PATCH v4 05/14] util/mmap-alloc: Factor out reserving of a memory region to mmap_reserve() David Hildenbrand
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: David Hildenbrand @ 2021-03-19 10:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Murilo Opsfelder Araujo, Igor Kotrasinski,
	Eduardo Habkost, Michael S. Tsirkin, David Hildenbrand,
	Richard Henderson, Dr. David Alan Gilbert, Peter Xu, Greg Kurz,
	Stefan Hajnoczi, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Markus Armbruster

Let's factor out calculating the size of the guard page and rename the
variable to make it clearer that this pagesize only applies to the
guard page.

Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Cc: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/mmap-alloc.c | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index e6fa8b598b..24854064b4 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -82,6 +82,16 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
     return qemu_real_host_page_size;
 }
 
+static inline size_t mmap_guard_pagesize(int fd)
+{
+#if defined(__powerpc64__) && defined(__linux__)
+    /* Mappings in the same segment must share the same page size */
+    return qemu_fd_getpagesize(fd);
+#else
+    return qemu_real_host_page_size;
+#endif
+}
+
 void *qemu_ram_mmap(int fd,
                     size_t size,
                     size_t align,
@@ -90,12 +100,12 @@ void *qemu_ram_mmap(int fd,
                     bool is_pmem,
                     off_t map_offset)
 {
+    const size_t guard_pagesize = mmap_guard_pagesize(fd);
     int prot;
     int flags;
     int map_sync_flags = 0;
     int guardfd;
     size_t offset;
-    size_t pagesize;
     size_t total;
     void *guardptr;
     void *ptr;
@@ -116,8 +126,7 @@ void *qemu_ram_mmap(int fd,
      * anonymous memory is OK.
      */
     flags = MAP_PRIVATE;
-    pagesize = qemu_fd_getpagesize(fd);
-    if (fd == -1 || pagesize == qemu_real_host_page_size) {
+    if (fd == -1 || guard_pagesize == qemu_real_host_page_size) {
         guardfd = -1;
         flags |= MAP_ANONYMOUS;
     } else {
@@ -126,7 +135,6 @@ void *qemu_ram_mmap(int fd,
     }
 #else
     guardfd = -1;
-    pagesize = qemu_real_host_page_size;
     flags = MAP_PRIVATE | MAP_ANONYMOUS;
 #endif
 
@@ -138,7 +146,7 @@ void *qemu_ram_mmap(int fd,
 
     assert(is_power_of_2(align));
     /* Always align to host page size */
-    assert(align >= pagesize);
+    assert(align >= guard_pagesize);
 
     flags = MAP_FIXED;
     flags |= fd == -1 ? MAP_ANONYMOUS : 0;
@@ -193,8 +201,8 @@ void *qemu_ram_mmap(int fd,
      * a guard page guarding against potential buffer overflows.
      */
     total -= offset;
-    if (total > size + pagesize) {
-        munmap(ptr + size + pagesize, total - size - pagesize);
+    if (total > size + guard_pagesize) {
+        munmap(ptr + size + guard_pagesize, total - size - guard_pagesize);
     }
 
     return ptr;
@@ -202,15 +210,8 @@ void *qemu_ram_mmap(int fd,
 
 void qemu_ram_munmap(int fd, void *ptr, size_t size)
 {
-    size_t pagesize;
-
     if (ptr) {
         /* Unmap both the RAM block and the guard page */
-#if defined(__powerpc64__) && defined(__linux__)
-        pagesize = qemu_fd_getpagesize(fd);
-#else
-        pagesize = qemu_real_host_page_size;
-#endif
-        munmap(ptr, size + pagesize);
+        munmap(ptr, size + mmap_guard_pagesize(fd));
     }
 }
-- 
2.29.2




* [PATCH v4 05/14] util/mmap-alloc: Factor out reserving of a memory region to mmap_reserve()
  2021-03-19 10:12 [PATCH v4 00/14] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
                   ` (3 preceding siblings ...)
  2021-03-19 10:12 ` [PATCH v4 04/14] util/mmap-alloc: Factor out calculation of the pagesize for the guard page David Hildenbrand
@ 2021-03-19 10:12 ` David Hildenbrand
  2021-03-19 10:12 ` [PATCH v4 06/14] util/mmap-alloc: Factor out activating of memory to mmap_activate() David Hildenbrand
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: David Hildenbrand @ 2021-03-19 10:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Murilo Opsfelder Araujo, Igor Kotrasinski,
	Eduardo Habkost, Michael S. Tsirkin, David Hildenbrand,
	Richard Henderson, Dr. David Alan Gilbert, Peter Xu, Greg Kurz,
	Stefan Hajnoczi, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Markus Armbruster

We want to reserve a memory region without actually populating memory.
Let's factor that out.

Reviewed-by: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/mmap-alloc.c | 58 +++++++++++++++++++++++++++--------------------
 1 file changed, 33 insertions(+), 25 deletions(-)

diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 24854064b4..223d66219c 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -82,6 +82,38 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
     return qemu_real_host_page_size;
 }
 
+/*
+ * Reserve a new memory region of the requested size to be used for mapping
+ * from the given fd (if any).
+ */
+static void *mmap_reserve(size_t size, int fd)
+{
+    int flags = MAP_PRIVATE;
+
+#if defined(__powerpc64__) && defined(__linux__)
+    /*
+     * On ppc64 mappings in the same segment (aka slice) must share the same
+     * page size. Since we will be re-allocating part of this segment
+     * from the supplied fd, we should make sure to use the same page size, to
+     * this end we mmap the supplied fd.  In this case, set MAP_NORESERVE to
+     * avoid allocating backing store memory.
+     * We do this unless we are using the system page size, in which case
+     * anonymous memory is OK.
+     */
+    if (fd == -1 || qemu_fd_getpagesize(fd) == qemu_real_host_page_size) {
+        fd = -1;
+        flags |= MAP_ANONYMOUS;
+    } else {
+        flags |= MAP_NORESERVE;
+    }
+#else
+    fd = -1;
+    flags |= MAP_ANONYMOUS;
+#endif
+
+    return mmap(0, size, PROT_NONE, flags, fd, 0);
+}
+
 static inline size_t mmap_guard_pagesize(int fd)
 {
 #if defined(__powerpc64__) && defined(__linux__)
@@ -104,7 +136,6 @@ void *qemu_ram_mmap(int fd,
     int prot;
     int flags;
     int map_sync_flags = 0;
-    int guardfd;
     size_t offset;
     size_t total;
     void *guardptr;
@@ -116,30 +147,7 @@ void *qemu_ram_mmap(int fd,
      */
     total = size + align;
 
-#if defined(__powerpc64__) && defined(__linux__)
-    /* On ppc64 mappings in the same segment (aka slice) must share the same
-     * page size. Since we will be re-allocating part of this segment
-     * from the supplied fd, we should make sure to use the same page size, to
-     * this end we mmap the supplied fd.  In this case, set MAP_NORESERVE to
-     * avoid allocating backing store memory.
-     * We do this unless we are using the system page size, in which case
-     * anonymous memory is OK.
-     */
-    flags = MAP_PRIVATE;
-    if (fd == -1 || guard_pagesize == qemu_real_host_page_size) {
-        guardfd = -1;
-        flags |= MAP_ANONYMOUS;
-    } else {
-        guardfd = fd;
-        flags |= MAP_NORESERVE;
-    }
-#else
-    guardfd = -1;
-    flags = MAP_PRIVATE | MAP_ANONYMOUS;
-#endif
-
-    guardptr = mmap(0, total, PROT_NONE, flags, guardfd, 0);
-
+    guardptr = mmap_reserve(total, fd);
     if (guardptr == MAP_FAILED) {
         return MAP_FAILED;
     }
-- 
2.29.2




* [PATCH v4 06/14] util/mmap-alloc: Factor out activating of memory to mmap_activate()
  2021-03-19 10:12 [PATCH v4 00/14] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
                   ` (4 preceding siblings ...)
  2021-03-19 10:12 ` [PATCH v4 05/14] util/mmap-alloc: Factor out reserving of a memory region to mmap_reserve() David Hildenbrand
@ 2021-03-19 10:12 ` David Hildenbrand
  2021-03-19 10:12 ` [PATCH v4 07/14] softmmu/memory: Pass ram_flags to qemu_ram_alloc_from_fd() David Hildenbrand
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: David Hildenbrand @ 2021-03-19 10:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Murilo Opsfelder Araujo, Igor Kotrasinski,
	Eduardo Habkost, Michael S. Tsirkin, David Hildenbrand,
	Richard Henderson, Dr. David Alan Gilbert, Peter Xu, Greg Kurz,
	Stefan Hajnoczi, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Markus Armbruster

We want to activate memory within a reserved memory region, to make it
accessible. Let's factor that out.
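
Outside of the QEMU helpers, the pattern boils down to reserving address
space with PROT_NONE and later mapping accessible memory over parts of it
with MAP_FIXED. A minimal standalone sketch, assuming plain anonymous memory
and ignoring alignment and guard page handling:

    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        const size_t total = 64 * 1024 * 1024;
        const size_t active = 4 * 1024 * 1024;

        /* reserve address space only: inaccessible, nothing populated */
        void *region = mmap(NULL, total, PROT_NONE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (region == MAP_FAILED) {
            perror("reserve");
            return 1;
        }

        /*
         * Activate the first part of the reservation: map accessible
         * anonymous memory over it with MAP_FIXED.
         */
        void *ptr = mmap(region, active, PROT_READ | PROT_WRITE,
                         MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (ptr == MAP_FAILED) {
            perror("activate");
            munmap(region, total);
            return 1;
        }

        ((char *)ptr)[0] = 1; /* now accessible */
        printf("reserved %zu bytes, activated the first %zu bytes\n",
               total, active);

        /* unmaps both the activated part and the remaining reservation */
        munmap(region, total);
        return 0;
    }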

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/mmap-alloc.c | 94 +++++++++++++++++++++++++----------------------
 1 file changed, 50 insertions(+), 44 deletions(-)

diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 223d66219c..0e2bd7bc0e 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -114,6 +114,52 @@ static void *mmap_reserve(size_t size, int fd)
     return mmap(0, size, PROT_NONE, flags, fd, 0);
 }
 
+/*
+ * Activate memory in a reserved region from the given fd (if any), to make
+ * it accessible.
+ */
+static void *mmap_activate(void *ptr, size_t size, int fd, bool readonly,
+                           bool shared, bool is_pmem, off_t map_offset)
+{
+    const int prot = PROT_READ | (readonly ? 0 : PROT_WRITE);
+    int map_sync_flags = 0;
+    int flags = MAP_FIXED;
+    void *activated_ptr;
+
+    flags |= fd == -1 ? MAP_ANONYMOUS : 0;
+    flags |= shared ? MAP_SHARED : MAP_PRIVATE;
+    if (shared && is_pmem) {
+        map_sync_flags = MAP_SYNC | MAP_SHARED_VALIDATE;
+    }
+
+    activated_ptr = mmap(ptr, size, prot, flags | map_sync_flags, fd,
+                         map_offset);
+    if (activated_ptr == MAP_FAILED && map_sync_flags) {
+        if (errno == ENOTSUP) {
+            char *proc_link = g_strdup_printf("/proc/self/fd/%d", fd);
+            char *file_name = g_malloc0(PATH_MAX);
+            int len = readlink(proc_link, file_name, PATH_MAX - 1);
+
+            if (len < 0) {
+                len = 0;
+            }
+            file_name[len] = '\0';
+            fprintf(stderr, "Warning: requesting persistence across crashes "
+                    "for backend file %s failed. Proceeding without "
+                    "persistence, data might become corrupted in case of host "
+                    "crash.\n", file_name);
+            g_free(proc_link);
+            g_free(file_name);
+        }
+        /*
+         * If mmap failed with MAP_SHARED_VALIDATE | MAP_SYNC, we will try
+         * again without these flags to handle backwards compatibility.
+         */
+        activated_ptr = mmap(ptr, size, prot, flags, fd, map_offset);
+    }
+    return activated_ptr;
+}
+
 static inline size_t mmap_guard_pagesize(int fd)
 {
 #if defined(__powerpc64__) && defined(__linux__)
@@ -133,13 +179,8 @@ void *qemu_ram_mmap(int fd,
                     off_t map_offset)
 {
     const size_t guard_pagesize = mmap_guard_pagesize(fd);
-    int prot;
-    int flags;
-    int map_sync_flags = 0;
-    size_t offset;
-    size_t total;
-    void *guardptr;
-    void *ptr;
+    size_t offset, total;
+    void *ptr, *guardptr;
 
     /*
      * Note: this always allocates at least one extra page of virtual address
@@ -156,45 +197,10 @@ void *qemu_ram_mmap(int fd,
     /* Always align to host page size */
     assert(align >= guard_pagesize);
 
-    flags = MAP_FIXED;
-    flags |= fd == -1 ? MAP_ANONYMOUS : 0;
-    flags |= shared ? MAP_SHARED : MAP_PRIVATE;
-    if (shared && is_pmem) {
-        map_sync_flags = MAP_SYNC | MAP_SHARED_VALIDATE;
-    }
-
     offset = QEMU_ALIGN_UP((uintptr_t)guardptr, align) - (uintptr_t)guardptr;
 
-    prot = PROT_READ | (readonly ? 0 : PROT_WRITE);
-
-    ptr = mmap(guardptr + offset, size, prot,
-               flags | map_sync_flags, fd, map_offset);
-
-    if (ptr == MAP_FAILED && map_sync_flags) {
-        if (errno == ENOTSUP) {
-            char *proc_link, *file_name;
-            int len;
-            proc_link = g_strdup_printf("/proc/self/fd/%d", fd);
-            file_name = g_malloc0(PATH_MAX);
-            len = readlink(proc_link, file_name, PATH_MAX - 1);
-            if (len < 0) {
-                len = 0;
-            }
-            file_name[len] = '\0';
-            fprintf(stderr, "Warning: requesting persistence across crashes "
-                    "for backend file %s failed. Proceeding without "
-                    "persistence, data might become corrupted in case of host "
-                    "crash.\n", file_name);
-            g_free(proc_link);
-            g_free(file_name);
-        }
-        /*
-         * if map failed with MAP_SHARED_VALIDATE | MAP_SYNC,
-         * we will remove these flags to handle compatibility.
-         */
-        ptr = mmap(guardptr + offset, size, prot, flags, fd, map_offset);
-    }
-
+    ptr = mmap_activate(guardptr + offset, size, fd, readonly, shared, is_pmem,
+                        map_offset);
     if (ptr == MAP_FAILED) {
         munmap(guardptr, total);
         return MAP_FAILED;
-- 
2.29.2




* [PATCH v4 07/14] softmmu/memory: Pass ram_flags to qemu_ram_alloc_from_fd()
  2021-03-19 10:12 [PATCH v4 00/14] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
                   ` (5 preceding siblings ...)
  2021-03-19 10:12 ` [PATCH v4 06/14] util/mmap-alloc: Factor out activating of memory to mmap_activate() David Hildenbrand
@ 2021-03-19 10:12 ` David Hildenbrand
  2021-03-19 10:12 ` [PATCH v4 08/14] softmmu/memory: Pass ram_flags to memory_region_init_ram_shared_nomigrate() David Hildenbrand
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: David Hildenbrand @ 2021-03-19 10:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Murilo Opsfelder Araujo, Igor Kotrasinski,
	Eduardo Habkost, Michael S. Tsirkin, David Hildenbrand,
	Richard Henderson, Dr. David Alan Gilbert, Peter Xu, Greg Kurz,
	Stefan Hajnoczi, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Markus Armbruster

Let's pass in ram flags just like we do with qemu_ram_alloc_from_file(),
to clean up and prepare for more flags.

Simplify the documentation of the passed ram flags: looking at our
documentation of RAM_SHARED and RAM_PMEM is sufficient; there is no need
to be repetitive.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 backends/hostmem-memfd.c | 7 ++++---
 hw/misc/ivshmem.c        | 5 ++---
 include/exec/memory.h    | 9 +++------
 include/exec/ram_addr.h  | 6 +-----
 softmmu/memory.c         | 7 +++----
 5 files changed, 13 insertions(+), 21 deletions(-)

diff --git a/backends/hostmem-memfd.c b/backends/hostmem-memfd.c
index 69b0ae30bb..93b5d1a4cf 100644
--- a/backends/hostmem-memfd.c
+++ b/backends/hostmem-memfd.c
@@ -36,6 +36,7 @@ static void
 memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
 {
     HostMemoryBackendMemfd *m = MEMORY_BACKEND_MEMFD(backend);
+    uint32_t ram_flags;
     char *name;
     int fd;
 
@@ -53,9 +54,9 @@ memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
     }
 
     name = host_memory_backend_get_name(backend);
-    memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend),
-                                   name, backend->size,
-                                   backend->share, fd, 0, errp);
+    ram_flags = backend->share ? RAM_SHARED : 0;
+    memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend), name,
+                                   backend->size, ram_flags, fd, 0, errp);
     g_free(name);
 }
 
diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index a1fa4878be..1ba4a98377 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -493,9 +493,8 @@ static void process_msg_shmem(IVShmemState *s, int fd, Error **errp)
     size = buf.st_size;
 
     /* mmap the region and map into the BAR2 */
-    memory_region_init_ram_from_fd(&s->server_bar2, OBJECT(s),
-                                   "ivshmem.bar2", size, true, fd, 0,
-                                   &local_err);
+    memory_region_init_ram_from_fd(&s->server_bar2, OBJECT(s), "ivshmem.bar2",
+                                   size, RAM_SHARED, fd, 0, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         return;
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 54ccf1a5f0..9ffd8c662c 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -967,10 +967,7 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
  * @size: size of the region.
  * @align: alignment of the region base address; if 0, the default alignment
  *         (getpagesize()) will be used.
- * @ram_flags: Memory region features:
- *             - RAM_SHARED: memory must be mmaped with the MAP_SHARED flag
- *             - RAM_PMEM: the memory is persistent memory
- *             Other bits are ignored now.
+ * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM.
  * @path: the path in which to allocate the RAM.
  * @readonly: true to open @path for reading, false for read/write.
  * @errp: pointer to Error*, to store an error if it happens.
@@ -996,7 +993,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
  * @owner: the object that tracks the region's reference count
  * @name: the name of the region.
  * @size: size of the region.
- * @share: %true if memory must be mmaped with the MAP_SHARED flag
+ * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM.
  * @fd: the fd to mmap.
  * @offset: offset within the file referenced by fd
  * @errp: pointer to Error*, to store an error if it happens.
@@ -1008,7 +1005,7 @@ void memory_region_init_ram_from_fd(MemoryRegion *mr,
                                     Object *owner,
                                     const char *name,
                                     uint64_t size,
-                                    bool share,
+                                    uint32_t ram_flags,
                                     int fd,
                                     ram_addr_t offset,
                                     Error **errp);
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 3cb9791df3..a7e3378340 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -104,11 +104,7 @@ long qemu_maxrampagesize(void);
  * Parameters:
  *  @size: the size in bytes of the ram block
  *  @mr: the memory region where the ram block is
- *  @ram_flags: specify the properties of the ram block, which can be one
- *              or bit-or of following values
- *              - RAM_SHARED: mmap the backing file or device with MAP_SHARED
- *              - RAM_PMEM: the backend @mem_path or @fd is persistent memory
- *              Other bits are ignored.
+ *  @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM.
  *  @mem_path or @fd: specify the backing file or device
  *  @readonly: true to open @path for reading, false for read/write.
  *  @errp: pointer to Error*, to store an error if it happens
diff --git a/softmmu/memory.c b/softmmu/memory.c
index c4730ec47a..9af1abd79b 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1609,7 +1609,7 @@ void memory_region_init_ram_from_fd(MemoryRegion *mr,
                                     Object *owner,
                                     const char *name,
                                     uint64_t size,
-                                    bool share,
+                                    uint32_t ram_flags,
                                     int fd,
                                     ram_addr_t offset,
                                     Error **errp)
@@ -1619,9 +1619,8 @@ void memory_region_init_ram_from_fd(MemoryRegion *mr,
     mr->ram = true;
     mr->terminates = true;
     mr->destructor = memory_region_destructor_ram;
-    mr->ram_block = qemu_ram_alloc_from_fd(size, mr,
-                                           share ? RAM_SHARED : 0,
-                                           fd, offset, false, &err);
+    mr->ram_block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, offset,
+                                           false, &err);
     if (err) {
         mr->size = int128_zero();
         object_unparent(OBJECT(mr));
-- 
2.29.2




* [PATCH v4 08/14] softmmu/memory: Pass ram_flags to memory_region_init_ram_shared_nomigrate()
  2021-03-19 10:12 [PATCH v4 00/14] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
                   ` (6 preceding siblings ...)
  2021-03-19 10:12 ` [PATCH v4 07/14] softmmu/memory: Pass ram_flags to qemu_ram_alloc_from_fd() David Hildenbrand
@ 2021-03-19 10:12 ` David Hildenbrand
  2021-03-19 10:12 ` [PATCH v4 09/14] util/mmap-alloc: Pass flags instead of separate bools to qemu_ram_mmap() David Hildenbrand
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: David Hildenbrand @ 2021-03-19 10:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Murilo Opsfelder Araujo, Igor Kotrasinski,
	Eduardo Habkost, Michael S. Tsirkin, David Hildenbrand,
	Richard Henderson, Dr. David Alan Gilbert, Peter Xu, Greg Kurz,
	Stefan Hajnoczi, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Markus Armbruster

Let's forward ram_flags instead, renaming
memory_region_init_ram_shared_nomigrate() to
memory_region_init_ram_flags_nomigrate(). Forward the flags to
qemu_ram_alloc() and qemu_ram_alloc_internal().

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 backends/hostmem-ram.c                        |  6 +++--
 hw/m68k/next-cube.c                           |  4 ++--
 include/exec/memory.h                         | 24 +++++++++----------
 include/exec/ram_addr.h                       |  2 +-
 .../memory-region-housekeeping.cocci          |  8 +++----
 softmmu/memory.c                              | 20 ++++++++--------
 softmmu/physmem.c                             | 24 ++++++++-----------
 7 files changed, 43 insertions(+), 45 deletions(-)

diff --git a/backends/hostmem-ram.c b/backends/hostmem-ram.c
index 5cc53e76c9..741e701062 100644
--- a/backends/hostmem-ram.c
+++ b/backends/hostmem-ram.c
@@ -19,6 +19,7 @@
 static void
 ram_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
 {
+    uint32_t ram_flags;
     char *name;
 
     if (!backend->size) {
@@ -27,8 +28,9 @@ ram_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
     }
 
     name = host_memory_backend_get_name(backend);
-    memory_region_init_ram_shared_nomigrate(&backend->mr, OBJECT(backend), name,
-                           backend->size, backend->share, errp);
+    ram_flags = backend->share ? RAM_SHARED : 0;
+    memory_region_init_ram_flags_nomigrate(&backend->mr, OBJECT(backend), name,
+                                           backend->size, ram_flags, errp);
     g_free(name);
 }
 
diff --git a/hw/m68k/next-cube.c b/hw/m68k/next-cube.c
index 92b45d760f..59ccae0d5e 100644
--- a/hw/m68k/next-cube.c
+++ b/hw/m68k/next-cube.c
@@ -986,8 +986,8 @@ static void next_cube_init(MachineState *machine)
     sysbus_mmio_map(SYS_BUS_DEVICE(pcdev), 1, 0x02100000);
 
     /* BMAP memory */
-    memory_region_init_ram_shared_nomigrate(bmapm1, NULL, "next.bmapmem", 64,
-                                            true, &error_fatal);
+    memory_region_init_ram_flags_nomigrate(bmapm1, NULL, "next.bmapmem", 64,
+                                           RAM_SHARED, &error_fatal);
     memory_region_add_subregion(sysmem, 0x020c0000, bmapm1);
     /* The Rev_2.5_v66.bin firmware accesses it at 0x820c0020, too */
     memory_region_init_alias(bmapm2, NULL, "next.bmapmem2", bmapm1, 0x0, 64);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 9ffd8c662c..8879821371 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -904,27 +904,27 @@ void memory_region_init_ram_nomigrate(MemoryRegion *mr,
                                       Error **errp);
 
 /**
- * memory_region_init_ram_shared_nomigrate:  Initialize RAM memory region.
- *                                           Accesses into the region will
- *                                           modify memory directly.
+ * memory_region_init_ram_flags_nomigrate:  Initialize RAM memory region.
+ *                                          Accesses into the region will
+ *                                          modify memory directly.
  *
  * @mr: the #MemoryRegion to be initialized.
  * @owner: the object that tracks the region's reference count
  * @name: Region name, becomes part of RAMBlock name used in migration stream
  *        must be unique within any device
  * @size: size of the region.
- * @share: allow remapping RAM to different addresses
+ * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED.
  * @errp: pointer to Error*, to store an error if it happens.
  *
- * Note that this function is similar to memory_region_init_ram_nomigrate.
- * The only difference is part of the RAM region can be remapped.
+ * Note that this function does not do anything to cause the data in the
+ * RAM memory region to be migrated; that is the responsibility of the caller.
  */
-void memory_region_init_ram_shared_nomigrate(MemoryRegion *mr,
-                                             Object *owner,
-                                             const char *name,
-                                             uint64_t size,
-                                             bool share,
-                                             Error **errp);
+void memory_region_init_ram_flags_nomigrate(MemoryRegion *mr,
+                                            Object *owner,
+                                            const char *name,
+                                            uint64_t size,
+                                            uint32_t ram_flags,
+                                            Error **errp);
 
 /**
  * memory_region_init_resizeable_ram:  Initialize memory region with resizeable
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index a7e3378340..6d4513f8e2 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -122,7 +122,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
 
 RAMBlock *qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
                                   MemoryRegion *mr, Error **errp);
-RAMBlock *qemu_ram_alloc(ram_addr_t size, bool share, MemoryRegion *mr,
+RAMBlock *qemu_ram_alloc(ram_addr_t size, uint32_t ram_flags, MemoryRegion *mr,
                          Error **errp);
 RAMBlock *qemu_ram_alloc_resizeable(ram_addr_t size, ram_addr_t max_size,
                                     void (*resized)(const char*,
diff --git a/scripts/coccinelle/memory-region-housekeeping.cocci b/scripts/coccinelle/memory-region-housekeeping.cocci
index c768d8140a..29651ebde9 100644
--- a/scripts/coccinelle/memory-region-housekeeping.cocci
+++ b/scripts/coccinelle/memory-region-housekeeping.cocci
@@ -127,8 +127,8 @@ static void device_fn(DeviceState *dev, ...)
 - memory_region_init_rom(E1, NULL, E2, E3, E4);
 + memory_region_init_rom(E1, obj, E2, E3, E4);
 |
-- memory_region_init_ram_shared_nomigrate(E1, NULL, E2, E3, E4, E5);
-+ memory_region_init_ram_shared_nomigrate(E1, obj, E2, E3, E4, E5);
+- memory_region_init_ram_flags_nomigrate(E1, NULL, E2, E3, E4, E5);
++ memory_region_init_ram_flags_nomigrate(E1, obj, E2, E3, E4, E5);
 )
   ...+>
 }
@@ -152,8 +152,8 @@ static void device_fn(DeviceState *dev, ...)
 - memory_region_init_rom(E1, NULL, E2, E3, E4);
 + memory_region_init_rom(E1, OBJECT(dev), E2, E3, E4);
 |
-- memory_region_init_ram_shared_nomigrate(E1, NULL, E2, E3, E4, E5);
-+ memory_region_init_ram_shared_nomigrate(E1, OBJECT(dev), E2, E3, E4, E5);
+- memory_region_init_ram_flags_nomigrate(E1, NULL, E2, E3, E4, E5);
++ memory_region_init_ram_flags_nomigrate(E1, OBJECT(dev), E2, E3, E4, E5);
 )
   ...+>
 }
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 9af1abd79b..9466401262 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1531,22 +1531,22 @@ void memory_region_init_ram_nomigrate(MemoryRegion *mr,
                                       uint64_t size,
                                       Error **errp)
 {
-    memory_region_init_ram_shared_nomigrate(mr, owner, name, size, false, errp);
+    memory_region_init_ram_flags_nomigrate(mr, owner, name, size, 0, errp);
 }
 
-void memory_region_init_ram_shared_nomigrate(MemoryRegion *mr,
-                                             Object *owner,
-                                             const char *name,
-                                             uint64_t size,
-                                             bool share,
-                                             Error **errp)
+void memory_region_init_ram_flags_nomigrate(MemoryRegion *mr,
+                                            Object *owner,
+                                            const char *name,
+                                            uint64_t size,
+                                            uint32_t ram_flags,
+                                            Error **errp)
 {
     Error *err = NULL;
     memory_region_init(mr, owner, name, size);
     mr->ram = true;
     mr->terminates = true;
     mr->destructor = memory_region_destructor_ram;
-    mr->ram_block = qemu_ram_alloc(size, share, mr, &err);
+    mr->ram_block = qemu_ram_alloc(size, ram_flags, mr, &err);
     if (err) {
         mr->size = int128_zero();
         object_unparent(OBJECT(mr));
@@ -1682,7 +1682,7 @@ void memory_region_init_rom_nomigrate(MemoryRegion *mr,
                                       uint64_t size,
                                       Error **errp)
 {
-    memory_region_init_ram_shared_nomigrate(mr, owner, name, size, false, errp);
+    memory_region_init_ram_flags_nomigrate(mr, owner, name, size, 0, errp);
     mr->readonly = true;
 }
 
@@ -1702,7 +1702,7 @@ void memory_region_init_rom_device_nomigrate(MemoryRegion *mr,
     mr->terminates = true;
     mr->rom_device = true;
     mr->destructor = memory_region_destructor_ram;
-    mr->ram_block = qemu_ram_alloc(size, false,  mr, &err);
+    mr->ram_block = qemu_ram_alloc(size, 0, mr, &err);
     if (err) {
         mr->size = int128_zero();
         object_unparent(OBJECT(mr));
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index cc59f05593..fdcd38ba61 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -2108,12 +2108,14 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
                                   void (*resized)(const char*,
                                                   uint64_t length,
                                                   void *host),
-                                  void *host, bool resizeable, bool share,
+                                  void *host, uint32_t ram_flags,
                                   MemoryRegion *mr, Error **errp)
 {
     RAMBlock *new_block;
     Error *local_err = NULL;
 
+    assert((ram_flags & ~(RAM_SHARED | RAM_RESIZEABLE)) == 0);
+
     size = HOST_PAGE_ALIGN(size);
     max_size = HOST_PAGE_ALIGN(max_size);
     new_block = g_malloc0(sizeof(*new_block));
@@ -2125,15 +2127,10 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
     new_block->fd = -1;
     new_block->page_size = qemu_real_host_page_size;
     new_block->host = host;
+    new_block->flags = ram_flags;
     if (host) {
         new_block->flags |= RAM_PREALLOC;
     }
-    if (share) {
-        new_block->flags |= RAM_SHARED;
-    }
-    if (resizeable) {
-        new_block->flags |= RAM_RESIZEABLE;
-    }
     ram_block_add(new_block, &local_err);
     if (local_err) {
         g_free(new_block);
@@ -2146,15 +2143,14 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
 RAMBlock *qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
                                    MemoryRegion *mr, Error **errp)
 {
-    return qemu_ram_alloc_internal(size, size, NULL, host, false,
-                                   false, mr, errp);
+    return qemu_ram_alloc_internal(size, size, NULL, host, 0, mr, errp);
 }
 
-RAMBlock *qemu_ram_alloc(ram_addr_t size, bool share,
+RAMBlock *qemu_ram_alloc(ram_addr_t size, uint32_t ram_flags,
                          MemoryRegion *mr, Error **errp)
 {
-    return qemu_ram_alloc_internal(size, size, NULL, NULL, false,
-                                   share, mr, errp);
+    assert((ram_flags & ~RAM_SHARED) == 0);
+    return qemu_ram_alloc_internal(size, size, NULL, NULL, ram_flags, mr, errp);
 }
 
 RAMBlock *qemu_ram_alloc_resizeable(ram_addr_t size, ram_addr_t maxsz,
@@ -2163,8 +2159,8 @@ RAMBlock *qemu_ram_alloc_resizeable(ram_addr_t size, ram_addr_t maxsz,
                                                      void *host),
                                      MemoryRegion *mr, Error **errp)
 {
-    return qemu_ram_alloc_internal(size, maxsz, resized, NULL, true,
-                                   false, mr, errp);
+    return qemu_ram_alloc_internal(size, maxsz, resized, NULL,
+                                   RAM_RESIZEABLE, mr, errp);
 }
 
 static void reclaim_ramblock(RAMBlock *block)
-- 
2.29.2




* [PATCH v4 09/14] util/mmap-alloc: Pass flags instead of separate bools to qemu_ram_mmap()
  2021-03-19 10:12 [PATCH v4 00/14] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
                   ` (7 preceding siblings ...)
  2021-03-19 10:12 ` [PATCH v4 08/14] softmmu/memory: Pass ram_flags to memory_region_init_ram_shared_nomigrate() David Hildenbrand
@ 2021-03-19 10:12 ` David Hildenbrand
  2021-03-23 20:49   ` Peter Xu
  2021-03-19 10:12 ` [PATCH v4 10/14] memory: Introduce RAM_NORESERVE and wire it up in qemu_ram_mmap() David Hildenbrand
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 25+ messages in thread
From: David Hildenbrand @ 2021-03-19 10:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Murilo Opsfelder Araujo, Igor Kotrasinski,
	Eduardo Habkost, Michael S. Tsirkin, David Hildenbrand,
	Richard Henderson, Dr. David Alan Gilbert, Peter Xu, Greg Kurz,
	Stefan Hajnoczi, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Markus Armbruster

Let's pass flags instead of bools to prepare for passing other flags and
update the documentation of qemu_ram_mmap(). Introduce new QEMU_MAP_
flags that abstract the mmap() PROT_ and MAP_ flag handling and simplify
it.

We expose only flags that are currently supported by qemu_ram_mmap().
Maybe we'll also see a generic qemu_mmap() in the future that can
implement these flags.

Note: We don't use MAP_ flags as some flags (e.g., MAP_SYNC) are only
defined for some systems and we want to always be able to identify
these flags reliably inside qemu_ram_mmap() -- for example, to properly
warn when some future flags are not available or effective on a system.
Also, this way we can simplify PROT_ handling as well.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/qemu/mmap-alloc.h | 16 +++++++++-------
 include/qemu/osdep.h      | 18 ++++++++++++++++++
 softmmu/physmem.c         |  8 +++++---
 util/mmap-alloc.c         | 15 ++++++++-------
 util/oslib-posix.c        |  3 ++-
 5 files changed, 42 insertions(+), 18 deletions(-)

diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index 456ff87df1..a60a2085b3 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -7,18 +7,22 @@ size_t qemu_fd_getpagesize(int fd);
 size_t qemu_mempath_getpagesize(const char *mem_path);
 
 /**
- * qemu_ram_mmap: mmap the specified file or device.
+ * qemu_ram_mmap: mmap anonymous memory, the specified file or device.
+ *
+ * mmap() abstraction to map guest RAM, simplifying flag handling, taking
+ * care of alignment requirements and installing guard pages.
  *
  * Parameters:
  *  @fd: the file or the device to mmap
  *  @size: the number of bytes to be mmaped
  *  @align: if not zero, specify the alignment of the starting mapping address;
  *          otherwise, the alignment in use will be determined by QEMU.
- *  @readonly: true for a read-only mapping, false for read/write.
- *  @shared: map has RAM_SHARED flag.
- *  @is_pmem: map has RAM_PMEM flag.
+ *  @qemu_map_flags: QEMU_MAP_* flags
  *  @map_offset: map starts at offset of map_offset from the start of fd
  *
+ * Internally, MAP_PRIVATE, MAP_ANONYMOUS and MAP_SHARED_VALIDATE are set
+ * implicitly based on other parameters.
+ *
  * Return:
  *  On success, return a pointer to the mapped area.
  *  On failure, return MAP_FAILED.
@@ -26,9 +30,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path);
 void *qemu_ram_mmap(int fd,
                     size_t size,
                     size_t align,
-                    bool readonly,
-                    bool shared,
-                    bool is_pmem,
+                    uint32_t qemu_map_flags,
                     off_t map_offset);
 
 void qemu_ram_munmap(int fd, void *ptr, size_t size);
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index ba15be9c56..5ba89f5460 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -347,6 +347,24 @@ void *qemu_anon_ram_alloc(size_t size, uint64_t *align, bool shared);
 void qemu_vfree(void *ptr);
 void qemu_anon_ram_free(void *ptr, size_t size);
 
+/*
+ * Abstraction of PROT_ and MAP_ flags as passed to mmap(), for example,
+ * consumed by qemu_ram_mmap().
+ */
+
+/* Map PROT_READ instead of PROT_READ | PROT_WRITE. */
+#define QEMU_MAP_READONLY   (1 << 0)
+
+/* Use MAP_SHARED instead of MAP_PRIVATE. */
+#define QEMU_MAP_SHARED     (1 << 1)
+
+/*
+ * Use MAP_SYNC | MAP_SHARED_VALIDATE if supported. Ignored without
+ * QEMU_MAP_SHARED. If mapping fails, warn and fallback to !QEMU_MAP_SYNC.
+ */
+#define QEMU_MAP_SYNC       (1 << 2)
+
+
 #define QEMU_MADV_INVALID -1
 
 #if defined(CONFIG_MADVISE)
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index fdcd38ba61..d1a1a6b72e 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -1533,6 +1533,7 @@ static void *file_ram_alloc(RAMBlock *block,
                             off_t offset,
                             Error **errp)
 {
+    uint32_t qemu_map_flags;
     void *area;
 
     block->page_size = qemu_fd_getpagesize(fd);
@@ -1580,9 +1581,10 @@ static void *file_ram_alloc(RAMBlock *block,
         perror("ftruncate");
     }
 
-    area = qemu_ram_mmap(fd, memory, block->mr->align, readonly,
-                         block->flags & RAM_SHARED, block->flags & RAM_PMEM,
-                         offset);
+    qemu_map_flags = readonly ? QEMU_MAP_READONLY : 0;
+    qemu_map_flags |= (block->flags & RAM_SHARED) ? QEMU_MAP_SHARED : 0;
+    qemu_map_flags |= (block->flags & RAM_PMEM) ? QEMU_MAP_SYNC : 0;
+    area = qemu_ram_mmap(fd, memory, block->mr->align, qemu_map_flags, offset);
     if (area == MAP_FAILED) {
         error_setg_errno(errp, errno,
                          "unable to map backing store for guest RAM");
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 0e2bd7bc0e..1ddc0e2a1e 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -118,9 +118,12 @@ static void *mmap_reserve(size_t size, int fd)
  * Activate memory in a reserved region from the given fd (if any), to make
  * it accessible.
  */
-static void *mmap_activate(void *ptr, size_t size, int fd, bool readonly,
-                           bool shared, bool is_pmem, off_t map_offset)
+static void *mmap_activate(void *ptr, size_t size, int fd,
+                           uint32_t qemu_map_flags, off_t map_offset)
 {
+    const bool readonly = qemu_map_flags & QEMU_MAP_READONLY;
+    const bool shared = qemu_map_flags & QEMU_MAP_SHARED;
+    const bool sync = qemu_map_flags & QEMU_MAP_SYNC;
     const int prot = PROT_READ | (readonly ? 0 : PROT_WRITE);
     int map_sync_flags = 0;
     int flags = MAP_FIXED;
@@ -128,7 +131,7 @@ static void *mmap_activate(void *ptr, size_t size, int fd, bool readonly,
 
     flags |= fd == -1 ? MAP_ANONYMOUS : 0;
     flags |= shared ? MAP_SHARED : MAP_PRIVATE;
-    if (shared && is_pmem) {
+    if (shared && sync) {
         map_sync_flags = MAP_SYNC | MAP_SHARED_VALIDATE;
     }
 
@@ -173,9 +176,7 @@ static inline size_t mmap_guard_pagesize(int fd)
 void *qemu_ram_mmap(int fd,
                     size_t size,
                     size_t align,
-                    bool readonly,
-                    bool shared,
-                    bool is_pmem,
+                    uint32_t qemu_map_flags,
                     off_t map_offset)
 {
     const size_t guard_pagesize = mmap_guard_pagesize(fd);
@@ -199,7 +200,7 @@ void *qemu_ram_mmap(int fd,
 
     offset = QEMU_ALIGN_UP((uintptr_t)guardptr, align) - (uintptr_t)guardptr;
 
-    ptr = mmap_activate(guardptr + offset, size, fd, readonly, shared, is_pmem,
+    ptr = mmap_activate(guardptr + offset, size, fd, qemu_map_flags,
                         map_offset);
     if (ptr == MAP_FAILED) {
         munmap(guardptr, total);
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 36820fec16..c12108c6cb 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -229,8 +229,9 @@ void *qemu_memalign(size_t alignment, size_t size)
 /* alloc shared memory pages */
 void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
 {
+    const uint32_t qemu_map_flags = shared ? QEMU_MAP_SHARED : 0;
     size_t align = QEMU_VMALLOC_ALIGN;
-    void *ptr = qemu_ram_mmap(-1, size, align, false, shared, false, 0);
+    void *ptr = qemu_ram_mmap(-1, size, align, qemu_map_flags, 0);
 
     if (ptr == MAP_FAILED) {
         return NULL;
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 25+ messages in thread
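
As a quick illustration of the reworked qemu_ram_mmap() interface (not part
of the series): callers translate their knobs into QEMU_MAP_* flags and pass
them in one go. The helper below is only a sketch mirroring the
file_ram_alloc() conversion from the hunk above; the function name and the
choice of alignment are illustrative.

    #include "qemu/osdep.h"
    #include "qemu/mmap-alloc.h"

    /* Map @size bytes of @fd, read-only or read/write, shared or private. */
    static void *example_map_fd(int fd, size_t size, bool readonly, bool shared)
    {
        uint32_t qemu_map_flags = 0;

        qemu_map_flags |= readonly ? QEMU_MAP_READONLY : 0;
        qemu_map_flags |= shared ? QEMU_MAP_SHARED : 0;

        /* Align to the page size of @fd and map from offset 0. */
        return qemu_ram_mmap(fd, size, qemu_fd_getpagesize(fd),
                             qemu_map_flags, 0);
    }

On failure, qemu_ram_mmap() returns MAP_FAILED rather than NULL, so callers
check the result against MAP_FAILED, just like file_ram_alloc() above.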

* [PATCH v4 10/14] memory: Introduce RAM_NORESERVE and wire it up in qemu_ram_mmap()
  2021-03-19 10:12 [PATCH v4 00/14] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
                   ` (8 preceding siblings ...)
  2021-03-19 10:12 ` [PATCH v4 09/14] util/mmap-alloc: Pass flags instead of separate bools to qemu_ram_mmap() David Hildenbrand
@ 2021-03-19 10:12 ` David Hildenbrand
  2021-03-23 20:51   ` Peter Xu
  2021-03-19 10:12 ` [PATCH v4 11/14] util/mmap-alloc: Support RAM_NORESERVE via MAP_NORESERVE under Linux David Hildenbrand
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 25+ messages in thread
From: David Hildenbrand @ 2021-03-19 10:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Murilo Opsfelder Araujo, Igor Kotrasinski,
	Eduardo Habkost, Michael S. Tsirkin, David Hildenbrand,
	Richard Henderson, Dr. David Alan Gilbert, Peter Xu, Greg Kurz,
	Stefan Hajnoczi, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Markus Armbruster

Let's introduce RAM_NORESERVE, allowing mmap'ing with MAP_NORESERVE. The
new flag has the following semantics:

"
RAM is mmap-ed with MAP_NORESERVE. When set, reserving swap space (or huge
pages if applicable) is skipped: will bail out if not supported. When not
set, the OS might do the reservation, depending on OS support.
"

Allow passing it into:
- memory_region_init_ram_nomigrate()
- memory_region_init_resizeable_ram()
- memory_region_init_ram_from_file()

... and teach qemu_ram_mmap() and qemu_anon_ram_alloc() about the flag.
Bail out if the flag is not supported, which is the case right now for
both, POSIX and win32. We will add Linux support next and allow specifying
RAM_NORESERVE via memory backends.

The target use case is virtio-mem, which dynamically exposes memory
inside a large, sparse memory area to the VM.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
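Not part of this patch, but as a rough sketch of what a caller could look
like once the flag is usable (it needs the Linux support added in the next
patch to actually succeed): the region name, the owner object and the error
handling below are illustrative only.

    #include "qemu/osdep.h"
    #include "exec/memory.h"

    static void example_sparse_file_ram(MemoryRegion *mr, Object *owner,
                                        uint64_t size, const char *path,
                                        Error **errp)
    {
        /* Shared mapping, but don't reserve swap space/huge pages for it. */
        uint32_t ram_flags = RAM_SHARED | RAM_NORESERVE;

        memory_region_init_ram_from_file(mr, owner, "example-sparse", size,
                                         0 /* align: default */, ram_flags,
                                         path, false /* readonly */, errp);
    }
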
 include/exec/cpu-common.h |  1 +
 include/exec/memory.h     | 15 ++++++++++++---
 include/exec/ram_addr.h   |  3 ++-
 include/qemu/osdep.h      |  9 ++++++++-
 migration/ram.c           |  3 +--
 softmmu/physmem.c         | 15 +++++++++++----
 util/mmap-alloc.c         |  7 +++++++
 util/oslib-posix.c        |  6 ++++--
 util/oslib-win32.c        | 13 ++++++++++++-
 9 files changed, 58 insertions(+), 14 deletions(-)

diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 5a0a2d93e0..38a47ad4ac 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -58,6 +58,7 @@ void *qemu_ram_get_host_addr(RAMBlock *rb);
 ram_addr_t qemu_ram_get_offset(RAMBlock *rb);
 ram_addr_t qemu_ram_get_used_length(RAMBlock *rb);
 bool qemu_ram_is_shared(RAMBlock *rb);
+bool qemu_ram_is_noreserve(RAMBlock *rb);
 bool qemu_ram_is_uf_zeroable(RAMBlock *rb);
 void qemu_ram_set_uf_zeroable(RAMBlock *rb);
 bool qemu_ram_is_migratable(RAMBlock *rb);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 8879821371..7b700bdc33 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -155,6 +155,13 @@ typedef struct IOMMUTLBEvent {
  */
 #define RAM_UF_WRITEPROTECT (1 << 6)
 
+/*
+ * RAM is mmap-ed with MAP_NORESERVE. When set, reserving swap space (or huge
+ * pages if applicable) is skipped: will bail out if not supported. When not
+ * set, the OS might do the reservation, depending on OS support.
+ */
+#define RAM_NORESERVE (1 << 7)
+
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
                                        IOMMUNotifierFlag flags,
                                        hwaddr start, hwaddr end,
@@ -913,7 +920,7 @@ void memory_region_init_ram_nomigrate(MemoryRegion *mr,
  * @name: Region name, becomes part of RAMBlock name used in migration stream
  *        must be unique within any device
  * @size: size of the region.
- * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED.
+ * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_NORESERVE.
  * @errp: pointer to Error*, to store an error if it happens.
  *
  * Note that this function does not do anything to cause the data in the
@@ -967,7 +974,8 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
  * @size: size of the region.
  * @align: alignment of the region base address; if 0, the default alignment
  *         (getpagesize()) will be used.
- * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM.
+ * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
+ *             RAM_NORESERVE,
  * @path: the path in which to allocate the RAM.
  * @readonly: true to open @path for reading, false for read/write.
  * @errp: pointer to Error*, to store an error if it happens.
@@ -993,7 +1001,8 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
  * @owner: the object that tracks the region's reference count
  * @name: the name of the region.
  * @size: size of the region.
- * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM.
+ * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
+ *             RAM_NORESERVE.
  * @fd: the fd to mmap.
  * @offset: offset within the file referenced by fd
  * @errp: pointer to Error*, to store an error if it happens.
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 6d4513f8e2..551876bed0 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -104,7 +104,8 @@ long qemu_maxrampagesize(void);
  * Parameters:
  *  @size: the size in bytes of the ram block
  *  @mr: the memory region where the ram block is
- *  @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM.
+ *  @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
+ *              RAM_NORESERVE.
  *  @mem_path or @fd: specify the backing file or device
  *  @readonly: true to open @path for reading, false for read/write.
  *  @errp: pointer to Error*, to store an error if it happens
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 5ba89f5460..877d9a5eff 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -343,7 +343,8 @@ extern int daemon(int, int);
 int qemu_daemon(int nochdir, int noclose);
 void *qemu_try_memalign(size_t alignment, size_t size);
 void *qemu_memalign(size_t alignment, size_t size);
-void *qemu_anon_ram_alloc(size_t size, uint64_t *align, bool shared);
+void *qemu_anon_ram_alloc(size_t size, uint64_t *align, bool shared,
+                          bool noreserve);
 void qemu_vfree(void *ptr);
 void qemu_anon_ram_free(void *ptr, size_t size);
 
@@ -364,6 +365,12 @@ void qemu_anon_ram_free(void *ptr, size_t size);
  */
 #define QEMU_MAP_SYNC       (1 << 2)
 
+/*
+ * Use MAP_NORESERVE to skip reservation of swap space (or huge pages if
+ * applicable). Bail out if not supported/effective.
+ */
+#define QEMU_MAP_NORESERVE  (1 << 3)
+
 
 #define QEMU_MADV_INVALID -1
 
diff --git a/migration/ram.c b/migration/ram.c
index 40e78952ad..6a9c7d6e2f 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3320,8 +3320,7 @@ int colo_init_ram_cache(void)
     WITH_RCU_READ_LOCK_GUARD() {
         RAMBLOCK_FOREACH_NOT_IGNORED(block) {
             block->colo_cache = qemu_anon_ram_alloc(block->used_length,
-                                                    NULL,
-                                                    false);
+                                                    NULL, false, false);
             if (!block->colo_cache) {
                 error_report("%s: Can't alloc memory for COLO cache of block %s,"
                              "size 0x" RAM_ADDR_FMT, __func__, block->idstr,
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index d1a1a6b72e..608ba75d18 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -1584,6 +1584,7 @@ static void *file_ram_alloc(RAMBlock *block,
     qemu_map_flags = readonly ? QEMU_MAP_READONLY : 0;
     qemu_map_flags |= (block->flags & RAM_SHARED) ? QEMU_MAP_SHARED : 0;
     qemu_map_flags |= (block->flags & RAM_PMEM) ? QEMU_MAP_SYNC : 0;
+    qemu_map_flags |= (block->flags & RAM_NORESERVE) ? QEMU_MAP_NORESERVE : 0;
     area = qemu_ram_mmap(fd, memory, block->mr->align, qemu_map_flags, offset);
     if (area == MAP_FAILED) {
         error_setg_errno(errp, errno,
@@ -1704,6 +1705,11 @@ bool qemu_ram_is_shared(RAMBlock *rb)
     return rb->flags & RAM_SHARED;
 }
 
+bool qemu_ram_is_noreserve(RAMBlock *rb)
+{
+    return rb->flags & RAM_NORESERVE;
+}
+
 /* Note: Only set at the start of postcopy */
 bool qemu_ram_is_uf_zeroable(RAMBlock *rb)
 {
@@ -1931,6 +1937,7 @@ static void dirty_memory_extend(ram_addr_t old_ram_size,
 
 static void ram_block_add(RAMBlock *new_block, Error **errp)
 {
+    const bool noreserve = qemu_ram_is_noreserve(new_block);
     const bool shared = qemu_ram_is_shared(new_block);
     RAMBlock *block;
     RAMBlock *last_block = NULL;
@@ -1954,7 +1961,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
         } else {
             new_block->host = qemu_anon_ram_alloc(new_block->max_length,
                                                   &new_block->mr->align,
-                                                  shared);
+                                                  shared, noreserve);
             if (!new_block->host) {
                 error_setg_errno(errp, errno,
                                  "cannot set up guest memory '%s'",
@@ -2025,7 +2032,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
     int64_t file_size, file_align;
 
     /* Just support these ram flags by now. */
-    assert((ram_flags & ~(RAM_SHARED | RAM_PMEM)) == 0);
+    assert((ram_flags & ~(RAM_SHARED | RAM_PMEM | RAM_NORESERVE)) == 0);
 
     if (xen_enabled()) {
         error_setg(errp, "-mem-path not supported with Xen");
@@ -2116,7 +2123,7 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
     RAMBlock *new_block;
     Error *local_err = NULL;
 
-    assert((ram_flags & ~(RAM_SHARED | RAM_RESIZEABLE)) == 0);
+    assert((ram_flags & ~(RAM_SHARED | RAM_RESIZEABLE | RAM_NORESERVE)) == 0);
 
     size = HOST_PAGE_ALIGN(size);
     max_size = HOST_PAGE_ALIGN(max_size);
@@ -2151,7 +2158,7 @@ RAMBlock *qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
 RAMBlock *qemu_ram_alloc(ram_addr_t size, uint32_t ram_flags,
                          MemoryRegion *mr, Error **errp)
 {
-    assert((ram_flags & ~RAM_SHARED) == 0);
+    assert((ram_flags & ~(RAM_SHARED | RAM_NORESERVE)) == 0);
     return qemu_ram_alloc_internal(size, size, NULL, NULL, ram_flags, mr, errp);
 }
 
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 1ddc0e2a1e..d0cf4aaee5 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -20,6 +20,7 @@
 #include "qemu/osdep.h"
 #include "qemu/mmap-alloc.h"
 #include "qemu/host-utils.h"
+#include "qemu/error-report.h"
 
 #define HUGETLBFS_MAGIC       0x958458f6
 
@@ -121,6 +122,7 @@ static void *mmap_reserve(size_t size, int fd)
 static void *mmap_activate(void *ptr, size_t size, int fd,
                            uint32_t qemu_map_flags, off_t map_offset)
 {
+    const bool noreserve = qemu_map_flags & QEMU_MAP_NORESERVE;
     const bool readonly = qemu_map_flags & QEMU_MAP_READONLY;
     const bool shared = qemu_map_flags & QEMU_MAP_SHARED;
     const bool sync = qemu_map_flags & QEMU_MAP_SYNC;
@@ -129,6 +131,11 @@ static void *mmap_activate(void *ptr, size_t size, int fd,
     int flags = MAP_FIXED;
     void *activated_ptr;
 
+    if (noreserve) {
+        error_report("Skipping reservation of swap space is not supported");
+        return MAP_FAILED;
+    }
+
     flags |= fd == -1 ? MAP_ANONYMOUS : 0;
     flags |= shared ? MAP_SHARED : MAP_PRIVATE;
     if (shared && sync) {
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index c12108c6cb..5bbdebae0e 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -227,9 +227,11 @@ void *qemu_memalign(size_t alignment, size_t size)
 }
 
 /* alloc shared memory pages */
-void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
+void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared,
+                          bool noreserve)
 {
-    const uint32_t qemu_map_flags = shared ? QEMU_MAP_SHARED : 0;
+    const uint32_t qemu_map_flags = (shared ? QEMU_MAP_SHARED : 0) |
+                                    (noreserve ? QEMU_MAP_NORESERVE : 0);
     size_t align = QEMU_VMALLOC_ALIGN;
     void *ptr = qemu_ram_mmap(-1, size, align, qemu_map_flags, 0);
 
diff --git a/util/oslib-win32.c b/util/oslib-win32.c
index f68b8012bb..8cafe44179 100644
--- a/util/oslib-win32.c
+++ b/util/oslib-win32.c
@@ -39,6 +39,7 @@
 #include "trace.h"
 #include "qemu/sockets.h"
 #include "qemu/cutils.h"
+#include "qemu/error-report.h"
 #include <malloc.h>
 
 /* this must come after including "trace.h" */
@@ -77,10 +78,20 @@ static int get_allocation_granularity(void)
     return system_info.dwAllocationGranularity;
 }
 
-void *qemu_anon_ram_alloc(size_t size, uint64_t *align, bool shared)
+void *qemu_anon_ram_alloc(size_t size, uint64_t *align, bool shared,
+                          bool noreserve)
 {
     void *ptr;
 
+    if (noreserve) {
+        /*
+         * We need a MEM_COMMIT before accessing any memory in a MEM_RESERVE
+         * area; we cannot easily mimic POSIX MAP_NORESERVE semantics.
+         */
+        error_report("Skipping reservation of swap space is not supported.");
+        return NULL;
+    }
+
     ptr = VirtualAlloc(NULL, size, MEM_COMMIT, PAGE_READWRITE);
     trace_qemu_anon_ram_alloc(size, ptr);
 
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v4 11/14] util/mmap-alloc: Support RAM_NORESERVE via MAP_NORESERVE under Linux
  2021-03-19 10:12 [PATCH v4 00/14] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
                   ` (9 preceding siblings ...)
  2021-03-19 10:12 ` [PATCH v4 10/14] memory: Introduce RAM_NORESERVE and wire it up in qemu_ram_mmap() David Hildenbrand
@ 2021-03-19 10:12 ` David Hildenbrand
  2021-03-23 20:56   ` Peter Xu
  2021-03-19 10:12 ` [PATCH v4 12/14] hostmem: Wire up RAM_NORESERVE via "reserve" property David Hildenbrand
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 25+ messages in thread
From: David Hildenbrand @ 2021-03-19 10:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Murilo Opsfelder Araujo, Igor Kotrasinski,
	Eduardo Habkost, Michael S. Tsirkin, David Hildenbrand,
	Richard Henderson, Dr. David Alan Gilbert, Peter Xu, Greg Kurz,
	Stefan Hajnoczi, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Markus Armbruster

Let's support RAM_NORESERVE via MAP_NORESERVE on Linux. The flag has no
effect on most shared mappings - except for hugetlbfs and anonymous memory.

Linux man page:
  "MAP_NORESERVE: Do not reserve swap space for this mapping. When swap
  space is reserved, one has the guarantee that it is possible to modify
  the mapping. When swap space is not reserved one might get SIGSEGV
  upon a write if no physical memory is available. See also the discussion
  of the file /proc/sys/vm/overcommit_memory in proc(5). In kernels before
  2.6, this flag had effect only for private writable mappings."

Note that the "guarantee" part is wrong with memory overcommit in Linux.

Also, in Linux hugetlbfs is treated differently - we configure reservation
of huge pages from the pool, not reservation of swap space (huge pages
cannot be swapped).

The rough behavior is [1]:
a) !Hugetlbfs:

  1) Without MAP_NORESERVE *or* with memory overcommit under Linux
     disabled ("/proc/sys/vm/overcommit_memory == 2"), the following
     accounting/reservation happens:
      For a file backed map
       SHARED or READ-only - 0 cost (the file is the map not swap)
       PRIVATE WRITABLE - size of mapping per instance

      For an anonymous or /dev/zero map
       SHARED   - size of mapping
       PRIVATE READ-only - 0 cost (but of little use)
       PRIVATE WRITABLE - size of mapping per instance

  2) With MAP_NORESERVE, no accounting/reservation happens.

b) Hugetlbfs:

  1) Without MAP_NORESERVE, huge pages are reserved.

  2) With MAP_NORESERVE, no huge pages are reserved.

Note: With "/proc/sys/vm/overcommit_memory == 0", we were already able
to configure it for !hugetlbfs globally; this toggle now allows
configuring it more fine-grained, not for the whole system.

The target use case is virtio-mem, which dynamically exposes memory
inside a large, sparse memory area to the VM.

[1] https://www.kernel.org/doc/Documentation/vm/overcommit-accounting

Signed-off-by: David Hildenbrand <david@redhat.com>
---
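The accounting rules summarized in a) and b) above can be observed with a
small standalone program; this is for illustration only and not part of the
patch (64-bit Linux host assumed):

    #include <sys/mman.h>
    #include <stdio.h>

    int main(void)
    {
        /* Far larger than RAM + swap on a typical host. */
        size_t size = 1ULL << 40; /* 1 TiB */
        void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        /*
         * No swap space was accounted for the range; actually touching the
         * pages later can still fail (SIGSEGV/OOM) once memory runs out.
         */
        printf("mapped %zu bytes without reserving swap space\n", size);
        munmap(p, size);
        return 0;
    }

With the default "overcommit_memory == 0", the same mmap() without
MAP_NORESERVE will typically be refused once the size exceeds RAM + swap;
with "overcommit_memory == 2" it is refused even with MAP_NORESERVE -- which
is exactly the case map_noreserve_effective() detects and reports.
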
 include/qemu/osdep.h |  3 ++
 softmmu/physmem.c    |  1 +
 util/mmap-alloc.c    | 69 ++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 877d9a5eff..d5c1860508 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -176,6 +176,9 @@ extern int daemon(int, int);
 #ifndef MAP_FIXED_NOREPLACE
 #define MAP_FIXED_NOREPLACE 0
 #endif
+#ifndef MAP_NORESERVE
+#define MAP_NORESERVE 0
+#endif
 #ifndef ENOMEDIUM
 #define ENOMEDIUM ENODEV
 #endif
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 608ba75d18..0d4376b04e 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -2229,6 +2229,7 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
                 flags = MAP_FIXED;
                 flags |= block->flags & RAM_SHARED ?
                          MAP_SHARED : MAP_PRIVATE;
+                flags |= block->flags & RAM_NORESERVE ? MAP_NORESERVE : 0;
                 if (block->fd >= 0) {
                     area = mmap(vaddr, length, PROT_READ | PROT_WRITE,
                                 flags, block->fd, offset);
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index d0cf4aaee5..838e286ce5 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -20,6 +20,7 @@
 #include "qemu/osdep.h"
 #include "qemu/mmap-alloc.h"
 #include "qemu/host-utils.h"
+#include "qemu/cutils.h"
 #include "qemu/error-report.h"
 
 #define HUGETLBFS_MAGIC       0x958458f6
@@ -83,6 +84,70 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
     return qemu_real_host_page_size;
 }
 
+#define OVERCOMMIT_MEMORY_PATH "/proc/sys/vm/overcommit_memory"
+static bool map_noreserve_effective(int fd, uint32_t qemu_map_flags)
+{
+#if defined(__linux__)
+    const bool readonly = qemu_map_flags & QEMU_MAP_READONLY;
+    const bool shared = qemu_map_flags & QEMU_MAP_SHARED;
+    gchar *content = NULL;
+    const char *endptr;
+    unsigned int tmp;
+
+    /*
+     * hugetlb accounting is different from ordinary swap reservation:
+     * a) Hugetlb pages from the pool are reserved for both private and
+     *    shared mappings. For shared mappings, all mappers have to specify
+     *    MAP_NORESERVE.
+     * b) MAP_NORESERVE is not affected by /proc/sys/vm/overcommit_memory.
+     */
+    if (qemu_fd_getpagesize(fd) != qemu_real_host_page_size) {
+        return true;
+    }
+
+    /*
+     * Accountable mappings in the kernel that can be affected by MAP_NORESERVE
+     * are private writable mappings (see mm/mmap.c:accountable_mapping() in
+     * Linux). For all shared or readonly mappings, MAP_NORESERVE is always
+     * implicitly active -- no reservation; this includes shmem. The only
+     * exception is shared anonymous memory, which is accounted like private
+     * anonymous memory.
+     */
+    if (readonly || (shared && fd >= 0)) {
+        return true;
+    }
+
+    /*
+     * MAP_NORESERVE is globally ignored for applicable !hugetlb mappings when
+     * memory overcommit is set to "never". Sparse memory regions aren't really
+     * possible in this system configuration.
+     *
+     * Bail out now instead of silently committing way more memory than
+     * currently desired by the user.
+     */
+    if (g_file_get_contents(OVERCOMMIT_MEMORY_PATH, &content, NULL, NULL) &&
+        !qemu_strtoui(content, &endptr, 0, &tmp) &&
+        (!endptr || *endptr == '\n')) {
+        if (tmp == 2) {
+            error_report("Skipping reservation of swap space is not supported:"
+                         " \"" OVERCOMMIT_MEMORY_PATH "\" is \"2\"");
+            return false;
+        }
+        return true;
+    }
+    /* this interface has been around since Linux 2.6 */
+    error_report("Skipping reservation of swap space is not supported:"
+                 " Could not read: \"" OVERCOMMIT_MEMORY_PATH "\"");
+    return false;
+#endif
+    /*
+     * E.g., FreeBSD used to define MAP_NORESERVE, never implemented it,
+     * and removed it a while ago.
+     */
+    error_report("Skipping reservation of swap space is not supported");
+    return false;
+}
+
 /*
  * Reserve a new memory region of the requested size to be used for mapping
  * from the given fd (if any).
@@ -131,13 +196,13 @@ static void *mmap_activate(void *ptr, size_t size, int fd,
     int flags = MAP_FIXED;
     void *activated_ptr;
 
-    if (noreserve) {
-        error_report("Skipping reservation of swap space is not supported");
+    if (noreserve && !map_noreserve_effective(fd, qemu_map_flags)) {
         return MAP_FAILED;
     }
 
     flags |= fd == -1 ? MAP_ANONYMOUS : 0;
     flags |= shared ? MAP_SHARED : MAP_PRIVATE;
+    flags |= noreserve ? MAP_NORESERVE : 0;
     if (shared && sync) {
         map_sync_flags = MAP_SYNC | MAP_SHARED_VALIDATE;
     }
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v4 12/14] hostmem: Wire up RAM_NORESERVE via "reserve" property
  2021-03-19 10:12 [PATCH v4 00/14] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
                   ` (10 preceding siblings ...)
  2021-03-19 10:12 ` [PATCH v4 11/14] util/mmap-alloc: Support RAM_NORESERVE via MAP_NORESERVE under Linux David Hildenbrand
@ 2021-03-19 10:12 ` David Hildenbrand
  2021-03-19 10:12 ` [PATCH v4 13/14] qmp: Include "reserve" property of memory backends David Hildenbrand
  2021-03-19 10:12 ` [PATCH v4 14/14] hmp: Print "reserve" property of memory backends with "info memdev" David Hildenbrand
  13 siblings, 0 replies; 25+ messages in thread
From: David Hildenbrand @ 2021-03-19 10:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Murilo Opsfelder Araujo, Igor Kotrasinski,
	Eduardo Habkost, Michael S. Tsirkin, David Hildenbrand,
	Richard Henderson, Dr. David Alan Gilbert, Peter Xu, Greg Kurz,
	Stefan Hajnoczi, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Markus Armbruster

Let's provide a way to control the use of RAM_NORESERVE via memory
backends using the "reserve" property which defaults to true (old
behavior).

Only Linux currently supports setting the flag (and support is checked at
runtime, depending on the setting of "/proc/sys/vm/overcommit_memory").
Other POSIX systems and Windows will bail out.

The target use case is virtio-mem, which dynamically exposes memory
inside a large, sparse memory area to the VM. This essentially avoids
having to set "/proc/sys/vm/overcommit_memory == 1" when using virtio-mem
and will also allow supporting hugetlbfs in the future.

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
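For illustration only (not part of the patch), the property is consumed like
any other memory backend property on the command line; IDs, sizes and paths
are placeholders:

    -object memory-backend-ram,id=mem0,size=100G,reserve=off
    -object memory-backend-file,id=mem1,size=10G,mem-path=/dev/shm/f0,reserve=off
    -object memory-backend-memfd,id=mem2,size=10G,hugetlb=on,hugetlbsize=2M,reserve=off

With "reserve=off", the backend requests RAM_NORESERVE; as described above,
this currently only has an effect on Linux and makes backend creation fail
on other hosts.
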
 backends/hostmem-file.c  | 11 ++++++-----
 backends/hostmem-memfd.c |  1 +
 backends/hostmem-ram.c   |  1 +
 backends/hostmem.c       | 33 +++++++++++++++++++++++++++++++++
 include/sysemu/hostmem.h |  2 +-
 5 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index b683da9daf..9d550e53d4 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -40,6 +40,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
                object_get_typename(OBJECT(backend)));
 #else
     HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(backend);
+    uint32_t ram_flags;
     gchar *name;
 
     if (!backend->size) {
@@ -52,11 +53,11 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
     }
 
     name = host_memory_backend_get_name(backend);
-    memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
-                                     name,
-                                     backend->size, fb->align,
-                                     (backend->share ? RAM_SHARED : 0) |
-                                     (fb->is_pmem ? RAM_PMEM : 0),
+    ram_flags = backend->share ? RAM_SHARED : 0;
+    ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
+    ram_flags |= fb->is_pmem ? RAM_PMEM : 0;
+    memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), name,
+                                     backend->size, fb->align, ram_flags,
                                      fb->mem_path, fb->readonly, errp);
     g_free(name);
 #endif
diff --git a/backends/hostmem-memfd.c b/backends/hostmem-memfd.c
index 93b5d1a4cf..f3436b623d 100644
--- a/backends/hostmem-memfd.c
+++ b/backends/hostmem-memfd.c
@@ -55,6 +55,7 @@ memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
 
     name = host_memory_backend_get_name(backend);
     ram_flags = backend->share ? RAM_SHARED : 0;
+    ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
     memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend), name,
                                    backend->size, ram_flags, fd, 0, errp);
     g_free(name);
diff --git a/backends/hostmem-ram.c b/backends/hostmem-ram.c
index 741e701062..b8e55cdbd0 100644
--- a/backends/hostmem-ram.c
+++ b/backends/hostmem-ram.c
@@ -29,6 +29,7 @@ ram_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
 
     name = host_memory_backend_get_name(backend);
     ram_flags = backend->share ? RAM_SHARED : 0;
+    ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
     memory_region_init_ram_flags_nomigrate(&backend->mr, OBJECT(backend), name,
                                            backend->size, ram_flags, errp);
     g_free(name);
diff --git a/backends/hostmem.c b/backends/hostmem.c
index c6c1ff5b99..9ec5141112 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -217,6 +217,11 @@ static void host_memory_backend_set_prealloc(Object *obj, bool value,
     Error *local_err = NULL;
     HostMemoryBackend *backend = MEMORY_BACKEND(obj);
 
+    if (!backend->reserve && value) {
+        error_setg(errp, "'prealloc=on' and 'reserve=off' are incompatible");
+        return;
+    }
+
     if (!host_memory_backend_mr_inited(backend)) {
         backend->prealloc = value;
         return;
@@ -268,6 +273,7 @@ static void host_memory_backend_init(Object *obj)
     /* TODO: convert access to globals to compat properties */
     backend->merge = machine_mem_merge(machine);
     backend->dump = machine_dump_guest_core(machine);
+    backend->reserve = true;
     backend->prealloc_threads = 1;
 }
 
@@ -426,6 +432,28 @@ static void host_memory_backend_set_share(Object *o, bool value, Error **errp)
     backend->share = value;
 }
 
+static bool host_memory_backend_get_reserve(Object *o, Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(o);
+
+    return backend->reserve;
+}
+
+static void host_memory_backend_set_reserve(Object *o, bool value, Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(o);
+
+    if (host_memory_backend_mr_inited(backend)) {
+        error_setg(errp, "cannot change property value");
+        return;
+    }
+    if (backend->prealloc && !value) {
+        error_setg(errp, "'prealloc=on' and 'reserve=off' are incompatible");
+        return;
+    }
+    backend->reserve = value;
+}
+
 static bool
 host_memory_backend_get_use_canonical_path(Object *obj, Error **errp)
 {
@@ -494,6 +522,11 @@ host_memory_backend_class_init(ObjectClass *oc, void *data)
         host_memory_backend_get_share, host_memory_backend_set_share);
     object_class_property_set_description(oc, "share",
         "Mark the memory as private to QEMU or shared");
+    object_class_property_add_bool(oc, "reserve",
+        host_memory_backend_get_reserve, host_memory_backend_set_reserve);
+    object_class_property_set_description(oc, "reserve",
+        "Reserve swap space (or huge pages if applicable) for the whole memory"
+        " backend, if supported by the OS.");
     /*
      * Do not delete/rename option. This option must be considered stable
      * (as if it didn't have the 'x-' prefix including deprecation period) as
diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h
index df5644723a..9ff5c16963 100644
--- a/include/sysemu/hostmem.h
+++ b/include/sysemu/hostmem.h
@@ -64,7 +64,7 @@ struct HostMemoryBackend {
     /* protected */
     uint64_t size;
     bool merge, dump, use_canonical_path;
-    bool prealloc, is_mapped, share;
+    bool prealloc, is_mapped, share, reserve;
     uint32_t prealloc_threads;
     DECLARE_BITMAP(host_nodes, MAX_NODES + 1);
     HostMemPolicy policy;
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v4 13/14] qmp: Include "reserve" property of memory backends
  2021-03-19 10:12 [PATCH v4 00/14] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
                   ` (11 preceding siblings ...)
  2021-03-19 10:12 ` [PATCH v4 12/14] hostmem: Wire up RAM_NORESERVE via "reserve" property David Hildenbrand
@ 2021-03-19 10:12 ` David Hildenbrand
  2021-03-19 15:40   ` Markus Armbruster
  2021-03-19 10:12 ` [PATCH v4 14/14] hmp: Print "reserve" property of memory backends with "info memdev" David Hildenbrand
  13 siblings, 1 reply; 25+ messages in thread
From: David Hildenbrand @ 2021-03-19 10:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Murilo Opsfelder Araujo, Igor Kotrasinski,
	Eduardo Habkost, Michael S. Tsirkin, David Hildenbrand,
	Richard Henderson, Dr. David Alan Gilbert, Peter Xu, Greg Kurz,
	Stefan Hajnoczi, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Markus Armbruster

Let's include the new property.

Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
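Purely illustrative (not part of the patch): a query-memdev exchange showing
the new member; the backend and all values are made up.

    -> { "execute": "query-memdev" }
    <- { "return": [
           { "id": "mem0",
             "size": 10737418240,
             "merge": true,
             "dump": true,
             "prealloc": false,
             "reserve": false,
             "host-nodes": [],
             "policy": "default" } ] }
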
 hw/core/machine-qmp-cmds.c | 1 +
 qapi/machine.json          | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
index 68a942595a..bd2a7f2dd0 100644
--- a/hw/core/machine-qmp-cmds.c
+++ b/hw/core/machine-qmp-cmds.c
@@ -174,6 +174,7 @@ static int query_memdev(Object *obj, void *opaque)
         m->merge = object_property_get_bool(obj, "merge", &error_abort);
         m->dump = object_property_get_bool(obj, "dump", &error_abort);
         m->prealloc = object_property_get_bool(obj, "prealloc", &error_abort);
+        m->reserve = object_property_get_bool(obj, "reserve", &error_abort);
         m->policy = object_property_get_enum(obj, "policy", "HostMemPolicy",
                                              &error_abort);
         host_nodes = object_property_get_qobject(obj,
diff --git a/qapi/machine.json b/qapi/machine.json
index c0c52aef10..12860a1f79 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -814,6 +814,11 @@
 #
 # @prealloc: enables or disables memory preallocation
 #
+# @reserve: enables or disables reservation of swap space (or huge pages
+#           if applicable). If reservation is enabled (default), actual
+#           reservation depends on underlying OS support. In contrast,
+#           disabling reservation without OS support will bail out. (since 6.1)
+#
 # @host-nodes: host nodes for its memory policy
 #
 # @policy: memory policy of memory backend
@@ -827,6 +832,7 @@
     'merge':      'bool',
     'dump':       'bool',
     'prealloc':   'bool',
+    'reserve':    'bool',
     'host-nodes': ['uint16'],
     'policy':     'HostMemPolicy' }}
 
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v4 14/14] hmp: Print "reserve" property of memory backends with "info memdev"
  2021-03-19 10:12 [PATCH v4 00/14] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
                   ` (12 preceding siblings ...)
  2021-03-19 10:12 ` [PATCH v4 13/14] qmp: Include "reserve" property of memory backends David Hildenbrand
@ 2021-03-19 10:12 ` David Hildenbrand
  2021-03-25 19:00   ` Dr. David Alan Gilbert
  13 siblings, 1 reply; 25+ messages in thread
From: David Hildenbrand @ 2021-03-19 10:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Murilo Opsfelder Araujo, Igor Kotrasinski,
	Eduardo Habkost, Michael S. Tsirkin, David Hildenbrand,
	Richard Henderson, Dr. David Alan Gilbert, Peter Xu, Greg Kurz,
	Stefan Hajnoczi, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Markus Armbruster

Let's print the new property.

Cc: Markus Armbruster <armbru@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
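Roughly what the resulting "info memdev" output looks like (not part of the
patch; surrounding fields abbreviated, values made up):

    (qemu) info memdev
    memory backend: mem0
      size:  10737418240
      ...
      prealloc: false
      reserve: false
      policy: default
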
 hw/core/machine-hmp-cmds.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/core/machine-hmp-cmds.c b/hw/core/machine-hmp-cmds.c
index 58248cffa3..fb717f69b9 100644
--- a/hw/core/machine-hmp-cmds.c
+++ b/hw/core/machine-hmp-cmds.c
@@ -110,6 +110,8 @@ void hmp_info_memdev(Monitor *mon, const QDict *qdict)
                        m->value->dump ? "true" : "false");
         monitor_printf(mon, "  prealloc: %s\n",
                        m->value->prealloc ? "true" : "false");
+        monitor_printf(mon, "  reserve: %s\n",
+                       m->value->reserve ? "true" : "false");
         monitor_printf(mon, "  policy: %s\n",
                        HostMemPolicy_str(m->value->policy));
         visit_complete(v, &str);
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 13/14] qmp: Include "reserve" property of memory backends
  2021-03-19 10:12 ` [PATCH v4 13/14] qmp: Include "reserve" property of memory backends David Hildenbrand
@ 2021-03-19 15:40   ` Markus Armbruster
  2021-03-19 15:49     ` David Hildenbrand
  0 siblings, 1 reply; 25+ messages in thread
From: Markus Armbruster @ 2021-03-19 15:40 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Marcel Apfelbaum, Eduardo Habkost, Michael S. Tsirkin,
	Igor Kotrasinski, Richard Henderson, qemu-devel, Peter Xu,
	Dr. David Alan Gilbert, Greg Kurz, Paolo Bonzini,
	Stefan Hajnoczi, Murilo Opsfelder Araujo, Igor Mammedov,
	Philippe Mathieu-Daudé

David Hildenbrand <david@redhat.com> writes:

> Let's include the new property.
>
> Cc: Eric Blake <eblake@redhat.com>
> Cc: Markus Armbruster <armbru@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  hw/core/machine-qmp-cmds.c | 1 +
>  qapi/machine.json          | 6 ++++++
>  2 files changed, 7 insertions(+)
>
> diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
> index 68a942595a..bd2a7f2dd0 100644
> --- a/hw/core/machine-qmp-cmds.c
> +++ b/hw/core/machine-qmp-cmds.c
> @@ -174,6 +174,7 @@ static int query_memdev(Object *obj, void *opaque)
>          m->merge = object_property_get_bool(obj, "merge", &error_abort);
>          m->dump = object_property_get_bool(obj, "dump", &error_abort);
>          m->prealloc = object_property_get_bool(obj, "prealloc", &error_abort);
> +        m->reserve = object_property_get_bool(obj, "reserve", &error_abort);
>          m->policy = object_property_get_enum(obj, "policy", "HostMemPolicy",
>                                               &error_abort);
>          host_nodes = object_property_get_qobject(obj,
> diff --git a/qapi/machine.json b/qapi/machine.json
> index c0c52aef10..12860a1f79 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -814,6 +814,11 @@
>  #
>  # @prealloc: enables or disables memory preallocation
>  #
> +# @reserve: enables or disables reservation of swap space (or huge pages
> +#           if applicable). If reservation is enabled (default), actual
> +#           reservation depends on underlying OS support. In contrast,
> +#           disabling reservation without OS support will bail out. (since 6.1)
> +#

Provides two settings: "enable reservation if possible", and "disable
reservation or else fail".

Does "enable reservation or else fail" make no sense, or is it merely
unimplemented?

>  # @host-nodes: host nodes for its memory policy
>  #
>  # @policy: memory policy of memory backend
> @@ -827,6 +832,7 @@
>      'merge':      'bool',
>      'dump':       'bool',
>      'prealloc':   'bool',
> +    'reserve':    'bool',
>      'host-nodes': ['uint16'],
>      'policy':     'HostMemPolicy' }}



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 13/14] qmp: Include "reserve" property of memory backends
  2021-03-19 15:40   ` Markus Armbruster
@ 2021-03-19 15:49     ` David Hildenbrand
  2021-03-19 16:32       ` Markus Armbruster
  0 siblings, 1 reply; 25+ messages in thread
From: David Hildenbrand @ 2021-03-19 15:49 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Marcel Apfelbaum, Eduardo Habkost, Michael S. Tsirkin,
	Igor Kotrasinski, Richard Henderson, qemu-devel, Peter Xu,
	Dr. David Alan Gilbert, Greg Kurz, Paolo Bonzini,
	Stefan Hajnoczi, Murilo Opsfelder Araujo, Igor Mammedov,
	Philippe Mathieu-Daudé

On 19.03.21 16:40, Markus Armbruster wrote:
> David Hildenbrand <david@redhat.com> writes:
> 
>> Let's include the new property.
>>
>> Cc: Eric Blake <eblake@redhat.com>
>> Cc: Markus Armbruster <armbru@redhat.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>   hw/core/machine-qmp-cmds.c | 1 +
>>   qapi/machine.json          | 6 ++++++
>>   2 files changed, 7 insertions(+)
>>
>> diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
>> index 68a942595a..bd2a7f2dd0 100644
>> --- a/hw/core/machine-qmp-cmds.c
>> +++ b/hw/core/machine-qmp-cmds.c
>> @@ -174,6 +174,7 @@ static int query_memdev(Object *obj, void *opaque)
>>           m->merge = object_property_get_bool(obj, "merge", &error_abort);
>>           m->dump = object_property_get_bool(obj, "dump", &error_abort);
>>           m->prealloc = object_property_get_bool(obj, "prealloc", &error_abort);
>> +        m->reserve = object_property_get_bool(obj, "reserve", &error_abort);
>>           m->policy = object_property_get_enum(obj, "policy", "HostMemPolicy",
>>                                                &error_abort);
>>           host_nodes = object_property_get_qobject(obj,
>> diff --git a/qapi/machine.json b/qapi/machine.json
>> index c0c52aef10..12860a1f79 100644
>> --- a/qapi/machine.json
>> +++ b/qapi/machine.json
>> @@ -814,6 +814,11 @@
>>   #
>>   # @prealloc: enables or disables memory preallocation
>>   #
>> +# @reserve: enables or disables reservation of swap space (or huge pages
>> +#           if applicable). If reservation is enabled (default), actual
>> +#           reservation depends on underlying OS support. In contrast,
>> +#           disabling reservation without OS support will bail out. (since 6.1)
>> +#
> 
> Provides two settings: "enable reservation if possible", and "disable
> reservation or else fail".
> 
> Does "enable reservation or else fail" make no sense, or is it merely
> unimplemented?

The default for now used to be "enable reservation if possible". For 
example, Windows always reserves/commits the whole region. Under Linux, 
reservation is always done for private memory mappings, however, 
especially for basically all (with one exception) shared memory there is 
no reservation of any kind (with another exception).

For example, it does not make sense to reserve swap space for a 
file-backed mapping; we can just write back to the file in case we run 
out of memory. Therefore, Linux will never reserve swap space in that case.

So if we were to implement a "enable reservation or else fail", the 
default ("true") would no longer work for existing setups.

Usually we want "enable reservation if possible" unless in special cases 
("definitely avoid the reservation")

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 13/14] qmp: Include "reserve" property of memory backends
  2021-03-19 15:49     ` David Hildenbrand
@ 2021-03-19 16:32       ` Markus Armbruster
  2021-03-19 16:40         ` David Hildenbrand
  0 siblings, 1 reply; 25+ messages in thread
From: Markus Armbruster @ 2021-03-19 16:32 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Marcel Apfelbaum, Eduardo Habkost, Michael S. Tsirkin,
	Igor Kotrasinski, Richard Henderson, qemu-devel, Peter Xu,
	Dr. David Alan Gilbert, Greg Kurz, Stefan Hajnoczi,
	Murilo Opsfelder Araujo, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Igor Mammedov

David Hildenbrand <david@redhat.com> writes:

> On 19.03.21 16:40, Markus Armbruster wrote:
>> David Hildenbrand <david@redhat.com> writes:
>> 
>>> Let's include the new property.
>>>
>>> Cc: Eric Blake <eblake@redhat.com>
>>> Cc: Markus Armbruster <armbru@redhat.com>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>>   hw/core/machine-qmp-cmds.c | 1 +
>>>   qapi/machine.json          | 6 ++++++
>>>   2 files changed, 7 insertions(+)
>>>
>>> diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
>>> index 68a942595a..bd2a7f2dd0 100644
>>> --- a/hw/core/machine-qmp-cmds.c
>>> +++ b/hw/core/machine-qmp-cmds.c
>>> @@ -174,6 +174,7 @@ static int query_memdev(Object *obj, void *opaque)
>>>           m->merge = object_property_get_bool(obj, "merge", &error_abort);
>>>           m->dump = object_property_get_bool(obj, "dump", &error_abort);
>>>           m->prealloc = object_property_get_bool(obj, "prealloc", &error_abort);
>>> +        m->reserve = object_property_get_bool(obj, "reserve", &error_abort);
>>>           m->policy = object_property_get_enum(obj, "policy", "HostMemPolicy",
>>>                                                &error_abort);
>>>           host_nodes = object_property_get_qobject(obj,
>>> diff --git a/qapi/machine.json b/qapi/machine.json
>>> index c0c52aef10..12860a1f79 100644
>>> --- a/qapi/machine.json
>>> +++ b/qapi/machine.json
>>> @@ -814,6 +814,11 @@
>>>   #
>>>   # @prealloc: enables or disables memory preallocation
>>>   #
>>> +# @reserve: enables or disables reservation of swap space (or huge pages
>>> +#           if applicable). If reservation is enabled (default), actual
>>> +#           reservation depends on underlying OS support. In contrast,
>>> +#           disabling reservation without OS support will bail out. (since 6.1)
>>> +#
>>
>> Provides two settings: "enable reservation if possible", and "disable
>> reservation or else fail".
>>
>> Does "enable reservation or else fail" make no sense, or is it merely
>> unimplemented?
>
> The default for now used to be "enable reservation if possible". For
> example, Windows always reserves/commits the whole region. Under
> Linux, reservation is always done for private memory mappings,
> however, especially for basically all (with one exception) shared
> memory there is no reservation of any kind (with another exception).
>
> For example, it does not make sense to reserve swap space for a
> file-backed mapping; we can just write back to the file in case we run
> out of memory. Therefore, Linux will never reserve swap space in that case.
>
> So if we were to implement a "enable reservation or else fail", the
> default ("true") would no longer work for existing setups.
>
> Usually we want "enable reservation if possible" unless in special
> cases ("definitely avoid the reservation")

Wait a second...  struct Memdev is actually the result of query-memdev,
and *not* a command or option argument.

Saying "enables or disables reservation of swap space" is misleading.
This isn't ever about enabling or disabling things, it's about querying
whether things are enabled or disabled.

Existing member documentation has the same issue:

    # @merge: enables or disables memory merge support
    #
    # @dump: includes memory backend's memory in a core dump or not
    #
    # @prealloc: enables or disables memory preallocation

Should be something like

    # @merge: whether memory merge support is enabled
    #
    # @dump: whether the memory backend's memory is included in a core dump
    #
    # @prealloc: whether memory is preallocated

The new member could be phrased like:

    # @reserved: whether swap space (or huge pages if applicable) have
    # been reserved.

Mind, I'm proposing how to phrase things, not how things are.  You'll
likely have to adjust the contents of my proposal to match reality.

If we can't always tell whether swap space (or whatever) has been
reserved, then

* If we can only ever tell when it has *not* been reserved, make false
  mean "not reserved", and true mean "dunno".

* If we can tell sometimes

  - but nobody cares for the difference between "reserved" and "dunno",
    same as above.

  - and users may care for the difference, we need three values: "not
    reserved", "reserved", and "dunno".  There are various ways to do
    that.  No use talking about them before we know we need one of them.



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 13/14] qmp: Include "reserve" property of memory backends
  2021-03-19 16:32       ` Markus Armbruster
@ 2021-03-19 16:40         ` David Hildenbrand
  0 siblings, 0 replies; 25+ messages in thread
From: David Hildenbrand @ 2021-03-19 16:40 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Marcel Apfelbaum, Eduardo Habkost, Michael S. Tsirkin,
	Igor Kotrasinski, Richard Henderson, qemu-devel, Peter Xu,
	Dr. David Alan Gilbert, Greg Kurz, Stefan Hajnoczi,
	Murilo Opsfelder Araujo, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Igor Mammedov

On 19.03.21 17:32, Markus Armbruster wrote:
> David Hildenbrand <david@redhat.com> writes:
> 
>> On 19.03.21 16:40, Markus Armbruster wrote:
>>> David Hildenbrand <david@redhat.com> writes:
>>>
>>>> Let's include the new property.
>>>>
>>>> Cc: Eric Blake <eblake@redhat.com>
>>>> Cc: Markus Armbruster <armbru@redhat.com>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> ---
>>>>    hw/core/machine-qmp-cmds.c | 1 +
>>>>    qapi/machine.json          | 6 ++++++
>>>>    2 files changed, 7 insertions(+)
>>>>
>>>> diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
>>>> index 68a942595a..bd2a7f2dd0 100644
>>>> --- a/hw/core/machine-qmp-cmds.c
>>>> +++ b/hw/core/machine-qmp-cmds.c
>>>> @@ -174,6 +174,7 @@ static int query_memdev(Object *obj, void *opaque)
>>>>            m->merge = object_property_get_bool(obj, "merge", &error_abort);
>>>>            m->dump = object_property_get_bool(obj, "dump", &error_abort);
>>>>            m->prealloc = object_property_get_bool(obj, "prealloc", &error_abort);
>>>> +        m->reserve = object_property_get_bool(obj, "reserve", &error_abort);
>>>>            m->policy = object_property_get_enum(obj, "policy", "HostMemPolicy",
>>>>                                                 &error_abort);
>>>>            host_nodes = object_property_get_qobject(obj,
>>>> diff --git a/qapi/machine.json b/qapi/machine.json
>>>> index c0c52aef10..12860a1f79 100644
>>>> --- a/qapi/machine.json
>>>> +++ b/qapi/machine.json
>>>> @@ -814,6 +814,11 @@
>>>>    #
>>>>    # @prealloc: enables or disables memory preallocation
>>>>    #
>>>> +# @reserve: enables or disables reservation of swap space (or huge pages
>>>> +#           if applicable). If reservation is enabled (default), actual
>>>> +#           reservation depends on underlying OS support. In contrast,
>>>> +#           disabling reservation without OS support will bail out. (since 6.1)
>>>> +#
>>>
>>> Provides two settings: "enable reservation if possible", and "disable
>>> reservation or else fail".
>>>
>>> Does "enable reservation or else fail" make no sense, or is it merely
>>> unimplemented?
>>
>> The default for now used to be "enable reservation if possible". For
>> example, Windows always reserves/commits the whole region. Under
>> Linux, reservation is always done for private memory mappings,
>> however, especially for basically all (with one exception) shared
>> memory there is no reservation of any kind (with another exception).
>>
>> For example, it does not make sense to reserve swap space for a
>> file-backed mapping; we can just write back to the file in case we run
>> out of memory. Therefore, Linux will never reserve swap space in that case.
>>
>> So if we were to implement a "enable reservation or else fail", the
>> default ("true") would no longer work for existing setups.
>>
>> Usually we want "enable reservation if possible" unless in special
>> cases ("definitely avoid the reservation")
> 
> Wait a second...  struct Memdev is actually the result of query-memdev,
> and *not* a command or option argument.
> 
> Saying "enables or disables reservation of swap space" is misleading.
> This isn't ever about enabling or disabling things, it's about querying
> whether things are enabled or disabled.
> 
> Existing member documentation has the same issue:
> 
>      # @merge: enables or disables memory merge support
>      #
>      # @dump: includes memory backend's memory in a core dump or not
>      #
>      # @prealloc: enables or disables memory preallocation

Yes, I was only playing along although it looked kind of weird ...

> 
> Should be something like
> 
>      # @merge: whether memory merge support is enabled
>      #
>      # @dump: whether the memory backend's memory is included in a core dump
>      #
>      # @prealloc: whether memory is preallocated
> 

I'll include a cleanup for these in the next version.


> The new member could be phrased like:
> 
>      # @reserved: whether swap space (or huge pages if applicable) have
>      # been reserved.
> 
> Mind, I'm proposing how to phrase things, not how things are.  You'll
> likely have to adjust the contents of my proposal to match reality.
> 
> If we can't always tell whether swap space (or whatever) has been
> reserved, then
> 
> * If we can only ever tell when it has *not* been reserved, make false
>    mean "not reserved", and true mean "dunno".
> 
> * If we can tell sometimes
> 
>    - but nobody cares for the difference between "reserved" and "dunno",
>      same as above.
> 
>    - and users may care for the difference, we need three values: "not
>      reserved", "reserved", and "dunno".  There are various ways to do
>      that.  No use talking about them before we know we need one of them.

Right, usually we care about "reserve if a reservation makes sense and 
is possible" - decided by the OS - and "definitely don't reserve if you 
would have reserved anything".

Thanks!

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 03/14] softmmu/physmem: Fix qemu_ram_remap() to handle shared anonymous memory
  2021-03-19 10:12 ` [PATCH v4 03/14] softmmu/physmem: Fix qemu_ram_remap() " David Hildenbrand
@ 2021-03-23 20:40   ` Peter Xu
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Xu @ 2021-03-23 20:40 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Marcel Apfelbaum, Igor Kotrasinski, Eduardo Habkost,
	Michael S. Tsirkin, Murilo Opsfelder Araujo, Richard Henderson,
	Dr. David Alan Gilbert, qemu-devel, Greg Kurz, Stefan Hajnoczi,
	Igor Mammedov, Paolo Bonzini, Philippe Mathieu-Daudé,
	Markus Armbruster

On Fri, Mar 19, 2021 at 11:12:19AM +0100, David Hildenbrand wrote:
> RAM_SHARED now also properly indicates shared anonymous memory. Let's check
> that flag for anonymous memory as well, to restore the proper mapping.
> 
> Fixes: 06329ccecfa0 ("mem: add share parameter to memory-backend-ram")
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 09/14] util/mmap-alloc: Pass flags instead of separate bools to qemu_ram_mmap()
  2021-03-19 10:12 ` [PATCH v4 09/14] util/mmap-alloc: Pass flags instead of separate bools to qemu_ram_mmap() David Hildenbrand
@ 2021-03-23 20:49   ` Peter Xu
  2021-03-25  9:40     ` David Hildenbrand
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Xu @ 2021-03-23 20:49 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Marcel Apfelbaum, Igor Kotrasinski, Eduardo Habkost,
	Michael S. Tsirkin, Murilo Opsfelder Araujo, Richard Henderson,
	Dr. David Alan Gilbert, qemu-devel, Greg Kurz, Stefan Hajnoczi,
	Igor Mammedov, Paolo Bonzini, Philippe Mathieu-Daudé,
	Markus Armbruster

On Fri, Mar 19, 2021 at 11:12:25AM +0100, David Hildenbrand wrote:
> Let's pass flags instead of bools to prepare for passing other flags and
> update the documentation of qemu_ram_mmap(). Introduce new QEMU_MAP_
> flags that abstract the mmap() PROT_ and MAP_ flag handling and simplify
> it.
> 
> We expose only flags that are currently supported by qemu_ram_mmap().
> Maybe, we'll see qemu_mmap() in the future as well that can implement these
> flags.
> 
> Note: We don't use MAP_ flags as some flags (e.g., MAP_SYNC) are only
> defined for some systems and we want to always be able to identify
> these flags reliably inside qemu_ram_mmap() -- for example, to properly
> warn when some future flags are not available or effective on a system.
> Also, this way we can simplify PROT_ handling as well.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  include/qemu/mmap-alloc.h | 16 +++++++++-------
>  include/qemu/osdep.h      | 18 ++++++++++++++++++
>  softmmu/physmem.c         |  8 +++++---
>  util/mmap-alloc.c         | 15 ++++++++-------
>  util/oslib-posix.c        |  3 ++-
>  5 files changed, 42 insertions(+), 18 deletions(-)
> 
> diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
> index 456ff87df1..a60a2085b3 100644
> --- a/include/qemu/mmap-alloc.h
> +++ b/include/qemu/mmap-alloc.h
> @@ -7,18 +7,22 @@ size_t qemu_fd_getpagesize(int fd);
>  size_t qemu_mempath_getpagesize(const char *mem_path);
>  
>  /**
> - * qemu_ram_mmap: mmap the specified file or device.
> + * qemu_ram_mmap: mmap anonymous memory, the specified file or device.
> + *
> + * mmap() abstraction to map guest RAM, simplifying flag handling, taking
> + * care of alignment requirements and installing guard pages.
>   *
>   * Parameters:
>   *  @fd: the file or the device to mmap
>   *  @size: the number of bytes to be mmaped
>   *  @align: if not zero, specify the alignment of the starting mapping address;
>   *          otherwise, the alignment in use will be determined by QEMU.
> - *  @readonly: true for a read-only mapping, false for read/write.
> - *  @shared: map has RAM_SHARED flag.
> - *  @is_pmem: map has RAM_PMEM flag.
> + *  @qemu_map_flags: QEMU_MAP_* flags
>   *  @map_offset: map starts at offset of map_offset from the start of fd
>   *
> + * Internally, MAP PRIVATE, MAP_ANONYMOUS and MAP_SHARED_VALIDATE are set
                     ^
                     |
                     +---- underscore

> + * implicitly based on other parameters.
> + *

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu
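
As an illustration of the idea, here is a sketch of how such QEMU_MAP_*
flags can be translated into native mmap() PROT_/MAP_ flags. The flag names
follow the description above, but the bit values and the helper are invented
for this example; they are not the definitions added to include/qemu/osdep.h:

    #include <stdint.h>
    #include <sys/mman.h>
    #include <sys/types.h>

    /* Invented bit values, for illustration only. */
    #define QEMU_MAP_READONLY  (1u << 0)
    #define QEMU_MAP_SHARED    (1u << 1)

    /* Sketch: derive mmap() prot/flags from abstract QEMU_MAP_* flags.
     * The real code in util/mmap-alloc.c additionally deals with
     * MAP_SHARED_VALIDATE/MAP_SYNC, alignment and guard pages. */
    static void *map_sketch(size_t size, int fd, uint32_t qemu_map_flags)
    {
        const int prot = PROT_READ |
            ((qemu_map_flags & QEMU_MAP_READONLY) ? 0 : PROT_WRITE);
        int flags = (qemu_map_flags & QEMU_MAP_SHARED) ? MAP_SHARED
                                                       : MAP_PRIVATE;

        if (fd == -1) {
            flags |= MAP_ANONYMOUS;  /* anonymous memory when no fd is given */
        }
        return mmap(NULL, size, prot, flags, fd, 0);
    }

The benefit of such QEMU-level flags is exactly what the commit message
states: they exist on every host, so callers and qemu_ram_mmap() can reason
about them even when the corresponding native MAP_ flag is unavailable.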




* Re: [PATCH v4 10/14] memory: Introduce RAM_NORESERVE and wire it up in qemu_ram_mmap()
  2021-03-19 10:12 ` [PATCH v4 10/14] memory: Introduce RAM_NORESERVE and wire it up in qemu_ram_mmap() David Hildenbrand
@ 2021-03-23 20:51   ` Peter Xu
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Xu @ 2021-03-23 20:51 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Marcel Apfelbaum, Igor Kotrasinski, Eduardo Habkost,
	Michael S. Tsirkin, Murilo Opsfelder Araujo, Richard Henderson,
	Dr. David Alan Gilbert, qemu-devel, Greg Kurz, Stefan Hajnoczi,
	Igor Mammedov, Paolo Bonzini, Philippe Mathieu-Daudé,
	Markus Armbruster

On Fri, Mar 19, 2021 at 11:12:26AM +0100, David Hildenbrand wrote:
> Let's introduce RAM_NORESERVE, allowing mmap'ing with MAP_NORESERVE. The
> new flag has the following semantics:
> 
> "
> RAM is mmap-ed with MAP_NORESERVE. When set, reserving swap space (or huge
> pages if applicable) is skipped: will bail out if not supported. When not
> set, the OS might do the reservation, depending on OS support.
> "
> 
> Allow passing it into:
> - memory_region_init_ram_nomigrate()
> - memory_region_init_resizeable_ram()
> - memory_region_init_ram_from_file()
> 
> ... and teach qemu_ram_mmap() and qemu_anon_ram_alloc() about the flag.
> Bail out if the flag is not supported, which is currently the case for
> both POSIX and win32. We will add Linux support next and allow specifying
> RAM_NORESERVE via memory backends.
> 
> The target use case is virtio-mem, which dynamically exposes memory
> inside a large, sparse memory area to the VM.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu
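
A rough sketch of what teaching the allocation path about the new flag
boils down to; the bit values below are invented for this illustration (the
real RAM_* flags live in include/exec/memory.h and the QEMU_MAP_* flags in
include/qemu/osdep.h):

    #include <stdint.h>

    /* Invented bit values, for illustration only. */
    #define RAM_SHARED          (1u << 1)
    #define RAM_NORESERVE       (1u << 7)
    #define QEMU_MAP_SHARED     (1u << 1)
    #define QEMU_MAP_NORESERVE  (1u << 3)

    /* Sketch: the RAMBlock-level flag is forwarded to the mmap abstraction,
     * which then either honors it or bails out if unsupported. */
    static uint32_t ram_flags_to_qemu_map_flags(uint32_t ram_flags)
    {
        uint32_t qemu_map_flags = 0;

        qemu_map_flags |= (ram_flags & RAM_SHARED) ? QEMU_MAP_SHARED : 0;
        qemu_map_flags |= (ram_flags & RAM_NORESERVE) ? QEMU_MAP_NORESERVE : 0;
        return qemu_map_flags;
    }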




* Re: [PATCH v4 11/14] util/mmap-alloc: Support RAM_NORESERVE via MAP_NORESERVE under Linux
  2021-03-19 10:12 ` [PATCH v4 11/14] util/mmap-alloc: Support RAM_NORESERVE via MAP_NORESERVE under Linux David Hildenbrand
@ 2021-03-23 20:56   ` Peter Xu
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Xu @ 2021-03-23 20:56 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Marcel Apfelbaum, Igor Kotrasinski, Eduardo Habkost,
	Michael S. Tsirkin, Murilo Opsfelder Araujo, Richard Henderson,
	Dr. David Alan Gilbert, qemu-devel, Greg Kurz, Stefan Hajnoczi,
	Igor Mammedov, Paolo Bonzini, Philippe Mathieu-Daudé,
	Markus Armbruster

On Fri, Mar 19, 2021 at 11:12:27AM +0100, David Hildenbrand wrote:
> Let's support RAM_NORESERVE via MAP_NORESERVE on Linux. The flag has no
> effect on most shared mappings - except for hugetlbfs and anonymous memory.
> 
> Linux man page:
>   "MAP_NORESERVE: Do not reserve swap space for this mapping. When swap
>   space is reserved, one has the guarantee that it is possible to modify
>   the mapping. When swap space is not reserved one might get SIGSEGV
>   upon a write if no physical memory is available. See also the discussion
>   of the file /proc/sys/vm/overcommit_memory in proc(5). In kernels before
>   2.6, this flag had effect only for private writable mappings."
> 
> Note that the "guarantee" part is wrong with memory overcommit in Linux.
> 
> Also, in Linux hugetlbfs is treated differently - we configure reservation
> of huge pages from the pool, not reservation of swap space (huge pages
> cannot be swapped).
> 
> The rough behavior is [1]:
> a) !Hugetlbfs:
> 
>   1) Without MAP_NORESERVE *or* with memory overcommit under Linux
>      disabled ("/proc/sys/vm/overcommit_memory == 2"), the following
>      accounting/reservation happens:
>       For a file backed map
>        SHARED or READ-only - 0 cost (the file is the map not swap)
>        PRIVATE WRITABLE - size of mapping per instance
> 
>       For an anonymous or /dev/zero map
>        SHARED   - size of mapping
>        PRIVATE READ-only - 0 cost (but of little use)
>        PRIVATE WRITABLE - size of mapping per instance
> 
>   2) With MAP_NORESERVE, no accounting/reservation happens.
> 
> b) Hugetlbfs:
> 
>   1) Without MAP_NORESERVE, huge pages are reserved.
> 
>   2) With MAP_NORESERVE, no huge pages are reserved.
> 
> Note: With "/proc/sys/vm/overcommit_memory == 0", we were already able
> to configure it for !hugetlbfs globally; this toggle now allows
> configuring it in a more fine-grained way, rather than for the whole system.
> 
> The target use case is virtio-mem, which dynamically exposes memory
> inside a large, sparse memory area to the VM.
> 
> [1] https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu
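
The accounting difference can be observed outside of QEMU with a small
standalone program; this only illustrates the Linux mmap() semantics quoted
above and is not QEMU code. With the default "/proc/sys/vm/overcommit_memory
== 0" (or 1), the first mapping bumps Committed_AS in /proc/meminfo by
roughly its size while the MAP_NORESERVE mapping does not; with "== 2",
MAP_NORESERVE is ignored and both mappings are accounted:

    #include <stdio.h>
    #include <sys/mman.h>

    /* Read the current Committed_AS value (in kB) from /proc/meminfo. */
    static long committed_kb(void)
    {
        FILE *f = fopen("/proc/meminfo", "r");
        char line[128];
        long kb = -1;

        while (f && fgets(line, sizeof(line), f)) {
            if (sscanf(line, "Committed_AS: %ld kB", &kb) == 1) {
                break;
            }
        }
        if (f) {
            fclose(f);
        }
        return kb;
    }

    int main(void)
    {
        const size_t size = 1ULL << 30;   /* 1 GiB, never written to */
        long before;

        before = committed_kb();
        mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        printf("accounted mapping:     +%ld kB\n", committed_kb() - before);

        before = committed_kb();
        mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        printf("MAP_NORESERVE mapping: +%ld kB\n", committed_kb() - before);
        return 0;
    }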




* Re: [PATCH v4 09/14] util/mmap-alloc: Pass flags instead of separate bools to qemu_ram_mmap()
  2021-03-23 20:49   ` Peter Xu
@ 2021-03-25  9:40     ` David Hildenbrand
  0 siblings, 0 replies; 25+ messages in thread
From: David Hildenbrand @ 2021-03-25  9:40 UTC (permalink / raw)
  To: Peter Xu
  Cc: Marcel Apfelbaum, Igor Kotrasinski, Eduardo Habkost,
	Michael S. Tsirkin, Murilo Opsfelder Araujo, Richard Henderson,
	Dr. David Alan Gilbert, qemu-devel, Greg Kurz, Stefan Hajnoczi,
	Igor Mammedov, Paolo Bonzini, Philippe Mathieu-Daudé,
	Markus Armbruster

On 23.03.21 21:49, Peter Xu wrote:
> On Fri, Mar 19, 2021 at 11:12:25AM +0100, David Hildenbrand wrote:
>> Let's pass flags instead of bools to prepare for passing other flags and
>> update the documentation of qemu_ram_mmap(). Introduce new QEMU_MAP_
>> flags that abstract the mmap() PROT_ and MAP_ flag handling and simplify
>> it.
>>
>> We expose only flags that are currently supported by qemu_ram_mmap().
>> Maybe we'll also see a qemu_mmap() in the future that can implement these
>> flags.
>>
>> Note: We don't use MAP_ flags as some flags (e.g., MAP_SYNC) are only
>> defined for some systems and we want to always be able to identify
>> these flags reliably inside qemu_ram_mmap() -- for example, to properly
>> warn when some future flags are not available or effective on a system.
>> Also, this way we can simplify PROT_ handling as well.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>   include/qemu/mmap-alloc.h | 16 +++++++++-------
>>   include/qemu/osdep.h      | 18 ++++++++++++++++++
>>   softmmu/physmem.c         |  8 +++++---
>>   util/mmap-alloc.c         | 15 ++++++++-------
>>   util/oslib-posix.c        |  3 ++-
>>   5 files changed, 42 insertions(+), 18 deletions(-)
>>
>> diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
>> index 456ff87df1..a60a2085b3 100644
>> --- a/include/qemu/mmap-alloc.h
>> +++ b/include/qemu/mmap-alloc.h
>> @@ -7,18 +7,22 @@ size_t qemu_fd_getpagesize(int fd);
>>   size_t qemu_mempath_getpagesize(const char *mem_path);
>>   
>>   /**
>> - * qemu_ram_mmap: mmap the specified file or device.
>> + * qemu_ram_mmap: mmap anonymous memory, the specified file or device.
>> + *
>> + * mmap() abstraction to map guest RAM, simplifying flag handling, taking
>> + * care of alignment requirements and installing guard pages.
>>    *
>>    * Parameters:
>>    *  @fd: the file or the device to mmap
>>    *  @size: the number of bytes to be mmaped
>>    *  @align: if not zero, specify the alignment of the starting mapping address;
>>    *          otherwise, the alignment in use will be determined by QEMU.
>> - *  @readonly: true for a read-only mapping, false for read/write.
>> - *  @shared: map has RAM_SHARED flag.
>> - *  @is_pmem: map has RAM_PMEM flag.
>> + *  @qemu_map_flags: QEMU_MAP_* flags
>>    *  @map_offset: map starts at offset of map_offset from the start of fd
>>    *
>> + * Internally, MAP PRIVATE, MAP_ANONYMOUS and MAP_SHARED_VALIDATE are set
>                       ^
>                       |
>                       +---- underscore
> 

Nice catch, thanks :)


-- 
Thanks,

David / dhildenb




* Re: [PATCH v4 14/14] hmp: Print "reserve" property of memory backends with "info memdev"
  2021-03-19 10:12 ` [PATCH v4 14/14] hmp: Print "reserve" property of memory backends with "info memdev" David Hildenbrand
@ 2021-03-25 19:00   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 25+ messages in thread
From: Dr. David Alan Gilbert @ 2021-03-25 19:00 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Marcel Apfelbaum, Igor Kotrasinski, Eduardo Habkost,
	Michael S. Tsirkin, Murilo Opsfelder Araujo, Richard Henderson,
	qemu-devel, Peter Xu, Greg Kurz, Stefan Hajnoczi, Igor Mammedov,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Markus Armbruster

* David Hildenbrand (david@redhat.com) wrote:
> Let's print the new property.
> 
> Cc: Markus Armbruster <armbru@redhat.com>
> Cc: Eric Blake <eblake@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  hw/core/machine-hmp-cmds.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/hw/core/machine-hmp-cmds.c b/hw/core/machine-hmp-cmds.c
> index 58248cffa3..fb717f69b9 100644
> --- a/hw/core/machine-hmp-cmds.c
> +++ b/hw/core/machine-hmp-cmds.c
> @@ -110,6 +110,8 @@ void hmp_info_memdev(Monitor *mon, const QDict *qdict)
>                         m->value->dump ? "true" : "false");
>          monitor_printf(mon, "  prealloc: %s\n",
>                         m->value->prealloc ? "true" : "false");
> +        monitor_printf(mon, "  reserve: %s\n",
> +                       m->value->reserve ? "true" : "false");
>          monitor_printf(mon, "  policy: %s\n",
>                         HostMemPolicy_str(m->value->policy));
>          visit_complete(v, &str);
> -- 
> 2.29.2
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
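
With this applied, the new line shows up between "prealloc" and "policy" in
the monitor output, along these lines (backend name and values are
illustrative):

    (qemu) info memdev
    memory backend: mem0
      ...
      prealloc: false
      reserve: false
      policy: default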




end of thread, other threads: [~2021-03-25 19:03 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
2021-03-19 10:12 [PATCH v4 00/14] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
2021-03-19 10:12 ` [PATCH v4 01/14] softmmu/physmem: Mark shared anonymous memory RAM_SHARED David Hildenbrand
2021-03-19 10:12 ` [PATCH v4 02/14] softmmu/physmem: Fix ram_block_discard_range() to handle shared anonymous memory David Hildenbrand
2021-03-19 10:12 ` [PATCH v4 03/14] softmmu/physmem: Fix qemu_ram_remap() " David Hildenbrand
2021-03-23 20:40   ` Peter Xu
2021-03-19 10:12 ` [PATCH v4 04/14] util/mmap-alloc: Factor out calculation of the pagesize for the guard page David Hildenbrand
2021-03-19 10:12 ` [PATCH v4 05/14] util/mmap-alloc: Factor out reserving of a memory region to mmap_reserve() David Hildenbrand
2021-03-19 10:12 ` [PATCH v4 06/14] util/mmap-alloc: Factor out activating of memory to mmap_activate() David Hildenbrand
2021-03-19 10:12 ` [PATCH v4 07/14] softmmu/memory: Pass ram_flags to qemu_ram_alloc_from_fd() David Hildenbrand
2021-03-19 10:12 ` [PATCH v4 08/14] softmmu/memory: Pass ram_flags to memory_region_init_ram_shared_nomigrate() David Hildenbrand
2021-03-19 10:12 ` [PATCH v4 09/14] util/mmap-alloc: Pass flags instead of separate bools to qemu_ram_mmap() David Hildenbrand
2021-03-23 20:49   ` Peter Xu
2021-03-25  9:40     ` David Hildenbrand
2021-03-19 10:12 ` [PATCH v4 10/14] memory: Introduce RAM_NORESERVE and wire it up in qemu_ram_mmap() David Hildenbrand
2021-03-23 20:51   ` Peter Xu
2021-03-19 10:12 ` [PATCH v4 11/14] util/mmap-alloc: Support RAM_NORESERVE via MAP_NORESERVE under Linux David Hildenbrand
2021-03-23 20:56   ` Peter Xu
2021-03-19 10:12 ` [PATCH v4 12/14] hostmem: Wire up RAM_NORESERVE via "reserve" property David Hildenbrand
2021-03-19 10:12 ` [PATCH v4 13/14] qmp: Include "reserve" property of memory backends David Hildenbrand
2021-03-19 15:40   ` Markus Armbruster
2021-03-19 15:49     ` David Hildenbrand
2021-03-19 16:32       ` Markus Armbruster
2021-03-19 16:40         ` David Hildenbrand
2021-03-19 10:12 ` [PATCH v4 14/14] hmp: Print "reserve" property of memory backends with "info memdev" David Hildenbrand
2021-03-25 19:00   ` Dr. David Alan Gilbert
