qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 00/15] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property
@ 2021-04-21 12:26 David Hildenbrand
  2021-04-21 12:26 ` [PATCH v6 01/15] util/mmap-alloc: Factor out calculation of the pagesize for the guard page David Hildenbrand
                   ` (15 more replies)
  0 siblings, 16 replies; 47+ messages in thread
From: David Hildenbrand @ 2021-04-21 12:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Murilo Opsfelder Araujo, Eduardo Habkost,
	Michael S. Tsirkin, David Hildenbrand, Richard Henderson,
	Dr. David Alan Gilbert, Peter Xu, Greg Kurz, Stefan Hajnoczi,
	Igor Mammedov, Paolo Bonzini, Philippe Mathieu-Daudé

Based-on: 20210406080126.24010-1-david@redhat.com

Some cleanups previously sent in other context (resizeable allocations),
followed by RAM_NORESERVE, implementing it under Linux using MAP_NORESERVE,
and letting users configure it for memory backens using the "reserve"
property (default: true).

MAP_NORESERVE under Linux has in the context of QEMU an effect on
1) Private/shared anonymous memory
-> memory-backend-ram,id=mem0,size=10G
2) Private fd-based mappings
-> memory-backend-file,id=mem0,size=10G,mem-path=/dev/shm/0
-> memory-backend-memfd,id=mem0,size=10G
3) Private/shared hugetlb mappings
-> memory-backend-memfd,id=mem0,size=10G,hugetlb=on,hugetlbsize=2M

With MAP_NORESERVE/"reserve=off", we won't be reserving swap space (1/2) or
huge pages (3) for the whole memory region.

The target use case is virtio-mem, which dynamically exposes memory
inside a large, sparse memory area to the VM. MAP_NORESERVE tells the OS
"this mapping might be very sparse". This essentially allows
avoiding having to set "/proc/sys/vm/overcommit_memory == 1") when using
virtio-mem and also supporting hugetlbfs in the future.

v5 -> v6:
- "softmmu/memory: Pass ram_flags to memory_region_init ..."
-- Split up into two patches
---> "softmmu/memory: Pass ram_flags to memory_region.."
---> "softmmu/memory: Pass ram_flags to qemu_ram_alloc() ..."
-- Also set RAM_PREALLOC from qemu_ram_alloc_from_ptr()
- Collected acks/rbs

v4 -> v5:
- Sent out shared anonymous RAM fixes separately
- Rebased
- "hostmem: Wire up RAM_NORESERVE via "reserve" property"
-- Adjusted/simplified description of new "reserve" property
-- Properly add it to qapi/qom.json
- "qmp: Clarify memory backend properties returned via query-memdev"
-- Added
- "qmp: Include "share" property of memory backends"
-- Added
- "hmp: Print "share" property of memory backends with "info memdev""
- Added
- "qmp: Include "reserve" property of memory backends"
-- Adjust description of new "reserve" property

v3 -> v4:
- Minor comment/description updates
- "softmmu/physmem: Fix ram_block_discard_range() to handle shared ..."
-- Extended description
- "util/mmap-alloc: Pass flags instead of separate bools to ..."
-- Move flags to include/qemu/osdep.h and rename to "QEMU_MAP_*"
- "memory: Introduce RAM_NORESERVE and wire it up in qemu_ram_mmap()"
-- Adjust to new flags. Handle errors in mmap_activate() for now.
- "util/mmap-alloc: Support RAM_NORESERVE via MAP_NORESERVE under Linux"
-- Restrict support to Linux only for now
- "qmp: Include "reserve" property of memory backends"
-- Added
- "hmp: Print "reserve" property of memory backends with ..."
-- Added

v2 -> v3:
- Renamed "softmmu/physmem: Drop "shared" parameter from ram_block_add()"
  to "softmmu/physmem: Mark shared anonymous memory RAM_SHARED" and
  adjusted the description
- Added "softmmu/physmem: Fix ram_block_discard_range() to handle shared
  anonymous memory"
- Added "softmmu/physmem: Fix qemu_ram_remap() to handle shared anonymous
  memory"
- Added "util/mmap-alloc: Pass flags instead of separate bools to
  qemu_ram_mmap()"
- "util/mmap-alloc: Support RAM_NORESERVE via MAP_NORESERVE"
-- Further tweak code comments
-- Handle shared anonymous memory

v1 -> v2:
- Rebased to upstream and phs_mem_alloc simplifications
-- Upsteam added the "map_offset" parameter to many RAM allocation
   interfaces.
- "softmmu/physmem: Drop "shared" parameter from ram_block_add()"
-- Use local variable "shared"
- "memory: introduce RAM_NORESERVE and wire it up in qemu_ram_mmap()"
-- Simplify due to phs_mem_alloc changes
- "util/mmap-alloc: Support RAM_NORESERVE via MAP_NORESERVE"
-- Add a whole bunch of comments.
-- Exclude shared anonymous memory that QEMU doesn't use
-- Special-case readonly mappings

Cc: Peter Xu <peterx@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: "Philippe Mathieu-Daudé" <philmd@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Cc: Greg Kurz <groug@kaod.org>
Cc: Liam Merwick <liam.merwick@oracle.com>
Cc: Marcel Apfelbaum <mapfelba@redhat.com>

David Hildenbrand (15):
  util/mmap-alloc: Factor out calculation of the pagesize for the guard
    page
  util/mmap-alloc: Factor out reserving of a memory region to
    mmap_reserve()
  util/mmap-alloc: Factor out activating of memory to mmap_activate()
  softmmu/memory: Pass ram_flags to qemu_ram_alloc_from_fd()
  softmmu/memory: Pass ram_flags to
    memory_region_init_ram_shared_nomigrate()
  softmmu/memory: Pass ram_flags to qemu_ram_alloc() and
    qemu_ram_alloc_internal()
  util/mmap-alloc: Pass flags instead of separate bools to
    qemu_ram_mmap()
  memory: Introduce RAM_NORESERVE and wire it up in qemu_ram_mmap()
  util/mmap-alloc: Support RAM_NORESERVE via MAP_NORESERVE under Linux
  hostmem: Wire up RAM_NORESERVE via "reserve" property
  qmp: Clarify memory backend properties returned via query-memdev
  qmp: Include "share" property of memory backends
  hmp: Print "share" property of memory backends with "info memdev"
  qmp: Include "reserve" property of memory backends
  hmp: Print "reserve" property of memory backends with "info memdev"

 backends/hostmem-file.c                       |  11 +-
 backends/hostmem-memfd.c                      |   8 +-
 backends/hostmem-ram.c                        |   7 +-
 backends/hostmem.c                            |  32 +++
 hw/core/machine-hmp-cmds.c                    |   4 +
 hw/core/machine-qmp-cmds.c                    |   2 +
 hw/m68k/next-cube.c                           |   4 +-
 hw/misc/ivshmem.c                             |   5 +-
 include/exec/cpu-common.h                     |   1 +
 include/exec/memory.h                         |  42 ++--
 include/exec/ram_addr.h                       |   9 +-
 include/qemu/mmap-alloc.h                     |  16 +-
 include/qemu/osdep.h                          |  30 ++-
 include/sysemu/hostmem.h                      |   2 +-
 migration/ram.c                               |   3 +-
 qapi/machine.json                             |  13 +-
 qapi/qom.json                                 |   4 +
 .../memory-region-housekeeping.cocci          |   8 +-
 softmmu/memory.c                              |  27 ++-
 softmmu/physmem.c                             |  51 +++--
 util/mmap-alloc.c                             | 212 +++++++++++++-----
 util/oslib-posix.c                            |   7 +-
 util/oslib-win32.c                            |  13 +-
 23 files changed, 353 insertions(+), 158 deletions(-)

-- 
2.30.2



^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v6 01/15] util/mmap-alloc: Factor out calculation of the pagesize for the guard page
  2021-04-21 12:26 [PATCH v6 00/15] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
@ 2021-04-21 12:26 ` David Hildenbrand
  2021-04-21 12:26 ` [PATCH v6 02/15] util/mmap-alloc: Factor out reserving of a memory region to mmap_reserve() David Hildenbrand
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 47+ messages in thread
From: David Hildenbrand @ 2021-04-21 12:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Murilo Opsfelder Araujo, Igor Kotrasinski,
	Eduardo Habkost, Michael S. Tsirkin, David Hildenbrand,
	Richard Henderson, Dr. David Alan Gilbert, Peter Xu, Greg Kurz,
	Stefan Hajnoczi, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé

Let's factor out calculating the size of the guard page and rename the
variable to make it clearer that this pagesize only applies to the
guard page.

Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Cc: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/mmap-alloc.c | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index e6fa8b598b..24854064b4 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -82,6 +82,16 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
     return qemu_real_host_page_size;
 }
 
+static inline size_t mmap_guard_pagesize(int fd)
+{
+#if defined(__powerpc64__) && defined(__linux__)
+    /* Mappings in the same segment must share the same page size */
+    return qemu_fd_getpagesize(fd);
+#else
+    return qemu_real_host_page_size;
+#endif
+}
+
 void *qemu_ram_mmap(int fd,
                     size_t size,
                     size_t align,
@@ -90,12 +100,12 @@ void *qemu_ram_mmap(int fd,
                     bool is_pmem,
                     off_t map_offset)
 {
+    const size_t guard_pagesize = mmap_guard_pagesize(fd);
     int prot;
     int flags;
     int map_sync_flags = 0;
     int guardfd;
     size_t offset;
-    size_t pagesize;
     size_t total;
     void *guardptr;
     void *ptr;
@@ -116,8 +126,7 @@ void *qemu_ram_mmap(int fd,
      * anonymous memory is OK.
      */
     flags = MAP_PRIVATE;
-    pagesize = qemu_fd_getpagesize(fd);
-    if (fd == -1 || pagesize == qemu_real_host_page_size) {
+    if (fd == -1 || guard_pagesize == qemu_real_host_page_size) {
         guardfd = -1;
         flags |= MAP_ANONYMOUS;
     } else {
@@ -126,7 +135,6 @@ void *qemu_ram_mmap(int fd,
     }
 #else
     guardfd = -1;
-    pagesize = qemu_real_host_page_size;
     flags = MAP_PRIVATE | MAP_ANONYMOUS;
 #endif
 
@@ -138,7 +146,7 @@ void *qemu_ram_mmap(int fd,
 
     assert(is_power_of_2(align));
     /* Always align to host page size */
-    assert(align >= pagesize);
+    assert(align >= guard_pagesize);
 
     flags = MAP_FIXED;
     flags |= fd == -1 ? MAP_ANONYMOUS : 0;
@@ -193,8 +201,8 @@ void *qemu_ram_mmap(int fd,
      * a guard page guarding against potential buffer overflows.
      */
     total -= offset;
-    if (total > size + pagesize) {
-        munmap(ptr + size + pagesize, total - size - pagesize);
+    if (total > size + guard_pagesize) {
+        munmap(ptr + size + guard_pagesize, total - size - guard_pagesize);
     }
 
     return ptr;
@@ -202,15 +210,8 @@ void *qemu_ram_mmap(int fd,
 
 void qemu_ram_munmap(int fd, void *ptr, size_t size)
 {
-    size_t pagesize;
-
     if (ptr) {
         /* Unmap both the RAM block and the guard page */
-#if defined(__powerpc64__) && defined(__linux__)
-        pagesize = qemu_fd_getpagesize(fd);
-#else
-        pagesize = qemu_real_host_page_size;
-#endif
-        munmap(ptr, size + pagesize);
+        munmap(ptr, size + mmap_guard_pagesize(fd));
     }
 }
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v6 02/15] util/mmap-alloc: Factor out reserving of a memory region to mmap_reserve()
  2021-04-21 12:26 [PATCH v6 00/15] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
  2021-04-21 12:26 ` [PATCH v6 01/15] util/mmap-alloc: Factor out calculation of the pagesize for the guard page David Hildenbrand
@ 2021-04-21 12:26 ` David Hildenbrand
  2021-04-21 12:26 ` [PATCH v6 03/15] util/mmap-alloc: Factor out activating of memory to mmap_activate() David Hildenbrand
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 47+ messages in thread
From: David Hildenbrand @ 2021-04-21 12:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Murilo Opsfelder Araujo, Igor Kotrasinski,
	Eduardo Habkost, Michael S. Tsirkin, David Hildenbrand,
	Richard Henderson, Dr. David Alan Gilbert, Peter Xu, Greg Kurz,
	Stefan Hajnoczi, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé

We want to reserve a memory region without actually populating memory.
Let's factor that out.

Reviewed-by: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/mmap-alloc.c | 58 +++++++++++++++++++++++++++--------------------
 1 file changed, 33 insertions(+), 25 deletions(-)

diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 24854064b4..223d66219c 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -82,6 +82,38 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
     return qemu_real_host_page_size;
 }
 
+/*
+ * Reserve a new memory region of the requested size to be used for mapping
+ * from the given fd (if any).
+ */
+static void *mmap_reserve(size_t size, int fd)
+{
+    int flags = MAP_PRIVATE;
+
+#if defined(__powerpc64__) && defined(__linux__)
+    /*
+     * On ppc64 mappings in the same segment (aka slice) must share the same
+     * page size. Since we will be re-allocating part of this segment
+     * from the supplied fd, we should make sure to use the same page size, to
+     * this end we mmap the supplied fd.  In this case, set MAP_NORESERVE to
+     * avoid allocating backing store memory.
+     * We do this unless we are using the system page size, in which case
+     * anonymous memory is OK.
+     */
+    if (fd == -1 || qemu_fd_getpagesize(fd) == qemu_real_host_page_size) {
+        fd = -1;
+        flags |= MAP_ANONYMOUS;
+    } else {
+        flags |= MAP_NORESERVE;
+    }
+#else
+    fd = -1;
+    flags |= MAP_ANONYMOUS;
+#endif
+
+    return mmap(0, size, PROT_NONE, flags, fd, 0);
+}
+
 static inline size_t mmap_guard_pagesize(int fd)
 {
 #if defined(__powerpc64__) && defined(__linux__)
@@ -104,7 +136,6 @@ void *qemu_ram_mmap(int fd,
     int prot;
     int flags;
     int map_sync_flags = 0;
-    int guardfd;
     size_t offset;
     size_t total;
     void *guardptr;
@@ -116,30 +147,7 @@ void *qemu_ram_mmap(int fd,
      */
     total = size + align;
 
-#if defined(__powerpc64__) && defined(__linux__)
-    /* On ppc64 mappings in the same segment (aka slice) must share the same
-     * page size. Since we will be re-allocating part of this segment
-     * from the supplied fd, we should make sure to use the same page size, to
-     * this end we mmap the supplied fd.  In this case, set MAP_NORESERVE to
-     * avoid allocating backing store memory.
-     * We do this unless we are using the system page size, in which case
-     * anonymous memory is OK.
-     */
-    flags = MAP_PRIVATE;
-    if (fd == -1 || guard_pagesize == qemu_real_host_page_size) {
-        guardfd = -1;
-        flags |= MAP_ANONYMOUS;
-    } else {
-        guardfd = fd;
-        flags |= MAP_NORESERVE;
-    }
-#else
-    guardfd = -1;
-    flags = MAP_PRIVATE | MAP_ANONYMOUS;
-#endif
-
-    guardptr = mmap(0, total, PROT_NONE, flags, guardfd, 0);
-
+    guardptr = mmap_reserve(total, fd);
     if (guardptr == MAP_FAILED) {
         return MAP_FAILED;
     }
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v6 03/15] util/mmap-alloc: Factor out activating of memory to mmap_activate()
  2021-04-21 12:26 [PATCH v6 00/15] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
  2021-04-21 12:26 ` [PATCH v6 01/15] util/mmap-alloc: Factor out calculation of the pagesize for the guard page David Hildenbrand
  2021-04-21 12:26 ` [PATCH v6 02/15] util/mmap-alloc: Factor out reserving of a memory region to mmap_reserve() David Hildenbrand
@ 2021-04-21 12:26 ` David Hildenbrand
  2021-04-21 12:26 ` [PATCH v6 04/15] softmmu/memory: Pass ram_flags to qemu_ram_alloc_from_fd() David Hildenbrand
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 47+ messages in thread
From: David Hildenbrand @ 2021-04-21 12:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Murilo Opsfelder Araujo, Eduardo Habkost,
	Michael S. Tsirkin, David Hildenbrand, Richard Henderson,
	Dr. David Alan Gilbert, Peter Xu, Greg Kurz, Stefan Hajnoczi,
	Igor Mammedov, Paolo Bonzini, Philippe Mathieu-Daudé

We want to activate memory within a reserved memory region, to make it
accessible. Let's factor that out.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/mmap-alloc.c | 94 +++++++++++++++++++++++++----------------------
 1 file changed, 50 insertions(+), 44 deletions(-)

diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 223d66219c..0e2bd7bc0e 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -114,6 +114,52 @@ static void *mmap_reserve(size_t size, int fd)
     return mmap(0, size, PROT_NONE, flags, fd, 0);
 }
 
+/*
+ * Activate memory in a reserved region from the given fd (if any), to make
+ * it accessible.
+ */
+static void *mmap_activate(void *ptr, size_t size, int fd, bool readonly,
+                           bool shared, bool is_pmem, off_t map_offset)
+{
+    const int prot = PROT_READ | (readonly ? 0 : PROT_WRITE);
+    int map_sync_flags = 0;
+    int flags = MAP_FIXED;
+    void *activated_ptr;
+
+    flags |= fd == -1 ? MAP_ANONYMOUS : 0;
+    flags |= shared ? MAP_SHARED : MAP_PRIVATE;
+    if (shared && is_pmem) {
+        map_sync_flags = MAP_SYNC | MAP_SHARED_VALIDATE;
+    }
+
+    activated_ptr = mmap(ptr, size, prot, flags | map_sync_flags, fd,
+                         map_offset);
+    if (activated_ptr == MAP_FAILED && map_sync_flags) {
+        if (errno == ENOTSUP) {
+            char *proc_link = g_strdup_printf("/proc/self/fd/%d", fd);
+            char *file_name = g_malloc0(PATH_MAX);
+            int len = readlink(proc_link, file_name, PATH_MAX - 1);
+
+            if (len < 0) {
+                len = 0;
+            }
+            file_name[len] = '\0';
+            fprintf(stderr, "Warning: requesting persistence across crashes "
+                    "for backend file %s failed. Proceeding without "
+                    "persistence, data might become corrupted in case of host "
+                    "crash.\n", file_name);
+            g_free(proc_link);
+            g_free(file_name);
+        }
+        /*
+         * If mmap failed with MAP_SHARED_VALIDATE | MAP_SYNC, we will try
+         * again without these flags to handle backwards compatibility.
+         */
+        activated_ptr = mmap(ptr, size, prot, flags, fd, map_offset);
+    }
+    return activated_ptr;
+}
+
 static inline size_t mmap_guard_pagesize(int fd)
 {
 #if defined(__powerpc64__) && defined(__linux__)
@@ -133,13 +179,8 @@ void *qemu_ram_mmap(int fd,
                     off_t map_offset)
 {
     const size_t guard_pagesize = mmap_guard_pagesize(fd);
-    int prot;
-    int flags;
-    int map_sync_flags = 0;
-    size_t offset;
-    size_t total;
-    void *guardptr;
-    void *ptr;
+    size_t offset, total;
+    void *ptr, *guardptr;
 
     /*
      * Note: this always allocates at least one extra page of virtual address
@@ -156,45 +197,10 @@ void *qemu_ram_mmap(int fd,
     /* Always align to host page size */
     assert(align >= guard_pagesize);
 
-    flags = MAP_FIXED;
-    flags |= fd == -1 ? MAP_ANONYMOUS : 0;
-    flags |= shared ? MAP_SHARED : MAP_PRIVATE;
-    if (shared && is_pmem) {
-        map_sync_flags = MAP_SYNC | MAP_SHARED_VALIDATE;
-    }
-
     offset = QEMU_ALIGN_UP((uintptr_t)guardptr, align) - (uintptr_t)guardptr;
 
-    prot = PROT_READ | (readonly ? 0 : PROT_WRITE);
-
-    ptr = mmap(guardptr + offset, size, prot,
-               flags | map_sync_flags, fd, map_offset);
-
-    if (ptr == MAP_FAILED && map_sync_flags) {
-        if (errno == ENOTSUP) {
-            char *proc_link, *file_name;
-            int len;
-            proc_link = g_strdup_printf("/proc/self/fd/%d", fd);
-            file_name = g_malloc0(PATH_MAX);
-            len = readlink(proc_link, file_name, PATH_MAX - 1);
-            if (len < 0) {
-                len = 0;
-            }
-            file_name[len] = '\0';
-            fprintf(stderr, "Warning: requesting persistence across crashes "
-                    "for backend file %s failed. Proceeding without "
-                    "persistence, data might become corrupted in case of host "
-                    "crash.\n", file_name);
-            g_free(proc_link);
-            g_free(file_name);
-        }
-        /*
-         * if map failed with MAP_SHARED_VALIDATE | MAP_SYNC,
-         * we will remove these flags to handle compatibility.
-         */
-        ptr = mmap(guardptr + offset, size, prot, flags, fd, map_offset);
-    }
-
+    ptr = mmap_activate(guardptr + offset, size, fd, readonly, shared, is_pmem,
+                        map_offset);
     if (ptr == MAP_FAILED) {
         munmap(guardptr, total);
         return MAP_FAILED;
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v6 04/15] softmmu/memory: Pass ram_flags to qemu_ram_alloc_from_fd()
  2021-04-21 12:26 [PATCH v6 00/15] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
                   ` (2 preceding siblings ...)
  2021-04-21 12:26 ` [PATCH v6 03/15] util/mmap-alloc: Factor out activating of memory to mmap_activate() David Hildenbrand
@ 2021-04-21 12:26 ` David Hildenbrand
  2021-04-21 12:26 ` [PATCH v6 05/15] softmmu/memory: Pass ram_flags to memory_region_init_ram_shared_nomigrate() David Hildenbrand
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 47+ messages in thread
From: David Hildenbrand @ 2021-04-21 12:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Murilo Opsfelder Araujo, Eduardo Habkost,
	Michael S. Tsirkin, David Hildenbrand, Richard Henderson,
	Dr. David Alan Gilbert, Peter Xu, Greg Kurz, Stefan Hajnoczi,
	Igor Mammedov, Paolo Bonzini, Philippe Mathieu-Daudé

Let's pass in ram flags just like we do with qemu_ram_alloc_from_file(),
to clean up and prepare for more flags.

Simplify the documentation of passed ram flags: Looking at our
documentation of RAM_SHARED and RAM_PMEM is sufficient, no need to be
repetitive.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 backends/hostmem-memfd.c | 7 ++++---
 hw/misc/ivshmem.c        | 5 ++---
 include/exec/memory.h    | 9 +++------
 include/exec/ram_addr.h  | 6 +-----
 softmmu/memory.c         | 7 +++----
 5 files changed, 13 insertions(+), 21 deletions(-)

diff --git a/backends/hostmem-memfd.c b/backends/hostmem-memfd.c
index 69b0ae30bb..93b5d1a4cf 100644
--- a/backends/hostmem-memfd.c
+++ b/backends/hostmem-memfd.c
@@ -36,6 +36,7 @@ static void
 memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
 {
     HostMemoryBackendMemfd *m = MEMORY_BACKEND_MEMFD(backend);
+    uint32_t ram_flags;
     char *name;
     int fd;
 
@@ -53,9 +54,9 @@ memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
     }
 
     name = host_memory_backend_get_name(backend);
-    memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend),
-                                   name, backend->size,
-                                   backend->share, fd, 0, errp);
+    ram_flags = backend->share ? RAM_SHARED : 0;
+    memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend), name,
+                                   backend->size, ram_flags, fd, 0, errp);
     g_free(name);
 }
 
diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index a1fa4878be..1ba4a98377 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -493,9 +493,8 @@ static void process_msg_shmem(IVShmemState *s, int fd, Error **errp)
     size = buf.st_size;
 
     /* mmap the region and map into the BAR2 */
-    memory_region_init_ram_from_fd(&s->server_bar2, OBJECT(s),
-                                   "ivshmem.bar2", size, true, fd, 0,
-                                   &local_err);
+    memory_region_init_ram_from_fd(&s->server_bar2, OBJECT(s), "ivshmem.bar2",
+                                   size, RAM_SHARED, fd, 0, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         return;
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 5728a681b2..8ad280e532 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -991,10 +991,7 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
  * @size: size of the region.
  * @align: alignment of the region base address; if 0, the default alignment
  *         (getpagesize()) will be used.
- * @ram_flags: Memory region features:
- *             - RAM_SHARED: memory must be mmaped with the MAP_SHARED flag
- *             - RAM_PMEM: the memory is persistent memory
- *             Other bits are ignored now.
+ * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM.
  * @path: the path in which to allocate the RAM.
  * @readonly: true to open @path for reading, false for read/write.
  * @errp: pointer to Error*, to store an error if it happens.
@@ -1020,7 +1017,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
  * @owner: the object that tracks the region's reference count
  * @name: the name of the region.
  * @size: size of the region.
- * @share: %true if memory must be mmaped with the MAP_SHARED flag
+ * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM.
  * @fd: the fd to mmap.
  * @offset: offset within the file referenced by fd
  * @errp: pointer to Error*, to store an error if it happens.
@@ -1032,7 +1029,7 @@ void memory_region_init_ram_from_fd(MemoryRegion *mr,
                                     Object *owner,
                                     const char *name,
                                     uint64_t size,
-                                    bool share,
+                                    uint32_t ram_flags,
                                     int fd,
                                     ram_addr_t offset,
                                     Error **errp);
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 3cb9791df3..a7e3378340 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -104,11 +104,7 @@ long qemu_maxrampagesize(void);
  * Parameters:
  *  @size: the size in bytes of the ram block
  *  @mr: the memory region where the ram block is
- *  @ram_flags: specify the properties of the ram block, which can be one
- *              or bit-or of following values
- *              - RAM_SHARED: mmap the backing file or device with MAP_SHARED
- *              - RAM_PMEM: the backend @mem_path or @fd is persistent memory
- *              Other bits are ignored.
+ *  @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM.
  *  @mem_path or @fd: specify the backing file or device
  *  @readonly: true to open @path for reading, false for read/write.
  *  @errp: pointer to Error*, to store an error if it happens
diff --git a/softmmu/memory.c b/softmmu/memory.c
index d4493ef9e4..8c3acd839e 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1611,7 +1611,7 @@ void memory_region_init_ram_from_fd(MemoryRegion *mr,
                                     Object *owner,
                                     const char *name,
                                     uint64_t size,
-                                    bool share,
+                                    uint32_t ram_flags,
                                     int fd,
                                     ram_addr_t offset,
                                     Error **errp)
@@ -1621,9 +1621,8 @@ void memory_region_init_ram_from_fd(MemoryRegion *mr,
     mr->ram = true;
     mr->terminates = true;
     mr->destructor = memory_region_destructor_ram;
-    mr->ram_block = qemu_ram_alloc_from_fd(size, mr,
-                                           share ? RAM_SHARED : 0,
-                                           fd, offset, false, &err);
+    mr->ram_block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, offset,
+                                           false, &err);
     if (err) {
         mr->size = int128_zero();
         object_unparent(OBJECT(mr));
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v6 05/15] softmmu/memory: Pass ram_flags to memory_region_init_ram_shared_nomigrate()
  2021-04-21 12:26 [PATCH v6 00/15] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
                   ` (3 preceding siblings ...)
  2021-04-21 12:26 ` [PATCH v6 04/15] softmmu/memory: Pass ram_flags to qemu_ram_alloc_from_fd() David Hildenbrand
@ 2021-04-21 12:26 ` David Hildenbrand
  2021-04-21 12:33   ` Philippe Mathieu-Daudé
  2021-04-21 12:26 ` [PATCH v6 06/15] softmmu/memory: Pass ram_flags to qemu_ram_alloc() and qemu_ram_alloc_internal() David Hildenbrand
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 47+ messages in thread
From: David Hildenbrand @ 2021-04-21 12:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Murilo Opsfelder Araujo, Eduardo Habkost,
	Michael S. Tsirkin, David Hildenbrand, Richard Henderson,
	Dr. David Alan Gilbert, Peter Xu, Greg Kurz, Stefan Hajnoczi,
	Igor Mammedov, Paolo Bonzini, Philippe Mathieu-Daudé

Let's forward ram_flags instead, renaming
memory_region_init_ram_shared_nomigrate() into
memory_region_init_ram_flags_nomigrate().

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 backends/hostmem-ram.c                        |  6 +++--
 hw/m68k/next-cube.c                           |  4 ++--
 include/exec/memory.h                         | 24 +++++++++----------
 .../memory-region-housekeeping.cocci          |  8 +++----
 softmmu/memory.c                              | 18 +++++++-------
 5 files changed, 31 insertions(+), 29 deletions(-)

diff --git a/backends/hostmem-ram.c b/backends/hostmem-ram.c
index 5cc53e76c9..741e701062 100644
--- a/backends/hostmem-ram.c
+++ b/backends/hostmem-ram.c
@@ -19,6 +19,7 @@
 static void
 ram_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
 {
+    uint32_t ram_flags;
     char *name;
 
     if (!backend->size) {
@@ -27,8 +28,9 @@ ram_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
     }
 
     name = host_memory_backend_get_name(backend);
-    memory_region_init_ram_shared_nomigrate(&backend->mr, OBJECT(backend), name,
-                           backend->size, backend->share, errp);
+    ram_flags = backend->share ? RAM_SHARED : 0;
+    memory_region_init_ram_flags_nomigrate(&backend->mr, OBJECT(backend), name,
+                                           backend->size, ram_flags, errp);
     g_free(name);
 }
 
diff --git a/hw/m68k/next-cube.c b/hw/m68k/next-cube.c
index 92b45d760f..59ccae0d5e 100644
--- a/hw/m68k/next-cube.c
+++ b/hw/m68k/next-cube.c
@@ -986,8 +986,8 @@ static void next_cube_init(MachineState *machine)
     sysbus_mmio_map(SYS_BUS_DEVICE(pcdev), 1, 0x02100000);
 
     /* BMAP memory */
-    memory_region_init_ram_shared_nomigrate(bmapm1, NULL, "next.bmapmem", 64,
-                                            true, &error_fatal);
+    memory_region_init_ram_flags_nomigrate(bmapm1, NULL, "next.bmapmem", 64,
+                                           RAM_SHARED, &error_fatal);
     memory_region_add_subregion(sysmem, 0x020c0000, bmapm1);
     /* The Rev_2.5_v66.bin firmware accesses it at 0x820c0020, too */
     memory_region_init_alias(bmapm2, NULL, "next.bmapmem2", bmapm1, 0x0, 64);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 8ad280e532..10179c6695 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -928,27 +928,27 @@ void memory_region_init_ram_nomigrate(MemoryRegion *mr,
                                       Error **errp);
 
 /**
- * memory_region_init_ram_shared_nomigrate:  Initialize RAM memory region.
- *                                           Accesses into the region will
- *                                           modify memory directly.
+ * memory_region_init_ram_flags_nomigrate:  Initialize RAM memory region.
+ *                                          Accesses into the region will
+ *                                          modify memory directly.
  *
  * @mr: the #MemoryRegion to be initialized.
  * @owner: the object that tracks the region's reference count
  * @name: Region name, becomes part of RAMBlock name used in migration stream
  *        must be unique within any device
  * @size: size of the region.
- * @share: allow remapping RAM to different addresses
+ * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED.
  * @errp: pointer to Error*, to store an error if it happens.
  *
- * Note that this function is similar to memory_region_init_ram_nomigrate.
- * The only difference is part of the RAM region can be remapped.
+ * Note that this function does not do anything to cause the data in the
+ * RAM memory region to be migrated; that is the responsibility of the caller.
  */
-void memory_region_init_ram_shared_nomigrate(MemoryRegion *mr,
-                                             Object *owner,
-                                             const char *name,
-                                             uint64_t size,
-                                             bool share,
-                                             Error **errp);
+void memory_region_init_ram_flags_nomigrate(MemoryRegion *mr,
+                                            Object *owner,
+                                            const char *name,
+                                            uint64_t size,
+                                            uint32_t ram_flags,
+                                            Error **errp);
 
 /**
  * memory_region_init_resizeable_ram:  Initialize memory region with resizeable
diff --git a/scripts/coccinelle/memory-region-housekeeping.cocci b/scripts/coccinelle/memory-region-housekeeping.cocci
index c768d8140a..29651ebde9 100644
--- a/scripts/coccinelle/memory-region-housekeeping.cocci
+++ b/scripts/coccinelle/memory-region-housekeeping.cocci
@@ -127,8 +127,8 @@ static void device_fn(DeviceState *dev, ...)
 - memory_region_init_rom(E1, NULL, E2, E3, E4);
 + memory_region_init_rom(E1, obj, E2, E3, E4);
 |
-- memory_region_init_ram_shared_nomigrate(E1, NULL, E2, E3, E4, E5);
-+ memory_region_init_ram_shared_nomigrate(E1, obj, E2, E3, E4, E5);
+- memory_region_init_ram_flags_nomigrate(E1, NULL, E2, E3, E4, E5);
++ memory_region_init_ram_flags_nomigrate(E1, obj, E2, E3, E4, E5);
 )
   ...+>
 }
@@ -152,8 +152,8 @@ static void device_fn(DeviceState *dev, ...)
 - memory_region_init_rom(E1, NULL, E2, E3, E4);
 + memory_region_init_rom(E1, OBJECT(dev), E2, E3, E4);
 |
-- memory_region_init_ram_shared_nomigrate(E1, NULL, E2, E3, E4, E5);
-+ memory_region_init_ram_shared_nomigrate(E1, OBJECT(dev), E2, E3, E4, E5);
+- memory_region_init_ram_flags_nomigrate(E1, NULL, E2, E3, E4, E5);
++ memory_region_init_ram_flags_nomigrate(E1, OBJECT(dev), E2, E3, E4, E5);
 )
   ...+>
 }
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 8c3acd839e..67be0aa152 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1533,22 +1533,22 @@ void memory_region_init_ram_nomigrate(MemoryRegion *mr,
                                       uint64_t size,
                                       Error **errp)
 {
-    memory_region_init_ram_shared_nomigrate(mr, owner, name, size, false, errp);
+    memory_region_init_ram_flags_nomigrate(mr, owner, name, size, 0, errp);
 }
 
-void memory_region_init_ram_shared_nomigrate(MemoryRegion *mr,
-                                             Object *owner,
-                                             const char *name,
-                                             uint64_t size,
-                                             bool share,
-                                             Error **errp)
+void memory_region_init_ram_flags_nomigrate(MemoryRegion *mr,
+                                            Object *owner,
+                                            const char *name,
+                                            uint64_t size,
+                                            uint32_t ram_flags,
+                                            Error **errp)
 {
     Error *err = NULL;
     memory_region_init(mr, owner, name, size);
     mr->ram = true;
     mr->terminates = true;
     mr->destructor = memory_region_destructor_ram;
-    mr->ram_block = qemu_ram_alloc(size, share, mr, &err);
+    mr->ram_block = qemu_ram_alloc(size, ram_flags & RAM_SHARED, mr, &err);
     if (err) {
         mr->size = int128_zero();
         object_unparent(OBJECT(mr));
@@ -1684,7 +1684,7 @@ void memory_region_init_rom_nomigrate(MemoryRegion *mr,
                                       uint64_t size,
                                       Error **errp)
 {
-    memory_region_init_ram_shared_nomigrate(mr, owner, name, size, false, errp);
+    memory_region_init_ram_flags_nomigrate(mr, owner, name, size, 0, errp);
     mr->readonly = true;
 }
 
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v6 06/15] softmmu/memory: Pass ram_flags to qemu_ram_alloc() and qemu_ram_alloc_internal()
  2021-04-21 12:26 [PATCH v6 00/15] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
                   ` (4 preceding siblings ...)
  2021-04-21 12:26 ` [PATCH v6 05/15] softmmu/memory: Pass ram_flags to memory_region_init_ram_shared_nomigrate() David Hildenbrand
@ 2021-04-21 12:26 ` David Hildenbrand
  2021-04-21 12:34   ` Philippe Mathieu-Daudé
  2021-04-21 12:26 ` [PATCH v6 07/15] util/mmap-alloc: Pass flags instead of separate bools to qemu_ram_mmap() David Hildenbrand
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 47+ messages in thread
From: David Hildenbrand @ 2021-04-21 12:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Murilo Opsfelder Araujo, Eduardo Habkost,
	Michael S. Tsirkin, David Hildenbrand, Richard Henderson,
	Dr. David Alan Gilbert, Peter Xu, Greg Kurz, Stefan Hajnoczi,
	Igor Mammedov, Paolo Bonzini, Philippe Mathieu-Daudé

Let's pass ram_flags to qemu_ram_alloc() and qemu_ram_alloc_internal(),
preparing for passing additional flags.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/exec/ram_addr.h |  2 +-
 softmmu/memory.c        |  4 ++--
 softmmu/physmem.c       | 29 ++++++++++++-----------------
 3 files changed, 15 insertions(+), 20 deletions(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index a7e3378340..6d4513f8e2 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -122,7 +122,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
 
 RAMBlock *qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
                                   MemoryRegion *mr, Error **errp);
-RAMBlock *qemu_ram_alloc(ram_addr_t size, bool share, MemoryRegion *mr,
+RAMBlock *qemu_ram_alloc(ram_addr_t size, uint32_t ram_flags, MemoryRegion *mr,
                          Error **errp);
 RAMBlock *qemu_ram_alloc_resizeable(ram_addr_t size, ram_addr_t max_size,
                                     void (*resized)(const char*,
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 67be0aa152..580d2c06f5 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1548,7 +1548,7 @@ void memory_region_init_ram_flags_nomigrate(MemoryRegion *mr,
     mr->ram = true;
     mr->terminates = true;
     mr->destructor = memory_region_destructor_ram;
-    mr->ram_block = qemu_ram_alloc(size, ram_flags & RAM_SHARED, mr, &err);
+    mr->ram_block = qemu_ram_alloc(size, ram_flags, mr, &err);
     if (err) {
         mr->size = int128_zero();
         object_unparent(OBJECT(mr));
@@ -1704,7 +1704,7 @@ void memory_region_init_rom_device_nomigrate(MemoryRegion *mr,
     mr->terminates = true;
     mr->rom_device = true;
     mr->destructor = memory_region_destructor_ram;
-    mr->ram_block = qemu_ram_alloc(size, false,  mr, &err);
+    mr->ram_block = qemu_ram_alloc(size, 0, mr, &err);
     if (err) {
         mr->size = int128_zero();
         object_unparent(OBJECT(mr));
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index cc59f05593..11b45be271 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -2108,12 +2108,15 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
                                   void (*resized)(const char*,
                                                   uint64_t length,
                                                   void *host),
-                                  void *host, bool resizeable, bool share,
+                                  void *host, uint32_t ram_flags,
                                   MemoryRegion *mr, Error **errp)
 {
     RAMBlock *new_block;
     Error *local_err = NULL;
 
+    assert((ram_flags & ~(RAM_SHARED | RAM_RESIZEABLE | RAM_PREALLOC)) == 0);
+    assert(!host ^ (ram_flags & RAM_PREALLOC));
+
     size = HOST_PAGE_ALIGN(size);
     max_size = HOST_PAGE_ALIGN(max_size);
     new_block = g_malloc0(sizeof(*new_block));
@@ -2125,15 +2128,7 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
     new_block->fd = -1;
     new_block->page_size = qemu_real_host_page_size;
     new_block->host = host;
-    if (host) {
-        new_block->flags |= RAM_PREALLOC;
-    }
-    if (share) {
-        new_block->flags |= RAM_SHARED;
-    }
-    if (resizeable) {
-        new_block->flags |= RAM_RESIZEABLE;
-    }
+    new_block->flags = ram_flags;
     ram_block_add(new_block, &local_err);
     if (local_err) {
         g_free(new_block);
@@ -2146,15 +2141,15 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
 RAMBlock *qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
                                    MemoryRegion *mr, Error **errp)
 {
-    return qemu_ram_alloc_internal(size, size, NULL, host, false,
-                                   false, mr, errp);
+    return qemu_ram_alloc_internal(size, size, NULL, host, RAM_PREALLOC, mr,
+                                   errp);
 }
 
-RAMBlock *qemu_ram_alloc(ram_addr_t size, bool share,
+RAMBlock *qemu_ram_alloc(ram_addr_t size, uint32_t ram_flags,
                          MemoryRegion *mr, Error **errp)
 {
-    return qemu_ram_alloc_internal(size, size, NULL, NULL, false,
-                                   share, mr, errp);
+    assert((ram_flags & ~RAM_SHARED) == 0);
+    return qemu_ram_alloc_internal(size, size, NULL, NULL, ram_flags, mr, errp);
 }
 
 RAMBlock *qemu_ram_alloc_resizeable(ram_addr_t size, ram_addr_t maxsz,
@@ -2163,8 +2158,8 @@ RAMBlock *qemu_ram_alloc_resizeable(ram_addr_t size, ram_addr_t maxsz,
                                                      void *host),
                                      MemoryRegion *mr, Error **errp)
 {
-    return qemu_ram_alloc_internal(size, maxsz, resized, NULL, true,
-                                   false, mr, errp);
+    return qemu_ram_alloc_internal(size, maxsz, resized, NULL,
+                                   RAM_RESIZEABLE, mr, errp);
 }
 
 static void reclaim_ramblock(RAMBlock *block)
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v6 07/15] util/mmap-alloc: Pass flags instead of separate bools to qemu_ram_mmap()
  2021-04-21 12:26 [PATCH v6 00/15] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
                   ` (5 preceding siblings ...)
  2021-04-21 12:26 ` [PATCH v6 06/15] softmmu/memory: Pass ram_flags to qemu_ram_alloc() and qemu_ram_alloc_internal() David Hildenbrand
@ 2021-04-21 12:26 ` David Hildenbrand
  2021-04-21 12:37   ` Philippe Mathieu-Daudé
  2021-04-21 12:26 ` [PATCH v6 08/15] memory: Introduce RAM_NORESERVE and wire it up in qemu_ram_mmap() David Hildenbrand
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 47+ messages in thread
From: David Hildenbrand @ 2021-04-21 12:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Murilo Opsfelder Araujo, Eduardo Habkost,
	Michael S. Tsirkin, David Hildenbrand, Richard Henderson,
	Dr. David Alan Gilbert, Peter Xu, Greg Kurz, Stefan Hajnoczi,
	Igor Mammedov, Paolo Bonzini, Philippe Mathieu-Daudé

Let's pass flags instead of bools to prepare for passing other flags and
update the documentation of qemu_ram_mmap(). Introduce new QEMU_MAP_
flags that abstract the mmap() PROT_ and MAP_ flag handling and simplify
it.

We expose only flags that are currently supported by qemu_ram_mmap().
Maybe, we'll see qemu_mmap() in the future as well that can implement these
flags.

Note: We don't use MAP_ flags as some flags (e.g., MAP_SYNC) are only
defined for some systems and we want to always be able to identify
these flags reliably inside qemu_ram_mmap() -- for example, to properly
warn when some future flags are not available or effective on a system.
Also, this way we can simplify PROT_ handling as well.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/qemu/mmap-alloc.h | 16 +++++++++-------
 include/qemu/osdep.h      | 18 ++++++++++++++++++
 softmmu/physmem.c         |  8 +++++---
 util/mmap-alloc.c         | 15 ++++++++-------
 util/oslib-posix.c        |  3 ++-
 5 files changed, 42 insertions(+), 18 deletions(-)

diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index 456ff87df1..90d0eee705 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -7,18 +7,22 @@ size_t qemu_fd_getpagesize(int fd);
 size_t qemu_mempath_getpagesize(const char *mem_path);
 
 /**
- * qemu_ram_mmap: mmap the specified file or device.
+ * qemu_ram_mmap: mmap anonymous memory, the specified file or device.
+ *
+ * mmap() abstraction to map guest RAM, simplifying flag handling, taking
+ * care of alignment requirements and installing guard pages.
  *
  * Parameters:
  *  @fd: the file or the device to mmap
  *  @size: the number of bytes to be mmaped
  *  @align: if not zero, specify the alignment of the starting mapping address;
  *          otherwise, the alignment in use will be determined by QEMU.
- *  @readonly: true for a read-only mapping, false for read/write.
- *  @shared: map has RAM_SHARED flag.
- *  @is_pmem: map has RAM_PMEM flag.
+ *  @qemu_map_flags: QEMU_MAP_* flags
  *  @map_offset: map starts at offset of map_offset from the start of fd
  *
+ * Internally, MAP_PRIVATE, MAP_ANONYMOUS and MAP_SHARED_VALIDATE are set
+ * implicitly based on other parameters.
+ *
  * Return:
  *  On success, return a pointer to the mapped area.
  *  On failure, return MAP_FAILED.
@@ -26,9 +30,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path);
 void *qemu_ram_mmap(int fd,
                     size_t size,
                     size_t align,
-                    bool readonly,
-                    bool shared,
-                    bool is_pmem,
+                    uint32_t qemu_map_flags,
                     off_t map_offset);
 
 void qemu_ram_munmap(int fd, void *ptr, size_t size);
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index cb2a07e472..a96d6cb7ac 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -366,6 +366,24 @@ void *qemu_anon_ram_alloc(size_t size, uint64_t *align, bool shared);
 void qemu_vfree(void *ptr);
 void qemu_anon_ram_free(void *ptr, size_t size);
 
+/*
+ * Abstraction of PROT_ and MAP_ flags as passed to mmap(), for example,
+ * consumed by qemu_ram_mmap().
+ */
+
+/* Map PROT_READ instead of PROT_READ | PROT_WRITE. */
+#define QEMU_MAP_READONLY   (1 << 0)
+
+/* Use MAP_SHARED instead of MAP_PRIVATE. */
+#define QEMU_MAP_SHARED     (1 << 1)
+
+/*
+ * Use MAP_SYNC | MAP_SHARED_VALIDATE if supported. Ignored without
+ * QEMU_MAP_SHARED. If mapping fails, warn and fallback to !QEMU_MAP_SYNC.
+ */
+#define QEMU_MAP_SYNC       (1 << 2)
+
+
 #define QEMU_MADV_INVALID -1
 
 #if defined(CONFIG_MADVISE)
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 11b45be271..2e8d1f47f0 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -1533,6 +1533,7 @@ static void *file_ram_alloc(RAMBlock *block,
                             off_t offset,
                             Error **errp)
 {
+    uint32_t qemu_map_flags;
     void *area;
 
     block->page_size = qemu_fd_getpagesize(fd);
@@ -1580,9 +1581,10 @@ static void *file_ram_alloc(RAMBlock *block,
         perror("ftruncate");
     }
 
-    area = qemu_ram_mmap(fd, memory, block->mr->align, readonly,
-                         block->flags & RAM_SHARED, block->flags & RAM_PMEM,
-                         offset);
+    qemu_map_flags = readonly ? QEMU_MAP_READONLY : 0;
+    qemu_map_flags |= (block->flags & RAM_SHARED) ? QEMU_MAP_SHARED : 0;
+    qemu_map_flags |= (block->flags & RAM_PMEM) ? QEMU_MAP_SYNC : 0;
+    area = qemu_ram_mmap(fd, memory, block->mr->align, qemu_map_flags, offset);
     if (area == MAP_FAILED) {
         error_setg_errno(errp, errno,
                          "unable to map backing store for guest RAM");
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 0e2bd7bc0e..1ddc0e2a1e 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -118,9 +118,12 @@ static void *mmap_reserve(size_t size, int fd)
  * Activate memory in a reserved region from the given fd (if any), to make
  * it accessible.
  */
-static void *mmap_activate(void *ptr, size_t size, int fd, bool readonly,
-                           bool shared, bool is_pmem, off_t map_offset)
+static void *mmap_activate(void *ptr, size_t size, int fd,
+                           uint32_t qemu_map_flags, off_t map_offset)
 {
+    const bool readonly = qemu_map_flags & QEMU_MAP_READONLY;
+    const bool shared = qemu_map_flags & QEMU_MAP_SHARED;
+    const bool sync = qemu_map_flags & QEMU_MAP_SYNC;
     const int prot = PROT_READ | (readonly ? 0 : PROT_WRITE);
     int map_sync_flags = 0;
     int flags = MAP_FIXED;
@@ -128,7 +131,7 @@ static void *mmap_activate(void *ptr, size_t size, int fd, bool readonly,
 
     flags |= fd == -1 ? MAP_ANONYMOUS : 0;
     flags |= shared ? MAP_SHARED : MAP_PRIVATE;
-    if (shared && is_pmem) {
+    if (shared && sync) {
         map_sync_flags = MAP_SYNC | MAP_SHARED_VALIDATE;
     }
 
@@ -173,9 +176,7 @@ static inline size_t mmap_guard_pagesize(int fd)
 void *qemu_ram_mmap(int fd,
                     size_t size,
                     size_t align,
-                    bool readonly,
-                    bool shared,
-                    bool is_pmem,
+                    uint32_t qemu_map_flags,
                     off_t map_offset)
 {
     const size_t guard_pagesize = mmap_guard_pagesize(fd);
@@ -199,7 +200,7 @@ void *qemu_ram_mmap(int fd,
 
     offset = QEMU_ALIGN_UP((uintptr_t)guardptr, align) - (uintptr_t)guardptr;
 
-    ptr = mmap_activate(guardptr + offset, size, fd, readonly, shared, is_pmem,
+    ptr = mmap_activate(guardptr + offset, size, fd, qemu_map_flags,
                         map_offset);
     if (ptr == MAP_FAILED) {
         munmap(guardptr, total);
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 36820fec16..c12108c6cb 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -229,8 +229,9 @@ void *qemu_memalign(size_t alignment, size_t size)
 /* alloc shared memory pages */
 void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
 {
+    const uint32_t qemu_map_flags = shared ? QEMU_MAP_SHARED : 0;
     size_t align = QEMU_VMALLOC_ALIGN;
-    void *ptr = qemu_ram_mmap(-1, size, align, false, shared, false, 0);
+    void *ptr = qemu_ram_mmap(-1, size, align, qemu_map_flags, 0);
 
     if (ptr == MAP_FAILED) {
         return NULL;
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v6 08/15] memory: Introduce RAM_NORESERVE and wire it up in qemu_ram_mmap()
  2021-04-21 12:26 [PATCH v6 00/15] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
                   ` (6 preceding siblings ...)
  2021-04-21 12:26 ` [PATCH v6 07/15] util/mmap-alloc: Pass flags instead of separate bools to qemu_ram_mmap() David Hildenbrand
@ 2021-04-21 12:26 ` David Hildenbrand
  2021-04-21 12:26 ` [PATCH v6 09/15] util/mmap-alloc: Support RAM_NORESERVE via MAP_NORESERVE under Linux David Hildenbrand
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 47+ messages in thread
From: David Hildenbrand @ 2021-04-21 12:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Murilo Opsfelder Araujo, Eduardo Habkost,
	Michael S. Tsirkin, David Hildenbrand, Richard Henderson,
	Dr. David Alan Gilbert, Peter Xu, Greg Kurz, Stefan Hajnoczi,
	Igor Mammedov, Paolo Bonzini, Philippe Mathieu-Daudé

Let's introduce RAM_NORESERVE, allowing mmap'ing with MAP_NORESERVE. The
new flag has the following semantics:

"
RAM is mmap-ed with MAP_NORESERVE. When set, reserving swap space (or huge
pages if applicable) is skipped: will bail out if not supported. When not
set, the OS will do the reservation, if supported for the memory type.
"

Allow passing it into:
- memory_region_init_ram_nomigrate()
- memory_region_init_resizeable_ram()
- memory_region_init_ram_from_file()

... and teach qemu_ram_mmap() and qemu_anon_ram_alloc() about the flag.
Bail out if the flag is not supported, which is the case right now for
both, POSIX and win32. We will add Linux support next and allow specifying
RAM_NORESERVE via memory backends.

The target use case is virtio-mem, which dynamically exposes memory
inside a large, sparse memory area to the VM.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/exec/cpu-common.h |  1 +
 include/exec/memory.h     | 15 ++++++++++++---
 include/exec/ram_addr.h   |  3 ++-
 include/qemu/osdep.h      |  9 ++++++++-
 migration/ram.c           |  3 +--
 softmmu/physmem.c         | 15 ++++++++++++---
 util/mmap-alloc.c         |  7 +++++++
 util/oslib-posix.c        |  6 ++++--
 util/oslib-win32.c        | 13 ++++++++++++-
 9 files changed, 59 insertions(+), 13 deletions(-)

diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 5a0a2d93e0..38a47ad4ac 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -58,6 +58,7 @@ void *qemu_ram_get_host_addr(RAMBlock *rb);
 ram_addr_t qemu_ram_get_offset(RAMBlock *rb);
 ram_addr_t qemu_ram_get_used_length(RAMBlock *rb);
 bool qemu_ram_is_shared(RAMBlock *rb);
+bool qemu_ram_is_noreserve(RAMBlock *rb);
 bool qemu_ram_is_uf_zeroable(RAMBlock *rb);
 void qemu_ram_set_uf_zeroable(RAMBlock *rb);
 bool qemu_ram_is_migratable(RAMBlock *rb);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 10179c6695..8d77819bcd 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -155,6 +155,13 @@ typedef struct IOMMUTLBEvent {
  */
 #define RAM_UF_WRITEPROTECT (1 << 6)
 
+/*
+ * RAM is mmap-ed with MAP_NORESERVE. When set, reserving swap space (or huge
+ * pages if applicable) is skipped: will bail out if not supported. When not
+ * set, the OS will do the reservation, if supported for the memory type.
+ */
+#define RAM_NORESERVE (1 << 7)
+
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
                                        IOMMUNotifierFlag flags,
                                        hwaddr start, hwaddr end,
@@ -937,7 +944,7 @@ void memory_region_init_ram_nomigrate(MemoryRegion *mr,
  * @name: Region name, becomes part of RAMBlock name used in migration stream
  *        must be unique within any device
  * @size: size of the region.
- * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED.
+ * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_NORESERVE.
  * @errp: pointer to Error*, to store an error if it happens.
  *
  * Note that this function does not do anything to cause the data in the
@@ -991,7 +998,8 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
  * @size: size of the region.
  * @align: alignment of the region base address; if 0, the default alignment
  *         (getpagesize()) will be used.
- * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM.
+ * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
+ *             RAM_NORESERVE,
  * @path: the path in which to allocate the RAM.
  * @readonly: true to open @path for reading, false for read/write.
  * @errp: pointer to Error*, to store an error if it happens.
@@ -1017,7 +1025,8 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
  * @owner: the object that tracks the region's reference count
  * @name: the name of the region.
  * @size: size of the region.
- * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM.
+ * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
+ *             RAM_NORESERVE.
  * @fd: the fd to mmap.
  * @offset: offset within the file referenced by fd
  * @errp: pointer to Error*, to store an error if it happens.
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 6d4513f8e2..551876bed0 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -104,7 +104,8 @@ long qemu_maxrampagesize(void);
  * Parameters:
  *  @size: the size in bytes of the ram block
  *  @mr: the memory region where the ram block is
- *  @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM.
+ *  @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
+ *              RAM_NORESERVE.
  *  @mem_path or @fd: specify the backing file or device
  *  @readonly: true to open @path for reading, false for read/write.
  *  @errp: pointer to Error*, to store an error if it happens
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index a96d6cb7ac..af65b36698 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -362,7 +362,8 @@ extern "C" {
 int qemu_daemon(int nochdir, int noclose);
 void *qemu_try_memalign(size_t alignment, size_t size);
 void *qemu_memalign(size_t alignment, size_t size);
-void *qemu_anon_ram_alloc(size_t size, uint64_t *align, bool shared);
+void *qemu_anon_ram_alloc(size_t size, uint64_t *align, bool shared,
+                          bool noreserve);
 void qemu_vfree(void *ptr);
 void qemu_anon_ram_free(void *ptr, size_t size);
 
@@ -383,6 +384,12 @@ void qemu_anon_ram_free(void *ptr, size_t size);
  */
 #define QEMU_MAP_SYNC       (1 << 2)
 
+/*
+ * Use MAP_NORESERVE to skip reservation of swap space (or huge pages if
+ * applicable). Bail out if not supported/effective.
+ */
+#define QEMU_MAP_NORESERVE  (1 << 3)
+
 
 #define QEMU_MADV_INVALID -1
 
diff --git a/migration/ram.c b/migration/ram.c
index 4682f3625c..7b2d4dd449 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3371,8 +3371,7 @@ int colo_init_ram_cache(void)
     WITH_RCU_READ_LOCK_GUARD() {
         RAMBLOCK_FOREACH_NOT_IGNORED(block) {
             block->colo_cache = qemu_anon_ram_alloc(block->used_length,
-                                                    NULL,
-                                                    false);
+                                                    NULL, false, false);
             if (!block->colo_cache) {
                 error_report("%s: Can't alloc memory for COLO cache of block %s,"
                              "size 0x" RAM_ADDR_FMT, __func__, block->idstr,
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 2e8d1f47f0..1efb1d5193 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -1584,6 +1584,7 @@ static void *file_ram_alloc(RAMBlock *block,
     qemu_map_flags = readonly ? QEMU_MAP_READONLY : 0;
     qemu_map_flags |= (block->flags & RAM_SHARED) ? QEMU_MAP_SHARED : 0;
     qemu_map_flags |= (block->flags & RAM_PMEM) ? QEMU_MAP_SYNC : 0;
+    qemu_map_flags |= (block->flags & RAM_NORESERVE) ? QEMU_MAP_NORESERVE : 0;
     area = qemu_ram_mmap(fd, memory, block->mr->align, qemu_map_flags, offset);
     if (area == MAP_FAILED) {
         error_setg_errno(errp, errno,
@@ -1704,6 +1705,11 @@ bool qemu_ram_is_shared(RAMBlock *rb)
     return rb->flags & RAM_SHARED;
 }
 
+bool qemu_ram_is_noreserve(RAMBlock *rb)
+{
+    return rb->flags & RAM_NORESERVE;
+}
+
 /* Note: Only set at the start of postcopy */
 bool qemu_ram_is_uf_zeroable(RAMBlock *rb)
 {
@@ -1931,6 +1937,7 @@ static void dirty_memory_extend(ram_addr_t old_ram_size,
 
 static void ram_block_add(RAMBlock *new_block, Error **errp)
 {
+    const bool noreserve = qemu_ram_is_noreserve(new_block);
     const bool shared = qemu_ram_is_shared(new_block);
     RAMBlock *block;
     RAMBlock *last_block = NULL;
@@ -1954,7 +1961,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
         } else {
             new_block->host = qemu_anon_ram_alloc(new_block->max_length,
                                                   &new_block->mr->align,
-                                                  shared);
+                                                  shared, noreserve);
             if (!new_block->host) {
                 error_setg_errno(errp, errno,
                                  "cannot set up guest memory '%s'",
@@ -2025,7 +2032,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
     int64_t file_size, file_align;
 
     /* Just support these ram flags by now. */
-    assert((ram_flags & ~(RAM_SHARED | RAM_PMEM)) == 0);
+    assert((ram_flags & ~(RAM_SHARED | RAM_PMEM | RAM_NORESERVE)) == 0);
 
     if (xen_enabled()) {
         error_setg(errp, "-mem-path not supported with Xen");
@@ -2117,6 +2124,8 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
     Error *local_err = NULL;
 
     assert((ram_flags & ~(RAM_SHARED | RAM_RESIZEABLE | RAM_PREALLOC)) == 0);
+    assert((ram_flags & ~(RAM_SHARED | RAM_RESIZEABLE | RAM_PREALLOC |
+                          RAM_NORESERVE)) == 0);
     assert(!host ^ (ram_flags & RAM_PREALLOC));
 
     size = HOST_PAGE_ALIGN(size);
@@ -2150,7 +2159,7 @@ RAMBlock *qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
 RAMBlock *qemu_ram_alloc(ram_addr_t size, uint32_t ram_flags,
                          MemoryRegion *mr, Error **errp)
 {
-    assert((ram_flags & ~RAM_SHARED) == 0);
+    assert((ram_flags & ~(RAM_SHARED | RAM_NORESERVE)) == 0);
     return qemu_ram_alloc_internal(size, size, NULL, NULL, ram_flags, mr, errp);
 }
 
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 1ddc0e2a1e..d0cf4aaee5 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -20,6 +20,7 @@
 #include "qemu/osdep.h"
 #include "qemu/mmap-alloc.h"
 #include "qemu/host-utils.h"
+#include "qemu/error-report.h"
 
 #define HUGETLBFS_MAGIC       0x958458f6
 
@@ -121,6 +122,7 @@ static void *mmap_reserve(size_t size, int fd)
 static void *mmap_activate(void *ptr, size_t size, int fd,
                            uint32_t qemu_map_flags, off_t map_offset)
 {
+    const bool noreserve = qemu_map_flags & QEMU_MAP_NORESERVE;
     const bool readonly = qemu_map_flags & QEMU_MAP_READONLY;
     const bool shared = qemu_map_flags & QEMU_MAP_SHARED;
     const bool sync = qemu_map_flags & QEMU_MAP_SYNC;
@@ -129,6 +131,11 @@ static void *mmap_activate(void *ptr, size_t size, int fd,
     int flags = MAP_FIXED;
     void *activated_ptr;
 
+    if (noreserve) {
+        error_report("Skipping reservation of swap space is not supported");
+        return MAP_FAILED;
+    }
+
     flags |= fd == -1 ? MAP_ANONYMOUS : 0;
     flags |= shared ? MAP_SHARED : MAP_PRIVATE;
     if (shared && sync) {
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index c12108c6cb..5bbdebae0e 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -227,9 +227,11 @@ void *qemu_memalign(size_t alignment, size_t size)
 }
 
 /* alloc shared memory pages */
-void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
+void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared,
+                          bool noreserve)
 {
-    const uint32_t qemu_map_flags = shared ? QEMU_MAP_SHARED : 0;
+    const uint32_t qemu_map_flags = (shared ? QEMU_MAP_SHARED : 0) |
+                                    (noreserve ? QEMU_MAP_NORESERVE : 0);
     size_t align = QEMU_VMALLOC_ALIGN;
     void *ptr = qemu_ram_mmap(-1, size, align, qemu_map_flags, 0);
 
diff --git a/util/oslib-win32.c b/util/oslib-win32.c
index f68b8012bb..8cafe44179 100644
--- a/util/oslib-win32.c
+++ b/util/oslib-win32.c
@@ -39,6 +39,7 @@
 #include "trace.h"
 #include "qemu/sockets.h"
 #include "qemu/cutils.h"
+#include "qemu/error-report.h"
 #include <malloc.h>
 
 /* this must come after including "trace.h" */
@@ -77,10 +78,20 @@ static int get_allocation_granularity(void)
     return system_info.dwAllocationGranularity;
 }
 
-void *qemu_anon_ram_alloc(size_t size, uint64_t *align, bool shared)
+void *qemu_anon_ram_alloc(size_t size, uint64_t *align, bool shared,
+                          bool noreserve)
 {
     void *ptr;
 
+    if (noreserve) {
+        /*
+         * We need a MEM_COMMIT before accessing any memory in a MEM_RESERVE
+         * area; we cannot easily mimic POSIX MAP_NORESERVE semantics.
+         */
+        error_report("Skipping reservation of swap space is not supported.");
+        return NULL;
+    }
+
     ptr = VirtualAlloc(NULL, size, MEM_COMMIT, PAGE_READWRITE);
     trace_qemu_anon_ram_alloc(size, ptr);
 
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v6 09/15] util/mmap-alloc: Support RAM_NORESERVE via MAP_NORESERVE under Linux
  2021-04-21 12:26 [PATCH v6 00/15] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
                   ` (7 preceding siblings ...)
  2021-04-21 12:26 ` [PATCH v6 08/15] memory: Introduce RAM_NORESERVE and wire it up in qemu_ram_mmap() David Hildenbrand
@ 2021-04-21 12:26 ` David Hildenbrand
  2021-04-21 12:26 ` [PATCH v6 10/15] hostmem: Wire up RAM_NORESERVE via "reserve" property David Hildenbrand
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 47+ messages in thread
From: David Hildenbrand @ 2021-04-21 12:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Murilo Opsfelder Araujo, Eduardo Habkost,
	Michael S. Tsirkin, David Hildenbrand, Richard Henderson,
	Dr. David Alan Gilbert, Peter Xu, Greg Kurz, Stefan Hajnoczi,
	Igor Mammedov, Paolo Bonzini, Philippe Mathieu-Daudé

Let's support RAM_NORESERVE via MAP_NORESERVE on Linux. The flag has no
effect on most shared mappings - except for hugetlbfs and anonymous memory.

Linux man page:
  "MAP_NORESERVE: Do not reserve swap space for this mapping. When swap
  space is reserved, one has the guarantee that it is possible to modify
  the mapping. When swap space is not reserved one might get SIGSEGV
  upon a write if no physical memory is available. See also the discussion
  of the file /proc/sys/vm/overcommit_memory in proc(5). In kernels before
  2.6, this flag had effect only for private writable mappings."

Note that the "guarantee" part is wrong with memory overcommit in Linux.

Also, in Linux hugetlbfs is treated differently - we configure reservation
of huge pages from the pool, not reservation of swap space (huge pages
cannot be swapped).

The rough behavior is [1]:
a) !Hugetlbfs:

  1) Without MAP_NORESERVE *or* with memory overcommit under Linux
     disabled ("/proc/sys/vm/overcommit_memory == 2"), the following
     accounting/reservation happens:
      For a file backed map
       SHARED or READ-only - 0 cost (the file is the map not swap)
       PRIVATE WRITABLE - size of mapping per instance

      For an anonymous or /dev/zero map
       SHARED   - size of mapping
       PRIVATE READ-only - 0 cost (but of little use)
       PRIVATE WRITABLE - size of mapping per instance

  2) With MAP_NORESERVE, no accounting/reservation happens.

b) Hugetlbfs:

  1) Without MAP_NORESERVE, huge pages are reserved.

  2) With MAP_NORESERVE, no huge pages are reserved.

Note: With "/proc/sys/vm/overcommit_memory == 0", we were already able
to configure it for !hugetlbfs globally; this toggle now allows
configuring it more fine-grained, not for the whole system.

The target use case is virtio-mem, which dynamically exposes memory
inside a large, sparse memory area to the VM.

[1] https://www.kernel.org/doc/Documentation/vm/overcommit-accounting

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/qemu/osdep.h |  3 ++
 softmmu/physmem.c    |  1 +
 util/mmap-alloc.c    | 69 ++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index af65b36698..0a7384d15c 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -195,6 +195,9 @@ extern "C" {
 #ifndef MAP_FIXED_NOREPLACE
 #define MAP_FIXED_NOREPLACE 0
 #endif
+#ifndef MAP_NORESERVE
+#define MAP_NORESERVE 0
+#endif
 #ifndef ENOMEDIUM
 #define ENOMEDIUM ENODEV
 #endif
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 1efb1d5193..ccc5985324 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -2230,6 +2230,7 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
                 flags = MAP_FIXED;
                 flags |= block->flags & RAM_SHARED ?
                          MAP_SHARED : MAP_PRIVATE;
+                flags |= block->flags & RAM_NORESERVE ? MAP_NORESERVE : 0;
                 if (block->fd >= 0) {
                     area = mmap(vaddr, length, PROT_READ | PROT_WRITE,
                                 flags, block->fd, offset);
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index d0cf4aaee5..838e286ce5 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -20,6 +20,7 @@
 #include "qemu/osdep.h"
 #include "qemu/mmap-alloc.h"
 #include "qemu/host-utils.h"
+#include "qemu/cutils.h"
 #include "qemu/error-report.h"
 
 #define HUGETLBFS_MAGIC       0x958458f6
@@ -83,6 +84,70 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
     return qemu_real_host_page_size;
 }
 
+#define OVERCOMMIT_MEMORY_PATH "/proc/sys/vm/overcommit_memory"
+static bool map_noreserve_effective(int fd, uint32_t qemu_map_flags)
+{
+#if defined(__linux__)
+    const bool readonly = qemu_map_flags & QEMU_MAP_READONLY;
+    const bool shared = qemu_map_flags & QEMU_MAP_SHARED;
+    gchar *content = NULL;
+    const char *endptr;
+    unsigned int tmp;
+
+    /*
+     * hugeltb accounting is different than ordinary swap reservation:
+     * a) Hugetlb pages from the pool are reserved for both private and
+     *    shared mappings. For shared mappings, all mappers have to specify
+     *    MAP_NORESERVE.
+     * b) MAP_NORESERVE is not affected by /proc/sys/vm/overcommit_memory.
+     */
+    if (qemu_fd_getpagesize(fd) != qemu_real_host_page_size) {
+        return true;
+    }
+
+    /*
+     * Accountable mappings in the kernel that can be affected by MAP_NORESEVE
+     * are private writable mappings (see mm/mmap.c:accountable_mapping() in
+     * Linux). For all shared or readonly mappings, MAP_NORESERVE is always
+     * implicitly active -- no reservation; this includes shmem. The only
+     * exception is shared anonymous memory, it is accounted like private
+     * anonymous memory.
+     */
+    if (readonly || (shared && fd >= 0)) {
+        return true;
+    }
+
+    /*
+     * MAP_NORESERVE is globally ignored for applicable !hugetlb mappings when
+     * memory overcommit is set to "never". Sparse memory regions aren't really
+     * possible in this system configuration.
+     *
+     * Bail out now instead of silently committing way more memory than
+     * currently desired by the user.
+     */
+    if (g_file_get_contents(OVERCOMMIT_MEMORY_PATH, &content, NULL, NULL) &&
+        !qemu_strtoui(content, &endptr, 0, &tmp) &&
+        (!endptr || *endptr == '\n')) {
+        if (tmp == 2) {
+            error_report("Skipping reservation of swap space is not supported:"
+                         " \"" OVERCOMMIT_MEMORY_PATH "\" is \"2\"");
+            return false;
+        }
+        return true;
+    }
+    /* this interface has been around since Linux 2.6 */
+    error_report("Skipping reservation of swap space is not supported:"
+                 " Could not read: \"" OVERCOMMIT_MEMORY_PATH "\"");
+    return false;
+#endif
+    /*
+     * E.g., FreeBSD used to define MAP_NORESERVE, never implemented it,
+     * and removed it a while ago.
+     */
+    error_report("Skipping reservation of swap space is not supported");
+    return false;
+}
+
 /*
  * Reserve a new memory region of the requested size to be used for mapping
  * from the given fd (if any).
@@ -131,13 +196,13 @@ static void *mmap_activate(void *ptr, size_t size, int fd,
     int flags = MAP_FIXED;
     void *activated_ptr;
 
-    if (noreserve) {
-        error_report("Skipping reservation of swap space is not supported");
+    if (noreserve && !map_noreserve_effective(fd, qemu_map_flags)) {
         return MAP_FAILED;
     }
 
     flags |= fd == -1 ? MAP_ANONYMOUS : 0;
     flags |= shared ? MAP_SHARED : MAP_PRIVATE;
+    flags |= noreserve ? MAP_NORESERVE : 0;
     if (shared && sync) {
         map_sync_flags = MAP_SYNC | MAP_SHARED_VALIDATE;
     }
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v6 10/15] hostmem: Wire up RAM_NORESERVE via "reserve" property
  2021-04-21 12:26 [PATCH v6 00/15] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
                   ` (8 preceding siblings ...)
  2021-04-21 12:26 ` [PATCH v6 09/15] util/mmap-alloc: Support RAM_NORESERVE via MAP_NORESERVE under Linux David Hildenbrand
@ 2021-04-21 12:26 ` David Hildenbrand
  2021-04-23 11:14   ` Markus Armbruster
  2021-04-21 12:26 ` [PATCH v6 11/15] qmp: Clarify memory backend properties returned via query-memdev David Hildenbrand
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 47+ messages in thread
From: David Hildenbrand @ 2021-04-21 12:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Murilo Opsfelder Araujo, Eduardo Habkost,
	Michael S. Tsirkin, Markus Armbruster, David Hildenbrand,
	Richard Henderson, Dr. David Alan Gilbert, Peter Xu, Greg Kurz,
	Stefan Hajnoczi, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé

Let's provide a way to control the use of RAM_NORESERVE via memory
backends using the "reserve" property which defaults to true (old
behavior).

Only Linux currently supports clearing the flag (and support is checked at
runtime, depending on the setting of "/proc/sys/vm/overcommit_memory").
Windows and other POSIX systems will bail out with "reserve=false".

The target use case is virtio-mem, which dynamically exposes memory
inside a large, sparse memory area to the VM. This essentially allows
avoiding to set "/proc/sys/vm/overcommit_memory == 0") when using
virtio-mem and also supporting hugetlbfs in the future.

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 backends/hostmem-file.c  | 11 ++++++-----
 backends/hostmem-memfd.c |  1 +
 backends/hostmem-ram.c   |  1 +
 backends/hostmem.c       | 32 ++++++++++++++++++++++++++++++++
 include/sysemu/hostmem.h |  2 +-
 qapi/qom.json            |  4 ++++
 6 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index b683da9daf..9d550e53d4 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -40,6 +40,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
                object_get_typename(OBJECT(backend)));
 #else
     HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(backend);
+    uint32_t ram_flags;
     gchar *name;
 
     if (!backend->size) {
@@ -52,11 +53,11 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
     }
 
     name = host_memory_backend_get_name(backend);
-    memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
-                                     name,
-                                     backend->size, fb->align,
-                                     (backend->share ? RAM_SHARED : 0) |
-                                     (fb->is_pmem ? RAM_PMEM : 0),
+    ram_flags = backend->share ? RAM_SHARED : 0;
+    ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
+    ram_flags |= fb->is_pmem ? RAM_PMEM : 0;
+    memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), name,
+                                     backend->size, fb->align, ram_flags,
                                      fb->mem_path, fb->readonly, errp);
     g_free(name);
 #endif
diff --git a/backends/hostmem-memfd.c b/backends/hostmem-memfd.c
index 93b5d1a4cf..f3436b623d 100644
--- a/backends/hostmem-memfd.c
+++ b/backends/hostmem-memfd.c
@@ -55,6 +55,7 @@ memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
 
     name = host_memory_backend_get_name(backend);
     ram_flags = backend->share ? RAM_SHARED : 0;
+    ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
     memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend), name,
                                    backend->size, ram_flags, fd, 0, errp);
     g_free(name);
diff --git a/backends/hostmem-ram.c b/backends/hostmem-ram.c
index 741e701062..b8e55cdbd0 100644
--- a/backends/hostmem-ram.c
+++ b/backends/hostmem-ram.c
@@ -29,6 +29,7 @@ ram_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
 
     name = host_memory_backend_get_name(backend);
     ram_flags = backend->share ? RAM_SHARED : 0;
+    ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
     memory_region_init_ram_flags_nomigrate(&backend->mr, OBJECT(backend), name,
                                            backend->size, ram_flags, errp);
     g_free(name);
diff --git a/backends/hostmem.c b/backends/hostmem.c
index c6c1ff5b99..58fdc1b658 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -217,6 +217,11 @@ static void host_memory_backend_set_prealloc(Object *obj, bool value,
     Error *local_err = NULL;
     HostMemoryBackend *backend = MEMORY_BACKEND(obj);
 
+    if (!backend->reserve && value) {
+        error_setg(errp, "'prealloc=on' and 'reserve=off' are incompatible");
+        return;
+    }
+
     if (!host_memory_backend_mr_inited(backend)) {
         backend->prealloc = value;
         return;
@@ -268,6 +273,7 @@ static void host_memory_backend_init(Object *obj)
     /* TODO: convert access to globals to compat properties */
     backend->merge = machine_mem_merge(machine);
     backend->dump = machine_dump_guest_core(machine);
+    backend->reserve = true;
     backend->prealloc_threads = 1;
 }
 
@@ -426,6 +432,28 @@ static void host_memory_backend_set_share(Object *o, bool value, Error **errp)
     backend->share = value;
 }
 
+static bool host_memory_backend_get_reserve(Object *o, Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(o);
+
+    return backend->reserve;
+}
+
+static void host_memory_backend_set_reserve(Object *o, bool value, Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(o);
+
+    if (host_memory_backend_mr_inited(backend)) {
+        error_setg(errp, "cannot change property value");
+        return;
+    }
+    if (backend->prealloc && !value) {
+        error_setg(errp, "'prealloc=on' and 'reserve=off' are incompatible");
+        return;
+    }
+    backend->reserve = value;
+}
+
 static bool
 host_memory_backend_get_use_canonical_path(Object *obj, Error **errp)
 {
@@ -494,6 +522,10 @@ host_memory_backend_class_init(ObjectClass *oc, void *data)
         host_memory_backend_get_share, host_memory_backend_set_share);
     object_class_property_set_description(oc, "share",
         "Mark the memory as private to QEMU or shared");
+    object_class_property_add_bool(oc, "reserve",
+        host_memory_backend_get_reserve, host_memory_backend_set_reserve);
+    object_class_property_set_description(oc, "reserve",
+        "Reserve swap space (or huge pages) if applicable");
     /*
      * Do not delete/rename option. This option must be considered stable
      * (as if it didn't have the 'x-' prefix including deprecation period) as
diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h
index df5644723a..9ff5c16963 100644
--- a/include/sysemu/hostmem.h
+++ b/include/sysemu/hostmem.h
@@ -64,7 +64,7 @@ struct HostMemoryBackend {
     /* protected */
     uint64_t size;
     bool merge, dump, use_canonical_path;
-    bool prealloc, is_mapped, share;
+    bool prealloc, is_mapped, share, reserve;
     uint32_t prealloc_threads;
     DECLARE_BITMAP(host_nodes, MAX_NODES + 1);
     HostMemPolicy policy;
diff --git a/qapi/qom.json b/qapi/qom.json
index cd0e76d564..e9b86893a5 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -545,6 +545,9 @@
 # @share: if false, the memory is private to QEMU; if true, it is shared
 #         (default: false)
 #
+# @reserve: if true, reserve swap space (or huge pages) if applicable
+#           default: true)
+#
 # @size: size of the memory region in bytes
 #
 # @x-use-canonical-path-for-ramblock-id: if true, the canoncial path is used
@@ -566,6 +569,7 @@
             '*prealloc': 'bool',
             '*prealloc-threads': 'uint32',
             '*share': 'bool',
+            '*reserve': 'bool',
             'size': 'size',
             '*x-use-canonical-path-for-ramblock-id': 'bool' } }
 
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v6 11/15] qmp: Clarify memory backend properties returned via query-memdev
  2021-04-21 12:26 [PATCH v6 00/15] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
                   ` (9 preceding siblings ...)
  2021-04-21 12:26 ` [PATCH v6 10/15] hostmem: Wire up RAM_NORESERVE via "reserve" property David Hildenbrand
@ 2021-04-21 12:26 ` David Hildenbrand
  2021-04-21 21:00   ` Eduardo Habkost
  2021-04-23 10:55   ` Markus Armbruster
  2021-04-21 12:26 ` [PATCH v6 12/15] qmp: Include "share" property of memory backends David Hildenbrand
                   ` (4 subsequent siblings)
  15 siblings, 2 replies; 47+ messages in thread
From: David Hildenbrand @ 2021-04-21 12:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Murilo Opsfelder Araujo, Eduardo Habkost,
	Michael S. Tsirkin, Markus Armbruster, David Hildenbrand,
	Richard Henderson, Dr. David Alan Gilbert, Peter Xu, Greg Kurz,
	Stefan Hajnoczi, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé

We return information on the currently configured memory backends and
don't configure them, so decribe what the currently set properties
express.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Suggested-by: Markus Armbruster <armbru@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 qapi/machine.json | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/qapi/machine.json b/qapi/machine.json
index 6e90d463fc..758b901185 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -790,11 +790,11 @@
 #
 # @size: memory backend size
 #
-# @merge: enables or disables memory merge support
+# @merge: whether memory merge support is enabled
 #
-# @dump: includes memory backend's memory in a core dump or not
+# @dump: whether memory backend's memory is included in a core dump
 #
-# @prealloc: enables or disables memory preallocation
+# @prealloc: whether memory was preallocated
 #
 # @host-nodes: host nodes for its memory policy
 #
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v6 12/15] qmp: Include "share" property of memory backends
  2021-04-21 12:26 [PATCH v6 00/15] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
                   ` (10 preceding siblings ...)
  2021-04-21 12:26 ` [PATCH v6 11/15] qmp: Clarify memory backend properties returned via query-memdev David Hildenbrand
@ 2021-04-21 12:26 ` David Hildenbrand
  2021-04-21 21:01   ` Eduardo Habkost
  2021-04-23 11:15   ` Markus Armbruster
  2021-04-21 12:26 ` [PATCH v6 13/15] hmp: Print "share" property of memory backends with "info memdev" David Hildenbrand
                   ` (3 subsequent siblings)
  15 siblings, 2 replies; 47+ messages in thread
From: David Hildenbrand @ 2021-04-21 12:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Murilo Opsfelder Araujo, Eduardo Habkost,
	Michael S. Tsirkin, David Hildenbrand, Richard Henderson,
	Dr. David Alan Gilbert, Peter Xu, Greg Kurz, Stefan Hajnoczi,
	Igor Mammedov, Paolo Bonzini, Philippe Mathieu-Daudé,
	Markus Armbruster

Let's include the property, which can be helpful when debugging,
for example, to spot misuse of MAP_PRIVATE which can result in some ugly
corner cases (e.g., double-memory consumption on shmem).

Use the same description we also use for describing the property.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/core/machine-qmp-cmds.c | 1 +
 qapi/machine.json          | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
index 68a942595a..d41db5b93b 100644
--- a/hw/core/machine-qmp-cmds.c
+++ b/hw/core/machine-qmp-cmds.c
@@ -174,6 +174,7 @@ static int query_memdev(Object *obj, void *opaque)
         m->merge = object_property_get_bool(obj, "merge", &error_abort);
         m->dump = object_property_get_bool(obj, "dump", &error_abort);
         m->prealloc = object_property_get_bool(obj, "prealloc", &error_abort);
+        m->share = object_property_get_bool(obj, "share", &error_abort);
         m->policy = object_property_get_enum(obj, "policy", "HostMemPolicy",
                                              &error_abort);
         host_nodes = object_property_get_qobject(obj,
diff --git a/qapi/machine.json b/qapi/machine.json
index 758b901185..32650bfe9e 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -796,6 +796,8 @@
 #
 # @prealloc: whether memory was preallocated
 #
+# @share: whether memory is private to QEMU or shared (since 6.1)
+#
 # @host-nodes: host nodes for its memory policy
 #
 # @policy: memory policy of memory backend
@@ -809,6 +811,7 @@
     'merge':      'bool',
     'dump':       'bool',
     'prealloc':   'bool',
+    'share':      'bool',
     'host-nodes': ['uint16'],
     'policy':     'HostMemPolicy' }}
 
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v6 13/15] hmp: Print "share" property of memory backends with "info memdev"
  2021-04-21 12:26 [PATCH v6 00/15] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
                   ` (11 preceding siblings ...)
  2021-04-21 12:26 ` [PATCH v6 12/15] qmp: Include "share" property of memory backends David Hildenbrand
@ 2021-04-21 12:26 ` David Hildenbrand
  2021-04-21 21:04   ` Eduardo Habkost
  2021-04-23 11:15   ` Markus Armbruster
  2021-04-21 12:26 ` [PATCH v6 14/15] qmp: Include "reserve" property of memory backends David Hildenbrand
                   ` (2 subsequent siblings)
  15 siblings, 2 replies; 47+ messages in thread
From: David Hildenbrand @ 2021-04-21 12:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Murilo Opsfelder Araujo, Eduardo Habkost,
	Michael S. Tsirkin, Markus Armbruster, David Hildenbrand,
	Richard Henderson, Dr. David Alan Gilbert, Peter Xu, Greg Kurz,
	Stefan Hajnoczi, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé

Let's print the property.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/core/machine-hmp-cmds.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/core/machine-hmp-cmds.c b/hw/core/machine-hmp-cmds.c
index 58248cffa3..004a92b3d6 100644
--- a/hw/core/machine-hmp-cmds.c
+++ b/hw/core/machine-hmp-cmds.c
@@ -110,6 +110,8 @@ void hmp_info_memdev(Monitor *mon, const QDict *qdict)
                        m->value->dump ? "true" : "false");
         monitor_printf(mon, "  prealloc: %s\n",
                        m->value->prealloc ? "true" : "false");
+        monitor_printf(mon, "  share: %s\n",
+                       m->value->share ? "true" : "false");
         monitor_printf(mon, "  policy: %s\n",
                        HostMemPolicy_str(m->value->policy));
         visit_complete(v, &str);
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v6 14/15] qmp: Include "reserve" property of memory backends
  2021-04-21 12:26 [PATCH v6 00/15] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
                   ` (12 preceding siblings ...)
  2021-04-21 12:26 ` [PATCH v6 13/15] hmp: Print "share" property of memory backends with "info memdev" David Hildenbrand
@ 2021-04-21 12:26 ` David Hildenbrand
  2021-04-21 21:05   ` Eduardo Habkost
  2021-04-23 11:00   ` Markus Armbruster
  2021-04-21 12:26 ` [PATCH v6 15/15] hmp: Print "reserve" property of memory backends with "info memdev" David Hildenbrand
  2021-04-21 21:06 ` [PATCH v6 00/15] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property Eduardo Habkost
  15 siblings, 2 replies; 47+ messages in thread
From: David Hildenbrand @ 2021-04-21 12:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Murilo Opsfelder Araujo, Eduardo Habkost,
	Michael S. Tsirkin, David Hildenbrand, Richard Henderson,
	Dr. David Alan Gilbert, Peter Xu, Greg Kurz, Stefan Hajnoczi,
	Igor Mammedov, Paolo Bonzini, Philippe Mathieu-Daudé,
	Markus Armbruster

Let's include the new property.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/core/machine-qmp-cmds.c | 1 +
 qapi/machine.json          | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
index d41db5b93b..2d135ecdd0 100644
--- a/hw/core/machine-qmp-cmds.c
+++ b/hw/core/machine-qmp-cmds.c
@@ -175,6 +175,7 @@ static int query_memdev(Object *obj, void *opaque)
         m->dump = object_property_get_bool(obj, "dump", &error_abort);
         m->prealloc = object_property_get_bool(obj, "prealloc", &error_abort);
         m->share = object_property_get_bool(obj, "share", &error_abort);
+        m->reserve = object_property_get_bool(obj, "reserve", &error_abort);
         m->policy = object_property_get_enum(obj, "policy", "HostMemPolicy",
                                              &error_abort);
         host_nodes = object_property_get_qobject(obj,
diff --git a/qapi/machine.json b/qapi/machine.json
index 32650bfe9e..5932139d20 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -798,6 +798,9 @@
 #
 # @share: whether memory is private to QEMU or shared (since 6.1)
 #
+# @reserve: whether swap space (or huge pages) was reserved if applicable
+#           (since 6.1)
+#
 # @host-nodes: host nodes for its memory policy
 #
 # @policy: memory policy of memory backend
@@ -812,6 +815,7 @@
     'dump':       'bool',
     'prealloc':   'bool',
     'share':      'bool',
+    'reserve':    'bool',
     'host-nodes': ['uint16'],
     'policy':     'HostMemPolicy' }}
 
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v6 15/15] hmp: Print "reserve" property of memory backends with "info memdev"
  2021-04-21 12:26 [PATCH v6 00/15] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
                   ` (13 preceding siblings ...)
  2021-04-21 12:26 ` [PATCH v6 14/15] qmp: Include "reserve" property of memory backends David Hildenbrand
@ 2021-04-21 12:26 ` David Hildenbrand
  2021-04-21 21:05   ` Eduardo Habkost
  2021-04-23 11:16   ` Markus Armbruster
  2021-04-21 21:06 ` [PATCH v6 00/15] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property Eduardo Habkost
  15 siblings, 2 replies; 47+ messages in thread
From: David Hildenbrand @ 2021-04-21 12:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcel Apfelbaum, Murilo Opsfelder Araujo, Eduardo Habkost,
	Michael S. Tsirkin, Markus Armbruster, David Hildenbrand,
	Richard Henderson, Dr. David Alan Gilbert, Peter Xu, Greg Kurz,
	Stefan Hajnoczi, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé

Let's print the new property.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/core/machine-hmp-cmds.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/core/machine-hmp-cmds.c b/hw/core/machine-hmp-cmds.c
index 004a92b3d6..9bedc77bb4 100644
--- a/hw/core/machine-hmp-cmds.c
+++ b/hw/core/machine-hmp-cmds.c
@@ -112,6 +112,8 @@ void hmp_info_memdev(Monitor *mon, const QDict *qdict)
                        m->value->prealloc ? "true" : "false");
         monitor_printf(mon, "  share: %s\n",
                        m->value->share ? "true" : "false");
+        monitor_printf(mon, "  reserve: %s\n",
+                       m->value->reserve ? "true" : "false");
         monitor_printf(mon, "  policy: %s\n",
                        HostMemPolicy_str(m->value->policy));
         visit_complete(v, &str);
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 05/15] softmmu/memory: Pass ram_flags to memory_region_init_ram_shared_nomigrate()
  2021-04-21 12:26 ` [PATCH v6 05/15] softmmu/memory: Pass ram_flags to memory_region_init_ram_shared_nomigrate() David Hildenbrand
@ 2021-04-21 12:33   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 47+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-04-21 12:33 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: Marcel Apfelbaum, Eduardo Habkost, Michael S. Tsirkin,
	Murilo Opsfelder Araujo, Richard Henderson,
	Dr. David Alan Gilbert, Peter Xu, Greg Kurz, Stefan Hajnoczi,
	Igor Mammedov, Paolo Bonzini

On 4/21/21 2:26 PM, David Hildenbrand wrote:
> Let's forward ram_flags instead, renaming
> memory_region_init_ram_shared_nomigrate() into
> memory_region_init_ram_flags_nomigrate().
> 
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  backends/hostmem-ram.c                        |  6 +++--
>  hw/m68k/next-cube.c                           |  4 ++--
>  include/exec/memory.h                         | 24 +++++++++----------
>  .../memory-region-housekeeping.cocci          |  8 +++----
>  softmmu/memory.c                              | 18 +++++++-------
>  5 files changed, 31 insertions(+), 29 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 06/15] softmmu/memory: Pass ram_flags to qemu_ram_alloc() and qemu_ram_alloc_internal()
  2021-04-21 12:26 ` [PATCH v6 06/15] softmmu/memory: Pass ram_flags to qemu_ram_alloc() and qemu_ram_alloc_internal() David Hildenbrand
@ 2021-04-21 12:34   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 47+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-04-21 12:34 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: Marcel Apfelbaum, Eduardo Habkost, Michael S. Tsirkin,
	Murilo Opsfelder Araujo, Richard Henderson,
	Dr. David Alan Gilbert, Peter Xu, Greg Kurz, Stefan Hajnoczi,
	Igor Mammedov, Paolo Bonzini

On 4/21/21 2:26 PM, David Hildenbrand wrote:
> Let's pass ram_flags to qemu_ram_alloc() and qemu_ram_alloc_internal(),
> preparing for passing additional flags.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  include/exec/ram_addr.h |  2 +-
>  softmmu/memory.c        |  4 ++--
>  softmmu/physmem.c       | 29 ++++++++++++-----------------
>  3 files changed, 15 insertions(+), 20 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 07/15] util/mmap-alloc: Pass flags instead of separate bools to qemu_ram_mmap()
  2021-04-21 12:26 ` [PATCH v6 07/15] util/mmap-alloc: Pass flags instead of separate bools to qemu_ram_mmap() David Hildenbrand
@ 2021-04-21 12:37   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 47+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-04-21 12:37 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: Marcel Apfelbaum, Eduardo Habkost, Michael S. Tsirkin,
	Murilo Opsfelder Araujo, Richard Henderson,
	Dr. David Alan Gilbert, Peter Xu, Greg Kurz, Stefan Hajnoczi,
	Igor Mammedov, Paolo Bonzini

On 4/21/21 2:26 PM, David Hildenbrand wrote:
> Let's pass flags instead of bools to prepare for passing other flags and
> update the documentation of qemu_ram_mmap(). Introduce new QEMU_MAP_
> flags that abstract the mmap() PROT_ and MAP_ flag handling and simplify
> it.
> 
> We expose only flags that are currently supported by qemu_ram_mmap().
> Maybe, we'll see qemu_mmap() in the future as well that can implement these
> flags.
> 
> Note: We don't use MAP_ flags as some flags (e.g., MAP_SYNC) are only
> defined for some systems and we want to always be able to identify
> these flags reliably inside qemu_ram_mmap() -- for example, to properly
> warn when some future flags are not available or effective on a system.
> Also, this way we can simplify PROT_ handling as well.
> 
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  include/qemu/mmap-alloc.h | 16 +++++++++-------
>  include/qemu/osdep.h      | 18 ++++++++++++++++++
>  softmmu/physmem.c         |  8 +++++---
>  util/mmap-alloc.c         | 15 ++++++++-------
>  util/oslib-posix.c        |  3 ++-
>  5 files changed, 42 insertions(+), 18 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 11/15] qmp: Clarify memory backend properties returned via query-memdev
  2021-04-21 12:26 ` [PATCH v6 11/15] qmp: Clarify memory backend properties returned via query-memdev David Hildenbrand
@ 2021-04-21 21:00   ` Eduardo Habkost
  2021-04-23 10:55   ` Markus Armbruster
  1 sibling, 0 replies; 47+ messages in thread
From: Eduardo Habkost @ 2021-04-21 21:00 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Marcel Apfelbaum, Michael S. Tsirkin, Markus Armbruster,
	Murilo Opsfelder Araujo, Richard Henderson,
	Dr. David Alan Gilbert, Peter Xu, qemu-devel, Greg Kurz,
	Stefan Hajnoczi, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé

On Wed, Apr 21, 2021 at 02:26:20PM +0200, David Hildenbrand wrote:
> We return information on the currently configured memory backends and
> don't configure them, so decribe what the currently set properties
> express.
> 
> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> Suggested-by: Markus Armbruster <armbru@redhat.com>
> Cc: Eric Blake <eblake@redhat.com>
> Cc: Markus Armbruster <armbru@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>

-- 
Eduardo



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 12/15] qmp: Include "share" property of memory backends
  2021-04-21 12:26 ` [PATCH v6 12/15] qmp: Include "share" property of memory backends David Hildenbrand
@ 2021-04-21 21:01   ` Eduardo Habkost
  2021-04-23 11:15   ` Markus Armbruster
  1 sibling, 0 replies; 47+ messages in thread
From: Eduardo Habkost @ 2021-04-21 21:01 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Marcel Apfelbaum, Michael S. Tsirkin, Murilo Opsfelder Araujo,
	Richard Henderson, Dr. David Alan Gilbert, Peter Xu, qemu-devel,
	Greg Kurz, Stefan Hajnoczi, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Markus Armbruster

On Wed, Apr 21, 2021 at 02:26:21PM +0200, David Hildenbrand wrote:
> Let's include the property, which can be helpful when debugging,
> for example, to spot misuse of MAP_PRIVATE which can result in some ugly
> corner cases (e.g., double-memory consumption on shmem).
> 
> Use the same description we also use for describing the property.
> 
> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> Cc: Eric Blake <eblake@redhat.com>
> Cc: Markus Armbruster <armbru@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>

-- 
Eduardo



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 13/15] hmp: Print "share" property of memory backends with "info memdev"
  2021-04-21 12:26 ` [PATCH v6 13/15] hmp: Print "share" property of memory backends with "info memdev" David Hildenbrand
@ 2021-04-21 21:04   ` Eduardo Habkost
  2021-04-23 11:15   ` Markus Armbruster
  1 sibling, 0 replies; 47+ messages in thread
From: Eduardo Habkost @ 2021-04-21 21:04 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Marcel Apfelbaum, Michael S. Tsirkin, Markus Armbruster,
	Murilo Opsfelder Araujo, Richard Henderson,
	Dr. David Alan Gilbert, Peter Xu, qemu-devel, Greg Kurz,
	Stefan Hajnoczi, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé

On Wed, Apr 21, 2021 at 02:26:22PM +0200, David Hildenbrand wrote:
> Let's print the property.
> 
> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Cc: Markus Armbruster <armbru@redhat.com>
> Cc: Eric Blake <eblake@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>

-- 
Eduardo



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 14/15] qmp: Include "reserve" property of memory backends
  2021-04-21 12:26 ` [PATCH v6 14/15] qmp: Include "reserve" property of memory backends David Hildenbrand
@ 2021-04-21 21:05   ` Eduardo Habkost
  2021-04-23 11:00   ` Markus Armbruster
  1 sibling, 0 replies; 47+ messages in thread
From: Eduardo Habkost @ 2021-04-21 21:05 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Marcel Apfelbaum, Michael S. Tsirkin, Murilo Opsfelder Araujo,
	Richard Henderson, Dr. David Alan Gilbert, Peter Xu, qemu-devel,
	Greg Kurz, Stefan Hajnoczi, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Markus Armbruster

On Wed, Apr 21, 2021 at 02:26:23PM +0200, David Hildenbrand wrote:
> Let's include the new property.
> 
> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> Cc: Eric Blake <eblake@redhat.com>
> Cc: Markus Armbruster <armbru@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>

-- 
Eduardo



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 15/15] hmp: Print "reserve" property of memory backends with "info memdev"
  2021-04-21 12:26 ` [PATCH v6 15/15] hmp: Print "reserve" property of memory backends with "info memdev" David Hildenbrand
@ 2021-04-21 21:05   ` Eduardo Habkost
  2021-04-23 11:16   ` Markus Armbruster
  1 sibling, 0 replies; 47+ messages in thread
From: Eduardo Habkost @ 2021-04-21 21:05 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Marcel Apfelbaum, Michael S. Tsirkin, Markus Armbruster,
	Murilo Opsfelder Araujo, Richard Henderson,
	Dr. David Alan Gilbert, Peter Xu, qemu-devel, Greg Kurz,
	Stefan Hajnoczi, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé

On Wed, Apr 21, 2021 at 02:26:24PM +0200, David Hildenbrand wrote:
> Let's print the new property.
> 
> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Cc: Markus Armbruster <armbru@redhat.com>
> Cc: Eric Blake <eblake@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>

-- 
Eduardo



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 00/15] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property
  2021-04-21 12:26 [PATCH v6 00/15] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
                   ` (14 preceding siblings ...)
  2021-04-21 12:26 ` [PATCH v6 15/15] hmp: Print "reserve" property of memory backends with "info memdev" David Hildenbrand
@ 2021-04-21 21:06 ` Eduardo Habkost
  2021-04-23 10:35   ` David Hildenbrand
  15 siblings, 1 reply; 47+ messages in thread
From: Eduardo Habkost @ 2021-04-21 21:06 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Marcel Apfelbaum, Michael S. Tsirkin, Murilo Opsfelder Araujo,
	Richard Henderson, Dr. David Alan Gilbert, Peter Xu, qemu-devel,
	Greg Kurz, Stefan Hajnoczi, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé

On Wed, Apr 21, 2021 at 02:26:09PM +0200, David Hildenbrand wrote:
> Based-on: 20210406080126.24010-1-david@redhat.com
> 
> Some cleanups previously sent in other context (resizeable allocations),
> followed by RAM_NORESERVE, implementing it under Linux using MAP_NORESERVE,
> and letting users configure it for memory backens using the "reserve"
> property (default: true).
> 
> MAP_NORESERVE under Linux has in the context of QEMU an effect on
> 1) Private/shared anonymous memory
> -> memory-backend-ram,id=mem0,size=10G
> 2) Private fd-based mappings
> -> memory-backend-file,id=mem0,size=10G,mem-path=/dev/shm/0
> -> memory-backend-memfd,id=mem0,size=10G
> 3) Private/shared hugetlb mappings
> -> memory-backend-memfd,id=mem0,size=10G,hugetlb=on,hugetlbsize=2M
> 
> With MAP_NORESERVE/"reserve=off", we won't be reserving swap space (1/2) or
> huge pages (3) for the whole memory region.
> 
> The target use case is virtio-mem, which dynamically exposes memory
> inside a large, sparse memory area to the VM. MAP_NORESERVE tells the OS
> "this mapping might be very sparse". This essentially allows
> avoiding having to set "/proc/sys/vm/overcommit_memory == 1") when using
> virtio-mem and also supporting hugetlbfs in the future.

For the memory backend and machine core code:

Acked-by: Eduardo Habkost <ehabkost@redhat.com>

-- 
Eduardo



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 00/15] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property
  2021-04-21 21:06 ` [PATCH v6 00/15] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property Eduardo Habkost
@ 2021-04-23 10:35   ` David Hildenbrand
  2021-04-28 16:35     ` Eduardo Habkost
  0 siblings, 1 reply; 47+ messages in thread
From: David Hildenbrand @ 2021-04-23 10:35 UTC (permalink / raw)
  To: Eduardo Habkost, Paolo Bonzini
  Cc: Marcel Apfelbaum, Michael S. Tsirkin, Richard Henderson,
	Dr. David Alan Gilbert, Peter Xu, qemu-devel, Greg Kurz,
	Stefan Hajnoczi, Murilo Opsfelder Araujo, Igor Mammedov,
	Philippe Mathieu-Daudé

On 21.04.21 23:06, Eduardo Habkost wrote:
> On Wed, Apr 21, 2021 at 02:26:09PM +0200, David Hildenbrand wrote:
>> Based-on: 20210406080126.24010-1-david@redhat.com
>>
>> Some cleanups previously sent in other context (resizeable allocations),
>> followed by RAM_NORESERVE, implementing it under Linux using MAP_NORESERVE,
>> and letting users configure it for memory backens using the "reserve"
>> property (default: true).
>>
>> MAP_NORESERVE under Linux has in the context of QEMU an effect on
>> 1) Private/shared anonymous memory
>> -> memory-backend-ram,id=mem0,size=10G
>> 2) Private fd-based mappings
>> -> memory-backend-file,id=mem0,size=10G,mem-path=/dev/shm/0
>> -> memory-backend-memfd,id=mem0,size=10G
>> 3) Private/shared hugetlb mappings
>> -> memory-backend-memfd,id=mem0,size=10G,hugetlb=on,hugetlbsize=2M
>>
>> With MAP_NORESERVE/"reserve=off", we won't be reserving swap space (1/2) or
>> huge pages (3) for the whole memory region.
>>
>> The target use case is virtio-mem, which dynamically exposes memory
>> inside a large, sparse memory area to the VM. MAP_NORESERVE tells the OS
>> "this mapping might be very sparse". This essentially allows
>> avoiding having to set "/proc/sys/vm/overcommit_memory == 1") when using
>> virtio-mem and also supporting hugetlbfs in the future.
> 
> For the memory backend and machine core code:
> 
> Acked-by: Eduardo Habkost <ehabkost@redhat.com>
> 

Thanks Eduardo. I assume Paolo will be the right maintainer to pick this 
up? (I hope he'll have time to look at this and some of my 
pending-for-a-long-time series soon :( )

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 11/15] qmp: Clarify memory backend properties returned via query-memdev
  2021-04-21 12:26 ` [PATCH v6 11/15] qmp: Clarify memory backend properties returned via query-memdev David Hildenbrand
  2021-04-21 21:00   ` Eduardo Habkost
@ 2021-04-23 10:55   ` Markus Armbruster
  1 sibling, 0 replies; 47+ messages in thread
From: Markus Armbruster @ 2021-04-23 10:55 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Marcel Apfelbaum, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Peter Xu, Dr. David Alan Gilbert,
	Greg Kurz, Paolo Bonzini, Stefan Hajnoczi,
	Murilo Opsfelder Araujo, Igor Mammedov,
	Philippe Mathieu-Daudé

David Hildenbrand <david@redhat.com> writes:

> We return information on the currently configured memory backends and
> don't configure them, so decribe what the currently set properties
> express.
>
> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> Suggested-by: Markus Armbruster <armbru@redhat.com>
> Cc: Eric Blake <eblake@redhat.com>
> Cc: Markus Armbruster <armbru@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  qapi/machine.json | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/qapi/machine.json b/qapi/machine.json
> index 6e90d463fc..758b901185 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -790,11 +790,11 @@
>  #
>  # @size: memory backend size
>  #
> -# @merge: enables or disables memory merge support
> +# @merge: whether memory merge support is enabled
>  #
> -# @dump: includes memory backend's memory in a core dump or not
> +# @dump: whether memory backend's memory is included in a core dump
>  #
> -# @prealloc: enables or disables memory preallocation
> +# @prealloc: whether memory was preallocated
>  #
>  # @host-nodes: host nodes for its memory policy
>  #

Reviewed-by: Markus Armbruster <armbru@redhat.com>



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 14/15] qmp: Include "reserve" property of memory backends
  2021-04-21 12:26 ` [PATCH v6 14/15] qmp: Include "reserve" property of memory backends David Hildenbrand
  2021-04-21 21:05   ` Eduardo Habkost
@ 2021-04-23 11:00   ` Markus Armbruster
  2021-04-23 11:03     ` David Hildenbrand
  1 sibling, 1 reply; 47+ messages in thread
From: Markus Armbruster @ 2021-04-23 11:00 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Marcel Apfelbaum, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Peter Xu, Dr. David Alan Gilbert,
	Greg Kurz, Paolo Bonzini, Stefan Hajnoczi,
	Murilo Opsfelder Araujo, Igor Mammedov,
	Philippe Mathieu-Daudé

David Hildenbrand <david@redhat.com> writes:

> Let's include the new property.
>
> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> Cc: Eric Blake <eblake@redhat.com>
> Cc: Markus Armbruster <armbru@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  hw/core/machine-qmp-cmds.c | 1 +
>  qapi/machine.json          | 4 ++++
>  2 files changed, 5 insertions(+)
>
> diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
> index d41db5b93b..2d135ecdd0 100644
> --- a/hw/core/machine-qmp-cmds.c
> +++ b/hw/core/machine-qmp-cmds.c
> @@ -175,6 +175,7 @@ static int query_memdev(Object *obj, void *opaque)
>          m->dump = object_property_get_bool(obj, "dump", &error_abort);
>          m->prealloc = object_property_get_bool(obj, "prealloc", &error_abort);
>          m->share = object_property_get_bool(obj, "share", &error_abort);
> +        m->reserve = object_property_get_bool(obj, "reserve", &error_abort);
>          m->policy = object_property_get_enum(obj, "policy", "HostMemPolicy",
>                                               &error_abort);
>          host_nodes = object_property_get_qobject(obj,
> diff --git a/qapi/machine.json b/qapi/machine.json
> index 32650bfe9e..5932139d20 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -798,6 +798,9 @@
>  #
>  # @share: whether memory is private to QEMU or shared (since 6.1)
>  #
> +# @reserve: whether swap space (or huge pages) was reserved if applicable
> +#           (since 6.1)
> +#
>  # @host-nodes: host nodes for its memory policy
>  #
>  # @policy: memory policy of memory backend
> @@ -812,6 +815,7 @@
>      'dump':       'bool',
>      'prealloc':   'bool',
>      'share':      'bool',
> +    'reserve':    'bool',
>      'host-nodes': ['uint16'],
>      'policy':     'HostMemPolicy' }}

Double-checking: true means definitely reserved, and false means
definitely not reserved.  Correct?

I'm asking because the discussion of v4 + passage of time = me
uncertain.



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 14/15] qmp: Include "reserve" property of memory backends
  2021-04-23 11:00   ` Markus Armbruster
@ 2021-04-23 11:03     ` David Hildenbrand
  2021-04-23 11:21       ` Markus Armbruster
  0 siblings, 1 reply; 47+ messages in thread
From: David Hildenbrand @ 2021-04-23 11:03 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Marcel Apfelbaum, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Peter Xu, Dr. David Alan Gilbert,
	Greg Kurz, Paolo Bonzini, Stefan Hajnoczi,
	Murilo Opsfelder Araujo, Igor Mammedov,
	Philippe Mathieu-Daudé

On 23.04.21 13:00, Markus Armbruster wrote:
> David Hildenbrand <david@redhat.com> writes:
> 
>> Let's include the new property.
>>
>> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
>> Cc: Eric Blake <eblake@redhat.com>
>> Cc: Markus Armbruster <armbru@redhat.com>
>> Cc: Igor Mammedov <imammedo@redhat.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>   hw/core/machine-qmp-cmds.c | 1 +
>>   qapi/machine.json          | 4 ++++
>>   2 files changed, 5 insertions(+)
>>
>> diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
>> index d41db5b93b..2d135ecdd0 100644
>> --- a/hw/core/machine-qmp-cmds.c
>> +++ b/hw/core/machine-qmp-cmds.c
>> @@ -175,6 +175,7 @@ static int query_memdev(Object *obj, void *opaque)
>>           m->dump = object_property_get_bool(obj, "dump", &error_abort);
>>           m->prealloc = object_property_get_bool(obj, "prealloc", &error_abort);
>>           m->share = object_property_get_bool(obj, "share", &error_abort);
>> +        m->reserve = object_property_get_bool(obj, "reserve", &error_abort);
>>           m->policy = object_property_get_enum(obj, "policy", "HostMemPolicy",
>>                                                &error_abort);
>>           host_nodes = object_property_get_qobject(obj,
>> diff --git a/qapi/machine.json b/qapi/machine.json
>> index 32650bfe9e..5932139d20 100644
>> --- a/qapi/machine.json
>> +++ b/qapi/machine.json
>> @@ -798,6 +798,9 @@
>>   #
>>   # @share: whether memory is private to QEMU or shared (since 6.1)
>>   #
>> +# @reserve: whether swap space (or huge pages) was reserved if applicable
>> +#           (since 6.1)
>> +#
>>   # @host-nodes: host nodes for its memory policy
>>   #
>>   # @policy: memory policy of memory backend
>> @@ -812,6 +815,7 @@
>>       'dump':       'bool',
>>       'prealloc':   'bool',
>>       'share':      'bool',
>> +    'reserve':    'bool',
>>       'host-nodes': ['uint16'],
>>       'policy':     'HostMemPolicy' }}
> 
> Double-checking: true means definitely reserved, and false means
> definitely not reserved.  Correct?

True means "reserved if applicable" which means "not reserved if not 
applicable". False means "definitely not reserved".

(any recommendation how to rephrase are appreciated; I tried my best -- 
this interface here makes it especially hard -- it's easier for the 
property itself)

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 10/15] hostmem: Wire up RAM_NORESERVE via "reserve" property
  2021-04-21 12:26 ` [PATCH v6 10/15] hostmem: Wire up RAM_NORESERVE via "reserve" property David Hildenbrand
@ 2021-04-23 11:14   ` Markus Armbruster
  2021-04-23 11:16     ` David Hildenbrand
  2021-04-23 11:18     ` David Hildenbrand
  0 siblings, 2 replies; 47+ messages in thread
From: Markus Armbruster @ 2021-04-23 11:14 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Marcel Apfelbaum, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Peter Xu, Dr. David Alan Gilbert,
	Greg Kurz, Paolo Bonzini, Stefan Hajnoczi,
	Murilo Opsfelder Araujo, Igor Mammedov,
	Philippe Mathieu-Daudé

David Hildenbrand <david@redhat.com> writes:

> Let's provide a way to control the use of RAM_NORESERVE via memory
> backends using the "reserve" property which defaults to true (old
> behavior).
>
> Only Linux currently supports clearing the flag (and support is checked at
> runtime, depending on the setting of "/proc/sys/vm/overcommit_memory").
> Windows and other POSIX systems will bail out with "reserve=false".
>
> The target use case is virtio-mem, which dynamically exposes memory
> inside a large, sparse memory area to the VM. This essentially allows
> avoiding to set "/proc/sys/vm/overcommit_memory == 0") when using
> virtio-mem and also supporting hugetlbfs in the future.
>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Markus Armbruster <armbru@redhat.com>
> Cc: Eric Blake <eblake@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  backends/hostmem-file.c  | 11 ++++++-----
>  backends/hostmem-memfd.c |  1 +
>  backends/hostmem-ram.c   |  1 +
>  backends/hostmem.c       | 32 ++++++++++++++++++++++++++++++++
>  include/sysemu/hostmem.h |  2 +-
>  qapi/qom.json            |  4 ++++
>  6 files changed, 45 insertions(+), 6 deletions(-)
>
> diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
> index b683da9daf..9d550e53d4 100644
> --- a/backends/hostmem-file.c
> +++ b/backends/hostmem-file.c
> @@ -40,6 +40,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>                 object_get_typename(OBJECT(backend)));
>  #else
>      HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(backend);
> +    uint32_t ram_flags;
>      gchar *name;
>  
>      if (!backend->size) {
> @@ -52,11 +53,11 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>      }
>  
>      name = host_memory_backend_get_name(backend);
> -    memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
> -                                     name,
> -                                     backend->size, fb->align,
> -                                     (backend->share ? RAM_SHARED : 0) |
> -                                     (fb->is_pmem ? RAM_PMEM : 0),
> +    ram_flags = backend->share ? RAM_SHARED : 0;
> +    ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
> +    ram_flags |= fb->is_pmem ? RAM_PMEM : 0;
> +    memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), name,
> +                                     backend->size, fb->align, ram_flags,
>                                       fb->mem_path, fb->readonly, errp);
>      g_free(name);
>  #endif
> diff --git a/backends/hostmem-memfd.c b/backends/hostmem-memfd.c
> index 93b5d1a4cf..f3436b623d 100644
> --- a/backends/hostmem-memfd.c
> +++ b/backends/hostmem-memfd.c
> @@ -55,6 +55,7 @@ memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>  
>      name = host_memory_backend_get_name(backend);
>      ram_flags = backend->share ? RAM_SHARED : 0;
> +    ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
>      memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend), name,
>                                     backend->size, ram_flags, fd, 0, errp);
>      g_free(name);
> diff --git a/backends/hostmem-ram.c b/backends/hostmem-ram.c
> index 741e701062..b8e55cdbd0 100644
> --- a/backends/hostmem-ram.c
> +++ b/backends/hostmem-ram.c
> @@ -29,6 +29,7 @@ ram_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>  
>      name = host_memory_backend_get_name(backend);
>      ram_flags = backend->share ? RAM_SHARED : 0;
> +    ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
>      memory_region_init_ram_flags_nomigrate(&backend->mr, OBJECT(backend), name,
>                                             backend->size, ram_flags, errp);
>      g_free(name);

As the commit message says, @reserve translates to RAM_NORESERVE.  Good.

I figure passing RAM_NORESERVE can't make these functions fail.
Correct?

@reserve defaults to true.  The commit message assures us this gives us
the old behavior.  Good.  But the patch *adds* flag RAM_NORESERVE when
it is true.  Now I'm confused.

> diff --git a/backends/hostmem.c b/backends/hostmem.c
> index c6c1ff5b99..58fdc1b658 100644
> --- a/backends/hostmem.c
> +++ b/backends/hostmem.c
> @@ -217,6 +217,11 @@ static void host_memory_backend_set_prealloc(Object *obj, bool value,
>      Error *local_err = NULL;
>      HostMemoryBackend *backend = MEMORY_BACKEND(obj);
>  
> +    if (!backend->reserve && value) {
> +        error_setg(errp, "'prealloc=on' and 'reserve=off' are incompatible");
> +        return;
> +    }

Aha.  Shouldn't this be documented in qom.json?

> +
>      if (!host_memory_backend_mr_inited(backend)) {
>          backend->prealloc = value;
>          return;
> @@ -268,6 +273,7 @@ static void host_memory_backend_init(Object *obj)
>      /* TODO: convert access to globals to compat properties */
>      backend->merge = machine_mem_merge(machine);
>      backend->dump = machine_dump_guest_core(machine);
> +    backend->reserve = true;
>      backend->prealloc_threads = 1;
>  }
>  
> @@ -426,6 +432,28 @@ static void host_memory_backend_set_share(Object *o, bool value, Error **errp)
>      backend->share = value;
>  }
>  
> +static bool host_memory_backend_get_reserve(Object *o, Error **errp)
> +{
> +    HostMemoryBackend *backend = MEMORY_BACKEND(o);
> +
> +    return backend->reserve;
> +}
> +
> +static void host_memory_backend_set_reserve(Object *o, bool value, Error **errp)
> +{
> +    HostMemoryBackend *backend = MEMORY_BACKEND(o);
> +
> +    if (host_memory_backend_mr_inited(backend)) {
> +        error_setg(errp, "cannot change property value");
> +        return;
> +    }
> +    if (backend->prealloc && !value) {
> +        error_setg(errp, "'prealloc=on' and 'reserve=off' are incompatible");
> +        return;
> +    }
> +    backend->reserve = value;
> +}
> +
>  static bool
>  host_memory_backend_get_use_canonical_path(Object *obj, Error **errp)
>  {
> @@ -494,6 +522,10 @@ host_memory_backend_class_init(ObjectClass *oc, void *data)
>          host_memory_backend_get_share, host_memory_backend_set_share);
>      object_class_property_set_description(oc, "share",
>          "Mark the memory as private to QEMU or shared");
> +    object_class_property_add_bool(oc, "reserve",
> +        host_memory_backend_get_reserve, host_memory_backend_set_reserve);
> +    object_class_property_set_description(oc, "reserve",
> +        "Reserve swap space (or huge pages) if applicable");
>      /*
>       * Do not delete/rename option. This option must be considered stable
>       * (as if it didn't have the 'x-' prefix including deprecation period) as
> diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h
> index df5644723a..9ff5c16963 100644
> --- a/include/sysemu/hostmem.h
> +++ b/include/sysemu/hostmem.h
> @@ -64,7 +64,7 @@ struct HostMemoryBackend {
>      /* protected */
>      uint64_t size;
>      bool merge, dump, use_canonical_path;
> -    bool prealloc, is_mapped, share;
> +    bool prealloc, is_mapped, share, reserve;
>      uint32_t prealloc_threads;
>      DECLARE_BITMAP(host_nodes, MAX_NODES + 1);
>      HostMemPolicy policy;
> diff --git a/qapi/qom.json b/qapi/qom.json
> index cd0e76d564..e9b86893a5 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -545,6 +545,9 @@
>  # @share: if false, the memory is private to QEMU; if true, it is shared
>  #         (default: false)
>  #
> +# @reserve: if true, reserve swap space (or huge pages) if applicable
> +#           default: true)
> +#
>  # @size: size of the memory region in bytes
>  #
>  # @x-use-canonical-path-for-ramblock-id: if true, the canoncial path is used
> @@ -566,6 +569,7 @@
>              '*prealloc': 'bool',
>              '*prealloc-threads': 'uint32',
>              '*share': 'bool',
> +            '*reserve': 'bool',
>              'size': 'size',
>              '*x-use-canonical-path-for-ramblock-id': 'bool' } }



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 12/15] qmp: Include "share" property of memory backends
  2021-04-21 12:26 ` [PATCH v6 12/15] qmp: Include "share" property of memory backends David Hildenbrand
  2021-04-21 21:01   ` Eduardo Habkost
@ 2021-04-23 11:15   ` Markus Armbruster
  1 sibling, 0 replies; 47+ messages in thread
From: Markus Armbruster @ 2021-04-23 11:15 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Marcel Apfelbaum, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Peter Xu, Dr. David Alan Gilbert,
	Greg Kurz, Paolo Bonzini, Stefan Hajnoczi,
	Murilo Opsfelder Araujo, Igor Mammedov,
	Philippe Mathieu-Daudé

David Hildenbrand <david@redhat.com> writes:

> Let's include the property, which can be helpful when debugging,
> for example, to spot misuse of MAP_PRIVATE which can result in some ugly
> corner cases (e.g., double-memory consumption on shmem).
>
> Use the same description we also use for describing the property.
>
> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> Cc: Eric Blake <eblake@redhat.com>
> Cc: Markus Armbruster <armbru@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  hw/core/machine-qmp-cmds.c | 1 +
>  qapi/machine.json          | 3 +++
>  2 files changed, 4 insertions(+)
>
> diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
> index 68a942595a..d41db5b93b 100644
> --- a/hw/core/machine-qmp-cmds.c
> +++ b/hw/core/machine-qmp-cmds.c
> @@ -174,6 +174,7 @@ static int query_memdev(Object *obj, void *opaque)
>          m->merge = object_property_get_bool(obj, "merge", &error_abort);
>          m->dump = object_property_get_bool(obj, "dump", &error_abort);
>          m->prealloc = object_property_get_bool(obj, "prealloc", &error_abort);
> +        m->share = object_property_get_bool(obj, "share", &error_abort);
>          m->policy = object_property_get_enum(obj, "policy", "HostMemPolicy",
>                                               &error_abort);
>          host_nodes = object_property_get_qobject(obj,
> diff --git a/qapi/machine.json b/qapi/machine.json
> index 758b901185..32650bfe9e 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -796,6 +796,8 @@
>  #
>  # @prealloc: whether memory was preallocated
>  #
> +# @share: whether memory is private to QEMU or shared (since 6.1)
> +#
>  # @host-nodes: host nodes for its memory policy
>  #
>  # @policy: memory policy of memory backend
> @@ -809,6 +811,7 @@
>      'merge':      'bool',
>      'dump':       'bool',
>      'prealloc':   'bool',
> +    'share':      'bool',
>      'host-nodes': ['uint16'],
>      'policy':     'HostMemPolicy' }}

Reviewed-by: Markus Armbruster <armbru@redhat.com>



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 13/15] hmp: Print "share" property of memory backends with "info memdev"
  2021-04-21 12:26 ` [PATCH v6 13/15] hmp: Print "share" property of memory backends with "info memdev" David Hildenbrand
  2021-04-21 21:04   ` Eduardo Habkost
@ 2021-04-23 11:15   ` Markus Armbruster
  1 sibling, 0 replies; 47+ messages in thread
From: Markus Armbruster @ 2021-04-23 11:15 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Marcel Apfelbaum, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Peter Xu, Dr. David Alan Gilbert,
	Greg Kurz, Paolo Bonzini, Stefan Hajnoczi,
	Murilo Opsfelder Araujo, Igor Mammedov,
	Philippe Mathieu-Daudé

David Hildenbrand <david@redhat.com> writes:

> Let's print the property.
>
> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Cc: Markus Armbruster <armbru@redhat.com>
> Cc: Eric Blake <eblake@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  hw/core/machine-hmp-cmds.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/hw/core/machine-hmp-cmds.c b/hw/core/machine-hmp-cmds.c
> index 58248cffa3..004a92b3d6 100644
> --- a/hw/core/machine-hmp-cmds.c
> +++ b/hw/core/machine-hmp-cmds.c
> @@ -110,6 +110,8 @@ void hmp_info_memdev(Monitor *mon, const QDict *qdict)
>                         m->value->dump ? "true" : "false");
>          monitor_printf(mon, "  prealloc: %s\n",
>                         m->value->prealloc ? "true" : "false");
> +        monitor_printf(mon, "  share: %s\n",
> +                       m->value->share ? "true" : "false");
>          monitor_printf(mon, "  policy: %s\n",
>                         HostMemPolicy_str(m->value->policy));
>          visit_complete(v, &str);

Reviewed-by: Markus Armbruster <armbru@redhat.com>



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 15/15] hmp: Print "reserve" property of memory backends with "info memdev"
  2021-04-21 12:26 ` [PATCH v6 15/15] hmp: Print "reserve" property of memory backends with "info memdev" David Hildenbrand
  2021-04-21 21:05   ` Eduardo Habkost
@ 2021-04-23 11:16   ` Markus Armbruster
  1 sibling, 0 replies; 47+ messages in thread
From: Markus Armbruster @ 2021-04-23 11:16 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Marcel Apfelbaum, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Peter Xu, Dr. David Alan Gilbert,
	Greg Kurz, Paolo Bonzini, Stefan Hajnoczi,
	Murilo Opsfelder Araujo, Igor Mammedov,
	Philippe Mathieu-Daudé

David Hildenbrand <david@redhat.com> writes:

> Let's print the new property.
>
> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Cc: Markus Armbruster <armbru@redhat.com>
> Cc: Eric Blake <eblake@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  hw/core/machine-hmp-cmds.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/hw/core/machine-hmp-cmds.c b/hw/core/machine-hmp-cmds.c
> index 004a92b3d6..9bedc77bb4 100644
> --- a/hw/core/machine-hmp-cmds.c
> +++ b/hw/core/machine-hmp-cmds.c
> @@ -112,6 +112,8 @@ void hmp_info_memdev(Monitor *mon, const QDict *qdict)
>                         m->value->prealloc ? "true" : "false");
>          monitor_printf(mon, "  share: %s\n",
>                         m->value->share ? "true" : "false");
> +        monitor_printf(mon, "  reserve: %s\n",
> +                       m->value->reserve ? "true" : "false");
>          monitor_printf(mon, "  policy: %s\n",
>                         HostMemPolicy_str(m->value->policy));
>          visit_complete(v, &str);

Reviewed-by: Markus Armbruster <armbru@redhat.com>



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 10/15] hostmem: Wire up RAM_NORESERVE via "reserve" property
  2021-04-23 11:14   ` Markus Armbruster
@ 2021-04-23 11:16     ` David Hildenbrand
  2021-04-23 12:11       ` Markus Armbruster
  2021-04-23 11:18     ` David Hildenbrand
  1 sibling, 1 reply; 47+ messages in thread
From: David Hildenbrand @ 2021-04-23 11:16 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Marcel Apfelbaum, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Peter Xu, Dr. David Alan Gilbert,
	Greg Kurz, Paolo Bonzini, Stefan Hajnoczi,
	Murilo Opsfelder Araujo, Igor Mammedov,
	Philippe Mathieu-Daudé

On 23.04.21 13:14, Markus Armbruster wrote:
> David Hildenbrand <david@redhat.com> writes:
> 
>> Let's provide a way to control the use of RAM_NORESERVE via memory
>> backends using the "reserve" property which defaults to true (old
>> behavior).
>>
>> Only Linux currently supports clearing the flag (and support is checked at
>> runtime, depending on the setting of "/proc/sys/vm/overcommit_memory").
>> Windows and other POSIX systems will bail out with "reserve=false".
>>
>> The target use case is virtio-mem, which dynamically exposes memory
>> inside a large, sparse memory area to the VM. This essentially allows
>> avoiding to set "/proc/sys/vm/overcommit_memory == 0") when using
>> virtio-mem and also supporting hugetlbfs in the future.
>>
>> Reviewed-by: Peter Xu <peterx@redhat.com>
>> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
>> Cc: Markus Armbruster <armbru@redhat.com>
>> Cc: Eric Blake <eblake@redhat.com>
>> Cc: Igor Mammedov <imammedo@redhat.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>   backends/hostmem-file.c  | 11 ++++++-----
>>   backends/hostmem-memfd.c |  1 +
>>   backends/hostmem-ram.c   |  1 +
>>   backends/hostmem.c       | 32 ++++++++++++++++++++++++++++++++
>>   include/sysemu/hostmem.h |  2 +-
>>   qapi/qom.json            |  4 ++++
>>   6 files changed, 45 insertions(+), 6 deletions(-)
>>
>> diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
>> index b683da9daf..9d550e53d4 100644
>> --- a/backends/hostmem-file.c
>> +++ b/backends/hostmem-file.c
>> @@ -40,6 +40,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>>                  object_get_typename(OBJECT(backend)));
>>   #else
>>       HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(backend);
>> +    uint32_t ram_flags;
>>       gchar *name;
>>   
>>       if (!backend->size) {
>> @@ -52,11 +53,11 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>>       }
>>   
>>       name = host_memory_backend_get_name(backend);
>> -    memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
>> -                                     name,
>> -                                     backend->size, fb->align,
>> -                                     (backend->share ? RAM_SHARED : 0) |
>> -                                     (fb->is_pmem ? RAM_PMEM : 0),
>> +    ram_flags = backend->share ? RAM_SHARED : 0;
>> +    ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
>> +    ram_flags |= fb->is_pmem ? RAM_PMEM : 0;
>> +    memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), name,
>> +                                     backend->size, fb->align, ram_flags,
>>                                        fb->mem_path, fb->readonly, errp);
>>       g_free(name);
>>   #endif
>> diff --git a/backends/hostmem-memfd.c b/backends/hostmem-memfd.c
>> index 93b5d1a4cf..f3436b623d 100644
>> --- a/backends/hostmem-memfd.c
>> +++ b/backends/hostmem-memfd.c
>> @@ -55,6 +55,7 @@ memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>>   
>>       name = host_memory_backend_get_name(backend);
>>       ram_flags = backend->share ? RAM_SHARED : 0;
>> +    ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
>>       memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend), name,
>>                                      backend->size, ram_flags, fd, 0, errp);
>>       g_free(name);
>> diff --git a/backends/hostmem-ram.c b/backends/hostmem-ram.c
>> index 741e701062..b8e55cdbd0 100644
>> --- a/backends/hostmem-ram.c
>> +++ b/backends/hostmem-ram.c
>> @@ -29,6 +29,7 @@ ram_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>>   
>>       name = host_memory_backend_get_name(backend);
>>       ram_flags = backend->share ? RAM_SHARED : 0;
>> +    ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
>>       memory_region_init_ram_flags_nomigrate(&backend->mr, OBJECT(backend), name,
>>                                              backend->size, ram_flags, errp);
>>       g_free(name);
> 
> As the commit message says, @reserve translates to RAM_NORESERVE.  Good.
> 
> I figure passing RAM_NORESERVE can't make these functions fail.
> Correct?
> 
> @reserve defaults to true.  The commit message assures us this gives us
> the old behavior.  Good.  But the patch *adds* flag RAM_NORESERVE when
> it is true.  Now I'm confused.

ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;

translates to

if (!backend->reserve)
	ram_flags |= RAM_NORESERVE;

I thought for a while if calling the property "noreserve" would be 
cleaner, but decided against it.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 10/15] hostmem: Wire up RAM_NORESERVE via "reserve" property
  2021-04-23 11:14   ` Markus Armbruster
  2021-04-23 11:16     ` David Hildenbrand
@ 2021-04-23 11:18     ` David Hildenbrand
  2021-04-23 12:10       ` Markus Armbruster
  1 sibling, 1 reply; 47+ messages in thread
From: David Hildenbrand @ 2021-04-23 11:18 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Marcel Apfelbaum, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Peter Xu, Dr. David Alan Gilbert,
	Greg Kurz, Paolo Bonzini, Stefan Hajnoczi,
	Murilo Opsfelder Araujo, Igor Mammedov,
	Philippe Mathieu-Daudé

>> diff --git a/backends/hostmem.c b/backends/hostmem.c
>> index c6c1ff5b99..58fdc1b658 100644
>> --- a/backends/hostmem.c
>> +++ b/backends/hostmem.c
>> @@ -217,6 +217,11 @@ static void host_memory_backend_set_prealloc(Object *obj, bool value,
>>       Error *local_err = NULL;
>>       HostMemoryBackend *backend = MEMORY_BACKEND(obj);
>>   
>> +    if (!backend->reserve && value) {
>> +        error_setg(errp, "'prealloc=on' and 'reserve=off' are incompatible");
>> +        return;
>> +    }
> 
> Aha.  Shouldn't this be documented in qom.json?
> 

Whoops, skipped that comment. Can add it if that's the place to document 
that.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 14/15] qmp: Include "reserve" property of memory backends
  2021-04-23 11:03     ` David Hildenbrand
@ 2021-04-23 11:21       ` Markus Armbruster
  2021-04-23 11:33         ` David Hildenbrand
  0 siblings, 1 reply; 47+ messages in thread
From: Markus Armbruster @ 2021-04-23 11:21 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Marcel Apfelbaum, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Peter Xu, Dr. David Alan Gilbert,
	Greg Kurz, Stefan Hajnoczi, Murilo Opsfelder Araujo,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Igor Mammedov

David Hildenbrand <david@redhat.com> writes:

> On 23.04.21 13:00, Markus Armbruster wrote:
>> David Hildenbrand <david@redhat.com> writes:
>> 
>>> Let's include the new property.
>>>
>>> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
>>> Cc: Eric Blake <eblake@redhat.com>
>>> Cc: Markus Armbruster <armbru@redhat.com>
>>> Cc: Igor Mammedov <imammedo@redhat.com>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>>   hw/core/machine-qmp-cmds.c | 1 +
>>>   qapi/machine.json          | 4 ++++
>>>   2 files changed, 5 insertions(+)
>>>
>>> diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
>>> index d41db5b93b..2d135ecdd0 100644
>>> --- a/hw/core/machine-qmp-cmds.c
>>> +++ b/hw/core/machine-qmp-cmds.c
>>> @@ -175,6 +175,7 @@ static int query_memdev(Object *obj, void *opaque)
>>>           m->dump = object_property_get_bool(obj, "dump", &error_abort);
>>>           m->prealloc = object_property_get_bool(obj, "prealloc", &error_abort);
>>>           m->share = object_property_get_bool(obj, "share", &error_abort);
>>> +        m->reserve = object_property_get_bool(obj, "reserve", &error_abort);
>>>           m->policy = object_property_get_enum(obj, "policy", "HostMemPolicy",
>>>                                                &error_abort);
>>>           host_nodes = object_property_get_qobject(obj,
>>> diff --git a/qapi/machine.json b/qapi/machine.json
>>> index 32650bfe9e..5932139d20 100644
>>> --- a/qapi/machine.json
>>> +++ b/qapi/machine.json
>>> @@ -798,6 +798,9 @@
>>>   #
>>>   # @share: whether memory is private to QEMU or shared (since 6.1)
>>>   #
>>> +# @reserve: whether swap space (or huge pages) was reserved if applicable
>>> +#           (since 6.1)
>>> +#
>>>   # @host-nodes: host nodes for its memory policy
>>>   #
>>>   # @policy: memory policy of memory backend
>>> @@ -812,6 +815,7 @@
>>>       'dump':       'bool',
>>>       'prealloc':   'bool',
>>>       'share':      'bool',
>>> +    'reserve':    'bool',
>>>       'host-nodes': ['uint16'],
>>>       'policy':     'HostMemPolicy' }}
>> 
>> Double-checking: true means definitely reserved, and false means
>> definitely not reserved.  Correct?
>
> True means "reserved if applicable" which means "not reserved if not 
> applicable". False means "definitely not reserved".
>
> (any recommendation how to rephrase are appreciated; I tried my best -- 
> this interface here makes it especially hard -- it's easier for the 
> property itself)

When is it "applicable"?



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 14/15] qmp: Include "reserve" property of memory backends
  2021-04-23 11:21       ` Markus Armbruster
@ 2021-04-23 11:33         ` David Hildenbrand
  2021-04-23 12:13           ` Markus Armbruster
  0 siblings, 1 reply; 47+ messages in thread
From: David Hildenbrand @ 2021-04-23 11:33 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Marcel Apfelbaum, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Peter Xu, Dr. David Alan Gilbert,
	Greg Kurz, Stefan Hajnoczi, Murilo Opsfelder Araujo,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Igor Mammedov

On 23.04.21 13:21, Markus Armbruster wrote:
> David Hildenbrand <david@redhat.com> writes:
> 
>> On 23.04.21 13:00, Markus Armbruster wrote:
>>> David Hildenbrand <david@redhat.com> writes:
>>>
>>>> Let's include the new property.
>>>>
>>>> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
>>>> Cc: Eric Blake <eblake@redhat.com>
>>>> Cc: Markus Armbruster <armbru@redhat.com>
>>>> Cc: Igor Mammedov <imammedo@redhat.com>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> ---
>>>>    hw/core/machine-qmp-cmds.c | 1 +
>>>>    qapi/machine.json          | 4 ++++
>>>>    2 files changed, 5 insertions(+)
>>>>
>>>> diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
>>>> index d41db5b93b..2d135ecdd0 100644
>>>> --- a/hw/core/machine-qmp-cmds.c
>>>> +++ b/hw/core/machine-qmp-cmds.c
>>>> @@ -175,6 +175,7 @@ static int query_memdev(Object *obj, void *opaque)
>>>>            m->dump = object_property_get_bool(obj, "dump", &error_abort);
>>>>            m->prealloc = object_property_get_bool(obj, "prealloc", &error_abort);
>>>>            m->share = object_property_get_bool(obj, "share", &error_abort);
>>>> +        m->reserve = object_property_get_bool(obj, "reserve", &error_abort);
>>>>            m->policy = object_property_get_enum(obj, "policy", "HostMemPolicy",
>>>>                                                 &error_abort);
>>>>            host_nodes = object_property_get_qobject(obj,
>>>> diff --git a/qapi/machine.json b/qapi/machine.json
>>>> index 32650bfe9e..5932139d20 100644
>>>> --- a/qapi/machine.json
>>>> +++ b/qapi/machine.json
>>>> @@ -798,6 +798,9 @@
>>>>    #
>>>>    # @share: whether memory is private to QEMU or shared (since 6.1)
>>>>    #
>>>> +# @reserve: whether swap space (or huge pages) was reserved if applicable
>>>> +#           (since 6.1)
>>>> +#
>>>>    # @host-nodes: host nodes for its memory policy
>>>>    #
>>>>    # @policy: memory policy of memory backend
>>>> @@ -812,6 +815,7 @@
>>>>        'dump':       'bool',
>>>>        'prealloc':   'bool',
>>>>        'share':      'bool',
>>>> +    'reserve':    'bool',
>>>>        'host-nodes': ['uint16'],
>>>>        'policy':     'HostMemPolicy' }}
>>>
>>> Double-checking: true means definitely reserved, and false means
>>> definitely not reserved.  Correct?
>>
>> True means "reserved if applicable" which means "not reserved if not
>> applicable". False means "definitely not reserved".
>>
>> (any recommendation how to rephrase are appreciated; I tried my best --
>> this interface here makes it especially hard -- it's easier for the
>> property itself)
> 
> When is it "applicable"?
> 

When the OS supports it for the memory type and it hasn't been disabled.

Linux handling as described in
   [PATCH v6 09/15] util/mmap-alloc: Support RAM_NORESERVE via
   MAP_NORESERVE under Linux
and

   https://www.kernel.org/doc/Documentation/vm/overcommit-accounting

Summary *without* MAP_NORESERVE:

a) !Hugetlbfs with Memory overcommit disabled via
     ("/proc/sys/vm/overcommit_memory == 2"): never

b) !Hugetlbfs with Memory overcommit enabled
   1) Shared mappings on files: never

   2) Private mappings on files: only when writable (for us always)

   3) Shared anonymous memory: always

   4) Private anonymous memory: only when writable (for us always)

c) Hugetlbfs: Always


Under Windows: always. On other POSIX besides Linux -- don't know.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 10/15] hostmem: Wire up RAM_NORESERVE via "reserve" property
  2021-04-23 11:18     ` David Hildenbrand
@ 2021-04-23 12:10       ` Markus Armbruster
  2021-04-23 14:21         ` David Hildenbrand
  0 siblings, 1 reply; 47+ messages in thread
From: Markus Armbruster @ 2021-04-23 12:10 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Marcel Apfelbaum, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Peter Xu, Dr. David Alan Gilbert,
	Greg Kurz, Stefan Hajnoczi, Murilo Opsfelder Araujo,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Igor Mammedov

David Hildenbrand <david@redhat.com> writes:

>>> diff --git a/backends/hostmem.c b/backends/hostmem.c
>>> index c6c1ff5b99..58fdc1b658 100644
>>> --- a/backends/hostmem.c
>>> +++ b/backends/hostmem.c
>>> @@ -217,6 +217,11 @@ static void host_memory_backend_set_prealloc(Object *obj, bool value,
>>>       Error *local_err = NULL;
>>>       HostMemoryBackend *backend = MEMORY_BACKEND(obj);
>>>   +    if (!backend->reserve && value) {
>>> +        error_setg(errp, "'prealloc=on' and 'reserve=off' are incompatible");
>>> +        return;
>>> +    }
>> 
>> Aha.  Shouldn't this be documented in qom.json?
>
> Whoops, skipped that comment. Can add it if that's the place to
> document that.

Yes, please.  .json doc comments is where we document the external
interface.



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 10/15] hostmem: Wire up RAM_NORESERVE via "reserve" property
  2021-04-23 11:16     ` David Hildenbrand
@ 2021-04-23 12:11       ` Markus Armbruster
  0 siblings, 0 replies; 47+ messages in thread
From: Markus Armbruster @ 2021-04-23 12:11 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Marcel Apfelbaum, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Peter Xu, Dr. David Alan Gilbert,
	Greg Kurz, Stefan Hajnoczi, Murilo Opsfelder Araujo,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Igor Mammedov

David Hildenbrand <david@redhat.com> writes:

> On 23.04.21 13:14, Markus Armbruster wrote:
>> David Hildenbrand <david@redhat.com> writes:
>> 
>>> Let's provide a way to control the use of RAM_NORESERVE via memory
>>> backends using the "reserve" property which defaults to true (old
>>> behavior).
>>>
>>> Only Linux currently supports clearing the flag (and support is checked at
>>> runtime, depending on the setting of "/proc/sys/vm/overcommit_memory").
>>> Windows and other POSIX systems will bail out with "reserve=false".
>>>
>>> The target use case is virtio-mem, which dynamically exposes memory
>>> inside a large, sparse memory area to the VM. This essentially allows
>>> avoiding to set "/proc/sys/vm/overcommit_memory == 0") when using
>>> virtio-mem and also supporting hugetlbfs in the future.
>>>
>>> Reviewed-by: Peter Xu <peterx@redhat.com>
>>> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
>>> Cc: Markus Armbruster <armbru@redhat.com>
>>> Cc: Eric Blake <eblake@redhat.com>
>>> Cc: Igor Mammedov <imammedo@redhat.com>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>>   backends/hostmem-file.c  | 11 ++++++-----
>>>   backends/hostmem-memfd.c |  1 +
>>>   backends/hostmem-ram.c   |  1 +
>>>   backends/hostmem.c       | 32 ++++++++++++++++++++++++++++++++
>>>   include/sysemu/hostmem.h |  2 +-
>>>   qapi/qom.json            |  4 ++++
>>>   6 files changed, 45 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
>>> index b683da9daf..9d550e53d4 100644
>>> --- a/backends/hostmem-file.c
>>> +++ b/backends/hostmem-file.c
>>> @@ -40,6 +40,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>>>                  object_get_typename(OBJECT(backend)));
>>>   #else
>>>       HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(backend);
>>> +    uint32_t ram_flags;
>>>       gchar *name;
>>>         if (!backend->size) {
>>> @@ -52,11 +53,11 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>>>       }
>>>         name = host_memory_backend_get_name(backend);
>>> -    memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
>>> -                                     name,
>>> -                                     backend->size, fb->align,
>>> -                                     (backend->share ? RAM_SHARED : 0) |
>>> -                                     (fb->is_pmem ? RAM_PMEM : 0),
>>> +    ram_flags = backend->share ? RAM_SHARED : 0;
>>> +    ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
>>> +    ram_flags |= fb->is_pmem ? RAM_PMEM : 0;
>>> +    memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), name,
>>> +                                     backend->size, fb->align, ram_flags,
>>>                                        fb->mem_path, fb->readonly, errp);
>>>       g_free(name);
>>>   #endif
>>> diff --git a/backends/hostmem-memfd.c b/backends/hostmem-memfd.c
>>> index 93b5d1a4cf..f3436b623d 100644
>>> --- a/backends/hostmem-memfd.c
>>> +++ b/backends/hostmem-memfd.c
>>> @@ -55,6 +55,7 @@ memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>>>         name = host_memory_backend_get_name(backend);
>>>       ram_flags = backend->share ? RAM_SHARED : 0;
>>> +    ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
>>>       memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend), name,
>>>                                      backend->size, ram_flags, fd, 0, errp);
>>>       g_free(name);
>>> diff --git a/backends/hostmem-ram.c b/backends/hostmem-ram.c
>>> index 741e701062..b8e55cdbd0 100644
>>> --- a/backends/hostmem-ram.c
>>> +++ b/backends/hostmem-ram.c
>>> @@ -29,6 +29,7 @@ ram_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>>>         name = host_memory_backend_get_name(backend);
>>>       ram_flags = backend->share ? RAM_SHARED : 0;
>>> +    ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
>>>       memory_region_init_ram_flags_nomigrate(&backend->mr, OBJECT(backend), name,
>>>                                              backend->size, ram_flags, errp);
>>>       g_free(name);
>>
>> As the commit message says, @reserve translates to RAM_NORESERVE.
>> Good.
>> I figure passing RAM_NORESERVE can't make these functions fail.
>> Correct?
>> @reserve defaults to true.  The commit message assures us this gives
>> us
>> the old behavior.  Good.  But the patch *adds* flag RAM_NORESERVE when
>> it is true.  Now I'm confused.
>
> ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
>
> translates to
>
> if (!backend->reserve)
> 	ram_flags |= RAM_NORESERVE;

You're right!  /me uncrosses eyes...

> I thought for a while if calling the property "noreserve" would be
> cleaner, but decided against it.

I dislike "negative" flag names, too.



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 14/15] qmp: Include "reserve" property of memory backends
  2021-04-23 11:33         ` David Hildenbrand
@ 2021-04-23 12:13           ` Markus Armbruster
  2021-04-23 12:29             ` David Hildenbrand
  0 siblings, 1 reply; 47+ messages in thread
From: Markus Armbruster @ 2021-04-23 12:13 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Marcel Apfelbaum, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Peter Xu, Dr. David Alan Gilbert,
	Greg Kurz, Stefan Hajnoczi, Murilo Opsfelder Araujo,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Igor Mammedov

David Hildenbrand <david@redhat.com> writes:

> On 23.04.21 13:21, Markus Armbruster wrote:
>> David Hildenbrand <david@redhat.com> writes:
>> 
>>> On 23.04.21 13:00, Markus Armbruster wrote:
>>>> David Hildenbrand <david@redhat.com> writes:
>>>>
>>>>> Let's include the new property.
>>>>>
>>>>> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
>>>>> Cc: Eric Blake <eblake@redhat.com>
>>>>> Cc: Markus Armbruster <armbru@redhat.com>
>>>>> Cc: Igor Mammedov <imammedo@redhat.com>
>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>>> ---
>>>>>    hw/core/machine-qmp-cmds.c | 1 +
>>>>>    qapi/machine.json          | 4 ++++
>>>>>    2 files changed, 5 insertions(+)
>>>>>
>>>>> diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
>>>>> index d41db5b93b..2d135ecdd0 100644
>>>>> --- a/hw/core/machine-qmp-cmds.c
>>>>> +++ b/hw/core/machine-qmp-cmds.c
>>>>> @@ -175,6 +175,7 @@ static int query_memdev(Object *obj, void *opaque)
>>>>>            m->dump = object_property_get_bool(obj, "dump", &error_abort);
>>>>>            m->prealloc = object_property_get_bool(obj, "prealloc", &error_abort);
>>>>>            m->share = object_property_get_bool(obj, "share", &error_abort);
>>>>> +        m->reserve = object_property_get_bool(obj, "reserve", &error_abort);
>>>>>            m->policy = object_property_get_enum(obj, "policy", "HostMemPolicy",
>>>>>                                                 &error_abort);
>>>>>            host_nodes = object_property_get_qobject(obj,
>>>>> diff --git a/qapi/machine.json b/qapi/machine.json
>>>>> index 32650bfe9e..5932139d20 100644
>>>>> --- a/qapi/machine.json
>>>>> +++ b/qapi/machine.json
>>>>> @@ -798,6 +798,9 @@
>>>>>    #
>>>>>    # @share: whether memory is private to QEMU or shared (since 6.1)
>>>>>    #
>>>>> +# @reserve: whether swap space (or huge pages) was reserved if applicable
>>>>> +#           (since 6.1)
>>>>> +#
>>>>>    # @host-nodes: host nodes for its memory policy
>>>>>    #
>>>>>    # @policy: memory policy of memory backend
>>>>> @@ -812,6 +815,7 @@
>>>>>        'dump':       'bool',
>>>>>        'prealloc':   'bool',
>>>>>        'share':      'bool',
>>>>> +    'reserve':    'bool',
>>>>>        'host-nodes': ['uint16'],
>>>>>        'policy':     'HostMemPolicy' }}
>>>>
>>>> Double-checking: true means definitely reserved, and false means
>>>> definitely not reserved.  Correct?
>>>
>>> True means "reserved if applicable" which means "not reserved if not
>>> applicable". False means "definitely not reserved".
>>>
>>> (any recommendation how to rephrase are appreciated; I tried my best --
>>> this interface here makes it especially hard -- it's easier for the
>>> property itself)
>> 
>> When is it "applicable"?
>
> When the OS supports it for the memory type and it hasn't been disabled.
>
> Linux handling as described in
>   [PATCH v6 09/15] util/mmap-alloc: Support RAM_NORESERVE via
>   MAP_NORESERVE under Linux
> and
>
>   https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
>
> Summary *without* MAP_NORESERVE:
>
> a) !Hugetlbfs with Memory overcommit disabled via
>     ("/proc/sys/vm/overcommit_memory == 2"): never
>
> b) !Hugetlbfs with Memory overcommit enabled
>   1) Shared mappings on files: never
>
>   2) Private mappings on files: only when writable (for us always)
>
>   3) Shared anonymous memory: always
>
>   4) Private anonymous memory: only when writable (for us always)
>
> c) Hugetlbfs: Always
>
>
> Under Windows: always. On other POSIX besides Linux -- don't know.

Would working some of this into the doc comment help users of the
interface?  Up to you.



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 14/15] qmp: Include "reserve" property of memory backends
  2021-04-23 12:13           ` Markus Armbruster
@ 2021-04-23 12:29             ` David Hildenbrand
  2021-04-23 14:13               ` David Hildenbrand
  0 siblings, 1 reply; 47+ messages in thread
From: David Hildenbrand @ 2021-04-23 12:29 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Marcel Apfelbaum, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Peter Xu, Dr. David Alan Gilbert,
	Greg Kurz, Stefan Hajnoczi, Murilo Opsfelder Araujo,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Igor Mammedov

On 23.04.21 14:13, Markus Armbruster wrote:
> David Hildenbrand <david@redhat.com> writes:
> 
>> On 23.04.21 13:21, Markus Armbruster wrote:
>>> David Hildenbrand <david@redhat.com> writes:
>>>
>>>> On 23.04.21 13:00, Markus Armbruster wrote:
>>>>> David Hildenbrand <david@redhat.com> writes:
>>>>>
>>>>>> Let's include the new property.
>>>>>>
>>>>>> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
>>>>>> Cc: Eric Blake <eblake@redhat.com>
>>>>>> Cc: Markus Armbruster <armbru@redhat.com>
>>>>>> Cc: Igor Mammedov <imammedo@redhat.com>
>>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>>>> ---
>>>>>>     hw/core/machine-qmp-cmds.c | 1 +
>>>>>>     qapi/machine.json          | 4 ++++
>>>>>>     2 files changed, 5 insertions(+)
>>>>>>
>>>>>> diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
>>>>>> index d41db5b93b..2d135ecdd0 100644
>>>>>> --- a/hw/core/machine-qmp-cmds.c
>>>>>> +++ b/hw/core/machine-qmp-cmds.c
>>>>>> @@ -175,6 +175,7 @@ static int query_memdev(Object *obj, void *opaque)
>>>>>>             m->dump = object_property_get_bool(obj, "dump", &error_abort);
>>>>>>             m->prealloc = object_property_get_bool(obj, "prealloc", &error_abort);
>>>>>>             m->share = object_property_get_bool(obj, "share", &error_abort);
>>>>>> +        m->reserve = object_property_get_bool(obj, "reserve", &error_abort);
>>>>>>             m->policy = object_property_get_enum(obj, "policy", "HostMemPolicy",
>>>>>>                                                  &error_abort);
>>>>>>             host_nodes = object_property_get_qobject(obj,
>>>>>> diff --git a/qapi/machine.json b/qapi/machine.json
>>>>>> index 32650bfe9e..5932139d20 100644
>>>>>> --- a/qapi/machine.json
>>>>>> +++ b/qapi/machine.json
>>>>>> @@ -798,6 +798,9 @@
>>>>>>     #
>>>>>>     # @share: whether memory is private to QEMU or shared (since 6.1)
>>>>>>     #
>>>>>> +# @reserve: whether swap space (or huge pages) was reserved if applicable
>>>>>> +#           (since 6.1)
>>>>>> +#
>>>>>>     # @host-nodes: host nodes for its memory policy
>>>>>>     #
>>>>>>     # @policy: memory policy of memory backend
>>>>>> @@ -812,6 +815,7 @@
>>>>>>         'dump':       'bool',
>>>>>>         'prealloc':   'bool',
>>>>>>         'share':      'bool',
>>>>>> +    'reserve':    'bool',
>>>>>>         'host-nodes': ['uint16'],
>>>>>>         'policy':     'HostMemPolicy' }}
>>>>>
>>>>> Double-checking: true means definitely reserved, and false means
>>>>> definitely not reserved.  Correct?
>>>>
>>>> True means "reserved if applicable" which means "not reserved if not
>>>> applicable". False means "definitely not reserved".
>>>>
>>>> (any recommendation how to rephrase are appreciated; I tried my best --
>>>> this interface here makes it especially hard -- it's easier for the
>>>> property itself)
>>>
>>> When is it "applicable"?
>>
>> When the OS supports it for the memory type and it hasn't been disabled.
>>
>> Linux handling as described in
>>    [PATCH v6 09/15] util/mmap-alloc: Support RAM_NORESERVE via
>>    MAP_NORESERVE under Linux
>> and
>>
>>    https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
>>
>> Summary *without* MAP_NORESERVE:
>>
>> a) !Hugetlbfs with Memory overcommit disabled via
>>      ("/proc/sys/vm/overcommit_memory == 2"): never
>>
>> b) !Hugetlbfs with Memory overcommit enabled
>>    1) Shared mappings on files: never
>>
>>    2) Private mappings on files: only when writable (for us always)
>>
>>    3) Shared anonymous memory: always
>>
>>    4) Private anonymous memory: only when writable (for us always)
>>
>> c) Hugetlbfs: Always
>>
>>
>> Under Windows: always. On other POSIX besides Linux -- don't know.
> 
> Would working some of this into the doc comment help users of the
> interface?  Up to you.
> 

I'll give it a thought. Most people will only care about explicitly 
disabling it, where we'll bail out if that doesn't work. Otherwise, they 
just use the OS default (== reserve if supported/applicable/not 
explicitly disabled).

Thanks!

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 14/15] qmp: Include "reserve" property of memory backends
  2021-04-23 12:29             ` David Hildenbrand
@ 2021-04-23 14:13               ` David Hildenbrand
  2021-04-23 14:55                 ` Markus Armbruster
  0 siblings, 1 reply; 47+ messages in thread
From: David Hildenbrand @ 2021-04-23 14:13 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Marcel Apfelbaum, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Peter Xu, Dr. David Alan Gilbert,
	Greg Kurz, Stefan Hajnoczi, Murilo Opsfelder Araujo,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Igor Mammedov

On 23.04.21 14:29, David Hildenbrand wrote:
> On 23.04.21 14:13, Markus Armbruster wrote:
>> David Hildenbrand <david@redhat.com> writes:
>>
>>> On 23.04.21 13:21, Markus Armbruster wrote:
>>>> David Hildenbrand <david@redhat.com> writes:
>>>>
>>>>> On 23.04.21 13:00, Markus Armbruster wrote:
>>>>>> David Hildenbrand <david@redhat.com> writes:
>>>>>>
>>>>>>> Let's include the new property.
>>>>>>>
>>>>>>> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
>>>>>>> Cc: Eric Blake <eblake@redhat.com>
>>>>>>> Cc: Markus Armbruster <armbru@redhat.com>
>>>>>>> Cc: Igor Mammedov <imammedo@redhat.com>
>>>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>>>>> ---
>>>>>>>      hw/core/machine-qmp-cmds.c | 1 +
>>>>>>>      qapi/machine.json          | 4 ++++
>>>>>>>      2 files changed, 5 insertions(+)
>>>>>>>
>>>>>>> diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
>>>>>>> index d41db5b93b..2d135ecdd0 100644
>>>>>>> --- a/hw/core/machine-qmp-cmds.c
>>>>>>> +++ b/hw/core/machine-qmp-cmds.c
>>>>>>> @@ -175,6 +175,7 @@ static int query_memdev(Object *obj, void *opaque)
>>>>>>>              m->dump = object_property_get_bool(obj, "dump", &error_abort);
>>>>>>>              m->prealloc = object_property_get_bool(obj, "prealloc", &error_abort);
>>>>>>>              m->share = object_property_get_bool(obj, "share", &error_abort);
>>>>>>> +        m->reserve = object_property_get_bool(obj, "reserve", &error_abort);
>>>>>>>              m->policy = object_property_get_enum(obj, "policy", "HostMemPolicy",
>>>>>>>                                                   &error_abort);
>>>>>>>              host_nodes = object_property_get_qobject(obj,
>>>>>>> diff --git a/qapi/machine.json b/qapi/machine.json
>>>>>>> index 32650bfe9e..5932139d20 100644
>>>>>>> --- a/qapi/machine.json
>>>>>>> +++ b/qapi/machine.json
>>>>>>> @@ -798,6 +798,9 @@
>>>>>>>      #
>>>>>>>      # @share: whether memory is private to QEMU or shared (since 6.1)
>>>>>>>      #
>>>>>>> +# @reserve: whether swap space (or huge pages) was reserved if applicable
>>>>>>> +#           (since 6.1)
>>>>>>> +#
>>>>>>>      # @host-nodes: host nodes for its memory policy
>>>>>>>      #
>>>>>>>      # @policy: memory policy of memory backend
>>>>>>> @@ -812,6 +815,7 @@
>>>>>>>          'dump':       'bool',
>>>>>>>          'prealloc':   'bool',
>>>>>>>          'share':      'bool',
>>>>>>> +    'reserve':    'bool',
>>>>>>>          'host-nodes': ['uint16'],
>>>>>>>          'policy':     'HostMemPolicy' }}
>>>>>>
>>>>>> Double-checking: true means definitely reserved, and false means
>>>>>> definitely not reserved.  Correct?
>>>>>
>>>>> True means "reserved if applicable" which means "not reserved if not
>>>>> applicable". False means "definitely not reserved".
>>>>>
>>>>> (any recommendation how to rephrase are appreciated; I tried my best --
>>>>> this interface here makes it especially hard -- it's easier for the
>>>>> property itself)
>>>>
>>>> When is it "applicable"?
>>>
>>> When the OS supports it for the memory type and it hasn't been disabled.
>>>
>>> Linux handling as described in
>>>     [PATCH v6 09/15] util/mmap-alloc: Support RAM_NORESERVE via
>>>     MAP_NORESERVE under Linux
>>> and
>>>
>>>     https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
>>>
>>> Summary *without* MAP_NORESERVE:
>>>
>>> a) !Hugetlbfs with Memory overcommit disabled via
>>>       ("/proc/sys/vm/overcommit_memory == 2"): never
>>>
>>> b) !Hugetlbfs with Memory overcommit enabled
>>>     1) Shared mappings on files: never
>>>
>>>     2) Private mappings on files: only when writable (for us always)
>>>
>>>     3) Shared anonymous memory: always
>>>
>>>     4) Private anonymous memory: only when writable (for us always)
>>>
>>> c) Hugetlbfs: Always

Actually, it's a bit simpler in our case (writable mappings). The more complex part is the MAP_NORESERVE handling.

a) !Hugetlbfs
	1) Shared mappings on files: never reserve swap space

	2) Other mappings: always reserve swap space

b) Hugetlbfs: Always reserve huge pages.

>>>
>>>
>>> Under Windows: always. On other POSIX besides Linux -- don't know.
>>
>> Would working some of this into the doc comment help users of the
>> interface?  Up to you.
>>
> 
> I'll give it a thought. Most people will only care about explicitly
> disabling it, where we'll bail out if that doesn't work. Otherwise, they
> just use the OS default (== reserve if supported/applicable/not
> explicitly disabled).

I think something like the following might make sense:

# @reserve: whether swap space (or huge pages) was reserved if applicable.
#           This corresponds to the user configuration and not the actual
#           behavior implemented in the OS to perform a reservation;
#           For example, Linux will never reserve swap space for shared
#           file mappings. (since 6.1)

Thanks!

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 10/15] hostmem: Wire up RAM_NORESERVE via "reserve" property
  2021-04-23 12:10       ` Markus Armbruster
@ 2021-04-23 14:21         ` David Hildenbrand
  2021-04-23 14:53           ` Markus Armbruster
  0 siblings, 1 reply; 47+ messages in thread
From: David Hildenbrand @ 2021-04-23 14:21 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Marcel Apfelbaum, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Peter Xu, Dr. David Alan Gilbert,
	Greg Kurz, Stefan Hajnoczi, Murilo Opsfelder Araujo,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Igor Mammedov

On 23.04.21 14:10, Markus Armbruster wrote:
> David Hildenbrand <david@redhat.com> writes:
> 
>>>> diff --git a/backends/hostmem.c b/backends/hostmem.c
>>>> index c6c1ff5b99..58fdc1b658 100644
>>>> --- a/backends/hostmem.c
>>>> +++ b/backends/hostmem.c
>>>> @@ -217,6 +217,11 @@ static void host_memory_backend_set_prealloc(Object *obj, bool value,
>>>>        Error *local_err = NULL;
>>>>        HostMemoryBackend *backend = MEMORY_BACKEND(obj);
>>>>    +    if (!backend->reserve && value) {
>>>> +        error_setg(errp, "'prealloc=on' and 'reserve=off' are incompatible");
>>>> +        return;
>>>> +    }
>>>
>>> Aha.  Shouldn't this be documented in qom.json?
>>
>> Whoops, skipped that comment. Can add it if that's the place to
>> document that.
> 
> Yes, please.  .json doc comments is where we document the external
> interface.
> 

What about something like this:

diff --git a/qapi/qom.json b/qapi/qom.json
index e9b86893a5..4fa3137aab 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -559,6 +559,12 @@
  #                                        false generally, but true for machine
  #                                        types <= 4.0)
  #
+# Note: prealloc=true and reserve=false cannot be set at the same time. With
+#       reserve=true, the behavior depends on the operating system: for example,
+#       Linux will not reserve swap space for shared file mappings --
+#       "not applicable". In contrast, reserve=false will bail out if it cannot
+#       be configured accordingly.
+#
  # Since: 2.1
  ##
  { 'struct': 'MemoryBackendProperties',


-- 
Thanks,

David / dhildenb



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 10/15] hostmem: Wire up RAM_NORESERVE via "reserve" property
  2021-04-23 14:21         ` David Hildenbrand
@ 2021-04-23 14:53           ` Markus Armbruster
  0 siblings, 0 replies; 47+ messages in thread
From: Markus Armbruster @ 2021-04-23 14:53 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Marcel Apfelbaum, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Peter Xu, Dr. David Alan Gilbert,
	Greg Kurz, Stefan Hajnoczi, Murilo Opsfelder Araujo,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Igor Mammedov

David Hildenbrand <david@redhat.com> writes:

> On 23.04.21 14:10, Markus Armbruster wrote:
>> David Hildenbrand <david@redhat.com> writes:
>> 
>>>>> diff --git a/backends/hostmem.c b/backends/hostmem.c
>>>>> index c6c1ff5b99..58fdc1b658 100644
>>>>> --- a/backends/hostmem.c
>>>>> +++ b/backends/hostmem.c
>>>>> @@ -217,6 +217,11 @@ static void host_memory_backend_set_prealloc(Object *obj, bool value,
>>>>>        Error *local_err = NULL;
>>>>>        HostMemoryBackend *backend = MEMORY_BACKEND(obj);
>>>>>    +    if (!backend->reserve && value) {
>>>>> +        error_setg(errp, "'prealloc=on' and 'reserve=off' are incompatible");
>>>>> +        return;
>>>>> +    }
>>>>
>>>> Aha.  Shouldn't this be documented in qom.json?
>>>
>>> Whoops, skipped that comment. Can add it if that's the place to
>>> document that.
>> 
>> Yes, please.  .json doc comments is where we document the external
>> interface.
>> 
>
> What about something like this:
>
> diff --git a/qapi/qom.json b/qapi/qom.json
> index e9b86893a5..4fa3137aab 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -559,6 +559,12 @@
>   #                                        false generally, but true for machine
>   #                                        types <= 4.0)
>   #
> +# Note: prealloc=true and reserve=false cannot be set at the same time. With
> +#       reserve=true, the behavior depends on the operating system: for example,
> +#       Linux will not reserve swap space for shared file mappings --
> +#       "not applicable". In contrast, reserve=false will bail out if it cannot
> +#       be configured accordingly.
> +#
>   # Since: 2.1
>   ##
>   { 'struct': 'MemoryBackendProperties',

Thanks!

Reviewed-by: Markus Armbruster <armbru@redhat.com>



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 14/15] qmp: Include "reserve" property of memory backends
  2021-04-23 14:13               ` David Hildenbrand
@ 2021-04-23 14:55                 ` Markus Armbruster
  2021-04-23 15:04                   ` David Hildenbrand
  0 siblings, 1 reply; 47+ messages in thread
From: Markus Armbruster @ 2021-04-23 14:55 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Marcel Apfelbaum, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Peter Xu, Dr. David Alan Gilbert,
	Greg Kurz, Stefan Hajnoczi, Murilo Opsfelder Araujo,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Igor Mammedov

David Hildenbrand <david@redhat.com> writes:

> On 23.04.21 14:29, David Hildenbrand wrote:
>> On 23.04.21 14:13, Markus Armbruster wrote:
>>> David Hildenbrand <david@redhat.com> writes:
>>>
>>>> On 23.04.21 13:21, Markus Armbruster wrote:
>>>>> David Hildenbrand <david@redhat.com> writes:
>>>>>
>>>>>> On 23.04.21 13:00, Markus Armbruster wrote:
>>>>>>> David Hildenbrand <david@redhat.com> writes:
>>>>>>>
>>>>>>>> Let's include the new property.
>>>>>>>>
>>>>>>>> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
>>>>>>>> Cc: Eric Blake <eblake@redhat.com>
>>>>>>>> Cc: Markus Armbruster <armbru@redhat.com>
>>>>>>>> Cc: Igor Mammedov <imammedo@redhat.com>
>>>>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>>>>>> ---
>>>>>>>>      hw/core/machine-qmp-cmds.c | 1 +
>>>>>>>>      qapi/machine.json          | 4 ++++
>>>>>>>>      2 files changed, 5 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
>>>>>>>> index d41db5b93b..2d135ecdd0 100644
>>>>>>>> --- a/hw/core/machine-qmp-cmds.c
>>>>>>>> +++ b/hw/core/machine-qmp-cmds.c
>>>>>>>> @@ -175,6 +175,7 @@ static int query_memdev(Object *obj, void *opaque)
>>>>>>>>              m->dump = object_property_get_bool(obj, "dump", &error_abort);
>>>>>>>>              m->prealloc = object_property_get_bool(obj, "prealloc", &error_abort);
>>>>>>>>              m->share = object_property_get_bool(obj, "share", &error_abort);
>>>>>>>> +        m->reserve = object_property_get_bool(obj, "reserve", &error_abort);
>>>>>>>>              m->policy = object_property_get_enum(obj, "policy", "HostMemPolicy",
>>>>>>>>                                                   &error_abort);
>>>>>>>>              host_nodes = object_property_get_qobject(obj,
>>>>>>>> diff --git a/qapi/machine.json b/qapi/machine.json
>>>>>>>> index 32650bfe9e..5932139d20 100644
>>>>>>>> --- a/qapi/machine.json
>>>>>>>> +++ b/qapi/machine.json
>>>>>>>> @@ -798,6 +798,9 @@
>>>>>>>>      #
>>>>>>>>      # @share: whether memory is private to QEMU or shared (since 6.1)
>>>>>>>>      #
>>>>>>>> +# @reserve: whether swap space (or huge pages) was reserved if applicable
>>>>>>>> +#           (since 6.1)
>>>>>>>> +#
>>>>>>>>      # @host-nodes: host nodes for its memory policy
>>>>>>>>      #
>>>>>>>>      # @policy: memory policy of memory backend
>>>>>>>> @@ -812,6 +815,7 @@
>>>>>>>>          'dump':       'bool',
>>>>>>>>          'prealloc':   'bool',
>>>>>>>>          'share':      'bool',
>>>>>>>> +    'reserve':    'bool',
>>>>>>>>          'host-nodes': ['uint16'],
>>>>>>>>          'policy':     'HostMemPolicy' }}
>>>>>>>
>>>>>>> Double-checking: true means definitely reserved, and false means
>>>>>>> definitely not reserved.  Correct?
>>>>>>
>>>>>> True means "reserved if applicable" which means "not reserved if not
>>>>>> applicable". False means "definitely not reserved".
>>>>>>
>>>>>> (any recommendation how to rephrase are appreciated; I tried my best --
>>>>>> this interface here makes it especially hard -- it's easier for the
>>>>>> property itself)
>>>>>
>>>>> When is it "applicable"?
>>>>
>>>> When the OS supports it for the memory type and it hasn't been disabled.
>>>>
>>>> Linux handling as described in
>>>>     [PATCH v6 09/15] util/mmap-alloc: Support RAM_NORESERVE via
>>>>     MAP_NORESERVE under Linux
>>>> and
>>>>
>>>>     https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
>>>>
>>>> Summary *without* MAP_NORESERVE:
>>>>
>>>> a) !Hugetlbfs with Memory overcommit disabled via
>>>>       ("/proc/sys/vm/overcommit_memory == 2"): never
>>>>
>>>> b) !Hugetlbfs with Memory overcommit enabled
>>>>     1) Shared mappings on files: never
>>>>
>>>>     2) Private mappings on files: only when writable (for us always)
>>>>
>>>>     3) Shared anonymous memory: always
>>>>
>>>>     4) Private anonymous memory: only when writable (for us always)
>>>>
>>>> c) Hugetlbfs: Always
>
> Actually, it's a bit simpler in our case (writable mappings). The more complex part is the MAP_NORESERVE handling.
>
> a) !Hugetlbfs
> 	1) Shared mappings on files: never reserve swap space
>
> 	2) Other mappings: always reserve swap space
>
> b) Hugetlbfs: Always reserve huge pages.
>
>>>>
>>>>
>>>> Under Windows: always. On other POSIX besides Linux -- don't know.
>>>
>>> Would working some of this into the doc comment help users of the
>>> interface?  Up to you.
>>>
>> I'll give it a thought. Most people will only care about explicitly
>> disabling it, where we'll bail out if that doesn't work. Otherwise, they
>> just use the OS default (== reserve if supported/applicable/not
>> explicitly disabled).
>
> I think something like the following might make sense:
>
> # @reserve: whether swap space (or huge pages) was reserved if applicable.
> #           This corresponds to the user configuration and not the actual
> #           behavior implemented in the OS to perform a reservation;
> #           For example, Linux will never reserve swap space for shared
> #           file mappings. (since 6.1)
>
> Thanks!

Good balance between too little and too much detail.

Reviewed-by: Markus Armbruster <armbru@redhat.com>



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 14/15] qmp: Include "reserve" property of memory backends
  2021-04-23 14:55                 ` Markus Armbruster
@ 2021-04-23 15:04                   ` David Hildenbrand
  0 siblings, 0 replies; 47+ messages in thread
From: David Hildenbrand @ 2021-04-23 15:04 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Marcel Apfelbaum, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Peter Xu, Dr. David Alan Gilbert,
	Greg Kurz, Stefan Hajnoczi, Murilo Opsfelder Araujo,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Igor Mammedov

On 23.04.21 16:55, Markus Armbruster wrote:
> Reviewed-by: Markus Armbruster<armbru@redhat.com>

Thanks Markus -- have a nice weekend!

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 00/15] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property
  2021-04-23 10:35   ` David Hildenbrand
@ 2021-04-28 16:35     ` Eduardo Habkost
  0 siblings, 0 replies; 47+ messages in thread
From: Eduardo Habkost @ 2021-04-28 16:35 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Marcel Apfelbaum, Michael S. Tsirkin, Murilo Opsfelder Araujo,
	Richard Henderson, Dr. David Alan Gilbert, Peter Xu, qemu-devel,
	Greg Kurz, Stefan Hajnoczi, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé

On Fri, Apr 23, 2021 at 12:35:35PM +0200, David Hildenbrand wrote:
> On 21.04.21 23:06, Eduardo Habkost wrote:
> > On Wed, Apr 21, 2021 at 02:26:09PM +0200, David Hildenbrand wrote:
> > > Based-on: 20210406080126.24010-1-david@redhat.com
> > > 
> > > Some cleanups previously sent in other context (resizeable allocations),
> > > followed by RAM_NORESERVE, implementing it under Linux using MAP_NORESERVE,
> > > and letting users configure it for memory backens using the "reserve"
> > > property (default: true).
> > > 
> > > MAP_NORESERVE under Linux has in the context of QEMU an effect on
> > > 1) Private/shared anonymous memory
> > > -> memory-backend-ram,id=mem0,size=10G
> > > 2) Private fd-based mappings
> > > -> memory-backend-file,id=mem0,size=10G,mem-path=/dev/shm/0
> > > -> memory-backend-memfd,id=mem0,size=10G
> > > 3) Private/shared hugetlb mappings
> > > -> memory-backend-memfd,id=mem0,size=10G,hugetlb=on,hugetlbsize=2M
> > > 
> > > With MAP_NORESERVE/"reserve=off", we won't be reserving swap space (1/2) or
> > > huge pages (3) for the whole memory region.
> > > 
> > > The target use case is virtio-mem, which dynamically exposes memory
> > > inside a large, sparse memory area to the VM. MAP_NORESERVE tells the OS
> > > "this mapping might be very sparse". This essentially allows
> > > avoiding having to set "/proc/sys/vm/overcommit_memory == 1") when using
> > > virtio-mem and also supporting hugetlbfs in the future.
> > 
> > For the memory backend and machine core code:
> > 
> > Acked-by: Eduardo Habkost <ehabkost@redhat.com>
> > 
> 
> Thanks Eduardo. I assume Paolo will be the right maintainer to pick this up?
> (I hope he'll have time to look at this and some of my
> pending-for-a-long-time series soon :( )

I can pick it up (or you could merge it through your tree if you
prefer), but I would like to get at least an Acked-by from Paolo
first.

On the other hand, maybe v7 is mature enough to not require
waiting too much longer.

-- 
Eduardo



^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2021-04-28 16:38 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-21 12:26 [PATCH v6 00/15] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
2021-04-21 12:26 ` [PATCH v6 01/15] util/mmap-alloc: Factor out calculation of the pagesize for the guard page David Hildenbrand
2021-04-21 12:26 ` [PATCH v6 02/15] util/mmap-alloc: Factor out reserving of a memory region to mmap_reserve() David Hildenbrand
2021-04-21 12:26 ` [PATCH v6 03/15] util/mmap-alloc: Factor out activating of memory to mmap_activate() David Hildenbrand
2021-04-21 12:26 ` [PATCH v6 04/15] softmmu/memory: Pass ram_flags to qemu_ram_alloc_from_fd() David Hildenbrand
2021-04-21 12:26 ` [PATCH v6 05/15] softmmu/memory: Pass ram_flags to memory_region_init_ram_shared_nomigrate() David Hildenbrand
2021-04-21 12:33   ` Philippe Mathieu-Daudé
2021-04-21 12:26 ` [PATCH v6 06/15] softmmu/memory: Pass ram_flags to qemu_ram_alloc() and qemu_ram_alloc_internal() David Hildenbrand
2021-04-21 12:34   ` Philippe Mathieu-Daudé
2021-04-21 12:26 ` [PATCH v6 07/15] util/mmap-alloc: Pass flags instead of separate bools to qemu_ram_mmap() David Hildenbrand
2021-04-21 12:37   ` Philippe Mathieu-Daudé
2021-04-21 12:26 ` [PATCH v6 08/15] memory: Introduce RAM_NORESERVE and wire it up in qemu_ram_mmap() David Hildenbrand
2021-04-21 12:26 ` [PATCH v6 09/15] util/mmap-alloc: Support RAM_NORESERVE via MAP_NORESERVE under Linux David Hildenbrand
2021-04-21 12:26 ` [PATCH v6 10/15] hostmem: Wire up RAM_NORESERVE via "reserve" property David Hildenbrand
2021-04-23 11:14   ` Markus Armbruster
2021-04-23 11:16     ` David Hildenbrand
2021-04-23 12:11       ` Markus Armbruster
2021-04-23 11:18     ` David Hildenbrand
2021-04-23 12:10       ` Markus Armbruster
2021-04-23 14:21         ` David Hildenbrand
2021-04-23 14:53           ` Markus Armbruster
2021-04-21 12:26 ` [PATCH v6 11/15] qmp: Clarify memory backend properties returned via query-memdev David Hildenbrand
2021-04-21 21:00   ` Eduardo Habkost
2021-04-23 10:55   ` Markus Armbruster
2021-04-21 12:26 ` [PATCH v6 12/15] qmp: Include "share" property of memory backends David Hildenbrand
2021-04-21 21:01   ` Eduardo Habkost
2021-04-23 11:15   ` Markus Armbruster
2021-04-21 12:26 ` [PATCH v6 13/15] hmp: Print "share" property of memory backends with "info memdev" David Hildenbrand
2021-04-21 21:04   ` Eduardo Habkost
2021-04-23 11:15   ` Markus Armbruster
2021-04-21 12:26 ` [PATCH v6 14/15] qmp: Include "reserve" property of memory backends David Hildenbrand
2021-04-21 21:05   ` Eduardo Habkost
2021-04-23 11:00   ` Markus Armbruster
2021-04-23 11:03     ` David Hildenbrand
2021-04-23 11:21       ` Markus Armbruster
2021-04-23 11:33         ` David Hildenbrand
2021-04-23 12:13           ` Markus Armbruster
2021-04-23 12:29             ` David Hildenbrand
2021-04-23 14:13               ` David Hildenbrand
2021-04-23 14:55                 ` Markus Armbruster
2021-04-23 15:04                   ` David Hildenbrand
2021-04-21 12:26 ` [PATCH v6 15/15] hmp: Print "reserve" property of memory backends with "info memdev" David Hildenbrand
2021-04-21 21:05   ` Eduardo Habkost
2021-04-23 11:16   ` Markus Armbruster
2021-04-21 21:06 ` [PATCH v6 00/15] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property Eduardo Habkost
2021-04-23 10:35   ` David Hildenbrand
2021-04-28 16:35     ` Eduardo Habkost

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).