All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH V8 0/5] support MAP_SYNC for memory-backend-file
@ 2019-01-02  5:25 Zhang Yi
  2019-01-02  5:25 ` [Qemu-devel] [PATCH V8 1/5] numa: Fixed the memory leak of numa error message Zhang Yi
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Zhang Yi @ 2019-01-02  5:25 UTC (permalink / raw)
  To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang, mst
  Cc: qemu-devel, imammedo, dan.j.williams, ehabkost, Zhang Yi

Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
files on ext4/xfs file system mounted with '-o dax').

A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
    https://patchwork.kernel.org/patch/10028151/

In order to make sure that the file metadata is in sync after a fault 
while we are writing a shared DAX supporting backend files, this
patch-set enables QEMU to use MAP_SYNC flag for memory-backend-dax-file.

As the DAX vs DMA truncated issue was solved, we refined the code and
send out this feature for the v5 version.

A new on/off option 'sync' is added to memory-backend-file:
 - on:  try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
        'share=off' or 'pmem=off', QEMU will not pass this flag to mmap(2)
 - off: (default) never pass MAP_SYNC to mmap(2)

Changes in v8:
 * Micheal: 3/5, remove the duplicated define in the os_dep.h
 * Micheal: 2/5, make type define safety.
 * Micheal: 2/5, fixed the incorrect define MAP_SHARE on qemu_anon_ram_alloc.
 * 4/6 removed, we remove the on/off/auto define of sync,  as by now,
   MAP_SYNC only worked with pmem=on.
 * @Micheal, I still reuse the RAM_SYNC flag, it is much straightforward to parse 
   all the flags in one parameter.

Changes in v7:
 * Micheal: [3,4,6]/6 limited the "sync" flag only on a nvdimm backend.(pmem=on)

Changes in v6:
 * Pankaj: 3/7 are squashed with 2/7
 * Pankaj: 7/7 update comments to "consistent filesystem metadata".
 * Pankaj, Igor: 1/7 Added Reviewed-by in patch-1/7
 * Stefan, 4/7 move the include header from "/linux/mman.h" to "osdep.h"
 * Stefan, 5/7 Add missing "munmap"
 * Stefan, 2/7 refine the shared/flag.

Changes in v5:
 * Add patch 1 to fix a memory leak issue.
 * Refine the patch 4-6
 * Remove the patch 3 as we already change the parameter from "shared" to
   "flags"

Changes in v4:
 * Add patch 1-3 to switch some functions to a single 'flags'
   parameters. (Michael S. Tsirkin)
 * v3 patch 1-3 become v4 patch 4-6.
 * Patch 4: move definitions of MAP_SYNC and MAP_SHARED_VALIDATE to a
   new header file under include/standard-headers/linux/. (Michael S. Tsirkin)
 * Patch 6: refine the description of the 'sync' option. (Michael S. Tsirkin)

Changes in v3:
 * Patch 1: add MAP_SHARED_VALIDATE in both sync=on and sync=auto
   cases, and add back the retry mechanism. MAP_SYNC will be ignored
   by Linux kernel 4.15 if MAP_SHARED_VALIDATE is missed.
 * Patch 1: define MAP_SYNC and MAP_SHARED_VALIDATE as 0 on non-Linux
   platforms in order to make qemu_ram_mmap() compile on those platforms.
 * Patch 2&3: include more information in error messages of
   memory-backend in hope to help user to identify the error.
   (Dr. David Alan Gilbert)
 * Patch 3: fix typo in the commit message. (Dr. David Alan Gilbert)

Changes in v2:
 * Add 'sync' option to control the use of MAP_SYNC. (Eduardo Habkost)
 * Remove the unnecessary set of MAP_SHARED_VALIDATE in some cases and
   the retry mechanism in qemu_ram_mmap(). (Michael S. Tsirkin)
 * Move OS dependent definitions of MAP_SYNC and MAP_SHARED_VALIDATE
   to osdep.h. (Michael S. Tsirkin)

Zhang Yi (5):
  numa: Fixed the memory leak of numa error message
  util/mmap-alloc: switch qemu_ram_mmap() to 'flags' parameter
  util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  hostmem: add more information in error messages
  hostmem-file: add 'sync' option

 backends/hostmem-file.c   | 34 ++++++++++++++++++++++++++++++++--
 backends/hostmem.c        |  8 +++++---
 docs/nvdimm.txt           | 23 ++++++++++++++++++++++-
 exec.c                    |  9 +++++----
 include/exec/memory.h     | 26 ++++++++++++++++++++------
 include/exec/ram_addr.h   |  1 +
 include/qemu/mmap-alloc.h | 20 +++++++++++++++++++-
 include/qemu/osdep.h      | 16 ++++++++++++++++
 numa.c                    |  1 +
 qemu-options.hx           | 19 ++++++++++++++++++-
 util/mmap-alloc.c         | 20 ++++++++++++++++----
 util/oslib-posix.c        |  9 ++++++++-
 12 files changed, 163 insertions(+), 23 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH V8 1/5] numa: Fixed the memory leak of numa error message
  2019-01-02  5:25 [Qemu-devel] [PATCH V8 0/5] support MAP_SYNC for memory-backend-file Zhang Yi
@ 2019-01-02  5:25 ` Zhang Yi
  2019-01-14 18:54   ` Eduardo Habkost
  2019-01-02  5:26 ` [Qemu-devel] [PATCH V8 2/5] util/mmap-alloc: switch qemu_ram_mmap() to 'flags' parameter Zhang Yi
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Zhang Yi @ 2019-01-02  5:25 UTC (permalink / raw)
  To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang, mst
  Cc: qemu-devel, imammedo, dan.j.williams, ehabkost, Zhang Yi

object_get_canonical_path_component() returns a string which
must be freed using g_free().

Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
Reviewed-by: Pankaj gupta <pagupta@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
---
 numa.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/numa.c b/numa.c
index 50ec016..3875e1e 100644
--- a/numa.c
+++ b/numa.c
@@ -533,6 +533,7 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
             error_report("memory backend %s is used multiple times. Each "
                          "-numa option must use a different memdev value.",
                          path);
+            g_free(path);
             exit(1);
         }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH V8 2/5] util/mmap-alloc: switch qemu_ram_mmap() to 'flags' parameter
  2019-01-02  5:25 [Qemu-devel] [PATCH V8 0/5] support MAP_SYNC for memory-backend-file Zhang Yi
  2019-01-02  5:25 ` [Qemu-devel] [PATCH V8 1/5] numa: Fixed the memory leak of numa error message Zhang Yi
@ 2019-01-02  5:26 ` Zhang Yi
  2019-01-14 18:50   ` Eduardo Habkost
  2019-01-02  5:26 ` [Qemu-devel] [PATCH V8 3/5] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap() Zhang Yi
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Zhang Yi @ 2019-01-02  5:26 UTC (permalink / raw)
  To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang, mst
  Cc: qemu-devel, imammedo, dan.j.williams, ehabkost, Zhang Yi

As more flag parameters besides the existing 'shared' are going to be
added to qemu_ram_mmap(), let's switch 'shared' to a 'flags' parameter
in advance, so as to ease the further additions.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
---
 exec.c                    |  7 ++++---
 include/exec/memory.h     | 22 ++++++++++++++++------
 include/qemu/mmap-alloc.h | 19 ++++++++++++++++++-
 util/mmap-alloc.c         |  8 +++++---
 util/oslib-posix.c        |  9 ++++++++-
 5 files changed, 51 insertions(+), 14 deletions(-)

diff --git a/exec.c b/exec.c
index bb6170d..e92a7da 100644
--- a/exec.c
+++ b/exec.c
@@ -1810,6 +1810,7 @@ static void *file_ram_alloc(RAMBlock *block,
                             ram_addr_t memory,
                             int fd,
                             bool truncate,
+                            uint32_t flags,
                             Error **errp)
 {
     void *area;
@@ -1859,8 +1860,7 @@ static void *file_ram_alloc(RAMBlock *block,
         perror("ftruncate");
     }
 
-    area = qemu_ram_mmap(fd, memory, block->mr->align,
-                         block->flags & RAM_SHARED);
+    area = qemu_ram_mmap(fd, memory, block->mr->align, flags);
     if (area == MAP_FAILED) {
         error_setg_errno(errp, errno,
                          "unable to map backing store for guest RAM");
@@ -2279,7 +2279,8 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
     new_block->used_length = size;
     new_block->max_length = size;
     new_block->flags = ram_flags;
-    new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp);
+    new_block->host = file_ram_alloc(new_block, size, fd, !file_size,
+            ram_flags, errp);
     if (!new_block->host) {
         g_free(new_block);
         return NULL;
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 667466b..6e30c23 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -103,28 +103,38 @@ struct IOMMUNotifier {
 };
 typedef struct IOMMUNotifier IOMMUNotifier;
 
+#ifdef __CHECKER__
+#define QEMU_BITWISE __attribute__((bitwise))
+#define QEMU_FORCE   __attribute__((force))
+#else
+#define QEMU_BITWISE
+#define QEMU_FORCE
+#endif
+
+typedef unsigned QEMU_BITWISE QemuMmapFlags;
+
 /* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */
-#define RAM_PREALLOC   (1 << 0)
+#define RAM_PREALLOC ((QEMU_FORCE QemuMmapFlags) (1 << 0))
 
 /* RAM is mmap-ed with MAP_SHARED */
-#define RAM_SHARED     (1 << 1)
+#define RAM_SHARED ((QEMU_FORCE QemuMmapFlags) (1 << 1))
 
 /* Only a portion of RAM (used_length) is actually used, and migrated.
  * This used_length size can change across reboots.
  */
-#define RAM_RESIZEABLE (1 << 2)
+#define RAM_RESIZEABLE ((QEMU_FORCE QemuMmapFlags) (1 << 2))
 
 /* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically
  * zero the page and wake waiting processes.
  * (Set during postcopy)
  */
-#define RAM_UF_ZEROPAGE (1 << 3)
+#define RAM_UF_ZEROPAGE ((QEMU_FORCE QemuMmapFlags) (1 << 3))
 
 /* RAM can be migrated */
-#define RAM_MIGRATABLE (1 << 4)
+#define RAM_MIGRATABLE ((QEMU_FORCE QemuMmapFlags) (1 << 4))
 
 /* RAM is a persistent kind memory */
-#define RAM_PMEM (1 << 5)
+#define RAM_PMEM ((QEMU_FORCE QemuMmapFlags) (1 << 5))
 
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
                                        IOMMUNotifierFlag flags,
diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index 50385e3..6fe6ed4 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -7,7 +7,24 @@ size_t qemu_fd_getpagesize(int fd);
 
 size_t qemu_mempath_getpagesize(const char *mem_path);
 
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared);
+/**
+ * qemu_ram_mmap: mmap the specified file or device.
+ *
+ * Parameters:
+ *  @fd: the file or the device to mmap
+ *  @size: the number of bytes to be mmaped
+ *  @align: if not zero, specify the alignment of the starting mapping address;
+ *          otherwise, the alignment in use will be determined by QEMU.
+ *  @flags: specifies additional properties of the mapping, which can be one or
+ *          bit-or of following values
+ *          - RAM_SHARED: mmap with MAP_SHARED flag
+ *          Other bits are ignored.
+ *
+ * Return:
+ *  On success, return a pointer to the mapped area.
+ *  On failure, return MAP_FAILED.
+ */
+void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags);
 
 void qemu_ram_munmap(void *ptr, size_t size);
 
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index fd329ec..8f0a740 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -13,6 +13,7 @@
 #include "qemu/osdep.h"
 #include "qemu/mmap-alloc.h"
 #include "qemu/host-utils.h"
+#include "exec/memory.h"
 
 #define HUGETLBFS_MAGIC       0x958458f6
 
@@ -75,7 +76,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
     return getpagesize();
 }
 
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
+void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
 {
     /*
      * Note: this always allocates at least one extra page of virtual address
@@ -92,11 +93,12 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
      * anonymous memory is OK.
      */
     int anonfd = fd == -1 || qemu_fd_getpagesize(fd) == getpagesize() ? -1 : fd;
-    int flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
-    void *ptr = mmap(0, total, PROT_NONE, flags | MAP_PRIVATE, anonfd, 0);
+    int mmap_flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
+    void *ptr = mmap(0, total, PROT_NONE, mmap_flags | MAP_PRIVATE, anonfd, 0);
 #else
     void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
 #endif
+    bool shared = flags & RAM_SHARED;
     size_t offset;
     void *ptr1;
 
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index fbd0dc8..75a0171 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -54,6 +54,7 @@
 #endif
 
 #include "qemu/mmap-alloc.h"
+#include "exec/memory.h"
 
 #ifdef CONFIG_DEBUG_STACK_USAGE
 #include "qemu/error-report.h"
@@ -203,7 +204,13 @@ void *qemu_memalign(size_t alignment, size_t size)
 void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
 {
     size_t align = QEMU_VMALLOC_ALIGN;
-    void *ptr = qemu_ram_mmap(-1, size, align, shared);
+    uint32_t flags = 0;
+    void *ptr;
+
+    if (shared) {
+        flags = RAM_SHARED;
+    }
+    ptr = qemu_ram_mmap(-1, size, align, flags);
 
     if (ptr == MAP_FAILED) {
         return NULL;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH V8 3/5] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  2019-01-02  5:25 [Qemu-devel] [PATCH V8 0/5] support MAP_SYNC for memory-backend-file Zhang Yi
  2019-01-02  5:25 ` [Qemu-devel] [PATCH V8 1/5] numa: Fixed the memory leak of numa error message Zhang Yi
  2019-01-02  5:26 ` [Qemu-devel] [PATCH V8 2/5] util/mmap-alloc: switch qemu_ram_mmap() to 'flags' parameter Zhang Yi
@ 2019-01-02  5:26 ` Zhang Yi
  2019-01-14 19:07   ` Eduardo Habkost
  2019-01-02  5:26 ` [Qemu-devel] [PATCH V8 4/5] hostmem: add more information in error messages Zhang Yi
  2019-01-02  5:26 ` [Qemu-devel] [PATCH V8 5/5] hostmem-file: add 'sync' option Zhang Yi
  4 siblings, 1 reply; 20+ messages in thread
From: Zhang Yi @ 2019-01-02  5:26 UTC (permalink / raw)
  To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang, mst
  Cc: qemu-devel, imammedo, dan.j.williams, ehabkost, Zhang Yi

When a file supporting DAX is used as vNVDIMM backend, mmap it with
MAP_SYNC flag in addition which can ensure file system metadata
synced in each guest writes to the backend file, without other QEMU
actions (e.g., periodic fsync() by QEMU).

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
---
 include/qemu/osdep.h | 16 ++++++++++++++++
 util/mmap-alloc.c    | 12 +++++++++++-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 3bf48bc..bb1eba1 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -410,6 +410,22 @@ void qemu_anon_ram_free(void *ptr, size_t size);
 #  define QEMU_VMALLOC_ALIGN getpagesize()
 #endif
 
+/*
+ * MAP_SHARED_VALIDATE and MAP_SYNC are introduced in Linux kernel
+ * 4.15, so they may not be defined when compiling on older kernels.
+ */
+#ifdef CONFIG_LINUX
+
+#include <asm-generic/mman.h>
+
+#ifndef MAP_SYNC
+#define MAP_SYNC 0x0
+#endif
+
+#else  /* !CONFIG_LINUX */
+#define MAP_SYNC              0x0
+#endif /* CONFIG_LINUX */
+
 #ifdef CONFIG_POSIX
 struct qemu_signalfd_siginfo {
     uint32_t ssi_signo;   /* Signal number */
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 8f0a740..a9d5e56 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -99,6 +99,8 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
     void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
 #endif
     bool shared = flags & RAM_SHARED;
+    bool is_pmem = flags & RAM_PMEM;
+    int mmap_xflags = 0;
     size_t offset;
     void *ptr1;
 
@@ -109,13 +111,21 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
     assert(is_power_of_2(align));
     /* Always align to host page size */
     assert(align >= getpagesize());
+    if (shared && is_pmem) {
+        mmap_xflags |= MAP_SYNC;
+    }
 
     offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
+ retry_mmap_fd:
     ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
                 MAP_FIXED |
                 (fd == -1 ? MAP_ANONYMOUS : 0) |
-                (shared ? MAP_SHARED : MAP_PRIVATE),
+                (shared ? MAP_SHARED : MAP_PRIVATE) | mmap_xflags,
                 fd, 0);
+    if ((ptr1 == MAP_FAILED) && (mmap_xflags & MAP_SYNC)) {
+        mmap_xflags &= ~MAP_SYNC;
+        goto retry_mmap_fd;
+    }
     if (ptr1 == MAP_FAILED) {
         munmap(ptr, total);
         return MAP_FAILED;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH V8 4/5] hostmem: add more information in error messages
  2019-01-02  5:25 [Qemu-devel] [PATCH V8 0/5] support MAP_SYNC for memory-backend-file Zhang Yi
                   ` (2 preceding siblings ...)
  2019-01-02  5:26 ` [Qemu-devel] [PATCH V8 3/5] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap() Zhang Yi
@ 2019-01-02  5:26 ` Zhang Yi
  2019-01-14 19:16   ` Eduardo Habkost
  2019-01-02  5:26 ` [Qemu-devel] [PATCH V8 5/5] hostmem-file: add 'sync' option Zhang Yi
  4 siblings, 1 reply; 20+ messages in thread
From: Zhang Yi @ 2019-01-02  5:26 UTC (permalink / raw)
  To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang, mst
  Cc: qemu-devel, imammedo, dan.j.williams, ehabkost, Zhang Yi

When there are multiple memory backends in use, including the object type
name, ID and the property name in the error message can help users to
locate the error.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
---
 backends/hostmem-file.c | 6 ++++--
 backends/hostmem.c      | 8 +++++---
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index e640749..0dd7a90 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -82,7 +82,8 @@ static void set_mem_path(Object *o, const char *str, Error **errp)
     HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
 
     if (host_memory_backend_mr_inited(backend)) {
-        error_setg(errp, "cannot change property value");
+        error_setg(errp, "cannot change property 'mem-path' of %s",
+                   object_get_typename(o));
         return;
     }
     g_free(fb->mem_path);
@@ -120,7 +121,8 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
     uint64_t val;
 
     if (host_memory_backend_mr_inited(backend)) {
-        error_setg(&local_err, "cannot change property value");
+        error_setg(&local_err, "cannot change property '%s' of %s",
+                   name, object_get_typename(o));
         goto out;
     }
 
diff --git a/backends/hostmem.c b/backends/hostmem.c
index 1a89342..e2bcf9f 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -47,7 +47,8 @@ host_memory_backend_set_size(Object *obj, Visitor *v, const char *name,
     uint64_t value;
 
     if (host_memory_backend_mr_inited(backend)) {
-        error_setg(&local_err, "cannot change property value");
+        error_setg(&local_err, "cannot change property %s of %s ",
+                   name, object_get_typename(obj));
         goto out;
     }
 
@@ -56,8 +57,9 @@ host_memory_backend_set_size(Object *obj, Visitor *v, const char *name,
         goto out;
     }
     if (!value) {
-        error_setg(&local_err, "Property '%s.%s' doesn't take value '%"
-                   PRIu64 "'", object_get_typename(obj), name, value);
+        error_setg(&local_err,
+                   "property '%s' of %s doesn't take value '%" PRIu64 "'",
+                   name, object_get_typename(obj), value);
         goto out;
     }
     backend->size = value;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel]  [PATCH V8 5/5] hostmem-file: add 'sync' option
  2019-01-02  5:25 [Qemu-devel] [PATCH V8 0/5] support MAP_SYNC for memory-backend-file Zhang Yi
                   ` (3 preceding siblings ...)
  2019-01-02  5:26 ` [Qemu-devel] [PATCH V8 4/5] hostmem: add more information in error messages Zhang Yi
@ 2019-01-02  5:26 ` Zhang Yi
  2019-01-14 19:39   ` Eduardo Habkost
  2019-01-15  3:31   ` Michael S. Tsirkin
  4 siblings, 2 replies; 20+ messages in thread
From: Zhang Yi @ 2019-01-02  5:26 UTC (permalink / raw)
  To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang, mst
  Cc: qemu-devel, imammedo, dan.j.williams, ehabkost, Zhang Yi

This option controls will mmap the memory backend file with MAP_SYNC flag,
which can ensure filesystem metadata consistent even after a system crash
or power failure, if MAP_SYNC flag is supported by the host kernel(Linux
kernel 4.15 and later) and the backend is a file supporting DAX (e.g.,
file on ext4/xfs file system mounted with '-o dax').

It can take one of following values:
 - on:  try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
        'share=off' or 'pmem!=on', QEMU will not pass this flags to
	mmap(2)
 - off: default, never pass MAP_SYNC to mmap(2)

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
---
 backends/hostmem-file.c   | 28 ++++++++++++++++++++++++++++
 docs/nvdimm.txt           | 23 ++++++++++++++++++++++-
 exec.c                    |  2 +-
 include/exec/memory.h     |  4 ++++
 include/exec/ram_addr.h   |  1 +
 include/qemu/mmap-alloc.h |  1 +
 qemu-options.hx           | 19 ++++++++++++++++++-
 util/mmap-alloc.c         |  4 ++--
 8 files changed, 77 insertions(+), 5 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 0dd7a90..3d39032 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -36,6 +36,7 @@ struct HostMemoryBackendFile {
     uint64_t align;
     bool discard_data;
     bool is_pmem;
+    bool sync;
 };
 
 static void
@@ -62,6 +63,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
                                  path,
                                  backend->size, fb->align,
                                  (backend->share ? RAM_SHARED : 0) |
+                                 (fb->sync ? RAM_SYNC : 0) |
                                  (fb->is_pmem ? RAM_PMEM : 0),
                                  fb->mem_path, errp);
         g_free(path);
@@ -136,6 +138,29 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
     error_propagate(errp, local_err);
 }
 
+static bool file_memory_backend_get_sync(Object *o, Error **errp)
+{
+    return MEMORY_BACKEND_FILE(o)->sync;
+}
+
+static void file_memory_backend_set_sync(
+    Object *obj, bool value, Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(obj);
+
+    if (host_memory_backend_mr_inited(backend)) {
+        error_setg(errp, "cannot change property sync of %s",
+                   object_get_typename(obj));
+        goto out;
+    }
+
+    fb->sync = value;
+
+ out:
+    return;
+}
+
 static bool file_memory_backend_get_pmem(Object *o, Error **errp)
 {
     return MEMORY_BACKEND_FILE(o)->is_pmem;
@@ -203,6 +228,9 @@ file_backend_class_init(ObjectClass *oc, void *data)
     object_class_property_add_bool(oc, "pmem",
         file_memory_backend_get_pmem, file_memory_backend_set_pmem,
         &error_abort);
+    object_class_property_add_bool(oc, "sync",
+        file_memory_backend_get_sync, file_memory_backend_set_sync,
+        &error_abort);
 }
 
 static void file_backend_instance_finalize(Object *o)
diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index 5f158a6..30db458 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -142,11 +142,32 @@ backend of vNVDIMM:
 Guest Data Persistence
 ----------------------
 
+vNVDIMM is designed and implemented to guarantee the guest data
+persistence on the backends even on the host crash and power
+failures. However, there are still some requirements and limitations
+as explained below.
+
 Though QEMU supports multiple types of vNVDIMM backends on Linux,
-currently the only one that can guarantee the guest write persistence
+if MAP_SYNC is not supported by the host kernel and the backends,
+the only backend that can guarantee the guest write persistence
 is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
 which all guest access do not involve any host-side kernel cache.
 
+mmap(2) flag MAP_SYNC is added since Linux kernel 4.15. On such
+systems, QEMU can mmap(2) the backend with MAP_SYNC, which can ensure
+filesystem metadata consistent even after a system crash or power
+failure. Besides the host kernel support, enabling MAP_SYNC in QEMU
+also requires:
+
+ - the backend is a file supporting DAX, e.g., a file on an ext4 or
+   xfs file system mounted with '-o dax',
+
+ - 'sync' option of memory-backend-file is on, and
+
+ - 'share' option of memory-backend-file is 'on'.
+
+ - 'pmem' option of memory-backend-file is 'on'
+
 When using other types of backends, it's suggested to set 'unarmed'
 option of '-device nvdimm' to 'on', which sets the unarmed flag of the
 guest NVDIMM region mapping structure.  This unarmed flag indicates
diff --git a/exec.c b/exec.c
index e92a7da..dc4d180 100644
--- a/exec.c
+++ b/exec.c
@@ -2241,7 +2241,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
     int64_t file_size;
 
     /* Just support these ram flags by now. */
-    assert((ram_flags & ~(RAM_SHARED | RAM_PMEM)) == 0);
+    assert((ram_flags & ~(RAM_SHARED | RAM_PMEM | RAM_SYNC)) == 0);
 
     if (xen_enabled()) {
         error_setg(errp, "-mem-path not supported with Xen");
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 6e30c23..9297b1c 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -26,6 +26,7 @@
 #include "qom/object.h"
 #include "qemu/rcu.h"
 #include "hw/qdev-core.h"
+#include "qapi/error.h"
 
 #define RAM_ADDR_INVALID (~(ram_addr_t)0)
 
@@ -136,6 +137,9 @@ typedef unsigned QEMU_BITWISE QemuMmapFlags;
 /* RAM is a persistent kind memory */
 #define RAM_PMEM ((QEMU_FORCE QemuMmapFlags) (1 << 5))
 
+/* RAM can be mmap by a MAP_SYNC flag */
+#define RAM_SYNC ((QEMU_FORCE QemuMmapFlags) (1 << 6))
+
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
                                        IOMMUNotifierFlag flags,
                                        hwaddr start, hwaddr end,
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 9ecd911..d239ce7 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -87,6 +87,7 @@ long qemu_getrampagesize(void);
  *              or bit-or of following values
  *              - RAM_SHARED: mmap the backing file or device with MAP_SHARED
  *              - RAM_PMEM: the backend @mem_path or @fd is persistent memory
+ *              - RAM_SYNC:   mmap with MAP_SYNC flag
  *              Other bits are ignored.
  *  @mem_path or @fd: specify the backing file or device
  *  @errp: pointer to Error*, to store an error if it happens
diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index 6fe6ed4..1755a8b 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -18,6 +18,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path);
  *  @flags: specifies additional properties of the mapping, which can be one or
  *          bit-or of following values
  *          - RAM_SHARED: mmap with MAP_SHARED flag
+ *          - RAM_SYNC:   mmap with MAP_SYNC flag
  *          Other bits are ignored.
  *
  * Return:
diff --git a/qemu-options.hx b/qemu-options.hx
index 08f8516..0f51d08 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3928,7 +3928,7 @@ property must be set.  These objects are placed in the
 
 @table @option
 
-@item -object memory-backend-file,id=@var{id},size=@var{size},mem-path=@var{dir},share=@var{on|off},discard-data=@var{on|off},merge=@var{on|off},dump=@var{on|off},prealloc=@var{on|off},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave},align=@var{align}
+@item -object memory-backend-file,id=@var{id},size=@var{size},mem-path=@var{dir},share=@var{on|off},discard-data=@var{on|off},merge=@var{on|off},dump=@var{on|off},prealloc=@var{on|off},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave},align=@var{align},sync=@var{on|off}
 
 Creates a memory file backend object, which can be used to back
 the guest RAM with huge pages.
@@ -4003,6 +4003,23 @@ If @option{pmem} is set to 'on', QEMU will take necessary operations to
 guarantee the persistence of its own writes to @option{mem-path}
 (e.g. in vNVDIMM label emulation and live migration).
 
+The @option{sync} option specifies whether QEMU mmap(2) @option{mem-path}
+with MAP_SYNC flag, which can ensure the file metadata is in sync to
+@option{mem-path} even on the host crash and power failures. MAP_SYNC
+requires supports from both the host kernel (since Linux kernel 4.15)
+and @option{mem-path} (only files supporting DAX). It can take one of
+following values:
+
+@table @option
+@item @var{on}
+try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
+@option{share}=@var{off}, @option{pmem}=@var{off} QEMU will not pass
+this flags to kernel.
+
+@item @var{off} (default)
+never pass MAP_SYNC to mmap(2)
+@end table
+
 @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
 
 Creates a memory backend object, which can be used to back the guest RAM.
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index a9d5e56..33a7639 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -99,7 +99,7 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
     void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
 #endif
     bool shared = flags & RAM_SHARED;
-    bool is_pmem = flags & RAM_PMEM;
+    bool is_pmemsync = (flags & RAM_PMEM) && (flags & RAM_SYNC);
     int mmap_xflags = 0;
     size_t offset;
     void *ptr1;
@@ -111,7 +111,7 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
     assert(is_power_of_2(align));
     /* Always align to host page size */
     assert(align >= getpagesize());
-    if (shared && is_pmem) {
+    if (shared && is_pmemsync) {
         mmap_xflags |= MAP_SYNC;
     }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH V8 2/5] util/mmap-alloc: switch qemu_ram_mmap() to 'flags' parameter
  2019-01-02  5:26 ` [Qemu-devel] [PATCH V8 2/5] util/mmap-alloc: switch qemu_ram_mmap() to 'flags' parameter Zhang Yi
@ 2019-01-14 18:50   ` Eduardo Habkost
  2019-01-14 19:04     ` Michael S. Tsirkin
  0 siblings, 1 reply; 20+ messages in thread
From: Eduardo Habkost @ 2019-01-14 18:50 UTC (permalink / raw)
  To: Zhang Yi
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang, mst,
	imammedo, dan.j.williams, qemu-devel

On Wed, Jan 02, 2019 at 01:26:06PM +0800, Zhang Yi wrote:
> As more flag parameters besides the existing 'shared' are going to be
> added to qemu_ram_mmap(), let's switch 'shared' to a 'flags' parameter
> in advance, so as to ease the further additions.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> ---
>  exec.c                    |  7 ++++---
>  include/exec/memory.h     | 22 ++++++++++++++++------
>  include/qemu/mmap-alloc.h | 19 ++++++++++++++++++-
>  util/mmap-alloc.c         |  8 +++++---
>  util/oslib-posix.c        |  9 ++++++++-
>  5 files changed, 51 insertions(+), 14 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index bb6170d..e92a7da 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1810,6 +1810,7 @@ static void *file_ram_alloc(RAMBlock *block,
>                              ram_addr_t memory,
>                              int fd,
>                              bool truncate,
> +                            uint32_t flags,
>                              Error **errp)

I suggest documenting on the commit message why you are changing
file_ram_alloc() too.  The commit message mentions only
qemu_ram_mmap().

>  {
>      void *area;
> @@ -1859,8 +1860,7 @@ static void *file_ram_alloc(RAMBlock *block,
>          perror("ftruncate");
>      }
>  
> -    area = qemu_ram_mmap(fd, memory, block->mr->align,
> -                         block->flags & RAM_SHARED);
> +    area = qemu_ram_mmap(fd, memory, block->mr->align, flags);
>      if (area == MAP_FAILED) {
>          error_setg_errno(errp, errno,
>                           "unable to map backing store for guest RAM");
> @@ -2279,7 +2279,8 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
>      new_block->used_length = size;
>      new_block->max_length = size;
>      new_block->flags = ram_flags;
> -    new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp);
> +    new_block->host = file_ram_alloc(new_block, size, fd, !file_size,
> +            ram_flags, errp);
>      if (!new_block->host) {
>          g_free(new_block);
>          return NULL;
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 667466b..6e30c23 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -103,28 +103,38 @@ struct IOMMUNotifier {
>  };
>  typedef struct IOMMUNotifier IOMMUNotifier;
>  
> +#ifdef __CHECKER__
> +#define QEMU_BITWISE __attribute__((bitwise))
> +#define QEMU_FORCE   __attribute__((force))
> +#else
> +#define QEMU_BITWISE
> +#define QEMU_FORCE
> +#endif
> +

I assume this is a sparse feature?

Why is it part of this patch?  I suggest doing this in a separate
patch series and in a common header file, so other developers
have a better chance to review it and decide how to use this
sparse feature in QEMU.


> +typedef unsigned QEMU_BITWISE QemuMmapFlags;
> +
>  /* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */
> -#define RAM_PREALLOC   (1 << 0)
> +#define RAM_PREALLOC ((QEMU_FORCE QemuMmapFlags) (1 << 0))
>  
>  /* RAM is mmap-ed with MAP_SHARED */
> -#define RAM_SHARED     (1 << 1)
> +#define RAM_SHARED ((QEMU_FORCE QemuMmapFlags) (1 << 1))
>  
>  /* Only a portion of RAM (used_length) is actually used, and migrated.
>   * This used_length size can change across reboots.
>   */
> -#define RAM_RESIZEABLE (1 << 2)
> +#define RAM_RESIZEABLE ((QEMU_FORCE QemuMmapFlags) (1 << 2))
>  
>  /* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically
>   * zero the page and wake waiting processes.
>   * (Set during postcopy)
>   */
> -#define RAM_UF_ZEROPAGE (1 << 3)
> +#define RAM_UF_ZEROPAGE ((QEMU_FORCE QemuMmapFlags) (1 << 3))
>  
>  /* RAM can be migrated */
> -#define RAM_MIGRATABLE (1 << 4)
> +#define RAM_MIGRATABLE ((QEMU_FORCE QemuMmapFlags) (1 << 4))
>  
>  /* RAM is a persistent kind memory */
> -#define RAM_PMEM (1 << 5)
> +#define RAM_PMEM ((QEMU_FORCE QemuMmapFlags) (1 << 5))
>  
>  static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
>                                         IOMMUNotifierFlag flags,
> diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
> index 50385e3..6fe6ed4 100644
> --- a/include/qemu/mmap-alloc.h
> +++ b/include/qemu/mmap-alloc.h
> @@ -7,7 +7,24 @@ size_t qemu_fd_getpagesize(int fd);
>  
>  size_t qemu_mempath_getpagesize(const char *mem_path);
>  
> -void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared);
> +/**
> + * qemu_ram_mmap: mmap the specified file or device.
> + *
> + * Parameters:
> + *  @fd: the file or the device to mmap
> + *  @size: the number of bytes to be mmaped
> + *  @align: if not zero, specify the alignment of the starting mapping address;
> + *          otherwise, the alignment in use will be determined by QEMU.
> + *  @flags: specifies additional properties of the mapping, which can be one or
> + *          bit-or of following values
> + *          - RAM_SHARED: mmap with MAP_SHARED flag
> + *          Other bits are ignored.
> + *
> + * Return:
> + *  On success, return a pointer to the mapped area.
> + *  On failure, return MAP_FAILED.
> + */
> +void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags);
>  
>  void qemu_ram_munmap(void *ptr, size_t size);
>  
> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> index fd329ec..8f0a740 100644
> --- a/util/mmap-alloc.c
> +++ b/util/mmap-alloc.c
> @@ -13,6 +13,7 @@
>  #include "qemu/osdep.h"
>  #include "qemu/mmap-alloc.h"
>  #include "qemu/host-utils.h"
> +#include "exec/memory.h"
>  
>  #define HUGETLBFS_MAGIC       0x958458f6
>  
> @@ -75,7 +76,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
>      return getpagesize();
>  }
>  
> -void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
> +void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
>  {
>      /*
>       * Note: this always allocates at least one extra page of virtual address
> @@ -92,11 +93,12 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
>       * anonymous memory is OK.
>       */
>      int anonfd = fd == -1 || qemu_fd_getpagesize(fd) == getpagesize() ? -1 : fd;
> -    int flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
> -    void *ptr = mmap(0, total, PROT_NONE, flags | MAP_PRIVATE, anonfd, 0);
> +    int mmap_flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
> +    void *ptr = mmap(0, total, PROT_NONE, mmap_flags | MAP_PRIVATE, anonfd, 0);
>  #else
>      void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
>  #endif
> +    bool shared = flags & RAM_SHARED;
>      size_t offset;
>      void *ptr1;
>  
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index fbd0dc8..75a0171 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -54,6 +54,7 @@
>  #endif
>  
>  #include "qemu/mmap-alloc.h"
> +#include "exec/memory.h"
>  
>  #ifdef CONFIG_DEBUG_STACK_USAGE
>  #include "qemu/error-report.h"
> @@ -203,7 +204,13 @@ void *qemu_memalign(size_t alignment, size_t size)
>  void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
>  {
>      size_t align = QEMU_VMALLOC_ALIGN;
> -    void *ptr = qemu_ram_mmap(-1, size, align, shared);
> +    uint32_t flags = 0;
> +    void *ptr;
> +
> +    if (shared) {
> +        flags = RAM_SHARED;
> +    }
> +    ptr = qemu_ram_mmap(-1, size, align, flags);
>  
>      if (ptr == MAP_FAILED) {
>          return NULL;
> -- 
> 2.7.4
> 
> 

-- 
Eduardo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH V8 1/5] numa: Fixed the memory leak of numa error message
  2019-01-02  5:25 ` [Qemu-devel] [PATCH V8 1/5] numa: Fixed the memory leak of numa error message Zhang Yi
@ 2019-01-14 18:54   ` Eduardo Habkost
  0 siblings, 0 replies; 20+ messages in thread
From: Eduardo Habkost @ 2019-01-14 18:54 UTC (permalink / raw)
  To: Zhang Yi
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang, mst,
	imammedo, dan.j.williams, qemu-devel

On Wed, Jan 02, 2019 at 01:25:56PM +0800, Zhang Yi wrote:
> object_get_canonical_path_component() returns a string which
> must be freed using g_free().
> 
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> Reviewed-by: Pankaj gupta <pagupta@redhat.com>
> Reviewed-by: Igor Mammedov <imammedo@redhat.com>

Queued, thanks.

-- 
Eduardo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH V8 2/5] util/mmap-alloc: switch qemu_ram_mmap() to 'flags' parameter
  2019-01-14 18:50   ` Eduardo Habkost
@ 2019-01-14 19:04     ` Michael S. Tsirkin
  2019-01-15  2:39       ` Yi Zhang
  0 siblings, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2019-01-14 19:04 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Zhang Yi, xiaoguangrong.eric, stefanha, pbonzini, pagupta,
	yu.c.zhang, imammedo, dan.j.williams, qemu-devel

On Mon, Jan 14, 2019 at 04:50:36PM -0200, Eduardo Habkost wrote:
> On Wed, Jan 02, 2019 at 01:26:06PM +0800, Zhang Yi wrote:
> > As more flag parameters besides the existing 'shared' are going to be
> > added to qemu_ram_mmap(), let's switch 'shared' to a 'flags' parameter
> > in advance, so as to ease the further additions.
> > 
> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > ---
> >  exec.c                    |  7 ++++---
> >  include/exec/memory.h     | 22 ++++++++++++++++------
> >  include/qemu/mmap-alloc.h | 19 ++++++++++++++++++-
> >  util/mmap-alloc.c         |  8 +++++---
> >  util/oslib-posix.c        |  9 ++++++++-
> >  5 files changed, 51 insertions(+), 14 deletions(-)
> > 
> > diff --git a/exec.c b/exec.c
> > index bb6170d..e92a7da 100644
> > --- a/exec.c
> > +++ b/exec.c
> > @@ -1810,6 +1810,7 @@ static void *file_ram_alloc(RAMBlock *block,
> >                              ram_addr_t memory,
> >                              int fd,
> >                              bool truncate,
> > +                            uint32_t flags,
> >                              Error **errp)
> 
> I suggest documenting on the commit message why you are changing
> file_ram_alloc() too.  The commit message mentions only
> qemu_ram_mmap().
> 
> >  {
> >      void *area;
> > @@ -1859,8 +1860,7 @@ static void *file_ram_alloc(RAMBlock *block,
> >          perror("ftruncate");
> >      }
> >  
> > -    area = qemu_ram_mmap(fd, memory, block->mr->align,
> > -                         block->flags & RAM_SHARED);
> > +    area = qemu_ram_mmap(fd, memory, block->mr->align, flags);
> >      if (area == MAP_FAILED) {
> >          error_setg_errno(errp, errno,
> >                           "unable to map backing store for guest RAM");
> > @@ -2279,7 +2279,8 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
> >      new_block->used_length = size;
> >      new_block->max_length = size;
> >      new_block->flags = ram_flags;
> > -    new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp);
> > +    new_block->host = file_ram_alloc(new_block, size, fd, !file_size,
> > +            ram_flags, errp);
> >      if (!new_block->host) {
> >          g_free(new_block);
> >          return NULL;
> > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > index 667466b..6e30c23 100644
> > --- a/include/exec/memory.h
> > +++ b/include/exec/memory.h
> > @@ -103,28 +103,38 @@ struct IOMMUNotifier {
> >  };
> >  typedef struct IOMMUNotifier IOMMUNotifier;
> >  
> > +#ifdef __CHECKER__
> > +#define QEMU_BITWISE __attribute__((bitwise))
> > +#define QEMU_FORCE   __attribute__((force))
> > +#else
> > +#define QEMU_BITWISE
> > +#define QEMU_FORCE
> > +#endif
> > +
> 
> I assume this is a sparse feature?
> 
> Why is it part of this patch?  I suggest doing this in a separate
> patch series and in a common header file, so other developers
> have a better chance to review it and decide how to use this
> sparse feature in QEMU.

It's needed for this series but yes, this ifdefery belongs in
a more central header. Maybe qemu/osdep.h
And it needs documentation and be a separate patch.


> 
> > +typedef unsigned QEMU_BITWISE QemuMmapFlags;
> > +
> >  /* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */
> > -#define RAM_PREALLOC   (1 << 0)
> > +#define RAM_PREALLOC ((QEMU_FORCE QemuMmapFlags) (1 << 0))
> >  
> >  /* RAM is mmap-ed with MAP_SHARED */
> > -#define RAM_SHARED     (1 << 1)
> > +#define RAM_SHARED ((QEMU_FORCE QemuMmapFlags) (1 << 1))
> >  
> >  /* Only a portion of RAM (used_length) is actually used, and migrated.
> >   * This used_length size can change across reboots.
> >   */
> > -#define RAM_RESIZEABLE (1 << 2)
> > +#define RAM_RESIZEABLE ((QEMU_FORCE QemuMmapFlags) (1 << 2))
> >  
> >  /* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically
> >   * zero the page and wake waiting processes.
> >   * (Set during postcopy)
> >   */
> > -#define RAM_UF_ZEROPAGE (1 << 3)
> > +#define RAM_UF_ZEROPAGE ((QEMU_FORCE QemuMmapFlags) (1 << 3))
> >  
> >  /* RAM can be migrated */
> > -#define RAM_MIGRATABLE (1 << 4)
> > +#define RAM_MIGRATABLE ((QEMU_FORCE QemuMmapFlags) (1 << 4))
> >  
> >  /* RAM is a persistent kind memory */
> > -#define RAM_PMEM (1 << 5)
> > +#define RAM_PMEM ((QEMU_FORCE QemuMmapFlags) (1 << 5))
> >  
> >  static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
> >                                         IOMMUNotifierFlag flags,
> > diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
> > index 50385e3..6fe6ed4 100644
> > --- a/include/qemu/mmap-alloc.h
> > +++ b/include/qemu/mmap-alloc.h
> > @@ -7,7 +7,24 @@ size_t qemu_fd_getpagesize(int fd);
> >  
> >  size_t qemu_mempath_getpagesize(const char *mem_path);
> >  
> > -void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared);
> > +/**
> > + * qemu_ram_mmap: mmap the specified file or device.
> > + *
> > + * Parameters:
> > + *  @fd: the file or the device to mmap
> > + *  @size: the number of bytes to be mmaped
> > + *  @align: if not zero, specify the alignment of the starting mapping address;
> > + *          otherwise, the alignment in use will be determined by QEMU.
> > + *  @flags: specifies additional properties of the mapping, which can be one or
> > + *          bit-or of following values
> > + *          - RAM_SHARED: mmap with MAP_SHARED flag
> > + *          Other bits are ignored.
> > + *
> > + * Return:
> > + *  On success, return a pointer to the mapped area.
> > + *  On failure, return MAP_FAILED.
> > + */
> > +void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags);
> >  
> >  void qemu_ram_munmap(void *ptr, size_t size);
> >  
> > diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> > index fd329ec..8f0a740 100644
> > --- a/util/mmap-alloc.c
> > +++ b/util/mmap-alloc.c
> > @@ -13,6 +13,7 @@
> >  #include "qemu/osdep.h"
> >  #include "qemu/mmap-alloc.h"
> >  #include "qemu/host-utils.h"
> > +#include "exec/memory.h"
> >  
> >  #define HUGETLBFS_MAGIC       0x958458f6
> >  
> > @@ -75,7 +76,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
> >      return getpagesize();
> >  }
> >  
> > -void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
> > +void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
> >  {
> >      /*
> >       * Note: this always allocates at least one extra page of virtual address
> > @@ -92,11 +93,12 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
> >       * anonymous memory is OK.
> >       */
> >      int anonfd = fd == -1 || qemu_fd_getpagesize(fd) == getpagesize() ? -1 : fd;
> > -    int flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
> > -    void *ptr = mmap(0, total, PROT_NONE, flags | MAP_PRIVATE, anonfd, 0);
> > +    int mmap_flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
> > +    void *ptr = mmap(0, total, PROT_NONE, mmap_flags | MAP_PRIVATE, anonfd, 0);
> >  #else
> >      void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> >  #endif
> > +    bool shared = flags & RAM_SHARED;
> >      size_t offset;
> >      void *ptr1;
> >  
> > diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> > index fbd0dc8..75a0171 100644
> > --- a/util/oslib-posix.c
> > +++ b/util/oslib-posix.c
> > @@ -54,6 +54,7 @@
> >  #endif
> >  
> >  #include "qemu/mmap-alloc.h"
> > +#include "exec/memory.h"
> >  
> >  #ifdef CONFIG_DEBUG_STACK_USAGE
> >  #include "qemu/error-report.h"
> > @@ -203,7 +204,13 @@ void *qemu_memalign(size_t alignment, size_t size)
> >  void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
> >  {
> >      size_t align = QEMU_VMALLOC_ALIGN;
> > -    void *ptr = qemu_ram_mmap(-1, size, align, shared);
> > +    uint32_t flags = 0;
> > +    void *ptr;
> > +
> > +    if (shared) {
> > +        flags = RAM_SHARED;
> > +    }
> > +    ptr = qemu_ram_mmap(-1, size, align, flags);
> >  
> >      if (ptr == MAP_FAILED) {
> >          return NULL;
> > -- 
> > 2.7.4
> > 
> > 
> 
> -- 
> Eduardo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH V8 3/5] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  2019-01-02  5:26 ` [Qemu-devel] [PATCH V8 3/5] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap() Zhang Yi
@ 2019-01-14 19:07   ` Eduardo Habkost
  2019-01-15  2:49     ` Yi Zhang
  0 siblings, 1 reply; 20+ messages in thread
From: Eduardo Habkost @ 2019-01-14 19:07 UTC (permalink / raw)
  To: Zhang Yi
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang, mst,
	imammedo, dan.j.williams, qemu-devel

On Wed, Jan 02, 2019 at 01:26:15PM +0800, Zhang Yi wrote:
> When a file supporting DAX is used as vNVDIMM backend, mmap it with
> MAP_SYNC flag in addition which can ensure file system metadata
> synced in each guest writes to the backend file, without other QEMU
> actions (e.g., periodic fsync() by QEMU).
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> ---
>  include/qemu/osdep.h | 16 ++++++++++++++++
>  util/mmap-alloc.c    | 12 +++++++++++-
>  2 files changed, 27 insertions(+), 1 deletion(-)
> 
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index 3bf48bc..bb1eba1 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -410,6 +410,22 @@ void qemu_anon_ram_free(void *ptr, size_t size);
>  #  define QEMU_VMALLOC_ALIGN getpagesize()
>  #endif
>  
> +/*
> + * MAP_SHARED_VALIDATE and MAP_SYNC are introduced in Linux kernel
> + * 4.15, so they may not be defined when compiling on older kernels.
> + */
> +#ifdef CONFIG_LINUX
> +
> +#include <asm-generic/mman.h>
> +
> +#ifndef MAP_SYNC
> +#define MAP_SYNC 0x0
> +#endif
> +
> +#else  /* !CONFIG_LINUX */
> +#define MAP_SYNC              0x0
> +#endif /* CONFIG_LINUX */
> +
>  #ifdef CONFIG_POSIX
>  struct qemu_signalfd_siginfo {
>      uint32_t ssi_signo;   /* Signal number */
> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> index 8f0a740..a9d5e56 100644
> --- a/util/mmap-alloc.c
> +++ b/util/mmap-alloc.c
> @@ -99,6 +99,8 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
>      void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
>  #endif
>      bool shared = flags & RAM_SHARED;
> +    bool is_pmem = flags & RAM_PMEM;
> +    int mmap_xflags = 0;
>      size_t offset;
>      void *ptr1;
>  
> @@ -109,13 +111,21 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
>      assert(is_power_of_2(align));
>      /* Always align to host page size */
>      assert(align >= getpagesize());
> +    if (shared && is_pmem) {
> +        mmap_xflags |= MAP_SYNC;
> +    }
>  
>      offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
> + retry_mmap_fd:
>      ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
>                  MAP_FIXED |
>                  (fd == -1 ? MAP_ANONYMOUS : 0) |
> -                (shared ? MAP_SHARED : MAP_PRIVATE),
> +                (shared ? MAP_SHARED : MAP_PRIVATE) | mmap_xflags,
>                  fd, 0);
> +    if ((ptr1 == MAP_FAILED) && (mmap_xflags & MAP_SYNC)) {
> +        mmap_xflags &= ~MAP_SYNC;
> +        goto retry_mmap_fd;

Do we have use cases where using pmem=on without MAP_SYNC isn't
going to cause problems?  If not, shouldn't we at least print a
warning here?  Otherwise, won't we still need an option for cases
that require MAP_SYNC to be working?

> +    }

-- 
Eduardo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH V8 4/5] hostmem: add more information in error messages
  2019-01-02  5:26 ` [Qemu-devel] [PATCH V8 4/5] hostmem: add more information in error messages Zhang Yi
@ 2019-01-14 19:16   ` Eduardo Habkost
  0 siblings, 0 replies; 20+ messages in thread
From: Eduardo Habkost @ 2019-01-14 19:16 UTC (permalink / raw)
  To: Zhang Yi
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang, mst,
	imammedo, dan.j.williams, qemu-devel

On Wed, Jan 02, 2019 at 01:26:24PM +0800, Zhang Yi wrote:
> When there are multiple memory backends in use, including the object type
> name, ID and the property name in the error message can help users to
> locate the error.

I don't see the object ID anywhere in the code below.

With the commit message corrected:

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>

I'm queueing it on machine-next.

> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> ---
>  backends/hostmem-file.c | 6 ++++--
>  backends/hostmem.c      | 8 +++++---
>  2 files changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
> index e640749..0dd7a90 100644
> --- a/backends/hostmem-file.c
> +++ b/backends/hostmem-file.c
> @@ -82,7 +82,8 @@ static void set_mem_path(Object *o, const char *str, Error **errp)
>      HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
>  
>      if (host_memory_backend_mr_inited(backend)) {
> -        error_setg(errp, "cannot change property value");
> +        error_setg(errp, "cannot change property 'mem-path' of %s",
> +                   object_get_typename(o));
>          return;
>      }
>      g_free(fb->mem_path);
> @@ -120,7 +121,8 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
>      uint64_t val;
>  
>      if (host_memory_backend_mr_inited(backend)) {
> -        error_setg(&local_err, "cannot change property value");
> +        error_setg(&local_err, "cannot change property '%s' of %s",
> +                   name, object_get_typename(o));
>          goto out;
>      }
>  
> diff --git a/backends/hostmem.c b/backends/hostmem.c
> index 1a89342..e2bcf9f 100644
> --- a/backends/hostmem.c
> +++ b/backends/hostmem.c
> @@ -47,7 +47,8 @@ host_memory_backend_set_size(Object *obj, Visitor *v, const char *name,
>      uint64_t value;
>  
>      if (host_memory_backend_mr_inited(backend)) {
> -        error_setg(&local_err, "cannot change property value");
> +        error_setg(&local_err, "cannot change property %s of %s ",
> +                   name, object_get_typename(obj));
>          goto out;
>      }
>  
> @@ -56,8 +57,9 @@ host_memory_backend_set_size(Object *obj, Visitor *v, const char *name,
>          goto out;
>      }
>      if (!value) {
> -        error_setg(&local_err, "Property '%s.%s' doesn't take value '%"
> -                   PRIu64 "'", object_get_typename(obj), name, value);
> +        error_setg(&local_err,
> +                   "property '%s' of %s doesn't take value '%" PRIu64 "'",
> +                   name, object_get_typename(obj), value);
>          goto out;
>      }
>      backend->size = value;
> -- 
> 2.7.4
> 
> 

-- 
Eduardo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH V8 5/5] hostmem-file: add 'sync' option
  2019-01-02  5:26 ` [Qemu-devel] [PATCH V8 5/5] hostmem-file: add 'sync' option Zhang Yi
@ 2019-01-14 19:39   ` Eduardo Habkost
  2019-01-15  3:13     ` Yi Zhang
  2019-01-15  3:31   ` Michael S. Tsirkin
  1 sibling, 1 reply; 20+ messages in thread
From: Eduardo Habkost @ 2019-01-14 19:39 UTC (permalink / raw)
  To: Zhang Yi
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang, mst,
	imammedo, dan.j.williams, qemu-devel

On Wed, Jan 02, 2019 at 01:26:34PM +0800, Zhang Yi wrote:
> This option controls will mmap the memory backend file with MAP_SYNC flag,
> which can ensure filesystem metadata consistent even after a system crash
> or power failure, if MAP_SYNC flag is supported by the host kernel(Linux
> kernel 4.15 and later) and the backend is a file supporting DAX (e.g.,
> file on ext4/xfs file system mounted with '-o dax').
> 
> It can take one of following values:
>  - on:  try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
>         'share=off' or 'pmem!=on', QEMU will not pass this flags to
> 	mmap(2)
>  - off: default, never pass MAP_SYNC to mmap(2)
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> ---
[...]
> +vNVDIMM is designed and implemented to guarantee the guest data
> +persistence on the backends even on the host crash and power
> +failures. However, there are still some requirements and limitations
> +as explained below.
> +
>  Though QEMU supports multiple types of vNVDIMM backends on Linux,
> -currently the only one that can guarantee the guest write persistence
> +if MAP_SYNC is not supported by the host kernel and the backends,
> +the only backend that can guarantee the guest write persistence
>  is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
>  which all guest access do not involve any host-side kernel cache.
>  
> +mmap(2) flag MAP_SYNC is added since Linux kernel 4.15. On such
> +systems, QEMU can mmap(2) the backend with MAP_SYNC, which can ensure
> +filesystem metadata consistent even after a system crash or power
> +failure. Besides the host kernel support, enabling MAP_SYNC in QEMU
> +also requires:
> +
> + - the backend is a file supporting DAX, e.g., a file on an ext4 or
> +   xfs file system mounted with '-o dax',
> +
> + - 'sync' option of memory-backend-file is on, and
> +
> + - 'share' option of memory-backend-file is 'on'.
> +
> + - 'pmem' option of memory-backend-file is 'on'

I miss one piece of information here: are there any negative
side-effects of enabling MAP_SYNC on a pmem=on backend?  Could it
affect performance?  If it has no negative effects, why don't we
try to always enable it whenever possible?


> +
>  When using other types of backends, it's suggested to set 'unarmed'
>  option of '-device nvdimm' to 'on', which sets the unarmed flag of the
>  guest NVDIMM region mapping structure.  This unarmed flag indicates
[...]
> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> index a9d5e56..33a7639 100644
> --- a/util/mmap-alloc.c
> +++ b/util/mmap-alloc.c
> @@ -99,7 +99,7 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
>      void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
>  #endif
>      bool shared = flags & RAM_SHARED;
> -    bool is_pmem = flags & RAM_PMEM;
> +    bool is_pmemsync = (flags & RAM_PMEM) && (flags & RAM_SYNC);

You seem to be reverting what you did on patch 3/5.  In patch
3/5, you were setting MAP_SYNC automatically on all pmem=on
backends.  Now, you are only setting MAP_SYNC only if sync=on is
set explicitly.

I don't know which behavior is better (see question above), but
it's better to start with the right behavior in the first place.

Also, I don't think we should clear MAP_SYNC silently if sync=on
was explicitly requested in the command-line.  If sync=on was
set, we should do exactly as told, and require MAP_SYNC.  If we
still want to support use cases where MAP_SYNC is desired but
optional (do we?), we can make 'sync' a OnOffAuto option.


>      int mmap_xflags = 0;
>      size_t offset;
>      void *ptr1;
> @@ -111,7 +111,7 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
>      assert(is_power_of_2(align));
>      /* Always align to host page size */
>      assert(align >= getpagesize());
> -    if (shared && is_pmem) {
> +    if (shared && is_pmemsync) {
>          mmap_xflags |= MAP_SYNC;
>      }
>  
> -- 
> 2.7.4
> 
> 

-- 
Eduardo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH V8 2/5] util/mmap-alloc: switch qemu_ram_mmap() to 'flags' parameter
  2019-01-14 19:04     ` Michael S. Tsirkin
@ 2019-01-15  2:39       ` Yi Zhang
  2019-01-15  3:16         ` Michael S. Tsirkin
  0 siblings, 1 reply; 20+ messages in thread
From: Yi Zhang @ 2019-01-15  2:39 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Eduardo Habkost, xiaoguangrong.eric, stefanha, pbonzini, pagupta,
	yu.c.zhang, imammedo, dan.j.williams, qemu-devel

On 2019-01-14 at 14:04:25 -0500, Michael S. Tsirkin wrote:
> On Mon, Jan 14, 2019 at 04:50:36PM -0200, Eduardo Habkost wrote:
> > On Wed, Jan 02, 2019 at 01:26:06PM +0800, Zhang Yi wrote:
> > > As more flag parameters besides the existing 'shared' are going to be
> > > added to qemu_ram_mmap(), let's switch 'shared' to a 'flags' parameter
> > > in advance, so as to ease the further additions.
> > > 
> > > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > > ---
> > >  exec.c                    |  7 ++++---
> > >  include/exec/memory.h     | 22 ++++++++++++++++------
> > >  include/qemu/mmap-alloc.h | 19 ++++++++++++++++++-
> > >  util/mmap-alloc.c         |  8 +++++---
> > >  util/oslib-posix.c        |  9 ++++++++-
> > >  5 files changed, 51 insertions(+), 14 deletions(-)
> > > 
> > > diff --git a/exec.c b/exec.c
> > > index bb6170d..e92a7da 100644
> > > --- a/exec.c
> > > +++ b/exec.c
> > > @@ -1810,6 +1810,7 @@ static void *file_ram_alloc(RAMBlock *block,
> > >                              ram_addr_t memory,
> > >                              int fd,
> > >                              bool truncate,
> > > +                            uint32_t flags,
> > >                              Error **errp)
> > 
> > I suggest documenting on the commit message why you are changing
> > file_ram_alloc() too.  The commit message mentions only
> > qemu_ram_mmap().
> > 
> > >  {
> > >      void *area;
> > > @@ -1859,8 +1860,7 @@ static void *file_ram_alloc(RAMBlock *block,
> > >          perror("ftruncate");
> > >      }
> > >  
> > > -    area = qemu_ram_mmap(fd, memory, block->mr->align,
> > > -                         block->flags & RAM_SHARED);
> > > +    area = qemu_ram_mmap(fd, memory, block->mr->align, flags);
> > >      if (area == MAP_FAILED) {
> > >          error_setg_errno(errp, errno,
> > >                           "unable to map backing store for guest RAM");
> > > @@ -2279,7 +2279,8 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
> > >      new_block->used_length = size;
> > >      new_block->max_length = size;
> > >      new_block->flags = ram_flags;
> > > -    new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp);
> > > +    new_block->host = file_ram_alloc(new_block, size, fd, !file_size,
> > > +            ram_flags, errp);
> > >      if (!new_block->host) {
> > >          g_free(new_block);
> > >          return NULL;
> > > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > > index 667466b..6e30c23 100644
> > > --- a/include/exec/memory.h
> > > +++ b/include/exec/memory.h
> > > @@ -103,28 +103,38 @@ struct IOMMUNotifier {
> > >  };
> > >  typedef struct IOMMUNotifier IOMMUNotifier;
> > >  
> > > +#ifdef __CHECKER__
> > > +#define QEMU_BITWISE __attribute__((bitwise))
> > > +#define QEMU_FORCE   __attribute__((force))
> > > +#else
> > > +#define QEMU_BITWISE
> > > +#define QEMU_FORCE
> > > +#endif
> > > +
> > 
> > I assume this is a sparse feature?
> > 
> > Why is it part of this patch?  I suggest doing this in a separate
> > patch series and in a common header file, so other developers
> > have a better chance to review it and decide how to use this
> > sparse feature in QEMU.
> 
> It's needed for this series but yes, this ifdefery belongs in
> a more central header. Maybe qemu/osdep.h
> And it needs documentation and be a separate patch.
Agree, Thank Michael's explanation, better to doing this in a separate
patch series.
> 
> 
> > 
> > > +typedef unsigned QEMU_BITWISE QemuMmapFlags;
> > > +
> > >  /* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */
> > > -#define RAM_PREALLOC   (1 << 0)
> > > +#define RAM_PREALLOC ((QEMU_FORCE QemuMmapFlags) (1 << 0))
> > >  
> > >  /* RAM is mmap-ed with MAP_SHARED */
> > > -#define RAM_SHARED     (1 << 1)
> > > +#define RAM_SHARED ((QEMU_FORCE QemuMmapFlags) (1 << 1))
> > >  
> > >  /* Only a portion of RAM (used_length) is actually used, and migrated.
> > >   * This used_length size can change across reboots.
> > >   */
> > > -#define RAM_RESIZEABLE (1 << 2)
> > > +#define RAM_RESIZEABLE ((QEMU_FORCE QemuMmapFlags) (1 << 2))
> > >  
> > >  /* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically
> > >   * zero the page and wake waiting processes.
> > >   * (Set during postcopy)
> > >   */
> > > -#define RAM_UF_ZEROPAGE (1 << 3)
> > > +#define RAM_UF_ZEROPAGE ((QEMU_FORCE QemuMmapFlags) (1 << 3))
> > >  
> > >  /* RAM can be migrated */
> > > -#define RAM_MIGRATABLE (1 << 4)
> > > +#define RAM_MIGRATABLE ((QEMU_FORCE QemuMmapFlags) (1 << 4))
> > >  
> > >  /* RAM is a persistent kind memory */
> > > -#define RAM_PMEM (1 << 5)
> > > +#define RAM_PMEM ((QEMU_FORCE QemuMmapFlags) (1 << 5))
> > >  
> > >  static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
> > >                                         IOMMUNotifierFlag flags,
> > > diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
> > > index 50385e3..6fe6ed4 100644
> > > --- a/include/qemu/mmap-alloc.h
> > > +++ b/include/qemu/mmap-alloc.h
> > > @@ -7,7 +7,24 @@ size_t qemu_fd_getpagesize(int fd);
> > >  
> > >  size_t qemu_mempath_getpagesize(const char *mem_path);
> > >  
> > > -void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared);
> > > +/**
> > > + * qemu_ram_mmap: mmap the specified file or device.
> > > + *
> > > + * Parameters:
> > > + *  @fd: the file or the device to mmap
> > > + *  @size: the number of bytes to be mmaped
> > > + *  @align: if not zero, specify the alignment of the starting mapping address;
> > > + *          otherwise, the alignment in use will be determined by QEMU.
> > > + *  @flags: specifies additional properties of the mapping, which can be one or
> > > + *          bit-or of following values
> > > + *          - RAM_SHARED: mmap with MAP_SHARED flag
> > > + *          Other bits are ignored.
> > > + *
> > > + * Return:
> > > + *  On success, return a pointer to the mapped area.
> > > + *  On failure, return MAP_FAILED.
> > > + */
> > > +void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags);
> > >  
> > >  void qemu_ram_munmap(void *ptr, size_t size);
> > >  
> > > diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> > > index fd329ec..8f0a740 100644
> > > --- a/util/mmap-alloc.c
> > > +++ b/util/mmap-alloc.c
> > > @@ -13,6 +13,7 @@
> > >  #include "qemu/osdep.h"
> > >  #include "qemu/mmap-alloc.h"
> > >  #include "qemu/host-utils.h"
> > > +#include "exec/memory.h"
> > >  
> > >  #define HUGETLBFS_MAGIC       0x958458f6
> > >  
> > > @@ -75,7 +76,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
> > >      return getpagesize();
> > >  }
> > >  
> > > -void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
> > > +void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
> > >  {
> > >      /*
> > >       * Note: this always allocates at least one extra page of virtual address
> > > @@ -92,11 +93,12 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
> > >       * anonymous memory is OK.
> > >       */
> > >      int anonfd = fd == -1 || qemu_fd_getpagesize(fd) == getpagesize() ? -1 : fd;
> > > -    int flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
> > > -    void *ptr = mmap(0, total, PROT_NONE, flags | MAP_PRIVATE, anonfd, 0);
> > > +    int mmap_flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
> > > +    void *ptr = mmap(0, total, PROT_NONE, mmap_flags | MAP_PRIVATE, anonfd, 0);
> > >  #else
> > >      void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> > >  #endif
> > > +    bool shared = flags & RAM_SHARED;
> > >      size_t offset;
> > >      void *ptr1;
> > >  
> > > diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> > > index fbd0dc8..75a0171 100644
> > > --- a/util/oslib-posix.c
> > > +++ b/util/oslib-posix.c
> > > @@ -54,6 +54,7 @@
> > >  #endif
> > >  
> > >  #include "qemu/mmap-alloc.h"
> > > +#include "exec/memory.h"
> > >  
> > >  #ifdef CONFIG_DEBUG_STACK_USAGE
> > >  #include "qemu/error-report.h"
> > > @@ -203,7 +204,13 @@ void *qemu_memalign(size_t alignment, size_t size)
> > >  void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
> > >  {
> > >      size_t align = QEMU_VMALLOC_ALIGN;
> > > -    void *ptr = qemu_ram_mmap(-1, size, align, shared);
> > > +    uint32_t flags = 0;
> > > +    void *ptr;
> > > +
> > > +    if (shared) {
> > > +        flags = RAM_SHARED;
> > > +    }
> > > +    ptr = qemu_ram_mmap(-1, size, align, flags);
> > >  
> > >      if (ptr == MAP_FAILED) {
> > >          return NULL;
> > > -- 
> > > 2.7.4
> > > 
> > > 
> > 
> > -- 
> > Eduardo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH V8 3/5] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  2019-01-14 19:07   ` Eduardo Habkost
@ 2019-01-15  2:49     ` Yi Zhang
  2019-01-15  3:34       ` Michael S. Tsirkin
  0 siblings, 1 reply; 20+ messages in thread
From: Yi Zhang @ 2019-01-15  2:49 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang, mst,
	imammedo, dan.j.williams, qemu-devel

On 2019-01-14 at 17:07:02 -0200, Eduardo Habkost wrote:
> On Wed, Jan 02, 2019 at 01:26:15PM +0800, Zhang Yi wrote:
> > When a file supporting DAX is used as vNVDIMM backend, mmap it with
> > MAP_SYNC flag in addition which can ensure file system metadata
> > synced in each guest writes to the backend file, without other QEMU
> > actions (e.g., periodic fsync() by QEMU).
> > 
> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > ---
> >  include/qemu/osdep.h | 16 ++++++++++++++++
> >  util/mmap-alloc.c    | 12 +++++++++++-
> >  2 files changed, 27 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> > index 3bf48bc..bb1eba1 100644
> > --- a/include/qemu/osdep.h
> > +++ b/include/qemu/osdep.h
> > @@ -410,6 +410,22 @@ void qemu_anon_ram_free(void *ptr, size_t size);
> >  #  define QEMU_VMALLOC_ALIGN getpagesize()
> >  #endif
> >  
> > +/*
> > + * MAP_SHARED_VALIDATE and MAP_SYNC are introduced in Linux kernel
> > + * 4.15, so they may not be defined when compiling on older kernels.
> > + */
> > +#ifdef CONFIG_LINUX
> > +
> > +#include <asm-generic/mman.h>
> > +
> > +#ifndef MAP_SYNC
> > +#define MAP_SYNC 0x0
> > +#endif
> > +
> > +#else  /* !CONFIG_LINUX */
> > +#define MAP_SYNC              0x0
> > +#endif /* CONFIG_LINUX */
> > +
> >  #ifdef CONFIG_POSIX
> >  struct qemu_signalfd_siginfo {
> >      uint32_t ssi_signo;   /* Signal number */
> > diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> > index 8f0a740..a9d5e56 100644
> > --- a/util/mmap-alloc.c
> > +++ b/util/mmap-alloc.c
> > @@ -99,6 +99,8 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
> >      void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> >  #endif
> >      bool shared = flags & RAM_SHARED;
> > +    bool is_pmem = flags & RAM_PMEM;
> > +    int mmap_xflags = 0;
> >      size_t offset;
> >      void *ptr1;
> >  
> > @@ -109,13 +111,21 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
> >      assert(is_power_of_2(align));
> >      /* Always align to host page size */
> >      assert(align >= getpagesize());
> > +    if (shared && is_pmem) {
> > +        mmap_xflags |= MAP_SYNC;
> > +    }
> >  
> >      offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
> > + retry_mmap_fd:
> >      ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
> >                  MAP_FIXED |
> >                  (fd == -1 ? MAP_ANONYMOUS : 0) |
> > -                (shared ? MAP_SHARED : MAP_PRIVATE),
> > +                (shared ? MAP_SHARED : MAP_PRIVATE) | mmap_xflags,
> >                  fd, 0);
> > +    if ((ptr1 == MAP_FAILED) && (mmap_xflags & MAP_SYNC)) {
> > +        mmap_xflags &= ~MAP_SYNC;
> > +        goto retry_mmap_fd;
> 
> Do we have use cases where using pmem=on without MAP_SYNC isn't
> going to cause problems?  If not, shouldn't we at least print a
Yes, we have a case that direct use dax device but not a files on
dax aware file system, we prefer to don't set the MAP_SYNC if user
haven't much knowledge about that. it may took some potencial 
performance issues with MAP_SYNC.
> warning here?  Otherwise, won't we still need an option for cases
> that require MAP_SYNC to be working?
> 
> > +    }
> 
> -- 
> Eduardo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH V8 5/5] hostmem-file: add 'sync' option
  2019-01-14 19:39   ` Eduardo Habkost
@ 2019-01-15  3:13     ` Yi Zhang
  2019-01-15  3:21       ` Michael S. Tsirkin
  0 siblings, 1 reply; 20+ messages in thread
From: Yi Zhang @ 2019-01-15  3:13 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang, mst,
	imammedo, dan.j.williams, qemu-devel

On 2019-01-14 at 17:39:38 -0200, Eduardo Habkost wrote:
> On Wed, Jan 02, 2019 at 01:26:34PM +0800, Zhang Yi wrote:
> > This option controls will mmap the memory backend file with MAP_SYNC flag,
> > which can ensure filesystem metadata consistent even after a system crash
> > or power failure, if MAP_SYNC flag is supported by the host kernel(Linux
> > kernel 4.15 and later) and the backend is a file supporting DAX (e.g.,
> > file on ext4/xfs file system mounted with '-o dax').
> > 
> > It can take one of following values:
> >  - on:  try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
> >         'share=off' or 'pmem!=on', QEMU will not pass this flags to
> > 	mmap(2)
> >  - off: default, never pass MAP_SYNC to mmap(2)
> > 
> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > ---
> [...]
> > +vNVDIMM is designed and implemented to guarantee the guest data
> > +persistence on the backends even on the host crash and power
> > +failures. However, there are still some requirements and limitations
> > +as explained below.
> > +
> >  Though QEMU supports multiple types of vNVDIMM backends on Linux,
> > -currently the only one that can guarantee the guest write persistence
> > +if MAP_SYNC is not supported by the host kernel and the backends,
> > +the only backend that can guarantee the guest write persistence
> >  is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
> >  which all guest access do not involve any host-side kernel cache.
> >  
> > +mmap(2) flag MAP_SYNC is added since Linux kernel 4.15. On such
> > +systems, QEMU can mmap(2) the backend with MAP_SYNC, which can ensure
> > +filesystem metadata consistent even after a system crash or power
> > +failure. Besides the host kernel support, enabling MAP_SYNC in QEMU
> > +also requires:
> > +
> > + - the backend is a file supporting DAX, e.g., a file on an ext4 or
> > +   xfs file system mounted with '-o dax',
> > +
> > + - 'sync' option of memory-backend-file is on, and
> > +
> > + - 'share' option of memory-backend-file is 'on'.
> > +
> > + - 'pmem' option of memory-backend-file is 'on'
> 
> I miss one piece of information here: are there any negative
> side-effects of enabling MAP_SYNC on a pmem=on backend?  Could it
> affect performance?  If it has no negative effects, why don't we
> try to always enable it whenever possible?
> 
> 
> > +
> >  When using other types of backends, it's suggested to set 'unarmed'
> >  option of '-device nvdimm' to 'on', which sets the unarmed flag of the
> >  guest NVDIMM region mapping structure.  This unarmed flag indicates
> [...]
> > diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> > index a9d5e56..33a7639 100644
> > --- a/util/mmap-alloc.c
> > +++ b/util/mmap-alloc.c
> > @@ -99,7 +99,7 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
> >      void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> >  #endif
> >      bool shared = flags & RAM_SHARED;
> > -    bool is_pmem = flags & RAM_PMEM;
> > +    bool is_pmemsync = (flags & RAM_PMEM) && (flags & RAM_SYNC);
> 
> You seem to be reverting what you did on patch 3/5.  In patch
> 3/5, you were setting MAP_SYNC automatically on all pmem=on
> backends.  Now, you are only setting MAP_SYNC only if sync=on is
> set explicitly.
> 
> I don't know which behavior is better (see question above), but
> it's better to start with the right behavior in the first place.
> 
> Also, I don't think we should clear MAP_SYNC silently if sync=on
> was explicitly requested in the command-line.  If sync=on was
> set, we should do exactly as told, and require MAP_SYNC.  If we
> still want to support use cases where MAP_SYNC is desired but
> optional (do we?), we can make 'sync' a OnOffAuto option.
Actually, I did this on previous version.
see https://patchwork.kernel.org/patch/10725671/ 

Michael said that we should limit that option as it is only valided
on a dax aware file system, to avoid the potencial performance issues
we set it off by-defualt, and let a well-know user decides they wanna
performance or stability.
> 
> 
> >      int mmap_xflags = 0;
> >      size_t offset;
> >      void *ptr1;
> > @@ -111,7 +111,7 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
> >      assert(is_power_of_2(align));
> >      /* Always align to host page size */
> >      assert(align >= getpagesize());
> > -    if (shared && is_pmem) {
> > +    if (shared && is_pmemsync) {
> >          mmap_xflags |= MAP_SYNC;
> >      }
> >  
> > -- 
> > 2.7.4
> > 
> > 
> 
> -- 
> Eduardo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH V8 2/5] util/mmap-alloc: switch qemu_ram_mmap() to 'flags' parameter
  2019-01-15  2:39       ` Yi Zhang
@ 2019-01-15  3:16         ` Michael S. Tsirkin
  0 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2019-01-15  3:16 UTC (permalink / raw)
  To: Eduardo Habkost, xiaoguangrong.eric, stefanha, pbonzini, pagupta,
	yu.c.zhang, imammedo, dan.j.williams, qemu-devel

On Tue, Jan 15, 2019 at 10:39:14AM +0800, Yi Zhang wrote:
> > It's needed for this series but yes, this ifdefery belongs in
> > a more central header. Maybe qemu/osdep.h
> > And it needs documentation and be a separate patch.
> Agree, Thank Michael's explanation, better to doing this in a separate
> patch series.

Separate patch, does not have to be a separate series.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH V8 5/5] hostmem-file: add 'sync' option
  2019-01-15  3:13     ` Yi Zhang
@ 2019-01-15  3:21       ` Michael S. Tsirkin
  0 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2019-01-15  3:21 UTC (permalink / raw)
  To: Eduardo Habkost, xiaoguangrong.eric, stefanha, pbonzini, pagupta,
	yu.c.zhang, imammedo, dan.j.williams, qemu-devel

On Tue, Jan 15, 2019 at 11:13:35AM +0800, Yi Zhang wrote:
> On 2019-01-14 at 17:39:38 -0200, Eduardo Habkost wrote:
> > On Wed, Jan 02, 2019 at 01:26:34PM +0800, Zhang Yi wrote:
> > > This option controls will mmap the memory backend file with MAP_SYNC flag,
> > > which can ensure filesystem metadata consistent even after a system crash
> > > or power failure, if MAP_SYNC flag is supported by the host kernel(Linux
> > > kernel 4.15 and later) and the backend is a file supporting DAX (e.g.,
> > > file on ext4/xfs file system mounted with '-o dax').
> > > 
> > > It can take one of following values:
> > >  - on:  try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
> > >         'share=off' or 'pmem!=on', QEMU will not pass this flags to
> > > 	mmap(2)
> > >  - off: default, never pass MAP_SYNC to mmap(2)
> > > 
> > > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > > ---
> > [...]
> > > +vNVDIMM is designed and implemented to guarantee the guest data
> > > +persistence on the backends even on the host crash and power
> > > +failures. However, there are still some requirements and limitations
> > > +as explained below.
> > > +
> > >  Though QEMU supports multiple types of vNVDIMM backends on Linux,
> > > -currently the only one that can guarantee the guest write persistence
> > > +if MAP_SYNC is not supported by the host kernel and the backends,
> > > +the only backend that can guarantee the guest write persistence
> > >  is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
> > >  which all guest access do not involve any host-side kernel cache.
> > >  
> > > +mmap(2) flag MAP_SYNC is added since Linux kernel 4.15. On such
> > > +systems, QEMU can mmap(2) the backend with MAP_SYNC, which can ensure
> > > +filesystem metadata consistent even after a system crash or power
> > > +failure. Besides the host kernel support, enabling MAP_SYNC in QEMU
> > > +also requires:
> > > +
> > > + - the backend is a file supporting DAX, e.g., a file on an ext4 or
> > > +   xfs file system mounted with '-o dax',
> > > +
> > > + - 'sync' option of memory-backend-file is on, and
> > > +
> > > + - 'share' option of memory-backend-file is 'on'.
> > > +
> > > + - 'pmem' option of memory-backend-file is 'on'
> > 
> > I miss one piece of information here: are there any negative
> > side-effects of enabling MAP_SYNC on a pmem=on backend?  Could it
> > affect performance?  If it has no negative effects, why don't we
> > try to always enable it whenever possible?
> > 
> > 
> > > +
> > >  When using other types of backends, it's suggested to set 'unarmed'
> > >  option of '-device nvdimm' to 'on', which sets the unarmed flag of the
> > >  guest NVDIMM region mapping structure.  This unarmed flag indicates
> > [...]
> > > diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> > > index a9d5e56..33a7639 100644
> > > --- a/util/mmap-alloc.c
> > > +++ b/util/mmap-alloc.c
> > > @@ -99,7 +99,7 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
> > >      void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> > >  #endif
> > >      bool shared = flags & RAM_SHARED;
> > > -    bool is_pmem = flags & RAM_PMEM;
> > > +    bool is_pmemsync = (flags & RAM_PMEM) && (flags & RAM_SYNC);
> > 
> > You seem to be reverting what you did on patch 3/5.  In patch
> > 3/5, you were setting MAP_SYNC automatically on all pmem=on
> > backends.  Now, you are only setting MAP_SYNC only if sync=on is
> > set explicitly.
> > 
> > I don't know which behavior is better (see question above), but
> > it's better to start with the right behavior in the first place.
> > 
> > Also, I don't think we should clear MAP_SYNC silently if sync=on
> > was explicitly requested in the command-line.  If sync=on was
> > set, we should do exactly as told, and require MAP_SYNC.  If we
> > still want to support use cases where MAP_SYNC is desired but
> > optional (do we?), we can make 'sync' a OnOffAuto option.
> Actually, I did this on previous version.
> see https://patchwork.kernel.org/patch/10725671/ 
> 
> Michael said that we should limit that option as it is only valided
> on a dax aware file system, to avoid the potencial performance issues
> we set it off by-defualt, and let a well-know user decides they wanna
> performance or stability.

However I am still unconvinced that the separate sync flag is helpful.
Why don't we set MAP_SYNC unconditionally when pmem is set?

It's a separate question what should happen on an old kernel. Maybe we
want a flag that says "fail unless persistence can be guaranteed".
Even then it's definitely not "sync".






> > 
> > 
> > >      int mmap_xflags = 0;
> > >      size_t offset;
> > >      void *ptr1;
> > > @@ -111,7 +111,7 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
> > >      assert(is_power_of_2(align));
> > >      /* Always align to host page size */
> > >      assert(align >= getpagesize());
> > > -    if (shared && is_pmem) {
> > > +    if (shared && is_pmemsync) {
> > >          mmap_xflags |= MAP_SYNC;
> > >      }
> > >  
> > > -- 
> > > 2.7.4
> > > 
> > > 
> > 
> > -- 
> > Eduardo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH V8 5/5] hostmem-file: add 'sync' option
  2019-01-02  5:26 ` [Qemu-devel] [PATCH V8 5/5] hostmem-file: add 'sync' option Zhang Yi
  2019-01-14 19:39   ` Eduardo Habkost
@ 2019-01-15  3:31   ` Michael S. Tsirkin
  2019-01-15  6:55     ` Yi Zhang
  1 sibling, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2019-01-15  3:31 UTC (permalink / raw)
  To: Zhang Yi
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	qemu-devel, imammedo, dan.j.williams, ehabkost

On Wed, Jan 02, 2019 at 01:26:34PM +0800, Zhang Yi wrote:
> This option controls will mmap the memory backend file with MAP_SYNC flag,
> which can ensure filesystem metadata consistent even after a system crash
> or power failure, if MAP_SYNC flag is supported by the host kernel(Linux
> kernel 4.15 and later) and the backend is a file supporting DAX (e.g.,
> file on ext4/xfs file system mounted with '-o dax').
> 
> It can take one of following values:
>  - on:  try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
>         'share=off' or 'pmem!=on', QEMU will not pass this flags to
> 	mmap(2)
>  - off: default, never pass MAP_SYNC to mmap(2)
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>


So we introduce all of the above complexity and then I am pretty sure go
on and teach management tools to just always, without exception, set
sync=on to avoid data corruption.

So how about we give up on a bit of flexibility, and just say
pmem=on forces MAP_SYNC?

OTOH if you really really want a fast memory then why set pmem=on at
all?

Or, if you have some data that shows how disabling synchronous
pagefaults helps performance a lot, maybe we should introduce
a "crash-unsafe" flag.



> ---
>  backends/hostmem-file.c   | 28 ++++++++++++++++++++++++++++
>  docs/nvdimm.txt           | 23 ++++++++++++++++++++++-
>  exec.c                    |  2 +-
>  include/exec/memory.h     |  4 ++++
>  include/exec/ram_addr.h   |  1 +
>  include/qemu/mmap-alloc.h |  1 +
>  qemu-options.hx           | 19 ++++++++++++++++++-
>  util/mmap-alloc.c         |  4 ++--
>  8 files changed, 77 insertions(+), 5 deletions(-)
> 
> diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
> index 0dd7a90..3d39032 100644
> --- a/backends/hostmem-file.c
> +++ b/backends/hostmem-file.c
> @@ -36,6 +36,7 @@ struct HostMemoryBackendFile {
>      uint64_t align;
>      bool discard_data;
>      bool is_pmem;
> +    bool sync;
>  };
>  
>  static void
> @@ -62,6 +63,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>                                   path,
>                                   backend->size, fb->align,
>                                   (backend->share ? RAM_SHARED : 0) |
> +                                 (fb->sync ? RAM_SYNC : 0) |
>                                   (fb->is_pmem ? RAM_PMEM : 0),
>                                   fb->mem_path, errp);
>          g_free(path);
> @@ -136,6 +138,29 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
>      error_propagate(errp, local_err);
>  }
>  
> +static bool file_memory_backend_get_sync(Object *o, Error **errp)
> +{
> +    return MEMORY_BACKEND_FILE(o)->sync;
> +}
> +
> +static void file_memory_backend_set_sync(
> +    Object *obj, bool value, Error **errp)
> +{
> +    HostMemoryBackend *backend = MEMORY_BACKEND(obj);
> +    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(obj);
> +
> +    if (host_memory_backend_mr_inited(backend)) {
> +        error_setg(errp, "cannot change property sync of %s",
> +                   object_get_typename(obj));
> +        goto out;
> +    }
> +
> +    fb->sync = value;
> +
> + out:
> +    return;
> +}
> +
>  static bool file_memory_backend_get_pmem(Object *o, Error **errp)
>  {
>      return MEMORY_BACKEND_FILE(o)->is_pmem;
> @@ -203,6 +228,9 @@ file_backend_class_init(ObjectClass *oc, void *data)
>      object_class_property_add_bool(oc, "pmem",
>          file_memory_backend_get_pmem, file_memory_backend_set_pmem,
>          &error_abort);
> +    object_class_property_add_bool(oc, "sync",
> +        file_memory_backend_get_sync, file_memory_backend_set_sync,
> +        &error_abort);
>  }
>  
>  static void file_backend_instance_finalize(Object *o)
> diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> index 5f158a6..30db458 100644
> --- a/docs/nvdimm.txt
> +++ b/docs/nvdimm.txt
> @@ -142,11 +142,32 @@ backend of vNVDIMM:
>  Guest Data Persistence
>  ----------------------
>  
> +vNVDIMM is designed and implemented to guarantee the guest data
> +persistence on the backends even on the host crash and power
> +failures. However, there are still some requirements and limitations
> +as explained below.
> +
>  Though QEMU supports multiple types of vNVDIMM backends on Linux,
> -currently the only one that can guarantee the guest write persistence
> +if MAP_SYNC is not supported by the host kernel and the backends,
> +the only backend that can guarantee the guest write persistence
>  is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
>  which all guest access do not involve any host-side kernel cache.
>  
> +mmap(2) flag MAP_SYNC is added since Linux kernel 4.15. On such
> +systems, QEMU can mmap(2) the backend with MAP_SYNC, which can ensure
> +filesystem metadata consistent even after a system crash or power
> +failure. Besides the host kernel support, enabling MAP_SYNC in QEMU
> +also requires:
> +
> + - the backend is a file supporting DAX, e.g., a file on an ext4 or
> +   xfs file system mounted with '-o dax',
> +
> + - 'sync' option of memory-backend-file is on, and
> +
> + - 'share' option of memory-backend-file is 'on'.
> +
> + - 'pmem' option of memory-backend-file is 'on'
> +
>  When using other types of backends, it's suggested to set 'unarmed'
>  option of '-device nvdimm' to 'on', which sets the unarmed flag of the
>  guest NVDIMM region mapping structure.  This unarmed flag indicates
> diff --git a/exec.c b/exec.c
> index e92a7da..dc4d180 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2241,7 +2241,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
>      int64_t file_size;
>  
>      /* Just support these ram flags by now. */
> -    assert((ram_flags & ~(RAM_SHARED | RAM_PMEM)) == 0);
> +    assert((ram_flags & ~(RAM_SHARED | RAM_PMEM | RAM_SYNC)) == 0);
>  
>      if (xen_enabled()) {
>          error_setg(errp, "-mem-path not supported with Xen");
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 6e30c23..9297b1c 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -26,6 +26,7 @@
>  #include "qom/object.h"
>  #include "qemu/rcu.h"
>  #include "hw/qdev-core.h"
> +#include "qapi/error.h"
>  
>  #define RAM_ADDR_INVALID (~(ram_addr_t)0)
>  
> @@ -136,6 +137,9 @@ typedef unsigned QEMU_BITWISE QemuMmapFlags;
>  /* RAM is a persistent kind memory */
>  #define RAM_PMEM ((QEMU_FORCE QemuMmapFlags) (1 << 5))
>  
> +/* RAM can be mmap by a MAP_SYNC flag */
> +#define RAM_SYNC ((QEMU_FORCE QemuMmapFlags) (1 << 6))
> +
>  static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
>                                         IOMMUNotifierFlag flags,
>                                         hwaddr start, hwaddr end,
> diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
> index 9ecd911..d239ce7 100644
> --- a/include/exec/ram_addr.h
> +++ b/include/exec/ram_addr.h
> @@ -87,6 +87,7 @@ long qemu_getrampagesize(void);
>   *              or bit-or of following values
>   *              - RAM_SHARED: mmap the backing file or device with MAP_SHARED
>   *              - RAM_PMEM: the backend @mem_path or @fd is persistent memory
> + *              - RAM_SYNC:   mmap with MAP_SYNC flag
>   *              Other bits are ignored.
>   *  @mem_path or @fd: specify the backing file or device
>   *  @errp: pointer to Error*, to store an error if it happens
> diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
> index 6fe6ed4..1755a8b 100644
> --- a/include/qemu/mmap-alloc.h
> +++ b/include/qemu/mmap-alloc.h
> @@ -18,6 +18,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path);
>   *  @flags: specifies additional properties of the mapping, which can be one or
>   *          bit-or of following values
>   *          - RAM_SHARED: mmap with MAP_SHARED flag
> + *          - RAM_SYNC:   mmap with MAP_SYNC flag
>   *          Other bits are ignored.
>   *
>   * Return:
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 08f8516..0f51d08 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -3928,7 +3928,7 @@ property must be set.  These objects are placed in the
>  
>  @table @option
>  
> -@item -object memory-backend-file,id=@var{id},size=@var{size},mem-path=@var{dir},share=@var{on|off},discard-data=@var{on|off},merge=@var{on|off},dump=@var{on|off},prealloc=@var{on|off},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave},align=@var{align}
> +@item -object memory-backend-file,id=@var{id},size=@var{size},mem-path=@var{dir},share=@var{on|off},discard-data=@var{on|off},merge=@var{on|off},dump=@var{on|off},prealloc=@var{on|off},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave},align=@var{align},sync=@var{on|off}
>  
>  Creates a memory file backend object, which can be used to back
>  the guest RAM with huge pages.
> @@ -4003,6 +4003,23 @@ If @option{pmem} is set to 'on', QEMU will take necessary operations to
>  guarantee the persistence of its own writes to @option{mem-path}
>  (e.g. in vNVDIMM label emulation and live migration).
>  
> +The @option{sync} option specifies whether QEMU mmap(2) @option{mem-path}
> +with MAP_SYNC flag, which can ensure the file metadata is in sync to
> +@option{mem-path} even on the host crash and power failures. MAP_SYNC
> +requires supports from both the host kernel (since Linux kernel 4.15)
> +and @option{mem-path} (only files supporting DAX). It can take one of
> +following values:
> +
> +@table @option
> +@item @var{on}
> +try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
> +@option{share}=@var{off}, @option{pmem}=@var{off} QEMU will not pass
> +this flags to kernel.
> +
> +@item @var{off} (default)
> +never pass MAP_SYNC to mmap(2)
> +@end table
> +
>  @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
>  
>  Creates a memory backend object, which can be used to back the guest RAM.
> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> index a9d5e56..33a7639 100644
> --- a/util/mmap-alloc.c
> +++ b/util/mmap-alloc.c
> @@ -99,7 +99,7 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
>      void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
>  #endif
>      bool shared = flags & RAM_SHARED;
> -    bool is_pmem = flags & RAM_PMEM;
> +    bool is_pmemsync = (flags & RAM_PMEM) && (flags & RAM_SYNC);
>      int mmap_xflags = 0;
>      size_t offset;
>      void *ptr1;
> @@ -111,7 +111,7 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
>      assert(is_power_of_2(align));
>      /* Always align to host page size */
>      assert(align >= getpagesize());
> -    if (shared && is_pmem) {
> +    if (shared && is_pmemsync) {
>          mmap_xflags |= MAP_SYNC;
>      }
>  
> -- 
> 2.7.4

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH V8 3/5] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  2019-01-15  2:49     ` Yi Zhang
@ 2019-01-15  3:34       ` Michael S. Tsirkin
  0 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2019-01-15  3:34 UTC (permalink / raw)
  To: Eduardo Habkost, xiaoguangrong.eric, stefanha, pbonzini, pagupta,
	yu.c.zhang, imammedo, dan.j.williams, qemu-devel

On Tue, Jan 15, 2019 at 10:49:45AM +0800, Yi Zhang wrote:
> On 2019-01-14 at 17:07:02 -0200, Eduardo Habkost wrote:
> > On Wed, Jan 02, 2019 at 01:26:15PM +0800, Zhang Yi wrote:
> > > When a file supporting DAX is used as vNVDIMM backend, mmap it with
> > > MAP_SYNC flag in addition which can ensure file system metadata
> > > synced in each guest writes to the backend file, without other QEMU
> > > actions (e.g., periodic fsync() by QEMU).
> > > 
> > > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > > ---
> > >  include/qemu/osdep.h | 16 ++++++++++++++++
> > >  util/mmap-alloc.c    | 12 +++++++++++-
> > >  2 files changed, 27 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> > > index 3bf48bc..bb1eba1 100644
> > > --- a/include/qemu/osdep.h
> > > +++ b/include/qemu/osdep.h
> > > @@ -410,6 +410,22 @@ void qemu_anon_ram_free(void *ptr, size_t size);
> > >  #  define QEMU_VMALLOC_ALIGN getpagesize()
> > >  #endif
> > >  
> > > +/*
> > > + * MAP_SHARED_VALIDATE and MAP_SYNC are introduced in Linux kernel
> > > + * 4.15, so they may not be defined when compiling on older kernels.
> > > + */
> > > +#ifdef CONFIG_LINUX
> > > +
> > > +#include <asm-generic/mman.h>
> > > +
> > > +#ifndef MAP_SYNC
> > > +#define MAP_SYNC 0x0
> > > +#endif
> > > +
> > > +#else  /* !CONFIG_LINUX */
> > > +#define MAP_SYNC              0x0
> > > +#endif /* CONFIG_LINUX */
> > > +
> > >  #ifdef CONFIG_POSIX
> > >  struct qemu_signalfd_siginfo {
> > >      uint32_t ssi_signo;   /* Signal number */
> > > diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> > > index 8f0a740..a9d5e56 100644
> > > --- a/util/mmap-alloc.c
> > > +++ b/util/mmap-alloc.c
> > > @@ -99,6 +99,8 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
> > >      void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> > >  #endif
> > >      bool shared = flags & RAM_SHARED;
> > > +    bool is_pmem = flags & RAM_PMEM;
> > > +    int mmap_xflags = 0;
> > >      size_t offset;
> > >      void *ptr1;
> > >  
> > > @@ -109,13 +111,21 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
> > >      assert(is_power_of_2(align));
> > >      /* Always align to host page size */
> > >      assert(align >= getpagesize());
> > > +    if (shared && is_pmem) {
> > > +        mmap_xflags |= MAP_SYNC;
> > > +    }
> > >  
> > >      offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
> > > + retry_mmap_fd:
> > >      ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
> > >                  MAP_FIXED |
> > >                  (fd == -1 ? MAP_ANONYMOUS : 0) |
> > > -                (shared ? MAP_SHARED : MAP_PRIVATE),
> > > +                (shared ? MAP_SHARED : MAP_PRIVATE) | mmap_xflags,
> > >                  fd, 0);
> > > +    if ((ptr1 == MAP_FAILED) && (mmap_xflags & MAP_SYNC)) {
> > > +        mmap_xflags &= ~MAP_SYNC;
> > > +        goto retry_mmap_fd;
> > 
> > Do we have use cases where using pmem=on without MAP_SYNC isn't
> > going to cause problems?  If not, shouldn't we at least print a
> Yes, we have a case that direct use dax device but not a files on
> dax aware file system, we prefer to don't set the MAP_SYNC if user
> haven't much knowledge about that. it may took some potencial 
> performance issues with MAP_SYNC.

I think you will have to be quite a bit more specific.

If there's a performance / functionality tradeoff here
then hiding it behind an option with an inscrutable name
isn't a good idea. Neither is ignoring failures silently.



> > warning here?  Otherwise, won't we still need an option for cases
> > that require MAP_SYNC to be working?
> > 
> > > +    }
> > 
> > -- 
> > Eduardo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH V8 5/5] hostmem-file: add 'sync' option
  2019-01-15  3:31   ` Michael S. Tsirkin
@ 2019-01-15  6:55     ` Yi Zhang
  0 siblings, 0 replies; 20+ messages in thread
From: Yi Zhang @ 2019-01-15  6:55 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	qemu-devel, imammedo, dan.j.williams, ehabkost

On 2019-01-14 at 22:31:45 -0500, Michael S. Tsirkin wrote:
> On Wed, Jan 02, 2019 at 01:26:34PM +0800, Zhang Yi wrote:
> > This option controls will mmap the memory backend file with MAP_SYNC flag,
> > which can ensure filesystem metadata consistent even after a system crash
> > or power failure, if MAP_SYNC flag is supported by the host kernel(Linux
> > kernel 4.15 and later) and the backend is a file supporting DAX (e.g.,
> > file on ext4/xfs file system mounted with '-o dax').
> > 
> > It can take one of following values:
> >  - on:  try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
> >         'share=off' or 'pmem!=on', QEMU will not pass this flags to
> > 	mmap(2)
> >  - off: default, never pass MAP_SYNC to mmap(2)
> > 
> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> 
> 
> So we introduce all of the above complexity and then I am pretty sure go
> on and teach management tools to just always, without exception, set
> sync=on to avoid data corruption.
> 
> So how about we give up on a bit of flexibility, and just say
> pmem=on forces MAP_SYNC?
> 
> OTOH if you really really want a fast memory then why set pmem=on at
> all?

Indeed, All my concern is that we do need to pass the sync to a type of
pmem which didn't backend on a dax aware file. Anyway, I will drop the
sync option, and let it on while we set pmem, Thanks your suggestion.
Michael.

> 
> Or, if you have some data that shows how disabling synchronous
> pagefaults helps performance a lot, maybe we should introduce
> a "crash-unsafe" flag.
> 
> 
> 
> > ---
> >  backends/hostmem-file.c   | 28 ++++++++++++++++++++++++++++
> >  docs/nvdimm.txt           | 23 ++++++++++++++++++++++-
> >  exec.c                    |  2 +-
> >  include/exec/memory.h     |  4 ++++
> >  include/exec/ram_addr.h   |  1 +
> >  include/qemu/mmap-alloc.h |  1 +
> >  qemu-options.hx           | 19 ++++++++++++++++++-
> >  util/mmap-alloc.c         |  4 ++--
> >  8 files changed, 77 insertions(+), 5 deletions(-)
> > 
> > diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
> > index 0dd7a90..3d39032 100644
> > --- a/backends/hostmem-file.c
> > +++ b/backends/hostmem-file.c
> > @@ -36,6 +36,7 @@ struct HostMemoryBackendFile {
> >      uint64_t align;
> >      bool discard_data;
> >      bool is_pmem;
> > +    bool sync;
> >  };
> >  
> >  static void
> > @@ -62,6 +63,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
> >                                   path,
> >                                   backend->size, fb->align,
> >                                   (backend->share ? RAM_SHARED : 0) |
> > +                                 (fb->sync ? RAM_SYNC : 0) |
> >                                   (fb->is_pmem ? RAM_PMEM : 0),
> >                                   fb->mem_path, errp);
> >          g_free(path);
> > @@ -136,6 +138,29 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
> >      error_propagate(errp, local_err);
> >  }
> >  
> > +static bool file_memory_backend_get_sync(Object *o, Error **errp)
> > +{
> > +    return MEMORY_BACKEND_FILE(o)->sync;
> > +}
> > +
> > +static void file_memory_backend_set_sync(
> > +    Object *obj, bool value, Error **errp)
> > +{
> > +    HostMemoryBackend *backend = MEMORY_BACKEND(obj);
> > +    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(obj);
> > +
> > +    if (host_memory_backend_mr_inited(backend)) {
> > +        error_setg(errp, "cannot change property sync of %s",
> > +                   object_get_typename(obj));
> > +        goto out;
> > +    }
> > +
> > +    fb->sync = value;
> > +
> > + out:
> > +    return;
> > +}
> > +
> >  static bool file_memory_backend_get_pmem(Object *o, Error **errp)
> >  {
> >      return MEMORY_BACKEND_FILE(o)->is_pmem;
> > @@ -203,6 +228,9 @@ file_backend_class_init(ObjectClass *oc, void *data)
> >      object_class_property_add_bool(oc, "pmem",
> >          file_memory_backend_get_pmem, file_memory_backend_set_pmem,
> >          &error_abort);
> > +    object_class_property_add_bool(oc, "sync",
> > +        file_memory_backend_get_sync, file_memory_backend_set_sync,
> > +        &error_abort);
> >  }
> >  
> >  static void file_backend_instance_finalize(Object *o)
> > diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> > index 5f158a6..30db458 100644
> > --- a/docs/nvdimm.txt
> > +++ b/docs/nvdimm.txt
> > @@ -142,11 +142,32 @@ backend of vNVDIMM:
> >  Guest Data Persistence
> >  ----------------------
> >  
> > +vNVDIMM is designed and implemented to guarantee the guest data
> > +persistence on the backends even on the host crash and power
> > +failures. However, there are still some requirements and limitations
> > +as explained below.
> > +
> >  Though QEMU supports multiple types of vNVDIMM backends on Linux,
> > -currently the only one that can guarantee the guest write persistence
> > +if MAP_SYNC is not supported by the host kernel and the backends,
> > +the only backend that can guarantee the guest write persistence
> >  is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
> >  which all guest access do not involve any host-side kernel cache.
> >  
> > +mmap(2) flag MAP_SYNC is added since Linux kernel 4.15. On such
> > +systems, QEMU can mmap(2) the backend with MAP_SYNC, which can ensure
> > +filesystem metadata consistent even after a system crash or power
> > +failure. Besides the host kernel support, enabling MAP_SYNC in QEMU
> > +also requires:
> > +
> > + - the backend is a file supporting DAX, e.g., a file on an ext4 or
> > +   xfs file system mounted with '-o dax',
> > +
> > + - 'sync' option of memory-backend-file is on, and
> > +
> > + - 'share' option of memory-backend-file is 'on'.
> > +
> > + - 'pmem' option of memory-backend-file is 'on'
> > +
> >  When using other types of backends, it's suggested to set 'unarmed'
> >  option of '-device nvdimm' to 'on', which sets the unarmed flag of the
> >  guest NVDIMM region mapping structure.  This unarmed flag indicates
> > diff --git a/exec.c b/exec.c
> > index e92a7da..dc4d180 100644
> > --- a/exec.c
> > +++ b/exec.c
> > @@ -2241,7 +2241,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
> >      int64_t file_size;
> >  
> >      /* Just support these ram flags by now. */
> > -    assert((ram_flags & ~(RAM_SHARED | RAM_PMEM)) == 0);
> > +    assert((ram_flags & ~(RAM_SHARED | RAM_PMEM | RAM_SYNC)) == 0);
> >  
> >      if (xen_enabled()) {
> >          error_setg(errp, "-mem-path not supported with Xen");
> > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > index 6e30c23..9297b1c 100644
> > --- a/include/exec/memory.h
> > +++ b/include/exec/memory.h
> > @@ -26,6 +26,7 @@
> >  #include "qom/object.h"
> >  #include "qemu/rcu.h"
> >  #include "hw/qdev-core.h"
> > +#include "qapi/error.h"
> >  
> >  #define RAM_ADDR_INVALID (~(ram_addr_t)0)
> >  
> > @@ -136,6 +137,9 @@ typedef unsigned QEMU_BITWISE QemuMmapFlags;
> >  /* RAM is a persistent kind memory */
> >  #define RAM_PMEM ((QEMU_FORCE QemuMmapFlags) (1 << 5))
> >  
> > +/* RAM can be mmap by a MAP_SYNC flag */
> > +#define RAM_SYNC ((QEMU_FORCE QemuMmapFlags) (1 << 6))
> > +
> >  static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
> >                                         IOMMUNotifierFlag flags,
> >                                         hwaddr start, hwaddr end,
> > diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
> > index 9ecd911..d239ce7 100644
> > --- a/include/exec/ram_addr.h
> > +++ b/include/exec/ram_addr.h
> > @@ -87,6 +87,7 @@ long qemu_getrampagesize(void);
> >   *              or bit-or of following values
> >   *              - RAM_SHARED: mmap the backing file or device with MAP_SHARED
> >   *              - RAM_PMEM: the backend @mem_path or @fd is persistent memory
> > + *              - RAM_SYNC:   mmap with MAP_SYNC flag
> >   *              Other bits are ignored.
> >   *  @mem_path or @fd: specify the backing file or device
> >   *  @errp: pointer to Error*, to store an error if it happens
> > diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
> > index 6fe6ed4..1755a8b 100644
> > --- a/include/qemu/mmap-alloc.h
> > +++ b/include/qemu/mmap-alloc.h
> > @@ -18,6 +18,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path);
> >   *  @flags: specifies additional properties of the mapping, which can be one or
> >   *          bit-or of following values
> >   *          - RAM_SHARED: mmap with MAP_SHARED flag
> > + *          - RAM_SYNC:   mmap with MAP_SYNC flag
> >   *          Other bits are ignored.
> >   *
> >   * Return:
> > diff --git a/qemu-options.hx b/qemu-options.hx
> > index 08f8516..0f51d08 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -3928,7 +3928,7 @@ property must be set.  These objects are placed in the
> >  
> >  @table @option
> >  
> > -@item -object memory-backend-file,id=@var{id},size=@var{size},mem-path=@var{dir},share=@var{on|off},discard-data=@var{on|off},merge=@var{on|off},dump=@var{on|off},prealloc=@var{on|off},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave},align=@var{align}
> > +@item -object memory-backend-file,id=@var{id},size=@var{size},mem-path=@var{dir},share=@var{on|off},discard-data=@var{on|off},merge=@var{on|off},dump=@var{on|off},prealloc=@var{on|off},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave},align=@var{align},sync=@var{on|off}
> >  
> >  Creates a memory file backend object, which can be used to back
> >  the guest RAM with huge pages.
> > @@ -4003,6 +4003,23 @@ If @option{pmem} is set to 'on', QEMU will take necessary operations to
> >  guarantee the persistence of its own writes to @option{mem-path}
> >  (e.g. in vNVDIMM label emulation and live migration).
> >  
> > +The @option{sync} option specifies whether QEMU mmap(2) @option{mem-path}
> > +with MAP_SYNC flag, which can ensure the file metadata is in sync to
> > +@option{mem-path} even on the host crash and power failures. MAP_SYNC
> > +requires supports from both the host kernel (since Linux kernel 4.15)
> > +and @option{mem-path} (only files supporting DAX). It can take one of
> > +following values:
> > +
> > +@table @option
> > +@item @var{on}
> > +try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
> > +@option{share}=@var{off}, @option{pmem}=@var{off} QEMU will not pass
> > +this flags to kernel.
> > +
> > +@item @var{off} (default)
> > +never pass MAP_SYNC to mmap(2)
> > +@end table
> > +
> >  @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
> >  
> >  Creates a memory backend object, which can be used to back the guest RAM.
> > diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> > index a9d5e56..33a7639 100644
> > --- a/util/mmap-alloc.c
> > +++ b/util/mmap-alloc.c
> > @@ -99,7 +99,7 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
> >      void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> >  #endif
> >      bool shared = flags & RAM_SHARED;
> > -    bool is_pmem = flags & RAM_PMEM;
> > +    bool is_pmemsync = (flags & RAM_PMEM) && (flags & RAM_SYNC);
> >      int mmap_xflags = 0;
> >      size_t offset;
> >      void *ptr1;
> > @@ -111,7 +111,7 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
> >      assert(is_power_of_2(align));
> >      /* Always align to host page size */
> >      assert(align >= getpagesize());
> > -    if (shared && is_pmem) {
> > +    if (shared && is_pmemsync) {
> >          mmap_xflags |= MAP_SYNC;
> >      }
> >  
> > -- 
> > 2.7.4

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2019-01-15  6:56 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-02  5:25 [Qemu-devel] [PATCH V8 0/5] support MAP_SYNC for memory-backend-file Zhang Yi
2019-01-02  5:25 ` [Qemu-devel] [PATCH V8 1/5] numa: Fixed the memory leak of numa error message Zhang Yi
2019-01-14 18:54   ` Eduardo Habkost
2019-01-02  5:26 ` [Qemu-devel] [PATCH V8 2/5] util/mmap-alloc: switch qemu_ram_mmap() to 'flags' parameter Zhang Yi
2019-01-14 18:50   ` Eduardo Habkost
2019-01-14 19:04     ` Michael S. Tsirkin
2019-01-15  2:39       ` Yi Zhang
2019-01-15  3:16         ` Michael S. Tsirkin
2019-01-02  5:26 ` [Qemu-devel] [PATCH V8 3/5] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap() Zhang Yi
2019-01-14 19:07   ` Eduardo Habkost
2019-01-15  2:49     ` Yi Zhang
2019-01-15  3:34       ` Michael S. Tsirkin
2019-01-02  5:26 ` [Qemu-devel] [PATCH V8 4/5] hostmem: add more information in error messages Zhang Yi
2019-01-14 19:16   ` Eduardo Habkost
2019-01-02  5:26 ` [Qemu-devel] [PATCH V8 5/5] hostmem-file: add 'sync' option Zhang Yi
2019-01-14 19:39   ` Eduardo Habkost
2019-01-15  3:13     ` Yi Zhang
2019-01-15  3:21       ` Michael S. Tsirkin
2019-01-15  3:31   ` Michael S. Tsirkin
2019-01-15  6:55     ` Yi Zhang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.