All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH V7 0/6] nvdimm: support MAP_SYNC for memory-backend-file
@ 2018-12-18  8:16 Zhang Yi
  2018-12-18  8:16 ` [Qemu-devel] [PATCH V7 1/6] numa: Fixed the memory leak of numa error message Zhang Yi
                   ` (6 more replies)
  0 siblings, 7 replies; 23+ messages in thread
From: Zhang Yi @ 2018-12-18  8:16 UTC (permalink / raw)
  To: mst, peter.maydell, yu.c.zhang, ehabkost, imammedo
  Cc: qemu-devel, pbonzini, stefanha, crosthwaite.peter,
	richard.henderson, xiaoguangrong.eric, Zhang Yi

Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
files on ext4/xfs file system mounted with '-o dax').

A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
    https://patchwork.kernel.org/patch/10028151/

In order to make sure that the file metadata is in sync after a fault 
while we are writing a shared DAX supporting backend files, this
patch-set enables QEMU to use MAP_SYNC flag for memory-backend-dax-file.

As the DAX vs DMA truncated issue was solved, we refined the code and
send out this feature for the v5 version.

A new auto on/off option 'sync' is added to memory-backend-file:
 - on:  try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
        'share=off' or 'pmem=off', QEMU will abort
 - off: never pass MAP_SYNC to mmap(2)
 - auto (default): if MAP_SYNC is supported and 'share=on' 'pmem=on', work as if
        'sync=on'; otherwise, work as if 'sync=off'

Changes in v7:
 * Micheal: [3,4,6]/6 limited the "sync" flag only on a nvdimm backend.(pmem=on)

Changes in v6:
 * Pankaj: 3/7 are squashed with 2/7
 * Pankaj: 7/7 update comments to "consistent filesystem metadata".
 * Pankaj, Igor: 1/7 Added Reviewed-by in patch-1/7
 * Stefan, 4/7 move the include header from "/linux/mman.h" to "osdep.h"
 * Stefan, 5/7 Add missing "munmap"
 * Stefan, 2/7 refine the shared/flag.

Changes in v5:
 * Add patch 1 to fix a memory leak issue.
 * Refine the patch 4-6
 * Remove the patch 3 as we already change the parameter from "shared" to
   "flags"

Changes in v4:
 * Add patch 1-3 to switch some functions to a single 'flags'
   parameters. (Michael S. Tsirkin)
 * v3 patch 1-3 become v4 patch 4-6.
 * Patch 4: move definitions of MAP_SYNC and MAP_SHARED_VALIDATE to a
   new header file under include/standard-headers/linux/. (Michael S. Tsirkin)
 * Patch 6: refine the description of the 'sync' option. (Michael S. Tsirkin)

Changes in v3:
 * Patch 1: add MAP_SHARED_VALIDATE in both sync=on and sync=auto
   cases, and add back the retry mechanism. MAP_SYNC will be ignored
   by Linux kernel 4.15 if MAP_SHARED_VALIDATE is missed.
 * Patch 1: define MAP_SYNC and MAP_SHARED_VALIDATE as 0 on non-Linux
   platforms in order to make qemu_ram_mmap() compile on those platforms.
 * Patch 2&3: include more information in error messages of
   memory-backend in hope to help user to identify the error.
   (Dr. David Alan Gilbert)
 * Patch 3: fix typo in the commit message. (Dr. David Alan Gilbert)

Changes in v2:
 * Add 'sync' option to control the use of MAP_SYNC. (Eduardo Habkost)
 * Remove the unnecessary set of MAP_SHARED_VALIDATE in some cases and
   the retry mechanism in qemu_ram_mmap(). (Michael S. Tsirkin)
 * Move OS dependent definitions of MAP_SYNC and MAP_SHARED_VALIDATE
   to osdep.h. (Michael S. Tsirkin)

Zhang Yi (6):
  numa: Fixed the memory leak of numa error message
  util/mmap-alloc: switch qemu_ram_mmap() to 'flags' parameter
  util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  util/mmap-alloc: Switch the RAM_SYNC flags to OnOffAuto
  hostmem: add more information in error messages
  hostmem-file: add 'sync' option

 backends/hostmem-file.c   | 45 +++++++++++++++++++++++++++++++++++++++++++--
 backends/hostmem.c        |  8 +++++---
 docs/nvdimm.txt           | 22 +++++++++++++++++++++-
 exec.c                    |  9 +++++----
 include/exec/memory.h     | 18 ++++++++++++++++++
 include/exec/ram_addr.h   |  1 +
 include/qemu/mmap-alloc.h | 20 +++++++++++++++++++-
 include/qemu/osdep.h      | 29 +++++++++++++++++++++++++++++
 numa.c                    |  1 +
 qemu-options.hx           | 22 +++++++++++++++++++++-
 util/mmap-alloc.c         | 27 ++++++++++++++++++++++-----
 util/oslib-posix.c        |  8 +++++++-
 12 files changed, 192 insertions(+), 18 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH V7 1/6] numa: Fixed the memory leak of numa error message
  2018-12-18  8:16 [Qemu-devel] [PATCH V7 0/6] nvdimm: support MAP_SYNC for memory-backend-file Zhang Yi
@ 2018-12-18  8:16 ` Zhang Yi
  2018-12-18  8:17 ` [Qemu-devel] [PATCH V7 2/6] util/mmap-alloc: switch qemu_ram_mmap() to 'flags' parameter Zhang Yi
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Zhang Yi @ 2018-12-18  8:16 UTC (permalink / raw)
  To: mst, peter.maydell, yu.c.zhang, ehabkost, imammedo
  Cc: qemu-devel, pbonzini, stefanha, crosthwaite.peter,
	richard.henderson, xiaoguangrong.eric, Zhang Yi

object_get_canonical_path_component() returns a string which
must be freed using g_free().

Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
Reviewed-by: Pankaj gupta <pagupta@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
---
 numa.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/numa.c b/numa.c
index 50ec016..3875e1e 100644
--- a/numa.c
+++ b/numa.c
@@ -533,6 +533,7 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
             error_report("memory backend %s is used multiple times. Each "
                          "-numa option must use a different memdev value.",
                          path);
+            g_free(path);
             exit(1);
         }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH V7 2/6] util/mmap-alloc: switch qemu_ram_mmap() to 'flags' parameter
  2018-12-18  8:16 [Qemu-devel] [PATCH V7 0/6] nvdimm: support MAP_SYNC for memory-backend-file Zhang Yi
  2018-12-18  8:16 ` [Qemu-devel] [PATCH V7 1/6] numa: Fixed the memory leak of numa error message Zhang Yi
@ 2018-12-18  8:17 ` Zhang Yi
  2018-12-18 13:35   ` Michael S. Tsirkin
  2018-12-18  8:17 ` [Qemu-devel] [PATCH V7 3/6] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap() Zhang Yi
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Zhang Yi @ 2018-12-18  8:17 UTC (permalink / raw)
  To: mst, peter.maydell, yu.c.zhang, ehabkost, imammedo
  Cc: qemu-devel, pbonzini, stefanha, crosthwaite.peter,
	richard.henderson, xiaoguangrong.eric, Zhang Yi

As more flag parameters besides the existing 'shared' are going to be
added to qemu_ram_mmap(), let's switch 'shared' to a 'flags' parameter
in advance, so as to ease the further additions.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
---
 exec.c                    |  7 ++++---
 include/qemu/mmap-alloc.h | 19 ++++++++++++++++++-
 util/mmap-alloc.c         |  8 +++++---
 util/oslib-posix.c        |  8 +++++++-
 4 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/exec.c b/exec.c
index bb6170d..e92a7da 100644
--- a/exec.c
+++ b/exec.c
@@ -1810,6 +1810,7 @@ static void *file_ram_alloc(RAMBlock *block,
                             ram_addr_t memory,
                             int fd,
                             bool truncate,
+                            uint32_t flags,
                             Error **errp)
 {
     void *area;
@@ -1859,8 +1860,7 @@ static void *file_ram_alloc(RAMBlock *block,
         perror("ftruncate");
     }
 
-    area = qemu_ram_mmap(fd, memory, block->mr->align,
-                         block->flags & RAM_SHARED);
+    area = qemu_ram_mmap(fd, memory, block->mr->align, flags);
     if (area == MAP_FAILED) {
         error_setg_errno(errp, errno,
                          "unable to map backing store for guest RAM");
@@ -2279,7 +2279,8 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
     new_block->used_length = size;
     new_block->max_length = size;
     new_block->flags = ram_flags;
-    new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp);
+    new_block->host = file_ram_alloc(new_block, size, fd, !file_size,
+            ram_flags, errp);
     if (!new_block->host) {
         g_free(new_block);
         return NULL;
diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index 50385e3..6fe6ed4 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -7,7 +7,24 @@ size_t qemu_fd_getpagesize(int fd);
 
 size_t qemu_mempath_getpagesize(const char *mem_path);
 
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared);
+/**
+ * qemu_ram_mmap: mmap the specified file or device.
+ *
+ * Parameters:
+ *  @fd: the file or the device to mmap
+ *  @size: the number of bytes to be mmaped
+ *  @align: if not zero, specify the alignment of the starting mapping address;
+ *          otherwise, the alignment in use will be determined by QEMU.
+ *  @flags: specifies additional properties of the mapping, which can be one or
+ *          bit-or of following values
+ *          - RAM_SHARED: mmap with MAP_SHARED flag
+ *          Other bits are ignored.
+ *
+ * Return:
+ *  On success, return a pointer to the mapped area.
+ *  On failure, return MAP_FAILED.
+ */
+void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags);
 
 void qemu_ram_munmap(void *ptr, size_t size);
 
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index fd329ec..8f0a740 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -13,6 +13,7 @@
 #include "qemu/osdep.h"
 #include "qemu/mmap-alloc.h"
 #include "qemu/host-utils.h"
+#include "exec/memory.h"
 
 #define HUGETLBFS_MAGIC       0x958458f6
 
@@ -75,7 +76,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
     return getpagesize();
 }
 
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
+void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
 {
     /*
      * Note: this always allocates at least one extra page of virtual address
@@ -92,11 +93,12 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
      * anonymous memory is OK.
      */
     int anonfd = fd == -1 || qemu_fd_getpagesize(fd) == getpagesize() ? -1 : fd;
-    int flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
-    void *ptr = mmap(0, total, PROT_NONE, flags | MAP_PRIVATE, anonfd, 0);
+    int mmap_flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
+    void *ptr = mmap(0, total, PROT_NONE, mmap_flags | MAP_PRIVATE, anonfd, 0);
 #else
     void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
 #endif
+    bool shared = flags & RAM_SHARED;
     size_t offset;
     void *ptr1;
 
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index fbd0dc8..121c31f 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -203,7 +203,13 @@ void *qemu_memalign(size_t alignment, size_t size)
 void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
 {
     size_t align = QEMU_VMALLOC_ALIGN;
-    void *ptr = qemu_ram_mmap(-1, size, align, shared);
+    uint32_t flags = 0;
+    void *ptr;
+
+    if (shared) {
+        flags = MAP_SHARED;
+    }
+    ptr = qemu_ram_mmap(-1, size, align, flags);
 
     if (ptr == MAP_FAILED) {
         return NULL;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH V7 3/6] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  2018-12-18  8:16 [Qemu-devel] [PATCH V7 0/6] nvdimm: support MAP_SYNC for memory-backend-file Zhang Yi
  2018-12-18  8:16 ` [Qemu-devel] [PATCH V7 1/6] numa: Fixed the memory leak of numa error message Zhang Yi
  2018-12-18  8:17 ` [Qemu-devel] [PATCH V7 2/6] util/mmap-alloc: switch qemu_ram_mmap() to 'flags' parameter Zhang Yi
@ 2018-12-18  8:17 ` Zhang Yi
  2018-12-18 13:52   ` Michael S. Tsirkin
  2018-12-18  8:17 ` [Qemu-devel] [PATCH V7 4/6] util/mmap-alloc: Switch the RAM_SYNC flags to OnOffAuto Zhang Yi
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Zhang Yi @ 2018-12-18  8:17 UTC (permalink / raw)
  To: mst, peter.maydell, yu.c.zhang, ehabkost, imammedo
  Cc: qemu-devel, pbonzini, stefanha, crosthwaite.peter,
	richard.henderson, xiaoguangrong.eric, Zhang Yi

When a file supporting DAX is used as vNVDIMM backend, mmap it with
MAP_SYNC flag in addition can guarantee the persistence of guest write
to the backend file without other QEMU actions (e.g., periodic fsync()
by QEMU).

A set of RAM_SYNC flags are added to qemu_ram_mmap():

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
---
 exec.c                    |  2 +-
 include/exec/memory.h     |  3 +++
 include/exec/ram_addr.h   |  1 +
 include/qemu/mmap-alloc.h |  1 +
 include/qemu/osdep.h      | 29 +++++++++++++++++++++++++++++
 util/mmap-alloc.c         | 14 ++++++++++----
 6 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/exec.c b/exec.c
index e92a7da..dc4d180 100644
--- a/exec.c
+++ b/exec.c
@@ -2241,7 +2241,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
     int64_t file_size;
 
     /* Just support these ram flags by now. */
-    assert((ram_flags & ~(RAM_SHARED | RAM_PMEM)) == 0);
+    assert((ram_flags & ~(RAM_SHARED | RAM_PMEM | RAM_SYNC)) == 0);
 
     if (xen_enabled()) {
         error_setg(errp, "-mem-path not supported with Xen");
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 667466b..33a4e2c 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -126,6 +126,9 @@ typedef struct IOMMUNotifier IOMMUNotifier;
 /* RAM is a persistent kind memory */
 #define RAM_PMEM (1 << 5)
 
+/* RAM can be mmap by a MAP_SYNC flag */
+#define RAM_SYNC (1 << 6)
+
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
                                        IOMMUNotifierFlag flags,
                                        hwaddr start, hwaddr end,
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 9ecd911..d239ce7 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -87,6 +87,7 @@ long qemu_getrampagesize(void);
  *              or bit-or of following values
  *              - RAM_SHARED: mmap the backing file or device with MAP_SHARED
  *              - RAM_PMEM: the backend @mem_path or @fd is persistent memory
+ *              - RAM_SYNC:   mmap with MAP_SYNC flag
  *              Other bits are ignored.
  *  @mem_path or @fd: specify the backing file or device
  *  @errp: pointer to Error*, to store an error if it happens
diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index 6fe6ed4..1755a8b 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -18,6 +18,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path);
  *  @flags: specifies additional properties of the mapping, which can be one or
  *          bit-or of following values
  *          - RAM_SHARED: mmap with MAP_SHARED flag
+ *          - RAM_SYNC:   mmap with MAP_SYNC flag
  *          Other bits are ignored.
  *
  * Return:
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 3bf48bc..f94ea68 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -410,6 +410,35 @@ void qemu_anon_ram_free(void *ptr, size_t size);
 #  define QEMU_VMALLOC_ALIGN getpagesize()
 #endif
 
+/*
+ * MAP_SHARED_VALIDATE and MAP_SYNC are introduced in Linux kernel
+ * 4.15, so they may not be defined when compiling on older kernels.
+ */
+#ifdef CONFIG_LINUX
+
+#include <sys/mman.h>
+
+#ifndef MAP_SHARED_VALIDATE
+#define MAP_SHARED_VALIDATE   0x3
+#endif
+
+#ifndef MAP_SYNC
+#define MAP_SYNC              0x80000
+#endif
+
+/* MAP_SYNC is only available with MAP_SHARED_VALIDATE. */
+#define MAP_SYNC_FLAGS (MAP_SYNC | MAP_SHARED_VALIDATE)
+
+#else  /* !CONFIG_LINUX */
+
+#define MAP_SHARED_VALIDATE   0x0
+#define MAP_SYNC              0x0
+
+#define QEMU_HAS_MAP_SYNC     false
+#define MAP_SYNC_FLAGS 0
+
+#endif /* CONFIG_LINUX */
+
 #ifdef CONFIG_POSIX
 struct qemu_signalfd_siginfo {
     uint32_t ssi_signo;   /* Signal number */
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 8f0a740..89ae862 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -99,6 +99,8 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
     void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
 #endif
     bool shared = flags & RAM_SHARED;
+    bool is_pmem = flags & RAM_PMEM;
+    int mmap_xflags = 0;
     size_t offset;
     void *ptr1;
 
@@ -109,16 +111,20 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
     assert(is_power_of_2(align));
     /* Always align to host page size */
     assert(align >= getpagesize());
+    if ((flags & RAM_SYNC) && shared && is_pmem) {
+        mmap_xflags |= MAP_SYNC_FLAGS;
+    }
 
     offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
+ retry_mmap_fd:
     ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
                 MAP_FIXED |
                 (fd == -1 ? MAP_ANONYMOUS : 0) |
-                (shared ? MAP_SHARED : MAP_PRIVATE),
+                (shared ? MAP_SHARED : MAP_PRIVATE) | mmap_xflags,
                 fd, 0);
-    if (ptr1 == MAP_FAILED) {
-        munmap(ptr, total);
-        return MAP_FAILED;
+    if ((ptr1 == MAP_FAILED) && (mmap_xflags & MAP_SYNC_FLAGS)) {
+        mmap_xflags &= ~MAP_SYNC_FLAGS;
+        goto retry_mmap_fd;
     }
 
     if (offset > 0) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH V7 4/6] util/mmap-alloc: Switch the RAM_SYNC flags to OnOffAuto
  2018-12-18  8:16 [Qemu-devel] [PATCH V7 0/6] nvdimm: support MAP_SYNC for memory-backend-file Zhang Yi
                   ` (2 preceding siblings ...)
  2018-12-18  8:17 ` [Qemu-devel] [PATCH V7 3/6] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap() Zhang Yi
@ 2018-12-18  8:17 ` Zhang Yi
  2018-12-18  8:17 ` [Qemu-devel] [PATCH V7 5/6] hostmem: add more information in error messages Zhang Yi
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Zhang Yi @ 2018-12-18  8:17 UTC (permalink / raw)
  To: mst, peter.maydell, yu.c.zhang, ehabkost, imammedo
  Cc: qemu-devel, pbonzini, stefanha, crosthwaite.peter,
	richard.henderson, xiaoguangrong.eric, Zhang Yi

Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>

A set of RAM_SYNC_ON_OFF_AUTO{AUTO,ON,OFF} flags are added to
qemu_ram_mmap():

- If RAM_SYNC_ON_OFF_AUTO_ON is present, qemu_ram_mmap() will try to pass
  MAP_SYNC to mmap(). It will then fail if the host OS or the backend
  file do not support MAP_SYNC, or MAP_SYNC is conflict with other
  flags.

- If RAM_SYNC_ON_OFF_AUTO_OFF is present, qemu_ram_mmap() will never pass
  MAP_SYNC to mmap().

- If RAM_SYNC_ON_OFF_AUTO_AUTO is present, and
  * if the host OS and the backend file support MAP_SYNC, and MAP_SYNC
    is not conflict with other flags, qemu_ram_mmap() will work as if
    RAM_SYNC_ON_OFF_AUTO_ON is present;
  * otherwise, qemu_ram_mmap() will work as if RAM_SYNC_ON_OFF_AUTO_OFF is
    present.
---
 include/exec/memory.h |  9 ++++++++-
 util/mmap-alloc.c     | 13 +++++++++++--
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 33a4e2c..c74c467 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -127,7 +127,14 @@ typedef struct IOMMUNotifier IOMMUNotifier;
 #define RAM_PMEM (1 << 5)
 
 /* RAM can be mmap by a MAP_SYNC flag */
-#define RAM_SYNC (1 << 6)
+#define RAM_SYNC_SHIFT  6
+#define RAM_SYNC_SHIFT_AUTO  7
+
+#define RAM_SYNC_ON_OFF_AUTO_ON   (1UL << RAM_SYNC_SHIFT)
+#define RAM_SYNC_ON_OFF_AUTO_OFF  (0UL << RAM_SYNC_SHIFT)
+#define RAM_SYNC_ON_OFF_AUTO_AUTO (1UL << RAM_SYNC_SHIFT_AUTO)
+
+#define RAM_SYNC (RAM_SYNC_ON_OFF_AUTO_ON | RAM_SYNC_ON_OFF_AUTO_AUTO)
 
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
                                        IOMMUNotifierFlag flags,
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 89ae862..2f2fb43 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -111,6 +111,11 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
     assert(is_power_of_2(align));
     /* Always align to host page size */
     assert(align >= getpagesize());
+    if ((flags & RAM_SYNC_ON_OFF_AUTO_ON) &&
+        (!shared || !MAP_SYNC_FLAGS || !is_pmem)) {
+        munmap(ptr, total);
+        return MAP_FAILED;
+    }
     if ((flags & RAM_SYNC) && shared && is_pmem) {
         mmap_xflags |= MAP_SYNC_FLAGS;
     }
@@ -123,8 +128,12 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
                 (shared ? MAP_SHARED : MAP_PRIVATE) | mmap_xflags,
                 fd, 0);
     if ((ptr1 == MAP_FAILED) && (mmap_xflags & MAP_SYNC_FLAGS)) {
-        mmap_xflags &= ~MAP_SYNC_FLAGS;
-        goto retry_mmap_fd;
+        if (flags & RAM_SYNC_ON_OFF_AUTO_AUTO) {
+            mmap_xflags &= ~MAP_SYNC_FLAGS;
+            goto retry_mmap_fd;
+        }
+        munmap(ptr, total);
+        return MAP_FAILED;
     }
 
     if (offset > 0) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH V7 5/6] hostmem: add more information in error messages
  2018-12-18  8:16 [Qemu-devel] [PATCH V7 0/6] nvdimm: support MAP_SYNC for memory-backend-file Zhang Yi
                   ` (3 preceding siblings ...)
  2018-12-18  8:17 ` [Qemu-devel] [PATCH V7 4/6] util/mmap-alloc: Switch the RAM_SYNC flags to OnOffAuto Zhang Yi
@ 2018-12-18  8:17 ` Zhang Yi
  2018-12-18  8:17 ` [Qemu-devel] [PATCH V7 6/6] hostmem-file: add 'sync' option Zhang Yi
  2018-12-18  9:18 ` [Qemu-devel] [PATCH V7 0/6] nvdimm: support MAP_SYNC for memory-backend-file Stefan Hajnoczi
  6 siblings, 0 replies; 23+ messages in thread
From: Zhang Yi @ 2018-12-18  8:17 UTC (permalink / raw)
  To: mst, peter.maydell, yu.c.zhang, ehabkost, imammedo
  Cc: qemu-devel, pbonzini, stefanha, crosthwaite.peter,
	richard.henderson, xiaoguangrong.eric, Zhang Yi

When there are multiple memory backends in use, including the object type
name, ID and the property name in the error message can help users to
locate the error.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
---
 backends/hostmem-file.c | 6 ++++--
 backends/hostmem.c      | 8 +++++---
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index e640749..0dd7a90 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -82,7 +82,8 @@ static void set_mem_path(Object *o, const char *str, Error **errp)
     HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
 
     if (host_memory_backend_mr_inited(backend)) {
-        error_setg(errp, "cannot change property value");
+        error_setg(errp, "cannot change property 'mem-path' of %s",
+                   object_get_typename(o));
         return;
     }
     g_free(fb->mem_path);
@@ -120,7 +121,8 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
     uint64_t val;
 
     if (host_memory_backend_mr_inited(backend)) {
-        error_setg(&local_err, "cannot change property value");
+        error_setg(&local_err, "cannot change property '%s' of %s",
+                   name, object_get_typename(o));
         goto out;
     }
 
diff --git a/backends/hostmem.c b/backends/hostmem.c
index 1a89342..e2bcf9f 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -47,7 +47,8 @@ host_memory_backend_set_size(Object *obj, Visitor *v, const char *name,
     uint64_t value;
 
     if (host_memory_backend_mr_inited(backend)) {
-        error_setg(&local_err, "cannot change property value");
+        error_setg(&local_err, "cannot change property %s of %s ",
+                   name, object_get_typename(obj));
         goto out;
     }
 
@@ -56,8 +57,9 @@ host_memory_backend_set_size(Object *obj, Visitor *v, const char *name,
         goto out;
     }
     if (!value) {
-        error_setg(&local_err, "Property '%s.%s' doesn't take value '%"
-                   PRIu64 "'", object_get_typename(obj), name, value);
+        error_setg(&local_err,
+                   "property '%s' of %s doesn't take value '%" PRIu64 "'",
+                   name, object_get_typename(obj), value);
         goto out;
     }
     backend->size = value;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel]  [PATCH V7 6/6] hostmem-file: add 'sync' option
  2018-12-18  8:16 [Qemu-devel] [PATCH V7 0/6] nvdimm: support MAP_SYNC for memory-backend-file Zhang Yi
                   ` (4 preceding siblings ...)
  2018-12-18  8:17 ` [Qemu-devel] [PATCH V7 5/6] hostmem: add more information in error messages Zhang Yi
@ 2018-12-18  8:17 ` Zhang Yi
  2018-12-18 14:18   ` Michael S. Tsirkin
  2018-12-18  9:18 ` [Qemu-devel] [PATCH V7 0/6] nvdimm: support MAP_SYNC for memory-backend-file Stefan Hajnoczi
  6 siblings, 1 reply; 23+ messages in thread
From: Zhang Yi @ 2018-12-18  8:17 UTC (permalink / raw)
  To: mst, peter.maydell, yu.c.zhang, ehabkost, imammedo
  Cc: qemu-devel, pbonzini, stefanha, crosthwaite.peter,
	richard.henderson, xiaoguangrong.eric, Zhang Yi

This option controls whether QEMU mmap(2) the memory backend file with
MAP_SYNC flag, which could consistent filesystem metadata for each guest
write, if MAP_SYNC flag is supported by the host kernel(Linux kernel 4.15
and later) and the backend is a file supporting DAX (e.g., file on ext4/xfs
file system mounted with '-o dax').

It can take one of following values:
 - on:  try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
        'share=off' or 'pmem!=on', QEMU will abort
 - off: never pass MAP_SYNC to mmap(2)
 - auto (default): if MAP_SYNC is supported and 'share=on' 'pmem=on', work as
        if 'sync=on'; otherwise, work as if 'sync=off'

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
---
 backends/hostmem-file.c | 39 +++++++++++++++++++++++++++++++++++++++
 docs/nvdimm.txt         | 22 +++++++++++++++++++++-
 include/exec/memory.h   |  8 ++++++++
 qemu-options.hx         | 22 +++++++++++++++++++++-
 4 files changed, 89 insertions(+), 2 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 0dd7a90..73cf181 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -16,6 +16,7 @@
 #include "sysemu/hostmem.h"
 #include "sysemu/sysemu.h"
 #include "qom/object_interfaces.h"
+#include "qapi/qapi-visit.h"
 
 /* hostmem-file.c */
 /**
@@ -36,6 +37,7 @@ struct HostMemoryBackendFile {
     uint64_t align;
     bool discard_data;
     bool is_pmem;
+    OnOffAuto sync;
 };
 
 static void
@@ -62,6 +64,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
                                  path,
                                  backend->size, fb->align,
                                  (backend->share ? RAM_SHARED : 0) |
+                                 qemu_ram_sync_flags(fb->sync) |
                                  (fb->is_pmem ? RAM_PMEM : 0),
                                  fb->mem_path, errp);
         g_free(path);
@@ -136,6 +139,39 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
     error_propagate(errp, local_err);
 }
 
+static void file_memory_backend_get_sync(
+    Object *obj, Visitor *v, const char *name, void *opaque, Error **errp)
+{
+    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(obj);
+    OnOffAuto value = fb->sync;
+
+    visit_type_OnOffAuto(v, name, &value, errp);
+}
+
+static void file_memory_backend_set_sync(
+    Object *obj, Visitor *v, const char *name, void *opaque, Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(obj);
+    Error *local_err = NULL;
+    OnOffAuto value;
+
+    if (host_memory_backend_mr_inited(backend)) {
+        error_setg(&local_err, "cannot change property '%s' of %s",
+                   name, object_get_typename(obj));
+        goto out;
+    }
+
+    visit_type_OnOffAuto(v, name, &value, &local_err);
+    if (local_err) {
+        goto out;
+    }
+    fb->sync = value;
+
+ out:
+    error_propagate(errp, local_err);
+}
+
 static bool file_memory_backend_get_pmem(Object *o, Error **errp)
 {
     return MEMORY_BACKEND_FILE(o)->is_pmem;
@@ -203,6 +239,9 @@ file_backend_class_init(ObjectClass *oc, void *data)
     object_class_property_add_bool(oc, "pmem",
         file_memory_backend_get_pmem, file_memory_backend_set_pmem,
         &error_abort);
+    object_class_property_add(oc, "sync", "OnOffAuto",
+        file_memory_backend_get_sync, file_memory_backend_set_sync,
+        NULL, NULL, &error_abort);
 }
 
 static void file_backend_instance_finalize(Object *o)
diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index 5f158a6..d86a270 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -142,11 +142,31 @@ backend of vNVDIMM:
 Guest Data Persistence
 ----------------------
 
+vNVDIMM is designed and implemented to guarantee the guest data
+persistence on the backends even on the host crash and power
+failures. However, there are still some requirements and limitations
+as explained below.
+
 Though QEMU supports multiple types of vNVDIMM backends on Linux,
-currently the only one that can guarantee the guest write persistence
+if MAP_SYNC is not supported by the host kernel and the backends,
+the only backend that can guarantee the guest write persistence
 is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
 which all guest access do not involve any host-side kernel cache.
 
+mmap(2) flag MAP_SYNC is added since Linux kernel 4.15. On such
+systems, QEMU can mmap(2) the backend with MAP_SYNC, which could
+consistent filesystem metadata for each guest write. Besides the host
+kernel support, enabling MAP_SYNC in QEMU also requires:
+
+ - the backend is a file supporting DAX, e.g., a file on an ext4 or
+   xfs file system mounted with '-o dax',
+
+ - 'sync' option of memory-backend-file is not 'off', and
+
+ - 'share' option of memory-backend-file is 'on'.
+
+ - 'pmem' option of memory-backend-file is 'on'
+
 When using other types of backends, it's suggested to set 'unarmed'
 option of '-device nvdimm' to 'on', which sets the unarmed flag of the
 guest NVDIMM region mapping structure.  This unarmed flag indicates
diff --git a/include/exec/memory.h b/include/exec/memory.h
index c74c467..b398abb 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -26,6 +26,7 @@
 #include "qom/object.h"
 #include "qemu/rcu.h"
 #include "hw/qdev-core.h"
+#include "qapi/error.h"
 
 #define RAM_ADDR_INVALID (~(ram_addr_t)0)
 
@@ -136,6 +137,13 @@ typedef struct IOMMUNotifier IOMMUNotifier;
 
 #define RAM_SYNC (RAM_SYNC_ON_OFF_AUTO_ON | RAM_SYNC_ON_OFF_AUTO_AUTO)
 
+static inline uint64_t qemu_ram_sync_flags(OnOffAuto v)
+{
+    return v == ON_OFF_AUTO_OFF ? RAM_SYNC_ON_OFF_AUTO_OFF :
+           v == ON_OFF_AUTO_ON ? RAM_SYNC_ON_OFF_AUTO_ON :
+           RAM_SYNC_ON_OFF_AUTO_AUTO;
+}
+
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
                                        IOMMUNotifierFlag flags,
                                        hwaddr start, hwaddr end,
diff --git a/qemu-options.hx b/qemu-options.hx
index 08f8516..624f634 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3928,7 +3928,7 @@ property must be set.  These objects are placed in the
 
 @table @option
 
-@item -object memory-backend-file,id=@var{id},size=@var{size},mem-path=@var{dir},share=@var{on|off},discard-data=@var{on|off},merge=@var{on|off},dump=@var{on|off},prealloc=@var{on|off},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave},align=@var{align}
+@item -object memory-backend-file,id=@var{id},size=@var{size},mem-path=@var{dir},share=@var{on|off},discard-data=@var{on|off},merge=@var{on|off},dump=@var{on|off},prealloc=@var{on|off},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave},align=@var{align},sync=@var{on|off|auto}
 
 Creates a memory file backend object, which can be used to back
 the guest RAM with huge pages.
@@ -4003,6 +4003,26 @@ If @option{pmem} is set to 'on', QEMU will take necessary operations to
 guarantee the persistence of its own writes to @option{mem-path}
 (e.g. in vNVDIMM label emulation and live migration).
 
+The @option{sync} option specifies whether QEMU mmap(2) @option{mem-path}
+with MAP_SYNC flag, which can guarantee the guest write persistence to
+@option{mem-path} even on the host crash and power failures. MAP_SYNC
+requires supports from both the host kernel (since Linux kernel 4.15)
+and @option{mem-path} (only files supporting DAX). It can take one of
+following values:
+
+@table @option
+@item @var{on}
+try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
+@option{share}=@var{off}, @option{pmem}=@var{off} QEMU will abort
+
+@item @var{off}
+never pass MAP_SYNC to mmap(2)
+
+@item @var{auto} (default)
+if MAP_SYNC is supported and @option{share}=@var{on}, @option{pmem}=@var{on}
+work as if @option{sync}=@var{on}; otherwise, work as if @option{sync}=@var{off}
+@end table
+
 @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
 
 Creates a memory backend object, which can be used to back the guest RAM.
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH V7 0/6] nvdimm: support MAP_SYNC for memory-backend-file
  2018-12-18  8:16 [Qemu-devel] [PATCH V7 0/6] nvdimm: support MAP_SYNC for memory-backend-file Zhang Yi
                   ` (5 preceding siblings ...)
  2018-12-18  8:17 ` [Qemu-devel] [PATCH V7 6/6] hostmem-file: add 'sync' option Zhang Yi
@ 2018-12-18  9:18 ` Stefan Hajnoczi
  6 siblings, 0 replies; 23+ messages in thread
From: Stefan Hajnoczi @ 2018-12-18  9:18 UTC (permalink / raw)
  To: Zhang Yi
  Cc: mst, peter.maydell, yu.c.zhang, ehabkost, imammedo, qemu-devel,
	pbonzini, crosthwaite.peter, richard.henderson,
	xiaoguangrong.eric

[-- Attachment #1: Type: text/plain, Size: 1258 bytes --]

On Tue, Dec 18, 2018 at 04:16:44PM +0800, Zhang Yi wrote:
> Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
> guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
> files on ext4/xfs file system mounted with '-o dax').
> 
> A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
>     https://patchwork.kernel.org/patch/10028151/
> 
> In order to make sure that the file metadata is in sync after a fault 
> while we are writing a shared DAX supporting backend files, this
> patch-set enables QEMU to use MAP_SYNC flag for memory-backend-dax-file.
> 
> As the DAX vs DMA truncated issue was solved, we refined the code and
> send out this feature for the v5 version.
> 
> A new auto on/off option 'sync' is added to memory-backend-file:
>  - on:  try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
>         'share=off' or 'pmem=off', QEMU will abort
>  - off: never pass MAP_SYNC to mmap(2)
>  - auto (default): if MAP_SYNC is supported and 'share=on' 'pmem=on', work as if
>         'sync=on'; otherwise, work as if 'sync=off'

I took a quick look and didn't see human-readable errors when pmem=off.
Is it possible to report it instead of aborting?

Thanks,
Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH V7 2/6] util/mmap-alloc: switch qemu_ram_mmap() to 'flags' parameter
  2018-12-18  8:17 ` [Qemu-devel] [PATCH V7 2/6] util/mmap-alloc: switch qemu_ram_mmap() to 'flags' parameter Zhang Yi
@ 2018-12-18 13:35   ` Michael S. Tsirkin
  0 siblings, 0 replies; 23+ messages in thread
From: Michael S. Tsirkin @ 2018-12-18 13:35 UTC (permalink / raw)
  To: Zhang Yi
  Cc: peter.maydell, yu.c.zhang, ehabkost, imammedo, qemu-devel,
	pbonzini, stefanha, crosthwaite.peter, richard.henderson,
	xiaoguangrong.eric

On Tue, Dec 18, 2018 at 04:17:04PM +0800, Zhang Yi wrote:
> As more flag parameters besides the existing 'shared' are going to be
> added to qemu_ram_mmap(), let's switch 'shared' to a 'flags' parameter
> in advance, so as to ease the further additions.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>

And that's the trouble with patch 1: no type safety.
See below

> ---
>  exec.c                    |  7 ++++---
>  include/qemu/mmap-alloc.h | 19 ++++++++++++++++++-
>  util/mmap-alloc.c         |  8 +++++---
>  util/oslib-posix.c        |  8 +++++++-
>  4 files changed, 34 insertions(+), 8 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index bb6170d..e92a7da 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1810,6 +1810,7 @@ static void *file_ram_alloc(RAMBlock *block,
>                              ram_addr_t memory,
>                              int fd,
>                              bool truncate,
> +                            uint32_t flags,
>                              Error **errp)
>  {
>      void *area;
> @@ -1859,8 +1860,7 @@ static void *file_ram_alloc(RAMBlock *block,
>          perror("ftruncate");
>      }
>  
> -    area = qemu_ram_mmap(fd, memory, block->mr->align,
> -                         block->flags & RAM_SHARED);
> +    area = qemu_ram_mmap(fd, memory, block->mr->align, flags);
>      if (area == MAP_FAILED) {
>          error_setg_errno(errp, errno,
>                           "unable to map backing store for guest RAM");
> @@ -2279,7 +2279,8 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
>      new_block->used_length = size;
>      new_block->max_length = size;
>      new_block->flags = ram_flags;
> -    new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp);
> +    new_block->host = file_ram_alloc(new_block, size, fd, !file_size,
> +            ram_flags, errp);
>      if (!new_block->host) {
>          g_free(new_block);
>          return NULL;
> diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
> index 50385e3..6fe6ed4 100644
> --- a/include/qemu/mmap-alloc.h
> +++ b/include/qemu/mmap-alloc.h
> @@ -7,7 +7,24 @@ size_t qemu_fd_getpagesize(int fd);
>  
>  size_t qemu_mempath_getpagesize(const char *mem_path);
>  
> -void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared);
> +/**
> + * qemu_ram_mmap: mmap the specified file or device.
> + *
> + * Parameters:
> + *  @fd: the file or the device to mmap
> + *  @size: the number of bytes to be mmaped
> + *  @align: if not zero, specify the alignment of the starting mapping address;
> + *          otherwise, the alignment in use will be determined by QEMU.
> + *  @flags: specifies additional properties of the mapping, which can be one or
> + *          bit-or of following values
> + *          - RAM_SHARED: mmap with MAP_SHARED flag
> + *          Other bits are ignored.
> + *

So set it to RAM_SHARED to require a shared mapping ...

> + * Return:
> + *  On success, return a pointer to the mapped area.
> + *  On failure, return MAP_FAILED.
> + */
> +void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags);
>  
>  void qemu_ram_munmap(void *ptr, size_t size);
>  
> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> index fd329ec..8f0a740 100644
> --- a/util/mmap-alloc.c
> +++ b/util/mmap-alloc.c
> @@ -13,6 +13,7 @@
>  #include "qemu/osdep.h"
>  #include "qemu/mmap-alloc.h"
>  #include "qemu/host-utils.h"
> +#include "exec/memory.h"
>  
>  #define HUGETLBFS_MAGIC       0x958458f6
>  
> @@ -75,7 +76,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
>      return getpagesize();
>  }
>  
> -void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
> +void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
>  {
>      /*
>       * Note: this always allocates at least one extra page of virtual address
> @@ -92,11 +93,12 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
>       * anonymous memory is OK.
>       */
>      int anonfd = fd == -1 || qemu_fd_getpagesize(fd) == getpagesize() ? -1 : fd;
> -    int flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
> -    void *ptr = mmap(0, total, PROT_NONE, flags | MAP_PRIVATE, anonfd, 0);
> +    int mmap_flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
> +    void *ptr = mmap(0, total, PROT_NONE, mmap_flags | MAP_PRIVATE, anonfd, 0);
>  #else
>      void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
>  #endif
> +    bool shared = flags & RAM_SHARED;
>      size_t offset;
>      void *ptr1;
>  
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index fbd0dc8..121c31f 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -203,7 +203,13 @@ void *qemu_memalign(size_t alignment, size_t size)
>  void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
>  {
>      size_t align = QEMU_VMALLOC_ALIGN;
> -    void *ptr = qemu_ram_mmap(-1, size, align, shared);
> +    uint32_t flags = 0;
> +    void *ptr;
> +
> +    if (shared) {
> +        flags = MAP_SHARED;
> +    }

And here we set it to MAP_SHARED instead.
Not good - was this even tested?

> +    ptr = qemu_ram_mmap(-1, size, align, flags);
>  
>      if (ptr == MAP_FAILED) {
>          return NULL;
> -- 
> 2.7.4


Suggestion:

#ifdef __CHECKER__
#define QEMU_BITWISE __attribute__((bitwise))
#define QEMU_FORCE   __attribute__((force))
#else
#define QEMU_BITWISE
#define QEMU_FORCE
#endif

typedef unsigned QEMU_BITWISE QemuMmapFlags;

#define RAM_SHARED ((QEMU_FORCE QemuMmapFlags) (1 << 1))

And then run checker to detect misuse.

-- 
MST

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH V7 3/6] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  2018-12-18  8:17 ` [Qemu-devel] [PATCH V7 3/6] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap() Zhang Yi
@ 2018-12-18 13:52   ` Michael S. Tsirkin
  2018-12-19  9:25     ` Yi Zhang
  0 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2018-12-18 13:52 UTC (permalink / raw)
  To: Zhang Yi
  Cc: peter.maydell, yu.c.zhang, ehabkost, imammedo, qemu-devel,
	pbonzini, stefanha, crosthwaite.peter, richard.henderson,
	xiaoguangrong.eric

On Tue, Dec 18, 2018 at 04:17:12PM +0800, Zhang Yi wrote:
> When a file supporting DAX is used as vNVDIMM backend, mmap it with
> MAP_SYNC flag in addition can guarantee the persistence of guest write
> to the backend file without other QEMU actions (e.g., periodic fsync()
> by QEMU).
> 
> A set of RAM_SYNC flags are added to qemu_ram_mmap():
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>

Should we be reusing RAM_ flags? These are used widely
already -are you going to allow RAM_SYNC everywhere
where other RAM flags are legal?
E.g. memory_region_init_ram_from_file?


> ---
>  exec.c                    |  2 +-
>  include/exec/memory.h     |  3 +++
>  include/exec/ram_addr.h   |  1 +
>  include/qemu/mmap-alloc.h |  1 +
>  include/qemu/osdep.h      | 29 +++++++++++++++++++++++++++++
>  util/mmap-alloc.c         | 14 ++++++++++----
>  6 files changed, 45 insertions(+), 5 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index e92a7da..dc4d180 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2241,7 +2241,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
>      int64_t file_size;
>  
>      /* Just support these ram flags by now. */
> -    assert((ram_flags & ~(RAM_SHARED | RAM_PMEM)) == 0);
> +    assert((ram_flags & ~(RAM_SHARED | RAM_PMEM | RAM_SYNC)) == 0);
>  
>      if (xen_enabled()) {
>          error_setg(errp, "-mem-path not supported with Xen");
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 667466b..33a4e2c 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -126,6 +126,9 @@ typedef struct IOMMUNotifier IOMMUNotifier;
>  /* RAM is a persistent kind memory */
>  #define RAM_PMEM (1 << 5)
>  
> +/* RAM can be mmap by a MAP_SYNC flag */

"RAM is mmap-ed with MAP_SYNC"

> +#define RAM_SYNC (1 << 6)
> +
>  static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
>                                         IOMMUNotifierFlag flags,
>                                         hwaddr start, hwaddr end,





> diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
> index 9ecd911..d239ce7 100644
> --- a/include/exec/ram_addr.h
> +++ b/include/exec/ram_addr.h
> @@ -87,6 +87,7 @@ long qemu_getrampagesize(void);
>   *              or bit-or of following values
>   *              - RAM_SHARED: mmap the backing file or device with MAP_SHARED
>   *              - RAM_PMEM: the backend @mem_path or @fd is persistent memory
> + *              - RAM_SYNC:   mmap with MAP_SYNC flag
>   *              Other bits are ignored.
>   *  @mem_path or @fd: specify the backing file or device
>   *  @errp: pointer to Error*, to store an error if it happens
> diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
> index 6fe6ed4..1755a8b 100644
> --- a/include/qemu/mmap-alloc.h
> +++ b/include/qemu/mmap-alloc.h
> @@ -18,6 +18,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path);
>   *  @flags: specifies additional properties of the mapping, which can be one or
>   *          bit-or of following values
>   *          - RAM_SHARED: mmap with MAP_SHARED flag
> + *          - RAM_SYNC:   mmap with MAP_SYNC flag
>   *          Other bits are ignored.
>   *
>   * Return:
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index 3bf48bc..f94ea68 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -410,6 +410,35 @@ void qemu_anon_ram_free(void *ptr, size_t size);
>  #  define QEMU_VMALLOC_ALIGN getpagesize()
>  #endif
>  
> +/*
> + * MAP_SHARED_VALIDATE and MAP_SYNC are introduced in Linux kernel
> + * 4.15, so they may not be defined when compiling on older kernels.

Then you want to import the relevant headers into linux-headers
using scripts/update-linux-headers.sh. Pls do not duplicate code.

> + */
> +#ifdef CONFIG_LINUX
> +
> +#include <sys/mman.h>
> +
> +#ifndef MAP_SHARED_VALIDATE
> +#define MAP_SHARED_VALIDATE   0x3
> +#endif
> +
> +#ifndef MAP_SYNC
> +#define MAP_SYNC              0x80000
> +#endif
> +
> +/* MAP_SYNC is only available with MAP_SHARED_VALIDATE. */
> +#define MAP_SYNC_FLAGS (MAP_SYNC | MAP_SHARED_VALIDATE)

Given mmap flags tend to start with MAP_, adding our own ones
unconditionally is not a good idea.


> +
> +#else  /* !CONFIG_LINUX */
> +
> +#define MAP_SHARED_VALIDATE   0x0
> +#define MAP_SYNC              0x0
> +
> +#define QEMU_HAS_MAP_SYNC     false

Isn't just checking MAP_SYNC enough?

> +#define MAP_SYNC_FLAGS 0
> +
> +#endif /* CONFIG_LINUX */
> +
>  #ifdef CONFIG_POSIX
>  struct qemu_signalfd_siginfo {
>      uint32_t ssi_signo;   /* Signal number */
> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> index 8f0a740..89ae862 100644
> --- a/util/mmap-alloc.c
> +++ b/util/mmap-alloc.c
> @@ -99,6 +99,8 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
>      void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
>  #endif
>      bool shared = flags & RAM_SHARED;
> +    bool is_pmem = flags & RAM_PMEM;
> +    int mmap_xflags = 0;
>      size_t offset;
>      void *ptr1;
>  
> @@ -109,16 +111,20 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
>      assert(is_power_of_2(align));
>      /* Always align to host page size */
>      assert(align >= getpagesize());
> +    if ((flags & RAM_SYNC) && shared && is_pmem) {
> +        mmap_xflags |= MAP_SYNC_FLAGS;
> +    }
>  
>      offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
> + retry_mmap_fd:
>      ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
>                  MAP_FIXED |
>                  (fd == -1 ? MAP_ANONYMOUS : 0) |
> -                (shared ? MAP_SHARED : MAP_PRIVATE),
> +                (shared ? MAP_SHARED : MAP_PRIVATE) | mmap_xflags,
>                  fd, 0);
> -    if (ptr1 == MAP_FAILED) {
> -        munmap(ptr, total);
> -        return MAP_FAILED;
> +    if ((ptr1 == MAP_FAILED) && (mmap_xflags & MAP_SYNC_FLAGS)) {
> +        mmap_xflags &= ~MAP_SYNC_FLAGS;
> +        goto retry_mmap_fd;
>      }
>  
>      if (offset > 0) {
> -- 
> 2.7.4

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH V7 6/6] hostmem-file: add 'sync' option
  2018-12-18  8:17 ` [Qemu-devel] [PATCH V7 6/6] hostmem-file: add 'sync' option Zhang Yi
@ 2018-12-18 14:18   ` Michael S. Tsirkin
  2018-12-19  9:10     ` Yi Zhang
  0 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2018-12-18 14:18 UTC (permalink / raw)
  To: Zhang Yi
  Cc: peter.maydell, yu.c.zhang, ehabkost, imammedo, qemu-devel,
	pbonzini, stefanha, crosthwaite.peter, richard.henderson,
	xiaoguangrong.eric

On Tue, Dec 18, 2018 at 04:17:39PM +0800, Zhang Yi wrote:
> This option controls whether QEMU mmap(2)

will mmap

> the memory backend file with
> MAP_SYNC flag, which could consistent filesystem metadata

I'm not sure what does "which could consistent" above mean.


> for each guest
> write, if MAP_SYNC flag is supported by the host kernel(Linux kernel 4.15

space before ( here pls.

> and later) and the backend is a file supporting DAX (e.g., file on ext4/xfs
> file system mounted with '-o dax').
> 
> It can take one of following values:
>  - on:  try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
>         'share=off' or 'pmem!=on', QEMU will abort

Really abort? What if it's added with object_add?
Should just fail not abort in that case ...

Also you know whether it's built into qemu at build time -
how about skipping it in the schema and man page in that case?


>  - off: never pass MAP_SYNC to mmap(2)
>  - auto (default): if MAP_SYNC is supported and 'share=on' 'pmem=on', work as
>         if 'sync=on'; otherwise, work as if 'sync=off'

So this is a mechanism. And users really do not care about
the mechanism. What do users want? See discussion in nvdimm.txt


> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> ---
>  backends/hostmem-file.c | 39 +++++++++++++++++++++++++++++++++++++++
>  docs/nvdimm.txt         | 22 +++++++++++++++++++++-
>  include/exec/memory.h   |  8 ++++++++
>  qemu-options.hx         | 22 +++++++++++++++++++++-
>  4 files changed, 89 insertions(+), 2 deletions(-)
> 
> diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
> index 0dd7a90..73cf181 100644
> --- a/backends/hostmem-file.c
> +++ b/backends/hostmem-file.c
> @@ -16,6 +16,7 @@
>  #include "sysemu/hostmem.h"
>  #include "sysemu/sysemu.h"
>  #include "qom/object_interfaces.h"
> +#include "qapi/qapi-visit.h"
>  
>  /* hostmem-file.c */
>  /**
> @@ -36,6 +37,7 @@ struct HostMemoryBackendFile {
>      uint64_t align;
>      bool discard_data;
>      bool is_pmem;
> +    OnOffAuto sync;
>  };
>  
>  static void
> @@ -62,6 +64,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>                                   path,
>                                   backend->size, fb->align,
>                                   (backend->share ? RAM_SHARED : 0) |
> +                                 qemu_ram_sync_flags(fb->sync) |
>                                   (fb->is_pmem ? RAM_PMEM : 0),
>                                   fb->mem_path, errp);
>          g_free(path);
> @@ -136,6 +139,39 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
>      error_propagate(errp, local_err);
>  }
>  
> +static void file_memory_backend_get_sync(
> +    Object *obj, Visitor *v, const char *name, void *opaque, Error **errp)
> +{
> +    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(obj);
> +    OnOffAuto value = fb->sync;
> +
> +    visit_type_OnOffAuto(v, name, &value, errp);
> +}
> +
> +static void file_memory_backend_set_sync(
> +    Object *obj, Visitor *v, const char *name, void *opaque, Error **errp)
> +{
> +    HostMemoryBackend *backend = MEMORY_BACKEND(obj);
> +    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(obj);
> +    Error *local_err = NULL;
> +    OnOffAuto value;
> +
> +    if (host_memory_backend_mr_inited(backend)) {
> +        error_setg(&local_err, "cannot change property '%s' of %s",
> +                   name, object_get_typename(obj));
> +        goto out;
> +    }
> +
> +    visit_type_OnOffAuto(v, name, &value, &local_err);
> +    if (local_err) {
> +        goto out;
> +    }
> +    fb->sync = value;
> +
> + out:
> +    error_propagate(errp, local_err);
> +}
> +
>  static bool file_memory_backend_get_pmem(Object *o, Error **errp)
>  {
>      return MEMORY_BACKEND_FILE(o)->is_pmem;
> @@ -203,6 +239,9 @@ file_backend_class_init(ObjectClass *oc, void *data)
>      object_class_property_add_bool(oc, "pmem",
>          file_memory_backend_get_pmem, file_memory_backend_set_pmem,
>          &error_abort);
> +    object_class_property_add(oc, "sync", "OnOffAuto",
> +        file_memory_backend_get_sync, file_memory_backend_set_sync,
> +        NULL, NULL, &error_abort);
>  }
>  
>  static void file_backend_instance_finalize(Object *o)
> diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> index 5f158a6..d86a270 100644
> --- a/docs/nvdimm.txt
> +++ b/docs/nvdimm.txt
> @@ -142,11 +142,31 @@ backend of vNVDIMM:
>  Guest Data Persistence
>  ----------------------

>  
> +vNVDIMM is designed and implemented to guarantee
> the guest data
> +persistence on the backends even on the host crash and power
> +failures. However, there are still some requirements and limitations
> +as explained below.
> +

How about:

nNVDIMM can guarantee guest data persistence in case of a host
failure such as a crash or power failure, but only if the following
conditions are satistied.

>  Though QEMU supports multiple types of vNVDIMM backends on Linux,
> -currently the only one that can guarantee the guest write persistence
> +if MAP_SYNC is not supported by the host kernel and the backends,
> +the only backend that can guarantee the guest write persistence
>  is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
>  which all guest access do not involve any host-side kernel cache.

BTW I think this user-facing detail belongs in the man page, not the
source doc.


>  
> +mmap(2) flag MAP_SYNC is added since Linux kernel 4.15. On such
> +systems, QEMU can mmap(2) the backend with MAP_SYNC, which could

Please check all your uses of "could" and "may" and replace
with stronger wording that matches reality. 
I am guessing it's copied from
some other language and I'm trying not to be a language purist
but it's confusing in English.
E.g. no one wants an implementation that could guarantee consistence -
people want one that does guarantee it.

> +consistent filesystem metadata for each guest write. Besides the host
> +kernel support, enabling MAP_SYNC in QEMU also requires:
> +
> + - the backend is a file supporting DAX, e.g., a file on an ext4 or
> +   xfs file system mounted with '-o dax',

Why list the filesystems here? The list can change with Linux.

> +
> + - 'sync' option of memory-backend-file is not 'off', and
> +
> + - 'share' option of memory-backend-file is 'on'.
> +
> + - 'pmem' option of memory-backend-file is 'on'
> +

Wait isn't this what pmem was supposed to do?
Doesn't it mean "persistent memory"?

Just goes to show that abbreviation is not good in a user interface IMHO.

I think a higher level name like e.g.  "host-failure-persistent"
would be better. Have that set whatever flags you need to
actually be persistent. Thoughts?



>  When using other types of backends, it's suggested to set 'unarmed'
>  option of '-device nvdimm' to 'on', which sets the unarmed flag of the
>  guest NVDIMM region mapping structure.  This unarmed flag indicates
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index c74c467..b398abb 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -26,6 +26,7 @@
>  #include "qom/object.h"
>  #include "qemu/rcu.h"
>  #include "hw/qdev-core.h"
> +#include "qapi/error.h"
>  
>  #define RAM_ADDR_INVALID (~(ram_addr_t)0)
>  
> @@ -136,6 +137,13 @@ typedef struct IOMMUNotifier IOMMUNotifier;
>  
>  #define RAM_SYNC (RAM_SYNC_ON_OFF_AUTO_ON | RAM_SYNC_ON_OFF_AUTO_AUTO)
>  
> +static inline uint64_t qemu_ram_sync_flags(OnOffAuto v)
> +{
> +    return v == ON_OFF_AUTO_OFF ? RAM_SYNC_ON_OFF_AUTO_OFF :
> +           v == ON_OFF_AUTO_ON ? RAM_SYNC_ON_OFF_AUTO_ON :
> +           RAM_SYNC_ON_OFF_AUTO_AUTO;
> +}
> +

That's pretty confusing.
ON_OFF_AUTO_AUTO is painful enough. I don't think you want
to duplicate this with all prefixes like RAM_SYNC_ON_OFF_AUTO_ON.
Pls avoid doing it.


>  static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
>                                         IOMMUNotifierFlag flags,
>                                         hwaddr start, hwaddr end,
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 08f8516..624f634 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -3928,7 +3928,7 @@ property must be set.  These objects are placed in the
>  
>  @table @option
>  
> -@item -object memory-backend-file,id=@var{id},size=@var{size},mem-path=@var{dir},share=@var{on|off},discard-data=@var{on|off},merge=@var{on|off},dump=@var{on|off},prealloc=@var{on|off},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave},align=@var{align}
> +@item -object memory-backend-file,id=@var{id},size=@var{size},mem-path=@var{dir},share=@var{on|off},discard-data=@var{on|off},merge=@var{on|off},dump=@var{on|off},prealloc=@var{on|off},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave},align=@var{align},sync=@var{on|off|auto}
>  
>  Creates a memory file backend object, which can be used to back
>  the guest RAM with huge pages.
> @@ -4003,6 +4003,26 @@ If @option{pmem} is set to 'on', QEMU will take necessary operations to
>  guarantee the persistence of its own writes to @option{mem-path}
>  (e.g. in vNVDIMM label emulation and live migration).
>  
> +The @option{sync} option specifies whether QEMU mmap(2) @option{mem-path}
> +with MAP_SYNC flag, which can guarantee the guest write persistence to
> +@option{mem-path} even on the host crash and power failures. MAP_SYNC
> +requires supports from both the host kernel (since Linux kernel 4.15)
> +and @option{mem-path} (only files supporting DAX). It can take one of
> +following values:
> +
> +@table @option
> +@item @var{on}
> +try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
> +@option{share}=@var{off}, @option{pmem}=@var{off} QEMU will abort
> +
> +@item @var{off}
> +never pass MAP_SYNC to mmap(2)
> +
> +@item @var{auto} (default)
> +if MAP_SYNC is supported and @option{share}=@var{on}, @option{pmem}=@var{on}
> +work as if @option{sync}=@var{on}; otherwise, work as if @option{sync}=@var{off}
> +@end table
> +
>  @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
>  
>  Creates a memory backend object, which can be used to back the guest RAM.
> -- 
> 2.7.4

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH V7 6/6] hostmem-file: add 'sync' option
  2018-12-18 14:18   ` Michael S. Tsirkin
@ 2018-12-19  9:10     ` Yi Zhang
  2018-12-19 15:59       ` Michael S. Tsirkin
  0 siblings, 1 reply; 23+ messages in thread
From: Yi Zhang @ 2018-12-19  9:10 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: peter.maydell, yu.c.zhang, ehabkost, imammedo, qemu-devel,
	pbonzini, stefanha, crosthwaite.peter, richard.henderson,
	xiaoguangrong.eric

On 2018-12-18 at 09:18:50 -0500, Michael S. Tsirkin wrote:
> On Tue, Dec 18, 2018 at 04:17:39PM +0800, Zhang Yi wrote:
> > This option controls whether QEMU mmap(2)
> 
> will mmap
> 
> > the memory backend file with
> > MAP_SYNC flag, which could consistent filesystem metadata
> 
> I'm not sure what does "which could consistent" above mean.
> 
> 
> > for each guest
> > write, if MAP_SYNC flag is supported by the host kernel(Linux kernel 4.15
> 
> space before ( here pls.
> 
> > and later) and the backend is a file supporting DAX (e.g., file on ext4/xfs
> > file system mounted with '-o dax').
> > 
> > It can take one of following values:
> >  - on:  try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
> >         'share=off' or 'pmem!=on', QEMU will abort
> 
> Really abort? What if it's added with object_add?
> Should just fail not abort in that case ...
Ah.. that is a case, will test and improve it.
> 
> Also you know whether it's built into qemu at build time -
> how about skipping it in the schema and man page in that case?
> 
> 
> >  - off: never pass MAP_SYNC to mmap(2)
> >  - auto (default): if MAP_SYNC is supported and 'share=on' 'pmem=on', work as
> >         if 'sync=on'; otherwise, work as if 'sync=off'
> 
> So this is a mechanism. And users really do not care about
> the mechanism. What do users want? See discussion in nvdimm.txt
> 
> 
> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > ---
> >  backends/hostmem-file.c | 39 +++++++++++++++++++++++++++++++++++++++
> >  docs/nvdimm.txt         | 22 +++++++++++++++++++++-
> >  include/exec/memory.h   |  8 ++++++++
> >  qemu-options.hx         | 22 +++++++++++++++++++++-
> >  4 files changed, 89 insertions(+), 2 deletions(-)
> > 
> > diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
> > index 0dd7a90..73cf181 100644
> > --- a/backends/hostmem-file.c
> > +++ b/backends/hostmem-file.c
> > @@ -16,6 +16,7 @@
> >  #include "sysemu/hostmem.h"
> >  #include "sysemu/sysemu.h"
> >  #include "qom/object_interfaces.h"
> > +#include "qapi/qapi-visit.h"
> >  
> >  /* hostmem-file.c */
> >  /**
> > @@ -36,6 +37,7 @@ struct HostMemoryBackendFile {
> >      uint64_t align;
> >      bool discard_data;
> >      bool is_pmem;
> > +    OnOffAuto sync;
> >  };
> >  
> >  static void
> > @@ -62,6 +64,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
> >                                   path,
> >                                   backend->size, fb->align,
> >                                   (backend->share ? RAM_SHARED : 0) |
> > +                                 qemu_ram_sync_flags(fb->sync) |
> >                                   (fb->is_pmem ? RAM_PMEM : 0),
> >                                   fb->mem_path, errp);
> >          g_free(path);
> > @@ -136,6 +139,39 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
> >      error_propagate(errp, local_err);
> >  }
> >  
> > +static void file_memory_backend_get_sync(
> > +    Object *obj, Visitor *v, const char *name, void *opaque, Error **errp)
> > +{
> > +    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(obj);
> > +    OnOffAuto value = fb->sync;
> > +
> > +    visit_type_OnOffAuto(v, name, &value, errp);
> > +}
> > +
> > +static void file_memory_backend_set_sync(
> > +    Object *obj, Visitor *v, const char *name, void *opaque, Error **errp)
> > +{
> > +    HostMemoryBackend *backend = MEMORY_BACKEND(obj);
> > +    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(obj);
> > +    Error *local_err = NULL;
> > +    OnOffAuto value;
> > +
> > +    if (host_memory_backend_mr_inited(backend)) {
> > +        error_setg(&local_err, "cannot change property '%s' of %s",
> > +                   name, object_get_typename(obj));
> > +        goto out;
> > +    }
> > +
> > +    visit_type_OnOffAuto(v, name, &value, &local_err);
> > +    if (local_err) {
> > +        goto out;
> > +    }
> > +    fb->sync = value;
> > +
> > + out:
> > +    error_propagate(errp, local_err);
> > +}
> > +
> >  static bool file_memory_backend_get_pmem(Object *o, Error **errp)
> >  {
> >      return MEMORY_BACKEND_FILE(o)->is_pmem;
> > @@ -203,6 +239,9 @@ file_backend_class_init(ObjectClass *oc, void *data)
> >      object_class_property_add_bool(oc, "pmem",
> >          file_memory_backend_get_pmem, file_memory_backend_set_pmem,
> >          &error_abort);
> > +    object_class_property_add(oc, "sync", "OnOffAuto",
> > +        file_memory_backend_get_sync, file_memory_backend_set_sync,
> > +        NULL, NULL, &error_abort);
> >  }
> >  
> >  static void file_backend_instance_finalize(Object *o)
> > diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> > index 5f158a6..d86a270 100644
> > --- a/docs/nvdimm.txt
> > +++ b/docs/nvdimm.txt
> > @@ -142,11 +142,31 @@ backend of vNVDIMM:
> >  Guest Data Persistence
> >  ----------------------
> 
> >  
> > +vNVDIMM is designed and implemented to guarantee
> > the guest data
> > +persistence on the backends even on the host crash and power
> > +failures. However, there are still some requirements and limitations
> > +as explained below.
> > +
> 
> How about:
> 
> nNVDIMM can guarantee guest data persistence in case of a host
> failure such as a crash or power failure, but only if the following
> conditions are satistied.
> 
> >  Though QEMU supports multiple types of vNVDIMM backends on Linux,
> > -currently the only one that can guarantee the guest write persistence
> > +if MAP_SYNC is not supported by the host kernel and the backends,
> > +the only backend that can guarantee the guest write persistence
> >  is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
> >  which all guest access do not involve any host-side kernel cache.
> 
> BTW I think this user-facing detail belongs in the man page, not the
> source doc.

Agree.
> 
> 
> >  
> > +mmap(2) flag MAP_SYNC is added since Linux kernel 4.15. On such
> > +systems, QEMU can mmap(2) the backend with MAP_SYNC, which could
> 
> Please check all your uses of "could" and "may" and replace
> with stronger wording that matches reality. 
> I am guessing it's copied from
> some other language and I'm trying not to be a language purist
> but it's confusing in English.
> E.g. no one wants an implementation that could guarantee consistence -
> people want one that does guarantee it.
Sorry for bring lots of confuse for that, I will try to update the
descriptions.
> 
> > +consistent filesystem metadata for each guest write. Besides the host
> > +kernel support, enabling MAP_SYNC in QEMU also requires:
> > +
> > + - the backend is a file supporting DAX, e.g., a file on an ext4 or
> > +   xfs file system mounted with '-o dax',
> 
> Why list the filesystems here? The list can change with Linux.
> 
> > +
> > + - 'sync' option of memory-backend-file is not 'off', and
> > +
> > + - 'share' option of memory-backend-file is 'on'.
> > +
> > + - 'pmem' option of memory-backend-file is 'on'
> > +
> 
> Wait isn't this what pmem was supposed to do?
> Doesn't it mean "persistent memory"?
pmem is a option for memory-backend-file, user should know the backend
is in host persistent memory, with this flags on, while there is a host crash
or a power failures.

[1] Qemu will take necessary operations to guarantee the persistence.
https://patchwork.ozlabs.org/cover/944749/ 

[2] Host kernel also take opretions to consistent filesystem metadata.
Add MAP_SYNC flags.

> 
> Just goes to show that abbreviation is not good in a user interface IMHO.
> 
> I think a higher level name like e.g.  "host-failure-persistent"
> would be better. Have that set whatever flags you need to
> actually be persistent. Thoughts?
> 
> 
> 
> >  When using other types of backends, it's suggested to set 'unarmed'
> >  option of '-device nvdimm' to 'on', which sets the unarmed flag of the
> >  guest NVDIMM region mapping structure.  This unarmed flag indicates
> > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > index c74c467..b398abb 100644
> > --- a/include/exec/memory.h
> > +++ b/include/exec/memory.h
> > @@ -26,6 +26,7 @@
> >  #include "qom/object.h"
> >  #include "qemu/rcu.h"
> >  #include "hw/qdev-core.h"
> > +#include "qapi/error.h"
> >  
> >  #define RAM_ADDR_INVALID (~(ram_addr_t)0)
> >  
> > @@ -136,6 +137,13 @@ typedef struct IOMMUNotifier IOMMUNotifier;
> >  
> >  #define RAM_SYNC (RAM_SYNC_ON_OFF_AUTO_ON | RAM_SYNC_ON_OFF_AUTO_AUTO)
> >  
> > +static inline uint64_t qemu_ram_sync_flags(OnOffAuto v)
> > +{
> > +    return v == ON_OFF_AUTO_OFF ? RAM_SYNC_ON_OFF_AUTO_OFF :
> > +           v == ON_OFF_AUTO_ON ? RAM_SYNC_ON_OFF_AUTO_ON :
> > +           RAM_SYNC_ON_OFF_AUTO_AUTO;
> > +}
> > +
> 
> That's pretty confusing.
> ON_OFF_AUTO_AUTO is painful enough. I don't think you want
> to duplicate this with all prefixes like RAM_SYNC_ON_OFF_AUTO_ON.
> Pls avoid doing it.
> 
> 
> >  static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
> >                                         IOMMUNotifierFlag flags,
> >                                         hwaddr start, hwaddr end,
> > diff --git a/qemu-options.hx b/qemu-options.hx
> > index 08f8516..624f634 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -3928,7 +3928,7 @@ property must be set.  These objects are placed in the
> >  
> >  @table @option
> >  
> > -@item -object memory-backend-file,id=@var{id},size=@var{size},mem-path=@var{dir},share=@var{on|off},discard-data=@var{on|off},merge=@var{on|off},dump=@var{on|off},prealloc=@var{on|off},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave},align=@var{align}
> > +@item -object memory-backend-file,id=@var{id},size=@var{size},mem-path=@var{dir},share=@var{on|off},discard-data=@var{on|off},merge=@var{on|off},dump=@var{on|off},prealloc=@var{on|off},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave},align=@var{align},sync=@var{on|off|auto}
> >  
> >  Creates a memory file backend object, which can be used to back
> >  the guest RAM with huge pages.
> > @@ -4003,6 +4003,26 @@ If @option{pmem} is set to 'on', QEMU will take necessary operations to
> >  guarantee the persistence of its own writes to @option{mem-path}
> >  (e.g. in vNVDIMM label emulation and live migration).
> >  
> > +The @option{sync} option specifies whether QEMU mmap(2) @option{mem-path}
> > +with MAP_SYNC flag, which can guarantee the guest write persistence to
> > +@option{mem-path} even on the host crash and power failures. MAP_SYNC
> > +requires supports from both the host kernel (since Linux kernel 4.15)
> > +and @option{mem-path} (only files supporting DAX). It can take one of
> > +following values:
> > +
> > +@table @option
> > +@item @var{on}
> > +try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
> > +@option{share}=@var{off}, @option{pmem}=@var{off} QEMU will abort
> > +
> > +@item @var{off}
> > +never pass MAP_SYNC to mmap(2)
> > +
> > +@item @var{auto} (default)
> > +if MAP_SYNC is supported and @option{share}=@var{on}, @option{pmem}=@var{on}
> > +work as if @option{sync}=@var{on}; otherwise, work as if @option{sync}=@var{off}
> > +@end table
> > +
> >  @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
> >  
> >  Creates a memory backend object, which can be used to back the guest RAM.
> > -- 
> > 2.7.4

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH V7 3/6] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  2018-12-18 13:52   ` Michael S. Tsirkin
@ 2018-12-19  9:25     ` Yi Zhang
  2018-12-19 16:42       ` Michael S. Tsirkin
  0 siblings, 1 reply; 23+ messages in thread
From: Yi Zhang @ 2018-12-19  9:25 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: peter.maydell, yu.c.zhang, ehabkost, imammedo, qemu-devel,
	pbonzini, stefanha, crosthwaite.peter, richard.henderson,
	xiaoguangrong.eric

On 2018-12-18 at 08:52:09 -0500, Michael S. Tsirkin wrote:
> On Tue, Dec 18, 2018 at 04:17:12PM +0800, Zhang Yi wrote:
> > When a file supporting DAX is used as vNVDIMM backend, mmap it with
> > MAP_SYNC flag in addition can guarantee the persistence of guest write
> > to the backend file without other QEMU actions (e.g., periodic fsync()
> > by QEMU).
> > 
> > A set of RAM_SYNC flags are added to qemu_ram_mmap():
> > 
> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> 
> Should we be reusing RAM_ flags? These are used widely
> already -are you going to allow RAM_SYNC everywhere
> where other RAM flags are legal?
> E.g. memory_region_init_ram_from_file?
To reuse the RAM_* is much straightforward, just like RAM_SHARE, it is a
mmmap flag, and we will check the inlegal flags at a easy way.

assert((ram_flags & ~(RAM_SHARED | RAM_PMEM | RAM_SYNC)) == 0);

What do you think, do you have a better choice? Thanks, Micheal.
> 
> 
> > ---
> >  exec.c                    |  2 +-
> >  include/exec/memory.h     |  3 +++
> >  include/exec/ram_addr.h   |  1 +
> >  include/qemu/mmap-alloc.h |  1 +
> >  include/qemu/osdep.h      | 29 +++++++++++++++++++++++++++++
> >  util/mmap-alloc.c         | 14 ++++++++++----
> >  6 files changed, 45 insertions(+), 5 deletions(-)
> > 
> > diff --git a/exec.c b/exec.c
> > index e92a7da..dc4d180 100644
> > --- a/exec.c
> > +++ b/exec.c
> > @@ -2241,7 +2241,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
> >      int64_t file_size;
> >  
> >      /* Just support these ram flags by now. */
> > -    assert((ram_flags & ~(RAM_SHARED | RAM_PMEM)) == 0);
> > +    assert((ram_flags & ~(RAM_SHARED | RAM_PMEM | RAM_SYNC)) == 0);
> >  
> >      if (xen_enabled()) {
> >          error_setg(errp, "-mem-path not supported with Xen");
> > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > index 667466b..33a4e2c 100644
> > --- a/include/exec/memory.h
> > +++ b/include/exec/memory.h
> > @@ -126,6 +126,9 @@ typedef struct IOMMUNotifier IOMMUNotifier;
> >  /* RAM is a persistent kind memory */
> >  #define RAM_PMEM (1 << 5)
> >  
> > +/* RAM can be mmap by a MAP_SYNC flag */
> 
> "RAM is mmap-ed with MAP_SYNC"
> 
> > +#define RAM_SYNC (1 << 6)
> > +
> >  static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
> >                                         IOMMUNotifierFlag flags,
> >                                         hwaddr start, hwaddr end,
> 
> 
> 
> 
> 
> > diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
> > index 9ecd911..d239ce7 100644
> > --- a/include/exec/ram_addr.h
> > +++ b/include/exec/ram_addr.h
> > @@ -87,6 +87,7 @@ long qemu_getrampagesize(void);
> >   *              or bit-or of following values
> >   *              - RAM_SHARED: mmap the backing file or device with MAP_SHARED
> >   *              - RAM_PMEM: the backend @mem_path or @fd is persistent memory
> > + *              - RAM_SYNC:   mmap with MAP_SYNC flag
> >   *              Other bits are ignored.
> >   *  @mem_path or @fd: specify the backing file or device
> >   *  @errp: pointer to Error*, to store an error if it happens
> > diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
> > index 6fe6ed4..1755a8b 100644
> > --- a/include/qemu/mmap-alloc.h
> > +++ b/include/qemu/mmap-alloc.h
> > @@ -18,6 +18,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path);
> >   *  @flags: specifies additional properties of the mapping, which can be one or
> >   *          bit-or of following values
> >   *          - RAM_SHARED: mmap with MAP_SHARED flag
> > + *          - RAM_SYNC:   mmap with MAP_SYNC flag
> >   *          Other bits are ignored.
> >   *
> >   * Return:
> > diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> > index 3bf48bc..f94ea68 100644
> > --- a/include/qemu/osdep.h
> > +++ b/include/qemu/osdep.h
> > @@ -410,6 +410,35 @@ void qemu_anon_ram_free(void *ptr, size_t size);
> >  #  define QEMU_VMALLOC_ALIGN getpagesize()
> >  #endif
> >  
> > +/*
> > + * MAP_SHARED_VALIDATE and MAP_SYNC are introduced in Linux kernel
> > + * 4.15, so they may not be defined when compiling on older kernels.
> 
> Then you want to import the relevant headers into linux-headers
> using scripts/update-linux-headers.sh. Pls do not duplicate code.
Indeed, will update it. 
> 
> > + */
> > +#ifdef CONFIG_LINUX
> > +
> > +#include <sys/mman.h>
> > +
> > +#ifndef MAP_SHARED_VALIDATE
> > +#define MAP_SHARED_VALIDATE   0x3
> > +#endif
> > +
> > +#ifndef MAP_SYNC
> > +#define MAP_SYNC              0x80000
> > +#endif
> > +
> > +/* MAP_SYNC is only available with MAP_SHARED_VALIDATE. */
> > +#define MAP_SYNC_FLAGS (MAP_SYNC | MAP_SHARED_VALIDATE)
> 
> Given mmap flags tend to start with MAP_, adding our own ones
> unconditionally is not a good idea.
How about to move the define to other place? 
> 
> 
> > +
> > +#else  /* !CONFIG_LINUX */
> > +
> > +#define MAP_SHARED_VALIDATE   0x0
> > +#define MAP_SYNC              0x0
> > +
> > +#define QEMU_HAS_MAP_SYNC     false
> 
> Isn't just checking MAP_SYNC enough?
> 
> > +#define MAP_SYNC_FLAGS 0
> > +
> > +#endif /* CONFIG_LINUX */
> > +
> >  #ifdef CONFIG_POSIX
> >  struct qemu_signalfd_siginfo {
> >      uint32_t ssi_signo;   /* Signal number */
> > diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> > index 8f0a740..89ae862 100644
> > --- a/util/mmap-alloc.c
> > +++ b/util/mmap-alloc.c
> > @@ -99,6 +99,8 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
> >      void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> >  #endif
> >      bool shared = flags & RAM_SHARED;
> > +    bool is_pmem = flags & RAM_PMEM;
> > +    int mmap_xflags = 0;
> >      size_t offset;
> >      void *ptr1;
> >  
> > @@ -109,16 +111,20 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
> >      assert(is_power_of_2(align));
> >      /* Always align to host page size */
> >      assert(align >= getpagesize());
> > +    if ((flags & RAM_SYNC) && shared && is_pmem) {
> > +        mmap_xflags |= MAP_SYNC_FLAGS;
> > +    }
> >  
> >      offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
> > + retry_mmap_fd:
> >      ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
> >                  MAP_FIXED |
> >                  (fd == -1 ? MAP_ANONYMOUS : 0) |
> > -                (shared ? MAP_SHARED : MAP_PRIVATE),
> > +                (shared ? MAP_SHARED : MAP_PRIVATE) | mmap_xflags,
> >                  fd, 0);
> > -    if (ptr1 == MAP_FAILED) {
> > -        munmap(ptr, total);
> > -        return MAP_FAILED;
> > +    if ((ptr1 == MAP_FAILED) && (mmap_xflags & MAP_SYNC_FLAGS)) {
> > +        mmap_xflags &= ~MAP_SYNC_FLAGS;
> > +        goto retry_mmap_fd;
> >      }
> >  
> >      if (offset > 0) {
> > -- 
> > 2.7.4

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH V7 6/6] hostmem-file: add 'sync' option
  2018-12-19  9:10     ` Yi Zhang
@ 2018-12-19 15:59       ` Michael S. Tsirkin
  2018-12-20  3:03         ` Yi Zhang
  0 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2018-12-19 15:59 UTC (permalink / raw)
  To: peter.maydell, yu.c.zhang, ehabkost, imammedo, qemu-devel,
	pbonzini, stefanha, crosthwaite.peter, richard.henderson,
	xiaoguangrong.eric

On Wed, Dec 19, 2018 at 05:10:18PM +0800, Yi Zhang wrote:
> > > +
> > > + - 'sync' option of memory-backend-file is not 'off', and
> > > +
> > > + - 'share' option of memory-backend-file is 'on'.
> > > +
> > > + - 'pmem' option of memory-backend-file is 'on'
> > > +
> > 
> > Wait isn't this what pmem was supposed to do?
> > Doesn't it mean "persistent memory"?
> pmem is a option for memory-backend-file, user should know the backend
> is in host persistent memory, with this flags on, while there is a host crash
> or a power failures.
> 
> [1] Qemu will take necessary operations to guarantee the persistence.
> https://patchwork.ozlabs.org/cover/944749/ 
> 
> [2] Host kernel also take opretions to consistent filesystem metadata.
> Add MAP_SYNC flags.

OK so I'm a user. Can you educate me please?  When should MAP_SYNC not
be set? Are there any disadvantages (e.g.  performance?)?

-- 
MST

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH V7 3/6] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  2018-12-19  9:25     ` Yi Zhang
@ 2018-12-19 16:42       ` Michael S. Tsirkin
  0 siblings, 0 replies; 23+ messages in thread
From: Michael S. Tsirkin @ 2018-12-19 16:42 UTC (permalink / raw)
  To: peter.maydell, yu.c.zhang, ehabkost, imammedo, qemu-devel,
	pbonzini, stefanha, crosthwaite.peter, richard.henderson,
	xiaoguangrong.eric

On Wed, Dec 19, 2018 at 05:25:29PM +0800, Yi Zhang wrote:
> On 2018-12-18 at 08:52:09 -0500, Michael S. Tsirkin wrote:
> > On Tue, Dec 18, 2018 at 04:17:12PM +0800, Zhang Yi wrote:
> > > When a file supporting DAX is used as vNVDIMM backend, mmap it with
> > > MAP_SYNC flag in addition can guarantee the persistence of guest write
> > > to the backend file without other QEMU actions (e.g., periodic fsync()
> > > by QEMU).
> > > 
> > > A set of RAM_SYNC flags are added to qemu_ram_mmap():
> > > 
> > > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > 
> > Should we be reusing RAM_ flags? These are used widely
> > already -are you going to allow RAM_SYNC everywhere
> > where other RAM flags are legal?
> > E.g. memory_region_init_ram_from_file?
> To reuse the RAM_* is much straightforward, just like RAM_SHARE, it is a
> mmmap flag, and we will check the inlegal flags at a easy way.
> 
> assert((ram_flags & ~(RAM_SHARED | RAM_PMEM | RAM_SYNC)) == 0);

Doesn't help if a flag misuse happens to match one of these.

> What do you think, do you have a better choice? Thanks, Micheal.

How about your own flag and static checking with e.g. sparse?

> > 
> > 
> > > ---
> > >  exec.c                    |  2 +-
> > >  include/exec/memory.h     |  3 +++
> > >  include/exec/ram_addr.h   |  1 +
> > >  include/qemu/mmap-alloc.h |  1 +
> > >  include/qemu/osdep.h      | 29 +++++++++++++++++++++++++++++
> > >  util/mmap-alloc.c         | 14 ++++++++++----
> > >  6 files changed, 45 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/exec.c b/exec.c
> > > index e92a7da..dc4d180 100644
> > > --- a/exec.c
> > > +++ b/exec.c
> > > @@ -2241,7 +2241,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
> > >      int64_t file_size;
> > >  
> > >      /* Just support these ram flags by now. */
> > > -    assert((ram_flags & ~(RAM_SHARED | RAM_PMEM)) == 0);
> > > +    assert((ram_flags & ~(RAM_SHARED | RAM_PMEM | RAM_SYNC)) == 0);
> > >  
> > >      if (xen_enabled()) {
> > >          error_setg(errp, "-mem-path not supported with Xen");
> > > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > > index 667466b..33a4e2c 100644
> > > --- a/include/exec/memory.h
> > > +++ b/include/exec/memory.h
> > > @@ -126,6 +126,9 @@ typedef struct IOMMUNotifier IOMMUNotifier;
> > >  /* RAM is a persistent kind memory */
> > >  #define RAM_PMEM (1 << 5)
> > >  
> > > +/* RAM can be mmap by a MAP_SYNC flag */
> > 
> > "RAM is mmap-ed with MAP_SYNC"
> > 
> > > +#define RAM_SYNC (1 << 6)
> > > +
> > >  static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
> > >                                         IOMMUNotifierFlag flags,
> > >                                         hwaddr start, hwaddr end,
> > 
> > 
> > 
> > 
> > 
> > > diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
> > > index 9ecd911..d239ce7 100644
> > > --- a/include/exec/ram_addr.h
> > > +++ b/include/exec/ram_addr.h
> > > @@ -87,6 +87,7 @@ long qemu_getrampagesize(void);
> > >   *              or bit-or of following values
> > >   *              - RAM_SHARED: mmap the backing file or device with MAP_SHARED
> > >   *              - RAM_PMEM: the backend @mem_path or @fd is persistent memory
> > > + *              - RAM_SYNC:   mmap with MAP_SYNC flag
> > >   *              Other bits are ignored.
> > >   *  @mem_path or @fd: specify the backing file or device
> > >   *  @errp: pointer to Error*, to store an error if it happens
> > > diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
> > > index 6fe6ed4..1755a8b 100644
> > > --- a/include/qemu/mmap-alloc.h
> > > +++ b/include/qemu/mmap-alloc.h
> > > @@ -18,6 +18,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path);
> > >   *  @flags: specifies additional properties of the mapping, which can be one or
> > >   *          bit-or of following values
> > >   *          - RAM_SHARED: mmap with MAP_SHARED flag
> > > + *          - RAM_SYNC:   mmap with MAP_SYNC flag
> > >   *          Other bits are ignored.
> > >   *
> > >   * Return:
> > > diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> > > index 3bf48bc..f94ea68 100644
> > > --- a/include/qemu/osdep.h
> > > +++ b/include/qemu/osdep.h
> > > @@ -410,6 +410,35 @@ void qemu_anon_ram_free(void *ptr, size_t size);
> > >  #  define QEMU_VMALLOC_ALIGN getpagesize()
> > >  #endif
> > >  
> > > +/*
> > > + * MAP_SHARED_VALIDATE and MAP_SYNC are introduced in Linux kernel
> > > + * 4.15, so they may not be defined when compiling on older kernels.
> > 
> > Then you want to import the relevant headers into linux-headers
> > using scripts/update-linux-headers.sh. Pls do not duplicate code.
> Indeed, will update it. 
> > 
> > > + */
> > > +#ifdef CONFIG_LINUX
> > > +
> > > +#include <sys/mman.h>
> > > +
> > > +#ifndef MAP_SHARED_VALIDATE
> > > +#define MAP_SHARED_VALIDATE   0x3
> > > +#endif
> > > +
> > > +#ifndef MAP_SYNC
> > > +#define MAP_SYNC              0x80000
> > > +#endif
> > > +
> > > +/* MAP_SYNC is only available with MAP_SHARED_VALIDATE. */
> > > +#define MAP_SYNC_FLAGS (MAP_SYNC | MAP_SHARED_VALIDATE)
> > 
> > Given mmap flags tend to start with MAP_, adding our own ones
> > unconditionally is not a good idea.
> How about to move the define to other place? 

I'd just open-code, it's not used so widely.

> > 
> > 
> > > +
> > > +#else  /* !CONFIG_LINUX */
> > > +
> > > +#define MAP_SHARED_VALIDATE   0x0
> > > +#define MAP_SYNC              0x0
> > > +
> > > +#define QEMU_HAS_MAP_SYNC     false
> > 
> > Isn't just checking MAP_SYNC enough?
> > 
> > > +#define MAP_SYNC_FLAGS 0
> > > +
> > > +#endif /* CONFIG_LINUX */
> > > +
> > >  #ifdef CONFIG_POSIX
> > >  struct qemu_signalfd_siginfo {
> > >      uint32_t ssi_signo;   /* Signal number */
> > > diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> > > index 8f0a740..89ae862 100644
> > > --- a/util/mmap-alloc.c
> > > +++ b/util/mmap-alloc.c
> > > @@ -99,6 +99,8 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
> > >      void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> > >  #endif
> > >      bool shared = flags & RAM_SHARED;
> > > +    bool is_pmem = flags & RAM_PMEM;
> > > +    int mmap_xflags = 0;
> > >      size_t offset;
> > >      void *ptr1;
> > >  
> > > @@ -109,16 +111,20 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
> > >      assert(is_power_of_2(align));
> > >      /* Always align to host page size */
> > >      assert(align >= getpagesize());
> > > +    if ((flags & RAM_SYNC) && shared && is_pmem) {
> > > +        mmap_xflags |= MAP_SYNC_FLAGS;
> > > +    }
> > >  
> > >      offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
> > > + retry_mmap_fd:
> > >      ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
> > >                  MAP_FIXED |
> > >                  (fd == -1 ? MAP_ANONYMOUS : 0) |
> > > -                (shared ? MAP_SHARED : MAP_PRIVATE),
> > > +                (shared ? MAP_SHARED : MAP_PRIVATE) | mmap_xflags,
> > >                  fd, 0);
> > > -    if (ptr1 == MAP_FAILED) {
> > > -        munmap(ptr, total);
> > > -        return MAP_FAILED;
> > > +    if ((ptr1 == MAP_FAILED) && (mmap_xflags & MAP_SYNC_FLAGS)) {
> > > +        mmap_xflags &= ~MAP_SYNC_FLAGS;
> > > +        goto retry_mmap_fd;
> > >      }
> > >  
> > >      if (offset > 0) {
> > > -- 
> > > 2.7.4

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH V7 6/6] hostmem-file: add 'sync' option
  2018-12-19 15:59       ` Michael S. Tsirkin
@ 2018-12-20  3:03         ` Yi Zhang
  2018-12-20  3:42           ` Michael S. Tsirkin
  0 siblings, 1 reply; 23+ messages in thread
From: Yi Zhang @ 2018-12-20  3:03 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: peter.maydell, yu.c.zhang, ehabkost, imammedo, qemu-devel,
	pbonzini, stefanha, crosthwaite.peter, richard.henderson,
	xiaoguangrong.eric

On 2018-12-19 at 10:59:10 -0500, Michael S. Tsirkin wrote:
> On Wed, Dec 19, 2018 at 05:10:18PM +0800, Yi Zhang wrote:
> > > > +
> > > > + - 'sync' option of memory-backend-file is not 'off', and
> > > > +
> > > > + - 'share' option of memory-backend-file is 'on'.
> > > > +
> > > > + - 'pmem' option of memory-backend-file is 'on'
> > > > +
> > > 
> > > Wait isn't this what pmem was supposed to do?
> > > Doesn't it mean "persistent memory"?
> > pmem is a option for memory-backend-file, user should know the backend
> > is in host persistent memory, with this flags on, while there is a host crash
> > or a power failures.
> > 
> > [1] Qemu will take necessary operations to guarantee the persistence.
> > https://patchwork.ozlabs.org/cover/944749/ 
> > 
> > [2] Host kernel also take opretions to consistent filesystem metadata.
> > Add MAP_SYNC flags.
> 
> OK so I'm a user. Can you educate me please?  
We suppose an administrator should know it, what is the back-end region coming from,
is it persistent? what is the font-end device is? a volatile dimm or an
nonvolatile dimm? then they can choice put the pmem=on[off] and sync=on[off].
If he didn't, we encourage that don't set these 2 flags.

> When should MAP_SYNC not
> be set? Are there any disadvantages (e.g.  performance?)?
Not only the performance, sometimes like the front-end device is an
volatile ram, we don't wanna set such option although the backend is a
novolatile memory, if power lose, all of thing should lose in this ram.

> 
> -- 
> MST
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH V7 6/6] hostmem-file: add 'sync' option
  2018-12-20  3:03         ` Yi Zhang
@ 2018-12-20  3:42           ` Michael S. Tsirkin
  2018-12-20  5:37             ` Yi Zhang
  0 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2018-12-20  3:42 UTC (permalink / raw)
  To: peter.maydell, yu.c.zhang, ehabkost, imammedo, qemu-devel,
	pbonzini, stefanha, crosthwaite.peter, richard.henderson,
	xiaoguangrong.eric

On Thu, Dec 20, 2018 at 11:03:12AM +0800, Yi Zhang wrote:
> On 2018-12-19 at 10:59:10 -0500, Michael S. Tsirkin wrote:
> > On Wed, Dec 19, 2018 at 05:10:18PM +0800, Yi Zhang wrote:
> > > > > +
> > > > > + - 'sync' option of memory-backend-file is not 'off', and
> > > > > +
> > > > > + - 'share' option of memory-backend-file is 'on'.
> > > > > +
> > > > > + - 'pmem' option of memory-backend-file is 'on'
> > > > > +
> > > > 
> > > > Wait isn't this what pmem was supposed to do?
> > > > Doesn't it mean "persistent memory"?
> > > pmem is a option for memory-backend-file, user should know the backend
> > > is in host persistent memory, with this flags on, while there is a host crash
> > > or a power failures.
> > > 
> > > [1] Qemu will take necessary operations to guarantee the persistence.
> > > https://patchwork.ozlabs.org/cover/944749/ 
> > > 
> > > [2] Host kernel also take opretions to consistent filesystem metadata.
> > > Add MAP_SYNC flags.
> > 
> > OK so I'm a user. Can you educate me please?  
> We suppose an administrator should know it, what is the back-end region coming from,
> is it persistent? what is the font-end device is? a volatile dimm or an
> nonvolatile dimm? then they can choice put the pmem=on[off] and sync=on[off].
> If he didn't, we encourage that don't set these 2 flags.
> 
> > When should MAP_SYNC not
> > be set? Are there any disadvantages (e.g.  performance?)?
> Not only the performance, sometimes like the front-end device is an
> volatile ram, we don't wanna set such option although the backend is a
> novolatile memory, if power lose, all of thing should lose in this ram.



I am not sure how does above answer the questions. If I don't know,
neither will the hypothetical administrator. Looks like a better
interface is needed to make the choice on behalf of the user.


> > 
> > -- 
> > MST
> > 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH V7 6/6] hostmem-file: add 'sync' option
  2018-12-20  3:42           ` Michael S. Tsirkin
@ 2018-12-20  5:37             ` Yi Zhang
  2018-12-20 14:06               ` Michael S. Tsirkin
  0 siblings, 1 reply; 23+ messages in thread
From: Yi Zhang @ 2018-12-20  5:37 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: peter.maydell, yu.c.zhang, ehabkost, imammedo, qemu-devel,
	pbonzini, stefanha, crosthwaite.peter, richard.henderson,
	xiaoguangrong.eric

On 2018-12-19 at 22:42:07 -0500, Michael S. Tsirkin wrote:
> On Thu, Dec 20, 2018 at 11:03:12AM +0800, Yi Zhang wrote:
> > On 2018-12-19 at 10:59:10 -0500, Michael S. Tsirkin wrote:
> > > On Wed, Dec 19, 2018 at 05:10:18PM +0800, Yi Zhang wrote:
> > > > > > +
> > > > > > + - 'sync' option of memory-backend-file is not 'off', and
> > > > > > +
> > > > > > + - 'share' option of memory-backend-file is 'on'.
> > > > > > +
> > > > > > + - 'pmem' option of memory-backend-file is 'on'
> > > > > > +
> > > > > 
> > > > > Wait isn't this what pmem was supposed to do?
> > > > > Doesn't it mean "persistent memory"?
> > > > pmem is a option for memory-backend-file, user should know the backend
> > > > is in host persistent memory, with this flags on, while there is a host crash
> > > > or a power failures.
> > > > 
> > > > [1] Qemu will take necessary operations to guarantee the persistence.
> > > > https://patchwork.ozlabs.org/cover/944749/ 
> > > > 
> > > > [2] Host kernel also take opretions to consistent filesystem metadata.
> > > > Add MAP_SYNC flags.
> > > 
> > > OK so I'm a user. Can you educate me please?  
> > We suppose an administrator should know it, what is the back-end region coming from,
> > is it persistent? what is the font-end device is? a volatile dimm or an
> > nonvolatile dimm? then they can choice put the pmem=on[off] and sync=on[off].
> > If he didn't, we encourage that don't set these 2 flags.
> > 
> > > When should MAP_SYNC not
> > > be set? Are there any disadvantages (e.g.  performance?)?
> > Not only the performance, sometimes like the front-end device is an
> > volatile ram, we don't wanna set such option although the backend is a
> > novolatile memory, if power lose, all of thing should lose in this ram.
> 
> 
> 
> I am not sure how does above answer the questions. If I don't know,
> neither will the hypothetical administrator. Looks like a better
> interface is needed to make the choice on behalf of the user.
> 
By default, we have already ignore the 2 flags, unless the administrator
know how to make that use. By-now, seems we don't have a better way to detect what
memory-backend-file is, a persistent memory or not.
> 
> > > 
> > > -- 
> > > MST
> > > 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH V7 6/6] hostmem-file: add 'sync' option
  2018-12-20  5:37             ` Yi Zhang
@ 2018-12-20 14:06               ` Michael S. Tsirkin
  2018-12-21  3:18                 ` Yi Zhang
  0 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2018-12-20 14:06 UTC (permalink / raw)
  To: peter.maydell, yu.c.zhang, ehabkost, imammedo, qemu-devel,
	pbonzini, stefanha, crosthwaite.peter, richard.henderson,
	xiaoguangrong.eric

On Thu, Dec 20, 2018 at 01:37:40PM +0800, Yi Zhang wrote:
> On 2018-12-19 at 22:42:07 -0500, Michael S. Tsirkin wrote:
> > On Thu, Dec 20, 2018 at 11:03:12AM +0800, Yi Zhang wrote:
> > > On 2018-12-19 at 10:59:10 -0500, Michael S. Tsirkin wrote:
> > > > On Wed, Dec 19, 2018 at 05:10:18PM +0800, Yi Zhang wrote:
> > > > > > > +
> > > > > > > + - 'sync' option of memory-backend-file is not 'off', and
> > > > > > > +
> > > > > > > + - 'share' option of memory-backend-file is 'on'.
> > > > > > > +
> > > > > > > + - 'pmem' option of memory-backend-file is 'on'
> > > > > > > +
> > > > > > 
> > > > > > Wait isn't this what pmem was supposed to do?
> > > > > > Doesn't it mean "persistent memory"?
> > > > > pmem is a option for memory-backend-file, user should know the backend
> > > > > is in host persistent memory, with this flags on, while there is a host crash
> > > > > or a power failures.
> > > > > 
> > > > > [1] Qemu will take necessary operations to guarantee the persistence.
> > > > > https://patchwork.ozlabs.org/cover/944749/ 
> > > > > 
> > > > > [2] Host kernel also take opretions to consistent filesystem metadata.
> > > > > Add MAP_SYNC flags.
> > > > 
> > > > OK so I'm a user. Can you educate me please?  
> > > We suppose an administrator should know it, what is the back-end region coming from,
> > > is it persistent? what is the font-end device is? a volatile dimm or an
> > > nonvolatile dimm? then they can choice put the pmem=on[off] and sync=on[off].
> > > If he didn't, we encourage that don't set these 2 flags.
> > > 
> > > > When should MAP_SYNC not
> > > > be set? Are there any disadvantages (e.g.  performance?)?
> > > Not only the performance, sometimes like the front-end device is an
> > > volatile ram, we don't wanna set such option although the backend is a
> > > novolatile memory, if power lose, all of thing should lose in this ram.
> > 
> > 
> > 
> > I am not sure how does above answer the questions. If I don't know,
> > neither will the hypothetical administrator. Looks like a better
> > interface is needed to make the choice on behalf of the user.
> > 
> By default, we have already ignore the 2 flags, unless the administrator
> know how to make that use. By-now, seems we don't have a better way to detect what
> memory-backend-file is, a persistent memory or not.

In that case how about an interface where user tells QEMU "this backend
is in persistent memory"?



> > 
> > > > 
> > > > -- 
> > > > MST
> > > > 
> > 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH V7 6/6] hostmem-file: add 'sync' option
  2018-12-20 14:06               ` Michael S. Tsirkin
@ 2018-12-21  3:18                 ` Yi Zhang
  2018-12-21 16:36                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 23+ messages in thread
From: Yi Zhang @ 2018-12-21  3:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: peter.maydell, yu.c.zhang, ehabkost, imammedo, qemu-devel,
	pbonzini, stefanha, crosthwaite.peter, richard.henderson,
	xiaoguangrong.eric

On 2018-12-20 at 09:06:41 -0500, Michael S. Tsirkin wrote:
> On Thu, Dec 20, 2018 at 01:37:40PM +0800, Yi Zhang wrote:
> > On 2018-12-19 at 22:42:07 -0500, Michael S. Tsirkin wrote:
> > > On Thu, Dec 20, 2018 at 11:03:12AM +0800, Yi Zhang wrote:
> > > > On 2018-12-19 at 10:59:10 -0500, Michael S. Tsirkin wrote:
> > > > > On Wed, Dec 19, 2018 at 05:10:18PM +0800, Yi Zhang wrote:
> > > > > > > > +
> > > > > > > > + - 'sync' option of memory-backend-file is not 'off', and
> > > > > > > > +
> > > > > > > > + - 'share' option of memory-backend-file is 'on'.
> > > > > > > > +
> > > > > > > > + - 'pmem' option of memory-backend-file is 'on'
> > > > > > > > +
> > > > > > > 
> > > > > > > Wait isn't this what pmem was supposed to do?
> > > > > > > Doesn't it mean "persistent memory"?
> > > > > > pmem is a option for memory-backend-file, user should know the backend
> > > > > > is in host persistent memory, with this flags on, while there is a host crash
> > > > > > or a power failures.
> > > > > > 
> > > > > > [1] Qemu will take necessary operations to guarantee the persistence.
> > > > > > https://patchwork.ozlabs.org/cover/944749/ 
> > > > > > 
> > > > > > [2] Host kernel also take opretions to consistent filesystem metadata.
> > > > > > Add MAP_SYNC flags.
> > > > > 
> > > > > OK so I'm a user. Can you educate me please?  
> > > > We suppose an administrator should know it, what is the back-end region coming from,
> > > > is it persistent? what is the font-end device is? a volatile dimm or an
> > > > nonvolatile dimm? then they can choice put the pmem=on[off] and sync=on[off].
> > > > If he didn't, we encourage that don't set these 2 flags.
> > > > 
> > > > > When should MAP_SYNC not
> > > > > be set? Are there any disadvantages (e.g.  performance?)?
> > > > Not only the performance, sometimes like the front-end device is an
> > > > volatile ram, we don't wanna set such option although the backend is a
> > > > novolatile memory, if power lose, all of thing should lose in this ram.
> > > 
> > > 
> > > 
> > > I am not sure how does above answer the questions. If I don't know,
> > > neither will the hypothetical administrator. Looks like a better
> > > interface is needed to make the choice on behalf of the user.
> > > 
> > By default, we have already ignore the 2 flags, unless the administrator
> > know how to make that use. By-now, seems we don't have a better way to detect what
> > memory-backend-file is, a persistent memory or not.
> 
> In that case how about an interface where user tells QEMU "this backend
> is in persistent memory"?

Actually, [pmem=on] already did this, we can get the backend memory type from:

file_memory_backend_get_pmem(),

That is already being used in the memory_region_init_ram_from_file. that
is why I reuse the RAM_PMEM to identify the region coming from a
persitent memory? correct me if something I misunderstood? 


> 
> 
> 
> > > 
> > > > > 
> > > > > -- 
> > > > > MST
> > > > > 
> > > 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH V7 6/6] hostmem-file: add 'sync' option
  2018-12-21  3:18                 ` Yi Zhang
@ 2018-12-21 16:36                   ` Michael S. Tsirkin
  2018-12-24  8:11                     ` Yi Zhang
  0 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2018-12-21 16:36 UTC (permalink / raw)
  To: peter.maydell, yu.c.zhang, ehabkost, imammedo, qemu-devel,
	pbonzini, stefanha, crosthwaite.peter, richard.henderson,
	xiaoguangrong.eric

On Fri, Dec 21, 2018 at 11:18:18AM +0800, Yi Zhang wrote:
> On 2018-12-20 at 09:06:41 -0500, Michael S. Tsirkin wrote:
> > On Thu, Dec 20, 2018 at 01:37:40PM +0800, Yi Zhang wrote:
> > > On 2018-12-19 at 22:42:07 -0500, Michael S. Tsirkin wrote:
> > > > On Thu, Dec 20, 2018 at 11:03:12AM +0800, Yi Zhang wrote:
> > > > > On 2018-12-19 at 10:59:10 -0500, Michael S. Tsirkin wrote:
> > > > > > On Wed, Dec 19, 2018 at 05:10:18PM +0800, Yi Zhang wrote:
> > > > > > > > > +
> > > > > > > > > + - 'sync' option of memory-backend-file is not 'off', and
> > > > > > > > > +
> > > > > > > > > + - 'share' option of memory-backend-file is 'on'.
> > > > > > > > > +
> > > > > > > > > + - 'pmem' option of memory-backend-file is 'on'
> > > > > > > > > +
> > > > > > > > 
> > > > > > > > Wait isn't this what pmem was supposed to do?
> > > > > > > > Doesn't it mean "persistent memory"?
> > > > > > > pmem is a option for memory-backend-file, user should know the backend
> > > > > > > is in host persistent memory, with this flags on, while there is a host crash
> > > > > > > or a power failures.
> > > > > > > 
> > > > > > > [1] Qemu will take necessary operations to guarantee the persistence.
> > > > > > > https://patchwork.ozlabs.org/cover/944749/ 
> > > > > > > 
> > > > > > > [2] Host kernel also take opretions to consistent filesystem metadata.
> > > > > > > Add MAP_SYNC flags.
> > > > > > 
> > > > > > OK so I'm a user. Can you educate me please?  
> > > > > We suppose an administrator should know it, what is the back-end region coming from,
> > > > > is it persistent? what is the font-end device is? a volatile dimm or an
> > > > > nonvolatile dimm? then they can choice put the pmem=on[off] and sync=on[off].
> > > > > If he didn't, we encourage that don't set these 2 flags.
> > > > > 
> > > > > > When should MAP_SYNC not
> > > > > > be set? Are there any disadvantages (e.g.  performance?)?
> > > > > Not only the performance, sometimes like the front-end device is an
> > > > > volatile ram, we don't wanna set such option although the backend is a
> > > > > novolatile memory, if power lose, all of thing should lose in this ram.
> > > > 
> > > > 
> > > > 
> > > > I am not sure how does above answer the questions. If I don't know,
> > > > neither will the hypothetical administrator. Looks like a better
> > > > interface is needed to make the choice on behalf of the user.
> > > > 
> > > By default, we have already ignore the 2 flags, unless the administrator
> > > know how to make that use. By-now, seems we don't have a better way to detect what
> > > memory-backend-file is, a persistent memory or not.
> > 
> > In that case how about an interface where user tells QEMU "this backend
> > is in persistent memory"?
> 
> Actually, [pmem=on] already did this, we can get the backend memory type from:
> 
> file_memory_backend_get_pmem(),
> 
> That is already being used in the memory_region_init_ram_from_file. that
> is why I reuse the RAM_PMEM to identify the region coming from a
> persitent memory? correct me if something I misunderstood? 

Right and thus my question: why not set SYNC unconditionally with pmem=on?

> 
> > 
> > 
> > 
> > > > 
> > > > > > 
> > > > > > -- 
> > > > > > MST
> > > > > > 
> > > > 
> > 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH V7 6/6] hostmem-file: add 'sync' option
  2018-12-21 16:36                   ` Michael S. Tsirkin
@ 2018-12-24  8:11                     ` Yi Zhang
  2019-09-16 15:14                       ` Michael S. Tsirkin
  0 siblings, 1 reply; 23+ messages in thread
From: Yi Zhang @ 2018-12-24  8:11 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: peter.maydell, yu.c.zhang, ehabkost, imammedo, qemu-devel,
	pbonzini, stefanha, crosthwaite.peter, richard.henderson,
	xiaoguangrong.eric

On 2018-12-21 at 11:36:07 -0500, Michael S. Tsirkin wrote:
> On Fri, Dec 21, 2018 at 11:18:18AM +0800, Yi Zhang wrote:
> > On 2018-12-20 at 09:06:41 -0500, Michael S. Tsirkin wrote:
> > > On Thu, Dec 20, 2018 at 01:37:40PM +0800, Yi Zhang wrote:
> > > > On 2018-12-19 at 22:42:07 -0500, Michael S. Tsirkin wrote:
> > > > > On Thu, Dec 20, 2018 at 11:03:12AM +0800, Yi Zhang wrote:
> > > > > > On 2018-12-19 at 10:59:10 -0500, Michael S. Tsirkin wrote:
> > > > > > > On Wed, Dec 19, 2018 at 05:10:18PM +0800, Yi Zhang wrote:
> > > > > > > > > > +
> > > > > > > > > > + - 'sync' option of memory-backend-file is not 'off', and
> > > > > > > > > > +
> > > > > > > > > > + - 'share' option of memory-backend-file is 'on'.
> > > > > > > > > > +
> > > > > > > > > > + - 'pmem' option of memory-backend-file is 'on'
> > > > > > > > > > +
> > > > > > > > > 
> > > > > > > > > Wait isn't this what pmem was supposed to do?
> > > > > > > > > Doesn't it mean "persistent memory"?
> > > > > > > > pmem is a option for memory-backend-file, user should know the backend
> > > > > > > > is in host persistent memory, with this flags on, while there is a host crash
> > > > > > > > or a power failures.
> > > > > > > > 
> > > > > > > > [1] Qemu will take necessary operations to guarantee the persistence.
> > > > > > > > https://patchwork.ozlabs.org/cover/944749/ 
> > > > > > > > 
> > > > > > > > [2] Host kernel also take opretions to consistent filesystem metadata.
> > > > > > > > Add MAP_SYNC flags.
> > > > > > > 
> > > > > > > OK so I'm a user. Can you educate me please?  
> > > > > > We suppose an administrator should know it, what is the back-end region coming from,
> > > > > > is it persistent? what is the font-end device is? a volatile dimm or an
> > > > > > nonvolatile dimm? then they can choice put the pmem=on[off] and sync=on[off].
> > > > > > If he didn't, we encourage that don't set these 2 flags.
> > > > > > 
> > > > > > > When should MAP_SYNC not
> > > > > > > be set? Are there any disadvantages (e.g.  performance?)?
> > > > > > Not only the performance, sometimes like the front-end device is an
> > > > > > volatile ram, we don't wanna set such option although the backend is a
> > > > > > novolatile memory, if power lose, all of thing should lose in this ram.
> > > > > 
> > > > > 
> > > > > 
> > > > > I am not sure how does above answer the questions. If I don't know,
> > > > > neither will the hypothetical administrator. Looks like a better
> > > > > interface is needed to make the choice on behalf of the user.
> > > > > 
> > > > By default, we have already ignore the 2 flags, unless the administrator
> > > > know how to make that use. By-now, seems we don't have a better way to detect what
> > > > memory-backend-file is, a persistent memory or not.
> > > 
> > > In that case how about an interface where user tells QEMU "this backend
> > > is in persistent memory"?
> > 
> > Actually, [pmem=on] already did this, we can get the backend memory type from:
> > 
> > file_memory_backend_get_pmem(),
> > 
> > That is already being used in the memory_region_init_ram_from_file. that
> > is why I reuse the RAM_PMEM to identify the region coming from a
> > persitent memory? correct me if something I misunderstood? 
> 
> Right and thus my question: why not set SYNC unconditionally with pmem=on?
A case is that we set pmem but don't wanna sync in a /dev/dax backend,
which is direct mapping to physical nvdimm device without filesystem aware,
SYNC is useless in this condition.
> 
> > 
> > > 
> > > 
> > > 
> > > > > 
> > > > > > > 
> > > > > > > -- 
> > > > > > > MST
> > > > > > > 
> > > > > 
> > > 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH V7 6/6] hostmem-file: add 'sync' option
  2018-12-24  8:11                     ` Yi Zhang
@ 2019-09-16 15:14                       ` Michael S. Tsirkin
  0 siblings, 0 replies; 23+ messages in thread
From: Michael S. Tsirkin @ 2019-09-16 15:14 UTC (permalink / raw)
  To: peter.maydell, yu.c.zhang, ehabkost, imammedo, qemu-devel,
	pbonzini, stefanha, crosthwaite.peter, richard.henderson,
	xiaoguangrong.eric

On Mon, Dec 24, 2018 at 04:11:43PM +0800, Yi Zhang wrote:
> On 2018-12-21 at 11:36:07 -0500, Michael S. Tsirkin wrote:
> > On Fri, Dec 21, 2018 at 11:18:18AM +0800, Yi Zhang wrote:
> > > On 2018-12-20 at 09:06:41 -0500, Michael S. Tsirkin wrote:
> > > > On Thu, Dec 20, 2018 at 01:37:40PM +0800, Yi Zhang wrote:
> > > > > On 2018-12-19 at 22:42:07 -0500, Michael S. Tsirkin wrote:
> > > > > > On Thu, Dec 20, 2018 at 11:03:12AM +0800, Yi Zhang wrote:
> > > > > > > On 2018-12-19 at 10:59:10 -0500, Michael S. Tsirkin wrote:
> > > > > > > > On Wed, Dec 19, 2018 at 05:10:18PM +0800, Yi Zhang wrote:
> > > > > > > > > > > +
> > > > > > > > > > > + - 'sync' option of memory-backend-file is not 'off', and
> > > > > > > > > > > +
> > > > > > > > > > > + - 'share' option of memory-backend-file is 'on'.
> > > > > > > > > > > +
> > > > > > > > > > > + - 'pmem' option of memory-backend-file is 'on'
> > > > > > > > > > > +
> > > > > > > > > > 
> > > > > > > > > > Wait isn't this what pmem was supposed to do?
> > > > > > > > > > Doesn't it mean "persistent memory"?
> > > > > > > > > pmem is a option for memory-backend-file, user should know the backend
> > > > > > > > > is in host persistent memory, with this flags on, while there is a host crash
> > > > > > > > > or a power failures.
> > > > > > > > > 
> > > > > > > > > [1] Qemu will take necessary operations to guarantee the persistence.
> > > > > > > > > https://patchwork.ozlabs.org/cover/944749/ 
> > > > > > > > > 
> > > > > > > > > [2] Host kernel also take opretions to consistent filesystem metadata.
> > > > > > > > > Add MAP_SYNC flags.
> > > > > > > > 
> > > > > > > > OK so I'm a user. Can you educate me please?  
> > > > > > > We suppose an administrator should know it, what is the back-end region coming from,
> > > > > > > is it persistent? what is the font-end device is? a volatile dimm or an
> > > > > > > nonvolatile dimm? then they can choice put the pmem=on[off] and sync=on[off].
> > > > > > > If he didn't, we encourage that don't set these 2 flags.
> > > > > > > 
> > > > > > > > When should MAP_SYNC not
> > > > > > > > be set? Are there any disadvantages (e.g.  performance?)?
> > > > > > > Not only the performance, sometimes like the front-end device is an
> > > > > > > volatile ram, we don't wanna set such option although the backend is a
> > > > > > > novolatile memory, if power lose, all of thing should lose in this ram.
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > I am not sure how does above answer the questions. If I don't know,
> > > > > > neither will the hypothetical administrator. Looks like a better
> > > > > > interface is needed to make the choice on behalf of the user.
> > > > > > 
> > > > > By default, we have already ignore the 2 flags, unless the administrator
> > > > > know how to make that use. By-now, seems we don't have a better way to detect what
> > > > > memory-backend-file is, a persistent memory or not.
> > > > 
> > > > In that case how about an interface where user tells QEMU "this backend
> > > > is in persistent memory"?
> > > 
> > > Actually, [pmem=on] already did this, we can get the backend memory type from:
> > > 
> > > file_memory_backend_get_pmem(),
> > > 
> > > That is already being used in the memory_region_init_ram_from_file. that
> > > is why I reuse the RAM_PMEM to identify the region coming from a
> > > persitent memory? correct me if something I misunderstood? 
> > 
> > Right and thus my question: why not set SYNC unconditionally with pmem=on?
> A case is that we set pmem but don't wanna sync in a /dev/dax backend,
> which is direct mapping to physical nvdimm device without filesystem aware,
> SYNC is useless in this condition.

Does it hurt though?

> > 
> > > 
> > > > 
> > > > 
> > > > 
> > > > > > 
> > > > > > > > 
> > > > > > > > -- 
> > > > > > > > MST
> > > > > > > > 
> > > > > > 
> > > > 
> > 


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2019-09-16 15:44 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-18  8:16 [Qemu-devel] [PATCH V7 0/6] nvdimm: support MAP_SYNC for memory-backend-file Zhang Yi
2018-12-18  8:16 ` [Qemu-devel] [PATCH V7 1/6] numa: Fixed the memory leak of numa error message Zhang Yi
2018-12-18  8:17 ` [Qemu-devel] [PATCH V7 2/6] util/mmap-alloc: switch qemu_ram_mmap() to 'flags' parameter Zhang Yi
2018-12-18 13:35   ` Michael S. Tsirkin
2018-12-18  8:17 ` [Qemu-devel] [PATCH V7 3/6] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap() Zhang Yi
2018-12-18 13:52   ` Michael S. Tsirkin
2018-12-19  9:25     ` Yi Zhang
2018-12-19 16:42       ` Michael S. Tsirkin
2018-12-18  8:17 ` [Qemu-devel] [PATCH V7 4/6] util/mmap-alloc: Switch the RAM_SYNC flags to OnOffAuto Zhang Yi
2018-12-18  8:17 ` [Qemu-devel] [PATCH V7 5/6] hostmem: add more information in error messages Zhang Yi
2018-12-18  8:17 ` [Qemu-devel] [PATCH V7 6/6] hostmem-file: add 'sync' option Zhang Yi
2018-12-18 14:18   ` Michael S. Tsirkin
2018-12-19  9:10     ` Yi Zhang
2018-12-19 15:59       ` Michael S. Tsirkin
2018-12-20  3:03         ` Yi Zhang
2018-12-20  3:42           ` Michael S. Tsirkin
2018-12-20  5:37             ` Yi Zhang
2018-12-20 14:06               ` Michael S. Tsirkin
2018-12-21  3:18                 ` Yi Zhang
2018-12-21 16:36                   ` Michael S. Tsirkin
2018-12-24  8:11                     ` Yi Zhang
2019-09-16 15:14                       ` Michael S. Tsirkin
2018-12-18  9:18 ` [Qemu-devel] [PATCH V7 0/6] nvdimm: support MAP_SYNC for memory-backend-file Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.