All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH V10 0/4] support MAP_SYNC for memory-backend-file
@ 2019-01-23  2:59 Zhang, Yi
  2019-01-23  2:59 ` [Qemu-devel] [PATCH V10 1/4] util/mmap-alloc: switch 'shared' to 'flags' parameter Zhang, Yi
                   ` (4 more replies)
  0 siblings, 5 replies; 27+ messages in thread
From: Zhang, Yi @ 2019-01-23  2:59 UTC (permalink / raw)
  To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, mst, ehabkost
  Cc: qemu-devel, imammedo, dan.j.williams, Zhang,Yi

From: "Zhang,Yi" <yi.z.zhang@linux.intel.com>

Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
files on ext4/xfs file system mounted with '-o dax').

A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
    https://patchwork.kernel.org/patch/10028151/

In order to make sure that the file metadata is in sync after a fault 
while we are writing a shared DAX supporting backend files, this
patch-set enables QEMU to use MAP_SYNC flag for memory-backend-dax-file.

As the DAX vs DMA truncated issue was solved, we refined the code and
send out this feature for the v5 version.

We will pass MAP_SYNC to mmap(2); if MAP_SYNC is supported and
'share=on' & 'pmem=on'. 
Or QEMU will not pass this flag to mmap(2)

Changes in V10:
 * 4/4: refine the document.
 * 3/4: Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
 * 2/4: refine the commit message, Added MAP_SHARED_VALIDATE.
 * 2/4: Fix the wrong include header

Changes in V9:
 * 1/6: Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
 * 2/6: New Added: Micheal: use sparse feature define RAM_FLAG. 
 since I don't have much knowledge about the sparse feature, @Micheal Could you 
 add some documentation/commit message on this patch? Thank you very much.
 * 3/6: from 2/5: Eduardo: updated the commit message. 
 * 4/6: from 3/5: Micheal: don't ignore MAP_SYNC failures silently.
 * 5/6: from 4/5: Eduardo: updated the commit message.
 * 6/6: from 5/5: Micheal: Drop the sync option, document the MAP_SYNC.

Changes in v8:
 * Micheal: 3/5, remove the duplicated define in the os_dep.h
 * Micheal: 2/5, make type define safety.
 * Micheal: 2/5, fixed the incorrect define MAP_SHARE on qemu_anon_ram_alloc.
 * 4/6 removed, we remove the on/off/auto define of sync,  as by now,
   MAP_SYNC only worked with pmem=on.
 * @Micheal, I still reuse the RAM_SYNC flag, it is much straightforward to parse 
   all the flags in one parameter.

Changes in v7:
 * Micheal: [3,4,6]/6 limited the "sync" flag only on a nvdimm backend.(pmem=on)

Changes in v6:
 * Pankaj: 3/7 are squashed with 2/7
 * Pankaj: 7/7 update comments to "consistent filesystem metadata".
 * Pankaj, Igor: 1/7 Added Reviewed-by in patch-1/7
 * Stefan, 4/7 move the include header from "/linux/mman.h" to "osdep.h"
 * Stefan, 5/7 Add missing "munmap"
 * Stefan, 2/7 refine the shared/flag.

Changes in v5:
 * Add patch 1 to fix a memory leak issue.
 * Refine the patch 4-6
 * Remove the patch 3 as we already change the parameter from "shared" to
   "flags"

Changes in v4:
 * Add patch 1-3 to switch some functions to a single 'flags'
   parameters. (Michael S. Tsirkin)
 * v3 patch 1-3 become v4 patch 4-6.
 * Patch 4: move definitions of MAP_SYNC and MAP_SHARED_VALIDATE to a
   new header file under include/standard-headers/linux/. (Michael S. Tsirkin)
 * Patch 6: refine the description of the 'sync' option. (Michael S. Tsirkin)

Changes in v3:
 * Patch 1: add MAP_SHARED_VALIDATE in both sync=on and sync=auto
   cases, and add back the retry mechanism. MAP_SYNC will be ignored
   by Linux kernel 4.15 if MAP_SHARED_VALIDATE is missed.
 * Patch 1: define MAP_SYNC and MAP_SHARED_VALIDATE as 0 on non-Linux
   platforms in order to make qemu_ram_mmap() compile on those platforms.
 * Patch 2&3: include more information in error messages of
   memory-backend in hope to help user to identify the error.
   (Dr. David Alan Gilbert)
 * Patch 3: fix typo in the commit message. (Dr. David Alan Gilbert)

Changes in v2:
 * Add 'sync' option to control the use of MAP_SYNC. (Eduardo Habkost)
 * Remove the unnecessary set of MAP_SHARED_VALIDATE in some cases and
   the retry mechanism in qemu_ram_mmap(). (Michael S. Tsirkin)
 * Move OS dependent definitions of MAP_SYNC and MAP_SHARED_VALIDATE
   to osdep.h. (Michael S. Tsirkin)

Zhang Yi (4):
  util/mmap-alloc: switch 'shared' to 'flags' parameter
  util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  hostmem: add more information in error messages
  docs: Added MAP_SYNC documentation

 backends/hostmem-file.c   |  6 ++++--
 backends/hostmem.c        |  8 +++++---
 docs/nvdimm.txt           | 29 ++++++++++++++++++++++++++++-
 exec.c                    |  7 ++++---
 include/qemu/mmap-alloc.h | 20 +++++++++++++++++++-
 include/qemu/osdep.h      | 21 +++++++++++++++++++++
 qemu-options.hx           |  4 ++++
 util/mmap-alloc.c         | 15 +++++++++++----
 util/oslib-posix.c        |  9 ++++++++-
 9 files changed, 104 insertions(+), 15 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH V10 1/4] util/mmap-alloc: switch 'shared' to 'flags' parameter
  2019-01-23  2:59 [Qemu-devel] [PATCH V10 0/4] support MAP_SYNC for memory-backend-file Zhang, Yi
@ 2019-01-23  2:59 ` Zhang, Yi
  2019-01-23 15:01   ` Michael S. Tsirkin
  2019-01-23  2:59 ` [Qemu-devel] [PATCH V10 2/4] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap() Zhang, Yi
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 27+ messages in thread
From: Zhang, Yi @ 2019-01-23  2:59 UTC (permalink / raw)
  To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, mst, ehabkost
  Cc: qemu-devel, imammedo, dan.j.williams, Zhang Yi

From: Zhang Yi <yi.z.zhang@linux.intel.com>

As more flag parameters besides the existing 'shared' are going to be
added to qemu_ram_mmap() and qemu_ram_alloc_from_{file,fd}(), let's
switch 'shared' to a 'flags' parameter in advance, so as to ease the
further additions.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
---
 exec.c                    |  7 ++++---
 include/qemu/mmap-alloc.h | 19 ++++++++++++++++++-
 util/mmap-alloc.c         |  8 +++++---
 util/oslib-posix.c        |  9 ++++++++-
 4 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/exec.c b/exec.c
index bb6170d..e92a7da 100644
--- a/exec.c
+++ b/exec.c
@@ -1810,6 +1810,7 @@ static void *file_ram_alloc(RAMBlock *block,
                             ram_addr_t memory,
                             int fd,
                             bool truncate,
+                            uint32_t flags,
                             Error **errp)
 {
     void *area;
@@ -1859,8 +1860,7 @@ static void *file_ram_alloc(RAMBlock *block,
         perror("ftruncate");
     }
 
-    area = qemu_ram_mmap(fd, memory, block->mr->align,
-                         block->flags & RAM_SHARED);
+    area = qemu_ram_mmap(fd, memory, block->mr->align, flags);
     if (area == MAP_FAILED) {
         error_setg_errno(errp, errno,
                          "unable to map backing store for guest RAM");
@@ -2279,7 +2279,8 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
     new_block->used_length = size;
     new_block->max_length = size;
     new_block->flags = ram_flags;
-    new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp);
+    new_block->host = file_ram_alloc(new_block, size, fd, !file_size,
+            ram_flags, errp);
     if (!new_block->host) {
         g_free(new_block);
         return NULL;
diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index 50385e3..6fe6ed4 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -7,7 +7,24 @@ size_t qemu_fd_getpagesize(int fd);
 
 size_t qemu_mempath_getpagesize(const char *mem_path);
 
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared);
+/**
+ * qemu_ram_mmap: mmap the specified file or device.
+ *
+ * Parameters:
+ *  @fd: the file or the device to mmap
+ *  @size: the number of bytes to be mmaped
+ *  @align: if not zero, specify the alignment of the starting mapping address;
+ *          otherwise, the alignment in use will be determined by QEMU.
+ *  @flags: specifies additional properties of the mapping, which can be one or
+ *          bit-or of following values
+ *          - RAM_SHARED: mmap with MAP_SHARED flag
+ *          Other bits are ignored.
+ *
+ * Return:
+ *  On success, return a pointer to the mapped area.
+ *  On failure, return MAP_FAILED.
+ */
+void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags);
 
 void qemu_ram_munmap(void *ptr, size_t size);
 
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index fd329ec..8f0a740 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -13,6 +13,7 @@
 #include "qemu/osdep.h"
 #include "qemu/mmap-alloc.h"
 #include "qemu/host-utils.h"
+#include "exec/memory.h"
 
 #define HUGETLBFS_MAGIC       0x958458f6
 
@@ -75,7 +76,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
     return getpagesize();
 }
 
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
+void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
 {
     /*
      * Note: this always allocates at least one extra page of virtual address
@@ -92,11 +93,12 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
      * anonymous memory is OK.
      */
     int anonfd = fd == -1 || qemu_fd_getpagesize(fd) == getpagesize() ? -1 : fd;
-    int flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
-    void *ptr = mmap(0, total, PROT_NONE, flags | MAP_PRIVATE, anonfd, 0);
+    int mmap_flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
+    void *ptr = mmap(0, total, PROT_NONE, mmap_flags | MAP_PRIVATE, anonfd, 0);
 #else
     void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
 #endif
+    bool shared = flags & RAM_SHARED;
     size_t offset;
     void *ptr1;
 
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index fbd0dc8..75a0171 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -54,6 +54,7 @@
 #endif
 
 #include "qemu/mmap-alloc.h"
+#include "exec/memory.h"
 
 #ifdef CONFIG_DEBUG_STACK_USAGE
 #include "qemu/error-report.h"
@@ -203,7 +204,13 @@ void *qemu_memalign(size_t alignment, size_t size)
 void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
 {
     size_t align = QEMU_VMALLOC_ALIGN;
-    void *ptr = qemu_ram_mmap(-1, size, align, shared);
+    uint32_t flags = 0;
+    void *ptr;
+
+    if (shared) {
+        flags = RAM_SHARED;
+    }
+    ptr = qemu_ram_mmap(-1, size, align, flags);
 
     if (ptr == MAP_FAILED) {
         return NULL;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH V10 2/4] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  2019-01-23  2:59 [Qemu-devel] [PATCH V10 0/4] support MAP_SYNC for memory-backend-file Zhang, Yi
  2019-01-23  2:59 ` [Qemu-devel] [PATCH V10 1/4] util/mmap-alloc: switch 'shared' to 'flags' parameter Zhang, Yi
@ 2019-01-23  2:59 ` Zhang, Yi
  2019-01-23 15:04   ` Michael S. Tsirkin
  2019-01-23  2:59 ` [Qemu-devel] [PATCH V10 3/4] hostmem: add more information in error messages Zhang, Yi
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 27+ messages in thread
From: Zhang, Yi @ 2019-01-23  2:59 UTC (permalink / raw)
  To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, mst, ehabkost
  Cc: qemu-devel, imammedo, dan.j.williams, Zhang Yi

From: Zhang Yi <yi.z.zhang@linux.intel.com>

When a file supporting DAX is used as vNVDIMM backend, mmap it with
MAP_SYNC flag in addition which can ensure file system metadata
synced in each guest writes to the backend file, without other QEMU
actions (e.g., periodic fsync() by QEMU).

Current, We have below different possible use cases:

1. pmem=on is set, shared=on is set, MAP_SYNC supported:
   a: backend is a dax supporting file.
    - MAP_SYNC will active.
   b: backend is not a dax supporting file.
    - mmap will result in an EOPNOTSUPP error.

2. The rest of cases:
   - we will never pass the MAP_SYNC to mmap2

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
---
 include/qemu/mmap-alloc.h |  1 +
 include/qemu/osdep.h      | 21 +++++++++++++++++++++
 util/mmap-alloc.c         |  7 ++++++-
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index 6fe6ed4..a95d91c 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -18,6 +18,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path);
  *  @flags: specifies additional properties of the mapping, which can be one or
  *          bit-or of following values
  *          - RAM_SHARED: mmap with MAP_SHARED flag
+ *          - RAM_PMEM: mmap with MAP_SYNC flag
  *          Other bits are ignored.
  *
  * Return:
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 457d24e..3bcf155 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -419,6 +419,27 @@ void qemu_anon_ram_free(void *ptr, size_t size);
 #  define QEMU_VMALLOC_ALIGN getpagesize()
 #endif
 
+/*
+ * MAP_SHARED_VALIDATE and MAP_SYNC are introduced in Linux kernel
+ * 4.15, so they may not be defined when compiling on older kernels.
+ */
+#ifdef CONFIG_LINUX
+
+#include <linux/mman.h>
+
+#ifndef MAP_SYNC
+#define MAP_SYNC 0x0
+#endif
+
+#ifndef MAP_SHARED_VALIDATE
+#define MAP_SHARED_VALIDATE 0x0
+#endif
+
+#else  /* !CONFIG_LINUX */
+#define MAP_SYNC              0x0
+#define MAP_SHARED_VALIDATE   0x0
+#endif /* CONFIG_LINUX */
+
 #ifdef CONFIG_POSIX
 struct qemu_signalfd_siginfo {
     uint32_t ssi_signo;   /* Signal number */
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 8f0a740..a4ce9b5 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -99,6 +99,8 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
     void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
 #endif
     bool shared = flags & RAM_SHARED;
+    bool is_pmem = flags & RAM_PMEM;
+    int mmap_xflags = 0;
     size_t offset;
     void *ptr1;
 
@@ -109,12 +111,15 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
     assert(is_power_of_2(align));
     /* Always align to host page size */
     assert(align >= getpagesize());
+    if (shared && is_pmem) {
+        mmap_xflags |= (MAP_SYNC | MAP_SHARED_VALIDATE);
+    }
 
     offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
     ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
                 MAP_FIXED |
                 (fd == -1 ? MAP_ANONYMOUS : 0) |
-                (shared ? MAP_SHARED : MAP_PRIVATE),
+                (shared ? MAP_SHARED : MAP_PRIVATE) | mmap_xflags,
                 fd, 0);
     if (ptr1 == MAP_FAILED) {
         munmap(ptr, total);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH V10 3/4] hostmem: add more information in error messages
  2019-01-23  2:59 [Qemu-devel] [PATCH V10 0/4] support MAP_SYNC for memory-backend-file Zhang, Yi
  2019-01-23  2:59 ` [Qemu-devel] [PATCH V10 1/4] util/mmap-alloc: switch 'shared' to 'flags' parameter Zhang, Yi
  2019-01-23  2:59 ` [Qemu-devel] [PATCH V10 2/4] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap() Zhang, Yi
@ 2019-01-23  2:59 ` Zhang, Yi
  2019-01-23  3:00 ` [Qemu-devel] [PATCH V10 4/4] docs: Added MAP_SYNC documentation Zhang, Yi
  2019-01-23  3:01 ` [Qemu-devel] [PATCH V10 0/4] support MAP_SYNC for memory-backend-file Michael S. Tsirkin
  4 siblings, 0 replies; 27+ messages in thread
From: Zhang, Yi @ 2019-01-23  2:59 UTC (permalink / raw)
  To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, mst, ehabkost
  Cc: qemu-devel, imammedo, dan.j.williams, Zhang Yi

From: Zhang Yi <yi.z.zhang@linux.intel.com>

When there are multiple memory backends in use, including the object type
name and the property name in the error message can help users to locate
the error.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
---
 backends/hostmem-file.c | 6 ++++--
 backends/hostmem.c      | 8 +++++---
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index e640749..0dd7a90 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -82,7 +82,8 @@ static void set_mem_path(Object *o, const char *str, Error **errp)
     HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
 
     if (host_memory_backend_mr_inited(backend)) {
-        error_setg(errp, "cannot change property value");
+        error_setg(errp, "cannot change property 'mem-path' of %s",
+                   object_get_typename(o));
         return;
     }
     g_free(fb->mem_path);
@@ -120,7 +121,8 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
     uint64_t val;
 
     if (host_memory_backend_mr_inited(backend)) {
-        error_setg(&local_err, "cannot change property value");
+        error_setg(&local_err, "cannot change property '%s' of %s",
+                   name, object_get_typename(o));
         goto out;
     }
 
diff --git a/backends/hostmem.c b/backends/hostmem.c
index 1a89342..e2bcf9f 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -47,7 +47,8 @@ host_memory_backend_set_size(Object *obj, Visitor *v, const char *name,
     uint64_t value;
 
     if (host_memory_backend_mr_inited(backend)) {
-        error_setg(&local_err, "cannot change property value");
+        error_setg(&local_err, "cannot change property %s of %s ",
+                   name, object_get_typename(obj));
         goto out;
     }
 
@@ -56,8 +57,9 @@ host_memory_backend_set_size(Object *obj, Visitor *v, const char *name,
         goto out;
     }
     if (!value) {
-        error_setg(&local_err, "Property '%s.%s' doesn't take value '%"
-                   PRIu64 "'", object_get_typename(obj), name, value);
+        error_setg(&local_err,
+                   "property '%s' of %s doesn't take value '%" PRIu64 "'",
+                   name, object_get_typename(obj), value);
         goto out;
     }
     backend->size = value;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel]  [PATCH V10 4/4] docs: Added MAP_SYNC documentation
  2019-01-23  2:59 [Qemu-devel] [PATCH V10 0/4] support MAP_SYNC for memory-backend-file Zhang, Yi
                   ` (2 preceding siblings ...)
  2019-01-23  2:59 ` [Qemu-devel] [PATCH V10 3/4] hostmem: add more information in error messages Zhang, Yi
@ 2019-01-23  3:00 ` Zhang, Yi
  2019-01-23 14:50   ` Eduardo Habkost
  2019-01-23  3:01 ` [Qemu-devel] [PATCH V10 0/4] support MAP_SYNC for memory-backend-file Michael S. Tsirkin
  4 siblings, 1 reply; 27+ messages in thread
From: Zhang, Yi @ 2019-01-23  3:00 UTC (permalink / raw)
  To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, mst, ehabkost
  Cc: qemu-devel, imammedo, dan.j.williams, Zhang Yi

From: Zhang Yi <yi.z.zhang@linux.intel.com>

Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
---
 docs/nvdimm.txt | 29 ++++++++++++++++++++++++++++-
 qemu-options.hx |  4 ++++
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index 5f158a6..166c395 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -142,11 +142,38 @@ backend of vNVDIMM:
 Guest Data Persistence
 ----------------------
 
+vNVDIMM is designed and implemented to guarantee the guest data
+persistence on the backends in case of host crash or a power failures.
+However, there are still some requirements and limitations
+as explained below.
+
 Though QEMU supports multiple types of vNVDIMM backends on Linux,
-currently the only one that can guarantee the guest write persistence
+if MAP_SYNC is not supported by the host kernel and the backends,
+the only backend that can guarantee the guest write persistence
 is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
 which all guest access do not involve any host-side kernel cache.
 
+mmap(2) flag MAP_SYNC is added since Linux kernel 4.15. On such
+systems, QEMU can mmap(2) the dax backend files with MAP_SYNC, which
+ensures filesystem metadata consistency in case of a host crash or a power
+failure. Enabling MAP_SYNC in QEMU requires below conditions
+
+ - 'pmem' option of memory-backend-file is 'on':
+   The backend is a file supporting DAX, e.g., a file on an ext4 or
+   xfs file system mounted with '-o dax'. if your pmem=on ,but the backend is
+   not a file supporting DAX, mapping with this flag results in an EOPNOTSUPP
+   error.
+
+ - 'share' option of memory-backend-file is 'on':
+   MAP_SYNC flag available only with the MAP_SHARED_VALIDATE mapping type.
+
+ - 'MAP_SYNC' is supported on linux kernel.(default opened since Linux 4.15)
+
+Otherwise, We will ignore the MAP_SYNC flag.
+
+For more details, please reference mmap(2) man page:
+http://man7.org/linux/man-pages/man2/mmap.2.html.
+
 When using other types of backends, it's suggested to set 'unarmed'
 option of '-device nvdimm' to 'on', which sets the unarmed flag of the
 guest NVDIMM region mapping structure.  This unarmed flag indicates
diff --git a/qemu-options.hx b/qemu-options.hx
index 08f8516..0cd41f4 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4002,6 +4002,10 @@ using the SNIA NVM programming model (e.g. Intel NVDIMM).
 If @option{pmem} is set to 'on', QEMU will take necessary operations to
 guarantee the persistence of its own writes to @option{mem-path}
 (e.g. in vNVDIMM label emulation and live migration).
+Also, we will map the backend-file with MAP_SYNC flag, which can ensure
+the file metadata is in sync to @option{mem-path} in case of host crash
+or a power failure. MAP_SYNC requires support from both the host kernel
+(since Linux kernel 4.15) and @option{mem-path} (only files supporting DAX).
 
 @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V10 0/4] support MAP_SYNC for memory-backend-file
  2019-01-23  2:59 [Qemu-devel] [PATCH V10 0/4] support MAP_SYNC for memory-backend-file Zhang, Yi
                   ` (3 preceding siblings ...)
  2019-01-23  3:00 ` [Qemu-devel] [PATCH V10 4/4] docs: Added MAP_SYNC documentation Zhang, Yi
@ 2019-01-23  3:01 ` Michael S. Tsirkin
  2019-01-23  3:10   ` Yi Zhang
  4 siblings, 1 reply; 27+ messages in thread
From: Michael S. Tsirkin @ 2019-01-23  3:01 UTC (permalink / raw)
  To: Zhang, Yi
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, ehabkost, qemu-devel, imammedo, dan.j.williams

On Wed, Jan 23, 2019 at 10:59:27AM +0800, Zhang, Yi wrote:
> From: "Zhang,Yi" <yi.z.zhang@linux.intel.com>
> 
> Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
> guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
> files on ext4/xfs file system mounted with '-o dax').
> 
> A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
>     https://patchwork.kernel.org/patch/10028151/
> 
> In order to make sure that the file metadata is in sync after a fault 
> while we are writing a shared DAX supporting backend files, this
> patch-set enables QEMU to use MAP_SYNC flag for memory-backend-dax-file.
> 
> As the DAX vs DMA truncated issue was solved, we refined the code and
> send out this feature for the v5 version.
> 
> We will pass MAP_SYNC to mmap(2); if MAP_SYNC is supported and
> 'share=on' & 'pmem=on'. 
> Or QEMU will not pass this flag to mmap(2)

How was this patch tested? Given the previous version apparently
passed the combination of flags ignored by Linux,
enquiring minds want to know.

> Changes in V10:
>  * 4/4: refine the document.
>  * 3/4: Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
>  * 2/4: refine the commit message, Added MAP_SHARED_VALIDATE.
>  * 2/4: Fix the wrong include header
> 
> Changes in V9:
>  * 1/6: Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
>  * 2/6: New Added: Micheal: use sparse feature define RAM_FLAG. 
>  since I don't have much knowledge about the sparse feature, @Micheal Could you 
>  add some documentation/commit message on this patch? Thank you very much.
>  * 3/6: from 2/5: Eduardo: updated the commit message. 
>  * 4/6: from 3/5: Micheal: don't ignore MAP_SYNC failures silently.
>  * 5/6: from 4/5: Eduardo: updated the commit message.
>  * 6/6: from 5/5: Micheal: Drop the sync option, document the MAP_SYNC.
> 
> Changes in v8:
>  * Micheal: 3/5, remove the duplicated define in the os_dep.h
>  * Micheal: 2/5, make type define safety.
>  * Micheal: 2/5, fixed the incorrect define MAP_SHARE on qemu_anon_ram_alloc.
>  * 4/6 removed, we remove the on/off/auto define of sync,  as by now,
>    MAP_SYNC only worked with pmem=on.
>  * @Micheal, I still reuse the RAM_SYNC flag, it is much straightforward to parse 
>    all the flags in one parameter.
> 
> Changes in v7:
>  * Micheal: [3,4,6]/6 limited the "sync" flag only on a nvdimm backend.(pmem=on)
> 
> Changes in v6:
>  * Pankaj: 3/7 are squashed with 2/7
>  * Pankaj: 7/7 update comments to "consistent filesystem metadata".
>  * Pankaj, Igor: 1/7 Added Reviewed-by in patch-1/7
>  * Stefan, 4/7 move the include header from "/linux/mman.h" to "osdep.h"
>  * Stefan, 5/7 Add missing "munmap"
>  * Stefan, 2/7 refine the shared/flag.
> 
> Changes in v5:
>  * Add patch 1 to fix a memory leak issue.
>  * Refine the patch 4-6
>  * Remove the patch 3 as we already change the parameter from "shared" to
>    "flags"
> 
> Changes in v4:
>  * Add patch 1-3 to switch some functions to a single 'flags'
>    parameters. (Michael S. Tsirkin)
>  * v3 patch 1-3 become v4 patch 4-6.
>  * Patch 4: move definitions of MAP_SYNC and MAP_SHARED_VALIDATE to a
>    new header file under include/standard-headers/linux/. (Michael S. Tsirkin)
>  * Patch 6: refine the description of the 'sync' option. (Michael S. Tsirkin)
> 
> Changes in v3:
>  * Patch 1: add MAP_SHARED_VALIDATE in both sync=on and sync=auto
>    cases, and add back the retry mechanism. MAP_SYNC will be ignored
>    by Linux kernel 4.15 if MAP_SHARED_VALIDATE is missed.
>  * Patch 1: define MAP_SYNC and MAP_SHARED_VALIDATE as 0 on non-Linux
>    platforms in order to make qemu_ram_mmap() compile on those platforms.
>  * Patch 2&3: include more information in error messages of
>    memory-backend in hope to help user to identify the error.
>    (Dr. David Alan Gilbert)
>  * Patch 3: fix typo in the commit message. (Dr. David Alan Gilbert)
> 
> Changes in v2:
>  * Add 'sync' option to control the use of MAP_SYNC. (Eduardo Habkost)
>  * Remove the unnecessary set of MAP_SHARED_VALIDATE in some cases and
>    the retry mechanism in qemu_ram_mmap(). (Michael S. Tsirkin)
>  * Move OS dependent definitions of MAP_SYNC and MAP_SHARED_VALIDATE
>    to osdep.h. (Michael S. Tsirkin)
> 
> Zhang Yi (4):
>   util/mmap-alloc: switch 'shared' to 'flags' parameter
>   util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
>   hostmem: add more information in error messages
>   docs: Added MAP_SYNC documentation
> 
>  backends/hostmem-file.c   |  6 ++++--
>  backends/hostmem.c        |  8 +++++---
>  docs/nvdimm.txt           | 29 ++++++++++++++++++++++++++++-
>  exec.c                    |  7 ++++---
>  include/qemu/mmap-alloc.h | 20 +++++++++++++++++++-
>  include/qemu/osdep.h      | 21 +++++++++++++++++++++
>  qemu-options.hx           |  4 ++++
>  util/mmap-alloc.c         | 15 +++++++++++----
>  util/oslib-posix.c        |  9 ++++++++-
>  9 files changed, 104 insertions(+), 15 deletions(-)
> 
> -- 
> 2.7.4

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V10 0/4] support MAP_SYNC for memory-backend-file
  2019-01-23  3:01 ` [Qemu-devel] [PATCH V10 0/4] support MAP_SYNC for memory-backend-file Michael S. Tsirkin
@ 2019-01-23  3:10   ` Yi Zhang
  2019-01-23  3:53     ` Michael S. Tsirkin
  2019-01-23  4:02     ` Michael S. Tsirkin
  0 siblings, 2 replies; 27+ messages in thread
From: Yi Zhang @ 2019-01-23  3:10 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, ehabkost, qemu-devel, imammedo, dan.j.williams

On 2019-01-22 at 22:01:58 -0500, Michael S. Tsirkin wrote:
> On Wed, Jan 23, 2019 at 10:59:27AM +0800, Zhang, Yi wrote:
> > From: "Zhang,Yi" <yi.z.zhang@linux.intel.com>
> > 
> > Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
> > guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
> > files on ext4/xfs file system mounted with '-o dax').
> > 
> > A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
> >     https://patchwork.kernel.org/patch/10028151/
> > 
> > In order to make sure that the file metadata is in sync after a fault 
> > while we are writing a shared DAX supporting backend files, this
> > patch-set enables QEMU to use MAP_SYNC flag for memory-backend-dax-file.
> > 
> > As the DAX vs DMA truncated issue was solved, we refined the code and
> > send out this feature for the v5 version.
> > 
> > We will pass MAP_SYNC to mmap(2); if MAP_SYNC is supported and
> > 'share=on' & 'pmem=on'. 
> > Or QEMU will not pass this flag to mmap(2)
> 
> How was this patch tested? Given the previous version apparently
> passed the combination of flags ignored by Linux,
> enquiring minds want to know.
Ah. Sorry, We Have missed the flag MAP_SHARED_VALIDATE in V9, V8 and previous
are tested. already fixed in the 2/4.
> 
> > Changes in V10:
> >  * 4/4: refine the document.
> >  * 3/4: Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
> >  * 2/4: refine the commit message, Added MAP_SHARED_VALIDATE.
> >  * 2/4: Fix the wrong include header
> > 
> > Changes in V9:
> >  * 1/6: Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
> >  * 2/6: New Added: Micheal: use sparse feature define RAM_FLAG. 
> >  since I don't have much knowledge about the sparse feature, @Micheal Could you 
> >  add some documentation/commit message on this patch? Thank you very much.
> >  * 3/6: from 2/5: Eduardo: updated the commit message. 
> >  * 4/6: from 3/5: Micheal: don't ignore MAP_SYNC failures silently.
> >  * 5/6: from 4/5: Eduardo: updated the commit message.
> >  * 6/6: from 5/5: Micheal: Drop the sync option, document the MAP_SYNC.
> > 
> > Changes in v8:
> >  * Micheal: 3/5, remove the duplicated define in the os_dep.h
> >  * Micheal: 2/5, make type define safety.
> >  * Micheal: 2/5, fixed the incorrect define MAP_SHARE on qemu_anon_ram_alloc.
> >  * 4/6 removed, we remove the on/off/auto define of sync,  as by now,
> >    MAP_SYNC only worked with pmem=on.
> >  * @Micheal, I still reuse the RAM_SYNC flag, it is much straightforward to parse 
> >    all the flags in one parameter.
> > 
> > Changes in v7:
> >  * Micheal: [3,4,6]/6 limited the "sync" flag only on a nvdimm backend.(pmem=on)
> > 
> > Changes in v6:
> >  * Pankaj: 3/7 are squashed with 2/7
> >  * Pankaj: 7/7 update comments to "consistent filesystem metadata".
> >  * Pankaj, Igor: 1/7 Added Reviewed-by in patch-1/7
> >  * Stefan, 4/7 move the include header from "/linux/mman.h" to "osdep.h"
> >  * Stefan, 5/7 Add missing "munmap"
> >  * Stefan, 2/7 refine the shared/flag.
> > 
> > Changes in v5:
> >  * Add patch 1 to fix a memory leak issue.
> >  * Refine the patch 4-6
> >  * Remove the patch 3 as we already change the parameter from "shared" to
> >    "flags"
> > 
> > Changes in v4:
> >  * Add patch 1-3 to switch some functions to a single 'flags'
> >    parameters. (Michael S. Tsirkin)
> >  * v3 patch 1-3 become v4 patch 4-6.
> >  * Patch 4: move definitions of MAP_SYNC and MAP_SHARED_VALIDATE to a
> >    new header file under include/standard-headers/linux/. (Michael S. Tsirkin)
> >  * Patch 6: refine the description of the 'sync' option. (Michael S. Tsirkin)
> > 
> > Changes in v3:
> >  * Patch 1: add MAP_SHARED_VALIDATE in both sync=on and sync=auto
> >    cases, and add back the retry mechanism. MAP_SYNC will be ignored
> >    by Linux kernel 4.15 if MAP_SHARED_VALIDATE is missed.
> >  * Patch 1: define MAP_SYNC and MAP_SHARED_VALIDATE as 0 on non-Linux
> >    platforms in order to make qemu_ram_mmap() compile on those platforms.
> >  * Patch 2&3: include more information in error messages of
> >    memory-backend in hope to help user to identify the error.
> >    (Dr. David Alan Gilbert)
> >  * Patch 3: fix typo in the commit message. (Dr. David Alan Gilbert)
> > 
> > Changes in v2:
> >  * Add 'sync' option to control the use of MAP_SYNC. (Eduardo Habkost)
> >  * Remove the unnecessary set of MAP_SHARED_VALIDATE in some cases and
> >    the retry mechanism in qemu_ram_mmap(). (Michael S. Tsirkin)
> >  * Move OS dependent definitions of MAP_SYNC and MAP_SHARED_VALIDATE
> >    to osdep.h. (Michael S. Tsirkin)
> > 
> > Zhang Yi (4):
> >   util/mmap-alloc: switch 'shared' to 'flags' parameter
> >   util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
> >   hostmem: add more information in error messages
> >   docs: Added MAP_SYNC documentation
> > 
> >  backends/hostmem-file.c   |  6 ++++--
> >  backends/hostmem.c        |  8 +++++---
> >  docs/nvdimm.txt           | 29 ++++++++++++++++++++++++++++-
> >  exec.c                    |  7 ++++---
> >  include/qemu/mmap-alloc.h | 20 +++++++++++++++++++-
> >  include/qemu/osdep.h      | 21 +++++++++++++++++++++
> >  qemu-options.hx           |  4 ++++
> >  util/mmap-alloc.c         | 15 +++++++++++----
> >  util/oslib-posix.c        |  9 ++++++++-
> >  9 files changed, 104 insertions(+), 15 deletions(-)
> > 
> > -- 
> > 2.7.4

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V10 0/4] support MAP_SYNC for memory-backend-file
  2019-01-23  3:10   ` Yi Zhang
@ 2019-01-23  3:53     ` Michael S. Tsirkin
  2019-01-23 15:45       ` Yi Zhang
  2019-01-23  4:02     ` Michael S. Tsirkin
  1 sibling, 1 reply; 27+ messages in thread
From: Michael S. Tsirkin @ 2019-01-23  3:53 UTC (permalink / raw)
  To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, ehabkost, qemu-devel, imammedo, dan.j.williams

On Wed, Jan 23, 2019 at 11:10:07AM +0800, Yi Zhang wrote:
> On 2019-01-22 at 22:01:58 -0500, Michael S. Tsirkin wrote:
> > On Wed, Jan 23, 2019 at 10:59:27AM +0800, Zhang, Yi wrote:
> > > From: "Zhang,Yi" <yi.z.zhang@linux.intel.com>
> > > 
> > > Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
> > > guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
> > > files on ext4/xfs file system mounted with '-o dax').
> > > 
> > > A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
> > >     https://patchwork.kernel.org/patch/10028151/
> > > 
> > > In order to make sure that the file metadata is in sync after a fault 
> > > while we are writing a shared DAX supporting backend files, this
> > > patch-set enables QEMU to use MAP_SYNC flag for memory-backend-dax-file.
> > > 
> > > As the DAX vs DMA truncated issue was solved, we refined the code and
> > > send out this feature for the v5 version.
> > > 
> > > We will pass MAP_SYNC to mmap(2); if MAP_SYNC is supported and
> > > 'share=on' & 'pmem=on'. 
> > > Or QEMU will not pass this flag to mmap(2)
> > How was this patch tested? Given the previous version apparently
> > passed the combination of flags ignored by Linux,
> > enquiring minds want to know.
> Ah. Sorry, We Have missed the flag MAP_SHARED_VALIDATE in V9, V8 and previous
> are tested. already fixed in the 2/4.

Could you please include the information: which configurations were
tested and how?

> > 
> > > Changes in V10:
> > >  * 4/4: refine the document.
> > >  * 3/4: Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
> > >  * 2/4: refine the commit message, Added MAP_SHARED_VALIDATE.
> > >  * 2/4: Fix the wrong include header
> > > 
> > > Changes in V9:
> > >  * 1/6: Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
> > >  * 2/6: New Added: Micheal: use sparse feature define RAM_FLAG. 
> > >  since I don't have much knowledge about the sparse feature, @Micheal Could you 
> > >  add some documentation/commit message on this patch? Thank you very much.
> > >  * 3/6: from 2/5: Eduardo: updated the commit message. 
> > >  * 4/6: from 3/5: Micheal: don't ignore MAP_SYNC failures silently.
> > >  * 5/6: from 4/5: Eduardo: updated the commit message.
> > >  * 6/6: from 5/5: Micheal: Drop the sync option, document the MAP_SYNC.
> > > 
> > > Changes in v8:
> > >  * Micheal: 3/5, remove the duplicated define in the os_dep.h
> > >  * Micheal: 2/5, make type define safety.
> > >  * Micheal: 2/5, fixed the incorrect define MAP_SHARE on qemu_anon_ram_alloc.
> > >  * 4/6 removed, we remove the on/off/auto define of sync,  as by now,
> > >    MAP_SYNC only worked with pmem=on.
> > >  * @Micheal, I still reuse the RAM_SYNC flag, it is much straightforward to parse 
> > >    all the flags in one parameter.
> > > 
> > > Changes in v7:
> > >  * Micheal: [3,4,6]/6 limited the "sync" flag only on a nvdimm backend.(pmem=on)
> > > 
> > > Changes in v6:
> > >  * Pankaj: 3/7 are squashed with 2/7
> > >  * Pankaj: 7/7 update comments to "consistent filesystem metadata".
> > >  * Pankaj, Igor: 1/7 Added Reviewed-by in patch-1/7
> > >  * Stefan, 4/7 move the include header from "/linux/mman.h" to "osdep.h"
> > >  * Stefan, 5/7 Add missing "munmap"
> > >  * Stefan, 2/7 refine the shared/flag.
> > > 
> > > Changes in v5:
> > >  * Add patch 1 to fix a memory leak issue.
> > >  * Refine the patch 4-6
> > >  * Remove the patch 3 as we already change the parameter from "shared" to
> > >    "flags"
> > > 
> > > Changes in v4:
> > >  * Add patch 1-3 to switch some functions to a single 'flags'
> > >    parameters. (Michael S. Tsirkin)
> > >  * v3 patch 1-3 become v4 patch 4-6.
> > >  * Patch 4: move definitions of MAP_SYNC and MAP_SHARED_VALIDATE to a
> > >    new header file under include/standard-headers/linux/. (Michael S. Tsirkin)
> > >  * Patch 6: refine the description of the 'sync' option. (Michael S. Tsirkin)
> > > 
> > > Changes in v3:
> > >  * Patch 1: add MAP_SHARED_VALIDATE in both sync=on and sync=auto
> > >    cases, and add back the retry mechanism. MAP_SYNC will be ignored
> > >    by Linux kernel 4.15 if MAP_SHARED_VALIDATE is missed.
> > >  * Patch 1: define MAP_SYNC and MAP_SHARED_VALIDATE as 0 on non-Linux
> > >    platforms in order to make qemu_ram_mmap() compile on those platforms.
> > >  * Patch 2&3: include more information in error messages of
> > >    memory-backend in hope to help user to identify the error.
> > >    (Dr. David Alan Gilbert)
> > >  * Patch 3: fix typo in the commit message. (Dr. David Alan Gilbert)
> > > 
> > > Changes in v2:
> > >  * Add 'sync' option to control the use of MAP_SYNC. (Eduardo Habkost)
> > >  * Remove the unnecessary set of MAP_SHARED_VALIDATE in some cases and
> > >    the retry mechanism in qemu_ram_mmap(). (Michael S. Tsirkin)
> > >  * Move OS dependent definitions of MAP_SYNC and MAP_SHARED_VALIDATE
> > >    to osdep.h. (Michael S. Tsirkin)
> > > 
> > > Zhang Yi (4):
> > >   util/mmap-alloc: switch 'shared' to 'flags' parameter
> > >   util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
> > >   hostmem: add more information in error messages
> > >   docs: Added MAP_SYNC documentation
> > > 
> > >  backends/hostmem-file.c   |  6 ++++--
> > >  backends/hostmem.c        |  8 +++++---
> > >  docs/nvdimm.txt           | 29 ++++++++++++++++++++++++++++-
> > >  exec.c                    |  7 ++++---
> > >  include/qemu/mmap-alloc.h | 20 +++++++++++++++++++-
> > >  include/qemu/osdep.h      | 21 +++++++++++++++++++++
> > >  qemu-options.hx           |  4 ++++
> > >  util/mmap-alloc.c         | 15 +++++++++++----
> > >  util/oslib-posix.c        |  9 ++++++++-
> > >  9 files changed, 104 insertions(+), 15 deletions(-)
> > > 
> > > -- 
> > > 2.7.4

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V10 0/4] support MAP_SYNC for memory-backend-file
  2019-01-23  3:10   ` Yi Zhang
  2019-01-23  3:53     ` Michael S. Tsirkin
@ 2019-01-23  4:02     ` Michael S. Tsirkin
  2019-01-23 15:57       ` Yi Zhang
  1 sibling, 1 reply; 27+ messages in thread
From: Michael S. Tsirkin @ 2019-01-23  4:02 UTC (permalink / raw)
  To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, ehabkost, qemu-devel, imammedo, dan.j.williams

On Wed, Jan 23, 2019 at 11:10:07AM +0800, Yi Zhang wrote:
> On 2019-01-22 at 22:01:58 -0500, Michael S. Tsirkin wrote:
> > On Wed, Jan 23, 2019 at 10:59:27AM +0800, Zhang, Yi wrote:
> > > From: "Zhang,Yi" <yi.z.zhang@linux.intel.com>
> > > 
> > > Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
> > > guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
> > > files on ext4/xfs file system mounted with '-o dax').
> > > 
> > > A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
> > >     https://patchwork.kernel.org/patch/10028151/
> > > 
> > > In order to make sure that the file metadata is in sync after a fault 
> > > while we are writing a shared DAX supporting backend files, this
> > > patch-set enables QEMU to use MAP_SYNC flag for memory-backend-dax-file.
> > > 
> > > As the DAX vs DMA truncated issue was solved, we refined the code and
> > > send out this feature for the v5 version.
> > > 
> > > We will pass MAP_SYNC to mmap(2); if MAP_SYNC is supported and
> > > 'share=on' & 'pmem=on'. 
> > > Or QEMU will not pass this flag to mmap(2)
> > 
> > How was this patch tested? Given the previous version apparently
> > passed the combination of flags ignored by Linux,
> > enquiring minds want to know.
> Ah. Sorry, We Have missed the flag MAP_SHARED_VALIDATE in V9, V8 and previous
> are tested.

BTW I checked V8 and I don't see MAP_SHARED_VALIDATE there either.
So all the testing seems suspect at this point -
you want a test that fails all versions before V10.

> already fixed in the 2/4.
> > 
> > > Changes in V10:
> > >  * 4/4: refine the document.
> > >  * 3/4: Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
> > >  * 2/4: refine the commit message, Added MAP_SHARED_VALIDATE.
> > >  * 2/4: Fix the wrong include header
> > > 
> > > Changes in V9:
> > >  * 1/6: Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
> > >  * 2/6: New Added: Micheal: use sparse feature define RAM_FLAG. 
> > >  since I don't have much knowledge about the sparse feature, @Micheal Could you 
> > >  add some documentation/commit message on this patch? Thank you very much.
> > >  * 3/6: from 2/5: Eduardo: updated the commit message. 
> > >  * 4/6: from 3/5: Micheal: don't ignore MAP_SYNC failures silently.
> > >  * 5/6: from 4/5: Eduardo: updated the commit message.
> > >  * 6/6: from 5/5: Micheal: Drop the sync option, document the MAP_SYNC.
> > > 
> > > Changes in v8:
> > >  * Micheal: 3/5, remove the duplicated define in the os_dep.h
> > >  * Micheal: 2/5, make type define safety.
> > >  * Micheal: 2/5, fixed the incorrect define MAP_SHARE on qemu_anon_ram_alloc.
> > >  * 4/6 removed, we remove the on/off/auto define of sync,  as by now,
> > >    MAP_SYNC only worked with pmem=on.
> > >  * @Micheal, I still reuse the RAM_SYNC flag, it is much straightforward to parse 
> > >    all the flags in one parameter.
> > > 
> > > Changes in v7:
> > >  * Micheal: [3,4,6]/6 limited the "sync" flag only on a nvdimm backend.(pmem=on)
> > > 
> > > Changes in v6:
> > >  * Pankaj: 3/7 are squashed with 2/7
> > >  * Pankaj: 7/7 update comments to "consistent filesystem metadata".
> > >  * Pankaj, Igor: 1/7 Added Reviewed-by in patch-1/7
> > >  * Stefan, 4/7 move the include header from "/linux/mman.h" to "osdep.h"
> > >  * Stefan, 5/7 Add missing "munmap"
> > >  * Stefan, 2/7 refine the shared/flag.
> > > 
> > > Changes in v5:
> > >  * Add patch 1 to fix a memory leak issue.
> > >  * Refine the patch 4-6
> > >  * Remove the patch 3 as we already change the parameter from "shared" to
> > >    "flags"
> > > 
> > > Changes in v4:
> > >  * Add patch 1-3 to switch some functions to a single 'flags'
> > >    parameters. (Michael S. Tsirkin)
> > >  * v3 patch 1-3 become v4 patch 4-6.
> > >  * Patch 4: move definitions of MAP_SYNC and MAP_SHARED_VALIDATE to a
> > >    new header file under include/standard-headers/linux/. (Michael S. Tsirkin)
> > >  * Patch 6: refine the description of the 'sync' option. (Michael S. Tsirkin)
> > > 
> > > Changes in v3:
> > >  * Patch 1: add MAP_SHARED_VALIDATE in both sync=on and sync=auto
> > >    cases, and add back the retry mechanism. MAP_SYNC will be ignored
> > >    by Linux kernel 4.15 if MAP_SHARED_VALIDATE is missed.
> > >  * Patch 1: define MAP_SYNC and MAP_SHARED_VALIDATE as 0 on non-Linux
> > >    platforms in order to make qemu_ram_mmap() compile on those platforms.
> > >  * Patch 2&3: include more information in error messages of
> > >    memory-backend in hope to help user to identify the error.
> > >    (Dr. David Alan Gilbert)
> > >  * Patch 3: fix typo in the commit message. (Dr. David Alan Gilbert)
> > > 
> > > Changes in v2:
> > >  * Add 'sync' option to control the use of MAP_SYNC. (Eduardo Habkost)
> > >  * Remove the unnecessary set of MAP_SHARED_VALIDATE in some cases and
> > >    the retry mechanism in qemu_ram_mmap(). (Michael S. Tsirkin)
> > >  * Move OS dependent definitions of MAP_SYNC and MAP_SHARED_VALIDATE
> > >    to osdep.h. (Michael S. Tsirkin)
> > > 
> > > Zhang Yi (4):
> > >   util/mmap-alloc: switch 'shared' to 'flags' parameter
> > >   util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
> > >   hostmem: add more information in error messages
> > >   docs: Added MAP_SYNC documentation
> > > 
> > >  backends/hostmem-file.c   |  6 ++++--
> > >  backends/hostmem.c        |  8 +++++---
> > >  docs/nvdimm.txt           | 29 ++++++++++++++++++++++++++++-
> > >  exec.c                    |  7 ++++---
> > >  include/qemu/mmap-alloc.h | 20 +++++++++++++++++++-
> > >  include/qemu/osdep.h      | 21 +++++++++++++++++++++
> > >  qemu-options.hx           |  4 ++++
> > >  util/mmap-alloc.c         | 15 +++++++++++----
> > >  util/oslib-posix.c        |  9 ++++++++-
> > >  9 files changed, 104 insertions(+), 15 deletions(-)
> > > 
> > > -- 
> > > 2.7.4

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V10 4/4] docs: Added MAP_SYNC documentation
  2019-01-23  3:00 ` [Qemu-devel] [PATCH V10 4/4] docs: Added MAP_SYNC documentation Zhang, Yi
@ 2019-01-23 14:50   ` Eduardo Habkost
  2019-01-24 11:21     ` Yi Zhang
  0 siblings, 1 reply; 27+ messages in thread
From: Eduardo Habkost @ 2019-01-23 14:50 UTC (permalink / raw)
  To: Zhang, Yi
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, mst, qemu-devel, imammedo, dan.j.williams

On Wed, Jan 23, 2019 at 11:00:02AM +0800, Zhang, Yi wrote:
> From: Zhang Yi <yi.z.zhang@linux.intel.com>
> 
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> ---
>  docs/nvdimm.txt | 29 ++++++++++++++++++++++++++++-
>  qemu-options.hx |  4 ++++
>  2 files changed, 32 insertions(+), 1 deletion(-)
> 
> diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> index 5f158a6..166c395 100644
> --- a/docs/nvdimm.txt
> +++ b/docs/nvdimm.txt
> @@ -142,11 +142,38 @@ backend of vNVDIMM:
>  Guest Data Persistence
>  ----------------------
>  
> +vNVDIMM is designed and implemented to guarantee the guest data
> +persistence on the backends in case of host crash or a power failures.
> +However, there are still some requirements and limitations
> +as explained below.
> +
>  Though QEMU supports multiple types of vNVDIMM backends on Linux,
> -currently the only one that can guarantee the guest write persistence
> +if MAP_SYNC is not supported by the host kernel and the backends,
> +the only backend that can guarantee the guest write persistence
>  is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
>  which all guest access do not involve any host-side kernel cache.
>  
> +mmap(2) flag MAP_SYNC is added since Linux kernel 4.15. On such
> +systems, QEMU can mmap(2) the dax backend files with MAP_SYNC, which
> +ensures filesystem metadata consistency in case of a host crash or a power
> +failure. Enabling MAP_SYNC in QEMU requires below conditions
> +
> + - 'pmem' option of memory-backend-file is 'on':
> +   The backend is a file supporting DAX, e.g., a file on an ext4 or
> +   xfs file system mounted with '-o dax'. if your pmem=on ,but the backend is
> +   not a file supporting DAX, mapping with this flag results in an EOPNOTSUPP
> +   error.

Won't this break existing configurations that work today on QEMU
3.1.0?  Why exactly it is OK to break compatibility here?

> +
> + - 'share' option of memory-backend-file is 'on':
> +   MAP_SYNC flag available only with the MAP_SHARED_VALIDATE mapping type.

I don't understand what this paragraph means.

> +
> + - 'MAP_SYNC' is supported on linux kernel.(default opened since Linux 4.15)
> +

I don't understand why you are making the semantics of
command-line options change depending on the host kernel.

> +Otherwise, We will ignore the MAP_SYNC flag.
> +

See the questions I sent about supported use cases at
<https://www.mail-archive.com/qemu-devel@nongnu.org/msg588822.html>.
I still don't see those questions answered:

] We have at least 3 different possible use cases we might need to
] support:
] 
] 1) pmem=on, MAP_SYNC not desired
] 2) pmem=on, MAP_SYNC desired but optional
] 3) pmem=on, MAP_SYNC required, not optional
] 
] Which cases from the list above we need to support?
] 
] From the cases above, what's the expected semantics of "pmem=on"
] with no extra options?

It's not clear to me yet if you want to support use cases (1) and (2).

Also, you seem to be choosing between use case (1) or (3) depending on
the build environment instead of command-line options.  The
meaning of command-line options must be predictable and
unambiguous, and not depend on build time variables.


> +For more details, please reference mmap(2) man page:
> +http://man7.org/linux/man-pages/man2/mmap.2.html.
> +
>  When using other types of backends, it's suggested to set 'unarmed'
>  option of '-device nvdimm' to 'on', which sets the unarmed flag of the
>  guest NVDIMM region mapping structure.  This unarmed flag indicates
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 08f8516..0cd41f4 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -4002,6 +4002,10 @@ using the SNIA NVM programming model (e.g. Intel NVDIMM).
>  If @option{pmem} is set to 'on', QEMU will take necessary operations to
>  guarantee the persistence of its own writes to @option{mem-path}
>  (e.g. in vNVDIMM label emulation and live migration).
> +Also, we will map the backend-file with MAP_SYNC flag, which can ensure
> +the file metadata is in sync to @option{mem-path} in case of host crash
> +or a power failure. MAP_SYNC requires support from both the host kernel
> +(since Linux kernel 4.15) and @option{mem-path} (only files supporting DAX).
>  
>  @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
>  
> -- 
> 2.7.4
> 

-- 
Eduardo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V10 1/4] util/mmap-alloc: switch 'shared' to 'flags' parameter
  2019-01-23  2:59 ` [Qemu-devel] [PATCH V10 1/4] util/mmap-alloc: switch 'shared' to 'flags' parameter Zhang, Yi
@ 2019-01-23 15:01   ` Michael S. Tsirkin
  0 siblings, 0 replies; 27+ messages in thread
From: Michael S. Tsirkin @ 2019-01-23 15:01 UTC (permalink / raw)
  To: Zhang, Yi
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, ehabkost, qemu-devel, imammedo, dan.j.williams

On Wed, Jan 23, 2019 at 10:59:37AM +0800, Zhang, Yi wrote:
> From: Zhang Yi <yi.z.zhang@linux.intel.com>
> 
> As more flag parameters besides the existing 'shared' are going to be
> added to qemu_ram_mmap() and qemu_ram_alloc_from_{file,fd}(), let's
> switch 'shared' to a 'flags' parameter in advance, so as to ease the
> further additions.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>

There are all of two callers to qemu_ram_mmap.

Please just add a bool parameter for now.

If you want to work on switching to flags afterwards
that needs sparse work for type safety.


> ---
>  exec.c                    |  7 ++++---
>  include/qemu/mmap-alloc.h | 19 ++++++++++++++++++-
>  util/mmap-alloc.c         |  8 +++++---
>  util/oslib-posix.c        |  9 ++++++++-
>  4 files changed, 35 insertions(+), 8 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index bb6170d..e92a7da 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1810,6 +1810,7 @@ static void *file_ram_alloc(RAMBlock *block,
>                              ram_addr_t memory,
>                              int fd,
>                              bool truncate,
> +                            uint32_t flags,
>                              Error **errp)
>  {
>      void *area;
> @@ -1859,8 +1860,7 @@ static void *file_ram_alloc(RAMBlock *block,
>          perror("ftruncate");
>      }
>  
> -    area = qemu_ram_mmap(fd, memory, block->mr->align,
> -                         block->flags & RAM_SHARED);
> +    area = qemu_ram_mmap(fd, memory, block->mr->align, flags);
>      if (area == MAP_FAILED) {
>          error_setg_errno(errp, errno,
>                           "unable to map backing store for guest RAM");
> @@ -2279,7 +2279,8 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
>      new_block->used_length = size;
>      new_block->max_length = size;
>      new_block->flags = ram_flags;
> -    new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp);
> +    new_block->host = file_ram_alloc(new_block, size, fd, !file_size,
> +            ram_flags, errp);
>      if (!new_block->host) {
>          g_free(new_block);
>          return NULL;
> diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
> index 50385e3..6fe6ed4 100644
> --- a/include/qemu/mmap-alloc.h
> +++ b/include/qemu/mmap-alloc.h
> @@ -7,7 +7,24 @@ size_t qemu_fd_getpagesize(int fd);
>  
>  size_t qemu_mempath_getpagesize(const char *mem_path);
>  
> -void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared);
> +/**
> + * qemu_ram_mmap: mmap the specified file or device.
> + *
> + * Parameters:
> + *  @fd: the file or the device to mmap
> + *  @size: the number of bytes to be mmaped
> + *  @align: if not zero, specify the alignment of the starting mapping address;
> + *          otherwise, the alignment in use will be determined by QEMU.
> + *  @flags: specifies additional properties of the mapping, which can be one or
> + *          bit-or of following values
> + *          - RAM_SHARED: mmap with MAP_SHARED flag
> + *          Other bits are ignored.
> + *
> + * Return:
> + *  On success, return a pointer to the mapped area.
> + *  On failure, return MAP_FAILED.
> + */
> +void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags);
>  
>  void qemu_ram_munmap(void *ptr, size_t size);
>  
> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> index fd329ec..8f0a740 100644
> --- a/util/mmap-alloc.c
> +++ b/util/mmap-alloc.c
> @@ -13,6 +13,7 @@
>  #include "qemu/osdep.h"
>  #include "qemu/mmap-alloc.h"
>  #include "qemu/host-utils.h"
> +#include "exec/memory.h"
>  
>  #define HUGETLBFS_MAGIC       0x958458f6
>  
> @@ -75,7 +76,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
>      return getpagesize();
>  }
>  
> -void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
> +void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
>  {
>      /*
>       * Note: this always allocates at least one extra page of virtual address
> @@ -92,11 +93,12 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
>       * anonymous memory is OK.
>       */
>      int anonfd = fd == -1 || qemu_fd_getpagesize(fd) == getpagesize() ? -1 : fd;
> -    int flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
> -    void *ptr = mmap(0, total, PROT_NONE, flags | MAP_PRIVATE, anonfd, 0);
> +    int mmap_flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
> +    void *ptr = mmap(0, total, PROT_NONE, mmap_flags | MAP_PRIVATE, anonfd, 0);
>  #else
>      void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
>  #endif
> +    bool shared = flags & RAM_SHARED;
>      size_t offset;
>      void *ptr1;
>  
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index fbd0dc8..75a0171 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -54,6 +54,7 @@
>  #endif
>  
>  #include "qemu/mmap-alloc.h"
> +#include "exec/memory.h"
>  
>  #ifdef CONFIG_DEBUG_STACK_USAGE
>  #include "qemu/error-report.h"
> @@ -203,7 +204,13 @@ void *qemu_memalign(size_t alignment, size_t size)
>  void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
>  {
>      size_t align = QEMU_VMALLOC_ALIGN;
> -    void *ptr = qemu_ram_mmap(-1, size, align, shared);
> +    uint32_t flags = 0;
> +    void *ptr;
> +
> +    if (shared) {
> +        flags = RAM_SHARED;
> +    }
> +    ptr = qemu_ram_mmap(-1, size, align, flags);
>  
>      if (ptr == MAP_FAILED) {
>          return NULL;
> -- 
> 2.7.4

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V10 2/4] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  2019-01-23  2:59 ` [Qemu-devel] [PATCH V10 2/4] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap() Zhang, Yi
@ 2019-01-23 15:04   ` Michael S. Tsirkin
  2019-01-24 14:15     ` Yi Zhang
  0 siblings, 1 reply; 27+ messages in thread
From: Michael S. Tsirkin @ 2019-01-23 15:04 UTC (permalink / raw)
  To: Zhang, Yi
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, ehabkost, qemu-devel, imammedo, dan.j.williams

On Wed, Jan 23, 2019 at 10:59:45AM +0800, Zhang, Yi wrote:
> From: Zhang Yi <yi.z.zhang@linux.intel.com>
> 
> When a file supporting DAX is used as vNVDIMM backend, mmap it with
> MAP_SYNC flag in addition which can ensure file system metadata
> synced in each guest writes to the backend file, without other QEMU
> actions (e.g., periodic fsync() by QEMU).
> 
> Current, We have below different possible use cases:
> 
> 1. pmem=on is set, shared=on is set, MAP_SYNC supported:
>    a: backend is a dax supporting file.
>     - MAP_SYNC will active.
>    b: backend is not a dax supporting file.
>     - mmap will result in an EOPNOTSUPP error.
> 
> 2. The rest of cases:
>    - we will never pass the MAP_SYNC to mmap2
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> ---
>  include/qemu/mmap-alloc.h |  1 +
>  include/qemu/osdep.h      | 21 +++++++++++++++++++++
>  util/mmap-alloc.c         |  7 ++++++-
>  3 files changed, 28 insertions(+), 1 deletion(-)
> 
> diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
> index 6fe6ed4..a95d91c 100644
> --- a/include/qemu/mmap-alloc.h
> +++ b/include/qemu/mmap-alloc.h
> @@ -18,6 +18,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path);
>   *  @flags: specifies additional properties of the mapping, which can be one or
>   *          bit-or of following values
>   *          - RAM_SHARED: mmap with MAP_SHARED flag
> + *          - RAM_PMEM: mmap with MAP_SYNC flag
>   *          Other bits are ignored.
>   *
>   * Return:
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index 457d24e..3bcf155 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -419,6 +419,27 @@ void qemu_anon_ram_free(void *ptr, size_t size);
>  #  define QEMU_VMALLOC_ALIGN getpagesize()
>  #endif
>  
> +/*
> + * MAP_SHARED_VALIDATE and MAP_SYNC are introduced in Linux kernel
> + * 4.15, so they may not be defined when compiling on older kernels.
> + */


I commented on this part in v7. That's a wrong way to handle
compatibility.

> +#ifdef CONFIG_LINUX
> +
> +#include <linux/mman.h>
> +
> +#ifndef MAP_SYNC
> +#define MAP_SYNC 0x0
> +#endif
> +
> +#ifndef MAP_SHARED_VALIDATE
> +#define MAP_SHARED_VALIDATE 0x0
> +#endif
> +




> +#else  /* !CONFIG_LINUX */
> +#define MAP_SYNC              0x0
> +#define MAP_SHARED_VALIDATE   0x0
> +#endif /* CONFIG_LINUX */
> +
>  #ifdef CONFIG_POSIX
>  struct qemu_signalfd_siginfo {
>      uint32_t ssi_signo;   /* Signal number */
> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> index 8f0a740..a4ce9b5 100644
> --- a/util/mmap-alloc.c
> +++ b/util/mmap-alloc.c
> @@ -99,6 +99,8 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
>      void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
>  #endif
>      bool shared = flags & RAM_SHARED;
> +    bool is_pmem = flags & RAM_PMEM;
> +    int mmap_xflags = 0;
>      size_t offset;
>      void *ptr1;
>  
> @@ -109,12 +111,15 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
>      assert(is_power_of_2(align));
>      /* Always align to host page size */
>      assert(align >= getpagesize());
> +    if (shared && is_pmem) {
> +        mmap_xflags |= (MAP_SYNC | MAP_SHARED_VALIDATE);
> +    }
>  
>      offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
>      ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
>                  MAP_FIXED |
>                  (fd == -1 ? MAP_ANONYMOUS : 0) |
> -                (shared ? MAP_SHARED : MAP_PRIVATE),
> +                (shared ? MAP_SHARED : MAP_PRIVATE) | mmap_xflags,
>                  fd, 0);
>      if (ptr1 == MAP_FAILED) {
>          munmap(ptr, total);
> -- 
> 2.7.4

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V10 0/4] support MAP_SYNC for memory-backend-file
  2019-01-23  3:53     ` Michael S. Tsirkin
@ 2019-01-23 15:45       ` Yi Zhang
  0 siblings, 0 replies; 27+ messages in thread
From: Yi Zhang @ 2019-01-23 15:45 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, ehabkost, qemu-devel, imammedo, dan.j.williams

On 2019-01-22 at 22:53:36 -0500, Michael S. Tsirkin wrote:
> On Wed, Jan 23, 2019 at 11:10:07AM +0800, Yi Zhang wrote:
> > On 2019-01-22 at 22:01:58 -0500, Michael S. Tsirkin wrote:
> > > On Wed, Jan 23, 2019 at 10:59:27AM +0800, Zhang, Yi wrote:
> > > > From: "Zhang,Yi" <yi.z.zhang@linux.intel.com>
> > > > 
> > > > Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
> > > > guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
> > > > files on ext4/xfs file system mounted with '-o dax').
> > > > 
> > > > A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
> > > >     https://patchwork.kernel.org/patch/10028151/
> > > > 
> > > > In order to make sure that the file metadata is in sync after a fault 
> > > > while we are writing a shared DAX supporting backend files, this
> > > > patch-set enables QEMU to use MAP_SYNC flag for memory-backend-dax-file.
> > > > 
> > > > As the DAX vs DMA truncated issue was solved, we refined the code and
> > > > send out this feature for the v5 version.
> > > > 
> > > > We will pass MAP_SYNC to mmap(2); if MAP_SYNC is supported and
> > > > 'share=on' & 'pmem=on'. 
> > > > Or QEMU will not pass this flag to mmap(2)
> > > How was this patch tested? Given the previous version apparently
> > > passed the combination of flags ignored by Linux,
> > > enquiring minds want to know.
> > Ah. Sorry, We Have missed the flag MAP_SHARED_VALIDATE in V9, V8 and previous
> > are tested. already fixed in the 2/4.
> 
> Could you please include the information: which configurations were
> tested and how?
Sure, updated it in cover letter and resend it.
> 
> > > 
> > > > Changes in V10:
> > > >  * 4/4: refine the document.
> > > >  * 3/4: Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
> > > >  * 2/4: refine the commit message, Added MAP_SHARED_VALIDATE.
> > > >  * 2/4: Fix the wrong include header
> > > > 
> > > > Changes in V9:
> > > >  * 1/6: Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
> > > >  * 2/6: New Added: Micheal: use sparse feature define RAM_FLAG. 
> > > >  since I don't have much knowledge about the sparse feature, @Micheal Could you 
> > > >  add some documentation/commit message on this patch? Thank you very much.
> > > >  * 3/6: from 2/5: Eduardo: updated the commit message. 
> > > >  * 4/6: from 3/5: Micheal: don't ignore MAP_SYNC failures silently.
> > > >  * 5/6: from 4/5: Eduardo: updated the commit message.
> > > >  * 6/6: from 5/5: Micheal: Drop the sync option, document the MAP_SYNC.
> > > > 
> > > > Changes in v8:
> > > >  * Micheal: 3/5, remove the duplicated define in the os_dep.h
> > > >  * Micheal: 2/5, make type define safety.
> > > >  * Micheal: 2/5, fixed the incorrect define MAP_SHARE on qemu_anon_ram_alloc.
> > > >  * 4/6 removed, we remove the on/off/auto define of sync,  as by now,
> > > >    MAP_SYNC only worked with pmem=on.
> > > >  * @Micheal, I still reuse the RAM_SYNC flag, it is much straightforward to parse 
> > > >    all the flags in one parameter.
> > > > 
> > > > Changes in v7:
> > > >  * Micheal: [3,4,6]/6 limited the "sync" flag only on a nvdimm backend.(pmem=on)
> > > > 
> > > > Changes in v6:
> > > >  * Pankaj: 3/7 are squashed with 2/7
> > > >  * Pankaj: 7/7 update comments to "consistent filesystem metadata".
> > > >  * Pankaj, Igor: 1/7 Added Reviewed-by in patch-1/7
> > > >  * Stefan, 4/7 move the include header from "/linux/mman.h" to "osdep.h"
> > > >  * Stefan, 5/7 Add missing "munmap"
> > > >  * Stefan, 2/7 refine the shared/flag.
> > > > 
> > > > Changes in v5:
> > > >  * Add patch 1 to fix a memory leak issue.
> > > >  * Refine the patch 4-6
> > > >  * Remove the patch 3 as we already change the parameter from "shared" to
> > > >    "flags"
> > > > 
> > > > Changes in v4:
> > > >  * Add patch 1-3 to switch some functions to a single 'flags'
> > > >    parameters. (Michael S. Tsirkin)
> > > >  * v3 patch 1-3 become v4 patch 4-6.
> > > >  * Patch 4: move definitions of MAP_SYNC and MAP_SHARED_VALIDATE to a
> > > >    new header file under include/standard-headers/linux/. (Michael S. Tsirkin)
> > > >  * Patch 6: refine the description of the 'sync' option. (Michael S. Tsirkin)
> > > > 
> > > > Changes in v3:
> > > >  * Patch 1: add MAP_SHARED_VALIDATE in both sync=on and sync=auto
> > > >    cases, and add back the retry mechanism. MAP_SYNC will be ignored
> > > >    by Linux kernel 4.15 if MAP_SHARED_VALIDATE is missed.
> > > >  * Patch 1: define MAP_SYNC and MAP_SHARED_VALIDATE as 0 on non-Linux
> > > >    platforms in order to make qemu_ram_mmap() compile on those platforms.
> > > >  * Patch 2&3: include more information in error messages of
> > > >    memory-backend in hope to help user to identify the error.
> > > >    (Dr. David Alan Gilbert)
> > > >  * Patch 3: fix typo in the commit message. (Dr. David Alan Gilbert)
> > > > 
> > > > Changes in v2:
> > > >  * Add 'sync' option to control the use of MAP_SYNC. (Eduardo Habkost)
> > > >  * Remove the unnecessary set of MAP_SHARED_VALIDATE in some cases and
> > > >    the retry mechanism in qemu_ram_mmap(). (Michael S. Tsirkin)
> > > >  * Move OS dependent definitions of MAP_SYNC and MAP_SHARED_VALIDATE
> > > >    to osdep.h. (Michael S. Tsirkin)
> > > > 
> > > > Zhang Yi (4):
> > > >   util/mmap-alloc: switch 'shared' to 'flags' parameter
> > > >   util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
> > > >   hostmem: add more information in error messages
> > > >   docs: Added MAP_SYNC documentation
> > > > 
> > > >  backends/hostmem-file.c   |  6 ++++--
> > > >  backends/hostmem.c        |  8 +++++---
> > > >  docs/nvdimm.txt           | 29 ++++++++++++++++++++++++++++-
> > > >  exec.c                    |  7 ++++---
> > > >  include/qemu/mmap-alloc.h | 20 +++++++++++++++++++-
> > > >  include/qemu/osdep.h      | 21 +++++++++++++++++++++
> > > >  qemu-options.hx           |  4 ++++
> > > >  util/mmap-alloc.c         | 15 +++++++++++----
> > > >  util/oslib-posix.c        |  9 ++++++++-
> > > >  9 files changed, 104 insertions(+), 15 deletions(-)
> > > > 
> > > > -- 
> > > > 2.7.4
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V10 0/4] support MAP_SYNC for memory-backend-file
  2019-01-23  4:02     ` Michael S. Tsirkin
@ 2019-01-23 15:57       ` Yi Zhang
  0 siblings, 0 replies; 27+ messages in thread
From: Yi Zhang @ 2019-01-23 15:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, ehabkost, qemu-devel, imammedo, dan.j.williams

On 2019-01-22 at 23:02:22 -0500, Michael S. Tsirkin wrote:
> On Wed, Jan 23, 2019 at 11:10:07AM +0800, Yi Zhang wrote:
> > On 2019-01-22 at 22:01:58 -0500, Michael S. Tsirkin wrote:
> > > On Wed, Jan 23, 2019 at 10:59:27AM +0800, Zhang, Yi wrote:
> > > > From: "Zhang,Yi" <yi.z.zhang@linux.intel.com>
> > > > 
> > > > Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
> > > > guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
> > > > files on ext4/xfs file system mounted with '-o dax').
> > > > 
> > > > A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
> > > >     https://patchwork.kernel.org/patch/10028151/
> > > > 
> > > > In order to make sure that the file metadata is in sync after a fault 
> > > > while we are writing a shared DAX supporting backend files, this
> > > > patch-set enables QEMU to use MAP_SYNC flag for memory-backend-dax-file.
> > > > 
> > > > As the DAX vs DMA truncated issue was solved, we refined the code and
> > > > send out this feature for the v5 version.
> > > > 
> > > > We will pass MAP_SYNC to mmap(2); if MAP_SYNC is supported and
> > > > 'share=on' & 'pmem=on'. 
> > > > Or QEMU will not pass this flag to mmap(2)
> > > 
> > > How was this patch tested? Given the previous version apparently
> > > passed the combination of flags ignored by Linux,
> > > enquiring minds want to know.
> > Ah. Sorry, We Have missed the flag MAP_SHARED_VALIDATE in V9, V8 and previous
> > are tested.
> 
> BTW I checked V8 and I don't see MAP_SHARED_VALIDATE there either.
> So all the testing seems suspect at this point -
> you want a test that fails all versions before V10.

Ah.. it is in V7, I fogot that after remove our own difination

#define MAP_SYNC_FLAGS (MAP_SYNC | MAP_SHARED_VALIDATE)

Sorry agin for that.
> 
> > already fixed in the 2/4.
> > > 
> > > > Changes in V10:
> > > >  * 4/4: refine the document.
> > > >  * 3/4: Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
> > > >  * 2/4: refine the commit message, Added MAP_SHARED_VALIDATE.
> > > >  * 2/4: Fix the wrong include header
> > > > 
> > > > Changes in V9:
> > > >  * 1/6: Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
> > > >  * 2/6: New Added: Micheal: use sparse feature define RAM_FLAG. 
> > > >  since I don't have much knowledge about the sparse feature, @Micheal Could you 
> > > >  add some documentation/commit message on this patch? Thank you very much.
> > > >  * 3/6: from 2/5: Eduardo: updated the commit message. 
> > > >  * 4/6: from 3/5: Micheal: don't ignore MAP_SYNC failures silently.
> > > >  * 5/6: from 4/5: Eduardo: updated the commit message.
> > > >  * 6/6: from 5/5: Micheal: Drop the sync option, document the MAP_SYNC.
> > > > 
> > > > Changes in v8:
> > > >  * Micheal: 3/5, remove the duplicated define in the os_dep.h
> > > >  * Micheal: 2/5, make type define safety.
> > > >  * Micheal: 2/5, fixed the incorrect define MAP_SHARE on qemu_anon_ram_alloc.
> > > >  * 4/6 removed, we remove the on/off/auto define of sync,  as by now,
> > > >    MAP_SYNC only worked with pmem=on.
> > > >  * @Micheal, I still reuse the RAM_SYNC flag, it is much straightforward to parse 
> > > >    all the flags in one parameter.
> > > > 
> > > > Changes in v7:
> > > >  * Micheal: [3,4,6]/6 limited the "sync" flag only on a nvdimm backend.(pmem=on)
> > > > 
> > > > Changes in v6:
> > > >  * Pankaj: 3/7 are squashed with 2/7
> > > >  * Pankaj: 7/7 update comments to "consistent filesystem metadata".
> > > >  * Pankaj, Igor: 1/7 Added Reviewed-by in patch-1/7
> > > >  * Stefan, 4/7 move the include header from "/linux/mman.h" to "osdep.h"
> > > >  * Stefan, 5/7 Add missing "munmap"
> > > >  * Stefan, 2/7 refine the shared/flag.
> > > > 
> > > > Changes in v5:
> > > >  * Add patch 1 to fix a memory leak issue.
> > > >  * Refine the patch 4-6
> > > >  * Remove the patch 3 as we already change the parameter from "shared" to
> > > >    "flags"
> > > > 
> > > > Changes in v4:
> > > >  * Add patch 1-3 to switch some functions to a single 'flags'
> > > >    parameters. (Michael S. Tsirkin)
> > > >  * v3 patch 1-3 become v4 patch 4-6.
> > > >  * Patch 4: move definitions of MAP_SYNC and MAP_SHARED_VALIDATE to a
> > > >    new header file under include/standard-headers/linux/. (Michael S. Tsirkin)
> > > >  * Patch 6: refine the description of the 'sync' option. (Michael S. Tsirkin)
> > > > 
> > > > Changes in v3:
> > > >  * Patch 1: add MAP_SHARED_VALIDATE in both sync=on and sync=auto
> > > >    cases, and add back the retry mechanism. MAP_SYNC will be ignored
> > > >    by Linux kernel 4.15 if MAP_SHARED_VALIDATE is missed.
> > > >  * Patch 1: define MAP_SYNC and MAP_SHARED_VALIDATE as 0 on non-Linux
> > > >    platforms in order to make qemu_ram_mmap() compile on those platforms.
> > > >  * Patch 2&3: include more information in error messages of
> > > >    memory-backend in hope to help user to identify the error.
> > > >    (Dr. David Alan Gilbert)
> > > >  * Patch 3: fix typo in the commit message. (Dr. David Alan Gilbert)
> > > > 
> > > > Changes in v2:
> > > >  * Add 'sync' option to control the use of MAP_SYNC. (Eduardo Habkost)
> > > >  * Remove the unnecessary set of MAP_SHARED_VALIDATE in some cases and
> > > >    the retry mechanism in qemu_ram_mmap(). (Michael S. Tsirkin)
> > > >  * Move OS dependent definitions of MAP_SYNC and MAP_SHARED_VALIDATE
> > > >    to osdep.h. (Michael S. Tsirkin)
> > > > 
> > > > Zhang Yi (4):
> > > >   util/mmap-alloc: switch 'shared' to 'flags' parameter
> > > >   util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
> > > >   hostmem: add more information in error messages
> > > >   docs: Added MAP_SYNC documentation
> > > > 
> > > >  backends/hostmem-file.c   |  6 ++++--
> > > >  backends/hostmem.c        |  8 +++++---
> > > >  docs/nvdimm.txt           | 29 ++++++++++++++++++++++++++++-
> > > >  exec.c                    |  7 ++++---
> > > >  include/qemu/mmap-alloc.h | 20 +++++++++++++++++++-
> > > >  include/qemu/osdep.h      | 21 +++++++++++++++++++++
> > > >  qemu-options.hx           |  4 ++++
> > > >  util/mmap-alloc.c         | 15 +++++++++++----
> > > >  util/oslib-posix.c        |  9 ++++++++-
> > > >  9 files changed, 104 insertions(+), 15 deletions(-)
> > > > 
> > > > -- 
> > > > 2.7.4
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V10 4/4] docs: Added MAP_SYNC documentation
  2019-01-23 14:50   ` Eduardo Habkost
@ 2019-01-24 11:21     ` Yi Zhang
  2019-01-24 16:59       ` Eduardo Habkost
  0 siblings, 1 reply; 27+ messages in thread
From: Yi Zhang @ 2019-01-24 11:21 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, mst, qemu-devel, imammedo, dan.j.williams

On 2019-01-23 at 12:50:50 -0200, Eduardo Habkost wrote:
> On Wed, Jan 23, 2019 at 11:00:02AM +0800, Zhang, Yi wrote:
> > From: Zhang Yi <yi.z.zhang@linux.intel.com>
> > 
> > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > ---
> >  docs/nvdimm.txt | 29 ++++++++++++++++++++++++++++-
> >  qemu-options.hx |  4 ++++
> >  2 files changed, 32 insertions(+), 1 deletion(-)
> > 
> > diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> > index 5f158a6..166c395 100644
> > --- a/docs/nvdimm.txt
> > +++ b/docs/nvdimm.txt
> > @@ -142,11 +142,38 @@ backend of vNVDIMM:
> >  Guest Data Persistence
> >  ----------------------
> >  
> > +vNVDIMM is designed and implemented to guarantee the guest data
> > +persistence on the backends in case of host crash or a power failures.
> > +However, there are still some requirements and limitations
> > +as explained below.
> > +
> >  Though QEMU supports multiple types of vNVDIMM backends on Linux,
> > -currently the only one that can guarantee the guest write persistence
> > +if MAP_SYNC is not supported by the host kernel and the backends,
> > +the only backend that can guarantee the guest write persistence
> >  is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
> >  which all guest access do not involve any host-side kernel cache.
> >  
> > +mmap(2) flag MAP_SYNC is added since Linux kernel 4.15. On such
> > +systems, QEMU can mmap(2) the dax backend files with MAP_SYNC, which
> > +ensures filesystem metadata consistency in case of a host crash or a power
> > +failure. Enabling MAP_SYNC in QEMU requires below conditions
> > +
> > + - 'pmem' option of memory-backend-file is 'on':
> > +   The backend is a file supporting DAX, e.g., a file on an ext4 or
> > +   xfs file system mounted with '-o dax'. if your pmem=on ,but the backend is
> > +   not a file supporting DAX, mapping with this flag results in an EOPNOTSUPP
> > +   error.
> 
> Won't this break existing configurations that work today on QEMU
> 3.1.0?  Why exactly it is OK to break compatibility here?
won't, pmem option default is off, if people who start VM don't know what
backend file is, it is suggested and *default to set pmem=off,
if people well know the backend file have dax capbility. it is suggest
to set pmem=on. 

For a special case that we use /dev/dax as backend, we already have a
patch to add MAP_SYNC falg mapiing from device dax mode.
see https://lkml.org/lkml/2018/4/22/524 

So, if people force set pmem=on, mapping a regular file, it will results
in an EOPNOTSUPP error. 

see http://man7.org/linux/man-pages/man2/mmap.2.html 
> 
> > +
> > + - 'share' option of memory-backend-file is 'on':
> > +   MAP_SYNC flag available only with the MAP_SHARED_VALIDATE mapping type.
> 
> I don't understand what this paragraph means.
see http://man7.org/linux/man-pages/man2/mmap.2.html 
> 
> > +
> > + - 'MAP_SYNC' is supported on linux kernel.(default opened since Linux 4.15)
> > +
> 
> I don't understand why you are making the semantics of
> command-line options change depending on the host kernel.
the option pmem=on do not dependent the host kernel. MAP_SYNC will be ignore
if the kernel don't support. the "pmem=on" have another meaning
see https://patchwork.kernel.org/patch/10459407/
> 
> > +Otherwise, We will ignore the MAP_SYNC flag.
> > +
> 
> See the questions I sent about supported use cases at
> <https://www.mail-archive.com/qemu-devel@nongnu.org/msg588822.html>.
> I still don't see those questions answered:
> 
> ] We have at least 3 different possible use cases we might need to
> ] support:
> ] 
> ] 1) pmem=on, MAP_SYNC not desired
> ] 2) pmem=on, MAP_SYNC desired but optional
> ] 3) pmem=on, MAP_SYNC required, not optional
> ] 

Sorry for my poor understanding, I don't know what these mean? 
pmem=on will force flag the MAP_SYNC while it capable on current kernel.
As we talk with Micheal if we set pmem=on , MAP_SYNC is always desired.

Means, if pmem=on, there is no option to close MAP_SYNC seprately.


> ] Which cases from the list above we need to support?
> ] 
> ] From the cases above, what's the expected semantics of "pmem=on"
> ] with no extra options?
> 
> It's not clear to me yet if you want to support use cases (1) and (2).
> 
> Also, you seem to be choosing between use case (1) or (3) depending on
> the build environment instead of command-line options.  The
> meaning of command-line options must be predictable and
> unambiguous, and not depend on build time variables.
so you are asking?
1) pmem=on, MAP_SYNC not supported kernel
- MAP_SYNC will be defined 0 and will be ignored in this case. see 2/4.
2) pmem=on, MAP_SYNC is supported but have a option to pass to mmap2()
- v7 send-out for a option sync to open/close MAP_SYNC seprately.
After talking with Micheal, we give up on a bit of flexibility, and
just say pmem=on forces MAP_SYNC. on a MAP_SYNC capable configrations(kernel+
backend dax)
3) pmem=on, ?
> 
> 
> > +For more details, please reference mmap(2) man page:
> > +http://man7.org/linux/man-pages/man2/mmap.2.html.
> > +
> >  When using other types of backends, it's suggested to set 'unarmed'
> >  option of '-device nvdimm' to 'on', which sets the unarmed flag of the
> >  guest NVDIMM region mapping structure.  This unarmed flag indicates
> > diff --git a/qemu-options.hx b/qemu-options.hx
> > index 08f8516..0cd41f4 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -4002,6 +4002,10 @@ using the SNIA NVM programming model (e.g. Intel NVDIMM).
> >  If @option{pmem} is set to 'on', QEMU will take necessary operations to
> >  guarantee the persistence of its own writes to @option{mem-path}
> >  (e.g. in vNVDIMM label emulation and live migration).
> > +Also, we will map the backend-file with MAP_SYNC flag, which can ensure
> > +the file metadata is in sync to @option{mem-path} in case of host crash
> > +or a power failure. MAP_SYNC requires support from both the host kernel
> > +(since Linux kernel 4.15) and @option{mem-path} (only files supporting DAX).
> >  
> >  @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
> >  
> > -- 
> > 2.7.4
> > 
> 
> -- 
> Eduardo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V10 2/4] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  2019-01-24 14:15     ` Yi Zhang
@ 2019-01-24 13:39       ` Michael S. Tsirkin
  0 siblings, 0 replies; 27+ messages in thread
From: Michael S. Tsirkin @ 2019-01-24 13:39 UTC (permalink / raw)
  To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, ehabkost, qemu-devel, imammedo, dan.j.williams

On Thu, Jan 24, 2019 at 10:15:24PM +0800, Yi Zhang wrote:
> On 2019-01-23 at 10:04:01 -0500, Michael S. Tsirkin wrote:
> > On Wed, Jan 23, 2019 at 10:59:45AM +0800, Zhang, Yi wrote:
> > > From: Zhang Yi <yi.z.zhang@linux.intel.com>
> > > 
> > > When a file supporting DAX is used as vNVDIMM backend, mmap it with
> > > MAP_SYNC flag in addition which can ensure file system metadata
> > > synced in each guest writes to the backend file, without other QEMU
> > > actions (e.g., periodic fsync() by QEMU).
> > > 
> > > Current, We have below different possible use cases:
> > > 
> > > 1. pmem=on is set, shared=on is set, MAP_SYNC supported:
> > >    a: backend is a dax supporting file.
> > >     - MAP_SYNC will active.
> > >    b: backend is not a dax supporting file.
> > >     - mmap will result in an EOPNOTSUPP error.
> > > 
> > > 2. The rest of cases:
> > >    - we will never pass the MAP_SYNC to mmap2
> > > 
> > > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > > ---
> > >  include/qemu/mmap-alloc.h |  1 +
> > >  include/qemu/osdep.h      | 21 +++++++++++++++++++++
> > >  util/mmap-alloc.c         |  7 ++++++-
> > >  3 files changed, 28 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
> > > index 6fe6ed4..a95d91c 100644
> > > --- a/include/qemu/mmap-alloc.h
> > > +++ b/include/qemu/mmap-alloc.h
> > > @@ -18,6 +18,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path);
> > >   *  @flags: specifies additional properties of the mapping, which can be one or
> > >   *          bit-or of following values
> > >   *          - RAM_SHARED: mmap with MAP_SHARED flag
> > > + *          - RAM_PMEM: mmap with MAP_SYNC flag
> > >   *          Other bits are ignored.
> > >   *
> > >   * Return:
> > > diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> > > index 457d24e..3bcf155 100644
> > > --- a/include/qemu/osdep.h
> > > +++ b/include/qemu/osdep.h
> > > @@ -419,6 +419,27 @@ void qemu_anon_ram_free(void *ptr, size_t size);
> > >  #  define QEMU_VMALLOC_ALIGN getpagesize()
> > >  #endif
> > >  
> > > +/*
> > > + * MAP_SHARED_VALIDATE and MAP_SYNC are introduced in Linux kernel
> > > + * 4.15, so they may not be defined when compiling on older kernels.
> > > + */
> > 
> > 
> > I commented on this part in v7. That's a wrong way to handle
> > compatibility.
> I'm a little confused that, you point me that should use 
> #include <linux/mman.h>
> here in v9, so what is best way to handle compatibility? 
> modify the update-linux-headers.sh and copy the mman.h to
> standard-headers/linux/?
> and #incldue standard-headers/linux/mman.h ?

In fact, asm/mman.h and asm-generic/mman-common.h - that's where
these are defined.

> but it still need fix the compatibility. if the MAP_SYNC not defined in
> the old kernel. Right?

No. You need to check at runtime and handle the case where mmap fails.
There's no guarantee that qemu binary will be rebuilt each time
kernel changes.


> forgive me my poor understandings, I'm pazzled.
> > 
> > > +#ifdef CONFIG_LINUX
> > > +
> > > +#include <linux/mman.h>
> > > +
> > > +#ifndef MAP_SYNC
> > > +#define MAP_SYNC 0x0
> > > +#endif
> > > +
> > > +#ifndef MAP_SHARED_VALIDATE
> > > +#define MAP_SHARED_VALIDATE 0x0
> > > +#endif
> > > +
> > 
> > 
> > 
> > 
> > > +#else  /* !CONFIG_LINUX */
> > > +#define MAP_SYNC              0x0
> > > +#define MAP_SHARED_VALIDATE   0x0
> > > +#endif /* CONFIG_LINUX */
> > > +
> > >  #ifdef CONFIG_POSIX
> > >  struct qemu_signalfd_siginfo {
> > >      uint32_t ssi_signo;   /* Signal number */
> > > diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> > > index 8f0a740..a4ce9b5 100644
> > > --- a/util/mmap-alloc.c
> > > +++ b/util/mmap-alloc.c
> > > @@ -99,6 +99,8 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
> > >      void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> > >  #endif
> > >      bool shared = flags & RAM_SHARED;
> > > +    bool is_pmem = flags & RAM_PMEM;
> > > +    int mmap_xflags = 0;
> > >      size_t offset;
> > >      void *ptr1;
> > >  
> > > @@ -109,12 +111,15 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
> > >      assert(is_power_of_2(align));
> > >      /* Always align to host page size */
> > >      assert(align >= getpagesize());
> > > +    if (shared && is_pmem) {
> > > +        mmap_xflags |= (MAP_SYNC | MAP_SHARED_VALIDATE);
> > > +    }
> > >  
> > >      offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
> > >      ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
> > >                  MAP_FIXED |
> > >                  (fd == -1 ? MAP_ANONYMOUS : 0) |
> > > -                (shared ? MAP_SHARED : MAP_PRIVATE),
> > > +                (shared ? MAP_SHARED : MAP_PRIVATE) | mmap_xflags,
> > >                  fd, 0);
> > >      if (ptr1 == MAP_FAILED) {
> > >          munmap(ptr, total);
> > > -- 
> > > 2.7.4

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V10 2/4] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  2019-01-23 15:04   ` Michael S. Tsirkin
@ 2019-01-24 14:15     ` Yi Zhang
  2019-01-24 13:39       ` Michael S. Tsirkin
  0 siblings, 1 reply; 27+ messages in thread
From: Yi Zhang @ 2019-01-24 14:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, ehabkost, qemu-devel, imammedo, dan.j.williams

On 2019-01-23 at 10:04:01 -0500, Michael S. Tsirkin wrote:
> On Wed, Jan 23, 2019 at 10:59:45AM +0800, Zhang, Yi wrote:
> > From: Zhang Yi <yi.z.zhang@linux.intel.com>
> > 
> > When a file supporting DAX is used as vNVDIMM backend, mmap it with
> > MAP_SYNC flag in addition which can ensure file system metadata
> > synced in each guest writes to the backend file, without other QEMU
> > actions (e.g., periodic fsync() by QEMU).
> > 
> > Current, We have below different possible use cases:
> > 
> > 1. pmem=on is set, shared=on is set, MAP_SYNC supported:
> >    a: backend is a dax supporting file.
> >     - MAP_SYNC will active.
> >    b: backend is not a dax supporting file.
> >     - mmap will result in an EOPNOTSUPP error.
> > 
> > 2. The rest of cases:
> >    - we will never pass the MAP_SYNC to mmap2
> > 
> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > ---
> >  include/qemu/mmap-alloc.h |  1 +
> >  include/qemu/osdep.h      | 21 +++++++++++++++++++++
> >  util/mmap-alloc.c         |  7 ++++++-
> >  3 files changed, 28 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
> > index 6fe6ed4..a95d91c 100644
> > --- a/include/qemu/mmap-alloc.h
> > +++ b/include/qemu/mmap-alloc.h
> > @@ -18,6 +18,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path);
> >   *  @flags: specifies additional properties of the mapping, which can be one or
> >   *          bit-or of following values
> >   *          - RAM_SHARED: mmap with MAP_SHARED flag
> > + *          - RAM_PMEM: mmap with MAP_SYNC flag
> >   *          Other bits are ignored.
> >   *
> >   * Return:
> > diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> > index 457d24e..3bcf155 100644
> > --- a/include/qemu/osdep.h
> > +++ b/include/qemu/osdep.h
> > @@ -419,6 +419,27 @@ void qemu_anon_ram_free(void *ptr, size_t size);
> >  #  define QEMU_VMALLOC_ALIGN getpagesize()
> >  #endif
> >  
> > +/*
> > + * MAP_SHARED_VALIDATE and MAP_SYNC are introduced in Linux kernel
> > + * 4.15, so they may not be defined when compiling on older kernels.
> > + */
> 
> 
> I commented on this part in v7. That's a wrong way to handle
> compatibility.
I'm a little confused that, you point me that should use 
#include <linux/mman.h>
here in v9, so what is best way to handle compatibility? 
modify the update-linux-headers.sh and copy the mman.h to
standard-headers/linux/?
and #incldue standard-headers/linux/mman.h ?

but it still need fix the compatibility. if the MAP_SYNC not defined in
the old kernel. Right?
forgive me my poor understandings, I'm pazzled.
> 
> > +#ifdef CONFIG_LINUX
> > +
> > +#include <linux/mman.h>
> > +
> > +#ifndef MAP_SYNC
> > +#define MAP_SYNC 0x0
> > +#endif
> > +
> > +#ifndef MAP_SHARED_VALIDATE
> > +#define MAP_SHARED_VALIDATE 0x0
> > +#endif
> > +
> 
> 
> 
> 
> > +#else  /* !CONFIG_LINUX */
> > +#define MAP_SYNC              0x0
> > +#define MAP_SHARED_VALIDATE   0x0
> > +#endif /* CONFIG_LINUX */
> > +
> >  #ifdef CONFIG_POSIX
> >  struct qemu_signalfd_siginfo {
> >      uint32_t ssi_signo;   /* Signal number */
> > diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> > index 8f0a740..a4ce9b5 100644
> > --- a/util/mmap-alloc.c
> > +++ b/util/mmap-alloc.c
> > @@ -99,6 +99,8 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
> >      void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> >  #endif
> >      bool shared = flags & RAM_SHARED;
> > +    bool is_pmem = flags & RAM_PMEM;
> > +    int mmap_xflags = 0;
> >      size_t offset;
> >      void *ptr1;
> >  
> > @@ -109,12 +111,15 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
> >      assert(is_power_of_2(align));
> >      /* Always align to host page size */
> >      assert(align >= getpagesize());
> > +    if (shared && is_pmem) {
> > +        mmap_xflags |= (MAP_SYNC | MAP_SHARED_VALIDATE);
> > +    }
> >  
> >      offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
> >      ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
> >                  MAP_FIXED |
> >                  (fd == -1 ? MAP_ANONYMOUS : 0) |
> > -                (shared ? MAP_SHARED : MAP_PRIVATE),
> > +                (shared ? MAP_SHARED : MAP_PRIVATE) | mmap_xflags,
> >                  fd, 0);
> >      if (ptr1 == MAP_FAILED) {
> >          munmap(ptr, total);
> > -- 
> > 2.7.4

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V10 4/4] docs: Added MAP_SYNC documentation
  2019-01-24 11:21     ` Yi Zhang
@ 2019-01-24 16:59       ` Eduardo Habkost
  2019-01-24 17:45         ` Michael S. Tsirkin
  0 siblings, 1 reply; 27+ messages in thread
From: Eduardo Habkost @ 2019-01-24 16:59 UTC (permalink / raw)
  To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, mst, qemu-devel, imammedo, dan.j.williams

On Thu, Jan 24, 2019 at 07:21:03PM +0800, Yi Zhang wrote:
> On 2019-01-23 at 12:50:50 -0200, Eduardo Habkost wrote:
> > On Wed, Jan 23, 2019 at 11:00:02AM +0800, Zhang, Yi wrote:
> > > From: Zhang Yi <yi.z.zhang@linux.intel.com>
> > > 
> > > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > > ---
> > >  docs/nvdimm.txt | 29 ++++++++++++++++++++++++++++-
> > >  qemu-options.hx |  4 ++++
> > >  2 files changed, 32 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> > > index 5f158a6..166c395 100644
> > > --- a/docs/nvdimm.txt
> > > +++ b/docs/nvdimm.txt
> > > @@ -142,11 +142,38 @@ backend of vNVDIMM:
> > >  Guest Data Persistence
> > >  ----------------------
> > >  
> > > +vNVDIMM is designed and implemented to guarantee the guest data
> > > +persistence on the backends in case of host crash or a power failures.
> > > +However, there are still some requirements and limitations
> > > +as explained below.
> > > +
> > >  Though QEMU supports multiple types of vNVDIMM backends on Linux,
> > > -currently the only one that can guarantee the guest write persistence
> > > +if MAP_SYNC is not supported by the host kernel and the backends,
> > > +the only backend that can guarantee the guest write persistence
> > >  is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
> > >  which all guest access do not involve any host-side kernel cache.
> > >  
> > > +mmap(2) flag MAP_SYNC is added since Linux kernel 4.15. On such
> > > +systems, QEMU can mmap(2) the dax backend files with MAP_SYNC, which
> > > +ensures filesystem metadata consistency in case of a host crash or a power
> > > +failure. Enabling MAP_SYNC in QEMU requires below conditions
> > > +
> > > + - 'pmem' option of memory-backend-file is 'on':
> > > +   The backend is a file supporting DAX, e.g., a file on an ext4 or
> > > +   xfs file system mounted with '-o dax'. if your pmem=on ,but the backend is
> > > +   not a file supporting DAX, mapping with this flag results in an EOPNOTSUPP
> > > +   error.
> > 
> > Won't this break existing configurations that work today on QEMU
> > 3.1.0?  Why exactly it is OK to break compatibility here?
> won't, pmem option default is off, if people who start VM don't know what
> backend file is, it is suggested and *default to set pmem=off,
> if people well know the backend file have dax capbility. it is suggest
> to set pmem=on. 
> 
> For a special case that we use /dev/dax as backend, we already have a
> patch to add MAP_SYNC falg mapiing from device dax mode.
> see https://lkml.org/lkml/2018/4/22/524 
> 
> So, if people force set pmem=on, mapping a regular file, it will results
> in an EOPNOTSUPP error. 

This is where compatibility is being broken, isn't it?  People
currently using pmem=on on a regular file will start getting
errors after a QEMU upgrade.  Existing VMs with pmem=on may stop
booting.  Maybe this is OK, but we need to be able to explain why
it is OK.

> 
> see http://man7.org/linux/man-pages/man2/mmap.2.html 
> > 
> > > +
> > > + - 'share' option of memory-backend-file is 'on':
> > > +   MAP_SYNC flag available only with the MAP_SHARED_VALIDATE mapping type.
> > 
> > I don't understand what this paragraph means.
> see http://man7.org/linux/man-pages/man2/mmap.2.html 
> > 
> > > +
> > > + - 'MAP_SYNC' is supported on linux kernel.(default opened since Linux 4.15)
> > > +
> > 
> > I don't understand why you are making the semantics of
> > command-line options change depending on the host kernel.
> the option pmem=on do not dependent the host kernel. MAP_SYNC will be ignore
> if the kernel don't support. the "pmem=on" have another meaning
> see https://patchwork.kernel.org/patch/10459407/
> > 
> > > +Otherwise, We will ignore the MAP_SYNC flag.
> > > +
> > 
> > See the questions I sent about supported use cases at
> > <https://www.mail-archive.com/qemu-devel@nongnu.org/msg588822.html>.
> > I still don't see those questions answered:
> > 
> > ] We have at least 3 different possible use cases we might need to
> > ] support:
> > ] 
> > ] 1) pmem=on, MAP_SYNC not desired
> > ] 2) pmem=on, MAP_SYNC desired but optional
> > ] 3) pmem=on, MAP_SYNC required, not optional
> > ] 
> 
> Sorry for my poor understanding, I don't know what these mean? 
> pmem=on will force flag the MAP_SYNC while it capable on current kernel.
> As we talk with Micheal if we set pmem=on , MAP_SYNC is always desired.
> 
> Means, if pmem=on, there is no option to close MAP_SYNC seprately.

I'm trying to find out what you need the code to do, and the 3
items above are possible use cases that we might need to support.
I'm not claiming we need to support all of them, but I would like
to understand which ones you need to support.

Once we answer that, we can choose what's the command-line
required for each case.  Right now this is not clear.

> 
> 
> > ] Which cases from the list above we need to support?
> > ] 
> > ] From the cases above, what's the expected semantics of "pmem=on"
> > ] with no extra options?
> > 
> > It's not clear to me yet if you want to support use cases (1) and (2).
> > 
> > Also, you seem to be choosing between use case (1) or (3) depending on
> > the build environment instead of command-line options.  The
> > meaning of command-line options must be predictable and
> > unambiguous, and not depend on build time variables.
> so you are asking?
> 1) pmem=on, MAP_SYNC not supported kernel
> - MAP_SYNC will be defined 0 and will be ignored in this case. see 2/4.
> 2) pmem=on, MAP_SYNC is supported but have a option to pass to mmap2()
> - v7 send-out for a option sync to open/close MAP_SYNC seprately.
> After talking with Micheal, we give up on a bit of flexibility, and
> just say pmem=on forces MAP_SYNC. on a MAP_SYNC capable configrations(kernel+
> backend dax)

I don't get this: you seem to be saying your series implement
(2), but above you say that users will get an error if using
pmem=on on a filesystem not supporting DAX, which means MAP_SYNC
is required but not optional (3).

In either case, the choice between (1), (2) or (3) must depend
only on command-line options, not on the QEMU build environment.
"pmem=on" must always mean the same thing.

If pmem=on is documented as making MAP_SYNC required (not
optional), it should make MAP_SYNC required every time.

If pmem=on is documented as making MAP_SYNC desired but optional,
it should make MAP_SYNC optional every time.

If you want pmem=on to mean something else not listed above, it
may be also OK, as long as the meaning of pmem=on doesn't depend
on the build time environment.

With the current version of the series, the user can't be sure if
pmem=on will enable MAP_SYNC or not, because its meaning depends
on the version of the headers when QEMU was compiled.


> 3) pmem=on, ?
> > 
> > 
> > > +For more details, please reference mmap(2) man page:
> > > +http://man7.org/linux/man-pages/man2/mmap.2.html.
> > > +
> > >  When using other types of backends, it's suggested to set 'unarmed'
> > >  option of '-device nvdimm' to 'on', which sets the unarmed flag of the
> > >  guest NVDIMM region mapping structure.  This unarmed flag indicates
> > > diff --git a/qemu-options.hx b/qemu-options.hx
> > > index 08f8516..0cd41f4 100644
> > > --- a/qemu-options.hx
> > > +++ b/qemu-options.hx
> > > @@ -4002,6 +4002,10 @@ using the SNIA NVM programming model (e.g. Intel NVDIMM).
> > >  If @option{pmem} is set to 'on', QEMU will take necessary operations to
> > >  guarantee the persistence of its own writes to @option{mem-path}
> > >  (e.g. in vNVDIMM label emulation and live migration).
> > > +Also, we will map the backend-file with MAP_SYNC flag, which can ensure
> > > +the file metadata is in sync to @option{mem-path} in case of host crash
> > > +or a power failure. MAP_SYNC requires support from both the host kernel
> > > +(since Linux kernel 4.15) and @option{mem-path} (only files supporting DAX).
> > >  
> > >  @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
> > >  
> > > -- 
> > > 2.7.4
> > > 
> > 
> > -- 
> > Eduardo

-- 
Eduardo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V10 4/4] docs: Added MAP_SYNC documentation
  2019-01-24 16:59       ` Eduardo Habkost
@ 2019-01-24 17:45         ` Michael S. Tsirkin
  2019-01-24 18:28           ` Eduardo Habkost
  0 siblings, 1 reply; 27+ messages in thread
From: Michael S. Tsirkin @ 2019-01-24 17:45 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, qemu-devel, imammedo, dan.j.williams

On Thu, Jan 24, 2019 at 02:59:26PM -0200, Eduardo Habkost wrote:
> On Thu, Jan 24, 2019 at 07:21:03PM +0800, Yi Zhang wrote:
> > On 2019-01-23 at 12:50:50 -0200, Eduardo Habkost wrote:
> > > On Wed, Jan 23, 2019 at 11:00:02AM +0800, Zhang, Yi wrote:
> > > > From: Zhang Yi <yi.z.zhang@linux.intel.com>
> > > > 
> > > > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > > > ---
> > > >  docs/nvdimm.txt | 29 ++++++++++++++++++++++++++++-
> > > >  qemu-options.hx |  4 ++++
> > > >  2 files changed, 32 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> > > > index 5f158a6..166c395 100644
> > > > --- a/docs/nvdimm.txt
> > > > +++ b/docs/nvdimm.txt
> > > > @@ -142,11 +142,38 @@ backend of vNVDIMM:
> > > >  Guest Data Persistence
> > > >  ----------------------
> > > >  
> > > > +vNVDIMM is designed and implemented to guarantee the guest data
> > > > +persistence on the backends in case of host crash or a power failures.
> > > > +However, there are still some requirements and limitations
> > > > +as explained below.
> > > > +
> > > >  Though QEMU supports multiple types of vNVDIMM backends on Linux,
> > > > -currently the only one that can guarantee the guest write persistence
> > > > +if MAP_SYNC is not supported by the host kernel and the backends,
> > > > +the only backend that can guarantee the guest write persistence
> > > >  is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
> > > >  which all guest access do not involve any host-side kernel cache.
> > > >  
> > > > +mmap(2) flag MAP_SYNC is added since Linux kernel 4.15. On such
> > > > +systems, QEMU can mmap(2) the dax backend files with MAP_SYNC, which
> > > > +ensures filesystem metadata consistency in case of a host crash or a power
> > > > +failure. Enabling MAP_SYNC in QEMU requires below conditions
> > > > +
> > > > + - 'pmem' option of memory-backend-file is 'on':
> > > > +   The backend is a file supporting DAX, e.g., a file on an ext4 or
> > > > +   xfs file system mounted with '-o dax'. if your pmem=on ,but the backend is
> > > > +   not a file supporting DAX, mapping with this flag results in an EOPNOTSUPP
> > > > +   error.
> > > 
> > > Won't this break existing configurations that work today on QEMU
> > > 3.1.0?  Why exactly it is OK to break compatibility here?
> > won't, pmem option default is off, if people who start VM don't know what
> > backend file is, it is suggested and *default to set pmem=off,
> > if people well know the backend file have dax capbility. it is suggest
> > to set pmem=on. 
> > 
> > For a special case that we use /dev/dax as backend, we already have a
> > patch to add MAP_SYNC falg mapiing from device dax mode.
> > see https://lkml.org/lkml/2018/4/22/524 
> > 
> > So, if people force set pmem=on, mapping a regular file, it will results
> > in an EOPNOTSUPP error. 
> 
> This is where compatibility is being broken, isn't it?  People
> currently using pmem=on on a regular file will start getting
> errors after a QEMU upgrade.  Existing VMs with pmem=on may stop
> booting.  Maybe this is OK, but we need to be able to explain why
> it is OK.

I think it's OK since pmem explicitly means "persistent":

The @option{pmem} option specifies whether the backing file specified
by @option{mem-path} is in host persistent memory that can be accessed
using the SNIA NVM programming model (e.g. Intel NVDIMM).
If @option{pmem} is set to 'on', QEMU will take necessary operations to
guarantee the persistence of its own writes to @option{mem-path}
(e.g. in vNVDIMM label emulation and live migration).




> > 
> > see http://man7.org/linux/man-pages/man2/mmap.2.html 
> > > 
> > > > +
> > > > + - 'share' option of memory-backend-file is 'on':
> > > > +   MAP_SYNC flag available only with the MAP_SHARED_VALIDATE mapping type.
> > > 
> > > I don't understand what this paragraph means.
> > see http://man7.org/linux/man-pages/man2/mmap.2.html 
> > > 
> > > > +
> > > > + - 'MAP_SYNC' is supported on linux kernel.(default opened since Linux 4.15)
> > > > +
> > > 
> > > I don't understand why you are making the semantics of
> > > command-line options change depending on the host kernel.
> > the option pmem=on do not dependent the host kernel. MAP_SYNC will be ignore
> > if the kernel don't support. the "pmem=on" have another meaning
> > see https://patchwork.kernel.org/patch/10459407/
> > > 
> > > > +Otherwise, We will ignore the MAP_SYNC flag.
> > > > +
> > > 
> > > See the questions I sent about supported use cases at
> > > <https://www.mail-archive.com/qemu-devel@nongnu.org/msg588822.html>.
> > > I still don't see those questions answered:
> > > 
> > > ] We have at least 3 different possible use cases we might need to
> > > ] support:
> > > ] 
> > > ] 1) pmem=on, MAP_SYNC not desired
> > > ] 2) pmem=on, MAP_SYNC desired but optional
> > > ] 3) pmem=on, MAP_SYNC required, not optional
> > > ] 
> > 
> > Sorry for my poor understanding, I don't know what these mean? 
> > pmem=on will force flag the MAP_SYNC while it capable on current kernel.
> > As we talk with Micheal if we set pmem=on , MAP_SYNC is always desired.
> > 
> > Means, if pmem=on, there is no option to close MAP_SYNC seprately.
> 
> I'm trying to find out what you need the code to do, and the 3
> items above are possible use cases that we might need to support.
> I'm not claiming we need to support all of them, but I would like
> to understand which ones you need to support.
> 
> Once we answer that, we can choose what's the command-line
> required for each case.  Right now this is not clear.

I think generally MAP_SYNC is required.
But for compatibility reasons we might need to support
!MAP_SYNC on old kernels even though it's risky.

> > 
> > 
> > > ] Which cases from the list above we need to support?
> > > ] 
> > > ] From the cases above, what's the expected semantics of "pmem=on"
> > > ] with no extra options?
> > > 
> > > It's not clear to me yet if you want to support use cases (1) and (2).
> > > 
> > > Also, you seem to be choosing between use case (1) or (3) depending on
> > > the build environment instead of command-line options.  The
> > > meaning of command-line options must be predictable and
> > > unambiguous, and not depend on build time variables.
> > so you are asking?
> > 1) pmem=on, MAP_SYNC not supported kernel
> > - MAP_SYNC will be defined 0 and will be ignored in this case. see 2/4.
> > 2) pmem=on, MAP_SYNC is supported but have a option to pass to mmap2()
> > - v7 send-out for a option sync to open/close MAP_SYNC seprately.
> > After talking with Micheal, we give up on a bit of flexibility, and
> > just say pmem=on forces MAP_SYNC. on a MAP_SYNC capable configrations(kernel+
> > backend dax)
> 
> I don't get this: you seem to be saying your series implement
> (2), but above you say that users will get an error if using
> pmem=on on a filesystem not supporting DAX, which means MAP_SYNC
> is required but not optional (3).
> 
> In either case, the choice between (1), (2) or (3) must depend
> only on command-line options, not on the QEMU build environment.
> "pmem=on" must always mean the same thing.
> 
> If pmem=on is documented as making MAP_SYNC required (not
> optional), it should make MAP_SYNC required every time.
> 
> If pmem=on is documented as making MAP_SYNC desired but optional,
> it should make MAP_SYNC optional every time.
> 
> If you want pmem=on to mean something else not listed above, it
> may be also OK, as long as the meaning of pmem=on doesn't depend
> on the build time environment.
> 
> With the current version of the series, the user can't be sure if
> pmem=on will enable MAP_SYNC or not, because its meaning depends
> on the version of the headers when QEMU was compiled.
> 


Yes I am confused too.

> > 3) pmem=on, ?
> > > 
> > > 
> > > > +For more details, please reference mmap(2) man page:
> > > > +http://man7.org/linux/man-pages/man2/mmap.2.html.
> > > > +
> > > >  When using other types of backends, it's suggested to set 'unarmed'
> > > >  option of '-device nvdimm' to 'on', which sets the unarmed flag of the
> > > >  guest NVDIMM region mapping structure.  This unarmed flag indicates
> > > > diff --git a/qemu-options.hx b/qemu-options.hx
> > > > index 08f8516..0cd41f4 100644
> > > > --- a/qemu-options.hx
> > > > +++ b/qemu-options.hx
> > > > @@ -4002,6 +4002,10 @@ using the SNIA NVM programming model (e.g. Intel NVDIMM).
> > > >  If @option{pmem} is set to 'on', QEMU will take necessary operations to
> > > >  guarantee the persistence of its own writes to @option{mem-path}
> > > >  (e.g. in vNVDIMM label emulation and live migration).
> > > > +Also, we will map the backend-file with MAP_SYNC flag, which can ensure
> > > > +the file metadata is in sync to @option{mem-path} in case of host crash
> > > > +or a power failure. MAP_SYNC requires support from both the host kernel
> > > > +(since Linux kernel 4.15) and @option{mem-path} (only files supporting DAX).
> > > >  
> > > >  @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
> > > >  
> > > > -- 
> > > > 2.7.4
> > > > 
> > > 
> > > -- 
> > > Eduardo
> 
> -- 
> Eduardo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V10 4/4] docs: Added MAP_SYNC documentation
  2019-01-24 17:45         ` Michael S. Tsirkin
@ 2019-01-24 18:28           ` Eduardo Habkost
  2019-01-24 19:05             ` Michael S. Tsirkin
  0 siblings, 1 reply; 27+ messages in thread
From: Eduardo Habkost @ 2019-01-24 18:28 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, qemu-devel, imammedo, dan.j.williams

On Thu, Jan 24, 2019 at 12:45:54PM -0500, Michael S. Tsirkin wrote:
> On Thu, Jan 24, 2019 at 02:59:26PM -0200, Eduardo Habkost wrote:
> > On Thu, Jan 24, 2019 at 07:21:03PM +0800, Yi Zhang wrote:
> > > On 2019-01-23 at 12:50:50 -0200, Eduardo Habkost wrote:
> > > > On Wed, Jan 23, 2019 at 11:00:02AM +0800, Zhang, Yi wrote:
> > > > > From: Zhang Yi <yi.z.zhang@linux.intel.com>
> > > > > 
> > > > > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
[...]
> > > > > + - 'pmem' option of memory-backend-file is 'on':
> > > > > +   The backend is a file supporting DAX, e.g., a file on an ext4 or
> > > > > +   xfs file system mounted with '-o dax'. if your pmem=on ,but the backend is
> > > > > +   not a file supporting DAX, mapping with this flag results in an EOPNOTSUPP
> > > > > +   error.
> > > > 
> > > > Won't this break existing configurations that work today on QEMU
> > > > 3.1.0?  Why exactly it is OK to break compatibility here?
> > > won't, pmem option default is off, if people who start VM don't know what
> > > backend file is, it is suggested and *default to set pmem=off,
> > > if people well know the backend file have dax capbility. it is suggest
> > > to set pmem=on. 
> > > 
> > > For a special case that we use /dev/dax as backend, we already have a
> > > patch to add MAP_SYNC falg mapiing from device dax mode.
> > > see https://lkml.org/lkml/2018/4/22/524 
> > > 
> > > So, if people force set pmem=on, mapping a regular file, it will results
> > > in an EOPNOTSUPP error. 
> > 
> > This is where compatibility is being broken, isn't it?  People
> > currently using pmem=on on a regular file will start getting
> > errors after a QEMU upgrade.  Existing VMs with pmem=on may stop
> > booting.  Maybe this is OK, but we need to be able to explain why
> > it is OK.
> 
> I think it's OK since pmem explicitly means "persistent":
> 
> The @option{pmem} option specifies whether the backing file specified
> by @option{mem-path} is in host persistent memory that can be accessed
> using the SNIA NVM programming model (e.g. Intel NVDIMM).
> If @option{pmem} is set to 'on', QEMU will take necessary operations to
> guarantee the persistence of its own writes to @option{mem-path}
> (e.g. in vNVDIMM label emulation and live migration).

If it's OK, let's at least explicitly document that we are
breaking compatibility in those cases.


> > > 
[...]
> I think generally MAP_SYNC is required.
> But for compatibility reasons we might need to support
> !MAP_SYNC on old kernels even though it's risky.

What about making MAP_SYNC optional only on older machine-types?

-- 
Eduardo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V10 4/4] docs: Added MAP_SYNC documentation
  2019-01-24 18:28           ` Eduardo Habkost
@ 2019-01-24 19:05             ` Michael S. Tsirkin
  2019-01-24 19:14               ` Eduardo Habkost
  0 siblings, 1 reply; 27+ messages in thread
From: Michael S. Tsirkin @ 2019-01-24 19:05 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, qemu-devel, imammedo, dan.j.williams

On Thu, Jan 24, 2019 at 04:28:39PM -0200, Eduardo Habkost wrote:
> On Thu, Jan 24, 2019 at 12:45:54PM -0500, Michael S. Tsirkin wrote:
> > On Thu, Jan 24, 2019 at 02:59:26PM -0200, Eduardo Habkost wrote:
> > > On Thu, Jan 24, 2019 at 07:21:03PM +0800, Yi Zhang wrote:
> > > > On 2019-01-23 at 12:50:50 -0200, Eduardo Habkost wrote:
> > > > > On Wed, Jan 23, 2019 at 11:00:02AM +0800, Zhang, Yi wrote:
> > > > > > From: Zhang Yi <yi.z.zhang@linux.intel.com>
> > > > > > 
> > > > > > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> [...]
> > > > > > + - 'pmem' option of memory-backend-file is 'on':
> > > > > > +   The backend is a file supporting DAX, e.g., a file on an ext4 or
> > > > > > +   xfs file system mounted with '-o dax'. if your pmem=on ,but the backend is
> > > > > > +   not a file supporting DAX, mapping with this flag results in an EOPNOTSUPP
> > > > > > +   error.
> > > > > 
> > > > > Won't this break existing configurations that work today on QEMU
> > > > > 3.1.0?  Why exactly it is OK to break compatibility here?
> > > > won't, pmem option default is off, if people who start VM don't know what
> > > > backend file is, it is suggested and *default to set pmem=off,
> > > > if people well know the backend file have dax capbility. it is suggest
> > > > to set pmem=on. 
> > > > 
> > > > For a special case that we use /dev/dax as backend, we already have a
> > > > patch to add MAP_SYNC falg mapiing from device dax mode.
> > > > see https://lkml.org/lkml/2018/4/22/524 
> > > > 
> > > > So, if people force set pmem=on, mapping a regular file, it will results
> > > > in an EOPNOTSUPP error. 
> > > 
> > > This is where compatibility is being broken, isn't it?  People
> > > currently using pmem=on on a regular file will start getting
> > > errors after a QEMU upgrade.  Existing VMs with pmem=on may stop
> > > booting.  Maybe this is OK, but we need to be able to explain why
> > > it is OK.
> > 
> > I think it's OK since pmem explicitly means "persistent":
> > 
> > The @option{pmem} option specifies whether the backing file specified
> > by @option{mem-path} is in host persistent memory that can be accessed
> > using the SNIA NVM programming model (e.g. Intel NVDIMM).
> > If @option{pmem} is set to 'on', QEMU will take necessary operations to
> > guarantee the persistence of its own writes to @option{mem-path}
> > (e.g. in vNVDIMM label emulation and live migration).
> 
> If it's OK, let's at least explicitly document that we are
> breaking compatibility in those cases.
> 
> 
> > > > 
> [...]
> > I think generally MAP_SYNC is required.
> > But for compatibility reasons we might need to support
> > !MAP_SYNC on old kernels even though it's risky.
> 
> What about making MAP_SYNC optional only on older machine-types?

I don't think this makes sense. It's not a guest visible change,
machine types are for that.

> -- 
> Eduardo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V10 4/4] docs: Added MAP_SYNC documentation
  2019-01-24 19:05             ` Michael S. Tsirkin
@ 2019-01-24 19:14               ` Eduardo Habkost
  2019-01-25  3:08                 ` Michael S. Tsirkin
  2019-01-25 11:19                 ` Yi Zhang
  0 siblings, 2 replies; 27+ messages in thread
From: Eduardo Habkost @ 2019-01-24 19:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, qemu-devel, imammedo, dan.j.williams

On Thu, Jan 24, 2019 at 02:05:45PM -0500, Michael S. Tsirkin wrote:
> On Thu, Jan 24, 2019 at 04:28:39PM -0200, Eduardo Habkost wrote:
> > On Thu, Jan 24, 2019 at 12:45:54PM -0500, Michael S. Tsirkin wrote:
> > > On Thu, Jan 24, 2019 at 02:59:26PM -0200, Eduardo Habkost wrote:
> > > > On Thu, Jan 24, 2019 at 07:21:03PM +0800, Yi Zhang wrote:
> > > > > On 2019-01-23 at 12:50:50 -0200, Eduardo Habkost wrote:
> > > > > > On Wed, Jan 23, 2019 at 11:00:02AM +0800, Zhang, Yi wrote:
> > > > > > > From: Zhang Yi <yi.z.zhang@linux.intel.com>
> > > > > > > 
> > > > > > > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > [...]
> > > > > > > + - 'pmem' option of memory-backend-file is 'on':
> > > > > > > +   The backend is a file supporting DAX, e.g., a file on an ext4 or
> > > > > > > +   xfs file system mounted with '-o dax'. if your pmem=on ,but the backend is
> > > > > > > +   not a file supporting DAX, mapping with this flag results in an EOPNOTSUPP
> > > > > > > +   error.
> > > > > > 
> > > > > > Won't this break existing configurations that work today on QEMU
> > > > > > 3.1.0?  Why exactly it is OK to break compatibility here?
> > > > > won't, pmem option default is off, if people who start VM don't know what
> > > > > backend file is, it is suggested and *default to set pmem=off,
> > > > > if people well know the backend file have dax capbility. it is suggest
> > > > > to set pmem=on. 
> > > > > 
> > > > > For a special case that we use /dev/dax as backend, we already have a
> > > > > patch to add MAP_SYNC falg mapiing from device dax mode.
> > > > > see https://lkml.org/lkml/2018/4/22/524 
> > > > > 
> > > > > So, if people force set pmem=on, mapping a regular file, it will results
> > > > > in an EOPNOTSUPP error. 
> > > > 
> > > > This is where compatibility is being broken, isn't it?  People
> > > > currently using pmem=on on a regular file will start getting
> > > > errors after a QEMU upgrade.  Existing VMs with pmem=on may stop
> > > > booting.  Maybe this is OK, but we need to be able to explain why
> > > > it is OK.
> > > 
> > > I think it's OK since pmem explicitly means "persistent":
> > > 
> > > The @option{pmem} option specifies whether the backing file specified
> > > by @option{mem-path} is in host persistent memory that can be accessed
> > > using the SNIA NVM programming model (e.g. Intel NVDIMM).
> > > If @option{pmem} is set to 'on', QEMU will take necessary operations to
> > > guarantee the persistence of its own writes to @option{mem-path}
> > > (e.g. in vNVDIMM label emulation and live migration).
> > 
> > If it's OK, let's at least explicitly document that we are
> > breaking compatibility in those cases.
> > 
> > 
> > > > > 
> > [...]
> > > I think generally MAP_SYNC is required.
> > > But for compatibility reasons we might need to support
> > > !MAP_SYNC on old kernels even though it's risky.
> > 
> > What about making MAP_SYNC optional only on older machine-types?
> 
> I don't think this makes sense. It's not a guest visible change,
> machine types are for that.

Losing data written to persistent memory is surely guest-visible
behavior.

-- 
Eduardo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V10 4/4] docs: Added MAP_SYNC documentation
  2019-01-24 19:14               ` Eduardo Habkost
@ 2019-01-25  3:08                 ` Michael S. Tsirkin
  2019-01-25  3:26                   ` Eduardo Habkost
  2019-01-25 11:19                 ` Yi Zhang
  1 sibling, 1 reply; 27+ messages in thread
From: Michael S. Tsirkin @ 2019-01-25  3:08 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, qemu-devel, imammedo, dan.j.williams

On Thu, Jan 24, 2019 at 05:14:43PM -0200, Eduardo Habkost wrote:
> On Thu, Jan 24, 2019 at 02:05:45PM -0500, Michael S. Tsirkin wrote:
> > On Thu, Jan 24, 2019 at 04:28:39PM -0200, Eduardo Habkost wrote:
> > > On Thu, Jan 24, 2019 at 12:45:54PM -0500, Michael S. Tsirkin wrote:
> > > > On Thu, Jan 24, 2019 at 02:59:26PM -0200, Eduardo Habkost wrote:
> > > > > On Thu, Jan 24, 2019 at 07:21:03PM +0800, Yi Zhang wrote:
> > > > > > On 2019-01-23 at 12:50:50 -0200, Eduardo Habkost wrote:
> > > > > > > On Wed, Jan 23, 2019 at 11:00:02AM +0800, Zhang, Yi wrote:
> > > > > > > > From: Zhang Yi <yi.z.zhang@linux.intel.com>
> > > > > > > > 
> > > > > > > > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > > [...]
> > > > > > > > + - 'pmem' option of memory-backend-file is 'on':
> > > > > > > > +   The backend is a file supporting DAX, e.g., a file on an ext4 or
> > > > > > > > +   xfs file system mounted with '-o dax'. if your pmem=on ,but the backend is
> > > > > > > > +   not a file supporting DAX, mapping with this flag results in an EOPNOTSUPP
> > > > > > > > +   error.
> > > > > > > 
> > > > > > > Won't this break existing configurations that work today on QEMU
> > > > > > > 3.1.0?  Why exactly it is OK to break compatibility here?
> > > > > > won't, pmem option default is off, if people who start VM don't know what
> > > > > > backend file is, it is suggested and *default to set pmem=off,
> > > > > > if people well know the backend file have dax capbility. it is suggest
> > > > > > to set pmem=on. 
> > > > > > 
> > > > > > For a special case that we use /dev/dax as backend, we already have a
> > > > > > patch to add MAP_SYNC falg mapiing from device dax mode.
> > > > > > see https://lkml.org/lkml/2018/4/22/524 
> > > > > > 
> > > > > > So, if people force set pmem=on, mapping a regular file, it will results
> > > > > > in an EOPNOTSUPP error. 
> > > > > 
> > > > > This is where compatibility is being broken, isn't it?  People
> > > > > currently using pmem=on on a regular file will start getting
> > > > > errors after a QEMU upgrade.  Existing VMs with pmem=on may stop
> > > > > booting.  Maybe this is OK, but we need to be able to explain why
> > > > > it is OK.
> > > > 
> > > > I think it's OK since pmem explicitly means "persistent":
> > > > 
> > > > The @option{pmem} option specifies whether the backing file specified
> > > > by @option{mem-path} is in host persistent memory that can be accessed
> > > > using the SNIA NVM programming model (e.g. Intel NVDIMM).
> > > > If @option{pmem} is set to 'on', QEMU will take necessary operations to
> > > > guarantee the persistence of its own writes to @option{mem-path}
> > > > (e.g. in vNVDIMM label emulation and live migration).
> > > 
> > > If it's OK, let's at least explicitly document that we are
> > > breaking compatibility in those cases.
> > > 
> > > 
> > > > > > 
> > > [...]
> > > > I think generally MAP_SYNC is required.
> > > > But for compatibility reasons we might need to support
> > > > !MAP_SYNC on old kernels even though it's risky.
> > > 
> > > What about making MAP_SYNC optional only on older machine-types?
> > 
> > I don't think this makes sense. It's not a guest visible change,
> > machine types are for that.
> 
> Losing data written to persistent memory is surely guest-visible
> behavior.

I think we need not be purists here. Most people don't lose power and
then it's fine and compatible. People who want more robustness need to
use more modern kernels, that is all.

> -- 
> Eduardo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V10 4/4] docs: Added MAP_SYNC documentation
  2019-01-25  3:08                 ` Michael S. Tsirkin
@ 2019-01-25  3:26                   ` Eduardo Habkost
  2019-01-25 20:01                     ` Michael S. Tsirkin
  2019-01-25 20:03                     ` Michael S. Tsirkin
  0 siblings, 2 replies; 27+ messages in thread
From: Eduardo Habkost @ 2019-01-25  3:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, qemu-devel, imammedo, dan.j.williams

On Thu, Jan 24, 2019 at 10:08:37PM -0500, Michael S. Tsirkin wrote:
> On Thu, Jan 24, 2019 at 05:14:43PM -0200, Eduardo Habkost wrote:
> > On Thu, Jan 24, 2019 at 02:05:45PM -0500, Michael S. Tsirkin wrote:
> > > On Thu, Jan 24, 2019 at 04:28:39PM -0200, Eduardo Habkost wrote:
> > > > On Thu, Jan 24, 2019 at 12:45:54PM -0500, Michael S. Tsirkin wrote:
> > > > > On Thu, Jan 24, 2019 at 02:59:26PM -0200, Eduardo Habkost wrote:
> > > > > > On Thu, Jan 24, 2019 at 07:21:03PM +0800, Yi Zhang wrote:
> > > > > > > On 2019-01-23 at 12:50:50 -0200, Eduardo Habkost wrote:
> > > > > > > > On Wed, Jan 23, 2019 at 11:00:02AM +0800, Zhang, Yi wrote:
> > > > > > > > > From: Zhang Yi <yi.z.zhang@linux.intel.com>
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > > > [...]
> > > > > > > > > + - 'pmem' option of memory-backend-file is 'on':
> > > > > > > > > +   The backend is a file supporting DAX, e.g., a file on an ext4 or
> > > > > > > > > +   xfs file system mounted with '-o dax'. if your pmem=on ,but the backend is
> > > > > > > > > +   not a file supporting DAX, mapping with this flag results in an EOPNOTSUPP
> > > > > > > > > +   error.
> > > > > > > > 
> > > > > > > > Won't this break existing configurations that work today on QEMU
> > > > > > > > 3.1.0?  Why exactly it is OK to break compatibility here?
> > > > > > > won't, pmem option default is off, if people who start VM don't know what
> > > > > > > backend file is, it is suggested and *default to set pmem=off,
> > > > > > > if people well know the backend file have dax capbility. it is suggest
> > > > > > > to set pmem=on. 
> > > > > > > 
> > > > > > > For a special case that we use /dev/dax as backend, we already have a
> > > > > > > patch to add MAP_SYNC falg mapiing from device dax mode.
> > > > > > > see https://lkml.org/lkml/2018/4/22/524 
> > > > > > > 
> > > > > > > So, if people force set pmem=on, mapping a regular file, it will results
> > > > > > > in an EOPNOTSUPP error. 
> > > > > > 
> > > > > > This is where compatibility is being broken, isn't it?  People
> > > > > > currently using pmem=on on a regular file will start getting
> > > > > > errors after a QEMU upgrade.  Existing VMs with pmem=on may stop
> > > > > > booting.  Maybe this is OK, but we need to be able to explain why
> > > > > > it is OK.
> > > > > 
> > > > > I think it's OK since pmem explicitly means "persistent":
> > > > > 
> > > > > The @option{pmem} option specifies whether the backing file specified
> > > > > by @option{mem-path} is in host persistent memory that can be accessed
> > > > > using the SNIA NVM programming model (e.g. Intel NVDIMM).
> > > > > If @option{pmem} is set to 'on', QEMU will take necessary operations to
> > > > > guarantee the persistence of its own writes to @option{mem-path}
> > > > > (e.g. in vNVDIMM label emulation and live migration).
> > > > 
> > > > If it's OK, let's at least explicitly document that we are
> > > > breaking compatibility in those cases.
> > > > 
> > > > 
> > > > > > > 
> > > > [...]
> > > > > I think generally MAP_SYNC is required.
> > > > > But for compatibility reasons we might need to support
> > > > > !MAP_SYNC on old kernels even though it's risky.
> > > > 
> > > > What about making MAP_SYNC optional only on older machine-types?
> > > 
> > > I don't think this makes sense. It's not a guest visible change,
> > > machine types are for that.
> > 
> > Losing data written to persistent memory is surely guest-visible
> > behavior.
> 
> I think we need not be purists here. Most people don't lose power and
> then it's fine and compatible. People who want more robustness need to
> use more modern kernels, that is all.

I don't think that's being purist.  I want to avoid hidden bugs
if we ignore that MAP_SYNC failed for any unexpected reason.  If
we need to ignore errors in some cases, let's at least limit that
to cases where we absolutely have to.

But I would also be happy with just a warning.

-- 
Eduardo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V10 4/4] docs: Added MAP_SYNC documentation
  2019-01-24 19:14               ` Eduardo Habkost
  2019-01-25  3:08                 ` Michael S. Tsirkin
@ 2019-01-25 11:19                 ` Yi Zhang
  1 sibling, 0 replies; 27+ messages in thread
From: Yi Zhang @ 2019-01-25 11:19 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Michael S. Tsirkin, pagupta, xiaoguangrong.eric, qemu-devel,
	yu.c.zhang, richardw.yang, stefanha, imammedo, pbonzini,
	dan.j.williams

On 2019-01-24 at 17:14:43 -0200, Eduardo Habkost wrote:
> On Thu, Jan 24, 2019 at 02:05:45PM -0500, Michael S. Tsirkin wrote:
> > On Thu, Jan 24, 2019 at 04:28:39PM -0200, Eduardo Habkost wrote:
> > > On Thu, Jan 24, 2019 at 12:45:54PM -0500, Michael S. Tsirkin wrote:
> > > > On Thu, Jan 24, 2019 at 02:59:26PM -0200, Eduardo Habkost wrote:
> > > > > On Thu, Jan 24, 2019 at 07:21:03PM +0800, Yi Zhang wrote:
> > > > > > On 2019-01-23 at 12:50:50 -0200, Eduardo Habkost wrote:
> > > > > > > On Wed, Jan 23, 2019 at 11:00:02AM +0800, Zhang, Yi wrote:
> > > > > > > > From: Zhang Yi <yi.z.zhang@linux.intel.com>
> > > > > > > > 
> > > > > > > > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > > [...]
> > > > > > > > + - 'pmem' option of memory-backend-file is 'on':
> > > > > > > > +   The backend is a file supporting DAX, e.g., a file on an ext4 or
> > > > > > > > +   xfs file system mounted with '-o dax'. if your pmem=on ,but the backend is
> > > > > > > > +   not a file supporting DAX, mapping with this flag results in an EOPNOTSUPP
> > > > > > > > +   error.
> > > > > > > 
> > > > > > > Won't this break existing configurations that work today on QEMU
> > > > > > > 3.1.0?  Why exactly it is OK to break compatibility here?
> > > > > > won't, pmem option default is off, if people who start VM don't know what
> > > > > > backend file is, it is suggested and *default to set pmem=off,
> > > > > > if people well know the backend file have dax capbility. it is suggest
> > > > > > to set pmem=on. 
> > > > > > 
> > > > > > For a special case that we use /dev/dax as backend, we already have a
> > > > > > patch to add MAP_SYNC falg mapiing from device dax mode.
> > > > > > see https://lkml.org/lkml/2018/4/22/524 
> > > > > > 
> > > > > > So, if people force set pmem=on, mapping a regular file, it will results
> > > > > > in an EOPNOTSUPP error. 
> > > > > 
> > > > > This is where compatibility is being broken, isn't it?  People
> > > > > currently using pmem=on on a regular file will start getting
> > > > > errors after a QEMU upgrade.  Existing VMs with pmem=on may stop
> > > > > booting.  Maybe this is OK, but we need to be able to explain why
> > > > > it is OK.
> > > > 
> > > > I think it's OK since pmem explicitly means "persistent":
> > > > 
> > > > The @option{pmem} option specifies whether the backing file specified
> > > > by @option{mem-path} is in host persistent memory that can be accessed
> > > > using the SNIA NVM programming model (e.g. Intel NVDIMM).
> > > > If @option{pmem} is set to 'on', QEMU will take necessary operations to
> > > > guarantee the persistence of its own writes to @option{mem-path}
> > > > (e.g. in vNVDIMM label emulation and live migration).
> > > 
> > > If it's OK, let's at least explicitly document that we are
> > > breaking compatibility in those cases.
Yes, I will add more explanation in those broken cases.
> > > 
> > > 
> > > > > > 
> > > [...]
> > > > I think generally MAP_SYNC is required.
> > > > But for compatibility reasons we might need to support
> > > > !MAP_SYNC on old kernels even though it's risky.
> > > 
> > > What about making MAP_SYNC optional only on older machine-types?
Isn't Older Machine-type compatiable with new kernel? 
> > 
> > I don't think this makes sense. It's not a guest visible change,
> > machine types are for that.
> 
> Losing data written to persistent memory is surely guest-visible
> behavior.

Guest always visit it is a persistent memory, but it is only a faked
"persistent" front-end, the only way to guarantee the persistent is make the
host back-end pmem=on, that is not a guest visible option.

> 
> -- 
> Eduardo
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V10 4/4] docs: Added MAP_SYNC documentation
  2019-01-25  3:26                   ` Eduardo Habkost
@ 2019-01-25 20:01                     ` Michael S. Tsirkin
  2019-01-25 20:03                     ` Michael S. Tsirkin
  1 sibling, 0 replies; 27+ messages in thread
From: Michael S. Tsirkin @ 2019-01-25 20:01 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, qemu-devel, imammedo, dan.j.williams

On Fri, Jan 25, 2019 at 01:26:53AM -0200, Eduardo Habkost wrote:
> On Thu, Jan 24, 2019 at 10:08:37PM -0500, Michael S. Tsirkin wrote:
> > On Thu, Jan 24, 2019 at 05:14:43PM -0200, Eduardo Habkost wrote:
> > > On Thu, Jan 24, 2019 at 02:05:45PM -0500, Michael S. Tsirkin wrote:
> > > > On Thu, Jan 24, 2019 at 04:28:39PM -0200, Eduardo Habkost wrote:
> > > > > On Thu, Jan 24, 2019 at 12:45:54PM -0500, Michael S. Tsirkin wrote:
> > > > > > On Thu, Jan 24, 2019 at 02:59:26PM -0200, Eduardo Habkost wrote:
> > > > > > > On Thu, Jan 24, 2019 at 07:21:03PM +0800, Yi Zhang wrote:
> > > > > > > > On 2019-01-23 at 12:50:50 -0200, Eduardo Habkost wrote:
> > > > > > > > > On Wed, Jan 23, 2019 at 11:00:02AM +0800, Zhang, Yi wrote:
> > > > > > > > > > From: Zhang Yi <yi.z.zhang@linux.intel.com>
> > > > > > > > > > 
> > > > > > > > > > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > > > > [...]
> > > > > > > > > > + - 'pmem' option of memory-backend-file is 'on':
> > > > > > > > > > +   The backend is a file supporting DAX, e.g., a file on an ext4 or
> > > > > > > > > > +   xfs file system mounted with '-o dax'. if your pmem=on ,but the backend is
> > > > > > > > > > +   not a file supporting DAX, mapping with this flag results in an EOPNOTSUPP
> > > > > > > > > > +   error.
> > > > > > > > > 
> > > > > > > > > Won't this break existing configurations that work today on QEMU
> > > > > > > > > 3.1.0?  Why exactly it is OK to break compatibility here?
> > > > > > > > won't, pmem option default is off, if people who start VM don't know what
> > > > > > > > backend file is, it is suggested and *default to set pmem=off,
> > > > > > > > if people well know the backend file have dax capbility. it is suggest
> > > > > > > > to set pmem=on. 
> > > > > > > > 
> > > > > > > > For a special case that we use /dev/dax as backend, we already have a
> > > > > > > > patch to add MAP_SYNC falg mapiing from device dax mode.
> > > > > > > > see https://lkml.org/lkml/2018/4/22/524 
> > > > > > > > 
> > > > > > > > So, if people force set pmem=on, mapping a regular file, it will results
> > > > > > > > in an EOPNOTSUPP error. 
> > > > > > > 
> > > > > > > This is where compatibility is being broken, isn't it?  People
> > > > > > > currently using pmem=on on a regular file will start getting
> > > > > > > errors after a QEMU upgrade.  Existing VMs with pmem=on may stop
> > > > > > > booting.  Maybe this is OK, but we need to be able to explain why
> > > > > > > it is OK.
> > > > > > 
> > > > > > I think it's OK since pmem explicitly means "persistent":
> > > > > > 
> > > > > > The @option{pmem} option specifies whether the backing file specified
> > > > > > by @option{mem-path} is in host persistent memory that can be accessed
> > > > > > using the SNIA NVM programming model (e.g. Intel NVDIMM).
> > > > > > If @option{pmem} is set to 'on', QEMU will take necessary operations to
> > > > > > guarantee the persistence of its own writes to @option{mem-path}
> > > > > > (e.g. in vNVDIMM label emulation and live migration).
> > > > > 
> > > > > If it's OK, let's at least explicitly document that we are
> > > > > breaking compatibility in those cases.
> > > > > 
> > > > > 
> > > > > > > > 
> > > > > [...]
> > > > > > I think generally MAP_SYNC is required.
> > > > > > But for compatibility reasons we might need to support
> > > > > > !MAP_SYNC on old kernels even though it's risky.
> > > > > 
> > > > > What about making MAP_SYNC optional only on older machine-types?
> > > > 
> > > > I don't think this makes sense. It's not a guest visible change,
> > > > machine types are for that.
> > > 
> > > Losing data written to persistent memory is surely guest-visible
> > > behavior.
> > 
> > I think we need not be purists here. Most people don't lose power and
> > then it's fine and compatible. People who want more robustness need to
> > use more modern kernels, that is all.
> 
> I don't think that's being purist.  I want to avoid hidden bugs
> if we ignore that MAP_SYNC failed for any unexpected reason.  If
> we need to ignore errors in some cases, let's at least limit that
> to cases where we absolutely have to.
> But I would also be happy with just a warning.

Makes sense to me. So if it fails with EOPNOTSUPP,
we try with MAP_SHARED_VALIDATE without MAP_SYNC.
If that succeeds then it's not a dax file, and we warn.
If it fails too then it's an old kernel and we
silently proceed for compatibility reasons.


> 
> -- 
> Eduardo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V10 4/4] docs: Added MAP_SYNC documentation
  2019-01-25  3:26                   ` Eduardo Habkost
  2019-01-25 20:01                     ` Michael S. Tsirkin
@ 2019-01-25 20:03                     ` Michael S. Tsirkin
  1 sibling, 0 replies; 27+ messages in thread
From: Michael S. Tsirkin @ 2019-01-25 20:03 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, qemu-devel, imammedo, dan.j.williams

On Fri, Jan 25, 2019 at 01:26:53AM -0200, Eduardo Habkost wrote:
> > I think we need not be purists here. Most people don't lose power and
> > then it's fine and compatible. People who want more robustness need to
> > use more modern kernels, that is all.
> 
> I don't think that's being purist.

Right I thought that you want to prevent kernel running this
on old kernels. Why it's nice to stay up to date this
flag was not backported into stable kernels so many users
can't really update.


> I want to avoid hidden bugs


Makes total sense, I posted a proposal.

> if we ignore that MAP_SYNC failed for any unexpected reason.  If
> we need to ignore errors in some cases, let's at least limit that
> to cases where we absolutely have to.
> 
> But I would also be happy with just a warning.
> 
> -- 
> Eduardo

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2019-01-25 20:03 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-23  2:59 [Qemu-devel] [PATCH V10 0/4] support MAP_SYNC for memory-backend-file Zhang, Yi
2019-01-23  2:59 ` [Qemu-devel] [PATCH V10 1/4] util/mmap-alloc: switch 'shared' to 'flags' parameter Zhang, Yi
2019-01-23 15:01   ` Michael S. Tsirkin
2019-01-23  2:59 ` [Qemu-devel] [PATCH V10 2/4] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap() Zhang, Yi
2019-01-23 15:04   ` Michael S. Tsirkin
2019-01-24 14:15     ` Yi Zhang
2019-01-24 13:39       ` Michael S. Tsirkin
2019-01-23  2:59 ` [Qemu-devel] [PATCH V10 3/4] hostmem: add more information in error messages Zhang, Yi
2019-01-23  3:00 ` [Qemu-devel] [PATCH V10 4/4] docs: Added MAP_SYNC documentation Zhang, Yi
2019-01-23 14:50   ` Eduardo Habkost
2019-01-24 11:21     ` Yi Zhang
2019-01-24 16:59       ` Eduardo Habkost
2019-01-24 17:45         ` Michael S. Tsirkin
2019-01-24 18:28           ` Eduardo Habkost
2019-01-24 19:05             ` Michael S. Tsirkin
2019-01-24 19:14               ` Eduardo Habkost
2019-01-25  3:08                 ` Michael S. Tsirkin
2019-01-25  3:26                   ` Eduardo Habkost
2019-01-25 20:01                     ` Michael S. Tsirkin
2019-01-25 20:03                     ` Michael S. Tsirkin
2019-01-25 11:19                 ` Yi Zhang
2019-01-23  3:01 ` [Qemu-devel] [PATCH V10 0/4] support MAP_SYNC for memory-backend-file Michael S. Tsirkin
2019-01-23  3:10   ` Yi Zhang
2019-01-23  3:53     ` Michael S. Tsirkin
2019-01-23 15:45       ` Yi Zhang
2019-01-23  4:02     ` Michael S. Tsirkin
2019-01-23 15:57       ` Yi Zhang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.