All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v14 0/2] support MAP_SYNC for memory-backend-file
@ 2019-04-22  0:48 ` Wei Yang
  0 siblings, 0 replies; 24+ messages in thread
From: Wei Yang @ 2019-04-22  0:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, mst, ehabkost, imammedo, dan.j.williams,
	yi.z.zhang

Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
files on ext4/xfs file system mounted with '-o dax').

A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
    https://patchwork.kernel.org/patch/10028151/

In order to make sure that the file metadata is in sync after a fault 
while we are writing a shared DAX supporting backend files, this
patch-set enables QEMU to use MAP_SYNC flag for memory-backend-dax-file.

As the DAX vs DMA truncated issue was solved, we refined the code and
send out this feature for the v5 version.

We will pass MAP_SYNC to mmap(2); if MAP_SYNC is supported and
'share=on' & 'pmem=on'. 
Or QEMU will not pass this flag to mmap(2)

Test with below cases:
1. pmem=on is set, shared=on is set, MAP_SYNC supported:
   a: backend is a dax supporting file.
   1) start VM1 with options:
   -object memory-backend-file,id=nv_be4,share,mem-path=${DAX_FILE_1},size=${DAX_FILE_SIZE_1},align=128M,pmem=on,share=on
   -device nvdimm,id=nv4,memdev=nv_be4,label-size=2M.
   
   2) start VM2 with options:
   -object memory-backend-file,id=nv_be4,share,mem-path=${DAX_FILE_2,size=${DAX_FILE_SIZE_2},align=128M,pmem=on,share=on
   -device nvdimm,id=nv4,memdev=nv_be4,label-size=2M.

   3) live migrate from VM1 to VM2.
   
   4) Suddenly let Host crash or power failure.

   5) check DAX_FILE_1 and DAX_FILE_2, no corrupt.

   b: backend is a regular file.
   1) start with options
   -object memory-backend-file,id=nv_be4,share,mem-path=${REG_FILE},size=${REG_FILE_SIZE},align=128M,pmem=on,share=on
   -device nvdimm,id=nv4,memdev=nv_be4,label-size=2M.

   will warning "failed to validate with mapping flags: Operation not supported"
   FILE_1 and FILE_2 random corrupt.

2. Other cases:
   FILE_1 and FILE_2 random corrupt.

Changes in V14:
 * 1/2 rebase on top of current upstream and tested

Changes in V13:
 * 4/5 Micheal: move the inlcude to mmap_alloc.c.
 * 4/5 Micheal: refine the warning message.
 * 5/5 Micheal: refine the Documentations.

Changes in V12:
 * 2/5: Micheal: Update update-linux-headers.sh
 * 3/5: Micheal: Use script update add linux/mman.h
 * 4/5: Pankaj,Micheal: 1) fallback to mmap without
        MAP_SYNC & MAP_SHARED_VALIDATE if sync not supported or failed
	2) Replace the include with 3/5 added linux/mman.h
 * 5/5: Micheal: Refine the Documentations.

Changes in V11:
 * 1/3: Micheal: Change to just add a bool is_pmem in qemu_ram_mmap.
 * 2/3: Micheal: Fix the compatibility for old kernel.
 * 2/3&3/3: Micheal&Eduardo :Update the behavior below: 
   Waning at no-dax and continue without MAP_SYNC.
   Test if fails again for compatibility, then remove the MAP_VALIDATE and
   silently proceed.

Changes in V10:
 * 4/4: refine the document.
 * 3/4: Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
 * 2/4: refine the commit message, Added MAP_SHARED_VALIDATE.
 * 2/4: Fix the wrong include header

Changes in V9:
 * 1/6: Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
 * 2/6: New Added: Micheal: use sparse feature define RAM_FLAG. 
 since I don't have much knowledge about the sparse feature, @Micheal Could you 
 add some documentation/commit message on this patch? Thank you very much.
 * 3/6: from 2/5: Eduardo: updated the commit message. 
 * 4/6: from 3/5: Micheal: don't ignore MAP_SYNC failures silently.
 * 5/6: from 4/5: Eduardo: updated the commit message.
 * 6/6: from 5/5: Micheal: Drop the sync option, document the MAP_SYNC.

Changes in v8:
 * Micheal: 3/5, remove the duplicated define in the os_dep.h
 * Micheal: 2/5, make type define safety.
 * Micheal: 2/5, fixed the incorrect define MAP_SHARE on qemu_anon_ram_alloc.
 * 4/6 removed, we remove the on/off/auto define of sync,  as by now,
   MAP_SYNC only worked with pmem=on.
 * @Micheal, I still reuse the RAM_SYNC flag, it is much straightforward to parse 
   all the flags in one parameter.

Changes in v7:
 * Micheal: [3,4,6]/6 limited the "sync" flag only on a nvdimm backend.(pmem=on)

Changes in v6:
 * Pankaj: 3/7 are squashed with 2/7
 * Pankaj: 7/7 update comments to "consistent filesystem metadata".
 * Pankaj, Igor: 1/7 Added Reviewed-by in patch-1/7
 * Stefan, 4/7 move the include header from "/linux/mman.h" to "osdep.h"
 * Stefan, 5/7 Add missing "munmap"
 * Stefan, 2/7 refine the shared/flag.

Changes in v5:
 * Add patch 1 to fix a memory leak issue.
 * Refine the patch 4-6
 * Remove the patch 3 as we already change the parameter from "shared" to
   "flags"

Changes in v4:
 * Add patch 1-3 to switch some functions to a single 'flags'
   parameters. (Michael S. Tsirkin)
 * v3 patch 1-3 become v4 patch 4-6.
 * Patch 4: move definitions of MAP_SYNC and MAP_SHARED_VALIDATE to a
   new header file under include/standard-headers/linux/. (Michael S. Tsirkin)
 * Patch 6: refine the description of the 'sync' option. (Michael S. Tsirkin)

Changes in v3:
 * Patch 1: add MAP_SHARED_VALIDATE in both sync=on and sync=auto
   cases, and add back the retry mechanism. MAP_SYNC will be ignored
   by Linux kernel 4.15 if MAP_SHARED_VALIDATE is missed.
 * Patch 1: define MAP_SYNC and MAP_SHARED_VALIDATE as 0 on non-Linux
   platforms in order to make qemu_ram_mmap() compile on those platforms.
 * Patch 2&3: include more information in error messages of
   memory-backend in hope to help user to identify the error.
   (Dr. David Alan Gilbert)
 * Patch 3: fix typo in the commit message. (Dr. David Alan Gilbert)

Changes in v2:
 * Add 'sync' option to control the use of MAP_SYNC. (Eduardo Habkost)
 * Remove the unnecessary set of MAP_SHARED_VALIDATE in some cases and
   the retry mechanism in qemu_ram_mmap(). (Michael S. Tsirkin)
 * Move OS dependent definitions of MAP_SYNC and MAP_SHARED_VALIDATE
   to osdep.h. (Michael S. Tsirkin)

Zhang Yi (2):
  util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  docs: Added MAP_SYNC documentation

 docs/nvdimm.txt   | 22 +++++++++++++++++++---
 qemu-options.hx   |  5 +++++
 util/mmap-alloc.c | 41 ++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 64 insertions(+), 4 deletions(-)

-- 
2.19.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v14 0/2] support MAP_SYNC for memory-backend-file
@ 2019-04-22  0:48 ` Wei Yang
  0 siblings, 0 replies; 24+ messages in thread
From: Wei Yang @ 2019-04-22  0:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: pagupta, xiaoguangrong.eric, mst, yi.z.zhang, yu.c.zhang,
	richardw.yang, stefanha, imammedo, pbonzini, dan.j.williams,
	ehabkost

Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
files on ext4/xfs file system mounted with '-o dax').

A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
    https://patchwork.kernel.org/patch/10028151/

In order to make sure that the file metadata is in sync after a fault 
while we are writing a shared DAX supporting backend files, this
patch-set enables QEMU to use MAP_SYNC flag for memory-backend-dax-file.

As the DAX vs DMA truncated issue was solved, we refined the code and
send out this feature for the v5 version.

We will pass MAP_SYNC to mmap(2); if MAP_SYNC is supported and
'share=on' & 'pmem=on'. 
Or QEMU will not pass this flag to mmap(2)

Test with below cases:
1. pmem=on is set, shared=on is set, MAP_SYNC supported:
   a: backend is a dax supporting file.
   1) start VM1 with options:
   -object memory-backend-file,id=nv_be4,share,mem-path=${DAX_FILE_1},size=${DAX_FILE_SIZE_1},align=128M,pmem=on,share=on
   -device nvdimm,id=nv4,memdev=nv_be4,label-size=2M.
   
   2) start VM2 with options:
   -object memory-backend-file,id=nv_be4,share,mem-path=${DAX_FILE_2,size=${DAX_FILE_SIZE_2},align=128M,pmem=on,share=on
   -device nvdimm,id=nv4,memdev=nv_be4,label-size=2M.

   3) live migrate from VM1 to VM2.
   
   4) Suddenly let Host crash or power failure.

   5) check DAX_FILE_1 and DAX_FILE_2, no corrupt.

   b: backend is a regular file.
   1) start with options
   -object memory-backend-file,id=nv_be4,share,mem-path=${REG_FILE},size=${REG_FILE_SIZE},align=128M,pmem=on,share=on
   -device nvdimm,id=nv4,memdev=nv_be4,label-size=2M.

   will warning "failed to validate with mapping flags: Operation not supported"
   FILE_1 and FILE_2 random corrupt.

2. Other cases:
   FILE_1 and FILE_2 random corrupt.

Changes in V14:
 * 1/2 rebase on top of current upstream and tested

Changes in V13:
 * 4/5 Micheal: move the inlcude to mmap_alloc.c.
 * 4/5 Micheal: refine the warning message.
 * 5/5 Micheal: refine the Documentations.

Changes in V12:
 * 2/5: Micheal: Update update-linux-headers.sh
 * 3/5: Micheal: Use script update add linux/mman.h
 * 4/5: Pankaj,Micheal: 1) fallback to mmap without
        MAP_SYNC & MAP_SHARED_VALIDATE if sync not supported or failed
	2) Replace the include with 3/5 added linux/mman.h
 * 5/5: Micheal: Refine the Documentations.

Changes in V11:
 * 1/3: Micheal: Change to just add a bool is_pmem in qemu_ram_mmap.
 * 2/3: Micheal: Fix the compatibility for old kernel.
 * 2/3&3/3: Micheal&Eduardo :Update the behavior below: 
   Waning at no-dax and continue without MAP_SYNC.
   Test if fails again for compatibility, then remove the MAP_VALIDATE and
   silently proceed.

Changes in V10:
 * 4/4: refine the document.
 * 3/4: Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
 * 2/4: refine the commit message, Added MAP_SHARED_VALIDATE.
 * 2/4: Fix the wrong include header

Changes in V9:
 * 1/6: Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
 * 2/6: New Added: Micheal: use sparse feature define RAM_FLAG. 
 since I don't have much knowledge about the sparse feature, @Micheal Could you 
 add some documentation/commit message on this patch? Thank you very much.
 * 3/6: from 2/5: Eduardo: updated the commit message. 
 * 4/6: from 3/5: Micheal: don't ignore MAP_SYNC failures silently.
 * 5/6: from 4/5: Eduardo: updated the commit message.
 * 6/6: from 5/5: Micheal: Drop the sync option, document the MAP_SYNC.

Changes in v8:
 * Micheal: 3/5, remove the duplicated define in the os_dep.h
 * Micheal: 2/5, make type define safety.
 * Micheal: 2/5, fixed the incorrect define MAP_SHARE on qemu_anon_ram_alloc.
 * 4/6 removed, we remove the on/off/auto define of sync,  as by now,
   MAP_SYNC only worked with pmem=on.
 * @Micheal, I still reuse the RAM_SYNC flag, it is much straightforward to parse 
   all the flags in one parameter.

Changes in v7:
 * Micheal: [3,4,6]/6 limited the "sync" flag only on a nvdimm backend.(pmem=on)

Changes in v6:
 * Pankaj: 3/7 are squashed with 2/7
 * Pankaj: 7/7 update comments to "consistent filesystem metadata".
 * Pankaj, Igor: 1/7 Added Reviewed-by in patch-1/7
 * Stefan, 4/7 move the include header from "/linux/mman.h" to "osdep.h"
 * Stefan, 5/7 Add missing "munmap"
 * Stefan, 2/7 refine the shared/flag.

Changes in v5:
 * Add patch 1 to fix a memory leak issue.
 * Refine the patch 4-6
 * Remove the patch 3 as we already change the parameter from "shared" to
   "flags"

Changes in v4:
 * Add patch 1-3 to switch some functions to a single 'flags'
   parameters. (Michael S. Tsirkin)
 * v3 patch 1-3 become v4 patch 4-6.
 * Patch 4: move definitions of MAP_SYNC and MAP_SHARED_VALIDATE to a
   new header file under include/standard-headers/linux/. (Michael S. Tsirkin)
 * Patch 6: refine the description of the 'sync' option. (Michael S. Tsirkin)

Changes in v3:
 * Patch 1: add MAP_SHARED_VALIDATE in both sync=on and sync=auto
   cases, and add back the retry mechanism. MAP_SYNC will be ignored
   by Linux kernel 4.15 if MAP_SHARED_VALIDATE is missed.
 * Patch 1: define MAP_SYNC and MAP_SHARED_VALIDATE as 0 on non-Linux
   platforms in order to make qemu_ram_mmap() compile on those platforms.
 * Patch 2&3: include more information in error messages of
   memory-backend in hope to help user to identify the error.
   (Dr. David Alan Gilbert)
 * Patch 3: fix typo in the commit message. (Dr. David Alan Gilbert)

Changes in v2:
 * Add 'sync' option to control the use of MAP_SYNC. (Eduardo Habkost)
 * Remove the unnecessary set of MAP_SHARED_VALIDATE in some cases and
   the retry mechanism in qemu_ram_mmap(). (Michael S. Tsirkin)
 * Move OS dependent definitions of MAP_SYNC and MAP_SHARED_VALIDATE
   to osdep.h. (Michael S. Tsirkin)

Zhang Yi (2):
  util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  docs: Added MAP_SYNC documentation

 docs/nvdimm.txt   | 22 +++++++++++++++++++---
 qemu-options.hx   |  5 +++++
 util/mmap-alloc.c | 41 ++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 64 insertions(+), 4 deletions(-)

-- 
2.19.1



^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v14 1/2] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
@ 2019-04-22  0:48   ` Wei Yang
  0 siblings, 0 replies; 24+ messages in thread
From: Wei Yang @ 2019-04-22  0:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, mst, ehabkost, imammedo, dan.j.williams,
	yi.z.zhang, Haozhong Zhang

From: Zhang Yi <yi.z.zhang@linux.intel.com>

When a file supporting DAX is used as vNVDIMM backend, mmap it with
MAP_SYNC flag in addition which can ensure file system metadata
synced in each guest writes to the backend file, without other QEMU
actions (e.g., periodic fsync() by QEMU).

Current, We have below different possible use cases:

1. pmem=on is set, shared=on is set, MAP_SYNC supported:
   a: backend is a dax supporting file.
    - MAP_SYNC will active.
   b: backend is not a dax supporting file.
    - mmap will trigger a warning. then MAP_SYNC flag will be ignored

2. The rest of cases:
   - we will never pass the MAP_SYNC to mmap2

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
[ehabkost: Rebased patch to latest code on master]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Tested-by: Wei Yang <richardw.yang@linux.intel.com>

---
v14: rebase on top of current upstream
---
 util/mmap-alloc.c | 41 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 9713f4b960..f7f177d0ea 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -10,6 +10,13 @@
  * later.  See the COPYING file in the top-level directory.
  */
 
+#ifdef CONFIG_LINUX
+#include <linux/mman.h>
+#else  /* !CONFIG_LINUX */
+#define MAP_SYNC              0x0
+#define MAP_SHARED_VALIDATE   0x0
+#endif /* CONFIG_LINUX */
+
 #include "qemu/osdep.h"
 #include "qemu/mmap-alloc.h"
 #include "qemu/host-utils.h"
@@ -82,6 +89,7 @@ void *qemu_ram_mmap(int fd,
                     bool is_pmem)
 {
     int flags;
+    int map_sync_flags = 0;
     int guardfd;
     size_t offset;
     size_t pagesize;
@@ -132,9 +140,40 @@ void *qemu_ram_mmap(int fd,
     flags = MAP_FIXED;
     flags |= fd == -1 ? MAP_ANONYMOUS : 0;
     flags |= shared ? MAP_SHARED : MAP_PRIVATE;
+    if (shared && is_pmem) {
+        map_sync_flags = MAP_SYNC | MAP_SHARED_VALIDATE;
+    }
+
     offset = QEMU_ALIGN_UP((uintptr_t)guardptr, align) - (uintptr_t)guardptr;
 
-    ptr = mmap(guardptr + offset, size, PROT_READ | PROT_WRITE, flags, fd, 0);
+    ptr = mmap(guardptr + offset, size, PROT_READ | PROT_WRITE,
+               flags | map_sync_flags, fd, 0);
+
+    if (ptr == MAP_FAILED && map_sync_flags) {
+        if (errno == ENOTSUP) {
+            char *proc_link, *file_name;
+            int len;
+            proc_link = g_strdup_printf("/proc/self/fd/%d", fd);
+            file_name = g_malloc0(PATH_MAX);
+            len = readlink(proc_link, file_name, PATH_MAX - 1);
+            if (len < 0) {
+                len = 0;
+            }
+            file_name[len] = '\0';
+            fprintf(stderr, "Warning: requesting persistence across crashes "
+                    "for backend file %s failed. Proceeding without "
+                    "persistence, data might become corrupted in case of host "
+                    "crash.\n", file_name);
+            g_free(proc_link);
+            g_free(file_name);
+        }
+        /*
+         * if map failed with MAP_SHARED_VALIDATE | MAP_SYNC,
+         * we will remove these flags to handle compatibility.
+         */
+        ptr = mmap(guardptr + offset, size, PROT_READ | PROT_WRITE,
+                   flags, fd, 0);
+    }
 
     if (ptr == MAP_FAILED) {
         munmap(guardptr, total);
-- 
2.19.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v14 1/2] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
@ 2019-04-22  0:48   ` Wei Yang
  0 siblings, 0 replies; 24+ messages in thread
From: Wei Yang @ 2019-04-22  0:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: pagupta, xiaoguangrong.eric, mst, Haozhong Zhang, yi.z.zhang,
	yu.c.zhang, richardw.yang, stefanha, imammedo, pbonzini,
	dan.j.williams, ehabkost

From: Zhang Yi <yi.z.zhang@linux.intel.com>

When a file supporting DAX is used as vNVDIMM backend, mmap it with
MAP_SYNC flag in addition which can ensure file system metadata
synced in each guest writes to the backend file, without other QEMU
actions (e.g., periodic fsync() by QEMU).

Current, We have below different possible use cases:

1. pmem=on is set, shared=on is set, MAP_SYNC supported:
   a: backend is a dax supporting file.
    - MAP_SYNC will active.
   b: backend is not a dax supporting file.
    - mmap will trigger a warning. then MAP_SYNC flag will be ignored

2. The rest of cases:
   - we will never pass the MAP_SYNC to mmap2

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
[ehabkost: Rebased patch to latest code on master]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Tested-by: Wei Yang <richardw.yang@linux.intel.com>

---
v14: rebase on top of current upstream
---
 util/mmap-alloc.c | 41 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 9713f4b960..f7f177d0ea 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -10,6 +10,13 @@
  * later.  See the COPYING file in the top-level directory.
  */
 
+#ifdef CONFIG_LINUX
+#include <linux/mman.h>
+#else  /* !CONFIG_LINUX */
+#define MAP_SYNC              0x0
+#define MAP_SHARED_VALIDATE   0x0
+#endif /* CONFIG_LINUX */
+
 #include "qemu/osdep.h"
 #include "qemu/mmap-alloc.h"
 #include "qemu/host-utils.h"
@@ -82,6 +89,7 @@ void *qemu_ram_mmap(int fd,
                     bool is_pmem)
 {
     int flags;
+    int map_sync_flags = 0;
     int guardfd;
     size_t offset;
     size_t pagesize;
@@ -132,9 +140,40 @@ void *qemu_ram_mmap(int fd,
     flags = MAP_FIXED;
     flags |= fd == -1 ? MAP_ANONYMOUS : 0;
     flags |= shared ? MAP_SHARED : MAP_PRIVATE;
+    if (shared && is_pmem) {
+        map_sync_flags = MAP_SYNC | MAP_SHARED_VALIDATE;
+    }
+
     offset = QEMU_ALIGN_UP((uintptr_t)guardptr, align) - (uintptr_t)guardptr;
 
-    ptr = mmap(guardptr + offset, size, PROT_READ | PROT_WRITE, flags, fd, 0);
+    ptr = mmap(guardptr + offset, size, PROT_READ | PROT_WRITE,
+               flags | map_sync_flags, fd, 0);
+
+    if (ptr == MAP_FAILED && map_sync_flags) {
+        if (errno == ENOTSUP) {
+            char *proc_link, *file_name;
+            int len;
+            proc_link = g_strdup_printf("/proc/self/fd/%d", fd);
+            file_name = g_malloc0(PATH_MAX);
+            len = readlink(proc_link, file_name, PATH_MAX - 1);
+            if (len < 0) {
+                len = 0;
+            }
+            file_name[len] = '\0';
+            fprintf(stderr, "Warning: requesting persistence across crashes "
+                    "for backend file %s failed. Proceeding without "
+                    "persistence, data might become corrupted in case of host "
+                    "crash.\n", file_name);
+            g_free(proc_link);
+            g_free(file_name);
+        }
+        /*
+         * if map failed with MAP_SHARED_VALIDATE | MAP_SYNC,
+         * we will remove these flags to handle compatibility.
+         */
+        ptr = mmap(guardptr + offset, size, PROT_READ | PROT_WRITE,
+                   flags, fd, 0);
+    }
 
     if (ptr == MAP_FAILED) {
         munmap(guardptr, total);
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v14 2/2] docs: Added MAP_SYNC documentation
@ 2019-04-22  0:48   ` Wei Yang
  0 siblings, 0 replies; 24+ messages in thread
From: Wei Yang @ 2019-04-22  0:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, mst, ehabkost, imammedo, dan.j.williams,
	yi.z.zhang

From: Zhang Yi <yi.z.zhang@linux.intel.com>

Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
---
 docs/nvdimm.txt | 22 +++++++++++++++++++---
 qemu-options.hx |  5 +++++
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index 7231c2d78f..bcd1456e72 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -144,9 +144,25 @@ Guest Data Persistence
 ----------------------
 
 Though QEMU supports multiple types of vNVDIMM backends on Linux,
-currently the only one that can guarantee the guest write persistence
-is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
-which all guest access do not involve any host-side kernel cache.
+the only backend that can guarantee the guest write persistence is:
+
+A. DAX device (e.g., /dev/dax0.0, ) or
+B. DAX file(mounted with dax option)
+
+When using B (A file supporting direct mapping of persistent memory)
+as a backend, write persistence is guaranteed if the host kernel has
+support for the MAP_SYNC flag in the mmap system call (available
+since Linux 4.15 and on certain distro kernels) and additionally
+both 'pmem' and 'share' flags are set to 'on' on the backend.
+
+If these conditions are not satisfied i.e. if either 'pmem' or 'share'
+are not set, if the backend file does not support DAX or if MAP_SYNC
+is not supported by the host kernel, write persistence is not
+guaranteed after a system crash. For compatibility reasons, these
+conditions are silently ignored if not satisfied. Currently, no way
+is provided to test for them.
+For more details, please reference mmap(2) man page:
+http://man7.org/linux/man-pages/man2/mmap.2.html.
 
 When using other types of backends, it's suggested to set 'unarmed'
 option of '-device nvdimm' to 'on', which sets the unarmed flag of the
diff --git a/qemu-options.hx b/qemu-options.hx
index 08749a3391..bdc74c0620 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4233,6 +4233,11 @@ using the SNIA NVM programming model (e.g. Intel NVDIMM).
 If @option{pmem} is set to 'on', QEMU will take necessary operations to
 guarantee the persistence of its own writes to @option{mem-path}
 (e.g. in vNVDIMM label emulation and live migration).
+Also, we will map the backend-file with MAP_SYNC flag, which ensures the
+file metadata is in sync for @option{mem-path} in case of host crash
+or a power failure. MAP_SYNC requires support from both the host kernel
+(since Linux kernel 4.15) and the filesystem of @option{mem-path} mounted
+with DAX option.
 
 @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
 
-- 
2.19.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v14 2/2] docs: Added MAP_SYNC documentation
@ 2019-04-22  0:48   ` Wei Yang
  0 siblings, 0 replies; 24+ messages in thread
From: Wei Yang @ 2019-04-22  0:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: pagupta, xiaoguangrong.eric, mst, yi.z.zhang, yu.c.zhang,
	richardw.yang, stefanha, imammedo, pbonzini, dan.j.williams,
	ehabkost

From: Zhang Yi <yi.z.zhang@linux.intel.com>

Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
---
 docs/nvdimm.txt | 22 +++++++++++++++++++---
 qemu-options.hx |  5 +++++
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index 7231c2d78f..bcd1456e72 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -144,9 +144,25 @@ Guest Data Persistence
 ----------------------
 
 Though QEMU supports multiple types of vNVDIMM backends on Linux,
-currently the only one that can guarantee the guest write persistence
-is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
-which all guest access do not involve any host-side kernel cache.
+the only backend that can guarantee the guest write persistence is:
+
+A. DAX device (e.g., /dev/dax0.0, ) or
+B. DAX file(mounted with dax option)
+
+When using B (A file supporting direct mapping of persistent memory)
+as a backend, write persistence is guaranteed if the host kernel has
+support for the MAP_SYNC flag in the mmap system call (available
+since Linux 4.15 and on certain distro kernels) and additionally
+both 'pmem' and 'share' flags are set to 'on' on the backend.
+
+If these conditions are not satisfied i.e. if either 'pmem' or 'share'
+are not set, if the backend file does not support DAX or if MAP_SYNC
+is not supported by the host kernel, write persistence is not
+guaranteed after a system crash. For compatibility reasons, these
+conditions are silently ignored if not satisfied. Currently, no way
+is provided to test for them.
+For more details, please reference mmap(2) man page:
+http://man7.org/linux/man-pages/man2/mmap.2.html.
 
 When using other types of backends, it's suggested to set 'unarmed'
 option of '-device nvdimm' to 'on', which sets the unarmed flag of the
diff --git a/qemu-options.hx b/qemu-options.hx
index 08749a3391..bdc74c0620 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4233,6 +4233,11 @@ using the SNIA NVM programming model (e.g. Intel NVDIMM).
 If @option{pmem} is set to 'on', QEMU will take necessary operations to
 guarantee the persistence of its own writes to @option{mem-path}
 (e.g. in vNVDIMM label emulation and live migration).
+Also, we will map the backend-file with MAP_SYNC flag, which ensures the
+file metadata is in sync for @option{mem-path} in case of host crash
+or a power failure. MAP_SYNC requires support from both the host kernel
+(since Linux kernel 4.15) and the filesystem of @option{mem-path} mounted
+with DAX option.
 
 @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
 
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v14 0/2] support MAP_SYNC for memory-backend-file
@ 2019-04-22 12:34   ` Michael S. Tsirkin
  0 siblings, 0 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2019-04-22 12:34 UTC (permalink / raw)
  To: Wei Yang
  Cc: qemu-devel, xiaoguangrong.eric, stefanha, pbonzini, pagupta,
	yu.c.zhang, ehabkost, imammedo, dan.j.williams, yi.z.zhang

On Mon, Apr 22, 2019 at 08:48:47AM +0800, Wei Yang wrote:
> Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
> guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
> files on ext4/xfs file system mounted with '-o dax').
> 
> A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
>     https://patchwork.kernel.org/patch/10028151/
> 
> In order to make sure that the file metadata is in sync after a fault 
> while we are writing a shared DAX supporting backend files, this
> patch-set enables QEMU to use MAP_SYNC flag for memory-backend-dax-file.
> 
> As the DAX vs DMA truncated issue was solved, we refined the code and
> send out this feature for the v5 version.
> 
> We will pass MAP_SYNC to mmap(2); if MAP_SYNC is supported and
> 'share=on' & 'pmem=on'. 
> Or QEMU will not pass this flag to mmap(2)

OK this is in a good shape. As we are in freeze anyway,
there's still a bit more time to polish it. I have a couple of
suggestions:

- squash docs in same patch with code, no need for two patches
- mmap errors are not silently ignored as the doc says,
  a warning is produced

Also, it might make sense to send the warnings to an errp object and not stderr.
I would leave that to a follow-up patch.


> Test with below cases:
> 1. pmem=on is set, shared=on is set, MAP_SYNC supported:
>    a: backend is a dax supporting file.
>    1) start VM1 with options:
>    -object memory-backend-file,id=nv_be4,share,mem-path=${DAX_FILE_1},size=${DAX_FILE_SIZE_1},align=128M,pmem=on,share=on
>    -device nvdimm,id=nv4,memdev=nv_be4,label-size=2M.
>    
>    2) start VM2 with options:
>    -object memory-backend-file,id=nv_be4,share,mem-path=${DAX_FILE_2,size=${DAX_FILE_SIZE_2},align=128M,pmem=on,share=on
>    -device nvdimm,id=nv4,memdev=nv_be4,label-size=2M.
> 
>    3) live migrate from VM1 to VM2.
>    
>    4) Suddenly let Host crash or power failure.
> 
>    5) check DAX_FILE_1 and DAX_FILE_2, no corrupt.
> 
>    b: backend is a regular file.
>    1) start with options
>    -object memory-backend-file,id=nv_be4,share,mem-path=${REG_FILE},size=${REG_FILE_SIZE},align=128M,pmem=on,share=on
>    -device nvdimm,id=nv4,memdev=nv_be4,label-size=2M.
> 
>    will warning "failed to validate with mapping flags: Operation not supported"
>    FILE_1 and FILE_2 random corrupt.
> 
> 2. Other cases:
>    FILE_1 and FILE_2 random corrupt.
> 
> Changes in V14:
>  * 1/2 rebase on top of current upstream and tested
> 
> Changes in V13:
>  * 4/5 Micheal: move the inlcude to mmap_alloc.c.
>  * 4/5 Micheal: refine the warning message.
>  * 5/5 Micheal: refine the Documentations.
> 
> Changes in V12:
>  * 2/5: Micheal: Update update-linux-headers.sh
>  * 3/5: Micheal: Use script update add linux/mman.h
>  * 4/5: Pankaj,Micheal: 1) fallback to mmap without
>         MAP_SYNC & MAP_SHARED_VALIDATE if sync not supported or failed
> 	2) Replace the include with 3/5 added linux/mman.h
>  * 5/5: Micheal: Refine the Documentations.
> 
> Changes in V11:
>  * 1/3: Micheal: Change to just add a bool is_pmem in qemu_ram_mmap.
>  * 2/3: Micheal: Fix the compatibility for old kernel.
>  * 2/3&3/3: Micheal&Eduardo :Update the behavior below: 
>    Waning at no-dax and continue without MAP_SYNC.
>    Test if fails again for compatibility, then remove the MAP_VALIDATE and
>    silently proceed.
> 
> Changes in V10:
>  * 4/4: refine the document.
>  * 3/4: Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
>  * 2/4: refine the commit message, Added MAP_SHARED_VALIDATE.
>  * 2/4: Fix the wrong include header
> 
> Changes in V9:
>  * 1/6: Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
>  * 2/6: New Added: Micheal: use sparse feature define RAM_FLAG. 
>  since I don't have much knowledge about the sparse feature, @Micheal Could you 
>  add some documentation/commit message on this patch? Thank you very much.
>  * 3/6: from 2/5: Eduardo: updated the commit message. 
>  * 4/6: from 3/5: Micheal: don't ignore MAP_SYNC failures silently.
>  * 5/6: from 4/5: Eduardo: updated the commit message.
>  * 6/6: from 5/5: Micheal: Drop the sync option, document the MAP_SYNC.
> 
> Changes in v8:
>  * Micheal: 3/5, remove the duplicated define in the os_dep.h
>  * Micheal: 2/5, make type define safety.
>  * Micheal: 2/5, fixed the incorrect define MAP_SHARE on qemu_anon_ram_alloc.
>  * 4/6 removed, we remove the on/off/auto define of sync,  as by now,
>    MAP_SYNC only worked with pmem=on.
>  * @Micheal, I still reuse the RAM_SYNC flag, it is much straightforward to parse 
>    all the flags in one parameter.
> 
> Changes in v7:
>  * Micheal: [3,4,6]/6 limited the "sync" flag only on a nvdimm backend.(pmem=on)
> 
> Changes in v6:
>  * Pankaj: 3/7 are squashed with 2/7
>  * Pankaj: 7/7 update comments to "consistent filesystem metadata".
>  * Pankaj, Igor: 1/7 Added Reviewed-by in patch-1/7
>  * Stefan, 4/7 move the include header from "/linux/mman.h" to "osdep.h"
>  * Stefan, 5/7 Add missing "munmap"
>  * Stefan, 2/7 refine the shared/flag.
> 
> Changes in v5:
>  * Add patch 1 to fix a memory leak issue.
>  * Refine the patch 4-6
>  * Remove the patch 3 as we already change the parameter from "shared" to
>    "flags"
> 
> Changes in v4:
>  * Add patch 1-3 to switch some functions to a single 'flags'
>    parameters. (Michael S. Tsirkin)
>  * v3 patch 1-3 become v4 patch 4-6.
>  * Patch 4: move definitions of MAP_SYNC and MAP_SHARED_VALIDATE to a
>    new header file under include/standard-headers/linux/. (Michael S. Tsirkin)
>  * Patch 6: refine the description of the 'sync' option. (Michael S. Tsirkin)
> 
> Changes in v3:
>  * Patch 1: add MAP_SHARED_VALIDATE in both sync=on and sync=auto
>    cases, and add back the retry mechanism. MAP_SYNC will be ignored
>    by Linux kernel 4.15 if MAP_SHARED_VALIDATE is missed.
>  * Patch 1: define MAP_SYNC and MAP_SHARED_VALIDATE as 0 on non-Linux
>    platforms in order to make qemu_ram_mmap() compile on those platforms.
>  * Patch 2&3: include more information in error messages of
>    memory-backend in hope to help user to identify the error.
>    (Dr. David Alan Gilbert)
>  * Patch 3: fix typo in the commit message. (Dr. David Alan Gilbert)
> 
> Changes in v2:
>  * Add 'sync' option to control the use of MAP_SYNC. (Eduardo Habkost)
>  * Remove the unnecessary set of MAP_SHARED_VALIDATE in some cases and
>    the retry mechanism in qemu_ram_mmap(). (Michael S. Tsirkin)
>  * Move OS dependent definitions of MAP_SYNC and MAP_SHARED_VALIDATE
>    to osdep.h. (Michael S. Tsirkin)
> 
> Zhang Yi (2):
>   util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
>   docs: Added MAP_SYNC documentation
> 
>  docs/nvdimm.txt   | 22 +++++++++++++++++++---
>  qemu-options.hx   |  5 +++++
>  util/mmap-alloc.c | 41 ++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 64 insertions(+), 4 deletions(-)
> 
> -- 
> 2.19.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v14 0/2] support MAP_SYNC for memory-backend-file
@ 2019-04-22 12:34   ` Michael S. Tsirkin
  0 siblings, 0 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2019-04-22 12:34 UTC (permalink / raw)
  To: Wei Yang
  Cc: pagupta, xiaoguangrong.eric, qemu-devel, yi.z.zhang, yu.c.zhang,
	stefanha, imammedo, pbonzini, dan.j.williams, ehabkost

On Mon, Apr 22, 2019 at 08:48:47AM +0800, Wei Yang wrote:
> Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
> guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
> files on ext4/xfs file system mounted with '-o dax').
> 
> A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
>     https://patchwork.kernel.org/patch/10028151/
> 
> In order to make sure that the file metadata is in sync after a fault 
> while we are writing a shared DAX supporting backend files, this
> patch-set enables QEMU to use MAP_SYNC flag for memory-backend-dax-file.
> 
> As the DAX vs DMA truncated issue was solved, we refined the code and
> send out this feature for the v5 version.
> 
> We will pass MAP_SYNC to mmap(2); if MAP_SYNC is supported and
> 'share=on' & 'pmem=on'. 
> Or QEMU will not pass this flag to mmap(2)

OK this is in a good shape. As we are in freeze anyway,
there's still a bit more time to polish it. I have a couple of
suggestions:

- squash docs in same patch with code, no need for two patches
- mmap errors are not silently ignored as the doc says,
  a warning is produced

Also, it might make sense to send the warnings to an errp object and not stderr.
I would leave that to a follow-up patch.


> Test with below cases:
> 1. pmem=on is set, shared=on is set, MAP_SYNC supported:
>    a: backend is a dax supporting file.
>    1) start VM1 with options:
>    -object memory-backend-file,id=nv_be4,share,mem-path=${DAX_FILE_1},size=${DAX_FILE_SIZE_1},align=128M,pmem=on,share=on
>    -device nvdimm,id=nv4,memdev=nv_be4,label-size=2M.
>    
>    2) start VM2 with options:
>    -object memory-backend-file,id=nv_be4,share,mem-path=${DAX_FILE_2,size=${DAX_FILE_SIZE_2},align=128M,pmem=on,share=on
>    -device nvdimm,id=nv4,memdev=nv_be4,label-size=2M.
> 
>    3) live migrate from VM1 to VM2.
>    
>    4) Suddenly let Host crash or power failure.
> 
>    5) check DAX_FILE_1 and DAX_FILE_2, no corrupt.
> 
>    b: backend is a regular file.
>    1) start with options
>    -object memory-backend-file,id=nv_be4,share,mem-path=${REG_FILE},size=${REG_FILE_SIZE},align=128M,pmem=on,share=on
>    -device nvdimm,id=nv4,memdev=nv_be4,label-size=2M.
> 
>    will warning "failed to validate with mapping flags: Operation not supported"
>    FILE_1 and FILE_2 random corrupt.
> 
> 2. Other cases:
>    FILE_1 and FILE_2 random corrupt.
> 
> Changes in V14:
>  * 1/2 rebase on top of current upstream and tested
> 
> Changes in V13:
>  * 4/5 Micheal: move the inlcude to mmap_alloc.c.
>  * 4/5 Micheal: refine the warning message.
>  * 5/5 Micheal: refine the Documentations.
> 
> Changes in V12:
>  * 2/5: Micheal: Update update-linux-headers.sh
>  * 3/5: Micheal: Use script update add linux/mman.h
>  * 4/5: Pankaj,Micheal: 1) fallback to mmap without
>         MAP_SYNC & MAP_SHARED_VALIDATE if sync not supported or failed
> 	2) Replace the include with 3/5 added linux/mman.h
>  * 5/5: Micheal: Refine the Documentations.
> 
> Changes in V11:
>  * 1/3: Micheal: Change to just add a bool is_pmem in qemu_ram_mmap.
>  * 2/3: Micheal: Fix the compatibility for old kernel.
>  * 2/3&3/3: Micheal&Eduardo :Update the behavior below: 
>    Waning at no-dax and continue without MAP_SYNC.
>    Test if fails again for compatibility, then remove the MAP_VALIDATE and
>    silently proceed.
> 
> Changes in V10:
>  * 4/4: refine the document.
>  * 3/4: Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
>  * 2/4: refine the commit message, Added MAP_SHARED_VALIDATE.
>  * 2/4: Fix the wrong include header
> 
> Changes in V9:
>  * 1/6: Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
>  * 2/6: New Added: Micheal: use sparse feature define RAM_FLAG. 
>  since I don't have much knowledge about the sparse feature, @Micheal Could you 
>  add some documentation/commit message on this patch? Thank you very much.
>  * 3/6: from 2/5: Eduardo: updated the commit message. 
>  * 4/6: from 3/5: Micheal: don't ignore MAP_SYNC failures silently.
>  * 5/6: from 4/5: Eduardo: updated the commit message.
>  * 6/6: from 5/5: Micheal: Drop the sync option, document the MAP_SYNC.
> 
> Changes in v8:
>  * Micheal: 3/5, remove the duplicated define in the os_dep.h
>  * Micheal: 2/5, make type define safety.
>  * Micheal: 2/5, fixed the incorrect define MAP_SHARE on qemu_anon_ram_alloc.
>  * 4/6 removed, we remove the on/off/auto define of sync,  as by now,
>    MAP_SYNC only worked with pmem=on.
>  * @Micheal, I still reuse the RAM_SYNC flag, it is much straightforward to parse 
>    all the flags in one parameter.
> 
> Changes in v7:
>  * Micheal: [3,4,6]/6 limited the "sync" flag only on a nvdimm backend.(pmem=on)
> 
> Changes in v6:
>  * Pankaj: 3/7 are squashed with 2/7
>  * Pankaj: 7/7 update comments to "consistent filesystem metadata".
>  * Pankaj, Igor: 1/7 Added Reviewed-by in patch-1/7
>  * Stefan, 4/7 move the include header from "/linux/mman.h" to "osdep.h"
>  * Stefan, 5/7 Add missing "munmap"
>  * Stefan, 2/7 refine the shared/flag.
> 
> Changes in v5:
>  * Add patch 1 to fix a memory leak issue.
>  * Refine the patch 4-6
>  * Remove the patch 3 as we already change the parameter from "shared" to
>    "flags"
> 
> Changes in v4:
>  * Add patch 1-3 to switch some functions to a single 'flags'
>    parameters. (Michael S. Tsirkin)
>  * v3 patch 1-3 become v4 patch 4-6.
>  * Patch 4: move definitions of MAP_SYNC and MAP_SHARED_VALIDATE to a
>    new header file under include/standard-headers/linux/. (Michael S. Tsirkin)
>  * Patch 6: refine the description of the 'sync' option. (Michael S. Tsirkin)
> 
> Changes in v3:
>  * Patch 1: add MAP_SHARED_VALIDATE in both sync=on and sync=auto
>    cases, and add back the retry mechanism. MAP_SYNC will be ignored
>    by Linux kernel 4.15 if MAP_SHARED_VALIDATE is missed.
>  * Patch 1: define MAP_SYNC and MAP_SHARED_VALIDATE as 0 on non-Linux
>    platforms in order to make qemu_ram_mmap() compile on those platforms.
>  * Patch 2&3: include more information in error messages of
>    memory-backend in hope to help user to identify the error.
>    (Dr. David Alan Gilbert)
>  * Patch 3: fix typo in the commit message. (Dr. David Alan Gilbert)
> 
> Changes in v2:
>  * Add 'sync' option to control the use of MAP_SYNC. (Eduardo Habkost)
>  * Remove the unnecessary set of MAP_SHARED_VALIDATE in some cases and
>    the retry mechanism in qemu_ram_mmap(). (Michael S. Tsirkin)
>  * Move OS dependent definitions of MAP_SYNC and MAP_SHARED_VALIDATE
>    to osdep.h. (Michael S. Tsirkin)
> 
> Zhang Yi (2):
>   util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
>   docs: Added MAP_SYNC documentation
> 
>  docs/nvdimm.txt   | 22 +++++++++++++++++++---
>  qemu-options.hx   |  5 +++++
>  util/mmap-alloc.c | 41 ++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 64 insertions(+), 4 deletions(-)
> 
> -- 
> 2.19.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v14 0/2] support MAP_SYNC for memory-backend-file
@ 2019-04-22 18:22     ` Eduardo Habkost
  0 siblings, 0 replies; 24+ messages in thread
From: Eduardo Habkost @ 2019-04-22 18:22 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Wei Yang, qemu-devel, xiaoguangrong.eric, stefanha, pbonzini,
	pagupta, yu.c.zhang, imammedo, dan.j.williams, yi.z.zhang

On Mon, Apr 22, 2019 at 08:34:51AM -0400, Michael S. Tsirkin wrote:
> On Mon, Apr 22, 2019 at 08:48:47AM +0800, Wei Yang wrote:
> > Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
> > guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
> > files on ext4/xfs file system mounted with '-o dax').
> > 
> > A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
> >     https://patchwork.kernel.org/patch/10028151/
> > 
> > In order to make sure that the file metadata is in sync after a fault 
> > while we are writing a shared DAX supporting backend files, this
> > patch-set enables QEMU to use MAP_SYNC flag for memory-backend-dax-file.
> > 
> > As the DAX vs DMA truncated issue was solved, we refined the code and
> > send out this feature for the v5 version.
> > 
> > We will pass MAP_SYNC to mmap(2); if MAP_SYNC is supported and
> > 'share=on' & 'pmem=on'. 
> > Or QEMU will not pass this flag to mmap(2)
> 
> OK this is in a good shape. As we are in freeze anyway,
> there's still a bit more time to polish it. I have a couple of
> suggestions:
> 
> - squash docs in same patch with code, no need for two patches
> - mmap errors are not silently ignored as the doc says,
>   a warning is produced

Note that a warning is produced only if both share=on and pmem=on
is specified.  If using pmem=on without share=on, no warning is
printed at all.

I agree we could squash the docs in the same patch, but I don't
want to prevent the code from being merged and require v15 to be
sent just because we are still polishing the documentation.

If there are no objections, I plan to apply this version of the
series including the following fixup (just removing the word
"silently"), and I suggest further improvements to be sent as
follow up patches.

diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index bcd1456e72..b531cacd35 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -159,8 +159,8 @@ If these conditions are not satisfied i.e. if either 'pmem' or 'share'
 are not set, if the backend file does not support DAX or if MAP_SYNC
 is not supported by the host kernel, write persistence is not
 guaranteed after a system crash. For compatibility reasons, these
-conditions are silently ignored if not satisfied. Currently, no way
-is provided to test for them.
+conditions are ignored if not satisfied. Currently, no way is
+provided to test for them.
 For more details, please reference mmap(2) man page:
 http://man7.org/linux/man-pages/man2/mmap.2.html.
 
-- 
Eduardo

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v14 0/2] support MAP_SYNC for memory-backend-file
@ 2019-04-22 18:22     ` Eduardo Habkost
  0 siblings, 0 replies; 24+ messages in thread
From: Eduardo Habkost @ 2019-04-22 18:22 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: pagupta, xiaoguangrong.eric, qemu-devel, yi.z.zhang, yu.c.zhang,
	Wei Yang, stefanha, imammedo, pbonzini, dan.j.williams

On Mon, Apr 22, 2019 at 08:34:51AM -0400, Michael S. Tsirkin wrote:
> On Mon, Apr 22, 2019 at 08:48:47AM +0800, Wei Yang wrote:
> > Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
> > guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
> > files on ext4/xfs file system mounted with '-o dax').
> > 
> > A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
> >     https://patchwork.kernel.org/patch/10028151/
> > 
> > In order to make sure that the file metadata is in sync after a fault 
> > while we are writing a shared DAX supporting backend files, this
> > patch-set enables QEMU to use MAP_SYNC flag for memory-backend-dax-file.
> > 
> > As the DAX vs DMA truncated issue was solved, we refined the code and
> > send out this feature for the v5 version.
> > 
> > We will pass MAP_SYNC to mmap(2); if MAP_SYNC is supported and
> > 'share=on' & 'pmem=on'. 
> > Or QEMU will not pass this flag to mmap(2)
> 
> OK this is in a good shape. As we are in freeze anyway,
> there's still a bit more time to polish it. I have a couple of
> suggestions:
> 
> - squash docs in same patch with code, no need for two patches
> - mmap errors are not silently ignored as the doc says,
>   a warning is produced

Note that a warning is produced only if both share=on and pmem=on
is specified.  If using pmem=on without share=on, no warning is
printed at all.

I agree we could squash the docs in the same patch, but I don't
want to prevent the code from being merged and require v15 to be
sent just because we are still polishing the documentation.

If there are no objections, I plan to apply this version of the
series including the following fixup (just removing the word
"silently"), and I suggest further improvements to be sent as
follow up patches.

diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index bcd1456e72..b531cacd35 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -159,8 +159,8 @@ If these conditions are not satisfied i.e. if either 'pmem' or 'share'
 are not set, if the backend file does not support DAX or if MAP_SYNC
 is not supported by the host kernel, write persistence is not
 guaranteed after a system crash. For compatibility reasons, these
-conditions are silently ignored if not satisfied. Currently, no way
-is provided to test for them.
+conditions are ignored if not satisfied. Currently, no way is
+provided to test for them.
 For more details, please reference mmap(2) man page:
 http://man7.org/linux/man-pages/man2/mmap.2.html.
 
-- 
Eduardo


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v14 0/2] support MAP_SYNC for memory-backend-file
@ 2019-04-23  2:41       ` Wei Yang
  0 siblings, 0 replies; 24+ messages in thread
From: Wei Yang @ 2019-04-23  2:41 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Michael S. Tsirkin, Wei Yang, qemu-devel, xiaoguangrong.eric,
	stefanha, pbonzini, pagupta, yu.c.zhang, imammedo,
	dan.j.williams, yi.z.zhang

On Mon, Apr 22, 2019 at 03:22:55PM -0300, Eduardo Habkost wrote:
>On Mon, Apr 22, 2019 at 08:34:51AM -0400, Michael S. Tsirkin wrote:
>> On Mon, Apr 22, 2019 at 08:48:47AM +0800, Wei Yang wrote:
>> > Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
>> > guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
>> > files on ext4/xfs file system mounted with '-o dax').
>> > 
>> > A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
>> >     https://patchwork.kernel.org/patch/10028151/
>> > 
>> > In order to make sure that the file metadata is in sync after a fault 
>> > while we are writing a shared DAX supporting backend files, this
>> > patch-set enables QEMU to use MAP_SYNC flag for memory-backend-dax-file.
>> > 
>> > As the DAX vs DMA truncated issue was solved, we refined the code and
>> > send out this feature for the v5 version.
>> > 
>> > We will pass MAP_SYNC to mmap(2); if MAP_SYNC is supported and
>> > 'share=on' & 'pmem=on'. 
>> > Or QEMU will not pass this flag to mmap(2)
>> 
>> OK this is in a good shape. As we are in freeze anyway,
>> there's still a bit more time to polish it. I have a couple of
>> suggestions:
>> 
>> - squash docs in same patch with code, no need for two patches
>> - mmap errors are not silently ignored as the doc says,
>>   a warning is produced
>
>Note that a warning is produced only if both share=on and pmem=on
>is specified.  If using pmem=on without share=on, no warning is
>printed at all.
>
>I agree we could squash the docs in the same patch, but I don't
>want to prevent the code from being merged and require v15 to be
>sent just because we are still polishing the documentation.
>
>If there are no objections, I plan to apply this version of the
>series including the following fixup (just removing the word
>"silently"), and I suggest further improvements to be sent as
>follow up patches.
>

If my understanding is correct, the following up patch is:

"send the warnings to an errp object and not stderr"

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v14 0/2] support MAP_SYNC for memory-backend-file
@ 2019-04-23  2:41       ` Wei Yang
  0 siblings, 0 replies; 24+ messages in thread
From: Wei Yang @ 2019-04-23  2:41 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: pagupta, xiaoguangrong.eric, Michael S. Tsirkin, qemu-devel,
	yi.z.zhang, yu.c.zhang, Wei Yang, stefanha, imammedo, pbonzini,
	dan.j.williams

On Mon, Apr 22, 2019 at 03:22:55PM -0300, Eduardo Habkost wrote:
>On Mon, Apr 22, 2019 at 08:34:51AM -0400, Michael S. Tsirkin wrote:
>> On Mon, Apr 22, 2019 at 08:48:47AM +0800, Wei Yang wrote:
>> > Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
>> > guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
>> > files on ext4/xfs file system mounted with '-o dax').
>> > 
>> > A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
>> >     https://patchwork.kernel.org/patch/10028151/
>> > 
>> > In order to make sure that the file metadata is in sync after a fault 
>> > while we are writing a shared DAX supporting backend files, this
>> > patch-set enables QEMU to use MAP_SYNC flag for memory-backend-dax-file.
>> > 
>> > As the DAX vs DMA truncated issue was solved, we refined the code and
>> > send out this feature for the v5 version.
>> > 
>> > We will pass MAP_SYNC to mmap(2); if MAP_SYNC is supported and
>> > 'share=on' & 'pmem=on'. 
>> > Or QEMU will not pass this flag to mmap(2)
>> 
>> OK this is in a good shape. As we are in freeze anyway,
>> there's still a bit more time to polish it. I have a couple of
>> suggestions:
>> 
>> - squash docs in same patch with code, no need for two patches
>> - mmap errors are not silently ignored as the doc says,
>>   a warning is produced
>
>Note that a warning is produced only if both share=on and pmem=on
>is specified.  If using pmem=on without share=on, no warning is
>printed at all.
>
>I agree we could squash the docs in the same patch, but I don't
>want to prevent the code from being merged and require v15 to be
>sent just because we are still polishing the documentation.
>
>If there are no objections, I plan to apply this version of the
>series including the following fixup (just removing the word
>"silently"), and I suggest further improvements to be sent as
>follow up patches.
>

If my understanding is correct, the following up patch is:

"send the warnings to an errp object and not stderr"

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v14 1/2] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
@ 2019-04-23  9:25     ` Stefan Hajnoczi
  0 siblings, 0 replies; 24+ messages in thread
From: Stefan Hajnoczi @ 2019-04-23  9:25 UTC (permalink / raw)
  To: Wei Yang
  Cc: qemu-devel, xiaoguangrong.eric, pbonzini, pagupta, yu.c.zhang,
	mst, ehabkost, imammedo, dan.j.williams, yi.z.zhang,
	Haozhong Zhang

[-- Attachment #1: Type: text/plain, Size: 2317 bytes --]

On Mon, Apr 22, 2019 at 08:48:48AM +0800, Wei Yang wrote:
> From: Zhang Yi <yi.z.zhang@linux.intel.com>
> 
> When a file supporting DAX is used as vNVDIMM backend, mmap it with
> MAP_SYNC flag in addition which can ensure file system metadata
> synced in each guest writes to the backend file, without other QEMU
> actions (e.g., periodic fsync() by QEMU).
> 
> Current, We have below different possible use cases:
> 
> 1. pmem=on is set, shared=on is set, MAP_SYNC supported:
>    a: backend is a dax supporting file.
>     - MAP_SYNC will active.
>    b: backend is not a dax supporting file.
>     - mmap will trigger a warning. then MAP_SYNC flag will be ignored
> 
> 2. The rest of cases:
>    - we will never pass the MAP_SYNC to mmap2
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> [ehabkost: Rebased patch to latest code on master]
> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> Tested-by: Wei Yang <richardw.yang@linux.intel.com>
> 
> ---
> v14: rebase on top of current upstream
> ---
>  util/mmap-alloc.c | 41 ++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 40 insertions(+), 1 deletion(-)
> 
> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> index 9713f4b960..f7f177d0ea 100644
> --- a/util/mmap-alloc.c
> +++ b/util/mmap-alloc.c
> @@ -10,6 +10,13 @@
>   * later.  See the COPYING file in the top-level directory.
>   */
>  
> +#ifdef CONFIG_LINUX
> +#include <linux/mman.h>
> +#else  /* !CONFIG_LINUX */
> +#define MAP_SYNC              0x0
> +#define MAP_SHARED_VALIDATE   0x0
> +#endif /* CONFIG_LINUX */

MAP_SHARED_VALIDATE is is from 2017:

  commit 1c9725974074a047f6080eecc62c50a8e840d050
  Author: Dan Williams <dan.j.williams@intel.com>
  Date:   Wed Nov 1 16:36:30 2017 +0100

    mm: introduce MAP_SHARED_VALIDATE, a mechanism to safely define new mmap flags

This code fails to compile on Linux hosts with pre-4.15 headers.

Instead you could use the following (even on Linux!):

  #ifndef MAP_SYNC
  #define MAP_SYNC 0
  #endif
  #ifndef MAP_SHARED_VALIDATE
  #define MAP_SHARED_VALIDATE 0
  #endif

Either way:

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v14 1/2] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
@ 2019-04-23  9:25     ` Stefan Hajnoczi
  0 siblings, 0 replies; 24+ messages in thread
From: Stefan Hajnoczi @ 2019-04-23  9:25 UTC (permalink / raw)
  To: Wei Yang
  Cc: pagupta, xiaoguangrong.eric, mst, qemu-devel, yi.z.zhang,
	yu.c.zhang, Haozhong Zhang, imammedo, pbonzini, dan.j.williams,
	ehabkost

[-- Attachment #1: Type: text/plain, Size: 2317 bytes --]

On Mon, Apr 22, 2019 at 08:48:48AM +0800, Wei Yang wrote:
> From: Zhang Yi <yi.z.zhang@linux.intel.com>
> 
> When a file supporting DAX is used as vNVDIMM backend, mmap it with
> MAP_SYNC flag in addition which can ensure file system metadata
> synced in each guest writes to the backend file, without other QEMU
> actions (e.g., periodic fsync() by QEMU).
> 
> Current, We have below different possible use cases:
> 
> 1. pmem=on is set, shared=on is set, MAP_SYNC supported:
>    a: backend is a dax supporting file.
>     - MAP_SYNC will active.
>    b: backend is not a dax supporting file.
>     - mmap will trigger a warning. then MAP_SYNC flag will be ignored
> 
> 2. The rest of cases:
>    - we will never pass the MAP_SYNC to mmap2
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> [ehabkost: Rebased patch to latest code on master]
> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> Tested-by: Wei Yang <richardw.yang@linux.intel.com>
> 
> ---
> v14: rebase on top of current upstream
> ---
>  util/mmap-alloc.c | 41 ++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 40 insertions(+), 1 deletion(-)
> 
> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> index 9713f4b960..f7f177d0ea 100644
> --- a/util/mmap-alloc.c
> +++ b/util/mmap-alloc.c
> @@ -10,6 +10,13 @@
>   * later.  See the COPYING file in the top-level directory.
>   */
>  
> +#ifdef CONFIG_LINUX
> +#include <linux/mman.h>
> +#else  /* !CONFIG_LINUX */
> +#define MAP_SYNC              0x0
> +#define MAP_SHARED_VALIDATE   0x0
> +#endif /* CONFIG_LINUX */

MAP_SHARED_VALIDATE is is from 2017:

  commit 1c9725974074a047f6080eecc62c50a8e840d050
  Author: Dan Williams <dan.j.williams@intel.com>
  Date:   Wed Nov 1 16:36:30 2017 +0100

    mm: introduce MAP_SHARED_VALIDATE, a mechanism to safely define new mmap flags

This code fails to compile on Linux hosts with pre-4.15 headers.

Instead you could use the following (even on Linux!):

  #ifndef MAP_SYNC
  #define MAP_SYNC 0
  #endif
  #ifndef MAP_SHARED_VALIDATE
  #define MAP_SHARED_VALIDATE 0
  #endif

Either way:

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v14 2/2] docs: Added MAP_SYNC documentation
@ 2019-04-23  9:26     ` Stefan Hajnoczi
  0 siblings, 0 replies; 24+ messages in thread
From: Stefan Hajnoczi @ 2019-04-23  9:26 UTC (permalink / raw)
  To: Wei Yang
  Cc: qemu-devel, xiaoguangrong.eric, pbonzini, pagupta, yu.c.zhang,
	mst, ehabkost, imammedo, dan.j.williams, yi.z.zhang

[-- Attachment #1: Type: text/plain, Size: 360 bytes --]

On Mon, Apr 22, 2019 at 08:48:49AM +0800, Wei Yang wrote:
> From: Zhang Yi <yi.z.zhang@linux.intel.com>
> 
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> ---
>  docs/nvdimm.txt | 22 +++++++++++++++++++---
>  qemu-options.hx |  5 +++++
>  2 files changed, 24 insertions(+), 3 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v14 2/2] docs: Added MAP_SYNC documentation
@ 2019-04-23  9:26     ` Stefan Hajnoczi
  0 siblings, 0 replies; 24+ messages in thread
From: Stefan Hajnoczi @ 2019-04-23  9:26 UTC (permalink / raw)
  To: Wei Yang
  Cc: pagupta, xiaoguangrong.eric, mst, qemu-devel, yi.z.zhang,
	yu.c.zhang, imammedo, pbonzini, dan.j.williams, ehabkost

[-- Attachment #1: Type: text/plain, Size: 360 bytes --]

On Mon, Apr 22, 2019 at 08:48:49AM +0800, Wei Yang wrote:
> From: Zhang Yi <yi.z.zhang@linux.intel.com>
> 
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> ---
>  docs/nvdimm.txt | 22 +++++++++++++++++++---
>  qemu-options.hx |  5 +++++
>  2 files changed, 24 insertions(+), 3 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v14 2/2] docs: Added MAP_SYNC documentation
@ 2019-04-23  9:57     ` Pankaj Gupta
  0 siblings, 0 replies; 24+ messages in thread
From: Pankaj Gupta @ 2019-04-23  9:57 UTC (permalink / raw)
  To: Wei Yang
  Cc: qemu-devel, xiaoguangrong eric, mst, yi z zhang, yu c zhang,
	stefanha, imammedo, pbonzini, dan j williams, ehabkost


> 
> From: Zhang Yi <yi.z.zhang@linux.intel.com>
> 
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> ---
>  docs/nvdimm.txt | 22 +++++++++++++++++++---
>  qemu-options.hx |  5 +++++
>  2 files changed, 24 insertions(+), 3 deletions(-)
> 
> diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> index 7231c2d78f..bcd1456e72 100644
> --- a/docs/nvdimm.txt
> +++ b/docs/nvdimm.txt
> @@ -144,9 +144,25 @@ Guest Data Persistence
>  ----------------------
>  
>  Though QEMU supports multiple types of vNVDIMM backends on Linux,
> -currently the only one that can guarantee the guest write persistence
> -is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
> -which all guest access do not involve any host-side kernel cache.
> +the only backend that can guarantee the guest write persistence is:
> +
> +A. DAX device (e.g., /dev/dax0.0, ) or
> +B. DAX file(mounted with dax option)
> +
> +When using B (A file supporting direct mapping of persistent memory)
> +as a backend, write persistence is guaranteed if the host kernel has
> +support for the MAP_SYNC flag in the mmap system call (available
> +since Linux 4.15 and on certain distro kernels) and additionally
> +both 'pmem' and 'share' flags are set to 'on' on the backend.
> +
> +If these conditions are not satisfied i.e. if either 'pmem' or 'share'
> +are not set, if the backend file does not support DAX or if MAP_SYNC
> +is not supported by the host kernel, write persistence is not
> +guaranteed after a system crash. For compatibility reasons, these
> +conditions are silently ignored if not satisfied. Currently, no way
> +is provided to test for them.
> +For more details, please reference mmap(2) man page:
> +http://man7.org/linux/man-pages/man2/mmap.2.html.
>  
>  When using other types of backends, it's suggested to set 'unarmed'
>  option of '-device nvdimm' to 'on', which sets the unarmed flag of the
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 08749a3391..bdc74c0620 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -4233,6 +4233,11 @@ using the SNIA NVM programming model (e.g. Intel
> NVDIMM).
>  If @option{pmem} is set to 'on', QEMU will take necessary operations to
>  guarantee the persistence of its own writes to @option{mem-path}
>  (e.g. in vNVDIMM label emulation and live migration).
> +Also, we will map the backend-file with MAP_SYNC flag, which ensures the
> +file metadata is in sync for @option{mem-path} in case of host crash
> +or a power failure. MAP_SYNC requires support from both the host kernel
> +(since Linux kernel 4.15) and the filesystem of @option{mem-path} mounted
> +with DAX option.
>  
>  @item -object
>  memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
>  
> --
> 2.19.1

Looks good to me.

Reviewed-by: Pankaj Gupta <pagupta@redhat.com>

> 
> 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v14 2/2] docs: Added MAP_SYNC documentation
@ 2019-04-23  9:57     ` Pankaj Gupta
  0 siblings, 0 replies; 24+ messages in thread
From: Pankaj Gupta @ 2019-04-23  9:57 UTC (permalink / raw)
  To: Wei Yang
  Cc: xiaoguangrong eric, mst, qemu-devel, yi z zhang, yu c zhang,
	stefanha, pbonzini, imammedo, dan j williams, ehabkost


> 
> From: Zhang Yi <yi.z.zhang@linux.intel.com>
> 
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> ---
>  docs/nvdimm.txt | 22 +++++++++++++++++++---
>  qemu-options.hx |  5 +++++
>  2 files changed, 24 insertions(+), 3 deletions(-)
> 
> diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> index 7231c2d78f..bcd1456e72 100644
> --- a/docs/nvdimm.txt
> +++ b/docs/nvdimm.txt
> @@ -144,9 +144,25 @@ Guest Data Persistence
>  ----------------------
>  
>  Though QEMU supports multiple types of vNVDIMM backends on Linux,
> -currently the only one that can guarantee the guest write persistence
> -is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
> -which all guest access do not involve any host-side kernel cache.
> +the only backend that can guarantee the guest write persistence is:
> +
> +A. DAX device (e.g., /dev/dax0.0, ) or
> +B. DAX file(mounted with dax option)
> +
> +When using B (A file supporting direct mapping of persistent memory)
> +as a backend, write persistence is guaranteed if the host kernel has
> +support for the MAP_SYNC flag in the mmap system call (available
> +since Linux 4.15 and on certain distro kernels) and additionally
> +both 'pmem' and 'share' flags are set to 'on' on the backend.
> +
> +If these conditions are not satisfied i.e. if either 'pmem' or 'share'
> +are not set, if the backend file does not support DAX or if MAP_SYNC
> +is not supported by the host kernel, write persistence is not
> +guaranteed after a system crash. For compatibility reasons, these
> +conditions are silently ignored if not satisfied. Currently, no way
> +is provided to test for them.
> +For more details, please reference mmap(2) man page:
> +http://man7.org/linux/man-pages/man2/mmap.2.html.
>  
>  When using other types of backends, it's suggested to set 'unarmed'
>  option of '-device nvdimm' to 'on', which sets the unarmed flag of the
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 08749a3391..bdc74c0620 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -4233,6 +4233,11 @@ using the SNIA NVM programming model (e.g. Intel
> NVDIMM).
>  If @option{pmem} is set to 'on', QEMU will take necessary operations to
>  guarantee the persistence of its own writes to @option{mem-path}
>  (e.g. in vNVDIMM label emulation and live migration).
> +Also, we will map the backend-file with MAP_SYNC flag, which ensures the
> +file metadata is in sync for @option{mem-path} in case of host crash
> +or a power failure. MAP_SYNC requires support from both the host kernel
> +(since Linux kernel 4.15) and the filesystem of @option{mem-path} mounted
> +with DAX option.
>  
>  @item -object
>  memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
>  
> --
> 2.19.1

Looks good to me.

Reviewed-by: Pankaj Gupta <pagupta@redhat.com>

> 
> 
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v14 0/2] support MAP_SYNC for memory-backend-file
@ 2019-04-23 12:43       ` Michael S. Tsirkin
  0 siblings, 0 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2019-04-23 12:43 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Wei Yang, qemu-devel, xiaoguangrong.eric, stefanha, pbonzini,
	pagupta, yu.c.zhang, imammedo, dan.j.williams, yi.z.zhang

On Mon, Apr 22, 2019 at 03:22:55PM -0300, Eduardo Habkost wrote:
> On Mon, Apr 22, 2019 at 08:34:51AM -0400, Michael S. Tsirkin wrote:
> > On Mon, Apr 22, 2019 at 08:48:47AM +0800, Wei Yang wrote:
> > > Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
> > > guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
> > > files on ext4/xfs file system mounted with '-o dax').
> > > 
> > > A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
> > >     https://patchwork.kernel.org/patch/10028151/
> > > 
> > > In order to make sure that the file metadata is in sync after a fault 
> > > while we are writing a shared DAX supporting backend files, this
> > > patch-set enables QEMU to use MAP_SYNC flag for memory-backend-dax-file.
> > > 
> > > As the DAX vs DMA truncated issue was solved, we refined the code and
> > > send out this feature for the v5 version.
> > > 
> > > We will pass MAP_SYNC to mmap(2); if MAP_SYNC is supported and
> > > 'share=on' & 'pmem=on'. 
> > > Or QEMU will not pass this flag to mmap(2)
> > 
> > OK this is in a good shape. As we are in freeze anyway,
> > there's still a bit more time to polish it. I have a couple of
> > suggestions:
> > 
> > - squash docs in same patch with code, no need for two patches
> > - mmap errors are not silently ignored as the doc says,
> >   a warning is produced
> 
> Note that a warning is produced only if both share=on and pmem=on
> is specified.  If using pmem=on without share=on, no warning is
> printed at all.
> 
> I agree we could squash the docs in the same patch, but I don't
> want to prevent the code from being merged and require v15 to be
> sent just because we are still polishing the documentation.
> 
> If there are no objections, I plan to apply this version of the
> series including the following fixup (just removing the word
> "silently"), and I suggest further improvements to be sent as
> follow up patches.
> 
> diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> index bcd1456e72..b531cacd35 100644
> --- a/docs/nvdimm.txt
> +++ b/docs/nvdimm.txt
> @@ -159,8 +159,8 @@ If these conditions are not satisfied i.e. if either 'pmem' or 'share'
>  are not set, if the backend file does not support DAX or if MAP_SYNC
>  is not supported by the host kernel, write persistence is not
>  guaranteed after a system crash. For compatibility reasons, these
> -conditions are silently ignored if not satisfied. Currently, no way
> -is provided to test for them.
> +conditions are ignored if not satisfied. Currently, no way is
> +provided to test for them.
>  For more details, please reference mmap(2) man page:
>  http://man7.org/linux/man-pages/man2/mmap.2.html.

with the two being squashed, and above fix:

Reviewed-by: Michael S. Tsirkin <mst@redhat.com>



> -- 
> Eduardo

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v14 0/2] support MAP_SYNC for memory-backend-file
@ 2019-04-23 12:43       ` Michael S. Tsirkin
  0 siblings, 0 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2019-04-23 12:43 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: pagupta, xiaoguangrong.eric, qemu-devel, yi.z.zhang, yu.c.zhang,
	Wei Yang, stefanha, imammedo, pbonzini, dan.j.williams

On Mon, Apr 22, 2019 at 03:22:55PM -0300, Eduardo Habkost wrote:
> On Mon, Apr 22, 2019 at 08:34:51AM -0400, Michael S. Tsirkin wrote:
> > On Mon, Apr 22, 2019 at 08:48:47AM +0800, Wei Yang wrote:
> > > Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
> > > guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
> > > files on ext4/xfs file system mounted with '-o dax').
> > > 
> > > A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
> > >     https://patchwork.kernel.org/patch/10028151/
> > > 
> > > In order to make sure that the file metadata is in sync after a fault 
> > > while we are writing a shared DAX supporting backend files, this
> > > patch-set enables QEMU to use MAP_SYNC flag for memory-backend-dax-file.
> > > 
> > > As the DAX vs DMA truncated issue was solved, we refined the code and
> > > send out this feature for the v5 version.
> > > 
> > > We will pass MAP_SYNC to mmap(2); if MAP_SYNC is supported and
> > > 'share=on' & 'pmem=on'. 
> > > Or QEMU will not pass this flag to mmap(2)
> > 
> > OK this is in a good shape. As we are in freeze anyway,
> > there's still a bit more time to polish it. I have a couple of
> > suggestions:
> > 
> > - squash docs in same patch with code, no need for two patches
> > - mmap errors are not silently ignored as the doc says,
> >   a warning is produced
> 
> Note that a warning is produced only if both share=on and pmem=on
> is specified.  If using pmem=on without share=on, no warning is
> printed at all.
> 
> I agree we could squash the docs in the same patch, but I don't
> want to prevent the code from being merged and require v15 to be
> sent just because we are still polishing the documentation.
> 
> If there are no objections, I plan to apply this version of the
> series including the following fixup (just removing the word
> "silently"), and I suggest further improvements to be sent as
> follow up patches.
> 
> diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> index bcd1456e72..b531cacd35 100644
> --- a/docs/nvdimm.txt
> +++ b/docs/nvdimm.txt
> @@ -159,8 +159,8 @@ If these conditions are not satisfied i.e. if either 'pmem' or 'share'
>  are not set, if the backend file does not support DAX or if MAP_SYNC
>  is not supported by the host kernel, write persistence is not
>  guaranteed after a system crash. For compatibility reasons, these
> -conditions are silently ignored if not satisfied. Currently, no way
> -is provided to test for them.
> +conditions are ignored if not satisfied. Currently, no way is
> +provided to test for them.
>  For more details, please reference mmap(2) man page:
>  http://man7.org/linux/man-pages/man2/mmap.2.html.

with the two being squashed, and above fix:

Reviewed-by: Michael S. Tsirkin <mst@redhat.com>



> -- 
> Eduardo


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v14 1/2] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
@ 2019-04-24  1:01       ` Wei Yang
  0 siblings, 0 replies; 24+ messages in thread
From: Wei Yang @ 2019-04-24  1:01 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Wei Yang, qemu-devel, xiaoguangrong.eric, pbonzini, pagupta,
	yu.c.zhang, mst, ehabkost, imammedo, dan.j.williams, yi.z.zhang,
	Haozhong Zhang

On Tue, Apr 23, 2019 at 10:25:18AM +0100, Stefan Hajnoczi wrote:
[...]
>> +#ifdef CONFIG_LINUX
>> +#include <linux/mman.h>
>> +#else  /* !CONFIG_LINUX */
>> +#define MAP_SYNC              0x0
>> +#define MAP_SHARED_VALIDATE   0x0
>> +#endif /* CONFIG_LINUX */
>
>MAP_SHARED_VALIDATE is is from 2017:
>
>  commit 1c9725974074a047f6080eecc62c50a8e840d050
>  Author: Dan Williams <dan.j.williams@intel.com>
>  Date:   Wed Nov 1 16:36:30 2017 +0100
>
>    mm: introduce MAP_SHARED_VALIDATE, a mechanism to safely define new mmap flags
>
>This code fails to compile on Linux hosts with pre-4.15 headers.
>

Ok, qemu build will fail on pre-4.15 linux.

>Instead you could use the following (even on Linux!):
>
>  #ifndef MAP_SYNC
>  #define MAP_SYNC 0
>  #endif
>  #ifndef MAP_SHARED_VALIDATE
>  #define MAP_SHARED_VALIDATE 0
>  #endif
>
>Either way:

You mean replace the above code to:

#ifdef CONFIG_LINUX
#include <linux/mman.h>
#endif /* CONFIG_LINUX */

#ifndef MAP_SYNC
#define MAP_SYNC 0
#endif
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0
#endif

>
>Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>



-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v14 1/2] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
@ 2019-04-24  1:01       ` Wei Yang
  0 siblings, 0 replies; 24+ messages in thread
From: Wei Yang @ 2019-04-24  1:01 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: pagupta, xiaoguangrong.eric, mst, qemu-devel, yi.z.zhang,
	yu.c.zhang, Wei Yang, Haozhong Zhang, imammedo, pbonzini,
	dan.j.williams, ehabkost

On Tue, Apr 23, 2019 at 10:25:18AM +0100, Stefan Hajnoczi wrote:
[...]
>> +#ifdef CONFIG_LINUX
>> +#include <linux/mman.h>
>> +#else  /* !CONFIG_LINUX */
>> +#define MAP_SYNC              0x0
>> +#define MAP_SHARED_VALIDATE   0x0
>> +#endif /* CONFIG_LINUX */
>
>MAP_SHARED_VALIDATE is is from 2017:
>
>  commit 1c9725974074a047f6080eecc62c50a8e840d050
>  Author: Dan Williams <dan.j.williams@intel.com>
>  Date:   Wed Nov 1 16:36:30 2017 +0100
>
>    mm: introduce MAP_SHARED_VALIDATE, a mechanism to safely define new mmap flags
>
>This code fails to compile on Linux hosts with pre-4.15 headers.
>

Ok, qemu build will fail on pre-4.15 linux.

>Instead you could use the following (even on Linux!):
>
>  #ifndef MAP_SYNC
>  #define MAP_SYNC 0
>  #endif
>  #ifndef MAP_SHARED_VALIDATE
>  #define MAP_SHARED_VALIDATE 0
>  #endif
>
>Either way:

You mean replace the above code to:

#ifdef CONFIG_LINUX
#include <linux/mman.h>
#endif /* CONFIG_LINUX */

#ifndef MAP_SYNC
#define MAP_SYNC 0
#endif
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0
#endif

>
>Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>



-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v14 1/2] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
@ 2019-04-25  8:26         ` Stefan Hajnoczi
  0 siblings, 0 replies; 24+ messages in thread
From: Stefan Hajnoczi @ 2019-04-25  8:26 UTC (permalink / raw)
  To: Wei Yang
  Cc: Stefan Hajnoczi, pagupta, xiaoguangrong.eric, mst, qemu-devel,
	yi.z.zhang, yu.c.zhang, Haozhong Zhang, imammedo, pbonzini,
	dan.j.williams, ehabkost

[-- Attachment #1: Type: text/plain, Size: 1427 bytes --]

On Wed, Apr 24, 2019 at 09:01:05AM +0800, Wei Yang wrote:
> On Tue, Apr 23, 2019 at 10:25:18AM +0100, Stefan Hajnoczi wrote:
> [...]
> >> +#ifdef CONFIG_LINUX
> >> +#include <linux/mman.h>
> >> +#else  /* !CONFIG_LINUX */
> >> +#define MAP_SYNC              0x0
> >> +#define MAP_SHARED_VALIDATE   0x0
> >> +#endif /* CONFIG_LINUX */
> >
> >MAP_SHARED_VALIDATE is is from 2017:
> >
> >  commit 1c9725974074a047f6080eecc62c50a8e840d050
> >  Author: Dan Williams <dan.j.williams@intel.com>
> >  Date:   Wed Nov 1 16:36:30 2017 +0100
> >
> >    mm: introduce MAP_SHARED_VALIDATE, a mechanism to safely define new mmap flags
> >
> >This code fails to compile on Linux hosts with pre-4.15 headers.
> >
> 
> Ok, qemu build will fail on pre-4.15 linux.
> 
> >Instead you could use the following (even on Linux!):
> >
> >  #ifndef MAP_SYNC
> >  #define MAP_SYNC 0
> >  #endif
> >  #ifndef MAP_SHARED_VALIDATE
> >  #define MAP_SHARED_VALIDATE 0
> >  #endif
> >
> >Either way:
> 
> You mean replace the above code to:
> 
> #ifdef CONFIG_LINUX
> #include <linux/mman.h>
> #endif /* CONFIG_LINUX */
> 
> #ifndef MAP_SYNC
> #define MAP_SYNC 0
> #endif
> #ifndef MAP_SHARED_VALIDATE
> #define MAP_SHARED_VALIDATE 0
> #endif

Yes, please.

This is not critical but I bet someone will hit this issue when
compiling a new QEMU on an old host.  This small tweak will save them
time.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v14 1/2] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
@ 2019-04-25  8:26         ` Stefan Hajnoczi
  0 siblings, 0 replies; 24+ messages in thread
From: Stefan Hajnoczi @ 2019-04-25  8:26 UTC (permalink / raw)
  To: Wei Yang
  Cc: pagupta, xiaoguangrong.eric, mst, Haozhong Zhang, qemu-devel,
	yi.z.zhang, yu.c.zhang, Stefan Hajnoczi, pbonzini, imammedo,
	dan.j.williams, ehabkost

[-- Attachment #1: Type: text/plain, Size: 1427 bytes --]

On Wed, Apr 24, 2019 at 09:01:05AM +0800, Wei Yang wrote:
> On Tue, Apr 23, 2019 at 10:25:18AM +0100, Stefan Hajnoczi wrote:
> [...]
> >> +#ifdef CONFIG_LINUX
> >> +#include <linux/mman.h>
> >> +#else  /* !CONFIG_LINUX */
> >> +#define MAP_SYNC              0x0
> >> +#define MAP_SHARED_VALIDATE   0x0
> >> +#endif /* CONFIG_LINUX */
> >
> >MAP_SHARED_VALIDATE is is from 2017:
> >
> >  commit 1c9725974074a047f6080eecc62c50a8e840d050
> >  Author: Dan Williams <dan.j.williams@intel.com>
> >  Date:   Wed Nov 1 16:36:30 2017 +0100
> >
> >    mm: introduce MAP_SHARED_VALIDATE, a mechanism to safely define new mmap flags
> >
> >This code fails to compile on Linux hosts with pre-4.15 headers.
> >
> 
> Ok, qemu build will fail on pre-4.15 linux.
> 
> >Instead you could use the following (even on Linux!):
> >
> >  #ifndef MAP_SYNC
> >  #define MAP_SYNC 0
> >  #endif
> >  #ifndef MAP_SHARED_VALIDATE
> >  #define MAP_SHARED_VALIDATE 0
> >  #endif
> >
> >Either way:
> 
> You mean replace the above code to:
> 
> #ifdef CONFIG_LINUX
> #include <linux/mman.h>
> #endif /* CONFIG_LINUX */
> 
> #ifndef MAP_SYNC
> #define MAP_SYNC 0
> #endif
> #ifndef MAP_SHARED_VALIDATE
> #define MAP_SHARED_VALIDATE 0
> #endif

Yes, please.

This is not critical but I bet someone will hit this issue when
compiling a new QEMU on an old host.  This small tweak will save them
time.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2019-04-25  8:27 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-22  0:48 [Qemu-devel] [PATCH v14 0/2] support MAP_SYNC for memory-backend-file Wei Yang
2019-04-22  0:48 ` Wei Yang
2019-04-22  0:48 ` [Qemu-devel] [PATCH v14 1/2] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap() Wei Yang
2019-04-22  0:48   ` Wei Yang
2019-04-23  9:25   ` Stefan Hajnoczi
2019-04-23  9:25     ` Stefan Hajnoczi
2019-04-24  1:01     ` Wei Yang
2019-04-24  1:01       ` Wei Yang
2019-04-25  8:26       ` Stefan Hajnoczi
2019-04-25  8:26         ` Stefan Hajnoczi
2019-04-22  0:48 ` [Qemu-devel] [PATCH v14 2/2] docs: Added MAP_SYNC documentation Wei Yang
2019-04-22  0:48   ` Wei Yang
2019-04-23  9:26   ` Stefan Hajnoczi
2019-04-23  9:26     ` Stefan Hajnoczi
2019-04-23  9:57   ` Pankaj Gupta
2019-04-23  9:57     ` Pankaj Gupta
2019-04-22 12:34 ` [Qemu-devel] [PATCH v14 0/2] support MAP_SYNC for memory-backend-file Michael S. Tsirkin
2019-04-22 12:34   ` Michael S. Tsirkin
2019-04-22 18:22   ` Eduardo Habkost
2019-04-22 18:22     ` Eduardo Habkost
2019-04-23  2:41     ` Wei Yang
2019-04-23  2:41       ` Wei Yang
2019-04-23 12:43     ` Michael S. Tsirkin
2019-04-23 12:43       ` Michael S. Tsirkin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.