* [Qemu-devel] [PATCH V12 0/5] support MAP_SYNC for memory-backend-file
@ 2019-02-06 17:25 Zhang, Yi
  2019-02-06 17:26 ` [Qemu-devel] [PATCH V12 1/5] util/mmap-alloc: Add a 'is_pmem' parameter to qemu_ram_mmap Zhang, Yi
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Zhang, Yi @ 2019-02-06 17:25 UTC (permalink / raw)
  To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, mst, ehabkost
  Cc: qemu-devel, imammedo, dan.j.williams, Zhang, Yi

Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
files on ext4/xfs file system mounted with '-o dax').

A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
    https://patchwork.kernel.org/patch/10028151/

To make sure that the file metadata is in sync after a fault while
writing to a shared, DAX-capable backend file, this patch set enables
QEMU to use the MAP_SYNC flag for memory-backend-file.

Since the DAX vs. DMA truncate issue was solved, we refined the code and
sent out this feature for the v5 version.

QEMU passes MAP_SYNC to mmap(2) only if MAP_SYNC is supported and both
'share=on' and 'pmem=on' are set.
Otherwise, QEMU does not pass this flag to mmap(2).
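
The rule above can be sketched in C as follows. This is only an
illustration under stated assumptions, not the code from patch 4/5:
map_backend() and its used_sync out-parameter are hypothetical names, the
fallback flag values are the asm-generic ones (other architectures may
differ), and the retry without MAP_SYNC follows the behavior described in
the V12 changelog.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Fallbacks for old libc headers; asm-generic values, Linux only. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

/*
 * Hypothetical helper (not qemu_ram_mmap()): request MAP_SYNC only when
 * the mapping is both shared and pmem, and retry without it on failure.
 * *used_sync reports whether the MAP_SYNC mapping was established.
 */
static void *map_backend(int fd, size_t size, bool shared, bool is_pmem,
                         bool *used_sync)
{
    int prot = PROT_READ | PROT_WRITE;
    void *ptr;

    assert(used_sync != NULL);
    *used_sync = false;

    if (shared && is_pmem) {
        /* MAP_SHARED_VALIDATE makes the kernel reject unknown flags. */
        ptr = mmap(NULL, size, prot, MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
        if (ptr != MAP_FAILED) {
            *used_sync = true;
            return ptr;
        }
        fprintf(stderr, "mmap with MAP_SYNC failed (%s), retrying without\n",
                strerror(errno));
    }

    /* Plain mapping: no MAP_SYNC, no MAP_SHARED_VALIDATE. */
    return mmap(NULL, size, prot, shared ? MAP_SHARED : MAP_PRIVATE, fd, 0);
}
```

On a regular (non-DAX) file the first mmap() is expected to fail, so the
helper falls back to a plain shared mapping and leaves *used_sync false.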

Tested with the cases below:
1. pmem=on is set, share=on is set, MAP_SYNC supported:
   a: backend is a DAX-capable file.
   1) start VM1 with options:
   -object memory-backend-file,id=nv_be4,share,mem-path=${DAX_FILE_1},size=${DAX_FILE_SIZE_1},align=128M,pmem=on,share=on
   -device nvdimm,id=nv4,memdev=nv_be4,label-size=2M.
   
   2) start VM2 with options:
   -object memory-backend-file,id=nv_be4,share,mem-path=${DAX_FILE_2},size=${DAX_FILE_SIZE_2},align=128M,pmem=on,share=on
   -device nvdimm,id=nv4,memdev=nv_be4,label-size=2M.

   3) live migrate from VM1 to VM2.
   
   4) Suddenly crash the host or cut its power.

   5) Check DAX_FILE_1 and DAX_FILE_2: no corruption.

   b: backend is a regular file.
   1) start with options
   -object memory-backend-file,id=nv_be4,share,mem-path=${REG_FILE},size=${REG_FILE_SIZE},align=128M,pmem=on,share=on
   -device nvdimm,id=nv4,memdev=nv_be4,label-size=2M.

   QEMU will warn "failed to validate with mapping flags: Operation not supported".
   FILE_1 and FILE_2 may be randomly corrupted.

2. Other cases:
   FILE_1 and FILE_2 may be randomly corrupted.

Changes in V12:
 * 2/5: Michael: Update update-linux-headers.sh.
 * 3/5: Michael: Use the script to add linux/mman.h.
 * 4/5: Pankaj, Michael: 1) Fall back to mmap without
        MAP_SYNC & MAP_SHARED_VALIDATE if sync is not supported or fails.
	2) Replace the include with the linux/mman.h added in 3/5.
 * 5/5: Michael: Refine the documentation.

Changes in V11:
 * 1/3: Michael: Change to just add a bool is_pmem in qemu_ram_mmap.
 * 2/3: Michael: Fix the compatibility with old kernels.
 * 2/3 & 3/3: Michael & Eduardo: Update the behavior as below:
   Warn on a non-DAX backend and continue without MAP_SYNC.
   If mmap fails again, then for compatibility remove MAP_SHARED_VALIDATE
   and silently proceed.

Changes in V10:
 * 4/4: refine the document.
 * 3/4: Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
 * 2/4: refine the commit message, Added MAP_SHARED_VALIDATE.
 * 2/4: Fix the wrong include header

Changes in V9:
 * 1/6: Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
 * 2/6: Newly added: Michael: use the sparse feature to define RAM_FLAG.
 Since I don't have much knowledge about the sparse feature, @Michael, could you
 add some documentation/commit message to this patch? Thank you very much.
 * 3/6: from 2/5: Eduardo: updated the commit message.
 * 4/6: from 3/5: Michael: don't ignore MAP_SYNC failures silently.
 * 5/6: from 4/5: Eduardo: updated the commit message.
 * 6/6: from 5/5: Michael: Drop the sync option, document the MAP_SYNC.

Changes in v8:
 * Michael: 3/5, remove the duplicated define in osdep.h.
 * Michael: 2/5, make the type definition safe.
 * Michael: 2/5, fix the incorrect define MAP_SHARE on qemu_anon_ram_alloc.
 * 4/6 removed: we removed the on/off/auto definition of sync, since for now
   MAP_SYNC only works with pmem=on.
 * @Michael, I still reuse the RAM_SYNC flag; it is much more straightforward
   to parse all the flags in one parameter.

Changes in v7:
 * Michael: [3,4,6]/6 limited the "sync" flag to an nvdimm backend only (pmem=on).

Changes in v6:
 * Pankaj: 3/7 are squashed with 2/7
 * Pankaj: 7/7 update comments to "consistent filesystem metadata".
 * Pankaj, Igor: 1/7 Added Reviewed-by in patch-1/7
 * Stefan, 4/7 move the include header from "/linux/mman.h" to "osdep.h"
 * Stefan, 5/7 Add missing "munmap"
 * Stefan, 2/7 refine the shared/flag.

Changes in v5:
 * Add patch 1 to fix a memory leak issue.
 * Refine the patch 4-6
 * Remove the patch 3 as we already change the parameter from "shared" to
   "flags"

Changes in v4:
 * Add patch 1-3 to switch some functions to a single 'flags'
   parameters. (Michael S. Tsirkin)
 * v3 patch 1-3 become v4 patch 4-6.
 * Patch 4: move definitions of MAP_SYNC and MAP_SHARED_VALIDATE to a
   new header file under include/standard-headers/linux/. (Michael S. Tsirkin)
 * Patch 6: refine the description of the 'sync' option. (Michael S. Tsirkin)

Changes in v3:
 * Patch 1: add MAP_SHARED_VALIDATE in both sync=on and sync=auto
   cases, and add back the retry mechanism. MAP_SYNC will be ignored
   by Linux kernel 4.15 if MAP_SHARED_VALIDATE is missed.
 * Patch 1: define MAP_SYNC and MAP_SHARED_VALIDATE as 0 on non-Linux
   platforms in order to make qemu_ram_mmap() compile on those platforms.
 * Patch 2&3: include more information in error messages of
   memory-backend in hope to help user to identify the error.
   (Dr. David Alan Gilbert)
 * Patch 3: fix typo in the commit message. (Dr. David Alan Gilbert)

Changes in v2:
 * Add 'sync' option to control the use of MAP_SYNC. (Eduardo Habkost)
 * Remove the unnecessary set of MAP_SHARED_VALIDATE in some cases and
   the retry mechanism in qemu_ram_mmap(). (Michael S. Tsirkin)
 * Move OS dependent definitions of MAP_SYNC and MAP_SHARED_VALIDATE
   to osdep.h. (Michael S. Tsirkin)

Zhang Yi (5):
  util/mmap-alloc: Add a 'is_pmem' parameter to qemu_ram_mmap
  scripts/update-linux-headers: add linux/mman.h
  linux-headers: add linux/mman.h.
  util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  docs: Added MAP_SYNC documentation

 docs/nvdimm.txt                            |  25 ++++++-
 exec.c                                     |   2 +-
 include/qemu/mmap-alloc.h                  |  21 +++++-
 include/qemu/osdep.h                       |   7 ++
 linux-headers/asm-arm/mman.h               |   4 ++
 linux-headers/asm-arm64/mman.h             |   1 +
 linux-headers/asm-generic/hugetlb_encode.h |  36 ++++++++++
 linux-headers/asm-generic/mman-common.h    |  77 ++++++++++++++++++++
 linux-headers/asm-generic/mman.h           |  24 +++++++
 linux-headers/asm-mips/mman.h              | 108 +++++++++++++++++++++++++++++
 linux-headers/asm-powerpc/mman.h           |  39 +++++++++++
 linux-headers/asm-s390/mman.h              |   1 +
 linux-headers/asm-x86/mman.h               |  31 +++++++++
 linux-headers/linux/mman.h                 |  38 ++++++++++
 qemu-options.hx                            |   4 ++
 scripts/update-linux-headers.sh            |   6 +-
 util/mmap-alloc.c                          |  30 +++++++-
 util/oslib-posix.c                         |   2 +-
 18 files changed, 445 insertions(+), 11 deletions(-)
 create mode 100644 linux-headers/asm-arm/mman.h
 create mode 100644 linux-headers/asm-arm64/mman.h
 create mode 100644 linux-headers/asm-generic/hugetlb_encode.h
 create mode 100644 linux-headers/asm-generic/mman-common.h
 create mode 100644 linux-headers/asm-generic/mman.h
 create mode 100644 linux-headers/asm-mips/mman.h
 create mode 100644 linux-headers/asm-powerpc/mman.h
 create mode 100644 linux-headers/asm-s390/mman.h
 create mode 100644 linux-headers/asm-x86/mman.h
 create mode 100644 linux-headers/linux/mman.h

-- 
2.7.4


* [Qemu-devel] [PATCH V12 1/5] util/mmap-alloc: Add a 'is_pmem' parameter to qemu_ram_mmap
  2019-02-06 17:25 [Qemu-devel] [PATCH V12 0/5] support MAP_SYNC for memory-backend-file Zhang, Yi
@ 2019-02-06 17:26 ` Zhang, Yi
  2019-02-06 17:27 ` [Qemu-devel] [PATCH V12 2/5] scripts/update-linux-headers: add linux/mman.h Zhang, Yi
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Zhang, Yi @ 2019-02-06 17:26 UTC (permalink / raw)
  To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, mst, ehabkost
  Cc: qemu-devel, imammedo, dan.j.williams, Zhang Yi

From: Zhang Yi <yi.z.zhang@linux.intel.com>

Besides the existing 'shared' flag, we are going to add
'is_pmem' to qemu_ram_mmap(), which indicates that the memory backend
file is persistent memory.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
---
 exec.c                    |  2 +-
 include/qemu/mmap-alloc.h | 21 ++++++++++++++++++++-
 util/mmap-alloc.c         |  6 +++++-
 util/oslib-posix.c        |  2 +-
 4 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/exec.c b/exec.c
index bb6170d..27cea52 100644
--- a/exec.c
+++ b/exec.c
@@ -1860,7 +1860,7 @@ static void *file_ram_alloc(RAMBlock *block,
     }
 
     area = qemu_ram_mmap(fd, memory, block->mr->align,
-                         block->flags & RAM_SHARED);
+                         block->flags & RAM_SHARED, block->flags & RAM_PMEM);
     if (area == MAP_FAILED) {
         error_setg_errno(errp, errno,
                          "unable to map backing store for guest RAM");
diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index 50385e3..190688a 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -7,7 +7,26 @@ size_t qemu_fd_getpagesize(int fd);
 
 size_t qemu_mempath_getpagesize(const char *mem_path);
 
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared);
+/**
+ * qemu_ram_mmap: mmap the specified file or device.
+ *
+ * Parameters:
+ *  @fd: the file or the device to mmap
+ *  @size: the number of bytes to be mmaped
+ *  @align: if not zero, specify the alignment of the starting mapping address;
+ *          otherwise, the alignment in use will be determined by QEMU.
+ *  @shared: map has RAM_SHARED flag.
+ *  @is_pmem: map has RAM_PMEM flag.
+ *
+ * Return:
+ *  On success, return a pointer to the mapped area.
+ *  On failure, return MAP_FAILED.
+ */
+void *qemu_ram_mmap(int fd,
+                    size_t size,
+                    size_t align,
+                    bool shared,
+                    bool is_pmem);
 
 void qemu_ram_munmap(void *ptr, size_t size);
 
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index fd329ec..97bbeed 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -75,7 +75,11 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
     return getpagesize();
 }
 
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
+void *qemu_ram_mmap(int fd,
+                    size_t size,
+                    size_t align,
+                    bool shared,
+                    bool is_pmem)
 {
     /*
      * Note: this always allocates at least one extra page of virtual address
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index fbd0dc8..040937f 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -203,7 +203,7 @@ void *qemu_memalign(size_t alignment, size_t size)
 void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
 {
     size_t align = QEMU_VMALLOC_ALIGN;
-    void *ptr = qemu_ram_mmap(-1, size, align, shared);
+    void *ptr = qemu_ram_mmap(-1, size, align, shared, false);
 
     if (ptr == MAP_FAILED) {
         return NULL;
-- 
2.7.4


* [Qemu-devel] [PATCH V12 2/5] scripts/update-linux-headers: add linux/mman.h
  2019-02-06 17:25 [Qemu-devel] [PATCH V12 0/5] support MAP_SYNC for memory-backend-file Zhang, Yi
  2019-02-06 17:26 ` [Qemu-devel] [PATCH V12 1/5] util/mmap-alloc: Add a 'is_pmem' parameter to qemu_ram_mmap Zhang, Yi
@ 2019-02-06 17:27 ` Zhang, Yi
  2019-02-06 17:27 ` [Qemu-devel] [PATCH V12 3/5] linux-headers: " Zhang, Yi
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Zhang, Yi @ 2019-02-06 17:27 UTC (permalink / raw)
  To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, mst, ehabkost
  Cc: qemu-devel, imammedo, dan.j.williams, Zhang Yi

From: Zhang Yi <yi.z.zhang@linux.intel.com>

Add linux/mman.h, asm/mman.h and asm-generic/mman-common.h to linux-headers,
so we can use more mmap(2) flags.

Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
---
 scripts/update-linux-headers.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index 0a964fe..57db5d9 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -95,7 +95,7 @@ for arch in $ARCHLIST; do
 
     rm -rf "$output/linux-headers/asm-$arch"
     mkdir -p "$output/linux-headers/asm-$arch"
-    for header in kvm.h unistd.h bitsperlong.h; do
+    for header in kvm.h unistd.h bitsperlong.h mman.h; do
         cp "$tmpdir/include/asm/$header" "$output/linux-headers/asm-$arch"
     done
 
@@ -126,13 +126,13 @@ done
 rm -rf "$output/linux-headers/linux"
 mkdir -p "$output/linux-headers/linux"
 for header in kvm.h vfio.h vfio_ccw.h vhost.h \
-              psci.h psp-sev.h userfaultfd.h; do
+              psci.h psp-sev.h userfaultfd.h mman.h; do
     cp "$tmpdir/include/linux/$header" "$output/linux-headers/linux"
 done
 
 rm -rf "$output/linux-headers/asm-generic"
 mkdir -p "$output/linux-headers/asm-generic"
-for header in unistd.h bitsperlong.h; do
+for header in unistd.h bitsperlong.h mman-common.h mman.h hugetlb_encode.h; do
     cp "$tmpdir/include/asm-generic/$header" "$output/linux-headers/asm-generic"
 done
 
-- 
2.7.4


* [Qemu-devel]  [PATCH V12 3/5] linux-headers: add linux/mman.h.
  2019-02-06 17:25 [Qemu-devel] [PATCH V12 0/5] support MAP_SYNC for memory-backend-file Zhang, Yi
  2019-02-06 17:26 ` [Qemu-devel] [PATCH V12 1/5] util/mmap-alloc: Add a 'is_pmem' parameter to qemu_ram_mmap Zhang, Yi
  2019-02-06 17:27 ` [Qemu-devel] [PATCH V12 2/5] scripts/update-linux-headers: add linux/mman.h Zhang, Yi
@ 2019-02-06 17:27 ` Zhang, Yi
  2019-02-06 17:27 ` [Qemu-devel] [PATCH V12 4/5] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap() Zhang, Yi
  2019-02-06 17:27 ` [Qemu-devel] [PATCH V12 5/5] docs: Added MAP_SYNC documentation Zhang, Yi
  4 siblings, 0 replies; 11+ messages in thread
From: Zhang, Yi @ 2019-02-06 17:27 UTC (permalink / raw)
  To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, mst, ehabkost
  Cc: qemu-devel, imammedo, dan.j.williams, Zhang Yi

From: Zhang Yi <yi.z.zhang@linux.intel.com>

Update it to 4.20-rc1

Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
---
 linux-headers/asm-arm/mman.h               |   4 ++
 linux-headers/asm-arm64/mman.h             |   1 +
 linux-headers/asm-generic/hugetlb_encode.h |  36 ++++++++++
 linux-headers/asm-generic/mman-common.h    |  77 ++++++++++++++++++++
 linux-headers/asm-generic/mman.h           |  24 +++++++
 linux-headers/asm-mips/mman.h              | 108 +++++++++++++++++++++++++++++
 linux-headers/asm-powerpc/mman.h           |  39 +++++++++++
 linux-headers/asm-s390/mman.h              |   1 +
 linux-headers/asm-x86/mman.h               |  31 +++++++++
 linux-headers/linux/mman.h                 |  38 ++++++++++
 10 files changed, 359 insertions(+)
 create mode 100644 linux-headers/asm-arm/mman.h
 create mode 100644 linux-headers/asm-arm64/mman.h
 create mode 100644 linux-headers/asm-generic/hugetlb_encode.h
 create mode 100644 linux-headers/asm-generic/mman-common.h
 create mode 100644 linux-headers/asm-generic/mman.h
 create mode 100644 linux-headers/asm-mips/mman.h
 create mode 100644 linux-headers/asm-powerpc/mman.h
 create mode 100644 linux-headers/asm-s390/mman.h
 create mode 100644 linux-headers/asm-x86/mman.h
 create mode 100644 linux-headers/linux/mman.h

diff --git a/linux-headers/asm-arm/mman.h b/linux-headers/asm-arm/mman.h
new file mode 100644
index 0000000..41f99c5
--- /dev/null
+++ b/linux-headers/asm-arm/mman.h
@@ -0,0 +1,4 @@
+#include <asm-generic/mman.h>
+
+#define arch_mmap_check(addr, len, flags) \
+	(((flags) & MAP_FIXED && (addr) < FIRST_USER_ADDRESS) ? -EINVAL : 0)
diff --git a/linux-headers/asm-arm64/mman.h b/linux-headers/asm-arm64/mman.h
new file mode 100644
index 0000000..8eebf89
--- /dev/null
+++ b/linux-headers/asm-arm64/mman.h
@@ -0,0 +1 @@
+#include <asm-generic/mman.h>
diff --git a/linux-headers/asm-generic/hugetlb_encode.h b/linux-headers/asm-generic/hugetlb_encode.h
new file mode 100644
index 0000000..b0f8e87
--- /dev/null
+++ b/linux-headers/asm-generic/hugetlb_encode.h
@@ -0,0 +1,36 @@
+#ifndef _ASM_GENERIC_HUGETLB_ENCODE_H_
+#define _ASM_GENERIC_HUGETLB_ENCODE_H_
+
+/*
+ * Several system calls take a flag to request "hugetlb" huge pages.
+ * Without further specification, these system calls will use the
+ * system's default huge page size.  If a system supports multiple
+ * huge page sizes, the desired huge page size can be specified in
+ * bits [26:31] of the flag arguments.  The value in these 6 bits
+ * will encode the log2 of the huge page size.
+ *
+ * The following definitions are associated with this huge page size
+ * encoding in flag arguments.  System call specific header files
+ * that use this encoding should include this file.  They can then
+ * provide definitions based on these with their own specific prefix.
+ * for example:
+ * #define MAP_HUGE_SHIFT HUGETLB_FLAG_ENCODE_SHIFT
+ */
+
+#define HUGETLB_FLAG_ENCODE_SHIFT	26
+#define HUGETLB_FLAG_ENCODE_MASK	0x3f
+
+#define HUGETLB_FLAG_ENCODE_64KB	(16 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_512KB	(19 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_1MB		(20 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_2MB		(21 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_8MB		(23 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_16MB	(24 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_32MB	(25 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_256MB	(28 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_512MB	(29 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_1GB		(30 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_2GB		(31 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_16GB	(34 << HUGETLB_FLAG_ENCODE_SHIFT)
+
+#endif /* _ASM_GENERIC_HUGETLB_ENCODE_H_ */
diff --git a/linux-headers/asm-generic/mman-common.h b/linux-headers/asm-generic/mman-common.h
new file mode 100644
index 0000000..e7ee328
--- /dev/null
+++ b/linux-headers/asm-generic/mman-common.h
@@ -0,0 +1,77 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef __ASM_GENERIC_MMAN_COMMON_H
+#define __ASM_GENERIC_MMAN_COMMON_H
+
+/*
+ Author: Michael S. Tsirkin <mst@mellanox.co.il>, Mellanox Technologies Ltd.
+ Based on: asm-xxx/mman.h
+*/
+
+#define PROT_READ	0x1		/* page can be read */
+#define PROT_WRITE	0x2		/* page can be written */
+#define PROT_EXEC	0x4		/* page can be executed */
+#define PROT_SEM	0x8		/* page may be used for atomic ops */
+#define PROT_NONE	0x0		/* page can not be accessed */
+#define PROT_GROWSDOWN	0x01000000	/* mprotect flag: extend change to start of growsdown vma */
+#define PROT_GROWSUP	0x02000000	/* mprotect flag: extend change to end of growsup vma */
+
+#define MAP_SHARED	0x01		/* Share changes */
+#define MAP_PRIVATE	0x02		/* Changes are private */
+#define MAP_SHARED_VALIDATE 0x03	/* share + validate extension flags */
+#define MAP_TYPE	0x0f		/* Mask for type of mapping */
+#define MAP_FIXED	0x10		/* Interpret addr exactly */
+#define MAP_ANONYMOUS	0x20		/* don't use a file */
+#ifdef CONFIG_MMAP_ALLOW_UNINITIALIZED
+# define MAP_UNINITIALIZED 0x4000000	/* For anonymous mmap, memory could be uninitialized */
+#else
+# define MAP_UNINITIALIZED 0x0		/* Don't support this flag */
+#endif
+
+/* 0x0100 - 0x80000 flags are defined in asm-generic/mman.h */
+#define MAP_FIXED_NOREPLACE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
+
+/*
+ * Flags for mlock
+ */
+#define MLOCK_ONFAULT	0x01		/* Lock pages in range after they are faulted in, do not prefault */
+
+#define MS_ASYNC	1		/* sync memory asynchronously */
+#define MS_INVALIDATE	2		/* invalidate the caches */
+#define MS_SYNC		4		/* synchronous memory sync */
+
+#define MADV_NORMAL	0		/* no further special treatment */
+#define MADV_RANDOM	1		/* expect random page references */
+#define MADV_SEQUENTIAL	2		/* expect sequential page references */
+#define MADV_WILLNEED	3		/* will need these pages */
+#define MADV_DONTNEED	4		/* don't need these pages */
+
+/* common parameters: try to keep these consistent across architectures */
+#define MADV_FREE	8		/* free pages only if memory pressure */
+#define MADV_REMOVE	9		/* remove these pages & resources */
+#define MADV_DONTFORK	10		/* don't inherit across fork */
+#define MADV_DOFORK	11		/* do inherit across fork */
+#define MADV_HWPOISON	100		/* poison a page for testing */
+#define MADV_SOFT_OFFLINE 101		/* soft offline page for testing */
+
+#define MADV_MERGEABLE   12		/* KSM may merge identical pages */
+#define MADV_UNMERGEABLE 13		/* KSM may not merge identical pages */
+
+#define MADV_HUGEPAGE	14		/* Worth backing with hugepages */
+#define MADV_NOHUGEPAGE	15		/* Not worth backing with hugepages */
+
+#define MADV_DONTDUMP   16		/* Explicity exclude from the core dump,
+					   overrides the coredump filter bits */
+#define MADV_DODUMP	17		/* Clear the MADV_DONTDUMP flag */
+
+#define MADV_WIPEONFORK 18		/* Zero memory on fork, child only */
+#define MADV_KEEPONFORK 19		/* Undo MADV_WIPEONFORK */
+
+/* compatibility flags */
+#define MAP_FILE	0
+
+#define PKEY_DISABLE_ACCESS	0x1
+#define PKEY_DISABLE_WRITE	0x2
+#define PKEY_ACCESS_MASK	(PKEY_DISABLE_ACCESS |\
+				 PKEY_DISABLE_WRITE)
+
+#endif /* __ASM_GENERIC_MMAN_COMMON_H */
diff --git a/linux-headers/asm-generic/mman.h b/linux-headers/asm-generic/mman.h
new file mode 100644
index 0000000..653687d
--- /dev/null
+++ b/linux-headers/asm-generic/mman.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef __ASM_GENERIC_MMAN_H
+#define __ASM_GENERIC_MMAN_H
+
+#include <asm-generic/mman-common.h>
+
+#define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_DENYWRITE	0x0800		/* ETXTBSY */
+#define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
+#define MAP_LOCKED	0x2000		/* pages are locked */
+#define MAP_NORESERVE	0x4000		/* don't check for reservations */
+#define MAP_POPULATE	0x8000		/* populate (prefault) pagetables */
+#define MAP_NONBLOCK	0x10000		/* do not block on IO */
+#define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
+#define MAP_HUGETLB	0x40000		/* create a huge page mapping */
+#define MAP_SYNC	0x80000		/* perform synchronous page faults for the mapping */
+
+/* Bits [26:31] are reserved, see mman-common.h for MAP_HUGETLB usage */
+
+#define MCL_CURRENT	1		/* lock all current mappings */
+#define MCL_FUTURE	2		/* lock all future mappings */
+#define MCL_ONFAULT	4		/* lock all pages that are faulted in */
+
+#endif /* __ASM_GENERIC_MMAN_H */
diff --git a/linux-headers/asm-mips/mman.h b/linux-headers/asm-mips/mman.h
new file mode 100644
index 0000000..3035ca4
--- /dev/null
+++ b/linux-headers/asm-mips/mman.h
@@ -0,0 +1,108 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file "COPYING" in the main directory of this archive
+ * for more details.
+ *
+ * Copyright (C) 1995, 1999, 2002 by Ralf Baechle
+ */
+#ifndef _ASM_MMAN_H
+#define _ASM_MMAN_H
+
+/*
+ * Protections are chosen from these bits, OR'd together.  The
+ * implementation does not necessarily support PROT_EXEC or PROT_WRITE
+ * without PROT_READ.  The only guarantees are that no writing will be
+ * allowed without PROT_WRITE and no access will be allowed for PROT_NONE.
+ */
+#define PROT_NONE	0x00		/* page can not be accessed */
+#define PROT_READ	0x01		/* page can be read */
+#define PROT_WRITE	0x02		/* page can be written */
+#define PROT_EXEC	0x04		/* page can be executed */
+/*			0x08		   reserved for PROT_EXEC_NOFLUSH */
+#define PROT_SEM	0x10		/* page may be used for atomic ops */
+#define PROT_GROWSDOWN	0x01000000	/* mprotect flag: extend change to start of growsdown vma */
+#define PROT_GROWSUP	0x02000000	/* mprotect flag: extend change to end of growsup vma */
+
+/*
+ * Flags for mmap
+ */
+#define MAP_SHARED	0x001		/* Share changes */
+#define MAP_PRIVATE	0x002		/* Changes are private */
+#define MAP_SHARED_VALIDATE 0x003	/* share + validate extension flags */
+#define MAP_TYPE	0x00f		/* Mask for type of mapping */
+#define MAP_FIXED	0x010		/* Interpret addr exactly */
+
+/* not used by linux, but here to make sure we don't clash with ABI defines */
+#define MAP_RENAME	0x020		/* Assign page to file */
+#define MAP_AUTOGROW	0x040		/* File may grow by writing */
+#define MAP_LOCAL	0x080		/* Copy on fork/sproc */
+#define MAP_AUTORSRV	0x100		/* Logical swap reserved on demand */
+
+/* These are linux-specific */
+#define MAP_NORESERVE	0x0400		/* don't check for reservations */
+#define MAP_ANONYMOUS	0x0800		/* don't use a file */
+#define MAP_GROWSDOWN	0x1000		/* stack-like segment */
+#define MAP_DENYWRITE	0x2000		/* ETXTBSY */
+#define MAP_EXECUTABLE	0x4000		/* mark it as an executable */
+#define MAP_LOCKED	0x8000		/* pages are locked */
+#define MAP_POPULATE	0x10000		/* populate (prefault) pagetables */
+#define MAP_NONBLOCK	0x20000		/* do not block on IO */
+#define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
+#define MAP_HUGETLB	0x80000		/* create a huge page mapping */
+#define MAP_FIXED_NOREPLACE 0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
+
+/*
+ * Flags for msync
+ */
+#define MS_ASYNC	0x0001		/* sync memory asynchronously */
+#define MS_INVALIDATE	0x0002		/* invalidate mappings & caches */
+#define MS_SYNC		0x0004		/* synchronous memory sync */
+
+/*
+ * Flags for mlockall
+ */
+#define MCL_CURRENT	1		/* lock all current mappings */
+#define MCL_FUTURE	2		/* lock all future mappings */
+#define MCL_ONFAULT	4		/* lock all pages that are faulted in */
+
+/*
+ * Flags for mlock
+ */
+#define MLOCK_ONFAULT	0x01		/* Lock pages in range after they are faulted in, do not prefault */
+
+#define MADV_NORMAL	0		/* no further special treatment */
+#define MADV_RANDOM	1		/* expect random page references */
+#define MADV_SEQUENTIAL 2		/* expect sequential page references */
+#define MADV_WILLNEED	3		/* will need these pages */
+#define MADV_DONTNEED	4		/* don't need these pages */
+
+/* common parameters: try to keep these consistent across architectures */
+#define MADV_FREE	8		/* free pages only if memory pressure */
+#define MADV_REMOVE	9		/* remove these pages & resources */
+#define MADV_DONTFORK	10		/* don't inherit across fork */
+#define MADV_DOFORK	11		/* do inherit across fork */
+
+#define MADV_MERGEABLE	 12		/* KSM may merge identical pages */
+#define MADV_UNMERGEABLE 13		/* KSM may not merge identical pages */
+#define MADV_HWPOISON	 100		/* poison a page for testing */
+
+#define MADV_HUGEPAGE	14		/* Worth backing with hugepages */
+#define MADV_NOHUGEPAGE 15		/* Not worth backing with hugepages */
+
+#define MADV_DONTDUMP	16		/* Explicity exclude from the core dump,
+					   overrides the coredump filter bits */
+#define MADV_DODUMP	17		/* Clear the MADV_NODUMP flag */
+
+#define MADV_WIPEONFORK 18		/* Zero memory on fork, child only */
+#define MADV_KEEPONFORK 19		/* Undo MADV_WIPEONFORK */
+
+/* compatibility flags */
+#define MAP_FILE	0
+
+#define PKEY_DISABLE_ACCESS	0x1
+#define PKEY_DISABLE_WRITE	0x2
+#define PKEY_ACCESS_MASK	(PKEY_DISABLE_ACCESS |\
+				 PKEY_DISABLE_WRITE)
+
+#endif /* _ASM_MMAN_H */
diff --git a/linux-headers/asm-powerpc/mman.h b/linux-headers/asm-powerpc/mman.h
new file mode 100644
index 0000000..1c2b3fc
--- /dev/null
+++ b/linux-headers/asm-powerpc/mman.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#ifndef _ASM_POWERPC_MMAN_H
+#define _ASM_POWERPC_MMAN_H
+
+#include <asm-generic/mman-common.h>
+
+
+#define PROT_SAO	0x10		/* Strong Access Ordering */
+
+#define MAP_RENAME      MAP_ANONYMOUS   /* In SunOS terminology */
+#define MAP_NORESERVE   0x40            /* don't reserve swap pages */
+#define MAP_LOCKED	0x80
+
+#define MAP_GROWSDOWN	0x0100		/* stack-like segment */
+#define MAP_DENYWRITE	0x0800		/* ETXTBSY */
+#define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
+
+#define MCL_CURRENT     0x2000          /* lock all currently mapped pages */
+#define MCL_FUTURE      0x4000          /* lock all additions to address space */
+#define MCL_ONFAULT	0x8000		/* lock all pages that are faulted in */
+
+#define MAP_POPULATE	0x8000		/* populate (prefault) pagetables */
+#define MAP_NONBLOCK	0x10000		/* do not block on IO */
+#define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
+#define MAP_HUGETLB	0x40000		/* create a huge page mapping */
+
+/* Override any generic PKEY permission defines */
+#define PKEY_DISABLE_EXECUTE   0x4
+#undef PKEY_ACCESS_MASK
+#define PKEY_ACCESS_MASK       (PKEY_DISABLE_ACCESS |\
+				PKEY_DISABLE_WRITE  |\
+				PKEY_DISABLE_EXECUTE)
+#endif /* _ASM_POWERPC_MMAN_H */
diff --git a/linux-headers/asm-s390/mman.h b/linux-headers/asm-s390/mman.h
new file mode 100644
index 0000000..8eebf89
--- /dev/null
+++ b/linux-headers/asm-s390/mman.h
@@ -0,0 +1 @@
+#include <asm-generic/mman.h>
diff --git a/linux-headers/asm-x86/mman.h b/linux-headers/asm-x86/mman.h
new file mode 100644
index 0000000..d4a8d04
--- /dev/null
+++ b/linux-headers/asm-x86/mman.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _ASM_X86_MMAN_H
+#define _ASM_X86_MMAN_H
+
+#define MAP_32BIT	0x40		/* only give out 32bit addresses */
+
+#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
+/*
+ * Take the 4 protection key bits out of the vma->vm_flags
+ * value and turn them in to the bits that we can put in
+ * to a pte.
+ *
+ * Only override these if Protection Keys are available
+ * (which is only on 64-bit).
+ */
+#define arch_vm_get_page_prot(vm_flags)	__pgprot(	\
+		((vm_flags) & VM_PKEY_BIT0 ? _PAGE_PKEY_BIT0 : 0) |	\
+		((vm_flags) & VM_PKEY_BIT1 ? _PAGE_PKEY_BIT1 : 0) |	\
+		((vm_flags) & VM_PKEY_BIT2 ? _PAGE_PKEY_BIT2 : 0) |	\
+		((vm_flags) & VM_PKEY_BIT3 ? _PAGE_PKEY_BIT3 : 0))
+
+#define arch_calc_vm_prot_bits(prot, key) (		\
+		((key) & 0x1 ? VM_PKEY_BIT0 : 0) |      \
+		((key) & 0x2 ? VM_PKEY_BIT1 : 0) |      \
+		((key) & 0x4 ? VM_PKEY_BIT2 : 0) |      \
+		((key) & 0x8 ? VM_PKEY_BIT3 : 0))
+#endif
+
+#include <asm-generic/mman.h>
+
+#endif /* _ASM_X86_MMAN_H */
diff --git a/linux-headers/linux/mman.h b/linux-headers/linux/mman.h
new file mode 100644
index 0000000..3c44b6f
--- /dev/null
+++ b/linux-headers/linux/mman.h
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _LINUX_MMAN_H
+#define _LINUX_MMAN_H
+
+#include <asm/mman.h>
+#include <asm-generic/hugetlb_encode.h>
+
+#define MREMAP_MAYMOVE	1
+#define MREMAP_FIXED	2
+
+#define OVERCOMMIT_GUESS		0
+#define OVERCOMMIT_ALWAYS		1
+#define OVERCOMMIT_NEVER		2
+
+/*
+ * Huge page size encoding when MAP_HUGETLB is specified, and a huge page
+ * size other than the default is desired.  See hugetlb_encode.h.
+ * All known huge page size encodings are provided here.  It is the
+ * responsibility of the application to know which sizes are supported on
+ * the running system.  See mmap(2) man page for details.
+ */
+#define MAP_HUGE_SHIFT	HUGETLB_FLAG_ENCODE_SHIFT
+#define MAP_HUGE_MASK	HUGETLB_FLAG_ENCODE_MASK
+
+#define MAP_HUGE_64KB	HUGETLB_FLAG_ENCODE_64KB
+#define MAP_HUGE_512KB	HUGETLB_FLAG_ENCODE_512KB
+#define MAP_HUGE_1MB	HUGETLB_FLAG_ENCODE_1MB
+#define MAP_HUGE_2MB	HUGETLB_FLAG_ENCODE_2MB
+#define MAP_HUGE_8MB	HUGETLB_FLAG_ENCODE_8MB
+#define MAP_HUGE_16MB	HUGETLB_FLAG_ENCODE_16MB
+#define MAP_HUGE_32MB	HUGETLB_FLAG_ENCODE_32MB
+#define MAP_HUGE_256MB	HUGETLB_FLAG_ENCODE_256MB
+#define MAP_HUGE_512MB	HUGETLB_FLAG_ENCODE_512MB
+#define MAP_HUGE_1GB	HUGETLB_FLAG_ENCODE_1GB
+#define MAP_HUGE_2GB	HUGETLB_FLAG_ENCODE_2GB
+#define MAP_HUGE_16GB	HUGETLB_FLAG_ENCODE_16GB
+
+#endif /* _LINUX_MMAN_H */
-- 
2.7.4


* [Qemu-devel] [PATCH V12 4/5] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  2019-02-06 17:25 [Qemu-devel] [PATCH V12 0/5] support MAP_SYNC for memory-backend-file Zhang, Yi
                   ` (2 preceding siblings ...)
  2019-02-06 17:27 ` [Qemu-devel] [PATCH V12 3/5] linux-headers: " Zhang, Yi
@ 2019-02-06 17:27 ` Zhang, Yi
  2019-02-06 18:25   ` Michael S. Tsirkin
  2019-02-06 17:27 ` [Qemu-devel] [PATCH V12 5/5] docs: Added MAP_SYNC documentation Zhang, Yi
  4 siblings, 1 reply; 11+ messages in thread
From: Zhang, Yi @ 2019-02-06 17:27 UTC (permalink / raw)
  To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, mst, ehabkost
  Cc: qemu-devel, imammedo, dan.j.williams, Zhang Yi

From: Zhang Yi <yi.z.zhang@linux.intel.com>

When a file supporting DAX is used as a vNVDIMM backend, additionally
mmap it with the MAP_SYNC flag, which ensures that file system metadata
is synced on each guest write to the backend file, without other QEMU
actions (e.g., periodic fsync() by QEMU).

Currently, we have the following possible use cases:

1. pmem=on is set, share=on is set, MAP_SYNC supported:
   a: backend is a file supporting DAX.
    - MAP_SYNC will be active.
   b: backend is not a file supporting DAX.
    - mmap will trigger a warning, then the MAP_SYNC flag will be ignored.

2. All other cases:
   - we will never pass MAP_SYNC to mmap(2).

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
---
 include/qemu/osdep.h |  7 +++++++
 util/mmap-alloc.c    | 24 +++++++++++++++++++++++-
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 457d24e..9a94cc3 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -115,6 +115,13 @@ extern int daemon(int, int);
 #include "sysemu/os-win32.h"
 #endif
 
+#ifdef CONFIG_LINUX
+#include <linux/mman.h>
+#else  /* !CONFIG_LINUX */
+#define MAP_SYNC              0x0
+#define MAP_SHARED_VALIDATE   0x0
+#endif /* CONFIG_LINUX */
+
 #ifdef CONFIG_POSIX
 #include "sysemu/os-posix.h"
 #endif
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 97bbeed..e4e55fc 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -15,6 +15,7 @@
 #include "qemu/host-utils.h"
 
 #define HUGETLBFS_MAGIC       0x958458f6
+#define MAP_SYNC_FLAGS        (MAP_SYNC | MAP_SHARED_VALIDATE)
 
 #ifdef CONFIG_LINUX
 #include <sys/vfs.h>
@@ -101,6 +102,7 @@ void *qemu_ram_mmap(int fd,
 #else
     void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
 #endif
+    int mmap_flags;
     size_t offset;
     void *ptr1;
 
@@ -111,13 +113,33 @@ void *qemu_ram_mmap(int fd,
     assert(is_power_of_2(align));
     /* Always align to host page size */
     assert(align >= getpagesize());
+    mmap_flags = shared ? MAP_SHARED : MAP_PRIVATE;
+    if (shared && is_pmem) {
+        mmap_flags |= MAP_SYNC_FLAGS;
+    }
 
     offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
     ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
                 MAP_FIXED |
                 (fd == -1 ? MAP_ANONYMOUS : 0) |
-                (shared ? MAP_SHARED : MAP_PRIVATE),
+                mmap_flags,
                 fd, 0);
+
+
+    if (ptr1 == MAP_FAILED &&
+        (mmap_flags & MAP_SYNC_FLAGS) == MAP_SYNC_FLAGS) {
+        if (errno == ENOTSUP) {
+            perror("failed to validate with mapping flags");
+        }
+        /* if map failed with MAP_SHARED_VALIDATE | MAP_SYNC,
+         * we will remove these flags to handle compatibility.
+         */
+        ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
+                    MAP_FIXED |
+                    (fd == -1 ? MAP_ANONYMOUS : 0) |
+                    MAP_SHARED,
+                    fd, 0);
+    }
     if (ptr1 == MAP_FAILED) {
         munmap(ptr, total);
         return MAP_FAILED;
-- 
2.7.4


* [Qemu-devel]  [PATCH V12 5/5] docs: Added MAP_SYNC documentation
  2019-02-06 17:25 [Qemu-devel] [PATCH V12 0/5] support MAP_SYNC for memory-backend-file Zhang, Yi
                   ` (3 preceding siblings ...)
  2019-02-06 17:27 ` [Qemu-devel] [PATCH V12 4/5] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap() Zhang, Yi
@ 2019-02-06 17:27 ` Zhang, Yi
  2019-02-06 18:29   ` Michael S. Tsirkin
  4 siblings, 1 reply; 11+ messages in thread
From: Zhang, Yi @ 2019-02-06 17:27 UTC (permalink / raw)
  To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, mst, ehabkost
  Cc: qemu-devel, imammedo, dan.j.williams, Zhang Yi

From: Zhang Yi <yi.z.zhang@linux.intel.com>

Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
---
 docs/nvdimm.txt | 25 ++++++++++++++++++++++---
 qemu-options.hx |  4 ++++
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index 5f158a6..e2bf89f 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -143,9 +143,28 @@ Guest Data Persistence
 ----------------------
 
 Though QEMU supports multiple types of vNVDIMM backends on Linux,
-currently the only one that can guarantee the guest write persistence
-is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
-which all guest access do not involve any host-side kernel cache.
+the only backend that can guarantee the guest write persistence is:
+
+A. DAX device (e.g., /dev/dax0.0, ) or
+B. DAX file(mounted with dax option)
+
+both are from the real NVDIMM device, all guest access do not
+involve any host-side kernel cache.
+
+When using B (A file supporting direct mapping of persistent memory)
+as a backend, write persistence is guaranteed if the host kernel has
+support for the MAP_SYNC flag in the mmap system call (available
+since Linux 4.15 and on certain distro kernels) and additionally
+both 'pmem' and 'share' flags are set to 'on' on the backend.
+
+If these conditions are not satisfied i.e. if either 'pmem' or 'share'
+are not set, if the backend file does not support DAX or if MAP_SYNC
+is not supported by the host kernel, write persistence is not
+guaranteed after a system crash. For compatibility reasons, these
+conditions are silently ignored if not satisfied. Currently, no way
+is provided to test for them.
+For more details, please reference mmap(2) man page:
+http://man7.org/linux/man-pages/man2/mmap.2.html.
 
 When using other types of backends, it's suggested to set 'unarmed'
 option of '-device nvdimm' to 'on', which sets the unarmed flag of the
diff --git a/qemu-options.hx b/qemu-options.hx
index 08f8516..0cd41f4 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4002,6 +4002,10 @@ using the SNIA NVM programming model (e.g. Intel NVDIMM).
 If @option{pmem} is set to 'on', QEMU will take necessary operations to
 guarantee the persistence of its own writes to @option{mem-path}
 (e.g. in vNVDIMM label emulation and live migration).
+Also, we will map the backend-file with MAP_SYNC flag, which can ensure
+the file metadata is in sync to @option{mem-path} in case of host crash
+or a power failure. MAP_SYNC requires support from both the host kernel
+(since Linux kernel 4.15) and @option{mem-path} (only files supporting DAX).
 
 @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
 
-- 
2.7.4


* Re: [Qemu-devel] [PATCH V12 4/5] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  2019-02-06 17:27 ` [Qemu-devel] [PATCH V12 4/5] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap() Zhang, Yi
@ 2019-02-06 18:25   ` Michael S. Tsirkin
  0 siblings, 0 replies; 11+ messages in thread
From: Michael S. Tsirkin @ 2019-02-06 18:25 UTC (permalink / raw)
  To: Zhang, Yi
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, ehabkost, qemu-devel, imammedo, dan.j.williams

On Thu, Feb 07, 2019 at 01:27:19AM +0800, Zhang, Yi wrote:
> From: Zhang Yi <yi.z.zhang@linux.intel.com>
> 
> When a file supporting DAX is used as a vNVDIMM backend, additionally
> mmap it with the MAP_SYNC flag, which ensures that file system metadata
> is synced on each guest write to the backend file, without other QEMU
> actions (e.g., periodic fsync() by QEMU).
> 
> Currently, we have the following possible use cases:
> 
> 1. pmem=on is set, share=on is set, MAP_SYNC supported:
>    a: backend is a file supporting DAX.
>     - MAP_SYNC will be active.
>    b: backend is not a file supporting DAX.
>     - mmap will trigger a warning, then the MAP_SYNC flag will be ignored.
> 
> 2. All other cases:
>    - we will never pass MAP_SYNC to mmap(2).
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> ---
>  include/qemu/osdep.h |  7 +++++++
>  util/mmap-alloc.c    | 24 +++++++++++++++++++++++-
>  2 files changed, 30 insertions(+), 1 deletion(-)
> 
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index 457d24e..9a94cc3 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -115,6 +115,13 @@ extern int daemon(int, int);
>  #include "sysemu/os-win32.h"
>  #endif
>  
> +#ifdef CONFIG_LINUX
> +#include <linux/mman.h>
> +#else  /* !CONFIG_LINUX */
> +#define MAP_SYNC              0x0
> +#define MAP_SHARED_VALIDATE   0x0
> +#endif /* CONFIG_LINUX */
> +
>  #ifdef CONFIG_POSIX
>  #include "sysemu/os-posix.h"
>  #endif

It's only used in one place. Maybe put this code in mmap-alloc.c?

> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> index 97bbeed..e4e55fc 100644
> --- a/util/mmap-alloc.c
> +++ b/util/mmap-alloc.c
> @@ -15,6 +15,7 @@
>  #include "qemu/host-utils.h"
>  
>  #define HUGETLBFS_MAGIC       0x958458f6
> +#define MAP_SYNC_FLAGS        (MAP_SYNC | MAP_SHARED_VALIDATE)
>  

Pls don't do this, just put it in a local variable within qemu_ram_mmap.
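
For illustration, a minimal sketch of the local-variable variant. The
helper name ram_mmap_flags() is hypothetical and not part of QEMU, and
the #ifndef fallbacks are an assumption, similar in spirit to the
osdep.h hunk quoted above:

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>

/* Assumption: on hosts whose headers lack these flags, defining them to 0
 * turns them into no-ops. */
#ifndef MAP_SYNC
#define MAP_SYNC 0
#endif
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0
#endif

/* Hypothetical helper, not a QEMU function: compute the sharing flags
 * that qemu_ram_mmap() would pass to mmap(2). */
int ram_mmap_flags(bool shared, bool is_pmem)
{
    /* Local variable instead of a file-scope MAP_SYNC_FLAGS macro. */
    int map_sync_flags = 0;
    int flags;

    if (shared && is_pmem) {
        map_sync_flags = MAP_SYNC | MAP_SHARED_VALIDATE;
    }
    flags = (shared ? MAP_SHARED : MAP_PRIVATE) | map_sync_flags;

    /* MAP_SYNC must never be requested for a private mapping. */
    assert(shared || (flags & MAP_SYNC) == 0);
    return flags;
}
```

Something along these lines keeps the flag combination next to its only
user; whether it ends up as a helper or inline in qemu_ram_mmap() is a
matter of taste.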

>  #ifdef CONFIG_LINUX
>  #include <sys/vfs.h>
> @@ -101,6 +102,7 @@ void *qemu_ram_mmap(int fd,
>  #else
>      void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
>  #endif
> +    int mmap_flags;
>      size_t offset;
>      void *ptr1;
>  
> @@ -111,13 +113,33 @@ void *qemu_ram_mmap(int fd,
>      assert(is_power_of_2(align));
>      /* Always align to host page size */
>      assert(align >= getpagesize());
> +    mmap_flags = shared ? MAP_SHARED : MAP_PRIVATE;
> +    if (shared && is_pmem) {
> +        mmap_flags |= MAP_SYNC_FLAGS;
> +    }
>  
>      offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
>      ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
>                  MAP_FIXED |
>                  (fd == -1 ? MAP_ANONYMOUS : 0) |
> -                (shared ? MAP_SHARED : MAP_PRIVATE),
> +                mmap_flags,
>                  fd, 0);
> +
> +
> +    if (ptr1 == MAP_FAILED &&
> +        (mmap_flags & MAP_SYNC_FLAGS) == MAP_SYNC_FLAGS) {
> +        if (errno == ENOTSUP) {
> +            perror("failed to validate with mapping flags");

I don't think this warning message makes sense.
Are you trying to say:
	Warning: requesting persistence across crashes
	for file XYZ failed. Proceeding without persistence,
	data might become corrupted in case of host crash.
?
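
A rough sketch of the fallback with that kind of message, outside of
QEMU. The function name map_with_sync_fallback() and its 'path'
parameter are hypothetical, used only for illustration, and the
fallback defines are again an assumption. Like the hunk above, it
retries on any failure of the first mmap():

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Assumption: headers without these flags get no-op definitions. */
#ifndef MAP_SYNC
#define MAP_SYNC 0
#endif
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0
#endif

/* Hypothetical helper: map 'size' bytes of 'fd' shared, asking for
 * MAP_SYNC first and retrying without it if that request fails. */
void *map_with_sync_fallback(int fd, size_t size, const char *path)
{
    int sync_flags = MAP_SHARED_VALIDATE | MAP_SYNC;
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | sync_flags, fd, 0);

    if (p == MAP_FAILED) {
        fprintf(stderr,
                "Warning: requesting persistence across crashes for file "
                "%s failed (%s). Proceeding without persistence, data "
                "might become corrupted in case of host crash.\n",
                path, strerror(errno));
        /* Retry as a plain shared mapping for compatibility. */
        p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    }
    return p;
}
```

The caller still has to check the result against MAP_FAILED, since the
second mmap() can fail as well.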

> +        }
> +        /* if map failed with MAP_SHARED_VALIDATE | MAP_SYNC,
> +         * we will remove these flags to handle compatibility.
> +         */
> +        ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
> +                    MAP_FIXED |
> +                    (fd == -1 ? MAP_ANONYMOUS : 0) |
> +                    MAP_SHARED,
> +                    fd, 0);
> +    }
>      if (ptr1 == MAP_FAILED) {
>          munmap(ptr, total);
>          return MAP_FAILED;
> -- 
> 2.7.4


* Re: [Qemu-devel] [PATCH V12 5/5] docs: Added MAP_SYNC documentation
  2019-02-06 17:27 ` [Qemu-devel] [PATCH V12 5/5] docs: Added MAP_SYNC documentation Zhang, Yi
@ 2019-02-06 18:29   ` Michael S. Tsirkin
  2019-02-07 15:16     ` Yi Zhang
  0 siblings, 1 reply; 11+ messages in thread
From: Michael S. Tsirkin @ 2019-02-06 18:29 UTC (permalink / raw)
  To: Zhang, Yi
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, ehabkost, qemu-devel, imammedo, dan.j.williams

On Thu, Feb 07, 2019 at 01:27:29AM +0800, Zhang, Yi wrote:
> From: Zhang Yi <yi.z.zhang@linux.intel.com>
> 
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> ---
>  docs/nvdimm.txt | 25 ++++++++++++++++++++++---
>  qemu-options.hx |  4 ++++
>  2 files changed, 26 insertions(+), 3 deletions(-)
> 
> diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> index 5f158a6..e2bf89f 100644
> --- a/docs/nvdimm.txt
> +++ b/docs/nvdimm.txt
> @@ -143,9 +143,28 @@ Guest Data Persistence
>  ----------------------
>  
>  Though QEMU supports multiple types of vNVDIMM backends on Linux,
> -currently the only one that can guarantee the guest write persistence
> -is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
> -which all guest access do not involve any host-side kernel cache.
> +the only backend that can guarantee the guest write persistence is:
> +
> +A. DAX device (e.g., /dev/dax0.0, ) or
> +B. DAX file(mounted with dax option)
> +
> +both are from the real NVDIMM device, all guest access do not
> +involve any host-side kernel cache.

I'm not sure - what do above 2 lines mean?
That cache must not be used if persistence is desired?

> +
> +When using B (A file supporting direct mapping of persistent memory)
> +as a backend, write persistence is guaranteed if the host kernel has
> +support for the MAP_SYNC flag in the mmap system call (available
> +since Linux 4.15 and on certain distro kernels) and additionally
> +both 'pmem' and 'share' flags are set to 'on' on the backend.
> +
> +If these conditions are not satisfied i.e. if either 'pmem' or 'share'
> +are not set, if the backend file does not support DAX or if MAP_SYNC
> +is not supported by the host kernel, write persistence is not
> +guaranteed after a system crash. For compatibility reasons, these
> +conditions are silently ignored if not satisfied. Currently, no way
> +is provided to test for them.
> +For more details, please reference mmap(2) man page:
> +http://man7.org/linux/man-pages/man2/mmap.2.html.
>  
>  When using other types of backends, it's suggested to set 'unarmed'
>  option of '-device nvdimm' to 'on', which sets the unarmed flag of the
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 08f8516..0cd41f4 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -4002,6 +4002,10 @@ using the SNIA NVM programming model (e.g. Intel NVDIMM).
>  If @option{pmem} is set to 'on', QEMU will take necessary operations to
>  guarantee the persistence of its own writes to @option{mem-path}
>  (e.g. in vNVDIMM label emulation and live migration).
> +Also, we will map the backend-file with MAP_SYNC flag, which can ensure

should be
	which ensures

> +the file metadata is in sync to @option{mem-path}


should be
	for @option{mem-path}

> in case of host crash
> +or a power failure. MAP_SYNC requires support from both the host kernel
> +(since Linux kernel 4.15) and @option{mem-path}


should be
	and the filesystem of @option{mem-path}

> (only files supporting DAX).
>  
>  @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
>  
> -- 
> 2.7.4


* Re: [Qemu-devel] [PATCH V12 5/5] docs: Added MAP_SYNC documentation
  2019-02-07 15:16     ` Yi Zhang
@ 2019-02-07 14:30       ` Michael S. Tsirkin
  2019-02-08 10:07         ` Yi Zhang
  0 siblings, 1 reply; 11+ messages in thread
From: Michael S. Tsirkin @ 2019-02-07 14:30 UTC (permalink / raw)
  To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, ehabkost, qemu-devel, imammedo, dan.j.williams

On Thu, Feb 07, 2019 at 11:16:05PM +0800, Yi Zhang wrote:
> On 2019-02-06 at 13:29:37 -0500, Michael S. Tsirkin wrote:
> > On Thu, Feb 07, 2019 at 01:27:29AM +0800, Zhang, Yi wrote:
> > > From: Zhang Yi <yi.z.zhang@linux.intel.com>
> > > 
> > > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > > ---
> > >  docs/nvdimm.txt | 25 ++++++++++++++++++++++---
> > >  qemu-options.hx |  4 ++++
> > >  2 files changed, 26 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> > > index 5f158a6..e2bf89f 100644
> > > --- a/docs/nvdimm.txt
> > > +++ b/docs/nvdimm.txt
> > > @@ -143,9 +143,28 @@ Guest Data Persistence
> > >  ----------------------
> > >  
> > >  Though QEMU supports multiple types of vNVDIMM backends on Linux,
> > > -currently the only one that can guarantee the guest write persistence
> > > -is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
> > > -which all guest access do not involve any host-side kernel cache.
> > > +the only backend that can guarantee the guest write persistence is:
> > > +
> > > +A. DAX device (e.g., /dev/dax0.0, ) or
> > > +B. DAX file(mounted with dax option)
> > > +
> > > +both are from the real NVDIMM device, all guest access do not
> > > +involve any host-side kernel cache.
> > 
> > I'm not sure - what do above 2 lines mean?
> > That cache must not be used if persistence is desired?
> same meaning of direct mapping of pmem, 
> Ah, Maybe better to change to:
> "both are backend from the real NVDIMM device, which supportting direct
> mapping of persistent memory." ?

Yes but typos aside it is still unclear - what is this? An extra
condition when persistence is guaranteed?


> > 
> > > +
> > > +When using B (A file supporting direct mapping of persistent memory)
> > > +as a backend, write persistence is guaranteed if the host kernel has
> > > +support for the MAP_SYNC flag in the mmap system call (available
> > > +since Linux 4.15 and on certain distro kernels) and additionally
> > > +both 'pmem' and 'share' flags are set to 'on' on the backend.
> > > +
> > > +If these conditions are not satisfied i.e. if either 'pmem' or 'share'
> > > +are not set, if the backend file does not support DAX or if MAP_SYNC
> > > +is not supported by the host kernel, write persistence is not
> > > +guaranteed after a system crash. For compatibility reasons, these
> > > +conditions are silently ignored if not satisfied. Currently, no way
> > > +is provided to test for them.
> > > +For more details, please reference mmap(2) man page:
> > > +http://man7.org/linux/man-pages/man2/mmap.2.html.
> > >  
> > >  When using other types of backends, it's suggested to set 'unarmed'
> > >  option of '-device nvdimm' to 'on', which sets the unarmed flag of the
> > > diff --git a/qemu-options.hx b/qemu-options.hx
> > > index 08f8516..0cd41f4 100644
> > > --- a/qemu-options.hx
> > > +++ b/qemu-options.hx
> > > @@ -4002,6 +4002,10 @@ using the SNIA NVM programming model (e.g. Intel NVDIMM).
> > >  If @option{pmem} is set to 'on', QEMU will take necessary operations to
> > >  guarantee the persistence of its own writes to @option{mem-path}
> > >  (e.g. in vNVDIMM label emulation and live migration).
> > > +Also, we will map the backend-file with MAP_SYNC flag, which can ensure
> > 
> > should be
> > 	which ensures
> > 
> > > +the file metadata is in sync to @option{mem-path}
> > 
> > 
> > should be
> > 	for @option{mem-path}
> > 
> > > in case of host crash
> > > +or a power failure. MAP_SYNC requires support from both the host kernel
> > > +(since Linux kernel 4.15) and @option{mem-path}
> > 
> > 
> > should be
> > 	and the filesystem of @option{mem-path}
> Thanks, will update it.
> > 
> > > (only files supporting DAX).
> > >  
> > >  @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
> > >  
> > > -- 
> > > 2.7.4


* Re: [Qemu-devel] [PATCH V12 5/5] docs: Added MAP_SYNC documentation
  2019-02-06 18:29   ` Michael S. Tsirkin
@ 2019-02-07 15:16     ` Yi Zhang
  2019-02-07 14:30       ` Michael S. Tsirkin
  0 siblings, 1 reply; 11+ messages in thread
From: Yi Zhang @ 2019-02-07 15:16 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, ehabkost, qemu-devel, imammedo, dan.j.williams

On 2019-02-06 at 13:29:37 -0500, Michael S. Tsirkin wrote:
> On Thu, Feb 07, 2019 at 01:27:29AM +0800, Zhang, Yi wrote:
> > From: Zhang Yi <yi.z.zhang@linux.intel.com>
> > 
> > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > ---
> >  docs/nvdimm.txt | 25 ++++++++++++++++++++++---
> >  qemu-options.hx |  4 ++++
> >  2 files changed, 26 insertions(+), 3 deletions(-)
> > 
> > diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> > index 5f158a6..e2bf89f 100644
> > --- a/docs/nvdimm.txt
> > +++ b/docs/nvdimm.txt
> > @@ -143,9 +143,28 @@ Guest Data Persistence
> >  ----------------------
> >  
> >  Though QEMU supports multiple types of vNVDIMM backends on Linux,
> > -currently the only one that can guarantee the guest write persistence
> > -is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
> > -which all guest access do not involve any host-side kernel cache.
> > +the only backend that can guarantee the guest write persistence is:
> > +
> > +A. DAX device (e.g., /dev/dax0.0, ) or
> > +B. DAX file(mounted with dax option)
> > +
> > +both are from the real NVDIMM device, all guest access do not
> > +involve any host-side kernel cache.
> 
> I'm not sure - what do above 2 lines mean?
> That cache must not be used if persistence is desired?
same meaning of direct mapping of pmem, 
Ah, Maybe better to change to:
"both are backend from the real NVDIMM device, which supportting direct
mapping of persistent memory." ?
> 
> > +
> > +When using B (A file supporting direct mapping of persistent memory)
> > +as a backend, write persistence is guaranteed if the host kernel has
> > +support for the MAP_SYNC flag in the mmap system call (available
> > +since Linux 4.15 and on certain distro kernels) and additionally
> > +both 'pmem' and 'share' flags are set to 'on' on the backend.
> > +
> > +If these conditions are not satisfied i.e. if either 'pmem' or 'share'
> > +are not set, if the backend file does not support DAX or if MAP_SYNC
> > +is not supported by the host kernel, write persistence is not
> > +guaranteed after a system crash. For compatibility reasons, these
> > +conditions are silently ignored if not satisfied. Currently, no way
> > +is provided to test for them.
> > +For more details, please reference mmap(2) man page:
> > +http://man7.org/linux/man-pages/man2/mmap.2.html.
> >  
> >  When using other types of backends, it's suggested to set 'unarmed'
> >  option of '-device nvdimm' to 'on', which sets the unarmed flag of the
> > diff --git a/qemu-options.hx b/qemu-options.hx
> > index 08f8516..0cd41f4 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -4002,6 +4002,10 @@ using the SNIA NVM programming model (e.g. Intel NVDIMM).
> >  If @option{pmem} is set to 'on', QEMU will take necessary operations to
> >  guarantee the persistence of its own writes to @option{mem-path}
> >  (e.g. in vNVDIMM label emulation and live migration).
> > +Also, we will map the backend-file with MAP_SYNC flag, which can ensure
> 
> should be
> 	which ensures
> 
> > +the file metadata is in sync to @option{mem-path}
> 
> 
> should be
> 	for @option{mem-path}
> 
> > in case of host crash
> > +or a power failure. MAP_SYNC requires support from both the host kernel
> > +(since Linux kernel 4.15) and @option{mem-path}
> 
> 
> should be
> 	and the filesystem of @option{mem-path}
Thanks, will update it.
> 
> > (only files supporting DAX).
> >  
> >  @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
> >  
> > -- 
> > 2.7.4


* Re: [Qemu-devel] [PATCH V12 5/5] docs: Added MAP_SYNC documentation
  2019-02-07 14:30       ` Michael S. Tsirkin
@ 2019-02-08 10:07         ` Yi Zhang
  0 siblings, 0 replies; 11+ messages in thread
From: Yi Zhang @ 2019-02-08 10:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	richardw.yang, ehabkost, qemu-devel, imammedo, dan.j.williams

On 2019-02-07 at 09:30:12 -0500, Michael S. Tsirkin wrote:
> On Thu, Feb 07, 2019 at 11:16:05PM +0800, Yi Zhang wrote:
> > On 2019-02-06 at 13:29:37 -0500, Michael S. Tsirkin wrote:
> > > On Thu, Feb 07, 2019 at 01:27:29AM +0800, Zhang, Yi wrote:
> > > > From: Zhang Yi <yi.z.zhang@linux.intel.com>
> > > > 
> > > > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > > > ---
> > > >  docs/nvdimm.txt | 25 ++++++++++++++++++++++---
> > > >  qemu-options.hx |  4 ++++
> > > >  2 files changed, 26 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> > > > index 5f158a6..e2bf89f 100644
> > > > --- a/docs/nvdimm.txt
> > > > +++ b/docs/nvdimm.txt
> > > > @@ -143,9 +143,28 @@ Guest Data Persistence
> > > >  ----------------------
> > > >  
> > > >  Though QEMU supports multiple types of vNVDIMM backends on Linux,
> > > > -currently the only one that can guarantee the guest write persistence
> > > > -is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
> > > > -which all guest access do not involve any host-side kernel cache.
> > > > +the only backend that can guarantee the guest write persistence is:
> > > > +
> > > > +A. DAX device (e.g., /dev/dax0.0, ) or
> > > > +B. DAX file(mounted with dax option)
> > > > +
> > > > +both are from the real NVDIMM device, all guest access do not
> > > > +involve any host-side kernel cache.
Yes, A and B are both based on direct access to files/devices (no page cache).
> > > 
> > > I'm not sure - what do above 2 lines mean?
> > > That cache must not be used if persistence is desired?
> > same meaning of direct mapping of pmem, 
> > Ah, Maybe better to change to:
> > "both are backend from the real NVDIMM device, which supportting direct
> > mapping of persistent memory." ?
> 
> Yes but typos aside it is still unclear - what is this? An extra
> condition when persistence is guaranteed?
> 
> 
> > > 
> > > > +
> > > > +When using B (A file supporting direct mapping of persistent memory)
> > > > +as a backend, write persistence is guaranteed if the host kernel has
> > > > +support for the MAP_SYNC flag in the mmap system call (available
> > > > +since Linux 4.15 and on certain distro kernels) and additionally
> > > > +both 'pmem' and 'share' flags are set to 'on' on the backend.
> > > > +
> > > > +If these conditions are not satisfied i.e. if either 'pmem' or 'share'
> > > > +are not set, if the backend file does not support DAX or if MAP_SYNC
> > > > +is not supported by the host kernel, write persistence is not
> > > > +guaranteed after a system crash. For compatibility reasons, these
> > > > +conditions are silently ignored if not satisfied. Currently, no way
> > > > +is provided to test for them.
> > > > +For more details, please reference mmap(2) man page:
> > > > +http://man7.org/linux/man-pages/man2/mmap.2.html.
> > > >  
> > > >  When using other types of backends, it's suggested to set 'unarmed'
> > > >  option of '-device nvdimm' to 'on', which sets the unarmed flag of the
> > > > diff --git a/qemu-options.hx b/qemu-options.hx
> > > > index 08f8516..0cd41f4 100644
> > > > --- a/qemu-options.hx
> > > > +++ b/qemu-options.hx
> > > > @@ -4002,6 +4002,10 @@ using the SNIA NVM programming model (e.g. Intel NVDIMM).
> > > >  If @option{pmem} is set to 'on', QEMU will take necessary operations to
> > > >  guarantee the persistence of its own writes to @option{mem-path}
> > > >  (e.g. in vNVDIMM label emulation and live migration).
> > > > +Also, we will map the backend-file with MAP_SYNC flag, which can ensure
> > > 
> > > should be
> > > 	which ensures
> > > 
> > > > +the file metadata is in sync to @option{mem-path}
> > > 
> > > 
> > > should be
> > > 	for @option{mem-path}
> > > 
> > > > in case of host crash
> > > > +or a power failure. MAP_SYNC requires support from both the host kernel
> > > > +(since Linux kernel 4.15) and @option{mem-path}
> > > 
> > > 
> > > should be
> > > 	and the filesystem of @option{mem-path}
> > Thanks, will update it.
> > > 
> > > > (only files supporting DAX).
> > > >  
> > > >  @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
> > > >  
> > > > -- 
> > > > 2.7.4
> 


Thread overview: 11+ messages
2019-02-06 17:25 [Qemu-devel] [PATCH V12 0/5] support MAP_SYNC for memory-backend-file Zhang, Yi
2019-02-06 17:26 ` [Qemu-devel] [PATCH V12 1/5] util/mmap-alloc: Add a 'is_pmem' parameter to qemu_ram_mmap Zhang, Yi
2019-02-06 17:27 ` [Qemu-devel] [PATCH V12 2/5] scripts/update-linux-headers: add linux/mman.h Zhang, Yi
2019-02-06 17:27 ` [Qemu-devel] [PATCH V12 3/5] linux-headers: " Zhang, Yi
2019-02-06 17:27 ` [Qemu-devel] [PATCH V12 4/5] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap() Zhang, Yi
2019-02-06 18:25   ` Michael S. Tsirkin
2019-02-06 17:27 ` [Qemu-devel] [PATCH V12 5/5] docs: Added MAP_SYNC documentation Zhang, Yi
2019-02-06 18:29   ` Michael S. Tsirkin
2019-02-07 15:16     ` Yi Zhang
2019-02-07 14:30       ` Michael S. Tsirkin
2019-02-08 10:07         ` Yi Zhang
