All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH V9 0/6] support MAP_SYNC for memory-backend-file
@ 2019-01-16  8:10 Zhang Yi
  2019-01-16  8:10 ` [Qemu-devel] [PATCH V9 1/6] numa: Fixed the memory leak of numa error message Zhang Yi
                   ` (5 more replies)
  0 siblings, 6 replies; 27+ messages in thread
From: Zhang Yi @ 2019-01-16  8:10 UTC (permalink / raw)
  To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang, mst,
	ehabkost
  Cc: qemu-devel, imammedo, dan.j.williams, Zhang Yi

Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
files on ext4/xfs file system mounted with '-o dax').

A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
    https://patchwork.kernel.org/patch/10028151/

In order to make sure that the file metadata is in sync after a fault 
while we are writing a shared DAX supporting backend files, this
patch-set enables QEMU to use MAP_SYNC flag for memory-backend-dax-file.

As the DAX vs DMA truncated issue was solved, we refined the code and
send out this feature for the v5 version.

We will pass MAP_SYNC to mmap(2); if MAP_SYNC is supported and
'share=on' & 'pmem=on'. 
Or QEMU will not pass this flag to mmap(2)

Changes in V9:
 * 1/6: Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
 * 2/6: New Added: Micheal: use sparse feature define RAM_FLAG. 
 since I don't have much knowledge about the sparse feature, @Micheal Could you 
 add some documentation/commit message on this patch? Thank you very much.
 * 3/6: from 2/5: Eduardo: updated the commit message. 
 * 4/6: from 3/5: Micheal: don't ignore MAP_SYNC failures silently.
 * 5/6: from 4/5: Eduardo: updated the commit message.
 * 6/6: from 5/5: Micheal: Drop the sync option, document the MAP_SYNC.

Changes in v8:
 * Micheal: 3/5, remove the duplicated define in the os_dep.h
 * Micheal: 2/5, make type define safety.
 * Micheal: 2/5, fixed the incorrect define MAP_SHARE on qemu_anon_ram_alloc.
 * 4/6 removed, we remove the on/off/auto define of sync,  as by now,
   MAP_SYNC only worked with pmem=on.
 * @Micheal, I still reuse the RAM_SYNC flag, it is much straightforward to parse 
   all the flags in one parameter.

Changes in v7:
 * Micheal: [3,4,6]/6 limited the "sync" flag only on a nvdimm backend.(pmem=on)

Changes in v6:
 * Pankaj: 3/7 are squashed with 2/7
 * Pankaj: 7/7 update comments to "consistent filesystem metadata".
 * Pankaj, Igor: 1/7 Added Reviewed-by in patch-1/7
 * Stefan, 4/7 move the include header from "/linux/mman.h" to "osdep.h"
 * Stefan, 5/7 Add missing "munmap"
 * Stefan, 2/7 refine the shared/flag.

Changes in v5:
 * Add patch 1 to fix a memory leak issue.
 * Refine the patch 4-6
 * Remove the patch 3 as we already change the parameter from "shared" to
   "flags"

Changes in v4:
 * Add patch 1-3 to switch some functions to a single 'flags'
   parameters. (Michael S. Tsirkin)
 * v3 patch 1-3 become v4 patch 4-6.
 * Patch 4: move definitions of MAP_SYNC and MAP_SHARED_VALIDATE to a
   new header file under include/standard-headers/linux/. (Michael S. Tsirkin)
 * Patch 6: refine the description of the 'sync' option. (Michael S. Tsirkin)

Changes in v3:
 * Patch 1: add MAP_SHARED_VALIDATE in both sync=on and sync=auto
   cases, and add back the retry mechanism. MAP_SYNC will be ignored
   by Linux kernel 4.15 if MAP_SHARED_VALIDATE is missed.
 * Patch 1: define MAP_SYNC and MAP_SHARED_VALIDATE as 0 on non-Linux
   platforms in order to make qemu_ram_mmap() compile on those platforms.
 * Patch 2&3: include more information in error messages of
   memory-backend in hope to help user to identify the error.
   (Dr. David Alan Gilbert)
 * Patch 3: fix typo in the commit message. (Dr. David Alan Gilbert)

Changes in v2:
 * Add 'sync' option to control the use of MAP_SYNC. (Eduardo Habkost)
 * Remove the unnecessary set of MAP_SHARED_VALIDATE in some cases and
   the retry mechanism in qemu_ram_mmap(). (Michael S. Tsirkin)
 * Move OS dependent definitions of MAP_SYNC and MAP_SHARED_VALIDATE
   to osdep.h. (Michael S. Tsirkin)

Zhang Yi (6):
  numa: Fixed the memory leak of numa error message
  memory: use sparse feature define RAM_FLAG.
  util/mmap-alloc: switch 'shared' to 'flags' parameter
  util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  hostmem: add more information in error messages
  docs: Added MAP_SYNC documentation

 backends/hostmem-file.c   |  6 ++++--
 backends/hostmem.c        |  8 +++++---
 docs/nvdimm.txt           | 21 ++++++++++++++++++++-
 exec.c                    |  7 ++++---
 include/exec/memory.h     | 12 ++++++------
 include/qemu/mmap-alloc.h | 20 +++++++++++++++++++-
 include/qemu/osdep.h      | 25 +++++++++++++++++++++++++
 numa.c                    |  1 +
 qemu-options.hx           |  4 ++++
 util/mmap-alloc.c         | 15 +++++++++++----
 util/oslib-posix.c        |  9 ++++++++-
 11 files changed, 107 insertions(+), 21 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH V9 1/6] numa: Fixed the memory leak of numa error message
  2019-01-16  8:10 [Qemu-devel] [PATCH V9 0/6] support MAP_SYNC for memory-backend-file Zhang Yi
@ 2019-01-16  8:10 ` Zhang Yi
  2019-01-16 15:56   ` Michael S. Tsirkin
  2019-01-16  8:10 ` [Qemu-devel] [PATCH V9 2/6] memory: use sparse feature define RAM_FLAG Zhang Yi
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 27+ messages in thread
From: Zhang Yi @ 2019-01-16  8:10 UTC (permalink / raw)
  To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang, mst,
	ehabkost
  Cc: qemu-devel, imammedo, dan.j.williams, Zhang Yi

object_get_canonical_path_component() returns a string which
must be freed using g_free().

Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
Reviewed-by: Pankaj gupta <pagupta@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
---
 numa.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/numa.c b/numa.c
index 50ec016..3875e1e 100644
--- a/numa.c
+++ b/numa.c
@@ -533,6 +533,7 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
             error_report("memory backend %s is used multiple times. Each "
                          "-numa option must use a different memdev value.",
                          path);
+            g_free(path);
             exit(1);
         }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH V9 2/6] memory: use sparse feature define RAM_FLAG.
  2019-01-16  8:10 [Qemu-devel] [PATCH V9 0/6] support MAP_SYNC for memory-backend-file Zhang Yi
  2019-01-16  8:10 ` [Qemu-devel] [PATCH V9 1/6] numa: Fixed the memory leak of numa error message Zhang Yi
@ 2019-01-16  8:10 ` Zhang Yi
  2019-01-16 15:55   ` Michael S. Tsirkin
  2019-01-16  8:10 ` [Qemu-devel] [PATCH V9 3/6] util/mmap-alloc: switch 'shared' to 'flags' parameter Zhang Yi
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 27+ messages in thread
From: Zhang Yi @ 2019-01-16  8:10 UTC (permalink / raw)
  To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang, mst,
	ehabkost
  Cc: qemu-devel, imammedo, dan.j.williams, Zhang Yi

Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/exec/memory.h | 12 ++++++------
 include/qemu/osdep.h  |  9 +++++++++
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 667466b..03824d9 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -104,27 +104,27 @@ struct IOMMUNotifier {
 typedef struct IOMMUNotifier IOMMUNotifier;
 
 /* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */
-#define RAM_PREALLOC   (1 << 0)
+#define RAM_PREALLOC ((QEMU_FORCE QemuMmapFlags) (1 << 0))
 
 /* RAM is mmap-ed with MAP_SHARED */
-#define RAM_SHARED     (1 << 1)
+#define RAM_SHARED ((QEMU_FORCE QemuMmapFlags) (1 << 1))
 
 /* Only a portion of RAM (used_length) is actually used, and migrated.
  * This used_length size can change across reboots.
  */
-#define RAM_RESIZEABLE (1 << 2)
+#define RAM_RESIZEABLE ((QEMU_FORCE QemuMmapFlags) (1 << 2))
 
 /* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically
  * zero the page and wake waiting processes.
  * (Set during postcopy)
  */
-#define RAM_UF_ZEROPAGE (1 << 3)
+#define RAM_UF_ZEROPAGE ((QEMU_FORCE QemuMmapFlags) (1 << 3))
 
 /* RAM can be migrated */
-#define RAM_MIGRATABLE (1 << 4)
+#define RAM_MIGRATABLE ((QEMU_FORCE QemuMmapFlags) (1 << 4))
 
 /* RAM is a persistent kind memory */
-#define RAM_PMEM (1 << 5)
+#define RAM_PMEM ((QEMU_FORCE QemuMmapFlags) (1 << 5))
 
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
                                        IOMMUNotifierFlag flags,
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 3bf48bc..457d24e 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -185,6 +185,15 @@ extern int daemon(int, int);
 #define ESHUTDOWN 4099
 #endif
 
+#ifdef __CHECKER__
+#define QEMU_BITWISE __attribute__((bitwise))
+#define QEMU_FORCE   __attribute__((force))
+#else
+#define QEMU_BITWISE
+#define QEMU_FORCE
+#endif
+
+typedef unsigned QEMU_BITWISE QemuMmapFlags;
 /* time_t may be either 32 or 64 bits depending on the host OS, and
  * can be either signed or unsigned, so we can't just hardcode a
  * specific maximum value. This is not a C preprocessor constant,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH V9 3/6] util/mmap-alloc: switch 'shared' to 'flags' parameter
  2019-01-16  8:10 [Qemu-devel] [PATCH V9 0/6] support MAP_SYNC for memory-backend-file Zhang Yi
  2019-01-16  8:10 ` [Qemu-devel] [PATCH V9 1/6] numa: Fixed the memory leak of numa error message Zhang Yi
  2019-01-16  8:10 ` [Qemu-devel] [PATCH V9 2/6] memory: use sparse feature define RAM_FLAG Zhang Yi
@ 2019-01-16  8:10 ` Zhang Yi
  2019-01-16 15:49   ` Michael S. Tsirkin
  2019-01-16  8:10 ` [Qemu-devel] [PATCH V9 4/6] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap() Zhang Yi
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 27+ messages in thread
From: Zhang Yi @ 2019-01-16  8:10 UTC (permalink / raw)
  To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang, mst,
	ehabkost
  Cc: qemu-devel, imammedo, dan.j.williams, Zhang Yi

As more flag parameters besides the existing 'shared' are going to be
added to qemu_ram_mmap() and qemu_ram_alloc_from_{file,fd}(), let's
switch 'shared' to a 'flags' parameter in advance, so as to ease the
further additions.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
---
 exec.c                    |  7 ++++---
 include/qemu/mmap-alloc.h | 19 ++++++++++++++++++-
 util/mmap-alloc.c         |  8 +++++---
 util/oslib-posix.c        |  9 ++++++++-
 4 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/exec.c b/exec.c
index bb6170d..e92a7da 100644
--- a/exec.c
+++ b/exec.c
@@ -1810,6 +1810,7 @@ static void *file_ram_alloc(RAMBlock *block,
                             ram_addr_t memory,
                             int fd,
                             bool truncate,
+                            uint32_t flags,
                             Error **errp)
 {
     void *area;
@@ -1859,8 +1860,7 @@ static void *file_ram_alloc(RAMBlock *block,
         perror("ftruncate");
     }
 
-    area = qemu_ram_mmap(fd, memory, block->mr->align,
-                         block->flags & RAM_SHARED);
+    area = qemu_ram_mmap(fd, memory, block->mr->align, flags);
     if (area == MAP_FAILED) {
         error_setg_errno(errp, errno,
                          "unable to map backing store for guest RAM");
@@ -2279,7 +2279,8 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
     new_block->used_length = size;
     new_block->max_length = size;
     new_block->flags = ram_flags;
-    new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp);
+    new_block->host = file_ram_alloc(new_block, size, fd, !file_size,
+            ram_flags, errp);
     if (!new_block->host) {
         g_free(new_block);
         return NULL;
diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index 50385e3..6fe6ed4 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -7,7 +7,24 @@ size_t qemu_fd_getpagesize(int fd);
 
 size_t qemu_mempath_getpagesize(const char *mem_path);
 
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared);
+/**
+ * qemu_ram_mmap: mmap the specified file or device.
+ *
+ * Parameters:
+ *  @fd: the file or the device to mmap
+ *  @size: the number of bytes to be mmaped
+ *  @align: if not zero, specify the alignment of the starting mapping address;
+ *          otherwise, the alignment in use will be determined by QEMU.
+ *  @flags: specifies additional properties of the mapping, which can be one or
+ *          bit-or of following values
+ *          - RAM_SHARED: mmap with MAP_SHARED flag
+ *          Other bits are ignored.
+ *
+ * Return:
+ *  On success, return a pointer to the mapped area.
+ *  On failure, return MAP_FAILED.
+ */
+void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags);
 
 void qemu_ram_munmap(void *ptr, size_t size);
 
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index fd329ec..8f0a740 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -13,6 +13,7 @@
 #include "qemu/osdep.h"
 #include "qemu/mmap-alloc.h"
 #include "qemu/host-utils.h"
+#include "exec/memory.h"
 
 #define HUGETLBFS_MAGIC       0x958458f6
 
@@ -75,7 +76,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
     return getpagesize();
 }
 
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
+void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
 {
     /*
      * Note: this always allocates at least one extra page of virtual address
@@ -92,11 +93,12 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
      * anonymous memory is OK.
      */
     int anonfd = fd == -1 || qemu_fd_getpagesize(fd) == getpagesize() ? -1 : fd;
-    int flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
-    void *ptr = mmap(0, total, PROT_NONE, flags | MAP_PRIVATE, anonfd, 0);
+    int mmap_flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
+    void *ptr = mmap(0, total, PROT_NONE, mmap_flags | MAP_PRIVATE, anonfd, 0);
 #else
     void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
 #endif
+    bool shared = flags & RAM_SHARED;
     size_t offset;
     void *ptr1;
 
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index fbd0dc8..75a0171 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -54,6 +54,7 @@
 #endif
 
 #include "qemu/mmap-alloc.h"
+#include "exec/memory.h"
 
 #ifdef CONFIG_DEBUG_STACK_USAGE
 #include "qemu/error-report.h"
@@ -203,7 +204,13 @@ void *qemu_memalign(size_t alignment, size_t size)
 void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
 {
     size_t align = QEMU_VMALLOC_ALIGN;
-    void *ptr = qemu_ram_mmap(-1, size, align, shared);
+    uint32_t flags = 0;
+    void *ptr;
+
+    if (shared) {
+        flags = RAM_SHARED;
+    }
+    ptr = qemu_ram_mmap(-1, size, align, flags);
 
     if (ptr == MAP_FAILED) {
         return NULL;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH V9 4/6] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  2019-01-16  8:10 [Qemu-devel] [PATCH V9 0/6] support MAP_SYNC for memory-backend-file Zhang Yi
                   ` (2 preceding siblings ...)
  2019-01-16  8:10 ` [Qemu-devel] [PATCH V9 3/6] util/mmap-alloc: switch 'shared' to 'flags' parameter Zhang Yi
@ 2019-01-16  8:10 ` Zhang Yi
  2019-01-16 15:58   ` Michael S. Tsirkin
  2019-01-16  8:11 ` [Qemu-devel] [PATCH V9 5/6] hostmem: add more information in error messages Zhang Yi
  2019-01-16  8:11 ` [Qemu-devel] [PATCH V9 6/6] docs: Added MAP_SYNC documentation Zhang Yi
  5 siblings, 1 reply; 27+ messages in thread
From: Zhang Yi @ 2019-01-16  8:10 UTC (permalink / raw)
  To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang, mst,
	ehabkost
  Cc: qemu-devel, imammedo, dan.j.williams, Zhang Yi

When a file supporting DAX is used as vNVDIMM backend, mmap it with
MAP_SYNC flag in addition which can ensure file system metadata
synced in each guest writes to the backend file, without other QEMU
actions (e.g., periodic fsync() by QEMU).

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
---
 include/qemu/mmap-alloc.h |  1 +
 include/qemu/osdep.h      | 16 ++++++++++++++++
 util/mmap-alloc.c         |  7 ++++++-
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index 6fe6ed4..a95d91c 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -18,6 +18,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path);
  *  @flags: specifies additional properties of the mapping, which can be one or
  *          bit-or of following values
  *          - RAM_SHARED: mmap with MAP_SHARED flag
+ *          - RAM_PMEM: mmap with MAP_SYNC flag
  *          Other bits are ignored.
  *
  * Return:
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 457d24e..27a6bfe 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -419,6 +419,22 @@ void qemu_anon_ram_free(void *ptr, size_t size);
 #  define QEMU_VMALLOC_ALIGN getpagesize()
 #endif
 
+/*
+ * MAP_SHARED_VALIDATE and MAP_SYNC are introduced in Linux kernel
+ * 4.15, so they may not be defined when compiling on older kernels.
+ */
+#ifdef CONFIG_LINUX
+
+#include <asm-generic/mman.h>
+
+#ifndef MAP_SYNC
+#define MAP_SYNC 0x0
+#endif
+
+#else  /* !CONFIG_LINUX */
+#define MAP_SYNC              0x0
+#endif /* CONFIG_LINUX */
+
 #ifdef CONFIG_POSIX
 struct qemu_signalfd_siginfo {
     uint32_t ssi_signo;   /* Signal number */
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 8f0a740..cba961c 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -99,6 +99,8 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
     void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
 #endif
     bool shared = flags & RAM_SHARED;
+    bool is_pmem = flags & RAM_PMEM;
+    int mmap_xflags = 0;
     size_t offset;
     void *ptr1;
 
@@ -109,12 +111,15 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
     assert(is_power_of_2(align));
     /* Always align to host page size */
     assert(align >= getpagesize());
+    if (shared && is_pmem) {
+        mmap_xflags |= MAP_SYNC;
+    }
 
     offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
     ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
                 MAP_FIXED |
                 (fd == -1 ? MAP_ANONYMOUS : 0) |
-                (shared ? MAP_SHARED : MAP_PRIVATE),
+                (shared ? MAP_SHARED : MAP_PRIVATE) | mmap_xflags,
                 fd, 0);
     if (ptr1 == MAP_FAILED) {
         munmap(ptr, total);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH V9 5/6] hostmem: add more information in error messages
  2019-01-16  8:10 [Qemu-devel] [PATCH V9 0/6] support MAP_SYNC for memory-backend-file Zhang Yi
                   ` (3 preceding siblings ...)
  2019-01-16  8:10 ` [Qemu-devel] [PATCH V9 4/6] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap() Zhang Yi
@ 2019-01-16  8:11 ` Zhang Yi
  2019-01-16 11:30   ` Stefano Garzarella
  2019-01-16  8:11 ` [Qemu-devel] [PATCH V9 6/6] docs: Added MAP_SYNC documentation Zhang Yi
  5 siblings, 1 reply; 27+ messages in thread
From: Zhang Yi @ 2019-01-16  8:11 UTC (permalink / raw)
  To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang, mst,
	ehabkost
  Cc: qemu-devel, imammedo, dan.j.williams, Zhang Yi

When there are multiple memory backends in use, including the object type
name and the property name in the error message can help users to locate
the error.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
---
 backends/hostmem-file.c | 6 ++++--
 backends/hostmem.c      | 8 +++++---
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index e640749..0dd7a90 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -82,7 +82,8 @@ static void set_mem_path(Object *o, const char *str, Error **errp)
     HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
 
     if (host_memory_backend_mr_inited(backend)) {
-        error_setg(errp, "cannot change property value");
+        error_setg(errp, "cannot change property 'mem-path' of %s",
+                   object_get_typename(o));
         return;
     }
     g_free(fb->mem_path);
@@ -120,7 +121,8 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
     uint64_t val;
 
     if (host_memory_backend_mr_inited(backend)) {
-        error_setg(&local_err, "cannot change property value");
+        error_setg(&local_err, "cannot change property '%s' of %s",
+                   name, object_get_typename(o));
         goto out;
     }
 
diff --git a/backends/hostmem.c b/backends/hostmem.c
index 1a89342..e2bcf9f 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -47,7 +47,8 @@ host_memory_backend_set_size(Object *obj, Visitor *v, const char *name,
     uint64_t value;
 
     if (host_memory_backend_mr_inited(backend)) {
-        error_setg(&local_err, "cannot change property value");
+        error_setg(&local_err, "cannot change property %s of %s ",
+                   name, object_get_typename(obj));
         goto out;
     }
 
@@ -56,8 +57,9 @@ host_memory_backend_set_size(Object *obj, Visitor *v, const char *name,
         goto out;
     }
     if (!value) {
-        error_setg(&local_err, "Property '%s.%s' doesn't take value '%"
-                   PRIu64 "'", object_get_typename(obj), name, value);
+        error_setg(&local_err,
+                   "property '%s' of %s doesn't take value '%" PRIu64 "'",
+                   name, object_get_typename(obj), value);
         goto out;
     }
     backend->size = value;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel]  [PATCH V9 6/6] docs: Added MAP_SYNC documentation
  2019-01-16  8:10 [Qemu-devel] [PATCH V9 0/6] support MAP_SYNC for memory-backend-file Zhang Yi
                   ` (4 preceding siblings ...)
  2019-01-16  8:11 ` [Qemu-devel] [PATCH V9 5/6] hostmem: add more information in error messages Zhang Yi
@ 2019-01-16  8:11 ` Zhang Yi
  2019-01-16 15:40   ` Michael S. Tsirkin
  5 siblings, 1 reply; 27+ messages in thread
From: Zhang Yi @ 2019-01-16  8:11 UTC (permalink / raw)
  To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang, mst,
	ehabkost
  Cc: qemu-devel, imammedo, dan.j.williams, Zhang Yi

Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
---
 docs/nvdimm.txt | 21 ++++++++++++++++++++-
 qemu-options.hx |  4 ++++
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index 5f158a6..565ba73 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -142,11 +142,30 @@ backend of vNVDIMM:
 Guest Data Persistence
 ----------------------
 
+vNVDIMM is designed and implemented to guarantee the guest data
+persistence on the backends even on the host crash and power
+failures. However, there are still some requirements and limitations
+as explained below.
+
 Though QEMU supports multiple types of vNVDIMM backends on Linux,
-currently the only one that can guarantee the guest write persistence
+if MAP_SYNC is not supported by the host kernel and the backends,
+the only backend that can guarantee the guest write persistence
 is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
 which all guest access do not involve any host-side kernel cache.
 
+mmap(2) flag MAP_SYNC is added since Linux kernel 4.15. On such
+systems, QEMU can mmap(2) the backend with MAP_SYNC, which can ensure
+filesystem metadata consistent even after a system crash or power
+failure. Besides the host kernel support, enabling MAP_SYNC in QEMU
+also requires:
+
+ - the backend is a file supporting DAX, e.g., a file on an ext4 or
+   xfs file system mounted with '-o dax',
+
+ - 'share' option of memory-backend-file is 'on'.
+
+ - 'pmem' option of memory-backend-file is 'on'
+
 When using other types of backends, it's suggested to set 'unarmed'
 option of '-device nvdimm' to 'on', which sets the unarmed flag of the
 guest NVDIMM region mapping structure.  This unarmed flag indicates
diff --git a/qemu-options.hx b/qemu-options.hx
index 08f8516..545cb8a 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4002,6 +4002,10 @@ using the SNIA NVM programming model (e.g. Intel NVDIMM).
 If @option{pmem} is set to 'on', QEMU will take necessary operations to
 guarantee the persistence of its own writes to @option{mem-path}
 (e.g. in vNVDIMM label emulation and live migration).
+Also, we will map the backend-file with MAP_SYNC flag, which can ensure
+the file metadata is in sync to @option{mem-path} even on the host crash
+and power failures. MAP_SYNC requires supports from both the host kernel
+(since Linux kernel 4.15) and @option{mem-path} (only files supporting DAX).
 
 @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V9 5/6] hostmem: add more information in error messages
  2019-01-16  8:11 ` [Qemu-devel] [PATCH V9 5/6] hostmem: add more information in error messages Zhang Yi
@ 2019-01-16 11:30   ` Stefano Garzarella
  0 siblings, 0 replies; 27+ messages in thread
From: Stefano Garzarella @ 2019-01-16 11:30 UTC (permalink / raw)
  To: Zhang Yi
  Cc: xiaoguangrong.eric, Stefan Hajnoczi, Paolo Bonzini, pagupta,
	yu.c.zhang, Michael Tsirkin, Eduardo Habkost, Igor Mammedov,
	dan.j.williams, qemu-devel

On Wed, Jan 16, 2019 at 9:17 AM Zhang Yi <yi.z.zhang@linux.intel.com> wrote:
>
> When there are multiple memory backends in use, including the object type
> name and the property name in the error message can help users to locate
> the error.
>
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
> ---
>  backends/hostmem-file.c | 6 ++++--
>  backends/hostmem.c      | 8 +++++---
>  2 files changed, 9 insertions(+), 5 deletions(-)
>

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V9 6/6] docs: Added MAP_SYNC documentation
  2019-01-16  8:11 ` [Qemu-devel] [PATCH V9 6/6] docs: Added MAP_SYNC documentation Zhang Yi
@ 2019-01-16 15:40   ` Michael S. Tsirkin
  2019-01-21  6:11     ` Yi Zhang
  0 siblings, 1 reply; 27+ messages in thread
From: Michael S. Tsirkin @ 2019-01-16 15:40 UTC (permalink / raw)
  To: Zhang Yi
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	ehabkost, qemu-devel, imammedo, dan.j.williams

On Wed, Jan 16, 2019 at 04:11:16PM +0800, Zhang Yi wrote:
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> ---
>  docs/nvdimm.txt | 21 ++++++++++++++++++++-
>  qemu-options.hx |  4 ++++
>  2 files changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> index 5f158a6..565ba73 100644
> --- a/docs/nvdimm.txt
> +++ b/docs/nvdimm.txt
> @@ -142,11 +142,30 @@ backend of vNVDIMM:
>  Guest Data Persistence
>  ----------------------
>  
> +vNVDIMM is designed and implemented to guarantee the guest data
> +persistence on the backends even on the host crash and power

in case of a host crash or a power failure

(be consistent)

> +failures. However, there are still some requirements and limitations
> +as explained below.
> +
>  Though QEMU supports multiple types of vNVDIMM backends on Linux,
> -currently the only one that can guarantee the guest write persistence
> +if MAP_SYNC is not supported by the host kernel and the backends,
> +the only backend that can guarantee the guest write persistence
>  is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
>  which all guest access do not involve any host-side kernel cache.
>  
> +mmap(2) flag MAP_SYNC is added since Linux kernel 4.15. On such
> +systems, QEMU can mmap(2) the backend with MAP_SYNC, which can ensure

which ensures

> +filesystem metadata consistent

consistency

> even after a system crash or power

a power

> +failure. Besides the host kernel support, enabling MAP_SYNC in QEMU
> +also requires:
> +
> + - the backend is a file supporting DAX, e.g., a file on an ext4 or
> +   xfs file system mounted with '-o dax',
> +
> + - 'share' option of memory-backend-file is 'on'.

why does share need to be on?

> +
> + - 'pmem' option of memory-backend-file is 'on'
> +
>  When using other types of backends, it's suggested to set 'unarmed'
>  option of '-device nvdimm' to 'on', which sets the unarmed flag of the
>  guest NVDIMM region mapping structure.  This unarmed flag indicates
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 08f8516..545cb8a 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -4002,6 +4002,10 @@ using the SNIA NVM programming model (e.g. Intel NVDIMM).
>  If @option{pmem} is set to 'on', QEMU will take necessary operations to
>  guarantee the persistence of its own writes to @option{mem-path}
>  (e.g. in vNVDIMM label emulation and live migration).
> +Also, we will map the backend-file with MAP_SYNC flag, which can ensure
> +the file metadata is in sync to @option{mem-path} even on the host crash
> +and power failures. MAP_SYNC requires supports from both the host kernel

requires support

> +(since Linux kernel 4.15) and @option{mem-path} (only files supporting DAX).
>  
>  @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
>  
> -- 
> 2.7.4

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V9 3/6] util/mmap-alloc: switch 'shared' to 'flags' parameter
  2019-01-16  8:10 ` [Qemu-devel] [PATCH V9 3/6] util/mmap-alloc: switch 'shared' to 'flags' parameter Zhang Yi
@ 2019-01-16 15:49   ` Michael S. Tsirkin
  0 siblings, 0 replies; 27+ messages in thread
From: Michael S. Tsirkin @ 2019-01-16 15:49 UTC (permalink / raw)
  To: Zhang Yi
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	ehabkost, qemu-devel, imammedo, dan.j.williams

On Wed, Jan 16, 2019 at 04:10:45PM +0800, Zhang Yi wrote:
> As more flag parameters besides the existing 'shared' are going to be
> added to qemu_ram_mmap() and qemu_ram_alloc_from_{file,fd}(), let's
> switch 'shared' to a 'flags' parameter in advance, so as to ease the
> further additions.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> ---
>  exec.c                    |  7 ++++---
>  include/qemu/mmap-alloc.h | 19 ++++++++++++++++++-
>  util/mmap-alloc.c         |  8 +++++---
>  util/oslib-posix.c        |  9 ++++++++-
>  4 files changed, 35 insertions(+), 8 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index bb6170d..e92a7da 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1810,6 +1810,7 @@ static void *file_ram_alloc(RAMBlock *block,
>                              ram_addr_t memory,
>                              int fd,
>                              bool truncate,
> +                            uint32_t flags,
>                              Error **errp)
>  {
>      void *area;


But given that flags is an un-annotated uint32_t does sparse
actually do any checks?

I'm lost ...

> @@ -1859,8 +1860,7 @@ static void *file_ram_alloc(RAMBlock *block,
>          perror("ftruncate");
>      }
>  
> -    area = qemu_ram_mmap(fd, memory, block->mr->align,
> -                         block->flags & RAM_SHARED);
> +    area = qemu_ram_mmap(fd, memory, block->mr->align, flags);
>      if (area == MAP_FAILED) {
>          error_setg_errno(errp, errno,
>                           "unable to map backing store for guest RAM");
> @@ -2279,7 +2279,8 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
>      new_block->used_length = size;
>      new_block->max_length = size;
>      new_block->flags = ram_flags;
> -    new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp);
> +    new_block->host = file_ram_alloc(new_block, size, fd, !file_size,
> +            ram_flags, errp);
>      if (!new_block->host) {
>          g_free(new_block);
>          return NULL;
> diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
> index 50385e3..6fe6ed4 100644
> --- a/include/qemu/mmap-alloc.h
> +++ b/include/qemu/mmap-alloc.h
> @@ -7,7 +7,24 @@ size_t qemu_fd_getpagesize(int fd);
>  
>  size_t qemu_mempath_getpagesize(const char *mem_path);
>  
> -void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared);
> +/**
> + * qemu_ram_mmap: mmap the specified file or device.
> + *
> + * Parameters:
> + *  @fd: the file or the device to mmap
> + *  @size: the number of bytes to be mmaped
> + *  @align: if not zero, specify the alignment of the starting mapping address;
> + *          otherwise, the alignment in use will be determined by QEMU.
> + *  @flags: specifies additional properties of the mapping, which can be one or
> + *          bit-or of following values
> + *          - RAM_SHARED: mmap with MAP_SHARED flag
> + *          Other bits are ignored.
> + *
> + * Return:
> + *  On success, return a pointer to the mapped area.
> + *  On failure, return MAP_FAILED.
> + */
> +void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags);
>  
>  void qemu_ram_munmap(void *ptr, size_t size);
>  
> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> index fd329ec..8f0a740 100644
> --- a/util/mmap-alloc.c
> +++ b/util/mmap-alloc.c
> @@ -13,6 +13,7 @@
>  #include "qemu/osdep.h"
>  #include "qemu/mmap-alloc.h"
>  #include "qemu/host-utils.h"
> +#include "exec/memory.h"
>  
>  #define HUGETLBFS_MAGIC       0x958458f6
>  
> @@ -75,7 +76,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
>      return getpagesize();
>  }
>  
> -void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
> +void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
>  {
>      /*
>       * Note: this always allocates at least one extra page of virtual address
> @@ -92,11 +93,12 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
>       * anonymous memory is OK.
>       */
>      int anonfd = fd == -1 || qemu_fd_getpagesize(fd) == getpagesize() ? -1 : fd;
> -    int flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
> -    void *ptr = mmap(0, total, PROT_NONE, flags | MAP_PRIVATE, anonfd, 0);
> +    int mmap_flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
> +    void *ptr = mmap(0, total, PROT_NONE, mmap_flags | MAP_PRIVATE, anonfd, 0);
>  #else
>      void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
>  #endif
> +    bool shared = flags & RAM_SHARED;
>      size_t offset;
>      void *ptr1;
>  
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index fbd0dc8..75a0171 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -54,6 +54,7 @@
>  #endif
>  
>  #include "qemu/mmap-alloc.h"
> +#include "exec/memory.h"
>  
>  #ifdef CONFIG_DEBUG_STACK_USAGE
>  #include "qemu/error-report.h"
> @@ -203,7 +204,13 @@ void *qemu_memalign(size_t alignment, size_t size)
>  void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
>  {
>      size_t align = QEMU_VMALLOC_ALIGN;
> -    void *ptr = qemu_ram_mmap(-1, size, align, shared);
> +    uint32_t flags = 0;
> +    void *ptr;
> +
> +    if (shared) {
> +        flags = RAM_SHARED;
> +    }
> +    ptr = qemu_ram_mmap(-1, size, align, flags);
>  
>      if (ptr == MAP_FAILED) {
>          return NULL;
> -- 
> 2.7.4

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V9 2/6] memory: use sparse feature define RAM_FLAG.
  2019-01-16  8:10 ` [Qemu-devel] [PATCH V9 2/6] memory: use sparse feature define RAM_FLAG Zhang Yi
@ 2019-01-16 15:55   ` Michael S. Tsirkin
  2019-01-21  6:35     ` Yi Zhang
  0 siblings, 1 reply; 27+ messages in thread
From: Michael S. Tsirkin @ 2019-01-16 15:55 UTC (permalink / raw)
  To: Zhang Yi
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	ehabkost, qemu-devel, imammedo, dan.j.williams

On Wed, Jan 16, 2019 at 04:10:29PM +0800, Zhang Yi wrote:
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

OK so if you apply this patch, do you get
any sparse warning at all?

If not how come?

And we need to ask patchew maintainers to 

Overall I'd suggest that since you no longer need
a new RAM flag, split this effort out and finish
MAP_SYNC work first.


> ---
>  include/exec/memory.h | 12 ++++++------
>  include/qemu/osdep.h  |  9 +++++++++
>  2 files changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 667466b..03824d9 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -104,27 +104,27 @@ struct IOMMUNotifier {
>  typedef struct IOMMUNotifier IOMMUNotifier;
>  
>  /* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */
> -#define RAM_PREALLOC   (1 << 0)
> +#define RAM_PREALLOC ((QEMU_FORCE QemuMmapFlags) (1 << 0))
>  
>  /* RAM is mmap-ed with MAP_SHARED */
> -#define RAM_SHARED     (1 << 1)
> +#define RAM_SHARED ((QEMU_FORCE QemuMmapFlags) (1 << 1))
>  
>  /* Only a portion of RAM (used_length) is actually used, and migrated.
>   * This used_length size can change across reboots.
>   */
> -#define RAM_RESIZEABLE (1 << 2)
> +#define RAM_RESIZEABLE ((QEMU_FORCE QemuMmapFlags) (1 << 2))
>  
>  /* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically
>   * zero the page and wake waiting processes.
>   * (Set during postcopy)
>   */
> -#define RAM_UF_ZEROPAGE (1 << 3)
> +#define RAM_UF_ZEROPAGE ((QEMU_FORCE QemuMmapFlags) (1 << 3))
>  
>  /* RAM can be migrated */
> -#define RAM_MIGRATABLE (1 << 4)
> +#define RAM_MIGRATABLE ((QEMU_FORCE QemuMmapFlags) (1 << 4))
>  
>  /* RAM is a persistent kind memory */
> -#define RAM_PMEM (1 << 5)
> +#define RAM_PMEM ((QEMU_FORCE QemuMmapFlags) (1 << 5))
>  
>  static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
>                                         IOMMUNotifierFlag flags,
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index 3bf48bc..457d24e 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -185,6 +185,15 @@ extern int daemon(int, int);
>  #define ESHUTDOWN 4099
>  #endif
>  
> +#ifdef __CHECKER__
> +#define QEMU_BITWISE __attribute__((bitwise))
> +#define QEMU_FORCE   __attribute__((force))
> +#else
> +#define QEMU_BITWISE
> +#define QEMU_FORCE
> +#endif
> +
> +typedef unsigned QEMU_BITWISE QemuMmapFlags;
>  /* time_t may be either 32 or 64 bits depending on the host OS, and
>   * can be either signed or unsigned, so we can't just hardcode a
>   * specific maximum value. This is not a C preprocessor constant,
> -- 
> 2.7.4

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V9 1/6] numa: Fixed the memory leak of numa error message
  2019-01-16  8:10 ` [Qemu-devel] [PATCH V9 1/6] numa: Fixed the memory leak of numa error message Zhang Yi
@ 2019-01-16 15:56   ` Michael S. Tsirkin
  2019-01-18 17:36     ` Eduardo Habkost
  0 siblings, 1 reply; 27+ messages in thread
From: Michael S. Tsirkin @ 2019-01-16 15:56 UTC (permalink / raw)
  To: Zhang Yi
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	ehabkost, qemu-devel, imammedo, dan.j.williams

On Wed, Jan 16, 2019 at 04:10:17PM +0800, Zhang Yi wrote:
> object_get_canonical_path_component() returns a string which
> must be freed using g_free().
> 
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> Reviewed-by: Pankaj gupta <pagupta@redhat.com>
> Reviewed-by: Igor Mammedov <imammedo@redhat.com>
> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>

Reviewed-by: Michael S. Tsirkin <mst@redhat.com>

Pls post this separately as it's just a bugfix that
is not a dependency of map_sync.

> ---
>  numa.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/numa.c b/numa.c
> index 50ec016..3875e1e 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -533,6 +533,7 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
>              error_report("memory backend %s is used multiple times. Each "
>                           "-numa option must use a different memdev value.",
>                           path);
> +            g_free(path);
>              exit(1);
>          }
>  
> -- 
> 2.7.4

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V9 4/6] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  2019-01-16  8:10 ` [Qemu-devel] [PATCH V9 4/6] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap() Zhang Yi
@ 2019-01-16 15:58   ` Michael S. Tsirkin
  2019-01-18 18:11     ` Eduardo Habkost
  0 siblings, 1 reply; 27+ messages in thread
From: Michael S. Tsirkin @ 2019-01-16 15:58 UTC (permalink / raw)
  To: Zhang Yi
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	ehabkost, qemu-devel, imammedo, dan.j.williams

On Wed, Jan 16, 2019 at 04:10:58PM +0800, Zhang Yi wrote:
> When a file supporting DAX is used as vNVDIMM backend, mmap it with
> MAP_SYNC flag in addition which can ensure file system metadata
> synced in each guest writes to the backend file, without other QEMU
> actions (e.g., periodic fsync() by QEMU).
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> ---
>  include/qemu/mmap-alloc.h |  1 +
>  include/qemu/osdep.h      | 16 ++++++++++++++++
>  util/mmap-alloc.c         |  7 ++++++-
>  3 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
> index 6fe6ed4..a95d91c 100644
> --- a/include/qemu/mmap-alloc.h
> +++ b/include/qemu/mmap-alloc.h
> @@ -18,6 +18,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path);
>   *  @flags: specifies additional properties of the mapping, which can be one or
>   *          bit-or of following values
>   *          - RAM_SHARED: mmap with MAP_SHARED flag
> + *          - RAM_PMEM: mmap with MAP_SYNC flag
>   *          Other bits are ignored.
>   *
>   * Return:
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index 457d24e..27a6bfe 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -419,6 +419,22 @@ void qemu_anon_ram_free(void *ptr, size_t size);
>  #  define QEMU_VMALLOC_ALIGN getpagesize()
>  #endif
>  
> +/*
> + * MAP_SHARED_VALIDATE and MAP_SYNC are introduced in Linux kernel
> + * 4.15, so they may not be defined when compiling on older kernels.
> + */
> +#ifdef CONFIG_LINUX
> +
> +#include <asm-generic/mman.h>

I suspect this is a wrong way to pull in this header.

You are normally supposed to use
       #include <linux/mman.h>

but see below.


> +
> +#ifndef MAP_SYNC
> +#define MAP_SYNC 0x0
> +#endif

Oh that's bad.

So if you run with a new kernel but
your installed headers are old, you get MAP_SYNC 0
and no persistence transparently with no warning.



> +
> +#else  /* !CONFIG_LINUX */
> +#define MAP_SYNC              0x0
> +#endif /* CONFIG_LINUX */
> +
>  #ifdef CONFIG_POSIX
>  struct qemu_signalfd_siginfo {
>      uint32_t ssi_signo;   /* Signal number */
> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> index 8f0a740..cba961c 100644
> --- a/util/mmap-alloc.c
> +++ b/util/mmap-alloc.c
> @@ -99,6 +99,8 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
>      void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
>  #endif
>      bool shared = flags & RAM_SHARED;
> +    bool is_pmem = flags & RAM_PMEM;
> +    int mmap_xflags = 0;
>      size_t offset;
>      void *ptr1;
>  
> @@ -109,12 +111,15 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
>      assert(is_power_of_2(align));
>      /* Always align to host page size */
>      assert(align >= getpagesize());
> +    if (shared && is_pmem) {
> +        mmap_xflags |= MAP_SYNC;
> +    }
>  
>      offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
>      ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
>                  MAP_FIXED |
>                  (fd == -1 ? MAP_ANONYMOUS : 0) |
> -                (shared ? MAP_SHARED : MAP_PRIVATE),
> +                (shared ? MAP_SHARED : MAP_PRIVATE) | mmap_xflags,
>                  fd, 0);
>      if (ptr1 == MAP_FAILED) {
>          munmap(ptr, total);
> -- 
> 2.7.4

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V9 1/6] numa: Fixed the memory leak of numa error message
  2019-01-16 15:56   ` Michael S. Tsirkin
@ 2019-01-18 17:36     ` Eduardo Habkost
  0 siblings, 0 replies; 27+ messages in thread
From: Eduardo Habkost @ 2019-01-18 17:36 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Zhang Yi, xiaoguangrong.eric, stefanha, pbonzini, pagupta,
	yu.c.zhang, qemu-devel, imammedo, dan.j.williams

On Wed, Jan 16, 2019 at 10:56:14AM -0500, Michael S. Tsirkin wrote:
> On Wed, Jan 16, 2019 at 04:10:17PM +0800, Zhang Yi wrote:
> > object_get_canonical_path_component() returns a string which
> > must be freed using g_free().
> > 
> > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > Reviewed-by: Pankaj gupta <pagupta@redhat.com>
> > Reviewed-by: Igor Mammedov <imammedo@redhat.com>
> > Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
> 
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> 
> Pls post this separately as it's just a bugfix that
> is not a dependency of map_sync.

I have already queued it on machine-next, there's no need to
resubmit.

-- 
Eduardo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V9 4/6] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  2019-01-16 15:58   ` Michael S. Tsirkin
@ 2019-01-18 18:11     ` Eduardo Habkost
  2019-01-21  5:15       ` Yi Zhang
  0 siblings, 1 reply; 27+ messages in thread
From: Eduardo Habkost @ 2019-01-18 18:11 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Zhang Yi, pagupta, qemu-devel, yu.c.zhang, stefanha, imammedo,
	pbonzini, dan.j.williams, xiaoguangrong.eric

On Wed, Jan 16, 2019 at 10:58:44AM -0500, Michael S. Tsirkin wrote:
> On Wed, Jan 16, 2019 at 04:10:58PM +0800, Zhang Yi wrote:
> > When a file supporting DAX is used as vNVDIMM backend, mmap it with
> > MAP_SYNC flag in addition which can ensure file system metadata
> > synced in each guest writes to the backend file, without other QEMU
> > actions (e.g., periodic fsync() by QEMU).
> > 
> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > ---
> >  include/qemu/mmap-alloc.h |  1 +
> >  include/qemu/osdep.h      | 16 ++++++++++++++++
> >  util/mmap-alloc.c         |  7 ++++++-
> >  3 files changed, 23 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
> > index 6fe6ed4..a95d91c 100644
> > --- a/include/qemu/mmap-alloc.h
> > +++ b/include/qemu/mmap-alloc.h
> > @@ -18,6 +18,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path);
> >   *  @flags: specifies additional properties of the mapping, which can be one or
> >   *          bit-or of following values
> >   *          - RAM_SHARED: mmap with MAP_SHARED flag
> > + *          - RAM_PMEM: mmap with MAP_SYNC flag
> >   *          Other bits are ignored.
> >   *
> >   * Return:
> > diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> > index 457d24e..27a6bfe 100644
> > --- a/include/qemu/osdep.h
> > +++ b/include/qemu/osdep.h
> > @@ -419,6 +419,22 @@ void qemu_anon_ram_free(void *ptr, size_t size);
> >  #  define QEMU_VMALLOC_ALIGN getpagesize()
> >  #endif
> >  
> > +/*
> > + * MAP_SHARED_VALIDATE and MAP_SYNC are introduced in Linux kernel
> > + * 4.15, so they may not be defined when compiling on older kernels.
> > + */
> > +#ifdef CONFIG_LINUX
> > +
> > +#include <asm-generic/mman.h>
> 
> I suspect this is a wrong way to pull in this header.
> 
> You are normally supposed to use
>        #include <linux/mman.h>
> 
> but see below.
> 
> 
> > +
> > +#ifndef MAP_SYNC
> > +#define MAP_SYNC 0x0
> > +#endif
> 
> Oh that's bad.
> 
> So if you run with a new kernel but
> your installed headers are old, you get MAP_SYNC 0
> and no persistence transparently with no warning.

Yes. The semantics of the command-line to not change depending on
build time circumstances.

Anyway, I see a more fundamental problem in each version of this
patch: the semantics of the command-line options are not clearly
documented.

We have at least 3 different possible use cases we might need to
support:

1) pmem=on, MAP_SYNC not desired
2) pmem=on, MAP_SYNC desired but optional
3) pmem=on, MAP_SYNC required, not optional

Which cases from the list above we need to support?

>From the cases above, what's the expected semantics of "pmem=on"
with no extra options?

If these questions are not answered (in the commit message and
user documentation), we won't be able to review and discuss the
code.


> 
> > +
> > +#else  /* !CONFIG_LINUX */
> > +#define MAP_SYNC              0x0
> > +#endif /* CONFIG_LINUX */
> > +
> >  #ifdef CONFIG_POSIX
> >  struct qemu_signalfd_siginfo {
> >      uint32_t ssi_signo;   /* Signal number */
> > diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> > index 8f0a740..cba961c 100644
> > --- a/util/mmap-alloc.c
> > +++ b/util/mmap-alloc.c
> > @@ -99,6 +99,8 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
> >      void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> >  #endif
> >      bool shared = flags & RAM_SHARED;
> > +    bool is_pmem = flags & RAM_PMEM;
> > +    int mmap_xflags = 0;
> >      size_t offset;
> >      void *ptr1;
> >  
> > @@ -109,12 +111,15 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
> >      assert(is_power_of_2(align));
> >      /* Always align to host page size */
> >      assert(align >= getpagesize());
> > +    if (shared && is_pmem) {
> > +        mmap_xflags |= MAP_SYNC;
> > +    }
> >  
> >      offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
> >      ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
> >                  MAP_FIXED |
> >                  (fd == -1 ? MAP_ANONYMOUS : 0) |
> > -                (shared ? MAP_SHARED : MAP_PRIVATE),
> > +                (shared ? MAP_SHARED : MAP_PRIVATE) | mmap_xflags,
> >                  fd, 0);
> >      if (ptr1 == MAP_FAILED) {
> >          munmap(ptr, total);
> > -- 
> > 2.7.4
> 

-- 
Eduardo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V9 4/6] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  2019-01-18 18:11     ` Eduardo Habkost
@ 2019-01-21  5:15       ` Yi Zhang
  2019-01-21 14:44         ` Eduardo Habkost
  0 siblings, 1 reply; 27+ messages in thread
From: Yi Zhang @ 2019-01-21  5:15 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Michael S. Tsirkin, pagupta, qemu-devel, yu.c.zhang, stefanha,
	imammedo, pbonzini, dan.j.williams, xiaoguangrong.eric

On 2019-01-18 at 16:11:47 -0200, Eduardo Habkost wrote:
> On Wed, Jan 16, 2019 at 10:58:44AM -0500, Michael S. Tsirkin wrote:
> > On Wed, Jan 16, 2019 at 04:10:58PM +0800, Zhang Yi wrote:
> > > When a file supporting DAX is used as vNVDIMM backend, mmap it with
> > > MAP_SYNC flag in addition which can ensure file system metadata
> > > synced in each guest writes to the backend file, without other QEMU
> > > actions (e.g., periodic fsync() by QEMU).
> > > 
> > > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > > ---
> > >  include/qemu/mmap-alloc.h |  1 +
> > >  include/qemu/osdep.h      | 16 ++++++++++++++++
> > >  util/mmap-alloc.c         |  7 ++++++-
> > >  3 files changed, 23 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
> > > index 6fe6ed4..a95d91c 100644
> > > --- a/include/qemu/mmap-alloc.h
> > > +++ b/include/qemu/mmap-alloc.h
> > > @@ -18,6 +18,7 @@ size_t qemu_mempath_getpagesize(const char *mem_path);
> > >   *  @flags: specifies additional properties of the mapping, which can be one or
> > >   *          bit-or of following values
> > >   *          - RAM_SHARED: mmap with MAP_SHARED flag
> > > + *          - RAM_PMEM: mmap with MAP_SYNC flag
> > >   *          Other bits are ignored.
> > >   *
> > >   * Return:
> > > diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> > > index 457d24e..27a6bfe 100644
> > > --- a/include/qemu/osdep.h
> > > +++ b/include/qemu/osdep.h
> > > @@ -419,6 +419,22 @@ void qemu_anon_ram_free(void *ptr, size_t size);
> > >  #  define QEMU_VMALLOC_ALIGN getpagesize()
> > >  #endif
> > >  
> > > +/*
> > > + * MAP_SHARED_VALIDATE and MAP_SYNC are introduced in Linux kernel
> > > + * 4.15, so they may not be defined when compiling on older kernels.
> > > + */
> > > +#ifdef CONFIG_LINUX
> > > +
> > > +#include <asm-generic/mman.h>
> > 
> > I suspect this is a wrong way to pull in this header.
> > 
> > You are normally supposed to use
> >        #include <linux/mman.h>
> > 
> > but see below.
> > 
> > 
> > > +
> > > +#ifndef MAP_SYNC
> > > +#define MAP_SYNC 0x0
> > > +#endif
> > 
> > Oh that's bad.
> > 
> > So if you run with a new kernel but
> > your installed headers are old, you get MAP_SYNC 0
> > and no persistence transparently with no warning.
> 
> Yes. The semantics of the command-line to not change depending on
> build time circumstances.
> 
> Anyway, I see a more fundamental problem in each version of this
> patch: the semantics of the command-line options are not clearly
> documented.
> 
> We have at least 3 different possible use cases we might need to
> support:
> 
> 1) pmem=on, MAP_SYNC not desired
> 2) pmem=on, MAP_SYNC desired but optional

Form V9, As Michael suggest, We removed the sync option, MAP_SYNC will
force on while we set pmem=on. So we only have 2 user cases, Will update
to user documentation.
1) pmem=on, MAP_SYNC not desired
We will not pass the flag to mmap2
2) pmem=on, MAP_SYNC desired
We will pass the flag to mmap2

> 3) pmem=on, MAP_SYNC required, not optional
> 
> Which cases from the list above we need to support?
> 
> From the cases above, what's the expected semantics of "pmem=on"
> with no extra options?

> 
> If these questions are not answered (in the commit message and
> user documentation), we won't be able to review and discuss the
> code.
> 
> 
> > 
> > > +
> > > +#else  /* !CONFIG_LINUX */
> > > +#define MAP_SYNC              0x0
> > > +#endif /* CONFIG_LINUX */
> > > +
> > >  #ifdef CONFIG_POSIX
> > >  struct qemu_signalfd_siginfo {
> > >      uint32_t ssi_signo;   /* Signal number */
> > > diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> > > index 8f0a740..cba961c 100644
> > > --- a/util/mmap-alloc.c
> > > +++ b/util/mmap-alloc.c
> > > @@ -99,6 +99,8 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
> > >      void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> > >  #endif
> > >      bool shared = flags & RAM_SHARED;
> > > +    bool is_pmem = flags & RAM_PMEM;
> > > +    int mmap_xflags = 0;
> > >      size_t offset;
> > >      void *ptr1;
> > >  
> > > @@ -109,12 +111,15 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, uint32_t flags)
> > >      assert(is_power_of_2(align));
> > >      /* Always align to host page size */
> > >      assert(align >= getpagesize());
> > > +    if (shared && is_pmem) {
> > > +        mmap_xflags |= MAP_SYNC;
> > > +    }
> > >  
> > >      offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
> > >      ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
> > >                  MAP_FIXED |
> > >                  (fd == -1 ? MAP_ANONYMOUS : 0) |
> > > -                (shared ? MAP_SHARED : MAP_PRIVATE),
> > > +                (shared ? MAP_SHARED : MAP_PRIVATE) | mmap_xflags,
> > >                  fd, 0);
> > >      if (ptr1 == MAP_FAILED) {
> > >          munmap(ptr, total);
> > > -- 
> > > 2.7.4
> > 
> 
> -- 
> Eduardo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V9 6/6] docs: Added MAP_SYNC documentation
  2019-01-16 15:40   ` Michael S. Tsirkin
@ 2019-01-21  6:11     ` Yi Zhang
  2019-01-21  7:21       ` Pankaj Gupta
  0 siblings, 1 reply; 27+ messages in thread
From: Yi Zhang @ 2019-01-21  6:11 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	ehabkost, qemu-devel, imammedo, dan.j.williams

On 2019-01-16 at 10:40:23 -0500, Michael S. Tsirkin wrote:
> On Wed, Jan 16, 2019 at 04:11:16PM +0800, Zhang Yi wrote:
> > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > ---
> >  docs/nvdimm.txt | 21 ++++++++++++++++++++-
> >  qemu-options.hx |  4 ++++
> >  2 files changed, 24 insertions(+), 1 deletion(-)
> > 
> > diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> > index 5f158a6..565ba73 100644
> > --- a/docs/nvdimm.txt
> > +++ b/docs/nvdimm.txt
> > @@ -142,11 +142,30 @@ backend of vNVDIMM:
> >  Guest Data Persistence
> >  ----------------------
> >  
> > +vNVDIMM is designed and implemented to guarantee the guest data
> > +persistence on the backends even on the host crash and power
> 
> in case of a host crash or a power failure
> 
> (be consistent)
> 
> > +failures. However, there are still some requirements and limitations
> > +as explained below.
> > +
> >  Though QEMU supports multiple types of vNVDIMM backends on Linux,
> > -currently the only one that can guarantee the guest write persistence
> > +if MAP_SYNC is not supported by the host kernel and the backends,
> > +the only backend that can guarantee the guest write persistence
> >  is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
> >  which all guest access do not involve any host-side kernel cache.
> >  
> > +mmap(2) flag MAP_SYNC is added since Linux kernel 4.15. On such
> > +systems, QEMU can mmap(2) the backend with MAP_SYNC, which can ensure
> 
> which ensures
> 
> > +filesystem metadata consistent
> 
> consistency
> 
> > even after a system crash or power
> 
> a power
> 
> > +failure. Besides the host kernel support, enabling MAP_SYNC in QEMU
> > +also requires:
> > +
> > + - the backend is a file supporting DAX, e.g., a file on an ext4 or
> > +   xfs file system mounted with '-o dax',
> > +
> > + - 'share' option of memory-backend-file is 'on'.
> 
> why does share need to be on?
According to mmap(2) man page, 
MAP_SYNC is available only with the MAP_SHARED_VALIDATE map‐ping type.
I will add this to documentation, Thanks.
> 
> > +
> > + - 'pmem' option of memory-backend-file is 'on'
> > +
> >  When using other types of backends, it's suggested to set 'unarmed'
> >  option of '-device nvdimm' to 'on', which sets the unarmed flag of the
> >  guest NVDIMM region mapping structure.  This unarmed flag indicates
> > diff --git a/qemu-options.hx b/qemu-options.hx
> > index 08f8516..545cb8a 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -4002,6 +4002,10 @@ using the SNIA NVM programming model (e.g. Intel NVDIMM).
> >  If @option{pmem} is set to 'on', QEMU will take necessary operations to
> >  guarantee the persistence of its own writes to @option{mem-path}
> >  (e.g. in vNVDIMM label emulation and live migration).
> > +Also, we will map the backend-file with MAP_SYNC flag, which can ensure
> > +the file metadata is in sync to @option{mem-path} even on the host crash
> > +and power failures. MAP_SYNC requires supports from both the host kernel
> 
> requires support
> 
> > +(since Linux kernel 4.15) and @option{mem-path} (only files supporting DAX).
> >  
> >  @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
> >  
> > -- 
> > 2.7.4

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V9 2/6] memory: use sparse feature define RAM_FLAG.
  2019-01-16 15:55   ` Michael S. Tsirkin
@ 2019-01-21  6:35     ` Yi Zhang
  2019-01-21 20:24       ` Michael S. Tsirkin
  0 siblings, 1 reply; 27+ messages in thread
From: Yi Zhang @ 2019-01-21  6:35 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	ehabkost, qemu-devel, imammedo, dan.j.williams

On 2019-01-16 at 10:55:33 -0500, Michael S. Tsirkin wrote:
> On Wed, Jan 16, 2019 at 04:10:29PM +0800, Zhang Yi wrote:
> > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> OK so if you apply this patch, do you get
> any sparse warning at all?
Didn't get any sparse warning.
> 
> If not how come?
> 
> And we need to ask patchew maintainers to 
> 
> Overall I'd suggest that since you no longer need
> a new RAM flag, split this effort out and finish
> MAP_SYNC work first.
Ok, Will
> 
> 
> > ---
> >  include/exec/memory.h | 12 ++++++------
> >  include/qemu/osdep.h  |  9 +++++++++
> >  2 files changed, 15 insertions(+), 6 deletions(-)
> > 
> > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > index 667466b..03824d9 100644
> > --- a/include/exec/memory.h
> > +++ b/include/exec/memory.h
> > @@ -104,27 +104,27 @@ struct IOMMUNotifier {
> >  typedef struct IOMMUNotifier IOMMUNotifier;
> >  
> >  /* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */
> > -#define RAM_PREALLOC   (1 << 0)
> > +#define RAM_PREALLOC ((QEMU_FORCE QemuMmapFlags) (1 << 0))
> >  
> >  /* RAM is mmap-ed with MAP_SHARED */
> > -#define RAM_SHARED     (1 << 1)
> > +#define RAM_SHARED ((QEMU_FORCE QemuMmapFlags) (1 << 1))
> >  
> >  /* Only a portion of RAM (used_length) is actually used, and migrated.
> >   * This used_length size can change across reboots.
> >   */
> > -#define RAM_RESIZEABLE (1 << 2)
> > +#define RAM_RESIZEABLE ((QEMU_FORCE QemuMmapFlags) (1 << 2))
> >  
> >  /* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically
> >   * zero the page and wake waiting processes.
> >   * (Set during postcopy)
> >   */
> > -#define RAM_UF_ZEROPAGE (1 << 3)
> > +#define RAM_UF_ZEROPAGE ((QEMU_FORCE QemuMmapFlags) (1 << 3))
> >  
> >  /* RAM can be migrated */
> > -#define RAM_MIGRATABLE (1 << 4)
> > +#define RAM_MIGRATABLE ((QEMU_FORCE QemuMmapFlags) (1 << 4))
> >  
> >  /* RAM is a persistent kind memory */
> > -#define RAM_PMEM (1 << 5)
> > +#define RAM_PMEM ((QEMU_FORCE QemuMmapFlags) (1 << 5))
> >  
> >  static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
> >                                         IOMMUNotifierFlag flags,
> > diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> > index 3bf48bc..457d24e 100644
> > --- a/include/qemu/osdep.h
> > +++ b/include/qemu/osdep.h
> > @@ -185,6 +185,15 @@ extern int daemon(int, int);
> >  #define ESHUTDOWN 4099
> >  #endif
> >  
> > +#ifdef __CHECKER__
> > +#define QEMU_BITWISE __attribute__((bitwise))
> > +#define QEMU_FORCE   __attribute__((force))
> > +#else
> > +#define QEMU_BITWISE
> > +#define QEMU_FORCE
> > +#endif
> > +
> > +typedef unsigned QEMU_BITWISE QemuMmapFlags;
> >  /* time_t may be either 32 or 64 bits depending on the host OS, and
> >   * can be either signed or unsigned, so we can't just hardcode a
> >   * specific maximum value. This is not a C preprocessor constant,
> > -- 
> > 2.7.4

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V9 6/6] docs: Added MAP_SYNC documentation
  2019-01-21  6:11     ` Yi Zhang
@ 2019-01-21  7:21       ` Pankaj Gupta
  2019-01-21  7:57         ` Yi Zhang
  0 siblings, 1 reply; 27+ messages in thread
From: Pankaj Gupta @ 2019-01-21  7:21 UTC (permalink / raw)
  To: Yi Zhang
  Cc: Michael S. Tsirkin, xiaoguangrong eric, stefanha, pbonzini,
	yu c zhang, ehabkost, qemu-devel, imammedo, dan j williams


> > > ---
> > >  docs/nvdimm.txt | 21 ++++++++++++++++++++-
> > >  qemu-options.hx |  4 ++++
> > >  2 files changed, 24 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> > > index 5f158a6..565ba73 100644
> > > --- a/docs/nvdimm.txt
> > > +++ b/docs/nvdimm.txt
> > > @@ -142,11 +142,30 @@ backend of vNVDIMM:
> > >  Guest Data Persistence
> > >  ----------------------
> > >  
> > > +vNVDIMM is designed and implemented to guarantee the guest data
> > > +persistence on the backends even on the host crash and power
> > 
> > in case of a host crash or a power failure
> > 
> > (be consistent)
> > 
> > > +failures. However, there are still some requirements and limitations
> > > +as explained below.
> > > +
> > >  Though QEMU supports multiple types of vNVDIMM backends on Linux,
> > > -currently the only one that can guarantee the guest write persistence
> > > +if MAP_SYNC is not supported by the host kernel and the backends,
> > > +the only backend that can guarantee the guest write persistence
> > >  is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
> > >  which all guest access do not involve any host-side kernel cache.
> > >  
> > > +mmap(2) flag MAP_SYNC is added since Linux kernel 4.15. On such
> > > +systems, QEMU can mmap(2) the backend with MAP_SYNC, which can ensure
> > 
> > which ensures
> > 
> > > +filesystem metadata consistent
> > 
> > consistency
> > 
> > > even after a system crash or power
> > 
> > a power
> > 
> > > +failure. Besides the host kernel support, enabling MAP_SYNC in QEMU
> > > +also requires:
> > > +
> > > + - the backend is a file supporting DAX, e.g., a file on an ext4 or
> > > +   xfs file system mounted with '-o dax',
> > > +
> > > + - 'share' option of memory-backend-file is 'on'.
> > 
> > why does share need to be on?
> According to mmap(2) man page,
> MAP_SYNC is available only with the MAP_SHARED_VALIDATE map‐ping type.
> I will add this to documentation, Thanks.

MAP_SHARE_VALIDATE should be used to validate MAP_SYNC with MAP_SHARED.

#define MAP_SHARED_VALIDATE 0x03

In contrast, MAP_SHARED is:
#define MAP_SHARED      0x01            /* Share changes */

I think we should use MAP_SHARED_VALIDATE with MAP_SYNC here?

Thanks,
Pankaj 

> > 
> > > +
> > > + - 'pmem' option of memory-backend-file is 'on'
> > > +
> > >  When using other types of backends, it's suggested to set 'unarmed'
> > >  option of '-device nvdimm' to 'on', which sets the unarmed flag of the
> > >  guest NVDIMM region mapping structure.  This unarmed flag indicates
> > > diff --git a/qemu-options.hx b/qemu-options.hx
> > > index 08f8516..545cb8a 100644
> > > --- a/qemu-options.hx
> > > +++ b/qemu-options.hx
> > > @@ -4002,6 +4002,10 @@ using the SNIA NVM programming model (e.g. Intel
> > > NVDIMM).
> > >  If @option{pmem} is set to 'on', QEMU will take necessary operations to
> > >  guarantee the persistence of its own writes to @option{mem-path}
> > >  (e.g. in vNVDIMM label emulation and live migration).
> > > +Also, we will map the backend-file with MAP_SYNC flag, which can ensure
> > > +the file metadata is in sync to @option{mem-path} even on the host crash
> > > +and power failures. MAP_SYNC requires supports from both the host kernel
> > 
> > requires support
> > 
> > > +(since Linux kernel 4.15) and @option{mem-path} (only files supporting
> > > DAX).
> > >  
> > >  @item -object
> > >  memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
> > >  
> > > --
> > > 2.7.4
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V9 6/6] docs: Added MAP_SYNC documentation
  2019-01-21  7:21       ` Pankaj Gupta
@ 2019-01-21  7:57         ` Yi Zhang
  0 siblings, 0 replies; 27+ messages in thread
From: Yi Zhang @ 2019-01-21  7:57 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: Michael S. Tsirkin, xiaoguangrong eric, stefanha, pbonzini,
	yu c zhang, ehabkost, qemu-devel, imammedo, dan j williams

On 2019-01-21 at 02:21:39 -0500, Pankaj Gupta wrote:
> 
> > > > ---
> > > >  docs/nvdimm.txt | 21 ++++++++++++++++++++-
> > > >  qemu-options.hx |  4 ++++
> > > >  2 files changed, 24 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> > > > index 5f158a6..565ba73 100644
> > > > --- a/docs/nvdimm.txt
> > > > +++ b/docs/nvdimm.txt
> > > > @@ -142,11 +142,30 @@ backend of vNVDIMM:
> > > >  Guest Data Persistence
> > > >  ----------------------
> > > >  
> > > > +vNVDIMM is designed and implemented to guarantee the guest data
> > > > +persistence on the backends even on the host crash and power
> > > 
> > > in case of a host crash or a power failure
> > > 
> > > (be consistent)
> > > 
> > > > +failures. However, there are still some requirements and limitations
> > > > +as explained below.
> > > > +
> > > >  Though QEMU supports multiple types of vNVDIMM backends on Linux,
> > > > -currently the only one that can guarantee the guest write persistence
> > > > +if MAP_SYNC is not supported by the host kernel and the backends,
> > > > +the only backend that can guarantee the guest write persistence
> > > >  is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
> > > >  which all guest access do not involve any host-side kernel cache.
> > > >  
> > > > +mmap(2) flag MAP_SYNC is added since Linux kernel 4.15. On such
> > > > +systems, QEMU can mmap(2) the backend with MAP_SYNC, which can ensure
> > > 
> > > which ensures
> > > 
> > > > +filesystem metadata consistent
> > > 
> > > consistency
> > > 
> > > > even after a system crash or power
> > > 
> > > a power
> > > 
> > > > +failure. Besides the host kernel support, enabling MAP_SYNC in QEMU
> > > > +also requires:
> > > > +
> > > > + - the backend is a file supporting DAX, e.g., a file on an ext4 or
> > > > +   xfs file system mounted with '-o dax',
> > > > +
> > > > + - 'share' option of memory-backend-file is 'on'.
> > > 
> > > why does share need to be on?
> > According to mmap(2) man page,
> > MAP_SYNC is available only with the MAP_SHARED_VALIDATE map‐ping type.
> > I will add this to documentation, Thanks.
> 
> MAP_SHARE_VALIDATE should be used to validate MAP_SYNC with MAP_SHARED.
> 
> #define MAP_SHARED_VALIDATE 0x03
> 
> In contrast, MAP_SHARED is:
> #define MAP_SHARED      0x01            /* Share changes */
> 
> I think we should use MAP_SHARED_VALIDATE with MAP_SYNC here?
> 
> Thanks,
> Pankaj
Yes, Pankaj, Thanks point that out, I will add the MAP_SHARED_VALIDATE.
> 
> > > 
> > > > +
> > > > + - 'pmem' option of memory-backend-file is 'on'
> > > > +
> > > >  When using other types of backends, it's suggested to set 'unarmed'
> > > >  option of '-device nvdimm' to 'on', which sets the unarmed flag of the
> > > >  guest NVDIMM region mapping structure.  This unarmed flag indicates
> > > > diff --git a/qemu-options.hx b/qemu-options.hx
> > > > index 08f8516..545cb8a 100644
> > > > --- a/qemu-options.hx
> > > > +++ b/qemu-options.hx
> > > > @@ -4002,6 +4002,10 @@ using the SNIA NVM programming model (e.g. Intel
> > > > NVDIMM).
> > > >  If @option{pmem} is set to 'on', QEMU will take necessary operations to
> > > >  guarantee the persistence of its own writes to @option{mem-path}
> > > >  (e.g. in vNVDIMM label emulation and live migration).
> > > > +Also, we will map the backend-file with MAP_SYNC flag, which can ensure
> > > > +the file metadata is in sync to @option{mem-path} even on the host crash
> > > > +and power failures. MAP_SYNC requires supports from both the host kernel
> > > 
> > > requires support
> > > 
> > > > +(since Linux kernel 4.15) and @option{mem-path} (only files supporting
> > > > DAX).
> > > >  
> > > >  @item -object
> > > >  memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
> > > >  
> > > > --
> > > > 2.7.4
> > 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V9 4/6] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  2019-01-21  5:15       ` Yi Zhang
@ 2019-01-21 14:44         ` Eduardo Habkost
  2019-01-22  3:21           ` Yi Zhang
  0 siblings, 1 reply; 27+ messages in thread
From: Eduardo Habkost @ 2019-01-21 14:44 UTC (permalink / raw)
  To: Michael S. Tsirkin, pagupta, qemu-devel, yu.c.zhang, stefanha,
	imammedo, pbonzini, dan.j.williams, xiaoguangrong.eric

On Mon, Jan 21, 2019 at 01:15:36PM +0800, Yi Zhang wrote:
> On 2019-01-18 at 16:11:47 -0200, Eduardo Habkost wrote:
[...]
> > Anyway, I see a more fundamental problem in each version of this
> > patch: the semantics of the command-line options are not clearly
> > documented.
> > 
> > We have at least 3 different possible use cases we might need to
> > support:
> > 
> > 1) pmem=on, MAP_SYNC not desired
> > 2) pmem=on, MAP_SYNC desired but optional
> 
> Form V9, As Michael suggest, We removed the sync option, MAP_SYNC will
> force on while we set pmem=on. So we only have 2 user cases, Will update
> to user documentation.
> 1) pmem=on, MAP_SYNC not desired
> We will not pass the flag to mmap2

If this use case is supported, how the command-line should look
like to enable it?


> 2) pmem=on, MAP_SYNC desired
> We will pass the flag to mmap2

Same question as above: how the command-line should look like for
this use case?

> 
> > 3) pmem=on, MAP_SYNC required, not optional
> > 
> > Which cases from the list above we need to support?
> > 
> > From the cases above, what's the expected semantics of "pmem=on"
> > with no extra options?

We still need to answer that question.

The current semantics of pmem=on (with no extra options) is (1).
It looks like we can't change it to (2) without breaking existing
configurations.  If you make existing configurations stop working
on hosts where they currently work, you need to explain why it's
OK to do that.


> > 
> > If these questions are not answered (in the commit message and
> > user documentation), we won't be able to review and discuss the
> > code.
> > 
> > 
[...]


-- 
Eduardo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V9 2/6] memory: use sparse feature define RAM_FLAG.
  2019-01-21  6:35     ` Yi Zhang
@ 2019-01-21 20:24       ` Michael S. Tsirkin
  2019-01-22  3:25         ` Yi Zhang
  0 siblings, 1 reply; 27+ messages in thread
From: Michael S. Tsirkin @ 2019-01-21 20:24 UTC (permalink / raw)
  To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	ehabkost, qemu-devel, imammedo, dan.j.williams

On Mon, Jan 21, 2019 at 02:35:57PM +0800, Yi Zhang wrote:
> On 2019-01-16 at 10:55:33 -0500, Michael S. Tsirkin wrote:
> > On Wed, Jan 16, 2019 at 04:10:29PM +0800, Zhang Yi wrote:
> > > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > 
> > OK so if you apply this patch, do you get
> > any sparse warning at all?
> Didn't get any sparse warning.


The reason I find it strange is because you then go on
to assign these flags to an untagged unint32_t.

Did you forget to configure with sparse enabled maybe?


> > 
> > If not how come?
> > 
> > And we need to ask patchew maintainers to 
> > 
> > Overall I'd suggest that since you no longer need
> > a new RAM flag, split this effort out and finish
> > MAP_SYNC work first.
> Ok, Will
> > 
> > 
> > > ---
> > >  include/exec/memory.h | 12 ++++++------
> > >  include/qemu/osdep.h  |  9 +++++++++
> > >  2 files changed, 15 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > > index 667466b..03824d9 100644
> > > --- a/include/exec/memory.h
> > > +++ b/include/exec/memory.h
> > > @@ -104,27 +104,27 @@ struct IOMMUNotifier {
> > >  typedef struct IOMMUNotifier IOMMUNotifier;
> > >  
> > >  /* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */
> > > -#define RAM_PREALLOC   (1 << 0)
> > > +#define RAM_PREALLOC ((QEMU_FORCE QemuMmapFlags) (1 << 0))
> > >  
> > >  /* RAM is mmap-ed with MAP_SHARED */
> > > -#define RAM_SHARED     (1 << 1)
> > > +#define RAM_SHARED ((QEMU_FORCE QemuMmapFlags) (1 << 1))
> > >  
> > >  /* Only a portion of RAM (used_length) is actually used, and migrated.
> > >   * This used_length size can change across reboots.
> > >   */
> > > -#define RAM_RESIZEABLE (1 << 2)
> > > +#define RAM_RESIZEABLE ((QEMU_FORCE QemuMmapFlags) (1 << 2))
> > >  
> > >  /* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically
> > >   * zero the page and wake waiting processes.
> > >   * (Set during postcopy)
> > >   */
> > > -#define RAM_UF_ZEROPAGE (1 << 3)
> > > +#define RAM_UF_ZEROPAGE ((QEMU_FORCE QemuMmapFlags) (1 << 3))
> > >  
> > >  /* RAM can be migrated */
> > > -#define RAM_MIGRATABLE (1 << 4)
> > > +#define RAM_MIGRATABLE ((QEMU_FORCE QemuMmapFlags) (1 << 4))
> > >  
> > >  /* RAM is a persistent kind memory */
> > > -#define RAM_PMEM (1 << 5)
> > > +#define RAM_PMEM ((QEMU_FORCE QemuMmapFlags) (1 << 5))
> > >  
> > >  static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
> > >                                         IOMMUNotifierFlag flags,
> > > diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> > > index 3bf48bc..457d24e 100644
> > > --- a/include/qemu/osdep.h
> > > +++ b/include/qemu/osdep.h
> > > @@ -185,6 +185,15 @@ extern int daemon(int, int);
> > >  #define ESHUTDOWN 4099
> > >  #endif
> > >  
> > > +#ifdef __CHECKER__
> > > +#define QEMU_BITWISE __attribute__((bitwise))
> > > +#define QEMU_FORCE   __attribute__((force))
> > > +#else
> > > +#define QEMU_BITWISE
> > > +#define QEMU_FORCE
> > > +#endif
> > > +
> > > +typedef unsigned QEMU_BITWISE QemuMmapFlags;
> > >  /* time_t may be either 32 or 64 bits depending on the host OS, and
> > >   * can be either signed or unsigned, so we can't just hardcode a
> > >   * specific maximum value. This is not a C preprocessor constant,
> > > -- 
> > > 2.7.4

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V9 4/6] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  2019-01-21 14:44         ` Eduardo Habkost
@ 2019-01-22  3:21           ` Yi Zhang
  2019-01-22  3:27             ` Michael S. Tsirkin
  0 siblings, 1 reply; 27+ messages in thread
From: Yi Zhang @ 2019-01-22  3:21 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Michael S. Tsirkin, pagupta, qemu-devel, yu.c.zhang, stefanha,
	imammedo, pbonzini, dan.j.williams, xiaoguangrong.eric

On 2019-01-21 at 12:44:00 -0200, Eduardo Habkost wrote:
> On Mon, Jan 21, 2019 at 01:15:36PM +0800, Yi Zhang wrote:
> > On 2019-01-18 at 16:11:47 -0200, Eduardo Habkost wrote:
> [...]
> > > Anyway, I see a more fundamental problem in each version of this
> > > patch: the semantics of the command-line options are not clearly
> > > documented.
> > > 
> > > We have at least 3 different possible use cases we might need to
> > > support:
> > > 
> > > 1) pmem=on, MAP_SYNC not desired
> > > 2) pmem=on, MAP_SYNC desired but optional
> > 
> > Form V9, As Michael suggest, We removed the sync option, MAP_SYNC will
> > force on while we set pmem=on. So we only have 2 user cases, Will update
> > to user documentation.
> > 1) pmem=on, MAP_SYNC not desired
> > We will not pass the flag to mmap2
> 
> If this use case is supported, how the command-line should look
> like to enable it?
> 
> 
> > 2) pmem=on, MAP_SYNC desired
> > We will pass the flag to mmap2
> 
> Same question as above: how the command-line should look like for
> this use case?
Sorry, I got some miss-understood of the MAP_SYNC desired.
As we talk with Micheal:

we give up on a bit of flexibility, and just say
pmem=on forces MAP_SYNC. on a MAP_SYNC capable configrations(kernel+
backend dax)

Current user case is like below:

 1. pmem=on is set, shared=on is set, MAP_SYNC supported in linux kernel:
        a: backend is a dax supporting file.
	 - MAP_SYNC will active.
	b: backedn is not a dax supporting file.
	 - mmap will result in an EOPNOTSUPP error.

2. The reset of cases:
	- we will never pass the MAP_SYNC to mmap2

> 
> > 
> > > 3) pmem=on, MAP_SYNC required, not optional
> > > 
> > > Which cases from the list above we need to support?
> > > 
> > > From the cases above, what's the expected semantics of "pmem=on"
> > > with no extra options?
> 
> We still need to answer that question.
> 
> The current semantics of pmem=on (with no extra options) is (1).
> It looks like we can't change it to (2) without breaking existing
> configurations.  If you make existing configurations stop working
> on hosts where they currently work, you need to explain why it's
> OK to do that.
> 
> 
> > > 
> > > If these questions are not answered (in the commit message and
> > > user documentation), we won't be able to review and discuss the
> > > code.
> > > 
> > > 
> [...]
> 
> 
> -- 
> Eduardo
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V9 2/6] memory: use sparse feature define RAM_FLAG.
  2019-01-21 20:24       ` Michael S. Tsirkin
@ 2019-01-22  3:25         ` Yi Zhang
  0 siblings, 0 replies; 27+ messages in thread
From: Yi Zhang @ 2019-01-22  3:25 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
	ehabkost, qemu-devel, imammedo, dan.j.williams

On 2019-01-21 at 15:24:17 -0500, Michael S. Tsirkin wrote:
> On Mon, Jan 21, 2019 at 02:35:57PM +0800, Yi Zhang wrote:
> > On 2019-01-16 at 10:55:33 -0500, Michael S. Tsirkin wrote:
> > > On Wed, Jan 16, 2019 at 04:10:29PM +0800, Zhang Yi wrote:
> > > > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > 
> > > OK so if you apply this patch, do you get
> > > any sparse warning at all?
> > Didn't get any sparse warning.
> 
> 
> The reason I find it strange is because you then go on
> to assign these flags to an untagged unint32_t.
> 
> Did you forget to configure with sparse enabled maybe?
Ah.. Right, Sorry for that I don't have much knowledge about this
feature and how to use that. I will split this effort out and focus
MAP_SYNC work. 

Thanks Micheal.
> 
> 
> > > 
> > > If not how come?
> > > 
> > > And we need to ask patchew maintainers to 
> > > 
> > > Overall I'd suggest that since you no longer need
> > > a new RAM flag, split this effort out and finish
> > > MAP_SYNC work first.
> > Ok, Will
> > > 
> > > 
> > > > ---
> > > >  include/exec/memory.h | 12 ++++++------
> > > >  include/qemu/osdep.h  |  9 +++++++++
> > > >  2 files changed, 15 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > > > index 667466b..03824d9 100644
> > > > --- a/include/exec/memory.h
> > > > +++ b/include/exec/memory.h
> > > > @@ -104,27 +104,27 @@ struct IOMMUNotifier {
> > > >  typedef struct IOMMUNotifier IOMMUNotifier;
> > > >  
> > > >  /* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */
> > > > -#define RAM_PREALLOC   (1 << 0)
> > > > +#define RAM_PREALLOC ((QEMU_FORCE QemuMmapFlags) (1 << 0))
> > > >  
> > > >  /* RAM is mmap-ed with MAP_SHARED */
> > > > -#define RAM_SHARED     (1 << 1)
> > > > +#define RAM_SHARED ((QEMU_FORCE QemuMmapFlags) (1 << 1))
> > > >  
> > > >  /* Only a portion of RAM (used_length) is actually used, and migrated.
> > > >   * This used_length size can change across reboots.
> > > >   */
> > > > -#define RAM_RESIZEABLE (1 << 2)
> > > > +#define RAM_RESIZEABLE ((QEMU_FORCE QemuMmapFlags) (1 << 2))
> > > >  
> > > >  /* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically
> > > >   * zero the page and wake waiting processes.
> > > >   * (Set during postcopy)
> > > >   */
> > > > -#define RAM_UF_ZEROPAGE (1 << 3)
> > > > +#define RAM_UF_ZEROPAGE ((QEMU_FORCE QemuMmapFlags) (1 << 3))
> > > >  
> > > >  /* RAM can be migrated */
> > > > -#define RAM_MIGRATABLE (1 << 4)
> > > > +#define RAM_MIGRATABLE ((QEMU_FORCE QemuMmapFlags) (1 << 4))
> > > >  
> > > >  /* RAM is a persistent kind memory */
> > > > -#define RAM_PMEM (1 << 5)
> > > > +#define RAM_PMEM ((QEMU_FORCE QemuMmapFlags) (1 << 5))
> > > >  
> > > >  static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
> > > >                                         IOMMUNotifierFlag flags,
> > > > diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> > > > index 3bf48bc..457d24e 100644
> > > > --- a/include/qemu/osdep.h
> > > > +++ b/include/qemu/osdep.h
> > > > @@ -185,6 +185,15 @@ extern int daemon(int, int);
> > > >  #define ESHUTDOWN 4099
> > > >  #endif
> > > >  
> > > > +#ifdef __CHECKER__
> > > > +#define QEMU_BITWISE __attribute__((bitwise))
> > > > +#define QEMU_FORCE   __attribute__((force))
> > > > +#else
> > > > +#define QEMU_BITWISE
> > > > +#define QEMU_FORCE
> > > > +#endif
> > > > +
> > > > +typedef unsigned QEMU_BITWISE QemuMmapFlags;
> > > >  /* time_t may be either 32 or 64 bits depending on the host OS, and
> > > >   * can be either signed or unsigned, so we can't just hardcode a
> > > >   * specific maximum value. This is not a C preprocessor constant,
> > > > -- 
> > > > 2.7.4
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V9 4/6] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  2019-01-22  3:21           ` Yi Zhang
@ 2019-01-22  3:27             ` Michael S. Tsirkin
  2019-01-22 17:33               ` Dan Williams
  0 siblings, 1 reply; 27+ messages in thread
From: Michael S. Tsirkin @ 2019-01-22  3:27 UTC (permalink / raw)
  To: Eduardo Habkost, pagupta, qemu-devel, yu.c.zhang, stefanha,
	imammedo, pbonzini, dan.j.williams, xiaoguangrong.eric

On Tue, Jan 22, 2019 at 11:21:25AM +0800, Yi Zhang wrote:
> On 2019-01-21 at 12:44:00 -0200, Eduardo Habkost wrote:
> > On Mon, Jan 21, 2019 at 01:15:36PM +0800, Yi Zhang wrote:
> > > On 2019-01-18 at 16:11:47 -0200, Eduardo Habkost wrote:
> > [...]
> > > > Anyway, I see a more fundamental problem in each version of this
> > > > patch: the semantics of the command-line options are not clearly
> > > > documented.
> > > > 
> > > > We have at least 3 different possible use cases we might need to
> > > > support:
> > > > 
> > > > 1) pmem=on, MAP_SYNC not desired
> > > > 2) pmem=on, MAP_SYNC desired but optional
> > > 
> > > Form V9, As Michael suggest, We removed the sync option, MAP_SYNC will
> > > force on while we set pmem=on. So we only have 2 user cases, Will update
> > > to user documentation.
> > > 1) pmem=on, MAP_SYNC not desired
> > > We will not pass the flag to mmap2
> > 
> > If this use case is supported, how the command-line should look
> > like to enable it?
> > 
> > 
> > > 2) pmem=on, MAP_SYNC desired
> > > We will pass the flag to mmap2
> > 
> > Same question as above: how the command-line should look like for
> > this use case?
> Sorry, I got some miss-understood of the MAP_SYNC desired.
> As we talk with Micheal:
> 
> we give up on a bit of flexibility, and just say
> pmem=on forces MAP_SYNC. on a MAP_SYNC capable configrations(kernel+
> backend dax)
> 
> Current user case is like below:
> 
>  1. pmem=on is set, shared=on is set, MAP_SYNC supported in linux kernel:
>         a: backend is a dax supporting file.
> 	 - MAP_SYNC will active.
> 	b: backedn is not a dax supporting file.
> 	 - mmap will result in an EOPNOTSUPP error.
> 
> 2. The reset of cases:
> 	- we will never pass the MAP_SYNC to mmap2

I don't see code probing for MAP_SYNC support. Did I miss it?
But if all you want is to have old linux ignore MAP_SYNC,
I think you got your wish automatically - just do not set
MAP_SHARED_VALIDATE.


> > 
> > > 
> > > > 3) pmem=on, MAP_SYNC required, not optional
> > > > 
> > > > Which cases from the list above we need to support?
> > > > 
> > > > From the cases above, what's the expected semantics of "pmem=on"
> > > > with no extra options?
> > 
> > We still need to answer that question.
> > 
> > The current semantics of pmem=on (with no extra options) is (1).
> > It looks like we can't change it to (2) without breaking existing
> > configurations.  If you make existing configurations stop working
> > on hosts where they currently work, you need to explain why it's
> > OK to do that.
> > 
> > 
> > > > 
> > > > If these questions are not answered (in the commit message and
> > > > user documentation), we won't be able to review and discuss the
> > > > code.
> > > > 
> > > > 
> > [...]
> > 
> > 
> > -- 
> > Eduardo
> > 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V9 4/6] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  2019-01-22  3:27             ` Michael S. Tsirkin
@ 2019-01-22 17:33               ` Dan Williams
  2019-01-22 18:47                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 27+ messages in thread
From: Dan Williams @ 2019-01-22 17:33 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Eduardo Habkost, Pankaj Gupta, Qemu Developers, yu.c.zhang,
	Stefan Hajnoczi, Igor Mammedov, Paolo Bonzini, Xiao Guangrong

On Mon, Jan 21, 2019 at 7:27 PM Michael S. Tsirkin <mst@redhat.com> wrote:
[..]
> > 2. The reset of cases:
> >       - we will never pass the MAP_SYNC to mmap2
>
> I don't see code probing for MAP_SYNC support. Did I miss it?
> But if all you want is to have old linux ignore MAP_SYNC,
> I think you got your wish automatically - just do not set
> MAP_SHARED_VALIDATE.

That will also cause new Linux to ignore MAP_SYNC.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH V9 4/6] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  2019-01-22 17:33               ` Dan Williams
@ 2019-01-22 18:47                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 27+ messages in thread
From: Michael S. Tsirkin @ 2019-01-22 18:47 UTC (permalink / raw)
  To: Dan Williams
  Cc: Eduardo Habkost, Pankaj Gupta, Qemu Developers, yu.c.zhang,
	Stefan Hajnoczi, Igor Mammedov, Paolo Bonzini, Xiao Guangrong

On Tue, Jan 22, 2019 at 09:33:37AM -0800, Dan Williams wrote:
> On Mon, Jan 21, 2019 at 7:27 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> [..]
> > > 2. The reset of cases:
> > >       - we will never pass the MAP_SYNC to mmap2
> >
> > I don't see code probing for MAP_SYNC support. Did I miss it?
> > But if all you want is to have old linux ignore MAP_SYNC,
> > I think you got your wish automatically - just do not set
> > MAP_SHARED_VALIDATE.
> 
> That will also cause new Linux to ignore MAP_SYNC.

Oh you are right. I missed this point.

And given that these patches do not seem to set MAP_SHARED_VALIDATE
at all I conclude that even though thet set MAP_SYNC
it actually has no effect at all.

So I wonder how they were tested.
Would the contributors care to elaborate?
That would be good info to put in the commit log message.

-- 
MST

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2019-01-22 19:33 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-16  8:10 [Qemu-devel] [PATCH V9 0/6] support MAP_SYNC for memory-backend-file Zhang Yi
2019-01-16  8:10 ` [Qemu-devel] [PATCH V9 1/6] numa: Fixed the memory leak of numa error message Zhang Yi
2019-01-16 15:56   ` Michael S. Tsirkin
2019-01-18 17:36     ` Eduardo Habkost
2019-01-16  8:10 ` [Qemu-devel] [PATCH V9 2/6] memory: use sparse feature define RAM_FLAG Zhang Yi
2019-01-16 15:55   ` Michael S. Tsirkin
2019-01-21  6:35     ` Yi Zhang
2019-01-21 20:24       ` Michael S. Tsirkin
2019-01-22  3:25         ` Yi Zhang
2019-01-16  8:10 ` [Qemu-devel] [PATCH V9 3/6] util/mmap-alloc: switch 'shared' to 'flags' parameter Zhang Yi
2019-01-16 15:49   ` Michael S. Tsirkin
2019-01-16  8:10 ` [Qemu-devel] [PATCH V9 4/6] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap() Zhang Yi
2019-01-16 15:58   ` Michael S. Tsirkin
2019-01-18 18:11     ` Eduardo Habkost
2019-01-21  5:15       ` Yi Zhang
2019-01-21 14:44         ` Eduardo Habkost
2019-01-22  3:21           ` Yi Zhang
2019-01-22  3:27             ` Michael S. Tsirkin
2019-01-22 17:33               ` Dan Williams
2019-01-22 18:47                 ` Michael S. Tsirkin
2019-01-16  8:11 ` [Qemu-devel] [PATCH V9 5/6] hostmem: add more information in error messages Zhang Yi
2019-01-16 11:30   ` Stefano Garzarella
2019-01-16  8:11 ` [Qemu-devel] [PATCH V9 6/6] docs: Added MAP_SYNC documentation Zhang Yi
2019-01-16 15:40   ` Michael S. Tsirkin
2019-01-21  6:11     ` Yi Zhang
2019-01-21  7:21       ` Pankaj Gupta
2019-01-21  7:57         ` Yi Zhang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.