All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v3 0/3] nvdimm: support MAP_SYNC for memory-backend-file
@ 2018-01-17  8:13 Haozhong Zhang
  2018-01-17  8:13 ` [Qemu-devel] [PATCH v3 1/3] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap() Haozhong Zhang
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Haozhong Zhang @ 2018-01-17  8:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Igor Mammedov, Paolo Bonzini, mst, dgilbert,
	Xiao Guangrong, Stefan Hajnoczi, Dan Williams, Haozhong Zhang

v3 is based on Eduardo's machine-next tree. Changes of
memory-backend-file are based on my previous patches in that tree.


Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
files on ext4/xfs file system mounted with '-o dax').

A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
    https://patchwork.kernel.org/patch/10028151/

This patchset enables QEMU to use MAP_SYNC flag for memory-backend-file,
in order to guarantee the guest write persistence to backend files
supporting DAX.

A new auto on/off option 'sync' is added to memory-backend-file:
 - on:  try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
        'share=off', QEMU will abort
 - off: never pass MAP_SYNC to mmap(2)
 - auto (default): if MAP_SYNC is supported and 'share=on', work as if
        'sync=on'; otherwise, work as if 'sync=off'

Changes in v3:
 * Patch 1: add MAP_SHARED_VALIDATE in both sync=on and sync=auto
   cases, and add back the retry mechanism. MAP_SYNC will be ignored
   by Linux kernel 4.15 if MAP_SHARED_VALIDATE is missed.
 * Patch 1: define MAP_SYNC and MAP_SHARED_VALIDATE as 0 on non-Linux
   platforms in order to make qemu_ram_mmap() compile on those platforms.
 * Patch 2&3: include more information in error messages of
   memory-backend in hope to help user to identify the error.
   (Dr. David Alan Gilbert)
 * Patch 3: fix typo in the commit message. (Dr. David Alan Gilbert)

Changes in v2:
 * Add 'sync' option to control the use of MAP_SYNC. (Eduardo Habkost)
 * Remove the unnecessary set of MAP_SHARED_VALIDATE in some cases and
   the retry mechanism in qemu_ram_mmap(). (Michael S. Tsirkin)
 * Move OS dependent definitions of MAP_SYNC and MAP_SHARED_VALIDATE
   to osdep.h. (Michael S. Tsirkin)

Haozhong Zhang (3):
  util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  hostmem: add hostmem type name and id in error messages
  hostmem-file: add 'sync' option

 backends/hostmem-file.c   | 49 +++++++++++++++++++++++++++++++++++++++++++----
 backends/hostmem.c        | 11 +++++++----
 docs/nvdimm.txt           | 15 ++++++++++++++-
 exec.c                    | 13 ++++++++-----
 include/exec/memory.h     |  4 ++++
 include/exec/ram_addr.h   |  6 +++---
 include/qemu/mmap-alloc.h |  3 ++-
 include/qemu/osdep.h      | 18 +++++++++++++++++
 memory.c                  |  6 ++++--
 numa.c                    |  2 +-
 qemu-options.hx           | 21 +++++++++++++++++++-
 util/mmap-alloc.c         | 24 +++++++++++++++++++++--
 util/oslib-posix.c        |  2 +-
 13 files changed, 149 insertions(+), 25 deletions(-)

-- 
2.14.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PATCH v3 1/3] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  2018-01-17  8:13 [Qemu-devel] [PATCH v3 0/3] nvdimm: support MAP_SYNC for memory-backend-file Haozhong Zhang
@ 2018-01-17  8:13 ` Haozhong Zhang
  2018-01-24 20:20   ` Michael S. Tsirkin
  2018-01-17  8:13 ` [Qemu-devel] [PATCH v3 2/3] hostmem: add more information in error messages Haozhong Zhang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Haozhong Zhang @ 2018-01-17  8:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Igor Mammedov, Paolo Bonzini, mst, dgilbert,
	Xiao Guangrong, Stefan Hajnoczi, Dan Williams, Haozhong Zhang

When a file supporting DAX is used as vNVDIMM backend, mmap it with
MAP_SYNC flag in addition can guarantee the persistence of guest write
to the backend file without other QEMU actions (e.g., periodic fsync()
by QEMU).

A OnOffAuto parameter 'sync' is added to qemu_ram_mmap():

- If sync == ON_OFF_AUTO_ON, qemu_ram_mmap() will try to pass MAP_SYNC
  to mmap(). It will then fail if the host OS or the backend file do
  not support MAP_SYNC, or MAP_SYNC is conflict with other flags.

- If sync == ON_OFF_AUTO_OFF, qemu_ram_mmap() will never pass MAP_SYNC
  to mmap().

- If sync == ON_OFF_AUTO_AUTO, and
  * if the host OS and the backend file support MAP_SYNC, and MAP_SYNC
    is not conflict with other flags, qemu_ram_mmap() will work as if
    sync == ON_OFF_AUTO_ON.
  * otherwise, qemu_ram_mmap() will work as if sync == ON_OFF_AUTO_OFF.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 exec.c                    |  2 +-
 include/qemu/mmap-alloc.h |  3 ++-
 include/qemu/osdep.h      | 18 ++++++++++++++++++
 util/mmap-alloc.c         | 24 ++++++++++++++++++++++--
 util/oslib-posix.c        |  2 +-
 5 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/exec.c b/exec.c
index 8fba88ae1c..f4254cb6d3 100644
--- a/exec.c
+++ b/exec.c
@@ -1646,7 +1646,7 @@ static void *file_ram_alloc(RAMBlock *block,
     }
 
     area = qemu_ram_mmap(fd, memory, block->mr->align,
-                         block->flags & RAM_SHARED);
+                         block->flags & RAM_SHARED, ON_OFF_AUTO_OFF);
     if (area == MAP_FAILED) {
         error_setg_errno(errp, errno,
                          "unable to map backing store for guest RAM");
diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index 50385e3f81..dd5876471f 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -7,7 +7,8 @@ size_t qemu_fd_getpagesize(int fd);
 
 size_t qemu_mempath_getpagesize(const char *mem_path);
 
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared);
+void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared,
+                    OnOffAuto sync);
 
 void qemu_ram_munmap(void *ptr, size_t size);
 
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index adb3758275..0ff10cb529 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -372,6 +372,24 @@ void qemu_anon_ram_free(void *ptr, size_t size);
 #  define QEMU_VMALLOC_ALIGN getpagesize()
 #endif
 
+/*
+ * MAP_SHARED_VALIDATE and MAP_SYNC were introduced in Linux kernel
+ * 4.15, so they may not be defined when compiling on older kernels.
+ */
+#ifdef CONFIG_LINUX
+#ifndef MAP_SHARED_VALIDATE
+#define MAP_SHARED_VALIDATE   0x3
+#endif
+#ifndef MAP_SYNC
+#define MAP_SYNC              0x80000
+#endif
+#define QEMU_HAS_MAP_SYNC     true
+#else  /* !CONFIG_LINUX */
+#define MAP_SHARED_VALIDATE   0x0
+#define MAP_SYNC              0x0
+#define QEMU_HAS_MAP_SYNC     false
+#endif /* CONFIG_LINUX */
+
 #ifdef CONFIG_POSIX
 struct qemu_signalfd_siginfo {
     uint32_t ssi_signo;   /* Signal number */
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 2fd8cbcc6f..b42d9719f3 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -73,7 +73,8 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
     return getpagesize();
 }
 
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
+void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared,
+                    OnOffAuto sync)
 {
     /*
      * Note: this always allocates at least one extra page of virtual address
@@ -97,6 +98,7 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
 #endif
     size_t offset;
     void *ptr1;
+    int xflags = 0;
 
     if (ptr == MAP_FAILED) {
         return MAP_FAILED;
@@ -106,13 +108,31 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
     /* Always align to host page size */
     assert(align >= getpagesize());
 
+    if (!QEMU_HAS_MAP_SYNC || !shared) {
+        if (sync == ON_OFF_AUTO_ON) {
+            return MAP_FAILED;
+        }
+        sync = ON_OFF_AUTO_OFF;
+    }
+    if (sync != ON_OFF_AUTO_OFF) {
+        /* MAP_SYNC is only available with MAP_SHARED_VALIDATE. */
+        xflags |= MAP_SYNC | MAP_SHARED_VALIDATE;
+    }
+
     offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
+ retry_mmap_fd:
     ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
                 MAP_FIXED |
                 (fd == -1 ? MAP_ANONYMOUS : 0) |
-                (shared ? MAP_SHARED : MAP_PRIVATE),
+                (shared ? MAP_SHARED : MAP_PRIVATE) | xflags,
                 fd, 0);
     if (ptr1 == MAP_FAILED) {
+        if (sync == ON_OFF_AUTO_AUTO) {
+            xflags &= ~(MAP_SYNC | MAP_SHARED_VALIDATE);
+            sync = ON_OFF_AUTO_OFF;
+            goto retry_mmap_fd;
+        }
+
         munmap(ptr, total);
         return MAP_FAILED;
     }
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 77369c92ce..ecb1c275d2 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -130,7 +130,7 @@ void *qemu_memalign(size_t alignment, size_t size)
 void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment)
 {
     size_t align = QEMU_VMALLOC_ALIGN;
-    void *ptr = qemu_ram_mmap(-1, size, align, false);
+    void *ptr = qemu_ram_mmap(-1, size, align, false, ON_OFF_AUTO_OFF);
 
     if (ptr == MAP_FAILED) {
         return NULL;
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PATCH v3 2/3] hostmem: add more information in error messages
  2018-01-17  8:13 [Qemu-devel] [PATCH v3 0/3] nvdimm: support MAP_SYNC for memory-backend-file Haozhong Zhang
  2018-01-17  8:13 ` [Qemu-devel] [PATCH v3 1/3] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap() Haozhong Zhang
@ 2018-01-17  8:13 ` Haozhong Zhang
  2018-01-24 20:23   ` Michael S. Tsirkin
  2018-01-17  8:13 ` [Qemu-devel] [PATCH v3 3/3] hostmem-file: add 'sync' option Haozhong Zhang
  2018-01-24  8:12 ` [Qemu-devel] [PATCH v3 0/3] nvdimm: support MAP_SYNC for memory-backend-file Haozhong Zhang
  3 siblings, 1 reply; 11+ messages in thread
From: Haozhong Zhang @ 2018-01-17  8:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Igor Mammedov, Paolo Bonzini, mst, dgilbert,
	Xiao Guangrong, Stefan Hajnoczi, Dan Williams, Haozhong Zhang

When there are multiple memory backends in use, including the object type
name, ID and the property name in the error message can help users to
locate the error.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Suggested-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
---
 backends/hostmem-file.c |  9 ++++++---
 backends/hostmem.c      | 11 +++++++----
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index e319ec1ad8..ed7d145365 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -79,7 +79,8 @@ static void set_mem_path(Object *o, const char *str, Error **errp)
     HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
 
     if (host_memory_backend_mr_inited(backend)) {
-        error_setg(errp, "cannot change property value");
+        error_setg(errp, "cannot change property 'mem-path' of %s '%s'",
+                   object_get_typename(o), backend->id);
         return;
     }
     g_free(fb->mem_path);
@@ -99,7 +100,8 @@ static void file_memory_backend_set_share(Object *o, bool value, Error **errp)
     HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
 
     if (host_memory_backend_mr_inited(backend)) {
-        error_setg(errp, "cannot change property value");
+        error_setg(errp, "cannot change property 'share' of %s '%s'",
+                   object_get_typename(o), backend->id);
         return;
     }
     fb->share = value;
@@ -136,7 +138,8 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
     uint64_t val;
 
     if (host_memory_backend_mr_inited(backend)) {
-        error_setg(&local_err, "cannot change property value");
+        error_setg(&local_err, "cannot change property '%s' of %s '%s'",
+                   name, object_get_typename(o), backend->id);
         goto out;
     }
 
diff --git a/backends/hostmem.c b/backends/hostmem.c
index ee2c2d5bfd..6853d19bc5 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -46,7 +46,8 @@ host_memory_backend_set_size(Object *obj, Visitor *v, const char *name,
     uint64_t value;
 
     if (host_memory_backend_mr_inited(backend)) {
-        error_setg(&local_err, "cannot change property value");
+        error_setg(&local_err, "cannot change property %s of %s '%s'",
+                   name, object_get_typename(obj), backend->id);
         goto out;
     }
 
@@ -55,8 +56,9 @@ host_memory_backend_set_size(Object *obj, Visitor *v, const char *name,
         goto out;
     }
     if (!value) {
-        error_setg(&local_err, "Property '%s.%s' doesn't take value '%"
-                   PRIu64 "'", object_get_typename(obj), name, value);
+        error_setg(&local_err,
+                   "property '%s' of %s '%s' doesn't take value '%" PRIu64 "'",
+                   name, object_get_typename(obj), backend->id, value);
         goto out;
     }
     backend->size = value;
@@ -363,7 +365,8 @@ static void set_id(Object *o, const char *str, Error **errp)
     HostMemoryBackend *backend = MEMORY_BACKEND(o);
 
     if (backend->id) {
-        error_setg(errp, "cannot change property value");
+        error_setg(errp, "cannot change property 'id' of %s '%s'",
+                   object_get_typename(o), backend->id);
         return;
     }
     backend->id = g_strdup(str);
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PATCH v3 3/3] hostmem-file: add 'sync' option
  2018-01-17  8:13 [Qemu-devel] [PATCH v3 0/3] nvdimm: support MAP_SYNC for memory-backend-file Haozhong Zhang
  2018-01-17  8:13 ` [Qemu-devel] [PATCH v3 1/3] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap() Haozhong Zhang
  2018-01-17  8:13 ` [Qemu-devel] [PATCH v3 2/3] hostmem: add more information in error messages Haozhong Zhang
@ 2018-01-17  8:13 ` Haozhong Zhang
  2018-01-24 20:22   ` Michael S. Tsirkin
  2018-01-24 20:23   ` Michael S. Tsirkin
  2018-01-24  8:12 ` [Qemu-devel] [PATCH v3 0/3] nvdimm: support MAP_SYNC for memory-backend-file Haozhong Zhang
  3 siblings, 2 replies; 11+ messages in thread
From: Haozhong Zhang @ 2018-01-17  8:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Igor Mammedov, Paolo Bonzini, mst, dgilbert,
	Xiao Guangrong, Stefan Hajnoczi, Dan Williams, Haozhong Zhang

This option controls whether QEMU mmap(2) the memory backend file with
MAP_SYNC flag, which can fully guarantee the guest write persistence
to the backend, if MAP_SYNC flag is supported by the host kernel
(Linux kernel 4.15 and later) and the backend is a file supporting
DAX (e.g., file on ext4/xfs file system mounted with '-o dax').

It can take one of following values:
 - on:  try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
        'share=off', QEMU will abort
 - off: never pass MAP_SYNC to mmap(2)
 - auto (default): if MAP_SYNC is supported and 'share=on', work as if
        'sync=on'; otherwise, work as if 'sync=off'

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
---
 backends/hostmem-file.c | 40 +++++++++++++++++++++++++++++++++++++++-
 docs/nvdimm.txt         | 15 ++++++++++++++-
 exec.c                  | 13 ++++++++-----
 include/exec/memory.h   |  4 ++++
 include/exec/ram_addr.h |  6 +++---
 memory.c                |  6 ++++--
 numa.c                  |  2 +-
 qemu-options.hx         | 21 ++++++++++++++++++++-
 8 files changed, 93 insertions(+), 14 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index ed7d145365..96dff38619 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -15,6 +15,7 @@
 #include "sysemu/hostmem.h"
 #include "sysemu/sysemu.h"
 #include "qom/object_interfaces.h"
+#include "qapi-visit.h"
 
 /* hostmem-file.c */
 /**
@@ -35,6 +36,7 @@ struct HostMemoryBackendFile {
     bool discard_data;
     char *mem_path;
     uint64_t align;
+    OnOffAuto sync;
 };
 
 static void
@@ -60,7 +62,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
         memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
                                  path,
                                  backend->size, fb->align, fb->share,
-                                 fb->mem_path, errp);
+                                 fb->sync, fb->mem_path, errp);
         g_free(path);
     }
 #endif
@@ -153,6 +155,39 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
     error_propagate(errp, local_err);
 }
 
+static void file_memory_backend_get_sync(
+    Object *obj, Visitor *v, const char *name, void *opaque, Error **errp)
+{
+    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(obj);
+    OnOffAuto value = fb->sync;
+
+    visit_type_OnOffAuto(v, name, &value, errp);
+}
+
+static void file_memory_backend_set_sync(
+    Object *obj, Visitor *v, const char *name, void *opaque, Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(obj);
+    Error *local_err = NULL;
+    OnOffAuto value;
+
+    if (host_memory_backend_mr_inited(backend)) {
+        error_setg(&local_err, "cannot change property '%s' of %s '%s'",
+                   name, object_get_typename(obj), backend->id);
+        goto out;
+    }
+
+    visit_type_OnOffAuto(v, name, &value, &local_err);
+    if (local_err) {
+        goto out;
+    }
+    fb->sync = value;
+
+ out:
+    error_propagate(errp, local_err);
+}
+
 static void file_backend_unparent(Object *obj)
 {
     HostMemoryBackend *backend = MEMORY_BACKEND(obj);
@@ -187,6 +222,9 @@ file_backend_class_init(ObjectClass *oc, void *data)
         file_memory_backend_get_align,
         file_memory_backend_set_align,
         NULL, NULL, &error_abort);
+    object_class_property_add(oc, "sync", "OnOffAuto",
+        file_memory_backend_get_sync, file_memory_backend_set_sync,
+        NULL, NULL, &error_abort);
 }
 
 static void file_backend_instance_finalize(Object *o)
diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index e903d8bb09..49b174fe66 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -143,10 +143,23 @@ Guest Data Persistence
 ----------------------
 
 Though QEMU supports multiple types of vNVDIMM backends on Linux,
-currently the only one that can guarantee the guest write persistence
+if MAP_SYNC is not supported by the host kernel and the backends,
+the only backend that can guarantee the guest write persistence
 is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
 which all guest access do not involve any host-side kernel cache.
 
+mmap(2) flag MAP_SYNC is added since Linux kernel 4.15. On such
+systems, QEMU can mmap(2) the backend with MAP_SYNC, which can
+guarantee the guest write persistence to vNVDIMM. Besides the host
+kernel support, enabling MAP_SYNC in QEMU also requires:
+
+ - the backend is a file supporting DAX, e.g., a file on an ext4 or
+   xfs file system mounted with "-o dax",
+
+ - 'sync' option of memory-backend-file is not 'off', and
+
+ - 'share' option of memory-backend-file is 'on'.
+
 When using other types of backends, it's suggested to set 'unarmed'
 option of '-device nvdimm' to 'on', which sets the unarmed flag of the
 guest NVDIMM region mapping structure.  This unarmed flag indicates
diff --git a/exec.c b/exec.c
index f4254cb6d3..ce13f8cb21 100644
--- a/exec.c
+++ b/exec.c
@@ -1600,6 +1600,7 @@ static void *file_ram_alloc(RAMBlock *block,
                             ram_addr_t memory,
                             int fd,
                             bool truncate,
+                            OnOffAuto sync,
                             Error **errp)
 {
     void *area;
@@ -1646,7 +1647,7 @@ static void *file_ram_alloc(RAMBlock *block,
     }
 
     area = qemu_ram_mmap(fd, memory, block->mr->align,
-                         block->flags & RAM_SHARED, ON_OFF_AUTO_OFF);
+                         block->flags & RAM_SHARED, sync);
     if (area == MAP_FAILED) {
         error_setg_errno(errp, errno,
                          "unable to map backing store for guest RAM");
@@ -1974,7 +1975,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
 
 #ifdef __linux__
 RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
-                                 bool share, int fd,
+                                 bool share, OnOffAuto sync, int fd,
                                  Error **errp)
 {
     RAMBlock *new_block;
@@ -2017,7 +2018,8 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
     new_block->used_length = size;
     new_block->max_length = size;
     new_block->flags = share ? RAM_SHARED : 0;
-    new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp);
+    new_block->host = file_ram_alloc(new_block, size, fd, !file_size, sync,
+                                     errp);
     if (!new_block->host) {
         g_free(new_block);
         return NULL;
@@ -2035,7 +2037,8 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
 
 
 RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
-                                   bool share, const char *mem_path,
+                                   bool share, OnOffAuto sync,
+                                   const char *mem_path,
                                    Error **errp)
 {
     int fd;
@@ -2047,7 +2050,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
         return NULL;
     }
 
-    block = qemu_ram_alloc_from_fd(size, mr, share, fd, errp);
+    block = qemu_ram_alloc_from_fd(size, mr, share, sync, fd, errp);
     if (!block) {
         if (created) {
             unlink(mem_path);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 07c5d6d597..ff3cd583e9 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -468,6 +468,9 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
  * @align: alignment of the region base address; if 0, the default alignment
  *         (getpagesize()) will be used.
  * @share: %true if memory must be mmaped with the MAP_SHARED flag
+ * @sync: %ON_OFF_AUTO_ON if memory must be mapped with MAP_SYNC flag;
+ *        %ON_OFF_AUTO_OFF if memory cannot be mapped with MAP_SYNC flag;
+ *        %ON_OFF_AUTO_AUTO directs QEMU to mmap with MAP_SYNC flag if possible
  * @path: the path in which to allocate the RAM.
  * @errp: pointer to Error*, to store an error if it happens.
  *
@@ -480,6 +483,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
                                       uint64_t size,
                                       uint64_t align,
                                       bool share,
+                                      OnOffAuto sync,
                                       const char *path,
                                       Error **errp);
 
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 6cbc02aa0f..4494ae9132 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -73,10 +73,10 @@ static inline unsigned long int ramblock_recv_bitmap_offset(void *host_addr,
 long qemu_getrampagesize(void);
 unsigned long last_ram_page(void);
 RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
-                                   bool share, const char *mem_path,
-                                   Error **errp);
+                                   bool share, OnOffAuto sync,
+                                   const char *mem_path, Error **errp);
 RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
-                                 bool share, int fd,
+                                 bool share, OnOffAuto sync, int fd,
                                  Error **errp);
 RAMBlock *qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
                                   MemoryRegion *mr, Error **errp);
diff --git a/memory.c b/memory.c
index 449a1429b9..e22f51394e 100644
--- a/memory.c
+++ b/memory.c
@@ -1572,6 +1572,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
                                       uint64_t size,
                                       uint64_t align,
                                       bool share,
+                                      OnOffAuto sync,
                                       const char *path,
                                       Error **errp)
 {
@@ -1580,7 +1581,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
     mr->terminates = true;
     mr->destructor = memory_region_destructor_ram;
     mr->align = align;
-    mr->ram_block = qemu_ram_alloc_from_file(size, mr, share, path, errp);
+    mr->ram_block = qemu_ram_alloc_from_file(size, mr, share, sync, path, errp);
     mr->dirty_log_mask = tcg_enabled() ? (1 << DIRTY_MEMORY_CODE) : 0;
 }
 
@@ -1596,7 +1597,8 @@ void memory_region_init_ram_from_fd(MemoryRegion *mr,
     mr->ram = true;
     mr->terminates = true;
     mr->destructor = memory_region_destructor_ram;
-    mr->ram_block = qemu_ram_alloc_from_fd(size, mr, share, fd, errp);
+    mr->ram_block = qemu_ram_alloc_from_fd(size, mr, share, ON_OFF_AUTO_OFF, fd,
+                                           errp);
     mr->dirty_log_mask = tcg_enabled() ? (1 << DIRTY_MEMORY_CODE) : 0;
 }
 #endif
diff --git a/numa.c b/numa.c
index 83675a03f3..93180510a4 100644
--- a/numa.c
+++ b/numa.c
@@ -457,7 +457,7 @@ static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner,
 #ifdef __linux__
         Error *err = NULL;
         memory_region_init_ram_from_file(mr, owner, name, ram_size, 0, false,
-                                         mem_path, &err);
+                                         ON_OFF_AUTO_OFF, mem_path, &err);
         if (err) {
             error_report_err(err);
             if (mem_prealloc) {
diff --git a/qemu-options.hx b/qemu-options.hx
index 5ff741a4af..3ee423e6a8 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3974,7 +3974,7 @@ property must be set.  These objects are placed in the
 
 @table @option
 
-@item -object memory-backend-file,id=@var{id},size=@var{size},mem-path=@var{dir},share=@var{on|off},discard-data=@var{on|off},merge=@var{on|off},dump=@var{on|off},prealloc=@var{on|off},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave},align=@var{align}
+@item -object memory-backend-file,id=@var{id},size=@var{size},mem-path=@var{dir},share=@var{on|off},discard-data=@var{on|off},merge=@var{on|off},dump=@var{on|off},prealloc=@var{on|off},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave},align=@var{align},sync=@var{on|off|auto}
 
 Creates a memory file backend object, which can be used to back
 the guest RAM with huge pages.
@@ -4034,6 +4034,25 @@ requires an alignment different than the default one used by QEMU, eg
 the device DAX /dev/dax0.0 requires 2M alignment rather than 4K. In
 such cases, users can specify the required alignment via this option.
 
+The @option{sync} option specifies whether QEMU mmap(2) @option{mem-path}
+with MAP_SYNC flag, which can fully guarantee the guest write
+persistence to @option{mem-path}. MAP_SYNC requires supports from both
+the host kernel (since Linux kernel 4.15) and @option{mem-path} (only
+files supporting DAX). It can take one of following values:
+
+@table @option
+@item @var{on}
+try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
+@option{share}=@var{off}, QEMU will abort
+
+@item @var{off}
+never pass MAP_SYNC to mmap(2)
+
+@item @var{auto} (default)
+if MAP_SYNC is supported and @option{share}=@var{on}, work as if
+@option{sync}=@var{on}; otherwise, work as if @option{sync}=@var{off}
+@end table
+
 @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
 
 Creates a memory backend object, which can be used to back the guest RAM.
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/3] nvdimm: support MAP_SYNC for memory-backend-file
  2018-01-17  8:13 [Qemu-devel] [PATCH v3 0/3] nvdimm: support MAP_SYNC for memory-backend-file Haozhong Zhang
                   ` (2 preceding siblings ...)
  2018-01-17  8:13 ` [Qemu-devel] [PATCH v3 3/3] hostmem-file: add 'sync' option Haozhong Zhang
@ 2018-01-24  8:12 ` Haozhong Zhang
  3 siblings, 0 replies; 11+ messages in thread
From: Haozhong Zhang @ 2018-01-24  8:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Igor Mammedov, Paolo Bonzini, mst, dgilbert,
	Xiao Guangrong, Stefan Hajnoczi, Dan Williams

ping?

On 01/17/18 16:13 +0800, Haozhong Zhang wrote:
> v3 is based on Eduardo's machine-next tree. Changes of
> memory-backend-file are based on my previous patches in that tree.
> 
> 
> Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
> guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
> files on ext4/xfs file system mounted with '-o dax').
> 
> A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
>     https://patchwork.kernel.org/patch/10028151/
> 
> This patchset enables QEMU to use MAP_SYNC flag for memory-backend-file,
> in order to guarantee the guest write persistence to backend files
> supporting DAX.
> 
> A new auto on/off option 'sync' is added to memory-backend-file:
>  - on:  try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
>         'share=off', QEMU will abort
>  - off: never pass MAP_SYNC to mmap(2)
>  - auto (default): if MAP_SYNC is supported and 'share=on', work as if
>         'sync=on'; otherwise, work as if 'sync=off'
> 
> Changes in v3:
>  * Patch 1: add MAP_SHARED_VALIDATE in both sync=on and sync=auto
>    cases, and add back the retry mechanism. MAP_SYNC will be ignored
>    by Linux kernel 4.15 if MAP_SHARED_VALIDATE is missed.
>  * Patch 1: define MAP_SYNC and MAP_SHARED_VALIDATE as 0 on non-Linux
>    platforms in order to make qemu_ram_mmap() compile on those platforms.
>  * Patch 2&3: include more information in error messages of
>    memory-backend in hope to help user to identify the error.
>    (Dr. David Alan Gilbert)
>  * Patch 3: fix typo in the commit message. (Dr. David Alan Gilbert)
> 
> Changes in v2:
>  * Add 'sync' option to control the use of MAP_SYNC. (Eduardo Habkost)
>  * Remove the unnecessary set of MAP_SHARED_VALIDATE in some cases and
>    the retry mechanism in qemu_ram_mmap(). (Michael S. Tsirkin)
>  * Move OS dependent definitions of MAP_SYNC and MAP_SHARED_VALIDATE
>    to osdep.h. (Michael S. Tsirkin)
> 
> Haozhong Zhang (3):
>   util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
>   hostmem: add hostmem type name and id in error messages
>   hostmem-file: add 'sync' option
> 
>  backends/hostmem-file.c   | 49 +++++++++++++++++++++++++++++++++++++++++++----
>  backends/hostmem.c        | 11 +++++++----
>  docs/nvdimm.txt           | 15 ++++++++++++++-
>  exec.c                    | 13 ++++++++-----
>  include/exec/memory.h     |  4 ++++
>  include/exec/ram_addr.h   |  6 +++---
>  include/qemu/mmap-alloc.h |  3 ++-
>  include/qemu/osdep.h      | 18 +++++++++++++++++
>  memory.c                  |  6 ++++--
>  numa.c                    |  2 +-
>  qemu-options.hx           | 21 +++++++++++++++++++-
>  util/mmap-alloc.c         | 24 +++++++++++++++++++++--
>  util/oslib-posix.c        |  2 +-
>  13 files changed, 149 insertions(+), 25 deletions(-)
> 
> -- 
> 2.14.1
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/3] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  2018-01-17  8:13 ` [Qemu-devel] [PATCH v3 1/3] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap() Haozhong Zhang
@ 2018-01-24 20:20   ` Michael S. Tsirkin
  2018-01-25  0:14     ` Haozhong Zhang
  0 siblings, 1 reply; 11+ messages in thread
From: Michael S. Tsirkin @ 2018-01-24 20:20 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: qemu-devel, Eduardo Habkost, Igor Mammedov, Paolo Bonzini,
	dgilbert, Xiao Guangrong, Stefan Hajnoczi, Dan Williams

> index 50385e3f81..dd5876471f 100644
> --- a/include/qemu/mmap-alloc.h
> +++ b/include/qemu/mmap-alloc.h
> @@ -7,7 +7,8 @@ size_t qemu_fd_getpagesize(int fd);
>  
>  size_t qemu_mempath_getpagesize(const char *mem_path);
>  
> -void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared);
> +void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared,
> +                    OnOffAuto sync);
>  
>  void qemu_ram_munmap(void *ptr, size_t size);
>  

And Marcel plans to add a remappable flag ...  Is it time we
switched to a flags field?

> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index adb3758275..0ff10cb529 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -372,6 +372,24 @@ void qemu_anon_ram_free(void *ptr, size_t size);
>  #  define QEMU_VMALLOC_ALIGN getpagesize()
>  #endif
>  
> +/*
> + * MAP_SHARED_VALIDATE and MAP_SYNC were introduced in Linux kernel
> + * 4.15, so they may not be defined when compiling on older kernels.
> + */
> +#ifdef CONFIG_LINUX
> +#ifndef MAP_SHARED_VALIDATE
> +#define MAP_SHARED_VALIDATE   0x3
> +#endif
> +#ifndef MAP_SYNC
> +#define MAP_SYNC              0x80000
> +#endif
> +#define QEMU_HAS_MAP_SYNC     true
> +#else  /* !CONFIG_LINUX */
> +#define MAP_SHARED_VALIDATE   0x0
> +#define MAP_SYNC              0x0
> +#define QEMU_HAS_MAP_SYNC     false
> +#endif /* CONFIG_LINUX */
> +
>  #ifdef CONFIG_POSIX
>  struct qemu_signalfd_siginfo {
>      uint32_t ssi_signo;   /* Signal number */

Please just import this into standard-headers from Linux.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH v3 3/3] hostmem-file: add 'sync' option
  2018-01-17  8:13 ` [Qemu-devel] [PATCH v3 3/3] hostmem-file: add 'sync' option Haozhong Zhang
@ 2018-01-24 20:22   ` Michael S. Tsirkin
  2018-01-24 20:23   ` Michael S. Tsirkin
  1 sibling, 0 replies; 11+ messages in thread
From: Michael S. Tsirkin @ 2018-01-24 20:22 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: qemu-devel, Eduardo Habkost, Igor Mammedov, Paolo Bonzini,
	dgilbert, Xiao Guangrong, Stefan Hajnoczi, Dan Williams

On Wed, Jan 17, 2018 at 04:13:25PM +0800, Haozhong Zhang wrote:
> This option controls whether QEMU mmap(2) the memory backend file with
> MAP_SYNC flag, which can fully guarantee the guest write persistence
> to the backend, if MAP_SYNC flag is supported by the host kernel
> (Linux kernel 4.15 and later) and the backend is a file supporting
> DAX (e.g., file on ext4/xfs file system mounted with '-o dax').
> 
> It can take one of following values:
>  - on:  try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
>         'share=off', QEMU will abort
>  - off: never pass MAP_SYNC to mmap(2)
>  - auto (default): if MAP_SYNC is supported and 'share=on', work as if
>         'sync=on'; otherwise, work as if 'sync=off'
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> Suggested-by: Eduardo Habkost <ehabkost@redhat.com>

Reviewed-by: Michael S. Tsirkin <mst@redhat.com>


> ---
>  backends/hostmem-file.c | 40 +++++++++++++++++++++++++++++++++++++++-
>  docs/nvdimm.txt         | 15 ++++++++++++++-
>  exec.c                  | 13 ++++++++-----
>  include/exec/memory.h   |  4 ++++
>  include/exec/ram_addr.h |  6 +++---
>  memory.c                |  6 ++++--
>  numa.c                  |  2 +-
>  qemu-options.hx         | 21 ++++++++++++++++++++-
>  8 files changed, 93 insertions(+), 14 deletions(-)
> 
> diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
> index ed7d145365..96dff38619 100644
> --- a/backends/hostmem-file.c
> +++ b/backends/hostmem-file.c
> @@ -15,6 +15,7 @@
>  #include "sysemu/hostmem.h"
>  #include "sysemu/sysemu.h"
>  #include "qom/object_interfaces.h"
> +#include "qapi-visit.h"
>  
>  /* hostmem-file.c */
>  /**
> @@ -35,6 +36,7 @@ struct HostMemoryBackendFile {
>      bool discard_data;
>      char *mem_path;
>      uint64_t align;
> +    OnOffAuto sync;
>  };
>  
>  static void
> @@ -60,7 +62,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>          memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
>                                   path,
>                                   backend->size, fb->align, fb->share,
> -                                 fb->mem_path, errp);
> +                                 fb->sync, fb->mem_path, errp);
>          g_free(path);
>      }
>  #endif
> @@ -153,6 +155,39 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
>      error_propagate(errp, local_err);
>  }
>  
> +static void file_memory_backend_get_sync(
> +    Object *obj, Visitor *v, const char *name, void *opaque, Error **errp)
> +{
> +    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(obj);
> +    OnOffAuto value = fb->sync;
> +
> +    visit_type_OnOffAuto(v, name, &value, errp);
> +}
> +
> +static void file_memory_backend_set_sync(
> +    Object *obj, Visitor *v, const char *name, void *opaque, Error **errp)
> +{
> +    HostMemoryBackend *backend = MEMORY_BACKEND(obj);
> +    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(obj);
> +    Error *local_err = NULL;
> +    OnOffAuto value;
> +
> +    if (host_memory_backend_mr_inited(backend)) {
> +        error_setg(&local_err, "cannot change property '%s' of %s '%s'",
> +                   name, object_get_typename(obj), backend->id);
> +        goto out;
> +    }
> +
> +    visit_type_OnOffAuto(v, name, &value, &local_err);
> +    if (local_err) {
> +        goto out;
> +    }
> +    fb->sync = value;
> +
> + out:
> +    error_propagate(errp, local_err);
> +}
> +
>  static void file_backend_unparent(Object *obj)
>  {
>      HostMemoryBackend *backend = MEMORY_BACKEND(obj);
> @@ -187,6 +222,9 @@ file_backend_class_init(ObjectClass *oc, void *data)
>          file_memory_backend_get_align,
>          file_memory_backend_set_align,
>          NULL, NULL, &error_abort);
> +    object_class_property_add(oc, "sync", "OnOffAuto",
> +        file_memory_backend_get_sync, file_memory_backend_set_sync,
> +        NULL, NULL, &error_abort);
>  }
>  
>  static void file_backend_instance_finalize(Object *o)
> diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> index e903d8bb09..49b174fe66 100644
> --- a/docs/nvdimm.txt
> +++ b/docs/nvdimm.txt
> @@ -143,10 +143,23 @@ Guest Data Persistence
>  ----------------------
>  
>  Though QEMU supports multiple types of vNVDIMM backends on Linux,
> -currently the only one that can guarantee the guest write persistence
> +if MAP_SYNC is not supported by the host kernel and the backends,
> +the only backend that can guarantee the guest write persistence
>  is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
>  which all guest access do not involve any host-side kernel cache.
>  
> +mmap(2) flag MAP_SYNC is added since Linux kernel 4.15. On such
> +systems, QEMU can mmap(2) the backend with MAP_SYNC, which can
> +guarantee the guest write persistence to vNVDIMM. Besides the host
> +kernel support, enabling MAP_SYNC in QEMU also requires:
> +
> + - the backend is a file supporting DAX, e.g., a file on an ext4 or
> +   xfs file system mounted with "-o dax",
> +
> + - 'sync' option of memory-backend-file is not 'off', and
> +
> + - 'share' option of memory-backend-file is 'on'.
> +
>  When using other types of backends, it's suggested to set 'unarmed'
>  option of '-device nvdimm' to 'on', which sets the unarmed flag of the
>  guest NVDIMM region mapping structure.  This unarmed flag indicates
> diff --git a/exec.c b/exec.c
> index f4254cb6d3..ce13f8cb21 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1600,6 +1600,7 @@ static void *file_ram_alloc(RAMBlock *block,
>                              ram_addr_t memory,
>                              int fd,
>                              bool truncate,
> +                            OnOffAuto sync,
>                              Error **errp)
>  {
>      void *area;
> @@ -1646,7 +1647,7 @@ static void *file_ram_alloc(RAMBlock *block,
>      }
>  
>      area = qemu_ram_mmap(fd, memory, block->mr->align,
> -                         block->flags & RAM_SHARED, ON_OFF_AUTO_OFF);
> +                         block->flags & RAM_SHARED, sync);
>      if (area == MAP_FAILED) {
>          error_setg_errno(errp, errno,
>                           "unable to map backing store for guest RAM");
> @@ -1974,7 +1975,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
>  
>  #ifdef __linux__
>  RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
> -                                 bool share, int fd,
> +                                 bool share, OnOffAuto sync, int fd,
>                                   Error **errp)
>  {
>      RAMBlock *new_block;
> @@ -2017,7 +2018,8 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
>      new_block->used_length = size;
>      new_block->max_length = size;
>      new_block->flags = share ? RAM_SHARED : 0;
> -    new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp);
> +    new_block->host = file_ram_alloc(new_block, size, fd, !file_size, sync,
> +                                     errp);
>      if (!new_block->host) {
>          g_free(new_block);
>          return NULL;
> @@ -2035,7 +2037,8 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
>  
>  
>  RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
> -                                   bool share, const char *mem_path,
> +                                   bool share, OnOffAuto sync,
> +                                   const char *mem_path,
>                                     Error **errp)
>  {
>      int fd;
> @@ -2047,7 +2050,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
>          return NULL;
>      }
>  
> -    block = qemu_ram_alloc_from_fd(size, mr, share, fd, errp);
> +    block = qemu_ram_alloc_from_fd(size, mr, share, sync, fd, errp);
>      if (!block) {
>          if (created) {
>              unlink(mem_path);
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 07c5d6d597..ff3cd583e9 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -468,6 +468,9 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
>   * @align: alignment of the region base address; if 0, the default alignment
>   *         (getpagesize()) will be used.
>   * @share: %true if memory must be mmaped with the MAP_SHARED flag
> + * @sync: %ON_OFF_AUTO_ON if memory must be mapped with MAP_SYNC flag;
> + *        %ON_OFF_AUTO_OFF if memory cannot be mapped with MAP_SYNC flag;
> + *        %ON_OFF_AUTO_AUTO directs QEMU to mmap with MAP_SYNC flag if possible
>   * @path: the path in which to allocate the RAM.
>   * @errp: pointer to Error*, to store an error if it happens.
>   *
> @@ -480,6 +483,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
>                                        uint64_t size,
>                                        uint64_t align,
>                                        bool share,
> +                                      OnOffAuto sync,
>                                        const char *path,
>                                        Error **errp);
>  
> diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
> index 6cbc02aa0f..4494ae9132 100644
> --- a/include/exec/ram_addr.h
> +++ b/include/exec/ram_addr.h
> @@ -73,10 +73,10 @@ static inline unsigned long int ramblock_recv_bitmap_offset(void *host_addr,
>  long qemu_getrampagesize(void);
>  unsigned long last_ram_page(void);
>  RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
> -                                   bool share, const char *mem_path,
> -                                   Error **errp);
> +                                   bool share, OnOffAuto sync,
> +                                   const char *mem_path, Error **errp);
>  RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
> -                                 bool share, int fd,
> +                                 bool share, OnOffAuto sync, int fd,
>                                   Error **errp);
>  RAMBlock *qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
>                                    MemoryRegion *mr, Error **errp);
> diff --git a/memory.c b/memory.c
> index 449a1429b9..e22f51394e 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1572,6 +1572,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
>                                        uint64_t size,
>                                        uint64_t align,
>                                        bool share,
> +                                      OnOffAuto sync,
>                                        const char *path,
>                                        Error **errp)
>  {
> @@ -1580,7 +1581,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
>      mr->terminates = true;
>      mr->destructor = memory_region_destructor_ram;
>      mr->align = align;
> -    mr->ram_block = qemu_ram_alloc_from_file(size, mr, share, path, errp);
> +    mr->ram_block = qemu_ram_alloc_from_file(size, mr, share, sync, path, errp);
>      mr->dirty_log_mask = tcg_enabled() ? (1 << DIRTY_MEMORY_CODE) : 0;
>  }
>  
> @@ -1596,7 +1597,8 @@ void memory_region_init_ram_from_fd(MemoryRegion *mr,
>      mr->ram = true;
>      mr->terminates = true;
>      mr->destructor = memory_region_destructor_ram;
> -    mr->ram_block = qemu_ram_alloc_from_fd(size, mr, share, fd, errp);
> +    mr->ram_block = qemu_ram_alloc_from_fd(size, mr, share, ON_OFF_AUTO_OFF, fd,
> +                                           errp);
>      mr->dirty_log_mask = tcg_enabled() ? (1 << DIRTY_MEMORY_CODE) : 0;
>  }
>  #endif
> diff --git a/numa.c b/numa.c
> index 83675a03f3..93180510a4 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -457,7 +457,7 @@ static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner,
>  #ifdef __linux__
>          Error *err = NULL;
>          memory_region_init_ram_from_file(mr, owner, name, ram_size, 0, false,
> -                                         mem_path, &err);
> +                                         ON_OFF_AUTO_OFF, mem_path, &err);
>          if (err) {
>              error_report_err(err);
>              if (mem_prealloc) {
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 5ff741a4af..3ee423e6a8 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -3974,7 +3974,7 @@ property must be set.  These objects are placed in the
>  
>  @table @option
>  
> -@item -object memory-backend-file,id=@var{id},size=@var{size},mem-path=@var{dir},share=@var{on|off},discard-data=@var{on|off},merge=@var{on|off},dump=@var{on|off},prealloc=@var{on|off},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave},align=@var{align}
> +@item -object memory-backend-file,id=@var{id},size=@var{size},mem-path=@var{dir},share=@var{on|off},discard-data=@var{on|off},merge=@var{on|off},dump=@var{on|off},prealloc=@var{on|off},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave},align=@var{align},sync=@var{on|off|auto}
>  
>  Creates a memory file backend object, which can be used to back
>  the guest RAM with huge pages.
> @@ -4034,6 +4034,25 @@ requires an alignment different than the default one used by QEMU, eg
>  the device DAX /dev/dax0.0 requires 2M alignment rather than 4K. In
>  such cases, users can specify the required alignment via this option.
>  
> +The @option{sync} option specifies whether QEMU mmap(2) @option{mem-path}
> +with MAP_SYNC flag, which can fully guarantee the guest write
> +persistence to @option{mem-path}. MAP_SYNC requires supports from both
> +the host kernel (since Linux kernel 4.15) and @option{mem-path} (only
> +files supporting DAX). It can take one of following values:
> +
> +@table @option
> +@item @var{on}
> +try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
> +@option{share}=@var{off}, QEMU will abort
> +
> +@item @var{off}
> +never pass MAP_SYNC to mmap(2)
> +
> +@item @var{auto} (default)
> +if MAP_SYNC is supported and @option{share}=@var{on}, work as if
> +@option{sync}=@var{on}; otherwise, work as if @option{sync}=@var{off}
> +@end table
> +
>  @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
>  
>  Creates a memory backend object, which can be used to back the guest RAM.
> -- 
> 2.14.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH v3 3/3] hostmem-file: add 'sync' option
  2018-01-17  8:13 ` [Qemu-devel] [PATCH v3 3/3] hostmem-file: add 'sync' option Haozhong Zhang
  2018-01-24 20:22   ` Michael S. Tsirkin
@ 2018-01-24 20:23   ` Michael S. Tsirkin
  2018-01-25  0:24     ` Haozhong Zhang
  1 sibling, 1 reply; 11+ messages in thread
From: Michael S. Tsirkin @ 2018-01-24 20:23 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: qemu-devel, Eduardo Habkost, Igor Mammedov, Paolo Bonzini,
	dgilbert, Xiao Guangrong, Stefan Hajnoczi, Dan Williams

On Wed, Jan 17, 2018 at 04:13:25PM +0800, Haozhong Zhang wrote:
> This option controls whether QEMU mmap(2) the memory backend file with
> MAP_SYNC flag, which can fully guarantee the guest write persistence
> to the backend, if MAP_SYNC flag is supported by the host kernel
> (Linux kernel 4.15 and later) and the backend is a file supporting
> DAX (e.g., file on ext4/xfs file system mounted with '-o dax').
> 
> It can take one of following values:
>  - on:  try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
>         'share=off', QEMU will abort
>  - off: never pass MAP_SYNC to mmap(2)
>  - auto (default): if MAP_SYNC is supported and 'share=on', work as if
>         'sync=on'; otherwise, work as if 'sync=off'
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
> ---
>  backends/hostmem-file.c | 40 +++++++++++++++++++++++++++++++++++++++-
>  docs/nvdimm.txt         | 15 ++++++++++++++-
>  exec.c                  | 13 ++++++++-----
>  include/exec/memory.h   |  4 ++++
>  include/exec/ram_addr.h |  6 +++---
>  memory.c                |  6 ++++--
>  numa.c                  |  2 +-
>  qemu-options.hx         | 21 ++++++++++++++++++++-
>  8 files changed, 93 insertions(+), 14 deletions(-)
> 
> diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
> index ed7d145365..96dff38619 100644
> --- a/backends/hostmem-file.c
> +++ b/backends/hostmem-file.c
> @@ -15,6 +15,7 @@
>  #include "sysemu/hostmem.h"
>  #include "sysemu/sysemu.h"
>  #include "qom/object_interfaces.h"
> +#include "qapi-visit.h"
>  
>  /* hostmem-file.c */
>  /**
> @@ -35,6 +36,7 @@ struct HostMemoryBackendFile {
>      bool discard_data;
>      char *mem_path;
>      uint64_t align;
> +    OnOffAuto sync;
>  };
>  
>  static void
> @@ -60,7 +62,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>          memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
>                                   path,
>                                   backend->size, fb->align, fb->share,
> -                                 fb->mem_path, errp);
> +                                 fb->sync, fb->mem_path, errp);
>          g_free(path);
>      }
>  #endif
> @@ -153,6 +155,39 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
>      error_propagate(errp, local_err);
>  }
>  
> +static void file_memory_backend_get_sync(
> +    Object *obj, Visitor *v, const char *name, void *opaque, Error **errp)
> +{
> +    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(obj);
> +    OnOffAuto value = fb->sync;
> +
> +    visit_type_OnOffAuto(v, name, &value, errp);
> +}
> +
> +static void file_memory_backend_set_sync(
> +    Object *obj, Visitor *v, const char *name, void *opaque, Error **errp)
> +{
> +    HostMemoryBackend *backend = MEMORY_BACKEND(obj);
> +    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(obj);
> +    Error *local_err = NULL;
> +    OnOffAuto value;
> +
> +    if (host_memory_backend_mr_inited(backend)) {
> +        error_setg(&local_err, "cannot change property '%s' of %s '%s'",
> +                   name, object_get_typename(obj), backend->id);
> +        goto out;
> +    }
> +
> +    visit_type_OnOffAuto(v, name, &value, &local_err);
> +    if (local_err) {
> +        goto out;
> +    }
> +    fb->sync = value;
> +
> + out:
> +    error_propagate(errp, local_err);
> +}
> +
>  static void file_backend_unparent(Object *obj)
>  {
>      HostMemoryBackend *backend = MEMORY_BACKEND(obj);
> @@ -187,6 +222,9 @@ file_backend_class_init(ObjectClass *oc, void *data)
>          file_memory_backend_get_align,
>          file_memory_backend_set_align,
>          NULL, NULL, &error_abort);
> +    object_class_property_add(oc, "sync", "OnOffAuto",
> +        file_memory_backend_get_sync, file_memory_backend_set_sync,
> +        NULL, NULL, &error_abort);
>  }
>  
>  static void file_backend_instance_finalize(Object *o)
> diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> index e903d8bb09..49b174fe66 100644
> --- a/docs/nvdimm.txt
> +++ b/docs/nvdimm.txt
> @@ -143,10 +143,23 @@ Guest Data Persistence
>  ----------------------
>  
>  Though QEMU supports multiple types of vNVDIMM backends on Linux,
> -currently the only one that can guarantee the guest write persistence
> +if MAP_SYNC is not supported by the host kernel and the backends,
> +the only backend that can guarantee the guest write persistence
>  is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
>  which all guest access do not involve any host-side kernel cache.
>  
> +mmap(2) flag MAP_SYNC is added since Linux kernel 4.15. On such
> +systems, QEMU can mmap(2) the backend with MAP_SYNC, which can
> +guarantee the guest write persistence to vNVDIMM. Besides the host
> +kernel support, enabling MAP_SYNC in QEMU also requires:
> +
> + - the backend is a file supporting DAX, e.g., a file on an ext4 or
> +   xfs file system mounted with "-o dax",
> +
> + - 'sync' option of memory-backend-file is not 'off', and
> +
> + - 'share' option of memory-backend-file is 'on'.
> +
>  When using other types of backends, it's suggested to set 'unarmed'
>  option of '-device nvdimm' to 'on', which sets the unarmed flag of the
>  guest NVDIMM region mapping structure.  This unarmed flag indicates
> diff --git a/exec.c b/exec.c
> index f4254cb6d3..ce13f8cb21 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1600,6 +1600,7 @@ static void *file_ram_alloc(RAMBlock *block,
>                              ram_addr_t memory,
>                              int fd,
>                              bool truncate,
> +                            OnOffAuto sync,
>                              Error **errp)
>  {
>      void *area;
> @@ -1646,7 +1647,7 @@ static void *file_ram_alloc(RAMBlock *block,
>      }
>  
>      area = qemu_ram_mmap(fd, memory, block->mr->align,
> -                         block->flags & RAM_SHARED, ON_OFF_AUTO_OFF);
> +                         block->flags & RAM_SHARED, sync);
>      if (area == MAP_FAILED) {
>          error_setg_errno(errp, errno,
>                           "unable to map backing store for guest RAM");
> @@ -1974,7 +1975,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
>  
>  #ifdef __linux__
>  RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
> -                                 bool share, int fd,
> +                                 bool share, OnOffAuto sync, int fd,
>                                   Error **errp)
>  {
>      RAMBlock *new_block;
> @@ -2017,7 +2018,8 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
>      new_block->used_length = size;
>      new_block->max_length = size;
>      new_block->flags = share ? RAM_SHARED : 0;
> -    new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp);
> +    new_block->host = file_ram_alloc(new_block, size, fd, !file_size, sync,
> +                                     errp);
>      if (!new_block->host) {
>          g_free(new_block);
>          return NULL;
> @@ -2035,7 +2037,8 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
>  
>  
>  RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
> -                                   bool share, const char *mem_path,
> +                                   bool share, OnOffAuto sync,
> +                                   const char *mem_path,
>                                     Error **errp)
>  {
>      int fd;
> @@ -2047,7 +2050,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
>          return NULL;
>      }
>  
> -    block = qemu_ram_alloc_from_fd(size, mr, share, fd, errp);
> +    block = qemu_ram_alloc_from_fd(size, mr, share, sync, fd, errp);
>      if (!block) {
>          if (created) {
>              unlink(mem_path);
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 07c5d6d597..ff3cd583e9 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -468,6 +468,9 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
>   * @align: alignment of the region base address; if 0, the default alignment
>   *         (getpagesize()) will be used.
>   * @share: %true if memory must be mmaped with the MAP_SHARED flag
> + * @sync: %ON_OFF_AUTO_ON if memory must be mapped with MAP_SYNC flag;
> + *        %ON_OFF_AUTO_OFF if memory cannot be mapped with MAP_SYNC flag;
> + *        %ON_OFF_AUTO_AUTO directs QEMU to mmap with MAP_SYNC flag if possible
>   * @path: the path in which to allocate the RAM.
>   * @errp: pointer to Error*, to store an error if it happens.
>   *
> @@ -480,6 +483,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
>                                        uint64_t size,
>                                        uint64_t align,
>                                        bool share,
> +                                      OnOffAuto sync,
>                                        const char *path,
>                                        Error **errp);
>  
> diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
> index 6cbc02aa0f..4494ae9132 100644
> --- a/include/exec/ram_addr.h
> +++ b/include/exec/ram_addr.h
> @@ -73,10 +73,10 @@ static inline unsigned long int ramblock_recv_bitmap_offset(void *host_addr,
>  long qemu_getrampagesize(void);
>  unsigned long last_ram_page(void);
>  RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
> -                                   bool share, const char *mem_path,
> -                                   Error **errp);
> +                                   bool share, OnOffAuto sync,
> +                                   const char *mem_path, Error **errp);
>  RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
> -                                 bool share, int fd,
> +                                 bool share, OnOffAuto sync, int fd,
>                                   Error **errp);
>  RAMBlock *qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
>                                    MemoryRegion *mr, Error **errp);
> diff --git a/memory.c b/memory.c
> index 449a1429b9..e22f51394e 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1572,6 +1572,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
>                                        uint64_t size,
>                                        uint64_t align,
>                                        bool share,
> +                                      OnOffAuto sync,
>                                        const char *path,
>                                        Error **errp)
>  {
> @@ -1580,7 +1581,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
>      mr->terminates = true;
>      mr->destructor = memory_region_destructor_ram;
>      mr->align = align;
> -    mr->ram_block = qemu_ram_alloc_from_file(size, mr, share, path, errp);
> +    mr->ram_block = qemu_ram_alloc_from_file(size, mr, share, sync, path, errp);
>      mr->dirty_log_mask = tcg_enabled() ? (1 << DIRTY_MEMORY_CODE) : 0;
>  }
>  
> @@ -1596,7 +1597,8 @@ void memory_region_init_ram_from_fd(MemoryRegion *mr,
>      mr->ram = true;
>      mr->terminates = true;
>      mr->destructor = memory_region_destructor_ram;
> -    mr->ram_block = qemu_ram_alloc_from_fd(size, mr, share, fd, errp);
> +    mr->ram_block = qemu_ram_alloc_from_fd(size, mr, share, ON_OFF_AUTO_OFF, fd,
> +                                           errp);
>      mr->dirty_log_mask = tcg_enabled() ? (1 << DIRTY_MEMORY_CODE) : 0;
>  }
>  #endif
> diff --git a/numa.c b/numa.c
> index 83675a03f3..93180510a4 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -457,7 +457,7 @@ static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner,
>  #ifdef __linux__
>          Error *err = NULL;
>          memory_region_init_ram_from_file(mr, owner, name, ram_size, 0, false,
> -                                         mem_path, &err);
> +                                         ON_OFF_AUTO_OFF, mem_path, &err);
>          if (err) {
>              error_report_err(err);
>              if (mem_prealloc) {
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 5ff741a4af..3ee423e6a8 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -3974,7 +3974,7 @@ property must be set.  These objects are placed in the
>  
>  @table @option
>  
> -@item -object memory-backend-file,id=@var{id},size=@var{size},mem-path=@var{dir},share=@var{on|off},discard-data=@var{on|off},merge=@var{on|off},dump=@var{on|off},prealloc=@var{on|off},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave},align=@var{align}
> +@item -object memory-backend-file,id=@var{id},size=@var{size},mem-path=@var{dir},share=@var{on|off},discard-data=@var{on|off},merge=@var{on|off},dump=@var{on|off},prealloc=@var{on|off},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave},align=@var{align},sync=@var{on|off|auto}
>  
>  Creates a memory file backend object, which can be used to back
>  the guest RAM with huge pages.
> @@ -4034,6 +4034,25 @@ requires an alignment different than the default one used by QEMU, eg
>  the device DAX /dev/dax0.0 requires 2M alignment rather than 4K. In
>  such cases, users can specify the required alignment via this option.
>  
> +The @option{sync} option specifies whether QEMU mmap(2) @option{mem-path}
> +with MAP_SYNC flag, which can fully guarantee the guest write
> +persistence to @option{mem-path}.

I would add ... even in case of a host power loss.
Here and wherever you say "fully".

> MAP_SYNC requires supports from both
> +the host kernel (since Linux kernel 4.15) and @option{mem-path} (only
> +files supporting DAX). It can take one of following values:
> +
> +@table @option
> +@item @var{on}
> +try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
> +@option{share}=@var{off}, QEMU will abort
> +
> +@item @var{off}
> +never pass MAP_SYNC to mmap(2)
> +
> +@item @var{auto} (default)
> +if MAP_SYNC is supported and @option{share}=@var{on}, work as if
> +@option{sync}=@var{on}; otherwise, work as if @option{sync}=@var{off}
> +@end table
> +
>  @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
>  
>  Creates a memory backend object, which can be used to back the guest RAM.
> -- 
> 2.14.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/3] hostmem: add more information in error messages
  2018-01-17  8:13 ` [Qemu-devel] [PATCH v3 2/3] hostmem: add more information in error messages Haozhong Zhang
@ 2018-01-24 20:23   ` Michael S. Tsirkin
  0 siblings, 0 replies; 11+ messages in thread
From: Michael S. Tsirkin @ 2018-01-24 20:23 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: qemu-devel, Eduardo Habkost, Igor Mammedov, Paolo Bonzini,
	dgilbert, Xiao Guangrong, Stefan Hajnoczi, Dan Williams

On Wed, Jan 17, 2018 at 04:13:24PM +0800, Haozhong Zhang wrote:
> When there are multiple memory backends in use, including the object type
> name, ID and the property name in the error message can help users to
> locate the error.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> Suggested-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Reviewed-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  backends/hostmem-file.c |  9 ++++++---
>  backends/hostmem.c      | 11 +++++++----
>  2 files changed, 13 insertions(+), 7 deletions(-)
> 
> diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
> index e319ec1ad8..ed7d145365 100644
> --- a/backends/hostmem-file.c
> +++ b/backends/hostmem-file.c
> @@ -79,7 +79,8 @@ static void set_mem_path(Object *o, const char *str, Error **errp)
>      HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
>  
>      if (host_memory_backend_mr_inited(backend)) {
> -        error_setg(errp, "cannot change property value");
> +        error_setg(errp, "cannot change property 'mem-path' of %s '%s'",
> +                   object_get_typename(o), backend->id);
>          return;
>      }
>      g_free(fb->mem_path);
> @@ -99,7 +100,8 @@ static void file_memory_backend_set_share(Object *o, bool value, Error **errp)
>      HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
>  
>      if (host_memory_backend_mr_inited(backend)) {
> -        error_setg(errp, "cannot change property value");
> +        error_setg(errp, "cannot change property 'share' of %s '%s'",
> +                   object_get_typename(o), backend->id);
>          return;
>      }
>      fb->share = value;
> @@ -136,7 +138,8 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
>      uint64_t val;
>  
>      if (host_memory_backend_mr_inited(backend)) {
> -        error_setg(&local_err, "cannot change property value");
> +        error_setg(&local_err, "cannot change property '%s' of %s '%s'",
> +                   name, object_get_typename(o), backend->id);
>          goto out;
>      }
>  
> diff --git a/backends/hostmem.c b/backends/hostmem.c
> index ee2c2d5bfd..6853d19bc5 100644
> --- a/backends/hostmem.c
> +++ b/backends/hostmem.c
> @@ -46,7 +46,8 @@ host_memory_backend_set_size(Object *obj, Visitor *v, const char *name,
>      uint64_t value;
>  
>      if (host_memory_backend_mr_inited(backend)) {
> -        error_setg(&local_err, "cannot change property value");
> +        error_setg(&local_err, "cannot change property %s of %s '%s'",
> +                   name, object_get_typename(obj), backend->id);
>          goto out;
>      }
>  
> @@ -55,8 +56,9 @@ host_memory_backend_set_size(Object *obj, Visitor *v, const char *name,
>          goto out;
>      }
>      if (!value) {
> -        error_setg(&local_err, "Property '%s.%s' doesn't take value '%"
> -                   PRIu64 "'", object_get_typename(obj), name, value);
> +        error_setg(&local_err,
> +                   "property '%s' of %s '%s' doesn't take value '%" PRIu64 "'",
> +                   name, object_get_typename(obj), backend->id, value);
>          goto out;
>      }
>      backend->size = value;
> @@ -363,7 +365,8 @@ static void set_id(Object *o, const char *str, Error **errp)
>      HostMemoryBackend *backend = MEMORY_BACKEND(o);
>  
>      if (backend->id) {
> -        error_setg(errp, "cannot change property value");
> +        error_setg(errp, "cannot change property 'id' of %s '%s'",
> +                   object_get_typename(o), backend->id);
>          return;
>      }
>      backend->id = g_strdup(str);
> -- 
> 2.14.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/3] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  2018-01-24 20:20   ` Michael S. Tsirkin
@ 2018-01-25  0:14     ` Haozhong Zhang
  0 siblings, 0 replies; 11+ messages in thread
From: Haozhong Zhang @ 2018-01-25  0:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, Eduardo Habkost, Igor Mammedov, Paolo Bonzini,
	dgilbert, Xiao Guangrong, Stefan Hajnoczi, Dan Williams

On 01/24/18 22:20 +0200, Michael S. Tsirkin wrote:
> > index 50385e3f81..dd5876471f 100644
> > --- a/include/qemu/mmap-alloc.h
> > +++ b/include/qemu/mmap-alloc.h
> > @@ -7,7 +7,8 @@ size_t qemu_fd_getpagesize(int fd);
> >  
> >  size_t qemu_mempath_getpagesize(const char *mem_path);
> >  
> > -void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared);
> > +void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared,
> > +                    OnOffAuto sync);
> >  
> >  void qemu_ram_munmap(void *ptr, size_t size);
> >  
> 
> And Marcel plans to add a remappable flag ...  Is it time we
> switched to a flags field?

Yes. Some patches on my hands are going to add another field to this
function, so let's switch to flags.

> 
> > diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> > index adb3758275..0ff10cb529 100644
> > --- a/include/qemu/osdep.h
> > +++ b/include/qemu/osdep.h
> > @@ -372,6 +372,24 @@ void qemu_anon_ram_free(void *ptr, size_t size);
> >  #  define QEMU_VMALLOC_ALIGN getpagesize()
> >  #endif
> >  
> > +/*
> > + * MAP_SHARED_VALIDATE and MAP_SYNC were introduced in Linux kernel
> > + * 4.15, so they may not be defined when compiling on older kernels.
> > + */
> > +#ifdef CONFIG_LINUX
> > +#ifndef MAP_SHARED_VALIDATE
> > +#define MAP_SHARED_VALIDATE   0x3
> > +#endif
> > +#ifndef MAP_SYNC
> > +#define MAP_SYNC              0x80000
> > +#endif
> > +#define QEMU_HAS_MAP_SYNC     true
> > +#else  /* !CONFIG_LINUX */
> > +#define MAP_SHARED_VALIDATE   0x0
> > +#define MAP_SYNC              0x0
> > +#define QEMU_HAS_MAP_SYNC     false
> > +#endif /* CONFIG_LINUX */
> > +
> >  #ifdef CONFIG_POSIX
> >  struct qemu_signalfd_siginfo {
> >      uint32_t ssi_signo;   /* Signal number */
> 
> Please just import this into standard-headers from Linux.
>

Sure, I'll move it to a new file include/standard-headers/linux/mman.h.

Thanks,
Haozhong

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH v3 3/3] hostmem-file: add 'sync' option
  2018-01-24 20:23   ` Michael S. Tsirkin
@ 2018-01-25  0:24     ` Haozhong Zhang
  0 siblings, 0 replies; 11+ messages in thread
From: Haozhong Zhang @ 2018-01-25  0:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, Eduardo Habkost, Igor Mammedov, Paolo Bonzini,
	dgilbert, Xiao Guangrong, Stefan Hajnoczi, Dan Williams

On 01/24/18 22:23 +0200, Michael S. Tsirkin wrote:
> On Wed, Jan 17, 2018 at 04:13:25PM +0800, Haozhong Zhang wrote:
> > This option controls whether QEMU mmap(2) the memory backend file with
> > MAP_SYNC flag, which can fully guarantee the guest write persistence
> > to the backend, if MAP_SYNC flag is supported by the host kernel
> > (Linux kernel 4.15 and later) and the backend is a file supporting
> > DAX (e.g., file on ext4/xfs file system mounted with '-o dax').
> > 
> > It can take one of following values:
> >  - on:  try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
> >         'share=off', QEMU will abort
> >  - off: never pass MAP_SYNC to mmap(2)
> >  - auto (default): if MAP_SYNC is supported and 'share=on', work as if
> >         'sync=on'; otherwise, work as if 'sync=off'
> > 
> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > Suggested-by: Eduardo Habkost <ehabkost@redhat.com>

[..]
> >  
> >  @table @option
> >  
> > -@item -object memory-backend-file,id=@var{id},size=@var{size},mem-path=@var{dir},share=@var{on|off},discard-data=@var{on|off},merge=@var{on|off},dump=@var{on|off},prealloc=@var{on|off},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave},align=@var{align}
> > +@item -object memory-backend-file,id=@var{id},size=@var{size},mem-path=@var{dir},share=@var{on|off},discard-data=@var{on|off},merge=@var{on|off},dump=@var{on|off},prealloc=@var{on|off},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave},align=@var{align},sync=@var{on|off|auto}
> >  
> >  Creates a memory file backend object, which can be used to back
> >  the guest RAM with huge pages.
> > @@ -4034,6 +4034,25 @@ requires an alignment different than the default one used by QEMU, eg
> >  the device DAX /dev/dax0.0 requires 2M alignment rather than 4K. In
> >  such cases, users can specify the required alignment via this option.
> >  
> > +The @option{sync} option specifies whether QEMU mmap(2) @option{mem-path}
> > +with MAP_SYNC flag, which can fully guarantee the guest write
> > +persistence to @option{mem-path}.
> 
> I would add ... even in case of a host power loss.
> Here and wherever you say "fully".

Without MAP_SYNC, QEMU can only guarantee the guest data is written to
the host NVDIMM after, for example, guest clwb+sfence. However, if
some host file system meta data of the mapped file have not been
written back to the host NVDIMM when a host power failure happens, the
mapped file may be broken though all its data may be still there.

Anyway, I'll remove the confusing word "fully" and add your suggestion.

Thanks,
Haozhong

> 
> > MAP_SYNC requires supports from both
> > +the host kernel (since Linux kernel 4.15) and @option{mem-path} (only
> > +files supporting DAX). It can take one of following values:
> > +
> > +@table @option
> > +@item @var{on}
> > +try to pass MAP_SYNC to mmap(2); if MAP_SYNC is not supported or
> > +@option{share}=@var{off}, QEMU will abort
> > +
> > +@item @var{off}
> > +never pass MAP_SYNC to mmap(2)
> > +
> > +@item @var{auto} (default)
> > +if MAP_SYNC is supported and @option{share}=@var{on}, work as if
> > +@option{sync}=@var{on}; otherwise, work as if @option{sync}=@var{off}
> > +@end table
> > +
> >  @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
> >  
> >  Creates a memory backend object, which can be used to back the guest RAM.
> > -- 
> > 2.14.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2018-01-25  0:24 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-17  8:13 [Qemu-devel] [PATCH v3 0/3] nvdimm: support MAP_SYNC for memory-backend-file Haozhong Zhang
2018-01-17  8:13 ` [Qemu-devel] [PATCH v3 1/3] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap() Haozhong Zhang
2018-01-24 20:20   ` Michael S. Tsirkin
2018-01-25  0:14     ` Haozhong Zhang
2018-01-17  8:13 ` [Qemu-devel] [PATCH v3 2/3] hostmem: add more information in error messages Haozhong Zhang
2018-01-24 20:23   ` Michael S. Tsirkin
2018-01-17  8:13 ` [Qemu-devel] [PATCH v3 3/3] hostmem-file: add 'sync' option Haozhong Zhang
2018-01-24 20:22   ` Michael S. Tsirkin
2018-01-24 20:23   ` Michael S. Tsirkin
2018-01-25  0:24     ` Haozhong Zhang
2018-01-24  8:12 ` [Qemu-devel] [PATCH v3 0/3] nvdimm: support MAP_SYNC for memory-backend-file Haozhong Zhang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.