All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v4 0/8] nvdimm: guarantee persistence of QEMU writes to persistent memory
@ 2018-02-28  7:25 Haozhong Zhang
  2018-02-28  7:25 ` [Qemu-devel] [PATCH v4 1/8] memory, exec: switch file ram allocation functions to 'flags' parameters Haozhong Zhang
                   ` (10 more replies)
  0 siblings, 11 replies; 19+ messages in thread
From: Haozhong Zhang @ 2018-02-28  7:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Igor Mammedov, Paolo Bonzini, mst,
	Xiao Guangrong, Juan Quintela, dgilbert, Stefan Hajnoczi,
	Dan Williams, Haozhong Zhang

QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and
live migration. If the backend is on the persistent memory, QEMU needs
to take proper operations to ensure its writes persistent on the
persistent memory. Otherwise, a host power failure may result in the
loss the guest data on the persistent memory.

This v3 patch series is based on Marcel's patch "mem: add share
parameter to memory-backend-ram" [1] because of the changes in patch 1.

[1] https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html

Previous versions can be found at
v3: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04365.html
v2: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01579.html
v1: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05040.html

Changes in v4:
 * (Patch 2) Fix compilation errors found by patchew.

Changes in v3:
 * (Patch 5) Add a is_pmem flag to ram_handle_compressed() and handle
   PMEM writes in it, so we don't need the _common function.
 * (Patch 6) Expose qemu_get_buffer_common so we can remove the
   unnecessary qemu_get_buffer_to_pmem wrapper.
 * (Patch 8) Add a is_pmem flag to xbzrle_decode_buffer() and handle
   PMEM writes in it, so we can remove the unnecessary
   xbzrle_decode_buffer_{common, to_pmem}.
 * Move libpmem stubs to stubs/pmem.c and fix the compilation failures
   of test-{xbzrle,vmstate}.c.

Changes in v2:
 * (Patch 1) Use a flags parameter in file ram allocation functions.
 * (Patch 2) Add a new option 'pmem' to hostmem-file.
 * (Patch 3) Use libpmem to operate on the persistent memory, rather
   than re-implementing those operations in QEMU.
 * (Patch 5-8) Consider the write persistence in the migration path.

Haozhong Zhang (8):
  [1/8] memory, exec: switch file ram allocation functions to 'flags' parameters
  [2/8] hostmem-file: add the 'pmem' option
  [3/8] configure: add libpmem support
  [4/8] mem/nvdimm: ensure write persistence to PMEM in label emulation
  [5/8] migration/ram: ensure write persistence on loading zero pages to PMEM
  [6/8] migration/ram: ensure write persistence on loading normal pages to PMEM
  [7/8] migration/ram: ensure write persistence on loading compressed pages to PMEM
  [8/8] migration/ram: ensure write persistence on loading xbzrle pages to PMEM

 backends/hostmem-file.c             | 27 +++++++++++++++++++-
 configure                           | 35 ++++++++++++++++++++++++++
 docs/nvdimm.txt                     | 14 +++++++++++
 exec.c                              | 20 ++++++++++++---
 hw/mem/nvdimm.c                     |  9 ++++++-
 include/exec/memory.h               | 12 +++++++--
 include/exec/ram_addr.h             | 28 +++++++++++++++++++--
 include/migration/qemu-file-types.h |  2 ++
 include/qemu/pmem.h                 | 27 ++++++++++++++++++++
 memory.c                            |  8 +++---
 migration/qemu-file.c               | 29 ++++++++++++++--------
 migration/ram.c                     | 49 +++++++++++++++++++++++++++----------
 migration/ram.h                     |  2 +-
 migration/rdma.c                    |  2 +-
 migration/xbzrle.c                  |  8 ++++--
 migration/xbzrle.h                  |  3 ++-
 numa.c                              |  2 +-
 qemu-options.hx                     |  9 ++++++-
 stubs/Makefile.objs                 |  1 +
 stubs/pmem.c                        | 37 ++++++++++++++++++++++++++++
 tests/Makefile.include              |  4 +--
 tests/test-xbzrle.c                 |  4 +--
 22 files changed, 285 insertions(+), 47 deletions(-)
 create mode 100644 include/qemu/pmem.h
 create mode 100644 stubs/pmem.c

-- 
2.14.1

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v4 1/8] memory, exec: switch file ram allocation functions to 'flags' parameters
  2018-02-28  7:25 [Qemu-devel] [PATCH v4 0/8] nvdimm: guarantee persistence of QEMU writes to persistent memory Haozhong Zhang
@ 2018-02-28  7:25 ` Haozhong Zhang
  2018-02-28  7:25 ` [Qemu-devel] [PATCH v4 2/8] hostmem-file: add the 'pmem' option Haozhong Zhang
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Haozhong Zhang @ 2018-02-28  7:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Igor Mammedov, Paolo Bonzini, mst,
	Xiao Guangrong, Juan Quintela, dgilbert, Stefan Hajnoczi,
	Dan Williams, Haozhong Zhang

As more flag parameters besides the existing 'share' are going to be
added to following functions
    memory_region_init_ram_from_file
    qemu_ram_alloc_from_fd
    qemu_ram_alloc_from_file
, let's switch them to use the 'flags' parameters so as to ease future
flag additions.

The existing 'share' flag is converted to the QEMU_RAM_SHARE bit in
flags, and other flag bits are ignored by above functions right now.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 backends/hostmem-file.c |  3 ++-
 exec.c                  |  7 ++++---
 include/exec/memory.h   | 10 ++++++++--
 include/exec/ram_addr.h | 25 +++++++++++++++++++++++--
 memory.c                |  8 +++++---
 numa.c                  |  2 +-
 6 files changed, 43 insertions(+), 12 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 134b08d63a..30df843d90 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -58,7 +58,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
         path = object_get_canonical_path(OBJECT(backend));
         memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
                                  path,
-                                 backend->size, fb->align, backend->share,
+                                 backend->size, fb->align,
+                                 backend->share ? QEMU_RAM_SHARE : 0,
                                  fb->mem_path, errp);
         g_free(path);
     }
diff --git a/exec.c b/exec.c
index 4d8addb263..537bf12412 100644
--- a/exec.c
+++ b/exec.c
@@ -2000,12 +2000,13 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
 
 #ifdef __linux__
 RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
-                                 bool share, int fd,
+                                 uint64_t flags, int fd,
                                  Error **errp)
 {
     RAMBlock *new_block;
     Error *local_err = NULL;
     int64_t file_size;
+    bool share = flags & QEMU_RAM_SHARE;
 
     if (xen_enabled()) {
         error_setg(errp, "-mem-path not supported with Xen");
@@ -2061,7 +2062,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
 
 
 RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
-                                   bool share, const char *mem_path,
+                                   uint64_t flags, const char *mem_path,
                                    Error **errp)
 {
     int fd;
@@ -2073,7 +2074,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
         return NULL;
     }
 
-    block = qemu_ram_alloc_from_fd(size, mr, share, fd, errp);
+    block = qemu_ram_alloc_from_fd(size, mr, flags, fd, errp);
     if (!block) {
         if (created) {
             unlink(mem_path);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 15e81113ba..0fc9d23a48 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -487,6 +487,9 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
                                                        void *host),
                                        Error **errp);
 #ifdef __linux__
+
+#define QEMU_RAM_SHARE      (1UL << 0)
+
 /**
  * memory_region_init_ram_from_file:  Initialize RAM memory region with a
  *                                    mmap-ed backend.
@@ -498,7 +501,10 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
  * @size: size of the region.
  * @align: alignment of the region base address; if 0, the default alignment
  *         (getpagesize()) will be used.
- * @share: %true if memory must be mmaped with the MAP_SHARED flag
+ * @flags: specify properties of this memory region, which can be one or bit-or
+ *         of following values:
+ *         - QEMU_RAM_SHARE: memory must be mmaped with the MAP_SHARED flag
+ *         Other bits are ignored.
  * @path: the path in which to allocate the RAM.
  * @errp: pointer to Error*, to store an error if it happens.
  *
@@ -510,7 +516,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
                                       const char *name,
                                       uint64_t size,
                                       uint64_t align,
-                                      bool share,
+                                      uint64_t flags,
                                       const char *path,
                                       Error **errp);
 
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index cf2446a176..b8b01d1eb9 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -72,12 +72,33 @@ static inline unsigned long int ramblock_recv_bitmap_offset(void *host_addr,
 
 long qemu_getrampagesize(void);
 unsigned long last_ram_page(void);
+
+/**
+ * qemu_ram_alloc_from_file,
+ * qemu_ram_alloc_from_fd:  Allocate a ram block from the specified back
+ *                          file or device
+ *
+ * Parameters:
+ *  @size: the size in bytes of the ram block
+ *  @mr: the memory region where the ram block is
+ *  @flags: specify the properties of the ram block, which can be one
+ *          or bit-or of following values
+ *          - QEMU_RAM_SHARE: mmap the back file or device with MAP_SHARED
+ *          Other bits are ignored.
+ *  @mem_path or @fd: specify the back file or device
+ *  @errp: pointer to Error*, to store an error if it happens
+ *
+ * Return:
+ *  On success, return a pointer to the ram block.
+ *  On failure, return NULL.
+ */
 RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
-                                   bool share, const char *mem_path,
+                                   uint64_t flags, const char *mem_path,
                                    Error **errp);
 RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
-                                 bool share, int fd,
+                                 uint64_t flags, int fd,
                                  Error **errp);
+
 RAMBlock *qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
                                   MemoryRegion *mr, Error **errp);
 RAMBlock *qemu_ram_alloc(ram_addr_t size, bool share, MemoryRegion *mr,
diff --git a/memory.c b/memory.c
index 6515131ac2..e89500837a 100644
--- a/memory.c
+++ b/memory.c
@@ -1582,7 +1582,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
                                       const char *name,
                                       uint64_t size,
                                       uint64_t align,
-                                      bool share,
+                                      uint64_t flags,
                                       const char *path,
                                       Error **errp)
 {
@@ -1591,7 +1591,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
     mr->terminates = true;
     mr->destructor = memory_region_destructor_ram;
     mr->align = align;
-    mr->ram_block = qemu_ram_alloc_from_file(size, mr, share, path, errp);
+    mr->ram_block = qemu_ram_alloc_from_file(size, mr, flags, path, errp);
     mr->dirty_log_mask = tcg_enabled() ? (1 << DIRTY_MEMORY_CODE) : 0;
 }
 
@@ -1607,7 +1607,9 @@ void memory_region_init_ram_from_fd(MemoryRegion *mr,
     mr->ram = true;
     mr->terminates = true;
     mr->destructor = memory_region_destructor_ram;
-    mr->ram_block = qemu_ram_alloc_from_fd(size, mr, share, fd, errp);
+    mr->ram_block = qemu_ram_alloc_from_fd(size, mr,
+                                           share ? QEMU_RAM_SHARE : 0,
+                                           fd, errp);
     mr->dirty_log_mask = tcg_enabled() ? (1 << DIRTY_MEMORY_CODE) : 0;
 }
 #endif
diff --git a/numa.c b/numa.c
index 7e0e789b02..ff5a0bbe18 100644
--- a/numa.c
+++ b/numa.c
@@ -457,7 +457,7 @@ static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner,
     if (mem_path) {
 #ifdef __linux__
         Error *err = NULL;
-        memory_region_init_ram_from_file(mr, owner, name, ram_size, 0, false,
+        memory_region_init_ram_from_file(mr, owner, name, ram_size, 0, 0,
                                          mem_path, &err);
         if (err) {
             error_report_err(err);
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v4 2/8] hostmem-file: add the 'pmem' option
  2018-02-28  7:25 [Qemu-devel] [PATCH v4 0/8] nvdimm: guarantee persistence of QEMU writes to persistent memory Haozhong Zhang
  2018-02-28  7:25 ` [Qemu-devel] [PATCH v4 1/8] memory, exec: switch file ram allocation functions to 'flags' parameters Haozhong Zhang
@ 2018-02-28  7:25 ` Haozhong Zhang
  2018-02-28  7:25 ` [Qemu-devel] [PATCH v4 3/8] configure: add libpmem support Haozhong Zhang
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Haozhong Zhang @ 2018-02-28  7:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Igor Mammedov, Paolo Bonzini, mst,
	Xiao Guangrong, Juan Quintela, dgilbert, Stefan Hajnoczi,
	Dan Williams, Haozhong Zhang

When QEMU emulates vNVDIMM labels and migrates vNVDIMM devices, it
needs to know whether the backend storage is a real persistent memory,
in order to decide whether special operations should be performed to
ensure the data persistence.

This boolean option 'pmem' allows users to specify whether the backend
storage of memory-backend-file is a real persistent memory. If
'pmem=on', QEMU will set the flag RAM_PMEM in the RAM block of the
corresponding memory region.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 backends/hostmem-file.c | 26 +++++++++++++++++++++++++-
 docs/nvdimm.txt         | 14 ++++++++++++++
 exec.c                  | 13 ++++++++++++-
 include/exec/memory.h   |  2 ++
 include/exec/ram_addr.h |  3 +++
 qemu-options.hx         |  9 ++++++++-
 6 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 30df843d90..5d706d471f 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -34,6 +34,7 @@ struct HostMemoryBackendFile {
     bool discard_data;
     char *mem_path;
     uint64_t align;
+    bool is_pmem;
 };
 
 static void
@@ -59,7 +60,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
         memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
                                  path,
                                  backend->size, fb->align,
-                                 backend->share ? QEMU_RAM_SHARE : 0,
+                                 (backend->share ? QEMU_RAM_SHARE : 0) |
+                                 (fb->is_pmem ? QEMU_RAM_PMEM : 0),
                                  fb->mem_path, errp);
         g_free(path);
     }
@@ -131,6 +133,25 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
     error_propagate(errp, local_err);
 }
 
+static bool file_memory_backend_get_pmem(Object *o, Error **errp)
+{
+    return MEMORY_BACKEND_FILE(o)->is_pmem;
+}
+
+static void file_memory_backend_set_pmem(Object *o, bool value, Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(o);
+    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+
+    if (host_memory_backend_mr_inited(backend)) {
+        error_setg(errp, "cannot change property 'pmem' of %s '%s'",
+                   object_get_typename(o), backend->id);
+        return;
+    }
+
+    fb->is_pmem = value;
+}
+
 static void file_backend_unparent(Object *obj)
 {
     HostMemoryBackend *backend = MEMORY_BACKEND(obj);
@@ -162,6 +183,9 @@ file_backend_class_init(ObjectClass *oc, void *data)
         file_memory_backend_get_align,
         file_memory_backend_set_align,
         NULL, NULL, &error_abort);
+    object_class_property_add_bool(oc, "pmem",
+        file_memory_backend_get_pmem, file_memory_backend_set_pmem,
+        &error_abort);
 }
 
 static void file_backend_instance_finalize(Object *o)
diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index e903d8bb09..bcb2032672 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -153,3 +153,17 @@ guest NVDIMM region mapping structure.  This unarmed flag indicates
 guest software that this vNVDIMM device contains a region that cannot
 accept persistent writes. In result, for example, the guest Linux
 NVDIMM driver, marks such vNVDIMM device as read-only.
+
+If the vNVDIMM backend is on the host persistent memory that can be
+accessed in SNIA NVM Programming Model [1] (e.g., Intel NVDIMM), it's
+suggested to set the 'pmem' option of memory-backend-file to 'on'. When
+'pmem=on' and QEMU is built with libpmem [2] support (configured with
+--enable-libpmem), QEMU will take necessary operations to guarantee
+the persistence of its own writes to the vNVDIMM backend (e.g., in
+vNVDIMM label emulation and live migration).
+
+References
+----------
+
+[1] SNIA NVM Programming Model: https://www.snia.org/sites/default/files/technical_work/final/NVMProgrammingModel_v1.2.pdf
+[2] PMDK: http://pmem.io/pmdk/
diff --git a/exec.c b/exec.c
index 537bf12412..3f3b61fb0a 100644
--- a/exec.c
+++ b/exec.c
@@ -99,6 +99,9 @@ static MemoryRegion io_mem_unassigned;
  */
 #define RAM_RESIZEABLE (1 << 2)
 
+/* RAM is backed by the persistent memory. */
+#define RAM_PMEM       (1 << 3)
+
 #endif
 
 #ifdef TARGET_PAGE_BITS_VARY
@@ -2007,6 +2010,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
     Error *local_err = NULL;
     int64_t file_size;
     bool share = flags & QEMU_RAM_SHARE;
+    bool is_pmem = flags & QEMU_RAM_PMEM;
 
     if (xen_enabled()) {
         error_setg(errp, "-mem-path not supported with Xen");
@@ -2043,7 +2047,8 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
     new_block->mr = mr;
     new_block->used_length = size;
     new_block->max_length = size;
-    new_block->flags = share ? RAM_SHARED : 0;
+    new_block->flags = (share ? RAM_SHARED : 0) |
+                       (is_pmem ? RAM_PMEM : 0);
     new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp);
     if (!new_block->host) {
         g_free(new_block);
@@ -3749,6 +3754,11 @@ err:
     return ret;
 }
 
+bool ramblock_is_pmem(RAMBlock *rb)
+{
+    return rb->flags & RAM_PMEM;
+}
+
 #endif
 
 void page_size_init(void)
@@ -3847,3 +3857,4 @@ void mtree_print_dispatch(fprintf_function mon, void *f,
 }
 
 #endif
+
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 0fc9d23a48..b0b49a7a60 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -489,6 +489,7 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
 #ifdef __linux__
 
 #define QEMU_RAM_SHARE      (1UL << 0)
+#define QEMU_RAM_PMEM       (1UL << 1)
 
 /**
  * memory_region_init_ram_from_file:  Initialize RAM memory region with a
@@ -504,6 +505,7 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
  * @flags: specify properties of this memory region, which can be one or bit-or
  *         of following values:
  *         - QEMU_RAM_SHARE: memory must be mmaped with the MAP_SHARED flag
+ *         - QEMU_RAM_PMEM: the backend @path is persistent memory
  *         Other bits are ignored.
  * @path: the path in which to allocate the RAM.
  * @errp: pointer to Error*, to store an error if it happens.
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index b8b01d1eb9..f8d8614e4d 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -70,6 +70,8 @@ static inline unsigned long int ramblock_recv_bitmap_offset(void *host_addr,
     return host_addr_offset >> TARGET_PAGE_BITS;
 }
 
+bool ramblock_is_pmem(RAMBlock *rb);
+
 long qemu_getrampagesize(void);
 unsigned long last_ram_page(void);
 
@@ -84,6 +86,7 @@ unsigned long last_ram_page(void);
  *  @flags: specify the properties of the ram block, which can be one
  *          or bit-or of following values
  *          - QEMU_RAM_SHARE: mmap the back file or device with MAP_SHARED
+ *          - QEMU_RAM_PMEM: the backend @mem_path or @fd is persistent memory
  *          Other bits are ignored.
  *  @mem_path or @fd: specify the back file or device
  *  @errp: pointer to Error*, to store an error if it happens
diff --git a/qemu-options.hx b/qemu-options.hx
index 8ccd5dcaa6..62c7afe567 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3957,7 +3957,7 @@ property must be set.  These objects are placed in the
 
 @table @option
 
-@item -object memory-backend-file,id=@var{id},size=@var{size},mem-path=@var{dir},share=@var{on|off},discard-data=@var{on|off},merge=@var{on|off},dump=@var{on|off},prealloc=@var{on|off},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave},align=@var{align}
+@item -object memory-backend-file,id=@var{id},size=@var{size},mem-path=@var{dir},share=@var{on|off},discard-data=@var{on|off},merge=@var{on|off},dump=@var{on|off},prealloc=@var{on|off},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave},align=@var{align},pmem=@var{on|off}
 
 Creates a memory file backend object, which can be used to back
 the guest RAM with huge pages.
@@ -4025,6 +4025,13 @@ requires an alignment different than the default one used by QEMU, eg
 the device DAX /dev/dax0.0 requires 2M alignment rather than 4K. In
 such cases, users can specify the required alignment via this option.
 
+The @option{pmem} option specifies whether the backend store specified
+by @option{mem-path} is on the persistent memory that can be accessed
+in the SNIA NVM programming model (e.g. Intel NVDIMM).
+If @option{pmem}=@var{on}, QEMU will take necessary operations to
+guarantee the persistence of its own writes to @option{mem-path}
+(e.g. in vNVDIMM label emulation and live migration).
+
 @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
 
 Creates a memory backend object, which can be used to back the guest RAM.
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v4 3/8] configure: add libpmem support
  2018-02-28  7:25 [Qemu-devel] [PATCH v4 0/8] nvdimm: guarantee persistence of QEMU writes to persistent memory Haozhong Zhang
  2018-02-28  7:25 ` [Qemu-devel] [PATCH v4 1/8] memory, exec: switch file ram allocation functions to 'flags' parameters Haozhong Zhang
  2018-02-28  7:25 ` [Qemu-devel] [PATCH v4 2/8] hostmem-file: add the 'pmem' option Haozhong Zhang
@ 2018-02-28  7:25 ` Haozhong Zhang
  2018-02-28  7:25 ` [Qemu-devel] [PATCH v4 4/8] mem/nvdimm: ensure write persistence to PMEM in label emulation Haozhong Zhang
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Haozhong Zhang @ 2018-02-28  7:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Igor Mammedov, Paolo Bonzini, mst,
	Xiao Guangrong, Juan Quintela, dgilbert, Stefan Hajnoczi,
	Dan Williams, Haozhong Zhang

Add a pair of configure options --{enable,disable}-libpmem to control
whether QEMU is compiled with PMDK libpmem [1].

QEMU may write to the host persistent memory (e.g. in vNVDIMM label
emulation and live migration), so it must take the proper operations
to ensure the persistence of its own writes. Depending on the CPU
models and available instructions, the optimal operation can vary [2].
PMDK libpmem have already implemented those operations on multiple CPU
models (x86 and ARM) and the logic to select the optimal ones, so QEMU
can just use libpmem rather than re-implement them.

[1] PMDK (formerly known as NMVL), https://github.com/pmem/pmdk/
[2] https://github.com/pmem/pmdk/blob/38bfa652721a37fd94c0130ce0e3f5d8baa3ed40/src/libpmem/pmem.c#L33

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 configure | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/configure b/configure
index 39f3a43001..78e10f6d6d 100755
--- a/configure
+++ b/configure
@@ -450,6 +450,7 @@ jemalloc="no"
 replication="yes"
 vxhs=""
 libxml2=""
+libpmem=""
 
 supported_cpu="no"
 supported_os="no"
@@ -1360,6 +1361,10 @@ for opt do
   ;;
   --disable-git-update) git_update=no
   ;;
+  --enable-libpmem) libpmem=yes
+  ;;
+  --disable-libpmem) libpmem=no
+  ;;
   *)
       echo "ERROR: unknown option $opt"
       echo "Try '$0 --help' for more information"
@@ -1612,6 +1617,7 @@ disabled with --disable-FEATURE, default is enabled if available:
   crypto-afalg    Linux AF_ALG crypto backend driver
   vhost-user      vhost-user support
   capstone        capstone disassembler support
+  libpmem         libpmem support
 
 NOTE: The object files are built at the place where configure is launched
 EOF
@@ -5347,6 +5353,30 @@ EOF
   fi
 fi
 
+##########################################
+# check for libpmem
+
+if test "$libpmem" != "no"; then
+  cat > $TMPC <<EOF
+#include <libpmem.h>
+int main(void)
+{
+  pmem_is_pmem(0, 0);
+  return 0;
+}
+EOF
+  libpmem_libs="-lpmem"
+  if compile_prog "" "$libpmem_libs" ; then
+    libs_softmmu="$libpmem_libs $libs_softmmu"
+    libpmem="yes"
+  else
+    if test "$libpmem" = "yes" ; then
+      feature_not_found "libpmem" "Install nvml or pmdk"
+    fi
+    libpmem="no"
+  fi
+fi
+
 ##########################################
 # End of CC checks
 # After here, no more $cc or $ld runs
@@ -5817,6 +5847,7 @@ echo "avx2 optimization $avx2_opt"
 echo "replication support $replication"
 echo "VxHS block device $vxhs"
 echo "capstone          $capstone"
+echo "libpmem support   $libpmem"
 
 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -6542,6 +6573,10 @@ if test "$vxhs" = "yes" ; then
   echo "VXHS_LIBS=$vxhs_libs" >> $config_host_mak
 fi
 
+if test "$libpmem" = "yes" ; then
+  echo "CONFIG_LIBPMEM=y" >> $config_host_mak
+fi
+
 if test "$tcg_interpreter" = "yes"; then
   QEMU_INCLUDES="-I\$(SRC_PATH)/tcg/tci $QEMU_INCLUDES"
 elif test "$ARCH" = "sparc64" ; then
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v4 4/8] mem/nvdimm: ensure write persistence to PMEM in label emulation
  2018-02-28  7:25 [Qemu-devel] [PATCH v4 0/8] nvdimm: guarantee persistence of QEMU writes to persistent memory Haozhong Zhang
                   ` (2 preceding siblings ...)
  2018-02-28  7:25 ` [Qemu-devel] [PATCH v4 3/8] configure: add libpmem support Haozhong Zhang
@ 2018-02-28  7:25 ` Haozhong Zhang
  2018-02-28  7:25 ` [Qemu-devel] [PATCH v4 5/8] migration/ram: ensure write persistence on loading zero pages to PMEM Haozhong Zhang
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Haozhong Zhang @ 2018-02-28  7:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Igor Mammedov, Paolo Bonzini, mst,
	Xiao Guangrong, Juan Quintela, dgilbert, Stefan Hajnoczi,
	Dan Williams, Haozhong Zhang

Guest writes to vNVDIMM labels are intercepted and performed on the
backend by QEMU. When the backend is a real persistent memort, QEMU
needs to take proper operations to ensure its write persistence on the
persistent memory. Otherwise, a host power failure may result in the
loss of guest label configurations.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 hw/mem/nvdimm.c     |  9 ++++++++-
 include/qemu/pmem.h | 23 +++++++++++++++++++++++
 stubs/Makefile.objs |  1 +
 stubs/pmem.c        | 19 +++++++++++++++++++
 4 files changed, 51 insertions(+), 1 deletion(-)
 create mode 100644 include/qemu/pmem.h
 create mode 100644 stubs/pmem.c

diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
index 61e677f92f..18861d1a7a 100644
--- a/hw/mem/nvdimm.c
+++ b/hw/mem/nvdimm.c
@@ -23,6 +23,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/pmem.h"
 #include "qapi/error.h"
 #include "qapi/visitor.h"
 #include "qapi-visit.h"
@@ -156,11 +157,17 @@ static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, const void *buf,
 {
     MemoryRegion *mr;
     PCDIMMDevice *dimm = PC_DIMM(nvdimm);
+    bool is_pmem = object_property_get_bool(OBJECT(dimm->hostmem),
+                                            "pmem", NULL);
     uint64_t backend_offset;
 
     nvdimm_validate_rw_label_data(nvdimm, size, offset);
 
-    memcpy(nvdimm->label_data + offset, buf, size);
+    if (!is_pmem) {
+        memcpy(nvdimm->label_data + offset, buf, size);
+    } else {
+        pmem_memcpy_persist(nvdimm->label_data + offset, buf, size);
+    }
 
     mr = host_memory_backend_get_memory(dimm->hostmem, &error_abort);
     backend_offset = memory_region_size(mr) - nvdimm->label_size + offset;
diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h
new file mode 100644
index 0000000000..16f5b2653a
--- /dev/null
+++ b/include/qemu/pmem.h
@@ -0,0 +1,23 @@
+/*
+ * QEMU header file for libpmem.
+ *
+ * Copyright (c) 2018 Intel Corporation.
+ *
+ * Author: Haozhong Zhang <haozhong.zhang@intel.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_PMEM_H
+#define QEMU_PMEM_H
+
+#ifdef CONFIG_LIBPMEM
+#include <libpmem.h>
+#else  /* !CONFIG_LIBPMEM */
+
+void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len);
+
+#endif /* CONFIG_LIBPMEM */
+
+#endif /* !QEMU_PMEM_H */
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index 2d59d84091..ba944b9739 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -43,3 +43,4 @@ stub-obj-y += xen-common.o
 stub-obj-y += xen-hvm.o
 stub-obj-y += pci-host-piix.o
 stub-obj-y += ram-block.o
+stub-obj-$(call lnot,$(CONFIG_LIBPMEM)) += pmem.o
\ No newline at end of file
diff --git a/stubs/pmem.c b/stubs/pmem.c
new file mode 100644
index 0000000000..03d990e571
--- /dev/null
+++ b/stubs/pmem.c
@@ -0,0 +1,19 @@
+/*
+ * Stubs for libpmem.
+ *
+ * Copyright (c) 2018 Intel Corporation.
+ *
+ * Author: Haozhong Zhang <haozhong.zhang@intel.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include <string.h>
+
+#include "qemu/pmem.h"
+
+void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len)
+{
+    return memcpy(pmemdest, src, len);
+}
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v4 5/8] migration/ram: ensure write persistence on loading zero pages to PMEM
  2018-02-28  7:25 [Qemu-devel] [PATCH v4 0/8] nvdimm: guarantee persistence of QEMU writes to persistent memory Haozhong Zhang
                   ` (3 preceding siblings ...)
  2018-02-28  7:25 ` [Qemu-devel] [PATCH v4 4/8] mem/nvdimm: ensure write persistence to PMEM in label emulation Haozhong Zhang
@ 2018-02-28  7:25 ` Haozhong Zhang
  2018-03-29 18:59   ` Dr. David Alan Gilbert
  2018-02-28  7:25 ` [Qemu-devel] [PATCH v4 6/8] migration/ram: ensure write persistence on loading normal " Haozhong Zhang
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 19+ messages in thread
From: Haozhong Zhang @ 2018-02-28  7:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Igor Mammedov, Paolo Bonzini, mst,
	Xiao Guangrong, Juan Quintela, dgilbert, Stefan Hajnoczi,
	Dan Williams, Haozhong Zhang

When loading a zero page, check whether it will be loaded to
persistent memory If yes, load it by libpmem function
pmem_memset_nodrain().  Combined with a call to pmem_drain() at the
end of RAM loading, we can guarantee all those zero pages are
persistently loaded.

Depending on the host HW/SW configurations, pmem_drain() can be
"sfence".  Therefore, we do not call pmem_drain() after each
pmem_memset_nodrain(), or use pmem_memset_persist() (equally
pmem_memset_nodrain() + pmem_drain()), in order to avoid unnecessary
overhead.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 include/qemu/pmem.h |  2 ++
 migration/ram.c     | 25 +++++++++++++++++++++----
 migration/ram.h     |  2 +-
 migration/rdma.c    |  2 +-
 stubs/pmem.c        |  9 +++++++++
 5 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h
index 16f5b2653a..ce96379f3c 100644
--- a/include/qemu/pmem.h
+++ b/include/qemu/pmem.h
@@ -17,6 +17,8 @@
 #else  /* !CONFIG_LIBPMEM */
 
 void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len);
+void *pmem_memset_nodrain(void *pmemdest, int c, size_t len);
+void pmem_drain(void);
 
 #endif /* CONFIG_LIBPMEM */
 
diff --git a/migration/ram.c b/migration/ram.c
index 5e33e5cc79..3904ceee79 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -51,6 +51,7 @@
 #include "qemu/rcu_queue.h"
 #include "migration/colo.h"
 #include "migration/block.h"
+#include "qemu/pmem.h"
 
 /***********************************************************/
 /* ram save/restore */
@@ -2479,11 +2480,16 @@ static inline void *host_from_ram_block_offset(RAMBlock *block,
  * @host: host address for the zero page
  * @ch: what the page is filled from.  We only support zero
  * @size: size of the zero page
+ * @is_pmem: whether @host is in the persistent memory
  */
-void ram_handle_compressed(void *host, uint8_t ch, uint64_t size)
+void ram_handle_compressed(void *host, uint8_t ch, uint64_t size, bool is_pmem)
 {
     if (ch != 0 || !is_zero_range(host, size)) {
-        memset(host, ch, size);
+        if (!is_pmem) {
+            memset(host, ch, size);
+        } else {
+            pmem_memset_nodrain(host, ch, size);
+        }
     }
 }
 
@@ -2839,6 +2845,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     bool postcopy_running = postcopy_is_running();
     /* ADVISE is earlier, it shows the source has the postcopy capability on */
     bool postcopy_advised = postcopy_is_advised();
+    bool need_pmem_drain = false;
 
     seq_iter++;
 
@@ -2864,6 +2871,8 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
         ram_addr_t addr, total_ram_bytes;
         void *host = NULL;
         uint8_t ch;
+        RAMBlock *block = NULL;
+        bool is_pmem = false;
 
         addr = qemu_get_be64(f);
         flags = addr & ~TARGET_PAGE_MASK;
@@ -2880,7 +2889,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 
         if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
                      RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
-            RAMBlock *block = ram_block_from_stream(f, flags);
+            block = ram_block_from_stream(f, flags);
 
             host = host_from_ram_block_offset(block, addr);
             if (!host) {
@@ -2890,6 +2899,9 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
             }
             ramblock_recv_bitmap_set(block, host);
             trace_ram_load_loop(block->idstr, (uint64_t)addr, flags, host);
+
+            is_pmem = ramblock_is_pmem(block);
+            need_pmem_drain = need_pmem_drain || is_pmem;
         }
 
         switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
@@ -2943,7 +2955,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 
         case RAM_SAVE_FLAG_ZERO:
             ch = qemu_get_byte(f);
-            ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
+            ram_handle_compressed(host, ch, TARGET_PAGE_SIZE, is_pmem);
             break;
 
         case RAM_SAVE_FLAG_PAGE:
@@ -2986,6 +2998,11 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     }
 
     wait_for_decompress_done();
+
+    if (need_pmem_drain) {
+        pmem_drain();
+    }
+
     rcu_read_unlock();
     trace_ram_load_complete(ret, seq_iter);
     return ret;
diff --git a/migration/ram.h b/migration/ram.h
index f3a227b4fc..18934ae9e4 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -57,7 +57,7 @@ int ram_postcopy_send_discard_bitmap(MigrationState *ms);
 int ram_discard_range(const char *block_name, uint64_t start, size_t length);
 int ram_postcopy_incoming_init(MigrationIncomingState *mis);
 
-void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
+void ram_handle_compressed(void *host, uint8_t ch, uint64_t size, bool is_pmem);
 
 int ramblock_recv_bitmap_test(RAMBlock *rb, void *host_addr);
 void ramblock_recv_bitmap_set(RAMBlock *rb, void *host_addr);
diff --git a/migration/rdma.c b/migration/rdma.c
index da474fc19f..573bcd2cb0 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3229,7 +3229,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
             host_addr = block->local_host_addr +
                             (comp->offset - block->offset);
 
-            ram_handle_compressed(host_addr, comp->value, comp->length);
+            ram_handle_compressed(host_addr, comp->value, comp->length, false);
             break;
 
         case RDMA_CONTROL_REGISTER_FINISHED:
diff --git a/stubs/pmem.c b/stubs/pmem.c
index 03d990e571..a65b3bfc6b 100644
--- a/stubs/pmem.c
+++ b/stubs/pmem.c
@@ -17,3 +17,12 @@ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len)
 {
     return memcpy(pmemdest, src, len);
 }
+
+void *pmem_memset_nodrain(void *pmemdest, int c, size_t len)
+{
+    return memset(pmemdest, c, len);
+}
+
+void pmem_drain(void)
+{
+}
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v4 6/8] migration/ram: ensure write persistence on loading normal pages to PMEM
  2018-02-28  7:25 [Qemu-devel] [PATCH v4 0/8] nvdimm: guarantee persistence of QEMU writes to persistent memory Haozhong Zhang
                   ` (4 preceding siblings ...)
  2018-02-28  7:25 ` [Qemu-devel] [PATCH v4 5/8] migration/ram: ensure write persistence on loading zero pages to PMEM Haozhong Zhang
@ 2018-02-28  7:25 ` Haozhong Zhang
  2018-02-28  7:25 ` [Qemu-devel] [PATCH v4 7/8] migration/ram: ensure write persistence on loading compressed " Haozhong Zhang
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Haozhong Zhang @ 2018-02-28  7:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Igor Mammedov, Paolo Bonzini, mst,
	Xiao Guangrong, Juan Quintela, dgilbert, Stefan Hajnoczi,
	Dan Williams, Haozhong Zhang

When loading a normal page to persistent memory, load its data by
libpmem function pmem_memcpy_nodrain() instead of memcpy(). Combined
with a call to pmem_drain() at the end of memory loading, we can
guarantee all those normal pages are persistenly loaded to PMEM.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 include/migration/qemu-file-types.h |  2 ++
 include/qemu/pmem.h                 |  1 +
 migration/qemu-file.c               | 29 +++++++++++++++++++----------
 migration/ram.c                     |  2 +-
 stubs/pmem.c                        |  5 +++++
 tests/Makefile.include              |  2 +-
 6 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/include/migration/qemu-file-types.h b/include/migration/qemu-file-types.h
index bd6d7dd7f9..c7c3f665f9 100644
--- a/include/migration/qemu-file-types.h
+++ b/include/migration/qemu-file-types.h
@@ -33,6 +33,8 @@ void qemu_put_byte(QEMUFile *f, int v);
 void qemu_put_be16(QEMUFile *f, unsigned int v);
 void qemu_put_be32(QEMUFile *f, unsigned int v);
 void qemu_put_be64(QEMUFile *f, uint64_t v);
+size_t qemu_get_buffer_common(QEMUFile *f, uint8_t *buf, size_t size,
+                              bool is_pmem);
 size_t qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t size);
 
 int qemu_get_byte(QEMUFile *f);
diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h
index ce96379f3c..127b87c326 100644
--- a/include/qemu/pmem.h
+++ b/include/qemu/pmem.h
@@ -16,6 +16,7 @@
 #include <libpmem.h>
 #else  /* !CONFIG_LIBPMEM */
 
+void *pmem_memcpy_nodrain(void *pmemdest, const void *src, size_t len);
 void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len);
 void *pmem_memset_nodrain(void *pmemdest, int c, size_t len);
 void pmem_drain(void);
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 2ab2bf362d..d19f677796 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -26,6 +26,7 @@
 #include "qemu-common.h"
 #include "qemu/error-report.h"
 #include "qemu/iov.h"
+#include "qemu/pmem.h"
 #include "migration.h"
 #include "qemu-file.h"
 #include "trace.h"
@@ -471,18 +472,13 @@ size_t qemu_peek_buffer(QEMUFile *f, uint8_t **buf, size_t size, size_t offset)
     return size;
 }
 
-/*
- * Read 'size' bytes of data from the file into buf.
- * 'size' can be larger than the internal buffer.
- *
- * It will return size bytes unless there was an error, in which case it will
- * return as many as it managed to read (assuming blocking fd's which
- * all current QEMUFile are)
- */
-size_t qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t size)
+size_t qemu_get_buffer_common(QEMUFile *f, uint8_t *buf, size_t size,
+                              bool is_pmem)
 {
     size_t pending = size;
     size_t done = 0;
+    void *(*memcpy_func)(void *d, const void *s, size_t n) =
+        is_pmem ? pmem_memcpy_nodrain : memcpy;
 
     while (pending > 0) {
         size_t res;
@@ -492,7 +488,7 @@ size_t qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t size)
         if (res == 0) {
             return done;
         }
-        memcpy(buf, src, res);
+        memcpy_func(buf, src, res);
         qemu_file_skip(f, res);
         buf += res;
         pending -= res;
@@ -501,6 +497,19 @@ size_t qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t size)
     return done;
 }
 
+/*
+ * Read 'size' bytes of data from the file into buf.
+ * 'size' can be larger than the internal buffer.
+ *
+ * It will return size bytes unless there was an error, in which case it will
+ * return as many as it managed to read (assuming blocking fd's which
+ * all current QEMUFile are)
+ */
+size_t qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t size)
+{
+    return qemu_get_buffer_common(f, buf, size, false);
+}
+
 /*
  * Read 'size' bytes of data from the file.
  * 'size' can be larger than the internal buffer.
diff --git a/migration/ram.c b/migration/ram.c
index 3904ceee79..ea2ad7dff0 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2959,7 +2959,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
             break;
 
         case RAM_SAVE_FLAG_PAGE:
-            qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
+            qemu_get_buffer_common(f, host, TARGET_PAGE_SIZE, is_pmem);
             break;
 
         case RAM_SAVE_FLAG_COMPRESS_PAGE:
diff --git a/stubs/pmem.c b/stubs/pmem.c
index a65b3bfc6b..e172f31174 100644
--- a/stubs/pmem.c
+++ b/stubs/pmem.c
@@ -26,3 +26,8 @@ void *pmem_memset_nodrain(void *pmemdest, int c, size_t len)
 void pmem_drain(void)
 {
 }
+
+void *pmem_memcpy_nodrain(void *pmemdest, const void *src, size_t len)
+{
+    return memcpy(pmemdest, src, len);
+}
diff --git a/tests/Makefile.include b/tests/Makefile.include
index 577eb573a2..37bb85f591 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -637,7 +637,7 @@ tests/test-qdev-global-props$(EXESUF): tests/test-qdev-global-props.o \
 	$(test-qapi-obj-y)
 tests/test-vmstate$(EXESUF): tests/test-vmstate.o \
 	migration/vmstate.o migration/vmstate-types.o migration/qemu-file.o \
-        migration/qemu-file-channel.o migration/qjson.o \
+        migration/qemu-file-channel.o migration/qjson.o stubs/pmem.o \
 	$(test-io-obj-y)
 tests/test-timed-average$(EXESUF): tests/test-timed-average.o $(test-util-obj-y)
 tests/test-base64$(EXESUF): tests/test-base64.o $(test-util-obj-y)
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v4 7/8] migration/ram: ensure write persistence on loading compressed pages to PMEM
  2018-02-28  7:25 [Qemu-devel] [PATCH v4 0/8] nvdimm: guarantee persistence of QEMU writes to persistent memory Haozhong Zhang
                   ` (5 preceding siblings ...)
  2018-02-28  7:25 ` [Qemu-devel] [PATCH v4 6/8] migration/ram: ensure write persistence on loading normal " Haozhong Zhang
@ 2018-02-28  7:25 ` Haozhong Zhang
  2018-02-28  7:25 ` [Qemu-devel] [PATCH v4 8/8] migration/ram: ensure write persistence on loading xbzrle " Haozhong Zhang
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Haozhong Zhang @ 2018-02-28  7:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Igor Mammedov, Paolo Bonzini, mst,
	Xiao Guangrong, Juan Quintela, dgilbert, Stefan Hajnoczi,
	Dan Williams, Haozhong Zhang

When loading a compressed page to persistent memory, flush CPU cache
after the data is decompressed. Combined with a call to pmem_drain()
at the end of memory loading, we can guarantee those compressed pages
are persistently loaded to PMEM.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 include/qemu/pmem.h |  1 +
 migration/ram.c     | 16 +++++++++++-----
 stubs/pmem.c        |  4 ++++
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h
index 127b87c326..120439ecb8 100644
--- a/include/qemu/pmem.h
+++ b/include/qemu/pmem.h
@@ -20,6 +20,7 @@ void *pmem_memcpy_nodrain(void *pmemdest, const void *src, size_t len);
 void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len);
 void *pmem_memset_nodrain(void *pmemdest, int c, size_t len);
 void pmem_drain(void);
+void pmem_flush(const void *addr, size_t len);
 
 #endif /* CONFIG_LIBPMEM */
 
diff --git a/migration/ram.c b/migration/ram.c
index ea2ad7dff0..37f3c39cee 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -276,6 +276,7 @@ struct DecompressParam {
     void *des;
     uint8_t *compbuf;
     int len;
+    bool is_pmem;
 };
 typedef struct DecompressParam DecompressParam;
 
@@ -2498,7 +2499,7 @@ static void *do_data_decompress(void *opaque)
     DecompressParam *param = opaque;
     unsigned long pagesize;
     uint8_t *des;
-    int len;
+    int len, rc;
 
     qemu_mutex_lock(&param->mutex);
     while (!param->quit) {
@@ -2514,8 +2515,11 @@ static void *do_data_decompress(void *opaque)
              * not a problem because the dirty page will be retransferred
              * and uncompress() won't break the data in other pages.
              */
-            uncompress((Bytef *)des, &pagesize,
-                       (const Bytef *)param->compbuf, len);
+            rc = uncompress((Bytef *)des, &pagesize,
+                            (const Bytef *)param->compbuf, len);
+            if (rc == Z_OK && param->is_pmem) {
+                pmem_flush(des, len);
+            }
 
             qemu_mutex_lock(&decomp_done_lock);
             param->done = true;
@@ -2601,7 +2605,8 @@ static void compress_threads_load_cleanup(void)
 }
 
 static void decompress_data_with_multi_threads(QEMUFile *f,
-                                               void *host, int len)
+                                               void *host, int len,
+                                               bool is_pmem)
 {
     int idx, thread_count;
 
@@ -2615,6 +2620,7 @@ static void decompress_data_with_multi_threads(QEMUFile *f,
                 qemu_get_buffer(f, decomp_param[idx].compbuf, len);
                 decomp_param[idx].des = host;
                 decomp_param[idx].len = len;
+                decomp_param[idx].is_pmem = is_pmem;
                 qemu_cond_signal(&decomp_param[idx].cond);
                 qemu_mutex_unlock(&decomp_param[idx].mutex);
                 break;
@@ -2969,7 +2975,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
                 ret = -EINVAL;
                 break;
             }
-            decompress_data_with_multi_threads(f, host, len);
+            decompress_data_with_multi_threads(f, host, len, is_pmem);
             break;
 
         case RAM_SAVE_FLAG_XBZRLE:
diff --git a/stubs/pmem.c b/stubs/pmem.c
index e172f31174..cfab830131 100644
--- a/stubs/pmem.c
+++ b/stubs/pmem.c
@@ -31,3 +31,7 @@ void *pmem_memcpy_nodrain(void *pmemdest, const void *src, size_t len)
 {
     return memcpy(pmemdest, src, len);
 }
+
+void pmem_flush(const void *addr, size_t len)
+{
+}
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v4 8/8] migration/ram: ensure write persistence on loading xbzrle pages to PMEM
  2018-02-28  7:25 [Qemu-devel] [PATCH v4 0/8] nvdimm: guarantee persistence of QEMU writes to persistent memory Haozhong Zhang
                   ` (6 preceding siblings ...)
  2018-02-28  7:25 ` [Qemu-devel] [PATCH v4 7/8] migration/ram: ensure write persistence on loading compressed " Haozhong Zhang
@ 2018-02-28  7:25 ` Haozhong Zhang
  2018-02-28  7:31 ` [Qemu-devel] [PATCH v4 0/8] nvdimm: guarantee persistence of QEMU writes to persistent memory Haozhong Zhang
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Haozhong Zhang @ 2018-02-28  7:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Igor Mammedov, Paolo Bonzini, mst,
	Xiao Guangrong, Juan Quintela, dgilbert, Stefan Hajnoczi,
	Dan Williams, Haozhong Zhang

When loading a xbzrle encoded page to persistent memory, load the data
via libpmem function pmem_memcpy_nodrain() instead of memcpy().
Combined with a call to pmem_drain() at the end of memory loading, we
can guarantee those xbzrle encoded pages are persistently loaded to PMEM.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 migration/ram.c        | 6 +++---
 migration/xbzrle.c     | 8 ++++++--
 migration/xbzrle.h     | 3 ++-
 tests/Makefile.include | 2 +-
 tests/test-xbzrle.c    | 4 ++--
 5 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 37f3c39cee..70b196c4f5 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2391,7 +2391,7 @@ static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
     }
 }
 
-static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
+static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host, bool is_pmem)
 {
     unsigned int xh_len;
     int xh_flags;
@@ -2417,7 +2417,7 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
 
     /* decode RLE */
     if (xbzrle_decode_buffer(loaded_data, xh_len, host,
-                             TARGET_PAGE_SIZE) == -1) {
+                             TARGET_PAGE_SIZE, is_pmem) == -1) {
         error_report("Failed to load XBZRLE page - decode error!");
         return -1;
     }
@@ -2979,7 +2979,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
             break;
 
         case RAM_SAVE_FLAG_XBZRLE:
-            if (load_xbzrle(f, addr, host) < 0) {
+            if (load_xbzrle(f, addr, host, is_pmem) < 0) {
                 error_report("Failed to decompress XBZRLE page at "
                              RAM_ADDR_FMT, addr);
                 ret = -EINVAL;
diff --git a/migration/xbzrle.c b/migration/xbzrle.c
index 1ba482ded9..ca713c3697 100644
--- a/migration/xbzrle.c
+++ b/migration/xbzrle.c
@@ -12,6 +12,7 @@
  */
 #include "qemu/osdep.h"
 #include "qemu/cutils.h"
+#include "qemu/pmem.h"
 #include "xbzrle.h"
 
 /*
@@ -126,11 +127,14 @@ int xbzrle_encode_buffer(uint8_t *old_buf, uint8_t *new_buf, int slen,
     return d;
 }
 
-int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen)
+int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen,
+                         bool is_pmem)
 {
     int i = 0, d = 0;
     int ret;
     uint32_t count = 0;
+    void *(*memcpy_func)(void *d, const void *s, size_t n) =
+        is_pmem ? pmem_memcpy_nodrain : memcpy;
 
     while (i < slen) {
 
@@ -167,7 +171,7 @@ int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen)
             return -1;
         }
 
-        memcpy(dst + d, src + i, count);
+        memcpy_func(dst + d, src + i, count);
         d += count;
         i += count;
     }
diff --git a/migration/xbzrle.h b/migration/xbzrle.h
index a0db507b9c..f18f679f47 100644
--- a/migration/xbzrle.h
+++ b/migration/xbzrle.h
@@ -17,5 +17,6 @@
 int xbzrle_encode_buffer(uint8_t *old_buf, uint8_t *new_buf, int slen,
                          uint8_t *dst, int dlen);
 
-int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen);
+int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen,
+                         bool is_pmem);
 #endif
diff --git a/tests/Makefile.include b/tests/Makefile.include
index 37bb85f591..be5b7e484b 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -616,7 +616,7 @@ tests/test-thread-pool$(EXESUF): tests/test-thread-pool.o $(test-block-obj-y)
 tests/test-iov$(EXESUF): tests/test-iov.o $(test-util-obj-y)
 tests/test-hbitmap$(EXESUF): tests/test-hbitmap.o $(test-util-obj-y) $(test-crypto-obj-y)
 tests/test-x86-cpuid$(EXESUF): tests/test-x86-cpuid.o
-tests/test-xbzrle$(EXESUF): tests/test-xbzrle.o migration/xbzrle.o migration/page_cache.o $(test-util-obj-y)
+tests/test-xbzrle$(EXESUF): tests/test-xbzrle.o migration/xbzrle.o migration/page_cache.o stubs/pmem.o $(test-util-obj-y)
 tests/test-cutils$(EXESUF): tests/test-cutils.o util/cutils.o $(test-util-obj-y)
 tests/test-int128$(EXESUF): tests/test-int128.o
 tests/rcutorture$(EXESUF): tests/rcutorture.o $(test-util-obj-y)
diff --git a/tests/test-xbzrle.c b/tests/test-xbzrle.c
index f5e08de91e..9afa0c4bcb 100644
--- a/tests/test-xbzrle.c
+++ b/tests/test-xbzrle.c
@@ -101,7 +101,7 @@ static void test_encode_decode_1_byte(void)
                        PAGE_SIZE);
     g_assert(dlen == (uleb128_encode_small(&buf[0], 4095) + 2));
 
-    rc = xbzrle_decode_buffer(compressed, dlen, buffer, PAGE_SIZE);
+    rc = xbzrle_decode_buffer(compressed, dlen, buffer, PAGE_SIZE, false);
     g_assert(rc == PAGE_SIZE);
     g_assert(memcmp(test, buffer, PAGE_SIZE) == 0);
 
@@ -156,7 +156,7 @@ static void encode_decode_range(void)
     dlen = xbzrle_encode_buffer(test, buffer, PAGE_SIZE, compressed,
                                 PAGE_SIZE);
 
-    rc = xbzrle_decode_buffer(compressed, dlen, test, PAGE_SIZE);
+    rc = xbzrle_decode_buffer(compressed, dlen, test, PAGE_SIZE, false);
     g_assert(rc < PAGE_SIZE);
     g_assert(memcmp(test, buffer, PAGE_SIZE) == 0);
 
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v4 0/8] nvdimm: guarantee persistence of QEMU writes to persistent memory
  2018-02-28  7:25 [Qemu-devel] [PATCH v4 0/8] nvdimm: guarantee persistence of QEMU writes to persistent memory Haozhong Zhang
                   ` (7 preceding siblings ...)
  2018-02-28  7:25 ` [Qemu-devel] [PATCH v4 8/8] migration/ram: ensure write persistence on loading xbzrle " Haozhong Zhang
@ 2018-02-28  7:31 ` Haozhong Zhang
  2018-03-08  6:49 ` Haozhong Zhang
  2018-03-12 15:39 ` Stefan Hajnoczi
  10 siblings, 0 replies; 19+ messages in thread
From: Haozhong Zhang @ 2018-02-28  7:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Igor Mammedov, Paolo Bonzini, mst,
	Xiao Guangrong, Juan Quintela, dgilbert, Stefan Hajnoczi,
	Dan Williams

On 02/28/18 15:25 +0800, Haozhong Zhang wrote:
> QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and
> live migration. If the backend is on the persistent memory, QEMU needs
> to take proper operations to ensure its writes persistent on the
> persistent memory. Otherwise, a host power failure may result in the
> loss the guest data on the persistent memory.
>


> This v3 patch series is based on Marcel's patch "mem: add share
> parameter to memory-backend-ram" [1] because of the changes in patch 1.
> 
> [1] https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html

I forgot to remove this part. v4 can be applied on the current master
branch now because above [1] has already been merged.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v4 0/8] nvdimm: guarantee persistence of QEMU writes to persistent memory
  2018-02-28  7:25 [Qemu-devel] [PATCH v4 0/8] nvdimm: guarantee persistence of QEMU writes to persistent memory Haozhong Zhang
                   ` (8 preceding siblings ...)
  2018-02-28  7:31 ` [Qemu-devel] [PATCH v4 0/8] nvdimm: guarantee persistence of QEMU writes to persistent memory Haozhong Zhang
@ 2018-03-08  6:49 ` Haozhong Zhang
  2018-03-12 15:39 ` Stefan Hajnoczi
  10 siblings, 0 replies; 19+ messages in thread
From: Haozhong Zhang @ 2018-03-08  6:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Igor Mammedov, Paolo Bonzini, mst,
	Xiao Guangrong, Juan Quintela, dgilbert, Stefan Hajnoczi,
	Dan Williams

Ping?

On 02/28/18 15:25 +0800, Haozhong Zhang wrote:
> QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and
> live migration. If the backend is on the persistent memory, QEMU needs
> to take proper operations to ensure its writes persistent on the
> persistent memory. Otherwise, a host power failure may result in the
> loss the guest data on the persistent memory.
> 
> This v3 patch series is based on Marcel's patch "mem: add share
> parameter to memory-backend-ram" [1] because of the changes in patch 1.
> 
> [1] https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html
> 
> Previous versions can be found at
> v3: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04365.html
> v2: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01579.html
> v1: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05040.html
> 
> Changes in v4:
>  * (Patch 2) Fix compilation errors found by patchew.
> 
> Changes in v3:
>  * (Patch 5) Add a is_pmem flag to ram_handle_compressed() and handle
>    PMEM writes in it, so we don't need the _common function.
>  * (Patch 6) Expose qemu_get_buffer_common so we can remove the
>    unnecessary qemu_get_buffer_to_pmem wrapper.
>  * (Patch 8) Add a is_pmem flag to xbzrle_decode_buffer() and handle
>    PMEM writes in it, so we can remove the unnecessary
>    xbzrle_decode_buffer_{common, to_pmem}.
>  * Move libpmem stubs to stubs/pmem.c and fix the compilation failures
>    of test-{xbzrle,vmstate}.c.
> 
> Changes in v2:
>  * (Patch 1) Use a flags parameter in file ram allocation functions.
>  * (Patch 2) Add a new option 'pmem' to hostmem-file.
>  * (Patch 3) Use libpmem to operate on the persistent memory, rather
>    than re-implementing those operations in QEMU.
>  * (Patch 5-8) Consider the write persistence in the migration path.
> 
> Haozhong Zhang (8):
>   [1/8] memory, exec: switch file ram allocation functions to 'flags' parameters
>   [2/8] hostmem-file: add the 'pmem' option
>   [3/8] configure: add libpmem support
>   [4/8] mem/nvdimm: ensure write persistence to PMEM in label emulation
>   [5/8] migration/ram: ensure write persistence on loading zero pages to PMEM
>   [6/8] migration/ram: ensure write persistence on loading normal pages to PMEM
>   [7/8] migration/ram: ensure write persistence on loading compressed pages to PMEM
>   [8/8] migration/ram: ensure write persistence on loading xbzrle pages to PMEM
> 
>  backends/hostmem-file.c             | 27 +++++++++++++++++++-
>  configure                           | 35 ++++++++++++++++++++++++++
>  docs/nvdimm.txt                     | 14 +++++++++++
>  exec.c                              | 20 ++++++++++++---
>  hw/mem/nvdimm.c                     |  9 ++++++-
>  include/exec/memory.h               | 12 +++++++--
>  include/exec/ram_addr.h             | 28 +++++++++++++++++++--
>  include/migration/qemu-file-types.h |  2 ++
>  include/qemu/pmem.h                 | 27 ++++++++++++++++++++
>  memory.c                            |  8 +++---
>  migration/qemu-file.c               | 29 ++++++++++++++--------
>  migration/ram.c                     | 49 +++++++++++++++++++++++++++----------
>  migration/ram.h                     |  2 +-
>  migration/rdma.c                    |  2 +-
>  migration/xbzrle.c                  |  8 ++++--
>  migration/xbzrle.h                  |  3 ++-
>  numa.c                              |  2 +-
>  qemu-options.hx                     |  9 ++++++-
>  stubs/Makefile.objs                 |  1 +
>  stubs/pmem.c                        | 37 ++++++++++++++++++++++++++++
>  tests/Makefile.include              |  4 +--
>  tests/test-xbzrle.c                 |  4 +--
>  22 files changed, 285 insertions(+), 47 deletions(-)
>  create mode 100644 include/qemu/pmem.h
>  create mode 100644 stubs/pmem.c
> 
> -- 
> 2.14.1
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v4 0/8] nvdimm: guarantee persistence of QEMU writes to persistent memory
  2018-02-28  7:25 [Qemu-devel] [PATCH v4 0/8] nvdimm: guarantee persistence of QEMU writes to persistent memory Haozhong Zhang
                   ` (9 preceding siblings ...)
  2018-03-08  6:49 ` Haozhong Zhang
@ 2018-03-12 15:39 ` Stefan Hajnoczi
  2018-03-13  0:11   ` Haozhong Zhang
  10 siblings, 1 reply; 19+ messages in thread
From: Stefan Hajnoczi @ 2018-03-12 15:39 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: qemu-devel, Xiao Guangrong, mst, Juan Quintela, dgilbert,
	Stefan Hajnoczi, Paolo Bonzini, Igor Mammedov, Dan Williams,
	Eduardo Habkost

[-- Attachment #1: Type: text/plain, Size: 4632 bytes --]

On Wed, Feb 28, 2018 at 03:25:50PM +0800, Haozhong Zhang wrote:
> QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and
> live migration. If the backend is on the persistent memory, QEMU needs
> to take proper operations to ensure its writes persistent on the
> persistent memory. Otherwise, a host power failure may result in the
> loss the guest data on the persistent memory.
> 
> This v3 patch series is based on Marcel's patch "mem: add share
> parameter to memory-backend-ram" [1] because of the changes in patch 1.
> 
> [1] https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html
> 
> Previous versions can be found at
> v3: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04365.html
> v2: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01579.html
> v1: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05040.html
> 
> Changes in v4:
>  * (Patch 2) Fix compilation errors found by patchew.
> 
> Changes in v3:
>  * (Patch 5) Add a is_pmem flag to ram_handle_compressed() and handle
>    PMEM writes in it, so we don't need the _common function.
>  * (Patch 6) Expose qemu_get_buffer_common so we can remove the
>    unnecessary qemu_get_buffer_to_pmem wrapper.
>  * (Patch 8) Add a is_pmem flag to xbzrle_decode_buffer() and handle
>    PMEM writes in it, so we can remove the unnecessary
>    xbzrle_decode_buffer_{common, to_pmem}.
>  * Move libpmem stubs to stubs/pmem.c and fix the compilation failures
>    of test-{xbzrle,vmstate}.c.
> 
> Changes in v2:
>  * (Patch 1) Use a flags parameter in file ram allocation functions.
>  * (Patch 2) Add a new option 'pmem' to hostmem-file.
>  * (Patch 3) Use libpmem to operate on the persistent memory, rather
>    than re-implementing those operations in QEMU.
>  * (Patch 5-8) Consider the write persistence in the migration path.
> 
> Haozhong Zhang (8):
>   [1/8] memory, exec: switch file ram allocation functions to 'flags' parameters
>   [2/8] hostmem-file: add the 'pmem' option
>   [3/8] configure: add libpmem support
>   [4/8] mem/nvdimm: ensure write persistence to PMEM in label emulation
>   [5/8] migration/ram: ensure write persistence on loading zero pages to PMEM
>   [6/8] migration/ram: ensure write persistence on loading normal pages to PMEM
>   [7/8] migration/ram: ensure write persistence on loading compressed pages to PMEM
>   [8/8] migration/ram: ensure write persistence on loading xbzrle pages to PMEM
> 
>  backends/hostmem-file.c             | 27 +++++++++++++++++++-
>  configure                           | 35 ++++++++++++++++++++++++++
>  docs/nvdimm.txt                     | 14 +++++++++++
>  exec.c                              | 20 ++++++++++++---
>  hw/mem/nvdimm.c                     |  9 ++++++-
>  include/exec/memory.h               | 12 +++++++--
>  include/exec/ram_addr.h             | 28 +++++++++++++++++++--
>  include/migration/qemu-file-types.h |  2 ++
>  include/qemu/pmem.h                 | 27 ++++++++++++++++++++
>  memory.c                            |  8 +++---
>  migration/qemu-file.c               | 29 ++++++++++++++--------
>  migration/ram.c                     | 49 +++++++++++++++++++++++++++----------
>  migration/ram.h                     |  2 +-
>  migration/rdma.c                    |  2 +-
>  migration/xbzrle.c                  |  8 ++++--
>  migration/xbzrle.h                  |  3 ++-
>  numa.c                              |  2 +-
>  qemu-options.hx                     |  9 ++++++-
>  stubs/Makefile.objs                 |  1 +
>  stubs/pmem.c                        | 37 ++++++++++++++++++++++++++++
>  tests/Makefile.include              |  4 +--
>  tests/test-xbzrle.c                 |  4 +--
>  22 files changed, 285 insertions(+), 47 deletions(-)
>  create mode 100644 include/qemu/pmem.h
>  create mode 100644 stubs/pmem.c

A few thoughts:

1. Can you use pmem_is_pmem() to auto-detect the pmem=on|off value?

2. The migration/ram code is invasive.  Is it really necessary to
   persist data each time pages are loaded from a migration stream?  It
   seems simpler to migrate as normal and call pmem_persist() just once
   after RAM has been migrated but before the migration completes.

3. This is independent of this patch series and can be done later.
   NVDIMM seems incompatible with post-copy live migration.  It would be
   good to have a postcopy_add_blocker() API so that a nice error
   message is printed if post-copy live migration is attempted.

The code itself seems fine though:

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v4 0/8] nvdimm: guarantee persistence of QEMU writes to persistent memory
  2018-03-12 15:39 ` Stefan Hajnoczi
@ 2018-03-13  0:11   ` Haozhong Zhang
  2018-03-13  9:36     ` Dr. David Alan Gilbert
                       ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Haozhong Zhang @ 2018-03-13  0:11 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Xiao Guangrong, mst, Juan Quintela, dgilbert,
	Stefan Hajnoczi, Paolo Bonzini, Igor Mammedov, Dan Williams,
	Eduardo Habkost

On 03/12/18 15:39 +0000, Stefan Hajnoczi wrote:
> On Wed, Feb 28, 2018 at 03:25:50PM +0800, Haozhong Zhang wrote:
> > QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and
> > live migration. If the backend is on the persistent memory, QEMU needs
> > to take proper operations to ensure its writes persistent on the
> > persistent memory. Otherwise, a host power failure may result in the
> > loss the guest data on the persistent memory.
> > 
> > This v3 patch series is based on Marcel's patch "mem: add share
> > parameter to memory-backend-ram" [1] because of the changes in patch 1.
> > 
> > [1] https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html
> > 
> > Previous versions can be found at
> > v3: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04365.html
> > v2: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01579.html
> > v1: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05040.html
> > 
> > Changes in v4:
> >  * (Patch 2) Fix compilation errors found by patchew.
> > 
> > Changes in v3:
> >  * (Patch 5) Add a is_pmem flag to ram_handle_compressed() and handle
> >    PMEM writes in it, so we don't need the _common function.
> >  * (Patch 6) Expose qemu_get_buffer_common so we can remove the
> >    unnecessary qemu_get_buffer_to_pmem wrapper.
> >  * (Patch 8) Add a is_pmem flag to xbzrle_decode_buffer() and handle
> >    PMEM writes in it, so we can remove the unnecessary
> >    xbzrle_decode_buffer_{common, to_pmem}.
> >  * Move libpmem stubs to stubs/pmem.c and fix the compilation failures
> >    of test-{xbzrle,vmstate}.c.
> > 
> > Changes in v2:
> >  * (Patch 1) Use a flags parameter in file ram allocation functions.
> >  * (Patch 2) Add a new option 'pmem' to hostmem-file.
> >  * (Patch 3) Use libpmem to operate on the persistent memory, rather
> >    than re-implementing those operations in QEMU.
> >  * (Patch 5-8) Consider the write persistence in the migration path.
> > 
> > Haozhong Zhang (8):
> >   [1/8] memory, exec: switch file ram allocation functions to 'flags' parameters
> >   [2/8] hostmem-file: add the 'pmem' option
> >   [3/8] configure: add libpmem support
> >   [4/8] mem/nvdimm: ensure write persistence to PMEM in label emulation
> >   [5/8] migration/ram: ensure write persistence on loading zero pages to PMEM
> >   [6/8] migration/ram: ensure write persistence on loading normal pages to PMEM
> >   [7/8] migration/ram: ensure write persistence on loading compressed pages to PMEM
> >   [8/8] migration/ram: ensure write persistence on loading xbzrle pages to PMEM
> > 
> >  backends/hostmem-file.c             | 27 +++++++++++++++++++-
> >  configure                           | 35 ++++++++++++++++++++++++++
> >  docs/nvdimm.txt                     | 14 +++++++++++
> >  exec.c                              | 20 ++++++++++++---
> >  hw/mem/nvdimm.c                     |  9 ++++++-
> >  include/exec/memory.h               | 12 +++++++--
> >  include/exec/ram_addr.h             | 28 +++++++++++++++++++--
> >  include/migration/qemu-file-types.h |  2 ++
> >  include/qemu/pmem.h                 | 27 ++++++++++++++++++++
> >  memory.c                            |  8 +++---
> >  migration/qemu-file.c               | 29 ++++++++++++++--------
> >  migration/ram.c                     | 49 +++++++++++++++++++++++++++----------
> >  migration/ram.h                     |  2 +-
> >  migration/rdma.c                    |  2 +-
> >  migration/xbzrle.c                  |  8 ++++--
> >  migration/xbzrle.h                  |  3 ++-
> >  numa.c                              |  2 +-
> >  qemu-options.hx                     |  9 ++++++-
> >  stubs/Makefile.objs                 |  1 +
> >  stubs/pmem.c                        | 37 ++++++++++++++++++++++++++++
> >  tests/Makefile.include              |  4 +--
> >  tests/test-xbzrle.c                 |  4 +--
> >  22 files changed, 285 insertions(+), 47 deletions(-)
> >  create mode 100644 include/qemu/pmem.h
> >  create mode 100644 stubs/pmem.c
> 
> A few thoughts:
> 
> 1. Can you use pmem_is_pmem() to auto-detect the pmem=on|off value?

The manpage [1] of pmem_is_pmem says:

 "The result of pmem_is_pmem() query is only valid for the mappings
  created using pmem_map_file().  For other memory regions, in
  particular those created by a direct call to mmap(2), pmem_is_pmem()
  always returns false, even if the queried range is entirely
  persistent memory."

QEMU is using mmap for NVDIMM mapping, so pmem_is_pmem does not work.

[1] http://pmem.io/pmdk/manpages/linux/master/libpmem/pmem_is_pmem.3#caveats

> 
> 2. The migration/ram code is invasive.  Is it really necessary to
>    persist data each time pages are loaded from a migration stream?  It
>    seems simpler to migrate as normal and call pmem_persist() just once
>    after RAM has been migrated but before the migration completes.

The concern is about the overhead of cache flush.

In this patch series, if possible, QEMU will use pmem_mem{set,cpy}_nodrain
APIs to copy NVDIMM blocks. Those APIs use movnt (if it's available) and
can avoid the subsequent cache flush.

Anyway, I'll make some microbenchmark to check which one will be better.


> 
> 3. This is independent of this patch series and can be done later.
>    NVDIMM seems incompatible with post-copy live migration.  It would be
>    good to have a postcopy_add_blocker() API so that a nice error
>    message is printed if post-copy live migration is attempted.

Post-copy with NVDIMM currently fails with message "Postcopy on shared
RAM (...) is not yet supported". Is it enough?

> 
> The code itself seems fine though:
> 
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

Thanks,
Haozhong

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v4 0/8] nvdimm: guarantee persistence of QEMU writes to persistent memory
  2018-03-13  0:11   ` Haozhong Zhang
@ 2018-03-13  9:36     ` Dr. David Alan Gilbert
  2018-03-13 10:44     ` Stefan Hajnoczi
  2018-03-29 19:12     ` Dr. David Alan Gilbert
  2 siblings, 0 replies; 19+ messages in thread
From: Dr. David Alan Gilbert @ 2018-03-13  9:36 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel, Xiao Guangrong, mst, Juan Quintela,
	Stefan Hajnoczi, Paolo Bonzini, Igor Mammedov, Dan Williams,
	Eduardo Habkost

* Haozhong Zhang (haozhong.zhang@intel.com) wrote:
> On 03/12/18 15:39 +0000, Stefan Hajnoczi wrote:
> > On Wed, Feb 28, 2018 at 03:25:50PM +0800, Haozhong Zhang wrote:
> > > QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and
> > > live migration. If the backend is on the persistent memory, QEMU needs
> > > to take proper operations to ensure its writes persistent on the
> > > persistent memory. Otherwise, a host power failure may result in the
> > > loss the guest data on the persistent memory.
> > > 
> > > This v3 patch series is based on Marcel's patch "mem: add share
> > > parameter to memory-backend-ram" [1] because of the changes in patch 1.
> > > 
> > > [1] https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html
> > > 
> > > Previous versions can be found at
> > > v3: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04365.html
> > > v2: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01579.html
> > > v1: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05040.html
> > > 
> > > Changes in v4:
> > >  * (Patch 2) Fix compilation errors found by patchew.
> > > 
> > > Changes in v3:
> > >  * (Patch 5) Add a is_pmem flag to ram_handle_compressed() and handle
> > >    PMEM writes in it, so we don't need the _common function.
> > >  * (Patch 6) Expose qemu_get_buffer_common so we can remove the
> > >    unnecessary qemu_get_buffer_to_pmem wrapper.
> > >  * (Patch 8) Add a is_pmem flag to xbzrle_decode_buffer() and handle
> > >    PMEM writes in it, so we can remove the unnecessary
> > >    xbzrle_decode_buffer_{common, to_pmem}.
> > >  * Move libpmem stubs to stubs/pmem.c and fix the compilation failures
> > >    of test-{xbzrle,vmstate}.c.
> > > 
> > > Changes in v2:
> > >  * (Patch 1) Use a flags parameter in file ram allocation functions.
> > >  * (Patch 2) Add a new option 'pmem' to hostmem-file.
> > >  * (Patch 3) Use libpmem to operate on the persistent memory, rather
> > >    than re-implementing those operations in QEMU.
> > >  * (Patch 5-8) Consider the write persistence in the migration path.
> > > 
> > > Haozhong Zhang (8):
> > >   [1/8] memory, exec: switch file ram allocation functions to 'flags' parameters
> > >   [2/8] hostmem-file: add the 'pmem' option
> > >   [3/8] configure: add libpmem support
> > >   [4/8] mem/nvdimm: ensure write persistence to PMEM in label emulation
> > >   [5/8] migration/ram: ensure write persistence on loading zero pages to PMEM
> > >   [6/8] migration/ram: ensure write persistence on loading normal pages to PMEM
> > >   [7/8] migration/ram: ensure write persistence on loading compressed pages to PMEM
> > >   [8/8] migration/ram: ensure write persistence on loading xbzrle pages to PMEM
> > > 
> > >  backends/hostmem-file.c             | 27 +++++++++++++++++++-
> > >  configure                           | 35 ++++++++++++++++++++++++++
> > >  docs/nvdimm.txt                     | 14 +++++++++++
> > >  exec.c                              | 20 ++++++++++++---
> > >  hw/mem/nvdimm.c                     |  9 ++++++-
> > >  include/exec/memory.h               | 12 +++++++--
> > >  include/exec/ram_addr.h             | 28 +++++++++++++++++++--
> > >  include/migration/qemu-file-types.h |  2 ++
> > >  include/qemu/pmem.h                 | 27 ++++++++++++++++++++
> > >  memory.c                            |  8 +++---
> > >  migration/qemu-file.c               | 29 ++++++++++++++--------
> > >  migration/ram.c                     | 49 +++++++++++++++++++++++++++----------
> > >  migration/ram.h                     |  2 +-
> > >  migration/rdma.c                    |  2 +-
> > >  migration/xbzrle.c                  |  8 ++++--
> > >  migration/xbzrle.h                  |  3 ++-
> > >  numa.c                              |  2 +-
> > >  qemu-options.hx                     |  9 ++++++-
> > >  stubs/Makefile.objs                 |  1 +
> > >  stubs/pmem.c                        | 37 ++++++++++++++++++++++++++++
> > >  tests/Makefile.include              |  4 +--
> > >  tests/test-xbzrle.c                 |  4 +--
> > >  22 files changed, 285 insertions(+), 47 deletions(-)
> > >  create mode 100644 include/qemu/pmem.h
> > >  create mode 100644 stubs/pmem.c
> > 
> > A few thoughts:
> > 
> > 1. Can you use pmem_is_pmem() to auto-detect the pmem=on|off value?
> 
> The manpage [1] of pmem_is_pmem says:
> 
>  "The result of pmem_is_pmem() query is only valid for the mappings
>   created using pmem_map_file().  For other memory regions, in
>   particular those created by a direct call to mmap(2), pmem_is_pmem()
>   always returns false, even if the queried range is entirely
>   persistent memory."
> 
> QEMU is using mmap for NVDIMM mapping, so pmem_is_pmem does not work.
> 
> [1] http://pmem.io/pmdk/manpages/linux/master/libpmem/pmem_is_pmem.3#caveats
> 
> > 
> > 2. The migration/ram code is invasive.  Is it really necessary to
> >    persist data each time pages are loaded from a migration stream?  It
> >    seems simpler to migrate as normal and call pmem_persist() just once
> >    after RAM has been migrated but before the migration completes.
> 
> The concern is about the overhead of cache flush.
> 
> In this patch series, if possible, QEMU will use pmem_mem{set,cpy}_nodrain
> APIs to copy NVDIMM blocks. Those APIs use movnt (if it's available) and
> can avoid the subsequent cache flush.
> 
> Anyway, I'll make some microbenchmark to check which one will be better.

The problem is not just the overhead; the problem is the code
complexity; this series makes all the paths through the migration code
more complex in places we wouldn't expect to change.

> 
> > 
> > 3. This is independent of this patch series and can be done later.
> >    NVDIMM seems incompatible with post-copy live migration.  It would be
> >    good to have a postcopy_add_blocker() API so that a nice error
> >    message is printed if post-copy live migration is attempted.
> 
> Post-copy with NVDIMM currently fails with message "Postcopy on shared
> RAM (...) is not yet supported". Is it enough?

Once shared support arrives (see my patch series) that check goes
though; it might get trapped by one of the other checks though as well;
I'll need to try simulated pmem to find out.

Dave

> > 
> > The code itself seems fine though:
> > 
> > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Thanks,
> Haozhong
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v4 0/8] nvdimm: guarantee persistence of QEMU writes to persistent memory
  2018-03-13  0:11   ` Haozhong Zhang
  2018-03-13  9:36     ` Dr. David Alan Gilbert
@ 2018-03-13 10:44     ` Stefan Hajnoczi
  2018-03-29 19:12     ` Dr. David Alan Gilbert
  2 siblings, 0 replies; 19+ messages in thread
From: Stefan Hajnoczi @ 2018-03-13 10:44 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel, Xiao Guangrong, mst, Juan Quintela,
	dgilbert, Paolo Bonzini, Igor Mammedov, Dan Williams,
	Eduardo Habkost

[-- Attachment #1: Type: text/plain, Size: 5485 bytes --]

On Tue, Mar 13, 2018 at 08:11:50AM +0800, Haozhong Zhang wrote:
> On 03/12/18 15:39 +0000, Stefan Hajnoczi wrote:
> > On Wed, Feb 28, 2018 at 03:25:50PM +0800, Haozhong Zhang wrote:
> > > QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and
> > > live migration. If the backend is on the persistent memory, QEMU needs
> > > to take proper operations to ensure its writes persistent on the
> > > persistent memory. Otherwise, a host power failure may result in the
> > > loss the guest data on the persistent memory.
> > > 
> > > This v3 patch series is based on Marcel's patch "mem: add share
> > > parameter to memory-backend-ram" [1] because of the changes in patch 1.
> > > 
> > > [1] https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html
> > > 
> > > Previous versions can be found at
> > > v3: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04365.html
> > > v2: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01579.html
> > > v1: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05040.html
> > > 
> > > Changes in v4:
> > >  * (Patch 2) Fix compilation errors found by patchew.
> > > 
> > > Changes in v3:
> > >  * (Patch 5) Add a is_pmem flag to ram_handle_compressed() and handle
> > >    PMEM writes in it, so we don't need the _common function.
> > >  * (Patch 6) Expose qemu_get_buffer_common so we can remove the
> > >    unnecessary qemu_get_buffer_to_pmem wrapper.
> > >  * (Patch 8) Add a is_pmem flag to xbzrle_decode_buffer() and handle
> > >    PMEM writes in it, so we can remove the unnecessary
> > >    xbzrle_decode_buffer_{common, to_pmem}.
> > >  * Move libpmem stubs to stubs/pmem.c and fix the compilation failures
> > >    of test-{xbzrle,vmstate}.c.
> > > 
> > > Changes in v2:
> > >  * (Patch 1) Use a flags parameter in file ram allocation functions.
> > >  * (Patch 2) Add a new option 'pmem' to hostmem-file.
> > >  * (Patch 3) Use libpmem to operate on the persistent memory, rather
> > >    than re-implementing those operations in QEMU.
> > >  * (Patch 5-8) Consider the write persistence in the migration path.
> > > 
> > > Haozhong Zhang (8):
> > >   [1/8] memory, exec: switch file ram allocation functions to 'flags' parameters
> > >   [2/8] hostmem-file: add the 'pmem' option
> > >   [3/8] configure: add libpmem support
> > >   [4/8] mem/nvdimm: ensure write persistence to PMEM in label emulation
> > >   [5/8] migration/ram: ensure write persistence on loading zero pages to PMEM
> > >   [6/8] migration/ram: ensure write persistence on loading normal pages to PMEM
> > >   [7/8] migration/ram: ensure write persistence on loading compressed pages to PMEM
> > >   [8/8] migration/ram: ensure write persistence on loading xbzrle pages to PMEM
> > > 
> > >  backends/hostmem-file.c             | 27 +++++++++++++++++++-
> > >  configure                           | 35 ++++++++++++++++++++++++++
> > >  docs/nvdimm.txt                     | 14 +++++++++++
> > >  exec.c                              | 20 ++++++++++++---
> > >  hw/mem/nvdimm.c                     |  9 ++++++-
> > >  include/exec/memory.h               | 12 +++++++--
> > >  include/exec/ram_addr.h             | 28 +++++++++++++++++++--
> > >  include/migration/qemu-file-types.h |  2 ++
> > >  include/qemu/pmem.h                 | 27 ++++++++++++++++++++
> > >  memory.c                            |  8 +++---
> > >  migration/qemu-file.c               | 29 ++++++++++++++--------
> > >  migration/ram.c                     | 49 +++++++++++++++++++++++++++----------
> > >  migration/ram.h                     |  2 +-
> > >  migration/rdma.c                    |  2 +-
> > >  migration/xbzrle.c                  |  8 ++++--
> > >  migration/xbzrle.h                  |  3 ++-
> > >  numa.c                              |  2 +-
> > >  qemu-options.hx                     |  9 ++++++-
> > >  stubs/Makefile.objs                 |  1 +
> > >  stubs/pmem.c                        | 37 ++++++++++++++++++++++++++++
> > >  tests/Makefile.include              |  4 +--
> > >  tests/test-xbzrle.c                 |  4 +--
> > >  22 files changed, 285 insertions(+), 47 deletions(-)
> > >  create mode 100644 include/qemu/pmem.h
> > >  create mode 100644 stubs/pmem.c
> > 
> > A few thoughts:
> > 
> > 1. Can you use pmem_is_pmem() to auto-detect the pmem=on|off value?
> 
> The manpage [1] of pmem_is_pmem says:
> 
>  "The result of pmem_is_pmem() query is only valid for the mappings
>   created using pmem_map_file().  For other memory regions, in
>   particular those created by a direct call to mmap(2), pmem_is_pmem()
>   always returns false, even if the queried range is entirely
>   persistent memory."
> 
> QEMU is using mmap for NVDIMM mapping, so pmem_is_pmem does not work.
> 
> [1] http://pmem.io/pmdk/manpages/linux/master/libpmem/pmem_is_pmem.3#caveats

Now that the PMDK library dependency is being added to QEMU,
pmem_map_file() could be used too.

> > 3. This is independent of this patch series and can be done later.
> >    NVDIMM seems incompatible with post-copy live migration.  It would be
> >    good to have a postcopy_add_blocker() API so that a nice error
> >    message is printed if post-copy live migration is attempted.
> 
> Post-copy with NVDIMM currently fails with message "Postcopy on shared
> RAM (...) is not yet supported". Is it enough?

Yes, that's better than I expected!

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v4 5/8] migration/ram: ensure write persistence on loading zero pages to PMEM
  2018-02-28  7:25 ` [Qemu-devel] [PATCH v4 5/8] migration/ram: ensure write persistence on loading zero pages to PMEM Haozhong Zhang
@ 2018-03-29 18:59   ` Dr. David Alan Gilbert
  2018-04-02  1:42     ` Haozhong Zhang
  0 siblings, 1 reply; 19+ messages in thread
From: Dr. David Alan Gilbert @ 2018-03-29 18:59 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: qemu-devel, Eduardo Habkost, Igor Mammedov, Paolo Bonzini, mst,
	Xiao Guangrong, Juan Quintela, Stefan Hajnoczi, Dan Williams

* Haozhong Zhang (haozhong.zhang@intel.com) wrote:
> When loading a zero page, check whether it will be loaded to
> persistent memory If yes, load it by libpmem function
> pmem_memset_nodrain().  Combined with a call to pmem_drain() at the
> end of RAM loading, we can guarantee all those zero pages are
> persistently loaded.
> 
> Depending on the host HW/SW configurations, pmem_drain() can be
> "sfence".  Therefore, we do not call pmem_drain() after each
> pmem_memset_nodrain(), or use pmem_memset_persist() (equally
> pmem_memset_nodrain() + pmem_drain()), in order to avoid unnecessary
> overhead.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>

I'm still thinking this is way too invasive;  especially the next patch
that touches qemu_file.

One thing that would help a little, but not really enough, would be
to define a :

struct MemOps {
  void (*copy)(like a memcpy);
  void (*set)(like a memset);
}

then you could have:

struct MemOps normalops = {memcpy, memset};
struct MemOps pmem_nodrain_ops = { pmem_memcopy_nodrain, pmem_memset_nodrain };

then things like ram_handle_compressed would be:

void ram_handle_compressed(void *host, uint8_t ch, uint64_t size, const struct MemOps *mem)
{
    if (ch != 0 || !is_zero_range(host, size)) {
        mem->set(host, ch,size);
    }
}

which means the change is pretty tiny to each function.

> diff --git a/migration/rdma.c b/migration/rdma.c
> index da474fc19f..573bcd2cb0 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -3229,7 +3229,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
>              host_addr = block->local_host_addr +
>                              (comp->offset - block->offset);
>  
> -            ram_handle_compressed(host_addr, comp->value, comp->length);
> +            ram_handle_compressed(host_addr, comp->value, comp->length, false);

Is that right? Is RDMA not allowed to work on PMEM?
(and anyway this call is a normal clear rather than an actual RDMA op).

Dave

>              break;
>  
>          case RDMA_CONTROL_REGISTER_FINISHED:
> diff --git a/stubs/pmem.c b/stubs/pmem.c
> index 03d990e571..a65b3bfc6b 100644
> --- a/stubs/pmem.c
> +++ b/stubs/pmem.c
> @@ -17,3 +17,12 @@ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len)
>  {
>      return memcpy(pmemdest, src, len);
>  }
> +
> +void *pmem_memset_nodrain(void *pmemdest, int c, size_t len)
> +{
> +    return memset(pmemdest, c, len);
> +}
> +
> +void pmem_drain(void)
> +{
> +}
> -- 
> 2.14.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v4 0/8] nvdimm: guarantee persistence of QEMU writes to persistent memory
  2018-03-13  0:11   ` Haozhong Zhang
  2018-03-13  9:36     ` Dr. David Alan Gilbert
  2018-03-13 10:44     ` Stefan Hajnoczi
@ 2018-03-29 19:12     ` Dr. David Alan Gilbert
  2018-04-02  1:29       ` Haozhong Zhang
  2 siblings, 1 reply; 19+ messages in thread
From: Dr. David Alan Gilbert @ 2018-03-29 19:12 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel, Xiao Guangrong, mst, Juan Quintela,
	Stefan Hajnoczi, Paolo Bonzini, Igor Mammedov, Dan Williams,
	Eduardo Habkost

* Haozhong Zhang (haozhong.zhang@intel.com) wrote:

<snip>

> Post-copy with NVDIMM currently fails with message "Postcopy on shared
> RAM (...) is not yet supported". Is it enough?

What does it say now that postcopy-shared support is in?

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v4 0/8] nvdimm: guarantee persistence of QEMU writes to persistent memory
  2018-03-29 19:12     ` Dr. David Alan Gilbert
@ 2018-04-02  1:29       ` Haozhong Zhang
  0 siblings, 0 replies; 19+ messages in thread
From: Haozhong Zhang @ 2018-04-02  1:29 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Stefan Hajnoczi, qemu-devel, Xiao Guangrong, mst, Juan Quintela,
	Stefan Hajnoczi, Paolo Bonzini, Igor Mammedov, Dan Williams,
	Eduardo Habkost

On 03/29/18 20:12 +0100, Dr. David Alan Gilbert wrote:
> * Haozhong Zhang (haozhong.zhang@intel.com) wrote:
> 
> <snip>
> 
> > Post-copy with NVDIMM currently fails with message "Postcopy on shared
> > RAM (...) is not yet supported". Is it enough?
> 
> What does it say now that postcopy-shared support is in?
> 

I'll check it later.

Haozhong

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v4 5/8] migration/ram: ensure write persistence on loading zero pages to PMEM
  2018-03-29 18:59   ` Dr. David Alan Gilbert
@ 2018-04-02  1:42     ` Haozhong Zhang
  0 siblings, 0 replies; 19+ messages in thread
From: Haozhong Zhang @ 2018-04-02  1:42 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Eduardo Habkost, Igor Mammedov, Paolo Bonzini, mst,
	Xiao Guangrong, Juan Quintela, Stefan Hajnoczi, Dan Williams

On 03/29/18 19:59 +0100, Dr. David Alan Gilbert wrote:
> * Haozhong Zhang (haozhong.zhang@intel.com) wrote:
> > When loading a zero page, check whether it will be loaded to
> > persistent memory If yes, load it by libpmem function
> > pmem_memset_nodrain().  Combined with a call to pmem_drain() at the
> > end of RAM loading, we can guarantee all those zero pages are
> > persistently loaded.
> > 
> > Depending on the host HW/SW configurations, pmem_drain() can be
> > "sfence".  Therefore, we do not call pmem_drain() after each
> > pmem_memset_nodrain(), or use pmem_memset_persist() (equally
> > pmem_memset_nodrain() + pmem_drain()), in order to avoid unnecessary
> > overhead.
> > 
> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> 
> I'm still thinking this is way too invasive;  especially the next patch
> that touches qemu_file.
> 
> One thing that would help a little, but not really enough, would be
> to define a :
> 
> struct MemOps {
>   void (*copy)(like a memcpy);
>   void (*set)(like a memset);
> }
> 
> then you could have:
> 
> struct MemOps normalops = {memcpy, memset};
> struct MemOps pmem_nodrain_ops = { pmem_memcopy_nodrain, pmem_memset_nodrain };
> 
> then things like ram_handle_compressed would be:
> 
> void ram_handle_compressed(void *host, uint8_t ch, uint64_t size, const struct MemOps *mem)
> {
>     if (ch != 0 || !is_zero_range(host, size)) {
>         mem->set(host, ch,size);
>     }
> }
> 
> which means the change is pretty tiny to each function.

This looks much better than mine.

I'm also considering Stefan's comment that flushing at the end of all
memory migration rather than invasively change every types of copy in
migration stream. We are going to perform some microbenchmarks on the
real hardware, and then decide which way to take.

> 
> > diff --git a/migration/rdma.c b/migration/rdma.c
> > index da474fc19f..573bcd2cb0 100644
> > --- a/migration/rdma.c
> > +++ b/migration/rdma.c
> > @@ -3229,7 +3229,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
> >              host_addr = block->local_host_addr +
> >                              (comp->offset - block->offset);
> >  
> > -            ram_handle_compressed(host_addr, comp->value, comp->length);
> > +            ram_handle_compressed(host_addr, comp->value, comp->length, false);
> 
> Is that right? Is RDMA not allowed to work on PMEM?
> (and anyway this call is a normal clear rather than an actual RDMA op).
>

Well, this patch exclude RMDA case intentionally. Once it's clear how
to guarantee the persistence of remote PMEM write in RDMA, we will
propose additional patch to add support in QEMU.

Thanks,
Haozhong

> Dave
> 
> >              break;
> >  
> >          case RDMA_CONTROL_REGISTER_FINISHED:
> > diff --git a/stubs/pmem.c b/stubs/pmem.c
> > index 03d990e571..a65b3bfc6b 100644
> > --- a/stubs/pmem.c
> > +++ b/stubs/pmem.c
> > @@ -17,3 +17,12 @@ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len)
> >  {
> >      return memcpy(pmemdest, src, len);
> >  }
> > +
> > +void *pmem_memset_nodrain(void *pmemdest, int c, size_t len)
> > +{
> > +    return memset(pmemdest, c, len);
> > +}
> > +
> > +void pmem_drain(void)
> > +{
> > +}
> > -- 
> > 2.14.1
> > 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2018-04-02  1:43 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-28  7:25 [Qemu-devel] [PATCH v4 0/8] nvdimm: guarantee persistence of QEMU writes to persistent memory Haozhong Zhang
2018-02-28  7:25 ` [Qemu-devel] [PATCH v4 1/8] memory, exec: switch file ram allocation functions to 'flags' parameters Haozhong Zhang
2018-02-28  7:25 ` [Qemu-devel] [PATCH v4 2/8] hostmem-file: add the 'pmem' option Haozhong Zhang
2018-02-28  7:25 ` [Qemu-devel] [PATCH v4 3/8] configure: add libpmem support Haozhong Zhang
2018-02-28  7:25 ` [Qemu-devel] [PATCH v4 4/8] mem/nvdimm: ensure write persistence to PMEM in label emulation Haozhong Zhang
2018-02-28  7:25 ` [Qemu-devel] [PATCH v4 5/8] migration/ram: ensure write persistence on loading zero pages to PMEM Haozhong Zhang
2018-03-29 18:59   ` Dr. David Alan Gilbert
2018-04-02  1:42     ` Haozhong Zhang
2018-02-28  7:25 ` [Qemu-devel] [PATCH v4 6/8] migration/ram: ensure write persistence on loading normal " Haozhong Zhang
2018-02-28  7:25 ` [Qemu-devel] [PATCH v4 7/8] migration/ram: ensure write persistence on loading compressed " Haozhong Zhang
2018-02-28  7:25 ` [Qemu-devel] [PATCH v4 8/8] migration/ram: ensure write persistence on loading xbzrle " Haozhong Zhang
2018-02-28  7:31 ` [Qemu-devel] [PATCH v4 0/8] nvdimm: guarantee persistence of QEMU writes to persistent memory Haozhong Zhang
2018-03-08  6:49 ` Haozhong Zhang
2018-03-12 15:39 ` Stefan Hajnoczi
2018-03-13  0:11   ` Haozhong Zhang
2018-03-13  9:36     ` Dr. David Alan Gilbert
2018-03-13 10:44     ` Stefan Hajnoczi
2018-03-29 19:12     ` Dr. David Alan Gilbert
2018-04-02  1:29       ` Haozhong Zhang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.