* [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
@ 2016-01-07 12:19 zhanghailiang
  2016-01-07 12:19 ` [Qemu-devel] [RFC 01/13] postcopy/migration: Split fault related state into struct UserfaultState zhanghailiang
                   ` (14 more replies)
  0 siblings, 15 replies; 48+ messages in thread
From: zhanghailiang @ 2016-01-07 12:19 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, zhanghailiang, hanweidong, quintela, peter.huangpeng,
	dgilbert, amit.shah

For now, we still don't support live memory snapshots. We discussed a
scheme based on userfaultfd a long time ago.
You can find the discussion at the following link:
https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01779.html

The scheme is based on userfaultfd's write-protect capability.
The userfaultfd write protection feature is available here:
http://www.spinics.net/lists/linux-mm/msg97422.html

The process of this live memory snapshot scheme is as follows:
1. Pause the VM.
2. Enable write-protect fault notification by using userfaultfd to
   mark the VM's memory write-protected (read-only).
3. Save the VM's static state (here, the device state) to the snapshot file.
4. Resume the VM, which then continues to run.
5. The snapshot thread begins to save the VM's live state (here, RAM) into
   the snapshot file.
6. During this time, any write to the VM's memory is blocked by the kernel,
   which wakes up the fault-handling thread in QEMU to process the
   write-protect fault. That thread delivers the faulting page's address
   to the snapshot thread.
7. The snapshot thread saves that page into the snapshot file, then removes
   the write protection through the userfaultfd API, after which writes to
   the page can proceed.
8. Repeat steps 5~7 until all of the VM's memory has been saved to the
   snapshot file.

Compared with the existing 'migrate the VM's state to a file' feature,
the main difference of a live memory snapshot is that there is little delay
in capturing the VM's state: the state is captured at the moment the user
issues the snapshot command, like taking a photo of the VM.

For now, we only support the tcg accelerator, since userfaultfd does not
support tracking write faults for KVM.

Usage:
1. Take a snapshot
#x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.5,accel=tcg,usb=off -drive file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1  -vnc :7 -m 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0  --monitor stdio
Issue snapshot command:
(qemu) migrate -d file:/home/Snapshot
2. Revert to the snapshot
#x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.5,accel=tcg,usb=off -drive file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1  -vnc :7 -m 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0  --monitor stdio -incoming file:/home/Snapshot

NOTE:
The userfaultfd write protection feature does not support THP for now.
Before taking a snapshot, please disable THP with:
echo never > /sys/kernel/mm/transparent_hugepage/enabled

TODO:
- Reduce the impact on the VM while taking a snapshot

zhanghailiang (13):
  postcopy/migration: Split fault related state into struct
    UserfaultState
  migration: Allow the migrate command to work on file: urls
  migration: Allow -incoming to work on file: urls
  migration: Create a snapshot thread to realize saving memory snapshot
  migration: implement initialization work for snapshot
  QEMUSizedBuffer: Introduce two help functions for qsb
  savevm: Split qemu_savevm_state_complete_precopy() into two helper
    functions
  snapshot: Save VM's device state into snapshot file
  migration/postcopy-ram: fix some helper functions to support
    userfaultfd write-protect
  snapshot: Enable the write-protect notification capability for VM's
    RAM
  snapshot/migration: Save VM's RAM into snapshot file
  migration/ram: Fix some helper functions' parameter to use
    PageSearchStatus
  snapshot: Remove page's write-protect and copy the content during
    setup stage

 include/migration/migration.h     |  41 +++++--
 include/migration/postcopy-ram.h  |   9 +-
 include/migration/qemu-file.h     |   3 +-
 include/qemu/typedefs.h           |   1 +
 include/sysemu/sysemu.h           |   3 +
 linux-headers/linux/userfaultfd.h |  21 +++-
 migration/fd.c                    |  51 ++++++++-
 migration/migration.c             | 101 ++++++++++++++++-
 migration/postcopy-ram.c          | 229 ++++++++++++++++++++++++++++----------
 migration/qemu-file-buf.c         |  61 ++++++++++
 migration/ram.c                   | 104 ++++++++++++-----
 migration/savevm.c                |  90 ++++++++++++---
 trace-events                      |   1 +
 13 files changed, 587 insertions(+), 128 deletions(-)

-- 
1.8.3.1


* [Qemu-devel] [RFC 01/13] postcopy/migration: Split fault related state into struct UserfaultState
  2016-01-07 12:19 [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd zhanghailiang
@ 2016-01-07 12:19 ` zhanghailiang
  2016-01-07 12:19 ` [Qemu-devel] [RFC 02/13] migration: Allow the migrate command to work on file: urls zhanghailiang
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 48+ messages in thread
From: zhanghailiang @ 2016-01-07 12:19 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, zhanghailiang, hanweidong, quintela, peter.huangpeng,
	dgilbert, amit.shah

Split the fault-related state out of the MigrationIncomingState struct and
put it all into a new struct, UserfaultState. A later patch will add this
state to struct MigrationState.

We also update some helper functions to use the new type.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 include/migration/migration.h    | 20 ++++----
 include/migration/postcopy-ram.h |  2 +-
 include/qemu/typedefs.h          |  1 +
 migration/postcopy-ram.c         | 99 +++++++++++++++++++++++-----------------
 migration/savevm.c               |  2 +-
 5 files changed, 72 insertions(+), 52 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index d9494b8..4c80939 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -79,6 +79,16 @@ typedef enum {
     POSTCOPY_INCOMING_END
 } PostcopyState;
 
+struct UserfaultState {
+    bool           have_fault_thread;
+    QemuThread     fault_thread;
+    QemuSemaphore  fault_thread_sem;
+    /* For the kernel to send us notifications */
+    int       userfault_fd;
+    /* To tell the fault_thread to quit */
+    int       userfault_quit_fd;
+};
+
 /* State for the incoming migration */
 struct MigrationIncomingState {
     QEMUFile *from_src_file;
@@ -89,22 +99,16 @@ struct MigrationIncomingState {
      */
     QemuEvent main_thread_load_event;
 
-    bool           have_fault_thread;
-    QemuThread     fault_thread;
-    QemuSemaphore  fault_thread_sem;
-
     bool           have_listen_thread;
     QemuThread     listen_thread;
     QemuSemaphore  listen_thread_sem;
 
-    /* For the kernel to send us notifications */
-    int       userfault_fd;
-    /* To tell the fault_thread to quit */
-    int       userfault_quit_fd;
     QEMUFile *to_src_file;
     QemuMutex rp_mutex;    /* We send replies from multiple threads */
     void     *postcopy_tmp_page;
 
+    UserfaultState userfault_state;
+
     /* See savevm.c */
     LoadStateEntry_Head loadvm_handlers;
 };
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index b6a7491..e30978f 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -20,7 +20,7 @@ bool postcopy_ram_supported_by_host(void);
  * Make all of RAM sensitive to accesses to areas that haven't yet been written
  * and wire up anything necessary to deal with it.
  */
-int postcopy_ram_enable_notify(MigrationIncomingState *mis);
+int postcopy_ram_enable_notify(UserfaultState *us);
 
 /*
  * Initialise postcopy-ram, setting the RAM to a state where we can go into
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 78fe6e8..eda3063 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -41,6 +41,7 @@ typedef struct MemoryListener MemoryListener;
 typedef struct MemoryMappingList MemoryMappingList;
 typedef struct MemoryRegion MemoryRegion;
 typedef struct MemoryRegionSection MemoryRegionSection;
+typedef struct UserfaultState UserfaultState;
 typedef struct MigrationIncomingState MigrationIncomingState;
 typedef struct MigrationParams MigrationParams;
 typedef struct MigrationState MigrationState;
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 3946aa9..38245d4 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -233,7 +233,7 @@ static int init_range(const char *block_name, void *host_addr,
 static int cleanup_range(const char *block_name, void *host_addr,
                         ram_addr_t offset, ram_addr_t length, void *opaque)
 {
-    MigrationIncomingState *mis = opaque;
+    UserfaultState *us = opaque;
     struct uffdio_range range_struct;
     trace_postcopy_cleanup_range(block_name, host_addr, offset, length);
 
@@ -251,7 +251,7 @@ static int cleanup_range(const char *block_name, void *host_addr,
     range_struct.start = (uintptr_t)host_addr;
     range_struct.len = length;
 
-    if (ioctl(mis->userfault_fd, UFFDIO_UNREGISTER, &range_struct)) {
+    if (ioctl(us->userfault_fd, UFFDIO_UNREGISTER, &range_struct)) {
         error_report("%s: userfault unregister %s", __func__, strerror(errno));
 
         return -1;
@@ -274,36 +274,47 @@ int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
     return 0;
 }
 
-/*
- * At the end of a migration where postcopy_ram_incoming_init was called.
- */
-int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
+static int postcopy_ram_disable_notify(UserfaultState *us)
 {
-    trace_postcopy_ram_incoming_cleanup_entry();
-
-    if (mis->have_fault_thread) {
+    if (us->have_fault_thread) {
         uint64_t tmp64;
 
-        if (qemu_ram_foreach_block(cleanup_range, mis)) {
+        if (qemu_ram_foreach_block(cleanup_range, us)) {
             return -1;
         }
+
         /*
-         * Tell the fault_thread to exit, it's an eventfd that should
-         * currently be at 0, we're going to increment it to 1
-         */
+        * Tell the fault_thread to exit, it's an eventfd that should
+        * currently be at 0, we're going to increment it to 1
+        */
         tmp64 = 1;
-        if (write(mis->userfault_quit_fd, &tmp64, 8) == 8) {
+
+        if (write(us->userfault_quit_fd, &tmp64, 8) == 8) {
             trace_postcopy_ram_incoming_cleanup_join();
-            qemu_thread_join(&mis->fault_thread);
+            qemu_thread_join(&us->fault_thread);
         } else {
             /* Not much we can do here, but may as well report it */
             error_report("%s: incrementing userfault_quit_fd: %s", __func__,
                          strerror(errno));
         }
+
         trace_postcopy_ram_incoming_cleanup_closeuf();
-        close(mis->userfault_fd);
-        close(mis->userfault_quit_fd);
-        mis->have_fault_thread = false;
+        close(us->userfault_fd);
+        close(us->userfault_quit_fd);
+        us->have_fault_thread = false;
+    }
+    return 0;
+}
+
+/*
+ * At the end of a migration where postcopy_ram_incoming_init was called.
+ */
+int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
+{
+    trace_postcopy_ram_incoming_cleanup_entry();
+
+    if (postcopy_ram_disable_notify(&mis->userfault_state) < 0) {
+        return 0;
     }
 
     qemu_balloon_inhibit(false);
@@ -376,7 +387,7 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
                                    ram_addr_t offset, ram_addr_t length,
                                    void *opaque)
 {
-    MigrationIncomingState *mis = opaque;
+    UserfaultState *us = opaque;
     struct uffdio_register reg_struct;
 
     reg_struct.range.start = (uintptr_t)host_addr;
@@ -384,7 +395,7 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
     reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING;
 
     /* Now tell our userfault_fd that it's responsible for this area */
-    if (ioctl(mis->userfault_fd, UFFDIO_REGISTER, &reg_struct)) {
+    if (ioctl(us->userfault_fd, UFFDIO_REGISTER, &reg_struct)) {
         error_report("%s userfault register: %s", __func__, strerror(errno));
         return -1;
     }
@@ -397,15 +408,17 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
  */
 static void *postcopy_ram_fault_thread(void *opaque)
 {
-    MigrationIncomingState *mis = opaque;
+    UserfaultState *us = opaque;
     struct uffd_msg msg;
     int ret;
     size_t hostpagesize = getpagesize();
     RAMBlock *rb = NULL;
     RAMBlock *last_rb = NULL; /* last RAMBlock we sent part of */
+    MigrationIncomingState *mis = container_of(us, MigrationIncomingState,
+                                               userfault_state);
 
     trace_postcopy_ram_fault_thread_entry();
-    qemu_sem_post(&mis->fault_thread_sem);
+    qemu_sem_post(&us->fault_thread_sem);
 
     while (true) {
         ram_addr_t rb_offset;
@@ -417,10 +430,10 @@ static void *postcopy_ram_fault_thread(void *opaque)
          * however we can be told to quit via userfault_quit_fd which is
          * an eventfd
          */
-        pfd[0].fd = mis->userfault_fd;
+        pfd[0].fd = us->userfault_fd;
         pfd[0].events = POLLIN;
         pfd[0].revents = 0;
-        pfd[1].fd = mis->userfault_quit_fd;
+        pfd[1].fd = us->userfault_quit_fd;
         pfd[1].events = POLLIN; /* Waiting for eventfd to go positive */
         pfd[1].revents = 0;
 
@@ -434,7 +447,8 @@ static void *postcopy_ram_fault_thread(void *opaque)
             break;
         }
 
-        ret = read(mis->userfault_fd, &msg, sizeof(msg));
+        ret = read(us->userfault_fd, &msg, sizeof(msg));
+
         if (ret != sizeof(msg)) {
             if (errno == EAGAIN) {
                 /*
@@ -491,11 +505,11 @@ static void *postcopy_ram_fault_thread(void *opaque)
     return NULL;
 }
 
-int postcopy_ram_enable_notify(MigrationIncomingState *mis)
+int postcopy_ram_enable_notify(UserfaultState *us)
 {
     /* Open the fd for the kernel to give us userfaults */
-    mis->userfault_fd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
-    if (mis->userfault_fd == -1) {
+    us->userfault_fd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
+    if (us->userfault_fd == -1) {
         error_report("%s: Failed to open userfault fd: %s", __func__,
                      strerror(errno));
         return -1;
@@ -505,28 +519,28 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
      * Although the host check already tested the API, we need to
      * do the check again as an ABI handshake on the new fd.
      */
-    if (!ufd_version_check(mis->userfault_fd)) {
+    if (!ufd_version_check(us->userfault_fd)) {
         return -1;
     }
 
     /* Now an eventfd we use to tell the fault-thread to quit */
-    mis->userfault_quit_fd = eventfd(0, EFD_CLOEXEC);
-    if (mis->userfault_quit_fd == -1) {
+    us->userfault_quit_fd = eventfd(0, EFD_CLOEXEC);
+    if (us->userfault_quit_fd == -1) {
         error_report("%s: Opening userfault_quit_fd: %s", __func__,
                      strerror(errno));
-        close(mis->userfault_fd);
+        close(us->userfault_fd);
         return -1;
     }
 
-    qemu_sem_init(&mis->fault_thread_sem, 0);
-    qemu_thread_create(&mis->fault_thread, "postcopy/fault",
-                       postcopy_ram_fault_thread, mis, QEMU_THREAD_JOINABLE);
-    qemu_sem_wait(&mis->fault_thread_sem);
-    qemu_sem_destroy(&mis->fault_thread_sem);
-    mis->have_fault_thread = true;
+    qemu_sem_init(&us->fault_thread_sem, 0);
+    qemu_thread_create(&us->fault_thread, "postcopy/fault",
+                       postcopy_ram_fault_thread, us, QEMU_THREAD_JOINABLE);
+    qemu_sem_wait(&us->fault_thread_sem);
+    qemu_sem_destroy(&us->fault_thread_sem);
+    us->have_fault_thread = true;
 
     /* Mark so that we get notified of accesses to unwritten areas */
-    if (qemu_ram_foreach_block(ram_block_enable_notify, mis)) {
+    if (qemu_ram_foreach_block(ram_block_enable_notify, us)) {
         return -1;
     }
 
@@ -559,7 +573,7 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from)
      * which would be slightly cheaper, but we'd have to be careful
      * of the order of updating our page state.
      */
-    if (ioctl(mis->userfault_fd, UFFDIO_COPY, &copy_struct)) {
+    if (ioctl(mis->userfault_state.userfault_fd, UFFDIO_COPY, &copy_struct)) {
         int e = errno;
         error_report("%s: %s copy host: %p from: %p",
                      __func__, strerror(e), host, from);
@@ -583,7 +597,8 @@ int postcopy_place_page_zero(MigrationIncomingState *mis, void *host)
     zero_struct.range.len = getpagesize();
     zero_struct.mode = 0;
 
-    if (ioctl(mis->userfault_fd, UFFDIO_ZEROPAGE, &zero_struct)) {
+    if (ioctl(mis->userfault_state.userfault_fd, UFFDIO_ZEROPAGE,
+              &zero_struct)) {
         int e = errno;
         error_report("%s: %s zero host: %p",
                      __func__, strerror(e), host);
@@ -651,7 +666,7 @@ int postcopy_ram_prepare_discard(MigrationIncomingState *mis)
     return -1;
 }
 
-int postcopy_ram_enable_notify(MigrationIncomingState *mis)
+int postcopy_ram_enable_notify(UserfaultState *us)
 {
     assert(0);
     return -1;
diff --git a/migration/savevm.c b/migration/savevm.c
index 0ad1b93..9b22498 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1467,7 +1467,7 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
      * However, at this point the CPU shouldn't be running, and the IO
      * shouldn't be doing anything yet so don't actually expect requests
      */
-    if (postcopy_ram_enable_notify(mis)) {
+    if (postcopy_ram_enable_notify(&mis->userfault_state)) {
         return -1;
     }
 
-- 
1.8.3.1


* [Qemu-devel] [RFC 02/13] migration: Allow the migrate command to work on file: urls
  2016-01-07 12:19 [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd zhanghailiang
  2016-01-07 12:19 ` [Qemu-devel] [RFC 01/13] postcopy/migration: Split fault related state into struct UserfaultState zhanghailiang
@ 2016-01-07 12:19 ` zhanghailiang
  2016-07-13 16:12   ` Dr. David Alan Gilbert
  2016-01-07 12:19 ` [Qemu-devel] [RFC 03/13] migration: Allow -incoming " zhanghailiang
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 48+ messages in thread
From: zhanghailiang @ 2016-01-07 12:19 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, Benoit Canet, zhanghailiang, hanweidong, quintela,
	peter.huangpeng, dgilbert, amit.shah

Usage:
(qemu) migrate file:/path/to/vm_statefile

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Benoit Canet <benoit.canet@gmail.com>
---
- With this patch, we can easily test memory snapshot
- Rebase on qemu 2.5
---
 include/migration/migration.h |  6 +++++-
 migration/fd.c                | 19 +++++++++++++++++--
 migration/migration.c         |  4 +++-
 3 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 4c80939..bf4f8e9 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -193,7 +193,11 @@ void unix_start_outgoing_migration(MigrationState *s, const char *path, Error **
 
 void fd_start_incoming_migration(const char *path, Error **errp);
 
-void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp);
+void fd_start_outgoing_migration(MigrationState *s, const char *fdname,
+                                 int outfd, Error **errp);
+
+void file_start_outgoing_migration(MigrationState *s, const char *filename,
+                                   Error **errp);
 
 void rdma_start_outgoing_migration(void *opaque, const char *host_port, Error **errp);
 
diff --git a/migration/fd.c b/migration/fd.c
index 3e4bed0..b62161f 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -42,9 +42,10 @@ static bool fd_is_socket(int fd)
     return S_ISSOCK(stat.st_mode);
 }
 
-void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp)
+void fd_start_outgoing_migration(MigrationState *s, const char *fdname,
+                                 int outfd, Error **errp)
 {
-    int fd = monitor_get_fd(cur_mon, fdname, errp);
+    int fd = fdname ? monitor_get_fd(cur_mon, fdname, errp) : outfd;
     if (fd == -1) {
         return;
     }
@@ -58,6 +59,20 @@ void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **
     migrate_fd_connect(s);
 }
 
+void file_start_outgoing_migration(MigrationState *s, const char *filename,
+                                   Error **errp)
+{
+    int fd;
+
+    fd = qemu_open(filename, O_CREAT | O_TRUNC | O_WRONLY, S_IRUSR | S_IWUSR);
+    if (fd < 0) {
+        error_setg_errno(errp, errno, "Failed to open file: %s", filename);
+        return;
+    }
+    fd_start_outgoing_migration(s, NULL, fd, errp);
+}
+
+
 static void fd_accept_incoming_migration(void *opaque)
 {
     QEMUFile *f = opaque;
diff --git a/migration/migration.c b/migration/migration.c
index c842499..3ec3b85 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1021,7 +1021,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     } else if (strstart(uri, "unix:", &p)) {
         unix_start_outgoing_migration(s, p, &local_err);
     } else if (strstart(uri, "fd:", &p)) {
-        fd_start_outgoing_migration(s, p, &local_err);
+        fd_start_outgoing_migration(s, p, -1, &local_err);
+    } else if (strstart(uri, "file:", &p)) {
+        file_start_outgoing_migration(s, p,  &local_err);
 #endif
     } else {
         error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "uri",
-- 
1.8.3.1


* [Qemu-devel] [RFC 03/13] migration: Allow -incoming to work on file: urls
  2016-01-07 12:19 [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd zhanghailiang
  2016-01-07 12:19 ` [Qemu-devel] [RFC 01/13] postcopy/migration: Split fault related state into struct UserfaultState zhanghailiang
  2016-01-07 12:19 ` [Qemu-devel] [RFC 02/13] migration: Allow the migrate command to work on file: urls zhanghailiang
@ 2016-01-07 12:19 ` zhanghailiang
  2016-01-11 20:02   ` Dr. David Alan Gilbert
  2016-01-07 12:19 ` [Qemu-devel] [RFC 04/13] migration: Create a snapshot thread to realize saving memory snapshot zhanghailiang
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 48+ messages in thread
From: zhanghailiang @ 2016-01-07 12:19 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, Benoit Canet, zhanghailiang, hanweidong, quintela,
	peter.huangpeng, dgilbert, amit.shah

Usage:
-incoming file:/path/to/vm_statefile

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Benoit Canet <benoit.canet@gmail.com>
---
- Rebase on qemu 2.5
- Use qemu_strtol instead of strtol
---
 include/migration/migration.h |  4 +++-
 migration/fd.c                | 28 +++++++++++++++++++++++++---
 migration/migration.c         |  4 +++-
 3 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index bf4f8e9..3f372a5 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -191,7 +191,9 @@ void unix_start_incoming_migration(const char *path, Error **errp);
 
 void unix_start_outgoing_migration(MigrationState *s, const char *path, Error **errp);
 
-void fd_start_incoming_migration(const char *path, Error **errp);
+void fd_start_incoming_migration(const char *path, int fd, Error **errp);
+
+void file_start_incoming_migration(const char *filename, Error **errp);
 
 void fd_start_outgoing_migration(MigrationState *s, const char *fdname,
                                  int outfd, Error **errp);
diff --git a/migration/fd.c b/migration/fd.c
index b62161f..ac38256 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -81,14 +81,24 @@ static void fd_accept_incoming_migration(void *opaque)
     process_incoming_migration(f);
 }
 
-void fd_start_incoming_migration(const char *infd, Error **errp)
+void fd_start_incoming_migration(const char *infd,  int fd, Error **errp)
 {
-    int fd;
     QEMUFile *f;
+    int err;
+    long in_fd;
 
     DPRINTF("Attempting to start an incoming migration via fd\n");
 
-    fd = strtol(infd, NULL, 0);
+    if (infd) {
+        err = qemu_strtol(infd, NULL, 0, &in_fd);
+        if (err < 0) {
+            error_setg_errno(errp, -err, "Failed to convert string '%s'"
+                            " to number", infd);
+            return;
+        }
+        fd = (int)in_fd;
+    }
+
     if (fd_is_socket(fd)) {
         f = qemu_fopen_socket(fd, "rb");
     } else {
@@ -101,3 +111,15 @@ void fd_start_incoming_migration(const char *infd, Error **errp)
 
     qemu_set_fd_handler(fd, fd_accept_incoming_migration, NULL, f);
 }
+
+void file_start_incoming_migration(const char *filename, Error **errp)
+{
+    int fd;
+
+    fd = qemu_open(filename, O_RDONLY);
+    if (fd < 0) {
+        error_setg_errno(errp, errno, "Failed to open file:%s", filename);
+        return;
+    }
+    fd_start_incoming_migration(NULL, fd, NULL);
+}
diff --git a/migration/migration.c b/migration/migration.c
index 3ec3b85..e54910d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -314,7 +314,9 @@ void qemu_start_incoming_migration(const char *uri, Error **errp)
     } else if (strstart(uri, "unix:", &p)) {
         unix_start_incoming_migration(p, errp);
     } else if (strstart(uri, "fd:", &p)) {
-        fd_start_incoming_migration(p, errp);
+        fd_start_incoming_migration(p, -1, errp);
+    } else if (strstart(uri, "file:", &p)) {
+        file_start_incoming_migration(p, errp);
 #endif
     } else {
         error_setg(errp, "unknown migration protocol: %s", uri);
-- 
1.8.3.1


* [Qemu-devel] [RFC 04/13] migration: Create a snapshot thread to realize saving memory snapshot
  2016-01-07 12:19 [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd zhanghailiang
                   ` (2 preceding siblings ...)
  2016-01-07 12:19 ` [Qemu-devel] [RFC 03/13] migration: Allow -incoming " zhanghailiang
@ 2016-01-07 12:19 ` zhanghailiang
  2016-01-07 12:20 ` [Qemu-devel] [RFC 05/13] migration: implement initialization work for snapshot zhanghailiang
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 48+ messages in thread
From: zhanghailiang @ 2016-01-07 12:19 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, zhanghailiang, hanweidong, quintela, peter.huangpeng,
	dgilbert, amit.shah

If the user issues a 'migrate file:url' command, we treat it as a request
to create a live memory snapshot.
Besides, we only support the tcg accelerator for now.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 include/migration/migration.h |  2 ++
 migration/fd.c                |  4 ++++
 migration/migration.c         | 30 ++++++++++++++++++++++++++++--
 3 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 3f372a5..1316d22 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -171,6 +171,7 @@ struct MigrationState
     QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests;
     /* The RAMBlock used in the last src_page_request */
     RAMBlock *last_req_rb;
+    bool in_snapshot; /* for snapshot */
 };
 
 void process_incoming_migration(QEMUFile *f);
@@ -215,6 +216,7 @@ void add_migration_state_change_notifier(Notifier *notify);
 void remove_migration_state_change_notifier(Notifier *notify);
 MigrationState *migrate_init(const MigrationParams *params);
 bool migration_in_setup(MigrationState *);
+bool migration_in_snapshot(MigrationState *);
 bool migration_has_finished(MigrationState *);
 bool migration_has_failed(MigrationState *);
 /* True if outgoing migration has entered postcopy phase */
diff --git a/migration/fd.c b/migration/fd.c
index ac38256..6036560 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -69,6 +69,10 @@ void file_start_outgoing_migration(MigrationState *s, const char *filename,
         error_setg_errno(errp, errno, "Failed to open file: %s", filename);
         return;
     }
+    /* Fix me: just for test.
+     * We shouldn't use this to identify whether we are doing a snapshot.
+     */
+    s->in_snapshot = true;
     fd_start_outgoing_migration(s, NULL, fd, errp);
 }
 
diff --git a/migration/migration.c b/migration/migration.c
index e54910d..7633043 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -33,6 +33,7 @@
 #include "qom/cpu.h"
 #include "exec/memory.h"
 #include "exec/address-spaces.h"
+#include "hw/boards.h" /* Fix me: Remove this if we support snapshot for KVM */
 
 #define MAX_THROTTLE  (32 << 20)      /* Migration transfer speed throttling */
 
@@ -901,6 +902,11 @@ bool migration_in_postcopy(MigrationState *s)
     return (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
 }
 
+bool migration_in_snapshot(MigrationState *s)
+{
+    return s->in_snapshot;
+}
+
 MigrationState *migrate_init(const MigrationParams *params)
 {
     MigrationState *s = migrate_get_current();
@@ -1732,6 +1738,21 @@ static void *migration_thread(void *opaque)
     return NULL;
 }
 
+static void *snapshot_thread(void *opaque)
+{
+    rcu_register_thread();
+    /* Fix me: Remove this if we support snapshot for KVM */
+    if (strcmp(current_machine->accel, "tcg")) {
+        error_report("snapshot only support 'tcg' accel for now");
+        goto error;
+    }
+
+    /* TODO: create memory snapshot */
+
+error:
+    rcu_unregister_thread();
+    return NULL;
+}
 void migrate_fd_connect(MigrationState *s)
 {
     /* This is a best 1st approximation. ns to ms */
@@ -1759,8 +1780,13 @@ void migrate_fd_connect(MigrationState *s)
     }
 
     migrate_compress_threads_create();
-    qemu_thread_create(&s->thread, "migration", migration_thread, s,
-                       QEMU_THREAD_JOINABLE);
+    if (!s->in_snapshot) {
+        qemu_thread_create(&s->thread, "migration", migration_thread, s,
+                           QEMU_THREAD_JOINABLE);
+    } else {
+       qemu_thread_create(&s->thread, "snapshot", snapshot_thread, s,
+                          QEMU_THREAD_JOINABLE);
+   }
     s->migration_thread_running = true;
 }
 
-- 
1.8.3.1


* [Qemu-devel] [RFC 05/13] migration: implement initialization work for snapshot
  2016-01-07 12:19 [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd zhanghailiang
                   ` (3 preceding siblings ...)
  2016-01-07 12:19 ` [Qemu-devel] [RFC 04/13] migration: Create a snapshot thread to realize saving memory snapshot zhanghailiang
@ 2016-01-07 12:20 ` zhanghailiang
  2016-01-07 12:20 ` [Qemu-devel] [RFC 06/13] QEMUSizedBuffer: Introduce two help functions for qsb zhanghailiang
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 48+ messages in thread
From: zhanghailiang @ 2016-01-07 12:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, zhanghailiang, hanweidong, quintela, peter.huangpeng,
	dgilbert, amit.shah

We re-use some migration helper functions to do the setup work for
snapshot; besides, we need to do some initialization work (for example,
saving the VM's device state) while the VM is paused.
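The setup sequence can be sketched as a toy model (vm_tick() and snapshot_setup() are illustrative names, not QEMU APIs): the device state is captured while the "VM" is stopped, so the captured value corresponds exactly to the stop point, and the VM resumes immediately afterwards.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the snapshot SETUP stage: the "VM" only makes progress
 * while running, and the device state is captured while it is stopped,
 * so the captured value corresponds exactly to the stop point. */
static bool vm_running = true;
static int vm_ticks;

void vm_tick(void)
{
    if (vm_running) {
        vm_ticks++;
    }
}

int snapshot_setup(void)
{
    int captured;

    vm_running = false;      /* like vm_stop_force_state(RUN_STATE_SAVE_VM) */
    captured = vm_ticks;     /* save the device state while paused */
    vm_running = true;       /* like vm_start(): the VM resumes */
    return captured;
}
```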

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 migration/migration.c | 36 +++++++++++++++++++++++++++++++++++-
 migration/ram.c       |  8 +++++---
 trace-events          |  1 +
 3 files changed, 41 insertions(+), 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 7633043..7413e0d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1740,6 +1740,10 @@ static void *migration_thread(void *opaque)
 
 static void *snapshot_thread(void *opaque)
 {
+    MigrationState *ms = opaque;
+    bool old_vm_running = false;
+    int ret;
+
     rcu_register_thread();
     /* Fix me: Remove this if we support snapshot for KVM */
     if (strcmp(current_machine->accel, "tcg")) {
@@ -1747,8 +1751,38 @@ static void *snapshot_thread(void *opaque)
         goto error;
     }
 
-    /* TODO: create memory snapshot */
+    qemu_savevm_state_header(ms->file);
+    qemu_savevm_state_begin(ms->file, &ms->params);
+
+    qemu_mutex_lock_iothread();
+    qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
+    old_vm_running = runstate_is_running();
+    ret = global_state_store();
+    if (!ret) {
+        ret = vm_stop_force_state(RUN_STATE_SAVE_VM);
+        if (ret < 0) {
+            error_report("Failed to stop VM");
+            goto error;
+        }
+    }
+
+    /* TODO: other setup work */
 
+    if (old_vm_running) {
+        vm_start();
+    }
+    qemu_mutex_unlock_iothread();
+
+    migrate_set_state(ms, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_ACTIVE);
+
+    trace_snapshot_thread_setup_complete();
+
+    /* Save VM's state */
+
+    qemu_mutex_lock_iothread();
+    qemu_savevm_state_cleanup();
+    qemu_bh_schedule(ms->cleanup_bh);
+    qemu_mutex_unlock_iothread();
 error:
     rcu_unregister_thread();
     return NULL;
diff --git a/migration/ram.c b/migration/ram.c
index 0490f00..c87663f 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1935,9 +1935,11 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
      * gaps due to alignment or unplugs.
      */
     migration_dirty_pages = ram_bytes_total() >> TARGET_PAGE_BITS;
-
-    memory_global_dirty_log_start();
-    migration_bitmap_sync();
+    /* For snapshot, we don't need to enable global dirty log */
+    if (!migration_in_snapshot(migrate_get_current())) {
+        memory_global_dirty_log_start();
+        migration_bitmap_sync();
+    }
     qemu_mutex_unlock_ramlist();
     qemu_mutex_unlock_iothread();
 
diff --git a/trace-events b/trace-events
index ea5872d..cfebbed 100644
--- a/trace-events
+++ b/trace-events
@@ -1495,6 +1495,7 @@ migrate_state_too_big(void) ""
 migrate_transferred(uint64_t tranferred, uint64_t time_spent, double bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %g max_size %" PRId64
 process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
 process_incoming_migration_co_postcopy_end_main(void) ""
+snapshot_thread_setup_complete(void) ""
 
 # migration/rdma.c
 qemu_rdma_accept_incoming_migration(void) ""
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [RFC 06/13] QEMUSizedBuffer: Introduce two help functions for qsb
  2016-01-07 12:19 [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd zhanghailiang
                   ` (4 preceding siblings ...)
  2016-01-07 12:20 ` [Qemu-devel] [RFC 05/13] migration: implement initialization work for snapshot zhanghailiang
@ 2016-01-07 12:20 ` zhanghailiang
  2016-01-07 12:20 ` [Qemu-devel] [RFC 07/13] savevm: Split qemu_savevm_state_complete_precopy() into two helper functions zhanghailiang
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 48+ messages in thread
From: zhanghailiang @ 2016-01-07 12:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, zhanghailiang, Li Zhijian, hanweidong, quintela,
	peter.huangpeng, dgilbert, amit.shah

Introduce two new QEMUSizedBuffer APIs which will be used by COLO to buffer
VM state:
One is qsb_put_buffer(), which puts the content of a given QEMUSizedBuffer
into a QEMUFile; this is used to send buffered VM state to the secondary.
The other is qsb_fill_buffer(), which reads 'size' bytes of data from the
file into the qsb; this is used to read VM state from a socket into a buffer.
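As a rough stand-alone illustration of the chunk-walking loop that qsb_fill_buffer() uses, here is a minimal sketch. struct chunk and fill_chunks() are hypothetical stand-ins for QEMUSizedBuffer and the QEMUFile read; copying from a plain source buffer replaces qemu_get_buffer(), but the fill-each-chunk-until-full logic is the same idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a QEMUSizedBuffer: a list of iovec-like
 * chunks.  fill_chunks() fills each chunk until it is full or the
 * input is exhausted, and reports how many bytes were stored. */
struct chunk {
    uint8_t *base;
    size_t len;
};

size_t fill_chunks(struct chunk *iov, int n_iov,
                   const uint8_t *src, size_t size)
{
    size_t used = 0;
    size_t pending = size;
    int i;

    for (i = 0; i < n_iov && pending > 0; i++) {
        size_t done = 0;

        while (done < iov[i].len && pending > 0) {
            size_t l = iov[i].len - done;

            if (l > pending) {
                l = pending;
            }
            /* stands in for qemu_get_buffer() */
            memcpy(iov[i].base + done, src + used, l);
            done += l;
            used += l;
            pending -= l;
        }
    }
    return used;   /* like qsb->used */
}
```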

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/qemu-file.h |  3 ++-
 migration/qemu-file-buf.c     | 61 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index b5d08d2..ca6a582 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -150,7 +150,8 @@ ssize_t qsb_get_buffer(const QEMUSizedBuffer *, off_t start, size_t count,
                        uint8_t *buf);
 ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *buf,
                      off_t pos, size_t count);
-
+void qsb_put_buffer(QEMUFile *f, QEMUSizedBuffer *qsb, size_t size);
+size_t qsb_fill_buffer(QEMUSizedBuffer *qsb, QEMUFile *f, size_t size);
 
 /*
  * For use on files opened with qemu_bufopen
diff --git a/migration/qemu-file-buf.c b/migration/qemu-file-buf.c
index 49516b8..c50a495 100644
--- a/migration/qemu-file-buf.c
+++ b/migration/qemu-file-buf.c
@@ -366,6 +366,67 @@ ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *source,
     return count;
 }
 
+/**
+ * Put the content of a given QEMUSizedBuffer into QEMUFile.
+ *
+ * @f: A QEMUFile
+ * @qsb: A QEMUSizedBuffer
+ * @size: size of content to write
+ */
+void qsb_put_buffer(QEMUFile *f, QEMUSizedBuffer *qsb, size_t size)
+{
+    size_t l;
+    int i;
+
+    for (i = 0; i < qsb->n_iov && size > 0; i++) {
+        l = MIN(qsb->iov[i].iov_len, size);
+        qemu_put_buffer(f, qsb->iov[i].iov_base, l);
+        size -= l;
+    }
+}
+
+/*
+ * Read 'size' bytes of data from the file into qsb.
+ * It always fills from position 0 and is meant to be used after qsb_create().
+ *
+ * It will return size bytes unless there was an error, in which case it will
+ * return as many bytes as it managed to read (assuming blocking fds, which
+ * all current QEMUFiles are)
+ */
+size_t qsb_fill_buffer(QEMUSizedBuffer *qsb, QEMUFile *f, size_t size)
+{
+    ssize_t rc = qsb_grow(qsb, size);
+    ssize_t pending = size;
+    int i;
+    uint8_t *buf = NULL;
+
+    qsb->used = 0;
+
+    if (rc < 0) {
+        return rc;
+    }
+
+    for (i = 0; i < qsb->n_iov && pending > 0; i++) {
+        size_t doneone = 0;
+        /* read until iov full */
+        while (doneone < qsb->iov[i].iov_len && pending > 0) {
+            size_t readone = 0;
+
+            buf = qsb->iov[i].iov_base;
+            readone = qemu_get_buffer(f, buf,
+                                MIN(qsb->iov[i].iov_len - doneone, pending));
+            if (readone == 0) {
+                return qsb->used;
+            }
+            buf += readone;
+            doneone += readone;
+            pending -= readone;
+            qsb->used += readone;
+        }
+    }
+    return qsb->used;
+}
+
 typedef struct QEMUBuffer {
     QEMUSizedBuffer *qsb;
     QEMUFile *file;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [RFC 07/13] savevm: Split qemu_savevm_state_complete_precopy() into two helper functions
  2016-01-07 12:19 [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd zhanghailiang
                   ` (5 preceding siblings ...)
  2016-01-07 12:20 ` [Qemu-devel] [RFC 06/13] QEMUSizedBuffer: Introduce two help functions for qsb zhanghailiang
@ 2016-01-07 12:20 ` zhanghailiang
  2016-01-07 12:20 ` [Qemu-devel] [RFC 08/13] snapshot: Save VM's device state into snapshot file zhanghailiang
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 48+ messages in thread
From: zhanghailiang @ 2016-01-07 12:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, zhanghailiang, hanweidong, quintela, peter.huangpeng,
	dgilbert, amit.shah

We split qemu_savevm_state_complete_precopy() into two helper functions,
qemu_savevm_section_full() and qemu_savevm_section_end().
The main reason to do that is that sometimes we may want to do these two
pieces of work separately.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 migration/savevm.c | 41 ++++++++++++++++++++++++++++++-----------
 1 file changed, 30 insertions(+), 11 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 9b22498..1b4e5bd 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1026,18 +1026,12 @@ void qemu_savevm_state_complete_postcopy(QEMUFile *f)
     qemu_fflush(f);
 }
 
-void qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only)
+static int qemu_savevm_section_end(QEMUFile *f, bool iterable_only)
 {
-    QJSON *vmdesc;
-    int vmdesc_len;
     SaveStateEntry *se;
     int ret;
     bool in_postcopy = migration_in_postcopy(migrate_get_current());
 
-    trace_savevm_state_complete_precopy();
-
-    cpu_synchronize_all_states();
-
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
         if (!se->ops ||
             (in_postcopy && se->ops->save_live_complete_postcopy) ||
@@ -1060,13 +1054,18 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only)
         save_section_footer(f, se);
         if (ret < 0) {
             qemu_file_set_error(f, ret);
-            return;
+            return -1;
         }
     }
+    return 0;
+}
 
-    if (iterable_only) {
-        return;
-    }
+static void qemu_savevm_section_full(QEMUFile *f)
+{
+    QJSON *vmdesc;
+    int vmdesc_len;
+    SaveStateEntry *se;
+    bool in_postcopy = migration_in_postcopy(migrate_get_current());
 
     vmdesc = qjson_new();
     json_prop_int(vmdesc, "page_size", TARGET_PAGE_SIZE);
@@ -1111,6 +1110,26 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only)
         qemu_put_buffer(f, (uint8_t *)qjson_get_str(vmdesc), vmdesc_len);
     }
     object_unref(OBJECT(vmdesc));
+}
+
+void qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only)
+{
+    int ret;
+
+    trace_savevm_state_complete_precopy();
+
+    cpu_synchronize_all_states();
+
+    ret = qemu_savevm_section_end(f, iterable_only);
+    if (ret < 0) {
+        return;
+    }
+
+    if (iterable_only) {
+        return;
+    }
+
+    qemu_savevm_section_full(f);
 
     qemu_fflush(f);
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [RFC 08/13] snapshot: Save VM's device state into snapshot file
  2016-01-07 12:19 [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd zhanghailiang
                   ` (6 preceding siblings ...)
  2016-01-07 12:20 ` [Qemu-devel] [RFC 07/13] savevm: Split qemu_savevm_state_complete_precopy() into two helper functions zhanghailiang
@ 2016-01-07 12:20 ` zhanghailiang
  2016-01-07 12:20 ` [Qemu-devel] [RFC 09/13] migration/postcopy-ram: fix some helper functions to support userfaultfd write-protect zhanghailiang
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 48+ messages in thread
From: zhanghailiang @ 2016-01-07 12:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, zhanghailiang, hanweidong, quintela, peter.huangpeng,
	dgilbert, amit.shah

For live memory snapshot, we want to capture the VM's state at the
time the snapshot command is issued. So we need to save the VM's
static state (here, the VM's device state) at the beginning
of snapshot_thread(), but we can't do that while the VM is running.
Besides, we can't save the device state into the snapshot file directly,
because we want to re-use migration's incoming process for snapshots,
and so we need to keep the save sequence.

So here, we save the VM's device state into a qsb temporarily during the
SETUP stage while the VM is stopped, and write it into the snapshot file
after finishing saving the VM's live state.
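The ordering constraint can be sketched with plain buffers (save_stream() is a made-up helper, not a QEMU function): the device state is captured first, while the VM is paused, but it is appended to the output only after the live RAM pass, so the stream layout still matches what the incoming side expects.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of the save ordering: live RAM is written to the output
 * stream first, then the device state that was buffered earlier is
 * appended, preserving the sequence the incoming side expects. */
size_t save_stream(char *out, size_t cap,
                   const char *live_ram, const char *buffered_devstate)
{
    size_t lr = strlen(live_ram);
    size_t ld = strlen(buffered_devstate);

    if (lr + ld + 1 > cap) {
        return 0;   /* output buffer too small */
    }
    memcpy(out, live_ram, lr);               /* live state goes first */
    memcpy(out + lr, buffered_devstate, ld); /* buffered device state last */
    out[lr + ld] = '\0';
    return lr + ld;
}
```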

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 include/sysemu/sysemu.h |  3 +++
 migration/migration.c   | 14 ++++++++++++--
 migration/savevm.c      | 44 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 3bb8897..3bc13b6 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -111,6 +111,9 @@ void qemu_savevm_state_begin(QEMUFile *f,
 void qemu_savevm_state_header(QEMUFile *f);
 int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy);
 void qemu_savevm_state_cleanup(void);
+QEMUSizedBuffer *qemu_save_device_buffer(void);
+int qemu_save_buffer_file(MigrationState *s, QEMUSizedBuffer *buffer);
+
 void qemu_savevm_state_complete_postcopy(QEMUFile *f);
 void qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only);
 void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
diff --git a/migration/migration.c b/migration/migration.c
index 7413e0d..fd234eb 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1742,6 +1742,7 @@ static void *snapshot_thread(void *opaque)
 {
     MigrationState *ms = opaque;
     bool old_vm_running = false;
+    QEMUSizedBuffer *buffer = NULL;
     int ret;
 
     rcu_register_thread();
@@ -1766,7 +1767,7 @@ static void *snapshot_thread(void *opaque)
         }
     }
 
-    /* TODO: other setup work */
+    buffer = qemu_save_device_buffer();
 
     if (old_vm_running) {
         vm_start();
@@ -1777,7 +1778,16 @@ static void *snapshot_thread(void *opaque)
 
     trace_snapshot_thread_setup_complete();
 
-    /* Save VM's state */
+    /* Save VM's Live state, such as RAM */
+
+    qemu_save_buffer_file(ms, buffer);
+    ret = qemu_file_get_error(ms->file);
+    if (ret < 0) {
+        migrate_set_state(ms, MIGRATION_STATUS_ACTIVE, MIGRATION_STATUS_FAILED);
+    } else {
+        migrate_set_state(ms, MIGRATION_STATUS_ACTIVE,
+                          MIGRATION_STATUS_COMPLETED);
+    }
 
     qemu_mutex_lock_iothread();
     qemu_savevm_state_cleanup();
diff --git a/migration/savevm.c b/migration/savevm.c
index 1b4e5bd..a59f216 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -59,6 +59,8 @@
 #define ARP_PTYPE_IP 0x0800
 #define ARP_OP_REQUEST_REV 0x3
 
+#define BUFFER_BASE_SIZE (4 * 1024 * 1024)
+
 const unsigned int postcopy_ram_discard_version = 0;
 
 static bool skip_section_footers;
@@ -2206,3 +2208,45 @@ void vmstate_register_ram_global(MemoryRegion *mr)
 {
     vmstate_register_ram(mr, NULL);
 }
+
+QEMUSizedBuffer *qemu_save_device_buffer(void)
+{
+    QEMUSizedBuffer *buffer;
+    QEMUFile *trans = NULL;
+
+    buffer = qsb_create(NULL, BUFFER_BASE_SIZE);
+    if (buffer == NULL) {
+        error_report("Failed to allocate snapshot buffer!");
+        return NULL;
+    }
+
+    qsb_set_length(buffer, 0);
+    trans = qemu_bufopen("w", buffer);
+    if (!trans) {
+        error_report("Open qsb buffer for write failed");
+        goto error;
+    }
+    cpu_synchronize_all_states();
+    qemu_savevm_section_full(trans);
+    qemu_fflush(trans);
+    error_report("buffer address 0 :%p", buffer);
+    return buffer;
+
+error:
+    qsb_free(buffer);
+    buffer = NULL;
+    return NULL;
+}
+
+int qemu_save_buffer_file(MigrationState *s, QEMUSizedBuffer *buffer)
+{
+    size_t size;
+
+    size = qsb_get_length(buffer);
+
+    qsb_put_buffer(s->file, buffer, size);
+    qemu_fflush(s->file);
+    qsb_free(buffer);
+    buffer = NULL;
+    return qemu_file_get_error(s->file);
+}
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [RFC 09/13] migration/postcopy-ram: fix some helper functions to support userfaultfd write-protect
  2016-01-07 12:19 [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd zhanghailiang
                   ` (7 preceding siblings ...)
  2016-01-07 12:20 ` [Qemu-devel] [RFC 08/13] snapshot: Save VM's device state into snapshot file zhanghailiang
@ 2016-01-07 12:20 ` zhanghailiang
  2016-01-07 12:20 ` [Qemu-devel] [RFC 10/13] snapshot: Enable the write-protect notification capability for VM's RAM zhanghailiang
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 48+ messages in thread
From: zhanghailiang @ 2016-01-07 12:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, zhanghailiang, hanweidong, quintela, peter.huangpeng,
	dgilbert, amit.shah

We will re-use some helper functions for the snapshot process, and extend
these helper functions to support UFFDIO_WRITEPROTECT_MODE_WP.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 include/migration/migration.h     |  2 +
 include/migration/postcopy-ram.h  |  2 +-
 linux-headers/linux/userfaultfd.h | 21 +++++++++--
 migration/postcopy-ram.c          | 78 ++++++++++++++++++++++++++++++---------
 migration/savevm.c                |  5 ++-
 5 files changed, 83 insertions(+), 25 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 1316d22..2312c73 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -87,6 +87,8 @@ struct UserfaultState {
     int       userfault_fd;
     /* To tell the fault_thread to quit */
     int       userfault_quit_fd;
+    /* UFFDIO_REGISTER_MODE_MISSING or UFFDIO_REGISTER_MODE_WP */
+    int       mode;
 };
 
 /* State for the incoming migration */
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index e30978f..568cbdd 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -20,7 +20,7 @@ bool postcopy_ram_supported_by_host(void);
  * Make all of RAM sensitive to accesses to areas that haven't yet been written
  * and wire up anything necessary to deal with it.
  */
-int postcopy_ram_enable_notify(UserfaultState *us);
+int postcopy_ram_enable_notify(UserfaultState *us, int mode);
 
 /*
  * Initialise postcopy-ram, setting the RAM to a state where we can go into
diff --git a/linux-headers/linux/userfaultfd.h b/linux-headers/linux/userfaultfd.h
index 9057d7a..1cc3f44 100644
--- a/linux-headers/linux/userfaultfd.h
+++ b/linux-headers/linux/userfaultfd.h
@@ -17,7 +17,7 @@
  * #define UFFD_API_FEATURES (UFFD_FEATURE_PAGEFAULT_FLAG_WP | \
  *			      UFFD_FEATURE_EVENT_FORK)
  */
-#define UFFD_API_FEATURES (0)
+#define UFFD_API_FEATURES (UFFD_FEATURE_PAGEFAULT_FLAG_WP)
 #define UFFD_API_IOCTLS				\
 	((__u64)1 << _UFFDIO_REGISTER |		\
 	 (__u64)1 << _UFFDIO_UNREGISTER |	\
@@ -25,7 +25,8 @@
 #define UFFD_API_RANGE_IOCTLS			\
 	((__u64)1 << _UFFDIO_WAKE |		\
 	 (__u64)1 << _UFFDIO_COPY |		\
-	 (__u64)1 << _UFFDIO_ZEROPAGE)
+     (__u64)1 << _UFFDIO_ZEROPAGE | \
+     (__u64)1 << _UFFDIO_WRITEPROTECT)
 
 /*
  * Valid ioctl command number range with this API is from 0x00 to
@@ -40,6 +41,7 @@
 #define _UFFDIO_WAKE			(0x02)
 #define _UFFDIO_COPY			(0x03)
 #define _UFFDIO_ZEROPAGE		(0x04)
+#define _UFFDIO_WRITEPROTECT    (0x05)
 #define _UFFDIO_API			(0x3F)
 
 /* userfaultfd ioctl ids */
@@ -57,6 +59,9 @@
 #define UFFDIO_ZEROPAGE		_IOWR(UFFDIO, _UFFDIO_ZEROPAGE,	\
 				      struct uffdio_zeropage)
 
+#define UFFDIO_WRITEPROTECT    _IOWR(UFFDIO, _UFFDIO_WRITEPROTECT, \
+                     struct uffdio_writeprotect)
+
 /* read() structure */
 struct uffd_msg {
 	__u8	event;
@@ -78,7 +83,7 @@ struct uffd_msg {
 			__u64	reserved3;
 		} reserved;
 	} arg;
-} __packed;
+} __attribute__((packed));
 
 /*
  * Start at 0x12 and not at 0 to be more strict against bugs.
@@ -105,8 +110,9 @@ struct uffdio_api {
 	 * are to be considered implicitly always enabled in all kernels as
 	 * long as the uffdio_api.api requested matches UFFD_API.
 	 */
-#if 0 /* not available yet */
+
 #define UFFD_FEATURE_PAGEFAULT_FLAG_WP		(1<<0)
+#if 0
 #define UFFD_FEATURE_EVENT_FORK			(1<<1)
 #endif
 	__u64 features;
@@ -164,4 +170,11 @@ struct uffdio_zeropage {
 	__s64 zeropage;
 };
 
+struct uffdio_writeprotect {
+   struct uffdio_range range;
+   /* !WP means undo writeprotect. DONTWAKE is valid only with !WP */
+#define UFFDIO_WRITEPROTECT_MODE_WP        ((__u64)1<<0)
+#define UFFDIO_WRITEPROTECT_MODE_DONTWAKE  ((__u64)1<<1)
+   __u64 mode;
+};
 #endif /* _LINUX_USERFAULTFD_H */
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 38245d4..370197e 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -85,6 +85,11 @@ static bool ufd_version_check(int ufd)
         return false;
     }
 
+    if (!(api_struct.features & UFFD_FEATURE_PAGEFAULT_FLAG_WP)) {
+        error_report("Does not support write protect feature");
+        return false;
+    }
+
     return true;
 }
 
@@ -374,6 +379,31 @@ int postcopy_ram_prepare_discard(MigrationIncomingState *mis)
     return 0;
 }
 
+static int ram_set_pages_wp(uint64_t page_addr,
+                            uint64_t size,
+                            bool remove,
+                            int uffd)
+{
+    struct uffdio_writeprotect wp_struct;
+
+    memset(&wp_struct, 0, sizeof(wp_struct));
+    wp_struct.range.start = (uint64_t)(uintptr_t)page_addr;
+    wp_struct.range.len = size;
+    if (remove) {
+        wp_struct.mode = UFFDIO_WRITEPROTECT_MODE_DONTWAKE;
+    } else {
+        wp_struct.mode = UFFDIO_WRITEPROTECT_MODE_WP;
+    }
+    if (ioctl(uffd, UFFDIO_WRITEPROTECT, &wp_struct)) {
+        int e = errno;
+        error_report("%s: %s  page_addr: 0x%lx",
+                     __func__, strerror(e), page_addr);
+
+        return -e;
+    }
+    return 0;
+}
+
 /*
  * Mark the given area of RAM as requiring notification to unwritten areas
  * Used as a  callback on qemu_ram_foreach_block.
@@ -389,18 +419,26 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
 {
     UserfaultState *us = opaque;
     struct uffdio_register reg_struct;
+    int ret = 0;
 
     reg_struct.range.start = (uintptr_t)host_addr;
     reg_struct.range.len = length;
-    reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING;
+    reg_struct.mode = us->mode;
 
     /* Now tell our userfault_fd that it's responsible for this area */
     if (ioctl(us->userfault_fd, UFFDIO_REGISTER, &reg_struct)) {
         error_report("%s userfault register: %s", __func__, strerror(errno));
         return -1;
     }
+    /* We need to remove the pages' write permission so that the kernel
+     * can notify us.
+     */
+    if (us->mode == UFFDIO_REGISTER_MODE_WP) {
+        ret = ram_set_pages_wp((uintptr_t)host_addr, length, false,
+                                us->userfault_fd);
+    }
 
-    return 0;
+    return ret;
 }
 
 /*
@@ -414,8 +452,6 @@ static void *postcopy_ram_fault_thread(void *opaque)
     size_t hostpagesize = getpagesize();
     RAMBlock *rb = NULL;
     RAMBlock *last_rb = NULL; /* last RAMBlock we sent part of */
-    MigrationIncomingState *mis = container_of(us, MigrationIncomingState,
-                                               userfault_state);
 
     trace_postcopy_ram_fault_thread_entry();
     qemu_sem_post(&us->fault_thread_sem);
@@ -487,25 +523,31 @@ static void *postcopy_ram_fault_thread(void *opaque)
                                                 qemu_ram_get_idstr(rb),
                                                 rb_offset);
 
-        /*
-         * Send the request to the source - we want to request one
-         * of our host page sizes (which is >= TPS)
-         */
-        if (rb != last_rb) {
-            last_rb = rb;
-            migrate_send_rp_req_pages(mis, qemu_ram_get_idstr(rb),
-                                     rb_offset, hostpagesize);
-        } else {
-            /* Save some space */
-            migrate_send_rp_req_pages(mis, NULL,
-                                     rb_offset, hostpagesize);
+        if (us->mode == UFFDIO_REGISTER_MODE_MISSING) {
+            MigrationIncomingState *mis = container_of(us,
+                                                       MigrationIncomingState,
+                                                       userfault_state);
+
+            /*
+             * Send the request to the source - we want to request one
+             * of our host page sizes (which is >= TPS)
+             */
+            if (rb != last_rb) {
+                last_rb = rb;
+                migrate_send_rp_req_pages(mis, qemu_ram_get_idstr(rb),
+                                          rb_offset, hostpagesize);
+            } else {
+                /* Save some space */
+                migrate_send_rp_req_pages(mis, NULL,
+                                          rb_offset, hostpagesize);
+            }
         }
     }
     trace_postcopy_ram_fault_thread_exit();
     return NULL;
 }
 
-int postcopy_ram_enable_notify(UserfaultState *us)
+int postcopy_ram_enable_notify(UserfaultState *us, int mode)
 {
     /* Open the fd for the kernel to give us userfaults */
     us->userfault_fd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
@@ -514,7 +556,7 @@ int postcopy_ram_enable_notify(UserfaultState *us)
                      strerror(errno));
         return -1;
     }
-
+    us->mode = mode;
     /*
      * Although the host check already tested the API, we need to
      * do the check again as an ABI handshake on the new fd.
diff --git a/migration/savevm.c b/migration/savevm.c
index a59f216..8fe5328f 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -50,7 +50,7 @@
 #include "qemu/iov.h"
 #include "block/snapshot.h"
 #include "block/qapi.h"
-
+#include <linux/userfaultfd.h>
 
 #ifndef ETH_P_RARP
 #define ETH_P_RARP 0x8035
@@ -1488,7 +1488,8 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
      * However, at this point the CPU shouldn't be running, and the IO
      * shouldn't be doing anything yet so don't actually expect requests
      */
-    if (postcopy_ram_enable_notify(&mis->userfault_state)) {
+    if (postcopy_ram_enable_notify(&mis->userfault_state,
+                                   UFFDIO_REGISTER_MODE_MISSING)) {
         return -1;
     }
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [RFC 10/13] snapshot: Enable the write-protect notification capability for VM's RAM
  2016-01-07 12:19 [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd zhanghailiang
                   ` (8 preceding siblings ...)
  2016-01-07 12:20 ` [Qemu-devel] [RFC 09/13] migration/postcopy-ram: fix some helper functions to support userfaultfd write-protect zhanghailiang
@ 2016-01-07 12:20 ` zhanghailiang
  2016-01-07 12:20 ` [Qemu-devel] [RFC 11/13] snapshot/migration: Save VM's RAM into snapshot file zhanghailiang
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 48+ messages in thread
From: zhanghailiang @ 2016-01-07 12:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, zhanghailiang, hanweidong, quintela, peter.huangpeng,
	dgilbert, amit.shah

For live memory snapshot, we need to save each page before it is dirtied.
With the help of userfaultfd's write-protect notification capability,
we can get a notification when a page is about to be dirtied, and save
the page before it is written.

We need to enable the write-protect notification capability for the VM's
RAM while the VM is paused, and here we add a UserfaultState member to
struct MigrationState.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 include/migration/migration.h    |  3 +++
 include/migration/postcopy-ram.h |  3 +++
 migration/migration.c            |  6 +++++-
 migration/postcopy-ram.c         | 34 ++++++++++++++++++++++++++++++----
 4 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 2312c73..ef4c071 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -173,6 +173,9 @@ struct MigrationState
     QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests;
     /* The RAMBlock used in the last src_page_request */
     RAMBlock *last_req_rb;
+
+    UserfaultState userfault_state;
+
     bool in_snapshot; /* for snapshot */
 };
 
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index 568cbdd..978a8d7 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -96,4 +96,7 @@ int postcopy_place_page_zero(MigrationIncomingState *mis, void *host);
  */
 void *postcopy_get_tmp_page(MigrationIncomingState *mis);
 
+int postcopy_ram_disable_notify(UserfaultState *us);
+
+void qemu_mlock_all_memory(void);
 #endif
diff --git a/migration/migration.c b/migration/migration.c
index fd234eb..2001490 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1751,6 +1751,8 @@ static void *snapshot_thread(void *opaque)
         error_report("snapshot only supports the 'tcg' accel for now");
         goto error;
     }
+    /* userfaultfd's write-protect capability needs all pages to be present */
+    qemu_mlock_all_memory();
 
     qemu_savevm_state_header(ms->file);
     qemu_savevm_state_begin(ms->file, &ms->params);
@@ -1766,7 +1768,7 @@ static void *snapshot_thread(void *opaque)
             goto error;
         }
     }
-
+    postcopy_ram_enable_notify(&ms->userfault_state, UFFDIO_REGISTER_MODE_WP);
     buffer = qemu_save_device_buffer();
 
     if (old_vm_running) {
@@ -1789,6 +1791,8 @@ static void *snapshot_thread(void *opaque)
                           MIGRATION_STATUS_COMPLETED);
     }
 
+    postcopy_ram_disable_notify(&ms->userfault_state);
+
     qemu_mutex_lock_iothread();
     qemu_savevm_state_cleanup();
     qemu_bh_schedule(ms->cleanup_bh);
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 370197e..9cd854b 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -279,7 +279,7 @@ int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
     return 0;
 }
 
-static int postcopy_ram_disable_notify(UserfaultState *us)
+int postcopy_ram_disable_notify(UserfaultState *us)
 {
     if (us->have_fault_thread) {
         uint64_t tmp64;
@@ -370,9 +370,7 @@ static int nhp_range(const char *block_name, void *host_addr,
  */
 int postcopy_ram_prepare_discard(MigrationIncomingState *mis)
 {
-    if (qemu_ram_foreach_block(nhp_range, mis)) {
-        return -1;
-    }
+
 
     postcopy_state_set(POSTCOPY_INCOMING_DISCARD);
 
@@ -586,6 +584,9 @@ int postcopy_ram_enable_notify(UserfaultState *us, int mode)
         return -1;
     }
 
+    if (qemu_ram_foreach_block(nhp_range, us)) {
+        return -1;
+    }
     /*
      * Ballooning can mark pages as absent while we're postcopying
      * that would cause false userfaults.
@@ -816,3 +817,28 @@ void postcopy_discard_send_finish(MigrationState *ms, PostcopyDiscardState *pds)
 
     g_free(pds);
 }
+
+static int ram_block_mlock(const char *block_name, void *host_addr,
+                                   ram_addr_t offset, ram_addr_t length,
+                                   void *opaque)
+{
+    int ret;
+
+    ret = mlock(host_addr, length);
+    if (ret < 0) {
+        error_report("%s mlock failed: %s", __func__, strerror(errno));
+        return -1;
+    }
+    return 0;
+}
+
+void qemu_mlock_all_memory(void)
+{
+    /* Users have configured mlock, so don't do it again */
+    if (enable_mlock) {
+        return;
+    }
+    if (qemu_ram_foreach_block(ram_block_mlock, NULL)) {
+        error_report("mlock all VM's memory failed");
+    }
+}
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [RFC 11/13] snapshot/migration: Save VM's RAM into snapshot file
  2016-01-07 12:19 [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd zhanghailiang
                   ` (9 preceding siblings ...)
  2016-01-07 12:20 ` [Qemu-devel] [RFC 10/13] snapshot: Enable the write-protect notification capability for VM's RAM zhanghailiang
@ 2016-01-07 12:20 ` zhanghailiang
  2016-01-07 12:20 ` [Qemu-devel] [RFC 12/13] migration/ram: Fix some helper functions' parameter to use PageSearchStatus zhanghailiang
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 48+ messages in thread
From: zhanghailiang @ 2016-01-07 12:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, zhanghailiang, hanweidong, quintela, peter.huangpeng,
	dgilbert, amit.shah

For a live memory snapshot, we should capture the VM's state exactly
as it was at the time the snapshot command arrived. The VM's RAM pages
may be dirtied during the process of creating the snapshot, because
the VM is still running while we create it. To catch every write to a
page, we remove all pages' write permission by using userfaultfd, and
we also have a thread to deal with the write-protect notifications.

Here, the fault thread reads the address of the page that is about to
be dirtied and saves it into a queue. The snapshot thread saves that
page into the snapshot file and then removes its write protection.
In this way, we can ensure that the content of each page in the
snapshot file is the same as it was when we got the snapshot command.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 include/migration/postcopy-ram.h |  4 ++++
 migration/migration.c            | 17 +++++++++++++++--
 migration/postcopy-ram.c         | 19 +++++++++++++++----
 migration/ram.c                  | 24 +++++++++++++++++++++---
 4 files changed, 55 insertions(+), 9 deletions(-)

diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index 978a8d7..bc9ce41 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -96,6 +96,10 @@ int postcopy_place_page_zero(MigrationIncomingState *mis, void *host);
  */
 void *postcopy_get_tmp_page(MigrationIncomingState *mis);
 
+int ram_set_pages_wp(uint64_t page_addr,
+                     uint64_t size,
+                     bool remove,
+                     int uffd);
 int postcopy_ram_disable_notify(UserfaultState *us);
 
 void qemu_mlock_all_memory(void);
diff --git a/migration/migration.c b/migration/migration.c
index 2001490..3765c3b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -34,6 +34,7 @@
 #include "exec/memory.h"
 #include "exec/address-spaces.h"
 #include "hw/boards.h" /* Fix me: Remove this if we support snapshot for KVM */
+#include <linux/userfaultfd.h>
 
 #define MAX_THROTTLE  (32 << 20)      /* Migration transfer speed throttling */
 
@@ -1780,7 +1781,19 @@ static void *snapshot_thread(void *opaque)
 
     trace_snapshot_thread_setup_complete();
 
-    /* Save VM's Live state, such as RAM */
+    while (qemu_file_get_error(ms->file) == 0) {
+        if (qemu_savevm_state_iterate(ms->file, false) > 0) {
+            break;
+        }
+    }
+
+    ret = qemu_file_get_error(ms->file);
+    if (ret == 0) {
+        qemu_savevm_state_complete_precopy(ms->file, true);
+    } else {
+        migrate_set_state(ms, MIGRATION_STATUS_ACTIVE, MIGRATION_STATUS_FAILED);
+        goto out;
+    }
 
     qemu_save_buffer_file(ms, buffer);
     ret = qemu_file_get_error(ms->file);
@@ -1790,7 +1803,7 @@ static void *snapshot_thread(void *opaque)
         migrate_set_state(ms, MIGRATION_STATUS_ACTIVE,
                           MIGRATION_STATUS_COMPLETED);
     }
-
+out:
     postcopy_ram_disable_notify(&ms->userfault_state);
 
     qemu_mutex_lock_iothread();
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 9cd854b..61392d3 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -377,10 +377,10 @@ int postcopy_ram_prepare_discard(MigrationIncomingState *mis)
     return 0;
 }
 
-static int ram_set_pages_wp(uint64_t page_addr,
-                            uint64_t size,
-                            bool remove,
-                            int uffd)
+int ram_set_pages_wp(uint64_t page_addr,
+                     uint64_t size,
+                     bool remove,
+                     int uffd)
 {
     struct uffdio_writeprotect wp_struct;
 
@@ -539,6 +539,17 @@ static void *postcopy_ram_fault_thread(void *opaque)
                 migrate_send_rp_req_pages(mis, NULL,
                                           rb_offset, hostpagesize);
             }
+        } else { /* UFFDIO_REGISTER_MODE_WP */
+            MigrationState *ms = container_of(us, MigrationState,
+                                              userfault_state);
+            ret = ram_save_queue_pages(ms, qemu_ram_get_idstr(rb), rb_offset,
+                                       hostpagesize);
+
+            if (ret < 0) {
+                error_report("%s: Save: %"PRIx64 " failed!",
+                             __func__, (uint64_t)msg.arg.pagefault.address);
+                break;
+            }
         }
     }
     trace_postcopy_ram_fault_thread_exit();
diff --git a/migration/ram.c b/migration/ram.c
index c87663f..fc4c788 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -734,7 +734,11 @@ static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset,
     ram_addr_t current_addr;
     uint8_t *p;
     int ret;
-    bool send_async = true;
+    /*
+    * For snapshot, we should not be async, or the content of page may be
+    * changed before it has actually been saved.
+    */
+    bool send_async = !migration_in_snapshot(migrate_get_current());
 
     p = block->host + offset;
 
@@ -1087,7 +1091,7 @@ static bool get_queued_page(MigrationState *ms, PageSearchStatus *pss,
          * even if this queue request was received after the background
          * search already sent it.
          */
-        if (block) {
+        if (block && !migration_in_snapshot(ms)) {
             unsigned long *bitmap;
             bitmap = atomic_rcu_read(&migration_bitmap_rcu)->bmap;
             dirty = test_bit(*ram_addr_abs >> TARGET_PAGE_BITS, bitmap);
@@ -1351,6 +1355,19 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
             pages = ram_save_host_page(ms, f, pss.block, &pss.offset,
                                        last_stage, bytes_transferred,
                                        dirty_ram_abs);
+            /* For snapshot, we will remove the page write-protect here */
+            if (migration_in_snapshot(ms)) {
+                int ret;
+                uint64_t host_addr = (uint64_t)(pss.block->host + pss.offset);
+
+                ret = ram_set_pages_wp(host_addr, getpagesize(), true,
+                                       ms->userfault_state.userfault_fd);
+                if (ret < 0) {
+                    error_report("Failed to remove the write-protect for page:"
+                                 "%"PRIx64 " length: %d, block: %s", host_addr,
+                                 getpagesize(), pss.block->idstr);
+                }
+            }
         }
     } while (!pages && again);
 
@@ -2031,7 +2048,8 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 {
     rcu_read_lock();
 
-    if (!migration_in_postcopy(migrate_get_current())) {
+    if (!migration_in_postcopy(migrate_get_current()) &&
+        !migration_in_snapshot(migrate_get_current())) {
         migration_bitmap_sync();
     }
 
-- 
1.8.3.1


* [Qemu-devel] [RFC 12/13] migration/ram: Fix some helper functions' parameter to use PageSearchStatus
  2016-01-07 12:19 [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd zhanghailiang
                   ` (10 preceding siblings ...)
  2016-01-07 12:20 ` [Qemu-devel] [RFC 11/13] snapshot/migration: Save VM's RAM into snapshot file zhanghailiang
@ 2016-01-07 12:20 ` zhanghailiang
  2016-01-11 17:55   ` Dr. David Alan Gilbert
  2016-01-07 12:20 ` [Qemu-devel] [RFC 13/13] snapshot: Remove page's write-protect and copy the content during setup stage zhanghailiang
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 48+ messages in thread
From: zhanghailiang @ 2016-01-07 12:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, zhanghailiang, hanweidong, quintela, peter.huangpeng,
	dgilbert, amit.shah

Some helper functions take the parameters 'RAMBlock *block' and
'ram_addr_t *offset'. We can pass 'PageSearchStatus *pss' directly
instead. This change reduces the number of parameters of these helper
functions and also makes it easy to add new parameters later.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 migration/ram.c | 33 +++++++++++++++++++--------------
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index fc4c788..8656719 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -726,7 +726,7 @@ static int save_zero_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
  * @last_stage: if we are at the completion stage
  * @bytes_transferred: increase it with the number of transferred bytes
  */
-static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset,
+static int ram_save_page(QEMUFile *f, PageSearchStatus *pss,
                          bool last_stage, uint64_t *bytes_transferred)
 {
     int pages = -1;
@@ -739,6 +739,8 @@ static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset,
     * changed before it has actually been saved.
     */
     bool send_async = !migration_in_snapshot(migrate_get_current());
+    RAMBlock *block = pss->block;
+    ram_addr_t offset = pss->offset;
 
     p = block->host + offset;
 
@@ -913,14 +915,16 @@ static int compress_page_with_multi_thread(QEMUFile *f, RAMBlock *block,
  * @last_stage: if we are at the completion stage
  * @bytes_transferred: increase it with the number of transferred bytes
  */
-static int ram_save_compressed_page(QEMUFile *f, RAMBlock *block,
-                                    ram_addr_t offset, bool last_stage,
+static int ram_save_compressed_page(QEMUFile *f, PageSearchStatus *pss,
+                                    bool last_stage,
                                     uint64_t *bytes_transferred)
 {
     int pages = -1;
     uint64_t bytes_xmit;
     uint8_t *p;
     int ret;
+    RAMBlock *block = pss->block;
+    ram_addr_t offset = pss->offset;
 
     p = block->host + offset;
 
@@ -1230,7 +1234,7 @@ err:
  * Returns: Number of pages written.
  */
 static int ram_save_target_page(MigrationState *ms, QEMUFile *f,
-                                RAMBlock *block, ram_addr_t offset,
+                                PageSearchStatus *pss,
                                 bool last_stage,
                                 uint64_t *bytes_transferred,
                                 ram_addr_t dirty_ram_abs)
@@ -1241,11 +1245,11 @@ static int ram_save_target_page(MigrationState *ms, QEMUFile *f,
     if (migration_bitmap_clear_dirty(dirty_ram_abs)) {
         unsigned long *unsentmap;
         if (compression_switch && migrate_use_compression()) {
-            res = ram_save_compressed_page(f, block, offset,
+            res = ram_save_compressed_page(f, pss,
                                            last_stage,
                                            bytes_transferred);
         } else {
-            res = ram_save_page(f, block, offset, last_stage,
+            res = ram_save_page(f, pss, last_stage,
                                 bytes_transferred);
         }
 
@@ -1261,7 +1265,7 @@ static int ram_save_target_page(MigrationState *ms, QEMUFile *f,
          * to the stream.
          */
         if (res > 0) {
-            last_sent_block = block;
+            last_sent_block = pss->block;
         }
     }
 
@@ -1285,26 +1289,27 @@ static int ram_save_target_page(MigrationState *ms, QEMUFile *f,
  * @bytes_transferred: increase it with the number of transferred bytes
  * @dirty_ram_abs: Address of the start of the dirty page in ram_addr_t space
  */
-static int ram_save_host_page(MigrationState *ms, QEMUFile *f, RAMBlock *block,
-                              ram_addr_t *offset, bool last_stage,
+static int ram_save_host_page(MigrationState *ms, QEMUFile *f,
+                              PageSearchStatus *pss,
+                              bool last_stage,
                               uint64_t *bytes_transferred,
                               ram_addr_t dirty_ram_abs)
 {
     int tmppages, pages = 0;
     do {
-        tmppages = ram_save_target_page(ms, f, block, *offset, last_stage,
+        tmppages = ram_save_target_page(ms, f, pss, last_stage,
                                         bytes_transferred, dirty_ram_abs);
         if (tmppages < 0) {
             return tmppages;
         }
 
         pages += tmppages;
-        *offset += TARGET_PAGE_SIZE;
+        pss->offset += TARGET_PAGE_SIZE;
         dirty_ram_abs += TARGET_PAGE_SIZE;
-    } while (*offset & (qemu_host_page_size - 1));
+    } while (pss->offset & (qemu_host_page_size - 1));
 
     /* The offset we leave with is the last one we looked at */
-    *offset -= TARGET_PAGE_SIZE;
+    pss->offset -= TARGET_PAGE_SIZE;
     return pages;
 }
 
@@ -1352,7 +1357,7 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
         }
 
         if (found) {
-            pages = ram_save_host_page(ms, f, pss.block, &pss.offset,
+            pages = ram_save_host_page(ms, f, &pss,
                                        last_stage, bytes_transferred,
                                        dirty_ram_abs);
             /* For snapshot, we will remove the page write-protect here */
-- 
1.8.3.1


* [Qemu-devel] [RFC 13/13] snapshot: Remove page's write-protect and copy the content during setup stage
  2016-01-07 12:19 [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd zhanghailiang
                   ` (11 preceding siblings ...)
  2016-01-07 12:20 ` [Qemu-devel] [RFC 12/13] migration/ram: Fix some helper functions' parameter to use PageSearchStatus zhanghailiang
@ 2016-01-07 12:20 ` zhanghailiang
  2016-07-13 17:52   ` Dr. David Alan Gilbert
  2016-07-04 12:22 ` [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd Baptiste Reynal
  2016-07-13 18:02 ` Dr. David Alan Gilbert
  14 siblings, 1 reply; 48+ messages in thread
From: zhanghailiang @ 2016-01-07 12:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, zhanghailiang, hanweidong, quintela, peter.huangpeng,
	dgilbert, amit.shah

If the snapshot thread modifies the VM's RAM pages during the setup
stage, after write-protect notification has been enabled, the write
will get stuck: we only remove a page's write protection in the savevm
process, so the thread is blocked by its own fault.

To fix this bug, we remove the page's write protection in the fault
thread during the setup stage. Besides, we must not try to take the
global lock after the setup stage, or we may hit a deadlock.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 include/migration/migration.h |  4 ++--
 migration/migration.c         |  2 +-
 migration/postcopy-ram.c      | 17 ++++++++++++++++-
 migration/ram.c               | 37 +++++++++++++++++++++++++++++++------
 4 files changed, 50 insertions(+), 10 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index ef4c071..435de31 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -127,7 +127,7 @@ struct MigrationSrcPageRequest {
     RAMBlock *rb;
     hwaddr    offset;
     hwaddr    len;
-
+    uint8_t *pages_copy_addr;
     QSIMPLEQ_ENTRY(MigrationSrcPageRequest) next_req;
 };
 
@@ -333,7 +333,7 @@ void global_state_store_running(void);
 
 void flush_page_queue(MigrationState *ms);
 int ram_save_queue_pages(MigrationState *ms, const char *rbname,
-                         ram_addr_t start, ram_addr_t len);
+                         ram_addr_t start, ram_addr_t len, bool copy_pages);
 
 PostcopyState postcopy_state_get(void);
 /* Set the state and return the old state */
diff --git a/migration/migration.c b/migration/migration.c
index 3765c3b..bf4c7a1 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1248,7 +1248,7 @@ static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
         return;
     }
 
-    if (ram_save_queue_pages(ms, rbname, start, len)) {
+    if (ram_save_queue_pages(ms, rbname, start, len, false)) {
         mark_source_rp_bad(ms);
     }
 }
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 61392d3..2cf477d 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -543,13 +543,28 @@ static void *postcopy_ram_fault_thread(void *opaque)
             MigrationState *ms = container_of(us, MigrationState,
                                               userfault_state);
             ret = ram_save_queue_pages(ms, qemu_ram_get_idstr(rb), rb_offset,
-                                       hostpagesize);
+                                       hostpagesize, true);
 
             if (ret < 0) {
                 error_report("%s: Save: %"PRIx64 " failed!",
                              __func__, (uint64_t)msg.arg.pagefault.address);
                 break;
             }
+
+            /* Note: In the setup process, snapshot_thread may modify the
+             * VM's write-protected pages; we must not block it here, or
+             * we would deadlock.
+             */
+            if (migration_in_setup(ms)) {
+                uint64_t host = msg.arg.pagefault.address;
+
+                host &= ~(hostpagesize - 1);
+                ret = ram_set_pages_wp(host, getpagesize(), true,
+                                       us->userfault_fd);
+                if (ret < 0) {
+                    error_report("Remove page's write-protect failed");
+                }
+            }
         }
     }
     trace_postcopy_ram_fault_thread_exit();
diff --git a/migration/ram.c b/migration/ram.c
index 8656719..747f9aa 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -233,6 +233,7 @@ struct PageSearchStatus {
     ram_addr_t   offset;
     /* Set once we wrap around */
     bool         complete_round;
+    uint8_t *pages_copy;
 };
 typedef struct PageSearchStatus PageSearchStatus;
 
@@ -742,7 +743,12 @@ static int ram_save_page(QEMUFile *f, PageSearchStatus *pss,
     RAMBlock *block = pss->block;
     ram_addr_t offset = pss->offset;
 
-    p = block->host + offset;
+    /* If we have a copy of this page, use the backup page first */
+    if (pss->pages_copy) {
+        p = pss->pages_copy;
+    } else {
+        p = block->host + offset;
+    }
 
     /* In doubt sent page as normal */
     bytes_xmit = 0;
@@ -926,7 +932,12 @@ static int ram_save_compressed_page(QEMUFile *f, PageSearchStatus *pss,
     RAMBlock *block = pss->block;
     ram_addr_t offset = pss->offset;
 
-    p = block->host + offset;
+    /* If we have a copy of this page, use the backup first */
+    if (pss->pages_copy) {
+        p = pss->pages_copy;
+    } else {
+        p = block->host + offset;
+    }
 
     bytes_xmit = 0;
     ret = ram_control_save_page(f, block->offset,
@@ -1043,7 +1054,7 @@ static bool find_dirty_block(QEMUFile *f, PageSearchStatus *pss,
  * Returns:      block (or NULL if none available)
  */
 static RAMBlock *unqueue_page(MigrationState *ms, ram_addr_t *offset,
-                              ram_addr_t *ram_addr_abs)
+                              ram_addr_t *ram_addr_abs, uint8_t **pages_copy_addr)
 {
     RAMBlock *block = NULL;
 
@@ -1055,7 +1066,7 @@ static RAMBlock *unqueue_page(MigrationState *ms, ram_addr_t *offset,
         *offset = entry->offset;
         *ram_addr_abs = (entry->offset + entry->rb->offset) &
                         TARGET_PAGE_MASK;
-
+        *pages_copy_addr = entry->pages_copy_addr;
         if (entry->len > TARGET_PAGE_SIZE) {
             entry->len -= TARGET_PAGE_SIZE;
             entry->offset += TARGET_PAGE_SIZE;
@@ -1086,9 +1097,10 @@ static bool get_queued_page(MigrationState *ms, PageSearchStatus *pss,
     RAMBlock  *block;
     ram_addr_t offset;
     bool dirty;
+    uint8_t *pages_backup_addr = NULL;
 
     do {
-        block = unqueue_page(ms, &offset, ram_addr_abs);
+        block = unqueue_page(ms, &offset, ram_addr_abs, &pages_backup_addr);
         /*
          * We're sending this page, and since it's postcopy nothing else
          * will dirty it, and we must make sure it doesn't get sent again
@@ -1130,6 +1142,7 @@ static bool get_queued_page(MigrationState *ms, PageSearchStatus *pss,
          */
         pss->block = block;
         pss->offset = offset;
+        pss->pages_copy = pages_backup_addr;
     }
 
     return !!block;
@@ -1166,7 +1179,7 @@ void flush_page_queue(MigrationState *ms)
  *   Return: 0 on success
  */
 int ram_save_queue_pages(MigrationState *ms, const char *rbname,
-                         ram_addr_t start, ram_addr_t len)
+                         ram_addr_t start, ram_addr_t len, bool copy_pages)
 {
     RAMBlock *ramblock;
 
@@ -1206,6 +1219,17 @@ int ram_save_queue_pages(MigrationState *ms, const char *rbname,
     new_entry->rb = ramblock;
     new_entry->offset = start;
     new_entry->len = len;
+    if (copy_pages) {
+        /* Fix me: better to implement a memory pool */
+        new_entry->pages_copy_addr = g_try_malloc0(len);
+
+        if (!new_entry->pages_copy_addr) {
+            error_report("%s: Failed to alloc memory", __func__);
+            return -1;
+        }
+
+        memcpy(new_entry->pages_copy_addr, ramblock_ptr(ramblock, start), len);
+    }
 
     memory_region_ref(ramblock->mr);
     qemu_mutex_lock(&ms->src_page_req_mutex);
@@ -1342,6 +1366,7 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
     pss.block = last_seen_block;
     pss.offset = last_offset;
     pss.complete_round = false;
+    pss.pages_copy = NULL;
 
     if (!pss.block) {
         pss.block = QLIST_FIRST_RCU(&ram_list.blocks);
-- 
1.8.3.1


* Re: [Qemu-devel] [RFC 12/13] migration/ram: Fix some helper functions' parameter to use PageSearchStatus
  2016-01-07 12:20 ` [Qemu-devel] [RFC 12/13] migration/ram: Fix some helper functions' parameter to use PageSearchStatus zhanghailiang
@ 2016-01-11 17:55   ` Dr. David Alan Gilbert
  2016-01-12 12:59     ` Hailiang Zhang
  0 siblings, 1 reply; 48+ messages in thread
From: Dr. David Alan Gilbert @ 2016-01-11 17:55 UTC (permalink / raw)
  To: zhanghailiang
  Cc: aarcange, hanweidong, quintela, peter.huangpeng, qemu-devel, amit.shah

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> Some helper functions take the parameters 'RAMBlock *block' and
> 'ram_addr_t *offset'. We can pass 'PageSearchStatus *pss' directly
> instead. This change reduces the number of parameters of these helper
> functions and also makes it easy to add new parameters later.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

You should post this as a separate patch, it's independent of
the rest of the series and makes sense to go in anyway.

Dave

> ---
>  migration/ram.c | 33 +++++++++++++++++++--------------
>  1 file changed, 19 insertions(+), 14 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index fc4c788..8656719 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -726,7 +726,7 @@ static int save_zero_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
>   * @last_stage: if we are at the completion stage
>   * @bytes_transferred: increase it with the number of transferred bytes
>   */
> -static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset,
> +static int ram_save_page(QEMUFile *f, PageSearchStatus *pss,
>                           bool last_stage, uint64_t *bytes_transferred)
>  {
>      int pages = -1;
> @@ -739,6 +739,8 @@ static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset,
>      * changed before it has actually been saved.
>      */
>      bool send_async = !migration_in_snapshot(migrate_get_current());
> +    RAMBlock *block = pss->block;
> +    ram_addr_t offset = pss->offset;
>  
>      p = block->host + offset;
>  
> @@ -913,14 +915,16 @@ static int compress_page_with_multi_thread(QEMUFile *f, RAMBlock *block,
>   * @last_stage: if we are at the completion stage
>   * @bytes_transferred: increase it with the number of transferred bytes
>   */
> -static int ram_save_compressed_page(QEMUFile *f, RAMBlock *block,
> -                                    ram_addr_t offset, bool last_stage,
> +static int ram_save_compressed_page(QEMUFile *f, PageSearchStatus *pss,
> +                                    bool last_stage,
>                                      uint64_t *bytes_transferred)
>  {
>      int pages = -1;
>      uint64_t bytes_xmit;
>      uint8_t *p;
>      int ret;
> +    RAMBlock *block = pss->block;
> +    ram_addr_t offset = pss->offset;
>  
>      p = block->host + offset;
>  
> @@ -1230,7 +1234,7 @@ err:
>   * Returns: Number of pages written.
>   */
>  static int ram_save_target_page(MigrationState *ms, QEMUFile *f,
> -                                RAMBlock *block, ram_addr_t offset,
> +                                PageSearchStatus *pss,
>                                  bool last_stage,
>                                  uint64_t *bytes_transferred,
>                                  ram_addr_t dirty_ram_abs)
> @@ -1241,11 +1245,11 @@ static int ram_save_target_page(MigrationState *ms, QEMUFile *f,
>      if (migration_bitmap_clear_dirty(dirty_ram_abs)) {
>          unsigned long *unsentmap;
>          if (compression_switch && migrate_use_compression()) {
> -            res = ram_save_compressed_page(f, block, offset,
> +            res = ram_save_compressed_page(f, pss,
>                                             last_stage,
>                                             bytes_transferred);
>          } else {
> -            res = ram_save_page(f, block, offset, last_stage,
> +            res = ram_save_page(f, pss, last_stage,
>                                  bytes_transferred);
>          }
>  
> @@ -1261,7 +1265,7 @@ static int ram_save_target_page(MigrationState *ms, QEMUFile *f,
>           * to the stream.
>           */
>          if (res > 0) {
> -            last_sent_block = block;
> +            last_sent_block = pss->block;
>          }
>      }
>  
> @@ -1285,26 +1289,27 @@ static int ram_save_target_page(MigrationState *ms, QEMUFile *f,
>   * @bytes_transferred: increase it with the number of transferred bytes
>   * @dirty_ram_abs: Address of the start of the dirty page in ram_addr_t space
>   */
> -static int ram_save_host_page(MigrationState *ms, QEMUFile *f, RAMBlock *block,
> -                              ram_addr_t *offset, bool last_stage,
> +static int ram_save_host_page(MigrationState *ms, QEMUFile *f,
> +                              PageSearchStatus *pss,
> +                              bool last_stage,
>                                uint64_t *bytes_transferred,
>                                ram_addr_t dirty_ram_abs)
>  {
>      int tmppages, pages = 0;
>      do {
> -        tmppages = ram_save_target_page(ms, f, block, *offset, last_stage,
> +        tmppages = ram_save_target_page(ms, f, pss, last_stage,
>                                          bytes_transferred, dirty_ram_abs);
>          if (tmppages < 0) {
>              return tmppages;
>          }
>  
>          pages += tmppages;
> -        *offset += TARGET_PAGE_SIZE;
> +        pss->offset += TARGET_PAGE_SIZE;
>          dirty_ram_abs += TARGET_PAGE_SIZE;
> -    } while (*offset & (qemu_host_page_size - 1));
> +    } while (pss->offset & (qemu_host_page_size - 1));
>  
>      /* The offset we leave with is the last one we looked at */
> -    *offset -= TARGET_PAGE_SIZE;
> +    pss->offset -= TARGET_PAGE_SIZE;
>      return pages;
>  }
>  
> @@ -1352,7 +1357,7 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
>          }
>  
>          if (found) {
> -            pages = ram_save_host_page(ms, f, pss.block, &pss.offset,
> +            pages = ram_save_host_page(ms, f, &pss,
>                                         last_stage, bytes_transferred,
>                                         dirty_ram_abs);
>              /* For snapshot, we will remove the page write-protect here */
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 03/13] migration: Allow -incoming to work on file: urls
  2016-01-07 12:19 ` [Qemu-devel] [RFC 03/13] migration: Allow -incoming " zhanghailiang
@ 2016-01-11 20:02   ` Dr. David Alan Gilbert
  2016-01-12 13:04     ` Hailiang Zhang
  0 siblings, 1 reply; 48+ messages in thread
From: Dr. David Alan Gilbert @ 2016-01-11 20:02 UTC (permalink / raw)
  To: zhanghailiang
  Cc: aarcange, Benoit Canet, hanweidong, quintela, peter.huangpeng,
	qemu-devel, amit.shah

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> Usage:
> -incoming file:/path/to/vm_statefile
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Benoit Canet <benoit.canet@gmail.com>

This could again be split out of this series; however I have some comments.

> ---
> - Rebase on qemu 2.5
> - Use qemu_strtol instead of strtol
> ---
>  include/migration/migration.h |  4 +++-
>  migration/fd.c                | 28 +++++++++++++++++++++++++---
>  migration/migration.c         |  4 +++-
>  3 files changed, 31 insertions(+), 5 deletions(-)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index bf4f8e9..3f372a5 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -191,7 +191,9 @@ void unix_start_incoming_migration(const char *path, Error **errp);
>  
>  void unix_start_outgoing_migration(MigrationState *s, const char *path, Error **errp);
>  
> -void fd_start_incoming_migration(const char *path, Error **errp);
> +void fd_start_incoming_migration(const char *path, int fd, Error **errp);
> +
> +void file_start_incoming_migration(const char *filename, Error **errp);
>  
>  void fd_start_outgoing_migration(MigrationState *s, const char *fdname,
>                                   int outfd, Error **errp);
> diff --git a/migration/fd.c b/migration/fd.c
> index b62161f..ac38256 100644
> --- a/migration/fd.c
> +++ b/migration/fd.c
> @@ -81,14 +81,24 @@ static void fd_accept_incoming_migration(void *opaque)
>      process_incoming_migration(f);
>  }
>  
> -void fd_start_incoming_migration(const char *infd, Error **errp)
> +void fd_start_incoming_migration(const char *infd,  int fd, Error **errp)
>  {
> -    int fd;
>      QEMUFile *f;
> +    int err;
> +    long in_fd;
>  
>      DPRINTF("Attempting to start an incoming migration via fd\n");
>  
> -    fd = strtol(infd, NULL, 0);
> +    if (infd) {
> +        err = qemu_strtol(infd, NULL, 0, &in_fd);
> +        if (err < 0) {
> +            error_setg_errno(errp, -err, "Failed to convert string '%s'"
> +                            " to number", infd);
> +            return;
> +        }
> +        fd = (int)in_fd;
> +    }
> +
>      if (fd_is_socket(fd)) {
>          f = qemu_fopen_socket(fd, "rb");
>      } else {

I think I'd prefer to see something like:
void fd_start_incoming_migration_core(int fd, Error **errp)

void fd_start_incoming_migration(const char *infd, Error **errp)
{
  qemu_strtol
  fd_start_incoming_migration_core
...
}

(I've always done -incoming "exec:cat file"  but this is neater)

Dave

> @@ -101,3 +111,15 @@ void fd_start_incoming_migration(const char *infd, Error **errp)
>  
>      qemu_set_fd_handler(fd, fd_accept_incoming_migration, NULL, f);
>  }
> +
> +void file_start_incoming_migration(const char *filename, Error **errp)
> +{
> +    int fd;
> +
> +    fd = qemu_open(filename, O_RDONLY);
> +    if (fd < 0) {
> +        error_setg_errno(errp, errno, "Failed to open file:%s", filename);
> +        return;
> +    }
> +    fd_start_incoming_migration(NULL, fd, NULL);
> +}
> diff --git a/migration/migration.c b/migration/migration.c
> index 3ec3b85..e54910d 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -314,7 +314,9 @@ void qemu_start_incoming_migration(const char *uri, Error **errp)
>      } else if (strstart(uri, "unix:", &p)) {
>          unix_start_incoming_migration(p, errp);
>      } else if (strstart(uri, "fd:", &p)) {
> -        fd_start_incoming_migration(p, errp);
> +        fd_start_incoming_migration(p, -1, errp);
> +    } else if (strstart(uri, "file:", &p)) {
> +        file_start_incoming_migration(p, errp);
>  #endif
>      } else {
>          error_setg(errp, "unknown migration protocol: %s", uri);
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [RFC 12/13] migration/ram: Fix some helper functions' parameter to use PageSearchStatus
  2016-01-11 17:55   ` Dr. David Alan Gilbert
@ 2016-01-12 12:59     ` Hailiang Zhang
  0 siblings, 0 replies; 48+ messages in thread
From: Hailiang Zhang @ 2016-01-12 12:59 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, quintela, hanweidong, peter.huangpeng, qemu-devel, amit.shah

On 2016/1/12 1:55, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> Some helper functions take the parameters 'RAMBlock *block' and 'ram_addr_t *offset'.
>> We can pass 'PageSearchStatus *pss' directly instead. With this change, we
>> reduce the number of parameters of these helper functions, and it also
>> becomes easy to add new parameters to them.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
> You should post this as a separate patch; it's independent of
> the rest of the series and makes sense to go in anyway.
>

Thanks for your quick feedback. I will post it later.

> Dave
>
>> ---
>>   migration/ram.c | 33 +++++++++++++++++++--------------
>>   1 file changed, 19 insertions(+), 14 deletions(-)
>>
>> diff --git a/migration/ram.c b/migration/ram.c
>> index fc4c788..8656719 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -726,7 +726,7 @@ static int save_zero_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
>>    * @last_stage: if we are at the completion stage
>>    * @bytes_transferred: increase it with the number of transferred bytes
>>    */
>> -static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset,
>> +static int ram_save_page(QEMUFile *f, PageSearchStatus *pss,
>>                            bool last_stage, uint64_t *bytes_transferred)
>>   {
>>       int pages = -1;
>> @@ -739,6 +739,8 @@ static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset,
>>       * changed before it is really been saved.
>>       */
>>       bool send_async = !migration_in_snapshot(migrate_get_current());
>> +    RAMBlock *block = pss->block;
>> +    ram_addr_t offset = pss->offset;
>>
>>       p = block->host + offset;
>>
>> @@ -913,14 +915,16 @@ static int compress_page_with_multi_thread(QEMUFile *f, RAMBlock *block,
>>    * @last_stage: if we are at the completion stage
>>    * @bytes_transferred: increase it with the number of transferred bytes
>>    */
>> -static int ram_save_compressed_page(QEMUFile *f, RAMBlock *block,
>> -                                    ram_addr_t offset, bool last_stage,
>> +static int ram_save_compressed_page(QEMUFile *f, PageSearchStatus *pss,
>> +                                    bool last_stage,
>>                                       uint64_t *bytes_transferred)
>>   {
>>       int pages = -1;
>>       uint64_t bytes_xmit;
>>       uint8_t *p;
>>       int ret;
>> +    RAMBlock *block = pss->block;
>> +    ram_addr_t offset = pss->offset;
>>
>>       p = block->host + offset;
>>
>> @@ -1230,7 +1234,7 @@ err:
>>    * Returns: Number of pages written.
>>    */
>>   static int ram_save_target_page(MigrationState *ms, QEMUFile *f,
>> -                                RAMBlock *block, ram_addr_t offset,
>> +                                PageSearchStatus *pss,
>>                                   bool last_stage,
>>                                   uint64_t *bytes_transferred,
>>                                   ram_addr_t dirty_ram_abs)
>> @@ -1241,11 +1245,11 @@ static int ram_save_target_page(MigrationState *ms, QEMUFile *f,
>>       if (migration_bitmap_clear_dirty(dirty_ram_abs)) {
>>           unsigned long *unsentmap;
>>           if (compression_switch && migrate_use_compression()) {
>> -            res = ram_save_compressed_page(f, block, offset,
>> +            res = ram_save_compressed_page(f, pss,
>>                                              last_stage,
>>                                              bytes_transferred);
>>           } else {
>> -            res = ram_save_page(f, block, offset, last_stage,
>> +            res = ram_save_page(f, pss, last_stage,
>>                                   bytes_transferred);
>>           }
>>
>> @@ -1261,7 +1265,7 @@ static int ram_save_target_page(MigrationState *ms, QEMUFile *f,
>>            * to the stream.
>>            */
>>           if (res > 0) {
>> -            last_sent_block = block;
>> +            last_sent_block = pss->block;
>>           }
>>       }
>>
>> @@ -1285,26 +1289,27 @@ static int ram_save_target_page(MigrationState *ms, QEMUFile *f,
>>    * @bytes_transferred: increase it with the number of transferred bytes
>>    * @dirty_ram_abs: Address of the start of the dirty page in ram_addr_t space
>>    */
>> -static int ram_save_host_page(MigrationState *ms, QEMUFile *f, RAMBlock *block,
>> -                              ram_addr_t *offset, bool last_stage,
>> +static int ram_save_host_page(MigrationState *ms, QEMUFile *f,
>> +                              PageSearchStatus *pss,
>> +                              bool last_stage,
>>                                 uint64_t *bytes_transferred,
>>                                 ram_addr_t dirty_ram_abs)
>>   {
>>       int tmppages, pages = 0;
>>       do {
>> -        tmppages = ram_save_target_page(ms, f, block, *offset, last_stage,
>> +        tmppages = ram_save_target_page(ms, f, pss, last_stage,
>>                                           bytes_transferred, dirty_ram_abs);
>>           if (tmppages < 0) {
>>               return tmppages;
>>           }
>>
>>           pages += tmppages;
>> -        *offset += TARGET_PAGE_SIZE;
>> +        pss->offset += TARGET_PAGE_SIZE;
>>           dirty_ram_abs += TARGET_PAGE_SIZE;
>> -    } while (*offset & (qemu_host_page_size - 1));
>> +    } while (pss->offset & (qemu_host_page_size - 1));
>>
>>       /* The offset we leave with is the last one we looked at */
>> -    *offset -= TARGET_PAGE_SIZE;
>> +    pss->offset -= TARGET_PAGE_SIZE;
>>       return pages;
>>   }
>>
>> @@ -1352,7 +1357,7 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
>>           }
>>
>>           if (found) {
>> -            pages = ram_save_host_page(ms, f, pss.block, &pss.offset,
>> +            pages = ram_save_host_page(ms, f, &pss,
>>                                          last_stage, bytes_transferred,
>>                                          dirty_ram_abs);
>>               /* For snapshot, we will remove the page write-protect here */
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>


* Re: [Qemu-devel] [RFC 03/13] migration: Allow -incoming to work on file: urls
  2016-01-11 20:02   ` Dr. David Alan Gilbert
@ 2016-01-12 13:04     ` Hailiang Zhang
  0 siblings, 0 replies; 48+ messages in thread
From: Hailiang Zhang @ 2016-01-12 13:04 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, Benoit Canet, quintela, hanweidong, peter.huangpeng,
	qemu-devel, amit.shah

On 2016/1/12 4:02, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> Usage:
>> -incoming file:/path/to/vm_statefile
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Benoit Canet <benoit.canet@gmail.com>
>
> This could again be split out of this series; however I have some comments.
>
>> ---
>> - Rebase on qemu 2.5
>> - Use qemu_strtol instead of strtol
>> ---
>>   include/migration/migration.h |  4 +++-
>>   migration/fd.c                | 28 +++++++++++++++++++++++++---
>>   migration/migration.c         |  4 +++-
>>   3 files changed, 31 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/migration/migration.h b/include/migration/migration.h
>> index bf4f8e9..3f372a5 100644
>> --- a/include/migration/migration.h
>> +++ b/include/migration/migration.h
>> @@ -191,7 +191,9 @@ void unix_start_incoming_migration(const char *path, Error **errp);
>>
>>   void unix_start_outgoing_migration(MigrationState *s, const char *path, Error **errp);
>>
>> -void fd_start_incoming_migration(const char *path, Error **errp);
>> +void fd_start_incoming_migration(const char *path, int fd, Error **errp);
>> +
>> +void file_start_incoming_migration(const char *filename, Error **errp);
>>
>>   void fd_start_outgoing_migration(MigrationState *s, const char *fdname,
>>                                    int outfd, Error **errp);
>> diff --git a/migration/fd.c b/migration/fd.c
>> index b62161f..ac38256 100644
>> --- a/migration/fd.c
>> +++ b/migration/fd.c
>> @@ -81,14 +81,24 @@ static void fd_accept_incoming_migration(void *opaque)
>>       process_incoming_migration(f);
>>   }
>>
>> -void fd_start_incoming_migration(const char *infd, Error **errp)
>> +void fd_start_incoming_migration(const char *infd,  int fd, Error **errp)
>>   {
>> -    int fd;
>>       QEMUFile *f;
>> +    int err;
>> +    long in_fd;
>>
>>       DPRINTF("Attempting to start an incoming migration via fd\n");
>>
>> -    fd = strtol(infd, NULL, 0);
>> +    if (infd) {
>> +        err = qemu_strtol(infd, NULL, 0, &in_fd);
>> +        if (err < 0) {
>> +            error_setg_errno(errp, -err, "Failed to convert string '%s'"
>> +                            " to number", infd);
>> +            return;
>> +        }
>> +        fd = (int)in_fd;
>> +    }
>> +
>>       if (fd_is_socket(fd)) {
>>           f = qemu_fopen_socket(fd, "rb");
>>       } else {
>
> I think I'd prefer to see something like:
> void fd_start_incoming_migration_core(int fd, Error **errp)
>
> void fd_start_incoming_migration(const char *infd, Error **errp)
> {
>    qemu_strtol
>    fd_start_incoming_migration_core
> ...
> }
>

Hmm, good idea; it avoids changing this function's signature. I will fix it.

Thanks,
Hailiang

> (I've always done -incoming "exec:cat file"  but this is neater)
>
> Dave
>
>> @@ -101,3 +111,15 @@ void fd_start_incoming_migration(const char *infd, Error **errp)
>>
>>       qemu_set_fd_handler(fd, fd_accept_incoming_migration, NULL, f);
>>   }
>> +
>> +void file_start_incoming_migration(const char *filename, Error **errp)
>> +{
>> +    int fd;
>> +
>> +    fd = qemu_open(filename, O_RDONLY);
>> +    if (fd < 0) {
>> +        error_setg_errno(errp, errno, "Failed to open file:%s", filename);
>> +        return;
>> +    }
>> +    fd_start_incoming_migration(NULL, fd, NULL);
>> +}
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 3ec3b85..e54910d 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -314,7 +314,9 @@ void qemu_start_incoming_migration(const char *uri, Error **errp)
>>       } else if (strstart(uri, "unix:", &p)) {
>>           unix_start_incoming_migration(p, errp);
>>       } else if (strstart(uri, "fd:", &p)) {
>> -        fd_start_incoming_migration(p, errp);
>> +        fd_start_incoming_migration(p, -1, errp);
>> +    } else if (strstart(uri, "file:", &p)) {
>> +        file_start_incoming_migration(p, errp);
>>   #endif
>>       } else {
>>           error_setg(errp, "unknown migration protocol: %s", uri);
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>


* Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
  2016-01-07 12:19 [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd zhanghailiang
                   ` (12 preceding siblings ...)
  2016-01-07 12:20 ` [Qemu-devel] [RFC 13/13] snapshot: Remove page's write-protect and copy the content during setup stage zhanghailiang
@ 2016-07-04 12:22 ` Baptiste Reynal
  2016-07-05  1:49   ` Hailiang Zhang
  2016-07-13 18:02 ` Dr. David Alan Gilbert
  14 siblings, 1 reply; 48+ messages in thread
From: Baptiste Reynal @ 2016-07-04 12:22 UTC (permalink / raw)
  To: zhanghailiang
  Cc: qemu list, aarcange, hanweidong, Juan Quintela, peter.huangpeng,
	dgilbert, Amit Shah

On Thu, Jan 7, 2016 at 1:19 PM, zhanghailiang
<zhang.zhanghailiang@huawei.com> wrote:
> For now, we still don't support live memory snapshot; we discussed
> a scheme based on userfaultfd a long time ago.
> You can find the discussion at the following link:
> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01779.html
>
> The scheme is based on userfaultfd's write-protect capability.
> The userfaultfd write protection feature is available here:
> http://www.spinics.net/lists/linux-mm/msg97422.html
>
> The process of this live memory snapshot scheme is as below:
> 1. Pause VM
> 2. Enable write-protect fault notification by using userfaultfd to
>    mark VM's memory to write-protect (readonly).
> 3. Save VM's static state (here is device state) to snapshot file
> 4. Resume VM, VM is going to run.
> 5. Snapshot thread begins to save VM's live state (here is RAM) into
>    snapshot file.
> 6. During this time, any write to the VM's memory will be blocked
>   by the kernel, and the kernel will wake up the fault-handling thread in
>   qemu to process the write-protect fault. The fault-handling thread will
>   deliver the page's address to the snapshot thread.
> 7. The snapshot thread gets this address, saves the page into the snapshot
>    file, and then removes the write-protect by using the userfaultfd API;
>    after that, writes to the page can proceed again.
> 8. Repeat steps 5-7 until all of the VM's memory is saved to the snapshot file.
>
> Compared with the feature of 'migrate VM's state to file',
> the main difference for live memory snapshot is that there is little delay in
> capturing the VM's state: it captures the VM's state at the moment the user
> issues the snapshot command, just like taking a photo of the VM's state.
>
> For now, we only support the TCG accelerator, since userfaultfd does not
> support tracking write faults for KVM.
>
> Usage:
> 1. Take a snapshot
> #x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.5,accel=tcg,usb=off -drive file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1  -vnc :7 -m 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0  --monitor stdio
> Issue snapshot command:
> (qemu)migrate -d file:/home/Snapshot
> 2. Revert to the snapshot
> #x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.5,accel=tcg,usb=off -drive file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1  -vnc :7 -m 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0  --monitor stdio -incoming file:/home/Snapshot
>
> NOTE:
> The userfaultfd write protection feature does not support THP for now.
> Before taking a snapshot, please disable THP with:
> echo never > /sys/kernel/mm/transparent_hugepage/enabled
>
> TODO:
> - Reduce the influence for VM while taking snapshot
>
> zhanghailiang (13):
>   postcopy/migration: Split fault related state into struct
>     UserfaultState
>   migration: Allow the migrate command to work on file: urls
>   migration: Allow -incoming to work on file: urls
>   migration: Create a snapshot thread to realize saving memory snapshot
>   migration: implement initialization work for snapshot
>   QEMUSizedBuffer: Introduce two help functions for qsb
>   savevm: Split qemu_savevm_state_complete_precopy() into two helper
>     functions
>   snapshot: Save VM's device state into snapshot file
>   migration/postcopy-ram: fix some helper functions to support
>     userfaultfd write-protect
>   snapshot: Enable the write-protect notification capability for VM's
>     RAM
>   snapshot/migration: Save VM's RAM into snapshot file
>   migration/ram: Fix some helper functions' parameter to use
>     PageSearchStatus
>   snapshot: Remove page's write-protect and copy the content during
>     setup stage
>
>  include/migration/migration.h     |  41 +++++--
>  include/migration/postcopy-ram.h  |   9 +-
>  include/migration/qemu-file.h     |   3 +-
>  include/qemu/typedefs.h           |   1 +
>  include/sysemu/sysemu.h           |   3 +
>  linux-headers/linux/userfaultfd.h |  21 +++-
>  migration/fd.c                    |  51 ++++++++-
>  migration/migration.c             | 101 ++++++++++++++++-
>  migration/postcopy-ram.c          | 229 ++++++++++++++++++++++++++++----------
>  migration/qemu-file-buf.c         |  61 ++++++++++
>  migration/ram.c                   | 104 ++++++++++++-----
>  migration/savevm.c                |  90 ++++++++++++---
>  trace-events                      |   1 +
>  13 files changed, 587 insertions(+), 128 deletions(-)
>
> --
> 1.8.3.1
>
>
>

Hi Hailiang,

Can I get the status of this patch series? I cannot find a v2.
About the TCG limitation, is KVM support on a TODO list, or is there a
strong technical barrier?

Thanks,
Baptiste


* Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
  2016-07-04 12:22 ` [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd Baptiste Reynal
@ 2016-07-05  1:49   ` Hailiang Zhang
  2016-07-05  9:57     ` Baptiste Reynal
  0 siblings, 1 reply; 48+ messages in thread
From: Hailiang Zhang @ 2016-07-05  1:49 UTC (permalink / raw)
  To: Baptiste Reynal, aarcange
  Cc: peter.huangpeng, qemu list, hanweidong, Juan Quintela, dgilbert,
	Amit Shah

On 2016/7/4 20:22, Baptiste Reynal wrote:
> On Thu, Jan 7, 2016 at 1:19 PM, zhanghailiang
> <zhang.zhanghailiang@huawei.com> wrote:
>> For now, we still didn't support live memory snapshot, we have discussed
>> a scheme which based on userfaultfd long time ago.
>> You can find the discussion by the follow link:
>> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01779.html
>>
>> The scheme is based on userfaultfd's write-protect capability.
>> The userfaultfd write protection feature is available here:
>> http://www.spinics.net/lists/linux-mm/msg97422.html
>>
>> The process of this live memory scheme is like below:
>> 1. Pause VM
>> 2. Enable write-protect fault notification by using userfaultfd to
>>     mark VM's memory to write-protect (readonly).
>> 3. Save VM's static state (here is device state) to snapshot file
>> 4. Resume VM, VM is going to run.
>> 5. Snapshot thread begins to save VM's live state (here is RAM) into
>>     snapshot file.
>> 6. During this time, all the actions of writing VM's memory will be blocked
>>    by kernel, and kernel will wakeup the fault treating thread in qemu to
>>    process this write-protect fault. The fault treating thread will deliver this
>>    page's address to snapshot thread.
>> 7. snapshot thread gets this address, save this page into snapshot file,
>>     and then remove the write-protect by using userfaultfd API, after that,
>>     the actions of writing will be recovered.
>> 8. Repeat step 5~7 until all VM's memory is saved to snapshot file
>>
>> Compared with the feature of 'migrate VM's state to file',
>> the main difference for live memory snapshot is it has little time delay for
>> catching VM's state. It just captures the VM's state while got users snapshot
>> command, just like take a photo of VM's state.
>>
>> For now, we only support tcg accelerator, since userfaultfd is not supporting
>> tracking write faults for KVM.
>>
>> Usage:
>> 1. Take a snapshot
>> #x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.5,accel=tcg,usb=off -drive file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1  -vnc :7 -m 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0  --monitor stdio
>> Issue snapshot command:
>> (qemu)migrate -d file:/home/Snapshot
>> 2. Revert to the snapshot
>> #x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.5,accel=tcg,usb=off -drive file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1  -vnc :7 -m 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0  --monitor stdio -incoming file:/home/Snapshot
>>
>> NOTE:
>> The userfaultfd write protection feature does not support THP for now,
>> Before taking snapshot, please disable THP by:
>> echo never > /sys/kernel/mm/transparent_hugepage/enabled
>>
>> TODO:
>> - Reduce the influence for VM while taking snapshot
>>
>> zhanghailiang (13):
>>    postcopy/migration: Split fault related state into struct
>>      UserfaultState
>>    migration: Allow the migrate command to work on file: urls
>>    migration: Allow -incoming to work on file: urls
>>    migration: Create a snapshot thread to realize saving memory snapshot
>>    migration: implement initialization work for snapshot
>>    QEMUSizedBuffer: Introduce two help functions for qsb
>>    savevm: Split qemu_savevm_state_complete_precopy() into two helper
>>      functions
>>    snapshot: Save VM's device state into snapshot file
>>    migration/postcopy-ram: fix some helper functions to support
>>      userfaultfd write-protect
>>    snapshot: Enable the write-protect notification capability for VM's
>>      RAM
>>    snapshot/migration: Save VM's RAM into snapshot file
>>    migration/ram: Fix some helper functions' parameter to use
>>      PageSearchStatus
>>    snapshot: Remove page's write-protect and copy the content during
>>      setup stage
>>
>>   include/migration/migration.h     |  41 +++++--
>>   include/migration/postcopy-ram.h  |   9 +-
>>   include/migration/qemu-file.h     |   3 +-
>>   include/qemu/typedefs.h           |   1 +
>>   include/sysemu/sysemu.h           |   3 +
>>   linux-headers/linux/userfaultfd.h |  21 +++-
>>   migration/fd.c                    |  51 ++++++++-
>>   migration/migration.c             | 101 ++++++++++++++++-
>>   migration/postcopy-ram.c          | 229 ++++++++++++++++++++++++++++----------
>>   migration/qemu-file-buf.c         |  61 ++++++++++
>>   migration/ram.c                   | 104 ++++++++++++-----
>>   migration/savevm.c                |  90 ++++++++++++---
>>   trace-events                      |   1 +
>>   13 files changed, 587 insertions(+), 128 deletions(-)
>>
>> --
>> 1.8.3.1
>>
>>
>>
>

Hi,

> Hi Hailiang,
>
> Can I get the status of this patch series ? I cannot find a v2.

Yes, I haven't updated it for a long time; it is based on the userfault-wp API
in the kernel, and Andrea didn't update the related patches until recently.
I will update this series in the next one or two weeks. But it will only
support TCG until the userfault-wp API supports KVM.

> About TCG limitation, is KVM support on a TODO list or is there a
> strong technical barrier ?
>

I don't think there are any technical hurdles. I would like to
have a try at implementing the KVM part of userfault-wp support,
but I'm a little busy with other things now.
Andrea may have a plan to achieve it.

To: Andrea Arcangeli <aarcange@redhat.com>

Thanks,
Hailiang

> Thanks,
> Baptiste
>
> .
>


* Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
  2016-07-05  1:49   ` Hailiang Zhang
@ 2016-07-05  9:57     ` Baptiste Reynal
  2016-07-05 10:27       ` Hailiang Zhang
  2016-07-05 14:59       ` Andrea Arcangeli
  0 siblings, 2 replies; 48+ messages in thread
From: Baptiste Reynal @ 2016-07-05  9:57 UTC (permalink / raw)
  To: Hailiang Zhang
  Cc: aarcange, peter.huangpeng, qemu list, hanweidong, Juan Quintela,
	dgilbert, Amit Shah, Christian Pinto

On Tue, Jul 5, 2016 at 3:49 AM, Hailiang Zhang
<zhang.zhanghailiang@huawei.com> wrote:
> On 2016/7/4 20:22, Baptiste Reynal wrote:
>>
>> On Thu, Jan 7, 2016 at 1:19 PM, zhanghailiang
>> <zhang.zhanghailiang@huawei.com> wrote:
>>>
>>> For now, we still didn't support live memory snapshot, we have discussed
>>> a scheme which based on userfaultfd long time ago.
>>> You can find the discussion by the follow link:
>>> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01779.html
>>>
>>> The scheme is based on userfaultfd's write-protect capability.
>>> The userfaultfd write protection feature is available here:
>>> http://www.spinics.net/lists/linux-mm/msg97422.html
>>>
>>> The process of this live memory scheme is like below:
>>> 1. Pause VM
>>> 2. Enable write-protect fault notification by using userfaultfd to
>>>     mark VM's memory to write-protect (readonly).
>>> 3. Save VM's static state (here is device state) to snapshot file
>>> 4. Resume VM, VM is going to run.
>>> 5. Snapshot thread begins to save VM's live state (here is RAM) into
>>>     snapshot file.
>>> 6. During this time, all the actions of writing VM's memory will be
>>> blocked
>>>    by kernel, and kernel will wakeup the fault treating thread in qemu to
>>>    process this write-protect fault. The fault treating thread will
>>> deliver this
>>>    page's address to snapshot thread.
>>> 7. snapshot thread gets this address, save this page into snapshot file,
>>>     and then remove the write-protect by using userfaultfd API, after
>>> that,
>>>     the actions of writing will be recovered.
>>> 8. Repeat step 5~7 until all VM's memory is saved to snapshot file
>>>
>>> Compared with the feature of 'migrate VM's state to file',
>>> the main difference for live memory snapshot is it has little time delay
>>> for
>>> catching VM's state. It just captures the VM's state while got users
>>> snapshot
>>> command, just like take a photo of VM's state.
>>>
>>> For now, we only support tcg accelerator, since userfaultfd is not
>>> supporting
>>> tracking write faults for KVM.
>>>
>>> Usage:
>>> 1. Take a snapshot
>>> #x86_64-softmmu/qemu-system-x86_64 -machine
>>> pc-i440fx-2.5,accel=tcg,usb=off -drive
>>> file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none
>>> -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1  -vnc :7 -m
>>> 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0
>>> --monitor stdio
>>> Issue snapshot command:
>>> (qemu)migrate -d file:/home/Snapshot
>>> 2. Revert to the snapshot
>>> #x86_64-softmmu/qemu-system-x86_64 -machine
>>> pc-i440fx-2.5,accel=tcg,usb=off -drive
>>> file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none
>>> -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1  -vnc :7 -m
>>> 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0
>>> --monitor stdio -incoming file:/home/Snapshot
>>>
>>> NOTE:
>>> The userfaultfd write protection feature does not support THP for now,
>>> Before taking snapshot, please disable THP by:
>>> echo never > /sys/kernel/mm/transparent_hugepage/enabled
>>>
>>> TODO:
>>> - Reduce the influence for VM while taking snapshot
>>>
>>> zhanghailiang (13):
>>>    postcopy/migration: Split fault related state into struct
>>>      UserfaultState
>>>    migration: Allow the migrate command to work on file: urls
>>>    migration: Allow -incoming to work on file: urls
>>>    migration: Create a snapshot thread to realize saving memory snapshot
>>>    migration: implement initialization work for snapshot
>>>    QEMUSizedBuffer: Introduce two help functions for qsb
>>>    savevm: Split qemu_savevm_state_complete_precopy() into two helper
>>>      functions
>>>    snapshot: Save VM's device state into snapshot file
>>>    migration/postcopy-ram: fix some helper functions to support
>>>      userfaultfd write-protect
>>>    snapshot: Enable the write-protect notification capability for VM's
>>>      RAM
>>>    snapshot/migration: Save VM's RAM into snapshot file
>>>    migration/ram: Fix some helper functions' parameter to use
>>>      PageSearchStatus
>>>    snapshot: Remove page's write-protect and copy the content during
>>>      setup stage
>>>
>>>   include/migration/migration.h     |  41 +++++--
>>>   include/migration/postcopy-ram.h  |   9 +-
>>>   include/migration/qemu-file.h     |   3 +-
>>>   include/qemu/typedefs.h           |   1 +
>>>   include/sysemu/sysemu.h           |   3 +
>>>   linux-headers/linux/userfaultfd.h |  21 +++-
>>>   migration/fd.c                    |  51 ++++++++-
>>>   migration/migration.c             | 101 ++++++++++++++++-
>>>   migration/postcopy-ram.c          | 229
>>> ++++++++++++++++++++++++++++----------
>>>   migration/qemu-file-buf.c         |  61 ++++++++++
>>>   migration/ram.c                   | 104 ++++++++++++-----
>>>   migration/savevm.c                |  90 ++++++++++++---
>>>   trace-events                      |   1 +
>>>   13 files changed, 587 insertions(+), 128 deletions(-)
>>>
>>> --
>>> 1.8.3.1
>>>
>>>
>>>
>>
>
> Hi,
>
>> Hi Hailiang,
>>
>> Can I get the status of this patch series ? I cannot find a v2.
>
>
> Yes, I haven't updated it for long time, it is based on userfault-wp API
> in kernel, and Andrea didn't update the related patches until recent days.
> I will update this series in the next one or two weeks. But it will only
> support TCG until userfault-wp API supports KVM.
>

May I have a pointer to those patches? The last I found is
http://thread.gmane.org/gmane.linux.kernel.mm/141647 and it seems
pretty old.

>> About TCG limitation, is KVM support on a TODO list or is there a
>> strong technical barrier ?
>>
>
> I don't think there are any technical hurdles, I would like to
> have a try on realizing the KVM part to support userfault-wp,
> But I'm a little busy with other things now.
> Andrea may has a plan to achieve it.
>
> To: Andrea Arcangeli <aarcange@redhat.com>
>
> Thanks,
> Hailiang
>

OK, if it is not on Andrea's schedule, I am willing to take the action,
at least for ARM/ARM64 support.

Regards,
Baptiste

>> Thanks,
>> Baptiste
>>
>> .
>>
>


* Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
  2016-07-05  9:57     ` Baptiste Reynal
@ 2016-07-05 10:27       ` Hailiang Zhang
  2016-08-18 15:56         ` Andrea Arcangeli
  2016-07-05 14:59       ` Andrea Arcangeli
  1 sibling, 1 reply; 48+ messages in thread
From: Hailiang Zhang @ 2016-07-05 10:27 UTC (permalink / raw)
  To: Baptiste Reynal
  Cc: peter.huangpeng, aarcange, qemu list, hanweidong, Juan Quintela,
	dgilbert, Amit Shah, Christian Pinto

On 2016/7/5 17:57, Baptiste Reynal wrote:
> On Tue, Jul 5, 2016 at 3:49 AM, Hailiang Zhang
> <zhang.zhanghailiang@huawei.com> wrote:
>> On 2016/7/4 20:22, Baptiste Reynal wrote:
>>>
>>> On Thu, Jan 7, 2016 at 1:19 PM, zhanghailiang
>>> <zhang.zhanghailiang@huawei.com> wrote:
>>>>
>>>> For now, we still didn't support live memory snapshot, we have discussed
>>>> a scheme which based on userfaultfd long time ago.
>>>> You can find the discussion by the follow link:
>>>> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01779.html
>>>>
>>>> The scheme is based on userfaultfd's write-protect capability.
>>>> The userfaultfd write protection feature is available here:
>>>> http://www.spinics.net/lists/linux-mm/msg97422.html
>>>>
>>>> The process of this live memory scheme is like below:
>>>> 1. Pause VM
>>>> 2. Enable write-protect fault notification by using userfaultfd to
>>>>      mark VM's memory to write-protect (readonly).
>>>> 3. Save VM's static state (here is device state) to snapshot file
>>>> 4. Resume VM, VM is going to run.
>>>> 5. Snapshot thread begins to save VM's live state (here is RAM) into
>>>>      snapshot file.
>>>> 6. During this time, all the actions of writing VM's memory will be
>>>> blocked
>>>>     by kernel, and kernel will wakeup the fault treating thread in qemu to
>>>>     process this write-protect fault. The fault treating thread will
>>>> deliver this
>>>>     page's address to snapshot thread.
>>>> 7. snapshot thread gets this address, save this page into snasphot file,
>>>>      and then remove the write-protect by using userfaultfd API, after
>>>> that,
>>>>      the actions of writing will be recovered.
>>>> 8. Repeat step 5~7 until all VM's memory is saved to snapshot file
>>>>
>>>> Compared with the feature of 'migrate VM's state to file',
>>>> the main difference for live memory snapshot is that it has little
>>>> delay in capturing the VM's state: it captures the VM's state at the
>>>> moment the user issues the snapshot command, just like taking a photo
>>>> of the VM's state.
>>>>
>>>> For now, we only support the TCG accelerator, since userfaultfd does
>>>> not yet support tracking write faults for KVM.
>>>>
>>>> Usage:
>>>> 1. Take a snapshot
>>>> #x86_64-softmmu/qemu-system-x86_64 -machine
>>>> pc-i440fx-2.5,accel=tcg,usb=off -drive
>>>> file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none
>>>> -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1  -vnc :7 -m
>>>> 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0
>>>> --monitor stdio
>>>> Issue snapshot command:
>>>> (qemu)migrate -d file:/home/Snapshot
>>>> 2. Revert to the snapshot
>>>> #x86_64-softmmu/qemu-system-x86_64 -machine
>>>> pc-i440fx-2.5,accel=tcg,usb=off -drive
>>>> file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none
>>>> -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1  -vnc :7 -m
>>>> 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0
>>>> --monitor stdio -incoming file:/home/Snapshot
>>>>
>>>> NOTE:
>>>> The userfaultfd write protection feature does not support THP for now.
>>>> Before taking a snapshot, please disable THP with:
>>>> echo never > /sys/kernel/mm/transparent_hugepage/enabled
>>>>
>>>> TODO:
>>>> - Reduce the influence for VM while taking snapshot
>>>>
>>>> zhanghailiang (13):
>>>>     postcopy/migration: Split fault related state into struct
>>>>       UserfaultState
>>>>     migration: Allow the migrate command to work on file: urls
>>>>     migration: Allow -incoming to work on file: urls
>>>>     migration: Create a snapshot thread to realize saving memory snapshot
>>>>     migration: implement initialization work for snapshot
>>>>     QEMUSizedBuffer: Introduce two help functions for qsb
>>>>     savevm: Split qemu_savevm_state_complete_precopy() into two helper
>>>>       functions
>>>>     snapshot: Save VM's device state into snapshot file
>>>>     migration/postcopy-ram: fix some helper functions to support
>>>>       userfaultfd write-protect
>>>>     snapshot: Enable the write-protect notification capability for VM's
>>>>       RAM
>>>>     snapshot/migration: Save VM's RAM into snapshot file
>>>>     migration/ram: Fix some helper functions' parameter to use
>>>>       PageSearchStatus
>>>>     snapshot: Remove page's write-protect and copy the content during
>>>>       setup stage
>>>>
>>>>    include/migration/migration.h     |  41 +++++--
>>>>    include/migration/postcopy-ram.h  |   9 +-
>>>>    include/migration/qemu-file.h     |   3 +-
>>>>    include/qemu/typedefs.h           |   1 +
>>>>    include/sysemu/sysemu.h           |   3 +
>>>>    linux-headers/linux/userfaultfd.h |  21 +++-
>>>>    migration/fd.c                    |  51 ++++++++-
>>>>    migration/migration.c             | 101 ++++++++++++++++-
>>>>    migration/postcopy-ram.c          | 229
>>>> ++++++++++++++++++++++++++++----------
>>>>    migration/qemu-file-buf.c         |  61 ++++++++++
>>>>    migration/ram.c                   | 104 ++++++++++++-----
>>>>    migration/savevm.c                |  90 ++++++++++++---
>>>>    trace-events                      |   1 +
>>>>    13 files changed, 587 insertions(+), 128 deletions(-)
>>>>
>>>> --
>>>> 1.8.3.1
>>>>
>>>>
>>>>
>>>
>>
>> Hi,
>>
>>> Hi Hailiang,
>>>
>>> Can I get the status of this patch series ? I cannot find a v2.
>>
>>
>> Yes, I haven't updated it for a long time; it is based on the userfault-wp
>> API in the kernel, and Andrea didn't update the related patches until
>> recently. I will update this series in the next week or two, but it will
>> only support TCG until the userfault-wp API supports KVM.
>>
>
> May I have a pointer to those patches ? The last I found is
> http://thread.gmane.org/gmane.linux.kernel.mm/141647 and it seems
> pretty old.
>

Yes, Andrea has updated it, but he has not released it publicly;
I have forwarded his email to you. Please see the related email.

>>> About TCG limitation, is KVM support on a TODO list or is there a
>>> strong technical barrier ?
>>>
>>
>> I don't think there are any technical hurdles; I would like to
>> try implementing the KVM part to support userfault-wp,
>> but I'm a little busy with other things now.
>> Andrea may have a plan to work on it.
>>
>> To: Andrea Arcangeli <aarcange@redhat.com>
>>
>> Thanks,
>> Hailiang
>>
>
> OK, if it is not on Andrea's schedule, I am willing to take it on,
> at least for ARM/ARM64 support.
>

Hmm, great; if you can participate, we can speed up the development process.

Thanks,
Hailiang

> Regards,
> Baptiste
>
>>> Thanks,
>>> Baptiste
>>>
>>> .
>>>
>>
>
> .
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
  2016-07-05  9:57     ` Baptiste Reynal
  2016-07-05 10:27       ` Hailiang Zhang
@ 2016-07-05 14:59       ` Andrea Arcangeli
  1 sibling, 0 replies; 48+ messages in thread
From: Andrea Arcangeli @ 2016-07-05 14:59 UTC (permalink / raw)
  To: Baptiste Reynal
  Cc: Hailiang Zhang, peter.huangpeng, qemu list, hanweidong,
	Juan Quintela, dgilbert, Amit Shah, Christian Pinto

Hello,

On Tue, Jul 05, 2016 at 11:57:31AM +0200, Baptiste Reynal wrote:
> OK, if it is not on Andrea's schedule, I am willing to take it on,
> at least for ARM/ARM64 support.

A few days ago I released this update:

https://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/

git clone -b master --reference linux
git://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git
cd aa
git fetch
git reset --hard origin/master

The branch will be constantly rebased so you will need to rebase or
reset on origin/master after a fetch to get the updates.


Features added:

1) WP support for anon (Shaohua, hugetlbfs has a FIXME)
2) non cooperative support (Pavel & Mike Rapoport)
3) hugetlbfs missing faults tracking (Mike Kravetz)

WP support and hugetlbfs required a couple of fixes; the
non-cooperative support is as submitted, but I wonder if we should have
a single non-cooperative feature flag.

I haven't advertised it yet because it's not well tested, and in fact I
don't expect the WP mode to work fully as it should.

However, the kernel should run stably; I fixed enough bugs that it
should not be possible to DoS or exploit the kernel with this patchset
applied (unlike the original submissions, which had race conditions and
potentially kernel-crashing bugs).

The next thing I plan to work on is a bitflag in the swap entry for
WP tracking, so that WP tracking works correctly through swapins
without false positives. It'll work like soft-dirty. It's possible that
other things are still uncovered in the WP support.

THP should be covered now (the callback was missing in the original
submission, but I fixed that). For KVM, it's not entirely clear why it
didn't work before, but it may require changes to the KVM code if this
is not enough. KVM should not use gup(write=1) for read faults on shadow
pagetables, so it has at least a chance to work.

I'm also considering using a reserved bitflag in the mapped/present
pte/trans_huge_pmds to track which virtual addresses have been
wrprotected. Without a reserved bitflag, fork() would inevitably lead
to WP userfault false positives. I'm not sure if it's required, or if
it should be left up to userland to ensure the pagetables don't
become wrprotected (i.e. use MADV_DONTFORK, as of course KVM already
does). First we have to solve the false positives through swap anyway;
the two should be orthogonal improvements.

If you could test the live snapshotting patchset on my kernel master
branch and report any issue or incremental fix against my branch, it'd
be great.

On my side I think I'll focus on testing by extending the testsuite
inside the kernel to exercise WP tracking too.

There are several other active users of the new userfaultfd features,
including JIT garbage collection (that previously used mprotect and
trapped SIGSEGV), distributed shared memory, SQL database robustness
in hugetlbfs holes, and postcopy live migration of containers (a
process using userfaultfd of its own being live migrated inside a
container with the non-cooperative model isn't solved yet, though).

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [RFC 02/13] migration: Allow the migrate command to work on file: urls
  2016-01-07 12:19 ` [Qemu-devel] [RFC 02/13] migration: Allow the migrate command to work on file: urls zhanghailiang
@ 2016-07-13 16:12   ` Dr. David Alan Gilbert
  2016-07-14  5:27     ` Hailiang Zhang
  0 siblings, 1 reply; 48+ messages in thread
From: Dr. David Alan Gilbert @ 2016-07-13 16:12 UTC (permalink / raw)
  To: zhanghailiang
  Cc: qemu-devel, aarcange, quintela, amit.shah, peter.huangpeng,
	hanweidong, Benoit Canet

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> Usage:
> (qemu) migrate file:/path/to/vm_statefile
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Benoit Canet <benoit.canet@gmail.com>
> ---
> - With this patch, we can easily test memory snapshot
> - Rebase on qemu 2.5
> ---
>  include/migration/migration.h |  6 +++++-
>  migration/fd.c                | 19 +++++++++++++++++--
>  migration/migration.c         |  4 +++-
>  3 files changed, 25 insertions(+), 4 deletions(-)

Even if the rest of this series takes some time to finish, it might be
best to post this patch separately; it's just a nice bit of simplification.
Of course now you would have to rewrite it using qio_channel_file_new_path,
but I guess that's simple.

(I've tended to use migrate "exec:cat > file", but a file: URL would be nicer)

Dave

> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 4c80939..bf4f8e9 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -193,7 +193,11 @@ void unix_start_outgoing_migration(MigrationState *s, const char *path, Error **
>  
>  void fd_start_incoming_migration(const char *path, Error **errp);
>  
> -void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp);
> +void fd_start_outgoing_migration(MigrationState *s, const char *fdname,
> +                                 int outfd, Error **errp);
> +
> +void file_start_outgoing_migration(MigrationState *s, const char *filename,
> +                                   Error **errp);
>  
>  void rdma_start_outgoing_migration(void *opaque, const char *host_port, Error **errp);
>  
> diff --git a/migration/fd.c b/migration/fd.c
> index 3e4bed0..b62161f 100644
> --- a/migration/fd.c
> +++ b/migration/fd.c
> @@ -42,9 +42,10 @@ static bool fd_is_socket(int fd)
>      return S_ISSOCK(stat.st_mode);
>  }
>  
> -void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp)
> +void fd_start_outgoing_migration(MigrationState *s, const char *fdname,
> +                                 int outfd, Error **errp)
>  {
> -    int fd = monitor_get_fd(cur_mon, fdname, errp);
> +    int fd = fdname ? monitor_get_fd(cur_mon, fdname, errp) : outfd;
>      if (fd == -1) {
>          return;
>      }
> @@ -58,6 +59,20 @@ void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **
>      migrate_fd_connect(s);
>  }
>  
> +void file_start_outgoing_migration(MigrationState *s, const char *filename,
> +                                   Error **errp)
> +{
> +    int fd;
> +
> +    fd = qemu_open(filename, O_CREAT | O_TRUNC | O_WRONLY, S_IRUSR | S_IWUSR);
> +    if (fd < 0) {
> +        error_setg_errno(errp, errno, "Failed to open file: %s", filename);
> +        return;
> +    }
> +    fd_start_outgoing_migration(s, NULL, fd, errp);
> +}
> +
> +
>  static void fd_accept_incoming_migration(void *opaque)
>  {
>      QEMUFile *f = opaque;
> diff --git a/migration/migration.c b/migration/migration.c
> index c842499..3ec3b85 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1021,7 +1021,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
>      } else if (strstart(uri, "unix:", &p)) {
>          unix_start_outgoing_migration(s, p, &local_err);
>      } else if (strstart(uri, "fd:", &p)) {
> -        fd_start_outgoing_migration(s, p, &local_err);
> +        fd_start_outgoing_migration(s, p, -1, &local_err);
> +    } else if (strstart(uri, "file:", &p)) {
> +        file_start_outgoing_migration(s, p,  &local_err);
>  #endif
>      } else {
>          error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "uri",
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [RFC 13/13] snapshot: Remove page's write-protect and copy the content during setup stage
  2016-01-07 12:20 ` [Qemu-devel] [RFC 13/13] snapshot: Remove page's write-protect and copy the content during setup stage zhanghailiang
@ 2016-07-13 17:52   ` Dr. David Alan Gilbert
  2016-07-14  8:02     ` Hailiang Zhang
  0 siblings, 1 reply; 48+ messages in thread
From: Dr. David Alan Gilbert @ 2016-07-13 17:52 UTC (permalink / raw)
  To: zhanghailiang
  Cc: qemu-devel, aarcange, quintela, amit.shah, peter.huangpeng, hanweidong

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> If we modify the VM's RAM (pages) during the setup stage, after enabling
> write-protect notification in the snapshot thread, the write will get stuck,
> because we only remove the page's write protection in the savevm process,
> which is blocked by itself.
> 
> To fix this bug, we remove the page's write protection in the fault thread
> during the setup stage. Besides, we should not try to take the global lock
> after the setup stage, or there may be a deadlock.

Hmm, this complicates things a bit more, doesn't it?
What's the order of:
   a) setup
   b) saving devices
   c) being able to transmit the pages?

Are these pages that are being modified during setup being modified
as part of the device state save?

Dave

> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> ---
>  include/migration/migration.h |  4 ++--
>  migration/migration.c         |  2 +-
>  migration/postcopy-ram.c      | 17 ++++++++++++++++-
>  migration/ram.c               | 37 +++++++++++++++++++++++++++++++------
>  4 files changed, 50 insertions(+), 10 deletions(-)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index ef4c071..435de31 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -127,7 +127,7 @@ struct MigrationSrcPageRequest {
>      RAMBlock *rb;
>      hwaddr    offset;
>      hwaddr    len;
> -
> +    uint8_t *pages_copy_addr;
>      QSIMPLEQ_ENTRY(MigrationSrcPageRequest) next_req;
>  };
>  
> @@ -333,7 +333,7 @@ void global_state_store_running(void);
>  
>  void flush_page_queue(MigrationState *ms);
>  int ram_save_queue_pages(MigrationState *ms, const char *rbname,
> -                         ram_addr_t start, ram_addr_t len);
> +                         ram_addr_t start, ram_addr_t len, bool copy_pages);
>  
>  PostcopyState postcopy_state_get(void);
>  /* Set the state and return the old state */
> diff --git a/migration/migration.c b/migration/migration.c
> index 3765c3b..bf4c7a1 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1248,7 +1248,7 @@ static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
>          return;
>      }
>  
> -    if (ram_save_queue_pages(ms, rbname, start, len)) {
> +    if (ram_save_queue_pages(ms, rbname, start, len, false)) {
>          mark_source_rp_bad(ms);
>      }
>  }
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 61392d3..2cf477d 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -543,13 +543,28 @@ static void *postcopy_ram_fault_thread(void *opaque)
>              MigrationState *ms = container_of(us, MigrationState,
>                                                userfault_state);
>              ret = ram_save_queue_pages(ms, qemu_ram_get_idstr(rb), rb_offset,
> -                                       hostpagesize);
> +                                       hostpagesize, true);
>  
>              if (ret < 0) {
>                  error_report("%s: Save: %"PRIx64 " failed!",
>                               __func__, (uint64_t)msg.arg.pagefault.address);
>                  break;
>              }
> +
> +            /* Note: In the setup process, snapshot_thread may modify the
> +             * VM's write-protected pages; we must not block it there, or
> +             * there will be a deadlock.
> +             */
> +            if (migration_in_setup(ms)) {
> +                uint64_t host = msg.arg.pagefault.address;
> +
> +                host &= ~(hostpagesize - 1);
> +                ret = ram_set_pages_wp(host, getpagesize(), true,
> +                                       us->userfault_fd);
> +                if (ret < 0) {
> +                    error_report("Remove page's write-protect failed");
> +                }
> +            }
>          }
>      }
>      trace_postcopy_ram_fault_thread_exit();
> diff --git a/migration/ram.c b/migration/ram.c
> index 8656719..747f9aa 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -233,6 +233,7 @@ struct PageSearchStatus {
>      ram_addr_t   offset;
>      /* Set once we wrap around */
>      bool         complete_round;
> +    uint8_t *pages_copy;
>  };
>  typedef struct PageSearchStatus PageSearchStatus;
>  
> @@ -742,7 +743,12 @@ static int ram_save_page(QEMUFile *f, PageSearchStatus *pss,
>      RAMBlock *block = pss->block;
>      ram_addr_t offset = pss->offset;
>  
> -    p = block->host + offset;
> +    /* If we have a copy of this page, use the backup page first */
> +    if (pss->pages_copy) {
> +        p = pss->pages_copy;
> +    } else {
> +        p = block->host + offset;
> +    }
>  
>      /* In doubt sent page as normal */
>      bytes_xmit = 0;
> @@ -926,7 +932,12 @@ static int ram_save_compressed_page(QEMUFile *f, PageSearchStatus *pss,
>      RAMBlock *block = pss->block;
>      ram_addr_t offset = pss->offset;
>  
> -    p = block->host + offset;
> +    /* If we have a copy of this page, use the backup first */
> +    if (pss->pages_copy) {
> +        p = pss->pages_copy;
> +    } else {
> +        p = block->host + offset;
> +    }
>  
>      bytes_xmit = 0;
>      ret = ram_control_save_page(f, block->offset,
> @@ -1043,7 +1054,7 @@ static bool find_dirty_block(QEMUFile *f, PageSearchStatus *pss,
>   * Returns:      block (or NULL if none available)
>   */
>  static RAMBlock *unqueue_page(MigrationState *ms, ram_addr_t *offset,
> -                              ram_addr_t *ram_addr_abs)
> +                              ram_addr_t *ram_addr_abs, uint8_t **pages_copy_addr)
>  {
>      RAMBlock *block = NULL;
>  
> @@ -1055,7 +1066,7 @@ static RAMBlock *unqueue_page(MigrationState *ms, ram_addr_t *offset,
>          *offset = entry->offset;
>          *ram_addr_abs = (entry->offset + entry->rb->offset) &
>                          TARGET_PAGE_MASK;
> -
> +        *pages_copy_addr = entry->pages_copy_addr;
>          if (entry->len > TARGET_PAGE_SIZE) {
>              entry->len -= TARGET_PAGE_SIZE;
>              entry->offset += TARGET_PAGE_SIZE;
> @@ -1086,9 +1097,10 @@ static bool get_queued_page(MigrationState *ms, PageSearchStatus *pss,
>      RAMBlock  *block;
>      ram_addr_t offset;
>      bool dirty;
> +    uint8_t *pages_backup_addr = NULL;
>  
>      do {
> -        block = unqueue_page(ms, &offset, ram_addr_abs);
> +        block = unqueue_page(ms, &offset, ram_addr_abs, &pages_backup_addr);
>          /*
>           * We're sending this page, and since it's postcopy nothing else
>           * will dirty it, and we must make sure it doesn't get sent again
> @@ -1130,6 +1142,7 @@ static bool get_queued_page(MigrationState *ms, PageSearchStatus *pss,
>           */
>          pss->block = block;
>          pss->offset = offset;
> +        pss->pages_copy = pages_backup_addr;
>      }
>  
>      return !!block;
> @@ -1166,7 +1179,7 @@ void flush_page_queue(MigrationState *ms)
>   *   Return: 0 on success
>   */
>  int ram_save_queue_pages(MigrationState *ms, const char *rbname,
> -                         ram_addr_t start, ram_addr_t len)
> +                         ram_addr_t start, ram_addr_t len, bool copy_pages)
>  {
>      RAMBlock *ramblock;
>  
> @@ -1206,6 +1219,17 @@ int ram_save_queue_pages(MigrationState *ms, const char *rbname,
>      new_entry->rb = ramblock;
>      new_entry->offset = start;
>      new_entry->len = len;
> +    if (copy_pages) {
> +        /* Fix me: Better to realize a memory pool */
> +        new_entry->pages_copy_addr = g_try_malloc0(len);
> +
> +        if (!new_entry->pages_copy_addr) {
> +            error_report("%s: Failed to alloc memory", __func__);
> +            return -1;
> +        }
> +
> +        memcpy(new_entry->pages_copy_addr, ramblock_ptr(ramblock, start), len);
> +    }
>  
>      memory_region_ref(ramblock->mr);
>      qemu_mutex_lock(&ms->src_page_req_mutex);
> @@ -1342,6 +1366,7 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
>      pss.block = last_seen_block;
>      pss.offset = last_offset;
>      pss.complete_round = false;
> +    pss.pages_copy = NULL;
>  
>      if (!pss.block) {
>          pss.block = QLIST_FIRST_RCU(&ram_list.blocks);
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
  2016-01-07 12:19 [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd zhanghailiang
                   ` (13 preceding siblings ...)
  2016-07-04 12:22 ` [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd Baptiste Reynal
@ 2016-07-13 18:02 ` Dr. David Alan Gilbert
  2016-07-14 10:24   ` Hailiang Zhang
  14 siblings, 1 reply; 48+ messages in thread
From: Dr. David Alan Gilbert @ 2016-07-13 18:02 UTC (permalink / raw)
  To: zhanghailiang
  Cc: qemu-devel, aarcange, quintela, amit.shah, peter.huangpeng, hanweidong

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> For now, we still don't support live memory snapshot; we discussed
> a scheme based on userfaultfd a long time ago.
> You can find the discussion at the following link:
> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01779.html
> 
> The scheme is based on userfaultfd's write-protect capability.
> The userfaultfd write protection feature is available here:
> http://www.spinics.net/lists/linux-mm/msg97422.html

I've (finally!) had a brief look through this; I like the idea.
I've not bothered with minor cleanups like comments on them;
I'm sure those will happen later. Some larger-scale things to think
about are:
  a) I wonder if it's really best to put that much code into the postcopy
     function; it might be, but I can see other userfault uses as well.
  b) I worry a bit about the size of the copies you create during setup,
     and I don't really understand why you can't start sending those pages
     immediately - but then I worry about the relative order in which page
     data should be sent compared to the device state's view of RAM.
  c) Have you considered also using userfault for loading the snapshot? I
     know there was someone on #qemu a while ago who was talking about using
     it as a way to quickly reload from a migration image.

Dave

> 
> The process of this live memory scheme is as below:
> 1. Pause the VM
> 2. Enable write-protect fault notification by using userfaultfd to
>    mark the VM's memory write-protected (read-only).
> 3. Save the VM's static state (here, device state) to the snapshot file
> 4. Resume the VM; the VM starts running.
> 5. The snapshot thread begins to save the VM's live state (here, RAM) into
>    the snapshot file.
> 6. During this time, any write to the VM's memory will be blocked by the
>    kernel, which will wake up the fault-handling thread in qemu to process
>    this write-protect fault. The fault-handling thread will deliver this
>    page's address to the snapshot thread.
> 7. The snapshot thread gets this address, saves this page into the snapshot
>    file, and then removes the write protection using the userfaultfd API;
>    after that, writes to the page proceed normally.
> 8. Repeat steps 5~7 until all of the VM's memory is saved to the snapshot file
> 
> Compared with the feature of 'migrate VM's state to file',
> the main difference for live memory snapshot is that it has little delay in
> capturing the VM's state: it captures the VM's state at the moment the user
> issues the snapshot command, just like taking a photo of the VM's state.
> 
> For now, we only support the TCG accelerator, since userfaultfd does not yet
> support tracking write faults for KVM.
> 
> Usage:
> 1. Take a snapshot
> #x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.5,accel=tcg,usb=off -drive file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1  -vnc :7 -m 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0  --monitor stdio
> Issue snapshot command:
> (qemu)migrate -d file:/home/Snapshot
> 2. Revert to the snapshot
> #x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.5,accel=tcg,usb=off -drive file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1  -vnc :7 -m 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0  --monitor stdio -incoming file:/home/Snapshot
> 
> NOTE:
> The userfaultfd write protection feature does not support THP for now.
> Before taking a snapshot, please disable THP with:
> echo never > /sys/kernel/mm/transparent_hugepage/enabled
> 
> TODO:
> - Reduce the influence for VM while taking snapshot
> 
> zhanghailiang (13):
>   postcopy/migration: Split fault related state into struct
>     UserfaultState
>   migration: Allow the migrate command to work on file: urls
>   migration: Allow -incoming to work on file: urls
>   migration: Create a snapshot thread to realize saving memory snapshot
>   migration: implement initialization work for snapshot
>   QEMUSizedBuffer: Introduce two help functions for qsb
>   savevm: Split qemu_savevm_state_complete_precopy() into two helper
>     functions
>   snapshot: Save VM's device state into snapshot file
>   migration/postcopy-ram: fix some helper functions to support
>     userfaultfd write-protect
>   snapshot: Enable the write-protect notification capability for VM's
>     RAM
>   snapshot/migration: Save VM's RAM into snapshot file
>   migration/ram: Fix some helper functions' parameter to use
>     PageSearchStatus
>   snapshot: Remove page's write-protect and copy the content during
>     setup stage
> 
>  include/migration/migration.h     |  41 +++++--
>  include/migration/postcopy-ram.h  |   9 +-
>  include/migration/qemu-file.h     |   3 +-
>  include/qemu/typedefs.h           |   1 +
>  include/sysemu/sysemu.h           |   3 +
>  linux-headers/linux/userfaultfd.h |  21 +++-
>  migration/fd.c                    |  51 ++++++++-
>  migration/migration.c             | 101 ++++++++++++++++-
>  migration/postcopy-ram.c          | 229 ++++++++++++++++++++++++++++----------
>  migration/qemu-file-buf.c         |  61 ++++++++++
>  migration/ram.c                   | 104 ++++++++++++-----
>  migration/savevm.c                |  90 ++++++++++++---
>  trace-events                      |   1 +
>  13 files changed, 587 insertions(+), 128 deletions(-)
> 
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [RFC 02/13] migration: Allow the migrate command to work on file: urls
  2016-07-13 16:12   ` Dr. David Alan Gilbert
@ 2016-07-14  5:27     ` Hailiang Zhang
  0 siblings, 0 replies; 48+ messages in thread
From: Hailiang Zhang @ 2016-07-14  5:27 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: peter.huangpeng, qemu-devel, aarcange, quintela, amit.shah,
	hanweidong, Benoit Canet

On 2016/7/14 0:12, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> Usage:
>> (qemu) migrate file:/path/to/vm_statefile
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Benoit Canet <benoit.canet@gmail.com>
>> ---
>> - With this patch, we can easily test memory snapshot
>> - Rebase on qemu 2.5
>> ---
>>   include/migration/migration.h |  6 +++++-
>>   migration/fd.c                | 19 +++++++++++++++++--
>>   migration/migration.c         |  4 +++-
>>   3 files changed, 25 insertions(+), 4 deletions(-)
>
> Even if the rest of this series takes some time to finish, it might be
> best to post this patch separately; it's just a nice bit of simplification.
> Of course now you would have to rewrite it using qio_channel_file_new_path,
> but I guess that's simple.
>
> (I've tended to use migrate "exec:cat > file", but a file: URL would be nicer)
>

OK, if you need this, I will send it as a single patch.

Thanks.

> Dave
>
>> diff --git a/include/migration/migration.h b/include/migration/migration.h
>> index 4c80939..bf4f8e9 100644
>> --- a/include/migration/migration.h
>> +++ b/include/migration/migration.h
>> @@ -193,7 +193,11 @@ void unix_start_outgoing_migration(MigrationState *s, const char *path, Error **
>>
>>   void fd_start_incoming_migration(const char *path, Error **errp);
>>
>> -void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp);
>> +void fd_start_outgoing_migration(MigrationState *s, const char *fdname,
>> +                                 int outfd, Error **errp);
>> +
>> +void file_start_outgoing_migration(MigrationState *s, const char *filename,
>> +                                   Error **errp);
>>
>>   void rdma_start_outgoing_migration(void *opaque, const char *host_port, Error **errp);
>>
>> diff --git a/migration/fd.c b/migration/fd.c
>> index 3e4bed0..b62161f 100644
>> --- a/migration/fd.c
>> +++ b/migration/fd.c
>> @@ -42,9 +42,10 @@ static bool fd_is_socket(int fd)
>>       return S_ISSOCK(stat.st_mode);
>>   }
>>
>> -void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp)
>> +void fd_start_outgoing_migration(MigrationState *s, const char *fdname,
>> +                                 int outfd, Error **errp)
>>   {
>> -    int fd = monitor_get_fd(cur_mon, fdname, errp);
>> +    int fd = fdname ? monitor_get_fd(cur_mon, fdname, errp) : outfd;
>>       if (fd == -1) {
>>           return;
>>       }
>> @@ -58,6 +59,20 @@ void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **
>>       migrate_fd_connect(s);
>>   }
>>
>> +void file_start_outgoing_migration(MigrationState *s, const char *filename,
>> +                                   Error **errp)
>> +{
>> +    int fd;
>> +
>> +    fd = qemu_open(filename, O_CREAT | O_TRUNC | O_WRONLY, S_IRUSR | S_IWUSR);
>> +    if (fd < 0) {
>> +        error_setg_errno(errp, errno, "Failed to open file: %s", filename);
>> +        return;
>> +    }
>> +    fd_start_outgoing_migration(s, NULL, fd, errp);
>> +}
>> +
>> +
>>   static void fd_accept_incoming_migration(void *opaque)
>>   {
>>       QEMUFile *f = opaque;
>> diff --git a/migration/migration.c b/migration/migration.c
>> index c842499..3ec3b85 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -1021,7 +1021,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
>>       } else if (strstart(uri, "unix:", &p)) {
>>           unix_start_outgoing_migration(s, p, &local_err);
>>       } else if (strstart(uri, "fd:", &p)) {
>> -        fd_start_outgoing_migration(s, p, &local_err);
>> +        fd_start_outgoing_migration(s, p, -1, &local_err);
>> +    } else if (strstart(uri, "file:", &p)) {
>> +        file_start_outgoing_migration(s, p,  &local_err);
>>   #endif
>>       } else {
>>           error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "uri",
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [RFC 13/13] snapshot: Remove page's write-protect and copy the content during setup stage
  2016-07-13 17:52   ` Dr. David Alan Gilbert
@ 2016-07-14  8:02     ` Hailiang Zhang
  0 siblings, 0 replies; 48+ messages in thread
From: Hailiang Zhang @ 2016-07-14  8:02 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: peter.huangpeng, qemu-devel, aarcange, quintela, amit.shah, hanweidong

On 2016/7/14 1:52, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> If we modify the VM's RAM (pages) during the setup stage, after enabling write-protect
>> notification in the snapshot thread, the modification will get stuck, because
>> we only remove the page's write-protect in the savevm process, which is blocked by itself.
>>
>> To fix this bug, we remove the page's write-protect in the fault thread during
>> the setup stage. Besides, we should not try to take the global lock after the setup stage,
>> or there may be a deadlock error.
>
> Hmm this complicates things a bit more doesn't it.
> What's the order of:
>     a) setup
>     b) saving devices
>     c) Being able to transmit the pages?
>
> Are these pages that are being modified during setup, being modified
> as part of the device state save?
>

Yes. I'm not sure whether the problem still exists after exchanging the sequence
of 'save devices' and 'enable ram notify'.

I'll look into it.

Hailiang

> Dave
>
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> ---
>>   include/migration/migration.h |  4 ++--
>>   migration/migration.c         |  2 +-
>>   migration/postcopy-ram.c      | 17 ++++++++++++++++-
>>   migration/ram.c               | 37 +++++++++++++++++++++++++++++++------
>>   4 files changed, 50 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/migration/migration.h b/include/migration/migration.h
>> index ef4c071..435de31 100644
>> --- a/include/migration/migration.h
>> +++ b/include/migration/migration.h
>> @@ -127,7 +127,7 @@ struct MigrationSrcPageRequest {
>>       RAMBlock *rb;
>>       hwaddr    offset;
>>       hwaddr    len;
>> -
>> +    uint8_t *pages_copy_addr;
>>       QSIMPLEQ_ENTRY(MigrationSrcPageRequest) next_req;
>>   };
>>
>> @@ -333,7 +333,7 @@ void global_state_store_running(void);
>>
>>   void flush_page_queue(MigrationState *ms);
>>   int ram_save_queue_pages(MigrationState *ms, const char *rbname,
>> -                         ram_addr_t start, ram_addr_t len);
>> +                         ram_addr_t start, ram_addr_t len, bool copy_pages);
>>
>>   PostcopyState postcopy_state_get(void);
>>   /* Set the state and return the old state */
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 3765c3b..bf4c7a1 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -1248,7 +1248,7 @@ static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
>>           return;
>>       }
>>
>> -    if (ram_save_queue_pages(ms, rbname, start, len)) {
>> +    if (ram_save_queue_pages(ms, rbname, start, len, false)) {
>>           mark_source_rp_bad(ms);
>>       }
>>   }
>> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
>> index 61392d3..2cf477d 100644
>> --- a/migration/postcopy-ram.c
>> +++ b/migration/postcopy-ram.c
>> @@ -543,13 +543,28 @@ static void *postcopy_ram_fault_thread(void *opaque)
>>               MigrationState *ms = container_of(us, MigrationState,
>>                                                 userfault_state);
>>               ret = ram_save_queue_pages(ms, qemu_ram_get_idstr(rb), rb_offset,
>> -                                       hostpagesize);
>> +                                       hostpagesize, true);
>>
>>               if (ret < 0) {
>>                   error_report("%s: Save: %"PRIx64 " failed!",
>>                                __func__, (uint64_t)msg.arg.pagefault.address);
>>                   break;
>>               }
>> +
>> +            /* Note: In the setup process, snapshot_thread may modify VM's
>> +            * write-protected pages, we should not block it there, or there
>> +            * will be a deadlock error.
>> +            */
>> +            if (migration_in_setup(ms)) {
>> +                uint64_t host = msg.arg.pagefault.address;
>> +
>> +                host &= ~(hostpagesize - 1);
>> +                ret = ram_set_pages_wp(host, getpagesize(), true,
>> +                                       us->userfault_fd);
>> +                if (ret < 0) {
>> +                    error_report("Remove page's write-protect failed");
>> +                }
>> +            }
>>           }
>>       }
>>       trace_postcopy_ram_fault_thread_exit();
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 8656719..747f9aa 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -233,6 +233,7 @@ struct PageSearchStatus {
>>       ram_addr_t   offset;
>>       /* Set once we wrap around */
>>       bool         complete_round;
>> +    uint8_t *pages_copy;
>>   };
>>   typedef struct PageSearchStatus PageSearchStatus;
>>
>> @@ -742,7 +743,12 @@ static int ram_save_page(QEMUFile *f, PageSearchStatus *pss,
>>       RAMBlock *block = pss->block;
>>       ram_addr_t offset = pss->offset;
>>
>> -    p = block->host + offset;
>> +    /* If we have a copy of this page, use the backup page first */
>> +    if (pss->pages_copy) {
>> +        p = pss->pages_copy;
>> +    } else {
>> +        p = block->host + offset;
>> +    }
>>
>>       /* In doubt sent page as normal */
>>       bytes_xmit = 0;
>> @@ -926,7 +932,12 @@ static int ram_save_compressed_page(QEMUFile *f, PageSearchStatus *pss,
>>       RAMBlock *block = pss->block;
>>       ram_addr_t offset = pss->offset;
>>
>> -    p = block->host + offset;
>> +    /* If we have a copy of this page, use the backup first */
>> +    if (pss->pages_copy) {
>> +        p = pss->pages_copy;
>> +    } else {
>> +        p = block->host + offset;
>> +    }
>>
>>       bytes_xmit = 0;
>>       ret = ram_control_save_page(f, block->offset,
>> @@ -1043,7 +1054,7 @@ static bool find_dirty_block(QEMUFile *f, PageSearchStatus *pss,
>>    * Returns:      block (or NULL if none available)
>>    */
>>   static RAMBlock *unqueue_page(MigrationState *ms, ram_addr_t *offset,
>> -                              ram_addr_t *ram_addr_abs)
>> +                              ram_addr_t *ram_addr_abs, uint8_t **pages_copy_addr)
>>   {
>>       RAMBlock *block = NULL;
>>
>> @@ -1055,7 +1066,7 @@ static RAMBlock *unqueue_page(MigrationState *ms, ram_addr_t *offset,
>>           *offset = entry->offset;
>>           *ram_addr_abs = (entry->offset + entry->rb->offset) &
>>                           TARGET_PAGE_MASK;
>> -
>> +        *pages_copy_addr = entry->pages_copy_addr;
>>           if (entry->len > TARGET_PAGE_SIZE) {
>>               entry->len -= TARGET_PAGE_SIZE;
>>               entry->offset += TARGET_PAGE_SIZE;
>> @@ -1086,9 +1097,10 @@ static bool get_queued_page(MigrationState *ms, PageSearchStatus *pss,
>>       RAMBlock  *block;
>>       ram_addr_t offset;
>>       bool dirty;
>> +    uint8_t *pages_backup_addr = NULL;
>>
>>       do {
>> -        block = unqueue_page(ms, &offset, ram_addr_abs);
>> +        block = unqueue_page(ms, &offset, ram_addr_abs, &pages_backup_addr);
>>           /*
>>            * We're sending this page, and since it's postcopy nothing else
>>            * will dirty it, and we must make sure it doesn't get sent again
>> @@ -1130,6 +1142,7 @@ static bool get_queued_page(MigrationState *ms, PageSearchStatus *pss,
>>            */
>>           pss->block = block;
>>           pss->offset = offset;
>> +        pss->pages_copy = pages_backup_addr;
>>       }
>>
>>       return !!block;
>> @@ -1166,7 +1179,7 @@ void flush_page_queue(MigrationState *ms)
>>    *   Return: 0 on success
>>    */
>>   int ram_save_queue_pages(MigrationState *ms, const char *rbname,
>> -                         ram_addr_t start, ram_addr_t len)
>> +                         ram_addr_t start, ram_addr_t len, bool copy_pages)
>>   {
>>       RAMBlock *ramblock;
>>
>> @@ -1206,6 +1219,17 @@ int ram_save_queue_pages(MigrationState *ms, const char *rbname,
>>       new_entry->rb = ramblock;
>>       new_entry->offset = start;
>>       new_entry->len = len;
>> +    if (copy_pages) {
>> +        /* Fix me: Better to realize a memory pool */
>> +        new_entry->pages_copy_addr = g_try_malloc0(len);
>> +
>> +        if (!new_entry->pages_copy_addr) {
>> +            error_report("%s: Failed to alloc memory", __func__);
>> +            return -1;
>> +        }
>> +
>> +        memcpy(new_entry->pages_copy_addr, ramblock_ptr(ramblock, start), len);
>> +    }
>>
>>       memory_region_ref(ramblock->mr);
>>       qemu_mutex_lock(&ms->src_page_req_mutex);
>> @@ -1342,6 +1366,7 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
>>       pss.block = last_seen_block;
>>       pss.offset = last_offset;
>>       pss.complete_round = false;
>> +    pss.pages_copy = NULL;
>>
>>       if (!pss.block) {
>>           pss.block = QLIST_FIRST_RCU(&ram_list.blocks);
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
  2016-07-13 18:02 ` Dr. David Alan Gilbert
@ 2016-07-14 10:24   ` Hailiang Zhang
  2016-07-14 11:43     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 48+ messages in thread
From: Hailiang Zhang @ 2016-07-14 10:24 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: peter.huangpeng, qemu-devel, aarcange, quintela, amit.shah, hanweidong

On 2016/7/14 2:02, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> For now, we still didn't support live memory snapshot, we have discussed
>> a scheme which based on userfaultfd long time ago.
>> You can find the discussion by the follow link:
>> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01779.html
>>
>> The scheme is based on userfaultfd's write-protect capability.
>> The userfaultfd write protection feature is available here:
>> http://www.spinics.net/lists/linux-mm/msg97422.html
>
> I've (finally!) had a brief look through this, I like the idea.
> I've not bothered with minor cleanup like comments on them;
> I'm sure those will happen later; some larger scale things to think
> about are:
>    a) I wonder if it's really best to put that much code into the postcopy
>       function; it might be but I can see other userfault uses as well.

Yes, it would be better to extract the common code into shared helper functions.

>    b) I worry a bit about the size of the copies you create during setup
>       and I don't really understand why you can't start sending those pages

Because we save the device state and RAM in the same snapshot_thread: if saving
the device state is blocked by a write to a protected page, we can remove the
write-protect in the 'postcopy/fault' thread, but we can't send the page immediately.


>       immediately - but then I worry about the relative order of when pages
>       data should be sent compared to the state of devices view of RAM.
>    c) Have you considered also using userfault for loading the snapshot - I
>      know there was someone on #qemu a while ago who was talking about using
>      it as a way to quickly reload from a migration image.
>

I didn't notice that discussion before; maybe I missed it.
Could you please send me the link?

But I have considered the scenario of quickly restoring a snapshot.
The difficulty here is how to quickly find the position of a specific
page. That is, while the VM is accessing a page, we need to find its
position in the snapshot file and read it into memory. For
compatibility, we hope we can still re-use all the migration
capabilities.

My rough idea for the scenario is:
1. Use an array to record the beginning position of each of the VM's pages.
Use the page offset as the index into the array, just like the migration bitmaps.
2. Save the data of the array into another file in a special format.
3. Also record the position of the device state data in the snapshot file.
(Or we can put the device state data at the head of the snapshot file.)
4. While restoring the snapshot, reload the array first, and then read
the device state.
5. Set all pages to MISS status.
6. Resume the VM.
7. The rest of the process works like postcopy incoming does.

I'm not sure whether this scenario is practicable. We need further
discussion. :)

Hailiang

> Dave
>
>>
>> The process of this live memory scheme is like bellow:
>> 1. Pause VM
>> 2. Enable write-protect fault notification by using userfaultfd to
>>     mark VM's memory to write-protect (readonly).
>> 3. Save VM's static state (here is device state) to snapshot file
>> 4. Resume VM, VM is going to run.
>> 5. Snapshot thread begins to save VM's live state (here is RAM) into
>>     snapshot file.
>> 6. During this time, all the actions of writing VM's memory will be blocked
>>    by kernel, and kernel will wakeup the fault treating thread in qemu to
>>    process this write-protect fault. The fault treating thread will deliver this
>>    page's address to snapshot thread.
>> 7. snapshot thread gets this address, saves this page into the snapshot file,
>>     and then remove the write-protect by using userfaultfd API, after that,
>>     the actions of writing will be recovered.
>> 8. Repeat step 5~7 until all VM's memory is saved to snapshot file
>>
>> Compared with the feature of 'migrate VM's state to file',
>> the main difference for live memory snapshot is it has little time delay for
>> catching VM's state. It just captures the VM's state while got users snapshot
>> command, just like take a photo of VM's state.
>>
>> For now, we only support tcg accelerator, since userfaultfd is not supporting
>> tracking write faults for KVM.
>>
>> Usage:
>> 1. Take a snapshot
>> #x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.5,accel=tcg,usb=off -drive file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1  -vnc :7 -m 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0  --monitor stdio
>> Issue snapshot command:
>> (qemu)migrate -d file:/home/Snapshot
>> 2. Revert to the snapshot
>> #x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.5,accel=tcg,usb=off -drive file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1  -vnc :7 -m 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0  --monitor stdio -incoming file:/home/Snapshot
>>
>> NOTE:
>> The userfaultfd write protection feature does not support THP for now,
>> Before taking snapshot, please disable THP by:
>> echo never > /sys/kernel/mm/transparent_hugepage/enabled
>>
>> TODO:
>> - Reduce the influence for VM while taking snapshot
>>
>> zhanghailiang (13):
>>    postcopy/migration: Split fault related state into struct
>>      UserfaultState
>>    migration: Allow the migrate command to work on file: urls
>>    migration: Allow -incoming to work on file: urls
>>    migration: Create a snapshot thread to realize saving memory snapshot
>>    migration: implement initialization work for snapshot
>>    QEMUSizedBuffer: Introduce two help functions for qsb
>>    savevm: Split qemu_savevm_state_complete_precopy() into two helper
>>      functions
>>    snapshot: Save VM's device state into snapshot file
>>    migration/postcopy-ram: fix some helper functions to support
>>      userfaultfd write-protect
>>    snapshot: Enable the write-protect notification capability for VM's
>>      RAM
>>    snapshot/migration: Save VM's RAM into snapshot file
>>    migration/ram: Fix some helper functions' parameter to use
>>      PageSearchStatus
>>    snapshot: Remove page's write-protect and copy the content during
>>      setup stage
>>
>>   include/migration/migration.h     |  41 +++++--
>>   include/migration/postcopy-ram.h  |   9 +-
>>   include/migration/qemu-file.h     |   3 +-
>>   include/qemu/typedefs.h           |   1 +
>>   include/sysemu/sysemu.h           |   3 +
>>   linux-headers/linux/userfaultfd.h |  21 +++-
>>   migration/fd.c                    |  51 ++++++++-
>>   migration/migration.c             | 101 ++++++++++++++++-
>>   migration/postcopy-ram.c          | 229 ++++++++++++++++++++++++++++----------
>>   migration/qemu-file-buf.c         |  61 ++++++++++
>>   migration/ram.c                   | 104 ++++++++++++-----
>>   migration/savevm.c                |  90 ++++++++++++---
>>   trace-events                      |   1 +
>>   13 files changed, 587 insertions(+), 128 deletions(-)
>>
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
  2016-07-14 10:24   ` Hailiang Zhang
@ 2016-07-14 11:43     ` Dr. David Alan Gilbert
  2016-07-19  6:53       ` Hailiang Zhang
  0 siblings, 1 reply; 48+ messages in thread
From: Dr. David Alan Gilbert @ 2016-07-14 11:43 UTC (permalink / raw)
  To: Hailiang Zhang
  Cc: peter.huangpeng, qemu-devel, aarcange, quintela, amit.shah, hanweidong

* Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
> On 2016/7/14 2:02, Dr. David Alan Gilbert wrote:
> > * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> > > For now, we still didn't support live memory snapshot, we have discussed
> > > a scheme which based on userfaultfd long time ago.
> > > You can find the discussion by the follow link:
> > > https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01779.html
> > > 
> > > The scheme is based on userfaultfd's write-protect capability.
> > > The userfaultfd write protection feature is available here:
> > > http://www.spinics.net/lists/linux-mm/msg97422.html
> > 
> > I've (finally!) had a brief look through this, I like the idea.
> > I've not bothered with minor cleanup like comments on them;
> > I'm sure those will happen later; some larger scale things to think
> > about are:
> >    a) I wonder if it's really best to put that much code into the postcopy
> >       function; it might be but I can see other userfault uses as well.
> 
> Yes, it is better to extract common codes into public functions.
> 
> >    b) I worry a bit about the size of the copies you create during setup
> >       and I don't really understand why you can't start sending those pages
> 
> Because we save device state and ram in the same snapshot_thread, if the process
> of saving device is blocked by writing pages, we can remove the write-protect in
> 'postcopy/fault' thread, but can't send it immediately.

Don't you write the devices to a buffer? If so then perhaps you could split
the writing into that buffer out into a separate thread.

> >       immediately - but then I worry about the relative order of when pages
> >       data should be sent compared to the state of devices view of RAM.
> >    c) Have you considered also using userfault for loading the snapshot - I
> >      know there was someone on #qemu a while ago who was talking about using
> >      it as a way to quickly reload from a migration image.
> > 
> 
> I didn't notice such talking before, maybe i missed it.
> Could you please send me the link ?

I don't think there's any public docs about it; this was a conversation
with Christoph Seifert on #qemu about May last year.

> But i do consider the scenario of quickly snapshot restoring.
> And the difficulty here is how can we quickly find the position
> of the special page. That is, while VM is accessing one page, we
> need to find its position in snapshot file and read it into memory.
> Consider the compatibility, we hope we can still re-use all migration
> capabilities.
> 
> My rough idea about the scenario is:
> > 1. Use an array to record the beginning position of all VM's pages.
> Use the offset as the index for the array, just like migration bitmaps.
> 2. Save the data of the array into another file in a special format.
> 3. Also record the position of device state data in snapshot file.
> (Or we can put the device state data at the head of snapshot file)
> > 4. While restoring the snapshot, reload the array first, and then read
> the device state.
> 5. Set all pages to MISS status.
> 6. Resume VM to run
> 7. The next process is like how postcopy incoming does.
> 
> I'm not sure if this scenario is practicable or not. We need further
> discussion. :)

Yes;  I can think of a few different ways to do (2):
  a) We could just store it at the end of the snapshot file (and know that
     it's at the end - I think the JSON format description did a similar trick).
  b) We wouldn't need the 4 byte headers on the page we currently send.
  c) Juan's idea of having multiple fd's for migration streams might also fit,
     with the RAM data in the separate file.
  d) But if we know it's a file (not a network stream) then should we treat it
     specially and just use a sparse file of the same size as RAM, and just
     pwrite() the data into the right offset?

Dave

> 
> Hailiang
> 
> > Dave
> > 
> > > 
> > > The process of this live memory scheme is like bellow:
> > > 1. Pause VM
> > > 2. Enable write-protect fault notification by using userfaultfd to
> > >     mark VM's memory to write-protect (readonly).
> > > 3. Save VM's static state (here is device state) to snapshot file
> > > 4. Resume VM, VM is going to run.
> > > 5. Snapshot thread begins to save VM's live state (here is RAM) into
> > >     snapshot file.
> > > 6. During this time, all the actions of writing VM's memory will be blocked
> > >    by kernel, and kernel will wakeup the fault treating thread in qemu to
> > >    process this write-protect fault. The fault treating thread will deliver this
> > >    page's address to snapshot thread.
> > > 7. snapshot thread gets this address, saves this page into the snapshot file,
> > >     and then remove the write-protect by using userfaultfd API, after that,
> > >     the actions of writing will be recovered.
> > > 8. Repeat step 5~7 until all VM's memory is saved to snapshot file
> > > 
> > > Compared with the feature of 'migrate VM's state to file',
> > > the main difference for live memory snapshot is it has little time delay for
> > > catching VM's state. It just captures the VM's state while got users snapshot
> > > command, just like take a photo of VM's state.
> > > 
> > > For now, we only support tcg accelerator, since userfaultfd is not supporting
> > > tracking write faults for KVM.
> > > 
> > > Usage:
> > > 1. Take a snapshot
> > > #x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.5,accel=tcg,usb=off -drive file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1  -vnc :7 -m 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0  --monitor stdio
> > > Issue snapshot command:
> > > (qemu)migrate -d file:/home/Snapshot
> > > 2. Revert to the snapshot
> > > #x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.5,accel=tcg,usb=off -drive file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1  -vnc :7 -m 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0  --monitor stdio -incoming file:/home/Snapshot
> > > 
> > > NOTE:
> > > The userfaultfd write protection feature does not support THP for now,
> > > Before taking snapshot, please disable THP by:
> > > echo never > /sys/kernel/mm/transparent_hugepage/enabled
> > > 
> > > TODO:
> > > - Reduce the influence for VM while taking snapshot
> > > 
> > > zhanghailiang (13):
> > >    postcopy/migration: Split fault related state into struct
> > >      UserfaultState
> > >    migration: Allow the migrate command to work on file: urls
> > >    migration: Allow -incoming to work on file: urls
> > >    migration: Create a snapshot thread to realize saving memory snapshot
> > >    migration: implement initialization work for snapshot
> > >    QEMUSizedBuffer: Introduce two help functions for qsb
> > >    savevm: Split qemu_savevm_state_complete_precopy() into two helper
> > >      functions
> > >    snapshot: Save VM's device state into snapshot file
> > >    migration/postcopy-ram: fix some helper functions to support
> > >      userfaultfd write-protect
> > >    snapshot: Enable the write-protect notification capability for VM's
> > >      RAM
> > >    snapshot/migration: Save VM's RAM into snapshot file
> > >    migration/ram: Fix some helper functions' parameter to use
> > >      PageSearchStatus
> > >    snapshot: Remove page's write-protect and copy the content during
> > >      setup stage
> > > 
> > >   include/migration/migration.h     |  41 +++++--
> > >   include/migration/postcopy-ram.h  |   9 +-
> > >   include/migration/qemu-file.h     |   3 +-
> > >   include/qemu/typedefs.h           |   1 +
> > >   include/sysemu/sysemu.h           |   3 +
> > >   linux-headers/linux/userfaultfd.h |  21 +++-
> > >   migration/fd.c                    |  51 ++++++++-
> > >   migration/migration.c             | 101 ++++++++++++++++-
> > >   migration/postcopy-ram.c          | 229 ++++++++++++++++++++++++++++----------
> > >   migration/qemu-file-buf.c         |  61 ++++++++++
> > >   migration/ram.c                   | 104 ++++++++++++-----
> > >   migration/savevm.c                |  90 ++++++++++++---
> > >   trace-events                      |   1 +
> > >   13 files changed, 587 insertions(+), 128 deletions(-)
> > > 
> > > --
> > > 1.8.3.1
> > > 
> > > 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
> > .
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
  2016-07-14 11:43     ` Dr. David Alan Gilbert
@ 2016-07-19  6:53       ` Hailiang Zhang
  0 siblings, 0 replies; 48+ messages in thread
From: Hailiang Zhang @ 2016-07-19  6:53 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: peter.huangpeng, qemu-devel, aarcange, quintela, amit.shah, hanweidong

On 2016/7/14 19:43, Dr. David Alan Gilbert wrote:
> * Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
>> On 2016/7/14 2:02, Dr. David Alan Gilbert wrote:
>>> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>>>> For now, we still didn't support live memory snapshot, we have discussed
>>>> a scheme which based on userfaultfd long time ago.
>>>> You can find the discussion by the follow link:
>>>> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01779.html
>>>>
>>>> The scheme is based on userfaultfd's write-protect capability.
>>>> The userfaultfd write protection feature is available here:
>>>> http://www.spinics.net/lists/linux-mm/msg97422.html
>>>
>>> I've (finally!) had a brief look through this, I like the idea.
>>> I've not bothered with minor cleanup like comments on them;
>>> I'm sure those will happen later; some larger scale things to think
>>> about are:
>>>     a) I wonder if it's really best to put that much code into the postcopy
>>>        function; it might be but I can see other userfault uses as well.
>>
>> Yes, it is better to extract common codes into public functions.
>>
>>>     b) I worry a bit about the size of the copies you create during setup
>>>        and I don't really understand why you can't start sending those pages
>>
>> Because we save device state and ram in the same snapshot_thread, if the process
>> of saving device is blocked by writing pages, we can remove the write-protect in
>> 'postcopy/fault' thread, but can't send it immediately.
>
> Don't you write the devices to a buffer? If so then you perhaps you could split
> writing into that buffer into a separate thread.
>

Hmm, it may work in this way.

>>>        immediately - but then I worry about the relative order of when pages
>>>        data should be sent compared to the state of devices view of RAM.
>>>     c) Have you considered also using userfault for loading the snapshot - I
>>>       know there was someone on #qemu a while ago who was talking about using
>>>       it as a way to quickly reload from a migration image.
>>>
>>
>> I didn't notice such talking before, maybe i missed it.
>> Could you please send me the link ?
>
> I don't think there's any public docs about it; this was a conversation
> with Christoph Seifert on #qemu about May last year.
>

Got it.

>> But i do consider the scenario of quickly snapshot restoring.
>> And the difficulty here is how can we quickly find the position
>> of the special page. That is, while VM is accessing one page, we
>> need to find its position in snapshot file and read it into memory.
>> Consider the compatibility, we hope we can still re-use all migration
>> capabilities.
>>
>> My rough idea about the scenario is:
>> 1. Use an array to record the beginning position of all VM's pages.
>> Use the offset as the index for the array, just like migration bitmaps.
>> 2. Save the data of the array into another file in a special format.
>> 3. Also record the position of device state data in snapshot file.
>> (Or we can put the device state data at the head of snapshot file)
>> 4. While restoring the snapshot, reload the array first, and then read
>> the device state.
>> 5. Set all pages to MISS status.
>> 6. Resume VM to run
>> 7. The next process is like how postcopy incoming does.
>>
>> I'm not sure if this scenario is practicable or not. We need further
>> discussion. :)
>
> Yes;  I can think of a few different ways to do (2):
>    a) We could just store it at the end of the snapshot file (and know that
> it's at the end - I think the json format description did a similar trick).

Yes, this is a better idea.

>    b) We wouldn't need the 4 byte headers on the page we currently send.
>    c) Juan's idea of having multiple fd's for migration streams might also fit,
>       with the RAM data in the separate file.
>    d) But if we know it's a file (not a network stream) then should we treat it
>       specially and just use a sparse file of the same size as RAM, and just
>       pwrite() the data into the right offset?
>

Yes, this is the simplest way to save the snapshot file; the disadvantage is
that we can't directly reuse the current incoming migration path to restore the
VM (in the non-quick restore case). We would need to modify the current restore
process. I'm not sure which way is better, but it's worth a try.

Hailiang

> Dave
>
>>
>> Hailiang
>>
>>> Dave
>>>
>>>>
>>>> The process of this live memory scheme is like bellow:
>>>> 1. Pause VM
>>>> 2. Enable write-protect fault notification by using userfaultfd to
>>>>      mark VM's memory to write-protect (readonly).
>>>> 3. Save VM's static state (here is device state) to snapshot file
>>>> 4. Resume VM, VM is going to run.
>>>> 5. Snapshot thread begins to save VM's live state (here is RAM) into
>>>>      snapshot file.
>>>> 6. During this time, all the actions of writing VM's memory will be blocked
>>>>     by kernel, and kernel will wakeup the fault treating thread in qemu to
>>>>     process this write-protect fault. The fault treating thread will deliver this
>>>>     page's address to snapshot thread.
>>>> 7. snapshot thread gets this address, save this page into snasphot file,
>>>>      and then remove the write-protect by using userfaultfd API, after that,
>>>>      the actions of writing will be recovered.
>>>> 8. Repeat step 5~7 until all VM's memory is saved to snapshot file
>>>>
>>>> Compared with the feature of 'migrate VM's state to file',
>>>> the main difference for live memory snapshot is it has little time delay for
>>>> catching VM's state. It just captures the VM's state while got users snapshot
>>>> command, just like take a photo of VM's state.
>>>>
>>>> For now, we only support tcg accelerator, since userfaultfd is not supporting
>>>> tracking write faults for KVM.
>>>>
>>>> Usage:
>>>> 1. Take a snapshot
>>>> #x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.5,accel=tcg,usb=off -drive file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1  -vnc :7 -m 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0  --monitor stdio
>>>> Issue snapshot command:
>>>> (qemu)migrate -d file:/home/Snapshot
>>>> 2. Revert to the snapshot
>>>> #x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.5,accel=tcg,usb=off -drive file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1  -vnc :7 -m 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0  --monitor stdio -incoming file:/home/Snapshot
>>>>
>>>> NOTE:
>>>> The userfaultfd write protection feature does not support THP for now,
>>>> Before taking snapshot, please disable THP by:
>>>> echo never > /sys/kernel/mm/transparent_hugepage/enabled
>>>>
>>>> TODO:
>>>> - Reduce the influence for VM while taking snapshot
>>>>
>>>> zhanghailiang (13):
>>>>     postcopy/migration: Split fault related state into struct
>>>>       UserfaultState
>>>>     migration: Allow the migrate command to work on file: urls
>>>>     migration: Allow -incoming to work on file: urls
>>>>     migration: Create a snapshot thread to realize saving memory snapshot
>>>>     migration: implement initialization work for snapshot
>>>>     QEMUSizedBuffer: Introduce two help functions for qsb
>>>>     savevm: Split qemu_savevm_state_complete_precopy() into two helper
>>>>       functions
>>>>     snapshot: Save VM's device state into snapshot file
>>>>     migration/postcopy-ram: fix some helper functions to support
>>>>       userfaultfd write-protect
>>>>     snapshot: Enable the write-protect notification capability for VM's
>>>>       RAM
>>>>     snapshot/migration: Save VM's RAM into snapshot file
>>>>     migration/ram: Fix some helper functions' parameter to use
>>>>       PageSearchStatus
>>>>     snapshot: Remove page's write-protect and copy the content during
>>>>       setup stage
>>>>
>>>>    include/migration/migration.h     |  41 +++++--
>>>>    include/migration/postcopy-ram.h  |   9 +-
>>>>    include/migration/qemu-file.h     |   3 +-
>>>>    include/qemu/typedefs.h           |   1 +
>>>>    include/sysemu/sysemu.h           |   3 +
>>>>    linux-headers/linux/userfaultfd.h |  21 +++-
>>>>    migration/fd.c                    |  51 ++++++++-
>>>>    migration/migration.c             | 101 ++++++++++++++++-
>>>>    migration/postcopy-ram.c          | 229 ++++++++++++++++++++++++++++----------
>>>>    migration/qemu-file-buf.c         |  61 ++++++++++
>>>>    migration/ram.c                   | 104 ++++++++++++-----
>>>>    migration/savevm.c                |  90 ++++++++++++---
>>>>    trace-events                      |   1 +
>>>>    13 files changed, 587 insertions(+), 128 deletions(-)
>>>>
>>>> --
>>>> 1.8.3.1
>>>>
>>>>
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>
>>> .
>>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
  2016-07-05 10:27       ` Hailiang Zhang
@ 2016-08-18 15:56         ` Andrea Arcangeli
  2016-08-20  6:31           ` Hailiang Zhang
  2016-09-06  3:39           ` [Qemu-devel] [RFC 00/13] Live " Hailiang Zhang
  0 siblings, 2 replies; 48+ messages in thread
From: Andrea Arcangeli @ 2016-08-18 15:56 UTC (permalink / raw)
  To: Hailiang Zhang
  Cc: Baptiste Reynal, peter.huangpeng, qemu list, hanweidong,
	Juan Quintela, dgilbert, Amit Shah, Christian Pinto

Hello everyone,

I've an aa.git tree uptodate on the master & userfault branch (master
includes other pending VM stuff, userfault branch only contains
userfault enhancements):

https://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/log/?h=userfault

I didn't have time to test KVM live memory snapshot on it yet as I'm
still working to improve it. Did anybody test it? However I'd be happy
to take any bugreports and quickly solve anything that isn't working
right with the shadow MMU.

I got positive report already for another usage of the uffd WP support:

https://medium.com/@MartinCracauer/generational-garbage-collection-write-barriers-write-protection-and-userfaultfd-2-8b0e796b8f7f

The last few things I'm working on to finish the WP support are:

1) pte_swp_mkuffd_wp equivalent of pte_swp_mksoft_dirty to mark in a
   vma->vm_flags with VM_UFFD_WP set, which swap entries were
   generated while the pte was wrprotected.

2) to avoid all false positives the equivalent of pte_mksoft_dirty is
   needed too... and that requires spare software bits on the pte
   which are available on x86. I considered also taking over the
   soft_dirty bit but then you couldn't do checkpoint restore of a
   JIT/to-native compiler that uses uffd WP support so it wasn't
   ideal. Perhaps it would be ok as an incremental patch to make the
   two options mutually exclusive to defer the arch changes that
   pte_mkuffd_wp would require for later.

3) prevent UFFDIO_ZEROPAGE if registering WP|MISSING or trigger a
   cow in userfaultfd_writeprotect.

4) WP selftest

In theory things should work ok already if the userland code is
tolerant against false positives through swap and after fork() and
KSM. For an usage like snapshotting false positives shouldn't be an
issue (it'll just run slower if you swap in the worst case), and point
3) above also isn't an issue because it's going to register into uffd
with WP only.

The current status includes:

1) WP support for anon (with false positives.. work in progress)

2) MISSING support for tmpfs and hugetlbfs

3) non cooperative support

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
  2016-08-18 15:56         ` Andrea Arcangeli
@ 2016-08-20  6:31           ` Hailiang Zhang
  2017-02-27 15:37             ` Christian Pinto
  2017-03-09 11:34             ` [Qemu-devel] [RFC PATCH 0/4] ARM/ARM64 fixes for live " Christian Pinto
  2016-09-06  3:39           ` [Qemu-devel] [RFC 00/13] Live " Hailiang Zhang
  1 sibling, 2 replies; 48+ messages in thread
From: Hailiang Zhang @ 2016-08-20  6:31 UTC (permalink / raw)
  To: Andrea Arcangeli, Juan Quintela, dgilbert, Amit Shah
  Cc: peter.huangpeng, Baptiste Reynal, qemu list, hanweidong,
	Christian Pinto, colo-ft

Hi,

I updated this series but didn't post it, because I hit some problems while testing the snapshot function, and I don't yet know whether they are userfaultfd issues or not. I don't have time to investigate this month, so I have put the patches on GitHub:
https://github.com/coloft/qemu/tree/snapshot-v2

Anyone who wants to test or modify them is welcome!

Besides, will you be at LinuxCon or the KVM Forum in Canada?
I hope to see you there if you attend ;)

Thanks,
Hailiang



On 2016/8/18 23:56, Andrea Arcangeli wrote:
> Hello everyone,
>
> I've an aa.git tree uptodate on the master & userfault branch (master
> includes other pending VM stuff, userfault branch only contains
> userfault enhancements):
>
> https://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/log/?h=userfault
>
> I didn't have time to test KVM live memory snapshot on it yet as I'm
> still working to improve it. Did anybody test it? However I'd be happy
> to take any bugreports and quickly solve anything that isn't working
> right with the shadow MMU.
>
> I got positive report already for another usage of the uffd WP support:
>
> https://medium.com/@MartinCracauer/generational-garbage-collection-write-barriers-write-protection-and-userfaultfd-2-8b0e796b8f7f
>
> The last few things I'm working on to finish the WP support are:
>
> 1) pte_swp_mkuffd_wp equivalent of pte_swp_mksoft_dirty to mark in a
>     vma->vm_flags with VM_UFFD_WP set, which swap entries were
>     generated while the pte was wrprotected.
>
> 2) to avoid all false positives the equivalent of pte_mksoft_dirty is
>     needed too... and that requires spare software bits on the pte
>     which are available on x86. I considered also taking over the
>     soft_dirty bit but then you couldn't do checkpoint restore of a
>     JIT/to-native compiler that uses uffd WP support so it wasn't
>     ideal. Perhaps it would be ok as an incremental patch to make the
>     two options mutually exclusive to defer the arch changes that
>     pte_mkuffd_wp would require for later.
>
> 3) prevent UFFDIO_ZEROPAGE if registering WP|MISSING or trigger a
>     cow in userfaultfd_writeprotect.
>
> 4) WP selftest
>
> In theory things should work ok already if the userland code is
> tolerant against false positives through swap and after fork() and
> KSM. For an usage like snapshotting false positives shouldn't be an
> issue (it'll just run slower if you swap in the worst case), and point
> 3) above also isn't an issue because it's going to register into uffd
> with WP only.
>
> The current status includes:
>
> 1) WP support for anon (with false positives.. work in progress)
>
> 2) MISSING support for tmpfs and hugetlbfs
>
> 3) non cooperative support
>
> Thanks,
> Andrea
>
> .
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
  2016-08-18 15:56         ` Andrea Arcangeli
  2016-08-20  6:31           ` Hailiang Zhang
@ 2016-09-06  3:39           ` Hailiang Zhang
  2016-09-18  2:14             ` Hailiang Zhang
  1 sibling, 1 reply; 48+ messages in thread
From: Hailiang Zhang @ 2016-09-06  3:39 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: peter.huangpeng, Baptiste Reynal, qemu list, hanweidong,
	Juan Quintela, dgilbert, Amit Shah, Christian Pinto

Hi Andrea,

I tested the new live memory snapshot code with --enable-kvm; it doesn't work.

To keep things simple, I stripped the code down to just the parts needed to
test the write-protect capability. You can find the code at
https://github.com/coloft/qemu/tree/test-userfault-write-protect
and easily reproduce the problem with it.

The test result is as follows:
[root@localhost qemu]# x86_64-softmmu/qemu-system-x86_64 --enable-kvm -drive file=/mnt/sdb/win7/win7.qcow2,if=none,id=drive-ide0-0-1,format=qcow2,cache=none  -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1  -vnc :7 -m 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0  --monitor stdio
QEMU 2.6.95 monitor - type 'help' for more information
(qemu) migrate file:/home/xxx
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
error: kvm run failed Bad address
EAX=00000004 EBX=00000000 ECX=83b2ac20 EDX=0000c022
ESI=85fe33f4 EDI=0000c020 EBP=83b2abcc ESP=83b2abc0
EIP=8bd2ff0c EFL=00010293 [--S-A-C] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0023 00000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0023 00000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
FS =0030 83b2dc00 00003748 00409300 DPL=0 DS   [-WA]
GS =0000 00000000 ffffffff 00000000
LDT=0000 00000000 ffffffff 00000000
TR =0028 801e2000 000020ab 00008b00 DPL=0 TSS32-busy
GDT=     80b95000 000003ff
IDT=     80b95400 000007ff
CR0=8001003b CR2=030b5000 CR3=00185000 CR4=000006f8
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000800
Code=8b ff 55 8b ec 53 56 8b 75 08 57 8b 7e 34 56 e8 30 f7 ff ff <6a> 00 57 8a d8 e8 96 14 00 00 6a 04 83 c7 02 57 e8 8b 14 00 00 5f c6 46 5b 00 5e 8a c3 5b

I investigated the kvm and userfault code. We use MMU notifiers to integrate
KVM with Linux memory management.

For userfault write-protect, the call path is:
userfaultfd_ioctl
   -> userfaultfd_writeprotect
     -> mwriteprotect_range
       -> change_protection (Directly call mprotect helper here)
         -> change_protection_range
           -> change_pud_range
             -> change_pmd_range
                -> mmu_notifier_invalidate_range_start(mm, mni_start, end);
                   -> kvm_mmu_notifier_invalidate_range_start (KVM module)
Here we remove the entry from the spte. (If we use EPT hardware, we remove
the page table entry for it.)
That's why we get fault notifications for the VM.
And it seems that we can't resolve the userfault (i.e. remove the page's
write protection again) through this call path.

My question is: for the userfault write-protect capability, why do we remove
the page table entry instead of marking it read-only?
Actually, KVM has an MMU notifier (kvm_mmu_notifier_change_pte) to do this.
We could use it to remove write permission from the KVM page table, just like
KVM dirty log tracking does; please see __rmap_write_protect() in KVM.

Another question: does mprotect() work normally with KVM? (I didn't test it.)
I think KSM and swap work with KVM properly.

Besides, there seems to be a bug in the userfault write-protect code:
userfaultfd_writeprotect() checks UFFDIO_COPY_MODE_DONTWAKE; shouldn't it be
UFFDIO_WRITEPROTECT_MODE_DONTWAKE there?

static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
				    unsigned long arg)
{
        ... ...

	if (!(uffdio_wp.mode & UFFDIO_COPY_MODE_DONTWAKE)) {
		range.start = uffdio_wp.range.start;
		range.len = uffdio_wp.range.len;
		wake_userfault(ctx, &range);
	}
	return ret;
}
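If that check is indeed a copy-paste slip, the fix would presumably be a one-liner along these lines (an untested sketch against the RFC kernel code, assuming UFFDIO_WRITEPROTECT_MODE_DONTWAKE carries the same meaning for this ioctl as the UFFDIO_COPY flag does for its own):

```diff
-	if (!(uffdio_wp.mode & UFFDIO_COPY_MODE_DONTWAKE)) {
+	if (!(uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_DONTWAKE)) {
 		range.start = uffdio_wp.range.start;
 		range.len = uffdio_wp.range.len;
 		wake_userfault(ctx, &range);
 	}
```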

Thanks.
Hailiang

On 2016/8/18 23:56, Andrea Arcangeli wrote:
> Hello everyone,
>
> I've an aa.git tree uptodate on the master & userfault branch (master
> includes other pending VM stuff, userfault branch only contains
> userfault enhancements):
>
> https://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/log/?h=userfault
>
> I didn't have time to test KVM live memory snapshot on it yet as I'm
> still working to improve it. Did anybody test it? However I'd be happy
> to take any bugreports and quickly solve anything that isn't working
> right with the shadow MMU.
>
> I got positive report already for another usage of the uffd WP support:
>
> https://medium.com/@MartinCracauer/generational-garbage-collection-write-barriers-write-protection-and-userfaultfd-2-8b0e796b8f7f
>
> The last few things I'm working on to finish the WP support are:
>
> 1) pte_swp_mkuffd_wp equivalent of pte_swp_mksoft_dirty to mark in a
>     vma->vm_flags with VM_UFFD_WP set, which swap entries were
>     generated while the pte was wrprotected.
>
> 2) to avoid all false positives the equivalent of pte_mksoft_dirty is
>     needed too... and that requires spare software bits on the pte
>     which are available on x86. I considered also taking over the
>     soft_dirty bit but then you couldn't do checkpoint restore of a
>     JIT/to-native compiler that uses uffd WP support so it wasn't
>     ideal. Perhaps it would be ok as an incremental patch to make the
>     two options mutually exclusive to defer the arch changes that
>     pte_mkuffd_wp would require for later.
>
> 3) prevent UFFDIO_ZEROPAGE if registering WP|MISSING or trigger a
>     cow in userfaultfd_writeprotect.
>
> 4) WP selftest
>
> In theory things should work ok already if the userland code is
> tolerant against false positives through swap and after fork() and
> KSM. For an usage like snapshotting false positives shouldn't be an
> issue (it'll just run slower if you swap in the worst case), and point
> 3) above also isn't an issue because it's going to register into uffd
> with WP only.
>
> The current status includes:
>
> 1) WP support for anon (with false positives.. work in progress)
>
> 2) MISSING support for tmpfs and hugetlbfs
>
> 3) non cooperative support
>
> Thanks,
> Andrea
>
> .
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
  2016-09-06  3:39           ` [Qemu-devel] [RFC 00/13] Live " Hailiang Zhang
@ 2016-09-18  2:14             ` Hailiang Zhang
  2016-12-08 12:45               ` Hailiang Zhang
  0 siblings, 1 reply; 48+ messages in thread
From: Hailiang Zhang @ 2016-09-18  2:14 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: peter.huangpeng, Baptiste Reynal, qemu list, hanweidong,
	Juan Quintela, dgilbert, Amit Shah, Christian Pinto

Hi Andrea,

Any comments?

Thanks.

On 2016/9/6 11:39, Hailiang Zhang wrote:
> Hi Andrea,
>
> I tested it with the new live memory snapshot with --enable-kvm, it doesn't work.
>
> To make things simple, I simplified the codes, only left the codes that can tested
> the write-protect capability. You can find the codes from
> https://github.com/coloft/qemu/tree/test-userfault-write-protect.
> You can reproduce the problem easily with it.
>
> Tested result as follow,
> [root@localhost qemu]# x86_64-softmmu/qemu-system-x86_64 --enable-kvm -drive file=/mnt/sdb/win7/win7.qcow2,if=none,id=drive-ide0-0-1,format=qcow2,cache=none  -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1  -vnc :7 -m 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0  --monitor stdio
> QEMU 2.6.95 monitor - type 'help' for more information
> (qemu) migrate file:/home/xxx
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> error: kvm run failed Bad address
> EAX=00000004 EBX=00000000 ECX=83b2ac20 EDX=0000c022
> ESI=85fe33f4 EDI=0000c020 EBP=83b2abcc ESP=83b2abc0
> EIP=8bd2ff0c EFL=00010293 [--S-A-C] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0023 00000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
> CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
> SS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0023 00000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
> FS =0030 83b2dc00 00003748 00409300 DPL=0 DS   [-WA]
> GS =0000 00000000 ffffffff 00000000
> LDT=0000 00000000 ffffffff 00000000
> TR =0028 801e2000 000020ab 00008b00 DPL=0 TSS32-busy
> GDT=     80b95000 000003ff
> IDT=     80b95400 000007ff
> CR0=8001003b CR2=030b5000 CR3=00185000 CR4=000006f8
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000800
> Code=8b ff 55 8b ec 53 56 8b 75 08 57 8b 7e 34 56 e8 30 f7 ff ff <6a> 00 57 8a d8 e8 96 14 00 00 6a 04 83 c7 02 57 e8 8b 14 00 00 5f c6 46 5b 00 5e 8a c3 5b
>
> I investigated kvm and userfault codes. we use MMU Notifier to integrating KVM with the Linux
> Memory Management.
>
> Here for userfault write-protect, the function calling paths are:
> userfaultfd_ioctl
>     -> userfaultfd_writeprotect
>       -> mwriteprotect_range
>         -> change_protection (Directly call mprotect helper here)
>           -> change_protection_range
>             -> change_pud_range
>               -> change_pmd_range
>                  -> mmu_notifier_invalidate_range_start(mm, mni_start, end);
>                     -> kvm_mmu_notifier_invalidate_range_start (KVM module)
> OK, here, we remove the item from spte. (If we use EPT hardware, we remove
> the page table entry for it).
> That's why we can get fault notifying for VM.
> And It seems that we can't fix the userfault (remove the page's write-protect authority)
> by this function calling paths.
>
> Here my question is, for userfault write-protect capability, why we remove the page table
> entry instead of marking it as read-only.
> Actually, for KVM, we have a mmu notifier (kvm_mmu_notifier_change_pte) to do this,
> We can use it to remove the writable authority for KVM page table, just like KVM dirty log tracking
> does. Please see function __rmap_write_protect() in KVM.
>
> Another question, is mprotect() works normally with KVM ? (I didn't test it.), I think
> KSM and swap can work with KVM properly.
>
> Besides, there seems to be a bug for userfault write-protect.
> We use UFFDIO_COPY_MODE_DONTWAKE in userfaultfd_writeprotect, should it be
> UFFDIO_WRITEPROTECT_MODE_DONTWAKE there ?
>
> static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
> 				    unsigned long arg)
> {
>          ... ...
>
> 	if (!(uffdio_wp.mode & UFFDIO_COPY_MODE_DONTWAKE)) {
> 		range.start = uffdio_wp.range.start;
> 		range.len = uffdio_wp.range.len;
> 		wake_userfault(ctx, &range);
> 	}
> 	return ret;
> }
>
> Thanks.
> Hailiang
>
> On 2016/8/18 23:56, Andrea Arcangeli wrote:
>> Hello everyone,
>>
>> I've an aa.git tree uptodate on the master & userfault branch (master
>> includes other pending VM stuff, userfault branch only contains
>> userfault enhancements):
>>
>> https://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/log/?h=userfault
>>
>> I didn't have time to test KVM live memory snapshot on it yet as I'm
>> still working to improve it. Did anybody test it? However I'd be happy
>> to take any bugreports and quickly solve anything that isn't working
>> right with the shadow MMU.
>>
>> I got positive report already for another usage of the uffd WP support:
>>
>> https://medium.com/@MartinCracauer/generational-garbage-collection-write-barriers-write-protection-and-userfaultfd-2-8b0e796b8f7f
>>
>> The last few things I'm working on to finish the WP support are:
>>
>> 1) pte_swp_mkuffd_wp equivalent of pte_swp_mksoft_dirty to mark in a
>>      vma->vm_flags with VM_UFFD_WP set, which swap entries were
>>      generated while the pte was wrprotected.
>>
>> 2) to avoid all false positives the equivalent of pte_mksoft_dirty is
>>      needed too... and that requires spare software bits on the pte
>>      which are available on x86. I considered also taking over the
>>      soft_dirty bit but then you couldn't do checkpoint restore of a
>>      JIT/to-native compiler that uses uffd WP support so it wasn't
>>      ideal. Perhaps it would be ok as an incremental patch to make the
>>      two options mutually exclusive to defer the arch changes that
>>      pte_mkuffd_wp would require for later.
>>
>> 3) prevent UFFDIO_ZEROPAGE if registering WP|MISSING or trigger a
>>      cow in userfaultfd_writeprotect.
>>
>> 4) WP selftest
>>
>> In theory things should work ok already if the userland code is
>> tolerant against false positives through swap and after fork() and
>> KSM. For an usage like snapshotting false positives shouldn't be an
>> issue (it'll just run slower if you swap in the worst case), and point
>> 3) above also isn't an issue because it's going to register into uffd
>> with WP only.
>>
>> The current status includes:
>>
>> 1) WP support for anon (with false positives.. work in progress)
>>
>> 2) MISSING support for tmpfs and hugetlbfs
>>
>> 3) non cooperative support
>>
>> Thanks,
>> Andrea
>>
>> .
>>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
  2016-09-18  2:14             ` Hailiang Zhang
@ 2016-12-08 12:45               ` Hailiang Zhang
  0 siblings, 0 replies; 48+ messages in thread
From: Hailiang Zhang @ 2016-12-08 12:45 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: peter.huangpeng, Baptiste Reynal, qemu list, hanweidong,
	Juan Quintela, dgilbert, Amit Shah, Christian Pinto, xuquan8

Hi Andrea,

I noticed that you call the change_protection() helper from mprotect to
implement the write-protect capability for userfault, but I doubt mprotect
can work properly with KVM. If the shadow page table entry (spte) used by
the VM is already established in the EPT, change_protection() does not
remove its write permission; it only invalidates the host page table and
the shadow page table (KVM registers invalidate_page/invalidate_range_start).

I investigated KSM. Since it can merge pages used by the VM, it also needs
to remove write permission from those pages, but its process is not the same
as mprotect's: it has a helper, write_protect_page(), which finally calls the
change_pte hook in KVM, and that removes the page's write permission in the
EPT page table.

The code path is:
write_protect_page
     -> set_pte_at_notify
        -> mmu_notifier_change_pte
           -> mn->ops->change_pte
              -> kvm_mmu_notifier_change_pte

(If I'm wrong, please let me know :) ).

So IMHO we can make userfault support KVM by following KSM's approach.
I will investigate it more deeply and try to implement it, but I'm not
very familiar with the kernel's memory subsystem, so it will take me some
time to study it first...
I'd like to know whether you have any plans for supporting KVM in userfault.

Thanks,
Hailiang

On 2016/9/18 10:14, Hailiang Zhang wrote:
> Hi Andrea,
>
> Any comments ?
>
> Thanks.
>
> On 2016/9/6 11:39, Hailiang Zhang wrote:
>> Hi Andrea,
>>
>> I tested it with the new live memory snapshot with --enable-kvm, it doesn't work.
>>
>> To make things simple, I simplified the codes, only left the codes that can tested
>> the write-protect capability. You can find the codes from
>> https://github.com/coloft/qemu/tree/test-userfault-write-protect.
>> You can reproduce the problem easily with it.
>>
>> Tested result as follow,
>> [root@localhost qemu]# x86_64-softmmu/qemu-system-x86_64 --enable-kvm -drive file=/mnt/sdb/win7/win7.qcow2,if=none,id=drive-ide0-0-1,format=qcow2,cache=none  -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1  -vnc :7 -m 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0  --monitor stdio
>> QEMU 2.6.95 monitor - type 'help' for more information
>> (qemu) migrate file:/home/xxx
>> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
>> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
>> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
>> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
>> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
>> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
>> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
>> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
>> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
>> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
>> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
>> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
>> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
>> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
>> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
>> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
>> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
>> error: kvm run failed Bad address
>> EAX=00000004 EBX=00000000 ECX=83b2ac20 EDX=0000c022
>> ESI=85fe33f4 EDI=0000c020 EBP=83b2abcc ESP=83b2abc0
>> EIP=8bd2ff0c EFL=00010293 [--S-A-C] CPL=0 II=0 A20=1 SMM=0 HLT=0
>> ES =0023 00000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
>> CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
>> SS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> DS =0023 00000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
>> FS =0030 83b2dc00 00003748 00409300 DPL=0 DS   [-WA]
>> GS =0000 00000000 ffffffff 00000000
>> LDT=0000 00000000 ffffffff 00000000
>> TR =0028 801e2000 000020ab 00008b00 DPL=0 TSS32-busy
>> GDT=     80b95000 000003ff
>> IDT=     80b95400 000007ff
>> CR0=8001003b CR2=030b5000 CR3=00185000 CR4=000006f8
>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>> DR6=00000000ffff0ff0 DR7=0000000000000400
>> EFER=0000000000000800
>> Code=8b ff 55 8b ec 53 56 8b 75 08 57 8b 7e 34 56 e8 30 f7 ff ff <6a> 00 57 8a d8 e8 96 14 00 00 6a 04 83 c7 02 57 e8 8b 14 00 00 5f c6 46 5b 00 5e 8a c3 5b
>>
>> I investigated the KVM and userfault code. We use the MMU notifier to integrate KVM with the Linux
>> memory management subsystem.
>>
>> Here for userfault write-protect, the function calling paths are:
>> userfaultfd_ioctl
>>      -> userfaultfd_writeprotect
>>        -> mwriteprotect_range
>>          -> change_protection (Directly call mprotect helper here)
>>            -> change_protection_range
>>              -> change_pud_range
>>                -> change_pmd_range
>>                   -> mmu_notifier_invalidate_range_start(mm, mni_start, end);
>>                      -> kvm_mmu_notifier_invalidate_range_start (KVM module)
>> OK, here we remove the item from the spte. (If we use EPT hardware, we remove
>> the page table entry for it.)
>> That's why we get fault notifications for the VM.
>> And it seems that we can't fix the userfault (remove the page's write protection)
>> through this calling path.
>>
>> Here my question is: for the userfault write-protect capability, why do we remove the page table
>> entry instead of marking it read-only?
>> Actually, for KVM, we have an MMU notifier (kvm_mmu_notifier_change_pte) to do this;
>> we can use it to remove write permission from the KVM page table, just as KVM dirty log tracking
>> does. Please see the function __rmap_write_protect() in KVM.
>>
>> Another question: does mprotect() work normally with KVM? (I didn't test it.) I think
>> KSM and swap work properly with KVM.
>>
>> Besides, there seems to be a bug in the userfault write-protect code.
>> We check UFFDIO_COPY_MODE_DONTWAKE in userfaultfd_writeprotect; should it be
>> UFFDIO_WRITEPROTECT_MODE_DONTWAKE there?
>>
>> static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
>> 				    unsigned long arg)
>> {
>>           ... ...
>>
>> 	if (!(uffdio_wp.mode & UFFDIO_COPY_MODE_DONTWAKE)) {
>> 		range.start = uffdio_wp.range.start;
>> 		range.len = uffdio_wp.range.len;
>> 		wake_userfault(ctx, &range);
>> 	}
>> 	return ret;
>> }
>>
>> Thanks.
>> Hailiang
>>
>> On 2016/8/18 23:56, Andrea Arcangeli wrote:
>>> Hello everyone,
>>>
>>> I've an aa.git tree uptodate on the master & userfault branch (master
>>> includes other pending VM stuff, userfault branch only contains
>>> userfault enhancements):
>>>
>>> https://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/log/?h=userfault
>>>
>>> I didn't have time to test KVM live memory snapshot on it yet as I'm
>>> still working to improve it. Did anybody test it? However I'd be happy
>>> to take any bugreports and quickly solve anything that isn't working
>>> right with the shadow MMU.
>>>
>>> I got positive report already for another usage of the uffd WP support:
>>>
>>> https://medium.com/@MartinCracauer/generational-garbage-collection-write-barriers-write-protection-and-userfaultfd-2-8b0e796b8f7f
>>>
>>> The last few things I'm working on to finish the WP support are:
>>>
>>> 1) pte_swp_mkuffd_wp equivalent of pte_swp_mksoft_dirty to mark in a
>>>       vma->vm_flags with VM_UFFD_WP set, which swap entries were
>>>       generated while the pte was wrprotected.
>>>
>>> 2) to avoid all false positives the equivalent of pte_mksoft_dirty is
>>>       needed too... and that requires spare software bits on the pte
>>>       which are available on x86. I considered also taking over the
>>>       soft_dirty bit but then you couldn't do checkpoint restore of a
>>>       JIT/to-native compiler that uses uffd WP support so it wasn't
>>>       ideal. Perhaps it would be ok as an incremental patch to make the
>>>       two options mutually exclusive to defer the arch changes that
>>>       pte_mkuffd_wp would require for later.
>>>
>>> 3) prevent UFFDIO_ZEROPAGE if registering WP|MISSING or trigger a
>>>       cow in userfaultfd_writeprotect.
>>>
>>> 4) WP selftest
>>>
>>> In theory things should work ok already if the userland code is
>>> tolerant against false positives through swap and after fork() and
>>> KSM. For a usage like snapshotting, false positives shouldn't be an
>>> issue (it'll just run slower if you swap in the worst case), and point
>>> 3) above also isn't an issue because it's going to register into uffd
>>> with WP only.
>>>
>>> The current status includes:
>>>
>>> 1) WP support for anon (with false positives.. work in progress)
>>>
>>> 2) MISSING support for tmpfs and hugetlbfs
>>>
>>> 3) non cooperative support
>>>
>>> Thanks,
>>> Andrea
>>>
>>> .
>>>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
  2016-08-20  6:31           ` Hailiang Zhang
@ 2017-02-27 15:37             ` Christian Pinto
  2017-02-28  1:48               ` Hailiang Zhang
  2017-03-09 11:34             ` [Qemu-devel] [RFC PATCH 0/4] ARM/ARM64 fixes for live " Christian Pinto
  1 sibling, 1 reply; 48+ messages in thread
From: Christian Pinto @ 2017-02-27 15:37 UTC (permalink / raw)
  To: Hailiang Zhang, Andrea Arcangeli, Juan Quintela, dgilbert, Amit Shah
  Cc: peter.huangpeng, Baptiste Reynal, qemu list, hanweidong, colo-ft

Hello Hailiang,

are there any updates on this patch series? Are you planning to release 
a new version?

You say there are some issues with the current snapshot-v2 version, 
which issues were you referring to? On my side the only problem I have 
seen was that the live snapshot was not working on ARMv8, but I have 
fixed that and managed to successfully snapshot and restore a QEMU ARMv8 
tcg machine on an ARMv8 host. I will gladly contribute these fixes
once you release a new version of the patches.


Thanks a lot,

Christian

On 20/08/2016 08:31, Hailiang Zhang wrote:
> Hi,
>
> I updated this series, but didn't post it, because I hit some
> problems while testing the snapshot function.
> I don't know whether they are userfaultfd issues or not.
> I don't have time to investigate them this month. I have put the patches on GitHub:
> https://github.com/coloft/qemu/tree/snapshot-v2
>
> Anyone who wants to test or modify them is welcome!
>
> Besides, will you attend LinuxCon or the KVM Forum in Canada?
> I hope to see you there if you attend ;)
>
> Thanks,
> Hailiang
>
>
>
> On 2016/8/18 23:56, Andrea Arcangeli wrote:
>> Hello everyone,
>>
>> I've an aa.git tree uptodate on the master & userfault branch (master
>> includes other pending VM stuff, userfault branch only contains
>> userfault enhancements):
>>
>> https://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/log/?h=userfault 
>>
>>
>> I didn't have time to test KVM live memory snapshot on it yet as I'm
>> still working to improve it. Did anybody test it? However I'd be happy
>> to take any bugreports and quickly solve anything that isn't working
>> right with the shadow MMU.
>>
>> I got positive report already for another usage of the uffd WP support:
>>
>> https://medium.com/@MartinCracauer/generational-garbage-collection-write-barriers-write-protection-and-userfaultfd-2-8b0e796b8f7f 
>>
>>
>> The last few things I'm working on to finish the WP support are:
>>
>> 1) pte_swp_mkuffd_wp equivalent of pte_swp_mksoft_dirty to mark in a
>>     vma->vm_flags with VM_UFFD_WP set, which swap entries were
>>     generated while the pte was wrprotected.
>>
>> 2) to avoid all false positives the equivalent of pte_mksoft_dirty is
>>     needed too... and that requires spare software bits on the pte
>>     which are available on x86. I considered also taking over the
>>     soft_dirty bit but then you couldn't do checkpoint restore of a
>>     JIT/to-native compiler that uses uffd WP support so it wasn't
>>     ideal. Perhaps it would be ok as an incremental patch to make the
>>     two options mutually exclusive to defer the arch changes that
>>     pte_mkuffd_wp would require for later.
>>
>> 3) prevent UFFDIO_ZEROPAGE if registering WP|MISSING or trigger a
>>     cow in userfaultfd_writeprotect.
>>
>> 4) WP selftest
>>
>> In theory things should work ok already if the userland code is
>> tolerant against false positives through swap and after fork() and
>> KSM. For a usage like snapshotting, false positives shouldn't be an
>> issue (it'll just run slower if you swap in the worst case), and point
>> 3) above also isn't an issue because it's going to register into uffd
>> with WP only.
>>
>> The current status includes:
>>
>> 1) WP support for anon (with false positives.. work in progress)
>>
>> 2) MISSING support for tmpfs and hugetlbfs
>>
>> 3) non cooperative support
>>
>> Thanks,
>> Andrea
>>
>> .
>>
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
  2017-02-27 15:37             ` Christian Pinto
@ 2017-02-28  1:48               ` Hailiang Zhang
  2017-02-28  8:30                 ` Christian Pinto
  2017-02-28 16:14                 ` Andrea Arcangeli
  0 siblings, 2 replies; 48+ messages in thread
From: Hailiang Zhang @ 2017-02-28  1:48 UTC (permalink / raw)
  To: Christian Pinto, Andrea Arcangeli, Juan Quintela, dgilbert
  Cc: xuquan8, peter.huangpeng, Baptiste Reynal, qemu list,
	Huangweidong (C),
	colo-ft

Hi,

On 2017/2/27 23:37, Christian Pinto wrote:
> Hello Hailiang,
>
> are there any updates on this patch series? Are you planning to release
> a new version?
>

No, userfaultfd still does not support write-protect for KVM.
You can see the newest discussion about it here:
https://lists.gnu.org/archive/html/qemu-devel/2016-12/msg01127.html

> You say there are some issues with the current snapshot-v2 version,
> which issues were you referring to? On my side the only problem I have
> seen was that the live snapshot was not working on ARMv8, but I have
> fixed that and managed to successfully snapshot and restore a QEMU ARMv8
> tcg machine on an ARMv8 host. I will gladly contribute with these fixes
> once you will release a new version of the patches.
>

Yes, the current implementation of live snapshot supports tcg,
but it does not support kvm mode, for the reason I mentioned above.
If you try to implement it, I think you need to start from making
userfaultfd support KVM. There is a scenario for it, but I'm blocked by
other things these days. I'm glad to discuss it with you if you plan to do it.

Thanks.
Hailiang

>
> Thanks a lot,
>
> Christian
>
> On 20/08/2016 08:31, Hailiang Zhang wrote:
>> Hi,
>>
>> I updated this series, but didn't post it, because I hit some
>> problems while testing the snapshot function.
>> I don't know whether they are userfaultfd issues or not.
>> I don't have time to investigate them this month. I have put the patches on GitHub:
>> https://github.com/coloft/qemu/tree/snapshot-v2
>>
>> Anyone who wants to test or modify them is welcome!
>>
>> Besides, will you attend LinuxCon or the KVM Forum in Canada?
>> I hope to see you there if you attend ;)
>>
>> Thanks,
>> Hailiang
>>
>>
>>
>> On 2016/8/18 23:56, Andrea Arcangeli wrote:
>>> Hello everyone,
>>>
>>> I've an aa.git tree uptodate on the master & userfault branch (master
>>> includes other pending VM stuff, userfault branch only contains
>>> userfault enhancements):
>>>
>>> https://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/log/?h=userfault
>>>
>>>
>>> I didn't have time to test KVM live memory snapshot on it yet as I'm
>>> still working to improve it. Did anybody test it? However I'd be happy
>>> to take any bugreports and quickly solve anything that isn't working
>>> right with the shadow MMU.
>>>
>>> I got positive report already for another usage of the uffd WP support:
>>>
>>> https://medium.com/@MartinCracauer/generational-garbage-collection-write-barriers-write-protection-and-userfaultfd-2-8b0e796b8f7f
>>>
>>>
>>> The last few things I'm working on to finish the WP support are:
>>>
>>> 1) pte_swp_mkuffd_wp equivalent of pte_swp_mksoft_dirty to mark in a
>>>      vma->vm_flags with VM_UFFD_WP set, which swap entries were
>>>      generated while the pte was wrprotected.
>>>
>>> 2) to avoid all false positives the equivalent of pte_mksoft_dirty is
>>>      needed too... and that requires spare software bits on the pte
>>>      which are available on x86. I considered also taking over the
>>>      soft_dirty bit but then you couldn't do checkpoint restore of a
>>>      JIT/to-native compiler that uses uffd WP support so it wasn't
>>>      ideal. Perhaps it would be ok as an incremental patch to make the
>>>      two options mutually exclusive to defer the arch changes that
>>>      pte_mkuffd_wp would require for later.
>>>
>>> 3) prevent UFFDIO_ZEROPAGE if registering WP|MISSING or trigger a
>>>      cow in userfaultfd_writeprotect.
>>>
>>> 4) WP selftest
>>>
>>> In theory things should work ok already if the userland code is
>>> tolerant against false positives through swap and after fork() and
>>> KSM. For a usage like snapshotting, false positives shouldn't be an
>>> issue (it'll just run slower if you swap in the worst case), and point
>>> 3) above also isn't an issue because it's going to register into uffd
>>> with WP only.
>>>
>>> The current status includes:
>>>
>>> 1) WP support for anon (with false positives.. work in progress)
>>>
>>> 2) MISSING support for tmpfs and hugetlbfs
>>>
>>> 3) non cooperative support
>>>
>>> Thanks,
>>> Andrea
>>>
>>> .
>>>
>>
>
>
> .
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
  2017-02-28  1:48               ` Hailiang Zhang
@ 2017-02-28  8:30                 ` Christian Pinto
  2017-02-28 16:14                 ` Andrea Arcangeli
  1 sibling, 0 replies; 48+ messages in thread
From: Christian Pinto @ 2017-02-28  8:30 UTC (permalink / raw)
  To: Hailiang Zhang, Andrea Arcangeli, Juan Quintela, dgilbert
  Cc: xuquan8, peter.huangpeng, Baptiste Reynal, qemu list,
	Huangweidong (C),
	colo-ft

Thanks a lot Hailiang


On 28/02/2017 02:48, Hailiang Zhang wrote:
> Hi,
>
> On 2017/2/27 23:37, Christian Pinto wrote:
>> Hello Hailiang,
>>
>> are there any updates on this patch series? Are you planning to release
>> a new version?
>>
>
> No, userfaultfd still does not support write-protect for KVM.
> You can see the newest discussion about it here:
> https://lists.gnu.org/archive/html/qemu-devel/2016-12/msg01127.html
>

Yes, I have read that part of the discussion and quickly managed to
reproduce the "Bad address" error on ARMv8.

>> You say there are some issues with the current snapshot-v2 version,
>> which issues were you referring to? On my side the only problem I have
>> seen was that the live snapshot was not working on ARMv8, but I have
>> fixed that and managed to successfully snapshot and restore a QEMU ARMv8
>> tcg machine on an ARMv8 host. I will gladly contribute with these fixes
>> once you will release a new version of the patches.
>>
>
> Yes, the current implementation of live snapshot supports tcg,
> but it does not support kvm mode, for the reason I mentioned above.
> If you try to implement it, I think you need to start from making
> userfaultfd support KVM. There is a scenario for it, but I'm blocked by
> other things these days. I'm glad to discuss it with you if you plan to do it.
>

I will have a deeper look at why userfault is not yet working with KVM
and get back to this thread with feedback/suggestions.

Thanks,
Christian


> Thanks.
> Hailiang
>
>>
>> Thanks a lot,
>>
>> Christian
>>
>> On 20/08/2016 08:31, Hailiang Zhang wrote:
>>> Hi,
>>>
>>> I updated this series, but didn't post it, because I hit some
>>> problems while testing the snapshot function.
>>> I don't know whether they are userfaultfd issues or not.
>>> I don't have time to investigate them this month. I have put the patches
>>> on GitHub:
>>> https://github.com/coloft/qemu/tree/snapshot-v2
>>>
>>> Anyone who wants to test or modify them is welcome!
>>>
>>> Besides, will you attend LinuxCon or the KVM Forum in Canada?
>>> I hope to see you there if you attend ;)
>>>
>>> Thanks,
>>> Hailiang
>>>
>>>
>>>
>>> On 2016/8/18 23:56, Andrea Arcangeli wrote:
>>>> Hello everyone,
>>>>
>>>> I've an aa.git tree uptodate on the master & userfault branch (master
>>>> includes other pending VM stuff, userfault branch only contains
>>>> userfault enhancements):
>>>>
>>>> https://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/log/?h=userfault 
>>>>
>>>>
>>>>
>>>> I didn't have time to test KVM live memory snapshot on it yet as I'm
>>>> still working to improve it. Did anybody test it? However I'd be happy
>>>> to take any bugreports and quickly solve anything that isn't working
>>>> right with the shadow MMU.
>>>>
>>>> I got positive report already for another usage of the uffd WP 
>>>> support:
>>>>
>>>> https://medium.com/@MartinCracauer/generational-garbage-collection-write-barriers-write-protection-and-userfaultfd-2-8b0e796b8f7f 
>>>>
>>>>
>>>>
>>>> The last few things I'm working on to finish the WP support are:
>>>>
>>>> 1) pte_swp_mkuffd_wp equivalent of pte_swp_mksoft_dirty to mark in a
>>>>      vma->vm_flags with VM_UFFD_WP set, which swap entries were
>>>>      generated while the pte was wrprotected.
>>>>
>>>> 2) to avoid all false positives the equivalent of pte_mksoft_dirty is
>>>>      needed too... and that requires spare software bits on the pte
>>>>      which are available on x86. I considered also taking over the
>>>>      soft_dirty bit but then you couldn't do checkpoint restore of a
>>>>      JIT/to-native compiler that uses uffd WP support so it wasn't
>>>>      ideal. Perhaps it would be ok as an incremental patch to make the
>>>>      two options mutually exclusive to defer the arch changes that
>>>>      pte_mkuffd_wp would require for later.
>>>>
>>>> 3) prevent UFFDIO_ZEROPAGE if registering WP|MISSING or trigger a
>>>>      cow in userfaultfd_writeprotect.
>>>>
>>>> 4) WP selftest
>>>>
>>>> In theory things should work ok already if the userland code is
>>>> tolerant against false positives through swap and after fork() and
>>>> KSM. For a usage like snapshotting, false positives shouldn't be an
>>>> issue (it'll just run slower if you swap in the worst case), and point
>>>> 3) above also isn't an issue because it's going to register into uffd
>>>> with WP only.
>>>>
>>>> The current status includes:
>>>>
>>>> 1) WP support for anon (with false positives.. work in progress)
>>>>
>>>> 2) MISSING support for tmpfs and hugetlbfs
>>>>
>>>> 3) non cooperative support
>>>>
>>>> Thanks,
>>>> Andrea
>>>>
>>>> .
>>>>
>>>
>>
>>
>> .
>>
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
  2017-02-28  1:48               ` Hailiang Zhang
  2017-02-28  8:30                 ` Christian Pinto
@ 2017-02-28 16:14                 ` Andrea Arcangeli
  2017-03-01  1:08                   ` Hailiang Zhang
  1 sibling, 1 reply; 48+ messages in thread
From: Andrea Arcangeli @ 2017-02-28 16:14 UTC (permalink / raw)
  To: Hailiang Zhang
  Cc: Christian Pinto, Juan Quintela, dgilbert, xuquan8,
	peter.huangpeng, Baptiste Reynal, qemu list, Huangweidong (C),
	colo-ft

Hello,

On Tue, Feb 28, 2017 at 09:48:26AM +0800, Hailiang Zhang wrote:
> Yes, for current implementing of live snapshot, it supports tcg,
> but does not support kvm mode, the reason i have mentioned above,
> if you try to implement it, i think you need to start from userfaultfd
> supporting KVM. There is scenario for it, But I'm blocked by other things
> these days. I'm glad to discuss with you if you planed to do it.

Yes, there were other urgent userfaultfd features needed by QEMU and
CRIU queued for merging (hugetlbfs/shmem/non-cooperative support) and
they're all included upstream now. Now that such work is finished,
fixing the WP support to work with KVM and to provide full accuracy
will be the next thing to do.

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
  2017-02-28 16:14                 ` Andrea Arcangeli
@ 2017-03-01  1:08                   ` Hailiang Zhang
  0 siblings, 0 replies; 48+ messages in thread
From: Hailiang Zhang @ 2017-03-01  1:08 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: xuquan8, Christian Pinto, Juan Quintela, dgilbert,
	peter.huangpeng, Baptiste Reynal, qemu list, Huangweidong (C),
	colo-ft

On 2017/3/1 0:14, Andrea Arcangeli wrote:
> Hello,
>
> On Tue, Feb 28, 2017 at 09:48:26AM +0800, Hailiang Zhang wrote:
>> Yes, for current implementing of live snapshot, it supports tcg,
>> but does not support kvm mode, the reason i have mentioned above,
>> if you try to implement it, i think you need to start from userfaultfd
>> supporting KVM. There is scenario for it, But I'm blocked by other things
>> these days. I'm glad to discuss with you if you planed to do it.
>
> Yes, there were other urgent userfaultfd features needed by QEMU and
> CRIU queued for merging (hugetlbfs/shmem/non-cooperative support) and
> they're all included upstream now. Now that such work is finished,
> fixing the WP support to work with KVM and to provide full accuracy
> will be the next thing to do.
>

Great, looking forward to it. thanks.

> Thanks,
> Andrea
>
> .
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Qemu-devel] [RFC PATCH 0/4] ARM/ARM64 fixes for live memory snapshot based on userfaultfd
  2016-08-20  6:31           ` Hailiang Zhang
  2017-02-27 15:37             ` Christian Pinto
@ 2017-03-09 11:34             ` Christian Pinto
  2017-03-09 11:34               ` [Qemu-devel] [RFC PATCH 1/4] migration/postcopy-ram: check pagefault flags in userfaultfd thread Christian Pinto
                                 ` (4 more replies)
  1 sibling, 5 replies; 48+ messages in thread
From: Christian Pinto @ 2017-03-09 11:34 UTC (permalink / raw)
  To: zhang.zhanghailiang
  Cc: b.reynal, aarcange, quintela, dgilbert, amit.shah,
	peter.huangpeng, hanweidong, qemu-devel, tech, Christian Pinto

This patch series introduces a set of fixes to the previous work proposed by
Hailiang Zhang to enable in QEMU live memory snapshot based
on userfaultfd. See discussion here:
http://www.mail-archive.com/qemu-devel@nongnu.org/msg393118.html

These patches apply on top of: 
https://github.com/coloft/qemu/tree/snapshot-v2
that is the latest version of Hailiang's work, and rely on the latest work on
userfaultfd available on Andrea Arcangeli's Linux kernel tree:
https://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/log/?h=userfault

The original work was mainly tested on x86 tcg machines and was not working
on ARM/ARM64 tcg.
The fixes presented in this series enable the live memory snapshot
to work for ARM64 tcg guests running on top of an ARM64 host.

The main problems encountered were:
    - For ARM, QEMU uses a target memory page size of 1KB. Even though this
      size is not supported by the Linux kernel, it is kept for backward
      compatibility with older ARM CPU MMUs. The initial work was
      write-unprotecting pages at a granularity not always aligned with the
      host page size, causing userfaultfd to fail.
    - The VM execution was resumed right before the status of the migration
      was switched from MIGRATION_STATUS_SETUP to MIGRATION_STATUS_ACTIVE.
      This again caused the VM to trigger a "Bus error", due to the wrong
      state of some memory pages.
    - When unprotecting a memory page, the flag
      UFFDIO_WRITEPROTECT_MODE_DONTWAKE was used. As a result, after a page
      was copied into the snapshot file, virtual machine execution was not
      resumed.


To test the patches on an ARM64 host, boot an ARM64 tcg machine:

qemu-system-aarch64 -machine virt,accel=tcg -cpu cortex-a57\
        -m 256 -kernel Image \
        -initrd rootfs.cpio.gz \
        -append "earlyprintk rw console=ttyAMA0" \
        -net nic -net user \
        -nographic -serial pty -monitor stdio

start migration from QEMU monitor:

    (qemu) migrate file:/root/test_snapshot


resume the VM from the snapshot:

qemu-system-aarch64 -machine virt,accel=tcg -cpu cortex-a57\
        -m 256 -kernel Image \
        -initrd rootfs.cpio.gz \
        -append "earlyprintk rw console=ttyAMA0" \
        -net nic -net user \
        -nographic -serial stdio -monitor pty \
        -incoming file:/root/test_snapshot

Christian Pinto (4):
  migration/postcopy-ram: check pagefault flags in userfaultfd thread
  migration/ram: Fix for ARM/ARM64 page size
  migration: snapshot thread
  migration/postcopy-ram: ram_set_pages_wp fix

 migration/migration.c    |  9 +++++----
 migration/postcopy-ram.c | 25 ++++++++-----------------
 migration/ram.c          | 18 ++++++++++++++----
 3 files changed, 27 insertions(+), 25 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Qemu-devel] [RFC PATCH 1/4] migration/postcopy-ram: check pagefault flags in userfaultfd thread
  2017-03-09 11:34             ` [Qemu-devel] [RFC PATCH 0/4] ARM/ARM64 fixes for live " Christian Pinto
@ 2017-03-09 11:34               ` Christian Pinto
  2017-03-09 11:34               ` [Qemu-devel] [RFC PATCH 2/4] migration/ram: Fix for ARM/ARM64 page size Christian Pinto
                                 ` (3 subsequent siblings)
  4 siblings, 0 replies; 48+ messages in thread
From: Christian Pinto @ 2017-03-09 11:34 UTC (permalink / raw)
  To: zhang.zhanghailiang
  Cc: b.reynal, aarcange, quintela, dgilbert, amit.shah,
	peter.huangpeng, hanweidong, qemu-devel, tech, Christian Pinto

UFFD_PAGEFAULT_FLAG_WP should be set every time a page fault is due to a
write to a write-protected page. The flag should be checked every time
to make sure the page fault is due to a write into a write-protected area.

Signed-off-by: Christian Pinto <c.pinto@virtualopensystems.com>
Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com>
---
 migration/postcopy-ram.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index ea70bd5d16..9c45f1059f 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -534,7 +534,12 @@ static void *postcopy_ram_fault_thread(void *opaque)
                 migrate_send_rp_req_pages(mis, NULL,
                                           rb_offset, hostpagesize);
             }
-        } else { /* UFFDIO_REGISTER_MODE_WP */
+        } else if (msg.arg.pagefault.flags &
+                    UFFD_PAGEFAULT_FLAG_WP) { /* UFFDIO_REGISTER_MODE_WP */
+            /*
+             * msg.arg.pagefault.flags &UFFD_PAGEFAULT_FLAG_WP expected to
+             * be set in case of pagefault due to write protected page
+             * */
             MigrationState *ms = container_of(us, MigrationState,
                                               userfault_state);
             ret = ram_save_queue_pages(ms, qemu_ram_get_idstr(rb), rb_offset,
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [RFC PATCH 2/4] migration/ram: Fix for ARM/ARM64 page size
  2017-03-09 11:34             ` [Qemu-devel] [RFC PATCH 0/4] ARM/ARM64 fixes for live " Christian Pinto
  2017-03-09 11:34               ` [Qemu-devel] [RFC PATCH 1/4] migration/postcopy-ram: check pagefault flags in userfaultfd thread Christian Pinto
@ 2017-03-09 11:34               ` Christian Pinto
  2017-03-09 11:34               ` [Qemu-devel] [RFC PATCH 3/4] migration: snapshot thread Christian Pinto
                                 ` (2 subsequent siblings)
  4 siblings, 0 replies; 48+ messages in thread
From: Christian Pinto @ 2017-03-09 11:34 UTC (permalink / raw)
  To: zhang.zhanghailiang
  Cc: b.reynal, aarcange, quintela, dgilbert, amit.shah,
	peter.huangpeng, hanweidong, qemu-devel, tech, Christian Pinto

Architectures such as ARM use a target page size of 1KB, while write
protection is done at the granularity of host pages (generally 4KB).
All addresses must always be aligned to the host page size.

Signed-off-by: Christian Pinto <c.pinto@virtualopensystems.com>
Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com>
---
 migration/postcopy-ram.c |  6 +++---
 migration/ram.c          | 18 ++++++++++++++----
 2 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 9c45f1059f..97382067b3 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -373,7 +373,7 @@ int postcopy_ram_prepare_discard(MigrationIncomingState *mis)
     return 0;
 }
 
-int ram_set_pages_wp(uint64_t page_addr,
+int ram_set_pages_wp(ram_addr_t page_addr,
                      uint64_t size,
                      bool remove,
                      int uffd)
@@ -556,10 +556,10 @@ static void *postcopy_ram_fault_thread(void *opaque)
             * will be an deadlock error.
             */
             if (migration_in_setup(ms)) {
-                uint64_t host = msg.arg.pagefault.address;
+                ram_addr_t host = msg.arg.pagefault.address;
 
                 host &= ~(hostpagesize - 1);
-                ret = ram_set_pages_wp(host, getpagesize(), true,
+                ret = ram_set_pages_wp(host, hostpagesize, true,
                                        us->userfault_fd);
                 if (ret < 0) {
                     error_report("Remove page's write-protect failed");
diff --git a/migration/ram.c b/migration/ram.c
index 3417c56f29..7a3b1c7ed3 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1383,14 +1383,24 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
             /* For snapshot, we will remove the page write-protect here */
             if (migration_in_snapshot(ms)) {
                 int ret;
-                uint64_t host_addr = (uint64_t)(pss.block->host + pss.offset);
-
+                /*
+                 * Some architectures in QEMU use a smaller memory page size
+                 * with respect to the host page size. ARM as an example
+                 * uses 1K memory pages, while Linux supports pages of minimum
+                 * size of 4K.
+                 * Userfault write protection works at the level of a host page
+                 * and thus one full host page has to be protected/unprotected
+                 * every time.
+                 */
+                ram_addr_t host_addr = (ram_addr_t)(pss.block->host +
+                                    pss.offset) & (~(qemu_host_page_size - 1));
                 ret = ram_set_pages_wp(host_addr, getpagesize(), true,
                                        ms->userfault_state.userfault_fd);
                 if (ret < 0) {
                     error_report("Failed to remove the write-protect for page:"
-                                 "%"PRIx64 " length: %d, block: %s", host_addr,
-                                 getpagesize(), pss.block->idstr);
+                                "%"PRIx64 " length: %d, offset: %"PRIx64
+                                ", block: %s", host_addr, getpagesize(),
+                                pss.offset, pss.block->idstr);
                 }
             }
         }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [RFC PATCH 3/4] migration: snapshot thread
  2017-03-09 11:34             ` [Qemu-devel] [RFC PATCH 0/4] ARM/ARM64 fixes for live " Christian Pinto
  2017-03-09 11:34               ` [Qemu-devel] [RFC PATCH 1/4] migration/postcopy-ram: check pagefault flags in userfaultfd thread Christian Pinto
  2017-03-09 11:34               ` [Qemu-devel] [RFC PATCH 2/4] migration/ram: Fix for ARM/ARM64 page size Christian Pinto
@ 2017-03-09 11:34               ` Christian Pinto
  2017-03-09 11:34               ` [Qemu-devel] [RFC PATCH 4/4] migration/postcopy-ram: ram_set_pages_wp fix Christian Pinto
  2017-03-09 17:46               ` [Qemu-devel] [RFC PATCH 0/4] ARM/ARM64 fixes for live memory snapshot based on userfaultfd Dr. David Alan Gilbert
  4 siblings, 0 replies; 48+ messages in thread
From: Christian Pinto @ 2017-03-09 11:34 UTC (permalink / raw)
  To: zhang.zhanghailiang
  Cc: b.reynal, aarcange, quintela, dgilbert, amit.shah,
	peter.huangpeng, hanweidong, qemu-devel, tech, Christian Pinto

VM execution was resumed while the migration was still in the setup
phase. This caused a bus error, because the userfault thread was
waking up the VM too early during migration setup.

Signed-off-by: Christian Pinto <c.pinto@virtualopensystems.com>
Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com>
---
 migration/migration.c    |  9 +++++----
 migration/postcopy-ram.c | 14 --------------
 2 files changed, 5 insertions(+), 18 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index f6d68ca020..19e8da1f84 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1894,15 +1894,16 @@ static void *snapshot_thread(void *opaque)
     postcopy_ram_enable_notify(&ms->userfault_state, UFFDIO_REGISTER_MODE_WP);
     buffer = qemu_save_device_buffer();
 
+    migrate_set_state(&ms->state, MIGRATION_STATUS_SETUP,
+            MIGRATION_STATUS_ACTIVE);
+
+    trace_snapshot_thread_setup_complete();
+
     if (old_vm_running) {
         vm_start();
     }
     qemu_mutex_unlock_iothread();
 
-    migrate_set_state(&ms->state, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_ACTIVE);
-
-    trace_snapshot_thread_setup_complete();
-
     while (qemu_file_get_error(ms->to_dst_file) == 0) {
         if (qemu_savevm_state_iterate(ms->to_dst_file, false) > 0) {
             break;
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 97382067b3..6252eb379a 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -551,20 +551,6 @@ static void *postcopy_ram_fault_thread(void *opaque)
                 break;
             }
 
-            /* Note: In the setup process, snapshot_thread may modify VM's
-            * write-protected pages, we should not block it there, or there
-            * will be an deadlock error.
-            */
-            if (migration_in_setup(ms)) {
-                ram_addr_t host = msg.arg.pagefault.address;
-
-                host &= ~(hostpagesize - 1);
-                ret = ram_set_pages_wp(host, hostpagesize, true,
-                                       us->userfault_fd);
-                if (ret < 0) {
-                    error_report("Remove page's write-protect failed");
-                }
-            }
         }
     }
     trace_postcopy_ram_fault_thread_exit();
-- 
2.11.0


* [Qemu-devel] [RFC PATCH 4/4] migration/postcopy-ram: ram_set_pages_wp fix
  2017-03-09 11:34             ` [Qemu-devel] [RFC PATCH 0/4] ARM/ARM64 fixes for live " Christian Pinto
                                 ` (2 preceding siblings ...)
  2017-03-09 11:34               ` [Qemu-devel] [RFC PATCH 3/4] migration: snapshot thread Christian Pinto
@ 2017-03-09 11:34               ` Christian Pinto
  2017-03-09 17:46               ` [Qemu-devel] [RFC PATCH 0/4] ARM/ARM64 fixes for live memory snapshot based on userfaultfd Dr. David Alan Gilbert
  4 siblings, 0 replies; 48+ messages in thread
From: Christian Pinto @ 2017-03-09 11:34 UTC (permalink / raw)
  To: zhang.zhanghailiang
  Cc: b.reynal, aarcange, quintela, dgilbert, amit.shah,
	peter.huangpeng, hanweidong, qemu-devel, tech, Christian Pinto

Setting UFFDIO_WRITEPROTECT_MODE_DONTWAKE when write-unprotecting a page
does not wake up the faulting thread.
Set the mode to 0 instead, to force the faulting (VM) thread to wake up.

Signed-off-by: Christian Pinto <c.pinto@virtualopensystems.com>
Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com>
---
 migration/postcopy-ram.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 6252eb379a..684faae614 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -384,7 +384,7 @@ int ram_set_pages_wp(ram_addr_t page_addr,
     wp_struct.range.start = (uint64_t)(uintptr_t)page_addr;
     wp_struct.range.len = size;
     if (remove) {
-        wp_struct.mode = UFFDIO_WRITEPROTECT_MODE_DONTWAKE;
+        wp_struct.mode = 0;
     } else {
         wp_struct.mode = UFFDIO_WRITEPROTECT_MODE_WP;
     }
-- 
2.11.0


* Re: [Qemu-devel] [RFC PATCH 0/4] ARM/ARM64 fixes for live memory snapshot based on userfaultfd
  2017-03-09 11:34             ` [Qemu-devel] [RFC PATCH 0/4] ARM/ARM64 fixes for live " Christian Pinto
                                 ` (3 preceding siblings ...)
  2017-03-09 11:34               ` [Qemu-devel] [RFC PATCH 4/4] migration/postcopy-ram: ram_set_pages_wp fix Christian Pinto
@ 2017-03-09 17:46               ` Dr. David Alan Gilbert
  2017-03-10  8:15                 ` Christian Pinto
  4 siblings, 1 reply; 48+ messages in thread
From: Dr. David Alan Gilbert @ 2017-03-09 17:46 UTC (permalink / raw)
  To: Christian Pinto
  Cc: zhang.zhanghailiang, b.reynal, aarcange, quintela,
	peter.huangpeng, hanweidong, qemu-devel, tech

* Christian Pinto (c.pinto@virtualopensystems.com) wrote:
> This patch series introduces a set of fixes to the previous work proposed by
> Hailiang Zhang to enable live memory snapshots in QEMU based
> on userfaultfd. See discussion here:
> http://www.mail-archive.com/qemu-devel@nongnu.org/msg393118.html

Thanks for posting this,

> These patches apply on top of: 
> https://github.com/coloft/qemu/tree/snapshot-v2
> that is the latest version of Hailiang's work, and rely on the latest work on
> userfaultfd available on Andrea Arcangeli's Linux kernel tree:
> https://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/log/?h=userfault
> 
> The original work was mainly tested on x86 tcg machines and was not working
> on ARM/ARM64 tcg.
> The fixes presented in this series enable the live memory snapshot
> to work for ARM64 tcg guests running on top of an ARM64 host.
> 
> The main problems encountered were:
>     - QEMU uses a 1KB target memory page size for ARM. Even though this size
>       is not supported by the Linux kernel, it is kept for backward
>       compatibility with older ARM CPU MMUs. The initial work was
>       write-unprotecting pages with a granularity not always aligned to the
>       host page size, causing userfaultfd to fail.

Yes, Power similarly has a 4kb size for the target page size even though
the host kernel is normally a large page size.

>     - The VM execution was resumed right before the migration status was
>       switched from MIGRATION_STATUS_SETUP to MIGRATION_STATUS_ACTIVE.
>       This again caused the VM to trigger a "Bus error", due to the wrong
>       status of some memory pages.
>     - When unprotecting a memory page, the flag
>       UFFDIO_WRITEPROTECT_MODE_DONTWAKE was used. As a result, after a page
>       is copied into the snapshot file, the virtual machine execution is
>       not resumed.
> 
> 
> To test the patches on an ARM64 host, boot an ARM64 tcg machine:
> 
> qemu-system-aarch64 -machine virt,accel=tcg -cpu cortex-a57\
>         -m 256 -kernel Image \
>         -initrd rootfs.cpio.gz \
>         -append "earlyprintk rw console=ttyAMA0" \
>         -net nic -net user \
>         -nographic -serial pty -monitor stdio
> 
> start migration from QEMU monitor:
> 
>     (qemu) migrate file:/root/test_snapshot
> 
> 
> resume VM from snapshot:
> 
> qemu-system-aarch64 -machine virt,accel=tcg -cpu cortex-a57\
>         -m 256 -kernel Image \
>         -initrd rootfs.cpio.gz \
>         -append "earlyprintk rw console=ttyAMA0" \
>         -net nic -net user \
>         -nographic -serial stdio -monitor pty \
>         -incoming file:/root/test_snapshot

Nice, what's your use case and how are you dealing with storage?

Dave

> Christian Pinto (4):
>   migration/postcopy-ram: check pagefault flags in userfaultfd thread
>   migration/ram: Fix for ARM/ARM64 page size
>   migration: snapshot thread
>   migration/postcopy-ram: ram_set_pages_wp fix
> 
>  migration/migration.c    |  9 +++++----
>  migration/postcopy-ram.c | 25 ++++++++-----------------
>  migration/ram.c          | 18 ++++++++++++++----
>  3 files changed, 27 insertions(+), 25 deletions(-)
> 
> -- 
> 2.11.0
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC PATCH 0/4] ARM/ARM64 fixes for live memory snapshot based on userfaultfd
  2017-03-09 17:46               ` [Qemu-devel] [RFC PATCH 0/4] ARM/ARM64 fixes for live memory snapshot based on userfaultfd Dr. David Alan Gilbert
@ 2017-03-10  8:15                 ` Christian Pinto
  0 siblings, 0 replies; 48+ messages in thread
From: Christian Pinto @ 2017-03-10  8:15 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: zhang.zhanghailiang, b.reynal, aarcange, quintela,
	peter.huangpeng, hanweidong, qemu-devel, tech

Hello Alan,


On 09/03/2017 18:46, Dr. David Alan Gilbert wrote:
> * Christian Pinto (c.pinto@virtualopensystems.com) wrote:
>> This patch series introduces a set of fixes to the previous work proposed by
>> Hailiang Zhang to enable live memory snapshots in QEMU based
>> on userfaultfd. See discussion here:
>> http://www.mail-archive.com/qemu-devel@nongnu.org/msg393118.html
> Thanks for posting this,
>
>> These patches apply on top of:
>> https://github.com/coloft/qemu/tree/snapshot-v2
>> that is the latest version of Hailiang's work, and rely on the latest work on
>> userfaultfd available on Andrea Arcangeli's Linux kernel tree:
>> https://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/log/?h=userfault
>>
>> The original work was mainly tested on x86 tcg machines and was not working
>> on ARM/ARM64 tcg.
>> The fixes presented in this series enable the live memory snapshot
>> to work for ARM64 tcg guests running on top of an ARM64 host.
>>
>> The main problems encountered were:
>>      - QEMU uses a 1KB target memory page size for ARM. Even though this
>>        size is not supported by the Linux kernel, it is kept for backward
>>        compatibility with older ARM CPU MMUs. The initial work was
>>        write-unprotecting pages with a granularity not always aligned to
>>        the host page size, causing userfaultfd to fail.
> Yes, Power similarly has a 4kb size for the target page size even though
> the host kernel is normally a large page size.

The fix included in this series should solve the problem for Power as well,
since it makes sure the address passed to userfaultfd is aligned
to the host page size. So, if someone in the Power community is
interested in this functionality, this fix might come in handy.

>
>>      - The VM execution was resumed right before the migration status was
>>        switched from MIGRATION_STATUS_SETUP to MIGRATION_STATUS_ACTIVE.
>>        This again caused the VM to trigger a "Bus error", due to the wrong
>>        status of some memory pages.
>>      - When unprotecting a memory page, the flag
>>        UFFDIO_WRITEPROTECT_MODE_DONTWAKE was used. As a result, after a
>>        page is copied into the snapshot file, the virtual machine
>>        execution is not resumed.
>>
>>
>> To test the patches on an ARM64 host, boot an ARM64 tcg machine:
>>
>> qemu-system-aarch64 -machine virt,accel=tcg -cpu cortex-a57\
>>          -m 256 -kernel Image \
>>          -initrd rootfs.cpio.gz \
>>          -append "earlyprintk rw console=ttyAMA0" \
>>          -net nic -net user \
>>          -nographic -serial pty -monitor stdio
>>
>> start migration from QEMU monitor:
>>
>>      (qemu) migrate file:/root/test_snapshot
>>
>>
>> resume VM from snapshot:
>>
>> qemu-system-aarch64 -machine virt,accel=tcg -cpu cortex-a57\
>>          -m 256 -kernel Image \
>>          -initrd rootfs.cpio.gz \
>>          -append "earlyprintk rw console=ttyAMA0" \
>>          -net nic -net user \
>>          -nographic -serial stdio -monitor pty \
>>          -incoming file:/root/test_snapshot
> Nice, what's your use case and how are you dealing with storage?

This is a work done in the context of a H2020 European Project
named ExaNoDe (http://exanode.eu) that is building a prototype ARM64
based compute node for the exascale (computing capabilities in the order
of the Exaflop) domain. In this project, targeting HPC, scientific applications
using MPI will be executed in virtualized computing nodes (KVM VMs),
rather than directly on physical machines. This is mainly to improve the
manageability of the overall system and ease the task of separating
different workloads. The work done on live memory snapshot is meant
to tackle the problem of system resiliency, reducing the overall impact
on the virtualized software, and leading to higher availability of the
virtualized computing nodes.

For the time being we are focusing on memory, and storage has not yet
been taken into consideration. However, at first glance I would say that
storage in QEMU already uses CoW, which could be useful for this scenario
as well.


Thanks,

Christian

>
> Dave
>
>> Christian Pinto (4):
>>    migration/postcopy-ram: check pagefault flags in userfaultfd thread
>>    migration/ram: Fix for ARM/ARM64 page size
>>    migration: snapshot thread
>>    migration/postcopy-ram: ram_set_pages_wp fix
>>
>>   migration/migration.c    |  9 +++++----
>>   migration/postcopy-ram.c | 25 ++++++++-----------------
>>   migration/ram.c          | 18 ++++++++++++++----
>>   3 files changed, 27 insertions(+), 25 deletions(-)
>>
>> -- 
>> 2.11.0
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


end of thread, other threads:[~2017-03-10  8:15 UTC | newest]

Thread overview: 48+ messages
2016-01-07 12:19 [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd zhanghailiang
2016-01-07 12:19 ` [Qemu-devel] [RFC 01/13] postcopy/migration: Split fault related state into struct UserfaultState zhanghailiang
2016-01-07 12:19 ` [Qemu-devel] [RFC 02/13] migration: Allow the migrate command to work on file: urls zhanghailiang
2016-07-13 16:12   ` Dr. David Alan Gilbert
2016-07-14  5:27     ` Hailiang Zhang
2016-01-07 12:19 ` [Qemu-devel] [RFC 03/13] migration: Allow -incoming " zhanghailiang
2016-01-11 20:02   ` Dr. David Alan Gilbert
2016-01-12 13:04     ` Hailiang Zhang
2016-01-07 12:19 ` [Qemu-devel] [RFC 04/13] migration: Create a snapshot thread to realize saving memory snapshot zhanghailiang
2016-01-07 12:20 ` [Qemu-devel] [RFC 05/13] migration: implement initialization work for snapshot zhanghailiang
2016-01-07 12:20 ` [Qemu-devel] [RFC 06/13] QEMUSizedBuffer: Introduce two help functions for qsb zhanghailiang
2016-01-07 12:20 ` [Qemu-devel] [RFC 07/13] savevm: Split qemu_savevm_state_complete_precopy() into two helper functions zhanghailiang
2016-01-07 12:20 ` [Qemu-devel] [RFC 08/13] snapshot: Save VM's device state into snapshot file zhanghailiang
2016-01-07 12:20 ` [Qemu-devel] [RFC 09/13] migration/postcopy-ram: fix some helper functions to support userfaultfd write-protect zhanghailiang
2016-01-07 12:20 ` [Qemu-devel] [RFC 10/13] snapshot: Enable the write-protect notification capability for VM's RAM zhanghailiang
2016-01-07 12:20 ` [Qemu-devel] [RFC 11/13] snapshot/migration: Save VM's RAM into snapshot file zhanghailiang
2016-01-07 12:20 ` [Qemu-devel] [RFC 12/13] migration/ram: Fix some helper functions' parameter to use PageSearchStatus zhanghailiang
2016-01-11 17:55   ` Dr. David Alan Gilbert
2016-01-12 12:59     ` Hailiang Zhang
2016-01-07 12:20 ` [Qemu-devel] [RFC 13/13] snapshot: Remove page's write-protect and copy the content during setup stage zhanghailiang
2016-07-13 17:52   ` Dr. David Alan Gilbert
2016-07-14  8:02     ` Hailiang Zhang
2016-07-04 12:22 ` [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd Baptiste Reynal
2016-07-05  1:49   ` Hailiang Zhang
2016-07-05  9:57     ` Baptiste Reynal
2016-07-05 10:27       ` Hailiang Zhang
2016-08-18 15:56         ` Andrea Arcangeli
2016-08-20  6:31           ` Hailiang Zhang
2017-02-27 15:37             ` Christian Pinto
2017-02-28  1:48               ` Hailiang Zhang
2017-02-28  8:30                 ` Christian Pinto
2017-02-28 16:14                 ` Andrea Arcangeli
2017-03-01  1:08                   ` Hailiang Zhang
2017-03-09 11:34             ` [Qemu-devel] [RFC PATCH 0/4] ARM/ARM64 fixes for live " Christian Pinto
2017-03-09 11:34               ` [Qemu-devel] [RFC PATCH 1/4] migration/postcopy-ram: check pagefault flags in userfaultfd thread Christian Pinto
2017-03-09 11:34               ` [Qemu-devel] [RFC PATCH 2/4] migration/ram: Fix for ARM/ARM64 page size Christian Pinto
2017-03-09 11:34               ` [Qemu-devel] [RFC PATCH 3/4] migration: snapshot thread Christian Pinto
2017-03-09 11:34               ` [Qemu-devel] [RFC PATCH 4/4] migration/postcopy-ram: ram_set_pages_wp fix Christian Pinto
2017-03-09 17:46               ` [Qemu-devel] [RFC PATCH 0/4] ARM/ARM64 fixes for live memory snapshot based on userfaultfd Dr. David Alan Gilbert
2017-03-10  8:15                 ` Christian Pinto
2016-09-06  3:39           ` [Qemu-devel] [RFC 00/13] Live " Hailiang Zhang
2016-09-18  2:14             ` Hailiang Zhang
2016-12-08 12:45               ` Hailiang Zhang
2016-07-05 14:59       ` Andrea Arcangeli
2016-07-13 18:02 ` Dr. David Alan Gilbert
2016-07-14 10:24   ` Hailiang Zhang
2016-07-14 11:43     ` Dr. David Alan Gilbert
2016-07-19  6:53       ` Hailiang Zhang
