* [Qemu-devel] [PATCH v1 00/17] Background snapshots
@ 2018-07-18 15:41 Denis Plotnikov
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 01/17] migration: add background snapshot capability Denis Plotnikov
                   ` (17 more replies)
  0 siblings, 18 replies; 34+ messages in thread
From: Denis Plotnikov @ 2018-07-18 15:41 UTC (permalink / raw)
  To: dgilbert, quintela, pbonzini; +Cc: qemu-devel

The workflow to make a snapshot is as follows:
1. Pause the VM
2. Make a snapshot of the block devices using the scheme of your choice
3. Turn on the background-snapshot migration capability
4. Start the migration using the destination (migration stream) of your choice.
   The migration will resume VM execution by itself once it has the
   devices' states saved and is ready to start writing RAM to the
   migration stream.
5. Listen for the migration completion event
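
For illustration, one possible QMP sequence covering steps 3 and 4; the
exec: destination is only an example, any migration URI works:

  { "execute": "migrate-set-capabilities",
    "arguments": { "capabilities": [
      { "capability": "background-snapshot", "state": true } ] } }
  { "execute": "migrate",
    "arguments": { "uri": "exec:cat > vmstate.bin" } }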

The background snapshot works with the support of the KVM patch:
"x86: mmu: report failed memory access to the userspace"
(not applied to mainline yet; it is on the kvm mailing list)

--
Change log:

v0 => v1:
============
The patch series has been split into smaller chunks.

Denis Plotnikov (17):
  migration: add background snapshot capability
  bitops: add some atomic versions of bitmap operations
  threads: add infrastructure to process sigsegv
  background snapshot: make a dedicated type for ram block list
  ram: extend the data structures for background snapshotting
  background snapshot: add helpers to manage a copy of ram block list
  background snapshot: introduce page buffer
  migration: add helpers to change VM memory protection rights
  background snapshot: extend RAM request for holding a page copy
    pointer
  background snapshots: adapt the page queueing code for using page
    copies
  background snapshot: add a memory page copying function
  ram: add background snapshot support in ram page saving part of
    migration
  background snapshot: add write-protected page access handler function
  kvm: add failed memory access exit reason
  kvm: add vCPU failed memory access processing
  migration: move the device state saving logic to a separate function
  background snapshot: enable background snapshot

 include/exec/ram_addr.h   |   7 +
 include/exec/ramlist.h    |   4 +-
 include/qemu/bitops.h     |  25 +++
 include/qemu/thread.h     |   5 +
 linux-headers/linux/kvm.h |   5 +
 migration/migration.c     | 140 +++++++++++++-
 migration/migration.h     |   1 +
 migration/ram.c           | 374 ++++++++++++++++++++++++++++++++++++--
 migration/ram.h           |  17 +-
 migration/savevm.c        |  91 +++++-----
 migration/savevm.h        |   2 +
 qapi/migration.json       |   6 +-
 target/i386/kvm.c         |  17 ++
 util/qemu-thread-posix.c  |  52 ++++++
 14 files changed, 684 insertions(+), 62 deletions(-)

-- 
2.17.0


* [Qemu-devel] [PATCH v1 01/17] migration: add background snapshot capability
  2018-07-18 15:41 [Qemu-devel] [PATCH v1 00/17] Background snapshots Denis Plotnikov
@ 2018-07-18 15:41 ` Denis Plotnikov
  2018-07-20  5:14   ` Peter Xu
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 02/17] bitops: add some atomic versions of bitmap operations Denis Plotnikov
                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 34+ messages in thread
From: Denis Plotnikov @ 2018-07-18 15:41 UTC (permalink / raw)
  To: dgilbert, quintela, pbonzini; +Cc: qemu-devel

The capability is used for background vmstate saving
using the migration infrastructure.
Background vmstate saving means that the majority of the vmstate
(RAM) is saved in the background while the VM's vCPUs are running.
This helps to reduce VM downtime on VM snapshotting.

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 migration/migration.c | 35 +++++++++++++++++++++++++++++++++++
 migration/migration.h |  1 +
 qapi/migration.json   |  6 +++++-
 3 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index d780601f0c..87096d23ef 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -697,6 +697,12 @@ static bool migrate_caps_check(bool *cap_list,
             return false;
         }
 
+        if (cap_list[MIGRATION_CAPABILITY_BACKGROUND_SNAPSHOT]) {
+            error_setg(errp, "Postcopy is not compatible "
+                        "with background snapshot");
+            return false;
+        }
+
         /* This check is reasonably expensive, so only when it's being
          * set the first time, also it's only the destination that needs
          * special support.
@@ -711,6 +717,26 @@ static bool migrate_caps_check(bool *cap_list,
         }
     }
 
+    if (cap_list[MIGRATION_CAPABILITY_BACKGROUND_SNAPSHOT]) {
+        if (cap_list[MIGRATION_CAPABILITY_RELEASE_RAM]) {
+            error_setg(errp, "Background snapshot is not compatible "
+                        "with release ram capability");
+            return false;
+        }
+
+        if (cap_list[MIGRATION_CAPABILITY_COMPRESS]) {
+            error_setg(errp, "Background snapshot is not "
+                        "currently compatible with compression");
+            return false;
+        }
+
+        if (cap_list[MIGRATION_CAPABILITY_XBZRLE]) {
+            error_setg(errp, "Background snapshot is not "
+                        "currently compatible with XBZRLE");
+            return false;
+        }
+    }
+
     return true;
 }
 
@@ -1635,6 +1661,15 @@ bool migrate_use_block_incremental(void)
     return s->parameters.block_incremental;
 }
 
+bool migrate_background_snapshot(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->enabled_capabilities[MIGRATION_CAPABILITY_BACKGROUND_SNAPSHOT];
+}
+
 /* migration thread support */
 /*
  * Something bad happened to the RP stream, mark an error
diff --git a/migration/migration.h b/migration/migration.h
index 663415fe48..21babbc008 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -201,6 +201,7 @@ int migrate_compress_level(void);
 int migrate_compress_threads(void);
 int migrate_decompress_threads(void);
 bool migrate_use_events(void);
+bool migrate_background_snapshot(void);
 
 /* Sending on the return path - generic and then for each message type */
 void migrate_send_rp_shut(MigrationIncomingState *mis,
diff --git a/qapi/migration.json b/qapi/migration.json
index 03f57c9616..eb5f52b7ff 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -352,12 +352,16 @@
 #
 # @x-multifd: Use more than one fd for migration (since 2.11)
 #
+# @background-snapshot: Use the migration infrastructure to make the VM
+#         snapshot save its RAM in the background. This reduces VM downtime. (since 2.12)
+#
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
   'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
            'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram',
-           'block', 'return-path', 'pause-before-switchover', 'x-multifd' ] }
+           'block', 'return-path', 'pause-before-switchover', 'x-multifd',
+           'background-snapshot' ] }
 
 ##
 # @MigrationCapabilityStatus:
-- 
2.17.0


* [Qemu-devel] [PATCH v1 02/17] bitops: add some atomic versions of bitmap operations
  2018-07-18 15:41 [Qemu-devel] [PATCH v1 00/17] Background snapshots Denis Plotnikov
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 01/17] migration: add background snapshot capability Denis Plotnikov
@ 2018-07-18 15:41 ` Denis Plotnikov
  2018-07-20  5:09   ` Peter Xu
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 03/17] threads: add infrastructure to process sigsegv Denis Plotnikov
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 34+ messages in thread
From: Denis Plotnikov @ 2018-07-18 15:41 UTC (permalink / raw)
  To: dgilbert, quintela, pbonzini; +Cc: qemu-devel

1. test bit
2. test and set bit
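
For illustration, a minimal sketch of the intended use; the bitmap names
(touched_map, copied_map) and page_nr follow later patches in this series:

  /* claim a page for copying exactly once: only the thread that
   * flips the bit performs the copy (sketch, not part of this patch) */
  if (!test_and_set_bit_atomic(page_nr, block->touched_map)) {
      /* we won the race: copy the page here */
  } else if (!test_bit_atomic(page_nr, block->copied_map)) {
      /* another thread is still copying: wait for it to finish */
  }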

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 include/qemu/bitops.h | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/include/qemu/bitops.h b/include/qemu/bitops.h
index 3f0926cf40..72afcfaec5 100644
--- a/include/qemu/bitops.h
+++ b/include/qemu/bitops.h
@@ -94,6 +94,21 @@ static inline int test_and_set_bit(long nr, unsigned long *addr)
     return (old & mask) != 0;
 }
 
+/**
+ * test_and_set_bit_atomic - Set a bit atomically and return its old value
+ * @nr: Bit to set
+ * @addr: Address to count from
+ */
+static inline int test_and_set_bit_atomic(long nr, unsigned long *addr)
+{
+    unsigned long mask = BIT_MASK(nr);
+    unsigned long *p = addr + BIT_WORD(nr);
+    unsigned long old;
+
+    old = atomic_fetch_or(p, mask);
+    return (old & mask) != 0;
+}
+
 /**
  * test_and_clear_bit - Clear a bit and return its old value
  * @nr: Bit to clear
@@ -134,6 +149,16 @@ static inline int test_bit(long nr, const unsigned long *addr)
     return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
 }
 
+/**
+ * test_bit_atomic - Determine whether a bit is set atomically
+ * @nr: bit number to test
+ * @addr: Address to start counting from
+ */
+static inline int test_bit_atomic(long nr, const unsigned long *addr)
+{
+    long valid_nr = nr & (BITS_PER_LONG - 1);
+    return 1UL & (atomic_read(&addr[BIT_WORD(nr)]) >> valid_nr);
+}
 /**
  * find_last_bit - find the last set bit in a memory region
  * @addr: The address to start the search at
-- 
2.17.0


* [Qemu-devel] [PATCH v1 03/17] threads: add infrastructure to process sigsegv
  2018-07-18 15:41 [Qemu-devel] [PATCH v1 00/17] Background snapshots Denis Plotnikov
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 01/17] migration: add background snapshot capability Denis Plotnikov
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 02/17] bitops: add some atomic versions of bitmap operations Denis Plotnikov
@ 2018-07-18 15:41 ` Denis Plotnikov
  2018-07-20  4:58   ` Peter Xu
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 04/17] background snapshot: make a dedicated type for ram block list Denis Plotnikov
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 34+ messages in thread
From: Denis Plotnikov @ 2018-07-18 15:41 UTC (permalink / raw)
  To: dgilbert, quintela, pbonzini; +Cc: qemu-devel

Allows defining a temporary sigsegv handler for all threads.
This is useful for implementing copy-on-write logic while Linux
userfaultfd doesn't support write-protected faults. In the future,
this should switch to using WP userfaultfd when it becomes
available.
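
A minimal usage sketch, assuming a handler that resolves a
write-protect fault; the handler body here is hypothetical:

  static void wp_fault_handler(int signum, siginfo_t *siginfo, void *sigctx)
  {
      /* hypothetical: look up the RAM page at siginfo->si_addr,
       * copy it away, then restore write access to the page */
  }

  /* route SIGSEGV to the handler for the duration of the snapshot */
  sigsegv_user_handler_set(wp_fault_handler);
  /* ... write-protect RAM and save it in the background ... */
  sigsegv_user_handler_reset();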

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 include/qemu/thread.h    |  5 ++++
 util/qemu-thread-posix.c | 52 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)

diff --git a/include/qemu/thread.h b/include/qemu/thread.h
index 9910f49b3a..d6fed833fa 100644
--- a/include/qemu/thread.h
+++ b/include/qemu/thread.h
@@ -210,4 +210,9 @@ void qemu_lockcnt_inc_and_unlock(QemuLockCnt *lockcnt);
  */
 unsigned qemu_lockcnt_count(QemuLockCnt *lockcnt);
 
+
+typedef void (*sigsegv_handler)(int signum, siginfo_t *siginfo, void *sigctx);
+void sigsegv_user_handler_set(sigsegv_handler handler);
+void sigsegv_user_handler_reset(void);
+
 #endif
diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index 7306475899..5424b7106d 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -489,6 +489,47 @@ static void qemu_thread_set_name(QemuThread *thread, const char *name)
 #endif
 }
 
+static sigsegv_handler sigsegv_user_handler;
+
+void sigsegv_user_handler_set(sigsegv_handler handler)
+{
+    assert(handler);
+    atomic_set(&sigsegv_user_handler, handler);
+}
+
+static sigsegv_handler sigsegv_user_handler_get(void)
+{
+    return atomic_read(&sigsegv_user_handler);
+}
+
+void sigsegv_user_handler_reset(void)
+{
+    atomic_set(&sigsegv_user_handler, NULL);
+}
+
+static void sigsegv_default_handler(int signum, siginfo_t *siginfo, void *sigctx)
+{
+    sigsegv_handler handler = sigsegv_user_handler_get();
+
+    if (!handler) {
+        /*
+         * remove the sigsegv handler if it's not set by the user;
+         * this will lead to re-raising the error without a handler
+         * and exiting the program with "Segmentation fault"
+         */
+        int err;
+        struct sigaction act;
+        memset(&act, 0, sizeof(act));
+        act.sa_flags = SA_RESETHAND;
+        err = sigaction(SIGSEGV, &act, NULL);
+        if (err) {
+            error_exit(err, __func__);
+        }
+    } else {
+        handler(signum, siginfo, sigctx);
+    }
+}
+
 void qemu_thread_create(QemuThread *thread, const char *name,
                        void *(*start_routine)(void*),
                        void *arg, int mode)
@@ -496,14 +537,25 @@ void qemu_thread_create(QemuThread *thread, const char *name,
     sigset_t set, oldset;
     int err;
     pthread_attr_t attr;
+    struct sigaction act;
 
     err = pthread_attr_init(&attr);
     if (err) {
         error_exit(err, __func__);
     }
 
+    memset(&act, 0, sizeof(act));
+    act.sa_flags = SA_SIGINFO;
+    act.sa_sigaction = sigsegv_default_handler;
+    err = sigaction(SIGSEGV, &act, NULL);
+    if (err) {
+        error_exit(err, __func__);
+    }
+
     /* Leave signal handling to the iothread.  */
     sigfillset(&set);
+    /* ...all but SIGSEGV */
+    sigdelset(&set, SIGSEGV);
     pthread_sigmask(SIG_SETMASK, &set, &oldset);
     err = pthread_create(&thread->thread, &attr, start_routine, arg);
     if (err)
-- 
2.17.0


* [Qemu-devel] [PATCH v1 04/17] background snapshot: make a dedicated type for ram block list
  2018-07-18 15:41 [Qemu-devel] [PATCH v1 00/17] Background snapshots Denis Plotnikov
                   ` (2 preceding siblings ...)
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 03/17] threads: add infrastructure to process sigsegv Denis Plotnikov
@ 2018-07-18 15:41 ` Denis Plotnikov
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 05/17] ram: extend the data structures for background snapshotting Denis Plotnikov
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Denis Plotnikov @ 2018-07-18 15:41 UTC (permalink / raw)
  To: dgilbert, quintela, pbonzini; +Cc: qemu-devel

The type will be used later to hold a list of ram blocks whose
memory content is to be used for saving the vm state when making
a background snapshot.

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 include/exec/ramlist.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/exec/ramlist.h b/include/exec/ramlist.h
index 2e2ac6cb99..e0231d3bec 100644
--- a/include/exec/ramlist.h
+++ b/include/exec/ramlist.h
@@ -44,11 +44,13 @@ typedef struct {
     unsigned long *blocks[];
 } DirtyMemoryBlocks;
 
+typedef QLIST_HEAD(, RAMBlock) RamBlockList;
+
 typedef struct RAMList {
     QemuMutex mutex;
     RAMBlock *mru_block;
     /* RCU-enabled, writes protected by the ramlist lock. */
-    QLIST_HEAD(, RAMBlock) blocks;
+    RamBlockList blocks;
     DirtyMemoryBlocks *dirty_memory[DIRTY_MEMORY_NUM];
     uint32_t version;
     QLIST_HEAD(, RAMBlockNotifier) ramblock_notifiers;
-- 
2.17.0


* [Qemu-devel] [PATCH v1 05/17] ram: extend the data structures for background snapshotting
  2018-07-18 15:41 [Qemu-devel] [PATCH v1 00/17] Background snapshots Denis Plotnikov
                   ` (3 preceding siblings ...)
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 04/17] background snapshot: make a dedicated type for ram block list Denis Plotnikov
@ 2018-07-18 15:41 ` Denis Plotnikov
  2018-07-20  7:59   ` Peter Xu
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 06/17] background snapshot: add helpers to manage a copy of ram block list Denis Plotnikov
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 34+ messages in thread
From: Denis Plotnikov @ 2018-07-18 15:41 UTC (permalink / raw)
  To: dgilbert, quintela, pbonzini; +Cc: qemu-devel

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 include/exec/ram_addr.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 6cbc02aa0f..5b403d537d 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -36,6 +36,8 @@ struct RAMBlock {
     char idstr[256];
     /* RCU-enabled, writes protected by the ramlist lock */
     QLIST_ENTRY(RAMBlock) next;
+    /* blocks used for background snapshot */
+    QLIST_ENTRY(RAMBlock) bgs_next;
     QLIST_HEAD(, RAMBlockNotifier) ramblock_notifiers;
     int fd;
     size_t page_size;
@@ -49,6 +51,11 @@ struct RAMBlock {
     unsigned long *unsentmap;
     /* bitmap of already received pages in postcopy */
     unsigned long *receivedmap;
+    /* The following 2 are for background snapshot */
+    /* Pages currently being copied */
+    unsigned long *touched_map;
+    /* Pages that have been copied already */
+    unsigned long *copied_map;
 };
 
 static inline bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset)
-- 
2.17.0


* [Qemu-devel] [PATCH v1 06/17] background snapshot: add helpers to manage a copy of ram block list
  2018-07-18 15:41 [Qemu-devel] [PATCH v1 00/17] Background snapshots Denis Plotnikov
                   ` (4 preceding siblings ...)
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 05/17] ram: extend the data structures for background snapshotting Denis Plotnikov
@ 2018-07-18 15:41 ` Denis Plotnikov
  2018-07-20  7:57   ` Peter Xu
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 07/17] background snapshot: introduce page buffer Denis Plotnikov
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 34+ messages in thread
From: Denis Plotnikov @ 2018-07-18 15:41 UTC (permalink / raw)
  To: dgilbert, quintela, pbonzini; +Cc: qemu-devel

The VM ramblock list may change during snapshotting.
We want to make sure that we operate on the same ramblock list as
the one that existed when the snapshot began. So, we create a copy
of the list at the beginning of the snapshotting and use it
throughout the process.
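
A sketch of the intended lifecycle of these helpers; fault_address
is a hypothetical variable:

  ram_block_list_create();      /* reference and copy the current list */

  /* while saving: map a faulting host address back to its block */
  ram_addr_t page_offset;
  RAMBlock *block = ram_bgs_block_find(fault_address, &page_offset);

  ram_block_list_destroy();     /* drop the references when done */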

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 migration/ram.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++
 migration/ram.h |  6 ++++++
 2 files changed, 57 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 021d583b9b..4399c31606 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1508,6 +1508,57 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss,
     return pages;
 }
 
+/* BackGround Snapshot Blocks */
+static RamBlockList bgs_blocks;
+
+static RamBlockList *ram_bgs_block_list_get(void)
+{
+    return &bgs_blocks;
+}
+
+void ram_block_list_create(void)
+{
+    RAMBlock *block = NULL;
+    RamBlockList *block_list = ram_bgs_block_list_get();
+
+    qemu_mutex_lock_ramlist();
+    QLIST_FOREACH(block, &ram_list.blocks, next) {
+        memory_region_ref(block->mr);
+        QLIST_INSERT_HEAD(block_list, block, bgs_next);
+    }
+    qemu_mutex_unlock_ramlist();
+}
+
+void ram_block_list_destroy(void)
+{
+    RAMBlock *next, *block;
+    RamBlockList *block_list = ram_bgs_block_list_get();
+
+    QLIST_FOREACH_SAFE(block, block_list, bgs_next, next) {
+        QLIST_REMOVE(block, bgs_next);
+        memory_region_unref(block->mr);
+    }
+}
+
+RAMBlock *ram_bgs_block_find(uint8_t *address, ram_addr_t *page_offset)
+{
+    RAMBlock *block = NULL;
+    RamBlockList *block_list = ram_bgs_block_list_get();
+
+    QLIST_FOREACH(block, block_list, bgs_next) {
+        /* This can happen when the block is not mapped. */
+        if (block->host == NULL) {
+            continue;
+        }
+
+        if (address - block->host < block->max_length) {
+            *page_offset = (address - block->host) & TARGET_PAGE_MASK;
+            return block;
+        }
+    }
+
+    return NULL;
+}
 /**
  * ram_find_and_save_block: finds a dirty page and sends it to f
  *
diff --git a/migration/ram.h b/migration/ram.h
index 64d81e9f1d..385579cae9 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -31,6 +31,7 @@
 
 #include "qemu-common.h"
 #include "exec/cpu-common.h"
+#include "exec/ramlist.h"
 
 extern MigrationStats ram_counters;
 extern XBZRLECacheStats xbzrle_counters;
@@ -61,5 +62,10 @@ void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
 int ramblock_recv_bitmap_test(RAMBlock *rb, void *host_addr);
 void ramblock_recv_bitmap_set(RAMBlock *rb, void *host_addr);
 void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr, size_t nr);
+/* For background snapshot */
+void ram_block_list_create(void);
+void ram_block_list_destroy(void);
+
+RAMBlock *ram_bgs_block_find(uint8_t *address, ram_addr_t *page_offset);
 
 #endif
-- 
2.17.0


* [Qemu-devel] [PATCH v1 07/17] background snapshot: introduce page buffer
  2018-07-18 15:41 [Qemu-devel] [PATCH v1 00/17] Background snapshots Denis Plotnikov
                   ` (5 preceding siblings ...)
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 06/17] background snapshot: add helpers to manage a copy of ram block list Denis Plotnikov
@ 2018-07-18 15:41 ` Denis Plotnikov
  2018-07-20  9:22   ` Peter Xu
  2018-07-20 10:34   ` Paolo Bonzini
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 08/17] migration: add helpers to change VM memory protection rights Denis Plotnikov
                   ` (10 subsequent siblings)
  17 siblings, 2 replies; 34+ messages in thread
From: Denis Plotnikov @ 2018-07-18 15:41 UTC (permalink / raw)
  To: dgilbert, quintela, pbonzini; +Cc: qemu-devel

To limit the amount of memory used by the background snapshot,
a memory limiter called the "page buffer" is used.
In fact, it is a memory page counter limiting page allocation.
Currently, the page limit is hardcoded; adding a setter for it is
a matter of future work.

The background snapshot makes a copy of a page before writing it
to the snapshot destination, but it doesn't use a preallocated
buffer, because the background snapshot employs the migration
infrastructure which, in turn, uses madvise to release the buffer.
The advice issuing is hard to track, so here we rely on an anonymous
memory mapping which the existing code releases.
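
A minimal sketch of the buffer API added below; in the series the copy
is normally released by the migration stream itself, with
ram_page_buffer_free() used on the error path (host_page and
queue_for_saving are hypothetical):

  void *copy = ram_page_buffer_get();   /* blocks while at capacity */
  if (copy) {
      memcpy(copy, host_page, TARGET_PAGE_SIZE);
      if (queue_for_saving(copy) < 0) {
          /* failed to queue: unmap the copy and decrement the counter */
          ram_page_buffer_free(copy);
      }
  }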

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 migration/ram.c | 91 +++++++++++++++++++++++++++++++++++++++++++++++++
 migration/ram.h |  3 ++
 2 files changed, 94 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 4399c31606..27d3403435 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -192,6 +192,16 @@ struct RAMSrcPageRequest {
     QSIMPLEQ_ENTRY(RAMSrcPageRequest) next_req;
 };
 
+/* Page buffer used for background snapshot */
+typedef struct RAMPageBuffer {
+    /* Page buffer capacity in host pages */
+    int capacity;
+    /* Current number of pages in the buffer */
+    int used;
+    /* Event to notify that buffer usage is under capacity */
+    QemuEvent used_decreased;
+} RAMPageBuffer;
+
 /* State of RAM for migration */
 struct RAMState {
     /* QEMUFile used for this migration */
@@ -230,6 +240,11 @@ struct RAMState {
     /* Queue of outstanding page requests from the destination */
     QemuMutex src_page_req_mutex;
     QSIMPLEQ_HEAD(src_page_requests, RAMSrcPageRequest) src_page_requests;
+    /* The following 2 are for background snapshot */
+    /* Buffer data to store copies of ram pages while async vm saving */
+    RAMPageBuffer page_buffer;
+    /* Event to notify that a page copying has just finished */
+    QemuEvent page_copying_done;
 };
 typedef struct RAMState RAMState;
 
@@ -1540,6 +1555,52 @@ void ram_block_list_destroy(void)
     }
 }
 
+static void ram_page_buffer_decrease_used(void)
+{
+    qemu_event_reset(&ram_state->page_buffer.used_decreased);
+    atomic_dec(&ram_state->page_buffer.used);
+    qemu_event_set(&ram_state->page_buffer.used_decreased);
+}
+
+static void ram_page_buffer_increase_used_wait(void)
+{
+    int used, *used_ptr;
+    RAMState *rs = ram_state;
+    used_ptr = &rs->page_buffer.used;
+    do {
+        used = atomic_read(used_ptr);
+        if (rs->page_buffer.capacity > used) {
+            if (atomic_cmpxchg(used_ptr, used, used + 1) == used) {
+                return;
+            } else {
+                continue;
+            }
+        } else {
+            qemu_event_wait(&rs->page_buffer.used_decreased);
+        }
+    } while (true);
+}
+
+void *ram_page_buffer_get(void)
+{
+    void *page;
+    ram_page_buffer_increase_used_wait();
+    page = mmap(0, TARGET_PAGE_SIZE, PROT_READ | PROT_WRITE,
+                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+
+    if (page == MAP_FAILED) {
+        ram_page_buffer_decrease_used();
+        page = NULL;
+    }
+    return page;
+}
+
+int ram_page_buffer_free(void *buffer)
+{
+    ram_page_buffer_decrease_used();
+    return qemu_madvise(buffer, TARGET_PAGE_SIZE, MADV_DONTNEED);
+}
+
 RAMBlock *ram_bgs_block_find(uint8_t *address, ram_addr_t *page_offset)
 {
     RAMBlock *block = NULL;
@@ -1559,6 +1620,7 @@ RAMBlock *ram_bgs_block_find(uint8_t *address, ram_addr_t *page_offset)
 
     return NULL;
 }
+
 /**
  * ram_find_and_save_block: finds a dirty page and sends it to f
  *
@@ -1651,6 +1713,13 @@ static void xbzrle_load_cleanup(void)
 
 static void ram_state_cleanup(RAMState **rsp)
 {
+    if (migrate_background_snapshot()) {
+        qemu_event_destroy(&(*rsp)->page_buffer.used_decreased);
+        /* In case somebody is still waiting for the event */
+        qemu_event_set(&(*rsp)->page_copying_done);
+        qemu_event_destroy(&(*rsp)->page_copying_done);
+    }
+
     migration_page_queue_free(*rsp);
     qemu_mutex_destroy(&(*rsp)->bitmap_mutex);
     qemu_mutex_destroy(&(*rsp)->src_page_req_mutex);
@@ -1689,6 +1758,13 @@ static void ram_save_cleanup(void *opaque)
         block->bmap = NULL;
         g_free(block->unsentmap);
         block->unsentmap = NULL;
+
+        if (migrate_background_snapshot()) {
+            g_free(block->touched_map);
+            block->touched_map = NULL;
+            g_free(block->copied_map);
+            block->copied_map = NULL;
+        }
     }
 
     xbzrle_cleanup();
@@ -1703,6 +1779,10 @@ static void ram_state_reset(RAMState *rs)
     rs->last_page = 0;
     rs->last_version = ram_list.version;
     rs->ram_bulk_stage = true;
+
+    /* page buffer capacity in number of pages */
+    rs->page_buffer.capacity = 1000;
+    rs->page_buffer.used = 0;
 }
 
 #define MAX_WAIT 50 /* ms, half buffered_file limit */
@@ -2180,6 +2260,11 @@ static int ram_state_init(RAMState **rsp)
      */
     (*rsp)->migration_dirty_pages = ram_bytes_total() >> TARGET_PAGE_BITS;
 
+    if (migrate_background_snapshot()) {
+        qemu_event_init(&ram_state->page_buffer.used_decreased, false);
+        qemu_event_init(&ram_state->page_copying_done, false);
+    }
+
     ram_state_reset(*rsp);
 
     return 0;
@@ -2196,10 +2281,16 @@ static void ram_list_init_bitmaps(void)
             pages = block->max_length >> TARGET_PAGE_BITS;
             block->bmap = bitmap_new(pages);
             bitmap_set(block->bmap, 0, pages);
+
             if (migrate_postcopy_ram()) {
                 block->unsentmap = bitmap_new(pages);
                 bitmap_set(block->unsentmap, 0, pages);
             }
+
+            if (migrate_background_snapshot()) {
+                block->touched_map = bitmap_new(pages);
+                block->copied_map = bitmap_new(pages);
+            }
         }
     }
 }
diff --git a/migration/ram.h b/migration/ram.h
index 385579cae9..7c802c1d7f 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -68,4 +68,7 @@ void ram_block_list_destroy(void);
 
 RAMBlock *ram_bgs_block_find(uint8_t *address, ram_addr_t *page_offset);
 
+void *ram_page_buffer_get(void);
+int ram_page_buffer_free(void *buffer);
+
 #endif
-- 
2.17.0


* [Qemu-devel] [PATCH v1 08/17] migration: add helpers to change VM memory protection rights
  2018-07-18 15:41 [Qemu-devel] [PATCH v1 00/17] Background snapshots Denis Plotnikov
                   ` (6 preceding siblings ...)
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 07/17] background snapshot: introduce page buffer Denis Plotnikov
@ 2018-07-18 15:41 ` Denis Plotnikov
  2018-07-20 11:28   ` Dr. David Alan Gilbert
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 09/17] background snapshot: extend RAM request for holding a page copy pointer Denis Plotnikov
                   ` (9 subsequent siblings)
  17 siblings, 1 reply; 34+ messages in thread
From: Denis Plotnikov @ 2018-07-18 15:41 UTC (permalink / raw)
  To: dgilbert, quintela, pbonzini; +Cc: qemu-devel

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 migration/ram.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++
 migration/ram.h |  3 +++
 2 files changed, 57 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 27d3403435..ce3dead932 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1555,6 +1555,60 @@ void ram_block_list_destroy(void)
     }
 }
 
+static int mem_protect(void *addr, uint64_t length, int prot)
+{
+    int ret = mprotect(addr, length, prot);
+
+    if (ret < 0) {
+        error_report("%s: Can't change protection on ram block at %p",
+                     __func__, addr);
+    }
+
+    return ret;
+}
+
+static int ram_set_ro(void *addr, uint64_t length)
+{
+    return mem_protect(addr, length, PROT_READ);
+}
+
+static int ram_set_rw(void *addr, uint64_t length)
+{
+    return mem_protect(addr, length, PROT_READ | PROT_WRITE);
+}
+
+int ram_block_list_set_readonly(void)
+{
+    RAMBlock *block = NULL;
+    RamBlockList *block_list = ram_bgs_block_list_get();
+    int ret = 0;
+
+    QLIST_FOREACH(block, block_list, bgs_next) {
+        ret = ram_set_ro(block->host, block->used_length);
+        if (ret) {
+            break;
+        }
+    }
+
+    return ret;
+}
+
+int ram_block_list_set_writable(void)
+{
+    RAMBlock *block = NULL;
+    RamBlockList *block_list = ram_bgs_block_list_get();
+    int ret = 0;
+
+    QLIST_FOREACH(block, block_list, bgs_next) {
+        ret = ram_set_rw(block->host, block->used_length);
+        if (ret) {
+            break;
+        }
+    }
+
+    return ret;
+}
+
 static void ram_page_buffer_decrease_used(void)
 {
     qemu_event_reset(&ram_state->page_buffer.used_decreased);
diff --git a/migration/ram.h b/migration/ram.h
index 7c802c1d7f..4c463597a5 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -71,4 +71,7 @@ RAMBlock *ram_bgs_block_find(uint8_t *address, ram_addr_t *page_offset);
 void *ram_page_buffer_get(void);
 int ram_page_buffer_free(void *buffer);
 
+int ram_block_list_set_readonly(void);
+int ram_block_list_set_writable(void);
+
 #endif
-- 
2.17.0


* [Qemu-devel] [PATCH v1 09/17] background snapshot: extend RAM request for holding a page copy pointer
  2018-07-18 15:41 [Qemu-devel] [PATCH v1 00/17] Background snapshots Denis Plotnikov
                   ` (7 preceding siblings ...)
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 08/17] migration: add helpers to change VM memory protection rights Denis Plotnikov
@ 2018-07-18 15:41 ` Denis Plotnikov
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 10/17] background snapshots: adapt the page queueing code for using page copies Denis Plotnikov
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Denis Plotnikov @ 2018-07-18 15:41 UTC (permalink / raw)
  To: dgilbert, quintela, pbonzini; +Cc: qemu-devel

This pointer is going to be used to transfer a memory page.
Once the memory page is copied, the content the snapshot is
interested in is saved for writing, and we can make the page
writable again.

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 migration/ram.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index ce3dead932..dc7dfe0726 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -188,6 +188,7 @@ struct RAMSrcPageRequest {
     RAMBlock *rb;
     hwaddr    offset;
     hwaddr    len;
+    void     *page_copy;
 
     QSIMPLEQ_ENTRY(RAMSrcPageRequest) next_req;
 };
@@ -265,6 +266,8 @@ struct PageSearchStatus {
     unsigned long page;
     /* Set once we wrap around */
     bool         complete_round;
+    /* Pointer to the cached page */
+    void *page_copy;
 };
 typedef struct PageSearchStatus PageSearchStatus;
 
-- 
2.17.0


* [Qemu-devel] [PATCH v1 10/17] background snapshots: adapt the page queueing code for using page copies
  2018-07-18 15:41 [Qemu-devel] [PATCH v1 00/17] Background snapshots Denis Plotnikov
                   ` (8 preceding siblings ...)
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 09/17] background snapshot: extend RAM request for holding a page copy pointer Denis Plotnikov
@ 2018-07-18 15:41 ` Denis Plotnikov
  2018-07-20  8:39   ` Peter Xu
  2018-07-20 11:46   ` Peter Xu
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 11/17] background snapshot: add a memory page copying function Denis Plotnikov
                   ` (7 subsequent siblings)
  17 siblings, 2 replies; 34+ messages in thread
From: Denis Plotnikov @ 2018-07-18 15:41 UTC (permalink / raw)
  To: dgilbert, quintela, pbonzini; +Cc: qemu-devel

The background snapshot uses memory page copying to seal the page
content. The patch adapts the migration infrastructure to save
copies of the pages.
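
For illustration, the call pattern the extended signature enables,
mirroring the write-protect fault handler added later in the series;
block, offset and page_copy are assumed to come from the fault:

  /* queue a single faulted page for saving, passing its sealed copy */
  if (ram_save_queue_pages(block, NULL, offset,
                           TARGET_PAGE_SIZE, page_copy)) {
      ram_page_buffer_free(page_copy);   /* queueing failed: drop it */
  }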

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 migration/migration.c |  2 +-
 migration/ram.c       | 59 ++++++++++++++++++++++++++++++++-----------
 migration/ram.h       |  3 ++-
 3 files changed, 47 insertions(+), 17 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 87096d23ef..131d0904e4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1716,7 +1716,7 @@ static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
         return;
     }
 
-    if (ram_save_queue_pages(rbname, start, len)) {
+    if (ram_save_queue_pages(NULL, rbname, start, len, NULL)) {
         mark_source_rp_bad(ms);
     }
 }
diff --git a/migration/ram.c b/migration/ram.c
index dc7dfe0726..3c4ccd85b4 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -976,7 +976,12 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss, bool last_stage)
     RAMBlock *block = pss->block;
     ram_addr_t offset = pss->page << TARGET_PAGE_BITS;
 
-    p = block->host + offset;
+    if (pss->page_copy) {
+        p = pss->page_copy;
+    } else {
+        p = block->host + offset;
+    }
+
     trace_ram_save_page(block->idstr, (uint64_t)offset, p);
 
     /* In doubt sent page as normal */
@@ -1003,13 +1008,18 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss, bool last_stage)
     } else {
         pages = save_zero_page(rs, block, offset, p);
         if (pages > 0) {
-            /* Must let xbzrle know, otherwise a previous (now 0'd) cached
-             * page would be stale
-             */
-            xbzrle_cache_zero_page(rs, current_addr);
-            ram_release_pages(block->idstr, offset, pages);
+            if (pss->page_copy) {
+                qemu_madvise(p, TARGET_PAGE_SIZE, MADV_DONTNEED);
+            } else {
+                /* Must let xbzrle know, otherwise a previous (now 0'd) cached
+                 * page would be stale
+                 */
+                xbzrle_cache_zero_page(rs, current_addr);
+                ram_release_pages(block->idstr, offset, pages);
+            }
         } else if (!rs->ram_bulk_stage &&
-                   !migration_in_postcopy() && migrate_use_xbzrle()) {
+                   !migration_in_postcopy() && migrate_use_xbzrle() &&
+                   !migrate_background_snapshot()) {
             pages = save_xbzrle_page(rs, &p, current_addr, block,
                                      offset, last_stage);
             if (!last_stage) {
@@ -1026,9 +1036,10 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss, bool last_stage)
         ram_counters.transferred +=
             save_page_header(rs, rs->f, block, offset | RAM_SAVE_FLAG_PAGE);
         if (send_async) {
-            qemu_put_buffer_async(rs->f, p, TARGET_PAGE_SIZE,
-                                  migrate_release_ram() &
-                                  migration_in_postcopy());
+            bool may_free = migrate_background_snapshot() ||
+                            (migrate_release_ram() &&
+                             migration_in_postcopy());
+            qemu_put_buffer_async(rs->f, p, TARGET_PAGE_SIZE, may_free);
         } else {
             qemu_put_buffer(rs->f, p, TARGET_PAGE_SIZE);
         }
@@ -1269,7 +1280,8 @@ static bool find_dirty_block(RAMState *rs, PageSearchStatus *pss, bool *again)
  * @rs: current RAM state
  * @offset: used to return the offset within the RAMBlock
  */
-static RAMBlock *unqueue_page(RAMState *rs, ram_addr_t *offset)
+static RAMBlock *unqueue_page(RAMState *rs, ram_addr_t *offset,
+                              void **page_copy)
 {
     RAMBlock *block = NULL;
 
@@ -1279,10 +1291,14 @@ static RAMBlock *unqueue_page(RAMState *rs, ram_addr_t *offset)
                                 QSIMPLEQ_FIRST(&rs->src_page_requests);
         block = entry->rb;
         *offset = entry->offset;
+        *page_copy = entry->page_copy;
 
         if (entry->len > TARGET_PAGE_SIZE) {
             entry->len -= TARGET_PAGE_SIZE;
             entry->offset += TARGET_PAGE_SIZE;
+            if (entry->page_copy) {
+                entry->page_copy += TARGET_PAGE_SIZE / sizeof(void *);
+            }
         } else {
             memory_region_unref(block->mr);
             QSIMPLEQ_REMOVE_HEAD(&rs->src_page_requests, next_req);
@@ -1309,9 +1325,10 @@ static bool get_queued_page(RAMState *rs, PageSearchStatus *pss)
     RAMBlock  *block;
     ram_addr_t offset;
     bool dirty;
+    void *page_copy;
 
     do {
-        block = unqueue_page(rs, &offset);
+        block = unqueue_page(rs, &offset, &page_copy);
         /*
          * We're sending this page, and since it's postcopy nothing else
          * will dirty it, and we must make sure it doesn't get sent again
@@ -1349,6 +1366,7 @@ static bool get_queued_page(RAMState *rs, PageSearchStatus *pss)
          */
         pss->block = block;
         pss->page = offset >> TARGET_PAGE_BITS;
+        pss->page_copy = page_copy;
     }
 
     return !!block;
@@ -1386,17 +1404,25 @@ static void migration_page_queue_free(RAMState *rs)
  *
  * @rbname: Name of the RAMBLock of the request. NULL means the
  *          same that last one.
+ * @block: RAMBlock to use. block and rbname are mutually exclusive,
+ *         with block taking priority.
  * @start: starting address from the start of the RAMBlock
  * @len: length (in bytes) to send
+ * @page_copy: the address the page should be written from
  */
-int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len)
+int ram_save_queue_pages(RAMBlock *block, const char *rbname,
+                         ram_addr_t start, ram_addr_t len, void *page_copy)
 {
     RAMBlock *ramblock;
     RAMState *rs = ram_state;
 
     ram_counters.postcopy_requests++;
+
     rcu_read_lock();
-    if (!rbname) {
+
+    if (block) {
+        ramblock = block;
+    } else if (!rbname) {
         /* Reuse last RAMBlock */
         ramblock = rs->last_req_rb;
 
@@ -1431,6 +1457,7 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len)
     new_entry->rb = ramblock;
     new_entry->offset = start;
     new_entry->len = len;
+    new_entry->page_copy = page_copy;
 
     memory_region_ref(ramblock->mr);
     qemu_mutex_lock(&rs->src_page_req_mutex);
@@ -1468,7 +1495,8 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss,
          * xbzrle can do better than compression.
          */
         if (migrate_use_compression() &&
-            (rs->ram_bulk_stage || !migrate_use_xbzrle())) {
+            (rs->ram_bulk_stage || !migrate_use_xbzrle()) &&
+            !migrate_background_snapshot()) {
             res = ram_save_compressed_page(rs, pss, last_stage);
         } else {
             res = ram_save_page(rs, pss, last_stage);
@@ -1706,6 +1734,7 @@ static int ram_find_and_save_block(RAMState *rs, bool last_stage)
     pss.block = rs->last_seen_block;
     pss.page = rs->last_page;
     pss.complete_round = false;
+    pss.page_copy = NULL;
 
     if (!pss.block) {
         pss.block = QLIST_FIRST_RCU(&ram_list.blocks);
diff --git a/migration/ram.h b/migration/ram.h
index 4c463597a5..c3679ba65e 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -46,7 +46,8 @@ int multifd_load_setup(void);
 int multifd_load_cleanup(Error **errp);
 
 uint64_t ram_pagesize_summary(void);
-int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len);
+int ram_save_queue_pages(RAMBlock *block, const char *rbname,
+                         ram_addr_t start, ram_addr_t len, void *page_copy);
 void acct_update_position(QEMUFile *f, size_t size, bool zero);
 void ram_debug_dump_bitmap(unsigned long *todump, bool expected,
                            unsigned long pages);
-- 
2.17.0


* [Qemu-devel] [PATCH v1 11/17] background snapshot: add a memory page copying function
  2018-07-18 15:41 [Qemu-devel] [PATCH v1 00/17] Background snapshots Denis Plotnikov
                   ` (9 preceding siblings ...)
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 10/17] background snapshots: adapt the page queueing code for using page copies Denis Plotnikov
@ 2018-07-18 15:41 ` Denis Plotnikov
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 12/17] ram: add background snapshot support in ram page saving part of migration Denis Plotnikov
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Denis Plotnikov @ 2018-07-18 15:41 UTC (permalink / raw)
  To: dgilbert, quintela, pbonzini; +Cc: qemu-devel

It's the only function that makes a memory page copy.
It supports multithreading semantics, ensuring that
the page is copied by one thread only and releasing
the copied page from write protection.
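
A sketch of the expected call pattern, following the return-code
convention documented in the function below:

  void *copy = NULL;
  int ret = ram_copy_page(block, page_nr, &copy);
  if (ret < 0) {
      /* error: no copy was made */
  } else if (ret > 0) {
      /* this thread made the copy: queue it for saving */
  }
  /* ret == 0: another thread already copied the page */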

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 migration/ram.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++++
 migration/ram.h |  1 +
 2 files changed, 57 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 3c4ccd85b4..10b6fdf23e 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1706,6 +1706,62 @@ RAMBlock *ram_bgs_block_find(uint8_t *address, ram_addr_t *page_offset)
     return NULL;
 }
 
+/**
+ * ram_copy_page: make a page copy
+ *
+ * Used in the background snapshot to make a copy of a memory page.
+ * Ensures that the memory page is copied only once.
+ * When a page copy is done, restores read/write access to the memory
+ * page.
+ * If a page is being copied by another thread, wait until the copying
+ * ends and exit.
+ *
+ * Returns:
+ *   -1 - on error
+ *    0 - the page wasn't copied by the function call
+ *    1 - the page has been copied
+ *
+ * @block:     RAM block to use
+ * @page_nr:   the page number to copy
+ * @page_copy: the pointer to return a page copy
+ *
+ */
+int ram_copy_page(RAMBlock *block, unsigned long page_nr,
+                          void **page_copy)
+{
+    void *host_page;
+
+    if (test_and_set_bit_atomic(page_nr, block->touched_map)) {
+        while (!test_bit_atomic(page_nr, block->copied_map)) {
+            /* the page is being copied -- wait for the end of the copying
+             * and check once again
+             */
+            qemu_event_wait(&ram_state->page_copying_done);
+        }
+        return 0;
+    }
+
+    *page_copy = ram_page_buffer_get();
+    if (!*page_copy) {
+        return -1;
+    }
+
+    host_page = block->host + (page_nr << TARGET_PAGE_BITS);
+    memcpy(*page_copy, host_page, TARGET_PAGE_SIZE);
+
+    if (ram_set_rw(host_page, TARGET_PAGE_SIZE)) {
+        ram_page_buffer_free(*page_copy);
+        *page_copy = NULL;
+        return -1;
+    }
+
+    set_bit_atomic(page_nr, block->copied_map);
+    qemu_event_set(&ram_state->page_copying_done);
+    qemu_event_reset(&ram_state->page_copying_done);
+
+    return 1;
+}
+
 /**
  * ram_find_and_save_block: finds a dirty page and sends it to f
  *
diff --git a/migration/ram.h b/migration/ram.h
index c3679ba65e..76ab0a3377 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -75,4 +75,5 @@ int ram_page_buffer_free(void *buffer);
 int ram_block_list_set_readonly(void);
 int ram_block_list_set_writable(void);
 
+int ram_copy_page(RAMBlock *block, unsigned long page_nr, void **page_copy);
 #endif
-- 
2.17.0


* [Qemu-devel] [PATCH v1 12/17] ram: add background snapshot support in ram page saving part of migration
  2018-07-18 15:41 [Qemu-devel] [PATCH v1 00/17] Background snapshots Denis Plotnikov
                   ` (10 preceding siblings ...)
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 11/17] background snapshot: add a memory page copying function Denis Plotnikov
@ 2018-07-18 15:41 ` Denis Plotnikov
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 13/17] background snapshot: add write-protected page access handler function Denis Plotnikov
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Denis Plotnikov @ 2018-07-18 15:41 UTC (permalink / raw)
  To: dgilbert, quintela, pbonzini; +Cc: qemu-devel

Unify the function that saves the ram pages so it can be used by
both the migration and the background snapshot.

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 migration/ram.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 10b6fdf23e..b1623e96e7 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1803,11 +1803,28 @@ static int ram_find_and_save_block(RAMState *rs, bool last_stage)
         if (!found) {
             /* priority queue empty, so just search for something dirty */
             found = find_dirty_block(rs, &pss, &again);
+
+            if (found && migrate_background_snapshot()) {
+                /* make a copy of the page and
+                 * pass it to the page search status */
+                int ret;
+                ret = ram_copy_page(pss.block, pss.page, &pss.page_copy);
+                if (ret == 0) {
+                    found = false;
+                    pages = 0;
+                } else if (ret < 0) {
+                    return ret;
+                }
+            }
         }
 
         if (found) {
             pages = ram_save_host_page(rs, &pss, last_stage);
         }
+
+        if (pss.page_copy) {
+            ram_page_buffer_decrease_used();
+        }
     } while (!pages && again);
 
     rs->last_seen_block = pss.block;
-- 
2.17.0


* [Qemu-devel] [PATCH v1 13/17] background snapshot: add write-protected page access handler function
  2018-07-18 15:41 [Qemu-devel] [PATCH v1 00/17] Background snapshots Denis Plotnikov
                   ` (11 preceding siblings ...)
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 12/17] ram: add background snapshot support in ram page saving part of migration Denis Plotnikov
@ 2018-07-18 15:41 ` Denis Plotnikov
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 14/17] kvm: add failed memory access exit reason Denis Plotnikov
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Denis Plotnikov @ 2018-07-18 15:41 UTC (permalink / raw)
  To: dgilbert, quintela, pbonzini; +Cc: qemu-devel

The handler does all the necessary operations to save the page being
accessed to the snapshot file(stream).

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 migration/ram.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 migration/ram.h |  1 +
 2 files changed, 44 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index b1623e96e7..04a4bf708d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1762,6 +1762,49 @@ int ram_copy_page(RAMBlock *block, unsigned long page_nr,
     return 1;
 }
 
+/**
+ * ram_process_page_fault
+ *
+ * Used in the background snapshot to queue the copy of the memory
+ * page for writing.
+ *
+ * Returns:
+ *    0 > - on error
+ *    0   - success
+ *
+ * @address:  guest address
+ *
+ */
+int ram_process_page_fault(void *address)
+{
+    int ret;
+    void *page_copy = NULL;
+    unsigned long page_nr;
+    ram_addr_t offset;
+
+    RAMBlock *block = ram_bgs_block_find(address, &offset);
+
+    if (!block) {
+        return -1;
+    }
+
+    page_nr = offset >> TARGET_PAGE_BITS;
+
+    ret = ram_copy_page(block, page_nr, &page_copy);
+
+    if (ret < 0) {
+        return ret;
+    } else if (ret > 0) {
+        if (ram_save_queue_pages(block, NULL, offset,
+                                 TARGET_PAGE_SIZE, page_copy)) {
+            ram_page_buffer_free(page_copy);
+            return -1;
+        }
+    }
+
+    return 0;
+}
+
 /**
  * ram_find_and_save_block: finds a dirty page and sends it to f
  *
diff --git a/migration/ram.h b/migration/ram.h
index 76ab0a3377..2b565ce620 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -76,4 +76,5 @@ int ram_block_list_set_readonly(void);
 int ram_block_list_set_writable(void);
 
 int ram_copy_page(RAMBlock *block, unsigned long page_nr, void **page_copy);
+int ram_process_page_fault(void *address);
 #endif
-- 
2.17.0


* [Qemu-devel] [PATCH v1 14/17] kvm: add failed memory access exit reason
  2018-07-18 15:41 [Qemu-devel] [PATCH v1 00/17] Background snapshots Denis Plotnikov
                   ` (12 preceding siblings ...)
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 13/17] background snapshot: add write-protected page access handler function Denis Plotnikov
@ 2018-07-18 15:41 ` Denis Plotnikov
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 15/17] kvm: add vCPU failed memory access processing Denis Plotnikov
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Denis Plotnikov @ 2018-07-18 15:41 UTC (permalink / raw)
  To: dgilbert, quintela, pbonzini; +Cc: qemu-devel

The patch makes qemu aware of how to read the kvm failed memory
access exit.

This is a temporary patch. It should be removed upon importing
the kvm failed memory access exit from the Linux branch.

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 linux-headers/linux/kvm.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index d92c9b2f0e..ea63b681d8 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -235,6 +235,7 @@ struct kvm_hyperv_exit {
 #define KVM_EXIT_S390_STSI        25
 #define KVM_EXIT_IOAPIC_EOI       26
 #define KVM_EXIT_HYPERV           27
+#define KVM_EXIT_FAIL_MEM_ACCESS  28
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -392,6 +393,10 @@ struct kvm_run {
 		} eoi;
 		/* KVM_EXIT_HYPERV */
 		struct kvm_hyperv_exit hyperv;
+		/* KVM_EXIT_FAIL_MEM_ACCESS */
+		struct {
+			__u64 hva;
+		} fail_mem_access;
 		/* Fix the size of the union. */
 		char padding[256];
 	};
-- 
2.17.0


* [Qemu-devel] [PATCH v1 15/17] kvm: add vCPU failed memory access processing
  2018-07-18 15:41 [Qemu-devel] [PATCH v1 00/17] Background snapshots Denis Plotnikov
                   ` (13 preceding siblings ...)
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 14/17] kvm: add failed memory access exit reason Denis Plotnikov
@ 2018-07-18 15:41 ` Denis Plotnikov
  2018-07-20  9:44   ` Peter Xu
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 16/17] migration: move the device state saving logic to a separate function Denis Plotnikov
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 34+ messages in thread
From: Denis Plotnikov @ 2018-07-18 15:41 UTC (permalink / raw)
  To: dgilbert, quintela, pbonzini; +Cc: qemu-devel

This is done with the support of the KVM patch returning the
faulting address.

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 target/i386/kvm.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 3ac5302bc5..55b8860d1a 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -45,6 +45,8 @@
 #include "hw/pci/msi.h"
 #include "hw/pci/msix.h"
 #include "migration/blocker.h"
+#include "migration/savevm.h"
+#include "migration/ram.h"
 #include "exec/memattrs.h"
 #include "trace.h"
 
@@ -3130,6 +3132,18 @@ static bool host_supports_vmx(void)
     return ecx & CPUID_EXT_VMX;
 }
 
+static int kvm_handle_fail_mem_access(CPUState *cpu)
+{
+    struct kvm_run *run = cpu->kvm_run;
+    int ret = ram_process_page_fault((void *)run->fail_mem_access.hva);
+
+    if (ret >= 0) {
+        cpu_resume(cpu);
+    }
+
+    return ret;
+}
+
 #define VMX_INVALID_GUEST_STATE 0x80000021
 
 int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
@@ -3188,6 +3202,9 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
         ioapic_eoi_broadcast(run->eoi.vector);
         ret = 0;
         break;
+    case KVM_EXIT_FAIL_MEM_ACCESS:
+        ret = kvm_handle_fail_mem_access(cs);
+        break;
     default:
         fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
         ret = -1;
-- 
2.17.0


* [Qemu-devel] [PATCH v1 16/17] migration: move the device state saving logic to a separate function
  2018-07-18 15:41 [Qemu-devel] [PATCH v1 00/17] Background snapshots Denis Plotnikov
                   ` (14 preceding siblings ...)
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 15/17] kvm: add vCPU failed memory access processing Denis Plotnikov
@ 2018-07-18 15:41 ` Denis Plotnikov
  2018-07-18 15:42 ` [Qemu-devel] [PATCH v1 17/17] background snapshot: enable background snapshot Denis Plotnikov
  2018-07-20  9:27 ` [Qemu-devel] [PATCH v1 00/17] Background snapshots Peter Xu
  17 siblings, 0 replies; 34+ messages in thread
From: Denis Plotnikov @ 2018-07-18 15:41 UTC (permalink / raw)
  To: dgilbert, quintela, pbonzini; +Cc: qemu-devel

The logic being split out will be reused by the background snapshot.

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 migration/savevm.c | 91 +++++++++++++++++++++++++---------------------
 migration/savevm.h |  2 +
 2 files changed, 52 insertions(+), 41 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index f202c3de3a..36074ec9de 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1107,51 +1107,15 @@ void qemu_savevm_state_complete_postcopy(QEMUFile *f)
     qemu_fflush(f);
 }
 
-int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only,
-                                       bool inactivate_disks)
+int qemu_savevm_state_save(QEMUFile *f, bool inactivate_disks,
+                           bool send_eof)
 {
-    QJSON *vmdesc;
     int vmdesc_len;
     SaveStateEntry *se;
     int ret;
-    bool in_postcopy = migration_in_postcopy();
-
-    trace_savevm_state_complete_precopy();
+    QJSON *vmdesc = qjson_new();
 
     cpu_synchronize_all_states();
-
-    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
-        if (!se->ops ||
-            (in_postcopy && se->ops->has_postcopy &&
-             se->ops->has_postcopy(se->opaque)) ||
-            (in_postcopy && !iterable_only) ||
-            !se->ops->save_live_complete_precopy) {
-            continue;
-        }
-
-        if (se->ops && se->ops->is_active) {
-            if (!se->ops->is_active(se->opaque)) {
-                continue;
-            }
-        }
-        trace_savevm_section_start(se->idstr, se->section_id);
-
-        save_section_header(f, se, QEMU_VM_SECTION_END);
-
-        ret = se->ops->save_live_complete_precopy(f, se->opaque);
-        trace_savevm_section_end(se->idstr, se->section_id, ret);
-        save_section_footer(f, se);
-        if (ret < 0) {
-            qemu_file_set_error(f, ret);
-            return -1;
-        }
-    }
-
-    if (iterable_only) {
-        return 0;
-    }
-
-    vmdesc = qjson_new();
     json_prop_int(vmdesc, "page_size", qemu_target_page_size());
     json_start_array(vmdesc, "devices");
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
@@ -1193,8 +1157,8 @@ int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only,
             return ret;
         }
     }
-    if (!in_postcopy) {
-        /* Postcopy stream will still be going */
+
+    if (send_eof) {
         qemu_put_byte(f, QEMU_VM_EOF);
     }
 
@@ -1213,6 +1177,51 @@ int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only,
     return 0;
 }
 
+int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only,
+                                       bool inactivate_disks)
+{
+    SaveStateEntry *se;
+    int ret;
+    bool in_postcopy = migration_in_postcopy();
+
+    trace_savevm_state_complete_precopy();
+
+    cpu_synchronize_all_states();
+
+    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        if (!se->ops ||
+            (in_postcopy && se->ops->has_postcopy &&
+             se->ops->has_postcopy(se->opaque)) ||
+            (in_postcopy && !iterable_only) ||
+            !se->ops->save_live_complete_precopy) {
+            continue;
+        }
+
+        if (se->ops && se->ops->is_active) {
+            if (!se->ops->is_active(se->opaque)) {
+                continue;
+            }
+        }
+        trace_savevm_section_start(se->idstr, se->section_id);
+
+        save_section_header(f, se, QEMU_VM_SECTION_END);
+
+        ret = se->ops->save_live_complete_precopy(f, se->opaque);
+        trace_savevm_section_end(se->idstr, se->section_id, ret);
+        save_section_footer(f, se);
+        if (ret < 0) {
+            qemu_file_set_error(f, ret);
+            return -1;
+        }
+    }
+
+    if (iterable_only) {
+        return 0;
+    }
+
+    return qemu_savevm_state_save(f, inactivate_disks, !in_postcopy);
+}
+
 /* Give an estimate of the amount left to be transferred,
  * the result is split into the amount for units that can and
  * for units that can't do postcopy.
diff --git a/migration/savevm.h b/migration/savevm.h
index 295c4a1f2c..0e8d7aecf8 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -40,6 +40,8 @@ int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only,
 void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
                                uint64_t *res_non_postcopiable,
                                uint64_t *res_postcopiable);
+int qemu_savevm_state_save(QEMUFile *f, bool inactivate_disks,
+                           bool send_eof);
 void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
 void qemu_savevm_send_open_return_path(QEMUFile *f);
 int qemu_savevm_send_packaged(QEMUFile *f, const uint8_t *buf, size_t len);
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH v1 17/17] background snapshot: enable background snapshot
  2018-07-18 15:41 [Qemu-devel] [PATCH v1 00/17] Background snapshots Denis Plotnikov
                   ` (15 preceding siblings ...)
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 16/17] migration: move the device state saving logic to a separate function Denis Plotnikov
@ 2018-07-18 15:42 ` Denis Plotnikov
  2018-07-20  9:27 ` [Qemu-devel] [PATCH v1 00/17] Background snapshots Peter Xu
  17 siblings, 0 replies; 34+ messages in thread
From: Denis Plotnikov @ 2018-07-18 15:42 UTC (permalink / raw)
  To: dgilbert, quintela, pbonzini; +Cc: qemu-devel

The patch enables saving the vmstate from a migration thread
in the background: RAM is saved while the vCPUs are running.
This is done to reduce downtime on VM snapshotting: the majority
of the vmstate is RAM, while the devices' states consume only a
few MB of memory on a typical VM.
Until now, there was no way to make a snapshot without stopping
the VM completely. From now on, one can save the VM state with the
VM stopped only for the couple of seconds needed to save the
devices' states. Then the VM is resumed and RAM is written out
without stopping the VM again.

To use the background snapshot, enable the "background-snapshot"
migration capability and start the migration.

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 migration/migration.c | 103 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 101 insertions(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 131d0904e4..935e5b6f2e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2188,6 +2188,99 @@ bool migrate_colo_enabled(void)
     return s->enabled_capabilities[MIGRATION_CAPABILITY_X_COLO];
 }
 
+static void background_snapshot_sigsegv_handler(int unused0, siginfo_t *info,
+                                                void *unused1)
+{
+    if (ram_process_page_fault(info->si_addr)) {
+        assert(false);
+    }
+}
+
+static void *background_snapshot_thread(void *opaque)
+{
+    MigrationState *m = opaque;
+    QIOChannelBuffer *bioc;
+    QEMUFile *fb;
+    int res = 0;
+
+    rcu_register_thread();
+
+    qemu_file_set_rate_limit(m->to_dst_file, INT64_MAX);
+
+    qemu_mutex_lock_iothread();
+    vm_stop(RUN_STATE_PAUSED);
+
+    qemu_savevm_state_header(m->to_dst_file);
+    qemu_mutex_unlock_iothread();
+    qemu_savevm_state_setup(m->to_dst_file);
+    qemu_mutex_lock_iothread();
+
+    migrate_set_state(&m->state, MIGRATION_STATUS_SETUP,
+                      MIGRATION_STATUS_ACTIVE);
+
+    /* save the device state to the temporary buffer */
+    bioc = qio_channel_buffer_new(4096);
+    qio_channel_set_name(QIO_CHANNEL(bioc), "vmstate-buffer");
+    fb = qemu_fopen_channel_output(QIO_CHANNEL(bioc));
+    object_unref(OBJECT(bioc));
+
+    ram_block_list_create();
+    ram_block_list_set_readonly();
+
+    if (global_state_store()) {
+        goto exit;
+    }
+
+    if (qemu_savevm_state_save(fb, false, true) < 0) {
+        migrate_set_state(&m->state, MIGRATION_STATUS_ACTIVE,
+                          MIGRATION_STATUS_FAILED);
+        goto exit;
+    }
+
+    sigsegv_user_handler_set(background_snapshot_sigsegv_handler);
+
+    vm_start();
+    qemu_mutex_unlock_iothread();
+
+    while (!res) {
+        res = qemu_savevm_state_iterate(m->to_dst_file, false);
+
+        if (res < 0 || qemu_file_get_error(m->to_dst_file)) {
+            migrate_set_state(&m->state, MIGRATION_STATUS_ACTIVE,
+                              MIGRATION_STATUS_FAILED);
+            goto exit;
+        }
+    }
+
+    /*
+     * By this moment we have RAM saved into the stream.
+     * The next step is to flush the device state right after
+     * the RAM data. The rest of the device state is stored in
+     * the temporary buffer; flush that buffer into the stream.
+     */
+    qemu_put_buffer(m->to_dst_file, bioc->data, bioc->usage);
+    qemu_fflush(m->to_dst_file);
+
+    if (qemu_file_get_error(m->to_dst_file)) {
+        migrate_set_state(&m->state, MIGRATION_STATUS_ACTIVE,
+                          MIGRATION_STATUS_FAILED);
+        goto exit;
+    }
+
+    migrate_set_state(&m->state, MIGRATION_STATUS_ACTIVE,
+                                 MIGRATION_STATUS_COMPLETED);
+exit:
+    ram_block_list_set_writable();
+    ram_block_list_destroy();
+    sigsegv_user_handler_reset();
+    qemu_fclose(fb);
+    qemu_mutex_lock_iothread();
+    qemu_savevm_state_cleanup();
+    qemu_mutex_unlock_iothread();
+    rcu_unregister_thread();
+    return NULL;
+}
+
 /*
  * Master migration thread on the source VM.
  * It drives the migration and pumps the data down the outgoing channel.
@@ -2402,8 +2495,14 @@ void migrate_fd_connect(MigrationState *s)
         migrate_fd_cleanup(s);
         return;
     }
-    qemu_thread_create(&s->thread, "live_migration", migration_thread, s,
-                       QEMU_THREAD_JOINABLE);
+    if (migrate_background_snapshot()) {
+        qemu_thread_create(&s->thread, "bg_snapshot",
+                           background_snapshot_thread, s,
+                           QEMU_THREAD_JOINABLE);
+    } else {
+        qemu_thread_create(&s->thread, "live_migration", migration_thread, s,
+                           QEMU_THREAD_JOINABLE);
+    }
     s->migration_thread_running = true;
 }
 
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v1 03/17] threads: add infrastructure to process sigsegv
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 03/17] threads: add infrastructure to process sigsegv Denis Plotnikov
@ 2018-07-20  4:58   ` Peter Xu
  0 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2018-07-20  4:58 UTC (permalink / raw)
  To: Denis Plotnikov; +Cc: dgilbert, quintela, pbonzini, qemu-devel

On Wed, Jul 18, 2018 at 06:41:46PM +0300, Denis Plotnikov wrote:
> Allows defining a sigsegv handler temporarily for all threads.
> This is useful to implement copy-on-write logic while
> linux userfaultfd doesn't support write-protected faults.
> In the future, switch to using WP userfaultfd when it's
> available.
> 
> Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
> ---
>  include/qemu/thread.h    |  5 ++++
>  util/qemu-thread-posix.c | 52 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 57 insertions(+)
> 
> diff --git a/include/qemu/thread.h b/include/qemu/thread.h
> index 9910f49b3a..d6fed833fa 100644
> --- a/include/qemu/thread.h
> +++ b/include/qemu/thread.h
> @@ -210,4 +210,9 @@ void qemu_lockcnt_inc_and_unlock(QemuLockCnt *lockcnt);
>   */
>  unsigned qemu_lockcnt_count(QemuLockCnt *lockcnt);
>  
> +
> +typedef void (*sigsegv_handler)(int signum, siginfo_t *siginfo, void *sigctx);
> +void sigsegv_user_handler_set(sigsegv_handler handler);
> +void sigsegv_user_handler_reset(void);
> +
>  #endif
> diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
> index 7306475899..5424b7106d 100644
> --- a/util/qemu-thread-posix.c
> +++ b/util/qemu-thread-posix.c
> @@ -489,6 +489,47 @@ static void qemu_thread_set_name(QemuThread *thread, const char *name)
>  #endif
>  }
>  
> +static sigsegv_handler sigsegv_user_handler;
> +
> +void sigsegv_user_handler_set(sigsegv_handler handler)
> +{
> +    assert(handler);

We can also assert !sigsegv_user_handler here to make sure two users
never accidentally try to install a SIGSEGV handler in parallel.
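
Something along these lines (just a sketch of the suggested check; not
race-free, but it catches the common mistake):

    void sigsegv_user_handler_set(sigsegv_handler handler)
    {
        assert(handler);
        /* catch two users installing a handler in parallel */
        assert(!atomic_read(&sigsegv_user_handler));
        atomic_set(&sigsegv_user_handler, handler);
    }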

> +    atomic_set(&sigsegv_user_handler, handler);
> +}
> +
> +static sigsegv_handler sigsegv_user_handler_get(void)
> +{
> +    return atomic_read(&sigsegv_user_handler);
> +}
> +
> +void sigsegv_user_handler_reset(void)
> +{
> +    atomic_set(&sigsegv_user_handler, NULL);
> +}
> +
> +static void sigsegv_default_handler(int signum, siginfo_t *siginfo, void *sigctx)
> +{
> +    sigsegv_handler handler = sigsegv_user_handler_get();
> +
> +    if (!handler) {
> +        /*
> +         * remove the sigsegv handler if it's not set by user
> +         * this will lead to re-raising the error without a handler
> +         * and exiting from the program with "Segmentation fault"
> +         */
> +        int err;
> +        struct sigaction act;
> +        memset(&act, 0, sizeof(act));
> +        act.sa_flags = SA_RESETHAND;

I'm not sure whether this is the correct way to use SA_RESETHAND.

  SA_RESETHAND
        Restore the signal action to the default upon entry to the
        signal handler.  This flag is meaningful only when
        establishing a signal han‐ dler.  SA_ONESHOT is an obsolete,
        nonstandard synonym for this flag.

My understanding is that it's meant for a "oneshot" pattern, but here
you didn't really specify any handler.  My guess is that the handler
actually ends up as SIG_DFL (which is 0) after you bzero-ed the
sigaction, hence we get what we want.

I'm not sure whether we need this complexity - could we just unset the
handler outside itself when reset?
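
E.g. (a sketch; assuming an extra sigaction() call on reset is fine):

    void sigsegv_user_handler_reset(void)
    {
        struct sigaction act;
        int err;

        atomic_set(&sigsegv_user_handler, NULL);
        /* restore the default SIGSEGV action explicitly */
        memset(&act, 0, sizeof(act));
        act.sa_handler = SIG_DFL;
        err = sigaction(SIGSEGV, &act, NULL);
        if (err) {
            error_exit(err, __func__);
        }
    }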

> +        err = sigaction(SIGSEGV, &act, NULL);
> +        if (err) {
> +            error_exit(err, __func__);
> +        }
> +    } else {
> +        handler(signum, siginfo, sigctx);

Here I'm not sure whether we should return something from the user
handler indicating whether it correctly handled the SIGSEGV signal;
otherwise, how could we tell an expected event from a real SIGSEGV
fault?
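
For example (a sketch; the bool return type is my assumption, not part
of the patch):

    typedef bool (*sigsegv_handler)(int signum, siginfo_t *siginfo,
                                    void *sigctx);

    static void sigsegv_default_handler(int signum, siginfo_t *siginfo,
                                        void *sigctx)
    {
        sigsegv_handler handler = sigsegv_user_handler_get();

        if (handler && handler(signum, siginfo, sigctx)) {
            /* an expected write-protect fault, already handled */
            return;
        }
        /* unexpected or unhandled fault: restore the default action
         * and let the fault be re-raised, as the patch does today */
    }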

> +    }
> +}
> +
>  void qemu_thread_create(QemuThread *thread, const char *name,
>                         void *(*start_routine)(void*),
>                         void *arg, int mode)
> @@ -496,14 +537,25 @@ void qemu_thread_create(QemuThread *thread, const char *name,
>      sigset_t set, oldset;
>      int err;
>      pthread_attr_t attr;
> +    struct sigaction act;
>  
>      err = pthread_attr_init(&attr);
>      if (err) {
>          error_exit(err, __func__);
>      }
>  
> +    memset(&act, 0, sizeof(act));
> +    act.sa_flags = SA_SIGINFO;
> +    act.sa_sigaction = sigsegv_default_handler;
> +    err = sigaction(SIGSEGV, &act, NULL);
> +    if (err) {
> +        error_exit(err, __func__);
> +    }
> +
>      /* Leave signal handling to the iothread.  */
>      sigfillset(&set);
> +    /* ...all but SIGSEGV */
> +    sigdelset(&set, SIGSEGV);

Why do we need to enable SIGSEGV on all threads explicitly?  Could we
just keep the old behavior, as with the other signals?

>      pthread_sigmask(SIG_SETMASK, &set, &oldset);
>      err = pthread_create(&thread->thread, &attr, start_routine, arg);
>      if (err)
> -- 
> 2.17.0
> 
> 

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v1 02/17] bitops: add some atomic versions of bitmap operations
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 02/17] bitops: add some atomic versions of bitmap operations Denis Plotnikov
@ 2018-07-20  5:09   ` Peter Xu
  0 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2018-07-20  5:09 UTC (permalink / raw)
  To: Denis Plotnikov; +Cc: dgilbert, quintela, pbonzini, qemu-devel

On Wed, Jul 18, 2018 at 06:41:45PM +0300, Denis Plotnikov wrote:
> 1. test bit
> 2. test and set bit
> 
> Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v1 01/17] migration: add background snapshot capability
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 01/17] migration: add background snapshot capability Denis Plotnikov
@ 2018-07-20  5:14   ` Peter Xu
  0 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2018-07-20  5:14 UTC (permalink / raw)
  To: Denis Plotnikov; +Cc: dgilbert, quintela, pbonzini, qemu-devel

On Wed, Jul 18, 2018 at 06:41:44PM +0300, Denis Plotnikov wrote:
> The capability is used for the background vmstate saving
> using the migration infrastructure.
> Background vmstate saving means that the majority of vmstate
> (RAM) is saved in the background while the VM's vCPUs are running.
> This helps to reduce the VM downtime on VM snapshotting.
> 
> Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
> ---
>  migration/migration.c | 35 +++++++++++++++++++++++++++++++++++
>  migration/migration.h |  1 +
>  qapi/migration.json   |  6 +++++-
>  3 files changed, 41 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index d780601f0c..87096d23ef 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -697,6 +697,12 @@ static bool migrate_caps_check(bool *cap_list,
>              return false;
>          }
>  
> +        if (cap_list[MIGRATION_CAPABILITY_BACKGROUND_SNAPSHOT]) {
> +            error_setg(errp, "Postcopy is not compatible "
> +                        "with background snapshot");
> +            return false;
> +        }
> +
>          /* This check is reasonably expensive, so only when it's being
>           * set the first time, also it's only the destination that needs
>           * special support.
> @@ -711,6 +717,26 @@ static bool migrate_caps_check(bool *cap_list,
>          }
>      }
>  
> +    if (cap_list[MIGRATION_CAPABILITY_BACKGROUND_SNAPSHOT]) {
> +        if (cap_list[MIGRATION_CAPABILITY_RELEASE_RAM]) {
> +            error_setg(errp, "Background snapshot is not compatible "
> +                        "with release ram capability");
> +            return false;
> +        }
> +
> +        if (cap_list[MIGRATION_CAPABILITY_COMPRESS]) {
> +            error_setg(errp, "Background snapshot is not "
> +                        "currently compatible with compression");
> +            return false;
> +        }
> +
> +        if (cap_list[MIGRATION_CAPABILITY_XBZRLE]) {
> +            error_setg(errp, "Background snapshot is not "
> +                        "currently compatible with XBZLRE");
> +            return false;
> +        }
> +    }
> +
>      return true;
>  }
>  
> @@ -1635,6 +1661,15 @@ bool migrate_use_block_incremental(void)
>      return s->parameters.block_incremental;
>  }
>  
> +bool migrate_background_snapshot(void)
> +{
> +    MigrationState *s;
> +
> +    s = migrate_get_current();
> +
> +    return s->enabled_capabilities[MIGRATION_CAPABILITY_BACKGROUND_SNAPSHOT];
> +}
> +
>  /* migration thread support */
>  /*
>   * Something bad happened to the RP stream, mark an error
> diff --git a/migration/migration.h b/migration/migration.h
> index 663415fe48..21babbc008 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -201,6 +201,7 @@ int migrate_compress_level(void);
>  int migrate_compress_threads(void);
>  int migrate_decompress_threads(void);
>  bool migrate_use_events(void);
> +bool migrate_background_snapshot(void);
>  
>  /* Sending on the return path - generic and then for each message type */
>  void migrate_send_rp_shut(MigrationIncomingState *mis,
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 03f57c9616..eb5f52b7ff 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -352,12 +352,16 @@
>  #
>  # @x-multifd: Use more than one fd for migration (since 2.11)
>  #
> +# @background-snapshot: Using migration infrastructure makes VM snapshot
> +#         saving its RAM in background. This reduces  VM downtime. (since 2.12)
> +#

Please specify "(since 3.1)" in your next post.

The sentence is not that clear to me.  For example, "saving its RAM in
background" is done in every migration, not only for the live
snapshot.  The words "reduces VM downtime" also sound like they apply
to live migration rather than snapshots.  My try:

  If enabled, the migration stream will be a snapshot of the VM
  exactly at the point when the migration procedure starts.

>  # Since: 1.2
>  ##
>  { 'enum': 'MigrationCapability',
>    'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
>             'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram',
> -           'block', 'return-path', 'pause-before-switchover', 'x-multifd' ] }
> +           'block', 'return-path', 'pause-before-switchover', 'x-multifd',
> +           'background-snapshot' ] }
>  
>  ##
>  # @MigrationCapabilityStatus:
> -- 
> 2.17.0
> 
> 

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v1 06/17] background snapshot: add helpers to manage a copy of ram block list
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 06/17] background snapshot: add helpers to manage a copy of ram block list Denis Plotnikov
@ 2018-07-20  7:57   ` Peter Xu
  0 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2018-07-20  7:57 UTC (permalink / raw)
  To: Denis Plotnikov; +Cc: dgilbert, quintela, pbonzini, qemu-devel

On Wed, Jul 18, 2018 at 06:41:49PM +0300, Denis Plotnikov wrote:
> The VM ramblock list may change during the snapshotting.
> We want to make sure that we have the same ramblock list as it
> was at the time the snapshot began.
> So, we create a copy of the list at the beginning of the snapshotting
> and use it during the process.

Could I ask why the list might change?

> 
> Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
> ---
>  migration/ram.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++
>  migration/ram.h |  6 ++++++
>  2 files changed, 57 insertions(+)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 021d583b9b..4399c31606 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1508,6 +1508,57 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss,
>      return pages;
>  }
>  
> +/* BackGround Snapshot Blocks */
> +static RamBlockList bgs_blocks;
> +
> +static RamBlockList *ram_bgs_block_list_get(void)
> +{
> +    return &bgs_blocks;
> +}
> +
> +void ram_block_list_create(void)
> +{
> +    RAMBlock *block = NULL;
> +    RamBlockList *block_list = ram_bgs_block_list_get();
> +
> +    qemu_mutex_lock_ramlist();
> +    QLIST_FOREACH(block, &ram_list.blocks, next) {

Even if we want to copy a list of ramblocks, we possibly don't need
the ones that aren't migratable?  Please refer to
RAMBLOCK_FOREACH_MIGRATABLE().  Then I think...
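
The copy loop could then look like this (a sketch, reusing the
function's block/block_list locals):

    qemu_mutex_lock_ramlist();
    RAMBLOCK_FOREACH_MIGRATABLE(block) {
        memory_region_ref(block->mr);
        QLIST_INSERT_HEAD(block_list, block, bgs_next);
    }
    qemu_mutex_unlock_ramlist();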

> +        memory_region_ref(block->mr);
> +        QLIST_INSERT_HEAD(block_list, block, bgs_next);
> +    }
> +    qemu_mutex_unlock_ramlist();
> +}
> +
> +void ram_block_list_destroy(void)
> +{
> +    RAMBlock *next, *block;
> +    RamBlockList *block_list = ram_bgs_block_list_get();
> +
> +    QLIST_FOREACH_SAFE(block, block_list, bgs_next, next) {
> +        QLIST_REMOVE(block, bgs_next);
> +        memory_region_unref(block->mr);
> +    }
> +}
> +
> +RAMBlock *ram_bgs_block_find(uint8_t *address, ram_addr_t *page_offset)
> +{
> +    RAMBlock *block = NULL;
> +    RamBlockList *block_list = ram_bgs_block_list_get();
> +
> +    QLIST_FOREACH(block, block_list, bgs_next) {
> +        /* This case happens when the block is not mapped. */
> +        if (block->host == NULL) {

... things like this should not happen.

Thanks,

> +            continue;
> +        }
> +
> +        if (address - block->host < block->max_length) {
> +            *page_offset = (address - block->host) & TARGET_PAGE_MASK;
> +            return block;
> +        }
> +    }
> +
> +    return NULL;
> +}
>  /**
>   * ram_find_and_save_block: finds a dirty page and sends it to f
>   *
> diff --git a/migration/ram.h b/migration/ram.h
> index 64d81e9f1d..385579cae9 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -31,6 +31,7 @@
>  
>  #include "qemu-common.h"
>  #include "exec/cpu-common.h"
> +#include "exec/ramlist.h"
>  
>  extern MigrationStats ram_counters;
>  extern XBZRLECacheStats xbzrle_counters;
> @@ -61,5 +62,10 @@ void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
>  int ramblock_recv_bitmap_test(RAMBlock *rb, void *host_addr);
>  void ramblock_recv_bitmap_set(RAMBlock *rb, void *host_addr);
>  void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr, size_t nr);
> +/* For background snapshot */
> +void ram_block_list_create(void);
> +void ram_block_list_destroy(void);
> +
> +RAMBlock *ram_bgs_block_find(uint8_t *address, ram_addr_t *page_offset);
>  
>  #endif
> -- 
> 2.17.0
> 
> 

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v1 05/17] ram: extend the data structures for background snapshotting
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 05/17] ram: extend the data structures for background snapshotting Denis Plotnikov
@ 2018-07-20  7:59   ` Peter Xu
  0 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2018-07-20  7:59 UTC (permalink / raw)
  To: Denis Plotnikov; +Cc: dgilbert, quintela, pbonzini, qemu-devel

On Wed, Jul 18, 2018 at 06:41:48PM +0300, Denis Plotnikov wrote:
> Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
> ---
>  include/exec/ram_addr.h | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
> index 6cbc02aa0f..5b403d537d 100644
> --- a/include/exec/ram_addr.h
> +++ b/include/exec/ram_addr.h
> @@ -36,6 +36,8 @@ struct RAMBlock {
>      char idstr[256];
>      /* RCU-enabled, writes protected by the ramlist lock */
>      QLIST_ENTRY(RAMBlock) next;
> +    /* blocks used for background snapshot */
> +    QLIST_ENTRY(RAMBlock) bgs_next;
>      QLIST_HEAD(, RAMBlockNotifier) ramblock_notifiers;
>      int fd;
>      size_t page_size;
> @@ -49,6 +51,11 @@ struct RAMBlock {
>      unsigned long *unsentmap;
>      /* bitmap of already received pages in postcopy */
>      unsigned long *receivedmap;
> +    /* The following 2 are for background snapshot */
> +    /* Pages currently being copied */
> +    unsigned long *touched_map;
> +    /* Pages that have been copied already */
> +    unsigned long *copied_map;
>  };
>  
>  static inline bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset)
> -- 
> 2.17.0
> 
> 

IMHO these new fields can be put into the patches where they are
first used.  Isolating them does not really help much with anything
AFAICT...

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v1 10/17] background snapshots: adapt the page queueing code for using page copies
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 10/17] background snapshots: adapt the page queueing code for using page copies Denis Plotnikov
@ 2018-07-20  8:39   ` Peter Xu
  2018-07-20 11:46   ` Peter Xu
  1 sibling, 0 replies; 34+ messages in thread
From: Peter Xu @ 2018-07-20  8:39 UTC (permalink / raw)
  To: Denis Plotnikov; +Cc: dgilbert, quintela, pbonzini, qemu-devel

On Wed, Jul 18, 2018 at 06:41:53PM +0300, Denis Plotnikov wrote:
> The background snapshot uses memory page copying to seal the page
> memory content. The patch adapts the migration infrastructure to save
> copies of the pages.

Again, since the previous patch only defined some fields that are
first used in this patch, I think you can squash it into this one.

> 
> Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
> ---
>  migration/migration.c |  2 +-
>  migration/ram.c       | 59 ++++++++++++++++++++++++++++++++-----------
>  migration/ram.h       |  3 ++-
>  3 files changed, 47 insertions(+), 17 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 87096d23ef..131d0904e4 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1716,7 +1716,7 @@ static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
>          return;
>      }
>  
> -    if (ram_save_queue_pages(rbname, start, len)) {
> +    if (ram_save_queue_pages(NULL, rbname, start, len, NULL)) {
>          mark_source_rp_bad(ms);
>      }
>  }
> diff --git a/migration/ram.c b/migration/ram.c
> index dc7dfe0726..3c4ccd85b4 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -976,7 +976,12 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss, bool last_stage)
>      RAMBlock *block = pss->block;
>      ram_addr_t offset = pss->page << TARGET_PAGE_BITS;
>  
> -    p = block->host + offset;
> +    if (pss->page_copy) {
> +        p = pss->page_copy;
> +    } else {
> +        p = block->host + offset;
> +    }
> +
>      trace_ram_save_page(block->idstr, (uint64_t)offset, p);
>  
>      /* In doubt sent page as normal */
> @@ -1003,13 +1008,18 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss, bool last_stage)
>      } else {
>          pages = save_zero_page(rs, block, offset, p);
>          if (pages > 0) {
> -            /* Must let xbzrle know, otherwise a previous (now 0'd) cached
> -             * page would be stale
> -             */
> -            xbzrle_cache_zero_page(rs, current_addr);
> -            ram_release_pages(block->idstr, offset, pages);
> +            if (pss->page_copy) {
> +                qemu_madvise(p, TARGET_PAGE_SIZE, MADV_DONTNEED);
> +            } else {
> +                /* Must let xbzrle know, otherwise a previous (now 0'd) cached
> +                 * page would be stale
> +                 */
> +                xbzrle_cache_zero_page(rs, current_addr);
> +                ram_release_pages(block->idstr, offset, pages);
> +            }
>          } else if (!rs->ram_bulk_stage &&
> -                   !migration_in_postcopy() && migrate_use_xbzrle()) {
> +                   !migration_in_postcopy() && migrate_use_xbzrle() &&
> +                   !migrate_background_snapshot()) {
>              pages = save_xbzrle_page(rs, &p, current_addr, block,
>                                       offset, last_stage);
>              if (!last_stage) {
> @@ -1026,9 +1036,10 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss, bool last_stage)
>          ram_counters.transferred +=
>              save_page_header(rs, rs->f, block, offset | RAM_SAVE_FLAG_PAGE);
>          if (send_async) {
> -            qemu_put_buffer_async(rs->f, p, TARGET_PAGE_SIZE,
> -                                  migrate_release_ram() &
> -                                  migration_in_postcopy());
> +            bool may_free = migrate_background_snapshot() ||
> +                            (migrate_release_ram() &&
> +                             migration_in_postcopy());
> +            qemu_put_buffer_async(rs->f, p, TARGET_PAGE_SIZE, may_free);
>          } else {
>              qemu_put_buffer(rs->f, p, TARGET_PAGE_SIZE);
>          }
> @@ -1269,7 +1280,8 @@ static bool find_dirty_block(RAMState *rs, PageSearchStatus *pss, bool *again)
>   * @rs: current RAM state
>   * @offset: used to return the offset within the RAMBlock
>   */
> -static RAMBlock *unqueue_page(RAMState *rs, ram_addr_t *offset)
> +static RAMBlock *unqueue_page(RAMState *rs, ram_addr_t *offset,
> +                              void **page_copy)
>  {
>      RAMBlock *block = NULL;
>  
> @@ -1279,10 +1291,14 @@ static RAMBlock *unqueue_page(RAMState *rs, ram_addr_t *offset)
>                                  QSIMPLEQ_FIRST(&rs->src_page_requests);
>          block = entry->rb;
>          *offset = entry->offset;
> +        *page_copy = entry->page_copy;
>  
>          if (entry->len > TARGET_PAGE_SIZE) {
>              entry->len -= TARGET_PAGE_SIZE;
>              entry->offset += TARGET_PAGE_SIZE;
> +            if (entry->page_copy) {
> +                entry->page_copy += TARGET_PAGE_SIZE / sizeof(void *);
> +            }
>          } else {
>              memory_region_unref(block->mr);
>              QSIMPLEQ_REMOVE_HEAD(&rs->src_page_requests, next_req);
> @@ -1309,9 +1325,10 @@ static bool get_queued_page(RAMState *rs, PageSearchStatus *pss)
>      RAMBlock  *block;
>      ram_addr_t offset;
>      bool dirty;
> +    void *page_copy;
>  
>      do {
> -        block = unqueue_page(rs, &offset);
> +        block = unqueue_page(rs, &offset, &page_copy);
>          /*
>           * We're sending this page, and since it's postcopy nothing else
>           * will dirty it, and we must make sure it doesn't get sent again
> @@ -1349,6 +1366,7 @@ static bool get_queued_page(RAMState *rs, PageSearchStatus *pss)
>           */
>          pss->block = block;
>          pss->page = offset >> TARGET_PAGE_BITS;
> +        pss->page_copy = page_copy;
>      }
>  
>      return !!block;
> @@ -1386,17 +1404,25 @@ static void migration_page_queue_free(RAMState *rs)
>   *
>   * @rbname: Name of the RAMBLock of the request. NULL means the
>   *          same that last one.
> + * @block: RAMBlock to use. block and rbname have mutualy exclusive
> + *         semantic with higher priority of the block.

If "mutual exclusive" then no priority at all?  Maybe simply:

  @block: RAMBlock to use.  When @block is provided, then @rbname is ignored.
 
>   * @start: starting address from the start of the RAMBlock
>   * @len: length (in bytes) to send
> + * @page_copy: the address the page should be written from

Maybe:

  @page_copy: the page to copy to destination.  If not specified, will
              use the page data specified by @start and @len.

>   */
> -int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len)
> +int ram_save_queue_pages(RAMBlock *block, const char *rbname,
> +                         ram_addr_t start, ram_addr_t len, void *page_copy)
>  {
>      RAMBlock *ramblock;
>      RAMState *rs = ram_state;
>  
>      ram_counters.postcopy_requests++;
> +
>      rcu_read_lock();
> -    if (!rbname) {
> +
> +    if (block) {
> +        ramblock = block;
> +    } else if (!rbname) {
>          /* Reuse last RAMBlock */
>          ramblock = rs->last_req_rb;
>  
> @@ -1431,6 +1457,7 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len)
>      new_entry->rb = ramblock;
>      new_entry->offset = start;
>      new_entry->len = len;
> +    new_entry->page_copy = page_copy;
>  
>      memory_region_ref(ramblock->mr);
>      qemu_mutex_lock(&rs->src_page_req_mutex);
> @@ -1468,7 +1495,8 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss,
>           * xbzrle can do better than compression.
>           */
>          if (migrate_use_compression() &&
> -            (rs->ram_bulk_stage || !migrate_use_xbzrle())) {
> +            (rs->ram_bulk_stage || !migrate_use_xbzrle()) &&
> +            !migrate_background_snapshot()) {

This seems unnecessary - in the first patch you have already made sure
that compression is not enabled during a live snapshot, so we should
be fine since we check migrate_use_compression() first.

>              res = ram_save_compressed_page(rs, pss, last_stage);
>          } else {
>              res = ram_save_page(rs, pss, last_stage);
> @@ -1706,6 +1734,7 @@ static int ram_find_and_save_block(RAMState *rs, bool last_stage)
>      pss.block = rs->last_seen_block;
>      pss.page = rs->last_page;
>      pss.complete_round = false;
> +    pss.page_copy = NULL;
>  
>      if (!pss.block) {
>          pss.block = QLIST_FIRST_RCU(&ram_list.blocks);
> diff --git a/migration/ram.h b/migration/ram.h
> index 4c463597a5..c3679ba65e 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -46,7 +46,8 @@ int multifd_load_setup(void);
>  int multifd_load_cleanup(Error **errp);
>  
>  uint64_t ram_pagesize_summary(void);
> -int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len);
> +int ram_save_queue_pages(RAMBlock *block, const char *rbname,
> +                         ram_addr_t start, ram_addr_t len, void *page_copy);
>  void acct_update_position(QEMUFile *f, size_t size, bool zero);
>  void ram_debug_dump_bitmap(unsigned long *todump, bool expected,
>                             unsigned long pages);
> -- 
> 2.17.0
> 
> 

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v1 07/17] background snapshot: introduce page buffer
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 07/17] background snapshot: introduce page buffer Denis Plotnikov
@ 2018-07-20  9:22   ` Peter Xu
  2018-07-20 10:34   ` Paolo Bonzini
  1 sibling, 0 replies; 34+ messages in thread
From: Peter Xu @ 2018-07-20  9:22 UTC (permalink / raw)
  To: Denis Plotnikov; +Cc: dgilbert, quintela, pbonzini, qemu-devel

On Wed, Jul 18, 2018 at 06:41:50PM +0300, Denis Plotnikov wrote:
> To limit the amount of memory used by the background snapshot,
> a memory limiter called the "page buffer" is used.
> In fact, it's a memory page counter limiting the page allocation.
> Currently, the limit of pages is hardcoded, but its setter is
> a matter for future work.
> 
> The background snapshot makes a copy of the page before writing it
> to the snapshot destination, but it doesn't use a preallocated buffer,
> because the background snapshot employs the migration infrastructure
> which, in turn, uses madvise to release the buffer.
> The advice issuing is hard to track, so here we rely on an anonymous
> memory mapping which is released by the existing code.
> 
> Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
> ---
>  migration/ram.c | 91 +++++++++++++++++++++++++++++++++++++++++++++++++
>  migration/ram.h |  3 ++
>  2 files changed, 94 insertions(+)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 4399c31606..27d3403435 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -192,6 +192,16 @@ struct RAMSrcPageRequest {
>      QSIMPLEQ_ENTRY(RAMSrcPageRequest) next_req;
>  };
>  
> +/* Page buffer used for background snapshot */
> +typedef struct RAMPageBuffer {
> +    /* Page buffer capacity in host pages */
> +    int capacity;
> +    /* Current number of pages in the buffer */
> +    int used;
> +    /* Event to notify that buffer usage is under capacity */
> +    QemuEvent used_decreased;
> +} RAMPageBuffer;
> +
>  /* State of RAM for migration */
>  struct RAMState {
>      /* QEMUFile used for this migration */
> @@ -230,6 +240,11 @@ struct RAMState {
>      /* Queue of outstanding page requests from the destination */
>      QemuMutex src_page_req_mutex;
>      QSIMPLEQ_HEAD(src_page_requests, RAMSrcPageRequest) src_page_requests;
> +    /* The following 2 are for background snapshot */
> +    /* Buffer data to store copies of ram pages while async vm saving */
> +    RAMPageBuffer page_buffer;
> +    /* Event to notify that a page copy has just finished */
> +    QemuEvent page_copying_done;
>  };
>  typedef struct RAMState RAMState;
>  
> @@ -1540,6 +1555,52 @@ void ram_block_list_destroy(void)
>      }
>  }
>  
> +static void ram_page_buffer_decrease_used(void)
> +{
> +    qemu_event_reset(&ram_state->page_buffer.used_decreased);

Could I ask why we need to reset it before setting it?

> +    atomic_dec(&ram_state->page_buffer.used);
> +    qemu_event_set(&ram_state->page_buffer.used_decreased);
> +}
> +
> +static void ram_page_buffer_increase_used_wait(void)
> +{
> +    int used, *used_ptr;
> +    RAMState *rs = ram_state;
> +    used_ptr = &rs->page_buffer.used;
> +    do {
> +        used = atomic_read(used_ptr);
> +        if (rs->page_buffer.capacity > used) {
> +            if (atomic_cmpxchg(used_ptr, used, used + 1) == used) {
> +                return;
> +            } else {
> +                continue;
> +            }
> +        } else {
> +            qemu_event_wait(&rs->page_buffer.used_decreased);
> +        }
> +    } while (true);

AFAIU this whole function tries to serialize the usage of
page_buffer.used. Have you ever tried to simply use e.g. a mutex to
protect it, and just implement the function in the simplest (but
complete) way first?  I believe this is for performance's sake, but
I'm not sure the lockless version will actually run faster when the
critical section of the mutex is very small (here, only the update of
page_buffer.used).  It seems that there are multiple ongoing efforts
on the list that have tried to
implement their own data structures (possibly lockless), including
this one.  For example, Guangrong has proposed a lockless fifo just in
~1.5 month:

https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg00526.html

Lots of discussions there, and IIUC the conclusion is that we can
consider to just backport an existing implementation from the kernel:

https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg08807.html

I'm just hoping that these "data structure" works might not block the
function itself from being merged.  Same thing applies to this series.
I never meant that you should drop existing work and do it again using
mutex, but I just spoke this out in case it makes some sense to move
the work forward faster.
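
For reference, a mutex/condvar version could look like this (a sketch
with the same semantics; the lock and cond fields in RAMPageBuffer are
my assumption):

    static void ram_page_buffer_increase_used_wait(void)
    {
        RAMPageBuffer *buf = &ram_state->page_buffer;

        qemu_mutex_lock(&buf->lock);
        while (buf->used >= buf->capacity) {
            qemu_cond_wait(&buf->cond, &buf->lock);
        }
        buf->used++;
        qemu_mutex_unlock(&buf->lock);
    }

    static void ram_page_buffer_decrease_used(void)
    {
        RAMPageBuffer *buf = &ram_state->page_buffer;

        qemu_mutex_lock(&buf->lock);
        buf->used--;
        qemu_cond_signal(&buf->cond);
        qemu_mutex_unlock(&buf->lock);
    }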

> +}
> +
> +void *ram_page_buffer_get(void)
> +{
> +    void *page;
> +    ram_page_buffer_increase_used_wait();
> +    page = mmap(0, TARGET_PAGE_SIZE, PROT_READ | PROT_WRITE,
> +                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> +
> +    if (page == MAP_FAILED) {
> +        ram_page_buffer_decrease_used();
> +        page = NULL;
> +    }
> +    return page;
> +}
> +
> +int ram_page_buffer_free(void *buffer)
> +{
> +    ram_page_buffer_decrease_used();
> +    return qemu_madvise(buffer, TARGET_PAGE_SIZE, MADV_DONTNEED);

Just to confirm: is this really exactly the same as munmap()?

> +}
> +
>  RAMBlock *ram_bgs_block_find(uint8_t *address, ram_addr_t *page_offset)
>  {
>      RAMBlock *block = NULL;
> @@ -1559,6 +1620,7 @@ RAMBlock *ram_bgs_block_find(uint8_t *address, ram_addr_t *page_offset)
>  
>      return NULL;
>  }
> +
>  /**
>   * ram_find_and_save_block: finds a dirty page and sends it to f
>   *
> @@ -1651,6 +1713,13 @@ static void xbzrle_load_cleanup(void)
>  
>  static void ram_state_cleanup(RAMState **rsp)
>  {
> +    if (migrate_background_snapshot()) {
> +        qemu_event_destroy(&(*rsp)->page_buffer.used_decreased);
> +        /* In case somebody is still waiting for the event */
> +        qemu_event_set(&(*rsp)->page_copying_done);
> +        qemu_event_destroy(&(*rsp)->page_copying_done);

We set this since "there might be someone waiting", but then we
immediately destroy it...  What if someone reads it after we destroy
it?  Don't we need to sync somehow before the destruction happens?

> +    }
> +
>      migration_page_queue_free(*rsp);
>      qemu_mutex_destroy(&(*rsp)->bitmap_mutex);
>      qemu_mutex_destroy(&(*rsp)->src_page_req_mutex);
> @@ -1689,6 +1758,13 @@ static void ram_save_cleanup(void *opaque)
>          block->bmap = NULL;
>          g_free(block->unsentmap);
>          block->unsentmap = NULL;
> +
> +        if (migrate_background_snapshot()) {
> +            g_free(block->touched_map);
> +            block->touched_map = NULL;
> +            g_free(block->copied_map);
> +            block->copied_map = NULL;
> +        }
>      }
>  
>      xbzrle_cleanup();
> @@ -1703,6 +1779,10 @@ static void ram_state_reset(RAMState *rs)
>      rs->last_page = 0;
>      rs->last_version = ram_list.version;
>      rs->ram_bulk_stage = true;
> +
> +    /* page buffer capacity in number of pages */
> +    rs->page_buffer.capacity = 1000;
> +    rs->page_buffer.used = 0;
>  }
>  
>  #define MAX_WAIT 50 /* ms, half buffered_file limit */
> @@ -2180,6 +2260,11 @@ static int ram_state_init(RAMState **rsp)
>       */
>      (*rsp)->migration_dirty_pages = ram_bytes_total() >> TARGET_PAGE_BITS;
>  
> +    if (migrate_background_snapshot()) {
> +        qemu_event_init(&ram_state->page_buffer.used_decreased, false);
> +        qemu_event_init(&ram_state->page_copying_done, false);
> +    }
> +
>      ram_state_reset(*rsp);
>  
>      return 0;
> @@ -2196,10 +2281,16 @@ static void ram_list_init_bitmaps(void)
>              pages = block->max_length >> TARGET_PAGE_BITS;
>              block->bmap = bitmap_new(pages);
>              bitmap_set(block->bmap, 0, pages);
> +
>              if (migrate_postcopy_ram()) {
>                  block->unsentmap = bitmap_new(pages);
>                  bitmap_set(block->unsentmap, 0, pages);
>              }
> +
> +            if (migrate_background_snapshot()) {
> +                block->touched_map = bitmap_new(pages);
> +                block->copied_map = bitmap_new(pages);
> +            }
>          }
>      }
>  }
> diff --git a/migration/ram.h b/migration/ram.h
> index 385579cae9..7c802c1d7f 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -68,4 +68,7 @@ void ram_block_list_destroy(void);
>  
>  RAMBlock *ram_bgs_block_find(uint8_t *address, ram_addr_t *page_offset);
>  
> +void *ram_page_buffer_get(void);
> +int ram_page_buffer_free(void *buffer);
> +
>  #endif
> -- 
> 2.17.0
> 
> 

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] Background snapshots
  2018-07-18 15:41 [Qemu-devel] [PATCH v1 00/17] Background snapshots Denis Plotnikov
                   ` (16 preceding siblings ...)
  2018-07-18 15:42 ` [Qemu-devel] [PATCH v1 17/17] background snapshot: enable background snapshot Denis Plotnikov
@ 2018-07-20  9:27 ` Peter Xu
  2018-09-04 13:00   ` Denis Plotnikov
  17 siblings, 1 reply; 34+ messages in thread
From: Peter Xu @ 2018-07-20  9:27 UTC (permalink / raw)
  To: Denis Plotnikov; +Cc: dgilbert, quintela, pbonzini, qemu-devel

On Wed, Jul 18, 2018 at 06:41:43PM +0300, Denis Plotnikov wrote:
> The workflow to make a snapshot is the following:
> 1. Pause the vm
> 2. Make a snapshot of block devices using the scheme of your choice
> 3. Turn on background-snapshot migration capability
> 4. Start the migration using the destination (migration stream) of your choice.
>    The migration will resume the vm execution by itself
>    when it has the devices' states saved  and is ready to start ram writing
>    to the migration stream.
> 5. Listen to the migration finish event
> 
> The background snapshot works with support of a KVM patch:
> "x86: mmu: report failed memory access to the userspace"
> (not applied to mainline, it's in the kvm mailing list)

Hello, Denis,

Do you mind pushing your tree to an online repository to make review
easier?

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v1 15/17] kvm: add vCPU failed memeory access processing
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 15/17] kvm: add vCPU failed memeory access processing Denis Plotnikov
@ 2018-07-20  9:44   ` Peter Xu
  0 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2018-07-20  9:44 UTC (permalink / raw)
  To: Denis Plotnikov; +Cc: dgilbert, quintela, pbonzini, qemu-devel

On Wed, Jul 18, 2018 at 06:41:58PM +0300, Denis Plotnikov wrote:
> This is done with support of the KVM patch returning the faulting address.
> 
> Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>

I feel like these two kvm-related patches can be put at the end of the
series as an extension to kvm support.  E.g., without these kvm
patches one should still be able to do live snapshot with TCG.  Then
even if your KVM patches are not settled (along with the header
update), there's still a chance that your framework can be merged first.

> ---
>  target/i386/kvm.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
> index 3ac5302bc5..55b8860d1a 100644
> --- a/target/i386/kvm.c
> +++ b/target/i386/kvm.c
> @@ -45,6 +45,8 @@
>  #include "hw/pci/msi.h"
>  #include "hw/pci/msix.h"
>  #include "migration/blocker.h"
> +#include "migration/savevm.h"
> +#include "migration/ram.h"
>  #include "exec/memattrs.h"
>  #include "trace.h"
>  
> @@ -3130,6 +3132,18 @@ static bool host_supports_vmx(void)
>      return ecx & CPUID_EXT_VMX;
>  }
>  
> +static int kvm_handle_fail_mem_access(CPUState *cpu)
> +{
> +    struct kvm_run *run = cpu->kvm_run;
> +    int ret = ram_process_page_fault((void *)run->fail_mem_access.hva);
> +
> +    if (ret >= 0) {
> +        cpu_resume(cpu);
> +    }
> +
> +    return ret;
> +}
> +
>  #define VMX_INVALID_GUEST_STATE 0x80000021
>  
>  int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
> @@ -3188,6 +3202,9 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
>          ioapic_eoi_broadcast(run->eoi.vector);
>          ret = 0;
>          break;
> +    case KVM_EXIT_FAIL_MEM_ACCESS:
> +        ret = kvm_handle_fail_mem_access(cs);
> +        break;
>      default:
>          fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
>          ret = -1;
> -- 
> 2.17.0
> 
> 

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v1 07/17] background snapshot: introduce page buffer
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 07/17] background snapshot: introduce page buffer Denis Plotnikov
  2018-07-20  9:22   ` Peter Xu
@ 2018-07-20 10:34   ` Paolo Bonzini
  1 sibling, 0 replies; 34+ messages in thread
From: Paolo Bonzini @ 2018-07-20 10:34 UTC (permalink / raw)
  To: Denis Plotnikov, dgilbert, quintela; +Cc: qemu-devel

On 18/07/2018 17:41, Denis Plotnikov wrote:
> +static void ram_page_buffer_decrease_used(void)
> +{
> +    qemu_event_reset(&ram_state->page_buffer.used_decreased);
> +    atomic_dec(&ram_state->page_buffer.used);
> +    qemu_event_set(&ram_state->page_buffer.used_decreased);

As Peter noted, only the qemu_event_set is needed here, while...

> +}
> +
> +static void ram_page_buffer_increase_used_wait(void)
> +{
> +    int used, *used_ptr;
> +    RAMState *rs = ram_state;
> +    used_ptr = &rs->page_buffer.used;
> +    do {
> +        used = atomic_read(used_ptr);
> +        if (rs->page_buffer.capacity > used) {
> +            if (atomic_cmpxchg(used_ptr, used, used + 1) == used) {
> +                return;
> +            } else {
> +                continue;
> +            }
> +        } else {
> +            qemu_event_wait(&rs->page_buffer.used_decreased);
> +        }
> +    } while (true);

... the right idiom for using qemu_event_wait is this:

   if test condition
       reset
       if test condition
           wait

So something like

    for(;;) {
        used = atomic_read(used_ptr);
        if (rs->page_buffer.capacity <= used) {
            qemu_event_reset(&rs->page_buffer.used_decreased);
            used = atomic_read(used_ptr);
            if (rs->page_buffer.capacity <= used) {
                qemu_event_wait(&rs->page_buffer.used_decreased);
                continue;
            }
        }

        if (atomic_cmpxchg(used_ptr, used, used + 1) == used) {
            return;
        }
    }

Note that there must be a single thread doing the above.  Otherwise, you
have to use a mutex and a condvar.

Paolo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v1 08/17] migration: add helpers to change VM memory protection rights
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 08/17] migration: add helpers to change VM memory protection rights Denis Plotnikov
@ 2018-07-20 11:28   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert @ 2018-07-20 11:28 UTC (permalink / raw)
  To: Denis Plotnikov; +Cc: quintela, pbonzini, qemu-devel

* Denis Plotnikov (dplotnikov@virtuozzo.com) wrote:
> Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
> ---
>  migration/ram.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++
>  migration/ram.h |  3 +++
>  2 files changed, 57 insertions(+)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 27d3403435..ce3dead932 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1555,6 +1555,60 @@ void ram_block_list_destroy(void)
>      }
>  }
>  
> +static int mem_protect(void *addr, uint64_t length, int prot)
> +{
> +    int ret = mprotect(addr, length, prot);
> +
> +    if (ret < 0) {
> +        error_report("%s: Can't change protection on ram block at %p",
> +                     __func__, addr);
> +    }
> +
> +    return ret;
> +}
> +
> +static int ram_set_ro(void *addr, uint64_t length)
> +{
> +    return mem_protect(addr, length, PROT_READ);
> +}
> +
> +static int ram_set_rw(void *addr, uint64_t length)
> +{
> +    return mem_protect(addr, length, PROT_READ | PROT_WRITE);
> +}
> +
> +int ram_block_list_set_readonly(void)
> +{
> +    RAMBlock *block = NULL;
> +    RamBlockList *block_list = ram_bgs_block_list_get();
> +    int ret = 0;
> +
> +    QLIST_FOREACH(block, block_list, bgs_next) {
> +        ret = ram_set_ro(block->host, block->used_length);
> +        if (ret) {
> +            break;
> +        }
> +    }
> +
> +    return ret;
> +}
> +
> +int ram_block_list_set_writable(void)
> +{
> +    RAMBlock *block = NULL;
> +    RamBlockList *block_list = ram_bgs_block_list_get();
> +    int ret = 0;
> +
> +    QLIST_FOREACH(block, block_list, bgs_next) {
> +        ret = ram_set_rw(block->host, block->used_length);
> +        if (ret) {
> +            break;
> +        }
> +    }

Do you need to keep track of which blocks were writable previously,
to avoid changing blocks that should be read-only?
Or are all RAMBlocks currently mapped RW?

Dave

> +    return ret;
> +}
> +
>  static void ram_page_buffer_decrease_used(void)
>  {
>      qemu_event_reset(&ram_state->page_buffer.used_decreased);
> diff --git a/migration/ram.h b/migration/ram.h
> index 7c802c1d7f..4c463597a5 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -71,4 +71,7 @@ RAMBlock *ram_bgs_block_find(uint8_t *address, ram_addr_t *page_offset);
>  void *ram_page_buffer_get(void);
>  int ram_page_buffer_free(void *buffer);
>  
> +int ram_block_list_set_readonly(void);
> +int ram_block_list_set_writable(void);
> +
>  #endif
> -- 
> 2.17.0
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v1 10/17] background snapshots: adapt the page queueing code for using page copies
  2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 10/17] background snapshots: adapt the page queueing code for using page copies Denis Plotnikov
  2018-07-20  8:39   ` Peter Xu
@ 2018-07-20 11:46   ` Peter Xu
  1 sibling, 0 replies; 34+ messages in thread
From: Peter Xu @ 2018-07-20 11:46 UTC (permalink / raw)
  To: Denis Plotnikov; +Cc: dgilbert, quintela, pbonzini, qemu-devel

On Wed, Jul 18, 2018 at 06:41:53PM +0300, Denis Plotnikov wrote:

[...]

> @@ -1003,13 +1008,18 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss, bool last_stage)
>      } else {
>          pages = save_zero_page(rs, block, offset, p);

Now save_zero_page() is not called by ram_save_page().  I'd suggest
that you rebase to the latest master in your next post since the code
base looks old.

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] Background snapshots
  2018-07-20  9:27 ` [Qemu-devel] [PATCH v1 00/17] Background snapshots Peter Xu
@ 2018-09-04 13:00   ` Denis Plotnikov
  2018-09-05  3:32     ` Peter Xu
  0 siblings, 1 reply; 34+ messages in thread
From: Denis Plotnikov @ 2018-09-04 13:00 UTC (permalink / raw)
  To: Peter Xu; +Cc: dgilbert, quintela, pbonzini, qemu-devel

Hi Peter

I moved the code to the repository
https://github.com/denis-plotnikov/qemu/tree/background-snapshot-kvm.
The current version includes fixes with respect to your comments on
version 1.
I moved the KVM-related patches to the end of the branch (formerly the
patch series).
Since the KVM patches and the other parts to modify (vhost and others)
are not needed in favor of the upcoming userfaultfd, I would ask you
to review the general framework, which is able to work with TCG.

Thanks in advance!

Denis

On 20.07.2018 12:27, Peter Xu wrote:
> On Wed, Jul 18, 2018 at 06:41:43PM +0300, Denis Plotnikov wrote:
>> The workflow to make a snapshot is the following:
>> 1. Pause the vm
>> 2. Make a snapshot of block devices using the scheme of your choice
>> 3. Turn on background-snapshot migration capability
>> 4. Start the migration using the destination (migration stream) of your choice.
>>     The migration will resume the vm execution by itself
>>     when it has the devices' states saved  and is ready to start ram writing
>>     to the migration stream.
>> 5. Listen to the migration finish event
>>
>> The background snapshot works with support of a KVM patch:
>> "x86: mmu: report failed memory access to the userspace"
>> (not applied to mainline, it's in the kvm mailing list)
> 
> Hello, Denis,
> 
> Do you mind pushing your tree to an online repository to make review
> easier?
> 
> Thanks,
> 

-- 
Best,
Denis

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] Background snapshots
  2018-09-04 13:00   ` Denis Plotnikov
@ 2018-09-05  3:32     ` Peter Xu
  2018-09-05  9:18       ` Denis Plotnikov
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Xu @ 2018-09-05  3:32 UTC (permalink / raw)
  To: Denis Plotnikov; +Cc: dgilbert, quintela, pbonzini, qemu-devel

On Tue, Sep 04, 2018 at 04:00:31PM +0300, Denis Plotnikov wrote:
> Hi Peter

Hi, Denis,

> 
> I moved the code to the repository
> https://github.com/denis-plotnikov/qemu/tree/background-snapshot-kvm.
> the current version includes fixes with respect to your comments for version
> 1.
> I moved KVM related patches to the end of the branch (formerly patch
> series).
> Since, the KVM patches and the other parts to modify (vhost an others) are
> needless in favor of upcoming userfaltfd,
> I would ask you to review the general framework which is able to work with
> tcg.
> 
> Thanks in advance!

Thank you for pushing the tree.

I might have made a mistake before in thinking that this work at
least works for TCG, but I think I was wrong.  The problem is (I'm
trying to repeat Dave's question, which you seem not to have answered
yet): even for TCG there could be use cases where the process accesses
guest memory from kernel space (e.g., vhost, or any system call with
a guest memory buffer passed in).  I'm afraid mprotect() and the
whole signal-based mechanism cannot address these page faults, so
we'll encounter ad-hoc errors and we'll need to fix all these places
up.  Userfaultfd-wp should not have this problem.
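
For example, when a write-protected guest page is handed to a syscall,
the fault is taken in the kernel and the caller only sees EFAULT; the
process-level SIGSEGV handler never runs (a sketch of the failure
mode):

    /* sketch: kernel-side accesses don't raise SIGSEGV in userspace */
    mprotect(guest_page, page_size, PROT_READ);
    if (read(fd, guest_page, page_size) < 0 && errno == EFAULT) {
        /* the snapshot code never saw a fault; the I/O just failed */
    }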

I think the general idea of the work is good, but I'm not sure whether
we can merge the work if we don't settle these issues.

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] Background snapshots
  2018-09-05  3:32     ` Peter Xu
@ 2018-09-05  9:18       ` Denis Plotnikov
  2018-09-05  9:30         ` Peter Xu
  0 siblings, 1 reply; 34+ messages in thread
From: Denis Plotnikov @ 2018-09-05  9:18 UTC (permalink / raw)
  To: Peter Xu; +Cc: dgilbert, quintela, pbonzini, qemu-devel

Hi Peter,

Thanks for the reply.

Ok, I understand about TCG.
So my only option is to wait for userfaultfd-wp.
Do you know if anyone is currently working on this? And if so, is
there any estimate of when userfaultfd will be ready?

Denis


On 05.09.2018 06:32, Peter Xu wrote:
> On Tue, Sep 04, 2018 at 04:00:31PM +0300, Denis Plotnikov wrote:
>> Hi Peter
> 
> Hi, Denis,
> 
>>
>> I moved the code to the repository
>> https://github.com/denis-plotnikov/qemu/tree/background-snapshot-kvm.
>> The current version includes fixes with respect to your comments on
>> version 1.
>> I moved the KVM-related patches to the end of the branch (formerly the
>> patch series).
>> Since the KVM patches and the other parts needing modification (vhost
>> and others) become unnecessary with the upcoming userfaultfd, I would
>> ask you to review the general framework, which is able to work with
>> TCG.
>>
>> Thanks in advance!
> 
> Thank you for pushing the tree.
> 
> I might have been mistaken before in thinking that this work at
> least works for TCG.  The problem is (I'm repeating Dave's question,
> which you don't seem to have answered yet): even for TCG there could
> be use cases where the process accesses guest memory from kernel
> space (e.g., vhost, or any system call that is passed a guest memory
> buffer).  I'm afraid mprotect() and the whole signal-based mechanism
> cannot catch these page faults, so we'll run into ad hoc errors and
> will need to fix all these places up.  Userfaultfd-wp should not have
> this problem.
> 
> I think the general idea of the work is good, but I'm not sure we can
> merge it without settling these issues.
> 
> Regards,
> 

-- 
Best,
Denis

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] Background snapshots
  2018-09-05  9:18       ` Denis Plotnikov
@ 2018-09-05  9:30         ` Peter Xu
  0 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2018-09-05  9:30 UTC (permalink / raw)
  To: Denis Plotnikov; +Cc: dgilbert, quintela, pbonzini, qemu-devel

On Wed, Sep 05, 2018 at 12:18:09PM +0300, Denis Plotnikov wrote:
> Hi Peter,
> 
> Thanks for the reply.
> 
> Ok, I understand about tcg.
> So my only option is to wait for userfaultfd-wp.
> Do you know if anyone is  currently working on this? And if so, then is
> there any estimations when the userfaultfd is ready?

I am working on it, but it's progressing slowly.  The latest status
is that I can run QEMU/tcg now with Andrea's tree (it needs two more
patches from me applied on top); however, all the changes are still
immature (e.g., one of the patches is being discussed on the linux-mm
list, and the other I haven't discussed with anyone yet; but these
patches worked for me).

Let me know if you want to play with the tree; at least I can share
what I have (so maybe you can try to run your live snapshot with
QEMU/tcg on my userfaultfd tree; if errors occur we can investigate).
Again, you are always welcome to participate in the kernel work too.

Let me know what you think about it.

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 34+ messages in thread
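
For reference, the write-protect side of userfaultfd ended up looking
roughly like the sketch below.  Note that this reflects the uffd-wp
API as it was eventually merged (Linux 5.7 and later), not necessarily
the state of Andrea's tree at the time of this thread, and all error
handling is omitted:

#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    int uffd = syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);

    /* Handshake: ask the kernel for write-protect fault reporting. */
    struct uffdio_api api = {
        .api = UFFD_API,
        .features = UFFD_FEATURE_PAGEFAULT_FLAG_WP,
    };
    ioctl(uffd, UFFDIO_API, &api);

    char *mem = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    memset(mem, 0, page_size);            /* fault the page in first */

    struct uffdio_register reg = {
        .range = { .start = (unsigned long)mem, .len = page_size },
        .mode  = UFFDIO_REGISTER_MODE_WP,
    };
    ioctl(uffd, UFFDIO_REGISTER, &reg);

    /* Write-protect the range.  Unlike mprotect(), write faults from
     * both userspace and kernel space (syscalls, vhost) are reported
     * as messages readable from 'uffd'. */
    struct uffdio_writeprotect wp = {
        .range = { .start = (unsigned long)mem, .len = page_size },
        .mode  = UFFDIO_WRITEPROTECT_MODE_WP,
    };
    ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);

    /* A monitor thread would read a struct uffd_msg from 'uffd', copy
     * the faulting page into the snapshot stream, then clear the
     * protection (same ioctl with .mode = 0) to let the writer go. */
    return 0;
}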

end of thread

Thread overview: 34+ messages
2018-07-18 15:41 [Qemu-devel] [PATCH v1 00/17] Background snapshots Denis Plotnikov
2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 01/17] migration: add background snapshot capability Denis Plotnikov
2018-07-20  5:14   ` Peter Xu
2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 02/17] bitops: add some atomic versions of bitmap operations Denis Plotnikov
2018-07-20  5:09   ` Peter Xu
2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 03/17] threads: add infrastructure to process sigsegv Denis Plotnikov
2018-07-20  4:58   ` Peter Xu
2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 04/17] background snapshot: make a dedicated type for ram block list Denis Plotnikov
2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 05/17] ram: extend the data structures for background snapshotting Denis Plotnikov
2018-07-20  7:59   ` Peter Xu
2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 06/17] background snapshot: add helpers to manage a copy of ram block list Denis Plotnikov
2018-07-20  7:57   ` Peter Xu
2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 07/17] background snapshot: introduce page buffer Denis Plotnikov
2018-07-20  9:22   ` Peter Xu
2018-07-20 10:34   ` Paolo Bonzini
2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 08/17] migration: add helpers to change VM memory protection rights Denis Plotnikov
2018-07-20 11:28   ` Dr. David Alan Gilbert
2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 09/17] background snapshot: extend RAM request for holding a page copy pointer Denis Plotnikov
2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 10/17] background snapshots: adapt the page queueing code for using page copies Denis Plotnikov
2018-07-20  8:39   ` Peter Xu
2018-07-20 11:46   ` Peter Xu
2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 11/17] background snapshot: add a memory page copying function Denis Plotnikov
2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 12/17] ram: add background snapshot support in ram page saving part of migration Denis Plotnikov
2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 13/17] background snapshot: add write-protected page access handler function Denis Plotnikov
2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 14/17] kvm: add failed memeory access exit reason Denis Plotnikov
2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 15/17] kvm: add vCPU failed memeory access processing Denis Plotnikov
2018-07-20  9:44   ` Peter Xu
2018-07-18 15:41 ` [Qemu-devel] [PATCH v1 16/17] migration: move the device state saving logic to a separate function Denis Plotnikov
2018-07-18 15:42 ` [Qemu-devel] [PATCH v1 17/17] background snapshot: enable background snapshot Denis Plotnikov
2018-07-20  9:27 ` [Qemu-devel] [PATCH v1 00/17] Background snapshots Peter Xu
2018-09-04 13:00   ` Denis Plotnikov
2018-09-05  3:32     ` Peter Xu
2018-09-05  9:18       ` Denis Plotnikov
2018-09-05  9:30         ` Peter Xu
