* [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery
@ 2017-08-30  8:31 Peter Xu
  2017-08-30  8:31 ` [Qemu-devel] [RFC v2 01/33] bitmap: remove BITOP_WORD() Peter Xu
                   ` (32 more replies)
  0 siblings, 33 replies; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

v2 note (the coarse-grained changelog):

- I appended the migrate-incoming re-use series onto this one, since
  that series depends on this one, and it is really part of the
  recovery work

- I have not yet added the per-monitor-thread patches (the ones that
  set up "need-bql"="false") into this series, since the solution for
  the monitor hang issue is still under discussion in the other
  thread.  I'll add them once that settles.

- Quite a lot of other changes and additions in response to the v1
  review comments.  I think I have addressed all of them, but God
  knows better.

Feel free to skip this long changelog (I'm afraid it's too long to be
meaningful).

v2:
- rebased on Alexey's received-bitmap v9
- add Dave's r-bs for patches: 2/5/6/8/9/13/14/15/16/20/21
- patch 1: use target page size to calc bitmap [Dave]
- patch 3: move trace_*() after EINTR check [Dave]
- patch 4: dropped since I can use bitmap_complement() [Dave]
- patch 7: check file error right after data is read in both
  qemu_loadvm_section_start_full() and qemu_loadvm_section_part_end(),
  meanwhile also check in check_section_footer() [Dave]
- patch 8/9: fix error_report/commit message in both patches [Dave]
- patch 10: dropped (new parameter "x-postcopy-fast")
- patch 11: split the "postcopy-paused" patch into two, one to
  introduce the new state, the other to implement the logic. Also,
  print something when paused [Dave]
- patch 17: removed do_resume label, introduced migration_prepare()
  [Dave]
- patch 18: removed do_pause label using a new loop [Dave]
- patch 20: removed incorrect comment [Dave]
- patch 21: use 256B buffer in qemu_savevm_send_recv_bitmap(), add
  trace in loadvm_handle_recv_bitmap() [Dave]
- patch 22: fix MIG_RP_MSG_RECV_BITMAP for (1) endianness (2) 32/64bit
  machines. More info in the commit message update.
- patch 23: add one check on migration state [Dave]
- patch 24: use macro instead of magic 1 [Dave]
- patch 26: use more trace_*() instead of one, and use one sem to
  replace mutex+cond. [Dave]
- move sem init/destroy into migration_instance_init() and
  migration_instance_finalize (new function after rebase).
- patch 29: squashed this patch most into:
  "migration: implement "postcopy-pause" src logic" [Dave]
- split the two fix patches out of the series
- fixed two places where I misused "wake/woke/woken". [Dave]
- add new patch "bitmap: provide to_le/from_le helpers" to solve the
  bitmap endianness issue [Dave]
- appended migrate_incoming series to this series, since that one is
  depending on the paused state.  Using explicit g_source_remove() for
  listening ports [Dan]

FUTURE TODO LIST
- support manually switching the source into PAUSED state
- support migrate_cancel during PAUSED/RECOVER states
- when anything goes wrong during PAUSED/RECOVER, switch back to
  PAUSED state on both sides

As we all know, postcopy migration risks losing the VM if the network
breaks during the migration. This series tries to solve the problem
by allowing the migration to pause at the failure point, and to
recover once the link is reconnected.

There was existing work on this issue from Md Haris Iqbal:

https://lists.nongnu.org/archive/html/qemu-devel/2016-08/msg03468.html

This series is a complete rework of the issue, based on Alexey
Perevalov's received-bitmap v8 series:

https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg06401.html

Two new states are added to support the migration (used on both
sides):

  MIGRATION_STATUS_POSTCOPY_PAUSED
  MIGRATION_STATUS_POSTCOPY_RECOVER

The MIGRATION_STATUS_POSTCOPY_PAUSED state is entered when a network
failure is detected. It is a phase we may stay in for a long time:
once the failure is detected, we remain there until a recovery is
triggered.  In this state, all the migration threads (on the source:
the send thread and return-path thread; on the destination: the
ram-load thread and page-fault thread) are halted.

The MIGRATION_STATUS_POSTCOPY_RECOVER state is transient. When a
recovery is triggered, both the source and destination VMs enter this
state and do whatever is needed to prepare the recovery (currently the
most important step is synchronizing the dirty bitmap; please see the
commit messages for more information). After the preparation is done,
the source does a final handshake with the destination, and both sides
switch back to MIGRATION_STATUS_POSTCOPY_ACTIVE.

New commands/messages are defined as well to support this:

MIG_CMD_RECV_BITMAP & MIG_RP_MSG_RECV_BITMAP are introduced for
delivering the received bitmaps.

MIG_CMD_RESUME & MIG_RP_MSG_RESUME_ACK are introduced for the final
handshake of postcopy recovery.

Here is some more detail on how the whole failure/recovery routine
happens:

- start migration
- ... (switch from precopy to postcopy)
- both sides are in "postcopy-active" state
- ... (failure happened, e.g., network unplugged)
- both sides switch to "postcopy-paused" state
  - all the migration threads are stopped on both sides
- ... (both VMs hang)
- ... (user triggers recovery using "migrate -r -d tcp:HOST:PORT" on
  the source side; "-r" means "recover")
- both sides switch to "postcopy-recover" state
  - on source: send-thread and return-path-thread are woken up
  - on dest: ram-load-thread is woken up, fault-thread stays paused
- source calls the new SaveVMHandlers hook resume_prepare() (currently
  only ram provides the hook):
  - ram_resume_prepare(): for each ramblock, fetch the received bitmap:
    - src sends MIG_CMD_RECV_BITMAP to dst
    - dst replies with MIG_RP_MSG_RECV_BITMAP, carrying the bitmap data
      - src uses the received bitmap to rebuild its dirty bitmap
- source does the final handshake with the destination
  - src sends MIG_CMD_RESUME to dst, telling it "src is ready"
    - when dst receives the command, the fault thread is woken up and
      dst switches back to "postcopy-active"
  - dst sends MIG_RP_MSG_RESUME_ACK to src, telling it "dst is ready"
    - when src receives the ack, it switches back to "postcopy-active"
- postcopy migration continues

Testing:

As I said, it's still an extremely simple test. I used socat to create
a socket bridge:

  socat tcp-listen:6666 tcp-connect:localhost:5555 &

Then I do the migration via the bridge. I emulated the network
failure by killing the socat process (bringing the bridge down), then
tried to recover the migration using the other channel (the default
destination channel). It looks like this:

        port:6666    +------------------+
        +----------> | socat bridge [1] |-------+
        |            +------------------+       |
        |         (Original channel)            |
        |                                       | port: 5555
     +---------+  (Recovery channel)            +--->+---------+
     | src VM  |------------------------------------>| dst VM  |
     +---------+                                     +---------+

Known issues/notes:

- currently the destination listening port cannot change: for
  simplicity, the recovery must use the same port on the destination
  (on the source, we can specify a new URL)

- the patch "migration: let dst listen on port always" is still
  hacky; for now it just keeps the incoming accept open forever

- some migration statistics might still be inaccurate, e.g., total
  migration time (but I don't think that matters much for now)

- the patches are very lightly tested.

- Dave reported a problem that may hang the destination main loop
  thread (one vcpu thread holding the BQL) and the rest. I haven't
  encountered it yet, but that does not mean this series is safe from
  it.

- other potential issues that I may have forgotten or not noticed...

Anyway, the work is still at a preliminary stage. Any suggestions and
comments are greatly welcome.  Thanks.

Peter Xu (33):
  bitmap: remove BITOP_WORD()
  bitmap: introduce bitmap_count_one()
  bitmap: provide to_le/from_le helpers
  migration: dump str in migrate_set_state trace
  migration: better error handling with QEMUFile
  migration: reuse mis->userfault_quit_fd
  migration: provide postcopy_fault_thread_notify()
  migration: new postcopy-pause state
  migration: implement "postcopy-pause" src logic
  migration: allow dst vm pause on postcopy
  migration: allow src return path to pause
  migration: allow send_rq to fail
  migration: allow fault thread to pause
  qmp: hmp: add migrate "resume" option
  migration: pass MigrationState to migrate_init()
  migration: rebuild channel on source
  migration: new state "postcopy-recover"
  migration: wakeup dst ram-load-thread for recover
  migration: new cmd MIG_CMD_RECV_BITMAP
  migration: new message MIG_RP_MSG_RECV_BITMAP
  migration: new cmd MIG_CMD_POSTCOPY_RESUME
  migration: new message MIG_RP_MSG_RESUME_ACK
  migration: introduce SaveVMHandlers.resume_prepare
  migration: synchronize dirty bitmap for resume
  migration: setup ramstate for resume
  migration: final handshake for the resume
  migration: free SocketAddress where allocated
  migration: return incoming task tag for sockets
  migration: return incoming task tag for exec
  migration: return incoming task tag for fd
  migration: store listen task tag
  migration: allow migrate_incoming for paused VM
  migration: init dst in migration_object_init too

 hmp-commands.hx              |   7 +-
 hmp.c                        |   4 +-
 include/migration/register.h |   2 +
 include/qemu/bitmap.h        |  17 ++
 migration/exec.c             |  20 +-
 migration/exec.h             |   2 +-
 migration/fd.c               |  20 +-
 migration/fd.h               |   2 +-
 migration/migration.c        | 578 ++++++++++++++++++++++++++++++++++++++-----
 migration/migration.h        |  26 +-
 migration/postcopy-ram.c     | 107 ++++++--
 migration/postcopy-ram.h     |   2 +
 migration/ram.c              | 265 +++++++++++++++++++-
 migration/ram.h              |   3 +
 migration/savevm.c           | 229 ++++++++++++++++-
 migration/savevm.h           |   3 +
 migration/socket.c           |  42 ++--
 migration/socket.h           |   4 +-
 migration/trace-events       |  21 +-
 qapi-schema.json             |  12 +-
 util/bitmap.c                |  47 ++++
 util/bitops.c                |   6 +-
 22 files changed, 1266 insertions(+), 153 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 86+ messages in thread

* [Qemu-devel] [RFC v2 01/33] bitmap: remove BITOP_WORD()
  2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
@ 2017-08-30  8:31 ` Peter Xu
  2017-09-20  8:41   ` Juan Quintela
  2017-08-30  8:31 ` [Qemu-devel] [RFC v2 02/33] bitmap: introduce bitmap_count_one() Peter Xu
                   ` (31 subsequent siblings)
  32 siblings, 1 reply; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

We already have BIT_WORD(), which is identical. Use it instead.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 util/bitops.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/util/bitops.c b/util/bitops.c
index b0c35dd..f236401 100644
--- a/util/bitops.c
+++ b/util/bitops.c
@@ -14,15 +14,13 @@
 #include "qemu/osdep.h"
 #include "qemu/bitops.h"
 
-#define BITOP_WORD(nr)		((nr) / BITS_PER_LONG)
-
 /*
  * Find the next set bit in a memory region.
  */
 unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
 			    unsigned long offset)
 {
-    const unsigned long *p = addr + BITOP_WORD(offset);
+    const unsigned long *p = addr + BIT_WORD(offset);
     unsigned long result = offset & ~(BITS_PER_LONG-1);
     unsigned long tmp;
 
@@ -87,7 +85,7 @@ found_middle:
 unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
 				 unsigned long offset)
 {
-    const unsigned long *p = addr + BITOP_WORD(offset);
+    const unsigned long *p = addr + BIT_WORD(offset);
     unsigned long result = offset & ~(BITS_PER_LONG-1);
     unsigned long tmp;
 
-- 
2.7.4


* [Qemu-devel] [RFC v2 02/33] bitmap: introduce bitmap_count_one()
  2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
  2017-08-30  8:31 ` [Qemu-devel] [RFC v2 01/33] bitmap: remove BITOP_WORD() Peter Xu
@ 2017-08-30  8:31 ` Peter Xu
  2017-09-20  8:25   ` Juan Quintela
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 03/33] bitmap: provide to_le/from_le helpers Peter Xu
                   ` (30 subsequent siblings)
  32 siblings, 1 reply; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Count how many bits are set in the bitmap.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/qemu/bitmap.h | 10 ++++++++++
 util/bitmap.c         | 15 +++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index c318da1..a13bd28 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -82,6 +82,7 @@ int slow_bitmap_andnot(unsigned long *dst, const unsigned long *bitmap1,
                        const unsigned long *bitmap2, long bits);
 int slow_bitmap_intersects(const unsigned long *bitmap1,
                            const unsigned long *bitmap2, long bits);
+long slow_bitmap_count_one(const unsigned long *bitmap, long nbits);
 
 static inline unsigned long *bitmap_try_new(long nbits)
 {
@@ -216,6 +217,15 @@ static inline int bitmap_intersects(const unsigned long *src1,
     }
 }
 
+static inline long bitmap_count_one(const unsigned long *bitmap, long nbits)
+{
+    if (small_nbits(nbits)) {
+        return (ctpopl(*bitmap & BITMAP_LAST_WORD_MASK(nbits)));
+    } else {
+        return slow_bitmap_count_one(bitmap, nbits);
+    }
+}
+
 void bitmap_set(unsigned long *map, long i, long len);
 void bitmap_set_atomic(unsigned long *map, long i, long len);
 void bitmap_clear(unsigned long *map, long start, long nr);
diff --git a/util/bitmap.c b/util/bitmap.c
index efced9a..3446d72 100644
--- a/util/bitmap.c
+++ b/util/bitmap.c
@@ -355,3 +355,18 @@ int slow_bitmap_intersects(const unsigned long *bitmap1,
     }
     return 0;
 }
+
+long slow_bitmap_count_one(const unsigned long *bitmap, long nbits)
+{
+    long k, lim = nbits/BITS_PER_LONG, result = 0;
+
+    for (k = 0; k < lim; k++) {
+        result += ctpopl(bitmap[k]);
+    }
+
+    if (nbits % BITS_PER_LONG) {
+        result += ctpopl(bitmap[k] & BITMAP_LAST_WORD_MASK(nbits));
+    }
+
+    return result;
+}
-- 
2.7.4


* [Qemu-devel] [RFC v2 03/33] bitmap: provide to_le/from_le helpers
  2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
  2017-08-30  8:31 ` [Qemu-devel] [RFC v2 01/33] bitmap: remove BITOP_WORD() Peter Xu
  2017-08-30  8:31 ` [Qemu-devel] [RFC v2 02/33] bitmap: introduce bitmap_count_one() Peter Xu
@ 2017-08-30  8:32 ` Peter Xu
  2017-09-21 17:35   ` Dr. David Alan Gilbert
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 04/33] migration: dump str in migrate_set_state trace Peter Xu
                   ` (29 subsequent siblings)
  32 siblings, 1 reply; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Provide helpers to convert bitmaps to/from little endian format. They
can be used when we want to send a bitmap over the network to another
host.

One thing to mention: these helpers only solve the problem of
endianness; they do not solve the problem of different word sizes
across machines (bitmaps managing the same number of bits may be
allocated with different sizes). So for now the callers need to take
care of the size alignment issue.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/qemu/bitmap.h |  7 +++++++
 util/bitmap.c         | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index a13bd28..4481975 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -39,6 +39,8 @@
  * bitmap_clear(dst, pos, nbits)		Clear specified bit area
  * bitmap_test_and_clear_atomic(dst, pos, nbits)    Test and clear area
  * bitmap_find_next_zero_area(buf, len, pos, n, mask)	Find bit free area
+ * bitmap_to_le(dst, src, nbits)      Convert bitmap to little endian
+ * bitmap_from_le(dst, src, nbits)    Convert bitmap from little endian
  */
 
 /*
@@ -247,4 +249,9 @@ static inline unsigned long *bitmap_zero_extend(unsigned long *old,
     return new;
 }
 
+void bitmap_to_le(unsigned long *dst, const unsigned long *src,
+                  long nbits);
+void bitmap_from_le(unsigned long *dst, const unsigned long *src,
+                    long nbits);
+
 #endif /* BITMAP_H */
diff --git a/util/bitmap.c b/util/bitmap.c
index 3446d72..f7aad58 100644
--- a/util/bitmap.c
+++ b/util/bitmap.c
@@ -370,3 +370,35 @@ long slow_bitmap_count_one(const unsigned long *bitmap, long nbits)
 
     return result;
 }
+
+static void bitmap_to_from_le(unsigned long *dst,
+                              const unsigned long *src, long nbits)
+{
+    long len = BITS_TO_LONGS(nbits);
+
+#ifdef HOST_WORDS_BIGENDIAN
+    long index;
+
+    for (index = 0; index < len; index++) {
+# if __WORD_SIZE == 64
+        dst[index] = bswap64(src[index]);
+# else
+        dst[index] = bswap32(src[index]);
+# endif
+    }
+#else
+    memcpy(dst, src, len * sizeof(unsigned long));
+#endif
+}
+
+void bitmap_from_le(unsigned long *dst, const unsigned long *src,
+                    long nbits)
+{
+    bitmap_to_from_le(dst, src, nbits);
+}
+
+void bitmap_to_le(unsigned long *dst, const unsigned long *src,
+                  long nbits)
+{
+    bitmap_to_from_le(dst, src, nbits);
+}
-- 
2.7.4


* [Qemu-devel] [RFC v2 04/33] migration: dump str in migrate_set_state trace
  2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
                   ` (2 preceding siblings ...)
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 03/33] bitmap: provide to_le/from_le helpers Peter Xu
@ 2017-08-30  8:32 ` Peter Xu
  2017-09-06 14:36   ` Dr. David Alan Gilbert
  2017-09-20  8:44   ` Juan Quintela
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 05/33] migration: better error handling with QEMUFile Peter Xu
                   ` (28 subsequent siblings)
  32 siblings, 2 replies; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Strings are more readable for debugging.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c  | 3 ++-
 migration/trace-events | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index eb7d767..c818412 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -914,8 +914,9 @@ void qmp_migrate_start_postcopy(Error **errp)
 
 void migrate_set_state(int *state, int old_state, int new_state)
 {
+    assert(new_state < MIGRATION_STATUS__MAX);
     if (atomic_cmpxchg(state, old_state, new_state) == old_state) {
-        trace_migrate_set_state(new_state);
+        trace_migrate_set_state(MigrationStatus_lookup[new_state]);
         migrate_generate_event(new_state);
     }
 }
diff --git a/migration/trace-events b/migration/trace-events
index 7a3b514..d2910a6 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -80,7 +80,7 @@ ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: 0
 # migration/migration.c
 await_return_path_close_on_source_close(void) ""
 await_return_path_close_on_source_joining(void) ""
-migrate_set_state(int new_state) "new state %d"
+migrate_set_state(const char *new_state) "new state %s"
 migrate_fd_cleanup(void) ""
 migrate_fd_error(const char *error_desc) "error=%s"
 migrate_fd_cancel(void) ""
-- 
2.7.4


* [Qemu-devel] [RFC v2 05/33] migration: better error handling with QEMUFile
  2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
                   ` (3 preceding siblings ...)
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 04/33] migration: dump str in migrate_set_state trace Peter Xu
@ 2017-08-30  8:32 ` Peter Xu
  2017-09-21 17:51   ` Dr. David Alan Gilbert
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 06/33] migration: reuse mis->userfault_quit_fd Peter Xu
                   ` (27 subsequent siblings)
  32 siblings, 1 reply; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

If postcopy goes down for some reason, we can always see this on the
destination:

  qemu-system-x86_64: RP: Received invalid message 0x0000 length 0x0000

However, in most cases that's not the real issue. The problem is that
qemu_get_be16() has no way to indicate whether the returned data is
valid, and we _always_ assume it is. That's possibly not wise.

The best approach would be to refactor the QEMUFile interface so the
APIs can return errors. However, that needs quite a bit of work and
testing. For now, let's explicitly check the stream's validity before
using data returned by qemu_get_*(), in all the places that matter.

This patch tries to fix most of the cases I can see. Only with this
can we make sure we are processing valid data, and that we capture
channel-down events correctly.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c |  5 +++++
 migration/ram.c       | 22 ++++++++++++++++++----
 migration/savevm.c    | 41 +++++++++++++++++++++++++++++++++++++++--
 3 files changed, 62 insertions(+), 6 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index c818412..92bf9b8 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1543,6 +1543,11 @@ static void *source_return_path_thread(void *opaque)
         header_type = qemu_get_be16(rp);
         header_len = qemu_get_be16(rp);
 
+        if (qemu_file_get_error(rp)) {
+            mark_source_rp_bad(ms);
+            goto out;
+        }
+
         if (header_type >= MIG_RP_MSG_MAX ||
             header_type == MIG_RP_MSG_INVALID) {
             error_report("RP: Received invalid message 0x%04x length 0x%04x",
diff --git a/migration/ram.c b/migration/ram.c
index affb20c..7e20097 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2417,7 +2417,7 @@ static int ram_load_postcopy(QEMUFile *f)
     void *last_host = NULL;
     bool all_zero = false;
 
-    while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
+    while (!(flags & RAM_SAVE_FLAG_EOS)) {
         ram_addr_t addr;
         void *host = NULL;
         void *page_buffer = NULL;
@@ -2426,6 +2426,16 @@ static int ram_load_postcopy(QEMUFile *f)
         uint8_t ch;
 
         addr = qemu_get_be64(f);
+
+        /*
+         * If qemu file error, we should stop here, and then "addr"
+         * may be invalid
+         */
+        ret = qemu_file_get_error(f);
+        if (ret) {
+            break;
+        }
+
         flags = addr & ~TARGET_PAGE_MASK;
         addr &= TARGET_PAGE_MASK;
 
@@ -2506,6 +2516,13 @@ static int ram_load_postcopy(QEMUFile *f)
             error_report("Unknown combination of migration flags: %#x"
                          " (postcopy mode)", flags);
             ret = -EINVAL;
+            break;
+        }
+
+        /* Detect for any possible file errors */
+        if (qemu_file_get_error(f)) {
+            ret = qemu_file_get_error(f);
+            break;
         }
 
         if (place_needed) {
@@ -2520,9 +2537,6 @@ static int ram_load_postcopy(QEMUFile *f)
                                           place_source, block);
             }
         }
-        if (!ret) {
-            ret = qemu_file_get_error(f);
-        }
     }
 
     return ret;
diff --git a/migration/savevm.c b/migration/savevm.c
index fdd15fa..7172f14 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1720,6 +1720,11 @@ static int loadvm_process_command(QEMUFile *f)
     cmd = qemu_get_be16(f);
     len = qemu_get_be16(f);
 
+    /* Check validity before continue processing of cmds */
+    if (qemu_file_get_error(f)) {
+        return qemu_file_get_error(f);
+    }
+
     trace_loadvm_process_command(cmd, len);
     if (cmd >= MIG_CMD_MAX || cmd == MIG_CMD_INVALID) {
         error_report("MIG_CMD 0x%x unknown (len 0x%x)", cmd, len);
@@ -1785,6 +1790,7 @@ static int loadvm_process_command(QEMUFile *f)
  */
 static bool check_section_footer(QEMUFile *f, SaveStateEntry *se)
 {
+    int ret;
     uint8_t read_mark;
     uint32_t read_section_id;
 
@@ -1795,6 +1801,13 @@ static bool check_section_footer(QEMUFile *f, SaveStateEntry *se)
 
     read_mark = qemu_get_byte(f);
 
+    ret = qemu_file_get_error(f);
+    if (ret) {
+        error_report("%s: Read section footer failed: %d",
+                     __func__, ret);
+        return false;
+    }
+
     if (read_mark != QEMU_VM_SECTION_FOOTER) {
         error_report("Missing section footer for %s", se->idstr);
         return false;
@@ -1830,6 +1843,13 @@ qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
     instance_id = qemu_get_be32(f);
     version_id = qemu_get_be32(f);
 
+    ret = qemu_file_get_error(f);
+    if (ret) {
+        error_report("%s: Failed to read instance/version ID: %d",
+                     __func__, ret);
+        return ret;
+    }
+
     trace_qemu_loadvm_state_section_startfull(section_id, idstr,
             instance_id, version_id);
     /* Find savevm section */
@@ -1877,6 +1897,13 @@ qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis)
 
     section_id = qemu_get_be32(f);
 
+    ret = qemu_file_get_error(f);
+    if (ret) {
+        error_report("%s: Failed to read section ID: %d",
+                     __func__, ret);
+        return ret;
+    }
+
     trace_qemu_loadvm_state_section_partend(section_id);
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
         if (se->load_section_id == section_id) {
@@ -1944,8 +1971,14 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
     uint8_t section_type;
     int ret = 0;
 
-    while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
-        ret = 0;
+    while (true) {
+        section_type = qemu_get_byte(f);
+
+        if (qemu_file_get_error(f)) {
+            ret = qemu_file_get_error(f);
+            break;
+        }
+
         trace_qemu_loadvm_state_section(section_type);
         switch (section_type) {
         case QEMU_VM_SECTION_START:
@@ -1969,6 +2002,10 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
                 goto out;
             }
             break;
+        case QEMU_VM_EOF:
+            /* This is the end of migration */
+            goto out;
+            break;
         default:
             error_report("Unknown savevm section type %d", section_type);
             ret = -EINVAL;
-- 
2.7.4


* [Qemu-devel] [RFC v2 06/33] migration: reuse mis->userfault_quit_fd
  2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
                   ` (4 preceding siblings ...)
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 05/33] migration: better error handling with QEMUFile Peter Xu
@ 2017-08-30  8:32 ` Peter Xu
  2017-09-20  8:47   ` Juan Quintela
  2017-09-20  9:06   ` Juan Quintela
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 07/33] migration: provide postcopy_fault_thread_notify() Peter Xu
                   ` (26 subsequent siblings)
  32 siblings, 2 replies; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Previously this was only used to quit the page fault thread. Make it
more useful: now it can be used to notify a "wake" of the page fault
thread (for any reason), and it only means "quit" when
fault_thread_quit is set.

Since its purpose has changed, rename it to userfault_event_fd.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.h    |  6 ++++--
 migration/postcopy-ram.c | 26 +++++++++++++++++---------
 2 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 148c9fa..70e3094 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -35,6 +35,8 @@ struct MigrationIncomingState {
     bool           have_fault_thread;
     QemuThread     fault_thread;
     QemuSemaphore  fault_thread_sem;
+    /* Set this when we want the fault thread to quit */
+    bool           fault_thread_quit;
 
     bool           have_listen_thread;
     QemuThread     listen_thread;
@@ -42,8 +44,8 @@ struct MigrationIncomingState {
 
     /* For the kernel to send us notifications */
     int       userfault_fd;
-    /* To tell the fault_thread to quit */
-    int       userfault_quit_fd;
+    /* To notify the fault_thread to wake, e.g., when need to quit */
+    int       userfault_event_fd;
     QEMUFile *to_src_file;
     QemuMutex rp_mutex;    /* We send replies from multiple threads */
     void     *postcopy_tmp_page;
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 7a414eb..0138a49 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -305,17 +305,18 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
          * currently be at 0, we're going to increment it to 1
          */
         tmp64 = 1;
-        if (write(mis->userfault_quit_fd, &tmp64, 8) == 8) {
+        atomic_set(&mis->fault_thread_quit, 1);
+        if (write(mis->userfault_event_fd, &tmp64, 8) == 8) {
             trace_postcopy_ram_incoming_cleanup_join();
             qemu_thread_join(&mis->fault_thread);
         } else {
             /* Not much we can do here, but may as well report it */
-            error_report("%s: incrementing userfault_quit_fd: %s", __func__,
+            error_report("%s: incrementing userfault_event_fd: %s", __func__,
                          strerror(errno));
         }
         trace_postcopy_ram_incoming_cleanup_closeuf();
         close(mis->userfault_fd);
-        close(mis->userfault_quit_fd);
+        close(mis->userfault_event_fd);
         mis->have_fault_thread = false;
     }
 
@@ -438,7 +439,7 @@ static void *postcopy_ram_fault_thread(void *opaque)
         pfd[0].fd = mis->userfault_fd;
         pfd[0].events = POLLIN;
         pfd[0].revents = 0;
-        pfd[1].fd = mis->userfault_quit_fd;
+        pfd[1].fd = mis->userfault_event_fd;
         pfd[1].events = POLLIN; /* Waiting for eventfd to go positive */
         pfd[1].revents = 0;
 
@@ -448,8 +449,15 @@ static void *postcopy_ram_fault_thread(void *opaque)
         }
 
         if (pfd[1].revents) {
-            trace_postcopy_ram_fault_thread_quit();
-            break;
+            uint64_t tmp64 = 0;
+
+            /* Consume the signal */
+            read(mis->userfault_event_fd, &tmp64, 8);
+
+            if (atomic_read(&mis->fault_thread_quit)) {
+                trace_postcopy_ram_fault_thread_quit();
+                break;
+            }
         }
 
         ret = read(mis->userfault_fd, &msg, sizeof(msg));
@@ -528,9 +536,9 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
     }
 
     /* Now an eventfd we use to tell the fault-thread to quit */
-    mis->userfault_quit_fd = eventfd(0, EFD_CLOEXEC);
-    if (mis->userfault_quit_fd == -1) {
-        error_report("%s: Opening userfault_quit_fd: %s", __func__,
+    mis->userfault_event_fd = eventfd(0, EFD_CLOEXEC);
+    if (mis->userfault_event_fd == -1) {
+        error_report("%s: Opening userfault_event_fd: %s", __func__,
                      strerror(errno));
         close(mis->userfault_fd);
         return -1;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Qemu-devel] [RFC v2 07/33] migration: provide postcopy_fault_thread_notify()
  2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
                   ` (5 preceding siblings ...)
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 06/33] migration: reuse mis->userfault_quit_fd Peter Xu
@ 2017-08-30  8:32 ` Peter Xu
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 08/33] migration: new postcopy-pause state Peter Xu
                   ` (25 subsequent siblings)
  32 siblings, 0 replies; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

A general helper to notify the fault thread.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/postcopy-ram.c | 35 ++++++++++++++++++++---------------
 migration/postcopy-ram.h |  2 ++
 2 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 0138a49..c28e340 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -287,6 +287,21 @@ int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
     return 0;
 }
 
+void postcopy_fault_thread_notify(MigrationIncomingState *mis)
+{
+    uint64_t tmp64 = 1;
+
+    /*
+     * Wakeup the fault_thread.  It's an eventfd that should currently
+     * be at 0, we're going to increment it to 1
+     */
+    if (write(mis->userfault_event_fd, &tmp64, 8) != 8) {
+        /* Not much we can do here, but may as well report it */
+        error_report("%s: incrementing failed: %s", __func__,
+                     strerror(errno));
+    }
+}
+
 /*
  * At the end of a migration where postcopy_ram_incoming_init was called.
  */
@@ -295,25 +310,15 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
     trace_postcopy_ram_incoming_cleanup_entry();
 
     if (mis->have_fault_thread) {
-        uint64_t tmp64;
-
         if (qemu_ram_foreach_block(cleanup_range, mis)) {
             return -1;
         }
-        /*
-         * Tell the fault_thread to exit, it's an eventfd that should
-         * currently be at 0, we're going to increment it to 1
-         */
-        tmp64 = 1;
+        /* Let the fault thread quit */
         atomic_set(&mis->fault_thread_quit, 1);
-        if (write(mis->userfault_event_fd, &tmp64, 8) == 8) {
-            trace_postcopy_ram_incoming_cleanup_join();
-            qemu_thread_join(&mis->fault_thread);
-        } else {
-            /* Not much we can do here, but may as well report it */
-            error_report("%s: incrementing userfault_event_fd: %s", __func__,
-                         strerror(errno));
-        }
+        postcopy_fault_thread_notify(mis);
+        trace_postcopy_ram_incoming_cleanup_join();
+        qemu_thread_join(&mis->fault_thread);
+
         trace_postcopy_ram_incoming_cleanup_closeuf();
         close(mis->userfault_fd);
         close(mis->userfault_event_fd);
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index 78a3591..4a7644d 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -114,4 +114,6 @@ PostcopyState postcopy_state_get(void);
 /* Set the state and return the old state */
 PostcopyState postcopy_state_set(PostcopyState new_state);
 
+void postcopy_fault_thread_notify(MigrationIncomingState *mis);
+
 #endif
-- 
2.7.4


* [Qemu-devel] [RFC v2 08/33] migration: new postcopy-pause state
  2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
                   ` (6 preceding siblings ...)
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 07/33] migration: provide postcopy_fault_thread_notify() Peter Xu
@ 2017-08-30  8:32 ` Peter Xu
  2017-09-21 17:57   ` Dr. David Alan Gilbert
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 09/33] migration: implement "postcopy-pause" src logic Peter Xu
                   ` (24 subsequent siblings)
  32 siblings, 1 reply; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Introduce a new state "postcopy-paused", to be used when a postcopy
migration is paused. It is targeted at postcopy network failure
recovery.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 2 ++
 qapi-schema.json      | 5 ++++-
 2 files changed, 6 insertions(+), 1 deletion(-)
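[Not part of the patch — an illustrative QMP exchange, with made-up field values: after the channel breaks during postcopy, a management client polling the source might see the new state reported like this.]

```json
{ "execute": "query-migrate" }
{ "return": { "status": "postcopy-paused",
              "total-time": 52341,
              "setup-time": 12 } }
```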

diff --git a/migration/migration.c b/migration/migration.c
index 92bf9b8..f6130db 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -470,6 +470,7 @@ static bool migration_is_setup_or_active(int state)
     switch (state) {
     case MIGRATION_STATUS_ACTIVE:
     case MIGRATION_STATUS_POSTCOPY_ACTIVE:
+    case MIGRATION_STATUS_POSTCOPY_PAUSED:
     case MIGRATION_STATUS_SETUP:
         return true;
 
@@ -545,6 +546,7 @@ MigrationInfo *qmp_query_migrate(Error **errp)
     case MIGRATION_STATUS_ACTIVE:
     case MIGRATION_STATUS_CANCELLING:
     case MIGRATION_STATUS_POSTCOPY_ACTIVE:
+    case MIGRATION_STATUS_POSTCOPY_PAUSED:
          /* TODO add some postcopy stats */
         info->has_status = true;
         info->has_total_time = true;
diff --git a/qapi-schema.json b/qapi-schema.json
index 802ea53..368b592 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -667,6 +667,8 @@
 #
 # @postcopy-active: like active, but now in postcopy mode. (since 2.5)
 #
+# @postcopy-paused: during postcopy but paused. (since 2.11)
+#
 # @completed: migration is finished.
 #
 # @failed: some error occurred during migration process.
@@ -679,7 +681,8 @@
 ##
 { 'enum': 'MigrationStatus',
   'data': [ 'none', 'setup', 'cancelling', 'cancelled',
-            'active', 'postcopy-active', 'completed', 'failed', 'colo' ] }
+            'active', 'postcopy-active', 'postcopy-paused',
+            'completed', 'failed', 'colo' ] }
 
 ##
 # @MigrationInfo:
-- 
2.7.4


* [Qemu-devel] [RFC v2 09/33] migration: implement "postcopy-pause" src logic
  2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
                   ` (7 preceding siblings ...)
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 08/33] migration: new postcopy-pause state Peter Xu
@ 2017-08-30  8:32 ` Peter Xu
  2017-09-21 19:21   ` Dr. David Alan Gilbert
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 10/33] migration: allow dst vm pause on postcopy Peter Xu
                   ` (23 subsequent siblings)
  32 siblings, 1 reply; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Now, when the network goes down during postcopy, the source side will
not fail the migration. Instead, we convert the status into the new
paused state and wait for a rescue.

If a recovery is detected, migration_thread() will reset its local
variables to prepare for that.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c  | 98 +++++++++++++++++++++++++++++++++++++++++++++++---
 migration/migration.h  |  3 ++
 migration/trace-events |  1 +
 3 files changed, 98 insertions(+), 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index f6130db..8d26ea8 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -993,6 +993,8 @@ static void migrate_fd_cleanup(void *opaque)
 
     notifier_list_notify(&migration_state_notifiers, s);
     block_cleanup_parameters(s);
+
+    qemu_sem_destroy(&s->postcopy_pause_sem);
 }
 
 void migrate_fd_error(MigrationState *s, const Error *error)
@@ -1136,6 +1138,7 @@ MigrationState *migrate_init(void)
     s->migration_thread_running = false;
     error_free(s->error);
     s->error = NULL;
+    qemu_sem_init(&s->postcopy_pause_sem, 0);
 
     migrate_set_state(&s->state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
 
@@ -1938,6 +1941,80 @@ bool migrate_colo_enabled(void)
     return s->enabled_capabilities[MIGRATION_CAPABILITY_X_COLO];
 }
 
+typedef enum MigThrError {
+    /* No error detected */
+    MIG_THR_ERR_NONE = 0,
+    /* Detected error, but resumed successfully */
+    MIG_THR_ERR_RECOVERED = 1,
+    /* Detected fatal error, need to exit */
+    MIG_THR_ERR_FATAL = 2,
+} MigThrError;
+
+/*
+ * We don't return until we are in a safe state to continue current
+ * postcopy migration.  Returns MIG_THR_ERR_RECOVERED if recovered, or
+ * MIG_THR_ERR_FATAL if unrecovery failure happened.
+ */
+static MigThrError postcopy_pause(MigrationState *s)
+{
+    assert(s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
+    migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+                      MIGRATION_STATUS_POSTCOPY_PAUSED);
+
+    /* Current channel is possibly broken. Release it. */
+    assert(s->to_dst_file);
+    qemu_file_shutdown(s->to_dst_file);
+    qemu_fclose(s->to_dst_file);
+    s->to_dst_file = NULL;
+
+    error_report("Detected IO failure for postcopy. "
+                 "Migration paused.");
+
+    /*
+     * We wait until things fixed up. Then someone will setup the
+     * status back for us.
+     */
+    while (s->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
+        qemu_sem_wait(&s->postcopy_pause_sem);
+    }
+
+    trace_postcopy_pause_continued();
+
+    return MIG_THR_ERR_RECOVERED;
+}
+
+static MigThrError migration_detect_error(MigrationState *s)
+{
+    int ret;
+
+    /* Try to detect any file errors */
+    ret = qemu_file_get_error(s->to_dst_file);
+
+    if (!ret) {
+        /* Everything is fine */
+        return MIG_THR_ERR_NONE;
+    }
+
+    if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret == -EIO) {
+        /*
+         * For postcopy, we allow the network to be down for a
+         * while. After that, it can be continued by a
+         * recovery phase.
+         */
+        return postcopy_pause(s);
+    } else {
+        /*
+         * For precopy (or postcopy with error outside IO), we fail
+         * with no time.
+         */
+        migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
+        trace_migration_thread_file_err();
+
+        /* Time to stop the migration, now. */
+        return MIG_THR_ERR_FATAL;
+    }
+}
+
 /*
  * Master migration thread on the source VM.
  * It drives the migration and pumps the data down the outgoing channel.
@@ -1962,6 +2039,7 @@ static void *migration_thread(void *opaque)
     /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
     enum MigrationStatus current_active_state = MIGRATION_STATUS_ACTIVE;
     bool enable_colo = migrate_colo_enabled();
+    MigThrError thr_error;
 
     rcu_register_thread();
 
@@ -2034,12 +2112,24 @@ static void *migration_thread(void *opaque)
             }
         }
 
-        if (qemu_file_get_error(s->to_dst_file)) {
-            migrate_set_state(&s->state, current_active_state,
-                              MIGRATION_STATUS_FAILED);
-            trace_migration_thread_file_err();
+        /*
+         * Try to detect any kind of failures, and see whether we
+         * should stop the migration now.
+         */
+        thr_error = migration_detect_error(s);
+        if (thr_error == MIG_THR_ERR_FATAL) {
+            /* Stop migration */
             break;
+        } else if (thr_error == MIG_THR_ERR_RECOVERED) {
+            /*
+             * Just recovered from a e.g. network failure, reset all
+             * the local variables. This is important to avoid
+             * breaking transferred_bytes and bandwidth calculation
+             */
+            initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+            initial_bytes = 0;
         }
+
         current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
         if (current_time >= initial_time + BUFFER_DELAY) {
             uint64_t transferred_bytes = qemu_ftell(s->to_dst_file) -
diff --git a/migration/migration.h b/migration/migration.h
index 70e3094..0c957c9 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -149,6 +149,9 @@ struct MigrationState
     bool send_configuration;
     /* Whether we send section footer during migration */
     bool send_section_footer;
+
+    /* Needed by postcopy-pause state */
+    QemuSemaphore postcopy_pause_sem;
 };
 
 void migrate_set_state(int *state, int old_state, int new_state);
diff --git a/migration/trace-events b/migration/trace-events
index d2910a6..907564b 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -98,6 +98,7 @@ migration_thread_setup_complete(void) ""
 open_return_path_on_source(void) ""
 open_return_path_on_source_continue(void) ""
 postcopy_start(void) ""
+postcopy_pause_continued(void) ""
 postcopy_start_set_run(void) ""
 source_return_path_thread_bad_end(void) ""
 source_return_path_thread_end(void) ""
-- 
2.7.4


* [Qemu-devel] [RFC v2 10/33] migration: allow dst vm pause on postcopy
  2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
                   ` (8 preceding siblings ...)
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 09/33] migration: implement "postcopy-pause" src logic Peter Xu
@ 2017-08-30  8:32 ` Peter Xu
  2017-09-21 19:29   ` Dr. David Alan Gilbert
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 11/33] migration: allow src return path to pause Peter Xu
                   ` (22 subsequent siblings)
  32 siblings, 1 reply; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

When there is an IO error on the incoming channel (e.g., network down),
instead of bailing out immediately, we allow the dst vm to switch to the
new POSTCOPY_PAUSE state. Currently it is still simple: it waits on the
new semaphore until someone pokes it for another attempt.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c  |  1 +
 migration/migration.h  |  3 +++
 migration/savevm.c     | 60 ++++++++++++++++++++++++++++++++++++++++++++++++--
 migration/trace-events |  2 ++
 4 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 8d26ea8..80de212 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -146,6 +146,7 @@ MigrationIncomingState *migration_incoming_get_current(void)
         memset(&mis_current, 0, sizeof(MigrationIncomingState));
         qemu_mutex_init(&mis_current.rp_mutex);
         qemu_event_init(&mis_current.main_thread_load_event, false);
+        qemu_sem_init(&mis_current.postcopy_pause_sem_dst, 0);
         once = true;
     }
     return &mis_current;
diff --git a/migration/migration.h b/migration/migration.h
index 0c957c9..c423682 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -60,6 +60,9 @@ struct MigrationIncomingState {
     /* The coroutine we should enter (back) after failover */
     Coroutine *migration_incoming_co;
     QemuSemaphore colo_incoming_sem;
+
+    /* notify PAUSED postcopy incoming migrations to try to continue */
+    QemuSemaphore postcopy_pause_sem_dst;
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
diff --git a/migration/savevm.c b/migration/savevm.c
index 7172f14..3777124 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1488,8 +1488,8 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
  */
 static void *postcopy_ram_listen_thread(void *opaque)
 {
-    QEMUFile *f = opaque;
     MigrationIncomingState *mis = migration_incoming_get_current();
+    QEMUFile *f = mis->from_src_file;
     int load_res;
 
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
@@ -1503,6 +1503,14 @@ static void *postcopy_ram_listen_thread(void *opaque)
      */
     qemu_file_set_blocking(f, true);
     load_res = qemu_loadvm_state_main(f, mis);
+
+    /*
+     * This is tricky, but, mis->from_src_file can change after it
+     * returns, when postcopy recovery happened. In the future, we may
+     * want a wrapper for the QEMUFile handle.
+     */
+    f = mis->from_src_file;
+
     /* And non-blocking again so we don't block in any cleanup */
     qemu_file_set_blocking(f, false);
 
@@ -1581,7 +1589,7 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
     /* Start up the listening thread and wait for it to signal ready */
     qemu_sem_init(&mis->listen_thread_sem, 0);
     qemu_thread_create(&mis->listen_thread, "postcopy/listen",
-                       postcopy_ram_listen_thread, mis->from_src_file,
+                       postcopy_ram_listen_thread, NULL,
                        QEMU_THREAD_DETACHED);
     qemu_sem_wait(&mis->listen_thread_sem);
     qemu_sem_destroy(&mis->listen_thread_sem);
@@ -1966,11 +1974,44 @@ void qemu_loadvm_state_cleanup(void)
     }
 }
 
+/* Return true if we should continue the migration, or false. */
+static bool postcopy_pause_incoming(MigrationIncomingState *mis)
+{
+    trace_postcopy_pause_incoming();
+
+    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+                      MIGRATION_STATUS_POSTCOPY_PAUSED);
+
+    assert(mis->from_src_file);
+    qemu_file_shutdown(mis->from_src_file);
+    qemu_fclose(mis->from_src_file);
+    mis->from_src_file = NULL;
+
+    assert(mis->to_src_file);
+    qemu_mutex_lock(&mis->rp_mutex);
+    qemu_file_shutdown(mis->to_src_file);
+    qemu_fclose(mis->to_src_file);
+    mis->to_src_file = NULL;
+    qemu_mutex_unlock(&mis->rp_mutex);
+
+    error_report("Detected IO failure for postcopy. "
+                 "Migration paused.");
+
+    while (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
+        qemu_sem_wait(&mis->postcopy_pause_sem_dst);
+    }
+
+    trace_postcopy_pause_incoming_continued();
+
+    return true;
+}
+
 static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
 {
     uint8_t section_type;
     int ret = 0;
 
+retry:
     while (true) {
         section_type = qemu_get_byte(f);
 
@@ -2016,6 +2057,21 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
 out:
     if (ret < 0) {
         qemu_file_set_error(f, ret);
+
+        /*
+         * Detect whether it is:
+         *
+         * 1. postcopy running
+         * 2. network failure (-EIO)
+         *
+         * If so, we try to wait for a recovery.
+         */
+        if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE &&
+            ret == -EIO && postcopy_pause_incoming(mis)) {
+            /* Reset f to point to the newly created channel */
+            f = mis->from_src_file;
+            goto retry;
+        }
     }
     return ret;
 }
diff --git a/migration/trace-events b/migration/trace-events
index 907564b..7764c6f 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -99,6 +99,8 @@ open_return_path_on_source(void) ""
 open_return_path_on_source_continue(void) ""
 postcopy_start(void) ""
 postcopy_pause_continued(void) ""
+postcopy_pause_incoming(void) ""
+postcopy_pause_incoming_continued(void) ""
 postcopy_start_set_run(void) ""
 source_return_path_thread_bad_end(void) ""
 source_return_path_thread_end(void) ""
-- 
2.7.4


* [Qemu-devel] [RFC v2 11/33] migration: allow src return path to pause
  2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
                   ` (9 preceding siblings ...)
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 10/33] migration: allow dst vm pause on postcopy Peter Xu
@ 2017-08-30  8:32 ` Peter Xu
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 12/33] migration: allow send_rq to fail Peter Xu
                   ` (21 subsequent siblings)
  32 siblings, 0 replies; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Let the return path thread pause on network issues.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c  | 35 +++++++++++++++++++++++++++++++++--
 migration/migration.h  |  1 +
 migration/trace-events |  2 ++
 3 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 80de212..b3cd8be 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -996,6 +996,7 @@ static void migrate_fd_cleanup(void *opaque)
     block_cleanup_parameters(s);
 
     qemu_sem_destroy(&s->postcopy_pause_sem);
+    qemu_sem_destroy(&s->postcopy_pause_rp_sem);
 }
 
 void migrate_fd_error(MigrationState *s, const Error *error)
@@ -1140,6 +1141,7 @@ MigrationState *migrate_init(void)
     error_free(s->error);
     s->error = NULL;
     qemu_sem_init(&s->postcopy_pause_sem, 0);
+    qemu_sem_init(&s->postcopy_pause_rp_sem, 0);
 
     migrate_set_state(&s->state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
 
@@ -1527,6 +1529,18 @@ static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
     }
 }
 
+/* Return true to retry, false to quit */
+static bool postcopy_pause_return_path_thread(MigrationState *s)
+{
+    trace_postcopy_pause_return_path();
+
+    qemu_sem_wait(&s->postcopy_pause_rp_sem);
+
+    trace_postcopy_pause_return_path_continued();
+
+    return true;
+}
+
 /*
  * Handles messages sent on the return path towards the source VM
  *
@@ -1543,6 +1557,8 @@ static void *source_return_path_thread(void *opaque)
     int res;
 
     trace_source_return_path_thread_entry();
+
+retry:
     while (!ms->rp_state.error && !qemu_file_get_error(rp) &&
            migration_is_setup_or_active(ms->state)) {
         trace_source_return_path_thread_loop_top();
@@ -1634,13 +1650,28 @@ static void *source_return_path_thread(void *opaque)
             break;
         }
     }
-    if (qemu_file_get_error(rp)) {
+
+out:
+    res = qemu_file_get_error(rp);
+    if (res) {
+        if (res == -EIO) {
+            /*
+             * Maybe there is something we can do: it looks like a
+             * network down issue, and we pause for a recovery.
+             */
+            if (postcopy_pause_return_path_thread(ms)) {
+                /* Reload rp, reset the rest */
+                rp = ms->rp_state.from_dst_file;
+                ms->rp_state.error = false;
+                goto retry;
+            }
+        }
+
         trace_source_return_path_thread_bad_end();
         mark_source_rp_bad(ms);
     }
 
     trace_source_return_path_thread_end();
-out:
     ms->rp_state.from_dst_file = NULL;
     qemu_fclose(rp);
     return NULL;
diff --git a/migration/migration.h b/migration/migration.h
index c423682..323d88d 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -155,6 +155,7 @@ struct MigrationState
 
     /* Needed by postcopy-pause state */
     QemuSemaphore postcopy_pause_sem;
+    QemuSemaphore postcopy_pause_rp_sem;
 };
 
 void migrate_set_state(int *state, int old_state, int new_state);
diff --git a/migration/trace-events b/migration/trace-events
index 7764c6f..1a83f60 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -98,6 +98,8 @@ migration_thread_setup_complete(void) ""
 open_return_path_on_source(void) ""
 open_return_path_on_source_continue(void) ""
 postcopy_start(void) ""
+postcopy_pause_return_path(void) ""
+postcopy_pause_return_path_continued(void) ""
 postcopy_pause_continued(void) ""
 postcopy_pause_incoming(void) ""
 postcopy_pause_incoming_continued(void) ""
-- 
2.7.4


* [Qemu-devel] [RFC v2 12/33] migration: allow send_rq to fail
  2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
                   ` (10 preceding siblings ...)
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 11/33] migration: allow src return path to pause Peter Xu
@ 2017-08-30  8:32 ` Peter Xu
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 13/33] migration: allow fault thread to pause Peter Xu
                   ` (20 subsequent siblings)
  32 siblings, 0 replies; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Previously, failures could not be reported when sending data from the
destination to the source via the return path, even though errors can
happen along the way. This patch allows migrate_send_rp_message() to
return an error when one happens, and further extends that to
migrate_send_rp_req_pages().

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 38 ++++++++++++++++++++++++++++++--------
 migration/migration.h |  2 +-
 2 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index b3cd8be..d42209d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -195,17 +195,35 @@ static void deferred_incoming_migration(Error **errp)
  * Send a message on the return channel back to the source
  * of the migration.
  */
-static void migrate_send_rp_message(MigrationIncomingState *mis,
-                                    enum mig_rp_message_type message_type,
-                                    uint16_t len, void *data)
+static int migrate_send_rp_message(MigrationIncomingState *mis,
+                                   enum mig_rp_message_type message_type,
+                                   uint16_t len, void *data)
 {
+    int ret = 0;
+
     trace_migrate_send_rp_message((int)message_type, len);
     qemu_mutex_lock(&mis->rp_mutex);
+
+    /*
+     * It's possible that the file handle got lost due to network
+     * failures.
+     */
+    if (!mis->to_src_file) {
+        ret = -EIO;
+        goto error;
+    }
+
     qemu_put_be16(mis->to_src_file, (unsigned int)message_type);
     qemu_put_be16(mis->to_src_file, len);
     qemu_put_buffer(mis->to_src_file, data, len);
     qemu_fflush(mis->to_src_file);
+
+    /* It's possible that qemu file got error during sending */
+    ret = qemu_file_get_error(mis->to_src_file);
+
+error:
     qemu_mutex_unlock(&mis->rp_mutex);
+    return ret;
 }
 
 /* Request a range of pages from the source VM at the given
@@ -215,26 +233,30 @@ static void migrate_send_rp_message(MigrationIncomingState *mis,
  *   Start: Address offset within the RB
  *   Len: Length in bytes required - must be a multiple of pagesize
  */
-void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char *rbname,
-                               ram_addr_t start, size_t len)
+int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char *rbname,
+                              ram_addr_t start, size_t len)
 {
     uint8_t bufc[12 + 1 + 255]; /* start (8), len (4), rbname up to 256 */
     size_t msglen = 12; /* start + len */
+    int rbname_len;
+    enum mig_rp_message_type msg_type;
 
     *(uint64_t *)bufc = cpu_to_be64((uint64_t)start);
     *(uint32_t *)(bufc + 8) = cpu_to_be32((uint32_t)len);
 
     if (rbname) {
-        int rbname_len = strlen(rbname);
+        rbname_len = strlen(rbname);
         assert(rbname_len < 256);
 
         bufc[msglen++] = rbname_len;
         memcpy(bufc + msglen, rbname, rbname_len);
         msglen += rbname_len;
-        migrate_send_rp_message(mis, MIG_RP_MSG_REQ_PAGES_ID, msglen, bufc);
+        msg_type = MIG_RP_MSG_REQ_PAGES_ID;
     } else {
-        migrate_send_rp_message(mis, MIG_RP_MSG_REQ_PAGES, msglen, bufc);
+        msg_type = MIG_RP_MSG_REQ_PAGES;
     }
+
+    return migrate_send_rp_message(mis, msg_type, msglen, bufc);
 }
 
 void qemu_start_incoming_migration(const char *uri, Error **errp)
diff --git a/migration/migration.h b/migration/migration.h
index 323d88d..6333391 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -199,7 +199,7 @@ void migrate_send_rp_shut(MigrationIncomingState *mis,
                           uint32_t value);
 void migrate_send_rp_pong(MigrationIncomingState *mis,
                           uint32_t value);
-void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
+int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
                               ram_addr_t start, size_t len);
 
 #endif
-- 
2.7.4


* [Qemu-devel] [RFC v2 13/33] migration: allow fault thread to pause
  2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
                   ` (11 preceding siblings ...)
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 12/33] migration: allow send_rq to fail Peter Xu
@ 2017-08-30  8:32 ` Peter Xu
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 14/33] qmp: hmp: add migrate "resume" option Peter Xu
                   ` (19 subsequent siblings)
  32 siblings, 0 replies; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Allow the fault thread to stop handling page faults temporarily. When a
network failure happens (and we expect a recovery afterwards), we should
not allow the fault thread to continue sending requests to the source;
instead, it should halt until the connection is rebuilt.

When the dest main thread notices the failure, it kicks the fault thread
to switch to the pause state.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c    |  1 +
 migration/migration.h    |  1 +
 migration/postcopy-ram.c | 50 ++++++++++++++++++++++++++++++++++++++++++++----
 migration/savevm.c       |  3 +++
 migration/trace-events   |  2 ++
 5 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index d42209d..722f8ac 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -147,6 +147,7 @@ MigrationIncomingState *migration_incoming_get_current(void)
         qemu_mutex_init(&mis_current.rp_mutex);
         qemu_event_init(&mis_current.main_thread_load_event, false);
         qemu_sem_init(&mis_current.postcopy_pause_sem_dst, 0);
+        qemu_sem_init(&mis_current.postcopy_pause_sem_fault, 0);
         once = true;
     }
     return &mis_current;
diff --git a/migration/migration.h b/migration/migration.h
index 6333391..338dfe3 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -63,6 +63,7 @@ struct MigrationIncomingState {
 
     /* notify PAUSED postcopy incoming migrations to try to continue */
     QemuSemaphore postcopy_pause_sem_dst;
+    QemuSemaphore postcopy_pause_sem_fault;
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index c28e340..026a58e 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -418,6 +418,17 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
     return 0;
 }
 
+static bool postcopy_pause_fault_thread(MigrationIncomingState *mis)
+{
+    trace_postcopy_pause_fault_thread();
+
+    qemu_sem_wait(&mis->postcopy_pause_sem_fault);
+
+    trace_postcopy_pause_fault_thread_continued();
+
+    return true;
+}
+
 /*
  * Handle faults detected by the USERFAULT markings
  */
@@ -465,6 +476,22 @@ static void *postcopy_ram_fault_thread(void *opaque)
             }
         }
 
+        if (!mis->to_src_file) {
+            /*
+             * Possibly someone has told us, via the event, that the
+             * return path is already broken. We should hold until
+             * the channel is rebuilt.
+             */
+            if (postcopy_pause_fault_thread(mis)) {
+                last_rb = NULL;
+                /* Continue to read the userfaultfd */
+            } else {
+                error_report("%s: paused but not allowed to continue",
+                             __func__);
+                break;
+            }
+        }
+
         ret = read(mis->userfault_fd, &msg, sizeof(msg));
         if (ret != sizeof(msg)) {
             if (errno == EAGAIN) {
@@ -504,18 +531,33 @@ static void *postcopy_ram_fault_thread(void *opaque)
                                                 qemu_ram_get_idstr(rb),
                                                 rb_offset);
 
+retry:
         /*
          * Send the request to the source - we want to request one
          * of our host page sizes (which is >= TPS)
          */
         if (rb != last_rb) {
             last_rb = rb;
-            migrate_send_rp_req_pages(mis, qemu_ram_get_idstr(rb),
-                                     rb_offset, qemu_ram_pagesize(rb));
+            ret = migrate_send_rp_req_pages(mis, qemu_ram_get_idstr(rb),
+                                            rb_offset, qemu_ram_pagesize(rb));
         } else {
             /* Save some space */
-            migrate_send_rp_req_pages(mis, NULL,
-                                     rb_offset, qemu_ram_pagesize(rb));
+            ret = migrate_send_rp_req_pages(mis, NULL,
+                                            rb_offset, qemu_ram_pagesize(rb));
+        }
+
+        if (ret) {
+            /* Maybe a network failure; try to wait for recovery */
+            if (ret == -EIO && postcopy_pause_fault_thread(mis)) {
+                /* We got reconnected somehow, try to continue */
+                last_rb = NULL;
+                goto retry;
+            } else {
+                /* This is an unavoidable fault */
+                error_report("%s: migrate_send_rp_req_pages() returned %d",
+                             __func__, ret);
+                break;
+            }
         }
     }
     trace_postcopy_ram_fault_thread_exit();
diff --git a/migration/savevm.c b/migration/savevm.c
index 3777124..a3162c1 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1994,6 +1994,9 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
     mis->to_src_file = NULL;
     qemu_mutex_unlock(&mis->rp_mutex);
 
+    /* Notify the fault thread of the invalidated file handle */
+    postcopy_fault_thread_notify(mis);
+
     error_report("Detected IO failure for postcopy. "
                  "Migration paused.");
 
diff --git a/migration/trace-events b/migration/trace-events
index 1a83f60..42a93d9 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -100,6 +100,8 @@ open_return_path_on_source_continue(void) ""
 postcopy_start(void) ""
 postcopy_pause_return_path(void) ""
 postcopy_pause_return_path_continued(void) ""
+postcopy_pause_fault_thread(void) ""
+postcopy_pause_fault_thread_continued(void) ""
 postcopy_pause_continued(void) ""
 postcopy_pause_incoming(void) ""
 postcopy_pause_incoming_continued(void) ""
-- 
2.7.4


* [Qemu-devel] [RFC v2 14/33] qmp: hmp: add migrate "resume" option
  2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
                   ` (12 preceding siblings ...)
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 13/33] migration: allow fault thread to pause Peter Xu
@ 2017-08-30  8:32 ` Peter Xu
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 15/33] migration: pass MigrationState to migrate_init() Peter Xu
                   ` (18 subsequent siblings)
  32 siblings, 0 replies; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

It will be used when we want to resume a paused migration.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hmp-commands.hx       | 7 ++++---
 hmp.c                 | 4 +++-
 migration/migration.c | 2 +-
 qapi-schema.json      | 5 ++++-
 4 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 1941e19..7adb029 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -928,13 +928,14 @@ ETEXI
 
     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
-        .params     = "[-d] [-b] [-i] uri",
+        .args_type  = "detach:-d,blk:-b,inc:-i,resume:-r,uri:s",
+        .params     = "[-d] [-b] [-i] [-r] uri",
         .help       = "migrate to URI (using -d to not wait for completion)"
 		      "\n\t\t\t -b for migration without shared storage with"
 		      " full copy of disk\n\t\t\t -i for migration without "
 		      "shared storage with incremental copy of disk "
-		      "(base image shared between src and destination)",
+		      "(base image shared between src and destination)"
+                      "\n\t\t\t -r to resume a paused migration",
         .cmd        = hmp_migrate,
     },
 
diff --git a/hmp.c b/hmp.c
index fd80dce..ebc1563 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1891,10 +1891,12 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
     bool detach = qdict_get_try_bool(qdict, "detach", false);
     bool blk = qdict_get_try_bool(qdict, "blk", false);
     bool inc = qdict_get_try_bool(qdict, "inc", false);
+    bool resume = qdict_get_try_bool(qdict, "resume", false);
     const char *uri = qdict_get_str(qdict, "uri");
     Error *err = NULL;
 
-    qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false, &err);
+    qmp_migrate(uri, !!blk, blk, !!inc, inc,
+                false, false, true, resume, &err);
     if (err) {
         error_report_err(err);
         return;
diff --git a/migration/migration.c b/migration/migration.c
index 722f8ac..394e84b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1238,7 +1238,7 @@ bool migration_is_blocked(Error **errp)
 
 void qmp_migrate(const char *uri, bool has_blk, bool blk,
                  bool has_inc, bool inc, bool has_detach, bool detach,
-                 Error **errp)
+                 bool has_resume, bool resume, Error **errp)
 {
     Error *local_err = NULL;
     MigrationState *s = migrate_get_current();
diff --git a/qapi-schema.json b/qapi-schema.json
index 368b592..ba41f2c 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3208,6 +3208,8 @@
 # @detach: this argument exists only for compatibility reasons and
 #          is ignored by QEMU
 #
+# @resume: resume a paused migration, default "off". (since 2.11)
+#
 # Returns: nothing on success
 #
 # Since: 0.14.0
@@ -3229,7 +3231,8 @@
 #
 ##
 { 'command': 'migrate',
-  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' } }
+  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool',
+           '*detach': 'bool', '*resume': 'bool' } }
 
 ##
 # @migrate-incoming:
-- 
2.7.4


* [Qemu-devel] [RFC v2 15/33] migration: pass MigrationState to migrate_init()
  2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
                   ` (13 preceding siblings ...)
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 14/33] qmp: hmp: add migrate "resume" option Peter Xu
@ 2017-08-30  8:32 ` Peter Xu
  2017-09-22  9:09   ` Dr. David Alan Gilbert
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 16/33] migration: rebuild channel on source Peter Xu
                   ` (17 subsequent siblings)
  32 siblings, 1 reply; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Let the callers take the object, then pass it to migrate_init().

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 7 ++-----
 migration/migration.h | 2 +-
 migration/savevm.c    | 5 ++++-
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 394e84b..15b8eb1 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1138,10 +1138,8 @@ bool migration_is_idle(void)
     return false;
 }
 
-MigrationState *migrate_init(void)
+void migrate_init(MigrationState *s)
 {
-    MigrationState *s = migrate_get_current();
-
     /*
      * Reinitialise all migration state, except
      * parameters/capabilities that the user set, and
@@ -1169,7 +1167,6 @@ MigrationState *migrate_init(void)
     migrate_set_state(&s->state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
 
     s->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
-    return s;
 }
 
 static GSList *migration_blockers;
@@ -1277,7 +1274,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
         migrate_set_block_incremental(s, true);
     }
 
-    s = migrate_init();
+    migrate_init(s);
 
     if (strstart(uri, "tcp:", &p)) {
         tcp_start_outgoing_migration(s, p, &local_err);
diff --git a/migration/migration.h b/migration/migration.h
index 338dfe3..b78b9bd 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -169,7 +169,7 @@ void migrate_fd_error(MigrationState *s, const Error *error);
 
 void migrate_fd_connect(MigrationState *s);
 
-MigrationState *migrate_init(void);
+void migrate_init(MigrationState *s);
 bool migration_is_blocked(Error **errp);
 /* True if outgoing migration has entered postcopy phase */
 bool migration_in_postcopy(void);
diff --git a/migration/savevm.c b/migration/savevm.c
index a3162c1..c9bccf7 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1224,8 +1224,11 @@ void qemu_savevm_state_cleanup(void)
 static int qemu_savevm_state(QEMUFile *f, Error **errp)
 {
     int ret;
-    MigrationState *ms = migrate_init();
+    MigrationState *ms = migrate_get_current();
     MigrationStatus status;
+
+    migrate_init(ms);
+
     ms->to_dst_file = f;
 
     if (migration_is_blocked(errp)) {
-- 
2.7.4


* [Qemu-devel] [RFC v2 16/33] migration: rebuild channel on source
  2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
                   ` (14 preceding siblings ...)
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 15/33] migration: pass MigrationState to migrate_init() Peter Xu
@ 2017-08-30  8:32 ` Peter Xu
  2017-09-22  9:56   ` Dr. David Alan Gilbert
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 17/33] migration: new state "postcopy-recover" Peter Xu
                   ` (16 subsequent siblings)
  32 siblings, 1 reply; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

This patch detects the "resume" flag of the migration command, and
rebuilds the channels only if the flag is set.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 92 ++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 69 insertions(+), 23 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 15b8eb1..deb947b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1233,49 +1233,75 @@ bool migration_is_blocked(Error **errp)
     return false;
 }
 
-void qmp_migrate(const char *uri, bool has_blk, bool blk,
-                 bool has_inc, bool inc, bool has_detach, bool detach,
-                 bool has_resume, bool resume, Error **errp)
+/* Returns true if continue to migrate, or false if error detected */
+static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc,
+                            bool resume, Error **errp)
 {
     Error *local_err = NULL;
-    MigrationState *s = migrate_get_current();
-    const char *p;
+
+    if (resume) {
+        if (s->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
+            error_setg(errp, "Cannot resume if there is no "
+                       "paused migration");
+            return false;
+        }
+        /* This is a resume, skip init status */
+        return true;
+    }
 
     if (migration_is_setup_or_active(s->state) ||
         s->state == MIGRATION_STATUS_CANCELLING ||
         s->state == MIGRATION_STATUS_COLO) {
         error_setg(errp, QERR_MIGRATION_ACTIVE);
-        return;
+        return false;
     }
+
     if (runstate_check(RUN_STATE_INMIGRATE)) {
         error_setg(errp, "Guest is waiting for an incoming migration");
-        return;
+        return false;
     }
 
     if (migration_is_blocked(errp)) {
-        return;
+        return false;
     }
 
-    if ((has_blk && blk) || (has_inc && inc)) {
+    if (blk || blk_inc) {
         if (migrate_use_block() || migrate_use_block_incremental()) {
             error_setg(errp, "Command options are incompatible with "
                        "current migration capabilities");
-            return;
+            return false;
         }
         migrate_set_block_enabled(true, &local_err);
         if (local_err) {
             error_propagate(errp, local_err);
-            return;
+            return false;
         }
         s->must_remove_block_options = true;
     }
 
-    if (has_inc && inc) {
+    if (blk_inc) {
         migrate_set_block_incremental(s, true);
     }
 
     migrate_init(s);
 
+    return true;
+}
+
+void qmp_migrate(const char *uri, bool has_blk, bool blk,
+                 bool has_inc, bool inc, bool has_detach, bool detach,
+                 bool has_resume, bool resume, Error **errp)
+{
+    Error *local_err = NULL;
+    MigrationState *s = migrate_get_current();
+    const char *p;
+
+    if (!migrate_prepare(s, has_blk && blk, has_inc && inc,
+                         has_resume && resume, errp)) {
+        /* Error detected; already set in errp */
+        return;
+    }
+
     if (strstart(uri, "tcp:", &p)) {
         tcp_start_outgoing_migration(s, p, &local_err);
 #ifdef CONFIG_RDMA
@@ -1697,7 +1723,8 @@ out:
     return NULL;
 }
 
-static int open_return_path_on_source(MigrationState *ms)
+static int open_return_path_on_source(MigrationState *ms,
+                                      bool create_thread)
 {
 
     ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->to_dst_file);
@@ -1706,6 +1733,12 @@ static int open_return_path_on_source(MigrationState *ms)
     }
 
     trace_open_return_path_on_source();
+
+    if (!create_thread) {
+        /* We're done */
+        return 0;
+    }
+
     qemu_thread_create(&ms->rp_state.rp_thread, "return path",
                        source_return_path_thread, ms, QEMU_THREAD_JOINABLE);
 
@@ -2263,15 +2296,24 @@ static void *migration_thread(void *opaque)
 
 void migrate_fd_connect(MigrationState *s)
 {
-    s->expected_downtime = s->parameters.downtime_limit;
-    s->cleanup_bh = qemu_bh_new(migrate_fd_cleanup, s);
+    int64_t rate_limit;
+    bool resume = s->state == MIGRATION_STATUS_POSTCOPY_PAUSED;
 
-    qemu_file_set_blocking(s->to_dst_file, true);
-    qemu_file_set_rate_limit(s->to_dst_file,
-                             s->parameters.max_bandwidth / XFER_LIMIT_RATIO);
+    if (resume) {
+        /* This is a resumed migration */
+        rate_limit = INT64_MAX;
+    } else {
+        /* This is a fresh new migration */
+        rate_limit = s->parameters.max_bandwidth / XFER_LIMIT_RATIO;
+        s->expected_downtime = s->parameters.downtime_limit;
+        s->cleanup_bh = qemu_bh_new(migrate_fd_cleanup, s);
 
-    /* Notify before starting migration thread */
-    notifier_list_notify(&migration_state_notifiers, s);
+        /* Notify before starting migration thread */
+        notifier_list_notify(&migration_state_notifiers, s);
+    }
+
+    qemu_file_set_rate_limit(s->to_dst_file, rate_limit);
+    qemu_file_set_blocking(s->to_dst_file, true);
 
     /*
      * Open the return path. For postcopy, it is used exclusively. For
@@ -2279,15 +2321,19 @@ void migrate_fd_connect(MigrationState *s)
      * QEMU uses the return path.
      */
     if (migrate_postcopy_ram() || migrate_use_return_path()) {
-        if (open_return_path_on_source(s)) {
+        if (open_return_path_on_source(s, !resume)) {
             error_report("Unable to open return-path for postcopy");
-            migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
-                              MIGRATION_STATUS_FAILED);
+            migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
             migrate_fd_cleanup(s);
             return;
         }
     }
 
+    if (resume) {
+        /* TODO: do the resume logic */
+        return;
+    }
+
     qemu_thread_create(&s->thread, "live_migration", migration_thread, s,
                        QEMU_THREAD_JOINABLE);
     s->migration_thread_running = true;
-- 
2.7.4


* [Qemu-devel] [RFC v2 17/33] migration: new state "postcopy-recover"
  2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
                   ` (15 preceding siblings ...)
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 16/33] migration: rebuild channel on source Peter Xu
@ 2017-08-30  8:32 ` Peter Xu
  2017-09-22 10:08   ` Dr. David Alan Gilbert
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 18/33] migration: wakeup dst ram-load-thread for recover Peter Xu
                   ` (15 subsequent siblings)
  32 siblings, 1 reply; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Introduce a new migration state "postcopy-recover". If a migration
procedure is paused and the connection is successfully rebuilt
afterwards, we'll switch the source VM state from "postcopy-paused" to
the new state "postcopy-recover", then we'll do the resume logic in the
migration thread (along with the return path thread).

This patch only does the state switch on the source side. A follow-up
patch will handle the state switching on the destination side using the
same status bit.
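The transitions between postcopy-active, postcopy-paused and
postcopy-recover in this patch all go through migrate_set_state(), which
only moves to the new state when the old one still matches. A minimal
sketch of that compare-and-swap pattern, assuming C11 atomics and
abbreviated state names (the real helper is built on QEMU's own atomic
primitives, not shown here):

```c
/* Sketch of a guarded state transition: only move to new_state if
 * the state is still old_state, so a racing transition (e.g. the
 * user cancelling while we recover) is not silently overwritten.
 * Names are abbreviated stand-ins for the MIGRATION_STATUS_* enum. */
#include <stdatomic.h>

enum {
    MIG_POSTCOPY_ACTIVE,
    MIG_POSTCOPY_PAUSED,
    MIG_POSTCOPY_RECOVER,
    MIG_FAILED,
};

/* Returns 1 if the transition happened, 0 if the state had changed
 * underneath us (in which case nothing is modified). */
int set_state(atomic_int *state, int old_state, int new_state)
{
    return atomic_compare_exchange_strong(state, &old_state, new_state);
}
```

This is why migrate_fd_connect() in this patch can safely do the
PAUSED -> RECOVER switch: if the migration is no longer paused, the
transition simply does not happen.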

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 76 ++++++++++++++++++++++++++++++++++++++-------------
 qapi-schema.json      |  4 ++-
 2 files changed, 60 insertions(+), 20 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index deb947b..30dd566 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -495,6 +495,7 @@ static bool migration_is_setup_or_active(int state)
     case MIGRATION_STATUS_ACTIVE:
     case MIGRATION_STATUS_POSTCOPY_ACTIVE:
     case MIGRATION_STATUS_POSTCOPY_PAUSED:
+    case MIGRATION_STATUS_POSTCOPY_RECOVER:
     case MIGRATION_STATUS_SETUP:
         return true;
 
@@ -571,6 +572,7 @@ MigrationInfo *qmp_query_migrate(Error **errp)
     case MIGRATION_STATUS_CANCELLING:
     case MIGRATION_STATUS_POSTCOPY_ACTIVE:
     case MIGRATION_STATUS_POSTCOPY_PAUSED:
+    case MIGRATION_STATUS_POSTCOPY_RECOVER:
          /* TODO add some postcopy stats */
         info->has_status = true;
         info->has_total_time = true;
@@ -2035,6 +2037,13 @@ typedef enum MigThrError {
     MIG_THR_ERR_FATAL = 2,
 } MigThrError;
 
+/* Return zero if success, or <0 for error */
+static int postcopy_do_resume(MigrationState *s)
+{
+    /* TODO: do the resume logic */
+    return 0;
+}
+
 /*
  * We don't return until we are in a safe state to continue current
  * postcopy migration.  Returns MIG_THR_ERR_RECOVERED if recovered, or
@@ -2043,29 +2052,55 @@ typedef enum MigThrError {
 static MigThrError postcopy_pause(MigrationState *s)
 {
     assert(s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
-    migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
-                      MIGRATION_STATUS_POSTCOPY_PAUSED);
 
-    /* Current channel is possibly broken. Release it. */
-    assert(s->to_dst_file);
-    qemu_file_shutdown(s->to_dst_file);
-    qemu_fclose(s->to_dst_file);
-    s->to_dst_file = NULL;
+    while (true) {
+        migrate_set_state(&s->state, s->state,
+                          MIGRATION_STATUS_POSTCOPY_PAUSED);
 
-    error_report("Detected IO failure for postcopy. "
-                 "Migration paused.");
+        /* Current channel is possibly broken. Release it. */
+        assert(s->to_dst_file);
+        qemu_file_shutdown(s->to_dst_file);
+        qemu_fclose(s->to_dst_file);
+        s->to_dst_file = NULL;
 
-    /*
-     * We wait until things fixed up. Then someone will setup the
-     * status back for us.
-     */
-    while (s->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
-        qemu_sem_wait(&s->postcopy_pause_sem);
-    }
+        error_report("Detected IO failure for postcopy. "
+                     "Migration paused.");
+
+        /*
+         * We wait until things are fixed up. Then someone will set
+         * the state back for us.
+         */
+        while (s->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
+            qemu_sem_wait(&s->postcopy_pause_sem);
+        }
 
-    trace_postcopy_pause_continued();
+        if (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
+            /* Woken up by a recover procedure. Give it a shot */
+
+            /*
+             * Firstly, let's wake up the return path now, with a new
+             * return path channel.
+             */
+            qemu_sem_post(&s->postcopy_pause_rp_sem);
 
-    return MIG_THR_ERR_RECOVERED;
+            /* Do the resume logic */
+            if (postcopy_do_resume(s) == 0) {
+                /* Let's continue! */
+                trace_postcopy_pause_continued();
+                return MIG_THR_ERR_RECOVERED;
+            } else {
+                /*
+                 * Something went wrong during the recovery; let's
+                 * pause again. Pausing is always better than
+                 * throwing data away.
+                 */
+                continue;
+            }
+        } else {
+            /* This is not right... Time to quit. */
+            return MIG_THR_ERR_FATAL;
+        }
+    }
 }
 
 static MigThrError migration_detect_error(MigrationState *s)
@@ -2330,7 +2365,10 @@ void migrate_fd_connect(MigrationState *s)
     }
 
     if (resume) {
-        /* TODO: do the resume logic */
+        /* Wakeup the main migration thread to do the recovery */
+        migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_PAUSED,
+                          MIGRATION_STATUS_POSTCOPY_RECOVER);
+        qemu_sem_post(&s->postcopy_pause_sem);
         return;
     }
 
diff --git a/qapi-schema.json b/qapi-schema.json
index ba41f2c..989f95a 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -669,6 +669,8 @@
 #
 # @postcopy-paused: during postcopy but paused. (since 2.11)
 #
+# @postcopy-recover: trying to recover from a paused postcopy. (since 2.11)
+#
 # @completed: migration is finished.
 #
 # @failed: some error occurred during migration process.
@@ -682,7 +684,7 @@
 { 'enum': 'MigrationStatus',
   'data': [ 'none', 'setup', 'cancelling', 'cancelled',
             'active', 'postcopy-active', 'postcopy-paused',
-            'completed', 'failed', 'colo' ] }
+            'postcopy-recover', 'completed', 'failed', 'colo' ] }
 
 ##
 # @MigrationInfo:
-- 
2.7.4


* [Qemu-devel] [RFC v2 18/33] migration: wakeup dst ram-load-thread for recover
  2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
                   ` (16 preceding siblings ...)
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 17/33] migration: new state "postcopy-recover" Peter Xu
@ 2017-08-30  8:32 ` Peter Xu
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 19/33] migration: new cmd MIG_CMD_RECV_BITMAP Peter Xu
                   ` (14 subsequent siblings)
  32 siblings, 0 replies; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

On the destination side, we cannot wake up all the threads when we get
reconnected. The first thing to do is to wake up the main load thread,
so that we can continue to receive valid messages from the source again
and reply when needed.

At this point, we switch the destination VM state from postcopy-paused
back to postcopy-recover.

Now we are finally ready to do the resume logic.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 33 ++++++++++++++++++++++++++++++---
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 30dd566..1370c70 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -389,10 +389,37 @@ static void process_incoming_migration_co(void *opaque)
 
 void migration_fd_process_incoming(QEMUFile *f)
 {
-    Coroutine *co = qemu_coroutine_create(process_incoming_migration_co, f);
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    Coroutine *co;
+
+    mis->from_src_file = f;
+
+    if (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
+        /* Resumed from a paused postcopy migration */
+
+        /* Postcopy has standalone thread to do vm load */
+        qemu_file_set_blocking(f, true);
+
+        /* Re-configure the return path */
+        mis->to_src_file = qemu_file_get_return_path(f);
 
-    qemu_file_set_blocking(f, false);
-    qemu_coroutine_enter(co);
+        migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_PAUSED,
+                          MIGRATION_STATUS_POSTCOPY_RECOVER);
+
+        /*
+         * Here, we only wake up the main loading thread (while the
+         * fault thread will still be waiting), so that we can receive
+         * commands from the source now, and answer them if needed.
+         * The fault thread will be woken up later, once we are sure
+         * the source is ready to reply to page requests.
+         */
+        qemu_sem_post(&mis->postcopy_pause_sem_dst);
+    } else {
+        /* New incoming migration */
+        qemu_file_set_blocking(f, false);
+        co = qemu_coroutine_create(process_incoming_migration_co, f);
+        qemu_coroutine_enter(co);
+    }
 }
 
 /*
-- 
2.7.4


* [Qemu-devel] [RFC v2 19/33] migration: new cmd MIG_CMD_RECV_BITMAP
  2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
                   ` (17 preceding siblings ...)
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 18/33] migration: wakeup dst ram-load-thread for recover Peter Xu
@ 2017-08-30  8:32 ` Peter Xu
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 20/33] migration: new message MIG_RP_MSG_RECV_BITMAP Peter Xu
                   ` (13 subsequent siblings)
  32 siblings, 0 replies; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Add a new VM command MIG_CMD_RECV_BITMAP to request the received bitmap
for one ramblock.
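The command payload, as described in the patch, is one length byte
followed by the (unterminated) ramblock name. A small sketch of the
encode/decode round trip, with hypothetical helper names (QEMU itself
sends the buffer via qemu_savevm_command_send() and reads it back with
qemu_get_counted_string()):

```c
/* Sketch of the MIG_CMD_RECV_BITMAP payload layout:
 *   len (1 byte) + ramblock_name (<256 bytes, no NUL terminator)
 * Helper names are illustrative, not QEMU's. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Encode block_name into buf; returns the total payload length.
 * Assumes strlen(block_name) < 256, as the real code does. */
size_t encode_recv_bitmap(uint8_t *buf, const char *block_name)
{
    size_t len = strlen(block_name);
    buf[0] = (uint8_t)len;
    memcpy(buf + 1, block_name, len);
    return len + 1;
}

/* Decode back into out (NUL-terminated); returns the name length,
 * or 0 when the length byte disagrees with the payload size. */
size_t decode_recv_bitmap(const uint8_t *buf, size_t payload_len, char *out)
{
    size_t len = buf[0];
    if (payload_len != len + 1) {
        return 0;   /* malformed: mirrors the len != cnt + 1 check */
    }
    memcpy(out, buf + 1, len);
    out[len] = '\0';
    return len;
}
```

The length check mirrors the `len != cnt + 1` validation in
loadvm_handle_recv_bitmap() below.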

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/savevm.c     | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++
 migration/savevm.h     |  1 +
 migration/trace-events |  2 ++
 3 files changed, 64 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index c9bccf7..f532ca0 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -78,6 +78,7 @@ enum qemu_vm_cmd {
                                       were previously sent during
                                       precopy but are dirty. */
     MIG_CMD_PACKAGED,          /* Send a wrapped stream within this stream */
+    MIG_CMD_RECV_BITMAP,       /* Request for recved bitmap on dst */
     MIG_CMD_MAX
 };
 
@@ -95,6 +96,7 @@ static struct mig_cmd_args {
     [MIG_CMD_POSTCOPY_RAM_DISCARD] = {
                                    .len = -1, .name = "POSTCOPY_RAM_DISCARD" },
     [MIG_CMD_PACKAGED]         = { .len =  4, .name = "PACKAGED" },
+    [MIG_CMD_RECV_BITMAP]      = { .len = -1, .name = "RECV_BITMAP" },
     [MIG_CMD_MAX]              = { .len = -1, .name = "MAX" },
 };
 
@@ -929,6 +931,19 @@ void qemu_savevm_send_postcopy_run(QEMUFile *f)
     qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RUN, 0, NULL);
 }
 
+void qemu_savevm_send_recv_bitmap(QEMUFile *f, char *block_name)
+{
+    size_t len;
+    char buf[256];
+
+    trace_savevm_send_recv_bitmap(block_name);
+
+    buf[0] = len = strlen(block_name);
+    memcpy(buf + 1, block_name, len);
+
+    qemu_savevm_command_send(f, MIG_CMD_RECV_BITMAP, len + 1, (uint8_t *)buf);
+}
+
 bool qemu_savevm_state_blocked(Error **errp)
 {
     SaveStateEntry *se;
@@ -1716,6 +1731,49 @@ static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis)
 }
 
 /*
+ * Handle the request from the source for the received bitmap
+ * (recved_bitmap) on the destination. Payload format:
+ *
+ * len (1 byte) + ramblock_name (<255 bytes)
+ */
+static int loadvm_handle_recv_bitmap(MigrationIncomingState *mis,
+                                     uint16_t len)
+{
+    QEMUFile *file = mis->from_src_file;
+    RAMBlock *rb;
+    char block_name[256];
+    size_t cnt;
+
+    cnt = qemu_get_counted_string(file, block_name);
+    if (!cnt) {
+        error_report("%s: failed to read block name", __func__);
+        return -EINVAL;
+    }
+
+    /* Validate before using the data */
+    if (qemu_file_get_error(file)) {
+        return qemu_file_get_error(file);
+    }
+
+    if (len != cnt + 1) {
+        error_report("%s: invalid payload length (%d)", __func__, len);
+        return -EINVAL;
+    }
+
+    rb = qemu_ram_block_by_name(block_name);
+    if (!rb) {
+        error_report("%s: block '%s' not found", __func__, block_name);
+        return -EINVAL;
+    }
+
+    /* TODO: send the bitmap back to source */
+
+    trace_loadvm_handle_recv_bitmap(block_name);
+
+    return 0;
+}
+
+/*
  * Process an incoming 'QEMU_VM_COMMAND'
  * 0           just a normal return
  * LOADVM_QUIT All good, but exit the loop
@@ -1788,6 +1846,9 @@ static int loadvm_process_command(QEMUFile *f)
 
     case MIG_CMD_POSTCOPY_RAM_DISCARD:
         return loadvm_postcopy_ram_handle_discard(mis, len);
+
+    case MIG_CMD_RECV_BITMAP:
+        return loadvm_handle_recv_bitmap(mis, len);
     }
 
     return 0;
diff --git a/migration/savevm.h b/migration/savevm.h
index 295c4a1..8126b1c 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -46,6 +46,7 @@ int qemu_savevm_send_packaged(QEMUFile *f, const uint8_t *buf, size_t len);
 void qemu_savevm_send_postcopy_advise(QEMUFile *f);
 void qemu_savevm_send_postcopy_listen(QEMUFile *f);
 void qemu_savevm_send_postcopy_run(QEMUFile *f);
+void qemu_savevm_send_recv_bitmap(QEMUFile *f, char *block_name);
 
 void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
                                            uint16_t len,
diff --git a/migration/trace-events b/migration/trace-events
index 42a93d9..c5f7e41 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -12,6 +12,7 @@ loadvm_state_cleanup(void) ""
 loadvm_handle_cmd_packaged(unsigned int length) "%u"
 loadvm_handle_cmd_packaged_main(int ret) "%d"
 loadvm_handle_cmd_packaged_received(int ret) "%d"
+loadvm_handle_recv_bitmap(char *s) "%s"
 loadvm_postcopy_handle_advise(void) ""
 loadvm_postcopy_handle_listen(void) ""
 loadvm_postcopy_handle_run(void) ""
@@ -34,6 +35,7 @@ savevm_send_open_return_path(void) ""
 savevm_send_ping(uint32_t val) "0x%x"
 savevm_send_postcopy_listen(void) ""
 savevm_send_postcopy_run(void) ""
+savevm_send_recv_bitmap(char *name) "%s"
 savevm_state_setup(void) ""
 savevm_state_header(void) ""
 savevm_state_iterate(void) ""
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Qemu-devel] [RFC v2 20/33] migration: new message MIG_RP_MSG_RECV_BITMAP
  2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
                   ` (18 preceding siblings ...)
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 19/33] migration: new cmd MIG_CMD_RECV_BITMAP Peter Xu
@ 2017-08-30  8:32 ` Peter Xu
  2017-09-22 11:05   ` Dr. David Alan Gilbert
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 21/33] migration: new cmd MIG_CMD_POSTCOPY_RESUME Peter Xu
                   ` (12 subsequent siblings)
  32 siblings, 1 reply; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Introduce the new return path message MIG_RP_MSG_RECV_BITMAP to send
the received bitmap of a ramblock back to the source.

This is the reply message to MIG_CMD_RECV_BITMAP.  It contains not only
the header (including the ramblock name), but is also appended with the
whole received bitmap of that ramblock on the destination side.

When the source receives such a reply message (MIG_RP_MSG_RECV_BITMAP),
it parses it and converts it to the dirty bitmap by inverting the bits.

One thing to mention is that, when we send the received bitmap, we do
two extra things:

- convert the bitmap to little endian, so that migration works even
  when the source and destination hosts use different endianness;

- align the bitmap size to 8 bytes, so that migration works even when
  the source and destination hosts use different word sizes (32/64
  bits).

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c  |  68 ++++++++++++++++++++++++
 migration/migration.h  |   2 +
 migration/ram.c        | 141 +++++++++++++++++++++++++++++++++++++++++++++++++
 migration/ram.h        |   3 ++
 migration/savevm.c     |   2 +-
 migration/trace-events |   2 +
 6 files changed, 217 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 1370c70..625f19a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -92,6 +92,7 @@ enum mig_rp_message_type {
 
     MIG_RP_MSG_REQ_PAGES_ID, /* data (start: be64, len: be32, id: string) */
     MIG_RP_MSG_REQ_PAGES,    /* data (start: be64, len: be32) */
+    MIG_RP_MSG_RECV_BITMAP,  /* send recved_bitmap back to source */
 
     MIG_RP_MSG_MAX
 };
@@ -449,6 +450,45 @@ void migrate_send_rp_pong(MigrationIncomingState *mis,
     migrate_send_rp_message(mis, MIG_RP_MSG_PONG, sizeof(buf), &buf);
 }
 
+void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
+                                 char *block_name)
+{
+    char buf[512];
+    int len;
+    int64_t res;
+
+    /*
+     * First, we send the header part. It contains only the len of
+     * idstr, and the idstr itself.
+     */
+    len = strlen(block_name);
+    buf[0] = len;
+    memcpy(buf + 1, block_name, len);
+
+    if (mis->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
+        error_report("%s: MIG_RP_MSG_RECV_BITMAP only used for recovery",
+                     __func__);
+        return;
+    }
+
+    migrate_send_rp_message(mis, MIG_RP_MSG_RECV_BITMAP, len + 1, buf);
+
+    /*
+     * Next, we dump the received bitmap to the stream.
+     *
+     * TODO: currently we are safe since we are the only user of the
+     * to_src_file handle (the fault thread is still paused), so it
+     * would be OK even without taking the mutex.  However, the best
+     * way is to take the lock before sending the message header, and
+     * release it after sending the bitmap.
+     */
+    qemu_mutex_lock(&mis->rp_mutex);
+    res = ramblock_recv_bitmap_send(mis->to_src_file, block_name);
+    qemu_mutex_unlock(&mis->rp_mutex);
+
+    trace_migrate_send_rp_recv_bitmap(block_name, res);
+}
+
 MigrationCapabilityStatusList *qmp_query_migrate_capabilities(Error **errp)
 {
     MigrationCapabilityStatusList *head = NULL;
@@ -1572,6 +1612,7 @@ static struct rp_cmd_args {
     [MIG_RP_MSG_PONG]           = { .len =  4, .name = "PONG" },
     [MIG_RP_MSG_REQ_PAGES]      = { .len = 12, .name = "REQ_PAGES" },
     [MIG_RP_MSG_REQ_PAGES_ID]   = { .len = -1, .name = "REQ_PAGES_ID" },
+    [MIG_RP_MSG_RECV_BITMAP]    = { .len = -1, .name = "RECV_BITMAP" },
     [MIG_RP_MSG_MAX]            = { .len = -1, .name = "MAX" },
 };
 
@@ -1616,6 +1657,19 @@ static bool postcopy_pause_return_path_thread(MigrationState *s)
     return true;
 }
 
+static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name)
+{
+    RAMBlock *block = qemu_ram_block_by_name(block_name);
+
+    if (!block) {
+        error_report("%s: invalid block name '%s'", __func__, block_name);
+        return -EINVAL;
+    }
+
+    /* Fetch the received bitmap and refresh the dirty bitmap */
+    return ram_dirty_bitmap_reload(s, block);
+}
+
 /*
  * Handles messages sent on the return path towards the source VM
  *
@@ -1721,6 +1775,20 @@ retry:
             migrate_handle_rp_req_pages(ms, (char *)&buf[13], start, len);
             break;
 
+        case MIG_RP_MSG_RECV_BITMAP:
+            if (header_len < 1) {
+                error_report("%s: missing block name", __func__);
+                mark_source_rp_bad(ms);
+                goto out;
+            }
+            /* Format: len (1B) + idstr (<255B). NUL-terminate the idstr. */
+            buf[buf[0] + 1] = '\0';
+            if (migrate_handle_rp_recv_bitmap(ms, (char *)(buf + 1))) {
+                mark_source_rp_bad(ms);
+                goto out;
+            }
+            break;
+
         default:
             break;
         }
diff --git a/migration/migration.h b/migration/migration.h
index b78b9bd..4051379 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -202,5 +202,7 @@ void migrate_send_rp_pong(MigrationIncomingState *mis,
                           uint32_t value);
 int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
                               ram_addr_t start, size_t len);
+void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
+                                 char *block_name);
 
 #endif
diff --git a/migration/ram.c b/migration/ram.c
index 7e20097..5d938e3 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -182,6 +182,70 @@ void ramblock_recv_bitmap_clear(RAMBlock *rb, void *host_addr)
     clear_bit(ramblock_recv_bitmap_offset(host_addr, rb), rb->receivedmap);
 }
 
+#define  RAMBLOCK_RECV_BITMAP_ENDING  (0x0123456789abcdefULL)
+
+/*
+ * Format: bitmap_size (8 bytes) + whole_bitmap (N bytes).
+ *
+ * Returns >0 if success with sent bytes, or <0 if error.
+ */
+int64_t ramblock_recv_bitmap_send(QEMUFile *file,
+                                  const char *block_name)
+{
+    RAMBlock *block = qemu_ram_block_by_name(block_name);
+    unsigned long *le_bitmap, nbits;
+    uint64_t size;
+
+    if (!block) {
+        error_report("%s: invalid block name: %s", __func__, block_name);
+        return -1;
+    }
+
+    nbits = block->used_length >> TARGET_PAGE_BITS;
+
+    /*
+     * Make sure the tmp bitmap buffer is big enough, e.g., on 32bit
+     * machines we may need 4 more bytes for padding (see below
+     * comment). So extend it a bit before hand.
+     */
+    le_bitmap = bitmap_new(nbits + BITS_PER_LONG);
+
+    /*
+     * Always use little endian when sending the bitmap. This is
+     * required in case the source and destination hosts are not
+     * using the same endianness.
+     */
+    bitmap_to_le(le_bitmap, block->receivedmap, nbits);
+
+    /* Size of the bitmap, in bytes */
+    size = nbits / 8;
+
+    /*
+     * size is always aligned to 8 bytes for 64bit machines, but it
+     * may not be true for 32bit machines. We need this padding to
+     * make sure the migration can survive even between 32bit and
+     * 64bit machines.
+     */
+    size = ROUND_UP(size, 8);
+
+    qemu_put_be64(file, size);
+    qemu_put_buffer(file, (const uint8_t *)le_bitmap, size);
+    /*
+     * Mark as an end, in case the middle part is screwed up due to
+     * some "mysterious" reason.
+     */
+    qemu_put_be64(file, RAMBLOCK_RECV_BITMAP_ENDING);
+    qemu_fflush(file);
+
+    g_free(le_bitmap);
+
+    if (qemu_file_get_error(file)) {
+        return qemu_file_get_error(file);
+    }
+
+    return size + sizeof(size);
+}
+
 /*
  * An outstanding page request, on the source, having been received
  * and queued
@@ -2706,6 +2770,83 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     return ret;
 }
 
+/*
+ * Read the received bitmap, invert it into the initial dirty bitmap.
+ * This is only used when the postcopy migration is paused but wants
+ * to resume from a middle point.
+ */
+int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
+{
+    int ret = -EINVAL;
+    QEMUFile *file = s->rp_state.from_dst_file;
+    unsigned long *le_bitmap, nbits = block->used_length >> TARGET_PAGE_BITS;
+    uint64_t local_size = nbits / 8;
+    uint64_t size, end_mark;
+
+    if (s->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
+        error_report("%s: incorrect state %s", __func__,
+                     MigrationStatus_lookup[s->state]);
+        return -EINVAL;
+    }
+
+    /*
+     * Note: see comments in ramblock_recv_bitmap_send() on why we
+     * need the endianness conversion, and the padding.
+     */
+    local_size = ROUND_UP(local_size, 8);
+
+    /* Add paddings */
+    le_bitmap = bitmap_new(nbits + BITS_PER_LONG);
+
+    size = qemu_get_be64(file);
+
+    /* The size of the bitmap should match with our ramblock */
+    if (size != local_size) {
+        error_report("%s: ramblock '%s' bitmap size mismatch "
+                     "(0x%"PRIx64" != 0x%"PRIx64")", __func__,
+                     block->idstr, size, local_size);
+        ret = -EINVAL;
+        goto out;
+    }
+
+    size = qemu_get_buffer(file, (uint8_t *)le_bitmap, local_size);
+    end_mark = qemu_get_be64(file);
+
+    ret = qemu_file_get_error(file);
+    if (ret || size != local_size) {
+        error_report("%s: read bitmap failed for ramblock '%s': %d",
+                     __func__, block->idstr, ret);
+        ret = -EIO;
+        goto out;
+    }
+
+    if (end_mark != RAMBLOCK_RECV_BITMAP_ENDING) {
+        error_report("%s: ramblock '%s' end mark incorrect: 0x%"PRIx64,
+                     __func__, block->idstr, end_mark);
+        ret = -EINVAL;
+        goto out;
+    }
+
+    /*
+     * Endianness conversion. We are in postcopy (though paused).
+     * The dirty bitmap won't change. We can directly modify it.
+     */
+    bitmap_from_le(block->bmap, le_bitmap, nbits);
+
+    /*
+     * What we received is the "received bitmap". Invert it to be the
+     * initial dirty bitmap for this ramblock.
+     */
+    bitmap_complement(block->bmap, block->bmap, nbits);
+
+    trace_ram_dirty_bitmap_reload(block->idstr);
+
+    ret = 0;
+out:
+    g_free(le_bitmap);
+    return ret;
+}
+
 static SaveVMHandlers savevm_ram_handlers = {
     .save_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
diff --git a/migration/ram.h b/migration/ram.h
index 4db9922..bd4b8ba 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -57,5 +57,8 @@ int ramblock_recv_bitmap_test(RAMBlock *rb, void *host_addr);
 void ramblock_recv_bitmap_set(RAMBlock *rb, void *host_addr);
 void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr, size_t nr);
 void ramblock_recv_bitmap_clear(RAMBlock *rb, void *host_addr);
+int64_t ramblock_recv_bitmap_send(QEMUFile *file,
+                                  const char *block_name);
+int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb);
 
 #endif
diff --git a/migration/savevm.c b/migration/savevm.c
index f532ca0..7f77a31 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1766,7 +1766,7 @@ static int loadvm_handle_recv_bitmap(MigrationIncomingState *mis,
         return -EINVAL;
     }
 
-    /* TODO: send the bitmap back to source */
+    migrate_send_rp_recv_bitmap(mis, block_name);
 
     trace_loadvm_handle_recv_bitmap(block_name);
 
diff --git a/migration/trace-events b/migration/trace-events
index c5f7e41..9960cd8 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -78,6 +78,7 @@ ram_load_postcopy_loop(uint64_t addr, int flags) "@%" PRIx64 " %x"
 ram_postcopy_send_discard_bitmap(void) ""
 ram_save_page(const char *rbname, uint64_t offset, void *host) "%s: offset: 0x%" PRIx64 " host: %p"
 ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: 0x%zx len: 0x%zx"
+ram_dirty_bitmap_reload(char *str) "%s"
 
 # migration/migration.c
 await_return_path_close_on_source_close(void) ""
@@ -89,6 +90,7 @@ migrate_fd_cancel(void) ""
 migrate_handle_rp_req_pages(const char *rbname, size_t start, size_t len) "in %s at 0x%zx len 0x%zx"
 migrate_pending(uint64_t size, uint64_t max, uint64_t post, uint64_t nonpost) "pending size %" PRIu64 " max %" PRIu64 " (post=%" PRIu64 " nonpost=%" PRIu64 ")"
 migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
+migrate_send_rp_recv_bitmap(char *name, int64_t size) "block '%s' size 0x%"PRIx64
 migration_completion_file_err(void) ""
 migration_completion_postcopy_end(void) ""
 migration_completion_postcopy_end_after_complete(void) ""
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Qemu-devel] [RFC v2 21/33] migration: new cmd MIG_CMD_POSTCOPY_RESUME
  2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
                   ` (19 preceding siblings ...)
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 20/33] migration: new message MIG_RP_MSG_RECV_BITMAP Peter Xu
@ 2017-08-30  8:32 ` Peter Xu
  2017-09-22 11:08   ` Dr. David Alan Gilbert
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 22/33] migration: new message MIG_RP_MSG_RESUME_ACK Peter Xu
                   ` (11 subsequent siblings)
  32 siblings, 1 reply; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Introduce this new command, to be sent when the source VM is ready to
resume the paused migration.  What the destination does here is
basically release the fault thread to continue servicing page faults.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/savevm.c     | 33 +++++++++++++++++++++++++++++++++
 migration/savevm.h     |  1 +
 migration/trace-events |  1 +
 3 files changed, 35 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index 7f77a31..e914346 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -77,6 +77,7 @@ enum qemu_vm_cmd {
     MIG_CMD_POSTCOPY_RAM_DISCARD,  /* A list of pages to discard that
                                       were previously sent during
                                       precopy but are dirty. */
+    MIG_CMD_POSTCOPY_RESUME,       /* resume postcopy on dest */
     MIG_CMD_PACKAGED,          /* Send a wrapped stream within this stream */
     MIG_CMD_RECV_BITMAP,       /* Request for recved bitmap on dst */
     MIG_CMD_MAX
@@ -95,6 +96,7 @@ static struct mig_cmd_args {
     [MIG_CMD_POSTCOPY_RUN]     = { .len =  0, .name = "POSTCOPY_RUN" },
     [MIG_CMD_POSTCOPY_RAM_DISCARD] = {
                                    .len = -1, .name = "POSTCOPY_RAM_DISCARD" },
+    [MIG_CMD_POSTCOPY_RESUME]  = { .len =  0, .name = "POSTCOPY_RESUME" },
     [MIG_CMD_PACKAGED]         = { .len =  4, .name = "PACKAGED" },
     [MIG_CMD_RECV_BITMAP]      = { .len = -1, .name = "RECV_BITMAP" },
     [MIG_CMD_MAX]              = { .len = -1, .name = "MAX" },
@@ -931,6 +933,12 @@ void qemu_savevm_send_postcopy_run(QEMUFile *f)
     qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RUN, 0, NULL);
 }
 
+void qemu_savevm_send_postcopy_resume(QEMUFile *f)
+{
+    trace_savevm_send_postcopy_resume();
+    qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RESUME, 0, NULL);
+}
+
 void qemu_savevm_send_recv_bitmap(QEMUFile *f, char *block_name)
 {
     size_t len;
@@ -1682,6 +1690,28 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
     return LOADVM_QUIT;
 }
 
+static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
+{
+    if (mis->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
+        error_report("%s: illegal resume received", __func__);
+        /* Don't fail the load, only for this. */
+        return 0;
+    }
+
+    /*
+     * This means source VM is ready to resume the postcopy migration.
+     * It's time to switch state and release the fault thread to
+     * continue servicing page faults.
+     */
+    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_RECOVER,
+                      MIGRATION_STATUS_POSTCOPY_ACTIVE);
+    qemu_sem_post(&mis->postcopy_pause_sem_fault);
+
+    /* TODO: Tell source that "we are ready" */
+
+    return 0;
+}
+
 /**
  * Immediately following this command is a blob of data containing an embedded
  * chunk of migration stream; read it and load it.
@@ -1847,6 +1877,9 @@ static int loadvm_process_command(QEMUFile *f)
     case MIG_CMD_POSTCOPY_RAM_DISCARD:
         return loadvm_postcopy_ram_handle_discard(mis, len);
 
+    case MIG_CMD_POSTCOPY_RESUME:
+        return loadvm_postcopy_handle_resume(mis);
+
     case MIG_CMD_RECV_BITMAP:
         return loadvm_handle_recv_bitmap(mis, len);
     }
diff --git a/migration/savevm.h b/migration/savevm.h
index 8126b1c..a5f3879 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -46,6 +46,7 @@ int qemu_savevm_send_packaged(QEMUFile *f, const uint8_t *buf, size_t len);
 void qemu_savevm_send_postcopy_advise(QEMUFile *f);
 void qemu_savevm_send_postcopy_listen(QEMUFile *f);
 void qemu_savevm_send_postcopy_run(QEMUFile *f);
+void qemu_savevm_send_postcopy_resume(QEMUFile *f);
 void qemu_savevm_send_recv_bitmap(QEMUFile *f, char *block_name);
 
 void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
diff --git a/migration/trace-events b/migration/trace-events
index 9960cd8..0a1c302 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -35,6 +35,7 @@ savevm_send_open_return_path(void) ""
 savevm_send_ping(uint32_t val) "0x%x"
 savevm_send_postcopy_listen(void) ""
 savevm_send_postcopy_run(void) ""
+savevm_send_postcopy_resume(void) ""
 savevm_send_recv_bitmap(char *name) "%s"
 savevm_state_setup(void) ""
 savevm_state_header(void) ""
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Qemu-devel] [RFC v2 22/33] migration: new message MIG_RP_MSG_RESUME_ACK
  2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
                   ` (20 preceding siblings ...)
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 21/33] migration: new cmd MIG_CMD_POSTCOPY_RESUME Peter Xu
@ 2017-08-30  8:32 ` Peter Xu
  2017-09-22 11:13   ` Dr. David Alan Gilbert
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 23/33] migration: introduce SaveVMHandlers.resume_prepare Peter Xu
                   ` (10 subsequent siblings)
  32 siblings, 1 reply; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Create a new message to reply to MIG_CMD_POSTCOPY_RESUME.  One uint32_t
is used as the payload to let the source know whether the destination
is ready to continue the migration.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c  | 37 +++++++++++++++++++++++++++++++++++++
 migration/migration.h  |  3 +++
 migration/savevm.c     |  3 ++-
 migration/trace-events |  1 +
 4 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 625f19a..4dc564a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -93,6 +93,7 @@ enum mig_rp_message_type {
     MIG_RP_MSG_REQ_PAGES_ID, /* data (start: be64, len: be32, id: string) */
     MIG_RP_MSG_REQ_PAGES,    /* data (start: be64, len: be32) */
     MIG_RP_MSG_RECV_BITMAP,  /* send recved_bitmap back to source */
+    MIG_RP_MSG_RESUME_ACK,   /* tell source that we are ready to resume */
 
     MIG_RP_MSG_MAX
 };
@@ -489,6 +490,14 @@ void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
     trace_migrate_send_rp_recv_bitmap(block_name, res);
 }
 
+void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value)
+{
+    uint32_t buf;
+
+    buf = cpu_to_be32(value);
+    migrate_send_rp_message(mis, MIG_RP_MSG_RESUME_ACK, sizeof(buf), &buf);
+}
+
 MigrationCapabilityStatusList *qmp_query_migrate_capabilities(Error **errp)
 {
     MigrationCapabilityStatusList *head = NULL;
@@ -1613,6 +1622,7 @@ static struct rp_cmd_args {
     [MIG_RP_MSG_REQ_PAGES]      = { .len = 12, .name = "REQ_PAGES" },
     [MIG_RP_MSG_REQ_PAGES_ID]   = { .len = -1, .name = "REQ_PAGES_ID" },
     [MIG_RP_MSG_RECV_BITMAP]    = { .len = -1, .name = "RECV_BITMAP" },
+    [MIG_RP_MSG_RESUME_ACK]     = { .len =  4, .name = "RESUME_ACK" },
     [MIG_RP_MSG_MAX]            = { .len = -1, .name = "MAX" },
 };
 
@@ -1670,6 +1680,25 @@ static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name)
     return ram_dirty_bitmap_reload(s, block);
 }
 
+static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value)
+{
+    trace_source_return_path_thread_resume_ack(value);
+
+    if (value != MIGRATION_RESUME_ACK_VALUE) {
+        error_report("%s: illegal resume_ack value %"PRIu32,
+                     __func__, value);
+        return -1;
+    }
+
+    /* Now both sides are active. */
+    migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_RECOVER,
+                      MIGRATION_STATUS_POSTCOPY_ACTIVE);
+
+    /* TODO: notify send thread that time to continue send pages */
+
+    return 0;
+}
+
 /*
  * Handles messages sent on the return path towards the source VM
  *
@@ -1789,6 +1818,14 @@ retry:
             }
             break;
 
+        case MIG_RP_MSG_RESUME_ACK:
+            tmp32 = ldl_be_p(buf);
+            if (migrate_handle_rp_resume_ack(ms, tmp32)) {
+                mark_source_rp_bad(ms);
+                goto out;
+            }
+            break;
+
         default:
             break;
         }
diff --git a/migration/migration.h b/migration/migration.h
index 4051379..a3a0582 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -21,6 +21,8 @@
 #include "qemu/coroutine_int.h"
 #include "hw/qdev.h"
 
+#define  MIGRATION_RESUME_ACK_VALUE  (1)
+
 /* State for the incoming migration */
 struct MigrationIncomingState {
     QEMUFile *from_src_file;
@@ -204,5 +206,6 @@ int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
                               ram_addr_t start, size_t len);
 void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
                                  char *block_name);
+void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
 
 #endif
diff --git a/migration/savevm.c b/migration/savevm.c
index e914346..7fd5390 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1707,7 +1707,8 @@ static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
                       MIGRATION_STATUS_POSTCOPY_ACTIVE);
     qemu_sem_post(&mis->postcopy_pause_sem_fault);
 
-    /* TODO: Tell source that "we are ready" */
+    /* Tell source that "we are ready" */
+    migrate_send_rp_resume_ack(mis, MIGRATION_RESUME_ACK_VALUE);
 
     return 0;
 }
diff --git a/migration/trace-events b/migration/trace-events
index 0a1c302..a929bc7 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -117,6 +117,7 @@ source_return_path_thread_entry(void) ""
 source_return_path_thread_loop_top(void) ""
 source_return_path_thread_pong(uint32_t val) "0x%x"
 source_return_path_thread_shut(uint32_t val) "0x%x"
+source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32
 migrate_global_state_post_load(const char *state) "loaded state: %s"
 migrate_global_state_pre_save(const char *state) "saved state: %s"
 migration_thread_low_pending(uint64_t pending) "%" PRIu64
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Qemu-devel] [RFC v2 23/33] migration: introduce SaveVMHandlers.resume_prepare
  2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
                   ` (21 preceding siblings ...)
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 22/33] migration: new message MIG_RP_MSG_RESUME_ACK Peter Xu
@ 2017-08-30  8:32 ` Peter Xu
  2017-09-22 11:17   ` Dr. David Alan Gilbert
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 24/33] migration: synchronize dirty bitmap for resume Peter Xu
                   ` (9 subsequent siblings)
  32 siblings, 1 reply; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

This is a hook function, called when a postcopy migration wants to
resume from a failure.  Each module should provide its own recovery
logic here, before we switch to the postcopy-active state.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/migration/register.h |  2 ++
 migration/migration.c        | 20 +++++++++++++++++++-
 migration/savevm.c           | 25 +++++++++++++++++++++++++
 migration/savevm.h           |  1 +
 migration/trace-events       |  1 +
 5 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/include/migration/register.h b/include/migration/register.h
index a0f1edd..b669362 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -41,6 +41,8 @@ typedef struct SaveVMHandlers {
     LoadStateHandler *load_state;
     int (*load_setup)(QEMUFile *f, void *opaque);
     int (*load_cleanup)(void *opaque);
+    /* Called when postcopy migration wants to resume from failure */
+    int (*resume_prepare)(MigrationState *s, void *opaque);
 } SaveVMHandlers;
 
 int register_savevm_live(DeviceState *dev,
diff --git a/migration/migration.c b/migration/migration.c
index 4dc564a..19b7f3a5 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2172,7 +2172,25 @@ typedef enum MigThrError {
 /* Return zero if success, or <0 for error */
 static int postcopy_do_resume(MigrationState *s)
 {
-    /* TODO: do the resume logic */
+    int ret;
+
+    /*
+     * Call all the resume_prepare() hooks, so that modules can be
+     * ready for the migration resume.
+     */
+    ret = qemu_savevm_state_resume_prepare(s);
+    if (ret) {
+        error_report("%s: resume_prepare() failure detected: %d",
+                     __func__, ret);
+        return ret;
+    }
+
+    /*
+     * TODO: handshake with dest using MIG_CMD_RESUME,
+     * MIG_RP_MSG_RESUME_ACK, then switch source state to
+     * "postcopy-active"
+     */
+
     return 0;
 }
 
diff --git a/migration/savevm.c b/migration/savevm.c
index 7fd5390..b86c9c6 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1004,6 +1004,31 @@ void qemu_savevm_state_setup(QEMUFile *f)
     }
 }
 
+int qemu_savevm_state_resume_prepare(MigrationState *s)
+{
+    SaveStateEntry *se;
+    int ret;
+
+    trace_savevm_state_resume_prepare();
+
+    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        if (!se->ops || !se->ops->resume_prepare) {
+            continue;
+        }
+        if (se->ops && se->ops->is_active) {
+            if (!se->ops->is_active(se->opaque)) {
+                continue;
+            }
+        }
+        ret = se->ops->resume_prepare(s, se->opaque);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
 /*
  * this function has three return values:
  *   negative: there was one error, and we have -errno.
diff --git a/migration/savevm.h b/migration/savevm.h
index a5f3879..3193f04 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -31,6 +31,7 @@
 
 bool qemu_savevm_state_blocked(Error **errp);
 void qemu_savevm_state_setup(QEMUFile *f);
+int qemu_savevm_state_resume_prepare(MigrationState *s);
 void qemu_savevm_state_header(QEMUFile *f);
 int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy);
 void qemu_savevm_state_cleanup(void);
diff --git a/migration/trace-events b/migration/trace-events
index a929bc7..61b0d49 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -38,6 +38,7 @@ savevm_send_postcopy_run(void) ""
 savevm_send_postcopy_resume(void) ""
 savevm_send_recv_bitmap(char *name) "%s"
 savevm_state_setup(void) ""
+savevm_state_resume_prepare(void) ""
 savevm_state_header(void) ""
 savevm_state_iterate(void) ""
 savevm_state_cleanup(void) ""
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Qemu-devel] [RFC v2 24/33] migration: synchronize dirty bitmap for resume
  2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
                   ` (22 preceding siblings ...)
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 23/33] migration: introduce SaveVMHandlers.resume_prepare Peter Xu
@ 2017-08-30  8:32 ` Peter Xu
  2017-09-22 11:33   ` Dr. David Alan Gilbert
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 25/33] migration: setup ramstate " Peter Xu
                   ` (8 subsequent siblings)
  32 siblings, 1 reply; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

This patch implements the first part of core RAM resume logic for
postcopy. ram_resume_prepare() is provided for the work.

When the migration is interrupted by a network failure, the dirty
bitmap on the source side will be meaningless: even when a dirty bit
is cleared, it is still possible that the sent page was lost on the
way to the destination.  So instead of continuing the migration with
the old dirty bitmap on the source, we ask the destination side to
send back its received bitmap, then invert it to be our initial dirty
bitmap.

The source side send thread will issue MIG_CMD_RECV_BITMAP requests,
one per ramblock, to ask for the received bitmaps. On the destination
side, a MIG_RP_MSG_RECV_BITMAP reply will be issued for each request,
carrying the requested bitmap. The data is received on the source's
return-path thread, and the main migration thread is notified once all
the ramblock bitmaps have been synchronized.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c  |  4 +++
 migration/migration.h  |  1 +
 migration/ram.c        | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++
 migration/trace-events |  4 +++
 4 files changed, 76 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 19b7f3a5..19aed72 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2605,6 +2605,8 @@ static void migration_instance_finalize(Object *obj)
 
     g_free(params->tls_hostname);
     g_free(params->tls_creds);
+
+    qemu_sem_destroy(&ms->rp_state.rp_sem);
 }
 
 static void migration_instance_init(Object *obj)
@@ -2629,6 +2631,8 @@ static void migration_instance_init(Object *obj)
     params->has_downtime_limit = true;
     params->has_x_checkpoint_delay = true;
     params->has_block_incremental = true;
+
+    qemu_sem_init(&ms->rp_state.rp_sem, 1);
 }
 
 /*
diff --git a/migration/migration.h b/migration/migration.h
index a3a0582..d041369 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -107,6 +107,7 @@ struct MigrationState
         QEMUFile     *from_dst_file;
         QemuThread    rp_thread;
         bool          error;
+        QemuSemaphore rp_sem;
     } rp_state;
 
     double mbps;
diff --git a/migration/ram.c b/migration/ram.c
index 5d938e3..afabcf5 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -47,6 +47,7 @@
 #include "exec/target_page.h"
 #include "qemu/rcu_queue.h"
 #include "migration/colo.h"
+#include "savevm.h"
 
 /***********************************************************/
 /* ram save/restore */
@@ -295,6 +296,8 @@ struct RAMState {
     RAMBlock *last_req_rb;
     /* Queue of outstanding page requests from the destination */
     QemuMutex src_page_req_mutex;
+    /* Count of ramblocks left to sync dirty bitmaps. Only used for recovery */
+    int ramblock_to_sync;
     QSIMPLEQ_HEAD(src_page_requests, RAMSrcPageRequest) src_page_requests;
 };
 typedef struct RAMState RAMState;
@@ -2770,6 +2773,56 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     return ret;
 }
 
+/* Sync all dirty bitmaps with the destination VM.  */
+static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs)
+{
+    RAMBlock *block;
+    QEMUFile *file = s->to_dst_file;
+    int ramblock_count = 0;
+
+    trace_ram_dirty_bitmap_sync_start();
+
+    /*
+     * We do this in the following order:
+     *
+     * 1. calculate block count
+     * 2. fill in the count to N
+     * 3. send MIG_CMD_RECV_BITMAP requests
+     * 4. wait on the semaphore until N -> 0
+     */
+
+    RAMBLOCK_FOREACH(block) {
+        ramblock_count++;
+    }
+
+    atomic_set(&rs->ramblock_to_sync, ramblock_count);
+
+    RAMBLOCK_FOREACH(block) {
+        qemu_savevm_send_recv_bitmap(file, block->idstr);
+    }
+
+    trace_ram_dirty_bitmap_sync_wait();
+
+    /* Wait until all the ramblocks' dirty bitmaps are synced */
+    while (atomic_read(&rs->ramblock_to_sync)) {
+        qemu_sem_wait(&s->rp_state.rp_sem);
+    }
+
+    trace_ram_dirty_bitmap_sync_complete();
+
+    return 0;
+}
+
+static void ram_dirty_bitmap_reload_notify(MigrationState *s)
+{
+    atomic_dec(&ram_state->ramblock_to_sync);
+    if (ram_state->ramblock_to_sync == 0) {
+        /* Make sure the other thread gets the latest */
+        trace_ram_dirty_bitmap_sync_notify();
+        qemu_sem_post(&s->rp_state.rp_sem);
+    }
+}
+
 /*
  * Read the received bitmap, revert it as the initial dirty bitmap.
  * This is only used when the postcopy migration is paused but wants
@@ -2841,12 +2894,25 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
 
     trace_ram_dirty_bitmap_reload(block->idstr);
 
+    /*
+     * We successfully synced the bitmap for the current ramblock. If this
+     * is the last one to sync, we need to notify the main send thread.
+     */
+    ram_dirty_bitmap_reload_notify(s);
+
     ret = 0;
 out:
     free(le_bitmap);
     return ret;
 }
 
+static int ram_resume_prepare(MigrationState *s, void *opaque)
+{
+    RAMState *rs = *(RAMState **)opaque;
+
+    return ram_dirty_bitmap_sync_all(s, rs);
+}
+
 static SaveVMHandlers savevm_ram_handlers = {
     .save_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
@@ -2857,6 +2923,7 @@ static SaveVMHandlers savevm_ram_handlers = {
     .save_cleanup = ram_save_cleanup,
     .load_setup = ram_load_setup,
     .load_cleanup = ram_load_cleanup,
+    .resume_prepare = ram_resume_prepare,
 };
 
 void ram_mig_init(void)
diff --git a/migration/trace-events b/migration/trace-events
index 61b0d49..8962916 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -81,6 +81,10 @@ ram_postcopy_send_discard_bitmap(void) ""
 ram_save_page(const char *rbname, uint64_t offset, void *host) "%s: offset: 0x%" PRIx64 " host: %p"
 ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: 0x%zx len: 0x%zx"
 ram_dirty_bitmap_reload(char *str) "%s"
+ram_dirty_bitmap_sync_start(void) ""
+ram_dirty_bitmap_sync_wait(void) ""
+ram_dirty_bitmap_sync_notify(void) ""
+ram_dirty_bitmap_sync_complete(void) ""
 
 # migration/migration.c
 await_return_path_close_on_source_close(void) ""
-- 
2.7.4


* [Qemu-devel] [RFC v2 25/33] migration: setup ramstate for resume
  2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
                   ` (23 preceding siblings ...)
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 24/33] migration: synchronize dirty bitmap for resume Peter Xu
@ 2017-08-30  8:32 ` Peter Xu
  2017-09-22 11:53   ` Dr. David Alan Gilbert
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 26/33] migration: final handshake for the resume Peter Xu
                   ` (7 subsequent siblings)
  32 siblings, 1 reply; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

After updating the dirty bitmaps of the ramblocks, we also need to
update the critical fields in RAMState to make sure it is ready for a
resume.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c        | 37 ++++++++++++++++++++++++++++++++++++-
 migration/trace-events |  1 +
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index afabcf5..c5d9028 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1986,6 +1986,33 @@ static int ram_state_init(RAMState **rsp)
     return 0;
 }
 
+static void ram_state_resume_prepare(RAMState *rs)
+{
+    RAMBlock *block;
+    long pages = 0;
+
+    /*
+     * Postcopy is not using xbzrle/compression, so no need for that.
+     * Also, since the source is already halted, we don't need to care
+     * about dirty page logging either.
+     */
+
+    RAMBLOCK_FOREACH(block) {
+        pages += bitmap_count_one(block->bmap,
+                                  block->used_length >> TARGET_PAGE_BITS);
+    }
+
+    /* This may not be aligned with current bitmaps. Recalculate. */
+    rs->migration_dirty_pages = pages;
+
+    rs->last_seen_block = NULL;
+    rs->last_sent_block = NULL;
+    rs->last_page = 0;
+    rs->last_version = ram_list.version;
+
+    trace_ram_state_resume_prepare(pages);
+}
+
 /*
  * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
  * long-running RCU critical section.  When rcu-reclaims in the code
@@ -2909,8 +2936,16 @@ out:
 static int ram_resume_prepare(MigrationState *s, void *opaque)
 {
     RAMState *rs = *(RAMState **)opaque;
+    int ret;
 
-    return ram_dirty_bitmap_sync_all(s, rs);
+    ret = ram_dirty_bitmap_sync_all(s, rs);
+    if (ret) {
+        return ret;
+    }
+
+    ram_state_resume_prepare(rs);
+
+    return 0;
 }
 
 static SaveVMHandlers savevm_ram_handlers = {
diff --git a/migration/trace-events b/migration/trace-events
index 8962916..6e06283 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -85,6 +85,7 @@ ram_dirty_bitmap_sync_start(void) ""
 ram_dirty_bitmap_sync_wait(void) ""
 ram_dirty_bitmap_sync_notify(void) ""
 ram_dirty_bitmap_sync_complete(void) ""
+ram_state_resume_prepare(long v) "%ld"
 
 # migration/migration.c
 await_return_path_close_on_source_close(void) ""
-- 
2.7.4


* [Qemu-devel] [RFC v2 26/33] migration: final handshake for the resume
  2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
                   ` (24 preceding siblings ...)
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 25/33] migration: setup ramstate " Peter Xu
@ 2017-08-30  8:32 ` Peter Xu
  2017-09-22 11:56   ` Dr. David Alan Gilbert
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 27/33] migration: free SocketAddress where allocated Peter Xu
                   ` (6 subsequent siblings)
  32 siblings, 1 reply; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Finish the last step of the recovery: the final handshake.

First, the source sends a MIG_CMD_RESUME to the destination, telling it
that the source is ready to resume.

Then the destination replies with MIG_RP_MSG_RESUME_ACK, telling the
source that the destination is ready to resume as well (after switching
to the postcopy-active state).

When the source receives the RESUME_ACK, it switches its state to
postcopy-active, and the recovery is finally complete.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 19aed72..c9b7085 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1694,7 +1694,8 @@ static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value)
     migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_RECOVER,
                       MIGRATION_STATUS_POSTCOPY_ACTIVE);
 
-    /* TODO: notify send thread that time to continue send pages */
+    /* Notify the send thread that it's time to continue sending pages */
+    qemu_sem_post(&s->rp_state.rp_sem);
 
     return 0;
 }
@@ -2169,6 +2170,21 @@ typedef enum MigThrError {
     MIG_THR_ERR_FATAL = 2,
 } MigThrError;
 
+static int postcopy_resume_handshake(MigrationState *s)
+{
+    qemu_savevm_send_postcopy_resume(s->to_dst_file);
+
+    while (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
+        qemu_sem_wait(&s->rp_state.rp_sem);
+    }
+
+    if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        return 0;
+    }
+
+    return -1;
+}
+
 /* Return zero if success, or <0 for error */
 static int postcopy_do_resume(MigrationState *s)
 {
@@ -2186,10 +2202,14 @@ static int postcopy_do_resume(MigrationState *s)
     }
 
     /*
-     * TODO: handshake with dest using MIG_CMD_RESUME,
-     * MIG_RP_MSG_RESUME_ACK, then switch source state to
-     * "postcopy-active"
+     * Last handshake with destination on the resume (destination will
+     * switch to postcopy-active afterwards)
      */
+    ret = postcopy_resume_handshake(s);
+    if (ret) {
+        error_report("%s: handshake failed: %d", __func__, ret);
+        return ret;
+    }
 
     return 0;
 }
-- 
2.7.4


* [Qemu-devel] [RFC v2 27/33] migration: free SocketAddress where allocated
  2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
                   ` (25 preceding siblings ...)
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 26/33] migration: final handshake for the resume Peter Xu
@ 2017-08-30  8:32 ` Peter Xu
  2017-09-22 20:08   ` Dr. David Alan Gilbert
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 28/33] migration: return incoming task tag for sockets Peter Xu
                   ` (5 subsequent siblings)
  32 siblings, 1 reply; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Freeing the SocketAddress struct in socket_start_incoming_migration is
slightly confusing. Let's free the address in the same context where we
allocated it.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/socket.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration/socket.c b/migration/socket.c
index 757d382..9fc6cb3 100644
--- a/migration/socket.c
+++ b/migration/socket.c
@@ -168,7 +168,6 @@ static void socket_start_incoming_migration(SocketAddress *saddr,
 
     if (qio_channel_socket_listen_sync(listen_ioc, saddr, errp) < 0) {
         object_unref(OBJECT(listen_ioc));
-        qapi_free_SocketAddress(saddr);
         return;
     }
 
@@ -177,7 +176,6 @@ static void socket_start_incoming_migration(SocketAddress *saddr,
                           socket_accept_incoming_migration,
                           listen_ioc,
                           (GDestroyNotify)object_unref);
-    qapi_free_SocketAddress(saddr);
 }
 
 void tcp_start_incoming_migration(const char *host_port, Error **errp)
@@ -188,10 +186,12 @@ void tcp_start_incoming_migration(const char *host_port, Error **errp)
         socket_start_incoming_migration(saddr, &err);
     }
     error_propagate(errp, err);
+    qapi_free_SocketAddress(saddr);
 }
 
 void unix_start_incoming_migration(const char *path, Error **errp)
 {
     SocketAddress *saddr = unix_build_address(path);
     socket_start_incoming_migration(saddr, errp);
+    qapi_free_SocketAddress(saddr);
 }
-- 
2.7.4


* [Qemu-devel] [RFC v2 28/33] migration: return incoming task tag for sockets
  2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
                   ` (26 preceding siblings ...)
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 27/33] migration: free SocketAddress where allocated Peter Xu
@ 2017-08-30  8:32 ` Peter Xu
  2017-09-22 20:11   ` Dr. David Alan Gilbert
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 29/33] migration: return incoming task tag for exec Peter Xu
                   ` (4 subsequent siblings)
  32 siblings, 1 reply; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

For socket-based incoming migration, we attach a background task to the
main loop to handle accepting connections. We never had a way to
destroy it before, other than by finishing the migration.

Let socket_start_incoming_migration() return the source tag of the
listening async work, so that we can clean it up in the future.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/socket.c | 36 ++++++++++++++++++++++++------------
 migration/socket.h |  4 ++--
 2 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/migration/socket.c b/migration/socket.c
index 9fc6cb3..6ee51ef 100644
--- a/migration/socket.c
+++ b/migration/socket.c
@@ -158,8 +158,12 @@ out:
 }
 
 
-static void socket_start_incoming_migration(SocketAddress *saddr,
-                                            Error **errp)
+/*
+ * Returns the tag ID of the watch that is attached to global main
+ * loop (>0), or zero if failure detected.
+ */
+static guint socket_start_incoming_migration(SocketAddress *saddr,
+                                             Error **errp)
 {
     QIOChannelSocket *listen_ioc = qio_channel_socket_new();
 
@@ -168,30 +172,38 @@ static void socket_start_incoming_migration(SocketAddress *saddr,
 
     if (qio_channel_socket_listen_sync(listen_ioc, saddr, errp) < 0) {
         object_unref(OBJECT(listen_ioc));
-        return;
+        return 0;
     }
 
-    qio_channel_add_watch(QIO_CHANNEL(listen_ioc),
-                          G_IO_IN,
-                          socket_accept_incoming_migration,
-                          listen_ioc,
-                          (GDestroyNotify)object_unref);
+    return qio_channel_add_watch(QIO_CHANNEL(listen_ioc),
+                                 G_IO_IN,
+                                 socket_accept_incoming_migration,
+                                 listen_ioc,
+                                 (GDestroyNotify)object_unref);
 }
 
-void tcp_start_incoming_migration(const char *host_port, Error **errp)
+guint tcp_start_incoming_migration(const char *host_port, Error **errp)
 {
     Error *err = NULL;
     SocketAddress *saddr = tcp_build_address(host_port, &err);
+    guint tag;
+
     if (!err) {
-        socket_start_incoming_migration(saddr, &err);
+        tag = socket_start_incoming_migration(saddr, &err);
     }
     error_propagate(errp, err);
     qapi_free_SocketAddress(saddr);
+
+    return tag;
 }
 
-void unix_start_incoming_migration(const char *path, Error **errp)
+guint unix_start_incoming_migration(const char *path, Error **errp)
 {
     SocketAddress *saddr = unix_build_address(path);
-    socket_start_incoming_migration(saddr, errp);
+    guint tag;
+
+    tag = socket_start_incoming_migration(saddr, errp);
     qapi_free_SocketAddress(saddr);
+
+    return tag;
 }
diff --git a/migration/socket.h b/migration/socket.h
index 6b91e9d..bc8a59a 100644
--- a/migration/socket.h
+++ b/migration/socket.h
@@ -16,12 +16,12 @@
 
 #ifndef QEMU_MIGRATION_SOCKET_H
 #define QEMU_MIGRATION_SOCKET_H
-void tcp_start_incoming_migration(const char *host_port, Error **errp);
+guint tcp_start_incoming_migration(const char *host_port, Error **errp);
 
 void tcp_start_outgoing_migration(MigrationState *s, const char *host_port,
                                   Error **errp);
 
-void unix_start_incoming_migration(const char *path, Error **errp);
+guint unix_start_incoming_migration(const char *path, Error **errp);
 
 void unix_start_outgoing_migration(MigrationState *s, const char *path,
                                    Error **errp);
-- 
2.7.4


* [Qemu-devel] [RFC v2 29/33] migration: return incoming task tag for exec
  2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
                   ` (27 preceding siblings ...)
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 28/33] migration: return incoming task tag for sockets Peter Xu
@ 2017-08-30  8:32 ` Peter Xu
  2017-09-22 20:15   ` Dr. David Alan Gilbert
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 30/33] migration: return incoming task tag for fd Peter Xu
                   ` (3 subsequent siblings)
  32 siblings, 1 reply; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Return the async task tag for exec typed incoming migration in
exec_start_incoming_migration().

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/exec.c | 18 +++++++++++-------
 migration/exec.h |  2 +-
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/migration/exec.c b/migration/exec.c
index 08b599e..ef1fb4c 100644
--- a/migration/exec.c
+++ b/migration/exec.c
@@ -52,7 +52,11 @@ static gboolean exec_accept_incoming_migration(QIOChannel *ioc,
     return FALSE; /* unregister */
 }
 
-void exec_start_incoming_migration(const char *command, Error **errp)
+/*
+ * Returns the tag ID of the watch that is attached to global main
+ * loop (>0), or zero if failure detected.
+ */
+guint exec_start_incoming_migration(const char *command, Error **errp)
 {
     QIOChannel *ioc;
     const char *argv[] = { "/bin/sh", "-c", command, NULL };
@@ -62,13 +66,13 @@ void exec_start_incoming_migration(const char *command, Error **errp)
                                                     O_RDWR,
                                                     errp));
     if (!ioc) {
-        return;
+        return 0;
     }
 
     qio_channel_set_name(ioc, "migration-exec-incoming");
-    qio_channel_add_watch(ioc,
-                          G_IO_IN,
-                          exec_accept_incoming_migration,
-                          NULL,
-                          NULL);
+    return qio_channel_add_watch(ioc,
+                                 G_IO_IN,
+                                 exec_accept_incoming_migration,
+                                 NULL,
+                                 NULL);
 }
diff --git a/migration/exec.h b/migration/exec.h
index b210ffd..0a7aada 100644
--- a/migration/exec.h
+++ b/migration/exec.h
@@ -19,7 +19,7 @@
 
 #ifndef QEMU_MIGRATION_EXEC_H
 #define QEMU_MIGRATION_EXEC_H
-void exec_start_incoming_migration(const char *host_port, Error **errp);
+guint exec_start_incoming_migration(const char *host_port, Error **errp);
 
 void exec_start_outgoing_migration(MigrationState *s, const char *host_port,
                                    Error **errp);
-- 
2.7.4


* [Qemu-devel] [RFC v2 30/33] migration: return incoming task tag for fd
  2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
                   ` (28 preceding siblings ...)
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 29/33] migration: return incoming task tag for exec Peter Xu
@ 2017-08-30  8:32 ` Peter Xu
  2017-09-22 20:15   ` Dr. David Alan Gilbert
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 31/33] migration: store listen task tag Peter Xu
                   ` (2 subsequent siblings)
  32 siblings, 1 reply; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Let fd_start_incoming_migration() return the task tag.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/fd.c | 18 +++++++++++-------
 migration/fd.h |  2 +-
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/migration/fd.c b/migration/fd.c
index 30f5258..e9a548c 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -52,7 +52,11 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc,
     return FALSE; /* unregister */
 }
 
-void fd_start_incoming_migration(const char *infd, Error **errp)
+/*
+ * Returns the tag ID of the watch that is attached to global main
+ * loop (>0), or zero if failure detected.
+ */
+guint fd_start_incoming_migration(const char *infd, Error **errp)
 {
     QIOChannel *ioc;
     int fd;
@@ -63,13 +67,13 @@ void fd_start_incoming_migration(const char *infd, Error **errp)
     ioc = qio_channel_new_fd(fd, errp);
     if (!ioc) {
         close(fd);
-        return;
+        return 0;
     }
 
     qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-incoming");
-    qio_channel_add_watch(ioc,
-                          G_IO_IN,
-                          fd_accept_incoming_migration,
-                          NULL,
-                          NULL);
+    return qio_channel_add_watch(ioc,
+                                 G_IO_IN,
+                                 fd_accept_incoming_migration,
+                                 NULL,
+                                 NULL);
 }
diff --git a/migration/fd.h b/migration/fd.h
index a14a63c..94cdea8 100644
--- a/migration/fd.h
+++ b/migration/fd.h
@@ -16,7 +16,7 @@
 
 #ifndef QEMU_MIGRATION_FD_H
 #define QEMU_MIGRATION_FD_H
-void fd_start_incoming_migration(const char *path, Error **errp);
+guint fd_start_incoming_migration(const char *path, Error **errp);
 
 void fd_start_outgoing_migration(MigrationState *s, const char *fdname,
                                  Error **errp);
-- 
2.7.4


* [Qemu-devel] [RFC v2 31/33] migration: store listen task tag
  2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
                   ` (29 preceding siblings ...)
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 30/33] migration: return incoming task tag for fd Peter Xu
@ 2017-08-30  8:32 ` Peter Xu
  2017-09-22 20:17   ` Dr. David Alan Gilbert
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 32/33] migration: allow migrate_incoming for paused VM Peter Xu
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 33/33] migration: init dst in migration_object_init too Peter Xu
  32 siblings, 1 reply; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Store the task tag for the migration types tcp/unix/fd/exec in the
current MigrationIncomingState struct.

For deferred migration, there is no need to store a task tag since
there is no task running in the main loop at all. For RDMA, mark it as
a TODO.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 22 ++++++++++++++++++----
 migration/migration.h |  2 ++
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index c9b7085..daf356b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -171,6 +171,7 @@ void migration_incoming_state_destroy(void)
         mis->from_src_file = NULL;
     }
 
+    mis->listen_task_tag = 0;
     qemu_event_destroy(&mis->main_thread_load_event);
 }
 
@@ -265,25 +266,31 @@ int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char *rbname,
 void qemu_start_incoming_migration(const char *uri, Error **errp)
 {
     const char *p;
+    guint task_tag = 0;
+    MigrationIncomingState *mis = migration_incoming_get_current();
 
     qapi_event_send_migration(MIGRATION_STATUS_SETUP, &error_abort);
     if (!strcmp(uri, "defer")) {
         deferred_incoming_migration(errp);
     } else if (strstart(uri, "tcp:", &p)) {
-        tcp_start_incoming_migration(p, errp);
+        task_tag = tcp_start_incoming_migration(p, errp);
 #ifdef CONFIG_RDMA
     } else if (strstart(uri, "rdma:", &p)) {
+        /* TODO: store task tag for RDMA migrations */
         rdma_start_incoming_migration(p, errp);
 #endif
     } else if (strstart(uri, "exec:", &p)) {
-        exec_start_incoming_migration(p, errp);
+        task_tag = exec_start_incoming_migration(p, errp);
     } else if (strstart(uri, "unix:", &p)) {
-        unix_start_incoming_migration(p, errp);
+        task_tag = unix_start_incoming_migration(p, errp);
     } else if (strstart(uri, "fd:", &p)) {
-        fd_start_incoming_migration(p, errp);
+        task_tag = fd_start_incoming_migration(p, errp);
     } else {
         error_setg(errp, "unknown migration protocol: %s", uri);
+        return;
     }
+
+    mis->listen_task_tag = task_tag;
 }
 
 static void process_incoming_migration_bh(void *opaque)
@@ -422,6 +429,13 @@ void migration_fd_process_incoming(QEMUFile *f)
         co = qemu_coroutine_create(process_incoming_migration_co, f);
         qemu_coroutine_enter(co);
     }
+
+    /*
+     * When reach here, we should not need the listening port any
+     * more. We'll detach the listening task soon, let's reset the
+     * listen task tag.
+     */
+    mis->listen_task_tag = 0;
 }
 
 /*
diff --git a/migration/migration.h b/migration/migration.h
index d041369..1f4faef 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -26,6 +26,8 @@
 /* State for the incoming migration */
 struct MigrationIncomingState {
     QEMUFile *from_src_file;
+    /* Task tag for incoming listen port. Valid when >0. */
+    guint listen_task_tag;
 
     /*
      * Free at the start of the main state load, set as the main thread finishes
-- 
2.7.4


* [Qemu-devel] [RFC v2 32/33] migration: allow migrate_incoming for paused VM
  2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
                   ` (30 preceding siblings ...)
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 31/33] migration: store listen task tag Peter Xu
@ 2017-08-30  8:32 ` Peter Xu
  2017-09-22 20:32   ` Dr. David Alan Gilbert
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 33/33] migration: init dst in migration_object_init too Peter Xu
  32 siblings, 1 reply; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

The migrate_incoming command was previously only used when
"-incoming defer" was provided on the command line, to defer the
creation of the incoming migration channel.

However, there is a similar requirement when we are paused during
postcopy migration: the old incoming channel might have been destroyed
already, and we may need a new channel for the recovery to happen.

This patch leverages the same interface, but allows the user to specify
an incoming migration channel even for a paused postcopy migration.

Meanwhile, migration listening ports are now always detached manually
using the tag, rather than via the return values of the dispatchers.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/exec.c      |  2 +-
 migration/fd.c        |  2 +-
 migration/migration.c | 39 +++++++++++++++++++++++++++++----------
 migration/socket.c    |  2 +-
 4 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/migration/exec.c b/migration/exec.c
index ef1fb4c..26fc37d 100644
--- a/migration/exec.c
+++ b/migration/exec.c
@@ -49,7 +49,7 @@ static gboolean exec_accept_incoming_migration(QIOChannel *ioc,
 {
     migration_channel_process_incoming(ioc);
     object_unref(OBJECT(ioc));
-    return FALSE; /* unregister */
+    return TRUE; /* keep it registered */
 }
 
 /*
diff --git a/migration/fd.c b/migration/fd.c
index e9a548c..7d0aefa 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -49,7 +49,7 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc,
 {
     migration_channel_process_incoming(ioc);
     object_unref(OBJECT(ioc));
-    return FALSE; /* unregister */
+    return TRUE; /* keep it registered */
 }
 
 /*
diff --git a/migration/migration.c b/migration/migration.c
index daf356b..5812478 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -175,6 +175,17 @@ void migration_incoming_state_destroy(void)
     qemu_event_destroy(&mis->main_thread_load_event);
 }
 
+static bool migrate_incoming_detach_listen(MigrationIncomingState *mis)
+{
+    if (mis->listen_task_tag) {
+        /* Never fail */
+        g_source_remove(mis->listen_task_tag);
+        mis->listen_task_tag = 0;
+        return true;
+    }
+    return false;
+}
+
 static void migrate_generate_event(int new_state)
 {
     if (migrate_use_events()) {
@@ -432,10 +443,9 @@ void migration_fd_process_incoming(QEMUFile *f)
 
     /*
      * When reach here, we should not need the listening port any
-     * more. We'll detach the listening task soon, let's reset the
-     * listen task tag.
+     * more.  Detach the listening port explicitly.
      */
-    mis->listen_task_tag = 0;
+    migrate_incoming_detach_listen(mis);
 }
 
 /*
@@ -1291,14 +1301,25 @@ void migrate_del_blocker(Error *reason)
 void qmp_migrate_incoming(const char *uri, Error **errp)
 {
     Error *local_err = NULL;
-    static bool once = true;
+    MigrationIncomingState *mis = migration_incoming_get_current();
 
-    if (!deferred_incoming) {
-        error_setg(errp, "For use with '-incoming defer'");
+    if (!deferred_incoming &&
+        mis->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
+        error_setg(errp, "For use with '-incoming defer'"
+                   " or PAUSED postcopy migration only.");
         return;
     }
-    if (!once) {
-        error_setg(errp, "The incoming migration has already been started");
+
+    /*
+     * Destroy the existing listening task if it exists.  Logically
+     * this should never happen (for either deferred or postcopy
+     * migration, the listening task should already be detached), so
+     * raise an error, but still detach it safely.
+     */
+    if (migrate_incoming_detach_listen(mis)) {
+        error_report("%s: detected existing listen channel, "
+                     "while it should not exist", __func__);
+        /* Continue */
     }
 
     qemu_start_incoming_migration(uri, &local_err);
@@ -1307,8 +1328,6 @@ void qmp_migrate_incoming(const char *uri, Error **errp)
         error_propagate(errp, local_err);
         return;
     }
-
-    once = false;
 }
 
 bool migration_is_blocked(Error **errp)
diff --git a/migration/socket.c b/migration/socket.c
index 6ee51ef..e3e453f 100644
--- a/migration/socket.c
+++ b/migration/socket.c
@@ -154,7 +154,7 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc,
 out:
     /* Close listening socket as its no longer needed */
     qio_channel_close(ioc, NULL);
-    return FALSE; /* unregister */
+    return TRUE; /* keep it registered */
 }
 
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Qemu-devel] [RFC v2 33/33] migration: init dst in migration_object_init too
  2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
                   ` (31 preceding siblings ...)
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 32/33] migration: allow migrate_incoming for paused VM Peter Xu
@ 2017-08-30  8:32 ` Peter Xu
  2017-09-22 20:37   ` Dr. David Alan Gilbert
  32 siblings, 1 reply; 86+ messages in thread
From: Peter Xu @ 2017-08-30  8:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Daniel P . Berrange, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Though we may not always need it, we now initialize both the src/dst
migration objects in migration_object_init(), so that even the incoming
migration object is thread safe (it was not before).

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 5812478..7e9ccf0 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -103,6 +103,7 @@ enum mig_rp_message_type {
    dynamic creation of migration */
 
 static MigrationState *current_migration;
+static MigrationIncomingState *current_incoming;
 
 static bool migration_object_check(MigrationState *ms, Error **errp);
 
@@ -128,6 +129,17 @@ void migration_object_init(void)
     if (ms->enforce_config_section) {
         current_migration->send_configuration = true;
     }
+
+    /*
+     * Init the migrate incoming object as well no matter whether
+     * we'll use it or not.
+     */
+    current_incoming = g_new0(MigrationIncomingState, 1);
+    current_incoming->state = MIGRATION_STATUS_NONE;
+    qemu_mutex_init(&current_incoming->rp_mutex);
+    qemu_event_init(&current_incoming->main_thread_load_event, false);
+    qemu_sem_init(&current_incoming->postcopy_pause_sem_dst, 0);
+    qemu_sem_init(&current_incoming->postcopy_pause_sem_fault, 0);
 }
 
 /* For outgoing */
@@ -140,19 +152,8 @@ MigrationState *migrate_get_current(void)
 
 MigrationIncomingState *migration_incoming_get_current(void)
 {
-    static bool once;
-    static MigrationIncomingState mis_current;
-
-    if (!once) {
-        mis_current.state = MIGRATION_STATUS_NONE;
-        memset(&mis_current, 0, sizeof(MigrationIncomingState));
-        qemu_mutex_init(&mis_current.rp_mutex);
-        qemu_event_init(&mis_current.main_thread_load_event, false);
-        qemu_sem_init(&mis_current.postcopy_pause_sem_dst, 0);
-        qemu_sem_init(&mis_current.postcopy_pause_sem_fault, 0);
-        once = true;
-    }
-    return &mis_current;
+    assert(current_incoming);
+    return current_incoming;
 }
 
 void migration_incoming_state_destroy(void)
-- 
2.7.4


* Re: [Qemu-devel] [RFC v2 04/33] migration: dump str in migrate_set_state trace
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 04/33] migration: dump str in migrate_set_state trace Peter Xu
@ 2017-09-06 14:36   ` Dr. David Alan Gilbert
  2017-09-20  8:44   ` Juan Quintela
  1 sibling, 0 replies; 86+ messages in thread
From: Dr. David Alan Gilbert @ 2017-09-06 14:36 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> Strings are more readable for debugging.
> 
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>

I've queued this individual patch; it's standalone and useful as is.

Dave

> ---
>  migration/migration.c  | 3 ++-
>  migration/trace-events | 2 +-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index eb7d767..c818412 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -914,8 +914,9 @@ void qmp_migrate_start_postcopy(Error **errp)
>  
>  void migrate_set_state(int *state, int old_state, int new_state)
>  {
> +    assert(new_state < MIGRATION_STATUS__MAX);
>      if (atomic_cmpxchg(state, old_state, new_state) == old_state) {
> -        trace_migrate_set_state(new_state);
> +        trace_migrate_set_state(MigrationStatus_lookup[new_state]);
>          migrate_generate_event(new_state);
>      }
>  }
> diff --git a/migration/trace-events b/migration/trace-events
> index 7a3b514..d2910a6 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -80,7 +80,7 @@ ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: 0
>  # migration/migration.c
>  await_return_path_close_on_source_close(void) ""
>  await_return_path_close_on_source_joining(void) ""
> -migrate_set_state(int new_state) "new state %d"
> +migrate_set_state(const char *new_state) "new state %s"
>  migrate_fd_cleanup(void) ""
>  migrate_fd_error(const char *error_desc) "error=%s"
>  migrate_fd_cancel(void) ""
> -- 
> 2.7.4
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC v2 02/33] bitmap: introduce bitmap_count_one()
  2017-08-30  8:31 ` [Qemu-devel] [RFC v2 02/33] bitmap: introduce bitmap_count_one() Peter Xu
@ 2017-09-20  8:25   ` Juan Quintela
  0 siblings, 0 replies; 86+ messages in thread
From: Juan Quintela @ 2017-09-20  8:25 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Andrea Arcangeli, Dr . David Alan Gilbert

Peter Xu <peterx@redhat.com> wrote:
> Count how many bits are set in the bitmap.
>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>


* Re: [Qemu-devel] [RFC v2 01/33] bitmap: remove BITOP_WORD()
  2017-08-30  8:31 ` [Qemu-devel] [RFC v2 01/33] bitmap: remove BITOP_WORD() Peter Xu
@ 2017-09-20  8:41   ` Juan Quintela
  0 siblings, 0 replies; 86+ messages in thread
From: Juan Quintela @ 2017-09-20  8:41 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Andrea Arcangeli, Dr . David Alan Gilbert

Peter Xu <peterx@redhat.com> wrote:
> We have BIT_WORD(). It's the same.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

It is the same, and BITOP_WORD() doesn't even exist in the Linux code.


* Re: [Qemu-devel] [RFC v2 04/33] migration: dump str in migrate_set_state trace
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 04/33] migration: dump str in migrate_set_state trace Peter Xu
  2017-09-06 14:36   ` Dr. David Alan Gilbert
@ 2017-09-20  8:44   ` Juan Quintela
  1 sibling, 0 replies; 86+ messages in thread
From: Juan Quintela @ 2017-09-20  8:44 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Andrea Arcangeli, Dr . David Alan Gilbert

Peter Xu <peterx@redhat.com> wrote:
> Strings are more readable for debugging.
>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>


* Re: [Qemu-devel] [RFC v2 06/33] migration: reuse mis->userfault_quit_fd
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 06/33] migration: reuse mis->userfault_quit_fd Peter Xu
@ 2017-09-20  8:47   ` Juan Quintela
  2017-09-20  9:06   ` Juan Quintela
  1 sibling, 0 replies; 86+ messages in thread
From: Juan Quintela @ 2017-09-20  8:47 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Andrea Arcangeli, Dr . David Alan Gilbert

Peter Xu <peterx@redhat.com> wrote:
> It was only used for quitting the page fault thread before. Let it be
> something more useful - now we can use it to notify a "wake" for the
> page fault thread (for any reason), and it only means "quit" if the
> fault_thread_quit is set.
>
> Since we changed what it does, rename it to userfault_event_fd.
>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>


* Re: [Qemu-devel] [RFC v2 06/33] migration: reuse mis->userfault_quit_fd
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 06/33] migration: reuse mis->userfault_quit_fd Peter Xu
  2017-09-20  8:47   ` Juan Quintela
@ 2017-09-20  9:06   ` Juan Quintela
  1 sibling, 0 replies; 86+ messages in thread
From: Juan Quintela @ 2017-09-20  9:06 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Andrea Arcangeli, Dr . David Alan Gilbert

Peter Xu <peterx@redhat.com> wrote:

> @@ -448,8 +449,15 @@ static void *postcopy_ram_fault_thread(void *opaque)
>          }
>  
>          if (pfd[1].revents) {
> -            trace_postcopy_ram_fault_thread_quit();
> -            break;
> +            uint64_t tmp64 = 0;
> +
> +            /* Consume the signal */
> +            read(mis->userfault_event_fd, &tmp64, 8);

make: Entering directory '/scratch/qemu/next/all'
  CC      migration/postcopy-ram.o
/mnt/kvm/qemu/next/migration/postcopy-ram.c: In function ‘postcopy_ram_fault_thread’:
/mnt/kvm/qemu/next/migration/postcopy-ram.c:460:13: error: ignoring return value of ‘read’, declared with attribute warn_unused_result [-Werror=unused-result]
             read(mis->userfault_event_fd, &tmp64, 8);
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
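The warning above comes from glibc marking read() with warn_unused_result. A minimal standalone sketch of one way to consume the event signal while checking the return value (consume_event_fd() is an illustrative name, not the actual QEMU fix):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

static int consume_event_fd(int fd)
{
    uint64_t tmp64 = 0;
    ssize_t len;

    /* Retry on EINTR; an eventfd read transfers exactly 8 bytes. */
    do {
        len = read(fd, &tmp64, sizeof(tmp64));
    } while (len < 0 && errno == EINTR);

    if (len != sizeof(tmp64)) {
        /* Checking the result silences -Werror=unused-result and
         * also catches real I/O problems on the event channel. */
        fprintf(stderr, "failed to consume event signal\n");
        return -1;
    }
    return 0;
}
```

Casting the result to void would also silence the warning, but checking it is cheap and catches genuine failures.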

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Qemu-devel] [RFC v2 03/33] bitmap: provide to_le/from_le helpers
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 03/33] bitmap: provide to_le/from_le helpers Peter Xu
@ 2017-09-21 17:35   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 86+ messages in thread
From: Dr. David Alan Gilbert @ 2017-09-21 17:35 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> Provide helpers to convert bitmaps to little endian format. It can be
> used when we want to send one bitmap via network to some other hosts.
> 
> One thing to mention: these helpers only solve the problem of
> endianness; they do not solve the problem of differing word sizes
> across machines (bitmaps managing the same number of bits may have
> different malloc'ed sizes). So for now the callers need to take care
> of the size alignment issue.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  include/qemu/bitmap.h |  7 +++++++
>  util/bitmap.c         | 32 ++++++++++++++++++++++++++++++++
>  2 files changed, 39 insertions(+)
> 
> diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
> index a13bd28..4481975 100644
> --- a/include/qemu/bitmap.h
> +++ b/include/qemu/bitmap.h
> @@ -39,6 +39,8 @@
>   * bitmap_clear(dst, pos, nbits)		Clear specified bit area
>   * bitmap_test_and_clear_atomic(dst, pos, nbits)    Test and clear area
>   * bitmap_find_next_zero_area(buf, len, pos, n, mask)	Find bit free area
> + * bitmap_to_le(dst, src, nbits)      Convert bitmap to little endian
> + * bitmap_from_le(dst, src, nbits)    Convert bitmap from little endian
>   */
>  
>  /*
> @@ -247,4 +249,9 @@ static inline unsigned long *bitmap_zero_extend(unsigned long *old,
>      return new;
>  }
>  
> +void bitmap_to_le(unsigned long *dst, const unsigned long *src,
> +                  long nbits);
> +void bitmap_from_le(unsigned long *dst, const unsigned long *src,
> +                    long nbits);
> +
>  #endif /* BITMAP_H */
> diff --git a/util/bitmap.c b/util/bitmap.c
> index 3446d72..f7aad58 100644
> --- a/util/bitmap.c
> +++ b/util/bitmap.c
> @@ -370,3 +370,35 @@ long slow_bitmap_count_one(const unsigned long *bitmap, long nbits)
>  
>      return result;
>  }
> +
> +static void bitmap_to_from_le(unsigned long *dst,
> +                              const unsigned long *src, long nbits)
> +{
> +    long len = BITS_TO_LONGS(nbits);
> +
> +#ifdef HOST_WORDS_BIGENDIAN
> +    long index;
> +
> +    for (index = 0; index < len; index++) {
> +# if __WORD_SIZE == 64

I think the right constant to use here is HOST_LONG_BITS

> +        dst[index] = bswap64(src[index]);
> +# else
> +        dst[index] = bswap32(src[index]);
> +# endif
> +    }
> +#else
> +    memcpy(dst, src, len * sizeof(unsigned long));
> +#endif
> +}
> +
> +void bitmap_from_le(unsigned long *dst, const unsigned long *src,
> +                    long nbits)
> +{
> +    bitmap_to_from_le(dst, src, nbits);
> +}
> +
> +void bitmap_to_le(unsigned long *dst, const unsigned long *src,
> +                  long nbits)
> +{
> +    bitmap_to_from_le(dst, src, nbits);
> +}
> -- 
> 2.7.4

Other than that;

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Maybe adding a bswapl with that ifdef would be easier
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
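A standalone sketch of the word-by-word swap under review, keyed off sizeof(unsigned long) (which is what HOST_LONG_BITS captures in QEMU) rather than __WORD_SIZE. bswap_long() and bitmap_swap_words() are illustrative names, not QEMU API:

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_LONG    (8 * (long)sizeof(unsigned long))
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

static unsigned long bswap_long(unsigned long v)
{
    /* sizeof() is a compile-time constant, so the dead branch is
     * folded away; this plays the role of the #if on word size. */
    if (sizeof(unsigned long) == 8) {
        return (unsigned long)__builtin_bswap64((uint64_t)v);
    }
    return (unsigned long)__builtin_bswap32((uint32_t)v);
}

/* Swap every word of a bitmap; on a little-endian host this step
 * would be skipped (memcpy) as in the patch. */
static void bitmap_swap_words(unsigned long *dst, const unsigned long *src,
                              long nbits)
{
    long i, len = BITS_TO_LONGS(nbits);

    for (i = 0; i < len; i++) {
        dst[i] = bswap_long(src[i]);
    }
}
```

Note the swap is an involution, so to_le and from_le can indeed share one implementation as the patch does.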


* Re: [Qemu-devel] [RFC v2 05/33] migration: better error handling with QEMUFile
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 05/33] migration: better error handling with QEMUFile Peter Xu
@ 2017-09-21 17:51   ` Dr. David Alan Gilbert
  2017-09-26  8:48     ` Peter Xu
  0 siblings, 1 reply; 86+ messages in thread
From: Dr. David Alan Gilbert @ 2017-09-21 17:51 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> If postcopy goes down for some reason, we can always see this on dst:
> 
>   qemu-system-x86_64: RP: Received invalid message 0x0000 length 0x0000
> 
> However in most cases that's not the real issue. The problem is that
> qemu_get_be16() has no way to show whether the returned data is valid or
> not, and we are _always_ assuming it is valid. That's possibly not wise.
> 
> The best approach to solve this would be: refactoring QEMUFile interface
> to allow the APIs to return error if there is. However it needs quite a
> bit of work and testing. For now, let's explicitly check the validity
> first before using the data in all places for qemu_get_*().
> 
> This patch tries to fix most of the cases I can see. Only with this can
> we make sure we are processing valid data, and that we capture the
> channel-down events correctly.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.c |  5 +++++
>  migration/ram.c       | 22 ++++++++++++++++++----
>  migration/savevm.c    | 41 +++++++++++++++++++++++++++++++++++++++--
>  3 files changed, 62 insertions(+), 6 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index c818412..92bf9b8 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1543,6 +1543,11 @@ static void *source_return_path_thread(void *opaque)
>          header_type = qemu_get_be16(rp);
>          header_len = qemu_get_be16(rp);
>  
> +        if (qemu_file_get_error(rp)) {
> +            mark_source_rp_bad(ms);
> +            goto out;
> +        }
> +
>          if (header_type >= MIG_RP_MSG_MAX ||
>              header_type == MIG_RP_MSG_INVALID) {
>              error_report("RP: Received invalid message 0x%04x length 0x%04x",
> diff --git a/migration/ram.c b/migration/ram.c
> index affb20c..7e20097 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2417,7 +2417,7 @@ static int ram_load_postcopy(QEMUFile *f)
>      void *last_host = NULL;
>      bool all_zero = false;
>  
> -    while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
> +    while (!(flags & RAM_SAVE_FLAG_EOS)) {

With this change, I don't see what's checking the result of:
   ret = postcopy_place_page(...)
at the bottom of the loop.

>          ram_addr_t addr;
>          void *host = NULL;
>          void *page_buffer = NULL;
> @@ -2426,6 +2426,16 @@ static int ram_load_postcopy(QEMUFile *f)
>          uint8_t ch;
>  
>          addr = qemu_get_be64(f);
> +
> +        /*
> +         * If qemu file error, we should stop here, and then "addr"
> +         * may be invalid
> +         */
> +        ret = qemu_file_get_error(f);
> +        if (ret) {
> +            break;
> +        }
> +
>          flags = addr & ~TARGET_PAGE_MASK;
>          addr &= TARGET_PAGE_MASK;
>  
> @@ -2506,6 +2516,13 @@ static int ram_load_postcopy(QEMUFile *f)
>              error_report("Unknown combination of migration flags: %#x"
>                           " (postcopy mode)", flags);
>              ret = -EINVAL;
> +            break;
> +        }
> +
> +        /* Detect for any possible file errors */
> +        if (qemu_file_get_error(f)) {
> +            ret = qemu_file_get_error(f);
> +            break;
>          }
>  
>          if (place_needed) {
> @@ -2520,9 +2537,6 @@ static int ram_load_postcopy(QEMUFile *f)
>                                            place_source, block);
>              }
>          }
> -        if (!ret) {
> -            ret = qemu_file_get_error(f);
> -        }
>      }
>  
>      return ret;
> diff --git a/migration/savevm.c b/migration/savevm.c
> index fdd15fa..7172f14 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1720,6 +1720,11 @@ static int loadvm_process_command(QEMUFile *f)
>      cmd = qemu_get_be16(f);
>      len = qemu_get_be16(f);
>  
> +    /* Check validity before continue processing of cmds */
> +    if (qemu_file_get_error(f)) {
> +        return qemu_file_get_error(f);
> +    }
> +
>      trace_loadvm_process_command(cmd, len);
>      if (cmd >= MIG_CMD_MAX || cmd == MIG_CMD_INVALID) {
>          error_report("MIG_CMD 0x%x unknown (len 0x%x)", cmd, len);
> @@ -1785,6 +1790,7 @@ static int loadvm_process_command(QEMUFile *f)
>   */
>  static bool check_section_footer(QEMUFile *f, SaveStateEntry *se)
>  {
> +    int ret;
>      uint8_t read_mark;
>      uint32_t read_section_id;
>  
> @@ -1795,6 +1801,13 @@ static bool check_section_footer(QEMUFile *f, SaveStateEntry *se)
>  
>      read_mark = qemu_get_byte(f);
>  
> +    ret = qemu_file_get_error(f);
> +    if (ret) {
> +        error_report("%s: Read section footer failed: %d",
> +                     __func__, ret);
> +        return false;
> +    }
> +
>      if (read_mark != QEMU_VM_SECTION_FOOTER) {
>          error_report("Missing section footer for %s", se->idstr);
>          return false;
> @@ -1830,6 +1843,13 @@ qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
>      instance_id = qemu_get_be32(f);
>      version_id = qemu_get_be32(f);
>  
> +    ret = qemu_file_get_error(f);
> +    if (ret) {
> +        error_report("%s: Failed to read instance/version ID: %d",
> +                     __func__, ret);
> +        return ret;
> +    }
> +
>      trace_qemu_loadvm_state_section_startfull(section_id, idstr,
>              instance_id, version_id);
>      /* Find savevm section */
> @@ -1877,6 +1897,13 @@ qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis)
>  
>      section_id = qemu_get_be32(f);
>  
> +    ret = qemu_file_get_error(f);
> +    if (ret) {
> +        error_report("%s: Failed to read section ID: %d",
> +                     __func__, ret);
> +        return ret;
> +    }
> +
>      trace_qemu_loadvm_state_section_partend(section_id);
>      QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
>          if (se->load_section_id == section_id) {
> @@ -1944,8 +1971,14 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
>      uint8_t section_type;
>      int ret = 0;
>  
> -    while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
> -        ret = 0;
> +    while (true) {
> +        section_type = qemu_get_byte(f);
> +
> +        if (qemu_file_get_error(f)) {
> +            ret = qemu_file_get_error(f);
> +            break;
> +        }
> +
>          trace_qemu_loadvm_state_section(section_type);
>          switch (section_type) {
>          case QEMU_VM_SECTION_START:
> @@ -1969,6 +2002,10 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
>                  goto out;
>              }
>              break;
> +        case QEMU_VM_EOF:
> +            /* This is the end of migration */
> +            goto out;
> +            break;

You don't need the goto and the break (although it does no harm).

Dave

>          default:
>              error_report("Unknown savevm section type %d", section_type);
>              ret = -EINVAL;
> -- 
> 2.7.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
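The read-then-check pattern the patch applies throughout can be sketched in isolation: the getter cannot report failure through its return value, so callers must consult a sticky error state before trusting the data. The mig_stream type and helpers below are made up for illustration; they are not QEMU's QEMUFile API:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct mig_stream {
    const unsigned char *buf;
    size_t len, pos;
    int last_error;             /* 0, or a negative errno value */
};

/* Like qemu_get_be16(): always returns a value, even on error. */
static unsigned get_be16(struct mig_stream *s)
{
    unsigned v;

    if (s->last_error || s->pos + 2 > s->len) {
        if (!s->last_error) {
            s->last_error = -EIO;   /* error sticks, like QEMUFile */
        }
        return 0;                   /* caller must not trust this */
    }
    v = ((unsigned)s->buf[s->pos] << 8) | s->buf[s->pos + 1];
    s->pos += 2;
    return v;
}

static int stream_get_error(const struct mig_stream *s)
{
    return s->last_error;
}
```

The key property is that once the stream goes bad, every later read also reports the error, so a single check after a batch of reads is enough - exactly what the hunks above add after each qemu_get_*() cluster.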


* Re: [Qemu-devel] [RFC v2 08/33] migration: new postcopy-pause state
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 08/33] migration: new postcopy-pause state Peter Xu
@ 2017-09-21 17:57   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 86+ messages in thread
From: Dr. David Alan Gilbert @ 2017-09-21 17:57 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> Introducing a new state "postcopy-paused", which can be used when the
> postcopy migration is paused. It is targeted for postcopy network
> failure recovery.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.c | 2 ++
>  qapi-schema.json      | 5 ++++-
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 92bf9b8..f6130db 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -470,6 +470,7 @@ static bool migration_is_setup_or_active(int state)
>      switch (state) {
>      case MIGRATION_STATUS_ACTIVE:
>      case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> +    case MIGRATION_STATUS_POSTCOPY_PAUSED:
>      case MIGRATION_STATUS_SETUP:
>          return true;

That's quite interesting; but yes I think it's right.


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>


> @@ -545,6 +546,7 @@ MigrationInfo *qmp_query_migrate(Error **errp)
>      case MIGRATION_STATUS_ACTIVE:
>      case MIGRATION_STATUS_CANCELLING:
>      case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> +    case MIGRATION_STATUS_POSTCOPY_PAUSED:
>           /* TODO add some postcopy stats */
>          info->has_status = true;
>          info->has_total_time = true;
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 802ea53..368b592 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -667,6 +667,8 @@
>  #
>  # @postcopy-active: like active, but now in postcopy mode. (since 2.5)
>  #
> +# @postcopy-paused: during postcopy but paused. (since 2.11)
> +#
>  # @completed: migration is finished.
>  #
>  # @failed: some error occurred during migration process.
> @@ -679,7 +681,8 @@
>  ##
>  { 'enum': 'MigrationStatus',
>    'data': [ 'none', 'setup', 'cancelling', 'cancelled',
> -            'active', 'postcopy-active', 'completed', 'failed', 'colo' ] }
> +            'active', 'postcopy-active', 'postcopy-paused',
> +            'completed', 'failed', 'colo' ] }
>  
>  ##
>  # @MigrationInfo:
> -- 
> 2.7.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC v2 09/33] migration: implement "postcopy-pause" src logic
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 09/33] migration: implement "postcopy-pause" src logic Peter Xu
@ 2017-09-21 19:21   ` Dr. David Alan Gilbert
  2017-09-26  9:35     ` Peter Xu
  0 siblings, 1 reply; 86+ messages in thread
From: Dr. David Alan Gilbert @ 2017-09-21 19:21 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> Now when the network goes down during postcopy, the source side does not
> fail the migration. Instead we convert the status into the new paused
> state, and wait for a rescue in the future.
> 
> If a recovery is detected, migration_thread() will reset its local
> variables to prepare for that.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.c  | 98 +++++++++++++++++++++++++++++++++++++++++++++++---
>  migration/migration.h  |  3 ++
>  migration/trace-events |  1 +
>  3 files changed, 98 insertions(+), 4 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index f6130db..8d26ea8 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -993,6 +993,8 @@ static void migrate_fd_cleanup(void *opaque)
>  
>      notifier_list_notify(&migration_state_notifiers, s);
>      block_cleanup_parameters(s);
> +
> +    qemu_sem_destroy(&s->postcopy_pause_sem);
>  }
>  
>  void migrate_fd_error(MigrationState *s, const Error *error)
> @@ -1136,6 +1138,7 @@ MigrationState *migrate_init(void)
>      s->migration_thread_running = false;
>      error_free(s->error);
>      s->error = NULL;
> +    qemu_sem_init(&s->postcopy_pause_sem, 0);
>  
>      migrate_set_state(&s->state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
>  
> @@ -1938,6 +1941,80 @@ bool migrate_colo_enabled(void)
>      return s->enabled_capabilities[MIGRATION_CAPABILITY_X_COLO];
>  }
>  
> +typedef enum MigThrError {
> +    /* No error detected */
> +    MIG_THR_ERR_NONE = 0,
> +    /* Detected error, but resumed successfully */
> +    MIG_THR_ERR_RECOVERED = 1,
> +    /* Detected fatal error, need to exit */
> +    MIG_THR_ERR_FATAL = 2,

I don't think it's necessary to assign the values there, but it's OK.

> +} MigThrError;
> +
> +/*
> + * We don't return until we are in a safe state to continue current
> + * postcopy migration.  Returns MIG_THR_ERR_RECOVERED if recovered, or
> + * MIG_THR_ERR_FATAL if an unrecoverable failure happened.
> + */
> +static MigThrError postcopy_pause(MigrationState *s)
> +{
> +    assert(s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
> +    migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> +                      MIGRATION_STATUS_POSTCOPY_PAUSED);
> +
> +    /* Current channel is possibly broken. Release it. */
> +    assert(s->to_dst_file);
> +    qemu_file_shutdown(s->to_dst_file);
> +    qemu_fclose(s->to_dst_file);
> +    s->to_dst_file = NULL;
> +
> +    error_report("Detected IO failure for postcopy. "
> +                 "Migration paused.");
> +
> +    /*
> +     * We wait until things fixed up. Then someone will setup the
> +     * status back for us.
> +     */
> +    while (s->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
> +        qemu_sem_wait(&s->postcopy_pause_sem);
> +    }
> +
> +    trace_postcopy_pause_continued();
> +
> +    return MIG_THR_ERR_RECOVERED;
> +}
> +
> +static MigThrError migration_detect_error(MigrationState *s)
> +{
> +    int ret;
> +
> +    /* Try to detect any file errors */
> +    ret = qemu_file_get_error(s->to_dst_file);
> +
> +    if (!ret) {
> +        /* Everything is fine */
> +        return MIG_THR_ERR_NONE;
> +    }
> +
> +    if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret == -EIO) {

We do need to make sure that whenever we hit a failure in migration
due to a device that we pass that up rather than calling
qemu_file_set_error - e.g. an EIO in a block device or network.

However,

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> +        /*
> +         * For postcopy, we allow the network to be down for a
> +         * while. After that, it can be continued by a
> +         * recovery phase.
> +         */
> +        return postcopy_pause(s);
> +    } else {
> +        /*
> +         * For precopy (or postcopy with an error outside IO), we
> +         * fail immediately.
> +         */
> +        migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
> +        trace_migration_thread_file_err();
> +
> +        /* Time to stop the migration, now. */
> +        return MIG_THR_ERR_FATAL;
> +    }
> +}
> +
>  /*
>   * Master migration thread on the source VM.
>   * It drives the migration and pumps the data down the outgoing channel.
> @@ -1962,6 +2039,7 @@ static void *migration_thread(void *opaque)
>      /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
>      enum MigrationStatus current_active_state = MIGRATION_STATUS_ACTIVE;
>      bool enable_colo = migrate_colo_enabled();
> +    MigThrError thr_error;
>  
>      rcu_register_thread();
>  
> @@ -2034,12 +2112,24 @@ static void *migration_thread(void *opaque)
>              }
>          }
>  
> -        if (qemu_file_get_error(s->to_dst_file)) {
> -            migrate_set_state(&s->state, current_active_state,
> -                              MIGRATION_STATUS_FAILED);
> -            trace_migration_thread_file_err();
> +        /*
> +         * Try to detect any kind of failures, and see whether we
> +         * should stop the migration now.
> +         */
> +        thr_error = migration_detect_error(s);
> +        if (thr_error == MIG_THR_ERR_FATAL) {
> +            /* Stop migration */
>              break;
> +        } else if (thr_error == MIG_THR_ERR_RECOVERED) {
> +            /*
> +             * Just recovered from a e.g. network failure, reset all
> +             * the local variables. This is important to avoid
> +             * breaking transferred_bytes and bandwidth calculation
> +             */
> +            initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +            initial_bytes = 0;
>          }
> +
>          current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>          if (current_time >= initial_time + BUFFER_DELAY) {
>              uint64_t transferred_bytes = qemu_ftell(s->to_dst_file) -
> diff --git a/migration/migration.h b/migration/migration.h
> index 70e3094..0c957c9 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -149,6 +149,9 @@ struct MigrationState
>      bool send_configuration;
>      /* Whether we send section footer during migration */
>      bool send_section_footer;
> +
> +    /* Needed by postcopy-pause state */
> +    QemuSemaphore postcopy_pause_sem;
>  };
>  
>  void migrate_set_state(int *state, int old_state, int new_state);
> diff --git a/migration/trace-events b/migration/trace-events
> index d2910a6..907564b 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -98,6 +98,7 @@ migration_thread_setup_complete(void) ""
>  open_return_path_on_source(void) ""
>  open_return_path_on_source_continue(void) ""
>  postcopy_start(void) ""
> +postcopy_pause_continued(void) ""
>  postcopy_start_set_run(void) ""
>  source_return_path_thread_bad_end(void) ""
>  source_return_path_thread_end(void) ""
> -- 
> 2.7.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Qemu-devel] [RFC v2 10/33] migration: allow dst vm pause on postcopy
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 10/33] migration: allow dst vm pause on postcopy Peter Xu
@ 2017-09-21 19:29   ` Dr. David Alan Gilbert
  2017-09-27  7:34     ` Peter Xu
  0 siblings, 1 reply; 86+ messages in thread
From: Dr. David Alan Gilbert @ 2017-09-21 19:29 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> When there is an IO error on the incoming channel (e.g., network down),
> instead of bailing out immediately, we allow the dst VM to switch to the
> new POSTCOPY_PAUSE state. Currently it is still simple - it waits on the
> new semaphore until someone pokes it for another attempt.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.c  |  1 +
>  migration/migration.h  |  3 +++
>  migration/savevm.c     | 60 ++++++++++++++++++++++++++++++++++++++++++++++++--
>  migration/trace-events |  2 ++
>  4 files changed, 64 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 8d26ea8..80de212 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -146,6 +146,7 @@ MigrationIncomingState *migration_incoming_get_current(void)
>          memset(&mis_current, 0, sizeof(MigrationIncomingState));
>          qemu_mutex_init(&mis_current.rp_mutex);
>          qemu_event_init(&mis_current.main_thread_load_event, false);
> +        qemu_sem_init(&mis_current.postcopy_pause_sem_dst, 0);
>          once = true;
>      }
>      return &mis_current;
> diff --git a/migration/migration.h b/migration/migration.h
> index 0c957c9..c423682 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -60,6 +60,9 @@ struct MigrationIncomingState {
>      /* The coroutine we should enter (back) after failover */
>      Coroutine *migration_incoming_co;
>      QemuSemaphore colo_incoming_sem;
> +
> +    /* notify PAUSED postcopy incoming migrations to try to continue */
> +    QemuSemaphore postcopy_pause_sem_dst;
>  };
>  
>  MigrationIncomingState *migration_incoming_get_current(void);
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 7172f14..3777124 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1488,8 +1488,8 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
>   */
>  static void *postcopy_ram_listen_thread(void *opaque)
>  {
> -    QEMUFile *f = opaque;
>      MigrationIncomingState *mis = migration_incoming_get_current();
> +    QEMUFile *f = mis->from_src_file;
>      int load_res;
>  
>      migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> @@ -1503,6 +1503,14 @@ static void *postcopy_ram_listen_thread(void *opaque)
>       */
>      qemu_file_set_blocking(f, true);
>      load_res = qemu_loadvm_state_main(f, mis);
> +
> +    /*
> +     * This is tricky, but mis->from_src_file can change after it
> +     * returns, when postcopy recovery has happened. In the future, we may
> +     * want a wrapper for the QEMUFile handle.
> +     */
> +    f = mis->from_src_file;
> +
>      /* And non-blocking again so we don't block in any cleanup */
>      qemu_file_set_blocking(f, false);
>  
> @@ -1581,7 +1589,7 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
>      /* Start up the listening thread and wait for it to signal ready */
>      qemu_sem_init(&mis->listen_thread_sem, 0);
>      qemu_thread_create(&mis->listen_thread, "postcopy/listen",
> -                       postcopy_ram_listen_thread, mis->from_src_file,
> +                       postcopy_ram_listen_thread, NULL,
>                         QEMU_THREAD_DETACHED);
>      qemu_sem_wait(&mis->listen_thread_sem);
>      qemu_sem_destroy(&mis->listen_thread_sem);
> @@ -1966,11 +1974,44 @@ void qemu_loadvm_state_cleanup(void)
>      }
>  }
>  
> +/* Return true if we should continue the migration, or false. */
> +static bool postcopy_pause_incoming(MigrationIncomingState *mis)
> +{
> +    trace_postcopy_pause_incoming();
> +
> +    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> +                      MIGRATION_STATUS_POSTCOPY_PAUSED);
> +
> +    assert(mis->from_src_file);
> +    qemu_file_shutdown(mis->from_src_file);
> +    qemu_fclose(mis->from_src_file);
> +    mis->from_src_file = NULL;
> +
> +    assert(mis->to_src_file);
> +    qemu_mutex_lock(&mis->rp_mutex);
> +    qemu_file_shutdown(mis->to_src_file);

Should you not do the shutdown() before the lock?
For example if the other thread is stuck, with rp_mutex
held, trying to write to to_src_file, then you'll block
waiting for the mutex.  If you call shutdown and then take
the lock, the other thread will error and release the lock.

I'm not quite sure what will happen if we end up calling this
before the main thread has returned from postcopy and the
device loading is complete.

Also, at this point have we guaranteed no one else is about
to do an op on mis->to_src_file and will seg?

Dave

> +    qemu_fclose(mis->to_src_file);
> +    mis->to_src_file = NULL;
> +    qemu_mutex_unlock(&mis->rp_mutex);
> +
> +    error_report("Detected IO failure for postcopy. "
> +                 "Migration paused.");
> +
> +    while (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
> +        qemu_sem_wait(&mis->postcopy_pause_sem_dst);
> +    }
> +
> +    trace_postcopy_pause_incoming_continued();
> +
> +    return true;
> +}
> +
>  static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
>  {
>      uint8_t section_type;
>      int ret = 0;
>  
> +retry:
>      while (true) {
>          section_type = qemu_get_byte(f);
>  
> @@ -2016,6 +2057,21 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
>  out:
>      if (ret < 0) {
>          qemu_file_set_error(f, ret);
> +
> +        /*
> +         * Detect whether it is:
> +         *
> +         * 1. postcopy running
> +         * 2. network failure (-EIO)
> +         *
> +         * If so, we try to wait for a recovery.
> +         */
> +        if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE &&
> +            ret == -EIO && postcopy_pause_incoming(mis)) {
> +            /* Reset f to point to the newly created channel */
> +            f = mis->from_src_file;
> +            goto retry;
> +        }
>      }
>      return ret;
>  }
> diff --git a/migration/trace-events b/migration/trace-events
> index 907564b..7764c6f 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -99,6 +99,8 @@ open_return_path_on_source(void) ""
>  open_return_path_on_source_continue(void) ""
>  postcopy_start(void) ""
>  postcopy_pause_continued(void) ""
> +postcopy_pause_incoming(void) ""
> +postcopy_pause_incoming_continued(void) ""
>  postcopy_start_set_run(void) ""
>  source_return_path_thread_bad_end(void) ""
>  source_return_path_thread_end(void) ""
> -- 
> 2.7.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Qemu-devel] [RFC v2 15/33] migration: pass MigrationState to migrate_init()
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 15/33] migration: pass MigrationState to migrate_init() Peter Xu
@ 2017-09-22  9:09   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 86+ messages in thread
From: Dr. David Alan Gilbert @ 2017-09-22  9:09 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> Let the callers take the object, then pass it to migrate_init().
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.c | 7 ++-----
>  migration/migration.h | 2 +-
>  migration/savevm.c    | 5 ++++-
>  3 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 394e84b..15b8eb1 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1138,10 +1138,8 @@ bool migration_is_idle(void)
>      return false;
>  }
>  
> -MigrationState *migrate_init(void)
> +void migrate_init(MigrationState *s)
>  {
> -    MigrationState *s = migrate_get_current();
> -
>      /*
>       * Reinitialise all migration state, except
>       * parameters/capabilities that the user set, and
> @@ -1169,7 +1167,6 @@ MigrationState *migrate_init(void)
>      migrate_set_state(&s->state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
>  
>      s->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> -    return s;
>  }
>  
>  static GSList *migration_blockers;
> @@ -1277,7 +1274,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
>          migrate_set_block_incremental(s, true);
>      }
>  
> -    s = migrate_init();
> +    migrate_init(s);
>  
>      if (strstart(uri, "tcp:", &p)) {
>          tcp_start_outgoing_migration(s, p, &local_err);
> diff --git a/migration/migration.h b/migration/migration.h
> index 338dfe3..b78b9bd 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -169,7 +169,7 @@ void migrate_fd_error(MigrationState *s, const Error *error);
>  
>  void migrate_fd_connect(MigrationState *s);
>  
> -MigrationState *migrate_init(void);
> +void migrate_init(MigrationState *s);
>  bool migration_is_blocked(Error **errp);
>  /* True if outgoing migration has entered postcopy phase */
>  bool migration_in_postcopy(void);
> diff --git a/migration/savevm.c b/migration/savevm.c
> index a3162c1..c9bccf7 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1224,8 +1224,11 @@ void qemu_savevm_state_cleanup(void)
>  static int qemu_savevm_state(QEMUFile *f, Error **errp)
>  {
>      int ret;
> -    MigrationState *ms = migrate_init();
> +    MigrationState *ms = migrate_get_current();
>      MigrationStatus status;
> +
> +    migrate_init(ms);
> +
>      ms->to_dst_file = f;
>  
>      if (migration_is_blocked(errp)) {
> -- 
> 2.7.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Qemu-devel] [RFC v2 16/33] migration: rebuild channel on source
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 16/33] migration: rebuild channel on source Peter Xu
@ 2017-09-22  9:56   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 86+ messages in thread
From: Dr. David Alan Gilbert @ 2017-09-22  9:56 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> This patch detects the "resume" flag of the migration command, and
> rebuilds the channels only if the flag is set.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.c | 92 ++++++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 69 insertions(+), 23 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 15b8eb1..deb947b 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1233,49 +1233,75 @@ bool migration_is_blocked(Error **errp)
>      return false;
>  }
>  
> -void qmp_migrate(const char *uri, bool has_blk, bool blk,
> -                 bool has_inc, bool inc, bool has_detach, bool detach,
> -                 bool has_resume, bool resume, Error **errp)
> +/* Returns true if we should continue the migration, or false on error */
> +static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc,
> +                            bool resume, Error **errp)
>  {
>      Error *local_err = NULL;
> -    MigrationState *s = migrate_get_current();
> -    const char *p;
> +
> +    if (resume) {
> +        if (s->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
> +            error_setg(errp, "Cannot resume if there is no "
> +                       "paused migration");
> +            return false;
> +        }
> +        /* This is a resume, skip init status */
> +        return true;
> +    }
>  
>      if (migration_is_setup_or_active(s->state) ||
>          s->state == MIGRATION_STATUS_CANCELLING ||
>          s->state == MIGRATION_STATUS_COLO) {
>          error_setg(errp, QERR_MIGRATION_ACTIVE);
> -        return;
> +        return false;
>      }
> +
>      if (runstate_check(RUN_STATE_INMIGRATE)) {
>          error_setg(errp, "Guest is waiting for an incoming migration");
> -        return;
> +        return false;
>      }
>  
>      if (migration_is_blocked(errp)) {
> -        return;
> +        return false;
>      }
>  
> -    if ((has_blk && blk) || (has_inc && inc)) {
> +    if (blk || blk_inc) {
>          if (migrate_use_block() || migrate_use_block_incremental()) {
>              error_setg(errp, "Command options are incompatible with "
>                         "current migration capabilities");
> -            return;
> +            return false;
>          }
>          migrate_set_block_enabled(true, &local_err);
>          if (local_err) {
>              error_propagate(errp, local_err);
> -            return;
> +            return false;
>          }
>          s->must_remove_block_options = true;
>      }
>  
> -    if (has_inc && inc) {
> +    if (blk_inc) {
>          migrate_set_block_incremental(s, true);
>      }
>  
>      migrate_init(s);
>  
> +    return true;
> +}
> +
> +void qmp_migrate(const char *uri, bool has_blk, bool blk,
> +                 bool has_inc, bool inc, bool has_detach, bool detach,
> +                 bool has_resume, bool resume, Error **errp)
> +{
> +    Error *local_err = NULL;
> +    MigrationState *s = migrate_get_current();
> +    const char *p;
> +
> +    if (!migrate_prepare(s, has_blk && blk, has_inc && inc,
> +                         has_resume && resume, errp)) {
> +        /* Error detected, put into errp */
> +        return;
> +    }
> +
>      if (strstart(uri, "tcp:", &p)) {
>          tcp_start_outgoing_migration(s, p, &local_err);
>  #ifdef CONFIG_RDMA
> @@ -1697,7 +1723,8 @@ out:
>      return NULL;
>  }
>  
> -static int open_return_path_on_source(MigrationState *ms)
> +static int open_return_path_on_source(MigrationState *ms,
> +                                      bool create_thread)
>  {
>  
>      ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->to_dst_file);
> @@ -1706,6 +1733,12 @@ static int open_return_path_on_source(MigrationState *ms)
>      }
>  
>      trace_open_return_path_on_source();
> +
> +    if (!create_thread) {
> +        /* We're done */
> +        return 0;
> +    }
> +
>      qemu_thread_create(&ms->rp_state.rp_thread, "return path",
>                         source_return_path_thread, ms, QEMU_THREAD_JOINABLE);
>  
> @@ -2263,15 +2296,24 @@ static void *migration_thread(void *opaque)
>  
>  void migrate_fd_connect(MigrationState *s)
>  {
> -    s->expected_downtime = s->parameters.downtime_limit;
> -    s->cleanup_bh = qemu_bh_new(migrate_fd_cleanup, s);
> +    int64_t rate_limit;
> +    bool resume = s->state == MIGRATION_STATUS_POSTCOPY_PAUSED;
>  
> -    qemu_file_set_blocking(s->to_dst_file, true);
> -    qemu_file_set_rate_limit(s->to_dst_file,
> -                             s->parameters.max_bandwidth / XFER_LIMIT_RATIO);
> +    if (resume) {
> +        /* This is a resumed migration */
> +        rate_limit = INT64_MAX;
> +    } else {
> +        /* This is a fresh new migration */
> +        rate_limit = s->parameters.max_bandwidth / XFER_LIMIT_RATIO;
> +        s->expected_downtime = s->parameters.downtime_limit;
> +        s->cleanup_bh = qemu_bh_new(migrate_fd_cleanup, s);
>  
> -    /* Notify before starting migration thread */
> -    notifier_list_notify(&migration_state_notifiers, s);
> +        /* Notify before starting migration thread */
> +        notifier_list_notify(&migration_state_notifiers, s);
> +    }
> +
> +    qemu_file_set_rate_limit(s->to_dst_file, rate_limit);
> +    qemu_file_set_blocking(s->to_dst_file, true);
>  
>      /*
>       * Open the return path. For postcopy, it is used exclusively. For
> @@ -2279,15 +2321,19 @@ void migrate_fd_connect(MigrationState *s)
>       * QEMU uses the return path.
>       */
>      if (migrate_postcopy_ram() || migrate_use_return_path()) {
> -        if (open_return_path_on_source(s)) {
> +        if (open_return_path_on_source(s, !resume)) {
>              error_report("Unable to open return-path for postcopy");
> -            migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
> -                              MIGRATION_STATUS_FAILED);
> +            migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
>              migrate_fd_cleanup(s);
>              return;
>          }
>      }
>  
> +    if (resume) {
> +        /* TODO: do the resume logic */
> +        return;
> +    }
> +
>      qemu_thread_create(&s->thread, "live_migration", migration_thread, s,
>                         QEMU_THREAD_JOINABLE);
>      s->migration_thread_running = true;
> -- 
> 2.7.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Qemu-devel] [RFC v2 17/33] migration: new state "postcopy-recover"
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 17/33] migration: new state "postcopy-recover" Peter Xu
@ 2017-09-22 10:08   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 86+ messages in thread
From: Dr. David Alan Gilbert @ 2017-09-22 10:08 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> Introducing a new migration state "postcopy-recover". If a migration
> procedure is paused and the connection is successfully rebuilt afterward,
> we'll switch the source VM state from "postcopy-paused" to the new state
> "postcopy-recover", then do the resume logic in the migration thread
> (along with the return path thread).
> 
> This patch only does the state switch on the source side. A follow-up
> patch will handle the state switching on the destination side using the
> same status bit.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

(Although it's a little large, so you may want to split it)


> ---
>  migration/migration.c | 76 ++++++++++++++++++++++++++++++++++++++-------------
>  qapi-schema.json      |  4 ++-
>  2 files changed, 60 insertions(+), 20 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index deb947b..30dd566 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -495,6 +495,7 @@ static bool migration_is_setup_or_active(int state)
>      case MIGRATION_STATUS_ACTIVE:
>      case MIGRATION_STATUS_POSTCOPY_ACTIVE:
>      case MIGRATION_STATUS_POSTCOPY_PAUSED:
> +    case MIGRATION_STATUS_POSTCOPY_RECOVER:
>      case MIGRATION_STATUS_SETUP:
>          return true;
>  
> @@ -571,6 +572,7 @@ MigrationInfo *qmp_query_migrate(Error **errp)
>      case MIGRATION_STATUS_CANCELLING:
>      case MIGRATION_STATUS_POSTCOPY_ACTIVE:
>      case MIGRATION_STATUS_POSTCOPY_PAUSED:
> +    case MIGRATION_STATUS_POSTCOPY_RECOVER:
>           /* TODO add some postcopy stats */
>          info->has_status = true;
>          info->has_total_time = true;
> @@ -2035,6 +2037,13 @@ typedef enum MigThrError {
>      MIG_THR_ERR_FATAL = 2,
>  } MigThrError;
>  
> +/* Return zero on success, or <0 on error */
> +static int postcopy_do_resume(MigrationState *s)
> +{
> +    /* TODO: do the resume logic */
> +    return 0;
> +}
> +
>  /*
>   * We don't return until we are in a safe state to continue current
>   * postcopy migration.  Returns MIG_THR_ERR_RECOVERED if recovered, or
> @@ -2043,29 +2052,55 @@ typedef enum MigThrError {
>  static MigThrError postcopy_pause(MigrationState *s)
>  {
>      assert(s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
> -    migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> -                      MIGRATION_STATUS_POSTCOPY_PAUSED);
>  
> -    /* Current channel is possibly broken. Release it. */
> -    assert(s->to_dst_file);
> -    qemu_file_shutdown(s->to_dst_file);
> -    qemu_fclose(s->to_dst_file);
> -    s->to_dst_file = NULL;
> +    while (true) {
> +        migrate_set_state(&s->state, s->state,
> +                          MIGRATION_STATUS_POSTCOPY_PAUSED);
>  
> -    error_report("Detected IO failure for postcopy. "
> -                 "Migration paused.");
> +        /* Current channel is possibly broken. Release it. */
> +        assert(s->to_dst_file);
> +        qemu_file_shutdown(s->to_dst_file);
> +        qemu_fclose(s->to_dst_file);
> +        s->to_dst_file = NULL;
>  
> -    /*
> -     * We wait until things fixed up. Then someone will setup the
> -     * status back for us.
> -     */
> -    while (s->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
> -        qemu_sem_wait(&s->postcopy_pause_sem);
> -    }
> +        error_report("Detected IO failure for postcopy. "
> +                     "Migration paused.");
> +
> +        /*
> +         * We wait until things are fixed up. Then someone will set the
> +         * status back for us.
> +         */
> +        while (s->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
> +            qemu_sem_wait(&s->postcopy_pause_sem);
> +        }
>  
> -    trace_postcopy_pause_continued();
> +        if (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
> +            /* Woken up by a recover procedure. Give it a shot */
> +
> +            /*
> +             * Firstly, let's wake up the return path now, with a new
> +             * return path channel.
> +             */
> +            qemu_sem_post(&s->postcopy_pause_rp_sem);
>  
> -    return MIG_THR_ERR_RECOVERED;
> +            /* Do the resume logic */
> +            if (postcopy_do_resume(s) == 0) {
> +                /* Let's continue! */
> +                trace_postcopy_pause_continued();
> +                return MIG_THR_ERR_RECOVERED;
> +            } else {
> +                /*
> +                 * Something went wrong during the recovery; let's
> +                 * pause again. Pausing is always better than throwing
> +                 * data away.
> +                 */
> +                continue;
> +            }
> +        } else {
> +            /* This is not right... Time to quit. */
> +            return MIG_THR_ERR_FATAL;
> +        }
> +    }
>  }
>  
>  static MigThrError migration_detect_error(MigrationState *s)
> @@ -2330,7 +2365,10 @@ void migrate_fd_connect(MigrationState *s)
>      }
>  
>      if (resume) {
> -        /* TODO: do the resume logic */
> +        /* Wakeup the main migration thread to do the recovery */
> +        migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_PAUSED,
> +                          MIGRATION_STATUS_POSTCOPY_RECOVER);
> +        qemu_sem_post(&s->postcopy_pause_sem);
>          return;
>      }
>  
> diff --git a/qapi-schema.json b/qapi-schema.json
> index ba41f2c..989f95a 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -669,6 +669,8 @@
>  #
>  # @postcopy-paused: during postcopy but paused. (since 2.11)
>  #
> +# @postcopy-recover: trying to recover from a paused postcopy. (since 2.11)
> +#
>  # @completed: migration is finished.
>  #
>  # @failed: some error occurred during migration process.
> @@ -682,7 +684,7 @@
>  { 'enum': 'MigrationStatus',
>    'data': [ 'none', 'setup', 'cancelling', 'cancelled',
>              'active', 'postcopy-active', 'postcopy-paused',
> -            'completed', 'failed', 'colo' ] }
> +            'postcopy-recover', 'completed', 'failed', 'colo' ] }
>  
>  ##
>  # @MigrationInfo:
> -- 
> 2.7.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Qemu-devel] [RFC v2 20/33] migration: new message MIG_RP_MSG_RECV_BITMAP
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 20/33] migration: new message MIG_RP_MSG_RECV_BITMAP Peter Xu
@ 2017-09-22 11:05   ` Dr. David Alan Gilbert
  2017-09-27 10:04     ` Peter Xu
  0 siblings, 1 reply; 86+ messages in thread
From: Dr. David Alan Gilbert @ 2017-09-22 11:05 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> Introducing new return path message MIG_RP_MSG_RECV_BITMAP to send
> received bitmap of ramblock back to source.
> 
> This is the reply message to MIG_CMD_RECV_BITMAP: it contains not only
> the header (including the ramblock name), but is also appended with the
> whole received bitmap of that ramblock on the destination side.
> 
> When the source receives such a reply message (MIG_RP_MSG_RECV_BITMAP),
> it parses it and converts it to the dirty bitmap by inverting the bits.
> 
> One thing to mention is that, when we send the recv bitmap, we
> additionally do the following:
> 
> - convert the bitmap to little endian, to support hosts that use
>   different endianness on src/dst.
> 
> - pad properly to 8 bytes, to support hosts that use different word
>   sizes (32/64 bits) on src/dst.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.c  |  68 ++++++++++++++++++++++++
>  migration/migration.h  |   2 +
>  migration/ram.c        | 141 +++++++++++++++++++++++++++++++++++++++++++++++++
>  migration/ram.h        |   3 ++
>  migration/savevm.c     |   2 +-
>  migration/trace-events |   2 +
>  6 files changed, 217 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 1370c70..625f19a 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -92,6 +92,7 @@ enum mig_rp_message_type {
>  
>      MIG_RP_MSG_REQ_PAGES_ID, /* data (start: be64, len: be32, id: string) */
>      MIG_RP_MSG_REQ_PAGES,    /* data (start: be64, len: be32) */
> +    MIG_RP_MSG_RECV_BITMAP,  /* send recved_bitmap back to source */
>  
>      MIG_RP_MSG_MAX
>  };
> @@ -449,6 +450,45 @@ void migrate_send_rp_pong(MigrationIncomingState *mis,
>      migrate_send_rp_message(mis, MIG_RP_MSG_PONG, sizeof(buf), &buf);
>  }
>  
> +void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
> +                                 char *block_name)
> +{
> +    char buf[512];
> +    int len;
> +    int64_t res;
> +
> +    /*
> +     * First, we send the header part. It contains only the len of
> +     * idstr, and the idstr itself.
> +     */
> +    len = strlen(block_name);
> +    buf[0] = len;
> +    memcpy(buf + 1, block_name, len);
> +
> +    if (mis->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
> +        error_report("%s: MSG_RP_RECV_BITMAP only used for recovery",
> +                     __func__);
> +        return;
> +    }
> +
> +    migrate_send_rp_message(mis, MIG_RP_MSG_RECV_BITMAP, len + 1, buf);
> +
> +    /*
> +     * Next, we dump the received bitmap to the stream.
> +     *
> +     * TODO: currently we are safe since we are the only one that is
> +     * using the to_src_file handle (fault thread is still paused),
> +     * and it's ok even not taking the mutex. However the best way is
> +     * to take the lock before sending the message header, and release
> +     * the lock after sending the bitmap.
> +     */
> +    qemu_mutex_lock(&mis->rp_mutex);
> +    res = ramblock_recv_bitmap_send(mis->to_src_file, block_name);
> +    qemu_mutex_unlock(&mis->rp_mutex);
> +
> +    trace_migrate_send_rp_recv_bitmap(block_name, res);
> +}
> +
>  MigrationCapabilityStatusList *qmp_query_migrate_capabilities(Error **errp)
>  {
>      MigrationCapabilityStatusList *head = NULL;
> @@ -1572,6 +1612,7 @@ static struct rp_cmd_args {
>      [MIG_RP_MSG_PONG]           = { .len =  4, .name = "PONG" },
>      [MIG_RP_MSG_REQ_PAGES]      = { .len = 12, .name = "REQ_PAGES" },
>      [MIG_RP_MSG_REQ_PAGES_ID]   = { .len = -1, .name = "REQ_PAGES_ID" },
> +    [MIG_RP_MSG_RECV_BITMAP]    = { .len = -1, .name = "RECV_BITMAP" },
>      [MIG_RP_MSG_MAX]            = { .len = -1, .name = "MAX" },
>  };
>  
> @@ -1616,6 +1657,19 @@ static bool postcopy_pause_return_path_thread(MigrationState *s)
>      return true;
>  }
>  
> +static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name)
> +{
> +    RAMBlock *block = qemu_ram_block_by_name(block_name);
> +
> +    if (!block) {
> +        error_report("%s: invalid block name '%s'", __func__, block_name);
> +        return -EINVAL;
> +    }
> +
> +    /* Fetch the received bitmap and refresh the dirty bitmap */
> +    return ram_dirty_bitmap_reload(s, block);
> +}
> +
>  /*
>   * Handles messages sent on the return path towards the source VM
>   *
> @@ -1721,6 +1775,20 @@ retry:
>              migrate_handle_rp_req_pages(ms, (char *)&buf[13], start, len);
>              break;
>  
> +        case MIG_RP_MSG_RECV_BITMAP:
> +            if (header_len < 1) {
> +                error_report("%s: missing block name", __func__);
> +                mark_source_rp_bad(ms);
> +                goto out;
> +            }
> +            /* Format: len (1B) + idstr (<255B). This ends the idstr. */
> +            buf[buf[0] + 1] = '\0';
> +            if (migrate_handle_rp_recv_bitmap(ms, (char *)(buf + 1))) {
> +                mark_source_rp_bad(ms);
> +                goto out;
> +            }
> +            break;
> +
>          default:
>              break;
>          }
> diff --git a/migration/migration.h b/migration/migration.h
> index b78b9bd..4051379 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -202,5 +202,7 @@ void migrate_send_rp_pong(MigrationIncomingState *mis,
>                            uint32_t value);
>  int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
>                                ram_addr_t start, size_t len);
> +void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
> +                                 char *block_name);
>  
>  #endif
> diff --git a/migration/ram.c b/migration/ram.c
> index 7e20097..5d938e3 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -182,6 +182,70 @@ void ramblock_recv_bitmap_clear(RAMBlock *rb, void *host_addr)
>      clear_bit(ramblock_recv_bitmap_offset(host_addr, rb), rb->receivedmap);
>  }
>  
> +#define  RAMBLOCK_RECV_BITMAP_ENDING  (0x0123456789abcdefULL)
> +
> +/*
> + * Format: bitmap_size (8 bytes) + whole_bitmap (N bytes).
> + *
> + * Returns >0 if success with sent bytes, or <0 if error.
> + */
> +int64_t ramblock_recv_bitmap_send(QEMUFile *file,
> +                                  const char *block_name)
> +{
> +    RAMBlock *block = qemu_ram_block_by_name(block_name);
> +    unsigned long *le_bitmap, nbits;
> +    uint64_t size;
> +
> +    if (!block) {
> +        error_report("%s: invalid block name: %s", __func__, block_name);
> +        return -1;
> +    }
> +
> +    nbits = block->used_length >> TARGET_PAGE_BITS;
> +
> +    /*
> +     * Make sure the tmp bitmap buffer is big enough, e.g., on 32bit
> +     * machines we may need 4 more bytes for padding (see below
> +     * comment). So extend it a bit beforehand.
> +     */
> +    le_bitmap = bitmap_new(nbits + BITS_PER_LONG);

I do worry what will happen on really huge RAMBlocks; the worst case is
that this temporary bitmap is a few GB.

> +    /*
> +     * Always use little endian when sending the bitmap. This is
> +     * required when source and destination VMs are not using the
> +     * same endianness. (Note: big endian won't work.)
> +     */
> +    bitmap_to_le(le_bitmap, block->receivedmap, nbits);
> +
> +    /* Size of the bitmap, in bytes */
> +    size = nbits / 8;
> +
> +    /*
> +     * size is always aligned to 8 bytes on 64-bit machines, but that
> +     * may not be true on 32-bit machines. We need this padding to
> +     * make sure the migration can survive even between 32bit and
> +     * 64bit machines.
> +     */
> +    size = ROUND_UP(size, 8);
> +
> +    qemu_put_be64(file, size);
> +    qemu_put_buffer(file, (const uint8_t *)le_bitmap, size);
> +    /*
> +     * Mark as an end, in case the middle part is screwed up due to
> +     * some "mysterious" reason.
> +     */
> +    qemu_put_be64(file, RAMBLOCK_RECV_BITMAP_ENDING);
> +    qemu_fflush(file);
> +
> +    g_free(le_bitmap);
> +
> +    if (qemu_file_get_error(file)) {
> +        return qemu_file_get_error(file);
> +    }
> +
> +    return size + sizeof(size);
> +}
> +
>  /*
>   * An outstanding page request, on the source, having been received
>   * and queued
> @@ -2706,6 +2770,83 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>      return ret;
>  }
>  
> +/*
> + * Read the received bitmap, invert it to be the initial dirty bitmap.
> + * This is only used when the postcopy migration is paused but wants
> + * to resume from a middle point.
> + */
> +int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
> +{
> +    int ret = -EINVAL;
> +    QEMUFile *file = s->rp_state.from_dst_file;
> +    unsigned long *le_bitmap, nbits = block->used_length >> TARGET_PAGE_BITS;
> +    uint64_t local_size = nbits / 8;
> +    uint64_t size, end_mark;
> +
> +    if (s->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
> +        error_report("%s: incorrect state %s", __func__,
> +                     MigrationStatus_lookup[s->state]);
> +        return -EINVAL;
> +    }
> +
> +    /*
> +     * Note: see comments in ramblock_recv_bitmap_send() on why we
> +     * need the endianness conversion, and the padding.
> +     */
> +    local_size = ROUND_UP(local_size, 8);
> +
> +    /* Add paddings */
> +    le_bitmap = bitmap_new(nbits + BITS_PER_LONG);
> +
> +    size = qemu_get_be64(file);
> +
> +    /* The size of the bitmap should match with our ramblock */
> +    if (size != local_size) {
> +        error_report("%s: ramblock '%s' bitmap size mismatch "
> +                     "(0x%lx != 0x%lx)", __func__, block->idstr,
> +                     size, local_size);

You need to use PRIx64 formatters there - %lx isn't portable.

> +        ret = -EINVAL;
> +        goto out;
> +    }
> +
> +    size = qemu_get_buffer(file, (uint8_t *)le_bitmap, local_size);
> +    end_mark = qemu_get_be64(file);
> +
> +    ret = qemu_file_get_error(file);
> +    if (ret || size != local_size) {
> +        error_report("%s: read bitmap failed for ramblock '%s': %d",
> +                     __func__, block->idstr, ret);

You might like to include size/local_size in the error.

> +        ret = -EIO;
> +        goto out;
> +    }
> +
> +    if (end_mark != RAMBLOCK_RECV_BITMAP_ENDING) {
> +        error_report("%s: ramblock '%s' end mark incorrect: 0x%"PRIu64,
> +                     __func__, block->idstr, end_mark);
> +        ret = -EINVAL;
> +        goto out;
> +    }
> +
> +    /*
> +     * Endianness conversion. We are in postcopy (though paused).

Dave

> +     * The dirty bitmap won't change. We can directly modify it.
> +     */
> +    bitmap_from_le(block->bmap, le_bitmap, nbits);
> +
> +    /*
> +     * What we received is the "received bitmap". Invert it to be
> +     * the initial dirty bitmap for this ramblock.
> +     */
> +    bitmap_complement(block->bmap, block->bmap, nbits);
> +
> +    trace_ram_dirty_bitmap_reload(block->idstr);
> +
> +    ret = 0;
> +out:
> +    g_free(le_bitmap);
> +    return ret;
> +}
> +
>  static SaveVMHandlers savevm_ram_handlers = {
>      .save_setup = ram_save_setup,
>      .save_live_iterate = ram_save_iterate,
> diff --git a/migration/ram.h b/migration/ram.h
> index 4db9922..bd4b8ba 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -57,5 +57,8 @@ int ramblock_recv_bitmap_test(RAMBlock *rb, void *host_addr);
>  void ramblock_recv_bitmap_set(RAMBlock *rb, void *host_addr);
>  void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr, size_t nr);
>  void ramblock_recv_bitmap_clear(RAMBlock *rb, void *host_addr);
> +int64_t ramblock_recv_bitmap_send(QEMUFile *file,
> +                                  const char *block_name);
> +int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb);
>  
>  #endif
> diff --git a/migration/savevm.c b/migration/savevm.c
> index f532ca0..7f77a31 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1766,7 +1766,7 @@ static int loadvm_handle_recv_bitmap(MigrationIncomingState *mis,
>          return -EINVAL;
>      }
>  
> -    /* TODO: send the bitmap back to source */
> +    migrate_send_rp_recv_bitmap(mis, block_name);
>  
>      trace_loadvm_handle_recv_bitmap(block_name);
>  
> diff --git a/migration/trace-events b/migration/trace-events
> index c5f7e41..9960cd8 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -78,6 +78,7 @@ ram_load_postcopy_loop(uint64_t addr, int flags) "@%" PRIx64 " %x"
>  ram_postcopy_send_discard_bitmap(void) ""
>  ram_save_page(const char *rbname, uint64_t offset, void *host) "%s: offset: 0x%" PRIx64 " host: %p"
>  ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: 0x%zx len: 0x%zx"
> +ram_dirty_bitmap_reload(char *str) "%s"
>  
>  # migration/migration.c
>  await_return_path_close_on_source_close(void) ""
> @@ -89,6 +90,7 @@ migrate_fd_cancel(void) ""
>  migrate_handle_rp_req_pages(const char *rbname, size_t start, size_t len) "in %s at 0x%zx len 0x%zx"
>  migrate_pending(uint64_t size, uint64_t max, uint64_t post, uint64_t nonpost) "pending size %" PRIu64 " max %" PRIu64 " (post=%" PRIu64 " nonpost=%" PRIu64 ")"
>  migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
> +migrate_send_rp_recv_bitmap(char *name, int64_t size) "block '%s' size 0x%"PRIi64
>  migration_completion_file_err(void) ""
>  migration_completion_postcopy_end(void) ""
>  migration_completion_postcopy_end_after_complete(void) ""
> -- 
> 2.7.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Qemu-devel] [RFC v2 21/33] migration: new cmd MIG_CMD_POSTCOPY_RESUME
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 21/33] migration: new cmd MIG_CMD_POSTCOPY_RESUME Peter Xu
@ 2017-09-22 11:08   ` Dr. David Alan Gilbert
  2017-09-27 10:11     ` Peter Xu
  0 siblings, 1 reply; 86+ messages in thread
From: Dr. David Alan Gilbert @ 2017-09-22 11:08 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> Introduce this new command, to be sent when the source VM is ready to
> resume the paused migration.  What the destination does here is
> basically release the fault thread to continue servicing page faults.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/savevm.c     | 33 +++++++++++++++++++++++++++++++++
>  migration/savevm.h     |  1 +
>  migration/trace-events |  1 +
>  3 files changed, 35 insertions(+)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 7f77a31..e914346 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -77,6 +77,7 @@ enum qemu_vm_cmd {
>      MIG_CMD_POSTCOPY_RAM_DISCARD,  /* A list of pages to discard that
>                                        were previously sent during
>                                        precopy but are dirty. */
> +    MIG_CMD_POSTCOPY_RESUME,       /* resume postcopy on dest */
>      MIG_CMD_PACKAGED,          /* Send a wrapped stream within this stream */
>      MIG_CMD_RECV_BITMAP,       /* Request for recved bitmap on dst */
>      MIG_CMD_MAX
> @@ -95,6 +96,7 @@ static struct mig_cmd_args {
>      [MIG_CMD_POSTCOPY_RUN]     = { .len =  0, .name = "POSTCOPY_RUN" },
>      [MIG_CMD_POSTCOPY_RAM_DISCARD] = {
>                                     .len = -1, .name = "POSTCOPY_RAM_DISCARD" },
> +    [MIG_CMD_POSTCOPY_RESUME]  = { .len =  0, .name = "POSTCOPY_RESUME" },
>      [MIG_CMD_PACKAGED]         = { .len =  4, .name = "PACKAGED" },
>      [MIG_CMD_RECV_BITMAP]      = { .len = -1, .name = "RECV_BITMAP" },
>      [MIG_CMD_MAX]              = { .len = -1, .name = "MAX" },
> @@ -931,6 +933,12 @@ void qemu_savevm_send_postcopy_run(QEMUFile *f)
>      qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RUN, 0, NULL);
>  }
>  
> +void qemu_savevm_send_postcopy_resume(QEMUFile *f)
> +{
> +    trace_savevm_send_postcopy_resume();
> +    qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RESUME, 0, NULL);
> +}
> +
>  void qemu_savevm_send_recv_bitmap(QEMUFile *f, char *block_name)
>  {
>      size_t len;
> @@ -1682,6 +1690,28 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
>      return LOADVM_QUIT;
>  }
>  
> +static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
> +{
> +    if (mis->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
> +        error_report("%s: illegal resume received", __func__);
> +        /* Don't fail the load, only for this. */
> +        return 0;
> +    }
> +
> +    /*
> +     * This means the source VM is ready to resume the postcopy
> +     * migration.  It's time to switch state and release the fault
> +     * thread to continue servicing page faults.
> +     */
> +    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_RECOVER,
> +                      MIGRATION_STATUS_POSTCOPY_ACTIVE);
> +    qemu_sem_post(&mis->postcopy_pause_sem_fault);
> +
> +    /* TODO: Tell source that "we are ready" */
> +

You might want to add a trace in here; however,


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> +    return 0;
> +}
> +
>  /**
>   * Immediately following this command is a blob of data containing an embedded
>   * chunk of migration stream; read it and load it.
> @@ -1847,6 +1877,9 @@ static int loadvm_process_command(QEMUFile *f)
>      case MIG_CMD_POSTCOPY_RAM_DISCARD:
>          return loadvm_postcopy_ram_handle_discard(mis, len);
>  
> +    case MIG_CMD_POSTCOPY_RESUME:
> +        return loadvm_postcopy_handle_resume(mis);
> +
>      case MIG_CMD_RECV_BITMAP:
>          return loadvm_handle_recv_bitmap(mis, len);
>      }
> diff --git a/migration/savevm.h b/migration/savevm.h
> index 8126b1c..a5f3879 100644
> --- a/migration/savevm.h
> +++ b/migration/savevm.h
> @@ -46,6 +46,7 @@ int qemu_savevm_send_packaged(QEMUFile *f, const uint8_t *buf, size_t len);
>  void qemu_savevm_send_postcopy_advise(QEMUFile *f);
>  void qemu_savevm_send_postcopy_listen(QEMUFile *f);
>  void qemu_savevm_send_postcopy_run(QEMUFile *f);
> +void qemu_savevm_send_postcopy_resume(QEMUFile *f);
>  void qemu_savevm_send_recv_bitmap(QEMUFile *f, char *block_name);
>  
>  void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
> diff --git a/migration/trace-events b/migration/trace-events
> index 9960cd8..0a1c302 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -35,6 +35,7 @@ savevm_send_open_return_path(void) ""
>  savevm_send_ping(uint32_t val) "0x%x"
>  savevm_send_postcopy_listen(void) ""
>  savevm_send_postcopy_run(void) ""
> +savevm_send_postcopy_resume(void) ""
>  savevm_send_recv_bitmap(char *name) "%s"
>  savevm_state_setup(void) ""
>  savevm_state_header(void) ""
> -- 
> 2.7.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC v2 22/33] migration: new message MIG_RP_MSG_RESUME_ACK
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 22/33] migration: new message MIG_RP_MSG_RESUME_ACK Peter Xu
@ 2017-09-22 11:13   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 86+ messages in thread
From: Dr. David Alan Gilbert @ 2017-09-22 11:13 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> Create a new message to reply to MIG_CMD_POSTCOPY_RESUME.  One uint32_t
> is used as the payload to let the source know whether the destination is
> ready to continue the migration.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.c  | 37 +++++++++++++++++++++++++++++++++++++
>  migration/migration.h  |  3 +++
>  migration/savevm.c     |  3 ++-
>  migration/trace-events |  1 +
>  4 files changed, 43 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 625f19a..4dc564a 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -93,6 +93,7 @@ enum mig_rp_message_type {
>      MIG_RP_MSG_REQ_PAGES_ID, /* data (start: be64, len: be32, id: string) */
>      MIG_RP_MSG_REQ_PAGES,    /* data (start: be64, len: be32) */
>      MIG_RP_MSG_RECV_BITMAP,  /* send recved_bitmap back to source */
> +    MIG_RP_MSG_RESUME_ACK,   /* tell source that we are ready to resume */
>  
>      MIG_RP_MSG_MAX
>  };
> @@ -489,6 +490,14 @@ void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
>      trace_migrate_send_rp_recv_bitmap(block_name, res);
>  }
>  
> +void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value)
> +{
> +    uint32_t buf;
> +
> +    buf = cpu_to_be32(value);
> +    migrate_send_rp_message(mis, MIG_RP_MSG_RESUME_ACK, sizeof(buf), &buf);
> +}
> +
>  MigrationCapabilityStatusList *qmp_query_migrate_capabilities(Error **errp)
>  {
>      MigrationCapabilityStatusList *head = NULL;
> @@ -1613,6 +1622,7 @@ static struct rp_cmd_args {
>      [MIG_RP_MSG_REQ_PAGES]      = { .len = 12, .name = "REQ_PAGES" },
>      [MIG_RP_MSG_REQ_PAGES_ID]   = { .len = -1, .name = "REQ_PAGES_ID" },
>      [MIG_RP_MSG_RECV_BITMAP]    = { .len = -1, .name = "RECV_BITMAP" },
> +    [MIG_RP_MSG_RESUME_ACK]     = { .len =  4, .name = "RESUME_ACK" },
>      [MIG_RP_MSG_MAX]            = { .len = -1, .name = "MAX" },
>  };
>  
> @@ -1670,6 +1680,25 @@ static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name)
>      return ram_dirty_bitmap_reload(s, block);
>  }
>  
> +static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value)
> +{
> +    trace_source_return_path_thread_resume_ack(value);
> +
> +    if (value != MIGRATION_RESUME_ACK_VALUE) {
> +        error_report("%s: illegal resume_ack value %"PRIu32,
> +                     __func__, value);
> +        return -1;
> +    }
> +
> +    /* Now both sides are active. */
> +    migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_RECOVER,
> +                      MIGRATION_STATUS_POSTCOPY_ACTIVE);
> +
> +    /* TODO: notify send thread that time to continue send pages */
> +
> +    return 0;
> +}
> +
>  /*
>   * Handles messages sent on the return path towards the source VM
>   *
> @@ -1789,6 +1818,14 @@ retry:
>              }
>              break;
>  
> +        case MIG_RP_MSG_RESUME_ACK:
> +            tmp32 = ldl_be_p(buf);
> +            if (migrate_handle_rp_resume_ack(ms, tmp32)) {
> +                mark_source_rp_bad(ms);
> +                goto out;
> +            }
> +            break;
> +
>          default:
>              break;
>          }
> diff --git a/migration/migration.h b/migration/migration.h
> index 4051379..a3a0582 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -21,6 +21,8 @@
>  #include "qemu/coroutine_int.h"
>  #include "hw/qdev.h"
>  
> +#define  MIGRATION_RESUME_ACK_VALUE  (1)
> +
>  /* State for the incoming migration */
>  struct MigrationIncomingState {
>      QEMUFile *from_src_file;
> @@ -204,5 +206,6 @@ int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
>                                ram_addr_t start, size_t len);
>  void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
>                                   char *block_name);
> +void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
>  
>  #endif
> diff --git a/migration/savevm.c b/migration/savevm.c
> index e914346..7fd5390 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1707,7 +1707,8 @@ static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
>                        MIGRATION_STATUS_POSTCOPY_ACTIVE);
>      qemu_sem_post(&mis->postcopy_pause_sem_fault);
>  
> -    /* TODO: Tell source that "we are ready" */
> +    /* Tell source that "we are ready" */
> +    migrate_send_rp_resume_ack(mis, MIGRATION_RESUME_ACK_VALUE);
>  
>      return 0;
>  }
> diff --git a/migration/trace-events b/migration/trace-events
> index 0a1c302..a929bc7 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -117,6 +117,7 @@ source_return_path_thread_entry(void) ""
>  source_return_path_thread_loop_top(void) ""
>  source_return_path_thread_pong(uint32_t val) "0x%x"
>  source_return_path_thread_shut(uint32_t val) "0x%x"
> +source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32
>  migrate_global_state_post_load(const char *state) "loaded state: %s"
>  migrate_global_state_pre_save(const char *state) "saved state: %s"
>  migration_thread_low_pending(uint64_t pending) "%" PRIu64
> -- 
> 2.7.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC v2 23/33] migration: introduce SaveVMHandlers.resume_prepare
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 23/33] migration: introduce SaveVMHandlers.resume_prepare Peter Xu
@ 2017-09-22 11:17   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 86+ messages in thread
From: Dr. David Alan Gilbert @ 2017-09-22 11:17 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> This is the hook function to be called when a postcopy migration wants to
> resume from a failure. For each module, it should provide its own
> recovery logic before we switch to the postcopy-active state.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  include/migration/register.h |  2 ++
>  migration/migration.c        | 20 +++++++++++++++++++-
>  migration/savevm.c           | 25 +++++++++++++++++++++++++
>  migration/savevm.h           |  1 +
>  migration/trace-events       |  1 +
>  5 files changed, 48 insertions(+), 1 deletion(-)
> 
> diff --git a/include/migration/register.h b/include/migration/register.h
> index a0f1edd..b669362 100644
> --- a/include/migration/register.h
> +++ b/include/migration/register.h
> @@ -41,6 +41,8 @@ typedef struct SaveVMHandlers {
>      LoadStateHandler *load_state;
>      int (*load_setup)(QEMUFile *f, void *opaque);
>      int (*load_cleanup)(void *opaque);
> +    /* Called when postcopy migration wants to resume from failure */
> +    int (*resume_prepare)(MigrationState *s, void *opaque);
>  } SaveVMHandlers;
>  
>  int register_savevm_live(DeviceState *dev,
> diff --git a/migration/migration.c b/migration/migration.c
> index 4dc564a..19b7f3a5 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2172,7 +2172,25 @@ typedef enum MigThrError {
>  /* Return zero if success, or <0 for error */
>  static int postcopy_do_resume(MigrationState *s)
>  {
> -    /* TODO: do the resume logic */
> +    int ret;
> +
> +    /*
> +     * Call all the resume_prepare() hooks, so that modules can be
> +     * ready for the migration resume.
> +     */
> +    ret = qemu_savevm_state_resume_prepare(s);
> +    if (ret) {
> +        error_report("%s: resume_prepare() failure detected: %d",
> +                     __func__, ret);
> +        return ret;
> +    }
> +
> +    /*
> +     * TODO: handshake with dest using MIG_CMD_RESUME,
> +     * MIG_RP_MSG_RESUME_ACK, then switch source state to
> +     * "postcopy-active"
> +     */
> +
>      return 0;
>  }
>  
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 7fd5390..b86c9c6 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1004,6 +1004,31 @@ void qemu_savevm_state_setup(QEMUFile *f)
>      }
>  }
>  
> +int qemu_savevm_state_resume_prepare(MigrationState *s)
> +{
> +    SaveStateEntry *se;
> +    int ret;
> +
> +    trace_savevm_state_resume_prepare();
> +
> +    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> +        if (!se->ops || !se->ops->resume_prepare) {
> +            continue;
> +        }
> +        if (se->ops && se->ops->is_active) {
> +            if (!se->ops->is_active(se->opaque)) {
> +                continue;
> +            }
> +        }
> +        ret = se->ops->resume_prepare(s, se->opaque);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +    }
> +
> +    return 0;
> +}
> +
>  /*
>   * this function has three return values:
>   *   negative: there was one error, and we have -errno.
> diff --git a/migration/savevm.h b/migration/savevm.h
> index a5f3879..3193f04 100644
> --- a/migration/savevm.h
> +++ b/migration/savevm.h
> @@ -31,6 +31,7 @@
>  
>  bool qemu_savevm_state_blocked(Error **errp);
>  void qemu_savevm_state_setup(QEMUFile *f);
> +int qemu_savevm_state_resume_prepare(MigrationState *s);
>  void qemu_savevm_state_header(QEMUFile *f);
>  int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy);
>  void qemu_savevm_state_cleanup(void);
> diff --git a/migration/trace-events b/migration/trace-events
> index a929bc7..61b0d49 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -38,6 +38,7 @@ savevm_send_postcopy_run(void) ""
>  savevm_send_postcopy_resume(void) ""
>  savevm_send_recv_bitmap(char *name) "%s"
>  savevm_state_setup(void) ""
> +savevm_state_resume_prepare(void) ""
>  savevm_state_header(void) ""
>  savevm_state_iterate(void) ""
>  savevm_state_cleanup(void) ""
> -- 
> 2.7.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC v2 24/33] migration: synchronize dirty bitmap for resume
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 24/33] migration: synchronize dirty bitmap for resume Peter Xu
@ 2017-09-22 11:33   ` Dr. David Alan Gilbert
  2017-09-28  2:30     ` Peter Xu
  0 siblings, 1 reply; 86+ messages in thread
From: Dr. David Alan Gilbert @ 2017-09-22 11:33 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> This patch implements the first part of core RAM resume logic for
> postcopy.  ram_resume_prepare() is provided to do the work.
> 
> When the migration is interrupted by a network failure, the dirty bitmap
> on the source side will be meaningless, because even if a dirty bit is
> cleared, it is still possible that the sent page was lost along the way
> to the destination.  So instead of continuing the migration with the old
> dirty bitmap on the source, we ask the destination side to send back its
> received bitmap, then invert it to be our initial dirty bitmap.
> 
> The source side send thread will issue the MIG_CMD_RECV_BITMAP requests,
> once per ramblock, to ask for the received bitmap. On destination side,
> MIG_RP_MSG_RECV_BITMAP will be issued, along with the requested bitmap.
> Data will be received on the return-path thread of source, and the main
> migration thread will be notified when all the ramblock bitmaps are
> synchronized.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.c  |  4 +++
>  migration/migration.h  |  1 +
>  migration/ram.c        | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  migration/trace-events |  4 +++
>  4 files changed, 76 insertions(+)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 19b7f3a5..19aed72 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2605,6 +2605,8 @@ static void migration_instance_finalize(Object *obj)
>  
>      g_free(params->tls_hostname);
>      g_free(params->tls_creds);
> +
> +    qemu_sem_destroy(&ms->rp_state.rp_sem);
>  }
>  
>  static void migration_instance_init(Object *obj)
> @@ -2629,6 +2631,8 @@ static void migration_instance_init(Object *obj)
>      params->has_downtime_limit = true;
>      params->has_x_checkpoint_delay = true;
>      params->has_block_incremental = true;
> +
> +    qemu_sem_init(&ms->rp_state.rp_sem, 1);
>  }
>  
>  /*
> diff --git a/migration/migration.h b/migration/migration.h
> index a3a0582..d041369 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -107,6 +107,7 @@ struct MigrationState
>          QEMUFile     *from_dst_file;
>          QemuThread    rp_thread;
>          bool          error;
> +        QemuSemaphore rp_sem;
>      } rp_state;
>  
>      double mbps;
> diff --git a/migration/ram.c b/migration/ram.c
> index 5d938e3..afabcf5 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -47,6 +47,7 @@
>  #include "exec/target_page.h"
>  #include "qemu/rcu_queue.h"
>  #include "migration/colo.h"
> +#include "savevm.h"
>  
>  /***********************************************************/
>  /* ram save/restore */
> @@ -295,6 +296,8 @@ struct RAMState {
>      RAMBlock *last_req_rb;
>      /* Queue of outstanding page requests from the destination */
>      QemuMutex src_page_req_mutex;
> +    /* Number of ramblocks left to sync the dirty bitmap. Recovery only */
> +    int ramblock_to_sync;
>      QSIMPLEQ_HEAD(src_page_requests, RAMSrcPageRequest) src_page_requests;
>  };
>  typedef struct RAMState RAMState;
> @@ -2770,6 +2773,56 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>      return ret;
>  }
>  
> +/* Sync all the dirty bitmap with destination VM.  */
> +static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs)
> +{
> +    RAMBlock *block;
> +    QEMUFile *file = s->to_dst_file;
> +    int ramblock_count = 0;
> +
> +    trace_ram_dirty_bitmap_sync_start();
> +
> +    /*
> +     * We do this in such order:
> +     *
> +     * 1. calculate block count
> +     * 2. fill in the count to N
> +     * 3. send MIG_CMD_RECV_BITMAP requests
> +     * 4. wait on the semaphore until N -> 0
> +     */
> +
> +    RAMBLOCK_FOREACH(block) {
> +        ramblock_count++;
> +    }
> +
> +    atomic_set(&rs->ramblock_to_sync, ramblock_count);
> +    RAMBLOCK_FOREACH(block) {
> +        qemu_savevm_send_recv_bitmap(file, block->idstr);
> +    }
> +
> +    trace_ram_dirty_bitmap_sync_wait();

Please include the RAMBlock name in the trace, so if it hangs we can
see where.
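For reference, a per-block variant could look like this in migration/trace-events (hypothetical signatures, following the file's own syntax; the callers would pass block->idstr):

```
ram_dirty_bitmap_sync_start(const char *name) "%s"
ram_dirty_bitmap_sync_wait(const char *name) "%s"
```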

> +
> +    /* Wait until all the ramblocks' dirty bitmap synced */
> +    while (atomic_read(&rs->ramblock_to_sync)) {
> +        qemu_sem_wait(&s->rp_state.rp_sem);
> +    }

Do you need to make ramblock_to_sync global and use atomics - I think
you can simplify it;  if you qemu_sem_init to 0, then I think you
can do:
   while (ramblock_count--) {
       qemu_sem_wait(&s->rp_state.rp_sem);
   }

qemu_sem_wait will block until the semaphore is >0....

> +
> +    trace_ram_dirty_bitmap_sync_complete();
> +
> +    return 0;
> +}
> +
> +static void ram_dirty_bitmap_reload_notify(MigrationState *s)
> +{
> +    atomic_dec(&ram_state->ramblock_to_sync);
> +    if (ram_state->ramblock_to_sync == 0) {
> +        /* Make sure the other thread gets the latest */
> +        trace_ram_dirty_bitmap_sync_notify();
> +        qemu_sem_post(&s->rp_state.rp_sem);
> +    }

then with the suggestion above you just do a qemu_sem_post each time.

> +}
> +
>  /*
>   * Read the received bitmap, revert it as the initial dirty bitmap.
>   * This is only used when the postcopy migration is paused but wants
> @@ -2841,12 +2894,25 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
>  
>      trace_ram_dirty_bitmap_reload(block->idstr);
>  
> +    /*
> +     * We succeeded to sync bitmap for current ramblock. If this is
> +     * the last one to sync, we need to notify the main send thread.
> +     */
> +    ram_dirty_bitmap_reload_notify(s);
> +
>      ret = 0;
>  out:
>      free(le_bitmap);
>      return ret;
>  }
>  
> +static int ram_resume_prepare(MigrationState *s, void *opaque)
> +{
> +    RAMState *rs = *(RAMState **)opaque;
> +
> +    return ram_dirty_bitmap_sync_all(s, rs);
> +}
> +
>  static SaveVMHandlers savevm_ram_handlers = {
>      .save_setup = ram_save_setup,
>      .save_live_iterate = ram_save_iterate,
> @@ -2857,6 +2923,7 @@ static SaveVMHandlers savevm_ram_handlers = {
>      .save_cleanup = ram_save_cleanup,
>      .load_setup = ram_load_setup,
>      .load_cleanup = ram_load_cleanup,
> +    .resume_prepare = ram_resume_prepare,
>  };
>  
>  void ram_mig_init(void)
> diff --git a/migration/trace-events b/migration/trace-events
> index 61b0d49..8962916 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -81,6 +81,10 @@ ram_postcopy_send_discard_bitmap(void) ""
>  ram_save_page(const char *rbname, uint64_t offset, void *host) "%s: offset: 0x%" PRIx64 " host: %p"
>  ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: 0x%zx len: 0x%zx"
>  ram_dirty_bitmap_reload(char *str) "%s"
> +ram_dirty_bitmap_sync_start(void) ""
> +ram_dirty_bitmap_sync_wait(void) ""
> +ram_dirty_bitmap_sync_notify(void) ""
> +ram_dirty_bitmap_sync_complete(void) ""
>  
>  # migration/migration.c
>  await_return_path_close_on_source_close(void) ""
> -- 
> 2.7.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC v2 25/33] migration: setup ramstate for resume
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 25/33] migration: setup ramstate " Peter Xu
@ 2017-09-22 11:53   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 86+ messages in thread
From: Dr. David Alan Gilbert @ 2017-09-22 11:53 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> After we updated the dirty bitmaps of ramblocks, we also need to update
> the critical fields in RAMState to make sure it is ready for a resume.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/ram.c        | 37 ++++++++++++++++++++++++++++++++++++-
>  migration/trace-events |  1 +
>  2 files changed, 37 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index afabcf5..c5d9028 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1986,6 +1986,33 @@ static int ram_state_init(RAMState **rsp)
>      return 0;
>  }
>  
> +static void ram_state_resume_prepare(RAMState *rs)
> +{
> +    RAMBlock *block;
> +    long pages = 0;
> +
> +    /*
> +     * Postcopy is not using xbzrle/compression, so no need for that.
> +     * Also, since the source is already halted, we don't need to care
> +     * about dirty page logging either.
> +     */
> +
> +    RAMBLOCK_FOREACH(block) {
> +        pages += bitmap_count_one(block->bmap,
> +                                  block->used_length >> TARGET_PAGE_BITS);
> +    }
> +
> +    /* This may not be aligned with current bitmaps. Recalculate. */
> +    rs->migration_dirty_pages = pages;
> +
> +    rs->last_seen_block = NULL;
> +    rs->last_sent_block = NULL;
> +    rs->last_page = 0;
> +    rs->last_version = ram_list.version;
> +
> +    trace_ram_state_resume_prepare(pages);

Yes, I think this is fine;  I wonder what happens if pages is 0?
However,


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> +}
> +
>  /*
>   * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
>   * long-running RCU critical section.  When rcu-reclaims in the code
> @@ -2909,8 +2936,16 @@ out:
>  static int ram_resume_prepare(MigrationState *s, void *opaque)
>  {
>      RAMState *rs = *(RAMState **)opaque;
> +    int ret;
>  
> -    return ram_dirty_bitmap_sync_all(s, rs);
> +    ret = ram_dirty_bitmap_sync_all(s, rs);
> +    if (ret) {
> +        return ret;
> +    }
> +
> +    ram_state_resume_prepare(rs);
> +
> +    return 0;
>  }
>  
>  static SaveVMHandlers savevm_ram_handlers = {
> diff --git a/migration/trace-events b/migration/trace-events
> index 8962916..6e06283 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -85,6 +85,7 @@ ram_dirty_bitmap_sync_start(void) ""
>  ram_dirty_bitmap_sync_wait(void) ""
>  ram_dirty_bitmap_sync_notify(void) ""
>  ram_dirty_bitmap_sync_complete(void) ""
> +ram_state_resume_prepare(long v) "%ld"
>  
>  # migration/migration.c
>  await_return_path_close_on_source_close(void) ""
> -- 
> 2.7.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Qemu-devel] [RFC v2 26/33] migration: final handshake for the resume
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 26/33] migration: final handshake for the resume Peter Xu
@ 2017-09-22 11:56   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 86+ messages in thread
From: Dr. David Alan Gilbert @ 2017-09-22 11:56 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> Finish the last step of the recovery: the final handshake.
> 
> First, the source sends one MIG_CMD_RESUME to the dest, telling it that
> the source is ready to resume.
> 
> Then, the dest replies with MIG_RP_MSG_RESUME_ACK to the source, telling
> it that the dest is ready to resume (after switching to the
> postcopy-active state).
> 
> When the source receives the RESUME_ACK, it switches its state to
> postcopy-active, and the recovery is finally complete.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.c | 28 ++++++++++++++++++++++++----
>  1 file changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 19aed72..c9b7085 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1694,7 +1694,8 @@ static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value)
>      migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_RECOVER,
>                        MIGRATION_STATUS_POSTCOPY_ACTIVE);
>  
> -    /* TODO: notify send thread that time to continue send pages */
> +    /* Notify send thread that time to continue send pages */
> +    qemu_sem_post(&s->rp_state.rp_sem);
>  
>      return 0;
>  }
> @@ -2169,6 +2170,21 @@ typedef enum MigThrError {
>      MIG_THR_ERR_FATAL = 2,
>  } MigThrError;
>  
> +static int postcopy_resume_handshake(MigrationState *s)
> +{
> +    qemu_savevm_send_postcopy_resume(s->to_dst_file);
> +
> +    while (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
> +        qemu_sem_wait(&s->rp_state.rp_sem);
> +    }
> +
> +    if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
> +        return 0;
> +    }
> +
> +    return -1;
> +}
> +
>  /* Return zero if success, or <0 for error */
>  static int postcopy_do_resume(MigrationState *s)
>  {
> @@ -2186,10 +2202,14 @@ static int postcopy_do_resume(MigrationState *s)
>      }
>  
>      /*
> -     * TODO: handshake with dest using MIG_CMD_RESUME,
> -     * MIG_RP_MSG_RESUME_ACK, then switch source state to
> -     * "postcopy-active"
> +     * Last handshake with destination on the resume (destination will
> +     * switch to postcopy-active afterwards)
>       */
> +    ret = postcopy_resume_handshake(s);
> +    if (ret) {
> +        error_report("%s: handshake failed: %d", __func__, ret);
> +        return ret;
> +    }
>  
>      return 0;
>  }
> -- 
> 2.7.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC v2 27/33] migration: free SocketAddress where allocated
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 27/33] migration: free SocketAddress where allocated Peter Xu
@ 2017-09-22 20:08   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 86+ messages in thread
From: Dr. David Alan Gilbert @ 2017-09-22 20:08 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> Freeing the SocketAddress struct in socket_start_incoming_migration is
> slightly confusing. Let's free the address in the same context where we
> allocated it.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/socket.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/socket.c b/migration/socket.c
> index 757d382..9fc6cb3 100644
> --- a/migration/socket.c
> +++ b/migration/socket.c
> @@ -168,7 +168,6 @@ static void socket_start_incoming_migration(SocketAddress *saddr,
>  
>      if (qio_channel_socket_listen_sync(listen_ioc, saddr, errp) < 0) {
>          object_unref(OBJECT(listen_ioc));
> -        qapi_free_SocketAddress(saddr);
>          return;
>      }
>  
> @@ -177,7 +176,6 @@ static void socket_start_incoming_migration(SocketAddress *saddr,
>                            socket_accept_incoming_migration,
>                            listen_ioc,
>                            (GDestroyNotify)object_unref);
> -    qapi_free_SocketAddress(saddr);
>  }
>  
>  void tcp_start_incoming_migration(const char *host_port, Error **errp)
> @@ -188,10 +186,12 @@ void tcp_start_incoming_migration(const char *host_port, Error **errp)
>          socket_start_incoming_migration(saddr, &err);
>      }
>      error_propagate(errp, err);
> +    qapi_free_SocketAddress(saddr);
>  }
>  
>  void unix_start_incoming_migration(const char *path, Error **errp)
>  {
>      SocketAddress *saddr = unix_build_address(path);
>      socket_start_incoming_migration(saddr, errp);
> +    qapi_free_SocketAddress(saddr);
>  }
> -- 
> 2.7.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC v2 28/33] migration: return incoming task tag for sockets
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 28/33] migration: return incoming task tag for sockets Peter Xu
@ 2017-09-22 20:11   ` Dr. David Alan Gilbert
  2017-09-28  3:12     ` Peter Xu
  0 siblings, 1 reply; 86+ messages in thread
From: Dr. David Alan Gilbert @ 2017-09-22 20:11 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> For socket based incoming migration, we attach a background task to the
> main loop to handle accepting connections. We never had a way to destroy
> it early; it was only removed once the migration finished.
> 
> Let's allow socket_start_incoming_migration() to return the source tag
> of the listening async work, so that we may be able to clean it up in
> the future.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/socket.c | 36 ++++++++++++++++++++++++------------
>  migration/socket.h |  4 ++--
>  2 files changed, 26 insertions(+), 14 deletions(-)
> 
> diff --git a/migration/socket.c b/migration/socket.c
> index 9fc6cb3..6ee51ef 100644
> --- a/migration/socket.c
> +++ b/migration/socket.c
> @@ -158,8 +158,12 @@ out:
>  }
>  
>  
> -static void socket_start_incoming_migration(SocketAddress *saddr,
> -                                            Error **errp)
> +/*
> + * Returns the tag ID of the watch that is attached to global main
> + * loop (>0), or zero if failure detected.
> + */
> +static guint socket_start_incoming_migration(SocketAddress *saddr,
> +                                             Error **errp)
>  {
>      QIOChannelSocket *listen_ioc = qio_channel_socket_new();
>  
> @@ -168,30 +172,38 @@ static void socket_start_incoming_migration(SocketAddress *saddr,
>  
>      if (qio_channel_socket_listen_sync(listen_ioc, saddr, errp) < 0) {
>          object_unref(OBJECT(listen_ioc));
> -        return;
> +        return 0;
>      }
>  
> -    qio_channel_add_watch(QIO_CHANNEL(listen_ioc),
> -                          G_IO_IN,
> -                          socket_accept_incoming_migration,
> -                          listen_ioc,
> -                          (GDestroyNotify)object_unref);
> +    return qio_channel_add_watch(QIO_CHANNEL(listen_ioc),
> +                                 G_IO_IN,
> +                                 socket_accept_incoming_migration,
> +                                 listen_ioc,
> +                                 (GDestroyNotify)object_unref);
>  }
>  
> -void tcp_start_incoming_migration(const char *host_port, Error **errp)
> +guint tcp_start_incoming_migration(const char *host_port, Error **errp)
>  {
>      Error *err = NULL;
>      SocketAddress *saddr = tcp_build_address(host_port, &err);
> +    guint tag;
> +
>      if (!err) {
> -        socket_start_incoming_migration(saddr, &err);
> +        tag = socket_start_incoming_migration(saddr, &err);
>      }

I'd be tempted to initialise that tag = 0 for the case where
there's an error; but OK.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

>      error_propagate(errp, err);
>      qapi_free_SocketAddress(saddr);
> +
> +    return tag;
>  }
>  
> -void unix_start_incoming_migration(const char *path, Error **errp)
> +guint unix_start_incoming_migration(const char *path, Error **errp)
>  {
>      SocketAddress *saddr = unix_build_address(path);
> -    socket_start_incoming_migration(saddr, errp);
> +    guint tag;
> +
> +    tag = socket_start_incoming_migration(saddr, errp);
>      qapi_free_SocketAddress(saddr);
> +
> +    return tag;
>  }
> diff --git a/migration/socket.h b/migration/socket.h
> index 6b91e9d..bc8a59a 100644
> --- a/migration/socket.h
> +++ b/migration/socket.h
> @@ -16,12 +16,12 @@
>  
>  #ifndef QEMU_MIGRATION_SOCKET_H
>  #define QEMU_MIGRATION_SOCKET_H
> -void tcp_start_incoming_migration(const char *host_port, Error **errp);
> +guint tcp_start_incoming_migration(const char *host_port, Error **errp);
>  
>  void tcp_start_outgoing_migration(MigrationState *s, const char *host_port,
>                                    Error **errp);
>  
> -void unix_start_incoming_migration(const char *path, Error **errp);
> +guint unix_start_incoming_migration(const char *path, Error **errp);
>  
>  void unix_start_outgoing_migration(MigrationState *s, const char *path,
>                                     Error **errp);
> -- 
> 2.7.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC v2 29/33] migration: return incoming task tag for exec
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 29/33] migration: return incoming task tag for exec Peter Xu
@ 2017-09-22 20:15   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 86+ messages in thread
From: Dr. David Alan Gilbert @ 2017-09-22 20:15 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> Return the async task tag for exec-typed incoming migration in
> exec_start_incoming_migration().
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/exec.c | 18 +++++++++++-------
>  migration/exec.h |  2 +-
>  2 files changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/migration/exec.c b/migration/exec.c
> index 08b599e..ef1fb4c 100644
> --- a/migration/exec.c
> +++ b/migration/exec.c
> @@ -52,7 +52,11 @@ static gboolean exec_accept_incoming_migration(QIOChannel *ioc,
>      return FALSE; /* unregister */
>  }
>  
> -void exec_start_incoming_migration(const char *command, Error **errp)
> +/*
> + * Returns the tag ID of the watch that is attached to global main
> + * loop (>0), or zero if failure detected.
> + */
> +guint exec_start_incoming_migration(const char *command, Error **errp)
>  {
>      QIOChannel *ioc;
>      const char *argv[] = { "/bin/sh", "-c", command, NULL };
> @@ -62,13 +66,13 @@ void exec_start_incoming_migration(const char *command, Error **errp)
>                                                      O_RDWR,
>                                                      errp));
>      if (!ioc) {
> -        return;
> +        return 0;
>      }
>  
>      qio_channel_set_name(ioc, "migration-exec-incoming");
> -    qio_channel_add_watch(ioc,
> -                          G_IO_IN,
> -                          exec_accept_incoming_migration,
> -                          NULL,
> -                          NULL);
> +    return qio_channel_add_watch(ioc,
> +                                 G_IO_IN,
> +                                 exec_accept_incoming_migration,
> +                                 NULL,
> +                                 NULL);
>  }
> diff --git a/migration/exec.h b/migration/exec.h
> index b210ffd..0a7aada 100644
> --- a/migration/exec.h
> +++ b/migration/exec.h
> @@ -19,7 +19,7 @@
>  
>  #ifndef QEMU_MIGRATION_EXEC_H
>  #define QEMU_MIGRATION_EXEC_H
> -void exec_start_incoming_migration(const char *host_port, Error **errp);
> +guint exec_start_incoming_migration(const char *host_port, Error **errp);
>  
>  void exec_start_outgoing_migration(MigrationState *s, const char *host_port,
>                                     Error **errp);
> -- 
> 2.7.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC v2 30/33] migration: return incoming task tag for fd
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 30/33] migration: return incoming task tag for fd Peter Xu
@ 2017-09-22 20:15   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 86+ messages in thread
From: Dr. David Alan Gilbert @ 2017-09-22 20:15 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> Allow fd_start_incoming_migration() to return the task tag.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/fd.c | 18 +++++++++++-------
>  migration/fd.h |  2 +-
>  2 files changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/migration/fd.c b/migration/fd.c
> index 30f5258..e9a548c 100644
> --- a/migration/fd.c
> +++ b/migration/fd.c
> @@ -52,7 +52,11 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc,
>      return FALSE; /* unregister */
>  }
>  
> -void fd_start_incoming_migration(const char *infd, Error **errp)
> +/*
> + * Returns the tag ID of the watch that is attached to global main
> + * loop (>0), or zero if failure detected.
> + */
> +guint fd_start_incoming_migration(const char *infd, Error **errp)
>  {
>      QIOChannel *ioc;
>      int fd;
> @@ -63,13 +67,13 @@ void fd_start_incoming_migration(const char *infd, Error **errp)
>      ioc = qio_channel_new_fd(fd, errp);
>      if (!ioc) {
>          close(fd);
> -        return;
> +        return 0;
>      }
>  
>      qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-incoming");
> -    qio_channel_add_watch(ioc,
> -                          G_IO_IN,
> -                          fd_accept_incoming_migration,
> -                          NULL,
> -                          NULL);
> +    return qio_channel_add_watch(ioc,
> +                                 G_IO_IN,
> +                                 fd_accept_incoming_migration,
> +                                 NULL,
> +                                 NULL);
>  }
> diff --git a/migration/fd.h b/migration/fd.h
> index a14a63c..94cdea8 100644
> --- a/migration/fd.h
> +++ b/migration/fd.h
> @@ -16,7 +16,7 @@
>  
>  #ifndef QEMU_MIGRATION_FD_H
>  #define QEMU_MIGRATION_FD_H
> -void fd_start_incoming_migration(const char *path, Error **errp);
> +guint fd_start_incoming_migration(const char *path, Error **errp);
>  
>  void fd_start_outgoing_migration(MigrationState *s, const char *fdname,
>                                   Error **errp);
> -- 
> 2.7.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC v2 31/33] migration: store listen task tag
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 31/33] migration: store listen task tag Peter Xu
@ 2017-09-22 20:17   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 86+ messages in thread
From: Dr. David Alan Gilbert @ 2017-09-22 20:17 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> Store the task tag for the tcp/unix/fd/exec migration types in the
> current MigrationIncomingState struct.
> 
> For deferred migration, there is no need to store a task tag since there
> is no task running in the main loop at all. For RDMA, let's mark it as a
> todo.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.c | 22 ++++++++++++++++++----
>  migration/migration.h |  2 ++
>  2 files changed, 20 insertions(+), 4 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index c9b7085..daf356b 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -171,6 +171,7 @@ void migration_incoming_state_destroy(void)
>          mis->from_src_file = NULL;
>      }
>  
> +    mis->listen_task_tag = 0;
>      qemu_event_destroy(&mis->main_thread_load_event);
>  }
>  
> @@ -265,25 +266,31 @@ int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char *rbname,
>  void qemu_start_incoming_migration(const char *uri, Error **errp)
>  {
>      const char *p;
> +    guint task_tag = 0;
> +    MigrationIncomingState *mis = migration_incoming_get_current();
>  
>      qapi_event_send_migration(MIGRATION_STATUS_SETUP, &error_abort);
>      if (!strcmp(uri, "defer")) {
>          deferred_incoming_migration(errp);
>      } else if (strstart(uri, "tcp:", &p)) {
> -        tcp_start_incoming_migration(p, errp);
> +        task_tag = tcp_start_incoming_migration(p, errp);
>  #ifdef CONFIG_RDMA
>      } else if (strstart(uri, "rdma:", &p)) {
> +        /* TODO: store task tag for RDMA migrations */
>          rdma_start_incoming_migration(p, errp);
>  #endif
>      } else if (strstart(uri, "exec:", &p)) {
> -        exec_start_incoming_migration(p, errp);
> +        task_tag = exec_start_incoming_migration(p, errp);
>      } else if (strstart(uri, "unix:", &p)) {
> -        unix_start_incoming_migration(p, errp);
> +        task_tag = unix_start_incoming_migration(p, errp);
>      } else if (strstart(uri, "fd:", &p)) {
> -        fd_start_incoming_migration(p, errp);
> +        task_tag = fd_start_incoming_migration(p, errp);
>      } else {
>          error_setg(errp, "unknown migration protocol: %s", uri);
> +        return;
>      }
> +
> +    mis->listen_task_tag = task_tag;
>  }
>  
>  static void process_incoming_migration_bh(void *opaque)
> @@ -422,6 +429,13 @@ void migration_fd_process_incoming(QEMUFile *f)
>          co = qemu_coroutine_create(process_incoming_migration_co, f);
>          qemu_coroutine_enter(co);
>      }
> +
> +    /*
> +     * When reach here, we should not need the listening port any
> +     * more. We'll detach the listening task soon, let's reset the
> +     * listen task tag.
> +     */
> +    mis->listen_task_tag = 0;
>  }
>  
>  /*
> diff --git a/migration/migration.h b/migration/migration.h
> index d041369..1f4faef 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -26,6 +26,8 @@
>  /* State for the incoming migration */
>  struct MigrationIncomingState {
>      QEMUFile *from_src_file;
> +    /* Task tag for incoming listen port. Valid when >0. */
> +    guint listen_task_tag;
>  
>      /*
>       * Free at the start of the main state load, set as the main thread finishes
> -- 
> 2.7.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC v2 32/33] migration: allow migrate_incoming for paused VM
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 32/33] migration: allow migrate_incoming for paused VM Peter Xu
@ 2017-09-22 20:32   ` Dr. David Alan Gilbert
  2017-09-28  6:54     ` Peter Xu
  0 siblings, 1 reply; 86+ messages in thread
From: Dr. David Alan Gilbert @ 2017-09-22 20:32 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> The migrate_incoming command was previously only usable when
> "-incoming defer" was given on the command line, to defer the creation
> of the incoming migration channel.
> 
> However there is a similar requirement when we are paused during postcopy
> migration. The old incoming channel might have been destroyed already;
> we may need a new channel for the recovery to happen.
> 
> This patch leverages the same interface, but allows the user to specify
> the incoming migration channel even for paused postcopy.
> 
> Meanwhile, migration listening ports are now always detached explicitly
> using the stored tag, rather than via the return values of dispatchers.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/exec.c      |  2 +-
>  migration/fd.c        |  2 +-
>  migration/migration.c | 39 +++++++++++++++++++++++++++++----------
>  migration/socket.c    |  2 +-
>  4 files changed, 32 insertions(+), 13 deletions(-)
> 
> diff --git a/migration/exec.c b/migration/exec.c
> index ef1fb4c..26fc37d 100644
> --- a/migration/exec.c
> +++ b/migration/exec.c
> @@ -49,7 +49,7 @@ static gboolean exec_accept_incoming_migration(QIOChannel *ioc,
>  {
>      migration_channel_process_incoming(ioc);
>      object_unref(OBJECT(ioc));
> -    return FALSE; /* unregister */
> +    return TRUE; /* keep it registered */
>  }
>  
>  /*
> diff --git a/migration/fd.c b/migration/fd.c
> index e9a548c..7d0aefa 100644
> --- a/migration/fd.c
> +++ b/migration/fd.c
> @@ -49,7 +49,7 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc,
>  {
>      migration_channel_process_incoming(ioc);
>      object_unref(OBJECT(ioc));
> -    return FALSE; /* unregister */
> +    return TRUE; /* keep it registered */
>  }
>  
>  /*
> diff --git a/migration/migration.c b/migration/migration.c
> index daf356b..5812478 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -175,6 +175,17 @@ void migration_incoming_state_destroy(void)
>      qemu_event_destroy(&mis->main_thread_load_event);
>  }
>  
> +static bool migrate_incoming_detach_listen(MigrationIncomingState *mis)
> +{
> +    if (mis->listen_task_tag) {
> +        /* Never fail */
> +        g_source_remove(mis->listen_task_tag);
> +        mis->listen_task_tag = 0;
> +        return true;
> +    }
> +    return false;
> +}
> +
>  static void migrate_generate_event(int new_state)
>  {
>      if (migrate_use_events()) {
> @@ -432,10 +443,9 @@ void migration_fd_process_incoming(QEMUFile *f)
>  
>      /*
>       * When reach here, we should not need the listening port any
> -     * more. We'll detach the listening task soon, let's reset the
> -     * listen task tag.
> +     * more.  Detach the listening port explicitly.
>       */
> -    mis->listen_task_tag = 0;
> +    migrate_incoming_detach_listen(mis);
>  }
>  
>  /*
> @@ -1291,14 +1301,25 @@ void migrate_del_blocker(Error *reason)
>  void qmp_migrate_incoming(const char *uri, Error **errp)
>  {
>      Error *local_err = NULL;
> -    static bool once = true;
> +    MigrationIncomingState *mis = migration_incoming_get_current();
>  
> -    if (!deferred_incoming) {
> -        error_setg(errp, "For use with '-incoming defer'");
> +    if (!deferred_incoming &&
> +        mis->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
> +        error_setg(errp, "For use with '-incoming defer'"
> +                   " or PAUSED postcopy migration only.");
>          return;
>      }
> -    if (!once) {
> -        error_setg(errp, "The incoming migration has already been started");

What guards against someone doing a migrate_incoming after the successful
completion of an incoming migration?
Also with RDMA the following won't happen so I'm not quite sure what
state we're in.

When we get to non-blocking commands it's also a bit interesting - we
could be getting an accept on the main thread at just the same time
this is going down the OOB side.

Dave

> +
> +    /*
> +     * Destroy existing listening task if exist. Logically this should
> +     * not really happen at all (for either deferred migration or
> +     * postcopy migration, we should both detached the listening
> +     * task). So raise an error but still we safely detach it.
> +     */
> +    if (migrate_incoming_detach_listen(mis)) {
> +        error_report("%s: detected existing listen channel, "
> +                     "while it should not exist", __func__);
> +        /* Continue */
>      }
>  
>      qemu_start_incoming_migration(uri, &local_err);
> @@ -1307,8 +1328,6 @@ void qmp_migrate_incoming(const char *uri, Error **errp)
>          error_propagate(errp, local_err);
>          return;
>      }
> -
> -    once = false;
>  }
>  
>  bool migration_is_blocked(Error **errp)
> diff --git a/migration/socket.c b/migration/socket.c
> index 6ee51ef..e3e453f 100644
> --- a/migration/socket.c
> +++ b/migration/socket.c
> @@ -154,7 +154,7 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc,
>  out:
>      /* Close listening socket as its no longer needed */
>      qio_channel_close(ioc, NULL);
> -    return FALSE; /* unregister */
> +    return TRUE; /* keep it registered */
>  }
>  
>  
> -- 
> 2.7.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC v2 33/33] migration: init dst in migration_object_init too
  2017-08-30  8:32 ` [Qemu-devel] [RFC v2 33/33] migration: init dst in migration_object_init too Peter Xu
@ 2017-09-22 20:37   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 86+ messages in thread
From: Dr. David Alan Gilbert @ 2017-09-22 20:37 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> Though we may not need it, we now init both the src/dst migration
> objects in migration_object_init() so that even the incoming migration
> object is thread safe (it was not before).
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.c | 27 ++++++++++++++-------------
>  1 file changed, 14 insertions(+), 13 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 5812478..7e9ccf0 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -103,6 +103,7 @@ enum mig_rp_message_type {
>     dynamic creation of migration */
>  
>  static MigrationState *current_migration;
> +static MigrationIncomingState *current_incoming;
>  
>  static bool migration_object_check(MigrationState *ms, Error **errp);
>  
> @@ -128,6 +129,17 @@ void migration_object_init(void)
>      if (ms->enforce_config_section) {
>          current_migration->send_configuration = true;
>      }
> +
> +    /*
> +     * Init the migrate incoming object as well no matter whether
> +     * we'll use it or not.
> +     */
> +    current_incoming = g_new0(MigrationIncomingState, 1);
> +    current_incoming->state = MIGRATION_STATUS_NONE;
> +    qemu_mutex_init(&current_incoming->rp_mutex);
> +    qemu_event_init(&current_incoming->main_thread_load_event, false);
> +    qemu_sem_init(&current_incoming->postcopy_pause_sem_dst, 0);
> +    qemu_sem_init(&current_incoming->postcopy_pause_sem_fault, 0);
>  }
>  
>  /* For outgoing */
> @@ -140,19 +152,8 @@ MigrationState *migrate_get_current(void)
>  
>  MigrationIncomingState *migration_incoming_get_current(void)
>  {
> -    static bool once;
> -    static MigrationIncomingState mis_current;
> -
> -    if (!once) {
> -        mis_current.state = MIGRATION_STATUS_NONE;
> -        memset(&mis_current, 0, sizeof(MigrationIncomingState));
> -        qemu_mutex_init(&mis_current.rp_mutex);
> -        qemu_event_init(&mis_current.main_thread_load_event, false);
> -        qemu_sem_init(&mis_current.postcopy_pause_sem_dst, 0);
> -        qemu_sem_init(&mis_current.postcopy_pause_sem_fault, 0);
> -        once = true;
> -    }
> -    return &mis_current;
> +    assert(current_incoming);
> +    return current_incoming;
>  }
>  
>  void migration_incoming_state_destroy(void)
> -- 
> 2.7.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC v2 05/33] migration: better error handling with QEMUFile
  2017-09-21 17:51   ` Dr. David Alan Gilbert
@ 2017-09-26  8:48     ` Peter Xu
  2017-09-26  8:53       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 86+ messages in thread
From: Peter Xu @ 2017-09-26  8:48 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

On Thu, Sep 21, 2017 at 06:51:37PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > If postcopy goes down for some reason, we can always see this on dst:
> > 
> >   qemu-system-x86_64: RP: Received invalid message 0x0000 length 0x0000
> > 
> > However, in most cases that's not the real issue. The problem is that
> > qemu_get_be16() has no way to show whether the returned data is valid,
> > and we _always_ assume it is. That's possibly not wise.
> > 
> > The best approach would be to refactor the QEMUFile interface so the
> > APIs can return an error when there is one. However, that needs quite a
> > bit of work and testing. For now, let's explicitly check the validity
> > first before using the data in all the qemu_get_*() call sites.
> > 
> > This patch tries to fix most of the cases I can see. Only with this can
> > we make sure we are processing valid data, and that we correctly
> > capture events when the channel goes down.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  migration/migration.c |  5 +++++
> >  migration/ram.c       | 22 ++++++++++++++++++----
> >  migration/savevm.c    | 41 +++++++++++++++++++++++++++++++++++++++--
> >  3 files changed, 62 insertions(+), 6 deletions(-)
> > 
> > diff --git a/migration/migration.c b/migration/migration.c
> > index c818412..92bf9b8 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -1543,6 +1543,11 @@ static void *source_return_path_thread(void *opaque)
> >          header_type = qemu_get_be16(rp);
> >          header_len = qemu_get_be16(rp);
> >  
> > +        if (qemu_file_get_error(rp)) {
> > +            mark_source_rp_bad(ms);
> > +            goto out;
> > +        }
> > +
> >          if (header_type >= MIG_RP_MSG_MAX ||
> >              header_type == MIG_RP_MSG_INVALID) {
> >              error_report("RP: Received invalid message 0x%04x length 0x%04x",
> > diff --git a/migration/ram.c b/migration/ram.c
> > index affb20c..7e20097 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -2417,7 +2417,7 @@ static int ram_load_postcopy(QEMUFile *f)
> >      void *last_host = NULL;
> >      bool all_zero = false;
> >  
> > -    while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
> > +    while (!(flags & RAM_SAVE_FLAG_EOS)) {
> 
> With this change, I don't see what's checking the result of:
>    ret = postcopy_place_page(...)
> at the bottom of the loop.

Indeed.

I'll add the check after postcopy_place_page().

> 
> >          ram_addr_t addr;
> >          void *host = NULL;
> >          void *page_buffer = NULL;
> > @@ -2426,6 +2426,16 @@ static int ram_load_postcopy(QEMUFile *f)
> >          uint8_t ch;
> >  
> >          addr = qemu_get_be64(f);
> > +
> > +        /*
> > +         * If qemu file error, we should stop here, and then "addr"
> > +         * may be invalid
> > +         */
> > +        ret = qemu_file_get_error(f);
> > +        if (ret) {
> > +            break;
> > +        }
> > +
> >          flags = addr & ~TARGET_PAGE_MASK;
> >          addr &= TARGET_PAGE_MASK;
> >  
> > @@ -2506,6 +2516,13 @@ static int ram_load_postcopy(QEMUFile *f)
> >              error_report("Unknown combination of migration flags: %#x"
> >                           " (postcopy mode)", flags);
> >              ret = -EINVAL;
> > +            break;
> > +        }
> > +
> > +        /* Detect for any possible file errors */
> > +        if (qemu_file_get_error(f)) {
> > +            ret = qemu_file_get_error(f);
> > +            break;
> >          }
> >  
> >          if (place_needed) {
> > @@ -2520,9 +2537,6 @@ static int ram_load_postcopy(QEMUFile *f)
> >                                            place_source, block);
> >              }
> >          }
> > -        if (!ret) {
> > -            ret = qemu_file_get_error(f);
> > -        }
> >      }
> >  
> >      return ret;
> > diff --git a/migration/savevm.c b/migration/savevm.c
> > index fdd15fa..7172f14 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -1720,6 +1720,11 @@ static int loadvm_process_command(QEMUFile *f)
> >      cmd = qemu_get_be16(f);
> >      len = qemu_get_be16(f);
> >  
> > +    /* Check validity before continue processing of cmds */
> > +    if (qemu_file_get_error(f)) {
> > +        return qemu_file_get_error(f);
> > +    }
> > +
> >      trace_loadvm_process_command(cmd, len);
> >      if (cmd >= MIG_CMD_MAX || cmd == MIG_CMD_INVALID) {
> >          error_report("MIG_CMD 0x%x unknown (len 0x%x)", cmd, len);
> > @@ -1785,6 +1790,7 @@ static int loadvm_process_command(QEMUFile *f)
> >   */
> >  static bool check_section_footer(QEMUFile *f, SaveStateEntry *se)
> >  {
> > +    int ret;
> >      uint8_t read_mark;
> >      uint32_t read_section_id;
> >  
> > @@ -1795,6 +1801,13 @@ static bool check_section_footer(QEMUFile *f, SaveStateEntry *se)
> >  
> >      read_mark = qemu_get_byte(f);
> >  
> > +    ret = qemu_file_get_error(f);
> > +    if (ret) {
> > +        error_report("%s: Read section footer failed: %d",
> > +                     __func__, ret);
> > +        return false;
> > +    }
> > +
> >      if (read_mark != QEMU_VM_SECTION_FOOTER) {
> >          error_report("Missing section footer for %s", se->idstr);
> >          return false;
> > @@ -1830,6 +1843,13 @@ qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
> >      instance_id = qemu_get_be32(f);
> >      version_id = qemu_get_be32(f);
> >  
> > +    ret = qemu_file_get_error(f);
> > +    if (ret) {
> > +        error_report("%s: Failed to read instance/version ID: %d",
> > +                     __func__, ret);
> > +        return ret;
> > +    }
> > +
> >      trace_qemu_loadvm_state_section_startfull(section_id, idstr,
> >              instance_id, version_id);
> >      /* Find savevm section */
> > @@ -1877,6 +1897,13 @@ qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis)
> >  
> >      section_id = qemu_get_be32(f);
> >  
> > +    ret = qemu_file_get_error(f);
> > +    if (ret) {
> > +        error_report("%s: Failed to read section ID: %d",
> > +                     __func__, ret);
> > +        return ret;
> > +    }
> > +
> >      trace_qemu_loadvm_state_section_partend(section_id);
> >      QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> >          if (se->load_section_id == section_id) {
> > @@ -1944,8 +1971,14 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
> >      uint8_t section_type;
> >      int ret = 0;
> >  
> > -    while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
> > -        ret = 0;
> > +    while (true) {
> > +        section_type = qemu_get_byte(f);
> > +
> > +        if (qemu_file_get_error(f)) {
> > +            ret = qemu_file_get_error(f);
> > +            break;
> > +        }
> > +
> >          trace_qemu_loadvm_state_section(section_type);
> >          switch (section_type) {
> >          case QEMU_VM_SECTION_START:
> > @@ -1969,6 +2002,10 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
> >                  goto out;
> >              }
> >              break;
> > +        case QEMU_VM_EOF:
> > +            /* This is the end of migration */
> > +            goto out;
> > +            break;
> 
> You don't need the goto and the break (although it does no harm).

I still need the goto to break the loop? (I think a single break will
only break the switch, but not the loop)

If I remove both "goto" and "break", it'll fall through to default, I
suppose that's not what we want?

> 
> Dave
> 
> >          default:
> >              error_report("Unknown savevm section type %d", section_type);
> >              ret = -EINVAL;
> > -- 
> > 2.7.4
> > 
> > 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Qemu-devel] [RFC v2 05/33] migration: better error handling with QEMUFile
  2017-09-26  8:48     ` Peter Xu
@ 2017-09-26  8:53       ` Dr. David Alan Gilbert
  2017-09-26  9:13         ` Peter Xu
  0 siblings, 1 reply; 86+ messages in thread
From: Dr. David Alan Gilbert @ 2017-09-26  8:53 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> On Thu, Sep 21, 2017 at 06:51:37PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > If the postcopy migration goes down for some reason, we can always see this on dst:
> > > 
> > >   qemu-system-x86_64: RP: Received invalid message 0x0000 length 0x0000
> > > 
> > > However in most cases that's not the real issue. The problem is that
> > > qemu_get_be16() has no way to show whether the returned data is valid or
> > > not, and we are _always_ assuming it is valid. That's possibly not wise.
> > > 
> > > The best approach to solve this would be: refactoring QEMUFile interface
> > > to allow the APIs to return error if there is. However it needs quite a
> > > bit of work and testing. For now, let's explicitly check the validity
> > > first before using the data in all places for qemu_get_*().
> > > 
> > > This patch tries to fix most of the cases I can see. Only if we are with
> > > this, can we make sure we are processing the valid data, and also can we
> > > make sure we can capture the channel down events correctly.
> > > 
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > >  migration/migration.c |  5 +++++
> > >  migration/ram.c       | 22 ++++++++++++++++++----
> > >  migration/savevm.c    | 41 +++++++++++++++++++++++++++++++++++++++--
> > >  3 files changed, 62 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index c818412..92bf9b8 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -1543,6 +1543,11 @@ static void *source_return_path_thread(void *opaque)
> > >          header_type = qemu_get_be16(rp);
> > >          header_len = qemu_get_be16(rp);
> > >  
> > > +        if (qemu_file_get_error(rp)) {
> > > +            mark_source_rp_bad(ms);
> > > +            goto out;
> > > +        }
> > > +
> > >          if (header_type >= MIG_RP_MSG_MAX ||
> > >              header_type == MIG_RP_MSG_INVALID) {
> > >              error_report("RP: Received invalid message 0x%04x length 0x%04x",
> > > diff --git a/migration/ram.c b/migration/ram.c
> > > index affb20c..7e20097 100644
> > > --- a/migration/ram.c
> > > +++ b/migration/ram.c
> > > @@ -2417,7 +2417,7 @@ static int ram_load_postcopy(QEMUFile *f)
> > >      void *last_host = NULL;
> > >      bool all_zero = false;
> > >  
> > > -    while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
> > > +    while (!(flags & RAM_SAVE_FLAG_EOS)) {
> > 
> > With this change, I don't see what's checking the result of:
> >    ret = postcopy_place_page(...)
> > at the bottom of the loop.
> 
> Indeed.
> 
> I'll add the check after postcopy_place_page().
> 
> > 
> > >          ram_addr_t addr;
> > >          void *host = NULL;
> > >          void *page_buffer = NULL;
> > > @@ -2426,6 +2426,16 @@ static int ram_load_postcopy(QEMUFile *f)
> > >          uint8_t ch;
> > >  
> > >          addr = qemu_get_be64(f);
> > > +
> > > +        /*
> > > +         * If qemu file error, we should stop here, and then "addr"
> > > +         * may be invalid
> > > +         */
> > > +        ret = qemu_file_get_error(f);
> > > +        if (ret) {
> > > +            break;
> > > +        }
> > > +
> > >          flags = addr & ~TARGET_PAGE_MASK;
> > >          addr &= TARGET_PAGE_MASK;
> > >  
> > > @@ -2506,6 +2516,13 @@ static int ram_load_postcopy(QEMUFile *f)
> > >              error_report("Unknown combination of migration flags: %#x"
> > >                           " (postcopy mode)", flags);
> > >              ret = -EINVAL;
> > > +            break;
> > > +        }
> > > +
> > > +        /* Detect for any possible file errors */
> > > +        if (qemu_file_get_error(f)) {
> > > +            ret = qemu_file_get_error(f);
> > > +            break;
> > >          }
> > >  
> > >          if (place_needed) {
> > > @@ -2520,9 +2537,6 @@ static int ram_load_postcopy(QEMUFile *f)
> > >                                            place_source, block);
> > >              }
> > >          }
> > > -        if (!ret) {
> > > -            ret = qemu_file_get_error(f);
> > > -        }
> > >      }
> > >  
> > >      return ret;
> > > diff --git a/migration/savevm.c b/migration/savevm.c
> > > index fdd15fa..7172f14 100644
> > > --- a/migration/savevm.c
> > > +++ b/migration/savevm.c
> > > @@ -1720,6 +1720,11 @@ static int loadvm_process_command(QEMUFile *f)
> > >      cmd = qemu_get_be16(f);
> > >      len = qemu_get_be16(f);
> > >  
> > > +    /* Check validity before continue processing of cmds */
> > > +    if (qemu_file_get_error(f)) {
> > > +        return qemu_file_get_error(f);
> > > +    }
> > > +
> > >      trace_loadvm_process_command(cmd, len);
> > >      if (cmd >= MIG_CMD_MAX || cmd == MIG_CMD_INVALID) {
> > >          error_report("MIG_CMD 0x%x unknown (len 0x%x)", cmd, len);
> > > @@ -1785,6 +1790,7 @@ static int loadvm_process_command(QEMUFile *f)
> > >   */
> > >  static bool check_section_footer(QEMUFile *f, SaveStateEntry *se)
> > >  {
> > > +    int ret;
> > >      uint8_t read_mark;
> > >      uint32_t read_section_id;
> > >  
> > > @@ -1795,6 +1801,13 @@ static bool check_section_footer(QEMUFile *f, SaveStateEntry *se)
> > >  
> > >      read_mark = qemu_get_byte(f);
> > >  
> > > +    ret = qemu_file_get_error(f);
> > > +    if (ret) {
> > > +        error_report("%s: Read section footer failed: %d",
> > > +                     __func__, ret);
> > > +        return false;
> > > +    }
> > > +
> > >      if (read_mark != QEMU_VM_SECTION_FOOTER) {
> > >          error_report("Missing section footer for %s", se->idstr);
> > >          return false;
> > > @@ -1830,6 +1843,13 @@ qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
> > >      instance_id = qemu_get_be32(f);
> > >      version_id = qemu_get_be32(f);
> > >  
> > > +    ret = qemu_file_get_error(f);
> > > +    if (ret) {
> > > +        error_report("%s: Failed to read instance/version ID: %d",
> > > +                     __func__, ret);
> > > +        return ret;
> > > +    }
> > > +
> > >      trace_qemu_loadvm_state_section_startfull(section_id, idstr,
> > >              instance_id, version_id);
> > >      /* Find savevm section */
> > > @@ -1877,6 +1897,13 @@ qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis)
> > >  
> > >      section_id = qemu_get_be32(f);
> > >  
> > > +    ret = qemu_file_get_error(f);
> > > +    if (ret) {
> > > +        error_report("%s: Failed to read section ID: %d",
> > > +                     __func__, ret);
> > > +        return ret;
> > > +    }
> > > +
> > >      trace_qemu_loadvm_state_section_partend(section_id);
> > >      QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> > >          if (se->load_section_id == section_id) {
> > > @@ -1944,8 +1971,14 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
> > >      uint8_t section_type;
> > >      int ret = 0;
> > >  
> > > -    while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
> > > -        ret = 0;
> > > +    while (true) {
> > > +        section_type = qemu_get_byte(f);
> > > +
> > > +        if (qemu_file_get_error(f)) {
> > > +            ret = qemu_file_get_error(f);
> > > +            break;
> > > +        }
> > > +
> > >          trace_qemu_loadvm_state_section(section_type);
> > >          switch (section_type) {
> > >          case QEMU_VM_SECTION_START:
> > > @@ -1969,6 +2002,10 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
> > >                  goto out;
> > >              }
> > >              break;
> > > +        case QEMU_VM_EOF:
> > > +            /* This is the end of migration */
> > > +            goto out;
> > > +            break;
> > 
> > You don't need the goto and the break (although it does no harm).
> 
> I still need the goto to break the loop? (I think a single break will
> only break the switch, but not the loop)
> 
> If I remove both "goto" and "break", it'll fall through to default, I
> suppose that's not what we want?

No, but if you have the 'goto' you don't need the 'break'.

Dave

> > 
> > Dave
> > 
> > >          default:
> > >              error_report("Unknown savevm section type %d", section_type);
> > >              ret = -EINVAL;
> > > -- 
> > > 2.7.4
> > > 
> > > 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC v2 05/33] migration: better error handling with QEMUFile
  2017-09-26  8:53       ` Dr. David Alan Gilbert
@ 2017-09-26  9:13         ` Peter Xu
  0 siblings, 0 replies; 86+ messages in thread
From: Peter Xu @ 2017-09-26  9:13 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

On Tue, Sep 26, 2017 at 09:53:44AM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Thu, Sep 21, 2017 at 06:51:37PM +0100, Dr. David Alan Gilbert wrote:

[...]

> > > > @@ -1969,6 +2002,10 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
> > > >                  goto out;
> > > >              }
> > > >              break;
> > > > +        case QEMU_VM_EOF:
> > > > +            /* This is the end of migration */
> > > > +            goto out;
> > > > +            break;
> > > 
> > > You don't need the goto and the break (although it does no harm).
> > 
> > I still need the goto to break the loop? (I think a single break will
> > only break the switch, but not the loop)
> > 
> > If I remove both "goto" and "break", it'll fall through to default, I
> > suppose that's not what we want?
> 
> No, but if you have the 'goto' you don't need the 'break'.

I see.  Let me remove the "break" after "goto" then.  Thanks,

-- 
Peter Xu


* Re: [Qemu-devel] [RFC v2 09/33] migration: implement "postcopy-pause" src logic
  2017-09-21 19:21   ` Dr. David Alan Gilbert
@ 2017-09-26  9:35     ` Peter Xu
  2017-10-09 15:32       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 86+ messages in thread
From: Peter Xu @ 2017-09-26  9:35 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

On Thu, Sep 21, 2017 at 08:21:45PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > Now when the network goes down during postcopy, the source side will not fail the
> > migration. Instead we convert the status into this new paused state, and
> > we will try to wait for a rescue in the future.
> > 
> > If a recovery is detected, migration_thread() will reset its local
> > variables to prepare for that.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  migration/migration.c  | 98 +++++++++++++++++++++++++++++++++++++++++++++++---
> >  migration/migration.h  |  3 ++
> >  migration/trace-events |  1 +
> >  3 files changed, 98 insertions(+), 4 deletions(-)
> > 
> > diff --git a/migration/migration.c b/migration/migration.c
> > index f6130db..8d26ea8 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -993,6 +993,8 @@ static void migrate_fd_cleanup(void *opaque)
> >  
> >      notifier_list_notify(&migration_state_notifiers, s);
> >      block_cleanup_parameters(s);
> > +
> > +    qemu_sem_destroy(&s->postcopy_pause_sem);
> >  }
> >  
> >  void migrate_fd_error(MigrationState *s, const Error *error)
> > @@ -1136,6 +1138,7 @@ MigrationState *migrate_init(void)
> >      s->migration_thread_running = false;
> >      error_free(s->error);
> >      s->error = NULL;
> > +    qemu_sem_init(&s->postcopy_pause_sem, 0);
> >  
> >      migrate_set_state(&s->state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
> >  
> > @@ -1938,6 +1941,80 @@ bool migrate_colo_enabled(void)
> >      return s->enabled_capabilities[MIGRATION_CAPABILITY_X_COLO];
> >  }
> >  
> > +typedef enum MigThrError {
> > +    /* No error detected */
> > +    MIG_THR_ERR_NONE = 0,
> > +    /* Detected error, but resumed successfully */
> > +    MIG_THR_ERR_RECOVERED = 1,
> > +    /* Detected fatal error, need to exit */
> > +    MIG_THR_ERR_FATAL = 2,
> 
> I don't think it's necessary to assign the values there, but it's OK.
> 
> > +} MigThrError;
> > +
> > +/*
> > + * We don't return until we are in a safe state to continue current
> > + * postcopy migration.  Returns MIG_THR_ERR_RECOVERED if recovered, or
> > + * MIG_THR_ERR_FATAL if an unrecoverable failure happened.
> > + */
> > +static MigThrError postcopy_pause(MigrationState *s)
> > +{
> > +    assert(s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
> > +    migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> > +                      MIGRATION_STATUS_POSTCOPY_PAUSED);
> > +
> > +    /* Current channel is possibly broken. Release it. */
> > +    assert(s->to_dst_file);
> > +    qemu_file_shutdown(s->to_dst_file);
> > +    qemu_fclose(s->to_dst_file);
> > +    s->to_dst_file = NULL;
> > +
> > +    error_report("Detected IO failure for postcopy. "
> > +                 "Migration paused.");
> > +
> > +    /*
> > +     * We wait until things fixed up. Then someone will setup the
> > +     * status back for us.
> > +     */
> > +    while (s->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
> > +        qemu_sem_wait(&s->postcopy_pause_sem);
> > +    }
> > +
> > +    trace_postcopy_pause_continued();
> > +
> > +    return MIG_THR_ERR_RECOVERED;
> > +}
> > +
> > +static MigThrError migration_detect_error(MigrationState *s)
> > +{
> > +    int ret;
> > +
> > +    /* Try to detect any file errors */
> > +    ret = qemu_file_get_error(s->to_dst_file);
> > +
> > +    if (!ret) {
> > +        /* Everything is fine */
> > +        return MIG_THR_ERR_NONE;
> > +    }
> > +
> > +    if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret == -EIO) {
> 
> We do need to make sure that whenever we hit a failure in migration
> due to a device that we pass that up rather than calling
> qemu_file_set_error - e.g. an EIO in a block device or network.
> 
> However,
> 
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

I'll take the R-b first. :)

Regarding the above - aren't we currently detecting this kind of
error using -EIO?  And network down should be only one such case?

For now I still cannot distinguish a network outage from something
worse that cannot be recovered at all.  No matter what, the current
code will go into the PAUSED state as long as EIO is got.  I thought
about it, and for now I don't think it is a problem, since even if it
is a critical failure that cannot be recovered in any way, we still
won't lose anything by stopping the VM at once (that's what the paused
state does - the VM is just stopped).  For critical failures, we will
simply find that the recovery fails again rather than succeeds.

-- 
Peter Xu


* Re: [Qemu-devel] [RFC v2 10/33] migration: allow dst vm pause on postcopy
  2017-09-21 19:29   ` Dr. David Alan Gilbert
@ 2017-09-27  7:34     ` Peter Xu
  2017-10-09 18:58       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 86+ messages in thread
From: Peter Xu @ 2017-09-27  7:34 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

On Thu, Sep 21, 2017 at 08:29:03PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > When there is IO error on the incoming channel (e.g., network down),
> > instead of bailing out immediately, we allow the dst vm to switch to the
> > new POSTCOPY_PAUSE state. Currently it is still simple - it waits on the
> > new semaphore, until someone poke it for another attempt.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  migration/migration.c  |  1 +
> >  migration/migration.h  |  3 +++
> >  migration/savevm.c     | 60 ++++++++++++++++++++++++++++++++++++++++++++++++--
> >  migration/trace-events |  2 ++
> >  4 files changed, 64 insertions(+), 2 deletions(-)
> > 
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 8d26ea8..80de212 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -146,6 +146,7 @@ MigrationIncomingState *migration_incoming_get_current(void)
> >          memset(&mis_current, 0, sizeof(MigrationIncomingState));
> >          qemu_mutex_init(&mis_current.rp_mutex);
> >          qemu_event_init(&mis_current.main_thread_load_event, false);
> > +        qemu_sem_init(&mis_current.postcopy_pause_sem_dst, 0);
> >          once = true;
> >      }
> >      return &mis_current;
> > diff --git a/migration/migration.h b/migration/migration.h
> > index 0c957c9..c423682 100644
> > --- a/migration/migration.h
> > +++ b/migration/migration.h
> > @@ -60,6 +60,9 @@ struct MigrationIncomingState {
> >      /* The coroutine we should enter (back) after failover */
> >      Coroutine *migration_incoming_co;
> >      QemuSemaphore colo_incoming_sem;
> > +
> > +    /* notify PAUSED postcopy incoming migrations to try to continue */
> > +    QemuSemaphore postcopy_pause_sem_dst;
> >  };
> >  
> >  MigrationIncomingState *migration_incoming_get_current(void);
> > diff --git a/migration/savevm.c b/migration/savevm.c
> > index 7172f14..3777124 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -1488,8 +1488,8 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
> >   */
> >  static void *postcopy_ram_listen_thread(void *opaque)
> >  {
> > -    QEMUFile *f = opaque;
> >      MigrationIncomingState *mis = migration_incoming_get_current();
> > +    QEMUFile *f = mis->from_src_file;
> >      int load_res;
> >  
> >      migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> > @@ -1503,6 +1503,14 @@ static void *postcopy_ram_listen_thread(void *opaque)
> >       */
> >      qemu_file_set_blocking(f, true);
> >      load_res = qemu_loadvm_state_main(f, mis);
> > +
> > +    /*
> > +     * This is tricky, but, mis->from_src_file can change after it
> > +     * returns, when postcopy recovery happened. In the future, we may
> > +     * want a wrapper for the QEMUFile handle.
> > +     */
> > +    f = mis->from_src_file;
> > +
> >      /* And non-blocking again so we don't block in any cleanup */
> >      qemu_file_set_blocking(f, false);
> >  
> > @@ -1581,7 +1589,7 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
> >      /* Start up the listening thread and wait for it to signal ready */
> >      qemu_sem_init(&mis->listen_thread_sem, 0);
> >      qemu_thread_create(&mis->listen_thread, "postcopy/listen",
> > -                       postcopy_ram_listen_thread, mis->from_src_file,
> > +                       postcopy_ram_listen_thread, NULL,
> >                         QEMU_THREAD_DETACHED);
> >      qemu_sem_wait(&mis->listen_thread_sem);
> >      qemu_sem_destroy(&mis->listen_thread_sem);
> > @@ -1966,11 +1974,44 @@ void qemu_loadvm_state_cleanup(void)
> >      }
> >  }
> >  
> > +/* Return true if we should continue the migration, or false. */
> > +static bool postcopy_pause_incoming(MigrationIncomingState *mis)
> > +{
> > +    trace_postcopy_pause_incoming();
> > +
> > +    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> > +                      MIGRATION_STATUS_POSTCOPY_PAUSED);
> > +
> > +    assert(mis->from_src_file);
> > +    qemu_file_shutdown(mis->from_src_file);
> > +    qemu_fclose(mis->from_src_file);
> > +    mis->from_src_file = NULL;
> > +
> > +    assert(mis->to_src_file);
> > +    qemu_mutex_lock(&mis->rp_mutex);
> > +    qemu_file_shutdown(mis->to_src_file);
> 
> Should you not do the shutdown() before the lock?
> For example if the other thread is stuck, with rp_mutex
> held, trying to write to to_src_file, then you'll block
> waiting for the mutex.  If you call shutdown and then take
> the lock, the other thread will error and release the lock.

The problem is that IMHO QEMUFile is not yet thread-safe itself.  So
if we operate on it (even to shut it down) logically we need to have
the lock, right?

Then, IMHO the question would be: when will the send() be stuck in the
other thread?

Normally the only case I can think of is that the source didn't
recv() fast enough, and we consumed all the write buffer on the dst
side (I don't really know how the kernel manages the buffers though,
e.g. how the size of the buffer is decided...).

But when we reach here, the channel (say, from_src_file and
to_src_file, since both of them are using the same channel behind the
QEMUFile interface) should already be broken in some way, so IIUC
even if there is a send() in the other thread, it should return at
some point with a failure as well, just like how we reached here
(possibly due to a read() failure).

> 
> I'm not quite sure what will happen if we end up calling this
> before the main thread has been returned from postcopy and the
> device loading is complete.

IIUC you mean the window from when we got MIG_CMD_PACKAGED until the
main thread finishes handling that package?

Normally I think that should not matter much, since handling the
package can hardly fail (we are reading from a buffer QIO channel, no
real IO there)...  But I agree with the reasoning.  How about one more
patch to postpone the "active" to "postcopy-active" state change until
after the package is handled correctly?  Like:

--------------
diff --git a/migration/savevm.c b/migration/savevm.c                     
index b5c3214034..8317b2a7e2 100644 
--- a/migration/savevm.c            
+++ b/migration/savevm.c            
@@ -1573,8 +1573,6 @@ static void *postcopy_ram_listen_thread(void *opaque)                                                                       
     QEMUFile *f = mis->from_src_file;                                   
     int load_res;                  
                                    
-    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,             
-                                   MIGRATION_STATUS_POSTCOPY_ACTIVE);   
     qemu_sem_post(&mis->listen_thread_sem);                             
     trace_postcopy_ram_listen_thread_start();                           
                                    
@@ -1817,6 +1815,9 @@ static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis)                                                          
     qemu_fclose(packf);            
     object_unref(OBJECT(bioc));    
                                    
+    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,             
+                                   MIGRATION_STATUS_POSTCOPY_ACTIVE);   
+                                   
     return ret;                    
 }                                  
--------------

This function will only be called in the "postcopy-active" state.

> 
> Also, at this point have we guaranteed no one else is about
> to do an op on mis->to_src_file and will seg?

I think no?  Since IMHO the main thread is playing with the buffer QIO
channel, rather than the real one?

(btw, could I ask what's "seg"? :)

-- 
Peter Xu


* Re: [Qemu-devel] [RFC v2 20/33] migration: new message MIG_RP_MSG_RECV_BITMAP
  2017-09-22 11:05   ` Dr. David Alan Gilbert
@ 2017-09-27 10:04     ` Peter Xu
  2017-10-09 19:12       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 86+ messages in thread
From: Peter Xu @ 2017-09-27 10:04 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

On Fri, Sep 22, 2017 at 12:05:42PM +0100, Dr. David Alan Gilbert wrote:

[...]

> > diff --git a/migration/ram.c b/migration/ram.c
> > index 7e20097..5d938e3 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -182,6 +182,70 @@ void ramblock_recv_bitmap_clear(RAMBlock *rb, void *host_addr)
> >      clear_bit(ramblock_recv_bitmap_offset(host_addr, rb), rb->receivedmap);
> >  }
> >  
> > +#define  RAMBLOCK_RECV_BITMAP_ENDING  (0x0123456789abcdefULL)
> > +
> > +/*
> > + * Format: bitmap_size (8 bytes) + whole_bitmap (N bytes).
> > + *
> > + * Returns >0 if success with sent bytes, or <0 if error.
> > + */
> > +int64_t ramblock_recv_bitmap_send(QEMUFile *file,
> > +                                  const char *block_name)
> > +{
> > +    RAMBlock *block = qemu_ram_block_by_name(block_name);
> > +    unsigned long *le_bitmap, nbits;
> > +    uint64_t size;
> > +
> > +    if (!block) {
> > +        error_report("%s: invalid block name: %s", __func__, block_name);
> > +        return -1;
> > +    }
> > +
> > +    nbits = block->used_length >> TARGET_PAGE_BITS;
> > +
> > +    /*
> > +     * Make sure the tmp bitmap buffer is big enough, e.g., on 32bit
> > +     * machines we may need 4 more bytes for padding (see below
> > +     * comment). So extend it a bit before hand.
> > +     */
> > +    le_bitmap = bitmap_new(nbits + BITS_PER_LONG);
> 
> I do worry what will happen on really huge RAMBlocks; the worst case is
> that this temporary bitmap is a few GB.

IIUC the bitmap ratio is 32K (one bitmap byte covers 32K of guest
RAM), so the bitmap will reach 1GB only if the guest RAM region size
is 1GB * 32K = 32TB.

Then, can I just assume that allocating (only) 1GB of temporary memory
for a guest using more than 32TB of memory is not a problem? :-)

I hope I didn't calculate it wrongly though.

> 
> > +    /*
> > +     * Always use little endian when sending the bitmap. This is
> > +     * required when source and destination VMs are not using the
> > +     * same endianness. (Note: big endian won't work.)
> > +     */
> > +    bitmap_to_le(le_bitmap, block->receivedmap, nbits);
> > +
> > +    /* Size of the bitmap, in bytes */
> > +    size = nbits / 8;
> > +
> > +    /*
> > +     * size is always aligned to 8 bytes for 64bit machines, but it
> > +     * may not be true for 32bit machines. We need this padding to
> > +     * make sure the migration can survive even between 32bit and
> > +     * 64bit machines.
> > +     */
> > +    size = ROUND_UP(size, 8);
> > +
> > +    qemu_put_be64(file, size);
> > +    qemu_put_buffer(file, (const uint8_t *)le_bitmap, size);
> > +    /*
> > +     * Mark the end, in case the middle part is screwed up for
> > +     * some "mysterious" reason.
> > +     */
> > +    qemu_put_be64(file, RAMBLOCK_RECV_BITMAP_ENDING);
> > +    qemu_fflush(file);
> > +
> > +    free(le_bitmap);
> > +
> > +    if (qemu_file_get_error(file)) {
> > +        return qemu_file_get_error(file);
> > +    }
> > +
> > +    return size + sizeof(size);
> > +}
> > +
> >  /*
> >   * An outstanding page request, on the source, having been received
> >   * and queued
> > @@ -2706,6 +2770,83 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >      return ret;
> >  }
> >  
> > +/*
> > + * Read the received bitmap, invert it to be the initial dirty bitmap.
> > + * This is only used when the postcopy migration is paused but wants
> > + * to resume from a middle point.
> > + */
> > +int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
> > +{
> > +    int ret = -EINVAL;
> > +    QEMUFile *file = s->rp_state.from_dst_file;
> > +    unsigned long *le_bitmap, nbits = block->used_length >> TARGET_PAGE_BITS;
> > +    uint64_t local_size = nbits / 8;
> > +    uint64_t size, end_mark;
> > +
> > +    if (s->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
> > +        error_report("%s: incorrect state %s", __func__,
> > +                     MigrationStatus_lookup[s->state]);
> > +        return -EINVAL;
> > +    }
> > +
> > +    /*
> > +     * Note: see comments in ramblock_recv_bitmap_send() on why we
> > +     * need the endianness conversion, and the padding.
> > +     */
> > +    local_size = ROUND_UP(local_size, 8);
> > +
> > +    /* Add paddings */
> > +    le_bitmap = bitmap_new(nbits + BITS_PER_LONG);
> > +
> > +    size = qemu_get_be64(file);
> > +
> > +    /* The size of the bitmap should match with our ramblock */
> > +    if (size != local_size) {
> > +        error_report("%s: ramblock '%s' bitmap size mismatch "
> > +                     "(0x%lx != 0x%lx)", __func__, block->idstr,
> > +                     size, local_size);
> 
> You need to use PRIx64 formatters there - %lx isn't portable.

Yes. Fixing.

> 
> > +        ret = -EINVAL;
> > +        goto out;
> > +    }
> > +
> > +    size = qemu_get_buffer(file, (uint8_t *)le_bitmap, local_size);
> > +    end_mark = qemu_get_be64(file);
> > +
> > +    ret = qemu_file_get_error(file);
> > +    if (ret || size != local_size) {
> > +        error_report("%s: read bitmap failed for ramblock '%s': %d",
> > +                     __func__, block->idstr, ret);
> 
> You might like to include size/local_size in the error.

Will do.  Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Qemu-devel] [RFC v2 21/33] migration: new cmd MIG_CMD_POSTCOPY_RESUME
  2017-09-22 11:08   ` Dr. David Alan Gilbert
@ 2017-09-27 10:11     ` Peter Xu
  0 siblings, 0 replies; 86+ messages in thread
From: Peter Xu @ 2017-09-27 10:11 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

On Fri, Sep 22, 2017 at 12:08:06PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > Introduce this new command, to be sent when the source VM is ready to
> > resume the paused migration.  What the destination does here is
> > basically release the fault thread to continue servicing page faults.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  migration/savevm.c     | 33 +++++++++++++++++++++++++++++++++
> >  migration/savevm.h     |  1 +
> >  migration/trace-events |  1 +
> >  3 files changed, 35 insertions(+)
> > 
> > diff --git a/migration/savevm.c b/migration/savevm.c
> > index 7f77a31..e914346 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -77,6 +77,7 @@ enum qemu_vm_cmd {
> >      MIG_CMD_POSTCOPY_RAM_DISCARD,  /* A list of pages to discard that
> >                                        were previously sent during
> >                                        precopy but are dirty. */
> > +    MIG_CMD_POSTCOPY_RESUME,       /* resume postcopy on dest */
> >      MIG_CMD_PACKAGED,          /* Send a wrapped stream within this stream */
> >      MIG_CMD_RECV_BITMAP,       /* Request for recved bitmap on dst */
> >      MIG_CMD_MAX
> > @@ -95,6 +96,7 @@ static struct mig_cmd_args {
> >      [MIG_CMD_POSTCOPY_RUN]     = { .len =  0, .name = "POSTCOPY_RUN" },
> >      [MIG_CMD_POSTCOPY_RAM_DISCARD] = {
> >                                     .len = -1, .name = "POSTCOPY_RAM_DISCARD" },
> > +    [MIG_CMD_POSTCOPY_RESUME]  = { .len =  0, .name = "POSTCOPY_RESUME" },
> >      [MIG_CMD_PACKAGED]         = { .len =  4, .name = "PACKAGED" },
> >      [MIG_CMD_RECV_BITMAP]      = { .len = -1, .name = "RECV_BITMAP" },
> >      [MIG_CMD_MAX]              = { .len = -1, .name = "MAX" },
> > @@ -931,6 +933,12 @@ void qemu_savevm_send_postcopy_run(QEMUFile *f)
> >      qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RUN, 0, NULL);
> >  }
> >  
> > +void qemu_savevm_send_postcopy_resume(QEMUFile *f)
> > +{
> > +    trace_savevm_send_postcopy_resume();
> > +    qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RESUME, 0, NULL);
> > +}
> > +
> >  void qemu_savevm_send_recv_bitmap(QEMUFile *f, char *block_name)
> >  {
> >      size_t len;
> > @@ -1682,6 +1690,28 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
> >      return LOADVM_QUIT;
> >  }
> >  
> > +static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
> > +{
> > +    if (mis->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
> > +        error_report("%s: illegal resume received", __func__);
> > +        /* Don't fail the load, only for this. */
> > +        return 0;
> > +    }
> > +
> > +    /*
> > +     * This means the source VM is ready to resume the postcopy
> > +     * migration.  It's time to switch state and release the fault
> > +     * thread to continue servicing page faults.
> > +     */
> > +    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_RECOVER,
> > +                      MIGRATION_STATUS_POSTCOPY_ACTIVE);
> > +    qemu_sem_post(&mis->postcopy_pause_sem_fault);
> > +
> > +    /* TODO: Tell source that "we are ready" */
> > +
> 
> You might want to add a trace in here; however,

Added.

> 
> 
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Thanks!

-- 
Peter Xu


* Re: [Qemu-devel] [RFC v2 24/33] migration: synchronize dirty bitmap for resume
  2017-09-22 11:33   ` Dr. David Alan Gilbert
@ 2017-09-28  2:30     ` Peter Xu
  2017-10-02 11:04       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 86+ messages in thread
From: Peter Xu @ 2017-09-28  2:30 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

On Fri, Sep 22, 2017 at 12:33:19PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > This patch implements the first part of core RAM resume logic for
> > postcopy. ram_resume_prepare() is provided for the work.
> > 
> > When the migration is interrupted by a network failure, the dirty
> > bitmap on the source side will be meaningless, because even if a dirty
> > bit is cleared, it is still possible that the sent page was lost along
> > the way to the destination. So instead of continuing the migration with
> > the old dirty bitmap on the source, we ask the destination side to send
> > back its received bitmap, then invert it to be our initial dirty bitmap.
> > 
> > The source side send thread will issue the MIG_CMD_RECV_BITMAP requests,
> > once per ramblock, to ask for the received bitmap. On destination side,
> > MIG_RP_MSG_RECV_BITMAP will be issued, along with the requested bitmap.
> > Data will be received on the return-path thread of source, and the main
> > migration thread will be notified when all the ramblock bitmaps are
> > synchronized.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  migration/migration.c  |  4 +++
> >  migration/migration.h  |  1 +
> >  migration/ram.c        | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >  migration/trace-events |  4 +++
> >  4 files changed, 76 insertions(+)
> > 
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 19b7f3a5..19aed72 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -2605,6 +2605,8 @@ static void migration_instance_finalize(Object *obj)
> >  
> >      g_free(params->tls_hostname);
> >      g_free(params->tls_creds);
> > +
> > +    qemu_sem_destroy(&ms->rp_state.rp_sem);
> >  }
> >  
> >  static void migration_instance_init(Object *obj)
> > @@ -2629,6 +2631,8 @@ static void migration_instance_init(Object *obj)
> >      params->has_downtime_limit = true;
> >      params->has_x_checkpoint_delay = true;
> >      params->has_block_incremental = true;
> > +
> > +    qemu_sem_init(&ms->rp_state.rp_sem, 1);
> >  }
> >  
> >  /*
> > diff --git a/migration/migration.h b/migration/migration.h
> > index a3a0582..d041369 100644
> > --- a/migration/migration.h
> > +++ b/migration/migration.h
> > @@ -107,6 +107,7 @@ struct MigrationState
> >          QEMUFile     *from_dst_file;
> >          QemuThread    rp_thread;
> >          bool          error;
> > +        QemuSemaphore rp_sem;
> >      } rp_state;
> >  
> >      double mbps;
> > diff --git a/migration/ram.c b/migration/ram.c
> > index 5d938e3..afabcf5 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -47,6 +47,7 @@
> >  #include "exec/target_page.h"
> >  #include "qemu/rcu_queue.h"
> >  #include "migration/colo.h"
> > +#include "savevm.h"
> >  
> >  /***********************************************************/
> >  /* ram save/restore */
> > @@ -295,6 +296,8 @@ struct RAMState {
> >      RAMBlock *last_req_rb;
> >      /* Queue of outstanding page requests from the destination */
> >      QemuMutex src_page_req_mutex;
> > +    /* Ramblock counts to sync dirty bitmap. Only used for recovery */
> > +    int ramblock_to_sync;
> >      QSIMPLEQ_HEAD(src_page_requests, RAMSrcPageRequest) src_page_requests;
> >  };
> >  typedef struct RAMState RAMState;
> > @@ -2770,6 +2773,56 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >      return ret;
> >  }
> >  
> > +/* Sync all the dirty bitmap with destination VM.  */
> > +static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs)
> > +{
> > +    RAMBlock *block;
> > +    QEMUFile *file = s->to_dst_file;
> > +    int ramblock_count = 0;
> > +
> > +    trace_ram_dirty_bitmap_sync_start();
> > +
> > +    /*
> > +     * We do this in such order:
> > +     *
> > +     * 1. calculate block count
> > +     * 2. fill in the count to N
> > +     * 3. send MIG_CMD_RECV_BITMAP requests
> > +     * 4. wait on the semaphore until N -> 0
> > +     */
> > +
> > +    RAMBLOCK_FOREACH(block) {
> > +        ramblock_count++;
> > +    }
> > +
> > +    atomic_set(&rs->ramblock_to_sync, ramblock_count);
> > +    RAMBLOCK_FOREACH(block) {
> > +        qemu_savevm_send_recv_bitmap(file, block->idstr);
> > +    }
> > +
> > +    trace_ram_dirty_bitmap_sync_wait();
> 
> Please include the RAMBlock name in the trace, so if it hangs we can
> see where.

This one notes when we start to wait, while there is a trace below at
[1] when we reload each single ramblock.  Would that suffice?

> 
> > +
> > +    /* Wait until all the ramblocks' dirty bitmap synced */
> > +    while (atomic_read(&rs->ramblock_to_sync)) {
> > +        qemu_sem_wait(&s->rp_state.rp_sem);
> > +    }
> 
> Do you need to make ramblock_to_sync global and use atomics - I think
> you can simplify it;  if you qemu_sem_init to 0, then I think you
> can do:
>    while (ramblock_count--) {
>        qemu_sem_wait(&s->rp_state.rp_sem);
>    }
> 
> qemu_sem_wait will block until the semaphore is >0....

You are right!

> 
> > +
> > +    trace_ram_dirty_bitmap_sync_complete();
> > +
> > +    return 0;
> > +}
> > +
> > +static void ram_dirty_bitmap_reload_notify(MigrationState *s)
> > +{
> > +    atomic_dec(&ram_state->ramblock_to_sync);
> > +    if (ram_state->ramblock_to_sync == 0) {
> > +        /* Make sure the other thread gets the latest */
> > +        trace_ram_dirty_bitmap_sync_notify();
> > +        qemu_sem_post(&s->rp_state.rp_sem);
> > +    }
> 
> then with the suggestion above you just do a qemu_sem_post each time.

Yes.  I'll also remove the notify trace since there is a better
tracepoint before calling this function.

> 
> > +}
> > +
> >  /*
> >   * Read the received bitmap, invert it to be the initial dirty bitmap.
> >   * This is only used when the postcopy migration is paused but wants
> > @@ -2841,12 +2894,25 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
> >  
> >      trace_ram_dirty_bitmap_reload(block->idstr);

[1]

> >  
> > +    /*
> > +     * We have successfully synced the bitmap for the current
> > +     * ramblock. If this is the last one to sync, we need to notify
> > +     * the main send thread.
> > +     */
> > +    ram_dirty_bitmap_reload_notify(s);
> > +
> >      ret = 0;
> >  out:
> >      free(le_bitmap);
> >      return ret;
> >  }
> >  
> > +static int ram_resume_prepare(MigrationState *s, void *opaque)
> > +{
> > +    RAMState *rs = *(RAMState **)opaque;
> > +
> > +    return ram_dirty_bitmap_sync_all(s, rs);
> > +}
> > +
> >  static SaveVMHandlers savevm_ram_handlers = {
> >      .save_setup = ram_save_setup,
> >      .save_live_iterate = ram_save_iterate,
> > @@ -2857,6 +2923,7 @@ static SaveVMHandlers savevm_ram_handlers = {
> >      .save_cleanup = ram_save_cleanup,
> >      .load_setup = ram_load_setup,
> >      .load_cleanup = ram_load_cleanup,
> > +    .resume_prepare = ram_resume_prepare,
> >  };
> >  
> >  void ram_mig_init(void)
> > diff --git a/migration/trace-events b/migration/trace-events
> > index 61b0d49..8962916 100644
> > --- a/migration/trace-events
> > +++ b/migration/trace-events
> > @@ -81,6 +81,10 @@ ram_postcopy_send_discard_bitmap(void) ""
> >  ram_save_page(const char *rbname, uint64_t offset, void *host) "%s: offset: 0x%" PRIx64 " host: %p"
> >  ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: 0x%zx len: 0x%zx"
> >  ram_dirty_bitmap_reload(char *str) "%s"
> > +ram_dirty_bitmap_sync_start(void) ""
> > +ram_dirty_bitmap_sync_wait(void) ""
> > +ram_dirty_bitmap_sync_notify(void) ""
> > +ram_dirty_bitmap_sync_complete(void) ""
> >  
> >  # migration/migration.c
> >  await_return_path_close_on_source_close(void) ""
> > -- 
> > 2.7.4
> > 
> > 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

-- 
Peter Xu


* Re: [Qemu-devel] [RFC v2 28/33] migration: return incoming task tag for sockets
  2017-09-22 20:11   ` Dr. David Alan Gilbert
@ 2017-09-28  3:12     ` Peter Xu
  0 siblings, 0 replies; 86+ messages in thread
From: Peter Xu @ 2017-09-28  3:12 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

On Fri, Sep 22, 2017 at 09:11:50PM +0100, Dr. David Alan Gilbert wrote:

[...]

> > -void tcp_start_incoming_migration(const char *host_port, Error **errp)
> > +guint tcp_start_incoming_migration(const char *host_port, Error **errp)
> >  {
> >      Error *err = NULL;
> >      SocketAddress *saddr = tcp_build_address(host_port, &err);
> > +    guint tag;
> > +
> >      if (!err) {
> > -        socket_start_incoming_migration(saddr, &err);
> > +        tag = socket_start_incoming_migration(saddr, &err);
> >      }
> 
> I'd be tempted to initialise that tag = 0 for the case where
> there's an error; but OK.

Yeah, I think it's worth a touch-up.

> 
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

I'll take the r-b after fixing the above.  Thanks!

-- 
Peter Xu


* Re: [Qemu-devel] [RFC v2 32/33] migration: allow migrate_incoming for paused VM
  2017-09-22 20:32   ` Dr. David Alan Gilbert
@ 2017-09-28  6:54     ` Peter Xu
  2017-10-09 17:28       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 86+ messages in thread
From: Peter Xu @ 2017-09-28  6:54 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

On Fri, Sep 22, 2017 at 09:32:28PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > migrate_incoming command is previously only used when we were providing
> > "-incoming defer" in the command line, to defer the incoming migration
> > channel creation.
> > 
> > However there is similar requirement when we are paused during postcopy
> > migration. The old incoming channel might have been destroyed already.
> > We may need another new channel for the recovery to happen.
> > 
> > This patch leveraged the same interface, but allows the user to specify
> > incoming migration channel even for paused postcopy.
> > 
> > Meanwhile, now migration listening ports are always detached manually
> > using the tag, rather than using return values of dispatchers.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  migration/exec.c      |  2 +-
> >  migration/fd.c        |  2 +-
> >  migration/migration.c | 39 +++++++++++++++++++++++++++++----------
> >  migration/socket.c    |  2 +-
> >  4 files changed, 32 insertions(+), 13 deletions(-)
> > 
> > diff --git a/migration/exec.c b/migration/exec.c
> > index ef1fb4c..26fc37d 100644
> > --- a/migration/exec.c
> > +++ b/migration/exec.c
> > @@ -49,7 +49,7 @@ static gboolean exec_accept_incoming_migration(QIOChannel *ioc,
> >  {
> >      migration_channel_process_incoming(ioc);
> >      object_unref(OBJECT(ioc));
> > -    return FALSE; /* unregister */
> > +    return TRUE; /* keep it registered */
> >  }
> >  
> >  /*
> > diff --git a/migration/fd.c b/migration/fd.c
> > index e9a548c..7d0aefa 100644
> > --- a/migration/fd.c
> > +++ b/migration/fd.c
> > @@ -49,7 +49,7 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc,
> >  {
> >      migration_channel_process_incoming(ioc);
> >      object_unref(OBJECT(ioc));
> > -    return FALSE; /* unregister */
> > +    return TRUE; /* keep it registered */
> >  }
> >  
> >  /*
> > diff --git a/migration/migration.c b/migration/migration.c
> > index daf356b..5812478 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -175,6 +175,17 @@ void migration_incoming_state_destroy(void)
> >      qemu_event_destroy(&mis->main_thread_load_event);
> >  }
> >  
> > +static bool migrate_incoming_detach_listen(MigrationIncomingState *mis)
> > +{
> > +    if (mis->listen_task_tag) {
> > +        /* Never fail */
> > +        g_source_remove(mis->listen_task_tag);
> > +        mis->listen_task_tag = 0;
> > +        return true;
> > +    }
> > +    return false;
> > +}
> > +
> >  static void migrate_generate_event(int new_state)
> >  {
> >      if (migrate_use_events()) {
> > @@ -432,10 +443,9 @@ void migration_fd_process_incoming(QEMUFile *f)
> >  
> >      /*
> >       * When reach here, we should not need the listening port any
> > -     * more. We'll detach the listening task soon, let's reset the
> > -     * listen task tag.
> > +     * more.  Detach the listening port explicitly.
> >       */
> > -    mis->listen_task_tag = 0;
> > +    migrate_incoming_detach_listen(mis);
> >  }
> >  
> >  /*
> > @@ -1291,14 +1301,25 @@ void migrate_del_blocker(Error *reason)
> >  void qmp_migrate_incoming(const char *uri, Error **errp)
> >  {
> >      Error *local_err = NULL;
> > -    static bool once = true;
> > +    MigrationIncomingState *mis = migration_incoming_get_current();
> >  
> > -    if (!deferred_incoming) {
> > -        error_setg(errp, "For use with '-incoming defer'");
> > +    if (!deferred_incoming &&
> > +        mis->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
> > +        error_setg(errp, "For use with '-incoming defer'"
> > +                   " or PAUSED postcopy migration only.");
> >          return;
> >      }
> > -    if (!once) {
> > -        error_setg(errp, "The incoming migration has already been started");
> 
> What guards against someone doing a migrate_incoming after the
> successful completion of an incoming migration?

If deferred incoming is not enabled, we should be protected by the
above check on (mis->state != MIGRATION_STATUS_POSTCOPY_PAUSED).  But
yes, I think this is a problem if deferred incoming is used.  Maybe I
should still keep the "once" check here for deferred migration, but I
think I can re-use the variable "deferred_incoming".  Please see below.

> Also with RDMA the following won't happen so I'm not quite sure what
> state we're in.

Indeed.  Currently there is still no good way to destroy the RDMA
accept handle easily, since it uses its own qemu_set_fd_handler() path
to set up accept ports.  But I think I can solve this problem together
with the issue below.

> 
> When we get to non-blocking commands it's also a bit interesting - we
> could be getting an accept on the main thread at just the same time
> this is going down the OOB side.

This is an interesting point.  Thanks for noticing that.

How about I do it the strict way, like this (hopefully this can solve
all the issues mentioned above):

qmp_migrate_incoming()
{
  if (deferred_incoming) {
    // PASS, deferred incoming is set, and never triggered
  } else if (state == POSTCOPY_PAUSED && listen_tag == 0) {
    // PASS, we don't have an accept port
  } else {
    // FAIL
  }

  qemu_start_incoming_migration(uri, &local_err);

  if (local_err) {
      error_propagate(errp, local_err);
      return;
  }

  // stop allowing this
  deferred_incoming = false;
}

To make sure it works, I may need to hack a unique listen tag for
RDMA for now, say, using (guint)(-1) to stand for the RDMA tag
(instead of really rewriting the RDMA code to use the watcher stuff
with real listen tags), like:

#define MIG_LISTEN_TAG_RDMA_FAKE ((guint)(-1))

bool migrate_incoming_detach_listen()
{
    if (listen_tag) {
        if (listen_tag != MIG_LISTEN_TAG_RDMA_FAKE) {
            // RDMA detaches its accept port by itself, so only
            // remove the GSource for non-RDMA channels
            g_source_remove(listen_tag);
        }
        }
        listen_tag = 0;
        return true;
    }
    return false;
}

Then listen_tag != 0 means that there is an accept port, and as long
as there is one port we don't allow changing it (as in the pseudo
qmp_migrate_incoming() code I wrote above).

Would this work?

> 
> Dave
> 
> > +
> > +    /*
> > +     * Destroy the existing listening task if one exists. Logically
> > +     * this should not happen at all (for both deferred migration and
> > +     * postcopy migration, we should have detached the listening task
> > +     * already). So raise an error, but still safely detach it.
> > +     */
> > +    if (migrate_incoming_detach_listen(mis)) {
> > +        error_report("%s: detected existing listen channel, "
> > +                     "while it should not exist", __func__);
> > +        /* Continue */
> >      }
> >  
> >      qemu_start_incoming_migration(uri, &local_err);
> > @@ -1307,8 +1328,6 @@ void qmp_migrate_incoming(const char *uri, Error **errp)
> >          error_propagate(errp, local_err);
> >          return;
> >      }
> > -
> > -    once = false;
> >  }
> >  
> >  bool migration_is_blocked(Error **errp)
> > diff --git a/migration/socket.c b/migration/socket.c
> > index 6ee51ef..e3e453f 100644
> > --- a/migration/socket.c
> > +++ b/migration/socket.c
> > @@ -154,7 +154,7 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc,
> >  out:
> >      /* Close listening socket as its no longer needed */
> >      qio_channel_close(ioc, NULL);
> > -    return FALSE; /* unregister */
> > +    return TRUE; /* keep it registered */
> >  }
> >  
> >  
> > -- 
> > 2.7.4
> > 
> > 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

-- 
Peter Xu


* Re: [Qemu-devel] [RFC v2 24/33] migration: synchronize dirty bitmap for resume
  2017-09-28  2:30     ` Peter Xu
@ 2017-10-02 11:04       ` Dr. David Alan Gilbert
  2017-10-09  3:55         ` Peter Xu
  0 siblings, 1 reply; 86+ messages in thread
From: Dr. David Alan Gilbert @ 2017-10-02 11:04 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> On Fri, Sep 22, 2017 at 12:33:19PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > This patch implements the first part of core RAM resume logic for
> > > postcopy. ram_resume_prepare() is provided for the work.
> > > 
> > > When the migration is interrupted by a network failure, the dirty
> > > bitmap on the source side will be meaningless, because even if a dirty
> > > bit is cleared, it is still possible that the sent page was lost along
> > > the way to the destination. So instead of continuing the migration with
> > > the old dirty bitmap on the source, we ask the destination side to send
> > > back its received bitmap, then invert it to be our initial dirty bitmap.
> > > 
> > > The source side send thread will issue the MIG_CMD_RECV_BITMAP requests,
> > > once per ramblock, to ask for the received bitmap. On destination side,
> > > MIG_RP_MSG_RECV_BITMAP will be issued, along with the requested bitmap.
> > > Data will be received on the return-path thread of source, and the main
> > > migration thread will be notified when all the ramblock bitmaps are
> > > synchronized.
> > > 
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > >  migration/migration.c  |  4 +++
> > >  migration/migration.h  |  1 +
> > >  migration/ram.c        | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  migration/trace-events |  4 +++
> > >  4 files changed, 76 insertions(+)
> > > 
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index 19b7f3a5..19aed72 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -2605,6 +2605,8 @@ static void migration_instance_finalize(Object *obj)
> > >  
> > >      g_free(params->tls_hostname);
> > >      g_free(params->tls_creds);
> > > +
> > > +    qemu_sem_destroy(&ms->rp_state.rp_sem);
> > >  }
> > >  
> > >  static void migration_instance_init(Object *obj)
> > > @@ -2629,6 +2631,8 @@ static void migration_instance_init(Object *obj)
> > >      params->has_downtime_limit = true;
> > >      params->has_x_checkpoint_delay = true;
> > >      params->has_block_incremental = true;
> > > +
> > > +    qemu_sem_init(&ms->rp_state.rp_sem, 1);
> > >  }
> > >  
> > >  /*
> > > diff --git a/migration/migration.h b/migration/migration.h
> > > index a3a0582..d041369 100644
> > > --- a/migration/migration.h
> > > +++ b/migration/migration.h
> > > @@ -107,6 +107,7 @@ struct MigrationState
> > >          QEMUFile     *from_dst_file;
> > >          QemuThread    rp_thread;
> > >          bool          error;
> > > +        QemuSemaphore rp_sem;
> > >      } rp_state;
> > >  
> > >      double mbps;
> > > diff --git a/migration/ram.c b/migration/ram.c
> > > index 5d938e3..afabcf5 100644
> > > --- a/migration/ram.c
> > > +++ b/migration/ram.c
> > > @@ -47,6 +47,7 @@
> > >  #include "exec/target_page.h"
> > >  #include "qemu/rcu_queue.h"
> > >  #include "migration/colo.h"
> > > +#include "savevm.h"
> > >  
> > >  /***********************************************************/
> > >  /* ram save/restore */
> > > @@ -295,6 +296,8 @@ struct RAMState {
> > >      RAMBlock *last_req_rb;
> > >      /* Queue of outstanding page requests from the destination */
> > >      QemuMutex src_page_req_mutex;
> > > +    /* Ramblock counts to sync dirty bitmap. Only used for recovery */
> > > +    int ramblock_to_sync;
> > >      QSIMPLEQ_HEAD(src_page_requests, RAMSrcPageRequest) src_page_requests;
> > >  };
> > >  typedef struct RAMState RAMState;
> > > @@ -2770,6 +2773,56 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> > >      return ret;
> > >  }
> > >  
> > > +/* Sync all the dirty bitmap with destination VM.  */
> > > +static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs)
> > > +{
> > > +    RAMBlock *block;
> > > +    QEMUFile *file = s->to_dst_file;
> > > +    int ramblock_count = 0;
> > > +
> > > +    trace_ram_dirty_bitmap_sync_start();
> > > +
> > > +    /*
> > > +     * We do this in such order:
> > > +     *
> > > +     * 1. calculate block count
> > > +     * 2. fill in the count to N
> > > +     * 3. send MIG_CMD_RECV_BITMAP requests
> > > +     * 4. wait on the semaphore until N -> 0
> > > +     */
> > > +
> > > +    RAMBLOCK_FOREACH(block) {
> > > +        ramblock_count++;
> > > +    }
> > > +
> > > +    atomic_set(&rs->ramblock_to_sync, ramblock_count);
> > > +    RAMBLOCK_FOREACH(block) {
> > > +        qemu_savevm_send_recv_bitmap(file, block->idstr);
> > > +    }
> > > +
> > > +    trace_ram_dirty_bitmap_sync_wait();
> > 
> > Please include the RAMBlock name in the trace, so if it hangs we can
> > see where.
> 
> This is to note when we start to wait, while there is a trace below
> when we reload one single ramblock at [1].  Would that suffice?

If you easily have the name, it's worth including it in the trace before
you wait, so that if it fails and never gets out of this wait we'd
have the trace telling us the block it was waiting in.

Dave

> > 
> > > +
> > > +    /* Wait until all the ramblocks' dirty bitmap synced */
> > > +    while (atomic_read(&rs->ramblock_to_sync)) {
> > > +        qemu_sem_wait(&s->rp_state.rp_sem);
> > > +    }
> > 
> > Do you need to make ramblock_to_sync global and use atomics - I think
> > you can simplify it;  if you qemu_sem_init to 0, then I think you
> > can do:
> >    while (ramblock_count--) {
> >        qemu_sem_wait(&s->rp_state.rp_sem);
> >    }
> > 
> > qemu_sem_wait will block until the semaphore is >0....
> 
> You are right!
> 
> > 
> > > +
> > > +    trace_ram_dirty_bitmap_sync_complete();
> > > +
> > > +    return 0;
> > > +}
> > > +
> > > +static void ram_dirty_bitmap_reload_notify(MigrationState *s)
> > > +{
> > > +    atomic_dec(&ram_state->ramblock_to_sync);
> > > +    if (ram_state->ramblock_to_sync == 0) {
> > > +        /* Make sure the other thread gets the latest */
> > > +        trace_ram_dirty_bitmap_sync_notify();
> > > +        qemu_sem_post(&s->rp_state.rp_sem);
> > > +    }
> > 
> > then with the suggestion above you just do a qemu_sem_post each time.
> 
> Yes.  I'll also remove the notify trace since there is a better
> tracepoint before calling this function.
> 
> > 
> > > +}
> > > +
> > >  /*
> > >   * Read the received bitmap, invert it to be the initial dirty bitmap.
> > >   * This is only used when the postcopy migration is paused but wants
> > > @@ -2841,12 +2894,25 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
> > >  
> > >      trace_ram_dirty_bitmap_reload(block->idstr);
> 
> [1]
> 
> > >  
> > > +    /*
> > > +     * We have successfully synced the bitmap for the current
> > > +     * ramblock. If this is the last one to sync, we need to notify
> > > +     * the main send thread.
> > > +     */
> > > +    ram_dirty_bitmap_reload_notify(s);
> > > +
> > >      ret = 0;
> > >  out:
> > >      free(le_bitmap);
> > >      return ret;
> > >  }
> > >  
> > > +static int ram_resume_prepare(MigrationState *s, void *opaque)
> > > +{
> > > +    RAMState *rs = *(RAMState **)opaque;
> > > +
> > > +    return ram_dirty_bitmap_sync_all(s, rs);
> > > +}
> > > +
> > >  static SaveVMHandlers savevm_ram_handlers = {
> > >      .save_setup = ram_save_setup,
> > >      .save_live_iterate = ram_save_iterate,
> > > @@ -2857,6 +2923,7 @@ static SaveVMHandlers savevm_ram_handlers = {
> > >      .save_cleanup = ram_save_cleanup,
> > >      .load_setup = ram_load_setup,
> > >      .load_cleanup = ram_load_cleanup,
> > > +    .resume_prepare = ram_resume_prepare,
> > >  };
> > >  
> > >  void ram_mig_init(void)
> > > diff --git a/migration/trace-events b/migration/trace-events
> > > index 61b0d49..8962916 100644
> > > --- a/migration/trace-events
> > > +++ b/migration/trace-events
> > > @@ -81,6 +81,10 @@ ram_postcopy_send_discard_bitmap(void) ""
> > >  ram_save_page(const char *rbname, uint64_t offset, void *host) "%s: offset: 0x%" PRIx64 " host: %p"
> > >  ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: 0x%zx len: 0x%zx"
> > >  ram_dirty_bitmap_reload(char *str) "%s"
> > > +ram_dirty_bitmap_sync_start(void) ""
> > > +ram_dirty_bitmap_sync_wait(void) ""
> > > +ram_dirty_bitmap_sync_notify(void) ""
> > > +ram_dirty_bitmap_sync_complete(void) ""
> > >  
> > >  # migration/migration.c
> > >  await_return_path_close_on_source_close(void) ""
> > > -- 
> > > 2.7.4
> > > 
> > > 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Qemu-devel] [RFC v2 24/33] migration: synchronize dirty bitmap for resume
  2017-10-02 11:04       ` Dr. David Alan Gilbert
@ 2017-10-09  3:55         ` Peter Xu
  0 siblings, 0 replies; 86+ messages in thread
From: Peter Xu @ 2017-10-09  3:55 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

On Mon, Oct 02, 2017 at 12:04:46PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Fri, Sep 22, 2017 at 12:33:19PM +0100, Dr. David Alan Gilbert wrote:
> > > * Peter Xu (peterx@redhat.com) wrote:
> > > > This patch implements the first part of core RAM resume logic for
> > > > postcopy. ram_resume_prepare() is provided for the work.
> > > > 
> > > > When the migration is interrupted by a network failure, the dirty bitmap
> > > > on the source side becomes meaningless, because even if a dirty bit is
> > > > cleared, it is still possible that the sent page was lost on its way
> > > > to the destination. So instead of continuing the migration with the old
> > > > dirty bitmap on the source, we ask the destination side to send back its
> > > > received bitmap, then invert it to use as our initial dirty bitmap.
> > > > 
> > > > The source side send thread will issue the MIG_CMD_RECV_BITMAP requests,
> > > > once per ramblock, to ask for the received bitmap. On the destination side,
> > > > MIG_RP_MSG_RECV_BITMAP will be issued, along with the requested bitmap.
> > > > Data will be received on the return-path thread of the source, and the main
> > > > migration thread will be notified when all the ramblock bitmaps are
> > > > synchronized.
> > > > 
> > > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > > ---
> > > >  migration/migration.c  |  4 +++
> > > >  migration/migration.h  |  1 +
> > > >  migration/ram.c        | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >  migration/trace-events |  4 +++
> > > >  4 files changed, 76 insertions(+)
> > > > 
> > > > diff --git a/migration/migration.c b/migration/migration.c
> > > > index 19b7f3a5..19aed72 100644
> > > > --- a/migration/migration.c
> > > > +++ b/migration/migration.c
> > > > @@ -2605,6 +2605,8 @@ static void migration_instance_finalize(Object *obj)
> > > >  
> > > >      g_free(params->tls_hostname);
> > > >      g_free(params->tls_creds);
> > > > +
> > > > +    qemu_sem_destroy(&ms->rp_state.rp_sem);
> > > >  }
> > > >  
> > > >  static void migration_instance_init(Object *obj)
> > > > @@ -2629,6 +2631,8 @@ static void migration_instance_init(Object *obj)
> > > >      params->has_downtime_limit = true;
> > > >      params->has_x_checkpoint_delay = true;
> > > >      params->has_block_incremental = true;
> > > > +
> > > > +    qemu_sem_init(&ms->rp_state.rp_sem, 1);
> > > >  }
> > > >  
> > > >  /*
> > > > diff --git a/migration/migration.h b/migration/migration.h
> > > > index a3a0582..d041369 100644
> > > > --- a/migration/migration.h
> > > > +++ b/migration/migration.h
> > > > @@ -107,6 +107,7 @@ struct MigrationState
> > > >          QEMUFile     *from_dst_file;
> > > >          QemuThread    rp_thread;
> > > >          bool          error;
> > > > +        QemuSemaphore rp_sem;
> > > >      } rp_state;
> > > >  
> > > >      double mbps;
> > > > diff --git a/migration/ram.c b/migration/ram.c
> > > > index 5d938e3..afabcf5 100644
> > > > --- a/migration/ram.c
> > > > +++ b/migration/ram.c
> > > > @@ -47,6 +47,7 @@
> > > >  #include "exec/target_page.h"
> > > >  #include "qemu/rcu_queue.h"
> > > >  #include "migration/colo.h"
> > > > +#include "savevm.h"
> > > >  
> > > >  /***********************************************************/
> > > >  /* ram save/restore */
> > > > @@ -295,6 +296,8 @@ struct RAMState {
> > > >      RAMBlock *last_req_rb;
> > > >      /* Queue of outstanding page requests from the destination */
> > > >      QemuMutex src_page_req_mutex;
> > > > +    /* Ramblock counts to sync dirty bitmap. Only used for recovery */
> > > > +    int ramblock_to_sync;
> > > >      QSIMPLEQ_HEAD(src_page_requests, RAMSrcPageRequest) src_page_requests;
> > > >  };
> > > >  typedef struct RAMState RAMState;
> > > > @@ -2770,6 +2773,56 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> > > >      return ret;
> > > >  }
> > > >  
> > > > +/* Sync all the dirty bitmap with destination VM.  */
> > > > +static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs)
> > > > +{
> > > > +    RAMBlock *block;
> > > > +    QEMUFile *file = s->to_dst_file;
> > > > +    int ramblock_count = 0;
> > > > +
> > > > +    trace_ram_dirty_bitmap_sync_start();
> > > > +
> > > > +    /*
> > > > +     * We do this in such order:
> > > > +     *
> > > > +     * 1. calculate block count
> > > > +     * 2. fill in the count to N
> > > > +     * 3. send MIG_CMD_RECV_BITMAP requests
> > > > +     * 4. wait on the semaphore until N -> 0
> > > > +     */
> > > > +
> > > > +    RAMBLOCK_FOREACH(block) {
> > > > +        ramblock_count++;
> > > > +    }
> > > > +
> > > > +    atomic_set(&rs->ramblock_to_sync, ramblock_count);
> > > > +    RAMBLOCK_FOREACH(block) {
> > > > +        qemu_savevm_send_recv_bitmap(file, block->idstr);
> > > > +    }
> > > > +
> > > > +    trace_ram_dirty_bitmap_sync_wait();
> > > 
> > > Please include the RAMBlock name in the trace, so if it hangs we can
> > > see where.
> > 
> > This is to note when we start to wait, while there is a trace below
> > when we reload one single ramblock at [1].  Would that suffice?
> 
> If you easily have the name, it's worth including it in the trace before
> you wait, so that if it fails and never gets out of this wait we'd
> have the trace telling us the block it was waiting in.

I see your point.  Currently I am sending the MIG_CMD_RECV_BITMAP
commands in a batch, so there's no obvious way to know which one we are
waiting for (actually the current logic would allow the destination to
send back the ramblock bitmaps in a different order if it wishes).  But
I can add one more trace here:

    RAMBLOCK_FOREACH(block) {
        qemu_savevm_send_recv_bitmap(file, block->idstr);
        trace_ram_dirty_bitmap_request(block->idstr);
        ramblock_count++;
    }

And assuming the destination is actually sending these bitmaps in
order, then if we get stuck at ramblock2 we can at least see on the console:

ram_dirty_bitmap_request: ramblock1
ram_dirty_bitmap_request: ramblock2
ram_dirty_bitmap_request: ramblock3
...
ram_dirty_bitmap_reload: ramblock1
[console stuck here]

I can also add one more trace at the entry of
ram_dirty_bitmap_reload() if you like, so that if the transmission is
interrupted while a single bitmap is being sent, we'll know exactly
which bitmap it was.

> 
> Dave
> 
> > > 
> > > > +
> > > > +    /* Wait until all the ramblocks' dirty bitmap synced */
> > > > +    while (atomic_read(&rs->ramblock_to_sync)) {
> > > > +        qemu_sem_wait(&s->rp_state.rp_sem);
> > > > +    }
> > > 
> > > Do you need to make ramblock_to_sync global and use atomics - I think
> > > you can simplify it;  if you qemu_sem_init to 0, then I think you
> > > can do:
> > >    while (ramblock_count--) {
> > >        qemu_sem_wait(&s->rp_state.rp_sem);
> > >    }
> > > 
> > > qemu_sem_wait will block until the semaphore is >0....
> > 
> > You are right!
> > 
> > > 
> > > > +
> > > > +    trace_ram_dirty_bitmap_sync_complete();
> > > > +
> > > > +    return 0;
> > > > +}
> > > > +
> > > > +static void ram_dirty_bitmap_reload_notify(MigrationState *s)
> > > > +{
> > > > +    atomic_dec(&ram_state->ramblock_to_sync);
> > > > +    if (ram_state->ramblock_to_sync == 0) {
> > > > +        /* Make sure the other thread gets the latest */
> > > > +        trace_ram_dirty_bitmap_sync_notify();
> > > > +        qemu_sem_post(&s->rp_state.rp_sem);
> > > > +    }
> > > 
> > > then with the suggestion above you just do a qemu_sem_post each time.
> > 
> > Yes.  I'll also remove the notify trace since there is a better
> > tracepoint before calling this function.
> > 
> > > 
> > > > +}
> > > > +
> > > >  /*
> > > >   * Read the received bitmap, revert it as the initial dirty bitmap.
> > > >   * This is only used when the postcopy migration is paused but wants
> > > > @@ -2841,12 +2894,25 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
> > > >  
> > > >      trace_ram_dirty_bitmap_reload(block->idstr);
> > 
> > [1]
> > 
> > > >  
> > > > +    /*
> > > > +     * We succeeded to sync bitmap for current ramblock. If this is
> > > > +     * the last one to sync, we need to notify the main send thread.
> > > > +     */
> > > > +    ram_dirty_bitmap_reload_notify(s);
> > > > +
> > > >      ret = 0;
> > > >  out:
> > > >      free(le_bitmap);
> > > >      return ret;
> > > >  }
> > > >  
> > > > +static int ram_resume_prepare(MigrationState *s, void *opaque)
> > > > +{
> > > > +    RAMState *rs = *(RAMState **)opaque;
> > > > +
> > > > +    return ram_dirty_bitmap_sync_all(s, rs);
> > > > +}
> > > > +
> > > >  static SaveVMHandlers savevm_ram_handlers = {
> > > >      .save_setup = ram_save_setup,
> > > >      .save_live_iterate = ram_save_iterate,
> > > > @@ -2857,6 +2923,7 @@ static SaveVMHandlers savevm_ram_handlers = {
> > > >      .save_cleanup = ram_save_cleanup,
> > > >      .load_setup = ram_load_setup,
> > > >      .load_cleanup = ram_load_cleanup,
> > > > +    .resume_prepare = ram_resume_prepare,
> > > >  };
> > > >  
> > > >  void ram_mig_init(void)
> > > > diff --git a/migration/trace-events b/migration/trace-events
> > > > index 61b0d49..8962916 100644
> > > > --- a/migration/trace-events
> > > > +++ b/migration/trace-events
> > > > @@ -81,6 +81,10 @@ ram_postcopy_send_discard_bitmap(void) ""
> > > >  ram_save_page(const char *rbname, uint64_t offset, void *host) "%s: offset: 0x%" PRIx64 " host: %p"
> > > >  ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: 0x%zx len: 0x%zx"
> > > >  ram_dirty_bitmap_reload(char *str) "%s"
> > > > +ram_dirty_bitmap_sync_start(void) ""
> > > > +ram_dirty_bitmap_sync_wait(void) ""
> > > > +ram_dirty_bitmap_sync_notify(void) ""
> > > > +ram_dirty_bitmap_sync_complete(void) ""
> > > >  
> > > >  # migration/migration.c
> > > >  await_return_path_close_on_source_close(void) ""
> > > > -- 
> > > > 2.7.4
> > > > 
> > > > 
> > > --
> > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
> > -- 
> > Peter Xu
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

-- 
Peter Xu

* Re: [Qemu-devel] [RFC v2 09/33] migration: implement "postcopy-pause" src logic
  2017-09-26  9:35     ` Peter Xu
@ 2017-10-09 15:32       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 86+ messages in thread
From: Dr. David Alan Gilbert @ 2017-10-09 15:32 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> On Thu, Sep 21, 2017 at 08:21:45PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > Now when the network goes down during postcopy, the source side will not
> > > fail the migration. Instead we convert the status into this new paused
> > > state, and we will try to wait for a rescue in the future.
> > > 
> > > If a recovery is detected, migration_thread() will reset its local
> > > variables to prepare for that.
> > > 
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > >  migration/migration.c  | 98 +++++++++++++++++++++++++++++++++++++++++++++++---
> > >  migration/migration.h  |  3 ++
> > >  migration/trace-events |  1 +
> > >  3 files changed, 98 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index f6130db..8d26ea8 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -993,6 +993,8 @@ static void migrate_fd_cleanup(void *opaque)
> > >  
> > >      notifier_list_notify(&migration_state_notifiers, s);
> > >      block_cleanup_parameters(s);
> > > +
> > > +    qemu_sem_destroy(&s->postcopy_pause_sem);
> > >  }
> > >  
> > >  void migrate_fd_error(MigrationState *s, const Error *error)
> > > @@ -1136,6 +1138,7 @@ MigrationState *migrate_init(void)
> > >      s->migration_thread_running = false;
> > >      error_free(s->error);
> > >      s->error = NULL;
> > > +    qemu_sem_init(&s->postcopy_pause_sem, 0);
> > >  
> > >      migrate_set_state(&s->state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
> > >  
> > > @@ -1938,6 +1941,80 @@ bool migrate_colo_enabled(void)
> > >      return s->enabled_capabilities[MIGRATION_CAPABILITY_X_COLO];
> > >  }
> > >  
> > > +typedef enum MigThrError {
> > > +    /* No error detected */
> > > +    MIG_THR_ERR_NONE = 0,
> > > +    /* Detected error, but resumed successfully */
> > > +    MIG_THR_ERR_RECOVERED = 1,
> > > +    /* Detected fatal error, need to exit */
> > > +    MIG_THR_ERR_FATAL = 2,
> > 
> > I don't think it's necessary to assign the values there, but it's OK.
> > 
> > > +} MigThrError;
> > > +
> > > +/*
> > > + * We don't return until we are in a safe state to continue current
> > > + * postcopy migration.  Returns MIG_THR_ERR_RECOVERED if recovered, or
> > > + * MIG_THR_ERR_FATAL if unrecovery failure happened.
> > > + */
> > > +static MigThrError postcopy_pause(MigrationState *s)
> > > +{
> > > +    assert(s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
> > > +    migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> > > +                      MIGRATION_STATUS_POSTCOPY_PAUSED);
> > > +
> > > +    /* Current channel is possibly broken. Release it. */
> > > +    assert(s->to_dst_file);
> > > +    qemu_file_shutdown(s->to_dst_file);
> > > +    qemu_fclose(s->to_dst_file);
> > > +    s->to_dst_file = NULL;
> > > +
> > > +    error_report("Detected IO failure for postcopy. "
> > > +                 "Migration paused.");
> > > +
> > > +    /*
> > > +     * We wait until things fixed up. Then someone will setup the
> > > +     * status back for us.
> > > +     */
> > > +    while (s->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
> > > +        qemu_sem_wait(&s->postcopy_pause_sem);
> > > +    }
> > > +
> > > +    trace_postcopy_pause_continued();
> > > +
> > > +    return MIG_THR_ERR_RECOVERED;
> > > +}
> > > +
> > > +static MigThrError migration_detect_error(MigrationState *s)
> > > +{
> > > +    int ret;
> > > +
> > > +    /* Try to detect any file errors */
> > > +    ret = qemu_file_get_error(s->to_dst_file);
> > > +
> > > +    if (!ret) {
> > > +        /* Everything is fine */
> > > +        return MIG_THR_ERR_NONE;
> > > +    }
> > > +
> > > +    if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret == -EIO) {
> > 
> > We do need to make sure that whenever we hit a failure in migration
> > due to a device that we pass that up rather than calling
> > qemu_file_set_error - e.g. an EIO in a block device or network.
> > 
> > However,
> > 
> > Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> I'll take the R-b first. :)
> 
> Regarding the above - aren't we currently detecting these kinds of
> errors using -EIO?  And a network outage should be only one such case?
> 
> For now I still cannot distinguish a network outage from something worse
> that cannot be recovered at all.  No matter what, the current code will go
> into the PAUSED state as long as an EIO is seen.  I thought about it, and
> for now I don't think it is a problem, since even if it is a critical
> failure that cannot be recovered in any way, we still won't lose anything
> by stopping the VM at once (that's what the paused state does - the VM is
> just stopped).  For critical failures, we will just find that the
> recovery fails again rather than succeeding.

Yes I think it's fine for now;  my suspicion is that sometimes errors
from devices (e.g. disk/NIC) end up in qemu_file_set_error() - but
they shouldn't.  I think we should try to keep that just for actual
migration stream transport errors, and then this patch is safe.
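
To make the intended behaviour concrete, here is a distilled, hypothetical version of the decision that migration_detect_error() makes in the patch (classify() and its arguments are illustrative names, not the actual QEMU API): only an -EIO seen while postcopy is active is treated as pausable; any other file error is fatal.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum mig_thr_error {
    MIG_THR_ERR_NONE,       /* no error detected */
    MIG_THR_ERR_RECOVERED,  /* pause and wait for a rescue */
    MIG_THR_ERR_FATAL,      /* unrecoverable, fail the migration */
};

/* Hypothetical distillation of migration_detect_error():
 * file_error is what qemu_file_get_error() would return. */
enum mig_thr_error classify(int file_error, bool postcopy_active)
{
    if (!file_error) {
        return MIG_THR_ERR_NONE;
    }
    if (postcopy_active && file_error == -EIO) {
        /* Treated as a transport failure: switch to POSTCOPY_PAUSED */
        return MIG_THR_ERR_RECOVERED;
    }
    return MIG_THR_ERR_FATAL;
}
```

The open question in the thread is exactly the second branch: it is only safe if -EIO in the QEMUFile really means a transport error, not an error forwarded from a device.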

Dave

> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [RFC v2 32/33] migration: allow migrate_incoming for paused VM
  2017-09-28  6:54     ` Peter Xu
@ 2017-10-09 17:28       ` Dr. David Alan Gilbert
  2017-10-10 10:08         ` Peter Xu
  0 siblings, 1 reply; 86+ messages in thread
From: Dr. David Alan Gilbert @ 2017-10-09 17:28 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> On Fri, Sep 22, 2017 at 09:32:28PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > The migrate_incoming command was previously only used when "-incoming
> > > defer" was given on the command line, to defer the creation of the
> > > incoming migration channel.
> > > 
> > > However, there is a similar requirement when we are paused during a
> > > postcopy migration: the old incoming channel might have been destroyed
> > > already, and we may need a new channel for the recovery to happen.
> > > 
> > > This patch leverages the same interface, but allows the user to specify
> > > an incoming migration channel even for paused postcopy.
> > > 
> > > Meanwhile, migration listening ports are now always detached manually
> > > using the tag, rather than via the return values of the dispatchers.
> > > 
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > >  migration/exec.c      |  2 +-
> > >  migration/fd.c        |  2 +-
> > >  migration/migration.c | 39 +++++++++++++++++++++++++++++----------
> > >  migration/socket.c    |  2 +-
> > >  4 files changed, 32 insertions(+), 13 deletions(-)
> > > 
> > > diff --git a/migration/exec.c b/migration/exec.c
> > > index ef1fb4c..26fc37d 100644
> > > --- a/migration/exec.c
> > > +++ b/migration/exec.c
> > > @@ -49,7 +49,7 @@ static gboolean exec_accept_incoming_migration(QIOChannel *ioc,
> > >  {
> > >      migration_channel_process_incoming(ioc);
> > >      object_unref(OBJECT(ioc));
> > > -    return FALSE; /* unregister */
> > > +    return TRUE; /* keep it registered */
> > >  }
> > >  
> > >  /*
> > > diff --git a/migration/fd.c b/migration/fd.c
> > > index e9a548c..7d0aefa 100644
> > > --- a/migration/fd.c
> > > +++ b/migration/fd.c
> > > @@ -49,7 +49,7 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc,
> > >  {
> > >      migration_channel_process_incoming(ioc);
> > >      object_unref(OBJECT(ioc));
> > > -    return FALSE; /* unregister */
> > > +    return TRUE; /* keep it registered */
> > >  }
> > >  
> > >  /*
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index daf356b..5812478 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -175,6 +175,17 @@ void migration_incoming_state_destroy(void)
> > >      qemu_event_destroy(&mis->main_thread_load_event);
> > >  }
> > >  
> > > +static bool migrate_incoming_detach_listen(MigrationIncomingState *mis)
> > > +{
> > > +    if (mis->listen_task_tag) {
> > > +        /* Never fail */
> > > +        g_source_remove(mis->listen_task_tag);
> > > +        mis->listen_task_tag = 0;
> > > +        return true;
> > > +    }
> > > +    return false;
> > > +}
> > > +
> > >  static void migrate_generate_event(int new_state)
> > >  {
> > >      if (migrate_use_events()) {
> > > @@ -432,10 +443,9 @@ void migration_fd_process_incoming(QEMUFile *f)
> > >  
> > >      /*
> > >       * When reach here, we should not need the listening port any
> > > -     * more. We'll detach the listening task soon, let's reset the
> > > -     * listen task tag.
> > > +     * more.  Detach the listening port explicitly.
> > >       */
> > > -    mis->listen_task_tag = 0;
> > > +    migrate_incoming_detach_listen(mis);
> > >  }
> > >  
> > >  /*
> > > @@ -1291,14 +1301,25 @@ void migrate_del_blocker(Error *reason)
> > >  void qmp_migrate_incoming(const char *uri, Error **errp)
> > >  {
> > >      Error *local_err = NULL;
> > > -    static bool once = true;
> > > +    MigrationIncomingState *mis = migration_incoming_get_current();
> > >  
> > > -    if (!deferred_incoming) {
> > > -        error_setg(errp, "For use with '-incoming defer'");
> > > +    if (!deferred_incoming &&
> > > +        mis->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
> > > +        error_setg(errp, "For use with '-incoming defer'"
> > > +                   " or PAUSED postcopy migration only.");
> > >          return;
> > >      }
> > > -    if (!once) {
> > > -        error_setg(errp, "The incoming migration has already been started");
> > 
> > What guards against someone doing a migrate_incoming after the succesful
> > completion of an incoming migration?
> 
> If deferred incoming is not enabled, we should be protected by the above
> check on (mis->state != MIGRATION_STATUS_POSTCOPY_PAUSED).  But yes, I
> think this is a problem if deferred incoming is used.  Maybe I should
> still keep the "once" check here for deferred migration, though I think I
> can re-use the variable "deferred_incoming".  Please see below.
> 
> > Also with RDMA the following won't happen so I'm not quite sure what
> > state we're in.
> 
> Indeed.  Currently there is still no good way to destroy the RDMA
> accept handle, since it uses its own qemu_set_fd_handler() mechanism
> to set up the accept port.  But I think I may be able to solve this
> problem together with the issue below.  Please see below.
> 
> > 
> > When we get to non-blocking commands it's also a bit interesting - we
> > could be getting an accept on the main thread at just the same time
> > this is going down the OOB side.
> 
> This is an interesting point.  Thanks for noticing that.
> 
> How about I do it the strict way, like this (hopefully this can solve
> all the issues mentioned above)?
> 
> qmp_migrate_incoming()
> {
>   if (deferred_incoming) {
>     // PASS, deferred incoming is set, and never triggered
>   } else if (state == POSTCOPY_PAUSED && listen_tag == 0) {
>     // PASS, we don't have an accept port
>   } else {
>     // FAIL

One problem is that at this point you can't say much about why you failed;
my original migrate_incoming was like this, but then in 4debb5f5 I
added the 'once' to allow you to distinguish the cases of trying to use
migrate_incoming twice from never having used -incoming defer;
Markus asked for that in the review: http://lists.gnu.org/archive/html/qemu-devel/2015-02/msg04079.html

>   }
> 
>   qemu_start_incoming_migration(uri, &local_err);

We still have to make sure that nothing in that takes a lock.

>   if (local_err) {
>       error_propagate(errp, local_err);
>       return;
>   }
> 
>   // stop allowing this
>   deferred_incoming = false;

OK, this works I think as long as we have the requirement that
only one OOB command can be executing at once.  So that depends
on the structure of your OOB stuff;  if you can run multiple OOB
commands at once, then you can have two instances of this command
running at the same time and their settings race past each other.

(You may have to be careful of the read of state and listen_tag
since those are getting set from another thread).
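
If multiple OOB invocations really can interleave, the usual fix is to replace the check-then-set on a plain flag (like the old "once", or "deferred_incoming" in the pseudocode) with an atomic exchange, so exactly one caller wins. A minimal C11 sketch; try_claim_incoming() is a hypothetical helper, not anything in the tree:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* One-shot guard: the first caller gets true, every later (possibly
 * concurrent) caller gets false, with no window in between. */
static atomic_bool incoming_allowed = true;

bool try_claim_incoming(void)
{
    /* atomic_exchange returns the previous value and stores the new
     * one in a single indivisible step, so two racing callers cannot
     * both observe "allowed". */
    return atomic_exchange(&incoming_allowed, false);
}
```

The same idea would also cover the state/listen_tag reads, though those need release/acquire pairing with whatever thread sets them.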

> }
> 
> To make sure it works, I may need to hack a unique listen tag for
> RDMA for now, say, using (guint)(-1) to stand for the RDMA tag (instead
> of really rewriting the RDMA code to use the watcher stuff with real
> listen tags), like:
> 
> #define MIG_LISTEN_TAG_RDMA_FAKE ((guint)(-1))
> 
> bool migrate_incoming_detach_listen()
> {
>     if (listen_tag) {
>         if (listen_tag != MIG_LISTEN_TAG_RDMA_FAKE) {
>             // RDMA has already detached the accept port
>             g_source_remove(listen_tag);
>         }
>         listen_tag = 0;
>         return true;
>     }
>     return false;
> }
> 
> Then when listen_tag != 0 it means that there is an accept port, and
> as long as there is one port we don't allow changing it (like the
> pseudo qmp_migrate_incoming() code I wrote).

It's worth noting that RDMA doesn't work with postcopy yet
anyway (although I now have some ideas about how we could fix that).

Dave

> Would this work?
> 
> > 
> > Dave
> > 
> > > +
> > > +    /*
> > > +     * Destroy existing listening task if exist. Logically this should
> > > +     * not really happen at all (for either deferred migration or
> > > +     * postcopy migration, we should both detached the listening
> > > +     * task). So raise an error but still we safely detach it.
> > > +     */
> > > +    if (migrate_incoming_detach_listen(mis)) {
> > > +        error_report("%s: detected existing listen channel, "
> > > +                     "while it should not exist", __func__);
> > > +        /* Continue */
> > >      }
> > >  
> > >      qemu_start_incoming_migration(uri, &local_err);
> > > @@ -1307,8 +1328,6 @@ void qmp_migrate_incoming(const char *uri, Error **errp)
> > >          error_propagate(errp, local_err);
> > >          return;
> > >      }
> > > -
> > > -    once = false;
> > >  }
> > >  
> > >  bool migration_is_blocked(Error **errp)
> > > diff --git a/migration/socket.c b/migration/socket.c
> > > index 6ee51ef..e3e453f 100644
> > > --- a/migration/socket.c
> > > +++ b/migration/socket.c
> > > @@ -154,7 +154,7 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc,
> > >  out:
> > >      /* Close listening socket as its no longer needed */
> > >      qio_channel_close(ioc, NULL);
> > > -    return FALSE; /* unregister */
> > > +    return TRUE; /* keep it registered */
> > >  }
> > >  
> > >  
> > > -- 
> > > 2.7.4
> > > 
> > > 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [RFC v2 10/33] migration: allow dst vm pause on postcopy
  2017-09-27  7:34     ` Peter Xu
@ 2017-10-09 18:58       ` Dr. David Alan Gilbert
  2017-10-10  9:38         ` Peter Xu
  0 siblings, 1 reply; 86+ messages in thread
From: Dr. David Alan Gilbert @ 2017-10-09 18:58 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> On Thu, Sep 21, 2017 at 08:29:03PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > When there is an IO error on the incoming channel (e.g., the network goes
> > > down), instead of bailing out immediately, we allow the dst vm to switch to
> > > the new POSTCOPY_PAUSE state. Currently it is still simple - it waits on
> > > the new semaphore until someone pokes it for another attempt.
> > > 
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > >  migration/migration.c  |  1 +
> > >  migration/migration.h  |  3 +++
> > >  migration/savevm.c     | 60 ++++++++++++++++++++++++++++++++++++++++++++++++--
> > >  migration/trace-events |  2 ++
> > >  4 files changed, 64 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index 8d26ea8..80de212 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -146,6 +146,7 @@ MigrationIncomingState *migration_incoming_get_current(void)
> > >          memset(&mis_current, 0, sizeof(MigrationIncomingState));
> > >          qemu_mutex_init(&mis_current.rp_mutex);
> > >          qemu_event_init(&mis_current.main_thread_load_event, false);
> > > +        qemu_sem_init(&mis_current.postcopy_pause_sem_dst, 0);
> > >          once = true;
> > >      }
> > >      return &mis_current;
> > > diff --git a/migration/migration.h b/migration/migration.h
> > > index 0c957c9..c423682 100644
> > > --- a/migration/migration.h
> > > +++ b/migration/migration.h
> > > @@ -60,6 +60,9 @@ struct MigrationIncomingState {
> > >      /* The coroutine we should enter (back) after failover */
> > >      Coroutine *migration_incoming_co;
> > >      QemuSemaphore colo_incoming_sem;
> > > +
> > > +    /* notify PAUSED postcopy incoming migrations to try to continue */
> > > +    QemuSemaphore postcopy_pause_sem_dst;
> > >  };
> > >  
> > >  MigrationIncomingState *migration_incoming_get_current(void);
> > > diff --git a/migration/savevm.c b/migration/savevm.c
> > > index 7172f14..3777124 100644
> > > --- a/migration/savevm.c
> > > +++ b/migration/savevm.c
> > > @@ -1488,8 +1488,8 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
> > >   */
> > >  static void *postcopy_ram_listen_thread(void *opaque)
> > >  {
> > > -    QEMUFile *f = opaque;
> > >      MigrationIncomingState *mis = migration_incoming_get_current();
> > > +    QEMUFile *f = mis->from_src_file;
> > >      int load_res;
> > >  
> > >      migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> > > @@ -1503,6 +1503,14 @@ static void *postcopy_ram_listen_thread(void *opaque)
> > >       */
> > >      qemu_file_set_blocking(f, true);
> > >      load_res = qemu_loadvm_state_main(f, mis);
> > > +
> > > +    /*
> > > +     * This is tricky, but, mis->from_src_file can change after it
> > > +     * returns, when postcopy recovery happened. In the future, we may
> > > +     * want a wrapper for the QEMUFile handle.
> > > +     */
> > > +    f = mis->from_src_file;
> > > +
> > >      /* And non-blocking again so we don't block in any cleanup */
> > >      qemu_file_set_blocking(f, false);
> > >  
> > > @@ -1581,7 +1589,7 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
> > >      /* Start up the listening thread and wait for it to signal ready */
> > >      qemu_sem_init(&mis->listen_thread_sem, 0);
> > >      qemu_thread_create(&mis->listen_thread, "postcopy/listen",
> > > -                       postcopy_ram_listen_thread, mis->from_src_file,
> > > +                       postcopy_ram_listen_thread, NULL,
> > >                         QEMU_THREAD_DETACHED);
> > >      qemu_sem_wait(&mis->listen_thread_sem);
> > >      qemu_sem_destroy(&mis->listen_thread_sem);
> > > @@ -1966,11 +1974,44 @@ void qemu_loadvm_state_cleanup(void)
> > >      }
> > >  }
> > >  
> > > +/* Return true if we should continue the migration, or false. */
> > > +static bool postcopy_pause_incoming(MigrationIncomingState *mis)
> > > +{
> > > +    trace_postcopy_pause_incoming();
> > > +
> > > +    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> > > +                      MIGRATION_STATUS_POSTCOPY_PAUSED);
> > > +
> > > +    assert(mis->from_src_file);
> > > +    qemu_file_shutdown(mis->from_src_file);
> > > +    qemu_fclose(mis->from_src_file);
> > > +    mis->from_src_file = NULL;
> > > +
> > > +    assert(mis->to_src_file);
> > > +    qemu_mutex_lock(&mis->rp_mutex);
> > > +    qemu_file_shutdown(mis->to_src_file);
> > 
> > Should you not do the shutdown() before the lock?
> > For example if the other thread is stuck, with rp_mutex
> > held, trying to write to to_src_file, then you'll block
> > waiting for the mutex.  If you call shutdown and then take
> > the lock, the other thread will error and release the lock.
> 
> The problem is that IMHO QEMUFile is not yet thread-safe itself.  So
> if we operate on it (even to shut it down) logically we need to have
> the lock, right?

That probably needs fixing for 'shutdown' under the assumption that
   a) No one has or is deleting/freeing the QEMUFile
   b) No one is closing the QEMUFile

The whole point of using shutdown() is it forces any stuck send()'s or
read()'s to fail rather than staying stuck.

> Then, IMHO the question would be: when will the send() be stuck in the
> other thread?
> 
> Normally the only case I can think of is that source didn't recv()
> fast enough, and we even consumed all the write buffer in dst side (I
> don't really know how kernel manages the buffers though, and e.g. how
> the size of buffer is defined...).
> 
> But when reach here, the channel (say, from_src_file and to_src_file,
> since both of them are using the same channel behind the QEMUFile
> interface) should already be broken in some way, then IIUC even there
> is a send() in the other thread, it should return at some point with a
> failure as well, just like how we reached here (possibly due to a
> read() failure).

We have to be careful about this; a network can fail in a way that it
gets stuck rather than failing - it can stay stuck until a full TCP
disconnection, and that takes about 30 minutes (from memory).
The nice thing about using 'shutdown' is that you can kill the existing
connection if it's hung. (Which then makes an interesting question;
the rules in your migrate-incoming command become different if you
want to declare it's failed!).  Having said that, you're right that at
this point stuff has already failed - so do we need the shutdown?
(You might want to do the shutdown as part of the recovery earlier
or as a separate command to force the failure)

> > I'm not quite sure what will happen if we end up calling this
> > before the main thread has been returned from postcopy and the
> > device loading is complete.
> 
> IIUC you mean the time starts from when we got MIG_CMD_PACKAGED until
> main thread finishes handling that package?

Yes.

> Normally I think that should not matter much since during handling the
> package it should hardly fail (we were reading from a buffer QIO
> channel, no real IOs there)... 

Note that while the main thread is reading the package, the listener
thread is receiving pages, so you can legally get a failure at that
point when the fd fails as it's receiving pages at the same time
as reading the devices.
(There's an argument that if it fails before you've received all
your devices then perhaps you can just restart the source)

> But I agree about the reasoning.  How
> about one more patch to postpone the "active" to "postcopy-active"
> state change after the package is handled correctly?  Like:
> 
> --------------
> diff --git a/migration/savevm.c b/migration/savevm.c                     
> index b5c3214034..8317b2a7e2 100644 
> --- a/migration/savevm.c            
> +++ b/migration/savevm.c            
> @@ -1573,8 +1573,6 @@ static void *postcopy_ram_listen_thread(void *opaque)                                                                       
>      QEMUFile *f = mis->from_src_file;                                   
>      int load_res;                  
>                                     
> -    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,             
> -                                   MIGRATION_STATUS_POSTCOPY_ACTIVE);   
>      qemu_sem_post(&mis->listen_thread_sem);                             
>      trace_postcopy_ram_listen_thread_start();                           
>                                     
> @@ -1817,6 +1815,9 @@ static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis)                                                          
>      qemu_fclose(packf);            
>      object_unref(OBJECT(bioc));    
>                                     
> +    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,             
> +                                   MIGRATION_STATUS_POSTCOPY_ACTIVE);   
> +                                   
>      return ret;                    
>  }                                  
> --------------
> 
> This function will only be called with "postcopy-active" state.

I *think* that's safe; you've got to be careful, but I can't see
anyone on the destination that cares about the distinction.

> > Also, at this point have we guaranteed no one else is about
> > to do an op on mis->to_src_file and will seg?
> 
> I think no?  Since IMHO the main thread is playing with the buffer QIO
> channel, rather than the real one?

OK.

> (btw, could I ask what's "seg"? :)

just short for segmentation fault; sig 11.

Dave

> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Qemu-devel] [RFC v2 20/33] migration: new message MIG_RP_MSG_RECV_BITMAP
  2017-09-27 10:04     ` Peter Xu
@ 2017-10-09 19:12       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 86+ messages in thread
From: Dr. David Alan Gilbert @ 2017-10-09 19:12 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> On Fri, Sep 22, 2017 at 12:05:42PM +0100, Dr. David Alan Gilbert wrote:
> 
> [...]
> 
> > > diff --git a/migration/ram.c b/migration/ram.c
> > > index 7e20097..5d938e3 100644
> > > --- a/migration/ram.c
> > > +++ b/migration/ram.c
> > > @@ -182,6 +182,70 @@ void ramblock_recv_bitmap_clear(RAMBlock *rb, void *host_addr)
> > >      clear_bit(ramblock_recv_bitmap_offset(host_addr, rb), rb->receivedmap);
> > >  }
> > >  
> > > +#define  RAMBLOCK_RECV_BITMAP_ENDING  (0x0123456789abcdefULL)
> > > +
> > > +/*
> > > + * Format: bitmap_size (8 bytes) + whole_bitmap (N bytes).
> > > + *
> > > + * Returns >0 if success with sent bytes, or <0 if error.
> > > + */
> > > +int64_t ramblock_recv_bitmap_send(QEMUFile *file,
> > > +                                  const char *block_name)
> > > +{
> > > +    RAMBlock *block = qemu_ram_block_by_name(block_name);
> > > +    unsigned long *le_bitmap, nbits;
> > > +    uint64_t size;
> > > +
> > > +    if (!block) {
> > > +        error_report("%s: invalid block name: %s", __func__, block_name);
> > > +        return -1;
> > > +    }
> > > +
> > > +    nbits = block->used_length >> TARGET_PAGE_BITS;
> > > +
> > > +    /*
> > > +     * Make sure the tmp bitmap buffer is big enough, e.g., on 32bit
> > > +     * machines we may need 4 more bytes for padding (see below
> > > +     * comment). So extend it a bit before hand.
> > > +     */
> > > +    le_bitmap = bitmap_new(nbits + BITS_PER_LONG);
> > 
> > I do worry what will happen on really huge RAMBlocks; the worst case is
> > that this temporary bitmap is a few GB.
> 
> IIUC the bitmap ratio is 32K, so the bitmap will be 1GB only if the
> guest RAM region size is 1GB * 32K = 32TB.
> 
> Then, can I just assume that allocating (only) 1GB of temporary memory
> for a guest using more than 32TB of memory is not a problem? :-)
> 
> I hope I didn't calculate it wrongly though.

No, I think that's right; I was off a few bits.
If they've got 32TB of RAM, then 1GB is probably no issue.

Dave

> > 
> > > +    /*
> > > +     * Always use little endian when sending the bitmap. This is
> > > +     * required when the source and destination VMs are not using the
> > > +     * same endianness. (Note: big endian won't work.)
> > > +     */
> > > +    bitmap_to_le(le_bitmap, block->receivedmap, nbits);
> > > +
> > > +    /* Size of the bitmap, in bytes */
> > > +    size = nbits / 8;
> > > +
> > > +    /*
> > > +     * size is always aligned to 8 bytes for 64bit machines, but it
> > > +     * may not be true for 32bit machines. We need this padding to
> > > +     * make sure the migration can survive even between 32bit and
> > > +     * 64bit machines.
> > > +     */
> > > +    size = ROUND_UP(size, 8);
> > > +
> > > +    qemu_put_be64(file, size);
> > > +    qemu_put_buffer(file, (const uint8_t *)le_bitmap, size);
> > > +    /*
> > > +     * Mark as an end, in case the middle part is screwed up due to
> > > +     * some "mysterious" reason.
> > > +     */
> > > +    qemu_put_be64(file, RAMBLOCK_RECV_BITMAP_ENDING);
> > > +    qemu_fflush(file);
> > > +
> > > +    free(le_bitmap);
> > > +
> > > +    if (qemu_file_get_error(file)) {
> > > +        return qemu_file_get_error(file);
> > > +    }
> > > +
> > > +    return size + sizeof(size);
> > > +}
> > > +
> > >  /*
> > >   * An outstanding page request, on the source, having been received
> > >   * and queued
> > > @@ -2706,6 +2770,83 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> > >      return ret;
> > >  }
> > >  
> > > +/*
> > > + * Read the received bitmap, revert it as the initial dirty bitmap.
> > > + * This is only used when the postcopy migration is paused but wants
> > > + * to resume from a middle point.
> > > + */
> > > +int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
> > > +{
> > > +    int ret = -EINVAL;
> > > +    QEMUFile *file = s->rp_state.from_dst_file;
> > > +    unsigned long *le_bitmap, nbits = block->used_length >> TARGET_PAGE_BITS;
> > > +    uint64_t local_size = nbits / 8;
> > > +    uint64_t size, end_mark;
> > > +
> > > +    if (s->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
> > > +        error_report("%s: incorrect state %s", __func__,
> > > +                     MigrationStatus_lookup[s->state]);
> > > +        return -EINVAL;
> > > +    }
> > > +
> > > +    /*
> > > +     * Note: see comments in ramblock_recv_bitmap_send() on why we
> > > +     * need the endianess convertion, and the paddings.
> > > +     */
> > > +    local_size = ROUND_UP(local_size, 8);
> > > +
> > > +    /* Add paddings */
> > > +    le_bitmap = bitmap_new(nbits + BITS_PER_LONG);
> > > +
> > > +    size = qemu_get_be64(file);
> > > +
> > > +    /* The size of the bitmap should match with our ramblock */
> > > +    if (size != local_size) {
> > > +        error_report("%s: ramblock '%s' bitmap size mismatch "
> > > +                     "(0x%lx != 0x%lx)", __func__, block->idstr,
> > > +                     size, local_size);
> > 
> > You need to use PRIx64 formatters there - %lx isn't portable.
> 
> Yes. Fixing.
> 
> > 
> > > +        ret = -EINVAL;
> > > +        goto out;
> > > +    }
> > > +
> > > +    size = qemu_get_buffer(file, (uint8_t *)le_bitmap, local_size);
> > > +    end_mark = qemu_get_be64(file);
> > > +
> > > +    ret = qemu_file_get_error(file);
> > > +    if (ret || size != local_size) {
> > > +        error_report("%s: read bitmap failed for ramblock '%s': %d",
> > > +                     __func__, block->idstr, ret);
> > 
> > You might like to include size/local_size in the error.
> 
> Will do.  Thanks,
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC v2 10/33] migration: allow dst vm pause on postcopy
  2017-10-09 18:58       ` Dr. David Alan Gilbert
@ 2017-10-10  9:38         ` Peter Xu
  2017-10-10 11:31           ` Peter Xu
  2017-10-10 12:30           ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 86+ messages in thread
From: Peter Xu @ 2017-10-10  9:38 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

On Mon, Oct 09, 2017 at 07:58:13PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Thu, Sep 21, 2017 at 08:29:03PM +0100, Dr. David Alan Gilbert wrote:
> > > * Peter Xu (peterx@redhat.com) wrote:
> > > > When there is IO error on the incoming channel (e.g., network down),
> > > > instead of bailing out immediately, we allow the dst vm to switch to the
> > > > new POSTCOPY_PAUSE state. Currently it is still simple - it waits the
> > > > new semaphore, until someone poke it for another attempt.
> > > > 
> > > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > > ---
> > > >  migration/migration.c  |  1 +
> > > >  migration/migration.h  |  3 +++
> > > >  migration/savevm.c     | 60 ++++++++++++++++++++++++++++++++++++++++++++++++--
> > > >  migration/trace-events |  2 ++
> > > >  4 files changed, 64 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/migration/migration.c b/migration/migration.c
> > > > index 8d26ea8..80de212 100644
> > > > --- a/migration/migration.c
> > > > +++ b/migration/migration.c
> > > > @@ -146,6 +146,7 @@ MigrationIncomingState *migration_incoming_get_current(void)
> > > >          memset(&mis_current, 0, sizeof(MigrationIncomingState));
> > > >          qemu_mutex_init(&mis_current.rp_mutex);
> > > >          qemu_event_init(&mis_current.main_thread_load_event, false);
> > > > +        qemu_sem_init(&mis_current.postcopy_pause_sem_dst, 0);
> > > >          once = true;
> > > >      }
> > > >      return &mis_current;
> > > > diff --git a/migration/migration.h b/migration/migration.h
> > > > index 0c957c9..c423682 100644
> > > > --- a/migration/migration.h
> > > > +++ b/migration/migration.h
> > > > @@ -60,6 +60,9 @@ struct MigrationIncomingState {
> > > >      /* The coroutine we should enter (back) after failover */
> > > >      Coroutine *migration_incoming_co;
> > > >      QemuSemaphore colo_incoming_sem;
> > > > +
> > > > +    /* notify PAUSED postcopy incoming migrations to try to continue */
> > > > +    QemuSemaphore postcopy_pause_sem_dst;
> > > >  };
> > > >  
> > > >  MigrationIncomingState *migration_incoming_get_current(void);
> > > > diff --git a/migration/savevm.c b/migration/savevm.c
> > > > index 7172f14..3777124 100644
> > > > --- a/migration/savevm.c
> > > > +++ b/migration/savevm.c
> > > > @@ -1488,8 +1488,8 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
> > > >   */
> > > >  static void *postcopy_ram_listen_thread(void *opaque)
> > > >  {
> > > > -    QEMUFile *f = opaque;
> > > >      MigrationIncomingState *mis = migration_incoming_get_current();
> > > > +    QEMUFile *f = mis->from_src_file;
> > > >      int load_res;
> > > >  
> > > >      migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> > > > @@ -1503,6 +1503,14 @@ static void *postcopy_ram_listen_thread(void *opaque)
> > > >       */
> > > >      qemu_file_set_blocking(f, true);
> > > >      load_res = qemu_loadvm_state_main(f, mis);
> > > > +
> > > > +    /*
> > > > +     * This is tricky, but, mis->from_src_file can change after it
> > > > +     * returns, when postcopy recovery happened. In the future, we may
> > > > +     * want a wrapper for the QEMUFile handle.
> > > > +     */
> > > > +    f = mis->from_src_file;
> > > > +
> > > >      /* And non-blocking again so we don't block in any cleanup */
> > > >      qemu_file_set_blocking(f, false);
> > > >  
> > > > @@ -1581,7 +1589,7 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
> > > >      /* Start up the listening thread and wait for it to signal ready */
> > > >      qemu_sem_init(&mis->listen_thread_sem, 0);
> > > >      qemu_thread_create(&mis->listen_thread, "postcopy/listen",
> > > > -                       postcopy_ram_listen_thread, mis->from_src_file,
> > > > +                       postcopy_ram_listen_thread, NULL,
> > > >                         QEMU_THREAD_DETACHED);
> > > >      qemu_sem_wait(&mis->listen_thread_sem);
> > > >      qemu_sem_destroy(&mis->listen_thread_sem);
> > > > @@ -1966,11 +1974,44 @@ void qemu_loadvm_state_cleanup(void)
> > > >      }
> > > >  }
> > > >  
> > > > +/* Return true if we should continue the migration, or false. */
> > > > +static bool postcopy_pause_incoming(MigrationIncomingState *mis)
> > > > +{
> > > > +    trace_postcopy_pause_incoming();
> > > > +
> > > > +    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> > > > +                      MIGRATION_STATUS_POSTCOPY_PAUSED);
> > > > +
> > > > +    assert(mis->from_src_file);
> > > > +    qemu_file_shutdown(mis->from_src_file);
> > > > +    qemu_fclose(mis->from_src_file);
> > > > +    mis->from_src_file = NULL;
> > > > +
> > > > +    assert(mis->to_src_file);
> > > > +    qemu_mutex_lock(&mis->rp_mutex);
> > > > +    qemu_file_shutdown(mis->to_src_file);
> > > 
> > > Should you not do the shutdown() before the lock?
> > > For example if the other thread is stuck, with rp_mutex
> > > held, trying to write to to_src_file, then you'll block
> > > waiting for the mutex.  If you call shutdown and then take
> > > the lock, the other thread will error and release the lock.
> > 
> > The problem is that IMHO QEMUFile is not yet thread-safe itself.  So
> > if we operate on it (even to shut it down) logically we need to have
> > the lock, right?
> 
> That probably needs fixing for 'shutdown' under the assumption that
>    a) No one has or is deleting/freeing the QEMUFile
>    b) No one is closing the QEMUFile
> 
> The whole point of using shutdown() is it forces any stuck send()'s or
> read()'s to fail rather than staying stuck.

I see.  I just noticed that qemu_file_shutdown() is actually
thread-safe itself - it boils down to the system shutdown() call (as
long as the above assumptions hold).

Let me call qemu_file_shutdown() first before taking the lock to make
sure send()/recv() hang won't happen.

> 
> > Then, IMHO the question would be: when will the send() be stuck in the
> > other thread?
> > 
> > Normally the only case I can think of is that source didn't recv()
> > fast enough, and we even consumed all the write buffer in dst side (I
> > don't really know how kernel manages the buffers though, and e.g. how
> > the size of buffer is defined...).
> > 
> > But when reach here, the channel (say, from_src_file and to_src_file,
> > since both of them are using the same channel behind the QEMUFile
> > interface) should already be broken in some way, then IIUC even there
> > is a send() in the other thread, it should return at some point with a
> > failure as well, just like how we reached here (possibly due to a
> > read() failure).
> 
> We have to be careful about this; a network can fail in a way that it
> gets stuck rather than failing - it can stay stuck until a full TCP
> disconnection, and that takes about 30 minutes (from memory).
> The nice thing about using 'shutdown' is that you can kill the existing
> connection if it's hung. (Which then makes an interesting question;
> the rules in your migrate-incoming command become different if you
> want to declare it's failed!).  Having said that, you're right that at
> this point stuff has already failed - so do we need the shutdown?
> (You might want to do the shutdown as part of the recovery earlier
> or as a separate command to force the failure)

I assume if I call shutdown before the lock then we'll be good then.

> 
> > > I'm not quite sure what will happen if we end up calling this
> > > before the main thread has been returned from postcopy and the
> > > device loading is complete.
> > 
> > IIUC you mean the time starts from when we got MIG_CMD_PACKAGED until
> > main thread finishes handling that package?
> 
> Yes.
> 
> > Normally I think that should not matter much since during handling the
> > package it should hardly fail (we were reading from a buffer QIO
> > channel, no real IOs there)... 
> 
> Note that while the main thread is reading the package, the listener
> thread is receiving pages, so you can legally get a failure at that
> point when the fd fails as it's receiving pages at the same time
> as reading the devices.
> (There's an argument that if it fails before you've received all
> your devices then perhaps you can just restart the source)

Yes.

> 
> > But I agree about the reasoning.  How
> > about one more patch to postpone the "active" to "postcopy-active"
> > state change after the package is handled correctly?  Like:
> > 
> > --------------
> > diff --git a/migration/savevm.c b/migration/savevm.c                     
> > index b5c3214034..8317b2a7e2 100644 
> > --- a/migration/savevm.c            
> > +++ b/migration/savevm.c            
> > @@ -1573,8 +1573,6 @@ static void *postcopy_ram_listen_thread(void *opaque)                                                                       
> >      QEMUFile *f = mis->from_src_file;                                   
> >      int load_res;                  
> >                                     
> > -    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,             
> > -                                   MIGRATION_STATUS_POSTCOPY_ACTIVE);   
> >      qemu_sem_post(&mis->listen_thread_sem);                             
> >      trace_postcopy_ram_listen_thread_start();                           
> >                                     
> > @@ -1817,6 +1815,9 @@ static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis)                                                          
> >      qemu_fclose(packf);            
> >      object_unref(OBJECT(bioc));    
> >                                     
> > +    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,             
> > +                                   MIGRATION_STATUS_POSTCOPY_ACTIVE);   
> > +                                   
> >      return ret;                    
> >  }                                  
> > --------------
> > 
> > This function will only be called with "postcopy-active" state.
> 
> I *think* that's safe; you've got to be careful, but I can't see
> anyone on the destination that cares about the distinction.

Indeed, but I'd say that's the best thing I can think of (and the
simplest).  That said, I'm not sure whether it would be clearer to set
the postcopy-active state right before starting the VM on the
destination, say, at the beginning of loadvm_postcopy_handle_run_bh().

> 
> > > Also, at this point have we guaranteed no one else is about
> > > to do an op on mis->to_src_file and will seg?
> > 
> > I think no?  Since IMHO the main thread is playing with the buffer QIO
> > channel, rather than the real one?
> 
> OK.
> 
> > (btw, could I ask what's "seg"? :)
> 
> just short for segmentation fault; sig 11.

I see.  Thanks!

> 
> Dave
> 
> > -- 
> > Peter Xu
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

-- 
Peter Xu


* Re: [Qemu-devel] [RFC v2 32/33] migration: allow migrate_incoming for paused VM
  2017-10-09 17:28       ` Dr. David Alan Gilbert
@ 2017-10-10 10:08         ` Peter Xu
  0 siblings, 0 replies; 86+ messages in thread
From: Peter Xu @ 2017-10-10 10:08 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

On Mon, Oct 09, 2017 at 06:28:06PM +0100, Dr. David Alan Gilbert wrote:

[...]

> > > >  /*
> > > > @@ -1291,14 +1301,25 @@ void migrate_del_blocker(Error *reason)
> > > >  void qmp_migrate_incoming(const char *uri, Error **errp)
> > > >  {
> > > >      Error *local_err = NULL;
> > > > -    static bool once = true;
> > > > +    MigrationIncomingState *mis = migration_incoming_get_current();
> > > >  
> > > > -    if (!deferred_incoming) {
> > > > -        error_setg(errp, "For use with '-incoming defer'");
> > > > +    if (!deferred_incoming &&
> > > > +        mis->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
> > > > +        error_setg(errp, "For use with '-incoming defer'"
> > > > +                   " or PAUSED postcopy migration only.");
> > > >          return;
> > > >      }
> > > > -    if (!once) {
> > > > -        error_setg(errp, "The incoming migration has already been started");
> > > 
> > > What guards against someone doing a migrate_incoming after the succesful
> > > completion of an incoming migration?
> > 
> > If deferred incoming is not enabled, we should be protected by above
> > check on (mis->state != MIGRATION_STATUS_POSTCOPY_PAUSED).  But yes I
> > think this is a problem if deferred incoming is used.  Maybe I should
> > still keep the "once" check here for deferred migration, but I think I
> > can re-use the variable "deferred_incoming".  Please see below.
> > 
> > > Also with RDMA the following won't happen so I'm not quite sure what
> > > state we're in.
> > 
> > Indeed.  Currently there is still no good way to destroy the RDMA
> > accept handle easily since it's using its own qemu_set_fd_handler()
> > way to setup accept ports.  But I think maybe I can solve this problem
> > with below issue together.  Please see below.
> > 
> > > 
> > > When we get to non-blocking commands it's also a bit interesting - we
> > > could be getting an accept on the main thread at just the same time
> > > this is going down the OOB side.
> > 
> > This is an interesting point.  Thanks for noticing that.
> > 
> > How about I do it the strict way?  like this (hopefully this can solve
> > all the issues mentioned above):
> > 
> > qmp_migrate_incoming()
> > {
> >   if (deferred_incoming) {
> >     // PASS, deferred incoming is set, and never triggered
> >   } else if (state == POSTCOPY_PAUSED && listen_tag == 0) {
> >     // PASS, we don't have an accept port
> >   } else {
> >     // FAIL
> 
> One problem is at this point you can't say much about why you failed;
> my original migrate_incoming was like this, but then in 4debb5f5 I
> added the 'once' to allow you to distinguish the cases of trying to use
> migrate_incoming twice from never having used -incoming defer;
> Markus asked for that in the review: http://lists.gnu.org/archive/html/qemu-devel/2015-02/msg04079.html

Ah.  Then let me revive the "once" parameter:

  if (state == POSTCOPY_PAUSED && listen_tag == 0) {
    // PASS, we don't have an accept port and need recovery
  } else if (deferred_incoming) {
    if (!once) {
      once = true;
      // PASS, incoming is deferred
    } else {
      // FAIL: deferred incoming has been specified already
    }
  } else {
    // FAIL: neither do we need recovery, nor do we have deferred incoming
  }

> 
> >   }
> > 
> >   qemu_start_incoming_migration(uri, &local_err);
> 
> We still have to make sure that nothing in that takes a lock.

I think the monitor_lock is needed when sending events, but that
should be fine - within the monitor_lock critical section there is no
chance of a page fault.

For the rest, I didn't see anywhere else that takes a lock.  Hope I
didn't miss anything...

> 
> >   if (local_err) {
> >       error_propagate(errp, local_err);
> >       return;
> >   }
> > 
> >   // stop allowing this
> >   deferred_incoming = false;
> 
> OK, this works I think as long as we have the requirement that
> only one OOB command can be executing at once.  So that depends
> on the structure of your OOB stuff;  if you can run multiple OOB
> at once then you can have two instances of this command running
> at the same time and this setting passes each other.

Indeed.  IIUC Markus's proposal (and the latest version of the series)
won't allow OOB commands to run in parallel.  They should be fast
commands, fast enough that there is no need to run them concurrently.
If they could be parallelized, we might need a lock.

> 
> (You may have to be careful of the read of state and listen_tag
> since those are getting set from another thread).

IMHO it should be fine here - I'm checking listen_tag against zero,
and this function is the only place where it changes from zero to
non-zero.  So as long as we don't run this function in parallel (or
add a lock as mentioned above), IMHO we should be good.

> 
> > }
> > 
> > To make sure it works, I may need to hack a unique listen tag for
> > RDMA for now, say, using (guint)(-1) to stand for the RDMA tag
> > (instead of really rewriting the RDMA code to use the watcher stuff
> > with real listen tags), like:
> > 
> > #define MIG_LISTEN_TAG_RDMA_FAKE ((guint)(-1))
> > 
> > bool migrate_incoming_detach_listen()
> > {
> >     if (listen_tag) {
> >         if (listen_tag != MIG_LISTEN_TAG_RDMA_FAKE) {
> >             // RDMA has already detached the accept port
> >             g_source_remove(listen_tag);
> >         }
> >         listen_tag = 0;
> >         return true;
> >     }
> >     return false;
> > }
> > 
> > Then when listen_tag != 0 it means that there is an acception port,
> > and as long as there is one port we don't allow to change it (like the
> > pesudo qmp_migrate_incoming() code I wrote).
> 
> It's worth noting anyway that RDMA doesn't work with postcopy yet
> anyway (although I now have some ideas how we could fix that).

Ah, good to know.

Then I think I can avoid introducing this hacky tag.  Instead, I may
add a proper comment explaining that the check does not apply to RDMA
(since we check for the POSTCOPY_PAUSED state before checking
listen_tag, it can never be an RDMA migration).

Thanks,

-- 
Peter Xu


* Re: [Qemu-devel] [RFC v2 10/33] migration: allow dst vm pause on postcopy
  2017-10-10  9:38         ` Peter Xu
@ 2017-10-10 11:31           ` Peter Xu
  2017-10-31 18:57             ` Dr. David Alan Gilbert
  2017-10-10 12:30           ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 86+ messages in thread
From: Peter Xu @ 2017-10-10 11:31 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

On Tue, Oct 10, 2017 at 05:38:01PM +0800, Peter Xu wrote:

[...]

> > > But I agree about the reasoning.  How
> > > about one more patch to postpone the "active" to "postcopy-active"
> > > state change after the package is handled correctly?  Like:
> > > 
> > > --------------
> > > diff --git a/migration/savevm.c b/migration/savevm.c                     
> > > index b5c3214034..8317b2a7e2 100644 
> > > --- a/migration/savevm.c            
> > > +++ b/migration/savevm.c            
> > > @@ -1573,8 +1573,6 @@ static void *postcopy_ram_listen_thread(void *opaque)                                                                       
> > >      QEMUFile *f = mis->from_src_file;                                   
> > >      int load_res;                  
> > >                                     
> > > -    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,             
> > > -                                   MIGRATION_STATUS_POSTCOPY_ACTIVE);   
> > >      qemu_sem_post(&mis->listen_thread_sem);                             
> > >      trace_postcopy_ram_listen_thread_start();                           
> > >                                     
> > > @@ -1817,6 +1815,9 @@ static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis)                                                          
> > >      qemu_fclose(packf);            
> > >      object_unref(OBJECT(bioc));    
> > >                                     
> > > +    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,             
> > > +                                   MIGRATION_STATUS_POSTCOPY_ACTIVE);   
> > > +                                   
> > >      return ret;                    
> > >  }                                  
> > > --------------
> > > 
> > > This function will only be called with "postcopy-active" state.
> > 
> > I *think* that's safe; you've got to be careful, but I can't see
> > anyone on the destination that cares about the distinction.
> 
> Indeed, but I'd say that's the best thing I can think of (and the
> simplest).  Though I'm not sure whether it would be clearer if we set
> the postcopy-active state right before starting the VM on the
> destination, say, at the beginning of loadvm_postcopy_handle_run_bh().

When thinking about this, I had another question.

How do we handle the case where we fail to send the device states in
postcopy_start()?  There, we call qemu_savevm_send_packaged() and then
assume we are good, returning success.  However,
qemu_savevm_send_packaged() only means that the data is queued in the
write buffer on the source host; it does not mean that the destination
has loaded the device states correctly.  It's still possible that the
destination VM failed to receive the whole packaged data while the
source thought it had done so without problem.

Then the source will continue in postcopy-active while the destination
VM fails, which in turn fails the source.  The VM would be lost then,
since this is postcopy rather than precopy.

Meanwhile, this cannot be handled by postcopy recovery, since IIUC
postcopy recovery only works after the states are at least loaded on
the destination VM (I'll avoid going deeper into a more complex
protocol for postcopy recovery; please see below).

I think the best/simplest thing to do when encountering this error is
to just fail the migration and keep the VM running on the source, which
would be the same failure handling path as precopy.  But it still seems
that we don't have a good mechanism to detect the error when sending
the MIG_CMD_PACKAGED message fails in some way (we could add an ACK
from dst->src, but that breaks old VMs).
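A minimal standalone sketch (plain POSIX with made-up names, not QEMU
code) of the gap described above: a write() returning success only
proves the kernel queued the bytes, while an explicit dst->src ACK -
the compatibility-breaking addition mentioned - is what would confirm
the load:

```c
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

/* Destination side: "load" the package, then report the result back. */
static void *dst_load_thread(void *arg)
{
    int fd = *(int *)arg;
    char buf[64];
    ssize_t n = read(fd, buf, sizeof(buf));   /* receive the package */
    char ack = (n > 0) ? 1 : 0;               /* 1 = loaded fine */
    write(fd, &ack, 1);                       /* explicit dst->src ACK */
    return NULL;
}

/*
 * Source side: write() succeeding only means the kernel queued the
 * bytes; waiting for the ACK is what proves the destination processed
 * them.  Returns 0 on confirmed load, -1 otherwise (so the source can
 * fail the migration and keep running, like the precopy path).
 */
static int send_packaged_with_ack(int fd, const char *pkg, size_t len)
{
    if (write(fd, pkg, len) != (ssize_t)len) {
        return -1;
    }
    char ack = 0;
    if (read(fd, &ack, 1) != 1 || ack != 1) {
        return -1;
    }
    return 0;
}
```

Without the ACK, the source has no way to distinguish "queued locally"
from "loaded remotely", which is exactly the failure window described.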

Before going further, would my worry make any sense?

(I hope this can be a separate problem from the postcopy recovery
 series, if it is indeed a problem.  For postcopy recovery, I hope the
 idea of postponing the switch to POSTCOPY_ACTIVE would suffice.)

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Qemu-devel] [RFC v2 10/33] migration: allow dst vm pause on postcopy
  2017-10-10  9:38         ` Peter Xu
  2017-10-10 11:31           ` Peter Xu
@ 2017-10-10 12:30           ` Dr. David Alan Gilbert
  2017-10-11  3:00             ` Peter Xu
  1 sibling, 1 reply; 86+ messages in thread
From: Dr. David Alan Gilbert @ 2017-10-10 12:30 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> On Mon, Oct 09, 2017 at 07:58:13PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > On Thu, Sep 21, 2017 at 08:29:03PM +0100, Dr. David Alan Gilbert wrote:
> > > > * Peter Xu (peterx@redhat.com) wrote:
> > > > > When there is IO error on the incoming channel (e.g., network down),
> > > > > instead of bailing out immediately, we allow the dst vm to switch to the
> > > > > new POSTCOPY_PAUSE state. Currently it is still simple - it waits on the
> > > > > new semaphore until someone pokes it for another attempt.
> > > > > 
> > > > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > > > ---
> > > > >  migration/migration.c  |  1 +
> > > > >  migration/migration.h  |  3 +++
> > > > >  migration/savevm.c     | 60 ++++++++++++++++++++++++++++++++++++++++++++++++--
> > > > >  migration/trace-events |  2 ++
> > > > >  4 files changed, 64 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/migration/migration.c b/migration/migration.c
> > > > > index 8d26ea8..80de212 100644
> > > > > --- a/migration/migration.c
> > > > > +++ b/migration/migration.c
> > > > > @@ -146,6 +146,7 @@ MigrationIncomingState *migration_incoming_get_current(void)
> > > > >          memset(&mis_current, 0, sizeof(MigrationIncomingState));
> > > > >          qemu_mutex_init(&mis_current.rp_mutex);
> > > > >          qemu_event_init(&mis_current.main_thread_load_event, false);
> > > > > +        qemu_sem_init(&mis_current.postcopy_pause_sem_dst, 0);
> > > > >          once = true;
> > > > >      }
> > > > >      return &mis_current;
> > > > > diff --git a/migration/migration.h b/migration/migration.h
> > > > > index 0c957c9..c423682 100644
> > > > > --- a/migration/migration.h
> > > > > +++ b/migration/migration.h
> > > > > @@ -60,6 +60,9 @@ struct MigrationIncomingState {
> > > > >      /* The coroutine we should enter (back) after failover */
> > > > >      Coroutine *migration_incoming_co;
> > > > >      QemuSemaphore colo_incoming_sem;
> > > > > +
> > > > > +    /* notify PAUSED postcopy incoming migrations to try to continue */
> > > > > +    QemuSemaphore postcopy_pause_sem_dst;
> > > > >  };
> > > > >  
> > > > >  MigrationIncomingState *migration_incoming_get_current(void);
> > > > > diff --git a/migration/savevm.c b/migration/savevm.c
> > > > > index 7172f14..3777124 100644
> > > > > --- a/migration/savevm.c
> > > > > +++ b/migration/savevm.c
> > > > > @@ -1488,8 +1488,8 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
> > > > >   */
> > > > >  static void *postcopy_ram_listen_thread(void *opaque)
> > > > >  {
> > > > > -    QEMUFile *f = opaque;
> > > > >      MigrationIncomingState *mis = migration_incoming_get_current();
> > > > > +    QEMUFile *f = mis->from_src_file;
> > > > >      int load_res;
> > > > >  
> > > > >      migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> > > > > @@ -1503,6 +1503,14 @@ static void *postcopy_ram_listen_thread(void *opaque)
> > > > >       */
> > > > >      qemu_file_set_blocking(f, true);
> > > > >      load_res = qemu_loadvm_state_main(f, mis);
> > > > > +
> > > > > +    /*
> > > > > +     * This is tricky, but, mis->from_src_file can change after it
> > > > > +     * returns, when postcopy recovery happened. In the future, we may
> > > > > +     * want a wrapper for the QEMUFile handle.
> > > > > +     */
> > > > > +    f = mis->from_src_file;
> > > > > +
> > > > >      /* And non-blocking again so we don't block in any cleanup */
> > > > >      qemu_file_set_blocking(f, false);
> > > > >  
> > > > > @@ -1581,7 +1589,7 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
> > > > >      /* Start up the listening thread and wait for it to signal ready */
> > > > >      qemu_sem_init(&mis->listen_thread_sem, 0);
> > > > >      qemu_thread_create(&mis->listen_thread, "postcopy/listen",
> > > > > -                       postcopy_ram_listen_thread, mis->from_src_file,
> > > > > +                       postcopy_ram_listen_thread, NULL,
> > > > >                         QEMU_THREAD_DETACHED);
> > > > >      qemu_sem_wait(&mis->listen_thread_sem);
> > > > >      qemu_sem_destroy(&mis->listen_thread_sem);
> > > > > @@ -1966,11 +1974,44 @@ void qemu_loadvm_state_cleanup(void)
> > > > >      }
> > > > >  }
> > > > >  
> > > > > +/* Return true if we should continue the migration, or false. */
> > > > > +static bool postcopy_pause_incoming(MigrationIncomingState *mis)
> > > > > +{
> > > > > +    trace_postcopy_pause_incoming();
> > > > > +
> > > > > +    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> > > > > +                      MIGRATION_STATUS_POSTCOPY_PAUSED);
> > > > > +
> > > > > +    assert(mis->from_src_file);
> > > > > +    qemu_file_shutdown(mis->from_src_file);
> > > > > +    qemu_fclose(mis->from_src_file);
> > > > > +    mis->from_src_file = NULL;
> > > > > +
> > > > > +    assert(mis->to_src_file);
> > > > > +    qemu_mutex_lock(&mis->rp_mutex);
> > > > > +    qemu_file_shutdown(mis->to_src_file);
> > > > 
> > > > Should you not do the shutdown() before the lock?
> > > > For example if the other thread is stuck, with rp_mutex
> > > > held, trying to write to to_src_file, then you'll block
> > > > waiting for the mutex.  If you call shutdown and then take
> > > > the lock, the other thread will error and release the lock.
> > > 
> > > The problem is that IMHO QEMUFile is not yet thread-safe itself.  So
> > > if we operate on it (even to shut it down) logically we need to have
> > > the lock, right?
> > 
> > That probably needs fixing for 'shutdown' under the assumption that
> >    a) No one has or is deleting/freeing the QEMUFile
> >    b) No one is closing the QEMUFile
> > 
> > The whole point of using shutdown() is it forces any stuck send()'s or
> > read()'s to fail rather than staying stuck.
> 
> I see.  I just noticed that actually qemu_file_shutdown() is
> thread-safe itself - it boils down to the system shutdown() call (as
> long as the above assumption is there).
> 
> Let me call qemu_file_shutdown() first before taking the lock to make
> sure send()/recv() hang won't happen.
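A standalone sketch (plain POSIX, hypothetical names, not QEMU code) of
the ordering being agreed here: shutdown() is thread-safe on its own
and forces a blocked recv() to fail, so the stuck thread releases the
mutex and the pausing side can then take it without deadlocking:

```c
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

static pthread_mutex_t rp_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Models the return-path thread: blocks in recv() with rp_mutex held. */
static void *rp_thread(void *arg)
{
    int fd = *(int *)arg;
    char c;
    pthread_mutex_lock(&rp_mutex);
    recv(fd, &c, 1, 0);     /* stuck here until shutdown() forces it out */
    pthread_mutex_unlock(&rp_mutex);
    return NULL;
}

/*
 * Pausing side.  Taking rp_mutex first could deadlock against the
 * blocked recv(); shutdown() needs no lock (it boils down to the
 * system call), so it goes before the lock.
 */
static void pause_channel(int fd)
{
    shutdown(fd, SHUT_RDWR);        /* unblocks the stuck recv() */
    pthread_mutex_lock(&rp_mutex);  /* now acquirable without hanging */
    pthread_mutex_unlock(&rp_mutex);
}
```

Reversing the two calls in pause_channel() reintroduces the hang Dave
describes: the lock waits on a thread that is itself waiting on IO.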
> 
> > 
> > > Then, IMHO the question would be: when will the send() be stuck in the
> > > other thread?
> > > 
> > > Normally the only case I can think of is that source didn't recv()
> > > fast enough, and we even consumed all the write buffer in dst side (I
> > > don't really know how kernel manages the buffers though, and e.g. how
> > > the size of buffer is defined...).
> > > 
> > > But when reach here, the channel (say, from_src_file and to_src_file,
> > > since both of them are using the same channel behind the QEMUFile
> > > interface) should already be broken in some way, so IIUC even if there
> > > is a send() in the other thread, it should return at some point with a
> > > failure as well, just like how we reached here (possibly due to a
> > > read() failure).
> > 
> > We have to be careful about this; a network can fail in a way it
> > gets stuck rather than fails - this can get stuck until a full TCP
> > disconnection; and that takes about 30mins (from memory).
> > The nice thing about using 'shutdown' is that you can kill the existing
> > connection if it's hung. (Which then makes an interesting question;
> > the rules in your migrate-incoming command become different if you
> > want to declare it's failed!).  Having said that, you're right that at
> > this point stuff has already failed - so do we need the shutdown?
> > (You might want to do the shutdown as part of the recovery earlier
> > or as a separate command to force the failure)
> 
> I assume if I call shutdown before the lock then we'll be good then.

The question is what happens if you only allow recovery if we're already
in postcopy-paused state; in the case of a hung socket, since no IO has
actually failed yet, you will still be in postcopy-active.

Dave

> > 
> > > > I'm not quite sure what will happen if we end up calling this
> > > > before the main thread has been returned from postcopy and the
> > > > device loading is complete.
> > > 
> > > IIUC you mean the time starts from when we got MIG_CMD_PACKAGED until
> > > main thread finishes handling that package?
> > 
> > Yes.
> > 
> > > Normally I think that should not matter much since during handling the
> > > package it should hardly fail (we were reading from a buffer QIO
> > > channel, no real IOs there)... 
> > 
> > Note that while the main thread is reading the package, the listener
> > thread is receiving pages, so you can legally get a failure at that
> > point when the fd fails as it's receiving pages at the same time
> > as reading the devices.
> > (There's an argument that if it fails before you've received all
> > your devices then perhaps you can just restart the source)
> 
> Yes.
> 
> > 
> > > But I agree about the reasoning.  How
> > > about one more patch to postpone the "active" to "postcopy-active"
> > > state change until after the package is handled correctly?  Like:
> > > 
> > > --------------
> > > diff --git a/migration/savevm.c b/migration/savevm.c                     
> > > index b5c3214034..8317b2a7e2 100644 
> > > --- a/migration/savevm.c            
> > > +++ b/migration/savevm.c            
> > > @@ -1573,8 +1573,6 @@ static void *postcopy_ram_listen_thread(void *opaque)                                                                       
> > >      QEMUFile *f = mis->from_src_file;                                   
> > >      int load_res;                  
> > >                                     
> > > -    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,             
> > > -                                   MIGRATION_STATUS_POSTCOPY_ACTIVE);   
> > >      qemu_sem_post(&mis->listen_thread_sem);                             
> > >      trace_postcopy_ram_listen_thread_start();                           
> > >                                     
> > > @@ -1817,6 +1815,9 @@ static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis)                                                          
> > >      qemu_fclose(packf);            
> > >      object_unref(OBJECT(bioc));    
> > >                                     
> > > +    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,             
> > > +                                   MIGRATION_STATUS_POSTCOPY_ACTIVE);   
> > > +                                   
> > >      return ret;                    
> > >  }                                  
> > > --------------
> > > 
> > > This function will only be called with "postcopy-active" state.
> > 
> > I *think* that's safe; you've got to be careful, but I can't see
> > anyone on the destination that cares about the distinction.
> 
> Indeed, but I'd say that's the best thing I can think of (and the
> simplest).  Though I'm not sure whether it would be clearer if we set
> the postcopy-active state right before starting the VM on the
> destination, say, at the beginning of loadvm_postcopy_handle_run_bh().
> 
> > 
> > > > Also, at this point have we guaranteed no one else is about
> > > > to do an op on mis->to_src_file and will seg?
> > > 
> > > I think no?  Since IMHO the main thread is playing with the buffer QIO
> > > channel, rather than the real one?
> > 
> > OK.
> > 
> > > (btw, could I ask what's "seg"? :)
> > 
> > just short for segmentation fault; sig 11.
> 
> I see.  Thanks!
> 
> > 
> > Dave
> > 
> > > -- 
> > > Peter Xu
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Qemu-devel] [RFC v2 10/33] migration: allow dst vm pause on postcopy
  2017-10-10 12:30           ` Dr. David Alan Gilbert
@ 2017-10-11  3:00             ` Peter Xu
  2017-10-12 12:19               ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 86+ messages in thread
From: Peter Xu @ 2017-10-11  3:00 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

On Tue, Oct 10, 2017 at 01:30:18PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Mon, Oct 09, 2017 at 07:58:13PM +0100, Dr. David Alan Gilbert wrote:

[...]

> > > We have to be careful about this; a network can fail in a way it
> > > gets stuck rather than fails - this can get stuck until a full TCP
> > > disconnection; and that takes about 30mins (from memory).
> > > The nice thing about using 'shutdown' is that you can kill the existing
> > > connection if it's hung. (Which then makes an interesting question;
> > > the rules in your migrate-incoming command become different if you
> > > want to declare it's failed!).  Having said that, you're right that at
> > > this point stuff has already failed - so do we need the shutdown?
> > > (You might want to do the shutdown as part of the recovery earlier
> > > or as a separate command to force the failure)
> > 
> > I assume if I call shutdown before the lock then we'll be good then.
> 
> The question is what happens if you only allow recovery if we're already
> in postcopy-paused state; in the case of a hung socket, since no IO has
> actually failed yet, you will still be in postcopy-active.

Hmm, but isn't that a problem of the kernel rather than QEMU?  Sockets
are, after all, managed by the kernel.

I don't really know the best way to detect whether a socket is stuck.
Assuming we can observe that (say, the migration's transferred byte
count stays static for 30 seconds), IIRC you mentioned iptables tricks
to break an existing e.g. TCP connection, so that we can trigger the
-EIO path.
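One kernel-side knob that bounds the ~30-minute TCP stall, sketched
here only as an illustration (Linux-specific, and not something this
series relies on): TCP_USER_TIMEOUT fails the connection once
transmitted data stays unacknowledged for too long, so blocked
send()/recv() calls return an error instead of hanging and the -EIO
path can trigger much sooner:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Bound how long the kernel keeps retransmitting unacknowledged data
 * before failing the connection (Linux-specific socket option).  After
 * the timeout, pending IO on the socket errors out with ETIMEDOUT.
 */
static int bound_tcp_stall(int fd, unsigned int timeout_ms)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                      &timeout_ms, sizeof(timeout_ms));
}
```

Even with such a bound, an explicit command that calls shutdown() on
the channel stays the deterministic way to force the failure on demand.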

Or do you think we should provide a way to manually trigger the paused
state?  Then it goes back to something we discussed with Dan in the
earlier post - I'd appreciate it if we could postpone the manual
trigger support a bit (to keep this series small, which it already
isn't...).

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Qemu-devel] [RFC v2 10/33] migration: allow dst vm pause on postcopy
  2017-10-11  3:00             ` Peter Xu
@ 2017-10-12 12:19               ` Dr. David Alan Gilbert
  2017-10-13  5:08                 ` Peter Xu
  0 siblings, 1 reply; 86+ messages in thread
From: Dr. David Alan Gilbert @ 2017-10-12 12:19 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> On Tue, Oct 10, 2017 at 01:30:18PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > On Mon, Oct 09, 2017 at 07:58:13PM +0100, Dr. David Alan Gilbert wrote:
> 
> [...]
> 
> > > > We have to be careful about this; a network can fail in a way it
> > > > gets stuck rather than fails - this can get stuck until a full TCP
> > > > disconnection; and that takes about 30mins (from memory).
> > > > The nice thing about using 'shutdown' is that you can kill the existing
> > > > connection if it's hung. (Which then makes an interesting question;
> > > > the rules in your migrate-incoming command become different if you
> > > > want to declare it's failed!).  Having said that, you're right that at
> > > > this point stuff has already failed - so do we need the shutdown?
> > > > (You might want to do the shutdown as part of the recovery earlier
> > > > or as a separate command to force the failure)
> > > 
> > > I assume if I call shutdown before the lock then we'll be good then.
> > 
> > The question is what happens if you only allow recovery if we're already
> > in postcopy-paused state; in the case of a hung socket, since no IO has
> > actually failed yet, you will still be in postcopy-active.
> 
> Hmm, but isn't that a problem of the kernel rather than QEMU?  Sockets
> are, after all, managed by the kernel.

Kind of, but it comes down to what the right behaviour of a TCP socket
is, and the kernel is probably doing the right thing.

> I don't really know the best way to detect whether a socket is stuck.
> Assuming we can observe that (say, the migration's transferred byte
> count stays static for 30 seconds), IIRC you mentioned iptables tricks
> to break an existing e.g. TCP connection, so that we can trigger the
> -EIO path.

From the qemu level I'd prefer to make it a command;  if we start
adding heuristics and timeouts etc then it's very difficult to actually
get them right.

> Or do you think we should provide a way to manually trigger the paused
> state?  Then it goes back to something we discussed with Dan in the
> earlier post - I'd appreciate it if we could postpone the manual
> trigger support a bit (to keep this series small, which it already
> isn't...).

I think that manual trigger is probably necessary; it would just call
shutdown() on the sockets and let things fail into the paused state.
It'd be pretty simple.  It would be another OOB command; the tricky
part is just making sure it's thread safe against the migration
finishing when you issue it.

I think it can wait until after this series if you want, but it would
be good if we can figure it out.

Dave

> Thanks,
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Qemu-devel] [RFC v2 10/33] migration: allow dst vm pause on postcopy
  2017-10-12 12:19               ` Dr. David Alan Gilbert
@ 2017-10-13  5:08                 ` Peter Xu
  0 siblings, 0 replies; 86+ messages in thread
From: Peter Xu @ 2017-10-13  5:08 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

On Thu, Oct 12, 2017 at 01:19:52PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Tue, Oct 10, 2017 at 01:30:18PM +0100, Dr. David Alan Gilbert wrote:
> > > * Peter Xu (peterx@redhat.com) wrote:
> > > > On Mon, Oct 09, 2017 at 07:58:13PM +0100, Dr. David Alan Gilbert wrote:
> > 
> > [...]
> > 
> > > > > We have to be careful about this; a network can fail in a way it
> > > > > gets stuck rather than fails - this can get stuck until a full TCP
> > > > > disconnection; and that takes about 30mins (from memory).
> > > > > The nice thing about using 'shutdown' is that you can kill the existing
> > > > > connection if it's hung. (Which then makes an interesting question;
> > > > > the rules in your migrate-incoming command become different if you
> > > > > want to declare it's failed!).  Having said that, you're right that at
> > > > > this point stuff has already failed - so do we need the shutdown?
> > > > > (You might want to do the shutdown as part of the recovery earlier
> > > > > or as a separate command to force the failure)
> > > > 
> > > > I assume if I call shutdown before the lock then we'll be good then.
> > > 
> > > The question is what happens if you only allow recovery if we're already
> > > in postcopy-paused state; in the case of a hung socket, since no IO has
> > > actually failed yet, you will still be in postcopy-active.
> > 
> > Hmm, but isn't that a problem of the kernel rather than QEMU?  Sockets
> > are, after all, managed by the kernel.
> 
> Kind of, but it comes down to what the right behaviour of a TCP socket
> is, and the kernel is probably doing the right thing.
> 
> > I don't really know the best way to detect whether a socket is stuck.
> > Assuming we can observe that (say, the migration's transferred byte
> > count stays static for 30 seconds), IIRC you mentioned iptables tricks
> > to break an existing e.g. TCP connection, so that we can trigger the
> > -EIO path.
> 
> From the qemu level I'd prefer to make it a command;  if we start
> adding heuristics and timeouts etc then it's very difficult to actually
> get them right.
> 
> > Or do you think we should provide a way to manually trigger the paused
> > state?  Then it goes back to something we discussed with Dan in the
> > earlier post - I'd appreciate it if we could postpone the manual
> > trigger support a bit (to keep this series small, which it already
> > isn't...).
> 
> I think that manual trigger is probably necessary; it would just call
> shutdown() on the sockets and let things fail into the paused state.
> It'd be pretty simple.  It would be another OOB command; the tricky
> part is just making sure it's thread safe against the migration
> finishing when you issue it.
> 
> I think it can wait until after this series if you want, but it would
> be good if we can figure it out.

OK.  Let me try it in my next post.  I hope it won't grow into
something bigger (which does happen sometimes... :).

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Qemu-devel] [RFC v2 10/33] migration: allow dst vm pause on postcopy
  2017-10-10 11:31           ` Peter Xu
@ 2017-10-31 18:57             ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 86+ messages in thread
From: Dr. David Alan Gilbert @ 2017-10-31 18:57 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Daniel P . Berrange,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> On Tue, Oct 10, 2017 at 05:38:01PM +0800, Peter Xu wrote:
> 
> [...]
> 
> > > > But I agree about the reasoning.  How
> > > > about one more patch to postpone the "active" to "postcopy-active"
> > > > state change until after the package is handled correctly?  Like:
> > > > 
> > > > --------------
> > > > diff --git a/migration/savevm.c b/migration/savevm.c                     
> > > > index b5c3214034..8317b2a7e2 100644 
> > > > --- a/migration/savevm.c            
> > > > +++ b/migration/savevm.c            
> > > > @@ -1573,8 +1573,6 @@ static void *postcopy_ram_listen_thread(void *opaque)                                                                       
> > > >      QEMUFile *f = mis->from_src_file;                                   
> > > >      int load_res;                  
> > > >                                     
> > > > -    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,             
> > > > -                                   MIGRATION_STATUS_POSTCOPY_ACTIVE);   
> > > >      qemu_sem_post(&mis->listen_thread_sem);                             
> > > >      trace_postcopy_ram_listen_thread_start();                           
> > > >                                     
> > > > @@ -1817,6 +1815,9 @@ static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis)                                                          
> > > >      qemu_fclose(packf);            
> > > >      object_unref(OBJECT(bioc));    
> > > >                                     
> > > > +    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,             
> > > > +                                   MIGRATION_STATUS_POSTCOPY_ACTIVE);   
> > > > +                                   
> > > >      return ret;                    
> > > >  }                                  
> > > > --------------
> > > > 
> > > > This function will only be called with "postcopy-active" state.
> > > 
> > > I *think* that's safe; you've got to be careful, but I can't see
> > > anyone on the destination that cares about the distinction.
> > 
> > Indeed, but I'd say that's the best thing I can think of (and the
> > simplest).  Though I'm not sure whether it would be clearer if we set
> > the postcopy-active state right before starting the VM on the
> > destination, say, at the beginning of loadvm_postcopy_handle_run_bh().
> 
> When thinking about this, I had another question.
> 
> How do we handle the case where we fail to send the device states in
> postcopy_start()?  There, we call qemu_savevm_send_packaged() and then
> assume we are good, returning success.  However,
> qemu_savevm_send_packaged() only means that the data is queued in the
> write buffer on the source host; it does not mean that the destination
> has loaded the device states correctly.  It's still possible that the
> destination VM failed to receive the whole packaged data while the
> source thought it had done so without problem.
> 
> Then the source will continue in postcopy-active while the destination
> VM fails, which in turn fails the source.  The VM would be lost then,
> since this is postcopy rather than precopy.
> 
> Meanwhile, this cannot be handled by postcopy recovery, since IIUC
> postcopy recovery only works after the states are at least loaded on
> the destination VM (I'll avoid going deeper into a more complex
> protocol for postcopy recovery; please see below).
> 
> I think the best/simplest thing to do when encountering this error is
> to just fail the migration and keep the VM running on the source, which
> would be the same failure handling path as precopy.  But it still seems
> that we don't have a good mechanism to detect the error when sending
> the MIG_CMD_PACKAGED message fails in some way (we could add an ACK
> from dst->src, but that breaks old VMs).
> 
> Before going further, would my worry make any sense?

Yes, I think it does; it wouldn't be unusual for a device load to fail
due to some problem on the destination host or a problem in device
serialisation.
I also think we should be OK to restart on the source, although we
have to be careful - can we really know what the previous devices (that
loaded successfully) did?  Hopefully they didn't change the state of
the storage/networking, because the destination CPUs haven't started.

> (I hope this can be a separate problem from the postcopy recovery
>  series, if it is indeed a problem.  For postcopy recovery, I hope the
>  idea of postponing the switch to POSTCOPY_ACTIVE would suffice.)

Sure.

Dave

> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 86+ messages in thread

end of thread, other threads:[~2017-10-31 18:57 UTC | newest]

Thread overview: 86+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-30  8:31 [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery Peter Xu
2017-08-30  8:31 ` [Qemu-devel] [RFC v2 01/33] bitmap: remove BITOP_WORD() Peter Xu
2017-09-20  8:41   ` Juan Quintela
2017-08-30  8:31 ` [Qemu-devel] [RFC v2 02/33] bitmap: introduce bitmap_count_one() Peter Xu
2017-09-20  8:25   ` Juan Quintela
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 03/33] bitmap: provide to_le/from_le helpers Peter Xu
2017-09-21 17:35   ` Dr. David Alan Gilbert
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 04/33] migration: dump str in migrate_set_state trace Peter Xu
2017-09-06 14:36   ` Dr. David Alan Gilbert
2017-09-20  8:44   ` Juan Quintela
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 05/33] migration: better error handling with QEMUFile Peter Xu
2017-09-21 17:51   ` Dr. David Alan Gilbert
2017-09-26  8:48     ` Peter Xu
2017-09-26  8:53       ` Dr. David Alan Gilbert
2017-09-26  9:13         ` Peter Xu
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 06/33] migration: reuse mis->userfault_quit_fd Peter Xu
2017-09-20  8:47   ` Juan Quintela
2017-09-20  9:06   ` Juan Quintela
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 07/33] migration: provide postcopy_fault_thread_notify() Peter Xu
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 08/33] migration: new postcopy-pause state Peter Xu
2017-09-21 17:57   ` Dr. David Alan Gilbert
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 09/33] migration: implement "postcopy-pause" src logic Peter Xu
2017-09-21 19:21   ` Dr. David Alan Gilbert
2017-09-26  9:35     ` Peter Xu
2017-10-09 15:32       ` Dr. David Alan Gilbert
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 10/33] migration: allow dst vm pause on postcopy Peter Xu
2017-09-21 19:29   ` Dr. David Alan Gilbert
2017-09-27  7:34     ` Peter Xu
2017-10-09 18:58       ` Dr. David Alan Gilbert
2017-10-10  9:38         ` Peter Xu
2017-10-10 11:31           ` Peter Xu
2017-10-31 18:57             ` Dr. David Alan Gilbert
2017-10-10 12:30           ` Dr. David Alan Gilbert
2017-10-11  3:00             ` Peter Xu
2017-10-12 12:19               ` Dr. David Alan Gilbert
2017-10-13  5:08                 ` Peter Xu
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 11/33] migration: allow src return path to pause Peter Xu
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 12/33] migration: allow send_rq to fail Peter Xu
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 13/33] migration: allow fault thread to pause Peter Xu
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 14/33] qmp: hmp: add migrate "resume" option Peter Xu
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 15/33] migration: pass MigrationState to migrate_init() Peter Xu
2017-09-22  9:09   ` Dr. David Alan Gilbert
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 16/33] migration: rebuild channel on source Peter Xu
2017-09-22  9:56   ` Dr. David Alan Gilbert
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 17/33] migration: new state "postcopy-recover" Peter Xu
2017-09-22 10:08   ` Dr. David Alan Gilbert
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 18/33] migration: wakeup dst ram-load-thread for recover Peter Xu
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 19/33] migration: new cmd MIG_CMD_RECV_BITMAP Peter Xu
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 20/33] migration: new message MIG_RP_MSG_RECV_BITMAP Peter Xu
2017-09-22 11:05   ` Dr. David Alan Gilbert
2017-09-27 10:04     ` Peter Xu
2017-10-09 19:12       ` Dr. David Alan Gilbert
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 21/33] migration: new cmd MIG_CMD_POSTCOPY_RESUME Peter Xu
2017-09-22 11:08   ` Dr. David Alan Gilbert
2017-09-27 10:11     ` Peter Xu
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 22/33] migration: new message MIG_RP_MSG_RESUME_ACK Peter Xu
2017-09-22 11:13   ` Dr. David Alan Gilbert
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 23/33] migration: introduce SaveVMHandlers.resume_prepare Peter Xu
2017-09-22 11:17   ` Dr. David Alan Gilbert
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 24/33] migration: synchronize dirty bitmap for resume Peter Xu
2017-09-22 11:33   ` Dr. David Alan Gilbert
2017-09-28  2:30     ` Peter Xu
2017-10-02 11:04       ` Dr. David Alan Gilbert
2017-10-09  3:55         ` Peter Xu
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 25/33] migration: setup ramstate " Peter Xu
2017-09-22 11:53   ` Dr. David Alan Gilbert
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 26/33] migration: final handshake for the resume Peter Xu
2017-09-22 11:56   ` Dr. David Alan Gilbert
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 27/33] migration: free SocketAddress where allocated Peter Xu
2017-09-22 20:08   ` Dr. David Alan Gilbert
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 28/33] migration: return incoming task tag for sockets Peter Xu
2017-09-22 20:11   ` Dr. David Alan Gilbert
2017-09-28  3:12     ` Peter Xu
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 29/33] migration: return incoming task tag for exec Peter Xu
2017-09-22 20:15   ` Dr. David Alan Gilbert
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 30/33] migration: return incoming task tag for fd Peter Xu
2017-09-22 20:15   ` Dr. David Alan Gilbert
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 31/33] migration: store listen task tag Peter Xu
2017-09-22 20:17   ` Dr. David Alan Gilbert
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 32/33] migration: allow migrate_incoming for paused VM Peter Xu
2017-09-22 20:32   ` Dr. David Alan Gilbert
2017-09-28  6:54     ` Peter Xu
2017-10-09 17:28       ` Dr. David Alan Gilbert
2017-10-10 10:08         ` Peter Xu
2017-08-30  8:32 ` [Qemu-devel] [RFC v2 33/33] migration: init dst in migration_object_init too Peter Xu
2017-09-22 20:37   ` Dr. David Alan Gilbert